Do you do EDA before or after model building?

Question

I can see this going either way, but it seems that most of the time it's done before model building. EDA, from my understanding, is intended to provide insights that can help with model building.

I can also understand doing it after model building, particularly in an iterative process where you build an initial model without doing any EDA and then use EDA to improve on that initial model.

What do you guys think?

Definitely before. There are a lot of problems you can avoid by doing EDA first, such as high sparsity, multicollinearity, many levels in factors, zero variance features, outliers and so on. It's always better to understand what you're dealing with before trying to model. — Chris, Apr 23 '21 at 21:44
It is really a situation-based question. It depends on what you want to see through EDA. If you need a quick look at the raw data source, EDA is the tool for that; EDA is also useful when you need to see how your data model works. — Memphis Meng, Apr 24 '21 at 03:43
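The checks listed in the first comment (sparsity, multicollinearity, factors with many levels, zero-variance features, outliers) can each be run before any model is built. Below is a minimal standard-library sketch, assuming the data is a dict of column name to list of values; the function names `eda_report` and `pearson`, the thresholds, and the 1.5×IQR outlier rule are illustrative choices, not anything prescribed in the thread.

```python
from itertools import combinations
from statistics import quantiles


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for constant columns; treat as uncorrelated here
    return cov / (vx * vy) ** 0.5


def eda_report(columns, sparsity_thr=0.9, level_thr=50, corr_thr=0.9):
    """Flag common data problems; all thresholds are hypothetical defaults."""
    report = {"zero_variance": [], "sparse": [], "many_levels": [],
              "outliers": {}, "collinear_pairs": []}
    numeric = {}
    for name, values in columns.items():
        if len(set(values)) <= 1:
            report["zero_variance"].append(name)
        # Share of entries that are missing (None) or zero.
        empty = sum(1 for v in values if v is None or v == 0)
        if empty / len(values) >= sparsity_thr:
            report["sparse"].append(name)
        if all(isinstance(v, str) for v in values):
            if len(set(values)) > level_thr:
                report["many_levels"].append(name)
        elif all(isinstance(v, (int, float)) for v in values):
            numeric[name] = values
    # Outliers: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    for name, values in numeric.items():
        if len(values) < 4:
            continue
        q1, _, q3 = quantiles(values, n=4)
        iqr = q3 - q1
        lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        flagged = [v for v in values if v < lo or v > hi]
        if flagged:
            report["outliers"][name] = flagged
    # Multicollinearity: pairs of numeric columns with |r| above the threshold.
    for a, b in combinations(numeric, 2):
        if abs(pearson(numeric[a], numeric[b])) >= corr_thr:
            report["collinear_pairs"].append((a, b))
    return report


columns = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 100],
    "x2": [2, 4, 6, 8, 10, 12, 14, 200],  # exactly 2 * x1
    "const": [5] * 8,
    "sparse": [1, 0, 0, 0, 0, 0, 0, 0],
    "city": ["a", "b", "c", "d", "e", "f", "g", "h"],
}
report = eda_report(columns, sparsity_thr=0.8, level_thr=5)
```

In practice a library such as pandas would do most of this in a few calls; the point of the sketch is only that every item in the comment's list is a cheap, mechanical check.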

score 1 · Answer 1 · answered Apr 23 '21 at 21:44

Doing EDA before is always helpful; it helps you build a better model. Suppose you have an image dataset and you do EDA by looking at the various image sizes and image types it contains. That gives you an idea of what input size to use in the network and what types of augmentation to apply. The same goes for other types of data.
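The image-dataset EDA described above mostly comes down to tabulating sizes and classes. A minimal sketch, assuming each image has already been reduced to a hypothetical `(width, height, label)` record (with Pillow, for example, `Image.open(path).size` returns `(width, height)`); the function name `summarize_images` is illustrative.

```python
from collections import Counter
from statistics import median


def summarize_images(records):
    """Summarize a list of (width, height, label) records."""
    widths = [w for w, _, _ in records]
    heights = [h for _, h, _ in records]
    sizes = Counter((w, h) for w, h, _ in records)
    labels = Counter(label for _, _, label in records)
    return {
        "n_images": len(records),
        "distinct_sizes": len(sizes),
        "most_common_size": sizes.most_common(1)[0][0],
        "width_range": (min(widths), max(widths)),
        "height_range": (min(heights), max(heights)),
        "median_size": (median(widths), median(heights)),
        "label_counts": dict(labels),
    }


records = [(640, 480, "cat"), (640, 480, "dog"), (1024, 768, "cat"), (320, 240, "cat")]
summary = summarize_images(records)
```

A wide spread of sizes suggests choosing a common resize or crop target before training, and uneven label counts suggest augmentation or class reweighting; both are decisions you can make before the first run of the network.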

If you do it afterwards, you go through the same process anyway, but you will have wasted the first run of your network (resources you could have used to reach a solution faster).

Familiarize yourself with the data first and be patient; then move on to modelling.

score 0 · Answer 2 · answered Apr 24 '21 at 01:41

It depends. If you're building a neural network, you would benefit from doing EDA on the errors instead of on the raw data, to see what patterns your network isn't capturing.
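Error analysis here means fitting a quick model first and then exploring its residuals rather than the raw inputs. A minimal sketch, with an ordinary least-squares line swapped in as a stand-in for the neural network (the technique is the same for any model's predictions); the data and the helper names `fit_line` and `residuals_by_bin` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (stand-in for a quick first model)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def residuals_by_bin(xs, residuals, n_bins=3):
    """Mean residual within equal-width bins of a feature."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for x, r in zip(xs, residuals):
        i = min(int((x - lo) / width), n_bins - 1)  # x == hi goes in the last bin
        bins[i].append(r)
    return [mean(b) if b else None for b in bins]


xs = list(range(-5, 6))
ys = [x * x for x in xs]  # the true pattern is quadratic
a, b = fit_line(xs, ys)
res = [y - (a + b * x) for x, y in zip(xs, ys)]
binned = residuals_by_bin(xs, res)
```

The residuals average to zero overall, but binned by `x` they come out positive at both ends and negative in the middle, a U shape that points directly at the curvature the model failed to capture.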

EDA is great for discovering which features correlate with your target, and if you have a sense of the problem, you can build complex features and test their correlations. Neural networks largely remove the need for manual feature engineering, so an error analysis after building a quick model may be more useful for refining the model pipeline.
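Testing the correlation of an engineered feature against the target can be sketched in a few lines. The example below assumes a hypothetical problem where the target is an area, so the product of two raw features is the natural engineered feature; `pearson` and `rank_features` are illustrative helpers.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def rank_features(features, target):
    """Sort features by absolute Pearson correlation with the target."""
    scores = {name: pearson(vals, target) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


length = [1, 2, 3, 4, 5, 6]
width = [6, 5, 4, 3, 2, 1]
area = [l * w for l, w in zip(length, width)]  # hypothetical target
features = {
    "length": length,
    "width": width,
    "length_x_width": [l * w for l, w in zip(length, width)],  # engineered feature
}
ranking = rank_features(features, area)
```

Here both raw features have a correlation of essentially zero with the target, while the engineered product correlates perfectly. Pearson correlation only measures linear relationships, which is exactly why a well-chosen engineered feature can reveal structure that the raw columns hide.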
