
I have a large dataset on which I need to make a binary prediction. After analyzing the data, I found that some variables are positively correlated with each other. Should I delete some of these variables and keep the others (i.e., if A and B are correlated, delete A and keep B) before continuing, or is there a better way to deal with this kind of problem?

2 Answers


It depends. If you are using this data in a linear model, it is better to remove correlated features. Some complex non-linear models, however, can either make use of these correlated features or eliminate them automatically.
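
To illustrate why a linear model is sensitive to this, here is a minimal sketch on synthetic data (the features `a`, `b`, `c` and the helper `condition_number` are made up for the example, not taken from the question). It compares the condition number of the standardized design matrix with and without a near-duplicate feature; a large condition number usually means the least-squares coefficients are poorly determined.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = a + 0.05 * rng.normal(size=n)  # hypothetical feature B, almost a copy of A
c = rng.normal(size=n)             # hypothetical feature C, independent of A


def condition_number(*cols):
    """Condition number of the standardized design matrix built from cols."""
    X = np.column_stack(cols)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.linalg.cond(X)


cond_with_pair = condition_number(a, b, c)  # A and B both kept
cond_without = condition_number(a, c)       # B dropped

print(f"with A and B: {cond_with_pair:.1f}")
print(f"without B:    {cond_without:.1f}")
```

On data like this, the matrix that keeps both A and B should come out far worse conditioned than the one where B is dropped.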

SrJ

Yes, you should remove one of them. For example, if you plot a correlation heatmap and notice that two features A and B have a correlation of 0.91, remove one of them; removing both would lose information.
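
As a rough sketch of that check without the plot, the snippet below lists the feature pairs whose absolute Pearson correlation exceeds a threshold. The function name `correlated_pairs`, the 0.9 threshold, and the synthetic columns A, B, C are assumptions made for the example.

```python
import numpy as np


def correlated_pairs(X, names, threshold=0.9):
    """Return (name_i, name_j, corr) for every pair with |corr| > threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are features
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Synthetic data: B is built from A, C is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = a + 0.3 * rng.normal(size=1000)
c = rng.normal(size=1000)
X = np.column_stack([a, b, c])

pairs = correlated_pairs(X, ["A", "B", "C"])
for first, second, value in pairs:
    print(first, second, round(value, 2))
```

Each reported pair is a candidate for dropping one of its two members.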

After removing one of them, plot a heatmap of the remaining features again and you'll notice that the correlation values of the other features have changed, so it is an iterative process. Now let's say you have 4 correlated features A, B, C and D. Instead of removing 2 of them at once, first remove one (either A or B) and then plot the heatmap again. Only if C and D are still correlated should you remove one of them.
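
The one-at-a-time procedure could be sketched roughly as follows. This is only an illustration under assumptions: the function name `drop_correlated`, the 0.9 threshold, and the rule of dropping the later column of each pair are arbitrary choices for the example, and the data is synthetic.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.9):
    """Drop one feature at a time until no pair has |corr| > threshold."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        # Recompute the correlation matrix on the features still kept.
        corr = np.abs(np.corrcoef(X[:, keep], rowvar=False))
        np.fill_diagonal(corr, 0.0)  # ignore self-correlation
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        if corr[i, j] <= threshold:
            break  # no remaining pair is above the threshold
        # Arbitrary choice: drop the later column of the most correlated pair.
        keep.pop(max(i, j))
    return [names[k] for k in keep]


# Synthetic data: B is built from A, D is built from C.
rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = a + 0.2 * rng.normal(size=1000)
c = rng.normal(size=1000)
d = c + 0.2 * rng.normal(size=1000)
X = np.column_stack([a, b, c, d])

kept = drop_correlated(X, ["A", "B", "C", "D"])
print(kept)
```

In practice, which member of a pair to drop could instead be decided by domain knowledge or by how strongly each feature relates to the target.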

spectre
  • There must be some confusion here: the correlation between two variables doesn't depend on any other variable, so the correlation score between C and D cannot change after removing A or B. – Erwan Oct 08 '21 at 02:22
  • Thank you for your help – Ahmed Camara Oct 08 '21 at 05:09