I have two redundant features, A and B, with a correlation of 0.85. I understand that only one of them should be used to train my model, but which one should I use, A or B? Is there a method for deciding which redundant feature to keep, or should I just pick one of them at random?

  • "only one of them should be used" isn't necessarily true; see https://datascience.stackexchange.com/q/24452/55122 for a start. I've seen people choose the "better" feature to keep, but I'm not sure there has been much study of the bias-variance tradeoff inherent in making that decision during preprocessing (or of leakage, if the decision is based in part on the test set). – Ben Reiniger Oct 27 '22 at 14:24
  • Ben already touched on it with the linked thread; what kind of algorithm are you using? If it's a tree-based algorithm, removing features to avoid collinearity isn't necessary in the first place. – OliverHennhoefer Oct 28 '22 at 08:53
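
To make the "choose the better feature" idea from the comments concrete, here is a minimal sketch of one possible heuristic: keep whichever of A and B has the larger absolute correlation with the target, computed on the training split only so the test set does not leak into the decision. Everything here is an assumption for illustration: the helper names `pearson` and `pick_feature`, the target column `y`, and the synthetic data (built so that A and B correlate at roughly 0.85). It is not an established rule, and as noted above, dropping a feature may not be needed at all.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def pick_feature(train_rows, target_key, candidates):
    """Return (best candidate, scores), scoring each candidate by its
    absolute correlation with the target on the training rows only."""
    y = [row[target_key] for row in train_rows]
    scores = {
        name: abs(pearson([row[name] for row in train_rows], y))
        for name in candidates
    }
    return max(scores, key=scores.get), scores


# Synthetic data (an assumption): B is a noisy copy of A with
# population correlation 0.85, and the target depends on A.
rng = random.Random(0)
rows = []
for _ in range(500):
    a = rng.gauss(0, 1)
    b = 0.85 * a + (1 - 0.85 ** 2) ** 0.5 * rng.gauss(0, 1)
    y = 2.0 * a + rng.gauss(0, 1)
    rows.append({"A": a, "B": b, "y": y})

# Split first, then decide using the training rows only.
train, test = rows[:400], rows[400:]
best, scores = pick_feature(train, "y", ["A", "B"])
print(best, scores)
```

The held-out `test` rows are deliberately never passed to `pick_feature`. A cross-validated comparison of models trained on A only, B only, and both would be a more direct check, at the cost of more code.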
