I recently started reading about feature creation. I've seen some general guidelines, although I'm not sure whether they are completely true. For example:
1. Linear classifiers for binary classification (e.g. logistic regression) favor binary features and linear relationships overall.
2. Models such as decision trees, random forests, etc. capture non-linear interactions and are therefore more likely to 'accept' features like ratios (see the sketch after this list).
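To make that second guideline concrete, here is a small synthetic sketch (entirely my own construction, not taken from any reference): the label depends on a product interaction `x1 * x2`, which a random forest can pick up from the raw columns, while logistic regression only catches up once the interaction is hand-fed to it as a feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: the class depends on the product x1 * x2, a non-linear
# interaction that no single linear boundary in (x1, x2) captures exactly
rng = np.random.default_rng(0)
x1 = rng.uniform(1, 10, 2000)
x2 = rng.uniform(1, 10, 2000)
y = (x1 * x2 > 25).astype(int)

X_raw = np.column_stack([x1, x2])
X_eng = np.column_stack([x1, x2, x1 * x2])  # hand-crafted interaction feature

for name, model, X in [
    ("logreg, raw features ", LogisticRegression(max_iter=1000), X_raw),
    ("forest, raw features ", RandomForestClassifier(random_state=0), X_raw),
    ("logreg, + interaction", LogisticRegression(max_iter=1000), X_eng),
]:
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```

In this toy setup the forest does well on the raw columns, while the linear model only matches it after the interaction is added explicitly, which is the behavior the guideline describes.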
I also know that, in general, correlation among features is considered a bad thing. When creating features, however, the new feature will almost inevitably be correlated with the existing features it is derived from.
For instance, I managed to improve logistic regression on the Titanic dataset by creating a binary feature called "woman_or_children", which depends directly on age and sex. Even though this feature is obviously highly correlated with age and sex, it improved the classifier's results.
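For reference, this is roughly how I built the feature. A minimal sketch, assuming the standard Kaggle Titanic train.csv with `Sex`, `Age`, and `Survived` columns; the age cutoff of 16 for "child" is an arbitrary choice on my part:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Kaggle Titanic training set (assumed columns: Sex, Age, Survived)
df = pd.read_csv("train.csv")
df["Age"] = df["Age"].fillna(df["Age"].median())  # simple imputation for missing ages
df["is_male"] = (df["Sex"] == "male").astype(int)

# Derived binary feature: woman OR child (age cutoff of 16 is an arbitrary choice)
df["woman_or_children"] = ((df["Sex"] == "female") | (df["Age"] < 16)).astype(int)

y = df["Survived"]
X_base = df[["is_male", "Age"]]                       # original features only
X_plus = df[["is_male", "Age", "woman_or_children"]]  # plus the derived feature

model = LogisticRegression(max_iter=1000)
print("baseline:     ", cross_val_score(model, X_base, y, cv=5).mean())
print("with feature: ", cross_val_score(model, X_plus, y, cv=5).mean())
```

The new column is a deterministic function of `Sex` and `Age`, so it is strongly correlated with both, yet the cross-validated score improves.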
What I see, then, is that there is a tradeoff, but I can't pinpoint exactly what it is. Can I view this in light of the bias-variance tradeoff? I know that adding more features generally increases variance and reduces bias, but that is the general case. What about when the new features are directly correlated with the existing ones?