Sklearn has a `feature_importances_` attribute, but it is highly model-specific and I'm not sure how to interpret it, since removing the most important feature does not necessarily decrease the model's quality the most.
Is there a model-agnostic way to tell which features are important for a prediction problem?
The only way I can see is the following:
- Use an ensemble of different models.
- Start with the full feature set and remove one feature at a time. To find a feature's "uplift", compare the ensemble's quality on the full feature set against its quality on the reduced feature set.
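To make the idea concrete, here is a minimal sketch of that drop-one-feature procedure (assuming scikit-learn; the dataset and model choices are just placeholders for illustration):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# Baseline quality with the full feature set.
baseline = cross_val_score(model, X, y, cv=5).mean()

# "Uplift" of each feature: drop it, retrain, compare to the baseline.
uplifts = []
for i in range(X.shape[1]):
    X_reduced = np.delete(X, i, axis=1)
    score = cross_val_score(model, X_reduced, y, cv=5).mean()
    uplifts.append(baseline - score)  # large positive => feature matters

for i, u in enumerate(uplifts):
    print(f"feature {i}: uplift {u:+.3f}")
```

Note this retrains the model once per feature, so the cost grows linearly with the number of features.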
(What this can't do is find connected features: some features might not be identical, but share a common underlying cause that is important for the prediction. Removing either one alone doesn't change much, but removing both might change a lot. I'll ask a separate question about that.)