Sklearn has a `feature_importances_` attribute, but it is highly model-specific and I'm not sure how to interpret it, since removing the most important feature does not necessarily decrease the model's quality the most.
Is there a model-agnostic way to tell which features are important for a prediction problem?
The only way I can see is the following:
- Use an ensemble of different models.
- Start with the full feature set and remove one feature at a time. To find a feature's "uplift", compare the ensemble's quality on the full feature set against its quality on the reduced feature set.
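To make the idea concrete, here is a minimal sketch of that drop-one-feature procedure (assuming scikit-learn; the dataset and model choices are just placeholders for illustration):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# Baseline quality with the full feature set.
baseline = cross_val_score(model, X, y, cv=5).mean()

# "Uplift" of each feature: drop it, retrain, compare to the baseline.
uplifts = []
for i in range(X.shape[1]):
    X_reduced = np.delete(X, i, axis=1)
    score = cross_val_score(model, X_reduced, y, cv=5).mean()
    uplifts.append(baseline - score)  # large positive => feature matters

for i, u in enumerate(uplifts):
    print(f"feature {i}: uplift {u:+.3f}")
```

Note this retrains the model once per feature, so the cost grows linearly with the number of features.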
(What this can't do is find connected features: some features might not be identical, but share a common underlying cause that is important for the prediction. Removing either one alone doesn't change much, but removing both might change a lot. I'll ask a separate question about that.)