9

I need to quantify the importance of the features in my model. However, when I use XGBoost to do this, I get completely different results depending on whether I use the variable importance plot or the feature importances.

For example, when I compare model.feature_importances_ with xgb.plot_importance(model), I get values that do not align. Presumably the importance plot is built from the feature importances, but the values in the feature_importances_ numpy array do not directly correspond to the feature indexes returned by the plot_importance function.
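Roughly, the comparison I am making looks like this (toy data here as a stand-in; my actual model and data are omitted):

    import xgboost as xgb
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification

    # Stand-in data and model; the real model is not shown in this question.
    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    model = xgb.XGBClassifier(n_estimators=50).fit(X, y)

    print(model.feature_importances_)   # one normalized value per input feature
    xgb.plot_importance(model)          # bar chart labeled f0, f1, ... ranked by F score
    plt.show()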

Here is what the plot looks like:

[bar chart produced by xgb.plot_importance(model)]

But model.feature_importances_ gives entirely different values:

array([ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.00568182,  0.        ,  0.        ,  0.        ,
        0.13636364,  0.        ,  0.        ,  0.        ,  0.01136364,
        0.        ,  0.        ,  0.        ,  0.        ,  0.07386363,
        0.03409091,  0.        ,  0.00568182,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.00568182,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.00568182,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.01704546,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.05681818,  0.15909091,  0.0625    ,  0.        ,
        0.        ,  0.        ,  0.10227273,  0.        ,  0.07386363,
        0.01704546,  0.05113636,  0.00568182,  0.        ,  0.        ,
        0.02272727,  0.        ,  0.01136364,  0.        ,  0.        ,
        0.11363637,  0.        ,  0.01704546,  0.01136364,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ], dtype=float32)

If I just try to grab Feature 81 (model.feature_importances_[81]), I get 0.051136363. However, model.feature_importances_.argmax() returns 72.

So the values do not correspond to each other, and I am unsure what to make of this.

Does anyone know why these values are not concordant?

NLR
  • Welcome to the site! XGBoost produces multiple measures of feature "importance" (three, in fact). Check that the same type of feature importance is being output in both cases. – bradS Jul 10 '18 at 07:43
  • Good idea @bradS. I'll take a closer look. Any idea how to specify the type for model.feature_importances_? I know how to specify it with xgb.plot_importance(model), but it is not clear whether you can change it for the .feature_importances_ attribute. – NLR Jul 10 '18 at 19:48
  • This suggests using model.booster().get_score(importance_type='weight')... I'd wager changing the importance_type will solve your issue. – bradS Jul 11 '18 at 08:17

1 Answer

15

In xgboost 0.7.post3:

  1. XGBRegressor.feature_importances_ returns weights that sum up to one.

  2. XGBRegressor.get_booster().get_score(importance_type='weight') returns the number of times each feature occurs in a split. If you divide these counts by their sum, you get Item 1, except that features with zero importance are excluded here (see the sketch after this list).

  3. xgboost.plot_importance(XGBRegressor.get_booster()) plots the values of Item 2: the number of occurrences in splits.

  4. XGBRegressor.get_booster().get_fscore() is the same as XGBRegressor.get_booster().get_score(importance_type='weight').
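A minimal sketch of these relationships, assuming xgboost 0.7.post3 (where feature_importances_ is still weight-based) and a toy regressor standing in for any fitted XGBRegressor:

    import numpy as np
    import xgboost as xgb
    from sklearn.datasets import make_regression

    # Toy regressor as a stand-in for any fitted XGBRegressor.
    X, y = make_regression(n_samples=200, n_features=10, random_state=0)
    model = xgb.XGBRegressor(n_estimators=50).fit(X, y)

    # Items 2 and 4: split counts per feature; zero-importance features are absent.
    weights = model.get_booster().get_score(importance_type='weight')
    assert weights == model.get_booster().get_fscore()

    # Item 1 from Item 2: dividing each count by the total reproduces
    # feature_importances_ for every feature that appears in a split.
    total = sum(weights.values())
    for name, count in weights.items():
        idx = int(name[1:])  # keys look like 'f3' when trained on a numpy array
        assert np.isclose(count / total, model.feature_importances_[idx])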

Method get_score returns other importance scores as well. Check the argument importance_type.

In xgboost 0.81, XGBRegressor.feature_importances_ now returns gains by default, i.e., the equivalent of get_score(importance_type='gain'). See importance_type in XGBRegressor.
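For instance (a sketch assuming xgboost 0.81+), passing importance_type='weight' to the estimator restores the split-count behaviour of the attribute:

    import numpy as np
    import xgboost as xgb

    # In 0.81+, feature_importances_ follows the estimator's importance_type
    # parameter; the default 'gain' replaced the old weight-based values.
    X, y = np.random.rand(100, 5), np.random.rand(100)
    model = xgb.XGBRegressor(importance_type='weight').fit(X, y)
    print(model.feature_importances_)  # normalized split counts again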

So, for importance scores, it is better to stick to the function get_score with an explicit importance_type parameter.
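A minimal sketch of that explicit call, on toy data:

    import numpy as np
    import xgboost as xgb

    X, y = np.random.rand(100, 5), np.random.rand(100)
    booster = xgb.XGBRegressor(n_estimators=20).fit(X, y).get_booster()

    # Request each importance type explicitly so the result does not depend
    # on defaults that changed between xgboost versions.
    for imp_type in ('weight', 'gain', 'cover'):
        print(imp_type, booster.get_score(importance_type=imp_type))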

Also, check this question for the interpretation of the importance_type parameter: "weight", "gain", and "cover".

Anton Tarasenko