I am working on a Kaggle competition where users report that they are using more than 5000 features and training an XGBoost or Random Forest model on them.
The post I am referring to is here: https://www.kaggle.com/c/walmart-recruiting-trip-type-classification/forums/t/17258/feature-counts
I tried doing the same myself after exploding the feature space and creating interaction features, but it takes forever to run on my 16 GB MacBook. For example: training a Random Forest with n_estimators=1000 and max_depth=40 on data of shape (96000, 1000) with 3-fold CV took around 5 hours.
Tuning such a model would take weeks, and that is with only 1000 predictors...
Now here are some of my doubts: 1. Is my Python scikit-learn environment set up properly? I use the Anaconda distribution. Are these running times normal?
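For context, here is roughly how I would sanity-check whether scikit-learn is actually using all my cores. This is a scaled-down stand-in, not my real pipeline: the data shape and n_estimators below are deliberately small so it runs in seconds, and the random data is just a placeholder for my real feature matrix.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Small stand-in for the real (96000, 1000) matrix, so the check runs quickly.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 100))
y = rng.integers(0, 2, size=2000)

# n_jobs=-1 tells scikit-learn to fit trees in parallel on every available core;
# the default (n_jobs=None) fits them on a single core.
clf = RandomForestClassifier(
    n_estimators=100, max_depth=40, n_jobs=-1, random_state=0
)

start = time.perf_counter()
clf.fit(X, y)
elapsed = time.perf_counter() - start
print(f"fit of {len(clf.estimators_)} trees took {elapsed:.2f}s")
```

If the run with n_jobs=-1 is not noticeably faster than the single-core default, that would suggest something is wrong with how my environment schedules the parallel jobs.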