I am working on a Kaggle competition where users report that they are using more than 5000 features and training an XGBoost or Random Forest model on them.
The post I am referring to is here: https://www.kaggle.com/c/walmart-recruiting-trip-type-classification/forums/t/17258/feature-counts
I tried doing the same myself after exploding the feature space and creating interaction features, but it takes forever to run on my 16 GB MacBook. For example: training a Random Forest with n_estimators=1000 and max_depth=40 on data of shape (96000, 1000) with 3-fold CV took around 5 hours.
Tuning such a model would take weeks, and that is with only 1000 predictors...
Now here are some of my doubts: 1. Is my Python scikit-learn environment set up properly? I use the Anaconda distribution. Are these running times normal?
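For context, here is roughly how I would sanity-check whether scikit-learn is actually using all my cores. This is a scaled-down stand-in, not my real pipeline: the data shape and n_estimators below are deliberately small so it runs in seconds, and the random data is just a placeholder for my real feature matrix.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Small stand-in for the real (96000, 1000) matrix, so the check runs quickly.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 100))
y = rng.integers(0, 2, size=2000)

# n_jobs=-1 tells scikit-learn to fit trees in parallel on every available core;
# the default (n_jobs=None) fits them on a single core.
clf = RandomForestClassifier(
    n_estimators=100, max_depth=40, n_jobs=-1, random_state=0
)

start = time.perf_counter()
clf.fit(X, y)
elapsed = time.perf_counter() - start
print(f"fit of {len(clf.estimators_)} trees took {elapsed:.2f}s")
```

If the run with n_jobs=-1 is not noticeably faster than the single-core default, that would suggest something is wrong with how my environment schedules the parallel jobs.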