
I am applying recursive feature elimination (RFE), a feature selection method from scikit-learn, to a dataset. I do not have a pre-determined number of features for RFE and would rather derive that number from the data itself.

So far, I have tried a range of feature counts, 1 to 10, on the training data. For evaluation, I use the F1 score of predictions made with the features RFE selects. In the end, I plan to use the number of features that yields the best F1.

What other methods may be used to determine the number of features for RFE? Thanks!
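The loop described in the question might be sketched like this; the dataset, estimator, and train/test split are placeholders I chose for illustration, not details from the post:

```python
# Hypothetical sketch: try RFE with 1..10 features and keep the
# count that gives the best F1 on a held-out set.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Placeholder data; substitute your own X, y.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for n in range(1, 11):
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=n)
    rfe.fit(X_train, y_train)
    # RFE exposes predict() via the fitted underlying estimator.
    scores[n] = f1_score(y_test, rfe.predict(X_test))

best_n = max(scores, key=scores.get)
print(best_n, scores[best_n])
```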

Brian Spiering
TTZ

2 Answers


I encourage you to look at RFECV (https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html#sklearn.feature_selection.RFECV). It recursively eliminates features and uses cross-validation to pick the number of features automatically, with the scoring method of your choice, including F1.
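A minimal sketch of RFECV with F1 scoring; the synthetic dataset and logistic regression estimator are my placeholders:

```python
# RFECV cross-validates over feature counts and reports the best one.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Placeholder data; substitute your own X, y.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)

selector = RFECV(
    LogisticRegression(max_iter=1000),
    step=1,        # drop one feature per elimination round
    cv=5,          # 5-fold cross-validation
    scoring="f1",  # the metric the question already uses
)
selector.fit(X, y)

# Number of features chosen by cross-validation, and the kept mask.
print(selector.n_features_)
print(selector.support_)
```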

Hood LGV

Generally, feature selection methods fall into three categories:

  • Filter Methods
  • Wrapper Methods
  • Embedded Methods
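For concreteness, here is one scikit-learn example of each category; the specific selectors and estimators are my choices, not prescribed by the answer:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Placeholder data; substitute your own X, y.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Filter: score each feature independently of any model (here, ANOVA F-test).
filt = SelectKBest(f_classif, k=5).fit(X, y)

# Wrapper: search feature subsets by repeatedly fitting a model (here, RFE).
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)

# Embedded: selection falls out of model training (here, L1-regularized
# coefficients; features with zero coefficients are dropped).
emb = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear")
).fit(X, y)

print(filt.get_support().sum(), wrap.get_support().sum(), emb.get_support().sum())
```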

I believe Feature Selection is totally overrated. But what do I know?

There's an amazing Feature Selection course on Udemy by Dr. Soledad Galli:

https://www.udemy.com/feature-selection-for-machine-learning/learn/v4/content

FrancoSwiss
  • Why do you feel feature selection is overrated? An explanation might help – HFulcher Feb 22 '19 at 20:29
  • (a) if you have 10 rows for each column, you should be fine (b) RFE sounds intellectually sound, but I've not seen a difference in accuracy of more than 0.10% – FrancoSwiss Feb 22 '19 at 20:32