Questions tagged [feature-selection]

Methods and principles of selecting a subset of attributes for use in further modelling

Feature selection, also called attribute selection or feature reduction, refers to techniques for identifying a subset of features of a data set that are relevant to a given problem. By removing irrelevant and redundant features, successful feature selection can avoid the curse of dimensionality and improve the performance, speed, and interpretability of subsequent models.

Feature selection includes manual methods (such as those based on domain knowledge) and automatic methods. Automatic methods are often categorized into filter, wrapper, and embedded approaches.

Filter approaches perform feature selection as a separate preprocessing step before the learning algorithm, and thus look only at intrinsic properties of the data. Examples of filter methods are Wilcoxon rank-sum tests and correlation-based tests.
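For illustration, a minimal sketch of a filter step using scipy and scikit-learn (the synthetic data set and the significance threshold are illustrative assumptions, not a prescription):

import numpy as np
from scipy.stats import ranksums
from sklearn.datasets import make_classification

# Synthetic binary-classification data; in practice this would be your own X, y.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# Score each feature independently of any learner: a Wilcoxon rank-sum test
# comparing the feature's values in class 0 against class 1.
p_values = np.array([ranksums(X[y == 0, j], X[y == 1, j]).pvalue
                     for j in range(X.shape[1])])

# Keep features whose class-conditional distributions differ significantly.
selected = np.where(p_values < 0.01)[0]
print("Selected feature indices:", selected)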

Wrapper approaches use the performance of a learning algorithm to select features. A search algorithm is "wrapped" around the learning algorithm so that the space of feature subsets is adequately explored. As such, wrapper methods can be seen as conducting the model hypothesis search within the feature subset search. Examples of wrapper approaches are simulated annealing and beam search.
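A minimal wrapper sketch, here using greedy forward selection (scikit-learn's SequentialFeatureSelector) rather than simulated annealing or beam search; the learner, subset size, and data set are arbitrary choices for illustration:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# The search is "wrapped" around the learner: each candidate subset is scored
# by the cross-validated performance of the fitted model.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=5,   # could instead be tuned from the data
    direction="forward",
    cv=5,
)
selector.fit(X, y)
print("Selected feature mask:", selector.get_support())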

Embedded approaches incorporate variable selection as part of the training process, with feature relevance obtained analytically from the objective of the learning model. Embedded methods can be seen as a search in the combined space of feature subsets and hypotheses. Examples of embedded approaches are boosting and regularised regression (e.g., the LASSO).
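A minimal embedded sketch: an L1-penalised model zeroes out coefficients during training, so the fitted model itself yields the selection (the penalty strength alpha and the synthetic data are illustrative assumptions):

from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

# Selection happens inside training: features whose coefficients are driven
# to zero by the L1 penalty are discarded.
lasso = Lasso(alpha=0.5).fit(X, y)
selector = SelectFromModel(lasso, prefit=True)
print("Selected feature mask:", selector.get_support())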

974 questions
8
votes
3 answers

Feature selection for tracking user activity within an application

I am developing a system that is intended to capture the "context" of user activity within an application; it is a framework that web applications can use to tag user activity based on requests made to the system. It is hoped that this data can…
7
votes
2 answers

Is there a model-agnostic way to determine feature importance?

Sklearn has a feature_importances_ attribute, but this is highly model-specific and I'm not sure how to interpret it, as removing the most important feature does not necessarily decrease the model's quality the most. Is there a model-agnostic way to tell…
Martin Thoma
  • 18,880
  • 35
  • 95
  • 169
3
votes
1 answer

Boruta Python No feature Selected

I ran Boruta with RandomForestClassifier on my data (nb features = 36) the previous day and got 17/36 confirmed. Now I run it again and it confirms 0/36 and stops at the 9th iteration. Any idea why this is happening? %%time rfc =…
irkinosor
  • 233
  • 1
  • 7
3
votes
2 answers

Feature Importance - How to choose the number of best features?

What is the standard method, or what method do you use, to select a subset of the features? For example: using random forest, I got the following feature importances: a : 25.4884726 b : 17.2736393 c : 12.3493490 d : 8.9383737 e : 8.1083837 f : 6.8272717 g :…
user1787687
  • 31
  • 1
  • 4
2
votes
2 answers

Can I necessarily expect higher accuracy when feature selection/dimension reduction is used to select a subset of features?

Feature selection/dimension reduction is performed to eliminate irrelevant or redundant features, which improves computational efficiency (less computationally expensive). My question is: can we expect any changes in the accuracy of…
Zeynab
  • 21
  • 1
2
votes
2 answers

What do you call a feature that always has the same value?

Is there a standard term for a feature that always has the same value, i.e. that can be discarded without loss of information? For example I am trying to classify cats vs dogs, and every example in my training set has has_two_eyes=true. I am…
Imran
  • 2,381
  • 12
  • 22
2
votes
1 answer

How does SHAP values help us to determine importance of a feature for a model trained by gradient boost?

I've read http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf and https://medium.com/@gabrieltseng/interpreting-complex-models-with-shap-values-1c187db6ec83 which is like a summary of the first link. In general…
J.Smith
  • 458
  • 3
  • 16
2
votes
2 answers

Best way to determine the number of features for RFE (recursive feature elimination)

I am applying the feature selection method RFE (recursive feature elimination) from scikit-learn to a dataset. I do not have any pre-determined number of features for RFE and would rather get the number from the data itself. So far, I applied a range of…
TTZ
  • 133
  • 1
  • 5
2
votes
0 answers

Am I correct in finding correlations

I want to perform feature selection, having 128 real-valued standardized features and 1/0 labels. Below are density histograms of feature a5 for Classes 1 and 0. The data is skewed, so that Class 1 has about 5% weight. Next I subtract the right curve from…
noname7619
  • 323
  • 2
  • 9
2
votes
3 answers

Can we select features by examining scatterplots?

Suppose I have a data set with eight features in my hand. I want to find features to predict diamonds, hearts, clubs, and spades. ------------------------------------------------------------------------------ | f1 | f2 | f3 | f4 |…
user9232
2
votes
1 answer

Ongoing feature selection

If you have a set of n features, you have 2^n - 1 non-empty feature subsets. As a result, if you pick one of them, you are unlikely to have found the best one. To me, it seems intuitive that as you build your model, you would want to look at the things…
Abijah
  • 181
  • 7
1
vote
2 answers

Should I drop simple features after deriving more complex features from them?

I know for a fact that complex features project the data into higher dimensions, which makes previously non-separable data linearly separable. But is it not true that these complex features will be highly correlated with the features from…
1
vote
0 answers

Using Self Organising Maps For Feature Selection

I have a time series of indicators for a single stock. Some of the features come in repeated sets, e.g. simple moving average 5D, 10D, 20D, 100D. I only want to use one feature from each set. Is there a way for SOMs to be used to select the most…
Kern
  • 11
  • 1
1
vote
1 answer

Creating a composite score from dataset with no target variable

I have a dataset that includes 6 variables about prospective sales opportunities (probability of closing, days until expected close, age of opportunity, etc.). 2 of the columns are categorical and 4 are continuous. I am looking to create a composite…
1
vote
1 answer

How to use the $\chi^2$ test to select features that can be string or categorical?

I want statistics to select the characteristics that have the greatest relationship to the output variable. Thanks to this article, I learned that the scikit-learn library proposes the SelectKBest class that can be used with a set of different…