2

Is there a standard term for a feature that always has the same value, i.e. that can be discarded without loss of information?

For example, I am trying to classify cats vs. dogs, and every example in my training set has has_two_eyes=true.

I am thinking something like "useless," "redundant," "constant," or "degenerate," but I don't know what the standard terminology is here.

Ethan
Imran

2 Answers

4

I think there are a few terms, but the one I have seen most often is "Zero Variance Predictor" or "Zero Variance Feature".
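To make the term concrete, here is a minimal sketch of how one might spot such features, assuming the data is a list of dicts. The function name `constant_features` and the toy rows are made up for illustration and are not from the question.

```python
def constant_features(rows):
    """Return the names of features that take a single value across all rows."""
    if not rows:
        return []
    names = rows[0].keys()
    # A feature has zero variance exactly when it has one distinct value.
    return [name for name in names if len({row[name] for row in rows}) == 1]


# Hypothetical toy data set (illustrative values only).
animals = [
    {"has_two_eyes": True, "weight_kg": 4.0, "barks": False},
    {"has_two_eyes": True, "weight_kg": 30.0, "barks": True},
    {"has_two_eyes": True, "weight_kg": 5.5, "barks": False},
]

zero_variance = constant_features(animals)
print(zero_variance)  # ['has_two_eyes']
```

For numeric arrays, scikit-learn's `VarianceThreshold` (with its default threshold of 0) should do a similar job, though I have not shown it here.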

Dan Scally
1

After some more research, I believe this is a type of "redundant" feature, in machine learning terminology. A redundant feature is one that can be shown to add no information by looking at the inputs only. The term is also used for a feature that has a correlation of ±1 with another feature: perfectly correlated and zero-variance features are both redundant in that you can see they add no information without looking at the targets.
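As a rough sketch of the "inputs only" idea, the check below flags pairs of numeric features whose Pearson correlation is ±1, without ever touching the targets. The helper names, the tolerance, and the toy columns are my own assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation, or None if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # correlation is undefined for a zero-variance column
    return cov / math.sqrt(vx * vy)


def perfectly_correlated_pairs(columns, tol=1e-9):
    """Return feature pairs with |r| within tol of 1 (tol is an arbitrary choice)."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if r is not None and abs(abs(r) - 1.0) <= tol:
                pairs.append((a, b))
    return pairs


# Hypothetical columns: height_in is just height_cm in different units.
heights_cm = [25.0, 60.0, 30.0, 55.0]
columns = {
    "height_cm": heights_cm,
    "height_in": [h / 2.54 for h in heights_cm],
    "weight_kg": [4.0, 30.0, 5.5, 22.0],
}

redundant_pairs = perfectly_correlated_pairs(columns)
print(redundant_pairs)
```

Here only the two height columns should be flagged, since one is a rescaled copy of the other; either one could be dropped.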

I'd probably call it a "redundant, constant-value feature" to be specific.

Redundant features are related to irrelevant features, which have no predictive power. However, it is necessary to consider the targets to determine if a feature is irrelevant. This can be done by calculating feature importance, for example. An example of an irrelevant feature is one that takes a random value independent of the targets: it will not be correlated with the targets beyond chance, and therefore can't help predict the target value of a previously unseen example.
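To illustrate why the targets are needed, here is a small sketch that scores each feature by its absolute correlation with the target. Correlation is only a crude stand-in for feature importance (it only sees linear association), and the feature names and values are invented for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; assumes neither column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical targets: 0 = cat, 1 = dog.
is_dog = [0, 0, 1, 1]

# Both features vary, so neither looks redundant from the inputs alone.
features = {
    "barks": [0, 0, 1, 1],         # tracks the target exactly
    "coat_is_dark": [0, 1, 0, 1],  # balanced across both classes
}

relevance = {name: abs(pearson(col, is_dog)) for name, col in features.items()}
print(relevance)
```

In this toy data `coat_is_dark` has plenty of variance, yet its correlation with the target is zero, so it can only be identified as irrelevant once the targets are in the picture.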

The Wikipedia article on feature selection does a pretty good job of explaining these concepts.

I believe "Zero Variance Predictor", as Dan Scally answered, is equally valid, just more common in statistics, so if that's your field then it might be the more appropriate term.

I find "near-zero variance predictor" to be a lot more descriptive in the case where the feature is close to but not exactly zero variance. "Zero variance predictor" seems like a convoluted way of saying the feature always takes the same value, so I'd prefer to call it a constant-value feature, but this is just my preference.
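For what it's worth, the constant vs. near-constant distinction could be sketched like this, using the share of the most common value. The 0.95 cutoff and the function names are arbitrary choices of mine, not a standard definition.

```python
from collections import Counter


def dominant_share(values):
    """Fraction of rows taken up by the most common value."""
    counts = Counter(values)
    return counts.most_common(1)[0][1] / len(values)


def classify_feature(values, threshold=0.95):
    """Label a feature as constant, near-constant, or varying.

    The threshold is a hypothetical cutoff chosen for illustration.
    """
    share = dominant_share(values)
    if share == 1.0:
        return "constant"
    if share >= threshold:
        return "near-constant"
    return "varying"


# Hypothetical columns with 100 rows each.
labels = {
    "has_two_eyes": classify_feature([True] * 100),
    "has_tail": classify_feature([True] * 98 + [False] * 2),
    "barks": classify_feature([True] * 50 + [False] * 50),
}
print(labels)
```

A near-constant feature is not strictly redundant, since the rare values may still carry information, which is part of why I think it deserves its own name.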

Imran
  • 2
    I think "redundant" is inappropriate, as I believe it is better suited to features that are similar to each other in your feature set. For instance, if you took feature x1 and made a copy of it, x2, then x2 (or x1) could be redundant. At least this fits more in line with the English definition of the word redundant – astel Sep 20 '19 at 19:24