
I am reading the definition of the Gini index for decision trees:

Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset. 

This seems to be the same as misclassification. Is Gini index just a fancy name for misclassification? Or is there really some subtle difference? Thanks!

Edamame
  • Misclassification is a general term used in the statistics/ML literature, while Gini impurity is a misclassification metric to train CART models. – Fadi Bakoura May 11 '18 at 17:56

2 Answers


Is Gini index just a fancy name for misclassification?

No.

Note that the definition of the Gini index doesn't involve predicted values; it depends only on the class probabilities in the set, which do not depend on any classifier.
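To make that concrete, here is the usual textbook form, written as a sketch. The symbols are my own notation, not from the question: $p_k$ is the proportion of class $k$ in the set and $K$ is the number of classes.

$$
G = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2
$$

For comparison, the error of a classifier that always predicts the most common class would be

$$
E = 1 - \max_{k} p_k
$$

Both are functions of the $p_k$ alone, but they are different functions.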

Also, in the context of decision trees, Gini impurity is computed for each region rather than as a single value such as the misclassification rate (technically you could also compute the misclassification rate per region, but then you'd also ).
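A minimal sketch of the per-region idea, assuming a hypothetical split into two regions (`left` and `right`) with made-up labels. The function name `gini_impurity` and the data are illustrative only, not taken from any library.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one region: 1 - sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical regions produced by a single split (illustrative data).
left = ["pos"] * 8 + ["neg"] * 2
right = ["pos"] * 1 + ["neg"] * 9

# Each region gets its own impurity value.
g_left = gini_impurity(left)
g_right = gini_impurity(right)

# A split is typically scored by the size-weighted average of the two.
n_total = len(left) + len(right)
weighted = (len(left) * g_left + len(right) * g_right) / n_total

print(g_left, g_right, weighted)
```

With these made-up labels the two regions come out at roughly 0.32 and 0.18, so there is one value per region, not one value for the whole tree.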

See this notebook for a concrete example.

Jakub Bartczuk

To compute a misclassification rate, you must first specify the classification method.

Gini impurity uses a random classifier whose labels follow the same distribution as the labels in the set. For example, if a set has 70 positive and 30 negative examples, each example is labeled at random: positive 70% of the time and negative 30% of the time. The misclassification rate for this classifier is:

= Pr[Positive] * Pr[Label is Negative] + Pr[Negative] * Pr[Label is Positive]

= 0.7 * 0.3 + 0.3 * 0.7 = 0.42

We can also compute the misclassification rate for a different classifier: the majority rule. In the above example, it would always predict positive. The misclassification rate is then:

= Pr[Positive] * Pr[Label is Negative] + Pr[Negative] * Pr[Label is Positive]

= 0.7 * 0 + 0.3 * 1 = 0.3
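The two calculations above can be checked with a short sketch. The function names `gini_impurity` and `majority_error` are hypothetical helpers written for this example, and the input is the 70/30 split from the text.

```python
def gini_impurity(proportions):
    """Expected error of labeling at random according to the class proportions."""
    return sum(p * (1.0 - p) for p in proportions)


def majority_error(proportions):
    """Error of always predicting the most common class."""
    return 1.0 - max(proportions)


# Class proportions from the example: 70% positive, 30% negative.
proportions = [0.7, 0.3]

gini = gini_impurity(proportions)
majority = majority_error(proportions)

print(f"random labeling (Gini): {gini:.2f}")
print(f"majority rule:          {majority:.2f}")
```

Up to floating-point rounding this gives 0.42 and 0.30, matching the hand calculation.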

We see that Gini impurity is one specific type of misclassification rate: the expected error of a classifier that labels at random according to the label distribution.

raghu