I'm reading the sklearn Decision Trees reference page.

In the advantages section, it is mentioned that 'Possible to validate a model using statistical tests. That makes it possible to account for the reliability of the model.'

Can someone please explain what statistical tests are performed to validate the DT model?

Venkatesh Gandi

2 Answers

I am assuming, for the moment, that you are using the DT for binary classification. One of the first tests I learned (and still a damn good one) is the 2x2 contingency table, also called a frequency table, marginal frequencies, or a 2-way table (many names means it's been around a while). It is simple to use and branches off into many other tests and measures, such as:

  • Phi Coefficient of Association
  • Chi-Square Test of Association
  • Fisher Exact Probability Test
  • Accuracy and precision
  • F-score or F-measure, ...(to name a few)
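As a rough sketch of how the items above fall out of one 2x2 table, here is a small Python example. The labels and predictions are made-up data standing in for a test set and a fitted tree's output, and the variable names are only illustrative. It assumes `numpy` and `scipy` are installed.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical test-set labels and decision-tree predictions (made-up data).
y_true = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0])

# Cells of the 2x2 contingency table.
tn = int(np.sum((y_true == 0) & (y_pred == 0)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))
tp = int(np.sum((y_true == 1) & (y_pred == 1)))

# Rows = actual class (0, 1), columns = predicted class (0, 1).
table = np.array([[tn, fp], [fn, tp]])

# Chi-square test of association (no continuity correction, so phi^2 = chi2 / n).
chi2, p_chi2, dof, expected = chi2_contingency(table, correction=False)

# Fisher exact test, usually preferred when cell counts are small.
odds_ratio, p_fisher = fisher_exact(table)

# Phi coefficient of association for a 2x2 table.
phi = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Descriptive scores read off the same table.
accuracy = (tp + tn) / table.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(table)
print(f"chi2={chi2:.3f} p={p_chi2:.4f}  fisher p={p_fisher:.4f}  phi={phi:.3f}")
print(f"accuracy={accuracy:.2f} precision={precision:.3f} recall={recall:.2f} f1={f1:.3f}")
```

A small p-value here only says the predictions are associated with the true labels more strongly than chance would suggest; accuracy, precision, and F-score are descriptive measures rather than tests.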

I usually like Khan Academy way more than Wikipedia (I hate their explanations).

mccurcio

I found an answer to the question. As I understand the statement, validating the DT model with statistical tests means that the splitting criterion in the DT is decided by a statistical test instead of the Gini index or entropy/information gain. For more information, one can refer to this.

I also found another perspective on DT splits.
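To make the idea of a test-based split concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not what sklearn's `DecisionTreeClassifier` does (that one uses impurity criteria such as Gini or entropy). The function name `best_split_by_chi2` and the toy data are hypothetical. For one numeric feature, it tries each midpoint threshold, builds the left/right-by-class contingency table, and keeps the threshold with the smallest chi-square p-value. No multiple-testing correction is applied, which a real method would need to think about.

```python
import numpy as np
from scipy.stats import chi2_contingency


def best_split_by_chi2(x, y):
    """Return (threshold, p_value) for the split `x <= threshold` whose
    left/right-by-class table has the smallest chi-square p-value.

    Hypothetical helper for illustration only; returns (None, 1.0) when
    no candidate split shows any association with the class.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    values = np.unique(x)
    # Candidate thresholds: midpoints between consecutive distinct values,
    # so both sides of every split are non-empty.
    thresholds = (values[:-1] + values[1:]) / 2.0

    best_threshold, best_p = None, 1.0
    for t in thresholds:
        left = x <= t
        # Rows = side of the split, columns = class counts.
        table = np.array([
            [np.sum(left & (y == c)) for c in classes],
            [np.sum(~left & (y == c)) for c in classes],
        ])
        _, p, _, _ = chi2_contingency(table, correction=False)
        if p < best_p:
            best_threshold, best_p = float(t), float(p)
    return best_threshold, best_p


# Toy data (made up): the class flips between x = 4 and x = 5.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, p_value = best_split_by_chi2(x, y)
print(threshold, p_value)
```

With a criterion like this, a node can also simply stop splitting when the best p-value is above a chosen significance level, which is where the "reliability" reading of the quoted statement comes from.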

Venkatesh Gandi
  • I looked over the reference above and, IMHO, although it is correct, it does not promote a broader understanding of the issues and alternatives. My 2 cents. – mccurcio May 02 '20 at 21:26