How to Validate Decision Tree model by using statistical tests?

Question

I'm reading the scikit-learn Decision Trees reference page.

In the advantages section, it is mentioned that 'Possible to validate a model using statistical tests. That makes it possible to account for the reliability of the model.'

Can someone please explain what statistical tests are performed to validate the DT model?

score 1 · Answer 1 · answered May 02 '20 at 21:21


Assume for a moment that you are using the decision tree for binary classification. One of the first tests I learned (and still a damn good one) is the 2x2 contingency table, also known as a frequency table, marginal frequencies, or a two-way table (the many names mean it has been around a while). It is simple to use and branches off into many other tests and measures, such as:

Phi Coefficient of Association
Chi-Square Test of Association
Fisher Exact Probability Test
Accuracy and precision
F-score or F-measure, ...(to name a few)

I usually like Khan Academy way more than Wikipedia (I hate their explanations).
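As a rough illustration of the tests listed in this answer, here is a minimal sketch that builds the 2x2 table from a binary classifier's predictions and derives the phi coefficient, a chi-square test, and Fisher's exact test from it. It uses only the standard library, and the labels in `y_true` and `y_pred` are made-up values, not output from a real tree. The function names are my own. In practice `scipy.stats.chi2_contingency` and `scipy.stats.fisher_exact` offer the same tests.

```python
import math

def contingency_2x2(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def phi_coefficient(tp, fp, fn, tn):
    """Phi coefficient of association between predicted and actual class."""
    denom = math.sqrt((tp + fp) * (fn + tn) * (tp + fn) * (fp + tn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def chi_square_test(tp, fp, fn, tn):
    """Pearson chi-square (no continuity correction) and its p-value, 1 dof."""
    n = tp + fp + fn + tn
    chi2 = n * phi_coefficient(tp, fp, fn, tn) ** 2
    # For 1 degree of freedom the survival function is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def fisher_exact_two_sided(tp, fp, fn, tn):
    """Two-sided Fisher exact p-value: sum of all tables no more likely than the observed one."""
    row1, row2, col1 = tp + fp, fn + tn, tp + fn
    total = math.comb(row1 + row2, col1)

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    p_obs = prob(tp)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Made-up labels standing in for a held-out test set and the tree's predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

table = contingency_2x2(y_true, y_pred)
tp, fp, fn, tn = table
print("accuracy:", (tp + tn) / len(y_true))
print("precision:", tp / (tp + fp))
print("phi:", phi_coefficient(*table))
print("chi-square, p-value:", chi_square_test(*table))
print("Fisher exact p-value:", fisher_exact_two_sided(*table))
```

A small p-value here only says the predictions are associated with the true classes more strongly than chance would explain. With counts this small, Fisher's exact test is usually the safer choice over chi-square.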


mccurcio


These are great alternatives even though some of them will not provide p values. – Venkatesh Gandi May 03 '20 at 04:29

I am afraid you are mistaken: all of the above either are statistical tests or have associated statistical tests. – mccurcio May 03 '20 at 18:17
OK, that is great. Can you share some links so I can learn more? – Venkatesh Gandi May 03 '20 at 18:39

score 0 · Answer 2 · answered May 02 '20 at 18:27


I found the answer to the question. As I understand the statement, validating the DT model means that the splitting criterion in the tree is decided by a statistical test instead of the Gini index or entropy/information gain. For more information, one can refer to this.

I also found another perspective on DT splits.
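To make the idea of a test-based splitting criterion concrete, here is a minimal sketch, assuming binary features and a binary target. It picks the feature whose chi-square test of independence against the target gives the smallest p-value and refuses to split when no p-value falls below a threshold. This is in the spirit of chi-square-based tree methods such as CHAID. As far as I know it is not how scikit-learn's `DecisionTreeClassifier` chooses splits. The function names, the `alpha` threshold, and the toy data are all hypothetical.

```python
import math

def chi_square_p_value(feature, labels):
    """p-value of a Pearson chi-square test (1 dof) between a 0/1 feature and 0/1 labels."""
    pairs = list(zip(feature, labels))
    a = sum(1 for f, y in pairs if f == 1 and y == 1)
    b = sum(1 for f, y in pairs if f == 1 and y == 0)
    c = sum(1 for f, y in pairs if f == 0 and y == 1)
    d = sum(1 for f, y in pairs if f == 0 and y == 0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A constant feature or constant label carries no evidence of association.
        return 1.0
    chi2 = len(pairs) * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))

def best_split(features, labels, alpha=0.05):
    """Return (feature_name, p_value) for the most significant split, or None if none passes alpha."""
    p_values = {name: chi_square_p_value(col, labels) for name, col in features.items()}
    name = min(p_values, key=p_values.get)
    return (name, p_values[name]) if p_values[name] < alpha else None

# Toy data: one feature that tracks the label closely and one that does not.
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
features = {
    "informative": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1],
    "noise":       [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}
print(best_split(features, labels))
```

The significance threshold doubles as a stopping rule: a node where no feature passes the test is left as a leaf, so the tree stops growing on statistical grounds rather than being pruned afterwards.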


Venkatesh Gandi


I looked over the reference above and, IMHO, although it is correct, it does not promote a broader understanding of the issues and alternatives. My 2 cents. – mccurcio May 02 '20 at 21:26
