Criteria used to create and select leaf nodes in sklearn

Question

I would like to know the details of which criteria sklearn.tree.DecisionTreeClassifier uses to create leaf nodes, and how it applies them. I know that the parameters criterion {“gini”, “entropy”}, default=“gini” and splitter {“best”, “random”}, default=“best” are used to split nodes. However, I could not find more information about the threshold used for splitting.

Two families of methods are involved in the creation of leaf nodes: post-pruning (cutting back the tree after it has been built) and pre-pruning (preventing overfitting by stopping the tree-building process early). It would be very useful to know more about the splitting criteria, both to understand these models better and to be able to customize them further.

Related: https://datascience.stackexchange.com/a/77108/64377 — Erwan, Jul 29 '20 at 22:34

score 0 · Answer 1 · answered Aug 01 '20 at 03:35

Pre-pruning is handled by a variety of parameters: max_depth, min_samples_split, min_samples_leaf, min_weight_fraction_leaf, max_leaf_nodes, and min_impurity_decrease.
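To see these parameters in action, here is a minimal sketch on the Iris dataset (the dataset and the specific parameter values are arbitrary illustrative choices, not recommendations). It fits an unrestricted tree and a pre-pruned one, compares their size, and prints the feature/threshold pair stored at each internal node of the pruned tree, which is where the split thresholds end up. With splitter="best", each candidate threshold sits halfway between two adjacent distinct sorted values of a feature, and the split with the largest impurity decrease (measured by criterion) is kept. Note that setting max_leaf_nodes switches the tree to best-first growth.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unrestricted tree: keeps splitting until leaves are pure or cannot be split.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruned tree: each parameter below stops growth early in its own way.
pruned = DecisionTreeClassifier(
    max_depth=3,                   # no leaf deeper than 3 levels below the root
    min_samples_split=10,          # a node needs at least 10 samples to be split
    min_samples_leaf=5,            # a split is rejected if a child would get < 5 samples
    min_weight_fraction_leaf=0.0,  # default: no minimum share of total sample weight per leaf
    max_leaf_nodes=6,              # grow best-first and stop at 6 leaves
    min_impurity_decrease=0.01,    # split only if weighted impurity drops by >= 0.01
    random_state=0,
).fit(X, y)

print("full:  ", full.get_depth(), "levels,", full.get_n_leaves(), "leaves")
print("pruned:", pruned.get_depth(), "levels,", pruned.get_n_leaves(), "leaves")

# Inspect the learned splits: each internal node tests X[:, feature] <= threshold.
t = pruned.tree_
for node in range(t.node_count):
    if t.children_left[node] == -1:  # -1 marks a leaf (no children)
        continue
    print(f"node {node}: X[:, {t.feature[node]}] <= {t.threshold[node]:.3f}")
```

Because Iris measurements have one decimal place, the printed thresholds are midpoints such as 2.45 rather than values that occur in the data.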

Post-pruning is relatively new to sklearn (added in version 0.22) and is accomplished with minimal cost-complexity pruning, controlled by the parameter ccp_alpha.
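Minimal cost-complexity pruning picks the subtree T that minimizes R(T) + ccp_alpha · |leaves(T)|, where R(T) is the total sample-weighted impurity of the leaves, so a larger ccp_alpha prunes more. A minimal sketch of choosing ccp_alpha, assuming the common recipe of computing the pruning path, cross-validating one tree per candidate alpha, and refitting with the best one (the breast-cancer dataset, the train/test split and 5-fold CV are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Effective alphas at which successive subtrees get pruned away.
# The last alpha prunes the tree down to its root, so it is dropped.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas[:-1]

# Score each candidate alpha with 5-fold cross-validation on the training set only.
cv_scores = [
    cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                    X_train, y_train, cv=5).mean()
    for a in ccp_alphas
]
best_alpha = ccp_alphas[int(np.argmax(cv_scores))]

# Refit: the unpruned tree versus the tree post-pruned with the chosen alpha.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
final = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)

print(f"unpruned: {full.get_n_leaves()} leaves, test acc {full.score(X_test, y_test):.3f}")
print(f"ccp_alpha={best_alpha:.5f}: {final.get_n_leaves()} leaves, "
      f"test acc {final.score(X_test, y_test):.3f}")
```

Selecting alpha by cross-validation on the training data keeps the test set untouched for the final comparison.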

