Questions tagged [decision-trees]

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm.

752 questions
15 votes · 1 answer

Can gradient boosted trees fit any function?

For neural networks we have the universal approximation theorem, which states that neural networks can approximate any continuous function on a compact subset of $\mathbb{R}^n$. Is there a similar result for gradient boosted trees? It seems reasonable since…
Imran
11 votes · 1 answer

Decision tree: how to understand or calculate the probability/confidence of a prediction result

For example, a drug-prediction problem using a decision tree: I trained the decision tree model and would like to predict on new data, e.g. patient, Attr1, Attr2, Attr3, .., Label 002 90.0 8.0 98.0 ... ? ===> predict drug…
GoingMyWay
10 votes · 2 answers

Multicollinearity in Decision Tree

Can anybody please explain the effect of multicollinearity on decision tree algorithms (classification and regression)? I have done some searching but was not able to find a clear answer, as some say it affects them and others say it doesn't.
deepguy
8 votes · 1 answer

How to (better) discretize continuous data in decision trees?

Standard decision tree algorithms, such as ID3 and C4.5, take a brute-force approach to choosing the cut point in a continuous feature: every single value is tested as a possible cut point. (By tested I mean that, e.g., the information gain is…
AutoMiner
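The brute-force cut-point search this excerpt describes is easy to sketch. The snippet below is a minimal illustration, not ID3's or C4.5's actual implementation, and the function names are my own: take the midpoints between consecutive distinct feature values as candidate thresholds, score each by information gain, and keep the best.

```python
import numpy as np

def entropy(y):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_cut_point(x, y):
    """Brute-force search: test every candidate threshold and keep
    the one with the highest information gain."""
    best_gain, best_t = -1.0, None
    values = np.sort(np.unique(x))
    # candidate thresholds: midpoints between consecutive distinct values
    for t in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= t], y[x > t]
        gain = entropy(y) - (len(left) * entropy(left)
                             + len(right) * entropy(right)) / len(y)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain
```

The cost is linear in the number of distinct values per feature, which is exactly why the question asks for something better than testing every value.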
5 votes · 4 answers

Are Decision Trees Robust to Outliers?

I read that decision trees (I am using scikit-learn's classifier) are robust to outliers. Does that mean that there will be no side effects if I choose not to remove my outliers?
Jishan
4 votes · 1 answer

Numeric variables in Decision trees

If we have a numeric variable, decision trees will use < and > comparisons as splitting criteria. Let's consider this case: if our target variable is 1 for even numeric values and 0 for odd numeric values, how do we deal with this type of variable? How…
Venkatesh Gandi
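The even/odd target in this question is a worst case for axis-parallel splits: any threshold on the raw value leaves both parities on each side. A common workaround is to engineer the parity explicitly as a feature. A small sketch with scikit-learn, where the `x % 2` feature and the variable names are my own:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.arange(100).reshape(-1, 1)      # raw integer feature
y = (X.ravel() % 2 == 0).astype(int)   # target: 1 if even, 0 if odd

# Raw feature only: thresholds carve contiguous intervals, and every
# interval of consecutive integers mixes both parities, so a shallow
# tree cannot separate the classes.
raw = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# With x % 2 added as a feature, a single split is sufficient.
X2 = np.hstack([X, X % 2])
eng = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X2, y)
```

Without the engineered feature, a tree can only memorize parity by growing one leaf per value, which does not generalize to unseen numbers.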
4 votes · 2 answers

Ordinal feature in decision tree

I am curious whether ordinal features are treated differently from categorical features in decision trees; I am interested in both cases, where the target is categorical or continuous. If there is a difference, could anybody point me to a good source with…
user1700890
3 votes · 1 answer

What is the hypothesis space of decision tree learning?

Could you please explain what the hypothesis space for decision tree learning looks like? And what is the cardinality of this space?
Said Savci
3 votes · 2 answers

How to validate a Decision Tree model using *statistical tests*?

I'm reading the sklearn Decision Trees reference page. In the advantages section, it is mentioned that it is 'Possible to validate a model using statistical tests. That makes it possible to account for the reliability of the model.' Can someone please…
Venkatesh Gandi
3 votes · 1 answer

Gini Index in Regression Decision Tree

I want to implement my own version of the CART Decision Tree from scratch (to learn how it works), but I have some trouble with the Gini Index, used to express the purity of a dataset. More precisely, I don't understand how the Gini Index is supposed to…
Nakeuh
2 votes · 3 answers

Is it possible to do hard-coded decision tree on some variables and random forest / something on the remaining ones?

Is it possible to use a hard-coded decision tree on some variables and a random forest (or something else) on the remaining ones? The situation seems to be that for some variables it is possible to draw strong empirical assumptions, but for others their "relative…
mavavilj
2 votes · 0 answers

How to implement an oblique decision tree for regression?

There are numerous ways to induce an oblique decision tree in the decision tree induction domain, such as using a support vector machine to determine the best hyperplane. However, is it possible to generate an oblique decision tree for regression?…
HZ-VUW
2 votes · 2 answers

Decision Tree Induction using Information Gain and Entropy

I’m trying to build a decision tree algorithm, but I think I misinterpreted how information gain works. Let’s say we have a balanced classification problem. So, the initial entropy should equal 1. Let’s define information gain as follows:…
Krushe
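The quantities this question manipulates can be computed directly. A minimal sketch (the function names are my own): with a balanced two-class set the initial entropy is indeed 1 bit, and information gain is the parent's entropy minus the weighted entropy of the children.

```python
import numpy as np

def entropy(y):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, mask):
    """Entropy reduction from splitting y into y[mask] and y[~mask]."""
    n = len(y)
    left, right = y[mask], y[~mask]
    return entropy(y) - (len(left) * entropy(left)
                         + len(right) * entropy(right)) / n
```

A perfect split of a balanced binary problem therefore has information gain 1, the maximum possible for that starting entropy.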
2 votes · 0 answers

How to come up with the splitting point in a decision tree?

I read https://www.researchgate.net/post/How_to_compute_impurity_using_Gini_Index. I understand why we choose the smallest Gini index, but how do I come up with the different candidate splits in the first place? How does R come up with the splits? Take the…
Leo Jiang
2 votes · 2 answers

Difference between impurity and misclassification

I am reading the Gini index definition for decision trees: Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset.…
Edamame