2

Is it possible to build a hard-coded decision tree on some variables and a random forest (or something similar) on the remaining ones?

The situation is that for some variables it is possible to make strong empirical assumptions, but for others their "relative importance" seems more random.

For example:

The researcher is certain that splitting on X1 > 5 and X2 < 3 gives the best information, since these are empirically sound splits, e.g. based on stakeholder views. Moreover, X1 and X2 are more important than X3, X4, X5, since X3, X4, X5 are redundant if X1 or X2 are not available.

Thus the model could essentially be based on X1 and X2 only, but X3, X4, X5 should add explanatory power. Yet their relative importances are not known. Applying a hard-coded decision tree to them might be prone to inaccuracies, whereas a random forest or something similar might offer a better reduction in overfitting, etc.

mavavilj
  • 416
  • 1
  • 3
  • 12
  • Definitely, yes; the question is how to do it in a pythonic way, or in whatever language you are coding. – Carlos Mougan Jul 12 '21 at 07:52
  • @CarlosMougan I don't find it definite, because how would one e.g. balance the metrics of the model if part of it is a decision tree and part is a RF? What would the total accuracy be, for example? – mavavilj Jul 12 '21 at 08:25
  • @mavavilj performance (accuracy or other) can be calculated for any predicting system, including a human making predictions for example. It is not related to how the predicting system is made. As I said in your other question you probably have some confusion about the distinction between a predicting system and evaluation. – Erwan Jul 13 '21 at 20:13
  • @Erwan I understand the technical difference, but I'm confused about the difference between accuracy gained by "expert opinion" and by "computational methods", as in this example: https://pubmed.ncbi.nlm.nih.gov/9555627/. So, if we have "strong expert opinion" on X1, X2 above, but not on X3, X4, X5, then this could motivate one to use different methods on them, while the global prediction should still be aggregated from all of them. Using e.g. a RF for everything could distort the expert opinion on X1, X2 in favor of a "numerical idea of truth". Isn't evaluation also pointless if the model is faulty? – mavavilj Jul 14 '21 at 04:56
  • @mavavilj this abstract doesn't give any detail about what the authors do. More importantly, selecting the features is like choosing a method to solve the problem, as you said. But evaluation is not about the method, it's about determining how well the method works. Evaluation can be applied to any system, whether it's good or bad, including a system which gives random predictions or always the same output. If the system is faulty, then evaluation should return a low performance score, that's it. So it must be possible to compare any systems by their performance. – Erwan Jul 14 '21 at 09:45
  • In case this comparison helps, evaluation works exactly like a drug trial: the goal is to objectively and precisely determine whether the new drug works or not. The evaluation methodology or the drug trial design has nothing to do with how the drug was prepared, its components, dosage etc.: this is what is being evaluated. – Erwan Jul 14 '21 at 09:48
  • The problem in this example with selecting features is that there's no definition of the end goal, i.e. what the tree is supposed to predict and how its output is evaluated. Without evaluation this is all just a subjective discussion; claiming that there is a new drug for disease X is pointless before running the drug trial to assess its efficacy. – Erwan Jul 14 '21 at 09:50

3 Answers

2

You could use stacking ensemble learning, where one of the "learners" is the expert-written decision tree. The meta-learner will then apply the relevant weight to the expert model such that accuracy is maximized.
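As an illustration, here is a minimal sketch assuming scikit-learn, a binary target, and that X1 and X2 are the first two columns; the ExpertTree class and the thresholds (X1 > 5, X2 < 3, from the question) are purely illustrative, not a prescribed implementation:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression


class ExpertTree(BaseEstimator, ClassifierMixin):
    """Hard-coded 'tree': predicts 1 when X1 > 5 and X2 < 3, else 0 (illustrative rule)."""

    def fit(self, X, y):
        # Nothing to learn -- the rules are fixed by the expert.
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        X = np.asarray(X)  # assumes X1 is column 0 and X2 is column 1
        return ((X[:, 0] > 5) & (X[:, 1] < 3)).astype(int)

    def predict_proba(self, X):
        p = self.predict(X)
        # Hard 0/1 "probabilities"; the meta-learner only needs a score per class.
        return np.column_stack([1 - p, p])


# The meta-learner (logistic regression here) learns how much weight to give
# the expert model versus the data-driven random forest.
stack = StackingClassifier(
    estimators=[
        ("expert", ExpertTree()),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
)
# stack.fit(X_train, y_train)
# stack.predict(X_test)
```

StackingClassifier fits the base models with cross-validation and trains the final estimator on their out-of-fold predictions, which is where the weighting of the expert tree against the random forest happens.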

Iyar Lin
  • 799
  • 4
  • 18
  • The main question is about knowing how to weight such trees. A RF usually treats them as equals. But an expert tree may intuitively carry more information, yet it's not complete. It's also possible that the expert tree is not as informative as a RF built from non-expert trees. So is an expert tree just another tree? – mavavilj Sep 01 '22 at 13:34
  • 1
    Like I wrote, the meta learner in a stacked ensemble learner sets those weights automatically such that error is minimized. See this medium post for more: https://link.medium.com/uJ49BaQ8Xsb – Iyar Lin Sep 01 '22 at 13:38
1

The important point here is the distinction between rule-based and data-driven:

  • A rule-based predicting system is an algorithm which calculates the target variable based on rules which have been predetermined and implemented by a human expert.
  • A data-driven predicting system is an algorithm which is first trained on some labelled training data in order to automatically determine the "rules".

Both types of systems can be used to predict the target variable for some new instance. Both types of systems can (and should) be evaluated using some labeled test set.

So in general yes, one can hard-code a decision tree. If it's entirely hard-coded then it's a rule-based system.
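For concreteness, here is a minimal sketch with fabricated data (the column positions of X1 and X2 are an assumption) contrasting the two: the rule-based system needs no training, the data-driven one does, yet both are evaluated the same way on a held-out test set:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: 5 features, with X1..X5 identified by position (assumption).
rng = np.random.default_rng(0)
X = rng.normal(loc=4.0, scale=2.0, size=(500, 5))
y = ((X[:, 0] > 5) & (X[:, 1] < 3)).astype(int)   # fabricated target, illustrative only
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def rule_based_predict(X):
    """Fully hard-coded 'tree': the rules were fixed by an expert, not learned."""
    return ((X[:, 0] > 5) & (X[:, 1] < 3)).astype(int)


# Data-driven counterpart: the splits are learned from the training data.
data_driven = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Identical evaluation for both systems, regardless of how they were built.
print("rule-based accuracy :", accuracy_score(y_test, rule_based_predict(X_test)))
print("data-driven accuracy:", accuracy_score(y_test, data_driven.predict(X_test)))
```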

In theory one can use a hybrid method to build a tree using some human-determined rules and the rest based on data. However, in this case there's no established algorithm which says how the two types of "rules" should be combined (to the best of my knowledge). For example, one could create the top of the tree manually and then let the learning algorithm determine the rest of the tree (see the sketch below). Or one could create multiple trees, some rule-based and some data-driven, then use an ensemble method to combine their predictions.

But it's important to realize that this kind of hybrid method could lead to inconsistencies: if the expert decides to give priority to a rule which is not supported (or doesn't have a high importance) in the data, then it's likely that the resulting model will perform poorly.
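For illustration, here is a minimal sketch of the first option (manual top of the tree, data-driven rest), assuming numeric features with X1 in the first column; the class name and the threshold are hypothetical:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class ManualRootTree:
    """Expert-fixed root split; a data-driven subtree is learned on each branch."""

    def __init__(self, split_col=0, threshold=5.0, max_depth=3):
        # split_col/threshold encode the hard-coded rule (here: X1 > 5, by assumption).
        self.split_col = split_col
        self.threshold = threshold
        self.max_depth = max_depth

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        mask = X[:, self.split_col] > self.threshold
        # Assumes both branches receive some training data.
        self.left_ = DecisionTreeClassifier(max_depth=self.max_depth).fit(X[~mask], y[~mask])
        self.right_ = DecisionTreeClassifier(max_depth=self.max_depth).fit(X[mask], y[mask])
        return self

    def predict(self, X):
        X = np.asarray(X)
        mask = X[:, self.split_col] > self.threshold
        pred = np.empty(len(X), dtype=int)
        if (~mask).any():
            pred[~mask] = self.left_.predict(X[~mask])
        if mask.any():
            pred[mask] = self.right_.predict(X[mask])
        return pred
```

The per-branch DecisionTreeClassifier could just as well be a RandomForestClassifier if overfitting on X3, X4, X5 is the main concern; the hard-coded root split stays fixed either way.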

Erwan
  • 25,321
  • 3
  • 14
  • 35
0

It is possible to create a hybrid system that combines human-selected and machine-learned rules in a decision tree.

Hybrid systems have fallen out of favor because they are more difficult to create, use, and maintain. Oftentimes, the human domain experts who are most capable of creating useful rules are not able to express those rules in a form the machine learning system can use.

Brian Spiering
  • 21,136
  • 2
  • 26
  • 109