Random Forest vs. RainForest

Question

I have studied the Random Forest and RainForest papers, but I find them a bit confusing. In summary, I understand these algorithms to consist of the following steps. Could you tell me whether I am right?

I appreciate your help.

In Random Forest:

Define the number of trees.
Draw bootstrap samples of the data (one per tree).
Construct a tree on each sample (at each node, a random subset of the features is selected as split candidates).
Label the leaf nodes.
To classify a new instance, take a vote over all trees.

In RainForest:

Partition the dataset.
Build the AVC-set of a partition.
Build a tree over the partition by computing a purity criterion (such as the Gini index) over the AVC-sets.

Welcome aboard. Your Random Forest steps look fine as a big picture. I didn't know about RainForest, thanks for that. – Aditya, May 16 '18 at 19:34

score 1 · Accepted Answer · answered May 22 '18 at 04:07

Random forest is an ensemble learning algorithm that uses decision trees as base learners. You listed its steps correctly.
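
To make the steps from the question concrete, here is a minimal pure-Python sketch, not a reference implementation. The function names (`build_tree`, `random_forest`, `predict_forest`), the `max_depth` limit, and the toy data are assumptions made for illustration. It assumes numeric features and class labels that are not tuples.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, n_features, rng, depth=0, max_depth=5):
    # Leaf: the node is pure or the depth limit is reached, so label it.
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)
    best = None
    # Only a random subset of the features is considered at this node.
    for f in rng.sample(range(len(rows[0])), n_features):
        for t in sorted(set(r[f] for r in rows)):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split among the sampled features
        return majority(labels)
    _, f, t, left, right = best
    grow = lambda idx: build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                  n_features, rng, depth + 1, max_depth)
    return (f, t, grow(left), grow(right))  # internal nodes are tuples

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def random_forest(rows, labels, n_trees=15, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_features, rng))
    return forest

def predict_forest(forest, row):
    # Majority vote over all trees.
    return majority([predict_tree(tree, row) for tree in forest])

# Toy data (made up): two well-separated clusters.
rows = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
        (5.0, 5.1), (5.2, 4.9), (5.1, 5.3), (4.8, 5.0)]
labels = ["a"] * 4 + ["b"] * 4
forest = random_forest(rows, labels)
```

Note that a bootstrap sample is drawn with replacement, so the samples overlap rather than forming a strict partition of the data.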

RainForest is not a learning algorithm in its own right. It is a method for constructing a decision tree (that is, for deciding how to split) when the dataset is too large to fit in memory. In RainForest, the whole dataset is not required to make a splitting decision. Only some aggregated information is required: the AVC-set of one attribute, or the AVC-group of all attributes at a node if you have more memory.
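
As a rough illustration of the idea (a sketch, not the code from the RainForest paper), an AVC-set for one attribute can be built in a single pass over the records, and a split can then be scored from the counts alone. The function names, the "value versus the rest" split form for a categorical attribute, and the toy records are assumptions made for this example.

```python
from collections import Counter, defaultdict

def build_avc_set(records, attr, label):
    """One pass over the records; only per-value class counts stay in memory."""
    avc = defaultdict(Counter)
    for rec in records:  # records may be any iterator, e.g. rows streamed from disk
        avc[rec[attr]][rec[label]] += 1
    return avc

def gini_from_counts(counts):
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def best_split(avc):
    """Score each 'attribute == value' vs. 'the rest' split using counts only."""
    total = Counter()
    for counts in avc.values():
        total.update(counts)
    n = sum(total.values())
    best = None
    for value, counts in avc.items():
        rest = total - counts  # class counts of all other values
        n_in, n_out = sum(counts.values()), sum(rest.values())
        if n_out == 0:
            continue
        score = (n_in * gini_from_counts(counts)
                 + n_out * gini_from_counts(rest)) / n
        if best is None or score < best[1]:
            best = (value, score)
    return best

# Toy records (made up); a generator stands in for a scan over a large file.
records = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "rain", "play": "yes"},
    {"outlook": "rain", "play": "no"},
]
avc = build_avc_set(iter(records), "outlook", "play")
value, score = best_split(avc)
```

The size of `avc` depends on the number of distinct attribute values and class labels, not on the number of records, which is why it can fit in memory when the dataset does not.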

If your dataset is large and memory is limited, you can use RainForest to build several different decision trees, and then use the random forest algorithm with those trees as base learners.

Vladislav Gladkikh


You mean the AVC-sets should be constructed according to the memory size, and then any purity criterion can be applied to build decision trees over them? – Fatemeh khodaparast May 23 '18 at 09:30
Yes. Simple algorithms for constructing decision trees load the whole dataset into memory to evaluate the possible split points. The authors of RainForest noticed that this is not necessary, because the utility of a predictor attribute as a possible splitting attribute does not depend on the other predictor attributes. Moreover, it is enough to know the distinct values of the attribute together with their class labels and their counts, which is the AVC-set. You need to load the AVC-set of only one attribute at a time and apply the purity criterion to it alone. – Vladislav Gladkikh May 23 '18 at 13:27
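
The "one attribute at a time" point can be sketched as follows for numeric attributes. This is an illustrative sketch under stated assumptions, not the paper's algorithm: `scan` is a hypothetical callable that returns a fresh iterator over the records (for example, by re-reading a file), splits have the form `value <= t`, and the toy rows are made up.

```python
from collections import Counter, defaultdict

def gini(counts):
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def best_threshold(avc):
    """Best 'value <= t' split for a numeric attribute, from its AVC-set only."""
    total = Counter()
    for counts in avc.values():
        total.update(counts)
    n = sum(total.values())
    left, best = Counter(), None
    for value in sorted(avc)[:-1]:  # the largest value would leave the right side empty
        left.update(avc[value])     # running class counts for 'value <= t'
        right = total - left
        n_left = sum(left.values())
        score = (n_left * gini(left) + (n - n_left) * gini(right)) / n
        if best is None or score < best[1]:
            best = (value, score)
    return best

def choose_split(scan, attributes, label):
    """Evaluate attributes independently, holding one AVC-set in memory at a time."""
    best = None
    for attr in attributes:
        avc = defaultdict(Counter)  # replaced on each iteration
        for rec in scan():          # one pass over the data per attribute
            avc[rec[attr]][rec[label]] += 1
        cand = best_threshold(avc)
        if cand is not None and (best is None or cand[1] < best[2]):
            best = (attr, cand[0], cand[1])
    return best

# Toy rows (made up); the lambda stands in for re-scanning data on disk.
rows = [
    {"age": 20, "income": 30, "buys": "no"},
    {"age": 25, "income": 60, "buys": "no"},
    {"age": 30, "income": 40, "buys": "no"},
    {"age": 40, "income": 50, "buys": "yes"},
    {"age": 45, "income": 35, "buys": "yes"},
    {"age": 50, "income": 55, "buys": "yes"},
]
split = choose_split(lambda: iter(rows), ["age", "income"], "buys")
```

This variant trades extra passes over the data for lower memory use. With more memory, the AVC-sets of several attributes (the AVC-group) could be filled in the same pass.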
