Can I use macro recall to check if my RF model is overfitting?

Question

I have a dataset with 837,377 observations and 19 features, split into 51% for training, 25% for validation, and 24% for testing.

I calculated the macro-averaged recall on the train, validation, and test sets and obtained:

Train: 0.9981845060159042
Val: 0.7559011239753489
Test: 0.7325217067167821

Can I conclude that my multiclass, imbalanced Random Forest model is overfitting because recall_train > recall_val and recall_train > recall_test? Is recall the best metric to use in this case?
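A minimal sketch of what macro-averaged recall measures: recall is computed separately for each class and the per-class values are averaged with equal weight, so a rare class counts as much as a frequent one. The function name `macro_recall` and the toy labels are hypothetical. On real data, scikit-learn's `recall_score(y_true, y_pred, average="macro")` should give the same value as long as every predicted label also occurs in `y_true`; the same quantity is what `balanced_accuracy_score` reports.

```python
from collections import Counter


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    per_class = [hits[c] / support[c] for c in support]
    return sum(per_class) / len(per_class)


# Toy, heavily imbalanced example: the model always predicts the majority class 0.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                      # 0.9
print(macro_recall(y_true, y_pred))  # 0.5: recall is 1.0 for class 0 and 0.0 for class 1
```

Applying the same function to the train, validation, and test predictions separately gives the three numbers above; the train-minus-validation gap in the question is about 0.998 - 0.756 ≈ 0.24.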

How many classes does your dataset have? What is their distribution? — Eduard, Feb 08 '23 at 12:50
11 classes. For the training set: 0: 65295, 1: 870, 2: 469, 3: 1943, 4: 100725, ... — Just_4n0th3r_Pr0gr4mm3r, Feb 08 '23 at 14:40
BTW, I am also using IoU (Intersection over Union) for this analysis. Maybe this is a better metric in this case. — Just_4n0th3r_Pr0gr4mm3r, Feb 08 '23 at 17:30
I have no practical experience with IoU, but as I understand it, it is essentially the fraction $\frac{|A \cap B|}{|A \cup B|}$, where $A$ and $B$ are, for example, geometric objects (e.g., rectangles). Is this really what you want? — Eduard, Feb 09 '23 at 19:07
Yes, I saw that IoU can also be applied to my case. The equation is $\mathrm{IoU} = \frac{TP}{TP + FP + FN}$, where $TP$, $FP$ and $FN$ are the true positives, false positives and false negatives. — Just_4n0th3r_Pr0gr4mm3r, Feb 10 '23 at 12:06
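A sketch, assuming the labels are plain Python lists, showing that the two IoU formulas in the comments agree for classification. If $A$ is the set of sample indices predicted as class $c$ and $B$ is the set of indices whose true label is $c$, then $|A \cap B| = TP$, $|A \setminus B| = FP$ and $|B \setminus A| = FN$, so $\frac{|A \cap B|}{|A \cup B|} = \frac{TP}{TP + FP + FN}$. Per class, this is the Jaccard index; averaging it over classes with equal weight should correspond to scikit-learn's `jaccard_score(y_true, y_pred, average="macro")`. The function names and toy data are hypothetical.

```python
def iou_from_sets(y_true, y_pred, c):
    """IoU for class c as |A ∩ B| / |A ∪ B| over sample indices."""
    predicted = {i for i, p in enumerate(y_pred) if p == c}  # A
    actual = {i for i, t in enumerate(y_true) if t == c}     # B
    union = predicted | actual
    return len(predicted & actual) / len(union) if union else 0.0


def iou_from_counts(y_true, y_pred, c):
    """IoU for class c as TP / (TP + FP + FN)."""
    tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
    fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
    fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def macro_iou(y_true, y_pred):
    """Unweighted mean of per-class IoU over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(iou_from_counts(y_true, y_pred, c) for c in labels) / len(labels)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# The set-based and count-based definitions give the same value for every class.
for c in (0, 1, 2):
    assert iou_from_sets(y_true, y_pred, c) == iou_from_counts(y_true, y_pred, c)
print(macro_iou(y_true, y_pred))
```

Per class, IoU and F1 are linked by $\mathrm{IoU} = \frac{F_1}{2 - F_1}$, so they rank a single class's performance identically; after macro-averaging, the two scores are no longer an exact transform of each other.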

score 0 · Answer 1 · answered Feb 09 '23 at 19:10

I suggest using the micro- or macro-averaged F1 score for imbalanced problems like yours.

To understand the difference between micro- and macro-averaged metrics, read this great answer (and its follow-up comments).
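A minimal sketch of the difference, assuming single-label multiclass predictions stored as plain Python lists (the function name `f1_scores` is hypothetical). Micro-averaging pools TP, FP and FN over all classes before computing F1, so frequent classes dominate; for single-label multiclass data it equals accuracy. Macro-averaging computes F1 per class and takes the unweighted mean, so each rare class counts as much as the majority class. These should correspond to scikit-learn's `f1_score(y_true, y_pred, average="micro")` and `f1_score(y_true, y_pred, average="macro")`.

```python
def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    total_tp = total_fp = total_fn = 0
    per_class_f1 = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 for any label seen in the data.
        per_class_f1.append(2 * tp / (2 * tp + fp + fn))
    micro = 2 * total_tp / (2 * total_tp + total_fp + total_fn)
    macro = sum(per_class_f1) / len(per_class_f1)
    return micro, macro


# Imbalanced toy data: 8 samples of class 0, 2 of class 1; the model misses one class-1 sample.
y_true = [0] * 8 + [1, 1]
y_pred = [0] * 8 + [1, 0]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")  # micro F1 = 0.900, macro F1 = 0.804
```

With the 11 classes in the question, micro F1 would be dominated by the largest classes (0 and 4 among those listed), while macro F1 would expose weak performance on small classes such as 1 and 2.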

Eduard