I have a dataset with 837,377 observations (51% for training, 25% for validation, and 24% for testing) and 19 features.
I calculated the macro-averaged recall score for the training, validation, and test sets and obtained:
Train: 0.9981845060159042 Val: 0.7559011239753489 Test: 0.7325217067167821
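For context, here is a minimal sketch of how scores like these could be computed. It assumes scikit-learn and uses a small synthetic dataset as a stand-in for the real one; the class weights, sample count, and hyperparameters are illustrative assumptions, not my actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): imbalanced 3-class problem with 19 features.
X, y = make_classification(
    n_samples=5000, n_features=19, n_informative=8, n_classes=3,
    weights=[0.80, 0.15, 0.05], random_state=0,
)

# Roughly 51% train, 25% validation, 24% test, stratified by class.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.51, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, train_size=25 / 49, stratify=y_rest, random_state=0
)

# Default hyperparameters grow the trees fully (assumption about the setup).
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Macro-averaged recall: unweighted mean of the per-class recalls.
splits = {
    "train": (X_train, y_train),
    "val": (X_val, y_val),
    "test": (X_test, y_test),
}
scores = {
    name: recall_score(y_split, model.predict(X_split), average="macro")
    for name, (X_split, y_split) in splits.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

Passing `average=None` to `recall_score` instead returns one recall value per class, which shows which classes account for the gap between the splits.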
Can I conclude that my multiclass Random Forest model, trained on imbalanced data, is overfitting because recall_train > recall_val and recall_train > recall_test? Is recall the best metric to use in this case?