I am training multi-class classifiers on imbalanced data, so I need to resample the data before training (undersampling or oversampling). When I apply undersampling, loss and val_loss, as well as acc and val_acc, show a good fit. In this case, is it still necessary to oversample the data? What results should I expect?
Almost never: it turns out that class imbalance is not a problem when proper evaluation methods are used. – Dave Sep 07 '21 at 19:18
Thank you for the link @Dave – Kyv Sep 07 '21 at 20:33
1 Answer
The only case where I would consider resampling is when recall for a particular class has to be improved. The goal is then to force the classifier to predict this class more often, even though that usually means lower performance overall.
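To make the resampling idea concrete, here is a minimal sketch of random oversampling for one class, using only the standard library. The helper `random_oversample`, its `factor` parameter, and the toy data are assumptions made for illustration, not part of any library. Resampling would normally be applied to the training split only, so the validation data keeps the original class distribution.

```python
import random
from collections import Counter


def random_oversample(X, y, target_class, factor, seed=0):
    """Hypothetical helper: duplicate randomly chosen samples of
    `target_class` until it has about `factor` times its original count."""
    rng = random.Random(seed)
    # Indices of the samples belonging to the class we want predicted more often.
    idx = [i for i, label in enumerate(y) if label == target_class]
    n_extra = int(len(idx) * (factor - 1))
    # Draw duplicates with replacement from the target class.
    extra = [rng.choice(idx) for _ in range(n_extra)]
    X_new = list(X) + [X[i] for i in extra]
    y_new = list(y) + [y[i] for i in extra]
    return X_new, y_new


# Toy training data (made up): class "b" is rare.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
y = ["a", "a", "a", "a", "b", "c"]

X_res, y_res = random_oversample(X, y, target_class="b", factor=3)
print(Counter(y_res))
```

The other classes are left untouched, so only the relative weight of the chosen class changes.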
Resampling is an easy method but rarely the optimal one. In general, I'd first analyze the errors made by the classifier and possibly consider alternative designs and/or feature engineering.
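As one possible starting point for such an error analysis, the sketch below computes per-class recall and lists the most frequent confusions from true and predicted labels. The function name `per_class_recall` and the labels are invented for illustration. In practice the labels would come from a held-out set.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return recall per class and the raw (true, predicted) pair counts."""
    confusion = Counter(zip(y_true, y_pred))
    support = Counter(y_true)
    # Recall = correctly predicted samples of a class / all samples of that class.
    recall = {c: confusion[(c, c)] / support[c] for c in support}
    return recall, confusion


# Made-up labels for a three-class problem.
y_true = ["a", "a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "a", "b", "b", "a", "a"]

recall, confusion = per_class_recall(y_true, y_pred)

# Off-diagonal pairs show which classes get confused with which.
errors = [(pair, n) for pair, n in confusion.most_common() if pair[0] != pair[1]]

print(recall)
print(errors)
```

A low recall concentrated in one class, together with the pairs it is confused with, helps decide whether resampling, new features, or a different design is the better next step.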
Erwan
Why calculate a wrong posterior probability instead of altering the classification threshold? – Dave Sep 08 '21 at 17:46
@Dave I agree that setting the classification threshold is a good solution for binary classification (assuming a soft classifier that outputs a probability), but if I'm not mistaken it's not possible in the case of multiclass classification. – Erwan Sep 08 '21 at 22:43
The trouble in a multiclass problem is that if you set a high standard of, say, $0.9$ probability of class membership, you might wind up with no class meeting your standard and having to consider that a grey zone. Frank Harrell would see this as a positive, however. // I have no idea how software handles this, though, and assume most standard packages just pick the class with the highest probability. – Dave Sep 08 '21 at 22:55
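A minimal sketch of the "grey zone" idea described in this comment, assuming the classifier already returns one probability per class. The function `predict_with_reject` and the probability rows are hypothetical. This is not how any particular package behaves by default.

```python
def predict_with_reject(probs, threshold=0.9):
    """Return the index of the most probable class, or None (grey zone)
    when no class reaches `threshold`. Hypothetical helper."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else None


# Made-up predicted probabilities for three samples and three classes.
rows = [
    [0.95, 0.03, 0.02],  # confident in class 0
    [0.50, 0.30, 0.20],  # no class meets the standard
    [0.05, 0.05, 0.90],  # class 2 just meets the standard
]

decisions = [predict_with_reject(p) for p in rows]

# Plain "highest probability wins" for comparison: never abstains.
argmax_only = [max(range(len(p)), key=lambda k: p[k]) for p in rows]

print(decisions)    # [0, None, 2]
print(argmax_only)  # [0, 0, 2]
```

The second sample is the case discussed above: picking the highest probability still returns class 0, while the thresholded rule leaves it undecided.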
@Dave Interesting, thanks. Yes, I think most software applies the one-vs-rest strategy, which boils down to picking the class with the highest probability. – Erwan Sep 10 '21 at 09:53