I have a paper where we try to maximize the F1 score using different techniques. We hoped that a ranking algorithm like RankNet would get a better F1 score than the others, but as you can see from our table, regular neural networks, without even using a cost matrix, were good enough. Creating synthetic samples until the classes were balanced, and even using MetaCost, also made little difference.

We did see an F1 gain when using RankNet, but that was likely due to our post-processing step, where we convert the ranking scores into classes. And that is a pain. I think you can get similar results by, instead of using 0.5 as your threshold for choosing the positive class, picking the threshold that maximizes the F1 score on the training data alone.
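In case it helps, here is a minimal sketch of that threshold search (assuming scikit-learn and a model that outputs probabilities; the helper name is mine, not from the paper):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(y_true, y_scores):
    """Pick the decision threshold that maximizes F1 on the given scores."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
    # precision_recall_curve returns one more (precision, recall) pair than
    # thresholds, so drop the last pair to keep the arrays aligned.
    p, r = precision[:-1], recall[:-1]
    f1 = 2 * p * r / np.clip(p + r, 1e-12, None)  # avoid division by zero
    return thresholds[np.argmax(f1)]

# Usage sketch: tune the threshold on training scores, apply it at test time.
# threshold = best_f1_threshold(y_train, model.predict_proba(X_train)[:, 1])
# y_pred = (model.predict_proba(X_test)[:, 1] >= threshold).astype(int)
```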
After working on class imbalance for a while, I think the topic is a great way to rack up publications easily, but in the real world a post-processing step and/or a cost matrix to balance the priors is more than enough.
PS: Please do not take these results as gospel. Neural networks are notoriously difficult to optimize, especially across a range of very different datasets. Some cross-validation was performed, as mentioned in the paper, but due to lack of time not as much as we should have done, and we did not allow many training iterations. I would try introducing class weights and some other simple things in your own code (something like the sketch below) to see if they make a difference. Please do let me know how it works.
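For the weights part, this is roughly what I mean (PyTorch here; the architecture, batch, and class counts are all made up for illustration, and n_neg/n_pos is just a common heuristic for the weight):

```python
import torch
import torch.nn as nn

# Cost-sensitive loss: up-weight positive-class errors so the network
# effectively trains under balanced priors. These counts are hypothetical.
n_pos, n_neg = 100, 900
pos_weight = torch.tensor([n_neg / n_pos])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Dummy model and batch just to make the sketch runnable.
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 1))
x = torch.randn(32, 20)
y = (torch.rand(32, 1) < 0.1).float()  # ~10% positives

loss = criterion(model(x), y)
loss.backward()
```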