I have a paper where we try to maximize the F1 score using different techniques. We hoped that a ranking algorithm like RankNet would get a better F1 score than the others, but as you can see from our table, regular neural networks, without even using a cost matrix, were good enough. Creating synthetic samples until the classes were balanced, and even using MetaCost, also made little difference.

We did see an F1 gain when using RankNet, but that was likely due to our post-processing step, where we convert the ranking scores into classes. And that is a pain. I think you can get similar results by, instead of using 0.5 as your threshold for choosing the positive class, picking the threshold that maximizes the F1 score on the training data alone.
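In case it helps, here is a minimal sketch of that threshold search (assuming scikit-learn and a model that outputs probabilities; the helper name is mine, not from the paper):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(y_true, y_scores):
    """Pick the decision threshold that maximizes F1 on the given scores."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
    # precision_recall_curve returns one more (precision, recall) pair than
    # thresholds, so drop the last pair to keep the arrays aligned.
    p, r = precision[:-1], recall[:-1]
    f1 = 2 * p * r / np.clip(p + r, 1e-12, None)  # avoid division by zero
    return thresholds[np.argmax(f1)]

# Usage sketch: tune the threshold on training scores, apply it at test time.
# threshold = best_f1_threshold(y_train, model.predict_proba(X_train)[:, 1])
# y_pred = (model.predict_proba(X_test)[:, 1] >= threshold).astype(int)
```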
After working on class imbalance for a while, I think the topic is a great way to rack up publications easily, but in the real world a post-processing step and/or a cost matrix to balance the priors is more than enough.
PS: Please do not take these results as gospel. Neural networks are notoriously difficult to optimize, especially across a range of very different datasets. Some cross-validation was performed, as mentioned in the paper, but due to lack of time not as much as we should have done, and we did not allow many training iterations. I would try introducing class weights and some other simple things in your own code (something like the sketch below) to see if they make a difference. Please do let me know how it works.
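For the weights part, this is roughly what I mean (PyTorch here; the architecture, batch, and class counts are all made up for illustration, and n_neg/n_pos is just a common heuristic for the weight):

```python
import torch
import torch.nn as nn

# Cost-sensitive loss: up-weight positive-class errors so the network
# effectively trains under balanced priors. These counts are hypothetical.
n_pos, n_neg = 100, 900
pos_weight = torch.tensor([n_neg / n_pos])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Dummy model and batch just to make the sketch runnable.
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 1))
x = torch.randn(32, 20)
y = (torch.rand(32, 1) < 0.1).float()  # ~10% positives

loss = criterion(model(x), y)
loss.backward()
```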