I have just started learning Faster R-CNN and have some questions about the optimizer used for this network. As I understand it, the Adam optimizer performs much better than SGD on many networks. However, the Faster R-CNN paper chooses SGD instead of Adam, and many of the Faster R-CNN implementations I found on GitHub use SGD as well.
My guess is that Adam may not perform better in the case of Faster R-CNN. When I looked into this, I found this answer (link), which gave me a rough idea. It suggests that SGD generalizes better than Adam, but I would still like a more detailed explanation.
Here are my questions:
- Can we use Adam as the optimizer for Faster R-CNN? If anyone has used Adam for Faster R-CNN, could you share some results on how it performed?
- As the answer linked above suggests, Adam may perform worse in some special cases. In what kinds of cases does Adam perform poorly? Can anyone give me some examples? And does Faster R-CNN belong to these special cases?