52

It seems that the Adaptive Moment Estimation (Adam) optimizer nearly always works better than the alternatives (reaching a minimum of the cost function faster and more reliably) when training neural nets.

Why not always use Adam? Why even bother using RMSProp or momentum optimizers?
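For concreteness, here is the kind of choice I mean: a minimal sketch (assuming PyTorch; the model, data, and learning rates are just placeholders) in which the only line that differs between the methods is the optimizer.

```python
import torch
import torch.nn as nn

# Placeholder model and data; any network and dataset could stand in here.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()

# The choice in question: pick one of these three lines.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

for step in range(1000):
    x, y = torch.randn(32, 20), torch.randn(32, 1)  # dummy batch
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```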

PyRsquared
  • I don't believe there is any strict, formalized way to support either statement. It's all purely empirical, as the error surface is unknown. As a rule of thumb, and purely from my experience, Adam does well where others fail (instance segmentation), although not without drawbacks (convergence is not monotone). – Alex May 08 '18 at 08:53
  • Adam is faster to converge. SGD is slower but generalizes better. So in the end it all depends on your particular circumstances. – agcala Mar 21 '19 at 12:10
  • https://en.wikipedia.org/wiki/No_free_lunch_theorem would seem relevant. Different optimization algorithms work better on different problems, and there is no universally superior one. – endolith Nov 28 '22 at 20:55

2 Answers

42

Here’s a blog post reviewing an article claiming that SGD generalizes better than Adam.

There is often value in using more than one method (an ensemble), because every method has a weakness.
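For illustration, here is a minimal sketch of that idea (assuming PyTorch; the toy model, data, and hyperparameters are placeholders): the same architecture trained several times with different optimizers, with the members' predictions averaged at inference time.

```python
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))

def train(model, optimizer, x, y, steps=500):
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    return model

def train_member(optimizer_cls, x, y, **opt_kwargs):
    # Build a fresh model and train it with the given optimizer.
    model = make_model()
    return train(model, optimizer_cls(model.parameters(), **opt_kwargs), x, y)

# Toy data standing in for a real training set.
x, y = torch.randn(256, 20), torch.randn(256, 1)

# Same architecture, different optimizers: each member fails in different ways.
members = [
    train_member(torch.optim.Adam, x, y, lr=1e-3),
    train_member(torch.optim.SGD, x, y, lr=1e-2, momentum=0.9),
    train_member(torch.optim.RMSprop, x, y, lr=1e-3),
]

def ensemble_predict(x_new):
    # Average the members' predictions.
    with torch.no_grad():
        return torch.stack([m(x_new) for m in members]).mean(dim=0)

print(ensemble_predict(torch.randn(4, 20)))
```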

Zephyr
19

You should also take a look at this post comparing different gradient descent optimizers. As you can see there, Adam is clearly not the best optimizer for some tasks, since several of the other methods converge better on them.
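For a rough sense of how such comparisons are run, here is a minimal sketch (assuming PyTorch; the test surface, starting point, and learning rates are arbitrary, untuned choices) that runs a few optimizers on the Rosenbrock function and reports where each one ends up.

```python
import torch

def rosenbrock(p):
    # Classic non-convex test surface; the global minimum is at (1, 1).
    x, y = p[0], p[1]
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

# Untuned, purely illustrative hyperparameters.
configs = {
    "SGD":          lambda p: torch.optim.SGD([p], lr=2e-4),
    "SGD+momentum": lambda p: torch.optim.SGD([p], lr=2e-4, momentum=0.9),
    "RMSprop":      lambda p: torch.optim.RMSprop([p], lr=1e-2),
    "Adam":         lambda p: torch.optim.Adam([p], lr=1e-2),
}

for name, make_opt in configs.items():
    p = torch.tensor([-1.5, 2.0], requires_grad=True)  # common starting point
    opt = make_opt(p)
    for _ in range(2000):
        opt.zero_grad()
        rosenbrock(p).backward()
        opt.step()
    final = [round(v, 3) for v in p.detach().tolist()]
    print(f"{name:>12}: final loss {rosenbrock(p).item():.4f} at {final}")
```

The final losses give only a crude, single-run comparison; which method comes out ahead shifts with the surface, the starting point, and the learning rates, which is essentially the point of the comparisons linked above.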