Why is training a deep learning model more likely to suffer from plateaus than from local minima?
Could anyone illustrate this?
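To make the question concrete, here is a minimal toy example of my own (the function and names are just for illustration): a single sigmoid unit whose output should be 1. When the weight starts deep in the saturated region, the gradient nearly vanishes even though the loss is far from minimal, so gradient descent crawls across a plateau:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy loss: squared error of one sigmoid unit that should output 1.
# For very negative w the unit is saturated, so the loss surface is
# almost perfectly flat there even though w is nowhere near optimal.
def loss(w):
    return (sigmoid(w) - 1.0) ** 2

def grad(w):
    s = sigmoid(w)
    return 2.0 * (s - 1.0) * s * (1.0 - s)  # chain rule through the sigmoid

w = -8.0   # start on the plateau (saturated region)
lr = 1.0
for step in range(3001):
    if step % 500 == 0:
        print(f"step {step:4d}  w = {w:7.3f}  loss = {loss(w):.6f}  |grad| = {abs(grad(w)):.2e}")
    w -= lr * grad(w)
```

Running this, the gradient magnitude starts around 1e-3 and the loss barely moves for the first thousand steps or so before the optimizer finally escapes the flat region. Note that the plateau is not a local minimum: the loss there is close to its worst value, yet the gradient gives almost no signal.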