
So this is more of a yes/no question; I'm not sure where or how to ask simple questions on Stack Exchange. Anyway, the definition of gradient descent says it takes you down the steepest path. Assuming we always move towards the minimum, is this always true?

For example, if there is an elongated cone-shaped surface in 3D space (like a sheet of paper folded in half), is moving diagonally down that surface a steeper descent than moving down one axis first and then the other? In the gradient descent algorithm, it seems like we adjust all coordinates at once rather than one at a time.

Thanks.


1 Answer


The gradient method gives you the steepest direction, but only in the sense of a local first-order approximation. Hence it takes more than one step: if the gradient pointed straight at the true minimum, we wouldn't need an iterative optimization method at all.
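To see why "steepest" is only a local statement, recall the first-order Taylor approximation (here $\alpha > 0$ denotes a step size):

$$f(x + \delta) \approx f(x) + \nabla f(x)^\top \delta .$$

Among all small steps $\delta$ of a fixed length, the linear term decreases most when $\delta$ points along $-\nabla f(x)$, which gives the update $x_{k+1} = x_k - \alpha \nabla f(x_k)$. But the approximation only holds near $x_k$, so the direction has to be recomputed after every step.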

Thus, in general, you need to take many steps.

More importantly, even if you follow the steepest descent direction at every step, you are never guaranteed to reach the global optimum; this is the problem of local optima.

And yes, the gradient method changes all coordinates at once, since the gradient is by definition the collection of the $n$ partial derivatives of $\newcommand{\reals}{{\mathbb R}}f:\reals^n \to \reals$.
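As a minimal sketch of that simultaneous update, here is gradient descent in Python on an elongated quadratic valley standing in for the folded-paper surface in your question; the function, step size, and iteration count are all illustrative assumptions:

```python
import numpy as np

# Elongated quadratic valley f(x, y) = x^2 + 25*y^2, a stand-in for the
# "folded sheet of paper" surface: steep across the fold, shallow along it.
def f(v):
    return v[0] ** 2 + 25 * v[1] ** 2

def grad(v):
    # The gradient collects both partial derivatives, so one update
    # moves every coordinate at once.
    return np.array([2 * v[0], 50 * v[1]])

v = np.array([10.0, 1.0])  # start away from the minimum at the origin
alpha = 0.03               # fixed step size (assumed)

for _ in range(200):
    v = v - alpha * grad(v)  # simultaneous update of all coordinates

print(v, f(v))  # close to the minimum (0, 0) only after many steps
```

On a surface like this the iterates zig-zag across the narrow direction while creeping along the shallow one, which is exactly why locally steepest does not mean globally fastest.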

Hope this helps you better understand it.

For a more rigorous explanation, you can Google it; there are hundreds of documents and papers describing it. One good starting point is the Wikipedia page on gradient descent.