So this is more of a yes/no question. I'm not sure where or how to ask simple questions on Stack. Anyway, the definition of gradient descent says that it takes you down the steepest path. Assuming we always move towards the minimum, is this always true?
For example, if there is an elongated, cone-shaped surface in 3D space (like a sheet of paper folded in half), is moving diagonally down that surface a steeper descent than moving down one axis first and then the other? In the gradient descent algorithm, it seems like we adjust all the x values at once as opposed to one at a time.
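To make the comparison concrete, here is a minimal sketch. It assumes a hypothetical elongated bowl, f(x, y) = x^2 + 10y^2, as a stand-in for the folded sheet (the function, the starting point (1, 1), and all names are illustrative choices, not part of any library). It computes the slope of f along several unit directions at one point, so the "all coordinates at once" direction can be compared with moving along one axis at a time.

```python
import math

def f(x, y):
    # Hypothetical elongated bowl: shallow along x, steep along y.
    return x**2 + 10 * y**2

def grad(x, y):
    # Analytic gradient of f.
    return (2 * x, 20 * y)

def directional_derivative(x, y, dx, dy):
    # Slope of f at (x, y) along the unit vector pointing in direction (dx, dy).
    norm = math.hypot(dx, dy)
    gx, gy = grad(x, y)
    return (gx * dx + gy * dy) / norm

x0, y0 = 1.0, 1.0  # assumed starting point
gx, gy = grad(x0, y0)

slopes = {
    "negative gradient": directional_derivative(x0, y0, -gx, -gy),
    "along -x only": directional_derivative(x0, y0, -1.0, 0.0),
    "along -y only": directional_derivative(x0, y0, 0.0, -1.0),
    "straight toward the minimum": directional_derivative(x0, y0, -x0, -y0),
}

for name, slope in slopes.items():
    print(f"{name}: {slope:.3f}")
```

At this point the negative gradient has the most negative slope of the four directions, and it points neither along a single axis nor straight at the minimum. The comparison is local: it says which direction drops fastest for an infinitesimal step from (1, 1), not which route reaches the minimum in the fewest steps.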
Thanks.
`\mathbb R` for $\mathbb R$ – gen-ℤ ready to perish Feb 22 '20 at 05:15