Need help with a simple example where it's not clear that the gradient is in direction of "steepest ascent"

Question

Say I am on a point $(x^*,y^*)$ of a function $f(x,y)$ where the function value increases if I go a very small step in any positive direction (i.e. in the direction of a vector where the coordinates $x$ and $y$ are both positive), but the function increases MORE if I go in a very small step in another direction, say a vector where the $x$-coordinate is positive but the $y$-coordinate is negative. Doesn't that mean that the gradient does not point in the direction of steepest ascent?

There was a great answer in this thread about seeing the region around the point as "almost planar", but I still don't see why the function can't be differentiable at that point and still increase in both directions (even if it's by an infinitesimal amount), and increase just a little bit more in one direction than another. Does it really HAVE to mean that there is a sharp turn just at that point? Why can't it be smooth but still not planar?

I have drawn two examples where I am imagining that the point I am evaluating the gradient at is $(0,0)$. From there, it is supposed to be steeper to go in the direction of $(-ax, -by)$ than $(ax, by)$:

Example 1

Example 2

I am fairly new to math and very technical explanations are still hard for me to understand. I know I am asking for much, but additional ways of looking at it which are not algebraic would help me the most.

Thanks.

score 1 · Answer 1 · answered Jun 13 '19 at 12:15

Around any non-stationary point, a smooth function is well approximated by a planar model

$$f(x+u,y+v)\approx f(x,y)+g_x(x,y)u+g_y(x,y)v,$$ where $g_x,g_y$ are the components of the gradient and $(u,v)$ is a small step.

If you look for the direction of largest increase, you can take a unit step $(u,v)=(\cos\theta,\sin\theta)$ and maximize over $\theta$

$$f(x+u,y+v)-f(x,y)\approx g_x(x,y)\cos\theta+g_y(x,y)\sin\theta,$$

which can be done by finding the roots of the derivative with respect to $\theta$

$$-g_x(x,y)\sin\theta+g_y(x,y)\cos\theta.$$

From this equation,

$$\tan\theta=\frac{g_y}{g_x},$$ and

$$\begin{cases}\cos\theta=\pm\dfrac{g_x}{\sqrt{g_x^2+g_y^2}},\\\sin\theta=\pm\dfrac{g_y}{\sqrt{g_x^2+g_y^2}},\end{cases}$$ where the signs are synchronized.

Hence after simplification,

$$f_{\max},f_{\min}=f\pm\sqrt{g_x^2+g_y^2}$$ are obtained in opposite directions, parallel to the direction of the gradient. Always.
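
For readers who prefer to check this numerically rather than algebraically, here is a minimal sketch. It assumes a made-up gradient $(g_x,g_y)=(3,-4)$ and simply tries many unit directions under the planar model; the function names `first_order_change` and `scan_directions` are hypothetical and chosen only for this illustration.

```python
import math

def first_order_change(gx, gy, theta):
    """Change predicted by the planar model for a unit step at angle theta."""
    return gx * math.cos(theta) + gy * math.sin(theta)

def scan_directions(gx, gy, samples=3600):
    """Try evenly spaced angles; return (best_angle, best_change, worst_change)."""
    angles = [2 * math.pi * k / samples for k in range(samples)]
    changes = [first_order_change(gx, gy, t) for t in angles]
    best_index = max(range(samples), key=changes.__getitem__)
    return angles[best_index], changes[best_index], min(changes)

# Hypothetical gradient components, chosen only for illustration.
gx, gy = 3.0, -4.0
best_angle, best_change, worst_change = scan_directions(gx, gy)
gradient_angle = math.atan2(gy, gx) % (2 * math.pi)  # angle of the gradient itself
gradient_norm = math.hypot(gx, gy)                   # sqrt(gx^2 + gy^2)

# The best scanned angle should agree with the gradient angle up to the scan resolution,
# and the extreme changes should be close to +norm and -norm.
print(best_angle, gradient_angle)
print(best_change, worst_change, gradient_norm)
```

The brute-force scan knows nothing about the formula above, so the agreement between the best scanned angle and the gradient's angle is an independent check rather than a restatement.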

I appreciate your reply but this answer was a bit too technical for me I'm afraid, I'm not quite there yet. Will look at it later. — A_Weierstrass, Jun 17 '19 at 00:40

score 1 · Accepted Answer · answered Jun 13 '19 at 12:53

If a function is increasing in one direction, but increasing faster in another direction, it does not mean the gradient is not the direction of steepest increase; it means the first direction is not the direction of the gradient.

If the function is differentiable, it can have "bumps" and/or concave/convex "bowls" like the ones you've drawn, but if you zoom in very close to the point $(x^*,y^*)$ those bumps or concavities will become less and less visible until you see something that looks more like a tilted plane.

But let's look more closely at a tilted plane. If you take a flat plane and tilt it, only one line in the plane stays at the original height of the plane. Every other point in the plane is either raised or lowered. The points that are raised are all on the same side of that line. So if the $x,y$ coordinates of that line pass through $(x^*,y^*),$ a movement in any direction on that plane that stays on the "raised" side of that line will produce an increase in height.

For a specific example, suppose the $x,y$ plot of the "original height" line makes an angle of $80$ degrees counterclockwise from the positive $x$ axis, and going in the direction of the positive $x$ axis the height of the plane is increasing. With that setup, you will have an increasing height of the plane in any direction you go as long as the direction is between $0$ and $80$ degrees counterclockwise from the positive $x$ axis, or between $0$ and $100$ degrees clockwise from the positive $x$ axis. That's a $180$-degree range of directions all with increasing heights. (The other $180$-degree range of directions has decreasing heights.)

The height of the plane in this example will increase the fastest if you go at an angle $10$ degrees clockwise from the positive $x$ axis--a direction vector with positive $x$ but negative $y$ coordinate. That's the direction of the gradient of that plane (and the direction of the gradient of any two-variable function whose derivative is identified by that plane). But the height of the plane will also increase if you go at an angle $10$ degrees counterclockwise from the positive $x$ axis, where the direction vector has both positive $x$ and $y$ coordinates. It just won't increase quite as fast.
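
The numbers in this example can be checked with a short sketch. It assumes the plane has unit steepness (the example above only fixes the direction, not how steep the tilt is), and the helper name `slope` is made up for this illustration.

```python
import math

# Assumption: the plane has unit steepness; only the direction matters here.
GRADIENT_ANGLE_DEG = -10.0  # 10 degrees clockwise from the positive x axis
gx = math.cos(math.radians(GRADIENT_ANGLE_DEG))
gy = math.sin(math.radians(GRADIENT_ANGLE_DEG))

def slope(angle_deg):
    """Rate of height change on the plane when walking at angle_deg (counterclockwise)."""
    t = math.radians(angle_deg)
    return gx * math.cos(t) + gy * math.sin(t)

# Directions just inside and just outside the 180-degree "uphill" range,
# plus the gradient direction and its mirror image across the x axis.
for angle in (-101, -99, -10, 0, 10, 79, 80, 81):
    print(f"{angle:>5} degrees: slope {slope(angle):+.4f}")
```

Walking at $+10$ degrees still goes uphill, just less steeply than at $-10$ degrees, and the slope changes sign as you cross $80$ degrees counterclockwise or $100$ degrees clockwise.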

If you're looking in two exactly opposite directions, with direction vectors such as $(a,b)$ and $(-a,-b)$, and you see increases in both directions, then you're in one of the following situations:

The steps you are making away from $(x^*,y^*)$ are too large. If you make the steps small enough, one of the directions will show a decrease instead of an increase.
You are looking at a point where the derivative is exactly zero, the tangent plane is exactly horizontal, and the gradient doesn't have a direction.
You are looking at a point where the function is not differentiable.
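
The three situations can each be illustrated with a small sketch. The functions below are hypothetical examples picked for this purpose, all evaluated at the origin, looking in the opposite directions $(1,0)$ and $(-1,0)$.

```python
import math

def change(f, direction, step):
    """Height change when stepping from the origin by step * direction."""
    a, b = direction
    return f(step * a, step * b) - f(0.0, 0.0)

def tilted_with_curvature(x, y):
    return x + 10 * x**2          # differentiable, gradient (1, 0) at the origin

def bowl(x, y):
    return x**2 + y**2            # differentiable, gradient (0, 0) at the origin

def cone(x, y):
    return math.hypot(x, y)       # not differentiable at the origin

east, west = (1.0, 0.0), (-1.0, 0.0)
examples = [("tilted_with_curvature", tilted_with_curvature), ("bowl", bowl), ("cone", cone)]
for name, f in examples:
    for step in (0.5, 0.01):
        # Columns: function name, step size, change going east, change going west.
        print(name, step, change(f, east, step), change(f, west, step))
```

For `tilted_with_curvature`, both directions show an increase at the large step, but the westward step turns into a decrease once the step is small. For `bowl` and `cone`, both directions keep increasing no matter how small the step is, which is only possible because the gradient is zero in one case and does not exist in the other.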

Thanks, I found this explanation very illuminating. The tilted plane also helps me see how the components in the gradient make up the "right" vector. My upvotes aren't shown yet but I really appreciate your answer. — A_Weierstrass, Jun 17 '19 at 00:37
