Why does the gradient point at direction of maximum slope only?

Question

I'm just learning about gradient and I'm a bit hazy about this: In the above diagram, the length of the line represents magnitude of slope (change along z) in that direction. According to my understanding, the gradient at a point is the sum of slope at every direction from that point resolved into the x and y components.

Then how does the resultant direction give the direction of maximum slope? Wouldn't it give something like a weighted average direction, which may be a different direction, like in the diagram I drew?

My guess was that since the function we are dealing with is continuous, it does not allow such cases, but I'm still not sure.

Edit:

I think I wasn't able to communicate my question properly before. I think I will be able to do it using a little physics.

So imagine I place a ball on an uneven hill. The force of gravity acting on the ball from every direction is directly proportional to the slope of the hill in that direction (mg*slope). For this example, let mg=1, so the force from every direction is the slope along that direction.

To find the direction of resultant slope, or essentially where the ball would roll, I would resolve the slopes along all directions and add them up.

However, to find gradient (again, which gives me the direction along which the ball will roll), it seems to me that all we are doing is finding slope along x and y and predicting the whole thing, ignoring every other direction.

I suspect this is because a hill is a continuous structure (read: we are dealing with differentiable functions) and no sudden change in slope is possible. But I'm not convinced by my own argument.

If you are only just starting out with the gradient, I would strongly advise you to watch MIT's Multivariable Calculus videos on YouTube (the gradient appears around lecture 11). A nice geometric and algebraic explanation is given that directly answers your question! — Hugh Entwistle, Oct 12 '16 at 10:11
Look here http://math.stackexchange.com/questions/223252/why-is-gradient-the-direction-of-steepest-ascent?rq=1 — la flaca, Oct 12 '16 at 10:23
The gradient really isn't defined by $\nabla f = \langle \partial_x f, \partial_y f,\partial_z f\rangle$ -- it's defined by $\nabla f \cdot u = D_uf,\ \forall u$ where $D_uf$ is the directional derivative in the direction of $u$. So the gradient is essentially defined as the vector which points in the direction of max slope. — , Oct 12 '16 at 11:26
@Bye_World How are those two expressions equal, is my question exactly. — Mahathi Vempati, Oct 12 '16 at 12:17
@Arthur The Pythagoras theorem intuition helped me a lot. Thanks. However, I edited my question to pinpoint at my exact problem. — Mahathi Vempati, Oct 12 '16 at 12:40
The reason that it works to look at just the $x$ and $y$ directions, and that we can tell the maximum slope direction from them, is that in a minuscule neighbourhood around the relevant point, the graph is indistinguishable from a plane (the so-called "tangent plane"). And since telling the slope in the $x$ and $y$ directions amounts to finding two specific lines in that plane, the rest of the plane is uniquely determined, along with the direction and size of the maximal slope. — Arthur, Oct 12 '16 at 16:28

score 2 · Answer 1 · answered Oct 16 '16 at 15:48

$\newcommand{\Del}{\nabla}\newcommand{\Reals}{\mathbf{R}}$If $f$ is a differentiable, real-valued function of two real variables, then by definition, at each point $(x_0, y_0)$ in the domain there exists a linear transformation $D:\Reals^{2} \to \Reals$ such that $$ f(x_0 + h, y_0 + k) = f(x_0, y_0) + D(h, k) + \epsilon(h, k),\qquad \lim_{(h, k) \to(0, 0)} \frac{\epsilon(h, k)}{\sqrt{h^2 + k^2}} = 0. \tag{1} $$ To indicate the dependence of $D$ on the function $f$ and the point $(x_0, y_0)$, we usually write $D = Df(x_0, y_0)$.

The chain rule gives $$ \frac{d}{dt}\bigg|_{t=0} f(x_0 + th, y_0 + tk) = Df(x_0, y_0) (h, k). \tag{2} $$

The first point is, a linear transformation $D:\Reals^{2} \to \Reals$ is completely determined by two real numbers. Conventionally these numbers are taken to be the values on the standard basis vectors, a.k.a., the rates of change of $f$ in the Cartesian coordinate directions, a.k.a. the partial derivatives of $f$ at $(x_0, y_0)$. That's why the rate of change of a differentiable function $f$ at a point $(x_0, y_0)$ in an arbitrary direction $(h, k)$ is completely determined by two numbers. (If $f$ is a differentiable function of $n \geq 1$ variables, the derivative at each point, similarly, is completely determined by $n$ real numbers, which can be taken to be the partial derivatives.)
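As a quick numerical sanity check of this first point, here is a minimal Python sketch. The sample function $f(x, y) = x^2 y + \sin y$, the point $(1, 2)$, and the helper names `partials` and `directional_derivative` are assumptions chosen purely for illustration. The sketch estimates the rate of change along several directions $(h, k)$ by finite differences and compares each with the combination of the two partial derivatives.

```python
import math

def f(x, y):
    # Hypothetical sample function, chosen only for illustration.
    return x**2 * y + math.sin(y)

def partials(f, x0, y0, eps=1e-6):
    # Central-difference estimates of the two partial derivatives at (x0, y0).
    fx = (f(x0 + eps, y0) - f(x0 - eps, y0)) / (2 * eps)
    fy = (f(x0, y0 + eps) - f(x0, y0 - eps)) / (2 * eps)
    return fx, fy

def directional_derivative(f, x0, y0, h, k, eps=1e-6):
    # Central-difference estimate of d/dt f(x0 + t*h, y0 + t*k) at t = 0.
    return (f(x0 + eps * h, y0 + eps * k) - f(x0 - eps * h, y0 - eps * k)) / (2 * eps)

x0, y0 = 1.0, 2.0
fx, fy = partials(f, x0, y0)

# The two numbers (fx, fy) should predict the rate of change along any (h, k).
for h, k in [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8), (-2.0, 3.0)]:
    direct = directional_derivative(f, x0, y0, h, k)
    from_partials = fx * h + fy * k
    print((h, k), direct, from_partials)
```

The two printed values on each row should agree up to finite-difference error, including for directions that are not along the coordinate axes.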

Second, the linear transformation $Df(x_0, y_0)$, normally represented by a row matrix, can be written as a column and interpreted as a gradient vector $\Del f(x_0, y_0)$ based at $(x_0, y_0)$. If $(h, k)$ is an arbitrary vector (viewed as a displacement from $(x_0, y_0)$ as in (1)), then $$ Df(x_0, y_0) (h, k) = \Del f(x_0, y_0) \cdot (h, k), \tag{3} $$ the dot product of the gradient with the displacement. Consequently, if $(h, k)$ is a unit vector making angle $\theta$ with the gradient vector at $(x_0, y_0)$, then $$ \Del f(x_0, y_0) \cdot (h, k) = \|\Del f(x_0, y_0)\| \cos\theta. \tag{4} $$ Combining (2), (3), and (4), $$ \frac{d}{dt}\bigg|_{t=0} f(x_0 + th, y_0 + tk) = \|\Del f(x_0, y_0)\| \cos\theta \tag{5} $$ for a unit vector $(h, k)$ making angle $\theta$ with $\Del f(x_0, y_0)$. This equation contains the geometric facts that ($\theta = 0$) "the gradient points in the direction of most rapid increase (of $f$ at $(x_0, y_0)$)" and ($\theta = \frac{\pi}{2}$) "the gradient $\Del f(x_0, y_0)$ is orthogonal to the level set of $f$ through $(x_0, y_0)$".
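Equation (5) can be checked numerically. The following is a minimal Python sketch under stated assumptions: the sample function $f(x, y) = x^2 y + \sin y$, the point $(1, 2)$, and the names `grad_f` and `slope` are all hypothetical choices for illustration. It samples unit directions around the point, estimates the slope in each by finite differences, and compares against $\|\nabla f\| \cos\theta$.

```python
import math

def f(x, y):
    # Hypothetical sample function, chosen only for illustration.
    return x**2 * y + math.sin(y)

def grad_f(x, y):
    # Exact gradient of the sample f, computed by hand.
    return (2 * x * y, x**2 + math.cos(y))

def slope(x0, y0, phi, eps=1e-6):
    # Finite-difference rate of change of f in the unit direction (cos phi, sin phi).
    h, k = math.cos(phi), math.sin(phi)
    return (f(x0 + eps * h, y0 + eps * k) - f(x0 - eps * h, y0 - eps * k)) / (2 * eps)

x0, y0 = 1.0, 2.0
gx, gy = grad_f(x0, y0)
grad_norm = math.hypot(gx, gy)
grad_angle = math.atan2(gy, gx)  # direction of the gradient, as a polar angle

n = 3600
angles = [2 * math.pi * i / n for i in range(n)]

# Sampled direction with the largest slope; should sit next to grad_angle.
best_angle = max(angles, key=lambda phi: slope(x0, y0, phi))

# Largest gap between the measured slope and ||grad f|| * cos(theta),
# where theta = phi - grad_angle is the angle to the gradient.
worst_err = max(
    abs(slope(x0, y0, phi) - grad_norm * math.cos(phi - grad_angle))
    for phi in angles
)

print("gradient angle:", grad_angle)
print("steepest sampled angle:", best_angle)
print("max deviation from (5):", worst_err)
```

The steepest sampled direction should agree with the gradient direction to within the angular grid spacing, and the deviation from (5) should be at the level of finite-difference error.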

Another consequence, incidentally, is that your diagram is misleading: If you plot vectors $(h, k)$ scaled so the magnitude is the rate of change of $f$ in that direction, the tips of the vectors trace the circle through $(x_0, y_0)$ and with the segment from $(x_0, y_0)$ to $(x_0, y_0) + \Del f(x_0, y_0)$ as a diameter. (Pleasant polar coordinates exercise.)
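For anyone who would rather check the circle claim numerically than do the polar coordinates exercise, here is a small Python sketch. It assumes the same hypothetical sample function $f(x, y) = x^2 y + \sin y$ at the point $(1, 2)$, uses the dot-product formula (3) for the rate of change, and scales each unit direction by the signed rate (so a negative rate flips the vector). The names `grad_f` and `tip` are illustrative only.

```python
import math

def grad_f(x, y):
    # Exact gradient of the hypothetical sample f(x, y) = x**2 * y + sin(y).
    return (2 * x * y, x**2 + math.cos(y))

def tip(x0, y0, phi):
    # Tip of the unit direction (cos phi, sin phi), based at (x0, y0) and
    # scaled by the signed rate of change in that direction, using (3).
    gx, gy = grad_f(x0, y0)
    h, k = math.cos(phi), math.sin(phi)
    rate = gx * h + gy * k
    return (x0 + rate * h, y0 + rate * k)

x0, y0 = 1.0, 2.0
gx, gy = grad_f(x0, y0)

# Circle having the segment from (x0, y0) to (x0, y0) + grad f as a diameter.
center = (x0 + gx / 2, y0 + gy / 2)
radius = math.hypot(gx, gy) / 2

# Largest gap between a tip's distance from the center and the radius.
max_dev = 0.0
for i in range(360):
    tx, ty = tip(x0, y0, 2 * math.pi * i / 360)
    dist = math.hypot(tx - center[0], ty - center[1])
    max_dev = max(max_dev, abs(dist - radius))

print("max deviation from the circle:", max_dev)
```

Every sampled tip should lie on the circle up to floating-point rounding, which is the polar equation $r = \|\nabla f\| \cos\theta$ in disguise.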

In your physics analogy, you should really zoom in on your hill until it looks like a plane (i.e., zoom in on the graph $z = f(x, y)$ at the point $(x_0, y_0, f(x_0, y_0))$ until the graph is indistinguishable from the tangent plane).

Finally, if it matters, an actual ball rolling down an actual hill (or a point particle sliding without friction down a hill) does not follow the gradient: Otherwise, roller coasters (etc.) wouldn't work. Mathematically, the second-order equations of motion do not coincide with the first-order flow equations for the gradient field of $f$.
