
The most helpful answer I could find was Jonathan's, and I decided not to comment on and bump a six-year-old thread for clarification. Besides, I believe my question is not a direct duplicate: it is a follow-up to that question, and it would not fit within a comment's word limit anyway.

His explanation, in "Why is gradient the direction of steepest ascent?", is this:

Consider a Taylor expansion of this function, $$f({\bf r}+{\bf\delta r})=f({\bf r})+(\nabla f)\cdot{\bf\delta r}+\ldots$$ The linear correction term $(\nabla f)\cdot{\bf\delta r}$ is maximized when ${\bf\delta r}$ is in the direction of $\nabla f$.

I find this a very graceful answer, but I have one point of confusion.

For the linear term, it is implied that the derivative of $f(\mathbf r + \mathbf{\delta r})$ is $\nabla f$, and I can't seem to figure out why this is. It also seems that a Taylor series for a function of a vector argument replaces multiplication by what looks like its vector analogue, the dot product, for which I have no reliable, rigorous justification other than that it is roughly what I would guess a Taylor series for such a function should look like.
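The linear-term claim can be checked numerically (a sketch in Python; the example function $f(x,y)=x^2+3y$, the base point, and the step size are my own choices for illustration, not from the thread):

```python
import numpy as np

# An arbitrary example scalar field and its gradient
def f(p):
    x, y = p
    return x**2 + 3*y

def grad_f(p):
    x, y = p
    return np.array([2*x, 3.0])

r = np.array([1.0, 2.0])
eps = 1e-4  # small step length |delta r|

# Compare the actual change f(r + delta_r) - f(r) with the linear
# term (grad f) . delta_r over many directions of delta_r
best_dir = None
best_change = -np.inf
for theta in np.linspace(0, 2*np.pi, 360, endpoint=False):
    delta_r = eps * np.array([np.cos(theta), np.sin(theta)])
    actual = f(r + delta_r) - f(r)
    linear = grad_f(r) @ delta_r
    assert abs(actual - linear) < 1e-6  # omitted terms are O(eps^2)
    if actual > best_change:
        best_change, best_dir = actual, delta_r

# The direction that won the sweep aligns with the gradient
unit_grad = grad_f(r) / np.linalg.norm(grad_f(r))
print(best_dir / eps)   # close to unit_grad
print(unit_grad)
```

The assertion inside the loop makes the "quadratic and higher" remark in the comments concrete: the discrepancy between the true change and the linear term stays at the scale of $\lVert\delta r\rVert^2$.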

sangstar
  • 1,947
  • The Taylor series expansion packs a lot of information, but it is only valid (in the sense of converging to $f(r)$) for special cases (when $f$ is analytic). Your Question seems to me to ask why the gradient is the gradient if a function has a Taylor series. Keep in mind that the ... terms omitted involve quadratic and higher powers of components of $\delta r$. – hardmath Feb 13 '18 at 19:43
  • Right, but those terms would be vanishingly small since $\delta r$ is already small, would they not? And what do you mean by $f$ is analytic? – sangstar Feb 13 '18 at 19:48
  • Well, analytic functions are special in having arbitrarily many derivatives, while we only need first partial derivatives to define the gradient. I'm just pointing out that the Taylor series notation you used presumes that the function $f$ has not only a gradient at $r$ but also all higher order derivatives. – hardmath Feb 13 '18 at 19:58
  • Oh. So perhaps a directional derivative justification is a more precise explanation. – sangstar Feb 13 '18 at 20:01
  • Yes, more precise in focusing on what needs to be/can be explained. A direction of "steepest ascent" implies at least one directional derivative exists, and suggests maybe they all do, so we can compare. In any case it helps to make a more rigorous argument to say something about the mixed partial derivatives. – hardmath Feb 13 '18 at 20:13

2 Answers

1

Without Taylor series or polynomials, it follows directly from the directional derivative formula for a differentiable function. The directional derivative (instantaneous rate of change) of $f$ at $\mathbf a$ in the direction of a unit vector $\mathbf v$ is given by $$D_{\mathbf v}f(\mathbf a) = \nabla f(\mathbf a)\cdot\mathbf v,$$ and so you get the maximum rate of change when you move in the direction of $\nabla f(\mathbf a)$ and a zero rate of change when you move orthogonal (perpendicular) to $\nabla f(\mathbf a)$. (This is why the gradient vector gives the normal vector to level sets of $f$.)
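Both halves of this claim, and the agreement between the limit definition discussed in the comments below and the dot-product formula, can be illustrated numerically (my own sketch; the function $f(x,y)=\sin(x)\,y$ and the point are arbitrary choices):

```python
import numpy as np

def f(p):
    x, y = p
    return np.sin(x) * y          # an arbitrary differentiable example

def grad_f(p):
    x, y = p
    return np.array([np.cos(x) * y, np.sin(x)])

a = np.array([0.5, 1.5])
g = grad_f(a)

def directional_derivative(f, a, v, t=1e-6):
    # d/dt f(a + t v) at t = 0, approximated by a symmetric quotient
    return (f(a + t*v) - f(a - t*v)) / (2*t)

# 1) The limit-definition quotient matches the dot-product formula
v = np.array([0.6, 0.8])          # some unit vector
assert abs(directional_derivative(f, a, v) - g @ v) < 1e-6

# 2) The rate is maximal along the gradient, zero perpendicular to it
u = g / np.linalg.norm(g)         # unit vector along grad f(a)
perp = np.array([-u[1], u[0]])    # unit vector perpendicular to it
print(directional_derivative(f, a, u))     # ~ ||grad f(a)||
print(directional_derivative(f, a, perp))  # ~ 0
```

Note that `directional_derivative` never touches the gradient, which is exactly the separation of the two notions urged in the comments: the rate of change in a direction is defined on its own, and the dot-product formula is then a theorem relating it to $\nabla f$.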

Ted Shifrin
  • 115,160
  • The problem I have with this answer, which I think may stem from a lack of understanding of directional derivatives, is this: if I had already accepted that the gradient points toward the maximal rate of change of the function, then it would be readily apparent that your answer implies the gradient is in the direction of the maximum rate of change. However, I haven't accepted that fact yet, because I can't justify it to myself, and this answer seems to presuppose the very thing I'm trying to accept in the first place. – sangstar Feb 13 '18 at 19:18
  • Well, I can't try to aim for a moving target. You have to tell me what you accept and understand. – Ted Shifrin Feb 13 '18 at 19:19
  • Am I making sense? If the gradient points in the direction of greatest change, then any vector parallel would maximize the dot product. The gradient is parallel to itself, so it would be a good vector for a high directional derivative. However, I think I have to first accept that the gradient points in the direction of greatest change in the first place, for this answer to ring clear to me. Also, I'll get to answering your last comment right now. – sangstar Feb 13 '18 at 19:20
  • No, you're not making sense :) We have the gradient vector, and we're looking at all possible directions $\mathbf v$. We look to see what the rate of change in direction $\mathbf v$ is and try to choose $\mathbf v$ to make it largest possible. We use properties of the dot product to see that this $\mathbf v$ must be a positive scalar multiple of $\nabla f(\mathbf a)$. This in turn gives us the interpretation of the gradient. ... – Ted Shifrin Feb 13 '18 at 19:23
  • Now, if you'll accept the chain rule, you can easily prove that this directional derivative formula is correct by taking $\mathbf g(t) = \mathbf a + t\mathbf v$ and computing $\dfrac d{dt}\Big|_{t=0} f(\mathbf g(t))$. – Ted Shifrin Feb 13 '18 at 19:23
  • I'll explain my interpretation of the gradient and directional derivative. To me, the gradient is a vector that contains all the partial derivatives of a function. The directional derivative is a scalar value that, in my view, indicates how well aligned the gradient and some arbitrary unit vector are, as it is a dot product. However, I don't see how the gradient being a vector of partial derivatives points to maximal rate of change in the first place. If I did see, your answer would be clear to me. – sangstar Feb 13 '18 at 19:25
  • No, the directional derivative is the rate of change in a certain direction. That is defined and computable independent of the gradient vector. It is not defined by that dot product formula. The chain rule, for example, will give you the dot product formula. So you are in fact chasing your own tail until you separate the two notions. – Ted Shifrin Feb 13 '18 at 19:28
  • Interesting. The dot product formula was my only exposure to the directional derivative. It's defined and computable without the gradient? I've never seen a definition with it that didn't include the gradient in some way. – sangstar Feb 13 '18 at 19:38
  • Use my comment four up as the definition :) In fact, you can take any differentiable curve $\mathbf g(t)$ in there, but do it with the line and you'll be fine. (I don't know how inclined you are to see proofs/derivations, but you might get something out of checking out some of my YouTube lectures from my multivariable mathematics course. See the link in my profile if you're curious.) – Ted Shifrin Feb 13 '18 at 19:40
  • I see that one can express it as something that looks like the limit definition of the derivative but using the vector function $\mathbf a + t\mathbf v$, looking something like the limit as $h$ approaches $0$ of $\frac{f(\mathbf a + t\mathbf v) - f(\mathbf x)}{h}$. Are you perhaps referring to something along these lines? The chain rule bit is confusing to me, as I know $g(t)$ but not $f(g(t))$, so I'm not sure what I would be computing the chain rule on. – sangstar Feb 13 '18 at 19:45
  • Yes, officially it's $\lim\limits_{t\to 0}\dfrac{f(\mathbf a+t\mathbf v)-f(\mathbf a)}t$, which is precisely what I wrote if you plug in $g(t) = \dots$. Chain rule is precisely about differentiating a composition of functions! – Ted Shifrin Feb 13 '18 at 19:47
  • I see. Not to pester you for too much longer in one answer, but why then, does Wikipedia state that that official definition is equal to $\nabla_v f(\mathbf x)$? – sangstar Feb 13 '18 at 19:50
  • That's just definition of notation. The formula with the dot product of the vector with the gradient is a theorem, not a definition. (You need a technical assumption of differentiability, which is more than the usual engineering-style multivariable calculus course ever presents.) – Ted Shifrin Feb 13 '18 at 19:54
  • I suppose I do! I'll try and find some explanation of it. I would have thought I had an idea, since that word shouts "continuous" at me, but if this isn't clear to me I evidently need to brush up on it in higher dimensions. – sangstar Feb 13 '18 at 20:00
0

Not sure if it is of help anymore, but I've been trying to answer the same question myself: to get the intuition behind why the gradient gives the direction of maximum rate of change of a function. I believe it is covered very satisfactorily in https://tutorial.math.lamar.edu/classes/calciii/directionalderiv.aspx

For completeness related to the thread, and in order to summarize the aforementioned link, I would say the following:

  • The directional derivative contains the information of the rate of change of a function, say $z=f(x,y)$ for simplicity, since its definition accounts for how much $f$ changes when moving from an initial point $(x,y)$ to $(x+ah, y+bh)$, i.e. along the vector $\langle a,b \rangle$.
  • The gradient is a vector defined as $\nabla f = \langle f_x,f_y \rangle$, so that the directional derivative can be written $D_{\vec{u}}f = \nabla f \cdot \vec{u}$, where $\vec{u}$ is a unit vector in the direction of interest.
  • The last formula can be rewritten taking into account the moduli and angle of the vectors. In fact, ${ D_{\vec{u}}f = \lVert \nabla f \rVert \lVert\vec{u}\rVert \cos\theta = \lVert \nabla f \rVert \cos\theta }$, with $\theta$ the angle in between the gradient vector and the unit direction vector.
  • Obviously, then, the maximum directional derivative occurs when $\theta=0$, and thus the gradient gives both the direction and the magnitude of the greatest rate of change of the function.

Also, noting that the gradient vector happens to be perpendicular to the level curves, and that the level curves cluster together where the rate of change of the function is largest, may give further intuition on the matter.
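Both the $\lVert\nabla f\rVert\cos\theta$ formula and the perpendicularity to level curves can be verified numerically (a sketch with $f(x,y)=x^2+y^2$, whose level curves are circles about the origin; the function and point are my own choices):

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + y**2            # level curves are circles

def grad_f(p):
    return 2 * np.array(p)

p = np.array([3.0, 4.0])
g = grad_f(p)
g_norm = np.linalg.norm(g)
u_dir = g / g_norm                # unit vector along the gradient

# D_u f = ||grad f|| cos(theta): sweep the angle between u and grad f
for theta in (0.0, np.pi/4, np.pi/2, np.pi):
    c, s = np.cos(theta), np.sin(theta)
    # rotate the unit gradient direction by theta
    u = np.array([c*u_dir[0] - s*u_dir[1], s*u_dir[0] + c*u_dir[1]])
    assert abs(g @ u - g_norm * np.cos(theta)) < 1e-9

# Perpendicular to the level curve: the tangent to the circle at p is (-y, x)
tangent = np.array([-p[1], p[0]])
print(g @ tangent)   # 0: the gradient is normal to the level set
```

The loop confirms that the directional derivative falls off as $\cos\theta$, peaking at $\theta=0$ and vanishing at $\theta=\pi/2$, which is the tangent direction to the level curve.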