
I'm having trouble figuring out how the gradient (which seems to be the derivative in multiple variables) can be both the direction of steepest ascent AND the perpendicular vector to a surface. Say we have a function $v = x^{2} + y^{2}$.

The gradient is then $[2x, 2y]$.

At the point $(-1, 1)$, say, the gradient is $[-2, 2]$. Does this mean that at the point $(-1, 1)$ the direction of steepest ascent is the vector $[-2, 2]$?

If so, that to me implies that the steepest ascent direction is also tangent to the curve $x^{2} + y^{2}$ in some way, since it points in the direction of the curve. How can it also be perpendicular to the surface?

I'm obviously misunderstanding something but I need help in figuring out what.

Jwan622
  • 5,704

1 Answer


Let $f:\mathbb{R}^2 \rightarrow \mathbb{R}$ be differentiable.

  1. Direction of steepest ascent: we want to prove that among all directions (i.e. unit vectors $v \in \mathbb{R}^2$), the vector $\nabla f(x)/|\nabla f(x)|$ is the one that maximizes the function $v\mapsto \nabla f(x) \cdot v$ (where $\cdot$ is the dot product). We have $\nabla f(x) \cdot v=|\nabla f(x)||v|\cos \theta(v)=|\nabla f(x)|\cos \theta(v)$, since $|v|=1$. Here $\theta(v)$ is the angle between $v$ and the gradient. Hence the maximum is attained when $\theta(v)=0$, i.e. when $v$ points along the gradient with the same orientation. So $v=c\nabla f(x)$, with $c>0$. Taking the norm of both sides gives $c=1/|\nabla f(x)|$ (assuming the gradient is not $0$).
  2. Orthogonality: orthogonality is a bit trickier. How did you define it? Do you know what it means for a vector $v$ to be orthogonal to a surface, or to a curve? In any case, the gradient $\nabla f(x)$ is a $2$-dimensional vector, so it lives in $\mathbb{R}^2$, while the surface $\{(x,f(x))\}$ (the graph of $f$) lives in $\mathbb{R}^{2+1}$. The gradient $\nabla f(x)$ is orthogonal to the level set $\{y \mid f(y)=f(x)\}$, which is (sometimes) a smooth curve. However, a proof requires a definition of orthogonality (or tangency).
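
Both points can be checked numerically on the example from the question, $f(x,y)=x^2+y^2$ at $(-1,1)$. The following is only a rough sketch, not a proof: the helper names, the grid of 3600 sampled directions, and the choice of $(-y, x)$ as a tangent vector to the circle $x^2+y^2=2$ are my own assumptions for illustration.

```python
import math

def grad_f(x, y):
    # Gradient of the question's example f(x, y) = x**2 + y**2.
    return (2 * x, 2 * y)

def directional_derivative(point, angle):
    # grad f(point) . v for the unit vector v = (cos(angle), sin(angle)).
    gx, gy = grad_f(*point)
    return gx * math.cos(angle) + gy * math.sin(angle)

point = (-1.0, 1.0)

# Sample unit directions on a grid of angles (assumed resolution: 0.1 degree)
# and keep the one with the largest directional derivative.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_angle = max(angles, key=lambda a: directional_derivative(point, a))
best_direction = (math.cos(best_angle), math.sin(best_angle))

# Normalized gradient, the claimed direction of steepest ascent.
gx, gy = grad_f(*point)
norm = math.hypot(gx, gy)
unit_gradient = (gx / norm, gy / norm)

# The level set through the point is the circle x**2 + y**2 = 2.
# A tangent vector to that circle at (x, y) is (-y, x).
tangent = (-point[1], point[0])
dot_with_tangent = gx * tangent[0] + gy * tangent[1]

print("best sampled direction:", best_direction)
print("normalized gradient:   ", unit_gradient)
print("gradient . tangent:    ", dot_with_tangent)
```

The best sampled direction should agree with the normalized gradient, and the dot product with the tangent of the level curve should vanish. Note that everything here happens in the $(x,y)$-plane: the gradient is perpendicular to the level curve, not to the graph of $f$ in $\mathbb{R}^3$.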

We maximize $v\mapsto \nabla f(x) \cdot v$ because it is a (first order) approximation of $f(x+v)-f(x)$, which is what you call "ascent". It is a first order approximation since, by differentiability, $f(x+v)=f(x)+\nabla f(x) \cdot v+\text{higher order terms}$.
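
To see the first order approximation at work, here is a small sketch using the same example $f(x,y)=x^2+y^2$ at $(-1,1)$. The step sizes `t` and the scaling of the unit direction by `t` are assumptions I add for illustration, since the approximation is only accurate for small displacements.

```python
def f(x, y):
    # The question's example function.
    return x ** 2 + y ** 2

def grad_f(x, y):
    return (2 * x, 2 * y)

x0, y0 = -1.0, 1.0
gx, gy = grad_f(x0, y0)

# Unit vector along the gradient (-2, 2).
dx, dy = -(2 ** -0.5), 2 ** -0.5

steps = (0.1, 0.01, 0.001)  # assumed step sizes
errors = []
for t in steps:
    actual = f(x0 + t * dx, y0 + t * dy) - f(x0, y0)  # true change in f
    linear = t * (gx * dx + gy * dy)                  # grad f . (t v)
    errors.append(abs(actual - linear))
    print(t, actual, linear, errors[-1])
```

For this particular $f$ the higher order terms are exactly $t^2|v|^2 = t^2$, so the error shrinks quadratically while the linear term shrinks only linearly in $t$.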

We choose $|v|=1$ because it is a standard way of defining a "direction", but also because you can make $\nabla f(x) \cdot v$ as big as you want if you don't limit the norm $|v|$. Hence requiring $|v|=1$ is a way of making the problem "well-phrased".

Lilla
  • 2,099