On "the Hessian is the Jacobian of the gradient"

Question

According to Wikipedia,

The Hessian matrix of a function $f$ is the Jacobian matrix of the gradient of the function $f$; that is: $H(f(x)) = J(\nabla f(x))$.

Suppose $f : \Bbb R^m \to \Bbb R^n, x \mapsto f(x)$ and $f \in C^2 (\Bbb R^m)$. Here, I regard points in $\Bbb R^m$ and $\Bbb R^n$ as column vectors, so $f$ sends column vectors to column vectors. When $n=1$, we can define $\nabla f: \Bbb R^m \to (\Bbb R^m)^t, x\mapsto\nabla f(x)$, which sends column vectors to row vectors. I use $(\Bbb R^m)^t$ to denote the space of row vectors; this is just ad hoc notation.

We do have a good definition of the Jacobian for functions that send column vectors to column vectors, but what can we say about functions that send column vectors to row vectors?

I discovered that if I manipulate $\nabla f(x)$ as a column vector, then I know how to do the calculation, and my result agrees with Wikipedia. But I don't think we can "manipulate $\nabla f(x)$ as a column vector".

I'm not sure I understand exactly what you're asking; maybe it's about the convention for gradients. On Wikipedia, the gradient is a column vector, the transpose of the Jacobian. — Kyle Miller, Mar 09 '21 at 03:22
@KyleMiller Please see https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant, it says "When $m = 1$, that is when $f : \Bbb R^n \to \Bbb R$ is a scalar-valued function, the Jacobian matrix reduces to a row vector. This row vector of all first-order partial derivatives of $f$ is the gradient of $f$, i.e. $\mathbf{J}_f = \nabla f$." So when $f$ is a scalar function, then $J_f=\nabla f$, not $J_f ^T=\nabla f$. — cxh007, Mar 09 '21 at 03:30
That is not correct, since $J_f$ is a row vector and $\nabla f$ is a column vector. There should be a transpose there. — Kyle Miller, Mar 09 '21 at 03:33
Thanks for pointing this out -- I corrected the Wikipedia article. — Kyle Miller, Mar 09 '21 at 04:03
https://math.stackexchange.com/questions/2053229/the-connection-between-the-jacobian-hessian-and-the-gradient — HappyFace, Jun 11 '23 at 15:00

Kyle Miller · Answer 1 · 2021-03-09T04:05:39.073

The relationship between gradients and Jacobians is the transpose. Suppose $f:\mathbb{R}^n\to\mathbb{R}$ is some $C^1$ function.

$J_f$ takes a point of $\mathbb{R}^n$ and produces a $1\times n$ matrix that is able to calculate directional derivatives. That is to say, if $\vec{v}\in\mathbb{R}^n$ is a vector and $p\in\mathbb{R}^n$ is a point, then $J_f(p)\vec{v}$ is a $1\times 1$ matrix whose entry is the directional derivative at $p$ in the $\vec{v}$ direction.

$\nabla f$ takes a point of $\mathbb{R}^n$ and produces a vector in $\mathbb{R}^n$ that can be used to calculate directional derivatives using the dot product. That is, with the same $p$ and $\vec{v}$ above, $\vec{v}\cdot\nabla f(p)$ is the directional derivative at $p$ in the $\vec{v}$ direction.

Recall that if $\vec{v},\vec{w}\in\mathbb{R}^n$ are vectors, then $\vec{v}\cdot\vec{w}=\vec{v}^T\vec{w}$, where we pretend that $1\times 1$ matrices are the same as scalars for this equation to make sense. Then, since the dot product is symmetric, we have that $\vec{v}\cdot\nabla f(p) = (\nabla f(p))^T\vec{v}$, where the latter is commonly written as $(\nabla^Tf(p))\vec{v}$. This also represents the directional derivative, and it holds for all $\vec{v}$ and $p$, so $\nabla^Tf=J_f$.
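As a quick numerical sanity check of the identity $J_f(p)\vec{v} = \vec{v}\cdot\nabla f(p)$, here is a minimal Python sketch. The function `f`, the point `p`, and the direction `v` are arbitrary choices made for illustration and are not part of the argument above; NumPy is assumed to be available.

```python
import numpy as np

def f(x):
    # Hypothetical example: f(x, y) = x^2 * y + y^3
    return x[0]**2 * x[1] + x[1]**3

def grad_f(x):
    # Analytic gradient of the example f, stored as a 1-D array
    return np.array([2 * x[0] * x[1], x[0]**2 + 3 * x[1]**2])

p = np.array([1.5, -0.5])   # point (arbitrary)
v = np.array([0.3, 2.0])    # direction (arbitrary)

# Jacobian of a scalar function: a 1 x n row matrix, the transpose of the gradient
J = grad_f(p).reshape(1, -1)

# J_f(p) v is a 1 x 1 matrix; extract its single entry
via_jacobian = (J @ v.reshape(-1, 1))[0, 0]

# v . grad f(p) via the dot product
via_dot = v @ grad_f(p)

# Directional derivative approximated by a central difference
h = 1e-6
via_limit = (f(p + h * v) - f(p - h * v)) / (2 * h)

print(via_jacobian, via_dot, via_limit)
```

All three printed numbers should agree up to finite-difference error, which is the content of $\nabla^T f = J_f$ for this particular example.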

Let's take a look at the equation $H_f=J_{\nabla f}$ for the Hessian, where $f:\mathbb{R}^n\to\mathbb{R}$ is a $C^2$ function. We have that $\nabla f$ is a function $\mathbb{R}^n\to\mathbb{R}^n$, taking points to column vectors, and so $J_{\nabla f}$ is going to be an $n\times n$ matrix. Using $(\nabla f)_i = \frac{\partial f}{\partial x_i}$, we get $$(H_f(p))_{ij} = (J_{\nabla f}(p))_{ij} = \frac{\partial (\nabla f)_i}{\partial x_j}(p) = \frac{\partial^2f}{\partial x_j\partial x_i}(p),$$ as expected.
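The index computation above can also be checked numerically. The following is only a sketch under stated assumptions: the example function, its hand-computed gradient and Hessian, and the helper `jacobian_fd` (a plain central-difference Jacobian) are all hypothetical names introduced for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def grad_f(x):
    # Gradient of the hypothetical example f(x, y) = x^2 * y + y^3
    return np.array([2 * x[0] * x[1], x[0]**2 + 3 * x[1]**2])

def hessian_f(x):
    # Hand-computed matrix of second partial derivatives of the same f
    return np.array([[2 * x[1], 2 * x[0]],
                     [2 * x[0], 6 * x[1]]])

def jacobian_fd(g, x, h=1e-6):
    # Central-difference Jacobian of g: R^n -> R^m.
    # Column j holds dg/dx_j, so entry (i, j) is d g_i / d x_j.
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((g(x + e) - g(x - e)) / (2 * h))
    return np.stack(cols, axis=1)

p = np.array([1.5, -0.5])        # arbitrary test point
J = jacobian_fd(grad_f, p)       # Jacobian of the gradient at p

print(np.allclose(J, hessian_f(p), atol=1e-6))  # compare with the Hessian
print(np.allclose(J, J.T, atol=1e-6))           # symmetry check
```

For this $C^2$ example the Jacobian of the gradient matches the hand-computed Hessian and is symmetric, so the order of the indices $i$ and $j$ makes no difference here.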

The last line shows that the Hessian is actually the transpose of the Jacobian of the gradient. Can you correct this on Wikipedia? — HappyFace, Jun 07 '22 at 15:58
I just noticed that the Hessian is usually symmetric, so the transpose doesn't matter. (But what about the exceptional cases?) — HappyFace, Jun 11 '23 at 14:58
