60

I am wondering if there is any quick way to see the following two definitions of Hessian coincide with each other without using local coordinates.

  • $\operatorname{Hess}(f)(X,Y)= \langle \nabla_X \operatorname{grad}f,Y \ \rangle$; and
  • $\operatorname{Hess}(f)(X,Y)=X (Yf) - (\nabla_XY) f$.
Boar
  • 125
user17150
  • 965

3 Answers

79

This question arises from the last two equations in the Wikipedia article. This is a common problem in differential geometry, namely too many different notations mixed up in one place.

The definition is clearly stated in the aforementioned article. The Hessian of a smooth function $f:M\rightarrow \mathbb{R}$ on an arbitrary smooth manifold with an arbitrary connection $\nabla$ is the iterated covariant derivative of the function $f$, that is $$ \operatorname{Hess}(f):=\nabla{\nabla{f}} $$ so that $\operatorname{Hess}(f) \in \Gamma(T^*M \otimes T^*M)$; in other words, it is a $(0,2)$-tensor field on $M$. For any two vector fields $X,Y$ on $M$ we have a smooth real-valued function $\operatorname{Hess}(f)(X,Y)=\nabla{\nabla{f}}\,(X,Y)$ on the manifold $M$. A priori, all we know about this function is that it is $C^\infty(M)$-bilinear in $X$ and $Y$.

Given a connection $\nabla$ we know how to compute $\nabla{f}\;$ for any smooth function, and $\nabla_X{Y}\;$ for any two (smooth) vector fields $X,Y$. This rule is $C^\infty(M)$-linear in the slot $X$ and satisfies the product rule in the slot $Y$. For tensors of all other types the covariant derivatives are computed using the requirements that the covariant derivative commute with contractions and satisfy the product rule with respect to the tensor product.

In particular, $\nabla{f}\equiv \operatorname{d}f\;$ by definition and $$ \nabla_Y{f}=\nabla{f}(Y)=\operatorname{d}f(Y)=Y\,f \tag{1} $$ This can also be seen as $$ \nabla{f}(Y)=\nabla{f}\cdot Y=\mathcal{C}(\nabla{f} \otimes Y) $$ where $\mathcal{C}$ denotes the contraction operator (also written as $\cdot$ above).

Now, using the declared properties of the covariant derivative, we can write down the following calculation $$ \begin{align} \nabla_{X}(\nabla{f}(Y)) &= \mathcal{C}(\nabla_X \nabla{f} \otimes Y + \nabla{f} \otimes \nabla_X{Y}) \\ &= \nabla{\nabla{f}}(X,Y) + \nabla_{\nabla_X{Y}}{f} \end{align} $$ Rewriting this using convention (1) we get $$ \nabla{\nabla{f}}(X,Y)=X(Y\,f)-(\nabla_X{Y})f \tag{2} $$

Now let us recall that in Riemannian geometry we have canonical isomorphisms between tangent and cotangent spaces (the so-called musical isomorphisms), so we can identify $$ \operatorname{d}f \equiv \operatorname{grad}(f) $$ However, one must keep in mind that this really means $$ \operatorname{grad}(f) = \operatorname{d}f^\# $$ which by definition is the unique vector field such that $$ g(\operatorname{grad}(f), Y)=g(\operatorname{d}f^\#, Y) = \operatorname{d}f(Y) = Y\,f $$
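The defining property of $\operatorname{grad}(f)$ can be checked directly in coordinates. A minimal SymPy sketch (my own illustration, not from the thread), using the polar metric $g = \operatorname{d}r^2 + r^2\,\operatorname{d}\theta^2$ on $\mathbb{R}^2\setminus\{0\}$ with an arbitrarily chosen test function and vector field:

```python
# Sketch: raising the index of df with g^{-1} gives the unique vector field
# satisfying g(grad f, Y) = df(Y) = Y f for every vector field Y.
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = [r, th]
g = sp.Matrix([[1, 0], [0, r**2]])           # polar metric on R^2 \ {0}
ginv = g.inv()

f = r**2 * sp.sin(th)                        # arbitrary test function
df = [sp.diff(f, xi) for xi in x]            # components of the 1-form df
grad = [sum(ginv[i, j] * df[j] for j in range(2)) for i in range(2)]  # (df)^#

Y = [sp.cos(th), r]                          # arbitrary vector field
lhs = sum(g[i, j] * grad[i] * Y[j] for i in range(2) for j in range(2))  # g(grad f, Y)
rhs = sum(Y[i] * df[i] for i in range(2))    # df(Y) = Y f
print(sp.simplify(lhs - rhs))                # 0
```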

Another feature of Riemannian geometry is that we use the Levi-Civita connection by default.

Now using (2) it is easy to complete what @Jason suggested: $$ \begin{align} \nabla{\nabla{f}}(X,Y) &= X(g(\operatorname{grad}(f), Y)) - g(\operatorname{grad}(f), \nabla_X{Y}) \\ &= g(\nabla_X \operatorname{grad}(f), Y) + g(\operatorname{grad}(f), \nabla_X{Y}) - g(\operatorname{grad}(f), \nabla_X{Y}) \\ &= g(\nabla_X \operatorname{grad}(f), Y) \end{align} $$ quod erat demonstrandum.
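Although the question asks for a coordinate-free argument, the final identity can be sanity-checked symbolically in local coordinates. A SymPy sketch (my own, not part of the answer) on $\mathbb{R}^2\setminus\{0\}$ with the polar metric and arbitrarily chosen $f$, $X$, $Y$:

```python
# Sketch: verify that the two definitions of the Hessian,
#   Hess f(X, Y) = X(Y f) - (∇_X Y) f   and   g(∇_X grad f, Y),
# agree for the Levi-Civita connection of g = dr^2 + r^2 dθ^2.
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = [r, th]
g = sp.Matrix([[1, 0], [0, r**2]])           # metric components g_ij
ginv = g.inv()

def Gamma(k, i, j):                          # Christoffel symbols Γ^k_{ij}
    return sum(ginv[k, l] * (sp.diff(g[l, i], x[j]) + sp.diff(g[l, j], x[i])
                             - sp.diff(g[i, j], x[l])) / 2 for l in range(2))

f = r**2 * sp.cos(th)                        # arbitrary test data
X = [sp.sin(th), r]
Y = [r, sp.cos(th)]

def vec(V, u):                               # directional derivative V(u)
    return sum(V[i] * sp.diff(u, x[i]) for i in range(2))

def cov(V, W):                               # (∇_V W)^k = V(W^k) + Γ^k_{ij} V^i W^j
    return [vec(V, W[k]) + sum(Gamma(k, i, j) * V[i] * W[j]
                               for i in range(2) for j in range(2))
            for k in range(2)]

# Definition 2: Hess f(X,Y) = X(Y f) - (∇_X Y) f
hess2 = vec(X, vec(Y, f)) - vec(cov(X, Y), f)

# Definition 1: grad f = g^{ij} ∂_j f, then g(∇_X grad f, Y)
grad = [sum(ginv[i, j] * sp.diff(f, x[j]) for j in range(2)) for i in range(2)]
hess1 = sum(g[i, j] * cov(X, grad)[i] * Y[j]
            for i in range(2) for j in range(2))

print(sp.simplify(hess1 - hess2))            # 0: the two definitions coincide
```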

Yuri Vyatkin
  • 11,279
  • @gofvonx yes, it is the definition. Recall that the connection maps sections to bundle-valued forms: $\nabla \colon \Gamma(E) \to \Gamma(T^{*}M) \otimes \Gamma(E)$ and compare this with the action of $\mathrm{d}$ on functions. There is a little clash with the notation in multivariable calculus, where $\nabla f$ means the gradient. – Yuri Vyatkin Aug 22 '13 at 08:06
  • 1
    @gofvonx This is a legitimate question, of course, but we want the whole construction to be natural, so the exterior derivative is the only possible choice. See more here – Yuri Vyatkin Aug 22 '13 at 10:02
  • So the crucial point is to identify $\mathsf{grad}f$ with $df$ via the isomorphism $\flat$, please? – Boar Feb 05 '22 at 11:48
  • 3
    @Steve yes, if you like to say so. It is very common in Riemannian Geometry to use these identifications without explicit mentioning, which may be quite confusing for the beginners. I simply attempted to make all the silent assumptions visible. – Yuri Vyatkin Feb 06 '22 at 01:27
19

Hint: $X\langle \operatorname{grad} f, Y\rangle = \langle \nabla_X \operatorname{grad} f, Y\rangle + \langle \operatorname{grad} f, \nabla_X Y\rangle$. Now use the definition of $\operatorname{grad} f$.

5

On one hand, you can define the Hessian without mentioning the gradient (so you don't have to raise or lower indices):

First, define the covariant derivative of a $1$-form $\theta$ as follows: It should satisfy the "product rule", $$ X\langle \theta, Y\rangle = \langle \nabla_X\theta, Y\rangle + \langle \theta, \nabla_XY\rangle. $$ Therefore, $\nabla_X\theta$ is the $1$-form satisfying $$ \langle \nabla_X\theta, Y\rangle = X\langle\theta,Y\rangle - \langle\theta,\nabla_XY\rangle. $$ It's easy to check that this is a $2$-tensor. The Hessian is simply the covariant derivative of $df$. In particular, $$ \langle \nabla^2f, X\otimes Y\rangle = \langle \nabla_Xdf, Y\rangle = X\langle df,Y\rangle - \langle df,\nabla_XY\rangle $$
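The non-obvious part of "it's easy to check that this is a $2$-tensor" is $C^\infty$-linearity in $Y$: each term involves derivatives of $Y$, but those contributions cancel. A SymPy sketch of that cancellation (my own check, with arbitrarily chosen metric, function, and fields), replacing $Y$ by $h\,Y$ and comparing:

```python
# Sketch: the combination  X⟨df, Y⟩ − ⟨df, ∇_X Y⟩  is C∞-linear in Y,
# i.e. substituting h·Y multiplies the result by h.
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = [r, th]
g = sp.Matrix([[1, 0], [0, r**2]])           # polar metric on R^2 \ {0}
ginv = g.inv()

def Gamma(k, i, j):                          # Levi-Civita Christoffel symbols
    return sum(ginv[k, l] * (sp.diff(g[l, i], x[j]) + sp.diff(g[l, j], x[i])
                             - sp.diff(g[i, j], x[l])) / 2 for l in range(2))

f = r * sp.sin(th)                           # arbitrary test data
X = [1, r]
h = r**2 + sp.cos(th)                        # an arbitrary smooth function

def vec(V, u):                               # directional derivative V(u)
    return sum(V[i] * sp.diff(u, x[i]) for i in range(2))

def nabla_df(Y):                             # ⟨∇_X df, Y⟩ = X⟨df,Y⟩ − ⟨df, ∇_X Y⟩
    covXY = [vec(X, Y[k]) + sum(Gamma(k, i, j) * X[i] * Y[j]
                                for i in range(2) for j in range(2))
             for k in range(2)]
    return vec(X, vec(Y, f)) - vec(covXY, f)

Y = [sp.cos(th), 1]
lhs = nabla_df([h * Y[0], h * Y[1]])         # plug in h·Y
rhs = h * nabla_df(Y)                        # h times the value at Y
print(sp.simplify(lhs - rhs))                # 0: tensorial in Y
```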

On the other hand, the gradient of $f$ is defined by its property that for any vector $Y$, $$ \langle df, Y\rangle = g(\nabla f, Y), $$ where $g$ is the Riemannian metric. Therefore, by the "product rule", $$ Xg(Z,Y) = g(\nabla_XZ,Y) + g(Z,\nabla_XY), $$ it follows that $$ \langle\nabla^2f,X\otimes Y\rangle = Xg(\nabla f, Y) - g(\nabla f,\nabla_XY) = g(\nabla_X\nabla f, Y) $$

Deane
  • 7,582