According to Wikipedia,
The Hessian matrix of a function $f$ is the Jacobian matrix of the gradient of the function $f$; that is: $H(f(x)) = J(\nabla f(x))$.
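As a worked instance of this definition, take the illustrative function $f(x,y) = x^2 y + y^3$ (chosen only for this example). If the gradient is written as a column, as Wikipedia implicitly does, its Jacobian is the matrix of second partial derivatives:

$$
\nabla f(x,y) = \begin{pmatrix} 2xy \\ x^2 + 3y^2 \end{pmatrix},
\qquad
J(\nabla f)(x,y) = \begin{pmatrix} \dfrac{\partial (2xy)}{\partial x} & \dfrac{\partial (2xy)}{\partial y} \\[2mm] \dfrac{\partial (x^2+3y^2)}{\partial x} & \dfrac{\partial (x^2+3y^2)}{\partial y} \end{pmatrix}
= \begin{pmatrix} 2y & 2x \\ 2x & 6y \end{pmatrix}.
$$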
Suppose $f : \Bbb R^m \to \Bbb R^n,\ x \mapsto f(x)$, with $f \in C^2 (\Bbb R^m)$. Here I regard points of $\Bbb R^m$ and $\Bbb R^n$ as column vectors, so $f$ sends column vectors to column vectors. When $n=1$, we can define $\nabla f: \Bbb R^m \to (\Bbb R^m)^t,\ x\mapsto\nabla f(x)$, which sends column vectors to row vectors. I use $(\Bbb R^m)^t$ to denote the space of row vectors; this is just an ad hoc notation.
We do have a good definition of the Jacobian for functions that send column vectors to column vectors, but what can we say about functions that send column vectors to row vectors?
I found that if I treat $\nabla f(x)$ as a column vector, then I know how to carry out the calculation, and my result agrees with Wikipedia. But I don't think we are really allowed to "treat $\nabla f(x)$ as a column vector".
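Here is a minimal numerical sketch of the calculation described above. It assumes the illustrative function $f(x,y) = x^2 y + y^3$. The gradient is stored as a plain list, which amounts to treating it as a column vector, and its Jacobian is approximated by central differences. The helper names `grad_f`, `jacobian` and `hessian_exact` are hypothetical and exist only for this example.

```python
# Illustrative function (hypothetical example): f(x, y) = x^2 * y + y^3
def f(x, y):
    return x**2 * y + y**3

# Gradient of f, stored as a flat list, i.e. treated as a column vector
def grad_f(p):
    x, y = p
    return [2 * x * y, x**2 + 3 * y**2]

# Central-difference Jacobian of a vector-valued g: J[i][j] ~ d g_i / d p_j
def jacobian(g, p, h=1e-5):
    n_out = len(g(p))
    J = [[0.0] * len(p) for _ in range(n_out)]
    for j in range(len(p)):
        plus = list(p)
        plus[j] += h
        minus = list(p)
        minus[j] -= h
        gp, gm = g(plus), g(minus)
        for i in range(n_out):
            J[i][j] = (gp[i] - gm[i]) / (2 * h)
    return J

# Hand-computed matrix of second partial derivatives of f
def hessian_exact(p):
    x, y = p
    return [[2 * y, 2 * x], [2 * x, 6 * y]]

p = [1.5, -0.7]
H_num = jacobian(grad_f, p)   # Jacobian of the gradient, computed numerically
H_ex = hessian_exact(p)       # matrix of second partials
```

At the sample point the two matrices agree up to rounding. The numerical Jacobian also comes out symmetric, as expected for a $C^2$ function.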