Thanks to the community, the question "what is the gradient of a function $f(X)$ when $X$ is a symmetric matrix?" has become relatively clear.
In particular, I report here some useful links that have addressed this question in the past and can serve as references to proceed further in this discussion:
- Understanding notation of derivatives of a matrix
- Taylor expansion of a function of a symmetric matrix
- https://arxiv.org/pdf/1911.06491.pdf
In a nutshell, we can say that, for a function with a matrix argument, we have to distinguish between two "different", but related, gradients:
- the unconstrained gradient $G$, computed with standard matrix calculus treating all entries of the matrix $X$ as independent, and used for the computation of the differential of the function, i.e. $G:dX$;
- the constrained gradient $S$, which accounts only for the independent entries of the matrix $X$.
These two gradients are related by the expression:
$$S=G+G^{T}-I \circ G $$
and it turns out that the first-order differential of the function $f$ at a given point $X$ under a perturbation $dX$ can be computed as:
$$ d f=\sum_{i, j} G_{i j} d X_{i j} = \sum_{i \geq j} S_{i j} d X_{i j}$$
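As a quick numerical sanity check of this identity (a minimal sketch; the test function $f(X)=a^{\top}Xb$ is my own choice, picked because its unconstrained gradient $G=ab^{\top}$ is generally not symmetric):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
a, b = rng.standard_normal(n), rng.standard_normal(n)

# f(X) = a' X b  ->  unconstrained gradient G = a b', generally not symmetric
G = np.outer(a, b)
S = G + G.T - np.eye(n) * G      # constrained gradient: S = G + G' - I∘G

# symmetric perturbation dX
E = rng.standard_normal((n, n))
dX = (E + E.T) / 2

lhs = np.sum(G * dX)             # sum_{i,j} G_ij dX_ij
i, j = np.tril_indices(n)        # index pairs with i >= j
rhs = np.sum(S[i, j] * dX[i, j]) # sum_{i>=j} S_ij dX_ij
print(np.isclose(lhs, rhs))      # True
```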
It is important to note, however, that in an iterative algorithm that updates a variable $X^{k}$ (such as gradient descent), we have to use the constrained gradient $S$ and not the unconstrained gradient $G$, because $X$ is symmetric while $G$ may not be; see the sketch below.
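As a toy illustration (my own example, minimizing $f(X)=\tfrac{1}{2}\|X-M\|_F^2$ over symmetric $X$): stepping along $S$ keeps the iterates symmetric and converges to the constrained minimizer $(M+M^{\top})/2$, whereas stepping along $G=X-M$ would pull the iterates off the symmetric manifold.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))      # generic, non-symmetric target

# minimize f(X) = 0.5 * ||X - M||_F^2 over symmetric X
X = np.eye(n)
alpha = 0.3
for _ in range(200):
    G = X - M                        # unconstrained gradient, not symmetric
    S = G + G.T - np.eye(n) * G      # constrained gradient, symmetric
    X = X - alpha * S                # the update preserves symmetry

print(np.allclose(X, X.T))           # True: iterates stay symmetric
print(np.allclose(X, (M + M.T) / 2)) # True: the constrained minimizer
```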
More information can be found in the links above, which also explain the relation in terms of the $\operatorname{vec}(\cdot)$ and $\operatorname{vech}(\cdot)$ operators.
Coming to my question. I now want to find the Hessian of the function $f(X)$, which in theory is a $4$th-order tensor, and we already know what a rough road it was to get to the gradient.
To start, is it correct to perturb the first-order differential (with the unconstrained gradient)? If so, I arrive at a scalar quadratic form. For instance, if we consider the function $f(X)=\log \operatorname{det} X$, we know that the second-order (bilinear) term with perturbations $U$ and $V$ is given by (I refer to this question: Second order approximation of log det X):
$$-\operatorname{tr}\left(X^{-1} U X^{-1} V\right) = - \operatorname{vec}(U^{\top})^{\top}(X^{-\top} \otimes X^{-1}) \operatorname{vec}(V)$$
From this we can read off the Hessian in matrix form, $-\left(X^{-\top} \otimes X^{-1}\right)$; a numerical check is below.
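As a quick NumPy verification of the identity above (a minimal sketch; $\operatorname{vec}(\cdot)$ is taken as column-major stacking):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)          # symmetric positive definite point
U = rng.standard_normal((n, n))
V = rng.standard_normal((n, n))

Xinv = np.linalg.inv(X)
lhs = -np.trace(Xinv @ U @ Xinv @ V)

H = np.kron(Xinv.T, Xinv)                 # X^{-T} ⊗ X^{-1} (here X^{-T} = X^{-1})
vec = lambda M: M.reshape(-1, order="F")  # column-stacking vec(·)
rhs = -vec(U.T) @ H @ vec(V)

print(np.isclose(lhs, rhs))          # True
```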
My first question is: how can this be written in tensor form?
And my second question is: how do we obtain, in this case, the constrained Hessian?