
I am quite a beginner in linear algebra and matrix calculus. I was wondering what the derivative of the matrix inverse is when the matrix is symmetric. More precisely, I'm looking for $\frac{\partial}{\partial \mathbf{X}} \mathbf{X}^{-1}$ when $\mathbf{X}$ is a symmetric matrix.

I am asking this because I have a function $f: \mathbb{R}^{n\times n} \to \mathbb{R}$ of the form \begin{equation*} f(\mathbf{X}) = \mathrm{trace}(\mathbf{A} \mathbf{X}^{-1}) - \log |\mathbf{X}| \end{equation*} and I want to find its extrema using derivatives. I also know that if $\mathbf{X}$ is symmetric, then \begin{align*} \frac{\partial \mathrm{trace} (\mathbf{A} \mathbf{X})}{\partial \mathbf{X}} & = \mathbf{A} + \mathbf{A}^T - (\mathbf{A} \circ \mathbf{I}) \\ \frac{\partial \log |\mathbf{X}|}{\partial \mathbf{X}} & = 2 \mathbf{X}^{-1} - (\mathbf{X}^{-1} \circ \mathbf{I}) \end{align*} (from the Matrix Cookbook, Section 2.5: http://www.mit.edu/~wingated/stuff_i_use/matrix_cookbook.pdf).

I somehow want to use the above with the chain rule to write \begin{equation*} \frac{\partial f}{\partial \mathbf{X}} = \frac{\partial f}{\partial \mathbf{X}^{-1}} \frac{\partial \mathbf{X}^{-1}}{\partial \mathbf{X}} \end{equation*} and compute the derivative of $f$ with respect to $\mathbf{X}$ (since I can easily write $\log |\mathbf{X}| = - \log |\mathbf{X}^{-1}|$).
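The identity underlying all of the derivative formulas here is $d(\mathbf{X}^{-1})[\mathbf{H}] = -\mathbf{X}^{-1}\mathbf{H}\mathbf{X}^{-1}$ for the differential of the inverse. A minimal numpy sketch can check it by finite differences (the matrix size, random seed, and step size below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Symmetric positive definite X and a symmetric perturbation H
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)
H = rng.standard_normal((n, n))
H = (H + H.T) / 2

eps = 1e-6
Xinv = np.linalg.inv(X)

# Finite-difference directional derivative of X -> X^{-1} along H
fd = (np.linalg.inv(X + eps * H) - Xinv) / eps

# Closed form: d(X^{-1})[H] = -X^{-1} H X^{-1}
closed = -Xinv @ H @ Xinv

print(np.max(np.abs(fd - closed)))  # tiny: agreement to finite-difference accuracy
```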

MikeL
Those formulas for derivatives with respect to symmetric matrices are quite simply incorrect. They should not contain the $-(\cdots \circ \mathbf{I})$ term; instead they should be the symmetrized versions of the ordinary derivatives, divided by two. The incorrect version comes from treating the entries on and above the diagonal as a basis, but this "stretches" the space in those directions. See https://arxiv.org/abs/1911.06491 for the history and an analysis of what went wrong, and https://saturdaygenfo.github.io/posts/symmetric-gradients/ for why it matters in practice. – wnoise Jan 19 '22 at 16:24

3 Answers


Hint: use $(X^{-1})_{ij}=\frac{C_{ji}}{\operatorname{det}(X)}$, where $C_{ji}=(-1)^{i+j}M_{ji}$ is the cofactor and $M_{ji}$ is the $(j,i)$ minor of $X$; as $X$ is symmetric, $M_{ji}=M_{ij}$.

Then

$$\frac{\partial (X^{-1})_{ij}}{\partial X_{kl}}= \frac{\partial }{\partial X_{kl}}\left( \frac{(-1)^{i+j}M_{ij}}{\operatorname{det}(X)}\right).$$

Using the formula for the derivative of the determinant of $X$

$$\frac{\partial \det(X)}{\partial X_{kl}}= \det(X)(X^{-1})_{lk}$$

you can arrive at the result.
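The determinant-derivative formula above (Jacobi's formula) is easy to spot-check numerically. A minimal numpy sketch, with an arbitrary matrix, seed, and index pair $(k,l)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)  # generic invertible matrix
k, l = 1, 2
eps = 1e-6

# Finite-difference partial derivative of det(X) with respect to the entry X_{kl}
Xp = X.copy()
Xp[k, l] += eps
fd = (np.linalg.det(Xp) - np.linalg.det(X)) / eps

# Jacobi's formula: d det(X) / d X_{kl} = det(X) * (X^{-1})_{lk}
closed = np.linalg.det(X) * np.linalg.inv(X)[l, k]

print(abs(fd - closed))  # tiny
```

Since $\det(X)$ is linear in each single entry, the finite difference here is essentially exact.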

Avitus

Ignore the symmetry constraint for the moment. First rewrite the function using the Frobenius inner product $A:B={\rm tr}(A^TB)$ and the identity $\log|X|={\rm tr}(\log X)$: $$\eqalign{ f &= A^T:X^{-1} - {\rm tr}({\rm log}(X)) \cr }$$ Then take the differential, using $d(X^{-1})=-X^{-1}\,dX\,X^{-1}$: $$\eqalign{ df &= A^T:(-X^{-1}\,dX\,X^{-1}) - X^{-T}:dX \cr &= -X^{-T}A^TX^{-T}:dX - X^{-T}:dX \cr &= -(X^{-T}A^TX^{-T}+X^{-T}):dX \cr }$$ Since $df = \frac{\partial f}{\partial X}:dX$, the unconstrained derivative is $$\eqalign{ \frac{\partial f}{\partial X} &= -X^{-T}A^TX^{-T}-X^{-T} \cr }$$ The symmetry of $X$ (so that $X^{-T}=X^{-1}$) constrains the actual gradient to $$\eqalign{ g &= \Big(\frac{\partial f}{\partial X}\Big) + \Big(\frac{\partial f}{\partial X}\Big)^T - \Big(\frac{\partial f}{\partial X}\Big)\circ I \cr &= I\circ\Big(X^{-1}AX^{-1}+X^{-1}\Big)-X^{-1}(A+A^T)X^{-1}-2X^{-1} \cr }$$
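The unconstrained derivative can be checked numerically. The sketch below (assumed setup: a random $A$ and a symmetric positive definite $X$ so that $\log|X|$ is defined) compares $-X^{-T}A^TX^{-T}-X^{-T}$ against entrywise finite differences, treating all $n^2$ entries as independent:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))   # arbitrary A
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)       # symmetric positive definite, so log|X| exists

def f(Y):
    return np.trace(A @ np.linalg.inv(Y)) - np.log(np.linalg.det(Y))

# Unconstrained gradient: -X^{-T} A^T X^{-T} - X^{-T}
Xinv = np.linalg.inv(X)
grad = -Xinv.T @ A.T @ Xinv.T - Xinv.T

# Entrywise finite differences over all n^2 entries
eps = 1e-6
fd = np.zeros_like(X)
for i in range(n):
    for j in range(n):
        Xp = X.copy()
        Xp[i, j] += eps
        fd[i, j] = (f(Xp) - f(X)) / eps

print(np.max(np.abs(fd - grad)))  # tiny
```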

Update

Taking things a bit further, let $S=\frac{1}{2}(A+A^T)$.

Then $$\eqalign{ g &= I\circ\Big(X^{-1}SX^{-1}+X^{-1}\Big)-2X^{-1}SX^{-1}-2X^{-1} \cr &= I\circ B-2B \cr }$$ where $B=X^{-1}SX^{-1}+X^{-1}$. Setting the gradient to zero leaves us with $I\circ B=2B$: off the diagonal this reads $0=2B_{ij}$, and on the diagonal $B_{ii}=2B_{ii}$, so it can only be true if $B=0$. So now we solve $$\eqalign{ X^{-1}SX^{-1}+X^{-1} &= 0 \cr X &= -S = -\frac{1}{2}(A+A^T) \cr }$$ Compare this to the solution of the unconstrained problem, which is $$\eqalign{ X^{-1}AX^{-1}+X^{-1} &= 0 \cr X &= -A \cr }$$
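A numeric spot check of the stationary point (a sketch; $A$ is built with negative definite symmetric part so that $X=-S$ is positive definite and $\log|X|$ is defined): the directional derivative of $f$ along any symmetric $H$ should vanish at $X=-S$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
K = rng.standard_normal((n, n))
# A whose symmetric part S is negative definite, so X = -S is positive definite
A = -(M @ M.T + n * np.eye(n)) + (K - K.T)
S = (A + A.T) / 2
X = -S  # the claimed constrained stationary point

def f(Y):
    return np.trace(A @ np.linalg.inv(Y)) - np.log(np.linalg.det(Y))

# Directional derivative of f along a random symmetric H, via central differences
H = rng.standard_normal((n, n))
H = (H + H.T) / 2
eps = 1e-6
deriv = (f(X + eps * H) - f(X - eps * H)) / (2 * eps)
print(abs(deriv))  # tiny: X = -S is stationary within the symmetric matrices
```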

lynn

Assume that $A,X\in Sym$ (the $n\times n$ real symmetric matrices) and $\det(X)>0$. The derivative of $f$ is $Df_X:H\in Sym \rightarrow -tr((X^{-1}AX^{-1}+X^{-1})H)$ (cf. the first part of lynn's post). If $f$ reaches an extremum at $X$, then, for every $H\in Sym$, $tr((X^{-1}AX^{-1}+X^{-1})H)=0$. Choosing $H=X^{-1}AX^{-1}+X^{-1}$, we deduce that $X^{-1}AX^{-1}+X^{-1}=0$, that is, $AX^{-1}=-I$. If $A$ is not invertible, then there are no solutions; otherwise, necessarily $X=-A$; $-A$ is symmetric, and we obtain another necessary condition: $\det(-A)>0$.

Now $D^2f_X:(H,K)\in Sym^2\rightarrow tr(K(X^{-1}AX^{-2}+X^{-2}AX^{-1}+X^{-2})H)$ and the symmetrix matrix associated to $D^2f_X$ is $Hess(f)_X=X^{-1}AX^{-2}+X^{-2}AX^{-1}+X^{-2}$; when $X=-A$, we obtain $Hess(f)_{-A}=-A^{-2}$. Note that $-A^{-2}$ is symmetric $<0$. Then, for every symmetric $H$, $D^2f_{-A}(H,H)=-tr(HA^{-2}H)=-tr((A^{-1}H)^T(A^{-1}H))<0$ except if $A^{-1}H=0$, that is, if $H=0$. Finally $f(-A)=-n-\log(\det(-A))$ is a local maximum of $f$.