I'm trying to calculate the derivative of $\mathrm{tr}((I+X^{-1})^{-1})$ with respect to $X$. By some form of the chain rule, I believe this should be $X^{-1}(I+X^{-1})^{-2}X^{-1}$. However, I'm having a hard time finding a good reference for such a result. Any help would be greatly appreciated.
maybe P154 of this (in general, search matrix calculus or vector calculus) mason.gmu.edu/~jgentle/csi771/13f/matrixcalculus.pdf – Benjamin Wang Jul 16 '20 at 15:35
Thanks for the reference! – apprentice Aug 06 '20 at 04:19
3 Answers
An alternative to Ben Grossmann's approach.
We will use the Frobenius product, defined by \begin{align} A:B := \operatorname{tr}\left(A^T B \right) . \end{align}
Further, we will use the differential of the inverse of an invertible (and, by assumption, symmetric) matrix $X$: \begin{align} XX^{-1} = I \Longrightarrow dX X^{-1} + X dX^{-1} = 0 \Longleftrightarrow dX^{-1} = -X^{-1} dX X^{-1}. \end{align}
Define the following matrix together with its differential: \begin{align} M := \left(I + X^{-1} \right) \Longrightarrow dM = dX^{-1} = -X^{-1} dX X^{-1}. \end{align}
Now let $f := \operatorname{tr}\left( M^{-1} \right)$. We first compute its differential and then read off the gradient. Since $X$ is symmetric, so are $X^{-1}$ and $M^{-1}$, which is what allows these factors to be moved across the Frobenius product without transposes. \begin{align} df &= d\operatorname{tr}\left( M^{-1} \right) = d\operatorname{tr}\left( I M^{-1} \right) \\ &= I : dM^{-1} \\ &= I : -M^{-1} dM M^{-1} \\ &= - M^{-2} : dM \\ &= - M^{-2} : -X^{-1} dX X^{-1} \\ &= X^{-1} M^{-2} X^{-1} : dX \end{align}
Then the gradient is \begin{align} \frac{\partial f}{\partial X} = X^{-1} M^{-2} X^{-1} = X^{-1} \left(I + X^{-1} \right)^{-2} X^{-1} . \end{align}
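As a sanity check on the result above, here is a minimal NumPy sketch (not part of the original derivation). It assumes a randomly generated symmetric positive definite $X$ and compares the predicted directional derivative $G : dX$ against a central finite difference. The names `f`, `G`, `dX`, and `h` are illustrative choices, not anything from the answer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
I = np.eye(n)

# Assumption: X is symmetric positive definite, so X and I + X^{-1} are invertible.
A = rng.standard_normal((n, n))
X = A @ A.T + n * I


def f(X):
    """Return tr((I + X^{-1})^{-1})."""
    eye = np.eye(X.shape[0])
    return np.trace(np.linalg.inv(eye + np.linalg.inv(X)))


# Gradient claimed above: X^{-1} M^{-2} X^{-1} with M = I + X^{-1}.
Xinv = np.linalg.inv(X)
Minv = np.linalg.inv(I + Xinv)
G = Xinv @ Minv @ Minv @ Xinv

# The same matrix written as (I + X)^{-2}, for comparison.
B = np.linalg.inv(I + X)
alt = B @ B

# Directional derivative along a random direction dX.
dX = rng.standard_normal((n, n))
h = 1e-6
fd = (f(X + h * dX) - f(X - h * dX)) / (2 * h)  # central finite difference
pred = np.sum(G * dX)                           # Frobenius product G : dX

print("finite difference:", fd)
print("predicted G : dX :", pred)
print("G equals (I + X)^{-2}:", np.allclose(G, alt))
```

The two printed derivative values should agree to many digits, and the last line checks that this form of the gradient coincides with $(I+X)^{-2}$.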

Define the matrix function $$\eqalign{ &F = (I+X^{-1})^{-1} = X(I+X)^{-1} \\ &F + (I+X)^{-1} = (I+X)(I+X)^{-1} = I \\ }$$ so that $F$ can be rewritten and differentiated as $$\eqalign{ F &= I - (I+X)^{-1} \\ dF &= (I+X)^{-1}dX\,(I+X)^{-1} \\ }$$ Then calculate the differential and the gradient of its trace, using the cyclic property of the trace. $$\eqalign{ \phi &= {\rm Tr}(F) \\ d\phi &= {\rm Tr}(dF) \\ &= {\rm Tr}\Big((I+X)^{-1}dX\,(I+X)^{-1}\Big) \\ &= {\rm Tr}\Big((I+X)^{-2}dX\Big) \\ \frac{\partial\phi}{\partial X} &= (I+X)^{-2} \\ }$$

We can make life much easier if we use the matrix identity $$ (I + X^{-1})^{-1} = X(I + X)^{-1}. $$ Now, we compute the derivative of $f(X) = X(I + X)^{-1}$ in "differential form" as follows. $$ df = d[X(I + X)^{-1}] = dX(I + X)^{-1} + X d(I + X)^{-1}\\ = dX(I + X)^{-1} - X(I + X)^{-1}d(I + X)(I + X)^{-1}\\ = dX(I + X)^{-1} - X(I + X)^{-1}dX(I + X)^{-1}\\ = [I - X(I + X)^{-1}]dX(I + X)^{-1}. $$ Thus, by the cyclic property of the trace, we have $$ d\operatorname{tr}(f(X)) = \operatorname{tr}[[I - X(I + X)^{-1}]dX(I + X)^{-1}]\\ = \operatorname{tr}[(I + X)^{-1}[I - X(I + X)^{-1}]dX]. $$ We can now convert from differential form to get the numerator-layout derivative $$ \frac{\partial \operatorname{tr}(f(X))}{\partial X} = (I + X)^{-1}[I - X(I + X)^{-1}]. $$ The denominator-layout version is the transpose of this. If we use the fact that all rational functions of $X$ commute, we can simplify the expression a bit: $$ (I + X)^{-1}[I - X(I + X)^{-1}] = \\ [I - X(I + X)^{-1}](I + X)^{-1} = \\ [(I + X) - X](I + X)^{-2} =\\ (I + X)^{-2}. $$ That is, the derivative is either $(I + X)^{-2}$ or $[(I + X)^{-2}]^\top$, depending on your convention.
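To see concretely where the transpose matters, here is a small NumPy sketch (an illustration, not part of the original answer). It assumes a generic non-symmetric $X$ with $I + X$ invertible, builds the matrix of entrywise partials $\partial \operatorname{tr}(f(X)) / \partial X_{ij}$ by central finite differences, and compares it with $(I+X)^{-2}$ and its transpose. The helper names `phi` and `numerical_partials` are hypothetical.

```python
import numpy as np


def phi(X):
    """Return tr(X (I + X)^{-1}), which equals tr((I + X^{-1})^{-1})."""
    eye = np.eye(X.shape[0])
    return np.trace(X @ np.linalg.inv(eye + X))


def numerical_partials(func, X, h=1e-6):
    """Matrix P with P[i, j] approximating d func / d X[i, j]."""
    P = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            P[i, j] = (func(X + E) - func(X - E)) / (2 * h)
    return P


rng = np.random.default_rng(1)
n = 3
# Assumption: a non-symmetric X shifted so that I + X is comfortably invertible.
X = 0.5 * rng.standard_normal((n, n)) + 3.0 * np.eye(n)

B = np.linalg.inv(np.eye(n) + X)
S = B @ B  # (I + X)^{-2}

P = numerical_partials(phi, X)
print("partials match (I + X)^{-2} transposed:", np.allclose(P, S.T, atol=1e-7))
print("partials match (I + X)^{-2} itself:    ", np.allclose(P, S, atol=1e-7))
```

The entry $(i,j)$ of the matrix of partials corresponds to entry $(j,i)$ of $(I+X)^{-2}$, so the two candidates only coincide when $(I+X)^{-2}$ is symmetric, for instance when $X$ itself is symmetric.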
