
Let $X$ be an invertible $n \times n$ matrix. Calculate the derivative of the following function with respect to $X$.

$$ g(X)=\operatorname{tr}\left(X^{-1}\right) $$

I'm stumped by this. When I work through it, I use these two identities:

  1. $$\frac{\partial}{\partial \boldsymbol{X}} \boldsymbol{f}(\boldsymbol{X})^{-1}=-\boldsymbol{f}(\boldsymbol{X})^{-1} \frac{\partial \boldsymbol{f}(\boldsymbol{X})}{\partial \boldsymbol{X}} \boldsymbol{f}(\boldsymbol{X})^{-1}$$

  2. $$ \frac{\partial}{\partial \boldsymbol{X}} \operatorname{tr}(\boldsymbol{f}(\boldsymbol{X}))=\operatorname{tr}\left(\frac{\partial \boldsymbol{f}(\boldsymbol{X})}{\partial \boldsymbol{X}}\right) $$

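With these I should be able to arrive at the solution. Using 1, I get $$\frac{d}{dX}\left(X^{-1}\right) = -X^{-1}\otimes X^{-1}.$$ So the answer should be the trace of that, right? That would be $$\operatorname{tr}\left(-X^{-1}\right)\operatorname{tr}\left(X^{-1}\right).$$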

But the solution seems to be $$-X^{-2T},$$ which I can't see how to get.

JimSi
  • Your second identity cannot be correct, since the derivative of a scalar function with vector arguments must be a vector. – Hans Engler Jul 21 '20 at 17:39

3 Answers


We will use the following Frobenius product identity \begin{align} \operatorname{tr}\left(A^T B \right) := A:B . \end{align} and use the cyclic property of trace, e.g., \begin{align} A: BCD = B^T A: CD = B^TAD^T: C \end{align}
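The Frobenius-product rearrangement rule can be sanity-checked numerically. The sketch below assumes NumPy and random $4\times 4$ matrices; `frob` is a hypothetical helper name introduced here, not a library function.

```python
import numpy as np

def frob(A, B):
    """Frobenius product A : B = tr(A^T B)."""
    return np.trace(A.T @ B)

rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((4, 4)) for _ in range(4))

# A : B equals the sum of elementwise products
print(np.isclose(frob(A, B), np.sum(A * B)))  # True

# Moving factors across the colon: A : BCD = B^T A : CD = B^T A D^T : C
lhs = frob(A, B @ C @ D)
mid = frob(B.T @ A, C @ D)
rhs = frob(B.T @ A @ D.T, C)
print(np.isclose(lhs, mid), np.isclose(lhs, rhs))  # True True
```

All three expressions reduce to $\operatorname{tr}(A^T BCD)$ by the cyclic property, which is why they agree.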

Further, we will use the differential of the inverse of an invertible matrix $X$: \begin{align} XX^{-1} = I \Longrightarrow dX X^{-1} + X dX^{-1} = 0 \Longleftrightarrow dX^{-1} = -X^{-1} dX X^{-1}. \end{align}
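The formula $dX^{-1} = -X^{-1}\,dX\,X^{-1}$ can be compared against a central finite difference of $X \mapsto X^{-1}$ in a random direction. This is a minimal sketch assuming NumPy; the diagonal shift that keeps the random $X$ well conditioned and the step size `eps` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonal shift keeps X comfortably invertible
dX = rng.standard_normal((n, n))                 # arbitrary perturbation direction
eps = 1e-6

Xinv = np.linalg.inv(X)
# Central finite difference of X -> X^{-1} in the direction dX
fd = (np.linalg.inv(X + eps * dX) - np.linalg.inv(X - eps * dX)) / (2 * eps)
# First-order formula from differentiating X X^{-1} = I
formula = -Xinv @ dX @ Xinv
print(np.max(np.abs(fd - formula)))  # small, limited by finite-difference error
```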

Now, say $f := \operatorname{tr}\left( X^{-1} \right)$, then we find the differential followed by the gradient. \begin{align} df &= d\operatorname{tr}\left( X^{-1} \right) = d\operatorname{tr}\left( I X^{-1} \right) \\ &= I : dX^{-1} \\ &= I : -X^{-1} dX X^{-1} \\ &= - X^{-T} I X^{-T} : dX \\ &= - X^{-2T} : dX \end{align}

Then the gradient is \begin{align} \frac{\partial f}{\partial X} = - X^{-2T}. \end{align}
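The closed-form gradient $-X^{-2T}$ can be checked entry by entry with central differences. This sketch assumes NumPy and a random, non-symmetric $X$; `numerical_gradient` is a hypothetical helper written for this check.

```python
import numpy as np

def f(X):
    """f(X) = tr(X^{-1})."""
    return np.trace(np.linalg.inv(X))

def numerical_gradient(f, X, eps=1e-6):
    """Entrywise central differences: G[m, n] approximates df/dX[m, n]."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(2)
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)
Xinv = np.linalg.inv(X)

G_fd = numerical_gradient(f, X)
G_closed = -(Xinv @ Xinv).T  # -X^{-2T}
print(np.allclose(G_fd, G_closed, atol=1e-6))  # True
```

Because $X$ is not symmetric here, the transpose matters: comparing against $-X^{-2}$ without it would generally fail.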

user550103

$\newcommand{tr}{\operatorname{tr}}$If $i(X)=X^{-1}$, then by all means $D_Xi(H)=-X^{-1}HX^{-1}$. Using the chain rule $$D_Xg(H)=D_X(\tr\circ i)(H)=(D_{i(X)}\tr)(D_Xi(H))=\tr(-X^{-1}HX^{-1})$$

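and since $\tr(AB)=\tr(BA)$, we have $$D_Xg(H)=-\tr(X^{-1}HX^{-1})=-\tr(X^{-2}H)=-\tr(HX^{-2}).$$ Writing this as $D_Xg(H)=\tr\left((-X^{-2T})^T H\right)$ identifies the gradient with respect to the Frobenius inner product as $-X^{-2T}$.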
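The chain-rule result can be tested as a directional derivative along a random direction $H$, and compared with the Frobenius product $(-X^{-2T}) : H$ from the first answer. A sketch assuming NumPy; `g` mirrors the question's function and the step `t` is an illustrative choice.

```python
import numpy as np

def g(M):
    """g(M) = tr(M^{-1})."""
    return np.trace(np.linalg.inv(M))

rng = np.random.default_rng(3)
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)
H = rng.standard_normal((n, n))  # arbitrary direction
t = 1e-6

Xinv = np.linalg.inv(X)
X2inv = Xinv @ Xinv

fd = (g(X + t * H) - g(X - t * H)) / (2 * t)  # numerical D_X g(H)
chain_rule = -np.trace(X2inv @ H)             # -tr(X^{-2} H)
frobenius = np.sum(-X2inv.T * H)              # (-X^{-2T}) : H as an elementwise sum
print(np.isclose(fd, chain_rule, atol=1e-6), np.isclose(chain_rule, frobenius))
```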


The problem is with this equation

$$\frac{\partial}{\partial \boldsymbol{X}} \operatorname{tr}(\boldsymbol{f}(\boldsymbol{X}))=\operatorname{tr}\left(\frac{\partial \boldsymbol{f}(\boldsymbol{X})}{\partial \boldsymbol{X}}\right)$$

Note that on the LHS you are taking the derivative of a function $\mathbb R^{n\times n} \to \mathbb R$, whereas on the RHS you are trying to take the trace of the derivative of a function $f\colon\mathbb R^{n\times n}\to\mathbb R^{n\times n}$. As you already figured out, this derivative can be expressed by a fourth-order tensor $-(X^{-1} \otimes X^{-1})$. Obviously, the result cannot be $-\operatorname{tr}(X^{-1})\operatorname{tr}(X^{-1})$, as this is a scalar, but the result needs to be a second-order tensor.
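To see how the fourth-order tensor collapses to a second-order one, the sketch below builds $D_{ijmn} = \partial (X^{-1})_{ij}/\partial X_{mn} = -(X^{-1})_{im}(X^{-1})_{nj}$ with NumPy's `einsum`, spot-checks one slice by finite differences, and then contracts $i=j$ (the "trace" over the output indices). The index convention for $D$ is an assumption made for this illustration; with it, the contraction gives $-\sum_i (X^{-1})_{im}(X^{-1})_{ni} = -(X^{-2})_{nm}$, i.e. $-X^{-2T}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
X = rng.standard_normal((n, n)) + n * np.eye(n)
Xinv = np.linalg.inv(X)

# Fourth-order derivative tensor of X -> X^{-1}:
# D[i, j, m, n] = d(X^{-1})_{ij} / dX_{mn} = -(X^{-1})_{im} (X^{-1})_{nj}
D = -np.einsum('im,nj->ijmn', Xinv, Xinv)

# Spot-check one slice against a central difference in the direction E_{mk}
m, k = 0, 1
eps = 1e-6
E = np.zeros((n, n))
E[m, k] = eps
fd_slice = (np.linalg.inv(X + E) - np.linalg.inv(X - E)) / (2 * eps)
assert np.allclose(D[:, :, m, k], fd_slice, atol=1e-6)

# Contracting the first two indices (i = j) leaves an (n, n) array, not a scalar
grad = np.einsum('iimn->mn', D)
print(grad.shape)                              # (3, 3)
print(np.allclose(grad, -(Xinv @ Xinv).T))     # True
```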

Hyperplane
  • Hi, thanks for your comment. So am I right in saying that $\operatorname{tr}(-(X^{-1} \otimes X^{-1}))$ is the solution and equals $-X^{-2T}$? – JimSi Jul 23 '20 at 12:47
  • In index notation, would a way of showing this be $\left[-(X^{-1} \otimes X^{-1})\right]_{ijmn} = -X_{ij}^{-1}X_{np}^{-1}$, so that $\operatorname{tr}\left(\left[-(X^{-1} \otimes X^{-1})\right]_{ijmn}\right) = -\sum_{i}X_{ij}^{-1}X_{ni}^{-1} = -X^{-2T}$? I'm struggling to do the contraction. – JimSi Jul 23 '20 at 12:52