
I'm trying to take the derivative of a symmetric matrix $\mathbf{C}$ with respect to itself: $$ \begin{equation} \frac{\partial \mathbf{C}^{-1}}{\partial \mathbf{C}} \end{equation} $$

Using indicial notation, the above expression can be written componentwise as $$ \begin{equation} \frac{\partial C_{ij}^{-1}}{\partial C_{kl}} \end{equation} $$

At first I used the following formula: $$ \begin{equation} \frac{\partial C_{ij}^{-1}}{\partial C_{kl}} = -C^{-1}_{ik}C^{-1}_{lj} \end{equation} $$
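As a numerical sanity check of this formula (a sketch using NumPy; the matrix size, seed, and tolerance are arbitrary choices), one can compare it against finite differences when every entry of $C$ is treated as an independent variable:

```python
import numpy as np

rng = np.random.default_rng(0)
n, h = 4, 1e-6

# Random well-conditioned matrix; all entries treated as independent variables
C = rng.standard_normal((n, n)) + n * np.eye(n)
Cinv = np.linalg.inv(C)

# Candidate formula: dCinv[i,j]/dC[k,l] = -Cinv[i,k] * Cinv[l,j]
analytic = -np.einsum('ik,lj->ijkl', Cinv, Cinv)

# Finite-difference derivative, perturbing each entry C[k,l] on its own
fd = np.empty((n, n, n, n))
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n)); E[k, l] = h
        fd[:, :, k, l] = (np.linalg.inv(C + E) - Cinv) / h

assert np.allclose(analytic, fd, atol=1e-4)
```

The check passes for a generic (unconstrained) $C$, which confirms the formula is correct as an unconstrained derivative; the symmetry question below is about what happens when the entries of $C$ are not independent.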

But I quickly realized that this expression loses the symmetry of the problem.

I read The Matrix Cookbook and other posts about the same problem, but unfortunately I couldn't follow what they did.

For example, in this article the authors use the following property at Eq. (100) when taking the derivative of Eq. (99): $$ \begin{equation} \frac{\partial \mathbf{C}^{-1}}{\partial \mathbf{C}} = -\mathbf{C}^{-1} \boxtimes \mathbf{C}^{-T} \mathbf{I}_s \end{equation} $$ where $\boxtimes$ is the square product and $\mathbf{I}_s$ is the symmetric fourth-order identity tensor, defined as follows: $$ \begin{align} (\mathbf{A} \boxtimes \mathbf{B})_{ijkl} &= \mathbf{A}_{ik}\mathbf{B}_{jl} \\ (\mathbf{I}_s)_{ijkl} &= \frac{1}{2}(\delta_{ik}\delta_{jl}+\delta_{il}\delta_{jk}) \end{align} $$
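Writing out the composition in components shows what $\mathbf{I}_s$ does: contracting $-\mathbf{C}^{-1} \boxtimes \mathbf{C}^{-T}$ with $\mathbf{I}_s$ symmetrizes the last two indices, giving $-\tfrac{1}{2}(C^{-1}_{ik}C^{-1}_{lj}+C^{-1}_{il}C^{-1}_{kj})$ for symmetric $\mathbf{C}$. A small NumPy sketch (matrix size and seed are arbitrary) verifies this componentwise:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
C = A + A.T + n * np.eye(n)          # symmetric, well-conditioned
Cinv = np.linalg.inv(C)

# Square product: (A ⊠ B)_{ijkl} = A_{ik} B_{jl}
def boxtimes(A, B):
    return np.einsum('ik,jl->ijkl', A, B)

# Symmetric fourth-order identity: (I_s)_{ijkl} = (δ_ik δ_jl + δ_il δ_jk)/2
I = np.eye(n)
Is = 0.5 * (np.einsum('ik,jl->ijkl', I, I) + np.einsum('il,jk->ijkl', I, I))

# Compose: contract the last two indices of the square product with I_s
T = -np.einsum('ijmn,mnkl->ijkl', boxtimes(Cinv, Cinv.T), Is)

# Componentwise: -(Cinv_ik Cinv_lj + Cinv_il Cinv_kj)/2 for symmetric C
expected = -0.5 * (np.einsum('ik,lj->ijkl', Cinv, Cinv)
                   + np.einsum('il,kj->ijkl', Cinv, Cinv))
assert np.allclose(T, expected)

# The result is now symmetric under swapping its last two indices (k <-> l)
assert np.allclose(T, np.transpose(T, (0, 1, 3, 2)))
```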

I couldn't understand how they arrived at this result, or how I can derive it myself.

2 Answers


$ \def\p{\partial}\def\o{{\tt1}} \def\E{{\cal E}}\def\F{{\cal F}}\def\G{{\cal G}} \def\C{C^{-1}}\def\Ct{C^{-T}} \def\LR#1{\left(#1\right)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Once you learn the technique, the problem can be solved very briefly $$\eqalign{ d\C &= -{\C\,dC\,\C} \\ &= -\LR{\C\E\Ct}:dC \\ \grad{\C}{C} &= -{\C\E\Ct} \\ }$$ The details are as follows...


Introduce a fourth-order tensor $\E$ with components $$\eqalign{ \E_{ijk\ell} = \delta_{ik}\,\delta_{j\ell} = \begin{cases} \o\quad{\rm if}\; i=k\;\;{\rm and}\;\;j=\ell \\ 0\quad{\rm otherwise} \end{cases} \\ }$$ The most useful property of this tensor is its ability to rearrange matrix products $$\eqalign{ ABC &= \LR{A\E C^T}:B \;=\; \F:B \\ }$$ where juxtaposition implies a single-dot product and a colon $(:)$ denotes the double-dot product $$\eqalign{ &\F_{ijk\ell} = \sum_{p=1}^n\sum_{r=1}^n A_{i\c{p}}\E_{\c{p}jk\c{r}}\,C_{\c{r}\ell}^T \;=\; A_{ik}C_{\ell j} \\ &\LR{\F:B}_{ij} = \sum_{k=1}^n\sum_{\ell=1}^n \F_{ij\c{k\ell}}\,B_{\c{k\ell}} = \sum_{k=1}^n\sum_{\ell=1}^n A_{i\c{k}}B_{\c{k\ell}}C_{\c{\ell}j} \\ }$$ Start with the differential of the matrix inverse identity $$\eqalign{ &I = \C C \\ &dI = \c{d\C}C + \C dC \;\doteq\; 0 \\ &\c{d\C} = -\C\,dC\,\C \\ }$$ Then use $\E$ to rearrange the terms and recover the gradient $$\eqalign{ d\C &= -{\C\E\Ct}:dC \\ \grad{\C}{C} &= -{\C\E\Ct} \\ }$$ Or in component notation $$\eqalign{ \grad{\C_{ij}}{C_{k\ell}} &= -\sum_{p=1}^n\sum_{r=1}^n \C_{i\c{p}}\E_{\c{p}jk\c{r}}\C_{\ell\c{r}} \;=\; -\C_{ik}\C_{\ell j} \\ }$$
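The two key facts in this derivation, the rearrangement property $ABC = \LR{A\E C^T}:B$ and the resulting component formula, can both be checked numerically. A short NumPy sketch (matrix size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

# E_{ijkl} = δ_ik δ_jl
I = np.eye(n)
E = np.einsum('ik,jl->ijkl', I, I)

# F_{ijkl} = Σ_{p,r} A_{ip} E_{pjkr} (C^T)_{rl}  =  A_{ik} C_{lj}
F = np.einsum('ip,pjkr,rl->ijkl', A, E, C.T)
assert np.allclose(F, np.einsum('ik,lj->ijkl', A, C))

# Rearrangement property: (F : B)_{ij} = Σ_{k,l} F_{ijkl} B_{kl} = (A B C)_{ij}
assert np.allclose(np.einsum('ijkl,kl->ij', F, B), A @ B @ C)
```

With the rearrangement verified, $d C^{-1} = -C^{-1}\,dC\,C^{-1} = -\LR{C^{-1}\E C^{-T}}:dC$ follows immediately, since it is just the case $A = C^{-1}$, $B = dC$.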

Update

The comments have become a rehash of the old "symmetric gradient" debate.

On the other hand, if a small set of scalar parameters are used to construct a tensor quantity, then the derivative of the tensor components with respect to one of those scalar parameters can exhibit any number of interesting symmetries. A large part of Continuum Mechanics is devoted to studying the implications of such symmetries.

That is a different problem from calculating the derivative of one tensor component with respect to another tensor component, yet many people (even professors and famous authors) conflate the two.

greg
  • Nice explanation but is this result still valid for symmetric C? It looks like you ended up with the same expression I've used in my post. – Murat Güven May 18 '22 at 18:30
  • 2
    When it comes to gradients, symmetric matrices are terribly misunderstood. If you're interested you should really study this paper. In short, a gradient expression which is valid for a general matrix is also valid for a symmetric matrix. When it comes to higher-order tensors, the concept of symmetry itself gets ambiguous, i.e. is it symmetric in its first 2 indexes? Its last 2? The first and the last indexes? It's not worth the headache to worry about such things if all you need is a valid expression for a gradient descent algorithm. – greg May 18 '22 at 20:09
  • 1
    BTW, you haven't "lost" any of the symmetry inherent in the problem. In the indexed expression, $(i,j)$ can be swapped on the RHS without affecting its validity. Likewise, and independently, $(k,\ell)$ could be swapped. – greg May 18 '22 at 20:48
  • For example, if I swap the indices of the numerator I get $\frac{\partial C_{ji}^{-1}}{\partial C_{kl}} = -C^{-1}_{jk}C^{-1}_{li}$, but is this equal to $-C^{-1}_{ik}C^{-1}_{lj}$? – Murat Güven May 19 '22 at 08:05

In the single-variable case, we have that

$$\dfrac{d}{dt}C(t)^{-1}=-C(t)^{-1}\dfrac{dC(t)}{dt}C(t)^{-1}.$$

This can be obtained by differentiating the identity $C(t)C(t)^{-1}=I$ on both sides and doing some simple algebra. It generalizes directly to the multivariable case by expressing

$$C=\sum_{i,j}c_{ij}e_ie_j^T.$$

Then, we have that

$$\dfrac{d}{dc_{kl}}C^{-1}=-C^{-1}\dfrac{dC}{dc_{kl}}C^{-1}=-C^{-1}e_ke_l^TC^{-1},$$

from which we get

$$\dfrac{d}{dc_{kl}}C_{ij}^{-1}=-e_i^TC^{-1}e_ke_l^TC^{-1}e_j=-C_{ik}^{-1}C_{lj}^{-1}.$$

When $C$ is symmetric, then it can be written as

$$C=\sum_{i}c_{ii}e_ie_i^T+\sum_{i>j}c_{ij}(e_ie_j^T+e_je_i^T).$$

$$\begin{array}{rcl} \dfrac{d}{dc_{kl}}C^{-1}&=&-C^{-1}\dfrac{dC}{dc_{kl}}C^{-1}=-C^{-1}(e_ke_l^T+e_le_k^T)C^{-1},\ \mathrm{for}\ k\ne l\\ \dfrac{d}{dc_{kk}}C^{-1}&=&-C^{-1}\dfrac{dC}{dc_{kk}}C^{-1}=-C^{-1}e_ke_k^TC^{-1} \end{array}$$

then we have that

$$\begin{array}{rcl} \dfrac{d}{dc_{kl}}C_{ij}^{-1}&=&-e_i^TC^{-1}(e_ke_l^T+e_le_k^T)C^{-1}e_j=-C_{ik}^{-1}C_{lj}^{-1}-C_{il}^{-1}C_{kj}^{-1},\ \mathrm{for}\ k\ne l\\ \dfrac{d}{dc_{kk}}C_{ij}^{-1}&=&-e_i^TC^{-1}e_ke_k^TC^{-1}e_j=-C_{ik}^{-1}C_{kj}^{-1}.\end{array}$$
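The distinction between the off-diagonal and diagonal parameters above can be confirmed by finite differences: an off-diagonal parameter $c_{kl}$ moves two entries of $C$ at once, a diagonal one only a single entry. A NumPy sketch (matrix size, seed, step size, and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, h = 4, 1e-6
A = rng.standard_normal((n, n))
C = A + A.T + n * np.eye(n)           # symmetric, well-conditioned
Cinv = np.linalg.inv(C)

def e(k):
    v = np.zeros(n); v[k] = 1.0
    return v

# Off-diagonal parameter c_kl (k != l): it multiplies e_k e_l^T + e_l e_k^T
k, l = 0, 1
dC = np.outer(e(k), e(l)) + np.outer(e(l), e(k))
fd = (np.linalg.inv(C + h * dC) - Cinv) / h
assert np.allclose(fd, -Cinv @ dC @ Cinv, atol=1e-4)   # two terms appear

# Diagonal parameter c_kk: a single rank-one perturbation, no doubling
dC = np.outer(e(k), e(k))
fd = (np.linalg.inv(C + h * dC) - Cinv) / h
assert np.allclose(fd, -Cinv @ dC @ Cinv, atol=1e-4)
```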

KBS
  • What about the symmetry? – Murat Güven May 18 '22 at 15:52
  • @MuratGüven Edited the post. – KBS May 18 '22 at 16:59
  • Consider a diagonal (and therefore symmetric) matrix $$ C = {\rm Diag}\Big(a\;\;b\;\;c\;\;\ldots\Big) \quad\implies\quad C^{-1} = {\rm Diag}\Big(a^{-1}\;\;b^{-1}\;\;c^{-1}\;\;\ldots\Big) $$ According to your general formula the derivative of the first component is $$\frac{da^{-1}}{da} = -a^{-2}\qquad \Big(i=j=k=\ell=\tt1\Big)$$ which is correct, but according to your symmetric formula it should be doubled for some reason. That doesn't make any sense. – greg May 19 '22 at 11:05
  • Okay, you've fixed the diagonal terms, but the whole idea that symmetric matrices require special treatment is bogus. Please read this paper for a detailed explanation. – greg May 19 '22 at 11:10
  • @greg I'll pass. I do not care. – KBS May 19 '22 at 11:11
  • 1
    @greg If symmetric matrices don't require special treatment, why do we have the symmetric fourth-order identity in the first place? Please take a look at this paper. – Murat Güven May 19 '22 at 12:02
  • 2
    @MuratGüven I've read the paper and updated my answer with an explanation. – greg May 19 '22 at 13:25