
I'm trying to take the derivative of a symmetric matrix $\mathbf{C}$ with respect to itself: $$ \begin{equation} \frac{\partial \mathbf{C}^{-1}}{\partial \mathbf{C}} \end{equation} $$

Using indicial notation, the above expression can be written componentwise as $$ \begin{equation} \frac{\partial C_{ij}^{-1}}{\partial C_{kl}} \end{equation} $$

At first I used the following formula: $$ \begin{equation} \frac{\partial C_{ij}^{-1}}{\partial C_{kl}} = -C^{-1}_{ik}C^{-1}_{lj} \end{equation} $$
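As a numerical sanity check of this formula (a sketch using NumPy; the matrix size, seed, and tolerance are arbitrary choices), one can compare it against finite differences when every entry of $C$ is treated as an independent variable:

```python
import numpy as np

rng = np.random.default_rng(0)
n, h = 4, 1e-6

# Random well-conditioned matrix; all entries treated as independent variables
C = rng.standard_normal((n, n)) + n * np.eye(n)
Cinv = np.linalg.inv(C)

# Candidate formula: dCinv[i,j]/dC[k,l] = -Cinv[i,k] * Cinv[l,j]
analytic = -np.einsum('ik,lj->ijkl', Cinv, Cinv)

# Finite-difference derivative, perturbing each entry C[k,l] on its own
fd = np.empty((n, n, n, n))
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n)); E[k, l] = h
        fd[:, :, k, l] = (np.linalg.inv(C + E) - Cinv) / h

assert np.allclose(analytic, fd, atol=1e-4)
```

The check passes for a generic (unconstrained) $C$, which confirms the formula is correct as an unconstrained derivative; the symmetry question below is about what happens when the entries of $C$ are not independent.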

But I quickly realized that this expression loses the symmetry of the problem.

I read The Matrix Cookbook and other posts about the same problem, but unfortunately I couldn't follow what they did.

For example, in this article the authors use the following property at Eq. (100) when taking the derivative of Eq. (99): $$ \begin{equation} \frac{\partial \mathbf{C}^{-1}}{\partial \mathbf{C}} = -\mathbf{C}^{-1} \boxtimes \mathbf{C}^{-T} \mathbf{I}_s \end{equation} $$ where $\boxtimes$ is the square product and $\mathbf{I}_s$ is the symmetric fourth-order identity tensor, defined as follows: $$ \begin{align} (\mathbf{A} \boxtimes \mathbf{B})_{ijkl} &= \mathbf{A}_{ik}\mathbf{B}_{jl} \\ (\mathbf{I}_s)_{ijkl} &= \frac{1}{2}(\delta_{ik}\delta_{jl}+\delta_{il}\delta_{jk}) \end{align} $$
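Writing out the composition in components shows what $\mathbf{I}_s$ does: contracting $-\mathbf{C}^{-1} \boxtimes \mathbf{C}^{-T}$ with $\mathbf{I}_s$ symmetrizes the last two indices, giving $-\tfrac{1}{2}(C^{-1}_{ik}C^{-1}_{lj}+C^{-1}_{il}C^{-1}_{kj})$ for symmetric $\mathbf{C}$. A small NumPy sketch (matrix size and seed are arbitrary) verifies this componentwise:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
C = A + A.T + n * np.eye(n)          # symmetric, well-conditioned
Cinv = np.linalg.inv(C)

# Square product: (A ⊠ B)_{ijkl} = A_{ik} B_{jl}
def boxtimes(A, B):
    return np.einsum('ik,jl->ijkl', A, B)

# Symmetric fourth-order identity: (I_s)_{ijkl} = (δ_ik δ_jl + δ_il δ_jk)/2
I = np.eye(n)
Is = 0.5 * (np.einsum('ik,jl->ijkl', I, I) + np.einsum('il,jk->ijkl', I, I))

# Compose: contract the last two indices of the square product with I_s
T = -np.einsum('ijmn,mnkl->ijkl', boxtimes(Cinv, Cinv.T), Is)

# Componentwise: -(Cinv_ik Cinv_lj + Cinv_il Cinv_kj)/2 for symmetric C
expected = -0.5 * (np.einsum('ik,lj->ijkl', Cinv, Cinv)
                   + np.einsum('il,kj->ijkl', Cinv, Cinv))
assert np.allclose(T, expected)

# The result is now symmetric under swapping its last two indices (k <-> l)
assert np.allclose(T, np.transpose(T, (0, 1, 3, 2)))
```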

I couldn't understand how they arrived at this result, or how I can derive it myself.

2 Answers


$ \def\p{\partial}\def\o{{\tt1}} \def\E{{\cal E}}\def\F{{\cal F}}\def\G{{\cal G}} \def\C{C^{-1}}\def\Ct{C^{-T}} \def\LR#1{\left(#1\right)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Once you learn the technique, the problem can be solved very briefly $$\eqalign{ d\C &= -{\C\,dC\,\C} \\ &= -\LR{\C\E\Ct}:dC \\ \grad{\C}{C} &= -{\C\E\Ct} \\ }$$ The details are as follows...


Introduce a fourth-order tensor $\E$ with components $$\eqalign{ \E_{ijk\ell} = \delta_{ik}\,\delta_{j\ell} = \begin{cases} \o\quad{\rm if}\; i=k\;\;{\rm and}\;\;j=\ell \\ 0\quad{\rm otherwise} \end{cases} \\ }$$ The most useful property of this tensor is its ability to rearrange matrix products $$\eqalign{ ABC &= \LR{A\E C^T}:B \;=\; \F:B \\ }$$ where juxtaposition implies a single-dot product and a colon $(:)$ denotes the double-dot product $$\eqalign{ &\F_{ijk\ell} = \sum_{p=1}^n\sum_{r=1}^n A_{i\c{p}}\E_{\c{p}jk\c{r}}\,C_{\c{r}\ell}^T \;=\; A_{ik}C_{\ell j} \\ &\LR{\F:B}_{ij} = \sum_{k=1}^n\sum_{\ell=1}^n \F_{ij\c{k\ell}}\,B_{\c{k\ell}} = \sum_{k=1}^n\sum_{\ell=1}^n A_{i\c{k}}B_{\c{k\ell}}C_{\c{\ell}j} \\ }$$ Start with the differential of the matrix inverse identity $$\eqalign{ &I = \C C \\ &dI = \c{d\C}C + \C dC \;\doteq\; 0 \\ &\c{d\C} = -\C\,dC\,\C \\ }$$ Then use $\E$ to rearrange the terms and recover the gradient $$\eqalign{ d\C &= -{\C\E\Ct}:dC \\ \grad{\C}{C} &= -{\C\E\Ct} \\ }$$ Or in component notation $$\eqalign{ \grad{\C_{ij}}{C_{k\ell}} &= -\sum_{p=1}^n\sum_{r=1}^n \C_{i\c{p}}\E_{\c{p}jk\c{r}}\C_{\ell\c{r}} \;=\; -\C_{ik}\C_{\ell j} \\ }$$
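The two key facts in this derivation, the rearrangement property $ABC = \LR{A\E C^T}:B$ and the resulting component formula, can both be checked numerically. A short NumPy sketch (matrix size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

# E_{ijkl} = δ_ik δ_jl
I = np.eye(n)
E = np.einsum('ik,jl->ijkl', I, I)

# F_{ijkl} = Σ_{p,r} A_{ip} E_{pjkr} (C^T)_{rl}  =  A_{ik} C_{lj}
F = np.einsum('ip,pjkr,rl->ijkl', A, E, C.T)
assert np.allclose(F, np.einsum('ik,lj->ijkl', A, C))

# Rearrangement property: (F : B)_{ij} = Σ_{k,l} F_{ijkl} B_{kl} = (A B C)_{ij}
assert np.allclose(np.einsum('ijkl,kl->ij', F, B), A @ B @ C)
```

With the rearrangement verified, $d C^{-1} = -C^{-1}\,dC\,C^{-1} = -\LR{C^{-1}\E C^{-T}}:dC$ follows immediately, since it is just the case $A = C^{-1}$, $B = dC$.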

Update

The comments have become a rehash of the old "symmetric gradient" debate.

On the other hand, if a small set of scalar parameters are used to construct a tensor quantity, then the derivative of the tensor components with respect to one of those scalar parameters can exhibit any number of interesting symmetries. A large part of Continuum Mechanics is devoted to studying the implications of such symmetries.

That is a different problem from calculating the derivative of one tensor component with respect to another tensor component, yet many people (even professors and famous authors) conflate the two.

greg
  • Nice explanation but is this result still valid for symmetric C? It looks like you ended up with the same expression I've used in my post. – Murat Güven May 18 '22 at 18:30
  • 2
    When it comes to gradients, symmetric matrices are terribly misunderstood. If you're interested you should really study this paper. In short, a gradient expression which is valid for a general matrix is also valid for a symmetric matrix. When it comes to higher-order tensors, the concept of symmetry itself gets ambiguous, i.e. is it symmetric in its first 2 indexes? Its last 2? The first and the last indexes? It's not worth the headache to worry about such things if all you need is a valid expression for a gradient descent algorithm. – greg May 18 '22 at 20:09
  • 1
    BTW, you haven't "lost" any of the symmetry inherent in the problem. In the indexed expression, $(i,j)$ can be swapped on the RHS without affecting its validity. Likewise, and independently, $(k,\ell)$ could be swapped. – greg May 18 '22 at 20:48
  • For example, if I swap the indices of the numerator I get $\frac{\partial C_{ji}^{-1}}{\partial C_{kl}} = -C^{-1}_{jk}C^{-1}_{li}$, but is this equal to $-C^{-1}_{ik}C^{-1}_{lj}$? – Murat Güven May 19 '22 at 08:05

In the single-variable case, we have that

$$\dfrac{d}{dt}C(t)^{-1}=-C(t)^{-1}\dfrac{dC(t)}{dt}C(t)^{-1}.$$

This can be obtained by differentiating the identity $C(t)C(t)^{-1}=I$ on both sides and doing some simple algebra. It generalizes directly to the multivariable case by expressing

$$C=\sum_{i,j}c_{ij}e_ie_j^T.$$

Then, we have that

$$\dfrac{d}{dc_{kl}}C^{-1}=-C^{-1}\dfrac{dC}{dc_{kl}}C^{-1}=-C^{-1}e_ke_l^TC^{-1},$$

from which we get

$$\dfrac{d}{dc_{kl}}C_{ij}^{-1}=-e_i^TC^{-1}e_ke_l^TC^{-1}e_j=-C_{ik}^{-1}C_{lj}^{-1}.$$

When $C$ is symmetric, then it can be written as

$$C=\sum_{i}c_{ii}e_ie_i^T+\sum_{i>j}c_{ij}(e_ie_j^T+e_je_i^T).$$

$$\begin{array}{rcl} \dfrac{d}{dc_{kl}}C^{-1}&=&-C^{-1}\dfrac{dC}{dc_{kl}}C^{-1}=-C^{-1}(e_ke_l^T+e_le_k^T)C^{-1},\ \mathrm{for}\ k\ne l\\ \dfrac{d}{dc_{kk}}C^{-1}&=&-C^{-1}\dfrac{dC}{dc_{kk}}C^{-1}=-C^{-1}e_ke_k^TC^{-1} \end{array}$$

then we have that

$$\begin{array}{rcl} \dfrac{d}{dc_{kl}}C_{ij}^{-1}&=&-e_i^TC^{-1}(e_ke_l^T+e_le_k^T)C^{-1}e_j=-C_{ik}^{-1}C_{lj}^{-1}-C_{il}^{-1}C_{kj}^{-1},\ \mathrm{for}\ k\ne l\\ \dfrac{d}{dc_{kk}}C_{ij}^{-1}&=&-e_i^TC^{-1}e_ke_k^TC^{-1}e_j=-C_{ik}^{-1}C_{kj}^{-1}.\end{array}$$
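The distinction between the off-diagonal and diagonal parameters above can be confirmed by finite differences: an off-diagonal parameter $c_{kl}$ moves two entries of $C$ at once, a diagonal one only a single entry. A NumPy sketch (matrix size, seed, step size, and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, h = 4, 1e-6
A = rng.standard_normal((n, n))
C = A + A.T + n * np.eye(n)           # symmetric, well-conditioned
Cinv = np.linalg.inv(C)

def e(k):
    v = np.zeros(n); v[k] = 1.0
    return v

# Off-diagonal parameter c_kl (k != l): it multiplies e_k e_l^T + e_l e_k^T
k, l = 0, 1
dC = np.outer(e(k), e(l)) + np.outer(e(l), e(k))
fd = (np.linalg.inv(C + h * dC) - Cinv) / h
assert np.allclose(fd, -Cinv @ dC @ Cinv, atol=1e-4)   # two terms appear

# Diagonal parameter c_kk: a single rank-one perturbation, no doubling
dC = np.outer(e(k), e(k))
fd = (np.linalg.inv(C + h * dC) - Cinv) / h
assert np.allclose(fd, -Cinv @ dC @ Cinv, atol=1e-4)
```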

KBS
  • What about the symmetry? – Murat Güven May 18 '22 at 15:52
  • @MuratGüven Edited the post. – KBS May 18 '22 at 16:59
  • Consider a diagonal (and therefore symmetric) matrix $$ C = {\rm Diag}\Big(a\;\;b\;\;c\;\;\ldots\Big) \quad\implies\quad C^{-1} = {\rm Diag}\Big(a^{-1}\;\;b^{-1}\;\;c^{-1}\;\;\ldots\Big) $$ According to your general formula the derivative of the first component is $$\frac{da^{-1}}{da} = -a^{-2}\qquad \Big(i=j=k=\ell=\tt1\Big)$$ which is correct, but according to your symmetric formula it should be doubled for some reason. That doesn't make any sense. – greg May 19 '22 at 11:05
  • Okay, you've fixed the diagonal terms, but the whole idea that symmetric matrices require special treatment is bogus. Please read this paper for a detailed explanation. – greg May 19 '22 at 11:10
  • @greg I'll pass. I do not care. – KBS May 19 '22 at 11:11
  • 1
    @greg If symmetric matrices don't require special treatment, why do we have the symmetric fourth-order identity in the first place? Please take a look at this paper. – Murat Güven May 19 '22 at 12:02
  • 2
    @MuratGüven I've read the paper and updated my answer with an explanation. – greg May 19 '22 at 13:25