derivative of inverse matrix by itself

Question

Let $A$ be a matrix, say $k\times k$. I know that $$\frac{\partial A^{-1}}{\partial A} = -A^{-2} $$ but I do not know how to obtain the following results from this fact. I would like to see the steps leading to $$\frac{\partial a^\top A^{-1} b}{\partial A} = -(A^\top)^{-1}ab^\top (A^\top)^{-1} $$ I would also like to know the solution to $$\frac{\partial (A^\top)^{-1}ab^\top (A^\top)^{-1} }{\partial A} = ? $$

Do you just know the first identity you stated or also how to obtain it? Because if so, you should be able to figure out the other identities too. However, please let us know what your attempts are. — , Dec 03 '18 at 15:41

greg · Accepted Answer · 2021-11-17T15:35:32.337

Start with the defining equation for the matrix inverse and find its differential. $$\eqalign{ I &= A^{-1}A \\ 0 &= dA^{-1}\,A + A^{-1}\,dA \\ dA^{-1} &= -A^{-1}\,dA\,A^{-1} \\ }$$

Next note the gradient of a matrix with respect to itself. $$ {\mathcal H}_{ijkl} = \frac{\partial A_{ij}}{\partial A_{kl}} = \delta_{ik}\delta_{jl} $$ Note that ${\mathcal H}$ is a 4th order tensor with some interesting symmetry properties (isotropic). It is also the identity element for the Frobenius product, i.e. for any matrix $B$ $${\mathcal H}:B=B:{\mathcal H}=B$$

Now we can answer your first question. The function of interest is scalar-valued. Let's find its differential and gradient. $$\eqalign{ \phi &= a^TA^{-1}b \\ &= ab^T:A^{-1} \\ d\phi &= ab^T:dA^{-1} \\ &= -ab^T:A^{-1}\,dA\,A^{-1} \\ &= -A^{-T}ab^TA^{-T}:dA \\ \frac{\partial\phi}{\partial A} &= -A^{-T}ab^TA^{-T} \\ }$$

Now let's try the second question. This time the function of interest is matrix-valued. $$\eqalign{ F &= A^{-1}ab^TA^{-1} \\ dF &= dA^{-1}ab^TA^{-1} + A^{-1}ab^TdA^{-1} \\ &= -A^{-1}\,dA\,A^{-1}ab^TA^{-1} - A^{-1}ab^TA^{-1}\,dA\,A^{-1} \\ &= -A^{-1}\,dA\,F - F\,dA\,A^{-1} \\ &= -\Big(A^{-1}{\mathcal H}F^T + F{\mathcal H}A^{-T}\Big):dA \\ \frac{\partial F}{\partial A} &= -\Big(A^{-1}{\mathcal H}F^T+F{\mathcal H}A^{-T}\Big) \\ }$$ This gradient is a 4th order tensor.
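The scalar gradient $\partial\phi/\partial A = -A^{-T}ab^TA^{-T}$ from the first question can be sanity-checked against central finite differences. This is only a minimal NumPy sketch: the helper names (`phi`, `grad_phi`, `numerical_grad`), the random test data, and the diagonal shift that keeps $A$ well conditioned are illustrative assumptions, not part of the derivation.

```python
import numpy as np

def phi(A, a, b):
    # phi(A) = a^T A^{-1} b, computed with a linear solve instead of an explicit inverse
    return a @ np.linalg.solve(A, b)

def grad_phi(A, a, b):
    # Closed form from the derivation: -A^{-T} a b^T A^{-T}
    Ainv = np.linalg.inv(A)
    return -Ainv.T @ np.outer(a, b) @ Ainv.T

def numerical_grad(f, A, eps=1e-6):
    # Entry-by-entry central differences of a scalar function of a matrix
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
k = 4
# Assumption: shift the diagonal so the random test matrix is safely invertible
A = rng.standard_normal((k, k)) + 3 * k * np.eye(k)
a = rng.standard_normal(k)
b = rng.standard_normal(k)

analytic = grad_phi(A, a, b)
numeric = numerical_grad(lambda M: phi(M, a, b), A)
max_err = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_err)
```

The two gradients should agree up to finite-difference error. Note that the check perturbs every entry of $A$ independently, so it applies to an unstructured $A$ only.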

If you prefer, you can vectorize the matrices to flatten the result. $$\eqalign{ {\rm vec}(dF) &= -{\rm vec}(A^{-1}\,dA\,F + F\,dA\,A^{-1}) \\ &= -(F^T\otimes A^{-1} + A^{-T}\otimes F)\,{\rm vec}(dA) \\ df &= -(F^T\otimes A^{-1} + A^{-T}\otimes F)\,da \\ \frac{\partial f}{\partial a} &= -\Big(F^T\otimes A^{-1} + A^{-T}\otimes F\Big) \\ }$$ Here $f={\rm vec}(F)$ and $a={\rm vec}(A)$ denote the vectorized matrices; this $a$ is not the vector $a$ that appears in $F$.

In several of the steps above, a colon was used to denote the Frobenius (double-contraction) product $$\eqalign{ A &= {\mathcal H}:B &\implies &A_{ij} &= \sum_{kl}{\mathcal H}_{ijkl} B_{kl} \\ \alpha &= H:B &\implies &\alpha &= \sum_{ij}H_{ij} B_{ij} = {\rm Tr}(H^TB) \\ }$$
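The tensor and Kronecker forms of $\partial F/\partial A$ can be compared numerically as well. The sketch below rests on two assumptions: ${\rm vec}$ stacks columns (column-major order), and the product $A{\mathcal H}B$ is read as $(A{\mathcal H}B)_{ijkl} = A_{ik}B_{jl}$, which is the reading under which $(A{\mathcal H}B):X = AXB^T$. The variable names and random test data are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
# Assumption: diagonal shift keeps the random test matrix well conditioned
A = rng.standard_normal((n, n)) + 3 * n * np.eye(n)
a = rng.standard_normal(n)
b = rng.standard_normal(n)

def F_of(M):
    # F(M) = M^{-1} a b^T M^{-1}
    Minv = np.linalg.inv(M)
    return Minv @ np.outer(a, b) @ Minv

Ainv = np.linalg.inv(A)
F = F_of(A)

# 4th-order gradient: G[i,j,k,l] = dF_ij/dA_kl = -(Ainv_ik F_lj + F_ik Ainv_lj)
G = -(np.einsum('ik,lj->ijkl', Ainv, F) + np.einsum('ik,lj->ijkl', F, Ainv))

# Flattened Jacobian d vec(F) / d vec(A) in Kronecker form
J = -(np.kron(F.T, Ainv) + np.kron(Ainv.T, F))

# Column-major flattening of the tensor should reproduce the Kronecker form
tensor_vs_kron = np.max(np.abs(G.reshape(n * n, n * n, order='F') - J))

# Central finite differences, one entry A_kl at a time
eps = 1e-6
J_fd = np.zeros((n * n, n * n))
for l in range(n):
    for k in range(n):
        E = np.zeros((n, n))
        E[k, l] = eps
        dF = (F_of(A + E) - F_of(A - E)) / (2 * eps)
        J_fd[:, k + l * n] = dF.flatten(order='F')
kron_vs_fd = np.max(np.abs(J - J_fd))

# The underlying identity vec(P X Q^T) = (Q kron P) vec(X)
P, Q, X = (rng.standard_normal((n, n)) for _ in range(3))
identity_err = np.max(np.abs((P @ X @ Q.T).flatten(order='F')
                             - np.kron(Q, P) @ X.flatten(order='F')))

print(tensor_vs_kron, kron_vs_fd, identity_err)
```

All three printed values should be close to zero. Using row-major flattening instead would swap the roles of the Kronecker factors, so the `order='F'` arguments matter here.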

It might be obvious to you and others, but can you explain how to get from this step $-A^{-1}\,dA\,F - F\,dA\,A^{-1}$ to this step $-\Big(A^{-1}{\mathcal H}F^T + F{\mathcal H}A^{-T}\Big):dA$? — user550103, Dec 04 '18 at 06:04
It's not obvious, it's just one of the properties of ${\mathcal H}$. Work out a simple example like $(A{\mathcal H}B:X)$ in index notation and, utilizing the properties of those Kronecker deltas, you'll find that it equals $(AXB^T)$ — greg, Dec 04 '18 at 13:14
It's also closely related to the equally non-obvious formula $${\rm vec}(AXB^T) = (B\otimes A)\,{\rm vec}(X)$$ which uses a Kronecker product instead of Kronecker deltas. — greg, Dec 04 '18 at 13:22
Late to the party, but how should one modify this answer if $A$ is symmetric? — husB, Dec 28 '21 at 06:18
(Continuing from my previous comment.) My attempt was to replace $\mathcal{H}_{ijkl}$ with $\frac{1}{2} ( \delta_{ik}\delta_{jl}+ \delta_{il}\delta_{jk} )$, but I could not proceed after that. — husB, Dec 28 '21 at 10:10
