Let $A$ be a $k\times k$ matrix. I know that $$\frac{\partial A^{-1}}{\partial A} = -A^{-2} $$ but I do not know how to use this fact to obtain the following results. I would like to see the steps leading to $$\frac{\partial a^\top A^{-1} b}{\partial A} = -(A^\top)^{-1}ab^\top (A^\top)^{-1} $$ I would also like to know the answer to $$\frac{\partial\,(A^\top)^{-1}ab^\top (A^\top)^{-1} }{\partial A} = \,? $$
-
First formula: isn't it $-A^{-2}$? – Damien Dec 03 '18 at 06:36
-
@Damien Yes, I edited it. – user1292919 Dec 03 '18 at 15:26
-
Do you just know the first identity you stated or also how to obtain it? Because if so, you should be able to figure out the other identities too. However, please let us know what your attempts are. – Dec 03 '18 at 15:41
1 Answer
Start with the defining equation for the matrix inverse and find its differential. $$\eqalign{ I &= A^{-1}A \\ 0 &= dA^{-1}\,A + A^{-1}\,dA \\ dA^{-1} &= -A^{-1}\,dA\,A^{-1} \\ }$$

Next, note the gradient of a matrix with respect to itself. $$ {\mathcal H}_{ijkl} = \frac{\partial A_{ij}}{\partial A_{kl}} = \delta_{ik}\delta_{jl} $$ Note that ${\mathcal H}$ is a 4th-order tensor with some interesting symmetry properties (it is isotropic). It is also the identity element for the Frobenius product, i.e. for any matrix $B$ $${\mathcal H}:B=B:{\mathcal H}=B$$

Now we can answer your first question. The function of interest is scalar-valued, so let's find its differential and gradient. $$\eqalign{ \phi &= a^TA^{-1}b \\ &= ab^T:A^{-1} \\ d\phi &= ab^T:dA^{-1} \\ &= -ab^T:A^{-1}\,dA\,A^{-1} \\ &= -A^{-T}ab^TA^{-T}:dA \\ \frac{\partial\phi}{\partial A} &= -A^{-T}ab^TA^{-T} \\ }$$ The step from $-ab^T:A^{-1}\,dA\,A^{-1}$ to $-A^{-T}ab^TA^{-T}:dA$ uses the rearrangement rule $X:YZW = Y^TXW^T:Z$, which follows from the cyclic property of the trace.

Now let's try the second question. This time the function of interest is matrix-valued. $$\eqalign{ F &= A^{-1}ab^TA^{-1} \\ dF &= dA^{-1}ab^TA^{-1} + A^{-1}ab^TdA^{-1} \\ &= -A^{-1}\,dA\,A^{-1}ab^TA^{-1} - A^{-1}ab^TA^{-1}\,dA\,A^{-1} \\ &= -A^{-1}\,dA\,F - F\,dA\,A^{-1} \\ &= -\Big(A^{-1}{\mathcal H}F^T + F{\mathcal H}A^{-T}\Big):dA \\ \frac{\partial F}{\partial A} &= -\Big(A^{-1}{\mathcal H}F^T+F{\mathcal H}A^{-T}\Big) \\ }$$ This gradient is a 4th-order tensor.
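For a quick numerical sanity check of the first result, the sketch below compares the closed form $-A^{-T}ab^TA^{-T}$ with a central finite-difference gradient of $\phi(A)=a^TA^{-1}b$, and also checks the differential $dA^{-1}=-A^{-1}\,dA\,A^{-1}$ to first order. It assumes NumPy is available; the random test data and the helper names `phi` and `numeric_grad` are illustrative choices, not part of the answer.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
# Random test data; adding k*I keeps A comfortably invertible (an assumption of this check)
A = rng.standard_normal((k, k)) + k * np.eye(k)
a = rng.standard_normal(k)
b = rng.standard_normal(k)


def phi(M):
    """Scalar function phi(M) = a^T M^{-1} b."""
    return a @ np.linalg.solve(M, b)


def numeric_grad(f, M, h=1e-6):
    """Central finite differences: G[i, j] approximates d f / d M[i, j]."""
    G = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            E = np.zeros_like(M)
            E[i, j] = h
            G[i, j] = (f(M + E) - f(M - E)) / (2 * h)
    return G


Ai = np.linalg.inv(A)
# Closed form from the answer: -A^{-T} a b^T A^{-T}
G_analytic = -Ai.T @ np.outer(a, b) @ Ai.T
G_numeric = numeric_grad(phi, A)
print("gradient error:", np.max(np.abs(G_analytic - G_numeric)))

# First-order check of dA^{-1} = -A^{-1} dA A^{-1} for a small perturbation dA
dA = 1e-6 * rng.standard_normal((k, k))
exact_change = np.linalg.inv(A + dA) - Ai
first_order = -Ai @ dA @ Ai
print("differential error:", np.max(np.abs(exact_change - first_order)))
```

Both printed errors should be tiny relative to the entries being compared; the first is limited by the finite-difference step `h`, the second is second order in `dA`.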
If you prefer, you can vectorize the matrices to flatten this 4th-order gradient into an ordinary matrix. $$\eqalign{ {\rm vec}(dF) &= -{\rm vec}(A^{-1}\,dA\,F + F\,dA\,A^{-1}) \\ &= -(F^T\otimes A^{-1} + A^{-T}\otimes F)\,{\rm vec}(dA) \\ \frac{\partial\,{\rm vec}(F)}{\partial\,{\rm vec}(A)} &= -\Big(F^T\otimes A^{-1} + A^{-T}\otimes F\Big) \\ }$$ In several steps above, a colon denotes the Frobenius (double-contraction) product: $$\eqalign{ A &= {\mathcal H}:B &\implies &A_{ij} &= \sum_{kl}{\mathcal H}_{ijkl} B_{kl} \\ \alpha &= H:B &\implies &\alpha &= \sum_{ij}H_{ij} B_{ij} = {\rm Tr}(H^TB) \\ }$$
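The vectorized form is easy to test numerically. The sketch below, assuming NumPy and a column-major ${\rm vec}$ (stacking columns, which is the convention the Kronecker formula relies on), compares $-\big(F^T\otimes A^{-1} + A^{-T}\otimes F\big)$ with a finite-difference Jacobian of ${\rm vec}(F)$ for $F=A^{-1}ab^TA^{-1}$, and confirms that the colon product equals ${\rm Tr}(H^TB)$. The test data and helper names (`vec`, `F_of`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3
# Random, comfortably invertible test matrix (assumption of this check)
A = rng.standard_normal((k, k)) + k * np.eye(k)
a = rng.standard_normal(k)
b = rng.standard_normal(k)


def vec(M):
    """Stack the columns of M (column-major), the convention behind the Kronecker formula."""
    return M.reshape(-1, order="F")


def F_of(M):
    """Matrix-valued F(M) = M^{-1} a b^T M^{-1}."""
    Mi = np.linalg.inv(M)
    return Mi @ np.outer(a, b) @ Mi


Ai = np.linalg.inv(A)
F = F_of(A)
# Jacobian d vec(F) / d vec(A) from the answer
J_analytic = -(np.kron(F.T, Ai) + np.kron(Ai.T, F))

# Finite-difference Jacobian: column c perturbs the c-th entry of vec(A)
h = 1e-6
J_numeric = np.zeros((k * k, k * k))
for c in range(k * k):
    e = np.zeros(k * k)
    e[c] = h
    E = e.reshape((k, k), order="F")  # undo vec to get the matrix perturbation
    J_numeric[:, c] = vec(F_of(A + E) - F_of(A - E)) / (2 * h)

print("Jacobian error:", np.max(np.abs(J_analytic - J_numeric)))

# The colon product H:B = sum_ij H_ij B_ij equals Tr(H^T B)
H = rng.standard_normal((k, k))
B = rng.standard_normal((k, k))
print(np.isclose(np.sum(H * B), np.trace(H.T @ B)))
```

Using `reshape(..., order="F")` in both directions keeps the perturbation and the flattened output on the same column-stacking convention.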

-
It might be obvious to you and others, but can you explain how to get from the step $-A^{-1}\,dA\,F - F\,dA\,A^{-1}$ to the step $-\Big(A^{-1}{\mathcal H}F^T + F{\mathcal H}A^{-T}\Big):dA$? – user550103 Dec 04 '18 at 06:04
-
It's not obvious; it's just one of the properties of ${\mathcal H}$. Work out a simple example like $(A{\mathcal H}B):X$ in index notation and, using the properties of the Kronecker deltas, you'll find that it equals $AXB^T$. – greg Dec 04 '18 at 13:14
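This index computation can be checked mechanically. The sketch below builds ${\mathcal H}_{ijkl}=\delta_{ik}\delta_{jl}$ with `numpy.einsum` and assumes the usual single-contraction convention $(A{\mathcal H}B)_{ijkl}=\sum_{pq}A_{ip}{\mathcal H}_{pjkq}B_{ql}$, which reduces to $A_{ik}B_{jl}$; contracting with $X$ over $k,l$ then gives $\sum_{kl}A_{ik}X_{kl}B_{jl}=(AXB^T)_{ij}$. The random matrices are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
I = np.eye(n)
# H_{ijkl} = delta_ik delta_jl, the identity for the Frobenius product
H = np.einsum("ik,jl->ijkl", I, I)

A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))

# Assumed convention: A contracts with H's first index, B with H's last index,
# i.e. (A H B)_{ijkl} = sum_{p,q} A_{ip} H_{pjkq} B_{ql}
AHB = np.einsum("ip,pjkq,ql->ijkl", A, H, B)

# Double contraction with X over the last two indices
lhs = np.einsum("ijkl,kl->ij", AHB, X)
rhs = A @ X @ B.T
print(np.allclose(lhs, rhs))

# H acts as the identity for ':' on both sides: H:X = X:H = X
print(np.allclose(np.einsum("ijkl,kl->ij", H, X), X),
      np.allclose(np.einsum("ij,ijkl->kl", X, H), X))
```

With $A\to A^{-1}$, $B\to F^T$ this is exactly the rewrite of $A^{-1}\,dA\,F$ as $A^{-1}{\mathcal H}F^T:dA$ used in the answer.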
-
It's also closely related to the equally non-obvious formula $${\rm vec}(AXB^T) = (B\otimes A)\,{\rm vec}(X)$$ which uses a Kronecker product instead of Kronecker deltas. – greg Dec 04 '18 at 13:22
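A small NumPy check of this identity, assuming ${\rm vec}$ stacks columns (`order="F"`); the matrices are arbitrary random test data with compatible, non-square shapes so that any mix-up of the factors would show.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((5, 4))  # so that A X B^T is 2 x 5


def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")


lhs = vec(A @ X @ B.T)        # length 2*5
rhs = np.kron(B, A) @ vec(X)  # (10 x 12) @ (12,)
print(np.allclose(lhs, rhs))

# With NumPy's default row-major flattening, the Kronecker factors swap
lhs_row = (A @ X @ B.T).ravel()
rhs_row = np.kron(A, B) @ X.ravel()
print(np.allclose(lhs_row, rhs_row))
```

The row-major variant is worth keeping in mind, since NumPy's default `ravel()` does not match the column-stacking ${\rm vec}$ used in the formula.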
-
Late to the party, but how should one modify this answer if $A$ is symmetric? – husB Dec 28 '21 at 06:18
-
(Continuing from my previous comment) My attempt was to replace $\mathcal{H}_{ijkl}$ with $\frac{1}{2} ( \delta_{ik}\delta_{jl}+ \delta_{il}\delta_{jk} )$, but I could not proceed after that. – husB Dec 28 '21 at 10:10