7

Let $A$ be a $k\times k$ matrix. I know that $$\frac{\partial A^{-1}}{\partial A} = -A^{-2} $$ but I do not know how to use this fact to obtain the following result. I would like to see the steps that lead to $$\frac{\partial a^\top A^{-1} b}{\partial A} = -(A^\top)^{-1}ab^\top (A^\top)^{-1} $$ I would also like to know the solution to $$\frac{\partial (A^\top)^{-1}ab^\top (A^\top)^{-1} }{\partial A} = ? $$

user1292919
  • 1,895

1 Answer

16

Start with the defining equation for the matrix inverse and find its differential.

$$\eqalign{ I &= A^{-1}A \\ 0 &= dA^{-1}\,A + A^{-1}\,dA \\ dA^{-1} &= -A^{-1}\,dA\,A^{-1} \\ }$$

Next, note the gradient of a matrix with respect to itself.

$$ {\mathcal H}_{ijkl} = \frac{\partial A_{ij}}{\partial A_{kl}} = \delta_{ik}\delta_{jl} $$

Note that ${\mathcal H}$ is a 4th order tensor with some interesting symmetry properties (it is isotropic). It is also the identity element for the Frobenius product, i.e. for any matrix $B$

$${\mathcal H}:B=B:{\mathcal H}=B$$

Now we can answer your first question. The function of interest is scalar-valued. Let's find its differential and gradient. The step that moves $A^{-1}$ across the colon uses the rearrangement rule $X:PYQ = P^TXQ^T:Y$, which follows from the cyclic property of the trace.

$$\eqalign{ \phi &= a^TA^{-1}b \\ &= ab^T:A^{-1} \\ d\phi &= ab^T:dA^{-1} \\ &= -ab^T:A^{-1}\,dA\,A^{-1} \\ &= -A^{-T}ab^TA^{-T}:dA \\ \frac{\partial\phi}{\partial A} &= -A^{-T}ab^TA^{-T} \\ }$$

Now let's try the second question. This time the function of interest is matrix-valued.

$$\eqalign{ F &= A^{-1}ab^TA^{-1} \\ dF &= dA^{-1}ab^TA^{-1} + A^{-1}ab^TdA^{-1} \\ &= -A^{-1}\,dA\,A^{-1}ab^TA^{-1} - A^{-1}ab^TA^{-1}\,dA\,A^{-1} \\ &= -A^{-1}\,dA\,F - F\,dA\,A^{-1} \\ &= -\Big(A^{-1}{\mathcal H}F^T + F{\mathcal H}A^{-T}\Big):dA \\ \frac{\partial F}{\partial A} &= -\Big(A^{-1}{\mathcal H}F^T+F{\mathcal H}A^{-T}\Big) \\ }$$

This gradient is a 4th order tensor.
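Both gradients above can be sanity-checked numerically. The following is only a minimal sketch in Python with NumPy: the test matrix, the vectors, and the helper names (`phi`, `F_of`, `finite_diff`) are assumptions made for illustration and are not part of the answer. It compares the closed-form expressions against central finite differences, writing the 4th order tensor in index form as $\partial F_{ij}/\partial A_{kl} = -(A^{-1}_{ik}F_{lj} + F_{ik}A^{-1}_{lj})$, which is what $dF = -A^{-1}\,dA\,F - F\,dA\,A^{-1}$ gives entry by entry.

```python
import numpy as np

# Assumed test data (not from the post): a strictly diagonally dominant,
# hence invertible, non-symmetric A and two arbitrary vectors.
A = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
a = np.array([1.0, -2.0, 0.5])
b = np.array([0.3, 1.0, -1.0])
ab = np.outer(a, b)  # the matrix a b^T


def phi(M):
    """Scalar function phi(M) = a^T M^{-1} b."""
    return a @ np.linalg.solve(M, b)


def F_of(M):
    """Matrix function F(M) = M^{-1} a b^T M^{-1}."""
    M_inv = np.linalg.inv(M)
    return M_inv @ ab @ M_inv


def finite_diff(f, M, eps=1e-6):
    """Central differences of f with respect to each entry M[k, l].

    The (k, l) indices are appended as the last two axes of the result.
    """
    out = np.zeros(np.shape(f(M)) + M.shape)
    for k in range(M.shape[0]):
        for l in range(M.shape[1]):
            E = np.zeros_like(M)
            E[k, l] = eps
            out[..., k, l] = (f(M + E) - f(M - E)) / (2 * eps)
    return out


A_inv = np.linalg.inv(A)
F = F_of(A)

# First question: d(phi)/dA = -A^{-T} a b^T A^{-T}
grad_phi = -A_inv.T @ ab @ A_inv.T

# Second question in index form:
# dF_ij/dA_kl = -(A^{-1}_ik F_lj + F_ik A^{-1}_lj)
grad_F = -(np.einsum('ik,lj->ijkl', A_inv, F)
           + np.einsum('ik,lj->ijkl', F, A_inv))

err_phi = np.max(np.abs(grad_phi - finite_diff(phi, A)))
err_F = np.max(np.abs(grad_F - finite_diff(F_of, A)))
print(err_phi, err_F)  # both should be close to zero
```

A finite-difference comparison like this does not prove the formulas, but it catches transposition and sign mistakes quickly, which are the usual failure modes in this kind of derivation.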

If you prefer, you can vectorize the matrices to flatten the result. To avoid a clash with the vector $a$ in $ab^T$, the vectorized quantities are written out as ${\rm vec}(F)$ and ${\rm vec}(A)$.

$$\eqalign{ {\rm vec}(dF) &= -{\rm vec}(A^{-1}\,dA\,F + F\,dA\,A^{-1}) \\ &= -(F^T\otimes A^{-1} + A^{-T}\otimes F)\,{\rm vec}(dA) \\ \frac{\partial\,{\rm vec}(F)}{\partial\,{\rm vec}(A)} &= -\Big(F^T\otimes A^{-1} + A^{-T}\otimes F\Big) \\ }$$

In several of the steps above, a colon was used to denote the Frobenius (double-contraction) product

$$\eqalign{ A &= {\mathcal H}:B &\implies &A_{ij} &= \sum_{kl}{\mathcal H}_{ijkl} B_{kl} \\ \alpha &= H:B &\implies &\alpha &= \sum_{ij}H_{ij} B_{ij} = {\rm Tr}(H^TB) \\ }$$
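The Kronecker-product form of the Jacobian can be checked the same way. This is a sketch under stated assumptions: the test data and the helper names `vec` and `F_of` are illustrative, and `vec` is taken to stack columns (column-major order), which is the convention under which ${\rm vec}(AXB) = (B^T\otimes A)\,{\rm vec}(X)$ holds with `np.kron`.

```python
import numpy as np

# Assumed test data (not from the post): diagonally dominant, invertible A.
A = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
a = np.array([1.0, -2.0, 0.5])
b = np.array([0.3, 1.0, -1.0])
k = A.shape[0]


def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")


def F_of(M):
    """Matrix function F(M) = M^{-1} a b^T M^{-1}."""
    M_inv = np.linalg.inv(M)
    return M_inv @ np.outer(a, b) @ M_inv


A_inv = np.linalg.inv(A)
F = F_of(A)

# Flattened gradient: -(F^T kron A^{-1} + A^{-T} kron F)
J_analytic = -(np.kron(F.T, A_inv) + np.kron(A_inv.T, F))

# Central finite differences, perturbing one entry of vec(A) at a time.
eps = 1e-6
J_numeric = np.zeros((k * k, k * k))
for n in range(k * k):
    e = np.zeros(k * k)
    e[n] = eps
    E = e.reshape(k, k, order="F")  # undo vec() on the perturbation
    J_numeric[:, n] = (vec(F_of(A + E)) - vec(F_of(A - E))) / (2 * eps)

err_J = np.max(np.abs(J_analytic - J_numeric))
print(err_J)  # should be close to zero
```

Note that NumPy flattens in row-major order by default, so the explicit `order="F"` matters here; with row-major flattening the Kronecker factors would appear in a different arrangement.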

greg
  • 35,825
  • 2
    It might be obvious to you and others, but can you explain how to get from this step $-A^{-1}\,dA\,F - F\,dA\,A^{-1}$ to this step $-\Big(A^{-1}{\mathcal H}F^T + F{\mathcal H}A^{-T}\Big):dA$? – user550103 Dec 04 '18 at 06:04
  • 1
    It's not obvious, it's just one of the properties of ${\mathcal H}$. Work out a simple example like $(A{\mathcal H}B:X)$ in index notation and, utilizing the properties of those Kronecker deltas, you'll find that it equals $(AXB^T)$ – greg Dec 04 '18 at 13:14
  • 1
    It's also closely related to the equally non-obvious formula $${\rm vec}(AXB^T) = (B\otimes A)\,{\rm vec}(X)$$ which uses a Kronecker product instead of Kronecker deltas. – greg Dec 04 '18 at 13:22
  • Thank you for the explanation, greg! – user550103 Dec 05 '18 at 09:58
  • Late to the party, but how should one modify this answer if $A$ is symmetric? – husB Dec 28 '21 at 06:18
  • (Continuing from my previous comment.) My attempt was to replace $\mathcal{H}_{ijkl}$ with $\frac{1}{2} ( \delta_{ik}\delta_{jl}+ \delta_{il}\delta_{jk} )$, but I could not proceed after that. – husB Dec 28 '21 at 10:10
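The two identities mentioned in the comments, $(A{\mathcal H}B):X = AXB^T$ and ${\rm vec}(AXB^T) = (B\otimes A)\,{\rm vec}(X)$, can be tried out numerically. This is a rough sketch, not a definitive reading of the notation: it assumes that in the product $A{\mathcal H}B$ the matrix $A$ contracts with the first index of ${\mathcal H}$ and $B$ with its last index, so that $(A{\mathcal H}B)_{ijkl} = A_{ik}B_{jl}$, and that ${\rm vec}$ stacks columns. The random test matrices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
# Arbitrary test matrices (assumed; the identities need no invertibility).
A = rng.standard_normal((k, k))
B = rng.standard_normal((k, k))
X = rng.standard_normal((k, k))
I = np.eye(k)

# H_ijkl = delta_ik * delta_jl
H = np.einsum('ik,jl->ijkl', I, I)

# H is the identity for the double-contraction (Frobenius) product.
H_times_B = np.einsum('ijkl,kl->ij', H, B)   # H : B
B_times_H = np.einsum('ij,ijkl->kl', B, H)   # B : H

# Assumed convention: A contracts with the first index of H,
# B with the last one, giving (A H B)_ijkl = A_ik * B_jl.
AHB = np.einsum('ip,pjkq,ql->ijkl', A, H, B)
AHB_colon_X = np.einsum('ijkl,kl->ij', AHB, X)  # (A H B) : X


def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")


# Kronecker-product counterpart of the same identity.
vec_lhs = vec(A @ X @ B.T)
vec_rhs = np.kron(B, A) @ vec(X)

print(np.allclose(H_times_B, B), np.allclose(B_times_H, B))
print(np.allclose(AHB_colon_X, A @ X @ B.T))
print(np.allclose(vec_lhs, vec_rhs))
```

Under the stated index convention, applying the first identity with $A \to A^{-1}$, $B \to F^T$ gives $A^{-1}\,dA\,F$, and with $A \to F$, $B \to A^{-T}$ gives $F\,dA\,A^{-1}$, which matches the step asked about in the first comment.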