How can I calculate $\dfrac{\partial a^{\rm T}A^{-\rm T}bb^{\rm T}A^{-1}a}{\partial A}$, where $A\in\mathbb{R}^{n\times n}$ and $a,b\in\mathbb{R}^n$?
-
I just finished your previous problem... Why add another b? – Jiaqi Li Feb 11 '18 at 18:52
-
@JiaqiLi Sorry about that. Could you possibly modify your answer? – Yasi Feb 11 '18 at 18:56
3 Answers
The problem was just modified. With $b$ present (as it is now), the solution is much simpler. Note that $$a^{\rm T}A^{-\rm T}b = b^{\rm T}A^{-1}a$$ since both sides are scalars and each is the transpose of the other. Hence, by the chain rule, $$\frac{\partial}{\partial A}(a^{\rm T}A^{-\rm T}bb^{\rm T}A^{-1}a)=2(b^{\rm T}A^{-1}a)\frac{\partial}{\partial A}(b^{\rm T}A^{-1}a)$$ Also note that when we take the derivative with respect to $A$, both $a$ and $b$ are treated as constants. Then $$\frac{\partial}{\partial A}(b^{\rm T}A^{-1}a)=b^{\rm T}\frac{\partial A^{-1}}{\partial A}a$$ Finally, it remains to calculate $\partial A^{-1}/\partial A$. Taking the derivative of the identity $$AA^{-1} = I$$ with respect to $A$, we obtain $$\frac{\partial}{\partial A}(AA^{-1})=IA^{-1}+A\frac{\partial A^{-1}}{\partial A}=0$$ Thus $$\frac{\partial A^{-1}}{\partial A}=-A^{-2}.$$ As pointed out in the comments, this last step does not hold in general because matrices need not commute; the correct statement is $\frac{\partial A^{-1}}{\partial A}.H = -A^{-1}HA^{-1}$.

-
The equality $\frac{\partial}{\partial A}(AA^{-1})=IA^{-1}+A\frac{\partial A^{-1}}{\partial A}=0$ isn’t correct. The subtle point is that $\frac{\partial A}{\partial A}$ isn’t the matrix $I$ but the identity map on the space of matrices. And $\frac{\partial A^{-1}}{\partial A}.H = -A^{-1}HA^{-1}$, not $-A^{-2}$. See https://math.stackexchange.com/questions/1471825/derivative-of-the-inverse-of-a-matrix for more details. – mathcounterexamples.net Feb 12 '18 at 06:52
-
@mathcounterexamples.net Yes, you are right. Thanks for pointing it out. Could you explain how we calculate $\frac{\partial A^{-1}}{\partial A}.H = -A^{-1}HA^{-1}$? – Jiaqi Li Feb 12 '18 at 23:24
-
@mathcounterexamples.net I've looked at the link you provided, but I still have trouble understanding the derivative when the independent variable is a matrix (2nd order tensor). – Jiaqi Li Feb 12 '18 at 23:27
-
What is important to understand is that for a map $f$ from the space of matrices into itself, the derivative is a map from the space of matrices into the space of linear maps on matrices. Apart from that, the link I provided is pretty explicit on the computation of the derivative of $A \mapsto A^{-1}$. – mathcounterexamples.net Feb 13 '18 at 05:42
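For readers following this thread, here is a short sketch of where $-A^{-1}HA^{-1}$ comes from, assuming $A$ is invertible and writing $dA$ for a small perturbation (the $H$ above). Differentiating the identity $AA^{-1}=I$ with the product rule, while keeping the order of the factors, gives
$$\eqalign{
A\,A^{-1} &= I \\
dA\,A^{-1} + A\,d(A^{-1}) &= 0 \\
d(A^{-1}) &= -A^{-1}\,dA\,A^{-1} \\
}$$
The increment $dA$ sits between the two copies of $A^{-1}$, so the result cannot be collapsed to $-A^{-2}\,dA$ unless $dA$ happens to commute with $A^{-1}$.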
Hint
Name $\phi_1 : A \mapsto A^{-1}$, $\phi_2 : A \mapsto b^T A a$ and $\phi_3: A \mapsto A^T A$ (applied here to a $1\times 1$ matrix, i.e. a scalar). Note that your map $\phi$ is $\phi = \phi_3 \circ \phi_2 \circ \phi_1$.
You can then use the chain rule $\phi^\prime(A).H = \phi_3^\prime\big(\phi_2(\phi_1(A))\big).\Big(\phi_2^\prime\big(\phi_1(A)\big).\big(\phi_1^\prime(A).H\big)\Big)$, based on $\phi_1^\prime(A).H =-A^{-1}HA^{-1}$, $\phi_2^\prime(A).H = b^T H a$ and, for scalar $A$, $\phi_3^\prime(A).H = 2A^T H$.
You’ll finally get:
$$\frac{\partial \phi}{\partial A}.H = -2 (b^TA^{-1}a)^Tb^TA^{-1}HA^{-1}a =-2a^T\left(A^{-1}\right)^T bb^T A^{-1}HA^{-1}a$$
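The formula above can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated $A$, $H$, $a$, $b$ (the seed, the size `n`, and the diagonal shift that keeps $A$ well conditioned are choices made for this illustration only); it compares the claimed directional derivative with a central finite difference of $\phi$ along $H$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Assumption of this sketch: shift by 2n*I so the random matrix is safely invertible
A = rng.standard_normal((n, n)) + 2 * n * np.eye(n)
H = rng.standard_normal((n, n))
a = rng.standard_normal(n)
b = rng.standard_normal(n)

def phi(M):
    """phi(M) = (b^T M^{-1} a)^2, the scalar function from the question."""
    return (b @ np.linalg.solve(M, a)) ** 2

A_inv = np.linalg.inv(A)

# Claimed directional derivative: -2 (b^T A^{-1} a) (b^T A^{-1} H A^{-1} a)
claimed = -2.0 * (b @ A_inv @ a) * (b @ A_inv @ H @ A_inv @ a)

# Central finite difference of phi along the direction H
eps = 1e-6
numeric = (phi(A + eps * H) - phi(A - eps * H)) / (2 * eps)

rel_err = abs(numeric - claimed) / max(1.0, abs(claimed))
print(rel_err)
```

If the formula is right, `rel_err` should be tiny compared with the size of the derivative itself.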

-
Since the final result is a second-order tensor, how could we transform the result to a matrix form (without using the increment $H$)? – Jiaqi Li Feb 12 '18 at 23:34
-
@JiaqiLi You can’t do that. The reason is that a linear map between matrix spaces isn’t always of the form $H \mapsto AH$. This is the case here, because matrices do not always commute. – mathcounterexamples.net Feb 13 '18 at 05:47
$
\def\l{\lambda}\def\o{{\tt1}}\def\p{\partial}
\def\A{A^{-1}}
\def\B{A^{-T}}
\def\L{\left}\def\R{\right}
\def\LR#1{\L(#1\R)}
\def\BR#1{\Big(#1\Big)}
\def\trace#1{\operatorname{Tr}\LR{#1}}
\def\qiq{\quad\implies\quad}
\def\grad#1#2{\frac{\p #1}{\p #2}}
\def\c#1{\color{red}{#1}}
$
Use a colon to denote the Frobenius product, which is a concise notation for the trace, i.e.
$$\eqalign{
A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\
A:A &= \big\|A\big\|^2_F \\
}$$
This is also called the double-dot or double contraction product.
When applied to vectors $(n=\o)$ it reduces to the standard dot product.
The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different but equivalent ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:AB &= CB^T:A = A^TC:B \\ }$$
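These rearrangement rules are easy to spot-check numerically. Below is a small sketch, assuming NumPy and random rectangular matrices; the helper name `frob`, the extra matrix `D` and the sizes `m`, `k`, `n` are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, n = 3, 4, 5  # arbitrary sizes chosen for this sketch

def frob(X, Y):
    """Frobenius product X:Y = sum_ij X_ij * Y_ij."""
    return np.sum(X * Y)

A = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, n))

checks = {
    # X:Y = Tr(X^T Y)
    "trace form": np.isclose(frob(C, D), np.trace(C.T @ D)),
    # X:Y = Y:X
    "symmetry": np.isclose(frob(C, D), frob(D, C)),
    # X:Y = X^T:Y^T
    "transpose": np.isclose(frob(C, D), frob(C.T, D.T)),
    # C:AB = C B^T : A
    "move right factor": np.isclose(frob(C, A @ B), frob(C @ B.T, A)),
    # C:AB = A^T C : B
    "move left factor": np.isclose(frob(C, A @ B), frob(A.T @ C, B)),
}
print(checks)
```

Each entry should come out true; the last two rules are the ones used to move $\A$ across the colon in the steps that follow.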
Introduce the scalar variable $$\eqalign{ \l \;=\; {a^T\B b} \;=\; {b^T\A a} \;=\; {ba^T:\A} }$$ whose differential is $$\eqalign{ d\l &= {ba^T:\c{d\A}} \\ &= ba^T:\c{\LR{-\A\;dA\;\A}} \\ &= -\LR{\B ba^T\B}:dA \\ }$$
Use the above notation to write the function, then calculate its differential and gradient. $$\eqalign{ f &= \l^2 \\ df &= 2\l\;\c{d\l} \\ &= -2\l \c{\LR{\B ba^T\B}:dA} \\ \grad{f}{A} &= -2\l \LR{\B ba^T\B} \\ &= -2 \LR{b^T\A a} \LR{\B ba^T\B} \\\\ }$$
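As a rough check of the final gradient, one can compare it entry by entry with finite differences. This is only a sketch, assuming NumPy and random test data (the seed, the size `n` and the diagonal shift that keeps $A$ invertible are choices made for this illustration).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# Assumption of this sketch: shift by 2n*I so the random matrix is safely invertible
A = rng.standard_normal((n, n)) + 2 * n * np.eye(n)
a = rng.standard_normal(n)
b = rng.standard_normal(n)

def f(M):
    """f(M) = (b^T M^{-1} a)^2."""
    return (b @ np.linalg.solve(M, a)) ** 2

# Closed form from the derivation: -2 (b^T A^{-1} a) A^{-T} b a^T A^{-T}
A_inv_T = np.linalg.inv(A).T
lam = b @ np.linalg.solve(A, a)
grad = -2.0 * lam * (A_inv_T @ np.outer(b, a) @ A_inv_T)

# Central finite differences, perturbing one entry A_ij at a time
eps = 1e-6
numeric = np.zeros_like(A)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(A)
        E[i, j] = eps
        numeric[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

max_err = np.max(np.abs(numeric - grad))
print(max_err)
```

The Frobenius product of `grad` with any direction $H$, i.e. `np.sum(grad * H)`, gives the directional derivative of $f$ along $H$, which is how the matrix-valued gradient encodes the linear map $H \mapsto df$.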
