
How can I calculate $\dfrac{\partial a^{\rm T}A^{-\rm T}bb^{\rm T}A^{-1}a}{\partial A}$, where $A\in\mathbb{R}^{n\times n}$ and $a,b\in\mathbb{R}^n$?

Yasi

3 Answers


The question has just been edited. Now that it includes $b$, the solution is much simpler. Note that $$a^{\rm T}A^{-\rm T}b = b^{\rm T}A^{-1}a$$ since both sides are scalars and each one is the transpose of the other. Hence, by the chain rule, $$\frac{\partial}{\partial A}(a^{\rm T}A^{-\rm T}bb^{\rm T}A^{-1}a)=2(b^{\rm T}A^{-1}a)\frac{\partial}{\partial A}(b^{\rm T}A^{-1}a)$$ Also note that when we differentiate with respect to $A$, both $a$ and $b$ are treated as constants. Then $$\frac{\partial}{\partial A}(b^{\rm T}A^{-1}a)=b^{\rm T}\frac{\partial A^{-1}}{\partial A}a$$ Finally, it remains to calculate $\partial A^{-1}/\partial A$. Differentiating the identity $$AA^{-1} = I$$ with respect to $A$, we obtain $$\frac{\partial}{\partial A}(AA^{-1})=IA^{-1}+A\frac{\partial A^{-1}}{\partial A}=0$$ Thus $$\frac{\partial A^{-1}}{\partial A}=-A^{-2}.$$

Jiaqi Li
  • The equality $\frac{\partial}{\partial A}(AA^{-1})=IA^{-1}+A\frac{\partial A^{-1}}{\partial A}=0$ isn’t correct. The subtle point is that $\frac{\partial A}{\partial A}$ isn’t the matrix $I$ but the identity map $H \mapsto H$. Also, $\frac{\partial A^{-1}}{\partial A}.H = -A^{-1}HA^{-1}$, not $-A^{-2}$. See https://math.stackexchange.com/questions/1471825/derivative-of-the-inverse-of-a-matrix for more details. – mathcounterexamples.net Feb 12 '18 at 06:52
  • @mathcounterexamples.net Yes, you are right. Thanks for pointing it out. Could you explain how we calculate $\frac{\partial A^{-1}}{\partial A}.H = -A^{-1}HA^{-1}$? – Jiaqi Li Feb 12 '18 at 23:24
  • @mathcounterexamples.net I've looked at the link you provided, but I still have trouble understanding the derivative when the independent variable is a matrix (2nd order tensor). – Jiaqi Li Feb 12 '18 at 23:27
  • What is important to understand is that for a map $f$ from the space of matrices to itself, the derivative is a map from the space of matrices into the space of linear maps on matrices. Apart from that, the link I provided is pretty explicit about computing the derivative of $A \mapsto A^{-1}$. – mathcounterexamples.net Feb 13 '18 at 05:42
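
A quick numerical sanity check of the point raised in these comments can be sketched in plain Python (no third-party packages). It compares a central finite difference of $A \mapsto A^{-1}$ in a direction $H$ with the two candidates $-A^{-1}HA^{-1}$ and $-A^{-2}H$. The helper names (`matmul`, `add`, `inverse`, `max_abs_diff`) and the 2x2 test data are illustrative assumptions, not part of the thread, and a finite-difference comparison at one point supports a formula without proving it.

```python
# Compare a central finite difference of A -> A^{-1} with the two candidate
# derivatives discussed above: -A^{-1} H A^{-1} and -A^{-2} H.

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def add(X, Y, s=1.0):
    """Return X + s*Y."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inverse(X):
    """Inverse via Gauss-Jordan elimination with partial pivoting."""
    n = len(X)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(X)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [row[n:] for row in M]

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# Illustrative data (hypothetical); H does not commute with A^{-1}.
A = [[2.0, 1.0], [0.0, 1.0]]
H = [[0.0, 0.0], [1.0, 0.0]]
eps = 1e-6

Ainv = inverse(A)
fd = add(inverse(add(A, H, eps)), inverse(add(A, H, -eps)), -1.0)
fd = [[v / (2 * eps) for v in row] for row in fd]

correct = [[-v for v in row] for row in matmul(matmul(Ainv, H), Ainv)]
candidate = [[-v for v in row] for row in matmul(matmul(Ainv, Ainv), H)]

print(max_abs_diff(fd, correct))    # close to 0 (finite-difference error only)
print(max_abs_diff(fd, candidate))  # about 0.5 for this data
```

The two candidates agree only when $H$ commutes with $A^{-1}$, which is why a non-commuting direction is chosen here.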

Hint

Define $\phi_1 : A \mapsto A^{-1}$, $\phi_2 : A \mapsto b^T A a$ and $\phi_3: x \mapsto x^T x$, where $x$ is a $1\times 1$ matrix (so $\phi_3(x) = x^2$). Note that your map $\phi$ is $\phi = \phi_3 \circ \phi_2 \circ \phi_1$.

You can then use the chain rule $\phi^\prime(A) = \phi_3^\prime\big(\phi_2(\phi_1(A))\big) \circ \phi_2^\prime\big(\phi_1(A)\big) \circ \phi_1^\prime(A)$, based on $\phi_1^\prime(A).H =-A^{-1}HA^{-1}$, $\phi_2^\prime(A).H = b^T H a$ (as $\phi_2$ is linear) and $\phi_3^\prime(x).h = 2x^T h$.

You’ll finally get:

$$\frac{\partial \phi}{\partial A}.H = -2 (b^TA^{-1}a)^Tb^TA^{-1}HA^{-1}a =-2a^T\left(A^{-1}\right)^T bb^T A^{-1}HA^{-1}a$$
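
The hint's final formula can be checked numerically against a central finite difference of $\phi(A) = (b^TA^{-1}a)^2$ along one direction $H$. This is a minimal plain-Python sketch; the helper names and the values of $A$, $a$, $b$, $H$ are hypothetical test data chosen only so that $A$ is invertible.

```python
# Check  phi'(A).H = -2 (b^T A^{-1} a)^T b^T A^{-1} H A^{-1} a  numerically.

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def add(X, Y, s=1.0):
    """Return X + s*Y."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inverse(X):
    """Inverse via Gauss-Jordan elimination with partial pivoting."""
    n = len(X)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(X)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [row[n:] for row in M]

def phi(A, a, b):
    """phi(A) = a^T A^{-T} b b^T A^{-1} a = (b^T A^{-1} a)^2."""
    lam = matmul(matmul(transpose(b), inverse(A)), a)[0][0]
    return lam * lam

def dphi(A, H, a, b):
    """Directional derivative from the hint, evaluated in direction H."""
    Ainv = inverse(A)
    bT_Ainv = matmul(transpose(b), Ainv)
    lam = matmul(bT_Ainv, a)[0][0]
    return -2.0 * lam * matmul(matmul(matmul(bT_Ainv, H), Ainv), a)[0][0]

# Hypothetical test data; vectors are stored as n x 1 matrices.
A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 5.0]]
a = [[1.0], [2.0], [-1.0]]
b = [[0.5], [-1.0], [2.0]]
H = [[0.3, -1.0, 0.0], [0.5, 0.2, 1.0], [-0.7, 0.0, 0.4]]
eps = 1e-6

fd = (phi(add(A, H, eps), a, b) - phi(add(A, H, -eps), a, b)) / (2 * eps)
analytic = dphi(A, H, a, b)
print(fd, analytic)  # the two numbers should agree to several digits
```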

  • Since the final result is a second-order tensor, how could we write it in matrix form (without using the increment $H$)? – Jiaqi Li Feb 12 '18 at 23:34
  • @JiaqiLi You can’t do that. A linear map between spaces of matrices isn’t always of the form $H \mapsto AH$, and here it isn’t, because matrices don’t commute in general. – mathcounterexamples.net Feb 13 '18 at 05:47

$ \def\l{\lambda}\def\o{{\tt1}}\def\p{\partial} \def\A{A^{-1}} \def\B{A^{-T}} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\Big(#1\Big)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Use a colon to denote the Frobenius product, which is a concise notation for the trace, i.e. $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \big\|A\big\|^2_F \\ }$$ This is also called the double-dot or double contraction product.
When applied to vectors $(n=\o)$ it reduces to the standard dot product.

The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different but equivalent ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:AB &= CB^T:A = A^TC:B \\ }$$
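
The rearrangement rules above can be spot-checked on small integer matrices, where every product and sum is exact. This plain-Python sketch uses hypothetical helper names (`frob`, `trace`, `matmul`, `transpose`) and arbitrary example matrices; passing on one example illustrates the identities rather than proving them.

```python
# Spot-check the Frobenius-product identities on exact integer matrices.

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def frob(X, Y):
    """Frobenius product X:Y, the sum of elementwise products."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

# Arbitrary example matrices (illustrative only).
A = [[1, 2, 0], [3, -1, 4], [2, 2, 1]]
B = [[0, 1, 5], [-2, 3, 1], [4, 0, -1]]
C = [[1, 0, 2], [0, 3, -1], [2, 1, 1]]

checks = {
    "A:B = Tr(A^T B)": frob(A, B) == trace(matmul(transpose(A), B)),
    "A:B = B:A": frob(A, B) == frob(B, A),
    "A:B = A^T:B^T": frob(A, B) == frob(transpose(A), transpose(B)),
    "C:AB = CB^T:A": frob(C, matmul(A, B)) == frob(matmul(C, transpose(B)), A),
    "C:AB = A^TC:B": frob(C, matmul(A, B)) == frob(matmul(transpose(A), C), B),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")  # every line ends in True
```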

Introduce the scalar variable $$\eqalign{ \l \;=\; {a^T\B b} \;=\; {b^T\A a} \;=\; {ba^T:\A} }$$ whose differential is $$\eqalign{ d\l &= {ba^T:\c{d\A}} \\ &= ba^T:\c{\LR{-\A\;dA\;\A}} \\ &= -\LR{\B ba^T\B}:dA \\ }$$
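
Both claims in this step, that the three expressions for $\lambda$ coincide and that $d\lambda = -\big(A^{-T}ba^TA^{-T}\big):dA$, can be checked numerically. The sketch below is plain Python with hypothetical helper names; $A$, $a$, $b$ and the perturbation direction `dA` are made-up test values, and the differential is compared with a central difference of $\lambda$ along `dA`.

```python
# Check lambda = a^T A^{-T} b = b^T A^{-1} a = b a^T : A^{-1}
# and d(lambda) = -(A^{-T} b a^T A^{-T}) : dA.

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def add(X, Y, s=1.0):
    """Return X + s*Y."""
    return [[x + s * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def frob(X, Y):
    """Frobenius product X:Y."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def inverse(X):
    """Inverse via Gauss-Jordan elimination with partial pivoting."""
    n = len(X)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(X)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Hypothetical test data; vectors are stored as n x 1 matrices.
A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 5.0]]
a = [[1.0], [2.0], [-1.0]]
b = [[0.5], [-1.0], [2.0]]
dA = [[0.3, -1.0, 0.0], [0.5, 0.2, 1.0], [-0.7, 0.0, 0.4]]  # perturbation direction
eps = 1e-6

def lam(M):
    """lambda(M) = b^T M^{-1} a."""
    return matmul(matmul(transpose(b), inverse(M)), a)[0][0]

Ainv = inverse(A)
AinvT = transpose(Ainv)

lam_1 = matmul(matmul(transpose(a), AinvT), b)[0][0]  # a^T A^{-T} b
lam_2 = lam(A)                                          # b^T A^{-1} a
lam_3 = frob(matmul(b, transpose(a)), Ainv)             # b a^T : A^{-1}

# Gradient of lambda with respect to A: -A^{-T} b a^T A^{-T}
G = [[-v for v in row] for row in matmul(matmul(matmul(AinvT, b), transpose(a)), AinvT)]
analytic = frob(G, dA)
numeric = (lam(add(A, dA, eps)) - lam(add(A, dA, -eps))) / (2 * eps)
print(lam_1, lam_2, lam_3)
print(analytic, numeric)
```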


Use the above notation to write the function, then calculate its differential and gradient. $$\eqalign{ f &= \l^2 \\ df &= 2\l\;\c{d\l} \\ &= -2\l \c{\LR{\B ba^T\B}:dA} \\ \grad{f}{A} &= -2\l \LR{\B ba^T\B} \\ &= -2 \LR{b^T\A a} \LR{\B ba^T\B} \\\\ }$$
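
Because this gradient is an ordinary $n\times n$ matrix, it can be compared entry by entry with finite differences that perturb one $A_{ij}$ at a time. The following plain-Python sketch does that; the helper names and the test values of $A$, $a$, $b$ are hypothetical, and central differences are used because their truncation error shrinks like $\varepsilon^2$.

```python
# Compare grad f = -2 (b^T A^{-1} a) A^{-T} b a^T A^{-T} with an
# entrywise central-difference gradient of f(A) = (b^T A^{-1} a)^2.

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def inverse(X):
    """Inverse via Gauss-Jordan elimination with partial pivoting."""
    n = len(X)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(X)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Hypothetical test data; vectors are stored as n x 1 matrices.
A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 5.0]]
a = [[1.0], [2.0], [-1.0]]
b = [[0.5], [-1.0], [2.0]]

def f(M):
    """f(M) = lambda^2 with lambda = b^T M^{-1} a."""
    lam = matmul(matmul(transpose(b), inverse(M)), a)[0][0]
    return lam * lam

Ainv = inverse(A)
AinvT = transpose(Ainv)
lam = matmul(matmul(transpose(b), Ainv), a)[0][0]

# Closed-form gradient from the answer above.
grad = [[-2.0 * lam * v for v in row]
        for row in matmul(matmul(matmul(AinvT, b), transpose(a)), AinvT)]

# Finite-difference gradient: perturb one entry A_ij at a time.
n, eps = len(A), 1e-6
fd_grad = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        Ap = [row[:] for row in A]
        Am = [row[:] for row in A]
        Ap[i][j] += eps
        Am[i][j] -= eps
        fd_grad[i][j] = (f(Ap) - f(Am)) / (2 * eps)

err = max(abs(g - h) for rg, rh in zip(grad, fd_grad) for g, h in zip(rg, rh))
print(err)  # should be tiny (finite-difference error only)
```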

greg