How to show $\frac {\partial a^{T}X^{-1}b}{\partial X} = -\left( X^{-1}\right) ^{T}ab^{T}\left( X^{-1}\right) ^{T}$?

Question

I am struggling with this proof, where $X$ is an invertible $n \times n$ matrix and $a$ and $b$ are $n$-vectors.

$$\frac {\partial a^{T}X^{-1}b}{\partial X} = -\left( X^{-1}\right) ^{T}ab^{T}\left( X^{-1}\right) ^{T}$$

I know $$\frac {\partial }{\partial X}f\left( X\right) ^{-1}=-f\left( X\right) ^{-1}\dfrac {\partial f\left( X\right) }{\partial X}f\left( X\right) ^{-1}$$

and am guessing I should use this fact. I also know $\dfrac {\partial a^{T}Xb}{\partial X} = ab^{T}$.
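As a quick sanity check of the fact $\dfrac {\partial a^{T}Xb}{\partial X} = ab^{T}$, here is a minimal plain-Python sketch (no third-party packages) that compares a central finite difference against the outer product $ab^T$. The helper names `f_linear` and `numeric_grad` and the test values are illustrative assumptions, not part of the original post.

```python
def f_linear(a, X, b):
    """Scalar a^T X b for an m-vector a, an m x n matrix X and an n-vector b (plain lists)."""
    return sum(a[i] * X[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def numeric_grad(func, X, h=1e-6):
    """Central-difference estimate of d func / d X_ij for every entry (i, j) of X."""
    rows, cols = len(X), len(X[0])
    G = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (func(Xp) - func(Xm)) / (2 * h)
    return G


# Arbitrary test data (assumption): m = 2, n = 3, so X need not be square for this fact.
a = [1.0, 2.0]
b = [3.0, -1.0, 0.5]
X = [[0.3, -1.2, 2.0], [1.1, 0.4, -0.7]]

G_numeric = numeric_grad(lambda M: f_linear(a, M, b), X)
G_formula = [[ai * bj for bj in b] for ai in a]  # the outer product a b^T
err = max(abs(p - q) for rp, rq in zip(G_numeric, G_formula) for p, q in zip(rp, rq))
print(f"max |finite difference - ab^T| = {err:.2e}")
```

Because $a^TXb$ is linear in $X$, the finite difference matches $ab^T$ up to rounding error only.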

When I use the chain rule I don't seem to get the form with the transposes.

I believe the result should be a row vector with one entry per entry of $X$.

This is not a duplicate: the result is different (the middle factor here is not $a^{T}b^{T}$) because the function being differentiated is not the same, and I am trying to understand how this works. — JimSi, Aug 25 '19 at 16:26
@JimSi The duplicate is more general, but your question is precisely the case where $A=a^T$ is a row matrix and $B=b$ is a column matrix. If you don't understand the duplicate, you can ask a new question requesting clarification. — Arnaud D., Aug 25 '19 at 16:57
Agreed, though I still don't understand exactly how the technique below works. But yes, you are of course right. — JimSi, Aug 26 '19 at 17:40

score 2 · Accepted Answer · answered Aug 22 '19 at 10:01


Before we start deriving the gradient, some facts and notations for brevity:

Trace and Frobenius product relation $$\left\langle A, B C\right\rangle={\rm tr}(A^TBC) := A : B C$$
Symmetry and cyclic properties of the trace/Frobenius product: \begin{align} A : B C &= BC : A \\ &= A C^T : B \\ &= B^T A : C \\ &= {\text{etc.}} \end{align}
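These identities can be spot-checked numerically. The following plain-Python sketch uses hypothetical helpers (`matmul`, `transpose`, `frob`, `trace`) and arbitrary small matrices chosen so that every product is defined ($A$ is $2\times 2$, $B$ is $2\times 3$, $C$ is $3\times 2$); each expression should evaluate to the same number.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def frob(A, B):
    """Frobenius product A : B = sum_ij A_ij B_ij (equal to tr(A^T B))."""
    return sum(x * y for ra, rb in zip(A, B) for x, y in zip(ra, rb))


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


# Arbitrary test matrices (assumption), shaped so that A and BC are both 2 x 2.
A = [[1.0, -2.0], [0.5, 3.0]]
B = [[2.0, 0.0, 1.0], [-1.0, 4.0, 0.5]]
C = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]

BC = matmul(B, C)
values = {
    "A : BC": frob(A, BC),
    "tr(A^T BC)": trace(matmul(transpose(A), BC)),
    "BC : A": frob(BC, A),
    "AC^T : B": frob(matmul(A, transpose(C)), B),
    "B^T A : C": frob(matmul(transpose(B), A), C),
}
for name, v in values.items():
    print(f"{name:12s} = {v}")
```

With these particular matrices every entry evaluates to -21.25; the agreement is exact here only because the entries are small binary fractions, and general floating-point inputs agree up to rounding.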

Firstly, we obtain the differential for $X^{-1}$, which will be utilized for the gradient you are seeking: \begin{align} d\left[X^{-1}X = I\right] &= dX^{-1} X + X^{-1}dX = 0 \\ & \Leftrightarrow dX^{-1} = -X^{-1} dX X^{-1} \ . \end{align}
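A minimal numerical check of $dX^{-1} = -X^{-1}\, dX\, X^{-1}$, assuming a small invertible test matrix and a tiny perturbation $dX = tE$ (both arbitrary choices): the actual change $(X+dX)^{-1} - X^{-1}$ should match the first-order prediction up to terms of order $t^2$. The Gauss-Jordan `inverse` helper and the other helper names are hypothetical, written out only to keep the sketch free of dependencies.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def add(A, B, s=1.0):
    """Return A + s*B entrywise."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


# Arbitrary non-symmetric invertible X (det = 26.5) and perturbation direction E (assumptions).
X = [[2.0, 1.0, 0.0], [-1.0, 3.0, 1.0], [0.5, 1.0, 4.0]]
E = [[0.3, -0.1, 0.2], [0.0, 0.5, -0.4], [0.1, 0.2, -0.3]]
t = 1e-6
dX = [[t * e for e in row] for row in E]

Xinv = inverse(X)
actual = add(inverse(add(X, dX)), Xinv, -1.0)  # (X + dX)^{-1} - X^{-1}
predicted = [[-x for x in row] for row in matmul(matmul(Xinv, dX), Xinv)]  # -X^{-1} dX X^{-1}
err = max(abs(p - q) for rp, rq in zip(actual, predicted) for p, q in zip(rp, rq))
print(f"max |actual - predicted| = {err:.2e}  (first-order terms are ~{t:.0e})")
```

The remaining discrepancy is of order $t^2$, which is what the first-order differential neglects.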

Let $f := a^T X^{-1} b = a: X^{-1} b$.

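Now we can obtain the differential first, and then read off the gradient $\frac{\partial f}{\partial X}$: \begin{align} df &= a: dX^{-1} b \\ &= -a: X^{-1} dX X^{-1} b\\ &= -X^{-T} a : dX X^{-1} b \\ &= -X^{-T} a \left(X^{-1} b\right)^T : dX \\ &= -X^{-T} a b^T X^{-T} : dX \end{align} where the third line moves $X^{-1}$ onto the left factor as $X^{-T}$, and the fourth moves $X^{-1}b$ onto the left factor as $\left(X^{-1}b\right)^T = b^T X^{-T}$.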

Thus, the gradient is \begin{align} \frac{\partial f}{\partial X} = -X^{-T} a b^T X^{-T}. \end{align}
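To see that the transposes really belong in the result, here is a plain-Python finite-difference check of $\frac{\partial f}{\partial X} = -X^{-T} a b^T X^{-T}$. It is a sketch under stated assumptions: the test matrix `X` is deliberately non-symmetric (for symmetric $X$ the transposes would be invisible), $a$ and $b$ are stored as $3\times 1$ column matrices, and all helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def f(X, a, b):
    """Scalar f(X) = a^T X^{-1} b, with a and b given as n x 1 column matrices."""
    return matmul(matmul(transpose(a), inverse(X)), b)[0][0]


def numeric_grad(func, X, h=1e-5):
    """Central-difference estimate of d func / d X_ij for every entry (i, j)."""
    n, m = len(X), len(X[0])
    G = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (func(Xp) - func(Xm)) / (2 * h)
    return G


# Arbitrary test data (assumption): non-symmetric invertible X, column vectors a and b.
X = [[2.0, 1.0, 0.0], [-1.0, 3.0, 1.0], [0.5, 1.0, 4.0]]
a = [[1.0], [2.0], [-1.0]]
b = [[0.5], [-1.0], [3.0]]

XinvT = transpose(inverse(X))
# Closed form from the answer: -X^{-T} a b^T X^{-T}
G_formula = [[-x for x in row]
             for row in matmul(matmul(matmul(XinvT, a), transpose(b)), XinvT)]
G_numeric = numeric_grad(lambda M: f(M, a, b), X)
err = max(abs(p - q) for rp, rq in zip(G_formula, G_numeric) for p, q in zip(rp, rq))
print(f"max |formula - finite difference| = {err:.2e}")
```

The gradient here is laid out with the same shape as $X$, i.e. entry $(i,j)$ is $\partial f / \partial X_{ij}$, which is the convention the Frobenius-product derivation produces.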


user550103


Thanks, that taught me a lot. From $\langle A,B\rangle_{F}=\sum_{i,j}A_{ij}B_{ij}= tr\left( A^{T}B\right) = tr\left( AB^{T}\right)$, etc. – JimSi Aug 22 '19 at 12:04
How common is this notation for the Frobenius inner product? It's certainly handy but I hadn't encountered it before. – Semiclassical Aug 23 '19 at 05:32
I must admit that I don't know how common it is. But I know a few people who use it (here on Stack Exchange, for instance: greg, lynn, frank). I have also seen it in some lecture notes, e.g., http://www.cs.cmu.edu/~ggordon/10725-F12/slides/10-matrix.pdf (slide #17). Since it is just notation and turns out to be handy, we should embrace it, in my humble opinion. – user550103 Aug 23 '19 at 08:04
Hi @user550103, I'm having another look at this technique, as it appears really useful and I realise I don't fully understand it. You have $f := a^T X^{-1} b = a: X^{-1} b = \langle a,X^{-1} b\rangle_{F}$. – JimSi Aug 25 '19 at 15:50
But then $df = \langle -X^{-T} a b^T X^{-T}, dX\rangle_{F}$; wouldn't that become $df = (-X^{-T} a b^T X^{-T})^{T}dX$? – JimSi Aug 25 '19 at 15:51
@JimSi $df = Tr(...)$ – Spaceship222 Sep 10 '19 at 10:11
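To make the last two comments concrete: $G : dX$ with $G = -X^{-T} a b^T X^{-T}$ means $\operatorname{tr}(G^T dX)$, a scalar, while $G^T dX$ on its own is an $n\times n$ matrix. The plain-Python sketch below, with hypothetical helpers and arbitrary test data, compares $\operatorname{tr}(G^T dX)$, the entrywise sum $\sum_{i,j} G_{ij}\, dX_{ij}$, and the actual change $f(X+dX) - f(X)$ for a small $dX$.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def f(X, a, b):
    """Scalar f(X) = a^T X^{-1} b, with a and b given as n x 1 column matrices."""
    return matmul(matmul(transpose(a), inverse(X)), b)[0][0]


# Arbitrary test data (assumption).
X = [[2.0, 1.0, 0.0], [-1.0, 3.0, 1.0], [0.5, 1.0, 4.0]]
a = [[1.0], [2.0], [-1.0]]
b = [[0.5], [-1.0], [3.0]]
t = 1e-6
dX = [[t * e for e in row] for row in [[0.3, -0.1, 0.2], [0.0, 0.5, -0.4], [0.1, 0.2, -0.3]]]

XinvT = transpose(inverse(X))
G = [[-x for x in row] for row in matmul(matmul(matmul(XinvT, a), transpose(b)), XinvT)]

GT_dX = matmul(transpose(G), dX)  # G^T dX is a 3 x 3 matrix, not a scalar
df_trace = trace(GT_dX)           # G : dX = tr(G^T dX), a scalar
df_frob = sum(g * d for rg, rd in zip(G, dX) for g, d in zip(rg, rd))  # sum_ij G_ij dX_ij
X_plus = [[x + d for x, d in zip(rx, rd)] for rx, rd in zip(X, dX)]
df_actual = f(X_plus, a, b) - f(X, a, b)

print("shape of G^T dX:", len(GT_dX), "x", len(GT_dX[0]))
print(f"tr(G^T dX) = {df_trace:.6e}, sum G_ij dX_ij = {df_frob:.6e}, actual df = {df_actual:.6e}")
```

The first two numbers agree up to rounding, and both match the actual change in $f$ up to second-order terms in $dX$.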
