
While deriving GDA (Gaussian discriminant analysis) as a generative algorithm, I got stuck on how to take the gradient

$$\nabla_X \left( a^TX^{-1}b \right)$$

where $a, b$ are column vectors independent of $X$.

I have tried using the trace operator and the chain rule, but could not crack it. How should this derivative be approached?


The expected answer is

$$-X^{-T}ab^TX^{-T}$$
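A quick way to gain confidence in this expression is to compare it entry by entry against central finite differences. The sketch below assumes NumPy; the helper names `f`, `closed_form_grad` and `fd_grad`, the random test data, the shift `n * np.eye(n)` (used only to keep $X$ safely invertible) and the step `eps` are all illustrative choices, not part of the question.

```python
import numpy as np

def f(X, a, b):
    # phi(X) = a^T X^{-1} b, evaluated with a linear solve instead of an explicit inverse
    return a @ np.linalg.solve(X, b)

def closed_form_grad(X, a, b):
    # The stated answer: -X^{-T} a b^T X^{-T}
    Xinv_T = np.linalg.inv(X).T
    return -Xinv_T @ np.outer(a, b) @ Xinv_T

def fd_grad(X, a, b, eps=1e-6):
    # Central differences, perturbing one entry X[i, j] at a time
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E, a, b) - f(X - E, a, b)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)  # hypothetical test matrix, shifted to be well conditioned
a = rng.standard_normal(n)
b = rng.standard_normal(n)

err = np.max(np.abs(closed_form_grad(X, a, b) - fd_grad(X, a, b)))
print(err)  # should be tiny, limited only by finite-difference and rounding error
```

Note that the gradient is laid out so that entry $(i,j)$ is $\partial\phi/\partial X_{ij}$, which is exactly what `fd_grad` computes.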

Freemn

2 Answers


Let's use a colon to denote the trace/Frobenius product, i.e.
$$A:B = {\rm Tr}(A^TB)$$
Use the Frobenius product to write the function, then find its differential and gradient.
$$\eqalign{ \phi &= a^TX^{-1}b = a:X^{-1}b \cr &= ab^T:X^{-1} \cr d\phi &= ab^T:dX^{-1} = ab^T:(-X^{-1}\,dX\,X^{-1}) \cr &= -X^{-T}ab^TX^{-T}:dX \cr \frac{\partial \phi}{\partial X} &= -X^{-T}ab^TX^{-T} \cr }$$
NB:

The cyclic property of the trace allows the terms in a Frobenius product to be rearranged, e.g.
$$\eqalign{ A:BC &= B^TA:C = AC^T:B }$$
The differential of $X^{-1}$ is obtained by differentiating its defining property and then right-multiplying by $X^{-1}$:
$$\eqalign{ I &= X^{-1}X \cr dI &= dX^{-1}X+X^{-1}dX \cr 0 &= dX^{-1}+X^{-1}dX\,X^{-1} \cr dX^{-1} &= -X^{-1}dX\,X^{-1} \cr }$$
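The two facts used above, the rearrangement rules for the colon product and the differential of the inverse, can both be checked numerically. This is a minimal NumPy sketch; `frob` is a hypothetical helper implementing $A:B = {\rm Tr}(A^TB)$, and the random matrices and the perturbation size `1e-6` are arbitrary test choices.

```python
import numpy as np

def frob(A, B):
    # A : B = Tr(A^T B), the Frobenius product
    return np.trace(A.T @ B)

rng = np.random.default_rng(1)
n = 3
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

# Rearrangement rules: A:BC = B^T A:C = A C^T:B
lhs = frob(A, B @ C)
print(np.isclose(lhs, frob(B.T @ A, C)), np.isclose(lhs, frob(A @ C.T, B)))

# Differential of the inverse: (X + dX)^{-1} - X^{-1} matches -X^{-1} dX X^{-1} to first order
X = rng.standard_normal((n, n)) + n * np.eye(n)  # shifted so X is safely invertible
dX = 1e-6 * rng.standard_normal((n, n))          # small random perturbation
Xinv = np.linalg.inv(X)
actual = np.linalg.inv(X + dX) - Xinv
predicted = -Xinv @ dX @ Xinv
rel_err = np.linalg.norm(actual - predicted) / np.linalg.norm(actual)
print(rel_err)  # the neglected term is second order, so this scales with the size of dX
```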

greg

Let the function $f : \mbox{GL}_n (\mathbb R) \to \mathbb R$ be defined by

$$f (\mathrm X) := \mathrm a^{\top} \mathrm X^{-1} \mathrm b$$

where $\mathrm a, \mathrm b \in \mathbb R^n$ are given. Then, for $|h| \ll 1$,

$$\begin{array}{rl} f (\mathrm X + h \mathrm V) &= \mathrm a^{\top} (\mathrm X + h \mathrm V)^{-1} \mathrm b\\ &= \mathrm a^{\top} (\mathrm I_n + h \mathrm X^{-1} \mathrm V)^{-1} \mathrm X^{-1} \mathrm b\\ &\approx \mathrm a^{\top} (\mathrm I_n - h \mathrm X^{-1} \mathrm V) \mathrm X^{-1} \mathrm b\\ &= f (\mathrm X) - h \, \mathrm a^{\top} \mathrm X^{-1} \mathrm V \mathrm X^{-1} \mathrm b\\ &= f (\mathrm X) - h \, \mbox{tr} \left( \mathrm X^{-1} \mathrm b \mathrm a^{\top} \mathrm X^{-1} \mathrm V \right)\\ &= f (\mathrm X) + h \left\langle \color{blue}{-\mathrm X^{-\top} \mathrm a \mathrm b^{\top} \mathrm X^{-\top}} , \mathrm V \right\rangle \end{array}$$
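The expansion above says that, after subtracting the linear term $h\langle G, \mathrm V\rangle$, what remains of $f(\mathrm X + h\mathrm V) - f(\mathrm X)$ is $O(h^2)$. A minimal NumPy sketch of that check follows; the direction `V`, the step sizes and the random data are arbitrary assumptions, and $\langle G, \mathrm V\rangle = \mbox{tr}(G^\top \mathrm V)$ is computed as an elementwise sum.

```python
import numpy as np

def f(X, a, b):
    # f(X) = a^T X^{-1} b
    return a @ np.linalg.solve(X, b)

rng = np.random.default_rng(2)
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)  # shifted to stay invertible
V = rng.standard_normal((n, n))                  # arbitrary direction
a, b = rng.standard_normal(n), rng.standard_normal(n)

Xinv = np.linalg.inv(X)
G = -Xinv.T @ np.outer(a, b) @ Xinv.T  # candidate gradient from the expansion
inner = np.sum(G * V)                  # <G, V> = tr(G^T V)

# Remainder after the linear term; it should shrink roughly like h^2
remainders = []
for h in (1e-2, 1e-3, 1e-4):
    r = f(X + h * V, a, b) - f(X, a, b) - h * inner
    remainders.append(abs(r))
    print(h, abs(r))
```

Each tenfold reduction in `h` should cut the remainder by roughly a factor of one hundred, which is what a correct first-order term implies.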

Thus, the gradient of $f$ with respect to $\rm X$ is

$$\nabla_{\rm X} f (\mathrm X) = \color{blue}{-\mathrm X^{-\top} \mathrm a \mathrm b^{\top} \mathrm X^{-\top}}$$
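Since $\mathrm b^\top \mathrm X^{-\top} = (\mathrm X^{-1}\mathrm b)^\top$, the gradient factors as the rank-one matrix $-(\mathrm X^{-\top}\mathrm a)(\mathrm X^{-1}\mathrm b)^\top$, so in practice it can be formed from two linear solves without inverting $\mathrm X$. The sketch below assumes NumPy; `grad_via_solves` is a hypothetical name and the random data is only for comparison against the explicit formula.

```python
import numpy as np

def grad_via_solves(X, a, b):
    # -X^{-T} a b^T X^{-T} = -(X^{-T} a)(X^{-1} b)^T, a rank-one matrix
    u = np.linalg.solve(X.T, a)  # u = X^{-T} a
    v = np.linalg.solve(X, b)    # v = X^{-1} b
    return -np.outer(u, v)

rng = np.random.default_rng(3)
n = 5
X = rng.standard_normal((n, n)) + n * np.eye(n)
a, b = rng.standard_normal(n), rng.standard_normal(n)

Xinv_T = np.linalg.inv(X).T
explicit = -Xinv_T @ np.outer(a, b) @ Xinv_T
G = grad_via_solves(X, a, b)
print(np.allclose(G, explicit), np.linalg.matrix_rank(G))  # True 1
```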