Let $q$ be the multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$ and let $x$ be a sample from $q$. Then $x$ can be written as $$x = \mu + A\epsilon\,, \quad \Sigma = AA^T\,, \quad \epsilon \sim \mathcal{N}(0, I)\,,$$ where $I$ is the identity matrix. I am trying to compute $\nabla_{A}\log{q(x)}$.
Now, dropping the additive constant in $\log q(x)$ and substituting $x - \mu = A\epsilon$, $$ \nabla_A \log{q(x)} = -\frac{1}{2}\nabla_A \log\det(AA^T) - \frac{1}{2}\nabla_A \epsilon^TA^T(AA^T)^{-1}A\epsilon$$
The first gradient evaluates to $-A^T(AA^T)^{-1}$ (with help from Stack Exchange answers). However, since I have no formal training in graduate-level calculus (I am a CS student), I don't know how to evaluate the gradient of the second term. Can anybody help?
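As a sanity check on the first term, here is a minimal NumPy sketch that compares a finite-difference gradient of $-\frac{1}{2}\log\det(AA^T)$ with the closed form. The shapes (`n`, `k`), the seed, and the step `h` are assumptions for illustration only. The closed form is written in the layout where the gradient has the same shape as $A$, i.e. $-(AA^T)^{-1}A$; its transpose is the expression $-A^T(AA^T)^{-1}$ quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 3, 5                       # assumed shapes: A is n x k with full row rank
A = rng.standard_normal((n, k))

def f(A):
    # First term of log q(x): -1/2 * log det(A A^T)
    _, logdet = np.linalg.slogdet(A @ A.T)
    return -0.5 * logdet

# Closed form with the same shape as A: -(A A^T)^{-1} A
closed_form = -np.linalg.solve(A @ A.T, A)

# Central finite differences, perturbing one entry of A at a time
h = 1e-6
numeric = np.zeros_like(A)
for i in range(n):
    for j in range(k):
        E = np.zeros_like(A)
        E[i, j] = h
        numeric[i, j] = (f(A + E) - f(A - E)) / (2 * h)

max_err = np.max(np.abs(numeric - closed_form))
print("max abs difference:", max_err)
```

The two layouts differ only by a transpose, so whichever convention is used for the first term should also be used for the second.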
After reading up a bit about matrix calculus, this is my effort.
Let $B = A^T(AA^T)^{-1}A$ and $B + \delta B = (A+\delta A)^T((A+\delta A)(A+\delta A)^T)^{-1}(A+\delta A)$. This implies $$AB = A$$ and $$(A+\delta A)(B + \delta B) = A+\delta A\,.$$ Expanding the last equation, using $AB = A$, and dropping the second-order term $\delta A\,\delta B$, we get $$A\,\delta B = \delta A\,(I-A^T(AA^T)^{-1}A)\,.$$
I am not sure how to proceed after this. Thanks
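The first-order relation $A\,\delta B = \delta A\,(I - B)$ derived above can be checked numerically. The sketch below is only an illustration: the shapes, the seed, and the perturbation size are assumptions, and `proj` is a hypothetical helper name. It assumes $A$ is $n \times k$ with $k > n$ and full row rank (for a square invertible $A$, $B = I$ and the relation is trivial).

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 3, 5                                 # assumed shapes, k > n
A = rng.standard_normal((n, k))
dA = 1e-6 * rng.standard_normal((n, k))     # small perturbation delta A

def proj(A):
    # B = A^T (A A^T)^{-1} A, the orthogonal projector onto the row space of A
    return A.T @ np.linalg.solve(A @ A.T, A)

B = proj(A)
dB = proj(A + dA) - B                       # exact change in B

lhs = A @ dB                                # A * delta B
rhs = dA @ (np.eye(k) - B)                  # delta A * (I - B)

# The two sides differ only by the second-order term dA @ dB
residual = np.max(np.abs(lhs - rhs))
scale = np.max(np.abs(rhs))
print("residual:", residual, "scale of first-order term:", scale)
```

The residual should be far smaller than the first-order term, which is consistent with the expansion, but it does not by itself give $\delta B$, since $A$ has no left inverse when $k > n$.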
Note that the gradient of $g(A)$ will be a 4-dimensional tensor.
– user2808118 May 28 '16 at 05:49
The transpose of the accepted answer is the same as yours: $$2\langle S_A\epsilon \epsilon^TA^T(AA^T)^{-1}, H\rangle$$ $$=2\operatorname{trace}(S_A\epsilon \epsilon^TA^T(AA^T)^{-1}H )$$ $$=2\operatorname{trace}(\epsilon^TA^T(AA^T)^{-1}HS_A\epsilon )$$ $$=2\,\epsilon^TA^T(AA^T)^{-1}HS_A\epsilon\,,$$ which is the same as your reply. (Note that the two terms in your answer are equal.)
– user2808118 May 28 '16 at 16:43
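The chain of equalities in the comment can be checked numerically with a short NumPy sketch. Two things here are assumptions, since the comment does not define them: $S_A$ is taken to be $I - A^T(AA^T)^{-1}A$, and the function being differentiated is taken to be $\epsilon^TA^T(AA^T)^{-1}A\epsilon$ (called `g` below). The direction `H`, the shapes, and the seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 3, 5                            # assumed shapes, k > n
A = rng.standard_normal((n, k))
H = rng.standard_normal((n, k))        # arbitrary direction in A-space
eps = rng.standard_normal(k)

def g(A):
    # Assumed target: eps^T A^T (A A^T)^{-1} A eps
    B = A.T @ np.linalg.solve(A @ A.T, A)
    return eps @ B @ eps

# Assumption: S_A = I - A^T (A A^T)^{-1} A (not defined in the comment)
S_A = np.eye(k) - A.T @ np.linalg.solve(A @ A.T, A)

# Last line of the comment: 2 eps^T A^T (A A^T)^{-1} H S_A eps
formula = 2 * eps @ A.T @ np.linalg.solve(A @ A.T, H @ S_A @ eps)

# First trace form: 2 trace(S_A eps eps^T A^T (A A^T)^{-1} H)
trace_form = 2 * np.trace(
    S_A @ np.outer(eps, eps) @ A.T @ np.linalg.inv(A @ A.T) @ H
)

# Central finite difference of g along H
t = 1e-6
numeric = (g(A + t * H) - g(A - t * H)) / (2 * t)
print(formula, trace_form, numeric)
```

If the three printed numbers agree, the trace manipulations and the assumed definition of $S_A$ are at least consistent with each other for this random instance.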