11

I want to differentiate this wrt $\mu$ and $\Sigma$ :
$${1\over \sqrt{(2\pi)^k |\Sigma |}} e^{-0.5 (x-\mu)^T \Sigma^{-1} (x-\mu)} $$

I'm following the matrix cookbook here and also this answer. The solution given in the answer (2nd link) doesn't match what I read in the cookbook.
For example, if I follow rule 81 from the linked cookbook, I get a different answer when differentiating this term with respect to $\mu$:
$(x-\mu)^T \Sigma^{-1} (x-\mu)$

According to the cookbook, the answer should be $-(\Sigma^{-1} + \Sigma^{-T}) (x-\mu)$. Or am I missing something here? Also, how do I differentiate $(x-\mu)^T \Sigma^{-1} (x-\mu)$ with respect to $\Sigma$?

3 Answers

15

For convenience, define some variables which are easier to type $$\eqalign{ M &= \Sigma^{-1} \cr z &= (\mu-x) \cr }$$

Now let's answer your second question first.
Rewrite the function in terms of the above variables and the Frobenius product (denoted by a colon, $A:B={\rm tr}(A^TB)$), then find its differential, using $dz=d\mu$ and $dM=-M\,d\Sigma\,M$:
$$\eqalign{ f &= z^TMz \cr &= M:z\,z^T \cr\cr df &= M:(dz\,z^T+z\,dz^T) + zz^T:dM \cr &= Mz:dz + z^TM:dz^T - zz^T:M\,d\Sigma\,M \cr &= Mz:dz + M^Tz:dz - M^Tzz^TM^T:d\Sigma \cr &= (M+M^T)\,z:dz - M^Tzz^TM^T:d\Sigma \cr &= (M+M^T)\,(\mu-x):d\mu - M^Tzz^TM^T:d\Sigma \cr }$$

Setting $d\Sigma=0$ yields the gradient with respect to $\mu$:
$$\eqalign{ \frac{\partial f}{\partial \mu} &= (\Sigma^{-1}+\Sigma^{-T})\,(\mu-x) \cr }$$
and setting $d\mu=0$ yields the gradient with respect to $\Sigma$:
$$\eqalign{ \frac{\partial f}{\partial \Sigma} &= - M^Tzz^TM^T \cr }$$

Now back to your first function. Write down its logarithm (dropping the additive constant $-\frac{k}{2}\log(2\pi)$, which does not affect the derivatives) and find its differential:
$$\eqalign{ L &= \frac{1}{2}\Big(\log\det(M) - f\Big) \cr &= \frac{1}{2}\Big({\rm tr}\log(M) - f\Big) \cr\cr dL &= \frac{1}{2}\Big(M^{-T}:dM - df\Big) \cr &= \frac{1}{2}\Big(M^{-T}:dM - (M+M^T)\,(\mu-x):d\mu + M^Tzz^TM^T:d\Sigma\Big) \cr &= \frac{1}{2}\Big(-M^{-T}:M\,d\Sigma\,M - (M+M^T)\,(\mu-x):d\mu + M^Tzz^TM^T:d\Sigma\Big) \cr &= \frac{1}{2}\Big(-M^T:d\Sigma - (M+M^T)\,(\mu-x):d\mu + M^Tzz^TM^T:d\Sigma\Big) \cr &=\frac{1}{2}(M^Tzz^TM^T-M^T):d\Sigma-\frac{1}{2}(M+M^T)\,(\mu-x):d\mu \cr }$$

Once again, holding one of the independent variables constant yields the gradient with respect to the other:
$$\eqalign{ \frac{\partial L}{\partial \mu} &= \frac{1}{2}(M+M^T)\,(x-\mu) \cr\cr \frac{\partial L}{\partial \Sigma} &= \frac{1}{2}(M^Tzz^TM^T-M^T) \cr }$$

To recover the gradient of the original function (let's call it $H$), simply apply the logarithmic derivative rule:
$$\eqalign{ \frac{\partial H}{\partial\mu} &= H\Bigg(\frac{\partial L}{\partial\mu}\Bigg) \cr \frac{\partial H}{\partial\Sigma} &= H\Bigg(\frac{\partial L}{\partial\Sigma}\Bigg) \cr }$$
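If you want to sanity-check these formulas, one option is to compare them against central finite differences. The sketch below is only an illustration: it assumes $k=2$, uses plain Python lists, deliberately lets $\Sigma$ be non-symmetric (as the derivation does), and the names `density` and `max_gradient_error` are made up for this example.

```python
import math

def inv2(S):
    """Inverse of a 2x2 matrix stored as nested lists."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(A, v):
    """Product of a 2x2 matrix with a length-2 vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

def density(x, mu, S):
    """H(mu, Sigma) for k = 2; Sigma is not forced to be symmetric."""
    (a, b), (c, d) = S
    z = [mu[0] - x[0], mu[1] - x[1]]
    Mz = matvec(inv2(S), z)
    f = z[0] * Mz[0] + z[1] * Mz[1]          # f = z^T M z
    return math.exp(-0.5 * f) / math.sqrt((2 * math.pi) ** 2 * (a * d - b * c))

def max_gradient_error(x, mu, S, h=1e-6):
    """Largest gap between the closed-form gradients of H and central differences."""
    H = density(x, mu, S)
    M = inv2(S)
    Mt = [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]
    z = [mu[0] - x[0], mu[1] - x[1]]
    u = matvec(Mt, z)                         # M^T z
    w = matvec(M, z)                          # M z
    # dH/dmu = H * (1/2)(M + M^T)(x - mu)
    sym = [[M[i][j] + Mt[i][j] for j in range(2)] for i in range(2)]
    g_mu = [0.5 * H * c for c in matvec(sym, [x[0] - mu[0], x[1] - mu[1]])]
    # dH/dSigma = H * (1/2)(M^T z z^T M^T - M^T); entry (i, j) is u_i w_j - M^T_ij
    g_S = [[0.5 * H * (u[i] * w[j] - Mt[i][j]) for j in range(2)] for i in range(2)]

    err = 0.0
    for i in range(2):
        mp, mm = list(mu), list(mu)
        mp[i] += h
        mm[i] -= h
        fd = (density(x, mp, S) - density(x, mm, S)) / (2 * h)
        err = max(err, abs(fd - g_mu[i]))
        for j in range(2):
            Sp = [row[:] for row in S]
            Sm = [row[:] for row in S]
            Sp[i][j] += h
            Sm[i][j] -= h
            fd = (density(x, mu, Sp) - density(x, mu, Sm)) / (2 * h)
            err = max(err, abs(fd - g_S[i][j]))
    return err

# Arbitrary test point with a non-symmetric Sigma (positive determinant).
print(max_gradient_error([1.0, -0.5], [0.2, 0.4], [[2.0, 0.3], [0.5, 1.5]]))
```

The printed value should be tiny (limited by finite-difference and rounding error) if the formulas above are right. Note the check perturbs each entry of $\Sigma$ independently, which matches the unconstrained derivative computed in this answer.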

john316
  • 1,286
  • $\Sigma$ was symmetric to begin with, so a lot of this is more complicated than it needed to be by virtue of generality. – Ian Nov 06 '16 at 20:07
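
To spell out what this simplification looks like: assuming $\Sigma=\Sigma^T$ (hence $M=M^T$), and still treating the entries of $\Sigma$ as independent when differentiating, the results of the answer above reduce to
$$\eqalign{ \frac{\partial f}{\partial \mu} &= 2\,\Sigma^{-1}(\mu-x) \cr \frac{\partial f}{\partial \Sigma} &= -\Sigma^{-1}(x-\mu)(x-\mu)^T\Sigma^{-1} \cr \frac{\partial L}{\partial \mu} &= \Sigma^{-1}(x-\mu) \cr \frac{\partial L}{\partial \Sigma} &= \frac{1}{2}\Big(\Sigma^{-1}(x-\mu)(x-\mu)^T\Sigma^{-1}-\Sigma^{-1}\Big) \cr }$$
This is just the substitution $M^T=M$ into the general formulas, not a separate derivation.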
5

I had the same question. Applying equation 81 from the Matrix Cookbook, with $f$ denoting the exponent $-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)$, I got this equation: $$ \frac{\partial{f}}{\partial{\mu}} = -\frac{1}{2}(\Sigma ^{-1} + (\Sigma^{-1})^{T}) (x - \mu)\cdot(-1) $$ where the trailing factor $-1$ comes from the chain rule, since $x-\mu$ depends on $\mu$ with a minus sign. Since $\Sigma$ is the covariance matrix, it is symmetric. The inverse of a symmetric matrix is also symmetric (Is the inverse of a symmetric matrix also symmetric?). Therefore, we have $(\Sigma^{-1})^{T} = \Sigma ^{-1}$.

Now, the above equation reduces to $$ \frac{\partial{f}}{\partial{\mu}} = \Sigma ^{-1}(x - \mu) $$
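
A quick numerical check of this reduced form is sketched below. It is only an illustration: it assumes $k=2$ and that $f$ is the exponent $-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)$, and the function names are hypothetical.

```python
def solve2(S, r):
    """Return S^{-1} r for a 2x2 matrix S, via the explicit inverse."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [(d * r[0] - b * r[1]) / det, (-c * r[0] + a * r[1]) / det]

def exponent(x, mu, S):
    """f(mu) = -1/2 (x - mu)^T S^{-1} (x - mu)."""
    r = [x[0] - mu[0], x[1] - mu[1]]
    v = solve2(S, r)
    return -0.5 * (r[0] * v[0] + r[1] * v[1])

def grad_mu_closed_form(x, mu, S):
    """S^{-1} (x - mu); only claimed to be the gradient when S is symmetric."""
    return solve2(S, [x[0] - mu[0], x[1] - mu[1]])

def grad_mu_numeric(x, mu, S, h=1e-6):
    """Central-difference gradient of the exponent with respect to mu."""
    g = []
    for i in range(2):
        mp, mm = list(mu), list(mu)
        mp[i] += h
        mm[i] -= h
        g.append((exponent(x, mp, S) - exponent(x, mm, S)) / (2 * h))
    return g

# Arbitrary test point with a symmetric covariance matrix.
x, mu = [1.0, -0.5], [0.2, 0.4]
S_sym = [[2.0, 0.4], [0.4, 1.0]]
print(grad_mu_closed_form(x, mu, S_sym))
print(grad_mu_numeric(x, mu, S_sym))
```

The two printed vectors should agree up to rounding for a symmetric $\Sigma$. If you pass a non-symmetric matrix instead, they will generally differ, which is where the $\Sigma^{-1}+\Sigma^{-T}$ form is needed.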

0

If you're trying to find the derivative with respect to $\mu\in\mathbb R^{n\times 1}$ and $\Sigma\in\mathbb R^{n\times n}$, then I don't think the answer could possibly be $(\Sigma^{-1} + \Sigma^{-T}) (x-\mu)$. If what you're actually trying to do is find the values of $\mu$ and $\Sigma$ that maximize the expression you wrote, then perhaps this section from Wikipedia will shed some light.

  • No, I don't want to find a maximum likelihood estimate by taking the log. I want to differentiate the original expression. If what I've suggested isn't the answer, could you elaborate on how to proceed? – sanjeev mk Jan 04 '16 at 20:38