I need to take the gradient of the following expression with respect to the matrix $\Sigma$ (of size $d$ by $d$)

\begin{eqnarray*} p(x \vert y=i) &=& \frac{1}{(2\pi)^{d/2} \vert \Sigma\vert ^{1/2}} \sum_{i=1}^n \exp\left(-\frac{1}{2}(x-\mu_{y^{(i)}})^T \Sigma^{-1} (x-\mu_{y^{(i)}}) \right), \end{eqnarray*}

where $\vert \Sigma\vert$ is the determinant of the matrix and $\Sigma^{-1}$ is the inverse of the matrix.

I know how to take the gradient of the determinant of a matrix (see "How to calculate the gradient of log det matrix inverse?") and of the inverse of a matrix (see "Derivative of the inverse of a matrix"), but I am not sure how to take the gradient of the expression inside the sum.

This equation is from Andrew Ng's ML notes (Pages 35 and 36) - https://sgfin.github.io/files/notes/CS229_Lecture_Notes.pdf

  • Derivative with respect to which variable vector? – Bruno Lobo Oct 24 '23 at 18:08
  • With respect to $\Sigma$ (not $\vert \Sigma\vert$). I have edited my original post. – batman08 Oct 24 '23 at 20:30
  • Take a look at page 9 of the lecture notes you linked in the OP. I recommend differentiating with respect to the matrix element $\Sigma_{ij}$ instead. For that, you must understand what $\frac{\partial}{\partial \Sigma_{ij}} \Sigma^{-1}$ and $\frac{\partial}{\partial \Sigma_{ij}} \lvert \Sigma \rvert$ are. – Bruno Lobo Oct 24 '23 at 21:06

1 Answer

I will change the question a bit for the sake of simplicity:

$$\nabla_\Sigma f(\Sigma)$$ for $$f(\Sigma) = \frac{1}{(2 \pi)^{\frac{d}{2}} \lvert \Sigma \rvert^{\frac{1}{2}}} e^{-\frac{1}{2} (x - \mu)^\intercal \Sigma^{-1} (x - \mu)}$$

becomes

$$\nabla_{\Sigma_{ij}} f(\Sigma) = \frac{1}{(2 \pi)^{\frac{d}{2}} } \underbrace{\nabla_{\Sigma_{ij}} \left( \frac{1}{\lvert \Sigma \rvert^{\frac{1}{2}}} \right)}_{(1)} e^{-\frac{1}{2} (x - \mu)^\intercal \Sigma^{-1} (x - \mu)} + \frac{1}{(2 \pi)^{\frac{d}{2}} \lvert \Sigma \rvert^{\frac{1}{2}}} \underbrace{\nabla_{\Sigma_{ij}} \left(e^{-\frac{1}{2} (x - \mu)^\intercal \Sigma^{-1} (x - \mu)}\right)}_{(2)}$$.

Let us take it in parts:

(1) $\nabla_{\Sigma_{ij}} \left( \frac{1}{\lvert \Sigma \rvert^{\frac{1}{2}}} \right) = -\frac{1}{2} \frac{1}{\lvert \Sigma \rvert^{\frac{3}{2}}} \nabla_{\Sigma_{ij}} \lvert \Sigma \rvert = -\frac{1}{2} \frac{1}{\lvert \Sigma \rvert^{\frac{3}{2}}} \lvert \Sigma \rvert (\Sigma^{-\intercal})_{ij} = -\frac{1}{2} \frac{1}{\lvert \Sigma \rvert^{\frac{1}{2}}} ((\Sigma^{-1})^\intercal)_{ij}$
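As a sanity check on (1), the closed form can be compared against central finite differences. The sketch below is only an illustration under some assumptions: it fixes $d = 2$, uses the Python standard library only, and treats all four entries of $\Sigma$ as independent (no symmetry constraint). The helper names (`det2`, `inv2`, `numeric_grad`) and the test matrix are made up for this example.

```python
def det2(S):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]

def inv2(S):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    d = det2(S)
    return [[S[1][1] / d, -S[0][1] / d],
            [-S[1][0] / d, S[0][0] / d]]

def g(S):
    """g(Sigma) = |Sigma|^(-1/2), the factor differentiated in (1)."""
    return det2(S) ** -0.5

def analytic_grad(S):
    """Entry (i, j) is -1/2 |Sigma|^(-1/2) (Sigma^{-T})_{ij}."""
    Sinv = inv2(S)
    c = -0.5 * g(S)
    # (Sigma^{-T})_{ij} = (Sigma^{-1})_{ji}
    return [[c * Sinv[j][i] for j in range(2)] for i in range(2)]

def numeric_grad(fn, S, h=1e-6):
    """Central finite differences, perturbing one entry at a time."""
    G = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            Sp = [row[:] for row in S]
            Sm = [row[:] for row in S]
            Sp[i][j] += h
            Sm[i][j] -= h
            G[i][j] = (fn(Sp) - fn(Sm)) / (2 * h)
    return G

# Deliberately non-symmetric test matrix (determinant 1.85 > 0),
# so that a missing transpose would show up as a mismatch.
S = [[2.0, 0.3], [0.5, 1.0]]
A = analytic_grad(S)
N = numeric_grad(g, S)
max_err = max(abs(A[i][j] - N[i][j]) for i in range(2) for j in range(2))
print(max_err)
```

The two gradients should agree up to finite-difference error, which supports both the factor $-\frac{1}{2}$ and the transpose in (1).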

(2) $\nabla_{\Sigma_{ij}} \left(e^{-\frac{1}{2} (x - \mu)^\intercal \Sigma^{-1} (x - \mu)}\right) = -\frac{1}{2} e^{-\frac{1}{2} (x - \mu)^\intercal \Sigma^{-1} (x - \mu)} \nabla_{\Sigma_{ij}} (x - \mu)^\intercal \Sigma^{-1} (x - \mu) = -\frac{1}{2} e^{-\frac{1}{2} (x - \mu)^\intercal \Sigma^{-1} (x - \mu)} (x - \mu)^\intercal \left(\nabla_{\Sigma_{ij}} \Sigma^{-1}\right) (x - \mu)$

This is where it gets messy, since differentiating $\Sigma \Sigma^{-1} = I$ gives $\nabla_{\Sigma_{ij}} (\Sigma^{-1})_{kl} = -(\Sigma^{-1})_{ki} (\Sigma^{-1})_{jl}$. I will stop for now and wait for your feedback.
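One way to finish from here, offered as a sketch rather than a definitive answer: write $v = x - \mu$ and use $\nabla_{\Sigma_{ij}} (\Sigma^{-1})_{kl} = -(\Sigma^{-1})_{ki} (\Sigma^{-1})_{jl}$. The quadratic form then gives

$$\nabla_{\Sigma_{ij}} \left( v^\intercal \Sigma^{-1} v \right) = -\sum_{k,l} v_k (\Sigma^{-1})_{ki} (\Sigma^{-1})_{jl} v_l = -\left( \Sigma^{-\intercal} v v^\intercal \Sigma^{-\intercal} \right)_{ij},$$

and substituting this and (1) into the product rule yields

$$\nabla_{\Sigma} f(\Sigma) = \frac{1}{2} f(\Sigma) \left( \Sigma^{-\intercal} v v^\intercal \Sigma^{-\intercal} - \Sigma^{-\intercal} \right),$$

which for symmetric $\Sigma$ reduces to $\frac{1}{2} f(\Sigma) \left( \Sigma^{-1} v v^\intercal \Sigma^{-1} - \Sigma^{-1} \right)$. The Python below checks this numerically. It assumes $d = 2$, treats the entries of $\Sigma$ as independent, and uses a hypothetical test matrix and vector; the function names are made up for the example.

```python
import math

def det2(S):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]

def inv2(S):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    d = det2(S)
    return [[S[1][1] / d, -S[0][1] / d],
            [-S[1][0] / d, S[0][0] / d]]

def f(S, v):
    """f(Sigma) for d = 2, with v = x - mu held fixed."""
    Sinv = inv2(S)
    # q = v^T Sigma^{-1} v
    q = sum(v[k] * Sinv[k][l] * v[l] for k in range(2) for l in range(2))
    # (2 pi)^(d/2) = 2 pi when d = 2
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det2(S)))

def analytic_grad(S, v):
    """1/2 f(Sigma) (Sigma^{-T} v v^T Sigma^{-T} - Sigma^{-T})."""
    Sinv = inv2(S)
    a = [sum(Sinv[k][i] * v[k] for k in range(2)) for i in range(2)]  # Sigma^{-T} v
    b = [sum(Sinv[j][l] * v[l] for l in range(2)) for j in range(2)]  # Sigma^{-1} v
    c = 0.5 * f(S, v)
    return [[c * (a[i] * b[j] - Sinv[j][i]) for j in range(2)] for i in range(2)]

def numeric_grad(S, v, h=1e-6):
    """Central finite differences in each entry of Sigma."""
    G = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            Sp = [row[:] for row in S]
            Sm = [row[:] for row in S]
            Sp[i][j] += h
            Sm[i][j] -= h
            G[i][j] = (f(Sp, v) - f(Sm, v)) / (2 * h)
    return G

# Non-symmetric test matrix so that index-order mistakes are visible.
S = [[2.0, 0.3], [0.5, 1.0]]
v = [0.7, -0.4]
A = analytic_grad(S, v)
N = numeric_grad(S, v)
max_err = max(abs(A[i][j] - N[i][j]) for i in range(2) for j in range(2))
print(max_err)
```

Since the gradient is linear, the expression in the question (a sum of such terms, one per $\mu_{y^{(i)}}$) can be handled by summing this result over the terms.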