
I've been trying to prove that the normal inverse Wishart distribution is a conjugate prior for a set of multivariate normal observations. Formally,

$$\prod_{i=1}^I Norm_{\boldsymbol x_i}[\boldsymbol\mu, \boldsymbol\Sigma] \cdot NorIWis_{\boldsymbol\mu, \boldsymbol\Sigma}[\alpha, \boldsymbol\Psi, \gamma, \boldsymbol\delta] = \kappa \cdot NorIWis_{\boldsymbol\mu, \boldsymbol\Sigma}[\tilde\alpha, \tilde{\boldsymbol\Psi}, \tilde\gamma, \tilde{\boldsymbol\delta}]$$

where the following definitions are already given:

$$\kappa = \frac{1}{\pi^{ID/2}}\frac{\vert\boldsymbol\Psi\vert^{\alpha/2}}{\vert\tilde{\boldsymbol\Psi}\vert^{\tilde\alpha/2}}\frac{\Gamma_D[\tilde\alpha/2]}{\Gamma_D[\alpha/2]}\frac{\gamma^{D/2}}{\tilde\gamma^{D/2}}$$

$$\tilde\alpha = \alpha + I$$

$$\tilde{\boldsymbol\Psi} = \boldsymbol\Psi + \gamma \boldsymbol\delta \boldsymbol\delta^T + \sum_{i=1}^I\boldsymbol x_i \boldsymbol x_i^T - \frac{1}{\gamma + I}\bigg(\gamma \boldsymbol\delta + \sum_{i=1}^I \boldsymbol x_i\bigg)\bigg(\gamma \boldsymbol\delta + \sum_{i=1}^I \boldsymbol x_i\bigg)^T$$

$$\tilde\gamma = \gamma + I$$

$$\tilde{\boldsymbol\delta} = \frac{\gamma \boldsymbol\delta + \sum_{i=1}^I \boldsymbol x_i}{\gamma + I}$$


The proof is as follows. First, expand the definitions of each distribution:

$$\prod_{i=1}^I Norm_{\boldsymbol x_i}[\boldsymbol\mu, \boldsymbol\Sigma] \cdot NorIWis_{\boldsymbol\mu, \boldsymbol\Sigma}[\alpha, \boldsymbol\Psi, \gamma, \boldsymbol\delta]$$

$$=\prod_{i=1}^I \bigg(\frac{1}{(2\pi)^{D/2}\vert\Sigma\vert^{1/2}} \cdot \exp{\big[-0.5(x_i - \mu)^T\Sigma^{-1}(x_i - \mu)\big]}\bigg) \cdot \frac{\gamma^{D/2}\vert\Psi\vert^{\alpha/2}\exp{\big[-0.5\big(Tr\big[\Psi\Sigma^{-1}\big] + \gamma(\mu - \delta)^T\Sigma^{-1}(\mu - \delta)\big)\big]}} {2^{\alpha D/2}(2\pi)^{D/2}\vert\Sigma\vert^{(\alpha + D + 2)/2}\Gamma_D(\alpha/2)}$$

To improve readability, let

$$\kappa_0 = \bigg(\frac{1}{(2\pi)^{D/2}\vert\Sigma\vert^{1/2}}\bigg)^I \cdot \frac{\gamma^{D/2}\vert\Psi\vert^{\alpha/2}}{2^{\alpha D/2}(2\pi)^{D/2}\vert\Sigma\vert^{(\alpha + D + 2)/2}\Gamma_D(\alpha/2)}$$

This simplifies the above expression to

$$\kappa_0\prod_{i=1}^I \bigg(\exp{\big[-0.5(\boldsymbol x_i - \boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\boldsymbol x_i - \boldsymbol\mu)\big]}\bigg) \cdot \exp{\big[-0.5\big(Tr\big[\boldsymbol\Psi\boldsymbol\Sigma^{-1}\big] + \gamma(\boldsymbol\mu - \boldsymbol\delta)^T\boldsymbol\Sigma^{-1}(\boldsymbol\mu - \boldsymbol\delta)\big)\big]}$$

$$= \kappa_0\exp{\bigg[-0.5\bigg(\sum_{i=1}^I(\boldsymbol x_i - \boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\boldsymbol x_i - \boldsymbol\mu) + Tr\big[\boldsymbol\Psi\boldsymbol\Sigma^{-1}\big] + \gamma(\boldsymbol\mu - \boldsymbol\delta)^T\boldsymbol\Sigma^{-1}(\boldsymbol\mu - \boldsymbol\delta)\bigg)\bigg]}$$
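As a sanity check before doing the algebra, the exponent above can be compared numerically with the target exponent $Tr[\tilde{\boldsymbol\Psi}\boldsymbol\Sigma^{-1}] + \tilde\gamma(\boldsymbol\mu - \tilde{\boldsymbol\delta})^T\boldsymbol\Sigma^{-1}(\boldsymbol\mu - \tilde{\boldsymbol\delta})$. The sketch below is plain Python restricted to $D = 2$; the helper names (`exponents`, `quad`, `inv2`, and so on) and the sample values are made up for illustration and are not part of the original problem.

```python
import random

def outer(u, v):
    # Outer product u v^T as a nested list.
    return [[ui * vj for vj in v] for ui in u]

def mat_add(A, B, scale=1.0):
    # Returns A + scale * B.
    return [[a + scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inv2(S):
    # Inverse of a 2x2 matrix (assumes a nonzero determinant).
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def diff(u, v):
    return [a - b for a, b in zip(u, v)]

def quad(u, M, v):
    # Bilinear form u^T M v.
    return sum(u[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

def trace_prod(A, B):
    # Tr[A B] = sum_ij A_ij B_ji.
    return sum(A[i][j] * B[j][i] for i in range(2) for j in range(2))

def exponents(xs, mu, Sigma, Psi, gamma, delta):
    """Return (prior-times-likelihood exponent, target posterior exponent),
    both without the common factor of -0.5."""
    n = len(xs)
    P = inv2(Sigma)
    lhs = sum(quad(diff(x, mu), P, diff(x, mu)) for x in xs)
    lhs += trace_prod(Psi, P) + gamma * quad(diff(mu, delta), P, diff(mu, delta))

    # Updated parameters, following the definitions given in the question.
    s = [gamma * delta[k] + sum(x[k] for x in xs) for k in range(2)]
    gamma_t = gamma + n
    delta_t = [sk / gamma_t for sk in s]
    Psi_t = mat_add(Psi, outer(delta, delta), gamma)
    for x in xs:
        Psi_t = mat_add(Psi_t, outer(x, x))
    Psi_t = mat_add(Psi_t, outer(s, s), -1.0 / gamma_t)

    rhs = trace_prod(Psi_t, P)
    rhs += gamma_t * quad(diff(mu, delta_t), P, diff(mu, delta_t))
    return lhs, rhs

random.seed(0)
xs = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(5)]
lhs, rhs = exponents(xs, mu=[0.3, -0.7], Sigma=[[2.0, 0.5], [0.5, 1.0]],
                     Psi=[[1.5, 0.2], [0.2, 0.8]], gamma=1.7, delta=[0.1, 0.4])
print(abs(lhs - rhs) < 1e-9)  # agreement up to floating-point error
```

If the two numbers agree for arbitrary inputs, the remaining work is purely algebraic rather than a sign of a wrong target.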

The following relation is also important:

$$(\boldsymbol a - \boldsymbol b)^T \boldsymbol M (\boldsymbol a - \boldsymbol b) = (\boldsymbol a^T - \boldsymbol b^T) (\boldsymbol M \boldsymbol a - \boldsymbol M \boldsymbol b) = \boldsymbol a^T \boldsymbol M \boldsymbol a - \boldsymbol a^T \boldsymbol M \boldsymbol b - \boldsymbol b^T \boldsymbol M \boldsymbol a + \boldsymbol b^T \boldsymbol M \boldsymbol b$$
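A small worked consequence, assuming $\boldsymbol M$ is symmetric (as $\boldsymbol\Sigma^{-1}$ is): each cross term is a scalar and therefore equals its own transpose, so the two cross terms are equal and can be merged.

$$\boldsymbol a^T \boldsymbol M \boldsymbol b = (\boldsymbol a^T \boldsymbol M \boldsymbol b)^T = \boldsymbol b^T \boldsymbol M^T \boldsymbol a = \boldsymbol b^T \boldsymbol M \boldsymbol a \quad\Longrightarrow\quad (\boldsymbol a - \boldsymbol b)^T \boldsymbol M (\boldsymbol a - \boldsymbol b) = \boldsymbol a^T \boldsymbol M \boldsymbol a - 2\,\boldsymbol a^T \boldsymbol M \boldsymbol b + \boldsymbol b^T \boldsymbol M \boldsymbol b$$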

Using this, the above expression can be expanded to

$$\kappa_0\exp{\bigg[-0.5\bigg(\sum_{i=1}^I\big(\boldsymbol x_i^T \boldsymbol\Sigma^{-1} \boldsymbol x_i - \boldsymbol x_i^T \boldsymbol\Sigma^{-1} \boldsymbol\mu - \boldsymbol\mu^T\boldsymbol\Sigma^{-1} \boldsymbol x_i + \boldsymbol\mu^T \boldsymbol\Sigma^{-1}\boldsymbol\mu \big) + Tr\big[\boldsymbol\Psi\boldsymbol\Sigma^{-1}\big] + \gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\mu - \gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\delta - \gamma\boldsymbol\delta^T\boldsymbol\Sigma^{-1}\boldsymbol\mu + \gamma\boldsymbol\delta^T\boldsymbol\Sigma^{-1}\boldsymbol\delta \bigg)\bigg]}$$

Now, using the definition of $\tilde{\boldsymbol\Psi}$ and the given relation $Tr[\boldsymbol z \boldsymbol z^T \boldsymbol A^{-1}] = \boldsymbol z^T \boldsymbol A^{-1} \boldsymbol z$, we can expand:

$$\begin{aligned}
Tr[\boldsymbol\Psi\boldsymbol\Sigma^{-1}] &= Tr \bigg[ \bigg(\tilde{\boldsymbol\Psi} - \gamma\boldsymbol\delta\boldsymbol\delta^T - \sum_{i=1}^I \boldsymbol x_i \boldsymbol x_i^T + \frac{1}{\gamma + I}\bigg(\gamma \boldsymbol\delta + \sum_{i=1}^I \boldsymbol x_i\bigg)\bigg(\gamma \boldsymbol\delta + \sum_{i=1}^I \boldsymbol x_i\bigg)^T\bigg)\boldsymbol\Sigma^{-1}\bigg] \\
&= Tr[\tilde{\boldsymbol\Psi}\boldsymbol\Sigma^{-1}] - Tr[\gamma\boldsymbol\delta\boldsymbol\delta^T\boldsymbol\Sigma^{-1}] - Tr\bigg[\sum_{i=1}^I \boldsymbol x_i \boldsymbol x_i^T \boldsymbol\Sigma^{-1}\bigg] + Tr\bigg[\frac{1}{\gamma + I}\big((\gamma + I)\tilde{\boldsymbol\delta}\big)\big((\gamma + I)\tilde{\boldsymbol\delta}\big)^T\boldsymbol\Sigma^{-1}\bigg] \\
&= Tr[\tilde{\boldsymbol\Psi}\boldsymbol\Sigma^{-1}] - \gamma\boldsymbol\delta^T\boldsymbol\Sigma^{-1}\boldsymbol\delta - \sum_{i=1}^I \boldsymbol x_i^T \boldsymbol\Sigma^{-1} \boldsymbol x_i + \tilde\gamma\tilde{\boldsymbol\delta}^T\boldsymbol\Sigma^{-1}\tilde{\boldsymbol\delta}
\end{aligned}$$
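For completeness, the two facts this expansion relies on can be spelled out. The trace relation follows from the cyclic property of the trace together with the fact that the trace of a scalar is the scalar itself, and the last term uses $\tilde\gamma = \gamma + I$:

$$Tr[\boldsymbol z \boldsymbol z^T \boldsymbol A^{-1}] = Tr[\boldsymbol z^T \boldsymbol A^{-1} \boldsymbol z] = \boldsymbol z^T \boldsymbol A^{-1} \boldsymbol z, \qquad \frac{1}{\gamma + I}(\gamma + I)^2\, \tilde{\boldsymbol\delta}^T\boldsymbol\Sigma^{-1}\tilde{\boldsymbol\delta} = \tilde\gamma\,\tilde{\boldsymbol\delta}^T\boldsymbol\Sigma^{-1}\tilde{\boldsymbol\delta}$$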

Substituting this back in, the following results:

$$\kappa_0\exp{\bigg[-0.5\bigg(\sum_{i=1}^I\big(\boldsymbol x_i^T \boldsymbol\Sigma^{-1} \boldsymbol x_i - \boldsymbol x_i^T \boldsymbol\Sigma^{-1} \boldsymbol\mu - \boldsymbol\mu^T\boldsymbol\Sigma^{-1} \boldsymbol x_i + \boldsymbol\mu^T \boldsymbol\Sigma^{-1}\boldsymbol\mu \big) + Tr[\tilde{\boldsymbol\Psi}\boldsymbol\Sigma^{-1}] - \gamma\boldsymbol\delta^T\boldsymbol\Sigma^{-1}\boldsymbol\delta - \sum_{i=1}^I \boldsymbol x_i^T \boldsymbol\Sigma^{-1} \boldsymbol x_i + \tilde\gamma\tilde{\boldsymbol\delta}^T\boldsymbol\Sigma^{-1}\tilde{\boldsymbol\delta} + \gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\mu - \gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\delta - \gamma\boldsymbol\delta^T\boldsymbol\Sigma^{-1}\boldsymbol\mu + \gamma\boldsymbol\delta^T\boldsymbol\Sigma^{-1}\boldsymbol\delta \bigg)\bigg]}$$

Cancelling and regrouping terms one step at a time brings the expression close to the target:

$$\kappa_0\exp{\bigg[-0.5\bigg(\sum_{i=1}^I\big(- \boldsymbol x_i^T \boldsymbol\Sigma^{-1} \boldsymbol\mu - \boldsymbol\mu^T\boldsymbol\Sigma^{-1} \boldsymbol x_i + \boldsymbol\mu^T \boldsymbol\Sigma^{-1}\boldsymbol\mu \big) + Tr[\tilde{\boldsymbol\Psi}\boldsymbol\Sigma^{-1}] - \gamma\boldsymbol\delta^T\boldsymbol\Sigma^{-1}\boldsymbol\delta + \tilde\gamma\tilde{\boldsymbol\delta}^T\boldsymbol\Sigma^{-1}\tilde{\boldsymbol\delta} + \gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\mu - \gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\delta - \gamma\boldsymbol\delta^T\boldsymbol\Sigma^{-1}\boldsymbol\mu + \gamma\boldsymbol\delta^T\boldsymbol\Sigma^{-1}\boldsymbol\delta\bigg)\bigg]}$$

$$\kappa_0\exp{\bigg[-0.5\bigg(\sum_{i=1}^I\big(- \boldsymbol x_i^T \boldsymbol\Sigma^{-1} \boldsymbol\mu - \boldsymbol\mu^T\boldsymbol\Sigma^{-1} \boldsymbol x_i + \boldsymbol\mu^T \boldsymbol\Sigma^{-1}\boldsymbol\mu \big) + Tr[\tilde{\boldsymbol\Psi}\boldsymbol\Sigma^{-1}] + \tilde\gamma\tilde{\boldsymbol\delta}^T\boldsymbol\Sigma^{-1}\tilde{\boldsymbol\delta} + \gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\mu - \gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\delta - \gamma\boldsymbol\delta^T\boldsymbol\Sigma^{-1}\boldsymbol\mu\bigg)\bigg]}$$

$$\kappa_0\exp{\bigg[-0.5\bigg(\sum_{i=1}^I\big(- \boldsymbol x_i^T \boldsymbol\Sigma^{-1} \boldsymbol\mu - \boldsymbol\mu^T\boldsymbol\Sigma^{-1} \boldsymbol x_i \big) + I \boldsymbol\mu^T \boldsymbol\Sigma^{-1}\boldsymbol\mu + Tr[\tilde{\boldsymbol\Psi}\boldsymbol\Sigma^{-1}] + \tilde\gamma\tilde{\boldsymbol\delta}^T\boldsymbol\Sigma^{-1}\tilde{\boldsymbol\delta} + \gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\mu - \gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\delta - \gamma\boldsymbol\delta^T\boldsymbol\Sigma^{-1}\boldsymbol\mu\bigg)\bigg]}$$

$$\kappa_0\exp{\bigg[-0.5\bigg(\sum_{i=1}^I\big(- \boldsymbol x_i^T \boldsymbol\Sigma^{-1} \boldsymbol\mu - \boldsymbol\mu^T\boldsymbol\Sigma^{-1} \boldsymbol x_i \big) + Tr[\tilde{\boldsymbol\Psi}\boldsymbol\Sigma^{-1}] + \tilde\gamma\tilde{\boldsymbol\delta}^T\boldsymbol\Sigma^{-1}\tilde{\boldsymbol\delta} + \tilde\gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\mu - \gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\delta - \gamma\boldsymbol\delta^T\boldsymbol\Sigma^{-1}\boldsymbol\mu\bigg)\bigg]}$$


Unfortunately, from here, I can't seem to find a solution. I'm aiming for the form

$$\kappa \cdot NorIWis_{\boldsymbol\mu, \boldsymbol\Sigma}[\tilde\alpha, \tilde{\boldsymbol\Psi}, \tilde\gamma, \tilde{\boldsymbol\delta}] = \kappa_0 \exp{\big[-0.5\big(Tr\big[\tilde{\boldsymbol\Psi}\boldsymbol\Sigma^{-1}\big] + \tilde\gamma(\boldsymbol\mu - \tilde{\boldsymbol\delta})^T\boldsymbol\Sigma^{-1}(\boldsymbol\mu - \tilde{\boldsymbol\delta})\big)\big]}$$

I've noticed, however, that if $\boldsymbol a^T \boldsymbol\Sigma^{-1} \boldsymbol b = -\boldsymbol b^T \boldsymbol\Sigma^{-1} \boldsymbol a$ for vectors $\boldsymbol a$ and $\boldsymbol b$ ($\boldsymbol\Sigma^{-1}$ still being the inverse of the covariance matrix), then everything simplifies and the proof works. Unfortunately, this cannot hold, because $\boldsymbol\Sigma^{-1}$ is symmetric, not antisymmetric. Nonetheless, I haven't been able to get past this point. Are there any errors in my proof?

  • Also, if someone knows how to fix the formatting on the $\LaTeX$, I would love an edit (those hanging lines and my OCD don't agree). –  Sep 08 '15 at 04:57
  • Alas, it appears I have found an answer! By simply expanding the terms further, it's trivial (but algebraically painful) to simplify. I won't post the answer right now (indeed, the LaTeX is painful) but if someone would like a full answer, or is working on this same problem, let me know. –  Sep 10 '15 at 16:40
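
A sketch of how that remaining simplification might go, starting from the last expression in the question and using only the given definitions of $\tilde\gamma$ and $\tilde{\boldsymbol\delta}$, which imply $\gamma\boldsymbol\delta + \sum_{i=1}^I \boldsymbol x_i = \tilde\gamma\tilde{\boldsymbol\delta}$. Grouping the cross terms by whether $\boldsymbol\mu$ sits on the right or on the left gives

$$\sum_{i=1}^I\big(\boldsymbol x_i^T \boldsymbol\Sigma^{-1} \boldsymbol\mu + \boldsymbol\mu^T\boldsymbol\Sigma^{-1} \boldsymbol x_i\big) + \gamma\boldsymbol\delta^T\boldsymbol\Sigma^{-1}\boldsymbol\mu + \gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\delta = \bigg(\gamma\boldsymbol\delta + \sum_{i=1}^I \boldsymbol x_i\bigg)^T\boldsymbol\Sigma^{-1}\boldsymbol\mu + \boldsymbol\mu^T\boldsymbol\Sigma^{-1}\bigg(\gamma\boldsymbol\delta + \sum_{i=1}^I \boldsymbol x_i\bigg) = \tilde\gamma\tilde{\boldsymbol\delta}^T\boldsymbol\Sigma^{-1}\boldsymbol\mu + \tilde\gamma\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\tilde{\boldsymbol\delta}$$

so the quantity inside the exponent becomes

$$Tr[\tilde{\boldsymbol\Psi}\boldsymbol\Sigma^{-1}] + \tilde\gamma\big(\boldsymbol\mu^T\boldsymbol\Sigma^{-1}\boldsymbol\mu - \tilde{\boldsymbol\delta}^T\boldsymbol\Sigma^{-1}\boldsymbol\mu - \boldsymbol\mu^T\boldsymbol\Sigma^{-1}\tilde{\boldsymbol\delta} + \tilde{\boldsymbol\delta}^T\boldsymbol\Sigma^{-1}\tilde{\boldsymbol\delta}\big) = Tr[\tilde{\boldsymbol\Psi}\boldsymbol\Sigma^{-1}] + \tilde\gamma(\boldsymbol\mu - \tilde{\boldsymbol\delta})^T\boldsymbol\Sigma^{-1}(\boldsymbol\mu - \tilde{\boldsymbol\delta})$$

which is the target form. This grouping relies only on linearity, so no antisymmetry of $\boldsymbol\Sigma^{-1}$ is needed.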
