$$ \newcommand{\vect}[1]{\boldsymbol{\mathbf{#1}}} \newcommand{\nc}[2]{\newcommand{#1}{#2}} \nc{\vx}{\vect{x}} \nc{\vmu}{\vect{\mu}} \nc{\vSigma}{\vect{\Sigma}} \nc{\vtheta}{\vect{\theta}} $$
Context
I'm reading Graphical Models, Exponential Families, and Variational Inference by Martin J. Wainwright and Michael I. Jordan. I've already read this post but it's very cryptic.
Problem
I want to write the multivariate normal distribution $$ f(\vx) = (2\pi)^{-\frac{d}{2}}\text{det}(\vSigma)^{-\frac{1}{2}}\exp\left\{-\frac{1}{2}(\vx - \vmu)^\top \vSigma^{-1}(\vx - \vmu)\right\} $$ as a member of the exponential family. In the book (and in many other places, see this paper) they say $f$ belongs to the exponential family if it can be written as $$ f(\vx; \vtheta) = \exp\left\{\langle\vtheta, \phi(\vx)\rangle - A(\vtheta)\right\} $$ where $\vtheta$ are the natural parameters of the distribution, $\phi(\vx)$ is the vector of sufficient statistics, and $A(\vtheta)$ is the log-partition function, which ensures that $f$ integrates to one.
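To make the definition concrete before tackling the multivariate case, here is a minimal numerical sketch (assuming NumPy and SciPy, which are not part of the original question) checking the univariate Gaussian in this form, with $\vtheta = (\mu/\sigma^2,\, -1/(2\sigma^2))$, $\phi(x) = (x, x^2)$ and $A(\vtheta) = \mu^2/(2\sigma^2) + \tfrac{1}{2}\log(2\pi\sigma^2)$:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.3, 0.7
theta = np.array([mu / sigma**2, -1.0 / (2 * sigma**2)])         # natural parameters
A = mu**2 / (2 * sigma**2) + 0.5 * np.log(2 * np.pi * sigma**2)  # log-partition function

x = 0.42
phi = np.array([x, x**2])                 # sufficient statistics
print(np.exp(theta @ phi - A))            # exponential-family form
print(norm.pdf(x, loc=mu, scale=sigma))   # reference pdf -- should match
```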
My Attempt
First, we can write the whole pdf inside a single exponential:
$$ \begin{align} f(\vx) &= (2\pi)^{-\frac{d}{2}}\text{det}(\vSigma)^{-\frac{1}{2}}\exp\left\{-\frac{1}{2}(\vx - \vmu)^\top \vSigma^{-1}(\vx - \vmu)\right\} \\ &= \exp\left\{-\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\vSigma|-\frac{1}{2}\left[\vx^\top\vSigma^{-1}\vx - \vx^\top\vSigma^{-1}\vmu - \vmu^\top\vSigma^{-1}\vx + \vmu^\top\vSigma^{-1}\vmu\right]\right\} \\ &= \exp\left\{\vx^\top\vSigma^{-1}\vmu -\frac{1}{2}\vx^\top\vSigma^{-1}\vx -\frac{1}{2}\left[d\log2\pi + \log|\vSigma| +\vmu^\top\vSigma^{-1}\vmu\right]\right\} \end{align} $$ where in the last step we used that $\vx^\top\vSigma^{-1}\vmu = \vmu^\top\vSigma^{-1}\vx$, since both are scalars and $\vSigma^{-1}$ is symmetric.
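As a quick sanity check of this rearrangement (a sketch assuming NumPy/SciPy, not part of the original question), the expanded exponent should agree with a reference implementation of the log-density:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 3
mu = rng.normal(size=d)
L = rng.normal(size=(d, d))
Sigma = L @ L.T + d * np.eye(d)      # a symmetric positive-definite covariance
x = rng.normal(size=d)

P = np.linalg.inv(Sigma)             # precision matrix Sigma^{-1}
exponent = (x @ P @ mu
            - 0.5 * x @ P @ x
            - 0.5 * (d * np.log(2 * np.pi)
                     + np.log(np.linalg.det(Sigma))
                     + mu @ P @ mu))
print(exponent)
print(multivariate_normal.logpdf(x, mean=mu, cov=Sigma))   # should match
```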
Next, we can see that the quadratic form $$ \vx^\top \vSigma^{-1}\vx = \sum_{k=1}^d \sum_{j=1}^d x_k(\Sigma^{-1})_{kj}x_j $$ is actually equivalent to a trace:
$$ \begin{align} \text{tr}\left[\vSigma^{-1}\vx\vx^\top\right] &= \text{tr}\left[ \begin{pmatrix} (\Sigma^{-1})_{11} & \cdots & (\Sigma^{-1})_{1d} \\ \vdots & \ddots & \vdots \\ (\Sigma^{-1})_{d1} & \cdots & (\Sigma^{-1})_{dd} \end{pmatrix} \begin{pmatrix} x_1^2 & \cdots & x_1x_d\\ \vdots & \ddots & \vdots\\ x_dx_1 & \cdots & x_d^2 \end{pmatrix} \right]\\ &= \text{tr}\left[ \begin{pmatrix} \sum_{j=1}^d(\Sigma^{-1})_{1j}x_jx_1 & \cdots & \sum_{j=1}^d(\Sigma^{-1})_{1j}x_jx_d \\ \vdots & \ddots & \vdots \\ \sum_{j=1}^d (\Sigma^{-1})_{dj}x_jx_1 & \cdots & \sum_{j=1}^d (\Sigma^{-1})_{dj}x_jx_d \end{pmatrix} \right] = \sum_{k=1}^d\sum_{j=1}^d x_{k}(\Sigma^{-1})_{kj}x_j \end{align} $$
so that we can write the pdf as $$ f(\vx) = \exp\left\{\langle \vx, \vSigma^{-1}\vmu\rangle -\frac{1}{2}\text{tr}\left[\vSigma^{-1}\vx\vx^\top\right] -\frac{1}{2}\left[d\log2\pi + \log|\vSigma| +\vmu^\top\vSigma^{-1}\vmu\right]\right\} $$
According to the question that I've linked to, we can use the Frobenius inner product between two real matrices $A$ and $B$, $$ \langle A, B\rangle_F = \text{tr}(A^\top B) $$ In our case $A = \vSigma^{-1}$ and $B=\vx\vx^\top$. Since both $\vSigma^{-1}$ and $\vx\vx^\top$ are symmetric, we have $$ \langle \vSigma^{-1}, \vx\vx^\top\rangle_F = \text{tr}(\vSigma^{-1}\vx\vx^\top) $$
This means that we can write the pdf as $$ f(\vx) = \exp\left\{\langle \vSigma^{-1}\vmu, \vx\rangle + \left\langle -\frac{1}{2}\vSigma^{-1}, \vx\vx^\top\right\rangle_F -\frac{1}{2}\left[d\log2\pi + \log|\vSigma| +\vmu^\top\vSigma^{-1}\vmu\right]\right\} $$
We can therefore collect the natural parameters and sufficient statistics $$ \vtheta = \begin{pmatrix} \vSigma^{-1}\vmu \\ -\frac{1}{2}\vSigma^{-1} \end{pmatrix} \qquad \phi(\vx) = \begin{pmatrix}\vx \\ \vx\vx^\top \end{pmatrix} $$ with the convention that the inner product $$ \langle \vtheta, \phi(\vx)\rangle $$ is taken blockwise: the ordinary dot product for the first components, since they're both vectors, $$ \langle \vSigma^{-1}\vmu, \vx\rangle $$ and the Frobenius inner product for the second components, since they're both matrices, $$ \left\langle -\frac{1}{2}\vSigma^{-1}, \vx\vx^\top\right\rangle_F $$ Finally, the log-partition function is $$ A(\vtheta) = A\left( \begin{pmatrix} \vSigma^{-1}\vmu \\ -\frac{1}{2}\vSigma^{-1} \end{pmatrix} \right) = \frac{1}{2}\left[d\log2\pi + \log|\vSigma| +\vmu^\top\vSigma^{-1}\vmu\right] $$
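Finally, a small numerical check (again a sketch assuming NumPy/SciPy) of both the trace/Frobenius identity and the full exponential-family form $\exp\{\langle\vtheta,\phi(\vx)\rangle - A(\vtheta)\}$ with $\vtheta = (\vSigma^{-1}\vmu,\, -\tfrac{1}{2}\vSigma^{-1})$:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
d = 3
mu = rng.normal(size=d)
L = rng.normal(size=(d, d))
Sigma = L @ L.T + d * np.eye(d)      # symmetric positive-definite covariance
P = np.linalg.inv(Sigma)             # Sigma^{-1}
x = rng.normal(size=d)

# Trace identity: x^T Sigma^{-1} x = tr(Sigma^{-1} x x^T) = <Sigma^{-1}, x x^T>_F
print(x @ P @ x, np.trace(P @ np.outer(x, x)))

# Exponential-family form: <theta, phi(x)> taken blockwise (dot + Frobenius product)
inner = (P @ mu) @ x + np.sum(-0.5 * P * np.outer(x, x))
A = 0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(Sigma)) + mu @ P @ mu)
print(np.exp(inner - A))
print(multivariate_normal.pdf(x, mean=mu, cov=Sigma))      # should match
```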