
I am studying about Variational Auto Encoders. I found the following equation.

$$\mu +\Sigma^\frac12\odot\epsilon$$

I can imagine that $\Sigma$ is related to variance. What does $\Sigma^{1/2}$ mean, and what is this notation called? I cannot find any good articles about it.

Blue
jef
  • $\Sigma$ is variance, and $\Sigma^{\frac12}$ is standard deviation? – jef Mar 30 '18 at 21:04
  • One thing which would help is where you read that. Presumably, they define the notation somewhere before. (Is $\epsilon$ an error or perturbation vector?) – Clement C. Mar 30 '18 at 21:05
  • If $\Sigma$ is a symmetric positive definite matrix (as it should be if it's a covariance matrix), then presumably $\Sigma^{1/2}$ is its (well-defined) matrix square root. – Clement C. Mar 30 '18 at 21:06
  • The original source is in Japanese. But I found it refers to a Wikipedia article. I'm reading it now. https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Geometric_interpretation – jef Mar 30 '18 at 21:08
  • For the normal distribution, I have found two definitions: $\mathcal{N}(\mu(x), \sigma(x))$ and $\mathcal{N}(\mu(x), \Sigma(x))$. Do both "sigma"s mean standard deviation? Does the latter notation emphasize that sigma is a matrix? – jef Mar 30 '18 at 21:26
  • I am going to be blunt here: if you are not clear on what a covariance matrix (cf. my previous comment) is and a bit hazy on your linear algebra (so that you are not clear on what a positive semi-definite matrix and its square root are), maybe you should work on that before wanting to study neural nets and autoencoders. – Clement C. Mar 30 '18 at 21:28

1 Answer


This literature is honestly not very well-notated in general. Let me try to clarify.

Usually a 1D normally distributed random variable $\widetilde{z}$ (dependent on some other variable $x$) is written as: $$ \widetilde{z}\sim\mathcal{N}(\mu(x),\sigma^2(x)) $$ where $\mu(x)\in\mathbb{R}$ is the mean and $\sigma^2(x)\in\mathbb{R}$ is the variance. An equivalent value can be sampled via $ \widetilde{z} = \mu(x) + \varepsilon\sqrt{\sigma^2(x)} = \mu(x) + \varepsilon\,\sigma(x) $ where $\varepsilon\sim\mathcal{N}(0,1)$. Notice the square root of the variance is used to get the standard deviation.
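The 1D case above can be checked numerically. A minimal NumPy sketch (the values of $\mu$ and $\sigma$ are made up for illustration; in a VAE they would come from a network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values; in a model these would be the outputs mu(x), sigma(x).
mu, sigma = 1.5, 0.4

# Reparameterized sampling: draw eps ~ N(0, 1), then shift by the mean
# and scale by the standard deviation.
eps = rng.standard_normal(100_000)
z = mu + sigma * eps

# The samples are distributed as N(mu, sigma^2): the empirical mean is
# close to mu and the empirical standard deviation is close to sigma.
print(z.mean(), z.std())
```

This is exactly the reparameterization trick in one dimension: the randomness lives entirely in $\varepsilon$, and $\mu$, $\sigma$ enter deterministically.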

Separately, a multivariate, normally distributed random variable is often written as: $$ z\sim\mathcal{N}(\mu(x),\Sigma(x)) $$ where $z,\mu(x)\in\mathbb{R}^n$ and $\Sigma(x)\in\mathbb{R}^{n\times n}$. Note that $\Sigma(x)$ must be positive definite and symmetric in order to be a proper covariance matrix. In this case, we can sample a value via $z = \mu(x) + \sqrt{\Sigma(x)}\,\xi$, where $\xi\sim\mathcal{N}(0,I)$, $\xi\in\mathbb{R}^n$. Here, $\sqrt{\Sigma(x)}=\Sigma^{1/2}(x)\in\mathbb{R}^{n\times n}$ is the matrix square root, i.e. $\Sigma^{1/2}(x)\Sigma^{1/2}(x)=\Sigma(x)$. Note that one cannot in general compute this by taking the square root of the elements of the matrix (though one can for diagonal matrices)! This can be non-trivial (e.g. see here or here).
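To make the matrix square root concrete, here is a hedged NumPy sketch that builds a made-up symmetric positive definite matrix, computes $\Sigma^{1/2}$ from its eigendecomposition, and checks the defining property ( `scipy.linalg.sqrtm` would also do the job):

```python
import numpy as np

rng = np.random.default_rng(1)

# Construct an illustrative symmetric positive definite "covariance" matrix.
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + 3 * np.eye(3)

# For symmetric PSD Sigma = Q diag(w) Q^T, the matrix square root is
# Sigma^{1/2} = Q diag(sqrt(w)) Q^T.
w, Q = np.linalg.eigh(Sigma)
Sigma_half = Q @ np.diag(np.sqrt(w)) @ Q.T

# Defining property: Sigma^{1/2} Sigma^{1/2} = Sigma.
assert np.allclose(Sigma_half @ Sigma_half, Sigma)

# Taking elementwise roots of the entries does NOT give a matrix root
# when Sigma is not diagonal.
elementwise = np.sqrt(np.abs(Sigma))
assert not np.allclose(elementwise @ elementwise, Sigma)

# Sampling z ~ N(mu, Sigma) via z = mu + Sigma^{1/2} xi, with xi ~ N(0, I).
mu = np.zeros(3)
xi = rng.standard_normal(3)
z = mu + Sigma_half @ xi
```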

However, for VAEs and the reparameterization trick, the covariance matrix is in most cases defined to be diagonal. This tells us two things: (1) the matrix root becomes an elementwise root, i.e. $\Sigma = \text{diag}(\Sigma_1,\ldots,\Sigma_n)\;\implies\;\Sigma^{1/2}=\text{diag}(\Sigma^{1/2}_1,\ldots,\Sigma^{1/2}_n)$, and (2) multiplying by a diagonal matrix is the same as taking a Hadamard (elementwise) product with its diagonal viewed as a vector, i.e. in this case $\sqrt{\Sigma}\,\xi = \sqrt{\Sigma}\odot\xi$ (a slight abuse of notation, with $\sqrt{\Sigma}$ on the right treated as the vector of diagonal entries). Thus, sampling a value in a VAE with diagonal covariance becomes: $$ z = \mu(x) + \sqrt{\Sigma(x)}\,\xi = \mu(x) + \Sigma^{1/2}(x)\odot\xi $$ where you don't have to worry about matrix roots, because you can take the elementwise root! :)
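The equivalence of the matrix-vector form and the Hadamard form in the diagonal case can be verified directly; a small NumPy sketch with made-up values:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 4
mu = rng.standard_normal(n)           # mean vector (illustrative)
var = rng.uniform(0.5, 2.0, size=n)   # diagonal entries of Sigma (per-dim variances)
xi = rng.standard_normal(n)

# Matrix route: for diagonal Sigma, Sigma^{1/2} = diag(sqrt(var)).
Sigma_half = np.diag(np.sqrt(var))
z_matrix = mu + Sigma_half @ xi

# Hadamard route, as written in VAE papers: mu + sigma ⊙ eps, with sigma a vector.
z_hadamard = mu + np.sqrt(var) * xi

# Both give the same sample.
assert np.allclose(z_matrix, z_hadamard)
```

This is why VAE implementations never form the full covariance matrix: a vector of standard deviations and an elementwise multiply suffice.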

user3658307