
I would like to use PCA (Principal Component Analysis) to compress a sequence of vectors, $v_0 \ldots v_n$.

My plan is to concatenate these vectors into a matrix: $M = [ v_0 \ldots v_n ]$

I will then use PCA to create a smaller representative set of vectors: $x_0 \ldots x_m$

For each $v_i$, I will then find the weights $w_0 \ldots w_m$ such that $w_0 x_0 + \cdots + w_m x_m$ approximates $v_i$.

Assuming this sounds reasonable at the high level, I need to better understand the practical details.

  1. Can anyone point me to a good explanation of the concrete steps for applying PCA, SVD, etc. to solve this problem or similar applied problems? Most of the material I've found is very abstract and self-referential. To use an analogy, it's like looking up ways to use $e$ and reading that $e^{i\pi}=-1$. Although accurate and interesting, it doesn't actually explain $e$, $i$ or $\pi$ or how one would apply any of them to solve common problems.

  2. Can anyone recommend an easy-to-understand free C/C++ library I can use to experiment with numerical PCA in my own code? The simpler and more focused on my problem, the better.

1 Answer


Your approach is correct. There are a couple of points you need to decide on.

Is $n$ really large? If not, you can compute the SVD directly (MATLAB's svd, or LAPACK's dgesvd) to get $M = U \Sigma V^T$.

In MATLAB,

[U,S,V] = svd(M);   % svd(M,'econ') gives the economy-size (thin) version

The columns of $U$ give your $x_0, x_1, \ldots, x_m$, and the weights $w_{jk}$ form the matrix $\Sigma V^T$: its $i$-th column holds the weights that reconstruct $v_i$.

In LAPACK, there are a few more parameters you need to pass to the routine DGESVD (look up the link I have given).
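Regarding the C/C++ part of your question: one free library I would point to (my suggestion, not something the steps above require) is Eigen, which is header-only and ships a built-in SVD. A minimal sketch of the whole compression step, assuming your vectors are the columns of M:

    // Sketch: compress the column vectors of M via a truncated SVD, using Eigen.
    // (For textbook PCA you would usually subtract the mean column from M first;
    //  that step is omitted here.)
    #include <Eigen/Dense>
    #include <iostream>

    int main() {
        const int dim = 100;  // length of each vector v_i
        const int n   = 40;   // number of vectors
        const int m   = 5;    // number of basis vectors x_j to keep

        // Columns of M are the vectors v_0 ... v_{n-1} (random data for the demo).
        Eigen::MatrixXd M = Eigen::MatrixXd::Random(dim, n);

        // Thin SVD: M = U * S * V^T.
        Eigen::JacobiSVD<Eigen::MatrixXd> svd(M, Eigen::ComputeThinU | Eigen::ComputeThinV);

        // Basis vectors x_j: the first m columns of U.
        Eigen::MatrixXd X = svd.matrixU().leftCols(m);

        // Weights: the first m rows of S * V^T; column i holds the weights for v_i.
        Eigen::MatrixXd W = svd.singularValues().head(m).asDiagonal()
                            * svd.matrixV().leftCols(m).transpose();

        // Reconstruction M ~ X * W, and the relative error of the compression.
        std::cout << "relative error: " << (M - X * W).norm() / M.norm() << "\n";
        return 0;
    }

The only real choice here is m: look at svd.singularValues() and keep as many components as the error you can tolerate requires.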

If $n$ is really, really large, then there are some nice approximate/randomized algorithms which give you the low-rank form directly. One such algorithm is here.
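For what it is worth, the usual flavor of such algorithms is a randomized range-finder: multiply M by a thin random matrix, orthonormalize the result, and then take an exact SVD of the small projected matrix. A rough sketch of that idea (again using Eigen, and again my own illustration rather than the specific algorithm linked above):

    // Sketch of a basic randomized low-rank SVD (range-finder + small exact SVD).
    #include <Eigen/Dense>
    #include <algorithm>

    // Computes thin factors so that M ~ U * S.asDiagonal() * V^T with rank k
    // (assumes k is much smaller than the number of columns of M).
    void randomizedSVD(const Eigen::MatrixXd& M, int k,
                       Eigen::MatrixXd& U, Eigen::VectorXd& S, Eigen::MatrixXd& V) {
        const int oversample = 10;  // a few extra samples improve accuracy
        const int l = std::min<int>(k + oversample, static_cast<int>(M.cols()));

        // 1. Sample the range of M with a random test matrix
        //    (uniform entries here; the textbook choice is Gaussian).
        Eigen::MatrixXd Omega = Eigen::MatrixXd::Random(M.cols(), l);
        Eigen::MatrixXd Y = M * Omega;  // dim x l

        // 2. Orthonormalize Y: Q is a basis for (approximately) the range of M.
        Eigen::HouseholderQR<Eigen::MatrixXd> qr(Y);
        Eigen::MatrixXd Q = qr.householderQ() * Eigen::MatrixXd::Identity(Y.rows(), l);

        // 3. Project M onto that basis and take an exact SVD of the small matrix.
        Eigen::MatrixXd B = Q.transpose() * M;  // l x n, cheap to decompose
        Eigen::JacobiSVD<Eigen::MatrixXd> svd(B, Eigen::ComputeThinU | Eigen::ComputeThinV);

        U = Q * svd.matrixU().leftCols(k);  // lift back to the original space
        S = svd.singularValues().head(k);
        V = svd.matrixV().leftCols(k);
    }

The expensive full decomposition is replaced by one multiplication of M with a thin random matrix plus an SVD of a small l-by-n matrix, which is what makes this attractive for very large problems.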

What the SVD does:

If $M \in \mathbb{R}^{m \times n}$ is a matrix of rank $r$, then $M = U \Sigma V^T$ is the singular value decomposition of the matrix $M$ where $U \in \mathbb{R}^{m \times r}$, $V \in \mathbb{R}^{n \times r}$ and $\Sigma \in \mathbb{R}^{r \times r}$.

The matrices $U$ and $V$ have orthonormal columns, i.e. $U^T U = I_{r \times r}$ and $V^T V = I_{r \times r}$, and the matrix $\Sigma$ is a diagonal matrix with positive entries, the singular values, conventionally ordered so that $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$.

This enables us to write the matrix $M$ as $$M = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \cdots + \sigma_r u_r v_r^T$$

There are numerous advantages to the SVD. The most important of these is that the SVD gives the optimal low-rank approximation, where optimality is meant in both the $2$-norm and the Frobenius-norm sense (the Eckart–Young theorem). (You may want to look up here why the rank of a matrix is important.)

What this means is that if we are looking for a rank-$p$ approximation $\tilde{M}$ to the matrix $M$ such that $||M- \tilde{M}||_2$ or $||M- \tilde{M}||_F$ is minimized, then $\tilde{M}$ is given by $$\tilde{M} = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \cdots + \sigma_p u_p v_p^T$$
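For reference, the error of that best rank-$p$ approximation can be read off from the discarded singular values: $$||M- \tilde{M}||_2 = \sigma_{p+1}, \qquad ||M- \tilde{M}||_F = \sqrt{\sigma_{p+1}^2 + \cdots + \sigma_r^2},$$ so the decay of the singular values tells you directly how many terms you need to keep for a given accuracy.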

Hence, if we have the SVD of a matrix, then from an application point of view, we have almost all we need to know about the matrix.