
I am reading Radim's PhD thesis (Radim is the creator of gensim), specifically the chapter about SVD (page 45). There he presents an assignment indicating that we can calculate the "merge" of two SVD factorizations $A_1=U_1S_1V_1^{T}$ and $A_2=U_2S_2V_2^{T}$. The purpose of such a merge can be thought of as an incremental SVD algorithm, which takes two already calculated SVDs and combines them into a single SVD factorization (which should be the same as if we had started with the matrix $\begin{bmatrix}A_1 \mid A_2\end{bmatrix}$ in the first place).

The tricky part about this merge is that it uses only the matrices $U_1S_1$ and $U_2S_2$; the matrices $V_1$ and $V_2$ are not saved, because he cares about a streaming Latent Semantic Indexing application, where the columns of the matrices $A_1$ and $A_2$ represent documents that can arrive continuously and in high volume (hence the matrices $V_1$, $V_2$ resulting from the SVDs could be quite big):

$USV^{T}$ $\xleftarrow{SVD}$ $\begin{bmatrix} U_1S_1 \mid U_2S_2 \end{bmatrix}$

The thesis uses $SVD_k$ to indicate a truncated SVD, and the two already calculated SVDs are in fact truncated as well. On the other hand, the one-line merge above is not efficient, as it does not really exploit the fact that we already have two calculated SVDs, nor the fact that we do not actually need the matrix $V^{T}$. The thesis also mentions a decay factor $\gamma$ that I am omitting here. None of these details are relevant to my question; I mention them only as additional context. In fact, this one-line merge is presented in the thesis as just one step in a sequence of refinements that eventually lead to an optimized merge (which evolves from the simple assignment shown here into an algorithm with several steps).

Now the actual question: Radim seems to claim that the matrices $U$ and $S$ that we get from the SVD calculation below (using only $U_1S_1$ and $U_2S_2$):

\begin{equation} USV^{T} \xleftarrow{SVD} \begin{bmatrix} U_1S_1 \mid U_2S_2 \end{bmatrix} \end{equation}

are the same as those we would have obtained by calculating the SVD of the full decompositions; that is:

\begin{equation} USV^{T} \xleftarrow{SVD} \begin{bmatrix} U_1S_1V_1^{T} \mid U_2S_2V_2^{T} \end{bmatrix} = \begin{bmatrix}A_1 \mid A_2\end{bmatrix} \end{equation}

Such an equivalence is not evident to me; that is, the claim that we arrive at the same matrices $U$ and $S$ with either of the two equations above (the first equation does not use the matrices $V_1$, $V_2$, while the second equation does). Is there a missing step or an additional theorem that justifies this equivalence? (Or did I interpret the claim incorrectly?)
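
To make the claim concrete, here is a small numpy experiment of my own (not code from the thesis or gensim), using full, non-truncated SVDs. Numerically the singular values coincide, and the left singular vectors coincide up to a sign per column:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two blocks with the same number of rows (terms); columns are documents.
A1 = rng.standard_normal((6, 4))
A2 = rng.standard_normal((6, 5))

# Full (non-truncated) SVDs of each block.
U1, s1, _ = np.linalg.svd(A1, full_matrices=False)
U2, s2, _ = np.linalg.svd(A2, full_matrices=False)

# Equation (1): SVD of [U1*S1 | U2*S2], without ever touching V1, V2.
B = np.hstack([U1 * s1, U2 * s2])
U_b, s_b, _ = np.linalg.svd(B, full_matrices=False)

# Equation (2): SVD of the full concatenation [A1 | A2].
A = np.hstack([A1, A2])
U_a, s_a, _ = np.linalg.svd(A, full_matrices=False)

print(np.allclose(s_b, s_a))                 # True: same singular values
signs = np.sign(np.sum(U_a * U_b, axis=0))   # each column may differ by a sign flip
print(np.allclose(U_b * signs, U_a))         # True: same left singular vectors
```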

Would say "thanks in advance", but seems to be considered bad practice.

user251053

1 Answer


By definition, $U_1S_1V_1^{T} = A_1$ and $U_2S_2V_2^{T} = A_2$.

When we concatenate $A_1$ and $A_2$, we get $\begin{bmatrix}A_1 \mid A_2\end{bmatrix}$ = $\begin{bmatrix}U_1S_1V_1^{T} \mid U_2S_2V_2^{T}\end{bmatrix}$ (simple substitution).

So it's trivially true that computing the SVD of the LHS gives the same result as computing the SVD of the RHS:

$USV^{T}$ $\xleftarrow{SVD}$ $\begin{bmatrix} U_1S_1V_1^{T} \mid U_2S_2V_2^{T} \end{bmatrix} = \begin{bmatrix}A_1 \mid A_2\end{bmatrix}$

There's nothing to prove; this all follows from the definition of the SVD. My guess is the confusion comes from some notation misunderstanding :)

Or maybe you mean: "how does the equivalence hold in the case where $U_1$, $S_1$, $U_2$, $S_2$ are already truncated rank-$k$ matrices, i.e. $U_1^k$, $U_2^k$ etc., rather than full rank?" Is it true that

$$SVD_k(A_1 \mid A_2) = SVD_k(SVD_k(A_1) \mid SVD_k(A_2))$$ (eq. 3.8 in the thesis)?

The answer is: of course not. Each truncation introduces an approximation; information is lost at each merge step.

What this online algorithm aims to do is lose as little information as possible, that is, retain as much variance as possible, given that at any time we keep only the rank-$k$ truncated matrices $S^k$, $U^k$ from the previous merge steps.
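
For a concrete illustration (a rough numpy sketch of mine, not code from the thesis or gensim), compare the two sides of eq. 3.8 on random dense blocks with a small $k$:

```python
import numpy as np

def svd_k(M, k):
    # Rank-k truncated SVD; V is discarded, as in the streamed LSI setting.
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k], s[:k]

rng = np.random.default_rng(1)
A1 = rng.standard_normal((20, 15))
A2 = rng.standard_normal((20, 15))
k = 3

# Left-hand side of eq. 3.8: truncate the SVD of the full concatenation once.
_, s_exact = svd_k(np.hstack([A1, A2]), k)

# Right-hand side: truncate each block first, then merge the truncated factors.
U1k, s1k = svd_k(A1, k)
U2k, s2k = svd_k(A2, k)
_, s_merged = svd_k(np.hstack([U1k * s1k, U2k * s2k]), k)

print(s_exact)   # top-k singular values of [A1 | A2]
print(s_merged)  # close, but not identical: each truncation discards information
```

Random data has a flat spectrum, so the discrepancy is easy to see; on term-document matrices, where most of the variance typically sits in the leading singular directions, the merged result is usually much closer.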

Radim
  • Wow! The author of the thesis answered himself! (perhaps nobody else could have done that ;-) Thanks a lot Radim, both for replying and for the additional information. Unfortunately, my original wording at the end of post (last paragraph beginning with "Such equivalence ..."), may have made unclear that the question was not referring exclusively to last equation but to the last two equations indeed. I just edited, aiming to make it clearer; please take a look. – user251053 Jul 28 '15 at 11:56
  • You've updated your question, but I cannot update my answer. So I will post my response as a comment below: – Radim Aug 04 '15 at 11:11
  • The $V$ will be different between the 1st vs 2nd equation. But as explained in section 3.2, we only care about $U$ and $S$ -- we throw away $V$ anyway. In other words, we are only interested in the eigen decomposition (eq 3.1) of $AA^t$, which corresponds to SVD of $A$. $V$ plays no role, as $AA^t=(USV^t)(USV^t)^t=USV^tVS^tU^t=US^2U^t$ (follows from orthonormality of $V$, and $S$ is diagonal). – Radim Aug 04 '15 at 11:54
  • Hello Radim, thanks again for your time. Sorry for my insistence, but I do get that we care only about the matrices $U$ and $S$; the question is about whether they are the same if we use equation 1 or 2. It seems to me that you suggest we get the same matrices $U$ and $S$ with either equation, and that is the part which is not quite clear to me. For instance, you just said that the matrix $V$ would be different for each equation ... then, why would the matrices $U$ and $S$ be the same? Kind regards. – user251053 Aug 04 '15 at 15:26
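
Spelling out the argument in the last two comments (my own restatement of the orthonormality step, not a quote from the thesis): write $B = \begin{bmatrix} U_1S_1 \mid U_2S_2 \end{bmatrix}$ and $A = \begin{bmatrix} A_1 \mid A_2 \end{bmatrix}$. Then

$$AA^{T} = A_1A_1^{T} + A_2A_2^{T} = U_1S_1V_1^{T}V_1S_1U_1^{T} + U_2S_2V_2^{T}V_2S_2U_2^{T} = U_1S_1^{2}U_1^{T} + U_2S_2^{2}U_2^{T} = BB^{T},$$

using $V_1^{T}V_1 = V_2^{T}V_2 = I$. Since $A$ and $B$ share the same Gram matrix $AA^{T}$, they share its eigendecomposition, and therefore the same $U$ and $S$ (up to column signs, and up to rotations within any repeated singular value); only the $V$ factors of the two equations differ.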