I implemented a version of the Subspace Iteration algorithm for SVD based on Uri Cohen's answer.
import numpy as np

def ssi(M, k):
    """ Subspace iteration ignoring the diagonal """
    n, _ = M.shape
    U = np.random.randn(n, k)
    last_diag = np.diag(M)
    for _ in range(100):
        # Replace M's diagonal with the current estimate of diag(L @ L.T)
        M1 = M - np.diag(np.diag(M) - last_diag)
        U, R = np.linalg.qr(M1 @ U)
        # Diagonal of U diag(|R|) U^T, used as the next diagonal estimate
        last_diag = np.einsum('c,rc,rc->r', np.diag(np.abs(R)), U, U)
    return np.sqrt(np.diag(R)) * U
This returns a low-rank factor $L \in \mathbb{R}^{n \times k}$ such that the off-diagonal part of $M - LL^T$ has small Frobenius norm.
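For illustration, here is roughly how I call it; off_diag_mse is a small helper introduced just here to measure the mean squared error over the off-diagonal entries, and the choice of random test matrix is arbitrary:

def off_diag_mse(A, B):
    """ Mean squared error between A and B, counting off-diagonal entries only """
    mask = ~np.eye(A.shape[0], dtype=bool)
    return np.mean((A - B)[mask] ** 2)

A = np.random.randn(5, 5)
M = (A + A.T) / 2        # a random symmetric test matrix
L = ssi(M, 2)            # 5 x 2 factor
print(off_diag_mse(M, L @ L.T))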
For comparison I implemented Uri Cohen's answer:
def iterated_svd(M, k):
    """ Iterated SVD with diagonal replacement, following Uri Cohen's answer """
    last_diag = np.diag(M)
    for _ in range(100):
        M1 = M - np.diag(np.diag(M) - last_diag)
        U, D, V = np.linalg.svd(M1)
        last_diag = np.einsum('c,rc,rc->r', D[:k], U[:, :k], U[:, :k])
    return np.sqrt(D[:k]) * U[:, :k]
I also implemented an SGD version using torch.
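That version isn't shown in full; a minimal sketch of what it can look like (the parametrization, the optimizer choice and the learning rate here are illustrative, not necessarily what produced the numbers below):

import torch

def sgd_factor(M, k, steps=100, lr=1e-2):
    """ Fit an n x k factor L by gradient descent so that L @ L.T matches M off the diagonal """
    M = torch.as_tensor(M, dtype=torch.float64)
    n = M.shape[0]
    mask = ~torch.eye(n, dtype=torch.bool)       # off-diagonal entries only
    L = torch.randn(n, k, dtype=torch.float64, requires_grad=True)
    opt = torch.optim.Adam([L], lr=lr)           # Adam for the sketch; plain SGD works with a suitable step size
    for _ in range(steps):
        opt.zero_grad()
        loss = ((M - L @ L.T)[mask] ** 2).mean()
        loss.backward()
        opt.step()
    return L.detach().numpy()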
I tested all of them on 1000 random symmetric $5 \times 5$ matrices, trying to get a diagonal + rank-2 approximation.
The mean squared errors obtained were:
- ssi 100its: 2.05
- svd 1it: 7.07
- svd 100its: 1.74
- torch 100its: 4.18
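The comparison loop is roughly of this shape (a sketch, shown here for ssi, reusing the off_diag_mse helper from above and assuming the error is measured over the off-diagonal entries only):

errs = []
for _ in range(1000):
    A = np.random.randn(5, 5)
    M = (A + A.T) / 2                  # random symmetric 5 x 5 matrix
    L = ssi(M, 2)                      # diagonal + rank-2 approximation
    errs.append(off_diag_mse(M, L @ L.T))
print(np.mean(errs))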
In other words, all algorithms more or less agreed.
To me this suggests that "subspace iteration with replaced diagonal" works pretty well.
And since it's much faster than the iterated SVD and torch versions, it's the one I'll be using personally.
However, there is no proof for this, and I'm sure Ben Grossmann is right that it can fail for some matrices. So be warned!
I should also point out the strange need for using abs(R) in the subspace iteration instead of just R. This doesn't seem to be needed when computing the normal SVD, but experimentally it is crucial here.
Another interesting thing to note: Even if your original matrix is symmetric, the decompositions $M \approx D + LL^T$ are not necessarily PSD!
I counted the number of times this happened in my 1000 experiments and got the following numbers of non-PSD matrices:
- ssi 100it: 109
- svd 1it: 0
- svd 100it: 92
- torch 100it: 331
While I don't even have a proof that SVD with 1 iteration can't have this problem, it suggests that the improved approximations may not always be worth the trouble if you later want to, say, interpret your approximation as a covariance matrix.
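The PSD check I have in mind is along these lines, assuming $D$ is chosen so that $D + LL^T$ matches $M$ exactly on the diagonal, i.e. $D = \operatorname{diag}(M - LL^T)$:

def is_psd_approx(M, L, tol=1e-9):
    """ Is D + L @ L.T PSD, where D = diag(M - L @ L.T)? """
    A = L @ L.T
    np.fill_diagonal(A, np.diag(M))    # D + L @ L.T has M's diagonal and L @ L.T's off-diagonal
    return np.linalg.eigvalsh(A).min() >= -tol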
I tried a 5th version that runs ssi, but forces the diagonal matrix to be non-negative.
This trivially enforces PSDness of the approximation: $LL^T$ is always PSD, a non-negative diagonal $D$ is PSD, and the sum of two PSD matrices is PSD.
Concretely I changed the "diagonal replacement line" to
M1 = M - np.maximum(np.diag(np.diag(M) - last_diag), 0)
The mean squared error was 2.33, so it is worse than ssi (2.05) and iterated svd (1.74), but still a lot better than plain svd (7.07), and we get the extra guarantee that the diagonal is non-negative.