I've been working on implementing a Singular Value Decomposition (SVD) algorithm from scratch in Python, without using the np.linalg.svd function. My goal is to understand the underlying mechanics of SVD. However, I'm encountering some discrepancies in my results compared to np.linalg.svd, specifically regarding the signs of the singular vectors. Interestingly enough, my approach works when A is 4x3 but not when it is 3x3.
My Goal:
My goal is to create a function that uses a stable, reliable mathematical approach to compute an inverse or pseudoinverse for solving a least-squares problem. One limitation: I am not allowed to use any existing math functions, so I will later replace np.linalg.norm and the other NumPy calls with my own implementations.
I tried to implement the Moore-Penrose inverse directly, but there were too many special cases and issues depending on the matrix properties. According to some papers, going through the SVD offers an easy way out.
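To make the goal concrete, here is a minimal sketch of the least-squares solve I am aiming for, still using np.linalg.svd as a placeholder for my own routine (the pinv_via_svd name and the rcond cutoff are just working choices of mine):

import numpy as np

def pinv_via_svd(A, rcond=1e-12):
    # pseudoinverse A+ = V diag(1/sigma) U^T, with tiny singular values zeroed
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    large = s > rcond * s[0]  # s[0] is the largest singular value
    s_inv[large] = 1.0 / s[large]
    return Vt.T @ (s_inv[:, None] * U.T)

# least-squares solution of A x = b
A = np.array([[7.0, 6.0, 5.0], [6.0, 5.0, 4.0], [5.0, 4.0, 3.0]])
b = np.array([10.0, -3.0, 7.0])
x = pinv_via_svd(A) @ b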
My Approach:
I've implemented a custom QR decomposition (my_qr) and an eigenvalue/eigenvector routine (eig_custom), which are then used to compute the SVD. Here's a simplified version of my code:
import numpy as np
def my_qr(A, standard=False):
    m, n = A.shape
    Q = np.eye(m)
    R = np.copy(A)
    if m == n:
        end = n - 1
    else:
        end = n
    for i in range(0, end):
        H = np.eye(m)
        a = R[i:, i]  # partial column vector
        norm_a = np.linalg.norm(a)
        if a[0] < 0.0:
            norm_a = -norm_a  # choose the sign that avoids cancellation
        v = a / (a[0] + norm_a)
        v[0] = 1.0
        h = np.eye(len(a))  # Householder reflection
        h -= (2 / np.dot(v, v)) * np.matmul(v[:, None], v[None, :])
        H[i:, i:] = h
        Q = np.matmul(Q, H)
        R = np.matmul(H, R)
    if standard:
        # flip signs so that R has a non-negative diagonal
        S = np.zeros((n, n))
        for i in range(n):
            if R[i][i] < 0.0:
                S[i][i] = -1.0
            else:
                S[i][i] = +1.0
        Q = np.matmul(Q, S)
        R = np.matmul(S, R)
    return Q, R
def is_upper_tri(A, tol):
    n = len(A)
    for i in range(0, n):
        for j in range(0, i):
            if np.abs(A[i][j]) > tol:
                return False
    return True

def eig_custom(A, num_iterations: int = 100):
    # plain (unshifted) QR iteration: Ak converges towards an upper-triangular
    # matrix whose diagonal holds the eigenvalues
    Ak = np.copy(A)
    n = A.shape[0]
    eigenvectors = np.eye(n)
    for cnt in range(num_iterations):
        Q, R = my_qr(Ak)
        Ak = R @ Q
        eigenvectors = eigenvectors @ Q
        if is_upper_tri(Ak, 1.0e-16):
            break
    eigenvalues = np.diag(Ak)
    return eigenvalues, eigenvectors
def calculU(M):
    B = np.dot(M, M.T)
    eigenvalues, eigenvectors = eig_custom(B)
    ncols = np.argsort(eigenvalues)[::-1]
    return eigenvectors[:, ncols]

def calculVt(M):
    B = np.dot(M.T, M)
    eigenvalues, eigenvectors = eig_custom(B)
    ncols = np.argsort(eigenvalues)[::-1]
    return eigenvectors[:, ncols].T

def calculSigma(M):
    if np.size(np.dot(M, M.T)) > np.size(np.dot(M.T, M)):
        newM = np.dot(M.T, M)
    else:
        newM = np.dot(M, M.T)
    eigenvalues, eigenvectors = eig_custom(newM)
    eigenvalues = np.sqrt(np.abs(eigenvalues))
    # sorting in descending order, as the svd function does
    return eigenvalues[::-1]
def svd(A, num_iterations=1000):
    # compute eigenvectors and eigenvalues of A A^T and A^T A
    eigvals_AAt, eigvecs_AAt = eig_custom(A @ A.T, num_iterations)
    eigvals_AAt_ref, eigvecs_AAt_ref = np.linalg.eig(A @ A.T)  # reference values, for debugging only
    eigvals_AtA, eigvecs_AtA = eig_custom(A.T @ A, num_iterations)
    eigvals_AtA_ref, eigvecs_AtA_ref = np.linalg.eig(A.T @ A)  # reference values, for debugging only
    singular_values = np.sqrt(np.abs(eigvals_AtA))
    sorted_indices = np.argsort(singular_values)[::-1]
    singular_values = singular_values[sorted_indices]
    U = eigvecs_AAt[:, sorted_indices]
    V = eigvecs_AtA[:, sorted_indices]
    return U, singular_values, V.T
def main():
    mat = []
    for i in range(3):
        mat.append(np.array([7.0, 6.0, 5.0]) - i)
    A = np.array(mat, dtype=float)
    #print(A)
    b = np.array([10, -3, 7])
    x = A @ b
    print(f"Reference Matrix:\n{A}")
    print(f"\n{'-'*50}\nReference SVD decomp:")
    U_ref, S_ref, V_ref = np.linalg.svd(A.copy(), full_matrices=False)
    print(f"\n\nU:\n{U_ref}\nS:\n{S_ref}\nV:{V_ref}\n\n")
    print("Verification [U @ Sigma @ V]:")
    print(U_ref @ np.diag(S_ref) @ V_ref)
    print(f"\n{'-'*50}\nMy SVD decomp:")
    U, S, V = svd(A.copy())
    print(f"\nU:\n{U}\n\nS:\n{S}\n\nV:{V}\n\n")
    print("Verification [U @ Sigma @ V]:")
    print(U @ np.diag(S) @ V)
    print(f"\n{'-'*50}\nDifferent custom SVD decomp:")
    U, S, V = calculU(A), calculSigma(A), calculVt(A)
    print(f"\nU:\n{U}\n\nS:\n{S}\n\nV:{V}\n\n")
    newSigma = np.zeros(A.T.shape)
    newSigma[:S.size, :S.size] = np.diag(S[::-1])
    A_res = U @ newSigma.T @ V
    print("Verification [U @ Sigma @ V]:")
    print(A_res)

if __name__ == "__main__":
    main()
The complete code, along with the custom QR decomposition and eigenvalue computation, can be found attached.
Issue Encountered:
While my custom SVD implementation produces singular values close to those generated by np.linalg.svd, the singular vectors (the U and V matrices) often differ in sign. As a result, the matrix reconstructed from my custom SVD differs from the original, while np.linalg.svd reconstructs it perfectly. Here is the full output for the 3x3 case:
Reference Matrix:
[[7. 6. 5.]
[6. 5. 4.]
[5. 4. 3.]]
Reference SVD decomp:
U:
[[-0.6813193 0.60756674 0.40824829]
[-0.57017342 -0.09075023 -0.81649658]
[-0.45902754 -0.7890672 0.40824829]]
S:
[1.53898669e+01 3.89866919e-01 8.98295192e-16]
V:[[-0.6813193 -0.57017342 -0.45902754]
[-0.60756674 0.09075023 0.7890672 ]
[ 0.40824829 -0.81649658 0.40824829]]
Verification [U @ Sigma @ V]:
[[7. 6. 5.]
[6. 5. 4.]
[5. 4. 3.]]
My SVD decomp:
U:
[[ 0.6813193 -0.60756674 0.40824829]
[ 0.57017342 0.09075023 -0.81649658]
[ 0.45902754 0.7890672 0.40824829]]
S:
[1.53898669e+01 3.89866919e-01 2.26925653e-08]
V:[[ 0.6813193 0.57017342 0.45902754]
[-0.60756674 0.09075023 0.7890672 ]
[ 0.40824829 -0.81649658 0.40824829]]
Verification [U @ Sigma @ V]:
[[7.28782888 5.95700795 4.62618703]
[5.95700795 5.00642159 4.0558352 ]
[4.62618703 4.0558352 3.48548338]]
Different custom SVD decomp:
U:
[[ 0.6813193 -0.60756674 0.40824829]
[ 0.57017342 0.09075023 -0.81649658]
[ 0.45902754 0.7890672 0.40824829]]
S:
[2.26925653e-08 3.89866919e-01 1.53898669e+01]
V:[[ 0.6813193 0.57017342 0.45902754]
[-0.60756674 0.09075023 0.7890672 ]
[ 0.40824829 -0.81649658 0.40824829]]
Verification [U @ Sigma @ V]:
[[7.28782888 5.95700795 4.62618703]
[5.95700795 5.00642159 4.0558352 ]
[4.62618703 4.0558352 3.48548338]]
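Looking at this output, my factors agree with NumPy's except for the signs of individual columns of U and rows of V (and the value reported for the numerically zero singular value). As far as I understand, eigenvectors are only determined up to sign, so computing U and V from two independent eigendecompositions gives no guarantee that the signs pair up. One workaround I am experimenting with is to compute only V (from A.T @ A) and then derive each column of U as u_i = A v_i / sigma_i, which forces consistent signs wherever sigma_i is nonzero. A minimal sketch reusing eig_custom from above (svd_sign_consistent is just my working name):

def svd_sign_consistent(A, tol=1e-10):
    # eigendecomposition of A^T A gives V and the squared singular values
    eigvals, V = eig_custom(A.T @ A)
    order = np.argsort(eigvals)[::-1]
    s = np.sqrt(np.abs(eigvals[order]))
    V = V[:, order]
    # derive U from A and V so the signs pair up: u_i = A v_i / sigma_i
    U = np.zeros((A.shape[0], len(s)))
    for i in range(len(s)):
        if s[i] > tol:
            U[:, i] = (A @ V[:, i]) / s[i]
    return U, s, V.T

For a rank-deficient A (like the 3x3 example above, whose smallest singular value is numerically zero), the columns of U belonging to zero singular values are left as zeros and would still have to be completed to an orthonormal basis, so this is a sketch rather than a finished routine.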
Questions:
- Could the discrepancies in the signs of the singular vectors be the reason for the imperfect reconstruction of the original matrix?
- Is there a specific step in the QR decomposition or the eigenvalue computation where I should pay extra attention to the signs?
- Are there any known best practices or adjustments to ensure the signs align more closely with those produced by standard libraries like NumPy? (See the sketch after this list.)
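For the last question: one convention used elsewhere (for example, scikit-learn's svd_flip helper) makes the sign choice deterministic by flipping each column of U so that its largest-magnitude entry is positive, and flipping the matching row of V^T the same way so the product is unchanged. A minimal sketch of that idea (flip_signs is my name for it):

def flip_signs(U, Vt):
    # flip each (column of U, row of Vt) pair so that the largest-magnitude
    # entry of each column of U is positive; U @ np.diag(s) @ Vt is unchanged
    max_rows = np.argmax(np.abs(U), axis=0)
    signs = np.sign(U[max_rows, np.arange(U.shape[1])])
    signs[signs == 0] = 1.0
    return U * signs, Vt * signs[:, None]

Note that this only standardizes the sign choice; it does not by itself repair an inconsistent pairing between independently computed U and V.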
Any insights or suggestions on how to resolve these discrepancies and improve the accuracy of my custom SVD implementation would be greatly appreciated!
Thank you!
Comments:

For the traditional SVD, we have $A = U\Sigma V^T$. So, for your verification, you should be calculating U @ Sigma @ V.T instead of U @ Sigma @ V; it is U @ Sigma @ V.T that matches the original matrix. – Ben Grossmann Jan 15 '24 at 00:27

Also, if you take a look at the results, they are the same except for the signs. – Momo Jan 15 '24 at 11:36

np.linalg.eig(A @ A.T)[1]
array([[-0.6813193 , -0.60756674,  0.40824829],
       [-0.57017342,  0.09075023, -0.81649658],
       [-0.45902754,  0.7890672 ,  0.40824829]])

np.linalg.svd(A.copy(), full_matrices=False)[0]
array([[-0.6813193 ,  0.60756674,  0.40824829],
       [-0.57017342, -0.09075023, -0.81649658],
       [-0.45902754, -0.7890672 ,  0.40824829]])

– Momo Jan 15 '24 at 12:02