
I'm a little confused about the various explanations for using the Singular Value Decomposition (SVD) to solve the Linear Least Squares (LLS) problem. I understand that LLS fits $Ax=b$ by minimizing $\|A\hat{x}-b\|$, which gives the vector $\hat{x}=(A^{\top}A)^{-1}A^{\top}b$.

But my questions relate to the two explanations given at SVD and least squares proof and at Why does SVD provide the least squares solution to $Ax=b$?:

  1. Why do we need (or care) to calculate $\hat{x}=V{\Sigma}^{-1}U^{\top}b$, where $\mathrm{SVD}(A)=U\Sigma V^{\top}$, when $\hat{x}$ can be calculated via the pseudo-inverse mentioned above, $\hat{x}=(A^{\top}A)^{-1}A^{\top}b$?

  2. The first post mentions that we are subject to the constraint $\|\hat{x}\|=1$. What happens when the least squares solution does not have $\|\hat{x}\|=1$? Does this invalidate using the SVD for the solution of $\hat{x}$, or is there a "back-door" approach?

  3. How do the answers to the questions above (as well as our approach) change when we are solving $Ax=0$ versus a generic $Ax=b$? Example: when the SVD of $A$ is $U$, $\Sigma$, and $V^{\top}$ (that is, $A\hat{x}=U\Sigma V^{\top}\hat{x}$), I would think we only care about the smallest singular value $\sigma_i$ in $\Sigma$ when solving $Ax=0$, whereas using only the smallest $\sigma_i$ does not necessarily give the best fit to $u_i \sigma_i v^{\top}_i \hat{x} = b$ in the generic case.

Much thanks, Jeff


2 Answers


For the full rank least squares problem, where $A \in \mathbb{K}^{m \times n},m>n=\mathrm{rank}(A)$ ($\mathbb{K}$ is the base field), the solution is $(A^T A)^{-1} A^T b$. This is a very bad way to approach the problem numerically for condition number reasons: you roughly square the condition number, so a relatively tractable problem with $\kappa=10^8$ becomes a hopelessly intractable problem with $\kappa=10^{16}$ (where we think about tractability in double precision floating point). The condition number also enters into convergence rates for certain iterative methods, so such methods often perform poorly for the normal equations.
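A minimal NumPy sketch of the condition-number squaring (the Vandermonde test matrix, its size, and the random seed are illustrative assumptions, not part of the original answer):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
A = np.vander(t, 12)             # 50 x 12 polynomial design matrix, badly conditioned
b = rng.standard_normal(50)

print(np.linalg.cond(A))         # kappa(A)
print(np.linalg.cond(A.T @ A))   # roughly kappa(A)**2

# Normal equations: the condition number is squared before we ever solve.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# An SVD-backed solver works on A directly, so only kappa(A) enters.
x_svd, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.linalg.norm(A @ x_normal - b), np.linalg.norm(A @ x_svd - b))
```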

The SVD pseudoinverse is exactly the same as the normal equations pseudoinverse, i.e. $(A^T A)^{-1} A^T$. You simply compute it using the SVD and simplify. There is indeed a simplification; the end result is

$$(A^T A)^{-1} A^T = V (\Sigma^T \Sigma)^{-1} \Sigma^T U^T = V \Sigma^{\dagger} U^T.$$

This means that once I know the singular vectors $U$ and $V$, I have transformed the problem of finding the pseudoinverse of $A$ into the (trivial) problem of finding the pseudoinverse of $\Sigma$.
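A short NumPy sketch of this simplification, assuming a full-column-rank $A$ (the matrix size and seed are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))   # full column rank (with probability 1)
b = rng.standard_normal(6)

# Thin SVD: U is 6x3, s holds the singular values, Vt = V^T is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The pseudoinverse of Sigma is trivial: invert the nonzero singular values.
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# Same operator as the normal-equations pseudoinverse in the full-rank case.
print(np.allclose(A_pinv, np.linalg.inv(A.T @ A) @ A.T))   # True

x_hat = A_pinv @ b   # least squares solution to Ax = b
```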

The above is for the full rank problem. For the rank-deficient problem with $m>n>\mathrm{rank}(A)$, the LS solution is not unique; in particular, $A^T A$ is not invertible. The usual convention is to take the solution of minimal Euclidean norm (I don't really know exactly why people do this, but you do need some criterion). It turns out that the SVD pseudoinverse gives you this minimal norm solution. Note that the SVD pseudoinverse still makes sense here, although it does not take the form I wrote above, since $\Sigma^T \Sigma$ is no longer invertible either. But you still obtain it in basically the same way (invert the nonzero singular values, leave the zeros alone).

One nice thing about considering the rank-deficient problem is that even in the full rank case, if $A$ has some singular value "gap", one can forget about the singular values below this gap and obtain a good approximate solution to the full rank least squares problem. The SVD is the ideal method for elucidating this.
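A sketch of the rank-deficient / truncated case, assuming a synthetic low-rank matrix and an illustrative cutoff (in practice the cutoff would come from the noise level or from a visible gap in the singular value spectrum):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic rank-deficient matrix: 8 x 4 but only rank 2.
A = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal(8)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Invert only the singular values above a cutoff; leave the rest at zero.
cutoff = 1e-10 * s[0]
s_inv = np.zeros_like(s)
s_inv[s > cutoff] = 1.0 / s[s > cutoff]

x_min_norm = Vt.T @ (s_inv * (U.T @ b))   # minimum-norm least squares solution

# Agrees with NumPy's built-in truncated pseudoinverse.
print(np.allclose(x_min_norm, np.linalg.pinv(A, rcond=1e-10) @ b))   # True
```

The same thresholding, applied below a singular value "gap" of a full-rank matrix, gives the approximate solution mentioned above.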

The homogeneous problem is somewhat unrelated to least squares; it is really an eigenvector problem, which should be understood using different methods entirely.
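For concreteness, a minimal sketch of the usual formulation of the homogeneous problem, minimizing $\|Ax\|$ subject to $\|x\|=1$ (the test matrix and seed are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 4))

# Minimize ||A x|| subject to ||x|| = 1: take the right singular vector
# belonging to the smallest singular value. The rows of Vt are ordered by
# decreasing singular value, so the last row is the minimizer.
_, s, Vt = np.linalg.svd(A)
x = Vt[-1]

print(np.linalg.norm(A @ x), s[-1])   # the two numbers agree

# Equivalently, x is a unit eigenvector of A.T @ A for its smallest eigenvalue.
```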

Finally, a fourth comment not directly related to your three questions: in reasonably small problems, there isn't much reason to do the SVD. You still should not use the normal equations, but the QR decomposition will do the job just as well, and it will terminate in an amount of time that you can know in advance.
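A QR-based least squares sketch, again with an illustrative random matrix (not from the original answer):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((50, 5))   # full column rank
b = rng.standard_normal(50)

# Reduced QR: Q is 50x5 with orthonormal columns, R is 5x5 upper triangular.
Q, R = np.linalg.qr(A)

# ||Ax - b|| is minimized by the solution of the triangular system R x = Q^T b.
# (A dedicated triangular solver would exploit the structure of R; np.linalg.solve
# is used here only to keep the sketch dependency-free.)
x_qr = np.linalg.solve(R, Q.T @ b)

print(np.allclose(x_qr, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```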

Ian
  1. The SVD decomposition is $$\mathbf{A}=\mathbf{U}\Sigma\mathbf{V}^{*}$$ The pseudoinverse is $$\mathbf{A}^{\dagger}=\mathbf{V}\Sigma^{\dagger}\mathbf{U}^{*}$$ Given one form, you can compute the other. The least squares solution to the generic linear system $\mathbf{A}x = b$ is $$ x_{LS} = \mathbf{A}^{\dagger}b + \left( \mathbf{I} - \mathbf{A}^{\dagger}\mathbf{A} \right ) y $$ where $y$ is an arbitrary vector in the same space as $x$ (a short numerical sketch of this general solution appears after the list). As long as the data vector $b$ is not in the null space $\mathcal{N}\left( \mathbf{A}^{*}\right)$, we will always have a least squares solution, written above. If the matrix $\mathbf{A}$ has full column rank, then we can form and solve the normal equations, which have the solution $$ x_{LS} = \left( \mathbf{A}^{*}\mathbf{A} \right)^{-1} \mathbf{A}^{*} b. $$ When the inverse of the product matrix exists, $$ \mathbf{A}^{\dagger} = \mathbf{V}\Sigma^{\dagger}\mathbf{U}^{*} = \left( \mathbf{A}^{*}\mathbf{A} \right)^{-1} \mathbf{A}^{*}. $$ If the problem is poorly conditioned, the normal equations may fail to provide a reliable answer.

  2. The SVD always exists and provides a solution as long as the data vector is not in the null space. The relationship between the SVD and the pseudoinverse is developed in proving standard least square problem with SVD.

  3. When $\mathbf{A}x = \mathbf{0}$, the data vector $b=\mathbf{0}$ lies in the null space $\mathcal{N}\left( \mathbf{A}^{*}\right)$, so the pseudoinverse returns only the trivial least squares solution $x=\mathbf{0}$. To get a nontrivial answer you must add a constraint such as $\lVert x \rVert = 1$, which turns the problem into the smallest-singular-value (eigenvector) problem described in the other answer.
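The following is a short numerical sketch of the general solution from item 1, $x_{LS} = \mathbf{A}^{\dagger}b + (\mathbf{I} - \mathbf{A}^{\dagger}\mathbf{A})y$, assuming a synthetic rank-deficient matrix so the solution is not unique (sizes, seed, and the particular $y$ are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
# Rank-deficient matrix (6 x 4, rank 2), so the least squares solution is not unique.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal(6)

A_pinv = np.linalg.pinv(A)    # Moore-Penrose pseudoinverse, computed via the SVD
x_particular = A_pinv @ b     # minimum-norm least squares solution

# Add an arbitrary homogeneous component (I - A^+ A) y: the residual is
# unchanged, only the norm of the solution grows.
y = rng.standard_normal(4)
x_general = x_particular + (np.eye(4) - A_pinv @ A) @ y

print(np.allclose(A @ x_particular, A @ x_general))               # same residual
print(np.linalg.norm(x_particular) <= np.linalg.norm(x_general))  # True
```

Most library routines return only the particular solution $\mathbf{A}^{\dagger}b$, which is the minimum-norm choice; the homogeneous term only matters when you need the full solution set.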

dantopa
  • Actually there exists more than one pseudoinverse. – mathreadler Feb 06 '18 at 17:05
  • @mathreadler: Could you elaborate on the context? Are you talking about other generalized inverses (e.g. the Drazin inverse, https://math.stackexchange.com/questions/2186679/prove-an-identity-which-uses-drazin-inverses-and-moore-penrose-inverses/2199043#2199043; see also Generalized Inverses: Theory and Applications by Ben-Israel and Greville)? Or perhaps the different forms the Moore-Penrose pseudoinverse takes (https://math.stackexchange.com/questions/1971211/pseudo-inverse-of-a-matrix-that-is-neither-fat-nor-tall/2181445#2181445)? The Moore-Penrose pseudoinverse here is unique. – dantopa Feb 06 '18 at 21:49
  • Yep. I was thinking of Drazin and some others. – mathreadler Feb 07 '18 at 09:59
  • @mathreadler: You raise an interesting and subtle point. While there are many generalized inverses, the SVD solution to the least squares problem is uniquely given by the Moore-Penrose pseudoinverse: https://math.stackexchange.com/questions/772039/how-does-the-svd-solve-the-least-squares-problem/2173715#2173715 – dantopa Feb 08 '18 at 05:42
  • Good, that is maybe something worth adding rather than just writing "the pseudoinverse". – mathreadler Feb 08 '18 at 06:08
  • Sorry, but what is this "random vector" $y$? How does it arise in, say, linear least squares regression? I don't think I have ever seen it returned by any software that fits a regression line. Thank you. – Confounded May 26 '23 at 01:38
  • @Confounded Routines to find least squares solutions return the particular solution $\mathbf{A}^{\dagger}b$. If the solution is unique, you are done. Otherwise you need to account for the homogeneous solution. Perhaps this will help: https://math.stackexchange.com/questions/2253443/difference-between-least-squares-and-minimum-norm-solution/2253614#2253614 – dantopa May 28 '23 at 22:10