
The least squares problem is to minimize $\|Ax - y\|^{2}$.

I'm trying to prove that $x^{LS} \perp \ker(A)$, where $x^{LS} = A^{\dagger}y$, and that $x^{LS}$ is the solution with the smallest $\ell^2$ norm.

Here $A^{\dagger}$ is the Moore-Penrose pseudo-inverse of $A$; if $A = U\Sigma V^{T}$ is the SVD, then it can be written as $A^{\dagger} = V\Sigma^{\dagger}U^{T}$.

To prove this, I started from the fact that showing $x^{LS} \perp \ker(A)$ is equivalent to showing $x^{LS} \in \operatorname{im}(A^{T})$.

My unfinished (probably wrong) proof using SVD:

Suppose $A^{T}z = x^{LS}$ for some $z$. Then

\begin{align*} A^Tz & = A^{\dagger}y\\ (U\Sigma V^{T})^{T}z & = V \Sigma^{\dagger}U^{T}y \\ V \Sigma^{T}U^{T}z & = V \Sigma^{\dagger}U^{T}y \\ V^{T} V \Sigma^{T}U^{T}z & = V^{T} V \Sigma^{\dagger}U^{T}y\\ \Sigma^{T}U^{T}z & = \Sigma^{\dagger}U^{T}y. \end{align*}

I get stuck here while trying to prove that $x^{LS} \in \operatorname{im}(A^{T})$. To prove orthogonality directly, I could instead show that the dot product of $x^{LS}$ with any vector in $\ker(A)$ equals $0$, but I got stuck there as well.

Could someone help me with this proof?

1 Answer


Note that $x$ solves the least squares problem if and only if $x \perp \ker(A)$ and $A^TAx = A^Ty$.

Here's one way to see that $A^\dagger y$ is in $\ker(A)^\perp = \operatorname{im}(A^T)$. Suppose $A = U\Sigma V^T$. Let $v_1,\dots,v_m$ denote the columns of $V$, and let $\sigma_1,\dots,\sigma_r$ denote the non-zero diagonal entries of $\Sigma$.

First, note (or verify) that the vectors $v_{r+1},\dots,v_m$ form an orthonormal basis for $\ker(A)$. Let $e_1,\dots,e_m$ denote the columns of the identity matrix (i.e. the standard basis). We now note that for $i = r+1,\dots,m$, we have $$ v_i^TA^\dagger y = (V e_i)^T V\Sigma^\dagger U^Ty = e_i^T(V^TV) \Sigma^\dagger U^Ty = (e_i^T\Sigma^\dagger)(U^Ty). $$ Verify that $e_i^T \Sigma^\dagger$ (the $i$th row of $\Sigma^\dagger$) is zero, so that $v_i^T A^\dagger y = 0$.
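This orthogonality can be sanity-checked numerically. The sketch below (assuming NumPy is available; the rank-deficient matrix and right-hand side are made-up examples, not from the question) builds a basis for $\ker(A)$ from the right singular vectors with zero singular value and checks that $A^\dagger y$ is orthogonal to it:

```python
import numpy as np

# Hypothetical rank-deficient example: a 4x3 matrix whose third column is
# the sum of the first two, so ker(A) is one-dimensional.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 2))
A = np.column_stack([B, B.sum(axis=1)])   # rank 2
y = rng.standard_normal(4)

x_ls = np.linalg.pinv(A) @ y              # x^LS = A^+ y

# The right singular vectors with sigma_i = 0 span ker(A)
# (here: the last row of Vt, since rank r = 2).
U, s, Vt = np.linalg.svd(A)
kernel_basis = Vt[2:].T                   # columns span ker(A)

# x^LS is orthogonal to ker(A): all inner products are numerically zero.
print(np.abs(kernel_basis.T @ x_ls).max())
```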

Now, to check that $A^TAx = A^Ty$. We have $$ A^TA(A^\dagger y) = (V\Sigma^T\Sigma V^T)(V \Sigma^\dagger U^T)y = V(\Sigma^T\Sigma\Sigma^\dagger)U^T y = V \Sigma^T U^Ty = A^T y, $$ where the second-to-last step uses the (entrywise verifiable) identity $\Sigma^T\Sigma\Sigma^\dagger = \Sigma^T$.
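The identity $A^TA(A^\dagger y) = A^Ty$ can likewise be checked numerically. A minimal sketch, again assuming NumPy and using hypothetical random matrices (one of full column rank, one rank-deficient):

```python
import numpy as np

# Sanity check of A^T A (A^+ y) = A^T y on two made-up matrices.
rng = np.random.default_rng(1)
c = rng.standard_normal(5)
matrices = [
    rng.standard_normal((5, 3)),                 # full column rank
    np.column_stack([c, c, rng.standard_normal(5)]),  # rank 2 (repeated column)
]

checks = []
for A in matrices:
    y = rng.standard_normal(5)
    x = np.linalg.pinv(A) @ y                    # x = A^+ y
    checks.append(np.allclose(A.T @ A @ x, A.T @ y))  # normal equations hold?
print(checks)  # prints [True, True]
```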


Here's a proof of those requirements. Note that a minimizer of $\|Ax - y\|^2$ will satisfy the equation $$ A^TAx = A^Ty, \tag{1} $$ and note that $\ker(A) = \ker(A^TA)$. Now, we show that this equation has at least one solution satisfying $x \perp \ker(A)$. Given a solution $x$, let $x_{\|}$ denote the projection of $x$ onto $\ker(A)$, which is to say that $x_{\|} \in \ker(A)$ and $(x - x_{\|}) \perp \ker(A)$. We see that $x^* = x - x_{\|}$ must be another solution to Equation (1) since $$ (A^TA)x^* = A^TA x - A^TA x_{\|} = A^Ty - A^T(Ax_{\|}) = A^Ty. $$

Now, suppose that $x^*$ is a solution satisfying $x^* \perp \ker(A)$. We can see that $x^*$ minimizes $\|x\|$ subject to the constraint that $A^TAx = A^Ty$ as follows:

Note that every solution to Equation (1) can be written as a particular solution (in this case, $x^*$) added to a "homogeneous solution" (in this case, any $x_h \in \ker(A^TA) = \ker(A)$). In short, every $x$ satisfying Equation (1) has the form $x^* + x_h$ for some $x_h \in \ker(A)$. Since $x^* \perp x_h$, the Pythagorean theorem gives $$ \|x\|^2 = \|x^*\|^2 + \|x_h\|^2 \geq \|x^*\|^2. $$ So, $x = x^*$ indeed minimizes $\|x\|$.
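The minimum-norm argument can be illustrated numerically as well. The sketch below (NumPy, with a made-up rank-deficient matrix) perturbs the pseudo-inverse solution $x^*$ by a kernel vector: the result still solves the normal equations but has strictly larger norm, exactly as the Pythagorean identity predicts:

```python
import numpy as np

# Hypothetical rank-2 example: third column is a combination of the first two.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 2))
A = np.column_stack([B, B @ np.array([1.0, -1.0])])
y = rng.standard_normal(4)

x_star = np.linalg.pinv(A) @ y       # minimum-norm solution x*
_, _, Vt = np.linalg.svd(A)
x_h = Vt[2]                          # unit vector spanning ker(A)

x_other = x_star + 0.5 * x_h         # x* + x_h: another solution of (1)

print(np.allclose(A.T @ A @ x_other, A.T @ y))           # still solves (1)
print(np.linalg.norm(x_other) > np.linalg.norm(x_star))  # but has larger norm
```

Here $\|x^* + 0.5\,x_h\|^2 = \|x^*\|^2 + 0.25$, matching the Pythagorean identity above.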

Ben Grossmann