SVD definition of pseudoinverse: $A^+ b$ is as close as possible to $y$ in terms of the Euclidean norm $\lVert Ax-b\rVert_2$

Question

I am currently studying this answer on the Moore-Penrose pseudoinverse and the Euclidean norm by the user "Etienne dM". The first point of their answer proceeds as follows:

Let $x$ be $A^+y$.

Let me begin by the second point. For all $z$, we have: \begin{align} \lVert Az-b \rVert_2^2 &= \lVert Ax-b \rVert_2^2 + \lVert A(z-x) \rVert_2^2 + 2 (z-x)^TA^T(Ax-b)\\ & \geq \lVert Ax-b \rVert_2^2 + 2 (z-x)^TA^T(Ax-b) \end{align} Moreover, because $(AA^+)^T = AA^+$, $$ A^T(Ax-b) = ((AA^+)A)^Tb - A^Tb = 0$$ Thus, we prove that for all $z$, $\lVert Az-b \rVert_2^2 \geq \lVert Ax-b \rVert_2^2$, that is to say $A^+b$ is as close as possible to $y$ in terms of the Euclidean norm $\lVert Ax-b\rVert_2$.
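As a sanity check on the quoted argument, here is a minimal numerical sketch in NumPy. It assumes a single right-hand side $b$ throughout (so $x = A^+ b$), and the matrix size, the seed, and the helper name `residual_sq` are arbitrary choices made for illustration, not part of the quoted answer.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, arbitrary choice
A = rng.standard_normal((6, 3))     # hypothetical tall matrix
b = rng.standard_normal(6)          # hypothetical right-hand side

x = np.linalg.pinv(A) @ b           # x = A^+ b
grad_term = A.T @ (A @ x - b)       # A^T (Ax - b), claimed to be zero


def residual_sq(v):
    """Squared Euclidean residual ||Av - b||_2^2 (illustrative helper)."""
    return float(np.sum((A @ v - b) ** 2))


# Check the expansion of ||Az - b||^2 for one arbitrary z.
z = rng.standard_normal(3)
lhs = residual_sq(z)
rhs = (residual_sq(x)
       + float(np.sum((A @ (z - x)) ** 2))
       + 2 * float((z - x) @ grad_term))

print(np.allclose(grad_term, 0))    # cross term vanishes
print(np.isclose(lhs, rhs))         # expansion identity holds
# No randomly drawn z should give a smaller residual than x.
print(all(residual_sq(rng.standard_normal(3)) >= residual_sq(x)
          for _ in range(1000)))
```

If the argument is sound, all three checks should report `True`. A random search of this kind is of course only evidence, not a proof.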

I realise that $x = A^+ y$, but I don't understand how any of this implies that "$A^+ b$ is as close as possible to $y$ in terms of the Euclidean norm $\lVert Ax-b\rVert_2$". The way I see it is that we have 4 facts:

1. $\lVert Az - b \rVert_2^2 = \lVert Ax - b \rVert_2^2 + \lVert A(z - x) \rVert_2^2 + 2(z - x)^T A^T (Ax - b) \ge \lVert Ax - b \rVert_2^2 + 2(z - x)^T A^T (Ax - b)$;
2. Because $(AA^+)^T = AA^+$, $A^T(Ax - b) = ((AA^+)A)^T b - A^T b = 0$;
3. Singular value decomposition (SVD): $A^+ = VD^+U^T = V \Sigma^+ U^T$;
4. $x = A^+ y$.

Supposedly, these 4 facts together prove that "$A^+ b$ is as close as possible to $y$ in terms of the Euclidean norm $\lVert Ax-b\rVert_2$". However, I do not see how this is so.
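To make the third fact concrete, the following is a hedged sketch of building $A^+ = V \Sigma^+ U^T$ from NumPy's SVD and comparing it with `np.linalg.pinv`. The function name `pinv_via_svd`, the cutoff `rtol`, and the test matrix are assumptions introduced here for illustration only.

```python
import numpy as np


def pinv_via_svd(A, rtol=1e-12):
    """Pseudoinverse via the thin SVD: A^+ = V Sigma^+ U^T (illustrative sketch)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Sigma^+ inverts the nonzero singular values and leaves zeros as zeros.
    # The relative cutoff deciding what counts as "zero" is an assumption.
    cutoff = rtol * s.max() if s.size else 0.0
    s_plus = np.array([1.0 / sv if sv > cutoff else 0.0 for sv in s])
    # Scale the rows of U^T by Sigma^+, then multiply by V.
    return Vt.T @ (s_plus[:, None] * U.T)


rng = np.random.default_rng(1)            # fixed seed, arbitrary choice
A = rng.standard_normal((5, 3))           # hypothetical full-column-rank matrix
A_plus = pinv_via_svd(A)

print(np.allclose(A_plus, np.linalg.pinv(A)))      # agrees with NumPy's pinv
print(np.allclose((A @ A_plus).T, A @ A_plus))     # (A A^+)^T = A A^+
```

The second check is the symmetry property that the second fact relies on.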

This answer by "Ben Grossmann" is supposed to show something similar. I can see how this answer is related to what "Etienne dM" did in their proof, but I do not see how it is the same.

I suspect that I am lacking some assumed, fundamental knowledge that "Etienne dM" had in constructing this proof, and so I am unable to see how these 4 facts combine to prove what is claimed. I would greatly appreciate it if people would please take the time to carefully explain this to me.

This related question is for the limit definition of pseudoinverse.

The phrase "$A^+ b$ is as close as possible to $y$ in terms of the Euclidean norm $\lVert Ax-b\rVert_2$" is a little unfortunate. All it means is that $A^+b = \arg\min_x \lVert Ax-b\rVert_2$, and that follows from showing $\lVert Az-b\rVert_2 \ge \lVert A(A^+b) - b\rVert_2$ for all $z$, i.e. point (1). (It would have been better to say that $A^+b$ is the input for which the function value of $f(x)=Ax$ comes as close as possible to $y$.) In any case, I do not really see how this question differs from your previous one, so I will flag it as a duplicate for now. — Hyperplane, Aug 03 '20 at 16:50
@Hyperplane Ahh, phrasing it that way makes a lot more sense. With regards to your next point, the author actually shows that $\lVert Az-b \rVert_2^2 \geq \lVert Ax-b \rVert_2^2$, right? — The Pointer, Aug 03 '20 at 16:59
That is a typo then. Typically linear systems are written as $Ax=y$ or as $Ax=b$, and sometimes people forget to use the same variable name as in the question. — Hyperplane, Aug 03 '20 at 17:03
@Hyperplane Oh, ok. All of these small things are probably what has ended up confusing me so much. It just wasn't making sense. — The Pointer, Aug 03 '20 at 17:03
Yes, I just noticed this now as well. The phrase should actually be either "$A^+b$ is the input for which the function value of $f(x)=Ax$ comes as close as possible to $b$" or "$A^+y$ is the input for which the function value of $f(x)=Ax$ comes as close as possible to $y$". The OP in your first reference even accidentally mixes them both in their answer. — Hyperplane, Aug 03 '20 at 17:06
@Hyperplane Something is still missing. We've established that $\lVert Az-b \rVert_2^2 \geq \lVert Ax-b \rVert_2^2$ and $x = A^+ b$, but this still doesn't establish that "$A^+b$ is as close as possible to $y$ in terms of the Euclidean norm $\lVert Ax-b\rVert_2$", or whatever corrected version of this we wish to use. Can we have that $z = x = A^+b$? I'm not sure if this would then establish the fact. I mean, with $z = x = A^+b$, I think the proof still won't make sense, but then for a different reason, since we would just be concluding that $\lVert Az-b \rVert_2^2 = \lVert Az-b \rVert_2^2$. — The Pointer, Aug 03 '20 at 17:24
I'm guessing that I'll have to bounty this. There are so many problems with this proof that it'll probably require a full rewriting to make sense of. Fixing one problem just leads to another problem. — The Pointer, Aug 03 '20 at 17:31
You are not seeing that $\lVert Az-y\rVert_2 \ge \lVert A(A^+y)-y\rVert_2 \ \forall z \implies A^+y = \arg\min_x \lVert Ax-y\rVert_2$? It's pretty much by definition... — Hyperplane, Aug 03 '20 at 17:47
@Hyperplane But $\lVert Az-y\rVert_2 \ge \lVert A(A^+y)-y\rVert_2 \ \forall z$ is just putting an upper bound on $\lVert A(A^+y)-y\rVert_2$. This doesn't really tell us anything about how good a fit $x = A^+ y$ is, right? And isn't that the objective of this proof? — The Pointer, Aug 03 '20 at 17:53
The objective is to find the vector $x$ for which $Ax$ is as close as possible to $y$. And if one can prove that, for the choice $x=A^+y$, $\operatorname{dist}_{\text{eucl.}}(Az, y) \ge \operatorname{dist}_{\text{eucl.}}(Ax, y)$ holds for every other vector $z$, then, well, you have found your optimal solution in terms of the Euclidean distance. — Hyperplane, Aug 03 '20 at 17:57
@ThePointer It seems as though you are having trouble parsing the definition of a "least squares solution." I would say that $x$ is a least squares solution if $z = x$ minimizes $\lVert Az - y\rVert$, which is to say that $\lVert Ax - y\rVert$ is smaller than or equal to $\lVert Az - y\rVert$ for all choices of $z$. Do you agree with that? — Ben Grossmann, Aug 03 '20 at 18:06
@BenGrossmann Ahh, phrased that way, it makes perfect sense, since, if $\lVert Az-y\rVert_2 \ge \lVert A(A^+y)-y\rVert_2 \ \forall z$, then $z = x = A^+y$ is clearly the value of $z$ that minimizes $\lVert Az-y\rVert$ (that is, $z = x = A^+y$ is the value of $z$ such that $\lVert Az-y\rVert$ is closest to $\lVert A(A^+y)-y\rVert$)! Somewhere along the way, I seem to have somehow confused myself about the objective itself: We want the value of $z$ such that $\lVert Az-y\rVert$ is as close as possible to $\lVert A(A^+y)-y\rVert$. — The Pointer, Aug 05 '20 at 18:22
@Hyperplane Anyway, thank you both for taking the time to clarify this. — The Pointer, Aug 05 '20 at 18:31
@ThePointer Missed this comment earlier. Anyway, glad it made sense! — Ben Grossmann, Aug 08 '20 at 20:21
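
For reference, the argument the comments converge on can be written out with one right-hand side, $b$, throughout. This is a sketch that assumes the standard Penrose identities $AA^+A = A$ and $(AA^+)^T = AA^+$; the first of these is used only implicitly in the quoted proof.

```latex
\begin{align}
x &:= A^+ b, \\
A^T(Ax - b) &= A^T (AA^+) b - A^T b
             = A^T (AA^+)^T b - A^T b
             = (AA^+A)^T b - A^T b
             = A^T b - A^T b = 0, \\
\lVert Az - b \rVert_2^2 &= \lVert (Ax - b) + A(z - x) \rVert_2^2 \\
&= \lVert Ax - b \rVert_2^2 + \lVert A(z - x) \rVert_2^2 + 2 (z - x)^T A^T (Ax - b) \\
&= \lVert Ax - b \rVert_2^2 + \lVert A(z - x) \rVert_2^2 \\
&\ge \lVert Ax - b \rVert_2^2 \quad \text{for all } z.
\end{align}
```

Because the inequality holds for every $z$, the choice $z = x = A^+ b$ attains the minimum of $z \mapsto \lVert Az - b \rVert_2$, which is exactly the statement $A^+ b \in \arg\min_z \lVert Az - b \rVert_2$. Equality holds precisely when $A(z - x) = 0$, so the minimizer is unique only when $A$ has a trivial null space.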
