2

I am currently studying this answer on the Moore-Penrose pseudoinverse and the Euclidean norm by the user "Etienne dM". The first point of their answer proceeds as follows:

Let $x$ be $A^+y$.

  1. Let me begin with the second point. For all $z$, we have: \begin{align} \lVert Az-b \rVert_2^2 &= \lVert Ax-b \rVert_2^2 + \lVert A(z-x) \rVert_2^2 + 2 (z-x)^TA^T(Ax-b)\\ & \geq \lVert Ax-b \rVert_2^2 + 2 (z-x)^TA^T(Ax-b) \end{align} Moreover, because $(AA^+)^T = AA^+$, $$ A^T(Ax-b) = ((AA^+)A)^Tb - A^Tb = 0$$ Thus, we have shown that for all $z$, $\lVert Az-b \rVert_2^2 \geq \lVert Ax-b \rVert_2^2$, that is to say, $A^+b$ is as close as possible to $y$ in terms of the Euclidean norm $\lVert Ax-b\rVert_2$.

I realise that $x = A^+ y$, but I don't understand how any of this implies that "$A^+ b$ is as close as possible to $y$ in terms of the Euclidean norm $\lVert Ax-b\rVert_2$". I would greatly appreciate it if people would please take the time to explain this to me.
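
For what it's worth, the inequality in the quoted proof does seem to hold numerically. Here is a minimal sketch (assuming numpy, and that np.linalg.pinv computes the Moore-Penrose pseudoinverse $A^+$):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # tall matrix: Az = b is overdetermined
b = rng.standard_normal(5)

x = np.linalg.pinv(A) @ b         # the candidate minimiser x = A^+ b

# No z should beat x: ||Az - b||_2 >= ||Ax - b||_2 for every z we try
best = np.linalg.norm(A @ x - b)
for _ in range(1000):
    z = rng.standard_normal(3)
    assert np.linalg.norm(A @ z - b) >= best - 1e-12

# The key identity in the proof: A^T (Ax - b) = 0
print(A.T @ (A @ x - b))          # approximately [0, 0, 0]
```

So I can check the claim experimentally; what I am missing is why the algebra above proves it.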

The Pointer
  • 4,182
  • You might find this post helpful. – Ben Grossmann Aug 03 '20 at 11:32
  • @BenGrossmann Eh, I'm not exactly well-acquainted with the theory of singular value decompositions either. – The Pointer Aug 03 '20 at 11:33
  • What definition of $A^+$ are you familiar with, then? – Ben Grossmann Aug 03 '20 at 11:33
  • @BenGrossmann According to my question https://math.stackexchange.com/q/3525644/356308 , the author introduced and defined the pseudoinverse of a matrix $\mathbf{A}$ as $$\mathbf{A}^+ = \lim_{\alpha \searrow 0^+}(\mathbf{A}^T \mathbf{A} + \alpha \mathbf{I} )^{-1} \mathbf{A}^T. \tag{2.46}$$ Not a very insightful introduction/definition, but that's it. – The Pointer Aug 03 '20 at 11:35
  • Thanks for pointing that out. Actually, I think this definition might be easier to work with in our case – Ben Grossmann Aug 03 '20 at 11:37
  • @BenGrossmann The author did also briefly introduce the SVD definition $$\mathbf{A}^+ = \mathbf{V} \mathbf{D}^+ \mathbf{U}^T, \tag{2.47}$$ but, as I said, it is quite brief, and it doesn't offer much insight into SVD itself. – The Pointer Aug 03 '20 at 11:39
  • For that second definition, the post from my first comment addresses how one would go from the SVD definition to seeing that $A^+y$ is a least squares solution to $Ax = y$. If you're willing to take it on faith that an SVD $A = UDV^T$ exists, then that post (and this one, if you prefer) handles the rest – Ben Grossmann Aug 03 '20 at 11:44
  • @BenGrossmann Oh, ok, let me try to wrap my head around it. – The Pointer Aug 03 '20 at 11:47
  • @BenGrossmann what is the $\sum$ in the SVD? – The Pointer Aug 03 '20 at 11:48
  • $\Sigma$ is the more common symbol for the diagonal matrix $D$. – Ben Grossmann Aug 03 '20 at 11:49
  • Intuitively, the idea behind the limit definition is that the least-squares solution that we want satisfies $A^TA x = A^Ty$, and if we take the limit of the solutions to $(A^TA + \alpha I)x = A^Ty$ as $\alpha \to 0^+$, then we get the solution as a limit (see the numerical sketch after these comments). However, I'm having some difficulty proving that this limit works out as expected. – Ben Grossmann Aug 03 '20 at 11:54
  • The author shows that for any possible vector $z$, we have $|Az-b|_2 \ge |A(A^+b)-b|_2$, hence $x^\ast=A^+b$ must be a solution to the optimization problem $\min_x |Ax-b|_2$. Thus $x^\ast=A^+b$ is the input for which the function value of $f(x)=Ax$ comes as close as possible to $y$ in terms of the Euclidean distance. – Hyperplane Aug 03 '20 at 12:01
  • @BenGrossmann What is the reasoning for why the least squares solution must be $$x = (b_1/\sigma_1,\dots,b_r/\sigma_r,0,\dots,0)^T = \Sigma^+ b$$? I think this might be central to my understanding of the proof by "Etienne dM", but it is not included in your answer to the other question. – The Pointer Aug 03 '20 at 12:18
  • @ThePointer I suspect that if you write out the system of equations $\Sigma x = b$ for $n = 3$ and $r = 2$ in its complete form (i.e. as $3$ separate equations), then you'll find the answer to be more obvious. – Ben Grossmann Aug 03 '20 at 13:02
  • @BenGrossmann Is it because $\sum x = b \Rightarrow x = \sum^{-1} b$, where $\sum^{-1} = \sum^+$? This is obviously conflating the cases where $\sum$ has dimensions that do and do not necessitate using the pseudoinverse, but I'm just using this to illustrate the idea. – The Pointer Aug 03 '20 at 13:23
  • @BenGrossmann Ok, I think I now understand your answer to the other question. But how does it illustrate what "Etienne dM" was trying to show in his answer? These are two different situations, and although your answer is related, I don't think it illustrates the same point. – The Pointer Aug 03 '20 at 13:34
  • @Pointer We reach the same conclusion, but we take different approaches. He uses the fact that $x$ is a least squares solution for $Ax = b$ if $A^T(Ax - b) = 0$. – Ben Grossmann Aug 03 '20 at 13:37
  • @BenGrossmann But I don't even see $Ax = b$ mentioned in his answer. – The Pointer Aug 03 '20 at 13:39
  • @Pointer regarding your first comment: first, the LaTeX is \Sigma, not \sum. Second, your comment is not correct because it is not true that $\Sigma x = b$ has only one solution. – Ben Grossmann Aug 03 '20 at 13:40
  • @ThePointer He says "$A^+b$ is as close as possible to $y$ in terms of the Euclidean norm $\lVert Ax-b\rVert_2$". In other words, among all possible $x$, $x = A^+b$ is such that $\lVert Ax-b\rVert_2$ is as small as possible. This is precisely the definition of a "least squares solution". – Ben Grossmann Aug 03 '20 at 13:41
  • @ThePointer In fact, he gives you a way to see this directly: for any $z$, we have $$ |Az - b|^2 \geq |Ax - b|^2 + 2(z - x)^TA^T(Ax - b) = |Ax - b|^2 + 2(z - x)^T0 = |Ax - b|^2. $$ So, because we chose an $x$ for which $A^T(Ax - b) = 0$, we find that $|Ax - b|^2$ is the minimum among all possible values for $|Az - b|^2$. – Ben Grossmann Aug 03 '20 at 13:46
  • @BenGrossmann I'm not sure that I agree with you that this proof makes sense. I think there are a number of problems with it that probably require a full rewriting. See our discussion in the comments here https://math.stackexchange.com/q/3778673/356308 – The Pointer Aug 03 '20 at 17:35
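
To make the two threads above concrete, here is a minimal numerical sketch (assuming numpy, with np.linalg.pinv standing in for $A^+$): it checks that $(A^TA + \alpha I)^{-1}A^T$ approaches $A^+$ as $\alpha \to 0^+$, and that for a diagonal $\Sigma$ with $n = 3$, $r = 2$ the least-squares solution of $\Sigma x = b$ is indeed $\Sigma^+ b$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
A_pinv = np.linalg.pinv(A)                       # SVD-based pseudoinverse

# Limit definition: (A^T A + alpha I)^{-1} A^T -> A^+ as alpha -> 0^+
for alpha in [1e-1, 1e-4, 1e-8]:
    approx = np.linalg.solve(A.T @ A + alpha * np.eye(3), A.T)
    print(alpha, np.linalg.norm(approx - A_pinv))  # error shrinks with alpha

# Diagonal case (n = 3, r = 2): for Sigma = diag(s1, s2, 0), the
# least-squares solution of Sigma x = b is (b1/s1, b2/s2, 0) = Sigma^+ b
s1, s2 = 2.0, 5.0
Sigma = np.diag([s1, s2, 0.0])
b = rng.standard_normal(3)
print(np.linalg.pinv(Sigma) @ b)                 # matches:
print(np.array([b[0] / s1, b[1] / s2, 0.0]))
```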

1 Answer

3

We assume that $(A^TA + \alpha I)^{-1}A^T$ indeed has a limit as $\alpha \to 0^+$. Let $x$ be given by $$ x = A^+ y = \lim_{\alpha \to 0^+ }[(A^TA + \alpha I)^{-1}A^T y]. $$

Consider any $\alpha > 0$. We note that $x_\alpha = (A^TA + \alpha I)^{-1}A^Ty$ is the unique solution to the system $$ (A^TA + \alpha I)x_{\alpha} = A^Ty, $$ and that $x_{\alpha} \to x$ as $\alpha \to 0^+$. Rearranging this system gives $A^Ty - A^TAx_{\alpha} = \alpha x_{\alpha}$, so $$ \|A^Ty - A^TAx\| = \lim_{\alpha \to 0^+}\|A^T y - A^TAx_{\alpha}\| = \lim_{\alpha \to 0^+} \alpha \|x_{\alpha}\| = 0, $$ where the last limit is zero because $\|x_{\alpha}\| \to \|x\|$, so that $\alpha \|x_{\alpha}\| \to 0 \cdot \|x\| = 0$. So, we indeed have $A^TAx = A^Ty$, which means that $x$ is a least-squares solution to $Ax = y$, which is what we wanted.
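
As a sanity check on this argument, here is a minimal numerical sketch (assuming numpy; np.linalg.pinv stands in for $A^+$). The rearranged system gives $A^Ty - A^TAx_\alpha = \alpha x_\alpha$ exactly, and its norm vanishes as $\alpha \to 0^+$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
x = np.linalg.pinv(A) @ y                        # x = A^+ y

for alpha in [1e-1, 1e-3, 1e-6]:
    # x_alpha solves (A^T A + alpha I) x_alpha = A^T y
    x_a = np.linalg.solve(A.T @ A + alpha * np.eye(3), A.T @ y)
    lhs = np.linalg.norm(A.T @ y - A.T @ A @ x_a)
    print(lhs, alpha * np.linalg.norm(x_a))      # equal, and both -> 0

# In the limit, x satisfies the normal equations A^T A x = A^T y
print(np.linalg.norm(A.T @ A @ x - A.T @ y))     # ~ 0
```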

Ben Grossmann
  • 225,327
  • I forgot about this question. Where did $|A^Ty - A^TAx|$ come from? I can't trace it back to anything that came before. – The Pointer Aug 07 '20 at 19:17
  • We want to show that $A^Ty = A^TAx$, which is the same as showing that $|A^Ty - A^TAx| = 0$ – Ben Grossmann Aug 07 '20 at 20:43
  • If $Ax = y$, then we have that $x = A^+ x = \lim_{\alpha \to 0^+ }[(A^TA + \alpha I)^{-1}A^T Ax]$, which doesn't seem correct? Also, why do we want to show that $A^Ty = A^TAx$? I don't see how this follows from the work that was done before it. So, then, what was the point of the first half of the proof, and what is the reasoning for why the second part follows from it? – The Pointer Aug 08 '20 at 00:37
  • @ThePointer That should say $x = A^+y$, sorry for the typo – Ben Grossmann Aug 08 '20 at 10:32
  • Oh, ok. And how did you get that $\lim_{\alpha \to 0^+}|A^T y - A^TAx_{\alpha}| = \lim_{\alpha \to 0^+}|\alpha x_{\alpha}|$? – The Pointer Aug 08 '20 at 17:24
  • @ThePointer You can rearrange $(A^TA + \alpha I)x_{\alpha} = A^Ty$ to get $A^Ty - A^TA x_\alpha = \alpha x_{\alpha}$ – Ben Grossmann Aug 08 '20 at 19:10
  • @BenGrossmann Do you have some resources that define the pseudoinverse as this limit? I am curious to learn more. – Jürgen Sukumaran Mar 29 '24 at 11:03
  • @JürgenSukumaran None off the top of my head, but you should find information about this if you look into “Tikhonov regularization” – Ben Grossmann Mar 29 '24 at 13:04