Why is the projection matrix $P = A(A^T A)^{-1} A^T$ left-multiplied by $A$?

Question

Consider a vector space $V$ and its (orthogonal) subspaces $W$ and $U$. If $A$ is a matrix representing the linear map $T: V \rightarrow W$, and we want to project an element of $U$ onto $W$, why is our projection matrix defined as it is?

If we have an element $b \in U$, then its projection onto $W$ is $p = A (A^T A)^{-1} A^T b$. But the vector we are actually looking for appears to be $(A^T A)^{-1} A^T b$, without the left-multiplier $A$.

In Gilbert Strang's 'Introduction to Linear Algebra', he also writes:

"The $\textit{projection}$ of $b$ onto the subspace is $\textbf{p} = A \overline x = A (A^T A)^{-1} A^T b$"

Why the multiplication by $A$? The requested vector is already found, and it is $\overline x$, not $A \overline x$.

Edit: Another thread with the same question, "Difference between orthogonal projection and least squares solution", clarified my misunderstanding, specifically Chad's answer. Our $\overline x$ is simply the solution to the equation $A \overline x = p$, so of course $\overline x$ must be left-multiplied by $A$ to obtain the projection.

Short answer: without that last $A$, the resulting vector is expressed relative to the wrong basis. — amd, May 21 '19 at 03:52
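
Spelling that comment out, under the assumption that $A$ is $m \times n$ with linearly independent columns $a_1, \dots, a_n$ (notation introduced here for illustration, not taken from the thread): $\bar x$ is a list of $n$ coefficients, while the projection is the combination of the columns with those coefficients, which is a vector in $\mathbb{R}^m$:

$$ p = A\bar x = \bar x_1 a_1 + \bar x_2 a_2 + \dots + \bar x_n a_n. $$

So $\bar x$ holds the coordinates of $p$ relative to the columns of $A$, and multiplying by $A$ converts those coordinates back into a vector in the same space as $b$.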

score 1 · Accepted Answer · answered May 21 '19 at 13:07

When you want to find the projection $p$ of $b$ onto $W,$ and $W$ is described in terms of a linear map, you can start with a parametrization of $p$ that guarantees the result lies in $W.$ So you set $p=A\bar x$ with $\bar x\in V.$ Now you want $p-b$ to be orthogonal to $W,$ which means $A^T(p-b)=0,$ or $A^T(A\bar x-b)=0,$ that is, $A^TA\bar x=A^Tb.$ Assuming the columns of $A$ are linearly independent, so that $A^TA$ is invertible, this gives $$ \bar x = (A^T A)^{-1}A^T b. $$ Now you have the particular $\bar x\in V$ that provides the correct parameters for your projection, and you just have to apply your initial parametrization $p=A\bar x$ to obtain the projection $p$.
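
A small numeric check may make the distinction concrete. The sketch below is not part of the original answer: the matrix `A`, the vector `b`, and the helper names (`transpose`, `matmul`, `inverse_2x2`) are assumptions chosen for illustration, and it uses plain Python with exact fractions so it runs without extra libraries. It computes $\bar x$ and $p = A\bar x$ separately and checks that $b - p$ is orthogonal to the columns of $A$.

```python
from fractions import Fraction

def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def inverse_2x2(M):
    """Invert a 2x2 matrix (assumes a nonzero determinant)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Illustrative data (an assumption, not from the thread):
# A is 3x2 with independent columns, b is a column vector in R^3.
A = [[Fraction(1), Fraction(0)],
     [Fraction(1), Fraction(1)],
     [Fraction(1), Fraction(2)]]
b = [[Fraction(6)], [Fraction(0)], [Fraction(0)]]

At = transpose(A)

# x_bar = (A^T A)^{-1} A^T b has 2 entries: coefficients, not a vector in R^3.
x_bar = matmul(inverse_2x2(matmul(At, A)), matmul(At, b))

# p = A x_bar has 3 entries: the projection itself, in the same space as b.
p = matmul(A, x_bar)

# The error b - p should be orthogonal to every column of A.
residual = [[bi[0] - pi[0]] for bi, pi in zip(b, p)]

print([int(v[0]) for v in x_bar])                 # [5, -3]
print([int(v[0]) for v in p])                     # [5, 2, -1]
print([int(v[0]) for v in matmul(At, residual)])  # [0, 0]
```

Here $\bar x$ has two components while $b$ and $p$ have three, so $\bar x$ cannot itself be the projection of $b$; it only records how much of each column of $A$ goes into $p$.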

Reinhard Meier


I still don't understand why. By our construction, $\overline x = (A^T A)^{-1} A^T b$ is already guaranteed to be the orthogonal projection into the subspace. Why would we 'set' $p = A \overline x$, when that is not what we get when we construct the projection? – Not Legato May 21 '19 at 15:37

$\bar x$ is not the projection. It is the input that you have to plug into the map $T$ to get the projection. – Reinhard Meier May 21 '19 at 15:48
Ah, I see now. I went looking for clarification, and found another thread where an explanation clicked with me. I forgot that we were looking for $\overline x$ such that $A \overline x = p$, not simply $\overline x$ itself. – Not Legato May 21 '19 at 23:04
