1

Why is solving $A^T Ax = A^T b$ equivalent to saying that $Ax$ is the point in the range of $A$ closest to $b$?

Can somebody please explain in detail? Thanks.

dxdydz
  • These are the "normal equations"; there is further discussion here: https://math.stackexchange.com/questions/2363703/proof-of-the-normal-equations-theorem. (That question assumes $m\ge n$, but you don't actually need to assume that.) – Minus One-Twelfth Mar 30 '19 at 05:35
  • I thought this question would be a duplicate, but to my surprise after ten minutes or so of searching I have not found a great math.stackexchange question about how to derive the normal equations. (The question linked above isn't a perfect duplicate because it asks about a particular way of deriving the normal equations using calculus; but that is only one approach, and the linear algebra approach is arguably more elegant.) – littleO Mar 30 '19 at 06:23

4 Answers

3

Short and very formal answer: if you want to minimize $\|Ax-b\|_2^2$, you should search for $x$ such that $$\nabla\|Ax-b\|^2_2 = 0.$$ But $$0 = \nabla\|Ax-b\|_2^2 = 2A^T(Ax-b) \,\,\,\, \Longleftrightarrow \,\,\,\, A^TAx = A^Tb.$$
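A quick numerical sanity check of this, as a minimal sketch: the matrix `A` and vector `b` below are made-up random data (assumed to have full column rank so that $A^TA$ is invertible), and NumPy's `lstsq` is used only as an independent reference.

```python
import numpy as np

# Hypothetical example data: a tall random matrix (full column rank
# with probability 1) and a random right-hand side.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# Solve the normal equations A^T A x = A^T b directly.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Independent reference: NumPy's least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Gradient of ||Ax - b||_2^2 evaluated at the normal-equations solution.
grad = 2 * A.T @ (A @ x_normal - b)

print(np.allclose(x_normal, x_lstsq))
print(np.linalg.norm(grad))
```

In practice, forming $A^TA$ explicitly can be ill-conditioned, which is why library solvers usually work from a QR or SVD factorization instead; the sketch above is only meant to illustrate the identity.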

User8128
1

HINT

To begin with, notice that

\begin{align*} A^{T}Ax = A^{T}b \Longleftrightarrow A^{T}(b - Ax) = 0 \Longleftrightarrow (b - Ax)\perp\mathcal{C}(A) \end{align*}

Therefore $Ax$ is the orthogonal projection of $b$ onto $\mathcal{C}(A)$, the column space of $A$. Can you take it from here?
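
One possible way to finish, sketched in the same notation: for any other vector $y$, the difference $A(x - y)$ lies in $\mathcal{C}(A)$ and is therefore orthogonal to $b - Ax$, so the Pythagorean theorem gives

$$ \|b - Ay\|_2^2 = \|(b - Ax) + A(x - y)\|_2^2 = \|b - Ax\|_2^2 + \|A(x - y)\|_2^2 \ge \|b - Ax\|_2^2. $$

Hence no point $Ay$ of $\mathcal{C}(A)$ is closer to $b$ than $Ax$, and equality holds exactly when $Ay = Ax$.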

user0102
1

In the Euclidean norm the distance is $$ d(Ax, b) = \Vert Ax-b\Vert_2. $$ We are looking for an extremum of $d$ with respect to the choice of $x$, so the partial derivatives with respect to the coordinates $x_k$ must vanish. Assuming $d(Ax,b) \ne 0$ (otherwise $Ax = b$ and the normal equations hold trivially), we get $$ \begin{align} 0 &= \partial_k d(Ax, b) \\ &= \frac{\partial}{\partial x_k} \left( \sum_i \left( \sum_j a_{ij}x_j-b_i \right)^2 \right)^{1/2} \\ &= \frac{1}{2 d(Ax,b)} \sum_i 2 \left( \sum_j a_{ij}x_j-b_i \right) \sum_l a_{il} \delta_{lk} \\ &= \frac{1}{d(Ax,b)} \sum_i a_{ik} \left( \sum_j a_{ij}x_j-b_i \right) \\ &= \frac{1}{d(Ax,b)}\left( A^T (A x - b) \right)_k \end{align} $$ for every $k$. Thus we need a solution $x$ of $A^TAx = A^Tb$.

mvw
0

If $Ax$ is the point in $R(A)$ which is as close as possible to $b$, then the residual $r = b - Ax$ is orthogonal to $R(A)$. But the "four subspaces" theorem, which is emphasized in Gilbert Strang's books, tells us that $R(A)^\perp = N(A^T)$. Thus, $$ A^T (b - Ax) = 0 \implies A^T Ax = A^T b.$$
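
The orthogonality claim can be checked numerically. This is only an illustrative sketch: `A` and `b` are made-up random data, and the least-squares solution comes from NumPy's `lstsq` rather than from the normal equations.

```python
import numpy as np

# Hypothetical example data.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2))
b = rng.standard_normal(5)

# Least-squares solution and its residual r = b - Ax.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x

# r should lie in N(A^T), i.e. A^T r should vanish up to rounding.
ortho = A.T @ r

# Distances from b to other points A y in the range, obtained by
# perturbing x at random; none should beat the residual norm.
best = np.linalg.norm(r)
others = [np.linalg.norm(b - A @ (x + rng.standard_normal(2)))
          for _ in range(100)]

print(np.linalg.norm(ortho))
print(best <= min(others))
```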


Alternatively, we can minimize $f(x)=(1/2) \| Ax - b \|^2$ by setting the gradient equal to $0$. By the multivariable chain rule we have $$ f'(x) = (Ax - b)^T A. $$ It follows that $$ \nabla f(x) = f'(x)^T = A^T (Ax - b). $$ So, setting the gradient equal to $0$, we obtain $$ A^T(Ax - b) = 0 \implies A^T Ax = A^T b. $$
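
The gradient formula $\nabla f(x) = A^T(Ax - b)$ can be compared against central finite differences. A minimal sketch, assuming made-up random data; the helper names `f` and `grad_f` are chosen here for illustration.

```python
import numpy as np

def f(A, b, x):
    """f(x) = (1/2) * ||Ax - b||_2^2."""
    return 0.5 * np.linalg.norm(A @ x - b) ** 2

def grad_f(A, b, x):
    """Closed-form gradient A^T (Ax - b)."""
    return A.T @ (A @ x - b)

# Hypothetical example data and an arbitrary evaluation point.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)

# Central differences along each coordinate direction e_k.
h = 1e-6
numeric = np.array([
    (f(A, b, x + h * e) - f(A, b, x - h * e)) / (2 * h)
    for e in np.eye(3)
])

max_err = np.max(np.abs(numeric - grad_f(A, b, x)))
print(max_err)
```

Because $f$ is quadratic, the central difference is exact up to rounding error, so the discrepancy should be tiny.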

littleO