
I am working through a nonlinear problem and I have no idea how the author arrives at the expansion below. I have written out the question and the start of the solution that confuses me, including the explanation given so far.

Question -

Let $A \in R^{m \times n}$ with $m \geq n$ and $b \in R^m$. Prove: a point $\bar{x} \in R^n$ is a solution of the unconstrained optimization problem $$\min_{x \in R^n} \Vert Ax - b \Vert_{2}$$ if and only if $\bar{x}$ is a solution of the system of linear equations $$A^T A x = A^T b.$$

Solution -

$$f(x) := \Vert Ax - b \Vert_2 = ( x^T A^T A x - b^TAx - x^TA^Tb + b^Tb)^{1/2} $$ $$= ( x^T A^T A x - 2 b^TAx + b^Tb)^{1/2}$$

The optimization problem is equivalent to minimizing the square of $f$, since $f(x) \geq 0$ for all $x \in R^{n}$ and the squaring function is increasing on $R_{+}$.
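(For anyone who wants to sanity-check the algebra numerically: the expansion of $\Vert Ax - b\Vert_2^2$ can be verified with random data, e.g. in NumPy. The dimensions and seed below are arbitrary, chosen only for illustration.)

```python
# Sanity check (not part of the proof): with random A, b, x, the expanded
# quadratic should match ||Ax - b||_2^2 up to floating-point error.
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3                       # arbitrary illustrative dimensions
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

lhs = np.linalg.norm(A @ x - b) ** 2
rhs = x @ A.T @ A @ x - 2 * b @ A @ x + b @ b
print(np.isclose(lhs, rhs))       # True
```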

1 Answer


You are looking at the normal equations.

First, note that $$ x^TA^Tb = (b^TAx)^T = b^TAx $$ because it is a scalar (and thus equals its transpose).
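A quick numerical illustration of this scalar identity (random data, purely for illustration):

```python
# Both expressions evaluate to the same scalar for any conformable A, b, x.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))   # arbitrary illustrative dimensions
b = rng.standard_normal(5)
x = rng.standard_normal(3)

print(np.isclose(x @ A.T @ b, b @ A @ x))  # True
```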

Second, note that $$ \frac{\partial (x^TMx)}{\partial x} = x^TM^T + x^TM $$ which is derived component-wise at the end of this answer.
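Here is a small finite-difference check of that derivative identity, reading $\frac{\partial (x^TMx)}{\partial x}$ as a vector of partial derivatives; the random $M$ and the step size are arbitrary choices for illustration only:

```python
# Central-difference check of d(x^T M x)/dx = x^T M^T + x^T M
# for a random, not necessarily symmetric, M.
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
x = rng.standard_normal(n)

f = lambda v: v @ M @ v
h = 1e-6
numeric = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                    for e in np.eye(n)])   # k-th entry: d f / d x_k
analytic = x @ M.T + x @ M
print(np.allclose(numeric, analytic))       # True
```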

Then define \begin{align} V(x) &= ||Ax - b||_2^2 \\ &= (Ax-b)^T(Ax-b) \\ &= x^TA^TAx - x^TA^Tb - b^TAx + b^Tb \\ &= x^TA^TAx - 2 b^TAx + b^Tb \end{align} We want $x^*=\arg\min_x V(x)$, which we can find by computing the zero of the derivative \begin{align} \frac{\partial V}{\partial x} &= \frac{\partial }{\partial x} x^TA^TAx - 2\frac{\partial }{\partial x}b^TAx \\ &= x^T(A^TA)^T + x^TA^TA - 2b^TA \\ &= x^TA^TA + x^TA^TA - 2b^TA \\ &= 2x^TA^TA - 2b^TA \end{align} Thus, $ \frac{\partial V}{\partial x} = 0 $ implies $$ x^TA^TA = b^TA \;\;\;\;\implies\;\;\;\; A^TAx = A^Tb $$ which are the normal equations. Since $V$ is convex (its Hessian $2A^TA$ is positive semidefinite), this stationarity condition is not only necessary but also sufficient for a global minimum, which gives the "if and only if".
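As an illustrative (not proof-level) check, solving the normal equations for a random full-rank tall $A$ agrees with NumPy's built-in least-squares solver; the dimensions are arbitrary:

```python
# Solving A^T A x = A^T b should match np.linalg.lstsq when A has full column rank.
import numpy as np

rng = np.random.default_rng(3)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # library least-squares solver
print(np.allclose(x_normal, x_lstsq))              # True
```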


We can also derive the second identity above using components, in case that's what you wanted. If we have $$ x^TMx = \sum_i\sum_j M_{ij}x_jx_i $$ Then the $k$th component of the vector derivative is given by: \begin{align} \frac{\partial}{\partial x_k} x^TMx &= \frac{\partial}{\partial x_k} \sum_i\sum_j M_{ij}x_jx_i\\ &= \sum_i\sum_j M_{ij} \frac{\partial}{\partial x_k}[x_j x_i]\\ &= \sum_i\sum_j M_{ij} [\delta_{jk}x_i + \delta_{ik}x_j]\\ &= \sum_i\sum_j M_{ij} \delta_{jk}x_i + \sum_i\sum_j M_{ij}\delta_{ik}x_j\\ &= \sum_i M_{ik} x_i + \sum_j M_{kj} x_j\\ &= [x^TM]_k + [x^TM^T]_k\\ \end{align} which implies $\frac{\partial (x^TMx)}{\partial x} = x^TM^T + x^TM$.
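And a quick numerical check of that component-wise result, comparing the double sum to the two matrix forms for each $k$ (random data, illustration only):

```python
# For each k: sum_i M_{ik} x_i + sum_j M_{kj} x_j == [x^T M]_k + [x^T M^T]_k
import numpy as np

rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n))
x = rng.standard_normal(n)

by_components = np.array([M[:, k] @ x + M[k, :] @ x for k in range(n)])
matrix_form = x @ M + x @ M.T
print(np.allclose(by_components, matrix_form))  # True
```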
