Calculate the derivative of $$f(x) = \|Ax - b\|^2$$ where $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$.

My approach involves replacing all notation with the most elementary definitions and trying to isolate a particular $x_i$, so that $f(x)$ takes the form $f(x) = x_i^2\cdot c_1 + x_i \cdot c_2 + c_3$, where $c_1$, $c_2$ and $c_3$ are some expressions that don't depend on $x_i$ and can be treated as constants. Then, since $i$ is arbitrary, I can form the vector of partial derivatives $\frac{\partial f}{\partial x_i}$.
However, this approach is tedious, requires multiple pages of error-prone derivations, and, judging by the context of the exercise, I'm not meant to do it this way. I wonder, is there a more proper way of approaching such problems, perhaps using row and/or column vectors?

2'5 9'2
  • 54,717
Zyx
  • 786

2 Answers

3

Here is an approach. I did not double check details and typed in haste. Does it match your result?

You have:

$$ \begin{align} f(x) &= \lVert Ax - b\rVert^2\\ &=(Ax-b)\cdot(Ax-b)\\ &=(Ax)\cdot (Ax) -2b\cdot(Ax)+b\cdot b\\ &=x^TA^TAx -2b^T(Ax)+b^T b\\ &=x^TQx -(2b^TA)x+b^T b&(Q=A^TA\text{, symmetric})\\ \end{align}$$

Now you can see (if you are familiar with the derivative of a symmetric quadratic form) that $\frac{\partial f}{\partial x_i}$ is $$2\sum_{j}Q_{ij}x_j-(2b^TA)_i$$

Or rewritten as:

$$\sum_{j} \left(2e_i^TA^TAe_j\right)x_j-2b^TAe_i$$ where $e_i$ is the unit vector with a $1$ in the $i$th position.
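A quick way to sanity-check the componentwise formula above is to compare it with a finite-difference approximation. The following is only a minimal sketch in plain Python; the helper names (`f`, `grad_f`, `numerical_grad`) and the small test matrix are made up for illustration and are not part of the original exercise.

```python
def f(A, b, x):
    """f(x) = ||Ax - b||^2, with A given as a list of rows."""
    residual = [sum(a * xj for a, xj in zip(row, x)) - bi
                for row, bi in zip(A, b)]
    return sum(r * r for r in residual)


def grad_f(A, b, x):
    """Entry i is 2 * sum_j Q[i][j] * x[j] - 2 * (b^T A)[i], with Q = A^T A."""
    m, n = len(A), len(x)
    Q = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    bTA = [sum(b[k] * A[k][i] for k in range(m)) for i in range(n)]
    return [2.0 * sum(Q[i][j] * x[j] for j in range(n)) - 2.0 * bTA[i]
            for i in range(n)]


def numerical_grad(A, b, x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(A, b, x_plus) - f(A, b, x_minus)) / (2.0 * h))
    return grad


# Hypothetical test data (m = 3, n = 2), chosen only for illustration.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [1.0, 0.0, -1.0]
x = [0.5, -1.5]

analytic = grad_f(A, b, x)
numeric = numerical_grad(A, b, x)
print(analytic)
print(numeric)
```

If the formula is right, the two printed vectors should agree up to rounding error. This does not prove the derivation, but it catches sign and index mistakes cheaply.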

2'5 9'2
  • 54,717
  • Thank you! It seems like the results match, I got $$f(x) = \sum_{i=1}^m \left(\sum_{j=1}^n a_{i j} x_j \right)^2 - 2 \sum_{i=1}^m b_i \cdot \left( \sum_{j=1}^n a_{i j} x_j\right) + \sum_{i=1}^m b_i^2 $$
    – Zyx Nov 03 '19 at 17:29
0

Hint: Calculate the Fréchet derivative by writing $f$ as a composition $f=g\circ h$ where $h(x)=Ax-b$ and $g(y)=\|y\|^2$. Then, by the chain rule, $Df(x_0)=Dg(h(x_0))\circ Dh(x_0)$. The derivative of each of these functions is straightforward. Now any partial derivative can be read off the Jacobian matrix that represents the Fréchet derivative.
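One way the hint can be carried out, as a sketch, taking $g(y)=\|y\|^2$ so that $f=g\circ h$, and writing $v\in\mathbb{R}^n$ and $w\in\mathbb{R}^m$ for arbitrary directions (these symbols are introduced here only for illustration):

$$ \begin{align} Dh(x_0)\,v &= Av\\ Dg(y)\,w &= 2y^Tw\\ Df(x_0)\,v &= Dg(h(x_0))\,\bigl(Dh(x_0)\,v\bigr) = 2(Ax_0-b)^TAv \end{align}$$

Under these assumptions the Jacobian of $f$ at $x_0$ is the row vector $2(Ax_0-b)^TA$, whose $i$th entry is $\frac{\partial f}{\partial x_i}(x_0)$. Transposing gives the gradient $2A^T(Ax_0-b)=2A^TAx_0-2A^Tb$, which should agree with the componentwise result in the other answer.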

Matematleta
  • 29,139