1

I have $f(x) = \frac{1}{2}|| Ax - b ||^2 $ for $ x \in \mathbb{R}^n, b \in \mathbb{R}^m $ and $ A \in \mathbb{R}^{m \times n}$

I am trying to write this as a summation, calculate the 1. and 2. derivative and then transform the result (for both derivatives) back to algebra notation.

Rewritten as a summation: $f(x) = \frac{1}{2}|| \sum_{i=1}^{n} \sum_{j=1}^{n} a_{i j} xj - \sum_{i=1}^{n} b_j ||^2$

What I have trouble with is calculating the derivatives since I cannot find much on the subject.

  1. Derivative (not sure if that is correct): $\frac{\partial f(x)_i}{\partial x_k} = \frac{\partial}{\partial x_k}\frac{1}{2}||\sum_{i=j}^{n}(a_{i j} - b_i)||^2 = \frac{1}{2}||\sum_{i=j}^{n} a_{i j} \frac{\partial f(x)_j}{\partial x_k} + 0||^2$

$f(x) = \frac{1}{2}||A||^2$

I would really appreciate if someone could show me how to solve problems like this.

Willi
  • 249

2 Answers2

2

The norm $\|\cdot\|$ is for vectors... You are using norms where it does not make sense. What you have is that

$$ f(x) =\frac 12 \|Ax -b\|^2 = \frac 12 \sum_{i=1}^m\left(\sum_{j=1}^n a_{ij} x_j - b_i\right)^2 $$

So, the derivatives are computed as

$$ \frac{\partial f}{\partial x_k} = \frac 12 \sum_{i=1}^m 2a_{ik}\left(\sum_{j=1}^n a_{ij}x_j - b_i\right)=\sum_{i=1}^m a_{ik}\left(\sum_{j=1}^n a_{ij}x_j - b_i\right) $$

PierreCarre
  • 20,974
  • 1
  • 18
  • 34
2

We can use matrix notation to get the results.

$$ \|Ax - b\|^2 = (Ax - b)^T(Ax - b) = (x^TA^T - b^T)(Ax - b) = $$ $$ x^TA^TAx - x^TA^T b - b^T Ax + b^T b = x^TA^TAx - 2 x^TA^T b + b^T b $$

Not the derivatives: $$ \frac{\partial}{\partial x}b^T b = 0 $$

$$ \frac{\partial}{\partial x}2 x^T A^T b = 2 A^T b \> \frac{\partial}{\partial x}x^T = 2 A^T b $$

$$ \frac{\partial}{\partial x}x^TA^TAx = A^TA \> \frac{\partial}{\partial x}x^Tx = 2 A^TAx $$

Finally we get $$ \frac{1}{2}\frac{\partial}{\partial x}\|Ax - b\|^2 = A^TAx - A^T b $$

Here we use the property of the transposition operation $(AB)^T = B^TA^T$

  • 1
    You can just use the chain rule so you don't have to expand everything. If $f(x) = (1/2) | Ax - b |^2$ then $f(x) = g(h(x))$ where $h(x) = Ax - b$ and $g(u) = (1/2) | u |^2$. Note that $h'(x) = A$ and $g'(u) = u^T$. By the chain rule, $f'(x) = g'(h(x)) h'(x) = (Ax - b)^T A$. If we use the convention that the gradient is a column vector, then $\nabla f(x) = f'(x)^T = A^T(Ax - b)$. – littleO Nov 06 '19 at 10:04
  • In other words, the chain rule just immediately tells us that $f'(x) = (Ax - b)^T A$. So $\nabla f(x) = A^T (Ax - b)$. – littleO Nov 06 '19 at 10:05
  • Is the second derivation A^(T+1)? – Willi Nov 06 '19 at 10:20
  • The second derivative should be $A^T A$. The term $A^T b$ do not depend on $x$, so the derivative equals to zero. The term $A^T A x$ is linear on $x$, so the derivative is $A^TA$. – Alec Kalinin Nov 06 '19 at 10:30
  • Thanks. A^(T+1) = A^T A if I am not mistaken, so I finally got it right. – Willi Nov 06 '19 at 10:46