I'm trying to work out the partial derivatives of a function $L$ with respect to $x_i$:

$$ A \in \mathbb{R}^{m \times n} \quad b \in \mathbb{R}^m \quad x \in \mathbb{R}^n $$

$$\begin{aligned} L(x) &= \left\|{Ax - b }\right\|^2 \\&= (Ax-b)^T(Ax-b) \\ &= x^TA^TAx - b^TAx - x^TA^Tb + b^Tb\end{aligned}$$

All four of these terms are scalars so I think I can transpose the third term to get:

$$L(x) = x^TA^TAx - 2b^TAx + b^Tb$$

I'm a little stuck on how to transform $x^TA^TAx$ further.

Calculating the partial derivatives: the final term $b^Tb$ is constant, so its derivative is zero; the second term is linear in $x$ with coefficient vector $-2b^TA$, so we just drop the $x$. It's again the first term I'm stuck on:

$$\frac{\partial{L(x)}}{\partial x} = ??? -2b^TA$$

What's the derivative of $x^TA^TAx$ with respect to $x$? How do you work it out?

Update:

After looking at the potential duplicate, I think I'm mostly covered - however it's not immediately obvious to me why:

$$\frac {\partial(x^TMx)} {\partial x}=(M+M^T)x$$

How is this rule derived?

2 Answers


We have $\frac{\partial}{\partial x} = \begin{bmatrix} \frac{\partial}{\partial x_1} \\ \frac{\partial}{\partial x_2}\\ \vdots \\ \frac{\partial}{\partial x_n} \\ \end{bmatrix}$.

Let $M$ be an $n\times n$ matrix; then $\frac{\partial}{\partial x}(x^TMx)$ is an $n\times1$ vector. Consider the $k^{th}$ element of this vector:

$\frac{\partial}{\partial x_k}(x^TMx)=\frac{\partial}{\partial x_k}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}x_im_{ij}x_j=\sum\limits_{i=1}^{n}x_im_{ik}+\sum\limits_{j=1}^{n}x_jm_{kj}$

The first sum is the $k^{th}$ entry of $M^Tx$ and the second is the $k^{th}$ entry of $Mx$, so

$\implies \frac{\partial}{\partial x}(x^TMx) = Mx+M^Tx=(M+M^T)x$
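As a sanity check on this rule, here is a minimal sketch in plain Python that compares $(M+M^T)x$ with a central-difference approximation of each partial derivative. The helper names (`quad_form`, `grad_formula`, `grad_numeric`), the step size `h`, and the example values of `M` and `x` are all made up for illustration; `M` is deliberately non-symmetric so that $Mx$ and $M^Tx$ differ.

```python
def quad_form(M, x):
    """Return the scalar x^T M x for a square matrix M given as a list of rows."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


def grad_formula(M, x):
    """Return (M + M^T) x, the gradient given by the rule above."""
    n = len(x)
    return [sum((M[k][j] + M[j][k]) * x[j] for j in range(n)) for k in range(n)]


def grad_numeric(M, x, h=1e-6):
    """Approximate each partial derivative with a central difference of step h."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((quad_form(M, x_plus) - quad_form(M, x_minus)) / (2 * h))
    return grad


# Hypothetical example data: M is non-symmetric on purpose.
M = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [5.0, 0.5, 2.0]]
x = [0.5, -1.0, 2.0]

analytic = grad_formula(M, x)
numeric = grad_numeric(M, x)
max_error = max(abs(a - b) for a, b in zip(analytic, numeric))

print("formula:", analytic)
print("numeric:", numeric)
print("largest difference:", max_error)
```

If the rule holds, the two vectors should agree up to floating-point rounding, so the largest difference should be tiny compared with the entries themselves.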

Sandipan Dey

I'll answer your last question for the term $f(x)=x^\top A^\top A x$.

You can easily find it in the following way:

$$\begin{aligned} f(x+dx) &= (x + dx)^\top A^\top A (x+dx) \\ &= x^\top A^\top A x + 2x^\top A^\top A\, dx + dx^\top A^\top A\, dx \\ &= f(x) + 2x^\top A^\top A\, dx + O(\|dx\|^2) \end{aligned}$$

As you can see, the coefficient of the term linear in $dx$, namely $2x^\top A^\top A$, is the transpose of your gradient.

So $$\frac{\partial (x^\top A^\top A x)}{\partial x}= 2 A^\top A x$$
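The expansion can also be checked numerically. The sketch below, in plain Python, subtracts $f(x)$ and the linear term $2x^\top A^\top A\,dx$ from $f(x+dx)$ and compares what is left with $dx^\top A^\top A\,dx$. The helper names (`matvec`, `dot`, `f`, `gradient`) and the values of `A`, `x`, and `direction` are assumptions chosen only for illustration.

```python
def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def f(A, x):
    """f(x) = x^T A^T A x, computed as the squared norm of A x."""
    Ax = matvec(A, x)
    return dot(Ax, Ax)


def gradient(A, x):
    """Return 2 A^T A x, the gradient derived above."""
    Ax = matvec(A, x)
    return [2 * sum(A[i][k] * Ax[i] for i in range(len(A))) for k in range(len(x))]


# Hypothetical example data (A is 3 x 2, x and direction are in R^2).
A = [[1.0, 2.0],
     [0.0, -1.0],
     [3.0, 1.0]]
x = [1.0, -2.0]
direction = [0.5, 1.0]

g = gradient(A, x)
remainders = []
quadratic_terms = []
for t in (1e-1, 1e-2, 1e-3):
    dx = [t * d for d in direction]
    x_new = [x_i + dx_i for x_i, dx_i in zip(x, dx)]
    # What is left of f(x + dx) after removing f(x) and the linear term.
    remainder = f(A, x_new) - f(A, x) - dot(g, dx)
    remainders.append(remainder)
    quadratic_terms.append(f(A, dx))  # dx^T A^T A dx
    print(t, remainder, quadratic_terms[-1])
```

In each row the remainder should match $dx^\top A^\top A\,dx$ up to rounding, and both should shrink by roughly a factor of 100 each time `t` shrinks by 10, which is the $O(\|dx\|^2)$ behaviour in the expansion.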

yes