
I was reading this PDF, and on page 6, Proposition 8 states:

[image: proposition_eight]

I don't really understand the steps that bring from

$$\alpha = \sum_{j=1}^n\sum_{i=1}^n a_{ij} x_{i} x_{j}$$

to its derivative

$$\frac{\partial \alpha}{\partial x_k} = \sum_{j=1}^n a_{kj} x_{j} + \sum_{i=1}^n a_{ik}x_i$$

and then back to the final result:

$$\frac{\partial \alpha}{\partial \mathbf{x}} = \mathbf{x}^T A^T + \mathbf{x}^T A$$

Can someone please help me?

Euler_Salter

3 Answers


Another way to approach this formula is to use the definition of the derivative in multivariable calculus. The function is $\alpha: \mathbb R^n \to \mathbb R$, so the Jacobian matrix $D\alpha = \frac{\partial \alpha}{\partial x}$ is a $1 \times n$ matrix (a row vector) and by definition satisfies \begin{align*} \lim_{\|h\| \to 0} \frac{\alpha(x+h) - \alpha(x) - D\alpha(h)} {\|h\|} = 0. \end{align*} But note $\alpha(x+h) - \alpha(x) = (x+h)^T A (x+h)- x^T A x = h^T A x + x^T A h + h^T A h$, and the last term is $O(\|h\|^2)$, so it vanishes in the limit. Since $h^T A x = x^T A^T h$, the linear part is $x^T(A^T + A)h$. It then follows that $\frac{\partial \alpha}{\partial x} = x^T(A^T+A)$.
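For what it's worth, the result is easy to check numerically. Here is a minimal sketch with NumPy (the matrix $A$ and point $x$ are random and chosen only for illustration, not taken from the PDF): a central finite-difference approximation of $\partial \alpha/\partial x$ is compared against $x^T(A^T + A)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # generic (non-symmetric) matrix
x = rng.standard_normal(n)

def alpha(v):
    # alpha(v) = v^T A v
    return v @ A @ v

# Central finite differences for each component of the gradient
eps = 1e-6
grad_fd = np.array([
    (alpha(x + eps * e) - alpha(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

grad_closed = x @ (A.T + A)       # x^T (A^T + A), written as a 1-D array

print(np.allclose(grad_fd, grad_closed, atol=1e-6))  # True
```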

user1101010

Consider a generic $1 \leq k \leq n$. We can write the following: $$\alpha = \sum_{j=1}^n\sum_{i=1}^n a_{ij} x_{i} x_{j} = \sum_{j=1}^n\left(\sum_{i=1, i \neq k}^n a_{ij} x_{i} x_{j} + a_{kj}x_{k}x_{j}\right) = \\ = \sum_{i=1, i \neq k}^n \sum_{j=1}^na_{ij} x_{i} x_{j} + \sum_{j=1}^na_{kj}x_{k}x_{j} =\\ = \sum_{i=1, i \neq k}^n \left(\sum_{j=1, j\neq k}^na_{ij} x_{i} x_{j} + a_{ik}x_i x_k\right) + \sum_{j=1, j \neq k}^na_{kj}x_{k}x_{j} + a_{kk}x_{k}^2 =\\ = \sum_{i=1, i \neq k}^n \sum_{j=1, j\neq k}^na_{ij} x_{i} x_{j} + \sum_{i=1, i\neq k}^na_{ik}x_i x_k + \sum_{j=1, j \neq k}^na_{kj}x_{k}x_{j} + a_{kk}x_{k}^2.\\ $$

Specifically, we have separated all the contributions depending on $x_k$ and those not depending on $x_k$. It is clear now that: $$\frac{\partial \alpha}{\partial x_k} = \sum_{i=1, i\neq k}^na_{ik}x_i + \sum_{j=1, j \neq k}^na_{kj}x_{j} + 2a_{kk}x_{k}.$$

We can further work on the last expression:

$$\frac{\partial \alpha}{\partial x_k} = \left[\sum_{i=1}^na_{ik}x_i - a_{kk}x_k\right] + \left[\sum_{j=1}^na_{kj}x_{j} - a_{kk}x_k\right] + 2a_{kk}x_{k} = \sum_{i=1}^na_{ik}x_i + \sum_{j=1}^na_{kj}x_{j}.$$

Now we can obtain a vector representation. Let us define:

  1. $f_k = \displaystyle\sum_{i=1}^na_{ik}x_i,$
  2. $g_k = \displaystyle\sum_{j=1}^na_{kj}x_{j},$
  3. ${\bf f} = [f_1, f_2, \ldots, f_n],$
  4. ${\bf g} = [g_1, g_2, \ldots, g_n],$

where ${\bf f}$ and ${\bf g}$ are row vectors.

It is clear that:

  1. ${\bf f} = {\bf x}^\top {\bf A},$
  2. ${\bf g} = {\bf x}^\top {\bf A}^\top,$

and hence:

$$\frac{\partial \alpha}{\partial {\bf x}} = {\bf x}^\top {\bf A} + {\bf x}^\top {\bf A}^\top.$$
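If it helps, the componentwise sums and the vector form can be compared directly in a short numerical sketch (NumPy, with an arbitrary random $A$ and $x$; the names `f` and `g` simply mirror the definitions above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# f_k = sum_i a_{ik} x_i  and  g_k = sum_j a_{kj} x_j, built term by term
f = np.array([sum(A[i, k] * x[i] for i in range(n)) for k in range(n)])
g = np.array([sum(A[k, j] * x[j] for j in range(n)) for k in range(n)])

print(np.allclose(f, x @ A))                 # f = x^T A     -> True
print(np.allclose(g, x @ A.T))               # g = x^T A^T   -> True
print(np.allclose(f + g, x @ A + x @ A.T))   # the gradient  -> True
```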

the_candyman
  • That is way more involved than I expected, honestly... Thank you anyway! Do you know of a simpler derivation? Everywhere I look people say it's trivial to prove, yet it's quite complex in your example – Euler_Salter Oct 23 '18 at 17:55
  • 1
    @Euler_Salter I guess that the "triviality" that other people refer to rely on the fact that to derive it you must do very simple (but long and boring) calculations like I did... – the_candyman Oct 23 '18 at 18:00

Alternative approach (once you get used to this notation, it gets easier):

Before we start deriving the gradient, some facts:

  • Trace and Frobenius product relation $$\left\langle A, B C\right\rangle={\rm tr}(A^TBC) := A : B C$$
  • Cyclic properties of Trace/Frobenius product \begin{align} A : B C &= BC : A \\ &= A^T : (BC)^T \\ &= B^T A : C \\ &= {\text{etc.}} \cr \end{align}

Let $f := x^T A x = {\rm tr}\left(x^T A x \right) = x:Ax$ (the trace of a scalar is the scalar itself).

Now, we can obtain the differential first, and then the gradient. \begin{align} df = d \ {\rm tr }\left ( x^T A x \right) &= d\left(x : A x \right) \\ &= \left(dx : Ax\right) + \left(x : A \ dx\right) \\ &= \left(Ax : dx\right) + \left(A^Tx : dx\right) \\ &= \left( Ax + A^T x \right) : dx\\ &= \left( Ax + A^T x \right)^T : dx^T\\ &= \left( x^T A^T + x^T A \right) : dx^T\\ \end{align}

Thus, the gradient is \begin{align} \frac{\partial}{\partial x^T} \left( x^T Ax \right)= x^T A^T + x^T A. \end{align}
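A small numerical sketch (NumPy; the helper `frob` is defined in the snippet itself, not taken from any library) can confirm both the Frobenius-product identities used above and the final differential, with random matrices chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
x = rng.standard_normal((n, 1))          # column vector

def frob(P, Q):
    # Frobenius product  P : Q = tr(P^T Q)
    return np.trace(P.T @ Q)

# A : BC = tr(A^T B C), and the cyclic variant B^T A : C
print(np.isclose(frob(A, B @ C), np.trace(A.T @ B @ C)))   # True
print(np.isclose(frob(A, B @ C), frob(B.T @ A, C)))        # True

# Check d(x^T A x) against (x^T A^T + x^T A) dx for a small perturbation dx
dx = 1e-6 * rng.standard_normal((n, 1))
lhs = ((x + dx).T @ A @ (x + dx) - x.T @ A @ x).item()
rhs = ((x.T @ A.T + x.T @ A) @ dx).item()
print(np.isclose(lhs, rhs, atol=1e-9))                     # True
```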

user550103