
Let $f: \Bbb R^n \to \Bbb R$ be a scalar field defined by

$$ f(x) = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j .$$

I want to calculate $\frac{\partial f}{\partial x_1}$. I found a brute force way of calculating $\frac{\partial f}{\partial x_1}$. It goes as follows:

First, we eliminate all terms that do not contain $x_1$. This leaves

\begin{align*} \frac{\partial f}{\partial x_1} &= \frac{\partial}{\partial x_1} \Big( a_{11} x_1 x_1 + \sum_{j=2}^n a_{1j} x_1 x_j + \sum_{i=2}^n a_{i1} x_i x_1 \Big)\\ &= 2a_{11}x_1 + \sum_{j=2}^n a_{1j} x_j + \sum_{i=2}^n a_{i1}x_i \\ &= \sum_{j=1}^n a_{1j} x_j + \sum_{i=1}^n a_{i1} x_i. \end{align*}
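(As a sanity check, not part of the derivation: the formula can be verified symbolically with SymPy for a small $n$; the choice $n = 3$ and the symbol names below are arbitrary and just for illustration.)

```python
import sympy as sp

n = 3  # small n is enough to illustrate; the pattern is the same for any n
x = sp.Matrix(sp.symbols(f"x1:{n+1}"))
A = sp.Matrix(n, n, lambda i, j: sp.Symbol(f"a{i+1}{j+1}"))

# f(x) = sum_{i,j} a_ij x_i x_j
f = sum(A[i, j] * x[i] * x[j] for i in range(n) for j in range(n))

# Formula derived above: df/dx_1 = sum_j a_1j x_j + sum_i a_i1 x_i
rhs = sum(A[0, j] * x[j] for j in range(n)) + sum(A[i, 0] * x[i] for i in range(n))

print(sp.simplify(sp.diff(f, x[0]) - rhs))  # prints 0
```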

This formula is a pretty nice result on its own. But then I realized that this problem is related to inner products. Specifically, if we rewrite $f(x)$ and $\frac{\partial f}{\partial x_1}$ as inner products, we get

$$ f(x) = \langle x, Ax \rangle $$

and

$$ \frac{\partial f}{\partial x_1} = \langle (A^T)^{(1)}, x\rangle + \langle A^{(1)}, x \rangle = \langle (A^T + A)^{(1)}, x \rangle $$

where $A^{(1)}$ denotes the first column of the matrix $A$.
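(Again just as a sanity check: the following NumPy sketch, with arbitrary random test data, confirms that this inner-product form agrees with a central finite-difference approximation of $\frac{\partial f}{\partial x_1}$.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))  # arbitrary test matrix, not symmetric
x = rng.standard_normal(n)       # arbitrary test point

f = lambda y: y @ A @ y  # f(y) = <y, Ay>

# Inner-product form derived above: <(A^T + A)^{(1)}, x>
ip_form = (A.T + A)[:, 0] @ x

# Central finite difference in the x_1 direction
h = 1e-6
e1 = np.zeros(n)
e1[0] = 1.0
fd = (f(x + h * e1) - f(x - h * e1)) / (2 * h)

print(abs(ip_form - fd))  # ~1e-10, i.e. agreement up to floating-point error
```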

This suggests that there is a way to circumvent the explicit calculations with sums and instead use properties of the inner product to calculate $\frac{\partial}{\partial x_1}\langle x, Ax \rangle$. However, I wasn't able to find such a proof. If it's possible, how could I go about calculating the partial derivative of $f$ with respect to $x_1$ only using the properties of the inner product?

aras

2 Answers


The following could be something that you might accept as a "general rule". We just compute the derivative of $\langle x,Ax\rangle$ explicitly, using our knowledge of inner products. Choose some direction $v$, i.e. a vector with $\|v\|=1$. The derivative of $f$ at $x$ in the direction $v$ is then

$$\lim_{h\to 0} \frac{\langle x+hv,A(x+hv)\rangle-\color{blue}{\langle x,Ax\rangle}}{h}.$$

Because of the bilinear nature of the inner product we find

$$\langle x+hv,A(x+hv)\rangle = \color{blue}{\langle x,Ax\rangle} + h\langle v,Ax\rangle+h\langle x,Av\rangle +\color{red}{h^2\langle v,Av\rangle}.$$

The blue terms cancel out, while the red term will vanish during the limit process. We are left with

$$\langle v,Ax\rangle+\langle x,Av\rangle$$

which can be seen as the derivative of $\langle x,Ax\rangle$ in the direction $v$. Your special case of computing the partial derivative $\partial/\partial x_1$ asks for the derivative of $\langle x,Ax\rangle$ in the direction of $e_1$, which is the vector $(1,0,\cdots,0)^\top$. Plug it in to get

$$(*)\qquad\langle e_1,Ax\rangle+\langle x,Ae_1\rangle.$$

Such "axis aligned vectors" like $e_1$ are good at extracting coordinates or rows/columns. So, the first term of $(*)$ gives you the first coordinate of $Ax$. This is what you wrote as $\langle (A^\top)^{(1)},x\rangle$. The second term gives you the inner product of $x$ with the first column of $A$. You wrote this as $\langle A^{(1)},x\rangle$.

M. Winter

The partial derivative with respect to $x_1$ can be computed as a directional derivative: $$\frac{\partial f }{\partial x_1}(x) = \frac{d}{dt}(f(x+te_1))|_{t=0}$$ (where $e_1=(1,0,\dots,0)$).

For $f:x\mapsto \langle x,Ax\rangle$, we obtain \begin{align}\frac{\partial f }{\partial x_1}(x) & = \frac{d}{dt}(f(x+te_1))|_{t=0}=\frac{d}{dt}\langle x+te_1,A(x+te_1)\rangle|_{t=0} \\ & = \frac{d}{dt}\left(\langle x,Ax \rangle + t\langle e_1, Ax\rangle + t \langle x,Ae_1\rangle +t^2 \langle e_1,Ae_1\rangle \right)|_{t=0} \\ & = \langle e_1, Ax\rangle + \langle x,Ae_1\rangle = \langle A^Te_1,x\rangle + \langle Ae_1,x\rangle = \langle (A^T+A)e_1,x\rangle, \end{align} which is what you had obtained, since $Ae_1$ is the first column of $A$ for any matrix $A$. The same proof works for the other partial derivatives (and more generally for any directional derivative, if you replace $e_1$ by a vector $v$).
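As a numerical illustration (a NumPy sketch with arbitrary random test data), the resulting gradient $(A^T+A)x$ matches coordinate-wise finite differences of $f$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))  # arbitrary test matrix
x = rng.standard_normal(n)       # arbitrary test point

f = lambda y: y @ A @ y

# Each partial is <(A^T + A)e_k, x>, so the full gradient is (A^T + A) x.
grad = (A.T + A) @ x

# Coordinate-wise central finite differences
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])

print(np.max(np.abs(grad - fd)))  # ~1e-9
```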

Arnaud D.