Using a component-wise approach (using partial derivatives), show that the gradient of $f$ is given by $\nabla f(x) = \frac{1}{2}(A +A^T)x + b$

Question

Using a component-wise approach (using partial derivatives), show that the gradient of $f$ is given by the following formula:

\begin{align*} \nabla f(x) &= \frac{1}{2}(A +A^T)x + b \end{align*}

For $A\in \mathbb{R}^{n\times n}$, $b\in \mathbb{R}^n$, $x\in \mathbb{R}^n$, we write $A = (a_{ij})_{\substack{ 1\leq i\leq n \\ 1\leq j\leq n}}$, $b = (b_i)_{1\leq i \leq n}$, $x = (x_i)_{1\leq i \leq n}$.

Then $$f(x) = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}x_i a_{ij}x_j + \sum_{i=1}^{n}b_i x_i$$

$$ \nabla f(x) = \big(\frac{\partial f}{\partial x_i}(x)\big)_{1\leq i \leq n} $$

\begin{align*}
\forall k\leq n: \frac{\partial f}{\partial x_k}(x) &=\frac{1}{2}\big[\sum_{i\neq k}x_i a_{ik} + \sum_{j\neq k}a_{kj}x_j + 2a_{kk}x_k\big] + b_k \quad (1)\\
&=\frac{1}{2}\big[\sum_{i=1}^{n}x_ia_{ik} + \sum_{j=1}^{n}a_{kj}x_j\big] + b_k \quad (2)\\
&=\frac{1}{2}\big[\sum_{i=1}^{n}a_{ik} x_i+ \sum_{j=1}^{n}a_{kj}x_j\big] + b_k \\
&=\frac{1}{2}\sum_{i=1}^{n}(a_{ik} + a_{ki}) x_i + b_k \\
\nabla f(x) &= \frac{1}{2}(A +A^T)x + b
\end{align*}

Here are my questions:

- What exactly does $k$ represent in (1), and why can't it be equal to $i$?
- How can we go from (1) to (2), that is, from $\sum_{i\neq k}$ to $\sum_{i=1}^{n}$?

angryavian · Accepted Answer · 2019-03-08T17:25:46.293

You are computing the partial derivative with respect to the component $x_k$; this is the $k$ throughout the computation.

There is some casework to do when computing the partial derivative of each addend. $$\frac{\partial}{\partial x_k} (a_{ij} x_i x_j) = \begin{cases}2 a_{kk} x_k & i=j=k \\ a_{kj} x_j & i=k, j \ne k \\ a_{ik} x_i & i \ne k, j = k \\ 0 & i \ne k, j \ne k\end{cases}$$

Doing this term by term in the sum $\sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j$ yields (1).

To get (2), split $2a_{kk}x_k$ into two copies of $a_{kk}x_k$ and absorb one into each sum: $$\sum_{i \ne k} a_{ik} x_i + a_{kk} x_k = \sum_{i=1}^n a_{ik} x_i$$ $$\sum_{j \ne k} a_{kj} x_j + a_{kk} x_k = \sum_{j=1}^n a_{kj} x_j$$
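As a sanity check on the component formula, here is a minimal Python sketch that compares it with a central finite-difference approximation. The matrix `A`, the vectors `b` and `x`, the step `h`, and the helper names `f`, `grad_formula`, and `grad_numeric` are all made up for illustration; they are not part of the original question.

```python
# Illustrative check only: compare the closed-form gradient components
# with central finite differences on arbitrary (hypothetical) data.

def f(A, b, x):
    """f(x) = 1/2 * sum_ij x_i a_ij x_j + sum_i b_i x_i."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * x[i] for i in range(n))
    return 0.5 * quad + lin

def grad_formula(A, b, x):
    """k-th component: 1/2 * sum_i (a_ik + a_ki) x_i + b_k."""
    n = len(x)
    return [0.5 * sum((A[i][k] + A[k][i]) * x[i] for i in range(n)) + b[k]
            for k in range(n)]

def grad_numeric(A, b, x, h=1e-6):
    """Central-difference approximation of each partial derivative."""
    g = []
    for k in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[k] += h
        xm[k] -= h
        g.append((f(A, b, xp) - f(A, b, xm)) / (2 * h))
    return g

# Arbitrary non-symmetric example, so that A and A^T really differ.
A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, -2.0]]
b = [1.0, -1.0, 2.0]
x = [0.3, -0.7, 1.2]

exact = grad_formula(A, b, x)
approx = grad_numeric(A, b, x)
max_err = max(abs(e - a) for e, a in zip(exact, approx))
print(exact)
print(approx)
print(max_err)
```

The two lists should agree up to rounding error. Using a non-symmetric `A` matters here: with a symmetric matrix the check could not distinguish $\frac{1}{2}(A+A^T)x$ from $Ax$.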

Response to comment:

It's not as complicated as you think. Just ask yourself how to compute the following: \begin{align} &\frac{\partial}{\partial x_k} (a_{kk} x_k x_k)\\ &\frac{\partial}{\partial x_k} (a_{kj} x_k x_j) & j \ne k\\ &\frac{\partial}{\partial x_k} (a_{ik} x_i x_k) & i \ne k\\ &\frac{\partial}{\partial x_k} (a_{ij} x_i x_j) & i \ne k, j \ne k \end{align}
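For reference, carrying out these four computations (treating every variable other than $x_k$ as a constant) should give:

\begin{align*} \frac{\partial}{\partial x_k} (a_{kk} x_k x_k) &= 2 a_{kk} x_k\\ \frac{\partial}{\partial x_k} (a_{kj} x_k x_j) &= a_{kj} x_j && j \ne k\\ \frac{\partial}{\partial x_k} (a_{ik} x_i x_k) &= a_{ik} x_i && i \ne k\\ \frac{\partial}{\partial x_k} (a_{ij} x_i x_j) &= 0 && i \ne k, j \ne k \end{align*}

The first line is the ordinary derivative of $a_{kk} x_k^2$, the middle two are derivatives of a constant times $x_k$, and the last term does not depend on $x_k$ at all.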

Finally, $\sum_{j \ne k}$ is a sum over all possible values of $j$ in $1, \ldots, n$ excluding $k$. So if $n=5$ and $k=2$, the sum is over $j$ in $\{1,3,4,5\}$.
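The same index set can be spelled out in a few lines of Python. This is only an illustration: the dictionary `terms` holds made-up stand-ins for the addends $a_{kj}x_j$.

```python
n, k = 5, 2

# All j in 1..n except k.
excluded = [j for j in range(1, n + 1) if j != k]
print(excluded)  # [1, 3, 4, 5]

# Hypothetical values standing in for the addends a_{kj} x_j, keyed by j.
terms = {j: 10 * j for j in range(1, n + 1)}

partial_sum = sum(terms[j] for j in excluded)       # sum over j != k
full_sum = sum(terms[j] for j in range(1, n + 1))   # sum over j = 1..n

# Adding back the j = k term recovers the full sum, as in the step to (2).
assert partial_sum + terms[k] == full_sum
```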

Thanks a lot @angryavian for your answer. Where can I find the proof of $\frac{\partial}{\partial x_k} (a_{ij} x_i x_j) = \begin{cases}2 a_{kk} x_k & i=j=k \\ a_{kj} x_j & i=k, j \ne k \\ a_{ik} x_i & i \ne k, j = k \\ 0 & i \ne k, j \ne k\end{cases}$ and how should I put $\sum_{j\ne k}$ into words: "the sum over $j$, with $j$ not equal to $k$"? - ecjb, Mar 08 '19 at 16:42

score 0 · Answer 2 · answered Mar 08 '19 at 16:23

The index $k$ stands for any fixed index in the range $1, \ldots, n$. It does not make sense to set it equal to $i$ because $i$ is already used as a dummy summation variable in the formula. If you really want to use $i$, just write $f$ equivalently as

$$ f(x)=\frac 12 \sum_{k=1}^n \sum_{j=1}^n a_{kj} x_k x_j + \sum_{k=1}^n b_k x_k $$

and proceed from there computing $\frac{\partial f}{\partial x_i}$.
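With the indices renamed this way, the same casework as in the question should lead to

$$ \frac{\partial f}{\partial x_i}(x) = \frac 12 \Big[\sum_{k=1}^n a_{ki} x_k + \sum_{j=1}^n a_{ij} x_j\Big] + b_i = \Big[\frac 12 (A + A^T)x + b\Big]_i $$

since $\sum_{k} a_{ki} x_k$ is the $i$-th component of $A^T x$ and $\sum_{j} a_{ij} x_j$ is the $i$-th component of $Ax$.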
