1

Suppose there is a function $$f(w)=w^\top Rw,$$ where $R∈ℝ^{m\times m}$ is an arbitrary matrix and $w∈ℝ^m$. The gradient of this function with respect to $w$ comes out to be $Rw$.

I have looked at different formulas, and none of them gives me this answer. What is the procedure for computing such matrix gradients?

  • http://thousandfold.net/cz/2013/11/12/a-useful-trick-for-computing-gradients-w-r-t-matrix-arguments-with-some-examples/ – venrey Mar 31 '19 at 08:20

3 Answers

2

Have a look at the Wikipedia article on the Gâteaux derivative.

So, using a small increment $ε$ and a direction $δw$, we obtain \begin{align*}f(w+εδw) &= (w+εδw)^\top R(w+εδw)\\ &= w^\top Rw + ε(δw)^\top Rw + εw^\top R(δw) + ε^2(δw)^\top R(δw) \end{align*} Taking the derivative w.r.t. $ε$: \begin{align*} \frac{\mathrm{d}}{\mathrm{d}ε}f(w+εδw)= (δw)^\top Rw + w^\top R(δw) + 2ε(δw)^\top R(δw) \end{align*} Setting $ε=0$: \begin{align*} \frac{\mathrm{d}}{\mathrm{d}ε}f(w+εδw)\big|_{ε=0}= (δw)^\top Rw + w^\top R(δw) \end{align*}

Since $w^\top R(δw)$ is a scalar, it equals its own transpose $(δw)^\top R^\top w$, so the right-hand side is $(δw)^\top (R+R^\top) w$. Now if $R$ is symmetric you get: \begin{align*} \frac{\mathrm{d}}{\mathrm{d}ε}f(w+εδw)\big|_{ε=0}= 2(δw)^\top Rw \end{align*}

So the gradient is $∇f(w) = 2Rw$ for symmetric $R$, and $∇f(w) = (R+R^\top)w$ in general.

That is because $∇f = (∂_{e_1}f, ∂_{e_2}f, …)^\top$. So replacing $δw$ with $e_i$ gives: $$∂_{e_i}f = [2Rw]_i,$$ the $i$-th entry of the vector $2Rw$.
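If you want to sanity-check the directional derivative numerically, here is a minimal NumPy sketch. The helper names, the seed, and the dimension are my own choices for illustration; it compares a central difference in $ε$ with $(δw)^\top Rw + w^\top R(δw)$ for a generic $R$, and with $2(δw)^\top Rw$ for a symmetrized one.

```python
import numpy as np

def f(w, R):
    """Quadratic form f(w) = w^T R w."""
    return w @ R @ w

def directional_derivative(w, dw, R, eps=1e-6):
    """Central-difference estimate of d/d(eps) f(w + eps*dw) at eps = 0."""
    return (f(w + eps * dw, R) - f(w - eps * dw, R)) / (2 * eps)

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
m = 4                            # arbitrary dimension
R = rng.standard_normal((m, m))  # generic, non-symmetric R
w = rng.standard_normal(m)
dw = rng.standard_normal(m)      # the direction, called delta-w in the text

# General case: (dw)^T R w + w^T R (dw)
numeric = directional_derivative(w, dw, R)
analytic = dw @ R @ w + w @ R @ dw

# Symmetric case: the two terms coincide, giving 2 (dw)^T S w
S = 0.5 * (R + R.T)
numeric_sym = directional_derivative(w, dw, S)
analytic_sym = 2 * (dw @ S @ w)

print(numeric, analytic)
print(numeric_sym, analytic_sym)
```

Because $f$ is quadratic, the central difference has no truncation error here, so the two numbers in each pair should agree up to floating-point rounding.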


Here is a similar question. IMO, even though the top answer there calculates the derivative by brute-force matrix multiplication, the concept of the variational derivative gives you a very nice method for calculating derivatives.
With some practice, you can do it in your head, skipping the first two steps.

P. Siehr
  • 3,672
0

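Write $f(w)=\sum_{i,j=1}^{m}r_{ij}w_{i}w_{j}$. The variable $w_k$ appears both as $w_i$ (when $i=k$) and as $w_j$ (when $j=k$), so

$$ \partial_{k}f(w)=\sum_{j=1}^{m}r_{kj}w_{j}+\sum_{i=1}^{m}r_{ik}w_{i}, $$

which is the $k$-th component of $(R+R^\top)w$. If $R$ is symmetric, this reduces to $2\sum_{j=1}^{m}r_{kj}w_{j}$, the $k$-th component of $2Rw$.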
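A small worked example may help to see where the symmetry of $R$ matters. This is only a sketch with a hand-picked non-symmetric $2\times 2$ matrix (the matrix, the point $w$, and the function names are my own assumptions). It evaluates the partial derivatives of the double sum as $\sum_j (r_{kj}+r_{jk})w_j$ and compares them with finite differences and with $2Rw$.

```python
import numpy as np

def f_sum(w, R):
    """f(w) = sum_{i,j} r_ij w_i w_j, written as an explicit double sum."""
    m = len(w)
    return sum(R[i, j] * w[i] * w[j] for i in range(m) for j in range(m))

def partial_k(w, R, k):
    """k-th partial derivative of the double sum: sum_j (r_kj + r_jk) w_j."""
    return sum((R[k, j] + R[j, k]) * w[j] for j in range(len(w)))

def numeric_gradient(w, R, h=1e-6):
    """Central differences along each coordinate direction e_k."""
    grad = np.zeros_like(w)
    for k in range(len(w)):
        e = np.zeros_like(w)
        e[k] = h
        grad[k] = (f_sum(w + e, R) - f_sum(w - e, R)) / (2 * h)
    return grad

# Hand-picked non-symmetric example: f(w) = w1^2 + 2*w1*w2 + 3*w2^2
R = np.array([[1.0, 2.0],
              [0.0, 3.0]])
w = np.array([1.0, -1.0])

componentwise = np.array([partial_k(w, R, k) for k in range(len(w))])
print(componentwise)           # [ 0. -4.]
print(numeric_gradient(w, R))  # close to [0, -4]
print(2 * R @ w)               # [-2. -6.], differs because R is not symmetric
```

Replacing `R` by `0.5 * (R + R.T)` leaves $f$ unchanged and makes the componentwise result equal to $2Rw$.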

shdp
  • 381
0

Let's use a colon to denote the trace/Frobenius product, i.e. $$A:B={\rm tr}(A^TB)$$ Then we can jot down the function, differential, and gradient as $$\eqalign{ f &= R:ww^T\cr df &= R:d(ww^T) = R:(dw\,w^T+w\,dw^T) \cr &= (R+R^T):dw\,w^T = (R+R^T)w:dw \cr \frac{\partial f}{\partial w} &= (R+R^T)w \cr\cr }$$ If $\,R=R^T\,$, then you can simplify the gradient to $2Rw$.
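The Frobenius-product steps can also be checked numerically. The following is a rough NumPy sketch under my own assumptions (the `frob` helper, the seed, and the size of the perturbation are illustrative choices): it confirms that $R:ww^T$ equals $w^TRw$ and that the change in $f$ under a small perturbation $dw$ matches $(R+R^T)w:dw$ to first order.

```python
import numpy as np

def frob(A, B):
    """Frobenius product A:B = tr(A^T B); for vectors it is the dot product."""
    return np.sum(A * B)

def f(v, R):
    """Quadratic form f(v) = v^T R v."""
    return v @ R @ v

rng = np.random.default_rng(1)        # arbitrary seed
m = 5                                 # arbitrary dimension
R = rng.standard_normal((m, m))       # generic, non-symmetric R
w = rng.standard_normal(m)
dw = 1e-6 * rng.standard_normal(m)    # small perturbation playing the role of dw

# f = R : w w^T
f_frob = frob(R, np.outer(w, w))

# df = (R + R^T) w : dw, up to a second-order term dw^T R dw
grad = (R + R.T) @ w
df_exact = f(w + dw, R) - f(w, R)
df_linear = frob(grad, dw)

print(f_frob, f(w, R))
print(df_exact, df_linear)
```

The gap between `df_exact` and `df_linear` is the neglected term $dw^TR\,dw$, which is quadratic in the size of `dw`.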

greg
  • 35,825