Gradient of $x^T B^T B x - x^T B^T b - b^T Bx$

Question

I want to compute the gradient $\nabla_x f(x)$ of $f(x) = x^T B^T B x - x^T B^T b - b^T Bx$ with respect to the vector $x$. My attempt so far is below, but when I add the pieces together I can't see how they combine.

Edits

Now I think I've got it. The pieces do come together, in agreement with the answers below. Since $C=B^TB$, $C$ is symmetric, so $c_{ij} = c_{ji}$ and $$\frac{\partial D}{\partial x} = 2\Big[\sum_{i=1}^n c_{1i} x_i \ \ \sum_{i=1}^n c_{2i} x_i \ \ \cdots \ \ \sum_{i=1}^n c_{ni} x_i\Big] =2 C x.$$

Similarly, $$\frac{\partial x^TB^Tb}{\partial x} = \Big[ \sum_{i=1}^n b_{i1} b_i \ \ \sum_{i=1}^n b_{i2} b_i \ \ \cdots \ \ \sum_{i=1}^n b_{in} b_i \Big] = B^T b.$$

Since $b^TBx = x^TB^Tb$ (a scalar equals its own transpose), $f(x) = x^T B^T B x - 2x^T B^T b$, so $\frac{d}{dx} (x^T B^T B x - 2x^T B^T b) = 2B^T B x - 2B^Tb$.
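As a numerical sanity check of this final formula, here is a minimal sketch in plain Python (no third-party libraries) that compares $2B^TBx - 2B^Tb$ with central finite differences of $f$. The data `B`, `b`, `x` and the helper names are made-up illustrations; `B` is deliberately non-square ($3 \times 2$) to show that the formula does not rely on a square $B$.

```python
# Compare the closed-form gradient 2 B^T B x - 2 B^T b with central finite
# differences of f(x) = x^T B^T B x - x^T B^T b - b^T B x.
# B, b, x below are hypothetical example data (B is 3 x 2, so x has 2 entries).

def matvec(A, v):
    """Matrix-vector product A v, with A given as a list of rows."""
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def f(B, b, x):
    Bx = matvec(B, x)
    # (Bx).(Bx) is x^T B^T B x; the other two terms are x^T B^T b and b^T B x
    return dot(Bx, Bx) - dot(x, matvec(transpose(B), b)) - dot(b, Bx)

def grad_formula(B, b, x):
    Bt = transpose(B)
    BtBx = matvec(Bt, matvec(B, x))
    Btb = matvec(Bt, b)
    return [2 * p - 2 * q for p, q in zip(BtBx, Btb)]

def grad_central_diff(B, b, x, h=1e-6):
    g = []
    for k in range(len(x)):
        xp = list(x)
        xp[k] += h
        xm = list(x)
        xm[k] -= h
        g.append((f(B, b, xp) - f(B, b, xm)) / (2 * h))
    return g

B = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
b = [1.0, -2.0, 0.5]
x = [0.3, -0.7]
print(grad_formula(B, b, x))
print(grad_central_diff(B, b, x))
```

With this data the two lists agree up to rounding error (both are approximately `[1.05, -14.1]`). Because $f$ is quadratic, the central difference has no truncation error, so a mismatch beyond rounding would point to a mistake in the formula.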

Thank you!

Let $C = B^T B$, an $n \times n$ matrix. Then

\begin{align*} x^T C x =& \begin{pmatrix} x_1 & x_2 & \cdots & x_n\end{pmatrix} \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n}\\ c_{21} & c_{22} & \cdots & c_{2n}\\ \vdots & \vdots& \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn}\\ \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\x_n\end{pmatrix}\\ =& \begin{pmatrix} x_1 & x_2 & \cdots & x_n\end{pmatrix} \begin{pmatrix} c_{11} x_1 + c_{12} x_2 + \cdots + c_{1n} x_n\\ c_{21} x_1 + c_{22} x_2 + \cdots + c_{2n} x_n\\ \vdots\\ c_{n1} x_1 + c_{n2} x_2 + \cdots + c_{nn} x_n\\ \end{pmatrix}\\ =& x_1(c_{11} x_1 + c_{12} x_2 + \cdots + c_{1n} x_n)+x_2(c_{21} x_1 + c_{22} x_2 + \cdots + c_{2n} x_n) + \cdots + x_n(c_{n1} x_1 + c_{n2} x_2 + \cdots + c_{nn} x_n). \end{align*}

Since the derivative of the scalar $D = x^T C x$ with respect to the vector $x$ is $$\frac{\partial D}{\partial x} = \bigg(\frac{\partial D}{\partial x_1} \ \ \frac{\partial D}{\partial x_2} \ \ \cdots \ \ \frac{\partial D}{\partial x_n}\bigg),$$ I have: \begin{align*} \frac{\partial D}{\partial x_1} &= c_{11} x_1 + c_{12} x_2 + \cdots + c_{1n} x_n + c_{11} x_1 + c_{21} x_2 + \cdots + c_{n1} x_n = \sum_{i=1}^n (c_{1i} + c_{i1})x_i. \end{align*}

Hence $$\frac{\partial D}{\partial x} = \Big[\sum_{i=1}^n (c_{1i} + c_{i1})x_i \ \ \sum_{i=1}^n (c_{2i}+c_{i2})x_i \ \ \cdots \ \ \sum_{i=1}^n (c_{ni}+c_{in})x_i\Big].$$

For the linear term,

\begin{align*} x^T B^T b =& \begin{pmatrix} x_1 & x_2 & \cdots & x_n\end{pmatrix} \begin{pmatrix} b_{11} b_1 + b_{21} b_2 + \cdots + b_{n1} b_n\\ b_{12} b_1 + b_{22} b_2 + \cdots + b_{n2} b_n\\ \vdots\\ b_{1n} b_1 + b_{2n} b_2 + \cdots + b_{nn} b_n\\ \end{pmatrix}\\ =& x_1(b_{11} b_1 + b_{21} b_2 + \cdots + b_{n1} b_n)+ x_2(b_{12} b_1 + b_{22} b_2 + \cdots + b_{n2} b_n)+\cdots+ x_n(b_{1n} b_1 + b_{2n} b_2 + \cdots + b_{nn} b_n). \end{align*}

Hence $$\frac{\partial x^TB^Tb}{\partial x} = \Big[ \sum_{i=1}^n b_{i1} b_i \ \ \sum_{i=1}^n b_{i2} b_i \ \ \cdots \ \ \sum_{i=1}^n b_{in} b_i \Big].$$
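The component formula $\partial D/\partial x_k = \sum_i (c_{ki} + c_{ik})x_i$ holds for any square $C$, symmetric or not; in vector form it is $(C + C^T)x$, which collapses to $2Cx$ only when $C$ is symmetric (as $C = B^TB$ is). A minimal sketch that checks both statements in plain Python with exact rational arithmetic from the standard `fractions` module; the matrix `C`, the vector `x` and the function names are arbitrary, hypothetical choices.

```python
from fractions import Fraction

def quad_form(C, x):
    """D = x^T C x = sum over i, j of x_i c_ij x_j."""
    n = len(x)
    return sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n))

def partials_from_formula(C, x):
    """dD/dx_k = sum_i (c_ki + c_ik) x_i, the component-wise result."""
    n = len(x)
    return [sum((C[k][i] + C[i][k]) * x[i] for i in range(n)) for k in range(n)]

def partials_by_difference(C, x, h=Fraction(1)):
    """Central differences; exact for a quadratic D when arithmetic is exact."""
    out = []
    for k in range(len(x)):
        xp = list(x)
        xp[k] += h
        xm = list(x)
        xm[k] -= h
        out.append((quad_form(C, xp) - quad_form(C, xm)) / (2 * h))
    return out

def matvec(A, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]

# A deliberately non-symmetric C, and its symmetric part S = (C + C^T) / 2.
C = [[Fraction(v) for v in row] for row in [[1, 2, 0], [5, 3, -1], [4, 0, 2]]]
S = [[(C[i][j] + C[j][i]) / 2 for j in range(3)] for i in range(3)]
x = [Fraction(1), Fraction(-2), Fraction(3)]

print(partials_from_formula(C, x) == partials_by_difference(C, x))  # True
# For the symmetric S the same formula reduces to 2 S x.
print(partials_from_formula(S, x) == [2 * v for v in matvec(S, x)])  # True
```

Exact fractions are used so that the comparisons are equalities rather than tolerances; the central difference is exact here because the second-order terms of a quadratic cancel.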

score 1 · Answer 1 · answered Dec 02 '14 at 09:12

Hint: For a scalar $\epsilon$ and appropriately sized vectors $x, y$, we have:

$f(x+\epsilon y) = (x+\epsilon y)^TB^TB(x+\epsilon y) - (x+\epsilon y)^TB^Tb - b^TB(x+\epsilon y)$

$= x^TB^TBx + \epsilon y^TB^TBx + \epsilon x^TB^TBy + \epsilon^2 y^TB^TBy - x^TB^Tb - \epsilon y^TB^Tb - b^TBx - \epsilon b^TBy$

$= (x^TB^TBx - x^TB^Tb - b^TBx) + \epsilon(y^TB^TBx+x^TB^TBy-y^TB^Tb-b^TBy) + \epsilon^2y^TB^TBy$

$= f(x) + \epsilon(2y^TB^TBx-2y^TB^Tb) + \epsilon^2 y^TB^TBy$

where we have used the fact that $x^TB^TBy = y^TB^TBx$ and $b^TBy = y^TB^Tb$, since the transpose of a scalar is itself.
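Because $f$ is quadratic, this expansion is exact rather than approximate: it holds for every $\epsilon$, not just small ones. A minimal sketch that confirms the identity with exact rational arithmetic in plain Python; `B`, `b`, `x`, `y` and the function names are hypothetical illustrations.

```python
from fractions import Fraction as F

def matvec(A, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def f(B, b, x):
    """x^T B^T B x - x^T B^T b - b^T B x."""
    Bx = matvec(B, x)
    return dot(Bx, Bx) - dot(x, matvec(transpose(B), b)) - dot(b, Bx)

def first_order(B, b, x, y):
    """Coefficient of epsilon: 2 y^T B^T B x - 2 y^T B^T b."""
    Bt = transpose(B)
    return 2 * dot(y, matvec(Bt, matvec(B, x))) - 2 * dot(y, matvec(Bt, b))

def second_order(B, y):
    """Coefficient of epsilon^2: y^T B^T B y = (By).(By)."""
    By = matvec(B, y)
    return dot(By, By)

B = [[F(1), F(2)], [F(0), F(-1)], [F(3), F(1)]]
b = [F(1), F(4), F(-2)]
x = [F(2), F(-1)]
y = [F(1, 2), F(3)]

for eps in (F(1), F(1, 3), F(-5, 7)):
    shifted = [xi + eps * yi for xi, yi in zip(x, y)]
    expansion = f(B, b, x) + eps * first_order(B, b, x, y) + eps**2 * second_order(B, y)
    print(eps, f(B, b, shifted) == expansion)  # each line ends in True
```

Since the coefficient of $\epsilon$ equals $y^T(2B^TBx - 2B^Tb)$ for every direction $y$, it identifies the gradient as $2B^TBx - 2B^Tb$.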

JimmyK4542

Hi Jimmy, I've followed so far, but then what? – 1LiterTears Dec 02 '14 at 09:29

@1LiterTears then the coefficient of $\epsilon$ is the derivative of $f$ in the direction $y$. – Steven Gubkin Dec 02 '14 at 17:57
I see, thank you @StevenGubkin – 1LiterTears Dec 02 '14 at 21:31

Alex Silva · Answer 2 · 2014-12-02T13:43:19.430

Let $f = \mathbf{x}^{T}B^{T}B\mathbf{x}-\mathbf{x}^{T}B^{T}\mathbf{b} - \mathbf{b}^TB\mathbf{x}$. Notice that $f = \mathbf{x}^{T}B^{T}B\mathbf{x} -2\mathbf{b}^{T}B\mathbf{x}$, because $\mathbf{x}^{T}B^{T}\mathbf{b} = \mathbf{b}^TB\mathbf{x}$. In multivariate calculus*, $\nabla_{\mathbf{x}} ^{T} f = \frac{\partial f}{\partial\mathbf{x}} = 2\mathbf{x}^{T}B^{T}B -2\mathbf{b}^{T}B$. It is a simple calculation.

If the calculation is not clear to you, you can follow these steps:

1) Calculate $\Delta f = f(\mathbf{x}+\Delta\mathbf{x})-f(\mathbf{x})$;

2) Keep only the first-order terms (in $\Delta \mathbf{x}$) of $\Delta f$. This gives the differential of $f$, $\partial f$;

3) Replace $\Delta \mathbf{x}$ with $\partial \mathbf{x}$ to obtain $\partial f = (\cdot)\,\partial \mathbf{x}$. The factor $(\cdot)$ is the derivative $\frac{\partial f}{\partial\mathbf{x}}$.

For more on vector and matrix calculus, I suggest these books:

a) Matrix Differential Calculus with Applications in Statistics and Econometrics (J. R. Magnus and H. Neudecker);

b) Complex-Valued Matrix Derivatives (Are Hjørungnes).

*depends on the notation used.

score 0 · Answer 3 · answered Nov 12 '15 at 04:25

Write the function in terms of the Frobenius product and take its differential $$\eqalign{ f &= Bx:Bx - 2\,b:Bx \cr &= M:M - 2\,b:M \cr\cr df &= 2\,M:dM - 2\,b:dM \cr &= 2\,(M-b):dM \cr &= 2\,(Bx-b):B\,dx \cr &= 2\,B^T(Bx - b):dx \cr }$$ The gradient is related to the differential by $df=\big(\,\frac{\partial f}{\partial x}:dx\big),\,$ therefore the gradient is $$\eqalign{ \frac{\partial f}{\partial x} &= 2\,B^T(Bx - b) \cr }$$
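The key step in this derivation is moving $B$ across the colon, $(Bx-b):B\,dx = B^T(Bx-b):dx$, which follows from $X:Y = \operatorname{tr}(X^TY)$. A minimal sketch in plain Python that checks the trace form of the Frobenius product and this step with exact fractions; the helper names and the data are hypothetical, and vectors are stored as one-column matrices so a single `frob` function covers both cases.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frob(X, Y):
    """Frobenius product X:Y = sum_ij X_ij Y_ij."""
    return sum(a * c for rx, ry in zip(X, Y) for a, c in zip(rx, ry))

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def col(*entries):
    """Column vector stored as an n x 1 matrix."""
    return [[F(e)] for e in entries]

B = [[F(1), F(2)], [F(0), F(-1)], [F(3), F(1)]]  # 3 x 2
x = col(2, -1)
b = col(1, 4, -2)
dx = col(F(1, 2), 3)

r = [[m[0] - bi[0]] for m, bi in zip(matmul(B, x), b)]  # Bx - b
grad = [[2 * g[0]] for g in matmul(transpose(B), r)]    # 2 B^T (Bx - b)

# X:Y equals tr(X^T Y)
print(frob(r, matmul(B, dx)) == trace(matmul(transpose(r), matmul(B, dx))))  # True
# 2 (Bx - b) : B dx equals 2 B^T (Bx - b) : dx
print(2 * frob(r, matmul(B, dx)) == frob(grad, dx))  # True
```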

hans

What are you using : to denote? – Michael Albanese Nov 12 '15 at 04:45
The : denotes the Frobenius product, $X:Y = \operatorname{tr}(X^TY) = \sum_{i,j} X_{ij}Y_{ij}$. – hans Nov 14 '15 at 07:45
