3

I have a question about the following derivative. Let $X\in\mathbb{R}^{m\times n}$ and $z\in\mathbb{R}^{n}$. I would like to find the derivative $$\frac{\partial (z^{T}X^{T}Xz)}{\partial z}. $$ Any ideas? It is giving me a hard time. Thank you.

Laura

3 Answers

2

$$\frac{\partial (z^{T}X^{T}Xz)}{\partial z} = 2X^T X z.$$

Indeed:

$$z^{T}X^{T}Xz = \sum_{k=1}^n\sum_{h=1}^m \sum_{j=1}^n z_kx_{hk}x_{hj}z_j.$$

Fix $i \in \{1, \ldots, n\}$. Observe that:

$$\sum_{k=1}^n\sum_{h=1}^m \sum_{j=1}^n z_kx_{hk}x_{hj}z_j = \sum_{k=1}^n\sum_{h=1}^m \left(\sum_{j=1, j \neq i}^n z_kx_{hk}x_{hj}z_j + z_kx_{hk}x_{hi}z_i\right) = \\ = \sum_{k=1, k \neq i}^n\left[\sum_{h=1}^m \left(\sum_{j=1, j \neq i}^n z_kx_{hk}x_{hj}z_j + z_kx_{hk}x_{hi}z_i\right)\right] + \\ +\sum_{h=1}^m \left(\sum_{j=1, j \neq i}^n z_ix_{hi}x_{hj}z_j + z_ix_{hi}x_{hi}z_i\right).$$

As you can see, some terms depend on $z_i$ and others do not, i.e.

$$\sum_{k=1}^n\sum_{h=1}^m \sum_{j=1}^n z_kx_{hk}x_{hj}z_j = \sum_{k=1, k \neq i}^n\sum_{h=1}^m z_kx_{hk}x_{hi}z_i + \sum_{h=1}^m \sum_{j=1, j \neq i}^n z_ix_{hi}x_{hj}z_j +\sum_{h=1}^m z_ix_{hi}x_{hi}z_i + \text{terms which do not depend on}~z_i.$$

Taking the derivative with respect to $z_i$, one gets:

$$\frac{\partial z^{T}X^{T}Xz}{\partial z_i} = \sum_{k=1, k \neq i}^n\sum_{h=1}^m z_kx_{hk}x_{hi} + \sum_{h=1}^m \sum_{j=1, j \neq i}^n x_{hi}x_{hj}z_j + 2\sum_{h=1}^m x_{hi}x_{hi}z_i = \\ = \sum_{k=1, k \neq i}^n\sum_{h=1}^m z_kx_{hk}x_{hi} + \sum_{h=1}^m \sum_{k=1, k \neq i}^n x_{hi}x_{hk}z_k + 2\sum_{h=1}^m x_{hi}x_{hi}z_i = \\ = 2\sum_{k=1, k \neq i}^n\sum_{h=1}^m z_kx_{hk}x_{hi} +2\sum_{h=1}^m x_{hi}x_{hi}z_i = \\ = 2\left(\sum_{k=1, k \neq i}^n\sum_{h=1}^m z_kx_{hk}x_{hi} +\sum_{h=1}^m x_{hi}x_{hi}z_i\right) = \\ = 2\sum_{h=1}^m\sum_{k=1}^n x_{hi} x_{hk} z_k, $$

which corresponds to twice the $i$-th component of $X^T Xz.$
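The component-wise result above can be sanity-checked numerically. The following is a minimal sketch, not a proof: it compares the closed form $2X^TXz$ with central finite differences taken one coordinate $z_i$ at a time. The sizes `m`, `n`, the step `h`, the random seed, and all helper names are illustrative assumptions, and plain Python lists stand in for matrices.

```python
import random

def matvec(A, v):
    """Matrix-vector product A v, with A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]

def f(X, z):
    """f(z) = z^T X^T X z, computed as the squared norm of X z."""
    y = matvec(X, z)
    return sum(t * t for t in y)

def grad_closed_form(X, z):
    """The claimed gradient 2 X^T X z."""
    return [2.0 * g for g in matvec(transpose(X), matvec(X, z))]

def grad_finite_diff(X, z, h=1e-6):
    """Central differences in each coordinate z_i (h is an assumed step)."""
    g = []
    for i in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += h
        z_minus[i] -= h
        g.append((f(X, z_plus) - f(X, z_minus)) / (2 * h))
    return g

random.seed(0)  # arbitrary seed, only for reproducibility
m, n = 4, 3     # arbitrary small sizes
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
z = [random.uniform(-1, 1) for _ in range(n)]

exact = grad_closed_form(X, z)
approx = grad_finite_diff(X, z)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
print("largest componentwise difference:", max_err)
```

Because $f$ is quadratic in $z$, central differences agree with the true partial derivatives up to rounding, so the printed difference should be tiny.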

the_candyman
  • How do you know that the result is this? Could you describe some steps? Thanks. – Laura Dec 06 '20 at 19:36
  • @Laura I've added more details to my answer. Please, check them. – the_candyman Dec 06 '20 at 20:49
  • Thanks for the time you spent on it. I really appreciate this. However, is it possible to explain it without using the elements of the vectors, i.e. in matrix form? Let's say I am supposed to do this during an exam as part of an example and I do not have much time for it. – Laura Dec 06 '20 at 20:53
  • This question will eventually be closed. You may want to re-post your answer at the linked question with almost 50 upvotes. – Rodrigo de Azevedo Dec 11 '20 at 16:51
0

Let $f: \mathbb{R}^n\to \mathbb{R}$ be the map $f(z)= z^TX^TXz $ for $X\in\mathbb{R}^{m\times n}$ and $z\in \mathbb{R}^{n\times 1}$. We have $$ Df(z)\cdot v= v^TX^TXz+z^TX^TXv. $$ In fact, $$ f(z+v)=(z+v)^TX^TX(z+v)= z^TX^TXz+v^TX^TXz+z^TX^TXv+v^TX^TXv. $$ Then $$ \lim_{v\to 0}\frac{f(z+v)-f(z)- \left(v^TX^TXz+z^TX^TXv\right)}{\|v\|} = \lim_{v\to 0}\frac{v^TX^TXv}{\|v\|} =0, $$ since $v^TX^TXv = \|Xv\|^2 \le \|X\|^2\|v\|^2$. The derivative is therefore the linear map $$ \mathbb{R}^n\ni v\longmapsto Df(z)\cdot v=v^TX^TXz+z^TX^TXv\in \mathbb{R}. $$ The two terms are scalars and transposes of each other, so $Df(z)\cdot v = 2(X^TXz)^Tv$, i.e. the gradient is $2X^TXz$.
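The limit in this argument can also be observed numerically. The sketch below is only an illustration under assumed inputs (random `X`, `z`, `v`, arbitrary sizes and seed, hypothetical helper names): it evaluates the remainder $f(z+tv)-f(z)-Df(z)\cdot(tv)$ divided by $\|tv\|$ for shrinking $t$.

```python
import math
import random

def matvec(A, v):
    """Matrix-vector product A v, with A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def f(X, z):
    """f(z) = z^T X^T X z = (Xz).(Xz)."""
    Xz = matvec(X, z)
    return dot(Xz, Xz)

def Df(X, z, v):
    """Df(z).v = v^T X^T X z + z^T X^T X v, written via Xz and Xv."""
    Xz, Xv = matvec(X, z), matvec(X, v)
    return dot(Xv, Xz) + dot(Xz, Xv)

random.seed(1)  # arbitrary seed, only for reproducibility
m, n = 4, 3     # arbitrary small sizes
X = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
z = [random.uniform(-1, 1) for _ in range(n)]
v = [random.uniform(-1, 1) for _ in range(n)]

ratios = []
for t in (1e-1, 1e-2, 1e-3):
    tv = [t * c for c in v]
    z_plus = [a + b for a, b in zip(z, tv)]
    # The remainder equals (tv)^T X^T X (tv) up to rounding.
    remainder = f(X, z_plus) - f(X, z) - Df(X, z, tv)
    ratios.append(abs(remainder) / math.sqrt(dot(tv, tv)))

print(ratios)
```

Since the remainder is $t^2\,v^TX^TXv$ and the norm is $t\|v\|$, each ratio should be roughly ten times smaller than the previous one, consistent with the limit being zero.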

Elias Costa
0

Define a new vector $$y=Xz$$ Write the function in terms of this vector. Then calculate the differential and gradient. $$\eqalign{ \phi &= y:y \\ d\phi &= 2y:dy &= 2y:X\,dz &= 2X^Ty:dz \\ \frac{\partial \phi}{\partial z} &= 2X^Ty &= 2X^TXz \\ }$$


In the above, a colon is used to denote the trace/Frobenius product, i.e. $$A:B = \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} = {\rm Tr}(A^TB)$$ This product is obviously commutative, i.e. $\,A\!:\!B=B\!:\!A$

In addition, the cyclic property of the trace permits the terms in such a product to be rearranged in a number of equivalent ways, e.g. $$A:BC \;=\; AC^T:B \;=\; B^TA:C$$ Finally, the product is applicable to vectors by treating them as rectangular matrices (set $n=1$) in which case it becomes the familiar dot product.
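The rearrangement rules for the colon product can be checked on random matrices. This is a minimal sketch with assumed shapes ($A$ is $p\times r$, $B$ is $p\times q$, $C$ is $q\times r$, so that every product is defined); the helper names, sizes, and seed are hypothetical.

```python
import random

def matmul(A, B):
    """Matrix product A B for list-of-rows matrices."""
    B_cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols]
            for row in A]

def transpose(A):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]

def frob(A, B):
    """Colon product A:B = sum_ij A_ij B_ij = Tr(A^T B)."""
    return sum(a * b for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))

def rand_matrix(rows, cols):
    """Random matrix with entries in [-1, 1]."""
    return [[random.uniform(-1, 1) for _ in range(cols)]
            for _ in range(rows)]

random.seed(2)    # arbitrary seed, only for reproducibility
p, q, r = 3, 4, 2  # arbitrary small sizes
A = rand_matrix(p, r)
B = rand_matrix(p, q)
C = rand_matrix(q, r)

lhs = frob(A, matmul(B, C))             # A : BC
mid = frob(matmul(A, transpose(C)), B)  # A C^T : B
rhs = frob(matmul(transpose(B), A), C)  # B^T A : C
sym = frob(matmul(B, C), A)             # BC : A (commutativity)

print(lhs, mid, rhs, sym)
```

The step $2y:X\,dz = 2X^Ty:dz$ in the derivation is the third form with $A=y$, $B=X$, $C=dz$ and $r=1$.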

greg