I have a question about the following derivative. Let $X\in\mathbb{R}^{m\times n}$ and $z\in\mathbb{R}^{n}$; I would like to find the derivative $$\frac{\partial (z^{T}X^{T}Xz)}{\partial z}. $$ Any ideas? It gives me a hard time. Thank you.
-
Where is $Y$? I could not see it. – Red Phoenix Dec 06 '20 at 19:47
-
It was a mistake. I deleted it. – Laura Dec 06 '20 at 19:51
-
Probably yes, but it is still too complicated and not entirely clear. – Laura Dec 06 '20 at 21:42
-
This question has been asked dozens of times. There are 63 questions linking to the question I linked to. – Rodrigo de Azevedo Dec 06 '20 at 21:45
-
This question should be closed. It is ridiculous to answer the very same question in at least 65 different places. – Rodrigo de Azevedo Dec 11 '20 at 16:50
3 Answers
$$\frac{\partial (z^{T}X^{T}Xz)}{\partial z} = 2X^T X z.$$
Indeed:
$$z^{T}X^{T}Xz = \sum_{k=1}^n\sum_{h=1}^m \sum_{j=1}^n z_kx_{hk}x_{hj}z_j.$$
Fix $i \in \{1, \ldots, n\}$. Observe that:
$$\sum_{k=1}^n\sum_{h=1}^m \sum_{j=1}^n z_kx_{hk}x_{hj}z_j = \sum_{k=1}^n\sum_{h=1}^m \left(\sum_{j=1, j \neq i}^n z_kx_{hk}x_{hj}z_j + z_kx_{hk}x_{hi}z_i\right) = \\ = \sum_{k=1, k \neq i}^n\left[\sum_{h=1}^m \left(\sum_{j=1, j \neq i}^n z_kx_{hk}x_{hj}z_j + z_kx_{hk}x_{hi}z_i\right)\right] + \\ +\sum_{h=1}^m \left(\sum_{j=1, j \neq i}^n z_ix_{hi}x_{hj}z_j + z_ix_{hi}x_{hi}z_i\right).$$
As you can see, some terms depend on $z_i$ and the others do not, i.e.
$$\sum_{k=1}^n\sum_{h=1}^m \sum_{j=1}^n z_kx_{hk}x_{hj}z_j = \sum_{k=1, k \neq i}^n\sum_{h=1}^m z_kx_{hk}x_{hi}z_i + \sum_{h=1}^m \sum_{j=1, j \neq i}^n z_ix_{hi}x_{hj}z_j +\sum_{h=1}^m z_ix_{hi}x_{hi}z_i + \text{terms which do not depend on}~i.$$
Taking the derivative with respect to $z_i$, one gets:
$$\frac{\partial z^{T}X^{T}Xz}{\partial z_i} = \sum_{k=1, k \neq i}^n\sum_{h=1}^m z_kx_{hk}x_{hi} + \sum_{h=1}^m \sum_{j=1, j \neq i}^n x_{hi}x_{hj}z_j + 2\sum_{h=1}^m x_{hi}x_{hi}z_i = \\ = \sum_{k=1, k \neq i}^n\sum_{h=1}^m z_kx_{hk}x_{hi} + \sum_{h=1}^m \sum_{k=1, k \neq i}^n x_{hi}x_{hk}z_k + 2\sum_{h=1}^m x_{hi}x_{hi}z_i = \\ = 2\sum_{k=1, k \neq i}^n\sum_{h=1}^m z_kx_{hk}x_{hi} +2\sum_{h=1}^m x_{hi}x_{hi}z_i = \\ = 2\left(\sum_{k=1, k \neq i}^n\sum_{h=1}^m z_kx_{hk}x_{hi} +\sum_{h=1}^m x_{hi}x_{hi}z_i\right) = \\ = 2\sum_{h=1}^m\sum_{k=1}^n x_{hi} x_{hk} z_k, $$
which corresponds to twice the $i$-th component of $X^T Xz.$
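A quick numerical sanity check of the index formula may help. The sketch below is plain Python with no libraries; the matrix `X`, the vector `z`, the step `eps` and the helper names are illustrative choices, not part of the answer. It evaluates $2\sum_h\sum_k x_{hi}x_{hk}z_k$ directly and compares it with a central finite difference of $z^TX^TXz$.

```python
def grad_componentwise(X, z):
    """i-th entry: 2 * sum_h sum_k x_{hi} x_{hk} z_k (the formula derived above)."""
    m, n = len(X), len(z)
    return [2 * sum(X[h][i] * X[h][k] * z[k] for h in range(m) for k in range(n))
            for i in range(n)]

def quad_form(X, z):
    """f(z) = z^T X^T X z, computed as ||Xz||^2."""
    return sum(sum(row[j] * z[j] for j in range(len(z))) ** 2 for row in X)

def grad_central_diff(f, z, eps=1e-6):
    """Approximate each partial derivative by (f(z + eps e_i) - f(z - eps e_i)) / (2 eps)."""
    g = []
    for i in range(len(z)):
        zp = list(z); zp[i] += eps
        zm = list(z); zm[i] -= eps
        g.append((f(zp) - f(zm)) / (2 * eps))
    return g

# Hypothetical example data: m = 2, n = 3.
X = [[1.0, 2.0, 0.0],
     [0.5, -1.0, 3.0]]
z = [1.0, -2.0, 0.5]

print(grad_componentwise(X, z))   # [-2.0, -20.0, 24.0]
print(grad_central_diff(lambda w: quad_form(X, w), z))
```

For this `X` and `z` one has $Xz = (-3, 4)^T$ and $2X^TXz = (-2, -20, 24)^T$; the finite-difference values should agree up to rounding error.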

-
How do you know that the result is this? Could you describe some steps? Thanks. – Laura Dec 06 '20 at 19:36
-
Thanks for the time you spent on it. I really appreciate this. However, is it possible to explain it without using the elements of the vectors, i.e. somehow in matrix form? Let's say I am supposed to do this during an exam as part of an example and I do not have much time for it... – Laura Dec 06 '20 at 20:53
-
This question will eventually be closed. You may want to re-post your answer at the linked question with almost 50 upvotes. – Rodrigo de Azevedo Dec 11 '20 at 16:51
Let $f: \mathbb{R}^n\to \mathbb{R}$ be the map $f(z)= z^TX^TXz$ for $X\in\mathbb{R}^{m\times n}$ and $z\in \mathbb{R}^{n\times 1}$. We have $$ Df(z)\cdot v= v^TX^TXz+z^TX^TXv. $$ In fact, $$ f(z+v)=(z+v)^TX^TX(z+v)= z^TX^TXz+v^TX^TXz+z^TX^TXv+v^TX^TXv. $$ Then, since $0\le v^TX^TXv=\|Xv\|^2\le\|X\|^2\|v\|^2$, $$ \lim_{v\to 0}\frac{f(z+v)-f(z)- \left(v^TX^TXz+z^TX^TXv\right)}{\|v\|} = \lim_{v\to 0}\frac{v^TX^TXv}{\|v\|} =0. $$ The derivative is the linear map $$ \mathbb{R}^n\ni v\longmapsto Df(z)\cdot v=v^TX^TXz+z^TX^TXv\in \mathbb{R}. $$ Since $v^TX^TXz$ is a scalar, it equals its transpose $z^TX^TXv$, so $Df(z)\cdot v = 2z^TX^TXv = (2X^TXz)^Tv$, i.e. the gradient is $2X^TXz$.
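The remainder argument can also be watched numerically. In this plain-Python sketch (the data `X`, `z`, the direction `d` and all function names are illustrative assumptions), the remainder $f(z+v)-f(z)-Df(z)\cdot v$ equals $v^TX^TXv=\|Xv\|^2$, so dividing it by $\|v\|$ gives a quantity that should shrink linearly as $v=t\,d$ shrinks.

```python
import math

def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def f(X, z):
    """f(z) = z^T X^T X z, computed as ||Xz||^2."""
    y = matvec(X, z)
    return dot(y, y)

def Df(X, z, v):
    """Df(z).v = v^T X^T X z + z^T X^T X v; both terms equal (Xv).(Xz)."""
    return 2 * dot(matvec(X, v), matvec(X, z))

def remainder_ratio(X, z, v):
    """(f(z+v) - f(z) - Df(z).v) / ||v||, which should tend to 0 as v -> 0."""
    z_plus_v = [zi + vi for zi, vi in zip(z, v)]
    r = f(X, z_plus_v) - f(X, z) - Df(X, z, v)
    return r / math.sqrt(dot(v, v))

X = [[1.0, 2.0, 0.0],
     [0.5, -1.0, 3.0]]
z = [1.0, -2.0, 0.5]
d = [0.3, 0.1, -0.4]   # hypothetical direction of approach

for t in (1e-1, 1e-2, 1e-3):
    print(t, remainder_ratio(X, z, [t * di for di in d]))
```

Each tenfold reduction of $t$ should reduce the printed ratio roughly tenfold, consistent with the remainder being $o(\|v\|)$.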

-
Why do you have $f:\mathbb{R}^{n}\to \mathbb{R}^{n}$? I would say that it should be $f:\mathbb{R}^{n}\to \mathbb{R}$. – Laura Dec 07 '20 at 19:46
-
Define a new vector $$y=Xz$$ Write the function in terms of this vector. Then calculate the differential and gradient. $$\eqalign{ \phi &= y:y \\ d\phi &= 2y:dy &= 2y:X\,dz &= 2X^Ty:dz \\ \frac{\partial \phi}{\partial z} &= 2X^Ty &= 2X^TXz \\ }$$
In the above, a colon is used to denote the trace/Frobenius product, i.e. $$A:B = \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} = {\rm Tr}(A^TB)$$ This product is obviously commutative, i.e. $\,A\!:\!B=B\!:\!A$
In addition, the cyclic property of the trace permits the terms in such a product to be rearranged in a number of equivalent ways, e.g. $$A:BC \;=\; AC^T:B \;=\; B^TA:C$$ Finally, the product is applicable to vectors by treating them as rectangular matrices (set $n=1$) in which case it becomes the familiar dot product.
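The rearrangement rules can be checked on small integer matrices, so that every comparison is exact. This is a plain-Python sketch; the matrices `A`, `B`, `C`, `X`, `z`, `dz` and the helpers `frob`, `matmul`, `transpose`, `trace` are illustrative, not from the answer. The last check is the specific step $2y:X\,dz = 2X^Ty:dz$ used in the derivation, i.e. $A:BC=B^TA:C$ with $A=2y$, $B=X$, $C=dz$, where vectors are stored as $n\times 1$ matrices.

```python
def frob(A, B):
    """Frobenius product A:B = sum_ij A_ij * B_ij (equals Tr(A^T B)); shapes must match."""
    assert len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def matmul(A, B):
    """Plain triple-loop product of list-of-rows matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3, same shape as BC
B = [[1, -1], [2, 0]]             # 2 x 2
C = [[0, 1, 2], [3, -1, 1]]       # 2 x 3

BC = matmul(B, C)
print(frob(A, BC),                          # A:BC
      frob(matmul(A, transpose(C)), B),     # AC^T:B
      frob(matmul(transpose(B), A), C),     # B^TA:C
      trace(matmul(transpose(A), BC)))      # Tr(A^T BC)
# 38 38 38 38

# The step 2y:X dz = 2X^T y:dz, with y = Xz.
X = [[1, 2, 0], [1, -1, 3]]       # m = 2, n = 3
z = [[1], [-2], [1]]              # n x 1
dz = [[2], [0], [-1]]             # n x 1
y2 = [[2 * e[0]] for e in matmul(X, z)]    # 2y = 2Xz, m x 1
print(frob(y2, matmul(X, dz)), frob(matmul(transpose(X), y2), dz))
# -24 -24
```

Because the second form reads $2X^Ty : dz$, the factor multiplying $dz$ is the gradient $2X^Ty = 2X^TXz$, which can be computed with two matrix-vector products without forming $X^TX$.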
