3

I am trying to prove that

$$\displaystyle{\frac{\partial}{\partial x} \left( x\cdot x^\top \right) = x\otimes \mathbb{I}+\mathbb{I}\otimes x}$$

The product $x\cdot x^\top$ is a rank-$1$ matrix, so this is the derivative of a matrix with respect to a vector. How do we handle such a case? Can you please help me prove the equality?

darkmoor
  • 1
    Is the equality even correct? Aren't you forgetting to vectorize the left-hand side before taking the Jacobian? – Rodrigo de Azevedo Jul 10 '18 at 22:15
  • @RodrigodeAzevedo I was trying to compute the derivative and came across this [site](http://www.matrixcalculus.org/), which is where I took it from. It has to be correct, because this derivative is part of a larger derivative for which the site and the paper I am reading give the same result. – darkmoor Jul 11 '18 at 07:36
  • 1
    Sorry, but it has to be incorrect! Differentiating a matrix with respect to a vector produces a $3$-dimensional matrix. However, if the matrix is first vectorized, then the derivative with respect to a vector would be a tall Jacobian matrix (which is what you have). – Rodrigo de Azevedo Jul 11 '18 at 09:30

2 Answers

2

I prefer to use Fréchet derivatives when thinking about derivatives that involve matrices. If you prefer another concept of derivative, let me know.

Computation

Let $F: \mathbb R^n \to \mathrm{Mat}(n\times n): x \mapsto x \cdot x^T$. We compute $$F(x + h) - F(x) = (x+h) \cdot (x+h)^T - x \cdot x^T = h \cdot x ^T + x \cdot h^T + h \cdot h^T.$$ Since $\|h \cdot h^T\| = O(\|h\|^2)$, this gives $$F(x+h)-F(x) = (\mathbb I \otimes x + x \otimes \mathbb I)[h] + o(\|h\|).$$

This implies $DF(x) = \mathbb I \otimes x + x \otimes \mathbb I$.

($\mathbb I \otimes x: \mathbb R^n \to \mathrm{Mat}(n\times n): h \mapsto h \cdot x^T$ and $x \otimes \mathbb I: \mathbb R^n \to \mathrm{Mat}(n\times n): h \mapsto x \cdot h^T$.)
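As a rough numerical sanity check of the computation above, the following pure-Python sketch evaluates the remainder $F(x+h) - F(x) - (h \cdot x^T + x \cdot h^T)$ for shrinking $h$. The helper names (`outer`, `F`, `DF`, `remainder_norm`) and the test vectors are illustrative choices, not part of the original answer.

```python
# Sketch: the remainder F(x+h) - F(x) - DF(x)[h] should equal h h^T,
# so its size should shrink like ||h||^2.

def outer(a, b):
    """Outer product a b^T as a list of rows."""
    return [[ai * bj for bj in b] for ai in a]

def F(x):
    """F(x) = x x^T."""
    return outer(x, x)

def DF(x, h):
    """Candidate derivative applied to h: h x^T + x h^T."""
    hx, xh = outer(h, x), outer(x, h)
    n = len(x)
    return [[hx[i][j] + xh[i][j] for j in range(n)] for i in range(n)]

def remainder_norm(x, h):
    """Largest absolute entry of F(x+h) - F(x) - DF(x)[h]."""
    n = len(x)
    shifted = [xi + hi for xi, hi in zip(x, h)]
    F_shifted, F_x, lin = F(shifted), F(x), DF(x, h)
    return max(
        abs(F_shifted[i][j] - F_x[i][j] - lin[i][j])
        for i in range(n)
        for j in range(n)
    )

x = [1.0, -2.0, 0.5]   # arbitrary base point (assumption)
v = [0.3, 0.1, -0.7]   # arbitrary direction (assumption)
for t in (1e-1, 1e-2, 1e-3):
    h = [t * vi for vi in v]
    # Each tenfold reduction of t should shrink the remainder about a hundredfold.
    print(t, remainder_norm(x, h))
```

The quadratic decay of the printed values is what the $o(\|h\|)$ term in the computation expresses.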

Interpretation

The Fréchet derivative of $F$ at a point $x$ is a linear map $DF(x): \mathbb R^n \to \mathrm{Mat}(n \times n)$, which is the best linear approximation of $F$ near $x$.

We have already computed how $\mathrm D F(x)$ acts on a vector $h$, namely $\mathrm D F(x): h \mapsto h \cdot x^T + x \cdot h^T$.

Sometimes a better notation or a matrix-like representation is needed. One possibility is to identify $\mathrm{Mat}(n \times n)$ with $\mathbb R^{n\cdot n}$ and to represent the linear map as a matrix (of dimension $n^2 \times n$). But this is not really instructive.

Instead we use the tensor notation as above and get a nice and short representation for this linear map.

To practice this concept, you could try to compute the Fréchet derivative of the map $\mathrm{Mat}(n \times n) \to \mathrm{Mat}(n \times n): X \mapsto X^T \cdot X$.

Clarification

As pointed out by RodrigodeAzevedo, the term $\mathbb I \otimes x$ is more commonly interpreted as a Kronecker product, which leads to a vectorized expression.

Hence $$\mathbb I \otimes x = \begin{pmatrix} 1 \cdot x& \dots &0 \cdot x \\ \vdots & \ddots & \vdots \\ 0 \cdot x & \dots &1 \cdot x \end{pmatrix} \in \mathbb R^{n^2 \times n}$$ and $$x \otimes \mathbb I = \begin{pmatrix} x_1 \cdot \mathbb I \\ \vdots \\ x_n \cdot \mathbb I \end{pmatrix} \in \mathbb R^{n^2 \times n}.$$
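For a concrete instance of these two block matrices, take $n = 2$ and $x = (x_1, x_2)^T$ (a small worked case added for illustration, using the Kronecker convention just stated):

$$\mathbb I \otimes x = \begin{pmatrix} x_1 & 0 \\ x_2 & 0 \\ 0 & x_1 \\ 0 & x_2 \end{pmatrix}, \qquad x \otimes \mathbb I = \begin{pmatrix} x_1 & 0 \\ 0 & x_1 \\ x_2 & 0 \\ 0 & x_2 \end{pmatrix}, \qquad \mathbb I \otimes x + x \otimes \mathbb I = \begin{pmatrix} 2 x_1 & 0 \\ x_2 & x_1 \\ x_2 & x_1 \\ 0 & 2 x_2 \end{pmatrix}.$$

The sum agrees with the Jacobian of $\mathrm{vec}(x \cdot x^T) = (x_1^2,\ x_1 x_2,\ x_1 x_2,\ x_2^2)^T$ with respect to $(x_1, x_2)$.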

If we use this definition for $\otimes$, we have to vectorize the matrices in order to prove that our candidate is a derivative in the usual sense, i.e. $$ \lim_{h\to 0} \frac{\mathrm{vec}\big( F(x+h) - F(x) \big) - (\mathbb I \otimes x + x \otimes \mathbb I)h }{\|h\|} = 0.$$
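The vectorized statement can be checked on a small example. The sketch below is pure Python with hypothetical helper names (`kron`, `vec`, `matvec`, and so on), and it assumes that $\mathrm{vec}$ stacks columns. It builds the $n^2 \times n$ matrix $\mathbb I \otimes x + x \otimes \mathbb I$ and compares its action on $h$ with $\mathrm{vec}(h \cdot x^T + x \cdot h^T)$.

```python
# Sketch: compare (I kron x + x kron I) h with vec(h x^T + x h^T),
# assuming vec stacks the columns of a matrix.

def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def column(x):
    """Turn a flat list into an n x 1 column matrix."""
    return [[xi] for xi in x]

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def vec(M):
    """Stack the columns of M into one flat list."""
    rows, cols = len(M), len(M[0])
    return [M[i][j] for j in range(cols) for i in range(rows)]

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]

def outer(a, b):
    """Outer product a b^T."""
    return [[ai * bj for bj in b] for ai in a]

def add(A, B):
    """Entrywise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

x = [1, -2, 3]   # integer test data (assumption) so the comparison is exact
h = [4, 0, -1]
n = len(x)

J = add(kron(identity(n), column(x)), kron(column(x), identity(n)))
lhs = matvec(J, h)                          # (I kron x + x kron I) h
rhs = vec(add(outer(h, x), outer(x, h)))    # vec(h x^T + x h^T)
print(len(J), len(J[0]))                    # shape of J: n^2 rows, n columns
print(lhs == rhs)
```

With column stacking, `kron(column(x), identity(n))` alone reproduces $\mathrm{vec}(h \cdot x^T)$ and `kron(identity(n), column(x))` alone reproduces $\mathrm{vec}(x \cdot h^T)$. Only the sum matters for the derivative.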

Steffen Plunder
  • 2
    How come $(I\otimes x + x \otimes I)h = hx^{T}+xh^{T}$ ? – Jackozee Hakkiuz Jul 10 '18 at 23:04
  • Oh, I didn't take much care at this point! Using the Kronecker product, it works out well. I'm not sure how a tensor product is defined between a linear map $\mathbb R^n \to \mathbb R^n$ and a vector in $\mathbb R^n$. With $x$ seen as a map $\mathbb R \to \mathbb R^n: t \mapsto t x$, it seems to work out, since $\mathbb R^n \otimes \mathbb R \cong \mathbb R^n$. (https://en.wikipedia.org/wiki/Tensor_product#Tensor_product_of_linear_maps, using $S=\mathbb I$, $V=X=\mathbb R^n$ and $T=x$, $W=\mathbb R$ and $Y=\mathbb R^n$) – Steffen Plunder Jul 11 '18 at 00:03
  • @SteffenPlunder thanks for the answer. I am not used to this approach, only to the usual matrix calculus (e.g. computing Jacobians). Please give me some time to study your answer; I am very busy with other things these days and work on these concepts in my free time. Any alternative approach is of course welcome. – darkmoor Jul 11 '18 at 07:40
  • It should be $$\mbox{vec} \left( F(x+h) - F(x) \right)= (\mathbb I \otimes x + x \otimes \mathbb I)[h] + o(\|h\|)$$ – Rodrigo de Azevedo Jul 11 '18 at 10:08
  • @RodrigodeAzevedo My answer was all about trying to avoid the vectorization of matrices by using Fréchet derivatives. If $\mathbb I \otimes x$ is defined as in my answer, I would argue that both sides of the equality are matrices and there is no need to apply the isomorphism to vectorize them. As pointed out in the comments, there is a way to see $\mathbb I \otimes x$ as a usual tensor product as well. If $\otimes$ is the Kronecker product, you are right. I will add a correction. If you still think it's wrong, I would prefer to delete the answer instead of using vectorization. – Steffen Plunder Jul 11 '18 at 10:57
  • I have no idea what a tensor product is. – Rodrigo de Azevedo Jul 11 '18 at 11:00
  • It's a useful and universal concept! Basically $x \cdot x^T$ is already a tensor product, so it is natural to use tensor products. Physicists usually use indices to describe them: $(x_i)_{i=1,\dots,n}$ is a 1-tensor and $x \cdot x^T = (x_i \cdot x_j )_{i,j=1,\dots,n}$ is a 2-tensor. Vectorization is then the process of using the $n^2$ numbers $1,2,\dots,n^2$ as indices instead of the $n^2$ pairs $(1,1),(1,2),\dots,(n,n)$. Using indices, a tensor product between a 1-tensor and a 2-tensor is just $x \otimes A = (x_i A_{kj})_{i,j,k=1,\dots,n}$. But this is only a partial description. – Steffen Plunder Jul 11 '18 at 11:32
2

Let the matrix-valued function $\mathrm F : \mathbb R^n \to \mathbb R^{n \times n}$ be defined by

$$\mathrm F (\mathrm x) := \mathrm x \mathrm x^\top$$

Its directional derivative in the direction of $\rm v$ at $\rm x$ is

$$\lim_{h \to 0} \frac 1h \big( \mathrm F (\mathrm x + h \mathrm v) - \mathrm F (\mathrm x) \big) = \mathrm v \mathrm x^\top + \mathrm x \mathrm v^\top$$
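The limit follows from expanding the difference directly, a step spelled out here for completeness:

$$\mathrm F (\mathrm x + h \mathrm v) - \mathrm F (\mathrm x) = h \big( \mathrm v \mathrm x^\top + \mathrm x \mathrm v^\top \big) + h^2 \, \mathrm v \mathrm v^\top$$

Dividing by $h$ leaves a remainder $h \, \mathrm v \mathrm v^\top$, which vanishes as $h \to 0$.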

Vectorizing, we obtain

$$\mbox{vec} \big( \mathrm v \mathrm x^\top + \mathrm x \mathrm v^\top \big) = \mbox{vec} \big( \mathrm I_n \mathrm v \mathrm x^\top + \mathrm x \mathrm v^\top \mathrm I_n \big) = \big( \color{blue}{\mathrm x \otimes \mathrm I_n + \mathrm I_n \otimes \mathrm x} \big) \mathrm v$$
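The last equality rests on the standard identity $\mbox{vec} (\mathrm A \mathrm X \mathrm B) = \big( \mathrm B^\top \otimes \mathrm A \big) \mbox{vec} (\mathrm X)$, where $\mbox{vec}$ stacks columns. Applied to each term separately (a step added here for readers who have not seen the identity), it gives

$$\mbox{vec} \big( \mathrm I_n \mathrm v \mathrm x^\top \big) = \big( \mathrm x \otimes \mathrm I_n \big) \mbox{vec} (\mathrm v) = \big( \mathrm x \otimes \mathrm I_n \big) \mathrm v, \qquad \mbox{vec} \big( \mathrm x \mathrm v^\top \mathrm I_n \big) = \big( \mathrm I_n \otimes \mathrm x \big) \mbox{vec} (\mathrm v^\top) = \big( \mathrm I_n \otimes \mathrm x \big) \mathrm v$$

where $\mbox{vec} (\mathrm v^\top) = \mathrm v$ because stacking the columns of a row vector returns its entries in order.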