
I would like to compute the following derivative:

$$\frac{d}{d\mathbf{X}} |\mathbf{X}^{T} \mathbf{X}|$$

Minka, in 'Old and New Matrix Algebra Useful for Statistics', says:

$$\frac{d}{d\mathbf{X}}|\mathbf{X}^{T} \mathbf{X}| = 2 |\mathbf{X}^{T}\mathbf{X}| (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T} $$

His reasoning is that, because

$$ d|\mathbf{X}^{T}\mathbf{X}| = 2|\mathbf{X}^{T}\mathbf{X}|\text{tr}( (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T} d\mathbf{X} ) $$

holds, we obtain the derivative stated above.


Question: How do we get rid of the trace? What justifies this?

Is this saying that:

$$ \text{tr}( (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T} d\mathbf{X} ) = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}d\mathbf{X} $$

That doesn't seem right to me.
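To make the objects explicit (my own rewriting, not Minka's notation), the left-hand side is a single number,

$$ \text{tr}\big( (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T} \, d\mathbf{X} \big) = \sum_{i,j} \big[ \mathbf{X} (\mathbf{X}^{T}\mathbf{X})^{-1} \big]_{ij} \, [d\mathbf{X}]_{ij}, $$

while the right-hand side is a matrix, so I don't see how the two could be equal.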

@greg has a way of doing this using the Frobenius inner product, but I haven't seen that approach anywhere except in some answers here.

Could anyone provide some insight into what is going on?

Thanks

  • Given the derivative expression you've provided, the notation $|M| = \det(M)$. – greg Dec 11 '19 at 15:07
  • We get rid of the trace because a directional derivative is one thing and a gradient is another. We extract the gradient from the Frobenius inner product. – Rodrigo de Azevedo Dec 11 '19 at 21:35
  • @RodrigodeAzevedo We can do that? Man I am so lost haha I really need a reference to start learning this then instead of just jumping in and solving problems. – the_src_dude Dec 12 '19 at 11:03
  • @the_src_dude Take a look at this. In matrix calculus, one uses the Frobenius inner product instead, but the idea is similar. Exploit the cyclical property of the trace till one obtains a Frobenius inner product in which one of the inputs is the direction matrix. The other input is then the gradient. Take a look at this. – Rodrigo de Azevedo Dec 13 '19 at 00:05
  • @RodrigodeAzevedo Thanks! I am slowly getting the hang of this. Your references are of great help! – the_src_dude Dec 13 '19 at 13:10
  • @the_src_dude Take a look at this one, too, as it's quite similar to your question. – Rodrigo de Azevedo Dec 14 '19 at 00:34

1 Answer


Let
$$Y=X^TX$$
The gradient of $\det Y$ with respect to $Y$ is a well-known result which can be looked up in the Matrix Cookbook or on Wikipedia:
$$\eqalign{
f &= \det Y \\
G = \frac{\partial f}{\partial Y} &= (\det Y)\;Y^{-T} \\
}$$
All that's needed to answer this question is a change of variables from $Y\to X$:
$$\eqalign{
df &= G:dY \\
 &= G:(dX^TX + X^TdX) \\
 &= G:dX^TX + G:X^TdX \\
 &= G^T:X^TdX + G:X^TdX \\
 &= (G^T+G):X^TdX \\
 &= 2G:X^TdX \\
 &= 2XG:dX \\
 &= 2(\det Y)\;XY^{-T}:dX \\
 &= 2(\det X^TX)\;X(X^TX)^{-1}:dX \\
\frac{\partial f}{\partial X} &= 2(\det X^TX)\;X(X^TX)^{-1} \\
}$$
where a colon denotes the Frobenius product, a convenient notation for the trace, i.e.
$$\eqalign{
A:B = {\rm Tr}(A^TB) = \sum_{ij}A_{ij}B_{ij} \\
}$$
(The step $G^T+G=2G$ uses the fact that $Y=X^TX$ is symmetric, so $G$ is symmetric and $Y^{-T}=(X^TX)^{-1}$.)

This is what justifies "getting rid of" the trace: the trace itself never turns into a matrix. Once the differential has been brought into the form $df = M:dX = {\rm Tr}(M^TdX)$, that scalar identity holds for every direction $dX$, and the matrix $M$ paired with $dX$ is, by definition, the gradient $\frac{\partial f}{\partial X}$.
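As a quick sanity check of the final formula (a minimal numerical sketch, assuming NumPy; any matrix library would do), one can compare it against a finite-difference approximation of $\det(X^TX)$:

```python
import numpy as np

# Check  d/dX det(X^T X) = 2 det(X^T X) X (X^T X)^{-1}
# against a central finite-difference approximation.

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))      # tall matrix, so X^T X is invertible

def f(X):
    return np.linalg.det(X.T @ X)

# Closed-form gradient from the derivation above
G_formula = 2.0 * f(X) * X @ np.linalg.inv(X.T @ X)

# Finite-difference gradient, one entry at a time
eps = 1e-6
G_fd = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = eps
        G_fd[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.max(np.abs(G_formula - G_fd)))  # should be close to zero
```

The two gradients should agree to within finite-difference error, which is consistent with the derivation above (and with Minka's expression, which is the transpose of this one because of a different layout convention).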

lynn