I would like to compute the following derivative:
$$\frac{d}{d\mathbf{X}} |\mathbf{X}^{T} \mathbf{X}|$$
Minka, in 'Old and New Matrix Algebra Useful for Statistics', says:
$$\frac{d}{d\mathbf{X}}|\mathbf{X}^{T} \mathbf{X}| = 2 |\mathbf{X}^{T}\mathbf{X}| (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T} $$
His reasoning is that, since the differential is
$$ d|\mathbf{X}^{T}\mathbf{X}| = 2|\mathbf{X}^{T}\mathbf{X}|\text{tr}( (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T} d\mathbf{X} ) $$
the derivative stated above follows.
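As a sanity check, I compared the formula against finite differences. This is only a sketch: the helper names (`f`, `minka_derivative`, `finite_difference_gradient`) and the random 5×3 test matrix are my own choices, and I am assuming the convention that $dy = \text{tr}(\mathbf{A}\, d\mathbf{X})$ is read as $dy/d\mathbf{X} = \mathbf{A}$, so the result has the shape of $\mathbf{X}^{T}$ and must be transposed before comparing entry by entry.

```python
import numpy as np

def f(X):
    # Scalar function under study: |X^T X|
    return np.linalg.det(X.T @ X)

def minka_derivative(X):
    # 2 |X^T X| (X^T X)^{-1} X^T, computed with a linear solve
    # instead of an explicit inverse. Shape is (n, m) for X of shape (m, n).
    XtX = X.T @ X
    return 2.0 * np.linalg.det(XtX) * np.linalg.solve(XtX, X.T)

def finite_difference_gradient(X, eps=1e-6):
    # Central differences: G[i, j] approximates d f / d X[i, j].
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))    # arbitrary full-column-rank test matrix

A = minka_derivative(X)            # shape (3, 5), same shape as X^T
G = finite_difference_gradient(X)  # shape (5, 3), same shape as X

# Under the assumed convention, A should be the transpose of G.
print(np.allclose(A.T, G, rtol=1e-4, atol=1e-5))
```

So the formula itself appears to hold numerically (up to that transpose); what I am missing is the step that turns the trace expression into it.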
Question: How do we get rid of the trace? What justifies this?
Is this saying that:
$$ \text{tr}( (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T} d\mathbf{X} ) = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}d\mathbf{X} $$
That doesn't seem right to me: the left-hand side is a scalar, while the right-hand side is a matrix.
@greg has an approach that uses the Frobenius inner product, but I haven't seen that anywhere except in a few answers.
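For reference, the definition of the Frobenius inner product that I believe is meant (the symbols $\mathbf{A}$ and $\mathbf{B}$ are just placeholders for two real matrices of the same shape) is:
$$ \langle \mathbf{A}, \mathbf{B} \rangle_F = \text{tr}(\mathbf{A}^{T}\mathbf{B}) = \sum_{i,j} A_{ij} B_{ij} $$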
Could anyone provide some insight into what is going on?
Thanks