
I am reading a book that states $\frac{\delta \log(\det(X))}{\delta X} = 2X^{-1}-\operatorname{diag}(X^{-1})$, where $X$ is a $2\times 2$ positive definite matrix. However, I have computed that $\frac{\delta \log(\det(X))}{\delta X} = (X^{-1})^T$. The book cites another book and does not give a proof. I looked it up in the other book, and there the result is as I computed it. Now I am a bit confused, since this can't be a typo. Does somebody know what's going on? The book with the claim (p. 68) can be found here:

http://www.utstat.toronto.edu/~brunner/books/LinearModelsInStatistics.pdf

However the book which it cites (p. 310) can be found here:

https://books.google.de/books?id=fYvaBwAAQBAJ&printsec=frontcover&hl=de&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false

Edit: I think I found the answer for the confusion:

The cited book (page 306) has the formula and the derivation of it: http://hbanaszak.mjr.uw.edu.pl/TempTxt/Haville_2008__Matrix_Algebra_From_a_Statistician__039_s_Perspective.pdf

The difference is that when $(X^{-1})^T$ is the answer, the components of $X$ are assumed to be "independent", whereas when $2 X^{-1} - \operatorname{diag}(X^{-1})$ is the answer, the matrix $X$ is assumed to be symmetric.

Again, the confusion arises from the term "positive definite": the authors assume that a symmetric matrix is called "positive definite" if... So by definition, "positive definite" implies symmetric in this case.
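The two results can be compared numerically in the $2\times 2$ case. The following is a minimal sketch, not taken from either book: the helper names (`logdet2`, `inverse2`, `grad_independent`, `grad_symmetric`) and the test matrices are assumptions chosen for illustration. It uses central finite differences, once treating all four entries as independent and once treating the off-diagonal entry as a single shared variable.

```python
import math

def logdet2(a, b, c, d):
    """log(det X) for X = [[a, b], [c, d]] (requires det X > 0)."""
    return math.log(a * d - b * c)

def inverse2(a, b, c, d):
    """Inverse of X = [[a, b], [c, d]]."""
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def grad_independent(a, b, c, d, h=1e-6):
    """Central differences with all four entries treated as independent."""
    return [
        [(logdet2(a + h, b, c, d) - logdet2(a - h, b, c, d)) / (2 * h),
         (logdet2(a, b + h, c, d) - logdet2(a, b - h, c, d)) / (2 * h)],
        [(logdet2(a, b, c + h, d) - logdet2(a, b, c - h, d)) / (2 * h),
         (logdet2(a, b, c, d + h) - logdet2(a, b, c, d - h)) / (2 * h)],
    ]

def grad_symmetric(a, b, d, h=1e-6):
    """Central differences w.r.t. (a, b, d) for X = [[a, b], [b, d]].

    The off-diagonal derivative is written into both off-diagonal slots.
    """
    g = lambda a_, b_, d_: logdet2(a_, b_, b_, d_)
    ga = (g(a + h, b, d) - g(a - h, b, d)) / (2 * h)
    gb = (g(a, b + h, d) - g(a, b - h, d)) / (2 * h)
    gd = (g(a, b, d + h) - g(a, b, d - h)) / (2 * h)
    return [[ga, gb], [gb, gd]]

# Independent entries at a non-symmetric point: compare with (X^{-1})^T.
print(grad_independent(2.0, 1.0, 0.5, 3.0))
print(inverse2(2.0, 1.0, 0.5, 3.0))

# Symmetric parametrization: compare with 2 X^{-1} - diag(X^{-1}).
print(grad_symmetric(2.0, 1.0, 3.0))
print(inverse2(2.0, 1.0, 1.0, 3.0))
```

In the symmetric case the off-diagonal derivative comes out as twice the corresponding entry of $X^{-1}$, because changing the shared variable moves two entries of $X$ at once.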

  • Ok, thanks for your comment. What confused me was that the authors of the book state it as a theorem for a $p\times p$ positive definite matrix $X$ and then ask the reader to do an exercise for a $2 \times 2$ matrix. –  Jun 20 '17 at 15:20
  • The definition is as you have computed it, but I think I found the source of the confusion: the formula given in the book is only valid for symmetric $X$. I have verified this for the $2 \times 2$ case. –  Jun 20 '17 at 16:03
  • Relevant? Prove $\frac{\partial \ln|X|}{\partial X} = 2X^{-1} - \operatorname{diag}(X^{-1})$. Here I say 'We first note that for the case where the elements of X are independent, a constructive proof involving cofactor expansion and adjoint matrices can be made to show that $\frac{\partial \ln|X|}{\partial X} = X^{-T}$ (Harville). This is not always equal to $2X^{-1}-\operatorname{diag}(X^{-1})$. The fact alone that X is positive definite is sufficient to conclude that X is symmetric and thus its elements are not independent.' – BCLC Apr 16 '21 at 10:06

2 Answers


As to whether to transpose or not: This depends on which double-contraction operator is in use. Differentiation of a scalar-valued function $f$ with respect to a (typically square) matrix $X$ is defined so that linearized changes can be expressed as $$\mathrm{d}f = \frac{\delta f}{\delta X}\cdot\cdot\,\mathrm{d}X\tag{*}\label{ccorn-D}$$ Here the double-dot stands for the double-contraction operator. It takes two matrix arguments and contracts those bilinearly to a scalar. Some people define the double-contraction operator as $$A\cdot\cdot B = \operatorname{tr}(AB)\tag{1}\label{ccorn-I}$$ whereas others use $$A\cdot\cdot B = \operatorname{tr}(A^\top B)\tag{2}\label{ccorn-II}$$ These two agree for symmetric matrices, and both ensure that the double-contraction of a symmetric matrix with a skew-symmetric matrix yields zero. They differ essentially in the sign of the result when contracting two skew-symmetric matrices. As long as at least one of $A$ or $B$ is symmetric, it does not matter which of $\eqref{ccorn-I}$ or $\eqref{ccorn-II}$ you use, but it matters in formulae like the one you have encountered.
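The behaviour of the two conventions can be checked on small matrices. The following is a minimal sketch in plain Python; the function names `contract_1` and `contract_2` (for $\eqref{ccorn-I}$ and $\eqref{ccorn-II}$) and the example matrices `S`, `K`, `G` are assumptions made for illustration.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    """Transpose of a square matrix."""
    n = len(A)
    return [[A[j][i] for j in range(n)] for i in range(n)]

def trace(A):
    """Sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))

def contract_1(A, B):
    """Convention (1): A..B = tr(A B)."""
    return trace(matmul(A, B))

def contract_2(A, B):
    """Convention (2): A..B = tr(A^T B), the entrywise sum of A_ij * B_ij."""
    return trace(matmul(transpose(A), B))

S = [[2.0, 1.0], [1.0, 3.0]]    # symmetric
K = [[0.0, 4.0], [-4.0, 0.0]]   # skew-symmetric
G = [[1.0, 2.0], [5.0, 7.0]]    # neither

print(contract_1(S, G), contract_2(S, G))  # equal: one argument is symmetric
print(contract_1(S, K), contract_2(S, K))  # both zero: symmetric with skew
print(contract_1(K, K), contract_2(K, K))  # opposite signs: skew with skew
```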

Note that $\eqref{ccorn-D}$ alone does not uniquely define $\delta f/\delta X$. In particular, if $X$ is symmetric, $\delta f/\delta X$ could contain arbitrary skew-symmetric components that do not affect $\eqref{ccorn-D}$. In such scenarios, $\delta f/\delta X$ is commonly understood to be symmetric as well.


As to the result named first in your question: Probably the authors mean $$\frac{\delta f}{\delta X} = \begin{pmatrix} \frac{\partial f}{\partial X_{11}} & \cdots & \frac{\partial f}{\partial X_{1n}} \\\vdots & \ddots & \vdots \\\frac{\partial f}{\partial X_{n1}} & \cdots & \frac{\partial f}{\partial X_{nn}} \end{pmatrix}\tag{3}\label{ccorn-III}$$ or the transpose thereof. This is compatible with $\eqref{ccorn-D}$ if you choose the double-contraction operator accordingly – $\eqref{ccorn-II}$ for the above version, $\eqref{ccorn-I}$ for the transposed version – so that you end up with the chain rule, as it should be: $$\frac{\delta f}{\delta X}\cdot\cdot\,\mathrm{d}X = \sum_{i,j}\frac{\partial f}{\partial X_{ij}}\mathrm{d}X_{ij} = \mathrm{d}f$$ $\eqref{ccorn-III}$ assumes that the $X_{ij}$ are independent variables insofar as the definition of $f$ is concerned. This does not require that $X$ be unconstrained; it just means that $f(X)$ needs to be defined in terms of the $X_{ij}$ so that the partial derivatives can be taken.

If $X$ is constrained to be symmetric, one could make $f$ depend only on those $X_{ij}$ where $i\leq j$ by replacing $X_{ji}$ with $X_{ij}$ in the definition of $f$. Let us call this version $f^*$. Then the right-hand side matrix in $\eqref{ccorn-III}$ becomes upper triangular, and the former lower triangle gets transposed and added to the former upper triangle: $$\frac{\delta f^*}{\delta X} = \begin{pmatrix} \frac{\partial f}{\partial X_{11}} & \frac{\partial f}{\partial X_{12}} + \frac{\partial f}{\partial X_{21}} & \cdots & \cdots & \frac{\partial f}{\partial X_{1n}} + \frac{\partial f}{\partial X_{n1}} \\ 0 & \frac{\partial f}{\partial X_{22}} & \frac{\partial f}{\partial X_{23}} + \frac{\partial f}{\partial X_{32}} & \cdots & \frac{\partial f}{\partial X_{2n}} + \frac{\partial f}{\partial X_{n2}} \\\vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \frac{\partial f}{\partial X_{n-1,n-1}} & \frac{\partial f}{\partial X_{n-1,n}} + \frac{\partial f}{\partial X_{n,n-1}} \\ 0 & \cdots & \cdots & 0 & \frac{\partial f}{\partial X_{nn}} \end{pmatrix}\tag{4}\label{ccorn-IV}$$ This would still fulfill $\eqref{ccorn-D}$. Another option would be to symmetrize the right-hand side of $\eqref{ccorn-IV}$ by averaging that triangular matrix with its transpose. That is like replacing $f(X)$ with $$\bar{f}(X) = \frac{f(X)+f(X^\top)}{2}\tag{5}\label{ccorn-S}$$
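As a sketch of what this looks like in the smallest case, assume $f(X)=\log\det X$ with $n=2$ and write $\Delta = X_{11}X_{22}-X_{12}X_{21}$ (the symbol $\Delta$ is introduced here only for this illustration). With independent entries, the matrix in $\eqref{ccorn-III}$ is $$\begin{pmatrix} \frac{\partial f}{\partial X_{11}} & \frac{\partial f}{\partial X_{12}} \\ \frac{\partial f}{\partial X_{21}} & \frac{\partial f}{\partial X_{22}} \end{pmatrix} = \frac{1}{\Delta}\begin{pmatrix} X_{22} & -X_{21} \\ -X_{12} & X_{11} \end{pmatrix} = (X^{-1})^\top$$ For symmetric $X$ with $X_{21}=X_{12}$, so that $\Delta = X_{11}X_{22}-X_{12}^2$, the triangular version $\eqref{ccorn-IV}$ becomes $$\frac{\delta f^*}{\delta X} = \frac{1}{\Delta}\begin{pmatrix} X_{22} & -2X_{12} \\ 0 & X_{11} \end{pmatrix}$$ and averaging it with its transpose gives $$\frac{1}{\Delta}\begin{pmatrix} X_{22} & -X_{12} \\ -X_{12} & X_{11} \end{pmatrix} = X^{-1}$$ which agrees with $\eqref{ccorn-S}$, since $\log\det X^\top = \log\det X$ means $\bar{f}=f$ here.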

But somehow the authors of your first-mentioned formula chose to re-interpret the right-hand side of $\eqref{ccorn-III}$ as $$\frac{\delta f}{\delta X} \stackrel{?}{=} \begin{pmatrix} \frac{\partial f^*}{\partial X_{11}} & \frac{\partial f^*}{\partial X_{12}} & \cdots & \frac{\partial f^*}{\partial X_{1n}} \\\frac{\partial f^*}{\partial X_{12}} & \frac{\partial f^*}{\partial X_{22}} & \cdots & \frac{\partial f^*}{\partial X_{2n}} \\\vdots & \vdots & \ddots & \vdots \\\frac{\partial f^*}{\partial X_{1n}} & \frac{\partial f^*}{\partial X_{2n}} & \cdots & \frac{\partial f^*}{\partial X_{nn}} \end{pmatrix}\tag{6}\label{ccorn-bad}$$ So they sorted the indices and replaced $f$ with $f^*$. That's misleading:

  • The replacement alone would have resulted in $\eqref{ccorn-IV}$ and thus been compatible with $\eqref{ccorn-D}$.
  • The sorting would have been compatible with $\eqref{ccorn-D}$ if applied to the symmetric $\bar{f}$ from $\eqref{ccorn-S}$ instead of $f^*$.
  • Doing both means that now the terms $\partial f/\partial X_{ij}$ with $i\neq j$ occur twice when contracting. This violates the chain rule.

In other words, their resulting formula is incompatible with $\eqref{ccorn-D}$.
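The double counting can be seen numerically for $f=\log\det X$ with a symmetric $2\times 2$ matrix. The following is a minimal sketch under stated assumptions: the function names, the sample point, and the perturbation are hypothetical, and the contraction used is $\eqref{ccorn-II}$ summed over all entries (for symmetric arguments $\eqref{ccorn-I}$ gives the same value).

```python
import math

def f_star(a, b, d):
    """log det of the symmetric matrix X = [[a, b], [b, d]]."""
    return math.log(a * d - b * b)

def inverse_sym(a, b, d):
    """X^{-1} for X = [[a, b], [b, d]]: the symmetrized gradient."""
    det = a * d - b * b
    return [[d / det, -b / det], [-b / det, a / det]]

def book_formula(a, b, d):
    """2 X^{-1} - diag(X^{-1}) for X = [[a, b], [b, d]]."""
    det = a * d - b * b
    return [[d / det, -2 * b / det], [-2 * b / det, a / det]]

def contract(A, B):
    """tr(A^T B), i.e. the sum of A_ij * B_ij over all entries."""
    return sum(A[i][j] * B[i][j] for i in range(2) for j in range(2))

a, b, d = 2.0, 1.0, 3.0
da, db, dd = 1e-6, 2e-6, -1e-6          # small symmetric perturbation
dX = [[da, db], [db, dd]]

df_actual = f_star(a + da, b + db, d + dd) - f_star(a, b, d)
df_inverse = contract(inverse_sym(a, b, d), dX)
df_book = contract(book_formula(a, b, d), dX)

print(df_actual, df_inverse, df_book)
```

With this contraction, the symmetrized gradient $X^{-1}$ reproduces the actual change to first order, while $2X^{-1}-\operatorname{diag}(X^{-1})$ counts the off-diagonal contribution twice.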

ccorn

I think I found the answer for the confusion:

The cited book (page 306) has the formula and the derivation of it, but I do not understand how they derive it.

http://hbanaszak.mjr.uw.edu.pl/TempTxt/Haville_2008__Matrix_Algebra_From_a_Statistician__039_s_Perspective.pdf

The difference is that in the second case $X$ is a symmetric matrix. For a symmetric $2 \times 2$ matrix $X$, the (strange) formula is valid.
