
I am familiar with the derivation of $D_A(\det(A)) := \dfrac{\partial \det(A)}{\partial A}$ as a Fréchet derivative, obtained from the expansion $$\det(A+H) - \det(A) = \det(A) \mathrm{tr} (A^{-1}H) + o(\|H\|) \quad \text{as } \|H\| \rightarrow 0,$$ which gives $$D_A[H] = \det(A) \mathrm{tr} (A^{-1}H) = \det(A) A^{-T} \cdot H$$ for a general square matrix $A$. Here I have used the Frobenius inner product to write the trace as an inner product. This leads to the same answer as that given in The Matrix Cookbook.

But if $A$ is symmetric, how do I compute the Fréchet derivative to be $$ \det(A) (2A^{-1} - \operatorname{diag}(A^{-1}) )?$$ I am curious to know whether this formula, which is quoted from The Matrix Cookbook in various posts, can be derived. See for example:

What is the derivative of the determinant of a symmetric positive definite matrix?

copper.hat
me10240
  • Why would the derivative be any different if you are just evaluating it at a specific point? I think you need to clarify what you are asking. Are you looking for a simpler formula? – copper.hat Sep 19 '17 at 22:34
  • I added the expression that matrix calculus gives. Is there a disconnect between the two definitions, then? – me10240 Sep 19 '17 at 22:37
  • I don't understand your notation, the derivative $D \det (A)$ is a linear functional on the space of matrices. How can the derivative be a matrix (is there something implied that I am missing?)? – copper.hat Sep 19 '17 at 22:41
  • What is the domain here? – copper.hat Sep 19 '17 at 22:45
  • Added more details. – me10240 Sep 19 '17 at 22:50
  • Using the inverse, you should also assume that your matrix is invertible. The general derivative is $D\det(A)[H]=\operatorname{tr}(A^{\#} H)$ (with $A^{\#}$ the adjugate of $A$), if that helps you in any way. – F.R. Sep 19 '17 at 23:16
  • As an aside, in an inner product space, the quantity you are using is called the gradient. – copper.hat Sep 20 '17 at 04:36

1 Answer


This doesn't look right to me.

Let $J = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, $A=A^{-1} = J$, $H=J$.

Then $D \det (A)(H) = -2$, whereas the other formula would give $-4$ (since $A$ is zero on the diagonal).
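The two numbers can be checked numerically. The following is a minimal Python sketch and not part of the original argument: the helper names (`det2`, `inv2`, `frob`, `directional_derivative`) are made up for illustration, and the derivative is approximated by a central finite difference rather than computed symbolically.

```python
def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of an invertible 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def frob(a, b):
    """Frobenius inner product <a, b> = tr(a^T b)."""
    return sum(a[i][j] * b[i][j] for i in range(2) for j in range(2))


def directional_derivative(f, a, h, eps=1e-6):
    """Central finite difference of f at a in the direction h."""
    plus = [[a[i][j] + eps * h[i][j] for j in range(2)] for i in range(2)]
    minus = [[a[i][j] - eps * h[i][j] for j in range(2)] for i in range(2)]
    return (f(plus) - f(minus)) / (2 * eps)


J = [[0.0, 1.0], [1.0, 0.0]]

# Frechet derivative of det at A = J in the direction H = J.
frechet = directional_derivative(det2, J, J)

# Cookbook expression det(A) (2 A^{-1} - diag(A^{-1})), paired with H = J.
J_inv = inv2(J)
cookbook_grad = [
    [det2(J) * (2 * J_inv[i][j] - (J_inv[i][j] if i == j else 0.0))
     for j in range(2)]
    for i in range(2)
]
cookbook = frob(cookbook_grad, J)

print(frechet, cookbook)
```

Since $\det(J + tJ) = -(1+t)^2$ is quadratic in $t$, the central difference is exact up to rounding, so the first value should come out as $-2$ and the second as $-4$.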

Commentary:

Let me abuse notation and use $S^n$ to denote the set of symmetric $n \times n$ matrices.

The Fréchet derivative is the best linear approximation to a function at a given point; it is unique and does not depend on any basis. In particular, if the given point lies in some subspace (for example, $S^n \subset \mathbb{R}^{n \times n}$), the derivative is the same. There may be some simplifications in the formula that depend on characteristics of the subspace, but the resulting derivative will be the same.

Since the Cookbook formula gives a value that differs from the derivative, it cannot be the Fréchet derivative of the map $\det : S^n \to \mathbb{R}$.

My understanding of what the Cookbook means by structured matrices is that there is some map $p$ from $\mathbb{R}^{n \times n}$ into the collection of structured matrices, and the (Fréchet) derivative of the composition $\det \circ p$, evaluated at a point of $ S^n$, is what the Cookbook gives.

For symmetric matrices, a parameterisation that is consistent with the Cookbook is $p:\mathbb{R}^{n \times n} \to S^n$, $p(X) = X+X^T - \operatorname{diag} X$, which would give rise to the formula (using the chain rule, the linearity of $p$, the symmetry of $p(X)$, and properties of $\operatorname{tr}$):
\begin{eqnarray}
D (\det \circ p) (X)(H) &=& D \det(p(X)) ( D p(X)(H)) \\
&=& D \det(p(X))( p(H)) \\
&=& \det(p(X)) \operatorname{tr} ( p(X)^{-1} p(H)) \\
&=& \det(p(X)) \operatorname{tr} ( p(X)^{-1} (H+H^T - \operatorname{diag} H)) \\
&=& \det(p(X)) \operatorname{tr} ( 2p(X)^{-1} H - p(X)^{-1} \operatorname{diag} H) \\
&=& \det(p(X)) \operatorname{tr} ( 2p(X)^{-1} H - \operatorname{diag} (p(X)^{-1})H) \\
&=& \det(p(X)) \operatorname{tr} ( (2p(X)^{-1} -\operatorname{diag} (p(X)^{-1})) H)
\end{eqnarray}
Note, however, that this is the Fréchet derivative of the composition $\det \circ p$, not the Fréchet derivative of $\det$ evaluated at a point of $S^n$ in the direction $H$. The parameterisation is very relevant.
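As a sanity check of the derivation (a sketch under assumptions, not a proof), the snippet below compares a central finite difference of $\det \circ p$ with the closed form from the last line of the derivation. The matrices `X` and `H` are arbitrary illustrative values, chosen non-symmetric on purpose, and the helper names are hypothetical.

```python
def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of an invertible 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def p(x):
    """p(X) = X + X^T - diag(X): maps any 2x2 matrix to a symmetric one."""
    return [
        [x[i][j] + x[j][i] - (x[i][i] if i == j else 0.0) for j in range(2)]
        for i in range(2)
    ]


def trace_of_product(a, b):
    """tr(a b) for 2x2 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))


# Illustrative, deliberately non-symmetric base point and direction.
X = [[2.0, 0.5], [-0.3, 1.0]]
H = [[0.1, -0.7], [0.4, 0.2]]

# Central finite difference of (det o p) at X in the direction H.
eps = 1e-6
X_plus = [[X[i][j] + eps * H[i][j] for j in range(2)] for i in range(2)]
X_minus = [[X[i][j] - eps * H[i][j] for j in range(2)] for i in range(2)]
numeric = (det2(p(X_plus)) - det2(p(X_minus))) / (2 * eps)

# Closed form: det(p(X)) tr((2 p(X)^{-1} - diag(p(X)^{-1})) H).
S = p(X)
S_inv = inv2(S)
G = [
    [2 * S_inv[i][j] - (S_inv[i][j] if i == j else 0.0) for j in range(2)]
    for i in range(2)
]
closed_form = det2(S) * trace_of_product(G, H)

# For contrast: the plain derivative of det at p(X), applied to H itself.
plain = det2(S) * trace_of_product(S_inv, H)

print(numeric, closed_form, plain)
```

The first two printed values should agree to within the finite-difference error, while the third generally differs from them, which is the point of the remark that the parameterisation matters.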

In terms of the example above, we have $D \det (J)(J) = -2$, but $D (\det \circ p) (X) (J) = -4$ at any $X$ with $p(X) = J$, for example $X = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$.

The reason for the extra factor of $2$ is that $p(J) = 2J$: the parameterisation $p$ doubles the off-diagonal elements of the perturbation $H$.

That is, $\det(J+H) \approx -1 - \langle J, H \rangle$ but $(\det \circ p)(X+H) \approx -1 -2 \langle J, H \rangle$ when $p(X) = J$, so while both are Fréchet derivatives, they are derivatives of different functions evaluated at different points.

copper.hat
  • I thought the point of the two different formulae was that one is for a general (non-symmetric) matrix only and the other is for a symmetric matrix, so we should not get the same result from both. I invite you to take a look at the Cookbook and decide. What I was looking for was confirmation that the Fréchet derivative calculation will yield the same result, or an explanation of why it won't and how these approaches to the derivative differ. – me10240 Sep 20 '17 at 06:21
  • I don't know where the second formula comes from. Again, why would the derivative be different on a subspace? – copper.hat Sep 20 '17 at 06:58
  • Well, the reasoning provided was that the matrix calculus rule has to account for symmetry in the matrix, that there are not $n^2$ different entries but $n(n+1)/2$. I do not understand from the Frechet derivative viewpoint why they ought to be different, which is why I asked the question. – me10240 Sep 20 '17 at 15:18
  • @me10240 This question contains an interesting example of what you're asking about. – lynn Sep 20 '17 at 18:00
  • @me10240: I can explain the Frechet part. I don't understand the other formula. Perhaps this is with respect to a particular basis? – copper.hat Sep 20 '17 at 18:20
  • The Frechet derivative just gives a linear approximation to $A \mapsto \det A$ at a given point, it doesn't matter if the point or direction are in a particular subspace (the symmetric matrices), the result will be the same. However, one can imagine a map from $\mathbb{R}^{{1 \over 2} n (n+1)}$ to $\mathbb{R}^{n^2}$ in which the resulting derivative will be different. – copper.hat Sep 20 '17 at 18:30
  • I couldn't find the last formula in the question in the cookbook. – copper.hat Sep 20 '17 at 18:41
  • It's Section 2.8.2, equation (140) – me10240 Sep 20 '17 at 18:55
  • @me10240: Thanks, I found it and I believe I understand what is going on. It is essentially finding the derivative with respect to a parameterisation in a different basis. However, using the results requires that the directions are represented in this different basis, so some care is needed. – copper.hat Sep 21 '17 at 03:50
  • I am still in the dark. Could you explain it in detail in an answer? – me10240 Sep 21 '17 at 04:23
  • And also, what about the counterexample you provided? – me10240 Sep 21 '17 at 04:57
  • @me10240: I have added some commentary. – copper.hat Sep 21 '17 at 16:52
  • I understand what you say. If this is what the Matrix Cookbook actually means, in my opinion the result stated is downright incorrect. It isn't the gradient of $\det(A)$ by any means; it is the gradient of $\det(A + A^T - A\circ I)$, and these are not even close. But you have raised a follow-up question that I shall post separately. I shall wait a couple of days to see if somebody contradicts your interpretation, then select your answer. – me10240 Sep 21 '17 at 17:11
  • @me10240: Assuming my reading is correct, I think the authors should have included an explicit parameterisation to avoid ambiguity. – copper.hat Sep 21 '17 at 17:14
  • @me10240: I don't intend to malign the Cookbook, it is sufficiently dense that covering every detail is impossible, but certainly a sentence would help disambiguate. It took me a while to realise what they were doing (assuming my interpretation is correct). – copper.hat Sep 21 '17 at 17:39