
I am studying Boyd & Vandenberghe's Convex Optimization and encountered a problem on page 642. According to the definition, the first-order approximation of $f$ near $x$ has the form

$$f(x)+Df(x)(z-x)$$

and when $f$ is real-valued (i.e., $f : \Bbb R^n \to \Bbb R$), the gradient is

$$\nabla{f(x)}=Df(x)^{T}$$

See the original text below:


[screenshot of the book's definition omitted]


But when discussing the gradient of the function $f(X)=\log{\det{X}}$, the author says "we can identify $X^{-1}$ as the gradient of $f$ at $X$"; please see below:


[screenshots of the relevant passages omitted]


Where did trace $\mbox{tr}(\cdot)$ go?
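For what it's worth, the expansion with the trace can be checked numerically. Here is a minimal sketch (assuming NumPy; the test matrix and step size are made up for illustration) comparing $\log\det Z - \log\det X$ against $\mathrm{tr}(X^{-1}(Z-X))$ for $Z$ close to $X$:

```python
# Numerical check of log det Z ≈ log det X + tr(X^{-1}(Z - X))
# for a symmetric positive definite X and a nearby symmetric Z.
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)        # symmetric positive definite X

E = rng.standard_normal((n, n))
E = (E + E.T) / 2                  # symmetric perturbation direction
Z = X + 1e-6 * E                   # Z close to X, still in S^n_++

lhs = np.linalg.slogdet(Z)[1] - np.linalg.slogdet(X)[1]
rhs = np.trace(np.linalg.solve(X, Z - X))   # tr(X^{-1}(Z - X))
print(lhs, rhs)                    # the two values agree to high precision
```

The two printed numbers differ only by the second-order remainder of the Taylor expansion.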

BioCoder
  • In the general formula, $Df(x)(z-x)$ is actually the scalar product $\langle Df(x),z-x\rangle$. In the example, the scalar product is $\langle U,V\rangle=\mathrm{tr}(UV)$, hence indeed the gradient at $X$ is $Df(X)=X^{-1}$. – Did Sep 23 '15 at 07:34
  • Hi, Did. You mean this is just a trivial problem caused by the notation? – BioCoder Sep 23 '15 at 07:44
  • Yes, and probably by failing to note that $Df(x)(z-x)$ stands for $\langle Df(x),z-x\rangle$. – Did Sep 23 '15 at 07:46
  • Well, it sounds plausible. But $Df(x) \in R^{m\times{n}}$ is a matrix and $(z-x) \in R^n$ is a vector, can $Df(x)(z−x)$ still be interpreted as $⟨Df(x),z−x⟩$ ? – BioCoder Sep 23 '15 at 07:55
  • Yes, because $m=1$ hence $Df(x)$ and $z-x$ are both vectors in $R^n$. – Did Sep 23 '15 at 07:58
  • Note that $(Z-X)$ is in $\mathbb R^{n^2}$ instead of $\mathbb R^n$. –  Sep 23 '15 at 08:26
  • Hi, John. You mean $(Z-X)\in S^n$ ? – BioCoder Sep 23 '15 at 09:22
  • $(Z-X) \in S^n$, but we are identifying $S^n$ as a subset of $\mathbb R^{n^2}$ anyway. –  Sep 23 '15 at 09:30
  • Hi, Did. I am still kind of worried about the "general formula" you just said. Can you give me some reference materials to read? – BioCoder Sep 23 '15 at 09:35
  • @Did the $X$ is a member of the symmetric matrices, so how is $Df(X)$ a vector? – Frank Moses Jan 24 '18 at 04:03
  • @Did further $Z-X$ is also a symmetric matrix – Frank Moses Jan 24 '18 at 04:04
  • @FrankMoses Yes, hence in that case $\langle\ ,\ \rangle$ is the scalar product between vectors of size $n^2$. Simply rehashing the general definition of the gradient... – Did Jan 24 '18 at 08:17
  • @Did ok then it means that $Df(X)$ is a vector of size $n^2$ but then $Df(X)=X^{-1}$ means a vector is equal to a matrix inverse. Is it possible? – Frank Moses Jan 24 '18 at 08:21
  • @FrankMoses ?? If $X$ is a matrix of size $n^2$ then $X^{-1}$ is also a matrix of size $n^2$. – Did Jan 24 '18 at 08:22
  • @Did yes I completely agree with that. But as you said $Df(X)$ is a vector of size $n^2$; that's why I asked you in my comment how a vector equals a matrix. – Frank Moses Jan 24 '18 at 08:24
  • @FrankMoses As explained on https://math.stackexchange.com/q/2618560 as well... – Did Jan 24 '18 at 08:24
  • @Did I am sorry but which comment are you referring to? – Frank Moses Jan 24 '18 at 08:27
  • @FrankMoses Not a comment, Martin's answer. – Did Jan 24 '18 at 08:41
  • @Did as you can see, I even asked him how he concludes that $$Df(X)(Z-X)=\langle Df(X),Z-X\rangle$$ If I can understand this, then there will be no problem. But until now I have been unable to understand it. – Frank Moses Jan 24 '18 at 08:44
  • @FrankMoses We should (and I shall) stop this since you basically hijacked the comment thread here to get explanations about your question on another page (which is a big no-no on this site), but let me simply add that you seem to be lacking a definition of $Df(X)$ and that this may explain the trouble you have following the (very basic) points Martin is making over there. – Did Jan 24 '18 at 08:57
  • relevant? Prove $\frac{\partial \rm{ln}|X|}{\partial X} = 2X^{-1} - \rm{diag}(X^{-1})$.. Here I say 'We first note that for the case where the elements of X are independent, a constructive proof involving cofactor expansion and adjoint matrices can be made to show that $\frac{\partial ln|X|}{\partial X} = X^{-T}$ (Harville). This is not always equal to $2X^{-1}-diag(X^{-1})$. The fact alone that X is positive definite is sufficient to conclude that X is symmetric and thus its elements are not independent.' – BCLC Apr 16 '21 at 09:58

2 Answers


First of all, if you write (for a general function $f: U \to \mathbb R$, where $U \subset \mathbb R^K$)

$$f(y) \approx f(x) + Df(x) (y-x),$$

then the term $Df(x) (y-x)$ is really

$$\sum_{i=1}^K D_i f(x) \ (y_i - x_i).$$

Now, the function $Z\mapsto \log\det (Z)$ is defined on the open set $S^n_{++}$ in $\mathbb R^{n^2}$, so it has $n^2$ coordinates, given by $Z_{ij}$ for $i, j = 1, \cdots, n$.

Now take a look at

$$\begin{split} \text{tr} \left( X^{-1} (Z-X)\right) &= \sum_{i=1}^n \left(X^{-1} (Z-X) \right)_{ii}\\ &= \sum_{i=1}^n \sum_{j=1}^n X^{-1}_{ij} (Z_{ji}-X_{ji}) \\ \end{split}$$

Thus, comparing the two expressions term by term, we should have identified $(X^{-1})^T$ as the gradient of $\log \det$. Since $X$ is symmetric positive definite, $X^{-1}$ is symmetric as well, so $(X^{-1})^T = X^{-1}$, which is the form given in the book.
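This identification can also be verified directly. A quick sketch (assuming NumPy; the test matrix is arbitrary): treat all $n^2$ entries of $X$ as independent coordinates, differentiate $\log\det$ entrywise by central differences, and compare with $(X^{-1})^T$:

```python
# Entrywise finite-difference gradient of log det at X, compared to (X^{-1})^T.
import numpy as np

def logdet(M):
    return np.linalg.slogdet(M)[1]

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)        # positive definite, hence invertible

h = 1e-6
G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        Eij = np.zeros((n, n))
        Eij[i, j] = 1.0            # perturb a single coordinate X_ij
        G[i, j] = (logdet(X + h * Eij) - logdet(X - h * Eij)) / (2 * h)

print(np.max(np.abs(G - np.linalg.inv(X).T)))   # ≈ 0
```

Since this $X$ is symmetric, $(X^{-1})^T = X^{-1}$, matching the book's statement.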

  • As Did said, the second term of the first-order approximation $f(x)+Df(x)(z−x)$ is actually an inner product of the derivative $Df(x)$ and $(z-x)$, but why is it written as $Df(x)(z-x)$ instead of $⟨Df(x),z−x⟩$ ? After all, the inner product is $Df(x)^T(z-x)$ – BioCoder Sep 24 '15 at 03:49
  • That corresponds to matrix multiplication. You can think of $Df(x)$ as a $1\times n$ matrix and $(z-x)$ an $n\times 1$ matrix. Then the dot product $\langle Df(x) , z-x\rangle$ is the same as the matrix multiplication $Df(x) (z-x)$. @BioCoder –  Sep 24 '15 at 03:53
  • Please add some steps before concluding "Thus we should have identified ...". I am also trying to understand this but I am unable to understand this. Are you concluding by comparing the second equation in your post with the fourth equation? Please help. – Frank Moses Jan 24 '18 at 03:38
  • @FrankMoses Yes, it's just comparing the terms. –  Jan 24 '18 at 04:05
  • Please add some steps to get the second equation in your post, and please make the comparison easier. I am sorry, but I cannot get it from the current form. I will be very thankful to you. – Frank Moses Jan 24 '18 at 04:07
  • And I do not understand why the first equation is written for vector $X$ while the question is about matrix $X$. Maybe my question is too basic but I will be very thankful to you for helping me in understanding this – Frank Moses Jan 24 '18 at 04:10
  • Is your second equation right or should it be $\sum_{i=1}^{K}D_{i}f(x_i)(y_i-x_i)$? – Frank Moses Jan 24 '18 at 05:40

The trace $\mathrm{tr}(X^{-1}(Z-X))$ is the standard inner product of $X^{-1}$ and $Z-X$. The choice of inner product depends on the specific space.

So it does not mean that $\mathrm{tr}(X^{-1}(Z-X))$ equals $Df(x)(Z-X)$ as a matrix multiplication; it is the definition of the inner product that matters.
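A small sketch of this point (assuming NumPy; the matrices are arbitrary): on symmetric matrices, $\mathrm{tr}(UV)$ coincides with the standard inner product of $\mathbb R^{n^2}$, i.e. the elementwise (Frobenius) product:

```python
# tr(UV) equals the Frobenius inner product <U, V> = sum_ij U_ij V_ij
# when U is symmetric, since tr(U^T V) = tr(UV) for U = U^T.
import numpy as np

rng = np.random.default_rng(2)
n = 4
U = rng.standard_normal((n, n)); U = (U + U.T) / 2   # symmetric U
V = rng.standard_normal((n, n)); V = (V + V.T) / 2   # symmetric V

ip_frobenius = np.sum(U * V)       # <U, V> viewed as vectors in R^{n^2}
ip_trace     = np.trace(U @ V)     # tr(UV)
print(ip_frobenius, ip_trace)      # identical up to rounding
```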

Aaron