
I am studying Boyd & Vandenberghe's Convex Optimization and encountered a problem on page 642. According to the definition, the first-order approximation of $f$ near $x$ has the form

$$f(x)+Df(x)(z-x)$$

and when $f$ is real-valued (i.e., $f : \Bbb R^n \to \Bbb R$), the gradient is

$$\nabla{f(x)}=Df(x)^{T}$$

See the original text below:


[screenshot of the book's definition omitted]


But when discussing the gradient of the function $f(X)=\log{\det{X}}$, the author says "we can identify $X^{-1}$ as the gradient of $f$ at $X$"; please see below:


[screenshots of the relevant passages omitted]


Where did trace $\mbox{tr}(\cdot)$ go?
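For what it's worth, the expansion with the trace can be checked numerically. Here is a minimal sketch (assuming NumPy; the test matrix and step size are made up for illustration) comparing $\log\det Z - \log\det X$ against $\mathrm{tr}(X^{-1}(Z-X))$ for $Z$ close to $X$:

```python
# Numerical check of log det Z ≈ log det X + tr(X^{-1}(Z - X))
# for a symmetric positive definite X and a nearby symmetric Z.
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)        # symmetric positive definite X

E = rng.standard_normal((n, n))
E = (E + E.T) / 2                  # symmetric perturbation direction
Z = X + 1e-6 * E                   # Z close to X, still in S^n_++

lhs = np.linalg.slogdet(Z)[1] - np.linalg.slogdet(X)[1]
rhs = np.trace(np.linalg.solve(X, Z - X))   # tr(X^{-1}(Z - X))
print(lhs, rhs)                    # the two values agree to high precision
```

The two printed numbers differ only by the second-order remainder of the Taylor expansion.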

BioCoder
  • In the general formula, $Df(x)(z-x)$ is actually the scalar product $\langle Df(x),z-x\rangle$. In the example, the scalar product is $\langle U,V\rangle=\mathrm{tr}(UV)$, hence indeed the gradient at $X$ is $Df(X)=X^{-1}$. – Did Sep 23 '15 at 07:34
  • Hi, Did. You mean this is just a trivial problem caused by the notation? – BioCoder Sep 23 '15 at 07:44
  • Yes, and probably by failing to note that $Df(x)(z-x)$ stands for $\langle Df(x),z-x\rangle$. – Did Sep 23 '15 at 07:46
  • Well, it sounds plausible. But $Df(x) \in R^{m\times{n}}$ is a matrix and $(z-x) \in R^n$ is a vector, can $Df(x)(z−x)$ still be interpreted as $⟨Df(x),z−x⟩$ ? – BioCoder Sep 23 '15 at 07:55
  • Yes, because $m=1$ hence $Df(x)$ and $z-x$ are both vectors in $R^n$. – Did Sep 23 '15 at 07:58
  • Note that $(Z-X)$ is in $\mathbb R^{n^2}$ instead of $\mathbb R^n$. –  Sep 23 '15 at 08:26
  • Hi, John. You mean $(Z-X)\in S^n$ ? – BioCoder Sep 23 '15 at 09:22
  • $(Z-X) \in S^n$, but we are identifying $S^n$ as a subset of $\mathbb R^{n^2}$ anyway. –  Sep 23 '15 at 09:30
  • Hi, Did. I am still kind of worried about the "general formula" you just said. Can you give me some reference materials to read? – BioCoder Sep 23 '15 at 09:35
  • @Did the $X$ is a member of the symmetric matrices, so how is $Df(X)$ a vector? – Frank Moses Jan 24 '18 at 04:03
  • @Did further $Z-X$ is also a symmetric matrix – Frank Moses Jan 24 '18 at 04:04
  • @FrankMoses Yes, hence in that case $\langle\ ,\ \rangle$ is the scalar product between vectors of size $n^2$. Simply rehashing the general definition of the gradient... – Did Jan 24 '18 at 08:17
  • @Did ok then it means that $Df(X)$ is a vector of size $n^2$ but then $Df(X)=X^{-1}$ means a vector is equal to a matrix inverse. Is it possible? – Frank Moses Jan 24 '18 at 08:21
  • @FrankMoses ?? If $X$ is a matrix of size $n^2$ then $X^{-1}$ is also a matrix of size $n^2$. – Did Jan 24 '18 at 08:22
  • @Did yes I completely agree with that. But as you said $Df(X)$ is a vector of size $n^2$; that's why I asked you in my comment how a vector equals a matrix. – Frank Moses Jan 24 '18 at 08:24
  • @FrankMoses As explained on https://math.stackexchange.com/q/2618560 as well... – Did Jan 24 '18 at 08:24
  • @Did I am sorry but which comment are you referring to? – Frank Moses Jan 24 '18 at 08:27
  • @FrankMoses Not a comment, Martin's answer. – Did Jan 24 '18 at 08:41
  • @Did as you can see, I even asked him how he concludes that $$Df(X)(Z-X)=\langle Df(X),Z-X\rangle$$ If I can understand this, then there will be no problem. But until now I have been unable to understand it. – Frank Moses Jan 24 '18 at 08:44
  • @FrankMoses We should (and I shall) stop this since you basically hijacked the comment thread here to get explanations about your question on another page (which is a big no-no on this site), but let me simply add that you seem to be lacking a definition of $Df(X)$ and that this may explain the trouble you have following the (very basic) points Martin is making over there. – Did Jan 24 '18 at 08:57
  • relevant? Prove $\frac{\partial \rm{ln}|X|}{\partial X} = 2X^{-1} - \rm{diag}(X^{-1})$.. Here I say 'We first note that for the case where the elements of X are independent, a constructive proof involving cofactor expansion and adjoint matrices can be made to show that $\frac{\partial ln|X|}{\partial X} = X^{-T}$ (Harville). This is not always equal to $2X^{-1}-diag(X^{-1})$. The fact alone that X is positive definite is sufficient to conclude that X is symmetric and thus its elements are not independent.' – BCLC Apr 16 '21 at 09:58

2 Answers


First of all, if you write (for a general function $f: U \to \mathbb R$, where $U \subset \mathbb R^K$)

$$f(y) \approx f(x) + Df(x) (y-x),$$

then the term $Df(x) (y-x)$ is really

$$\sum_{i=1}^K D_i f(x) \ (y_i - x_i).$$

Now, the function $Z\mapsto \log\det (Z)$ is defined on the open set $S^n_{++}$ in $\mathbb R^{n^2}$, so it has $n^2$ coordinates, given by $Z_{ij}$ for $i, j = 1, \cdots, n$.

Now take a look at

$$\begin{split} \text{tr} \left( X^{-1} (Z-X)\right) &= \sum_{i=1}^n \left(X^{-1} (Z-X) \right)_{ii}\\ &= \sum_{i=1}^n \sum_{j=1}^n X^{-1}_{ij} (Z_{ji}-X_{ji}) \\ \end{split}$$

Thus, comparing the two expressions term by term, we should have identified $(X^{-1})^T$ as the gradient of $\log \det$. Since $X$ is symmetric positive definite, $X^{-1}$ is symmetric as well, so $(X^{-1})^T = X^{-1}$, which is the form given in the book.
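This identification can also be verified directly. A quick sketch (assuming NumPy; the test matrix is arbitrary): treat all $n^2$ entries of $X$ as independent coordinates, differentiate $\log\det$ entrywise by central differences, and compare with $(X^{-1})^T$:

```python
# Entrywise finite-difference gradient of log det at X, compared to (X^{-1})^T.
import numpy as np

def logdet(M):
    return np.linalg.slogdet(M)[1]

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)        # positive definite, hence invertible

h = 1e-6
G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        Eij = np.zeros((n, n))
        Eij[i, j] = 1.0            # perturb a single coordinate X_ij
        G[i, j] = (logdet(X + h * Eij) - logdet(X - h * Eij)) / (2 * h)

print(np.max(np.abs(G - np.linalg.inv(X).T)))   # ≈ 0
```

Since this $X$ is symmetric, $(X^{-1})^T = X^{-1}$, matching the book's statement.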

  • As Did said, the second term of the first-order approximation $f(x)+Df(x)(z−x)$ is actually an inner product of the derivative $Df(x)$ and $(z-x)$, but why is it written as $Df(x)(z-x)$ instead of $⟨Df(x),z−x⟩$ ? After all, the inner product is $Df(x)^T(z-x)$ – BioCoder Sep 24 '15 at 03:49
  • That corresponds to matrix multiplication. You can think of $Df(x)$ as a $1\times n$ matrix and $(z-x)$ an $n\times 1$ matrix. Then the dot product $\langle Df(x) , z-x\rangle$ is the same as the matrix multiplication $Df(x) (z-x)$. @BioCoder –  Sep 24 '15 at 03:53
  • Please add some steps before concluding "Thus we should have identified ...". I am also trying to understand this but I am unable to understand this. Are you concluding by comparing the second equation in your post with the fourth equation? Please help. – Frank Moses Jan 24 '18 at 03:38
  • @FrankMoses Yes, it's just comparing the terms. –  Jan 24 '18 at 04:05
  • Please add some steps to get the second equation in your post, and please make the comparison easier. I am sorry, but I cannot get it from the current form. I will be very thankful to you. – Frank Moses Jan 24 '18 at 04:07
  • And I do not understand why the first equation is written for vector $X$ while the question is about matrix $X$. Maybe my question is too basic but I will be very thankful to you for helping me in understanding this – Frank Moses Jan 24 '18 at 04:10
  • Is your second equation right or should it be $\sum_{i=1}^{K}D_{i}f(x_i)(y_i-x_i)$? – Frank Moses Jan 24 '18 at 05:40

The trace $\mathrm{tr}(X^{-1}(Z-X))$ is the standard inner product of $X^{-1}$ and $Z-X$. The choice of inner product depends on the specific space.

So it does not mean that $\mathrm{tr}(X^{-1}(Z-X))$ equals $Df(x)(Z-X)$ as a matrix multiplication; it is the definition of the inner product that matters.
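A small sketch of this point (assuming NumPy; the matrices are arbitrary): on symmetric matrices, $\mathrm{tr}(UV)$ coincides with the standard inner product of $\mathbb R^{n^2}$, i.e. the elementwise (Frobenius) product:

```python
# tr(UV) equals the Frobenius inner product <U, V> = sum_ij U_ij V_ij
# when U is symmetric, since tr(U^T V) = tr(UV) for U = U^T.
import numpy as np

rng = np.random.default_rng(2)
n = 4
U = rng.standard_normal((n, n)); U = (U + U.T) / 2   # symmetric U
V = rng.standard_normal((n, n)); V = (V + V.T) / 2   # symmetric V

ip_frobenius = np.sum(U * V)       # <U, V> viewed as vectors in R^{n^2}
ip_trace     = np.trace(U @ V)     # tr(UV)
print(ip_frobenius, ip_trace)      # identical up to rounding
```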

Aaron