
Let $\Sigma$ be an arbitrary $n\times n$ positive semidefinite matrix and $\mathbb S^{n}_{+}$ denote the set of $n \times n$ positive semidefinite matrices.

Why is the optimal solution to $ \min\limits_{S \in \mathbb S^{n}_{+}} -\log(\det(S))+\text{trace}(\Sigma S)$ equal to $\Sigma^{-1}$? I am totally lost on where to begin on proving the above.
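For what it's worth, a quick numerical check is consistent with the claim. The sketch below (assuming NumPy and SciPy are available; the Cholesky-style parametrization $S = LL^{\mathsf T}$ is just one convenient way to keep the iterates positive semidefinite) minimizes the objective with a generic optimizer and compares the result with $\Sigma^{-1}$:

```python
# Numerical sanity check (a sketch, not a proof): minimize
# F(S) = -log det S + tr(Sigma S) over S and compare with Sigma^{-1}.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)     # a random positive-definite Sigma

tril = np.tril_indices(n)           # free parameters: lower triangle of L

def F(theta):
    # Parametrize S = L @ L.T so every iterate is positive semidefinite.
    L = np.zeros((n, n))
    L[tril] = theta
    S = L @ L.T
    sign, logdet = np.linalg.slogdet(S)
    if sign <= 0:                   # guard against singular iterates
        return np.inf
    return -logdet + np.trace(Sigma @ S)

theta0 = np.eye(n)[tril]            # start from S = I
res = minimize(F, theta0, method="Nelder-Mead",
               options={"xatol": 1e-12, "fatol": 1e-12, "maxiter": 50000})

L = np.zeros((n, n))
L[tril] = res.x
S_opt = L @ L.T
err = np.max(np.abs(S_opt - np.linalg.inv(Sigma)))
print("max deviation from Sigma^{-1}:", err)
```

If the claim holds, the reported deviation should be near zero.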

SABOY
    Perhaps you could try to take the gradient with respect to $S$? You can see https://math.stackexchange.com/questions/1447692/why-the-gradient-of-log-detx-is-x-1-and-where-did-trace-tr-go for how to find the gradient of $\log \det$ and https://math.stackexchange.com/a/3941224 for finding the gradient of $\mathrm{trace}$. – Minus One-Twelfth Nov 26 '21 at 23:46
    If $\Sigma$ is only assumed to be positive semidefinite, what is $\Sigma^{-1}$ (especially when $\Sigma$ is not positive definite)? – Sangchul Lee Nov 27 '21 at 04:45
  • Agree if the sample size is large and one has no other information. Otherwise, can one do better with some prior information in smaller samples? – AJKOER Dec 16 '21 at 22:58

3 Answers


My reformulation of this problem is first to determine the eigenvalues of the matrix $S$.

The reformulated problem is then to examine the sum of the logs of the reciprocals of these eigenvalues of $S$ (this is the $-\log(\det(S))$ part), subject to the eigenvalues summing to the trace of the inverse of the given $\Sigma$ matrix.

Trimming the largest contributions, which come from the smallest (near-zero) eigenvalues, is a place to start.

Finally, re-apply the diagonalizing transformation to produce a final candidate solution matrix.

AJKOER

Define $F : \mathbb{S}_+^n \to (-\infty, +\infty]$ by

$$ F(S) = -\log\det S + \operatorname{tr}(\Sigma S). $$

Also, denote by $\operatorname{int}(\mathbb{S}_+^{n})$ the set of all $n\times n$ positive-definite matrices. In this answer, we will only consider the case where $\Sigma \in \operatorname{int}(\mathbb{S}_+^{n})$.

Let $S \in \operatorname{int}(\mathbb{S}_+^{n})$. Then for any $n\times n$ symmetric $H$, there exists $\delta > 0$ such that $\| t S^{-1}H \| < 1$ and $S + tH \in \operatorname{int}(\mathbb{S}_+^{n})$ for all $|t| < \delta$, where $\| \cdot \|$ is the operator norm. Moreover, for such $t$, we get

\begin{align*} F(S + t H) &= F(S) - \log \det (I + t S^{-1}H) + \operatorname{tr}(\Sigma H) t \\ &= F(S) - \operatorname{tr} \log (I + t S^{-1}H) + \operatorname{tr}(\Sigma H) t . \end{align*}

From this, it follows that

$$ \left. \frac{\mathrm{d}}{\mathrm{d}t} \right|_{t=0} F(S + t H) = \operatorname{tr}((\Sigma - S^{-1})H) \tag{1} $$

and

$$ \left. \frac{\mathrm{d}^2}{\mathrm{d}t^2} \right|_{t=0} F(S + t H) = \operatorname{tr}(S^{-1}HS^{-1}H) = \operatorname{tr}\bigl((S^{-1/2}HS^{-1/2})^2\bigr) \geq 0 \tag{2} $$

with equality if and only if $H = 0$.

Then $\text{(2)}$ shows that $F$ is strictly convex on $\operatorname{int}(\mathbb{S}_+^n)$, hence by continuity, $F$ is convex on $\mathbb{S}_+^n$. Moreover, $\text{(1)}$ shows that $S = \Sigma^{-1}$ is the unique critical point of $F$. Therefore $\Sigma^{-1}$ is the unique minimizer of $F$ as required.
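Formulas $\text{(1)}$ and $\text{(2)}$ can also be checked numerically. Here is a small finite-difference sketch (assuming NumPy; the test point $S$ and direction $H$ are arbitrary random choices, not part of the argument):

```python
# Finite-difference check of the first and second directional derivatives
# of F(S) = -log det S + tr(Sigma S) against formulas (1) and (2).
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)    # positive definite
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)        # an interior point of S^n_+
H = rng.standard_normal((n, n))
H = (H + H.T) / 2                  # symmetric direction

def F(S):
    return -np.linalg.slogdet(S)[1] + np.trace(Sigma @ S)

S_inv = np.linalg.inv(S)

t = 1e-6
d1_fd = (F(S + t * H) - F(S - t * H)) / (2 * t)   # central difference
d1 = np.trace((Sigma - S_inv) @ H)                # formula (1)

t = 1e-4
d2_fd = (F(S + t * H) - 2 * F(S) + F(S - t * H)) / t**2
d2 = np.trace(S_inv @ H @ S_inv @ H)              # formula (2)

print("first-derivative error:", abs(d1_fd - d1))
print("second-derivative error:", abs(d2_fd - d2))
```

Both errors should be small (limited only by the finite-difference step and floating-point roundoff).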

Sangchul Lee

A minimum does not exist if $\Sigma$ is singular. In fact, if you define $Sx=kx$ when $x\in\ker\Sigma$ and $Sx=x$ when $x\in(\ker\Sigma)^\perp=\operatorname{range}(\Sigma)$, then $$ \lim_{k\to+\infty}\bigl(-\log\det(S)+\operatorname{tr}(\Sigma S)\bigr) =\lim_{k\to+\infty}\bigl(-\operatorname{nullity}(\Sigma)\log(k)+\operatorname{tr}(\Sigma)\bigr) =-\infty. $$

Suppose $\Sigma$ is nonsingular. That is, suppose it is positive definite. Then $P=\Sigma^{1/2}S\Sigma^{1/2}$ is positive definite too. Denote the spectrum of $P$ by $\sigma$. Then $$\begin{aligned} -\log\det(S)+\operatorname{tr}(\Sigma S) &=\log\det\Sigma-\log\det(P)+\operatorname{tr}(P)\\ &=\log\det\Sigma+\sum_{\lambda\in\sigma}(-\log\lambda+\lambda). \end{aligned}$$ Since $-\log\lambda+\lambda$ attains its only global minimum over $(0,\infty)$ at $\lambda=1$, the objective function attains its minimum exactly when $P=I$. Hence $S=\Sigma^{-1}$ is the unique global minimiser.
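This eigenvalue identity, and the optimal value $\log\det\Sigma + n$ attained at $S=\Sigma^{-1}$, can be verified numerically. A NumPy sketch (the square root $\Sigma^{1/2}$ is formed via an eigendecomposition; the random $\Sigma$ and $S$ are arbitrary):

```python
# Check: -log det S + tr(Sigma S) = log det Sigma + sum(-log(lam) + lam),
# where lam runs over the spectrum of P = Sigma^{1/2} S Sigma^{1/2},
# and the value at S = Sigma^{-1} equals log det Sigma + n.
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
Sigma = A @ A.T + np.eye(n)               # positive definite
B = rng.standard_normal((n, n))
S = B @ B.T + np.eye(n)                   # an arbitrary positive-definite S

def F(S):
    return -np.linalg.slogdet(S)[1] + np.trace(Sigma @ S)

w, V = np.linalg.eigh(Sigma)
R = V @ np.diag(np.sqrt(w)) @ V.T         # Sigma^{1/2}
P = R @ S @ R
lam = np.linalg.eigvalsh((P + P.T) / 2)   # spectrum of P
logdet_Sigma = np.linalg.slogdet(Sigma)[1]

lhs = F(S)
rhs = logdet_Sigma + np.sum(-np.log(lam) + lam)
print("identity error:", abs(lhs - rhs))

gap = F(S) - (logdet_Sigma + n)           # F(S) - F(Sigma^{-1}) >= 0
print("gap above the minimum:", gap)
print("value at Sigma^{-1} minus (log det Sigma + n):",
      abs(F(np.linalg.inv(Sigma)) - (logdet_Sigma + n)))
```

The identity error should vanish up to roundoff, and the gap is nonnegative, consistent with $S=\Sigma^{-1}$ being the global minimiser.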

user1551