
I'm struggling with a part of a proof.

Let $A = \mathcal{N}(\mu, \Sigma)$ be an $n$-variate Gaussian, and let $R$ be an $n \times n$ rotation matrix. We can rotate this distribution by $R$ to obtain $\mathcal{N}(R \mu, R\Sigma R^T)$. Now I want to know: which rotation matrix minimizes the sum of the marginal standard deviations?

To formalize: we want to minimize the sum of the square roots of the diagonal elements, $$\text{argmin}_{R} \sum_{i=1}^n \sqrt{(R\Sigma R^T)_{i,i}}$$


Strong suspicion:

I have a strong suspicion that this sum is minimized if we rotate the distribution so that it becomes uncorrelated (take the normalized principal component vectors as the new basis and construct $R$ to rotate into that basis). This appeared to be the case when I solved the problem numerically; see the sketch below.
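For concreteness, here's a minimal sketch of the numerical check (assuming NumPy; the covariance is just a randomly generated test case and `sum_of_stds` is an ad-hoc helper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sum_of_stds(Sigma, R):
    """Sum of marginal standard deviations of N(R mu, R Sigma R^T)."""
    return np.sum(np.sqrt(np.diag(R @ Sigma @ R.T)))

n = 4
A = rng.standard_normal((n, n))
Sigma = A @ A.T  # random PSD covariance (an arbitrary test case)

# Rotate to the eigenbasis: eigh gives Sigma = U D U^T with U orthogonal,
# so R = U^T makes R Sigma R^T = D (diagonal, i.e. uncorrelated).
_, U = np.linalg.eigh(Sigma)
pca_value = sum_of_stds(Sigma, U.T)

# Compare with many random orthogonal matrices (QR of a Gaussian matrix;
# det may be -1, but the objective is insensitive to reflections).
best_random = min(
    sum_of_stds(Sigma, np.linalg.qr(rng.standard_normal((n, n)))[0])
    for _ in range(10_000)
)

print(f"eigenbasis: {pca_value:.6f}  best random: {best_random:.6f}")
# consistent with the suspicion: the eigenbasis value should be the minimum
```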

As for the reasoning why, I'm stuck.

Steps so far:

The trace of $R\Sigma R^T$ (i.e. the sum of the marginal variances) is invariant under rotation, since $\text{trace}(R\Sigma R^T) = \text{trace}(\Sigma R^T R) = \text{trace}(\Sigma)$.

Therefore, the problem "feels" a bit like minimizing $\sum_{i=1}^n |a_i|$ subject to the constraint $\sum_{i=1}^n a_i^2 = C$, which is achieved by making the $a_i$ as unequal as possible (for $n=2$ and $C=1$: $(1,0)$ gives sum $1$, while $(1/\sqrt{2}, 1/\sqrt{2})$ gives $\sqrt{2}$), but that's as far as I got.

Maybe we can use the fact that the uncorrelated basis / principal component basis is the one used in PCA, as its directions explain the maximum amount of variance?

Another way I tried to look at it is to take an uncorrelated Gaussian and show that any rotation would increase the sum of marginal standard deviations, but that didn't help much either.

  • Stripping the problem down to its simplest form, your question is: for real symmetric $B\succeq \mathbf 0$ with diagonal matrix of eigenvalues $D$, is it true that $\text{trace}\Big( \big(B\circ I\big)^\frac{1}{2}\Big)\geq \text{trace}\Big( D^\frac{1}{2}\Big)$, where $\circ$ denotes the Hadamard product? The answer is yes, though the only answer I can think of uses majorization and I suspect you won't understand it. – user8675309 Aug 22 '22 at 01:54
  • How would you prove this? I'm familiar with the basics of majorization, but I don't see how it brings me any further. What I've tried since then: because the sum of variances is fixed (the trace of a covariance matrix is invariant under rotation), we can rewrite this as a problem of maximizing the spread among the standard deviations. I was hoping to apply a PCA-like proof to it. –  Aug 22 '22 at 13:38

1 Answer


Consider any $n\times n$ real symmetric PSD matrix $B=UDU^T$ where $U$ is orthogonal and $D$ is diagonal. Now collect the diagonal elements of $B$ in vector $\mathbf b$ and the eigenvalues of $B$ (the diagonal entries of $D$) in vector $\mathbf d$.

1.) $\mathbf b\preceq \mathbf d$
which reads: $\mathbf b$ is (strongly) majorized by $\mathbf d$

This is a corollary of "Maximize $\mathrm{tr}(Q^TCQ)$ subject to $Q^TQ=I$": adding the constraint that the columns of $Q$ be standard basis vectors proves that for any $m\in \big\{1,2,\dots,n\big\}$
$$\sum_{k=1}^m b_{[k]} \leq \sum_{k=1}^m d_{[k]}$$
(where $b_{[k]}$ denotes the $k$th largest value in $\mathbf b$).

And recall for the case of $m=n$ that
$\sum_{k=1}^n d_{k}=\text{trace}\big(D\big)=\text{trace}\big(UDU^T\big)=\text{trace}\big(B\big)=\sum_{k=1}^n b_{k}$
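Statement (1) is the classical Schur–Horn majorization of the diagonal by the spectrum; here is a quick numerical sanity check (a sketch assuming NumPy, with an ad-hoc `majorized_by` helper):

```python
import numpy as np

rng = np.random.default_rng(1)

def majorized_by(b, d, tol=1e-10):
    """True if b is majorized by d: equal total sums, and every partial sum
    of the entries sorted in decreasing order is dominated by d's."""
    b_sorted, d_sorted = np.sort(b)[::-1], np.sort(d)[::-1]
    return (abs(b.sum() - d.sum()) < tol
            and np.all(np.cumsum(b_sorted) <= np.cumsum(d_sorted) + tol))

for _ in range(1000):
    A = rng.standard_normal((5, 5))
    B = A @ A.T                    # random real symmetric PSD matrix
    b = np.diag(B)                 # diagonal entries of B
    d = np.linalg.eigvalsh(B)      # eigenvalues of B
    assert majorized_by(b, d)
print("diag(B) is majorized by the eigenvalues in every trial")
```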

2.) For $x\geq 0$, the map $x\mapsto x^\frac{1}{2}$ is concave (check the second derivative), thus $f:\mathbb R_{\geq 0}^n\longrightarrow \mathbb R$ given by
$f\big(\mathbf a\big)= \sum_{k=1}^n a_k^\frac{1}{2}$ is Schur concave (a coordinatewise sum of a concave function), i.e. $\mathbf b\preceq \mathbf d$ implies $f(\mathbf b)\geq f(\mathbf d)$. For instance $\mathbf b=(2,2)\preceq \mathbf d=(4,0)$ and indeed $f(\mathbf b)=2\sqrt 2\geq 2=f(\mathbf d)$.

Putting (1) and (2) together gives
$\text{trace}\big((B\circ I)^\frac{1}{2}\big)=f\big(\mathbf b\big)\geq f\big(\mathbf d\big) = \text{trace}\big(D^\frac{1}{2}\big)$
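And a direct numerical check of this final inequality (again a sketch assuming NumPy; eigenvalues are clipped at zero to guard against floating-point noise):

```python
import numpy as np

rng = np.random.default_rng(2)

for _ in range(1000):
    A = rng.standard_normal((5, 5))
    B = A @ A.T                                    # random PSD matrix
    lhs = np.sqrt(np.diag(B)).sum()                # trace((B o I)^(1/2))
    d = np.clip(np.linalg.eigvalsh(B), 0, None)    # clip tiny negative noise
    rhs = np.sqrt(d).sum()                         # trace(D^(1/2))
    assert lhs >= rhs - 1e-10
print("trace((B o I)^{1/2}) >= trace(D^{1/2}) held in every trial")
```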

– user8675309