
I'm struggling with a part of a proof.

Let $A = \mathcal{N}(\mu, \Sigma)$ be an $n$-variate Gaussian, and let $R$ be an $n \times n$ rotation matrix. We can rotate this distribution by $R$ to obtain $\mathcal{N}(R \mu, R\Sigma R^T)$. Now I want to know: which rotation matrix minimizes the sum of the marginal standard deviations?

To formalize: we want to minimize the sum of the square roots of the diagonal elements, $$\text{argmin}_{R} \sum_{i=1}^n \sqrt{(R\Sigma R^T)_{i,i}}$$


Strong suspicion:

I have a strong suspicion that this sum is minimized if we rotate the distribution so that it becomes uncorrelated (take the normalized principal component vectors as the new basis and construct $R$ to rotate into that basis). This appeared to be the case when I solved the problem numerically; see the sketch below.
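For concreteness, here's a minimal sketch of the numerical check (assuming NumPy; the covariance is just a randomly generated test case and `sum_of_stds` is an ad-hoc helper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sum_of_stds(Sigma, R):
    """Sum of marginal standard deviations of N(R mu, R Sigma R^T)."""
    return np.sum(np.sqrt(np.diag(R @ Sigma @ R.T)))

n = 4
A = rng.standard_normal((n, n))
Sigma = A @ A.T  # random PSD covariance (an arbitrary test case)

# Rotate to the eigenbasis: eigh gives Sigma = U D U^T with U orthogonal,
# so R = U^T makes R Sigma R^T = D (diagonal, i.e. uncorrelated).
_, U = np.linalg.eigh(Sigma)
pca_value = sum_of_stds(Sigma, U.T)

# Compare with many random orthogonal matrices (QR of a Gaussian matrix;
# det may be -1, but the objective is insensitive to reflections).
best_random = min(
    sum_of_stds(Sigma, np.linalg.qr(rng.standard_normal((n, n)))[0])
    for _ in range(10_000)
)

print(f"eigenbasis: {pca_value:.6f}  best random: {best_random:.6f}")
# consistent with the suspicion: the eigenbasis value should be the minimum
```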

As for the reasoning why, I'm stuck.

Steps so far:

The trace of $R\Sigma R^T$ (i.e. the sum of the marginal variances) is invariant under rotation, since $\text{trace}(R\Sigma R^T) = \text{trace}(\Sigma R^T R) = \text{trace}(\Sigma)$.

Therefore, the problem "feels" a bit like minimizing $\sum_{i=1}^n |a_i|$ subject to the constraint $\sum_{i=1}^n a_i^2 = C$, which is achieved by making the $a_i$ as unequal as possible (for $n=2$ and $C=1$: $(1,0)$ gives sum $1$, while $(1/\sqrt{2}, 1/\sqrt{2})$ gives $\sqrt{2}$), but that's as far as I got.

Maybe we can use the fact that the uncorrelated basis / principal component basis is the one used in PCA, as its directions explain the maximum amount of variance?

Another way I tried to look at it is to take an uncorrelated Gaussian and show that any rotation would increase the sum of marginal standard deviations, but that didn't help much either.

  • Stripping the problem down to its simplest form, your question is: for real symmetric $B\succeq \mathbf 0$ with diagonal matrix of eigenvalues $D$, is it true that $\text{trace}\Big( \big(B\circ I\big)^\frac{1}{2}\Big)\geq \text{trace}\Big( D^\frac{1}{2}\Big)$, where $\circ$ denotes the Hadamard product? The answer is yes, though the only answer I can think of uses majorization and I suspect you won't understand it. – user8675309 Aug 22 '22 at 01:54
  • How would you prove this? I'm familiar with the basics of majorization, but I don't see how it brings me any further. What I've tried since then: because the sum of variances is fixed (the trace of a covariance matrix is invariant under rotation), we can rewrite this as a problem of maximizing the spread among the standard deviations. I was hoping to apply a PCA-like proof to it. –  Aug 22 '22 at 13:38

1 Answer


Consider any $n\times n$ real symmetric PSD matrix $B=UDU^T$ where $U$ is orthogonal and $D$ is diagonal. Now collect the diagonal elements of $B$ in vector $\mathbf b$ and the eigenvalues of $B$ (the diagonal entries of $D$) in vector $\mathbf d$.

1.) $\mathbf b\preceq \mathbf d$
which reads: $\mathbf b$ is (strongly) majorized by $\mathbf d$

This is a corollary of "Maximize $\mathrm{tr}(Q^TCQ)$ subject to $Q^TQ=I$": adding the constraint that the columns of $Q$ be standard basis vectors proves that for any $m\in \big\{1,2,\dots,n\big\}$
$$\sum_{k=1}^m b_{[k]} \leq \sum_{k=1}^m d_{[k]}$$
(where $b_{[k]}$ denotes the $k$th largest value in $\mathbf b$).

And recall for the case of $m=n$ that
$\sum_{k=1}^n d_{k}=\text{trace}\big(D\big)=\text{trace}\big(UDU^T\big)=\text{trace}\big(B\big)=\sum_{k=1}^n b_{k}$
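Statement (1) is the classical Schur–Horn majorization of the diagonal by the spectrum; here is a quick numerical sanity check (a sketch assuming NumPy, with an ad-hoc `majorized_by` helper):

```python
import numpy as np

rng = np.random.default_rng(1)

def majorized_by(b, d, tol=1e-10):
    """True if b is majorized by d: equal total sums, and every partial sum
    of the entries sorted in decreasing order is dominated by d's."""
    b_sorted, d_sorted = np.sort(b)[::-1], np.sort(d)[::-1]
    return (abs(b.sum() - d.sum()) < tol
            and np.all(np.cumsum(b_sorted) <= np.cumsum(d_sorted) + tol))

for _ in range(1000):
    A = rng.standard_normal((5, 5))
    B = A @ A.T                    # random real symmetric PSD matrix
    b = np.diag(B)                 # diagonal entries of B
    d = np.linalg.eigvalsh(B)      # eigenvalues of B
    assert majorized_by(b, d)
print("diag(B) is majorized by the eigenvalues in every trial")
```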

2.) For $x\geq 0$, the map $x\mapsto x^\frac{1}{2}$ is concave (check the second derivative), thus $f:\mathbb R_{\geq 0}^n\longrightarrow \mathbb R$ given by
$f\big(\mathbf a\big)= \sum_{k=1}^n a_k^\frac{1}{2}$ is Schur concave (a coordinatewise sum of a concave function), i.e. $\mathbf b\preceq \mathbf d$ implies $f(\mathbf b)\geq f(\mathbf d)$. For instance $\mathbf b=(2,2)\preceq \mathbf d=(4,0)$ and indeed $f(\mathbf b)=2\sqrt 2\geq 2=f(\mathbf d)$.

Putting (1) and (2) together gives
$\text{trace}\big((B\circ I)^\frac{1}{2}\big)=f\big(\mathbf b\big)\geq f\big(\mathbf d\big) = \text{trace}\big(D^\frac{1}{2}\big)$
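And a direct numerical check of this final inequality (again a sketch assuming NumPy; eigenvalues are clipped at zero to guard against floating-point noise):

```python
import numpy as np

rng = np.random.default_rng(2)

for _ in range(1000):
    A = rng.standard_normal((5, 5))
    B = A @ A.T                                    # random PSD matrix
    lhs = np.sqrt(np.diag(B)).sum()                # trace((B o I)^(1/2))
    d = np.clip(np.linalg.eigvalsh(B), 0, None)    # clip tiny negative noise
    rhs = np.sqrt(d).sum()                         # trace(D^(1/2))
    assert lhs >= rhs - 1e-10
print("trace((B o I)^{1/2}) >= trace(D^{1/2}) held in every trial")
```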

– user8675309