I'm struggling with a part of a proof.
Let $A = \mathcal{N}(\mu, \Sigma)$ be an $n$-variate Gaussian, and let $R$ be an $n \times n$ rotation matrix. Rotating this distribution by $R$ gives $\mathcal{N}(R \mu, R\Sigma R^T)$. Now I want to know: under which rotation matrix is the sum of marginal standard deviations minimized?
To formalize, we want to minimize the sum of the square roots of the diagonal elements: $$\text{argmin}_{R} \sum_{i=1}^n \sqrt{(R\Sigma R^T)_{i,i}}$$
Strong suspicion:
I have a strong suspicion that this sum is minimized if we rotate the distribution so that it becomes uncorrelated (i.e., use the normalized principal component vectors as the new basis and construct $R$ so that we rotate into that basis). This appeared to be the case when I solved the problem numerically.
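In case it helps, here is a minimal sketch of the kind of numerical check I mean (NumPy; the helper name `sum_of_stds` is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric positive-definite covariance matrix Sigma.
n = 4
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)

def sum_of_stds(R, Sigma):
    """Sum of marginal standard deviations of the rotated covariance R Sigma R^T."""
    return np.sum(np.sqrt(np.diag(R @ Sigma @ R.T)))

# Rotation into the principal-component basis: the columns of Q are the
# normalized eigenvectors of Sigma, so Q.T Sigma Q is diagonal and the
# objective becomes sum(sqrt(eigenvalues)).
eigvals, Q = np.linalg.eigh(Sigma)
pca_value = sum_of_stds(Q.T, Sigma)

# Compare against many random orthogonal matrices (QR of a Gaussian matrix;
# not exactly uniformly distributed, but good enough for a sanity check).
random_values = [
    sum_of_stds(np.linalg.qr(rng.standard_normal((n, n)))[0], Sigma)
    for _ in range(10_000)
]

print(pca_value, min(random_values))  # pca_value came out smallest in my experiments
```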
As for why this holds, I'm getting stuck.
Steps so far:
The trace of $R\Sigma R^T$ (i.e., the sum of marginal variances) is invariant under rotation.
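This follows from the cyclic property of the trace and $R^T R = I$: $$\operatorname{tr}(R\Sigma R^T) = \operatorname{tr}(\Sigma R^T R) = \operatorname{tr}(\Sigma).$$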
Therefore, the problem "feels" a bit like minimizing $\sum_{i=1}^n |a_i|$ over a set of numbers subject to the constraint $\sum_{i=1}^n a_i^2 = C$, which is usually achieved by making the $a_i$ as unequal as possible, but that's as far as I got.
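For example, with $n = 2$ and $C = 2$: the pair $(1, 1)$ gives $|a_1| + |a_2| = 2$, whereas $(\sqrt{2}, 0)$ satisfies the same constraint but gives only $\sqrt{2}$.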
Maybe we can use the fact that the uncorrelated/principal component basis is the one used in PCA, since its directions successively explain the maximum amount of variance?
Another way I tried to look at it is by taking an uncorrelated Gaussian and showing that any rotation would increase the sum of marginal standard deviations, but that didn't help much either.
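For reference, in the $2 \times 2$ case with $\Sigma = \operatorname{diag}(\lambda_1, \lambda_2)$ and a rotation by angle $\theta$, the rotated diagonal entries are $\lambda_1 \cos^2\theta + \lambda_2 \sin^2\theta$ and $\lambda_1 \sin^2\theta + \lambda_2 \cos^2\theta$, so the claim reduces to showing $$\sqrt{\lambda_1 \cos^2\theta + \lambda_2 \sin^2\theta} + \sqrt{\lambda_1 \sin^2\theta + \lambda_2 \cos^2\theta} \ge \sqrt{\lambda_1} + \sqrt{\lambda_2},$$ which seems to hold, but I don't see how to generalize this to arbitrary $n$.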