
Let $C \in \mathbb{R}^{d \times d}$ be symmetric, and

$$Q = \begin{bmatrix} \vert & \vert & & \vert \\ q_1 & q_2 & \dots & q_K \\ \vert & \vert & & \vert \end{bmatrix} \in \mathbb{R}^{d\times K}$$

where $d \geq K$. I want to solve the following problem using Lagrange multipliers:

$$\begin{array}{ll} \text{maximize} & \mbox{tr} \left( Q^T C Q \right)\\ \text{subject to} & Q^T Q = I\end{array}$$


I am unfamiliar with this kind of constraint in this method, but after reading another post I believe the same specific and simple result given there also applies here, so the Lagrangian would be:

$$\mathcal{L}(Q,\lambda)=\mathrm{tr}(Q^TCQ)-\left<\lambda,Q^TQ-I\right>$$

where $\lambda\in\mathbb{R}^{K\times K}$ and $\left<\cdot,\cdot\right>$ is the element-wise inner product (which makes sense to me, since we are really adding as many scalar constraints as there are entries in these matrices).

In attempting to do that I start by setting $\frac{\partial \mathcal{L}}{\partial Q}=O\in\mathbb{R}^{d\times K}$ and computing the LHS element by element; for the $(l,m)$ one:

\begin{equation} 0=\frac{\partial \mathcal{L}}{\partial Q_{lm}}=(CQ+C^TQ)_{lm}-\underbrace{\frac{\partial}{\partial Q_{lm}}\sum_{i,j}\lambda_{i,j}(Q^TQ-I)_{ij}}_{=\lambda_{lm}\frac{\partial (Q^TQ)_{lm}}{\partial Q_{lm}}}=2(CQ)_{lm}-\lambda_{lm}\frac{\partial (q_l^Tq_m)}{\partial q_m(l)} \tag{1}\end{equation}

where in the last step I've used the definition I made at the beginning for $Q$, and $q_m(l)$ denotes the $l$-th component of the column vector $q_m$.

In trying to compute the very last term: $$\frac{\partial (q_l^Tq_m)}{\partial q_m(l)}=\frac{\partial \left[q_l(1)q_m(1)+ \ldots + q_l(d)q_m(d)\right]}{\partial q_m(l)}= \begin{cases} q_l(l)\equiv Q_{ll} & \text{if } l\neq m\\ 2q_l(l)\equiv 2Q_{ll} & \text{if } l=m \end{cases}$$

Equality (1) can then be written as:

$$0=2(CQ)_{lm}-\lambda_{lm}Q_{ll}(1+\delta_{lm})$$

where $\delta_{lm}$ is the Kronecker delta.

The other stationarity condition of the Lagrangian, $\frac{\partial \mathcal{L}}{\partial \lambda}=O\in\mathbb{R}^{K\times K}$, reads for the $(l,m)$ element:

$$ 0=\frac{\partial \mathcal L}{\partial \lambda_{lm}}= \frac{\partial }{\partial \lambda_{lm}}\sum_{i,j}\lambda_{i,j}(Q^TQ-I)_{ij}=(Q^TQ-I)_{lm}\tag{2}$$

which leads to $(Q^TQ)_{lm}=\delta_{lm}$.

All this should show that the columns of $Q$ end up being the first $K$ eigenvectors of $C$, but I don't know how to continue from here to prove that, assuming I haven't made a mistake. I would sincerely appreciate any help.


Edit:

I have rewritten the inner product as a trace of a product of matrices (after seeing this question):

$$\left<\lambda,Q^TQ-I\right>=\sum_{i,j}\lambda_{i,j}(Q^TQ-I)_{ij}=\mathrm{tr}\big(\lambda^T(Q^TQ-I)\big) $$

and have thus managed to do the derivative without losing the matrix format (using formulas from the Matrix Cookbook):

\begin{align} O=&\frac{\partial \mathcal{L}}{\partial Q}=\frac{\partial}{\partial Q}\mathrm{tr}(Q^TCQ)-\frac{\partial}{\partial Q}\underbrace{\mathrm{tr}(\lambda^T(Q^TQ-I))}_{\mathrm{tr}(\lambda^TQ^TQ)-\mathrm{tr}(\lambda^T)}\\=&(CQ+C^TQ)-(Q(\lambda^T)^T+Q\lambda^T)=2CQ-Q(\lambda+\lambda^T) \end{align}

And this leads to:

$$CQ=Q\underbrace{\left(\frac{\lambda+\lambda^T}{2}\right)}_{:=\widetilde{\lambda}},\quad\text{i.e.}\quad CQ=Q\widetilde{\lambda}$$

If the defined matrix $\widetilde{\lambda}=Q^TCQ$ (left-multiply the last equation by $Q^T$) were diagonal, we would already have the result.
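For my own reassurance, here is a minimal numerical sketch in numpy (the setup and variable names are mine, not from any of the linked posts): taking the columns of $Q$ to be the top-$K$ eigenvectors of a random symmetric $C$ satisfies $CQ=Q\widetilde{\lambda}$ with $\widetilde{\lambda}$ diagonal, and the trace equals the sum of the $K$ largest eigenvalues.

```python
import numpy as np

# Sanity check: with Q = top-K eigenvectors of a random symmetric C,
# the stationarity condition CQ = Q*lam_tilde holds, lam_tilde = Q^T C Q
# is diagonal, and tr(Q^T C Q) = sum of the K largest eigenvalues.
rng = np.random.default_rng(0)
d, K = 6, 3
A = rng.standard_normal((d, d))
C = (A + A.T) / 2                            # symmetric C

w, U = np.linalg.eigh(C)                     # eigenvalues in ascending order
Q = U[:, ::-1][:, :K]                        # eigenvectors of the K largest

lam_tilde = Q.T @ C @ Q
print(np.allclose(C @ Q, Q @ lam_tilde))                    # CQ = Q*lam_tilde
print(np.allclose(lam_tilde, np.diag(np.diag(lam_tilde))))  # lam_tilde diagonal
print(np.isclose(np.trace(lam_tilde), np.sort(w)[::-1][:K].sum()))
```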

abcd
  • $K \ge d$? – copper.hat Apr 22 '20 at 00:25
  • $d$ would be the dimension of the square (symmetric) matrix $C$, which then has $d$ (independent, since it's symmetric) eigenvectors, so $K \leq d$. If we set $K=d$ the resulting $Q$ would contain all the eigenvectors in its columns. – abcd Apr 22 '20 at 00:29
  • You should add that to the question. I guess it is implicit in $Q^TQ = I$, but no harm to add. – copper.hat Apr 22 '20 at 00:29
  • You can assume $C$ to be diagonal in the first place. That simplifies life considerably. – copper.hat Apr 22 '20 at 01:11
  • @abcd Do you want any solution or those using Lagrange multipliers? – River Li Apr 22 '20 at 11:54
  • @River Li For my problem itself I just needed a proof, but now that I've started with this method I'm really interested in how to apply the general answer I linked to this relatively simple case, and in whether what I did makes sense / goes anywhere. So yes, one using Lagrange multipliers, but I appreciate others' inputs as well. – abcd Apr 22 '20 at 18:08
  • It is not entirely clear that the Lagrange multiplier approach helps; the resulting equation is something like $2 \operatorname{tr}(Q^TCH)+ \lambda(Q^TH+H^TQ) = 0$ for all $H$, where $\lambda$ is a real functional. – copper.hat Apr 22 '20 at 20:11
  • This results in something like $CQ = - Q \lambda$. – copper.hat Apr 22 '20 at 20:20
  • I think I have made progress with the Lagrange approach and have arrived at something very similar to what @copper.hat describes (see edit). I believe $\widetilde{\lambda}$ is diagonal (from (2)) and hence it would be the matrix of eigenvalues (and hence the result), but I am not fully sure now. – abcd Apr 23 '20 at 02:59
  • I don't think the Lagrange approach will give you much more. – copper.hat Apr 23 '20 at 03:43

5 Answers


Since $C$ is real symmetric we can write $C=U \Lambda U^T$ where $\Lambda$ is a diagonal matrix of eigenvalues. As $(U^TQ)^T(U^TQ) = Q^T U U^T Q = I$, the substitution $Q \mapsto U^TQ$ preserves the constraint, so we can just assume $C= \operatorname{diag} (\lambda_1,...,\lambda_d)$, where $\lambda_1 \ge \cdots \ge \lambda_d$.

The problem is then $\max_{Q^TQ=I} \operatorname{tr}(Q^T \Lambda Q)$.

Note that $\operatorname{tr}(Q^T \Lambda Q) = \operatorname{tr}(Q^T Q Q^T \Lambda Q) = \operatorname{tr}( Q Q^T \Lambda QQ^T) = \operatorname{tr}(P^T \Lambda P)$, where $P=Q Q^T$.

Note that $P$ is an orthogonal projection onto a subspace of dimension $K$. Furthermore, any such orthogonal projection can be written in the form $Q Q^T$, where $Q^TQ = I$.

So now the problem is $\max_{P \text{ orthogonal projection}, \text{ rk } P=K} \operatorname{tr}(P^T \Lambda P)$.

Note that $\operatorname{tr}(P^T \Lambda P) = \sum_{n=1}^d \lambda_n \|P e_n\|^2$. Furthermore, note that $\|P\|_F^2 = K$ and so $\sum_{n=1}^d \|P e_n\|^2 = K$ with $0 \le \|P e_n\|^2 \le 1$. ($e_n$ is the $n$th unit vector.)

It is straightforward to check that $\max\{ \sum_{n=1}^d \lambda_n \mu_n \mid \sum_{n=1}^d \mu_n = K,\ 0 \le \mu_n \le 1 \}$ is $\lambda_1+\cdots+ \lambda_K$: the maximum of this linear function is attained at an extreme point of the feasible polytope, i.e., at a 0-1 vector $\mu$ with exactly $K$ ones, and the best such choice puts the ones on the $K$ largest $\lambda_n$.

Hence $\operatorname{tr}(P^T \Lambda P) \le \lambda_1+\cdots+ \lambda_K$, and by choosing the range ${\cal R} P = \operatorname{span}\{e_1,...,e_K \}$ we see that this bound is attained.
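Not part of the proof, but a quick numerical sketch (numpy; the construction and names are mine) of the key identity and bound: for a random rank-$K$ orthogonal projection $P = QQ^T$, $\operatorname{tr}(P^T\Lambda P)=\sum_n \lambda_n\|Pe_n\|^2 \le \lambda_1+\cdots+\lambda_K$.

```python
import numpy as np

# Verify tr(P^T Lam P) = sum_n lambda_n * ||P e_n||^2 for a random rank-K
# orthogonal projection P = Q Q^T, plus the constraints on mu_n = ||P e_n||^2.
rng = np.random.default_rng(1)
d, K = 6, 3
lam = np.sort(rng.standard_normal(d))[::-1]      # lambda_1 >= ... >= lambda_d
Lam = np.diag(lam)

Q, _ = np.linalg.qr(rng.standard_normal((d, K))) # Q^T Q = I
P = Q @ Q.T                                      # orthogonal projection, rank K

mu = np.linalg.norm(P, axis=0) ** 2              # mu_n = ||P e_n||^2
print(np.isclose(np.trace(P.T @ Lam @ P), lam @ mu))     # the identity above
print(np.isclose(mu.sum(), K), mu.max() <= 1 + 1e-12)    # sum = K, each in [0,1]
print(np.trace(P.T @ Lam @ P) <= lam[:K].sum() + 1e-12)  # the final bound
```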

copper.hat

Let $B := C + \delta I$ for some $\delta \in \mathbb{R}$ large enough that the real symmetric matrix $B\succ0$.

Let $\Sigma_B$ be a diagonal matrix with the singular values of $B$ (which are also its eigenvalues, since $B \succ 0$) and let $\Sigma_{QQ^T}$ have the singular values of $QQ^T$.

Singular values are in the usual ordering, largest to smallest; note this means $\Sigma_{QQ^T} = \begin{bmatrix} \mathbf I_k & \mathbf 0 \\ \mathbf 0 & \mathbf 0 \end{bmatrix}$.

By application of the von Neumann trace inequality:
$$\text{trace}\big(Q^TBQ\big)=\text{trace}\big((QQ^T)B\big)\leq \text{trace}\big(\Sigma_{QQ^T}\Sigma_{B}\big)= \sum_{i=1}^k \sigma_i^{(B)}= \sum_{i=1}^k \lambda_i^{(B)}$$

Making use of linearity we also know
$$\text{trace}\big(Q^TBQ\big) = \text{trace}\big(Q^T(C + \delta I)Q\big)= \text{trace}\big(Q^TC Q\big) + \delta\cdot \text{trace}\big( Q^TQ\big) = \text{trace}\big(Q^TC Q\big) + \delta \cdot k$$

To conclude,
$$\text{trace}\big(Q^TC Q\big) = \text{trace}\big(Q^TBQ\big) -\delta \cdot k \leq \Big( \sum_{i=1}^k \lambda_i^{(B)}\Big)-\delta \cdot k = \sum_{i=1}^k \big(\lambda_i^{(B)}-\delta\big) = \sum_{i=1}^k \lambda_i^{(C)}$$

and this is met with equality when you select the columns of $Q$ to be the first $k$ (mutually orthonormal) eigenvectors of $B$, which are also the first $k$ eigenvectors of $C$.
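As a sanity check, here is a small numpy sketch (my own naming, not part of the argument) of the shift trick: $B = C+\delta I$ is positive definite, the linearity step holds, and the final top-$k$ bound on $C$ follows for a random feasible $Q$.

```python
import numpy as np

# Check the shift trick: B = C + delta*I > 0,
# trace(Q^T C Q) = trace(Q^T B Q) - delta*k, and the top-k eigenvalue bound.
rng = np.random.default_rng(2)
d, k = 6, 3
A = rng.standard_normal((d, d))
C = (A + A.T) / 2
delta = np.abs(np.linalg.eigvalsh(C)).max() + 1.0   # large enough for B > 0
B = C + delta * np.eye(d)

Q, _ = np.linalg.qr(rng.standard_normal((d, k)))    # Q^T Q = I_k
lhs = np.trace(Q.T @ C @ Q)
print(np.isclose(lhs, np.trace(Q.T @ B @ Q) - delta * k))  # linearity step
lamC = np.sort(np.linalg.eigvalsh(C))[::-1]
print(lhs <= lamC[:k].sum() + 1e-12)                       # final bound
```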

user8675309

Here's a proof using Cauchy eigenvalue interlacing.

Given that $Q^T Q = I_k$, the matrix $A:=Q^T C Q$ has $k$ eigenvalues that interlace with those of $C$. With eigenvalues in the usual ordering,
$\lambda_1^{(C)} \geq \lambda_2^{(C)} \geq ... \geq \lambda_n^{(C)}$ and $\lambda_1^{(A)} \geq \lambda_2^{(A)} \geq ... \geq \lambda_k^{(A)}$,
a crude consequence of Cauchy interlacing is that
$\lambda_j^{(C)} \geq \lambda_j^{(A)}$ for $j\in\{1,2,...,k\}$.

Summing over the bound,
$$\sum_{j=1}^k \lambda_j^{(C)} \geq \sum_{j=1}^k\lambda_j^{(A)} = \text{trace}\big(Q^T C Q\big)$$
The upper bound is met with equality when $Q$ is chosen to contain the first $k$ eigenvectors of $C$.
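A quick numerical illustration (numpy; the random construction is mine) of the crude interlacing consequence and the resulting trace bound:

```python
import numpy as np

# Check lambda_j(C) >= lambda_j(A) for A = Q^T C Q with Q^T Q = I_k,
# and the summed (trace) bound that follows.
rng = np.random.default_rng(3)
n, k = 6, 3
M = rng.standard_normal((n, n))
C = (M + M.T) / 2
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))

lamC = np.sort(np.linalg.eigvalsh(C))[::-1]            # descending
lamA = np.sort(np.linalg.eigvalsh(Q.T @ C @ Q))[::-1]
print(np.all(lamC[:k] >= lamA - 1e-12))                # interlacing bound
print(lamA.sum() <= lamC[:k].sum() + 1e-12)            # summed: the trace bound
```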

user8675309

A proof by Schur-Horn theorem:

Let $V = [Q \ P]$ be an orthogonal matrix. Then $Q = V\left( \begin{array}{c} I_K \\ 0 \\ \end{array} \right)$. We have \begin{align} \mathrm{Tr}(Q^{\mathsf{T}}CQ) &= \mathrm{Tr}\left([I_K \ 0]V^{\mathsf{T}}CV\left( \begin{array}{c} I_K \\ 0 \\ \end{array} \right)\right)\\ &= \mathrm{Tr}\left(V^{\mathsf{T}}CV\left( \begin{array}{c} I_K \\ 0 \\ \end{array} \right)[I_K \ 0]\right)\tag{1}\\ &= \mathrm{Tr}\left(V^{\mathsf{T}}CV\left( \begin{array}{cc} I_K & 0 \\ 0 & 0 \\ \end{array} \right) \right)\\ &= \sum_{i=1}^K (V^\mathsf{T}CV)_{i,i}. \tag{2} \end{align} In (1), we have used the well-known fact that $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$ for $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times m}$.

Thus, we turn to find an orthogonal matrix $V$ such that $\sum_{i=1}^K (V^\mathsf{T}CV)_{i,i}$ is maximized.
Let $C = U\mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_d)U^\mathsf{T}$ be the eigendecomposition of $C$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ are the eigenvalues of $C$ in descending order, and $U$ is an orthogonal matrix whose columns are the eigenvectors of $C$. Let $$G = V^\mathsf{T}CV = V^\mathsf{T}U\mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_d)U^\mathsf{T}V. \tag{3}$$ Clearly, $\lambda_1, \lambda_2, \cdots, \lambda_d$ are also the eigenvalues of $G$. Let $g = (G_{1,1}, G_{2,2}, \cdots, G_{d,d})$ and $\lambda = (\lambda_1, \lambda_2, \cdots, \lambda_d)$. By the Schur-Horn theorem [1][2], we know that $g$ is majorized by $\lambda$, which results in $$\sum_{i=1}^K G_{i,i} \le \sum_{i=1}^K \lambda_i \tag{4}$$ with equality if $U^\mathsf{T}V = I_d$ (see (3)), i.e., $V = U$.

We conclude that the maximum of $\mathrm{Tr}(Q^{\mathsf{T}}CQ)$ is $\sum_{i=1}^K \lambda_i$ which is achieved at $Q = U\left( \begin{array}{c} I_K \\ 0 \\ \end{array} \right)$.

Reference

[1] https://en.wikipedia.org/wiki/Schur%E2%80%93Horn_theorem

[2] https://mathworld.wolfram.com/HornsTheorem.html

Definition of majorization: Let $x, y \in \mathbb{R}^n$ be given. We say that $y$ is majorized by $x$ if and only if $$\sum_{i=1}^k x_{[i]} \ge \sum_{i=1}^k y_{[i]}, \ k=1, 2, \cdots, n-1$$ and $$\sum_{i=1}^n x_{[i]} = \sum_{i=1}^n y_{[i]}$$ where $x_{[1]} \ge x_{[2]} \ge \cdots \ge x_{[n]}$ denotes a decreasing rearrangement of $x_1, x_2, \cdots, x_n$.
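For intuition, here is a small numpy sketch (names mine, not part of the proof) of the Schur-Horn statement being used: the diagonal of $G=V^\mathsf{T}CV$ is majorized by the eigenvalues of $C$ for any orthogonal $V$.

```python
import numpy as np

# Check that the diagonal of G = V^T C V is majorized by the eigenvalues of C:
# partial sums of the sorted diagonal never exceed those of the eigenvalues,
# and the totals agree (both equal tr C).
rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
C = (A + A.T) / 2
V, _ = np.linalg.qr(rng.standard_normal((n, n)))     # random orthogonal V

G = V.T @ C @ V
g = np.sort(np.diag(G))[::-1]                        # decreasing rearrangement
lam = np.sort(np.linalg.eigvalsh(C))[::-1]
print(np.all(np.cumsum(g) <= np.cumsum(lam) + 1e-12))  # partial sums
print(np.isclose(g.sum(), lam.sum()))                  # totals agree
```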

River Li

Here's a slightly shorter and, hopefully, more transparent version of copper.hat's answer:

Let $e_j$ be the eigenvectors of $C$, with eigenvalues $\lambda_j$. Then \begin{align}\mathrm{tr}Q^TCQ&=\sum_{i=1}^k\langle q_i,Cq_i\rangle\\ &=\sum_{i=1}^k\sum_{j=1}^d\langle q_i,e_j\rangle\langle e_j,Cq_i\rangle\\ &=\sum_{i,j}\lambda_j|\langle e_j,q_i\rangle|^2=\sum_{j=1}^d\lambda_j\alpha_j^2 \end{align} where $0\le\alpha_j^2=\sum_i|\langle e_j,q_i\rangle|^2\le\|e_j\|^2=1$ and $\sum_{j=1}^d\alpha_j^2=\sum_{i,j}|\langle e_j,q_i\rangle|^2=\sum_{i=1}^k\|q_i\|^2=k$.
It then follows (by the same linear-programming observation as in copper.hat's answer) that the maximum, $\lambda_1+\cdots+\lambda_k$, is achieved with $q_i=e_i$.
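A short numerical check (numpy; my own naming) of the coefficients $\alpha_j^2$: they lie in $[0,1]$, sum to $k$, and give $\operatorname{tr}(Q^TCQ)=\sum_j\lambda_j\alpha_j^2$.

```python
import numpy as np

# Check alpha_j^2 = sum_i |<e_j, q_i>|^2: each lies in [0,1], they sum to k,
# and tr(Q^T C Q) = sum_j lambda_j * alpha_j^2.
rng = np.random.default_rng(5)
d, k = 6, 3
M = rng.standard_normal((d, d))
C = (M + M.T) / 2
lam, E = np.linalg.eigh(C)                   # columns of E are the e_j
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))

alpha2 = ((E.T @ Q) ** 2).sum(axis=1)        # alpha_j^2, one per eigenvector
print(np.isclose(alpha2.sum(), k), alpha2.max() <= 1 + 1e-12)
print(np.isclose(np.trace(Q.T @ C @ Q), lam @ alpha2))
```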

Chrystomath