
Let $C \in \mathbb{R}^{d \times d}$ be symmetric, and

$$Q = \begin{bmatrix} \vert & \vert & & \vert \\ q_1 & q_2 & \dots & q_K \\ \vert & \vert & & \vert \end{bmatrix} \in \mathbb{R}^{d\times K}$$

where $d \geq K$. I want to solve the following problem using Lagrange multipliers:

$$\begin{array}{ll} \text{maximize} & \mbox{tr} \left( Q^T C Q \right)\\ \text{subject to} & Q^T Q = I\end{array}$$


I am unfamiliar with this kind of constraint in this method, but after reading another post I believe the same specific and simple result given there also applies here, so the Lagrangian would be:

$$\mathcal{L}(Q,\lambda)=\mathrm{tr}(Q^TCQ)-\left<\lambda,Q^TQ-I\right>$$

where $\lambda\in\mathbb{R}^{K\times K}$ and $\left<\cdot,\cdot\right>$ is the element-wise inner product (which makes sense to me, since we are really adding as many scalar constraints as there are entries in these matrices).

In attempting to do that I start by setting $\frac{\partial \mathcal{L}}{\partial Q}=O\in\mathbb{R}^{d\times K}$ and computing the LHS element by element; for the $(l,m)$ one:

\begin{equation} 0=\frac{\partial \mathcal{L}}{\partial Q_{lm}}=(CQ+C^TQ)_{lm}-\underbrace{\frac{\partial}{\partial Q_{lm}}\sum_{i,j}\lambda_{i,j}(Q^TQ-I)_{ij}}_{=\lambda_{lm}\frac{\partial (Q^TQ)_{lm}}{\partial Q_{lm}}}=2(CQ)_{lm}-\lambda_{lm}\frac{\partial (q_l^Tq_m)}{\partial q_m(l)} \tag{1}\end{equation}

where in the last step I've used the definition I made at the beginning for $Q$, and $q_m(l)$ denotes the $l$-th component of the column vector $q_m$.

In trying to compute the very last term: $$\frac{\partial (q_l^Tq_m)}{\partial q_m(l)}=\frac{\partial \left[q_l(1)q_m(1)+ \ldots + q_l(d)q_m(d)\right]}{\partial q_m(l)}= \begin{cases} q_l(l)\equiv Q_{ll} & \text{if } l\neq m\\ 2q_l(l)\equiv 2Q_{ll} & \text{if } l=m \end{cases}$$

Equality (1) can then be written as:

$$0=2(CQ)_{lm}-\lambda_{lm}Q_{ll}(1+\delta_{lm})$$

where $\delta_{lm}$ is the Kronecker delta.

The other stationarity condition of the Lagrangian, $\frac{\partial \mathcal{L}}{\partial \lambda}=O\in\mathbb{R}^{K\times K}$, reads for the $(l,m)$ element:

$$ 0=\frac{\partial \mathcal L}{\partial \lambda_{lm}}= \frac{\partial }{\partial \lambda_{lm}}\sum_{i,j}\lambda_{i,j}(Q^TQ-I)_{ij}=(Q^TQ-I)_{lm}\tag{2}$$

which leads to $(Q^TQ)_{lm}=\delta_{lm}$.

All this should show that the columns of $Q$ end up being the first $K$ eigenvectors of $C$, but I don't know how to continue from here to prove that, assuming I haven't made a mistake. I would sincerely appreciate any help.


Edit:

I have rewritten the inner product as a trace of a product of matrices (after seeing this question):

$$\left<\lambda,Q^TQ-I\right>=\sum_{i,j}\lambda_{i,j}(Q^TQ-I)_{ij}=\mathrm{tr}\big(\lambda^T(Q^TQ-I)\big) $$

and have thus managed to do the derivative without losing the matrix format (using formulas from the Matrix Cookbook):

\begin{align} O=&\frac{\partial \mathcal{L}}{\partial Q}=\frac{\partial}{\partial Q}\mathrm{tr}(Q^TCQ)-\frac{\partial}{\partial Q}\underbrace{\mathrm{tr}(\lambda^T(Q^TQ-I))}_{\mathrm{tr}(\lambda^TQ^TQ)-\mathrm{tr}(\lambda^T)}\\=&(CQ+C^TQ)-(Q(\lambda^T)^T+Q\lambda^T)=2CQ-Q(\lambda+\lambda^T) \end{align}

And this leads to:

$$CQ=Q\underbrace{\left(\frac{\lambda+\lambda^T}{2}\right)}_{:=\widetilde{\lambda}},\quad\text{i.e.}\quad CQ=Q\widetilde{\lambda}$$

If the defined matrix $\widetilde{\lambda}=Q^TCQ$ (left-multiply the last equation by $Q^T$) were diagonal, we would already have the result.
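For my own reassurance, here is a minimal numerical sketch in numpy (the setup and variable names are mine, not from any of the linked posts): taking the columns of $Q$ to be the top-$K$ eigenvectors of a random symmetric $C$ satisfies $CQ=Q\widetilde{\lambda}$ with $\widetilde{\lambda}$ diagonal, and the trace equals the sum of the $K$ largest eigenvalues.

```python
import numpy as np

# Sanity check: with Q = top-K eigenvectors of a random symmetric C,
# the stationarity condition CQ = Q*lam_tilde holds, lam_tilde = Q^T C Q
# is diagonal, and tr(Q^T C Q) = sum of the K largest eigenvalues.
rng = np.random.default_rng(0)
d, K = 6, 3
A = rng.standard_normal((d, d))
C = (A + A.T) / 2                            # symmetric C

w, U = np.linalg.eigh(C)                     # eigenvalues in ascending order
Q = U[:, ::-1][:, :K]                        # eigenvectors of the K largest

lam_tilde = Q.T @ C @ Q
print(np.allclose(C @ Q, Q @ lam_tilde))                    # CQ = Q*lam_tilde
print(np.allclose(lam_tilde, np.diag(np.diag(lam_tilde))))  # lam_tilde diagonal
print(np.isclose(np.trace(lam_tilde), np.sort(w)[::-1][:K].sum()))
```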

abcd
  • $K \ge d$? – copper.hat Apr 22 '20 at 00:25
  • $d$ would be the dimension of the square (symmetric) matrix $C$, which then has $d$ (independent, since it's symmetric) eigenvectors, so $K \leq d$. If we set $K=d$ the resulting $Q$ would contain all the eigenvectors in its columns. – abcd Apr 22 '20 at 00:29
  • You should add that to the question. I guess it is implicit in $Q^TQ = I$, but no harm to add. – copper.hat Apr 22 '20 at 00:29
  • You can assume $C$ to be diagonal in the first place. That simplifies life considerably. – copper.hat Apr 22 '20 at 01:11
  • @abcd Do you want any solution or those using Lagrange multipliers? – River Li Apr 22 '20 at 11:54
  • @River Li For my problem itself I just needed a proof, but now that I've started with this method I'm really interested in how to apply the general answer I linked to this relatively simple case, and in whether what I did makes sense / goes anywhere. So yes, one using Lagrange multipliers, but I appreciate others' inputs as well. – abcd Apr 22 '20 at 18:08
  • It is not entirely clear that the Lagrange multiplier approach helps; the resulting equation is something like $2 \operatorname{tr}(Q^TCH)+ \lambda(Q^TH+H^TQ) = 0$ for all $H$, where $\lambda$ is a real functional. – copper.hat Apr 22 '20 at 20:11
  • This results in something like $CQ = - Q \lambda$. – copper.hat Apr 22 '20 at 20:20
  • I think I have made progress with the Lagrange approach and have arrived at something very similar to what @copper.hat describes (see edit). I believe $\widetilde{\lambda}$ is diagonal (from (2)) and hence it would be the matrix of eigenvalues (and hence the result), but I am not fully sure now. – abcd Apr 23 '20 at 02:59
  • I don't think the Lagrange approach will give you much more. – copper.hat Apr 23 '20 at 03:43

5 Answers


Since $C$ is real symmetric we can write $C=U \Lambda U^T$ where $\Lambda$ is a diagonal matrix of eigenvalues. As $(U^TQ)^T(U^TQ) = Q^T U U^T Q = I$, the substitution $Q \mapsto U^TQ$ preserves the constraint, so we can just assume $C= \operatorname{diag} (\lambda_1,...,\lambda_d)$, where $\lambda_1 \ge \cdots \ge \lambda_d$.

The problem is then $\max_{Q^TQ=I} \operatorname{tr}(Q^T \Lambda Q)$.

Note that $\operatorname{tr}(Q^T \Lambda Q) = \operatorname{tr}(Q^T Q Q^T \Lambda Q) = \operatorname{tr}( Q Q^T \Lambda QQ^T) = \operatorname{tr}(P^T \Lambda P)$, where $P=Q Q^T$.

Note that $P$ is an orthogonal projection onto a subspace of dimension $K$. Furthermore, any such orthogonal projection can be written in the form $Q Q^T$, where $Q^TQ = I$.

So now the problem is $\max_{P \text{ orthogonal projection}, \text{ rk } P=K} \operatorname{tr}(P^T \Lambda P)$.

Note that $\operatorname{tr}(P^T \Lambda P) = \sum_{n=1}^d \lambda_n \|P e_n\|^2$. Furthermore, note that $\|P\|_F^2 = K$ and so $\sum_{n=1}^d \|P e_n\|^2 = K$ with $0 \le \|P e_n\|^2 \le 1$. ($e_n$ is the $n$th unit vector.)

It is straightforward to check that $\max\{ \sum_{n=1}^d \lambda_n \mu_n \mid \sum_{n=1}^d \mu_n = K,\ 0 \le \mu_n \le 1 \}$ is $\lambda_1+\cdots+ \lambda_K$: the maximum of this linear function is attained at an extreme point of the feasible polytope, i.e., at a 0-1 vector $\mu$ with exactly $K$ ones, and the best such choice puts the ones on the $K$ largest $\lambda_n$.

Hence $\operatorname{tr}(P^T \Lambda P) \le \lambda_1+\cdots+ \lambda_K$, and by choosing the range ${\cal R} P = \operatorname{span}\{e_1,...,e_K \}$ we see that this bound is attained.
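Not part of the proof, but a quick numerical sketch (numpy; the construction and names are mine) of the key identity and bound: for a random rank-$K$ orthogonal projection $P = QQ^T$, $\operatorname{tr}(P^T\Lambda P)=\sum_n \lambda_n\|Pe_n\|^2 \le \lambda_1+\cdots+\lambda_K$.

```python
import numpy as np

# Verify tr(P^T Lam P) = sum_n lambda_n * ||P e_n||^2 for a random rank-K
# orthogonal projection P = Q Q^T, plus the constraints on mu_n = ||P e_n||^2.
rng = np.random.default_rng(1)
d, K = 6, 3
lam = np.sort(rng.standard_normal(d))[::-1]      # lambda_1 >= ... >= lambda_d
Lam = np.diag(lam)

Q, _ = np.linalg.qr(rng.standard_normal((d, K))) # Q^T Q = I
P = Q @ Q.T                                      # orthogonal projection, rank K

mu = np.linalg.norm(P, axis=0) ** 2              # mu_n = ||P e_n||^2
print(np.isclose(np.trace(P.T @ Lam @ P), lam @ mu))     # the identity above
print(np.isclose(mu.sum(), K), mu.max() <= 1 + 1e-12)    # sum = K, each in [0,1]
print(np.trace(P.T @ Lam @ P) <= lam[:K].sum() + 1e-12)  # the final bound
```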

copper.hat

Let $B := C + \delta I$ for some $\delta \in \mathbb{R}$ large enough that the real symmetric matrix $B\succ0$.

Let $\Sigma_B$ be a diagonal matrix with the singular values of $B$ (which are also its eigenvalues, since $B \succ 0$) and let $\Sigma_{QQ^T}$ have the singular values of $QQ^T$.

Singular values are in the usual ordering, largest to smallest; note this means $\Sigma_{QQ^T} = \begin{bmatrix} \mathbf I_k & \mathbf 0 \\ \mathbf 0 & \mathbf 0 \end{bmatrix}$.

By application of the von Neumann trace inequality:
$$\text{trace}\big(Q^TBQ\big)=\text{trace}\big((QQ^T)B\big)\leq \text{trace}\big(\Sigma_{QQ^T}\Sigma_{B}\big)= \sum_{i=1}^k \sigma_i^{(B)}= \sum_{i=1}^k \lambda_i^{(B)}$$

Making use of linearity we also know
$$\text{trace}\big(Q^TBQ\big) = \text{trace}\big(Q^T(C + \delta I)Q\big)= \text{trace}\big(Q^TC Q\big) + \delta\cdot \text{trace}\big( Q^TQ\big) = \text{trace}\big(Q^TC Q\big) + \delta \cdot k$$

To conclude,
$$\text{trace}\big(Q^TC Q\big) = \text{trace}\big(Q^TBQ\big) -\delta \cdot k \leq \Big( \sum_{i=1}^k \lambda_i^{(B)}\Big)-\delta \cdot k = \sum_{i=1}^k \big(\lambda_i^{(B)}-\delta\big) = \sum_{i=1}^k \lambda_i^{(C)}$$

and this is met with equality when you select the columns of $Q$ to be the first $k$ (mutually orthonormal) eigenvectors of $B$, which are also the first $k$ eigenvectors of $C$.
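As a sanity check, here is a small numpy sketch (my own naming, not part of the argument) of the shift trick: $B = C+\delta I$ is positive definite, the linearity step holds, and the final top-$k$ bound on $C$ follows for a random feasible $Q$.

```python
import numpy as np

# Check the shift trick: B = C + delta*I > 0,
# trace(Q^T C Q) = trace(Q^T B Q) - delta*k, and the top-k eigenvalue bound.
rng = np.random.default_rng(2)
d, k = 6, 3
A = rng.standard_normal((d, d))
C = (A + A.T) / 2
delta = np.abs(np.linalg.eigvalsh(C)).max() + 1.0   # large enough for B > 0
B = C + delta * np.eye(d)

Q, _ = np.linalg.qr(rng.standard_normal((d, k)))    # Q^T Q = I_k
lhs = np.trace(Q.T @ C @ Q)
print(np.isclose(lhs, np.trace(Q.T @ B @ Q) - delta * k))  # linearity step
lamC = np.sort(np.linalg.eigvalsh(C))[::-1]
print(lhs <= lamC[:k].sum() + 1e-12)                       # final bound
```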

user8675309

Here's a proof using Cauchy eigenvalue interlacing.

Given that $Q^T Q = I_k$, the matrix $A:=Q^T C Q$ has $k$ eigenvalues that interlace with those of $C$. With eigenvalues in the usual ordering,
$\lambda_1^{(C)} \geq \lambda_2^{(C)} \geq ... \geq \lambda_n^{(C)}$ and $\lambda_1^{(A)} \geq \lambda_2^{(A)} \geq ... \geq \lambda_k^{(A)}$,
a crude consequence of Cauchy interlacing is that
$\lambda_j^{(C)} \geq \lambda_j^{(A)}$ for $j\in\{1,2,...,k\}$.

Summing over the bound,
$$\sum_{j=1}^k \lambda_j^{(C)} \geq \sum_{j=1}^k\lambda_j^{(A)} = \text{trace}\big(Q^T C Q\big)$$
The upper bound is met with equality when $Q$ is chosen to contain the first $k$ eigenvectors of $C$.
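A quick numerical illustration (numpy; the random construction is mine) of the crude interlacing consequence and the resulting trace bound:

```python
import numpy as np

# Check lambda_j(C) >= lambda_j(A) for A = Q^T C Q with Q^T Q = I_k,
# and the summed (trace) bound that follows.
rng = np.random.default_rng(3)
n, k = 6, 3
M = rng.standard_normal((n, n))
C = (M + M.T) / 2
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))

lamC = np.sort(np.linalg.eigvalsh(C))[::-1]            # descending
lamA = np.sort(np.linalg.eigvalsh(Q.T @ C @ Q))[::-1]
print(np.all(lamC[:k] >= lamA - 1e-12))                # interlacing bound
print(lamA.sum() <= lamC[:k].sum() + 1e-12)            # summed: the trace bound
```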

user8675309

A proof by Schur-Horn theorem:

Let $V = [Q \ P]$ be an orthogonal matrix. Then $Q = V\left( \begin{array}{c} I_K \\ 0 \\ \end{array} \right)$. We have \begin{align} \mathrm{Tr}(Q^{\mathsf{T}}CQ) &= \mathrm{Tr}\left([I_K \ 0]V^{\mathsf{T}}CV\left( \begin{array}{c} I_K \\ 0 \\ \end{array} \right)\right)\\ &= \mathrm{Tr}\left(V^{\mathsf{T}}CV\left( \begin{array}{c} I_K \\ 0 \\ \end{array} \right)[I_K \ 0]\right)\tag{1}\\ &= \mathrm{Tr}\left(V^{\mathsf{T}}CV\left( \begin{array}{cc} I_K & 0 \\ 0 & 0 \\ \end{array} \right) \right)\\ &= \sum_{i=1}^K (V^\mathsf{T}CV)_{i,i}. \tag{2} \end{align} In (1), we have used the well-known fact that $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$ for $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times m}$.

Thus, we turn to find an orthogonal matrix $V$ such that $\sum_{i=1}^K (V^\mathsf{T}CV)_{i,i}$ is maximized.
Let $C = U\mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_d)U^\mathsf{T}$ be the eigendecomposition of $C$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ are the eigenvalues of $C$ in descending order, and $U$ is an orthogonal matrix whose columns are the eigenvectors of $C$. Let $$G = V^\mathsf{T}CV = V^\mathsf{T}U\mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_d)U^\mathsf{T}V. \tag{3}$$ Clearly, $\lambda_1, \lambda_2, \cdots, \lambda_d$ are also the eigenvalues of $G$. Let $g = (G_{1,1}, G_{2,2}, \cdots, G_{d,d})$ and $\lambda = (\lambda_1, \lambda_2, \cdots, \lambda_d)$. By the Schur-Horn theorem [1][2], we know that $g$ is majorized by $\lambda$, which results in $$\sum_{i=1}^K G_{i,i} \le \sum_{i=1}^K \lambda_i \tag{4}$$ with equality if $U^\mathsf{T}V = I_d$ (see (3)), i.e., $V = U$.

We conclude that the maximum of $\mathrm{Tr}(Q^{\mathsf{T}}CQ)$ is $\sum_{i=1}^K \lambda_i$ which is achieved at $Q = U\left( \begin{array}{c} I_K \\ 0 \\ \end{array} \right)$.

Reference

[1] https://en.wikipedia.org/wiki/Schur%E2%80%93Horn_theorem

[2] https://mathworld.wolfram.com/HornsTheorem.html

Definition of majorization: Let $x, y \in \mathbb{R}^n$ be given. We say that $y$ is majorized by $x$ if and only if $$\sum_{i=1}^k x_{[i]} \ge \sum_{i=1}^k y_{[i]}, \ k=1, 2, \cdots, n-1$$ and $$\sum_{i=1}^n x_{[i]} = \sum_{i=1}^n y_{[i]}$$ where $x_{[1]} \ge x_{[2]} \ge \cdots \ge x_{[n]}$ denotes a decreasing rearrangement of $x_1, x_2, \cdots, x_n$.
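For intuition, here is a small numpy sketch (names mine, not part of the proof) of the Schur-Horn statement being used: the diagonal of $G=V^\mathsf{T}CV$ is majorized by the eigenvalues of $C$ for any orthogonal $V$.

```python
import numpy as np

# Check that the diagonal of G = V^T C V is majorized by the eigenvalues of C:
# partial sums of the sorted diagonal never exceed those of the eigenvalues,
# and the totals agree (both equal tr C).
rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
C = (A + A.T) / 2
V, _ = np.linalg.qr(rng.standard_normal((n, n)))     # random orthogonal V

G = V.T @ C @ V
g = np.sort(np.diag(G))[::-1]                        # decreasing rearrangement
lam = np.sort(np.linalg.eigvalsh(C))[::-1]
print(np.all(np.cumsum(g) <= np.cumsum(lam) + 1e-12))  # partial sums
print(np.isclose(g.sum(), lam.sum()))                  # totals agree
```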

River Li

Here's a slightly shorter and, hopefully, more transparent version of copper.hat's answer:

Let $e_j$ be the eigenvectors of $C$, with eigenvalues $\lambda_j$. Then \begin{align}\mathrm{tr}Q^TCQ&=\sum_{i=1}^k\langle q_i,Cq_i\rangle\\ &=\sum_{i=1}^k\sum_{j=1}^d\langle q_i,e_j\rangle\langle e_j,Cq_i\rangle\\ &=\sum_{i,j}\lambda_j|\langle e_j,q_i\rangle|^2=\sum_{j=1}^d\lambda_j\alpha_j^2 \end{align} where $0\le\alpha_j^2=\sum_i|\langle e_j,q_i\rangle|^2\le\|e_j\|^2=1$ and $\sum_{j=1}^d\alpha_j^2=\sum_{i,j}|\langle e_j,q_i\rangle|^2=\sum_{i=1}^k\|q_i\|^2=k$.
It then follows (by the same linear-programming observation as in copper.hat's answer) that the maximum, $\lambda_1+\cdots+\lambda_k$, is achieved with $q_i=e_i$.
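A short numerical check (numpy; my own naming) of the coefficients $\alpha_j^2$: they lie in $[0,1]$, sum to $k$, and give $\operatorname{tr}(Q^TCQ)=\sum_j\lambda_j\alpha_j^2$.

```python
import numpy as np

# Check alpha_j^2 = sum_i |<e_j, q_i>|^2: each lies in [0,1], they sum to k,
# and tr(Q^T C Q) = sum_j lambda_j * alpha_j^2.
rng = np.random.default_rng(5)
d, k = 6, 3
M = rng.standard_normal((d, d))
C = (M + M.T) / 2
lam, E = np.linalg.eigh(C)                   # columns of E are the e_j
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))

alpha2 = ((E.T @ Q) ** 2).sum(axis=1)        # alpha_j^2, one per eigenvector
print(np.isclose(alpha2.sum(), k), alpha2.max() <= 1 + 1e-12)
print(np.isclose(np.trace(Q.T @ C @ Q), lam @ alpha2))
```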

Chrystomath