
Let $A \in \mathbb{R}^{n \times n}$ be symmetric and positive definite and let $m < n$. We want to find an orthonormal basis $\{ u_1,..., u_n \}$ of $\mathbb{R}^n$ such that $$ J = \sum_{i=m}^n u_i^T A u_i $$ is minimised.

Apparently the set of eigenvectors of $A$ minimises this expression. Is that true? How could one show this?
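
For concreteness, the quantity in question can be written down in a few lines of NumPy (a sketch only; the function name and the 1-based index `m` are my own conventions, not part of the question):

```python
import numpy as np

# J(A, U, m) = sum_{i=m}^{n} u_i^T A u_i, where the u_i are the columns of U
# and m is 1-based as in the formula above (NumPy indexing is 0-based).
def J(A, U, m):
    n = U.shape[1]
    return sum(U[:, i] @ A @ U[:, i] for i in range(m - 1, n))
```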

blat
  • 1,050

3 Answers


No. If $u_1,\ldots,u_n$ is an orthonormal basis, then $J$ is the trace of $A$, which is independent of the choice of basis. This holds for any $A$, symmetric or not, positive definite or not.
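
A quick numerical sketch of the trace invariance this answer refers to (the size, seed, and random matrices below are arbitrary choices of mine, purely for illustration):

```python
import numpy as np

# Check: summing u_i^T A u_i over a full orthonormal basis gives tr(A),
# regardless of which orthonormal basis is used.
rng = np.random.default_rng(0)
n = 5                                              # illustrative size
A = rng.standard_normal((n, n))                    # arbitrary square matrix
U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthonormal basis (columns)

J_full = sum(U[:, i] @ A @ U[:, i] for i in range(n))
print(np.isclose(J_full, np.trace(A)))             # True
```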

Martin Argerami
  • 205,756

It is indeed the case that the set of eigenvectors $u_1,\dots,u_n$ associated with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n$ minimizes your sum. Note that taking these vectors $u_i$ yields the sum $$ \sum_{i=m}^n u_i^TAu_i = \sum_{i=m}^n \lambda_i. $$ It suffices, then, to prove that the above is a lower bound for the sum for any choice of orthonormal basis $u_1,\dots,u_n$. One way to prove that this holds is as follows. The Schur-Horn theorem (or, to be precise, a corollary of one direction of the Schur-Horn theorem) tells us that a symmetric matrix $M$ with eigenvalues $\mu_1 \geq \cdots \geq \mu_n$ and diagonal entries $d_1 \geq \cdots \geq d_n$ will necessarily satisfy $$ \sum_{i=m}^n d_i \geq \sum_{i=m}^n \mu_i, \qquad m = 1,\dots,n, $$ with equality in the case of $m = 1$ (in which both sides are equal to the trace of $M$).
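
As an aside, here is a small numerical sanity check of that corollary on a randomly generated symmetric matrix (the size and seed are my own placeholders, not part of the argument):

```python
import numpy as np

# With both lists sorted in decreasing order, every tail sum of the diagonal
# entries dominates the corresponding tail sum of the eigenvalues.
rng = np.random.default_rng(1)
n = 6                                      # illustrative size
X = rng.standard_normal((n, n))
M = (X + X.T) / 2                          # symmetric

d = np.sort(np.diag(M))[::-1]              # d_1 >= ... >= d_n
mu = np.sort(np.linalg.eigvalsh(M))[::-1]  # mu_1 >= ... >= mu_n

checks = [d[m:].sum() >= mu[m:].sum() - 1e-12 for m in range(n)]
print(all(checks))                         # True; equality for the full sum (the trace)
```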

Now, we apply this to your problem. Consider an arbitrary orthonormal basis $u_1,\dots,u_n$. Let $U$ denote the matrix whose columns are $u_1,\dots,u_n$, and let $M$ denote the matrix $$ M = U^TAU = [u_i^TAu_j]_{i,j = 1}^n. $$ $M$ is similar to $A$, so its eigenvalues are $\lambda_1 \geq \cdots \geq \lambda_n$. Let $d_1 \geq \cdots \geq d_n$ denote the diagonal entries of $M$, sorted in decreasing order. Applying the Schur-Horn theorem gives us $$ \sum_{i=m}^n u_i^TAu_i = \sum_{i=m}^n M_{i,i} \geq \sum_{i=m}^{n} d_i \geq \sum_{i=m}^{n} \lambda_i, $$ where the first inequality holds because the sum of any $n-m+1$ diagonal entries of $M$ is at least the sum of its $n-m+1$ smallest diagonal entries, namely $\sum_{i=m}^n d_i$. This is what was desired.
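
The chain of inequalities can also be checked numerically; the sketch below uses hypothetical sizes $n = 6$, $m = 3$, a randomly built positive definite $A$, and NumPy's 0-based indexing:

```python
import numpy as np

# Numerical illustration of the bound: for any orthonormal U, the tail of the
# sum u_i^T A u_i is bounded below by the sum of the n - m + 1 smallest eigenvalues.
rng = np.random.default_rng(2)
n, m = 6, 3
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                        # symmetric positive definite
U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthonormal basis

lam = np.linalg.eigvalsh(A)                        # eigenvalues, ascending
J = sum(U[:, i] @ A @ U[:, i] for i in range(m - 1, n))   # sum_{i=m}^{n} u_i^T A u_i
lower_bound = lam[: n - m + 1].sum()               # sum of the n - m + 1 smallest eigenvalues
print(J >= lower_bound - 1e-12)                    # True for every orthonormal U
```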

Ben Grossmann
  • 225,327
  • There are alternative approaches to this proof. However, this approach (adapted from Bhatia's Matrix Analysis) is one that I find particularly appealing. – Ben Grossmann Feb 15 '20 at 13:14

According to the Spectral Theorem, if $A \in \mathbb{R}^{n \times n}$ is symmetric, there exists an orthonormal basis of $\mathbb{R}^n$ that consists of eigenvectors of $A$. So we know that the orthonormal basis $\left \{ u_{1},u_{2},\cdots ,u_{n} \right \}$ exists.

The positive-definiteness of the matrix $A$ guarantees that every term $u_{i}^{T}Au_{i}$ is positive, so $J$ is bounded below by $0$.

As for how to find the $u_{i}$'s that minimize $\sum_{i=m}^{n}u_{i}^{T}Au_{i}$: the technique used in Principal Component Analysis (PCA) to find the projection matrix that retains maximum variance when data is projected into a lower-dimensional space (i.e., compressed) can be used here, modified to minimize instead of maximize.

For each $u_{i}$, solve the constrained problem $$\min_{u_{i}}\; u_{i}^{T}Au_{i} \quad \text{subject to} \quad \left \| u_{i} \right \|^{2}=1.$$

The Lagrangian for the above constrained optimization problem is $$\mathfrak{L}\left ( u_{i},\lambda_{i} \right )=u_{i}^{T}Au_{i}+\lambda _{i}\left ( 1-u_{i}^{T}u_{i} \right ).$$

Take the partial derivatives with respect to $u_{i}$ and $\lambda_{i}$: $$\frac{\partial \mathfrak{L}}{\partial u_{i}}=2u_{i}^{T}A-2\lambda_{i}u_{i}^{T},\qquad \frac{\partial \mathfrak{L}}{\partial \lambda_{i}}=1-u_{i}^{T}u_{i}.$$

Setting the partial derivatives to $0$ gives us the following: $$\begin{cases}u_{i}^{T}A=\lambda_{i}u_{i}^{T} & (1)\\ u_{i}^{T}u_{i}=1 & (2)\end{cases}$$

Transpose both sides of (1); this is where the symmetry of $A$ is needed, since $A$ being symmetric means that $A=A^{T}$.

LHS: $\left ( u_{i}^{T}A \right )^{T}=A^{T}\left ( u_{i}^{T} \right )^{T}=Au_{i}$

RHS: $\left ( \lambda_{i}u_{i}^{T} \right )^{T}=\left ( u_{i}^{T} \right )^{T}\lambda_{i}=\lambda_{i}u_{i}$

$$\begin{cases}Au_{i}=\lambda_{i}u_{i}\\ u_{i}^{T}u_{i}=1\end{cases}$$

This is an eigenvalue problem: the $u_{i}$'s are eigenvectors of $A$.

Rewrite $J_{i}=u_{i}^{T}Au_{i}$: $$J_{i}=u_{i}^{T}\lambda_{i}u_{i}=\lambda_{i}u_{i}^{T}u_{i}=\lambda_{i}$$
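A quick numerical check of this reduction, using an arbitrary positive definite $A$ built only for illustration:

```python
import numpy as np

# Check that J_i = u_i^T A u_i collapses to the eigenvalue lambda_i when the
# u_i are orthonormal eigenvectors of A.
rng = np.random.default_rng(3)
n = 5
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                # symmetric positive definite

lam, U = np.linalg.eigh(A)                 # columns of U are orthonormal eigenvectors
J_i = np.array([U[:, i] @ A @ U[:, i] for i in range(n)])
print(np.allclose(J_i, lam))               # True: u_i^T A u_i = lambda_i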

For each $i$, $J_{i}$ is now reduced to the corresponding eigenvalue. Therefore, to minimize $$J=\sum_{i=m}^{n}u_{i}^{T}Au_{i},$$ choose for $u_{m},\dots,u_{n}$ the eigenvectors associated with the $n-m+1$ smallest eigenvalues of $A$ (one distinct eigenvector per index, since the $u_{i}$ must be orthonormal).
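
An empirical sketch of that conclusion (my own, not part of the derivation), comparing the eigenvector choice against many randomly drawn orthonormal bases; the sizes and seed are placeholders:

```python
import numpy as np

# The eigenvectors of the n - m + 1 smallest eigenvalues give a value of J
# no larger than that of any of the random orthonormal bases sampled below.
rng = np.random.default_rng(4)
n, m = 6, 3                                 # illustrative sizes
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                 # symmetric positive definite

lam = np.linalg.eigvalsh(A)                 # eigenvalues, ascending
J_eig = lam[: n - m + 1].sum()              # J when u_m, ..., u_n are those eigenvectors

def J(U):                                   # J for the last n - m + 1 columns of U
    return sum(U[:, i] @ A @ U[:, i] for i in range(m - 1, n))

random_Js = [J(np.linalg.qr(rng.standard_normal((n, n)))[0]) for _ in range(1000)]
print(J_eig <= min(random_Js) + 1e-12)      # True
```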

This is the technique used in Principal Component Analysis to find the projection matrix that retains the maximum amount of variance (data spread) of the original dataset when it is projected into a lower-dimensional space, except that in PCA the objective is to maximize.
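
For completeness, here is a minimal PCA sketch of that maximization analogue, on made-up data (the sample size, dimensions, and target dimension are arbitrary assumptions of mine):

```python
import numpy as np

# The top-k eigenvectors of the sample covariance give the projection retaining
# the most variance; the retained variance equals the sum of the k largest eigenvalues.
rng = np.random.default_rng(5)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # toy dataset
Xc = X - X.mean(axis=0)                     # center the data
C = Xc.T @ Xc / (len(Xc) - 1)               # sample covariance (symmetric PSD)

lam, V = np.linalg.eigh(C)                  # eigenvalues ascending, eigenvectors in columns
k = 2                                       # target dimension
W = V[:, -k:]                               # eigenvectors of the k largest eigenvalues
Z = Xc @ W                                  # projected (compressed) data
print(np.isclose(Z.var(axis=0, ddof=1).sum(), lam[-k:].sum()))   # True
```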