
My optimization problem is $$\min_X\operatorname{tr}(X^TA^TAX(X^T(I-P)X)^{-1}),$$ where $P$ is a projection matrix.

I was told this could be solved as an eigenproblem: the columns of the solution $X^*$ are eigenvectors of $(A^TA)^{-1}(I-P)$. However, I fail to see why.

The form looks so familiar that I feel I am missing just one last step to reach the solution.


Update -- Thanks to @AWashburn, I realized my projection matrix $P$ is symmetric; equivalently, the projection is orthogonal.

  • Is $X$ supposed to be a square matrix? – Michael Hardy Jul 05 '16 at 15:42
  • @MichaelHardy Sorry for being unclear. No, it is not necessarily square. – Sibbs Gambling Jul 05 '16 at 16:19
  • If all columns of $X$ are in the column space of $P$, then, since $P$ is a projection, you have $PX=X$, so the trace is that of a zero matrix. Next I'd work on showing the trace cannot be negative. The matrix $X$ cannot have more columns than rows, since then the matrix that gets inverted would have rank at most the number of rows, but would have more rows and columns than that number; hence it would not be invertible. – Michael Hardy Jul 05 '16 at 17:38
  • Is $P$ an orthogonal projection? – MathIsKey Jul 12 '16 at 21:32
  • @AWashburn Ah yes! It is indeed orthogonal. I didn't realize it before you asked. Question updated. Hopefully, this helps solve the thing? – Sibbs Gambling Jul 13 '16 at 04:51

1 Answer


Consider the simultaneous diagonalization (by congruence) of $A^TA$ and $I-P$, which exists because $A^TA$ is symmetric positive definite (assuming $A$ has full column rank) and $I-P$ is symmetric: there is a nonsingular $Y$ with

$$Y^TA^TAY=I$$ $$Y^T(I-P)Y=D$$

In particular, the columns of $Y$ can be taken to be eigenvectors of $(A^TA)^{-1}(I-P)$: the two identities above give $A^TA=Y^{-T}Y^{-1}$ and $I-P=Y^{-T}DY^{-1}$, hence

$$(A^TA)^{-1}(I-P)Y=YD.$$
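A quick numerical sanity check of this simultaneous diagonalization (a sketch using NumPy, on a small hypothetical instance with random $A$ and a random rank-2 orthogonal projection $P$; the Cholesky-whitening route to $Y$ is one standard way to compute it, not necessarily what the answer intends):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Hypothetical instance: A full rank, P a symmetric (orthogonal) projection.
A = rng.standard_normal((n, n))
M = A.T @ A                                # A^T A, symmetric positive definite
U, _ = np.linalg.qr(rng.standard_normal((n, 2)))
P = U @ U.T                                # P = P^T, P^2 = P
IP = np.eye(n) - P

# Whiten by the Cholesky factor of A^T A, then diagonalize I - P there:
# M = L L^T, so Y = L^{-T} V with V the eigenvectors of L^{-1}(I-P)L^{-T}.
L = np.linalg.cholesky(M)
Linv = np.linalg.inv(L)
d, V = np.linalg.eigh(Linv @ IP @ Linv.T)  # eigenvalues in ascending order
Y = Linv.T @ V

assert np.allclose(Y.T @ M @ Y, np.eye(n))      # Y^T A^T A Y = I
assert np.allclose(Y.T @ IP @ Y, np.diag(d))    # Y^T (I - P) Y = D
# Columns of Y are eigenvectors of (A^T A)^{-1}(I - P):
assert np.allclose(np.linalg.inv(M) @ IP @ Y, Y @ np.diag(d))
```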

Now, let's look at our optimization problem,

\begin{align} \operatorname{tr}(X^TA^TAX(X^T(I-P)X)^{-1})&=\operatorname{tr}(X^TY^{-T}Y^{-1}X(X^TY^{-T}DY^{-1}X)^{-1}). \end{align}

Let $Y^{-1}X=QR$ be a QR factorization, where $R$ is nonsingular and $Q$ has orthonormal columns. Then

\begin{align} \operatorname{tr}(X^TA^TAX(X^T(I-P)X)^{-1})&=\operatorname{tr}(R^TR(R^TQ^TDQR)^{-1})\\ &=\operatorname{tr}((Q^TDQ)^{-1})\\ &\geq \lambda_1 + \ldots + \lambda_q \end{align}

where $\lambda_1,\ldots, \lambda_q$ are the $q$ smallest positive eigenvalues of $D^{-1}$ (the inequality follows from Cauchy interlacing applied to $Q^TDQ$). The minimal value is attained by taking the columns of $Q$ to be the standard basis vectors that select the corresponding diagonal entries of $D$.
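The trace bound above can be checked numerically (a sketch with NumPy; the diagonal of $D$ is taken strictly positive here, a hypothetical simplification so that $D^{-1}$ exists):

```python
import numpy as np

# Check tr((Q^T D Q)^{-1}) >= sum of the q smallest eigenvalues of D^{-1}
# over random Q with orthonormal columns, with equality for a selecting Q.
rng = np.random.default_rng(2)
n, q = 6, 3
d = rng.uniform(0.5, 5.0, size=n)          # positive diagonal of D (assumption)
D = np.diag(d)
lower = np.sum(np.sort(1.0 / d)[:q])       # q smallest eigenvalues of D^{-1}

for _ in range(500):
    Q, _ = np.linalg.qr(rng.standard_normal((n, q)))   # orthonormal columns
    assert np.trace(np.linalg.inv(Q.T @ D @ Q)) >= lower - 1e-10

# Equality when Q selects the standard basis vectors of the largest d_i:
idx = np.argsort(d)[-q:]
Q = np.eye(n)[:, idx]
assert np.isclose(np.trace(np.linalg.inv(Q.T @ D @ Q)), lower)
```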

Hence, any minimizer can be written as $X=YQR$.

Last but not least, note that $X$ and $XS$ attain the same value of $\operatorname{tr}(X^TA^TAX(X^T(I-P)X)^{-1})$ for any nonsingular $S$, since $S$ cancels inside the trace.

Hence we can pick $X=YQ$, whose columns are eigenvectors of $(A^TA)^{-1}(I-P)$.
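Putting the pieces together, here is a hedged end-to-end check (NumPy, random hypothetical instance): the columns of $Y$ for the $q$ largest eigenvalues of $(A^TA)^{-1}(I-P)$ should attain an objective value that random feasible $X$ never beats.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 6, 2

# Hypothetical instance: A full rank, P a rank-2 orthogonal projection.
A = rng.standard_normal((n, n))
M = A.T @ A
U, _ = np.linalg.qr(rng.standard_normal((n, 2)))
P = U @ U.T
IP = np.eye(n) - P

def objective(X):
    return np.trace(X.T @ M @ X @ np.linalg.inv(X.T @ IP @ X))

# Eigenvectors of (A^T A)^{-1}(I - P) via Cholesky whitening of A^T A.
L = np.linalg.cholesky(M)
Linv = np.linalg.inv(L)
d, V = np.linalg.eigh(Linv @ IP @Inv.T if False else Linv @ IP @ Linv.T)  # d ascending
Y = Linv.T @ V

X_opt = Y[:, -q:]                  # columns for the q largest eigenvalues d_i
best = objective(X_opt)
# best = sum of the q smallest positive eigenvalues of D^{-1}:
assert np.isclose(best, np.sum(1.0 / d[-q:]))

# Random feasible X never does better than the eigenvector solution.
for _ in range(200):
    X = rng.standard_normal((n, q))
    if np.linalg.cond(X.T @ IP @ X) < 1e8:      # skip near-singular draws
        assert objective(X) >= best - 1e-8
```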

Siong Thye Goh
  • Thanks! Almost got it. Two missing bits: (1) Why does simultaneous diagonalization definitely exist for two symmetric matrices? (2) Why is $Y$ orthogonal (so that $Y^{-1}=Y^T$)? Thanks! – Sibbs Gambling Jul 13 '16 at 08:31
  • @SibbsGambling (1) The full theorem can be found here (with proof) link and to prove they commute consider $(AB)^T = AB = BA$ (for $A$, $B$ symmetric) (2) You can always pick your basis of eigenvectors to be orthogonal for symmetric matrices – MathIsKey Jul 13 '16 at 18:14
  • The only thing I would add to this proof is that we are not including the eigenvectors with an eigenvalue of $0$ in $X$. So in general $X$ will not be a square matrix and will have fewer columns than rows (unless $P$ is the trivial projection $0$ or $I$). – MathIsKey Jul 13 '16 at 18:22