
My optimization problem is $$\min_X\operatorname{tr}(X^TA^TAX(X^T(I-P)X)^{-1}),$$ where $P$ is a projection matrix.

I was told this could be solved as an eigenproblem: the columns of the solution $X^*$ are eigenvectors of $(A^TA)^{-1}(I-P)$. However, I fail to see why.

The form looks so familiar that I feel I am missing just one last step to reach the solution.


Update -- Thanks to @AWashburn, I realized my projection matrix $P$ is symmetric; equivalently, the projection is orthogonal.

  • Is $X$ supposed to be a square matrix? – Michael Hardy Jul 05 '16 at 15:42
  • @MichaelHardy Sorry for being unclear. No, it is not necessarily square. – Sibbs Gambling Jul 05 '16 at 16:19
  • If all columns of $X$ are in the column space of $P$, then, since $P$ is a projection, you have $PX=X$, so the trace is that of a zero matrix. Next I'd work on showing the trace cannot be negative. The matrix $X$ cannot have more columns than rows, since then the matrix that gets inverted would have rank at most the number of rows, but would have more rows and columns than that number; hence it would not be invertible. – Michael Hardy Jul 05 '16 at 17:38
  • Is $P$ an orthogonal projection? – MathIsKey Jul 12 '16 at 21:32
  • @AWashburn Ah yes! It is indeed orthogonal. I didn't realize it before you asked. Question updated. Hopefully, this helps solve the thing? – Sibbs Gambling Jul 13 '16 at 04:51

1 Answer


Consider the simultaneous diagonalization (by congruence) of $A^TA$ and $I-P$, which exists because $A^TA$ is symmetric positive definite (assuming $A$ has full column rank) and $I-P$ is symmetric: there is a nonsingular $Y$ with

$$Y^TA^TAY=I$$ $$Y^T(I-P)Y=D$$

In particular, the columns of $Y$ can be taken to be eigenvectors of $(A^TA)^{-1}(I-P)$: the two identities above give $A^TA=Y^{-T}Y^{-1}$ and $I-P=Y^{-T}DY^{-1}$, hence

$$(A^TA)^{-1}(I-P)Y=YD.$$
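A quick numerical sanity check of this simultaneous diagonalization (a sketch using NumPy, on a small hypothetical instance with random $A$ and a random rank-2 orthogonal projection $P$; the Cholesky-whitening route to $Y$ is one standard way to compute it, not necessarily what the answer intends):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Hypothetical instance: A full rank, P a symmetric (orthogonal) projection.
A = rng.standard_normal((n, n))
M = A.T @ A                                # A^T A, symmetric positive definite
U, _ = np.linalg.qr(rng.standard_normal((n, 2)))
P = U @ U.T                                # P = P^T, P^2 = P
IP = np.eye(n) - P

# Whiten by the Cholesky factor of A^T A, then diagonalize I - P there:
# M = L L^T, so Y = L^{-T} V with V the eigenvectors of L^{-1}(I-P)L^{-T}.
L = np.linalg.cholesky(M)
Linv = np.linalg.inv(L)
d, V = np.linalg.eigh(Linv @ IP @ Linv.T)  # eigenvalues in ascending order
Y = Linv.T @ V

assert np.allclose(Y.T @ M @ Y, np.eye(n))      # Y^T A^T A Y = I
assert np.allclose(Y.T @ IP @ Y, np.diag(d))    # Y^T (I - P) Y = D
# Columns of Y are eigenvectors of (A^T A)^{-1}(I - P):
assert np.allclose(np.linalg.inv(M) @ IP @ Y, Y @ np.diag(d))
```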

Now, let's look at our optimization problem,

\begin{align} \operatorname{tr}(X^TA^TAX(X^T(I-P)X)^{-1})&=\operatorname{tr}(X^TY^{-T}Y^{-1}X(X^TY^{-T}DY^{-1}X)^{-1}). \end{align}

Let $Y^{-1}X=QR$ be a QR factorization, where $R$ is nonsingular and $Q$ has orthonormal columns. Then

\begin{align} \operatorname{tr}(X^TA^TAX(X^T(I-P)X)^{-1})&=\operatorname{tr}(R^TR(R^TQ^TDQR)^{-1})\\ &=\operatorname{tr}((Q^TDQ)^{-1})\\ &\geq \lambda_1 + \ldots + \lambda_q \end{align}

where $\lambda_1,\ldots, \lambda_q$ are the $q$ smallest positive eigenvalues of $D^{-1}$ (the inequality follows from Cauchy interlacing applied to $Q^TDQ$). The minimal value is attained by taking the columns of $Q$ to be the standard basis vectors that select the corresponding diagonal entries of $D$.
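The trace bound above can be checked numerically (a sketch with NumPy; the diagonal of $D$ is taken strictly positive here, a hypothetical simplification so that $D^{-1}$ exists):

```python
import numpy as np

# Check tr((Q^T D Q)^{-1}) >= sum of the q smallest eigenvalues of D^{-1}
# over random Q with orthonormal columns, with equality for a selecting Q.
rng = np.random.default_rng(2)
n, q = 6, 3
d = rng.uniform(0.5, 5.0, size=n)          # positive diagonal of D (assumption)
D = np.diag(d)
lower = np.sum(np.sort(1.0 / d)[:q])       # q smallest eigenvalues of D^{-1}

for _ in range(500):
    Q, _ = np.linalg.qr(rng.standard_normal((n, q)))   # orthonormal columns
    assert np.trace(np.linalg.inv(Q.T @ D @ Q)) >= lower - 1e-10

# Equality when Q selects the standard basis vectors of the largest d_i:
idx = np.argsort(d)[-q:]
Q = np.eye(n)[:, idx]
assert np.isclose(np.trace(np.linalg.inv(Q.T @ D @ Q)), lower)
```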

Hence, any minimizer can be written as $X=YQR$.

Last but not least, note that $X$ and $XS$ attain the same value of $\operatorname{tr}(X^TA^TAX(X^T(I-P)X)^{-1})$ for any nonsingular $S$, since $S$ cancels inside the trace.

Hence we can pick $X=YQ$, whose columns are eigenvectors of $(A^TA)^{-1}(I-P)$.
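Putting the pieces together, here is a hedged end-to-end check (NumPy, random hypothetical instance): the columns of $Y$ for the $q$ largest eigenvalues of $(A^TA)^{-1}(I-P)$ should attain an objective value that random feasible $X$ never beats.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 6, 2

# Hypothetical instance: A full rank, P a rank-2 orthogonal projection.
A = rng.standard_normal((n, n))
M = A.T @ A
U, _ = np.linalg.qr(rng.standard_normal((n, 2)))
P = U @ U.T
IP = np.eye(n) - P

def objective(X):
    return np.trace(X.T @ M @ X @ np.linalg.inv(X.T @ IP @ X))

# Eigenvectors of (A^T A)^{-1}(I - P) via Cholesky whitening of A^T A.
L = np.linalg.cholesky(M)
Linv = np.linalg.inv(L)
d, V = np.linalg.eigh(Linv @ IP @Inv.T if False else Linv @ IP @ Linv.T)  # d ascending
Y = Linv.T @ V

X_opt = Y[:, -q:]                  # columns for the q largest eigenvalues d_i
best = objective(X_opt)
# best = sum of the q smallest positive eigenvalues of D^{-1}:
assert np.isclose(best, np.sum(1.0 / d[-q:]))

# Random feasible X never does better than the eigenvector solution.
for _ in range(200):
    X = rng.standard_normal((n, q))
    if np.linalg.cond(X.T @ IP @ X) < 1e8:      # skip near-singular draws
        assert objective(X) >= best - 1e-8
```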

Siong Thye Goh
  • Thanks! Almost got it. Two missing bits: (1) Why does simultaneous diagonalization definitely exist for two symmetric matrices? (2) Why is $Y$ orthogonal (so that $Y^{-1}=Y^T$)? Thanks! – Sibbs Gambling Jul 13 '16 at 08:31
  • @SibbsGambling (1) The full theorem can be found here (with proof) link and to prove they commute consider $(AB)^T = AB = BA$ (for $A$, $B$ symmetric) (2) You can always pick your basis of eigenvectors to be orthogonal for symmetric matrices – MathIsKey Jul 13 '16 at 18:14
  • The only thing I would add to this proof is that we are not including the eigenvectors with an eigenvalue of $0$ in $X$. So in general $X$ will not be a square matrix and will have fewer columns than rows (unless $P$ is the trivial projection $0$ or $I$). – MathIsKey Jul 13 '16 at 18:22