Property of trace in optimization problem

Question

Consider the following optimization problem:

\begin{align} &\min_{\phi \in \mathbb{R}^d} \|X- \phi \phi^T X\|^2_F\\ &\text{s.t. } \, \, \, \phi^T\phi=1 \end{align}

where $X$ is a $d \times n$ matrix and $\|\cdot\|_F$ denotes the Frobenius norm. I want to show that this is just an eigenvalue problem.

In a few steps (using the constraint $\phi^T\phi = 1$) I can write $\|X- \phi \phi^T X\|^2_F = \operatorname{tr}(X^TX - XX^T\phi\phi^T)$, which I now take as the objective function. Writing the Lagrangian (and using linearity of the trace), I get

\begin{equation} \mathcal{L} = \operatorname{tr}(X^TX) - \operatorname{tr}(XX^T\phi\phi^T) -\lambda(\phi^T\phi-1). \end{equation}

Then,

\begin{equation} \mathbb{R}^d \ni\frac{\partial\mathcal{L}}{\partial\phi} = -\operatorname{Tr}(2XX^T\phi) - 2\lambda\phi. \end{equation}

where I have again used linearity of the trace and then commuted the trace with the derivative:

\begin{equation} \frac{d\,\operatorname{tr}(f(X))}{dX} = \operatorname{Tr}\left(\frac{df(X)}{dX}\right) \end{equation}

What puzzles me is that the object inside the trace now has dimensions $d \times 1$ (it's a vector), so is the trace even defined? Or is it just equal to its argument? And if so, why? Thanks!

EDIT:

Thank you. A more rigorous justification that works for me is the following:

\begin{equation} \frac{\partial \operatorname{tr}(XX^T\phi\phi^T)}{\partial \phi} = \frac{\partial}{\partial\phi}\sum_{i,j}(XX^T)_{ij}\,\phi_i\phi_j = 2XX^T\phi \end{equation}

where the last step uses the symmetry of $XX^T$.

The key point is that the trace of a product of matrices can be written as the sum of the entrywise products of their elements, i.e. \begin{equation} \operatorname{tr}(A^TB) = \sum_{i,j}(A\circ B)_{i,j} \end{equation}

where $\circ$ denotes the Hadamard product.
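
As a quick numerical sanity check (a sketch, not part of the original derivation), the snippet below compares the gradient $2XX^T\phi$ with a central finite-difference approximation and also tests the trace/Hadamard identity. The sizes `d` and `n`, the random data, and the helper `f` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 6                       # illustrative sizes, not from the question
X = rng.standard_normal((d, n))
phi = rng.standard_normal(d)
S = X @ X.T                       # the symmetric d x d matrix XX^T


def f(v):
    """Return tr(XX^T v v^T), which equals v^T XX^T v."""
    return np.trace(S @ np.outer(v, v))


# tr(A^T B) equals the sum of the entries of the Hadamard product A * B.
A, B = S, np.outer(phi, phi)
trace_identity_ok = bool(np.isclose(np.trace(A.T @ B), np.sum(A * B)))

# Central finite differences, one coordinate direction at a time.
eps = 1e-6
numeric_grad = np.array([
    (f(phi + eps * e) - f(phi - eps * e)) / (2 * eps)
    for e in np.eye(d)
])
analytic_grad = 2 * S @ phi
gradient_ok = bool(np.allclose(numeric_grad, analytic_grad, atol=1e-5))

print(trace_identity_ok, gradient_ok)
```

Because `f` is quadratic in `v`, the central difference agrees with the analytic gradient up to floating-point rounding.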

For an alternative approach to this problem, see this post or this post — Ben Grossmann, Feb 12 '21 at 21:24
Long story short, there is no standard treatment of the object $\frac{df}{dX}$ such that its "trace" is defined. — Ben Grossmann, Feb 12 '21 at 21:37
The derivative of the Lagrangian was miscalculated. It should read $$\eqalign{ \frac{\partial\mathcal{L}}{\partial\phi} = -2XX^T\phi - 2\lambda\phi \;=\; -2(XX^T+\lambda I)\phi }$$ Setting it to zero leads to an eigenvalue equation for the $XX^T$ matrix. — greg, Feb 13 '21 at 00:48
@Ben Grossmann What does that mean? That the problem of finding the derivative of the Lagrangian is not well posed in this case? — James Arten, Feb 13 '21 at 12:07

score 3 · Answer 1 · answered Feb 12 '21 at 21:36

Since $\phi$ is just a column vector, you can circumvent the headache of assigning a data type to a matrix-by-matrix derivative if you rewrite the original expression as
$$ \begin{align} \|X - \phi\phi^T X\|_F^2 &= \operatorname{tr}[(X - \phi\phi^TX)^T(X - \phi\phi^TX)] \\ & = \operatorname{tr}(X^TX) - 2\operatorname{tr}(X^T\phi\phi^TX) + \operatorname{tr}(X^T\phi\phi^T\phi\phi^TX) \\ & = \operatorname{tr}(X^TX) - \operatorname{tr}(XX^T\phi\phi^T) \\ & = \operatorname{tr}(X^TX) - \operatorname{tr}(\phi^TXX^T\phi) = \operatorname{tr}(X^TX) - \phi^TXX^T\phi, \end{align} $$
where the third line uses the constraint $\phi^T\phi = 1$ and the cyclic property of the trace. In other words, it is equivalent to consider the optimization problem
$$ \max_{\phi \in \Bbb R^d} \phi^T XX^T\phi \quad \text{s.t.} \quad \phi^T\phi = 1. $$
The fact that this maximum is the maximal eigenvalue of $XX^T$ (attained with $\phi$ equal to the corresponding eigenvector) is known as the "Rayleigh-Ritz theorem", but if you wanted you could derive this result using Lagrange multipliers.
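
The equivalence above can be illustrated numerically. The following is a minimal sketch, assuming NumPy and random test data (the names `objective`, `phi_star`, `trials`, and the sizes are hypothetical): it takes the top eigenvector of $XX^T$ from `numpy.linalg.eigh` and checks that it attains the value $\operatorname{tr}(X^TX) - \lambda_{\max}$ and that random unit vectors do no better.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 8                       # illustrative sizes, not from the question
X = rng.standard_normal((d, n))
S = X @ X.T                       # XX^T, symmetric positive semidefinite


def objective(phi):
    """Squared Frobenius norm of X - phi phi^T X."""
    return np.linalg.norm(X - np.outer(phi, phi) @ X, "fro") ** 2


# eigh returns eigenvalues in ascending order with orthonormal eigenvectors,
# so the last column is a unit eigenvector for the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(S)
lam_max = eigvals[-1]
phi_star = eigvecs[:, -1]

# On the unit sphere the objective equals tr(X^T X) - phi^T XX^T phi.
identity_ok = bool(np.isclose(
    objective(phi_star),
    np.trace(X.T @ X) - phi_star @ S @ phi_star,
))

# At the top eigenvector this becomes tr(X^T X) - lambda_max.
min_value_ok = bool(np.isclose(objective(phi_star), np.trace(S) - lam_max))

# Random unit vectors should never beat the top eigenvector.
trials = rng.standard_normal((1000, d))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
no_better = all(objective(v) >= objective(phi_star) - 1e-9 for v in trials)

print(identity_ok, min_value_ok, no_better)
```

The random trials are only a spot check; the guarantee itself comes from the Rayleigh-Ritz theorem cited above.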

OK, thanks, but rather than rewriting the expression, how does one formally justify differentiating something that is inside the trace? — James Arten, Feb 14 '21 at 12:32
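
One way to make this step precise, sketched here with the shorthand $S = XX^T$ and a perturbation $h \in \mathbb{R}^d$ (both symbols are introduced only for this illustration), is to expand the trace directly instead of taking the trace of a derivative. For $f(\phi) = \operatorname{tr}(S\phi\phi^T)$,

$$ \begin{align} f(\phi + h) - f(\phi) &= \operatorname{tr}\big(S(h\phi^T + \phi h^T)\big) + \operatorname{tr}(S h h^T) \\ &= \phi^T S h + h^T S \phi + h^T S h \\ &= (2S\phi)^T h + O(\|h\|^2), \end{align} $$

where the second line uses the cyclic property of the trace and the third uses $S = S^T$. The term linear in $h$ identifies the gradient as $2S\phi = 2XX^T\phi$. In this reading the trace is applied to the $d \times d$ perturbation $S(h\phi^T + \phi h^T)$, which is a square matrix, so it is never applied to a $d \times 1$ vector.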
