Consider the following optimization problem

\begin{align} &\min_{\phi \in \mathbb{R}^d} \|X- \phi \phi^T X\|^2_F\\ &\text{s.t. } \, \, \, \phi^T\phi=1 \end{align}

where $X$ is a $d \times n$ matrix and $\|\cdot\|_F$ denotes the Frobenius norm. I want to show that this is just an eigenvalue problem.

In a few steps (using the constraint $\phi^T\phi = 1$) I can write $\|X- \phi \phi^T X\|^2_F = \operatorname{tr}(X^TX) - \operatorname{tr}(XX^T\phi\phi^T)$, which I now take as the objective function. Writing the Lagrangian, I get

\begin{equation} \mathcal{L} = \operatorname{tr}(X^TX) - \operatorname{tr}(XX^T\phi\phi^T) -\lambda(\phi^T\phi-1). \end{equation}
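As a quick sanity check on the rewritten objective $\operatorname{tr}(X^TX) - \operatorname{tr}(XX^T\phi\phi^T)$, here is a minimal NumPy sketch that compares it with $\|X- \phi \phi^T X\|^2_F$ for a random unit vector. The sizes, the seed and the variable names are arbitrary choices for illustration, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 6                       # illustrative sizes (assumption)
X = rng.standard_normal((d, n))

phi = rng.standard_normal(d)
phi /= np.linalg.norm(phi)        # enforce the constraint phi^T phi = 1

P = np.outer(phi, phi)            # the d x d matrix phi phi^T

# Left-hand side: squared Frobenius norm of the residual.
lhs = np.linalg.norm(X - P @ X, "fro") ** 2

# Right-hand side: the two traces; X^T X is n x n and X X^T P is d x d,
# so they have to be taken separately.
rhs = np.trace(X.T @ X) - np.trace(X @ X.T @ P)

print(np.isclose(lhs, rhs))
```

The identity relies on $\phi^T\phi = 1$; if the normalization line is dropped, the two sides should no longer agree in general.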

Then,

\begin{equation} \mathbb{R}^d \ni\frac{\partial\mathcal{L}}{\partial\phi} = -\operatorname{Tr}(2XX^T\phi) - 2\lambda\phi. \end{equation}

where I have again used linearity of the trace and then exchanged the trace with the derivative:

\begin{equation} \frac{d\operatorname{tr}(f(X))}{dX} = \operatorname{Tr}\left(\frac{df(X)}{dX}\right) \end{equation}

What puzzles me is that the object inside the trace now has dimensions $d \times 1$ (it's a vector), so is the trace even defined? Or is it just equal to its argument? And if so, why? Thanks!

EDIT:

Thank you. A more rigorous justification that works for me is the following, written componentwise:

\begin{equation} \frac{\partial \operatorname{tr}(XX^T\phi\phi^T)}{\partial \phi_k} = \frac{\partial}{\partial\phi_k}\sum_{j}\Big(\sum_{i}x_{ij}\phi_i\Big)^2 = 2\sum_{j}x_{kj}\sum_{i}x_{ij}\phi_i = 2(XX^T\phi)_k \end{equation}

The key thing here is that the trace of a product of matrices can be seen as the sum of entry-wise products of their elements, i.e. \begin{equation} \operatorname{tr}(A^TB) = \sum_{i,j}(A\circ B)_{i,j} \end{equation}

where $\circ$ denotes the Hadamard product.
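Both claims can be checked numerically. The sketch below is only an illustration under assumed sizes and seed: it compares the gradient $2XX^T\phi$ with central finite differences of $\operatorname{tr}(XX^T\phi\phi^T)$, and compares $\operatorname{tr}(A^TB)$ with the sum of the entries of $A\circ B$. The helper `f` and the step `eps` are hypothetical names introduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 6                       # illustrative sizes (assumption)
X = rng.standard_normal((d, n))
phi = rng.standard_normal(d)      # no constraint needed for the gradient check


def f(v):
    """Scalar function tr(X X^T v v^T), which equals v^T X X^T v."""
    return np.trace(X @ X.T @ np.outer(v, v))


# Closed-form gradient: a vector in R^d, with no trace around it.
grad_formula = 2 * X @ X.T @ phi

# Central finite differences along each coordinate direction.
eps = 1e-6
grad_fd = np.array([
    (f(phi + eps * e) - f(phi - eps * e)) / (2 * eps)
    for e in np.eye(d)
])

# Trace / Hadamard identity for two matrices of the same shape.
A = rng.standard_normal((d, n))
B = rng.standard_normal((d, n))
trace_side = np.trace(A.T @ B)
hadamard_side = np.sum(A * B)     # A * B is the entry-wise product

print(np.allclose(grad_fd, grad_formula, atol=1e-5))
print(np.isclose(trace_side, hadamard_side))
```

Note that the trace is taken before differentiating, so the function being differentiated is a scalar and its gradient is an ordinary vector.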

James Arten
  • For an alternative approach to this problem, see this post or this post – Ben Grossmann Feb 12 '21 at 21:24
  • Long story short, there is no standard treatment of the object $\frac{df}{dX}$ such that its "trace" is defined. – Ben Grossmann Feb 12 '21 at 21:37
  • The derivative of the Lagrangian was miscalculated. It should read $$\eqalign{ \frac{\partial\mathcal{L}}{\partial\phi} = -2XX^T\phi - 2\lambda\phi \;=\; -2(XX^T+\lambda I)\phi }$$ Setting it to zero leads to an eigenvalue equation for the $XX^T$ matrix. – greg Feb 13 '21 at 00:48
  • @Ben Grossmann what does that mean? That the problem of finding the derivative of the Lagrangian is not well posed in this case? – James Arten Feb 13 '21 at 12:07

1 Answer

Since $\phi$ is just a column-vector, you can circumvent the headache of assigning a data-type to a matrix-by-matrix derivative if you rewrite the original expression as $$ \begin{align} \|X - \phi\phi^T X\|_F^2 &= \operatorname{tr}[(X - \phi\phi^TX)^T(X - \phi\phi^TX)] \\ & = \operatorname{tr}(X^TX) - 2\operatorname{tr}(X^T\phi\phi^TX) + \operatorname{tr}(X^T\phi\phi^T\phi\phi^TX) \\ & = \operatorname{tr}(X^TX) - \operatorname{tr}(XX^T\phi\phi^T) \\ & = \operatorname{tr}(X^TX) - \operatorname{tr}(\phi^TXX^T\phi) = \operatorname{tr}(X^TX) - \phi^TXX^T\phi. \end{align} $$ In other words, it is equivalent to consider the optimization problem $$ \max_{\phi \in \Bbb R^d} \phi^T XX^T\phi \quad \text{s.t.} \quad \phi^T\phi = 1. $$ The fact that this maximum is the maximal eigenvalue of $XX^T$ (attained with $\phi$ equal to the corresponding eigenvector) is known as the "Rayleigh-Ritz theorem", but if you wanted you could derive this result using Lagrange multipliers.
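For a numerical illustration of this statement, here is a minimal NumPy sketch (sizes, seed and variable names are assumptions made for the example). It takes the eigenvector of $XX^T$ for the largest eigenvalue from `np.linalg.eigh` and checks that it attains $\phi^TXX^T\phi = \lambda_{\max}$, and that no random unit vector gives a smaller value of the original objective.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 6                       # illustrative sizes (assumption)
X = rng.standard_normal((d, n))
S = X @ X.T                       # symmetric positive semidefinite, d x d

# eigh returns eigenvalues in ascending order for symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(S)
lam_max = eigvals[-1]
phi_star = eigvecs[:, -1]         # unit-norm eigenvector for lam_max


def objective(v):
    """Original objective ||X - v v^T X||_F^2 for a unit vector v."""
    return np.linalg.norm(X - np.outer(v, v) @ X, "fro") ** 2


best = objective(phi_star)

# Compare against random unit vectors; none should beat phi_star.
trials = rng.standard_normal((1000, d))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
worst_gap = min(objective(v) for v in trials) - best

print(np.isclose(phi_star @ S @ phi_star, lam_max))
print(np.isclose(best, np.trace(S) - lam_max))
print(worst_gap >= -1e-9)
```

Random sampling is of course not a proof, only a check that the eigenvector behaves as the theorem predicts on one example.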

Ben Grossmann
  • OK, thanks, but instead of rewriting, how can one formally justify differentiating something that is inside the trace? – James Arten Feb 14 '21 at 12:32