
I am studying the minimization problem:

$$\min_{X \in \mathbb{R}^{n \times d}} \quad f(X) := \left\| A - X X^T \right\|_F^2$$

where the $n \times n$ matrix $A$ is positive semidefinite and $n > d$.


My work

First, I observed that if $C$ is an orthogonal $d \times d$ matrix and $X^{*}$ is a solution to the minimization problem, then $X^{*}C$ must also be a solution, since

$$f(XC) = ||A-XC(XC)^T||^2_F = ||A-XCC^TX^T||^2_F = ||A-XX^T||^2_F = f(X), \quad \forall X$$

Then we have $f(X) \geq 0$ for all $X$, so if $f(X_0) = 0$, then $X_0$ must be a global (and hence local) minimum, right?

So we can have a minimum at any $X_0$ that satisfies $A = X_0X_0^T$. It is my understanding that such an $X_0 \in \mathbb{R}^{n \times d}$ exists whenever $A$ is positive semidefinite with $\operatorname{rank}(A) \leq d$.
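To convince myself, I ran a quick numerical check. Here is a minimal NumPy sketch (the sizes $n = 6$, $d = 3$ and the random seed are arbitrary choices) that builds a rank-$\leq d$ PSD matrix $A = X_0X_0^T$ and verifies both that $f(X_0) = 0$ and that $f(X_0C) = 0$ for a random orthogonal $C$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3  # arbitrary sizes with n > d

# Build a PSD matrix A of rank <= d, so an exact factorization A = X0 X0^T exists.
X0 = rng.standard_normal((n, d))
A = X0 @ X0.T

def f(X):
    return np.linalg.norm(A - X @ X.T, 'fro')**2

# A random orthogonal d x d matrix via QR.
C, _ = np.linalg.qr(rng.standard_normal((d, d)))

print(f(X0))      # ~0: X0 attains the minimum
print(f(X0 @ C))  # ~0: so does X0 C, by the invariance above
```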

Does this prove $f(X)$ is non-convex since it appears to have multiple minima? Is my reasoning correct thus far?

3 Answers


A couple of points:

First, convex functions can have multiple minima, so long as the minima form a convex set. It would prove $f$ is not strictly convex though.

Second, it's not immediately clear that $XC$ even gives us multiple distinct minimisers. For example, if $X$ is the zero matrix, then $XC = 0$ for all $C$.

To clear this up, I would suggest considering $A = YY^\top$ for some non-zero $Y \in \Bbb{R}^{n \times d}$. Note that:

  • $A$ is positive-semidefinite,
  • $A$ is non-zero, and
  • The problem is minimised at (at least) $X = Y$, with $f(Y) = 0$.

Then, the problem is minimised at $X = -Y$ as well, for the reasons you outlined (take $C = -I$). If $f$ were convex, we would expect the midpoint of these minimisers, i.e. $X = 0$, to also be a minimum. But $$f(0) = \|A - 00^\top\|_F^2 = \|A\|_F^2 > 0 = f(Y).$$ This is a contradiction, so $f$ is indeed not convex.
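For concreteness, here is a minimal NumPy check of this counterexample (the sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 2  # arbitrary sizes with n > d

Y = rng.standard_normal((n, d))  # some non-zero Y
A = Y @ Y.T                      # positive semidefinite by construction

def f(X):
    return np.linalg.norm(A - X @ X.T, 'fro')**2

print(f(Y), f(-Y))                  # both ~0: two distinct minimisers
mid = 0.5 * Y + 0.5 * (-Y)          # their midpoint is the zero matrix
print(f(mid))                       # equals ||A||_F^2 > 0, so not a minimum
print(np.linalg.norm(A, 'fro')**2)  # matches f(0)
```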

Theo Bendit
  • Those are some good pointers, thank you. Would taking any other orthogonal $C$ give a non-zero minimiser, given a non-zero $Y$? – Carlos Gruss Jul 11 '21 at 06:38
  • To add on to that last comment, that would mean the problem has $\frac{d(d+1)}{2}$ minima! – Carlos Gruss Jul 11 '21 at 06:47
  • The third bullet point isn't quite right. The $X$ in the OP is rectangular but $A$ can be positive definite (so that it isn't always possible to decompose it as $YY^T$ for some $n\times d$ matrix $Y$). – user1551 Jul 16 '21 at 11:34
  • @user1551 One of us has it backwards (it could be me); I'm fixing some arbitrary non-zero rectangular $Y$, and considering the specific case of the problem where $A = YY^\top$. That is, I'm not starting with $A$ and hoping to get $Y$; I'm starting with $Y$, and defining $A = YY^\top$. With this choice of $A$, there is demonstrably a non-convex minimising set, which proves the problem is non-convex. – Theo Bendit Jul 16 '21 at 16:51

Given an $n \times n$ symmetric positive semidefinite matrix $\bf A$, let the scalar field $f : \Bbb R^{n \times d} \to \Bbb R$ be defined by

$$f ({\bf X}) := \left\| {\bf X} {\bf X}^\top - {\bf A} \right\|_{\text{F}}^2 = \cdots = \mbox{tr} \left( \, {\bf X} {\bf X}^\top {\bf X} {\bf X}^\top \right) - 2 \, \mbox{tr} \left( {\bf X}^\top {\bf A} \, {\bf X} \right) + \left\| {\bf A} \right\|_{\text{F}}^2$$

Taking the gradient of $f$,

$$\nabla f ({\bf X}) = 4 \left( {\bf X} {\bf X}^\top - {\bf A} \right) {\bf X}$$

and, finding where the gradient vanishes, we obtain the following cubic matrix equation

$$\boxed{\left( {\bf X} {\bf X}^\top - {\bf A} \right) {\bf X} = {\bf O}_{n \times d}}$$
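As a sanity check, here is a minimal NumPy sketch (random data, arbitrary sizes) comparing this gradient formula against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 2
B = rng.standard_normal((n, n))
A = B @ B.T  # symmetric positive semidefinite

def f(X):
    return np.linalg.norm(X @ X.T - A, 'fro')**2

def grad_f(X):
    return 4.0 * (X @ X.T - A) @ X

X = rng.standard_normal((n, d))
G = grad_f(X)

# Central finite differences, entry by entry.
eps = 1e-6
G_num = np.zeros_like(X)
for i in range(n):
    for j in range(d):
        E = np.zeros_like(X)
        E[i, j] = eps
        G_num[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.max(np.abs(G - G_num)))  # tiny (~1e-8): the formula checks out
```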


$ \def\l{\left} \def\r{\right} \def\lr#1{\l(#1\r)} \def\rnk#1{\operatorname{rank}\lr{#1}} $Since $\;\rnk{X}\le d,\;$ it looks like you want the best rank-$d$ approximation of $A$ in factored form.

Find the SVD of $(A^TA)$, with ordered singular values $(\sigma_1\ge\sigma_2\ge\ldots\ge\sigma_n)$: $$\eqalign{ A^2 &= (A^TA) = USU^T \\ Y &= US^{1/4} \quad\implies\quad YY^T = US^{1/2}U^T = A \\ }$$ The best rank-$d$ approximation (in the Frobenius sense) comes from the first $d$ columns of $Y$: $$\eqalign{ Y &= \Big[\,y_1\;\;y_2\;\;\ldots\;y_d\;\;\ldots\;y_n\,\Big] \quad&\implies A &= YY^T\\ X &= \Big[\,y_1\;\;y_2\;\;\ldots\;y_d\,\Big] \quad&\implies A &\approx XX^T \\\\ }$$
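A minimal NumPy sketch of this construction (random PSD $A$, arbitrary sizes):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 6, 2
B = rng.standard_normal((n, n))
A = B @ B.T  # symmetric PSD, generically full rank

U, S, _ = np.linalg.svd(A.T @ A)  # A^T A = A^2 = U S U^T
Y = U * S**0.25                   # Y = U S^(1/4), so Y Y^T = U S^(1/2) U^T = A
X = Y[:, :d]                      # first d columns: best rank-d factor

print(np.linalg.norm(A - Y @ Y.T, 'fro'))     # ~0: exact factorization
print(np.linalg.norm(A - X @ X.T, 'fro')**2)  # best rank-d error (Eckart-Young)
print(S[d:].sum())                            # equals the error above, since
                                              # sqrt(S) are the eigenvalues of A
```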


If the elements of $A$ are positive, nonnegative matrix factorization (NMF) is another approach: $$\min_{\small W,H}\;\left\|A-WH^T\right\|_F^2$$ Initialize $\{W,H\}$ to random positive $(n\times d)$ matrices.
After the $\sf Lee$-$\sf Seung$ iterations $$\eqalign{ W_+ &= W\odot\lr{\frac{AH}{WH^TH}}, \qquad H_+ &= H\odot\lr{\frac{A^TW}{HW^TW}}\\ }$$ converge, $X$ can be recovered as the arithmetic-harmonic mean $$X=\operatorname{mean}(W,H)$$
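A minimal NumPy sketch of these multiplicative updates (random positive data, arbitrary sizes and iteration count; the final averaging step to recover $X$ is left out):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 6, 3
R = rng.random((n, n))
A = R @ R.T  # PSD with positive entries

# Lee-Seung multiplicative updates for min ||A - W H^T||_F^2 over positive W, H.
W = rng.random((n, d))
H = rng.random((n, d))
tiny = 1e-12  # guard against division by zero
for _ in range(2000):
    W *= (A @ H) / (W @ (H.T @ H) + tiny)
    H *= (A.T @ W) / (H @ (W.T @ W) + tiny)

print(np.linalg.norm(A - W @ H.T, 'fro'))  # residual after the iterations
```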

greg