
Let $A$ be an $m\times n$ real or complex matrix with singular value decomposition (SVD) $A=U\Sigma V^*$ and $\text{rank}(A)=r$. The Eckart–Young–Mirsky theorem states that the following minimization problem:

$$\min_{B\in F^{m\times n}, \, \text{rank}(B)\leq k } \|A-B\|_F, $$

where $1\leq k \leq \min\{m,n\}$ and $\|\cdot\|_F$ denotes the Frobenius norm, has a solution given by $B=U_1\Sigma_1V_1^*$, where $U_1$ consists of the first $k$ columns of $U$, $\Sigma_1$ consists of the upper left $k\times k$ block of $\Sigma$, and $V_1$ consists of the first $k$ columns of $V$. Moreover, the minimum is given by $\sqrt{\sigma^2_{k+1}+\dots+\sigma^2_{\min\{m,n\}}}$, where $\sigma_j$ denotes the $j$th largest singular value of $A$.
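For concreteness, the statement is easy to check numerically; here is a minimal sketch in numpy (the sizes, seed, and $k$ below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 5, 2
A = rng.standard_normal((m, n))

# Full SVD: A = U @ diag(s) @ Vt, singular values in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncated SVD B = U_1 Sigma_1 V_1^*: keep the k largest singular triplets.
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The optimal error should be sqrt(sigma_{k+1}^2 + ... + sigma_{min(m,n)}^2).
print(np.linalg.norm(A - B, "fro"))   # these two numbers agree
print(np.sqrt(np.sum(s[k:] ** 2)))
```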

On this Wikipedia page it is written that the minimizer $B$ is unique if and only if $\sigma_{k+1}\neq \sigma_k$. How can one show this?

Clearly the statement is false if $k> r$, for then $\sigma_{k+1}=\sigma_{k}=0$ and the only solution to the problem is $B=A$.

Thanks a lot for your help.

Remark: A similar question has been answered here but without proof.


EDIT: I made a proof attempt below. Any feedback is greatly appreciated!

  • I think that this is straightforward to show if you take it as given that every minimizer has the form $B = U_1 \Sigma_1 V_1^*$ for *some* SVD $A = U \Sigma V^*$ of $A$. – Ben Grossmann Jul 26 '21 at 16:57
  • @BenGrossmann How can I show that ? – Alphie Jul 26 '21 at 17:03
  • I'll write up an answer if I find time – Ben Grossmann Jul 26 '21 at 17:06
  • @BenGrossmann Thank you for your time. I will try to figure out why this is the case. – Alphie Jul 26 '21 at 17:07
  • Note, by the way, that the correct statement is that the minimizer is unique iff $\sigma_{k+1} \neq \sigma_k$; you and the Wiki page use $r$ to mean different things. – Ben Grossmann Jul 27 '21 at 14:31
  • @BenGrossmann You are right I corrected the mistake. – Alphie Jul 27 '21 at 14:34
  • @BenGrossmann I believe the minimization problem here can be reduced to the problem $\max \text{Tr}(U^*AA^*U)$ s.t. $U^*U=I_k$ with $U\in F^{m\times k}$. It is known that one solution for this problem is to take the columns of $U$ to be eigenvectors of $AA^*$. But I still don't see why this is the only possible solution. – Alphie Jul 27 '21 at 15:03
  • That might be a bit easier to explain. For example, the argument given here via Lagrange multipliers means that every column must be an eigenvector in order to have a critical point. – Ben Grossmann Jul 27 '21 at 15:14
  • @BenGrossmann Can I show it without Lagrange multipliers? Or do you see another way to show that every minimizer must have the form $B = U_1 \Sigma_1 V_1^*$ for some SVD $A = U \Sigma V^*$? – Alphie Jul 27 '21 at 15:23
  • @BenGrossmann In the post you linked why is minimizing $\sum_{i=m}^n u_i^T A u_i$ equivalent to minimizing each $ u_i^T A u_i$ individually? – Alphie Jul 27 '21 at 16:58
  • Actually, I'm not sure. Also, it looks like the Lagrange multiplier argument seems to ignore the constraint that the columns are orthogonal. – Ben Grossmann Jul 27 '21 at 17:05
  • I thought I did see a way to show it, but in working through the details it's not so clear – Ben Grossmann Jul 27 '21 at 17:05
  • @BenGrossmann I tried to provide a complete answer below. Any feedback would be very appreciated! – Alphie Aug 03 '21 at 01:14

1 Answer


I will try to give a proof based on exercise $7.4.P17$ in the book Matrix Analysis by Horn & Johnson (2nd edition). Throughout, $A=U\Sigma V^*$ will denote a fixed SVD of $A$.

If $k\geq r$ the unique solution to the minimization problem is $B=A$. In what follows we assume $k<r$.


First suppose that $\sigma_{k+1}=\sigma_{k}$. Let $\Sigma_1=\text{diag}(\sigma_1,\dots,\sigma_k,0,\dots,0)$ and $\Sigma_2=\text{diag}(\sigma_1,\dots,\sigma_{k-1},0,\sigma_{k+1},0,\dots,0)$ be $m\times n$ rectangular diagonal matrices, and let $B_1=U\Sigma_1V^*$ and $B_2=U\Sigma_2V^*$. Then $\text{rank}(B_1)=\text{rank}(B_2)=k$ and

$$\|A-B_1\|^2_F=\sigma^2_{k+1}+\sigma^2_{k+2}+\dots+\sigma^2_{\min\{m,n\}}=\sigma^2_{k}+\sigma^2_{k+2}+\dots+\sigma^2_{\min\{m,n\}}=\|A-B_2\|^2_F$$ so both $B_1$ and $B_2$ are solutions to the minimization problem. Moreover,

$$\|B_1-B_2\|^2_F=\|\sigma_{k}u_kv_k^*-\sigma_{k+1}u_{k+1}v_{k+1}^*\|^2_F=2\sigma^2_k>0$$

so $B_1\neq B_2$. It follows that the minimizer $B$ is not unique.
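This non-uniqueness is easy to see numerically; a minimal sketch (the diagonal entries below are arbitrary choices with $\sigma_2=\sigma_3$ and $k=2$):

```python
import numpy as np

# Diagonal A with sigma_2 = sigma_3 = 2, so for k = 2 the rank-2
# minimizer is not unique.
A = np.diag([3.0, 2.0, 2.0, 1.0])

B1 = np.diag([3.0, 2.0, 0.0, 0.0])  # keep sigma_1 and sigma_2
B2 = np.diag([3.0, 0.0, 2.0, 0.0])  # keep sigma_1 and sigma_3 instead

# Both rank-2 matrices attain the optimal error sqrt(sigma_3^2 + sigma_4^2).
print(np.linalg.norm(A - B1, "fro"), np.linalg.norm(A - B2, "fro"))
print(np.linalg.norm(B1 - B2, "fro"))  # positive, so B1 != B2
```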


Next suppose $\sigma_{k+1}\neq\sigma_{k}$, and let $B$ be an $m\times n$ minimizer with $\text{rank}(B)\leq k$. Then

$$\|A-B\|^2_F=\sum_{i=k+1}^n \sigma_i^2(A) \quad\quad (1)$$

A direct application of Von Neumann's trace inequality to the identity $\|A-B\|^2_F=\langle A,A\rangle_F-2\text{Re} \langle A,B\rangle_F+\langle B,B\rangle_F $ shows that

$$\|A-B\|^2_F\geq \sum_{i=1}^n[\sigma_i(A)-\sigma_i(B)]^2 \quad\quad (2)$$

with equality if and only if $\text{Re} \langle A,B\rangle_F=\sum_{i=1}^n\sigma_i(A)\sigma_i(B)$. Since $\text{rank}(B)\leq k$ we have $\sigma_i(B)=0$ for $i>k$, so comparing $(1)$ and $(2)$ yields $\sigma_i(A)=\sigma_i(B)$ for $i=1,\dots,k$. Hence $(2)$ is an equality and $$\text{Re} \text{ Tr}(B^*A)=\sum_{i=1}^n\sigma_i(A)\sigma_i(B)=\sum_{i=1}^k\sigma^2_i(A)$$ To progress further we will need the following two lemmas.
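Before turning to them, inequality $(2)$ can be sanity-checked numerically; a quick sketch with random matrices (sizes and seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
B = rng.standard_normal((5, 4))

sA = np.linalg.svd(A, compute_uv=False)  # singular values, decreasing
sB = np.linalg.svd(B, compute_uv=False)

# ||A - B||_F^2 >= sum_i (sigma_i(A) - sigma_i(B))^2, inequality (2).
print(np.linalg.norm(A - B, "fro") ** 2 >= np.sum((sA - sB) ** 2))  # True
```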


Lemma 1. Suppose $A$ is $n\times n$ and $\text{Re} \text{ Tr}(A)=\sum_{i=1}^n\sigma_i(A)$. Then $A$ is Hermitian positive semi-definite.

Proof. Let $A=U\Sigma V^*$ be an SVD for $A$ and let $r=\text{rank}(A)$, so that $A=\sum_{i=1}^r \sigma_i(A) u_iv_i^*$. Then

$$\text{Re} \text{ Tr}(A)=\text{Re} \sum_{i=1}^r \sigma_i(A) \text{ Tr}(u_iv_i^*)= \sum_{i=1}^r \sigma_i(A) \text{Re} \langle u_i,v_i\rangle=\sum_{i=1}^r\sigma_i(A)$$ where the last equality holds by assumption. By Cauchy-Schwarz we have $\text{Re} \langle u_i,v_i \rangle \leq |\langle u_i,v_i\rangle|\leq \|u_i\|\|v_i\|=1$, so the only way to have equality above is to have $\text{Re} \langle u_i,v_i \rangle=1$ for $i=1,\dots,r$. Hence

$$1\leq \text{Re} \langle u_i,v_i \rangle \leq |\langle u_i,v_i\rangle|\leq \|u_i\|\|v_i\|=1, \quad i=1,\dots,r$$

Since the Cauchy-Schwarz inequality is an equality, we must have $u_i=d_iv_i$ for some scalar $d_i$, for each $i=1,\dots,r$. Plugging back, and using $|d_i|=\|u_i\|/\|v_i\|=1$, we obtain $1\leq \text{Re } d_i \leq |d_i|= 1$, and so $d_i=1$ for each $i=1,\dots,r$. We have shown that $A=U_r\Sigma_r U_r^*$, where $U_r$ consists of the first $r$ columns of $U$ and $\Sigma_r$ is the upper left $r\times r$ block of $\Sigma$. So $A$ is Hermitian positive semi-definite.
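The converse direction of Lemma 1 (for a Hermitian positive semi-definite matrix the trace equals the sum of the singular values) is easy to check numerically; a small sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = M @ M.T   # Hermitian (here: real symmetric) PSD by construction

s = np.linalg.svd(A, compute_uv=False)
print(np.isclose(np.trace(A), s.sum()))    # True: equality holds for PSD A

# For -A (not PSD, same singular values) the equality fails:
# the trace is negative while the sum of singular values is positive.
print(np.isclose(np.trace(-A), s.sum()))   # False
```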

Lemma 2. Let $A$,$B$ be $m\times n$ and suppose that $\|A-B\|^2_F= \sum_{i=1}^n[\sigma_i(A)-\sigma_i(B)]^2 $. Then $AB^*$ and $A^*B$ are both Hermitian positive semi-definite.

Proof. We noted above that equality holds if and only if $\text{Re} \text{ Tr}(B^*A)=\sum_{i=1}^n\sigma_i(A)\sigma_i(B)$. Hence

$$ \text{Re} \text{ Tr}(B^*A)\leq \sum_{i=1}^n\sigma_i(B^*A)\leq \sum_{i=1}^n\sigma_i(A)\sigma_i(B)=\text{Re} \text{ Tr}(B^*A)$$

where the first two inequalities are consequences of the Von Neumann trace inequality. Therefore $ \text{Re} \text{ Tr}(B^*A)= \sum_{i=1}^n\sigma_i(B^*A)$, and from Lemma 1 we conclude that $B^*A$ is Hermitian positive semi-definite. This implies that $A^*B=(B^*A)^*$ is Hermitian positive semi-definite. Since $\|A-B\|^2_F=\|A^*-B^*\|^2_F$ and $\sum_{i=1}^n[\sigma_i(A)-\sigma_i(B)]^2=\sum_{i=1}^m[\sigma_i(A^*)-\sigma_i(B^*)]^2 $ we can repeat the argument and conclude that $(B^*)^*A^*=BA^*$ is positive semi-definite. This implies that $AB^*=(B A^*)^*$ is Hermitian positive semi-definite.
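A numerical illustration of Lemma 2 in the case we care about (a sketch: $B$ is taken to be a truncated SVD of a random $A$, for which equality in $(2)$ holds):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 5, 4, 2
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # equality holds in (2)

# Both products should be symmetric positive semi-definite.
for M in (A @ B.T, A.T @ B):
    print(np.allclose(M, M.T), np.linalg.eigvalsh(M).min() >= -1e-10)
```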


Back to our proof, where $B$ is a minimizer. Thanks to Lemma 2 we now know that $AB^*$ and $A^*B$ are both Hermitian positive semi-definite. Adapting the proof from here, it follows that there exist unitary matrices $X,Y$ of sizes $m,n$ respectively such that $A=X\Sigma Y^*$ and $B=X\Lambda Y^*$, where $\Sigma$ is the $m\times n$ matrix from the SVD of $A$ and $\Lambda$ is an $m\times n$ rectangular diagonal matrix with real entries. In fact, since $AB^*$ and $A^*B$ are positive semi-definite (and in particular so are their upper left $r\times r$ submatrices, which appear in that proof), we may take $\Lambda$ to have nonnegative entries.

The fact that $\Lambda$ is rectangular diagonal with non-negative entries implies that its diagonal elements are the singular values of $B$, but not necessarily in decreasing order (from top left to bottom right). We claim that they are in fact in decreasing order. To see this we will use the following result.


Lemma 3. Let $x,y,z\in \mathbb R^n$ with $x_1\geq\dots\geq x_n$ and $z_1\geq\dots\geq z_n$ . Let $X_j=\sum_{i=1}^j x_i$ and $Y_j=\sum_{i=1}^j y_i$ for $j=1, \dots,n$. Suppose that $Y_j\leq X_j$ for each $j=1, \dots,n$, with equality at $j=n$. Then $\sum_{i=1}^nz_iy_i\leq \sum_{i=1}^nz_ix_i$ with equality if and only if $(z_i-z_{i+1})(X_i-Y_i)=0$ for each $i=1,\dots,n-1$.

Proof. Use summation by parts twice to get

$$\sum_{i=1}^n z_i y_i=\sum_{i=1}^{n-1} (z_i-z_{i+1})Y_i + z_nY_n \leq \sum_{i=1}^{n-1} (z_i-z_{i+1})X_i + z_nX_n=\sum_{i=1}^n z_i x_i $$

since $z_i-z_{i+1}\geq 0$ for each $i=1,\dots,n-1$. Equality is obtained if and only if $(z_i-z_{i+1})Y_i=(z_i-z_{i+1})X_i$ for $i=1,\dots,n-1$.
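A quick numerical sketch of Lemma 3 (the vectors below are arbitrary choices; $y$ is a permutation of $x$, which guarantees the partial-sum hypotheses):

```python
import numpy as np

x = np.array([4.0, 3.0, 2.0, 1.0])   # nonincreasing
y = np.array([3.0, 4.0, 1.0, 2.0])   # a permutation of x
z = np.array([5.0, 4.0, 2.0, 1.0])   # nonincreasing

X, Y = np.cumsum(x), np.cumsum(y)
print(np.all(Y <= X), Y[-1] == X[-1])  # hypotheses: True True
print(np.dot(z, y) <= np.dot(z, x))    # conclusion: True
```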


Write $\Lambda=\text{diag}(\lambda_1,\dots,\lambda_{\min\{m,n\}})$ as an $m\times n$ rectangular diagonal matrix. Using the cyclic property of the trace and the results obtained so far, we have

$$ \text{Re} \text{ Tr} (B^*A)= \text{ Tr} (B^*A)=\text{ Tr} (\Lambda^T \Sigma)=\sum_{i=1}^{\min\{m,n\}}\lambda_i\sigma_i(A)=\sum_{i=1}^{k}\sigma^2_i(A) =\sum_{i=1}^{\min\{m,n\}}\sigma_i(A)\sigma_i(B) $$

We now use the equality condition in Lemma 3 with $x_i=\sigma_i(B)$, $y_i=\lambda_i$, $z_i=\sigma_i(A)$ for $i=1,\dots,\min\{m,n\}$ (the hypotheses hold because $(\lambda_i)$ is a rearrangement of the singular values of $B$) to obtain

$$\bigg[\sigma_j(A)-\sigma_{j+1}(A)\bigg]\bigg[\sum_{i=1}^j \sigma_i(B)-\sum_{i=1}^j \lambda_i\bigg]=0, \quad j=1,\dots, \min\{m,n\}-1 \quad \quad (3)$$

Since $k<r\leq \min\{m,n\}$, we can use $(3)$ at $j=k$ and the condition $\sigma_k(A)\neq\sigma_{k+1}(A)$ to conclude that $\sum_{i=1}^k \sigma_i(B)=\sum_{i=1}^k \lambda_i$. Since $\sigma_i(B)=0$ for $i>k$ (as $\text{rank}(B)\leq k$) and $\sum_{i=1}^{\min\{m,n\}}\lambda_i=\sum_{i=1}^{\min\{m,n\}}\sigma_i(B)$, this implies $\lambda_i=0$ for $i=k+1,\dots, \min\{m,n\}$.

Next we work from the top down through the distinct singular values of $A$. Let $s_1>\dots>s_d>0$ denote the distinct positive singular values of $A$ with respective multiplicities $n_1,\dots,n_d$, with $n_1+\dots+n_d=r$. The conditions $k<r$ and $\sigma_k(A)\neq\sigma_{k+1}(A)$ imply that $k= n_1+\dots+n_l$ for some $1\leq l< d$. Use $(3)$ at $j=n_1$, that is

$$\bigg[\sigma_{n_1}(A)-\sigma_{n_1+1}(A)\bigg]\bigg[\sum_{i=1}^{n_1} \sigma_i(B)-\sum_{i=1}^{n_1} \lambda_i\bigg]=\bigg[s_1-s_2 \bigg]\bigg[n_1s_1-\sum_{i=1}^{n_1} \lambda_i\bigg]=0$$ and the inequality $\lambda_i\leq s_1$ (each $\lambda_i$ is a singular value of $B$, and $\sigma_1(B)=\sigma_1(A)=s_1$) to conclude that $\lambda_i= s_1$ for $i=1,\dots,n_1$. Then use $(3)$ at $j=n_1+n_2$, that is

$$\bigg[\sigma_{n_1+n_2}(A)-\sigma_{n_1+n_2+1}(A)\bigg]\bigg[\sum_{i=1}^{n_1+n_2} \sigma_i(B)-\sum_{i=1}^{n_1+n_2} \lambda_i\bigg]=\bigg[s_2-s_3\bigg]\bigg[n_2 s_2-\sum_{i=n_1+1}^{n_1+n_2} \lambda_i\bigg]=0$$ and the inequality $\lambda_i\leq s_2$ for $i=n_1+1,\dots,n_1+n_2$ implied by the previous step to conclude that $\lambda_i= s_2$ for $i=n_1+1,\dots,n_1+n_2$. Continuing inductively we obtain that the first $k$ diagonal elements of $\Lambda$ are the first $k$ singular values of $A$ in decreasing order from top left to bottom right, and that all other diagonal elements of $\Lambda$ are zero.


We have $A=X\Sigma Y^*$ and $B=X\Lambda Y^*$ with $\Sigma,\Lambda$ the $m\times n$ matrices of singular values from the SVD of $A$ and $B$. To finish the proof we will use a uniqueness property of the SVD proven here. Namely, there exist unitary matrices $U_i$ of sizes $n_i$ for $i=1,\dots,d$, and unitary matrices $\tilde{U},\tilde{V}$ of sizes $m-r,n-r$ respectively, such that $$X=U(U_1\oplus\dots\oplus U_d \oplus \tilde{U})$$ $$Y=V(U_1\oplus\dots\oplus U_d \oplus \tilde{V})$$

Also,

$$\Lambda=\begin{bmatrix} \Lambda_r & 0_{r \times(n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix}$$

with

$$\Lambda_r=s_1 I_{n_1}\oplus\dots\oplus s_l I_{n_l} \oplus 0_{n_{l+1} \times n_{l+1} }\oplus\dots\oplus 0_{n_d \times n_d }$$

Defining $W:=U_1\oplus\dots\oplus U_d \oplus \tilde{U}$ and $Z:=U_1\oplus\dots\oplus U_d \oplus \tilde{V}$, a direct computation shows that

$$W\Lambda=\Lambda Z$$

Therefore,

$$X\Lambda Y^*=UW\Lambda Z^*V^*=U\Lambda ZZ^* V^*=U\Lambda V^*$$

We have shown that, if $B$ is a solution to the minimization problem, then $B=U_1\Sigma_1V_1^*$, where $U_1$ consists of the first $k$ columns of $U$, $\Sigma_1$ consists of the upper left $k\times k$ block of $\Sigma$, and $V_1$ consists of the first $k$ columns of $V$. This completes the proof of uniqueness.
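As a final sanity check, the identity $W\Lambda=\Lambda Z$ can be verified numerically; a sketch with hypothetical block sizes ($n_1=2$, $n_2=1$, $m=5$, $n=4$, $k=n_1=2$), where `rand_orth` and `block_diag` are small helpers written for the illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

def rand_orth(n):
    # Random orthogonal matrix via QR (the real case stands in for unitary).
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def block_diag(*blocks):
    # Minimal block-diagonal assembly for square blocks.
    total = sum(b.shape[0] for b in blocks)
    out = np.zeros((total, total))
    i = 0
    for b in blocks:
        s = b.shape[0]
        out[i:i + s, i:i + s] = b
        i += s
    return out

m, n = 5, 4
U1, U2 = rand_orth(2), rand_orth(1)
W = block_diag(U1, U2, rand_orth(m - 3))   # U_1 (+) U_2 (+) U-tilde
Z = block_diag(U1, U2, rand_orth(n - 3))   # U_1 (+) U_2 (+) V-tilde

Lam = np.zeros((m, n))
Lam[0, 0] = Lam[1, 1] = 2.5                # Lambda_r = s_1 I_2 (+) 0

print(np.allclose(W @ Lam, Lam @ Z))       # True: W Lambda = Lambda Z
```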
