7

Given an $m \times n$ matrix $M$ ($m \geq n$), the nearest semi-orthogonal matrix problem in the $m \times n$ matrix $R$ is

$$\begin{array}{ll} \text{minimize} & \| M - R \|_F\\ \text{subject to} & R^T R = I_n\end{array}$$

A solution can be found using Lagrange multipliers or the polar decomposition, and is known to be

$$\hat{R} := M(M^TM)^{-1/2}$$
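
(For concreteness: with the thin SVD $M = U \Sigma V^T$ and $M$ of full column rank, this simplifies to $\hat{R} = UV^T$. A minimal NumPy sketch, with an illustrative function name:)

import numpy as np

def nearest_semi_orthogonal(M):
    # Frobenius-nearest semi-orthogonal matrix: M (M^T M)^{-1/2} = U V^T,
    # where M = U diag(s) V^T is the thin SVD (full column rank assumed)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

M = np.random.rand(30, 3)
R_hat = nearest_semi_orthogonal(M)
print(np.allclose(R_hat.T @ R_hat, np.eye(3)))  # True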

If $\|\cdot\|_F$ is replaced by the entry-wise $1$-norm

$$\|A\|_1 := \|\operatorname{vec}(A)\|_1 = \sum_{i,j} |A_{i,j}|$$

the problem becomes

$$\boxed{\begin{array}{ll} \text{minimize} & \| M - R \|_1\\ \text{subject to} & R^T R = I_n\end{array}}$$

What do we know about the solutions in this case? Is $\hat{R}$ still a solution? If the solution is something else, do analytic forms or approximations exist? Any insight or direction to literature is appreciated.

Francis
  • 823

2 Answers

3

We have the following non-convex optimization problem in (non-fat) $\mathrm X \in \mathbb R^{m \times n}$

$$\begin{array}{ll} \text{minimize} & \| \mathrm X - \mathrm A \|_1\\ \text{subject to} & \mathrm X^\top \mathrm X = \mathrm I_n\end{array}$$

where (non-fat) $\mathrm A \in \mathbb R^{m \times n}$ is given. The feasible region, defined by $\mathrm X^\top \mathrm X = \mathrm I_n$, is a Stiefel manifold whose convex hull is defined by $\mathrm X^\top \mathrm X \preceq \mathrm I_n$, or, equivalently, by the inequality $\| \mathrm X \|_2 \leq 1$. Hence, a convex relaxation of the original optimization problem is

$$\boxed{\qquad \begin{array}{ll} \text{minimize} & \| \mathrm X - \mathrm A \|_1\\ \text{subject to} & \| \mathrm X \|_2 \leq 1\end{array} \qquad}$$

Via the Schur complement test for positive semidefiniteness, $\mathrm X^\top \mathrm X \preceq \mathrm I_n$ can be rewritten as the following linear matrix inequality (LMI)

$$\begin{bmatrix} \mathrm I_m & \mathrm X\\ \mathrm X^\top & \mathrm I_n\end{bmatrix} \succeq \mathrm O_{m+n}$$

and, thus,

$$\begin{array}{ll} \text{minimize} & \| \mathrm X - \mathrm A \|_1\\ \text{subject to} & \begin{bmatrix} \mathrm I_m & \mathrm X\\ \mathrm X^\top & \mathrm I_n\end{bmatrix} \succeq \mathrm O_{m+n}\end{array}$$
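
Spelling out the Schur complement step: since $\mathrm I_m \succ \mathrm O_m$, the block matrix is positive semidefinite if and only if the Schur complement of $\mathrm I_m$ is, i.e.

$$\mathrm I_n - \mathrm X^\top \mathrm I_m^{-1} \mathrm X = \mathrm I_n - \mathrm X^\top \mathrm X \succeq \mathrm O_n$$

which is exactly $\mathrm X^\top \mathrm X \preceq \mathrm I_n$.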

Introducing variable $\mathrm Y \in \mathbb R^{m \times n}$, we obtain the following semidefinite program in $\mathrm X, \mathrm Y \in \mathbb R^{m \times n}$

$$\begin{array}{ll} \text{minimize} & \langle 1_m 1_n^\top, \mathrm Y \rangle\\ \text{subject to} & - \mathrm Y \leq \mathrm X - \mathrm A \leq \mathrm Y\\ & \begin{bmatrix} \mathrm I_m & \mathrm X\\ \mathrm X^\top & \mathrm I_n\end{bmatrix} \succeq \mathrm O_{m+n}\end{array}$$
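
For reference, a minimal CVXPY sketch of this epigraph formulation (variable names are illustrative, and for simplicity the LMI is imposed via the equivalent constraint $\| \mathrm X \|_2 \leq 1$, which CVXPY handles directly):

import cvxpy as cp
import numpy as np

m, n = 30, 3
A = np.random.rand(m, n)

X = cp.Variable((m, n))
Y = cp.Variable((m, n))   # epigraph variable bounding |X - A| entrywise

constraints = [X - A <= Y,
               A - X <= Y,
               cp.norm(X, 2) <= 1]   # equivalent to the LMI / X^T X <= I_n

# <1_m 1_n^T, Y> is just the sum of the entries of Y
prob = cp.Problem(cp.Minimize(cp.sum(Y)), constraints)
prob.solve()
print(prob.status, prob.value)

Mathematically this is the same relaxation as the code below; the epigraph variable is only made explicit.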


Numerical experiments

The following Python + NumPy + CVXPY code solves the convex relaxation of the original problem:

import cvxpy as cp
import numpy as np

(m, n) = (30, 3)

A = np.random.rand(m, n)

X = cp.Variable((m, n))

# define optimization problem: entry-wise 1-norm objective,
# spectral-norm constraint (convex hull of the Stiefel manifold)
prob = cp.Problem(cp.Minimize(cp.sum(cp.abs(X - A))), [cp.norm(X, 2) <= 1])

# solve optimization problem
print(prob.solve())
print(prob.status)

# print results
print("X = \n", X.value)
print("Spectral norm of X = ", np.linalg.norm(X.value, 2))
print("X.T * X = \n", X.value.T @ X.value)
print("Round(X.T * X) = \n", np.round(X.value.T @ X.value, 3))

Unfortunately, it produces

X.T * X = 
[[ 0.50912521  0.22478268  0.25341844]
 [ 0.22478268  0.4961892   0.2654229 ]
 [ 0.25341844  0.2654229   0.46896397]]

which is quite "far" from the $3 \times 3$ identity matrix. This is disappointing.

However, using the Frobenius norm instead of the entry-wise $1$-norm:

prob = cp.Problem(cp.Minimize(cp.norm(X - A, 'fro')), [cp.norm(X, 2) <= 1])

we obtain the following output

3.85187827124
optimal
X = 
[[ 0.13146134 -0.14000011  0.32874275]
 [-0.01234834  0.11837379  0.06604536]
 [-0.10490978  0.10542352  0.25211362]
 [ 0.25263062  0.14707194  0.03884215]
 [ 0.00181182  0.4702248  -0.20126385]
 [ 0.13013444 -0.07199484  0.08727077]
 [ 0.19077548 -0.01423872 -0.05960523]
 [ 0.2865637  -0.00202074  0.02844798]
 [ 0.01602302  0.04395754  0.00154713]
 [ 0.17932924  0.30926775 -0.05940074]
 [ 0.42908676 -0.21953956  0.12394825]
 [ 0.1255695   0.16415755  0.14634119]
 [ 0.28144817  0.08592836  0.08426443]
 [ 0.18209884  0.25983065 -0.02550957]
 [ 0.11077068 -0.10874038  0.23649308]
 [ 0.01565326  0.14043185  0.01186364]
 [-0.04374642  0.20360714  0.01079417]
 [-0.00440237  0.17746665  0.12931808]
 [ 0.18899948  0.08389032  0.23493301]
 [ 0.25802202  0.16171055  0.09263858]
 [-0.01921053  0.26863496  0.13077382]
 [-0.11175044  0.06137184  0.32758781]
 [-0.20321302  0.37803842  0.17629377]
 [-0.09301606 -0.0783033   0.36603431]
 [-0.00572361  0.17620931 -0.04822991]
 [ 0.24944001  0.18830197 -0.05522287]
 [-0.2049578   0.1091767   0.38943593]
 [ 0.36823908 -0.10026829  0.04425441]
 [ 0.03582131  0.03531081  0.08337656]
 [-0.07315648 -0.0739467   0.35741916]]
Spectral norm of X =  1.00018499995
X.T * X = 
[[ 0.99843209 -0.00168429 -0.0019862 ]
 [-0.00168429  0.99833293 -0.00222193]
 [-0.0019862  -0.00222193  0.99790575]]
Round(X.T * X) = 
[[ 0.998 -0.002 -0.002]
 [-0.002  0.998 -0.002]
 [-0.002 -0.002  0.998]]

where $\rm X^\top X$ is now quite "close" to the $3 \times 3$ identity matrix.

I performed many numerical experiments, with several values of $m$ and $n$. My conclusions:

  • Minimizing the Frobenius norm worked rather satisfactorily for very tall matrices ($m \gg n$).

  • Unfortunately, minimizing the entry-wise $1$-norm failed in all experiments I performed.

By "worked" and "failed" I mean that the solution of the relaxed convex problem is a solution of the original (non-convex) optimization problem or not, respectively.

  • 1
    Your notation is inconsistent with the original question. Your $A$ is the original poster's $M$. – Brian Borchers Apr 05 '18 at 01:13
  • My comment was merely there to help any reader who was confused by change in notation. – Brian Borchers Apr 05 '18 at 01:17
  • I'm with Brian on this one—the notation change is unnecessarily distracting. As for using $m$ and $M$ at the same time, well, that would throw me off every time I need to formulate a big-$M$ method! ;-) – Michael Grant Apr 05 '18 at 01:17
  • 1
    $$\begin{array}{ll} \text{minimize} & \langle 1_m 1_n^\top, \mathrm Y \rangle\\ \text{subject to} & - \mathrm Y \leq \mathrm R - \mathrm M \leq \mathrm Y\\ & \begin{bmatrix} \mathrm I_m & \mathrm R\\ \mathrm R^\top & \mathrm I_n\end{bmatrix} \succeq \mathrm O_{m+n}\end{array}$$ – Rodrigo de Azevedo Apr 05 '18 at 01:25
  • @RodrigodeAzevedo, Is there a projection onto the Stiefel Manifold (Also the space of Semi Orthogonal Matrices)? – Royi Feb 03 '19 at 04:29
  • @Royi That sounds Procrustes-like. Please post a question on that so that I know which objective and constraints you have in mind, and then send me the URL. – Rodrigo de Azevedo Feb 03 '19 at 08:20
2

Reformulate the problem by constructing the $R$ matrix in terms of an unconstrained matrix $U$ such that the constraint is always satisfied: $$R = U(U^TU)^{-1/2}\quad\implies\quad R^TR=I$$ Then optimize with respect to the new unconstrained variable $U.\;$ Since $R$ is independent of $\,\|U\|,\,$ controlling the growth of $\,\|U\|\,$ (via normalization) will be important for numerical algorithms.
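
A minimal NumPy sketch of this reparametrization, taking $(U^TU)^{-1/2}$ through an eigendecomposition (the function name is illustrative):

import numpy as np

def to_semi_orthogonal(U):
    # R = U (U^T U)^{-1/2}; the inverse square root is computed via the
    # eigendecomposition of the symmetric matrix U^T U (full column rank assumed)
    w, V = np.linalg.eigh(U.T @ U)
    return U @ (V @ np.diag(1.0 / np.sqrt(w)) @ V.T)

U = np.random.randn(30, 3)
R = to_semi_orthogonal(U)
print(np.allclose(R.T @ R, np.eye(3)))            # constraint holds for any full-rank U
print(np.allclose(R, to_semi_orthogonal(5 * U)))  # and R is indeed independent of ||U||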

The (sub)gradient of the Manhattan norm of $X$ is given by $$\eqalign{ \mu &= \|X\|_1 \\ d\mu &= {\rm sign}(X):dX \;\;\doteq\;\; S:dX \\ }$$ where the colon denotes the trace/Frobenius product, i.e. $$A:B = {\rm Tr}(A^TB)$$ and the sign function $${\rm sign}(z) = \begin{cases} +{\tt1}\quad{\rm if}\;z\ge 0 \\ -{\tt1}\quad{\rm otherwise} \\ \end{cases}$$ is applied elementwise.
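
In NumPy, the Frobenius product and this elementwise sign are simply (illustrative):

import numpy as np

X = np.random.randn(4, 3)
S = np.where(X >= 0, 1.0, -1.0)          # elementwise sign, with sign(0) = +1 as above
frob = lambda A, B: np.trace(A.T @ B)    # A : B  (equals np.sum(A * B))
print(np.isclose(frob(S, X), np.abs(X).sum()))  # S : X = ||X||_1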

Now calculate the differential of $R$ $$\eqalign{ B &= (U^TU)^{1/2} = B^T \qquad&\big\{{\rm square\,root}\big\} \\ B^2 &= U^TU \\ B\,dB+dB\,B &= U^TdU+dU^TU \qquad&\big\{{\rm differentials}\big\} \\ (I_n\otimes B+B\otimes I_n)\,db &= \Big(I_n\otimes U^T+(U^T\otimes I_n)K\Big)\,du \qquad&\big\{{\rm vectorized}\big\} \\ db &= P\,du \\ \\ R &= UB^{-1} \\ dR &= dU\,B^{-1} - UB^{-1}dB\,B^{-1} \\ &= dU\,B^{-1} - R\,dB\,B^{-1} \\ dr &= (B^{-1}\otimes I_m)\,du - (B^{-1}\otimes R)\,db \\ &= \Big((B^{-1}\otimes I_m) - (B^{-1}\otimes R)P\Big)\,du \\ &= Q\,du \\ }$$ where $K$ is the Commutation Matrix associated with the vectorization of a matrix transpose via the Kronecker product.
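
A hedged NumPy sanity check of this differential: the helpers commutation_matrix and sqrtm_sym below are my own, vec is column-major to match the Kronecker identities, and $Q\,du$ is compared against a finite difference of $R$.

import numpy as np

vec = lambda A: A.reshape(-1, order='F')   # column-major vec, so that vec(AXB) = (B^T kron A) vec(X)

def commutation_matrix(m, n):
    # K such that K @ vec(A) = vec(A^T) for A of shape (m, n)
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

def sqrtm_sym(M):
    # symmetric square root via the eigendecomposition
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(w)) @ V.T

def R_of(U):
    return U @ np.linalg.inv(sqrtm_sym(U.T @ U))

np.random.seed(0)
m, n = 5, 3
U = np.random.randn(m, n)
dU = 1e-6 * np.random.randn(m, n)

B = sqrtm_sym(U.T @ U)
Binv = np.linalg.inv(B)
R = U @ Binv
K = commutation_matrix(m, n)
In, Im = np.eye(n), np.eye(m)

P = np.linalg.solve(np.kron(In, B) + np.kron(B, In),
                    np.kron(In, U.T) + np.kron(U.T, In) @ K)
Q = np.kron(Binv, Im) - np.kron(Binv, R) @ P

# finite-difference check of dr = Q du
print(np.allclose(vec(R_of(U + dU) - R_of(U)), Q @ vec(dU), atol=1e-8))  # True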

Finally, let $\,X=(R-M)\,$ and calculate the gradient of the objective function $$\eqalign{ d\mu &= S:dX \\ &= S:dR \\ &= s:dr &\qquad\big\{{\rm vectorized}\big\} \\ &= s:Q\,du \\ &= Q^Ts:du \\ \frac{\partial\mu}{\partial u} &= Q^Ts &\qquad\big\{{\rm gradient}\big\} \\ \\ }$$ Now use a sub-gradient method initialized with the Frobenius solution $$\eqalign{ U_0 &= M \\ }$$ and iterated as
$$\eqalign{ B_k &= (U_k^TU_k)^{1/2} \\ R_k &= U_kB_k^{-1} \qquad\qquad\qquad\big\{{\rm current/best\,solution}\big\} \\ P_k &= (I_n\otimes B_k+B_k\otimes I_n)^{-1}\Big(I_n\otimes U_k^T+(U_k^T\otimes I_n)K\Big) \\ Q_k &= \Big((B_k^{-1}\otimes I_m) - (B_k^{-1}\otimes R_k)P_k\Big) \\ g_k &= Q_k^T\,{\rm vec}\Big({\rm sign}(R_k-M)\Big) \\ w_k &= \operatorname{vec}(U_k) - \lambda_kg_k \qquad\;\big\{{\rm subgradient\,step}\big\} \\ u_{k+1} &= \frac{w_k}{\|w_k\|_2} \qquad\qquad\qquad\big\{{\rm normalize}\big\} \\ U_{k+1} &= {\rm reshape}\big(\,u_{k+1},\, {\rm shape}(U_k)\,\big) \\ }$$
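
One possible NumPy realization of this iteration (it reuses commutation_matrix and sqrtm_sym from the sketch above; the function name, the diminishing step sizes $\lambda_k = \lambda_0/\sqrt{k+1}$, and the bookkeeping of the best iterate are my own illustrative choices):

def l1_semi_orthogonal(M, iters=500, lam0=1e-2):
    m, n = M.shape
    K = commutation_matrix(m, n)              # from the previous sketch
    In, Im = np.eye(n), np.eye(m)
    vec = lambda A: A.reshape(-1, order='F')

    U = M.copy()                              # U_0 = M, so R_0 is the Frobenius solution
    best_R, best_val = None, np.inf

    for k in range(iters):
        B = sqrtm_sym(U.T @ U)                # from the previous sketch
        Binv = np.linalg.inv(B)
        R = U @ Binv                          # current feasible point

        val = np.abs(R - M).sum()             # ||R - M||_1
        if val < best_val:
            best_R, best_val = R, val         # keep the best iterate seen so far

        P = np.linalg.solve(np.kron(In, B) + np.kron(B, In),
                            np.kron(In, U.T) + np.kron(U.T, In) @ K)
        Q = np.kron(Binv, Im) - np.kron(Binv, R) @ P
        S = np.where(R - M >= 0, 1.0, -1.0)   # sign(R_k - M)
        g = Q.T @ vec(S)                      # subgradient w.r.t. vec(U)

        w = vec(U) - (lam0 / np.sqrt(k + 1)) * g              # subgradient step
        U = (w / np.linalg.norm(w)).reshape(m, n, order='F')  # normalize and reshape

    return best_R, best_val

M = np.random.rand(30, 3)
R_best, val = l1_semi_orthogonal(M)
print(val, np.allclose(R_best.T @ R_best, np.eye(3)))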

greg
  • 35,825