
I came across this problem in control systems and I would like to know how minimizing the norm is converted to a linear programming problem. The optimization problem seeks to minimize the taxicab ($\ell_1$) norm, stated as follows:

$$ \min_{\boldsymbol{u}} \| J\boldsymbol{u}+\alpha\boldsymbol{e}\|_1 $$ where $J \in \mathbb{R}^{m\times n},\boldsymbol{u}\in \mathbb{R}^{n},\boldsymbol{e}\in \mathbb{R}^{m}$. Clearly the optimization problem is not linear due to the absolute values in the norm. In order to solve it with the simplex method, we need to convert it to a linear optimization problem. In the paper I'm reading, they suggest the following conversion, but I don't understand how they arrive at this form.

$$ \begin{aligned} \min_{\boldsymbol{u},\boldsymbol{y}} \quad & \boldsymbol{1}^T\left( 2\boldsymbol{y} - ( J\boldsymbol{u}+\alpha\boldsymbol{e}) \right)\\ \textrm{s.t.} \quad & J\boldsymbol{u}-\boldsymbol{y} \leq -\alpha\boldsymbol{e}\\ &\boldsymbol{y}\geq0 \\ \end{aligned} $$

I came across the question Linear programming: minimizing ... , which suggests two inequality constraints, but I couldn't relate it to my problem. Any suggestions?

RobPratt
  • 45,619
CroCo
  • 1,228

4 Answers

1

Since $\alpha,\boldsymbol{e}$ are both constant, let $c := \alpha\boldsymbol{e}$.

Let $J_i$ be the $i$-th row of the matrix $J$, i.e. $J:=\pmatrix{J_1\\\vdots\\ J_m}$.

Then it follows from the definition of matrix multiplication that $$ Ju = \pmatrix{J_1 u \\\vdots\\ J_m u} $$ Therefore $$ \min_{u} \| Ju+c\|_1 = \min_{u} \sum_{i=1}^m |J_i u+c_i| $$ Now, we can use your linked question:
It tells us that we take every summand $|J_i u+c_i|$, replace it with a new variable $x_i$ in the objective function and add the two constraints $x_i\ge (J_i u+c_i)$ and $x_i \ge -(J_i u+c_i)$.

Now our minimization problem is $$ \min_{u,x_1,...,x_m} \sum_{i=1}^m x_i $$ such that $$\forall i\in \{1,...,m\}:\quad x_i\ge (J_i u+c_i) ,x_i \ge -(J_i u+c_i)$$ From here on it's reverse fitting:

Defining $x:= \pmatrix{x_1\\\vdots\\x_m}$ lets us reformulate to:

$$ \min_{u,x} \mathbb{1}^T x $$ such that $$x\ge J u+c ,x \ge -J u-c$$

Now we make the substitution $ x\leftarrow 2y-Ju-c$ and obtain: $$ \min_{u,y} \mathbb{1}^T (2y-(Ju+c)) $$ such that $$y\ge Ju+c ,\quad y \ge 0,$$ which is the claimed formula. (The first constraint $x\ge Ju+c$ becomes $2y-Ju-c\ge Ju+c$, i.e. $y\ge Ju+c$; the second, $x\ge -Ju-c$, becomes $2y-Ju-c\ge -Ju-c$, i.e. $y\ge 0$. With $c=\alpha e$, the constraint $y\ge Ju+c$ is exactly $Ju-y\le-\alpha e$.)
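As a quick numerical sanity check (not part of the derivation above), the final LP can be solved with `scipy.optimize.linprog`; the data `J`, `c` and the problem sizes below are made up for illustration:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 5, 3                          # illustrative sizes
J = rng.standard_normal((m, n))
c = rng.standard_normal(m)           # plays the role of alpha*e

# Decision vector z = [u; y], u free, y >= 0.
# Objective 1^T(2y - Ju - c) = -(1^T J) u + 2*(1^T y) - 1^T c;
# the constant -1^T c is dropped for the solver and restored below.
cost = np.concatenate([-J.sum(axis=0), 2 * np.ones(m)])
A_ub = np.hstack([J, -np.eye(m)])    # encodes Ju - y <= -c, i.e. y >= Ju + c
b_ub = -c
bounds = [(None, None)] * n + [(0, None)] * m
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
lp_value = res.fun - c.sum()         # restore the constant -1^T c

u_opt = res.x[:n]
print(lp_value, np.linalg.norm(J @ u_opt + c, 1))  # should coincide
```

At the optimum the LP value equals $\|Ju+c\|_1$ at the returned $u$, which is what the reformulation promises.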


Regarding the comment:

Substitutions like this are a general technique for systems of equations. There are lots of small tricks involved, depending on what you want to do; unfortunately, I know of no source that gathers them.

I'll try for a somewhat general explanation:

First, rewrite the optimization problem using set notation: $$ \min\left\{ 1^T x \mid u\in\mathbb{R}^n \land x\in \mathbb{R}^m\land x\ge Ju+c \land x \ge -Ju-c \right\} $$ In other words, we are minimizing over a set of objective values, and the minimum picks out a specific element of this set. Let's call the set $M_0$.

Now, a substitution is an action altering the set. In general, this happens by adding, removing and/or altering the constraints on the right side of the set (i.e. after the $\mid$). You can do anything here (though if you add variables, you should also add a constraint stating which set they come from). But for your action to be useful, you generally want to be able to prove afterwards that the resulting set, let's call it $M_1$, has one of the following three relations to $M_0$:

a) $M_0\subseteq M_1$
b) $M_0=M_1$
c) $M_0\supseteq M_1$

Then a) implies $\min M_1 \le \min M_0$,
b) implies $\min M_1 = \min M_0$,
and c) implies $\min M_1 \ge \min M_0$.

Armed with this knowledge, we can now apply it to our substitution: We do this by adding the new variable $y\in\mathbb{R}^m$ and the equation $x=2y-Ju-c$ to $\left\{ 1^T x \mid u\in\mathbb{R}^n \land x\in \mathbb{R}^m\land x\ge Ju+c \land x \ge -Ju-c \right\}$, obtaining $$ \min\left\{ 1^T x \mid u\in\mathbb{R}^n \land x\in \mathbb{R}^m\land y\in\mathbb{R}^m\land x\ge Ju+c \land x \ge -Ju-c \land x=2y-Ju-c\right\} $$

Now linear algebra tells you that for every $x$ (and $u$), there is a unique $y$ solving $x=2y-Ju-c$. Therefore we have $$ \left\{ 1^T x \mid u\in\mathbb{R}^n \land x\in \mathbb{R}^m\land x\ge Ju+c \land x \ge -Ju-c \right\}\\=\\ \left\{ 1^T x \mid u\in\mathbb{R}^n \land x\in \mathbb{R}^m\land x\ge Ju+c \land x \ge -Ju-c \land x=2y-Ju-c\right\}, $$ because any $x$ which satisfies the set restriction of the LHS still does in the RHS if we choose $y = (x-Ju-c)/2$, and in turn, for a pair $(x,y)$ to satisfy the set restriction of the RHS, it must hold that $y = (x-Ju-c)/2$.

When working with sets, any equation in the set restriction can be substituted into any other equation in the set restriction or the LHS of the set.

Doing that, we obtain: $$ \left\{ 1^T x \mid u\in\mathbb{R}^n \land x\in \mathbb{R}^m\land x\ge Ju+c \land x \ge -Ju-c \land x=2y-Ju-c\right\} \\=\\ \left\{ 1^T (2y-Ju-c) \mid u\in\mathbb{R}^n \land x\in \mathbb{R}^m\land 2y-Ju-c\ge Ju+c \land 2y-Ju-c \ge -Ju-c \land x=2y-Ju-c\right\} $$

and all that's left to do is simplify each part on its own.

Sudix
  • 3,630
  • Thank you for the answer. I think I'm still struggling with the substitution part. Is it valid to modify the objective function that way? Also, could you please elaborate on the substitution part? – CroCo Dec 13 '22 at 12:32
0

I hope this might help.

What they are doing is introducing an additional (auxiliary) variable $y$ which upper-bounds $Ju + \alpha e$ (and its negation), meaning that they would like to do something like:

\begin{align} \min_{u,y} \quad & 1^\top y \\ \textrm{s.t.} \quad &y \geq Ju+\alpha e \\ &-y \leq Ju+\alpha e \\ & y \geq 0 \end{align}

I don't know why they have $2y - (Ju+\alpha e)$ in the cost, but my guess is that at the end of the minimization the bound is tight, so $2y - (Ju+\alpha e)$ reduces to $|Ju+\alpha e|$ entrywise, meaning that you are actually minimizing the quantity you would like, and the bound is tight.
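As a sketch (with made-up data, not from the paper), this two-inequality formulation can be checked numerically with `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n = 4, 2                          # illustrative sizes
J = rng.standard_normal((m, n))
c = rng.standard_normal(m)           # plays the role of alpha*e

# z = [u; y]: minimize 1^T y subject to y >= Ju + c and y >= -(Ju + c).
cost = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.vstack([np.hstack([J, -np.eye(m)]),     #  Ju - y <= -c
                  np.hstack([-J, -np.eye(m)])])   # -Ju - y <=  c
b_ub = np.concatenate([-c, c])
res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * (n + m))
u, y = res.x[:n], res.x[n:]
# At the optimum the bound is tight, so 1^T y equals ||Ju + c||_1.
print(res.fun, np.linalg.norm(J @ u + c, 1))
```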

yes
  • 878
  • Thank you for the attempt. This is what I was thinking by imposing two inequality constraints but I am still not able to understand why the objective function is written that way and why one constraint is deleted. Also, I don't understand why we need $\boldsymbol{1}^T$. – CroCo Dec 13 '22 at 09:54
  • The $1^\top$ is just to sum all the entries of the vector – yes Dec 13 '22 at 10:04
  • so this means each element in the vector $|\cdot|_1$ needs to be replaced with an auxiliary variable having two inequality constraints, right? – CroCo Dec 13 '22 at 10:06
  • Yes, because you have one constraint for the absolute value of each element of the vector, boiling down to two inequalities. – yes Dec 13 '22 at 10:31
0

Intuitively this model arises from considering $|x|$ as a sum of the positive and negative parts of $x$:

$$\mathrm{minimize}\ |x|$$

is the same as

$$\begin{array}{rl}\mathrm{minimize}& x_++x_-\\ \textrm{subject to} & x_+,x_-\geq 0 \\ & x_+-x_-=x\end{array}$$

Now call $y=x_+$, substitute out $x_-$, and you get your formulation.

(This was for one variable; of course you have to do this coordinatewise and add it all up, hence the $1^T$ etc.)

There are also other models of the same, most typically one uses

$$\begin{array}{rl}\mathrm{minimize}& t\\ \textrm{subject to} & t\geq x \\ & t\geq -x\end{array}$$

which differs from yours just by some linear substitution.
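A small sketch of the positive/negative-part model with `scipy.optimize.linprog` (data and sizes are made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n = 4, 2                          # illustrative sizes
J = rng.standard_normal((m, n))
c = rng.standard_normal(m)           # plays the role of alpha*e

# z = [u; x_plus; x_minus]: minimize 1^T(x_plus + x_minus)
# subject to x_plus - x_minus = Ju + c and x_plus, x_minus >= 0.
cost = np.concatenate([np.zeros(n), np.ones(2 * m)])
A_eq = np.hstack([-J, np.eye(m), -np.eye(m)])   # x_plus - x_minus - Ju = c
b_eq = c
bounds = [(None, None)] * n + [(0, None)] * (2 * m)
res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
u = res.x[:n]
print(res.fun, np.linalg.norm(J @ u + c, 1))  # equal at the optimum
```

At the optimum at most one of $x_+, x_-$ is nonzero per coordinate, so the objective equals $\|Ju+c\|_1$.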

  • Thank you for the answer. Could you please elaborate on the linear substitution. – CroCo Dec 13 '22 at 12:28
$t=2y-x$ is the only choice. But that's a posteriori, to make the model fit your model. In practice there is no reason for it; any of the 3 models you now have is just as good. – Michal Adamaszek Dec 13 '22 at 12:45
  • From the two inequalities, how did you form $t=2y-x$? – CroCo Dec 13 '22 at 12:48
  • I formed it because it looks like you very strongly want to squeeze your problem into having objective of the form $2y-x$, although it is just an unimportant detail. Maybe I should not have written this second formulation. Just focus on the first one. Sorry. – Michal Adamaszek Dec 13 '22 at 12:53
0

Note that if $$ Y=\left\{\left.\mathbf{y}\in\mathbb{R}^m\,\right|\,J\mathbf{u}-\mathbf{y}\le-\alpha\mathbf{e},\,\mathbf{y}\ge0\right\}\ , $$ then $$ \min_{\mathbf{y}\in Y}\,\mathbf{1}^T(2\mathbf{y}-(J\mathbf{u}+\alpha\mathbf{e}))=\|J\mathbf{u}+\alpha\mathbf{e}\|_1\ , $$ the minimum being achieved for $$ y_i=\max\big(0,\,\mathbf{e}_i^T(J\mathbf{u}+\alpha\mathbf{e})\big)\ , $$ where $\ \mathbf{e}_i\ $ is the unit column vector whose $\ i^\text{th}\ $ entry is $\ 1\ $, and all of whose other entries are $\ 0\ $. Thus $$ \min_{\mathbf{u}\in\mathbb{R}^n,\,\mathbf{y}\in Y}\,\mathbf{1}^T(2\mathbf{y}-(J\mathbf{u}+\alpha\mathbf{e}))=\min_{\mathbf{u}\in\mathbb{R}^n}\|J\mathbf{u}+\alpha\mathbf{e}\|_1\ . $$
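The claimed minimizer $y_i=\max\big(0,\,\mathbf{e}_i^T(J\mathbf{u}+\alpha\mathbf{e})\big)$ is easy to check numerically for a fixed $u$ (data below is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 6, 3                          # illustrative sizes
J = rng.standard_normal((m, n))
alpha = 0.7
e = rng.standard_normal(m)
u = rng.standard_normal(n)           # any fixed u

a = J @ u + alpha * e
y = np.maximum(0.0, a)               # the claimed minimizer over Y
value = np.ones(m) @ (2 * y - a)     # 1^T(2y - (Ju + alpha*e))
# 2*max(0, a_i) - a_i = |a_i| entrywise, so value equals the 1-norm:
print(value, np.linalg.norm(a, 1))
```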

lonza leggiera
  • 28,646