
I want to minimize the following cost function

$$J = \min_{\boldsymbol{\rm x}} \left\{\|A\boldsymbol{\rm x}-b\|_2^2 + \lambda\|\boldsymbol{\rm x}\|_2\right\}$$

It differs slightly from Tikhonov regularization in that the penalty term is not squared. Unlike Tikhonov regularization, which has an analytic solution, I was not able to find such a solution for $J$. Could someone give me a hint about which method I can use to solve for $\boldsymbol{\rm x}_\lambda$?

Royi
  • Any quasi-Newton / trust region method, for instance. You might lose some theoretical convergence speed because $\left\|\cdot\right\|_2$ is continuous but not smooth. (A more explicit hint for the quasi-Newton case: CG/BFGS/limited-memory BFGS.) – Max Feb 09 '18 at 22:55
  • @Max: I used CVX (http://cvxr.com/), which uses SeDuMi, to solve it, but unfortunately I have not figured out the details of the solver method. CVX gives me solutions that make sense. However, I am trying to write my own solver. By any chance, do you happen to use CVX/SeDuMi? – shashashamti2008 Feb 09 '18 at 23:08
  • Since CVX seems to be a MATLAB library, I guess you use MATLAB. Do you have the Optimization Toolbox? If yes, fminunc( @(x) sum((A*x-b).^2) + lambda*norm(x), x0 ) might already do the job: https://de.mathworks.com/help/optim/ug/fminunc.html Having said this, I would always recommend fmincon over fminunc, since it has more solvers implemented (such as L-BFGS) and you can simply omit any constraints: https://de.mathworks.com/help/optim/ug/fmincon.html And of course it is always a good idea to supply a gradient function so that finite differences are not necessary. – Max Feb 09 '18 at 23:20
  • Thanks Max. I do have MATLAB but it does not have the optimization toolbox. – shashashamti2008 Feb 09 '18 at 23:25
  • If the example code on http://cvxr.com/cvx/ works, all you need to do is omit the constraints and replace the objective function by sum((A*x-b).^2) + lambda*norm(x). – Max Feb 10 '18 at 09:28
  • I successfully used CVX and got excellent results. I want to know how SeDuMi minimizes $J$ and by which method as I want to write my own solver. – shashashamti2008 Feb 10 '18 at 09:30
  • The method seems to be from "An $O(\sqrt{n}L)$-iteration homogeneous and self-dual linear programming algorithm" (no free PDF on Google Scholar, but sci-hub.tv works) – Max Feb 10 '18 at 21:08

1 Answer


You can solve this problem easily using the proximal gradient method, or an accelerated proximal gradient method such as FISTA. These are powerful tools that anyone can learn to use. The proximal gradient method solves optimization problems of the form
$$ \tag{$\spadesuit$} \text{minimize} \quad f(x) + g(x), $$
where $f$ and $g$ are convex, $f$ is differentiable (with a Lipschitz continuous gradient), and $g$ is "simple" in the sense that its proximal operator can be evaluated efficiently. The proximal operator of $g$ is defined as follows:
$$ \text{prox}_{tg}(\hat x) = \arg \min_x \quad g(x) + \frac{1}{2t} \| x - \hat x \|_2^2. $$
The proximal operator of $g$ reduces the value of $g$ without straying too far from $\hat x$. The parameter $t > 0$ can be thought of as a "step size" that controls how far from $\hat x$ we are allowed to move. Note that $g$ is not required to be differentiable. (It is, however, required to be lower semicontinuous, a mild condition that is usually satisfied in practice.)
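
To make the definition concrete, here is a small NumPy check of a proximal operator (my own illustration, not part of the answer): the classic example $g(x) = \lambda \|x\|_1$ has a closed-form prox, the soft-thresholding operator.

```python
import numpy as np

def prox_l1(x_hat, t, lam):
    # Prox of g(x) = lam * ||x||_1 with step t: soft thresholding,
    # a standard example of a prox that can be evaluated in closed form.
    return np.sign(x_hat) * np.maximum(np.abs(x_hat) - t * lam, 0.0)

# Sanity check against the definition: prox_{tg}(x_hat) minimizes
# g(x) + ||x - x_hat||^2 / (2t), so no nearby point should do better.
x_hat = np.array([3.0, -0.5, 1.2])
t, lam = 0.5, 1.0
p = prox_l1(x_hat, t, lam)

def obj(x):
    return lam * np.abs(x).sum() + np.sum((x - x_hat) ** 2) / (2 * t)

rng = np.random.default_rng(0)
assert all(obj(p) <= obj(p + 0.01 * rng.standard_normal(3)) for _ in range(100))
```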

The proximal gradient method iteration is $$ x^{k+1} = \text{prox}_{tg}(x^k - t \nabla f(x^k)). $$ What is happening here is that, starting from the current location $x^k$, we take a gradient descent step to reduce the value of $f$, then we apply the prox-operator of $g$ to reduce the value of $g$. The step size $t > 0$ is required to satisfy $t \leq 1/L$, where $L$ is a Lipschitz constant for $\nabla f$. (There is also a line search version of the proximal gradient method that might converge faster. In the line search version the step size $t$ is chosen adaptively at each iteration.)
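
The iteration above is only a few lines in code. Here is a generic NumPy sketch (my own; the names `grad_f` and `prox_g` are placeholders for user-supplied functions), assuming a fixed step $t \leq 1/L$:

```python
import numpy as np

def proximal_gradient(grad_f, prox_g, x0, t, n_iter=500):
    # Iterates x^{k+1} = prox_{tg}(x^k - t * grad f(x^k)) with fixed step t.
    # prox_g(v, t) should evaluate prox_{tg}(v).
    x = x0
    for _ in range(n_iter):
        x = prox_g(x - t * grad_f(x), t)
    return x

# Tiny smoke test: with g = 0 (whose prox is the identity), the method
# reduces to gradient descent on f(x) = ||x - c||_2^2 and converges to c.
c = np.array([1.0, -2.0])
x_star = proximal_gradient(lambda x: 2.0 * (x - c), lambda v, t: v,
                           np.zeros(2), t=0.4, n_iter=200)
assert np.allclose(x_star, c)
```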

Your optimization problem has the form $f(x) + g(x)$ with $$ f(x) = \| Ax - b \|_2^2 $$ and $$ g(x) = \lambda \|x\|_2. $$ The gradient of $f$ can be evaluated using calculus: $$ \nabla f(x) = 2 A^T(Ax - b). $$ There is a nice formula for the proximal operator of the $\ell_2$-norm: $$ \text{prox}_{tg}(x) = \text{prox}_{t \lambda \| \cdot \|_2}(x) = x - P_B(x), $$ where $B$ is the $2$-norm ball of radius $t\lambda$ centered at the origin and $P_B(x)$ is the projection of $x$ onto $B$. (A formula for the proximal operator of a norm can be found, for example, in Vandenberghe's UCLA 236C notes; see chapter 8, "The proximal mapping", slide 8-3.)
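
In code, the gradient and the $\ell_2$-norm prox from the formulas above might look like this (a NumPy sketch with names of my own choosing; the closed form of $x - P_B(x)$ works out to a "block soft thresholding" that shrinks the whole vector toward the origin):

```python
import numpy as np

def grad_f(x, A, b):
    # Gradient of f(x) = ||Ax - b||_2^2.
    return 2.0 * A.T @ (A @ x - b)

def prox_l2norm(x_hat, t, lam):
    # Prox of g(x) = lam * ||x||_2: subtract the projection of x_hat onto
    # the 2-norm ball B of radius t*lam. If x_hat lies inside B, the
    # projection is x_hat itself and the prox is exactly zero.
    nrm = np.linalg.norm(x_hat)
    if nrm <= t * lam:
        return np.zeros_like(x_hat)
    return (1.0 - t * lam / nrm) * x_hat
```

Note that, unlike the componentwise $\ell_1$ soft threshold, this prox either zeros the entire vector or rescales it, which is why the (non-squared) $\ell_2$ penalty does not promote sparsity of individual entries.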

With these formulas, you are ready to implement the proximal gradient method to solve your optimization problem using just a few lines of code. There is also an accelerated version of the proximal gradient method known as FISTA which is equally simple and likely converges faster.
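
Putting the pieces together, a minimal FISTA sketch for this specific problem might look as follows (my own code, not from the answer; the fixed step $t = 1/L$ uses $L = 2\|A\|_2^2$, a Lipschitz constant for $\nabla f$):

```python
import numpy as np

def prox_l2(v, r):
    # Prox of r * ||.||_2: maps the ball of radius r to 0, otherwise
    # shrinks v toward the origin by r in norm.
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm <= r else (1.0 - r / nrm) * v

def fista(A, b, lam, n_iter=5000):
    # FISTA for: minimize ||Ax - b||_2^2 + lam * ||x||_2.
    t = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)  # t = 1/L, L = 2*||A||_2^2
    x = np.zeros(A.shape[1])
    y, s = x.copy(), 1.0
    for _ in range(n_iter):
        # Gradient step on f at the extrapolated point y, then prox of g.
        x_next = prox_l2(y - 2.0 * t * (A.T @ (A @ y - b)), t * lam)
        # Standard FISTA momentum update.
        s_next = (1.0 + np.sqrt(1.0 + 4.0 * s * s)) / 2.0
        y = x_next + ((s - 1.0) / s_next) * (x_next - x)
        x, s = x_next, s_next
    return x
```

When the computed solution $x^\star$ is nonzero, you can sanity-check it via the optimality condition $2A^T(Ax^\star - b) + \lambda x^\star / \|x^\star\|_2 \approx 0$.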

littleO
  • Nicely explained. Thank you for that. I did not know of this method. I am going to learn more about it and implement it in MATLAB. – shashashamti2008 Feb 10 '18 at 10:09
  • The calculation of the prox is given here: https://math.stackexchange.com/questions/1681658. – Royi Mar 16 '18 at 16:02