
Say I have a problem given by \begin{align} \min_{x\in\mathbb{R}^n} & \ ||g(x)||_1, \\ \text{s.t. } &z_i(x)+||c^{(i)}(x)||_1 \leq d_i, \ i\in\{1,...,N\}, \end{align} where $g:\mathbb{R}^n\to\mathbb{R}^m$, $c^{(i)}:\mathbb{R}^n\to\mathbb{R}^q$, $z:\mathbb{R}^n\to \mathbb{R}^N$ are (for the sake of the question) infinitely differentiable (but not necessarily convex), and $d\in\mathbb{R}^N$, $N\in\mathbb{Z}_{\geq 1}$ are fixed. As given, this problem is non-smooth (due to the 1-norms).

I know that this problem can be reformulated into a smooth, non-convex optimization problem by introducing auxiliary variables (as explained, e.g., here). However, if $m$ or $q$ gets large, the number of auxiliary variables I need to introduce becomes quite large, not to mention the number of additional constraints. Say I want to solve the reformulated problem using MATLAB's fmincon with the sqp algorithm. (I know that the interior-point algorithm is more large-scale capable, but sqp performs better on my problems, especially since I have a good, possibly infeasible, initial guess for $x$.) Then the growing number of auxiliary variables and constraints becomes a problem, especially since I want the problem solved reasonably quickly (on the order of seconds). Assume that I can compute all gradients of the reformulated, smooth optimization problem analytically.
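
For concreteness, one standard variant of that reformulation (the link above may use a slightly different but equivalent construction) introduces $t\in\mathbb{R}^m$ and $s^{(i)}\in\mathbb{R}^q$, $i\in\{1,...,N\}$, and solves \begin{align} \min_{x,t,s^{(1)},...,s^{(N)}} & \ \sum_{j=1}^{m} t_j, \\ \text{s.t. } & -t_j \leq g_j(x) \leq t_j, \ j\in\{1,...,m\}, \\ & z_i(x) + \sum_{k=1}^{q} s^{(i)}_k \leq d_i, \ i\in\{1,...,N\}, \\ & -s^{(i)}_k \leq c^{(i)}_k(x) \leq s^{(i)}_k, \ k\in\{1,...,q\}, \ i\in\{1,...,N\}, \end{align} i.e., $m+Nq$ extra variables and $2m+2Nq$ extra (now smooth) inequality constraints; this is exactly what blows up when $m$, $q$, or $N$ grows.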

I can think of the following ways to circumvent this problem:

  1. Simply compute subgradients (instead of gradients) and keep using fmincon's sqp as if the problem were smooth. In that case there is no need for auxiliary variables (or auxiliary constraints), but this is probably not the best idea, as convergence to a stationary point is then no longer guaranteed.
  2. Use a subgradient optimization solver. However, I did not find an obvious choice for my problem setup (ideally one already implemented in MATLAB, as I do not consider myself an optimization expert and would like to avoid implementing a solver myself). Again, assume that I can compute all necessary subgradients analytically.
  3. Replace the absolute values by a smooth approximation, e.g. the pseudo-Huber loss function (see the note after this list and the sketch further below). In that case, however, one has to choose an additional hyperparameter, and it is not entirely clear to me how it should be chosen.
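
For reference, by the pseudo-Huber loss I mean $\delta^2(\sqrt{1+(r/\delta)^2}-1)$, or equivalently, after scaling by $1/\delta$, $\phi_\delta(r)=\sqrt{r^2+\delta^2}-\delta$ as a smooth stand-in for $|r|$. Since $0\leq |r|-\phi_\delta(r)\leq\delta$ for all $r$, the hyperparameter $\delta$ directly bounds the approximation error, so one rough rule would be to pick $\delta$ as a fraction of the objective/constraint tolerance one is willing to accept.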

With 1. not really being an option, I was wondering whether anyone knows a good subgradient solver, or whether 3. might make more sense? Any other input or suggestions are also appreciated!
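
To make 3. concrete, here is a minimal, self-contained MATLAB sketch of how the smoothed problem could be handed to fmincon's sqp. The functions g, c, z, the value of delta, and the starting point below are toy placeholders (not my actual problem), and the analytic gradients are omitted for brevity:

```matlab
% Sketch of option 3: pseudo-Huber smoothing, solved with fmincon/sqp.
% Toy dimensions: n = 2, m = 2, N = 1, q = 1.
delta = 1e-3;                               % smoothing parameter (placeholder choice)
phi   = @(r) sqrt(r.^2 + delta^2) - delta;  % smooth stand-in for |r|, error <= delta

g = @(x) [x(1)^2 - 1; x(1)*x(2)];           % toy g : R^2 -> R^2
c = @(x) x(1) + x(2) - 1;                   % toy c^(1) : R^2 -> R^1
z = @(x) x(1)^2;                            % toy z_1 : R^2 -> R
d = 2;

fobj    = @(x) sum(phi(g(x)));              % smoothed ||g(x)||_1
nonlcon = @(x) deal(z(x) + sum(phi(c(x))) - d, []);  % smoothed inequality, no equalities

% In practice one would also set SpecifyObjectiveGradient/SpecifyConstraintGradient
% and supply the analytic derivatives of the smoothed functions.
opts = optimoptions('fmincon', 'Algorithm', 'sqp');
x0   = [0.5; 0.5];                          % good (possibly infeasible) initial guess
xsol = fmincon(fobj, x0, [], [], [], [], [], [], nonlcon, opts);
```

Note that since $\phi_\delta(r)\leq|r|$, the smoothed constraints are slight relaxations of the original ones: the returned $x$ may violate constraint $i$ by up to $q\delta$, which again ties the choice of $\delta$ to the constraint violation I can accept.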

VGD
  • You are constraining all q elements of c(x) to be zero? I.e., q (presumably nonlinear) equality constraints (right-hand side equal to zero). Is that your intention? If so, it would be better to formulate the problem that way. – Mark L. Stone Apr 25 '22 at 17:58
  • Not entirely sure what you mean: I have inequality constraints, and $||c(x)||_1 = \sum_{i=1}^{q} |c_i(x)|$ (as per the definition of the 1-norm). – VGD Apr 26 '22 at 09:36
  • All norms are $\ge 0$. So because your norm is $\le 0$, it must equal zero. Which means all elements must equal zero. Is that really what you want? – Mark L. Stone Apr 26 '22 at 11:13
  • You are of course right. Guess I was a little hasty with removing unnecessary stuff from the actual problem statement... Hopefully it makes sense now, and thanks for the help btw. – VGD Apr 26 '22 at 11:43

0 Answers