
In machine learning optimization problems, a "regularization" term is often added to the objective to reduce overfitting:

$$\min_{\theta} \; L(\theta) + \lambda \, \|\theta\|_2^2$$

I have noticed that in the case of L2-norm regularization, this term is a convex function, since the squared L2 norm $\|\theta\|_2^2$ is quadratic in $\theta$.

My question: in the L2-norm case, the optimization problem without this regularization term is typically non-convex, and we then add a convex term to it. Does adding a convex term to a non-convex optimization problem automatically make the problem convex?

I do not think that this is the case, seeing as:

  • Convex Optimization Problems are generally easier to solve than Non-Convex Optimization Problems

  • Anecdotally, I have heard of Regularized Loss Functions (e.g. for Neural Networks) that are considered to be "very difficult" optimization problems - even though they have this Convex Term. This informally leads me to believe that in the case of L2 Regularization, the fundamental optimization problem remains Non-Convex.

However, "anecdotal and informal logic" is generally never acceptable in understanding mathematics.

Can someone please comment on this?

Thanks!

stats_noob
  • Certainly not automatically; if lambda is small enough then your equation will be indistinguishable from the original. – Steven Stadnicki Mar 13 '22 at 18:03
  • Non-convex plus convex is non-convex. One of the main issues with the optimization problems for Neural Networks is the large amount of data and the large number of variables/constraints. But it is also true that the costs may be pretty ugly. – KBS Mar 13 '22 at 18:24
  • Thank you everyone for your replies! Much Appreciated! – stats_noob Mar 13 '22 at 19:21
  • @KBS That is a good rule of thumb but is not always true. – RobPratt Mar 13 '22 at 19:48
  • @RobPratt Yes, you are right. – KBS Mar 13 '22 at 20:22

1 Answer


Yes, adding a large enough convex term can make a problem convex. For example, consider the nonconvex function $-x^2$ and the convex function $x^2$. For constant $\lambda \ge 1$, the sum $-x^2 + \lambda x^2=(\lambda-1)x^2$ is convex.
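To make the dependence on $\lambda$ concrete, here is a minimal sketch (not part of the original answer) that checks convexity of $-x^2 + \lambda x^2$ via its constant second derivative $2(\lambda - 1)$:

```python
# Illustrative sketch: f(x) = -x^2 + lam * x^2 has constant second
# derivative 2 * (lam - 1), so it is convex exactly when lam >= 1.
def second_derivative(lam: float) -> float:
    return 2.0 * (lam - 1.0)

for lam in (0.5, 1.0, 2.0):
    verdict = "convex" if second_derivative(lam) >= 0.0 else "non-convex"
    print(f"lambda = {lam}: {verdict}")
```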

This is also a standard trick in binary quadratic programming, where $x_i$ is a binary decision variable and the objective is to minimize the multivariate quadratic function $$\sum_i \sum_j q_{ij} x_i x_j + \sum_i c_i x_i$$ subject to linear constraints. Let $\lambda$ be the absolute value of the smallest (negative) eigenvalue of $Q=(q_{ij})$. Then adding $\lambda \sum_i (x_i^2-x_i)$, which is $0$ when each $x_i$ is binary, makes the objective function convex.
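As a rough illustration of that eigenvalue shift (a sketch using an assumed small random symmetric $Q$, not anything from the original post), one can check numerically that shifting the diagonal by $\lambda$ makes the quadratic form positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q = (A + A.T) / 2                 # assumed symmetric, generally indefinite Q

# lambda = |smallest (negative) eigenvalue|, or 0 if Q is already PSD
lam = abs(min(np.linalg.eigvalsh(Q).min(), 0.0))

# Adding lam * (x_i^2 - x_i) for every i shifts the quadratic part by lam * I
# (and the linear part by -lam); on binary x the added terms are all 0.
Q_shifted = Q + lam * np.eye(5)

print("original eigenvalues:", np.linalg.eigvalsh(Q))
print("shifted eigenvalues: ", np.linalg.eigvalsh(Q_shifted))  # all >= 0
```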

RobPratt
  • @RobPratt: Thank you so much for your answer! How "large" is "large enough"? In the case of the L2 norm regularization problem that I posted, does the L2 norm regularization term automatically make the optimization problem convex? Thank you so much! – stats_noob Mar 13 '22 at 19:50
  • If you have time - could you please take a look at this related question over here? https://math.stackexchange.com/questions/4402076/how-can-the-loss-functions-of-neural-networks-be-non-convex Thank you so much! – stats_noob Mar 13 '22 at 19:51
  • Is there a typo on the binary QP addition term, i.e., it should be the negative of what's there, i.e., need to be adding positive number to the diagonal? – Mark L. Stone Mar 15 '22 at 17:19
  • @MarkL.Stone Yes, corrected. Thanks! – RobPratt Mar 15 '22 at 17:52