
Consider the convex constrained optimization problem $$ \min_{x\in \mathcal C} \; f(x) $$ where $\mathcal C$ is a closed convex set and $f$ is convex. Suppose I know that for some $\bar x$, $$ \nabla f(\bar x)^T (\bar x - x) \leq \epsilon, \; \forall x \in \mathcal C \text{ with } \|x-\bar x\|_2 \leq \delta $$ for some small $\epsilon > 0$, $\delta > 0$. Can I generalize this condition globally, i.e., does this imply that $$ \nabla f(\bar x)^T (\bar x - x) \leq \epsilon, \; \forall x \in \mathcal C? $$ Note that this is true if $\epsilon = 0$, since this would be equivalent to saying "local optimality implies global optimality". But does it hold for approximate optimality?

Y. S.

1 Answer


By the Cauchy-Schwarz inequality, $$\nabla f(\bar{x})^T(\bar{x}-x)\leq |\nabla f(\bar{x})^T(\bar{x}-x)|\leq \|\nabla f(\bar{x})\|_2\|\bar{x}-x\|_2,$$ so given any $\bar{x}\in\mathcal{C}$ and $\varepsilon>0$, it is always possible to choose $\delta>0$ (depending on $\bar{x}$ and $\varepsilon$) so that $x\in \mathcal{C}$ and $\|\bar{x}-x\|_2\leq \delta$ imply $\nabla f(\bar{x})^T(\bar{x}-x)\leq\varepsilon$: take $\delta=\varepsilon/\|\nabla f(\bar{x})\|_2$ if $\nabla f(\bar{x})\neq 0$, and any $\delta$ otherwise. In other words, your local condition is satisfied at every point of $\mathcal{C}$ for a suitable $\delta$, whether or not that point is anywhere near optimal, so it carries no global information: in general you cannot expect the inequality $\nabla f(\bar{x})^T(\bar{x}-x)\leq\varepsilon$ to hold for all $x\in\mathcal{C}$.
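To make this concrete, here is a minimal numerical sketch in Python. The quadratic $f(x)=\frac{1}{2}\|x-c\|_2^2$, the point $\bar{x}$, and all the constants are hypothetical choices for illustration, not part of the question: the sketch checks that the Cauchy-Schwarz choice $\delta=\varepsilon/\|\nabla f(\bar{x})\|_2$ makes the local inequality hold on the whole $\delta$-ball, while a feasible point far from $\bar{x}$ violates it.

```python
import numpy as np

# Hypothetical example: f(x) = 0.5*||x - c||^2 on C = R^2, so grad f(xbar) = xbar - c.
c = np.array([3.0, 4.0])
xbar = np.array([0.0, 0.0])
grad = xbar - c                            # (-3, -4), norm 5

eps = 1e-2
delta = eps / np.linalg.norm(grad)         # Cauchy-Schwarz choice of delta

# On the delta-ball around xbar, grad^T (xbar - x) <= ||grad||*delta = eps.
rng = np.random.default_rng(0)
for _ in range(1000):
    d = rng.normal(size=2)
    x = xbar + delta * d / np.linalg.norm(d)   # a point at distance delta
    assert grad @ (xbar - x) <= eps + 1e-12

# A feasible point far from xbar violates the same inequality badly.
x_far = xbar - 100.0 * grad / np.linalg.norm(grad)
print(grad @ (xbar - x_far))               # 100 * ||grad|| = 500 >> eps
```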

carmichael561
  • Ok, but now let me ask a followup. If we now assume that $\delta$ is "big enough", is there a point at which local implies global? For the sake of argument, let's just say that $\epsilon < 2\delta \|\nabla f(\bar x)\|_2$. Does this give us any ability to extrapolate? – Y. S. Dec 21 '19 at 00:08
  • Yes, I think you can always choose $\delta$ big enough, but I don't think it's enough to bound $\delta$ from below in terms of $\varepsilon$ and $\|\nabla f(\bar{x})\|_2$. For instance, say $\mathcal{C}$ is the $x$-axis in $\mathbb{R}^2$, $\bar{x}=0$, and $\nabla f(0)=(\eta,1)^T$ for some small positive $\eta$. Then $\nabla f(0)^T(-t,0)^T=-\eta t$ can always be made larger than $\varepsilon$ by taking $t$ sufficiently large and negative, so your $\delta$ would have to be huge even though $\|\nabla f(0)\|_2$ is roughly $1$ (see the numerical sketch after these comments). – carmichael561 Dec 21 '19 at 00:38
  • Yeah, it seems like I need to add more regularity assumptions here for this "approximately optimal" condition to make sense. If there is even a tiny slant in the descent direction, it seems you always lose something sufficiently far away. I was hoping convexity would help here! Well, thanks for the discussion! – Y. S. Dec 21 '19 at 13:08
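For completeness, here is a small Python sketch of the counterexample from the comments; the specific values $\eta = 10^{-3}$, $\epsilon = 0.1$, and the sampled values of $t$ are arbitrary illustrative choices.

```python
import numpy as np

# Counterexample from the comments: C is the x-axis in R^2, xbar = 0,
# and grad f(0) = (eta, 1)^T for small eta > 0 (the value of eta is arbitrary).
eta = 1e-3
grad = np.array([eta, 1.0])
xbar = np.array([0.0, 0.0])
print(np.linalg.norm(grad))          # ~1: the gradient norm is essentially 1

eps = 0.1                            # arbitrary illustrative tolerance
# For feasible x = (t, 0): grad^T (xbar - x) = -eta * t.
for t in [-1.0, -10.0, -1e3, -1e4]:
    x = np.array([t, 0.0])
    print(t, grad @ (xbar - x))      # exceeds eps once t < -eps/eta = -100

# The first violation occurs at distance eps/eta from xbar. Since eps/eta -> inf
# as eta -> 0 while ||grad f(0)||_2 stays near 1, no delta chosen from eps and
# the gradient norm alone can be "big enough".
```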