
Consider the everywhere twice differentiable function $f:\mathbb R^n\to \mathbb R$, the closed and convex set $\mathcal S$, and the convex optimization problem

$$ \min_{x\in \mathcal S} \; f(x). $$

Is there an easy / intuitive way of proving both statements?

  1. $x = x^*$ is a local minimizer if $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*) \succ 0$. Specifically, the condition $\nabla^2 f(x^*) \succeq 0$ is not sufficient: as a counterexample, consider $f(x) = x^3$, where at $x = 0$ we have $\nabla^2 f(0) = 0$ and $\nabla f(0) = 0$, but $0$ is not a local minimum (a quick numerical check of this counterexample appears after the list).

  2. $x = x^*$ is a global minimizer if $\nabla f(x^*) = 0$ and $\nabla^2 f(x) \succeq 0$ for all $x\in \mathcal S$.
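
Here is a minimal numerical sketch of the counterexample in statement 1 (Python; the step sizes are arbitrary choices): at $x=0$ both the first and second derivatives of $f(x)=x^3$ vanish, yet every neighborhood of $0$ contains points with strictly smaller function value.

```python
f = lambda x: x**3

# First and second derivatives of f(x) = x^3 at x = 0: both vanish exactly,
# so the (1x1) "Hessian" is PSD but not PD.
print(3 * 0.0**2, 6 * 0.0)   # f'(0) = 0.0, f''(0) = 0.0

# Yet 0 is not a local minimizer: points just to the left are strictly smaller.
for h in [1e-1, 1e-3, 1e-6]:
    print(h, f(-h) < f(0.0))   # prints True for every h > 0
```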

The second statement in particular is quite well known in the convex optimization literature. However, I wonder whether there is a nice proof, to reassure ourselves that there are no corner cases (like the one found in case 1).

Y. S.
    I'm a bit confused. Is $f$ convex? – BigbearZzz Dec 23 '18 at 03:14
  • In the second case, yes (but not strictly convex); in the first, no. But the only assumption I'm stating is that $f$ is twice differentiable over $x\in \mathcal S$; the rest should follow from the stated assumptions. – Y. S. Dec 23 '18 at 03:17
  • What is $S$? Is $S$ convex, closed, bounded, etc., etc.? – copper.hat Dec 23 '18 at 03:54
  • Let's assume $S$ is closed and convex, not necessarily bounded. (question amended) – Y. S. Dec 23 '18 at 03:55
  • So I think the answer might be as simple as: all stationary points are either local minima, local maxima, or saddle points. If the function has a PSD Hessian everywhere, then in the interior of $\mathcal S$ the stationary points must be local minima. Clearly saddles can occur at the boundary, but the "descending" part must happen outside of $\mathcal S$. Anyway, the question is probably unnecessarily pedantic; I mostly just wanted to clarify my understanding here to make sure there weren't any strange corner cases in case 2. – Y. S. Dec 23 '18 at 04:01
  • @Y.S. I'm not really sure what kind of intuition you're looking for but here's my take: $\nabla^2 f(x^*)\ge 0$ doesn't tell us much; it's the fact that we assume $\nabla^2 f\ge 0$ throughout the domain in 2. that allows us to deduce anything at all. That's why merely assuming that $\nabla^2 f(x^*)\ge 0$ in 1. doesn't tell you anything. It's the stronger assumption $\nabla^2 f(x^*)> 0$ (and continuity) that happens to imply the weak form of "$\nabla^2 f\ge 0$ throughout the domain". Hence it's not like "$\nabla^2 f(x^*)= 0$" is a corner case but more that $\nabla^2 f(x^*)> 0$ is nice. – BigbearZzz Dec 23 '18 at 04:06
  • Ok I got it. Basically we just need to prove that if there exist $x^*$ and $\hat x$ in the interior of $\mathcal S$ where $f(\hat x) < f(x^*)$ and also $\nabla f(x^*) = 0$, then by running the mean value theorem twice, this suggests there is a negative second directional derivative at some point between $x^*$ and $\hat x$. – Y. S. Dec 23 '18 at 04:06
  • Specifically, there exists some point between $\hat x$ and $x^*$ (let's call it $y$) where the directional derivative is negative. But all directional derivatives at $x^*$ are 0. So there furthermore exists a point between $x^*$ and $y$ where the SECOND directional derivative is ALSO negative. Therefore the Hessian at that point must be NOT PSD (a small numerical sketch of this argument appears after the comments). – Y. S. Dec 23 '18 at 04:08
  • @Y.S. By assuming continuity of $\nabla^2 f$ (which is a reasonable assumption), you can even get an open neighborhood on which the Hessian fails to be PSD. I hope my answers so far help you somehow :) – BigbearZzz Dec 23 '18 at 04:14
  • @BigbearZzz Thanks for your ongoing discussion. I just want to clarify, though: the issue I am trying to resolve here is not when the Hessian is PD. I agree with you that if you have continuity and all that jazz, then if you construct the argument carefully enough, you can show local optimality. However, we know from the $f(x) = x^3$ example that it is possible to have a PSD (but not PD) Hessian and not be locally optimal, not in any neighborhood no matter how small. – Y. S. Dec 23 '18 at 04:19
  • The question I was asking is: when you force global convexity (and thus have no saddles), can you provably guarantee that all gradient-0, Hessian-PSD points MUST be local minima? (Ignoring globality for the moment.) I still maintain that without using that mean value theorem argument, the statement is simply "memorized folklore" and not mathematically proven. However, thanks for your ongoing discussion; it has helped me clarify what exactly it was I was asking. – Y. S. Dec 23 '18 at 04:19
  • I'm not sure I follow your thought. If we force global convexity then the Hessian doesn't really matter anymore. For a globally convex function, local minima and global minima coincide, hence all we need is $\nabla f(x^*)=0$. – BigbearZzz Dec 23 '18 at 04:28
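
As a minimal numerical sketch of the double mean-value-theorem argument sketched in the comments above (Python/NumPy; the saddle function $f(x,y)=x^2-y^2$ and the particular points are illustrative choices): if $\nabla f(x^*)=0$ but $f(\hat x) < f(x^*)$, then somewhere on the segment between them the second directional derivative along $\hat x - x^*$ is negative, so the Hessian cannot be PSD on the whole segment.

```python
import numpy as np

# Illustrative (non-convex) example: f(x, y) = x^2 - y^2 has a saddle at the origin.
f = lambda p: p[0]**2 - p[1]**2
hess = lambda p: np.array([[2.0, 0.0], [0.0, -2.0]])  # Hessian happens to be constant here

x_star = np.array([0.0, 0.0])   # stationary point: the gradient vanishes here
x_hat  = np.array([0.0, 1.0])   # f(x_hat) = -1 < 0 = f(x_star)
d = (x_hat - x_star) / np.linalg.norm(x_hat - x_star)  # unit direction

# Scan the segment from x_star to x_hat: the second directional derivative
# d^T H d is negative, so the Hessian is not PSD along the segment.
for t in np.linspace(0.0, 1.0, 5):
    y = x_star + t * (x_hat - x_star)
    print(t, d @ hess(y) @ d)   # -2.0 at every sample point in this example
```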

1 Answer


Yes to both of the questions (assuming that $\nabla^2 f$ is continuous for the first question).

The result follows from the multivariable Taylor theorem with Lagrange remainder: $$ f(x+v) = f(x) + \nabla f(x)\cdot v + \tfrac{1}{2}(\nabla^2f(x+\theta v): v\otimes v ) $$ for some $\theta\in(0,1)$, where $A : v\otimes v$ denotes $v^\top A v$. Letting $x=x^*$, so that $\nabla f(x^*)=0$, this reduces to $$ f(x^*+v) - f(x^*) = \tfrac{1}{2}(\nabla^2f(x^*+\theta v): v\otimes v ). $$ Since $\mathcal S$ is convex, the point $x^*+\theta v$ lies in $\mathcal S$ whenever $x^*+v\in\mathcal S$, so under the assumption of (2.) the right-hand side is nonnegative, which proves (2.).
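
As a minimal numerical sketch of this identity (Python; the 1-D convex example $f(x)=x^4$ with $x^*=0$ and the closed-form Lagrange point $\theta = 1/\sqrt 6$ are illustrative choices): the left-hand side $f(x^*+v)-f(x^*)$ agrees with $\tfrac12 f''(x^*+\theta v)\,v^2$, and it is nonnegative because $f''\ge 0$ everywhere.

```python
import math

# Illustrative convex 1-D example: f(x) = x^4, x* = 0, f'(0) = 0, f''(x) = 12 x^2 >= 0.
f  = lambda x: x**4
d2 = lambda x: 12 * x**2

x_star = 0.0
theta = 1.0 / math.sqrt(6.0)   # for this f, the Lagrange point works out to be the same for every v

for v in [0.5, -1.0, 3.0]:
    lhs = f(x_star + v) - f(x_star)            # = v^4
    rhs = 0.5 * d2(x_star + theta * v) * v**2  # = 6 * theta^2 * v^4 = v^4
    print(v, lhs, rhs, lhs >= 0)               # lhs and rhs agree (up to rounding), both nonnegative
```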

For (1.), the assumption that $\nabla^2 f(x^*) \succ 0$, together with continuity of $\nabla^2 f$ at $x^*$, means that $\nabla^2 f(x^*+\theta v) \succ 0$ for all sufficiently small $v$ (see this question on the openness of the set of positive definite matrices); thus the formula above shows that $f(x^*+v) - f(x^*) > 0$ for all sufficiently small $v \neq 0$, i.e. $x^*$ is a strict local minimizer.
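
A minimal numerical sketch of this continuity/openness argument (Python/NumPy; the function $f(x,y)=x^2+y^2+x^3$, the sampling radius, and the sample count are illustrative choices): the Hessian at the origin is $2I \succ 0$, and although it fails to be PSD far away (e.g. for $x<-1/3$), its smallest eigenvalue stays positive on a small ball around the origin.

```python
import numpy as np

# Illustrative example: f(x, y) = x^2 + y^2 + x^3.
# At x* = (0, 0): gradient is 0 and the Hessian is 2*I (positive definite),
# but the Hessian is not PSD everywhere (take x < -1/3).
hess = lambda p: np.array([[2.0 + 6.0 * p[0], 0.0], [0.0, 2.0]])

rng = np.random.default_rng(0)
radius = 0.1                                          # size of the neighborhood (arbitrary choice)
samples = rng.uniform(-radius, radius, size=(1000, 2))

# Continuity of the Hessian + openness of the PD cone: the smallest eigenvalue
# stays positive on a small enough ball around x*.
min_eig = min(np.linalg.eigvalsh(hess(p)).min() for p in samples)
print(min_eig > 0)   # True for this radius
```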

BigbearZzz
  • Taylor's theorem only gives the expansion for some $\theta\in (0,1)$, not for all $\theta\in (0,1)$, which is not sufficient to say that $x^*$ is a minimizer (local or global). To be more specific, the possible corner case is $\nabla^2 f(x^*) \succeq 0$ but not $\nabla^2 f(x^*) \succ 0$, which isn't really resolved by this (since $f(x^*+v) \leq f(x^*)$ can happen without really requiring global convexity). – Y. S. Dec 23 '18 at 03:36
  • What do you mean? For each $y\in\Bbb R^n$ we can let $v=y-x^*$ and for each such $v$ there's a corresponding $\theta = \theta(v)$. – BigbearZzz Dec 23 '18 at 03:39
  • So ok, restricting to case 1: for local optimality we need to say that there exists some $\epsilon$ such that for ALL $y$ with $|y-x|\leq \epsilon$, $f(y) \geq f(x)$. This only says that there exists SOME $y$ in the $\epsilon$-ball around $x$. – Y. S. Dec 23 '18 at 03:41
  • That's why I said we require continuity of $\nabla^2 f$ at $x^*$. The link I gave is about the openness of the set of positive definite matrices. Together they imply that $\nabla^2 f(y) > 0$ in a small neighborhood of $x^*$. – BigbearZzz Dec 23 '18 at 03:43
  • Sure, ok. I am probably being unnecessarily pedantic about case 1; with a proof for open set PD-ness, I agree it works. But what about case 2? I think you can't use the same argument here. – Y. S. Dec 23 '18 at 03:45
  • For case 2 it's even simpler, since you assumed $\nabla^2 f \ge 0$ on the entire $\Bbb R^n$. I don't see the problem you're talking about. – BigbearZzz Dec 23 '18 at 03:46
  • I mean, I know it is obviously true. I guess I am just hoping for some math intuition, since previously I thought $\nabla^2 f(x^*) \succeq 0$ was sufficient for local optimality, then found a counterexample. It's more like: what is a nice way of proving this, so we don't have to be paranoid about possible counterexamples? – Y. S. Dec 23 '18 at 03:47
  • I edited the question to clarify that point. – Y. S. Dec 23 '18 at 03:50
  • In the first case, $\nabla^2 f(x^*) > 0$ implies that $\nabla^2 f(y) > 0$ in a small neighborhood by continuity. However, $\nabla^2 f(x^*) \ge 0$ doesn't imply anything, and we don't get $\nabla^2 f(y) \ge 0$ in any neighborhood for free. That's why we need to impose the stronger assumption that $\nabla^2 f(y) \ge 0$ ourselves to ensure the result. – BigbearZzz Dec 23 '18 at 03:51