
I am trying to prove, and understand, the equivalence of two definitions of a $\gamma$-strongly convex function. I am aware that a differentiable function $f:\mathbb{R}^n\to\mathbb{R}$ is strongly convex with modulus $\gamma>0$ if, for all ${\bf x}$, ${\bf y}\in\mathbb{R}^n$ (and, in the second inequality, all $\lambda\in[0,1]$), we have \begin{align*} \frac{\gamma}2||{\bf x}-{\bf y}||^2_2&\leq f({\bf y})-f({\bf x})-\nabla f({\bf x})^\mathrm{T}({\bf y}-{\bf x}),\\ f[(1-\lambda){\bf x}+\lambda {\bf y}]&\leq(1-\lambda)f({\bf x})+\lambda f({\bf y})-\frac12\gamma\lambda(1-\lambda)||{\bf x}-{\bf y}||^2_2. \end{align*} However, I do not understand how to prove that the two are equivalent, even after looking through other MSE questions. I also found a presentation that describes the graph as convex but not flat, and I am having a hard time relating that geometric picture to the definitions above. Could someone help me out or point me to other MSE posts that might help? I'll delete this if it has been asked before (I couldn't find anything that made sense to me). I'd really appreciate any help, thanks!

user107224


Let me start with the geometric interpretation, beginning with the first definition: $$\frac{\gamma}2||{\bf x}-{\bf y}||^2_2\leq f({\bf y})-f({\bf x})-\nabla f({\bf x})^\mathrm{T}({\bf y}-{\bf x})$$ It can be rewritten as: $$f({\bf y}) \geq f({\bf x})+\nabla f({\bf x})^\mathrm{T}({\bf y}-{\bf x}) + \frac{\gamma}2||{\bf x}-{\bf y}||^2_2$$ The right-hand side is the first-order Taylor approximation of $f$ at ${\bf x}$ plus a quadratic term. Without the last term, the inequality says that the graph lies above every tangent hyperplane (a tangent line when $n=1$), which holds for differentiable convex functions (see this question for a proof). The quadratic term strengthens this: the graph is not just above the tangent hyperplane at ${\bf x}$, but the vertical gap between the two at ${\bf y}$ is at least $\frac{\gamma}2||{\bf y}-{\bf x}||^2_2$, so it grows at least quadratically as ${\bf y}$ moves away from ${\bf x}$. This is the sense in which the graph is convex but not flat: it cannot be linear along any segment.
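To see the first definition numerically, here is a small sketch under stated assumptions. It uses a hypothetical quadratic $f({\bf x})=\frac12{\bf x}^\mathrm{T}A{\bf x}$ with a fixed symmetric positive definite $2\times2$ matrix $A$; for such a quadratic the largest valid modulus is the smallest eigenvalue of $A$. The helper names (`first_order_slack`, `min_slack`) are illustrative only. The code samples random pairs and reports the smallest value of $f({\bf y})-f({\bf x})-\nabla f({\bf x})^\mathrm{T}({\bf y}-{\bf x})-\frac{\gamma}2||{\bf x}-{\bf y}||^2_2$.

```python
import math
import random

# Hypothetical example: f(x) = 1/2 x^T A x with a fixed symmetric positive
# definite 2x2 matrix A. Its strong-convexity modulus is the smallest
# eigenvalue of A.
A = [[3.0, 1.0], [1.0, 2.0]]


def f(x):
    """Quadratic f(x) = 1/2 x^T A x."""
    return 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))


def grad_f(x):
    """Gradient of f, which is A x because A is symmetric."""
    return [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]


def smallest_eigenvalue_2x2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix, in closed form."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return tr / 2 - math.sqrt(tr * tr / 4 - det)


def first_order_slack(x, y, gamma):
    """f(y) - [f(x) + grad f(x)^T (y - x) + gamma/2 ||y - x||^2].

    Non-negative for all x, y exactly when the first definition holds.
    """
    d = [yi - xi for xi, yi in zip(x, y)]
    tangent = f(x) + sum(g * di for g, di in zip(grad_f(x), d))
    return f(y) - tangent - 0.5 * gamma * sum(di * di for di in d)


def min_slack(gamma, trials=10_000, seed=0):
    """Smallest slack over random pairs of points in [-5, 5]^2."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in range(2)]
        y = [rng.uniform(-5, 5) for _ in range(2)]
        worst = min(worst, first_order_slack(x, y, gamma))
    return worst


gamma = smallest_eigenvalue_2x2(A)  # (5 - sqrt(5)) / 2, about 1.38
print(gamma, min_slack(gamma))      # slack stays >= 0 (up to rounding)
print(min_slack(2 * gamma))         # too large a modulus: slack goes negative
```

With $\gamma=\lambda_{\min}(A)$ the slack never drops below zero (up to floating-point rounding). Doubling $\gamma$ makes it negative for pairs whose difference points roughly along the eigenvector of the smallest eigenvalue, i.e. the quadratic lower bound is then steeper than the function itself.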

Now consider the second definition: $$f[(1-\lambda){\bf x}+\lambda{\bf y}]\leq(1-\lambda)f({\bf x})+\lambda f({\bf y})-\frac12\gamma\lambda(1-\lambda)||{\bf x}-{\bf y}||^2_2$$ If you omit the final term, you recognize the definition of a convex function: the chord connecting $({\bf x},f({\bf x}))$ and $({\bf y},f({\bf y}))$ lies on or above the graph over the segment between ${\bf x}$ and ${\bf y}$. The final term provides a lower bound on the vertical distance between the chord and the graph. For fixed ${\bf x}$ and ${\bf y}$, that term is quadratic in $\lambda$: it vanishes at the endpoints $\lambda=0$ and $\lambda=1$ and peaks at $\lambda=0.5$, where it equals $\frac{\gamma}8||{\bf x}-{\bf y}||^2_2$.
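A similar sketch for the second definition, assuming the hypothetical example $f({\bf x})=||{\bf x}||^2_2+\sum_i e^{x_i}$ on $\mathbb{R}^3$. Its Hessian is $2I+\operatorname{diag}(e^{x_i})\succeq 2I$, so $\gamma=2$ should work. The code checks the chord inequality on a grid of $\lambda$ values for random pairs, then confirms that the subtracted term is largest at $\lambda=0.5$. The names `chord_slack` and `penalty` are illustrative only.

```python
import math
import random

# Hypothetical example: f(x) = ||x||^2 + sum_i exp(x_i). Its Hessian is
# 2 I + diag(exp(x_i)) >= 2 I, so f is strongly convex with gamma = 2.
GAMMA = 2.0


def f(x):
    return sum(xi * xi + math.exp(xi) for xi in x)


def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def penalty(x, y, lam, gamma=GAMMA):
    """The extra term 1/2 * gamma * lam * (1 - lam) * ||x - y||^2."""
    return 0.5 * gamma * lam * (1 - lam) * sq_dist(x, y)


def chord_slack(x, y, lam, gamma=GAMMA):
    """(1-lam) f(x) + lam f(y) - penalty - f((1-lam) x + lam y).

    Non-negative for all x, y and lam in [0, 1] exactly when the second
    definition holds with modulus gamma.
    """
    z = [(1 - lam) * xi + lam * yi for xi, yi in zip(x, y)]
    return (1 - lam) * f(x) + lam * f(y) - penalty(x, y, lam, gamma) - f(z)


rng = random.Random(1)
lams = [k / 20 for k in range(21)]  # grid 0.0, 0.05, ..., 1.0
worst = math.inf
for _ in range(2_000):
    x = [rng.uniform(-3, 3) for _ in range(3)]
    y = [rng.uniform(-3, 3) for _ in range(3)]
    worst = min(worst, min(chord_slack(x, y, lam) for lam in lams))
print(worst)  # stays >= 0 up to rounding

# For fixed x, y the penalty vanishes at lam = 0 and lam = 1 and is largest
# at lam = 0.5, where it equals gamma/8 * ||x - y||^2.
x0, y0 = [0.0, 0.0, 0.0], [1.0, 2.0, 2.0]
best_lam = max(lams, key=lambda lam: penalty(x0, y0, lam))
print(best_lam, penalty(x0, y0, best_lam))  # prints 0.5 2.25
```

Here $||{\bf x}_0-{\bf y}_0||^2_2=9$, so the peak value $\frac{\gamma}8\cdot 9=2.25$ matches the formula in the text.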

To show equivalence, I suggest you start with the second definition and try to adapt the proof linked above.
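For reference, here is one way the adaptation can go. It assumes, as the first definition already does, that $f$ is differentiable, and that $\lambda$ ranges over $[0,1]$.

*First $\Rightarrow$ second.* Fix $\lambda\in[0,1]$, set ${\bf z}=(1-\lambda){\bf x}+\lambda{\bf y}$, and apply the first definition at ${\bf z}$ twice, once toward ${\bf x}$ (note ${\bf x}-{\bf z}=\lambda({\bf x}-{\bf y})$) and once toward ${\bf y}$ (note ${\bf y}-{\bf z}=(1-\lambda)({\bf y}-{\bf x})$): $$\begin{aligned} f({\bf x})&\geq f({\bf z})+\nabla f({\bf z})^\mathrm{T}({\bf x}-{\bf z})+\frac{\gamma}2\lambda^2||{\bf x}-{\bf y}||^2_2,\\ f({\bf y})&\geq f({\bf z})+\nabla f({\bf z})^\mathrm{T}({\bf y}-{\bf z})+\frac{\gamma}2(1-\lambda)^2||{\bf x}-{\bf y}||^2_2. \end{aligned}$$ Multiply the first inequality by $1-\lambda$, the second by $\lambda$, and add. The gradient terms cancel because $(1-\lambda)({\bf x}-{\bf z})+\lambda({\bf y}-{\bf z})={\bf 0}$, and $(1-\lambda)\lambda^2+\lambda(1-\lambda)^2=\lambda(1-\lambda)$, so $$(1-\lambda)f({\bf x})+\lambda f({\bf y})\geq f({\bf z})+\frac12\gamma\lambda(1-\lambda)||{\bf x}-{\bf y}||^2_2,$$ which is the second definition.

*Second $\Rightarrow$ first.* Write $(1-\lambda){\bf x}+\lambda{\bf y}={\bf x}+\lambda({\bf y}-{\bf x})$, subtract $f({\bf x})$ from both sides of the second definition and divide by $\lambda\in(0,1]$: $$\frac{f({\bf x}+\lambda({\bf y}-{\bf x}))-f({\bf x})}{\lambda}\leq f({\bf y})-f({\bf x})-\frac{\gamma}2(1-\lambda)||{\bf x}-{\bf y}||^2_2.$$ As $\lambda\to0^+$ the left-hand side tends to the directional derivative $\nabla f({\bf x})^\mathrm{T}({\bf y}-{\bf x})$ and the right-hand side tends to $f({\bf y})-f({\bf x})-\frac{\gamma}2||{\bf x}-{\bf y}||^2_2$, which after rearranging is the first definition.

Setting $\gamma=0$ throughout recovers the usual proof that the two characterizations of plain convexity agree.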

LinAlg