
A differentiable function $f$ is strongly convex if

$$ f(y) \geq f(x) + \nabla f(x)^T (y-x) + \frac{\mu}{2} \| y - x \|^2$$

for some $\mu > 0$ and all $x, y$. How does one minimize both sides above with respect to $y$?

The answer from the reference website is:

$$ f(x^*) \geq f(x) - \frac{1}{2 \mu} \| \nabla f(x) \|^2 $$

I can't see where the last term above comes from.

Jonathen

1 Answer


It looks like he is taking the gradient of the right-hand side of the first inequality with respect to $y$ and setting it to zero to find the minimum: $$ \nabla_y\left(\nabla f(x)^T (y-x) + \frac{\mu}{2} \| y - x \|^2\right)= \nabla f(x) + \mu(y-x)=0, $$ where the final "$=0$" comes from the minimization, and the terms that do not involve $y$ drop out, so $$\nabla f(x)=-\mu(y - x)$$ at the minimum. Substituting $y - x = -\nabla f(x)/\mu$ back into the first inequality, you get $$ \begin{align} f(x^*) &\geq f(x) - {1\over\mu}\nabla f(x)^T \nabla f(x) + \frac{\mu}{2}{1\over\mu^2}\Vert \nabla f(x)\Vert^2\\ &=f(x)-{1\over 2\mu}\Vert \nabla f(x)\Vert^2\\ \end{align} $$

(Just in case people are having difficulty understanding this answer: the top equation is the gradient (vector of partial derivatives) with respect to the $y$ variable, so $\nabla_y \nabla f(x) = 0$ because $\nabla f(x)$ contains no terms in $y$. That equation is a vector equation, so $y-x$ and $0$ both denote vectors. The next equation is also vector-valued; I then substitute the vector $y-x$ with $-\nabla f(x)/\mu$, and the final inequality is a scalar one.)
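The resulting bound $f(x^*) \geq f(x) - \frac{1}{2\mu}\|\nabla f(x)\|^2$ can be sanity-checked numerically. Below is a minimal sketch using a hypothetical strongly convex quadratic $f(x) = \frac{1}{2} x^T A x$ (my own choice of example, not from the question), whose minimizer is $x^* = 0$ and whose strong-convexity constant is the smallest eigenvalue of $A$:

```python
import numpy as np

# Hypothetical example: f(x) = 0.5 * x^T A x with A symmetric positive
# definite, so f is strongly convex with mu = lambda_min(A), f(x*) = 0.
A = np.diag([1.0, 3.0, 5.0])
mu = 1.0                      # smallest eigenvalue of A

def f(x):
    return 0.5 * x @ A @ x

def grad_f(x):
    return A @ x

f_star = 0.0                  # minimum value, attained at x* = 0

# Check f(x*) >= f(x) - ||grad f(x)||^2 / (2 mu) at random points.
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(3)
    bound = f(x) - grad_f(x) @ grad_f(x) / (2 * mu)
    assert f_star >= bound - 1e-12

print("bound holds at all sampled points")
```

The assertion never fires because, for this quadratic, $f(x) - f(x^*) = \frac{1}{2}\sum_i a_i x_i^2 \leq \frac{1}{2\mu}\sum_i a_i^2 x_i^2 = \frac{1}{2\mu}\|\nabla f(x)\|^2$ whenever $\mu \leq a_i$ for all $i$.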

Suzu Hirose