
Let $f:\mathbb{R}^n \rightarrow \mathbb{R}$ be a three times continuously differentiable function that is strongly convex. Does the Newton-Raphson method for minimization, given by

$$ \mathbf{x_{k+1}}=\mathbf{x_{k}}-\left(\nabla^2 f(\mathbf{x_{k}})\right)^{-1}\nabla f(\mathbf{x_k}) $$

converge to the global minimum for any given initial guess $\mathbf{x_0}\in\mathbb{R}^n$?
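For concreteness, here is a minimal NumPy sketch of the iteration above (the helper names `newton_minimize`, `grad`, and `hess` are mine, purely for illustration); the question is whether this loop reaches the global minimizer from every starting point.

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=100):
    """Pure Newton-Raphson iteration for minimization (step size 1, no line search).

    grad(x) returns the gradient (shape (n,)), hess(x) the Hessian (n, n).
    Strong convexity guarantees the Hessian is invertible at every iterate.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # x_{k+1} = x_k - (hess f(x_k))^{-1} grad f(x_k), via a linear solve
        x = x - np.linalg.solve(hess(x), g)
    return x
```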

There is a known way to prove that this is true in that situation: if one proves that the Armijo condition holds with $\alpha=1$ for the Newton-Raphson method, then my statement is true.

Usually this question is posed for equations, and with $n=1$, as in Proof of convergence of newton method for convex function and Newton’s method works for convex real functions. There is also an unresolved question related to mine: Conditions under which the damped Newton method is globally convergent?.

2 Answers


The answer is no. Newton's method for minimization does not necessarily converge for every such function and initial guess. $\textbf{Stephen Boyd}$ and $\textbf{Lieven Vandenberghe}$ give an example of such a function in their book $\textbf{Convex Optimization}$.

Define $f:\mathbb{R}\rightarrow\mathbb{R}$ by $f(x)=\ln(\exp(-x)+\exp(x))$ and take the initial guess $x_0=1.1$. Newton's method does not converge in this situation.
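A quick numerical check of this divergence (a sketch using the closed forms $f'(x)=\tanh(x)$ and $f''(x)=1/\cosh^2(x)$, so the Newton update is $x_{k+1}=x_k-\sinh(x_k)\cosh(x_k)$):

```python
import math

x = 1.1
for k in range(5):
    # f'(x) = tanh(x), f''(x) = 1/cosh(x)^2, so the Newton step is
    # x - tanh(x)*cosh(x)**2 = x - sinh(x)*cosh(x)
    x = x - math.sinh(x) * math.cosh(x)
    print(k + 1, x)
# The iterates oscillate in sign and blow up:
# 1.1 -> -1.129 -> 1.235 -> -1.699 -> 5.77 -> -2.6e4 -> ...
# whereas starting points with |x_0| below roughly 1.0886 converge to 0.
```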

  • The reference for this answer is Exercise 9.10 of the aforementioned book. – Guillaume Jan 31 '20 at 15:58
  • Actually, the given example is a strictly convex function ($f''(x)\neq 0$ for all $x$), but not a strongly convex one, since $\lim_{x\to\pm\infty}f''(x)=0$. It is strongly convex on bounded subsets, however. – Kas Dec 02 '22 at 13:07

The example I gave was not strongly convex on its entire domain. So I asked whether the conclusion still holds when $f$ is strongly convex and has a Lipschitz gradient. The assumptions made here are too restrictive for several applications, but under them the answer is positive, as you will see. The idea is to guarantee the Armijo rule for the method.

Let us start. The gradient is Lipschitz with constant $L$, and the function is strongly convex. Hence there is a real number $\delta>0$ such that, for all $x$ and $d$ in $\mathbb{R}^n$, $$\begin{array}{c} \delta \|d\|^2 \leq d^{T}\nabla^2 f(x) d \leq L \|d\|^2, \\ \dfrac{1}{L} \|d\|^2 \leq d^{T}\nabla^2 f(x)^{-1} d \leq \dfrac{1}{\delta} \|d\|^2. \end{array}$$ The Lipschitz bound also gives the descent lemma: for all $x$ and $h$ in $\mathbb{R}^n$, $$ f(x+h) \leq f(x)+ \nabla f (x)^{T}h + \dfrac{L}{2} \|h\|^2. $$
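As a quick sanity check of the two sandwich inequalities above, here is a small NumPy experiment (the matrix, the dimension, and the constants $\delta$, $L$ are my own illustrative choices), building a symmetric matrix with spectrum inside $[\delta, L]$ and testing both quadratic-form bounds on a random direction:

```python
import numpy as np

rng = np.random.default_rng(1)
delta, L, n = 0.5, 0.9, 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = rng.uniform(delta, L, n)
H = Q @ np.diag(lam) @ Q.T          # delta*I <= H <= L*I by construction

d = rng.standard_normal(n)
nd2 = d @ d
print(delta * nd2 <= d @ H @ d <= L * nd2)                  # True
print(nd2 / L <= d @ np.linalg.solve(H, d) <= nd2 / delta)  # True
```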

Applying the descent lemma with $h = - \nabla^2 f(x)^{-1} \nabla f(x)$, we have

$$\begin{align} f( x- \nabla^2 f(x)^{-1} \nabla f(x) )\leq & f(x) - \nabla f (x)^{T}\left(I-\dfrac{L}{2} \nabla^{2} f(x)^{-1}\right) \nabla^{2} f(x)^{-1} \nabla f (x) \\ = & f(x) + \dfrac{2}{L} \nabla f (x)^{T}\left(I-\dfrac{L}{2} \nabla^{2} f(x)^{-1}\right) \left( - \dfrac{L}{2} \nabla^{2} f(x)^{-1} \right) \nabla f (x) \\ = & f(x) + \dfrac{2}{L} \nabla f (x)^{T}\left(I-\dfrac{L}{2} \nabla^{2} f(x)^{-1}\right)^2 \nabla f (x) - \dfrac{2}{L}\nabla f (x)^{T} \left(I-\dfrac{L}{2} \nabla^{2} f(x)^{-1}\right) \nabla f (x)\\ = & f(x) + \dfrac{2}{L} \nabla f (x)^{T}\left( \left(I-\dfrac{L}{2} \nabla^{2} f(x)^{-1}\right)^2 - \left(I-\dfrac{L}{2} \nabla^{2} f(x)^{-1}\right)\right) \nabla f (x). \end{align}$$

Now suppose that $L<2 \delta$. The eigenvalues of $I-\dfrac{L}{2} \nabla^{2} f(x)^{-1}$ lie in $\left[1-\dfrac{L}{2\delta},\ \dfrac{1}{2}\right]$, and $t \mapsto t^2 - t$ is decreasing on $\left[0, \dfrac{1}{2}\right]$, so $$\left( \left(I-\dfrac{L}{2} \nabla^{2} f(x)^{-1}\right)^2 - \left(I-\dfrac{L}{2} \nabla^{2} f(x)^{-1}\right) \right)\leq \left( \dfrac{L}{2 \delta} - 1 \right) \dfrac{L}{2 \delta} I .$$

Applying this at $x = x^{k}$ gives $$f(x^{k+1})\leq f(x^{k}) + \dfrac{2}{L}\left( \dfrac{L}{2 \delta} - 1 \right) \dfrac{L}{2 \delta}\|\nabla f(x^{k})\|^2 = f(x^{k}) - \dfrac{1}{\delta}\left( 1 - \dfrac{L}{2 \delta} \right) \|\nabla f(x^{k})\|^2.$$ Hence, by the telescoping sum, $$f (x^{0}) - f(x^{k+1}) \geq \left( 1 - \dfrac{L}{2 \delta} \right) \dfrac{1}{\delta} \sum^{k}_{i=0} \|\nabla f(x^{i})\|^2.$$ Since the function is bounded below, the sum $\sum^{\infty}_{i=0} \|\nabla f(x^{i})\|^2$ is bounded, and hence $\lim_{k \rightarrow \infty} \|\nabla f (x^{k})\| = 0.$ From this it follows that $\{x^{k}\}$ converges: strong convexity gives $\|\nabla f(x^{k})\| = \|\nabla f(x^{k}) - \nabla f(x^{*})\| \geq \delta \|x^{k} - x^{*}\|$, where $x^{*}$ is the unique minimizer, so $x^{k} \rightarrow x^{*}$.
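To make the guaranteed decrease concrete, here is a small numerical check of the per-step inequality $f(x^{k+1}) \leq f(x^{k}) - \frac{1}{\delta}\left(1-\frac{L}{2\delta}\right)\|\nabla f(x^{k})\|^{2}$ on a toy function of my own with $\delta = 2$ and $L = 2.1$ (so $L < 2\delta$ holds):

```python
import numpy as np

# f(x) = ||x||^2 + 0.1 * sum(log cosh(x_i)):
# grad f(x) = 2x + 0.1*tanh(x), hess f(x) = diag(2 + 0.1*(1 - tanh(x)^2)),
# so delta = 2 <= eigenvalues of the Hessian <= 2.1 = L everywhere.
delta, L = 2.0, 2.1
f = lambda x: x @ x + 0.1 * np.sum(np.log(np.cosh(x)))
grad = lambda x: 2 * x + 0.1 * np.tanh(x)
hess = lambda x: np.diag(2 + 0.1 * (1 - np.tanh(x) ** 2))

c = (1 - L / (2 * delta)) / delta      # guaranteed decrease coefficient
x = np.array([1.5, -0.7, 3.0, -2.2, 0.4])
for k in range(6):
    g = grad(x)
    x_new = x - np.linalg.solve(hess(x), g)
    # actual decrease vs. the bound -c * ||grad f(x_k)||^2 (both negative)
    print(k, f(x_new) - f(x), -c * (g @ g))
    x = x_new
```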

Although the assumption $L<2\delta$ is not practical for most situations, this shows that Newton's method works for such strongly convex functions $f:\mathbb{R}^n \rightarrow \mathbb{R}$.

  • I said: although the assumption $L<2\delta$ is not practical for most situations, this shows that Newton's method works for a strongly convex function. Notice that this result is not weaker than those I cited in my question. Newton's method for minimization is Newton's method for root finding applied to the derivative of $f$. Hence, my result has a different formulation for finding zeros: suppose that a primitive of $f$ is strongly convex and has a bounded derivative ($L>f'>\delta$); then Newton's method for finding a zero converges. – R. W. Prado Sep 06 '23 at 18:39
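As an illustration of this root-finding reformulation (a toy example of my own, not from the comment): take $f(x) = 2x + 0.1\tanh(x)$, whose primitive $x^2 + 0.1\log\cosh(x)$ is strongly convex with $2 < f'(x) < 2.1$, and run Newton's method for $f(x)=0$:

```python
import numpy as np

f = lambda x: 2 * x + 0.1 * np.tanh(x)       # delta = 2 < f'(x) < 2.1 = L
df = lambda x: 2 + 0.1 * (1 - np.tanh(x) ** 2)

for x0 in (-50.0, 0.3, 1000.0):
    x = x0
    for _ in range(30):
        x = x - f(x) / df(x)                 # Newton step for f(x) = 0
    print(x0, "->", x)   # every starting point reaches the unique zero x = 0
```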