42

$f(x)=2x^2-x^3-2$. This is a cubic, and a graph of the given function is shown. Its only real root is at approximately $(-0.839,0)$.

So, the question is to use Newton's approximation method twice to approximate a solution to this $f(x)$.

I use an initial starting value of $x_0=1$. My first approximation is $x_1=2$, and my second is $x_2=1.5$. I don't seem to move any closer to the real solution as I keep iterating through the method.

Am I misunderstanding how to use this approximation? Is the issue that my first guess was too far from the actual solution?
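To show my work, here are the two steps I computed, using $f'(x)=4x-3x^2$:

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} = 1 - \frac{-1}{1} = 2, \qquad x_2 = x_1 - \frac{f(x_1)}{f'(x_1)} = 2 - \frac{-2}{-4} = 1.5.$$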

user163862
  • 2,043
  • 6
    Note that from $x_{n+1}=x_n+\dfrac{x_n^3-2x_n^2+2}{4x_n-3x_n^2}$, we should obtain $x_2=1.5$, not $2.5$. – projectilemotion Aug 27 '17 at 14:20
  • 4
    Very nice example! The method seems to fail, but eventually is successful! – Peter Aug 27 '17 at 14:23
@user163862 The graph shows that it is better to start from a value to the left of the root, like $-1.2$ – Narasimham Aug 27 '17 at 19:17
  • 2
    @Narasimham To the left of the root does not really matter, the point is that it should be to the left of that minimum at $0$. – Ian Aug 27 '17 at 22:13
  • In your problem, what happens if $x_1 = 0$. Why? What feature of the graph at $x=0$ causes this? If you pick $x_1 \in (0,1)$, where is $x_2$? What feature of the graph in that region causes this? – Eric Towers Aug 27 '17 at 22:52
  • 2
    In short: your starting point was too near to an extremum. If you can pick starting points that are as far away from extrema as possible (and of course close to the desired root!), do so. – J. M. ain't a mathematician Aug 28 '17 at 04:47
@Ian Indeed. If we take $q \sin px + q \sin qx = 0$, with $p, q$ real, a wavy function with many extrema, then keeping on track requires a criterion. – Narasimham Aug 28 '17 at 06:26
  • If you start with an initial value of -1, you will quickly zero in on the solution. In electronics we have a similar problem with the diode equation, I = Io (exp(kV) - 1). If you have a diode in series with a resistor, and you guess an initial voltage that is too high, you get nonsense. It turns out that for this one, there is weird way to get the solution directly by means of the Lambert W function (which is built in to the WP-34s handheld calculator). But that is not applicable to your function. – richard1941 Aug 30 '17 at 04:30
  • This article might be interesting: https://www.chiark.greenend.org.uk/~sgtatham/newton/ – Alice Ryhl Aug 31 '17 at 10:40

8 Answers

69

In fact, you gave up too early; the method eventually converges:

1   2.000000000000000000000000000
2   1.500000000000000000000000000
3   0.3333333333333333333333333333
4   2.148148148148148148148148148
5   1.637079608343976160068114091
6   0.9483928480399477528436835979
7   1.910874140183680201544963299
8   1.405089904362402921055022221
9   -1.324018083676046424512855515
10  -0.9614381794507316717924414480
11  -0.8500221808505758631523579893
12  -0.8393807176849843501240483025
13  -0.8392867625049899194321196645
14  -0.8392867552141611764525252322
15  -0.8392867552141611325518525647
16  -0.8392867552141611325518525647
17  -0.8392867552141611325518525647
18  -0.8392867552141611325518525647
19  -0.8392867552141611325518525647
20  -0.8392867552141611325518525647
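For anyone who wants to reproduce the table, here is a minimal sketch of the iteration in Mathematica (the answer itself does not say which system produced the digits above):

f[x_] := 2 x^2 - x^3 - 2;        (* the function from the question *)
newton[x_] := x - f[x]/f'[x];    (* one Newton step *)
NestList[newton, 1.0, 20]        (* 20 steps from x0 = 1: the iterates wander for about a dozen steps, then settle at -0.839287... *)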
Peter
  • 84,454
  • 3
    Oops, very fast upvote :) – Peter Aug 27 '17 at 14:17
  • 1
    Yes, I was going to transfer my previous comment to an answer, but you were a bit faster than me. – projectilemotion Aug 27 '17 at 14:18
  • @projectilemotion Well, no problem, just post the answer. Sometimes answers very similar to a given answer are upvoted as well. – Peter Aug 27 '17 at 14:20
  • 8
    Note that the magic happens when you finally bounce around to hit ~1.4 which then sends you way over to ~-1.3 – Ian Aug 27 '17 at 14:20
  • @Peter Too late, I deleted it already. – projectilemotion Aug 27 '17 at 14:21
Thank you! I see that I should start closer to the actual root. But when approximating, you may not know how to get closer. In just 12 tries, the computer got pretty close. – user163862 Aug 27 '17 at 14:33
  • 4
There are some collections of conditions sufficient to guarantee the convergence of Newton's method. Cubics in particular can be dangerous; sometimes the method actually diverges or oscillates. In doubt, you can try the slower but more reliable numerical methods such as the bisection method or regula falsi. Or you can combine the methods: first find an approximation, then use Newton to get a more precise result. – Peter Aug 27 '17 at 14:36
But even the bisection method requires a change of sign. In the case of a double root, for example, it cannot be used. But in that case, you can apply the methods to the derivative. – Peter Aug 27 '17 at 14:41
Just a small tidbit, but I prefer all approximations to be shown to the same number of places. – Simply Beautiful Art Aug 27 '17 at 18:33
  • 2
    I don't think this answer actually addresses any point raised in the original question. So this particular initial value does lead to a converging sequence; fine. But why? Did it have to be that way? What about other possible starting values? – Marc van Leeuwen Aug 29 '17 at 08:51
67

Newton's method does not always converge. Its convergence theory is for "local" convergence which means you should start close to the root, where "close" is relative to the function you're dealing with. Far away from the root you can have highly nontrivial dynamics.

One qualitative property is that, in the 1D case, you should not have an extremum between the root you want and your initial guess. If you have an odd number of extrema in the way, then you will start going away from the root you want, as you see here. If you have an even number of extrema in the way, then you will start going the right way, but you may later find yourself in a spot with an odd number of extrema in the way, leading to problems later.

Of course you may eventually find an occasion where there are an even number of extrema in the way, and then you manage to skip over all of them and get to the right side. At that point things will usually work out (not always, though). In this problem with your initial guess, that eventually happens, because the system eventually finds its way just slightly to the right of the extremum on the right, which sends it far off to the left.
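For the function in the question this is easy to check: $f'(x) = 4x - 3x^2 = x(4 - 3x)$ vanishes at $x = 0$ and at $x = 4/3 \approx 1.33$ (the extremum on the right). Between the starting guess $x_0 = 1$ and the root near $-0.839$ there is exactly one extremum, the one at $0$, which is why the very first step moves away from the root, from $1$ to $2$.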

Ian
  • 101,645
41

The other answers are great. I'd just like to add a concrete example of the weird behavior of which Ian's answer speaks.

Let's consider the function $f(x) = \operatorname{sgn}(x) \sqrt{|x|}$. According to the algorithm, we iterate $$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$ Now for the derivative (as long as we are not differentiating at $x = 0$, $\operatorname{sgn} x$ is just a constant and $|x| = x \operatorname{sgn} x$): $$f'(x) = \left[\operatorname{sgn} x \sqrt{x \operatorname{sgn} x}\,\right]' = \operatorname{sgn} x \cdot \frac{1}{2\sqrt{x \operatorname{sgn} x}} \cdot \operatorname{sgn} x = \frac{1}{2\sqrt{|x|}}.$$ Plugging in: $$x_{n+1} = x_n - \frac{\operatorname{sgn}(x_n) \sqrt{|x_n|}}{1/\bigl(2\sqrt{|x_n|}\bigr)} = x_n - 2 \operatorname{sgn}(x_n)\left(\sqrt{|x_n|}\right)^2 = x_n - 2\operatorname{sgn}(x_n)\,|x_n| = x_n - 2 x_n = - x_n.$$

So if we start iterating at $x = a$ (where $a \neq 0$), we get the sequence $a, -a, a, -a, \ldots$, and the method loops forever between those two points, never reaching the root $x = 0$!
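A quick numerical check of this loop (a Mathematica sketch; the Newton step is written out explicitly rather than obtained by differentiating Abs symbolically):

step[x_] := x - (Sign[x] Sqrt[Abs[x]])/(1/(2 Sqrt[Abs[x]]));  (* Newton step for f(x) = sgn(x) sqrt(|x|) *)
NestList[step, 0.7, 6]
(* {0.7, -0.7, 0.7, -0.7, 0.7, -0.7, 0.7} -- the iterates just flip sign forever *)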

Edit: Here's a gnuplot image of the function and of the loop Newton's method makes on it. (In each iteration we draw the tangent at the current point (the blue dashed line); the $x$ at which the tangent crosses $0$ is taken as the next approximation, so we follow the magenta line to get the starting point for the next iteration.)

By the way, have a look at this image from Wikipedia:

[Newton fractal for the roots of $z^5 = 1$, from Wikipedia]

It shows the complex plane colored with 5 colors, each color corresponding to one root of the complex equation $z^5 = 1$. Each point then has the color corresponding to the root to which Newton's method converges, if we start from that point. The "flowers" are beautiful to behold but totally abhorrent from the numerical point of view.

Ramillies
  • 948
  • That's a nice image! – dafinguzman Aug 27 '17 at 20:35
  • 11
    "[...]Flowers are beautiful to behold but totally abhorrent from a numerical point of view" now range alongside "Sufficiently accurate for Poetry" as my favorite expressions ever! – Beyer Aug 28 '17 at 07:05
  • 1
    I like the "Sufficiently accurate for Poetry" much more! :-). // By the way, thank you, Simply Beautiful Art, for nicely inlining the image. – Ramillies Aug 28 '17 at 10:55
  • I doubt there is a root at x = 0! = 1 – ChemiCalChems Aug 28 '17 at 11:49
  • 1
    I'm sorry, but I can't see why not. $f(x) = \operatorname{sgn} x \sqrt{|x|}$, so $f(0) = \operatorname{sgn} 0 \sqrt{|0|} = 0 \cdot 0 = 0$. Hence $x = 0$ is a root. (Btw.: I say root, not stationary point.) – Ramillies Aug 28 '17 at 12:03
  • I added a gnuplotted image of the function (and of the loop the Newton's method makes on it). Hope that helps prevent some confusion. – Ramillies Aug 28 '17 at 13:00
  • @Ramillies - ChemiCalChems was joking about the exclamation mark present in the english sentence being confused with a factorial symbol applied to zero, and since $0! = 1$ indeed there is no root at $0!$. – Pedro A Aug 28 '17 at 23:18
  • I see. Sorry! (Being a programmer, I just translated != as $\not =$ and was very, very confused about comparing the value of $f(0)$ with 1.) – Ramillies Aug 28 '17 at 23:42
35

Newton's method has no global convergence guarantee for arbitrary functions, as you just learned.

Now, people have posted examples of where Newton's method doesn't converge, but they're all rather "unusual" functions (some being very non-smooth), so it's natural to assume they're pathological and won't happen in practice.

Your example is one where Newton just takes more iterations than expected to converge, so it's not too bad. But here is an example of a cubic polynomial for which Newton's method won't converge!

\begin{align*} f(x) &= -0.74 + 0.765 x + 1.1 x^2 - 3.55 x^3 \\ x_0 &= 5/9 \end{align*}

Not only that, but it's in fact a stable oscillation—small perturbations won't change the behavior.
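As a quick sanity check of that oscillation (separate from the interactive plot code further down), you can iterate the Newton map directly; a minimal Mathematica sketch:

f[x_] := -0.74 + 0.765 x + 1.1 x^2 - 3.55 x^3;
g[x_] := x - f[x]/f'[x];    (* Newton step *)
NestList[g, N[5/9], 9]
(* the iterates cycle through roughly 0.556, 0.106, 0.847, 0.556, 0.106, 0.847, ... *)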

And for bonus points, you can generate as many of these as you want! Just let the Newton step be $$g(x) = x - f'(x)^{-1} f(x)$$ and then you're just looking for a nontrivial solution to the equation $$x_0 = g^3(x_0) = g(g(g(x_0)))$$ where a solution $x_0$ is trivial if it is already a fixed point of $g$, i.e. $g(x_0) = x_0$ (equivalently, $f(x_0) = 0$).
Notice the equation above is just another polynomial equation (although of a much higher order)—which means its solutions can be readily found numerically.
The only caveat is that these solutions might not necessarily be stable. I suspect you should be able to place a second-derivative condition to ensure stable solutions, but the exact equation is not obvious to me at the moment, so I'll leave it as an exercise for the reader. :-)

Mathematica code for the plot:

Manipulate[
  With[{f = Evaluate[Rationalize[d + c # + b #^2 + a #^3]] &}, Plot[
    f[x], {x, -0.61, 1},
    PlotStyle -> {Thickness[Tiny]},
    Prolog -> {Thickness[Tiny],
      Line[Flatten[Map[
     {{#, 0}, {#, f[#]}} &,
     NestList[Compile[t, t - f[t]/f'[t]], x0, n]], 1]]}]],
  {{a, -3.55}, -4, 4},
  {{b, 1.1}, -2, 2},
  {{c, 0.765}, -1, 1},
  {{d, -0.74}, -1, 1},
  {{x0, 5/9}, 0, 1},
  {{n, 100}, 0, 1000, 1}]

Update:

I wrote some code to purposefully find both stable and unstable iterations:

Newton[f_] := t \[Function] t - f[t]/f'[t];
NewtonPlot[f_, xmin_, xmax_, x0_, n_, args___] :=
  Plot[f[x], {x, xmin, xmax}, args,
   Prolog -> {Thickness[Tiny],
     Line[Flatten[Map[
        {{#, 0}, {#, f[#]}} &,
        NestList[Compile[t, Newton[f][t]], x0, n]], 1]]}];
FindOscillatoryNewtonSolutions[f_, n_, h_](* {Stables,Unstables} *):=
  With[{step = Newton[f]},
   With[{nstep = t \[Function] Nest[step, t, n]},
    GroupBy[
     Map[#[[1]][[2]] &, 
      Solve[{nstep[t] == t, Abs[step[t] - t] >= h}, t, Reals]],
     t \[Function] 
      With[{step1hp = nstep[t + h], step1hm = nstep[t - h]}, True \[And]
        Abs[N[step1hp - t]] >= Abs[N[nstep[step1hp] - t]] \[And]
        Abs[N[step1hm - t]] >= Abs[N[nstep[step1hm] - t]]]]]];
With[{z = 400, xmin = -1.1, xmax = +0.65, h = 10^-3},
 Manipulate[
  With[{f = 
     t \[Function] Evaluate[Rationalize[d + c t + b t^2 + a t^3]], 
    n = 3, m = 8},
   With[{solcategories = FindOscillatoryNewtonSolutions[f, n, h]},
    If[Length[solcategories] >= 2,
     Map[{Transpose@{N@SortBy[NestList[Newton[f], #1, n - 1], N]},
        NewtonPlot[f, xmin, xmax, N[#1 + dx], n*m,
         PlotStyle -> {Thickness[Tiny]},
         ImageSize -> Scaled[0.2],
         PerformanceGoal -> "Quality"]} &, {First@Lookup[solcategories, True], 
       First@Lookup[solcategories, False]}],
     Throw["Did not manage to find both a stable and an unstable solution."]]]],
  {{dx, h}, -4 h, +4 h, h/4},
  {{a, -355}, -z, +z, 1}, {{b, 110}, -z, +z, 1},
  {{c, 77}, -z, +z, 1}, {{d, -74}, -z, +z, 1},
  SynchronousUpdating -> False]]

Notice, below, that the first point iterates stably, whereas the second one iterates unstably. (The iterations would diverge further, but I cut them off at some point.)

Stable and unstable Newton iterations

user541686
  • 13,772
10

Newton-Raphson can behave badly even in seemingly easy situations. I am considering the use of N-R for minimization (rather than root finding, but the same applies). Even in the case of convex functions, N-R may not converge.

For example, $$f(x)=\ln(e^x+e^{-x})$$ is $C^{\infty}$, strictly convex, and has a single (global) minimum at $0$.

[plot of $f(x)=\ln(e^x+e^{-x})$]

Yet, if we try to use N-R to find the minimum (i.e. the root of the derivative), the algorithm fails if started from a point $|x|>1.09$ (approx).

To see this, consider that $f'(x)=\tanh(x)$ and $f''(x)=\cosh^{-2}(x)$. Therefore, the N-R update rule maps the current $x$ to the new $x$ according to: $$x \leftarrow x - \frac{f'(x)}{f''(x)} = x - \frac12 \sinh(2x)$$ Whenever $|\frac12 \sinh(2x)| > |2x|$, the new iterate will be bigger in modulus than the previous one, i.e. N-R will diverge away from the solution. This happens for $|x|>1.09$ (approx).
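A quick numerical illustration of that threshold (a Mathematica sketch; the update $x - f'(x)/f''(x)$ simplifies to $x - \tfrac12 \sinh(2x)$, as in the formula above):

step[x_] := x - Sinh[2 x]/2;    (* N-R update for minimizing ln(e^x + e^-x) *)
NestList[step, 1.2, 3]   (* |x0| > 1.09: roughly 1.2, -1.53, 3.82, -517 -- already running away *)
NestList[step, 1.0, 6]   (* |x0| < 1.09: converges to the minimum at 0 *)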

Luca Citi
  • 1,809
  • 5
    Note that this happens because $\tanh$ is extremely flat for large $x$, so although you go in the right direction toward the root (since $\tanh$ has no extrema as I said in my answer), you massively overshoot. (By the way, this is a good example but I think it would be better to just go for root finding applied to $\tanh$ in the first place instead of talking about optimization.) – Ian Aug 27 '17 at 22:11
  • 1
    @Ian Indeed, the quadratic approximation fits locally but is very poor globally. A better approach in this case would be to use a majorization minimization algorithm, which guarantees monotonic convergence towards a local (global in this case) minimum. – Luca Citi Aug 27 '17 at 22:15
  • A simpler thing to do for this (rather easy) problem is just to detect that the sign changed in the first iteration, so that the root was overshot. Thus you obtain a bracket and so you can either default to a pure bracket method like bisection, or a hybrid method (e.g. do a few bisection steps to get close enough to the root and then run Newton to refine). Of course this won't help you much in higher dimensions. It may also fail to help you even in one dimension if there is no sign change at the root...though of course that case is "bad" for Newton's method anyway. – Ian Aug 27 '17 at 22:34
  • 1
    I agree, it's far easier to see what's going on if you do root-finding with tanh than minimization with its antiderivative. – user541686 Aug 28 '17 at 04:11
  • 1
    @Ian and Mehrdad I see your point but the reason I showed this example of minimization rather than root finding is that I wanted to contrast the apparent niceness of the function ($C^{\infty}$ and convex) with the fact that N-R does not converge. In minimization convexity is a big deal and I have heard many colleagues blindly assume that N-R would converge. I do not know an equivalent on the case of root finding. – Luca Citi Aug 28 '17 at 11:02
  • BTW, while this example is 1d, I am thinking of applications in much higher dimensions as it often happens in machine learning and operations research. – Luca Citi Aug 28 '17 at 11:13
  • 2
    My other objection to such an example is that if you have a $C^2$ convex function and you use Newton's method to minimize it, then Newton's method always goes in the correct direction (in multiple dimensions, the correct direction and the Newton direction always have a positive dot product). It just might overshoot. You can't always detect overshoots, but you can detect that the objective function gets bigger. If it did, then by convexity you must have overshot, and so you should trigger a line search to find a better point in between the new point and the previous point. – Ian Aug 28 '17 at 13:27
  • 1
    @Ian: Your objection only holds in 1D, and in that case Newton isn't even the one doing the work; it's just the sign of the derivative telling you which direction to move. In 2D+ there's no guarantee you'll move in the correct direction; best you can hope for with gradients is that you won't be more than 90 degrees away from the correct direction (which is far less useful) and I don't think Newton can do any better there either. – user541686 Aug 31 '17 at 08:42
  • 1
    @LucaCiti: Thanks for the reply. I actually have a thing against convexity being such a big deal, but that'll take us on a long tangent (pun intended). :-) I'm not sure I understand what you mean regarding "not knowing an equivalent on the case of root-finding" -- isn't your example also a root-finding example if you just try to find the root of the derivative? Regardless though, you can definitely keep it as-is, it's nice to have an optimization example here anyway. – user541686 Aug 31 '17 at 08:46
  • Your answer is a bit old, but I would like to ask you where does the inequality $|\frac12 \sinh(2x)| > |2x|$ come from? I know that $\frac12 \sinh(2x)$ equals $f'/f''$ but how did you choose $2x$? – asd Mar 14 '20 at 22:48
  • @Jazz The borderline case between convergence and divergence is when the algorithm is stuck in a limit cycle between opposite values ($x_{k+1} = -x_k; \forall k$). For positive $x_k$, the algorithm diverges if $x_{k+1} = x_k - \frac12 \sinh(2 x_k) < -x_k$, i.e.: $\frac12 \sinh(2 x_k) > 2 x_k$. For negative $x_k$, the algorithm diverges if $x_{k+1} = x_k - \frac12 \sinh(2 x_k) > -x_k$, i.e.: $\frac12 \sinh(2 x_k) < 2 x_k$. Both cases correspond to $|\frac12 \sinh(2 x_k)| > |2 x_k|$. – Luca Citi Mar 16 '20 at 00:08
  • 1
    @LucaCiti Ahhh. Got it! Thanks for your answer! – asd Mar 19 '20 at 11:37
10

The relationship between starting point and eventual convergence point for Newton's method is complicated. Consider this image (from Wolfram's MathWorld, Newton's Method):

basins of attraction for Newton's method

showing the basins of attraction in the complex plane for $x^3 - 1$, $x^4 - 1$, ..., $x^{11} - 1$. Each color corresponds to a root of the polynomial, and each point is colored to match the root to which Newton's method eventually takes that point.

These basins are typically quite complicated, but looking at the whole complex plane can make the behaviour a little clearer. The real line runs horizontally, halfway down each of these plots. If all you can see is what happens on the real axis, which is the situation in your problem, it can be difficult to predict where a given starting point will end up.

Eric Towers
  • 67,037
7

There are several articles about the convergence of Newton's method. There is something called the Newton-Kantorovich theorem, which gives rigour to the notion of convergence regions: your starting point must be within the Fatou set that encloses the point of attraction of the dynamical system formed by the iterates of the Newton map.

On the Newton–Kantorovich hypothesis for solving equations

I used the method here to explore the convergence regions of Newton's method at the Gram points of the zeta function: Convergence of the Newton-Kantorovich Method for the Gram Points

crow
  • 1
2

The Newton-Raphson method basically asks you to draw the tangent to the function at the point $x_0$; $x_1$ is then the point where that tangent hits the $x$-axis. This works well if the function stays reasonably close to its tangent up to the point where $y = 0$, and not so well otherwise.

You can see that between $x = 0$ and $x = 4/3 \approx 1.33$ (the two extrema of $f$) the tangent actually points in the wrong direction, away from the root. And where the tangent is parallel or nearly parallel to the $x$-axis, close to $x = 0$ and $x = 4/3$, following the tangent will take you far away from the root.

For $x_0 < -0.346$, $x_1$ will be closer to the solution than $x_0$ and will lie in the same range again, so the sequence converges to the solution. For $x_0 > 1.5$, $x_1$ will also be closer to the solution than $x_0$, but you still have to get past the region from $-0.346$ to $1.5$.

And at $x_0 \approx 1.081792$ there is a point where $3$ iterations get you back to where you started :-)
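A quick check of that cycle (a Mathematica sketch, using the same Newton map as elsewhere in the thread):

f[x_] := 2 x^2 - x^3 - 2;
g[x_] := x - f[x]/f'[x];   (* Newton map for the question's cubic *)
NestList[g, 1.081792, 6]
(* roughly {1.0818, 2.2154, 1.6939, 1.0818, 2.2154, 1.6939, 1.0818} -- every third step lands back near the start *)

(The cycle does not appear to be attracting, so with enough further iterations rounding errors eventually push the orbit off it.)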

gnasher729
  • 10,113