17

I'm creating a program to find the real roots of any given polynomial. It takes a user-given polynomial as input and tries to find its roots. As an aid, mainly to check whether I've missed any afterwards, I'd like to know beforehand how many real roots the polynomial has.

I know nothing about the given polynomial in advance. The only thing I can do with it is compute its value for a given $x$; I do not even know its degree.

The only lead I've found on this matter is Sturm's theorem, but I don't see how it can be applied in this program.

I understand that, given this constraint, it may sound impossible, but since so many mathematical algorithms work solely on this functionality, I thought there might be one that could solve this problem that I'm simply unaware of.

disnoca
  • 189
  • 2
    The number of real roots, or complex roots? – Arthur Feb 05 '23 at 23:30
  • In some cases, you can find out by computing the discriminant. For a degree $n$ polynomial with complex coefficients, if the discriminant is non-zero, the polynomial will have $n$ complex roots. – Joshua Tilley Feb 05 '23 at 23:43
  • More generally, you will just have to find the roots and count them. – Joshua Tilley Feb 05 '23 at 23:45
  • @Arthur Real roots, I've edited the question to clarify – disnoca Feb 05 '23 at 23:50
  • @JoshuaTilley The problem is that I can never know if I've found all the roots or if there are any missing that I'm unaware of, hence the question. – disnoca Feb 06 '23 at 00:00
  • 4
    How can you evaluate the polynomial without knowing its degree? – Joshua Tilley Feb 06 '23 at 00:02
  • 3
    Sturm's theorem seems like the right tool for this. It gives an algorithm for solving this problem. What part of Sturm's algorithm are you having difficulty implementing? – Ted Feb 06 '23 at 00:24
  • 1
    Give a clear example of your input and desired output. – NoChance Feb 06 '23 at 01:40
  • @Ted I apologize. I wasn't clear with the behaviour of my program. I've updated the question. I'm not programming for a specific function that I know beforehand, I'm programming for any given user function. Sturm's theorem has various steps that can't be coded into a program, such as finding the derivative expression and doing calculations between polynomials. – disnoca Feb 06 '23 at 02:13
  • @JoshuaTilley I have an algorithm in my program that transforms a user input polynomial into a function that returns the result of its expression given the value for its variable. – disnoca Feb 06 '23 at 02:16
  • "The only thing I can do with it is calculate the value for a given x." How could that be the case, if you say you are the taking the polynomial as input from the user? How is the user giving you the input? Presumably the user is giving you the coefficients of the polynomial. If the user gives you the coefficients, you have enough information to find the derivative, do long division, and all the other steps in Sturm's algorithm. You mimic the steps in the computer that you would do by hand. – Ted Feb 06 '23 at 03:08
  • 1
    Related: https://math.stackexchange.com/questions/825541/find-the-polynomial-if-we-know-its-output – mr_e_man Feb 06 '23 at 03:14
  • 2
    You wrote "there are so many mathematical algorithms that work solely based on this functionality," but you omitted any specific cases. Indeed if we are limited to evaluating a function (at arbitrary real arguments), we are not in any position to tell whether the given function is actually a polynomial function. The whole problem statement lacks context that would illuminate why you wanted to do what you describe. – hardmath Feb 06 '23 at 03:58
  • 1
    @Ted The user types in a polynomial and it goes through an algorithm called Shunting yard algorithm that can parse arithmetical expressions. This part of the problem is more geared towards programming than it is towards math, which is why I chose to not mention it and just say that the only thing I can do is evaluate the polynomial for a given variable. – disnoca Feb 06 '23 at 13:41
  • So you construct the computational graph of the polynomial, you know the polynomial as polynomial, you can evaluate with dual numbers and truncated Taylor series, you can compute (an upper bound for) the degree, also the full coefficient sequence of its expansion at any point. Of course the graph can be more compact in memory than the coefficient sequence. So you can apply the Budan-Fourier-Vincent (or Vincent-Akritas) method, or the heavier Sturm method. // Alternative to the algebraic method are (bisection-)exclusion methods that find a collection of intervals without roots. – Lutz Lehmann Feb 07 '23 at 08:48
  • @hardmath The keyword is reverse-communication. The main subroutine (eigensolver, root finder, time-integrator, etc.) asks the user to perform a specific calculation on specific input and return it at a specific address. Example: If the main routine is a Krylov subspace method for solving a linear system, then it will regularly ask the user to compute the action of the linear map or the preconditioner. This allows the user to provide subroutines that are optimized for the specific operators. – Carl Christian Feb 07 '23 at 10:02

7 Answers

36

My understanding of the question is that an algorithm is sought that will use the input polynomial as a black-box for computational function evaluation, but without knowing the nature of the polynomial or even its degree.

In this case, I claim, there can be no algorithm for determining the number of real roots.

To see this, suppose that we had such an algorithm. Apply it to the polynomial $x^2+1$, say. The algorithm must eventually return the answer of 0 real roots. In the course of its computation, it will make a number of calls to the black-box input function. But only finitely many. Thus, the answer of 0 real roots was determined on the basis of those finitely many input/output values of the function.

But there are many other polynomials that agree exactly on those input/output values, but which do have real roots. For example, let us imagine a new point $(a,0)$ where $a$ was not used in any of the black-box function calls during the computation. The finitely many points that were used, plus this new point $(a,0)$, determine an interpolating polynomial of degree at most the number of points, which has at least one real root. And the point is that the root-counting algorithm would behave exactly the same for this modified polynomial, because it agrees on the finitely many evaluated points, and so the computation would proceed just as for $x^2+1$. The root-counting algorithm would therefore still give the answer of zero real roots, but this answer would be wrong for this modified polynomial.

So there can be no such algorithm.
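The adversarial construction can be made concrete in a few lines. The sketch below is illustrative (the function name `fool`, the queried points, and the choice $a = 1$ are all hypothetical), using exact rational arithmetic:

```python
from fractions import Fraction

def fool(f, queried, a):
    """Return g(x) = f(x) - (f(a)/P(a)) * P(x), where P(x) = prod(x - x_i).
    Then g agrees with f at every queried point, but g(a) = 0."""
    def P(x):
        out = Fraction(1)
        for xi in queried:
            out *= x - xi
        return out
    c = f(a) / P(a)
    return lambda x: f(x) - c * P(x)

f = lambda x: x ** 2 + 1                            # no real roots
queried = [Fraction(-2), Fraction(0), Fraction(3)]  # points the algorithm saw
g = fool(f, queried, Fraction(1))                   # a = 1 was never queried

print([g(x) == f(x) for x in queried])  # [True, True, True]
print(g(Fraction(1)))                   # 0 -- g has a real root at 1
```

Any algorithm that only saw the queried points cannot tell `f` and `g` apart, yet they have different numbers of real roots.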

JDH
  • 44,236
  • 3
    Specifically, if you've evaluated $f(x_1),f(x_2),\cdots,f(x_n)$, then any polynomial of the form $g(x)=f(x)+(x-x_1)(x-x_2)\cdots(x-x_n)h(x)$, where $h$ is an arbitrary polynomial, has the same values at the same points: $g(x_1)=f(x_1),g(x_2)=f(x_2),\cdots,g(x_n)=f(x_n)$. So $f$ cannot be distinguished from $g$. – mr_e_man Feb 06 '23 at 03:02
  • 3
    Very nice argument. You can't even decide whether the polynomial is just constant with finitely many queries. – Metin Ersin Arıcan Feb 06 '23 at 07:20
  • In practice, the degree of $p$ can be determined by observing the behavior of $p(2x)/p(x)$ as $x$ tends to infinity. The limit is $2^n$ where $n$ is the degree. It is possible to distinguish beyond reasonable doubt between a sequence that converges to $2^6 = 64$ and a sequence that converges to $2^7 = 128$. Here it is useful to observe that the convergence will eventually be monotone. – Carl Christian Feb 08 '23 at 10:29
  • 2
    @CarlChristian The problem is how do you determine convergence. For any finite-computation convergence criterion, one can construct a polynomial that fools the machinery. – Hyperplane Feb 08 '23 at 16:31
10

While I appreciate the other answers given about theoretical foundations of decidability, I can't help but think that the original post - which has the "numerical-methods" tag on it - may have been looking for a different kind of answer: something simple that is good enough to get things done... approximately... most of the time... for real-world instances... using something like floating-point arithmetic... on polynomials which have fewer coefficients than the number of atoms in the universe.

So given these relaxed standards of success, here's a simple randomized algorithm that will do.

At a high level, the algorithm is this:

  1. "De-black box" the polynomial.
  2. Use e.g. Sturm's theorem or one of the methods here to determine how many real roots there are within some sufficiently large interval.

As you seem to already know about Sturm's theorem, I'll focus on the first part, which is de-black boxing the polynomial.

The first part is nontrivial, but really the basic idea is about as simple as it gets:

  1. Choose a random real number $r$. Evaluate your polynomial at $r$ to obtain $f(r)$. Add the pair $(r, f(r))$ to your set of measurements.
  2. Use Lagrange interpolation to build a model of whatever you think the polynomial currently is, based on your set of measurements so far. Call the resulting polynomial $\hat f$.
  3. Estimate whether $f = \hat f$ by trying $N$ randomly generated values $r_1, r_2, ..., r_N$ and checking whether $f(r_n) = \hat f(r_n)$ for all $n$. (Actually, even just setting $N = 1$ can be good enough, as we will see below.)
  4. If any of them aren't equal, you go back to step 1 and iterate with a new measurement.
  5. If they are all equal, we declare them to have passed the test and be "probably equal," and thus the algorithm has completed.

The last part is the famous Schwartz-Zippel algorithm, which solves the black-box Polynomial Identity Testing problem in randomized polynomial time.
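Here is a minimal sketch of the loop above, assuming the black box returns exact rational values so interpolation is exact; the helper names, degree cap, and trial count are illustrative choices, not part of any library:

```python
from fractions import Fraction
import random

def lagrange_eval(points, x):
    """Evaluate the Lagrange interpolant through `points` at `x`."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def deblackbox(f, trials=3, max_points=64):
    """Add measurements until the interpolant (probably) equals f."""
    rand = lambda: Fraction(random.randrange(1, 2 ** 32))
    x0 = rand()
    points = [(x0, f(x0))]
    while len(points) <= max_points:
        # Schwartz-Zippel-style check at `trials` fresh random points
        if all(lagrange_eval(points, r) == f(r)
               for r in (rand() for _ in range(trials))):
            return points                    # declared "probably equal"
        x = rand()
        points.append((x, f(x)))             # mismatch: one more measurement
    raise ValueError("exceeded the assumed degree bound")

# Hypothetical black box: p(x) = x^3 - 2x + 1
points = deblackbox(lambda x: x ** 3 - 2 * x + 1)
print(len(points) - 1)  # recovered degree: 3
```

With the interpolation points in hand, the polynomial is no longer a black box and Sturm's theorem can be applied to it directly.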

There are a few things that need to be addressed:

First, from a "pure math" standpoint, there are a few things I've handwaved above in making this rigorous. What model of computation are we using? What does "random" mean? Are we assuming some physical way to generate infinite-precision "random" real numbers and compute stuff with them?

The short answer is that the Schwartz-Zippel lemma makes this all rigorous. We do not need magical infinite-precision random real numbers at all. All we need is a sufficiently large, finite set of values to sample inputs from, which can be rational numbers or integers or anything else. For instance, we can draw randomly from the set of all reals in $[0, 1)$ with at most 64 bits after the binary point. Technically, all of these values are rational numbers. The Schwartz-Zippel lemma tells us that even just choosing one number from this set is sufficient to tell $f(x)$ and $\hat f(x)$ apart: if the two are truly different, the chance you have drawn a real number $r$ for which $f(r) = \hat f(r)$ is no greater than $d/2^{64}$, where $d$ is the degree of the polynomial. So as long as we have $d \ll 2^{64}$, we're good to go. And repeating the test decreases our chance of a false positive exponentially.

This also addresses JDH's point above, which in the setting of this answer applies to step 5 (the Schwartz-Zippel step). There is always a chance of "false positives" here, and we can only make them "exponentially unlikely" if we have some idea in advance of what we expect the maximum degree of our polynomial to be. However, it is noteworthy that, for any given false-positive probability, the tolerable maximum degree increases exponentially with the number of bits of precision we use. So as soon as we get to a modest amount of precision we'll be alright, since the black boxes we evaluate in practice aren't "exponentially large" polynomials with more coefficients than atoms in the universe.

Lastly, while many of the "pure math" objections will tend to focus on step 5, in the "real world" one will tend to run into issues primarily in step 2. The real problem with this algorithm isn't that the notion of "randomness" isn't well-defined, but that Lagrange interpolation can be numerically unstable. Floating point roundoff error can easily ruin the location of the zeros even for polynomials with degree as low as 8.

It's a bit hard to address this because the extent to which this is a problem depends on what format the black-box polynomials will be in.

  • Are the coefficients being stored using floating point arithmetic, where things are being quantized and rounded off after each polynomial evaluation?
  • Or can we make it so we're using a computational algebra system where coefficients are stored using exact arbitrary-length rational values, so this isn't a problem?
  • Or something else?

There are several possible ways of dealing with this issue - doing stable Lagrange interpolation in the presence of floating-point error is kind of a whole other question in and of itself, but as long as you have sufficient flexibility with the precision and representation of the arithmetic, you'll be alright.

  • 1
    As per OP's last comment, the input is actually an expression string that gets parsed, apparently with direct evaluation instead of an intermediate AST. However, with direct evaluation using more complex algebraic data structures one can also systematically evaluate the coefficient sequence, it is a question of where the effort is spent to "un-blackbox". – Lutz Lehmann Feb 07 '23 at 08:53
  • +1 I appreciate your pragmatic approach. I added an answer that explains how to identify the degree of the polynomial by observing its growth. – Carl Christian Feb 08 '23 at 10:09
9

tl;dr: You can compute every coefficient of a computable complex polynomial with just black-box access, and yet root counting is impossible, although JDH's argument doesn't suffice for computable polynomials.

JDH's answer seems a bit incomplete to me. It's spot on, and gets at the crux of the issue assuming the black-box polynomial is over $\mathbb{Q}$ (takes in rational inputs and gives rational outputs), but the computable number case is more interesting.

If the black-box polynomial takes in a computable real as input and gives a computable real as output, then I believe JDH's argument doesn't hold. Specifically, the behavior of JDH's modified polynomial that passes through $(a, 0)$ will potentially differ from that of the original polynomial – it might have to inspect its input more closely than the original polynomial did, and your root counting algorithm can potentially take advantage of this.

To see that JDH's argument is insufficient if the polynomial operates on computable reals, observe that their argument would seem to also imply that integrating a computable function with only black-box access would be impossible. After all, how do you know if you're integrating $f$, as opposed to $g$, a function that happens to agree with $f$ on every point you evaluated? And yet, incredibly, integrating computable functions with only black-box access is possible.

Unfortunately, root counting is still impossible, because $x^2 + a$ has a number of real roots that's discontinuous in $a$, and all computable functions are continuous. So JDH's conclusion remains true, even if the original argument wasn't quite sufficient.

However, other properties of your polynomial are computable, even with just black-box access.

In particular, if we require that the polynomial also take in complex inputs, then every coefficient of the secret polynomial is computable given only black-box access!

Differentiation of a computable differentiable function $\mathbb{C} \to \mathbb{C}$ is computable (note that differentiation over $\mathbb{R}$ is not computable!), and therefore each term in the polynomial's Maclaurin series is computable.

Therefore, given just black-box access to a secret polynomial over the computable complex numbers, one can construct an algorithm that, given $n$, emits a computable number that is exactly the polynomial's $x^n$ coefficient. This doesn't suffice to root count, as per the above, but maybe it's helpful to know that black-box access lets you do more than one might expect?
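For a rough numerical illustration of the complex-input claim (a floating-point sketch, not the computable-analysis construction itself): sampling the black box at the $N$-th roots of unity and taking an inverse DFT recovers the coefficients, provided $N$ exceeds the unknown degree.

```python
import numpy as np

def coefficients(f, N=16):
    """Recover the first N Maclaurin coefficients of a polynomial black box
    by sampling it on the unit circle (a discrete Cauchy integral)."""
    w = np.exp(-2j * np.pi * np.arange(N) / N)   # N-th roots of unity
    return np.fft.ifft(f(w))                     # inverse DFT of the samples

# Hypothetical black box: p(z) = z^3 - 2z + 1
coeffs = coefficients(lambda z: z ** 3 - 2 * z + 1)
print(np.round(coeffs.real, 6)[:4])  # ~ [1, -2, 0, 1]
```

If $N$ is at most the degree, the coefficients alias; in the black-box setting one would pair this with a separate degree bound.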

(Much of the above was pointed out to me by Shachaf Ben-Kiki.)

  • I struggle to understand your use of the word computable. In particular, you say that all computable functions are continuous. Why is the sign function not computable? Would you develop the term computable a bit? – Carl Christian Feb 08 '23 at 11:14
  • @CarlChristian You can read about that in quite a number of places, e.g. https://math.andrej.com/2006/03/27/sometimes-all-functions-are-continuous/ or https://math.stackexchange.com/questions/1698088/are-all-computable-functions-continuous-or-vice-versa or https://en.m.wikipedia.org/wiki/Computable_real_function – Jean-Armand Moroni Feb 08 '23 at 20:40
  • @CarlChristian Consider an infinite sequence $a$ of zeroes and ones. Clearly, $x = \sum\limits_{i = 0}^\infty \frac{1}{(-3)^i} a_i$ is a real number, which is computable when $a$ is computable. Note that $x < 0$ if and only if the first one in $a$ occurs at an odd position, $x > 0$ if and only if the first one occurs at an even position, and $x = 0$ if and only if there aren't any ones at all. All three of these are undecidable. Intuitively, you have no way of knowing how many zeroes to examine before you conclude all the elements are zero. – Mark Saving Feb 20 '23 at 20:22
  • @MarkSaving My thanks. This is a good illustrative example. It reminds me of the table maker's dilemma in numerical analysis. But, why should one worry about computability in the context of the OP? The machine will deal with finite precision numbers and our task is to recognize the degree of the polynomial using these numbers. The difference between reals and finite precision numbers is taken into account when we consider say the conditioning of the polynomial at a specific real value. – Carl Christian Feb 20 '23 at 21:27
6

In practice, if you are happy with just being highly confident of a conclusion even though it's possible you got very unlucky, a probabilistic algorithm can start by finding the degree with a high level of certainty (but not 100%) as follows. Choose some $x_0$ randomly. And then let $x_i=x_0+i$.

  1. Evaluate $f(x_0)$ and $f(x_1)$. If they are equal, conclude it is highly likely the polynomial is constant. (You had to choose $x_0$ randomly!)

  2. Evaluate $f(x_2)$. Now you can find first differences $f(x_2)-f(x_1),f(x_1)-f(x_0)$. If they are equal, conclude it is highly likely the polynomial is linear.

  3. Evaluate $f(x_3)$. Now you can find second differences (the differences between adjacent first differences). If they are equal, conclude it is highly likely the polynomial is quadratic.

  4. Continue this way, computing $n$th differences. It is a polynomial, so eventually two adjacent $n$th differences are equal. If you find two adjacent $n$th differences that are equal, it is highly likely that you have an $n$th degree polynomial (if your seed $x_0$ was "randomly" chosen in some reasonable sense.)

If you are not satisfied, start over with a new random seed $x_0$. And start over again as many times as it takes to feel very certain you know the degree.

Once you think you know the degree, use linear algebra to find the coefficients with the same level of confidence (by evaluating at $n+1$ random inputs and obtaining $n+1$ linear equations in the $n+1$ unknown coefficients.)
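Steps 1-4 can be sketched as follows, assuming the black box returns exact integer values; the function name and the degree cap are illustrative:

```python
import random

def probable_degree(f, max_degree=64):
    """Guess deg(f) by differencing f(x0), f(x0+1), ... until two
    adjacent n-th differences agree."""
    x0 = random.randrange(10 ** 6)                # the random seed point
    values = [f(x0), f(x0 + 1)]
    for n in range(max_degree + 1):
        diffs = values
        for _ in range(n):                        # build the n-th differences
            diffs = [b - a for a, b in zip(diffs, diffs[1:])]
        if len(diffs) >= 2 and diffs[-1] == diffs[-2]:
            return n                              # "highly likely" degree n
        values.append(f(x0 + len(values)))        # one more evaluation
    raise ValueError("assumed degree bound exceeded")

print(probable_degree(lambda x: 3 * x ** 3 - x + 7))  # 3
```

Rerunning with fresh seed points, as suggested above, shrinks the chance that an unlucky $x_0$ made a higher-degree polynomial look lower-degree.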

So now you feel confident you have a formula for this polynomial. You can do a lot with that. For starters you can give a bounded interval that all real roots must fall in. Within that interval, you can put a bound on the derivative. You can then carve that interval into enough subintervals where you can be sure that between two positive (or two negative) outputs, the bound on the derivative implies it is impossible for this function to take value $0$. You will have some subintervals that are positive at one end and negative at the other. These could contain multiple roots, but you can keep subdividing until you'd feel incredibly unlucky to still have such a subinterval with multiple roots. And then you can offer a count of roots. (You could also do smarter things like similarly study the derivative first to know when it's not possible for one of these subintervals to have multiple zeros because the derivative can't change sign.)
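The subdivision idea in the last paragraph can be sketched crudely (the interval, grid size, and test polynomial are illustrative; a rigorous count needs the derivative bound described above):

```python
def count_sign_changes(f, lo, hi, pieces=10 ** 5):
    """Count sign changes of f over a uniform grid on [lo, hi]: a lower
    bound on the number of real roots there, and (given a derivative bound
    and fine enough subdivision) the exact count."""
    xs = [lo + (hi - lo) * i / pieces for i in range(pieces + 1)]
    ys = [f(x) for x in xs]
    return sum(1 for a, b in zip(ys, ys[1:]) if a * b < 0)

# x^3 - 3x + 1 has three irrational real roots, all inside [-10, 10]
print(count_sign_changes(lambda x: x ** 3 - 3 * x + 1, -10, 10))  # 3
```

Note that a grid value landing exactly on a root, or a double root that never crosses zero, would be missed; that is exactly the kind of case the derivative-bound refinement is meant to handle.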


If you try to actually code this imperfect, but good-enough-for-government-work algorithm, you will also have to address questions of precision. Like, does this black box express exact-value outputs? Probably not if it's anything remotely realistic. It is likely giving you something like floating point approximations. For the outline above, this will affect how and when you declare two values to be equal. And it will affect the linear algebra to find coefficients. And then it will affect all of the calculating at subinterval endpoints.

2'5 9'2
  • 54,717
  • +1 I appreciate the use of divided differences. I added an answer that explains how to identify the degree by observing the growth of the polynomial. – Carl Christian Feb 08 '23 at 10:17
2

There are hard limitations on what can be done in practice when the roots are ill-conditioned and/or the degree is huge. However, let us not forget that there is a subroutine that evaluates the polynomial in a reasonable amount of time and with an accuracy that is perceived as satisfactory!

Now let $p$ denote your polynomial. We recover the (unknown) degree of $p$ by observing the behavior of the function $f$ given by $$f(x) = \frac{p(2x)}{p(x)}.$$ It is straightforward to verify that $$f(x) \rightarrow 2^n, \quad x \rightarrow \infty, \quad x \in \mathbb{R}$$ where $n$ is the degree of $p$. Moreover, the convergence will eventually be monotonic. Example: If $p(x) = x^2 + 1$, then $$f(x) = \frac{4x^2 + 1}{x^2+1} = 4 \frac{1 + \frac{1}{4}x^{-2}}{1 + x^{-2}} \rightarrow 4 = 2^2, \quad x \rightarrow \infty, \quad x \in \mathbb{R}.$$ Once $n$ is known, you recover $p$ using $n+1$ function evaluations and polynomial interpolation. Once $p$ is known, you proceed using either the Sturm sequence or methods based on Descartes' rule of signs.

Can this procedure be defeated? Certainly. The simplest attack is to trigger an overflow in the evaluation of $p$. Is there a defense? Yes. Augment the implementation of $p$ and scale to protect against overflow in every arithmetic operation. This procedure is standard in eigenvector computations and is used in LAPACK (sequential) and StarNEig (parallel). The procedure is equivalent to extending the exponent field of the floating-point numbers. Can Lagrange interpolation fail? Certainly. The conditioning of the relevant linear system can be arbitrarily bad, but it can be estimated. Is there a defense? Yes. Use extra arithmetic operations to emulate a sufficiently small unit roundoff.

Carl Christian
  • 12,583
  • 1
  • 14
  • 37
2

In a comment to the question, the OP explained that the input for this task is not a black box after all: it is a text string containing an expression for a polynomial $p$ in $\Bbb R[X]$ using arithmetic (incl. power) operations. This gets evaluated by providing a number as the value of $X$ and applying the standard "shunting yard" parser, likely with immediate evaluation of the scanned operations.

A slight generalization is now possible: one can evaluate with algebraic arguments other than plain scalars for $X$.

  • If $X$ is set to an instance of the degree arithmetic, where addition is implemented as maximum and multiplication as addition, with constants having degree zero and $X$ degree one, then the result of the evaluation of the string is an upper bound for the degree.

  • If $X$ is set to an object implementing truncated power series arithmetic, $X=a+t+O(t^N)$, then the result $q_N(t)$ of the evaluation contains the $N$ lowest-degree coefficients of $p(a+t)$.

  • If $N$ is set to one plus the degree bound, all coefficients of the shifted polynomial are obtained. With that all algebraic sign-variation algorithms (Budan-Fourier, Sturm) can be applied.

  • Given an additional radius $R$, one can get a qualitative bound for the truncated polynomial, so that $|p(a+t)-q_N(t)|\le C|t|^N$ for all $|t|\le R$. This allows one to apply bisection-exclusion methods with cheaper low-order evaluations, applying inner root-radius estimates for $q_N(t)-{\rm sgn}(q_N(0))C|t|^N$ if $q_N(0)\ne 0$ to find a root-free region around $a$ (where $[a-R,a+R]$ is a not yet excluded interval).
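The degree-arithmetic bullet is easy to sketch: the evaluator feeds an object with redefined operators through the parsed expression instead of a number. The class below is illustrative:

```python
class Deg:
    """Degree arithmetic: + and - take the max of the degrees, * adds them,
    ** multiplies by the exponent; plain constants have degree 0."""
    def __init__(self, d):
        self.d = d
    def __add__(self, other):
        return Deg(max(self.d, _deg(other)))
    __radd__ = __sub__ = __rsub__ = __add__
    def __mul__(self, other):
        return Deg(self.d + _deg(other))
    __rmul__ = __mul__
    def __pow__(self, k):
        return Deg(self.d * k)

def _deg(other):
    return other.d if isinstance(other, Deg) else 0

# Hypothetical parsed expression: (X^2 + 3)*(X - 1) + 7
X = Deg(1)
print(((X ** 2 + 3) * (X - 1) + 7).d)  # 3 -- an upper bound on the degree
```

The result is only an upper bound because cancellation (e.g. $X^2 - X^2$) can lower the true degree; the truncated-power-series evaluation in the next bullet recovers the exact coefficients.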

Lutz Lehmann
  • 126,666
1

A problem distinct from the one described by JDH is to distinguish $(x-a)^2+\epsilon$, with no real roots, from $(x-a)^2-\epsilon$, with two real roots. There is also the boundary case of $(x-a)^2$, with one double root. If $\epsilon$ is tiny relative to the working precision, it is very hard to tell the difference.

Ross Millikan
  • 374,822