4

How to compute $\exp(x)$ for $x \in [0, \ln(2)]$ in double precision with high accuracy, possibly at most $0.5$ ulp error?

It is possible to obtain low error: after testing a few libraries I found one with $1$ ulp accuracy. However, I am not able to find any reference that gives precise error bounds for the computed value.

Calculating $\exp(x)$ is not a trivial task. Although it is easy to obtain a fast-converging polynomial approximation of this function, for example the Taylor series, the roundoff errors occurring during evaluation of the polynomial are quite high. For example, if we want to evaluate a polynomial $p(x) = \sum_{i=0}^{N-1} a_ix^i$, then the a priori relative forward error is bounded as $$\left|\frac{\operatorname{fl}(p(x)) - p(x)}{p(x)}\right| \leq \operatorname{cond}(p,x)\times u+ O(u^2) $$ where $$\operatorname{cond}(p,x) = \frac{2N}{1-2Nu} \times \left|\frac{\sum_{i=0}^{N-1} |a_i||x|^i}{p(x)}\right| \geq 2N$$ and $u$ is the unit roundoff (half of the machine epsilon).

Therefore even if we have a polynomial approximation of $\exp(x)$ of order $5$ (which is hard to obtain by itself) with truncation error at most $0.5$ ulp, the roundoff error bound is still at least about $5$ ulp. However, $1$ ulp error is achievable.
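For illustration only, here is a rough C++ sketch (my own, not taken from any library) of how the a priori bound above plays out for the truncated Taylor polynomial of $\exp$ evaluated by Horner's rule; the degree $17$ is an arbitrary choice that makes the truncation error negligible on $[0, \ln 2]$:

```cpp
#include <cmath>
#include <cstdio>

// Evaluate the degree-17 Taylor polynomial of exp by Horner's rule and print
// the a priori relative error bound cond(p, x) * u from the inequality above.
int main()
{
    const int N = 18;                      // number of coefficients, degree N - 1
    double a[N];
    a[0] = 1.0;
    for (int i = 1; i < N; ++i)
        a[i] = a[i - 1] / i;               // a_i = 1 / i!

    const double u = std::ldexp(1.0, -53); // unit roundoff for binary64
    const double x = std::log(2.0);        // right end of the interval

    double p = 0.0, pabs = 0.0;
    for (int i = N - 1; i >= 0; --i) {
        p    = p    * x + a[i];            // Horner for p(x)
        pabs = pabs * x + std::fabs(a[i]); // Horner for sum |a_i| |x|^i
    }
    // Here all terms are positive, so pabs / p = 1 and cond ~ 2N ~ 36,
    // i.e. the a priori bound is far above 0.5 ulp even for a good polynomial.
    const double cond = (2.0 * N / (1.0 - 2.0 * N * u)) * (pabs / p);
    std::printf("p(ln 2) = %.17g, cond = %.3g, bound = cond * u = %.3g\n",
                p, cond, cond * u);
    return 0;
}
```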

My question is what kind of tricks can be used here to obtain high accuracy (without using higher-precision arithmetic). Is it even possible to obtain $0.5$ ulp error?

hardmath
  • 37,015
Pawel Kowal
  • 2,252
  • When you write \text{exp} rather than \exp, then you see $a\text{exp}b$ instead of $a\exp b$ and $a\text{exp}(b)$ instead of $a\exp(b).$ I am including both examples (with and without parentheses) so that you can see how the spacing depends on the context with \exp. \exp is standard usage. Similarly \operatorname{cond} rather than \text{cond} for the same reason. – Michael Hardy Nov 16 '17 at 01:40
  • I'd use scaling and squaring: divide $x$ with some power of $2$ such that the result, when fed into a truncated series or Padé approximant, gives an accurate result, and then square repeatedly. – J. M. ain't a mathematician Nov 16 '17 at 02:52
  • @J. M. is not a mathematician Do you have a reference? Padé is inaccurate; each squaring multiplies the error by 3 (rough estimate). Why would it be accurate? – Pawel Kowal Nov 16 '17 at 02:59
  • I don't have a reference; what I did was to carry out a few experiments a long time ago. Since you're restricted to $[0,\log 2]$, you'll want to use a series or Padé approximant centered at $\frac12\log 2$. – J. M. ain't a mathematician Nov 16 '17 at 03:17
  • The OP refers to "compensated summation" in a Comment on my Answer. A gentle introduction to this topic is embedded in this blog post by Alex Sanchez-Stern, "Improving Accuracy: A Look at Sums" (Oct. 2015). – hardmath Nov 17 '17 at 01:15

3 Answers

4

The computation of the exponential function with arbitrary accuracy has been discussed previously. The problem posed here is not so much the choice of an approximation as it is the choice of evaluation schemes with good numerical conditioning.

The conditioning of polynomial approximations on a bounded interval such as $[0,\ln 2]$ is good: you are summing a finite series of positive terms, so summing from smallest to largest keeps the accumulation of rounding errors small.
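For illustration only, here is a plain double-precision C++ sketch of that summation order (the degree $17$ and the function name are my own arbitrary choices; this is not a recipe for correct rounding):

```cpp
// Sum the truncated Taylor series of exp, smallest terms first.
// Plain double precision; this limits, but does not eliminate, rounding error.
double exp_taylor_small_to_large(double x, int n = 17)
{
    double terms[32];
    terms[0] = 1.0;
    for (int i = 1; i <= n; ++i)
        terms[i] = terms[i - 1] * x / i;   // t_i = x^i / i!

    double sum = 0.0;
    for (int i = n; i >= 0; --i)           // add from smallest to largest
        sum += terms[i];
    return sum;
}
```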

If you want to do better than $1$ ulp accuracy in a double precision "computation", you necessarily need higher than double precision accuracy in the intermediate results and care in rounding to a final double precision result. This is true for basic arithmetic operations, and especially so for the exponential function, where essentially the only exact value that can be represented is $\exp(0) = 1$.

An alternative is to avoid "computation" and resort to a gigantic lookup table. Effectively one precomputes the correctly rounded results and stores them for retrieval at runtime. A mixed strategy of using short computations over a moderately large number of short subintervals has been explored by several authors.

A key idea that has been proposed is that we can do better than exhaustive search in finding the difficult-to-round cases.

References

In The Table Maker's Dilemma (1998) Lefèvre, Muller, and Tisserand wrote:

The Table Maker's Dilemma is the problem of always getting correctly rounded results when computing the elementary functions. After a brief presentation of this problem, we present new developments that have helped us to solve this problem for the double-precision exponential function in a small domain.

Some more recent exposition by two of these same authors is offered as Introduction to the Table Maker's Dilemma, last modified in 2003, and focuses on the search for "worst cases", i.e. arguments $x$ where $f(x)$ falls extremely close to the midpoint of two machine-representable values. A superset of authors contributed to a book chapter for Handbook of Floating-Point Arithmetic in Solving the Table Maker's Dilemma (2010).

Tips and Tricks

Note that any absolute rounding error present in an input value is scaled by the derivative of the exact exponential function, so in the output it is conserved as a relative error. For simplicity we will henceforth treat the input value $x$ for $\exp(x)$ as if it were an exact IEEE 754 double-precision value.

With care you can approach correct rounding, i.e. $1/2$ ulp accuracy (in a double precision answer), but why bother? The OP responded:

Why am I interested in this problem? Firstly, I wrote a library whose correctness depends on high accuracy of log and exp (symmetric level index representation of doubles), and I don't have time to check its correctness for all possible values of doubles in the considered region. Secondly, I found a library with lower error and higher efficiency than the VS C++ std implementation, but I don't understand why it works. And I can't find any error analysis of the exp function.

As broached in my responding Comment, implementing a mathematical library for a specific CPU architecture (such as x86) allows one to optimize code at an assembly language level for best accuracy ("lower error") and speed ("higher efficiency"). Pursuit of this topic is outside the scope of Math.SE, but as a starting point I refer you to the Intel Floating Point Instruction Set and F2XM1 in particular.

What is within the scope of Math.SE is discussion of algorithms and in generic terms the trade-off implications for speed vs. accuracy. High-level languages such as C allow a variety of algorithms, and it makes sense to consider "error analysis" and operation complexity as a proxy for real world performance (accuracy and speed).

The CORDIC algorithm for the exponential function is attractive because it reduces the intermediate operations to additions and shifts, whose round-off errors are easily modeled. By carrying extra "guard bits", the computation can be pushed as close to correct rounding as we wish. However the number of operations (complexity) is roughly linear in the number of bits (including guard bits) required.
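As a rough C++ sketch of this shift-and-add idea (illustration only: the stage count of 31, the function name, and the final $\exp(x)\approx 1+x$ correction are my own choices, and a real CORDIC-style routine would work in fixed point with guard bits and with the constants $\ln(1+2^{-k})$ stored to extra precision):

```cpp
#include <cmath>

// Shift-and-add scheme in the spirit of CORDIC for exp on [0, ln 2].
// Not correctly rounded; it only shows how the work reduces to adds and shifts.
double exp_shift_add(double x)
{
    double y = 1.0;
    for (int k = 0; k <= 30; ++k) {
        const double lk = std::log1p(std::ldexp(1.0, -k)); // ln(1 + 2^-k)
        while (x >= lk) {            // use the factor (1 + 2^-k) while it fits
            x -= lk;
            y += std::ldexp(y, -k);  // y *= (1 + 2^-k): an add and a shift
        }
    }
    return y * (1.0 + x);            // residual x < 2^-30, so exp(x) ~ 1 + x
}
```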

A classical analysis of optimal approximation of the exponential function would begin with a polynomial approximation on a bounded interval. The truncated Taylor series has its greatest accuracy in (a small neighborhood of) $x=0$. For an extended interval such as $[0,\ln 2]$ one asks for the best uniform (minimax) accuracy by applying the Remez algorithm.

That said, I'd recommend using best rational approximations rather than polynomial approximations. The rational approximations can cheaply impose the symmetry of the exponential function:

$$ \exp(-x) = \frac{1}{\exp(x)} $$

Details (which involve Bernoulli numbers) can be found here.

The Padé approximant of appropriate degree (mentioned by J. M. is not a mathematician) is often used as a starting value for the rational version of the Remez algorithm.
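For instance (purely as an illustration of the symmetry, not of the target accuracy), the diagonal $(2,2)$ Padé approximant of $\exp$ at $0$ has the form $P(x)/P(-x)$ with $P(x) = 12 + 6x + x^2$, so the approximant itself satisfies $\exp(-x) = 1/\exp(x)$ exactly; a C++ sketch:

```cpp
// (2,2) Padé approximant of exp at 0. Its truncation error on [0, ln 2] is far
// larger than 0.5 ulp, so this only shows the structure a rational scheme uses.
double exp_pade22(double x)
{
    const double p = 12.0 + x * (6.0 + x);   // P(x)
    const double q = 12.0 - x * (6.0 - x);   // P(-x)
    return p / q;
}
```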

Regardless of whether the approximating function is polynomial or rational, a reduction of the operation count is possible by subdividing the interval of approximation $[0,\ln 2]$ into suitably small pieces. In the limit one imagines a lookup table (see this previous Question)! Short of this, one exerts effort to obtain pre-computed values of the exponential function which are especially close to their floating-point representations. Using these as endpoints of the subintervals yields economical computations of known accuracy (fewer operations make the error analysis easier and more conservative).
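A C++ sketch of this table-driven idea (the parameters are hypothetical: $32$ uniform subintervals and a degree-$4$ Taylor polynomial; a production implementation would store the table entries to extra precision, e.g. as double-doubles, and would pick breakpoints whose exponentials round especially well):

```cpp
#include <array>
#include <cmath>

double exp_table_driven(double x)            // x in [0, ln 2]
{
    constexpr int N = 32;                    // number of subintervals (hypothetical)
    static const double h = std::log(2.0) / N;
    static const std::array<double, N + 1> table = [] {
        std::array<double, N + 1> t{};
        for (int k = 0; k <= N; ++k)
            t[k] = std::exp(k * std::log(2.0) / N);  // exp at the breakpoints
        return t;
    }();

    int k = static_cast<int>(x / h);         // x = k*h + r with 0 <= r < h
    if (k > N) k = N;
    const double r = x - k * h;
    // Degree-4 Taylor polynomial for exp(r); truncation ~ r^5/120 < 5e-11,
    // so this sketch is nowhere near 0.5 ulp -- it only shows the structure.
    const double er = 1.0 + r * (1.0 + r * (0.5 + r * (1.0 / 6 + r * (1.0 / 24))));
    return table[k] * er;
}
```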

hardmath
  • 37,015
  • It is not so easy. Even for compensated summation with all elements exact and positive, the error is at least 1 ulp (see Higham, for example). But in this case we have additional error resulting from computing the $i$-th element of the sum. Simple summation, even if all elements are positive and sorted, gives an error of order $N$ ulp. – Pawel Kowal Nov 16 '17 at 02:09
  • In this case we are not summing $N$ arbitrary (positive) numbers. Worst case rounding error for $1 + x$ with $0 \le x \lt 1/4$ will be less than $1$ ulp if done properly. But you've not said anything to explain what real benefit the fractional improvement on $1$ ulp is expected to provide. – hardmath Nov 16 '17 at 02:34
  • It is interesting, could you give me some reference? It is possible that actual errors are lower, but how to prove this? Why am I interested in this problem? Firstly, I wrote a library whose correctness depends on high accuracy of log and exp (symmetric level index representation of doubles), and I don't have time to check its correctness for all possible values of doubles in the considered region. Secondly, I found a library with lower error and higher efficiency than the VS C++ std implementation, but I don't understand why it works. And I can't find any error analysis of the exp function. – Pawel Kowal Nov 16 '17 at 02:48
  • If you were writing a replacement for Microsoft's Visual Studio C++ math library, you would have the advantage of targeting the x86 platform, which has floating point (x87) registers that extend the mantissa size to $64$ bits. Error analysis is a fine mathematical topic but it hardly compensates for lack of thorough software design and testing. I'll update my Answer with some mathematical notes, but the Windows floating point environment is more suitable for StackOverflow. – hardmath Nov 16 '17 at 14:11
  • If $P$ and $Q$ in the approximation $P(x)/Q(x)$ (with small truncation error) are computed with accuracy $a$ ulp, then the total error is $2a + 0.5$ ulp. Is there an approximation where $P$ and $Q$ can be computed with twice the accuracy of the Taylor series? The approximation in the link (with Bernoulli numbers) is not well conditioned (over 2x higher condition number than for the Taylor series), so why is it better than Taylor? It is easy to find a mathematically good approximation. The problem is how to find a numerically very stable approximation, evaluation scheme, error compensation method, etc. – Pawel Kowal Nov 16 '17 at 23:25
  • I agree that the challenge is to find "very stable" numerical evaluation schemes. Even the problem of summing a set of positive numbers poses some interesting challenges. However you give the impression that you are asking for exact rounding ($\le 1$ ulp) while insisting on nothing better than double precision arithmetic for intermediate results, something that is simply unobtainable except by the pre-computed lookup table approach (which I mention). It is sensible to compare two methods and choose based on better error behavior. – hardmath Nov 17 '17 at 01:04
2

I have found one nice formula for the $\exp$ function: \begin{align} G(x) &= \frac{\operatorname{coth}(\sqrt x/2)}{\sqrt x} - \frac{2}{x} \tag{1}\\ c &= x - G(x^2) \times x^2 \tag{2}\\ \exp(x) &= 1 + \left(\frac{xc}{2 - c} + x\right) \tag{3} \end{align}

This formula was used for the interval $[-\ln(2)/2, \ln(2)/2]$, but that is also good here (the transformation between this interval and $[0, \ln(2)]$ can be done without introducing errors). Unfortunately I don't have any reference for this scheme; I found it by "reverse engineering" a library's source.

The function $G(x)$ is nearly linear on this interval. It can be accurately approximated by a polynomial of order $4$ obtained with the Remez algorithm. Its Taylor expansion is $$G(x) \approx \frac{1}{6} - \frac{x}{360} + \frac{x^2}{15120} - \frac{x^3}{604800} + \frac{x^4}{23950080}$$ The condition number of this polynomial on the given interval is very good: $1.012\times\gamma_5$, where $\gamma_N = 2N/(1-2Nu)$.

Expression (2) cannot be computed accurately. However, (3) suppresses most of the numerical errors. For example, the condition number of (3) with respect to $c$ is $0.0619$, so the error in computing $c$, $\delta_c$, contributes to the total error as $0.0619\times\delta_c$. Similarly $xc/(2 - c)$ is inaccurate, but the error in computing this subexpression ($\delta_{xc}$) contributes to the error of $xc/(2 - c) + x$ approximately as $0.2\times\delta_{xc}$, which is again suppressed (roughly by a factor of two) in the final sum (3).
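For concreteness, here is a rough C++ sketch of (1)-(3), using the Taylor coefficients of $G$ quoted above as a stand-in for the Remez-optimized coefficients (which differ slightly), so no sub-ulp claim is made for this exact code:

```cpp
// Scheme (1)-(3) for x in [-ln(2)/2, ln(2)/2]; the coefficients below are the
// Taylor coefficients of G, standing in for the slightly different Remez ones.
double exp_coth_scheme(double x)
{
    const double x2 = x * x;
    // degree-4 polynomial approximation of G(x^2), evaluated by Horner's rule
    const double g = 1.0 / 6 + x2 * (-1.0 / 360 + x2 * (1.0 / 15120
                   + x2 * (-1.0 / 604800 + x2 * (1.0 / 23950080))));
    const double c = x - g * x2;             // step (2)
    return 1.0 + (x * c / (2.0 - c) + x);    // step (3)
}
```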

The final error of computing $\exp$ by (1)-(3) is less than $1$ ulp (the maximum error found was $0.83$ ulp).

It is possible to compute $\exp(x)$ even with $0.5$ ulp error using the compensated Horner scheme; see "Faithful Polynomial Evaluation with Compensated Horner Algorithm", P. Langlois, N. Louvet, 2006, for details. This scheme must, however, be corrected to take into account errors in the computed coefficients of the approximating polynomial. As the polynomial approximation of $\exp(x)$ we can use the Taylor expansion of order $27$. Probably a lower-order approximation suffices; in tests I was not able to find a case with error above $0.5$ ulp even for an approximation of order $20$. Moreover this scheme is quite fast, at most $8$ times slower than the library implementation of $\exp$.
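A minimal C++ sketch of the compensated Horner scheme (following Langlois and Louvet, with the exact product obtained via FMA; the correction for inexact polynomial coefficients mentioned above is not included, and the function name and interface are my own):

```cpp
#include <cmath>

// Compensated Horner evaluation of sum a[i] * x^i, i = 0..n.
// For exp one would take a[i] = 1/i! (the truncated Taylor series).
double comp_horner(const double* a, int n, double x)
{
    double s = a[n];   // value part
    double r = 0.0;    // running compensation (Horner sum of rounding errors)
    for (int i = n - 1; i >= 0; --i) {
        const double p  = s * x;
        const double pi = std::fma(s, x, -p);         // exact error of s*x
        const double t  = p + a[i];
        const double z  = t - p;
        const double sg = (p - (t - z)) + (a[i] - z); // exact error of p + a[i]
        s = t;
        r = r * x + (pi + sg);                        // accumulate the errors
    }
    return s + r;                                     // compensated result
}
```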

Pawel Kowal
  • 2,252
  • This seems to be based on Hans J. Maehly, "Approximations for Control Data 1604." Control Data Corporation, technical report, March 1960. On p. 12: $e^{x}= 1 + \frac{2x}{S(x^2)-x} = \frac {S(x^2)+x} {S(x^2)-x}$ where $S(x^2) = x\ \mathrm{coth} \left( \frac{x}{2} \right)$. Now, $S(x^2) \approx 2 + \frac{x^2}{6} - \frac{x^4}{360} + \frac{x^6}{15120} -\ldots$. Maehly uses a continued fraction instead of polynomial. While this is interesting (+1) I do not find it to be superior to a minimax polynomial approximation evaluated with an FMA-enhanced Horner scheme (neither accuracy nor performance). – njuffa Aug 19 '23 at 08:42
0

Using continued fractions should give you much better convergence, especially in the complex plane. While the Taylor series for the exponential function will eventually converge, for large $x$ it is inefficient. The convergents of a continued fraction alternate between being too big and too small as they home in on the value, so you get a built-in bracket for the result.

$$ e^x = 1 + \cfrac{x}{1 + \cfrac{x}{-2 + \cfrac{x}{-3 + \cfrac{x}{2 + \cfrac{x}{5 + \cfrac{x}{-2 + \cfrac{x}{-7 + \cfrac{x}{2 + \cfrac{x}{9 + \cfrac{x}{-2 + \cfrac{x}{-11 + \cdots}}}}}}}}}}} $$ or, more compactly, $$ e^x = 1 + \operatorname*{K}_{i=1}^{\infty} \frac{x}{c_i}, \qquad c_i = \begin{cases} (-1)^{(i-1)/2}\, i & \text{if } i \text{ is odd,} \\ (-1)^{i/2}\, 2 & \text{if } i \text{ is even,} \end{cases} $$ so the partial denominators run $1, -2, -3, 2, 5, -2, -7, 2, 9, -2, -11, \ldots$

While these might look a bit intimidating to evaluate at first, the fundamental recurrence formulas (going back to John Wallis) give you an easy way to evaluate them efficiently: https://en.wikipedia.org/wiki/Generalized_continued_fraction
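A small C++ sketch of this (my own, in plain double precision, so the accuracy is ordinary rather than correctly rounded): it evaluates the convergents $A_n/B_n$ with the fundamental recurrences $A_n = c_n A_{n-1} + x\,A_{n-2}$ and $B_n = c_n B_{n-1} + x\,B_{n-2}$, stopping when two successive convergents agree.

```cpp
// Evaluate the continued fraction for exp given above via its convergents.
double exp_cf(double x, int max_terms = 24)
{
    double A_prev = 1.0, B_prev = 0.0;   // A_{-1}, B_{-1}
    double A = 1.0,      B = 1.0;        // A_0 = b_0 = 1, B_0 = 1
    double conv = A / B;
    for (int i = 1; i <= max_terms; ++i) {
        const double c = (i % 2 == 1)
            ? ((i % 4 == 1) ?  double(i) : -double(i))  // odd i: +1, -3, +5, -7, ...
            : ((i % 4 == 2) ? -2.0       :  2.0);       // even i: -2, +2, -2, +2, ...
        const double A_next = c * A + x * A_prev;
        const double B_next = c * B + x * B_prev;
        A_prev = A; B_prev = B;
        A = A_next; B = B_next;
        const double next_conv = A / B;
        if (next_conv == conv) break;    // convergents have stabilized
        conv = next_conv;
    }
    return conv;
}
```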

KevB
  • 196