
I need confirmation of something that is probably silly.

Let $x$ be a floating-point number representable using $e$ bits for the exponent and $m$ bits for the mantissa, and let $f$ be an elementary function; if it helps, you can suppose $D(f) = A = [1,2)$ and $f(A) = [1,2)$, since any elementary function can be transformed so that it is defined on the intervals I just specified. Assume I've implemented an algorithm $\psi$ such that $\psi$ approximates $f$ in the same floating-point system.

I was wondering, except for trivial cases, what the MINIMUM accuracy achievable by such an algorithm is, whatever the algorithm may be. The answer should trivially be $0.5$ ulp, right?

My answer is motivated by the following:

The set of floating-point numbers I've defined is finite, so trivially I can implement the computation of $f$ as $\psi(x) = \circ(f(x))$, where $\circ(\cdot)$ denotes the rounding operation. So I can trivially sample the original function at all the floating-point numbers, round each result, and store it. Two situations can occur:

  1. $f(x)$ is a floating-point number, in which case the error is $0$; this is what I would call trivial.
  2. $f(x)$ is not a floating-point number, in which case rounding to nearest gives me $0.5$ ulp of accuracy. This is not trivial.

So because of this, the minimum accuracy I can achieve is $0.5$ ulp, right? What I'm looking for is a theoretical lower bound in the non-trivial situation.
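To make the argument concrete, here is a small sketch of my own (not from the question): it exhaustively tabulates $\psi(x) = \circ(f(x))$ for a toy binary format with an 8-bit mantissa on $[1,2)$, taking $f = \sqrt{\cdot}$ (whose image of $[1,2)$ stays inside $[1,2)$) and using double precision as a stand-in for the reals.

```python
import math

M = 8                      # mantissa bits of a toy binary format on [1, 2)
ULP = 2.0 ** -M            # spacing between consecutive toy floats in [1, 2)

def round_to_toy(y):
    """Round a real number in [1, 2) to the nearest toy float (ties to even)."""
    k = round((y - 1.0) / ULP)   # Python's round() breaks ties to even
    return 1.0 + k * ULP

# Exhaustively tabulate psi(x) = o(f(x)) for f = sqrt over the whole toy format.
lut = {}
worst = 0.0
for k in range(2 ** M):
    x = 1.0 + k * ULP
    y = math.sqrt(x)             # "true" value (double precision stands in for the reals)
    lut[x] = round_to_toy(y)
    worst = max(worst, abs(lut[x] - y) / ULP)

# Rounding to nearest guarantees the error never exceeds half the spacing.
print(f"worst-case error of the LUT: {worst:.3f} ulp")
```

The loop simply realizes the two cases above: when $\sqrt{x}$ lands on the grid the error is $0$, and otherwise round-to-nearest caps it at $0.5$ ulp by construction.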

user8469759
  • It depends on the rounding scheme; your rounding scheme could be such that it is 1 ulp. But the general argument seems okay. Also, I don't think I would call it the "min" accuracy; you are looking at something that's more like a max-min (maximum [over all elementary functions] of the minimum accuracy). – Willie Wong Jun 24 '16 at 13:52
  • But better than listing all the values I can't do, right? – user8469759 Jun 24 '16 at 13:56
  • I don't understand your last comment. Can you explain what you mean by "values I can't do"? – Willie Wong Jun 24 '16 at 13:58
  • Sorry xD, I meant... If I'm looking for the min possible ulp error I cannot do any better than implementing a LUT table, right? – user8469759 Jun 24 '16 at 14:00

1 Answer


I guess the following is what you are asking:

  1. The set of floating point numbers (let us write $\mathbf{FP}$ for it) with a given exponent and mantissa is finite.
  2. Thus the set of functions $\mathcal{F} = \{ f: \mathbf{FP} \to \mathbf{FP}\}$ with domain and range the floating point numbers is also finite.
  3. $\mathbf{FP}$ can be included in the reals $\mathbb{R}$ as a subset.
  4. Let $g: \mathbb{R} \to \mathbb{R}$ be a function.
  5. Define a function $d_g: \mathcal{F} \to \mathbb{R}$, which we call the "accuracy" (the actual definition is unimportant here!) of the floating point function $f$ compared to the real valued function $g$. There are various ways to define it, but you can write for example $$ d_g(f) = \max_{x\in \mathbf{FP}} \frac{|f(x) - g(x)|}{\max(|f(x)|, |g(x)|)} $$ with the convention that if $f(x) = g(x) = 0$ then the ratio evaluates to 0. Again I stress the actual definition $d_g$ is unimportant!
  6. Since $\mathcal{F}$ is a finite set, there exists (not necessarily uniquely), some function $f_0\in \mathcal{F}$ such that $d_g(f_0) \leq d_g(f)$ for any other $f\in \mathcal{F}$. In other words, regardless of how you define accuracy, as long as accuracy is a well-defined concept that can be measured by a real number, then there is a (but not necessarily uniquely) "most accurate" floating point representation of $g$.
  7. If you know what $g$ is, you can pre-compute $f_0$ and define, as you indicated, $g$'s floating-point representation by $f_0$ via, say, a look-up table.

Incidentally, another way to define $d_g$ is $$ d_g(f) = \sum_{x \in \mathbf{FP}} |f(x) - g(x)|. $$ You then see that if $f_0$ minimizes $d_g$, then for every other floating-point function $f$ and every floating-point number $x$ it must be that $$ |f(x) - g(x) | \geq |f_0(x) - g(x)|, $$ because the sum decomposes pointwise, so its minimizer must minimize each term independently. This is probably what you want in terms of "most accurate representation".
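As a sanity check (my own sketch, using a toy format with a 6-bit mantissa on $[1,2)$ and $g = \sqrt{\cdot}$ as the real-valued function), the pointwise construction $f_0(x) = {}$ the nearest floating-point value to $g(x)$ does minimize the sum metric term by term:

```python
import math

M = 6
ULP = 2.0 ** -M
GRID = [1.0 + k * ULP for k in range(2 ** M)]   # toy floats in [1, 2)

def g(x):
    return math.sqrt(x)   # stand-in real function with g([1,2)) inside [1,2)

def nearest(y):
    return min(GRID, key=lambda v: abs(v - y))   # round to the nearest toy float

# f0 minimizes the sum metric because the sum decomposes pointwise:
# each term |f(x) - g(x)| is minimized independently by rounding g(x).
f0 = {x: nearest(g(x)) for x in GRID}
d_f0 = sum(abs(f0[x] - g(x)) for x in GRID)

# Any other floating-point function can only do worse, term by term;
# here we perturb every output of f0 by one ulp as an example competitor.
f_alt = {x: f0[x] + ULP for x in GRID}
d_alt = sum(abs(f_alt[x] - g(x)) for x in GRID)
assert all(abs(f_alt[x] - g(x)) >= abs(f0[x] - g(x)) for x in GRID)
assert d_alt >= d_f0
```

The two assertions hold for any competitor $f$, not just this perturbed one, since $f_0(x)$ is by construction a nearest grid point to $g(x)$ at every $x$.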


In terms of whether you can do computationally better than implementing a look-up table: it really depends on the function. For example, if you look at the function $$ g(x) = \frac32 x, $$ it has no exact floating point representation. In fact, for half of the floating point numbers $g(x)$ is as far from having an exact floating point representation as possible. But there is an easy algorithm that does as well as the LUT: compute $x + x/2$, since halving is exact in binary floating point and the single addition rounds correctly.
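To illustrate (my own check, assuming IEEE-754 double-precision arithmetic): for $g(x) = \frac32 x$, the one-liner $x + 0.5x$ already returns the correctly rounded result, because multiplying by $0.5$ only decrements the exponent (exact) and the single addition rounds once. Comparing against exact rational arithmetic over many random inputs:

```python
import random
from fractions import Fraction

# Claim: psi(x) = x + 0.5*x computes a correctly rounded 3/2·x in one step,
# matching what a full look-up table would store, with no table at all.
# (0.5*x is exact in binary floating point; the single addition rounds once.)
random.seed(0)
for _ in range(100_000):
    x = random.uniform(1.0, 2.0)          # sample the domain [1, 2)
    via_algorithm = x + 0.5 * x           # one exact multiply + one rounded add
    exactly_rounded = float(Fraction(x) * Fraction(3, 2))  # round(3/2·x) via exact rationals
    assert via_algorithm == exactly_rounded
print("x + 0.5*x matches the correctly rounded 3/2·x on all samples")
```

Here `float(Fraction(...))` serves as the reference rounding: the rational product is exact, and the conversion back to a double rounds to nearest.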

Willie Wong