
I need confirmation of something that is probably silly.

Let $x$ be a floating-point number representable using $e$ bits for the exponent and $m$ bits for the mantissa, and let $f$ be an elementary function; if it helps, you can suppose $D(f) = A = [1,2)$ and $f(A) = [1,2)$, since any elementary function can be transformed so that it is defined on the intervals I just specified. Assume I've implemented an algorithm $\psi$ such that $\psi$ approximates $f$ in the same floating-point system.

I was wondering, except for trivial cases, what the MINIMUM accuracy achievable by such an algorithm is, whatever the algorithm may be. The answer should trivially be $0.5$ ulp, right?

My answer is motivated by the following:

The set of floating-point numbers I've defined is finite, so trivially I can implement the computation of $f$ as $\psi(x) = \circ(f(x))$, where $\circ(\cdot)$ denotes the rounding operation. So I can trivially sample the original function at all the floating-point numbers, round each result, and store it. Two situations can occur:

  1. $f(x)$ is a floating-point number, in which case the error is $0$; this is what I would call trivial.
  2. $f(x)$ is not a floating-point number, in which case rounding to nearest gives me $0.5$ ulp of accuracy. This is not trivial.

So because of this, the minimum accuracy I can achieve is $0.5$ ulp, right? What I'm looking for is a theoretical lower bound in the non-trivial situation.
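To make the argument concrete, here is a small sketch of my own (not from the question): it exhaustively tabulates $\psi(x) = \circ(f(x))$ for a toy binary format with an 8-bit mantissa on $[1,2)$, taking $f = \sqrt{\cdot}$ (whose image of $[1,2)$ stays inside $[1,2)$) and using double precision as a stand-in for the reals.

```python
import math

M = 8                      # mantissa bits of a toy binary format on [1, 2)
ULP = 2.0 ** -M            # spacing between consecutive toy floats in [1, 2)

def round_to_toy(y):
    """Round a real number in [1, 2) to the nearest toy float (ties to even)."""
    k = round((y - 1.0) / ULP)   # Python's round() breaks ties to even
    return 1.0 + k * ULP

# Exhaustively tabulate psi(x) = o(f(x)) for f = sqrt over the whole toy format.
lut = {}
worst = 0.0
for k in range(2 ** M):
    x = 1.0 + k * ULP
    y = math.sqrt(x)             # "true" value (double precision stands in for the reals)
    lut[x] = round_to_toy(y)
    worst = max(worst, abs(lut[x] - y) / ULP)

# Rounding to nearest guarantees the error never exceeds half the spacing.
print(f"worst-case error of the LUT: {worst:.3f} ulp")
```

The loop simply realizes the two cases above: when $\sqrt{x}$ lands on the grid the error is $0$, and otherwise round-to-nearest caps it at $0.5$ ulp by construction.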

user8469759
  • It depends on the rounding scheme; your rounding scheme could be such that it is 1 ulp. But the general argument seems okay. Also, I don't think I would call it the "min" accuracy; you are looking at something that's more like a max-min (maximum [over all elementary functions] of the minimum accuracy). – Willie Wong Jun 24 '16 at 13:52
  • But better than listing all the values I can't do, right? – user8469759 Jun 24 '16 at 13:56
  • I don't understand your last comment. Can you explain what you mean by "values I can't do"? – Willie Wong Jun 24 '16 at 13:58
  • Sorry xD, I meant... If I'm looking for the min possible ulp error I cannot do any better than implementing a LUT table, right? – user8469759 Jun 24 '16 at 14:00

1 Answer


I guess the following is what you are asking:

  1. The set of floating point numbers (let us write $\mathbf{FP}$ for it) with a given exponent and mantissa is finite.
  2. Thus the set of functions $\mathcal{F} = \{ f: \mathbf{FP} \to \mathbf{FP}\}$ with domain and range the floating point numbers is also finite.
  3. $\mathbf{FP}$ can be included in the reals $\mathbb{R}$ as a subset.
  4. Let $g: \mathbb{R} \to \mathbb{R}$ be a function.
  5. Define a function $d_g: \mathcal{F} \to \mathbb{R}$, which we call the "accuracy" (the actual definition is unimportant here!) of the floating point function $f$ compared to the real valued function $g$. There are various ways to define it, but you can write for example $$ d_g(f) = \max_{x\in \mathbf{FP}} \frac{|f(x) - g(x)|}{\max(|f(x)|, |g(x)|)} $$ with the convention that if $f(x) = g(x) = 0$ then the ratio evaluates to 0. Again I stress the actual definition $d_g$ is unimportant!
  6. Since $\mathcal{F}$ is a finite set, there exists (not necessarily uniquely), some function $f_0\in \mathcal{F}$ such that $d_g(f_0) \leq d_g(f)$ for any other $f\in \mathcal{F}$. In other words, regardless of how you define accuracy, as long as accuracy is a well-defined concept that can be measured by a real number, then there is a (but not necessarily uniquely) "most accurate" floating point representation of $g$.
  7. If you know what $g$ is, you can pre-compute $f_0$ and define, as you indicated, $g$'s floating-point representation by $f_0$ via, say, a look-up table.

Incidentally, another way to define $d_g$ is $$ d_g(f) = \sum_{x \in \mathbf{FP}} |f(x) - g(x)|. $$ You then see that if $f_0$ minimizes $d_g$, then for every other floating-point function $f$ and every floating-point number $x$ it must be that $$ |f(x) - g(x) | \geq |f_0(x) - g(x)|, $$ because the sum decomposes pointwise, so its minimizer must minimize each term independently. This is probably what you want in terms of "most accurate representation".
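As a sanity check (my own sketch, using a toy format with a 6-bit mantissa on $[1,2)$ and $g = \sqrt{\cdot}$ as the real-valued function), the pointwise construction $f_0(x) = {}$ the nearest floating-point value to $g(x)$ does minimize the sum metric term by term:

```python
import math

M = 6
ULP = 2.0 ** -M
GRID = [1.0 + k * ULP for k in range(2 ** M)]   # toy floats in [1, 2)

def g(x):
    return math.sqrt(x)   # stand-in real function with g([1,2)) inside [1,2)

def nearest(y):
    return min(GRID, key=lambda v: abs(v - y))   # round to the nearest toy float

# f0 minimizes the sum metric because the sum decomposes pointwise:
# each term |f(x) - g(x)| is minimized independently by rounding g(x).
f0 = {x: nearest(g(x)) for x in GRID}
d_f0 = sum(abs(f0[x] - g(x)) for x in GRID)

# Any other floating-point function can only do worse, term by term;
# here we perturb every output of f0 by one ulp as an example competitor.
f_alt = {x: f0[x] + ULP for x in GRID}
d_alt = sum(abs(f_alt[x] - g(x)) for x in GRID)
assert all(abs(f_alt[x] - g(x)) >= abs(f0[x] - g(x)) for x in GRID)
assert d_alt >= d_f0
```

The two assertions hold for any competitor $f$, not just this perturbed one, since $f_0(x)$ is by construction a nearest grid point to $g(x)$ at every $x$.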


In terms of whether you can do computationally better than implementing a look-up table: it really depends on the function. For example, if you look at the function $$ g(x) = \frac32 x, $$ it has no exact floating point representation. In fact, for half of the floating point numbers $g(x)$ is as far from having an exact floating point representation as possible. But there is an easy algorithm that does as well as the LUT: compute $x + x/2$, since halving is exact in binary floating point and the single addition rounds correctly.
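To illustrate (my own check, assuming IEEE-754 double-precision arithmetic): for $g(x) = \frac32 x$, the one-liner $x + 0.5x$ already returns the correctly rounded result, because multiplying by $0.5$ only decrements the exponent (exact) and the single addition rounds once. Comparing against exact rational arithmetic over many random inputs:

```python
import random
from fractions import Fraction

# Claim: psi(x) = x + 0.5*x computes a correctly rounded 3/2·x in one step,
# matching what a full look-up table would store, with no table at all.
# (0.5*x is exact in binary floating point; the single addition rounds once.)
random.seed(0)
for _ in range(100_000):
    x = random.uniform(1.0, 2.0)          # sample the domain [1, 2)
    via_algorithm = x + 0.5 * x           # one exact multiply + one rounded add
    exactly_rounded = float(Fraction(x) * Fraction(3, 2))  # round(3/2·x) via exact rationals
    assert via_algorithm == exactly_rounded
print("x + 0.5*x matches the correctly rounded 3/2·x on all samples")
```

Here `float(Fraction(...))` serves as the reference rounding: the rational product is exact, and the conversion back to a double rounds to nearest.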

Willie Wong