
I'm trying to understand the double-precision floating-point format.

As I understand it so far, a floating-point number is of the form $$ (-1)^s2^{c-1023}(1+f) $$ where $s \in \{0,1\}$ is the sign bit, $c$ is the 11-bit exponent, and $f$ is the 52-bit binary fraction.

Here is my question:

What is the largest floating point number?

I found an answer in Burden and Faires's Numerical Analysis, which I don't understand:

... the largest has $s=0$, $c=2046$, and $f=1-2^{-52}$ and is equivalent to $$ 2^{1023}(1+1-2^{-52}) $$

I don't see why $c=2046$. An exponent of 11 binary digits gives a range of $0$ to $2^{11}-1=2047$. Why is $c$ not $2047$?

1 Answer


In the IEEE-754 binary floating-point formats, a floating-point number with all exponent bits set (here $c = 2047$) is special: it is an infinity (positive or negative, according to the sign bit) if the mantissa bits are all $0$, and it is a NaN ("Not a Number") if any mantissa bit is nonzero.
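To see this concretely, here is a minimal Python sketch that assembles a double from its three bit fields and checks what an all-ones exponent produces. The helper name `double_from_fields` is made up for this illustration, and the NaN payload (`1 << 51`) is just one arbitrary nonzero choice of mantissa bits.

```python
import math
import struct


def double_from_fields(s: int, c: int, f_bits: int) -> float:
    """Build a double from sign bit s, 11-bit biased exponent c,
    and the raw 52-bit fraction field f_bits (hypothetical helper)."""
    bits = (s << 63) | (c << 52) | f_bits
    # Reinterpret the 64-bit integer as an IEEE-754 double.
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


# All exponent bits set (c = 2047), all mantissa bits zero: infinities.
pos_inf = double_from_fields(0, 2047, 0)
neg_inf = double_from_fields(1, 2047, 0)

# All exponent bits set, some mantissa bit nonzero: NaN.
nan = double_from_fields(0, 2047, 1 << 51)

print(pos_inf, neg_inf, nan)  # inf -inf nan
```

So $c = 2047$ never encodes an ordinary number of the form $(-1)^s 2^{c-1023}(1+f)$.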

So the largest finite double-precision floating-point number is $2^{1024} - 2^{971}$. It has all exponent bits set except the last (a biased exponent of $2046$, i.e. an unbiased exponent of $1023$) and all mantissa bits set (a significand of $2 - 2^{-52}$), so its value is $2^{1023}(2 - 2^{-52})$. Positive infinity is larger, but it is not a finite number.
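The value can be checked numerically. The following Python sketch builds the bit pattern with $s=0$, $c=2046$ and all 52 fraction bits set, then compares it exactly (using `fractions.Fraction`, which converts a float without rounding) against the formula from the question and against `sys.float_info.max`. The variable names are only illustrative.

```python
import struct
import sys
from fractions import Fraction

# s = 0, c = 2046, all 52 fraction bits set (f = 1 - 2^-52).
bits = (0 << 63) | (2046 << 52) | ((1 << 52) - 1)
largest = struct.unpack(">d", struct.pack(">Q", bits))[0]

# Exact rational value of 2^1023 * (1 + 1 - 2^-52).
exact = Fraction(2) ** 1023 * (2 - Fraction(1, 2 ** 52))

print(Fraction(largest) == exact)        # True
print(exact == 2 ** 1024 - 2 ** 971)     # True
print(largest == sys.float_info.max)     # True
print(largest)                           # 1.7976931348623157e+308
```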

The IEEE-754 formats also treat numbers with all exponent bits $0$ specially: these are zeros (if the mantissa bits are also all $0$) or denormalized numbers (subnormals), which have no implied hidden $1$ bit.
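The other end of the exponent range can be explored the same way. This is a rough Python sketch, again using a made-up helper `double_from_fields`; it assumes the usual subnormal value $(-1)^s 2^{-1022} f$ (no hidden $1$) and checks it against the smallest normalized number, which has $c = 1$.

```python
import math
import struct
import sys


def double_from_fields(s: int, c: int, f_bits: int) -> float:
    """Build a double from sign bit s, 11-bit biased exponent c,
    and the raw 52-bit fraction field f_bits (hypothetical helper)."""
    bits = (s << 63) | (c << 52) | f_bits
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


zero = double_from_fields(0, 0, 0)                         # c = 0, f = 0
smallest_subnormal = double_from_fields(0, 0, 1)           # 2^-1022 * 2^-52
largest_subnormal = double_from_fields(0, 0, (1 << 52) - 1)
smallest_normal = double_from_fields(0, 1, 0)              # 2^-1022 * (1 + 0)

print(zero)                # 0.0
print(smallest_subnormal)  # 5e-324
print(smallest_normal)     # 2.2250738585072014e-308
```

The largest subnormal sits exactly one step ($2^{-1074}$) below the smallest normalized number, so the two ranges join without a gap.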

Daniel Fischer