What types of numbers are representable in binary floating-point?

The rule can be summed up as this:

A number can be represented exactly in binary if the prime factorization of the denominator contains only 2 (i.e. the denominator, with the fraction in lowest terms, is a power of two).

So 1/(32 + 16) is not representable in binary because it has a factor of 3 in the denominator. But 1/32 + 1/16 = 3/32 is.
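As a quick check of the rule, here is a minimal Python sketch. The helper name `is_dyadic` is made up for illustration; it relies on `fractions.Fraction` always storing values in lowest terms.

```python
from fractions import Fraction

def is_dyadic(x: Fraction) -> bool:
    """Return True if the reduced denominator of x is a power of two."""
    d = x.denominator  # Fraction keeps the value in lowest terms, so d >= 1
    # A positive integer is a power of two exactly when it has one set bit.
    return (d & (d - 1)) == 0

print(is_dyadic(Fraction(1, 32 + 16)))               # 1/48, denominator 2^4 * 3
print(is_dyadic(Fraction(1, 32) + Fraction(1, 16)))  # 3/32, denominator 2^5

# Converting the float back to a Fraction shows whether anything was lost.
print(Fraction(3 / 32) == Fraction(3, 32))
print(Fraction(1 / 48) == Fraction(1, 48))
```

This only tests the denominator rule. It says nothing yet about whether the value fits in a particular floating-point type.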

That said, a floating-point type imposes further restrictions. For example, an IEEE double has only 53 bits of mantissa, so 1/2 + 1/2^500 is not representable.

So a sum of powers of two is representable as long as the exponents span no more than 53 consecutive powers (and stay within the exponent range of the type).
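The 53-bit limit can be seen with a round trip through Python's `float`, which is an IEEE double on common platforms. This is a sketch under that assumption, and `fits_in_double` is a hypothetical helper name.

```python
from fractions import Fraction

def fits_in_double(x: Fraction) -> bool:
    """Return True if x survives conversion to float and back unchanged."""
    return Fraction(float(x)) == x

half = Fraction(1, 2)

# Exponents -1 through -53 span exactly 53 bit positions: this fits.
print(fits_in_double(half + Fraction(1, 2**53)))

# Exponents -1 through -54 need 54 bits: the low bit is rounded away.
print(fits_in_double(half + Fraction(1, 2**54)))

# The example from the text: the 2^-500 term is lost entirely.
print(fits_in_double(half + Fraction(1, 2**500)))
print(0.5 + 2.0**-500 == 0.5)
```

Note that 2^-500 on its own is a perfectly good double. It is only the sum with 1/2 that needs more mantissa bits than are available.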

To generalize this to other bases:

A number can be exactly represented in base 10 if the prime factorization of the denominator consists of only 2’s and 5’s.
A rational number X can be exactly represented in base N if the prime factorization of the denominator of X contains only primes found in the factorization of N.
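The general rule can be sketched the same way. The function name `terminates_in_base` is an assumption for illustration: it repeatedly strips from the denominator any factor it shares with the base, and the number has a finite expansion exactly when nothing but 1 is left.

```python
from fractions import Fraction
from math import gcd

def terminates_in_base(x: Fraction, base: int) -> bool:
    """Return True if x has a finite expansion in the given base."""
    if base < 2:
        raise ValueError("base must be at least 2")
    d = x.denominator  # already in lowest terms
    g = gcd(d, base)
    while g > 1:
        # Remove every copy of the shared factor from the denominator.
        while d % g == 0:
            d //= g
        g = gcd(d, base)
    # Any prime left in d does not divide the base.
    return d == 1

print(terminates_in_base(Fraction(1, 48), 2))   # 48 = 2^4 * 3, the 3 remains
print(terminates_in_base(Fraction(1, 48), 6))   # 6 = 2 * 3 covers both primes
print(terminates_in_base(Fraction(1, 20), 10))  # 20 = 2^2 * 5
print(terminates_in_base(Fraction(1, 3), 10))   # 3 does not divide 10
```

With base 2 this reduces to the power-of-two test from the start of the answer.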
