How to calculate float type precision and does it make sense?

The MSDN documentation, which gives the precision of float as “~6-9 digits”, is nonsensical and wrong.

Bad concept. Binary-floating-point format does not have any precision in decimal digits because it has no decimal digits at all. It represents numbers with a sign, a fixed number of binary digits (bits), and an exponent for a power of two.

Wrong on the high end. The floating-point format represents many numbers exactly, with infinite precision. For example, “3” is represented exactly. You can write it in decimal arbitrarily far, 3.0000000000…, and all of the decimal digits will be correct. Another example is 1.40129846432481707092372958328991613128026194187651577175706828388979108268586060148663818836212158203125•10⁻⁴⁵. This number has 105 significant digits in decimal, but the float format represents it exactly (it is 2⁻¹⁴⁹).
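Both examples can be checked with a short Python sketch. The helper name to_float32 is an assumption introduced here, not part of any library; it rounds a Python float (a double) to the nearest 32-bit float by packing it with the standard struct module.

```python
import struct
from decimal import Decimal


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


three = to_float32(3.0)
tiny = to_float32(2.0 ** -149)  # smallest positive (subnormal) float value

# Decimal(float) yields the exact decimal expansion of the stored binary value.
three_exact = Decimal(three)
tiny_exact = Decimal(tiny)
tiny_digits = len(tiny_exact.as_tuple().digits)

print(three_exact)   # the stored value is exactly 3
print(tiny_exact)    # every decimal digit of 2**-149
print(tiny_digits)   # 105 significant decimal digits
```

Neither value loses anything when stored in float, so every printed digit is a correct digit of the represented number.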

Wrong on the low end. When “999999.97” is converted from decimal to float, the result is 1,000,000. So not even one decimal digit is correct.
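A minimal Python sketch of this conversion follows. It assumes a hypothetical helper to_float32 built on the standard struct module, and it goes through a double first, which does not change the result for this particular input.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


converted = to_float32(999999.97)

# Near 1,000,000 adjacent float values are 1/16 apart, so the two
# candidates for 999999.97 are 999999.9375 and 1000000.0.
below = 1000000.0 - 2.0 ** -4

print(converted)  # 1000000.0
print(below)      # 999999.9375
```

999999.97 is closer to 1000000.0 than to 999999.9375, so every digit of the original numeral changes.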

Not a measure of accuracy. Because the float significand has 24 bits, the resolution of its lowest bit is about 2²³ times finer than the resolution of its highest bit. This is about 6.9 digits in the sense that log₁₀2²³ is about 6.9. But that just tells us the resolution—the coarseness—of the representation. When we convert a number to the float format, we get a result that differs from the number by at most ½ of this resolution, because we round to the nearest representable value. So a conversion to float has a relative error of at most 1 part in 2²⁴, which corresponds to about 7.2 digits in the above sense. If we are using digits to measure resolution, then we say the resolution is about 7.2 digits, not that it is 6-9 digits.
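The two figures, and the error bound behind the second one, can be reproduced with the sketch below. The to_float32 helper is an assumption introduced for illustration, and the random sample is only a spot check, not a proof of the bound.

```python
import math
import random
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


resolution_digits = math.log10(2 ** 23)   # about 6.9
error_bound_digits = math.log10(2 ** 24)  # about 7.2

# Spot-check the relative error of conversion on random inputs.
random.seed(0)
worst = 0.0
for _ in range(20_000):
    x = random.uniform(1.0, 1.0e6)
    worst = max(worst, abs(to_float32(x) - x) / x)

print(round(resolution_digits, 2), round(error_bound_digits, 2))
print(worst <= 2.0 ** -24)  # True: no sample exceeds 1 part in 2**24
```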

Where do these numbers come from?

So, if “~6-9 digits” is not a correct concept, does not come from actual bounds on the digits, and does not measure accuracy, where does it come from? We cannot be sure, but 6 and 9 do appear in two descriptions of the float format.

6 is the largest number x for which this is guaranteed:

If any decimal numeral with at most x significant digits is within the normal exponent bounds of the float format and is converted to the nearest value represented in the format, then, when the result is converted to the nearest decimal numeral with at most x significant digits, the result of that conversion equals the original number.

So it is reasonable to say float can preserve at least six decimal digits. However, as we will see, there is no bound involving nine digits.
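The six-digit guarantee can be illustrated with a sketch like the following, in which to_float32 and decimal_roundtrip are hypothetical helpers written for this example. It shows one seven-digit numeral that does not survive the round trip, then spot-checks random six-digit numerals (a sample, not a proof).

```python
import random
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def decimal_roundtrip(text: str, digits: int) -> str:
    """Decimal numeral -> nearest float32 -> numeral with `digits` significant digits."""
    return format(to_float32(float(text)), f".{digits - 1}e")


# Seven digits can fail: between 2**33 and 10**10 float values are 1024
# apart, but 7-digit numerals are only 1000 apart, so two numerals collide.
a = to_float32(float("8.589973e+09"))
b = to_float32(float("8.589974e+09"))
back = decimal_roundtrip("8.589973e+09", 7)
print(a == b, back)  # True 8.589974e+09

# Six digits: random numerals within the normal exponent range.
random.seed(0)
six_digit_failures = 0
for _ in range(20_000):
    mantissa = random.randint(100000, 999999)
    exponent = random.randint(-30, 30)
    text = f"{mantissa // 100000}.{mantissa % 100000:05d}e{exponent:+03d}"
    if decimal_roundtrip(text, 6) != text:
        six_digit_failures += 1
print(six_digit_failures)  # 0
```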

9 is the smallest number x that guarantees this:

If any finite float number is converted to the nearest decimal numeral with x digits, then, when the result is converted to the nearest value representable in float, the result of that conversion equals the original number.
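To see why eight digits are not enough, consider the float value just above 1000. The sketch below uses two hypothetical helpers, to_float32 and float_roundtrip, and relies on Python's format producing correctly rounded decimal output.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE-754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float_roundtrip(value: float, digits: int) -> float:
    """float32 -> numeral with `digits` significant digits -> nearest float32."""
    return to_float32(float(format(value, f".{digits}g")))


# Just below 1024, float values are 2**-14 (about 0.000061) apart,
# which is finer than the 0.0001 step of 8-digit numerals there.
x = to_float32(1000.0 + 2.0 ** -14)

eight = format(x, ".8g")
nine = format(x, ".9g")
print(eight, float_roundtrip(x, 8) == x)  # 1000.0001 False
print(nine, float_roundtrip(x, 9) == x)   # 1000.00006 True
```

With eight digits, x and its upper neighbor 1000 + 2⁻¹³ both print as 1000.0001, so the original value cannot be recovered; the ninth digit separates them.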

As an analogy, if float is a container, then the largest “decimal container” guaranteed to fit inside it is six digits, and the smallest “decimal container” guaranteed to hold it is nine digits. 6 and 9 are akin to interior and exterior measurements of the float container.

Suppose you had a block 7.2 units long, and you were looking at its placement on a line of bricks each 1 unit long. If you put the start of the block at the start of a brick, it will extend 7.2 bricks. However, if somebody else chooses where it starts, they might start it in the middle of a brick. Then it would cover part of that brick, all of the next 6 bricks, and part of the last brick (e.g., .5 + 6 + .7 = 7.2). So a 7.2-unit block is only guaranteed to cover 6 bricks. Conversely, 8 bricks can cover the 7.2-unit block if you choose where they are placed. But if somebody else chooses where they start, the first might cover just .1 units of the block. Then you need 7 more and another fraction, so 9 bricks are needed.

The reason this analogy holds is that powers of two and powers of 10 are irregularly spaced relative to each other. 2¹⁰ (1024) is near 10³ (1000). 10 is the exponent used in the float format for numbers from 1024 (inclusive) to 2048 (exclusive). So this interval from 1024 to 2048 is like a block that has been placed just after the 100-1000 block ends and the 1000-10,000 block starts.
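One way to see the misalignment is to compare the gap between adjacent float values with the gap between adjacent eight-digit decimal numerals on either side of 1024. The functions float32_spacing and decimal_spacing below are assumptions written for this illustration and only handle positive values in the normal range.

```python
import math


def float32_spacing(x: float) -> float:
    """Gap between adjacent float32 values near x (x > 0, normal range)."""
    _, e = math.frexp(x)    # x = m * 2**e with 0.5 <= m < 1
    return 2.0 ** (e - 24)  # the float significand has 24 bits


def decimal_spacing(x: float, digits: int) -> float:
    """Gap between adjacent numerals with `digits` significant digits near x."""
    decimal_exponent = int(f"{x:e}".split("e")[1])
    return 10.0 ** (decimal_exponent - (digits - 1))


for x in (999.0, 1000.0, 1023.0, 1024.0, 2047.0):
    print(x, float32_spacing(x), decimal_spacing(x, 8))
```

Between 1000 and 1024 the float gap (2⁻¹⁴) is smaller than the eight-digit decimal gap, and from 1024 onward the float gap (2⁻¹³) is larger, even though the decimal gap stays the same across the whole 1000-10,000 range.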

But note that this property involving 9 digits is the exterior measurement—it is not a capability that float can perform or a service that it can provide. It is something that float needs (if it is to be held in a decimal format), not something it provides. So it is not a bound on how many digits a float can store.

Further Reading

For better understanding of floating-point arithmetic, consider studying the IEEE-754 Standard for Floating-Point Arithmetic or a good textbook like Handbook of Floating-Point Arithmetic by Jean-Michel Muller et al.
