understanding floating point variables

Ok. Let me try to explain it.

The basic thing to remember with floating point numbers is this: They occupy a limited amount of bits and try to represent the original number using base-2 arithmetic.

As you know, in base-2 arithmetic integers are represented by the powers of 2 that they contain. Thus, 6 would be represented as 4 + 2, ie. in binary as 110.

In order to understand how fractional numbers are represented, you have to think about how we represent fractional numbers in our decimal system. The fractional part of numbers (for example 0.11) is represented as multiples of inverse powers of 10 (since the base is 10). Thus 0.11 is actually 1/10 + 1/100. As you can appreciate, this is not powerful enough to represent all fractional numbers in a limited number of digits. For example, 1/3 would be 0.333333…. in a never ending fashion. If we had only 32 digits of space to write the number down, we would end up having only an approximation to the original number, 0.33333333333333333333333333333333. This number, for example, would give 0.99999999999999999999999999999999 if it was multiplied by 3 and not 1 as you would have expected.

The situation is similar in base-2. Each fractional number would be represented as multiples of inverse powers of 2. Thus 0.75 (in decimal) (ie 3/4) would be represented as 1/2 + 1/4, which would mean 0.11 (in base-2). Just as base 10 is not capable enough to represent every fractional number in a finite manner, base-2 cannot represent all fractional numbers given a limited amount of space.

Now, try to represent 0.11 in base-2; you start with 11/100 and try to find an inverse power of 2 that is just less than this number. 1/2 doesn’t work, 1/4 neither, nor does 1/8. 1/16 fits the bill, so you mark a 1 in the 4th place after the decimal point and subtract 1/16 from 11/100. You are left with 19/400. Now try to find the next power of 2 that fits the description. 1/32 seems to be that one, mark the 5th place after the point and subtract 1/32 from 19/400, you get 13/800. Next one is 1/64 and you are left with 1/1600 thus the next one is all the way up at 1/2048, etc. etc. Thus we got as far as 0.00011100001 but it goes on and on; and you will see that there always is a fraction remaining. Now, I didn’t go through the whole calculation, but after you have put in 32 binary digits after the dot you will still probably have some fraction left (and this is assuming that all of the 32 bits of space is spent representing the decimal part, which it is not). Thus, I am sure you can appreciate that the resulting number might differ from its actual value by some amount.

In your case, the difference is 0.00000000000000001 which is 1/100000000000000000 = 1/10^17 and I am sure that you can see why you might have that.

Leave a Comment