Denormalized Numbers – IEEE 754 Floating Point

Essentially, denormalized floats make it possible to represent the smallest (in magnitude) numbers that any floating-point value can represent. That is correct. Using denormalized numbers comes with a performance cost on many platforms. The penalty differs between processors, but it can be up to 2 orders of … Read more
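
A minimal Python sketch (assuming CPython with IEEE 754 doubles, which is the norm; not part of the original answer) showing how subnormals extend the range below the smallest normal value:

    import sys

    smallest_normal = sys.float_info.min   # 2.2250738585072014e-308
    print(smallest_normal / 2)             # 1.1125369292536007e-308: nonzero, a subnormal
    print(5e-324)                          # 2**-1074, the smallest positive subnormal double
    print(5e-324 / 2)                      # 0.0: underflows past the subnormal range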

Uses for negative zero floating point value?

From Wikipedia: It is claimed that the inclusion of signed zero in IEEE 754 makes it much easier to achieve numerical accuracy in some critical problems[1], in particular when computing with complex elementary functions[2]. The first reference is “Branch Cuts for Complex Elementary Functions or Much Ado About Nothing’s Sign Bit” by W. Kahan, that … Read more
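
A small Python illustration (not from the excerpt) of why the sign of zero matters near a branch cut: atan2 distinguishes +0.0 from -0.0, so approaching the negative real axis from above and from below gives results of opposite sign:

    import math

    print(0.0 == -0.0)               # True: +0.0 and -0.0 compare equal
    print(math.copysign(1.0, -0.0))  # -1.0: but the sign bit is still there
    print(math.atan2(0.0, -1.0))     #  3.141592653589793 (approach from above the cut)
    print(math.atan2(-0.0, -1.0))    # -3.141592653589793 (approach from below the cut)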

Matlab vs C++ Double Precision

You got confused by the different ways C++ and MATLAB print double values. MATLAB’s format long only prints 15 significant digits, while C++ prints 17 significant digits. Internally both use the same numbers: IEEE 754 64-bit floating-point numbers. To reproduce the C++ behaviour in MATLAB, I defined an anonymous function disp17 which prints … Read more
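
The same 15-versus-17-digit effect can be reproduced in Python (a quick sketch, not the answer's MATLAB code): 15 significant digits hide the representation error, while 17 digits round-trip the exact double:

    x = 0.1 + 0.2
    print(f"{x:.15g}")   # 0.3                 (what MATLAB's format long would show)
    print(f"{x:.17g}")   # 0.30000000000000004 (what C++ prints with 17 digits)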

Type-juggling and (strict) greater/lesser-than comparisons in PHP

PHP’s comparison operators deviate from the computer-scientific definitions in several ways. In order to constitute an equivalence relation, == has to be reflexive, symmetric, and transitive. PHP’s == operator is not reflexive, i.e. $a == $a is not always true: var_dump(NAN == NAN); // bool(false) Note: The fact that any comparison involving NAN is always … Read more
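
This non-reflexivity is not a PHP quirk but IEEE 754 semantics; the same check in Python (a quick illustration):

    nan = float("nan")
    print(nan == nan)            # False: IEEE 754 defines NaN as unequal to everything
    print(nan != nan)            # True
    print(nan < 0.0, nan > 0.0)  # False False: NaN is unordered, neither small nor large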

Does any floating point-intensive code produce bit-exact results in any x86-based architecture?

Table of contents:
- C / C++
- asm
- Creating real-life software that achieves this

In C or C++: No, a fully ISO C11 and IEEE-conforming C implementation does not guarantee bit-identical results to other C implementations, even other implementations on the same hardware. (And first of all, I’m going to assume we’re talking about normal C implementations where … Read more
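
One concrete reason (a hedged Python illustration, not taken from the answer itself): floating-point addition is not associative, so any compiler or library that reorders a sum can change the rounded result bit for bit:

    a, b = 1e16, 1.0
    print((b + a) - a)   # 0.0: b is absorbed when added to a first
    print(b + (a - a))   # 1.0: a different grouping, a different result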

Why does the floating-point value of 4*0.1 look nice in Python 3 but 3*0.1 doesn’t?

The simple answer is that 3*0.1 != 0.3 due to quantization (roundoff) error, whereas 4*0.1 == 0.4 because multiplying by a power of two is usually an “exact” operation. Python tries to find the shortest string that would round to the desired value, so it can display 4*0.1 as 0.4 since the two are equal, but … Read more
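
This is easy to verify at the REPL (a minimal check, consistent with the excerpt's claim):

    print(repr(3 * 0.1))      # 0.30000000000000004 (not the double nearest 0.3)
    print(repr(4 * 0.1))      # 0.4                 (exactly the double 0.4)
    print(3 * 0.1 == 0.3)     # False
    print(4 * 0.1 == 0.4)     # True
    print(f"{0.3:.17g}")      # 0.29999999999999999 (0.3 itself isn't exact either)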

Extreme numerical values in floating-point precision in R

R uses IEEE 754 double-precision floating-point numbers. Floating-point numbers are more dense near zero. This is a consequence of their design: they compute accurately (to the equivalent of about 16 significant decimal digits, as you have noticed) over a very wide range. Perhaps you expected a fixed-point system with uniform absolute precision. In practice fixed-point … Read more
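
A short Python sketch (assuming Python 3.9+ for math.ulp; not from the excerpt) showing the spacing between adjacent doubles: the absolute gap grows with magnitude, while the relative precision stays near 16 decimal digits:

    import math

    # one ulp = distance to the next representable double at that magnitude
    for x in (1e-300, 1.0, 1e300):
        print(f"ulp({x:g}) = {math.ulp(x):g}")
    # ulp(1e-300) ~ 1.7e-316, ulp(1) ~ 2.2e-16, ulp(1e+300) ~ 1.5e+284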