Floating point comparison revisited

“Almost Equals” Is Not a Good Function

4 is not an appropriate value: The answer you point to states “Therefore, 4 should be enough for ordinary use” but contains no basis for that claim. In fact, there are ordinary situations in which numbers calculated in floating-point by different means may differ by many ULP even though they would be equal if calculated by exact mathematics. Therefore, there should be no default value for the tolerance; each user should be required to supply their own, hopefully based on thorough analysis of their code.

As an example of why a default of 4 ULP is bad, consider 1./49*49-1. The mathematically exact result is 0, but the computed result (64-bit IEEE 754 binary) is −2^−53, an error exceeding 10^307 ULP of the exact result and almost 10^16 ULP of the computed result.
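A small program along these lines reproduces those numbers; it is not part of the routine under discussion, assumes IEEE 754 binary64, and takes DBL_TRUE_MIN (the smallest positive subnormal) as one ULP of the exact result 0:

#include <cfloat>
#include <cmath>
#include <cstdio>

int main() {
    double r = 1./49*49 - 1;                                     // exact mathematics gives 0
    double error = std::fabs(r);                                 // magnitude of the error, 2^-53
    double ulp_of_r = std::nextafter(error, INFINITY) - error;   // one ULP of the computed result
    std::printf("computed result: %a\n", r);
    std::printf("error in ULP of the exact result:    %g\n", error / DBL_TRUE_MIN);
    std::printf("error in ULP of the computed result: %g\n", error / ulp_of_r);
}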

Sometimes, no value is appropriate: In some cases, the tolerance cannot be relative to the values being compared, neither a mathematically exact relative tolerance nor a quantized ULP tolerance. For example, nearly every output value in an FFT is affected by nearly every input value, and the error in any one element is related to the magnitudes of other elements. An “almost equals” routine must be supplied additional context with information about the potential error.
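To make that concrete, the comparison has to accept an error bound the caller computes from that context rather than inferring one from the two values alone. A purely illustrative sketch (the name withinBound and the shape of the interface are my assumptions, not part of the routine under discussion):

#include <cmath>

// Purely illustrative: the caller supplies an absolute error bound derived
// from context; for an FFT output, for example, it might be computed from the
// input magnitude and the transform length rather than from a and b themselves.
bool withinBound(double a, double b, double bound) {
    return std::fabs(a - b) <= bound;
}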

“Almost Equals” has poor mathematical properties: One of its shortcomings is that scaling changes the results. The code below prints 1 and then 0.

// almostEqual: the ULP-based comparison under discussion (default tolerance 4 ULP).
double x0 = 1.1;
double x1 = 1.1 + 3*0x1p-52;                 // exactly 3 ULP above x0
std::cout << almostEqual(x0, x1) << "\n";    // prints 1
x0 *= .8;
x1 *= .8;                                    // the same scaling applied to both values
std::cout << almostEqual(x0, x1) << "\n";    // prints 0

Another failing is that it is not transitive; almostEqual(a, b) and almostEqual(b, c) do not imply almostEqual(a, c), as the fragment below illustrates.
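For instance, with a 4 ULP tolerance, three values spaced 3 ULP apart show the failure (a fragment in the style of the example above; almostEqual is the routine under discussion):

double a = 1.0;
double b = 1.0 + 3*0x1p-52;   // 3 ULP from a
double c = 1.0 + 6*0x1p-52;   // 3 ULP from b, but 6 ULP from a
// With a 4 ULP tolerance, almostEqual(a, b) and almostEqual(b, c) both hold,
// yet almostEqual(a, c) does not.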

A Bug in Extreme Cases

almostEqual(1.f, 1.f/11, 0x745d17) incorrectly returns 1.

1.f/11 is 0x1.745d18p-4 (hexadecimal floating-point notation, meaning hexadecimal 1.745d18 multiplied by 2^−4). Subtracting this from 1 (which is 0x10p-4) yields 0xe.8ba2e8p-4. Since an ULP of 1 is 0x1p-23, that is 0xe.8ba2e8p19 ULP = 0xe8ba2e.8/2 ULP (shifted 20 bits and divided by 2, netting 19 bits) = 0x745d17.4 ULP. That exceeds the specified tolerance of 0x745d17, so the correct answer would be 0.

This error is caused by rounding in max_frac-scaled_min_frac.
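A short demonstration of that rounding (not part of the routine; it simply redoes the subtraction in float and in double and expresses each difference in ULP of 1):

#include <cstdio>

int main() {
    float q = 1.f / 11;       // 0x1.745d18p-4
    float diff_f = 1.f - q;   // float subtraction: the exact difference needs 25 significand bits, so it rounds
    double diff_d = 1. - q;   // double easily holds the exact difference
    std::printf("float difference: %a = %.2f ULP of 1\n", diff_f, diff_f / 0x1p-23);
    std::printf("exact difference: %a = %.2f ULP of 1\n", diff_d, diff_d / 0x1p-23);
    // Expected: 7626007.00 (0x745d17) versus 7626007.25 (0x745d17.4) ULP.
}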

An easy escape from this problem is to specify that ulps must be less than .5/limits::epsilon. Then rounding occurs in max_frac-scaled_min_frac only if the difference (even when rounded) exceeds ulps; if the difference is less than that, the subtraction is exact by Sterbenz’ Lemma (the difference of two floating-point numbers within a factor of two of each other is exactly representable).

There was a suggestion that using long double would correct this. It would not. Consider comparing 1 and -0x1p-149f with ulps set to 1/limits::epsilon. Unless the long double significand has at least 150 bits, the subtraction result rounds to 1, which is exactly 1/limits::epsilon ULP and therefore within the tolerance. Yet the mathematical difference, 1 + 0x1p-149, exceeds 1/limits::epsilon ULP, so the correct answer is 0.
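A one-line check of that claim; it prints 1 on any implementation whose long double significand has fewer than 150 bits:

#include <iostream>

int main() {
    // 1 + 0x1p-149 needs a 150-bit significand to be exact; otherwise it rounds back to 1.
    std::cout << (1.L - -0x1p-149f == 1.L) << "\n";
}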

Minor Note

The expression factor * limits::epsilon / 2 converts factor to the floating-point type, which causes rounding errors for large values of factor that are not exactly representable. Likely, the routine is not intended to be used with such large values (millions of ULPs in float), so this ought to be specified as a limit on the routine rather than a bug.
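For instance, assuming the routine is instantiated for float and factor is an integral ULP count (the name factor is taken from the text), the conversion alone already alters the requested count:

#include <cstdio>

int main() {
    long long factor = 0x1000001;                        // a request for 2^24 + 1 ULP
    float converted = factor;                            // rounds to 2^24 (ties-to-even)
    std::printf("%lld -> %.1f\n", factor, converted);    // 16777217 -> 16777216.0
}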
