floating-point - w3toppers.com

clang 14.0.0 floating point optimizations

This is a pretty deep rabbit hole, and I don’t know if I’ve explored all of its twists and turns yet. But here’s a first draft of an answer; suggestions for improvement are welcome. At its core, the culprit is the so-called “fused multiply-add” (or, in this case, a fused multiply-subtract). Fused multiply-add is a …
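To make the effect of a fused multiply-add concrete, here is a minimal Python sketch. It does not reproduce the clang case from the question; it only emulates a fused multiply-subtract with exact rational arithmetic and a single final rounding, and the input value is an illustrative assumption chosen so that the product is not exactly representable.

```python
from fractions import Fraction

# Illustrative input: a = 1 + 2**-27, so a*a = 1 + 2**-26 + 2**-54 exactly.
a = 1.0 + 2.0**-27

# The product is rounded to a double here, which drops the 2**-54 term.
p = a * a

# Unfused: multiply, round, then subtract. The rounding error cancels out.
unfused = a * a - p

# Emulated fused multiply-subtract: exact product and difference,
# followed by one rounding to double at the very end.
fused = float(Fraction(a) * Fraction(a) - Fraction(p))

print(unfused)  # 0.0
print(fused)    # 5.551115123125783e-17, i.e. 2**-54
```

The two results differ only because the fused form skips the intermediate rounding of the product, which is the kind of discrepancy a compiler can introduce when it contracts `a*b - c` into a single instruction.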

Using software floating point on x86 linux

It is surprising that gcc doesn’t support this natively as the code is clearly available in the source within a directory called soft-fp. It’s possible to compile that library manually:

    $ svn co svn://gcc.gnu.org/svn/gcc/trunk/libgcc/ libgcc
    $ cd libgcc/soft-fp/
    $ gcc -c -O2 -msoft-float -m32 -I../config/arm/ -I.. *.c
    $ ar -crv libsoft-fp.a *.o

There are a …

How to convert hex string to float in Java?

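    public class Test {
        public static void main(String[] args) {
            String myString = "BF800000";
            // Parse as a long first: hex values with the sign bit set overflow Integer.parseInt.
            Long i = Long.parseLong(myString, 16);
            // Reinterpret the low 32 bits as an IEEE 754 single-precision float.
            Float f = Float.intBitsToFloat(i.intValue());
            System.out.println(f);
            // Round trip: print the float's bit pattern as hex again.
            System.out.println(Integer.toHexString(Float.floatToIntBits(f)));
        }
    }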
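For comparison, the same bit reinterpretation can be sketched in Python with the standard `struct` module. The helper names `hex_to_float` and `float_to_hex` are made up for this illustration and are not part of the original answer.

```python
import struct

def hex_to_float(hex_string: str) -> float:
    """Interpret 8 hex digits as a big-endian IEEE 754 single-precision float."""
    return struct.unpack(">f", bytes.fromhex(hex_string))[0]

def float_to_hex(value: float) -> str:
    """Return the single-precision bit pattern of value as 8 hex digits."""
    return struct.pack(">f", value).hex()

print(hex_to_float("BF800000"))  # -1.0
print(float_to_hex(-1.0))        # bf800000
```

0xBF800000 has the sign bit set, a biased exponent of 127 and a zero mantissa, which is why it decodes to -1.0.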

pow() cast to integer, unexpected result

Some investigation in assembly code (OllyDbg).

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        int x1 = (int) pow(10, 2);
        int x2 = (int) pow(10, 2);
        printf("%d %d", x1, x2);
        return 0;
    }

The related assembly section:

    FLD QWORD PTR DS:[402000]   // Loads 2.0 onto stack
    SUB ESP,8
    FSTP QWORD PTR SS:[ESP]
    FLD QWORD PTR DS:[402008]
    …

Having parameter (constant) variable with NaN value in Fortran

To add to Vladimir F’s answer I’ll mention that gfortran 5.0 (but not earlier) supports the IEEE intrinsic modules. Instead of

    real x
    x=0
    x=0/x

one can use

    use, intrinsic :: iso_fortran_env
    use, intrinsic :: ieee_arithmetic
    integer(int32) i
    real(real32) x
    x = ieee_value(x, ieee_quiet_nan)
    i = transfer(x,i)

This gives you a little flexibility over which …

How is floating point conversion actually done in C++?(double to float or float to double)

The cvtsd2ss instruction uses the FPU’s rounding mode to do the conversion. The default rounding mode is round-to-nearest-even. In order to follow the algorithm, it helps to keep in mind the information at the IEEE 754-1985 Wikipedia page, especially the diagrams representing the layout. First, the exponent of the target float is computed: the double …
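The effect of round-to-nearest-even on a double-to-float conversion can be observed from Python without touching assembly. This is a rough sketch, assuming a platform with IEEE 754 floats in the default rounding mode; `to_float32` is a hypothetical helper that narrows a double through `struct` and widens it back.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to single precision, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 0.1 is not representable in either format; the float32 neighbour differs
# from the double one.
print(to_float32(0.1))  # 0.10000000149011612

# Exact ties between two adjacent floats go to the one with an even mantissa.
tie_low = 1.0 + 2.0**-24        # halfway between 1.0 and 1.0 + 2**-23
tie_high = 1.0 + 3 * 2.0**-24   # halfway between 1.0 + 2**-23 and 1.0 + 2**-22

print(to_float32(tie_low) == 1.0)               # True: rounds down to even
print(to_float32(tie_high) == 1.0 + 2.0**-22)   # True: rounds up to even
```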

Does setprecision in c++ round? If so why am I seeing this?

The number 0.298475 isn’t exactly representable in a double (as a fraction it is 11939/40000, whose denominator is not a power of two), and the value that is actually stored is closer to 0.29847 than to 0.29848.
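The stored value can be inspected directly. The sketch below uses Python rather than C++, on the assumption that both use the same IEEE 754 double for this literal; `Decimal(x)` converts the double exactly, with no further rounding.

```python
from decimal import Decimal

x = 0.298475

# Exact decimal expansion of the double nearest to 0.298475.
stored = Decimal(x)
print(stored)

# The stored double lies slightly below the decimal literal ...
print(stored < Decimal("0.298475"))  # True

# ... so formatting to five decimal places rounds down.
print(f"{x:.5f}")  # 0.29847
```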

How to get floats value without including exponential notation

Try this:

    Console.WriteLine(dummy.ToString("F"));

You can also specify the number of decimal places, for example F5, F3, etc. Alternatively, you can use a custom format specifier:

    Console.WriteLine(dummy.ToString("0.#########"));

How do you get the next value in the floating-point sequence? [duplicate]

Here are five (really four-and-a-half) possible solutions.

Solution 1: use Python 3.9 or later

Python 3.9, released in October 2020, includes a new standard library function math.nextafter which provides this functionality directly: use math.nextafter(x, math.inf) to get the next floating-point number towards positive infinity. For example:

    >>> from math import nextafter, inf
    >>> nextafter(100.0, inf)
    …
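For interpreters older than Python 3.9, one possible approach is to step the underlying 64-bit pattern by one. The sketch below is not necessarily one of the solutions from the original answer; `next_up` is a hypothetical helper, and it assumes IEEE 754 doubles.

```python
import math
import struct

def next_up(x: float) -> float:
    """Return the smallest double greater than x (assumes IEEE 754 binary64)."""
    if math.isnan(x) or x == math.inf:
        return x
    if x == 0.0:
        # Covers both +0.0 and -0.0: the successor is the smallest subnormal.
        return struct.unpack("<d", struct.pack("<q", 1))[0]
    # Reinterpret the double as a signed 64-bit integer.
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    # For positive x the bit pattern grows with the value; for negative x,
    # moving towards +inf means shrinking the magnitude, so step down.
    bits += 1 if x > 0 else -1
    return struct.unpack("<d", struct.pack("<q", bits))[0]

print(next_up(100.0) - 100.0 == 2.0**-46)  # True: one ulp above 100.0
print(next_up(1.0) == 1.0 + 2.0**-52)      # True
```

This works because, for finite doubles of the same sign, the ordering of the bit patterns matches the ordering of the magnitudes.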

Why do you need an explicit `-lm` compiler option? [duplicate]

It is to accommodate systems (mainly embedded) where floating-point math is not possible or necessary. It’s kind of historical indeed, but don’t forget that gcc and most other C compilers were written at a time when a 386SX was considered a high-performance processor. To give an example, when I still worked in embedded …