If by “modern” you mean processors supporting the sort of SSE2 instructions that you quote in your question as produced by your compiler (mulsd, …), then the answer is no, strictfp does not make a difference, because the instruction set offers no way to take advantage of the absence of strictfp. The available instructions are already optimal for computing to the precise specifications of strictfp. In other words, on that kind of modern CPU, you get strictfp semantics all the time for the same price.
If by “modern” you mean the historical 387 FPU, then it is possible to observe a difference if an intermediate computation would overflow or underflow in strictfp mode. The 387's registers have a wider exponent range than binary64, so without strictfp the intermediate result may not overflow at all or, on underflow, may keep more precision bits than expected.
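To make the overflow and underflow cases concrete, here is a minimal sketch. The class name StrictfpDemo and both helper methods are hypothetical, chosen only for illustration. Each method forces an intermediate result outside the binary64 range and then brings it back.

```java
public class StrictfpDemo {
    // Hypothetical helper: for x = 1e308 the intermediate product x * 10.0
    // exceeds the largest finite binary64 value, so under strictfp semantics
    // it becomes Infinity and stays Infinity after the division.
    public static strictfp double scaleUpThenDown(double x) {
        return x * 10.0 / 10.0;
    }

    // Hypothetical helper: for x = Double.MIN_VALUE the intermediate quotient
    // x / 2.0 is not representable in binary64 and rounds to 0.0, so the
    // final result is 0.0 rather than x.
    public static strictfp double halveThenDouble(double x) {
        return x / 2.0 * 2.0;
    }

    public static void main(String[] args) {
        System.out.println(scaleUpThenDown(1e308));            // Infinity
        System.out.println(halveThenDouble(Double.MIN_VALUE)); // 0.0
    }
}
```

With SSE2 code generation you should see Infinity and 0.0 whether or not the strictfp keyword is present. Only non-strict code on the 387, with its wider exponent range, could have returned the original arguments instead.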
A typical strictfp computation compiled for the 387 will look like the assembly in this answer, with well-placed multiplications by well-chosen powers of two to make underflow behave the same as in IEEE 754 binary64. A round-trip of the result through a 64-bit memory location then takes care of overflows.
The same computation compiled without strictfp would produce one 387 instruction per basic operation, for instance just the multiplication instruction fmulp for a source-level multiplication. (The 387 would have been configured to use the same significand width as binary64, 53 bits, at the beginning of the program.)