Why does mulss take only 3 cycles on Haswell, different from Agner’s instruction tables? (Unrolling FP loops with multiple accumulators)

Leave a Comment