Modular arithmetics and NTT (finite field DFT) optimizations
First off, thank you very much for posting this and making it free to use. I really appreciate that. I used some bit tricks to eliminate some branching, rearranged the main loop, and modified the assembly, which together gave a 1.35x speedup. Also, I added a preprocessor condition for 64-bit, …
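The comment doesn't include the code, so the exact bit tricks it refers to are unknown. As a hypothetical illustration of one common way to remove data-dependent branches from NTT modular arithmetic, here is a sketch in C of branch-free modular addition and subtraction. The helper names `mod_add`, `mod_sub` and `check_against_reference`, and the precondition that the modulus is below 2^31, are assumptions for this example, not the commenter's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only.
   Assumed preconditions: modulus m < 2^31 and 0 <= a, b < m,
   so a + b never overflows uint32_t. */

/* Returns (a + b) mod m without a data-dependent branch. */
static inline uint32_t mod_add(uint32_t a, uint32_t b, uint32_t m) {
    uint32_t s = a + b - m;          /* wraps to a value >= 2^31 iff a + b < m */
    uint32_t mask = 0u - (s >> 31);  /* 0xFFFFFFFF if it wrapped, otherwise 0 */
    return s + (m & mask);           /* add m back only when it wrapped */
}

/* Returns (a - b) mod m without a data-dependent branch. */
static inline uint32_t mod_sub(uint32_t a, uint32_t b, uint32_t m) {
    uint32_t d = a - b;              /* wraps to a value >= 2^31 iff a < b */
    uint32_t mask = 0u - (d >> 31);  /* 0xFFFFFFFF if it wrapped, otherwise 0 */
    return d + (m & mask);           /* add m back only when it wrapped */
}

/* Exhaustively compares both helpers with the % operator for a small modulus. */
static void check_against_reference(uint32_t m) {
    for (uint32_t a = 0; a < m; a++) {
        for (uint32_t b = 0; b < m; b++) {
            assert(mod_add(a, b, m) == (a + b) % m);
            assert(mod_sub(a, b, m) == (a + m - b) % m);
        }
    }
}
```

In an NTT butterfly, helpers like these would stand in for the usual `if (x >= m) x -= m;` correction. Compilers may already turn that branching form into a conditional move, so it is worth measuring before assuming the masked version is faster.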