If you want the fastest way, you will need to use non-portable methods.
Windows/MSVC:
GCC:
These typically map directly to native hardware instructions, so it doesn't get much faster than this.
But since standard C/C++ offers no portable way to express them, they're only accessible via compiler intrinsics.
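As an illustration, here is a minimal sketch of how such intrinsics are usually wrapped behind one function. It assumes the operation of interest is finding the index of the lowest set bit. That choice is an assumption, since the specific intrinsics for each compiler aren't listed above. The helper name `lowest_set_bit` is hypothetical. `_BitScanForward` (MSVC, `<intrin.h>`) and `__builtin_ctz` (GCC/Clang) are real intrinsics. Both are undefined or unspecified for an input of 0, so the sketch requires a nonzero argument.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>
#endif

/* Hypothetical helper: returns the 0-based index of the lowest set bit of x.
   Precondition: x != 0 (the intrinsics below give no meaningful result for 0). */
static int lowest_set_bit(uint32_t x)
{
    assert(x != 0);
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanForward(&index, x);   /* MSVC intrinsic; typically BSF/TZCNT on x86 */
    return (int)index;
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_ctz(x);      /* GCC/Clang builtin; typically BSF/TZCNT on x86 */
#else
    /* Portable (slower) fallback: shift right until the lowest bit is 1. */
    int index = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++index;
    }
    return index;
#endif
}
```

For example, `lowest_set_bit(8u)` returns 3 and `lowest_set_bit(0x50u)` returns 4. The `#if` dispatch keeps the fast path on each compiler while still compiling elsewhere, at the cost of a loop on unknown toolchains.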