How to count clock cycles with RDTSC in GCC x86?

The other answers work, but you can avoid inline assembly by using GCC’s __rdtsc intrinsic, available by including x86intrin.h. It is defined in gcc/config/i386/ia32intrin.h:

    /* rdtsc */
    extern __inline unsigned long long
    __attribute__((__gnu_inline__, __always_inline__, __artificial__))
    __rdtsc (void)
    {
      return __builtin_ia32_rdtsc ();
    }
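As a rough illustration of the intrinsic in use, the sketch below reads the counter before and after a piece of work and returns the difference. The helper names tsc_ticks_for and sum_loop are hypothetical, introduced only for this example. Note that on processors with an invariant TSC the counter advances at a constant reference rate, so the result is in TSC ticks, which need not equal core clock cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>   /* declares __rdtsc() on GCC and Clang for x86 */

/* Hypothetical helper: TSC ticks that elapse while work(arg) runs.
 * No serialization is applied here, so the CPU may reorder the
 * reads slightly relative to the surrounding instructions. */
static uint64_t tsc_ticks_for(void (*work)(void *), void *arg)
{
    assert(work != NULL);
    uint64_t start = __rdtsc();
    work(arg);
    uint64_t end = __rdtsc();
    return end - start;
}

/* Hypothetical workload: sums 0..n-1 into a volatile accumulator
 * so the compiler cannot optimize the loop away. */
static void sum_loop(void *arg)
{
    volatile uint64_t acc = 0;
    uint64_t n = *(uint64_t *)arg;
    for (uint64_t i = 0; i < n; i++)
        acc += i;
}
```

A call such as tsc_ticks_for(sum_loop, &n) with a uint64_t n gives a tick count for that run; repeating the measurement and taking the minimum or median is a common way to reduce noise.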

Calculate system time using rdtsc

The idea is not unsound, but it is not suited to user-mode applications, for which, as @Basile suggested, there are better alternatives. Intel itself suggests using the TSC as a wall clock: The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states. This is the architectural behaviour moving forward. …
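Whether a given processor has an invariant TSC can be checked from user mode. The following is a minimal sketch, assuming GCC or Clang and their cpuid.h header; it tests bit 8 of EDX in CPUID leaf 0x80000007, which is where the invariant TSC flag is reported. The function name has_invariant_tsc is hypothetical.

```c
#include <cpuid.h>     /* __get_cpuid(), provided by GCC and Clang */
#include <stdbool.h>

/* Hypothetical helper: true if the CPU reports an invariant TSC.
 * __get_cpuid() returns 0 when the requested leaf is unsupported. */
static bool has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
        return false;

    return ((edx >> 8) & 1u) != 0;   /* EDX bit 8: invariant TSC */
}
```

The flag says only that the counter ticks at a constant rate; converting ticks to wall-clock time still requires knowing the TSC frequency, which this sketch does not attempt.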

Difference between rdtscp, rdtsc : memory and cpuid / rdtsc?

As mentioned in a comment, there’s a difference between a compiler barrier and a processor barrier. volatile and memory in the asm statement act as a compiler barrier, but the processor is still free to reorder instructions. Processor barriers are special instructions that must be issued explicitly, e.g. rdtscp, cpuid, memory fence instructions (mfence, lfence, …
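The distinction can be sketched in code. The example below is one possible arrangement, not necessarily the one the answer goes on to recommend: compiler_barrier only constrains the compiler, while tsc_begin and tsc_end add processor-level ordering with lfence and rdtscp. All three function names are hypothetical; the intrinsics _mm_lfence, __rdtsc and __rdtscp come from x86intrin.h on GCC and Clang.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(), __rdtscp(), _mm_lfence() */

/* Compiler barrier only: stops the compiler from moving memory
 * accesses across this point, but emits no instruction, so the
 * processor may still reorder at run time. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Start timestamp: lfence keeps earlier instructions from still
 * being in flight when rdtsc executes, and keeps later ones from
 * starting before the counter has been read. */
static inline uint64_t tsc_begin(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* End timestamp: rdtscp waits for prior instructions to execute
 * before reading the counter; the trailing lfence stops later
 * instructions from starting early. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux;            /* receives IA32_TSC_AUX, unused here */
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}
```

The code to be timed goes between tsc_begin() and tsc_end(); the difference of the two return values is the elapsed tick count.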