Difference between rdtscp, rdtsc : memory and cpuid / rdtsc?

As mentioned in a comment, there’s a difference between a compiler barrier and a processor barrier. volatile and memory in the asm statement act as a compiler barrier, but the processor is still free to reorder instructions.

Processor barrier are special instructions that must be explicitly given, e.g. rdtscp, cpuid, memory fence instructions (mfence, lfence, …) etc.

As an aside, while using cpuid as a barrier before rdtsc is common, it can also be very bad from a performance perspective, since virtual machine platforms often trap and emulate the cpuid instruction in order to impose a common set of CPU features across multiple machines in a cluster (to ensure that live migration works). Thus it’s better to use one of the memory fence instructions.

The Linux kernel uses mfence;rdtsc on AMD platforms and lfence;rdtsc on Intel. If you don’t want to bother with distinguishing between these, mfence;rdtsc works on both although it’s slightly slower as mfence is a stronger barrier than lfence.

Edit 2019-11-25: As of Linux kernel 5.4, lfence is used to serialize rdtsc on both Intel and AMD. See this commit “x86: Remove X86_FEATURE_MFENCE_RDTSC”: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be261ffce6f13229dad50f59c5e491f933d3167f

Leave a Comment