What’s the best way to remember the x86-64 System V arg register order?

The integer/pointer arg registers are, in order, RDI, RSI, RDX, RCX, R8, R9. If you remember C memcpy’s arg order and how rep movsb works, that’s most of the way to remembering x86-64 System V.

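The design makes memcpy(dst, src, size) cheap to implement with rep movsb. That instruction copies RCX bytes from [RSI] to [RDI], and dst and src already arrive in RDI and RSI, so the only extra work is moving size from RDX into RCX. RCX comes after RDX deliberately, to leave it unused in more functions, because it’s needed for variable-count shifts (the count goes in CL) more often than anything needs RDX.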
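Here is a minimal sketch of a memcpy-like function built on rep movsb. It assumes GCC or Clang extended inline asm on x86-64, and the name copy_rep_movsb is hypothetical. It also relies on the System V guarantee that the direction flag is clear on function entry, so rep movsb copies forward. The "+D" and "+S" constraints match where dst and src already are. The "+c" constraint is what makes the compiler move n from RDX into RCX. A plain byte loop is included as a fallback so the sketch still compiles on other targets.

```c
#include <stddef.h>

/* Hypothetical memcpy-style helper built on rep movsb (GCC/Clang inline asm).
 * System V passes dst in RDI, src in RSI, n in RDX.
 * rep movsb copies RCX bytes from [RSI] to [RDI], so only n needs moving. */
void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;  /* like memcpy, return the original dst */
#if defined(__x86_64__)
    /* DF is guaranteed clear on entry by the ABI, so this copies forward. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)  /* RDI, RSI, RCX get updated */
                     :
                     : "memory");
#else
    /* Portable fallback for non-x86-64 targets. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

With optimization enabled, the setup around rep movsb should need only a mov of the size into RCX, plus saving dst in RAX for the return value.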

Then R8 and R9 are the first two “high” registers. Using them requires a REX prefix, which costs an extra byte of code size in instructions that wouldn’t otherwise need one. That makes them a sensible choice for the last 2 args. (Windows x64 makes the same choice, using R8 and R9 for its last 2 register args.)
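To make the REX cost concrete, here is a small sketch that encodes a single instruction form: a register-to-register `mov dst32, src32` using opcode 89 /r (MOV r/m32, r32). The function name encode_mov_r32 and its 0-15 register numbering are assumptions made for illustration, not a real assembler API. The function emits the REX byte only when either operand is R8D-R15D.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode `mov dst32, src32` as 89 /r (MOV r/m32, r32).
 * Register numbers: 0=EAX ... 7=EDI, 8=R8D ... 15=R15D.
 * Writes the bytes to out (at least 3 bytes) and returns the length. */
size_t encode_mov_r32(uint8_t *out, unsigned dst, unsigned src)
{
    size_t len = 0;
    if (dst >= 8 || src >= 8) {
        /* REX = 0100WRXB: R extends ModRM.reg (src), B extends ModRM.rm (dst). */
        out[len++] = (uint8_t)(0x40 | ((src >= 8) << 2) | (dst >= 8));
    }
    out[len++] = 0x89;  /* MOV r/m32, r32 */
    /* ModRM: mod=11 (register-direct), reg=src, rm=dst (low 3 bits each). */
    out[len++] = (uint8_t)(0xC0 | ((src & 7) << 3) | (dst & 7));
    return len;
}
```

For example, `mov eax, edi` encodes as 89 F8 (2 bytes), while `mov eax, r8d` encodes as 44 89 C0 (3 bytes). Every instruction that touches an arg in R8 or R9 pays that extra byte, even if it would otherwise need no prefix.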


The actual design process involved minimizing a cost tradeoff of instruction count and code-size for compiling something (perhaps SPECcpu) with a then-current AMD64 port of GCC. I don’t know whether inlining memcpy as rep movsb was relevant, or whether glibc at the time actually implemented it that way, or what.

My answer on “Why does Windows64 use a different calling convention from all other OSes on x86-64?” cites some sources for the calling convention design decisions. (Early x86-64.org mailing list posts from GCC devs, notably Jan Hubicka, who experimented with a few register orders before coming up with this one.)

Of particular note for remembering the RDX, RCX part of the order is this quote:

We are trying to avoid RCX early in the sequence, since it is register
used commonly for special purposes, like EAX, so it has same purpose
to be missing in the sequence. Also it can’t be used for syscalls and
we would like to make syscall sequence to match function call sequence
as much as possible.


User-space vs. syscall difference:

R10 replaces RCX in the system call convention because the syscall instruction itself destroys RCX. It uses RCX to save RIP so it can avoid using the user-space stack, and it can’t use the kernel stack because it leaves stack switching up to software. In the same way, it uses R11 to save RFLAGS.

Keeping it as similar as possible allows libc wrappers to just mov %rcx, %r10, not shuffle over multiple args to fill the gap. R10 is the next available register after R8 and R9.
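Here is a minimal sketch of that libc-wrapper situation, assuming x86-64 Linux and GCC/Clang inline asm. raw_syscall4 is a hypothetical name. Its parameter order, with the call number last, is chosen deliberately so that a1..a4 arrive in exactly the function-call registers RDI, RSI, RDX, RCX. Real code would use libc's syscall() or the specific wrapper instead.

```c
#include <sys/syscall.h>  /* SYS_getpid, SYS_pread64, ... */
#include <unistd.h>       /* getpid(), for comparison */

#if !defined(__x86_64__) || !defined(__linux__)
#error "this sketch assumes x86-64 Linux"
#endif

/* Hypothetical raw syscall wrapper. The call number n is deliberately the
 * LAST parameter, so a1..a4 arrive in RDI, RSI, RDX, RCX (function-call ABI)
 * and n arrives in R8. The kernel wants n in RAX and a1..a4 in RDI, RSI, RDX, R10. */
long raw_syscall4(long a1, long a2, long a3, long a4, long n)
{
    register long r10 __asm__("r10") = a4;  /* the only arg that changes register: RCX -> R10 */
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"(n), "D"(a1), "S"(a2), "d"(a3), "r"(r10)
                     : "rcx", "r11", "memory");  /* syscall overwrites RCX (saved RIP) and R11 (saved RFLAGS) */
    return ret;  /* negative errno on failure, as the kernel returns it */
}
```

For example, raw_syscall4(0, 0, 0, 0, SYS_getpid) should equal getpid(). With optimization, I'd expect the body to come down to roughly mov %r8,%rax; mov %rcx,%r10; syscall, because the first three args are already where the kernel wants them.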


Alternatively, use a mnemonic:

Diane’s silk dress costs $89 (RDI, RSI, RDX, RCX, R8, R9)

(Suggested by the CS:APP blog)
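The mnemonic and the syscall difference fit in one small lookup table. This is an illustrative sketch: the array names and int_arg_reg are hypothetical. It covers only integer/pointer args, since floating-point args go in XMM0-XMM7 separately, and the syscall column assumes the Linux x86-64 convention.

```c
#include <stddef.h>

/* x86-64 System V function calls: integer/pointer args, in order.
 * "Diane's Silk Dress Costs $89" -> DI, SI, DX, CX, 8, 9. */
static const char *const sysv_call_regs[6] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};

/* Linux x86-64 syscalls: same order, except R10 replaces RCX. */
static const char *const sysv_syscall_regs[6] = {"rdi", "rsi", "rdx", "r10", "r8", "r9"};

/* Hypothetical helper: name of the register holding integer arg i (0-based),
 * or NULL if i >= 6 (further call args go on the stack; syscalls take at most 6). */
const char *int_arg_reg(size_t i, int is_syscall)
{
    if (i >= 6)
        return NULL;
    return is_syscall ? sysv_syscall_regs[i] : sysv_call_regs[i];
}
```

So int_arg_reg(3, 0) gives "rcx" for a normal call, and int_arg_reg(3, 1) gives "r10" for a syscall. Index 3 is the only slot where the two columns differ.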
