Why can I access lower dword/word/byte in a register but not higher?

why can’t I use multiple higher bytes in a register

Every variant of an instruction (each operand size, each register) needs its own encoding in the instruction bytes. The original 8086 processor supports the following options:

instruction     encoding    remarks
---------------------------------------------------------
mov ax,value    b8 01 00    <-- whole register
mov al,value    b0 01       <-- lower byte
mov ah,value    b4 01       <-- upper byte

Because the 8086 is a 16-bit processor, these three versions cover all options.
In the 80386, 32-bit support was added. The designers had a choice: either add support for 3 additional sets of partial registers (x 8 registers = 24 new registers) and somehow find encodings for these, or leave things mostly as they were before.

Here’s what the designers opted for:

instruction      encoding          remarks
---------------------------------------------------------
mov eax,value    b8 01 00 00 00    (same encoding as mov ax,value!)
mov ax,value     66 b8 01 00       (prefix 66 + encoding for mov eax,value)
mov al,value     (same as before)
mov ah,value     (same as before)

They simply added a 0x66 prefix to change the operand size from the (now) default 32 bits to 16 bits, plus a 0x67 prefix to change the address size used by memory operands. And left it at that.
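The prefix scheme above can be sketched as a tiny encoder. This is only an illustrative model: `encode_mov_imm` and its `default_size` parameter are hypothetical names, it handles nothing but `mov reg, imm` on the eight classic registers, and it assumes the standard x86 register numbering (al/ax/eax = 0 ... bh/di/edi = 7).

```python
# Register numbers as they appear in the low 3 bits of the opcode.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def encode_mov_imm(reg, value, default_size=32):
    """Encode `mov reg, imm` (hypothetical helper, not a real assembler API).

    default_size is the current default operand size:
    16 (8086-style code) or 32 (386 32-bit code).
    """
    if reg in REG8:
        # Byte registers have their own opcodes (B0+r), so no prefix is needed:
        # al is B0, ah is B4.
        return bytes([0xB0 + REG8.index(reg), value & 0xFF])
    if reg in REG16:
        size, number = 16, REG16.index(reg)
    elif reg in REG32:
        size, number = 32, REG32.index(reg)
    else:
        raise ValueError(f"unknown register: {reg}")
    # 16- and 32-bit forms share opcode B8+r; the 0x66 prefix selects
    # whichever size is NOT the current default.
    prefix = b"" if size == default_size else b"\x66"
    imm = (value & ((1 << size) - 1)).to_bytes(size // 8, "little")
    return prefix + bytes([0xB8 + number]) + imm
```

For example, `encode_mov_imm("ax", 1).hex()` gives `66b80100` in 32-bit code, while `encode_mov_imm("ax", 1, default_size=16).hex()` gives the plain 8086 form `b80100`.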

To do otherwise would have meant doubling the number of instruction encodings or adding several new prefixes, one for each of your ‘new’ partial registers.
By the time the 80386 came out, nearly all single-byte opcodes were already taken, so there was no room for more prefixes. This opcode space had been eaten up by useless instructions like AAA, AAD, AAM, AAS, DAA, DAS, and SALC. (These have been removed in 64-bit mode to free up much-needed encoding space.)

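If you want to put a value into the high byte of a register (clearing the rest of it), simply do:

movzx eax,cl     //eax = zero-extended cl (like mov al,cl, but faster)
shl eax,24       //move the byte from al into bits 24..31; the lower 24 bits become 0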

But why not two (say r8dl and r8dh)

In the original 8086 there were 8 byte sized registers:

al,cl,dl,bl,ah,ch,dh,bh  <-- in this order.

The index registers (si, di), the base pointer (bp), and the stack pointer (sp) do not have byte registers.

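In x64 this was changed. If an instruction has a REX prefix (the prefix that gives access to the x64 registers), the 3-bit register field plus 1 extra encoding bit from the REX prefix selects one of 16 low-byte registers: al, cl, dl, bl, spl, bpl, sil, dil, and r8b..r15b (also written r8l..r15l). This adds spl, bpl, sil, and dil, but excludes any xh register: the four encodings that used to mean ah, ch, dh, bh now mean spl, bpl, sil, dil. (You can still get the four xh registers in instructions that do not use a REX prefix.)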
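As a rough illustration of that decoding rule, here is a small lookup sketch. The function name `byte_reg_name` and its parameters are made up for this example; real decoders take the extension bit from REX.R, REX.X, or REX.B depending on which field is being decoded, which this model folds into a single `rex_ext` argument.

```python
LEGACY_BYTE_REGS = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REX_LOW_BYTE_REGS = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"]


def byte_reg_name(field, rex_present=False, rex_ext=0):
    """Name the byte register selected by a 3-bit register field.

    field       -- 3-bit register number from the opcode or ModRM byte
    rex_present -- True if the instruction carries any REX prefix
    rex_ext     -- extension bit supplied by the REX prefix (0 or 1)
    """
    if not 0 <= field <= 7:
        raise ValueError("register field is 3 bits wide")
    if not rex_present:
        if rex_ext:
            raise ValueError("an extension bit requires a REX prefix")
        # Without REX, fields 4..7 still mean ah, ch, dh, bh.
        return LEGACY_BYTE_REGS[field]
    if rex_ext:
        # Extension bit set: r8b..r15b.
        return f"r{8 + field}b"
    # With REX, fields 4..7 mean spl, bpl, sil, dil instead of the xh registers.
    return REX_LOW_BYTE_REGS[field]
```

So `byte_reg_name(4)` is `ah`, but `byte_reg_name(4, rex_present=True)` is `spl`, which is why an xh register and a register such as r8b cannot appear in the same instruction.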

And using r8b makes the complete r8 “busy”

Yes, this is called a ‘partial register write’. Because writing r8b changes part, but not all, of r8, r8 is now split into two parts: one that has changed and one that has not. The CPU needs to join the two. It can do this either by spending an extra CPU cycle on the merge, or by adding more circuitry so that the merge fits in a single cycle.
The latter is expensive in terms of silicon and complex in terms of design; it also adds extra heat because more work is done per cycle. See “Why doesn’t GCC use partial registers?” for a run-down on how different x86 CPUs handle partial-register writes (and later reads of the full register).
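To make the ‘join the two halves’ step concrete, here is a minimal model of what the architectural result of a register write looks like in x86-64, by write width. `write_reg` is a hypothetical helper for illustration only; it models the final register value, not the timing or the micro-architectural merge itself.

```python
MASK64 = (1 << 64) - 1


def write_reg(old, value, width):
    """Return the 64-bit register value after writing `width` bits to it.

    8- and 16-bit writes keep the untouched upper bits of the old value
    (this is the merge), 32-bit writes zero the upper 32 bits, and
    64-bit writes replace the whole register.
    """
    if width in (8, 16):
        mask = (1 << width) - 1
        # The result depends on `old`: old upper bits joined with new low bits.
        return (old & ~mask & MASK64) | (value & mask)
    if width == 32:
        # No dependency on `old`: the upper half is simply cleared.
        return value & 0xFFFFFFFF
    if width == 64:
        return value & MASK64
    raise ValueError("width must be 8, 16, 32, or 64")
```

Only the 8- and 16-bit cases read `old`, which is the extra input the CPU has to wait for or merge in; a 32-bit write such as `xor r8d,r8d` has no such dependency.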

if I use r8b I can’t access upper 56 bits at the same time, they exist, but unaccessible

No, they are not inaccessible.

mov  rax,bignumber         //some 64-bit value in rax
mov  al,0                  //clear al, keep the upper 56 bits of rax
xor  r8d,r8d               //r8=0 (a 32-bit write zeroes the upper half too)
mov  r8b,16                //set r8b
or   r8,rax                //set the upper 56 bits of r8 without changing r8b

You use masks together with and, or, xor, and not to change parts of a register without affecting the rest of it.
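The same mask technique, sketched in Python for an arbitrary byte position. The helpers `get_byte` and `set_byte` are illustrative names, and byte index 0 is assumed to be the lowest byte (the one al or r8b would name).

```python
MASK64 = (1 << 64) - 1


def get_byte(reg, index):
    """Read byte `index` (0 = lowest) of a 64-bit value: shift, then and."""
    return (reg >> (8 * index)) & 0xFF


def set_byte(reg, index, value):
    """Replace byte `index` of a 64-bit value, leaving the other 7 bytes alone."""
    shift = 8 * index
    cleared = reg & ~(0xFF << shift) & MASK64   # and with the inverted mask
    return cleared | ((value & 0xFF) << shift)  # or in the new byte
```

For instance, `hex(set_byte(0x1122334455667788, 7, 0xAA))` is `0xaa22334455667788`: only the top byte changes, which is something no single byte-register name can do.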

There really was never a need for ah, but it did lead to more compact code on 8086 (and effectively more usable registers). It’s still sometimes useful to write EAX or RAX and then read AL and AH separately (e.g. movzx ecx, al / movzx edx, ah) as part of unpacking bytes.
