Instruction decoding when instructions are variable-length

There are very good reasons to have a fixed instruction length, implementation simplicity being the most important. That’s why many processors do indeed have a fixed instruction length, like RISC processors and many early computers.
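To make the simplicity argument concrete: with a fixed 4-byte encoding (as in most RISC ISAs), instruction boundaries are known without decoding anything. The sketch below finds any instruction in a block by pure arithmetic; `fetch_nth` is a made-up helper for illustration, not any real CPU interface.

```c
#include <stdint.h>
#include <string.h>

/* With a fixed 4-byte encoding, the n-th instruction of a block starts
 * at byte offset 4*n, so no earlier instruction has to be decoded
 * before this one can be fetched. */
static inline uint32_t fetch_nth(const uint8_t *block, unsigned n)
{
    uint32_t insn;
    memcpy(&insn, block + 4u * n, sizeof insn);
    return insn;
}
```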

CISC instruction sets like x86 were designed to be decoded sequentially (step by step) by microcode (you can think of microcode as a kind of interpreter for CISC instructions). That was the state of the art in the late 1970s, when x86 was designed.
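Here is a minimal sketch of what sequential decoding looks like, using a completely made-up length rule (the opcodes and lengths are hypothetical, not real x86): the start of the next instruction is only known once the current one has been decoded.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical length rule: the first byte alone decides the length.
 * Real x86 may need prefixes, ModRM, SIB, displacement and immediate
 * bytes before the length is known. */
static size_t insn_length(const uint8_t *p)
{
    switch (p[0]) {
    case 0x01: return 2;   /* made-up 2-byte instruction */
    case 0x02: return 5;   /* made-up 5-byte instruction */
    default:   return 1;   /* everything else: 1 byte */
    }
}

/* Sequential decode: the start of instruction n+1 is only known once
 * instruction n has been decoded. */
static size_t count_instructions(const uint8_t *code, size_t len)
{
    size_t pc = 0, count = 0;
    while (pc < len) {
        pc += insn_length(code + pc);
        count++;
    }
    return count;
}
```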

Nowadays this is a problem, because microcode is dead. x86 instructions are now broken into smaller µ-ops, not unlike RISC instructions, but to do so, the x86 instructions have to be decoded first. Current CPUs decode up to 4 instructions each cycle, so there is no time to decode one instruction after another sequentially. Instead, this works simply by brute force: when a line is brought in from the instruction cache, many decoders decode it in parallel, one instruction decoder at each possible byte offset. Only after the decoding is the length of each instruction known, and the processor then decides which decoders actually produced valid instructions. This is wasteful, but very fast.
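A rough model of the brute-force approach, again with a hypothetical one-byte length rule and an assumed 16-byte fetch line: decode speculatively at every offset, then chain the lengths to find out which decoders actually hit a real instruction start.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 16   /* assumed fetch width, for illustration only */

/* Hypothetical length rule, as before: one byte is enough to decide. */
static size_t speculative_length(uint8_t first_byte)
{
    return (first_byte & 0x80) ? 3 : 1;   /* made up: high bit => 3 bytes */
}

/* Brute-force decode of one fetched line: a decoder at every byte
 * offset decodes speculatively; in hardware they all work in parallel
 * in the same cycle, the loops below only model the result. */
static void decode_line(const uint8_t line[LINE_BYTES],
                        bool valid_start[LINE_BYTES])
{
    size_t length_at[LINE_BYTES];

    for (size_t off = 0; off < LINE_BYTES; off++) {
        length_at[off]   = speculative_length(line[off]);
        valid_start[off] = false;
    }

    /* Only now can the processor tell which decoders produced real
     * instructions: chain the lengths starting from the first offset. */
    for (size_t off = 0; off < LINE_BYTES; ) {
        valid_start[off] = true;
        off += length_at[off];
    }
}
```

Everything decoded at an offset that turns out to be mid-instruction is simply thrown away; that is the "wasteful, but very fast" part.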

Variable instruction sizes introduce more headaches, e.g. an instruction can span two cache lines or even two pages in memory. So your observation is spot on. Today nobody would design a CISC instruction set like x86. However, some RISCs have recently introduced a second instruction size to get more compact code: MIPS16, ARM Thumb, etc.
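To see why the split case is annoying, a tiny check (assuming a 64-byte cache line) is enough: whenever it returns true, the fetch unit needs bytes from two lines, or in the page case from two pages, either of which can fault independently, before decoding can finish.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* assumed line size; use 4096 for the page case */

/* Does an instruction of `len` bytes starting at `addr` straddle a
 * cache-line boundary? If it does, the decoder needs bytes from two
 * lines before it can even finish decoding. */
static bool spans_boundary(uint64_t addr, size_t len)
{
    return (addr / CACHE_LINE) != ((addr + len - 1) / CACHE_LINE);
}
```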
