Why does the ARM PC register point to the instruction after the next one to be executed?

It’s a nasty bit of legacy abstraction leakage.

The original ARM design had a 3-stage pipeline (fetch-decode-execute). To simplify the design, they chose to have the PC read as the value currently on the instruction fetch address lines, rather than the address of the currently executing instruction, which had been fetched 2 cycles earlier. Since PC-relative addresses are mostly calculated at assembly or link time anyway, it was easier to have the assembler/linker compensate for that 2-instruction (8-byte) offset than to design all the logic needed to ‘correct’ the PC register.
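To make the offset concrete, here’s a minimal sketch of mine (not part of the original answer), assuming a 32-bit ARM target built in ARM state (e.g. with `arm-linux-gnueabi-gcc -marm -O0`): it reads the PC directly and compares it against the real address of the instruction doing the reading, which should differ by 8 bytes, i.e. two instructions.

```c
/* Sketch only: assumes GCC inline assembly on a 32-bit ARM target in
 * ARM state (build with -marm). */
#include <stdio.h>

int main(void)
{
    unsigned int pc_as_read, actual_addr;

    __asm__ volatile(
        "1:  mov %0, pc     \n\t"   /* what the PC register reads as        */
        "    adr %1, 1b     \n\t"   /* the real address of that mov above   */
        : "=r"(pc_as_read), "=r"(actual_addr));

    /* In ARM state the PC reads as the current instruction's address + 8. */
    printf("PC read as %#x, instruction was at %#x, offset = %u\n",
           pc_as_read, actual_addr, pc_as_read - actual_addr);
    return 0;
}
```

Note that the `adr` is itself assembled into PC-relative arithmetic, so it’s an example of the assembler quietly compensating for that +8 on your behalf.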

Of course, that’s all firmly on the “things that made sense 30 years ago” pile. Now imagine what it takes to keep a meaningful value in that register on today’s 15+ stage, multiple-issue, out-of-order pipelines, and you might appreciate why it’s hard to find a CPU designer these days who thinks exposing the PC as a register is a good idea.

Still, on the upside, at least it’s not quite as horrible as delay slots. And contrary to what you might suppose, having every instruction execute conditionally was really just another optimisation around that prefetch offset. Rather than always taking a pipeline flush when branching around conditional code (or carrying on executing whatever’s left in the pipe like a crazy person), you can avoid very short branches entirely; the pipeline stays busy, and the already-decoded instructions simply execute as NOPs when the flags don’t match*. Again, these days we have effective branch predictors and it ends up being more of a hindrance than a help, but for 1985 it was cool.
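By way of illustration (my sketch, not from the answer above): for a one-instruction “if” like the C below, a compiler targeting classic ARM can use a conditionally executed add instead of a branch around it, so nothing ever needs flushing; the register names in the comments are illustrative, not verified compiler output.

```c
/* Sketch only: the commented assembly shows the general shape a compiler
 * might emit, with illustrative register assignments.
 *
 * Conditional execution, no branch:
 *     cmp   r0, #0          ; set flags from 'flag'
 *     addeq r1, r2, r3      ; executes only when Z is set; otherwise it
 *                           ; just passes down the pipe as a NOP
 *
 * Branchy alternative, which costs a flush of the 3-stage pipeline
 * whenever the branch is taken:
 *     cmp   r0, #0
 *     bne   skip
 *     add   r1, r2, r3
 * skip:
 */
int conditional_add(int flag, int acc, int a, int b)
{
    if (flag == 0)
        acc = a + b;
    return acc;
}
```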

* “…the instruction set with the most NOPs on the planet.”
