What are the semantics of ADRP and ADRL instructions in ARM assembly?

Question

ADR

ADR is a simple PC-relative address calculation: you give it an immediate offset, and it stores in the register the address relative to the current PC.

For example, if the following ADR instruction is placed at position 0x4000 in memory:

adr x0, #1

then after this instruction is executed x0 now contains the value 0x4001. On GitHub with runnable assertion.

We could instead of that just try to do:

mov x0, #0x4001

but PC-relative addressing has the following advantages:

all ARMv7 / ARMv8 instructions are 4 bytes long. This in in large contrast to x86 where instruction widths are variable.

This simplifies a lot of things, but it has one unfortunate implication: you cannot encode full addresses (4 / 8 bytes) in a single instruction, since we need some bits to encode the instruction itself.

Even though we cannot store full addresses, we can refer to some of them (those that fit into the encoding) by the relative address to the PC, which is often enough for many applications, since we are often only jumping to nearby code locations.

The rationale here is analogous to that of the existence of the ldr = pseudo instruction: Why use LDR over MOV (or vice versa) in ARM assembly?
it allows for position independent code, which is fundamental to avoid shared libraries from clashing in memory, and but is also useful for the main text segment to enable ASLR, see also: What is the -fPIE option for position-independent executables in gcc and ld?
the generated code is smaller

The ADR instruction uses a 21-bit immediate for the offset, which allows for +-1MiB jumps (20-bits + 1 for the sign).

In ARmv7/aarch32, ADR can sometimes be achieved with ADD and SUB with PC as documented on the ARMv7 DDI 0406C.d manual D9.4 “Explicit use of the PC in ARM instructions”:

Some forms of the ADR instruction can be expressed as forms of ADD or SUB, with the PC as Rn. Those forms of ADD and SUB are permitted, and
not deprecated.

TODO when can it not be achieved with ADD? GNU GAS suggests that ADR is just a pseudo-op that always assembles into ADD or SUB: https://sourceware.org/binutils/docs-2.31/as/ARM-Opcodes.html#ARM-Opcodes

This instruction will load the address of label into the indicated register. The instruction will evaluate to a PC relative ADD or SUB instruction depending upon where the label is located. If the label is out of range, or if it is not defined in the same file (and section) as the ADR instruction, then an error will be generated. This instruction will not make use of the literal pool.

In ARMv8 aarch64 however, the PC cannot be used in every instruction like a general purpose register, therefore ADR is actually important there and has a separate encoding: Howto write PC relative adressing on arm asm?

ADRP

ADRP is similar to ADR, but it:

shifts pages (4KiB, P in ADRP stands for Page) relative to the current pages instead of just bytes
zeroes out the 12 lower bits

For example, if the following ADRP instruction is placed at position 0x4050 in memory:

adrp x0, #0x1000

then after this instruction is executed x0 now contains the value 0x5000 (+ 0x1000 and zero out the first 12 bits).

Note however that the above syntax is only educational, as GNU GAS does not appear to accept literal integer constants as arguments, only symbols. (or it treats 0x1000 as a symbol and link fails, something along those lines, no time to understand it fully now TODO).

Since the lower 12 bits are zeroed out, to calculate a full address, ADRP is normally used together with an ADD + :lo12: relocation as in:

adrp x0, myvariable
add x0, x0, :lo12:myvariable

On GitHub with runnable assertion.

Note that the :lo12: just extracts the lower 12 bits of myvariable to an immediate, the final instruction produced by the linker is just an add x0, x0, #<immediate>, see also: AArch64 relocation prefixes and What do linkers do?.

The advantage of ADRP over ADR is that we can jump much further (+-4GiB), at the cost of needing to do an extra ADD after ADRP to set the lower 12-bits. The ARMv8 manual says:

The ADR instruction adds a signed, 21-bit immediate to the value of the program counter that fetched this instruction, and then writes the result to a general-purpose register. This permits the calculation of any byte address within ±1MB of the current PC.

The ADRP instruction shifts a signed, 21-bit immediate left by 12 bits, adds it to the value of the program counter with the bottom 12 bits cleared to zero, and then writes the result to a general-purpose register. This permits the calculation of the address at a 4KB aligned memory region. In conjunction with an ADD (immediate) instruction, or a Load/Store instruction with a 12-bit immediate offset, this allows for the calculation of, or access to, any address within ±4GB of the current PC.

Another limitation of ADRP is that unlike ADR, it would break if you were to load the code to memory at a position that is not offset by a multiple of a 4K relative to the original linker offset (e.g. due to ASLR) . For example, if you shift a little bit up, the target address could fall on the next page, while the PC location stays on the old one, making ADRP point to the wrong page. However, executables that rely on ADRP are still considered PIE, and systems such as the dynamic linker/ASLR can only relocate in memory by multiples of 4K, related: How is the address of the text section of a PIE executable determined in Linux?

ADRP only exists in ARMv8, not in ARMv7.

The ARMv8 DDI 0487C.a manual says that Page is just a mnemonic for 4KB, and does not reflect the actual page size, which is configurable to other sizes. C3.3.5 “PC-relative address calculation”:

The term page used in the ADRP description is short-hand for the 4KB memory region, and is not related to the virtual
memory translation granule size.

ADRL

ADRL is is not an actual instruction, just a “pseudo-instruction”, i.e. an assembler shortcut that emits real instructions.

As such, it is not mentioned in the v7 manual, and there is just one mention on the v8 manual at “Instructions that read the PC”, but I can’t find anywhere in the manual that explains it, so maybe it is just a documentation mistake?

I will therefore focus on the GNU AS implementation which documents it at https://sourceware.org/binutils/docs-2.31/as/ARM-Opcodes.html#ARM-Opcodes under ARM specific features:

adrl <register> <label>
This instruction will load the address of label into the indicated register. The instruction will evaluate to one or two PC relative ADD or SUB instructions depending upon where the label is located. If a second instruction is not needed a NOP instruction will be generated in its place, so that this instruction is always 8 bytes long.

Therefore it appears to be able to expand to multiple ADD/SUB, presumably to allow for a larger jump from the PC.

Objdump confirms what the GNU manual says for short addresses:

    adr r0, label
   10478:       e28f0008        add     r0, pc, #8

    adrl r2, label
   10480:       e28f2000        add     r2, pc, #0
   10484:       e1a00000        nop                     ; (mov r0, r0)

TODO: example of long addresses. What is the maximum length? Just 2x that of ADD/ADR?

Trying to use it on aarch64 fails, since it is an ARMv7 specific features according to the GNU GAS manual. The error message on GNU GAS is 2.29.1 is:

Error: unknown mnemonic `adrl' -- `adrl r6,.Llabel'

The Linux kernel has also defined a macro called adr_l at https://patchwork.kernel.org/patch/9883301/ TODO understand rationale.

Alternatives

One main alternative for when the PC offset is too long to encode into the instruction either, is to use movk / movw / movt, see: What is the difference between =label (equals sign) and [label] (brackets) in ARMv6 assembly?

More Related Contents:

Leave a Comment Cancel reply