Mis-aligned pointers on x86

Situations where unaligned access causes problems on x86 (beyond the memory access taking longer) are uncommon. Here are some of the ones I’ve heard about:

  1. You might not count this as an x86 issue, but SSE operations benefit from alignment. Aligned data can be used as a memory source operand, which saves instructions. Unaligned-load instructions like movups are slower than movaps on microarchitectures before Nehalem. On Nehalem and later (and AMD Bulldozer-family), unaligned 16-byte loads/stores are about as efficient as unaligned 8-byte loads/stores: they are a single uop with no penalty at all if the data happens to be aligned at runtime or doesn’t cross a cache-line boundary, and there is efficient hardware support for cache-line splits otherwise. 4k splits are very expensive (~100 cycles) before Skylake, which brings them down to ~10 cycles, similar to a cache-line split. See https://agner.org/optimize/ and the performance links in the x86 tag wiki for more info.

  2. Interlocked operations (like lock add [mem], eax) are very slow if they aren’t sufficiently aligned, especially if they cross a cache-line boundary, because then they can’t just use a cache lock inside the CPU core. On older (buggy) SMP systems, they might actually fail to be atomic (see https://blogs.msdn.com/oldnewthing/archive/2004/08/30/222631.aspx).

  3. Another possibility, discussed by Raymond Chen, is dealing with devices that have hardware banked memory (admittedly an oddball situation): https://blogs.msdn.com/oldnewthing/archive/2004/08/27/221486.aspx

  4. I recall similar problems with unaligned accesses that straddle a page boundary and also trigger a page fault, but I don’t have a reference for this, so I’m not sure about it. I’ll see if I can dig up a reference.
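Items 1, 2 and 4 all come down to whether an access straddles a cache-line or page boundary. Here is a minimal sketch of that check, assuming 64-byte cache lines and 4 KiB pages (common on current x86 parts, but an assumption here, not something the code queries). The helper name `crosses_boundary` and the two constants are made up for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes; not queried from the hardware. */
#define CACHE_LINE_SIZE 64u
#define PAGE_SIZE_BYTES 4096u

/* True if the `size`-byte access starting at `addr` touches two
   different `boundary`-sized blocks. `boundary` must be a power of
   two and `size` must be non-zero. */
static bool crosses_boundary(uintptr_t addr, size_t size, uintptr_t boundary)
{
    uintptr_t first = addr & ~(boundary - 1);              /* block holding the first byte */
    uintptr_t last  = (addr + size - 1) & ~(boundary - 1); /* block holding the last byte  */
    return first != last;
}
```

For example, a 16-byte load at offset 56 within a cache line spans bytes 56 to 71 and therefore splits across two lines, while the same load at offset 48 does not.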

I also learned something new when looking into this question (I was wondering about the “$ps |= (1<<18)” GDB command that was mentioned in a couple of places). I didn’t realize that x86 CPUs (starting with the 486, it seems) have the ability to cause an exception when a misaligned access is performed.

From Jeffrey Richter’s “Programming Applications for Windows, 4th Ed”:

“Let’s take a closer look at how the x86 CPU handles data alignment. The x86 CPU contains a special bit flag in its EFLAGS register called the AC (alignment check) flag. By default, this flag is set to zero when the CPU first receives power. When this flag is zero, the CPU automatically does whatever it has to in order to successfully access misaligned data values. However, if this flag is set to 1, the CPU issues an INT 17H interrupt whenever there is an attempt to access misaligned data. The x86 version of Windows 2000 and Windows 98 never alters this CPU flag bit. Therefore, you will never see a data misalignment exception occur in an application when it is running on an x86 processor.”

This was news to me.
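To make the flag concrete, here is a small sketch of what bit 18 means. It is a simplified model, not code that touches the real EFLAGS register: it assumes an access faults whenever AC is set and the address is not a multiple of the access size. Real hardware additionally requires the AM bit in CR0 to be set and the code to be running at privilege level 3. The names `EFLAGS_AC` and `alignment_check_faults` are made up for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 18 of EFLAGS is the AC flag; this is the mask that the GDB
   expression `$ps |= (1<<18)` ORs into the flags value. */
#define EFLAGS_AC (UINT32_C(1) << 18)

/* Simplified model (assumption): with AC set, an access faults when
   its address is not a multiple of its size. This ignores CR0.AM and
   the privilege-level requirement of real hardware. */
static bool alignment_check_faults(uint32_t eflags, uintptr_t addr, size_t size)
{
    bool ac_enabled = (eflags & EFLAGS_AC) != 0;
    return ac_enabled && (addr % size) != 0;
}
```

With AC clear, the model never reports a fault, which matches the default behavior the quote describes.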

Of course, the big problem with misaligned accesses is that when you eventually compile the code for a non-x86/x64 processor, you end up having to track down and fix a whole bunch of stuff, since virtually all other 32-bit or larger processors are sensitive to alignment issues.
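One common way to keep such code portable is to avoid dereferencing a possibly misaligned pointer at all and to copy the bytes instead. This is a minimal sketch of that idea, not something taken from the sources above; `load_u32_unaligned` is a hypothetical helper name:

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from an address with no alignment guarantee.
   memcpy is defined for any alignment, so this is safe on targets
   that fault on misaligned loads. Compilers typically turn it into a
   single load where the target allows that. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

The value comes back in the host’s byte order, exactly as a direct pointer dereference would on a machine that tolerates the misaligned access.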
