Structure of Arrays vs Array of Structures

Choice of AoS versus SoA for optimum performance usually depends on access pattern. This is not just limited to CUDA however – similar considerations apply for any architecture where performance can be significantly affected by memory access pattern, e.g. where you have caches or where performance is better with contiguous memory access (e.g. coalesced memory accesses in CUDA).

E.g. for RGB pixels versus separate RGB planes:

struct {
    uint8_t r, g, b;
} AoS[N];

struct {
    uint8_t r[N];
    uint8_t g[N];
    uint8_t b[N];
} SoA;

If you are going to be accessing the R/G/B components of each pixel concurrently then AoS usually makes sense, since the successive reads of R, G, B components will be contiguous and usually contained within the same cache line. For CUDA this also means memory read/write coalescing.

However if you are going to process color planes separately then SoA might be preferred, e.g. if you want to scale all R values by some scale factor, then SoA means that all R components will be contiguous.

One further consideration is padding/alignment. For the RGB example above each element in an AoS layout is aligned to a multiple of 3 bytes, which may not be convenient for CUDA, SIMD, et al – in some cases perhaps even requiring padding within the struct to make alignment more convenient (e.g. add a dummy uint8_t element to ensure 4 byte alignment). In the SoA case however the planes are byte aligned which can be more convenient for certain algorithms/architectures.

For most image processing type applications the AoS scenario is much more common, but for other applications, or for specific image processing tasks this may not always be the case. When there is no obvious choice I would recommend AoS as the default choice.

See also this answer for more general discussion of AoS v SoA.

Leave a Comment