Why can’t C compilers rearrange struct members to eliminate alignment padding? [duplicate]

There are multiple reasons why the C compiler cannot automatically reorder the fields:

  • The C compiler doesn’t know whether the struct describes a memory layout that is also defined outside the current compilation unit (for example: a foreign library, a file on disk, network data, CPU page tables, …). In such a case the binary layout is fixed in a place the compiler cannot see, so reordering the struct fields would produce a data type that is inconsistent with the other definitions. For example, the local file header that precedes each entry in a ZIP archive contains several misaligned 32-bit fields. Reordering the fields would make it impossible for C code to read or write the header directly (assuming the ZIP implementation wants to access the data that way):

    #include <stdint.h>  // uint16_t, uint32_t

    // __attribute__((__packed__)) is a GCC/Clang extension
    struct __attribute__((__packed__)) LocalFileHeader {
        uint32_t signature;
        uint16_t minVersion, flag, method, modTime, modDate;
        uint32_t crc32, compressedSize, uncompressedSize;
        uint16_t nameLength, extraLength;
    };
    

    The packed attribute only stops the compiler from inserting padding to align the fields naturally; it is unrelated to the question of field ordering. It would be possible to reorder the fields of LocalFileHeader so that the structure has minimal size and every field sits at its natural alignment. However, the compiler cannot choose to do so, because it does not know that the layout is actually dictated by the ZIP file specification.
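To make the constraint concrete, here is a minimal sketch that pins the layout down with compile-time checks and fills the struct straight from raw bytes. The helper `read_local_header` is a hypothetical name, not part of any ZIP library, and the field values come out right only on a little-endian host, because ZIP stores its fields in little-endian byte order.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>
#include <string.h>   /* memcpy */

struct __attribute__((__packed__)) LocalFileHeader {
    uint32_t signature;
    uint16_t minVersion, flag, method, modTime, modDate;
    uint32_t crc32, compressedSize, uncompressedSize;
    uint16_t nameLength, extraLength;
};

/* The file format fixes these numbers; the compiler must not change them. */
static_assert(sizeof(struct LocalFileHeader) == 30, "header is 30 bytes on disk");
static_assert(offsetof(struct LocalFileHeader, crc32) == 14, "crc32 is misaligned by design");
static_assert(offsetof(struct LocalFileHeader, nameLength) == 26, "nameLength offset");

/* Hypothetical helper: copy the raw header bytes into the struct.
 * Returns 0 on success, -1 if the buffer is too short.
 * Assumes a little-endian host; no byte swapping is done. */
static int read_local_header(const unsigned char *buf, size_t len,
                             struct LocalFileHeader *out)
{
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);
    return 0;
}
```

If the compiler were free to move `crc32` in front of the 16-bit fields, both static assertions on the offsets would fail and the `memcpy` would put every field in the wrong place.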

  • C is an unsafe language. The C compiler doesn’t know whether the data will be accessed via a different type than the one seen by the compiler, for example:

    struct S {
        char a;
        int b;
        char c;
    };
    
    struct S_head {
        char a;
    };
    
    struct S_ext {
        char a;
        int b;
        char c;
        int d;
        char e;
    };
    
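    // View only the leading member of an S through the smaller type
    struct S s;
    struct S_head *head = (struct S_head*)&s;
    fn1(head);
    
    // View the leading members of an S_ext as if they were an S
    struct S_ext ext;
    struct S *sp = (struct S*)&ext;
    fn2(sp);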
    

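    This is a widely used low-level programming pattern, especially if the header contains the type ID of data located just beyond the header. It works only if the shared leading fields sit at the same offsets in every one of these types, which the compiler could not guarantee if it were free to reorder each struct independently.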
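The type-ID variant of this pattern can be sketched as follows. This is an illustration with made-up names (`NodeHeader`, `IntNode`, `PairNode`, `node_sum`), and it uses the variant in which the header is embedded as the first member, because the C standard guarantees that a pointer to a struct, suitably converted, points to its first member. That guarantee is exactly what free reordering would break.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Hypothetical type IDs stored in the common header. */
enum NodeKind { NODE_INT = 1, NODE_PAIR = 2 };

struct NodeHeader { int kind; };

/* Each concrete type starts with the header. */
struct IntNode  { struct NodeHeader hdr; int value; };
struct PairNode { struct NodeHeader hdr; int first, second; };

/* The first member always lives at offset 0. */
static_assert(offsetof(struct IntNode, hdr) == 0, "header must come first");
static_assert(offsetof(struct PairNode, hdr) == 0, "header must come first");

/* Inspect the type ID, then cast back to the full type it identifies. */
static int node_sum(const struct NodeHeader *h)
{
    switch (h->kind) {
    case NODE_INT:
        return ((const struct IntNode *)h)->value;
    case NODE_PAIR: {
        const struct PairNode *p = (const struct PairNode *)h;
        return p->first + p->second;
    }
    default:
        return 0;
    }
}
```

A caller passes `&node.hdr` (or the node's address cast to `struct NodeHeader *`) without `node_sum` needing to know the concrete type in advance.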

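  • If a struct type is embedded in another struct type, the compiler cannot inline the inner struct, that is, interleave its fields with those of the outer struct. The inner object must stay a contiguous, self-contained S so that a pointer to it can be passed to any code expecting an S: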

    struct S {
        char a;
        int b;
        char c, d, e;
    };
    
    struct T {
        char a;
        struct S s; // Cannot inline S into T, 's' has to be compact in memory
        char b;
    };
    

    This also means that moving some fields from S to a separate struct disables some optimizations:

    // Cannot fully optimize S
    struct BC { int b; char c; };
    struct S {
        char a;
        struct BC bc;
        char d, e;
    };
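The cost of keeping the inner struct contiguous can be measured with `sizeof` and `offsetof`. The sketch below compares the nested `T` with a hand-flattened version; `T_flat`, its field names, and `print_layout` are illustrative, and the sizes in the comments are what a typical ABI with a 4-byte, 4-byte-aligned `int` produces, not something the C standard promises.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdio.h>

struct S {
    char a;
    int b;
    char c, d, e;
};

struct T {
    char a;
    struct S s;   /* must remain one contiguous S */
    char b;
};

/* Hand-flattened equivalent: the int first, then all six chars. */
struct T_flat {
    int s_b;
    char a, s_a, s_c, s_d, s_e, b;
};

/* T.b can never be placed inside the storage of T.s. */
static_assert(offsetof(struct T, b) >= offsetof(struct T, s) + sizeof(struct S),
              "outer fields cannot overlap the inner struct");

static void print_layout(void)
{
    /* Typically prints 12, 20 and 12 on common 32- and 64-bit ABIs. */
    printf("sizeof(struct S)      = %zu\n", sizeof(struct S));
    printf("sizeof(struct T)      = %zu\n", sizeof(struct T));
    printf("sizeof(struct T_flat) = %zu\n", sizeof(struct T_flat));
}
```

Only the programmer can make the flattening decision, because it changes the type: `T_flat` no longer contains an object that can be passed to a function taking `struct S *`.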
    
  • Most C compilers are already optimizing compilers, so reordering struct fields would require new optimizations to be designed and implemented, and it is questionable whether they would do better than what programmers already write. Laying out a data structure by hand takes a programmer far less effort than doing by hand the work that compilers automate, such as register allocation, function inlining, constant folding, or turning a switch statement into a binary search. The benefit of letting the compiler optimize data layout therefore appears smaller than that of the traditional compiler optimizations.
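As a small illustration of how little effort the manual version takes, the sketch below reorders the three-field struct from earlier so that the widest field comes first. The names `S_declared`, `S_by_hand`, and `bytes_saved` are made up for this example, and the concrete sizes mentioned in the comments assume a 4-byte, 4-byte-aligned `int`.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* Declaration order as written: padding after 'a' and after 'c'. */
struct S_declared {
    char a;
    int b;
    char c;
};

/* Same fields, widest first: only tail padding remains. */
struct S_by_hand {
    int b;
    char a;
    char c;
};

/* Typically 8 bytes versus 12 bytes. */
static_assert(sizeof(struct S_by_hand) <= sizeof(struct S_declared),
              "reordering by hand never makes this struct larger");

/* Bytes saved per object by the manual reordering. */
static size_t bytes_saved(void)
{
    return sizeof(struct S_declared) - sizeof(struct S_by_hand);
}
```

Sorting fields by decreasing alignment is a common rule of thumb for this, and it is a one-time edit to the declaration.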
