Team LiB

Data Alignment

Alignment refers to a piece of data's location in memory. A variable is naturally aligned if it exists at a memory address that is a multiple of its size. For example, a 32-bit type is naturally aligned if it is located in memory at an address that is a multiple of four (that is, its lowest two bits are zero). Thus, a data type with a size of 2^n bytes must have an address with the n least significant bits set to zero.
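The low-bits rule can be sketched as a small check. This is only an illustration: is_naturally_aligned() is a hypothetical helper, not a kernel interface, and it assumes the size passed in is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns true if addr is a
 * multiple of size. The size must be a power of two, so the test
 * reduces to checking that the low log2(size) bits of addr are zero.
 */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
        assert(size != 0 && (size & (size - 1)) == 0);
        return (addr & (uintptr_t)(size - 1)) == 0;
}
```

For a 4-byte type, size - 1 is 3, so the mask selects exactly the two lowest bits mentioned above.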

Some architectures have very stringent requirements on the alignment of data. On some systems, usually RISC-based ones, a load of unaligned data results in a processor trap (a handled error). On other systems, accessing unaligned data works, but results in a degradation of performance. When writing portable code, alignment issues must be avoided and all types should be naturally aligned.

Avoiding Alignment Issues

The compiler generally prevents alignment issues by naturally aligning all data types. In fact, alignment issues are normally not major concerns of the kernel developers; the gcc folks have to worry about them. Issues arise, however, when the programmer plays too closely with pointers and accesses data outside the environment anticipated by the compiler.

Casting a pointer to a type with a stricter alignment requirement and then dereferencing it can cause an alignment issue (whatever that might mean for a particular architecture). That is, this is bad news:

char dog[10];
char *p = &dog[1];
unsigned long l = *(unsigned long *)p;

This example treats the pointer to a char as a pointer to an unsigned long, which might result in the unsigned long (32 bits wide on a 32-bit machine) being loaded from an address that is not a multiple of four.
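One portable way to read such a value without the cast is to copy the bytes into a properly aligned local variable. The following is a minimal user-space sketch, not the method the text prescribes; load_unaligned_ulong() is a hypothetical name chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: read an unsigned long from an address that may
 * not be suitably aligned. memcpy() makes no alignment assumptions
 * about its arguments, so the copy is safe regardless of where p
 * points; the result lands in a naturally aligned local variable.
 */
static unsigned long load_unaligned_ulong(const void *p)
{
        unsigned long value;

        assert(p != NULL);
        memcpy(&value, p, sizeof(value));
        return value;
}
```

With this helper, the load in the example would become load_unaligned_ulong(&dog[1]). The caller is still responsible for making sure sizeof(unsigned long) bytes are available at that address.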

If you are thinking, "When in the world would I do this?" you are probably right. Nevertheless, it has come up, and it will again, so be careful. The real-world examples might not be so obvious.

Alignment of Nonstandard Types

As mentioned, the aligned address of a standard data type is a multiple of the size of that data type. Nonstandard (complex) C types have the following alignment rules:

  • The alignment of an array is the alignment of the base type (and thus, each element is further aligned correctly).

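  • The alignment of a union is the alignment of its most strictly aligned member (for standard types, the largest included type).

  • The alignment of a structure is the alignment of its most strictly aligned member (for standard types, the largest included type).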
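These rules can be checked at compile time with C11's alignof. The sketch below assumes a C11 compiler and a typical ABI; union number and struct record are hypothetical types that exist only for this illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical types, used only to illustrate the rules. */
union number {
        uint8_t  byte;
        uint16_t half;
        uint32_t word;
};

struct record {
        uint8_t  tag;
        uint32_t value;
};

/* An array is aligned like its element type. */
static_assert(alignof(uint16_t[8]) == alignof(uint16_t),
              "array alignment");

/* A union or structure is aligned like its most strictly aligned member. */
static_assert(alignof(union number) == alignof(uint32_t),
              "union alignment");
static_assert(alignof(struct record) == alignof(uint32_t),
              "struct alignment");
```

If any of these assumptions did not hold on a given target, the build would fail instead of silently producing a different layout.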

Structures also introduce padding, which introduces other issues.

Structure Padding

Structures are padded so that each element of the structure is naturally aligned. This ensures that when the processor accesses a given element in the structure, that element itself is aligned. For example, consider this structure on a 32-bit machine:

struct animal_struct {
        char dog;             /* 1 byte */
        unsigned long cat;    /* 4 bytes */
        unsigned short pig;   /* 2 bytes */
        char fox;             /* 1 byte */
};

The structure is not laid out exactly like this in memory, because packing the members back to back would leave cat at an address that is not naturally aligned. Instead, the compiler creates the structure such that in memory, the struct resembles the following:

struct animal_struct {
        char dog;             /* 1 byte */
        u8 __pad0[3];         /* 3 bytes */
        unsigned long cat;    /* 4 bytes */
        unsigned short pig;   /* 2 bytes */
        char fox;             /* 1 byte */
        u8 __pad1;            /* 1 byte */
};

The padding variables exist to ensure proper natural alignment. The first padding provides a 3-byte waste-of-space to place cat on a 4-byte boundary. This automatically aligns the remaining types because they are all smaller than cat. The second and final padding is to pad the size of the struct itself. The extra byte ensures the size of the structure is a multiple of four, and thus each member of an array of this structure is naturally aligned.

Note that sizeof(struct animal_struct) returns 12 for either of these structures on most 32-bit machines. The C compiler automatically adds this padding to ensure proper alignment.
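The layout can be confirmed with offsetof and sizeof. As a sketch, the structure below mirrors animal_struct but substitutes uint32_t and uint16_t for unsigned long and unsigned short; that substitution is an assumption made here so the numbers also hold on 64-bit machines, where unsigned long is often 8 bytes. The name struct animal_fixed is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width variant of animal_struct. */
struct animal_fixed {
        char     dog;   /* offset 0, followed by 3 bytes of padding */
        uint32_t cat;   /* offset 4 */
        uint16_t pig;   /* offset 8 */
        char     fox;   /* offset 10, followed by 1 byte of padding */
};

/* These hold on common ABIs where uint32_t is 4-byte aligned. */
static_assert(offsetof(struct animal_fixed, cat) == 4, "cat is padded to 4");
static_assert(offsetof(struct animal_fixed, pig) == 8, "pig follows cat");
static_assert(offsetof(struct animal_fixed, fox) == 10, "fox follows pig");
static_assert(sizeof(struct animal_fixed) == 12, "size is a multiple of 4");
```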

You can often rearrange the order of members in a structure to obviate the need for padding. This gives you properly aligned data without the need for padding, and therefore a smaller structure:

struct animal_struct {
        unsigned long cat;    /* 4 bytes */
        unsigned short pig;   /* 2 bytes */
        char dog;             /* 1 byte */
        char fox;             /* 1 byte */
};

This structure is only eight bytes in size. It might not always be possible to rearrange structure definitions, however. For example, if this structure were specified as part of a standard or already used in existing code, its order is set in stone, although such requirements are less common in the kernel (which lacks a formal ABI) than in user-space. Often, you might want to use a specific order for other reasons, for example, to best lay out variables to optimize cache behavior. Note that ANSI C specifies that the compiler itself must never change the order of members in a structure[5]; it is always up to you, the programmer. The compiler can help you out, however: The -Wpadded flag instructs gcc to generate a warning whenever padding is added to a structure.

[5] If the compiler could arbitrarily change the order of items in a structure, any existing code using the structure would break. In C, functions calculate the location of variables in a structure simply by adding offsets to the base address of the structure.
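The effect of reordering can be verified the same way. This sketch again uses fixed-width types in place of unsigned long and unsigned short (an assumption, so the result does not depend on the machine's word size); struct animal_reordered is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width variant of the reordered structure. */
struct animal_reordered {
        uint32_t cat;   /* offset 0 */
        uint16_t pig;   /* offset 4 */
        char     dog;   /* offset 6 */
        char     fox;   /* offset 7 */
};

/* Largest members first: every member is aligned with no padding. */
static_assert(offsetof(struct animal_reordered, pig) == 4, "no gap after cat");
static_assert(offsetof(struct animal_reordered, dog) == 6, "no gap after pig");
static_assert(sizeof(struct animal_reordered) == 8, "no padding at all");
```

Ordering members from largest to smallest is a common rule of thumb for this, though it is not the only arrangement that avoids padding.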

Kernel developers need to be aware of structure padding when using structures wholesale, that is, when sending them out over the network or when saving a structure directly to disk, because the required padding might differ among various architectures. This is one reason C does not have a native structure comparison operator. The padding in a structure might contain gibberish, and it is not possible to do a byte-by-byte comparison of one structure to another. The C designers (correctly) felt it is best for the programmer to write a comparison function for each unique situation, to take advantage of the structure's layout.
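A comparison function of the kind described might look like the following. It is a minimal sketch: struct animal_cmp and animal_equal() are hypothetical names, and the structure uses fixed-width types as an assumption for portability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure with padding after dog and after fox. */
struct animal_cmp {
        char     dog;
        uint32_t cat;
        uint16_t pig;
        char     fox;
};

/*
 * Hypothetical helper: compares the structures member by member, so
 * whatever happens to be in the padding bytes never affects the result.
 */
static bool animal_equal(const struct animal_cmp *a,
                         const struct animal_cmp *b)
{
        assert(a != NULL && b != NULL);
        return a->dog == b->dog &&
               a->cat == b->cat &&
               a->pig == b->pig &&
               a->fox == b->fox;
}
```

Unlike a byte-by-byte comparison over sizeof(struct animal_cmp) bytes, this function gives the same answer no matter how the two structures were initialized.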
