C / C++ Multidimensional Array Internals

A two-dimensional array:

int foo[5][4];

is nothing more or less than an array of arrays:

typedef int row[4];   /* type "row" is an array of 4 ints */
row foo[5];           /* the object "foo" is an array of 5 rows */
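
One way to convince yourself of the array-of-arrays structure is to look at sizes. This is only a small sketch; the function names rows_in_foo and ints_per_row are made up for illustration:

```c
#include <stddef.h>

typedef int row[4];   /* same typedef as above */

/* Number of elements in an array declared as "row foo[5]". */
size_t rows_in_foo(void)
{
    row foo[5];
    return sizeof foo / sizeof foo[0];        /* foo has 5 elements of type row */
}

/* Number of ints in one element (one row) of foo. */
size_t ints_per_row(void)
{
    row foo[5];
    return sizeof foo[0] / sizeof foo[0][0];  /* each row holds 4 ints */
}
```

The first function yields 5 and the second yields 4, which matches "5 rows of 4 ints each"; no pointer size shows up anywhere.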

There are no pointer objects here, either explicit or implicit.

Arrays are not pointers. Pointers are not arrays.

What often causes confusion is that an array expression is, in most contexts, implicitly converted to a pointer to its first element; the exceptions are when it is the operand of sizeof or unary &, or a string literal used to initialize an array. (And a separate rule says that what looks like an array parameter declaration is really a pointer declaration, but that doesn’t apply in this example.) An array object is an array object; declaring such an object does not create any pointer objects. Referring to an array object can create a pointer value (the address of the array’s first element), but there is no pointer object stored in memory.
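
To make the conversion concrete, here is a minimal sketch. The function names are hypothetical; foo is the array from the example, declared at file scope so both functions can see it:

```c
#include <stddef.h>

int foo[5][4];

/* In an ordinary expression, foo is converted to a pointer to its first
   element. That element is a row, so the pointer type is int (*)[4]. */
int decays_to_first_row(void)
{
    int (*p)[4] = foo;      /* no cast needed: same value as &foo[0] */
    return p == &foo[0];    /* the two pointers compare equal */
}

/* sizeof is a context where the conversion does not happen, so this is
   the size of the whole array object, not the size of a pointer. */
size_t whole_array_size(void)
{
    return sizeof foo;
}
```

Note that p is a pointer object only because the sketch declares one; foo itself still contains nothing but int objects.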

The array object foo is stored in memory as 5 contiguous elements, where each element is itself an array of 4 contiguous int elements; the whole thing is therefore stored as 20 contiguous int objects.
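
The layout can be checked by measuring how far each element is from the start of foo. A rough sketch, with byte_offset being a name invented for this example:

```c
#include <stddef.h>

int foo[5][4];

/* Distance in bytes from the start of foo to foo[r][c].
   Both pointers are converted to char * so the subtraction is done on
   byte addresses within the one object foo. Assumes 0 <= r < 5 and
   0 <= c < 4. */
ptrdiff_t byte_offset(int r, int c)
{
    return (char *)&foo[r][c] - (char *)foo;
}
```

With this, byte_offset(1, 0) is 4 * sizeof(int): row 1 starts immediately after the 4 ints of row 0, with nothing in between.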

The indexing operator is defined in terms of pointer arithmetic; x[y] is equivalent to *(x + y). Typically the left operand is going to be either a pointer expression or an array expression; if it’s an array expression, the array is implicitly converted to a pointer.

So foo[x][y] is equivalent to *(foo[x] + y), which in turn is equivalent to *(*(foo + x) + y). (Note that no casts are necessary.) Fortunately, you don’t have to write it that way, and foo[x][y] is a lot easier to understand.
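
Here are the three spellings side by side, as a small sketch (the function names are made up for illustration; each one reads the same element):

```c
int foo[5][4];

/* The usual form. */
int get_indexed(int x, int y)
{
    return foo[x][y];
}

/* Inner index rewritten: foo[x] is an array of 4 ints, which is
   converted to int * before y is added. */
int get_mixed(int x, int y)
{
    return *(foo[x] + y);
}

/* Both indexes rewritten: foo is converted to int (*)[4], so foo + x
   steps over whole rows. */
int get_pointer(int x, int y)
{
    return *(*(foo + x) + y);
}
```

For any valid x and y, all three return the same value.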

Note that you can create a data structure that can be accessed with the same foo[x][y] syntax, but where foo really is a pointer to pointer to int. (In that case, the prefix of each [] operator is already a pointer expression, and doesn’t need to be converted.) But to do that, you’d have to declare foo as a pointer-to-pointer-to-int:

int **foo;

and then allocate and initialize all the necessary memory. This is more flexible than int foo[5][4], since you can determine the number of rows and the size (or even existence) of each row dynamically.
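
For illustration, one way the allocation could look. This is a sketch, not the only way to do it; alloc_rows and free_rows are hypothetical helpers, and here every row gets the same length for simplicity:

```c
#include <stdlib.h>

/* Allocate `rows` row pointers, each pointing to `cols` zeroed ints.
   Assumes rows > 0 and cols > 0. Returns NULL on allocation failure. */
int **alloc_rows(size_t rows, size_t cols)
{
    int **p = malloc(rows * sizeof *p);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++) {
        p[i] = calloc(cols, sizeof *p[i]);
        if (p[i] == NULL) {
            /* Undo the rows allocated so far. */
            while (i > 0)
                free(p[--i]);
            free(p);
            return NULL;
        }
    }
    return p;
}

/* Release everything allocated by alloc_rows. */
void free_rows(int **p, size_t rows)
{
    if (p == NULL)
        return;
    for (size_t i = 0; i < rows; i++)
        free(p[i]);
    free(p);
}
```

After int **foo = alloc_rows(5, 4); you can write foo[2][2] just as with the array version, but now there really are pointer objects in memory: one for foo and one per row, and the rows are not necessarily adjacent to each other.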

Section 6 of the comp.lang.c FAQ explains this very well.

EDIT:

In response to Arrakis’s comment, it’s important to keep in mind the distinction between type and representation.

For example, these two types:

struct pair { int x; int y;};
typedef int arr2[2];

very likely have the same representation in memory (two consecutive int objects), but the syntax to access the elements is quite different.

Similarly, the types int[5][4] and int[20] have the same memory layout (20 consecutive int objects), but the syntax to access the elements is different.

You can access foo[2][2] as ((int*)foo)[10] (treating the 2-dimensional array as if it were a 1-dimensional array). And sometimes it’s useful to do so, but strictly speaking the behavior is undefined, because the index runs past the end of the 4-element row foo[0] that the converted pointer points into. You can likely get away with it because most C implementations don’t do array bounds-checking. On the other hand, optimizing compilers can assume that your code’s behavior is defined, and generate arbitrary code if it isn’t.
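
If you want flat indexing without the undefined behavior, one option is to declare the storage as a one-dimensional array to begin with and compute the index yourself. This is a different technique from the cast, offered only as a sketch; the names ROWS, COLS, flat and flat_index are made up for this example:

```c
enum { ROWS = 5, COLS = 4 };

/* One-dimensional storage for a logical ROWS x COLS grid. */
int flat[ROWS * COLS];

/* Position of logical element (r, c) in flat, in row-major order.
   Assumes 0 <= r < ROWS and 0 <= c < COLS. */
int flat_index(int r, int c)
{
    return r * COLS + c;
}
```

Here flat[flat_index(2, 2)] is flat[10], the same position as in the cast above, but every access stays within the bounds of the array it indexes.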
