How do compilers assign memory addresses to variables?

The answer to this question is quite complex since there are various approaches to memory allocation depending on variable scope, size and programming environment.

Stack allocated variables

Typically local variables are put on the “stack”. This means that the compiler assigns each variable an offset relative to the “stack pointer”, whose value can differ from one invocation of the function to the next. I.e. the compiler assumes that memory locations like Stack-Pointer+4, Stack-Pointer+8, etc. are accessible and usable by the program. After returning from the function, these memory locations are no longer guaranteed to retain their values.

This is mapped into assembly instructions similar to the following. esp is the stack pointer, esp + N refers to a memory location relative to esp:

mov eax, DWORD PTR SS:[esp]
mov eax, DWORD PTR SS:[esp + 4]
mov eax, DWORD PTR SS:[esp + 8]

Heap

Then there are variables that are heap-allocated. This means that there is a call into the standard library to request memory (malloc in C or new in C++). This memory stays reserved until it is explicitly released or the program ends. malloc and new return pointers to memory in a region called the heap. The allocating functions have to make sure that the memory they hand out is not already reserved, which can make heap allocation slow at times. Also, if you don’t want to run out of memory, you should free (or delete) memory that is no longer used. Heap allocation is quite complicated internally, since the standard library has to keep track of used and unused ranges in memory as well as freed ranges. Therefore even freeing a heap-allocated variable can sometimes be more time-consuming than allocating it. For more information see How is malloc() implemented internally?

Understanding the difference between stack and heap is quite fundamental to learning how to program in C and C++.

Arbitrary Pointers

Naively one might assume that by setting a pointer to an arbitrary address, e.g. int *a = (int *)0x123, it should be possible to address arbitrary locations in the computer’s memory. This does not exactly hold true, since (depending on the CPU and operating system) programs are heavily restricted in which memory they may address.

Getting a feel for memory

In a guided classroom experience, it might be beneficial to explore some simple C code by compiling it to assembly (gcc can do this with the -S option, for example). A simple function such as int foo(int a, int b) { return a+b; } should suffice (compiled without optimizations). Then look at something like int bar(int *a, int *b) { return (*a) + (*b); }

When invoking bar, allocate the arguments once on the stack and once with malloc.

Conclusion

The compiler performs variable placement and alignment relative to base addresses, which the program or standard library obtains at runtime.

For a deep understanding of memory related questions see Ulrich Drepper’s “What every programmer should know about memory” http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.957

Apart from C-ish country (side note)

Then there is also garbage collection, which is popular among scripting languages (Python, Perl, JavaScript, Lisp) and platform-independent environments (Java, C#). It is related to heap allocation but slightly more complicated.

Some languages and implementations rely only on the heap (Stackless Python) or are entirely stack-based (Forth).

I think the answer to this question starts with an understanding of the layout of a program in memory. Underneath the operating system, a computer’s main memory is just a giant array. When you run a program, the operating system will take a chunk of this memory and break it up into logical sections for the following purposes:

  • stack: this area of memory stores information about all functions that are currently active, i.e. the currently running function and all of its callers. Information stored includes local variables and the address to return to when the function is done.

  • heap: this area of memory is used when you want to dynamically allocate some storage. Generally your local variable would then contain an address (i.e., it would be a pointer) in the heap where your data is stored, and you could pass this address to other parts of your program without worrying that your data will be overwritten when the current function returns.

  • data, bss, text segments: these are more or less outside the scope of this particular question, but they store things such as global data and the program itself.

Hope that helps. There are lots of good resources online as well. I just googled “layout of a program in memory” and found this one: http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory
