How exactly does the callstack work?

The call stack could also be called a frame stack.
The things that are stacked after the LIFO principle are not the local variables but the entire stack frames (“calls”) of the functions being called. The local variables are pushed and popped together with those frames in the so-called function prologue and epilogue, respectively.

Inside the frame the order of the variables is completely unspecified; Compilers “reorder” the positions of local variables inside a frame appropriately to optimize their alignment so the processor can fetch them as quickly as possible. The crucial fact is that the offset of the variables relative to some fixed address is constant throughout the lifetime of the frame – so it suffices to take an anchor address, say, the address of the frame itself, and work with offsets of that address to the variables. Such an anchor address is actually contained in the so-called base or frame pointer which is stored in the EBP register. The offsets, on the other hand, are clearly known at compile time and are therefore hardcoded into the machine code.

This graphic from Wikipedia shows what the typical call stack is structured like1:

Picture of a stack

Add the offset of a variable we want to access to the address contained in the frame pointer and we get the address of our variable. So shortly said, the code just accesses them directly via constant compile-time offsets from the base pointer; It’s simple pointer arithmetic.

Example

#include <iostream>

int main()
{
    char c = std::cin.get();
    std::cout << c;
}

gcc.godbolt.org gives us

main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp

    movl    std::cin, %edi
    call    std::basic_istream<char, std::char_traits<char> >::get()
    movb    %al, -1(%rbp)
    movsbl  -1(%rbp), %eax
    movl    %eax, %esi
    movl    std::cout, %edi
    call    [... the insertion operator for char, long thing... ]

    movl    $0, %eax
    leave
    ret

.. for main. I divided the code into three subsections.
The function prologue consists of the first three operations:

  • Base pointer is pushed onto the stack.
  • The stack pointer is saved in the base pointer
  • The stack pointer is subtracted to make room for local variables.

Then cin is moved into the EDI register2 and get is called; The return value is in EAX.

So far so good. Now the interesting thing happens:

The low-order byte of EAX, designated by the 8-bit register AL, is taken and stored in the byte right after the base pointer: That is -1(%rbp), the offset of the base pointer is -1. This byte is our variable c. The offset is negative because the stack grows downwards on x86. The next operation stores c in EAX: EAX is moved to ESI, cout is moved to EDI and then the insertion operator is called with cout and c being the arguments.

Finally,

  • The return value of main is stored in EAX: 0. That is because of the implicit return statement.
    You might also see xorl rax rax instead of movl.
  • leave and return to the call site. leave is abbreviating this epilogue and implicitly
    • Replaces the stack pointer with the base pointer and
    • Pops the base pointer.

After this operation and ret have been performed, the frame has effectively been popped, although the caller still has to clean up the arguments as we’re using the cdecl calling convention. Other conventions, e.g. stdcall, require the callee to tidy up, e.g. by passing the amount of bytes to ret.

Frame Pointer Omission

It is also possible not to use offsets from the base/frame pointer but from the stack pointer (ESB) instead. This makes the EBP-register that would otherwise contain the frame pointer value available for arbitrary use – but it can make debugging impossible on some machines, and will be implicitly turned off for some functions. It is particularly useful when compiling for processors with only few registers, including x86.

This optimization is known as FPO (frame pointer omission) and set by -fomit-frame-pointer in GCC and -Oy in Clang; note that it is implicitly triggered by every optimization level > 0 if and only if debugging is still possible, since it doesn’t have any costs apart from that.
For further information see here and here.


1 As pointed out in the comments, the frame pointer is presumably meant to point to the address after the return address.

2 Note that the registers that start with R are the 64-bit counterparts of the ones that start with E. EAX designates the four low-order bytes of RAX. I used the names of the 32-bit registers for clarity.

Leave a Comment