C typically compiles to assembler, just because that makes life easy for the poor compiler writer.
Assembly code always assembles (not “compiles”) to relocatable object code. You can think of this as binary machine code and binary data, but with lots of decoration and metadata. The key parts are:
- Code and data appear in named “sections”.
- Relocatable object files may include definitions of labels, which refer to locations within the sections.
- Relocatable object files may include “holes” that are to be filled with the values of labels defined elsewhere. The official name for such a hole is a relocation entry.
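To make those three parts concrete, here is a minimal sketch of how a toy relocatable object might be modelled in C. Every name in it (Section, Label, Reloc, Object, find_label, fill_holes) is hypothetical and invented for illustration, and it assumes each hole is a 4-byte absolute address stored little-endian. Real formats such as ELF carry far more metadata and many relocation kinds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a relocatable object file. All names are hypothetical. */
typedef struct {
    const char *name;    /* section name, e.g. "text" or "rodata" */
    uint8_t    *bytes;   /* the section's contents */
    size_t      size;    /* number of bytes in the section */
} Section;

typedef struct {
    const char *name;    /* label name, e.g. "main" */
    int         section; /* index of the section the label lives in */
    size_t      offset;  /* location within that section */
} Label;

typedef struct {
    int         section; /* index of the section containing the hole */
    size_t      offset;  /* where the hole starts */
    const char *target;  /* label whose value should fill the hole */
} Reloc;

typedef struct {
    Section *sections; size_t nsections;
    Label   *labels;   size_t nlabels;
    Reloc   *relocs;   size_t nrelocs;
} Object;

/* Look up a label by name; NULL means this object does not define it. */
const Label *find_label(const Object *obj, const char *name) {
    for (size_t i = 0; i < obj->nlabels; i++)
        if (strcmp(obj->labels[i].name, name) == 0)
            return &obj->labels[i];
    return NULL;
}

/* Fill every hole whose target label is defined in this object.
   base[i] is the address chosen for section i. Assumption: each hole is
   a 4-byte absolute address, little-endian. Returns the number of holes
   left unfilled because their labels are defined elsewhere. */
size_t fill_holes(Object *obj, const uint32_t *base) {
    size_t unresolved = 0;
    for (size_t i = 0; i < obj->nrelocs; i++) {
        const Reloc *r = &obj->relocs[i];
        const Label *l = find_label(obj, r->target);
        if (l == NULL) { unresolved++; continue; }
        uint32_t value = base[l->section] + (uint32_t)l->offset;
        uint8_t *hole = obj->sections[r->section].bytes + r->offset;
        for (int b = 0; b < 4; b++)
            hole[b] = (uint8_t)(value >> (8 * b));
    }
    return unresolved;
}
```

A real linker does roughly this across many object files at once, so a label that is unresolved in one file can be found in another file or in a library.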
For example, if you compile and assemble (but don’t link) this program
#include <stdio.h>
int main () { printf("Hello, world\n"); }
you are likely to wind up with a relocatable object file with
- A text section containing the machine code for main
- A label definition for main which points to the beginning of the text section
- A rodata (read-only data) section containing the bytes of the string literal "Hello, world\n"
- A relocation entry that depends on printf and that points to a “hole” in a call instruction in the middle of a text section.
If you are on a Unix system, a relocatable object file is generally called a .o file, as in hello.o. You can explore the label definitions and uses with a simple tool called nm, and you can get more detailed information from a somewhat more complicated tool called objdump.
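If you want something small to poke at with those tools, a file like the following sketch may help. The file name symbols.c and every identifier in it are made up for illustration. Assuming a typical Unix toolchain, you could compile it without linking using cc -c symbols.c and then run nm symbols.o or objdump -d -r symbols.o. The letters mentioned in the comments are what GNU nm typically prints for an unoptimized ELF object; other platforms and optimization levels may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Global label defined in a data section (typically shown as D). */
int counter = 10;

/* File-local label in a data section (typically shown as d). */
static int step = 2;

/* File-local label in the text section (typically shown as t). */
static int bump(void) {
    counter += step;
    return counter;
}

/* Global label in the text section (typically shown as T). */
int advance(int times) {
    int last = counter;
    for (int i = 0; i < times; i++)
        last = bump();
    return last;
}

/* Uses printf, which is defined elsewhere, so the object file gets an
   undefined label (typically shown as U) plus a relocation entry for
   the hole in the call instruction. */
void report(void) {
    printf("counter = %d\n", counter);
}
```

There is deliberately no main here: because the file is only compiled and assembled, never linked, nothing has to fill the hole left by the call to printf.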
I teach a class that covers these topics, and I have students write an assembler and linker, which takes a couple of weeks, but when they’ve done that most of them have a pretty good handle on relocatable object code. It’s not such an easy thing.