What is Partial Linking in GNU Linker?

Minimal runnable example

Here I produce a minimal example and compile it in two ways to produce functionally identical executables:

one combined f12.c file, compiled without partial linking into f12.o
two separate files f1.c and f2.c, whose objects are first partially linked into f12_r.o

main.c

#include <assert.h>
#include <stdlib.h>

int f_1_2(void);
int f_2_1(void);

int main(void) {
    assert(f_1_2() + f_2_1() == 5);
    return EXIT_SUCCESS;
}

f1.c

#include "f1.h"

f2.c

#include "f2.h"

f12.c

#include "f1.h"
#include "f2.h"

f1.h

int f_2(void);

int f_1_2(void) {
    return f_2() + 1;
}

int f_1(void) {
    return 1;
}

f2.h

int f_1(void);

int f_2_1(void) {
    return f_1() + 1;
}

int f_2(void) {
    return 2;
}

run.sh

#!/usr/bin/env bash
set -eux
cflags="-ggdb3 -std=c99 -O0 -fPIE -pie"
gcc $cflags -c -o f1.o f1.c
gcc $cflags -c -o f2.o f2.c
gcc $cflags -c -o f12.o f12.c
ld -o f12_r.o -r f1.o f2.o
gcc $cflags -c -o main.o main.c
gcc $cflags -o main.out f12.o main.o
gcc $cflags -o main_r.out f12_r.o main.o
./main.out
./main_r.out

GitHub upstream.

If we try the same thing but without ld -r, then we get the following diagnostics:

+ ld -o f12_r.o f1.o f2.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
+ gcc -ggdb3 -std=c99 -O0 -fPIE -pie -o main_r.out f12_r.o main.o
/usr/bin/ld: error in f12_r.o(.eh_frame); no .eh_frame_hdr table will be created

Neither of them makes the tool exit with a non-zero status, and the final executable still runs, so I’m not sure how bad it is. TODO understand.

Binary analysis

If you are not familiar with relocation, first read this: What do linkers do?

The key question is how partial linking could speed up the link. The only mechanism I could think of was resolving references across the pre-linked files, so I’ve focused on this for now.

However, it does not do that, as discussed at: Resolve relative relocations in partial link, so I would not expect it to speed up the link significantly.

I have confirmed this with:

objdump -S f12.o
objdump -S f12_r.o

both of which produce identical outputs that contain:

int f_1_2(void) {
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
    return f_2() + 1;
   4:   e8 00 00 00 00          callq  9 <f_1_2+0x9>
   9:   83 c0 01                add    $0x1,%eax
}
   c:   5d                      pop    %rbp
   d:   c3                      retq

so we see that the call to f_2 inside f_1_2 has not yet been resolved in either case, because the relative offset is still 0: e8 00 00 00 00 (e8 is the opcode).

This also taught me that GCC does not resolve function calls before the final link either, even when caller and callee end up in the same object file as in f12.o. TODO rationale, possible to force it to resolve?

Benchmark

I had already benchmarked GNU ld vs gold at: Replacing ld with gold – any experience? so I decided to reuse that setup to see if partial linking leads to any link speedup.

I generated the test objects with this script:

./generate-objects 100 1000 100

and then I started with the most extreme link case possible: pre-link everything except the main file, and then benchmark the final link:

mv main.o ..
ld -o partial.o -r *.o
time gcc               partial.o ../main.o
time gcc -fuse-ld=gold partial.o ../main.o

The wall clock time results in seconds were as follows:

          No partial link   Partial link
No Gold   6.15              5.756
Gold      4.06              4.457

Therefore:

the time difference exists, but is small: under half a second in both cases
with GNU ld the final link got faster, but with gold it got slower!
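The relative changes can be recomputed from the table with a few lines of shell. This is only arithmetic on the four numbers above; pct_change is a helper name made up for this sketch.

```shell
#!/usr/bin/env bash
# Sketch: relative change of the final link time, using the table values.
# pct_change is a hypothetical helper, not part of any benchmark script.
set -eu
pct_change() {
    # $1 = seconds without partial link, $2 = seconds with partial link
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%+.1f\n", (after - before) / before * 100 }'
}
ld_change="$(pct_change 6.15 5.756)"
gold_change="$(pct_change 4.06 4.457)"
echo "GNU ld: ${ld_change}%"   # prints "GNU ld: -6.4%"
echo "gold:   ${gold_change}%" # prints "gold:   +9.8%"
```

A negative value means the final link got faster with partial linking. These are single measurements, so differences of this size could also be partly noise.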

Therefore, based on this experiment, it seems that partial linking may not speed up your link at all, and I’d recommend simply trying gold first.

Let me know if you can produce a concrete example where partial linking leads to a significant speedup.

Case study: the Linux kernel

The Linux kernel is one example of a large project that used to use partial linking (which its build system calls incremental linking), so maybe we can learn something from it.

It has since moved to ar T thin archives as shown at: https://unix.stackexchange.com/questions/5518/what-is-the-difference-between-the-following-kernel-makefile-terms-vmlinux-vml/482978#482978

The initial commit and rationale are at: a5967db9af51a84f5e181600954714a9e4c69f1f (included in v4.9) whose commit message says:

ld -r is an incremental link used to create built-in.o files in build
subdirectories. It produces relocatable object files containing all
its input files, and these are then pulled together and relocated
in the final link. Aside from the bloat, this constrains the final
link relocations, which has bitten large powerpc builds with
unresolvable relocations in the final link.

this is also mentioned at Documentation/process/changes.rst:

Binutils
--------

The build system has, as of 4.13, switched to using thin archives (`ar T`)
rather than incremental linking (`ld -r`) for built-in.a intermediate steps.
This requires binutils 2.20 or newer.

TODO: find out when incremental linking was introduced, and see if there is a minimal test case that we can use to see it going faster: https://unix.stackexchange.com/questions/491312/why-does-the-linux-kernel-build-system-use-incremental-linking-or-ar-t-thin-arch

Tested on Ubuntu 18.10, GCC 8.2.0, Lenovo ThinkPad P51 laptop, Intel Core i7-7820HQ CPU (4 cores / 8 threads), 2x Samsung M471A2K43BB1-CRC RAM (2x 16GiB), Samsung MZVLB512HAJQ-000L7 SSD (3,000 MB/s).
