Preventing compiler optimizations while benchmarking

tl;dr doNotOptimizeAway creates an artificial “use”.

A little bit of terminology here: a “def” (“definition”) is a statement that assigns a value to a variable; a “use” is a statement that uses the value of a variable to perform some operation.

If no path from the point immediately after a def to the program exit encounters a use of that variable, the def is called dead and the Dead Code Elimination (DCE) pass will remove it. That in turn may cause other defs to become dead (if the removed def was itself a use by virtue of having variable operands), and so on.
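To make the cascade concrete, here is a small sketch. The function live_and_dead and its variables are made up for illustration; the comments describe what DCE is allowed to do, not what any particular compiler is guaranteed to emit.

```cpp
// Hypothetical function, for illustration only.
int live_and_dead(int x) {
    int a = x + 1;  // def of a; uses x
    int b = a * 2;  // def of b; uses a
    int c = b - 3;  // def of c; uses b. c is never used, so this def is dead.
    // Once the def of c is removed, b has no remaining use, so its def is
    // dead too; removing that one in turn kills the def of a.
    int r = x * 5;  // def of r; uses x
    return r;       // use of r, so the def of r is live
}
```

After DCE, only the computation of r should remain.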

Imagine the program after the Scalar Replacement of Aggregates (SRA) pass, which turns the local std::vector into two variables, len and ptr. At some point the program assigns a value to ptr; that statement is a def.

Now, the original program didn’t do anything with the vector; in other words, there weren’t any uses of either len or ptr. Hence, all of their defs are dead and DCE can remove them, effectively removing all the code and making the benchmark worthless.

Adding doNotOptimizeAway(ptr) creates an artificial use, which prevents DCE from removing the defs. (As a side note, I see no point in the “+”; “g” should have been enough.)
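A minimal sketch of the idea, assuming GCC or Clang and scalar or pointer arguments only. This is not the actual library definition of doNotOptimizeAway: it uses the input-only “g” constraint suggested above. bench_body is a made-up, hand-written stand-in for what the benchmark body might look like after SRA.

```cpp
#include <cstddef>
#include <cstdlib>

// Sketch of an artificial use (not the real library implementation):
// an empty asm statement that takes the value as an input operand.
// "g" lets the compiler place the operand in a register, in memory,
// or pass it as an immediate.
template <typename T>
inline void doNotOptimizeAway(T const& value) {
    __asm__ __volatile__("" : : "g"(value));
}

// Hypothetical stand-in for a local vector after scalar replacement:
// only the two scalars ptr and len remain.
std::size_t bench_body() {
    int* ptr = static_cast<int*>(std::malloc(4 * sizeof(int)));  // def of ptr
    std::size_t len = 4;                                         // def of len
    doNotOptimizeAway(ptr);  // artificial use of ptr keeps its def alive
    doNotOptimizeAway(len);  // likewise for len
    std::free(ptr);
    return len;
}
```

The asm statement emits no instructions; it only tells the compiler that the value is consumed at that point.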

A similar line of reasoning can be followed with memory loads and stores: a store (a def) is dead iff there is no path to the end of the program that contains a load (a use) from the stored location. As tracking arbitrary memory locations is a lot harder than tracking individual pseudo-register variables, the compiler reasons conservatively: it treats a store as dead only if there is no path to the end of the program that could possibly encounter a use of that store.

One such case is a store to a region of memory that is guaranteed not to be aliased: after that memory is deallocated, there cannot be a use of that store that does not trigger undefined behaviour. In other words, there are no such uses.

Thus a compiler could eliminate v.push_back(42). This is where escape comes in: it causes v.data() to be considered arbitrarily aliased, as @Leon described above.

The purpose of clobber() in the example is to create an artificial use of all of the aliased memory. We have a store (from push_back(42)), and the store is to a location that is globally aliased (due to escape(v.data())). Hence clobber() could potentially contain a use of that store (in other words, the side effect of the store could be observable), and therefore the compiler is not allowed to remove the store.
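For reference, here is a sketch of how the pieces fit together. The definitions of escape and clobber are not shown in this answer, so the ones below are an assumption: a commonly seen formulation for GCC and Clang. bench_push_back is a made-up name for the benchmark body.

```cpp
#include <cstddef>
#include <vector>

// Assumed definition: p is an input to an asm statement that also
// declares a "memory" clobber, so the compiler must assume the asm
// may have stashed the pointer somewhere and touched what it points to.
static void escape(void* p) {
    __asm__ __volatile__("" : : "g"(p) : "memory");
}

// Assumed definition: an asm statement that may read or write any
// memory that is visible outside the current function.
static void clobber() {
    __asm__ __volatile__("" : : : "memory");
}

// Hypothetical benchmark body.
std::size_t bench_push_back() {
    std::vector<int> v;
    v.reserve(1);
    escape(v.data());  // the buffer is now considered aliased
    v.push_back(42);   // store into the escaped buffer
    clobber();         // potential use of that store, so it must be kept
    return v.size();
}
```

Neither helper emits any instructions; both only constrain what the optimizer may assume.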

A few simpler examples:

Example I:

void f() {
  int v[1];
  v[0] = 42;
}

This does not generate any code.

Example II:

extern void g();

void f() {
  int v[1];
  v[0] = 42;
  g();
}

This generates just a call to g(), no memory store. The function g cannot possibly access v because v is not aliased.

Example III:

void clobber() {
  __asm__ __volatile__ ("" : : : "memory");
}

void f() {
  int v[1];
  v[0] = 42;
  clobber();
}

As in the previous example, no store is generated, because v is not aliased and the call to clobber is inlined to nothing.

Example IV:

template<typename T>
void use(T &&t) {
  __asm__ __volatile__ ("" :: "g" (t));
}

void f() {
  int v[1];
  use(v);
  v[0] = 42;
}

This time v escapes (i.e. it can potentially be accessed from other activation frames). However, the store is still removed, because there are no potential uses of that memory after it (without UB).

Example V:

template<typename T>
void use(T &&t) {
  __asm__ __volatile__ ("" :: "g" (t));
}

extern void g();

void f() {
  int v[1];
  use(v);
  v[0] = 42;
  g(); // same with clobber()
}

And finally we get the store, because v escapes and the compiler must conservatively assume that the call to g may access the stored value.

(For experiments: https://godbolt.org/g/rFviMI)
