[[carries_dependency]] what it means and how to implement

Just FYI, memory_order_consume (and [[carries_dependency]]) is essentially deprecated because it’s too hard for compilers to efficiently and correctly implement the rules the way C++11 designed them. (And/or because [[carries_dependency]] and/or kill_dependency would end up being needed all over the place.) See P0371R1: Temporarily discourage memory_order_consume.

Current compilers simply treat mo_consume as mo_acquire (and thus on ISAs that need one, put a barrier right after the consume load). If you want the performance of data dependency ordering without barriers, you have to trick the compiler by using mo_relaxed and code carefully to avoid things that would make it likely for the compiler to create asm without an actual dependency. (e.g. Linux RCU). See C++11: the difference between memory_order_relaxed and memory_order_consume for more details and links about that, and the asm feature that mo_consume was designed to expose.

See also: Memory order consume usage in C11.
Understanding the concept of dependency ordering (in asm) is basically essential to understanding how this C++ feature is designed.
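As a rough illustration of the use case mo_consume was designed for, here is a minimal sketch of pointer publication. The names `Payload`, `storage`, `published`, `publish`, `try_read`, and `self_check` are made up for this example. The reader's second load gets its address from the pointer it just loaded, which is the kind of data dependency that weakly-ordered ISAs order for free. With current compilers the consume load is simply compiled as an acquire load.

```cpp
#include <atomic>
#include <cassert>

struct Payload { int value; };

Payload storage{0};                        // hypothetical payload object
std::atomic<Payload*> published{nullptr};  // hypothetical shared pointer

// Writer side: fill in the payload, then publish its address.
void publish(int v) {
    storage.value = v;                                     // plain store
    published.store(&storage, std::memory_order_release);  // release store
}

// Reader side: returns false if nothing has been published yet.
bool try_read(int& out) {
    // Current compilers strengthen this consume load to acquire.
    Payload* p = published.load(std::memory_order_consume);
    if (p == nullptr) return false;
    // The address of p->value is computed from p, so this load has a
    // data dependency on the load of `published`.
    out = p->value;
    return true;
}

// Single-threaded smoke check; real use would call publish() and
// try_read() from different threads.
void self_check() {
    int v = 0;
    assert(!try_read(v));
    publish(42);
    assert(try_read(v) && v == 42);
}
```

Swapping the consume for mo_relaxed gives the "trick the compiler" version described above; that relies on the compiler not optimizing the dependency away, which the standard does not promise.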

“When [an] atomic variable is being passed as a parameter to the function the compiler will introduce a fence hardware instruction …”

You don’t “pass an atomic variable” to a function in the first place; what would that even mean? If you were passing a pointer or reference to an atomic object, the function would be doing its own load from it, and the source code for that function would use memory_order_consume or not.

The relevant thing is passing a value loaded from an atomic variable with mo_consume. Like this:

    int tmp = shared_var.load(std::memory_order_consume);
    func(tmp);

func may use that arg as an index into an array of atomic<int> to do an mo_relaxed load. For that load to be dependency-ordered after the shared_var.load even without a memory barrier, code-gen for func has to make sure that load has an asm data dependency on the arg, even if the C++ code does something like tmp -= tmp; that compilers would normally just treat the same as tmp = 0; (killing the previous value).

But [[carries_dependency]] would make the compiler still reference that zeroed value with a data dependency in implementing something like array[idx+tmp].
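A sketch of what that could look like in source, assuming a hypothetical array `table` and the function shape described above. The attribute goes on the parameter; whether any real compiler generates different asm for it is another matter, since current compilers just treat the consume load as acquire.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> shared_var{0};  // same role as shared_var above
std::atomic<int> table[4];       // hypothetical array, zero-initialized

// The attribute says the caller's dependency chain continues into the
// function through `tmp`.
int func([[carries_dependency]] int tmp) {
    int idx = 1;
    tmp -= tmp;  // always 0, but formally still carries the dependency
    // For dependency ordering, the address used here would have to be
    // computed from the original register value, not from a constant 0.
    return table[idx + tmp].load(std::memory_order_relaxed);
}

int caller() {
    int tmp = shared_var.load(std::memory_order_consume);
    return func(tmp);
}

// Single-threaded smoke check of the values only; it says nothing
// about ordering between threads.
void self_check() {
    table[1].store(7, std::memory_order_relaxed);
    shared_var.store(3, std::memory_order_release);
    assert(caller() == 7);
}
```

Without the attribute, a compiler that really implemented consume would need to put a barrier in the caller before passing tmp to a function it can't see into.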

“the atomic variable value is already consumed and then what dependency the function is carried?”

“Already consumed” is not a valid concept. The whole point of consume instead of acquire is that later loads are ordered correctly because they have a data dependency on the mo_consume load result, letting you avoid barriers. Every later load needs such a dependency if you want it ordered after the original load; there is no sense in which you can say a value is “already consumed”.

If you do end up inserting a barrier to promote consume to acquire because of a missing carries_dependency on one function, later functions wouldn’t need another barrier because you could say the value was “already acquired”. (Although that’s not standard terminology. You’d instead say code after the first barrier was ordered after the load.)
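To make the barrier case concrete, here is a hand-written sketch of roughly what "promote to acquire" amounts to: a load followed by an acquire fence. The names `shared_index`, `slots`, `plain_lookup`, and `read_via_fence` are invented for the example, and a real compiler would do this in code-gen rather than in source.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> shared_index{0};  // hypothetical shared variable
std::atomic<int> slots[4];         // hypothetical array, zero-initialized

// No [[carries_dependency]] here: this function is compiled without
// any promise to preserve a data dependency on idx.
int plain_lookup(int idx) {
    return slots[idx].load(std::memory_order_relaxed);
}

int read_via_fence() {
    int idx = shared_index.load(std::memory_order_relaxed);
    // Load + acquire fence: every later load in this thread is now
    // ordered after the load of shared_index, dependency or not.
    std::atomic_thread_fence(std::memory_order_acquire);
    int a = plain_lookup(idx);
    int b = plain_lookup((idx + 1) % 4);  // no second fence needed
    return a + b;
}

// Single-threaded smoke check of the values only.
void self_check() {
    slots[2].store(10, std::memory_order_relaxed);
    slots[3].store(5, std::memory_order_relaxed);
    shared_index.store(2, std::memory_order_release);
    assert(read_via_fence() == 15);
}
```

One fence covers both calls, which is the sense in which later functions don't need their own barrier.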


It might be useful to understand how the Linux kernel handles this, with their hand-rolled atomics and limited set of compilers they support. Search for “dependency” in
https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt, and note the difference between a “control dependency” like if(flag) data.load() vs. a data dependency like data[idx].load.

IIRC, even C++ doesn’t guarantee mo_consume dependency ordering when the dependency is a conditional like if(x.load(consume)) tmp=y.load();.

Note that compilers will sometimes turn a data dependency into a control dependency, for example if there are only 2 possible values. That would break mo_consume, so it’s an optimization that wouldn’t be allowed if the value came from a mo_consume load or a [[carries_dependency]] function arg. This is part of why it’s hard to implement; it would require teaching lots of optimization passes about data dependency ordering instead of just expecting users to write code that doesn’t do things which will normally optimize away. (Like tmp -= tmp;)
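A small sketch of that transformation, with invented names (`idx_var`, `data`, `data_dep`, `control_dep_equivalent`). Both functions return the same value, so a compiler is free to turn the first into the second when it only sees mo_relaxed, but only the first has an address that depends on the loaded value.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> idx_var{0};  // hypothetical shared index
std::atomic<int> data[2];     // hypothetical array, zero-initialized

// Data dependency: the address of the second load is computed from
// the result of the first load.
int data_dep() {
    int idx = idx_var.load(std::memory_order_relaxed);
    return data[idx & 1].load(std::memory_order_relaxed);
}

// Same result, but written the way a compiler might transform it when
// only two values are possible: the loaded value just picks a branch,
// and each load uses a constant address (a control dependency).
int control_dep_equivalent() {
    int idx = idx_var.load(std::memory_order_relaxed);
    if (idx & 1) {
        return data[1].load(std::memory_order_relaxed);
    }
    return data[0].load(std::memory_order_relaxed);
}

// Single-threaded smoke check: the two forms agree on values.
void self_check() {
    data[0].store(11, std::memory_order_relaxed);
    data[1].store(22, std::memory_order_relaxed);
    idx_var.store(1, std::memory_order_relaxed);
    assert(data_dep() == 22 && control_dep_equivalent() == 22);
    idx_var.store(0, std::memory_order_relaxed);
    assert(data_dep() == 11 && control_dep_equivalent() == 11);
}
```

In the branchy version, a weakly-ordered CPU can predict the branch and do the data load early, which is why a control dependency doesn't give load-load ordering.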
