Why do compilers insist on using a callee-saved register here?
TL;DR: Compiler internals are probably not set up to look for this optimization easily, and it’s probably only useful around small functions, not inside large functions between calls. Inlining to create large functions is a better solution most of the time. There can be a latency vs. throughput tradeoff if foo happens not to save/restore …