branch prediction on a function pointer

From “The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers” (http://www.agner.org/optimize/microarchitecture.pdf), section 3.7 (for Sandy Bridge; other processors are covered in other sections), “Pattern recognition for indirect jumps and calls”: Indirect jumps and indirect calls (but not returns) are predicted using the same two-level predictor as branch instructions. … Read more
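As a minimal sketch of the kind of code this applies to, the C fragment below makes an indirect call through a function pointer inside a loop. The names `op_fn`, `ops`, `add_one`, `times_two` and `run_ops` are hypothetical and chosen only for illustration. The call target depends on the selector sequence, so a regular, repeating sequence is the sort of pattern a history-based predictor has a chance of learning, while a random sequence is not.

```c
#include <stddef.h>

/* Hypothetical operation type: every target has the same signature. */
typedef long (*op_fn)(long);

static long add_one(long x)   { return x + 1; }
static long times_two(long x) { return x * 2; }

/* Table of possible indirect-call targets (illustrative only). */
static const op_fn ops[2] = { add_one, times_two };

/* Applies ops[selectors[i]] to x for each i, in order.
 * The call through ops[...] is an indirect call: its target is only
 * known at run time, so the CPU has to predict it. */
long run_ops(const unsigned char *selectors, size_t n, long x)
{
    for (size_t i = 0; i < n; i++) {
        x = ops[selectors[i] & 1](x);   /* & 1 keeps the index in range */
    }
    return x;
}
```

Calling `run_ops` with an alternating selector sequence such as 0, 1, 0, 1 gives the predictor a short repeating pattern of targets. Whether a particular processor actually predicts it well depends on the microarchitecture.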

Why did Intel change the static branch prediction mechanism over these years?

The primary reason why static prediction is not favored in modern designs, to the point of perhaps not even being present, is that static predictions occur too late in the pipeline compared to dynamic predictions. The basic issue is that branch directions and target locations must be known before fetching them, but static predictions can … Read more

Branch target prediction in conjunction with branch prediction?

Do read along with the Intel optimization manual; the current download location is here. If that link goes stale (they move things around all the time), search the Intel site for “Architectures optimization manual”. Keep in mind that the info there is fairly generic; they disclose only as much as needed to allow writing efficient code. Branch prediction implementation … Read more

Why is processing an unsorted array the same speed as processing a sorted array with modern x86-64 clang?

Several of the answers in the question you link talk about rewriting the code to be branchless and thus avoiding any branch prediction issues. That’s what your updated compiler is doing. Specifically, clang++ 10 with -O3 vectorizes the inner loop. See the code on godbolt, lines 36-67 of the assembly. The code is a little … Read more
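To illustrate what "rewriting the code to be branchless" can look like at the source level, here is a hedged sketch in C. The threshold 128, the element type and the function names are assumptions made for the example and are not taken from the excerpt. The first function uses a data-dependent branch; the second computes the same sum with a mask, so there is no conditional jump on the data for the predictor to get wrong.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy version: the if depends on the data, so on unsorted input
 * the branch may be hard to predict (unless the compiler removes it). */
long long sum_branchy(const int32_t *data, size_t n)
{
    long long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= 128) {
            sum += data[i];
        }
    }
    return sum;
}

/* Branchless version: build a mask that is all ones when the element
 * qualifies and zero otherwise, then add the masked value. */
long long sum_branchless(const int32_t *data, size_t n)
{
    long long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t keep = -(int32_t)(data[i] >= 128);  /* 0 or -1 (all ones) */
        sum += data[i] & keep;
    }
    return sum;
}
```

An optimizing compiler may turn the first form into something like the second on its own, or vectorize either one, which would make both run at the same speed regardless of whether the input is sorted.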

What branch misprediction does the Branch Target Buffer detect?

This is a good question! I think the confusion that it’s causing is due to Intel’s strange naming schemes, which often overload terms standard in academia. I will try to both answer your question and also clear up the confusion I see in the comments. First of all, I agree that in standard computer science … Read more

Performance optimisations of x86-64 assembly – Alignment and branch prediction

Alignment optimisations 1. Use .p2align <abs-expr>, <abs-expr>, <abs-expr> instead of align. This gives fine-grained control through its three parameters: param1 – the boundary to align to, given as a power of two (for example, 4 means a 16-byte boundary); param2 – the value to fill the padding with (zeroes or NOPs); param3 – the maximum number of padding bytes, so that NO alignment is done if the padding would exceed it. 2. Align the start of a frequently used code … Read more

Why is a conditional move not vulnerable to Branch Prediction Failure?

Mis-predicted branches are expensive. A modern processor generally executes between one and three instructions each cycle if things go well (if it does not stall waiting for data dependencies for these instructions to arrive from previous instructions or from memory). The statement above holds surprisingly well for tight loops, but this shouldn’t blind you to … Read more
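As a small sketch of the kind of code where a conditional move can replace a branch, consider the C selections below. The function names are hypothetical. Compilers targeting x86-64 often compile such simple ternary expressions to a compare followed by a conditional move rather than a conditional jump, but that is a compiler decision and is not guaranteed. When they do, both candidate values are computed and one is selected as a data dependency, so there is no branch direction to mispredict.

```c
/* Returns the larger of a and b. A simple select like this is a
 * typical candidate for a conditional move instead of a branch. */
int select_max(int a, int b)
{
    return (a > b) ? a : b;
}

/* Clamps negative values to zero, again written as a plain select. */
int clamp_nonneg(int x)
{
    return (x < 0) ? 0 : x;
}
```

Inspecting the generated assembly is the only way to confirm which form a given compiler and optimization level actually emitted.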

Avoid stalling pipeline by calculating conditional early

Yes, it can be beneficial to allow the branch condition to be calculated as early as possible, so that any misprediction can be resolved early and the front-end part of the pipeline can start re-filling early. In the best case, the mis-prediction can be free if there is enough work already in-flight to totally hide … Read more