Why did Intel change the static branch prediction mechanism over these years?

The primary reason why static prediction is not favored in modern designs, to the point of perhaps not even being present, is that static predictions occur too late in the pipeline compared to dynamic predictions. The basic issue is that branch directions and target locations must be known before fetching them, but static predictions can … Read more

Are load ops deallocated from the RS when they dispatch, complete or some other time?

The following experiments suggest that the uops are deallocated at some point before the load completes. While this is not a complete answer to your question, it might provide some interesting insights. On Skylake, there is a 33-entry reservation station for loads (see https://stackoverflow.com/a/58575898/10461973). This should also be the case for the Coffee Lake i7-8700K, … Read more

Where is VPERMB in AVX2?

I’m 99% sure the main factor is transistor cost of implementation. It would clearly be very useful, and the only reason it doesn’t exist is that the implementation cost must outweigh the significant benefit. Coding space issues are unlikely; the VEX coding space provides a LOT of room. Like, really a lot, since the field … Read more

Return address prediction stack buffer vs stack-stored return address?

Predictors are normally part of the fetch stage, in order to determine which instructions to fetch next. This takes place before the processor has decoded the instructions, and therefore doesn’t even know with certainty that a branch instruction exists. Like all predictors, the intent of the return address predictor is to get the direction / … Read more

Can the simple decoders in recent Intel microarchitectures handle all 1-µop instructions?

No, there are some instructions that can only decode 1/clock. This effect is Intel-only, not AMD. Theory: the “steering” logic that sends chunks of machine code to decoders looks for patterns in the opcode byte(s) during pre-decode, and any pattern-match that might be a multi-uop instruction has to get sent to the complex decoder. To … Read more

What branch misprediction does the Branch Target Buffer detect?

This is a good question! I think the confusion it’s causing is due to Intel’s strange naming schemes, which often overload terms that are standard in academia. I will try to both answer your question and clear up the confusion I see in the comments. First of all, I agree that in standard computer science … Read more