cpu - w3toppers.com

Return address prediction stack buffer vs stack-stored return address?

Predictors are normally part of the fetch stage, in order to determine which instructions to fetch next. This takes place before the processor has decoded the instructions, so it does not yet know with certainty that a branch instruction even exists. Like all predictors, the intent of the return address predictor is to get the direction / …
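
To make the idea concrete, here is a toy C++ model of a return stack buffer. It is a minimal sketch, not how any particular CPU implements one: the class name, the depth of 16 and the overwrite-the-oldest-entry policy on overflow are assumptions for illustration. The prediction made at fetch time is only a guess; when the return actually executes, the address loaded from the real stack is compared with it, and a mismatch is handled as a misprediction.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy return stack buffer (RSB): a small circular stack that fetch pushes
// on a predicted call and pops on a predicted return.
// Assumption: depth 16; real depths vary by microarchitecture.
class ReturnStackBuffer {
public:
    static constexpr std::size_t kDepth = 16;

    // On a call: remember the address of the instruction after the call.
    void on_call(std::uint64_t return_addr) {
        entries_[top_] = return_addr;
        top_ = (top_ + 1) % kDepth;  // wraps, overwriting the oldest entry
        if (count_ < kDepth) ++count_;
    }

    // On a return: predict its target. Returns false when the model has no
    // entry left (what real CPUs do in that case varies).
    bool predict_return(std::uint64_t* target) {
        assert(target != nullptr);
        if (count_ == 0) return false;
        top_ = (top_ + kDepth - 1) % kDepth;
        --count_;
        *target = entries_[top_];
        return true;
    }

private:
    std::array<std::uint64_t, kDepth> entries_{};
    std::size_t top_ = 0;    // index of the next free slot
    std::size_t count_ = 0;  // number of valid entries
};
```

For example, after on_call(0x1000) and on_call(0x2000), the first predict_return yields 0x2000 and the second 0x1000. Nesting calls deeper than kDepth overwrites the oldest entries, so in this model the outermost returns get no prediction.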

How does Linux perf calculate the cache-references and cache-misses events

The built-in perf events that you are interested in map to the following hardware performance monitoring events on your processor:

    Count            perf event              Hardware event
    523,288,816      cache-references        architectural event: LLC Reference
    205,331,370      cache-misses            architectural event: LLC Misses
    237,794,728      L1-dcache-load-misses   L1D.REPLACEMENT
    3,495,080,007    L1-dcache-loads         MEM_INST_RETIRED.ALL_LOADS
    2,039,344,725    L1-dcache-stores        MEM_INST_RETIRED.ALL_STORES
    531,452,853      L1-icache-load-misses   ICACHE_64B.IFTAG_MISS
    77,062,627       LLC-loads               OFFCORE_RESPONSE (MSR bits 0, 16, 30-37)
    27,462,249       LLC-load-misses         …
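
A common use of these counts is to derive miss ratios. The helper below is a hypothetical sketch (the function and constant names are illustrative, not part of perf) that divides two of the counts from the output above.

```cpp
#include <cassert>
#include <cstdint>

// Ratio of two event counts, e.g. cache-misses / cache-references.
double event_ratio(std::uint64_t numerator, std::uint64_t denominator) {
    assert(denominator != 0);
    return static_cast<double>(numerator) / static_cast<double>(denominator);
}

// Counts copied from the perf output above.
constexpr std::uint64_t kCacheReferences = 523288816;   // LLC Reference
constexpr std::uint64_t kCacheMisses     = 205331370;   // LLC Misses
constexpr std::uint64_t kL1dLoads        = 3495080007;  // MEM_INST_RETIRED.ALL_LOADS
constexpr std::uint64_t kL1dLoadMisses   = 237794728;   // L1D.REPLACEMENT
```

With these numbers, event_ratio(kCacheMisses, kCacheReferences) is about 0.39 and event_ratio(kL1dLoadMisses, kL1dLoads) is about 0.068. Treat the second figure as approximate: L1D.REPLACEMENT counts lines brought into L1D, which is not strictly the same as the number of loads that missed.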

Is there hardware support for 128bit integers in modern processors?

The x86-64 instruction set can do a 64-bit*64-bit to 128-bit multiply using one instruction (mul for unsigned, imul for signed, each with one operand), so I would argue that, to some degree, the x86 instruction set does include some support for 128-bit integers. If your instruction set does not have an instruction to do 64-bit*64-bit to …
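
To illustrate what the single mul instruction saves, here is a sketch of a 64-bit*64-bit to 128-bit unsigned multiply built from four 32-bit*32-bit partial products, roughly what you need on an instruction set without a widening multiply. The function names are hypothetical. Where GCC or Clang provide the unsigned __int128 extension (detected here via __SIZEOF_INT128__), the second function lets the compiler emit the widening multiply directly; on x86-64 that is typically a single mul.

```cpp
#include <cassert>
#include <cstdint>

// Portable 64x64 -> 128-bit unsigned multiply from four 32x32 -> 64 products.
// Returns the high 64 bits and stores the low 64 bits in *lo.
std::uint64_t mul_64x64_128(std::uint64_t a, std::uint64_t b, std::uint64_t* lo) {
    assert(lo != nullptr);
    const std::uint64_t mask = 0xFFFFFFFFu;
    const std::uint64_t a_lo = a & mask, a_hi = a >> 32;
    const std::uint64_t b_lo = b & mask, b_hi = b >> 32;

    const std::uint64_t p0 = a_lo * b_lo;  // contributes to bits 0..63
    const std::uint64_t p1 = a_lo * b_hi;  // contributes to bits 32..95
    const std::uint64_t p2 = a_hi * b_lo;  // contributes to bits 32..95
    const std::uint64_t p3 = a_hi * b_hi;  // contributes to bits 64..127

    // Middle 32-bit column; it fits in 64 bits, and its upper part carries
    // into the high word.
    const std::uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);
    *lo = (mid << 32) | (p0 & mask);
    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

#if defined(__SIZEOF_INT128__)
// GCC/Clang extension: the compiler emits the widening multiply itself.
std::uint64_t mul_hi_int128(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::uint64_t>((static_cast<unsigned __int128>(a) * b) >> 64);
}
#endif
```

For example, multiplying 0xFFFFFFFFFFFFFFFF by itself gives a high half of 0xFFFFFFFFFFFFFFFE and a low half of 1, since (2^64 - 1)^2 = 2^128 - 2^65 + 1.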

CPUID implementations in C++

Accessing raw CPUID information is actually very easy. Here is a C++ class for that, which works on Windows, Linux and OS X:

    #ifndef CPUID_H
    #define CPUID_H

    #ifdef _WIN32
    #include <limits.h>
    #include <intrin.h>
    typedef unsigned __int32 uint32_t;
    #else
    #include <stdint.h>
    #endif

    class CPUID {
      uint32_t regs[4];

    public:
      explicit CPUID(unsigned i) {
    #ifdef _WIN32
        __cpuid((int *)regs, (int)i);
    …
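
As a lighter-weight alternative on GCC and Clang, their <cpuid.h> header provides __get_cpuid. The sketch below (cpu_vendor and HAVE_GNU_CPUID are hypothetical names) reads leaf 0 and assembles the 12-byte vendor string from EBX, EDX and ECX, in that order. It only covers x86 builds with GCC or Clang; elsewhere it returns an empty string, and MSVC users would use __cpuid from <intrin.h> as in the class above.

```cpp
#include <cassert>
#include <cstring>
#include <string>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#endif

// Returns the 12-character vendor string from CPUID leaf 0
// (e.g. "GenuineIntel" or "AuthenticAMD"), or "" if unavailable in this build.
std::string cpu_vendor() {
#ifdef HAVE_GNU_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) return {};
    char buf[12];
    // The vendor string is laid out across EBX, EDX, ECX (little-endian bytes).
    std::memcpy(buf + 0, &ebx, 4);
    std::memcpy(buf + 4, &edx, 4);
    std::memcpy(buf + 8, &ecx, 4);
    return std::string(buf, sizeof buf);
#else
    return {};
#endif
}
```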

Is bit shifting O(1) or O(n)?

Some instruction sets are limited to one bit shift per instruction. Others allow you to specify any number of bits to shift in a single instruction, which usually takes one clock cycle on modern processors ("modern" being an intentionally vague word). See dan04's answer about a barrel shifter, a circuit that shifts more …
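
On an instruction set that can only shift one bit per instruction, a shift by n needs n instructions, which is where the O(n) answer comes from. The following sketch models that cost by counting steps; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models an ISA whose shift instruction moves only one bit at a time:
// shifting by n costs n instructions, i.e. O(n). *steps receives that count.
std::uint64_t shl_one_bit_at_a_time(std::uint64_t x, unsigned n, unsigned* steps) {
    assert(steps != nullptr);
    *steps = 0;
    for (unsigned i = 0; i < n; ++i) {
        x <<= 1;   // one single-bit shift instruction
        ++*steps;
    }
    return x;
}
```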

Which is faster: x

It potentially depends on the CPU. However, modern x86 and ARM CPUs use a “barrel shifter”, a hardware module specifically designed to perform arbitrary shifts in constant time. So the bottom line is: no, there is no difference.
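
A barrel shifter gets constant time by using log2(width) stages of multiplexers, where stage k either shifts by 2^k or passes the value through, depending on bit k of the shift amount. The sketch below models that structure in software for 64-bit values (six stages); it illustrates the idea and is not a description of any specific CPU's circuit. Because the number of stages is fixed, a shift by 1 and a shift by 10 go through the same six steps.

```cpp
#include <cassert>
#include <cstdint>

// Software model of a 64-bit logarithmic barrel shifter: six fixed stages,
// stage k shifting by 2^k when bit k of `amount` is set. Every call runs all
// six stages, so the cost does not depend on the shift distance.
std::uint64_t barrel_shift_left(std::uint64_t x, unsigned amount) {
    assert(amount < 64);  // shifting a 64-bit value by >= 64 is undefined in C++
    for (unsigned stage = 0; stage < 6; ++stage) {
        const unsigned distance = 1u << stage;  // 1, 2, 4, 8, 16, 32
        // In hardware this is a multiplexer: pick x or x shifted by `distance`.
        x = (amount & distance) ? (x << distance) : x;
    }
    return x;
}
```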

Can the simple decoders in recent Intel microarchitectures handle all 1-µop instructions?

No, there are some instructions that can only decode at one per clock. This effect is Intel-only, not AMD. Theory: the “steering” logic that sends chunks of machine code to decoders looks for patterns in the opcode byte(s) during pre-decode, and any pattern match that might be a multi-uop instruction has to be sent to the complex decoder. To …

What does “rep; nop;” mean in x86 assembly? Is it the same as the “pause” instruction?

rep; nop is indeed the same as the pause instruction (opcode F3 90). It might be used for assemblers which don't support the pause instruction yet. On older processors, this simply did nothing, just like nop but in two bytes. On new processors which support hyperthreading, it is used as a hint to the processor that …
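
In C and C++ you rarely need to write rep; nop by hand: the _mm_pause() intrinsic emits the PAUSE instruction. The sketch below shows the usual place for it, inside a spin-wait loop. The cpu_relax and spin_wait names are hypothetical, and the empty fallback for non-x86 targets is an assumption made only to keep the example portable.

```cpp
#include <atomic>
#include <cassert>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
// Emits PAUSE (F3 90), i.e. the "rep; nop" encoding.
inline void cpu_relax() { _mm_pause(); }
#else
// Assumption: no spin-wait hint is used on other architectures in this sketch.
inline void cpu_relax() {}
#endif

// Spin until another thread sets `ready`, hinting to the CPU on each
// iteration that this is a spin-wait loop.
void spin_wait(const std::atomic<bool>& ready) {
    while (!ready.load(std::memory_order_acquire)) {
        cpu_relax();
    }
}
```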

How to obtain the number of CPUs/cores in Linux from the command line?

    grep -c ^processor /proc/cpuinfo

will count the number of lines starting with “processor” in /proc/cpuinfo. For systems with hyper-threading, you can use

    grep ^cpu\\scores /proc/cpuinfo | uniq | awk '{print $4}'

which should return (for example) 8, whereas the command above would return 16. Note that “cpu cores” is the number of cores per physical package, so on a multi-socket system this reports the per-socket count.
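
The same count can be taken from a program. Below is a hypothetical C++ equivalent of the grep -c command that counts lines beginning with “processor”; the function name and the -1 return for an unreadable file are illustrative choices. The standard library also offers std::thread::hardware_concurrency(), which reports the number of concurrent threads supported and may return 0 if that is not computable.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Equivalent of `grep -c ^processor /proc/cpuinfo`: count lines that start
// with "processor". Returns -1 if the file cannot be opened (e.g. non-Linux).
int count_cpuinfo_processors(const std::string& path = "/proc/cpuinfo") {
    std::ifstream in(path);
    if (!in) return -1;
    int count = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("processor", 0) == 0) ++count;  // prefix match at position 0
    }
    return count;
}
```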

What kind of address instruction does the x86 cpu have?

x86 is a CISC register machine, where at most one operand of any instruction can be an explicit memory address instead of a register, using an addressing mode like [rdi + rax*4]. (There are instructions which can have two memory operands, with one or both being implicit, though: What x86 instructions take two (or more) …
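
An addressing mode such as [rdi + rax*4] describes an effective address of the form base + index*scale + displacement, where scale is 1, 2, 4 or 8 and the displacement is a sign-extended constant of up to 32 bits. The hypothetical helper below computes that address in software. The second function shows a plain array access that compilers typically lower, with optimization enabled, to a single memory operand like [rdi + rsi*4] on x86-64.

```cpp
#include <cassert>
#include <cstdint>

// x86-style effective address: base + index*scale + disp.
// For [rdi + rax*4]: base = rdi, index = rax, scale = 4, disp = 0.
// Unsigned arithmetic wraps modulo 2^64, like 64-bit address computation.
std::uint64_t effective_address(std::uint64_t base, std::uint64_t index,
                                unsigned scale, std::int32_t disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    const std::uint64_t sdisp = static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
    return base + index * scale + sdisp;
}

// Indexing a 4-byte int array: typically one instruction with a
// [base + index*4] memory operand on x86-64.
int load_element(const int* p, long i) { return p[i]; }
```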