cpu-architecture - w3toppers.com

What specifically marks an x86 cache line as dirty – any write, or is an explicit change required?

Currently no implementation of x86 (or any other ISA, as far as I know) supports optimizing silent stores. There has been academic research on this, and there is even a patent on “eliminating silent store invalidation propagation in shared memory cache coherency protocols”. (Search for “silent store” cache if you are interested in more.) For x86, … Read more
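To illustrate the term: a silent store is a write that leaves the stored value unchanged. Since the excerpt above says hardware does not optimize such stores away, software that wants to skip them has to check before writing. The following is only a minimal C++ sketch of that idea; the helper name store_if_changed is hypothetical and does not come from the original answer.

```cpp
#include <cstdint>

// Hypothetical helper: performs the write only when the value would change.
// Returns true if a store was performed, false if it would have been silent.
// This trades a store for a load plus a branch, so whether it helps depends
// on the workload. It is not safe for data that other threads modify
// concurrently unless further synchronization is added.
inline bool store_if_changed(std::uint32_t& slot, std::uint32_t value) {
    if (slot == value) {
        return false;  // would have been a silent store, so skip it
    }
    slot = value;
    return true;
}
```

A caller would use it in place of a plain assignment, for example store_if_changed(flag, 1) inside a loop that usually finds the flag already set.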

How can I get the iOS device CPU architecture in runtime

You can use sysctlbyname:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <mach/machine.h>

    NSString *getCPUType(void) {
        NSMutableString *cpu = [[NSMutableString alloc] init];
        size_t size;
        cpu_type_t type;
        cpu_subtype_t subtype;

        size = sizeof(type);
        sysctlbyname("hw.cputype", &type, &size, NULL, 0);

        size = sizeof(subtype);
        sysctlbyname("hw.cpusubtype", &subtype, &size, NULL, 0);

        // values for cputype and cpusubtype defined in mach/machine.h
        if (type == CPU_TYPE_X86) … Read more

Why are C++ int and long types both 4 bytes?

The only things guaranteed about integer types are:

sizeof(char) == 1
sizeof(char) <= sizeof(short)
sizeof(short) <= sizeof(int)
sizeof(int) <= sizeof(long)
sizeof(long) <= sizeof(long long)
sizeof(char) * CHAR_BIT >= 8
sizeof(short) * CHAR_BIT >= 16
sizeof(int) * CHAR_BIT >= 16
sizeof(long) * CHAR_BIT >= 32
sizeof(long long) * CHAR_BIT >= 64

The other things are implementation … Read more
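The guarantees listed above can be checked at compile time on any given implementation. The sketch below restates each one as a static_assert; the helper bit_width_of is a hypothetical name added for illustration and is not part of the original answer.

```cpp
#include <climits>
#include <cstddef>

// Each static_assert restates one guarantee from the list above.
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(sizeof(char) <= sizeof(short), "char must not exceed short");
static_assert(sizeof(short) <= sizeof(int), "short must not exceed int");
static_assert(sizeof(int) <= sizeof(long), "int must not exceed long");
static_assert(sizeof(long) <= sizeof(long long), "long must not exceed long long");
static_assert(sizeof(char) * CHAR_BIT >= 8, "char has at least 8 bits");
static_assert(sizeof(short) * CHAR_BIT >= 16, "short has at least 16 bits");
static_assert(sizeof(int) * CHAR_BIT >= 16, "int has at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long has at least 32 bits");
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long has at least 64 bits");

// Hypothetical helper: storage width of a type in bits on this implementation.
template <typename T>
constexpr std::size_t bit_width_of() {
    return sizeof(T) * CHAR_BIT;
}
```

On an implementation where int and long are both 4 bytes, bit_width_of<int>() and bit_width_of<long>() both evaluate to 32 and every assertion still holds, which is why that layout is permitted.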

Why did Intel change the static branch prediction mechanism over these years?

The primary reason why static prediction is not favored in modern designs, to the point of perhaps not even being present, is that static predictions occur too late in the pipeline compared to dynamic predictions. The basic issue is that branch directions and target locations must be known before fetching them, but static predictions can … Read more

Difference between word addressable and byte addressable

A byte is a memory unit for storage. A memory chip is full of such bytes. Memory units are addressable; that is the only way we can use memory. In reality, memory is only byte addressable, which means that a binary address always points to a single byte. A word is just a group of … Read more
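The relationship between a byte address and the word that contains it can be sketched as simple arithmetic. The 4-byte word size, the WordLocation struct, and the locate function below are assumptions chosen for illustration; real word sizes vary by machine.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption for illustration only: a word is 4 bytes.
constexpr std::size_t kBytesPerWord = 4;

// Hypothetical result type: where a byte sits in word-granular terms.
struct WordLocation {
    std::uint64_t word_index;   // which word the byte falls in
    std::size_t byte_in_word;   // offset of the byte inside that word
};

// Maps a byte address to its containing word and the offset within it.
constexpr WordLocation locate(std::uint64_t byte_address) {
    return WordLocation{
        byte_address / kBytesPerWord,
        static_cast<std::size_t>(byte_address % kBytesPerWord)};
}
```

Under this assumption, byte addresses 4 through 7 all belong to word 1, and only their offset within the word differs.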

Is a mov to a segmentation register slower than a mov to a general purpose register?

mov %eax, %ebx between general-purpose registers is one of the most common instructions. Modern hardware supports it extremely efficiently, often with special cases that don’t apply to any other instruction. On older hardware, it’s always been one of the cheapest instructions. On Ivy Bridge and later, it doesn’t even need an execution unit and has zero … Read more

Are load ops deallocated from the RS when they dispatch, complete or some other time?

The following experiments suggest that the uops are deallocated at some point before the load completes. While this is not a complete answer to your question, it might provide some interesting insights. On Skylake, there is a 33-entry reservation station for loads (see https://stackoverflow.com/a/58575898/10461973). This should also be the case for the Coffee Lake i7-8700K, … Read more

Out-of-order instruction execution: is commit order preserved?

TL;DR: memory ordering is not the same thing as out-of-order execution. Memory reordering happens even on in-order pipelined CPUs. In-order commit is necessary for precise exceptions that can roll back to exactly the instruction that faulted, without any instructions after that having already retired. The cardinal rule of out-of-order execution is: don’t break single-threaded code. … Read more

Dependent loads reordering in CPU

Short answer: In an out-of-order processor the load-store queue is used to track and enforce memory ordering constraints. Processors such as the Alpha 21264 have the necessary hardware to prevent dependent load reordering, but enforcing this dependency could add overhead for inter-processor communication. Long answer: Background on dependence tracking This is probably best explained using … Read more

What happens when different CPU cores write to the same RAM address without synchronization?

x86 (like every other mainstream SMP CPU architecture) has coherent data caches. It’s impossible for two different caches (e.g. L1D of 2 different cores) to hold conflicting data for the same cache line. The hardware imposes an order (by some implementation-specific mechanism to break ties in case two requests for ownership arrive in the same … Read more
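The "hardware imposes an order" point can be observed from C++ with a small experiment. This is a hedged sketch rather than anything from the original answer: the function name race_two_writers is hypothetical, and std::atomic with relaxed ordering is used because unsynchronized writes to a plain int would be a data race (undefined behavior) at the language level, whatever the hardware does.

```cpp
#include <atomic>
#include <thread>

// Two threads store different values to the same location with no ordering
// between them. One store ends up last in the location's modification order,
// so the final value is exactly one of the written values, never a mixture.
inline int race_two_writers() {
    std::atomic<int> shared{0};
    std::thread a([&] { shared.store(1, std::memory_order_relaxed); });
    std::thread b([&] { shared.store(2, std::memory_order_relaxed); });
    a.join();
    b.join();
    return shared.load(std::memory_order_relaxed);
}
```

Which of the two values wins can differ from run to run; the only thing the sketch relies on is that the result is one of them.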