microbenchmark - w3toppers.com

Simple for() loop benchmark takes the same time with any loop bound

BTW, if you’d actually done i<49058349083, gcc and clang create an infinite loop on systems with 32-bit int (including x86 and x86-64). 49058349083 is greater than INT_MAX. An unsuffixed decimal literal that does not fit in int gets the first of long or long long that can hold it, and i is converted to that wider type for the comparison, so you effectively did (int64_t)i < 49058349083LL, which is true for any possible value …
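The following is a small sketch of the literal-typing rule. It assumes a platform with 32-bit int, as on x86 and x86-64. The function names loop_condition and fixed_condition are hypothetical stand-ins for the loop test. The compile-time checks show two things: the literal is not an int, and the comparison is already true for INT_MAX.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// Assumption: 32-bit int, as on x86 and x86-64.
static_assert(sizeof(int) * CHAR_BIT == 32, "this sketch assumes 32-bit int");

// 49058349083 does not fit in int, so the unsuffixed literal takes the first of
// long / long long that can represent it (long on LP64 Linux, long long on Windows).
static_assert(!std::is_same<decltype(49058349083), int>::value,
              "the literal is not an int");

// Hypothetical stand-in for the loop test `i < 49058349083`: i is converted to
// the literal's wider type, so the result is true for every int value.
constexpr bool loop_condition(int i) { return i < 49058349083; }

static_assert(loop_condition(INT_MAX), "even INT_MAX is below the bound");
static_assert(loop_condition(INT_MIN), "and so is every other int");

// Hypothetical fix: with a 64-bit counter the bound becomes reachable.
constexpr bool fixed_condition(long long i) { return i < 49058349083LL; }
static_assert(!fixed_condition(49058349083LL), "a 64-bit counter can reach it");
```

The condition can never become false, so the loop could only end by overflowing i. Signed int overflow is undefined behavior, and that leaves the compiler free to emit an infinite loop.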

“Escape” and “Clobber” equivalent in MSVC

While I don’t know of an equivalent assembly trick for MSVC, Facebook uses the following in their Folly benchmark library, documented as: “Call doNotOptimizeAway(var) against variables that you use for benchmarking but otherwise are useless. The compiler tends to do a good job at eliminating unused variables, and this function fools it …
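A portable approximation of both idioms can be sketched in standard C++. This is neither Folly’s nor Google Benchmark’s implementation, and the helper names do_not_optimize_away and clobber_memory are hypothetical.

- **Escape:** copying a value into a volatile forces the compiler to actually compute it.
- **Clobber:** std::atomic_signal_fence is a standard fence that suppresses compiler reordering without emitting a CPU barrier. It may be weaker than GCC’s empty asm with a "memory" clobber.

On MSVC specifically, the _ReadWriteBarrier intrinsic is another compiler barrier, though Microsoft documents it as deprecated.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: copying the value into a volatile local is an observable
// side effect, so the compiler must actually compute `value`.
// Intended for trivially copyable types such as int or double.
template <class T>
inline void do_not_optimize_away(const T& value) {
    volatile T sink = value;
    (void)sink;
}

// Hypothetical helper: a compiler-only fence. It emits no CPU fence instruction
// but restricts the compiler from reordering memory accesses across this point.
inline void clobber_memory() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Example use: do_not_optimize_away makes every partial sum an observable
// value, so the optimizer cannot skip computing them.
long long sum_below(int n) {
    long long total = 0;
    for (int i = 0; i < n; ++i) {
        total += i;
        do_not_optimize_away(total);
    }
    clobber_memory();
    return total;
}
```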

how do numactl & perf change memory placement policy of child processes?

TL;DR: The default policy used by numactl can cause performance issues, as can the OpenMP thread binding. numactl constraints are applied to all (forked) child processes. Indeed, numactl uses a predefined policy by default. This policy can be --interleave, --preferred, --membind or --localalloc. The policy changes the behavior of the operating system’s page allocation …

Weird performance effects from nearby dependent stores in a pointer-chasing loop on IvyBridge. Adding an extra load speeds it up?

TL;DR: For these three cases, a penalty of a few cycles is incurred when a load and a store are performed at the same time. The load latency is on the critical path in all three cases, but the penalty differs from case to case. Case 3 is about a cycle higher than case 1 …
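For reference, here is a minimal sketch of a baseline pointer-chasing latency loop, assuming C++11 or later. It models only the dependent-load chain. The nearby stores and extra loads that the question is about are not included, and all function names are hypothetical. Sattolo’s algorithm builds a random permutation that forms a single cycle, so the chase visits every element before repeating.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Sattolo's algorithm: a random permutation forming ONE cycle of length n,
// so following next[] from index 0 returns to 0 after exactly n steps.
std::vector<std::size_t> make_single_cycle(std::size_t n, unsigned seed) {
    std::vector<std::size_t> next(n);
    if (n == 0) return next;
    std::iota(next.begin(), next.end(), std::size_t{0});
    std::mt19937 rng(seed);
    for (std::size_t i = n - 1; i > 0; --i) {
        // j is drawn from [0, i - 1], never i itself; this is what makes one cycle.
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }
    return next;
}

// Each load's address comes from the previous load, so the loop runs at
// roughly one load-use latency per step.
std::size_t chase(const std::vector<std::size_t>& next, std::size_t steps) {
    std::size_t p = 0;
    for (std::size_t s = 0; s < steps; ++s) p = next[p];
    return p;
}

// Average nanoseconds per dependent load; `sink` receives the final index so
// the chain is not dead code. Assumes steps > 0.
double ns_per_load(const std::vector<std::size_t>& next, std::size_t steps,
                   std::size_t& sink) {
    const auto t0 = std::chrono::steady_clock::now();
    sink = chase(next, steps);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() /
           static_cast<double>(steps);
}
```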

Getting an accurate execution time in C++ (microseconds)

If you are using C++11 or later, you could use std::chrono::high_resolution_clock. A simple use case:

    #include <chrono>

    auto start = std::chrono::high_resolution_clock::now();
    // …
    auto elapsed = std::chrono::high_resolution_clock::now() - start;
    long long microseconds = std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();

This solution has the advantage of being portable. Beware that micro-benchmarking is hard. It’s very easy to measure the wrong thing (like …
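A variant of the same idea can be wrapped in a helper. The sketch below assumes a monotonic clock is preferable for measuring intervals. The standard guarantees that std::chrono::steady_clock is monotonic, while high_resolution_clock is not guaranteed to be steady (in libstdc++ it is an alias for system_clock). The helper name elapsed_us is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: runs f once and returns the elapsed wall time in whole
// microseconds, measured with the monotonic steady_clock.
template <class F>
long long elapsed_us(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
}
```

Timing a single call like this is only meaningful for work that takes much longer than the clock’s resolution. For very short operations, time a loop of many repetitions instead.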

Why does adding an xorps instruction make this function using cvtsi2ss and addss ~5x faster?

What is microbenchmarking?

It means exactly what it says on the tin: it measures the performance of something “small”, like a system call to the kernel of an operating system. The danger is that people may use whatever results they obtain from microbenchmarking to dictate optimizations. And as we all know: we should forget about small …

Idiomatic way of performance evaluation?

Generally: for repeated short things, you can just time the whole repeat loop. (But microbenchmarking is hard: it’s easy to distort results unless you understand the implications of doing that. For very short things, throughput and latency are different, so measure them separately, by making each iteration either use the result of the previous one (latency) or not (throughput). Also …
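The latency/throughput distinction can be sketched with a loop whose iterations either depend on each other or do not. The function names are hypothetical, and the multiply-add x * 3 + 1 is an arbitrary cheap operation chosen for illustration. An optimizer may still vectorize or otherwise transform these loops, so it is worth checking the generated assembly.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Latency: every step needs the previous result, forming one serial dependency chain.
std::uint64_t latency_chain(std::uint64_t x, int iters) {
    for (int i = 0; i < iters; ++i) x = x * 3 + 1;
    return x;
}

// Throughput: four independent chains that an out-of-order CPU can overlap.
// Assumes iters is a multiple of 4, so both functions do the same number of operations.
std::uint64_t throughput_chains(std::uint64_t x, int iters) {
    std::uint64_t a = x, b = x + 1, c = x + 2, d = x + 3;
    for (int i = 0; i < iters; i += 4) {
        a = a * 3 + 1;
        b = b * 3 + 1;
        c = c * 3 + 1;
        d = d * 3 + 1;
    }
    return a ^ b ^ c ^ d;  // combine all chains so none is dead code
}

// Times one call of f in nanoseconds using the monotonic steady_clock.
template <class F>
double time_ns(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count();
}
```

Timing both functions with the same large iters, and storing each result where the compiler cannot discard it, would typically show the single dependent chain taking longer per operation.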

How do I write a correct micro-benchmark in Java?

Tips about writing micro-benchmarks from the creators of Java HotSpot: Rule 0: Read a reputable paper on JVMs and micro-benchmarking. A good one is Brian Goetz, 2005. Do not expect too much from micro-benchmarks; they measure only a limited range of JVM performance characteristics. Rule 1: Always include a warmup phase which runs your …
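A warmup phase matters outside the JVM too, because caches, page faults and CPU clock ramp-up all affect the first runs of native code. The sketch below is a rough C++ analog of Rule 1, with hypothetical names. It is not a substitute for a JVM harness such as JMH when benchmarking Java.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical harness: runs f `warmup` times untimed, then times `reps` runs
// (reps >= 1) and returns the per-run durations in nanoseconds, sorted ascending.
template <class F>
std::vector<double> time_runs(F&& f, int warmup, int reps) {
    for (int i = 0; i < warmup; ++i) f();  // warmup: results discarded
    std::vector<double> ns;
    ns.reserve(static_cast<std::size_t>(reps));
    for (int i = 0; i < reps; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        f();
        const auto t1 = std::chrono::steady_clock::now();
        ns.push_back(std::chrono::duration<double, std::nano>(t1 - t0).count());
    }
    std::sort(ns.begin(), ns.end());
    return ns;
}

// Median of an already sorted, non-empty vector (upper median for even sizes).
double median(const std::vector<double>& sorted) {
    return sorted[sorted.size() / 2];
}
```

Reporting the median rather than the mean keeps one slow outlier run (an interrupt, a page fault) from dominating the result.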

What is more efficient for a PHP variable, $timelimit = 10*60; or $limit = 600;? [closed]

Efficient in what sense? For readability, 60 * 10 is much more understandable than 600 as a time span. For performance, 600 is probably a tiny bit better. To express one day, I’d write 60 * 60 * 24; it seems cleaner than 86400. Sometimes, efficiency is the way other people (or …
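In C++ the performance side of this question has a definite answer. This sketch says nothing about what the PHP engine does. An expression built only from literals, such as 10 * 60, is a constant expression. When it initializes a constexpr variable, the compiler must evaluate it at compile time, so the readable spelling costs nothing at run time. The constant names are hypothetical.

```cpp
#include <cassert>

// Both initializers are constant expressions, evaluated at compile time.
constexpr int kTimeLimitSeconds = 10 * 60;
constexpr int kLimitSeconds = 600;
static_assert(kTimeLimitSeconds == kLimitSeconds, "10 * 60 is folded to 600");

// The self-documenting spelling of one day, also computed at compile time.
constexpr int kOneDaySeconds = 60 * 60 * 24;
static_assert(kOneDaySeconds == 86400, "60 * 60 * 24 is folded to 86400");
```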