Produce loops without cmp instruction in GCC

How about this? The compiler is GCC 4.9.0 (MinGW x64):

    #include <immintrin.h>
    #include <stdint.h>

    void triad(float *x, float *y, float *z, const int n) {
        float k = 3.14159f;
        intptr_t i;
        __m256 k4 = _mm256_set1_ps(k);
        /* i counts up from -n to 0; x[i+n] walks the arrays front to back */
        for (i = -n; i < 0; i += 8) {
            _mm256_store_ps(&z[i+n], _mm256_add_ps(_mm256_load_ps(&x[i+n]), _mm256_mul_ps(k4, _mm256_load_ps(&y[i+n]))));
        }
    }

    gcc -c -O3 -march=corei7 -mavx2 triad.c

    0000000000000000 <triad>: 0: …
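The indexing idea in the excerpt can be tried without AVX. The following is a minimal scalar sketch, not the answer's code: `triad_scalar` and the end pointers `xe`, `ye`, `ze` are hypothetical names introduced here. Because the index counts up toward zero, a compiler may be able to branch on the flags set by the increment instead of emitting a separate `cmp`, though whether it does depends on the compiler version and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch of the "count up to zero" loop (hypothetical helper). */
void triad_scalar(const float *x, const float *y, float *z, int n)
{
    const float k = 3.14159f;

    /* Point one past the end of each array so a negative index
       starting at -n walks the elements from first to last. */
    const float *xe = x + n;
    const float *ye = y + n;
    float *ze = z + n;

    for (intptr_t i = -(intptr_t)n; i < 0; i++) {
        ze[i] = xe[i] + k * ye[i];
    }
}
```

Compiling with `gcc -O3 -S` and inspecting the loop is the way to check whether the `cmp` is actually gone for a given toolchain.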

How can the rep stosb instruction execute faster than the equivalent loop?

In modern CPUs, the microcoded implementations of rep stosb and rep movsb actually use stores that are wider than 1 byte, so they can go much faster than one byte per clock. (Note that this only applies to stos and movs, not repe cmpsb or repne scasb. Those are still slow, unfortunately: at best 2 cycles per byte compared …
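For experimenting with the comparison, here is a minimal sketch of a fill routine that issues rep stosb through GCC/Clang extended inline assembly. The helper name `fill_rep_stosb` is hypothetical, the x86 path assumes the direction flag is clear (as the usual calling conventions require), and other targets fall back to `memset`. It says nothing about speed by itself; timing it against a byte loop is left to the reader.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill n bytes at dst with value. */
static void fill_rep_stosb(void *dst, unsigned char value, size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* rep stosb stores AL to [(E/R)DI] and repeats (E/R)CX times,
       advancing the destination pointer after each store. */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(value)
                     : "memory");
#else
    /* Portable fallback for non-x86 targets or other compilers. */
    memset(dst, value, n);
#endif
}
```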

C# Compiler optimization – Unused methods

Just checked in Reflector with a release build: the compiler doesn’t remove unused private methods. There are ways to use a method without the compiler’s knowledge, such as reflection, so the compiler doesn’t try to guess. It just leaves the methods there. The only private methods the compiler removes are partial methods without an implementation. …

Is it possible to find two numbers whose difference is minimum in O(n) time?

Find the smallest and largest elements in the list. The difference smallest − largest will be the minimum. If you’re looking for a nonnegative difference, then this is of course at least as hard as checking whether the array has two equal elements. This is called the element uniqueness problem, and without any additional assumptions (like limiting the size of the integers, allowing …
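The two readings of the question can be sketched in C. The first function follows the excerpt directly: one pass finds the smallest and largest elements, and their signed difference is the minimum. The second is not from the excerpt; it is the common sort-based approach for the nonnegative case, assumed here for illustration, and it runs in O(n log n) rather than O(n). Function names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Signed minimum of a[i] - a[j]: smallest minus largest, one O(n) pass.
   Requires n >= 1. */
long long min_signed_difference(const int *a, size_t n)
{
    int lo = a[0], hi = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] < lo) lo = a[i];
        if (a[i] > hi) hi = a[i];
    }
    return (long long)lo - (long long)hi;
}

/* Three-way comparison for qsort that avoids overflow from subtraction. */
static int cmp_int(const void *p, const void *q)
{
    int a = *(const int *)p, b = *(const int *)q;
    return (a > b) - (a < b);
}

/* Smallest nonnegative difference between two positions.
   Sorts the array in place; requires n >= 2. */
long long min_abs_difference(int *a, size_t n)
{
    qsort(a, n, sizeof a[0], cmp_int);
    long long best = LLONG_MAX;
    for (size_t i = 1; i < n; i++) {
        /* After sorting, the closest pair is always adjacent. */
        long long d = (long long)a[i] - (long long)a[i - 1];
        if (d < best) best = d;
    }
    return best;
}
```

A result of 0 from `min_abs_difference` means the array contains two equal elements, which is the connection to element uniqueness that the excerpt points out.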