Pythagorean Triples make a good example for claiming “for
loops considered harmful“, because for
loops seduce us into thinking about counting, often the most irrelevant part of a task.
(I’m going to stick with pseudo-code to avoid language biases, and to keep the pseudo-code streamlined, I’ll not optimize away multiple calculations of e.g. x * x
and y * y
.)
Version 1:
for x in 1..N {
for y in 1..N {
for z in 1..N {
if x * x + y * y == z * z then {
// use x, y, z
}
}
}
}
is the worst solution. It generates duplicates, and traverses parts of the space that aren’t useful (e.g. whenever z < y
). Its time complexity is cubic on N
.
Version 2, the first improvement, comes from requiring x < y < z
to hold, as in:
for x in 1..N {
for y in x+1..N {
for z in y+1..N {
if x * x + y * y == z * z then {
// use x, y, z
}
}
}
}
which reduces run time and eliminates duplicated solutions. However, it is still cubic on N
; the improvement is just a reduction of the co-efficient of N
-cubed.
It is pointless to continue examining increasing values of z
after z * z < x * x + y * y
no longer holds. That fact motivates Version 3, the first step away from brute-force iteration over z
:
for x in 1..N {
for y in x+1..N {
z = y + 1
while z * z < x * x + y * y {
z = z + 1
}
if z * z == x * x + y * y and z <= N then {
// use x, y, z
}
}
}
For N
of 1000, this is about 5 times faster than Version 2, but it is still cubic on N
.
The next insight is that x
and y
are the only independent variables; z
depends on their values, and the last z
value considered for the previous value of y
is a good starting search value for the next value of y
. That leads to Version 4:
for x in 1..N {
y = x+1
z = y+1
while z <= N {
while z * z < x * x + y * y {
z = z + 1
}
if z * z == x * x + y * y and z <= N then {
// use x, y, z
}
y = y + 1
}
}
which allows y
and z
to “sweep” the values above x
only once. Not only is it over 100 times faster for N
of 1000, it is quadratic on N
, so the speedup increases as N
grows.
I’ve encountered this kind of improvement often enough to be mistrustful of “counting loops” for any but the most trivial uses (e.g. traversing an array).
Update: Apparently I should have pointed out a few things about V4 that are easy to overlook.
-
Both of the
while
loops are controlled by the value ofz
(one directly, the other indirectly through the square ofz
). The innerwhile
is actually speeding up the outerwhile
, rather than being orthogonal to it. It’s important to look at what the loops are doing, not merely to count how many loops there are. -
All of the calculations in V4 are strictly integer arithmetic. Conversion to/from floating-point, as well as floating-point calculations, are costly by comparison.
-
V4 runs in constant memory, requiring only three integer variables. There are no arrays or hash tables to allocate and initialize (and, potentially, to cause an out-of-memory error).
-
The original question allowed all of
x
,y
, andx
to vary over the same range. V1..V4 followed that pattern.
Below is a not-very-scientific set of timings (using Java under Eclipse on my older laptop with other stuff running…), where the “use x, y, z” was implemented by instantiating a Triple object with the three values and putting it in an ArrayList. (For these runs, N
was set to 10,000, which produced 12,471 triples in each case.)
Version 4: 46 sec.
using square root: 134 sec.
array and map: 400 sec.
The “array and map” algorithm is essentially:
squares = array of i*i for i in 1 .. N
roots = map of i*i -> i for i in 1 .. N
for x in 1 .. N
for y in x+1 .. N
z = roots[squares[x] + squares[y]]
if z exists use x, y, z
The “using square root” algorithm is essentially:
for x in 1 .. N
for y in x+1 .. N
z = (int) sqrt(x * x + y * y)
if z * z == x * x + y * y then use x, y, z
The actual code for V4 is:
public Collection<Triple> byBetterWhileLoop() {
Collection<Triple> result = new ArrayList<Triple>(limit);
for (int x = 1; x < limit; ++x) {
int xx = x * x;
int y = x + 1;
int z = y + 1;
while (z <= limit) {
int zz = xx + y * y;
while (z * z < zz) {++z;}
if (z * z == zz && z <= limit) {
result.add(new Triple(x, y, z));
}
++y;
}
}
return result;
}
Note that x * x
is calculated in the outer loop (although I didn’t bother to cache z * z
); similar optimizations are done in the other variations.
I’ll be glad to provide the Java source code on request for the other variations I timed, in case I’ve mis-implemented anything.