Parallel Algorithms for Generating Prime Numbers (possibly using Hadoop’s map reduce)

Here’s an algorithm that is built on mapping and reducing (folding). It expresses the sieve of Eratosthenes

     P = {3,5,7, …} \ U {{p2, p2+2p, p2+4p, …} | p in P}

for the odd primes (i.e without the 2). The folding tree is indefinitely deepening to the right, like this:

enter image description here

where each prime number marks a stream of odd multiples of that prime, e.g. for 7: 49, 49+14, 49+28, … , which are all merged to get all the composite numbers, and then primes are found in the gaps between the composite numbers. It is in Haskell, so the timing is taken care of implicitly by the lazy evaluation mechanism (and the algorithmic adjustment where each comparing node always lets through the first number from the left without demanding a number from the right, because it is guaranteed to be bigger anyway).

Odds can be used instead of odd primes as seeds for multiples generation, to simplify things (with obvious performance implications).

The work can naturally be divided into segments between consecutive primes’ squares. Haskell code follows, but we can regard it as an executable pseudocode too (where : is a list node lazy constructor, a function call f(x) is written f x, and parentheses are used for grouping only):

primes = 2 : g []
  where
    g ps = 3 : minus [5,7..] (_U [[p*p, p*p+2*p..] | p <- g ps])
    _U ((x:xs):t) = x : union xs (_U (pairs t))
    pairs ((x:xs):ys:t) = (x : union xs ys) : pairs t

union (x:xs) (y:ys) = case compare x y of 
    LT -> x : union  xs (y:ys) 
    EQ -> x : union  xs    ys 
    GT -> y : union (x:xs) ys

minus (x:xs) (y:ys) = case compare x y of
    LT -> x : minus  xs (y:ys) 
    EQ ->     minus  xs    ys 
    GT ->     minus (x:xs) ys

A discussion is here. More sophisticated, lazier scheduling is here. Also this SO answer shows approximate translation of (related) Haskell code in terms of generators; this one in Python.

Leave a Comment