Generating integers in ascending order using a set of prime numbers

Removing a number and reinserting all its multiples (by the primes in the set) into a priority queue is wrong (in the sense of the question) – i.e. it produces correct sequence but inefficiently so.

It is inefficient in two ways – first, it overproduces the sequence; second, each PriorityQueue operation incurs extra cost (the operations remove_top and insert are not usually both O(1), certainly not in any list- or tree-based PriorityQueue implementation).

The efficient O(n) algorithm maintains pointers back into the sequence itself as it is being produced, to find and append the next number in O(1) time. In pseudocode:

  return array h where
    h[0]=1; n=0; ps=[2,3,5, ... ]; // base primes
    is=[0 for each p in ps];       // indices back into h
    xs=[p for each p in ps]        // next multiples: xs[k]==ps[k]*h[is[k]]
    repeat:
      h[++n] := minimum xs
      for each ref (i,x,p) in (is,xs,ps):
        if( x==h[n] )
          { x := p*h[++i]; }       // advance the minimal multiple/pointer

For each minimal multiple it advances its pointer, while at the same time calculating its next multiple value. This too effectively implements a PriorityQueue but with crucial distinctions – it is before the end point, not after; it doesn’t create any additional storage except for the sequence itself; and its size is constant (just k numbers, for k base primes) whereas the size of past-the-end PriorityQueue grows as we progress along the sequence (in the case of Hamming sequence, based on set of 3 primes, as n2/3, for n numbers of the sequence).


The classic Hamming sequence in Haskell is essentially the same algorithm:

h = 1 : map (2*) h `union` map (3*) h `union` map (5*) h

union a@(x:xs) b@(y:ys) = case compare x y of LT -> x : union  xs  b
                                              EQ -> x : union  xs  ys
                                              GT -> y : union  a   ys

We can generate the smooth numbers for arbitrary base primes using the foldi function (see Wikipedia) to fold lists in a tree-like fashion for efficiency, creating a fixed sized tree of comparisons:

smooth base_primes = h   where       -- strictly increasing base_primes  NB!
    h = 1 : foldi g [] [map (p*) h | p <- base_primes]
    g (x:xs) ys = x : union xs ys

foldi f z []     = z
foldi f z (x:xs) = f x (foldi f z (pairs f xs))
 
pairs f (x:y:t)  = f x y : pairs f t
pairs f t        = t

It is also possible to directly calculate a slice of Hamming sequence around its nth member in O(n2/3) time, by direct enumeration of the triples and assessing their values through logarithms, logval(i,j,k) = i*log 2+j*log 3+k*log 5. This Ideone.com test entry calculates 1 billionth Hamming number in 1.12 0.05 seconds (2016-08-18: main speedup due to usage of Int instead of the default Integer where possible, even on 32-bit; additional 20% thanks to the tweak suggested by @GordonBGood, bringing band size complexity down to O(n1/3)).

This is discussed some more in this answer where we also find its full attribution:

slice hi w = (c, sortBy (compare `on` fst) b) where  -- hi is a top log2 value
  lb5=logBase 2 5 ; lb3=logBase 2 3                  -- w<1 (NB!) is (log2 width)
  (Sum c, b) = fold                                  -- total count, the band
      [ ( Sum (i+1),                                 -- total triples w/this j,k
          [ (r,(i,j,k)) | frac < w ] )               -- store it, if inside the band
        | k <- [ 0 .. floor ( hi   /lb5) ],  let p = fromIntegral k*lb5,
          j <- [ 0 .. floor ((hi-p)/lb3) ],  let q = fromIntegral j*lb3 + p,
          let (i,frac) = pr  (hi-q)      ;       r = hi - frac    -- r = i + q
      ]    -- (sum . map fst &&& concat . map snd)
  pr = properFraction 

This can be generalized for k base primes as well, probably running in O(n(k-1)/k) time.


(see this SO entry for an important later development. also, this answer is interesting. and another related answer.)

Leave a Comment