Given Prime Number N, Compute the Next Prime?

About a year ago I was working this area for libc++ while implementing the
unordered (hash) containers for C++11. I thought I would share
my experiences here. This experience supports marcog’s accepted answer for a
reasonable definition of “brute force”.

That means that even a simple brute force will be fast enough in most
circumstances, taking O(ln(p)*sqrt(p)) on average.

I developed several implementations of size_t next_prime(size_t n) where the
spec for this function is:

Returns: The smallest prime that is greater than or equal to n.

Each implementation of next_prime is accompanied by a helper function is_prime. is_prime should be considered a private implementation detail; not meant to be called directly by the client. Each of these implementations was of course tested for correctness, but also
tested with the following performance test:

int main()
{
    typedef std::chrono::high_resolution_clock Clock;
    typedef std::chrono::duration<double, std::milli> ms;
    Clock::time_point t0 = Clock::now();

    std::size_t n = 100000000;
    std::size_t e = 100000;
    for (std::size_t i = 0; i < e; ++i)
        n = next_prime(n+1);

    Clock::time_point t1 = Clock::now();
    std::cout << e/ms(t1-t0).count() << " primes/millisecond\n";
    return n;
}

I should stress that this is a performance test, and does not reflect typical
usage, which would look more like:

// Overflow checking not shown for clarity purposes
n = next_prime(2*n + 1);

All performance tests were compiled with:

clang++ -stdlib=libc++ -O3 main.cpp

Implementation 1

There are seven implementations. The purpose for displaying the first
implementation is to demonstrate that if you fail to stop testing the candidate
prime x for factors at sqrt(x) then you have failed to even achieve an
implementation that could be classified as brute force. This implementation is
brutally slow.

bool
is_prime(std::size_t x)
{
    if (x < 2)
        return false;
    for (std::size_t i = 2; i < x; ++i)
    {
        if (x % i == 0)
            return false;
    }
    return true;
}

std::size_t
next_prime(std::size_t x)
{
    for (; !is_prime(x); ++x)
        ;
    return x;
}

For this implementation only I had to set e to 100 instead of 100000, just to
get a reasonable running time:

0.0015282 primes/millisecond

Implementation 2

This implementation is the slowest of the brute force implementations and the
only difference from implementation 1 is that it stops testing for primeness
when the factor surpasses sqrt(x).

bool
is_prime(std::size_t x)
{
    if (x < 2)
        return false;
    for (std::size_t i = 2; true; ++i)
    {
        std::size_t q = x / i;
        if (q < i)
            return true;
        if (x % i == 0)
            return false;
    }
    return true;
}

std::size_t
next_prime(std::size_t x)
{
    for (; !is_prime(x); ++x)
        ;
    return x;
}

Note that sqrt(x) isn’t directly computed, but inferred by q < i. This
speeds things up by a factor of thousands:

5.98576 primes/millisecond

and validates marcog’s prediction:

… this is well within the constraints of
most problems taking on the order of
a millisecond on most modern hardware.

Implementation 3

One can nearly double the speed (at least on the hardware I’m using) by
avoiding use of the % operator:

bool
is_prime(std::size_t x)
{
    if (x < 2)
        return false;
    for (std::size_t i = 2; true; ++i)
    {
        std::size_t q = x / i;
        if (q < i)
            return true;
        if (x == q * i)
            return false;
    }
    return true;
}

std::size_t
next_prime(std::size_t x)
{
    for (; !is_prime(x); ++x)
        ;
    return x;
}

11.0512 primes/millisecond

Implementation 4

So far I haven’t even used the common knowledge that 2 is the only even prime.
This implementation incorporates that knowledge, nearly doubling the speed
again:

bool
is_prime(std::size_t x)
{
    for (std::size_t i = 3; true; i += 2)
    {
        std::size_t q = x / i;
        if (q < i)
            return true;
        if (x == q * i)
            return false;
    }
    return true;
}

std::size_t
next_prime(std::size_t x)
{
    if (x <= 2)
        return 2;
    if (!(x & 1))
        ++x;
    for (; !is_prime(x); x += 2)
        ;
    return x;
}

21.9846 primes/millisecond

Implementation 4 is probably what most people have in mind when they think
“brute force”.

Implementation 5

Using the following formula you can easily choose all numbers which are
divisible by neither 2 nor 3:

6 * k + {1, 5}

where k >= 1. The following implementation uses this formula, but implemented
with a cute xor trick:

bool
is_prime(std::size_t x)
{
    std::size_t o = 4;
    for (std::size_t i = 5; true; i += o)
    {
        std::size_t q = x / i;
        if (q < i)
            return true;
        if (x == q * i)
            return false;
        o ^= 6;
    }
    return true;
}

std::size_t
next_prime(std::size_t x)
{
    switch (x)
    {
    case 0:
    case 1:
    case 2:
        return 2;
    case 3:
        return 3;
    case 4:
    case 5:
        return 5;
    }
    std::size_t k = x / 6;
    std::size_t i = x - 6 * k;
    std::size_t o = i < 2 ? 1 : 5;
    x = 6 * k + o;
    for (i = (3 + o) / 2; !is_prime(x); x += i)
        i ^= 6;
    return x;
}

This effectively means that the algorithm has to check only 1/3 of the
integers for primeness instead of 1/2 of the numbers and the performance test
shows the expected speed up of nearly 50%:

32.6167 primes/millisecond

Implementation 6

This implementation is a logical extension of implementation 5: It uses the
following formula to compute all numbers which are not divisible by 2, 3 and 5:

30 * k + {1, 7, 11, 13, 17, 19, 23, 29}

It also unrolls the inner loop within is_prime, and creates a list of “small
primes” that is useful for dealing with numbers less than 30.

static const std::size_t small_primes[] =
{
    2,
    3,
    5,
    7,
    11,
    13,
    17,
    19,
    23,
    29
};

static const std::size_t indices[] =
{
    1,
    7,
    11,
    13,
    17,
    19,
    23,
    29
};

bool
is_prime(std::size_t x)
{
    const size_t N = sizeof(small_primes) / sizeof(small_primes[0]);
    for (std::size_t i = 3; i < N; ++i)
    {
        const std::size_t p = small_primes[i];
        const std::size_t q = x / p;
        if (q < p)
            return true;
        if (x == q * p)
            return false;
    }
    for (std::size_t i = 31; true;)
    {
        std::size_t q = x / i;
        if (q < i)
            return true;
        if (x == q * i)
            return false;
        i += 6;

        q = x / i;
        if (q < i)
            return true;
        if (x == q * i)
            return false;
        i += 4;

        q = x / i;
        if (q < i)
            return true;
        if (x == q * i)
            return false;
        i += 2;

        q = x / i;
        if (q < i)
            return true;
        if (x == q * i)
            return false;
        i += 4;

        q = x / i;
        if (q < i)
            return true;
        if (x == q * i)
            return false;
        i += 2;

        q = x / i;
        if (q < i)
            return true;
        if (x == q * i)
            return false;
        i += 4;

        q = x / i;
        if (q < i)
            return true;
        if (x == q * i)
            return false;
        i += 6;

        q = x / i;
        if (q < i)
            return true;
        if (x == q * i)
            return false;
        i += 2;
    }
    return true;
}

std::size_t
next_prime(std::size_t n)
{
    const size_t L = 30;
    const size_t N = sizeof(small_primes) / sizeof(small_primes[0]);
    // If n is small enough, search in small_primes
    if (n <= small_primes[N-1])
        return *std::lower_bound(small_primes, small_primes + N, n);
    // Else n > largest small_primes
    // Start searching list of potential primes: L * k0 + indices[in]
    const size_t M = sizeof(indices) / sizeof(indices[0]);
    // Select first potential prime >= n
    //   Known a-priori n >= L
    size_t k0 = n / L;
    size_t in = std::lower_bound(indices, indices + M, n - k0 * L) - indices;
    n = L * k0 + indices[in];
    while (!is_prime(n))
    {
        if (++in == M)
        {
            ++k0;
            in = 0;
        }
        n = L * k0 + indices[in];
    }
    return n;
}

This is arguably getting beyond “brute force” and is good for boosting the
speed another 27.5% to:

41.6026 primes/millisecond

Implementation 7

It is practical to play the above game for one more iteration, developing a
formula for numbers not divisible by 2, 3, 5 and 7:

210 * k + {1, 11, ...},

The source code isn’t shown here, but is very similar to implementation 6.
This is the implementation I chose to actually use for the unordered containers
of libc++, and that source code is open source (found at the link).

This final iteration is good for another 14.6% speed boost to:

47.685 primes/millisecond

Use of this algorithm assures that clients of libc++‘s hash tables can choose
any prime they decide is most beneficial to their situation, and the performance
for this application is quite acceptable.

Leave a Comment