Algorithm for sampling without replacement?

Here’s some code for sampling without replacement based on Algorithm 3.4.2S of Knuth’s book Seminumeric Algorithms.

void SampleWithoutReplacement
(
    int populationSize,    // size of set sampling from
    int sampleSize,        // size of each sample
    vector<int> & samples  // output, zero-offset indicies to selected items
)
{
    // Use Knuth's variable names
    int& n = sampleSize;
    int& N = populationSize;

    int t = 0; // total input records dealt with
    int m = 0; // number of items selected so far
    double u;

    while (m < n)
    {
        u = GetUniform(); // call a uniform(0,1) random number generator

        if ( (N - t)*u >= n - m )
        {
            t++;
        }
        else
        {
            samples[m] = t;
            t++; m++;
        }
    }
}

There is a more efficient but more complex method by Jeffrey Scott Vitter in “An Efficient Algorithm for Sequential Random Sampling,” ACM Transactions on Mathematical Software, 13(1), March 1987, 58-67.

Leave a Comment