How to generate random numbers in parallel?

I’ll post here what I posted to Concurrent random number generation :

I think you’re looking for rand_r(), which explicitly takes the current RNG state as a parameter. Then each thread should have its own copy of seed data (whether you want each thread to start off with the same seed or different ones depends on what you’re doing, here you want them to be different or you’d get the same row again and again). There’s some discussion of rand_r() and thread-safety here: whether rand_r is real thread safe? .

So say you wanted each thread to have its seed start off with its thread number (which is probably not what you want, as it would give the same results every time you ran with the same number of threads, but just as an example):

#pragma omp parallel default(none)
{
    int i;
    unsigned int myseed = omp_get_thread_num();
    #pragma omp for
    for(i=0; i<100; i++)
            printf("%d %d %d\n",i,omp_get_thread_num(),rand_r(&myseed));
}

Edit: Just on a lark, checked to see if the above would get any speedup. Full code was

#define NRANDS 1000000
int main(int argc, char **argv) {

    struct timeval t;
    int a[NRANDS];

    tick(&t);
    #pragma omp parallel default(none) shared(a)
    {
        int i;
        unsigned int myseed = omp_get_thread_num();
        #pragma omp for
        for(i=0; i<NRANDS; i++)
                a[i] = rand_r(&myseed);
    }
    double sum = 0.;
    double time=tock(&t);
    for (long int i=0; i<NRANDS; i++) {
        sum += a[i];
    }
    printf("Time = %lf, sum = %lf\n", time, sum);

    return 0;
}

where tick and tock are just wrappers to gettimeofday(), and tock() returns the difference in seconds. Sum is printed just to make sure that nothing gets optimized away, and to demonstrate a small point; you will get different numbers with different numbers of threads because each thread gets its own threadnum as a seed; if you run the same code again and again with the same number of threads you’ll get the same sum, for the same reason. Anyway, timing (running on a 8-core nehalem box with no other users):

$ export OMP_NUM_THREADS=1
$ ./rand
Time = 0.008639, sum = 1074808568711883.000000

$ export OMP_NUM_THREADS=2
$ ./rand
Time = 0.006274, sum = 1074093295878604.000000

$ export OMP_NUM_THREADS=4
$ ./rand
Time = 0.005335, sum = 1073422298606608.000000

$ export OMP_NUM_THREADS=8
$ ./rand
Time = 0.004163, sum = 1073971133482410.000000

So speedup, if not great; as @ruslik points out, this is not really a compute-intensive process, and other issues like memory bandwidth start playing a role. Thus, only a shade over 2x speedup on 8 cores.

Leave a Comment