How does rand() work? Does it have certain tendencies? Is there something better to use?

Briefly:

  • You use time.h to get a seed, which is an initial random number. C then does a bunch of operations on this number to get the next random number, then operations on that one to get the next, then… you get the picture.

  • rand() is able to touch on every possible integer. It will not prefer even or odd numbers regardless of the input seed, happily. Still, it has limits – it repeats itself relatively quickly, and in almost every implementation only gives numbers up to 32767.

  • C does not have another built-in random number generator. If you need a real tough one, there are many packages available online, but the Mersenne Twister algorithm is probably the most popular pick.

Now, if you are interested on the reasons why the above is true, here are the gory details on how rand() works:

rand() is what’s called a “linear congruential generator.” This means that it employs an equation of the form:

xn+1 = (*a****xn + ***b*) mod m

where xn is the nth random number, and a and b are some predetermined integers. The arithmetic is performed modulo m, with m usually 232 depending on the machine, so that only the lowest 32 bits are kept in the calculation of xn+1.

In English, then, the idea is this: To get the next random number, multiply the last random number by something, add a number to it, and then take the last few digits.

A few limitations are quickly apparent:

  • First, you need a starting random number. This is the “seed” of your random number generator, and this is where you’ve heard of time.h being used. Since we want a really random number, it is common practice to ask the system what time it is (in integer form) and use this as the first “random number.” Also, this explains why using the same seed twice will always give exactly the same sequence of random numbers. This sounds bad, but is actually useful, since debugging is a lot easier when you control the inputs to your program

  • Second, a and b have to be chosen very, very carefully or you’ll get some disastrous results. Fortunately, the equation for a linear congruential generator is simple enough that the math has been worked out in some detail. It turns out that choosing an a which satisfies *a***mod8 = 5 together with ***b* = 1 will insure that all m integers are equally likely, independent of choice of seed. You also want a value of a that is really big, so that every time you multiply it by xn you trigger a the modulo and chop off a lot of digits, or else many numbers in a row will just be multiples of each other. As a result, two common values of a (for example) are 1566083941 and 1812433253 according to Knuth. The GNU C library happens to use a=1103515245 and b=12345. A list of values for lots of implementations is available at the wikipedia page for LCGs.

  • Third, the linear congruential generator will actually repeat itself because of that modulo. This gets to be some pretty heady math, but the result of it all is happily very simple: The sequence will repeat itself after m numbers of have been generated. In most cases, this means that your random number generator will repeat every 232 cycles. That sounds like a lot, but it really isn’t for many applications. If you are doing serious numerical work with Monte Carlo simulations, this number is hopelessly inadequate.

  • A fourth much less obvious problem is that the numbers are actually not really random. They have a funny sort of correlation. If you take three consecutive integers, (x, y, z), from an LCG with some value of a and m, those three points will always fall on the lattice of points generated by all linear combinations of the three points (1, a, a2), (0, m, 0), (0, 0, m). This is known as Marsaglia’s Theorem, and if you don’t understand it, that’s okay. All it means is this: Triplets of random numbers from an LCG will show correlations at some deep, deep level. Usually it’s too deep for you or I to notice, but its there. It’s possible to even reconstruct the first number in a “random” sequence of three numbers if you are given the second and third! This is not good for cryptography at all.

The good part is that LCGs like rand() are very, very low footprint. It typically requires only 32 bits to retain state, which is really nice. It’s also very fast, requiring very few operations. These make it good for noncritical embedded systems, video games, casual applications, stuff like that.

PRNGs are a fascinating topic. Wikipedia is always a good place to go if you are hungry to learn more on the history or the various implementations that are around today.

Leave a Comment