Why does this simple shuffle algorithm produce biased results?

See this:
The Danger of Naïveté (Coding Horror)

Let’s look at your three-card deck as an example. With 3 cards, there are only 6 possible orders for the deck after a shuffle: 123, 132, 213, 231, 312, 321.

With your 1st algorithm there are 27 possible paths through the code, depending on the results of the rand() function at different points: 3 possible values at each of 3 steps, so 3 × 3 × 3 = 27. Each of these paths is equally likely (unbiased), and each one maps to exactly one of the 6 possible “real” shuffle results above. We now have 27 items and 6 buckets to put them in. Since 27 is not evenly divisible by 6, the buckets cannot all receive the same number of paths, so some of those 6 orders must be over-represented.
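The counting argument can be checked by brute force. The sketch below assumes the 1st algorithm is the usual naive shuffle, which swaps each position i with a position j drawn from the whole deck; the function name naive_shuffle_outcomes is made up for this illustration. It walks every one of the 27 paths and tallies the final order each one produces.

```python
from collections import Counter
from itertools import product


def naive_shuffle_outcomes(n=3):
    """Tally final deck orders over every path of the (assumed) naive shuffle."""
    counts = Counter()
    # One path = one choice of swap partner j in 0..n-1 for each position i.
    for choices in product(range(n), repeat=n):
        deck = list(range(1, n + 1))
        for i, j in enumerate(choices):
            deck[i], deck[j] = deck[j], deck[i]
        counts["".join(map(str, deck))] += 1
    return counts


naive_counts = naive_shuffle_outcomes()
for order in sorted(naive_counts):
    print(order, naive_counts[order])
```

For the three-card deck this reports 4 paths each for 123, 312 and 321, and 5 paths each for 132, 213 and 231, so the second group turns up with probability 5/27 instead of the fair 1/6.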

With the 2nd algorithm there are 6 possible outcomes that map exactly to the 6 possible “real” shuffle results, and they should all be represented equally over time.
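The same kind of enumeration works here. This sketch assumes the 2nd algorithm is the Fisher-Yates style shuffle, where position i is swapped only with a position j drawn from i onward; fisher_yates_outcomes is a name invented for the example. With 3 cards that gives 3 × 2 × 1 = 6 equally likely paths.

```python
from collections import Counter
from itertools import product


def fisher_yates_outcomes(n=3):
    """Tally final deck orders over every path of the (assumed) Fisher-Yates shuffle."""
    counts = Counter()
    # At position i the swap partner j is drawn only from i..n-1,
    # so the number of paths is n * (n-1) * ... * 1 = n!.
    for choices in product(*(range(i, n) for i in range(n))):
        deck = list(range(1, n + 1))
        for i, j in enumerate(choices):
            deck[i], deck[j] = deck[j], deck[i]
        counts["".join(map(str, deck))] += 1
    return counts


fy_counts = fisher_yates_outcomes()
for order in sorted(fy_counts):
    print(order, fy_counts[order])
```

Each of the 6 orders is reached by exactly one path, which is why no order is favoured.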

This is important because the buckets that are over-represented in the first algorithm are not random. The buckets selected for the bias are repeatable and predictable. So if you’re building an online poker game and use the 1st algorithm, a hacker could figure out that you used the naive shuffle and from that work out that certain deck arrangements are more likely to occur than others. Then they can place bets accordingly. They’ll lose some, but they’ll win much more than they lose and quickly put you out of business.
