Is it correct to use JavaScript Array.sort() method for shuffling?

After Jon has already covered the theory, here’s an implementation:

function shuffle(array) {
    var tmp, current, top = array.length;

    if(top) while(--top) {
        current = Math.floor(Math.random() * (top + 1));
        tmp = array[current];
        array[current] = array[top];
        array[top] = tmp;
    }

    return array;
}

The algorithm is O(n), whereas sorting should be O(n log n). Depending on the overhead of executing JS code compared to the native sort() function, this might lead to a noticable difference in performance which should increase with array sizes.

In the comments to bobobobo’s answer, I stated that the algorithm in question might not produce evenly distributed probabilities (depending on the implementation of sort()).

My argument goes along these lines: A sorting algorithm requires a certain number c of comparisons, eg c = n(n-1)/2 for Bubblesort. Our random comparison function makes the outcome of each comparison equally likely, ie there are 2^c equally probable results. Now, each result has to correspond to one of the n! permutations of the array’s entries, which makes an even distribution impossible in the general case. (This is a simplification, as the actual number of comparisons neeeded depends on the input array, but the assertion should still hold.)

As Jon pointed out, this alone is no reason to prefer Fisher-Yates over using sort(), as the random number generator will also map a finite number of pseudo-random values to the n! permutations. But the results of Fisher-Yates should still be better:

Math.random() produces a pseudo-random number in the range [0;1[. As JS uses double-precision floating point values, this corresponds to 2^x possible values where 52 ≤ x ≤ 63 (I’m too lazy to find the actual number). A probability distribution generated using Math.random() will stop behaving well if the number of atomic events is of the same order of magnitude.

When using Fisher-Yates, the relevant parameter is the size of the array, which should never approach 2^52 due to practical limitations.

When sorting with a random comparision function, the function basically only cares if the return value is positive or negative, so this will never be a problem. But there is a similar one: Because the comparison function is well-behaved, the 2^c possible results are, as stated, equally probable. If c ~ n log n then 2^c ~ n^(a·n) where a = const, which makes it at least possible that 2^c is of same magnitude as (or even less than) n! and thus leading to an uneven distribution, even if the sorting algorithm where to map onto the permutaions evenly. If this has any practical impact is beyond me.

The real problem is that the sorting algorithms are not guaranteed to map onto the permutations evenly. It’s easy to see that Mergesort does as it’s symmetric, but reasoning about something like Bubblesort or, more importantly, Quicksort or Heapsort, is not.

The bottom line: As long as sort() uses Mergesort, you should be reasonably safe except in corner cases (at least I’m hoping that 2^c ≤ n! is a corner case), if not, all bets are off.

More Related Contents:

Leave a Comment Cancel reply