QuickSort and Hoare Partition

To answer the question of “Why does Hoare partitioning work?”:

Let’s simplify the values in the array to just three kinds: L values (those less than the pivot value), E values (those equal to the pivot value), and G value (those larger than the pivot value).

We’ll also give a special name to one location in the array; we’ll call this location s, and it’s the location where the j pointer is when the procedure finishes. Do we know ahead of time which location s is? No, but we know that some location will meet that description.

With these terms, we can express the goal of the partitioning procedure in slightly different terms: it is to split a single array into two smaller sub-arrays which are not mis-sorted with respect to each other. That “not mis-sorted” requirement is satisfied if the following conditions are true:

  1. The “low” sub-array, that goes from the left end of the array up to and includes s, contains no G values.
  2. The “high” sub-array, that starts immediately after s and continues to the right end, contains no L values.

That’s really all we need to do. We don’t even need to worry where the E values wind up on any given pass. As long as each pass gets the sub-arrays right with respect to each other, later passes will take care of any disorder that exists inside any sub-array.

So now let’s address the question from the other side: how does the partitioning procedure ensure that there are no G values in s or to the left of it, and no L values to the right of s?

Well, “the set of values to the right of s” is the same as “the set of cells the j pointer moves over before it reaches s“. And “the set of values to the left of and including s” is the same as “the set of values that the i pointer moves over before j reaches s“.

That means that any values which are misplaced will, on some iteration of the loop, be under one of our two pointers. (For convenience, let’s say it’s the j pointer pointing at a L value, though it works exactly the same for the i pointer pointing at a G value.) Where will the i pointer be, when the j pointer is on a misplaced value? We know it will be:

  1. at a location in the “low” subarray, where the L value can go with no problems;
  2. pointing at a value that’s either an E or a G value, which can easily replace the L value under the j pointer. (If it wasn’t on an E or a G value, it wouldn’t have stopped there.)

Note that sometimes the i and j pointer will actually both stop on E values. When this happens, the values will be switched, even though there’s no need for it. This doesn’t do any harm, though; we said before that the placement of the E values can’t cause mis-sorting between the sub-arrays.

So, to sum up, Hoare partitioning works because:

  1. It separates an array into smaller sub-arrays which are not mis-sorted relative to each other;
  2. If you keep doing that and recursively sorting the sub-arrays, eventually there will be nothing left of the array that’s unsorted.

Leave a Comment