Prove that the maximum number of comparisons in binary heap build is (2N-2)

The build-heap algorithm starts at the midpoint and moves items down as required. Let’s consider a heap of 127 items (7 levels). In the worst case:

64 nodes (the leaf level) don't move at all
32 nodes move down one level
16 nodes move down two levels
 8 nodes move down three levels
 4 nodes move down four levels
 2 nodes move down five levels
 1 node moves down six levels

So in the worst case you have

64*0 + 32*1 + 16*2 + 8*3 + 4*4 + 2*5 + 1*6
0 + 32 + 32 + 24 + 16 + 10 + 6 = 120 swaps

So in the worst case, build-heap makes fewer than N swaps.

Because build-heap requires that you swap an item with the smaller of its children, it requires two comparisons to initiate a swap: one to find the smaller of the two children, and one to determine if the node is larger and must be swapped.

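The number of comparisons required to move a node is at most 2*(levels_moved+1): two for each level it moves down, plus up to two more at the position where it stops. A node that reaches the leaf level has no children left to compare against, so it needs exactly 2*levels_moved. No more than N/2 nodes will be moved.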
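To make the counting concrete, here is a minimal sketch of a min-heap build-heap that counts swaps and comparisons. The function name build_min_heap and the descending-order input are my own choices for illustration; descending order is one input that forces every node down to the leaf level.

```python
def build_min_heap(items):
    """Build a min-heap in place; return (swaps, comparisons)."""
    n = len(items)
    swaps = comparisons = 0
    # Start at the last parent (the midpoint) and work back to the root.
    for start in range(n // 2 - 1, -1, -1):
        node = start
        while True:
            child = 2 * node + 1
            if child >= n:
                break  # node is a leaf, nothing left to compare against
            right = child + 1
            if right < n:
                comparisons += 1  # comparison 1: find the smaller child
                if items[right] < items[child]:
                    child = right
            comparisons += 1      # comparison 2: does the node need to move?
            if items[child] < items[node]:
                items[node], items[child] = items[child], items[node]
                swaps += 1
                node = child
            else:
                break
    return swaps, comparisons


# Descending input: every node is larger than everything below it.
worst = list(range(127, 0, -1))
swaps, comparisons = build_min_heap(worst)
print(swaps, comparisons)  # 120 240
```

For N = 127 this gives 120 swaps and 240 comparisons, which is under 2N-2 = 252.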

The general case

We need to prove that the maximum number of comparisons is no more than 2N-2. As I noted above, it takes two comparisons to move a node one level. So if the number of levels moved is less than N (i.e. (N-1) or fewer), then the maximum number of comparisons cannot exceed 2N-2.

I use a full heap in the discussion below because it represents the worst case.

In a full heap of N items, there are (N+1)/2 nodes at the leaf level, (N+1)/4 at the next level up, (N+1)/8 at the next, and so on. In the worst case every node moves all the way down to the leaf level, so you end up with this:

(N+1)/2 nodes move 0 levels
(N+1)/4 nodes move 1 level
(N+1)/8 nodes move 2 levels
(N+1)/16 nodes move 3 levels
(N+1)/32 nodes move 4 levels
...

That gives us the series:

((N+1)/2)*0 + ((N+1)/4)*1 + ((N+1)/8)*2 + ((N+1)/16)*3 ...

Let’s see what that does for heaps of different sizes:

heap size  levels   levels moved
   1         1          0
   3         2          1
   7         3          2*1 + 1*2 = 4
   15        4          4*1 + 2*2 + 1*3 = 11
   31        5          8*1 + 4*2 + 2*3 + 1*4 = 26
   63        6          16*1 + 8*2 + 4*3 + 2*4 + 1*5 = 57
   127       7          32*1 + 16*2 + 8*3 + 4*4 + 2*5 + 1*6 = 120
         ....

I ran that for heaps of up to 20 levels (size a million and change), and it holds true: the maximum number of levels moved for a full heap of N items is N-log2(N+1).
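A check along these lines can be reproduced with a few lines of Python. This is only a sketch of the idea, not the original script; the helper name worst_case_levels_moved is made up for the example.

```python
def worst_case_levels_moved(levels):
    """Total levels moved when every node of a full heap sinks to the leaf level."""
    total = 0
    for depth in range(levels):            # depth 0 is the root
        nodes = 2 ** depth                 # nodes at this depth in a full heap
        total += nodes * (levels - 1 - depth)
    return total


for levels in range(1, 21):
    n = 2 ** levels - 1                    # size of a full heap with this many levels
    # levels == log2(n + 1), so this checks N - log2(N + 1)
    assert worst_case_levels_moved(levels) == n - levels
```

The loop finishes without an assertion error for all 20 heap sizes.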

Treating the above series as an arithmetico-geometric sequence, we can compute the sum of its log2(N + 1) - 1 nonzero terms (the first term is zero, so we ignore it). The sum is N - log2(N + 1), which matches the table above and is at most N - 1 for every N >= 1. (Recall that a full binary tree has log2(N + 1) levels.)
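For anyone who wants to see the sum worked out, here is one way to do it. The symbols h (the number of levels) and S (the sum) are introduced here for convenience and are not part of the original argument.

```latex
\begin{align*}
h &= \log_2(N+1), \qquad
S = \sum_{k=1}^{h-1} k \cdot \frac{N+1}{2^{k+1}} = \sum_{k=1}^{h-1} k \, 2^{\,h-1-k} \\
S = 2S - S &= \sum_{k=1}^{h-1} k \, 2^{\,h-k} - \sum_{k=2}^{h} (k-1) \, 2^{\,h-k} \\
&= \sum_{k=1}^{h-1} 2^{\,h-k} - (h-1) \\
&= (2^{h} - 2) - (h - 1) \\
&= 2^{h} - 1 - h = N - \log_2(N+1) \le N - 1
\end{align*}
```

For N = 127 this gives 127 - 7 = 120, the same figure as in the worked example.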

This sum represents the total number of levels moved, i.e. the number of times a node is swapped with one of its children during sift-down. Each such step requires two comparisons, so the total number of comparisons is at most 2(N - 1) = 2N - 2. This is also the upper bound, since a full binary tree always represents the worst case for a given tree depth.
