Books / Priority Queue / Chapter 4.3
Priority Queue Algorithms - Building the Heap
A heap with \(n\) keys is built by inserting the \(n\) keys into an initially empty heap in any order without maintaining the heap property. After all items are in the tree, it is “heapified”. A binary tree can be converted into a heap by starting at the level just above the lowest level and percolating down all keys that are too big. When this level is finished, the next level up is processed, and so on, until we reach the root. Since Corollary 4 tells us that the highest index non-leaf node has index \(\lfloor n / 2\rfloor\), we only need to percolate down the nodes at indices \(n / 2, n / 2-1, n / 2-2, \ldots, 1\). The algorithm makes use of the PercolateDown()
helper function described above and is given below:
template < class Comparable >
void heap < Comparable > ::heapify() {
for (int i = n / 2; i > 0; i - -)
percolateDown(i);
}
Example In this example, we just work directly on the array. You can draw the complete binary tree at each stage to visualize it becoming a heap. Suppose that the initial array has the following data:
Then percolateDown
will be called successively on the elements in positions \(4,3,2\), and 1 . After the first call, it will look this this:
Then 15 is compared to its two children, 26 and \(10.10\) and 15 are swapped:
Then 21 is compared to 17 and 18 , and it is swapped with 17 . Because it is still larger than 20 , it is swapped again:
Finally, 23 is percolated down and the result is:
Analysis of heapify() Now it is time to do a little math and establish why building a heap this way is actually very efficient. We will prove the following theorem.
Heapifying a binary tree with N nodes takes O(N) time.
Observe that \(n\) steps are needed to insert \(n\) items into the heap, without restoring the heap order property after each insertion, if an array is used, since each insertion takes \(O(1)\) steps. Then heapifying the entire array requires that each of the nodes from the root to the level above the bottom be percolated. In the worst case, each of these is moved to the lowest level. Thus, the total number of steps in heapifying is equal to the sum of the heights of these nodes in the heap. It is sufficient to show that the sum of the heights of the nodes in a heap is O(n). We start by establishing the following lemma.
The sum of the heights of the nodes in a full (perfect) binary tree of height \(h\) is \(\left(2^{h+1}-1\right)-(h+1)\).
Let’s proof this using Math. There are \(2^{h}\) nodes at height 0 . There are \(2^{h-1}\) nodes at height \(1,2^{h-2}\) nodes at height 2 , and so on, until there is but \(1=2^{0}=2^{h-h}\) node at height \(h\). In general, there are \(2^{k}\) nodes at height \((h-k)\). Since the nodes at height 0 add nothing to the sum of the heights, the sum \(S\) can be expressed as
\[S=\sum_{k=1}^{h} k 2^{h-k}\]While it is possible to solve for \(S\) in a straightforward approach, the following “trick” works well. Double \(S\) and you get
\[2 S=\sum_{k=1}^{h} 2 k 2^{h-k}=\sum_{k=1}^{h} k 2^{h-k+1}\]Separate the low order \((k=1)\) term in \(2 S\) from Eq. 2
\[2 S=2^{h}+\sum_{k=2}^{h} k 2^{h-k+1}\]and the high-order \((k=h)\) term in \(S\) from Eq. 1
\[S=h+\sum_{k=1}^{h-1} k 2^{h-k}\]Now subtract \(S\) from \(2 S\), i.e., Eq. 4 from Eq. 3
\[S=2 S-S=\left(2^{h}+\sum_{k=2}^{h} k 2^{h-k+1}\right)-\left(h+\sum_{k=1}^{h-1} k 2^{h-k}\right)\]Since
\[\sum_{k=2}^{h} k 2^{h-k+1}\]is the same as
\[\sum_{k=1}^{h-1}(k+1) 2^{h-(k+1)+1}=\sum_{k=1}^{h-1}(k+1) 2^{h-k}\]we can rewrite the previous equation as follows:
\[\begin{aligned} & S=\left(2^{h}+\sum_{k=2}^{h} k 2^{h-k+1}\right)-\left(h+\sum_{k=1}^{h-1} k 2^{h-k}\right) \\ & =\left(2^{h}+\sum_{k=1}^{h-1}(k+1) 2^{h-k}\right)-\left(h+\sum_{k=1}^{h-1} k 2^{h-k}\right) \\ & =2^{h}+\sum_{k=1}^{h-1}\left((k+1) 2^{h-k}-k 2^{h-k}\right)-h \\ & =2^{h}+\sum_{k=1}^{h-1} 2^{h-k}-h \\ & =2^{h}-h+\sum_{k=1}^{h-1} 2^{h-k} \\ & =2^{h}-h+\sum_{k=1}^{h-1} 2^{k} \\ & =2^{h}-h+\sum_{k=0}^{h-1} 2^{k}-1 \\ & =2^{h}-h+2^{h}-1-1 \\ & =\left(2^{h+1}-1\right)-(h+1) \end{aligned}\]Recall that the number of nodes in a full binary tree of height \(h\) is \(2^{h+1}-1\). Since the lemma tells us that the sum of the heights is \(2^{h+1}-1-(h+1)\), and since \(h+1=\log 2^{h+1}\), it follows that the sum of the heights of the nodes in a full binary tree with \(n\) nodes is \(n-\log (n+1)\), which is O(n). But a heap is not necessarily a full binary tree; it might be missing nodes in its lowest level. This means that the heights of some nodes in the tree are reduced by one and therefore the expression \(2^{h+1}-1-(h+1)\) is an upper bound on the sum of heights. So for a complete binary tree of height \(h\), the sum of the heights is at most \(2^{h+1}-1-(h+1)\).
In a complete binary tree of height \(h\) that is not full, the least number of nodes is \(2^{h}\). Letting \(n=2^{h}, 2 n=2^{h+1}>2^{h+1}-1\) and \(\log (2 n)=h+1\). Hence if the heap has \(n=2^{h}\) nodes, the sum of the heights, expressed in terms of \(n\) is at most \((2 n-1)-\log (2 n)\), which is also O(n). Therefore, the sum of the heights ranges between \(n-\log (n+1)\) and \((2 n-1)-\log (2 n)\), both of which are O(n), proving that Theorem 7 is true.