# Heaps

A heap is a binary tree in which each element has a key (or sometimes priority) that is less than the keys of its children. Heaps are used to implement the priority queue abstract data type, which we’ll talk about first.

## Priority queues

In a standard queue, elements leave the queue in the same order as they arrive. In a priority queue, elements leave the queue in order of decreasing priority: the DEQUEUE operation becomes a DELETE-MIN operation (or DELETE-MAX, if higher numbers mean higher priority), which removes and returns the highest-priority element of the priority queue, regardless of when it was inserted. Priority queues are often used in operating system schedulers to determine which job to run next: a high-priority job (e.g., turn on the fire suppression system) runs before a low-priority job (floss the cat) even if the low-priority job has been waiting longer.

## Expensive implementations of priority queues

Implementing a priority queue using an array or linked list is likely to be expensive. If the array or list is unsorted, it takes O(n) time to find the minimum element; if it is sorted, it takes O(n) time (in the worst case) to add a new element. So such implementations are only useful when the numbers of INSERT and DELETE-MIN operations are very different. For example, if DELETE-MIN is called only rarely but INSERT is called often, it may actually be cheapest to implement a priority queue as an unsorted linked list with O(1) INSERTs and O(n) DELETE-MINs. But if we expect that every element that is inserted is eventually removed, we want something for which both INSERT and DELETE-MIN are cheap operations.

## Structure of a heap

A heap is a binary tree in which each node has a smaller key than its children; this property is called the heap property or heap invariant.

To insert a node in the heap, we add it as a new leaf, which may violate the heap property if the new node has a lower key than its parent. But we can restore the heap property (at least between this node and its parent) by swapping either the new node or its sibling with the parent, where in either case we move up the node with the smaller key. This may still leave a violation of the heap property one level up in the tree, but by continuing to swap small nodes with their parents we eventually reach the top and have a heap again. The time to complete this operation is proportional to the depth of the heap, which will typically be O(log n) (we will see how to enforce this in a moment).

To implement DELETE-MIN, we can easily find the value to return at the top of the heap. Unfortunately, removing it leaves a vacuum that must be filled in by some other element. The easiest way to do this is to grab a leaf (which probably has a very high key), and then float it down to where it belongs by swapping it with its smaller child at each iteration. After time proportional to the depth (again O(log n) if we are doing things right), the heap invariant is restored.

Similar local swapping can be used to restore the heap invariant if the priority of some element in the middle changes; we will not discuss this in detail.

## Packed heaps

It is possible to build a heap using structs and pointers, where each element points to its parent and children. In practice, heaps are instead stored in arrays, with an implicit pointer structure determined by array indices. For zero-based arrays as in C, the rule is that a node at position i has children at positions 2*i+1 and 2*i+2; in the other direction, a node at position i has a parent at position (i-1)/2 (which rounds down in C). This is equivalent to storing a heap in an array by reading through the tree in breadth-first search order:

   0
/ \
1   2
/ \ / \
3 4 5 6


becomes

0 1 2 3 4 5 6


This approach works best if there are no gaps in the array. So to maximize efficiency we make this “no gaps” policy part of the invariant. We can do so because we don’t care which leaf gets added when we do an INSERT, so we can make it be whichever leaf is at the end of the array. Similarly, in a DELETE-MIN operation, we can promote the element at the end of the array to the root before floating it down. Both these operations change the number of elements in the array, and INSERTs in particular might force us to reallocate eventually. So in the worst case INSERT can be an expensive operation, although as with growing hash tables, the amortized cost may still be small.

## Bottom-up heap’ification

If we are presented with an unsorted array, we can turn it into a heap more quickly than the O(nlog n) time required to do n INSERTs. The trick is to build the heap from the bottom up. This means starting with position n − 1 and working back to position 0, so that when it comes time to fix the heap invariant at position i, we have already fixed it at all later positions (this is a form of dynamic programming). Unfortunately, it is not quite enough simply to swap a[i] with its smaller child when we get there, because we could find that a[0] (say) was the largest element in the heap. But the cost of floating a[i] down to its proper place will be proportional to its own height rather than the height of the entire heap. Since most of the elements of the heap are close to the bottom, the total cost will turn out to be O(n).

## Heapsort

Bottom-up heapification is used in the Heapsort algorithm, which first does bottom-up heapification in O(n) time and then repeatedly calls DELETE-MAX to extract the largest remaining element. This is no faster than the O(nlog n) cost of mergesort or quicksort in typical use, but requires very little auxiliary storage since we can maintain the heap in the bottom part of the same array whose top part stores the max elements extracted so far.

Here is a simple implementation of heapsort, that demonstrates how both bottom-up heapification and the DELETE-MAX procedure work by floating elements down to their proper places:

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/* max heap implementation */

/* compute child 0 or 1 */
#define Child(x, dir) (2*(x)+1+(dir))

/* float value at position pos down */
static void
floatDown(int n, int *a, int pos)
{
int x;

/* save original value once */
x = a[pos];

for(;;) {
if(Child(pos, 1) < n && a[Child(pos, 1)] > a[Child(pos, 0)]) {
/* maybe swap with Child(pos, 1) */
if(a[Child(pos, 1)] > x) {
a[pos] = a[Child(pos, 1)];
pos = Child(pos, 1);
} else {
/* x is bigger than both kids */
break;
}
} else if(Child(pos, 0) < n && a[Child(pos, 0)] > x) {
/* swap with Child(pos, 0) */
a[pos] = a[Child(pos, 0)];
pos = Child(pos, 0);
} else {
/* done */
break;
}
}

a[pos] = x;
}

/* construct a heap bottom-up */
static void
heapify(int n, int *a)
{
int i;

for(i = n-1; i >= 0; i--) {
floatDown(n, a, i);
}
}

/* sort an array */
void
heapSort(int n, int *a)
{
int i;
int tmp;

heapify(n, a);

for(i = n-1; i > 0; i--) {
/* swap max to a[i] */
tmp = a[0];
a[0] = a[i];
a[i] = tmp;

/* float new a[0] down */
floatDown(i, a, 0);
}
}

#define N (100)
#define MULTIPLIER (17)

int
main(int argc, char **argv)
{
int a[N];
int i;

if(argc != 1) {
fprintf(stderr, "Usage: %s\n", argv[0]);
return 1;
}

for(i = 0; i < N; i++) { a[i] = (i*MULTIPLIER) % N; }

for(i = 0; i < N; i++) { printf("%d ", a[i]); }
putchar('\n');

heapSort(N, a);

for(i = 0; i < N; i++) { printf("%d ", a[i]); }
putchar('\n');

return 0;
}


examples/sorting/heapsort.c