Binary Search Trees

A binary search tree is a binary tree in which each node has a key, and a node’s key must be greater than all keys in the subtree of its left-hand child and less than all keys in the subtree of its right-hand child. This allows a node to be searched for using essentially the same binary search algorithm used on sorted arrays.

Searching for a node

/* returns pointer to node with given target key */
/* or 0 if no such node exists */
struct node *
treeSearch(struct node *root, int target)
{
    if(root == 0 || root->key == target) {
        return root;
    } else if(root->key > target) {
        return treeSearch(root->left, target);
    } else {
        return treeSearch(root->right, target);
    }
}

This procedure can be rewritten iteratively, which avoids stack overflow and is likely to be faster:

struct node *
treeSearch(struct node *root, int target)
{
    while(root != 0 && root->key != target) {
        if(root->key > target) {
            root = root->left;
        } else {
            root = root->right;
        }
    }

    return root;
}

These procedures can be modified in the obvious way to deal with keys that aren’t ints, as long as they can be compared (e.g., by using strcmp on strings).

Inserting a new node

As in a hash table, the insertion procedure mirrors the search procedure. We must be a little careful to avoid actually walking all the way down to a null pointer, since a null pointer now indicates a missing space for a leaf that we can fill with our new node. So the code is a little more complex.

// Insert a new key into a tree whose root is pointed to
// by *parent, which should be 0 if tree is empty.  May modify *parent.
void
treeInsert(struct tree **parent, int key)
{
    struct tree *newNode;

    for(;;) {
        if(*parent == 0) {
            // put it here
            *parent = malloc(sizeof(*newNode));
            assert(*parent);

            (*parent)->key = key;
            (*parent)->left = 0;
            (*parent)->right = 0;

            return;
        } else if(key == (*parent)->key) {
            // already present
            return;
        } else if(key < (*parent)->key) {
            // insert in left subtree
            parent = &(*parent)->left;
        } else {
            // insert in right subtree
            parent = &(*parent)->right;
        }
    }
}

Note that this function makes not attempt to keep the tree balanced. This may lead to very long paths if new keys are inserted in strictly increasing or strictly decreasing order.

Deleting a node

Deletion is more complicated. If a node has no children, we can just remove it, and the rest of the tree stays the same. A node with one child can be spliced out, connecting its parent directly to its child. But with two children, we can’t do this.

The trick is to find the leftmost node in our target’s right subtree (or vice versa). This node exists assuming the target has two children. As in a hash table, we can then swap our target node with this more convenient node. Because it is the leftmost node, it has no left child, so we can delete it using the no-children or one-child case.

Costs

Searching for or inserting a node in a binary search tree with n nodes takes time proportional to the depth of the node. In balanced trees, where the nodes in each subtree are divided roughly evenly between the two child subtrees, this will be O(log n), but for a badly unbalanced tree, this might be as much as O(n). So making a binary search tree work efficiently requires keeping it balanced.


Licenses and Attributions


Speak Your Mind