Augmented trees

An augmented data structure stores additional information in each of its nodes that caches values that might otherwise be expensive to compute. For trees, this might include information like the size of a subtree (which can be useful for ranking values, where we want to determine how many elements of the tree are smaller), the height of a subtree, or other summary information like the sum of all the keys in a subtree.

Augmented data structures, in a sense, violate the no-separate-but-equal rule that says we shouldn’t store the same information in different places. The reason we try to avoid this is that it’s trouble if the two copies diverge, and by not having two copies in the first place there is no possibility that they contradict each other. But in this case the reduced cost justifies breaking this rule.

The idea is that when we insert a new element into an augmented tree, it only changes the height/size/sum/etc. values for nodes on the path from the root to the new value. Since each of these aggregate values can be computed for a node in O(1) time from the values in its children, we can update all the aggregate values on our way back up the stack after doing the insertion at a cost of O(1) per node. This will give a total cost of O(log n) assuming our tree is reasonably balanced.


Storing the height field can be useful for balancing, as in AVL trees.

Storing the size allows ranking (computing the number of elements less than a given target value) and unraking (find an element with a particular rank). Sample code for doing this is given in the AVL tree sample implementation chapter.

Storing other aggregates like the sum of keys or values allows range queries, where we ask, for example, for some aggregate statistic (like the sum or average) of all the elements between some given minimum and maximum.

Assuming we keep the tree balanced and correctly maintain the aggregate data or each subtree, all of these operations can be done in O(log n) time.

Licenses and Attributions

Speak Your Mind