Asymptotic notation

Asymptotic notation is a tool for measuring the growth rate of functions, which for program design usually means the way in which the time or space costs of a program scale with the size of the input. We’ll start with an example of why this is important.

Two sorting algorithms

Suppose we want to sort in increasing order a deck of n cards, numbered 1 through n. Here are two algorithms for doing this.

In the mergesort algorithm, we start with n piles of one card each. We then take pairs of piles and merge them together, by repeatedly pulling the smaller of the two cards on top of the piles and putting it on the bottom of our output pile. After the first round of this, we have n/2 piles of two cards each. After another round, n/4 piles of four cards each, and so on until we get one pile with n cards after roughly log₂ n rounds of merging.

Here’s a picture of this algorithm in action on 8 cards:

5 7 1 2 3 4 8 6

57 12 34 68

1257 3468

12345678

Suppose that we want to estimate the cost of this algorithm without actually coding it up. We might observe that each time a card is merged into a new pile, we need to do some small, fixed number of operations to decide that it’s the smaller card, and then do an additional small, fixed number of operations to physically move it to a new place. If we are really clever, we might notice that since the size of the pile a card is in doubles with each round, there can be at most ⌈log₂ n⌉ rounds until all cards are in the same pile. So the cost of getting a single card in the right place will be at most c log n, where c counts the “small, fixed” number of operations that we keep mentioning, and the cost of getting every card in the right place will be at most cn log n.
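
Here’s roughly what the merge step might look like in C (just a sketch with names of our own choosing; the algorithm above is described in terms of card piles, and we use sorted arrays instead):

/* merge sorted arrays a[0..na-1] and b[0..nb-1] into out[0..na+nb-1] */
void
merge(const int *a, int na, const int *b, int nb, int *out)
{
    int i = 0;    /* next unused element of a */
    int j = 0;    /* next unused element of b */
    int k = 0;    /* next free slot in out */

    /* repeatedly take the smaller of the two front elements */
    while(i < na && j < nb) {
        if(a[i] <= b[j]) {
            out[k++] = a[i++];
        } else {
            out[k++] = b[j++];
        }
    }

    /* one array is used up; copy the rest of the other */
    while(i < na) {
        out[k++] = a[i++];
    }
    while(j < nb) {
        out[k++] = b[j++];
    }
}

Each call moves every element into out exactly once, doing a constant amount of work per element, so merging two piles with m cards total costs O(m).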

In the selection sort algorithm, we look through all the cards to find the smallest one, swap it to the beginning of the list, then look through the remaining cards for the second smallest, swap it to the next position, and so on.

Here’s a picture of this algorithm in action on 8 cards:

5 7 1 2 3 4 8 6

1 7 5 2 3 4 8 6

1 2 5 7 3 4 8 6

1 2 3 7 5 4 8 6

1 2 3 4 5 7 8 6

1 2 3 4 5 7 8 6

1 2 3 4 5 6 8 7

1 2 3 4 5 6 7 8

This is a simpler algorithm to implement than mergesort, but it is usually slower on large inputs. We can formalize this by arguing that each time we scan k cards to find the smallest, it’s going to take some small, fixed number of operations to test each card against the best one we found so far, and an additional small, fixed number of operations to swap the smallest card to the right place. To compute the total cost we have to add these costs for all cards, which will give us a total cost that looks something like (c₁n + c₂) + (c₁(n − 1) + c₂) + (c₁(n − 2) + c₂) + … + (c₁ ⋅ 1 + c₂) = c₁n(n + 1)/2 + c₂n.
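
In C, selection sort might look like this (again a sketch with our own names; the inner loop is the scan that costs c₁ per card, and the swap at the end is the c₂ part):

/* sort a[0..n-1] in increasing order */
void
selectionSort(int *a, int n)
{
    int i;
    int j;
    int min;
    int tmp;

    for(i = 0; i < n; i++) {
        /* find the position of the smallest element in a[i..n-1] */
        min = i;
        for(j = i + 1; j < n; j++) {
            if(a[j] < a[min]) {
                min = j;
            }
        }

        /* swap it into position i */
        tmp = a[i];
        a[i] = a[min];
        a[min] = tmp;
    }
}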

For large n, it looks like this is going to cost more than mergesort. But how can we make this claim cleanly, particularly if we don’t know the exact values of c, c₁, and c₂?

Big-O to the rescue

The idea is to replace complex running time formulae like cn log n or c₁n(n + 1)/2 + c₂n with an asymptotic growth rate O(n log n) or O(n²). These asymptotic growth rates omit the specific details of exactly how fast our algorithms run (which we don’t necessarily know without actually coding them up) and concentrate solely on how the cost scales as the size of the input n becomes large.

This avoids two issues:

  1. Different computers run at different speeds, and we’d like to be able to say that one algorithm is better than another without having to measure its running time on specific hardware.
  2. Performance on large inputs is more important than performance on small inputs, since programs running on small inputs are usually pretty fast.

The idea of asymptotic notation is to consider the shape of the worst-case cost T(n) to process an input of size n. Here, worst-case means we consider the input that gives the greatest cost, where cost is usually time, but may be something else like space. To formalize the notion of shape, we define classes of functions that behave like particular interesting functions for large inputs. The definition looks much like a limit in calculus:

O(g(n))

A function f(n) is in the class O(g(n)) if there exist constants N and c such that f(n) < c ⋅ g(n) when n > N.

If f(n) is in O(g(n)) we say f(n) is big-O of g(n) or just f(n) = O(g(n)).

Unpacked, this definition says that f(n) is less than a constant times g(n) when n is large enough.

Some examples:

  • Let f(n) = 3n + 12, and let g(n) = n. To show that f(n) is in O(g(n)) = O(n), we can pick whatever constants we want for c and N (as long as they work). So let’s make N be 100 and c be 4. Then we need to show that if n > 100, 3n + 12 < 4n. But 3n + 12 < 4n holds precisely when 12 < n, which is implied by our assumption that n > 100.
  • Let f(n) = 4n² + 23n + 15, and let g(n) = n². Now let N be 100 again and c be 5. So we need 4n² + 23n + 15 < 5n², or 23n + 15 < n². But n > 100 means that n² > 100n = 50n + 50n > 50n + 5000 > 23n + 15, which proves that f(n) is in O(n²).
  • Let f(n) < 146 for all n, and let g(n) = 1. Then for N = 0 and c = 146, f(n) < 146 = 146g(n), and f(n) is in O(1).

Writing proofs like this over and over again is a nuisance, so we can use some basic rules of thumb to reduce messy functions f(n) to their asymptotic forms:

  • If c is a constant (doesn’t depend on n), then c ⋅ f(n) = O(f(n)). This follows immediately from being able to pick c in the definition. So we can always get rid of constant factors: 137n⁵ = O(n⁵).
  • If f(n) = g(n) + h(n), then the bigger of g(n) or h(n) wins. This is because if h(n) ≤ g(n), then g(n) + h(n) ≤ 2g(n), and then big-O eats the 2. So 12n² + 52n + 3 = O(n²) because n² dominates all the other terms.
  • To figure out which of two terms dominates, the rule is
    • Bigger exponents win: If a < b, then O(nᵃ) + O(nᵇ) = O(nᵇ).
    • Polynomials beat logarithms: For any a and any b > 0, O(logᵃ n) + O(nᵇ) = O(nᵇ).
    • Exponentials beat polynomials: For any a and any b > 1, O(nᵃ) + O(bⁿ) = O(bⁿ).
    • The distributive law works: Because O(log n) dominates O(1), O(n log n) dominates O(n).

This means that almost any asymptotic bound can be reduced down to one of a very small list of common bounds. Ones that you will typically see in practical algorithms, listed in increasing order, are O(1), O(log n), O(n), O(n log n), or O(n²).

Applying these rules to mergesort and selection sort gives us asymptotic bounds of cn log n = O(n log n) (the constant vanishes) and c₁n(n + 1)/2 + c₂n = c₁n²/2 + c₁n/2 + c₂n = O(n²) + O(n) + O(n) = O(n²) (the constants vanish and then O(n²) dominates). Here we see that no matter how fast our machine is at different low-level operations, for large enough inputs mergesort will beat selection sort. For example, at n = 10⁶, n log₂ n is about 2 × 10⁷ while n²/2 is 5 × 10¹¹, so even a ten-thousand-fold advantage in constant factors would not save selection sort.

Asymptotic cost of programs

To compute the asymptotic cost of a program, the rule of thumb is that any simple statement costs O(1) time to evaluate, and larger costs are the result of loops or calls to expensive functions, where a loop multiplies the cost by the number of iterations in the loop. When adding costs together, the biggest cost wins:

So this function takes O(1) time:

/* return the sum of the integers i with 0 <= i and i < n */
int
sumTo(int n)
{
    return n*(n-1)/2;
}

But this function, which computes exactly the same value, takes O(n) time:

/* return the sum of the integers i with 0 <= i and i < n */
int
sumTo(int n)
{
    int i;
    int sum = 0;

    for(i = 0; i < n; i++) {
        sum += i;
    }

    return sum;
}

The reason it takes so long is that each iteration of the loop takes only O(1) time, but we execute the loop n times, and n ⋅ O(1) = O(n).

Here’s an even worse version that takes O(n2) time:

/* return the sum of the integers i with 0 <= i and i < n */
int
sumTo(int n)
{
    int i;
    int j;
    int sum = 0;

    for(i = 0; i < n; i++) {
        for(j = 0; j < i; j++) {
            sum++;
        }
    }

    return sum;
}

Here we have two nested loops. The outer loop iterates exactly n times, and for each iteration the inner loop iterates at most n times, and the innermost statement costs O(1) each time, so the total is at most O(n²). (In fact, it’s no better than this: on at least n/2 of the outer iterations, the inner loop does at least n/2 iterations, giving at least n²/4 iterations in total.)

So even if we knew that the constant on the first implementation was really large (maybe our CPU is bad at dividing by 2?), for big values of n it’s still likely to be faster than the other two.

(This example is a little misleading, because n is not the size of the input but the actual input value. More typical might be a statement that the cost of strlen is O(n) where n is the length of the string.)
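
For example, strlen might be implemented with a loop like the following (a sketch; real library versions use more heavily optimized code, but the asymptotic cost is the same):

#include <stddef.h>     /* for size_t */

/* return the number of characters in s before the terminating '\0' */
size_t
myStrlen(const char *s)
{
    size_t n = 0;

    while(s[n] != '\0') {
        n++;
    }

    return n;
}

The loop body costs O(1) and executes once per character, so the total cost is O(n) for a string of length n.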

Other variants of asymptotic notation

Big-O notation is good for upper bounds, but because the inequality in the definition only goes one way, it can’t be used for anything else: it is the case that 12 = O(n⁶⁷) just because 12 < n⁶⁷ when n is large enough. There is an alternative definition, called big-Omega, that works in the other direction:

Ω(g(n))

A function f(n) is in the class Ω(g(n)) if there exist constants N and c such that f(n) > c ⋅ g(n) when n > N.

This is exactly the same as the definition of O(g(n)) except that the inequality goes in the other direction. So if we want to express that some algorithm is very expensive, we might write that it’s Ω(n²), which says that once the size of the input is big enough, then the cost grows at least as fast as n².

If you want to claim that your bound is tight (both an upper and a lower bound), use big-Theta: f(n) is Θ(g(n)) if it is both O(g(n)) and Ω(g(n)). For example, selection sort’s cost c₁n(n + 1)/2 + c₂n is Θ(n²), since it is at most c₁n² + c₂n but at least c₁n²/2.

Mostly we will just use big-O, with the understanding that when we say that a particular algorithm is O(n), that’s the best bound we could come up with.

