Functions

A function, procedure, or subroutine encapsulates some complex computation as a single operation. Typically, when we call a function, we pass as arguments all the information this function needs, and any effect it has will be reflected in either its return value or (in some cases) in changes to values pointed to by the arguments. Inside the function, the arguments are copied into local variables, which can be used just like any other local variable—they can even be assigned to without affecting the original argument.

Function definitions

A typical function definition looks like this:

/* Returns the square of the distance between two points separated by 
   dx in the x direction and dy in the y direction. */
int
distSquared(int dx, int dy)
{
    return dx*dx + dy*dy;
}

examples/functions/distSquaredNoHeader.c

The part outside the braces is called the function declaration; the braces and their contents is the function body.

Like most complex declarations in C, once you delete the type names the declaration looks like how the function is used: the name of the function comes before the parentheses and the arguments inside. The ints scattered about specify the type of the return value of the function (before the function name) and of the parameters (inside the parentheses after the function name); these are used by the compiler to determine how to pass values in and out of the function and (usually for more complex types, since numerical types will often convert automatically) to detect type mismatches.

If you want to define a function that doesn’t return anything, declare its return type as void. You should also declare a parameter list of void if the function takes no arguments.

/* Prints "hi" to stdout */
void
helloWorld(void)
{
    puts("hi");
}

examples/functions/helloWorld.c

It is not strictly speaking an error to omit the second void here. Putting void in for the parameters tells the compiler to enforce that no arguments are passed in. If we had instead declared helloWorld as

/* Prints "hi" to stdout */
void
helloWorld()    /* DANGER! */
{
    puts("hi");
}

it would be possible to call it as

    helloWorld("this is a bogus argument");

without causing an error. The reason is that a function declaration with no arguments means that the function can take an unspecified number of arguments, and it’s up to the user to make sure they pass in the right ones. There are good historical reasons for what may seem like obvious lack of sense in the design of the language here, and fixing this bug would break most C code written before 1989. But you shouldn’t ever write a function declaration with an empty argument list, since you want the compiler to know when something goes wrong.

When to write a function

As with any kind of abstraction, there are two goals to making a function:

  • Encapsulation: If you have some task to carry out that is simple do describe from the outside but messy to understand from the inside, wrapping it in a function lets somebody carry out this task without having to know the details. This is also useful if you want to change the implementation later.
  • Code re-use: If you find yourself writing the same lines of code in several places (or worse, are tempted to copy a block of code to several places), you should probably put this code in a function (or perhaps more than one function, if there is no succinct way to describe what this block of code is doing).

Both of these goals may be trumped by the goal of making your code understandable. If you can’t describe what a function is doing in a single, simple sentence, this is a sign that maybe you need to restructure your code. Having a function that does more than one thing (or does different thing depending on its arguments) is likely to lead to confusion. So, for example, this is not a good function definition:

/*** ### UGLY CODE AHEAD ### ***/

/*
 * If getMaximum is true, return maximum of x and y,
 * else return minimum.
 */
int
computeMaximumOrMinimum(int x, int y, int getMaximum)
{
    if(x > y) {
        if(getMaximum) {
            return x;
        } else {
            return y;
        }
    } else {
        if(getMaximum) {
            return y;
        } else {
            return x;
        }
    }
}

Better would be to write two functions:

/* return the maximum of x and y */
int
maximum(int x, int y)
{
    if(x > y) {
        return x;
    } else {
        return y;
    }
}

/* return the minimum of x and y */
int
minimum(int x, int y)
{
    if(x < y) {
        return x;
    } else {
        return y;
    }
}

At the same time, it’s possible for a function to be too simple. Suppose I write the function

/* print x to stdout followed by a newline */
void
printIntWithNewline(int x)
{
    printf("%d\n", x);
}

It’s pretty clear from the name what this function does. But since anybody who has been using C for a while has seen printf("%d\n", ...) over and over again, it’s usually more clear to expand out the definition:

    printIntWithNewline(2+5);   /* this could do anything */
    printf("%d\n", 2+7);        /* this does exactly what it says */

As with all caveats, this caveat comes with its own caveat: what might justify a function like this is if you want to be able to do some kind of specialized formatting that should be consistent for all values of a particular form. So you might write a printDistance function like the above as a stub for a fancier function that might use different units at different scales or something.

A similar issue will come up with non-syntactic macros, which also tend to fail the “does this make my code more or less understandable” test. Usually it is a bad idea to try to replace common C idioms.

Calling a function

A function call consists of the function followed by its arguments (if any) inside parentheses, separated by comments. For a function with no arguments, call it with nothing between the parentheses. A function call that returns a value can be used in an expression just like a variable. A call to a void function can only be used as an expression by itself:

    totalDistance += distSquared(x1 - x2, y1 - y2);
    helloWorld();
    greetings += helloWorld();  /* ERROR */

The return statement

To return a value from a function, write a return statement, e.g.

    return 172;

The argument to return can be any expression. Unlike the expression in, say, an if statement, you do not need to wrap it in parentheses. If a function is declared void, you can do a return with no expression, or just let control reach the end of the function.

Executing a return statement immediately terminates the function. This can be used like break to get out of loops early.

/* returns 1 if n is prime, 0 otherwise */
int
isPrime(int n)
{
    int i;

    if (n < 2) return 0;   /* special case for 0, 1, negative n */
 
    for(i = 2; i < n; i++) {
        if (n % i == 0) {
            /* found a factor */
            return 0;
        }
    }

    /* no factors */
    return 1;
}

examples/functions/isPrime.c

Function declarations and modules

By default, functions have global scope: they can be used anywhere in your program, even in other files. If a file doesn’t contain a declaration for a function someFunc before it is used, the compiler will assume that it is declared like int someFunc() (i.e., return type int and unknown arguments). This can produce infuriating complaints later when the compiler hits the real declaration and insists that your function someFunc should be returning an int and you are a bonehead for declaring it otherwise.

The solution to such insulting compiler behavior errors is to either (a) move the function declaration before any functions that use it; or (b) put in a declaration without a body before any functions that use it, in addition to the declaration that appears in the function definition. (Note that this violates the no separate but equal rule, but the compiler should tell you when you make a mistake.) Option (b) is generally preferred, and is the only option when the function is used in a different file.

To make sure that all declarations of a function are consistent, the usual practice is to put them in an include file. For example, if distSquared is used in a lot of places, we might put it in its own file distSquared.c:

#include "distSquared.h"

int
distSquared(int dx, int dy)
{
    return dx*dx + dy*dy;
}

examples/functions/distSquared.c

The file distSquared.c above uses #include to include a copy of the following header file distSquared.h:

/* Returns the square of the distance between two points separated by 
   dx in the x direction and dy in the y direction. */
int distSquared(int dx, int dy);

examples/functions/distSquared.h

Note that the declaration in distSquared.h doesn’t have a body. Instead, it’s terminated by a semicolon, like a variable declaration. It’s also worth noting that we moved the documenting comment to distSquared.h: the idea is that distSquared.h is the public face of this (very small one-function) module, and so the explanation of how to use the function should be there.

The reason distSquared.c includes distSquared.h is to get the compiler to verify that the declarations in the two files match. But to use the distSquared function, we also put #include "distSquared.h" at the top of the file that uses it:

#include "distSquared.h"

#define THRESHOLD (100)

int
tooClose(int x1, int y1, int x2, int y2)
{
    return distSquared(x1 - x2, y1 - y2) < THRESHOLD;
}

examples/functions/tooClose.c

The #include on line 1 uses double quotes instead of angle brackets; this tells the compiler to look for distSquared.h in the current directory instead of the system include directory (typically /usr/include).

Static functions

By default, all functions are global; they can be used in any file of your program whether or not a declaration appears in a header file. To restrict access to the current file, declare a function static, like this:

static void
helloHelper(void)
{
    puts("hi!");
}

void
hello(int repetitions)
{
    int i;

    for(i = 0; i < repetitions; i++) {
        helloHelper();
    }
}

examples/functions/staticHello.c

The function hello will be visible everywhere. The function helloHelper will only be visible in the current file.

It’s generally good practice to declare a function static unless you intend to make it available, since not doing so can cause namespace conflicts, where the presence of two functions with the same name either prevent the program from linking or—even worse—cause the wrong function to be called. The latter can happen with library functions, since C allows the programmer to override library functions by defining a new function with the same name. Early on in my career as a C programmer, I once had a program fail in a spectacularly incomprehensible way because I’d written a select function without realizing that select is a core library function in Unix.

Local variables

A function may contain definitions of local variables, which are visible only inside the function and which survive only until the function returns. These may be declared at the start of any block (group of statements enclosed by braces), but it is conventional to declare all of them at the outermost block of the function.

/* Given n, compute n! = 1*2*...*n */
/* Warning: will overflow on 32-bit machines if n > 12 */
int
factorial(int n)
{
    int i;
    int product;

    if(n < 2) return n;
    /* else */

    product = 1;

    for(i = 2; i <= n; i++) {
        product *= i;
    }

    return product;
}

examples/functions/factorial.c

Mechanics of function calls

Several things happen under the hood when a function is called. Since a function can be called from several different places, the CPU needs to store its previous state to know where to go back. It also needs to allocate space for function arguments and local variables.

Some of this information will be stored in registers, memory locations built into the CPU itself, but most will go on the stack, a region of memory that on typical machines grows downward, even though the most recent additions to the stack are called the “top” of the stack. The location of the top of the stack is stored in the CPU in a special register called the stack pointer.

So a typical function call looks like this internally:

  1. The current instruction pointer or program counter value, which gives the address of the next line of machine code to be executed, is pushed onto the stack.
  2. Any arguments to the function are copied either into specially designated registers or onto new locations on the stack. The exact rules for how to do this vary from one CPU architecture to the next, but a typical convention might be that the first few arguments are copied into registers and the rest (if any) go on the stack.
  3. The instruction pointer is set to the first instruction in the code for the function.
  4. The code for the function allocates additional space on the stack to hold its local variables (if any) and to save copies of the values of any registers it wants to use (so that it can restore their contents before returning to its caller).
  5. The function body is executed until it hits a return statement.
  6. Returning from the function is the reverse of invoking it: any saved registers are restored from the stack, the return value is copied to a standard register, and the values of the instruction pointer and stack pointer are restored to what they were before the function call.

From the programmer’s perspective, the important point is that both the arguments and the local variables inside a function are stored in freshly-allocated locations that are thrown away after the function exits. So after a function call the state of the CPU is restored to its previous state, except for the return value. Any arguments that are passed to a function are passed as copies, so changing the values of the function arguments inside the function has no effect on the caller. Any information stored in local variables is lost.

Under very rare circumstances, it may be useful to have a variable local to a function that persists from one function call to the next. You can do so by declaring the variable static. For example, here is a function that counts how many times it has been called:

/* return the number of times the function has been called */
int
counter(void)
{
    static count = 0;

    return ++count;
}

examples/functions/staticCounter.c

Static local variables are stored outside the stack with global variables, and have unbounded extent. But they are only visible inside the function that declares them. This makes them slightly less dangerous than global variables—there is no fear that some foolish bit of code elsewhere will quietly change their value—but it is still the case that they usually aren’t what you want. It is also likely that operations on static variables will be slightly slower than operations on ordinary (“automatic”) variables, since making them persistent means that they have to be stored in (slow) main memory instead of (fast) registers.


Licenses and Attributions


Speak Your Mind

-->