Variables

Variables in C are a direct abstraction of physical memory locations. To understand how variables work, it helps to start by understanding how computer memory works.

Memory

Memory consists of many bytes of storage, each of which has an address which is itself a sequence of bits. Though the actual memory architecture of a modern computer is complex, from the point of view of a C program we can think of as simply a large address space that the CPU can store things in (and load things from), provided it can supply an address to the memory. Because we don’t want to have to type long strings of bits all the time, the C compiler lets us give names to particular regions of the address space, and will even find free space for us to use.

Variables as names

A variable is a name given in a program for some region of memory. Each variable has a type, which tells the compiler how big the region of memory corresponding to it is and how to treat the bits stored in that region when performing various kinds of operations (e.g. integer variables are added together by very different circuitry than floating-point variables, even though both represent numbers as bits). In modern programming languages, a variable also has a scope (a limit on where the name is meaningful, which allows the same name to be used for different variables in different parts of the program) and an extent (the duration of the variable’s existence, controlling when the program allocates and deallocates space for it).

Variable declarations

Before you can use a variable in C, you must declare it. Variable declarations show up in three places:

  • Outside a function. These declarations declare global variables that are visible throughout the program (i.e. they have global scope). Use of global variables is almost always a mistake.
  • In the argument list in the header of a function. These variables are parameters to the function. They are only visible inside the function body (local scope), exist only from when the function is called to when the function returns (bounded extent—note that this is different from what happens in some garbage-collected languages like Scheme), and get their initial values from the arguments to the function when it is called.
  • Inside a function. (Before C99, only at the start of a block delimited by curly braces.) Such variables are visible only within the block in which they are declared (local scope again) and exist only when the containing function is active (bounded extent). The convention in C is has generally been to declare all such local variables at the top of a function; this is different from the convention in C++ or Java, which encourage variables to be declared when they are first used. This convention may be less strong in C99 code, since C99 adopts the C++ rule of allowing variables to be declared anywhere (which can be particularly useful for index variables in for loops).

Another feature of function parameters and local variables is that if a function is called more than once (even if the function calls itself), each copy of the function gets its own local variables.

Variable declarations consist of a type name followed by one or more variable names separated by commas and terminated by a semicolon (except in argument lists, where each declaration is terminated by a comma). I personally find it easiest to declare variables one per line, to simplify documenting them. It is also possible for global and local variables (but not function arguments) to assign an initial value to a variable by putting in something like = 0 after the variable name. It is good practice to put a comment after each variable declaration that explains what the variable does (with a possible exception for conventionally-named loop variables like i or j in short functions). Below is an example of a program with some variable declarations in it:

#include <stdio.h>
#include <ctype.h>

/* This program counts the number of digits in its input. */

/*
 *This global variable is not used; it is here only to demonstrate
 * what a global variable declaration looks like.
 */
unsigned long SpuriousGlobalVariable = 127;

int
main(int argc, char **argv)
{
    int c;              /* character read */
    int count = 0;      /* number of digits found */

    while((c = getchar()) != EOF) {
        if(isdigit(c)) {
            count++;
        }
    }

    printf("%d\n", count);

    return 0;
}

examples/variables/countDigits.c

Variable names

C is pretty generous about what you can use in a variable name: generally any sequence of digits, letters, and the underscore character _ can be used, so long as the first character is not a digit. Very old versions of C may limit the length of external variables (those that can be reference from other files, like library routines) to 6 characters, but modern versions don’t. (This explains the compact form of many standard library routine names like malloc, printf, or strlen.)

Older languages were more restrictive, and variable names have evolved over the years:

11101001001001

Physical addresses represented as bits.

#FC27

Typical assembly language address represented in hexadecimal to save typing (and because it’s easier for humans to distinguish #A7 from #B6 than to distinguish 10100111 from 10110110.)

A1$

A string variable in BASIC, back in the old days where BASIC variables were one uppercase letter, optionally followed by a number, optionally followed by $ for a string variable and % for an integer variable. These type tags were used because BASIC interpreters didn’t have a mechanism for declaring variable types.

IFNXG7

A typical FORTRAN variable name, back in the days of 6-character all-caps variable names. The I at the start means it’s an integer variable. The rest of the letters probably abbreviate some much longer description of what the variable means. The default type based on the first letter was used because FORTRAN programmers were lazy, but it could be overridden by an explicit declaration.

i, j, c, count, top_of_stack, accumulatedTimeInFlight

Typical names from modern C programs. There is no type information contained in the name; the type is specified in the declaration and remembered by the compiler elsewhere. Note that there are two different conventions for representing multi-word names: the first is to replace spaces with underscores (snake case), and the second is to capitalize the first letter of each word after the first (camel case). You should pick one of these two conventions and stick to it.

prgcGradeDatabase

An example of Hungarian notation, a style of variable naming in which the type of the variable is encoded in the first few character. The type is now back in the variable name again. This is not enforced by the compiler: even though iNumberOfStudents is supposed to be an int, there is nothing to prevent you from declaring float iNumberOfStudents if you are teaching a class on improper chainsaw handling and want to allow for the possibility of fractional students. See this MSDN page for a much more detailed explanation of the system.

Not clearly an improvement on standard naming conventions, but it is popular in some programming shops.

In C, variable names are called identifiers. These are also used to identify things that are not variables, like functions and user-defined types.

An identifier in C must start with a lower or uppercase letter or the underscore character _. Typically variables starting with underscores are used internally by system libraries, so it’s dangerous to name your own variables this way. Subsequent characters in an identifier can be letters, digits, or underscores. So for example a, ____a___a_a_11727_a, AlbertEinstein, aAaAaAaAaAAAAAa, and ______ are all legal identifiers in C, but $foo and 01 are not.

The basic principle of variable naming is that a variable name is a substitute for the programmer’s memory. It is generally best to give identifiers names that are easy to read and describe what the variable is used for. Such variables are called self-documenting. None of the variable names in the preceding list are any good by this standard. Better names would be total_input_characters, dialedWrongNumber, or stepsRemaining. Non-descriptive single-character names are acceptable for certain conventional uses, such as the use of i and j for loop iteration variables, or c for an input character. Such names should only be used when the scope of the variable is small, so that it’s easy to see all the places where it is used at the same time.

C identifiers are case-sensitive, so aardvark, AArDvARK, and AARDVARK are all different variables. Because it is hard to remember how you capitalized something before, it is important to pick a standard convention and stick to it. The traditional convention in C goes like this:

  • Ordinary variables and functions are lowercased or camel-cased: count, countOfInputBits.
  • User-defined types (and in some conventions global variables) are capitalized: Stack, TotalBytesAllocated.
  • Constants created with #define or enum are written in all-caps: MAXIMUM_STACK_SIZE, BUFFER_LIMIT.

Using variables

Ignoring pointers for the moment, there are essentially two things you can do to a variable. You can assign a value to it using the = operator, as in:

    x = 2;      /* assign 2 to x */
    y = 3;      /* assign 3 to y */

or you can use its value in an expression:

    x = y+1;    /* assign y+1 to x */

The assignment operator is an ordinary operator, and assignment expressions can be used in larger expressions:

    x = (y=2)*3; /* sets y to 2 and x to 6 */

This feature is usually only used in certain standard idioms, since it’s confusing otherwise.

There are also shorthand operators for expressions of the form variable = variable operator expression. For example, writing x += y is equivalent to writing x = x + y, x /= y is the same as x = x / y, etc.

For the special case of adding or subtracting 1, you can abbreviate still further with the ++ and -- operators. These come in two versions, depending on whether you want the result of the expression (if used in a larger expression) to be the value of the variable before or after the variable is incremented:

    x = 0;
    y = x++;    /* sets x to 1 and y to 0 (the old value) */
    y = ++x;    /* sets x to 2 and y to 2 (the new value) */
    y = x--;    /* sets x to 1 and y to 2 (the old value) */
    y = --x;    /* sets x to 0 and y to 0 (the new value) */

The intuition is that if the ++ comes before the variable, the increment happens before the value of the variable is read (a preincrement; if it comes after, it happens after the value is read (a postincrement). This is confusing enough that it is best not to use the value of preincrement or postincrement operations except in certain standard idioms. But using x++ or ++x by itself as a substitute for x = x+1 is perfectly acceptable style.

Initialization

It is a serious error to use the value of a variable that has never been assigned to, because you will get whatever junk is sitting in memory at the address allocated to the variable, and this might be some arbitrary leftover value from a previous function call that doesn’t even represent the same type.

Fortunately, C provides a way to guarantee that a variable is initialized as soon as it is declared. Many of the examples in the notes do not use this mechanism, because of bad habits learned by the instructor using early versions of C that imposed tighter constraints on initialization. But initializing variables is a good habit to get in the practice of doing.

For variables with simple types (that is, not arrays, structs, or unions), an initializer looks like an assignment:

    int sum = 0;
    int n = 100;
    int nSquared = n*n;
    double gradeSchoolPi = 3.14;
    const char * const greeting = "Hi!";
    const int greetingLength = strlen(greeting);

For ordinary local variables, the initializer value can be any expression, including expressions that call other functions. There is an exception for variables allocated when the program starts (which includes global variables outside functions and static variables inside functions), which can only be initialized to constant expressions.

The last two examples show how initializers can set the values of variables that are declared to be const (the variable greeting is both constant itself, because of const greeting, and points to data that is also constant, because it is of type const char). This is the only way to set the values of such variables without cheating, because the compiler will complain if you try to do an ordinary assignment to a variable declared to be constant.

For fixed-size arrays and structs, it is possible to supply an initializer for each component, by enclosing the initializer values in braces, separated by commas. For example:

    int threeNumbers[3] = { 1, 2, 3 };

    struct numericTitle {
        int number;
        const char *name;
    };

    struct numericTitle s = { 7, "Samurai" };
    struct numericTitle n = { 3, "Ninjas" };

Storage class qualifiers

It is possible to specify additional information about how a variable can be used using storage class qualifiers, which usually go before the type of a variable in a declaration.

Scope and extent

Most variables that you will use in C are either parameters to functions or local variables inside functions. These have local scope, meaning the variable names can only be used in the function in which they are declared, and automatic extent, meaning the space for the variable is allocated, typically on the stack, when the function is called, and reclaimed when the function exits. (If the function calls itself, you get another copy of all the local variables; see recursion.)

On very rare occasions you might want to have a variable that survives the entire execution of a program (has static extent) or that is visible throughout the program (has global scope). C provides a mechanism for doing this that you shold never use under normal circumstances. Pretty much the only time you are going to want to have a variable with static extent is if you are keeping track of some piece of information that (a) you only need one instance of, (b) you need to survive between function calls, and (c) it would be annoying to pass around as an extra argument to any function that uses it. An example would be the internal data structures used by malloc, or the count variable in the function below:

/* returns the number of times this function has previously been called */
/* this can be used to generate unique numerical identifiers */
unsigned long long
ticketMachine(void)
{
    static unsigned long long count = 0;

    return count++;
}

To declare a local variable with static extent, use the static qualifier as in the above example. To declare a global variable with static extent, declare it outside a function. In both cases you should provide an initializer for the variable.

Additional qualifiers for global variables

It is possible to put some additional constraints on the visibility of global variables. By default, a global variable will be visible everywhere, but functions files other than the one in which it is defined won’t necessarily know what type it has. This latter problem can be fixed using an extern declaration, which says that there is a variable somewhere else of a particular type that we are declaring (but not defining, so no space is allocated). In contrast, the static keyword (on a global variable) specifies that it will only be visible in the current file, even if some other file includes a declaration of a global variable of the same name.

Here are three variable declarations that illustrate how this works:

    unsigned short Global = 5;    /* global variable, can be used anywhere */

    extern float GlobalFloat;     /* this global variable, defined somewhere else, has type float */

    static char Character = 'c';  /* global variable, can only be used by functions in this file */

(Note the convention of putting capital letters on global variables to distinguish them from local variables.)

Typically, an extern definition would appear in a header file so that it can be included in any function that uses the variable, while an ordinary global variable definition would appear in a C file so it only occurs once.

Marking variables as constant

The const qualifier declares a variable to be constant:

    const int three = 3;   /* this will always be 3 */

It is an error to apply any sort of assignment (=, +=, ++, etc.) to a variable qualified as const.

Pointers to const

A pointer to a region that should not be modified should be declared with const type:

    const char *string = "You cannot modify this string.";

The const in the declaration above applies to the characters that string points to: string is not const itself, but is instead a pointer to const. It is still possible to make string point somewhere else, say by doing an assignment:

    string = "You cannot modify this string either."

If you want to make it so that you can’t assign to string, put const right before the variable name:

    /* prevent assigning to string as well */
    const char * const string = "You cannot modify this string.";

Now string is a const pointer to const: you can neither modify string nor the values it points to.

Note that const only restricts what you can do using this particular variable name. If you can get at the memory that something points to by some other means, say through another pointer, you may be able to change the values in these memory locations anyway:

    int x = 5;
    const int *p = &x;
    int *q;

    *p = 1; /* will cause an error at compile time */
    x = 3;  /* also changes *p, but will not cause an error */

Licenses and Attributions


Speak Your Mind

-->