Strings in PHP
As we’ve previously seen, PHP has a built-in string type. Internally, a PHP string is simply a sequence of bytes, but for our purposes we can treat it as a 0-indexed character array. PHP strings are mutable, but it is considered best practice to treat them as immutable and rely on the many functions PHP provides to manipulate strings.
Basics
We can create strings by assigning a string literal to a variable. String literals can be delimited by either single quotes or double quotes; we will use the double quote syntax. PHP has no separate character type, only single-character strings.
$firstName = "Thomas";
$lastName = "Waits";
//we can also reassign values
$firstName = "Tom";
The reassignment in the last line in the example effectively destroys the old string. The assignment operator can also be used to make copies of strings.
$firstName = "Thomas";
$alias = $firstName;
It is important to understand that this assignment makes a deep copy of the string. Changes to the first do not affect the second one. You can make changes to individual characters in a string by treating it like a zero-indexed array.
$a = "hello";
$a[0] = "H";
$a[5] = "!";
//a is now "Hello!"
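To make the copy semantics concrete, here is a minimal sketch (the lowercase starting value is chosen only for illustration). It changes a copy by index and shows that the original is left untouched.

```php
<?php
// Assignment copies the string's value, so the two variables are independent.
$firstName = "thomas";
$alias = $firstName;

// Change the first character of the copy only.
$alias[0] = "T";

printf("firstName = %s\n", $firstName); // firstName = thomas
printf("alias = %s\n", $alias);         // alias = Thomas
```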
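The last line extends the string by adding an additional character. You cannot, however, remove a character by assigning the empty string to an index: as of PHP 7.1 doing so is a fatal error, and earlier versions stored a NUL byte instead of removing anything. To remove characters, use a function such as substr_replace(), which here replaces one character starting at index 5 with the empty string.
$a = "Apples!";
$a = substr_replace($a, "", 5, 1);
//a is now "Apple!"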
String Functions
PHP provides dozens of convenient functions that allow you to process and modify strings. We highlight a few of the more common ones here. A full list of supported functions can be found in standard documentation. Because of the history of PHP, many of the same functions defined in the C string library can also be used in PHP.
Length
When accessing individual characters in a string, we need to know the length of the string so that we do not access invalid indices (doing so does not halt the script, but it raises a warning and yields an empty string rather than a character). The strlen() function returns an integer that represents the number of bytes in the string, which is the same as the number of characters for single-byte text such as ASCII.
$s = "Hello World!";
$x = strlen($s); //x is 12
$s = "";
$x = strlen($s); //x is 0
//careful:
$s = NULL;
$x = strlen($s); //x is 0
As demonstrated in the last example, strlen() will return 0 for null strings (as of PHP 8.1, passing null also raises a deprecation notice). Recall that we can distinguish between these two situations by using is_null(). Using strlen(), we can iterate over each individual character in a string.
$fullName = "Tom Waits";
for ($i = 0; $i < strlen($fullName); $i++) {
    printf("fullName[%d] = %s\n", $i, $fullName[$i]);
}
This would print the following:
fullName[0] = T
fullName[1] = o
fullName[2] = m
fullName[3] =
fullName[4] = W
fullName[5] = a
fullName[6] = i
fullName[7] = t
fullName[8] = s
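The distinction between a null string and an empty string mentioned above can be checked explicitly. The following is only a sketch: describeString() is a hypothetical helper, not a built-in, and its nullable type declaration assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not a PHP built-in): reports whether a value is
// null, an empty string, or a non-empty string.
function describeString(?string $s): string {
    if (is_null($s)) {
        return "null";
    }
    if (strlen($s) === 0) {
        return "empty";
    }
    return "length " . strlen($s);
}

printf("%s\n", describeString(null));  // null
printf("%s\n", describeString(""));    // empty
printf("%s\n", describeString("Tom")); // length 3
```

Checking is_null() first means strlen() is never called on a null value.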
Concatenation
PHP has a concatenation operator built into the language. To concatenate two or more strings together, you can use a simple period between them as the concatenation operator. Concatenation results in a new string.
$firstName = "Tom";
$lastName = "Waits";
$formattedName = $lastName . ", " . $firstName;
//formattedName now contains "Waits, Tom"
Concatenation also works with other variable types.
$x = 10;
$y = 3.14;
$s = "Hello, x is " . $x . " and y = " . $y;
//s contains "Hello, x is 10 and y = 3.14"
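When a string is built up piece by piece, the compound assignment operator .= appends to an existing variable. A small sketch, with illustrative values, that joins a mixed array into one comma-separated line:

```php
<?php
// Build a comma-separated line one piece at a time using .=
$values = array(10, 3.14, "done");
$line = "";
for ($i = 0; $i < count($values); $i++) {
    if ($i > 0) {
        $line .= ", ";
    }
    // non-string values (an integer and a float here) are converted automatically
    $line .= $values[$i];
}
printf("%s\n", $line); // 10, 3.14, done
```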
Computing a Substring
PHP provides a simple function, substr(), to compute a substring of a string. It takes at least 2 arguments: the string to operate on and the starting index. There is a third, optional parameter that allows you to specify the length of the resulting substring.
$name = "Thomas Alan Waits";
$firstName = substr($name, 0, 6); //"Thomas"
$middleName = substr($name, 7, 4); //"Alan"
$lastName = substr($name, 12); //"Waits"
In the final example, omitting the optional length parameter results in the entire remainder of the string being returned as the substring.
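The indices above were hard-coded. As a sketch of two alternatives, substr() also accepts a negative starting index, which counts from the end of the string, and strpos() (a separate standard function, not covered in this section) can locate a delimiter so that the length need not be known in advance.

```php
<?php
$name = "Thomas Alan Waits";

// A negative start index counts backward from the end of the string.
$lastName = substr($name, -5);       // "Waits"
$middleName = substr($name, -10, 4); // "Alan"

// strpos() returns the index of the first occurrence of a substring,
// so the first name can be extracted without hard-coding its length.
$firstSpace = strpos($name, " ");           // 6
$firstName = substr($name, 0, $firstSpace); // "Thomas"

printf("%s / %s / %s\n", $firstName, $middleName, $lastName);
```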
Arrays of Strings
We often need to deal with collections of strings. In PHP we can define arrays of strings. Indeed, we’ve seen arrays of strings before. When processing command line arguments, PHP defines an array of strings, $argv. Each string can be accessed using an index; $argv[0], for example, is always the name of the script.
We can create our own arrays of strings using the same syntax as with other arrays.
$names = array(
    "Margaret Hamilton",
    "Ada Lovelace",
    "Grace Hopper",
    "Marie Curie",
    "Hedy Lamarr");
String Comparisons
When comparing strings in PHP, we can use the usual comparison operators such as ===, <, or <=, which will compare the strings lexicographically. However, this is generally discouraged because of type juggling issues and the differences between strict and loose comparisons; for instance, two strings that both look like numbers are compared as numbers by the loose operators. Instead, PHP provides several comparator functions that compare strings based on their content. strcmp($a, $b) takes two strings and returns an integer based on the lexicographic ordering of $a and $b. If $a precedes $b, strcmp() returns something negative. It returns zero if $a and $b have the same content. Otherwise, if $b precedes $a, it returns something positive.
$x = strcmp("apple", "banana"); //x is negative
$x = strcmp("zelda", "mario"); //x is positive
$x = strcmp("Hello", "Hello"); //x is zero
//shorter strings precede longer strings:
$x = strcmp("apple", "apples"); //x is negative
$x = strcmp("Apple", "apple"); //x is negative
In the last example, "Apple" precedes "apple" since uppercase letters are ordered before lowercase letters according to the ASCII table. We can also make case-insensitive comparisons if we need to, using the alternative strcasecmp($a, $b). Here, strcasecmp("Apple", "apple") will return zero as the two strings are the same ignoring case.
The comparison functions also have length-limited versions, strncmp($a, $b, $n) and strncasecmp($a, $b, $n). Both only compare the first $n characters of the strings. Thus, strncmp("apple", "apples", 5) will result in zero as the two strings are equal in the first 5 characters.
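The following sketch illustrates both points made in this section: why the loose comparison operators can surprise, and how the comparator functions can be used. It passes the comparators to usort(), a standard sorting function that is not otherwise covered here; the lowercase names are chosen only to make the case difference visible.

```php
<?php
// Both strings look like numbers, so the loose operator compares them
// numerically (10 equals 1e1) even though their contents differ.
$loose = ("10" == "1e1");               // true
$sameContent = (strcmp("10", "1e1") === 0); // false

$names = array("grace", "Ada", "margaret", "Hedy");

// Case-sensitive ordering: uppercase letters come before lowercase ones.
$sensitive = $names;
usort($sensitive, "strcmp");
// sensitive is [ "Ada", "Hedy", "grace", "margaret" ]

// Case-insensitive ordering.
$insensitive = $names;
usort($insensitive, "strcasecmp");
// insensitive is [ "Ada", "grace", "Hedy", "margaret" ]

printf("%s\n", implode(", ", $sensitive));
printf("%s\n", implode(", ", $insensitive));
```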
Splitting a String in PHP
Tokenizing is the process of splitting up a string along some delimiter. For example, the comma delimited string "Smith,Joe,12345678,1985-09-08" contains four pieces of data delimited by commas. Our aim is to split this string up into four separate strings so that we can process each one. PHP provides several functions to do this, including explode() and preg_split().
The simpler one, explode(), takes two arguments: the first one is a string delimiter and the second is the string to be processed. It then returns an array of strings.
$data = "Smith,Joe,12345678,1985-09-08";
$tokens = explode(",", $data);
//tokens is [ "Smith", "Joe", "12345678", "1985-09-08" ]
$dateTokens = explode("-", $tokens[3]);
//dateTokens is now [ "1985", "09", "08" ]
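Once a record has been split, the tokens usually need names and types. A minimal sketch, assuming the four fields are last name, first name, an ID, and a birth date (the variable names are illustrative), using list() to unpack the array and implode() to join strings back together:

```php
<?php
$data = "Smith,Joe,12345678,1985-09-08";

// Unpack the tokens directly into named variables.
list($lastName, $firstName, $id, $birthDate) = explode(",", $data);
list($year, $month, $day) = explode("-", $birthDate);

// Tokens are strings; convert explicitly when a number is needed.
$year = intval($year); // 1985

// implode() is the inverse of explode(): it joins array elements
// using the given delimiter.
$fullName = implode(" ", array($firstName, $lastName)); // "Joe Smith"

printf("%s was born in %d\n", $fullName, $year);
```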
The more sophisticated preg_split() also takes two required arguments, but instead of a simple delimiter, it uses a regular expression: a sequence of characters that defines a search pattern in which special characters can be used to describe complex patterns. For example, the complex expression ^[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?$ will match any valid numerical value including scientific notation. We will not cover regular expressions in depth, but to demonstrate their usefulness, here’s an example in which we split a string along any and all whitespace:
$s = "Alpha Beta \t Gamma \n Delta \t\nEpsilon";
$tokens = preg_split("/[\s]+/", $s);
//tokens is now [ "Alpha", "Beta", "Gamma", "Delta", "Epsilon" ]
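The numeric pattern quoted above can be tried out with preg_match(), the matching counterpart of preg_split(). In the sketch below, looksNumeric() is a hypothetical helper rather than a built-in. The last lines use preg_split()'s optional limit and flags arguments to discard the empty tokens that leading or trailing whitespace would otherwise produce.

```php
<?php
// Hypothetical helper (not a PHP built-in): tests a string against the
// numeric pattern quoted in the text.
function looksNumeric(string $s): bool {
    $pattern = '/^[+-]?(\d+(\.\d+)?|\.\d+)([eE][+-]?\d+)?$/';
    return preg_match($pattern, $s) === 1;
}

$a = looksNumeric("3.14");    // true
$b = looksNumeric("-2.5e10"); // true
$c = looksNumeric("1.");      // false: digits must follow the point
$d = looksNumeric("abc");     // false

// With the PREG_SPLIT_NO_EMPTY flag, leading and trailing whitespace
// does not produce empty tokens (a limit of -1 means "no limit").
$tokens = preg_split('/\s+/', "  Alpha \t Beta\n", -1, PREG_SPLIT_NO_EMPTY);
// tokens is [ "Alpha", "Beta" ]
```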