Team LiB

Formatting Strings

If strings are nothing more than a collection of characters to a computer, to a human being they often represent concepts and data that are best presented following certain conventions. Even when dealing with computers, however, it's sometimes necessary to ensure that the contents of a string follow certain rules. For example, strings that must be passed to a Web browser must be formatted according to the HTML standards for them to be properly visualized.

As a result, PHP provides a wide range of functions that format the contents of a string for different purposes. Perhaps the most generic of these is printf(), whose syntax is as follows:

int printf ($format_specification[, $parameters...]);

The $format_specification parameter is a string that contains both normal text, which is output as is, and replacement directives, which are replaced using the values provided in the $parameters section of the function call.

A replacement directive has the following form:

%[P][-]W[.R]T

T is the type of the parameter (see Table 1.4), W is the minimum length that the data should take in the output string, and P is an optional padding character used as filler to ensure that the data takes up at least W characters. In PHP, a padding character other than 0 or a space must be preceded by a single quote (').

Table 1.4. printf() Type Specifiers

Option   Value

%        A literal percent character (takes no parameters)
b        Integer represented as a binary number (for example: 101110111)
c        Integer represented as the character corresponding to its ASCII value
d        Integer represented as a signed integer number
u        Integer represented as an unsigned number
f        Floating-point value
o        Integer represented as an octal value
s        String value
x        Integer value represented in hexadecimal notation (with lowercase characters)
X        Integer value represented in hexadecimal notation (with uppercase characters)

R is an optional precision token that has meaning only when dealing with floating-point values; it specifies the number of decimal digits that should be used to represent the data.
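To make the type specifiers and the precision token more concrete, here is a small sketch. It uses sprintf(), the string-returning variant of printf() described later in this section, so that each result can be stored and inspected; the variable names are purely illustrative.

```php
<?php

// The same kind of value rendered through different type specifiers.
$binary  = sprintf('%b', 375);        // binary:            "101110111"
$char    = sprintf('%c', 65);         // ASCII character:   "A"
$octal   = sprintf('%o', 8);          // octal:             "10"
$hexLow  = sprintf('%x', 255);        // lowercase hex:     "ff"
$hexUp   = sprintf('%X', 255);        // uppercase hex:     "FF"
$percent = sprintf('%d%%', 50);       // literal percent:   "50%"

// The precision token (.2) keeps two decimal digits and rounds the rest.
$float   = sprintf('%.2f', 3.14159);  // "3.14"

echo $binary, "\n", $char, "\n", $octal, "\n", $hexLow, "\n",
     $hexUp, "\n", $percent, "\n", $float, "\n";
```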

Finally, a dash (-) placed between P and W indicates that the data should be left-aligned in the space allotted to it by W. Without the dash, the data is right-aligned.

This all sounds a lot more complicated than it really is. Let's take a look at a few examples:

%-5d

This token represents a left-aligned integer value that must be at least five characters long.

%05.3f

This token represents a floating-point value at least five characters long, displayed with exactly three decimal digits. The character "0" is used to pad the string to its minimum length.
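The effect of width, padding, and alignment is easiest to see side by side. The following is a minimal sketch (the square brackets are only there to make the padding visible, and the variable names are illustrative). It relies on sprintf(), described later in this section, to capture each result as a string.

```php
<?php

$left   = sprintf('[%-5d]', 42);          // left-aligned, width 5:       "[42   ]"
$right  = sprintf('[%5d]', 42);           // right-aligned (default):     "[   42]"
$zero   = sprintf('[%05d]', 42);          // padded with zeros:           "[00042]"
$float  = sprintf('[%08.3f]', 3.14159);   // width 8, three decimals:     "[0003.142]"

// A padding character other than 0 or a space is introduced by a single quote.
$custom = sprintf("[%'*8.2f]", 3.14159);  // padded with asterisks:       "[****3.14]"

echo $left, "\n", $right, "\n", $zero, "\n", $float, "\n", $custom, "\n";
```

Note that the width counts every character of the formatted value, including the decimal point and the decimal digits.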

The printf function makes it relatively easy to format complex strings using a single expression. Here's an example:

<?php

    $n = 15.32;
    $log = log ($n); 

    printf ("log (%0.2f) = %.5f\n", $n, $log); 

?>

This script outputs log (15.32) = 2.72916. If you come from the C language, note that printf() itself does not perform any substitution of backslash-escaped special characters, such as \n; that work is done by PHP's string parser. If you want to use these special characters, make sure that you specify the value of $format_specification using the double-quote syntax.
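The difference between the two quoting styles can be checked with a short sketch like the one below. It uses sprintf(), described later in this section, so that the resulting strings can be measured; the variable names are illustrative.

```php
<?php

// Double quotes: PHP converts \n into a newline before sprintf() sees it.
$double = sprintf("%d items\n", 3);

// Single quotes: the backslash and the letter n reach sprintf() unchanged.
$single = sprintf('%d items\n', 3);

echo strlen($double), "\n";  // 8: "3 items" plus one newline character
echo strlen($single), "\n";  // 9: "3 items" plus a backslash and an "n"
```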

The traditional C implementation of printf() requires that a parameter be specified for each replacement directive stored in $format_specification. As the directives are found, the interpreter moves from one parameter to the next until all substitutions are made.

Unfortunately, this approach can cause some serious trouble. Consider the case, for example, of using printf() as the basis for a system that supports multiple languages. The English sentence

"The [box/case] contains [three/five] pens"

can be translated into another language using a different construction, for example:

"There are [three/five] pens in the [box/case]"

It's clear that using printf() to provide a localization system flexible enough to support the construction forms of different languages would be difficult without the possibility of specifying which parameter should be used to provide a value for each replacement directive.

Luckily, PHP makes it possible to do so by using a slightly different directive syntax: immediately after the percent sign, insert the number of the parameter followed by a dollar sign ($). For example, the following script:

<?php

    function replace_me ($s)
    {
      printf ($s, 10, 'box');
    }

    replace_me ("There are %d pens in the %s\n");
    replace_me ("The %2\$s contains %1\$d pens\n");

?>

produces the correct output despite the fact that the order of the parameters is inverted in the second string (notice how I have escaped the dollar signs using a backslash to ensure that PHP's double-quoted string parser does not treat them as the start of a variable name):

There are 10 pens in the box
The box contains 10 pens
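Numbered directives fit naturally into a table of translated templates. The sketch below is only one possible arrangement: the $templates array, its keys, and the describe_contents() function are hypothetical names invented for this illustration. It uses single-quoted strings, so the dollar signs need no escaping, and sprintf() (described next) so that the result is returned rather than printed.

```php
<?php

// Hypothetical translation table. Every template numbers its arguments,
// so callers always pass (count, container) in the same order.
$templates = array(
    'form_a' => 'The %2$s contains %1$d pens',
    'form_b' => 'There are %1$d pens in the %2$s',
);

// Hypothetical helper: picks a template and fills in the values.
function describe_contents($templates, $form, $count, $container)
{
    return sprintf($templates[$form], $count, $container);
}

echo describe_contents($templates, 'form_a', 3, 'box'), "\n";
echo describe_contents($templates, 'form_b', 5, 'case'), "\n";
```

Because the argument order is fixed at the call site, adding a new sentence construction only requires a new template string.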

The sprintf function takes the same parameters as printf(), but instead of printing the resulting string, it returns it:

$a = sprintf ("%d cases of wine\n", 10);
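The two functions are easy to confuse, so here is a short sketch of what each one hands back. In PHP, printf() prints the formatted string and returns its length, whereas sprintf() prints nothing and returns the string itself. The variable names are illustrative.

```php
<?php

// sprintf() produces no output; the formatted text ends up in $text.
$text = sprintf("%d cases of wine\n", 10);

// printf() writes the text to the output and returns the number of
// characters it wrote, not the text itself.
$length = printf("%d cases of wine\n", 10);

echo $text;           // prints the stored string
echo $length, "\n";   // prints 17
```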

Alternatives to printf()

Although the printf() function is extremely useful, it also carries some overhead, because the format string has to be parsed every time the function is called. As a result, when a more specific PHP function can do the job, it is usually worth using that function instead.

For example, you can use the number_format function to format a number according to a set of parameters:

number_format ($number[, $decimals[, $point_separator, $thousand_separator]]);

The function formats $number, rounded to $decimals decimal digits, using $point_separator as the separator between the integer and decimal parts, and $thousand_separator to separate groups of thousands. If $decimals isn't specified, no decimal digits are shown. If $point_separator and $thousand_separator aren't specified, the interpreter uses a dot (.) and a comma (,), respectively.

For example, in countries such as the U.K. and the United States, numbers are formatted using commas to separate the thousand groups, and dots are used to separate the integer part from the decimal part. Some European countries, such as Italy, use the opposite notation: dots separate the thousands and the comma indicates the beginning of the decimal part. Here's how number_format can be used to satisfy both requirements:

<?php

    $a = 1232322210.44;

    echo number_format ($a, 2);   // English format
    echo "\n";
    echo number_format ($a, 2, ',', '.'); // Italian format
    echo "\n";

?>

The preceding example produces the following output:

1,232,322,210.44
1.232.322.210,44
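A few more calls help illustrate the defaults and the rounding behavior of number_format(). This is only a sketch with illustrative variable names; the last call passes an empty string as the thousands separator to drop the grouping altogether.

```php
<?php

$a = 1234.567;

$rounded = number_format($a);                 // no decimals, rounded: "1,235"
$twoDec  = number_format($a, 2);              // default separators:   "1,234.57"
$italian = number_format($a, 2, ',', '.');    // Italian notation:     "1.234,57"
$plain   = number_format($a, 2, '.', '');     // no thousands groups:  "1234.57"

echo $rounded, "\n", $twoDec, "\n", $italian, "\n", $plain, "\n";
```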
