
Manipulating Strings

PHP provides many functions that transform a string argument, subtly or radically.

Cleaning Up a String with trim(), ltrim(), and strip_tags()

When you acquire text from the user or a file, you can't always be sure that you haven't also picked up white space at the beginning and end of your data. trim() shaves any white space characters, including newlines, tabs, and spaces, from both the start and end of a string. It accepts the string to be modified, returning the cleaned-up version:


$text = "\t\t\tlots of room to breathe";
$text = trim( $text );
print $text;
// prints "lots of room to breathe";

Of course, this might be more work than you require. You might want to keep white space at the beginning of a string but remove it from the end. You can use PHP's rtrim() function exactly the same as you would trim(). Only white space at the end of the string argument is removed, however:


$text = "\t\t\tlots of room to breathe  ";
$text = rtrim( $text );
print $text;
// prints "      lots of room to breathe";

PHP provides the ltrim() function to strip white space only from the beginning of a string. Once again, this is called with the string you want to transform and returns a new string, shorn of tabs, newlines, and spaces:


$text = "\t\t\tlots of room to breathe  ";
$text = ltrim( $text );
print "<pre>$text</pre>";
// prints "lots of room to breathe  ";

Notice that we wrapped the $text variable in a <pre> element. Remember that the <pre> element preserves space and newlines, so we can use it to check on the performance of the ltrim() function.
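
To compare the three functions side by side, a minimal sketch can apply each one to the same padded string. The square brackets are only an illustrative device for making the surviving white space visible, and the sample string is an assumed test value:

```php
<?php
// Assumed test value: three leading tabs and two trailing spaces.
$text = "\t\t\tlots of room to breathe  ";

// Brackets make any remaining white space visible in the output.
print "[" . trim( $text ) . "]\n";   // both ends stripped
print "[" . rtrim( $text ) . "]\n";  // leading tabs kept
print "[" . ltrim( $text ) . "]\n";  // trailing spaces kept

// strlen() confirms how many characters each call removed.
print strlen( $text ) . "\n";          // 28
print strlen( trim( $text ) ) . "\n";  // 23
```

None of the calls alters $text itself. Each returns a new string, which is why the earlier fragments assign the result back to the variable.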

PHP by its nature tends to work with markup text. It is not unusual to have to remove tags from a block to present it without formatting. PHP provides the strip_tags() function, which accepts two arguments, for this purpose. The first argument it accepts is the text to transform. The second argument is optional and should be a list of HTML tags that strip_tags() can leave in place. Tags in the exception list should not be separated by any characters, like so:


$string = "<p>I <i>simply</i> will not have it,";
$string .= "<br/>said Mr Dean</p><b>The end</b>";
print strip_tags( $string, "<p><br>" );

In the previous code fragment, we create an HTML-formatted string. When we call strip_tags(), we pass it the $string variable and a list of exceptions. The result is that the <p> and <br/> elements are left in place and all other tags are stripped out.
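
As a rough check of how the exception list behaves, the sketch below strips the same string with and without allowed tags. It assumes a reasonably recent PHP version, where the manual states that self-closing forms in the exception list are ignored, so the break tag is listed as <br> without the slash. The variable names $plain and $allowed are illustrative:

```php
<?php
$string = "<p>I <i>simply</i> will not have it,";
$string .= "<br/>said Mr Dean</p><b>The end</b>";

// No exception list: every tag is removed.
$plain = strip_tags( $string );
print $plain . "\n";
// I simply will not have it,said Mr DeanThe end

// Allow <p> and <br>; the <i> and <b> tags are still stripped.
$allowed = strip_tags( $string, "<p><br>" );
print $allowed . "\n";
```

Note that strip_tags() removes only the tags themselves. The text inside a stripped element, such as "The end", stays in the result.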

Replacing a Portion of a String Using substr_replace()

substr_replace() works similarly to substr() except it enables you to replace the portion of the string you extract. The function requires three arguments: the string you are transforming, the text you want to add to it, and the starting index. It also accepts an optional length argument. substr_replace() finds the portion of a string specified by the starting index and length arguments, replacing this portion with the string provided in the replace string argument and returning the entire transformed string.

In the following code fragment, to renew a user's membership code, we must replace its third and fourth characters:


<?php
$membership = "mz99xyz";
$membership = substr_replace( $membership, "00", 2, 2);
print "New membership number: $membership<br/>";
// prints "New membership number: mz00xyz"
?>
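
The optional length argument changes the behavior more than it might first appear. The following sketch, using the same sample membership code, shows three variations: a length of zero, an omitted length, and a negative starting index:

```php
<?php
$membership = "mz99xyz";

// A length of 0 inserts the new text without removing anything.
print substr_replace( $membership, "-", 2, 0 ) . "\n";   // mz-99xyz

// Omitting the length replaces everything from the start index onward.
print substr_replace( $membership, "00", 2 ) . "\n";     // mz00

// A negative start index counts back from the end of the string.
print substr_replace( $membership, "abc", -3 ) . "\n";   // mz99abc
```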



Replacing Substrings Using str_replace()

str_replace() replaces all instances of a string within another string. It requires three arguments: a search string, the replacement string, and the string on which this transformation is to be effected. The function returns the transformed string. The following example uses str_replace() to change all instances of 2003 to 2004 within a string:


$string = "Site contents copyright 2003. ";
$string .= "The 2003 Guide to All Things Good in Europe";
print str_replace("2003","2004",$string);

As of PHP 4.0.5, str_replace() has been enhanced to accept arrays as well as strings for all its arguments. This enables you to perform multiple search and replace operations on a subject string, and even on more than one subject string:


<?php
$source = array(
"The package which is at version 4.2 was released in 2000",
  "The year 2000 was an excellent period for PointyThing4.2" );
$search = array( "4.2", "2000" );
$replace = array ( "5.0", "2001" );
$source = str_replace( $search, $replace, $source );
foreach ( $source as $str )
  print "$str<br>";

// prints:
// The package which is at version 5.0 was released in 2001
// The year 2001 was an excellent period for PointyThing5.0
?>

When str_replace() is passed an array of strings for its first and second arguments, it replaces each search string with its corresponding replace string in the text to be transformed. When the third argument is an array, str_replace() returns an array of strings, having applied the search and replace operations to each string in the array.
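
Two further variations are worth a quick sketch. The first passes an array of search strings with a single replacement string. The second uses the optional fourth argument (available in PHP 5 and later) to find out how many replacements were made. The sample sentence and the $count variable name are illustrative:

```php
<?php
$source = "The package at version 4.2 was released in 2000";

// A single replacement string is applied to every search string.
print str_replace( array( "4.2", "2000" ), "?", $source ) . "\n";
// The package at version ? was released in ?

// The optional fourth argument receives the number of replacements made.
$result = str_replace( array( "4.2", "2000" ),
                       array( "5.0", "2001" ), $source, $count );
print "$result ($count replacements)\n";
// The package at version 5.0 was released in 2001 (2 replacements)
```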

Converting Case

PHP provides several functions that enable you to convert the case of a string. When you write user-submitted data to a file or database, you might want to convert it all to upper- or lowercase text first, to let you more easily compare it later. To get an uppercase version of a string, use the function strtoupper(). This function requires only the string you want to convert and returns the converted string:


$membership = "mz00xyz";
$membership = strtoupper( $membership );
print "$membership<P>"; // prints "MZ00XYZ"

To convert a string to lowercase characters, use the function strtolower(). Again, this requires the string you want to convert and returns a converted version:


$home_url = "WWW.CORROSIVE.CO.UK";
$home_url = strtolower( $home_url );
if ( ! ( strpos ( $home_url, "http://") === 0) )
  $home_url = "http://$home_url";
print $home_url; // prints "http://www.corrosive.co.uk"

PHP also provides a case function that has a useful cosmetic purpose. ucwords() makes the first letter of every word in a string uppercase. The following fragment makes the first letter of every word in a user-submitted string uppercase:


$full_name = "violet elizabeth bott";
$full_name = ucwords ( $full_name );
print $full_name; // prints "Violet Elizabeth Bott"

Although this function makes the first letter of each word uppercase, it does not touch any other letters. So, if the user had had problems with her Shift key in the previous example and submitted VIolEt eLIZaBeTH bOTt, our approach would not have done much to fix the string. We would have ended up with VIolEt ELIZaBeTH BOTt, which isn't much of an improvement. We can deal with this by making the submitted string lowercase with strtolower() before invoking ucwords():


$full_name = "VIolEt eLIZaBeTH bOTt";
$full_name =  ucwords( strtolower($full_name) );
print $full_name; // prints "Violet Elizabeth Bott"
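
The comparison use case mentioned at the start of this section can be sketched in the same way. Here we assume a hypothetical stored membership code and a submitted value that differs only in case and stray white space, and normalize both before comparing:

```php
<?php
// Hypothetical stored and submitted values.
$stored    = "mz00xyz";
$submitted = "  MZ00xyz ";

// Normalize before comparing: trim stray white space, then force lowercase.
if ( strtolower( trim( $submitted ) ) === strtolower( $stored ) ) {
  print "codes match\n";
} else {
  print "codes differ\n";
}
// prints "codes match"
```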



Wrapping Text with wordwrap() and nl2br()

When you present plain text within a Web page, you often face the problem that newlines are not displayed and your text runs together into a featureless blob. nl2br() is a convenient function that inserts an HTML line break before every newline. So


$string = "one line\n";
$string .= "another line\n";
$string .= "a third for luck\n";
print nl2br( $string );

prints the following:


one line<br />
another line<br />
a third for luck<br />

Notice that the <br> tags are output in XHTML-compliant form. This was introduced in PHP 4.0.5.
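
It is easy to assume that nl2br() swaps each newline for a tag, but the newline characters remain in the returned string. A small sketch that compares string lengths makes this visible (the $html variable name is illustrative):

```php
<?php
$string = "one line\nanother line\n";
$html = nl2br( $string );

// Each newline gains a six-character "<br />" in front of it;
// the newline itself is kept.
print strlen( $string ) . "\n";  // 22
print strlen( $html ) . "\n";    // 34
```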

nl2br() is great for honoring newlines that are already in the text you are converting. Occasionally, though, you might want to add arbitrary line breaks to format a column of text. The wordwrap() function is perfect for this; it requires one argument, the string to be transformed. By default, wordwrap() wraps lines every 75 characters and uses \n as its line-break character. So, the code fragment


$string = "Given a long line, wordwrap() is useful as a means of ";
$string .= "breaking it into a column and thereby making it easier to read";
print wordwrap($string);

would output


Given a long line, wordwrap() is useful as a means of breaking it into a
column and thereby making it easier to read

Because the lines are broken with the character \n, the formatting does not show up in HTML mode. wordwrap() has two more optional arguments: a number representing the maximum number of characters per line and a string representing the end of the line string you want to use. Applying the function call


print wordwrap( $string, 24, "<br/>\n");

to our $string variable, our output would be


Given a long line,<br/>
wordwrap() is useful as<br/>
a means of breaking it<br/>
into a column and<br/>
thereby making it easier<br/>
to read

wordwrap() doesn't automatically break at your line limit if a word has more characters than the limit. You can, however, use an optional fourth argument to enforce this. The argument is treated as a Boolean, so pass true (or a nonzero integer such as 1). Using wordwrap() in conjunction with the fourth argument, we can now wrap a string, even where it contains words that extend beyond the limit we are setting. This fragment


$string = "As usual you will find me at http://www.witteringonaboutit.com/";
$string .= "chat/eating_green_cheese/forum.php. Hope to see you there!";
print wordwrap( $string, 24, "<br/>\n", 1 );

outputs the following:


As usual you will find<br/>
me at<br/>
http://www.witteringonab<br/>
outit.com/chat/eating_gr<br/>
een_cheese/forum.php.<br/>
Hope to see you there!
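
One way to confirm that a wrap respects the limit is to split the wrapped text back into lines and measure each one. The sketch below does this with explode(), which is covered in the next section. The $lines variable and the choice of a plain newline as the break string are illustrative:

```php
<?php
$string = "Given a long line, wordwrap() is useful as a means of ";
$string .= "breaking it into a column and thereby making it easier to read";

// Wrap at 24 characters using a plain newline, then split into lines.
$lines = explode( "\n", wordwrap( $string, 24, "\n" ) );

foreach ( $lines as $line ) {
  // Show each line alongside its length.
  print strlen( $line ) . ": $line\n";
}
print count( $lines ) . " lines\n";   // 6 lines
```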



Breaking Strings into Arrays with explode()

The delightfully named explode() function is similar in some ways to strtok(). explode(), though, breaks up a string into an array, which you can then store, sort, or examine as you want. explode() requires two arguments: the delimiter string you want to use to break up the source string and the source string itself. explode() optionally accepts a third argument that determines the maximum number of pieces the string can be broken into. The delimiter string can include more than one character, all of which form a single delimiter (unlike multiple delimiter characters passed to strtok(), each of which is a delimiter in its own right). The following fragment breaks up a date and stores the result in an array:


$start_date = "2000-01-12";
$date_array = explode ("-",$start_date);
// $date_array[0] == "2000"
// $date_array[1] == "01"
// $date_array[2] == "12"
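
Two common variations on this fragment are sketched below: assigning the pieces straight to named variables with list(), and using the optional third argument to cap the number of pieces. The variable names are illustrative:

```php
<?php
$start_date = "2000-01-12";

// list() assigns the pieces to named variables in one step.
list( $year, $month, $day ) = explode( "-", $start_date );
print "$day/$month/$year\n";          // 12/01/2000

// A limit of 2 leaves the remainder of the string in the last element.
$parts = explode( "-", $start_date, 2 );
print $parts[1] . "\n";               // 01-12
```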



Formatting Numbers As Text

We have already looked at printf() and sprintf(), which are powerful functions for formatting numbers of all types in a string context. printf() is not, however, an ideal tool for adding commas to larger numbers. For that, we can turn to number_format().

At a minimum, number_format() accepts a number to be transformed. It returns a string representation of the number with commas inserted after every three digits, as shown here:


print number_format(100000.56 );
// 100,001

In the previous fragment, we pass 100000.56 to number_format(), and it returns 100,001. It has removed the decimal part and rounded the number up and has also inserted a comma. We might want to keep the full number, so number_format() enables us to determine the precision we require using a second argument: an integer. Here's how:


print number_format( 100000.56, 2 );
// 100,000.56
print number_format( 100000.56, 4 );
// 100,000.5600

We can even alter the characters used to represent the decimal point and the thousands separator. To do this, we pass two further strings to number_format(): the first representing the decimal point and the second representing the thousands separator:


print number_format (100000.56, 2, "-", " ");
// 100 000-56
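
To keep the argument order straight, it can help to see all three call forms applied to one value. The amount below is an arbitrary sample, and the last call uses the separators common in much of continental Europe:

```php
<?php
$amount = 1234567.891;

print number_format( $amount ) . "\n";               // 1,234,568
print number_format( $amount, 2 ) . "\n";            // 1,234,567.89
// Third argument: decimal point. Fourth argument: thousands separator.
print number_format( $amount, 2, ",", "." ) . "\n";  // 1.234.567,89
```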


Formatting Currency with money_format()


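The money_format() function is not available on Windows platforms. It was also deprecated in PHP 7.4 and removed in PHP 8.0, so the examples in this section require an earlier PHP version.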


Although the printf() function is a useful way of presenting currency data, as of PHP 4.3, a more specialized tool has become available. money_format() is similar to printf() and sprintf() in that it works with a format specification to transform its data.

money_format() requires two arguments: a string containing a format specification and a double. It returns a formatted string. In contrast to printf(), you cannot pass the function additional arguments, so you should use it to format one number at a time.

The format specification should begin with a percent symbol and can be followed by optional flags, a field width specifier, left and right precision specifiers, and a conversion character. Of these, only the percent character and the conversion character are required.

The output of this function is affected by the locale of your system. This determines the symbol used for currency, the decimal point character, and other attributes that change from region to region. For our examples, we will use the setlocale() function to set the context to U.S. English explicitly:


setlocale( LC_ALL, 'en_US' );
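
Locale names vary from system to system, and the function returns false when none of the requested locales is installed, so it is worth checking the result. The sketch below passes several candidate names, which are assumptions about what a given system might call its U.S. English locale, and reports which one was accepted:

```php
<?php
// Candidate names are assumptions; the installed locales differ by system.
$locale = setlocale( LC_MONETARY, 'en_US.UTF-8', 'en_US',
                     'English_United States.1252' );

if ( $locale === false ) {
  print "U.S. English locale not available\n";
} else {
  print "Using locale: $locale\n";
}
```

Using LC_MONETARY instead of LC_ALL limits the change to currency formatting, which is all that the examples here rely on.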

Having done this, we can set up some test values and store them in an array:


$cash_array = array( 235.31, 5, 2000000.45 );

Let's take a look at the most basic format specification possible:


foreach ( $cash_array as $cash ) {
  print money_format( "%n\n", $cash );
}
/*
$235.31
$5.00
$2,000,000.45
*/

We pass a string and a floating-point number, stored in the $cash variable, to money_format(). The format specification is made up of the % character and a conversion character (n), which stands for "national." This conversion character causes the number to be formatted according to national conventions for money. In this case, it signifies the use of the dollar character, as well as commas inserted to break up the thousands in larger numbers. The alternative conversion specifier is i, which causes an international format to be applied. Replacing the n specifier with an i specifier in the previous fragment would yield the following:


USD 235.31
USD 5.00
USD 2,000,000.45

A field width specifier can optionally follow the percent character (or follow the flags described next if they are set). This provides padding to ensure that the output matches at least the given number of characters:


foreach ( $cash_array as $cash ) {
  print money_format("%40n\n", $cash);
}
/*
                                 $235.31
                                   $5.00
                           $2,000,000.45
*/

In the previous fragment, we set the field width to 40 simply by adding 40 to the format specification after the percent sign. Notice that the numbers are right-aligned by default.
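
On platforms where money_format() is unavailable, the basic %n output and the field width behavior can be approximated with number_format() and str_pad(). This is only a rough sketch: the helper name format_usd() is hypothetical, the dollar sign is hard-coded rather than taken from the locale, and negative amounts are not handled:

```php
<?php
// Hypothetical helper: format a non-negative amount as U.S. dollars,
// right-aligned in a field of at least $width characters.
function format_usd( $amount, $width = 0 ) {
  $text = '$' . number_format( $amount, 2 );
  return str_pad( $text, $width, " ", STR_PAD_LEFT );
}

$cash_array = array( 235.31, 5, 2000000.45 );
foreach ( $cash_array as $cash ) {
  // A width of 15 is an arbitrary choice for this demonstration.
  print format_usd( $cash, 15 ) . "\n";
}
```

As with the field width specifier, str_pad() never truncates: a value wider than the requested width is returned unchanged.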

We can also define padding for the left side of the decimal point in a number using a left precision specifier. This follows the field width specifier and consists of a hash character (#) followed by a number representing the number of characters to pad:


foreach ( $cash_array as $cash ) {
  print money_format("%#10n\n", $cash);
}
/*
 $          235.31
 $            5.00
 $    2,000,000.45
*/

In the example, we used #10 to pad the left side of the decimal point. Notice that the gap between the dollar character and the decimal point is greater than 10 characters. This allows room for the grouping characters (that is, the commas that separate the thousands in numbers to aid readability). So, to combine a field width of 40 with a left precision of 10, we would use %40#10n. This would give us the following output:


                       $          235.31
                       $            5.00
                       $    2,000,000.45

We can also control the number of decimal places to display using the right precision specifier. This follows the left precision specifier and consists of a decimal point and the number of decimal places to display. To show five decimal places, we might extend the previous format specification: %40#10.5n. This would give the following output:


                    $          235.31000
                    $            5.00000
                    $    2,000,000.45000

Finally, you can use optional flags directly after the percent character to change the way in which formatting occurs. Table 8.3 lists the available flags and shows their effects on output when applied to a format specifier of %#10n. Let's take a look at the effect of this format specifier without a flag:


print money_format("%#10n", -2000000.45);
/*
-$    2,000,000.45
*/

Table 8.3. Format Specifier Flags

Flag   Description                                  Example Format   Example Output
!      Suppress currency character                  %!#10n           -    2,000,000.45
^      Suppress number grouping                     %^#10n           -$   2000000.45
+      Include +/- symbol                           %+#10n           -$    2,000,000.45
(      Use parentheses for negative numbers         %(#10n           ($    2,000,000.45)
-      Left-justify (default is right-justify)      %-#10n           -$2,000,000.45
=n     Use the character n to fill left padding     %=.#10n          -$....2,000,000.45
