Perl Cookbook

Chapter 1: Strings

1.9. Controlling Case

Problem

A string in uppercase needs converting to lowercase, or vice versa.

Solution

Use the lc and uc functions or the \L and \U string escapes.

use locale;                     # needed in 5.004 or above

$big = uc($little);             # "bo peep" -> "BO PEEP"
$little = lc($big);             # "JOHN"    -> "john"
$big = "\U$little";             # "bo peep" -> "BO PEEP"
$little = "\L$big";             # "JOHN"    -> "john"

To alter just one character, use the lcfirst and ucfirst functions or the \l and \u string escapes.

$big = "\u$little";             # "bo"      -> "Bo"
$little = "\l$big";             # "BoPeep"  -> "boPeep"

Discussion

The functions and string escapes look different, but both do the same thing. You can set the case of either the first character or the whole string. You can even do both at once to force uppercase on initial characters and lowercase on the rest.

The use locale directive tells Perl's case-conversion functions and pattern-matching engine to respect your language environment, allowing for characters with diacritical marks, and so on. A common mistake is to use tr/// to convert case. (We're aware that the old Camel book recommended tr/A-Z/a-z/. In our defense, that was the only way to do it back then.) This won't work in all situations because when you say tr/A-Z/a-z/ you have omitted all characters with umlauts, accent marks, cedillas, and other diacritics used in dozens of languages, including English. The uc and \U case-changing commands understand these characters and convert them properly, at least when you've said use locale. (An exception is that in German, the uppercase form of ß is SS, but it's not in Perl.)
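
To make the tr/// pitfall concrete, here is a minimal sketch comparing the two approaches on a word with an accented letter. It assumes Perl 5.12 or later and swaps in use feature 'unicode_strings' for use locale so that the result does not depend on how the machine's locale is configured; that substitution and the hex_codes helper are assumptions of this example, not part of the recipe.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # assumption: Perl 5.12+, Unicode rules instead of use locale

# Hypothetical helper: show a string as hex character codes, so the output
# is plain ASCII no matter how the terminal is set up.
sub hex_codes {
    my ($text) = @_;
    return join " ", map { sprintf "%02X", ord } split //, $text;
}

my $word = "caf\x{e9}";                 # "cafe" with an e-acute as the last letter

(my $by_tr = $word) =~ tr/a-z/A-Z/;     # touches only the 26 unaccented letters
my $by_uc  = uc($word);                 # also maps \x{e9} to its uppercase \x{c9}

print "tr: ", hex_codes($by_tr), "\n";
print "uc: ", hex_codes($by_uc), "\n";
```

The tr/// result still ends in E9, the lowercase accented letter, while uc produces C9, its uppercase form.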

use locale;                     # needed in 5.004 or above

$beast   = "dromedary";
# capitalize various parts of $beast
$capit   = ucfirst($beast);         # Dromedary
$capit   = "\u\L$beast";            # (same)
$capall  = uc($beast);              # DROMEDARY
$capall  = "\U$beast";              # (same)
$caprest = lcfirst(uc($beast));     # dROMEDARY
$caprest = "\l\U$beast";            # (same)

These capitalization-changing escapes are commonly used to make the case in a string consistent:

# capitalize each word's first character, downcase the rest
$text = "thIS is a loNG liNE";
$text =~ s/(\w+)/\u\L$1/g;
print $text;
This Is A Long Line
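
One wrinkle with the substitution above is that \w+ stops at an apostrophe, so "doN'T" would come out as "Don'T". The following sketch wraps the same idea in a function and lets apostrophes count as part of a word. The title_case name and the widened pattern are illustrative assumptions, not something the recipe prescribes.

```perl
use strict;
use warnings;

# Hypothetical helper: uppercase the first character of each word and
# lowercase the rest, treating an apostrophe inside a word as part of it.
sub title_case {
    my ($text) = @_;
    $text =~ s/(\w[\w']*)/\u\L$1/g;
    return $text;
}

print title_case("thIS is a loNG liNE"), "\n";
print title_case("doN'T paNIC"), "\n";
```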

You can also use their functional forms to do case-insensitive comparison:

if (uc($a) eq uc($b)) {
    print "a and b are the same\n";
}
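
The same trick extends to sorting. Here is a small sketch, with a hypothetical same_ignoring_case helper and made-up sample data, that folds both sides to lowercase before comparing. Using named variables in the helper also avoids any confusion with $a and $b, which sort reserves for its comparison block.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the two strings differ only in case.
sub same_ignoring_case {
    my ($left, $right) = @_;
    return lc($left) eq lc($right);
}

my @names  = ("bo peep", "JOHN", "Alice");
my @sorted = sort { lc($a) cmp lc($b) } @names;   # order ignores case

print same_ignoring_case("JOHN", "john") ? "same\n" : "different\n";
print join(", ", @sorted), "\n";
```

A plain sort would put "JOHN" ahead of "bo peep", because uppercase ASCII letters sort before lowercase ones.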

The randcap program, shown in Example 1.2, randomly capitalizes 20 percent of the letters of its input. This lets you converse with 14-year-old WaREz d00Dz.

Example 1.2: randcap

#!/usr/bin/perl -p
# randcap: filter to randomly capitalize 20% of the letters
# call to srand() is unnecessary in 5.004
BEGIN { srand(time() ^ ($$ + ($$ << 15))) }
sub randcase { rand(100) < 20 ? "\u$_[0]" : "\l$_[0]" }
s/(\w)/randcase($1)/ge;

% randcap < genesis | head -9
boOk 01 genesis

001:001 in the BEginning goD created the heaven and tHe earTh.
    
001:002 and the earth wAS without ForM, aND void; AnD darkneSS was
        upon The Face of the dEEp. and the spIrit of GOd movEd upOn
        tHe face of the Waters.

001:003 and god Said, let there be ligHt: and therE wAs LigHt.

A more interesting approach would have been to take advantage of Perl's ability to use bitwise operators on strings:

sub randcase {
    # XOR with a space ("\040") toggles the case bit of the letter passed in
    rand(100) < 20 ? ("\040" ^ $_[0]) : $_[0]
}

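That would, in 20 percent of the cases, switch the case of the letter. However, this misbehaves on 8-bit characters, and it also corrupts the digits and underscores that \w matches, because XORing them with a space yields control characters rather than letters. The original randcase function had the same problem with 8-bit characters, but applying use locale would have easily fixed it.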
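
As a rough illustration of the XOR trick on its own, the sketch below flips the case of every unaccented letter in a string and leaves everything else alone. The flip_ascii_case name and the restriction to [A-Za-z] are assumptions made for this example; the restriction is what keeps digits and punctuation from being turned into control characters.

```perl
use strict;
use warnings;

# Hypothetical helper: toggle the case of ASCII letters only.
# "\040" is a space, whose single set bit (0x20) is the case bit in ASCII.
sub flip_ascii_case {
    my ($text) = @_;
    $text =~ s/([A-Za-z])/$1 ^ "\040"/ge;
    return $text;
}

print flip_ascii_case("Bo Peep 001"), "\n";

# Without the letter-only guard, a digit turns into a control character:
printf "%d\n", ord("0" ^ "\040");
```

The first line printed is "bO pEEP 001". The second is 16, the code of the control character that "0" becomes when XORed with a space.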

This example of bitwise string operations quickly strips off all the high bits on a string:

$string &= "\177" x length($string);

Again, they'll be talking about you all over Europe, and not in the most glowing of terms, if you force all strings to seven bits.
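
A short sketch shows what that loss looks like in practice. The sample string is an assumption chosen for illustration: it is "cafe" with a Latin-1 e-acute (byte 0xE9) as its last character.

```perl
use strict;
use warnings;

my $string = "caf\xe9";                  # last byte is 0xE9, e-acute in Latin-1
$string &= "\177" x length($string);     # clear the high bit of every byte

print "$string\n";
```

Clearing the high bit turns 0xE9 into 0x69, so the accented letter silently becomes a plain "i" and the word reads "cafi".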

See Also

The uc, lc, ucfirst, and lcfirst functions in perlfunc (1) and Chapter 3 of Programming Perl; the \L, \U, \l, and \u string escapes in the "Quote and Quote-like Operators" section of perlop (1) and Chapter 2 of Programming Perl

