Chapter 3. Regular Expressions

Regular expressions (regex) are one of the black arts of practical computer programming. Ask any programmer, and chances are that he or she will, at some point, have had serious problems with them (or, even worse, avoided them altogether).

Yet regular expressions, although complicated, are not really difficult to understand. Fundamentally, they are a way to describe patterns of text using a single set of strings. Unlike a simple search-and-replace operation, such as replacing all instances of "Marco" with "Tabini," regex allows for much more flexibility: for example, finding all instances of the letters "Mar" followed by either "co" or "k," and so forth.
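
As a minimal sketch of the "Mar" example above, the following uses PHP's Perl-compatible preg_match_all() function. The sample sentence and the variable names are invented for illustration, and the pattern shown is only one of several ways to express the same idea.

```php
<?php
// Hypothetical sample text, made up for this illustration.
$text = "Marco met Mark and Maria in March.";

// "Mar" followed by either "co" or "k"; the parentheses group the two
// alternatives, and the slashes are the pattern delimiters.
$pattern = '/Mar(co|k)/';

// preg_match_all() returns the number of full matches and fills $matches;
// $matches[0] holds the complete matched strings.
$count = preg_match_all($pattern, $text, $matches);

echo $count, "\n";                      // 2
echo implode(", ", $matches[0]), "\n";  // Marco, Mark

// A plain search-and-replace, by contrast, handles one fixed string at a time.
echo str_replace("Marco", "Tabini", $text), "\n";
```

Note that "Maria" and "March" are skipped: both begin with "Mar", but neither continues with "co" or "k".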

Regular expressions were initially described in the 1950s by a mathematician named S. C. Kleene, who formalized models first designed by Warren McCulloch and Walter Pitts to describe the nervous system. They were not actually applied to computer science, however, until Ken Thompson (who then went on to become one of the original designers of the UNIX operating system) used them as a means to search and replace text in his qed text editor.

Regular expressions eventually made their way into the UNIX operating system (and later into the POSIX standard) and into Perl, where they are considered one of the language's strongest features. PHP actually makes both standards available, the idea being that Perl programmers will feel right at home, and beginners will be able to use the simpler POSIX expressions.