1.0 Introduction

A string is one of the fundamental building blocks of data that JavaScript works with. Any script that touches URLs or user entries in form text boxes works with strings. Most document object model properties are string values. Data that you read or write to a browser cookie is a string. Strings are everywhere!

The core JavaScript language has a repertoire of the common string manipulation properties and methods that you find in most programming languages. You can tear apart a string character by character if you like, change the case of all letters in the string, or work with subsections of a string. Most scriptable browsers now in circulation also benefit from the power of regular expressions, which greatly simplify numerous string manipulation tasks once you surmount a fairly steep learning curve.
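
As a rough illustration of the three operations just mentioned, the following sketch uses the standard charAt( ), toUpperCase( ), toLowerCase( ), and substring( ) methods. The variable names and sample value are assumptions chosen for the example:

```javascript
// Sample value and variable names are illustrative only.
var pet = "Fluffy";

// Tear the string apart one character at a time with charAt().
var letters = [];
for (var i = 0; i < pet.length; i++) {
    letters.push(pet.charAt(i));
}
// letters is now ["F", "l", "u", "f", "f", "y"]

// Change the case of every letter in the string.
var shout = pet.toUpperCase();    // "FLUFFY"
var whisper = pet.toLowerCase();  // "fluffy"

// Work with a subsection: index 0 up to, but not including, index 5.
var fluff = pet.substring(0, 5);  // "Fluff"
```

None of these methods changes the original string; each returns a new value.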

Your scripts will commonly be handed values that are already string data types. For instance, if you need to inspect the text that a user has entered into a form's text box, the value property of that text box object returns a value already typed as a string. All properties and methods of any string object are immediately available for your scripts to operate on that text box value.

1.0.1 Creating a String

If you need to create a string, you can do so in a couple of ways. The simplest is to assign a quoted string of characters to a variable (or object property):

var myString = "Fluffy is a pretty cat.";

Quotes around a JavaScript string can be either single or double quotes, but each pair must be of the same type. Therefore, both of the following statements are acceptable:

var myString = "Fluffy is a pretty cat.";
var myString = 'Fluffy is a pretty cat.';

But the following mismatched pair is illegal and throws a script error:

var myString = "Fluffy is a pretty cat.';

Having the two sets of quote symbols is handy when you need to embed one string within another. The following document.write( ) statement, which would execute while a page loads into the browser, has one outer string (the entire string being written by the method) and nested sets of quotes that surround the string values of HTML element attributes:

document.write("<img src='img/logo.jpg' height='30' width='100' alt='Logo'>");

You are also free to reverse the order of double and single quotes as your style demands. Thus, the above statement would be interpreted the same way if it were written as follows:

document.write('<img src="img/logo.jpg" height="30" width="100" alt="Logo">');

Two more levels of nesting are also possible if you use escape characters with the quote symbols. See Recipe 1.8 for examples of escaped character usage in JavaScript strings.
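
As a brief sketch of the idea (not a substitute for the fuller treatment in Recipe 1.8), a backslash in front of a quote symbol lets that symbol appear inside a string delimited by the same kind of quote. The variable names and sample text here are assumptions:

```javascript
// Escaped double quotes inside a double-quoted string.
var doubleEscaped = "She said, \"Fluffy is a pretty cat.\"";

// The same text needs no escapes when the outer quotes are single.
var singleWrapped = 'She said, "Fluffy is a pretty cat."';

// An escaped apostrophe inside a single-quoted string.
var apostrophe = 'Fluffy\'s bowl is empty.';
```

The first two variables hold identical string values; the backslashes are not part of the stored text.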

Technically speaking, the strings described so far aren't precisely string objects in the purest sense of JavaScript. They are string values, which, as it turns out, lets the strings use all of the properties and methods of the global String object that inhabits every scriptable browser window. Use string values for all of your JavaScript text manipulation. In a few rare instances, however, a JavaScript string value isn't quite good enough. You may encounter this situation if you are using JavaScript to communicate with a Java applet, and one of the applet's public methods requires an argument as a string data type. In this case, you might need to create a full-fledged instance of a String object and pass that object as the method argument. To create such an object, use the constructor function of the String object:

var myString = new String("Fluffy is a pretty cat.");

The data type of the myString variable after this statement executes is object rather than string. But this object inherits all of the same String object properties and methods that a string value has, and works fine with a Java applet.
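
The difference between the two forms can be checked with the typeof operator. The following is a minimal sketch; the variable names are assumptions for illustration:

```javascript
var strValue = "Fluffy is a pretty cat.";
var strObject = new String("Fluffy is a pretty cat.");

// typeof distinguishes a string value from a String object.
var valueType = typeof strValue;     // "string"
var objectType = typeof strObject;   // "object"

// Both forms offer the same String properties and methods.
var valueLength = strValue.length;          // 23
var objectLength = strObject.length;        // 23
var objectUpper = strObject.toUpperCase();  // "FLUFFY IS A PRETTY CAT."

// valueOf() recovers a plain string value from the object.
var backToValue = strObject.valueOf();
var backType = typeof backToValue;          // "string"
```

Note that the methods of a String object return ordinary string values, as toUpperCase( ) does here.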

1.0.2 Regular Expressions

For the uninitiated, regular expressions can be cryptic and confusing. This isn't the forum to teach you regular expressions from scratch, but perhaps the recipes in this chapter that demonstrate them will pique your interest enough to pursue their study.

The purpose of a regular expression is to define a pattern of characters that you can then use to compare against an existing string. If the string contains characters that match the pattern, the regular expression tells you where the match is within the string, facilitating further manipulation (perhaps a search-and-replace operation). Regular expression patterns are powerful entities because they let you go much further than simply defining a pattern of fixed characters. For example, you can define a pattern to be a sequence of five numerals bounded on each side by whitespace. Another pattern can define the format for a typical email address, regardless of the length of the username or domain, but the full domain must include at least one period.
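
To make the first example concrete, here is one possible pattern for five numerals bounded on each side by whitespace, along with standard String methods that report and use the match. The sample address, variable names, and replacement text are assumptions:

```javascript
// Sample text is illustrative only.
var address = "Springfield, IL 62704 USA";

// Whitespace, exactly five numerals, whitespace.
var zipPattern = /\s\d{5}\s/;

// search() returns the index where the match begins (-1 if none).
var position = address.search(zipPattern);   // 15

// match() returns an array whose first element is the matched text.
var found = address.match(zipPattern)[0];    // " 62704 "

// replace() performs a search-and-replace using the pattern.
var masked = address.replace(zipPattern, " [ZIP] ");
// masked is "Springfield, IL [ZIP] USA"
```

Because the pattern includes the surrounding whitespace, the matched text and the replacement both carry the spaces along.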

The cryptic part of regular expressions is the notation they use to specify the various conditions within the pattern. JavaScript regular expression notation is nearly identical to that found in languages such as Perl. The syntax is the same except for some of the more esoteric uses. One definite difference is the way you create a regular expression object from a pattern. You can use either the formal constructor function or shortcut syntax. The following two syntax examples create the same regular expression object:

var re = /pattern/ [g | i | gi];                          // Shortcut syntax
var re = new RegExp(["pattern", ["g" | "i" | "gi"]]);     // Formal constructor
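
A concrete instance of the two forms might look like the following sketch. The pattern and variable names are assumptions. One practical detail: because the constructor takes the pattern as a string, each backslash in the pattern must itself be escaped:

```javascript
// Shortcut (literal) syntax.
var reLiteral = /\bcat\b/gi;

// Formal constructor: backslashes are doubled inside the string.
var reConstructed = new RegExp("\\bcat\\b", "gi");

// Both objects describe the same pattern and modifiers.
var sameSource = (reLiteral.source === reConstructed.source);          // true
var sameGlobal = (reLiteral.global === reConstructed.global);          // true
var sameCase = (reLiteral.ignoreCase === reConstructed.ignoreCase);    // true
```

The constructor form is useful when the pattern is assembled from a string at run time, such as text entered by a user.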

The optional trailing characters (g, i, and gi) indicate whether the pattern should be applied globally and whether the pattern is case-insensitive. Internet Explorer 5.5 or later for Windows and Netscape 6 or later also recognize the optional m modifier, which influences string boundary pattern matching within multiline strings.
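
The effect of each modifier is easiest to see side by side. This sketch applies the same pattern with different modifiers through the standard replace( ) and match( ) methods; the sample strings and variable names are assumptions:

```javascript
var text = "Cat cat cat";

// No modifier: first case-sensitive match only.
var plain = text.replace(/cat/, "dog");      // "Cat dog cat"

// i: case-insensitive, still only the first match.
var insensitive = text.replace(/cat/i, "dog");   // "dog cat cat"

// g: every case-sensitive match.
var global = text.replace(/cat/g, "dog");    // "Cat dog dog"

// gi: every match, regardless of case.
var both = text.replace(/cat/gi, "dog");     // "dog dog dog"

// m: ^ and $ also match at line breaks within the string.
var lines = "Sally\nSally";
var withoutM = lines.match(/^Sally/g).length;    // 1
var withM = lines.match(/^Sally/gm).length;      // 2
```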

For readers who have been exposed to regular expressions in the past, Table 1-1 lists the regular expression pattern notation available in browsers since NN 4 and IE 4.

Table 1-1. Regular expression notation

Character   Matches                                   Example
---------   ----------------------------------------  ------------------------------------
\b          Word boundary                             /\bto/ matches "tomorrow"
                                                      /to\b/ matches "Soweto"
                                                      /\bto\b/ matches "to"
\B          Word nonboundary                          /\Bto/ matches "stool" and "Soweto"
                                                      /to\B/ matches "stool" and "tomorrow"
                                                      /\Bto\B/ matches "stool"
\d          Numeral 0 through 9                       /\d\d/ matches "42"
\D          Nonnumeral                                /\D\D/ matches "to"
\s          Single whitespace                         /under\sdog/ matches "under dog"
\S          Single nonwhitespace                      /under\Sdog/ matches "under-dog"
\w          Letter, numeral, or underscore            /1\w/ matches "1A"
\W          Not a letter, numeral, or underscore      /1\W/ matches "1%"
.           Any character except a newline            /../ matches "Z3"
[...]       Any one of the character set in brackets  /J[aeiou]y/ matches "Joy"
[^...]      Negated character set                     /J[^eiou]y/ matches "Jay"
*           Zero or more times                        /\d*/ matches "", "5", or "444"
?           Zero or one time                          /\d?/ matches "" or "5"
+           One or more times                         /\d+/ matches "5" or "444"
{n}         Exactly n times                           /\d{2}/ matches "55"
{n,}        n or more times                           /\d{2,}/ matches "555"
{n,m}       At least n, at most m times               /\d{2,4}/ matches "5555"
^           At beginning of a string or line          /^Sally/ matches "Sally says..."
$           At end of a string or line                /Sally.$/ matches "Hi, Sally."
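
Several of the entries in Table 1-1 can be verified with the test( ) method of a regular expression object, which returns true when the pattern matches somewhere in the string. The following sketch is only a spot check; the variable names, and the anchored patterns in the last group, are assumptions added for illustration:

```javascript
// Each expression below evaluates to true.
var matches = [
    /\bto/.test("tomorrow"),
    /to\b/.test("Soweto"),
    /\Bto\B/.test("stool"),
    /under\sdog/.test("under dog"),
    /under\Sdog/.test("under-dog"),
    /J[aeiou]y/.test("Joy"),
    /J[^eiou]y/.test("Jay"),
    /Sally.$/.test("Hi, Sally.")
];

// Each expression below evaluates to false.
var misses = [
    /\bto\b/.test("stool"),      // "to" is buried inside the word
    /J[^eiou]y/.test("Joy"),     // "o" is in the negated set
    /^\d{2,4}$/.test("55555")    // five numerals exceed the maximum of four
];
```

The ^ and $ anchors in the last pattern matter: without them, /\d{2,4}/ would still find a match inside "55555" because four of the five numerals satisfy it.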

Recipe 1.5 through Recipe 1.7, as well as Recipe 8.2, show how regular expressions can empower a variety of string examination operations with less overhead than more traditional string manipulations. For in-depth coverage of regular expressions, see Mastering Regular Expressions, by Jeffrey E. F. Friedl (O'Reilly).
