
3.4. String Literals

A string literal consists of a sequence of characters (and/or escape sequences) enclosed in double quotation marks. Example:

    "Hello world!\n"

Like character constants, string literals may contain all the characters in the source character set. The only exceptions are the double quotation mark ", the backslash \, and the newline character, which must be represented by escape sequences. The following printf statement first produces an alert tone, then indicates a documentation directory in quotation marks, substituting the string addressed by the pointer argument doc_path for the conversion specification %s:

    char doc_path[128] = ".\\share\\doc";
    printf("\aSee the documentation in the directory \"%s\"\n", doc_path);

A string literal is a static array of char that contains character codes followed by a string terminator, the null character \0 (see also Chapter 8). The empty string "" occupies exactly one byte in memory, which holds the terminating null character. Characters that cannot be represented in one byte are stored as multibyte characters.

As illustrated in the previous example, you can use a string literal to initialize a char array. A string literal can also be used to initialize a pointer to char:

    char *pStr = "Hello, world!";     // pStr points to the first character, 'H'

In such an initializer, the string literal represents the address of its first element, just as an array name would.

In Example 3-1, the array error_msg contains three pointers to char, each of which is assigned the address of the first character of a string literal.

Example 3-1. Sample function error_exit( )
#include <stdlib.h>
#include <stdio.h>
void error_exit(unsigned int error_n)  // Print a last error message
{                                      // and exit the program.
  char * error_msg[ ] = { "Unknown error code.\n",
                         "Insufficient memory.\n",
                         "Illegal memory access.\n" };
  unsigned int arr_len = sizeof(error_msg)/sizeof(char *);

  if ( error_n >= arr_len )
     error_n = 0;
  fputs( error_msg[error_n], stderr );
  exit(1);
}

As with wide-character constants, you can also specify string literals as strings of wide characters by using the prefix L:

    L"Here's a wide-string literal."

A wide-string literal defines a null-terminated array whose elements have the type wchar_t. The array is initialized by converting the multibyte characters in the string literal to wide characters in the same way as the standard function mbstowcs( ) ("multibyte string to wide-character string") would do. Similarly, any universal character names indicated by escape sequences in the string literal are stored as individual wide characters.

In the following example, \u03b1 is the universal name for the character α, and wprintf( ) is the wide-character version of the printf function, which formats and prints a string of wide characters:

    double angle_alpha = 90.0/3;
    wprintf( L"Angle \u03b1 measures %lf degrees.\n", angle_alpha );

If any multibyte character or escape sequence in a string literal is not representable in the execution character set, then the value of the string literal is not specified; in other words, its value depends on the given compiler.

The compiler's preprocessor concatenates any adjacent string literals (that is, those which are separated only by whitespace) into a single string. As the following example illustrates, this concatenation also makes it simple to break up a string into several lines for readability:

    #define PRG_NAME "EasyLine"
    char msg[ ] = "The installation of " PRG_NAME
                 " is now complete.";

If any of the adjacent component strings is a wide-string literal, then the string that results from their concatenation is also a wide-character string.

Another way to break a string literal into several lines is to end a line with a backslash, as in this example:

    char info[ ] =
    "This is a string literal broken up into\
     several source code lines.\nNow one more line:\n\
    that's enough, the string ends here.";

The string continues at the beginning of the next line: any spaces at the left margin, such as the space before several in the preceding example, are part of the string literal. Furthermore, the string literal defined here contains exactly two newline characters: one immediately before Now, and one immediately before that's.

The compiler interprets escape sequences before concatenating adjacent strings (see the section "The C Compiler's Translation Phases" in Chapter 1). As a result, the following two string literals form one wide-character string that begins with the two characters '\xA7' and '2':

    L"\xA7" L"2 et cetera"

However, if the string is written in one piece as L"\xA72 et cetera", then the first character in the string is the wide character '\xA72'.

Although a C compiler is not required to reject code that modifies a string literal, the behavior of such a modification is undefined, so you should not attempt it. In the following example, the second statement is an attempt to replace the first character of a string:

    char *p = "house";         // Initialize a pointer to char.
    *p = 'm';                  // This is not a good idea!

This statement is not portable, and causes a run-time error on some systems. For one thing, the compiler, treating the string literal as a constant, may place it in read-only memory, so that the attempted write operation causes a fault. For another, if two or more identical string literals are used in the program, the compiler may store them at the same location, so that modifying one causes unexpected results when you access another.

However, if you use a string literal to initialize an array variable, you can then modify the contents of the array:

    char s[ ] = "house";        // Initialize an array of char.
    s[0] = 'm';                // Now the array contains the string "mouse".

