
3.3. Character Constants

A character constant consists of one or more characters enclosed in single quotation marks. Some examples:

    'a'   'XY'   '0'   '*'

All the characters of the source character set are permissible in character constants, except the single quotation mark ', the backslash \, and the newline character. To represent these characters, you must use escape sequences:

    '\''   '\\'   '\n'

All the escape sequences that are permitted in character constants are described in the upcoming section "Escape Sequences."

3.3.1. The Type of Character Constants

Character constants have the type int, unless they are explicitly defined as wide characters, with type wchar_t, by the prefix L. If a character constant contains one character that can be represented in a single byte, then its value is the character code of that character in the execution character set. For example, the constant 'a' in ASCII encoding has the decimal value 97. The value of character constants that consist of more than one character can vary from one compiler to another.
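As a quick check of the rule above, the following sketch compares the sizes of a character constant and a wide-character constant with the sizes of their types. The function names are hypothetical and chosen only for illustration; the value 97 for 'a' holds only if the execution character set is ASCII-based.

```c
#include <stddef.h>

/* Hypothetical helper: size of the character constant 'a'.
   In C, this equals sizeof(int), not sizeof(char). */
size_t char_constant_size(void)
{
    return sizeof 'a';
}

/* Hypothetical helper: size of a wide-character constant,
   which has the type wchar_t. */
size_t wide_constant_size(void)
{
    return sizeof L'a';
}

/* Value of 'a': 97 only if the execution character set is ASCII-based. */
int value_of_a(void)
{
    return 'a';
}
```

On a typical system with a 4-byte int, char_constant_size() yields 4, even though a char object occupies only one byte.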

The following code fragment tests whether the character read is a digit between 1 and 5, inclusive:

    #include <stdio.h>
    int c = 0;

    /* ... */

    c = getchar( );                          // Read a character.
    if ( c != EOF && c > '0' && c < '6' )   // Compare input to character
                                            // constants.
    {
      /* This block is executed if the user entered a digit from 1 to 5. */
    }
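The same comparison can be packaged as a small function, which makes the range check easy to test. This is only a sketch, and the name is_digit_1_to_5 is hypothetical. It relies on the guarantee in the C standard that the codes of the decimal digits '0' through '9' are consecutive.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if c is a digit from '1' to '5', else 0.
   The parameter is an int so that it can also hold EOF, as returned
   by getchar(). */
int is_digit_1_to_5(int c)
{
    return c != EOF && c >= '1' && c <= '5';
}
```

A call such as is_digit_1_to_5(getchar()) then replaces the condition in the if statement above.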

If the type char is signed, then the value of a character constant can also be negative, because the constant's value is the result of a type conversion of the character code from char to int. For example, ISO 8859-1 is a commonly used 8-bit character set, also known as the ISO Latin 1 or ANSI character set. In this character set, the currency symbol for pounds sterling, £, is coded as hexadecimal A3:

    int c = '\xA3';                         // Symbol for pounds sterling
    printf("Character: %c     Code: %d\n", c, c);

If the execution character set is ISO 8859-1, and the type char is signed, then the printf statement in the preceding example generates the following output:

    Character: £     Code: -93
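If a program needs the character code as a nonnegative number regardless of whether char is signed, one common approach is to convert the value to unsigned char first. The following sketch assumes an 8-bit char; the helper names are hypothetical.

```c
#include <limits.h>

/* Hypothetical helper: returns the character code as a nonnegative value,
   whether plain char is signed or unsigned on this implementation. */
int char_code(int c)
{
    return (unsigned char)c;
}

/* Returns 1 if plain char is signed on this implementation, 0 otherwise. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

With these helpers, char_code('\xA3') yields 163 (hexadecimal A3) both on implementations where '\xA3' has the value -93 and on those where it has the value 163.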

In a program that uses characters that are not representable in a single byte, you can use wide-character constants. Wide-character constants have the type wchar_t, and are written with the prefix L, as in these examples:

    L'a'   L'12'   L'\012'   L'\u03B2'

The value of a wide-character constant that contains a single multibyte character is the value that the standard function mbtowc( ) ("multibyte to wide character") would return for that multibyte character.
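To illustrate the relationship with mbtowc( ), the following sketch converts the first multibyte character of a string to a wide character. The function name first_wide_char is hypothetical. The result depends on the current locale; in the default "C" locale, only single-byte characters can be expected to convert.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical helper: converts the first multibyte character of s to a
   wide character with mbtowc(). Returns 1 on success and stores the result
   in *out; returns 0 if s is empty or does not begin with a valid
   multibyte character in the current locale. */
int first_wide_char(const char *s, wchar_t *out)
{
    mbtowc(NULL, NULL, 0);                 /* Reset the shift state. */
    int len = mbtowc(out, s, MB_CUR_MAX);  /* Examine at most MB_CUR_MAX bytes. */
    return len > 0;
}
```

For the string "a", the wide character stored by this function would be expected to compare equal to the constant L'a'.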

The value of a wide-character constant containing several characters, such as L'xy', is implementation-defined. To ensure portability, make sure your programs do not depend on such a character constant having a specific value.


3.3.2. Escape Sequences

An escape sequence begins with a backslash \, and represents a single character. Escape sequences allow you to represent any character in character constants and string literals, including nonprintable characters and characters that otherwise have a special meaning, such as ' and ". Table 3-3 lists the escape sequences recognized in C.

Table 3-3. Escape sequences

    Escape sequence                               Character value                                         Action on output device
    ---------------                               ---------------                                         -----------------------
    \'                                            A single quotation mark (')                             Prints the character.
    \"                                            A double quotation mark (")                             Prints the character.
    \?                                            A question mark (?)                                     Prints the character.
    \\                                            A backslash character (\)                               Prints the character.
    \a                                            Alert                                                   Generates an audible or visible signal.
    \b                                            Backspace                                               Moves the active position back one character.
    \f                                            Form feed                                               Moves the active position to the beginning of the next page.
    \n                                            Line feed                                               Moves the active position to the beginning of the next line.
    \r                                            Carriage return                                         Moves the active position to the beginning of the current line.
    \t                                            Horizontal tab                                          Moves the active position to the next horizontal tab stop.
    \v                                            Vertical tab                                            Moves the active position to the next vertical tab stop.
    \o, \oo, or \ooo (where o is an octal digit)  The character with the given octal code                 Prints the character.
    \xh[h...] (where h is a hexadecimal digit)    The character with the given hexadecimal code           Prints the character.
    \uhhhh, \Uhhhhhhhh                            The character with the given universal character name   Prints the character.


In the table, the active position refers to the position at which the output device prints the next output character, such as the position of the cursor on a console display. The behavior of the output device is not defined in the following cases: if the escape sequence \b (backspace) occurs at the beginning of a line; if \t (tab) occurs at the end of a line; or if \v (vertical tab) occurs at the end of a page.
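As an illustration of the simple escape sequences in the table, the following sketch maps a character value back to its source-code spelling. The function name escape_spelling is hypothetical. Note that each spelling is itself written with escape sequences: the string literal "\\n" contains the two characters \ and n.

```c
#include <stddef.h>

/* Hypothetical helper: returns the source-code spelling of the simple
   escape sequence that represents c, or NULL if c has none. */
const char *escape_spelling(int c)
{
    switch (c) {
    case '\'': return "\\'";    /* Single quotation mark */
    case '\"': return "\\\"";   /* Double quotation mark */
    case '\?': return "\\?";    /* Question mark */
    case '\\': return "\\\\";   /* Backslash */
    case '\a': return "\\a";    /* Alert */
    case '\b': return "\\b";    /* Backspace */
    case '\f': return "\\f";    /* Form feed */
    case '\n': return "\\n";    /* Line feed */
    case '\r': return "\\r";    /* Carriage return */
    case '\t': return "\\t";    /* Horizontal tab */
    case '\v': return "\\v";    /* Vertical tab */
    default:   return NULL;     /* No simple escape sequence */
    }
}
```

A function like this could be used to display control characters in a readable form, for example when printing the contents of a buffer for debugging.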

As Table 3-3 shows, universal character names are also considered escape sequences. Universal character names allow you to specify any character in the extended character set, regardless of the encoding used. See "Universal Character Names" in Chapter 1 for more information.
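The following sketch shows a universal character name in a wide-character constant. The function name is hypothetical. The check assumes that the implementation defines the macro __STDC_ISO_10646__, which indicates that wchar_t values are ISO/IEC 10646 (Unicode) code points; on other implementations, the numeric value of the constant is implementation-defined, and the function reports that it cannot decide.

```c
#include <stddef.h>

/* Hypothetical helper: reports whether the wide-character constant for
   the Greek small letter beta, written with the universal character
   name 03B2, has the numeric value 0x3B2.
   Returns 1 or 0 if __STDC_ISO_10646__ is defined; returns -1 if the
   macro is not defined, because the value is then implementation-defined. */
int beta_has_unicode_value(void)
{
#ifdef __STDC_ISO_10646__
    return L'\u03B2' == 0x03B2;
#else
    return -1;
#endif
}
```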

You can also specify any character code in the value range of the type unsigned char, or any wide-character code in the value range of wchar_t, using the octal and hexadecimal escape sequences, as shown in Table 3-4.

Table 3-4. Examples of octal and hexadecimal escape sequences

    Octal           Hexadecimal   Description
    -----           -----------   -----------
    '\0'            '\x0'         The null character.
    '\033', '\33'   '\x1B'        The control character ESC ("escape").
    '\376'          '\xfe'        The character with the decimal code 254.
    '\417'          '\x10f'       Illegal, as the numeric value is beyond the range of the type unsigned char.
    L'\417'         L'\x10f'      That's better! It's now a wide-character constant; the type is wchar_t.
    -               L'\xF82'      Another wide-character constant.


There is no equivalent octal notation for the last constant in the table, L'\xF82', because octal escape sequences cannot have more than three octal digits. For the same reason, the wide-character constant L'\3702' consists of two characters: L'\370' and L'2'.
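The three-digit limit on octal escape sequences applies to ordinary string literals as well. The following sketch, with hypothetical function names, uses the narrow string literal "\3702" to show that it contains two characters, and checks that the octal and hexadecimal spellings of ESC denote the same value.

```c
#include <stddef.h>

/* Number of characters in "\3702", not counting the terminating null
   character. The octal escape sequence ends after three digits, so the
   string holds '\370' followed by '2'. */
size_t octal_example_length(void)
{
    return sizeof "\3702" - 1;
}

/* Returns 1 if all three spellings of the ESC character denote the
   same value, decimal 27. */
int esc_spellings_agree(void)
{
    return '\033' == '\33' && '\33' == '\x1B' && '\x1B' == 27;
}
```

Unlike octal escape sequences, a hexadecimal escape sequence has no fixed maximum length: it continues for as long as hexadecimal digits follow.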

