Team LiB
Previous Section Next Section

MySQL Data Types

Each table in a database contains one or more columns. When you create a table using a CREATE TABLE statement, you specify a data type for each column. A data type is more specific than a general category such as "number" or "string." For a column, the data type is the means by which you precisely characterize the kind of values the column may contain, such as SMALLINT or VARCHAR(32). This in turn determines how MySQL treats those values. For example, if you have numeric values, you could store them using either a numeric or string column, but MySQL will treat the values somewhat differently depending on what type you use. Each data type has several characteristics:

  • What kind of values it can represent.

  • How much space values take up, and whether the values are fixed-length (all values of the type take the same amount of space) or variable-length (the amount of space depends on the particular value being stored)

  • How values of the type are compared and sorted

  • Whether the type can be indexed

The following discussion surveys MySQL's data types briefly, and then describes in more detail the syntax for defining them and the properties that characterize each type, such as their range and storage requirements. The type specifications are shown as you use them in CREATE TABLE statements. Optional information is indicated by square brackets ([]). For example, the syntax MEDIUMINT[(M)] indicates that the maximum display width, specified as (M), is optional. On the other hand, for VARCHAR(M), the lack of brackets indicates that (M) is required.

Overview of Data Types

MySQL provides data types for values from all the general data value categories except the NULL value. NULL spans all types in the sense that the property of whether a column may contain NULL values is treated as a type attribute.

MySQL has numeric data types for integer, floating-point, and fixed-point values, as shown in Table 3.2. Numeric columns can be signed or unsigned. A special attribute allows sequential integer column values to be generated automatically, which is useful in applications that require a series of unique identification numbers.

Table 3.2. Numeric Data Types

Type Name

Meaning

TINYINT

A very small integer

SMALLINT

A small integer

MEDIUMINT

A medium-sized integer

INT

A standard integer

BIGINT

A large integer

FLOAT

A single-precision floating-point number

DOUBLE

A double-precision floating-point number

DECIMAL

A fixed-point number

BIT

A bit field


Table 3.3 shows the MySQL string data types. Strings can hold anything, even arbitrary binary data such as images or sounds. Strings can be compared according to whether or not they are case sensitive. In addition, you can perform pattern matching on strings. (Actually, in MySQL, you can even perform pattern matching on numeric types, but it's more commonly done with string types.)

Table 3.3. String Data Types

Type Name

Meaning

CHAR

A fixed-length non-binary string

VARCHAR

A variable-length non-binary string

BINARY

A fixed-length binary string

VARBINARY

A variable-length binary string

TINYBLOB

A very small BLOB (binary large object)

BLOB

A small BLOB

MEDIUMBLOB

A medium-sized BLOB

LONGBLOB

A large BLOB

TINYTEXT

A very small non-binary string

TEXT

A small non-binary string

MEDIUMTEXT

A medium-sized non-binary string

LONGTEXT

A large non-binary string

ENUM

An enumeration; each column value may be assigned one enumeration member

SET

A set; each column value may be assigned zero or more set members


Table 3.4 shows the MySQL date and types, where CC, YY, MM, DD hh, mm, and ss represent century, year, month, day, hour, minute, and second, respectively. For temporal values, MySQL provides types for dates and times (either combined or separate) and timestamps (a special type that allows you to track when changes were last made to a record). There is also a type for efficiently representing year values when you don't need an entire date.

Table 3.4. Date and Time Data Types

Type Name

Meaning

DATE

A date value, in 'CCYY-MM-DD' format

TIME

A time value, in 'hh:mm:ss' format

DATETIME

A date and time value, in 'CCYY-MM-DD hh:mm:ss' format

TIMESTAMP

A timestamp value, in 'CCYY-MM-DD hh:mm:ss' format

YEAR

A year value, in CCYY format


Table 3.5 shows the MySQL spatial data types. These represent various kinds of geometrical or geographical values.

Table 3.5. Spatial Data Types

Type Name

Meaning

GEOMETRY

A spatial value of any type

POINT

A point (a pair of X,Y coordinates)

LINESTRING

A curve (one or more POINT values)

POLYGON

A polygon

GEOMETRYCOLLECTION

A collection of GEOMETRY values

MULTILINESTRING

A collection of LINESTRING values

MULTIPOINT

A collection of POINT values

MULTIPOLYGON

A collection of POLYGON values


Defining Column Types in Table Definitions

To create a table, issue a CREATE TABLE statement that includes a list of the columns in the table. Here's an example that creates a table named mytbl with three columns named f, c, and i:

CREATE TABLE mytbl
(
    f FLOAT(10,4),
    c CHAR(15) NOT NULL DEFAULT 'none',
    i TINYINT UNSIGNED NULL
);

Each column has a name and a type. Various attributes may be associated with the type. The syntax for defining a column is as follows:

col_name col_type [col_attributes] [general_attributes]

The name of the column, col_name, is always first in the definition and must be a legal identifier. The precise rules for identifier syntax are given in "MySQL Naming Rules," in Chapter 2. Briefly summarized, column identifiers may be up to 64 characters long, and may consist of alphanumeric characters from the system character set, as well as the underscore and dollar sign characters ('_' and '$'). Keywords such as SELECT, DELETE, and CREATE normally are reserved and cannot be used. However, you can include other characters within an identifier or use a reserved word as an identifier if you are willing to put up with the bother of quoting it whenever you refer to it. To quote an identifier, enclose it within backtick ('`') characters. If the ANSI_QUOTES SQL mode is enabled, you also can quote identifiers within double quote ('"') characters.

col_type indicates the column data type; that is, the specific kind of values the column can hold. Some type specifiers indicate the maximum length of the values you store in the column. For others, the length is implied by the type name. For example, CHAR(10) specifies an explicit length of 10 characters, whereas TINYTEXT values have an implicit maximum length of 255 characters. Some of the type specifiers allow you to indicate a maximum display width (how many characters to use for displaying values). For floating-point and fixed-point types, you can specify the number of significant digits and number of decimal places.

Following the column's data type, you may specify optional type-specific attributes as well as more general attributes. These attributes function as type modifiers. They cause MySQL to change the way it treats column values in some way:

  • The type-specific attributes that are allowable depend on the data type you choose. For example, UNSIGNED and ZEROFILL are allowable only for numeric types, and CHARACTER SET and COLLATE are allowable only for non-binary string types.

  • The general attributes may be given for any data type, with a few exceptions. You may specify NULL or NOT NULL to indicate whether a column can hold NULL values. For most data types, you can specify a DEFAULT clause to define a default value for the column. Default value handling is described in the next section.

If multiple column attributes are present, there are some constraints on the order in which they may appear. In general, you should be safe if you specify data type-specific attributes such as UNSIGNED or ZEROFILL before general attributes such as NULL or NOT NULL.

Specifying Column Default Values

For all but BLOB and TEXT types, spatial types, or columns with the AUTO_INCREMENT attribute, you can specify a DEFAULT def_value clause to indicate that a column should be assigned the value def_value when a new row is created that does not explicitly specify the column's value. With some limited exceptions for TIMESTAMP columns, def_value must be a constant. It cannot be an expression or refer to other columns.

If a column definition includes no explicit DEFAULT clause and the column can take NULL values, its default value is NULL. Otherwise, the handling of a missing DEFAULT clause is version dependent:

  • Before MySQL 5.0.2, MySQL defines the column with a DEFAULT clause that specifies the implicit default value. The implicit default depends on the column data type:

    • For numeric columns, the default is 0, except for integer columns that have the AUTO_INCREMENT attribute. For AUTO_INCREMENT, the default is the next number in the column sequence.

    • For date and time types except TIMESTAMP, the default is the "zero" value for the type (for example, '0000-00-00' for DATE). For TIMESTAMP, the default is the current date and time for the first TIMESTAMP column in a table, and the "zero" value for any following TIMESTAMP columns. (The TIMESTAMP defaults actually are more complex and are discussed in "The TIMESTAMP Data Type" later in this chapter.)

    • For string types other than ENUM, the default is the empty string. For ENUM, the default is the first enumeration element. For SET, the default when the column cannot contain NULL is actually the empty set, but that is equivalent to the empty string.

  • From MySQL 5.0.2 on, the column is created without any DEFAULT clause. That is, it has no default value. This affects how the column is handled if new rows that do not specify a value for the column are inserted into the table:

    • When strict mode is not in effect, the column is set to the implicit default for its data type.

    • When strict mode is in effect, an error occurs if the table is transactional. The statement aborts and rolls back. For non-transactional tables, an error occurs and the statement aborts if the row is the first row inserted by the statement. If it is not the first row, you can elect to have the statement abort or to have the column set to its implicit default with a warning. The choice depends on which strict mode setting is in effect. See "How MySQL Handles Invalid Data Values" for details.

You can use the SHOW CREATE TABLE statement to see which columns have a DEFAULT value and what they are.

Numeric Data Types

MySQL's numeric data types group into three general classifications:

  • Integer types are used for numbers that have no fractional part, such as 43, -3, 0, or -798432. You can use integer columns for data represented by whole numbers, such as weight to the nearest pound, height to the nearest inch, number of stars in a galaxy, number of people in a household, or number of bacteria in a petri dish.

  • Floating-point and fixed-point types are used for numbers that may have a fractional part, such as 3.14159, -.00273, -4.78, or 39.3E+4. You can use these data types for values that may have a fractional part or that are extremely large or small.

    Some types of data you might represent as floating-point values are average crop yield, distances, money values, unemployment rates, or stock prices.

  • The BIT type is used for specifying bit-field values.

Floating-point values may be assigned to integer columns, but will be rounded to the nearest integer. Conversely, integer values may be assigned to floating-point or fixed-point columns. They are treated as having a fractional part of zero.

When you specify a number, you should not include commas as a separator. For example, 12345678.90 is legal, but 12,345,678.90 is not.

Table 3.6 shows the name and range of each numeric type, and Table 3.7 shows the amount of storage required for values of each type.

Table 3.6. Numeric Data Type Ranges

Type Specification

Range

TINYINT[(M)]

Signed values: 128 to 127 (27 to 271); Unsigned values: 0 to 255 (0 to 281)

SMALLINT[(M)]

Signed values: 32768 to 32767 (215 to 2151); Unsigned values: 0 to 65535 (0 to 2161)

MEDIUMINT[(M)]

Signed values: 8388608 to 8388607 (223 to 2231); Unsigned values: 0 to 16777215 (0 to 2241)

INT[(M)]

Signed values: 2147683648 to 2147483647 (231 to 2311); Unsigned values: 0 to 4294967295 (0 to 2321)

BIGINT[(M)]

Signed values: 9223372036854775808 to 9223372036854775807 (263 to 2631); Unsigned values: 0 to 18446744073709551615 (0 to 2641)

FLOAT[(M,D)]

Minimum non-zero values: ±1.175494351E38; Maximum non-zero values: ±3.402823466E+38

DOUBLE[(M,D)]

Minimum non-zero values: ±2.2250738585072014E308; Maximum non-zero values: ±1.7976931348623157E+308

DECIMAL([M[,D]])

Varies; range depends on M and D

BIT[(M)]

0 to 2M1


Table 3.7. Numeric Data Type Storage Requirements

Type Specification

Storage Required

TINYINT[(M)]

1 byte

SMALLINT[(M)]

2 bytes

MEDIUMINT[(M)]

3 bytes

INT[(M)]

4 bytes

BIGINT[(M)]

8 bytes

FLOAT[(M,D)]

4 bytes

DOUBLE[(M,D)]

8 bytes

DECIMAL([M[,D]])

M+2 bytes

BIT[(M)]

Varies depending on M


Integer Data Types

MySQL provides five integer types: TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT. INTEGER is a synonym for INT. These types vary in the range of values they can represent and in the amount of storage space they require. (The larger the range, the more storage is required.) Integer columns can be defined as UNSIGNED to disallow negative values; this shifts the range for the column upward to begin at 0.

When you define an integer column, you can specify an optional display size M. If given, M should be an integer from 1 to 255. It represents the number of characters used to display values for the column. For example, MEDIUMINT(4) specifies a MEDIUMINT column with a display width of 4. If you define an integer column without an explicit width, a default width is assigned. The defaults are the lengths of the "longest" values for each type. Note that displayed values are not chopped to fit within M characters. If the printable representation of a particular value requires more than M characters, MySQL displays the full value.

The display size M for an integer column relates only to the number of characters used to display column values. It has nothing to do with the number of bytes of storage space required. For example, BIGINT values require 8 bytes of storage regardless of the display width. It is not possible to magically cut the required storage space for a BIGINT column in half by defining it as BIGINT(4). Nor does M have anything to do with the range of values allowed. If you define a column as INT(3), that doesn't restrict it to a maximum value of 999.

The following statement creates a table to illustrate the default values of M and D for integer data types:

CREATE TABLE mytbl
(
    itiny     TINYINT,
    itiny_u   TINYINT UNSIGNED,
    ismall    SMALLINT,
    ismall_u  SMALLINT UNSIGNED,
    imedium   MEDIUMINT,
    imedium_u MEDIUMINT UNSIGNED,
    ireg      INT,
    ireg_u    INT UNSIGNED,
    ibig      BIGINT,
    ibig_u    BIGINT UNSIGNED
);

If you issue a DESCRIBE mytbl statement after creating the table, the number following each type name shows the value that MySQL uses by default in the absence of an explicit display width specifier:

mysql> DESCRIBE mytbl;
+-----------+-----------------------+------+-----+---------+-------+
| Field     | Type                  | Null | Key | Default | Extra |
+-----------+-----------------------+------+-----+---------+-------+
| itiny     | tinyint(4)            | YES  |     | NULL    |       |
| itiny_u   | tinyint(3) unsigned   | YES  |     | NULL    |       |
| ismall    | smallint(6)           | YES  |     | NULL    |       |
| ismall_u  | smallint(5) unsigned  | YES  |     | NULL    |       |
| imedium   | mediumint(9)          | YES  |     | NULL    |       |
| imedium_u | mediumint(8) unsigned | YES  |     | NULL    |       |
| ireg      | int(11)               | YES  |     | NULL    |       |
| ireg_u    | int(10) unsigned      | YES  |     | NULL    |       |
| ibig      | bigint(20)            | YES  |     | NULL    |       |
| ibig_u    | bigint(20) unsigned   | YES  |     | NULL    |       |
+-----------+-----------------------+------+-----+---------+-------+

Floating-Point and Fixed-Point Data Types

MySQL provides two floating-point types (FLOAT, DOUBLE), and one fixed-point type (DECIMAL). Synonymous types are DOUBLE PRECISION for DOUBLE, and NUMERIC and FIXED for DECIMAL. The REAL type is a synonym for DOUBLE by default. If the REAL_AS_DEFAULT SQL mode is enabled, REAL type is a synonym for FLOAT.

Ranges for these types differ from ranges for integer types in the sense that there is not only a maximum value that a floating-point type can represent, but also a minimum non-zero value. The minimum values provide a measure of how precise the type is, which is often important for recording scientific data. (There are, of course, corresponding negative maximum and minimum values.)

Floating-point and fixed-point types can be defined as UNSIGNED. Unlike the integer types, defining a floating-point or fixed-point type as UNSIGNED doesn't shift the type's range upward, it merely eliminates the negative end.

For each floating-point or fixed-point type, you may specify a maximum number of significant digits M and the number of decimal places D. The value of M should be from 1 to 255. The value of D may be from 0 to 30. If M is not greater than D, it is adjusted up to a value of D+1. M and D correspond to the concepts of "precision" and "scale" with which you may be familiar.

For FLOAT and DOUBLE, M and D are optional. If you omit both from the column definition, values are stored to the full precision allowed by your hardware.

For DECIMAL, M and D are optional. If D is omitted, it defaults to 0. If M is omitted as well, it defaults to 10. In other words, the following equivalences hold:

DECIMAL = DECIMAL(10) = DECIMAL(10,0)
DECIMAL(n) = DECIMAL(n,0)

FLOAT(p) syntax also is allowed. However, whereas p stands for the required number of bits of precision in standard SQL, it is treated differently in MySQL. p may range from 0 to 53 and is used only to determine whether the column stores single-precision or double-precision values. For p values from 0 to 24, the column is treated as single precision. For values from 25 to 53, the column is treated as double precision. That is, the column is treated as a FLOAT or DOUBLE with no M or D values.

The DECIMAL type is a fixed-point type. It differs from FLOAT and DOUBLE in that DECIMAL values actually are stored as strings and have a fixed number of decimals. The significance of this fact is that DECIMAL values are not subject to roundoff error the way that FLOAT and DOUBLE columns area property that makes DECIMAL especially applicable for storing currency values. The corresponding tradeoff is that DECIMAL values are not as efficient as floating-point values stored in native format that the processor can operate on directly. Also, be aware that the fixed-point properties of DECIMAL apply only to storage and retrieval. Calculations on DECIMAL values might be done using floating-point operations.

MySQL handles DECIMAL values according to the standard SQL specification, with one extension. Standard SQL requires that a type of DECIMAL(M,D) must be able to represent any value with M digits and D decimal places. For example, DECIMAL(4,2) must be able to represent values from 99.99 to 99.99. Because the sign character and decimal point must still be stored, this requires an extra two bytes, so DECIMAL(M,D) values use M+2 bytes. For DECIMAL(4,2), six bytes are needed for the "widest" value (99.99).

The MySQL extension to standard SQL occurs at the positive end of the range. The sign byte is not needed to hold a sign character, so MySQL uses it to extend the range beyond that required by the SQL standard. In other words, for DECIMAL(4,2), the maximum value that can be stored in the six bytes available is 999.99.

There are two special conditions that reduce the DECIMAL storage requirement of M+2 bytes to a lesser value:

  • If D is 0, DECIMAL values have no fractional part, and no byte need be allocated to store the decimal point. This reduces the required storage by one byte.

  • If a DECIMAL column is UNSIGNED, no sign character need be stored, also reducing the required storage by one byte.

The maximum possible range for DECIMAL is the same as for DOUBLE, but the effective range is determined by the values of M and D. If you vary M and hold D fixed, the range becomes larger as M becomes larger. If you hold M fixed and vary D, the range becomes smaller as D becomes larger, although the precision increases. These properties are shown by Table 3.8 and Table 3.9.

Table 3.8. How M Affects the Range of DECIMAL(M,D)

Type Specification

Range

DECIMAL(4,1)

999.9 to 9999.9

DECIMAL(5,1)

9999.9 to 99999.9

DECIMAL(6,1)

99999.9 to 999999.9


Table 3.9. How D Affects the Range of DECIMAL(M,D)

Type Specification

Range

DECIMAL(4,0)

9999 to 99999

DECIMAL(4,1)

999.9 to 9999.9

DECIMAL(4,2)

99.99 to 999.99


The BIT Data Type

The BIT data type was introduced in MySQL 5.0.3 as a type for holding bit-field values. When you define a BIT column, you can specify an optional maximum width M that indicates the "width" of the column in bits. M should be an integer from 1 to 64. If omitted, M defaults to 1.

Values retrieved from a BIT column are not displayed in printable form by default. To display a printable representation of bit-field values, add zero. The BIN() function also can be useful for display bit-field values or the result of computations on them.

mysql> CREATE TABLE t (b BIT(3));  # holds values from 0 to 7
mysql> INSERT INTO t (b) VALUES(0),(b'11'),(b'101'),(b'111');
mysql> SELECT BIN(b+0), BIN(b & b'101'), BIN(b | b'101') FROM t;
+----------+-----------------+-----------------+
| BIN(b+0) | BIN(b & b'101') | BIN(b | b'101') |
+----------+-----------------+-----------------+
| 0        | 0               | 101             |
| 11       | 1               | 111             |
| 101      | 101             | 101             |
| 111      | 101             | 111             |
+----------+-----------------+-----------------+

Numeric Data Type Attributes

The UNSIGNED attribute disallows negative values. It can be used with all numeric types except BIT, but is most often used with integer types. Making an integer column UNSIGNED doesn't change the "size" of the underlying data type's range; it just shifts the range upward. Consider this table definition:

CREATE TABLE mytbl
(
    itiny   TINYINT,
    itiny_u TINYINT UNSIGNED
);

itiny and itiny_u both are TINYINT columns with a range of 256 values, but differ in the set of allowable values. The range of itiny is 128 to 127, whereas the range of itiny_u is shifted up, resulting in a range of 0 to 255.

UNSIGNED is useful for columns into which you plan to store information that doesn't take on negative values, such as population counts or attendance figures. Were you to use a signed column for such values, you would use only half of the data type's range. By making the column UNSIGNED, you effectively double your usable range. For example, if you use the column for sequence numbers, it will take twice as long to run out of values if you make it UNSIGNED.

You also can specify UNSIGNED for floating-point and fixed-point columns, although the effect is slightly different than for integer columns. The range does not shift upward; instead, the upper end remains unchanged and the lower end becomes zero.

The SIGNED attribute is allowed for all numeric types that allow UNSIGNED. However, it has no effect because such types are signed by default. SIGNED serves simply to indicate explicitly in a column definition that the column allows negative values.

The ZEROFILL attribute may be specified for all numeric types except BIT. It causes displayed values for the column to be padded with leading zeros to the display width. You can use ZEROFILL when you want to make sure column values always display using a given number of digits. Actually, it's more accurate to say "a given minimum number of digits" because values wider than the display width are displayed in full without being chopped. You can see this by issuing the following statements:

mysql> DROP TABLE IF EXISTS mytbl;
mysql> CREATE TABLE mytbl (my_zerofill INT(5) ZEROFILL);
mysql> INSERT INTO mytbl VALUES(1),(100),(10000),(1000000);
mysql> SELECT my_zerofill FROM mytbl;
+-------------+
| my_zerofill |
+-------------+
|       00001 |
|       00100 |
|       10000 |
|     1000000 |
+-------------+

Note that the final value, which is wider than the column's display width, is displayed in full.

If you specify the ZEROFILL attribute for a column, it automatically becomes UNSIGNED as well.

One other attribute, AUTO_INCREMENT, is intended only for use with integer data types. Specify the AUTO_INCREMENT attribute when you want to generate a series of unique identifier values. When you insert NULL into an AUTO_INCREMENT column, MySQL generates the next sequence value and stores it in the column. Normally, unless you take steps to cause otherwise, AUTO_INCREMENT values begin at 1 and increase by 1 for each new row. The sequence may be affected if you delete rows from the table. That is, sequence values might be reused; it is storage engine-dependent whether this occurs.

You can have at most one AUTO_INCREMENT column in a table. The column should have the NOT NULL constraint, and it must be indexed. Generally, an AUTO_INCREMENT column is indexed as a PRIMARY KEY or UNIQUE index. Also, because sequence values always are positive, you normally define the column UNSIGNED as well. For example, you can define an AUTO_INCREMENT column in any of the following ways:

CREATE TABLE ai (i INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY);
CREATE TABLE ai (i INT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE);
CREATE TABLE ai (i INT UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY (i));
CREATE TABLE ai (i INT UNSIGNED NOT NULL AUTO_INCREMENT, UNIQUE (i));

The first two forms specify the index information as part of the column definition. The second two specify the index as a separate clause of the CREATE TABLE statement. Using a separate clause is optional if the index includes only the AUTO_INCREMENT column. If you want to create a multiple-column index that includes the AUTO_INCREMENT column, you must use a separate clause. (For an example of this, see "AUTO_INCREMENT for MyISAM Tables for MyISAM Tables.")

It is always allowable to define an AUTO_INCREMENT column explicitly as NOT NULL, but if you omit NOT NULL, MySQL adds it automatically.

"Working with Sequences" discusses the behavior of AUTO_INCREMENT columns further.

Following the attributes just described, which are specific to numeric columns, you may specify NULL or NOT NULL. If you do not specify NULL or NOT NULL for a numeric column, it allows NULL by default.

You also can specify a default value using the DEFAULT attribute. The following table contains three INT columns, having default values of -1, 1, and NULL:

CREATE TABLE t
(
    i1 INT DEFAULT -1,
    i2 INT DEFAULT 1,
    i3 INT DEFAULT NULL
);

The rules that MySQL uses for assigning a default value if you specify no DEFAULT clause are given in "Specifying Column Default Values."

Choosing Numeric Data Types

When you choose a type for a numeric column, consider the range of values that you need to represent and choose the smallest type that will cover the range. Choosing a larger type wastes space, leading to tables that are unnecessarily large and that cannot be processed as efficiently as if you had chosen a smaller type. For integer values, TINYINT is the best if the range of values in your data is small, such as a person's age or number of siblings. MEDIUMINT can represent millions of values and can be used for many more types of values, at some additional cost in storage space. BIGINT has the largest range of all but requires twice as much storage as the next smallest integer type (INT) and should be used only when really necessary. For floating-point values, DOUBLE takes twice as much space as FLOAT. Unless you need exceptionally high precision or an extremely large range of values, you can probably represent your data at half the storage cost by using FLOAT instead of DOUBLE.

Every numeric column's range of values is determined by its type. If you attempt to insert a value that lies outside the column's range, the result depends on whether strict mode is enabled. If it is, an out of range value results in an error. If strict mode is not enabled, truncation occurs: MySQL clips the value to the appropriate endpoint of the range and uses the result.

Value truncation occurs according to the range of the data type, not the display width. For example, a SMALLINT(3) column has a display width of 3 and a range from 32768 to 32767. The value 12345 is wider than the display width but within the range of the column, so it is inserted without clipping and retrieved as 12345. The value 99999 is outside the range, so it is clipped to 32767 when inserted. Subsequent retrievals return the value 32767.

In general, values assigned to a floating-point or fixed-point column are rounded to the number of decimals indicated by the column specification. If you store 1.23456 in a FLOAT(8,1) column, the result is 1.2. If you store the same value in a FLOAT(8,4) column, the result is 1.2346. This means you should define floating-point columns with a sufficient number of decimals to store values as precise as you require. If you need accuracy to thousandths, don't define a type with only two decimal places.

String Data Types

MySQL provides several data types for storing string values. Strings are often used for text values like these:

'N. Bertram, et al.'
'Pencils (no. 2 lead)'
'123 Elm St.'
'Monograph Series IX'

But strings are actually "generic" types in a sense because you can use them to represent any value. For example, you can use binary string types to hold binary data, such as images, sounds, or compressed output from gzip.

Table 3.10 shows all the types provided by MySQL for defining string-valued columns, and the maximum size and storage requirements of each type. The BLOB and TEXT types each have several variants that are distinguished by the maximum size of values they can hold.

Table 3.10. String Data Types

Type Specification

Maximum Size

Storage Required

BINARY[(M)]

M bytes

M bytes

VARBINARY(M)

M bytes

L + 1 or 2 bytes

CHAR[(M)]

M characters

M characters

VARCHAR(M)

M characters

L characters + 1 or 2 bytes

TINYBLOB

281 bytes

L + 1 bytes

BLOB

2161 bytes

L + 2 bytes

MEDIUMBLOB

2241 bytes

L + 3 bytes

LONGBLOB

2321 bytes

L + 4 bytes

TINYTEXT

281 characters

L characters + 1 byte

TEXT

2161 characters

L characters + 2 bytes

MEDIUMTEXT

2241 characters

L characters + 3 bytes

LONGTEXT

2321 characters

L characters + 4 bytes

ENUM('value1','value2',...)

65,535 members

1 or 2 bytes

SET('value1','value2',...)

64 members

1, 2, 3, 4, or 8 bytes


Some types hold binary strings (byte strings) and others hold non-binary strings (character strings). Thus, size and storage requirements are given in number of bytes per value for binary string types and number of characters for non-binary string types. For example, BINARY(20) holds 20 characters, whereas CHAR(20) holds 20 bytes. The differences between byte and character semantics for binary and non-binary strings are characterized in "String Values." Each of the binary string types for byte strings has a corresponding non-binary type for character strings, as shown in Table 3.11.

Table 3.11. Corresponding Binary and Non-Binary String Types

Binary String Type

Non-Binary String Type

BINARY

CHAR

VARBINARY

VARCHAR

BLOB

TEXT


Each of the non-binary string types, as well as ENUM and SET, can be assigned a character set and collation. The MyISAM, MEMORY, and InnoDB storage engines include support for using multiple character sets within a single table. Character set assignment is discussed in "String Data Type Attributes."

Some string types are fixed-length. For a given column, each value requires the same amount of storage. Other string types are variable-length. The amount of storage taken by a value varies from row to row and depends on the length of the values actually stored in the column. This length is represented by L in the table for variable-length types. The extra bytes required in addition to L are the number of bytes needed to store the length of the value. MySQL handles variable-length values by storing both the content of the value and its length. These extra bytes are treated as an unsigned integer. There is a correspondence between a variable-length type's maximum length, the number of extra bytes required for that type, and the range of the unsigned integer type that uses the same number of bytes. For example, a MEDIUMBLOB value may be up to 2241 bytes long and requires 3 bytes to record the length. The 3-byte integer type MEDIUMINT has a maximum unsigned value of 2241. That's not a coincidence.

The number of extra bytes for VARBINARY and VARCHAR is always one prior to MySQL 5.0.3 because the maximum column length is 255. As of MySQL 5.0.3, the column length can be up to 65,535, and two extra bytes are required for lengths greater than 255.

Values for all string types except ENUM and SET are stored as a sequence of bytes and interpreted either as bytes or characters depending on whether the type holds binary or non-binary strings. Values that are too long are chopped to fit. (In strict mode, an error occurs instead unless the chopped characters are spaces.) But string types range from very small to very large, with the largest type able to hold nearly 4GB of data, so you should be able to find something long enough to avoid truncation of your information. (The effective maximum column size actually is imposed by the maximum packet size of the client/server communication protocol, which is 1GB.)

For ENUM and SET, the column definition includes a list of legal string values, but ENUM and SET values are stored internally as numbers, as detailed later in "The ENUM and SET Data Types." Attempting to store a value other than those in the list causes the value to be converted to '' (the empty string) unless strict mode is enabled. In strict mode, an error occurs instead.

The CHAR and VARCHAR Data Types

CHAR and VARCHAR are two of the most commonly used string types. They both hold non-binary strings, and thus are associated with a character set and collation.

Some significant changes were made in MySQL 5.0.3 with regard to the VARCHAR data type:

  • The maximum length for VARCHAR was increased from 255 to 65,535.

  • Trailing spaces no longer are stripped when VARCHAR values are stored.

  • There is no automatic conversion between CHAR and VARCHAR columns. In earlier versions, this occurs under some circumstances, as described later in this section.

The primary differences between CHAR and VARCHAR lie in whether they have a fixed or variable length, and in how trailing spaces are treated:

  • CHAR is a fixed-length type, whereas VARCHAR is a variable-length type.

  • Values retrieved from CHAR columns have trailing spaces removed. For a CHAR(M) column, values that are shorter than M characters are padded to a length of M when stored, but trailing spaces are stripped when the values are retrieved.

  • For a VARCHAR(M) column, handling of trailing spaces is version-dependent. Before MySQL 5.0.3, VARCHAR values have trailing spaces stripped when they are stored. No space stripping occurs at retrieval time because that has already occurred. This trailing space removal differs from standard SQL behavior. As of MySQL 5.0.3, trailing spaces in VARCHAR values are retained both for storage and retrieval.

CHAR columns can be defined with a maximum length M from 0 to 255. M is optional for CHAR and defaults to 1 if missing. Note that CHAR(0) is legal. A CHAR(0) column can be used to represent on/off values if you allow it to be NULL. Values in such a column can have one of two values: NULL or the empty string. A CHAR(0) column takes very little storage space in the tableonly a single bit. It can be useful as a placeholder when you want to define a column but don't want to allocate space for it if you're not sure yet how wide to make it. You can use ALTER TABLE to widen the column later.

VARCHAR columns can be defined with a maximum length M from 0 to 255 before MySQL 5.0.3. As of MySQL 5.0.3, the length can be from 0 to 65,535. However, the actual maximum length of a VARCHAR column in practice may be less than 65,535, depending on storage engine internal row-size limits, the column character set, and the number of other columns in the table.

Keep in mind two general principles when choosing between CHAR and VARCHAR data types:

  • If your values all are M characters long, a VARCHAR(M) column actually will use more space than a CHAR(M) column due to the extra byte or bytes required to record the length of values. On the other hand, if your values vary in length, VARCHAR columns have the advantage of taking less space. A CHAR(M) column always takes M characters, even if it is empty or NULL.

  • If you're using MyISAM tables and your values don't vary much in length, CHAR is a better choice than VARCHAR because the MyISAM storage engine can process fixed-length rows more efficiently than variable-length rows. See "Data Type Choices and Query Efficiency," in Chapter 4, "Query Optimization."

Before MySQL 5.0.3, you cannot mix CHAR and VARCHAR within the same table, with a few limited exceptions. Depending on the circumstances, MySQL will even convert columns from one type to another when you create a table, something that other databases do not do. The principles that govern these conversions are as follows:

  • Table rows are fixed-length only if all the columns in the table are fixed-length types.

  • If even a single column has a variable length, table rows become variable length as well.

  • If table rows are variable-length, MySQL converts any fixed-length columns in the table to their variable-length equivalents when doing so will save space.

What this means is that if you have VARCHAR, BLOB, or TEXT columns in a table, you cannot also have CHAR columns; MySQL silently converts them to VARCHAR. Suppose that you create a table like this:

CREATE TABLE mytbl
(
    c1 CHAR(10),
    c2 VARCHAR(10)
);

If you issue a DESCRIBE statement, the output is as follows:

mysql> DESCRIBE mytbl;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| c1    | varchar(10) | YES  |     | NULL    |       |
| c2    | varchar(10) | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+

Notice that the presence of the VARCHAR column causes MySQL to convert c1 to VARCHAR as well. If you try using ALTER TABLE to convert c1 to CHAR, it won't work. The only way to convert a VARCHAR column to CHAR is to convert all VARCHAR columns in the table at the same time:

mysql> ALTER TABLE mytbl MODIFY c1 CHAR(10), MODIFY c2 CHAR(10);
mysql> DESCRIBE mytbl;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| c1    | char(10) | YES  |     | NULL    |       |
| c2    | char(10) | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+

The BLOB and TEXT data types are variable-length like VARCHAR, but they have no fixed-length equivalent, so you cannot use CHAR columns in the same table as BLOB or TEXT columns. CHAR columns will be converted to VARCHAR. Even using ALTER TABLE to convert all VARCHAR columns to CHAR at the same time will fail. The presence of a BLOB or TEXT column requires that the rows be variable length, so MySQL will not convert the VARCHAR columns.

The exception to the prohibition on mixing fixed-length and variable-length columns is that CHAR columns shorter than four characters are not converted to VARCHAR. For example, MySQL will not change the CHAR column in the following table to VARCHAR:

CREATE TABLE mytbl
(
    c1 CHAR(2),
    c2 VARCHAR(10)
);

You can see this from the output of DESCRIBE:

mysql> DESCRIBE mytbl;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| c1    | char(2)     | YES  |     | NULL    |       |
| c2    | varchar(10) | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+

There is a reason for not converting columns that are shorter than four characters: On average, any savings you might gain by not storing trailing spaces are offset by the extra byte needed in a VARCHAR column to record the length of each value. In fact, if all your columns are short, MySQL will convert any that you define as VARCHAR to CHAR. MySQL does this because the conversion decreases storage requirements on average and, for MyISAM tables, improves performance by making table rows fixed-length. Suppose that you create a table with the following specification:

CREATE TABLE mytbl
(
    c0 VARCHAR(0),
    c1 VARCHAR(1),
    c2 VARCHAR(2),
    c3 VARCHAR(3),
    c4 VARCHAR(4)
);

DESCRIBE reveals that MySQL silently changes all the VARCHAR columns shorter than four characters to CHAR:

mysql> DESCRIBE mytbl;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| c0    | char(0)    | YES  |     | NULL    |       |
| c1    | char(1)    | YES  |     | NULL    |       |
| c2    | char(2)    | YES  |     | NULL    |       |
| c3    | char(3)    | YES  |     | NULL    |       |
| c4    | varchar(4) | YES  |     | NULL    |       |
+-------+------------+------+-----+---------+-------+

As of MySQL 5.0.3, automatic conversion between CHAR and VARCHAR no longer occurs. A table can freely mix CHAR and VARCHAR columns.

The BINARY and VARBINARY Data Types

The BINARY and VARBINARY types are similar to CHAR and VARCHAR, with the following differences:

  • CHAR and VARCHAR are non-binary types that store characters and have a character set and collation. Comparisons are based on the collating sequence.

  • BINARY and VARBINARY are binary types that store bytes and have no character set or collation. Comparisons are based on numeric byte values.

Trailing space removal is the same for BINARY as for CHAR, and the same for VARBINARY as for VARCHAR. Also, prior to MySQL 5.0.3, BINARY columns may be converted to VARBINARY or vice versa when you create a table. The rules for when this occurs are the same as for conversion between CHAR and VARCHAR, as described in the previous section.

The BLOB and TEXT Data Types

A "BLOB" is a binary large objectbasically, a container that can hold anything you want to toss into it, and that you can make about as big as you want. In MySQL, the BLOB type is really a family of types (TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB). These types are identical except in the maximum amount of information they can hold (see Table 3.10). BLOB columns store binary strings. They are useful for storing data that may grow very large or that may vary widely in size from row to row. Some examples are compressed data, encrypted data, images, and sounds.

MySQL also has a family of TEXT types (TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT). These are similar to the corresponding BLOB types, except that TEXT types store non-binary strings rather than binary strings. That is, they store characters rather than bytes, and are associated with a character set and collation. This results in the general differences between binary and non-binary strings that were described earlier in "String Values." For example, in comparison operations, BLOB values are compared in byte units and TEXT values are compared in character units using the column collation.

BLOB or TEXT columns sometimes can be indexed, depending on the storage engine you're using:

  • MyISAM, InnoDB, and BDB tables support BLOB and TEXT indexing. However, you must specify a prefix size to be used for the index. This avoids creating index entries that might be huge and thereby defeat any benefits to be gained by that index. The exception is that prefixes are not used for FULLTEXT indexes on TEXT columns. FULLTEXT searches are based on the entire content of the indexed columns, so any prefix you specify is ignored.

  • MEMORY tables do not support BLOB and TEXT indexes. This is because the MEMORY engine does not support BLOB or TEXT columns at all.

BLOB or TEXT columns may require special care:

  • Due to the typical large variation in the size of BLOB and TEXT values, tables containing them are subject to high rates of fragmentation if many deletes and updates are done. If you're using a MyISAM table to store BLOB or TEXT values, you can run OPTIMIZE TABLE periodically to reduce fragmentation and maintain good performance. See Chapter 4 for more information.

  • The max_sort_length system variable influences BLOB and TEXT comparison and sorting operations. Only the first max_sort_length bytes of each value are used. (For TEXT columns that use a multi-byte character set, this means that comparisons might involve fewer than max_sort_length characters.) If this causes a problem with the default max_sort_length value of 1024, you might want to increase the value before performing comparisons.

  • If you're using very large values, you might need to configure the server to increase the value of the max_allowed_packet parameter. See Chapter 11, "General MySQL Administration," for more information. You will also need to increase the packet size for any client that wants to use very large values. The mysql and mysqldump clients support setting this value directly using a startup option.

The ENUM and SET Data Types

ENUM and SET are special string data types for which values must be chosen from a fixed (predefined) list of allowable strings. The primary difference between them is that ENUM column values must consist of exactly one member of the list of values, whereas SET column values may contain any or all members of the list. In other words, ENUM is used for values that are mutually exclusive, whereas SET allows multiple choices from the list.

The ENUM data type defines an enumeration. ENUM columns may be assigned values consisting of exactly one member chosen from a list of values specified at table-creation time. You can define an enumeration to have up to 65,535 members. Enumerations are commonly used to represent category values. For example, values in a column defined as ENUM('N','Y') can be either 'N' or 'Y'. Or you can use ENUM for such things as available sizes or colors for a product or for answers to multiple-choice questions in a survey or questionnaire where a single response must be selected:

employees ENUM('less than 100','100-500','501-1500','more than 1500')
color ENUM('red','green','blue','black')
size ENUM('S','M','L','XL','XXL')
vote ENUM('Yes','No','Undecided')

If you are processing selections from a Web page that includes mutually exclusive radio buttons, you can use an ENUM to represent the options from which a visitor to your site chooses. For example, if you run an online pizza ordering service, ENUM columns can be used to represent the type of crust and size of pizza a customer orders:

crust ENUM('thin','regular','pan style','deep dish')
size ENUM('small','medium','large')

If enumeration categories represent counts, it's important to choose your categories properly when you create the enumeration. For example, when recording white blood cell counts from a laboratory test, you might group the counts into categories like this:

wbc ENUM('0-100','101-300','>300')

If any given test result is provided as an exact count, you can record the value in the wbc column using the category into which the count falls. But you cannot recover the original count if you decide you want to convert the column from a category-based ENUM to an integer column based on exact count. If you really need the exact count, use an integer column instead. You can group integer values into categories when you retrieve them using the CASE construct. For example, if wbc is defined as an integer column, you can select it as a category like this:

SELECT CASE WHEN wbc <= 100 THEN '0-100'
            WHEN wbc <= 300 THEN '101-300'
            ELSE '>300' END AS 'wbc category'
FROM ...

The SET type is similar to ENUM in the sense that when you create a SET column, you specify a list of legal set members. But unlike ENUM, each column value may consist of any number of members from the set. The set may have up to 64 members. You can use a SET when you have a fixed set of values that are not mutually exclusive as they are in an ENUM column. For example, you might use a SET to represent options available for an automobile:

SET('luggage rack','cruise control','air conditioning','sun roof')

Then particular SET values would represent those options actually ordered by customers:

'cruise control,sun roof'
'luggage rack,air conditioning'
'luggage rack,cruise control,air conditioning'
'air conditioning'
''

The final value shown (the empty string) means that the customer ordered no options. This is a legal value for any SET column.

SET column definitions are written as a list of individual strings separated by commas to indicate what the set members are. A SET column value, on the other hand, is written as a single string. If the value consists of multiple set members, the members are separated within the string by commas. This means you shouldn't use a string containing a comma as a SET member.

Other uses for SET columns might be for representing information such as patient diagnoses or results from selections on Web pages. For a diagnosis, there may be a standard list of symptoms to ask a patient about, and the patient might exhibit any or all of them:

SET('dizziness','shortness of breath','cough')

For an online pizza service, the Web page for ordering could have a set of check boxes for ingredients that a customer wants as toppings on a pizza, several of which might be chosen:

SET('pepperoni','sausage','mushrooms','onions','ripe olives')

The way you define the legal value list for an ENUM or SET column is significant in several ways:

  • The list determines the possible legal values for the column, as has already been discussed.

  • If an ENUM or SET column has a collation that is not case sensitive, you can insert legal values in any lettercase and they will be recognized. However, the lettercase of the strings as specified in the column definition determines the lettercase of column values when they are retrieved later. For example, if you have an ENUM('Y','N') column and you store 'y' and 'n' in it, the values are displayed as 'Y' and 'N' when you retrieve them. If the column has a case sensitive or binary collation, you must insert values using exactly the lettercase used in the column definition or the values will not be recognized as legal. On the other hand, you can have distinct elements that differ only in lettercase, something that is not true when you use a collation that is not case sensitive.

  • The order of values in an ENUM definition is the order used for sorting. The order of values in a SET definition also determines sort order, although the relationship is more complicated because column values may contain multiple set members.

  • When MySQL displays a SET value that consists of multiple set members, the order in which it lists the members within the value is determined by the order in which they appear in the SET column definition.

ENUM and SET are classified as string types because enumeration and set members are specified as strings when you create columns of these types. However, the ENUM and SET types actually have a split personality: The members are stored internally as numbers and you can work with them as such. This means that ENUM and SET types are more efficient than other string types because they often can be handled using numeric operations rather than string operations. It also means that ENUM and SET values can be used in either string or numeric contexts. Finally, ENUM and SET columns can cause confusion if you use them in string context but expect them to behave as numbers, or vice versa.

MySQL sequentially numbers ENUM members in the column definition beginning with 1. (The value 0 is reserved for the error member, which is represented in string form by the empty string.) The number of enumeration values determines the storage size of an ENUM column. One byte can represent 256 values and two bytes can represent 65,536 values. (Compare this to the ranges of the one-byte and two-byte integer types TINYINT UNSIGNED and SMALLINT UNSIGNED.) Thus, counting the error member, the maximum number of enumeration members is 65,536 and the storage size depends on whether there are more than 256 members. You can specify a maximum of 65,535 (not 65,536) members in the ENUM definition because MySQL reserves a spot for the error member as an implicit member of every enumeration. When you assign an illegal value to an ENUM column, MySQL assigns the error member. (In strict mode, an error occurs instead.)

Here's an example you can try using the mysql client. It demonstrates that you can retrieve ENUM values in either string or numeric form (which shows the numeric ordering of enumeration members and also that the NULL value has no number in the ordering):

mysql> CREATE TABLE e_table (e ENUM('jane','fred','will','marcia'));
mysql> INSERT INTO e_table
    -> VALUES('jane'),('fred'),('will'),('marcia'),(''),(NULL);
mysql> SELECT e, e+0, e+1, e*3 FROM e_table;
+--------+------+------+------+
| e      | e+0  | e+1  | e*3  |
+--------+------+------+------+
| jane   |    1 |    2 |    3 |
| fred   |    2 |    3 |    6 |
| will   |    3 |    4 |    9 |
| marcia |    4 |    5 |   12 |
|        |    0 |    1 |    0 |
| NULL   | NULL | NULL | NULL |
+--------+------+------+------+

You can compare ENUM members either by name or number:

mysql> SELECT e FROM e_table WHERE e='will';
+------+
| e    |
+------+
| will |
+------+
mysql> SELECT e FROM e_table WHERE e=3;
+------+
| e    |
+------+
| will |
+------+

It is possible to define the empty string as a legal enumeration member, but this will only cause confusion. The string is assigned a non-zero numeric value, just as any other member listed in the definition. However, an empty string also is used for the error member that has a numeric value of 0, so it would correspond to two internal numeric element values. In the following example, assigning the illegal enumeration value 'x' to the ENUM column causes the error member to be assigned. This is distinguishable from the empty string member listed in the column definition only when retrieved in numeric form:

mysql> CREATE TABLE t (e ENUM('a','','b'));
mysql> INSERT INTO t VALUES('a'),(''),('b'),('x');
mysql> SELECT e, e+0 FROM t;
+------+------+
| e    | e+0  |
+------+------+
| a    |    1 |
|      |    2 |
| b    |    3 |
|      |    0 |
+------+------+

The numeric representation of SET columns is a little different than for ENUM columns. Set members are not numbered sequentially. Instead, members correspond to successive individual bits in the SET value. The first set member corresponds to bit 0, the second member corresponds to bit 1, and so on. In other words, the numeric values of SET members all are powers of two. The empty string corresponds to a numeric SET value of 0.

SET values are stored as bit values. Eight set members per byte can be stored this way, so the storage size for a SET column is determined by the number of set members, up to a maximum of 64 members. SET values take 1, 2, 3, 4, or 8 bytes for set sizes of 1 to 8, 9 to 16, 17 to 24, 25 to 32, and 33 to 64 members.

The representation of a SET as a set of bits is what allows a SET value to consist of multiple set members. Any combination of bits can be turned on in the value, so the value may consist of any combination of the strings in the SET definition that correspond to those bits.

The following example shows the relationship between the string and numeric forms of a SET column. The numeric value is displayed in both decimal and binary form:

mysql> CREATE TABLE s_table (s SET('table','lamp','chair','stool'));
mysql> INSERT INTO s_table
    -> VALUES('table'),('lamp'),('chair'),('stool'),(''),(NULL);
mysql> SELECT s, s+0, BIN(s+0) FROM s_table;
+-------+------+----------+
| s     | s+0  | BIN(s+0) |
+-------+------+----------+
| table |    1 | 1        |
| lamp  |    2 | 10       |
| chair |    4 | 100      |
| stool |    8 | 1000     |
|       |    0 | 0        |
| NULL  | NULL | NULL     |
+-------+------+----------+

If you assign to the column s a value of 'lamp,stool', MySQL stores it internally as 10 (binary 1010) because 'fred' has a value of 2 (bit 1) and 'marcia' has a value of 8 (bit 3).

When you assign values to SET columns, the substrings don't need to be listed in the same order that you used when you defined the column. However, when you retrieve the value later, members are displayed within the value in definition order. Also, if you assign to a SET column a value containing substrings that are not listed as set members, those strings drop out and the column is assigned a value consisting of the remaining substrings. When you retrieve the value later, the illegal substrings will not be present.

If you assign a value of 'chair,couch,table' to the column s in s_table, two things happen:

  • First, 'couch' drops out because it's not a member of the set. This occurs because MySQL determines which bits correspond to each substring of the value to be assigned and turns them on in the stored value. 'couch' corresponds to no bit and is ignored.

  • Second, when you retrieve the value later, it appears as 'table,chair'. On retrieval, MySQL constructs the string value from the numeric value by scanning the bits in order, which automatically reorders the substrings to the order used when the column was defined. This behavior also means that if you specify a set member more than once in a value, it will appear only once when you retrieve the value. If you assign 'lamp,lamp,lamp' to a SET column, it will be simply 'lamp' when retrieved.

In strict mode, use of an illegal SET member causes an error instead and the value is not stored. In the preceding example, assigning a value containing 'couch' would cause an error and the assignment would fail.

The fact that MySQL reorders members in a SET value means that if you search for values using a string, you must list members in the proper order. If you insert 'chair,table' and then search for 'chair,table' you won't find the record; you must look for it as 'table,chair'.

Sorting and indexing of ENUM and SET columns is done according to the internal (numeric) values of column values. The following example might appear to be incorrect because the values are not displayed in alphanumeric order:

mysql> SELECT e FROM e_table ORDER BY e;
+--------+
| e      |
+--------+
| NULL   |
|        |
| jane   |
| fred   |
| will   |
| marcia |
+--------+

You can better see what's going on by retrieving both the string and numeric forms of the ENUM values:

mysql> SELECT e, e+0 FROM e_table ORDER BY e;
+--------+------+
| e      | e+0  |
+--------+------+
| NULL   | NULL |
|        |    0 |
| jane   |    1 |
| fred   |    2 |
| will   |    3 |
| marcia |    4 |
+--------+------+

If you have a fixed set of values and you want them to sort in a particular order, you can exploit the ENUM sorting properties: Represent the values as an ENUM column in a table and list the enumeration values in the column definition in the order that you want them to be sorted. Suppose that you have a table representing personnel for a sports organization, such as a football team, and that you want to sort output by personnel position so that it comes out in a particular order, such as coaches, assistant coaches, quarterbacks, running backs, receivers, linemen, and so on. Define the column as an ENUM and list the enumeration elements in the order that you want to see them. Then column values automatically will come out in that order for sort operations.

For cases where you want an ENUM to sort in normal lexical order, you can convert the column to a non-ENUM string by using CAST() and sorting the result:

mysql> SELECT CAST(e AS CHAR) AS e_str FROM e_table ORDER BY e_str;
+--------+
| e_str  |
+--------+
| NULL   |
|        |
| fred   |
| jane   |
| marcia |
| will   |
+--------+

CAST() doesn't change the displayed values, but has the effect in this statement of performing an ENUM-to-string conversion that alters their sorting properties so they sort as strings.

String Data Type Attributes

The attributes unique to the string data types are CHARACTER SET and COLLATE for designating a character set and collating order. You can specify these as options for the table itself to set its defaults, or for individual columns to override the table defaults. (Actually, each database also has a default character set and collation, as does the server itself. These defaults sometimes come into play during table creation, as we'll see later.)

The CHARACTER SET and COLLATION attributes apply to the CHAR, VARCHAR, TEXT, ENUM, and SET data types. They do not apply to the binary string data types (BINARY, VARBINARY, and BLOB), because those types contain byte strings, not character strings.

When you specify the CHARACTER SET and COLLATION attributes, whether at the column, table, or database level, the following rules apply:

  • The character set must be one that the server supports. To display the available character sets, use SHOW CHARACTER SET.

  • If you specify both CHARACTER SET and COLLATE, the collation must be compatible with the character set. For example, with a character set of latin2, you could use a collation of latin2_croatian_ci, but not latin1_bin. To display the collations for each character set, use SHOW COLLATION.

  • If you specify CHARACTER SET without COLLATE, the character set's default collation is used.

  • If you specify COLLATE without CHARACTER SET, the character set is determined from the first part of the collation name.

To see how these rules apply, consider the following statement. It creates a table that uses several character sets:

CREATE TABLE mytbl
(
    c1  CHAR(10),
    c2  CHAR(40) CHARACTER SET latin2,
    c3  CHAR(10) COLLATE latin1_german1_ci,
    c4  BINARY(40)
) CHARACTER SET utf8;

The resulting table has utf8 as its default character set. No COLLATE table option is given, so the default table collation is the default utf8 collation (which happens to be utf8_general_ci). The definition for the c1 column contains no CHARACTER SET or COLLATE attributes of its own, so the table defaults are used for it. On the other hand, the table-level character set and collation are not used for c2, c3, and c4: c2 and c3 have their own character set information, and c4 has a binary string type, so the character set attributes do not apply. For c2, the collation is latin2_general_ci, the default collation for latin2. For c3, the character set is latin1, as can be determined from the collation name latin1_german1_ci.

To see character set information for an existing table, use SHOW CREATE TABLE:

mysql> SHOW CREATE TABLE mytbl\G
*************************** 1. row ***************************
       Table: mytbl
Create Table: CREATE TABLE `mytbl` (
  `c1` char(10) default NULL,
  `c2` char(40) character set latin2 default NULL,
  `c3` char(10) character set latin1 collate latin1_german1_ci default NULL,
  `c4` binary(40) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8


If SHOW CREATE TABLE does not display a column character set, it is the same as the table default character set. If it does not display a column collation, it is the default collation for the character set.

You also can add the FULL keyword to SHOW COLUMNS to cause it to display collation information (from which character sets can be determined):

mysql> SHOW FULL COLUMNS FROM mytbl;
+-------+------------+-------------------+------+-----+---------+...
| Field | Type       | Collation         | Null | Key | Default |...
+-------+------------+-------------------+------+-----+---------+...
| c1    | char(10)   | utf8_general_ci   | YES  |     | NULL    |...
| c2    | char(40)   | latin2_general_ci | YES  |     | NULL    |...
| c3    | char(10)   | latin1_german1_ci | YES  |     | NULL    |...
| c4    | binary(40) | NULL              | YES  |     | NULL    |...
+-------+------------+-------------------+------+-----+---------+...

The preceding discussion mentions column and table character set assignments, but character sets actually can be designated at the column, table, database, or server level. When MySQL processes a character column definition, it determines which character set to use for it by trying the following rules in order:

  1. If the column definition includes a character set, use that set. (This includes the case where only a COLLATE attribute is present, because that implies which character set to use.)

  2. Otherwise, if the table definition includes a character set table option, use that set.

  3. Otherwise, use the database character set as the table default character set, which also becomes the column character set. If the database was never assigned a character set explicitly (for example, if it was a database created prior to upgrading to MySQL 4.1), the database character set is taken from the server character set.

In other words, MySQL searches up through the levels at which character sets may be specified until it finds a character set defined, and then uses that for the column's set. The database always has a default character set, so the search process is guaranteed to terminate at the database level even if no character set is specified explicitly at any of the lower levels.

The character set name binary is special. If you assign the binary character set to a non-binary string column, the result is a type conversion that forces the column to the corresponding binary string type. The following pairs of column definitions each show two equivalent definitions:

c1 CHAR(10) CHARACTER SET binary
c1 BINARY(10)

c2 VARCHAR(10) CHARACTER SET binary
c2 VARBINARY(10)

c3 TEXT CHARACTER SET binary
c3 BLOB

If you specify CHARACTER SET binary for a binary string column, it is ignored because the type already is binary. If you specify CHARACTER SET binary for an ENUM or SET, it is used as is.

If you assign the binary character set as a table option, it applies to each string column that does not have any character set information specified in its own definition.

MySQL provides some shortcut attributes for defining character columns:

  • The ASCII attribute is shorthand for CHARACTER SET latin1.

  • The UNICODE attribute is shorthand for CHARACTER SET ucs2.

  • If you use the BINARY attribute for a non-binary string column, ENUM, or SET, it is shorthand for specifying the binary collation of the column's character set. For example, assuming a table default character set of latin1, these definitions are equivalent:

    c1 CHAR(10) BINARY
    c2 CHAR(10) CHARACTER SET latin1 BINARY
    c3 CHAR(10) CHARACTER SET latin1 COLLATE latin1_bin
    

    If you specify the BINARY attribute for a binary string column, it is ignored because the type already is binary.

The general attributes NULL or NOT NULL can be specified for any of the string types. If you don't specify either of them, NULL is the default. However, defining a string column as NOT NULL does not prevent you from storing an empty string (that is, '') in the column. In MySQL, an empty value is different from a missing value, so don't make the mistake of thinking that you can force a string column to contain non-empty values by defining it NOT NULL. If you require string values to be non-empty, that is a constraint you must enforce from within your own applications.

You also can specify a default value using the DEFAULT attribute for all string data types except the BLOB and TEXT types. The rules that MySQL uses for assigning a default value if you specify no DEFAULT clause are given in "Specifying Column Default Values."

Choosing String Data Types

When choosing a data type for a string column, consider the following questions:

  • Are values represented as text or binary data? For text, non-binary string types are most appropriate. For binary data, use a binary string type.

  • Do you want comparisons to be lettercase-aware? If so, use one of the non-binary string types, because those store characters and are associated with a character set and collation.

    You can control case sensitivity of non-binary string values for comparison and sorting purposes by the collation that you assign to them. If you want string values to be regarded equal regardless of lettercase, use a case-insensitive collation. Otherwise, use either a binary or case-sensitive collation. A binary collation compares character units using the numeric character codes. A case-sensitive collation compares character units using a specific collating order, which need not correspond to character code order. In either case, the lowercase and uppercase versions of a given character are considered distinct for comparisons. Suppose that 'mysql', 'MySQL', and 'MYSQL' are strings in the latin1 character set. They are all considered the same if compared using a case-insensitive collation such as latin1_swedish_ci, but as three different strings if compared using the binary latin1_bin collation or case-sensitive latin1_general_cs collation.

    If you want to use a string column both for case-sensitive and not case-sensitive comparisons, use a collation that corresponds to the type of comparison you will perform most often. For comparisons of the other type, apply the COLLATE operator to change the collation. For example, if mycol is a CHAR column that uses the latin1 character set, you can assign it the latin1_swedish_ci collation to perform case-insensitive comparisons by default. The following comparison is not case sensitive:

    mycol = 'ABC'
    

    For those times when you need case-sensitive comparisons, use the latin1_general_cs or latin1_bin collation. The following comparisons are case sensitive (it doesn't matter whether you apply the COLLATE operator to the left hand string or the right hand string):

    mycol COLLATE latin1_general_cs = 'ABC'
    mycol COLLATE latin1_bin = 'ABC'
    mycol = 'ABC' COLLATE latin1_general_cs
    mycol = 'ABC' COLLATE latin1_bin
    

  • Do you want to minimize storage requirements? If so, use a variable-length type, not a fixed-length type.

  • Will the allowable set of values for the column always be chosen from a fixed set of legal values? If so, ENUM or SET might be a good choice.

    ENUM also can be useful if you have a limited set of string values that you want to sort in some non-lexical order. Sorting of ENUM values occurs according to the order in which you list the enumeration values in the column definition, so you can make the values sort in any order you want.

  • Are trailing spaces significant? If values must be retrieved exactly as they are stored without having trailing spaces removed during storage or retrieval, use a TEXT column for non-binary strings and a BLOB column for binary strings. (As of MySQL 5.0.3, you also can use VARCHAR or VARBINARY.) This factor is important also if you are storing compressed, hashed, or encrypted values computed in such a way that the encoding method might result in trailing spaces. The following table shows how trailing spaces are handled for storage and retrieval operations for various string data types:

    Data Type

    Storage

    Retrieval

    Result

    CHAR, BINARY

    Padded

    Removed

    Retrieved values have no trailing spaces

    VARCHAR, VARBINARY (< MySQL 5.0.3)

    Removed

    No action

    Retrieved values have no trailing spaces

    VARCHAR, VARBINARY ( MySQL 5.0.3)

    No action

    No action

    Trailing spaces are not changed

    TEXT, BLOB

    No action

    No action

    Trailing spaces are not changed


Date and Time Data Types

MySQL provides several data types for storing temporal values: DATE, TIME, DATETIME, TIMESTAMP, and YEAR. Table 3.12 shows these types and the range of legal values for each type. The storage requirements for each type are shown in Table 3.13.

Table 3.12. Date and Time Data Types

Type Specification

Range

DATE

'1000-01-01' to '9999-12-31'

TIME

'-838:59:59' to '838:59:59'

DATETIME

'1000-01-01 00:00:00' to '9999-12-31 23:59:59'

TIMESTAMP

'1970-01-01 00:00:00' to partially through the year 2037

YEAR[(M)]

1901 to 2155 for YEAR(4), and 1970 to 2069 for YEAR(2)


Table 3.13. Date and Time Data Type Storage Requirements

Type Specification

Storage Required

DATE

3 bytes

TIME

3 bytes

DATETIME

8 bytes

TIMESTAMP

4 bytes

YEAR

1 byte


Each date and time type has a "zero" value that is stored when you insert a value that is illegal for the type, as shown in Table 3.14. The "zero" value also is the default value for date and time columns that are defined with the NOT NULL constraint.

Table 3.14. Date and Time Type "Zero" Values

Type Specification

Zero Value

DATE

'0000-00-00'

TIME

'00:00:00'

DATETIME

'0000-00-00 00:00:00'

TIMESTAMP

'0000-00-00 00:00:00'

YEAR

0000


MySQL always represents dates with the year first, in accordance with the standard SQL and ISO 8601 specifications. For example, December 3, 2004 is represented as '2004-12-03'. However, MySQL does allow some leeway in how you can specify input dates. For example, it will convert two-digit year values to four digits, and you need not supply a leading zero digit for month and day values that are less than 10. However, you must specify the year first and the day last. Formats that you may be more used to, such as '12/3/99' or '3/12/99', will not be interpreted as you might intend. The date interpretation rules MySQL uses are discussed further in "Working with Date and Time Values."

For combined date and time values, it is also allowable to specify a 'T' character rather than a space between the date and time (for example, '2004-12-31T12:00:00').

Time or combined date and time values can include a microseconds part following the time, consisting of a decimal point and up to six digits. (For example, '12:30:15.5' or '2005-06-15 10:30:12.000045'.) However, current support for microsecond values is only partial. Some temporal functions use them, but you cannot store a temporal value that includes a microseconds part in a table; the microseconds part is discarded.

For retrieval, you can display date and time values in a variety of formats by using the DATE_FORMAT() and TIME_FORMAT() functions.

The DATE, TIME, and DATETIME Data Types

The DATE and TIME types hold date and time values. The DATETIME type holds combined date and time values. The formats for the three types of values are 'CCYY-MM-DD', 'hh:mm:ss', and 'CCYY-MM-DD hh:mm:ss', where CC, YY, MM, DD hh, mm, and ss represent century, year, month, day, hour, minute, and second, respectively.

For the DATETIME type, the date and time parts are both required; if you assign a DATE value to a DATETIME column, MySQL automatically adds a time part of '00:00:00'. Conversions work in the other direction as well. If you assign a DATETIME value to a DATE or TIME column, MySQL discards the part that is irrelevant:

mysql> CREATE TABLE t (dt DATETIME, d DATE, t TIME);
mysql> INSERT INTO t (dt,d,t) VALUES(NOW(), NOW(), NOW());
mysql> SELECT * FROM t;
+---------------------+------------+----------+
| dt                  | d          | t        |
+---------------------+------------+----------+
| 2004-07-17 16:30:44 | 2004-07-17 | 16:30:44 |
+---------------------+------------+----------+

MySQL treats the time in DATETIME and TIME values slightly differently. For DATETIME, the time part represents a time of day and must be in the range from '00:00:00' to '23:59:59'. A TIME value, on the other hand, represents elapsed timethat's why the range shown in Table 3.12 for TIME columns is so great and why negative values are allowed.

One thing to watch out when inserting TIME values into a table is that if you use a "short" (not fully qualified) value, it may not be interpreted as you expect. For example, you'll probably find that if you insert '30' and '12:30', into a TIME column, one value will be interpreted from right to left and the other from left to right, resulting in stored values of '00:00:30' and '12:30:00'. If you consider '12:30' to represent a value of "12 minutes, 30 seconds," you should specify it in fully qualified form as '00:12:30'.

The TIMESTAMP Data Type

TIMESTAMP is a temporal data type that stores combined date and time values. (The word "timestamp" might appear to connote time only, but that is not the case.) The description of the TIMESTAMP data type in this section is current as of MySQL 4.1.6. Certain aspects of TIMESTAMP properties were in flux during earlier 4.1 releases, so avoid them and use a current release. The end of the section summarizes how the TIMESTAMP properties differ in MySQL 4.0 from the current properties.

The TIMESTAMP data type has several special properties:

  • TIMESTAMP columns have a range of values from '1970-01-01 00:00:00' to partially through the year 2037. The range is tied to Unix time, where the first day of 1970 is "day zero," also known as "the epoch." Values are stored as a four-byte number of seconds since the epoch. The beginning of 1970 determines the lower end of the TIMESTAMP range. The upper end of the range corresponds to the maximum four-byte value for Unix time.

  • Values are stored in Universal Coordinated Time (UTC). When you store a TIMESTAMP value, the server converts it from the current time zone to UTC. When you retrieve the value later, the server converts it back from UTC to the current time zone, so you see the same value that you stored. However, if another client connects to the server, uses a different time zone, and retrieves the value, it will see the value adjusted to its own time zone. In fact, you can see this effect within a single connection if you change your own time zone:

    mysql> CREATE TABLE t (ts TIMESTAMP);
    mysql> SET time_zone = '+00:00';   # set time zone to UTC
    mysql> INSERT INTO t VALUES('2000-01-01 00:00:00');
    mysql> SELECT ts FROM t;
    +---------------------+
    | ts                  |
    +---------------------+
    | 2000-01-01 00:00:00 |
    +---------------------+
    mysql> SET time_zone = '+03:00';   # advance time zone 3 hours
    mysql> SELECT ts FROM t;
    +---------------------+
    | ts                  |
    +---------------------+
    | 2000-01-01 03:00:00 |
    +---------------------+
    

    These examples specify time zones using values given as a signed offset in hours and minutes relative to UTC. You also can use named time zones such as 'Europe/Zurich' if the server time zone tables have been set up as described in "Configuring Time Zone Support," in Chapter 11.

  • TIMESTAMP has automatic initialization and update properties. You can designate any single TIMESTAMP column in a table to have either or both of these properties:

    • "Automatic initialization" means that for new records the column is set to the current timestamp if you omit it from the INSERT statement or set it to NULL.

    • "Automatic update" means that for existing records the column is updated to the current timestamp when you change any other column. Setting a column to its current value does not count as a change. You must set it to a different value for automatic update to occur.

    In addition, if you set any TIMESTAMP column to NULL, its value is set to the current timestamp. You can defeat this by defining the column with the NULL attribute to allow NULL values to be stored in the column.

Only one TIMESTAMP column in a table can be designated to have automatic properties. You cannot have automatic initialization for one TIMESTAMP column and automatic update for another. Nor can you have automatic initialization for multiple columns, or automatic update for multiple columns.

The syntax for specifying a TIMESTAMP column is as follows, assuming a column name of ts:

ts TIMESTAMP [DEFAULT value] [ON UPDATE CURRENT_TIMESTAMP]

The DEFAULT and ON UPDATE attributes can be given in any order, if both are given. The default value can be CURRENT_TIMESTAMP or a constant value such as 0 or a value in 'CCYY-MM-DD hh:mm:ss' format. Synonyms for CURRENT_TIMESTAMP are CURRENT_TIMESTAMP() and NOW(); they're all interchangeable in a TIMESTAMP definition.

To have one or both of the automatic properties for the first TIMESTAMP column in a table, you can define it using various combinations of the DEFAULT and ON UPDATE attributes:

  • With DEFAULT CURRENT_TIMESTAMP, the column has automatic initialization. It also has automatic update if ON UPDATE CURRENT_TIMESTAMP is given.

  • With neither attribute, MySQL defines the column with both DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP. (This preserves compatibility with table definitions from MySQL 4.0.)

  • With a DEFAULT constant_value attribute that specifies a constant value, the column does not have automatic initialization. It does have automatic update if ON UPDATE CURRENT_TIMESTAMP is given.

  • Without DEFAULT but with ON UPDATE CURRENT_TIMESTAMP, the default value is 0 and the column has automatic update.

To use automatic initialization or update for a different TIMESTAMP column than the first one, you must explicitly define the first one with a DEFAULT constant_value attribute and no ON UPDATE CURRENT_TIMESTAMP attribute. Then you can use DEFAULT CURRENT_TIMESTAMP or ON UPDATE CURRENT_TIMESTAMP (or both) with any other single TIMESTAMP column.

If you want to defeat automatic initialization or update for a TIMESTAMP column, set it explicitly to the desired value for insert or update operations. For example, you can prevent an update from changing the column by setting the column to its current value.

TIMESTAMP column definitions also can include NULL or NOT NULL. The default is NOT NULL. Its effect is that when you explicitly set the column to NULL, MySQL sets it to the current timestamp. (This is true both for inserts and updates.) If you specify NULL, setting the column to NULL stores NULL rather than the current timestamp.

If you want a table to contain a column that is set to the current timestamp for new records and that remains unchanged thereafter, you can achieve that two ways:

  • Use a TIMESTAMP column declared as follows without an ON UPDATE attribute:

    ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    

    When you create a new record, initialize the column to the current timestamp by setting it to NULL or by omitting it from the INSERT statement. The column will retain its value for subsequent updates unless you change it explicitly.

  • Use a DATETIME column. When you create a record, initialize the column to NOW(). Whenever you update the record thereafter, leave the column alone.

If you want a table to contain columns for both a time-created value and a last-modified value, use two TIMESTAMP columns:

CREATE TABLE t
(
    t_created  TIMESTAMP DEFAULT 0,
    t_modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP
                         ON UPDATE CURRENT_TIMESTAMP
    ... other columns ...
);

When inserting a new record, set both TIMESTAMP columns to NULL to set them to the insertion timestamp. When updating an existing record, leave both columns alone; t_modified will be updated automatically to the modification timestamp.

During table creation, TIMESTAMP columns are subject to the setting of the SQL mode. If the MAXDB mode is enabled, any TIMESTAMP column is created as a DATETIME column instead. This is for compatibility with the MaxDB DBMS.

If you have been using MySQL 4.0, you'll notice that its handling of the TIMESTAMP data type differs in several ways from 4.1. Here's a brief characterization of the important differences for MySQL 4.0:

  • TIMESTAMP display format is not the same as for DATETIME. Instead, MySQL displays TIMESTAMP values in 14-digit CCYYMMDDhhmmss format by default. A display width can be specified to display only part of the 14-digit value. For example, the display format for TIMESTAMP(10) values is YYMMDDhhmm.

  • Only the first TIMESTAMP column is subject to automatic initialization and update, and you cannot disable these properties. You can defeat them only by assigning a non-NULL value to the column for specific statements.

  • You cannot store NULL in a TIMESTAMP column. Setting the column to NULL always sets it to the current timestamp.

  • There is only one time zone.

The YEAR Data Type

YEAR is a one-byte data type intended for efficient representation of year values. A YEAR column definition may include a specification for a display width M, which should be either 4 or 2. If you omit M from a YEAR definition, the default is 4. YEAR(4) has a range of 1901 to 2155. YEAR(2) has a range of 1970 to 2069, but only the last two digits are displayed. You can use the YEAR type when you want to store date information but only need the year part of the date, such as year of birth, year of election to office, and so forth. When you do not require a full date value, YEAR is much more space-efficient than other date types.

TINYINT has the same storage size as YEAR (one byte), but not the same range. To cover the same range of years as YEAR by using an integer type, you would need a SMALLINT, which takes twice as much space. If the range of years you need to represent coincides with the range of the YEAR type, YEAR is more space-efficient than SMALLINT. Another advantage of YEAR over an integer column is that MySQL converts two-digit values into four-digit values for you using MySQL's usual year-guessing rules. For example, 97 and 14 become 1997 and 2014. However, be aware that inserting the numeric value 00 into a four-digit YEAR column results in the value 0000 being stored, not 2000. If you want a value of 00 to convert to 2000, you should specify it in string form as '00'.

Date and Time Data Type Attributes

The following remarks apply to all temporal types except TIMESTAMP:

  • The general attributes NULL or NOT NULL may be specified. If you don't specify either of them, NULL is the default.

  • You also can specify a default value using the DEFAULT attribute. The rules that MySQL uses for assigning a default value if you specify no DEFAULT clause are given in "Specifying Column Default Values."

    Note that because default values must be constants, you cannot use a function such as NOW() to supply a value of "the current date and time" as the default for a DATETIME column. To achieve that result, set the column value explicitly to NOW() whenever you create a new record, or else consider using a TIMESTAMP column instead.

TIMESTAMP columns are special; the default for the first such column in a table is the current date and time, and the "zero" value for any others. However, the full set of rules governing default values is more complex. See "The TIMESTAMP Data Type" for details.

Working with Date and Time Values

MySQL tries to interpret input values for date and time columns in a variety of formats, including both string and numeric forms. Table 3.16 shows the allowable formats for each of the date and time types.

Table 3.16. Date and Time Type Input Formats

Type

Allowable Formats

DATETIME, TIMESTAMP

'CCYY-MM-DD hh:mm:ss'

 

'YY-MM-DD hh:mm:ss'

 

'CCYYMMDDhhmmss'

 

'YYMMDDhhmmss'

 

CCYYMMDDhhmmss

 

YYMMDDhhmmss

DATE

'CCYY-MM-DD'

 

'YY-MM-DD'

 

'CCYYMMDD'

 

'YYMMDD'

 

CCYYMMDD

 

YYMMDD

TIME

'hh:mm:ss'

 

'hhmmss'

 

hhmmss

YEAR

'CCYY'

 

'YY'

 

CCYY

 

YY


MySQL interprets formats that have no century part (CC) using the rules described in "Interpretation of Ambiguous Year Values." For string formats that include delimiter characters, you don't have to use '-' for dates and ':' for times. Any punctuation character may be used as the delimiter. Interpretation of values depends on context, not on the delimiter. For example, although times are typically specified using a delimiter of ':', MySQL won't interpret a value containing ':' as a time in a context where a date is expected. In addition, for the string formats that include delimiters, you need not specify two digits for month, day, hour, minute, or second values that are less than 10. The following are all equivalent:

'2012-02-03 05:04:09'
'2012-2-03 05:04:09'
'2012-2-3 05:04:09'
'2012-2-3 5:04:09'
'2012-2-3 5:4:09'
'2012-2-3 5:4:9'

Note that MySQL may interpret values with leading zeros in different ways depending on whether they are specified as strings or numbers. The string '001231' will be seen as a six-digit value and interpreted as '2000-12-31' for a DATE, and as '2000-12-31 00:00:00' for a DATETIME. On the other hand, the number 001231 will be seen as 1231 after the parser gets done with it and then the interpretation becomes problematic. This is a case where it's best to supply a string value '001231', or else use a fully qualified value if you are using numbers (that is, 20001231 for DATE and 200012310000 for DATETIME).

In general, you may freely assign values between the DATE, DATETIME, and TIMESTAMP types, although there are certain restrictions to keep in mind:

  • If you assign a DATETIME or TIMESTAMP value to a DATE, the time part is discarded.

  • If you assign a DATE value to a DATETIME or TIMESTAMP, the time part of the resulting value is set to zero ('00:00:00').

  • The types have different ranges. In particular, TIMESTAMP has a more limited range (1970 to 2037); so, for example, you cannot assign a pre-1970 DATETIME value to a TIMESTAMP and expect reasonable results. Nor can you assign values to a TIMESTAMP that are far in the future.

MySQL provides many functions for working with date and time values. See Appendix C for more information.

Interpretation of Ambiguous Year Values

For all date and time types that include a year part (DATE, DATETIME TIMESTAMP, YEAR), MySQL handles values that contain two-digit years by converting them to four-digit years:

  • Year values from 00 to 69 become 2000 to 2069

  • Year values from 70 to 99 become 1970 to 1999

You can see the effect of these rules most easily by storing different two-digit values into a YEAR column and then retrieving the results. This demonstrates something you should take note of:

mysql> CREATE TABLE y_table (y YEAR);
mysql> INSERT INTO y_table VALUES(68),(69),(99),(00),('00');
mysql> SELECT * FROM y_table;
+------+
| y    |
+------+
| 2068 |
| 2069 |
| 1999 |
| 0000 |
| 2000 |
+------+

Notice that 00 is converted to 0000, not to 2000. That's because, as a number, 00 is the same as 0, and is a perfectly legal value for the YEAR type. If you insert a numeric zero, that's what you get. To get 2000 using a value that does not contain the century, insert the string '0' or '00'. You can make sure that MySQL sees a string and not a number by inserting YEAR values using CAST(value AS CHAR) to produce a string result uniformly regardless of whether value is a string or a number.

In any case, keep in mind that the rules for converting two-digit to four-digit year values provide only a reasonable guess. There is no way for MySQL to be certain about the meaning of a two-digit year when the century is unspecified. If MySQL's conversion rules don't produce the values that you want, the solution is to provide unambiguous data with four-digit years.

Is MySQL Year-2000 Safe?

MySQL itself is year-2000 safe because it stores dates internally with four-digit years, but it's your responsibility to provide data that result in the proper values being stored in the first place. The real problem with two-digit year interpretation comes not from MySQL, but from the human desire to take a shortcut and enter ambiguous data. If you're willing to take the risk, go ahead. It's your risk to take, and MySQL's guessing rules are adequate for many situations. Just be aware that there are times when you really do need to enter four digits. For example, to enter birth and death dates into the president table, which lists U.S. presidents back into the 1700s, four-digit year values are in order. Values in these columns span several centuries, so letting MySQL guess the century from a two-digit year is definitely the wrong thing to do.


Spatial Data Types

MySQL 4.1 and up supports spatial values. This capability allows representation of values such as points, lines, and polygons. These data types are implemented per the OpenGIS specification, which is available at the Open Geospatial Consortium Web site:

http://www.opengeospatial.org/

The spatial data types allowed in MySQL are listed in Table 3.17.

Table 3.17. Spatial Data Types

Type Name

Meaning

GEOMETRY

A spatial value of any type

POINT

A point (a pair of X,Y coordinates)

LINESTRING

A curve (one or more POINT values)

POLYGON

A polygon

GEOMETRYCOLLECTION

A collection of GEOMETRY values

MULTILINESTRING

A collection of LINESTRING values

MULTIPOINT

A collection of POINT values

MULTIPOLYGON

A collection of POLYGON values


Currently, MySQL supports spatial types only for MyISAM tables, and indexed spatial columns do not allow NULL values. These restrictions may or may not matter to you. Probably the most significant implications to consider are these:

  • MyISAM is a non-transactional storage engine, so you cannot use GIS types within transactional operations that require commit and rollback.

  • You cannot use NULL to represent missing values within indexed columns. Depending on your application, it might be acceptable to use an empty (zero-dimensional) value instead.

MySQL works with spatial values in three formats. Well-Known Text (WKT) and Well-Known Binary (WKB) formats represent spatial values as text strings or in a standard binary format. The syntax for text strings and the binary representation are defined in the OpenGIS specification. For example, the WKT format for a POINT value with coordinates of x and y is written as a string:

'POINT(x y)'

Note the absence of a comma between the coordinate values. More complex values have a more complex string representation. The following strings represent a LINESTRING consisting of several points and a POLYGON that has a rectangular outer boundary and a triangular inner boundary:

'LINESTRING(10 20, 0 0, 10 20, 0 0)'
'POLYGON((0 0, 100 0, 100 100, 0 100, 0 0),(30 30, 30 60, 45 60, 30 30))'

The third format is the internal format that MySQL uses for storing spatial values in tables.

Because spatial values can be complex, most operations on them are done by invoking functions. The set of spatial functions is extensive and includes functions for converting from one format to another. For the complete list, see Appendix C.

The following example shows how to use several aspects of spatial support:

  • It creates a table that includes a spatial column.

  • It populates the table with some POINT values, using the POINTFROMTEXT() function that produces an internal-format value from a WKT representation.

  • It creates a stored function that computes the distance between two points, using the X() and Y() functions to extract point coordinates.

  • It computes the distance of each point in the table from a given reference point.

mysql> CREATE TABLE pt_tbl (p POINT);
mysql> INSERT INTO pt_tbl (p) VALUES
    ->    (POINTFROMTEXT('POINT(0 0)')),
    ->    (POINTFROMTEXT('POINT(0 50)')),
    ->    (POINTFROMTEXT('POINT(100 100)'));
mysql> CREATE FUNCTION dist (p1 POINT, p2 POINT)
    ->    RETURNS FLOAT
    ->    RETURN SQRT(POW(X(p2)-X(p1),2) + POW(Y(p2)-Y(p1),2));
mysql> SET @ref_pt = POINTFROMTEXT('POINT(0 0)');
mysql> SELECT ASTEXT(p), dist (p, @ref_pt) AS dist FROM pt_tbl;
+----------------+-----------------+
| ASTEXT(p)      | dist            |
+----------------+-----------------+
| POINT(0 0)     |               0 |
| POINT(0 50)    |              50 |
| POINT(100 100) | 141.42135620117 |
+----------------+-----------------+

    Team LiB
    Previous Section Next Section