Previous Section  < Day Day Up >  Next Section

6.1 Using SQL Expressions and Functions

Expressions are a common element of SQL statements, and they occur in many contexts. For example, expressions often occur in the WHERE clause of SELECT, DELETE, and UPDATE statements to identify which records to retrieve, delete, or update. But expressions may be used in many other places; for example, in the output column list of a SELECT statement, or in ORDER BY or GROUP BY clauses.

Terms of expressions consist of constants (literal numbers, strings, dates, and times), NULL values, references to table columns, and function calls. Terms may be combined using operators into more complex expressions. Many types of operators are available, such as those for arithmetic, comparison, logical, and pattern-matching operations.

Here are some examples of expressions:

  • The following statement refers to table columns to select country names and populations from the Country table:

    
    
    
    

    
    SELECT Name, Population FROM Country;
    
    

  • You can work directly with literal data values that aren't stored in a table. The following statement refers to literal integer, floating-point number, and string values:

    
    
    
    

    
    SELECT 14, -312.82, 'I am a string';
    
    

  • Another way to produce data values is by invoking functions. This statement calls functions that return the current date and a server version string:

    
    
    
    

    
    SELECT CURDATE(), VERSION();
    
    

All these types of values can be combined into more complex expressions to produce other values of interest. The following statement demonstrates this:






mysql> SELECT Name,

    -> TRUNCATE(Population/SurfaceArea,2) AS 'people/sq. km',

    -> IF(GNP > GNPOld,'Increasing','Not increasing') AS 'GNP Trend'

    -> FROM Country ORDER BY Name LIMIT 10;

+---------------------+---------------+----------------+

| Name                | people/sq. km | GNP Trend      |

+---------------------+---------------+----------------+

| Afghanistan         |         34.84 | Not increasing |

| Albania             |        118.31 | Increasing     |

| Algeria             |         13.21 | Increasing     |

| American Samoa      |        341.70 | Not increasing |

| Andorra             |        166.66 | Not increasing |

| Angola              |         10.32 | Not increasing |

| Anguilla            |         83.33 | Not increasing |

| Antarctica          |          0.00 | Not increasing |

| Antigua and Barbuda |        153.84 | Increasing     |

| Argentina           |         13.31 | Increasing     |

+---------------------+---------------+----------------+


The expressions in the preceding statement refer to these types of values:

  • Table columns: Name, Population, SurfaceArea, GNP, and GNPOld. ("GNP" means "gross national product.")

  • Literal values: 'Increasing', 'Not increasing', and the column aliases are all string constants.

  • Functions: TRUNCATE() is used to format the population/area ratio to two decimals, and IF() is a logical function that tests the expression in its first argument and returns the second or third argument depending on whether the expression is true or false.

Many functions are available in MySQL. This study guide uses several of them for examples, but they make up only a fraction of the number available. Consult the MySQL Reference Manual for a complete list of functions and how to use them. For the purpose of testing, Appendix C, "Quick Reference," lists a subset of functions with which you should familiarize yourself before going to the exam.

Note that when you invoke a function, there must be no space after the function name and before the opening parenthesis. (It's possible to change this default behavior by starting the server with a special option; see section 6.5, "Using Reserved Words as Identifiers.")

More examples of SQL expressions can be found in the sample exercises at the end of the chapter.

6.1.1 Case Sensitivity in String Comparisons

String comparisons are somewhat more complex than numeric or temporal comparisons. Numbers sort in numeric order and dates and times sort in temporal order, but string comparisons depend not only on the specific content of the strings, but on whether they are binary or nonbinary and on the character set of the server. A letter in uppercase may compare as the same or different than the same letter in lowercase, and a letter with one type of accent may be considered the same or different than that letter with another type of accent.

The earlier discussion in Chapter 4, "Data Definition Language," on data and column types described how strings may be binary or nonbinary, and how the properties of these two types of strings differ. To summarize:

  • Binary strings are treated as raw bytes. Nonbinary strings are treated as characters. Characters may consist of single or multiple bytes, depending on the character set.

  • Because binary strings have no character set, they have no collating (sorting) order. Comparisons between binary strings are based on the numeric values of the raw bytes. This results in case sensitive comparisons. Nonbinary strings contain characters in a particular character set, and each character set has a collating order that determines sorting behavior. Typically, collating orders treat uppercase and lowercase versions of a given character the same, which results in case-insensitive comparisons.

The rules that govern string comparison apply in several ways. They determine the result of comparisons performed explicitly with operators such as = and <, and comparisons performed implicitly by ORDER BY, GROUP BY, and DISTINCT operations.

Literal strings are nonbinary by default, and thus are not case sensitive. You can see this by comparing strings that differ only in lettercase:






mysql> SELECT 'Hello' = 'hello';

+-------------------+

| 'Hello' = 'hello' |

+-------------------+

|                 1 |

+-------------------+


Depending on the character set, strings might not be accent sensitive, either. If the German sorting order is in use, ue and ü are the same:






mysql> SELECT 'Mueller' = 'Müller';

+----------------------+

| 'Mueller' = 'Müller' |

+----------------------+

|                    1 |

+----------------------+


A nonbinary string can be treated as a binary string by preceding it with the BINARY keyword. If either string in a comparison is binary, both strings are treated as binary:






mysql> SELECT BINARY 'Hello' = 'hello';

+--------------------------+

| BINARY 'Hello' = 'hello' |

+--------------------------+

|                        0 |

+--------------------------+

mysql> SELECT 'Hello' = BINARY 'hello';

+--------------------------+

| 'Hello' = BINARY 'hello' |

+--------------------------+

|                        0 |

+--------------------------+


The same principles apply to CHAR and VARCHAR table columns. By default, they're nonbinary, but you can add BINARY to a column definition to make the column binary. Suppose that a table t contains a CHAR column c and has the following rows (each of which means "goodbye"):






mysql> SELECT c FROM t;

+-----------+

| c         |

+-----------+

| Hello     |

| goodbye   |

| Bonjour   |

| au revoir |

+-----------+


Because c is a nonbinary CHAR column, its character set collating order controls how its values are sorted. For the default character set (Latin-1, also known as ISO-8859-1), the collating order isn't case sensitive, so uppercase and lowercase are treated as identical and a sort operation that uses ORDER BY produces results like this:






mysql> SELECT c FROM t ORDER BY c;

+-----------+

| c         |

+-----------+

| au revoir |

| Bonjour   |

| goodbye   |

| Hello     |

+-----------+


If c is declared as a CHAR BINARY column instead, ORDER BY sorts based on raw byte codes and produces a different result. Assuming that values are stored on a machine that uses ASCII codes, uppercase precedes lowercase and the results look like this:






mysql> SELECT c FROM t ORDER BY c;

+-----------+

| c         |

+-----------+

| Bonjour   |

| Hello     |

| au revoir |

| goodbye   |

+-----------+


String comparison rules also apply to GROUP BY and DISTINCT operations. Suppose that t has a column c with the following contents:






mysql> SELECT c FROM t;

+---------+

| c       |

+---------+

| Hello   |

| hello   |

| Goodbye |

| goodbye |

+---------+


If c is a CHAR column, GROUP BY and DISTINCT do not make lettercase distinctions:






mysql> SELECT c, COUNT(*) FROM t GROUP BY c;

+---------+----------+

| c       | COUNT(*) |

+---------+----------+

| Goodbye |        2 |

| Hello   |        2 |

+---------+----------+

mysql> SELECT DISTINCT c FROM t;

+---------+

| c       |

+---------+

| Hello   |

| Goodbye |

+---------+


On the other hand, if c is a CHAR BINARY column, those operations do take lettercase into account:






mysql> SELECT c, COUNT(*) FROM t GROUP BY c;

+---------+----------+

| c       | COUNT(*) |

+---------+----------+

| Goodbye |        1 |

| Hello   |        1 |

| goodbye |        1 |

| hello   |        1 |

+---------+----------+

mysql> SELECT DISTINCT c FROM t;

+---------+

| c       |

+---------+

| Hello   |

| hello   |

| Goodbye |

| goodbye |

+---------+


The preceding discussion shows that to understand sorting and comparison behavior for literal strings or string columns, it's important to know whether they are binary or nonbinary. This is important when using string functions as well. String functions may treat their arguments as binary or nonbinary strings, or return binary or nonbinary results. It depends on the function. Here are some examples:

  • LENGTH() returns the length of a string in bytes, whereas CHAR_LENGTH() returns the length in characters. For single-byte character sets, the two functions return identical results, but for multi-byte character sets, you should choose the function that is appropriate for the type of result you want. For example, the SJIS character set includes characters that require two bytes to represent. The value of LENGTH() for any string containing such characters will be greater than the value of CHAR_LENGTH().

  • The LOCATE() function determines whether a substring is present in another string and returns 1 or 0 depending on whether it is or is not. The comparison rules depend on the arguments. LOCATE() performs a case-sensitive comparison if at least one of its arguments is a binary string. Otherwise, the comparison is not case sensitive:

    
    
    
    

    
    mysql> SELECT LOCATE('A','abc');
    
    +-------------------+
    
    | LOCATE('A','abc') |
    
    +-------------------+
    
    |                 1 |
    
    +-------------------+
    
    mysql> SELECT LOCATE('A',BINARY 'abc');
    
    +--------------------------+
    
    | LOCATE('A',BINARY 'abc') |
    
    +--------------------------+
    
    |                        0 |
    
    +--------------------------+
    
    

  • MD5() takes a string argument and produces a 32-byte byte checksum represented as a string of hexadecimal digits. It treats its argument as a binary string:

    
    
    
    

    
    mysql> SELECT MD5('a');
    
    +----------------------------------+
    
    | MD5('a')                         |
    
    +----------------------------------+
    
    | 0cc175b9c0f1b6a831c399e269772661 |
    
    +----------------------------------+
    
    mysql> SELECT MD5('A');
    
    +----------------------------------+
    
    | MD5('A')                         |
    
    +----------------------------------+
    
    | 7fc56270e7a70fa81a5935b72eacbe29 |
    
    +----------------------------------+
    
    

    However, the result from MD5() is not a binary string, as can be seen by using LOWER() and UPPER() to convert a given MD5() result to lowercase and uppercase and comparing the two values:

    
    
    
    

    
    mysql> SELECT LOWER(MD5('a'));
    
    +----------------------------------+
    
    | LOWER(MD5('a'))                  |
    
    +----------------------------------+
    
    | 0cc175b9c0f1b6a831c399e269772661 |
    
    +----------------------------------+
    
    mysql> SELECT UPPER(MD5('a'));
    
    +----------------------------------+
    
    | UPPER(MD5('a'))                  |
    
    +----------------------------------+
    
    | 0CC175B9C0F1B6A831C399E269772661 |
    
    +----------------------------------+
    
    mysql> SELECT LOWER(MD5('a')) = UPPER(MD5('a'));
    
    +-----------------------------------+
    
    | LOWER(MD5('a')) = UPPER(MD5('a')) |
    
    +-----------------------------------+
    
    |                                 1 |
    
    +-----------------------------------+
    
    

These examples demonstrate that you must take into account the properties of the particular function you want to use. If you don't, you might be surprised at the results you get. See the MySQL Reference Manual for details on specific functions.

    Previous Section  < Day Day Up >  Next Section