Team LiB

Performing Multiple-Table Retrievals with Subqueries

Subquery support is a capability that allows one SELECT statement to be written within parentheses and nested inside another. Here's an example that looks up the IDs for grade event records that correspond to tests ('T') and uses them to select scores for those tests:

SELECT * FROM score
WHERE event_id IN (SELECT event_id FROM grade_event WHERE category = 'T');

Before version 4.1, MySQL could not do subqueries, which was one of the knocks against it. The situation has changed and you can use subqueries freely now, although it is not unusual to see the claim "MySQL doesn't support subqueries." I guess that isn't surprising; some people still think MySQL doesn't support transactions, either.

Subqueries can return different amounts of information:

  • A scalar subquery returns a single value.

  • A column subquery returns a single column of one or more values.

  • A row subquery returns a single row of one or more values.

  • A table subquery returns a table of one or more rows of one or more columns.

Subquery results can be tested in different ways:

  • Scalar subquery results can be evaluated using relative comparison operators such as = or <.

  • IN and NOT IN test whether a value is present in a set of values returned by a subquery.

  • ALL, ANY, and SOME compare a value to the set of values returned by a subquery.

  • EXISTS and NOT EXISTS test whether a subquery result is empty.

A scalar subquery is the most restrictive because it produces only a single value. But as a consequence, scalar subqueries can be used in the widest variety of contexts. They are applicable essentially anywhere that you can use a scalar operand, such as a term of an expression, as a function argument, or in the output column list. Column, row, and table subqueries that return more information cannot be used in contexts that require a single value.
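The contexts just listed can be tried out with a short script. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and the score rows are made-up sample data.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE score (student_id INT, event_id INT, score INT);
    INSERT INTO score VALUES (1, 5, 80), (2, 5, 90), (3, 5, 70);
""")

# Scalar subqueries in the output column list and as a term of an expression.
rows = conn.execute("""
    SELECT student_id,
           score,
           (SELECT MAX(score) FROM score) AS best,
           score - (SELECT MIN(score) FROM score) AS above_min
    FROM score
    ORDER BY student_id
""").fetchall()

# Scalar subquery as a function argument.
(rounded_avg,) = conn.execute(
    "SELECT ROUND((SELECT AVG(score) FROM score))"
).fetchone()

print(rows)
print(rounded_avg)
```

Each subquery here produces exactly one value, which is why it can appear wherever a single operand is expected.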

Subqueries can be correlated or uncorrelated. This is a function of whether a subquery refers to and is dependent on values in the outer query.

You can use subqueries with statements other than SELECT. However, for statements that modify tables (INSERT, REPLACE, DELETE, UPDATE) there is currently a restriction that the subquery cannot refer to the table being modified.

In some cases, subqueries can be rewritten as joins. You might find subquery rewriting techniques useful if you're writing queries that need to run on an older MySQL server, or if you want to see if the MySQL optimizer does a better job with a join than a subquery.

The following sections discuss the kinds of operations you can use to test subquery results, how to write correlated subqueries, and how to rewrite subqueries as joins.

Subqueries with Relative Comparison Operators

The =, <>, >, >=, <, and <= operators perform relative-value comparisons. When used with a scalar subquery, they find all rows in the outer query that stand in a particular relationship to the value returned by the subquery. For example, to identify the scores for the quiz that took place on '2004-09-23', use a scalar subquery to determine the quiz event ID and then match score records against it in the outer SELECT:

SELECT * FROM score
WHERE event_id =
(SELECT event_id FROM grade_event
   WHERE date = '2004-09-23' AND category = 'Q');

With this form of statement, where the subquery is preceded by a value and a relative comparison operator, it is necessary that the subquery produce a single value. That is, it must be a scalar subquery; if it produces multiple values, the statement will fail. In some cases, it may be appropriate to satisfy the single-value requirement by limiting the subquery result with LIMIT 1.

Use of scalar subqueries with relative comparison operators is handy for solving problems where you'd be tempted to use an aggregate function in a WHERE clause. For example, to determine which of the presidents in the president table was born first, you might try this statement:

SELECT * FROM president WHERE birth = MIN(birth);

That doesn't work because you can't use aggregates in WHERE clauses. (The WHERE clause determines which records to select, but the value of MIN() isn't known until after the records have already been selected.) However, you can use a subquery to produce the minimum birth date like this:

SELECT * FROM president
WHERE birth = (SELECT MIN(birth) FROM president);

Other aggregate functions can be used to solve similar problems. The following statement uses a subquery to select the above-average scores from a given grade event:

SELECT * FROM score WHERE event_id = 5
AND score > (SELECT AVG(score) FROM score WHERE event_id = 5);
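Here is a small runnable sketch of the same idea. It assumes SQLite (through Python's sqlite3 module) in place of MySQL, and the score rows are invented for illustration. It first shows the aggregate-in-WHERE form being rejected, then the subquery form selecting the above-average scores.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE score (student_id INT, event_id INT, score INT);
    INSERT INTO score VALUES (1, 5, 60), (2, 5, 75), (3, 5, 90), (4, 6, 100);
""")

# An aggregate function used directly in the WHERE clause is rejected.
try:
    conn.execute("SELECT * FROM score WHERE event_id = 5 AND score > AVG(score)")
    aggregate_in_where_failed = False
except sqlite3.Error:
    aggregate_in_where_failed = True

# The scalar subquery computes the average first; the outer query compares to it.
above_average = conn.execute("""
    SELECT student_id, score FROM score
    WHERE event_id = 5
      AND score > (SELECT AVG(score) FROM score WHERE event_id = 5)
    ORDER BY student_id
""").fetchall()

print(aggregate_in_where_failed, above_average)
```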

If a subquery returns a single row, you can use a row constructor to compare a set of values (that is, a tuple) to the subquery result. This statement returns records for presidents who were born in the same city and state as John Adams:

mysql> SELECT last_name, first_name, city, state FROM president
    -> WHERE (city, state) =
    -> (SELECT city, state FROM president
    -> WHERE last_name = 'Adams' AND first_name = 'John');
+-----------+-------------+-----------+-------+
| last_name | first_name  | city      | state |
+-----------+-------------+-----------+-------+
| Adams     | John        | Braintree | MA    |
| Adams     | John Quincy | Braintree | MA    |
+-----------+-------------+-----------+-------+

You can also use ROW(city,state) notation, which is equivalent to (city,state). Both act as row constructors that represent tuples.
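The tuple comparison can be sketched as follows. The assumptions are significant here: the script uses SQLite through Python's sqlite3 module rather than MySQL, it needs an SQLite library recent enough to support row values (3.15 or later), and SQLite does not offer the ROW() spelling, so only the parenthesized form is shown. The president rows are a small made-up sample.

```python
import sqlite3

# Assumption: SQLite >= 3.15 stands in for MySQL; rows are sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE president
        (last_name TEXT, first_name TEXT, city TEXT, state TEXT);
    INSERT INTO president VALUES
        ('Washington', 'George', 'Wakefield', 'VA'),
        ('Adams', 'John', 'Braintree', 'MA'),
        ('Jefferson', 'Thomas', 'Albemarle County', 'VA'),
        ('Adams', 'John Quincy', 'Braintree', 'MA');
""")

# Compare the (city, state) tuple against the single row the subquery returns.
same_place = conn.execute("""
    SELECT last_name, first_name FROM president
    WHERE (city, state) =
          (SELECT city, state FROM president
           WHERE last_name = 'Adams' AND first_name = 'John')
    ORDER BY first_name
""").fetchall()

print(same_place)
```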

IN and NOT IN Subqueries

The IN and NOT IN operators can be used when a subquery returns multiple rows to be evaluated in comparison to the outer query. They test whether a comparison value is present in a set of values. IN is true for rows in the outer query that match any row returned by the subquery. NOT IN is true for rows in the outer query that match no rows returned by the subquery. The following statements use IN and NOT IN to find those students who have absences listed in the absence table, and those who have perfect attendance (no absences):

mysql> SELECT * FROM student
    -> WHERE student_id IN (SELECT student_id FROM absence);
+-------+-----+------------+
| name  | sex | student_id |
+-------+-----+------------+
| Kyle  | M   |          3 |
| Abby  | F   |          5 |
| Peter | M   |         10 |
| Will  | M   |         17 |
| Avery | F   |         20 |
+-------+-----+------------+
mysql> SELECT * FROM student
    -> WHERE student_id NOT IN (SELECT student_id FROM absence);
+-----------+-----+------------+
| name      | sex | student_id |
+-----------+-----+------------+
| Megan     | F   |          1 |
| Joseph    | M   |          2 |
| Katie     | F   |          4 |
| Nathan    | M   |          6 |
| Liesl     | F   |          7 |
...

IN and NOT IN also work for subqueries that return multiple columns. In other words, you can use them with table subqueries. In this case, use a row constructor to specify the comparison values to test against each column:

mysql> SELECT last_name, first_name, city, state FROM president
    -> WHERE (city, state) IN
    -> (SELECT city, state FROM president
    -> WHERE last_name = 'Roosevelt');
+-----------+-------------+-----------+-------+
| last_name | first_name  | city      | state |
+-----------+-------------+-----------+-------+
| Roosevelt | Theodore    | New York  | NY    |
| Roosevelt | Franklin D. | Hyde Park | NY    |
+-----------+-------------+-----------+-------+

IN and NOT IN actually are synonyms for = ANY and <> ALL, which are covered in the next section.

ALL, ANY, and SOME Subqueries

The ALL and ANY operators are used in conjunction with a relative comparison operator to test the result of a column subquery. They test whether the comparison value stands in a particular relationship to all or some of the values returned by the subquery. For example, <= ALL is true if the comparison value is less than or equal to every value that the subquery returns, whereas <= ANY is true if the comparison value is less than or equal to any value that the subquery returns. SOME is a synonym for ANY.
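One way to picture these semantics is with Python's built-in all() and any() functions. The sketch below is only a model of the comparison logic, not how the server evaluates the operators; the helper names le_all and le_any are hypothetical, the list of dates plays the role of the column subquery result, and NULL handling is ignored entirely.

```python
# Stand-in for the values a column subquery might return (ISO dates compare
# correctly as strings).
births = ["1732-02-22", "1735-10-30", "1743-04-13"]

def le_all(value, values):
    """Model of: value <= ALL (subquery)."""
    return all(value <= v for v in values)

def le_any(value, values):
    """Model of: value <= ANY (subquery); SOME behaves the same way."""
    return any(value <= v for v in values)

# Only the smallest value is <= every value in the set.
earliest = [b for b in births if le_all(b, births)]

# Every value is <= at least one value in the set (itself).
matches_any = [b for b in births if le_any(b, births)]

print(earliest)
print(matches_any)
```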

This statement determines which president was born first by selecting the record with a birth date less than or equal to all the birth dates in the president table (only the earliest date satisfies this condition):

mysql> SELECT last_name, first_name, birth FROM president
    -> WHERE birth <= ALL (SELECT birth FROM president);
+------------+------------+------------+
| last_name  | first_name | birth      |
+------------+------------+------------+
| Washington | George     | 1732-02-22 |
+------------+------------+------------+

On the other hand, the following statement returns all rows because every date is less than or equal to at least one date in the set (itself):

mysql> SELECT last_name, first_name, birth FROM president
    -> WHERE birth <= ANY (SELECT birth FROM president);
+------------+---------------+------------+
| last_name  | first_name    | birth      |
+------------+---------------+------------+
| Washington | George        | 1732-02-22 |
| Adams      | John          | 1735-10-30 |
| Jefferson  | Thomas        | 1743-04-13 |
| Madison    | James         | 1751-03-16 |
| Monroe     | James         | 1758-04-28 |
...

When ALL, ANY, or SOME are used with the = comparison operator, the subquery can be a table subquery. In this case, you test the returned rows using a row constructor to provide the comparison values:

mysql> SELECT last_name, first_name, city, state FROM president
    -> WHERE (city, state) = ANY
    -> (SELECT city, state FROM president
    -> WHERE last_name = 'Roosevelt');
+-----------+-------------+-----------+-------+
| last_name | first_name  | city      | state |
+-----------+-------------+-----------+-------+
| Roosevelt | Theodore    | New York  | NY    |
| Roosevelt | Franklin D. | Hyde Park | NY    |
+-----------+-------------+-----------+-------+

As mentioned in the previous section, IN and NOT IN are shorthand for = ANY and <> ALL. That is, IN means "equal to any of the values returned by the subquery" and NOT IN means "unequal to all values returned by the subquery."

EXISTS and NOT EXISTS Subqueries

The EXISTS and NOT EXISTS operators merely test whether a subquery returns any rows. If it does, EXISTS is true and NOT EXISTS is false. The following statements show some trivial examples of these subqueries. If the absence table is empty, the first returns 0 and the second returns 1; otherwise, the results are reversed:

SELECT EXISTS (SELECT * FROM absence);
SELECT NOT EXISTS (SELECT * FROM absence);

EXISTS and NOT EXISTS actually are much more commonly used in correlated subqueries. The next section shows some examples.

With EXISTS and NOT EXISTS, the subquery uses * as the output column list. There's no need to name columns explicitly, because the subquery is assessed as true or false based on whether it returns any rows, not based on the particular values that the rows might contain. You can actually write pretty much anything for the subquery column selection list, but if you want to make it explicit that you're returning a true value when the subquery succeeds, you might write it with SELECT 1 rather than with SELECT *.
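The following sketch checks both points: that the result depends only on whether any rows come back, and that SELECT * and SELECT 1 behave the same inside EXISTS. It assumes SQLite via Python's sqlite3 module as a stand-in for MySQL; the helper name exists_flags and the inserted row are made up for the example.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table starts out empty.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absence (student_id INT, date TEXT)")

def exists_flags(conn):
    """Return (EXISTS with *, NOT EXISTS with *, EXISTS with 1)."""
    return conn.execute("""
        SELECT EXISTS (SELECT * FROM absence),
               NOT EXISTS (SELECT * FROM absence),
               EXISTS (SELECT 1 FROM absence)
    """).fetchone()

empty_flags = exists_flags(conn)

# Add one (invented) absence record and test again.
conn.execute("INSERT INTO absence VALUES (3, '2004-09-03')")
filled_flags = exists_flags(conn)

print(empty_flags, filled_flags)
```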

Correlated Subqueries

Subqueries can be uncorrelated or correlated:

  • An uncorrelated subquery contains no references to values from the outer query. An uncorrelated subquery can be executed by itself as a separate statement. For example, the subquery in the following statement is uncorrelated because it refers only to the table t1 and not to t2:

    SELECT j FROM t2 WHERE j IN (SELECT i FROM t1);
    

  • A correlated subquery does contain references to values from the outer query, and thus is dependent on it. Due to this linkage, a correlated subquery cannot be executed by itself as a separate statement. For example, the subquery in the following statement is correlated because its WHERE clause refers to column j of the outer table t2; it is true for each value of column j in t2 that matches a column i value in t1:

    SELECT j FROM t2 WHERE (SELECT i FROM t1 WHERE i = j);
    

Correlated subqueries commonly are used for EXISTS and NOT EXISTS subqueries, which are useful for finding records in one table that match or don't match records in another. Correlated subqueries work by passing values from the outer query to the subquery to see whether they match the conditions specified in the subquery. For this reason, it's necessary to qualify column names with table names if they are ambiguous (appear in more than one table).

The following EXISTS subquery identifies matches between the tables (that is, values that are present in both). The statement selects students who have at least one absence listed in the absence table:

SELECT student_id, name FROM student WHERE EXISTS
(SELECT * FROM absence WHERE absence.student_id = student.student_id);

NOT EXISTS identifies non-matches: values in one table that are not present in the other. This statement selects students who have no absences:

SELECT student_id, name FROM student WHERE NOT EXISTS
(SELECT * FROM absence WHERE absence.student_id = student.student_id);
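To watch the two correlated forms at work, you can run them against a few rows. This is a sketch under stated assumptions: SQLite (Python's sqlite3 module) substitutes for MySQL, the student rows are a subset of those shown earlier, and the absence dates are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; absence dates are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (name TEXT, sex TEXT, student_id INT);
    CREATE TABLE absence (student_id INT, date TEXT);
    INSERT INTO student VALUES
        ('Megan', 'F', 1), ('Joseph', 'M', 2), ('Kyle', 'M', 3),
        ('Katie', 'F', 4), ('Abby', 'F', 5);
    INSERT INTO absence VALUES
        (3, '2004-09-03'), (5, '2004-09-03'), (3, '2004-09-06');
""")

# The subquery refers to student.student_id from the outer query, so it is
# evaluated once for each candidate student row.
correlated = """
    SELECT student_id, name FROM student WHERE {test}
    (SELECT * FROM absence WHERE absence.student_id = student.student_id)
    ORDER BY student_id
"""
with_absences = conn.execute(correlated.format(test="EXISTS")).fetchall()
perfect = conn.execute(correlated.format(test="NOT EXISTS")).fetchall()

print(with_absences)
print(perfect)
```

Kyle has two absence records in this sample but is listed only once, because EXISTS asks only whether at least one matching row is present.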

Subqueries in the FROM Clause

Subqueries can be used in the FROM clause to generate values. In this case, the result of the subquery acts like a table. It can participate in joins, its values can be tested in the WHERE clause, and so forth. When using a subquery in a FROM clause, you must provide a table alias to give the subquery result a name.

mysql> SELECT * FROM (SELECT 1, 2, 3) AS t;
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
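A FROM clause subquery becomes more useful when it takes part in a join. The sketch below computes a per-event average in a subquery, names the result avg_t (a hypothetical alias), and joins it back to the score table to find above-average scores for every event. It assumes SQLite through Python's sqlite3 module in place of MySQL, with made-up scores.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE score (student_id INT, event_id INT, score INT);
    INSERT INTO score VALUES
        (1, 1, 80), (2, 1, 90), (3, 1, 70),
        (1, 2, 60), (2, 2, 100);
""")

# The subquery result acts like a table named avg_t: it is joined to score
# and its avg_score column is tested in the WHERE clause.
rows = conn.execute("""
    SELECT score.student_id, score.event_id, score.score, avg_t.avg_score
    FROM score,
         (SELECT event_id, AVG(score) AS avg_score
          FROM score GROUP BY event_id) AS avg_t
    WHERE score.event_id = avg_t.event_id
      AND score.score > avg_t.avg_score
    ORDER BY score.event_id, score.student_id
""").fetchall()

print(rows)
```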

Rewriting Subqueries as Joins

For versions of MySQL prior to 4.1, subqueries are not available. However, it's often possible to rephrase a query that uses a subquery in terms of a join. In fact, even for MySQL 4.1 or higher, it's not a bad idea to examine queries that you might be inclined to write in terms of subqueries. A join is sometimes more efficient than a subquery, so if a SELECT written as a subquery takes a long time to execute, try writing it as a join to see if it performs better. This section shows how to do that.

Rewriting Subqueries That Select Matching Values

Here's an example statement containing a subquery; it selects scores from the score table only for tests (that is, it ignores quiz scores):

SELECT * FROM score
WHERE event_id IN (SELECT event_id FROM grade_event WHERE category = 'T');

The same statement can be written without a subquery by converting it to a simple join:

SELECT score.* FROM score, grade_event
WHERE score.event_id = grade_event.event_id AND grade_event.category = 'T';

As another example, the following query selects scores for female students:

SELECT * FROM score
WHERE student_id IN (SELECT student_id FROM student WHERE sex = 'F');

This can be converted to a join as follows:

SELECT score.* FROM score, student
WHERE score.student_id = student.student_id AND student.sex = 'F';

There is a pattern here. The subquery statements follow this form:

SELECT * FROM table1
WHERE column1 IN (SELECT column2a FROM table2 WHERE column2b = value);

Such queries can be converted to a join using this form:

SELECT table1.* FROM table1, table2
WHERE table1.column1 = table2.column2a AND table2.column2b = value;

Note: In some cases, the subquery and the join might return different results. This occurs when table2 contains multiple rows that have the same column2a value and satisfy the column2b condition. The subquery form returns each matching table1 row only once, but the join returns it once for every matching table2 row, so its output includes duplicate rows. To suppress these duplicates, begin the join with SELECT DISTINCT rather than SELECT.
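The duplicate-row effect is easy to reproduce. In this sketch, student plays the role of table1 and absence the role of table2; a date range test replaces the "column2b = value" condition so that one student matches twice. As before, SQLite (Python's sqlite3 module) is assumed as a stand-in for MySQL and the data is invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; Kyle (ID 3) has two absences.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (name TEXT, sex TEXT, student_id INT);
    CREATE TABLE absence (student_id INT, date TEXT);
    INSERT INTO student VALUES
        ('Kyle', 'M', 3), ('Katie', 'F', 4), ('Abby', 'F', 5);
    INSERT INTO absence VALUES
        (3, '2004-09-03'), (5, '2004-09-03'), (3, '2004-09-06');
""")

subquery_form = conn.execute("""
    SELECT student_id, name FROM student
    WHERE student_id IN
          (SELECT student_id FROM absence WHERE date >= '2004-09-01')
    ORDER BY student_id
""").fetchall()

join_body = """
    student.student_id, student.name FROM student, absence
    WHERE student.student_id = absence.student_id
      AND absence.date >= '2004-09-01'
    ORDER BY student.student_id
"""
# Plain join: one output row per matching absence record.
join_form = conn.execute("SELECT " + join_body).fetchall()
# SELECT DISTINCT removes the repeated rows.
distinct_join_form = conn.execute("SELECT DISTINCT " + join_body).fetchall()

print(subquery_form, join_form, distinct_join_form)
```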

Rewriting Subqueries That Select Non-Matching (Missing) Values

Another common type of subquery statement searches for values in one table that are not present in another table. As we've seen before, the "which values are not present" type of problem is a clue that a LEFT JOIN may be helpful. Here's the statement with a subquery seen earlier that tests for students who are not listed in the absence table (it finds those students with perfect attendance):

SELECT * FROM student
WHERE student_id NOT IN (SELECT student_id FROM absence);

This query can be rewritten using a LEFT JOIN as follows:

SELECT student.*
FROM student LEFT JOIN absence ON student.student_id = absence.student_id
WHERE absence.student_id IS NULL;

In general terms, the subquery statement form is as follows:

SELECT * FROM table1
WHERE column1 NOT IN (SELECT column2 FROM table2);

A query having that form can be rewritten like this:

SELECT table1.*
FROM table1 LEFT JOIN table2 ON table1.column1 = table2.column2
WHERE table2.column2 IS NULL;

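This assumes that table2.column2 is defined as NOT NULL. If the column can contain NULL values, the two forms are not equivalent: NOT IN is never true when the subquery result includes a NULL, whereas the LEFT JOIN still returns the unmatched table1 rows.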
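The sketch below compares the two forms on sample data and then shows why the NOT NULL assumption matters. It uses SQLite via Python's sqlite3 module as a stand-in for MySQL, so treat the NULL behavior as what standard SQL comparison rules produce in SQLite rather than as a statement about any particular MySQL version. The rows, including the absence record with a NULL student ID, are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (name TEXT, sex TEXT, student_id INT);
    CREATE TABLE absence (student_id INT, date TEXT);
    INSERT INTO student VALUES
        ('Megan', 'F', 1), ('Joseph', 'M', 2), ('Kyle', 'M', 3),
        ('Katie', 'F', 4), ('Abby', 'F', 5);
    INSERT INTO absence VALUES
        (3, '2004-09-03'), (5, '2004-09-03'), (3, '2004-09-06');
""")

not_in_sql = """
    SELECT student_id, name FROM student
    WHERE student_id NOT IN (SELECT student_id FROM absence)
    ORDER BY student_id
"""
left_join_sql = """
    SELECT student.student_id, student.name
    FROM student LEFT JOIN absence
         ON student.student_id = absence.student_id
    WHERE absence.student_id IS NULL
    ORDER BY student.student_id
"""

# With no NULL values in absence.student_id, both forms agree.
not_in_rows = conn.execute(not_in_sql).fetchall()
left_join_rows = conn.execute(left_join_sql).fetchall()

# Add an absence row with a NULL student ID and run both forms again.
conn.execute("INSERT INTO absence VALUES (NULL, '2004-09-07')")
not_in_with_null = conn.execute(not_in_sql).fetchall()
left_join_with_null = conn.execute(left_join_sql).fetchall()

print(not_in_rows, left_join_rows)
print(not_in_with_null, left_join_with_null)
```

After the NULL is added, the NOT IN form selects nothing, while the LEFT JOIN form continues to list the students with no absences.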

The subquery has the advantage of being more intuitive than the LEFT JOIN. "Not in" is a concept that most people understand without difficulty, because it occurs outside the context of database programming. The same cannot be said for the concept of "left join," for which there is no such basis for natural understanding.
