2.4 Selecting the Right Engine

When designing MySQL-based applications, you should decide which engine to use for storing your data. If you don't think about it during the design phase, you will likely face complications later in the process. You might find that the default engine doesn't provide a feature you need, such as transactions. Or maybe the mix of read and write queries your application generates will require more granular locking than MyISAM's table locks.

Because you can make the choice on a table-by-table basis, you'll need a clear idea of how each table is used and the data it stores. Of course, it also helps to have a good understanding of the application as a whole and its potential for growth. Armed with this information, you can begin to make good choices about which table engines can do the job.

2.4.1 Considerations

While there are many factors that can affect your decision, it usually boils down to just a few considerations: transactions and concurrency, backups, and special features.

2.4.1.1 Transactions and concurrency

When it comes to transactions and concurrency, consider the following guidelines:

If your application requires transactions and high read/write concurrency, InnoDB is probably your best bet.
If your application requires transactions but only moderate read/write concurrency, either BDB or InnoDB tables should work fine.
If your application doesn't require transactions and issues primarily SELECT or primarily INSERT/UPDATE queries, MyISAM is a good choice. Many web applications fall into this category.

2.4.1.2 Backups

The need to perform regular backups may also influence your table choices. If your server can be shut down at regular intervals for backups, the storage engines are equally easy to deal with. However, if you need to perform online backups in one form or another, the choices become less clear. Chapter 9 deals with this topic in more detail.

Another way of looking at this is simplicity. As you'll see in Chapter 9, using multiple storage engines increases the complexity of backups and server tuning. You may decide that it's just easier to use a single storage engine rather than those that are theoretically best.

2.4.1.3 Special features

Finally, you sometimes find that an application relies on particular features or optimizations that are provided by only some of MySQL's storage engines. For example, not all tables provide a quick answer to queries like the following:

SELECT COUNT(*) FROM mytable

If your application depends on accurate and fast row counts, MyISAM is the answer. InnoDB must actually count up all the rows, but the MyISAM storage engine always knows the exact row count of a table without the need to do any work.

If your application requires referential integrity with foreign keys, you're limited to just InnoDB tables. Do you need full-text search capabilities? Only MyISAM tables provide it.

Keep this in mind as you read the more detailed information about each table's features. There will come a time when you find that the feature you really, really need is available only in one table engine. When that happens, you need to either compromise or break a table into multiple tables of different types.

2.4.2 Practical Examples

These issues may seem rather abstract without some sort of real-world context. So let's consider some common uses for tables in various database applications. For each table, we'll look at which engine best matches with the table's needs. The details of each engine are covered in the next section.

2.4.2.1 Logging

Suppose you want to use MySQL to log a record of every telephone call from a central telephone switch in real time. Or maybe you've installed mod_log_sql for Apache so you can log all visits to your web site directly in a table. In such an application, speed is probably the most important goal; you don't want the database to be the bottleneck. Using MyISAM tables works very well because they have very low overhead and can handle inserting thousands of records per second.

Things will get interesting if you decide it's time to start running reports to summarize the data you've logged. Depending on the queries you use, there's a good chance you will significantly slow the process of inserting records while gathering data for the report. What can you do?

You can use MySQL's built-in replication (Chapter 7) to clone the data onto a second (slave) server. You can then run your time- and CPU-intensive queries against the data on the slave. This keeps the master free to insert records as fast as it possibly can while also giving you the freedom to run any query you want without worrying about how it could affect the real-time logging.

Another option is to use a MyISAM Merge table. Rather than always logging to the same table, adjust the application to log to a table that contains the name or number of the month in its name, such as web_logs_2004_01 or web_logs_2004_jan. Then define a Merge table that contains the data you'd like to summarize and use it in your queries. If you need to summarize data daily or weekly, the same strategy works; you just need to create tables with more specific names, such as web_logs_2004_01_01. While you're busy running queries against tables that are no longer being written to, your application can log records to its current table uninterrupted. Merge tables are discussed in the later section "MyISAM Merge tables."

A final possibility is simply to switch to using a table that has more granular locking than MyISAM does. Either BDB or InnoDB works well in this case. Non-MyISAM tables will generally use more CPU and disk space, but that may be a reasonable tradeoff in this case. Also, in the event of a crash, MyISAM tables may take quite a long time to check and repair while InnoDB tables should recover quickly.

2.4.2.2 Read-only or read-mostly tables

Tables that contain the data used to construct a catalog or listing of some sort (jobs, auctions, real estate, etc.) are usually read from far more often than they are written to. This makes them great candidates for MyISAM.

2.4.2.3 Order processing

When you deal with any sort of order processing, transactions are a requirement. Half-completed orders aren't going to endear customers to your service. Using transaction-safe table types (InnoDB or BDB), these unfortunate "data surprises" can be avoided. Considering that BDB tables use—at best—locking at the page level, applications with high transaction volumes should consider InnoDB tables.

In the case of order processing, InnoDB has a distinct advantage because it supports referential integrity through the use of foreign keys. These keys allow a field in one table to have an enforced relationship to the key of another table (e.g., an Order record contains a CustomerID field that "points" to the primary key of the Customer table). Foreign keys effectively point to those other tables and indicate that data is maintained in them, and they help you keep data consistent across your tables. (Keep in mind that a foreign key in an InnoDB table must reference another InnoDB table. Currently they can't cross storage engines.)

You might want to design your tables so that customers can't be removed without also removing all their orders. Or maybe you'd like to ensure that products aren't deleted from the catalog table before the orders that reference those products are archived. With InnoDB's foreign keys, you can.

2.4.2.4 Stock quotes

If you're collecting stock quotes for your own analysis, MyISAM tables work great. However, if you're running a high-traffic web service that has a real-time quote feed and thousands of users, a query should never have to wait. At any time, there could be many clients attempting to read and write to the table, so the row-level locking provided by InnoDB tables is the way to go.

If you have sufficient memory, MySQL's in-memory Heap tables might be an option, too. However, their indexes have some interesting restrictions you need to investigate first. See Section 4.3.2 in Chapter 4 for more details.

2.4.2.5 Bulletin boards and threaded discussion forums

Threaded discussions are an interesting problem for MySQL users. There are hundreds of freely available PHP and Perl-based systems available that provide threaded discussions. Many of them aren't written with database efficiency in mind, so they tend to perform a large number of queries for each request they serve, as well as updates to counters and usage statistics about the various discussions. Many of the systems also use a small number of monolithic tables to store all their data. As a result, a few central tables become the focus of heavy read and write activity, and the locks required to enforce concurrency become a substantial source of contention.

Despite their design shortcomings, most of the systems work well for small and medium loads. However, if a web site grows large enough and generates a significant amount of traffic, it may begin to get very slow. The obvious solution is to switch to a different table type that can handle the heavy read/write volume. Users who have attempted this are sometimes surprised to find that the system runs even more slowly than it did before!

What they don't realize is that the system is using a particular query, normally something like:

SELECT COUNT(*) FROM table WHERE ...

The problem is that not all engines can run that query quickly. MyISAM tables keep accurate row counts available, so they can. But BDB and InnoDB must actually scan the data to count all the rows. The developers of the popular web site Slashdot (http://slashdot.org/) ran into this problem when they moved their system from MyISAM to InnoDB tables. They spent time going through their code to eliminate all those queries.

MySQL's query cache, which we'll cover in more detail in Chapter 5, can often be a big help in situations in which an application issues the same query over and over with the same parameters.

2.4.2.6 CD-ROM applications

If you ever need to distribute a CD-ROM- or DVD-ROM-based application that uses MySQL data files, consider using MyISAM or Compressed MyISAM tables. They can be easily isolated and copied to other media. Compressed MyISAM tables take far less space than uncompressed ones, but they are read-only. Since the data is going to be on read-only media anyway, there's little reason not to use compressed tables.

2.4.3 Table Conversions

Several techniques are available to convert one table type to another, each with advantages and disadvantages. In the following sections, we cover three of the most common.

2.4.3.1 ALTER TABLE

The easiest way to move a table from one engine to another is by using an ALTER TABLE statement. The following command converts mytable to BDB:

ALTER TABLE mytable TYPE = BDB;

As of MySQL Versions 4.0.18 and 4.1.2, you may use ENGINE instead of TYPE. In a later version of MySQL (probably in the 5.x series), support for TYPE will be removed entirely.

The previous syntax works for all storage engines, but there's a catch: it can take a lot of time. MySQL will perform a row-by-row copy of your old table into your new table. During that time, you'll probably be using all the server's disk I/O capacity, and the original table will be locked while the conversion runs. So take care before trying this technique on a busy table. Instead, you can use one of the following methods, which involve making a copy of the table first.

2.4.3.2 Dump and reimport

To gain more control over the process, you might choose to dump the table to a text file using the mysqldump utility. Once the table is dumped, simply edit the dump file to adjust the CREATE TABLE statement it contains. Be sure to change the table name as well as its type because you can't have two tables with the same name in the same database even if they are of different types.

If you import into InnoDB or BDB, be sure to use the --no-autocommit option to disable AUTOCOMMIT mode. Otherwise each individual insert will be performed in its own transaction.

The downside of using mysqldump is that it isn't terribly fast and uses far more disk space. Not only will the dump file contain all the data from the table, it will also contain all the SQL necessary to repopulate the table. Also, you won't be able to delete the dump file until the new table has been created.

Furthermore, if the dump file happens to be quite large, editing it can be a challenge. You can't simply load a 6-GB file into vi or emacs on most systems.^[2] Instead, you'll need to craft a Perl or sed script to do the job.

^[2] Maybe you can, but it'll be pretty painful.

2.4.3.3 CREATE and SELECT

The third technique is a compromise between the speed of the first mechanism and the safety of the second. Rather than dumping the entire table or converting it all at once, you create the new table and use MySQL's INSERT INTO ... SELECT syntax to populate it incrementally. If, for example, you have a MyISAM table called myisam_table that you'd like to convert to an InnoDB table named innodb_table, you need to run queries like this:

BEGIN;

INSERT INTO innodb_table SELECT * FROM myisam_table WHERE id BETWEEN x AND y;

COMMIT;

Assuming that id is the primary key, you run that query using larger values of x and y each time until all the data has been copied to the new table. After doing so, you are left with the original table, which you can drop after you're done with it, and the new table, which is now fully populated.

Alternatively, if you use MySQL 4.1 or newer, you can create the new table and copy the table in two steps:

CREATE TABLE newtable LIKE mytable;

INSERT INTO newtable SELECT * FROM mytable;

Whichever method you use, if you're dealing with a large volume of data, it's often more efficient to copy the data before adding indexes to the new table.

< Day Day Up >