7.7 The Future of Replication

To solve some of MySQL's current shortcomings and to provide the infrastructure for handling problems that MySQL hasn't yet seen, a number of future enhancements have been proposed for MySQL. Let's look at them briefly.

7.7.1 Eliminating the Snapshot

With MySQL's current implementation, it's difficult to add a slave to a master after the master has been running for a long period of time. Many of the original binary logs have probably been removed to save space. Without all the logs, you can't simply configure the slave and point it at the master.

Even if you have all the binary logs on the master, it may take days, weeks, or even months for a slave to execute all the queries and finally catch up to the master. If you're looking to add slaves in a hurry, this clearly isn't the way to do it.

In either case, the ideal solution is simply to configure the new slave and tell it to begin replicating. Behind the scenes, the slave contacts the master and requests copies of the all the tables it needs, probably using a mechanism similar to LOAD TABLE FROM MASTER. The master will need a way to track all changes to tables between the time that the slave begins and finishes copying the tables. Upon completion of the copy, the slave receives all the necessary changes and begins replicating from the binary log.

An alternative is for all of MySQL's storage engines to implement a versioning scheme similar to InnoDB's. When a new slave connects and begins to copy the tables, it can get a snapshot from that moment in time. When the copy is complete, the slave can begin replicating from the binary log position corresponding to the moment when the snapshot was marked.

7.7.2 Fail-Safe Replication

When a master fails, you must select a new master and instruct all the slaves to connect to the new master and begin replicating. Not only is that process prone to errors, it can be time-consuming too. Ideally, MySQL should handle failover automatically.

The proposed solution involves each slave registering itself with the master so that the master can keep track of it. Not only will the master know which servers are slaves, it can also keep track of how up to date each slave is. The slaves, in turn, will also keep track of who all the other slaves are.

In the event that the master fails, the slaves can elect a master based on the available information. Ideally, they will find the slave that was the most up to date when the master went down.

7.7.3 Safe Multi-Master Replication

Today it's possible to use replication in a multi-master architecture, as depicted earlier (see Figure 7-3). The major drawback to doing so, however, is that you can't rely on AUTO_INCREMENT columns to function properly.

Each MyISAM table has a single counter that controls the next AUTO_INCREMENT value. Once that value has increased, it can't easily be decreased. If inserts are timed properly, they cause data to become inconsistent between the two masters.

Imagine the following events occurring on two servers, master1 and master2:

Both servers start with an empty orders table.
master1 inserts a record for customer 58, which is assigned ID 1.
master2 inserts a record for customer 1232, which is assigned ID 1.
master2 replicates master1's insert, adding the record for customer 58 and trying to assign it an ID of 1. That fails and results in a duplicate key error.
master1 replicates master2's insert, adding the record for customer 1232 and trying to assign it an ID of 1. That fails and results in a duplicate key error.

Each master was given an insert by some client before it had replicated the other master's insert. The result is that both masters are out of sync.

The current solution is to avoid using AUTO_INCREMENT fields completely and assign primary keys through some other means. You might use an MD5 hash of some values in the record, or perhaps use another library to generate a globally unique identifier (GUID).

Let's look at the two proposed solutions for the future.

7.7.3.1 Multipart auto-increment unique keys

The first is to use MyISAM's multipart auto-increment unique keys. Rather than using a single column as a primary key, you'd set up a table like this:

CREATE TABLE orders (

    server_id       INTEGER UNSIGNED NOT NULL,

    record_id       INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,

    stuff           VARCHAR(255)     NOT NULL,

    UNIQUE mykey (server_id, record_id)

);

Notice that the record_id is an AUTO_INCREMENT field and is the second part of a two-part unique key. When you insert NULL into the record_id column, MySQL will consider the value of server_id when automatically generating a value.

To illustrate this, notice the following:

mysql> insert into orders values (1, NULL, 'testing');

Query OK, 1 row affected (0.01 sec)



mysql> insert into orders values (1, NULL, 'testing');

Query OK, 1 row affected (0.00 sec)



mysql> insert into orders values (2, NULL, 'testing');

Query OK, 1 row affected (0.00 sec)



mysql> select * from orders;

+-----------+-----------+---------+

| server_id | record_id | stuff   |

+-----------+-----------+---------+

|         1 |         1 | testing |

|         1 |         2 | testing |

|         2 |         1 | testing |

+-----------+-----------+---------+

3 rows in set (0.03 sec)

MySQL, in effect, allows you to select from multiple AUTO_INCREMENT sequences based on the prefix you use. By adding a function such as SERVER_ID( ) to MySQL and rewriting the previous queries, you can use AUTO_INCREMENT with multi-master replication safely.

mysql> insert into orders values (SERVER_ID( ), NULL, 'testing');

Query OK, 1 row affected (0.01 sec)



mysql> insert into orders values (SERVER_ID( ), NULL, 'testing');

Query OK, 1 row affected (0.00 sec)



mysql> insert into orders values (SERVER_ID( ), NULL, 'testing');

Query OK, 1 row affected (0.00 sec)

There are three problems with this approach. First, it works only for MyISAM tables. An ideal solution works across all table types. Another issue is that all slaves require some special logic. Today, when a slave reads the binary log of a master, it knows the master's server ID as well as its own, but it doesn't really do anything with the master's server ID. In this solution, the slave has to actually use the master's server ID any time that it replicated a query that involved the mythical SERVER_ID( ) function. That makes the replication logic a bit trickier on the slaves.

You could work around the lack of a SERVER_ID( ) function by simply using the actual server ID in your SQL statements. If you know you're talking to server 12, write the query accordingly:

mysql> insert into orders values (12, NULL, 'testing');

Query OK, 1 row affected (0.01 sec)

But there's the rub. You need to know, in advance of each query, what the server's ID is. Granted, the server's ID doesn't change, but if you're accessing one of many servers via a load balancer or don't have a persistent connection, the server you're talking to may change often. So you'd have to deal with the overhead of obtaining the server's ID whenever you need it.

mysql> show variables like 'server_id';

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| server_id     | 102   |

+---------------+-------+

1 row in set (0.00 sec)

Finally, and most importantly, using two columns as the primary key just doesn't feel natural. It feels like a hack or a workaround. If this solution became widespread, others problems might arise. For example, setting up foreign-key relationships would be troublesome. Putting aside the fact that InnoDB doesn't even support multipart auto-increment unique keys, how would you define a foreign-key relationship with multipart keys?

7.7.3.2 Partitioned auto-increment fields

The second solution is to make auto-increment fields a bit more complex. Rather than simply using a 32-bit integer that starts at 1 and keeps counting, it might make sense to use more bits and partition the key-space based on the server ID. Currently, server IDs are 32-bit values, so by using a 64-bit auto-increment value, the two can be combined. The high 32 bits of the value would be the server ID of the server that originally generated the record, and the low 32 bits would be the real auto-increment value.

Internally, MySQL needs to treat the 64-bit auto-increment value a lot like the multipart auto-increment unique keys previously discussed. The value generated for the low 32 bits is dependent on the value of the high 32 bits (the server ID). The benefit is that from the user's point of view, it's a single column and can be used just like any other column. Insert statements are no more complex; all the magic is conveniently under the hood, where it belongs.

There are some downsides to this approach, however. The most apparent issue is that there would be large gaps in the values. For the sake of simplicity, MySQL can always subtract 1 from the server ID when generating the high bits of the auto-increment value. This allows values to continue starting at 1 when the server ID is 1. However, as soon as a second server is introduced, with server ID 2, it inserts values starting from 4,294,967,297 (2³² + 1) and counting up from there.

Another problem is that columns will require more space on disk (both in the data and index files). BIGINT columns are already 8 bytes (64 bits) wide. Adding another 4 bytes (32 bits) for the server ID portion of the auto-increment value means a 50% increase in the space required. That may not sound like a lot, but an application that requires 64-bit values in the first place is likely to be storing billions of rows. Adding an additional 4 bytes to a table containing 10 billion rows means storing an additional 40 GB of data!

It makes sense to break compatibility with existing MySQL versions (which use 32-bit server IDs) and reduce the size of the server ID to 8 or 16 bits. After all, with even 8 bits available, you can have up to 255 unique servers in a single replication setup; with 16 bits, that jumps to 65,535. It's unlikely anyone will have that many servers in a single replication setup.^[9]

^[9] Perhaps Google will decide to run MySQL on their growing farm of 100,000+ Linux servers. They'd need more than 8 bits.

< Day Day Up >