10.6 Data Encryption

In applications that store sensitive data, such as banking records, you may want the data to be stored in an encrypted format. Doing so makes it very difficult for someone to use the data even if they walk up to your server and take it home. A full discussion of the relative merits of encryption algorithms and techniques is beyond the scope of this book.

10.6.1 Hashing Passwords

In less sensitive applications, you may need to protect just a few pieces of information, such as a password database for another application. Passwords really shouldn't be stored in the clear, so they are commonly encrypted in applications. But rather than use encryption, it may be wise to follow the lead of most Unix systems and even MySQL itself: use a hashing algorithm on the password and store the result in your table.

Unlike traditional encryption, which can be reversed, hashing is a one-way process that can't be reversed. The only way to determine the password that generated a particular hash value is to use a very computationally expensive brute-force attack (trying all possible combinations of input).

MySQL provides four functions for hashing passwords: PASSWORD( ), ENCRYPT( ), SHA1( ), and MD5( ).^[12] The best way to see the results of each function is to try each one on the same source text. Let's see how the string pa55word hashes in each:

^[12] MySQL's ENCRYPT( ) simply calls the C library's crypt( ) function. On some Unix variants, crypt( ) is an MD5 implementation, making it no different from using MD5( ). On others, it is the traditional DES encryption algorithm.

mysql> SELECT MD5('pa55word');

+----------------------------------+

| MD5('pa55word')                  |

+----------------------------------+

| a17a41337551d6542fd005e18b43afd4 |

+----------------------------------+

1 row in set (0.13 sec)



mysql> SELECT PASSWORD('pa55word');

+----------------------+

| PASSWORD('pa55word') |

+----------------------+

| 1d35c6556b8cab45     |

+----------------------+

1 row in set (0.00 sec)



mysql> SELECT ENCRYPT('pa55word');

+---------------------+

| ENCRYPT('pa55word') |

+---------------------+

| up2Ecb0Hdj25A       |

+---------------------+

1 row in set (0.17 sec)

Each function returns a fixed-length alphanumeric string that can be stored in a CHAR column. To cope with the possibility of mixed-case characters in the result of ENCRYPT( ), it's best to declare the column CHAR BINARY.

Storing hashed data is as easy as:^[13]

^[13] While you can do it the way we describe here, there are a number of reasons why it is much better to do the MD5 calculations on the client machine if possible, because the clear-text password might appear in the process list or in a query log.

INSERT INTO user_table (user, pass) VALUE ('jzawodn', MD5('pa55word') )

To verify user's password, take the username and password supplied and run a SELECT query to see if they match. Using a language such as Perl or PHP, the query might look like this:

SELECT *

  FROM user_table

 WHERE user = '$username'

   AND pass = MD5('$password')

Password hashing is an easy-to-use and relatively secure way to store passwords in a database without them being easily recoverable.

10.6.2 Encrypted Filesystems

Because MySQL's various table handlers all store their data as regular files on whatever filesystem you may be using, it's possible to use an encrypted filesystem. Most popular operating systems have at least one encrypted filesystem available, either free or commercial.

The main advantage of this approach is that you don't have to do anything special for MySQL to take advantage of it. Because all the encryption and decryption takes place outside MySQL, it just performs reads and writes without any knowledge of what's happening under the hood. All you need to do is make sure your data and logs are stored on the proper filesystem. From your application's point of view, there's nothing special about this arrangement either.

There are a few downsides to using an encrypted filesystem with MySQL. First of all, because all the data, indexes, and logs are being encrypted, there will be a fair amount of CPU overhead involved in encrypting and decrypting the data. If you're thinking about using an encrypted filesystem, be sure to perform good benchmarks so that you understand how it behaves under heavy load.

A more subtle problem with this setup occurs when you consider making backups of your data. To copy the data to another location (disk, tape, CD-ROM, server, etc.), the data must be decrypted. To keep the data safe, you need to find backup software that can encrypt your backups. The only real workaround is to take a complete dump of the disk partition. You can safely store a copy elsewhere because the data remains encrypted. However, there's no way to selectively restore pieces of the data; you'd need to restore the entire partition.

10.6.3 Application-Level Encryption

A more common approach to encryption is to build it into the application (or middleware). When the application needs to store sensitive data, it first encrypts the data and stores the result in MySQL. Similarly, when the application retrieves information from MySQL, it must decrypt it.

This approach provides a lot of flexibility. It doesn't tie you to a particular filesystem, operating system, or even database (if your code is written in a generic fashion). It gives the application designer the freedom to choose an encryption algorithm that's most appropriate (balancing speed and strength) for the data being stored.

Because the data is stored encrypted, backups are very easy. No matter where you copy the data, it is encrypted. However, it also means that access to the data must go through software that understands how to decrypt it. You can't just fire up the mysql command-line tool and begin issuing queries.

Application-level encryption does have some drawbacks, though. It is a lot harder for MySQL to effectively index the data, for example. You may find yourself suffering from significant performance issues.

10.6.3.1 Design issues

This freedom and flexibility have interesting implications for database design. You need to ensure that the field types you are using are appropriate for the type of encryption you're using. Some algorithms produce blocks of data with fixed minimum sizes. That means you may need a column that can hold 256 bytes just to hold a piece of data that is significantly smaller before encryption. Many popular encryption libraries produce binary data, so you'll need to create columns that can store binary data. As an alternative, you can convert the binary data to a hex or base-64 representation, but that would require more space and time.

Deciding exactly what data should and shouldn't be encrypted isn't easy either. You need to balance security against making the information in your tables difficult to query. For example, you might have an account table that represents bank accounts and contains the following fields:

id
type
status
balance
overdraft_protection
date_established

Which fields make sense to encrypt? If you encrypt the balance, which seems reasonable, it would be difficult to answer common reporting questions. For example, you might try to write the following query to find the minimum, maximum, and average balance of accounts of each account type:

  SELECT MIN(balance), MAX(balance), AVG(balance)

    FROM account

GROUP BY type

But the results would be meaningless. MySQL has no clue what the balance field means, so it would just try to perform those functions on the encrypted data in the balance field.

The obvious but painful solution is for your application to read all the records from the account table and do the math for the report you need. That may not be terribly difficult, but it's annoying. Not only are you reimplementing functionality MySQL already provides, you're also slowing down the process considerably.

What all this boils down to is a tradeoff between security and the advantages of using a relational database in the first place. Any field that contains encrypted data is basically useless to MySQL's built-in functions because they need to operate on the unencrypted data. Similar problems arise in query optimization. In an unencrypted setup, you can easily find all the accounts with a balance greater than $100,000 by doing this:

  SELECT *

    FROM account

   WHERE balance > 100000

If there is an index on the balance field, MySQL will probably locate the records in a split second. But if the data is encrypted, you have to get all the records in your application and filter them after they're decrypted. There's just no way for MySQL to help you out.

10.6.4 Source Code Modification

If you're looking for a more flexible approach than either encrypted filesystems or application-based encryption, you can always build a custom solution. The source code for MySQL is freely available under the GNU General Public License.

This sort of work requires that you either know C++ or hire someone who does. Beyond that, you'll be looking to create your own table handler with native encryption support, or you might find it easier to extend an existing table handler (the MyISAM and BDB handlers are easiest to understand) with encryption.

You'll find the relevant files in the sql directory of the MySQL source code. Each table handler is composed of at least two C++ files. The MyISAM handler code, for example, is in ha_myisam.h and ha_myisam.cc.

< Day Day Up >