Previous Section  < Day Day Up >  Next Section

9.1 The LOAD DATA INFILE Statement

LOAD DATA INFILE provides an alternative to INSERT for adding new records to a table. With INSERT, you specify data values directly in the INSERT statement. LOAD DATA INFILE reads the values from a separate datafile.

The simplest form of the LOAD DATA INFILE statement specifies only the name of the datafile and the table into which to load the file:






LOAD DATA INFILE 'file_name' INTO TABLE table_name;


The filename is given as a string and must be quoted. MySQL assumes, unless told otherwise, that the file is located on the server host, that it has the default file format (tab-delimited and newline-terminated lines), and that each input line contains a value for each column in the table. However, LOAD DATA INFILE has clauses that give you control over each of those aspects of data-loading operations and more:

  • Which table to load

  • The name and location of the datafile

  • Which columns to load

  • The format of the datafile

  • How to handle duplicate records

  • Whether to ignore lines at the beginning of the datafile

The syntax for LOAD DATA INFILE is as follows, where optional parts of the statement are indicated by square brackets:






LOAD DATA [LOCAL] INFILE 'file_name'

    [IGNORE | REPLACE]

    INTO TABLE table_name

    format_specifiers

    [IGNORE n LINES]

    [(column_list)]


The following sections explain how the various parts of the statement work.

9.1.1 Specifying the Datafile Location

LOAD DATA INFILE can read datafiles that are located on the server host or on the client host:

  • By default, MySQL assumes that the file is located on the server host. The MySQL server reads the file directly.

  • If the statement begins with LOAD DATA LOCAL INFILE rather than with LOAD DATA INFILE, the file is read from the client host on which the statement is issued. In other words, LOCAL means local to the client host from which the statement is issued. In this case, the client program reads the datafile and sends its contents over the network to the server.

The rules for interpreting the filename are somewhat different for the server host and the client host.

9.1.1.1 Specifying the Location of Files on the Server Host

Without LOCAL in the LOAD DATA INFILE statement, MySQL looks for the datafile located on the server host and interprets the pathname as follows:

  • If you refer to the file by its full pathname, the server looks for the file in that exact location.

  • If you specify a relative name with a single component, the server looks for the file in the database directory for the default database. (This isn't necessarily the database that contains the table into which you're loading the file.)

  • If you specify a relative pathname with more than one component, the server interprets the name relative to its data directory.

Suppose that the server's data directory is /var/mysql/data, the database directory for the test database is /var/mysql/data/test, and the file data.txt is located in that database directory. Using the filename interpretation rules just given, it's possible to refer to the data.txt file three different ways in a LOAD DATA INFILE statement:

  • You can refer to the file by its full pathname:

    
    
    
    

    
    LOAD DATA INFILE '/var/mysql/data/test/data.txt' INTO TABLE t;
    
    

  • If test is the default database, you can refer to a file in the database directory using just the final component of its pathname:

    
    
    
    

    
    LOAD DATA INFILE 'data.txt' INTO TABLE t;
    
    

  • You can refer to any file in or under the server's data directory by its pathname relative to that directory:

    
    
    
    

    
    LOAD DATA INFILE './test/data.txt' INTO TABLE t;
    
    

9.1.1.2 Specifying the Location of Files on the Client Host

If you use LOCAL to read a datafile located on the client host, pathname interpretation is simpler:

  • If you refer to the file by its full pathname, the client program looks for the file in that exact location.

  • If you specify a relative pathname, the client program looks for the file relative to its current directory. Normally, this is the directory in which you invoked the program.

Suppose that there's a datafile named data.txt located in the /var/tmp directory on the client host and you invoke the mysql program while located in that directory. You can load the file into a table t using either of these two statements:






LOAD DATA LOCAL INFILE '/var/tmp/data.txt' INTO TABLE t;

LOAD DATA LOCAL INFILE 'data.txt' INTO TABLE t;


The first statement names the file using its full pathname. The second names the file relative to the current directory. If you invoke the mysql program in the /var directory instead, you can still load the file using the same full pathname. However, the relative pathname to the file is different than when running the program in the /var/tmp directory:






LOAD DATA LOCAL INFILE 'tmp/data.txt' INTO TABLE t;


9.1.1.3 Specifying Filenames on Windows

On Windows, the pathname separator character is \, but MySQL treats the backslash as the escape character in strings. To deal with this issue, write separators in Windows pathnames either as / or as \\. To load a file named C:\mydata\data.txt, specify the filename as shown in either of the following statements:






LOAD DATA INFILE 'C:/mydata/data.txt' INTO TABLE t;

LOAD DATA INFILE 'C:\\mydata\\data.txt' INTO TABLE t;


9.1.2 Loading Specific Table Columns

By default, LOAD DATA INFILE assumes that data values in input lines are present in the same order as the columns in the table. If the datafile contains more columns than the table, MySQL ignores the excess data values. If the datafile contains too few columns, each missing column is set to its default value in the table. (This is the same way MySQL handles columns that aren't named in an INSERT statement.)

If input lines don't contain values for every table column, or the data values are not in the same order as table columns, you can add a comma-separated list of column names within parentheses at the end of the LOAD DATA INFILE statement. This tells MySQL how columns in the table correspond to successive columns in the datafile. A list of columns is useful in two ways:

  • If the rows of the datafile don't contain a value for every column in the table, a column list indicates which columns are present in the file. Suppose that a table named subscriber has the following structure:

    
    
    
    

    
    mysql> DESCRIBE subscriber;
    
    +---------+------------------+------+-----+---------+----------------+
    
    | Field   | Type             | Null | Key | Default | Extra          |
    
    +---------+------------------+------+-----+---------+----------------+
    
    | id      | int(10) unsigned |      | PRI | NULL    | auto_increment |
    
    | name    | char(40)         |      |     |         |                |
    
    | address | char(40)         |      |     |         |                |
    
    +---------+------------------+------+-----+---------+----------------+
    
    

    Here, id is an AUTO_INCREMENT column. If you have a file people.txt containing names and addresses and want MySQL to generate ID numbers automatically, load the file like this:

    
    
    
    

    
    LOAD DATA INFILE 'people.txt' INTO TABLE subscriber (name,address);
    
    

    For any table column that isn't assigned a value from the datafile, MySQL sets it to its default value. MySQL thus sets the id column to the next sequence value for each input line.

  • If the order of the columns in the datafile doesn't correspond to the order of the columns in the table, a column list tells MySQL how to match up columns properly. For example, if the lines in people.txt contain addresses and names rather than names and addresses, the statement to load the file looks like this instead:

    
    
    
    

    
    LOAD DATA INFILE 'people.txt' INTO TABLE subscriber (address,name);
    
    

9.1.3 Skipping Datafile Lines

To ignore the initial part of the datafile, use the IGNORE n LINES clause, where n is the number of input lines to skip. This clause is commonly used when a file begins with a row of column names rather than data values. For example, to skip the first input line, a statement might be written like this:






LOAD DATA INFILE 'data.txt' INTO TABLE t IGNORE 1 LINES;


9.1.4 LOAD DATA INFILE and Duplicate Records

When you add new records to a table with an INSERT or REPLACE statement, you can control how to handle new records containing values that duplicate unique key values already present in the table. You can allow an error to occur, ignore the new records, or replace the old records with the new ones. LOAD DATA INFILE affords the same types of control over duplicate records by means of two modifier keywords. However, its duplicate-handling behavior differs slightly depending on whether the datafile is on the server host or the client host, so you must take the datafile location into account.

When loading a file that's located on the server host, LOAD DATA INFILE handles records that contain duplicate unique keys as follows:

  • By default, an input record that causes a duplicate-key violation results in an error and the rest of the datafile isn't loaded. (Records processed up to that point are loaded into the table.)

  • If you specify the IGNORE keyword after the filename, new records that cause duplicate-key violations are ignored and no error occurs. LOAD DATA INFILE processes the entire file, loads all records not containing duplicate keys, and discards the rest.

  • If you specify the REPLACE keyword after the filename, new records that cause duplicate-key violations replace any records already in the table that contain the duplicated key values. LOAD DATA INFILE processes the entire file and loads all its records into the table.

IGNORE and REPLACE are mutually exclusive. You can specify one or the other, but not both.

For datafiles located on the client host, duplicate unique key handling is similar, except that the default is to ignore records that contain duplicate keys rather than to terminate with an error. That is, the default is as though the IGNORE modifier were specified. The reason for this is that the client/server protocol doesn't allow transfer of the datafile from the client host to the server to be interrupted after it has started, so there's no convenient way to abort the operation in the middle.

9.1.5 Interpreting the Result of a LOAD DATA INFILE Statement

As LOAD DATA INFILE executes, it keeps track of the number of records processed and the number of data conversions that occur. Then it returns to the client an information string in the following format (the counts in each field will vary per LOAD DATA INFILE operation):






Records: 174  Deleted: 0  Skipped: 3  Warnings: 14


The fields have the following meanings:

  • Records indicates the number of input records read from the file. (This is not necessarily the number of records added to the table.)

  • Deleted is the number of records in the table that were replaced by input records having the same unique key value as a key already present in the table. The value may be nonzero if you use the REPLACE keyword in the statement.

  • Skipped indicates the number of input records that were ignored because they contained a unique key value that duplicated a key already present in the table. The value may be nonzero if you use the IGNORE keyword in the statement.

  • Warnings indicates the number of problems found in the input file. These can occur for several reasons, such as missing data values or data conversion (for example, converting an empty string to 0 for a numeric column). The warning count can be larger than the number of input records because warnings can occur for each data value in a record. LOAD DATA INFILE doesn't provide more specific information about problems with the datafile. In particular, it doesn't indicate which input records or data values resulted in warnings.

9.1.6 Privileges Needed for LOAD DATA INFILE

LOAD DATA INFILE requires that you have the INSERT privilege for the table into which you want to load data, as well as the DELETE privilege if you specify the REPLACE modifier. For a file located on the client host, you must have read access to the file, but no additional MySQL privileges are required. However, if the datafile is located on the server host, the server itself must read the file. In that case, you must also have the FILE privilege. Because FILE is an administrative privilege, it's likely that to use LOAD DATA INFILE without LOCAL, you'll need to connect to the server as an administrative user such as root.

9.1.7 Efficiency of LOAD DATA INFILE

LOAD DATA INFILE is very efficient for a datafile that is located on the server host. The MySQL server reads the file directly, so the data values need not cross the network from the client to the server. But even for a datafile located locally on the client host, LOAD DATA INFILE is more efficient than INSERT statements because there's less overhead for parsing data values and because the rows are loaded in a single operation. (Some of the efficiency of loading multiple rows at once can be obtained with INSERT if you use its multiple-row syntax, but LOAD DATA INFILE is still more efficient.)

    Previous Section  < Day Day Up >  Next Section