
9.6. Management of MySQL Cluster

Managing a MySQL Cluster involves a number of tasks, the first of which is to configure and start MySQL Cluster. This is covered in Section 9.4, "MySQL Cluster Configuration," and Section 9.5, "Process Management in MySQL Cluster."

The following sections cover the management of a running MySQL Cluster.

There are essentially two methods of actively managing a running MySQL Cluster. The first is through commands entered in the management client, whereby cluster status can be checked, log levels changed, backups started and stopped, and nodes stopped and started. The second method involves studying the contents of the cluster log ndb_node_id_cluster.log in the management server's DataDir directory. (Here, node_id is the node ID of the management server.) The cluster log contains event reports generated by ndbd. It is also possible to send cluster log entries to a Unix system log.
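
For example, ALERT and ERROR entries can be pulled out of the cluster log with standard text tools. This is a minimal sketch; the file name and sample lines below are illustrative, following the "time date [source] SEVERITY -- message" layout described later in this chapter:

```shell
# Create a small sample cluster log (illustrative lines only).
cat > ndb_1_cluster.log <<'EOF'
09:19:30 2005-07-24 [NDB] INFO -- Node 4 Start phase 4 completed
09:20:05 2005-07-24 [NDB] ALERT -- Node 3: Node 2 Disconnected
09:20:06 2005-07-24 [NDB] WARNING -- Node 3: Node 2 missed heartbeat 2
EOF

# Show only ALERT and ERROR entries.
grep -E ' (ALERT|ERROR) -- ' ndb_1_cluster.log
```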

9.6.1. MySQL Cluster Startup Phases

This section describes the steps involved when the cluster is started.

There are several different startup types and modes, as shown here:

  • Initial Start: The cluster starts with a clean filesystem on all data nodes. This occurs either when the cluster is started for the very first time, or when it is restarted using the --initial option.

  • System Restart: The cluster starts and reads data stored in the data nodes. This occurs when the cluster has been shut down after having been in use, and it is desired that the cluster resume operations from the point where it left off.

  • Node Restart: This is the online restart of a cluster node while the cluster itself is running.

  • Initial Node Restart: This is the same as a node restart, except that the node is reinitialized and started with a clean filesystem.

Prior to startup, each data node (ndbd process) must be initialized. Initialization consists of the following steps:

1. Obtain a node ID.

2. Fetch configuration data.

3. Allocate ports to be used for inter-node communications.

4. Allocate memory according to settings obtained from the configuration file.

After each data node has been initialized, the cluster startup process can proceed. The stages which the cluster goes through during this process are listed here:

  • Stage 0

    Clears the cluster filesystem. This stage occurs only if the cluster was started with the --initial option.

  • Stage 1

    This stage sets up Cluster connections, establishes inter-node communications, and starts Cluster heartbeats.

  • Stage 2

    The arbitrator node is elected. If this is a system restart, the cluster determines the latest restorable global checkpoint.

  • Stage 3

    This stage initializes a number of internal cluster variables.

  • Stage 4

    For an initial start or initial node restart, the REDO log files are created. The number of these files is equal to NoOfFragmentLogFiles.

    For a system restart:

    • Read schema or schemas.

    • Read data from the local checkpoint and undo logs.

    • Apply all redo information until the latest restorable global checkpoint has been reached.

    For a node restart, find the tail of the REDO log.

  • Stage 5

    If this is an initial start, create the SYSTAB_0 and NDB$EVENTS internal system tables.

    For a node restart or an initial node restart:

    1. The node is included in transaction handling operations.

    2. The node schema is compared with that of the master and synchronized with it.

    3. Data received in the form of INSERT operations from the other data nodes in this node's node group is synchronized.

    4. In all cases, wait for a complete local checkpoint as determined by the arbitrator.

  • Stage 6

    Update internal variables.

  • Stage 7

    Update internal variables.

  • Stage 8

    In a system restart, rebuild all indexes.

  • Stage 9

    Update internal variables.

  • Stage 10

    At this point in a node restart or initial node restart, APIs may connect to the node and begin to receive events.

  • Stage 11

    At this point in a node restart or initial node restart, event delivery is handed over to the node joining the cluster. The newly-joined node takes over responsibility for delivering its primary data to subscribers.

After this process is completed for an initial start or system restart, transaction handling is enabled. For a node restart or initial node restart, completion of the startup process means that the node may now act as a transaction coordinator.

9.6.2. Commands in the Management Client

In addition to the central configuration file, a cluster may also be controlled through a command-line interface available through the management client ndb_mgm. This is the primary administrative interface to a running cluster.

Commands for the event logs are given in Section 9.6.3, "Event Reports Generated in MySQL Cluster." Commands for creating backups and restoring from backup are provided in Section 9.6.5, "Online Backup of MySQL Cluster."

The management client has the following basic commands. In the listing that follows, node_id denotes either a database node ID or the keyword ALL, which indicates that the command should be applied to all of the cluster's data nodes.

  • HELP

    Displays information on all available commands.

  • SHOW

    Displays information on the cluster's status.

    Note: In a cluster where multiple management nodes are in use, this command displays information only for data nodes that are actually connected to the current management server.

  • node_id START

    Starts the data node identified by node_id (or all data nodes).

  • node_id STOP

    Stops the data node identified by node_id (or all data nodes).

  • node_id RESTART [-N] [-I]

    Restarts the data node identified by node_id (or all data nodes).

  • node_id STATUS

    Displays status information for the data node identified by node_id (or for all data nodes).

  • ENTER SINGLE USER MODE node_id

    Enters single-user mode, wherein only the MySQL server identified by the node ID node_id is allowed to access the database.

  • EXIT SINGLE USER MODE

    Exits single-user mode, allowing all SQL nodes (that is, all running mysqld processes) to access the database.

  • QUIT

    Terminates the management client.

  • SHUTDOWN

    Shuts down all cluster nodes, except for SQL nodes, and exits.
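
These commands can also be issued non-interactively with ndb_mgm's -e option, which is useful for administrative scripts. The sketch below only prints the command lines it would run (a dry run), since it assumes a cluster and the ndb_mgm binary are available; remove the echo to execute against a live cluster:

```shell
# Dry-run sketch: print the ndb_mgm batch invocations an admin script
# might issue. Remove 'echo' to run them for real (assumes ndb_mgm is
# on the PATH and a management server is reachable).
for cmd in "SHOW" "ALL STATUS" "2 START"; do
  echo ndb_mgm -e "\"$cmd\""
done
```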

9.6.3. Event Reports Generated in MySQL Cluster

In this section, we discuss the types of event logs provided by MySQL Cluster, and the types of events that are logged.

MySQL Cluster provides two types of event log. These are the cluster log, which includes events generated by all cluster nodes, and node logs, which are local to each data node.

Output generated by cluster event logging can have multiple destinations including a file, the management server console window, or syslog. Output generated by node event logging is written to the data node's console window.

Both types of event logs can be set to log different subsets of events.

Note: The cluster log is the log recommended for most uses because it provides logging information for an entire cluster in a single file. Node logs are intended to be used only during application development, or for debugging application code.

Each reportable event can be distinguished according to three different criteria:

  • Category: This can be any one of the following values: STARTUP, SHUTDOWN, STATISTICS, CHECKPOINT, NODERESTART, CONNECTION, ERROR, or INFO.

  • Priority: This is represented by one of the numbers from 1 to 15 inclusive, where 1 indicates "most important" and 15 "least important."

  • Severity Level: This can be any one of the following values: ALERT, CRITICAL, ERROR, WARNING, INFO, or DEBUG.

Both the cluster log and the node log can be filtered on these properties.

9.6.3.1. Logging Management Commands

The following management commands are related to the cluster log:

  • CLUSTERLOG ON

    Turns the cluster log on.

  • CLUSTERLOG OFF

    Turns the cluster log off.

  • CLUSTERLOG INFO

    Provides information about cluster log settings.

  • node_id CLUSTERLOG category=threshold

    Logs category events with priority less than or equal to threshold in the cluster log.

  • CLUSTERLOG FILTER severity_level

    Toggles cluster logging of events of the specified severity_level.

The following table describes the default setting (for all data nodes) of the cluster log category threshold. If an event has a priority with a value lower than or equal to the priority threshold, it is reported in the cluster log.

Note that events are reported per data node, and that the threshold can be set to different values on different nodes.

Category       Default Threshold (All Data Nodes)
STARTUP        7
SHUTDOWN       7
STATISTICS     7
CHECKPOINT     7
NODERESTART    7
CONNECTION     7
ERROR          15
INFO           7


Thresholds are used to filter events within each category. For example, a STARTUP event with a priority of 3 is not logged unless the threshold for STARTUP is set to 3 or higher. Only events with priority 3 or lower are sent if the threshold is 3.
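
The filtering rule can be expressed directly. This is a sketch of the comparison only, not cluster code; the helper function name is our own:

```shell
# An event is reported when its priority is <= the category threshold.
is_logged() {  # usage: is_logged <event_priority> <category_threshold>
  if [ "$1" -le "$2" ]; then echo yes; else echo no; fi
}

is_logged 3 7    # STARTUP event, priority 3, default threshold 7 -> yes
is_logged 9 7    # priority 9 exceeds threshold 7 -> no
```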

The following table shows the event severity levels. (Note: These correspond to Unix syslog levels, except for LOG_EMERG and LOG_NOTICE, which are not used or mapped.)

Level  Severity   Description
1      ALERT      A condition that should be corrected immediately, such as a corrupted system database
2      CRITICAL   Critical conditions, such as device errors or insufficient resources
3      ERROR      Conditions that should be corrected, such as configuration errors
4      WARNING    Conditions that are not errors, but that might require special handling
5      INFO       Informational messages
6      DEBUG      Debugging messages used for NDB Cluster development


Event severity levels can be turned on or off (using CLUSTERLOG FILTER, as described above). If a severity level is turned on, all events with a priority less than or equal to the category thresholds are logged. If the severity level is turned off, no events belonging to that severity level are logged.

9.6.3.2. Log Events

An event report reported in the event logs has the following format:

datetime [string] severity -- message

For example:

09:19:30 2005-07-24 [NDB] INFO -- Node 4 Start phase 4 completed

This section discusses all reportable events, ordered by category and severity level within each category.

In the event descriptions, GCP and LCP mean "Global Checkpoint" and "Local Checkpoint," respectively.
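
A log line in this format can be split apart with standard tools; this sketch reuses the example line above (the field positions follow the "datetime [string] severity -- message" layout):

```shell
line='09:19:30 2005-07-24 [NDB] INFO -- Node 4 Start phase 4 completed'

# Whitespace-separated fields: time, date, [source], severity, '--',
# then the free-form message.
severity=$(echo "$line" | awk '{print $4}')
message=$(echo "$line" | sed 's/^.* -- //')

echo "$severity"   # INFO
echo "$message"    # Node 4 Start phase 4 completed
```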

Connection Events

These events are associated with connections between Cluster nodes.

Event                  Priority  Severity Level  Description
DB nodes connected     8         INFO            Data nodes connected
DB nodes disconnected  8         INFO            Data nodes disconnected
Communication closed   8         INFO            SQL node or data node connection closed
Communication opened   8         INFO            SQL node or data node connection opened


Checkpoint Events

The logging messages shown here are associated with checkpoints.

Event                                Priority  Severity Level  Description
LCP stopped in calc keep GCI         0         ALERT           LCP stopped
Local checkpoint fragment completed  11        INFO            LCP on a fragment has been completed
Global checkpoint completed          10        INFO            GCP finished
Global checkpoint started            9         INFO            Start of GCP: REDO log is written to disk
Local checkpoint completed           8         INFO            LCP completed normally
Local checkpoint started             7         INFO            Start of LCP: data written to disk
Report undo log blocked              7         INFO            UNDO logging blocked; buffer near overflow


Startup Events

The following events are generated in response to the startup of a node or of the cluster, and relate to its success or failure. They also provide information about the progress of the startup process, including logging activities.

Event                                                 Priority  Severity Level  Description
Internal start signal received STTORRY                15        INFO            Blocks received after completion of restart
Undo records executed                                 15        INFO
New REDO log started                                  10        INFO            GCI keep X, newest restorable GCI Y
New log started                                       10        INFO            Log part X, start MB Y, stop MB Z
Node has been refused for inclusion in the cluster    8         INFO            Node cannot be included in cluster due to misconfiguration, inability to establish communication, or other problem
DB node neighbors                                     8         INFO            Shows neighboring data nodes
DB node start phase X completed                       4         INFO            A data node start phase has been completed
Node has been successfully included into the cluster  3         INFO            Displays the node, managing node, and dynamic ID
DB node start phases initiated                        1         INFO            NDB Cluster nodes starting
DB node all start phases completed                    1         INFO            NDB Cluster nodes started
DB node shutdown initiated                            1         INFO            Shutdown of data node has commenced
DB node shutdown aborted                              1         INFO            Unable to shut down data node normally


Noderestart Events

The following events are generated when restarting a node and relate to the success or failure of the node restart process.

Event                                         Priority  Severity Level  Description
Node failure phase completed                  8         ALERT           Reports completion of node failure phases
Node has failed, node state was X             8         ALERT           Reports that a node has failed
Report arbitrator results                     2         ALERT           There are eight different possible results for arbitration attempts:

  • Arbitration check failed - less than 1/2 nodes left

  • Arbitration check succeeded - node group majority

  • Arbitration check failed - missing node group

  • Network partitioning - arbitration required

  • Arbitration succeeded - affirmative response from node X

  • Arbitration failed - negative response from node X

  • Network partitioning - no arbitrator available

  • Network partitioning - no arbitrator configured

Completed copying a fragment                  10        INFO
Completed copying of dictionary information   8         INFO
Completed copying distribution information    8         INFO
Starting to copy fragments                    8         INFO
Completed copying all fragments               8         INFO
GCP takeover started                          7         INFO
GCP takeover completed                        7         INFO
LCP takeover started                          7         INFO
LCP takeover completed (state = X)            7         INFO
Report whether an arbitrator is found or not  6         INFO            There are seven different possible outcomes when seeking an arbitrator:

  • Management server restarts arbitration thread [state=X]

  • Prepare arbitrator node X [ticket=Y]

  • Receive arbitrator node X [ticket=Y]

  • Started arbitrator node X [ticket=Y]

  • Lost arbitrator node X - process failure [state=Y]

  • Lost arbitrator node X - process exit [state=Y]

  • Lost arbitrator node X <error msg> [state=Y]


Statistics Events

The following events are of a statistical nature. They provide information such as numbers of transactions and other operations, amount of data sent or received by individual nodes, and memory usage.

Event                             Priority  Severity Level  Description
Report job scheduling statistics  9         INFO            Mean internal job scheduling statistics
Sent number of bytes              9         INFO            Mean number of bytes sent to node X
Received number of bytes          9         INFO            Mean number of bytes received from node X
Report transaction statistics     8         INFO            Numbers of transactions, commits, reads, simple reads, writes, concurrent operations, attribute information, and aborts
Report operations                 8         INFO            Number of operations
Report table create               7         INFO
Memory usage                      5         INFO            Data and index memory usage (80%, 90%, and 100%)


Error Events

These events relate to Cluster errors and warnings. The presence of one or more of these generally indicates that a major malfunction or failure has occurred.

Event                         Priority  Severity Level  Description
Dead due to missed heartbeat  8         ALERT           Node X declared dead due to missed heartbeat
Transporter errors            2         ERROR
Transporter warnings          8         WARNING
Missed heartbeats             8         WARNING         Node X missed heartbeat Y
General warning events        2         WARNING


INFO Events

These events provide general information about the state of the cluster and activities associated with Cluster maintenance, such as logging and heartbeat transmission.

Event                       Priority  Severity Level  Description
Sent heartbeat              12        INFO            Heartbeat sent to node X
Create log bytes            11        INFO            Log part, log file, MB
General information events  2         INFO


9.6.4. Single-User Mode

Single-user mode allows the database administrator to restrict access to the database system to a single MySQL server (SQL node). When entering single-user mode, all connections to all other MySQL servers are closed gracefully and all running transactions are aborted. No new transactions are allowed to be started.

Once the cluster has entered single-user mode, only the designated SQL node is granted access to the database.

You can use the ALL STATUS command to see when the cluster has entered single-user mode.

Example:

NDB> ENTER SINGLE USER MODE 5

After this command has executed and the cluster has entered single-user mode, the SQL node whose node ID is 5 becomes the cluster's only permitted user.

The node specified in the preceding command must be a MySQL Server node; an attempt to specify any other type of node will be rejected.

Note: When the preceding command is invoked, all transactions running on the designated node are aborted, the connection is closed, and the server must be restarted.

The command EXIT SINGLE USER MODE changes the state of the cluster's data nodes from single-user mode to normal mode. MySQL Servers waiting for a connection (that is, for the cluster to become ready and available), are again permitted to connect. The MySQL Server denoted as the single-user SQL node continues to run (if still connected) during and after the state change.

Example:

NDB> EXIT SINGLE USER MODE

There are two recommended ways to handle a node failure when running in single-user mode:

  • Method 1:

    1. Finish all single-user mode transactions.

    2. Issue the EXIT SINGLE USER MODE command.

    3. Restart the cluster's data nodes.

  • Method 2:

    Restart database nodes prior to entering single-user mode.

9.6.5. Online Backup of MySQL Cluster

This section describes how to create a backup and how to restore the database from a backup at a later time.

9.6.5.1. Cluster Backup Concepts

A backup is a snapshot of the database at a given time. The backup consists of three main parts:

  • Metadata: The names and definitions of all database tables

  • Table records: The data actually stored in the database tables at the time that the backup was made

  • Transaction log: A sequential record telling how and when data was stored in the database

Each of these parts is saved on all nodes participating in the backup. During backup, each node saves these three parts into three files on disk:

  • BACKUP-backup_id.node_id.ctl

    A control file containing control information and metadata. Each node saves the same table definitions (for all tables in the cluster) to its own version of this file.

  • BACKUP-backup_id-0.node_id.data

    A data file containing the table records, which are saved on a per-fragment basis. That is, different nodes save different fragments during the backup. The file saved by each node starts with a header that states the tables to which the records belong. Following the list of records there is a footer containing a checksum for all records.

  • BACKUP-backup_id.node_id.log

    A log file containing records of committed transactions. Only transactions on tables stored in the backup are stored in the log. Nodes involved in the backup save different records because different nodes host different database fragments.

In the listing above, backup_id stands for the backup identifier and node_id is the unique identifier for the node creating the file.
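
For example, the names of the three per-node files can be generated from these identifiers as follows (the IDs used here are purely illustrative):

```shell
backup_id=12   # illustrative backup identifier
node_id=3      # illustrative node identifier

# The three per-node backup files described above:
echo "BACKUP-$backup_id.$node_id.ctl"      # control file (metadata)
echo "BACKUP-$backup_id-0.$node_id.data"   # table records
echo "BACKUP-$backup_id.$node_id.log"      # committed-transaction log
```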

9.6.5.2. Using the Management Server to Create a Backup

Before starting a backup, make sure that the cluster is properly configured for performing one. (See Section 9.6.5.4, "Configuration for Cluster Backup.")

Creating a backup using the management server involves the following steps:

1. Start the management server (ndb_mgm).

2. Execute the command START BACKUP.

3. The management server will reply with the message Start of backup ordered. This means that the management server has submitted the request to the cluster, but has not yet received any response.

4. The management server will reply Backup backup_id started, where backup_id is the unique identifier for this particular backup. (This identifier will also be saved in the cluster log, if it has not been configured otherwise.) This means that the cluster has received and processed the backup request. It does not mean that the backup has finished.

5. The management server will signal that the backup is finished with the message Backup backup_id completed.
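
As with aborting a backup (shown later in this section), the backup can also be started from the system shell. The sketch below only prints the invocation rather than executing it, since it assumes a running management server; remove the echo to run it:

```shell
# Dry-run: print the command that would start a cluster backup.
# Remove the leading 'echo' to execute (assumes ndb_mgm is on the PATH
# and a management server is reachable).
start_cmd='ndb_mgm -e "START BACKUP"'
echo "$start_cmd"
```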

To abort a backup that is already in progress:

1. Start the management server.

2. Execute the command ABORT BACKUP backup_id. The number backup_id is the identifier of the backup that was included in the response of the management server when the backup was started (in the message Backup backup_id started).

3. The management server will acknowledge the abort request with Abort of backup backup_id ordered; note that it has received no actual response to this request yet.

4. After the backup has been aborted, the management server will report Backup backup_id has been aborted for reason XYZ. This means that the cluster has terminated the backup and that all files related to this backup have been removed from the cluster filesystem.

It is also possible to abort a backup in progress from the system shell using this command:

shell> ndb_mgm -e "ABORT BACKUP backup_id"

Note: If there is no backup with ID backup_id running when it is aborted, the management server makes no explicit response. However, the fact that an invalid abort command was sent is indicated in the cluster log.

9.6.5.3. How to Restore a Cluster Backup

The cluster restoration program is implemented as a separate command-line utility ndb_restore, which reads the files created by the backup and inserts the stored information into the database. The restore program must be executed once for each set of backup files. That is, as many times as there were database nodes running when the backup was created.

The first time you run the ndb_restore restoration program, you also need to restore the metadata; in other words, you must re-create the database tables. (Note that the cluster should have an empty database when starting to restore a backup.) The restore program acts as an API to the cluster and therefore requires a free connection to connect to the cluster. This can be verified with the ndb_mgm command SHOW (you can accomplish this from a system shell using ndb_mgm -e SHOW). The -c connectstring option may be used to locate the MGM node (see Section 9.4.4.2, "The MySQL Cluster connectstring," for information on connectstrings). The backup files must be present in the directory given as an argument to the restoration program.

It is possible to restore a backup to a database with a different configuration than that from which it was created. For example, suppose that a backup with backup ID 12, created in a cluster with two database nodes having the node IDs 2 and 3, is to be restored to a cluster with four nodes. Then ndb_restore must be run twice, once for each database node in the cluster where the backup was taken.

Note: For rapid restoration, the data may be restored in parallel, provided that there is a sufficient number of cluster connections available. However, the data files must always be applied before the logs.
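
The two runs for the example above might look like the following sketch. The flag spellings shown (-m to restore metadata, -r to restore data, -b and -n for the backup and node IDs, -c for the connectstring) are typical but version-dependent, and the host name and directory are made up; check ndb_restore --help for your release. The commands are printed (dry run) rather than executed:

```shell
# Dry-run sketch: restore backup 12, taken on data nodes 2 and 3.
backup_id=12
for node_id in 2 3; do
  # Restore the metadata (-m) only on the first run.
  if [ "$node_id" -eq 2 ]; then meta="-m"; else meta=""; fi
  echo ndb_restore -c mgmhost -b "$backup_id" -n "$node_id" $meta -r /backups/BACKUP-12
done
```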

9.6.5.4. Configuration for Cluster Backup

Four configuration parameters are essential for backup:

  • BackupDataBufferSize

    The amount of memory used to buffer data before it is written to disk.

  • BackupLogBufferSize

    The amount of memory used to buffer log records before these are written to disk.

  • BackupMemory

    The total memory allocated in a database node for backups. This should be the sum of the memory allocated for the backup data buffer and the backup log buffer.

  • BackupWriteSize

    The size of blocks written to disk. This applies for both the backup data buffer and the backup log buffer.
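
The stated relationship between these parameters can be checked with simple arithmetic. The byte values below are illustrative only, not recommended settings:

```shell
# Illustrative values only - not tuning advice.
BackupDataBufferSize=$((2 * 1024 * 1024))   # 2MB data buffer
BackupLogBufferSize=$((2 * 1024 * 1024))    # 2MB log buffer

# BackupMemory should be the sum of the two buffers:
BackupMemory=$((BackupDataBufferSize + BackupLogBufferSize))
echo "$BackupMemory"   # 4194304 bytes (4MB)
```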

More detailed information about these parameters can be found in Section 9.4, "MySQL Cluster Configuration."

9.6.5.5. Backup Troubleshooting

If an error code is returned when issuing a backup request, the most likely cause is insufficient memory or insufficient disk space. You should check that there is enough memory allocated for the backup. Also check that there is enough space on the hard drive partition of the backup target.

NDB does not support repeatable reads, which can cause problems with the restoration process. Although the backup process is "hot," restoring a MySQL Cluster from backup is not a 100% "hot" process. This is due to the fact that, for the duration of the restore process, running transactions get non-repeatable reads from the restored data. This means that the state of the data is inconsistent while the restore is in progress.

