Wednesday, April 13, 2011

Round-Robin Multi-Source in Pure SQL

With the addition of the new tables to implement crash-safe replication we also get access to replication information through the SQL interface. This might not seem like a big advantage, but it should not be taken lightly. To demonstrate the power of using this approach, I will show how to implement a multi-source round-robin replication described at other places (including our book). However, compared to the other implementations—where the implementation requires a client to parse the output of SHOW SLAVE STATUS—the twist is that the implementation is entirely done in the server, using pure SQL.

If you're familiar with replication, you know that a slave can just replication from a single master. The trick used to replicate from multiple master—this is usually called multi-source—is to switch between masters in a time-share fashion as illustrated in Figure 1. The schema used to pick the master to replicate can vary, but it is common to use a round robin schedule.

The steps necessary to switch master are:

  1. Stop reading events from the master and empty the relay log. To stop reading events from the master, it is necessary to ensure that there are no outstanding events in the relay log before switching to another master. If this is not done, some will not be applied and will have to be re-fetched from the master.
    1. Stop the I/O thread.
    2. Wait for the events in the relay log to be applied.
    3. Stop the SQL thread.
  2. Save away the replication information.
  3. Fetch the saved information about the next master to replicate from.
  4. Change master using the new information.
  5. Start the slave threads.
Simple, right? So, let's make an implementation! So, what pieces do we need?
  • To handle the periodic switching, we use an SQL event for executing the above procedure.
  • We need a table to store the state of each master. The table should contain all the necessary information for configuring the master, including the binlog position.
  • We need to be able to store what master we're currently replicating from.

Saving state information

Figure 1. Tables for storing information about masters
CREATE TABLE my_masters (
    idx INT AUTO_INCREMENT PRIMARY KEY,
    host VARCHAR(50), port INT DEFAULT 3306,
    user VARCHAR(50), passwd VARCHAR(50),
    log_file VARCHAR(50), log_pos LONG,
    UNIQUE INDEX (host,port,user)
) ENGINE=InnoDB;

CREATE TABLE current_master (
    idx INT
) ENGINE=InnoDB;
We need two tables: a table my_masters to record information about the available masters and a table current_master that keeps information about the current master. The my_masters table will contain information on how to connect to the masters as well as the last seen position. We assume that the user and password information is stored in the table and won't save away that information when switching master. To store the current master being replicated from, We cannot use a user defined variable—because each invocation of an event spawns a new session—so we store this information in a table.

Switching masters

To be able to execute a CHANGE MASTER statement with the information we need, it would be perfect to use a prepared statement, but unfortunately, the CHANGE MASTER statement is one of those statements that cannot be used inside a prepared statement, so we have to build the statement dynamically. To make it easier, we create a change_master procedure that does the job of building, preparing, executing, and deallocating a prepared statement. We also allow the file name and position passed to be NULL, in which case we start replication without these parameters, essentially starting from the beginning of the masters binary log.
delimiter $$
CREATE PROCEDURE change_master(
    host VARCHAR(50), port INT,
    user VARCHAR(50), passwd VARCHAR(50),
    name VARCHAR(50), pos LONG)
BEGIN
  SET @cmd = CONCAT('CHANGE MASTER TO ',
                    CONCAT_WS(', ',
                    CONCAT('MASTER_HOST = "', host, '"'),
                    CONCAT('MASTER_PORT = ', port),
                    CONCAT('MASTER_USER = "', user, '"'),
                    CONCAT('MASTER_PASSWORD = "', passwd, '"')));

  IF name IS NOT NULL AND pos IS NOT NULL THEN
    SET @cmd = CONCAT(@cmd,
                      CONCAT_WS(', ', '',
                                CONCAT('MASTER_LOG_FILE = "', name, '"'),
                                CONCAT('MASTER_LOG_POS = ', pos)));
  END IF;
  PREPARE change_master FROM @cmd;
  EXECUTE change_master;
  DEALLOCATE PREPARE change_master;
END $$
delimiter ;
The last step is to create the event that switch master for us. As a specific feature, we implement the event handling so that we can add and remove rows from the my_masters table and the event will just pick the next one in order. To solve this, we use queries to pick the next one in order based on the index of the last used master and then an additional query to handle the case of a wrap-around with a missing table at index 1.

To allow the table to be changed while the events are executing, we place all the updates of our tables into a transaction. That way, any updates done to the table while the event is executing will not affect the logic for picking the next table.

There are some extra logic added to handle the case that there are "holes" in the index numbers: it is possible that there is no master with index 1 and it is possible that the next master does not have the next index in sequence. This also allow the server ID of the master to be used, but in the current implementation, we use a simple index instead.

delimiter $$
CREATE EVENT multi_source
    ON SCHEDULE EVERY 10 SECOND DO
BEGIN
   DECLARE l_host VARCHAR(50);
   DECLARE l_port INT UNSIGNED;
   DECLARE l_user TEXT;
   DECLARE l_pass TEXT;
   DECLARE l_file VARCHAR(50);
   DECLARE l_pos BIGINT;
   DECLARE l_idx INT DEFAULT 1;
   SET SQL_LOG_BIN = 0;
Don't write any of this to the binary log. Since this is an event, it will automatically be reset at the end of the execution and not affect anything else.
   STOP SLAVE IO_THREAD;
   SELECT master_log_name, master_log_pos
     INTO l_file, l_pos
     FROM mysql.slave_master_info;
   SELECT MASTER_POS_WAIT(l_file, l_pos);
   STOP SLAVE;
Stop the slave I/O thread and empty the relay log before switching master
   START TRANSACTION;
   UPDATE my_masters AS m,
          mysql.slave_relay_log_info AS rli
      SET m.log_pos = rli.master_log_pos,
          m.log_file = rli.master_log_name
    WHERE idx = (SELECT idx FROM current_master);
Save the position of the current master
   SELECT idx INTO l_next_idx FROM my_masters
    WHERE idx > (SELECT idx FROM current_master)
    ORDER BY idx LIMIT 1;
Find the next master in turn. To handle that masters have been removed, we will pick the next one index-wise. Wrap-around is handled by using the default of 1 above.
    SELECT idx INTO l_next_idx FROM my_masters
     WHERE idx >= l_next_idx
     ORDER BY idx LIMIT 1;
If we did a wrap-around, it might be the case that master with index 1 does not exist (the default for l_next_idx), so then we have to scan and find the first index that exists which is equal to or greater than l_next_idx.
    UPDATE current_master SET idx = l_next_idx;

    SELECT host, port, user, passwd, log_pos, log_file
      INTO l_host, l_port, l_user, l_pass, l_pos, l_file
      FROM my_masters
      WHERE idx = l_next_idx;

    CALL change_master(l_host, l_port, l_user,
                       l_pass, l_file, l_pos);
    COMMIT;
    START SLAVE;
END $$
delimiter ;
Extract the information about the new master from our masters table my_masters and change to use that master.
That's all! No you go off and play with it and send me comments.

You can download the MySQL 5.6 Milestone Development Release MySQL Developer Zone (dev.mysql.com), which contain the new replication tables and you can find information in the previous post on how to set up the server to use the new tables.

1 comment:

sameer said...

How to use this with multiple databases on source DB and single DB on destination DB server. Multiple source DBs replicated to single DB. I am trying out this with replicate rewrite db option but its not working