Sunday, June 03, 2007

The replication poll and our plans for the future

We've been running replication poll and we've got some answers, so I thought I would comment a little on the results of the poll and what our future plans with respect to replication is as a result of the feedback. As I commented in the previous post, there are some items that require a significant development effort, but the feedback we got helps us to prioritize.

The top five items from the poll above stands out, so I thought that I would comment on each of them in turn. The results of the poll were (when this post were written):

Online check that Master and Slave tables are consistent 45.4%
Multi-source replication: replicating from several masters to one slave 36.3%
Multi-threaded application of data on slave to improve performance 29.2%
Conflict resolution: earlier replicated rows are not applied 21.0%
Semi-synchronous replication: transaction copied to slave before commit 20.3%

Online check that Master and Slave tables are consistent

The most natural way to check that tables are consistent is to compute a hash of the contents of the table and then compare that with a hash of the same table on the slave. There are storage engines that have support for incrementally computing a hash, and for the other cases, the Table checksum that was released by Baron "Xaprb" can be used. The problem is to do the comparison while the replication is running, since any change to the table between computing the hash on the master and the slave will indicate that the tables are different when they in reality are not. To solve this, we are planning to introduce support to transfer the table hash and perform the check while replication is running. By adding the hash to the binary log, we have a computed hash for a table at a certain point in time, and the slave can then compare the contents of the tables as it see the event, being sure that this is the same (relative) point in time as it were on the master. We will probably add a default hash function for those engines that do not have something, and allow storage engines to return a hash of the data in the table (probably computed incrementally for efficiency).

Multi-source replication: replicating from several masters to one slave

This is something that actually was started a while ago, but for several reasons is not finished yet. A large portion of the work is actually done, but since the code is a tad old (enough to be non-trivial to incorporate into the current clone), there is some work remaining to actually close this one. Since there seems to be a considerable interest in this, both at the poll and at the MySQL Conference, we are considering finishing off this feature sometime in the aftermath of 5.1 GA. No promises here, though. There's a lot of things that we need to consider to build a high-quality replication solution, and we're a tad strained when it comes to manpower in the team.

Multi-threaded application of data on slave to improve performance

This is something that we really want to do, but which we really do not have the manpower for currently. It is a significant amount of work, it would be a huge improvement of the replication, but it would utterly make us unable to do anything else for a significant period. Sorry folks, but however much I would like to see this happen, it would be irresponsible to promise that we will implement this in the near future. There are also some changes going on internally with the threading model, so it might be easier to implement in the near future.

Conflict resolution: earlier replicated rows are not applied

When multi-source comes into the picture, it is inevitable that some form of conflict resolution will be needed. We are currently working on providing a simple version of timestamp-based conflict resolution in the form of "latest change wins". This is ongoing work, so you will see it in a post-5.1 release in the near future.

Semi-synchronous replication: transaction copied to slave before commit

There is already a MySQL 4 patch for this written by folks at Google Code under the Mysql4Patches work. The idea is to not commit the ongoing transaction until the entire transaction has been successfully transferred to at least one slave. The reason for this is that it should be possible to switch to a slave in the event of a failure of the master, so it has to be certain that the transaction exists somewhere else (at least in disk). We consider this as very important for our ongoing work of being the best on-line database server for modern applications, so you will probably see it pretty soon. Compared to the patch above, we would like to generalize it slightly to allow it to be configurable how many slave should have received it before the transaction is committed. This will of course reduce performance of the master, but it will provide better redundancy in the case of a serious failure and it is a minor addition to the work anyway.

Binary log event checksum

In addition to the things we mentioned above, this is very important to both find and repair problems with replication. The relay log is a potential source of problem, as is the code that writes the events to the relay log, so it is prudent to add a simple CRC checksum to each event to check the integrity of the event. Sadly enough, this does not exist currently, so we're trying to make this get into the code base as soon as possible, maybe even for 5.1 (keep your fingers crossed). This is not a promise: we're doing what we can, but there are no guarantees.

2 comments:

pabloj said...

Hi, can you tell us which of these might make it into 5.1 before it's released as stable?
Thanks

Mats Kindahl said...

I'm sorry, but 5.1 has been feature-frozen for a long time, so none of these features will make it into 5.1 GA.

All of these features are what we are considering for releases after 5.1.