Improving INSERT INTO … SELECT Performance
INSERT INTO…SELECT locks the table being read by the SELECT statement due to MySQL 5.0’s statement-based replication. Here’s a great post from the MySQL Performance Blog explaining the problem in detail and what to do about it.
Note: this problem is supposed to be eliminated when using MySQL 5.1 row-based replication.
I thought it would be interesting to see how the Drizzle developers where handling this scenario in their high-performance, high-concurrency remake of the MySQL. Below is an e-mail I got back from Jay Pipes on the subject:
Jonathon Hill wrote:
Hey guys,
I’m excited about Drizzle and look forward to its maturity!
I have just run into an issue with MySQL and was wondering how you might be handling this problem in Drizzle. Basically, when you do a INSERT INTO…SELECT query it has to lock the table being selected or replication doesn’t work right (see http://www.mysqlperformanceblog.com/2006/07/12/insert-into-select-performance-with-innodb-tables/). This is horrible for concurrence, because other clients are locked out until the SELECT is finished, which can be a while, depending on the query.
Not in Drizzle.
MySQL 5.1 solves this with row-based replication. Will Drizzle support replication? If so, how are you planning to handle this scenario?
I just added a test case to the replication suite for INSERT SELECT. As you can see below, INSERT SELECT is translated in Drizzle to multiple InsertRecord GPB messages (kinda like row-based replication, but in a standardized serial protocol format).
DROP TABLE IF EXISTS t1; DROP TABLE IF EXISTS t2; CREATE TABLE t1 ( id INT NOT NULL, padding VARCHAR(200) NOT NULL ); INSERT INTO t1 VALUES (1, "I love testing."); INSERT INTO t1 VALUES (2, "I hate testing."); CREATE TABLE t2 ( id INT NOT NULL, padding VARCHAR(200) NOT NULL ); INSERT INTO t2 SELECT * FROM t1;
Output from command log reader, which reconstructs the row-based events into SQL statements, with some header info removed:
<br /> DROP TABLE IF EXISTS t1;<br /> DROP TABLE IF EXISTS t2;<br /> CREATE TABLE t1 ( id INT NOT NULL , padding VARCHAR(200) NOT NULL );<br /> INSERT INTO `test`.`t1` (`id`, `padding`) VALUES ("1", "I love testing.");<br /> INSERT INTO `test`.`t1` (`id`, `padding`) VALUES ("2", "I hate testing.");<br /> CREATE TABLE t2 ( id INT NOT NULL , padding VARCHAR(200) NOT NULL );<br /> INSERT INTO `test`.`t2` (`id`, `padding`) VALUES ("1", "I love testing.");<br /> INSERT INTO `test`.`t2` (`id`, `padding`) VALUES ("2", "I hate testing.");<br />
Cheers!
jay