Friday, May 4, 2012

Performance improvements for big INFORMATION_SCHEMA tables

A short while after I fixed the legacy bug that prevented temporary MyISAM tables from using the dynamic record format, I got an email from Davi Arnaut @ Twitter. It turned out that Twitter needed to fix the very same problem, but for the case when INFORMATION_SCHEMA temporary tables use MyISAM.

In short, INFORMATION_SCHEMA tables provide access to database metadata. Despite their name, they are more like views than tables: when you query them, relevant data is gathered from the dictionary and other server internals, not from tables. The gathered data is stored in a temporary table (memory or MyISAM depending on size) and then returned to the user.

The reason Davi emailed me was to let me know that he had further improved the fix for temporary MyISAM tables to also enable the use of dynamic record format for INFORMATION_SCHEMA tables. I usually don't have huge databases on my development box so the problem of querying metadata had gone unnoticed. But it turns out that Davi and his colleagues at Twitter do deal with massive amounts of data :-)

This was one of the queries that caused problems:

SELECT pool_id FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE limit 1;

The query was run on a server with a 120GB buffer pool and 7.8 million rows in the INNODB_BUFFER_PAGE table. Due to two big VARCHARs in this table the query used to produce a 51GB (!) temporary table and have a response time of 5 min 8.64 sec. With the fix, it went down to 481 MB temp table size and 9.05 seconds response time. This fix is available in MySQL 5.6.5.

Thanks to Davi Arnaut and Jeremy Cole on the MySQL team @ Twitter for finding the bug and for suggesting a fix. Open source works!

No comments:

Post a Comment