Friday, May 4, 2012

Performance improvements for big INFORMATION_SCHEMA tables

A short while after I fixed the legacy bug that prevented temporary MyISAM tables from using the dynamic record format, I got an email from Davi Arnaut @ Twitter. It turned out that Twitter needed to fix the very same problem, but for the case when INFORMATION_SCHEMA temporary tables use MyISAM.

In short, INFORMATION_SCHEMA tables provide access to database metadata. Despite their name, they are more like views than tables: when you query them, relevant data is gathered from the dictionary and other server internals, not from tables. The gathered data is stored in a temporary table (memory or MyISAM depending on size) and then returned to the user.

The reason Davi emailed me was to let me know that he had further improved the fix for temporary MyISAM tables to also enable the use of dynamic record format for INFORMATION_SCHEMA tables. I usually don't have huge databases on my development box so the problem of querying metadata had gone unnoticed. But it turns out that Davi and his colleagues at Twitter do deal with massive amounts of data :-)

This was one of the queries that caused problems:

SELECT pool_id FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE limit 1;