Jørgen's point of view: performance

Showing posts with label performance. Show all posts

Monday, October 21, 2013

FAQ: InnoDB extended secondary keys

MySQL 5.6 introduced a new feature called extended secondary keys. We get a lot of questions about it and find that most of them come from a few incorrect assumption. In this post I'll try to get rid of the confusion once and for all. Famous last words... here goes:

Q1: Do I need to do anything to enable extended secondary keys?

No, nothing at all. It's on by default and I can't see any sensible reason why you would want to disable it. However, it is possible to disable it by tuning the optimizer_switch: SET optimizer_switch='use_index_extensions={on|off}'.

Q2: Does extended secondary keys only work with InnoDB?

No, it should work with any storage engine that uses the primary key columns as reference to the row, which means most storage engines with clustered primary keys. I say "should" because it requires a minimum of work from the storage engine provider; it must announce to MySQL that it is supported.

DBT-3 Q3: 6 x performance in MySQL 5.6.10

When MySQL gets a query, it is the job of the optimizer to find the cheapest way to execute that query. Decisions include access method (range access, table scan, index lookup etc), join order, sorting strategy etc. If we simplify a bit, the optimizer first identifies the different ways to access each table and calculate their cost. After that, the join order is decided.

However, some access methods can only be considered after the join order has been decided and therefore gets special treatment in the MySQL optimizer. For join conditions, e.g. "WHERE table1.col1 = table2.col2", index lookup can only be used in table2 if table1 is earlier in the join sequence. Another class of access methods is only meaningful for tables that are first in the join order. An example is queries with ORDER BY ... LIMIT. Prior to MySQL 5.6.10 there was a bug in MySQL that made the optimizer choose inefficient execution plans for this query type. That's the topic I discuss below.

Index merge annoyances fixed in MySQL 5.6

While the index merge access types certainly are useful for a number of queries, there has been some frustration expressed both from customers and the community about how it...

is not used when it should have been
is used when ref access is obviously better
merges suboptimal indexes
is too restricted in which conditions can be used

I could come up with numerous examples of related bugs and feature requests dating back more than six years. To list a few: 17673, 30151, 23322, 65274, 65359.

The good news is that we have fixed all these issues in the latest released MySQL 5.6 server.

Consider the problem in BUG#65274 (simplified a bit):
CREATE TABLE phpbb_posts (
...
KEY `topic_id` (`topic_id`),
KEY `forum_id` (`forum_id`)
KEY `tid_fid` (`topic_id`,`forum_id`),
);
SELECT count(*) FROM phpbb_posts WHERE topic_id = 110126 AND forum_id = 19;

Performance improvements for big INFORMATION_SCHEMA tables

A short while after I fixed the legacy bug that prevented temporary MyISAM tables from using the dynamic record format, I got an email from Davi Arnaut @ Twitter. It turned out that Twitter needed to fix the very same problem, but for the case when INFORMATION_SCHEMA temporary tables use MyISAM.

In short, INFORMATION_SCHEMA tables provide access to database metadata. Despite their name, they are more like views than tables: when you query them, relevant data is gathered from the dictionary and other server internals, not from tables. The gathered data is stored in a temporary table (memory or MyISAM depending on size) and then returned to the user.

The reason Davi emailed me was to let me know that he had further improved the fix for temporary MyISAM tables to also enable the use of dynamic record format for INFORMATION_SCHEMA tables. I usually don't have huge databases on my development box so the problem of querying metadata had gone unnoticed. But it turns out that Davi and his colleagues at Twitter do deal with massive amounts of data :-)

This was one of the queries that caused problems:

SELECT pool_id FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE limit 1;

Copying unused bytes is bad (duh!)

Last summer my colleague Marko Mäkelä committed this seemingly innocent performance fix for InnoDB in MySQL 5.6:

3581 Marko Makela 2011-08-10
Bug#12835650 VARCHAR maximum length performance impact

row_sel_field_store_in_mysql_format(): Do not pad the unused part of
the buffer reserved for a True VARCHAR column (introduced in 5.0.3).
Add Valgrind instrumentation ensuring that the unused part will be
flagged uninitialized.

Before this, buffers which were used to send VARCHARs from InnoDB to the MySQL server were padded with 0s if the string was shorter than specified by the column. If, e.g., the string "foo" was stored in a VARCHAR(8), InnoDB used to write "3foo00000" to the buffer (the first character - 3 - determines the actual length of the string). However, even though these trailing bytes are not used anywhere, writing 0s to the buffer certainly has a cost. Hence the fix which stops InnoDB from writing them.

Oh my were we up for a ride! All of a sudden Valgrind started barking like this (and similar) for a number of tests:

Thread 19:
Syscall param pwrite64(buf) points to uninitialised byte(s)
at 0x381C60EEE3: ??? (in /lib64/libpthread-2.12.so)
by 0x84B703: my_pwrite (my_pread.c:162) by 0x86FE6F: mi_nommap_pwrite (mysql_file.h:1203)
   by 0x88621B: _mi_write_static_record (mi_statrec.c:65)
   by 0x889592: mi_write (mi_write.c:142)
   by 0x532085: handler::ha_write_row(unsigned char*) (handler.cc:6043)
   by 0x69B7F7: end_write(JOIN*, st_join_table*, bool) (sql_select.cc:20237)
   by 0x69A43C: evaluate_join_record(JOIN*, st_join_table*, int)
(sql_select.cc:19187)
   by 0x69A8E0: sub_select(JOIN*, st_join_table*, bool) (sql_select.cc:19294)

The modifications we did to the server to remedy the problems can be divided into two categories as described below. In the end, the result was significantly improved performance.

Improvements for many-table joins in MySQL 5.6

A lot has happened in MySQL 5.6 for queries joining many tables. For the most common use cases we have drastically reduced the cost of finding the execution plan. We have also improved the heuristics and removed bugs so that the final plan is often better than it used to be. Read on if you are one of those people who do 15 way joins!

Finding a query execution plan
First some background. You can skip this part if you know how MySQL picks the table join order in 5.5.

When presented with a query, MySQL will try to find the best order to join tables by employing a greedy search algorithm. The outcome is what we call a query execution plan, QEP. When you join just a few tables, there's no problem calculating the cost of all join order combinations and then pick the best plan. However, since there are (#tables)! possible combinations, the cost of calculating them all soon becomes too high: for five tables, e.g., there are 120 combinations which is no problem to compute. For 10 tables there are 3.6 million combinations and for 15 tables there are 1307 billion. You get the picture. For this reason, MySQL makes a trade off: use heuristics to only explore promising plans. This is supposed to significantly reduce the number of plans MySQL needs to calculate, but at the same time you risk not finding the best one.

Jørgen's point of view

Monday, October 21, 2013

FAQ: InnoDB extended secondary keys

Wednesday, February 13, 2013

DBT-3 Q3: 6 x performance in MySQL 5.6.10

Tuesday, October 2, 2012

Index merge annoyances fixed in MySQL 5.6

Friday, May 4, 2012

Performance improvements for big INFORMATION_SCHEMA tables

Wednesday, April 18, 2012

Copying unused bytes is bad (duh!)

Tuesday, April 10, 2012

Improvements for many-table joins in MySQL 5.6