Monday, December 17, 2012

Favorite MySQL 5.6 features: an optimizer perspective

There are so many exciting new features in MySQL 5.6 that I almost don't know where to start. To mention a few, MySQL's multi-core scalability has been significantly improved to meet modern hardware, InnoDB has better index statistics, much better performance, and online ALTER, replication has multi-threaded slaves and global transaction identifiers, performance schema has added capabilities to provide a much more detailed view of what's bogging a server down, and more... much more.

However, my prime interest is the optimizer, which is why I've compiled a list of my favorite new optimizer features in 5.6. Here goes:

New ways to understand query plans:
The most common requests from DBAs is to get more information to understand how and why MySQL behaves like it does. You can come a very long way at squeezing performance out of MySQL if you can answer questions like "What Query Execution Plan (QEP) did the server decide to use?" and "Why was this QEP picked?". In this department, MySQL 5.6 delivers a bunch of new features.


MySQL has had EXPLAIN for a long time. As you sure know, it is a tool that tells you more or less how a query will be executed. The output is crude and misses some important details like which predicates are evaluated on which table and which conditions are pushed to an index when ICP is used. In addition, MySQL used to materialize subqueries in the FROM clause to figure out the plan, something which made such EXPLAINs very slow. As an example, the response time for EXPLAIN of DBT3 query 8 used to be as long as executing the query itself (~4 hours with scale factor 10 and 1GB DB buffer) but is now down to a few milliseconds.

Fortunately, much has improved in the EXPLAIN landscape in MySQL 5.6. Before, you couldn't EXPLAIN anything except SELECT statements, but now you can use it on UPDATE and DELETE statements as well. And the annoying "have to materialize subquery in FROM clause" issue is gone once and for all. Furthermore, in addition to the traditional output, you can now ask EXPLAIN to produce JSON output which MySQL Workbench can visualize. Not only is the visual representation of EXPLAIN waaaaay more readable - you also get more information (like which predicates are pushed to an index etc) than from the traditional output.

Another very interesting new feature is Optimizer Tracing which can show you why the optimizer decided on a particular QEP. Due to all the details in it's output, Optimizer Tracing is mostly intended for advanced MySQL users but once you get the hang of it you'll get a very good picture of why the optimizer decided on a plan. We optimizer developers use it all the time; it speeds up and improves the quality of development and has uncovered a few inconsistencies (now fixed) in the optimizer. I'm sure the development cost of this new feature has already reached break-even and that's before the feature has been released!

Much better join optimizer:
The part of the optimizer that decides the order of which tables are joined is called the join optimizer. It employs a greedy search algorithm that has to calculate n! (n being the number of tables to join) combinations of join orders. Unfortunately, n! quickly becomes an overwhelmingly big number. To avoid that the optimizer spends days trying to find an optimal join order, heuristics are employed to avoid calculating non-promising partial plans. This helps quite a bit, but not nearly enough for joins with many - say 20 - tables. The optimizer_search_depth variable can therefore be used to reduce the number of calculated plans, but this comes with an increased risk of not finding a good plan.

The good news is that MySQL 5.6 offers significant improvements for many-table joins. First, the ordering of tables done before greedy search starts has been improved to take join dependency into account. This increases the probability that greedy search will find a pretty good plan early on, which in turn makes the heuristics able to avoid calculating a lot of plans. Second, greedy search will no longer calculate all permutations of unique-key joins. If a table can be added to a partial plan by using unique-key lookup (eq_ref), there is no need to calculate the cost of adding other tables at that stage because nothing can be better. This is an important change since in many-table joins, equality predicates on unique keys are quite common. The performance improvement for many-table joins with a lot of unique key joins is mind-blowing.

New ways to optimize IN subqueries:
One of the most significant improvements in MySQL 5.6 is that it offers new ways to optimize subqueries. This is important, because subquery performance has not been one of the strongest sides of MySQL. There are basically two new strategies the optimizer can employ: Subquery Materialization and Semi-Join.

With Subquery Materialization, a subquery is executed once and the result is stored. In MySQL 5.5, the subquery would instead be executed for each row in the outer query. Our tests show that with a scale factor of 10, DBT-3 query 18 execution time drops from months to 68 seconds.

The other new strategy, Semi-join, is a way to convert a subquery to an inner join. After conversion, the optimizer may choose a more optimal join order for the tables. However, inner joins may produce duplicates while subqueries may not, so for semi-join to work we need some way to remove these duplicate rows. Duplicate elimination strategies include FirstMatch, LooseScan, Materialization and Duplicate Weedout. Semi-join enables great improvements for a wide range of queries with IN subqueries, e.g. DBT-3 query 20.

Index handling improvements:
Another new feature is Index Condition Pushdown (ICP). With ICP, conditions that apply to columns that are part of a multi-column index are evaluated on the index without reading from the table or returning anything to the server. If the conditions evaluate to false, MySQL simply moves on to the next index entry. Every time a pushed condition evaluates to false, a lot of I/O and CPU time is saved. This is what would happen without ICP: "look up row in table, convert row from storage engine to MySQL server format, return row to server, evaluate conditions, discard row, request read of next index row". Naturally, avoiding all this this can speed up queries a great deal.

One of the latest improvements added to MySQL 5.6 is that the optimizer may now make use of the primary key columns that are automatically part of any secondary index in InnoDB. With this, MySQL may use the secondary indexes as if all primary key columns were explicitly added to them.

Disk-bound performance improvements:
Disk Sweep Multi Range Read (DsMRR, for range access) and Batched Key Access (BKA, for index lookups) takes physical disk storage properties into account to improve performance. The idea is to read rows from disk in the order they are stored and thus improve the likelyhood that multiple rows can be read from a page that is still in the buffer pool and also increase benefits from prefetching. Response time for many of the DBT-3 queries is reduced significantly by these new features.

ORDER BY with small LIMIT:
MySQL 5.6 also includes an optimization for queries with a small limit. For large datasets, MySQL used to create temporary sorted files that were merged. Now, a priority queue is maintained for only as many rows as require to produce the result, something which avoids a lot of unnecessary sorting. As always, the performance improvements depend on the query and the data, but the example query in didrik's blog goes from 40s to 10s response time.

Delayed Materialization of Derived Tables:
Starting with MySQL 5.6, derived tables are no longer materialized unless they need to be used. One effect of this has already been mentioned: reduced execution time of EXPLAINs with subqueries in the FROM clause. In addition, the derived table may now be indexed to improve performance.

3 comments:

  1. It is nice to get these improvements. It is even better to get descriptions like this. Keep on blogging.

    ReplyDelete
  2. I like this blog very much. There are so many improvements. I am a database architecture and developer in China. I'll follow your blog all the time.

    ReplyDelete