Jørgen's point of view: new feature

Showing posts with label new feature. Show all posts

Thursday, December 20, 2012

Gluh on InnoDB extended secondary keys

Sergey Glukhov (Gluh) recently wrote an interesting blog about InnoDB secondary key improvements in MySQL 5.6. His blog isn't aggregated to planet.mysql.com but certainly deserves some attention.

Here it is: InnoDB, extended secondary keys.

Monday, December 17, 2012

Favorite MySQL 5.6 features: an optimizer perspective

There are so many exciting new features in MySQL 5.6 that I almost don't know where to start. To mention a few, MySQL's multi-core scalability has been significantly improved to meet modern hardware, InnoDB has better index statistics, much better performance, and online ALTER, replication has multi-threaded slaves and global transaction identifiers, performance schema has added capabilities to provide a much more detailed view of what's bogging a server down, and more... much more.

However, my prime interest is the optimizer, which is why I've compiled a list of my favorite new optimizer features in 5.6. Here goes:

New ways to understand query plans:
The most common requests from DBAs is to get more information to understand how and why MySQL behaves like it does. You can come a very long way at squeezing performance out of MySQL if you can answer questions like "What Query Execution Plan (QEP) did the server decide to use?" and "Why was this QEP picked?". In this department, MySQL 5.6 delivers a bunch of new features.

Index merge annoyances fixed in MySQL 5.6

While the index merge access types certainly are useful for a number of queries, there has been some frustration expressed both from customers and the community about how it...

is not used when it should have been
is used when ref access is obviously better
merges suboptimal indexes
is too restricted in which conditions can be used

I could come up with numerous examples of related bugs and feature requests dating back more than six years. To list a few: 17673, 30151, 23322, 65274, 65359.

The good news is that we have fixed all these issues in the latest released MySQL 5.6 server.

Consider the problem in BUG#65274 (simplified a bit):
CREATE TABLE phpbb_posts (
...
KEY `topic_id` (`topic_id`),
KEY `forum_id` (`forum_id`)
KEY `tid_fid` (`topic_id`,`forum_id`),
);
SELECT count(*) FROM phpbb_posts WHERE topic_id = 110126 AND forum_id = 19;

On queries with many values in the IN clause

A few customers with rather extreme needs have contacted us about a performance issue with the range optimizer. Our solution to the problem is to introduce a new variable in MySQL 5.6, eq_range_index_dive_limit, which can be used to control whether or not the range optimizer will a) do index dives, or b) use index statistics when estimating the number of rows in the ranges of the query. The former method gives a far more accurate estimate while the latter costs a lot less to compute.

This is what the help text has to tell about the variable:

The optimizer will use existing index statistics instead of doing index dives for equality ranges if the number of equality ranges for the index is larger than or equal to [the value of variable]. If set to 0, index dives are always used.

"Equality range" means predicates using operators IN() or =, and it's important to notice that the number of such ranges is counted on a per index basis. Furthermore, index dives are not used on unique/primary indexes, so only non-unique indexes are affected.

The two ways row estimates can be computed

For as long as there have been a range access method in MySQL, the number of rows in a range has been estimated by diving down the index to find the start and end of the range and use these to count the number of rows between them. This technique is accurate, and is therefore a good basis to make the best possible execution plan (QEP) for the query. Unfortunately, it involves two index dives for each range. That's not a big deal if you run normal queries with a few equality ranges like

SELECT title FROM albums WHERE artist IN (10, 12)

But some MySQL users have queries with hundreds of values in the IN clause. For such queries it may take considerably longer to optimize the query than to execute it. This is when it makes sense to use the less accurate index statistics. Consider this example:

Index Condition Pushdown to the rescue!

A while ago, I explained how range access in a multiple-part index works and why MySQL can't utilize key parts beyond the first occurrence of some often used comparison operators. Luckily, there is a great improvement underway in MySQL 5.6 that will remedy much of this limitation. Meet Index Condition Pushdown.

How does ICP work?

Index Condition Pushdown is a new way for MySQL to evaluate conditions. Instead of evaluating conditions on rows read from a table, ICP makes it possible to evaluate conditions in the index and thereby avoid looking at the table if the condition is false.

Let's assume that we have a multiple-part index covering columns (keypart_1, ..., keypart_n). Further assume that we have a condition with a comparison operator on keypart_1 that does not allow keypart_2,...,keypart_n to be used as a range condition (more on that here).

When MySQL does range access without ICP enabled, the index is traversed to locate rows in the table that are within the range(s) relevant for the conditions on keypart_1. The rows in these ranges are read from the table and returned from the storage engine to the MySQL server. The server then evaluates the remaining conditions, including those that apply to keypart_2,..,keypart_n.

Optimizer tracing: Query Execution Plan descriptions beyond EXPLAIN

Understanding why MySQL chooses a particular join order or why table scan is chosen instead of range scan is often very hard even for experienced MySQL users. Two almost identical queries, differing only in constant values, may produce completely different plans. That's why we're introducing a great new feature in 5.6: Optimizer Tracing. The target users of this feature are developers and MySQL users experienced enough to understand the ins and outs of EXPLAIN.

What Optimizer Tracing is
You may already have guessed this, but optimizer tracing is a printout of important decisions the MySQL optimizer has done during the process of making the Query Execution Plan.

The trace is presented in JSON format which is easy to read both for humans and others.

Currently, the optimizer trace includes short explanations for:

Why the join order was chosen.
Important query transformations like IN to EXISTS.
The access methods applicable and why one of them was chosen.
Which conditions are attached to each table (i.e., what hides behind "Using where" in EXPLAIN)

More coverage will be added as we go along.

What Optimizer Tracing is NOT
The feature is not, and never will be, complete in the sense that it does not describe all choices the optimizer does. However, we're going to add more information when we find valuable things to add. We will not guarantee backwards compatibility like we do for EXPLAIN. This gives us the freedom to improve tracing at a high pace without going through multi-release deprecation warnings etc. In other words: optimizer tracing is not a replacement for EXPLAIN.

A quick tour
The best way get a feeling of optimizer tracing is to give it a try. Below is a teaser:

Jørgen's point of view

Thursday, December 20, 2012

Gluh on InnoDB extended secondary keys

Monday, December 17, 2012

Favorite MySQL 5.6 features: an optimizer perspective

Tuesday, October 2, 2012

Index merge annoyances fixed in MySQL 5.6

Tuesday, April 10, 2012

On queries with many values in the IN clause

Monday, March 19, 2012

Index Condition Pushdown to the rescue!

Tuesday, October 4, 2011

Optimizer tracing: Query Execution Plan descriptions beyond EXPLAIN