Jørgen's point of view: Index merge annoyances fixed in MySQL 5.6

While the index merge access types certainly are useful for a number of queries, there has been some frustration expressed both from customers and the community about how it...

is not used when it should have been
is used when ref access is obviously better
merges suboptimal indexes
is too restricted in which conditions can be used

I could come up with numerous examples of related bugs and feature requests dating back more than six years. To list a few: 17673, 30151, 23322, 65274, 65359.

The good news is that we have fixed all these issues in the latest released MySQL 5.6 server.

Consider the problem in BUG#65274 (simplified a bit):
CREATE TABLE phpbb_posts (
...
KEY `topic_id` (`topic_id`),
KEY `forum_id` (`forum_id`)
KEY `tid_fid` (`topic_id`,`forum_id`),
);
SELECT count(*) FROM phpbb_posts WHERE topic_id = 110126 AND forum_id = 19;

Before, MySQL would do an intersecting index merge between the indexes 'topic_id' and 'forum_id' even though 'ref' access using index 'tid_fid' is obviously better. The main problem was the order indexes were attempted added to the merge: MySQL used to start with 'topic_id' and 'forum_id' and, since all required fields were covered, would settle for this merge.

In 5.6.7 we have changed the order in which potential indexes are added to an intersection merge. Simply put, we now try to merge as few indexes as possible by preferring those that cover the most fields that are part of the condition. Thus, the index merge now starts out with the 'tid_fid' index. Since one index is enough to evaluate all conditions in this query, there will not be any index merge at all and 'ref' access is chosen instead.
Another improvement in 5.6.7 removes a great deal of restrictions to which conditions can be used for index merge union. Consider the following example, in which we want to count the biggest (in terms of value or quantity) orders through all times and in 2012, respectively. Note that the data is from an imagined startup, so only 5% of the orders are from before 2012. A session from MySQL 5.5.27:

mysql> CREATE TABLE orders (
    ->   order_id INT PRIMARY KEY AUTO_INCREMENT,
    ->   cust_id INT,
    ->   order_date date,
    ->   no_products INT,
    ->   value INT,
    ->   KEY ix_cust (cust_id),
    ->   KEY ix_odate (order_date),
    ->   KEY ix_noprod (no_products),
    ->   KEY ix_val (value)
    -> ) ENGINE = InnoDB;

mysql> /* insert 130k rows /*
mysql> EXPLAIN SELECT COUNT(order_id) FROM orders
    -> WHERE (no_products > 150 OR value > 1100)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
         type: index_merge
possible_keys: ix_noprod,ix_val
          key: ix_noprod,ix_val
      key_len: 5,5
          ref: NULL
         rows: 9052
        Extra: Using sort_union(ix_noprod,ix_val); Using where
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT COUNT(order_id) FROM orders
    -> WHERE (no_products > 150 OR value > 1100) AND order_date >= '2012-01-01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
         type: ALL
possible_keys: ix_odate,ix_noprod,ix_val
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 131649
        Extra: Using where
1 row in set (0.00 sec)

See how MySQL switches from index_merge union to table scan, and how this affects the rows estimate? This is obviously a bad choice and was due to a simplification in the range optimizer that just ignored possible index merge unions if it was possible to do a simple range scan instead (in this case a range scan using "order_date >= '2012-01-01'").

Note that this was a well documented restriction: "If a range scan is possible on some key, the optimizer will not consider using Index Merge Union or Index Merge Sort-Union algorithms."

Now let's see what MySQL 5.6.7 does. The query counting the biggest orders through all times choses the same execution plan as 5.5.27 so it's left out:

mysql> EXPLAIN SELECT COUNT(order_id) FROM orders
    -> WHERE (no_products > 150 OR value > 1100) AND order_date >= '2012-01-01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
         type: index_merge
possible_keys: ix_odate,ix_noprod,ix_val
          key: ix_noprod,ix_val
      key_len: 5,5
          ref: NULL
         rows: 8952
        Extra: Using sort_union(ix_noprod,ix_val); Using where
1 row in set (0.01 sec)

As you can see, MySQL is now able to recognize that union index merge is the best way to access this table.

PS: it is still possible to construct test cases with complex conditions that in theory could and should be handled by index merge but is not. Please file bugs if you have examples of real use cases where this applies - this gives me an excuse to further improve the range optimizer. For now, we have raised the bar significantly and can (as far as I know) handle the conditions we have bug reports on.

4 comments:

igorOctober 2, 2012 at 7:45 PM
Here's an example:
for the table t0 from mysql-test/include/index_merge1.inc:

mysql> explain select * from t0 where (key1 < 2 or key2 < 4) and (key3 > 10) or (key1 < 1 or key2 < 3) and (key3 < 20);
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref
| rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | t0 | ALL | i1,i2,i3 | NULL | NULL | NULL | 1024 | Using
where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
(This is the output from mysql-5.6.7-rc-linux2.6-x86_64.tar.gz. BTW
could you update 5.6 on launchpad, please?)

Index merge is not chosen here though it's much cheaper than the table scan.

Also almost any test case with multi-component indexes
from lp:~maria-captains/maria/5.3/mysql-test/t/range_vs_index_merge.test
fails to choose an index merge.

I'm afraid the problem cannot be resolved with such simple tweaks in opt_range.cc as you used. They could be enough to fix all the reported
bugs you mentioned though.

Igor Babaev
igor@askmonty.org

Tuesday, October 2, 2012

Index merge annoyances fixed in MySQL 5.6

4 comments: