Tuesday, October 2, 2012

Index merge annoyances fixed in MySQL 5.6

While the index merge access types certainly are useful for a number of queries, there has been some frustration expressed both from customers and the community about how it...
  1. is not used when it should have been
  2. is used when ref access is obviously better
  3. merges suboptimal indexes
  4. is too restricted in which conditions can be used
I could come up with numerous examples of related bugs and feature requests dating back more than six years. To list a few: 17673, 30151, 23322, 65274, 65359.

The good news is that we have fixed all these issues in the latest released MySQL 5.6 server.

Consider the problem in BUG#65274 (simplified a bit): 
CREATE TABLE phpbb_posts (
  ...
  KEY `topic_id` (`topic_id`),
  KEY `forum_id` (`forum_id`)
  KEY `tid_fid` (`topic_id`,`forum_id`),
);

SELECT count(*) FROM phpbb_posts WHERE topic_id = 110126 AND forum_id = 19;



Before, MySQL would do an intersecting index merge between the indexes 'topic_id' and 'forum_id' even though 'ref' access using index 'tid_fid' is obviously better. The main problem was the order indexes were attempted added to the merge: MySQL used to start with 'topic_id' and 'forum_id' and, since all required fields were covered, would settle for this merge.

In 5.6.7 we have changed the order in which potential indexes are added to an intersection merge. Simply put, we now try to merge as few indexes as possible by preferring those that cover the most fields that are part of the condition. Thus, the index merge now starts out with the 'tid_fid' index. Since one index is enough to evaluate all conditions in this query, there will not be any index merge at all and 'ref' access is chosen instead.
Another improvement in 5.6.7 removes a great deal of restrictions to which conditions can be used for index merge union. Consider the following example, in which we want to count the biggest (in terms of value or quantity) orders through all times and in 2012, respectively. Note that the data is from an imagined startup, so only 5% of the orders are from before 2012. A session from MySQL 5.5.27:

mysql> CREATE TABLE orders (
    ->   order_id INT PRIMARY KEY AUTO_INCREMENT,
    ->   cust_id INT,
    ->   order_date date,
    ->   no_products INT,
    ->   value INT,
    ->   KEY ix_cust (cust_id),
    ->   KEY ix_odate (order_date),
    ->   KEY ix_noprod (no_products),
    ->   KEY ix_val (value)
    -> ) ENGINE = InnoDB;

mysql> /* insert 130k rows /*
mysql> EXPLAIN SELECT COUNT(order_id) FROM orders
    -> WHERE (no_products > 150 OR value > 1100)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
         type: index_merge
possible_keys: ix_noprod,ix_val
          key: ix_noprod,ix_val
      key_len: 5,5
          ref: NULL
         rows: 9052
        Extra: Using sort_union(ix_noprod,ix_val); Using where
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT COUNT(order_id) FROM orders
    -> WHERE (no_products > 150 OR value > 1100) AND order_date >= '2012-01-01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
         type: ALL
possible_keys: ix_odate,ix_noprod,ix_val
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 131649
        Extra: Using where
1 row in set (0.00 sec)

See how MySQL switches from index_merge union to table scan, and how this affects the rows estimate? This is obviously a bad choice and was due to a simplification in the range optimizer that just ignored possible index merge unions if it was possible to do a simple range scan instead (in this case a range scan using "order_date >= '2012-01-01'").

Note that this was a well documented restriction: "If a range scan is possible on some key, the optimizer will not consider using Index Merge Union or Index Merge Sort-Union algorithms."

Now let's see what MySQL 5.6.7 does. The query counting the biggest orders through all times choses the same execution plan as 5.5.27 so it's left out:

mysql> EXPLAIN SELECT COUNT(order_id) FROM orders
    -> WHERE (no_products > 150 OR value > 1100) AND order_date >= '2012-01-01'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
         type: index_merge
possible_keys: ix_odate,ix_noprod,ix_val
          key: ix_noprod,ix_val
      key_len: 5,5
          ref: NULL
         rows: 8952
        Extra: Using sort_union(ix_noprod,ix_val); Using where
1 row in set (0.01 sec)

As you can see, MySQL is now able to recognize that union index merge is the best way to access this table.

PS: it is still possible to construct test cases with complex conditions that in theory could and should be handled by index merge but is not. Please file bugs if you have examples of real use cases where this applies - this gives me an excuse to further improve the range optimizer. For now, we have raised the bar significantly and can (as far as I know) handle the conditions we have bug reports on.

4 comments:

  1. Here's an example:
    for the table t0 from mysql-test/include/index_merge1.inc:

    mysql> explain select * from t0 where (key1 < 2 or key2 < 4) and (key3 > 10) or (key1 < 1 or key2 < 3) and (key3 < 20);
    +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
    | id | select_type | table | type | possible_keys | key | key_len | ref
    | rows | Extra |
    +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
    | 1 | SIMPLE | t0 | ALL | i1,i2,i3 | NULL | NULL | NULL | 1024 | Using
    where |
    +----+-------------+-------+------+---------------+------+---------+------+------+-------------+
    1 row in set (0.00 sec)
    (This is the output from mysql-5.6.7-rc-linux2.6-x86_64.tar.gz. BTW
    could you update 5.6 on launchpad, please?)

    Index merge is not chosen here though it's much cheaper than the table scan.

    Also almost any test case with multi-component indexes
    from lp:~maria-captains/maria/5.3/mysql-test/t/range_vs_index_merge.test
    fails to choose an index merge.

    I'm afraid the problem cannot be resolved with such simple tweaks in opt_range.cc as you used. They could be enough to fix all the reported
    bugs you mentioned though.

    Igor Babaev
    igor@askmonty.org

    ReplyDelete
    Replies
    1. Igor,

      Thanks for the example. As I wrote above I am aware of and able to write examples of such complex conditions myself. What I wanted to know was if there are users with real problems that would need even more enhancements in this area. If not, I think you agree with me that a developer's time is best spent on enhancements that users care about.

      I figured the MySQL community would be eager to provide this feedback, but it is better to do so through bug reports.

      Delete
    2. Jørgen.

      Let's take the query from
      http://bugs.mysql.com/bug.php?id=17673

      We have for mysql-5.6.7

      mysql> explain select * from tmerge2 where ((c2 = 1 and c3 = 1) or (c4 =2 and c5 = 1)) and cp = 1\G
      *************************** 1. row ***************************
      id: 1
      select_type: SIMPLE
      table: tmerge2
      type: index_merge
      possible_keys: k1,k2
      key: k1,k2
      key_len: 12,12
      ref: NULL
      rows: 2
      Extra: Using sort_union(k1,k2); Using where

      and

      mysql> explain select * from tmerge2 where (c2 = 1 and c3 = 1 and cp=1) or (c4=2 and c5=1 and cp=1)\G
      *************************** 1. row ***************************
      id: 1
      select_type: SIMPLE
      table: tmerge2
      type: index_merge
      possible_keys: k1,k2
      key: k1,k2
      key_len: 14,14
      ref: NULL
      rows: 2
      Extra: Using sort_union(k1,k2); Using where

      See that in the second case the index merge uses 3 components for each merged index, while in the first case
      only two major components
      (key_len: 14,14 and key_len: 12,12 respectively)

      Delete
    3. Igor,

      I agree that this should be improved as well so I just made a fix. I'll brush the patch up as soon as I get the time for it :-)

      Delete