Friday, March 28, 2014

The range access method and why you should use EXPLAIN JSON

I got an interesting question about EXPLAIN and the range access method recently. The person had a query that could be written either with a BETWEEN predicate or an IN predicate, something similar to this:
mysql> EXPLAIN SELECT * 
    -> FROM orders WHERE customer_id BETWEEN 7 AND 10 AND value > 500;
+----+-------------+--------+-------+----------+----------+------+------
| id | select_type | table  | type  | key      | key_len  | rows | Extra
+----+-------------+--------+-------+----------+----------+------+------
|  1 | SIMPLE      | orders | range | cust_val | 10       |   91 |  ...
+----+-------------+--------+-------+----------+----------+------+------

mysql> EXPLAIN SELECT * 
    -> FROM orders WHERE customer_id IN (7,8,9,10) AND value > 500;
+----+-------------+--------+-------+----------+----------+------+------
| id | select_type | table  | type  | key      | key_len  | rows | Extra
+----+-------------+--------+-------+----------+----------+------+------
|  1 | SIMPLE      | orders | range | cust_val | 10       |   44 |  ...
+----+-------------+--------+-------+----------+----------+------+------

The table was:

CREATE TABLE orders (
   order_id INT NOT NULL PRIMARY KEY,
   customer_id INT,
   value INT,
   order_date DATE,
   KEY custid_value (customer_id, value)
)

Given that customer_id is an integer value, these queries should be equivalent. And that's what EXPLAIN seems to tell us too: same access method, same key, same key_length etc.

You may have guessed the question by now: If the queries are equivalent, why isn't the rows estimate identical? Is one of the numbers wrong or do these queries have different execution plans after all?