Tuesday, September 27, 2011

Tips and tricks: Killer response time for non-overlapping intervals

Assume you have a table where you store non-overlapping intervals using two columns, e.g. IP ranges. IP ranges are simple to represent using integer notation:

CREATE TABLE ip_owner (
   owner_id int NOT NULL,
   /* some columns */
   ip_start_int bigint NOT NULL,      /* IP address converted to integer */
   ip_end_int bigint NOT NULL,        /* IP address converted to integer */
   PRIMARY KEY (owner_id),
   INDEX ip_range (ip_start_int, ip_end_int)
) ENGINE=InnoDB;


And then you find yourself in a situation where you want to know who, if anyone, owns the IP address X. This can be done using the following query:

SELECT * FROM ip_owner WHERE ip_start_int <= X AND ip_end_int >= X;

MySQL can resolve this using a range scan, but will unfortunately only be able to use the ip_start_int <= X part of the condition as a range as explained here. Thus, the query will either be resolved by range scan if fairly few records have ip_start_int <= X or table scan otherwise. That means unreliable response time because it will be much quicker to query low-valued IPs than high valued IPs. I inserted 1M records into the table before running the queries below:


mysql> /* IP near lower end */
mysql> EXPLAIN SELECT owner_id, ip_start_int, ip_end_int
    -> FROM owner_ip 
    -> WHERE ip_start_int <= 3270002832 AND ip_end_int >= 3270002832;
+----+-------------+----------+-------+---------------+----------+---------+------+------+--------------------------+
| id | select_type | table    | type  | possible_keys | key      | key_len | ref  | rows | Extra                    |
+----+-------------+----------+-------+---------------+----------+---------+------+------+--------------------------+
|  1 | SIMPLE      | owner_ip | range | ip_range      | ip_range | 8       | NULL | 1417 | Using where; Using index |
+----+-------------+----------+-------+---------------+----------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql> SELECT owner_id, ip_start_int, ip_end_int
    -> FROM owner_ip 
    -> WHERE ip_start_int <= 3270002832 AND ip_end_int >= 3270002832;
+----------+--------------+------------+
| owner_id | ip_start_int | ip_end_int |
+----------+--------------+------------+
|     1417 |   3270002832 | 3270002833 |
+----------+--------------+------------+
1 row in set (0.01 sec)

mysql> /* IP near upper end */
mysql> EXPLAIN SELECT owner_id, ip_start_int, ip_end_int
    -> FROM owner_ip 
    -> WHERE ip_start_int <= 3272090000 AND ip_end_int >= 3272090000;
+----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | owner_ip | ALL  | ip_range      | NULL | NULL    | NULL | 1048576 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

mysql> SELECT owner_id, ip_start_int, ip_end_int
    -> FROM owner_ip 
    -> WHERE ip_start_int <= 3272090000 AND ip_end_int >= 3272090000;
+----------+--------------+------------+
| owner_id | ip_start_int | ip_end_int |
+----------+--------------+------------+
|  1045001 |   3272090000 | 3272090001 |
+----------+--------------+------------+
1 row in set (6.63 sec)

mysql> /* IP close to median IP value */
mysql> SELECT owner_id, ip_start_int, ip_end_int 
    -> FROM owner_ip 
    -> WHERE ip_start_int <= 3271032832 AND ip_end_int >= 3271032832;
+----------+--------------+------------+
| owner_id | ip_start_int | ip_end_int |
+----------+--------------+------------+
|   516417 |   3271032832 | 3271032833 |
+----------+--------------+------------+
1 row in set (3.32 sec)


Luckily, we have valuable information we can use to significantly improve the response time:
  • The intervals are non-overlapping, so there is at most one record matching the conditions
  • The record we're looking for is the last record in the range of records with ip_start_int <= X. If ordered by ip_start_int in descending order, MySQL will scan the range in reverse order and thus find the record we're interested in first.
Let's use that for something good to get killer response time for any IP:
mysql> /* Apply knowledge */
mysql> /* IP near lower end */
mysql> SELECT owner_id, ip_start_int, ip_end_int 
    -> FROM owner_ip FORCE INDEX(ip_range) 
    -> WHERE ip_start_int <= 3270002832 AND ip_end_int >= 3270002832
    -> ORDER BY ip_start_int DESC
    -> LIMIT 1;
+----------+--------------+------------+
| owner_id | ip_start_int | ip_end_int |
+----------+--------------+------------+
|     1417 |   3270002832 | 3270002833 |
+----------+--------------+------------+
1 row in set (0.00 sec)

mysql> /* IP close to median IP value */
mysql> SELECT owner_id, ip_start_int, ip_end_int 
    -> FROM owner_ip FORCE INDEX(ip_range) 
    -> WHERE ip_start_int <= 3271032832 AND ip_end_int >= 3271032832
    -> ORDER BY ip_start_int DESC
    -> LIMIT 1;
+----------+--------------+------------+
| owner_id | ip_start_int | ip_end_int |
+----------+--------------+------------+
|   516417 |   3271032832 | 3271032833 |
+----------+--------------+------------+
1 row in set (0.00 sec)

mysql> /* IP near upper end */
mysql> SELECT owner_id, ip_start_int, ip_end_int 
    -> FROM owner_ip FORCE INDEX(ip_range) 
    -> WHERE ip_start_int <= 3272090000 AND ip_end_int >= 3272090000
    -> ORDER BY ip_start_int DESC
    -> LIMIT 1;
+----------+--------------+------------+
| owner_id | ip_start_int | ip_end_int |
+----------+--------------+------------+
|  1045001 |   3272090000 | 3272090001 |
+----------+--------------+------------+
1 row in set (0.00 sec)

This can of course be used for any non-overlapping ranges, e.g to query which customer rented a particular car on the date 2011-01-01:

SELECT cust_id 
FROM car_rental FORCE INDEX (rental_dates)
WHERE car_id=1001 AND pickup_date <= 2011-01-01 AND return_date >= 2011-01-01
ORDER BY picup_date DESC
LIMIT 1;

2 comments: