Tuesday, September 27, 2011

Tips and tricks: Killer response time for non-overlapping intervals

Assume you have a table where you store non-overlapping intervals using two columns, e.g. IP ranges. IP ranges are simple to represent using integer notation:

CREATE TABLE ip_owner (
   owner_id int NOT NULL,
   /* some columns */
   ip_start_int bigint NOT NULL,      /* IP address converted to integer */
   ip_end_int bigint NOT NULL,        /* IP address converted to integer */
   PRIMARY KEY (owner_id),
   INDEX ip_range (ip_start_int, ip_end_int)

And then you find yourself in a situation where you want to know who, if anyone, owns the IP address X. This can be done using the following query:

SELECT * FROM ip_owner WHERE ip_start_int <= X AND ip_end_int >= X;

MySQL can resolve this using a range scan, but will unfortunately only be able to use the ip_start_int <= X part of the condition as a range as explained here. Thus, the query will either be resolved by range scan if fairly few records have ip_start_int <= X or table scan otherwise. That means unreliable response time because it will be much quicker to query low-valued IPs than high valued IPs. I inserted 1M records into the table before running the queries below: