Tuesday, October 4, 2011

Optimizer tracing: Query Execution Plan descriptions beyond EXPLAIN

Understanding why MySQL chooses a particular join order, or why a table scan is chosen instead of a range scan, is often very hard even for experienced MySQL users. Two almost identical queries, differing only in constant values, may produce completely different plans. That's why we're introducing a great new feature in 5.6: Optimizer Tracing. The target users of this feature are developers and MySQL users experienced enough to understand the ins and outs of EXPLAIN.

What Optimizer Tracing is
You may already have guessed this, but optimizer tracing is a printout of the important decisions the MySQL optimizer has made during the process of building the Query Execution Plan.

The trace is presented in JSON format, which is easy to read both for humans and for programs.

Currently, the optimizer trace includes short explanations for:
  • Why the join order was chosen.
  • Important query transformations like IN to EXISTS.
  • The access methods applicable and why one of them was chosen.
  • Which conditions are attached to each table (i.e., what hides behind "Using where" in EXPLAIN).
More coverage will be added as we go along.

What Optimizer Tracing is NOT
The feature is not, and never will be, complete in the sense that it does not describe all choices the optimizer makes. However, we're going to add more information when we find valuable things to add. We will not guarantee backwards compatibility like we do for EXPLAIN. This gives us the freedom to improve tracing at a high pace without going through multi-release deprecation warnings etc. In other words: optimizer tracing is not a replacement for EXPLAIN.

A quick tour
The best way to get a feel for optimizer tracing is to give it a try. Below is a teaser:

mysql> EXPLAIN SELECT * FROM orders WHERE customer_id <> 6 OR (customer_id <> 8 AND value = 500);
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | orders | ALL  | customer_id   | NULL | NULL    | NULL |  160 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> /* Insert a few rows into orders */
mysql> EXPLAIN SELECT * FROM orders WHERE customer_id <> 6 OR (customer_id <> 8 AND value = 500);
+----+-------------+--------+-------+---------------+-------------+---------+------+------+-----------------------+
| id | select_type | table  | type  | possible_keys | key         | key_len | ref  | rows | Extra                 |
+----+-------------+--------+-------+---------------+-------------+---------+------+------+-----------------------+
|  1 | SIMPLE      | orders | range | customer_id   | customer_id | 10      | NULL |   26 | Using index condition |
+----+-------------+--------+-------+---------------+-------------+---------+------+------+-----------------------+
1 row in set (0.00 sec)

So why did the plan change from table scan to range scan after inserting a few rows? Let's see what optimizer tracing has to report. You can turn tracing on and modify its behavior using a session variable called optimizer_trace: enabled=on turns on tracing, end_marker=off makes the trace a bit less human readable but JSON compliant, and one_line=off produces a pretty-printed trace:

mysql> SET optimizer_trace="enabled=on,end_marker=off,one_line=off";
mysql> SELECT * FROM orders WHERE customer_id <> 6 OR (customer_id <> 8 AND value = 500);
mysql> SELECT trace FROM information_schema.OPTIMIZER_TRACE INTO DUMPFILE 'trace.json';

The performance of that query (and later queries in that session) will probably be bad, so be sure to turn off the tracing (enabled=off) once you've traced the queries you're interested in.
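
Concretely, the wrap-up of a tracing session might look like this (the trace can also be read straight from the INFORMATION_SCHEMA table instead of being dumped to a file):

```sql
-- Inspect the trace of the most recent traced statement directly:
SELECT query, trace FROM information_schema.OPTIMIZER_TRACE;

-- Turn tracing back off for the remainder of the session:
SET optimizer_trace="enabled=off";
```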

This is what it looks like when I open trace.json in Firefox using the JSONView add-on. I have collapsed ({...}) a few objects that are not of interest in this example:

Some key information you can read from the trace:
  • table scan would, if chosen, read an estimated 170 records at a cost of 38.64. 
  • it's not possible to do range scan using the primary key, but the customer_id index may be used for this purpose.
  • range scan (cost 34.21) is cheaper than table scan (cost 38.64), so MySQL chooses to use range scan.
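
For orientation, the relevant portion of the trace looks roughly like this (heavily abbreviated and hand-edited; the exact key names and nesting can vary between versions, so treat this as a sketch rather than literal output):

```json
"range_analysis": {
  "table_scan": { "rows": 170, "cost": 38.64 },
  "potential_range_indexes": [
    { "index": "PRIMARY", "usable": false },
    { "index": "customer_id", "usable": true }
  ],
  "analyzing_range_alternatives": {
    "range_scan_alternatives": [
      { "index": "customer_id", "rows": 26, "cost": 34.21, "chosen": true }
    ]
  }
}
```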
BTW: This example query was chosen because the range optimizer used to have a bug that produced incorrect ranges for it. The ranges used to be:

"NULL < customer_id < 6", 
"6 <= customer_id <= 6 AND 500 <= value <= 500", 
"6 < customer_id < 8", 
"8 <= customer_id <= 8 AND 500 <= value <= 500", 
"8 < customer_id". 

This was the first bug optimizer tracing helped me fix.
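
To see why those ranges were wrong (assuming, as the text implies, that the list above shows the buggy ranges): a row with customer_id = 8 satisfies the WHERE clause regardless of value, because customer_id <> 6 already holds, yet the range for customer_id = 8 additionally demanded value = 500. A quick sanity check in Python:

```python
def where_clause(customer_id, value):
    # The predicate from the example query.
    return customer_id != 6 or (customer_id != 8 and value == 500)

def in_buggy_ranges(customer_id, value):
    # The (incorrect) ranges quoted above, translated literally.
    return (customer_id < 6
            or (customer_id == 6 and value == 500)
            or 6 < customer_id < 8
            or (customer_id == 8 and value == 500)
            or customer_id > 8)

# A row that matches the WHERE clause but falls outside the buggy ranges:
row = (8, 123)
print(where_clause(*row))      # True
print(in_buggy_ranges(*row))   # False
```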

If you want to know more:
  • Guilhem Bichot has a blog post about how to use the optimizer tracing related session variables.
  • The internals manual has a chapter about it.


  1. This will be very useful. And I am very happy that you used JSON instead of XML.

  2. This is amazing!

    I stole your picture to advertise your post and the feature in Russian-speaking community at http://ru-mysql.livejournal.com/281844.html I hope this is OK.

  3. @Mark: Yes, I'm using it almost every day.

    @Sveta: Thank you :)

  4. I agree with Mark, JSON is better than XML, and the feature is very valuable.

  5. I'm glad our main target users are as enthusiastic as we are :-)

  6. I have been using optimizer tracing and have found the
    results very helpful when combined with analyzing the source code, but
    there are a few questions I have about row count and cost.

    1. How is the row count more than merely informational to the
    optimizer? I had the impression the row count did not directly
    contribute to the optimizer's decision as much as the estimated cost,
    or that the row count was already accounted for/factored into the
    final cost measurement.
    2. Related to that question, what is cost exactly? Is there a unit
    of measurement (ex: Time, Disk Seeks, etc.)?
    3. Is cost more important than row count or equally important?
    4. What is the genesis of the cost variable displayed in the trace?
    I assume it is just a guess on the respective storage engine's part of
    how expensive (in terms of disk access, processing, time, etc.)
    getting the rows will be for the query.

    I have some example snippets showing the optimizer seemingly using cost and row count to choose one index over both a range scan and a few other indexes:
    "table_scan":{"rows": 3108329, "cost": 655076}
    "rows": 2377102, "cost": 2.81e6,
    "rows": 1554164, "cost": 1.84e6,
    "rows": 2346664,"cost": 2.78e6,
    "rows": 1554164,"cost": 1.84e6,
    "considered_access_paths": [
    {"access_type": "ref",
    "index": "reports_group_id",
    "rows": 1.55e6,"cost": 411057,
    "chosen": true},

    1. Hi Christian,

      Very interesting questions that touch some pain point we have. I'll try to explain...

      The row count is mostly used when calculating costs, e.g. to estimate how many blocks will have to be read from disk (sequential or random reads depending on access method) and to estimate the cost of processing these rows in the server during execution. In addition, the row count determines how many partial rows have to be joined to the next table in the join plan, and this is often the most important factor when calculating the cost for that next table. However, the costs printed in the optimizer trace include all of this, and decisions are for the most part made based on cost comparisons. There are also a few special heuristics in the optimizer that trigger on row counts, but those are rare.

      It is not easy to define what cost is. It is an abstract unit most closely tied to the cost of doing a random read operation from an HDD. Reading rows with the 'ref' access method has an IO cost equal to the row estimate because it is assumed that the rows are in different blocks. Naturally, a table scan has a cost that is lower than the number of rows in the table because a) there are many rows per block, and b) it is cheaper to read sequentially than randomly on an HDD. Scanning a 1000-row table may therefore get an IO cost of e.g. 100 ("scanning the table has the cost equivalent of randomly reading 100 blocks"). Btw, the numbers here are just examples.
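
As a toy illustration of that trade-off (the constants are invented; MySQL's real cost functions are more elaborate):

```python
# Toy cost model for the trade-off described above.
# All constants are made up for illustration; they are NOT MySQL's.

ROWS_PER_BLOCK = 5       # (a) many rows fit in one block
SEQ_READ_FACTOR = 0.5    # (b) sequential reads are cheaper than random ones

def table_scan_io_cost(total_rows):
    # Read every block of the table sequentially.
    blocks = total_rows / ROWS_PER_BLOCK
    return blocks * SEQ_READ_FACTOR   # in "equivalent random reads"

def ref_access_io_cost(matching_rows):
    # Assume each matching row sits in a different block and is
    # fetched with one random read, as described for 'ref' above.
    return float(matching_rows)

# With these made-up constants, scanning a 1000-row table costs the
# equivalent of 100 random reads -- the figure used in the example --
# so 'ref' access only wins if it fetches fewer than ~100 rows.
print(table_scan_io_cost(1000))   # 100.0
print(ref_access_io_cost(10))     # 10.0
```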

      So far so good. Then come the most important painful facts: random read operations have wildly different response times on different media. E.g., SSDs were not around when this cost model was first crafted. Furthermore, since CPU performance has increased a great deal more than IO performance over the last two decades, the cost calculation functions overestimate the cost of handling those rows in the execution apparatus.

      These issues are on our radar and the cost model will at some point be significantly reworked to model modern hardware and storage engine characteristics better.

      The row counts are important because they are a significant contributing factor to the cost, but the costs are what the optimizer usually compares.

      IIRC, only the IO costs for table scans and index scans are taken from the storage engine. Costs for other access methods are calculated in the MySQL optimizer.

      > ...
      > "table_scan":{"rows": 3108329, "cost": 655076}
      > ...
      > "range_scan_alternatives"
      > ...
      > "rows": 2377102, "cost": 2.81e6,
      > ...
      > "rows": 1554164, "cost": 1.84e6,
      > ...
      > "rows": 2346664,"cost": 2.78e6,
      > ...
      > "rows": 1554164,"cost": 1.84e6,
      > ...
      > "best_access_path":{
      > "considered_access_paths": [
      > {"access_type": "ref",
      > "index": "reports_group_id",
      > "rows": 1.55e6,"cost": 411057,
      > "chosen": true},

      This is not the end of the "considered_access_paths" block. There are probably more {access_type...} blocks within it that show either "chosen": true or "chosen": false for scans and range access. It is always the latest occurrence of "chosen": true that determines which access method is actually chosen. The reason for this somewhat confusing output (multiple occurrences of "chosen": true) is that it reflects what the optimizer knows at the time it is written; a "chosen": true means more or less "this is the best access method seen so far". I find it a bit strange that the 'ref' access method above is cheaper than the table scan when the latter only needs to read 2x as many rows, but I'd need more info to determine why. In any case, I think your question is more of an academic nature than about fixing a problematic query...
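
The "last chosen wins" rule is easy to apply mechanically when post-processing a trace; a small sketch in Python (the trace structure is simplified and the helper name is made up):

```python
import json

def final_choice(considered_access_paths):
    """Return the last entry flagged "chosen": true -- the access
    method the optimizer actually settled on."""
    winner = None
    for path in considered_access_paths:
        if path.get("chosen"):
            winner = path
    return winner

# Simplified data mirroring the snippet quoted above.
paths = json.loads("""
[
  {"access_type": "ref", "index": "reports_group_id",
   "rows": 1.55e6, "cost": 411057, "chosen": true},
  {"access_type": "scan", "rows": 3108329, "cost": 655076, "chosen": false}
]
""")
print(final_choice(paths)["access_type"])  # ref
```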