Index Usage with Functions

Rupert Harwood -

I just ran into this issue with a developer and thought it was blog-worthy.

Our UI was doing the following query:

SELECT node,
FROM  hotness_history
WHERE `database` = 'wiki_test'
      AND `table` = 'revision'
      AND `index` = '__idx_revision__PRIMARY'
      AND Unix_timestamp(timestamp) = 1389210952;

 At one point, we’d been doing timestamp = ‘2013-12-12 12:01’, but in order to avoid strange timezone issues, we converted to using integer UNIX timestamp throughout the UI code.  This lead us to unix_timestamp(timestamp) = 1389210952.

However, this performed terribly, taking as long as 5 seconds.  The explain shows us why:


The fact that the index_scan line shows nothing after the table.representation means we’re not sarging (using an index), thus we are reading the entire table, then filtering for a match on the timestamp.

The problem is that we can’t use an index on a column when we’re applying a function on that column (same limitation exists in MySQL, btw).  However, we can pretty easily eliminate this obstacle by inverting the function call and moving it to the other side of the expression:  timestamp = from_unixtime(123234235).

mysql> SELECT node,
   ->        READS,
   ->        replicas,
   ->        ranked_replicas,
   ->        total_replicas
   -> FROM   hotness_history
   -> WHERE  `database` = 'wiki_test'
   ->        AND `table` = 'revision'
   ->        AND `index` = '__idx_revision__PRIMARY'
   ->        AND timestamp = From_unixtime(1389210952);
| node | READS | replicas | ranked_replicas | total_replicas |
|    1 |  2732 |        1 |               1 |              6 |
|    3 |  2758 |        1 |               1 |              6 |
|    5 |  2732 |        1 |               1 |              6 |
3 rows in set (0.01 sec) 

Note the explain now shows us sarging (the bit after the comma on the index_scan line), and  thus reading very few rows, hence execution time is now trivial:


It is worth noting that while moving the function call to be over the constant allows us to use the index, the function call over the constant does interfere with the planner’s ability to utilize statistics about the distribution of values in the index, or in other words, to properly cost the plan using that index.  Thus it is possible that when using a function call around the constant parameter, the planner will choose a different plan than if you had given a simple constant (i.e. performed that function in the application instead of the database).  Thus it is often best to apply functions on constants within the application; if this is not possible or desirable (in this case we keep it in the database in order to let the database sort out timezone conversion), you may wish to use planner hints to force index selection (FORCE or USE INDEX (index_name)).

Article originally written by Nathan Parrish.

Have more questions? Submit a request


Please sign in to leave a comment.
Powered by Zendesk