Using Hotness_History to identify hot tables.

Henry Hwang -


Hotness History

Although we have a useful tool called hotness_history that shows you what the current hot tables are, there is a table that you can query to look at hot tables over a longer period of time. This is the CLUSTRIX_STATD.HOTNESS_HISTORY table.

mysql> show create table hotness_history \G
*************************** 1. row ***************************
 Table: hotness_history
Create Table: CREATE TABLE `hotness_history` (
 `timestamp` timestamp not null default current_timestamp,
 `read_rank` int(11) unsigned,
 `write_rank` int(11) unsigned,
 `node` int(11) unsigned not null,
 `database` varchar(256) CHARACTER SET utf8 not null,
 `table` varchar(256) CHARACTER SET utf8 not null,
 `index` varchar(256) CHARACTER SET utf8 not null,
 `reads` int(11) unsigned,
 `writes` int(11) unsigned,
 `replicas` int(11) unsigned,
 `ranked_replicas` int(11) unsigned,
 `total_replicas` int(11) unsigned,
 PRIMARY KEY (`timestamp`,`node`,`database`,`table`,`index`) /*$ DISTRIBUTE=5 */,
 KEY `timestamp_2` (`timestamp`,`write_rank`) /*$ DISTRIBUTE=2 */,
 KEY `timestamp` (`timestamp`,`read_rank`) /*$ DISTRIBUTE=2 */
) CHARACTER SET utf8 /*$ REPLICAS=2 SLICES=4 CONTAINER=btree */

1 row in set (0.00 sec)

The key fields are:

  • timestamp: statistics for that table or index for the last 5 minutes.
  • read_rank: how hot the table or index was for reads during the time.
  • write_rank: how hot the table or index was for writes during the time.
  • table: table that is hot.
  • index: this could be the index or could be the base representation. Base reps are suffixed with _PRIMARY.
  • replicas: number of replicas for that table or index. Generally, the number of slices should be equal or greater than the number of nodes so that data is spread out among all the nodes.

The cluster ranks each table or index with a read_rank and a write_rank for the duration of the stats window. A table or index that has a high rank would have had more reads or writes than the other representations.

To find the hot reads tables based on average read_rank for the last 24 hours, you can use:

SELECT Round(Avg(read_rank))  AS read_rank, 
       Round(Avg(write_rank)) AS write_rank, 
       Sum(reads)             AS reads, 
       Sum(writes)            AS writes, 
       DATABASE, 
       `table`, 
       `index`, 
       total_replicas 
FROM   clustrix_statd.hotness_history 
WHERE  timestamp >now() - INTERVAL 1 day 
       AND DATABASE NOT IN ('clustrix_statd', '_replication', 'clustrix_ui', 'system') 
       AND read_rank IS NOT NULL 
GROUP BY `table`, `index` 
ORDER BY read_rank ASC, write_rank ASC;

Similarly, you can do this for hot writes based on average write_rank:

SELECT Round(Avg(read_rank))  AS read_rank, 
       Round(Avg(write_rank)) AS write_rank, 
       Sum(reads)             AS reads, 
       Sum(writes)            AS writes, 
       DATABASE, 
       `table`, 
       `index`, 
       total_replicas 
FROM   clustrix_statd.hotness_history 
WHERE  timestamp >now() - INTERVAL 1 day 
       AND DATABASE NOT IN ('clustrix_statd', '_replication', 'clustrix_ui', 'system') 
       AND write_rank IS NOT NULL 
GROUP BY `table`, `index` 
ORDER BY write_rank ASC, read_rank ASC;

A variation to these queries is to look at the total reads and writes over the duration:

Hot reads by sum of reads:

SELECT Round(Avg(read_rank))  AS read_rank, 
       Round(Avg(write_rank)) AS write_rank, 
       Sum(reads)             AS reads, 
       Sum(writes)            AS writes, 
       DATABASE, 
       `table`, 
       `index`, 
       total_replicas 
FROM   clustrix_statd.hotness_history 
WHERE  timestamp >now() - INTERVAL 1 day 
       AND DATABASE NOT IN ('clustrix_statd', '_replication', 'clustrix_ui', 'system') 
       AND read_rank IS NOT NULL 
GROUP BY `table`, `index` 
ORDER BY 3 DESC, 4 DESC;

Hot writes by sum of writes:

SELECT Round(Avg(read_rank))  AS read_rank, 
       Round(Avg(write_rank)) AS write_rank, 
       Sum(reads)             AS reads, 
       Sum(writes)            AS writes, 
       DATABASE, 
       `table`, 
       `index`, 
       total_replicas 
FROM   clustrix_statd.hotness_history 
WHERE  timestamp >now() - INTERVAL 1 day 
       AND DATABASE NOT IN ('clustrix_statd', '_replication', 'clustrix_ui', 'system') 
       AND write_rank IS NOT NULL 
GROUP BY `table`, `index` 
ORDER BY 4 DESC, 3 DESC;

If the table is hot and the number of replicas is less than the two times the number of nodes, it may result in a load imbalance. You may want to consider re-slicing the table.

A hot table could indicate a poorly optimized query that is missing indexes and may warrant further investigation.

Have more questions? Submit a request

0 Comments

Please sign in to leave a comment.
Powered by Zendesk