The ClustrixDB Rebalancer is an automated system for maintaining a healthy distribution of data within the cluster. It is the Rebalancer's job to respond to an "unhealthy" cluster by modifying the distribution and placement of user data. The Rebalancer is an online process that makes changes to the cluster with minimal interruption to user operations. It relieves the cluster administrator of the burden of manually manipulating data placement.
The rebalancer has several responsibilities. In order to ensure data protection and data distribution, it can perform the following tasks:
1. Copy. If the cluster loses a disk or a node, the rebalancer will issue copy operations to get back to a fully protected system state.
2. Move. The rebalancer tries to keep data evenly distributed across the system. If some nodes are more or less loaded than others, the rebalancer will issue a move from a more heavily loaded node to a less loaded one. In this case, the reason will appear as load imbalance.
3. Rerank. Clustrix maintains multiple copies of data for protection. To increase cache efficiency, only one copy is used for reads. To maintain an even distribution of read requests in the cluster, the rebalancer will rank replicas for read preference. When it decides to change which replica is used for reads, it will rerank with a read imbalance reason.
4. Split. When a slice grows larger than its threshold (1GB by default), the rebalancer will perform a split operation, which effectively cuts the slice in half and then moves the two halves for optimal balance.
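
The four tasks above can be read as simple decision rules. The following Python sketch is a toy model of those rules, not ClustrixDB's actual implementation. Every name in it (SliceInfo, plan_slice_actions, plan_move, plan_rerank, tolerance_bytes, target_replicas) is hypothetical. The default of two replicas and the reading of "1GB" as 2**30 bytes are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Split threshold from the text ("1GB default"); 2**30 bytes is an assumption.
SPLIT_THRESHOLD_BYTES = 1024 ** 3


@dataclass
class SliceInfo:
    """Hypothetical description of one slice and where its replicas live."""
    slice_id: int
    size_bytes: int
    replica_nodes: List[str]  # ranked: replica_nodes[0] is the copy used for reads


def plan_slice_actions(s: SliceInfo, live_nodes: Set[str],
                       target_replicas: int = 2) -> List[str]:
    """Per-slice rules: 'copy' restores protection, 'split' handles oversized slices."""
    actions = []
    live = [n for n in s.replica_nodes if n in live_nodes]
    if len(live) < target_replicas:  # a replica was lost with a disk or node
        actions.append("copy")
    if s.size_bytes > SPLIT_THRESHOLD_BYTES:
        actions.append("split")
    return actions


def plan_move(node_bytes: Dict[str, int],
              tolerance_bytes: int) -> Optional[Tuple[str, str]]:
    """Return (source, destination) if the most and least loaded nodes differ
    by more than tolerance_bytes; otherwise None (no load imbalance)."""
    src = max(node_bytes, key=node_bytes.get)
    dst = min(node_bytes, key=node_bytes.get)
    if node_bytes[src] - node_bytes[dst] > tolerance_bytes:
        return (src, dst)
    return None


def plan_rerank(s: SliceInfo, node_reads: Dict[str, int]) -> Optional[str]:
    """Return the replica node that should take over reads, or None if the
    currently ranked replica already sits on the least read-loaded node."""
    current = s.replica_nodes[0]
    best = min(s.replica_nodes, key=lambda n: node_reads[n])
    return best if node_reads[best] < node_reads[current] else None


if __name__ == "__main__":
    big = SliceInfo(slice_id=1, size_bytes=2 * 1024 ** 3, replica_nodes=["n1", "n2"])
    print(plan_slice_actions(big, live_nodes={"n1", "n3"}))
    print(plan_move({"n1": 100, "n2": 40, "n3": 90}, tolerance_bytes=50))
    print(plan_rerank(big, {"n1": 500, "n2": 100}))
```

In this sketch each rule is evaluated independently, so a single slice can qualify for more than one action at once. How the real rebalancer prioritizes and schedules competing operations is not described here and is not modeled.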