- Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' key to disable/enable DD for all rebalance reasons(MountainChopper and ValleyFiller)
Kicked off two 200K correctness and showed no related errors.
PRIORITY_TEAM_REDUNDANT should be in a different priority band from
PRIORITY_MERGE_SHARD and PRIORITY_SPLIT_SHARD, because
priority inversion happens within priorities in the same band.
Add a flag in HealthMetrics to indicate that batch priority is rate limited.
Data distributor pulls this flag from proxy to know roughly when rate limiting
happens.
DD uses this information to determine when to do the rebalance in the background,
i.e., moving data from heavily loaded servers to lighter ones. If the cluster is
currently rate limited for batch commits, then the rebalance will use longer
time intervals, otherwise use shorter intervals. See BgDDMountainChopper() and
BgDDValleyFiller() in DataDistributionQueue.actor.cpp.
Add a new role for ratekeeper.
Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.