- Make sure the disabled data distribution won't be accidentally enabled by the 'maintenance' command
- Make sure the status json reflects the status of DD accordingly
- Make sure the CLI can play with the new DD states correctly, i.e. print out warns when necessary
- Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' key to disable/enable DD for all rebalance reasons(MountainChopper and ValleyFiller)
Kicked off two 200K correctness and showed no related errors.
Change the threshold team number per server that should set lastBuildTeamsFailed
from DESIRED_TEAMS_PER_SERVER to
(SERVER_KNOBS->DESIRED_TEAMS_PER_SERVER * (configuration.storageTeamSize + 1)) / 2;
If serverTeamRemover removes a team before machineTeamRemover brings
the machine team number down to the desired number, DD may create a new
team (due to teams removed by serverTeamRemover), which may be removed
later by machineTeamRemover. This causes unnnecessary extra data movement.
1) No need to check server with only one team when teamRemover finds
a server team or machine team to remove
2) Fix optimalTeamCount counting in teamTracker
If the minimum number of teams of servers in a team is less than the
target value (desired_team_number_per_server * (teamSize + 1) / 2),
the team remover should not remove it. Otherwise, DD will oscillate in
building more teams and removing redundant teams.
Do not do consistency check for three_data_hall mode because when
machines are not evenly distributed across data halls, we will
need to build more teams than the total desired number to make sure
the number of teams per server is no less than the target value.
Do not overbuild teams because we may oscillate between building more teams and
removing the redundant teams. The oscillation happens when the machines are not
evenly distributed across availability zones.
For example, in three_data_hall mode, we have 1 machine in 1 data hall for 2 data halls.
We have 3 machines in the 3rd data hall. To build enough (and more teams) for servers
in the 3rd data hall, we will overbuild teams. However,
the teamRemover will remove those newly teams.
Change to remove machine team with most machine teams, using the same
logic as the serverTeamRemover.
The featue is guarded by TR_FLAG_REMOVE_MT_WITH_MOST_TEAMS knob.