- Make sure the disabled data distribution won't be accidentally enabled by the 'maintenance' command
- Make sure the status json reflects the status of DD accordingly
- Make sure the CLI handles the new DD states correctly, i.e., prints warnings when necessary
- Use the pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures (a sketch follows this note)
- Use a new system key, 'rebalanceDDIgnored', to disable/enable DD for all rebalance reasons (MountainChopper and ValleyFiller)
Kicked off two 200K correctness runs; they showed no related errors.
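A minimal sketch of the disable path, assuming FDB's native C++ API; the key identifier (healthyZoneKey) and the sentinel value are assumptions for illustration, not the actual values:

    // Sketch only: disable DD for all storage server failures by writing
    // an assumed sentinel value to the pre-existing 'healthZone' system key.
    ACTOR Future<Void> disableDDForSSFailures(Database cx) {
        state Transaction tr(cx);
        loop {
            try {
                tr.setOption(FDBTransactionOptions::ACCESS_SYSTEM_KEYS);
                tr.set(healthyZoneKey, LiteralStringRef("IgnoreSSFailures")); // placeholder value
                wait(tr.commit());
                return Void();
            } catch (Error& e) {
                wait(tr.onError(e)); // standard FDB retry loop
            }
        }
    }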
Change the per-server team-count threshold that sets lastBuildTeamsFailed
from DESIRED_TEAMS_PER_SERVER to
(SERVER_KNOBS->DESIRED_TEAMS_PER_SERVER * (configuration.storageTeamSize + 1)) / 2;
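As a worked example (knob values made up for illustration), the new threshold computes as follows:

    // With DESIRED_TEAMS_PER_SERVER = 5 and storageTeamSize = 3, the
    // threshold is (5 * (3 + 1)) / 2 = 10 teams per server.
    int buildTeamsFailedThreshold(int desiredTeamsPerServer, int storageTeamSize) {
        return (desiredTeamsPerServer * (storageTeamSize + 1)) / 2;
    }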
If serverTeamRemover removes a team before machineTeamRemover brings
the machine team number down to the desired number, DD may create a new
team (due to the teams removed by serverTeamRemover), which may be removed
later by machineTeamRemover. This causes unnecessary extra data movement.
1) No need to check a server with only one team when teamRemover finds
a server team or machine team to remove
2) Fix optimalTeamCount counting in teamTracker
If the minimum number of teams across the servers in a team is less than the
target value (desired_team_number_per_server * (teamSize + 1) / 2),
the team remover should not remove the team. Otherwise, DD will oscillate
between building more teams and removing redundant teams.
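A hedged sketch of the check (helper and parameter names are hypothetical):

    #include <algorithm>
    #include <vector>

    // Remove a team only if every member server would stay at or above
    // the target team count; otherwise DD would rebuild what we remove.
    bool safeToRemove(const std::vector<int>& teamsPerMemberServer,
                      int desiredTeamsPerServer, int teamSize) {
        int target = (desiredTeamsPerServer * (teamSize + 1)) / 2;
        int minTeams = *std::min_element(teamsPerMemberServer.begin(),
                                         teamsPerMemberServer.end());
        return minTeams >= target;
    }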
Do not run the consistency check in three_data_hall mode, because when
machines are not evenly distributed across data halls, we need to build
more teams than the total desired number to ensure the number of teams
per server is no less than the target value.
Do not overbuild teams, because we may oscillate between building more teams
and removing the redundant ones. The oscillation happens when machines are not
evenly distributed across availability zones.
For example, in three_data_hall mode, suppose we have 1 machine in each of
two data halls and 3 machines in the third. To build enough teams for the
servers in the third data hall, we would overbuild teams; however,
the teamRemover would then remove those newly built teams.
Change to remove the machine team whose machines belong to the most machine
teams, using the same logic as the serverTeamRemover.
The feature is guarded by the TR_FLAG_REMOVE_MT_WITH_MOST_TEAMS knob.
Before the serverTeamRemover tries to pick a team to remove,
it waits for all data movement to finish, which means all teams are healthy.
When the serverTeamRemover starts to pick a team to remove,
we believe all servers are healthy.
A storage server should not be colocated with tLogs,
so we want to mark such a server as undesired.
However, if there are not enough processes in the system, we have
no choice but to colocate them.
The old logic marks the server undesired if optimalTeamCount > 0.
However, there is a rare case where optimalTeamCount is 1 when it should be 0.
To handle this situation, we add another condition, healthyTeamCount > 0,
as a guard before marking such a colocated server undesired.
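A minimal sketch of the new condition (the surrounding structure and the markUndesired() helper are assumptions):

    // Mark a server colocated with a tLog as undesired only when there is
    // also at least one healthy team; this guards the rare case where
    // optimalTeamCount is 1 when it should be 0.
    if (self->optimalTeamCount > 0 && self->healthyTeamCount > 0) {
        server->markUndesired(); // hypothetical helper
    }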
When a teamTracker is cancelled, e.g., by the redundant teamRemover or badTeamRemover,
we should decrease optimalTeamCount if the team is considered optimal,
i.e., every member's machine fitness is no worse than unset and
the team is healthy.
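A sketch of the cleanup on cancellation, with hypothetical helper names:

    // On teamTracker cancellation (e.g., by teamRemover or badTeamRemover),
    // undo this team's contribution to the optimal-team counter. A team is
    // optimal when every member's machine fitness is no worse than 'unset'
    // and the team is healthy.
    void onTeamTrackerCancelled(DDTeamCollection* self, Reference<TCTeamInfo> team) {
        if (team->isOptimal() && team->isHealthy()) {
            self->optimalTeamCount--;
        }
    }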
Because serverTeamRemover takes time to remove teams,
getTeamCollectionValid() needs to wait for a while before concluding that
the number of server teams is larger than the desired number.
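A hedged flow-style sketch of the waiting behavior (function name, fields, and the timeout are assumptions):

    // Poll instead of failing immediately: serverTeamRemover needs time to
    // bring the server team count down to the desired number.
    ACTOR Future<bool> checkTeamCountEventually(DDTeamCollection* self,
                                                int desiredTeams, double timeout) {
        state double deadline = now() + timeout;
        loop {
            if ((int)self->teams.size() <= desiredTeams) return true; // valid
            if (now() > deadline) return false; // still too many teams
            wait(delay(1.0));
        }
    }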
To remove a team, pick the one whose minimum per-server team count is the largest.
addTeamsBestOf should keep building teams until each server has at least the
target number of teams.
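A sketch of the selection rule (the container layout is an assumption: one vector of per-server team counts per candidate team):

    #include <algorithm>
    #include <vector>

    // Compute each candidate team's minimum per-server team count and
    // remove the team for which that minimum is largest: its servers can
    // best afford to lose a team.
    int pickTeamToRemove(const std::vector<std::vector<int>>& teamCounts) {
        int best = -1, bestMin = -1;
        for (int i = 0; i < (int)teamCounts.size(); i++) {
            int m = *std::min_element(teamCounts[i].begin(), teamCounts[i].end());
            if (m > bestMin) { bestMin = m; best = i; }
        }
        return best; // index of the team to remove (-1 if no candidates)
    }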
A redundant team removed by teamRemover will no longer exist
in the global teams data structure, so we will not find
it via the shard-to-team mapping in the system keyspace.
Before this change, teamTracker marked such a team as PRIORITY_TEAM_UNHEALTHY;
with this change, it marks it as PRIORITY_TEAM_REDUNDANT.
We build more teams than we ultimately want so that the serverTeamRemover() actor can remove the teams
whose members belong to too many teams. This gives us a more balanced number of teams per server.
The consistency check tries to convert the value to int64_t.
If no server exists, the variable is never updated and thus overflows
when it is converted to int64_t.
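An illustration of the failure mode, assuming the counter is unsigned and initialized to its maximum (names are made up):

    #include <cstdint>
    #include <limits>

    // The minimum starts at the unsigned maximum so any real server lowers
    // it. With zero servers it is never updated, and converting the
    // untouched maximum to int64_t wraps to -1.
    uint64_t minTeamsOnServer = std::numeric_limits<uint64_t>::max();
    // ... no servers, so no update happens ...
    int64_t reported = static_cast<int64_t>(minTeamsOnServer); // -1, not the real minimum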
If a team is removed from DD, it will be marked as failed and eventually removed from the
global teams data structure.
Team healthiness is likely to be a temporary state that can change rather quickly.
There are cases where traceTeamCollectionInfo was called twice within the same execution block, i.e.,
with no wait between the two calls.
Because simulation uses the same time for all instructions in the same execution block,
having more than one traceTeamCollectionInfo event at the same timestamp breaks the trackLatest semantics:
when the simulator always chooses one of them, the simulation test reports a false positive error.
Changing the function to an actor and adding a small delay inside it solves the problem.
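A sketch of the fix in flow (the signature, fields, and the delay constant are assumptions):

    // Making the function an actor with a tiny delay guarantees that two
    // calls in the same execution block get distinct simulated timestamps,
    // so trackLatest keeps the truly latest event.
    ACTOR Future<Void> traceTeamCollectionInfo(DDTeamCollection* self) {
        wait(delay(0.01)); // illustrative; forces a new simulated time
        TraceEvent("TeamCollectionInfo", self->distributorId)
            .detail("CurrentTeamCount", self->teams.size())
            .trackLatest("TeamCollectionInfo");
        return Void();
    }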
Be careful whenever using the selectReplicas function: it may have bugs!
This particular bug is that it always returns false (unable to find candidates)
when the storage team size is 1. This is wrong: when the storage team size
is 1, selectReplicas should succeed and return an empty result.
When team collection adds new server teams, it picks a team with
the least number of teams. We should consider only the healthy teams,
because the unhealthy ones will not be useful.
Team collection should prioritize building machine teams for a machine
that has the least number of healthy machine teams, instead of simply the
least number of machine teams, because an unhealthy machine team
cannot produce more server teams.
When team collection (TC) builds server teams and machine teams,
it needs to build enough teams so that each server and machine has
DESIRED_TEAMS_PER_SERVER server teams and machine teams, respectively.
This change calculates the number of teams (server teams and machine teams)
needed to reach that target for each server and machine.
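One way to compute the target, sketched with illustrative names (the real calculation may differ, e.g., in rounding):

    #include <algorithm>

    // Each team provides teamSize memberships, so reaching
    // desiredTeamsPerServer memberships on each of serverCount servers
    // needs roughly serverCount * desiredTeamsPerServer / teamSize teams.
    int teamsToBuild(int serverCount, int desiredTeamsPerServer,
                     int teamSize, int currentTeams) {
        int desiredTotal =
            (serverCount * desiredTeamsPerServer + teamSize - 1) / teamSize; // round up
        return std::max(0, desiredTotal - currentTeams);
    }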