Make sure both RateKeeper and DataDistributor are placed in the same data
center as the Master. Make sure only one RateKeeper is live in the cluster as
well.
The previous commit merge with the master, which just merges
the pull request #1062 from jzhou77/PR that adds a new DataDistribution role.
The merge causes conflicts and errors in simulation tests.
This commit resolves the code conflicts and
tries to fix the new errors after incorporating the new DataDistribution role
Added three knobs to control team remover
bool TR_FLAG_DISABLE_TEAM_REMOVER:
Disable the teamRemover actor
double TR_REMOVE_MACHINE_TEAM_DELAY:
Wait for the specified time before try to remove next machine team
double TR_WAIT_FOR_ALL_MACHINES_HEALTHY_DELAY:
Wait before checking if all machines are healthy
Call the traceTeamCollectionInfo function to record the team numbers
when we add a team directly from the shard information, instead of
using addTeamsBestOf logic.
The current simulator does not validate if the number of teams in
the system is larger than the maximum desired number of teams.
This validation should be added because we do NOT want too many teams
in the system, which may impede the systems availability when
multiple fault zones (e.g., machines) crashes at the same time.
This commit adds the test at the consistency check in simulation.
Since the current code does not handle the upgrading situation
when we enforce the machine teams, the test is expected to fail.
The later commit will handle the upgrading situation which gracefully
remove the surplus teams.
Fixed a couple of bugs
1) A rare race condition where a worker is being roles even after it died.
2) Fix how RoleFitness is calculated for TLog and LogRouter. Only worst fitness is compared to see if a better fit is available.
Current server team collection logic does not consider
the fact that multipe storage servers can run on the same machine.
When multiple machines fail, all servers on the machines will fail, and
the possibility of having one process team fail and lose data is very high.
To reduce the possibility of losing data when multiple machine fails,
we first create machine teams which span across different fault zones;
we then create server teams based on machine teams by
first picking 1 machine team, and then
picking 1 server from each machine in the machine team.
Signed-off-by: Meng Xu <meng_xu@apple.com>
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorompiled source through clang.
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.