Since Ratekeeper and DataDistributor no longer run with the Master, they
may be recruited on stateful processes before a new Master becomes alive,
which is undesirable.
This PR adds monitoring of both Ratekeeper and DataDistributor at the Cluster
Controller -- if the Master runs on a stateless class and RK/DD run on a worse
class, then RK/DD will be killed. That is, RK/DD should run on their own
process classes or on the same stateless process as the Master. After a
restart, RK/DD should be running on a better process class.
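The check above can be sketched as follows. This is an illustrative outline only: the enum values and their ordering (lower value = better fitness) and the function name are assumptions, not the actual FDB `ProcessClass` API.

```cpp
#include <cassert>

// Illustrative fitness ordering: lower value = better fitness.
enum class Fitness { Best = 0, Good = 1, Stateless = 2, Worse = 3 };

// Kill RK/DD when the Master sits on a stateless-class process while the
// role sits on a strictly worse process class, so that after the restart
// the role is recruited on a better process.
bool shouldKillRole(Fitness masterFitness, Fitness roleFitness) {
    return masterFitness == Fitness::Stateless && roleFitness > masterFitness;
}
```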
Add a flag to HealthMetrics to indicate that batch priority is rate limited.
The data distributor pulls this flag from the proxy to know roughly when rate
limiting happens.
DD uses this information to decide when to perform the background rebalance,
i.e., moving data from heavily loaded servers to lighter ones. If the cluster
is currently rate limited for batch commits, the rebalance uses longer time
intervals; otherwise it uses shorter intervals. See BgDDMountainChopper() and
BgDDValleyFiller() in DataDistributionQueue.actor.cpp.
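The interval selection can be sketched as below. The function name and interval values are illustrative assumptions, not the actual knobs used by the rebalancers.

```cpp
#include <cassert>

// Pick the polling interval for the background rebalancers based on the
// HealthMetrics batch-rate-limited flag described above. When batch commits
// are rate limited the cluster is already loaded, so the rebalance backs off
// to a longer interval; otherwise it polls more often.
double rebalanceInterval(bool batchRateLimited) {
    const double shortInterval = 1.0;  // seconds, illustrative value
    const double longInterval = 10.0;  // seconds, illustrative value
    return batchRateLimited ? longInterval : shortInterval;
}
```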
Add a new role for ratekeeper.
Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, borrowing the idea from
DataDistribution.
After we add the new data distributor role, we publish the data
related to the data distributor and Ratekeeper through the new
role (and new worker).
So status needs to contact the data distributor, instead of the master,
to get this information.
In addTeam(), to determine whether a team is a badTeam, we should check
redundantTeam before checking satisfiesPolicy, because if a team is a
redundantTeam, it was already removed from the system before addTeam() was
called. The only reason we call addTeam() for a removed redundantTeam is to
kick off the badTeam cleanup logic.
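The check ordering can be sketched as follows. The struct and function names are illustrative, not the actual fields of FDB's team classes.

```cpp
#include <cassert>

struct Team {
    bool redundantTeam;    // team was already removed as redundant
    bool satisfiesPolicy;  // team still satisfies the replication policy
};

// A redundant team was removed from the system before addTeam() ran, so it
// must be classified as bad regardless of whether it still satisfies the
// policy; checking satisfiesPolicy first would misclassify it and skip the
// bad-team cleanup.
bool isBadTeam(const Team& team) {
    if (team.redundantTeam) {
        return true;
    }
    return !team.satisfiesPolicy;
}
```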
When we remove a machine team in the teamRemover function,
we should always find the machine team in the global machineTeams.
Change the ASSERT to reflect this invariant.
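The invariant can be sketched as below; the container type and function name are illustrative, not the actual machineTeams representation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// teamRemover only removes machine teams it discovered in the global list,
// so the lookup must succeed; assert the invariant instead of silently
// tolerating a missing team.
bool removeMachineTeam(std::vector<std::string>& machineTeams,
                       const std::string& team) {
    auto it = std::find(machineTeams.begin(), machineTeams.end(), team);
    assert(it != machineTeams.end());  // invariant: team must be present
    machineTeams.erase(it);
    return true;
}
```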
The previous commit merged master, which merged
pull request #1062 from jzhou77/PR adding the new DataDistribution role.
The merge caused conflicts and errors in simulation tests.
This commit resolves the code conflicts and
fixes the new errors introduced by incorporating the new DataDistribution role.
We do NOT enforce an order between removing a machine team
and removing the server teams on that machine team.
This keeps the code logic clear.
When a storage server's locality changes, we first remove the server
(and its machine, if needed) before we handle server team removal
and addition.
We do not actively remove a machine team when it has no server teams on it.
But since adding a server team may add a machine team, we must be
careful that the number of machine teams does not grow beyond the desired
number due to server team creation.
So whenever a server team is removed, we should check whether the teamRemover
should kick in.
When the number of machines changes due to a machine removal event,
the desired machine team number changes as well. We then need to
make sure the teamRemover actor runs to clean up the
redundant teams.
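Both trigger conditions reduce to the same check, sketched below. The function name and parameters are illustrative assumptions, not the actual teamRemover interface.

```cpp
#include <cassert>

// Server-team creation can add machine teams as a side effect, and a machine
// removal event lowers the desired machine-team count. After either kind of
// event, the current count may exceed the desired count, and the teamRemover
// cleanup should run.
bool needTeamRemover(int machineTeamCount, int desiredMachineTeamCount) {
    return machineTeamCount > desiredMachineTeamCount;
}
```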
getTeam is called very frequently and does not create a new team,
so there is no need to call teamRemover in getTeam;
teamRemover should be called only when a new team may be added.