buildTeam() can create teams that include undesired storage servers, i.e., servers
considered unhealthy. As a result, data movement can become stuck.
Fix this by adding an ACTOR, monitorHealthyTeams, that attempts to build a team
every second whenever there are no healthy teams (sketched below).
Clean up storageServerTracker() interface.
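A minimal sketch of such a monitor in flow's ACTOR style; the member and helper
names (zeroHealthyTeams, doBuildTeams, checkBuildTeams) are assumptions about the
DDTeamCollection interface, not the exact code:

    // Sketch only: while there are no healthy teams, retry team building once per
    // second; otherwise just sleep until zeroHealthyTeams changes.
    ACTOR Future<Void> monitorHealthyTeams(DDTeamCollection* self) {
        loop choose {
            when(wait(self->zeroHealthyTeams->get() ? delay(1.0) : Never())) {
                self->doBuildTeams = true;       // ask the team builder to run
                wait(checkBuildTeams(self));     // hypothetical entry point to buildTeam()
            }
            when(wait(self->zeroHealthyTeams->onChange())) {
                // re-evaluate whether the periodic retry is still needed
            }
        }
    }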
When zeroHealthyTeams signals and a storage server becomes healthy, buildTeam can
be attempted before the ServerStatusMap is updated. As a result, the newly healthy
server is not available for use. Fix by delaying buildTeam until after the status
map has been updated.
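A sketch of the intended ordering, with illustrative names (server_status and
checkBuildTeams are assumptions): publish the updated status first, yield, and only
then trigger team building so the builder observes the updated map.

    ACTOR Future<Void> onServerBecameHealthy(DDTeamCollection* self, UID serverId, ServerStatus status) {
        self->server_status.set(serverId, status);  // publish the new status first
        wait(delay(0));                             // yield so readers see the updated map
        self->doBuildTeams = true;
        wait(checkBuildTeams(self));                // hypothetical entry point to buildTeam()
        return Void();
    }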
This bug was introduced in cee23ee3. During a configuration change, the data
distributor is restarted, which destroys the previous DDTeamCollection and cancels
all of its teamTracker() actors. In this case, even though the healthy team count
reaches 0, there is no need to rebuild teams. The bug is triggered when a team
rebuild is attempted after the DDTeamCollection has already been destroyed.
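One way to make such a rebuild impossible by construction, sketched with
illustrative member names (not necessarily the actual fix): keep the monitor's
Future inside the DDTeamCollection so that destroying the collection cancels the
monitor along with the teamTracker() actors.

    struct DDTeamCollection {
        std::vector<Future<Void>> actors;  // teamTracker()s, monitorHealthyTeams(), ...
        // When the collection is destroyed, these Futures are dropped and the
        // corresponding actors are cancelled, so no rebuild can run afterwards.
    };

    void startTeamMonitors(DDTeamCollection* self) {
        self->actors.push_back(monitorHealthyTeams(self));
    }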
When moving keys to a team, if one of the servers in the target team dies, the
move can become stuck. This is because DDTeamCollection waits for all data
movement involving the failed server to complete. However, because the movement
has not finished yet, checking the database suggests there are no keys associated
with this server and that it is safe to proceed. In reality, only the in-memory
structure knows about the pending movement, i.e., the unfinished move still
attributes some keys to the failed server, so the server cannot be removed yet.
Fix by adding a check against the in-memory structure in waitForAllDataRemoved().
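A sketch of the strengthened check; noKeysAssignedInDatabase is a hypothetical
helper, and the shardsAffectedByTeamFailure accessor is an assumption about the
in-memory structure:

    ACTOR Future<Void> waitForAllDataRemoved(Database cx, UID serverID, DDTeamCollection* teams) {
        loop {
            // The server is only safe to remove when the database shows no keys
            // assigned to it AND the in-memory shard tracking shows no pending
            // movement that still attributes shards to it.
            state bool dbEmpty = wait(noKeysAssignedInDatabase(cx, serverID));  // hypothetical helper
            if (dbEmpty && teams->shardsAffectedByTeamFailure->getNumberOfShards(serverID) == 0) {
                return Void();
            }
            wait(delay(1.0));  // an unfinished move may still land keys on this server; re-check later
        }
    }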
Use const& to optimize a few function parameters.
The usedIds map is populated by the master registration request. However, this
request may contain processes that the cluster controller is not aware of, i.e.,
processes not in the id_worker map.
This was fine until tracing of usedIds was added, which silently inserts an empty
entry into the id_worker map for each unknown process. This new entry can cause a
crash when its LocalityData is accessed.
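The underlying pitfall is std::map::operator[], which default-constructs and
inserts an entry for a missing key. A self-contained illustration with simplified
stand-in types (not the real cluster controller structures):

    #include <cstdio>
    #include <map>
    #include <string>

    struct WorkerInfo { std::string locality; };   // simplified stand-in

    int main() {
        std::map<int, WorkerInfo> id_worker;       // known workers only
        int unknownProcess = 42;                   // reported by registration, never seen before

        // operator[] silently inserts an empty WorkerInfo for the unknown key...
        std::printf("locality: '%s'\n", id_worker[unknownProcess].locality.c_str());
        std::printf("map size: %zu\n", id_worker.size());   // now 1, holding a bogus entry

        // ...whereas find() leaves the map untouched and lets us skip unknown processes.
        if (id_worker.find(43) == id_worker.end()) {
            std::printf("process 43 unknown, skipping\n");
        }
        return 0;
    }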
Remove AsyncTrigger for usedIds, and change to serverInfo->onChange.
Use const & to avoid unnecessary copies in WorkerInterface's LocalityData
and getExtraTLogEligibleMachines().
The quiet database check can fail to send out requests and then report a timeout.
This seems to be caused by reusing a request object, and thus the same
ReplyPromise. Another bug is that the proxy can wait unnecessarily for a database
change even though it already knows the data distributor.
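The ReplyPromise issue can be illustrated in isolation: a promise is single-shot,
so a retry must construct a fresh request rather than resend the old object.
PingRequest and the endpoint below are made-up stand-ins, not the actual
quiet-database requests:

    struct PingRequest {
        ReplyPromise<Void> reply;
        template <class Ar>
        void serialize(Ar& ar) { serializer(ar, reply); }
    };

    ACTOR Future<Void> pingWithRetry(RequestStream<PingRequest> endpoint) {
        loop {
            // A fresh request per attempt means a fresh ReplyPromise; reusing one
            // request object would reuse an already-consumed promise, and the
            // retry's reply would never be delivered.
            state PingRequest req;
            try {
                wait(timeoutError(endpoint.getReply(req), 5.0));
                return Void();
            } catch (Error& e) {
                if (e.code() != error_code_timed_out)
                    throw;
                wait(delay(1.0));
            }
        }
    }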
setDistributor() sets an AsyncVar and then runs waitFailureClient. This ordering
is wrong because AsyncVar::set triggers the other loop to run first, and that loop
then waits on Never(). The correct code should wait on the Future returned by
waitFailureClient.
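A sketch of the corrected ordering (the AsyncVar type and the waitFailure endpoint
on DataDistributorInterface are assumptions): obtain the failure Future first,
publish the interface, then wait on that Future rather than on Never().

    ACTOR Future<Void> trackDistributor(AsyncVar<Optional<DataDistributorInterface>>* distributor,
                                        DataDistributorInterface di) {
        state Future<Void> failed = waitFailureClient(di.waitFailure, /*reactionTime*/ 1.0);
        distributor->set(di);    // may immediately wake loops watching the AsyncVar
        wait(failed);            // NOT Never(): observe the distributor's failure
        distributor->set(Optional<DataDistributorInterface>());   // clear on failure
        return Void();
    }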
This allows the cluster controller to learn about the data distributor during the
worker registration phase, thus avoiding recruiting a new data distributor after
startup.
Also change the worker to skip creating a new data distributor if one is already
running on the worker, which could otherwise trigger operation timeouts in tests.
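The worker-side guard might look roughly like this; WorkerData, runningDistributor,
and dataDistributor() are illustrative names, not the actual interfaces:

    DataDistributorInterface getOrStartDistributor(WorkerData* self) {
        if (self->runningDistributor.present()) {
            // A data distributor is already running on this worker: reuse it instead
            // of starting a second one that tests might never hear back from.
            return self->runningDistributor.get();
        }
        DataDistributorInterface di;                 // freshly initialized interface
        self->runningDistributor = di;
        self->addActor.send(dataDistributor(di));    // hypothetical distributor entry point
        return di;
    }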
This fixes a bug found by the upgrade test, where the configuration monitor of the
data distributor was monitoring excludedServersVersionKey, which does not change
in the ChangeConfig workload. As a result, the data distributor was not aware of
configuration changes.
Add a new key and make sure it is updated on every configuration change so that
the monitor can detect such changes.
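The monitor can then watch a key that every configuration change bumps. A sketch
using an FDB watch; configChangeKey is a placeholder name for the new key:

    ACTOR Future<Void> monitorConfigurationChange(Database cx, PromiseStream<Void> configChanged) {
        loop {
            state ReadYourWritesTransaction tr(cx);
            try {
                // configChangeKey: placeholder for the key bumped by every configuration change
                state Future<Void> watchFuture = tr.watch(configChangeKey);
                wait(tr.commit());          // the watch takes effect once the transaction commits
                wait(watchFuture);          // fires when a configuration change updates the key
                configChanged.send(Void());
            } catch (Error& e) {
                wait(tr.onError(e));
            }
        }
    }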
After the cluster controller starts a data distributor, it waits for that one and
ignores any rejoins received later.
Add remoteRecovered() to data distribution for remote team collection.
Found in tests: a move keys conflict exception was not handled because no one
waited on the Future object. As a result, the data distributor did not die, and
the database check could not get the metric and kept retrying until timeout.
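The general pattern, sketched with illustrative names: every background Future
whose errors matter must eventually be waited on, for example through an
actorCollection, so that a thrown move keys conflict propagates instead of being
dropped.

    ACTOR Future<Void> dataDistribution(Reference<AsyncVar<ServerDBInfo>> db) {
        state PromiseStream<Future<Void>> addActor;
        state Future<Void> collection = actorCollection(addActor.getFuture());

        addActor.send(someMoveKeysWork(db));        // hypothetical background work

        try {
            wait(collection);                       // errors from any background actor surface here
        } catch (Error& e) {
            TraceEvent("DataDistributionDied").error(e);
            throw;                                  // let the caller tear down / restart the distributor
        }
        return Void();
    }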