* Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine""
Major changes include:
1. Re-revert the Sequencer refactor commits listed below (in the listed order):
1.a. This reverts commit bb17e194d9.
1.b. This reverts commit d174bb2e06.
1.c. This reverts commit 30b05b469c.
2. Update Status.actor to use the ClusterController interface to track
recovery status.
3. Introduce a ServerKnob to define the "cluster recovery trace event"
prefix; for now it is kept as "Master", but it should allow a smooth
transition to the more appropriate "Cluster" prefix.
At present, the cluster recovery process consists of the following steps:
1. The ClusterController's clusterWatchDatabase actor recruits the
master/sequencer process.
2. The sequencer process implements the cluster recovery state machine
and is responsible for recruiting all other processes as well as
restoring the cluster state.
This patch proposes a scheme where the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.
Advantages of the scheme:
1. Simplified design: the ClusterController recruits the "sequencer"
process like any other worker process, whereas in the current scheme
the "sequencer" process gets special treatment. In the new scheme the
sequencer is responsible only for maintaining/providing the
"committed version" (as expected).
2. The ClusterController is already responsible for recruiting worker
processes; in the old scheme the sequencer, though orchestrating the
recovery state machine, had to reach out to the ClusterController to
recruit worker processes. A minimal sketch of the controller-driven
state machine follows below.
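As an illustration of the direction only (not the actual Flow actor code), here is a minimal C++ sketch of a recovery state machine driven by the controller process; the state names and the ClusterRecovery class are hypothetical, chosen to mirror the phases described above:

    // Hypothetical sketch: a controller-driven recovery state machine.
    // Names (RecoveryState, ClusterRecovery) are illustrative, not the
    // actual FoundationDB symbols.
    #include <iostream>

    enum class RecoveryState {
        ReadingCoordinatedState,
        RecruitingSequencer,   // sequencer recruited like any other worker
        RecruitingWorkers,     // tlogs, proxies, resolvers, ...
        RestoringClusterState,
        FullyRecovered
    };

    class ClusterRecovery {
    public:
        // Driven by the ClusterController rather than by the sequencer.
        void run() {
            while (state != RecoveryState::FullyRecovered)
                step();
        }

    private:
        RecoveryState state = RecoveryState::ReadingCoordinatedState;

        void step() {
            switch (state) {
            case RecoveryState::ReadingCoordinatedState:
                state = RecoveryState::RecruitingSequencer;
                break;
            case RecoveryState::RecruitingSequencer:
                // The sequencer only maintains/provides the committed version.
                state = RecoveryState::RecruitingWorkers;
                break;
            case RecoveryState::RecruitingWorkers:
                state = RecoveryState::RestoringClusterState;
                break;
            case RecoveryState::RestoringClusterState:
                state = RecoveryState::FullyRecovered;
                break;
            case RecoveryState::FullyRecovered:
                break;
            }
            std::cout << "recovery advanced to state "
                      << static_cast<int>(state) << "\n";
        }
    };

    int main() {
        ClusterRecovery{}.run();
    }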
NOTE:
The patch moves the recovery state machine code from the
'sequencer' to the 'cluster-controller' process; however, further
updates were made along the way for both functionality and
performance reasons.
Next Steps:
Cluster recovery documentation will be updated in the near future.
- Only report the commit version when it is larger than the known committed version (see the sketch after these bullets).
- Fix task priorities of [Get/Report]LiveCommittedVersion [request/reply].
- Fix some code style issues.
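A minimal sketch of the intent behind the first bullet, using hypothetical names (VersionTracker, reportCommitVersion) rather than the real proxy/master code:

    // Illustrative only: report a commit version only if it advances the
    // highest version known to be committed. Names are hypothetical.
    #include <cstdint>
    #include <iostream>

    using Version = int64_t;

    struct VersionTracker {
        Version knownCommittedVersion = 0;

        // Returns true (and updates state) only when the new commit version
        // is strictly larger than what we already know to be committed.
        bool reportCommitVersion(Version commitVersion) {
            if (commitVersion <= knownCommittedVersion)
                return false; // stale or duplicate; don't report
            knownCommittedVersion = commitVersion;
            return true;
        }
    };

    int main() {
        VersionTracker t;
        std::cout << t.reportCommitVersion(10) << " "   // 1: reported
                  << t.reportCommitVersion(7) << "\n";  // 0: ignored
    }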
Once backup workers working on old epochs finish their work, they notify
the master. The master then removes them from the log system and
acknowledges back to the backup workers so that they can shut down gracefully.
Popping for a backup worker is stalled while workers from older epochs are
still working; otherwise, workers from old epochs would lose data.
However, allowing a newer epoch to start backup can cause holes in version
ranges. The restore process must verify the backup progress to make sure
there are no holes; otherwise it has to wait.
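A rough sketch of the two checks described above, with hypothetical types and helpers (EpochProgress, canPop, hasHoles) that only approximate the real backup-worker bookkeeping:

    // Illustrative sketch, not FDB internals.
    #include <algorithm>
    #include <cstdint>
    #include <map>
    #include <vector>

    using Version = int64_t;

    // Saved progress of a backup worker: [begin, end) version range done.
    struct EpochProgress {
        Version begin, end;
        bool done;
    };

    // Popping is stalled while any older-epoch worker is unfinished, so
    // that old-epoch data is not discarded before it is backed up.
    bool canPop(const std::map<int64_t, std::vector<EpochProgress>>& progress,
                int64_t myEpoch) {
        for (const auto& [epoch, workers] : progress) {
            if (epoch >= myEpoch) break;
            for (const auto& w : workers)
                if (!w.done) return false;
        }
        return true;
    }

    // Restore must verify that backed-up ranges cover [begin, end) with no holes.
    bool hasHoles(std::vector<EpochProgress> ranges, Version begin, Version end) {
        std::sort(ranges.begin(), ranges.end(),
                  [](const EpochProgress& a, const EpochProgress& b) {
                      return a.begin < b.begin;
                  });
        Version covered = begin;
        for (const auto& r : ranges) {
            if (r.begin > covered) return true; // gap found
            covered = std::max(covered, r.end);
        }
        return covered < end;
    }

    int main() {
        std::vector<EpochProgress> done = {{0, 50, true}, {50, 100, true}};
        return hasHoles(done, 0, 100) ? 1 : 0; // 0: no holes, restore may proceed
    }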
With sharded txs tags, the master now receives data from transaction
logs at an order of magnitude higher rate. This is the intended
result of sharding the txs tag. With a sufficient number of
TLogs, the master will saturate its CPU time handling the peek
responses.
Performance tests revealed unexpected variance in how long a recovery
would take, which was eventually root caused to a priority inversion
between TLogRejoin requests and TLog peek replies.
Once peek replies saturate the CPU, the master would proceed to ignore
further TLogRejoin messages. TLogRejoin is what marks a TLog as
available to the failure monitor, which is also what decides between a
ServerPeekCursor and a MergePeekCursor for a SetPeekCursor. Ignoring
TLogRejoins meant that the sharded txs locality tags for those servers
would be merge peeked over all TLogs. This is much less efficient than
just peeking one copy of data from the one preferred server.
Depending on the race between TLogPeek replies saturating the CPU and
TLogRejoin requests being submitted, a variable number of tags would be
affected, and thus the performance test would have some variance in its
results.
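A toy illustration of the inversion (the priority values and queue are made up, not FoundationDB's TaskPriority scheduling): when peek replies outrank rejoin requests, a saturated queue never reaches the rejoins:

    // Toy model only: higher priority runs first, so a flood of peek
    // replies starves the lower-priority rejoin request.
    #include <iostream>
    #include <queue>

    struct Task {
        int priority;      // higher runs first
        const char* name;
        bool operator<(const Task& other) const { return priority < other.priority; }
    };

    int main() {
        std::priority_queue<Task> readyQueue;

        // A flood of peek replies at high priority...
        for (int i = 0; i < 5; ++i)
            readyQueue.push({20, "TLogPeekReply"});
        // ...and a rejoin request at lower priority is effectively starved:
        readyQueue.push({10, "TLogRejoinRequest"});

        // Fix: schedule the rejoin at >= the peek-reply priority so the
        // failure monitor learns the TLog is available and a cheap
        // ServerPeekCursor can be used instead of a MergePeekCursor.
        while (!readyQueue.empty()) {
            std::cout << readyQueue.top().name << "\n";
            readyQueue.pop();
        }
    }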
Let the cluster controller start a new data distributor role by sending a
message to a chosen worker.
Change MasterInterface usage in DataDistribution to masterId
Add DataDistributor rejoin handling.
This allows the data distributor to tell the new cluster controller of its
existence so that the controller doesn't spawn a new one. I.e., there should
be only ONE data distributor in the cluster.
If the DataDistributor (DD) doesn't join within a while, the ClusterController (CC)
tries to recruit a worker as the DD. The CC also monitors the DD and recruits a
replacement if it fails. The Proxy also monitors the DD; if the DD fails, the
Proxy asks the CC for the new DD.
Add a GetRecoveryInfo RPC to the master server, which the data distributor
calls to obtain the recovery transaction version from the master server.
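A hypothetical sketch of the single-DD invariant described above; the class and method names are illustrative, not the real ClusterController code:

    // Sketch only: the controller recruits a DD only when none has
    // (re)joined, and re-recruits on failure.
    #include <iostream>
    #include <optional>
    #include <string>

    struct DataDistributorInterface {
        std::string workerAddress;
    };

    class ClusterController {
    public:
        // Called when an existing DD rejoins after a CC change, so that
        // the controller doesn't spawn a second one.
        void onDistributorRejoin(const DataDistributorInterface& dd) {
            distributor = dd;
        }

        // Recruits a DD only if none is known.
        void maybeRecruitDistributor() {
            if (!distributor) {
                distributor = DataDistributorInterface{chooseWorker()};
                std::cout << "recruited DD on " << distributor->workerAddress << "\n";
            }
        }

        // Called when failure monitoring declares the DD dead.
        void onDistributorFailure() {
            distributor.reset();
            maybeRecruitDistributor();
        }

    private:
        std::optional<DataDistributorInterface> distributor;
        std::string chooseWorker() const { return "worker-1:4500"; } // placeholder
    };

    int main() {
        ClusterController cc;
        cc.maybeRecruitDistributor();  // no DD known -> recruit one
        cc.onDistributorFailure();     // DD failed   -> recruit a replacement
    }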
- This patch makes FDB listen to multiple addresses given via the
command line. Although we'll still use the first address in most places,
this patch starts using vector<NetworkAddress> in Endpoint in some basic
places.
- When sending packets to an endpoint, pick a random network address from
the endpoint's address list.
- Rename Endpoint::address to Endpoint::addresses, since it
now holds a vector of addresses.
Extend the `Endpoint` class to take multiple NetworkAddresses instead of
just one. Hence, to talk to an endpoint we'll have multiple IP:PORT
pairs instead of just one.
This patch simply adds the field and makes the changes needed to compile
the codebase. The first element of the `address` field is used everywhere,
so the way we talk to an endpoint remains the same with this patch.
NOTE:
Directly accessing the first member of Endpoint::address is unsafe,
as Endpoint() doesn't enforce a non-empty address list. However, since
the correctness tests pass for now and a follow-up will replace all those
unsafe accesses with ones that consider the whole vector, this patch
does not yet guard those accesses.
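An illustrative sketch of the shape of the change, assuming simplified NetworkAddress/Endpoint definitions that only approximate the real flow types:

    // Illustrative shape of the change only; the real flow Endpoint differs.
    #include <cstdint>
    #include <random>
    #include <string>
    #include <vector>

    struct NetworkAddress {
        std::string ip;
        uint16_t port;
    };

    struct Endpoint {
        // Was a single NetworkAddress; now a list (first entry used by default).
        std::vector<NetworkAddress> addresses;
        uint64_t token = 0;

        // Most call sites still use the primary (first) address.
        const NetworkAddress& getPrimaryAddress() const { return addresses.at(0); }

        // When sending packets, one of the addresses may be picked at random.
        const NetworkAddress& pickRandomAddress(std::mt19937& rng) const {
            std::uniform_int_distribution<size_t> dist(0, addresses.size() - 1);
            return addresses[dist(rng)];
        }
    };

    int main() {
        Endpoint ep{{{"10.0.0.1", 4500}, {"192.168.0.1", 4500}}, 42};
        std::mt19937 rng{std::random_device{}()};
        const NetworkAddress& addr = ep.pickRandomAddress(rng);
        (void)addr;
    }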
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
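As a quick illustration of the convention (reusing the commit's own foo/bar.h example):

    // In a file under foo/, both of these used to appear:
    //   #include "bar.h"      // relative to the including file's directory
    //   #include "foo/bar.h"  // rooted at the repository's include path
    // After this change, only the rooted form is used everywhere:
    #include "foo/bar.h"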
Signed-off-by: Robert Escriva <rescriva@dropbox.com>