foundationdb

Commit Graph

Author	SHA1	Message	Date
Xiaoxi Wang	d25fc4db34	add ASSERT_WE_THINK	2022-04-07 09:21:50 -07:00
Xiaoxi Wang	20fee3dd06	check pseudo locality before pop	2022-04-05 23:48:18 -07:00
sfc-gh-tclinkenbeard	a71099471b	Update copyright header dates	2022-03-21 13:36:23 -07:00
A.J. Beamon	250a88e682	Enforce that trace event suppression calls happen first when using trace event call chaining. Fix various instances where we weren't following this requirement.	2022-02-24 12:25:52 -08:00
Zhe Wu	e07ae6fdb9	Address comments	2022-02-16 15:28:56 -08:00
Zhe Wu	9da735c38e	Batch empty peek reply	2022-02-16 15:28:56 -08:00
Ata E Husain Bohra	936bf5336a	Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine" (#6191) * Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine"" Major changes includes: 1. Re-revert Sequencer refactor commits listed below (in listed order): 1.a. This reverts commit `bb17e194d9`. 1.b. This reverts commit `d174bb2e06`. 1.c. This reverts commit `30b05b469c`. 2. Update Status.actor to track ClusterController interface to track recovery status. 3. Introduce a ServerKnob to define "cluster recovery trace event" prefix; for now keeping it as "Master", however, it should allow smooth transition to "Cluster" prefix as it seems more appropriate.	2022-01-06 12:15:51 -08:00
Aaron Molitor	30b05b469c	Revert "Refactor: ClusterController driving cluster-recovery state machine" This reverts commit `dfe9d184ff`.	2021-12-24 11:25:51 -08:00
Aaron Molitor	d174bb2e06	Revert "Refactor: ClusterController driving cluster-recovery state machine" This reverts commit `abd2959702`.	2021-12-24 11:25:51 -08:00
Ata E Husain Bohra	abd2959702	Refactor: ClusterController driving cluster-recovery state machine diff-1: Address Jingyu's review comments At present, cluster recovery process consists of following steps: 1. ClusterController clusterWatchDatabase actor recruits master/sequencer process. 2. Sequencer process implements the cluster recovery state machine, responsible to recruit all other processes as well restore the cluster state. Patch proposes a scheme where the cluster recovery state machine is implemented and driven by the ClusterController process instead of the Sequencer process. Advantages of the scheme could be: 1. Simplified design where ClusterController recruits "sequencer" process like other worker processes compared to current scheme where "sequencer" process gets special treatment. In newer scheme sequencer is responsible for maintaining/providing "committed version" (as expected). 2. ClusterController is responsible for worker processes recruitment, the sequencer though orchestrating the recovery state machine, it need to reachout to the ClusterController for recruiting worker processes etc. NOTE: Patch has moved the recovery state machine code from 'sequencer' -> 'cluster-controller' process, however, necessary updates were done for both functionality as well as performance improvement reasons. Next Steps: Cluster recovery documentation will be updated in near future.	2021-12-22 14:06:27 -08:00
Ata E Husain Bohra	dfe9d184ff	Refactor: ClusterController driving cluster-recovery state machine At present, cluster recovery process consists of following steps: 1. ClusterController clusterWatchDatabase actor recruits master/sequencer process. 2. Sequencer process implements the cluster recovery state machine, responsible to recruit all other processes as well restore the cluster state. Patch proposes a scheme where the cluster recovery state machine is implemented and driven by the ClusterController process instead of the Sequencer process. Advantages of the scheme could be: 1. Simplified design where ClusterController recruits "sequencer" process like other worker processes compared to current scheme where "sequencer" process gets special treatment. In newer scheme sequencer is responsible for maintaining/providing "committed version" (as expected). 2. ClusterController is responsible for worker processes recruitment, the sequencer though orchestrating the recovery state machine, it need to reachout to the ClusterController for recruiting worker processes etc. NOTE: Patch has moved the recovery state machine code from 'sequencer' -> 'cluster-controller' process, however, necessary updates were done for both functionality as well as performance improvement reasons. Next Steps: Cluster recovery documentation will be updated in near future.	2021-12-22 14:06:27 -08:00
Evan Tschannen	e3819dad7c	fix: If a removed tlog never attempted a queue commit, the update storage loop could get stuck waiting for queueCommittingVersion to advance	2021-11-25 09:55:01 -08:00
Evan Tschannen	964d0209ca	Merge pull request #5637 from sfc-gh-ljoswiak/features/data-loss-prevention Data loss protection when joining new cluster	2021-11-15 15:26:32 -08:00
Lukas Joswiak	e4c3f886da	Fix recovery issue	2021-11-10 16:15:13 -08:00
Lukas Joswiak	15e0d5b29f	Add explicit transaction options when reading cluster ID	2021-11-09 12:29:49 -08:00
Lukas Joswiak	74cf64fe0f	Sync cluster ID through ServerDBInfo	2021-11-09 12:29:48 -08:00
Lukas Joswiak	4640045243	Fix rare simulation failures When partitions appear before a cluster has fully recovered, it was possible to have different tlogs persist different cluster IDs because they were involved in different partitions. This would affect recovery when a quorum was eventually reached. The solution to this is to avoid persisting the cluster ID before a cluster has fully recovered, to make sure all nodes agree on the cluster ID.	2021-11-09 12:29:48 -08:00
Lukas Joswiak	3988b11fd6	Cleanup	2021-11-09 12:29:48 -08:00
Lukas Joswiak	aa3383f0e3	Exclude when joining new cluster	2021-11-09 12:29:48 -08:00
Lukas Joswiak	3e2c65bb11	Allow tlog to join another cluster but retain its data	2021-11-09 12:29:48 -08:00
Lukas Joswiak	30867750b5	Add protection against storage and tlog data deletion when joining a new cluster	2021-11-09 12:29:47 -08:00
Markus Pilman	7df059570a	Make sure unit tests are run often enough	2021-11-08 15:43:32 -07:00
Evan Tschannen	c615279807	Merge pull request #5720 from sfc-gh-ljoswiak/fixes/recovery-failure-fix Fix possible recovery hang	2021-10-25 12:35:31 -07:00
Evan Tschannen	f1158371a7	Merge branch 'master' of https://github.com/apple/foundationdb into feature-range-feed # Conflicts: # flow/error_definitions.h	2021-10-21 00:55:12 -07:00
Lukas Joswiak	120d99e941	Fix a recovery hang that could occur when a new recovery was started during the existing recovery	2021-10-19 17:37:14 -07:00
sfc-gh-tclinkenbeard	9e06b6e6e3	Make IClosable interface const-correct	2021-10-18 13:40:47 -07:00
Evan Tschannen	5c642f706e	Merge branch 'master' of https://github.com/apple/foundationdb into feature-range-feed # Conflicts: # fdbcli/fdbcli.actor.cpp	2021-10-09 19:34:16 -07:00
Xiaoge Su	abf73047ca	Enforce std:: specifier rather than using namespace	2021-09-16 19:40:28 -07:00
Xiaoge Su	067c1cc55b	Extract methods in LogSystem.h to corresponding cpp file	2021-09-12 14:17:19 -07:00
Evan Tschannen	ac5b580e2d	Merge branch 'master' into feature-range-feed # Conflicts: # fdbcli/fdbcli.actor.cpp # fdbclient/StorageServerInterface.cpp # fdbclient/StorageServerInterface.h # fdbserver/ApplyMetadataMutation.cpp # fdbserver/TLogServer.actor.cpp # flow/error_definitions.h	2021-09-09 23:13:22 -07:00
Steve Atherton	deeb6b3404	Merge branch 'master' of https://github.com/apple/foundationdb into durability-bug-repro1 # Conflicts: # fdbserver/TLogServer.actor.cpp	2021-08-24 16:19:16 -07:00
Steve Atherton	ec0e39b40f	Bug fix: Popped versions are exclusive, so after recovery a tag for which there is no longer data should be considered popped up until the version after recovery, indicating that data at the recovery version itself has been popped.	2021-08-24 15:16:20 -07:00
Xiaoxi Wang	a97570bd06	solve mis-spelling, trace log and format problems	2021-08-11 18:26:00 -07:00
Xiaoxi Wang	1f6cee89ab	merge master, fix conflicts	2021-08-10 10:01:45 -07:00
Steve Atherton	c73e861074	Move role UIDs for MutationTracking TraceEvents from various inconsistent detail fields into the TraceEvent UID field.	2021-08-10 01:59:28 -07:00
Steve Atherton	54c7036eaf	Move role UIDs for MutationTracking TraceEvents from various inconsistent detail fields into the TraceEvent UID field.	2021-08-10 01:52:36 -07:00
Evan Tschannen	208a5790ad	fixed usage of durable version	2021-08-09 21:58:44 -07:00
Evan Tschannen	ed28aecde0	Merge branch 'master' into feature-range-feed	2021-08-09 20:40:55 -07:00
Evan Tschannen	bc9a0e1315	first attempt to add data distribution support for range feeds	2021-08-09 10:05:56 -07:00
Xiaoxi Wang	2263626cdc	200k test clean: enable remote Log pull from LogRouter	2021-08-07 09:53:32 -07:00
Xiaoxi Wang	2df0474fec	merge master	2021-08-02 11:58:35 -07:00
Xiaoxi Wang	ae2268f9f2	200k simulation: check stream sequence; delay in GetMore loop	2021-08-02 10:52:24 -07:00
Xiaoxi Wang	2a88033800	clean 100k simulation test. revert changes of fdbrpc.h	2021-07-31 16:46:14 -07:00
Xiaoxi Wang	1c4bce17aa	revert code refactor	2021-07-30 19:08:22 -07:00
Xiaoxi Wang	10c82b422f	merge master branch	2021-07-28 14:19:46 -07:00
Xiaoxi Wang	12d4f5c261	disable streaming peek for localities < 0	2021-07-28 14:11:25 -07:00
sfc-gh-tclinkenbeard	c74047c665	Merge remote-tracking branch 'origin/master' into fix-more-clang-warnings	2021-07-28 11:51:02 -07:00
Steve Atherton	507c1f11e3	Add .log() to bare TraceEvent() invocations without any .detail()s to avoid clang-tidy warning about immediate destruction of object without use.	2021-07-26 19:55:10 -07:00
Xiaoxi Wang	c6b0de1264	problem: OOM	2021-07-26 09:36:53 -07:00
sfc-gh-tclinkenbeard	23558a5430	Fix -Wreorder-ctor warnings in TLogServer.actor.cpp	2021-07-24 23:15:22 -07:00

576 Commits