Commit Graph

381 Commits

Author SHA1 Message Date
Jingyu Zhou 0259a243ae Switch DC if log router peek becomes stuck
Try a different DC if this happens.
2023-03-06 17:41:56 -08:00
Zhe Wu 087d37d10b Add event for txn server initialization and a warning for TLog slow catching up 2023-01-11 10:02:06 -08:00
sfc-gh-tclinkenbeard 3c97f43138 Change Histogram::Unit::microseconds to milliseconds 2022-11-21 08:03:56 -08:00
sfc-gh-tclinkenbeard c9968f5c0c Mark several methods of ILogSystem, IPeekCursor and LogPushData const 2022-11-15 14:57:32 -08:00
Zhe Wu 32bc9b6ebb Fix a race condition between batched peek and pop, where the server removal pop may be lost 2022-11-04 15:05:37 -07:00
Lukas Joswiak 9d3c3b1efe Remove cluster ID logic from individual roles
The logic to determine the validity of a process joining a cluster now
belongs on the worker and the cluster controller. It is no longer
restricted to tlogs and storages, but instead applies to all processes
(even stateless ones).
2022-10-27 13:56:13 -07:00
Jingyu Zhou a8391caf23 Revert "Data loss protection v2" 2022-10-20 18:09:58 -05:00
Lukas Joswiak 72bc89cf39 Remove cluster ID logic from individual roles
The logic to determine the validity of a process joining a cluster now
belongs on the worker and the cluster controller. It is no longer
restricted to tlogs and storages, but instead applies to all processes
(even stateless ones).
2022-10-18 21:37:42 -07:00
sfc-gh-tclinkenbeard 82adc1e856 Make g_simulator a pointer 2022-09-15 09:00:33 -07:00
Zhe Wu bd99f4fa3b Log tlog initialization 2022-07-21 13:54:44 -07:00
Markus Pilman 1de37afd52
Make TEST macros C++ only (#7558)
* proof of concept

* use code-probe instead of test

* code probe working on gcc

* code probe implemented

* renamed TestProbe to CodeProbe

* fixed refactoring typo

* support filtered output

* print probes at end of simulation

* fix missed probes print

* fix deduplication

* Fix refactoring issues

* revert bad refactor

* make sure file paths are relative

* fix more wrong refactor changes
2022-07-19 13:15:51 -07:00
Jingyu Zhou b2fded5c51 CC sends recovery txn version during TLog recruitment
This simplifies the logic for TLog to wait for recovery txn before replying
back to peeks.
2022-05-24 14:57:55 -07:00
Zhe Wu cb73352e36 Don't pop every generation of old log router 2022-05-24 08:47:57 -07:00
Ray Jenkins dc9e782ccc
OpenTelemetry Tracing Perf Fixes (#6990) 2022-05-02 14:56:51 -05:00
Evan Tschannen a825eb8a8c fix: when more tlogs are absent than the replication factor we would access invalid memory 2022-04-27 16:53:30 -07:00
Ray Jenkins 1c5bf135d5
Revert "Migrate to OpenTelemetry tracing. (#6855)" (#6941)
This reverts commit 5df3bac110.
2022-04-25 09:29:56 -05:00
Ray Jenkins 5df3bac110
Migrate to OpenTelemetry tracing. (#6855) 2022-04-20 09:26:37 -05:00
Jingyu Zhou cfcf0f152c Merge branch 'main-4a085fc84' into vv
Fix Conflicts:
	fdbclient/NativeAPI.actor.cpp
	fdbserver/ClusterRecovery.actor.cpp
	fdbserver/MasterInterface.h
	fdbserver/masterserver.actor.cpp
	flow/error_definitions.h
2022-03-30 22:28:06 -07:00
Dan Lambright f867474b05 Respond to Jingyu's comments 3/24 2022-03-25 10:50:41 -04:00
Dan Lambright 12e88a8ef5 Error in previous commit 2022-03-23 14:16:45 -04:00
Dan Lambright f23f451cc4 Fix bug computing tlog count per log group 2022-03-22 14:12:26 -04:00
sfc-gh-tclinkenbeard a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
Dan Lambright d69aa8ae92 retain tlog count per log group, add fix dropped in previous rebase 2022-03-21 15:08:13 -04:00
Dan Lambright 6e507b8c07 refactor unicast recovery 2022-03-17 12:25:50 -04:00
Dan Lambright b529801407 Respond to Jingyu's comments 2022-03-15 19:17:54 -04:00
Dan Lambright de73fc03dc fix recovery algorithm 2022-03-14 08:59:58 -04:00
Dan Lambright 9544379cdf rebase 2022-01-20 11:12:33 -05:00
Dan Lambright adc9055097 Do not restart recovery unless min DV of recovered tlog set goes down 2022-01-11 12:52:05 -05:00
Ata E Husain Bohra 936bf5336a
Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine" (#6191)
* Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine""

Major changes include:
1. Re-revert Sequencer refactor commits listed below (in listed order):
1.a. This reverts commit bb17e194d9.
1.b. This reverts commit d174bb2e06.
1.c. This reverts commit 30b05b469c.

2. Update Status.actor to use the ClusterController interface to track
   recovery status.
3. Introduce a ServerKnob to define "cluster recovery trace event"
   prefix; for now keeping it as "Master", however, it should allow
   smooth transition to "Cluster" prefix as it seems more appropriate.
2022-01-06 12:15:51 -08:00
Dan Lambright 49e89571fa Set recoverAt to max(all tlogs rv) for recovered (crashed) tLogs in UNICAST mode. 2022-01-04 12:27:20 -05:00
Aaron Molitor 30b05b469c Revert "Refactor: ClusterController driving cluster-recovery state machine"
This reverts commit dfe9d184ff.
2021-12-24 11:25:51 -08:00
Ata E Husain Bohra dfe9d184ff Refactor: ClusterController driving cluster-recovery state machine
At present, the cluster recovery process consists of the following steps:
1. ClusterController clusterWatchDatabase actor recruits
   master/sequencer process.
2. Sequencer process implements the cluster recovery state machine,
   responsible for recruiting all other processes as well as restoring
   the cluster state.

This patch proposes a scheme where the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.

Advantages of the scheme could be:
1. Simplified design where the ClusterController recruits the "sequencer"
   process like other worker processes, compared to the current scheme
   where the "sequencer" process gets special treatment. In the newer
   scheme the sequencer is responsible for maintaining/providing the
   "committed version" (as expected).
2. The ClusterController is already responsible for recruiting worker
   processes; in the current scheme the sequencer, though orchestrating
   the recovery state machine, still needs to reach out to the
   ClusterController to recruit worker processes etc.

NOTE:
This patch moves the recovery state machine code from the
'sequencer' to the 'cluster-controller' process; necessary
updates were also made for both functionality and performance
improvement reasons.

Next Steps:
Cluster recovery documentation will be updated in the near future.
2021-12-22 14:06:27 -08:00
Dan Lambright 792d7d288d address review comments 2021-12-19 12:50:59 -05:00
Dan Lambright 0222d8669d fix simulation failures 2021-12-10 09:56:21 -05:00
Dan Lambright faef404279 system rv is max of tlog's rv 2021-11-15 09:42:01 -05:00
Dan Lambright 4979ccb889 commits recovered if written to every tlog minus failure tolerance. 2021-11-12 12:10:04 -05:00
Dan Lambright 0f99ad582b first cut unicast recovery 2021-11-10 12:31:16 -05:00
Lukas Joswiak 3988b11fd6 Cleanup 2021-11-09 12:29:48 -08:00
Lukas Joswiak 3e2c65bb11 Allow tlog to join another cluster but retain its data 2021-11-09 12:29:48 -08:00
Lukas Joswiak 30867750b5 Add protection against storage and tlog data deletion when joining a new cluster 2021-11-09 12:29:47 -08:00
Dan Lambright 58e1888d8e remove network hop by getting previous commit versions in GetCommitVersionRequest 2021-09-30 11:51:57 -04:00
Sreenath Bodagala 184c134b8a - Resolve merge conflicts 2021-09-17 20:25:16 +00:00
Sreenath Bodagala 2aa3b44d4e Merge remote-tracking branch 'apple-upstream/master' into version-vector-prototype
- Conflicts:
	fdbserver/LogSystem.h
	fdbserver/LogSystemConfig.h
	fdbserver/TagPartitionedLogSystem.actor.cpp

- Files modified during merge:

modified:   fdbserver/LogSystem.cpp
modified:   fdbserver/LogSystemConfig.cpp
2021-09-17 19:36:18 +00:00
Xiaoge Su abf73047ca Enforce std:: specifier rather than using namespace 2021-09-16 19:40:28 -07:00
Xiaoge Su c32c3b6ec4 fixup! Reformat the code per github's requirement 2021-09-12 14:17:19 -07:00
Xiaoge Su 40648dbb31 fixup! Update code per comment
Also fix the issue that TagPartitionedLogSystem.actor.cpp should include
TagPartitionedLogSystem.actor.h
2021-09-12 14:17:19 -07:00
Xiaoge Su ecca4edeb4 Create TagPartitionedLogSystem.actor.h
TagPartitionedLogSystem.actor.h contains the struct of TagPartitionedLogSystem.
2021-09-12 14:17:19 -07:00
Sreenath Bodagala a081c0baa5 Merge remote-tracking branch 'apple-upstream/master' into version-vector-prototype 2021-08-05 22:40:32 +00:00
yao-xiao-github 8609b45354
Add histograms to CommitProxyServer. (#5299) 2021-08-05 09:17:37 -07:00
Andrew Noyes 353efe7db2
Merge pull request #5264 from sfc-gh-tclinkenbeard/fix-more-clang-warnings
Enable more warnings for `clang`
2021-07-29 15:43:54 -07:00