Commit Graph

342 Commits

Author SHA1 Message Date
Ata E Husain Bohra 936bf5336a
Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine" (#6191)
* Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine""

Major changes includes:
1. Re-revert Sequencer refactor commits listed below (in listed order):
1.a. This reverts commit bb17e194d9.
1.b. This reverts commit d174bb2e06.
1.c. This reverts commit 30b05b469c.

2. Update Status.actor to track ClusterController interface to track
   recovery status.
3. Introduce a ServerKnob to define "cluster recovery trace event"
   prefix; for now keeping it as "Master", however, it should allow
   smooth transition to "Cluster" prefix as it seems more appropriate.
2022-01-06 12:15:51 -08:00
Aaron Molitor 30b05b469c Revert "Refactor: ClusterController driving cluster-recovery state machine"
This reverts commit dfe9d184ff.
2021-12-24 11:25:51 -08:00
Ata E Husain Bohra dfe9d184ff Refactor: ClusterController driving cluster-recovery state machine
At present, cluster recovery process consists of following steps:
1. ClusterController clusterWatchDatabase actor recruits
   master/sequencer process.
2. Sequencer process implements the cluster recovery state machine,
   responsible to recruit all other processes as well restore the
   cluster state.

Patch proposes a scheme where the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.

Advantages of the scheme could be:
1. Simplified design where ClusterController recruits "sequencer"
   process like other worker processes compared to current scheme
   where "sequencer" process gets special treatment. In newer scheme
   sequencer is responsible for maintaining/providing
   "committed version" (as expected).
2. ClusterController is responsible for worker processes recruitment,
   the sequencer though orchestrating the recovery state machine, it
   need to reachout to the ClusterController for recruiting worker
   processes etc.

NOTE:
Patch has moved the recovery state machine code from
'sequencer' -> 'cluster-controller' process, however, necessary
updates were done for both functionality as well as performance
improvement reasons.

Next Steps:
Cluster recovery documentation will be updated in near future.
2021-12-22 14:06:27 -08:00
Lukas Joswiak 3988b11fd6 Cleanup 2021-11-09 12:29:48 -08:00
Lukas Joswiak 3e2c65bb11 Allow tlog to join another cluster but retain its data 2021-11-09 12:29:48 -08:00
Lukas Joswiak 30867750b5 Add protection against storage and tlog data deletion when joining a new cluster 2021-11-09 12:29:47 -08:00
Xiaoge Su abf73047ca Enforce std:: specifier rather than using namespace 2021-09-16 19:40:28 -07:00
Xiaoge Su c32c3b6ec4 fixup! Reformat the code per github's requirement 2021-09-12 14:17:19 -07:00
Xiaoge Su 40648dbb31 fixup! Update code per comment
Also fix the issue that TagPartitionedLogSystem.actor.cpp should include
TagPartitionedLogSystem.actor.h
2021-09-12 14:17:19 -07:00
Xiaoge Su ecca4edeb4 Create TagPartitionedLogSystem.actor.h
TagPartitionedLogSystem.actor.h contains the struct of TagPartitionedLogSystem.
2021-09-12 14:17:19 -07:00
yao-xiao-github 8609b45354
Add histograms to CommitProxyServer. (#5299) 2021-08-05 09:17:37 -07:00
Andrew Noyes 353efe7db2
Merge pull request #5264 from sfc-gh-tclinkenbeard/fix-more-clang-warnings
Enable more warnings for `clang`
2021-07-29 15:43:54 -07:00
sfc-gh-tclinkenbeard 94a65865d9 Merge remote-tracking branch 'origin/master' into fix-clang-warnings 2021-07-28 12:29:27 -07:00
sfc-gh-tclinkenbeard c74047c665 Merge remote-tracking branch 'origin/master' into fix-more-clang-warnings 2021-07-28 11:51:02 -07:00
Steve Atherton 507c1f11e3 Add .log() to bare TraceEvent() invocations without any .detail()s to avoid clang-tidy warning about immediate destruction of object without use. 2021-07-26 19:55:10 -07:00
sfc-gh-tclinkenbeard 64dc1dc185 Fix -Wreorder-ctor warnings in NativeAPI.actor.cpp and several other files 2021-07-24 00:23:06 -07:00
sfc-gh-tclinkenbeard e62e6503ac Fix most delete-non-virtual-dtor clang warnings 2021-07-21 23:32:44 -07:00
Lukas Joswiak 5338251946 Fix invalid read 2021-07-13 10:44:37 -07:00
Jingyu Zhou c700feaa6e Address Dan's comments 2021-06-28 11:16:35 -07:00
sfc-gh-tclinkenbeard 371a38e6e5 Merge remote-tracking branch 'origin/master' into remove-extra-copies 2021-06-07 10:26:06 -07:00
sfc-gh-tclinkenbeard 594e8944ae Move RestoreWorkerInterface into fdbserver 2021-05-30 11:51:47 -07:00
sfc-gh-tclinkenbeard f28ac955c3 Remove unnecessary temporary objects while growing objects of type std::vector<std::pair<A, B>> 2021-05-10 16:32:50 -07:00
A.J. Beamon 25c4880ebe Merge branch 'release-6.3' into merge-release-6.3-into-master (temporarily discard all changes to BackupContainer.actor.cpp)
# Conflicts:
#	fdbclient/BackupContainer.actor.cpp
#	fdbserver/Knobs.h
2021-03-15 16:41:04 -07:00
FDB Formatster df90cc89de apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-10 10:18:07 -08:00
Vishesh Yadav 3fbe7e69fa Add TraceEvent to warn tLogs which has not joined after timeout
During recovery, wait for TLOG_SLOW_REJOIN_WARN_TIMEOUT_SECS and
log the tLogs that has not joined yet.
2021-03-09 18:57:46 -08:00
FDB Formatster 8a8c488ede apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-05 18:13:38 -06:00
Andrew Noyes 79cec09255 Apply clang-tidy's performance-inefficient-vector-operation fix
I ran this command in my build directory after compiling with
OPEN_FOR_IDE. It took a few small tweaks to get it to compile, which is
outside the scope of this commit.

    $ python run-clang-tidy.py -j $(nproc) -checks='-*,performance-inefficient-vector-operation' -fix
2021-03-04 03:58:25 +00:00
Evan Tschannen 346a4e3ecd Merge branch 'release-6.3'
# Conflicts:
#	fdbcli/fdbcli.actor.cpp
#	fdbrpc/LoadBalance.actor.h
#	fdbrpc/MultiInterface.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/masterserver.actor.cpp
2021-03-01 18:52:06 -08:00
Meng Xu 0c28e1d640 Add comment to peekLogRouter in TagPartitionedLogSystem 2021-02-22 16:20:50 -08:00
Meng Xu ef0bf2728e Merge branch 'release-6.3' into mengxu/ha-code
Resolve Conflicts:
	fdbserver/LogRouter.actor.cpp: Only conflicts at comments
2021-02-19 21:47:09 -08:00
Meng Xu 33eb1de00e Add some comment to log system
and resolve review comment by deleting my questions.
2021-02-19 21:44:13 -08:00
Meng Xu 471a3489fb Resolve review comments and add trace fields to MasterRecoveryState 2021-02-17 14:44:14 -08:00
Meng Xu 9122be4d81 Add comments to HA code and loadBalance code 2021-02-10 13:51:36 -08:00
sfc-gh-tclinkenbeard dd3669886e Improve IPeekCursor and ILogSystem method signatures 2020-12-08 09:09:31 -08:00
Andrew Noyes 877997632d Merge branch 'release-6.3' into anoyes/merge-release-6.3-master
Include conflict markers for review purposes
2020-12-04 01:38:07 +00:00
Andrew Noyes dc2bac5670 Resolve conflicts 2020-11-24 19:09:42 +00:00
Andrew Noyes 1f541f02be Merge branch 'anoyes/merge-6.2-to-6.3' into anoyes/release-6.3-merge
Merge, leaving conflict markers for now
2020-11-24 16:55:34 +00:00
Meng Xu 046a6e8427 Add Alex comment on tLog 2020-11-12 13:29:11 -08:00
sfc-gh-tclinkenbeard 4669f837fa Add uses of makeReference 2020-11-07 22:10:18 -08:00
sfc-gh-tclinkenbeard 0ff1809d25 Remove emplace_back(new ...) pattern
This pattern can cause a memory leak due to an exception between
resource allocation and management
2020-11-07 22:09:53 -08:00
Meng Xu 4788544a6f Revise comments based on review suggestions
Ack. Jingyu and Xin for their suggestions.
2020-11-06 08:51:13 -08:00
Meng Xu 063700e4d6 Add comments and questions to HA and tLog code reading
The comments' correctness need to be confirmed by reviewers.
2020-10-30 12:14:57 -07:00
Lukas Joswiak 53b7721d6c Add additional trace information 2020-09-04 15:36:47 -07:00
Lukas Joswiak 2a58e775d2 Add original changes 2020-09-04 15:36:47 -07:00
Evan Tschannen 12edadd059 Merge branch 'release-6.3'
# Conflicts:
#	CMakeLists.txt
#	fdbclient/Knobs.cpp
#	fdbclient/MasterProxyInterface.h
#	fdbrpc/simulator.h
#	fdbserver/MasterProxyServer.actor.cpp
#	tests/fast/CycleAndLock.txt
#	tests/fast/TxnStateStoreCycleTest.txt
#	tests/fast/VersionStamp.txt
#	tests/slow/ParallelRestoreOldBackupApiCorrectnessAtomicRestore.txt
#	tests/slow/ParallelRestoreOldBackupCorrectnessCycle.txt
#	versions.target
2020-08-31 19:33:34 -07:00
Evan Tschannen 29eec30183 Merge branch 'release-6.2' into release-6.3
# Conflicts:
#	CMakeLists.txt
#	build/Dockerfile
#	build/Dockerfile.devel
#	documentation/sphinx/source/downloads.rst
#	fdbserver/Knobs.cpp
#	fdbserver/LogSystem.h
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WaitFailure.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
#	packaging/msi/FDBInstaller.wxs
2020-08-31 01:10:29 -07:00
Evan Tschannen 507c67c930 Added additional information to trace events 2020-08-26 11:42:23 -07:00
Evan Tschannen 28cb5f242c another fix 2020-08-26 11:01:40 -07:00
Evan Tschannen e81ccd2dc9 another compiler fix 2020-08-26 10:59:06 -07:00
Evan Tschannen e531046b53 fix compiler errors 2020-08-26 10:56:21 -07:00