Aaron Molitor
30b05b469c
Revert "Refactor: ClusterController driving cluster-recovery state machine"
This reverts commit dfe9d184ff.
2021-12-24 11:25:51 -08:00
Aaron Molitor
d174bb2e06
Revert "Refactor: ClusterController driving cluster-recovery state machine"
This reverts commit abd2959702.
2021-12-24 11:25:51 -08:00
Ata E Husain Bohra
abd2959702
Refactor: ClusterController driving cluster-recovery state machine
diff-1: Address Jingyu's review comments
At present, the cluster recovery process consists of the following steps:
1. The ClusterController clusterWatchDatabase actor recruits the
master/sequencer process.
2. The Sequencer process implements the cluster recovery state machine,
which is responsible for recruiting all other processes as well as
restoring the cluster state.
This patch proposes a scheme where the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.
Advantages of the scheme could be:
1. A simplified design where the ClusterController recruits the "sequencer"
process like other worker processes, compared to the current scheme
where the "sequencer" process gets special treatment. In the newer scheme
the sequencer is responsible for maintaining/providing the
"committed version" (as expected).
2. The ClusterController is responsible for recruiting worker processes;
the sequencer, though orchestrating the recovery state machine,
needs to reach out to the ClusterController to recruit worker
processes, etc.
NOTE:
The patch has moved the recovery state machine code from the
'sequencer' to the 'cluster-controller' process; however, necessary
updates were made for both functionality and performance
improvement reasons.
Next Steps:
Cluster recovery documentation will be updated in the near future.
2021-12-22 14:06:27 -08:00
Ata E Husain Bohra
dfe9d184ff
Refactor: ClusterController driving cluster-recovery state machine
At present, the cluster recovery process consists of the following steps:
1. The ClusterController clusterWatchDatabase actor recruits the
master/sequencer process.
2. The Sequencer process implements the cluster recovery state machine,
which is responsible for recruiting all other processes as well as
restoring the cluster state.
This patch proposes a scheme where the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.
Advantages of the scheme could be:
1. A simplified design where the ClusterController recruits the "sequencer"
process like other worker processes, compared to the current scheme
where the "sequencer" process gets special treatment. In the newer scheme
the sequencer is responsible for maintaining/providing the
"committed version" (as expected).
2. The ClusterController is responsible for recruiting worker processes;
the sequencer, though orchestrating the recovery state machine,
needs to reach out to the ClusterController to recruit worker
processes, etc.
NOTE:
The patch has moved the recovery state machine code from the
'sequencer' to the 'cluster-controller' process; however, necessary
updates were made for both functionality and performance
improvement reasons.
Next Steps:
Cluster recovery documentation will be updated in the near future.
2021-12-22 14:06:27 -08:00
Dan Lambright
9f4ac866cd
Avoid context switch between appending version list and updating dv
Port PR 6117 (Resolver saves shardChanged in recent state transactions)
2021-12-13 13:02:32 -05:00
Dan Lambright
0222d8669d
fix simulation failures
2021-12-10 09:56:21 -05:00
Evan Tschannen
e3819dad7c
fix: If a removed tlog never attempted a queue commit, the update storage loop could get stuck waiting for queueCommittingVersion to advance
2021-11-25 09:55:01 -08:00
Evan Tschannen
964d0209ca
Merge pull request #5637 from sfc-gh-ljoswiak/features/data-loss-prevention
Data loss protection when joining new cluster
2021-11-15 15:26:32 -08:00
Dan Lambright
4979ccb889
commits recovered if written to every tlog minus failure tolerance.
2021-11-12 12:10:04 -05:00
Lukas Joswiak
e4c3f886da
Fix recovery issue
2021-11-10 16:15:13 -08:00
Dan Lambright
0f99ad582b
first cut unicast recovery
2021-11-10 12:31:16 -05:00
Sreenath Bodagala
1ec238b8b4
- Address a review comment
2021-11-09 20:46:42 +00:00
Lukas Joswiak
15e0d5b29f
Add explicit transaction options when reading cluster ID
2021-11-09 12:29:49 -08:00
Lukas Joswiak
74cf64fe0f
Sync cluster ID through ServerDBInfo
2021-11-09 12:29:48 -08:00
Lukas Joswiak
4640045243
Fix rare simulation failures
When partitions appear before a cluster has fully recovered, it was
possible to have different tlogs persist different cluster IDs because
they were involved in different partitions. This would affect recovery
when a quorum was eventually reached. The solution to this is to avoid
persisting the cluster ID before a cluster has fully recovered, to make
sure all nodes agree on the cluster ID.
2021-11-09 12:29:48 -08:00
Lukas Joswiak
3988b11fd6
Cleanup
2021-11-09 12:29:48 -08:00
Lukas Joswiak
aa3383f0e3
Exclude when joining new cluster
2021-11-09 12:29:48 -08:00
Lukas Joswiak
3e2c65bb11
Allow tlog to join another cluster but retain its data
2021-11-09 12:29:48 -08:00
Lukas Joswiak
30867750b5
Add protection against storage and tlog data deletion when joining a new cluster
2021-11-09 12:29:47 -08:00
Sreenath Bodagala
26ac1529fa
- Unblock any waiting peeks before stopping a tlog.
2021-11-09 17:22:50 +00:00
Markus Pilman
7df059570a
Make sure unit tests are run often enough
2021-11-08 15:43:32 -07:00
Dan Lambright
05a1419ba0
Fix corner-case where poppedVersion races with wait on new mutations in tLog
2021-11-03 11:32:31 -04:00
Dan Lambright
befe1993c4
fix conflict on rebase
2021-10-29 12:25:26 -04:00
Sreenath Bodagala
2bf54fda90
- Address review comments
2021-10-28 20:06:11 +00:00
Sreenath Bodagala
4503b0a347
- Capture metrics about empty/non-empty peeks done by storage servers
2021-10-26 14:37:46 +00:00
Evan Tschannen
c615279807
Merge pull request #5720 from sfc-gh-ljoswiak/fixes/recovery-failure-fix
Fix possible recovery hang
2021-10-25 12:35:31 -07:00
Evan Tschannen
f1158371a7
Merge branch 'master' of https://github.com/apple/foundationdb into feature-range-feed
# Conflicts:
# flow/error_definitions.h
2021-10-21 00:55:12 -07:00
Lukas Joswiak
120d99e941
Fix a recovery hang that could occur when a new recovery was started during the existing recovery
2021-10-19 17:37:14 -07:00
sfc-gh-tclinkenbeard
9e06b6e6e3
Make IClosable interface const-correct
2021-10-18 13:40:47 -07:00
Dan Lambright
23062b892e
Calculate tpcv on resolvers
2021-10-15 16:40:00 -04:00
Dan Lambright
f099bb2574
comments on this PR's change
2021-10-15 15:08:25 -04:00
Dan Lambright
15dc5a3e41
wake waiters when data made durable
2021-10-15 10:58:48 -04:00
Evan Tschannen
5c642f706e
Merge branch 'master' of https://github.com/apple/foundationdb into feature-range-feed
# Conflicts:
# fdbcli/fdbcli.actor.cpp
2021-10-09 19:34:16 -07:00
Dan Lambright
58e1888d8e
remove network hop by getting previous commit versions in GetCommitVersionRequest
2021-09-30 11:51:57 -04:00
Sreenath Bodagala
2aa3b44d4e
Merge remote-tracking branch 'apple-upstream/master' into version-vector-prototype
- Conflicts:
fdbserver/LogSystem.h
fdbserver/LogSystemConfig.h
fdbserver/TagPartitionedLogSystem.actor.cpp
- Files modified during merge:
modified: fdbserver/LogSystem.cpp
modified: fdbserver/LogSystemConfig.cpp
2021-09-17 19:36:18 +00:00
Xiaoge Su
abf73047ca
Enforce std:: specifier rather than using namespace
2021-09-16 19:40:28 -07:00
Xiaoge Su
067c1cc55b
Extract methods in LogSystem.h to corresponding cpp file
2021-09-12 14:17:19 -07:00
Evan Tschannen
ac5b580e2d
Merge branch 'master' into feature-range-feed
# Conflicts:
# fdbcli/fdbcli.actor.cpp
# fdbclient/StorageServerInterface.cpp
# fdbclient/StorageServerInterface.h
# fdbserver/ApplyMetadataMutation.cpp
# fdbserver/TLogServer.actor.cpp
# flow/error_definitions.h
2021-09-09 23:13:22 -07:00
Dan Lambright
d8d64ecc6f
Add TODO
2021-09-09 12:47:00 -04:00
Dan Lambright
ea748f3273
Add latency metrics for blocking peek
2021-09-08 09:50:01 -04:00
Dan Lambright
8689e1f106
merge with master
2021-08-30 15:29:08 -04:00
Steve Atherton
deeb6b3404
Merge branch 'master' of https://github.com/apple/foundationdb into durability-bug-repro1
# Conflicts:
# fdbserver/TLogServer.actor.cpp
2021-08-24 16:19:16 -07:00
Steve Atherton
ec0e39b40f
Bug fix: Popped versions are exclusive, so after recovery a tag for which there is no longer data should be considered popped up until the version *after* recovery, indicating that data at the recovery version itself has been popped.
2021-08-24 15:16:20 -07:00
Sreenath Bodagala
7c269b5225
- Address a bug
2021-08-17 14:40:00 +00:00
Xiaoxi Wang
a97570bd06
solve mis-spelling, trace log and format problems
2021-08-11 18:26:00 -07:00
Sreenath Bodagala
cec744cebf
- Address the following issues:
- Sequencer should update the version vector once for a given commit
version (irrespective of the number of times that it receives and
processes the ReportRawCommittedVersionRequest message for that commit
version). Issue found by simulation tests.
- Storage server should take both its latest commit version and the
read version into account while processing a read request. This is to
address transaction_too_old error that we saw while running tests with
mako (and also in YCSB tests).
- Do not enable the tlog blocking-peek logic if ENABLE_VERSION_VECTOR
flag is set to false.
2021-08-10 19:47:18 +00:00
Xiaoxi Wang
1f6cee89ab
merge master, fix conflicts
2021-08-10 10:01:45 -07:00
Steve Atherton
c73e861074
Move role UIDs for MutationTracking TraceEvents from various inconsistent detail fields into the TraceEvent UID field.
2021-08-10 01:59:28 -07:00
Steve Atherton
54c7036eaf
Move role UIDs for MutationTracking TraceEvents from various inconsistent detail fields into the TraceEvent UID field.
2021-08-10 01:52:36 -07:00
Evan Tschannen
208a5790ad
fixed usage of durable version
2021-08-09 21:58:44 -07:00
Evan Tschannen
ed28aecde0
Merge branch 'master' into feature-range-feed
2021-08-09 20:40:55 -07:00
Evan Tschannen
bc9a0e1315
first attempt to add data distribution support for range feeds
2021-08-09 10:05:56 -07:00
Xiaoxi Wang
2263626cdc
200k test clean: enable remote Log pull from LogRouter
2021-08-07 09:53:32 -07:00
Sreenath Bodagala
1758c92683
- Pull changes related to tlog-peeks from the version indexer branch
Pull commits 5e37bc37a0 and 95e85aaffb from the version indexer branch.
2021-08-06 14:42:35 +00:00
Sreenath Bodagala
a081c0baa5
Merge remote-tracking branch 'apple-upstream/master' into version-vector-prototype
2021-08-05 22:40:32 +00:00
Xiaoxi Wang
2df0474fec
merge master
2021-08-02 11:58:35 -07:00
Xiaoxi Wang
ae2268f9f2
200k simulation: check stream sequence; delay in GetMore loop
2021-08-02 10:52:24 -07:00
Xiaoxi Wang
2a88033800
clean 100k simulation test. revert changes of fdbrpc.h
2021-07-31 16:46:14 -07:00
Xiaoxi Wang
1c4bce17aa
revert code refactor
2021-07-30 19:08:22 -07:00
Xiaoxi Wang
10c82b422f
merge master branch
2021-07-28 14:19:46 -07:00
Xiaoxi Wang
12d4f5c261
disable streaming peek for localities < 0
2021-07-28 14:11:25 -07:00
sfc-gh-tclinkenbeard
c74047c665
Merge remote-tracking branch 'origin/master' into fix-more-clang-warnings
2021-07-28 11:51:02 -07:00
Steve Atherton
507c1f11e3
Add .log() to bare TraceEvent() invocations without any .detail()s to avoid clang-tidy warning about immediate destruction of object without use.
2021-07-26 19:55:10 -07:00
Xiaoxi Wang
c6b0de1264
problem: OOM
2021-07-26 09:36:53 -07:00
sfc-gh-tclinkenbeard
23558a5430
Fix -Wreorder-ctor warnings in TLogServer.actor.cpp
2021-07-24 23:15:22 -07:00
sfc-gh-tclinkenbeard
b9a22a61ef
Fix many -Wreorder-ctor warnings
2021-07-23 17:33:18 -07:00
Xiaoxi Wang
bfebd4e812
Merge branch 'master' of https://github.com/apple/foundationdb into tlog_dev
2021-07-22 16:15:07 -07:00
Xiaoxi Wang
cd32478b52
memory error(Simple config)
2021-07-22 15:45:59 -07:00
Xiaoxi Wang
1057835e8b
merge with master
2021-07-20 17:09:34 -07:00
Xiaoxi Wang
5046ee3b07
add stream peek to logRouter
2021-07-20 17:42:00 +00:00
sfc-gh-tclinkenbeard
6f81155784
Merge remote-tracking branch 'origin/master' into const-serverdbinfo
2021-07-20 10:18:40 -07:00
Xiaoxi Wang
f3667ce91a
more debug logs; let tryEstablishStream wait until the connection is good
2021-07-19 18:43:51 +00:00
Steve Atherton
f596a81073
Rename ::TRUE and ::FALSE in BooleanParams to ::True and ::False so as to not conflict with the TRUE and FALSE macros provided by the Windows and MacOS SDKs.
2021-07-17 00:11:40 -07:00
Xiaoxi Wang
227570357a
trace log and reset changes; byteAcknownledge overflow
2021-07-15 21:30:14 +00:00
Sreenath Bodagala
5f504d2148
- Block a peek request on a tlog until the tlog has a commit version
that is relevant to the requester
Code extracted from https://github.com/apple/foundationdb/pull/5058
2021-07-15 19:49:20 +00:00
Xiaoxi Wang
1584ed5853
Merge branch 'master' of https://github.com/apple/foundationdb into tlog_dev
2021-07-14 16:20:19 +00:00
Xiaoxi Wang
066d534194
trivial changes
2021-07-14 16:19:23 +00:00
sfc-gh-tclinkenbeard
84f6b55e6c
Prevent tLog from modifying ServerDBInfo object
2021-07-11 23:29:36 -07:00
Xiaoxi Wang
6d1c12899d
catch exceptions
2021-07-09 22:46:16 +00:00
Xiaoxi Wang
5a43a8c367
add returnIfBlocked in stream request
2021-07-08 19:32:58 +00:00
sfc-gh-tclinkenbeard
020371a78f
Merge remote-tracking branch 'origin/master' into add-boolean-param
2021-07-07 16:50:51 -07:00
Zhe Wang
cc10c9aee2
clean up add_trace_event_to_tLog_pop
2021-07-07 14:14:59 -05:00
Zhe Wang
b82a3f4276
add trace event to tLog pop
2021-07-07 12:13:49 -05:00
Xiaoxi Wang
b6d5c8a091
implement tLogPeekStream
2021-07-06 23:14:58 +00:00
Xiaoxi Wang
9948b9d4ef
refactor TLog Peek code
2021-07-05 00:14:27 +00:00
sfc-gh-tclinkenbeard
8cc40e3a2b
Expand use of BOOLEAN_PARAM
2021-07-02 21:41:50 -07:00
sfc-gh-tclinkenbeard
79ff07a071
Added *BOOLEAN_PARAM macros to enforce documentation of boolean parameters
2021-07-02 15:04:42 -07:00
Xiaoxi Wang
b50fda6b4b
add simple streaming peek functions
2021-07-01 23:17:28 +00:00
Xiaoxi Wang
ae3542f8ab
add stream struct in Tlog
2021-06-29 17:06:09 +00:00
Evan Tschannen
fcb8bd6475
Revert "Make the sim2 run loop match the behavior of the net2 run loop."
2021-06-22 14:50:01 -07:00
Evan Tschannen
154332a94b
Merge branch 'master' of https://github.com/apple/foundationdb into feature-sim-time-batching
# Conflicts:
# fdbserver/VersionedBTree.actor.cpp
2021-06-22 09:37:40 -07:00
Zhe Wang
ae7b93dcce
add epoch info to trace events when tLog begins
2021-06-09 19:14:36 -05:00
Evan Tschannen
801f147551
properly handle io_errors from the destructor of LogData
2021-05-20 18:23:11 -07:00
Evan Tschannen
cc18022e7d
small clang format finds
2021-05-20 16:45:08 -07:00
Evan Tschannen
f57f0d64f4
Merge branch 'master' into feature-sim-time-batching
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
2021-05-20 09:09:35 -07:00
Evan Tschannen
907248dcd4
fixed a rare simulation bug where missingFinalCommit could be skipped by two successive logSystem changes
2021-05-19 13:26:01 -07:00
Lukas Joswiak
e7d7b39f12
Merge pull request #4744 from sfc-gh-tclinkenbeard/add-rangeresult-type-alias
Create RangeResult type alias
2021-05-03 16:29:33 -07:00
sfc-gh-tclinkenbeard
5c2d7b6080
Create RangeResult type alias
2021-05-03 13:14:16 -07:00
Steve Atherton
cbd77fe6f3
Added new StorageBytes member to StorageMetrics and TLogMetrics (for newest TLog version only). Moved StorageBytes detail from SpecialCounters to the traceCounters() decorator callback to avoid calling getStorageBytes(), which makes a system call, four extra times on storage servers and eight extra times on logs.
2021-04-08 01:09:47 -07:00
Evan Tschannen
0554a05fc2
typo
2021-03-19 13:19:26 -07:00