687 Commits

Author SHA1 Message Date
Steve Atherton c1b2519b9c Remove additional calls to KVS init in TLogServer in favor of a single call at startup. 2022-10-06 13:14:24 -07:00
Steve Atherton f2dbbcce40 Allow overlapping commits in Redwood in which the caller drops the commit futures. Call IKVS::init() in TLogServer. 2022-10-06 12:48:06 -07:00
Ata E Husain Bohra 201eac77cf
Init TLogPersistent storage for a sharedTLog (#8363)
* Init TLogPersistent storage for a sharedTLog

Description

  diff-1: Address review comments

Patch updates the code to initialize TLogPersistent storage for a
shared TLog independent of initializing persistentState for a
versioned TLog data. Approach allows initializing TLog persistent
storage as well as writing 'persistFormat' key for a shared TLog
earlier in the TLog creation lifecycle.

Testing

devRunCorrectness - 100K
2022-09-30 14:06:53 -07:00
sfc-gh-tclinkenbeard 985958c260 Add rare code probe decoration 2022-09-25 15:28:32 -07:00
Lukas Joswiak c97377f48e
Remove durable cluster ID write (#8238) 2022-09-20 07:56:07 -07:00
A.J. Beamon 4fd64630e8 Convert literal string ref instances to use _sr suffix 2022-09-19 11:35:58 -07:00
Evan Tschannen f5161c362e
fix: persistentData->commit() was not protected by the persistentDataCommitLock (#8200)
* fix: persistentData->commit() was not protected by the persistentDataCommitLock, meaning it is possible for inconsistent data to be made durable on the tlog

* fixed a compilation error
2022-09-16 09:47:20 -07:00
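
The fix above (f5161c362e) concerns a commit that ran without holding the commit lock. The following is a minimal Python sketch of the general idea only, not the FoundationDB code: `PersistentStore`, `update_and_commit`, and `run_demo` are hypothetical names, and `asyncio.Lock` stands in for persistentDataCommitLock.

```python
import asyncio


class PersistentStore:
    """Hypothetical stand-in for a TLog's persistent key-value store."""

    def __init__(self):
        self.pending = {}
        self.durable = {}

    async def commit(self):
        snapshot = dict(self.pending)
        await asyncio.sleep(0)  # simulated disk I/O; other tasks may run here
        self.durable = snapshot


async def update_and_commit(store, lock, generation):
    # Hold the lock across both the two-key update and the commit so that
    # a concurrent commit can never make a half-applied update durable.
    async with lock:
        store.pending["version"] = generation
        await asyncio.sleep(0)  # yield point between the two related writes
        store.pending["data"] = generation
        await store.commit()


async def run_demo():
    store = PersistentStore()
    lock = asyncio.Lock()
    await asyncio.gather(*(update_and_commit(store, lock, g) for g in range(1, 4)))
    return store.durable
```

In this sketch the durable snapshot always has matching "version" and "data" values, because no task can commit while another task is between its two writes.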
sfc-gh-tclinkenbeard 82adc1e856 Make g_simulator a pointer 2022-09-15 09:00:33 -07:00
Josh Slocum fdb509c99e
2 TLog stopped bug fixes - one setting stop from dbinfo, the other handling a race between peek and stopping (#8001) 2022-08-29 15:25:29 -05:00
Dan Adkins 4317e528ab
Guard initPersistentState() calls with timeout. (#7649)
We've observed the recovery process stuck in initPersistentState,
while waiting to acquire persistentDataCommitLock. All of the
other places in that function which potentially interact with a
disk are guarded by a timeout: TLOG_MAX_CREATE_DURATION.

Since it's possible that the current holder of that lock is
stuck in persistentData->commit(), it makes sense to add
a timeout around the entire function, rather than each of the
places where it might get stuck on an I/O operation.

The end result is that after 10 seconds, this process will fail
and cluster recovery will restart.
2022-08-15 10:02:03 -07:00
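
The reasoning in 4317e528ab (one timeout around the whole function rather than one per I/O step) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Flow implementation: `init_persistent_state`, `guarded_init`, and `max_create_duration` are hypothetical stand-ins for initPersistentState() and TLOG_MAX_CREATE_DURATION.

```python
import asyncio


async def init_persistent_state(lock, disk_delay):
    # Hypothetical multi-step initializer: lock acquisition plus a disk write.
    async with lock:
        await asyncio.sleep(disk_delay)  # simulated I/O
    return "initialized"


async def guarded_init(lock, disk_delay, max_create_duration):
    # One timeout around the whole function covers the lock wait and every
    # I/O step, instead of wrapping each step individually.
    try:
        return await asyncio.wait_for(
            init_persistent_state(lock, disk_delay), timeout=max_create_duration
        )
    except asyncio.TimeoutError:
        return "timed_out"  # a real caller would fail and let recovery restart


async def run_demo():
    free_lock = asyncio.Lock()
    ok = await guarded_init(free_lock, 0.0, 0.5)

    held_lock = asyncio.Lock()
    await held_lock.acquire()  # simulate a lock holder stuck in commit()
    stuck = await guarded_init(held_lock, 0.0, 0.05)
    return ok, stuck
```

The second call in `run_demo` never acquires the lock, yet it still returns once the timeout elapses, which is the behavior the commit message describes.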
Zhe Wu 410ad5ff18 TLog track unpopped recovery tag 2022-07-25 20:10:18 -07:00
Zhe Wu aa2740e21e Increase AllTags field length in TLogReady 2022-07-20 13:17:21 -07:00
Markus Pilman 1de37afd52
Make TEST macros C++ only (#7558)
* proof of concept

* use code-probe instead of test

* code probe working on gcc

* code probe implemented

* renamed TestProbe to CodeProbe

* fixed refactoring typo

* support filtered output

* print probes at end of simulation

* fix missed probes print

* fix deduplication

* Fix refactoring issues

* revert bad refactor

* make sure file paths are relative

* fix more wrong refactor changes
2022-07-19 13:15:51 -07:00
Markus Pilman 4ece33a0a8
Merge pull request #7445 from sfc-gh-anoyes/anoyes/fix-ubsan
Fix UBSAN build when statically linking libcxx
2022-07-11 17:27:37 -06:00
Andrew Noyes 55548e4ac8 Avoid signed integer overflow 2022-07-07 10:19:20 -07:00
A.J. Beamon c4b0f6eaae
Add an internal C API to support connection to a cluster using a connection string (#7438)
* Add an internal C API to support memory connection records

* Track shared state in the client using a unique and immutable cluster ID from the cluster

* Add missing code to store the clusterId in the database state object

* Update some arguments to pass by const&
2022-07-07 10:12:49 +02:00
Yi Wu 364644673f
Support TLog encryption in commit proxy (#6942)
This PR adds support for TLog encryption through the commit proxy. The encryption is done on a per-mutation basis. As the CP writes mutations to the TLog, it inserts an encryption header alongside the encrypted mutations. Storage servers (and other consumers of the TLog such as storage cache and backup worker) decrypt the mutations as they peek the TLog.
2022-06-29 14:21:05 -07:00
Chaoguang Lin 29f98f3654
Avoid duplicate snapshot on one process if it serves as multiple roles (#7294)
* Fix comments

* Add simulation value for SERVER_KNOBS->SNAP_CREATE_MAX_TIMEOUT

* A working version with correctness clean

* Remove unnecessary comments; debugging symbols

* Only check secondary address for coordinators, same as before

* Change the trace to SevError and remove the ASSERT(false)

* Remove TLogSnapRequest handling on TlogServer, which is changed to use WorkerSnapRequest

* Add retry for network failures

* Add retry limit for network failures; still allow duplicate snapshots on processes that are both tlog and storage to avoid race

* Add retry limit as a knob and make backoff exponential

* Add getDatabaseConfiguration(Transaction* tr)

* revert back to send request for each role once

* update some comments
2022-06-29 11:23:07 -07:00
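
Commit 29f98f3654 mentions a retry limit and exponential backoff for network failures. A generic sketch of that pattern is shown below. It is an assumption-laden illustration, not the snapshot code: the function names, the `multiplier` default, and the use of `ConnectionError` are all hypothetical.

```python
def backoff_delays(initial_delay, max_retries, multiplier=2.0):
    """Return the delay before each retry; all parameter names are hypothetical."""
    return [initial_delay * multiplier ** attempt for attempt in range(max_retries)]


def call_with_retries(operation, max_retries, initial_delay, sleep):
    # Attempt the operation; after each ConnectionError wait for the next
    # (exponentially growing) delay, up to max_retries waits in total.
    for delay in backoff_delays(initial_delay, max_retries):
        try:
            return operation()
        except ConnectionError:
            sleep(delay)
    return operation()  # final attempt; any error propagates to the caller
```

Passing `sleep` as a parameter keeps the sketch testable; a real caller would pass something like `time.sleep`.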
Andrew Noyes 1f8fc32f41
Save a memcpy in the tlog peek path (#7328) 2022-06-07 13:22:56 -07:00
Jingyu Zhou ae5818afa8
Merge pull request #7240 from jzhou77/fix-7109
CC sends recovery txn version during TLog recruitment
2022-05-27 09:27:19 -07:00
Jingyu Zhou b2fded5c51 CC sends recovery txn version during TLog recruitment
This simplifies the logic for TLog to wait for recovery txn before replying
back to peeks.
2022-05-24 14:57:55 -07:00
Andrew Noyes 53882ef741 Revert most logic in #5637 2022-05-24 12:23:49 -07:00
Xiaoxi Wang 5a431980d2 Merge branch 'main' of https://github.com/apple/foundationdb into features/debug-macro 2022-05-20 12:18:20 -07:00
Xiaoxi Wang 6c11fc74ba add debug traces 2022-05-18 15:20:23 -07:00
Dan Lambright 8f884be4f5 To not block peeks during recovery in version vector. 2022-05-10 12:53:54 -04:00
Jingyu Zhou 05e63bc703
Fix orphaned storage server due to force recovery (#6914)
* Fix orphaned storage server due to force recovery

The force recovery can roll back the transaction that adds a storage server.
However, the storage server may now be at version B > A, the recovery version.
As a result, its peek to the buddy TLog won't return TLogPeekReply::popped to
trigger its exit, and instead gets a higher version C > B back. To the
storage server, this means the message is empty, so it does not remove itself
and keeps peeking.

The fix is, instead of using the recovery version as the popped version for the
SS, to use the recovery transaction version, which is the first transaction
after the recovery. Force recovery bumps this version to a much higher version
than the SS's version. So the TLog would set TLogPeekReply::popped to trigger
the storage server exit.

* Fix tlog peek to disallow returning an empty message between recoveredAt and recovery txn version

This contract today is not explicitly set and can cause the storage server to fail
with assertion "rollbackVersion >= data->storageVersion()". This is because if
such an empty version is returned, SS may advance its storage version to a
value larger than the rollback version set in the recovery transaction.

The fix is to block peek reply until recovery transaction has been received.

* Move recoveryTxnReceived to be per LogData

This is because a shared TLog can have a first generation TLog which is already
setting the promise, thus later generations won't wait for the recovery version.
For the current generation, all peeks need to wait, while for older generations,
there is no need to wait (by checking if they are stopped).

* For initial commit, poppedVersion needs to be at least 2

To get rid of the previous unsuccessful recovery's recruited seed
storage servers.
2022-05-02 17:17:37 -07:00
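
The first part of 05e63bc703 turns on which version the TLog treats as popped for the orphaned storage server. The sketch below is a deliberately simplified model with hypothetical names and made-up version numbers. It assumes a peek reports "popped" whenever the popped version exceeds the requested begin version, which is an assumption about the peek contract rather than a quote of the TLog code.

```python
def peek(begin_version, popped_version, next_message_version):
    # Hypothetical, simplified peek: if the requested range was already popped,
    # say so; otherwise report the version of the next available message.
    if popped_version > begin_version:
        return {"popped": popped_version}
    return {"popped": None, "end": next_message_version}


def storage_server_should_exit(reply):
    # In this model a storage server removes itself only on a popped reply.
    return reply["popped"] is not None


recovery_version = 100            # A in the commit message
storage_version = 150             # B > A
recovery_txn_version = 1_000_000  # bumped far ahead by force recovery

# Before the fix: popped = A < B, so the peek looks empty and the SS keeps peeking.
before = peek(storage_version, recovery_version, 200)
# After the fix: popped = recovery txn version > B, so the SS is told to exit.
after = peek(storage_version, recovery_txn_version, 200)
```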
Evan Tschannen 442d2b34c7
fix: pops which were ignored during a snapshot would not be replayed on the proper tlogs within a shared tlog (#6892) 2022-04-19 16:57:41 -07:00
Dan Lambright e43fde16ec formatting 2022-04-08 17:28:16 -04:00
Dan Lambright 62975f87d1 Formatting 2022-04-08 15:04:46 -04:00
Dan Lambright 5bdc525353
Merge branch 'main' into vv 2022-04-08 13:16:04 -04:00
Xiaoxi Wang d25fc4db34 add ASSERT_WE_THINK 2022-04-07 09:21:50 -07:00
Xiaoxi Wang 20fee3dd06 check pseudo locality before pop 2022-04-05 23:48:18 -07:00
Jingyu Zhou cfcf0f152c Merge branch 'main-4a085fc84' into vv
Fix Conflicts:
	fdbclient/NativeAPI.actor.cpp
	fdbserver/ClusterRecovery.actor.cpp
	fdbserver/MasterInterface.h
	fdbserver/masterserver.actor.cpp
	flow/error_definitions.h
2022-03-30 22:28:06 -07:00
Jingyu Zhou e9659b5dd4 Merge branch 'master-PR-6500' into vv
Fix Conflicts:
	fdbclient/CommitProxyInterface.h
	fdbclient/NativeAPI.actor.cpp
	fdbserver/masterserver.actor.cpp
2022-03-30 14:53:49 -07:00
sfc-gh-tclinkenbeard a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
Dan Lambright 2bbace3c89 Fix tLogServer.actor.cpp 2022-02-25 16:35:24 -05:00
A.J. Beamon 250a88e682 Enforce that trace event suppression calls happen first when using trace event call chaining. Fix various instances where we weren't following this requirement. 2022-02-24 12:25:52 -08:00
Dan Lambright 9e5f6d8214 Fix clang format 2022-02-24 12:33:25 -05:00
Dan Lambright 8cc9a5af1a Rebase 02/23 2022-02-23 14:23:28 -05:00
Zhe Wu e07ae6fdb9 Address comments 2022-02-16 15:28:56 -08:00
Zhe Wu 9da735c38e Batch empty peek reply 2022-02-16 15:28:56 -08:00
Dan Lambright 9544379cdf rebase 2022-01-20 11:12:33 -05:00
Dan Lambright 1b0a1ac221 Do not recover different versions for the same key across tLogs 2022-01-12 13:27:53 -05:00
Ata E Husain Bohra 936bf5336a
Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine" (#6191)
* Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine""

Major changes include:
1. Re-revert Sequencer refactor commits listed below (in listed order):
1.a. This reverts commit bb17e194d9.
1.b. This reverts commit d174bb2e06.
1.c. This reverts commit 30b05b469c.

2. Update Status.actor to track ClusterController interface to track
   recovery status.
3. Introduce a ServerKnob to define "cluster recovery trace event"
   prefix; for now keeping it as "Master", however, it should allow
   smooth transition to "Cluster" prefix as it seems more appropriate.
2022-01-06 12:15:51 -08:00
Dan Lambright 49e89571fa Set recoverAt to max(all tlogs rv) for recovered (crashed) tLogs in UNICAST mode. 2022-01-04 12:27:20 -05:00
Aaron Molitor 30b05b469c Revert "Refactor: ClusterController driving cluster-recovery state machine"
This reverts commit dfe9d184ff.
2021-12-24 11:25:51 -08:00
Aaron Molitor d174bb2e06 Revert "Refactor: ClusterController driving cluster-recovery state machine"
This reverts commit abd2959702.
2021-12-24 11:25:51 -08:00
Ata E Husain Bohra abd2959702 Refactor: ClusterController driving cluster-recovery state machine
diff-1: Address Jingyu's review comments

At present, the cluster recovery process consists of the following steps:
1. ClusterController clusterWatchDatabase actor recruits
   master/sequencer process.
2. Sequencer process implements the cluster recovery state machine,
   responsible to recruit all other processes as well as restore the
   cluster state.

Patch proposes a scheme where the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.

Advantages of the scheme could be:
1. Simplified design where ClusterController recruits "sequencer"
   process like other worker processes compared to current scheme
   where "sequencer" process gets special treatment. In newer scheme
   sequencer is responsible for maintaining/providing
   "committed version" (as expected).
2. ClusterController is responsible for worker process recruitment;
   the sequencer, though orchestrating the recovery state machine,
   needs to reach out to the ClusterController for recruiting worker
   processes etc.

NOTE:
Patch has moved the recovery state machine code from
'sequencer' -> 'cluster-controller' process, however, necessary
updates were done for both functionality as well as performance
improvement reasons.

Next Steps:
Cluster recovery documentation will be updated in near future.
2021-12-22 14:06:27 -08:00
Ata E Husain Bohra dfe9d184ff Refactor: ClusterController driving cluster-recovery state machine
At present, the cluster recovery process consists of the following steps:
1. ClusterController clusterWatchDatabase actor recruits
   master/sequencer process.
2. Sequencer process implements the cluster recovery state machine,
   responsible to recruit all other processes as well as restore the
   cluster state.

Patch proposes a scheme where the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.

Advantages of the scheme could be:
1. Simplified design where ClusterController recruits "sequencer"
   process like other worker processes compared to current scheme
   where "sequencer" process gets special treatment. In newer scheme
   sequencer is responsible for maintaining/providing
   "committed version" (as expected).
2. ClusterController is responsible for worker process recruitment;
   the sequencer, though orchestrating the recovery state machine,
   needs to reach out to the ClusterController for recruiting worker
   processes etc.

NOTE:
Patch has moved the recovery state machine code from
'sequencer' -> 'cluster-controller' process, however, necessary
updates were done for both functionality as well as performance
improvement reasons.

Next Steps:
Cluster recovery documentation will be updated in near future.
2021-12-22 14:06:27 -08:00
Dan Lambright 9f4ac866cd Avoid context switch between appending version list and updating dv
Port PR 6117 (Resolver saves shardChanged in recent state transactions)
2021-12-13 13:02:32 -05:00
Dan Lambright 0222d8669d fix simulation failures 2021-12-10 09:56:21 -05:00
Evan Tschannen e3819dad7c fix: If a removed tlog never attempted a queue commit, the update storage loop could get stuck waiting for queueCommittingVersion to advance 2021-11-25 09:55:01 -08:00
Evan Tschannen 964d0209ca
Merge pull request #5637 from sfc-gh-ljoswiak/features/data-loss-prevention
Data loss protection when joining new cluster
2021-11-15 15:26:32 -08:00
Dan Lambright 4979ccb889 commits recovered if written to every tlog minus failure tolerance. 2021-11-12 12:10:04 -05:00
Lukas Joswiak e4c3f886da Fix recovery issue 2021-11-10 16:15:13 -08:00
Dan Lambright 0f99ad582b first cut unicast recovery 2021-11-10 12:31:16 -05:00
Sreenath Bodagala 1ec238b8b4 - Address a review comment 2021-11-09 20:46:42 +00:00
Lukas Joswiak 15e0d5b29f Add explicit transaction options when reading cluster ID 2021-11-09 12:29:49 -08:00
Lukas Joswiak 74cf64fe0f Sync cluster ID through ServerDBInfo 2021-11-09 12:29:48 -08:00
Lukas Joswiak 4640045243 Fix rare simulation failures
When partitions appear before a cluster has fully recovered, it was
possible to have different tlogs persist different cluster IDs because
they were involved in different partitions. This would affect recovery
when a quorum was eventually reached. The solution to this is to avoid
persisting the cluster ID before a cluster has fully recovered, to make
sure all nodes agree on the cluster ID.
2021-11-09 12:29:48 -08:00
Lukas Joswiak 3988b11fd6 Cleanup 2021-11-09 12:29:48 -08:00
Lukas Joswiak aa3383f0e3 Exclude when joining new cluster 2021-11-09 12:29:48 -08:00
Lukas Joswiak 3e2c65bb11 Allow tlog to join another cluster but retain its data 2021-11-09 12:29:48 -08:00
Lukas Joswiak 30867750b5 Add protection against storage and tlog data deletion when joining a new cluster 2021-11-09 12:29:47 -08:00
Sreenath Bodagala 26ac1529fa - Unblock any waiting peeks before stopping a tlog. 2021-11-09 17:22:50 +00:00
Markus Pilman 7df059570a Make sure unit tests are run often enough 2021-11-08 15:43:32 -07:00
Dan Lambright 05a1419ba0 Fix corner-case where poppedVersion races with wait on new mutations in tLog 2021-11-03 11:32:31 -04:00
Dan Lambright befe1993c4 fix conflict on rebase 2021-10-29 12:25:26 -04:00
Sreenath Bodagala 2bf54fda90 - Address review comments 2021-10-28 20:06:11 +00:00
Sreenath Bodagala 4503b0a347 - Capture metrics about empty/non-empty peeks done by storage servers 2021-10-26 14:37:46 +00:00
Evan Tschannen c615279807
Merge pull request #5720 from sfc-gh-ljoswiak/fixes/recovery-failure-fix
Fix possible recovery hang
2021-10-25 12:35:31 -07:00
Evan Tschannen f1158371a7 Merge branch 'master' of https://github.com/apple/foundationdb into feature-range-feed
# Conflicts:
#	flow/error_definitions.h
2021-10-21 00:55:12 -07:00
Lukas Joswiak 120d99e941 Fix a recovery hang that could occur when a new recovery was started during the existing recovery 2021-10-19 17:37:14 -07:00
sfc-gh-tclinkenbeard 9e06b6e6e3 Make IClosable interface const-correct 2021-10-18 13:40:47 -07:00
Dan Lambright 23062b892e Calculate tpcv on resolvers 2021-10-15 16:40:00 -04:00
Dan Lambright f099bb2574 comments on this PR's change 2021-10-15 15:08:25 -04:00
Dan Lambright 15dc5a3e41 wake waiters when data made durable 2021-10-15 10:58:48 -04:00
Evan Tschannen 5c642f706e Merge branch 'master' of https://github.com/apple/foundationdb into feature-range-feed
# Conflicts:
#	fdbcli/fdbcli.actor.cpp
2021-10-09 19:34:16 -07:00
Dan Lambright 58e1888d8e remove network hop by getting previous commit versions in GetCommitVersionRequest 2021-09-30 11:51:57 -04:00
Sreenath Bodagala 2aa3b44d4e Merge remote-tracking branch 'apple-upstream/master' into version-vector-prototype
- Conflicts:
	fdbserver/LogSystem.h
	fdbserver/LogSystemConfig.h
	fdbserver/TagPartitionedLogSystem.actor.cpp

- Files modified during merge:

modified:   fdbserver/LogSystem.cpp
modified:   fdbserver/LogSystemConfig.cpp
2021-09-17 19:36:18 +00:00
Xiaoge Su abf73047ca Enforce std:: specifier rather than using namespace 2021-09-16 19:40:28 -07:00
Xiaoge Su 067c1cc55b Extract methods in LogSystem.h to corresponding cpp file 2021-09-12 14:17:19 -07:00
Evan Tschannen ac5b580e2d Merge branch 'master' into feature-range-feed
# Conflicts:
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/StorageServerInterface.cpp
#	fdbclient/StorageServerInterface.h
#	fdbserver/ApplyMetadataMutation.cpp
#	fdbserver/TLogServer.actor.cpp
#	flow/error_definitions.h
2021-09-09 23:13:22 -07:00
Dan Lambright d8d64ecc6f Add TODO 2021-09-09 12:47:00 -04:00
Dan Lambright ea748f3273 Add latency metrics for blocking peek 2021-09-08 09:50:01 -04:00
Dan Lambright 8689e1f106 merge with master 2021-08-30 15:29:08 -04:00
Steve Atherton deeb6b3404 Merge branch 'master' of https://github.com/apple/foundationdb into durability-bug-repro1
# Conflicts:
#	fdbserver/TLogServer.actor.cpp
2021-08-24 16:19:16 -07:00
Steve Atherton ec0e39b40f Bug fix: Popped versions are exclusive, so after recovery a tag for which there is no longer data should be considered popped up until the version *after* recovery, indicating that data at the recovery version itself has been popped. 2021-08-24 15:16:20 -07:00
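
The exclusive-bound rule stated in ec0e39b40f can be illustrated with a small sketch. The function names are hypothetical and the model is reduced to plain integers. It only shows why the popped version for an empty tag has to be the recovery version plus one.

```python
def is_popped(version, popped_version):
    # Popped versions are exclusive: only versions strictly below are popped.
    return version < popped_version


def popped_for_empty_tag(recovery_version):
    # For a tag with no remaining data, the data at the recovery version itself
    # must also count as popped, so the bound is the version after recovery.
    return recovery_version + 1
```

With a recovery version of 500, using 500 as the popped version would leave version 500 itself looking unpopped, while `popped_for_empty_tag(500)` covers it.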
Sreenath Bodagala 7c269b5225 - Address a bug 2021-08-17 14:40:00 +00:00
Xiaoxi Wang a97570bd06 solve mis-spelling, trace log and format problems 2021-08-11 18:26:00 -07:00
Sreenath Bodagala cec744cebf - Address the following issues:
- Sequencer should update the version vector once for a given commit
version (irrespective of the number of times that it receives and
processes the ReportRawCommittedVersionRequest message for that commit
version). Issue found by simulation tests.

- Storage server should take both its latest commit version and the
read version into account while processing a read request. This is to
address transaction_too_old error that we saw while running tests with
mako (and also in YCSB tests).

- Do not enable the tlog blocking-peek logic if ENABLE_VERSION_VECTOR
flag is set to false.
2021-08-10 19:47:18 +00:00
Xiaoxi Wang 1f6cee89ab merge master, fix conflicts 2021-08-10 10:01:45 -07:00
Steve Atherton c73e861074 Move role UIDs for MutationTracking TraceEvents from various inconsistent detail fields into the TraceEvent UID field. 2021-08-10 01:59:28 -07:00
Steve Atherton 54c7036eaf Move role UIDs for MutationTracking TraceEvents from various inconsistent detail fields into the TraceEvent UID field. 2021-08-10 01:52:36 -07:00
Evan Tschannen 208a5790ad fixed usage of durable version 2021-08-09 21:58:44 -07:00
Evan Tschannen ed28aecde0 Merge branch 'master' into feature-range-feed 2021-08-09 20:40:55 -07:00
Evan Tschannen bc9a0e1315 first attempt to add data distribution support for range feeds 2021-08-09 10:05:56 -07:00
Xiaoxi Wang 2263626cdc 200k test clean: enable remote Log pull from LogRouter 2021-08-07 09:53:32 -07:00
Sreenath Bodagala 1758c92683 - Pull changes related to tlog-peeks from the version indexer branch
Pull commits 5e37bc37a0 and
95e85aaffb from the version indexer branch.
2021-08-06 14:42:35 +00:00
Sreenath Bodagala a081c0baa5 Merge remote-tracking branch 'apple-upstream/master' into version-vector-prototype 2021-08-05 22:40:32 +00:00