foundationdb

Commit Graph

Author	SHA1	Message	Date
sfc-gh-tclinkenbeard	0814841827	Replace NULL with nullptr in fdbserver	2020-09-20 11:31:49 -07:00
Meng Xu	ef8c1060a2	Merge branch 'master' into mengxu/tmp-merge-6.3	2020-07-13 10:15:56 -07:00
A.J. Beamon	b09dddc07e	Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3 # Conflicts: # cmake/ConfigureCompiler.cmake # documentation/sphinx/source/downloads.rst # fdbrpc/FlowTransport.actor.cpp # fdbrpc/fdbrpc.vcxproj # fdbserver/DataDistributionQueue.actor.cpp # fdbserver/Knobs.cpp # fdbserver/Knobs.h # fdbserver/LogSystemPeekCursor.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/Status.actor.cpp # fdbserver/storageserver.actor.cpp # flow/flow.vcxproj	2020-07-10 15:06:34 -07:00
sfc-gh-tclinkenbeard	99bf993815	Replace BOOST_NOEXCEPT with noexcept	2020-06-09 22:39:19 -07:00
Evan Tschannen	ced65cd30b	finished explicitly versioning everything stored in the database	2020-05-22 17:14:21 -07:00
Evan Tschannen	8fd926e08e	serialize old tlog entries with old protocol versions to support downgrades	2020-05-22 14:00:07 -07:00
Markus Pilman	c2bc75516f	Merge branch 'release-6.3' of github.com:apple/foundationdb into features/trace-roles	2020-05-14 10:34:53 -07:00
Alex Miller	f148412a32	Make UPDATE_STORAGE_BYTE_LIMIT the reference spill variety. Which is unrelated, but a change I was supposed to do a while ago and forgot.	2020-05-12 16:59:20 -07:00
Markus Pilman	5f9b127e56	Emit traces regularly about role assignment We are currently emitting Role transition traces when a role starts and when it ends. While this is useful for debugging, it doesn't work well with tools that inject data and might potentially miss some trace lines. We do decorate each trace lines with the roles assigned to that particular process, however, this is not sufficient for tools that can make use of the UID -> Role mapping	2020-05-08 16:27:57 -07:00
Evan Tschannen	7cebe743f9	A number of bug fixes of rare correctness errors	2020-04-29 13:50:13 -07:00
Evan Tschannen	c87aa33941	Merge branch 'release-6.2' # Conflicts: # CMakeLists.txt # bindings/go/src/fdb/generated.go # documentation/sphinx/source/api-common.rst.inc # documentation/sphinx/source/api-ruby.rst # documentation/sphinx/source/release-notes.rst # fdbclient/FailureMonitorClient.actor.cpp # fdbclient/NativeAPI.actor.cpp # fdbclient/vexillographer/fdb.options # fdbrpc/FlowTransport.actor.cpp # fdbserver/OldTLogServer_6_0.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/fdbserver.actor.cpp # versions.target	2020-04-23 13:47:53 -07:00
Evan Tschannen	96258b9809	Merge branch 'release-6.2' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbcli/fdbcli.actor.cpp # fdbclient/ManagementAPI.actor.cpp # fdbrpc/FlowTransport.actor.cpp # fdbserver/ClusterController.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/DataDistribution.actor.h # fdbserver/DataDistributionQueue.actor.cpp # fdbserver/KeyValueStoreMemory.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/QuietDatabase.actor.cpp # fdbserver/SkipList.cpp # fdbserver/StorageMetrics.actor.h # fdbserver/TLogServer.actor.cpp # fdbserver/fdbserver.actor.cpp # fdbserver/storageserver.actor.cpp # fdbserver/workloads/KVStoreTest.actor.cpp # flow/CMakeLists.txt # flow/Knobs.cpp # flow/Knobs.h # flow/genericactors.actor.cpp # flow/serialize.h	2020-02-21 19:09:16 -08:00
Evan Tschannen	6c0b934dda	Merge pull request #2242 from alexmiller-apple/fix-10min-stall-again Fix the 10min multi-region recovery stall again	2020-01-23 17:53:02 -08:00
Jingyu Zhou	8b67a89eed	More review comments fixed.	2020-01-22 19:42:13 -08:00
Jingyu Zhou	56f40a978e	Backport changes to OldTLogServer_6_2	2020-01-22 19:38:46 -08:00
Alex Miller	f0fe62a298	TLogs should not respond with data earlier than the begin version Parallel peek more code would prefer the begin version it was sent by the previous parallel peek over the request's begin version. This means that a merge cursor trying to advance past message versions would still get old data that it would have to filter out. A simple application of std::max fixes this.	2020-01-21 19:09:07 -08:00
Alex Miller	ffc3506fff	Continuing a parallel peek after a timeout would hang.	2020-01-21 17:12:18 -08:00
Alex Miller	1cb311fcb8	Add an ASSERT_WE_THINK that peek cursors don't get timed_out() This should prevent us from regressing and having multi-region recoveries hang for 10min again.	2020-01-21 17:07:37 -08:00
Alex Miller	f58507c830	Rename poppedLocationForVersion -> versionForPoppedLocation	2019-12-19 10:24:31 -08:00
Alex Miller	b98107ccab	Update fdbserver/OldTLogServer_6_2.actor.cpp	2019-12-18 11:15:18 -08:00
Alex Miller	d8cbd495af	Fix another pop + spill/dq-pop interleaving issue This fixes an issue introduced in the previous patch, where pop would immediately set `poppedLocationNeedsUpdate`, but setting the popped version was now delayed. This means that we could: 1. Run the spill loop and persist all popped versions 2. Receive a pop, and set the poppedLocationNeedsUpdate flag 3. Run the dq-pop loop, and clear the poppedLocationNeedsUpdate flag and now when we update the persistentPopped version again, we won't have the flag set for dq-pop to know that it needs to scan the spilled data again for the minLocation. We could more carefully update the flag, but instead, I've just converted it into a version that's kept in sync purely in the dq-pop loop, to remove shared state between pop and the dq-pop loop.	2019-12-17 23:15:48 -08:00
Alex Miller	b36062a509	DiskQueue should only pop based off of persisted popped tag versions This commit is to fix a bug where popping a tag between updatePersistentData and popDiskQueue can cause the TLog to recover to an incorrect understanding of what data it has available. The following series of events need to happen to trigger this bug: Tag 1:1 is popped to version 10 updatePersistentData is run... updatePersistentPopped runs and we persistentData stores 1:1 as popped to 10 A mutation is spilled for 1:1 at version 11 at location 1000 A mutation is spilled for 1:1 at version 21 at location 5000 updatePersistentData finishes and commits the btree changes Tag 1:1 is popped to version 20 popDiskQueue runs The btree is read for spilled mutations with version >=20 The minimum location required for the disk queue is found to be location 5000 The disk queue is popped to location 5000 The TLog crashes The worker restarts, and reloads the TLog files from disk restorePersistentPopped restores tag 1:1 as having been popped to version 10 Parallel peeks are received for tag 1:1 starting at version 0 The first peek is less than the popped version, so we respond with no data, and an end version of 10 The second peek starts at version 10, which is greater than the popped version The btree is read for spilled mutations, and we find that there is a mutation at version 11 at location 1000 Location 1000 is read in the DiskQueue The resulting page read at Location 1000 was popped pre-crash, and thus might either (a) be corrupt or (b) have an incorrect sequence number. The fix to this is to force popDiskQueue/updatePoppedLocation to use the popped version that was persisted to disk, and not the most recently popped version for the given tag. This bug doesn't manifest in simulation, because we don't have any code that peeks at a lower version than what has been popped.	2019-12-17 23:02:37 -08:00
negoyal	b6f35c573e	Forward declare tLogPop in 6_2.	2019-11-20 10:43:24 -08:00
negoyal	2c227a7049	Missing cacheTag pop changes in OldTLogServer 6_2 version.	2019-11-19 17:41:48 -08:00
sramamoorthy	c9097cca18	deprecate isTLogInSameNode used by snapshot V1	2019-10-09 15:33:11 -07:00
Alex Miller	77c72de176	Comment variable and code style fix Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>	2019-10-07 18:08:27 -07:00
Alex Miller	1d8a7e5af7	Spill SharedTLog when there's more than one. When switching between spill_type or log_version, a new instance of a SharedTLog is created in the transaction log processes. If this is done in a saturated database, then doubling the amount of memory to hold mutations in memory can cause TLogs to be uncomfortably close to the 8GB OOM limit. Instead, we now thread which UID of a SharedTLog is active, and the other TLog spill out the majority of their mutations.	2019-10-07 18:08:27 -07:00
Alex Miller	9401a6941a	Code review nits const correctness and file renaming in comment. Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>	2019-10-03 15:53:39 -07:00
Alex Miller	60fb04ca68	Fork TLogServer into TLogServer_6_2 This prepares us for incoming modifications to the TLog that can't easily coexist with our current on-disk state.	2019-10-03 01:41:25 -07:00

29 Commits
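
The sequence of events in commit b36062a509 can be sketched as follows. This is a minimal illustration, not the TLog code: `SpilledIndex`, `TagState`, `minRequiredLocation`, `buggyPopTarget`, and `fixedPopTarget` are hypothetical names introduced here, and a `std::map` stands in for the btree of spilled mutations. The versions and locations are the ones given in the commit message.

```cpp
#include <cstdint>
#include <map>

using Version = std::int64_t;
using Location = std::int64_t;

// Hypothetical stand-in for the btree of spilled mutations of one tag:
// mutation version -> DiskQueue location.
using SpilledIndex = std::map<Version, Location>;

// Sentinel returned when no spilled mutation is at or above the popped version.
const Location kNoLocation = -1;

// Smallest DiskQueue location still referenced by a spilled mutation
// with version >= popped.
Location minRequiredLocation(const SpilledIndex& spilled, Version popped) {
    Location minLoc = kNoLocation;
    for (auto it = spilled.lower_bound(popped); it != spilled.end(); ++it) {
        if (minLoc == kNoLocation || it->second < minLoc) {
            minLoc = it->second;
        }
    }
    return minLoc;
}

// Hypothetical per-tag state: the most recent pop received, and the
// popped version that has actually been committed to disk.
struct TagState {
    Version inMemoryPopped;
    Version persistedPopped;
};

// Behavior described as the bug: the DiskQueue pop target is derived
// from the most recently popped version.
Location buggyPopTarget(const SpilledIndex& spilled, const TagState& tag) {
    return minRequiredLocation(spilled, tag.inMemoryPopped);
}

// Behavior described as the fix: the pop target is derived from the
// popped version that was persisted to disk.
Location fixedPopTarget(const SpilledIndex& spilled, const TagState& tag) {
    return minRequiredLocation(spilled, tag.persistedPopped);
}
```

With mutations spilled at version 11 (location 1000) and version 21 (location 5000), an in-memory popped version of 20, and a persisted popped version of 10, the first function yields location 5000 and the second yields location 1000. Popping the disk queue only to 1000 keeps the page that a restarted TLog, restored with a popped version of 10, would read for the mutation at version 11.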