Commit Graph

152 Commits

Author SHA1 Message Date
FDB Formatster df90cc89de apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-10 10:18:07 -08:00
A.J. Beamon aaf0a9aa7b Merge branch 'release-6.3' into merge-release-6.3-into-master
# Conflicts:
#	build/docker-compose.yaml
#	cmake/ConfigureCompiler.cmake
#	fdbclient/FileBackupAgent.actor.cpp
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/IAsyncFile.h
#	fdbrpc/IRateControl.h
#	fdbrpc/simulator.h
#	fdbserver/KeyValueStoreSQLite.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbservice/ServiceBase.cpp
2021-02-08 12:58:34 -08:00
A.J. Beamon 67e783acf8 Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
# Conflicts:
#	cmake/CompileBoost.cmake
#	cmake/FDBComponents.cmake
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/simulator.h
#	fdbserver/KeyValueStoreSQLite.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/storageserver.actor.cpp
#	flow/Knobs.h
#	flow/network.h
2021-02-08 09:20:28 -08:00
Evan Tschannen 068647ba1b updated naming and comments 2021-02-03 14:24:39 -08:00
Evan Tschannen 4f3da7c93b also kill tlogs if a commit takes too long 2021-01-29 12:18:06 -08:00
Andrew Noyes 4ee97c0784 Use clang-tidy to automatically fix missing overrides
Use `clang-tidy -p . $file -checks='-*,modernize-use-override' -header-filter='.*' -fix`
to fix missing overrides, and then use git clang-format to reformat just
those changes. This went pretty well for most files.

Formatting the following files went off the rails, so I'm going to
follow up with a commit that's just clang-tidy and no clang-format.

- fdbclient/DatabaseBackupAgent.actor.cpp
- fdbclient/FileBackupAgent.actor.cpp
- fdbserver/OldTLogServer_4_6.actor.cpp
- fdbmonitor/SimpleIni.h
- fdbserver/workloads/ClientTransactionProfileCorrectness.actor.cpp
2021-01-26 02:04:12 +00:00
Markus Pilman bdd3dbfa7d remove duplicates 2020-11-10 14:01:07 -07:00
sfc-gh-tclinkenbeard 0814841827 Replace NULL with nullptr in fdbserver 2020-09-20 11:31:49 -07:00
Evan Tschannen 12edadd059 Merge branch 'release-6.3'
# Conflicts:
#	CMakeLists.txt
#	fdbclient/Knobs.cpp
#	fdbclient/MasterProxyInterface.h
#	fdbrpc/simulator.h
#	fdbserver/MasterProxyServer.actor.cpp
#	tests/fast/CycleAndLock.txt
#	tests/fast/TxnStateStoreCycleTest.txt
#	tests/fast/VersionStamp.txt
#	tests/slow/ParallelRestoreOldBackupApiCorrectnessAtomicRestore.txt
#	tests/slow/ParallelRestoreOldBackupCorrectnessCycle.txt
#	versions.target
2020-08-31 19:33:34 -07:00
Evan Tschannen 29eec30183 Merge branch 'release-6.2' into release-6.3
# Conflicts:
#	CMakeLists.txt
#	build/Dockerfile
#	build/Dockerfile.devel
#	documentation/sphinx/source/downloads.rst
#	fdbserver/Knobs.cpp
#	fdbserver/LogSystem.h
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WaitFailure.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
#	packaging/msi/FDBInstaller.wxs
2020-08-31 01:10:29 -07:00
Evan Tschannen ce1139e588 added missing dumpToken trace events 2020-08-27 17:17:27 -07:00
Meng Xu ef8c1060a2 Merge branch 'master' into mengxu/tmp-merge-6.3 2020-07-13 10:15:56 -07:00
A.J. Beamon b09dddc07e Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
# Conflicts:
#	cmake/ConfigureCompiler.cmake
#	documentation/sphinx/source/downloads.rst
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/fdbrpc.vcxproj
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
A.J. Beamon 04d1217941 Track statistics about server-side request latency on each process, to include min, max, mean, and various percentiles. 2020-07-09 16:39:15 -07:00
sfc-gh-tclinkenbeard 99bf993815 Replace BOOST_NOEXCEPT with noexcept 2020-06-09 22:39:19 -07:00
Evan Tschannen ced65cd30b finished explicitly versioning everything stored in the database 2020-05-22 17:14:21 -07:00
Evan Tschannen 8fd926e08e serialize old tlog entries with old protocol versions to support downgrades 2020-05-22 14:00:07 -07:00
Markus Pilman 5f9b127e56 Emit traces regularly about role assignment
We are currently emitting Role transition traces when a role starts and
when it ends. While this is useful for debugging, it doesn't work well
with tools that inject data and might potentially miss some trace lines.

We do decorate each trace lines with the roles assigned to that
particular process, however, this is not sufficient for tools that can
make use of the UID -> Role mapping
2020-05-08 16:27:57 -07:00
Evan Tschannen 7cebe743f9 A number of bug fixes of rare correctness errors 2020-04-29 13:50:13 -07:00
Evan Tschannen c87aa33941 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/go/src/fdb/generated.go
#	documentation/sphinx/source/api-common.rst.inc
#	documentation/sphinx/source/api-ruby.rst
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/FailureMonitorClient.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/vexillographer/fdb.options
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/OldTLogServer_6_0.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	versions.target
2020-04-23 13:47:53 -07:00
Evan Tschannen 91fba9106d ported peek metrics to old tlog 6.0 2020-04-22 23:35:48 -07:00
Balachandar Namasivayam a476127f5f
Merge pull request #2802 from xumengpanda/mengxu/debug-master-PR
Fix correctness failure on master branch
2020-03-18 16:07:36 -07:00
Evan Tschannen e08f0201f1 merge release 6.2 into master 2020-03-17 12:51:47 -07:00
Evan Tschannen ea98c7a40a added additional timeout on initPersistentState 2020-03-16 11:38:14 -07:00
Evan Tschannen d6d347f665 treat a tlog which takes a long time to create its disk queue as failed 2020-03-13 10:31:59 -07:00
Meng Xu bd345f85db ConsistencyCheck:Fix failue due to address inconsistency between process and worker
With TLS, a worker (or process) can have a TLS address and non-TLS address.
When a process is created in simulation, the primary address is TLS by default.
The non-TLS one is the TLS address port plus one.

In a connection between two workers, if their primary addresses do not enable
or disable TLS together, one worker will swap its primary address and secondary address
so that the TLS config of the two endpoints can match.

The swap can make the primary address no longer the TLS one that was created
when the process is created. And the swap only happens for worker instead of
process struct in simulation.

This swap can cause worker->address != process->address.
In checkForExtraDataStores actor, we use worker->address to check if a process
is killable and use the process->address to kill the process. The inconsistency
can cause simulation to kill a protected process that is not killable and leads
to simulation failure.
2020-03-10 21:07:16 -07:00
Evan Tschannen 96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen 8129f74a10
Merge pull request #2698 from etschannen/feature-recruit-delay
The CC waits until no new workers register before starting a bad recruitment
2020-02-20 14:42:37 -08:00
A.J. Beamon fcbdcda490
Merge pull request #2650 from ajbeamon/fix-reverse-range-read-byte-limit-bug
Fix reverse range read performance bug
2020-02-20 12:47:17 -08:00
Evan Tschannen fbd45963d8 The cluster controller waits until no new workers register for 1.0 before starting a bad recruitment 2020-02-19 16:48:30 -08:00
A.J. Beamon 1d9140d874 Removed TLogVersion logging.
Added logging of SharedTLog ID for each TLog.
Switched ID logged for TLogRejoining event to the TLog instead of the SharedTLog.
Made some parameters to startRole passed by reference.
2020-02-14 12:33:43 -08:00
A.J. Beamon 56053c565b Improve TLog "Role" event by adding the worker ID, the TLog version, and under what circumstances the TLog is being started (Restored, Recruited, or Recovered).
The SharedTLog role was being started and stopped twice, so remove one instance of it.
2020-02-12 15:11:38 -08:00
A.J. Beamon df2b0452b4 Step 3 of fixing storage server range reads: change return type of readRange from VectorRef<KeyValueRef> to RangeResultRef. 2020-02-06 13:19:24 -08:00
Evan Tschannen 6c0b934dda
Merge pull request #2242 from alexmiller-apple/fix-10min-stall-again
Fix the 10min multi-region recovery stall again
2020-01-23 17:53:02 -08:00
Jingyu Zhou 8b67a89eed More review comments fixed. 2020-01-22 19:42:13 -08:00
Jingyu Zhou 9d7a1a77d0 Small fixes. 2020-01-22 19:38:45 -08:00
Jingyu Zhou 73824faf65 Track pseudo tags popping for individual IDs
For each log router ID, we track the popped version of each pseudo tag so that
the popping only applied to the minimum of these versions.

Also add more tracing for popping and epochs.
2020-01-22 19:38:45 -08:00
Jingyu Zhou 11964733b7 WIP: should be divided into smaller commits. 2020-01-22 19:38:45 -08:00
Jingyu Zhou 03a17a30ef Refactor: check displacement in LogSystemConfig 2020-01-22 19:38:45 -08:00
Jingyu Zhou 8221d33eb1 Use emplace_back instead of push_back for TLogServer 2020-01-22 19:35:30 -08:00
Alex Miller f0fe62a298 TLogs should not respond with data earlier than the begin version
Parallel peek more code would prefer the begin version it was sent by
the previous parallel peek over the request's begin version.  This means
that a merge cursor trying to advance past message versions would still
get old data that it would have to filter out.

A simple application of std::max fixes this.
2020-01-21 19:09:07 -08:00
Alex Miller ffc3506fff Continuing a parallel peek after a timeout would hang. 2020-01-21 17:12:18 -08:00
Alex Miller 1cb311fcb8 Add an ASSERT_WE_THINK that peek cursors don't get timed_out()
This should prevent us from regressing and having multi-region
recoveries hang for 10min again.
2020-01-21 17:07:37 -08:00
Evan Tschannen 3f9d9d8b84 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	cmake/FlowCommands.cmake
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/StorageServerInterface.h
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/Knobs.h
#	flow/Platform.cpp
#	versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen 827cea74b5 fix: tlogs must send a recruitment reply even when actor cancelled or the recruitment endpoint will be marked as permanently failed 2020-01-16 17:37:17 -08:00
Evan Tschannen ebcb2f79ed Merge branch 'master' of github.com:apple/foundationdb 2019-11-22 15:34:49 -08:00
Evan Tschannen 8d3ef89540 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/MutationList.h
#	fdbserver/MasterProxyServer.actor.cpp
#	versions.target
2019-11-14 15:49:56 -08:00
negoyal a4a0bf18f9 Merging with Master. 2019-11-12 13:01:29 -08:00
Evan Tschannen 396dccbc98 when peeking from satellites we do not need to limit the amount of peeking on log router tags, because that is the only thing that can be peeked from a satellite log 2019-11-08 18:34:05 -08:00
Evan Tschannen afc9713005 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/FDBTypes.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/OldTLogServer_6_0.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	versions.target
2019-11-06 13:45:37 -08:00