
119 Commits

Author SHA1 Message Date
FDB Formatster df90cc89de apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-10 10:18:07 -08:00
sfc-gh-tclinkenbeard 5020e3faa1 Make ILogSystem::IPeekCursor const-correct 2020-12-08 09:09:31 -08:00
David Youngworth d64cf8b9e3 Merge branch 6.3 into master 2020-11-17 11:22:45 -08:00
David Youngworth d0391db862 Merge branch 'release-6.2' into release-6.3 2020-11-16 10:15:23 -08:00
Markus Pilman bdd3dbfa7d remove duplicates 2020-11-10 14:01:07 -07:00
sfc-gh-tclinkenbeard 4669f837fa Add uses of makeReference 2020-11-07 22:10:18 -08:00
Vishesh Yadav 7b28de8a41 Add IDs to ConnectionReset TraceEvents 2020-11-04 14:06:49 -08:00
Vishesh Yadav 22b16302c3 Make ConnectionReset logs easier to query #3977
All TraceLogs that are related to ConnectionReset should be prefixed with
ConnectionReset. This should make it easy to query and aggregate by address and
role.
2020-11-02 15:10:51 -08:00
Evan Tschannen 12edadd059 Merge branch 'release-6.3'
# Conflicts:
#	CMakeLists.txt
#	fdbclient/Knobs.cpp
#	fdbclient/MasterProxyInterface.h
#	fdbrpc/simulator.h
#	fdbserver/MasterProxyServer.actor.cpp
#	tests/fast/CycleAndLock.txt
#	tests/fast/TxnStateStoreCycleTest.txt
#	tests/fast/VersionStamp.txt
#	tests/slow/ParallelRestoreOldBackupApiCorrectnessAtomicRestore.txt
#	tests/slow/ParallelRestoreOldBackupCorrectnessCycle.txt
#	versions.target
2020-08-31 19:33:34 -07:00
Evan Tschannen 29eec30183 Merge branch 'release-6.2' into release-6.3
# Conflicts:
#	CMakeLists.txt
#	build/Dockerfile
#	build/Dockerfile.devel
#	documentation/sphinx/source/downloads.rst
#	fdbserver/Knobs.cpp
#	fdbserver/LogSystem.h
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WaitFailure.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
#	packaging/msi/FDBInstaller.wxs
2020-08-31 01:10:29 -07:00
Evan Tschannen 507c67c930 Added additional information to trace events 2020-08-26 11:42:23 -07:00
Meng Xu ef8c1060a2 Merge branch 'master' into mengxu/tmp-merge-6.3 2020-07-13 10:15:56 -07:00
A.J. Beamon b09dddc07e Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
# Conflicts:
#	cmake/ConfigureCompiler.cmake
#	documentation/sphinx/source/downloads.rst
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/fdbrpc.vcxproj
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
Evan Tschannen 33c9b1374a more compile fixes 2020-07-09 22:57:43 -07:00
Evan Tschannen f6163d0a79 fix compile errors 2020-07-09 22:53:02 -07:00
Evan Tschannen 717242a0ee reset WAN network connections every 5 minutes if responses take more than 500ms 2020-07-09 22:50:47 -07:00
sfc-gh-ngoyal 693d9e8b89 Merge branch 'master' into fdb_cache_wo_allocator 2020-06-09 15:09:58 -07:00
Alex Miller ccaac162e2 Resolve performance concerns of nearly-no-op debugMutation being frequently called
This introduces unhygienic macro variants that inline an `ENABLED &&`
before the TraceEvent.  This way, they get entirely compiled out unless
enabled.

Then rewrite all debugMutation uses via sed.
2020-05-13 18:44:15 -07:00
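A minimal sketch of the `ENABLED &&` macro technique described in the commit above. Every name here (`MUTATION_TRACKING_ENABLED`, `debugMutationImpl`, `DEBUG_MUTATION`, `expensiveVersionLookup`, `commitPath`) is a hypothetical stand-in, not the identifier used in the FoundationDB source, and the counters exist only to make the effect observable.

```cpp
#include <cassert>
#include <string>

// Hypothetical compile-time switch; assumed to be false in normal builds.
constexpr bool MUTATION_TRACKING_ENABLED = false;

// Counts how often the tracing path actually runs.
static int traceCalls = 0;

// Stand-in for a function that would emit a TraceEvent.
inline bool debugMutationImpl(const std::string& context, int version) {
    ++traceCalls;
    return !context.empty() && version >= 0;
}

// Unhygienic on purpose: the macro pastes `ENABLED &&` in front of the call
// at the use site. Because && short-circuits, a false flag means neither the
// call nor its argument expressions are evaluated, and the compiler can drop
// the whole expression.
#define DEBUG_MUTATION(context, version) \
    (MUTATION_TRACKING_ENABLED && debugMutationImpl((context), (version)))

// An argument with a visible side effect, to show that it is skipped too.
static int argEvaluations = 0;
inline int expensiveVersionLookup() {
    ++argEvaluations;
    return 42;
}

// A hot path that mentions the debug hook on every call.
inline void commitPath() {
    (void)DEBUG_MUTATION("commit", expensiveVersionLookup());
}
```

With the flag set to false, calling `commitPath()` leaves both counters at zero. A plain function wrapper could not guarantee that, since its arguments would be evaluated before the call.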
Alex Miller 122762cce1 Add debugMessagesAndTags, and track mutations in more places.
Like:
* Leaving the proxy
* Entering the TLog
* Leaving the TLog
* Being read on a cursor

All of this brought to you by TagsAndMessage!

This also slides in a minor optimization as to how mutations are serialized per target log.
2020-03-27 03:31:04 -07:00
negoyal acaf91ac47 Merge branch 'master' into fdb_cache_subfeature2 2020-03-26 13:33:08 -07:00
negoyal 8abac91033 Fixed a bug in cache server while peeking at a version lower than popped version and added some logging. 2020-03-26 12:39:07 -07:00
Meng Xu bd345f85db ConsistencyCheck: Fix failure due to address inconsistency between process and worker
With TLS, a worker (or process) can have a TLS address and non-TLS address.
When a process is created in simulation, the primary address is TLS by default.
The non-TLS one is the TLS address port plus one.

In a connection between two workers, if their primary addresses do not enable
or disable TLS together, one worker will swap its primary address and secondary address
so that the TLS config of the two endpoints can match.

The swap can make the primary address no longer the TLS one that was created
when the process is created. And the swap only happens for worker instead of
process struct in simulation.

This swap can cause worker->address != process->address.
In checkForExtraDataStores actor, we use worker->address to check if a process
is killable and use the process->address to kill the process. The inconsistency
can cause simulation to kill a protected process that is not killable, which leads
to simulation failure.
2020-03-10 21:07:16 -07:00
Evan Tschannen 303df197cf Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogRouter.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/flow.vcxproj
#	flow/flow.vcxproj.filters
#	versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen 1076abdee5 fixed crash when interf was not created 2020-03-05 19:09:08 -08:00
Evan Tschannen 1128666840 added additional logging on the log router 2020-03-05 18:17:06 -08:00
Evan Tschannen 96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen cf4efca852 fix: buffered cursor should always make sure all of the sub-cursors are completely exhausted before calculating minVersion. It is not legal to advance a cursor version past an epochEnd (+100 million versions) without also returning the epochEnd mutation, or the storage servers might not be able to roll back far enough because the end of the previous epoch will be made durable 2020-02-19 15:24:32 -08:00
Alex Miller 7798456201 Make TLogs have consistent parallel peek behavior.
TLogServer and LogRouter had some leftover code from me trying to be
more "correct" about parallel peek semantics, but those changes weren't
reflected in the OldTLog* files.  I've reverted the changes, as
realistically, they are more likely to waste CPU than improve TLog behavior.
2020-01-21 18:23:16 -08:00
Alex Miller 858e4e5900 Move the check to a better location.
This way, we avoid some ID randomness, and also avoid the potential for
resetting the randomID and sequence without clearing out the future
vector.
2020-01-21 17:08:42 -08:00
Alex Miller 1cb311fcb8 Add an ASSERT_WE_THINK that peek cursors don't get timed_out()
This should prevent us from regressing and having multi-region
recoveries hang for 10min again.
2020-01-21 17:07:37 -08:00
Alex Miller 0662f8dba0 When switching parallel->single->parallel, reset sequence and peekId
This fixes an issue where one could hang for 10min for the second
parallel peek to time out, if one happened to catch the edge of an
onlySpilled transition wrong.
2020-01-21 17:07:37 -08:00
Evan Tschannen afc9713005 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/FDBTypes.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/OldTLogServer_6_0.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	versions.target
2019-11-06 13:45:37 -08:00
Evan Tschannen dbc5a2393c combineMessages still did not serialize tags correctly 2019-11-05 18:44:30 -08:00
Evan Tschannen 1c873591be fixed a compiler error 2019-11-05 18:32:15 -08:00
Evan Tschannen 86560fe727 fix: tempTags was not used correctly 2019-11-05 18:22:25 -08:00
Evan Tschannen a8ca47beff optimized memory allocations by using VectorRef<Tag> instead of std::vector<Tag> 2019-11-05 18:07:30 -08:00
Evan Tschannen daac8a2c22 Knobified a few variables 2019-11-04 20:21:38 -08:00
Evan Tschannen 457896b80d remote logs use bufferedCursor when peeking from log routers to improve performance
bufferedCursor performance has been improved
2019-11-04 19:47:45 -08:00
Evan Tschannen 3325980c03 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/OldTLogServer_6_0.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/WorkerInterface.actor.h
#	fdbserver/worker.actor.cpp
#	versions.target
2019-10-24 17:38:15 -07:00
Evan Tschannen a7492aab0a fix: poppedVersion can update during a yield, so all work must be done immediately after getMore returns 2019-10-23 23:06:02 -07:00
Alex Miller 1e5b8c74e3 Continuing a parallel peek after a timeout would hang.
This is to guard against the case where

1. Peeks with sequence numbers 0-39 are submitted
2. A 15min pause happens, in which timeout removes the peek tracker data
3. Peeks with sequence numbers 40-59 are submitted, with the same peekId

The second round of peeks would no longer have the tracker data that allows it
to start running peek 40 immediately, and thus would hang for 10min
until it gets cleaned up.

Also, guard against overflowing the sequence number.
2019-10-22 19:24:05 -07:00
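One way the guard described in the commit above could look, as a rough sketch only. The class `PeekTrackers`, the enum `PeekAdmission`, the bound `kMaxSequence`, and the policy of rejecting a non-zero sequence whose tracker has expired are all assumptions made for illustration; the actual change in the TLog code may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Possible outcomes when a peek with (peekId, sequence) arrives.
enum class PeekAdmission { Start, WaitForPrevious, RejectStale, RejectOutOfRange };

// Hypothetical tracker: for each peekId, the highest sequence allowed to run.
class PeekTrackers {
public:
    // Assumed upper bound, guarding against sequence number overflow.
    static constexpr int kMaxSequence = 1 << 20;

    PeekAdmission admit(std::uint64_t peekId, int sequence) {
        if (sequence < 0 || sequence >= kMaxSequence)
            return PeekAdmission::RejectOutOfRange;
        auto it = nextRunnable_.find(peekId);
        if (it == nextRunnable_.end()) {
            // No tracker state. Only sequence 0 may start a fresh stream; a
            // later sequence means the state was timed out, so fail fast
            // instead of waiting for a predecessor that will never report.
            if (sequence != 0)
                return PeekAdmission::RejectStale;
            it = nextRunnable_.emplace(peekId, 0).first;
        }
        return sequence <= it->second ? PeekAdmission::Start
                                      : PeekAdmission::WaitForPrevious;
    }

    // Record that `sequence` finished, unblocking its successor.
    void complete(std::uint64_t peekId, int sequence) {
        auto it = nextRunnable_.find(peekId);
        if (it != nextRunnable_.end())
            it->second = std::max(it->second, sequence + 1);
    }

    // Models the timeout that removes the peek tracker data.
    void expire(std::uint64_t peekId) { nextRunnable_.erase(peekId); }

private:
    std::map<std::uint64_t, int> nextRunnable_;
};
```

In this sketch, a peek with sequence 40 arriving after `expire` gets `RejectStale` right away, so the caller can restart with a new peekId rather than hang until cleanup.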
Alex Miller c008e7f8b3 When switching parallel->single->parallel, reset sequence and peekId
This fixes an issue where one could hang for 10min for the second
parallel peek to time out, if one happened to catch the edge of an
onlySpilled transition wrong.
2019-10-22 19:10:58 -07:00
Evan Tschannen b495cc697b Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-09-13 09:25:08 -07:00
Alex Miller 324289039a When reloading one cursor in a merge cursor, top off the other cursors as well. 2019-09-12 16:22:28 -07:00
Jingyu Zhou 2723922f5f Replace -1 as VERSION_HEADER constant for serialization 2019-09-05 12:45:39 -07:00
Jingyu Zhou f9357c5ad8 Fix side effect of ArenaReader
ServerPeekCursor::nextMessage() should only consume the message header, because
the reader() directly inherits the current position. The previous commit
changes the position to the beginning of the next message, which breaks storage
server code.
2019-09-05 11:07:07 -07:00
Jingyu Zhou cd3f1e33d4 Refactor deserialization of TagsAndMessages
Consolidate deserialization of TagsAndMessages in the structure itself and
change both TLog and ServerPeekCursor to use it.
2019-09-04 14:55:05 -07:00
Evan Tschannen b0480edd15 fix: messageVersion could be larger than poppedVersion, and we will discard messages that are needed 2019-08-06 16:31:05 -07:00
Evan Tschannen 7ac7eb82f2 fix: buffered cursor would start multiple bufferedGetMore actors
advance all of the cursors to the poppedVersion
2019-07-30 14:42:05 -07:00
Evan Tschannen b5cb7919b6 fix: canDiscardPopped was not reset when necessary in all cases 2019-07-30 13:44:44 -07:00