Andrew Noyes
9601769b01
Merge pull request #3858 from sfc-gh-rchen/stable_interfaces
...
Stable interfaces
2020-12-11 09:34:27 -08:00
Andrew Noyes
cc669f399e
Merge remote-tracking branch 'upstream/release-6.3' into anoyes/merge-release-6.3-master
2020-12-07 22:26:11 +00:00
Andrew Noyes
7fbc4d7391
Resolve conflicts
2020-12-04 23:58:42 +00:00
A.J. Beamon
fa4d87f432
Merge pull request #4112 from sfc-gh-tclinkenbeard/6.3-fix-tlog-pop-slow-task
...
Yield while processing ignored pop requests on tlog
2020-12-04 09:20:36 -08:00
Andrew Noyes
877997632d
Merge branch 'release-6.3' into anoyes/merge-release-6.3-master
...
Include conflict markers for review purposes
2020-12-04 01:38:07 +00:00
sfc-gh-tclinkenbeard
e9c31b4200
Move ignore pop logic from tLogPopCore to tLogPop
2020-12-03 14:23:49 -08:00
sfc-gh-tclinkenbeard
6184236c87
Add code coverage macro to processPopRequests
2020-12-03 11:56:55 -08:00
sfc-gh-tclinkenbeard
7e815ebb68
Added TLOG_POP_BATCH_SIZE knob
2020-12-03 11:56:55 -08:00
sfc-gh-tclinkenbeard
1003057a7e
Removed TLogData::toBePoppedMutex
2020-12-03 11:56:52 -08:00
sfc-gh-tclinkenbeard
a8af598307
Addressed review comments
2020-12-02 16:24:55 -08:00
Trevor Clinkenbeard
9d702099cf
Update fdbserver/TLogServer.actor.cpp
...
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2020-12-02 16:11:23 -08:00
Richard Chen
c77d9e4abe
merge conflicts
2020-12-02 21:53:19 +00:00
sfc-gh-tclinkenbeard
5237549c19
Yield while processing ignored pop requests on tlog
2020-11-26 20:36:02 -08:00
Andrew Noyes
dc2bac5670
Resolve conflicts
2020-11-24 19:09:42 +00:00
Andrew Noyes
1f541f02be
Merge branch 'anoyes/merge-6.2-to-6.3' into anoyes/release-6.3-merge
...
Merge, leaving conflict markers for now
2020-11-24 16:55:34 +00:00
David Youngworth
d64cf8b9e3
Merge branch 6.3 into master
2020-11-17 11:22:45 -08:00
David Youngworth
d0391db862
Merge branch 'release-6.2' into release-6.3
2020-11-16 10:15:23 -08:00
Jingyu Zhou
569ab46bf6
Merge pull request #4000 from xumengpanda/mengxu/ha-code-read
...
Add comments to TLog, SS, and DD related code
2020-11-14 09:06:05 -08:00
Vishesh Yadav
4df23741b2
tLog: Track tlog commit latencies in histogram
2020-11-12 17:48:16 -08:00
Meng Xu
222da17558
Merge branch 'release-6.2' into mengxu/ha-code-read
2020-11-12 13:39:27 -08:00
Meng Xu
046a6e8427
Add Alex comment on tLog
2020-11-12 13:29:11 -08:00
Meng Xu
c2dd7d1d38
Remove unresolved questions
2020-11-11 22:39:11 -08:00
Markus Pilman
bdd3dbfa7d
remove duplicates
2020-11-10 14:01:07 -07:00
sfc-gh-tclinkenbeard
4669f837fa
Add uses of makeReference
2020-11-07 22:10:18 -08:00
Meng Xu
4788544a6f
Revise comments based on review suggestions
...
Ack. Jingyu and Xin for their suggestions.
2020-11-06 08:51:13 -08:00
Meng Xu
1664e2ff7f
Add more comments and questions to LR tLog and loadbalance
2020-11-01 21:22:23 -08:00
Xin Dong
46150d22c3
Attach generation(recovery count) to TLog metrics and LogRouter metrics.
2020-11-01 11:24:23 -08:00
Meng Xu
063700e4d6
Add comments and questions to HA and tLog code reading
...
The comments' correctness need to be confirmed by reviewers.
2020-10-30 12:14:57 -07:00
Richard Chen
545ee4269d
master conflicts
2020-10-19 01:03:54 +00:00
Trevor Clinkenbeard
24ea35e56f
Merge pull request #3748 from sfc-gh-ljoswiak/visibility-2
...
Add TLogVersion::V6
2020-10-14 17:35:32 -07:00
Richard Chen
41843f07e6
add simulator support for different process versions and ProtocolVersion test
2020-10-12 18:19:31 +00:00
sfc-gh-tclinkenbeard
a9607bdcec
Explicitly seal classes that inherit but aren't inherited from
2020-10-07 21:58:24 -07:00
sfc-gh-tclinkenbeard
8571dcfe28
Use override where applicable in fdbserver
2020-10-07 18:41:19 -07:00
Lukas Joswiak
dea7000970
Merge remote-tracking branch 'upstream/master' into visibility-1
2020-10-06 18:38:15 -07:00
sfc-gh-tclinkenbeard
0814841827
Replace NULL with nullptr in fdbserver
2020-09-20 11:31:49 -07:00
Lukas Joswiak
7dc55fdffd
Revert state
2020-09-04 15:36:47 -07:00
Lukas Joswiak
53b7721d6c
Add additional trace information
2020-09-04 15:36:47 -07:00
Lukas Joswiak
1ca7fe1a05
Add span metadata message
2020-09-04 15:36:47 -07:00
Young Liu
87693cae81
merge master branch and resolve conflicts
2020-09-02 13:44:33 -07:00
Evan Tschannen
12edadd059
Merge branch 'release-6.3'
...
# Conflicts:
# CMakeLists.txt
# fdbclient/Knobs.cpp
# fdbclient/MasterProxyInterface.h
# fdbrpc/simulator.h
# fdbserver/MasterProxyServer.actor.cpp
# tests/fast/CycleAndLock.txt
# tests/fast/TxnStateStoreCycleTest.txt
# tests/fast/VersionStamp.txt
# tests/slow/ParallelRestoreOldBackupApiCorrectnessAtomicRestore.txt
# tests/slow/ParallelRestoreOldBackupCorrectnessCycle.txt
# versions.target
2020-08-31 19:33:34 -07:00
Evan Tschannen
29eec30183
Merge branch 'release-6.2' into release-6.3
...
# Conflicts:
# CMakeLists.txt
# build/Dockerfile
# build/Dockerfile.devel
# documentation/sphinx/source/downloads.rst
# fdbserver/Knobs.cpp
# fdbserver/LogSystem.h
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WaitFailure.actor.cpp
# fdbserver/fdbserver.vcxproj
# fdbserver/fdbserver.vcxproj.filters
# packaging/msi/FDBInstaller.wxs
2020-08-31 01:10:29 -07:00
Evan Tschannen
ce1139e588
added missing dumpToken trace events
2020-08-27 17:17:27 -07:00
Young Liu
229ab0d5f1
Fix some conflicts and remove debugging trace events
2020-07-22 23:35:46 -07:00
Young Liu
525f10e30c
Merge master branch
2020-07-22 16:08:49 -07:00
Young Liu
302cf5c45f
Remove debug trace events
2020-07-22 12:20:22 -07:00
Young Liu
2703cedac5
Fixed known bugs
2020-07-17 22:24:52 -07:00
Young Liu
21c1998cca
Fix MaxTLogQueueSize Bug
2020-07-16 15:56:04 -07:00
Young Liu
5b06d69d25
Pass watches test
2020-07-15 00:37:41 -07:00
Meng Xu
ef8c1060a2
Merge branch 'master' into mengxu/tmp-merge-6.3
2020-07-13 10:15:56 -07:00
A.J. Beamon
b09dddc07e
Merge branch 'release-6.2' into merge-release-6.2-into-release-6.3
...
# Conflicts:
# cmake/ConfigureCompiler.cmake
# documentation/sphinx/source/downloads.rst
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/fdbrpc.vcxproj
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# flow/flow.vcxproj
2020-07-10 15:06:34 -07:00
A.J. Beamon
04d1217941
Track statistics about server-side request latency on each process, to include min, max, mean, and various percentiles.
2020-07-09 16:39:15 -07:00
sfc-gh-tclinkenbeard
99bf993815
Replace BOOST_NOEXCEPT with noexcept
2020-06-09 22:39:19 -07:00
sfc-gh-ngoyal
693d9e8b89
Merge branch 'master' into fdb_cache_wo_allocator
2020-06-09 15:09:58 -07:00
negoyal
cf13e00a8f
Merge remote-tracking branch 'origin/release-6.3' into fdb_cache_wo_allocator
2020-06-01 17:38:31 -07:00
Meng Xu
1c35ad884f
Merge branch 'master' into mengxu/release-6.3-conflict-PR
...
Has conflict with master;
Next commit will fix the conflicts.
2020-05-25 12:01:49 -07:00
Evan Tschannen
ee6ff80064
another compile fix
2020-05-22 17:26:22 -07:00
Evan Tschannen
ced65cd30b
finished explicitly versioning everything stored in the database
2020-05-22 17:14:21 -07:00
A.J. Beamon
7a09d016a6
Merge branch 'release-6.3' into merge-release6.3-into-master
2020-05-19 12:52:44 -07:00
Markus Pilman
c2bc75516f
Merge branch 'release-6.3' of github.com:apple/foundationdb into features/trace-roles
2020-05-14 10:34:53 -07:00
Alvin Moore
a160f9199f
Merge pull request #3171 from apple/release-6.3
...
Merge Release 6.3 into Master
2020-05-14 10:00:47 -07:00
Alex Miller
bf6d056095
Changing the last suggestions from review.
2020-05-13 18:48:43 -07:00
Alex Miller
ccaac162e2
Resolve performance concerns of nearly-no-op debugMutation being frequently called
...
This introduces unhygienic macro variants that inline an `ENABLED &&`
before the TraceEvent. This way, they get entirely compiled out unless
enabled.
Then rewrite all debugMutation uses via sed.
2020-05-13 18:44:15 -07:00
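The macro trick described in ccaac162e2 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `MUTATION_TRACKING_ENABLED`, `FakeTraceEvent`, `DEBUG_MUTATION` and `eventsBuiltAfter` are hypothetical names, not the identifiers used in the codebase. With the switch set to 0, the right-hand side of `&&` is never evaluated, so the compiler is free to drop it.

```cpp
#include <cstdint>
#include <string>

// Hypothetical compile-time switch; set to 1 to turn tracing on.
#define MUTATION_TRACKING_ENABLED 0

// Counts constructed events so the effect of the switch is observable.
static int eventsBuilt = 0;

// Stand-in for a trace event type; this is not FoundationDB's TraceEvent.
struct FakeTraceEvent {
    explicit FakeTraceEvent(const std::string& name) : name(name) { ++eventsBuilt; }

    FakeTraceEvent& detail(const std::string& key, int64_t value) {
        fields += key + "=" + std::to_string(value) + ";";
        return *this;
    }

    // Lets the event sit on the right-hand side of `&&`.
    explicit operator bool() const { return true; }

    std::string name;
    std::string fields;
};

// Unhygienic on purpose: the expansion starts with `ENABLED &&`, so the
// event expression is short-circuited away when the switch is 0.
#define DEBUG_MUTATION(name, version) \
    MUTATION_TRACKING_ENABLED && FakeTraceEvent(name).detail("Version", version)

// Invokes the macro `calls` times and reports how many events were built.
int eventsBuiltAfter(int calls) {
    eventsBuilt = 0;
    for (int i = 0; i < calls; ++i) {
        (void)(DEBUG_MUTATION("CommitDebug", i));
    }
    return eventsBuilt;
}
```

Because the expansion is a bare `a && b` expression, it can interact with surrounding operators at the call site, which is what makes it unhygienic; the sketch wraps each use in parentheses to avoid that.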
Alex Miller
27da91ab9e
Merge remote-tracking branch 'upstream/master' into mutation-debugging
2020-05-13 12:51:44 -07:00
Alex Miller
f148412a32
Make UPDATE_STORAGE_BYTE_LIMIT the reference spill variety.
...
Which is unrelated, but a change I was supposed to do a while ago and
forgot.
2020-05-12 16:59:20 -07:00
Markus Pilman
5f9b127e56
Emit traces regularly about role assignment
...
We are currently emitting Role transition traces when a role starts and
when it ends. While this is useful for debugging, it doesn't work well
with tools that inject data and might potentially miss some trace lines.
We do decorate each trace line with the roles assigned to that
particular process, however, this is not sufficient for tools that can
make use of the UID -> Role mapping.
2020-05-08 16:27:57 -07:00
negoyal
dd033736ed
Merge branch 'master' into fdb_cache_subfeature2
2020-05-04 17:29:43 -07:00
Evan Tschannen
7cebe743f9
A number of bug fixes of rare correctness errors
2020-04-29 13:50:13 -07:00
Evan Tschannen
c87aa33941
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/go/src/fdb/generated.go
# documentation/sphinx/source/api-common.rst.inc
# documentation/sphinx/source/api-ruby.rst
# documentation/sphinx/source/release-notes.rst
# fdbclient/FailureMonitorClient.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/vexillographer/fdb.options
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/OldTLogServer_6_0.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# versions.target
2020-04-23 13:47:53 -07:00
Evan Tschannen
0a1b2a572f
more compile fixes
2020-04-22 14:41:17 -07:00
Evan Tschannen
68906bf3c3
fix compile errors
2020-04-22 14:36:41 -07:00
Evan Tschannen
d0cc2a1ee4
added logging for parallel peeks on TLogs
2020-04-22 14:24:45 -07:00
Alex Miller
122762cce1
Add debugMessagesAndTags, and track mutations in more places.
...
Like:
* Leaving the proxy
* Entering the TLog
* Leaving the TLog
* Being read on a cursor
All of this brought to you by TagsAndMessage!
This also slides in a minor optimization as to how mutations are serialized per target log.
2020-03-27 03:31:04 -07:00
negoyal
acaf91ac47
Merge branch 'master' into fdb_cache_subfeature2
2020-03-26 13:33:08 -07:00
negoyal
8abac91033
Fixed a bug in cache server while peeking at a version lower than popped version and added some logging.
2020-03-26 12:39:07 -07:00
Evan Tschannen
e08f0201f1
merge release 6.2 into master
2020-03-17 12:51:47 -07:00
Evan Tschannen
ea98c7a40a
added additional timeout on initPersistentState
2020-03-16 11:38:14 -07:00
Evan Tschannen
d6d347f665
treat a tlog which takes a long time to create its disk queue as failed
2020-03-13 10:31:59 -07:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
8129f74a10
Merge pull request #2698 from etschannen/feature-recruit-delay
...
The CC waits until no new workers register before starting a bad recruitment
2020-02-20 14:42:37 -08:00
A.J. Beamon
fcbdcda490
Merge pull request #2650 from ajbeamon/fix-reverse-range-read-byte-limit-bug
...
Fix reverse range read performance bug
2020-02-20 12:47:17 -08:00
Evan Tschannen
fbd45963d8
The cluster controller waits until no new workers register for 1.0 before starting a bad recruitment
2020-02-19 16:48:30 -08:00
A.J. Beamon
1d9140d874
Removed TLogVersion logging.
...
Added logging of SharedTLog ID for each TLog.
Switched ID logged for TLogRejoining event to the TLog instead of the SharedTLog.
Made some parameters to startRole passed by reference.
2020-02-14 12:33:43 -08:00
A.J. Beamon
56053c565b
Improve TLog "Role" event by adding the worker ID, the TLog version, and under what circumstances the TLog is being started (Restored, Recruited, or Recovered).
...
The SharedTLog role was being started and stopped twice, so remove one instance of it.
2020-02-12 15:11:38 -08:00
Markus Pilman
e71fe44ee3
Merge branch 'master' into features/icc
2020-02-08 21:33:02 -08:00
A.J. Beamon
df2b0452b4
Step 3 of fixing storage server range reads: change return type of readRange from VectorRef<KeyValueRef> to RangeResultRef.
2020-02-06 13:19:24 -08:00
mpilman
d09e07f1f5
Merge remote-tracking branch 'upstream/master' into features/icc
2020-02-04 10:26:18 -08:00
Jingyu Zhou
7544ff88d9
Comment out frequent TLogPop trace event
2020-01-31 19:29:09 -08:00
Evan Tschannen
6c0b934dda
Merge pull request #2242 from alexmiller-apple/fix-10min-stall-again
...
Fix the 10min multi-region recovery stall again
2020-01-23 17:53:02 -08:00
Jingyu Zhou
8b67a89eed
More review comments fixed.
2020-01-22 19:42:13 -08:00
Jingyu Zhou
9d7a1a77d0
Small fixes.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
85c4a4e422
Address review comments for PR #1625
2020-01-22 19:38:45 -08:00
Jingyu Zhou
73824faf65
Track pseudo tags popping for individual IDs
...
For each log router ID, we track the popped version of each pseudo tag so that
the popping is only applied to the minimum of these versions.
Also add more tracing for popping and epochs.
2020-01-22 19:38:45 -08:00
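A minimal sketch of the bookkeeping described in 73824faf65, assuming integer IDs for log routers and pseudo tags. `PseudoTagPopTracker`, `track` and `pop` are hypothetical names chosen for illustration and do not come from the commit.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>

using Version = int64_t;

// Hypothetical tracker: per log router ID, remember how far each pseudo tag
// has been popped, and only release data up to the minimum of those versions.
class PseudoTagPopTracker {
public:
    // Registers a pseudo tag for a router with nothing popped yet.
    void track(int routerId, int pseudoTag) {
        popped_[routerId].emplace(pseudoTag, 0);
    }

    // Records a pop for one pseudo tag and returns the version that is now
    // safe to pop for the router as a whole.
    Version pop(int routerId, int pseudoTag, Version upTo) {
        std::map<int, Version>& perTag = popped_[routerId];
        Version& current = perTag[pseudoTag];
        current = std::max(current, upTo);  // popped versions never move backwards

        Version safe = current;
        for (const auto& entry : perTag) {
            safe = std::min(safe, entry.second);
        }
        return safe;
    }

private:
    std::map<int, std::map<int, Version>> popped_;
};
```

With two tracked pseudo tags, popping one of them to version 50 releases nothing until the other catches up; the router-wide pop follows the slower of the two.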
Jingyu Zhou
11964733b7
WIP: should be divided into smaller commits.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
03a17a30ef
Refactor: check displacement in LogSystemConfig
2020-01-22 19:38:45 -08:00
Jingyu Zhou
442738b6db
Small code refactoring
2020-01-22 19:35:30 -08:00
Jingyu Zhou
8221d33eb1
Use emplace_back instead of push_back for TLogServer
2020-01-22 19:35:30 -08:00
Alex Miller
f0fe62a298
TLogs should not respond with data earlier than the begin version
...
Parallel peek more code would prefer the begin version it was sent by
the previous parallel peek over the request's begin version. This means
that a merge cursor trying to advance past message versions would still
get old data that it would have to filter out.
A simple application of std::max fixes this.
2020-01-21 19:09:07 -08:00
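The std::max fix mentioned in f0fe62a298 can be sketched as follows. `effectiveBegin` and its parameter names are hypothetical; only the use of `std::max` comes from the commit message.

```cpp
#include <algorithm>
#include <cstdint>

using Version = int64_t;

// Hypothetical helper: pick the version a sequenced peek should start from.
// `requestBegin` is the begin version carried by the current request and
// `previousPeekEnd` is the version handed over by the previous parallel peek.
// Taking the maximum means the reply never contains data older than the
// request asked for.
Version effectiveBegin(Version requestBegin, Version previousPeekEnd) {
    return std::max(requestBegin, previousPeekEnd);
}
```

If a merge cursor has already advanced to version 100 while the previous peek ended at 40, the reply starts at 100 and the cursor has nothing to filter out.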
Alex Miller
7798456201
Make TLogs have consistent parallel peek behavior.
...
TLogServer and LogRouter had some leftover code from me trying to be
more "correct" about parallel peek semantics, but those changes weren't
reflected in the OldTLog* files. I've reverted the changes, as
realistically, they are more likely to waste CPU than improve TLog behavior.
2020-01-21 18:23:16 -08:00
Alex Miller
ffc3506fff
Continuing a parallel peek after a timeout would hang.
2020-01-21 17:12:18 -08:00
Alex Miller
9c47bbe460
Remove trackerData time bump
...
We're in an error-handling case, so this shouldn't be considered
making forward progress.
2020-01-21 17:08:42 -08:00
Alex Miller
1cb311fcb8
Add an ASSERT_WE_THINK that peek cursors don't get timed_out()
...
This should prevent us from regressing and having multi-region
recoveries hang for 10min again.
2020-01-21 17:07:37 -08:00
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen
827cea74b5
fix: tlogs must send a recruitment reply even when actor cancelled or the recruitment endpoint will be marked as permanently failed
2020-01-16 17:37:17 -08:00
Alex Miller
f58507c830
Rename poppedLocationForVersion -> versionForPoppedLocation
2019-12-19 10:24:31 -08:00
Alex Miller
b5d82a74c3
Update fdbserver/TLogServer.actor.cpp
...
Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>
2019-12-19 10:20:52 -08:00
Alex Miller
d8cbd495af
Fix another pop + spill/dq-pop interleaving issue
...
This fixes an issue introduced in the previous patch, where pop would
immediately set `poppedLocationNeedsUpdate`, but setting the popped
version was now delayed. This means that we could:
1. Run the spill loop and persist all popped versions
2. Receive a pop, and set the poppedLocationNeedsUpdate flag
3. Run the dq-pop loop, and clear the poppedLocationNeedsUpdate flag
and now when we update the persistentPopped version again, we won't have
the flag set for dq-pop to know that it needs to scan the spilled data
again for the minLocation.
We could more carefully update the flag, but instead, I've just
converted it into a version that's kept in sync purely in the dq-pop
loop, to remove shared state between pop and the dq-pop loop.
2019-12-17 23:15:48 -08:00
Alex Miller
b36062a509
DiskQueue should only pop based off of persisted popped tag versions
...
This commit is to fix a bug where popping a tag between
updatePersistentData and popDiskQueue can cause the TLog to recover to
an incorrect understanding of what data it has available.
The following series of events needs to happen to trigger this bug:
Tag 1:1 is popped to version 10
updatePersistentData is run...
updatePersistentPopped runs and persistentData stores 1:1 as popped to 10
A mutation is spilled for 1:1 at version 11 at location 1000
A mutation is spilled for 1:1 at version 21 at location 5000
updatePersistentData finishes and commits the btree changes
Tag 1:1 is popped to version 20
popDiskQueue runs
The btree is read for spilled mutations with version >=20
The minimum location required for the disk queue is found to be location 5000
The disk queue is popped to location 5000
The TLog crashes
The worker restarts, and reloads the TLog files from disk
restorePersistentPopped restores tag 1:1 as having been popped to version 10
Parallel peeks are received for tag 1:1 starting at version 0
The first peek is less than the popped version, so we respond with no data, and an end version of 10
The second peek starts at version 10, which is greater than the popped version
The btree is read for spilled mutations, and we find that there is a mutation at version 11 at location 1000
Location 1000 is read in the DiskQueue
The resulting page read at Location 1000 was popped pre-crash, and thus
might either (a) be corrupt or (b) have an incorrect sequence number.
The fix to this is to force popDiskQueue/updatePoppedLocation to use the
popped version that was persisted to disk, and not the most recently
popped version for the given tag.
This bug doesn't manifest in simulation, because we don't have any code
that peeks at a lower version than what has been popped.
2019-12-17 23:02:37 -08:00
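The scenario in b36062a509 can be replayed with a small sketch that uses the versions and locations from the commit message. The types and function names (`SpilledEntry`, `TagPopState`, `minLocationNeeded`, `safeDiskQueuePopLocation`) are hypothetical and only model the decision of how far the disk queue may be popped.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using Version = int64_t;
using Location = int64_t;

// One spilled mutation reference: its version and where it lives in the disk queue.
struct SpilledEntry {
    Version version;
    Location location;
};

// Hypothetical per-tag state: the latest pop seen in memory versus the pop
// version that has actually been written to disk.
struct TagPopState {
    Version popped;
    Version persistentPopped;
};

// Smallest disk queue location still needed for entries at or after
// `poppedVersion`; `queueEnd` is returned when nothing is needed.
Location minLocationNeeded(const std::vector<SpilledEntry>& spilled,
                           Version poppedVersion,
                           Location queueEnd) {
    Location minLoc = queueEnd;
    for (const SpilledEntry& e : spilled) {
        if (e.version >= poppedVersion) {
            minLoc = std::min(minLoc, e.location);
        }
    }
    return minLoc;
}

// The behavior the fix asks for: base the decision on the persisted pop
// version, because that is what a restarted TLog will restore.
Location safeDiskQueuePopLocation(const std::vector<SpilledEntry>& spilled,
                                  const TagPopState& tag,
                                  Location queueEnd) {
    return minLocationNeeded(spilled, tag.persistentPopped, queueEnd);
}
```

With mutations spilled at (version 11, location 1000) and (version 21, location 5000), an in-memory pop of 20 would allow popping the queue to 5000, while the persisted pop of 10 keeps location 1000 readable after a restart.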
Evan Tschannen
ebcb2f79ed
Merge branch 'master' of github.com:apple/foundationdb
2019-11-22 15:34:49 -08:00
Evan Tschannen
8d3ef89540
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbclient/MutationList.h
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2019-11-14 15:49:56 -08:00
negoyal
a4a0bf18f9
Merging with Master.
2019-11-12 13:01:29 -08:00
Evan Tschannen
396dccbc98
when peeking from satellites we do not need to limit the amount of peeking on log router tags, because that is the only thing that can be peeked from a satellite log
2019-11-08 18:34:05 -08:00
Evan Tschannen
afc9713005
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbclient/FDBTypes.h
# fdbserver/LogSystem.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/OldTLogServer_6_0.actor.cpp
# fdbserver/TLogServer.actor.cpp
# versions.target
2019-11-06 13:45:37 -08:00
Evan Tschannen
a8ca47beff
optimized memory allocations by using VectorRef<Tag> instead of std::vector<Tag>
2019-11-05 18:07:30 -08:00
Evan Tschannen
4de60fc437
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/TLogServer.actor.cpp
2019-11-01 15:48:04 -07:00
Evan Tschannen
85c315f684
Fix: parallelPeekMore was not enabled when peeking from log routers
2019-11-01 14:02:44 -07:00
Evan Tschannen
3325980c03
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbserver/DataDistribution.actor.cpp
# fdbserver/OldTLogServer_6_0.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/WorkerInterface.actor.h
# fdbserver/worker.actor.cpp
# versions.target
2019-10-24 17:38:15 -07:00
Evan Tschannen
2722c8b188
avoid starting a new startSpillingActor with every TLog recruitment
2019-10-23 11:15:54 -07:00
Evan Tschannen
e01e8371a6
Merge pull request #2256 from alexmiller-apple/spill-log-on-switch-6.2
...
Spill SharedTLog when there's more than one
2019-10-23 10:51:28 -07:00
Alex Miller
0c325c5351
Always check which SharedTLog is active
...
In case it is set before we get to the onChange()
2019-10-23 01:59:36 -07:00
Alex Miller
1e5b8c74e3
Continuing a parallel peek after a timeout would hang.
...
This is to guard against the case where
1. Peeks with sequence numbers 0-39 are submitted
2. A 15min pause happens, in which timeout removes the peek tracker data
3. Peeks with sequence numbers 40-59 are submitted, with the same peekId
The second round of peeks wouldn't have the data left to know that it's
allowed to start running peek 40 immediately, and thus would hang for 10min
until it gets cleaned up.
Also, guard against overflowing the sequence number.
2019-10-22 19:24:05 -07:00
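The commit message for 1e5b8c74e3 describes the failure but not the mechanism of the guard, so the following is only one way such a guard could look, under stated assumptions. `PeekTracker`, `PeekAction` and their methods are hypothetical; the idea shown is that a request with a non-zero sequence number and no surviving tracker state fails fast instead of waiting for a predecessor that will never arrive.

```cpp
#include <algorithm>
#include <map>

// What to do with an incoming sequenced peek.
enum class PeekAction { Run, Wait, FailFast };

// Hypothetical tracker: for each peekId, the next sequence number that is
// allowed to run because its predecessor has completed.
class PeekTracker {
public:
    PeekAction onRequest(int peekId, int sequence) {
        auto it = nextRunnable_.find(peekId);
        if (it == nextRunnable_.end()) {
            if (sequence == 0) {
                nextRunnable_[peekId] = 0;  // first request creates the state
                return PeekAction::Run;
            }
            // The state was cleaned up (for example by a timeout); waiting
            // here would hang until the next cleanup pass.
            return PeekAction::FailFast;
        }
        return sequence <= it->second ? PeekAction::Run : PeekAction::Wait;
    }

    // Marks `sequence` as finished so that its successor may run.
    void onComplete(int peekId, int sequence) {
        int& next = nextRunnable_[peekId];
        next = std::max(next, sequence + 1);
    }

    // Models the timeout removing the tracker data for a peekId.
    void expire(int peekId) { nextRunnable_.erase(peekId); }

private:
    std::map<int, int> nextRunnable_;
};
```

In the scenario from the commit message, the state for the peekId is expired during the pause, so the request with sequence number 40 is rejected immediately and the client can restart the peek.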
Alex Miller
1eb3a70b96
Spill SharedTLog when there's more than one.
...
When switching between spill_type or log_version, a new instance of a
SharedTLog is created in the transaction log processes. If this is done
in a saturated database, then doubling the amount of memory to hold
mutations in memory can cause TLogs to be uncomfortably close to the 8GB
OOM limit.
Instead, we now thread which UID of a SharedTLog is active, and the
other TLogs spill out the majority of their mutations.
This is a backport of #2213 (fef89aa1) to release-6.2
2019-10-17 01:24:50 -07:00
sramamoorthy
c9097cca18
deprecate isTLogInSameNode used by snapshot V1
2019-10-09 15:33:11 -07:00
Alex Miller
77c72de176
Comment variable and code style fix
...
Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>
2019-10-07 18:08:27 -07:00
Alex Miller
71af24dff3
Fix a bug that would cause active logs to spill aggressively
...
And add some useful logging about when things do or do not spill.
2019-10-07 18:08:27 -07:00
Alex Miller
1d8a7e5af7
Spill SharedTLog when there's more than one.
...
When switching between spill_type or log_version, a new instance of a
SharedTLog is created in the transaction log processes. If this is done
in a saturated database, then doubling the amount of memory to hold
mutations in memory can cause TLogs to be uncomfortably close to the 8GB
OOM limit.
Instead, we now thread which UID of a SharedTLog is active, and the
other TLogs spill out the majority of their mutations.
2019-10-07 18:08:27 -07:00
Alex Miller
5016f3fedd
Whitespace fixes
...
no idea what happened here
Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>
2019-10-04 13:37:59 -07:00
Alex Miller
6bcb72fa74
Fix stray Unversioned()
...
I forgot there were two
2019-10-03 19:45:13 -07:00
Alex Miller
28f6275f94
Use AssumeVersion instead of Unversioned
...
Which lets us revert the unversioned serialization of TLogSpillType
2019-10-03 15:59:09 -07:00
Alex Miller
9401a6941a
Code review nits
...
const correctness and file renaming in comment.
Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>
2019-10-03 15:53:39 -07:00
Alex Miller
6742222084
Make TLogServer able to spill by value and by reference
...
...and test it in simulation, but not combined yet.
It turns out that because of txsTag, we basically had to support
spill-by-value anyway. Thus, if we treat all tags like txsTag when
spilling and peeking, then we have an easy way to bring the two spilling
types back into one implementation.
2019-10-03 01:45:10 -07:00
Alex Miller
d38a96ab73
Make LogData aware of the spill type it was created to perform.
...
The spilling type is now pulled out of the request, and then stored on
LogData for later access, and persisted in the tlog metadata per tlog
generation.
It turns out that serializing types as Unversioned is a bit wonky.
2019-10-03 01:45:10 -07:00
Evan Tschannen
b495cc697b
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-09-13 09:25:08 -07:00
Alex Miller
53bcf41805
Fix the build.
2019-09-12 18:46:30 -07:00
Alex Miller
befa0646b3
Merge remote-tracking branch 'upstream/release-6.2' into faster-remote-dc
2019-09-12 18:46:03 -07:00
Evan Tschannen
6a7f109788
added logging on the TLog for the tag with smallest popped version
2019-09-12 16:22:01 -07:00
Alex Miller
99843bd4ba
Add parallel peek support to log routers
2019-09-12 14:26:37 -07:00
Evan Tschannen
94668c6f1f
Merge pull request #2063 from jzhou77/clang
...
Refactor deserialization of on-wire buffer with TagsAndMessage
2019-09-09 16:34:56 -07:00
Jingyu Zhou
2d5ebebb7b
Use TagsAndMessage for deserialization in TLogServer
2019-09-05 16:53:10 -07:00
Jingyu Zhou
2723922f5f
Replace -1 as VERSION_HEADER constant for serialization
2019-09-05 12:45:39 -07:00
Meng Xu
c2355f721e
Merge branch 'master' into mengxu/performant-restore-PR
2019-09-04 17:11:42 -07:00
Jingyu Zhou
cd3f1e33d4
Refactor deserialization of TagsAndMessages
...
Consolidate deserialization of TagsAndMessages in the structure itself and
change both TLog and ServerPeekCursor to use it.
2019-09-04 14:55:05 -07:00
Evan Tschannen
24aad14f06
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-08-30 17:23:58 -07:00
Evan Tschannen
dc1d055b27
Merge pull request #2042 from senthil-ram/snap_cli_fix
...
fix fdbcli --exec 'snapshot create.sh' failure
2019-08-30 13:40:38 -07:00
sramamoorthy
b3277f2982
Fix #2009 posix compliant args for snapshot binary
2019-08-30 12:54:09 -07:00
Andrew Noyes
6aa0ada7b1
Replace scalar root types with proper messages
2019-08-28 14:40:50 -07:00
Jingyu Zhou
4a63de16e9
Merge pull request #1945 from xumengpanda/mengxu/tLog-code-read-v2
...
Add comments to DiskQueue and tLog
2019-08-08 13:24:32 -07:00
Meng Xu
c9c50ceff8
Comments:Add comments to DiskQueue
...
No functional change.
2019-08-01 15:20:01 -07:00
Meng Xu
7ccaeddf05
Merge branch 'master' into mengxu/performant-restore-PR
2019-08-01 13:23:17 -07:00
Evan Tschannen
3774ff55b0
There were still use cases where these checks are necessary
2019-07-31 17:45:21 -07:00
Evan Tschannen
854ee75664
we no longer need to special case for txs tag, because it will be initialized by createTagData
2019-07-31 17:13:15 -07:00