Meng Xu
ab2dd36bdc
FastRestore:Generic way to detect straggler
2020-02-21 14:30:08 -08:00
Meng Xu
e4258d73f5
FastRestore:Applier:Remove applying actors that do not have good perf
2020-02-19 15:27:59 -08:00
Meng Xu
03f699f2f9
Merge branch 'master' into mengxu/fast-restore-applier-multi-applying-PR
2020-02-19 15:22:33 -08:00
Meng Xu
551f1ba4d2
FastRestore:Minor revision for code review
2020-02-19 11:52:24 -08:00
Meng Xu
0d668ea0c3
FastRestore:Applier:Add more trace for perf tracking
2020-02-13 15:50:10 -08:00
Meng Xu
acf34319c1
FastRestore:Applier:Precompute mutations and apply in parallel
...
Precompute mutations received by an applier;
Only apply the final result to the destination DB;
Execute multiple txns in parallel to apply final results to the destination DB.
2020-02-12 22:47:48 -08:00
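Illustrative note on acf34319c1: the three steps in the commit body (precompute, keep only the final result, apply with parallel txns) can be sketched roughly as below. This is not the FastRestore applier code. The mutation tuple layout, the op names, the dict standing in for the destination DB, and the `batch_size`/`workers` parameters are all assumptions made for the sketch, and it assumes every key starts out absent in the destination.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def precompute(mutations):
    """Fold (version, op, key, value) mutations into one final value per key.

    None marks a key whose final state is "cleared". Hypothetical op names.
    """
    final = {}
    # sorted() is stable, so mutations at the same version keep input order.
    for _version, op, key, value in sorted(mutations, key=lambda m: m[0]):
        if op == "set":
            final[key] = value
        elif op == "clear":
            final[key] = None
        elif op == "add":
            # Assumption: a missing or cleared key counts as 0.
            final[key] = (final.get(key) or 0) + value
    return final


def apply_in_parallel(final, db, batch_size=2, workers=4):
    """Write the final results to `db` in batches, one simulated txn per batch."""
    items = list(final.items())
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    lock = threading.Lock()  # stands in for transactional isolation

    def run_txn(batch):
        with lock:
            for key, value in batch:
                if value is None:
                    db.pop(key, None)
                else:
                    db[key] = value

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_txn, batches))
    return len(batches)
```

The point of the fold is that a key touched many times during the backup is written to the destination only once.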
Meng Xu
cda8fc189e
FastRestore:AtomicOp:Intro weighted size for atomicOp
...
atomicOp has an amplified performance overhead to the cluster,
for example, an ADD operation can be small, but SS has to load
the value to do the operation and the value can be large.
2020-02-11 12:48:05 -08:00
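Illustrative note on cda8fc189e: one way to read "weighted size" is that an atomic op is charged more than its raw byte count, because the storage server has to load the existing value to apply it. A minimal sketch under that reading follows; the factor of 100, the op names, and the function name are hypothetical and are not taken from the FoundationDB source.

```python
# Hypothetical amplification factor; the actual knob and value are not shown in this log.
ATOMIC_OP_WEIGHT = 100

# Hypothetical op names treated as atomic operations.
ATOMIC_OPS = {"add", "and", "or", "xor", "max", "min"}


def weighted_size(op_type, key, value):
    """Return the byte cost charged for one mutation.

    Plain mutations cost their raw size; atomic ops are multiplied by a
    weight to account for the read the storage server must perform.
    """
    raw = len(key) + len(value)
    return raw * ATOMIC_OP_WEIGHT if op_type in ATOMIC_OPS else raw
```

With these assumed numbers, a 3-byte `set` is charged 3 bytes while a 3-byte `add` is charged 300, so batches containing atomic ops fill up sooner.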
Meng Xu
dbce1e9974
FastRestore:Applier:Add metrics counter and proc counter
2020-02-10 16:38:26 -08:00
Meng Xu
1fc793d6a7
FastRestore:Loader:Add metrics counter
2020-02-09 22:06:14 -08:00
Meng Xu
fd5b4af05a
FastRestore:Add trace for each phase on master
2020-02-09 18:54:10 -08:00
Meng Xu
cf331b9a03
FastRestore:monitorFinishedVersion for measuring perf quickly
2020-02-05 14:26:25 -08:00
Meng Xu
3b57bf1781
Merge branch 'master' into mengxu/fast-restore-agent-PR
2020-02-03 17:23:54 -08:00
Meng Xu
aa601adcd7
Apply suggestions from code review
...
Correct typo.
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-02-03 09:49:26 -08:00
Meng Xu
962024d8b8
Explain knob SHARD_MAX_BYTES_PER_KSEC
...
Explain why it may cost 100MB data movement.
No code change.
2020-01-31 17:04:11 -08:00
Meng Xu
7f37a90c48
FastRestore:Introduce FASTRESTORE_VB_PARALLELISM
...
for controlling the number of concurrently running version batches.
2020-01-28 10:39:57 -08:00
Meng Xu
0f4dfeda5b
FastRestore:Test random number of appliers and loaders
2020-01-27 20:19:58 -08:00
Meng Xu
3cfe1f031d
FastRestore:Use set instead of vector for keysplitter
...
and disable testing for random number of appliers and loaders
2020-01-27 20:14:33 -08:00
Meng Xu
141609e80a
FastRestore:Improve code style and fix typos
2020-01-27 18:13:14 -08:00
Meng Xu
b04e98771e
FastRestore:Replace FastRestoreOpConfig with Knobs
...
And randomize value for the rest of knobs
2020-01-24 14:24:34 -08:00
Jingyu Zhou
1311fec45a
Add an option to get minKnownCommittedVersion from Proxies
...
The backup worker needs to use this version for popping when running in a NOOP
mode. This option is added to GetReadVersionRequest and proxies will send back
minKnownCommittedVersion if the option is set.
Also add a couple of knobs for backup workers.
2020-01-22 19:42:13 -08:00
Jingyu Zhou
0e5f5b50f0
Remove unused backup worker knobs
2020-01-22 19:38:46 -08:00
Jingyu Zhou
a4d6ebe79e
Recruit backup worker in newEpoch
2020-01-22 19:37:48 -08:00
Jingyu Zhou
de8d953865
Add backup role, class, and worker skeleton
2020-01-22 19:35:30 -08:00
Evan Tschannen
54d77d20b2
Merge branch 'release-6.2'
2020-01-19 15:22:49 -08:00
Evan Tschannen
8197f0562f
merge priority did not need to be raised, because we no longer merge shards until they are untrackable
...
max_commit_updates was too large, and could cause proxies to run out of memory
2020-01-17 14:24:58 -08:00
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen
4b90487b90
occasionally throw wrong_shard_server when waitMetrics times out so that the waitMetrics request can get the correct number of shards if two shards have been merged or split and the same storage server owns all the chunks
2020-01-15 13:22:18 -08:00
A.J. Beamon
9668c4471f
Clamp infinite limit in ratekeeper
2020-01-14 15:45:24 -08:00
Evan Tschannen
17e97f24e4
Merge pull request #2526 from etschannen/feature-dd-improvements
...
Data distribution improvements
2020-01-10 17:53:22 -08:00
Evan Tschannen
fde53cbeef
HasBeenTrueFor was ready immediately after a previous shard merge
2020-01-10 16:28:56 -08:00
Evan Tschannen
9b80498180
Added a trace event to warn if a shard is merged before enough time has elapsed from becoming low bandwidth
2020-01-10 14:58:38 -08:00
Evan Tschannen
4aab9b7bc8
fix: clients would waste time attempting to read from a remote region when it was in the process of catching up
2020-01-10 12:23:59 -08:00
Evan Tschannen
02a8e8d1e9
batch priority must be heavily throttled before stopping data distribution rebalancing
2020-01-09 17:05:22 -08:00
Evan Tschannen
9842272ced
raised the priority of shard merges, because the tracker cannot track an unmerged shard
2020-01-09 17:04:17 -08:00
Evan Tschannen
e4fa4ad0c9
Data distribution will not merge a shard unless it has been low bandwidth for 5 minutes
2020-01-09 17:02:49 -08:00
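Illustrative note on e4fa4ad0c9: the rule "do not merge unless the shard has been low bandwidth for 5 minutes" amounts to a timer that restarts whenever the condition stops holding. The sketch below shows that idea only. The class and method names are hypothetical and this is not the data distribution tracker's implementation; the 300-second default is the 5 minutes stated in the commit subject.

```python
class LowBandwidthTimer:
    """Tracks how long a shard has been continuously low bandwidth."""

    def __init__(self, required_seconds=300.0):
        self.required = required_seconds
        self.since = None  # time the shard last became low bandwidth

    def update(self, is_low_bandwidth, now):
        """Record the latest bandwidth observation at time `now` (seconds)."""
        if not is_low_bandwidth:
            self.since = None  # condition broken: the timer restarts
        elif self.since is None:
            self.since = now

    def can_merge(self, now):
        """True only if the shard has been low bandwidth for the full period."""
        return self.since is not None and now - self.since >= self.required
```

Any observation of normal bandwidth clears the start time, so a shard that flips back and forth never becomes eligible for a merge.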
Evan Tschannen
ab7071932f
Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly
2020-01-09 16:59:37 -08:00
Meng Xu
39a4f2372f
Change FASTRESTORE_SAMPLING_PERCENT to 0 to 100
2019-12-04 21:26:27 -08:00
Meng Xu
c6b36dbffb
FastRestore:Sampling:Resolve review comments
2019-12-04 17:35:11 -08:00
Meng Xu
dd91d26dfa
FastRestore:Sampling:Add FASTRESTORE_SAMPLING_RATE knob
2019-12-04 11:46:29 -08:00
Evan Tschannen
07331ab5fd
Merge pull request #2362 from etschannen/master
...
Merge 6.2 into master
2019-12-02 15:04:27 -08:00
Evan Tschannen
ebcb2f79ed
Merge branch 'master' of github.com:apple/foundationdb
2019-11-22 15:34:49 -08:00
Xin Dong
14dd5626d7
Resolve review comments
2019-11-22 10:11:45 -08:00
Xin Dong
b6e1839d84
Code clean up
2019-11-21 13:39:19 -08:00
Xin Dong
b282e180d5
Added a knob to disable read sampling
2019-11-20 14:03:20 -08:00
Evan Tschannen
8d3ef89540
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbclient/MutationList.h
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2019-11-14 15:49:56 -08:00
negoyal
a4a0bf18f9
Merging with Master.
2019-11-12 13:01:29 -08:00
Evan Tschannen
396dccbc98
when peeking from satellites we do not need to limit the amount of peeking on log router tags, because that is the only thing that can be peeked from a satellite log
2019-11-08 18:34:05 -08:00
Evan Tschannen
afc9713005
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbclient/FDBTypes.h
# fdbserver/LogSystem.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/OldTLogServer_6_0.actor.cpp
# fdbserver/TLogServer.actor.cpp
# versions.target
2019-11-06 13:45:37 -08:00
Evan Tschannen
daac8a2c22
Knobified a few variables
2019-11-04 20:21:38 -08:00
Evan Tschannen
71dfaa3f95
Merge pull request #2275 from dongxinEric/bugfix/2273/fix-read-key-sampling
...
Resolves #2273 : Use a large value for read sampling size threshold. Also at sampling …
2019-10-31 10:21:58 -07:00