291 Commits

Author SHA1 Message Date
Meng Xu ab2dd36bdc FastRestore:Generic way to detect straggler 2020-02-21 14:30:08 -08:00
Meng Xu e4258d73f5 FastRestore:Applier:Remove applying actors that do not have good perf 2020-02-19 15:27:59 -08:00
Meng Xu 03f699f2f9 Merge branch 'master' into mengxu/fast-restore-applier-multi-applying-PR 2020-02-19 15:22:33 -08:00
Meng Xu 551f1ba4d2 FastRestore:Minor revision for code review 2020-02-19 11:52:24 -08:00
Meng Xu 0d668ea0c3 FastRestore:Applier:Add more trace for perf tracking 2020-02-13 15:50:10 -08:00
Meng Xu acf34319c1 FastRestore:Applier:Precompute mutations and apply in parallel
Precompute mutations received by an applier;
Only apply the final result to the destination DB;
Execute multiple txns in parallel to apply final results to the destination DB.
2020-02-12 22:47:48 -08:00
Meng Xu cda8fc189e FastRestore:AtomicOp:Intro weighted size for atomicOp
atomicOp has an amplified performance overhead on the cluster.
For example, an ADD operation can be small, but the SS has to load
the value to do the operation, and the value can be large.
2020-02-11 12:48:05 -08:00
Meng Xu dbce1e9974 FastRestore:Applier:Add metrics counter and proc counter 2020-02-10 16:38:26 -08:00
Meng Xu 1fc793d6a7 FastRestore:Loader:Add metrics counter 2020-02-09 22:06:14 -08:00
Meng Xu fd5b4af05a FastRestore:Add trace for each phase on master 2020-02-09 18:54:10 -08:00
Meng Xu cf331b9a03 FastRestore:monitorFinishedVersion for measuring perf quickly 2020-02-05 14:26:25 -08:00
Meng Xu 3b57bf1781 Merge branch 'master' into mengxu/fast-restore-agent-PR 2020-02-03 17:23:54 -08:00
Meng Xu aa601adcd7
Apply suggestions from code review
Correct typo.

Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-02-03 09:49:26 -08:00
Meng Xu 962024d8b8 Explain knob SHARD_MAX_BYTES_PER_KSEC
Explain why it may cost 100MB data movement.
No code change.
2020-01-31 17:04:11 -08:00
Meng Xu 7f37a90c48 FastRestore:Introduce FASTRESTORE_VB_PARALLELISM
for controlling the number of concurrently running version batches.
2020-01-28 10:39:57 -08:00
Meng Xu 0f4dfeda5b FastRestore:Test random number of appliers and loaders 2020-01-27 20:19:58 -08:00
Meng Xu 3cfe1f031d FastRestore:Use set instead of vector for keysplitter
and disable testing for random number of appliers and loaders
2020-01-27 20:14:33 -08:00
Meng Xu 141609e80a FastRestore:Improve code style and fix typos 2020-01-27 18:13:14 -08:00
Meng Xu b04e98771e FastRestore:Replace FastRestoreOpConfig with Knobs
And randomize value for the rest of knobs
2020-01-24 14:24:34 -08:00
Jingyu Zhou 1311fec45a Add an option to get minKnownCommittedVersion from Proxies
The backup worker needs to use this version for popping when running in a NOOP
mode. This option is added to GetReadVersionRequest and proxies will send back
minKnownCommittedVersion if the option is set.

Also add a couple of knobs for backup workers.
2020-01-22 19:42:13 -08:00
Jingyu Zhou 0e5f5b50f0 Remove unused backup worker knobs 2020-01-22 19:38:46 -08:00
Jingyu Zhou a4d6ebe79e Recruit backup worker in newEpoch 2020-01-22 19:37:48 -08:00
Jingyu Zhou de8d953865 Add backup role, class, and worker skeleton 2020-01-22 19:35:30 -08:00
Evan Tschannen 54d77d20b2 Merge branch 'release-6.2' 2020-01-19 15:22:49 -08:00
Evan Tschannen 8197f0562f merge priority did not need to be raised, because we no longer merge shards until they are untrackable
max_commit_updates was too large, and could cause proxies to run out of memory
2020-01-17 14:24:58 -08:00
Evan Tschannen 3f9d9d8b84 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	cmake/FlowCommands.cmake
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/StorageServerInterface.h
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/Knobs.h
#	flow/Platform.cpp
#	versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen 4b90487b90 occasionally throw wrong_shard_server when waitMetrics times out so that the waitMetrics request can get the correct number of shards if two shards have been merged or split and the same storage server owns all the chunks 2020-01-15 13:22:18 -08:00
A.J. Beamon 9668c4471f Clamp infinite limit in ratekeeper 2020-01-14 15:45:24 -08:00
Evan Tschannen 17e97f24e4
Merge pull request #2526 from etschannen/feature-dd-improvements
Data distribution improvements
2020-01-10 17:53:22 -08:00
Evan Tschannen fde53cbeef HasBeenTrueFor was ready immediately after a previous shard merge 2020-01-10 16:28:56 -08:00
Evan Tschannen 9b80498180 Added a trace event to warn if a shard is merged before enough time has elapsed from becoming low bandwidth 2020-01-10 14:58:38 -08:00
Evan Tschannen 4aab9b7bc8 fix: clients would waste time attempting to read from a remote region when it was in the process of catching up 2020-01-10 12:23:59 -08:00
Evan Tschannen 02a8e8d1e9 batch priority must be heavily throttled before stopping data distribution rebalancing 2020-01-09 17:05:22 -08:00
Evan Tschannen 9842272ced raised the priority of shard merges, because the tracker cannot track an unmerged shard 2020-01-09 17:04:17 -08:00
Evan Tschannen e4fa4ad0c9 Data distribution will not merge a shard unless it has been low bandwidth for 5 minutes 2020-01-09 17:02:49 -08:00
Evan Tschannen ab7071932f Data distribution no longer attempts to pick teams which share members of the source unless the team matches exactly 2020-01-09 16:59:37 -08:00
Meng Xu 39a4f2372f Change FASTRESTORE_SAMPLING_PERCENT to range from 0 to 100 2019-12-04 21:26:27 -08:00
Meng Xu c6b36dbffb FastRestore:Sampling:Resolve review comments 2019-12-04 17:35:11 -08:00
Meng Xu dd91d26dfa FastRestore:Sampling:Add FASTRESTORE_SAMPLING_RATE knob 2019-12-04 11:46:29 -08:00
Evan Tschannen 07331ab5fd
Merge pull request #2362 from etschannen/master
Merge 6.2 into master
2019-12-02 15:04:27 -08:00
Evan Tschannen ebcb2f79ed Merge branch 'master' of github.com:apple/foundationdb 2019-11-22 15:34:49 -08:00
Xin Dong 14dd5626d7 Resolve review comments 2019-11-22 10:11:45 -08:00
Xin Dong b6e1839d84 Code clean up 2019-11-21 13:39:19 -08:00
Xin Dong b282e180d5 Added a knob to disable read sampling 2019-11-20 14:03:20 -08:00
Evan Tschannen 8d3ef89540 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/MutationList.h
#	fdbserver/MasterProxyServer.actor.cpp
#	versions.target
2019-11-14 15:49:56 -08:00
negoyal a4a0bf18f9 Merging with Master. 2019-11-12 13:01:29 -08:00
Evan Tschannen 396dccbc98 when peeking from satellites we do not need to limit the amount of peeking on log router tags, because that is the only thing that can be peeked from a satellite log 2019-11-08 18:34:05 -08:00
Evan Tschannen afc9713005 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/FDBTypes.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/OldTLogServer_6_0.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	versions.target
2019-11-06 13:45:37 -08:00
Evan Tschannen daac8a2c22 Knobified a few variables 2019-11-04 20:21:38 -08:00
Evan Tschannen 71dfaa3f95
Merge pull request #2275 from dongxinEric/bugfix/2273/fix-read-key-sampling
Resolves #2273: Use a large value for read sampling size threshold. Also at sampling …
2019-10-31 10:21:58 -07:00