Commit Graph

157 Commits

Author SHA1 Message Date
Meng Xu 505997ba0a FastRestore:Switch to new sendBatchRequests that tracks performance and straggler 2020-02-21 15:45:32 -08:00
Meng Xu 03f699f2f9 Merge branch 'master' into mengxu/fast-restore-applier-multi-applying-PR 2020-02-19 15:22:33 -08:00
Meng Xu 94d799552e FastRestore:Apply clang-format against master 2020-02-18 16:41:59 -08:00
Meng Xu 132f5aa9ba FastRestore:Improve trace name and cosmetic change 2020-02-18 16:41:19 -08:00
Meng Xu b5e60585aa FastRestore:Applier:Fix precompute mutation result 2020-02-13 12:57:47 -08:00
Meng Xu cda8fc189e FastRestore:AtomicOp:Intro weighted size for atomicOp
atomicOp has an amplified performance overhead on the cluster.
For example, an ADD operation can be small, but SS has to load
the value to do the operation and the value can be large.
2020-02-11 12:48:05 -08:00
Meng Xu e76b6d824a FastRestore:Assign priority to actors to prioritize vb work
When we pipeline multiple version batches, we should prevent a later
version batch from blocking the earlier version batch by consuming
CPU resources.

To achieve the above, we should assign higher priority to actors
in later phases in a version batch.

Because the restore master will not invoke an actor at a later phase until
the actors at the earlier phases have finished, this priority assignment
will not cause deadlock.
2020-02-10 20:29:23 -08:00
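The priority rule described in the commit above can be illustrated with a small run queue. This is only a sketch under stated assumptions: the phase names, the `Task` struct, and the tie-break on version batch index are hypothetical and are not taken from the FastRestore code, which uses Flow actor priorities rather than a `std::priority_queue`.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical phase names; a larger value means a later phase in a version batch.
enum class Phase : int { LoadFiles = 0, SendMutations = 1, ApplyToDB = 2 };

struct Task {
    int versionBatch; // index of the version batch this work belongs to
    Phase phase;
};

// Returns true when `a` should run after `b`.
struct LowerPriority {
    bool operator()(const Task& a, const Task& b) const {
        // Later phases get higher priority.
        if (a.phase != b.phase) return static_cast<int>(a.phase) < static_cast<int>(b.phase);
        // Assumed tie-break: within the same phase, the earlier version batch runs first.
        return a.versionBatch > b.versionBatch;
    }
};

using RunQueue = std::priority_queue<Task, std::vector<Task>, LowerPriority>;

// Pop every task and return them in the order they would run.
std::vector<Task> runOrder(RunQueue queue) {
    std::vector<Task> order;
    while (!queue.empty()) {
        // Sanity check: the next task never outranks the one that ran before it.
        assert(order.empty() || !LowerPriority{}(order.back(), queue.top()));
        order.push_back(queue.top());
        queue.pop();
    }
    return order;
}
```

With tasks from two pipelined version batches in the queue, work in the apply phase of batch 1 runs before the loading work of batch 2, so a later batch cannot starve an earlier one of CPU.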
Meng Xu dbce1e9974 FastRestore:Applier:Add metrics counter and proc counter 2020-02-10 16:38:26 -08:00
Meng Xu 1fc793d6a7 FastRestore:Loader:Add metrics counter 2020-02-09 22:06:14 -08:00
Meng Xu 72110de7e2 FastRestore:Add trace for quick perf. measurement 2020-02-06 19:48:26 -08:00
Meng Xu cab9d51e06 Merge branch 'master' into mengxu/fast-restore-pipeline-PR 2020-01-27 18:16:26 -08:00
Meng Xu 141609e80a FastRestore:Improve code style and fix typos 2020-01-27 18:13:14 -08:00
Meng Xu cfdcddd90e FastRestore:Loader:Pipeline sendMutationsToApplier actors 2020-01-23 20:22:05 -08:00
Meng Xu e011f39829 FastRestore:Add sanity check and trace events 2020-01-23 16:03:41 -08:00
Meng Xu 009fcdeb16 FastRestore:Sanity check each restore asset is processed exactly once 2020-01-21 17:17:45 -08:00
Meng Xu 022783b449 Start batches in reverse order for testing and code cleanup 2020-01-21 14:49:40 -08:00
Meng Xu 4ac92d223b Cleanup batch buffer for each restore request 2020-01-21 14:49:36 -08:00
Meng Xu d69bd2f661 FastRestore:Loader buffer data for multiple batches 2020-01-17 17:01:06 -08:00
Meng Xu bfbf2164c4 FastRestore:Applier buffer data for multiple batches 2020-01-17 17:01:01 -08:00
Meng Xu f436ea806e FastRestore:Resolve review comment
1) Sort logfiles by endVersion

2) Exit program early when restore will not succeed

3) Do not increase nextVersion unnecessarily when
calculating version batches.

4) Change assert condition that ensures progress in
calculating version batches.
2020-01-13 14:08:27 -08:00
Meng Xu c29e380076 FastRestore:Remove prevVersion from LoadingParam 2020-01-07 14:59:17 -08:00
Meng Xu 9df02512ab FastRestore:Apply clang-format 2020-01-07 11:50:32 -08:00
Meng Xu 67e913c3d5 Change LoadingParam struct and endVersion definition
1) Remove endVersion field because it has been included in RestoreAsset;

2) Ensure endVersion in VersionBatch and RestoreAsset is always exclusive;

3) Revise ASSERT in loader and applier in situations when the dummy commit version
is endVersion, to avoid false positive ASSERT failure.
2020-01-07 11:48:03 -08:00
Meng Xu c3f8f3b445 FastRestore:Build VersionBatch less than threshold size 2020-01-07 11:46:56 -08:00
Meng Xu c10035ba54 FastRestore:Use isInVersionRange based on code review 2019-12-23 15:01:27 -08:00
Meng Xu 8d6f511816 FastRestore:Resolve review comment
Filter out range mutations that do not overlap with the restore range.
Small changes on format.
2019-12-22 20:09:10 -08:00
Meng Xu 61b29de3ce FastRestore:Self code review
Clean up commented code;
Add sanity check.
2019-12-20 22:24:34 -08:00
Meng Xu ddcf3fdd80 FastRestore:Apply clang format 2019-12-20 22:00:36 -08:00
Meng Xu 2cd1f0780a FastRestore:Split asset to subasset for async parsing files 2019-12-20 21:44:40 -08:00
Meng Xu e98b2a0d1c FastRestore:Introduce RestoreAsset 2019-12-20 18:00:10 -08:00
Meng Xu ffc8f76710 FastRestore:Rename StringRefReaderMX to BackupStringRefReader 2019-12-19 11:49:37 -08:00
Meng Xu b5d7890ce0 FastRestore:Resolve review comments 2019-12-12 07:45:30 -08:00
Meng Xu 9670d64fbd FastRestore:Remove commented code 2019-12-11 16:48:40 -08:00
Meng Xu 1371db4cdc FastRestore:Self code review and cleanup
1. Review memory use cases and improve:
Ensure state variable is initialized and
change unnecessary state variable to variable.

2. Remove debug code that is no longer useful;

3. Mute verbose debug.
2019-12-11 16:37:33 -08:00
Meng Xu 9a6dabe47e Merge branch 'mengxu/fastrestore-code-cleanup-PR' into mengxu/fast-restore-fix-valgrind-PR 2019-12-10 20:05:35 -08:00
Meng Xu feb2a8c70c FastRestore Change RestoreSendMutationVectorVersionedRequest name
Change RestoreSendMutationVectorVersionedRequest to
RestoreSendVersionedMutationsRequest for better naming
2019-12-10 17:23:40 -08:00
Meng Xu 20a19978f9 FastRestore:LoadingParam cleanup 2019-12-10 17:20:44 -08:00
Meng Xu e8dfc1c187 Replace pop_front(size) with new empty standalone obj 2019-12-06 23:16:49 -08:00
Meng Xu 4a66366a05 Use MutationsVec instead of VectorRef 2019-12-06 22:00:40 -08:00
Meng Xu 39a4f2372f Change FASTRESTORE_SAMPLING_PERCENT to 0 to 100 2019-12-04 21:26:27 -08:00
Meng Xu c6b36dbffb FastRestore:Sampling:Resolve review comments 2019-12-04 17:35:11 -08:00
Meng Xu dd91d26dfa FastRestore:Sampling:Add FASTRESTORE_SAMPLING_RATE knob 2019-12-04 11:46:29 -08:00
Meng Xu 2b987d1945 FastRestore:typedef Standalone<VectorRef<MutationRef>> MutationsVec 2019-12-04 11:39:55 -08:00
Meng Xu 9383c3f0a6 FastRestore:Sampling:Apply clang format 2019-12-03 21:27:06 -08:00
Meng Xu 3310f67e9e Merge branch 'mengxu/fast-restore-fix-valgrind-PR' into mengxu/fast-restore-sampling-PR 2019-12-03 16:24:40 -08:00
Meng Xu 153b713b53 FastRestore:Add sampling on parsed mutations 2019-12-03 12:52:17 -08:00
Meng Xu 474f0067c4 Remove unneeded state 2019-11-25 23:10:14 -08:00
Meng Xu a04f314b1b
Merge pull request #2383 from jzhou77/restore
Use sizeof() to replace constant numbers
2019-11-22 16:14:44 -08:00
Jingyu Zhou 037e808253 Address review comments by changing variable names 2019-11-22 13:12:04 -08:00
Jingyu Zhou 9927a9013f Use sizeof() to replace constant numbers 2019-11-22 11:47:25 -08:00
Meng Xu 78f10f15b3 FastRestore:replace insert with emplace for map and vector
This resolves the review suggestions.
2019-11-21 22:47:04 -08:00
Meng Xu 343bcd104a FastRestore:Apply Clang format 2019-11-20 21:04:18 -08:00
Meng Xu 3f5491318d FastRestore:Fix bug that cause nondeterminism
1) Use map iterator instead of pointer to maintain stability when entries are inserted into or deleted from the map
2) dummySampleWorkload: clear rangeToApplier data in each sampling phase. Otherwise, we can
have an increasing number of keys assigned to the applier.
2019-11-15 11:30:09 -08:00
Meng Xu 9e36b897e6 FastRestore:Loaders must send to appliers log files data before range files 2019-11-12 21:43:12 -08:00
Meng Xu 592f4c0fc4 FastRestore:Remove RestoreSetApplierKeyRangeVectorRequest 2019-11-12 17:59:11 -08:00
Meng Xu 7e4c4ea98e FastRestore:Load mutations before assign ranges to appliers 2019-11-12 17:14:17 -08:00
Jingyu Zhou ae7e42face
Merge pull request #2313 from xumengpanda/mengxu/fastrestore-applyToDB-bugfix-PR
Performant restore [8/XX]: Fix bugs in applyToDB logic and add more tests
2019-11-12 08:50:23 -08:00
Meng Xu 630c29d160 FastRestore:resolve review comments
1) wait on whenAtLeast;
2) Put BigEndian64 into the function call and the decoder to prevent
future people from making the same mistake.
2019-11-11 17:00:16 -08:00
A.J. Beamon cf2ec3418c
Merge pull request #2317 from xumengpanda/mengxu/fastrestore-extend-atomicOpTest-PR
AtomicOps Test: Add more detailed debug information when test fails with opType = AddValue
2019-11-11 15:03:10 -08:00
Meng Xu 0ccded1929 AtomicOps:Resolve review comments 2019-11-05 19:27:49 -08:00
Meng Xu c4d1e6e1a9 Trace:Severity:Include SevNoInfo to mute trace
Define SevFRMutationInfo to trace mutations in restore.
2019-11-04 16:18:40 -08:00
Meng Xu e345c9061f FastRestore:Refine debug messages 2019-11-04 11:47:38 -08:00
Meng Xu 7903b47b82 FastRestore:Remove unnecessary return 2019-10-24 13:09:24 -07:00
Meng Xu c53f817c5e FastRestore:Convert handleInitVersionBatchRequest to plain func 2019-10-24 13:06:50 -07:00
Meng Xu 60d26ff5d7 FastRestore:Resolve review comments 2019-10-24 12:52:12 -07:00
Meng Xu bae0c907a6 FastRestore:Convert unnecessary actor function to plain function 2019-10-23 15:10:34 -07:00
Meng Xu ab4a375b95 FastRestore:RestoreLoader:Define SerializedMutationPartMap type 2019-10-17 10:12:38 -07:00
Meng Xu 78b1ebc7c2 FastRestore:Loader:Handle multiple mutations at same versions in multiple files 2019-10-16 20:57:16 -07:00
Meng Xu d160810662 FastRestore:Resolve review comments 2019-09-04 16:48:43 -07:00
Meng Xu 9cc832cfd6 FastRestore:Fix Mac and Windows compilation error 2019-08-02 14:33:08 -07:00
Meng Xu 3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu 45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
Meng Xu f1741aa90d FastRestore: Resolve review comments
1) Do not keep restore role data (e.g., masterData) in restore worker;
2) Change function parameter list by only passing in the needed variables in role data;
3) Remove unnecessary files vector from masterData;
4) Fix typos in comments and some function names.
2019-07-24 17:51:53 -07:00
Meng Xu 701676dbd2 FastRestore:Refactor code and add missing files
Add RestoreWorker.actor.cpp and RestoreWorkerInterface.actor.h back.
2019-06-18 09:54:27 -07:00
Meng Xu 022b555b69 FastRestore:Fix bug in finish restore
RestoreMaster may not receive all acks for the last command, i.e., finishRestore,
because RestoreLoaders and RestoreAppliers exit immediately after sending the ack.
If the ack is lost, it will not be resent.

This commit also removes some unneeded code.
This commit passes 50k random tests without errors.
2019-06-05 20:07:18 -07:00
Meng Xu 3fcb6ec0a1 FastRestore:Refactor RestoreLoader and fix bugs
Refactor RestoreLoader code and
Fix a bug in notifying restore finish.
2019-06-04 21:53:31 -07:00
Meng Xu 477fd152c0 FastRestore:Refactor code
1) Use the runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition in loop-choose-when for
   the core routine of restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
   the file only has functionalities related to restore worker.

Passed correctness test
2019-06-04 11:22:47 -07:00
Meng Xu a372c82db2 FastRestore:BugFix:Loader must distinguish range and log mutations sent to appliers 2019-05-30 21:22:33 -07:00
Meng Xu 450bda9a01 FastRestore:Refactor parsing backup file code
Refactor _parseRangeFileToMutationsOnLoader and
_parseLogFileToMutationsOnLoader functions and their callees
2019-05-30 14:01:48 -07:00
Meng Xu a3f61e6df7 FastRestore:Refactor:Reduce code size
1) Use runRYWTransaction to replace the loop-try style;
2) Remove unnecessary printf
3) Do not mistakenly send reply twice.
2019-05-29 17:03:50 -07:00
Meng Xu 9e1216af1c FastRestore:Remove CMDUID 2019-05-29 13:48:04 -07:00
Meng Xu 4f484a2a5d FastRestore:Refactor out the use of cmdID and other non-must functions 2019-05-29 13:26:17 -07:00
Meng Xu d56837ba16 FastRestore:Refactor LoadFileRequest
1) Remove global map to buffer the parsed mutations on loader.
   Use local map instead to increase parallelism.
2) Use std::map<LoadingParam, Future<Void>> to hold the actor
that parse a backup file and to de-duplicate requests.
3) Remove unused code.
2019-05-28 18:39:00 -07:00
Meng Xu fe2624fc22 FastRestore:Remove sampling phase
Remove the sampling phase to make the PR easier to review.
The sampling design and implementation may be changed and added in
the next PR.
2019-05-26 21:34:58 -07:00
Meng Xu 3eadb31798 FastRestore:Resolve two major review comments
1) Add sendBatchRequests and getBatchReplies

sendBatchRequests is a generic actor to send requests without
processing replies.
getBatchReplies is similar to sendBatchRequests except that
it returns the replies to the caller.

2) Share applier interface to loaders by using RequestStream,
instead of using DB.
   Create RestoreSysInfo struct, which serves a similar purpose as DBInfo, for
   the restore system information that is shared among restore workers.
2019-05-24 21:53:21 -07:00
Meng Xu fac63a83c4 FastRestore:Use NotifiedVersion to deduplicate requests
Add a NotifiedVersion into an applier data which represents
the smallest version the applier is at.

When a loader sends mutation vector to appliers, it sends
the request that contains prevVersion and commitVersion.

This commit also puts actors into an actorCollector for
loop-choose-when situations.
2019-05-22 22:09:54 -07:00
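The deduplication idea in the commit above can be sketched as follows. This is a simplified, synchronous illustration and not the Flow implementation: `ApplierState`, `Outcome`, and `handle` are hypothetical names, and a plain integer stands in for `NotifiedVersion`. A real actor would presumably wait for the applier to reach prevVersion instead of returning a "not ready" result.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Version = int64_t;

enum class Outcome { Applied, Duplicate, NotReady };

// Hypothetical stand-in for an applier that tracks the version it has reached.
struct ApplierState {
    Version processedVersion = 0; // highest commitVersion applied so far
    std::vector<Version> applied; // commit versions applied, kept for inspection

    // Each request carries the version it follows (prevVersion) and its own commitVersion.
    Outcome handle(Version prevVersion, Version commitVersion) {
        assert(prevVersion < commitVersion);
        // The applier is already at or past this version: the request is a resend.
        if (commitVersion <= processedVersion) return Outcome::Duplicate;
        // The request does not chain onto the current version: an earlier one is missing.
        if (prevVersion != processedVersion) return Outcome::NotReady;
        applied.push_back(commitVersion);
        processedVersion = commitVersion;
        return Outcome::Applied;
    }
};
```

Because every request names its predecessor, a request delivered twice is recognized by version alone, without a separate per-request ID.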
Meng Xu 12817af03f FastRestore:Fix CMake compiling errors 2019-05-16 20:01:43 -07:00
Meng Xu 35b169fd2d FastRestore:Fix bug in registerMutationsToApplier
We forgot to update the applierInterface reference to the iterated
applyID
2019-05-14 22:10:09 -07:00
Meng Xu d9c97b5e5f FastRestore:Fix bug in sending a vector of mutations
When mutationVectorThreshold is not 1, a loader sends a vector of
mutations to an applier.

We should never mix mutations at different versions into the same vector.

The code in the previous commit may mix mutations at different versions.
This commit resolves the bug.
2019-05-14 21:04:36 -07:00
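The invariant stated in the commit above (a vector sent to an applier never mixes versions) can be sketched like this. The types `Mutation` and `MutationBatch` and the function `buildBatches` are hypothetical stand-ins for illustration; `threshold` plays the role of mutationVectorThreshold, and `kvOps` is assumed to be a map from version to the mutations parsed at that version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Version = int64_t;

// Hypothetical stand-ins for the real mutation and request types.
struct Mutation {
    std::string key;
    std::string value;
};
struct MutationBatch {
    Version version;
    std::vector<Mutation> mutations;
};

// Split per-version mutations into batches of at most `threshold` entries.
// A batch is always closed at a version boundary, so no batch mixes versions.
std::vector<MutationBatch> buildBatches(const std::map<Version, std::vector<Mutation>>& kvOps,
                                        std::size_t threshold) {
    assert(threshold > 0);
    std::vector<MutationBatch> batches;
    for (const auto& [version, mutations] : kvOps) {
        MutationBatch current{version, {}};
        for (const auto& m : mutations) {
            current.mutations.push_back(m);
            if (current.mutations.size() == threshold) {
                batches.push_back(std::move(current));
                current = MutationBatch{version, {}}; // start a fresh batch at the same version
            }
        }
        // Flush the partial batch before moving on to the next version.
        if (!current.mutations.empty()) batches.push_back(std::move(current));
    }
    return batches;
}
```

The cost of this choice is that a version with few mutations produces a batch smaller than the threshold; the benefit is that each batch maps to exactly one commit version on the applier.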
Meng Xu f33e3bf8bc FastRestore:bugFix:loader must clear kvOps after using it
In the sampling phase, a loader will cache the mutations into kvOps map;
In the loading log file phase, the loader will do the same thing.
The loader must clear the kvOps map once the loader uses it; otherwise,
it will cache the sampled mutations twice, which leads to an
inconsistent restored DB.
2019-05-14 20:28:32 -07:00
Meng Xu f54a1e1463 FastRestore:Fix bug in deciding applierID in splitMutation 2019-05-14 17:39:44 -07:00
Meng Xu f8c654cd86 FastRestore:Fix splitMutation bug
The split range mutation had a wrong param1 for the produced first mutation
2019-05-14 17:05:50 -07:00
Meng Xu 1f159113e6 FastRestore:Test multiple appliers
Loaders will split a range mutation for multiple appliers when needed.
2019-05-14 16:41:04 -07:00
Meng Xu 6c4c807801 FastRestore:fix bug due to non-unique cmdid
This commit identifies the bug
why DB may be restored to an inconsistent state.

The cmdid is used to achieve exactly-once delivery even when
the network can deliver a request twice.
This relies on the assumption that cmdid is unique for each request!

However, this assumption may not hold for
the phase Loader_Send_Mutations_To_Applier, when loaders send parsed
mutations to appliers:
1) When the same loader loads multiple files, we reset the cmdid
for the phase;
2) When different loaders load files, each loader's cmdid starts from
0 for the phase.
Both situations can break the assumption, which causes appliers to
miss some mutations to apply. This breaks the cycle test.
2019-05-14 01:49:49 -07:00
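The failure mode described in the commit above can be reproduced with a small deduplication helper. This is an illustrative sketch only: `Deduplicator`, `countAccepted`, and the composite key of (loader, file, sequence number) are hypothetical and are not the fix adopted in FastRestore, whose later commits remove cmdid altogether.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>
#include <vector>

// Accepts a request only the first time its key is seen.
template <class Key>
class Deduplicator {
    std::set<Key> seen;

public:
    bool accept(const Key& key) { return seen.insert(key).second; }
};

// Returns how many of the given requests survive deduplication.
template <class Key>
std::size_t countAccepted(const std::vector<Key>& requests) {
    Deduplicator<Key> dedup;
    std::size_t accepted = 0;
    for (const auto& key : requests) {
        if (dedup.accept(key)) ++accepted;
    }
    assert(accepted <= requests.size());
    return accepted;
}

// Hypothetical composite key: (loader ID, file index, per-file sequence number).
using CompositeKey = std::tuple<int, int, int>;
```

If two loaders each number their requests from 0, keying on the counter alone makes the applier drop the second loader's distinct request as a duplicate, which matches the missed mutations the commit describes. A key that also identifies the loader and the file keeps distinct requests distinct while still rejecting true resends.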
Meng Xu c115e3ceb1 FastRestore: Remove handleSampleLogFileRequest
handleSampleLogFileRequest is replaced by handleLoadLogFileRequest
2019-05-13 18:49:13 -07:00
Meng Xu 730142d532 FastRestore: Mark sampled file as processed files
This commit should pass correctness test, but
it does not mean the fast restore logic is correct.

We should NOT mark sampled file as processed files.
2019-05-13 17:53:11 -07:00
Meng Xu 76dd8dc8a8 FastRestore: Fix splitMutation bug 2019-05-13 17:24:57 -07:00
Meng Xu c7cd758e01 FastRestore:Do not mark log file as processed in sampling
This commit will expose a potential bug in fast restore.
We may need to parse range file before log file.
2019-05-13 11:37:20 -07:00
Meng Xu 26b224cddc FastRestore:RestoreLoader: Unify parsing log file
Use a generic actor to parse log files for sampling phase and
load phase.
2019-05-13 11:36:29 -07:00
Meng Xu a2fef23678 FastRestore: Remove handleSampleRangeFile actor 2019-05-13 10:36:44 -07:00