Commit Graph

200 Commits

Author SHA1 Message Date
Meng Xu 528466e0e6 FastRestore:Fix Valgrind error InvalidSuppression
Trace.error() must explicitly include error_code_actor_cancelled
to handle the error.
2020-05-02 19:52:05 -07:00
Meng Xu f9f1ac6594 FastRestore:Revise TraceEvent for better diagnosis 2020-05-01 16:31:55 -07:00
Meng Xu 134dbca0ee FastRestore:Use canonical way to trace error 2020-05-01 13:35:13 -07:00
Meng Xu 41c0a1768f FastRestore:Make FastRestore event type more descriptive 2020-05-01 10:27:08 -07:00
Meng Xu 038f3834fc Merge branch 'master' into mengxu/fr-code-improvement-PR 2020-05-01 09:26:29 -07:00
Meng Xu 6bd71560f0 FastRestore:Reduce trace events in real cluster environment 2020-04-30 19:12:31 -07:00
Meng Xu f073049865 FastRestore:Revise trace events to be descriptive
Revert changes that send mutations to appliers out of order
2020-04-24 10:31:08 -07:00
Meng Xu d21da5065a FastRestore:Loader:Merge MutationsVec and LogMessageVersionVec into VersionedMutationsVec
Remove the actor that sends one mutation message batch in the previous commit,
because that actor no longer reduces the code complexity.
2020-04-21 22:05:34 -07:00
Meng Xu 061bcd2fb4 FastRestore:Replace typeString with safe getTypeString func
Also fix compilation error in previous commit
2020-04-13 15:15:54 -07:00
Meng Xu dbc9c23193 FastRestore:Loader:Send mutations at different versions in the same message to appliers
This increases the bandwidth sent from loaders to appliers.
2020-04-12 10:46:58 -07:00
Meng Xu 55ee034e7f
Merge pull request #2916 from jzhou77/backup-fix
Remove version stamp ops from RestoreApplier
2020-04-11 14:04:11 -07:00
Meng Xu 2325ab209f FastRestore:Applier:Avoid extra copy in getAndComputeStagingKeys 2020-04-08 12:22:08 -07:00
Meng Xu 5ebafdb94c FastRestore:Apply clang-format to changes 2020-04-07 15:57:03 -07:00
Meng Xu e5b2cd81d5 FastRestore:Cleanup debug code 2020-04-07 15:56:44 -07:00
Jingyu Zhou cd8215ecf2 Remove version stamp ops from RestoreApplier
Version stamp ops are converted into SET at the proxy, so the backup files
will never have them.
2020-04-06 22:27:47 -07:00
Meng Xu a51ff7aaae FastRestore:Fix:buildVersionBatches may lose the last log file
If the last log file's endVersion determines the last version batch's endVersion,
the buildVersionBatches function may quit early, before including the last log file.

This causes some mutations to be missed and leads to an incorrect DB.

This commit also adds an ASSERT(maxVBVersion >= targetVersion) to
flag such errors as early as possible and simplify debugging.
2020-04-06 12:24:26 -07:00
Meng Xu 536e65cd76 FastRestore:Introduce debugFRMutation for debug keys 2020-04-05 15:00:36 -07:00
Meng Xu 432c99afd0 FastRestore:Applier:Keep incompleteStagingKeys content before values are applied to DB
This prevents incompleteStagingKeys from being cleared before getAndComputeStagingKeys() finishes using it.
2020-04-04 22:38:04 -07:00
Meng Xu a81ec332a9 FastRestore:Fix:Master cannot throttle on in-progress version batches when it releases batches out of order in simulation 2020-04-04 17:34:26 -07:00
Meng Xu 6bce67ca75 FastRestore:Apply clang-format 2020-04-01 21:27:54 -07:00
Meng Xu c69c959428 FastRestore:Fix:It is legal for a backup key to not exist in DB 2020-03-31 22:02:17 -07:00
Meng Xu 25e96a13d3 FastRestore:Fix clearrange on a key mistakenly clearing other keys 2020-03-31 17:45:19 -07:00
Meng Xu 212dadc2a1 Fix bug in add mutation on applier
For clear range mutation, we may clear the right boundary key which should not be cleared.
2020-03-31 16:51:08 -07:00
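The boundary rule behind commits 25e96a13d3 and 212dadc2a1 can be illustrated with a minimal sketch. It assumes clear ranges are half-open, [begin, end), so the right boundary key is never cleared. The names `key_after` and `apply_clear_range`, and the dict standing in for the applier's key store, are hypothetical and not taken from the FastRestore code.

```python
def key_after(key: bytes) -> bytes:
    """Smallest key strictly greater than `key` (hypothetical helper)."""
    return key + b"\x00"


def apply_clear_range(store: dict, begin: bytes, end: bytes) -> None:
    """Delete every key k with begin <= k < end; `end` itself survives."""
    for k in [k for k in store if begin <= k < end]:
        del store[k]


store = {b"a": b"1", b"b": b"2", b"c": b"3", b"d": b"4"}
apply_clear_range(store, b"a", b"c")             # clears a and b, keeps c
apply_clear_range(store, b"c", key_after(b"c"))  # clears exactly the key c
```

After both calls only the key `d` remains: the first clear leaves its right boundary `c` in place, and the single-key clear does not touch any neighbor.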
Meng Xu 33c4be9c42 Improve debug message for debug mutations 2020-03-31 16:00:51 -07:00
Jingyu Zhou 65e3b9192e Add an assert for probably dead code 2020-03-28 21:19:47 -07:00
Meng Xu 13f343ec96 Resolve minor review comment 2020-03-28 16:03:01 -07:00
Jingyu Zhou 196127fb92 Address review comments 2020-03-23 14:15:36 -07:00
Jingyu Zhou fea6155714 StagingKey uses mutation instead of a vector of mutations for each log version
Because each log version contains commit version and subsequence number, each
key can only have one mutation for its log version. This simplifies
StagingKey::add() a lot.
2020-03-20 20:15:09 -07:00
Jingyu Zhou 799f0b4b0e Small code refactor 2020-03-20 20:15:09 -07:00
Jingyu Zhou ab0b59b0c3 Add subsequence number to restore loader & applier
The subsequence number is needed so that mutations of the same commit version
number, but from different partitioned logs can be correctly reassembled in
order.

For old backup files, the sub number is always 0. For partitioned mutation
logs, the actual sub number is used. For range files, the sub number is always
0.
2020-03-20 20:13:38 -07:00
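The ordering described in commit ab0b59b0c3 can be sketched as a sort on the pair (commit version, subsequence number). This is an illustration only: the `VersionedMutation` tuple and the `reassemble` function are hypothetical names, and the real loader and applier data structures are not shown in this log.

```python
from typing import NamedTuple


class VersionedMutation(NamedTuple):
    version: int  # commit version
    sub: int      # subsequence number; 0 for old backup files and range files
    key: bytes
    value: bytes


def reassemble(*logs):
    """Merge mutations from several partitioned logs into apply order."""
    merged = [m for log in logs for m in log]
    return sorted(merged, key=lambda m: (m.version, m.sub))


log_a = [VersionedMutation(10, 0, b"k", b"first"),
         VersionedMutation(10, 2, b"k", b"third")]
log_b = [VersionedMutation(10, 1, b"k", b"second")]
ordered = reassemble(log_a, log_b)
```

Without the subsequence number, the three mutations at version 10 would be indistinguishable in order, and the final value of `k` would depend on which log was read first.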
Andrew Noyes c3b67c0c63 Fix OPEN_FOR_IDE build 2020-03-03 11:32:43 -08:00
Meng Xu e6457ba0d5 FastRestore:Correct type for incompleteStagingKeys 2020-03-02 11:33:07 -08:00
Meng Xu 2520e8d44c FastRestore:Use more concise code as suggested in review 2020-03-01 22:32:36 -08:00
Meng Xu 01c1a15caf FastRestore:Applier:Limit fetch keys number in a txn in getAndComputeStagingKeys 2020-02-28 16:53:36 -08:00
Meng Xu fe8b8bbbff FastRestore:Change vb state to class from enum 2020-02-27 20:15:25 -08:00
Meng Xu d77177367c FastRestore:Track each ongoing version batch progress state for applier and loader roles 2020-02-27 19:47:22 -08:00
Meng Xu fbb6e8f39d FastRestore:Create low memory situation in simulation on purpose 2020-02-26 14:54:38 -08:00
Meng Xu 06495b90ae FastRestore:Loader:Use isSchedulable to guard OOM
And trigger delayed actors that are blocked on memory to recheck memory.
2020-02-26 14:35:05 -08:00
Meng Xu a354f6ffa2 FastRestore:Applier:Use isSchedulable to guard OOM 2020-02-26 14:12:56 -08:00
Meng Xu ca726fc68e FastRestore:Introduce OOM protection
An actor is schedulable to run if the current worker has enough resources, i.e.,
the worker's memory usage is below the threshold;
Exception: If the actor is working on the current version batch, we have to schedule
the actor to run to avoid dead-lock.
Future: When we release the actors that are blocked by memory usage, we should release them
in increasing order of their version batch.
2020-02-26 14:09:18 -08:00
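The scheduling rule in commit ca726fc68e can be sketched as follows. This is a minimal illustration under stated assumptions: the function signatures, the byte-count parameters, and `release_order` are hypothetical, and the real isSchedulable in FastRestore may take different inputs.

```python
def is_schedulable(memory_used: int, memory_threshold: int,
                   actor_batch: int, current_batch: int) -> bool:
    """Decide whether an actor may run now (hypothetical signature)."""
    # Exception from the commit message: work on the current version batch
    # must always be allowed to run, otherwise the pipeline can deadlock.
    if actor_batch == current_batch:
        return True
    # Otherwise run only while memory usage is below the threshold.
    return memory_used < memory_threshold


def release_order(blocked_batches):
    """Future work noted in the commit: release blocked actors in
    increasing order of their version batch."""
    return sorted(blocked_batches)
```

For example, an actor for batch 5 is held back when memory is over the threshold and the current batch is 3, while an actor for batch 3 still runs.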
Meng Xu fbf5020af9 FastRestore:Applier:Add fetchKeys counter 2020-02-26 11:37:40 -08:00
Meng Xu 6bd4703a9f FastRestore:Resolve review comments 2020-02-20 14:27:34 -08:00
Meng Xu d5d26f589f FastRestore:Cosmetic change to improve code readability 2020-02-19 15:43:51 -08:00
Meng Xu 7897b1658f FastRestore:Applier:Rename new apply actor names 2020-02-19 15:29:32 -08:00
Meng Xu e4258d73f5 FastRestore:Applier:Remove applying actors that do not have good perf 2020-02-19 15:27:59 -08:00
Meng Xu fe75a4cafb FastRestore:Apply clang-format 2020-02-19 15:22:52 -08:00
Meng Xu 03f699f2f9 Merge branch 'master' into mengxu/fast-restore-applier-multi-applying-PR 2020-02-19 15:22:33 -08:00
Meng Xu 94d799552e FastRestore:Apply clang-format against master 2020-02-18 16:41:59 -08:00
Meng Xu 132f5aa9ba FastRestore:Improve trace name and cosmetic change 2020-02-18 16:41:19 -08:00
Meng Xu 31a6ec34b7 Merge branch 'master' into mengxu/fast-restore-agent-PR 2020-02-18 16:17:59 -08:00
Meng Xu c603b20e7e FastRestore:Resolve review comments 2020-02-18 14:08:27 -08:00
Meng Xu c34a69df32 FastRestore:Applier:Remove unused func 2020-02-13 23:06:56 -08:00
Meng Xu 3e2c19630a FastRestore:Applier:atomicOp can work on an empty key 2020-02-13 23:05:54 -08:00
Meng Xu b57583a504 FastRestore:Applier:Handle multiple gets in parallel 2020-02-13 23:05:31 -08:00
Meng Xu 53f427c319 FastRestore:Applier:fix getAndComputeStagingKeys 2020-02-13 22:11:30 -08:00
Meng Xu 0d668ea0c3 FastRestore:Applier:Add more trace for perf tracking 2020-02-13 15:50:10 -08:00
Meng Xu 0b27786811 FastRestore:Applier:Minor change for clang-format 2020-02-13 13:17:32 -08:00
Meng Xu b5e60585aa FastRestore:Applier:Fix precompute mutation result 2020-02-13 12:57:47 -08:00
Meng Xu b1b44d4477 FastRestore:Applier:Handle CompareAndClear atomicOp 2020-02-13 11:11:29 -08:00
Meng Xu 58dad5373b FastRestore:Applier:Handle CompareAndClear atomicOp 2020-02-13 11:06:20 -08:00
Meng Xu d3c01763d9 FastRestore:Applier:Handle version stamped key values 2020-02-13 10:48:36 -08:00
Meng Xu b008df97eb FastRestore:Applier:Multiple set-clear mutations at same version 2020-02-13 10:13:46 -08:00
Meng Xu 238b2cb8e4 FastRestore:Applier:Fix various bugs
1. segmentation error
2. some mutations are not set, clear, or atomicOp; the precomputed result should ignore them.
2020-02-13 10:00:23 -08:00
Meng Xu acf34319c1 FastRestore:Applier:Precompute mutations and apply in parallel
Precompute mutations received by an applier;
Only apply the final result to the destination DB;
Execute multiple txns in parallel to apply final results to the destination DB.
2020-02-12 22:47:48 -08:00
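The precompute step in commit acf34319c1 can be sketched as folding each key's mutations in version order and keeping only the final result. This sketch assumes single-key set and clear operations only; atomic ops, clear ranges, and the parallel transactions are out of scope, and the tuple layout and names are hypothetical.

```python
SET, CLEAR = "set", "clear"


def precompute(mutations):
    """Fold (version, op, key, value) tuples into one final value per key.

    A value of None means the key ends up cleared.
    """
    final = {}
    # sorted() is stable, so mutations at the same version keep input order.
    for version, op, key, value in sorted(mutations, key=lambda m: m[0]):
        if op == SET:
            final[key] = value
        elif op == CLEAR:
            final[key] = None
        # Any other mutation type is ignored by the precomputed result.
    return final


received = [
    (30, SET, b"a", b"3"),
    (10, SET, b"a", b"1"),
    (20, CLEAR, b"a", None),
    (15, SET, b"b", b"x"),
    (25, CLEAR, b"b", None),
]
result = precompute(received)
```

Only `result` would then be written to the destination DB: one write per key instead of one per received mutation.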
Meng Xu 2bc82ffd70 FastRestore:Applier:Store received mutation by key 2020-02-12 14:12:38 -08:00
Meng Xu c0f75d77b1 FastRestore:Applier:Intro StagingKey struct 2020-02-12 13:57:18 -08:00
Meng Xu 3e6bbe9e5b FastRestore:Applier:Use real size for atomic op 2020-02-11 15:51:32 -08:00
Meng Xu cda8fc189e FastRestore:AtomicOp:Intro weighted size for atomicOp
atomicOp has an amplified performance overhead to the cluster,
for example, an ADD operation can be small, but SS has to load
the value to do the operation and the value can be large.
2020-02-11 12:48:05 -08:00
Meng Xu e76b6d824a FastRestore:Assign priority to actors to prioritize vb work
When we pipeline multiple version batches, we should prevent a later
version batch from blocking the earlier version batch by consuming
CPU resources.

To achieve the above, we should assign higher priority to actors
in later phases in a version batch.

Because the restore master will not invoke an actor at a later phase until
the actors at the earlier phases have finished, this priority assignment
will not cause deadlock.
2020-02-10 20:29:23 -08:00
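The priority rule in commit e76b6d824a can be illustrated with a small priority queue. The phase names in `PHASES` and the `run_order` function are hypothetical; the sketch only shows that giving later phases higher priority lets an earlier version batch, which is further along, run ahead of a later batch that is still in an early phase.

```python
import heapq

# Hypothetical phase names, ordered from earliest to latest in a version batch.
PHASES = ["load", "send", "apply"]


def run_order(tasks):
    """Return (batch, phase) tasks in the order a priority scheduler picks them.

    Later phases get higher priority; ties go to the smaller batch index.
    """
    heap = [(-PHASES.index(phase), batch, phase) for batch, phase in tasks]
    heapq.heapify(heap)
    order = []
    while heap:
        _, batch, phase = heapq.heappop(heap)
        order.append((batch, phase))
    return order


pending = [(2, "load"), (1, "apply"), (1, "send"), (3, "load")]
order = run_order(pending)
```

Here batch 1's send and apply work is picked before the load work of batches 2 and 3, so the later batches cannot starve the earlier one of CPU.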
Meng Xu 325bd52939 FastRestore:Applier:Count appliedTxns 2020-02-10 17:13:20 -08:00
Meng Xu dbce1e9974 FastRestore:Applier:Add metrics counter and proc counter 2020-02-10 16:38:26 -08:00
Meng Xu 9b7a00a64f FastRestore:Mute trace when apply to db 2020-02-06 20:52:24 -08:00
Meng Xu dc848f4297 FastRestore:Disable verbose trace for perf. measurement 2020-02-06 20:50:23 -08:00
Meng Xu cab9d51e06 Merge branch 'master' into mengxu/fast-restore-pipeline-PR 2020-01-27 18:16:26 -08:00
Meng Xu 141609e80a FastRestore:Improve code style and fix typos 2020-01-27 18:13:14 -08:00
Meng Xu b04e98771e FastRestore:Replace FastRestoreOpConfig with Knobs
And randomize value for the rest of knobs
2020-01-24 14:24:34 -08:00
Meng Xu 4bf579a6d5 FastRestore:Fix race condition in pipeline
Master should not start asking appliers to apply mutations at batchIndex
until all appliers have applied mutations at (batchIndex - 1).
Otherwise, mutations may not be applied in increasing order of versions,
because appliers at different batch index can have overlapped key ranges.
2020-01-23 16:34:45 -08:00
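The ordering constraint in commit 4bf579a6d5 can be sketched as a gate the master checks before asking appliers to apply a batch. The `ApplyGate` class, its methods, and the assumption that batch indices start at 1 are all hypothetical and only illustrate the rule.

```python
class ApplyGate:
    """Tracks which appliers have finished applying each batch index."""

    def __init__(self, num_appliers: int, first_batch_index: int = 1):
        self.num_appliers = num_appliers
        self.first_batch_index = first_batch_index  # assumed starting index
        self.done = {}  # batch index -> set of applier ids that finished

    def mark_applied(self, batch_index: int, applier_id: int) -> None:
        self.done.setdefault(batch_index, set()).add(applier_id)

    def can_start(self, batch_index: int) -> bool:
        """Batch N may start only after every applier finished batch N - 1."""
        if batch_index == self.first_batch_index:
            return True
        finished = self.done.get(batch_index - 1, set())
        return len(finished) == self.num_appliers


gate = ApplyGate(num_appliers=2)
gate.mark_applied(1, applier_id=0)
blocked = not gate.can_start(2)   # applier 1 has not finished batch 1 yet
gate.mark_applied(1, applier_id=1)
ready = gate.can_start(2)
```

Waiting for all appliers matters because, as the commit message notes, appliers at different batch indices can own overlapping key ranges.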
Meng Xu e011f39829 FastRestore:Add sanity check and trace events 2020-01-23 16:03:41 -08:00
Meng Xu 009fcdeb16 FastRestore:Sanity check each restore asset is processed exactly once 2020-01-21 17:17:45 -08:00
Meng Xu 022783b449 Start batches in reverse order for testing and code cleanup 2020-01-21 14:49:40 -08:00
Meng Xu 4ac92d223b Cleanup batch buffer for each restore request 2020-01-21 14:49:36 -08:00
Meng Xu 1a130b0df3 FastRestore:Fix race condition on handleApplyToDBRequest 2020-01-17 17:01:09 -08:00
Meng Xu bfbf2164c4 FastRestore:Applier buffer data for multiple batches 2020-01-17 17:01:01 -08:00
Meng Xu 67e913c3d5 Change LoadingParam struct and endVersion definition
1) Remove endVersion field because it has been included in RestoreAsset;

2) Ensure endVersion in VersionBatch and RestoreAsset is always exclusive;

3) Revise ASSERT in loader and applier in situations when the dummy commit version
is endVersion, to avoid false positive ASSERT failure.
2020-01-07 11:48:03 -08:00
Meng Xu 8d6f511816 FastRestore:Resolve review comment
Filter out range mutations that do not overlap with the restore range.
Small changes on format.
2019-12-22 20:09:10 -08:00
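The filter mentioned in commit 8d6f511816 can be sketched as an overlap test between two half-open key ranges. This assumes both the mutation range and the restore range are [begin, end); the function names and the tuple layout are hypothetical.

```python
def overlaps(begin: bytes, end: bytes, r_begin: bytes, r_end: bytes) -> bool:
    """True if [begin, end) and [r_begin, r_end) share at least one key."""
    return begin < r_end and r_begin < end


def filter_range_mutations(mutations, r_begin: bytes, r_end: bytes):
    """Keep only (begin, end) range mutations that touch the restore range."""
    return [(b, e) for b, e in mutations if overlaps(b, e, r_begin, r_end)]


kept = filter_range_mutations(
    [(b"a", b"c"), (b"c", b"f"), (b"x", b"z")],
    r_begin=b"c", r_end=b"m",
)
```

The range (a, c) is dropped even though it ends at the restore range's first key, because its exclusive end never reaches `c`.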
Meng Xu ddcf3fdd80 FastRestore:Apply clang format 2019-12-20 22:00:36 -08:00
Meng Xu d888e3100b FastRestore:Applier:Add invariant 2019-12-20 19:34:28 -08:00
Meng Xu e98b2a0d1c FastRestore:Introduce RestoreAsset 2019-12-20 18:00:10 -08:00
Meng Xu 1371db4cdc FastRestore:Self code review and cleanup
1. Review memory use cases and improve:
Ensure state variable is initialized and
change unnecessary state variable to variable.

2. Remove debug code that is no longer useful;

3. Mute verbose debug.
2019-12-11 16:37:33 -08:00
Meng Xu feb2a8c70c FastRestore:Change RestoreSendMutationVectorVersionedRequest name
Change RestoreSendMutationVectorVersionedRequest to
RestoreSendVersionedMutationsRequest for better naming
2019-12-10 17:23:40 -08:00
Meng Xu 39a4f2372f Change FASTRESTORE_SAMPLING_PERCENT range to 0 to 100 2019-12-04 21:26:27 -08:00
Meng Xu c6b36dbffb FastRestore:Sampling:Resolve review comments 2019-12-04 17:35:11 -08:00
Meng Xu 2b987d1945 FastRestore:typedef Standalone<VectorRef<MutationRef>> MutationsVec 2019-12-04 11:39:55 -08:00
Meng Xu 9383c3f0a6 FastRestore:Sampling:Apply clang format 2019-12-03 21:27:06 -08:00
Meng Xu 3310f67e9e Merge branch 'mengxu/fast-restore-fix-valgrind-PR' into mengxu/fast-restore-sampling-PR 2019-12-03 16:24:40 -08:00
Meng Xu 530b689299 Move state variable to the start of function 2019-11-26 11:17:59 -08:00
Meng Xu 474f0067c4 Remove unneeded state 2019-11-25 23:10:14 -08:00
Meng Xu bb97307f08 FastRestore:Applier:Move state variables at the start of actor 2019-11-25 21:25:14 -08:00
Jingyu Zhou ae7e42face
Merge pull request #2313 from xumengpanda/mengxu/fastrestore-applyToDB-bugfix-PR
Performant restore [8/XX]: Fix bugs in applyToDB logic and add more tests
2019-11-12 08:50:23 -08:00
Meng Xu 630c29d160 FastRestore:resolve review comments
1) wait on whenAtLeast;
2) Put BigEndian64 into the function call and the decoder to prevent
future people from making the same mistake.
2019-11-11 17:00:16 -08:00
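Point 2 of commit 630c29d160 keeps the byte-order conversion inside a matched encoder and decoder so callers cannot forget it. A minimal sketch of that idea follows; the names `encode_version` and `decode_version` are hypothetical, and the ordering property shown is a general fact about big-endian encoding, not a stated motive of the commit.

```python
import struct


def encode_version(version: int) -> bytes:
    """Encode a 64-bit version as 8 big-endian bytes."""
    return struct.pack(">Q", version)


def decode_version(raw: bytes) -> int:
    """Inverse of encode_version; the byte order lives only in this pair."""
    return struct.unpack(">Q", raw)[0]


# Big-endian bytes sort in the same order as the integers they encode,
# which is not true of a little-endian encoding.
big_endian_sorted = encode_version(1) < encode_version(256)
little_endian_sorted = struct.pack("<Q", 1) < struct.pack("<Q", 256)
```

Callers pass and receive plain integers, so a mismatch between writer and reader byte order can only happen inside these two functions.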