Commit Graph

229 Commits

Author SHA1 Message Date
Meng Xu 3d6f69c8e2 FastRestore:addPrefix:Transform must clear both original and transformed range
Otherwise, anything left in the range can interfere with the result.
2020-06-21 22:18:12 -07:00
Jingyu Zhou df064ac922
Merge pull request #3321 from xumengpanda/mengxu/fr-restore-ranges-PR
Fast Restore: Support restoring sub ranges in the framework
2020-06-09 21:01:39 -07:00
Meng Xu d85dc5a4d3 FastRestore:Only clear ranges that will be restored
Instead of clearing the entire normal key space.

This commit also removes some unnecessary tr->reset() calls, which can invalidate the txn backoff time.
2020-06-08 22:41:49 -07:00
Meng Xu 28212d397d RestoreApplier:Remove getValue actor 2020-06-08 20:32:52 -07:00
Meng Xu 1edcee4e9d RestoreApplier:Rewrite getKeys because key_not_exists error is handled by txn internally 2020-06-08 20:27:25 -07:00
Meng Xu 5022566b35 Validate if key_not_found error ever happens 2020-06-08 16:59:00 -07:00
Meng Xu f00deefd5a RestoreApplier:Remove unnecessary txn reset 2020-06-08 10:10:32 -07:00
Meng Xu 8c81fedf11 RestoreApplier:Better handling of key not exist 2020-06-07 21:49:35 -07:00
Meng Xu 94be3afcf8 RestoreApplier:Cosmetic change based on review 2020-06-06 21:17:57 -07:00
Meng Xu f51fca0bf3 FastRestore:Sanity check actors do not throw error silently 2020-06-05 17:44:24 -07:00
Meng Xu ffe949b04d Applier:getAndComputeStagingKeys:reset txn at first error
When tr->onError() is ready, the txn state has been reset.
We cannot wait on the get() future from the txn because its state has been deleted.
If we do that, it will throw txn_cancelled error, which will be thrown all the way
up to the RestoreApplier main loop.

The batchData->dbApplier, which is assigned by writeMutationsToDB(self->id(), req.batchIndex, batchData, cx),
will become ready but isError(). This will make all handleApplyToDBRequest throw error silently.
2020-06-05 16:40:19 -07:00
Meng Xu e9af22085b Debug: getAndComputeStagingKeys may be stuck
Maybe wait(success(fValues[i])); never returns
2020-06-04 21:26:14 -07:00
Meng Xu 633587a95a RestoreApplier:getAndComputeStagingKeys:retry for keys that exist in DB
Test shows that we cannot just skip a key that exists in the DB but has a
future_version error.
2020-06-03 21:17:35 -07:00
Meng Xu 87a557dcb4 FastRestore:Applier:Treat future_version as key not exist 2020-06-03 18:30:59 -07:00
Meng Xu d5025a1779 getAndComputeStagingKeys: Improved handling of not exist keys 2020-06-03 15:32:36 -07:00
Meng Xu f5aef706f6 FastRestore:Delay leader election until restore requests are set 2020-05-12 19:11:08 -07:00
Meng Xu a93c23d239 Resolve review comments 2020-05-07 15:06:59 -07:00
Meng Xu e4bf6d570f FastRestore:Add assertion and trace events for diagnosis 2020-05-05 19:12:15 -07:00
Meng Xu 4d90384c58 Correct suppression event 2020-05-05 12:36:32 -07:00
Meng Xu c49b6756fe FastRestoreApplier:Trace clear range op when it has too many for debug 2020-05-05 09:28:50 -07:00
Meng Xu 759820cc61 FastRestoreApplier:Add warning when too many clears in a txn 2020-05-05 09:00:02 -07:00
Meng Xu 62de02fb2c FastRestoreApplier:Add delay to avoid overwhelming DB 2020-05-05 08:47:26 -07:00
Meng Xu 67b9e0b29a FastRestoreApplier:Add sanity check and trace for debugging stall 2020-05-04 22:32:57 -07:00
Meng Xu d22af629cd FastRestoreApplier:Add applierID and batchIndex for precompute stage 2020-05-04 16:32:09 -07:00
Meng Xu abda13e9df FastRestoreApplier:Free memory at each VB and refactor handleApplyToDBRequest 2020-05-04 15:29:27 -07:00
Meng Xu 135f6443da FastRestoreApplier:Add trace to track applying status 2020-05-04 15:02:53 -07:00
Meng Xu 0ba1551116 FastRestore:Trace memory usage periodically 2020-05-04 11:20:53 -07:00
Meng Xu 7b5d43da9c FastRestore:Remove unused field in RestoreRequest 2020-05-03 20:59:47 -07:00
Meng Xu ae86b5bb68 FastRestoreApplier:Continue when a key does not exist in DB
This can happen even though we thought all keys cached in appliers
should have a base value in DB.
2020-05-03 20:47:21 -07:00
Meng Xu 528466e0e6 FastRestore:Fix Valgrind error InvalidSuppression
Trace.error() must explicitly include error_code_actor_cancelled
to handle the error.
2020-05-02 19:52:05 -07:00
Meng Xu f9f1ac6594 FastRestore:Revise TraceEvent for better diagnosis 2020-05-01 16:31:55 -07:00
Meng Xu 134dbca0ee FastRestore:Use canonical way to trace error 2020-05-01 13:35:13 -07:00
Meng Xu 41c0a1768f FastRestore:Make FastRestore event type more descriptive 2020-05-01 10:27:08 -07:00
Meng Xu 038f3834fc Merge branch 'master' into mengxu/fr-code-improvement-PR 2020-05-01 09:26:29 -07:00
Meng Xu 6bd71560f0 FastRestore:Reduce trace events in real cluster environment 2020-04-30 19:12:31 -07:00
Meng Xu f073049865 FastRestore:Revise trace events to be descriptive
Revert changes that send mutations to appliers out of order
2020-04-24 10:31:08 -07:00
Meng Xu d21da5065a FastRestore:Loader:Merge MutationsVec and LogMessageVersionVec into VersionedMutationsVec
Remove the actor that sends one mutation message batch in the previous commit,
because that actor no longer reduces the code complexity.
2020-04-21 22:05:34 -07:00
Meng Xu 061bcd2fb4 FastRestore:Replace typeString with safe getTypeString func
Also fix compilation error in previous commit
2020-04-13 15:15:54 -07:00
Meng Xu dbc9c23193 FastRestore:Loader:Send mutations at different versions in the same message to appliers
This increases the bandwidth sent from loaders to appliers.
2020-04-12 10:46:58 -07:00
Meng Xu 55ee034e7f
Merge pull request #2916 from jzhou77/backup-fix
Remove version stamp ops from RestoreApplier
2020-04-11 14:04:11 -07:00
Meng Xu 2325ab209f FastRestore:Applier:Avoid extra copy in getAndComputeStagingKeys 2020-04-08 12:22:08 -07:00
Meng Xu 5ebafdb94c FastRestore:Apply clang-format to changes 2020-04-07 15:57:03 -07:00
Meng Xu e5b2cd81d5 FastRestore:Cleanup debug code 2020-04-07 15:56:44 -07:00
Jingyu Zhou cd8215ecf2 Remove version stamp ops from RestoreApplier
Version stamp ops are converted into SET at the proxy, so the backup files
will never have them.
2020-04-06 22:27:47 -07:00
Meng Xu a51ff7aaae FastRestore:Fix:buildVersionBatches may lose the last log file
If the last log file's end version decides the last version batch's end version,
the buildVersionBatches function may quit early, before including the last log file.

This causes some mutations to go missing and leads to an incorrect DB.

This commit also adds an ASSERT(maxVBVersion >= targetVersion) to
flag such an error as early as possible and simplify debugging.
2020-04-06 12:24:26 -07:00
Meng Xu 536e65cd76 FastRestore:Introduce debugFRMutation for debug keys 2020-04-05 15:00:36 -07:00
Meng Xu 432c99afd0 FastRestore:Applier:Keep incompleteStagingKeys content before values are applied to DB
This prevents incompleteStagingKeys from being cleared before getAndComputeStagingKeys() finishes using it.
2020-04-04 22:38:04 -07:00
Meng Xu a81ec332a9 FastRestore:Fix:Master cannot throttle on in-progress version batches when it releases batches out of order in simulation 2020-04-04 17:34:26 -07:00
Meng Xu 6bce67ca75 FastRestore:Apply clang-format 2020-04-01 21:27:54 -07:00
Meng Xu c69c959428 FastRestore:Fix:It is legal for a backup key to not exist in DB 2020-03-31 22:02:17 -07:00
Meng Xu 25e96a13d3 FastRestore:Fix clearrange on a key mistakenly clearing other keys 2020-03-31 17:45:19 -07:00
Meng Xu 212dadc2a1 Fix bug in add mutation on applier
For clear range mutation, we may clear the right boundary key which should not be cleared.
2020-03-31 16:51:08 -07:00
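The boundary issue described in the commit above comes down to treating a clear range as the half-open interval [begin, end). The following is a minimal sketch of that rule over an in-memory ordered map, not the applier's actual code; the `KV` alias and the `clearRange` helper are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for an ordered key-value store.
using KV = std::map<std::string, std::string>;

// Clear every key k with begin <= k < end.
// The right boundary key `end` itself must survive.
inline void clearRange(KV& kv, const std::string& begin, const std::string& end) {
    assert(begin <= end); // assumption: callers never pass an inverted range
    // lower_bound(end) points at the first key >= end, so `end` is excluded.
    kv.erase(kv.lower_bound(begin), kv.lower_bound(end));
}
```

Using `upper_bound(end)` for the second iterator would erase the boundary key as well, which is the kind of off-by-one the commit message describes.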
Meng Xu 33c4be9c42 Improve debug message for debug mutations 2020-03-31 16:00:51 -07:00
Jingyu Zhou 65e3b9192e Add an assert for probably dead code 2020-03-28 21:19:47 -07:00
Meng Xu 13f343ec96 Resolve minor review comment 2020-03-28 16:03:01 -07:00
Jingyu Zhou 196127fb92 Address review comments 2020-03-23 14:15:36 -07:00
Jingyu Zhou fea6155714 StagingKey uses mutation instead of a vector of mutations for each log version
Because each log version contains commit version and subsequence number, each
key can only have one mutation for its log version. This simplifies
StagingKey::add() a lot.
2020-03-20 20:15:09 -07:00
Jingyu Zhou 799f0b4b0e Small code refactor 2020-03-20 20:15:09 -07:00
Jingyu Zhou ab0b59b0c3 Add subsequence number to restore loader & applier
The subsequence number is needed so that mutations of the same commit version
number, but from different partitioned logs can be correctly reassembled in
order.

For old backup files, the sub number is always 0. For partitioned mutation
logs, the actual sub number is used. For range files, the sub number is always
0.
2020-03-20 20:13:38 -07:00
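As an illustration of why the subsequence number matters, the sketch below merges mutations from several logs and orders them by (commit version, subsequence). It is a simplified model under stated assumptions, not the loader or applier implementation; `VersionSub`, `TaggedMutation`, and `reassemble` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical pair of commit version and subsequence number.
struct VersionSub {
    int64_t version;
    uint32_t sub;
};

// Order by commit version first, then by subsequence number.
inline bool operator<(const VersionSub& a, const VersionSub& b) {
    return std::tie(a.version, a.sub) < std::tie(b.version, b.sub);
}

// Hypothetical mutation tagged with its position in the commit order.
struct TaggedMutation {
    VersionSub at;
    std::string payload;
};

// Merge mutations read from several logs into one commit-ordered sequence.
inline std::vector<TaggedMutation> reassemble(const std::vector<std::vector<TaggedMutation>>& logs) {
    std::vector<TaggedMutation> all;
    for (const auto& log : logs) all.insert(all.end(), log.begin(), log.end());
    auto byPosition = [](const TaggedMutation& a, const TaggedMutation& b) { return a.at < b.at; };
    std::stable_sort(all.begin(), all.end(), byPosition);
    assert(std::is_sorted(all.begin(), all.end(), byPosition));
    return all;
}
```

With a subsequence of 0 everywhere, as for old backup files and range files, the ordering falls back to the commit version alone, and the stable sort keeps the input order within a version.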
Andrew Noyes c3b67c0c63 Fix OPEN_FOR_IDE build 2020-03-03 11:32:43 -08:00
Meng Xu e6457ba0d5 FastRestore:Correct type for incompleteStagingKeys 2020-03-02 11:33:07 -08:00
Meng Xu 2520e8d44c FastRestore:Use more concise code as suggested in review 2020-03-01 22:32:36 -08:00
Meng Xu 01c1a15caf FastRestore:Applier:Limit fetch keys number in a txn in getAndComputeStagingKeys 2020-02-28 16:53:36 -08:00
Meng Xu fe8b8bbbff FastRestore:Change vb state to class from enum 2020-02-27 20:15:25 -08:00
Meng Xu d77177367c FastRestore:Track each ongoing version batch progress state for applier and loader roles 2020-02-27 19:47:22 -08:00
Meng Xu fbb6e8f39d FastRestore:Create low memory situation in simulation on purpose 2020-02-26 14:54:38 -08:00
Meng Xu 06495b90ae FastRestore:Loader:Use isSchedulable to guard OOM
And trigger delayed actors that are blocked on memory to recheck memory.
2020-02-26 14:35:05 -08:00
Meng Xu a354f6ffa2 FastRestore:Applier:Use isSchedulable to guard OOM 2020-02-26 14:12:56 -08:00
Meng Xu ca726fc68e FastRestore:Introduce OOM protection
An actor is schedulable to run if the current worker has enough resources, i.e.,
the worker's memory usage is below the threshold;
Exception: If the actor is working on the current version batch, we have to schedule
the actor to run to avoid dead-lock.
Future: When we release the actors that are blocked by memory usage, we should release them
in increasing order of their version batch.
2020-02-26 14:09:18 -08:00
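The scheduling rule in the commit above can be sketched as a single predicate. This is only an illustration of the stated rule: the name `isSchedulable` appears in the commit messages, but the signature, the `WorkerState` struct, and its fields are assumptions made for this sketch and are not taken from the source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of a restore worker's state.
struct WorkerState {
    int64_t memoryUsedBytes;      // current memory usage
    int64_t memoryThresholdBytes; // usage must stay below this to admit new work
    int currentBatchIndex;        // version batch the worker is currently processing
};

// Decide whether an actor working on `actorBatchIndex` may run now.
inline bool isSchedulable(const WorkerState& w, int actorBatchIndex) {
    assert(w.memoryThresholdBytes > 0); // assumption: a positive threshold is configured
    // Exception: work on the current version batch must always proceed,
    // otherwise the worker could deadlock waiting for memory to be freed.
    if (actorBatchIndex == w.currentBatchIndex) return true;
    // Otherwise admit the actor only while memory usage is below the threshold.
    return w.memoryUsedBytes < w.memoryThresholdBytes;
}
```

The sketch does not cover the follow-up noted in the commit message, releasing blocked actors in increasing order of their version batch.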
Meng Xu fbf5020af9 FastRestore:Applier:Add fetchKeys counter 2020-02-26 11:37:40 -08:00
Meng Xu 6bd4703a9f FastRestore:Resolve review comments 2020-02-20 14:27:34 -08:00
Meng Xu d5d26f589f FastRestore:Cosmetic change to improve code readability 2020-02-19 15:43:51 -08:00
Meng Xu 7897b1658f FastRestore:Applier:Rename new apply actor names 2020-02-19 15:29:32 -08:00
Meng Xu e4258d73f5 FastRestore:Applier:Remove applying actors that do not have good perf 2020-02-19 15:27:59 -08:00
Meng Xu fe75a4cafb FastRestore:Apply clang-format 2020-02-19 15:22:52 -08:00
Meng Xu 03f699f2f9 Merge branch 'master' into mengxu/fast-restore-applier-multi-applying-PR 2020-02-19 15:22:33 -08:00
Meng Xu 94d799552e FastRestore:Apply clang-format against master 2020-02-18 16:41:59 -08:00
Meng Xu 132f5aa9ba FastRestore:Improve trace name and cosmetic change 2020-02-18 16:41:19 -08:00
Meng Xu 31a6ec34b7 Merge branch 'master' into mengxu/fast-restore-agent-PR 2020-02-18 16:17:59 -08:00
Meng Xu c603b20e7e FastRestore:Resolve review comments 2020-02-18 14:08:27 -08:00
Meng Xu c34a69df32 FastRestore:Applier:Remove unused func 2020-02-13 23:06:56 -08:00
Meng Xu 3e2c19630a FastRestore:Applier:atomicOp can work on an empty key 2020-02-13 23:05:54 -08:00
Meng Xu b57583a504 FastRestore:Applier:Handle multiple gets in parallel 2020-02-13 23:05:31 -08:00
Meng Xu 53f427c319 FastRestore:Applier:fix getAndComputeStagingKeys 2020-02-13 22:11:30 -08:00
Meng Xu 0d668ea0c3 FastRestore:Applier:Add more trace for perf tracking 2020-02-13 15:50:10 -08:00
Meng Xu 0b27786811 FastRestore:Applier:Minor change for clang-format 2020-02-13 13:17:32 -08:00
Meng Xu b5e60585aa FastRestore:Applier:Fix precompute mutation result 2020-02-13 12:57:47 -08:00
Meng Xu b1b44d4477 FastRestore:Applier:Handle CompareAndClear atomicOp 2020-02-13 11:11:29 -08:00
Meng Xu 58dad5373b FastRestore:Applier:Handle CompareAndClear atomicOp 2020-02-13 11:06:20 -08:00
Meng Xu d3c01763d9 FastRestore:Applier:Handle version stamped key values 2020-02-13 10:48:36 -08:00
Meng Xu b008df97eb FastRestore:Applier:Multiple set-clear mutations at same version 2020-02-13 10:13:46 -08:00
Meng Xu 238b2cb8e4 FastRestore:Applier:Fix various bugs
1. segmentation error
2. some mutations are not set, clear, or atomicOp; the precompute result should ignore them.
2020-02-13 10:00:23 -08:00
Meng Xu acf34319c1 FastRestore:Applier:Precompute mutations and apply in parallel
Precompute mutations received by an applier;
Only apply the final result to the destination DB;
Execute multiple txns in parallel to apply final results to the destination DB.
2020-02-12 22:47:48 -08:00
Meng Xu 2bc82ffd70 FastRestore:Applier:Store received mutation by key 2020-02-12 14:12:38 -08:00
Meng Xu c0f75d77b1 FastRestore:Applier:Intro StagingKey struct 2020-02-12 13:57:18 -08:00
Meng Xu 3e6bbe9e5b FastRestore:Applier:Use real size for atomic op 2020-02-11 15:51:32 -08:00
Meng Xu cda8fc189e FastRestore:AtomicOp:Intro weighted size for atomicOp
atomicOp has an amplified performance overhead to the cluster,
for example, an ADD operation can be small, but SS has to load
the value to do the operation and the value can be large.
2020-02-11 12:48:05 -08:00
Meng Xu e76b6d824a FastRestore:Assign priority to actors to prioritize vb work
When we pipeline multiple version batches, we should prevent a later
version batch from blocking the earlier version batch by consuming
CPU resources.

To achieve the above, we should assign higher priority to actors
in later phases in a version batch.

Because the restore master will not invoke an actor at a later phase unless
the actors at the earlier phases have finished, this priority assignment
will not cause deadlock.
2020-02-10 20:29:23 -08:00
Meng Xu 325bd52939 FastRestore:Applier:Count appliedTxns 2020-02-10 17:13:20 -08:00
Meng Xu dbce1e9974 FastRestore:Applier:Add metrics counter and proc counter 2020-02-10 16:38:26 -08:00