Commit Graph

42 Commits

Author SHA1 Message Date
Meng Xu 198696bc1e Move transformRestoredDatabase from server to client
The AtomicRestore workload turns out to rely on the FileBackupAgent
client. Keeping transformRestoredDatabase in the server makes linking harder.
2020-06-23 15:48:43 -07:00
Meng Xu 4e27fd34e5 Refactor transformDatabaseContents into RestoreCommon
Prepare to enable addPrefix for atomicRestore
2020-06-23 14:33:13 -07:00
Meng Xu e4bf6d570f FastRestore:Add assertion and trace events for diagnosis 2020-05-05 19:12:15 -07:00
Meng Xu 2fec56e7e2 FastRestore:Logging for getReplyBatches 2020-05-04 20:12:59 -07:00
Meng Xu 134dbca0ee FastRestore:Use canonical way to trace errors 2020-05-01 13:35:13 -07:00
Meng Xu 28178f356f FastRestore:Minor knob change and revise comments 2020-05-01 10:47:44 -07:00
Meng Xu 41c0a1768f FastRestore:Make FastRestore event type more descriptive 2020-05-01 10:27:08 -07:00
Meng Xu 05ba743f96 Control number of replies on wait in getBatchReplies 2020-05-01 10:09:08 -07:00
Meng Xu 96855d9b47 FastRestore:Loader:Enable sending mutation messages out of order 2020-04-25 17:21:17 -07:00
Meng Xu 93112d0adb FastRestore:getBatchReplies:resetReply on errors unconditionally
This avoids an immediate error, at the cost that the sampled mutation stats
may be inaccurate.
We can later change this to reset only the failed request.
2020-04-24 10:31:30 -07:00
Meng Xu f073049865 FastRestore:Revise trace events to be descriptive
Revert changes that send mutations to appliers out of order
2020-04-24 10:31:08 -07:00
Meng Xu 38193a3866 Merge branch 'master' into mengxu/fr-code-improvement-PR 2020-04-22 10:51:33 -07:00
Jingyu Zhou 6909f0b8fc Remove decodeRangeFileBlock from parallel restore
Reuse the one from the fileBackup namespace.
2020-04-21 13:42:24 -07:00
Meng Xu 2960a2fe8a FastRestore:Add knob to control parallelism in waiting requests 2020-04-19 21:34:11 -07:00
Meng Xu a0c32f7a67 FastRestore:getBatchReplies:Comment out trace for performance 2020-04-08 15:43:40 -07:00
Meng Xu 2325ab209f FastRestore:Applier:Avoid extra copy in getAndComputeStagingKeys 2020-04-08 12:22:08 -07:00
Jingyu Zhou 88ad28e576 Integrate parallel restore with partitioned logs
In parallel restore, use the new getPartitionedRestoreSet() to get a set containing
partitioned mutation logs. The loader uses a new parser to extract mutations
from partitioned logs.

TODO: fix "unable to restore" errors.
2020-03-20 20:13:38 -07:00
Meng Xu 2c6f82e1ab FastRestore:Add unit name to threshold knob name 2020-03-02 10:52:44 -08:00
Meng Xu 2520e8d44c FastRestore:Use more concise code as suggested in review 2020-03-01 22:32:36 -08:00
Meng Xu 62b9043ff6 FastRestore:DB can be destroyed before master unlocks it in simulation
Because restore roles run as workloads in simulation,
they do not know when the DB is destroyed by the backup and restore test workload.
So if the DB is destroyed before the restore master unlocks it (which is rare),
the restore master should abort the DB-unlocking step.
2020-02-28 14:25:58 -08:00
Meng Xu fbf5020af9 FastRestore:Applier:Add fetchKeys counter 2020-02-26 11:37:40 -08:00
Meng Xu 8506bce493 FastRestore:Reuse getBatchReplies for sendBatchRequests
Remove the old sendBatchRequests and getBatchReplies as well.
2020-02-21 16:15:53 -08:00
Meng Xu 4dd206b1b8 FastRestore:Use new getBatchReplies that profile request latency 2020-02-21 15:59:57 -08:00
Meng Xu 505997ba0a FastRestore:Switch to new sendBatchRequests that tracks performance and straggler 2020-02-21 15:45:32 -08:00
Meng Xu 05ea79f584 FastRestore:Profile performance for getBatchReplies
Generic approach to profile getBatchReplies performance
and detect stragglers.
2020-02-21 15:20:22 -08:00
Meng Xu ab2dd36bdc FastRestore:Generic way to detect stragglers 2020-02-21 14:30:08 -08:00
Meng Xu e76b6d824a FastRestore:Assign priority to actors to prioritize vb work
When we pipeline multiple version batches, we should prevent a later
version batch from blocking an earlier version batch by consuming
CPU resources.

To achieve this, we should assign higher priority to actors
in later phases of a version batch.

Because the restore master does not invoke an actor in a later phase until
the actors in the earlier phases have finished, this priority assignment
will not cause deadlock.
2020-02-10 20:29:23 -08:00
Meng Xu cab9d51e06 Merge branch 'master' into mengxu/fast-restore-pipeline-PR 2020-01-27 18:16:26 -08:00
Meng Xu 52e3d20d39 FastRestore:VersionBatch replace vector with set
This ensures that each backup file appears in a version batch only once.
2020-01-22 13:13:10 -08:00
Meng Xu 153b713b53 FastRestore:Add sampling on parsed mutations 2019-12-03 12:52:17 -08:00
Meng Xu e345c9061f FastRestore:Refine debug messages 2019-11-04 11:47:38 -08:00
Meng Xu 3e2b3de4d0 FastRestore:RestoreMaster:Remove the extra lockDatabase in RestoreMaster 2019-10-17 00:50:13 -07:00
Meng Xu 2cd7010efb FastRestore:Add fileIndex to RestoreFileFR struct and fix bugs
Fix bugs that prevent RestoreMaster from properly locking or unlocking the DB when
an exception occurs;
Fix a bug in ordering backup files.
2019-10-17 00:50:13 -07:00
Meng Xu 2602cb3591 FastRestore:Rename RestoreConfig to RestoreConfigFR to fix link problem on Windows
Because the current restore already defines RestoreConfig, the Windows linker complains.
This commit renames the RestoreConfig used in FastRestore to RestoreConfigFR.
2019-08-02 23:00:12 -07:00
Meng Xu 9cc832cfd6 FastRestore:Fix Mac and Windows compilation error 2019-08-02 14:33:08 -07:00
Meng Xu 3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu 45b9504ba6 FastRestore:Refactor distribute workload for version batch
Rewrite the code that collects files for a version batch and that
distributes the workload among loaders for the files in a version batch.
The new code is easier to understand and maintain.
2019-05-30 17:39:50 -07:00
Meng Xu 620cdd411e FastRestore:Add comments for each restore file 2019-05-12 21:53:43 -07:00
Meng Xu 5406c74daf FastRestore: Ensure actorcompiler.h is included 2019-05-11 22:48:39 -07:00
Meng Xu a08a6776f5 FastRestore: Refactor to smaller components
The current code uses one restore interface to handle the work
for all restore roles, i.e., master, loader, and applier.
This makes it harder to review, maintain, or scale.

This commit splits the restore into multiple roles by mimicking the FDB
transaction system:
1) It uses a RestoreWorker as the process to host restore roles.
   This commit assumes one restore role per RestoreWorker, but
   it should be easy to extend to support multiple roles per RestoreWorker.
2) It creates 3 restore roles:
   RestoreMaster: Coordinates the restore process and sends commands to the other two roles;
   RestoreLoader: Parses backup files into mutations and sends the mutations to appliers;
   RestoreApplier: Sorts received mutations and applies them to the DB in order.

Compilable version. Still to be tested for correctness.
2019-05-10 14:20:06 -07:00
Meng Xu 25c75f4222 FastRestore: Add new empty files for restore roles
Add .h and .cpp files for the RestoreLoader and RestoreApplier roles.
We will split the code for each restore role into a separate file.

This commit also fixes the bug in including RestoreCommon.actor.h and
removes unused code.
2019-05-06 16:59:41 -07:00
Meng Xu 19841f9ef5 FastRestore: Move copied code into a separate file
We reuse some code from the existing restore system.
To make code review easier and the code cleaner, we move the copied and
slightly modified code into two separate files:
RestoreCommon.actor.h and RestoreCommon.actor.cpp
2019-04-30 20:57:02 -07:00