Commit Graph

140 Commits

Author SHA1 Message Date
Meng Xu 719eda9421 FastRestore:Add an assertion in handleRestoreSysInfoRequest
as suggested in code review.
2020-04-18 22:42:46 -07:00
Meng Xu 10a6461d13 FastRestore:Change __inline__ to inline
__inline__ is compiler specific while inline is the standard keyword
2020-04-17 22:31:44 -07:00
Meng Xu 916d361587 BackupAndParallelRestoreCorrectness:Remove unnecessary check of optional variable 2020-04-17 18:32:14 -07:00
Meng Xu c8d049d0bb FastRestore:Loader:Add counter oldLogMutations 2020-04-17 15:21:59 -07:00
Meng Xu d6c1baa784 FastRestore:Filter out log mutations whose version is smaller than range mutation version 2020-04-15 19:45:03 -07:00
Meng Xu dbc9c23193 FastRestore:Loader:Send mutations at different versions in the same message to appliers
This increases the bandwidth sent from loaders to appliers.
2020-04-12 10:46:58 -07:00
Meng Xu 2325ab209f FastRestore:Applier:Avoid extra copy in getAndComputeStagingKeys 2020-04-08 12:22:08 -07:00
Meng Xu da7249ed1c FastRestore:Minor revision based on review comments 2020-04-02 11:15:22 -07:00
Meng Xu 6bce67ca75 FastRestore:Apply clang-format 2020-04-01 21:27:54 -07:00
Meng Xu 33c4be9c42 Improve debug message for debug mutations 2020-03-31 16:00:51 -07:00
Meng Xu e286f316b9 Increase generated key length for splitMutation unit test 2020-03-31 14:27:07 -07:00
Meng Xu b7da76223c Fix a tricky splitMutation bug
splitMutation result may include the end key of a clearrange mutation
2020-03-31 13:38:58 -07:00
Meng Xu ccbbdc4ba4 Unit test:Verify splitMutation by comparing with intersectingRanges result 2020-03-31 12:13:02 -07:00
Meng Xu 8a30526336 FastRestore:Remove commented assertion 2020-03-28 13:11:32 -07:00
Meng Xu 404a3e2619 FastRestore:Loader:Remove sanity check for the order of sending log and range mutations 2020-03-27 23:36:13 -07:00
Meng Xu 97f8e46388 Sanity check subversion for log mutations 2020-03-27 13:07:08 -07:00
Jingyu Zhou 99f4ef6e0c Fix restore loader to handle mutation sub number
For the old backup format, give mutations a subsequence number starting from 0 for each
commit version.
2020-03-26 13:04:00 -07:00
Jingyu Zhou 4bdb32be14 Batch sending all mutations of a version from RestoreLoader
This optimization is to reduce the number of messages sent from loader to
applier, which was unintentionally done when introducing sub sequence numbers
for mutations.
2020-03-20 20:15:09 -07:00
Jingyu Zhou 799f0b4b0e Small code refactor 2020-03-20 20:15:09 -07:00
Jingyu Zhou e40f937d3a Fix missing mutations in splitMutation
When a range mutation is larger than the last split point, this mutation can
become missing in the RestoreLoader, which is fixed in this commit.
2020-03-20 20:15:09 -07:00
Jingyu Zhou fe51ba3d16 Give maximum subsequence number for snapshot mutations
This is needed so that mutations in partitioned logs are applied first and
snapshot mutations are applied later for the same commit version.
2020-03-20 20:15:09 -07:00
Jingyu Zhou 2eac17b553 StagingKey can add out-of-order mutations
For partitioned logs, mutations of the same version may be sent to an applier
out of order. If one loader advances to the next version, an applier may
receive later-version mutations from different loaders, so dropping earlier
mutations is wrong.
2020-03-20 20:13:38 -07:00
Jingyu Zhou ab0b59b0c3 Add subsequence number to restore loader & applier
The subsequence number is needed so that mutations of the same commit version
number, but from different partitioned logs can be correctly reassembled in
order.

For old backup files, the sub number is always 0. For partitioned mutation
logs, the actual sub number is used. For range files, the sub number is always
0.
2020-03-20 20:13:38 -07:00
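The reassembly rule described in ab0b59b0c3 can be sketched as follows. This is a minimal illustration, not the FastRestore implementation: the `Mutation` type, the `reassemble` function, and the sample payloads are all hypothetical names introduced here. It assumes only what the message states, namely that mutations sharing a commit version are ordered by their subsequence number.

```python
from typing import List, NamedTuple


class Mutation(NamedTuple):
    version: int  # commit version
    sub: int      # subsequence number within that commit version
    payload: str  # hypothetical stand-in for the mutation body


def reassemble(streams: List[List[Mutation]]) -> List[Mutation]:
    """Merge mutations from several logs into (version, sub) order."""
    merged = [m for stream in streams for m in stream]
    return sorted(merged, key=lambda m: (m.version, m.sub))


# Two hypothetical partitioned logs that interleave within each version.
log_a = [Mutation(100, 1, "set a"), Mutation(101, 2, "set c")]
log_b = [Mutation(100, 2, "set b"), Mutation(101, 1, "clear a")]
ordered = reassemble([log_a, log_b])
```

Sorting on the pair (version, sub) is what lets mutations from different partitioned logs be interleaved correctly within a single commit version.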
Jingyu Zhou 6b9b93314e Check block padding is \0xff for new mutation logs 2020-03-20 20:13:38 -07:00
Jingyu Zhou 35aafefb89 Consolidate StringRefReader classes
Fix a compiler error of unused variable too.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 88ad28e576 Integrate parallel restore with partitioned logs
In parallel restore, use new getPartitionedRestoreSet() to get a set containing
partitioned mutation logs. The loader uses a new parser to extract mutations
from partitioned logs.

TODO: fix unable to restore errors.
2020-03-20 20:13:38 -07:00
Jingyu Zhou e15015ee6c Add mutation log version names
I.e., BACKUP_AGENT_MLOG_VERSION for 2001 and PARTITIONED_MLOG_VERSION for 4110.
2020-03-20 20:13:38 -07:00
Meng Xu d3071409c5 FastRestore:Add comment for integrating with new backup format 2020-03-20 20:13:38 -07:00
Meng Xu 2520e8d44c FastRestore:Use more concise code as suggested in review 2020-03-01 22:32:36 -08:00
Meng Xu 62b9043ff6 FastRestore:DB can be destroyed before master unlocks it in simulation
Because restore roles run as a workload in simulation,
they do not know when the DB is destroyed by the backup and restore test workload.
So if the DB is destroyed before the restore master unlocks it, which is rare,
the restore master should abort the DB unlocking step.
2020-02-28 14:25:58 -08:00
Meng Xu fbb6e8f39d FastRestore:Create low memory situation in simulation on purpose 2020-02-26 14:54:38 -08:00
Meng Xu 06495b90ae FastRestore:Loader:Use isSchedulable to guard OOM
And trigger delayed actors that are blocked on memory to recheck memory.
2020-02-26 14:35:05 -08:00
Meng Xu ca726fc68e FastRestore:Introduce OOM protection
An actor is schedulable to run if the current worker has enough resources, i.e.,
the worker's memory usage is below the threshold.
Exception: If the actor is working on the current version batch, we have to schedule
the actor to run to avoid deadlock.
Future: When we release the actors that are blocked by memory usage, we should release them
in increasing order of their version batch.
2020-02-26 14:09:18 -08:00
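The scheduling rule in ca726fc68e can be sketched as below. This is an illustration under stated assumptions, not the code from the commit: the function names, parameters, and the byte-based threshold are hypothetical, and only the rule itself (run when memory is below the threshold, always run for the current version batch, release blocked actors in increasing batch order) comes from the message.

```python
from typing import List


def is_schedulable(memory_used: int, memory_threshold: int,
                   actor_batch: int, current_batch: int) -> bool:
    """Decide whether an actor may run (hypothetical signature)."""
    # Actors on the current version batch must always run to avoid deadlock.
    if actor_batch == current_batch:
        return True
    # Otherwise run only while the worker's memory usage is below the threshold.
    return memory_used < memory_threshold


def release_order(blocked_batches: List[int]) -> List[int]:
    """Order in which blocked actors would be released: lowest batch first."""
    return sorted(blocked_batches)
```

Letting the current version batch bypass the memory check is what keeps the restore making progress even when every worker is over the threshold.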
Meng Xu 505997ba0a FastRestore:Switch to new sendBatchRequests that tracks performance and straggler 2020-02-21 15:45:32 -08:00
Meng Xu 03f699f2f9 Merge branch 'master' into mengxu/fast-restore-applier-multi-applying-PR 2020-02-19 15:22:33 -08:00
Meng Xu 94d799552e FastRestore:Apply clang-format against master 2020-02-18 16:41:59 -08:00
Meng Xu 132f5aa9ba FastRestore:Improve trace name and cosmetic change 2020-02-18 16:41:19 -08:00
Meng Xu b5e60585aa FastRestore:Applier:Fix precompute mutation result 2020-02-13 12:57:47 -08:00
Meng Xu cda8fc189e FastRestore:AtomicOp:Intro weighted size for atomicOp
atomicOp has an amplified performance overhead to the cluster,
for example, an ADD operation can be small, but SS has to load
the value to do the operation and the value can be large.
2020-02-11 12:48:05 -08:00
Meng Xu e76b6d824a FastRestore:Assign priority to actors to prioritize vb work
When we pipeline multiple version batches, we should prevent a later
version batch from blocking the earlier version batch by consuming
CPU resources.

To achieve the above, we should assign higher priority to actors
in later phases in a version batch.

Because the restore master will not invoke an actor at a later phase until
the actors at the earlier phases have finished, this priority assignment
will not cause deadlock.
2020-02-10 20:29:23 -08:00
Meng Xu dbce1e9974 FastRestore:Applier:Add metrics counter and proc counter 2020-02-10 16:38:26 -08:00
Meng Xu 1fc793d6a7 FastRestore:Loader:Add metrics counter 2020-02-09 22:06:14 -08:00
Meng Xu 72110de7e2 FastRestore:Add trace for quick perf. measurement 2020-02-06 19:48:26 -08:00
Meng Xu cab9d51e06 Merge branch 'master' into mengxu/fast-restore-pipeline-PR 2020-01-27 18:16:26 -08:00
Meng Xu 141609e80a FastRestore:Improve code style and fix typos 2020-01-27 18:13:14 -08:00
Meng Xu cfdcddd90e FastRestore:Loader:Pipeline sendMutationsToApplier actors 2020-01-23 20:22:05 -08:00
Meng Xu e011f39829 FastRestore:Add sanity check and trace events 2020-01-23 16:03:41 -08:00
Meng Xu 009fcdeb16 FastRestore:Sanity check each restore asset is processed exactly once 2020-01-21 17:17:45 -08:00
Meng Xu 022783b449 Start batches in reverse order for testing and code cleanup 2020-01-21 14:49:40 -08:00
Meng Xu 4ac92d223b Cleanup batch buffer for each restore request 2020-01-21 14:49:36 -08:00