Commit Graph

9393 Commits

Author SHA1 Message Date
Jingyu Zhou 05b87cf288 Partitioned logs need to compute continuous begin Version
Because different tags may start at different versions, tag 0 can start at a
higher version. In this case, another tag's high version should be used as
the start version for continuous logs.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 1f95cba53e Add describePartitionedBackup() for parallel restore
For partitioned logs, compute the continuous log end version from the minimum
begin version of the logs. The old backup test keeps using describeBackup() so
that it stays correctness clean.

Rename partitioned log file so that the last number is block size.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 2eac17b553 StagingKey can add out-of-order mutations
For partitioned logs, mutations of the same version may be sent to applier
out-of-order. If one loader advances to the next version, an applier may
receive later-version mutations from different loaders. So dropping earlier
mutations is wrong.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 938a6f358d Describe backup uses partitioned logs to find continuous end version
For partitioned logs, the continuous end version has to be computed range by
range, where each range must contain continuous versions for all tags.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 659843ff51 Check partitioned log files are continuous for RestoreSet
The check uses Tag 0 to find the version ranges and the number of tags in each
range. Then, for each tag 1 and above, it checks that versions are continuous.
2020-03-20 20:13:38 -07:00
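The check described in 659843ff51 can be sketched roughly as follows. This is only an illustration, not the BackupContainer code: the `LogFile` tuple, the exclusive `end` version, and the helper names `covers` and `is_restorable` are all assumptions, and the sketch assumes a single tag count for the whole range rather than one per range.

```python
from collections import defaultdict
from typing import NamedTuple


class LogFile(NamedTuple):
    # Hypothetical description of one partitioned log file.
    tag: int
    total_tags: int
    begin: int  # first version covered (inclusive)
    end: int    # one past the last version covered (exclusive, an assumption)


def covers(files, begin, end):
    """Return True if the files cover [begin, end) without a version gap.

    Overlapping files are tolerated.
    """
    reached = begin
    for f in sorted(files, key=lambda f: f.begin):
        if f.begin > reached:
            return False  # gap before this file
        reached = max(reached, f.end)
        if reached >= end:
            return True
    return reached >= end


def is_restorable(files, begin, end):
    """Use tag 0 to learn the tag count, then check every other tag."""
    by_tag = defaultdict(list)
    for f in files:
        by_tag[f.tag].append(f)
    tag0 = by_tag.get(0, [])
    if not tag0 or not covers(tag0, begin, end):
        return False
    total = tag0[0].total_tags  # simplification: one tag count for the range
    return all(covers(by_tag.get(t, []), begin, end) for t in range(1, total))
```

A set of files is treated as restorable over a range only when tag 0 and every tag from 1 up to the tag count cover that range without a hole.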
Jingyu Zhou ab0b59b0c3 Add subsequence number to restore loader & applier
The subsequence number is needed so that mutations of the same commit version
number, but from different partitioned logs can be correctly reassembled in
order.

For old backup files, the sub number is always 0. For partitioned mutation
logs, the actual sub number is used. For range files, the sub number is always
0.
2020-03-20 20:13:38 -07:00
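The role of the subsequence number in ab0b59b0c3 can be illustrated with a small merge. This is a sketch under assumptions, not the restore loader or applier code: the `Mutation` tuple and `reassemble` are hypothetical, and each per-log stream is assumed to be already sorted by (version, sub).

```python
import heapq
from typing import NamedTuple


class Mutation(NamedTuple):
    version: int  # commit version
    sub: int      # subsequence number within the commit version
    payload: str  # hypothetical stand-in for the mutation body


def reassemble(*logs):
    """Merge per-log streams into (version, sub) order.

    Each input stream is assumed to be sorted by (version, sub) already.
    """
    return list(heapq.merge(*logs, key=lambda m: (m.version, m.sub)))


log_a = [Mutation(100, 1, "set a"), Mutation(100, 3, "clear b"), Mutation(101, 1, "set d")]
log_b = [Mutation(100, 2, "set c"), Mutation(101, 2, "set e")]
ordered = reassemble(log_a, log_b)
```

Without the subsequence number, the three mutations at version 100 would have no defined relative order after merging.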
Jingyu Zhou fda6c08640 Include a total number of tags in partition log file names
This is needed for BackupContainer to check partitioned mutation logs are
continuous, i.e., restorable to a version.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 64859467e4 Return partitioned logs for RestorableFileSet 2020-03-20 20:13:38 -07:00
Jingyu Zhou 940bea102a Add a knob to switch mutation logs for parallel restore
The knob FASTRESTORE_USE_PARTITIONED_LOGS defaults to true, which enables
partitioned mutation logs. Otherwise, old mutation logs are used.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 6b9b93314e Check block padding is \0xff for new mutation logs 2020-03-20 20:13:38 -07:00
Jingyu Zhou 35aafefb89 Consolidate StringRefReader classes
Also fix an unused-variable compiler error.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 88ad28e576 Integrate parallel restore with partitioned logs
In parallel restore, use new getPartitionedRestoreSet() to get a set containing
partitioned mutation logs. The loader uses a new parser to extract mutations
from partitioned logs.

TODO: fix unable to restore errors.
2020-03-20 20:13:38 -07:00
Jingyu Zhou e15015ee6c Add mutation log version names
I.e., BACKUP_AGENT_MLOG_VERSION for 2001 and PARTITIONED_MLOG_VERSION for 4110.
2020-03-20 20:13:38 -07:00
Jingyu Zhou ec352c03c9 Add partitioned logs to BackupContainer 2020-03-20 20:13:38 -07:00
Meng Xu d3071409c5 FastRestore:Add comment for integrating with new backup format 2020-03-20 20:13:38 -07:00
Jingyu Zhou 3801e50288 Backup worker: enable 50% of time in simulation
Make this randomization a separate one.
2020-03-20 20:13:38 -07:00
Meng Xu 980037f3a8
Merge pull request #2835 from bnamasivayam/revert-report-conflicting-keys
Revert report conflicting keys
2020-03-20 10:33:26 -07:00
Evan Tschannen e7e559cbae
Merge pull request #2706 from etschannen/feature-test-harness
Added TestHarness and TraceLogHelper for assisting with automated simulation testing
2020-03-20 10:29:22 -07:00
A.J. Beamon cf9a18a64d
Merge pull request #2838 from dongxinEric/misc/diable-ruby-tester
Disable Ruby tests for the moment.
2020-03-20 10:26:26 -07:00
Evan Tschannen a38a7fc8b4 updated copyright date 2020-03-20 10:15:33 -07:00
Xin Dong f293666028 Added back unnecessary changes. 2020-03-20 10:12:37 -07:00
Xin Dong 851ee20c1a Disable Ruby tests for the moment. 2020-03-20 10:05:14 -07:00
Jingyu Zhou 34415f82b3
Merge pull request #2832 from xumengpanda/mengxu/backup-code-review-PR
Buggify upload delay when backup worker uploads data to blob
2020-03-19 21:42:28 -07:00
Balachandar Namasivayam 804fe1b22e Revert "Merge pull request #2257 from zjuLcg/report-conflicting-key"
This reverts commit 648dc4a933, reversing
changes made to 487d131b38.
2020-03-19 21:34:28 -07:00
Balachandar Namasivayam efd0c6cec0 Revert "Merge pull request #2833 from xumengpanda/mengxu/remove-test-PR"
This reverts commit 8d655d7e40, reversing
changes made to cd5be43cd9.
2020-03-19 21:33:47 -07:00
Meng Xu dfea2c2e55 BackupWorker:Remove assert in pop 2020-03-19 20:14:52 -07:00
Meng Xu a323b80439 BackupWorker:Improve code comments 2020-03-19 15:58:22 -07:00
Jingyu Zhou 8bdda0fe04 Backup Worker: Give a chance of saving progress before displaced
Move the exit loop after the saving of progress so that when doneTrigger is
active, we won't exit the loop immediately.
2020-03-19 14:59:38 -07:00
Jingyu Zhou 8d655d7e40
Merge pull request #2833 from xumengpanda/mengxu/remove-test-PR
Remove ReportConflictingKeys.txt workload
2020-03-19 12:41:27 -07:00
Meng Xu 4f61973ede Remove the ReportConflictingKeys.txt test from CMake 2020-03-19 10:49:54 -07:00
Meng Xu 308d82245c Remove ReportConflictingKeys.txt workload
The PR 2257 still has many correctness failures.
Remove the workload to avoid noise while we are working on the fixes.
2020-03-19 10:34:14 -07:00
Jingyu Zhou 5bf62c8f85 Reduce a call to getLogSystemConfig() 2020-03-19 10:08:19 -07:00
Alex Miller cd5be43cd9
Merge pull request #2816 from mpilman/features/docker-script
added script that will generate a dev-docker img
2020-03-19 02:02:58 -07:00
Meng Xu 94276076de BackupWorker:Buggify upload delay
Add questions to code as well.
2020-03-18 19:04:45 -07:00
Jingyu Zhou 9a91bb2b9e Add target version as the limit for version batches
If using partitioned logs, the mutations after the target version can be
included if this limit is not considered.
2020-03-18 16:44:17 -07:00
Jingyu Zhou 61f8cd2529 Add an exitEarly flag for backup worker
If a backup worker is on an old epoch, it could exit early if either of the
following is true:
- there are no backups
- all backups start at a version >= the endVersion

If this flag is set, the backup worker exits without doing any work, which
signals the master to update the oldest backup epoch.
2020-03-18 16:44:17 -07:00
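The two exit conditions in 61f8cd2529 collapse into a single predicate. A minimal sketch, assuming the backup start versions are available as a plain list; `should_exit_early` and its parameters are hypothetical names, not the backup worker's API.

```python
def should_exit_early(backup_start_versions, end_version):
    """Return True if a worker on an old epoch has nothing to back up.

    all() over an empty list is True, which covers the "no backups" case.
    """
    return all(start >= end_version for start in backup_start_versions)
```

For example, with an endVersion of 100, a worker with backups starting at 100 and 150 could exit early, while one with a backup starting at 50 could not.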
Jingyu Zhou be8c9585c9 Skip setting backupStartedKey if using old mutation logs
For old submitBackup(), where partitionedLog is false, do not set the
backupStartedKey in BackupConfig, which signals backup workers to skip these
backups.
2020-03-18 16:44:17 -07:00
Jingyu Zhou 19f6394dc9 Fix oldest backup epoch for backup workers
The oldest backup epoch is piggybacked in LogSystemConfig from master to
cluster controller and then to all workers. Previously, this epoch was set
to the current master epoch, which is wrong.
2020-03-18 16:44:17 -07:00
Jingyu Zhou 3513bbefe6 StagingKey uses mutation instead of a vector of mutations for each log version
Because each log version contains commit version and subsequence number, each
key can only have one mutation for its log version. This simplifies
StagingKey::add() a lot.
2020-03-18 16:44:17 -07:00
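The simplification in 3513bbefe6 amounts to keying pending mutations by the pair of commit version and subsequence number. The class below is a hypothetical sketch, not the real StagingKey; ignoring a repeated key is an assumption of the sketch.

```python
class StagingKeySketch:
    """Hypothetical stand-in: one pending mutation per (version, sub)."""

    def __init__(self):
        self.pending = {}  # (version, sub) -> mutation

    def add(self, version, sub, mutation):
        # Arrival order does not matter; a repeated (version, sub) is ignored.
        self.pending.setdefault((version, sub), mutation)

    def in_order(self):
        # Replay order is ascending (version, sub).
        return [self.pending[k] for k in sorted(self.pending)]
```

Because the key is unique per mutation, `add()` needs no per-version vector, and mutations that arrive out of order are still replayed in log-version order.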
Jingyu Zhou 9b11bd8ee4 Batch sending all mutations of a version from RestoreLoader
This optimization reduces the number of messages sent from loader to applier.
The extra messages were unintentionally introduced along with subsequence
numbers for mutations.
2020-03-18 16:42:53 -07:00
Jingyu Zhou b697e46b19 Fix duplicated mutation in StagingKey
For reasons not yet understood, duplicated mutations can be added to
StagingKey and need to be filtered out. Otherwise, atomic operations can
result in corrupted data in the database.
2020-03-18 16:41:35 -07:00
Jingyu Zhou 0fb9e943f2 Small code refactor 2020-03-18 16:41:35 -07:00
Jingyu Zhou d1ef6f1225 Fix missing mutations in splitMutation
When a range mutation is larger than the last split point, this mutation can
go missing in the RestoreLoader, which is fixed in this commit.
2020-03-18 16:41:35 -07:00
Jingyu Zhou 3bb12bc844 Fix decode bug of missing mutations
After reading a new block, all mutations are sorted by version again, which
can invalidate the previous tuple. As a result, the decoded file will miss some
of the mutations.
2020-03-18 16:41:35 -07:00
Jingyu Zhou c3dd593113 Updates latest backup worker progress after all previous epochs are done
If workers for previous epochs are still ongoing, we may end up with a
container that misses mutations in previous epochs. So the update only happens
after only the current epoch's backup workers remain.
2020-03-18 16:41:35 -07:00
Jingyu Zhou a855e871e0 Fix duplicate file removal for subset version ranges
Partitioned logs can have strict subset version ranges, which was not properly
handled -- we used to assume overlapping only happens for the same begin
version.
2020-03-18 16:41:35 -07:00
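One way to handle the subset case from a855e871e0 is to sort ranges and drop any range that an earlier one already covers. This is a sketch under assumptions, not the actual fix: ranges are (begin, end) pairs for a single tag with an exclusive end, and `drop_covered` is a hypothetical helper.

```python
def drop_covered(ranges):
    """Drop ranges fully contained in another range, including duplicates.

    Sorting by begin, then by descending end, puts a covering range ahead
    of everything it contains.
    """
    kept = []
    max_end = None
    for begin, end in sorted(ranges, key=lambda r: (r[0], -r[1])):
        if max_end is not None and end <= max_end:
            continue  # contained in an already kept range
        kept.append((begin, end))
        max_end = end
    return kept
```

This catches subsets that share a begin version as well as strict subsets that start later, while partially overlapping ranges are kept.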
Jingyu Zhou d5250084bd Fix a time gap for monitoring backup keys
Backup worker starts by checking whether there are backup keys and then runs
monitorBackupKeyOrPullData() loop, which does the check again. The second check
can be delayed, which causes the loop to perform NOOP pops. The fix removes
this second check and uses the result of the first check to decide what to do
in the loop.
2020-03-18 16:41:35 -07:00
Jingyu Zhou 14b5925276 Allow overlapped versions in partitioned logs
Overlapping can only happen between two generations, where the range from the
known committed version to the recovery version is copied from the old
generation to the new generation. Within a generation, there is no overlap.

The fix here is related to the calculation of continuous version ranges,
allowing the overlap to happen.
2020-03-18 16:41:35 -07:00
Jingyu Zhou ceb56cf49d Add done trigger so that backup progress can be set
Otherwise, when there are no mutations for the unfinished range, the empty file
may not be created when the worker is displaced, thus leaving holes in version
ranges.
2020-03-18 16:41:35 -07:00
Jingyu Zhou 4e09c7be83 Remove debug print out 2020-03-18 16:41:35 -07:00