9158 Commits

Author SHA1 Message Date
Jingyu Zhou 273c086b0f Fix MacOS compiling error
clang doesn't allow capture references, so use copy for lambda's capture list.
2020-03-20 20:15:09 -07:00
Jingyu Zhou 6fb7316185 Fix asset end version if request.targetVersion is -1 2020-03-20 20:15:09 -07:00
Jingyu Zhou ca1a4ef9fd Ignore mutation logs of size 0 in converter 2020-03-20 20:15:08 -07:00
Jingyu Zhou d0a24dd20d Decode out of order mutations in old mutation logs
In the old mutation logs, a version's mutations are serialized as a buffer.
Then the buffer is split into smaller chunks, e.g., 10000 bytes each. When
writing chunks to the final mutation log file, these chunks can be flushed
out of order. For instance, the (version, chunk_part) can be in the order of
(3, 0), (4, 0), (3, 1). As a result, the decoder must read forward to find all
chunks of data for a version.

Another complication is that the files are organized into blocks, where (3, 1)
can be in a subsequent block. This change checks the value size for each
version. If the size is smaller than the expected size, the decoder will look
for the missing chunks in the next block.
2020-03-20 20:15:08 -07:00
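As a rough illustration of the reassembly described in commit d0a24dd20d, the sketch below groups out-of-order chunks by version and joins them in part order. It is not the converter's actual code: the function name `reassemble`, the `(version, part, payload)` tuple layout, and the gap check are assumptions made for this example, and it ignores block boundaries and the value-size check that the real decoder uses.

```python
from collections import defaultdict


def reassemble(chunks):
    """Group (version, part, payload) records that may arrive out of order.

    Hypothetical helper: returns {version: bytes} with the parts of each
    version concatenated in part order.
    """
    parts = defaultdict(dict)
    for version, part, payload in chunks:
        parts[version][part] = payload

    result = {}
    for version, by_part in parts.items():
        # Parts must be numbered 0..n-1 with no gaps; otherwise data is missing.
        if sorted(by_part) != list(range(len(by_part))):
            raise ValueError(f"missing chunk for version {version}")
        result[version] = b"".join(by_part[i] for i in range(len(by_part)))
    return result


# Chunks flushed in the order (3, 0), (4, 0), (3, 1), as in the commit message.
chunks = [(3, 0, b"ab"), (4, 0, b"xy"), (3, 1, b"cd")]
assembled = reassemble(chunks)
```

The point of the sketch is only that a reader cannot assume the chunks of one version are adjacent; it has to keep reading until every part of a version has been seen.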
Jingyu Zhou d82432da3c Fix wrong end version for restore loader
The restore cannot exceed the target version of the restore request. Otherwise,
the version restored is larger than the requested version.
2020-03-20 20:15:08 -07:00
Jingyu Zhou c59b0844a9 Add total number of tags to WorkerBackupStatus
This allows the backup worker to check the number of tags.
2020-03-20 20:15:08 -07:00
Jingyu Zhou d8731a1796 Refactor to use std::find_if for more concise code 2020-03-20 20:15:08 -07:00
Jingyu Zhou 12ed8ad536 Fix backup worker start version when logset start version is lower
The start version of tlog set can be smaller than the last epoch's end version.
In this case, set backup worker's start version as last epoch's end version to
avoid overlapping of version ranges among backup workers.
2020-03-20 20:15:08 -07:00
Jingyu Zhou 38def426f4 Add a flag to submitBackup for partitioned log
This distinguishes it from old workloads so that they can still work in simulation.
2020-03-20 20:15:08 -07:00
Jingyu Zhou e9287407d6 Backup worker updates latest log versions in BackupConfig
If backup worker is enabled, the current epoch's worker of tag (-2,0) will be
responsible for monitoring the backup progress of all workers and updating the
BackupConfig with the latest saved log version, which is the minimum version
of all tags.

This change has been incorporated in the getLatestRestorableVersion() so that
it is transparent to clients.
2020-03-20 20:15:08 -07:00
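The rule in commit e9287407d6, that the latest saved log version is the minimum across all tags, can be sketched as follows. The function name and the dictionary shape are assumptions for illustration only; they are not taken from BackupConfig or the backup worker code.

```python
def latest_saved_log_version(saved_by_tag):
    """Return the highest version that every tag has saved, or None if unknown.

    saved_by_tag is a hypothetical mapping of tag id -> last saved version.
    The backup is only complete up to the slowest tag, hence the minimum.
    """
    if not saved_by_tag:
        return None
    return min(saved_by_tag.values())


progress = {0: 120, 1: 95, 2: 110}
restorable_up_to = latest_saved_log_version(progress)
```

Taking the minimum is the conservative choice: a version is only covered by the logs once every tag has saved data up to it.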
Jingyu Zhou 80d3fa1222 Add delay for master to recruit backup workers
This delay is to ensure old epoch's backup workers can save their progress in
the database. Otherwise, the new master could attempt to recruit backup
workers for the old epoch on version ranges that have already been popped. As
a result, the logs will lose data.
2020-03-20 20:15:08 -07:00
Jingyu Zhou fe6b4a4398 Some correctness fixes 2020-03-20 20:15:08 -07:00
Jingyu Zhou 5ce9fc0e4c Partitioned logs should be filtered after sorting by tag IDs
The default sorting by begin and end version doesn't work with duplicate
removal, as tags are also compared.
2020-03-20 20:15:08 -07:00
Jingyu Zhou 5afc23a0e1 Give a chance for backup worker to finish writing files
If a backup worker is cancelled, wait until it finishes writing files so that
we don't need to create these files in the next epoch.
2020-03-20 20:15:08 -07:00
Jingyu Zhou 20df67ee6a Filter partitioned logs with subset relationship
If a log file's progress is not saved, a new log file will be generated
with the same begin version. Then we can have a file that contains a subset
of contents in another log file. During restore, we should filter out files
whose contents are a subset of another file's.
2020-03-20 20:15:08 -07:00
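One way to picture the subset filtering from commit 20df67ee6a is the sketch below. It assumes each log file can be summarized as a `(tag, begin, end)` tuple and that a file is redundant when another file of the same tag covers its whole version range. The tuple layout and the function name are hypothetical, not the BackupContainer implementation.

```python
def drop_subset_files(files):
    """Remove files whose version range is covered by another file of the same tag.

    files: iterable of hypothetical (tag, begin, end) tuples.
    Sorting by tag, then begin, then longest range first means any file that
    covers the current one has already been visited.
    """
    kept = []
    last_tag, max_end = None, None
    for tag, begin, end in sorted(files, key=lambda f: (f[0], f[1], -f[2])):
        if tag != last_tag:
            last_tag, max_end = tag, None
        # An earlier file of this tag starts no later and ends no earlier,
        # so this file (or exact duplicate) adds nothing.
        if max_end is not None and end <= max_end:
            continue
        kept.append((tag, begin, end))
        max_end = end
    return kept


files = [(0, 100, 200), (0, 100, 150), (1, 100, 180), (0, 200, 300), (0, 100, 200)]
filtered = drop_subset_files(files)
```

Sorting by tag first matters here: comparing ranges across different tags would wrongly discard files that only look redundant.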
Jingyu Zhou 696ce6aa82 Fix compiling error of reverse iterators
The macOS and Windows compilers don't accept the "!=" operator on
std::map::reverse_iterator.
2020-03-20 20:15:08 -07:00
Jingyu Zhou b792d76d62 Fix version gap in old epoch's backup
When the pull has finished and the message queue is empty, we should use the end
version as the popVersion for backup files. Otherwise, there might be a version
gap between the last message and the end version.
2020-03-20 20:15:08 -07:00
Jingyu Zhou 31f7108eab Handle partial recovery in BackupProgress
A partial recovery can result in an empty epoch that copies the previous epoch's
version range. In this case, getOldEpochTagsVersionsInfo() will not return the
previous epoch's information. To correctly compute the start version for a
backup worker, we need to check the previous epoch's saved versions. If they are
larger than this epoch's begin version, use the previously saved version as the
start version.
2020-03-20 20:15:08 -07:00
Jingyu Zhou e3eb3beaaf Consider previously pulled version for pulling version
Saving files only happens if we are not pulling, i.e., not in NOOP mode.
2020-03-20 20:15:08 -07:00
Jingyu Zhou 1b159a3785 Fix: backup worker ignores deleted container 2020-03-20 20:14:36 -07:00
Jingyu Zhou 00350dd3d8 Fix pulledVersion of backup worker
Not sure why, the cursor's version can be smaller than before.
2020-03-20 20:14:35 -07:00
Jingyu Zhou 672ad7a8ea Fix: backup worker savedVersion init to begin version
Choosing invalidVersion is wrong, as the worker starts at beginVersion.
2020-03-20 20:14:35 -07:00
Jingyu Zhou c300a5c1b7 Fix contract changes: backup worker generate continuous versions
Previously, we allowed holes in version ranges in partitioned mutation logs. This
has been changed so that restore can easily figure out if the database is
restorable. A specific problem was that if the backup worker didn't find any
mutations for an old epoch, the worker could just exit without generating a
log file, thus leaving holes in version ranges.

Another contract change is that if a backup key is set, then we must store
all mutations for that key, especially for the worker for the old epoch. As a
result, the worker must first check backup key, before pulling mutations and
uploading logs. Otherwise, we may lose mutations.

Finally, when a backup key is removed, the saving of mutations should be up to
the current version so that backup worker doesn't exit too early. That is, avoid
the case where saved mutation versions are lower than the snapshot version taken.
2020-03-20 20:14:35 -07:00
Jingyu Zhou 86edc1c9c8 Fix backup worker does NOOP pop before getting backup key
The NOOP pop causes some mutation ranges to be dropped by backup workers. As a
result, the backup is incomplete. Specifically, the wait of BACKUP_NOOP_POP_DELAY
blocks the monitoring of backup key actor.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 05b87cf288 Partitioned logs need to compute continuous begin Version
Because different tags may start at different versions, tag 0 can start at a
higher version. In this case, another tag's high version should be used as
the start version for continuous logs.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 1f95cba53e Add describePartitionedBackup() for parallel restore
For partitioned logs, compute the continuous log end version from the minimum
log begin version. Old backup tests keep using describeBackup() so that they
stay correctness clean.

Rename partitioned log file so that the last number is block size.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 2eac17b553 StagingKey can add out-of-order mutations
For partitioned logs, mutations of the same version may be sent to applier
out-of-order. If one loader advances to the next version, an applier may
receive later-version mutations from different loaders. So, dropping early
mutations is wrong.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 938a6f358d Describe backup uses partitioned logs to find continuous end version
For partitioned logs, the continuous end version has to be computed range by
range, where each range must contain continuous versions for all tags.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 659843ff51 Check partitioned log files are continuous for RestoreSet
The idea of checking is to use Tag 0 to find out ranges and their number of
tags. Then for each tag 1 and above, check versions are continuous.
2020-03-20 20:13:38 -07:00
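The checking idea in commit 659843ff51 might look roughly like the sketch below. It assumes each partitioned log file is described by a `(tag, begin, end, total_tags)` tuple with an exclusive end version. Both the tuple layout and the helper names are hypothetical, and the sketch only checks that tags 1 and above cover every range reported by tag 0; it does not check that the tag 0 ranges themselves are contiguous.

```python
def covers(intervals, begin, end):
    """True if the (begin, end) intervals cover [begin, end) without a gap."""
    reached = begin
    for b, e in sorted(intervals):
        if b > reached:
            break  # gap before this interval starts
        reached = max(reached, e)
        if reached >= end:
            return True
    return reached >= end


def is_continuous(logs):
    """Use tag 0 files to find ranges, then verify every other tag covers them."""
    by_tag = {}
    for tag, begin, end, _total in logs:
        by_tag.setdefault(tag, []).append((begin, end))

    for tag, begin, end, total in logs:
        if tag != 0:
            continue
        # The tag 0 file tells us how many tags exist for this range.
        for other in range(1, total):
            if not covers(by_tag.get(other, []), begin, end):
                return False
    return True


good = [(0, 100, 200, 2), (1, 100, 150, 2), (1, 150, 200, 2)]
gap = [(0, 100, 200, 2), (1, 100, 140, 2), (1, 150, 200, 2)]
```

Here `is_continuous(good)` holds because tag 1 covers 100 to 200 in two adjacent files, while `gap` fails because versions 140 to 150 are missing for tag 1.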
Jingyu Zhou ab0b59b0c3 Add subsequence number to restore loader & applier
The subsequence number is needed so that mutations of the same commit version
number, but from different partitioned logs can be correctly reassembled in
order.

For old backup files, the sub number is always 0. For partitioned mutation
logs, the actual sub number is used. For range files, the sub number is always
0.
2020-03-20 20:13:38 -07:00
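To illustrate why the subsequence number in commit ab0b59b0c3 is needed, the sketch below merges mutations from several partitioned logs by `(version, sub)`. The tuple layout `(version, sub, mutation)` and the function name are assumptions for this example, and each input stream is assumed to be already sorted by that key.

```python
import heapq


def merge_mutations(*streams):
    """Merge per-log mutation streams into commit order.

    Each stream is a list of hypothetical (version, sub, mutation) tuples,
    already sorted by (version, sub). Ordering by version alone would leave
    mutations of the same commit version in an arbitrary order.
    """
    return list(heapq.merge(*streams, key=lambda m: (m[0], m[1])))


log_a = [(10, 1, "set a"), (10, 3, "set c"), (11, 1, "clear a")]
log_b = [(10, 2, "set b"), (11, 2, "set d")]
ordered = [m[2] for m in merge_mutations(log_a, log_b)]
```

With a sub number that is always 0, as the commit describes for old backup files and range files, the key reduces to the version alone.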
Jingyu Zhou fda6c08640 Include a total number of tags in partition log file names
This is needed for BackupContainer to check partitioned mutation logs are
continuous, i.e., restorable to a version.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 64859467e4 Return partitioned logs for RestorableFileSet 2020-03-20 20:13:38 -07:00
Jingyu Zhou 940bea102a Add a knob to switch mutation logs for parallel restore
The knob FASTRESTORE_USE_PARTITIONED_LOGS defaults to true, which enables
partitioned mutation logs. Otherwise, old mutation logs are used.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 6b9b93314e Check block padding is \0xff for new mutation logs 2020-03-20 20:13:38 -07:00
Jingyu Zhou 35aafefb89 Consolidate StringRefReader classes
Also fix an unused-variable compiler error.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 88ad28e576 Integrate parallel restore with partitioned logs
In parallel restore, use new getPartitionedRestoreSet() to get a set containing
partitioned mutation logs. The loader uses a new parser to extract mutations
from partitioned logs.

TODO: fix unable to restore errors.
2020-03-20 20:13:38 -07:00
Jingyu Zhou e15015ee6c Add mutation log version names
I.e., BACKUP_AGENT_MLOG_VERSION for 2001 and PARTITIONED_MLOG_VERSION for 4110.
2020-03-20 20:13:38 -07:00
Jingyu Zhou ec352c03c9 Add partitioned logs to BackupContainer 2020-03-20 20:13:38 -07:00
Meng Xu d3071409c5 FastRestore:Add comment for integrating with new backup format 2020-03-20 20:13:38 -07:00
Jingyu Zhou 3801e50288 Backup worker: enable 50% of time in simulation
Make this randomization a separate one.
2020-03-20 20:13:38 -07:00
Meng Xu 980037f3a8
Merge pull request #2835 from bnamasivayam/revert-report-conflicting-keys
Revert report conflicting keys
2020-03-20 10:33:26 -07:00
Evan Tschannen e7e559cbae
Merge pull request #2706 from etschannen/feature-test-harness
Added TestHarness and TraceLogHelper for assisting with automated simulation testing
2020-03-20 10:29:22 -07:00
A.J. Beamon cf9a18a64d
Merge pull request #2838 from dongxinEric/misc/diable-ruby-tester
Disable Ruby tests for the moment.
2020-03-20 10:26:26 -07:00
Evan Tschannen a38a7fc8b4 updated copyright date 2020-03-20 10:15:33 -07:00
Xin Dong f293666028 Added back unnecessary changes. 2020-03-20 10:12:37 -07:00
Xin Dong 851ee20c1a Disable Ruby tests for the moment. 2020-03-20 10:05:14 -07:00
Jingyu Zhou 34415f82b3
Merge pull request #2832 from xumengpanda/mengxu/backup-code-review-PR
Buggify upload delay when backup worker upload data to blob
2020-03-19 21:42:28 -07:00
Balachandar Namasivayam 804fe1b22e Revert "Merge pull request #2257 from zjuLcg/report-conflicting-key"
This reverts commit 648dc4a933, reversing
changes made to 487d131b38.
2020-03-19 21:34:28 -07:00
Balachandar Namasivayam efd0c6cec0 Revert "Merge pull request #2833 from xumengpanda/mengxu/remove-test-PR"
This reverts commit 8d655d7e40, reversing
changes made to cd5be43cd9.
2020-03-19 21:33:47 -07:00
Meng Xu dfea2c2e55 BackupWorker:Remove assert in pop 2020-03-19 20:14:52 -07:00