9323 Commits

Author SHA1 Message Date
Xin Dong f293666028 Added back unnecessary changes. 2020-03-20 10:12:37 -07:00
Xin Dong 851ee20c1a Disable Ruby tests for the moment. 2020-03-20 10:05:14 -07:00
Jingyu Zhou 34415f82b3
Merge pull request #2832 from xumengpanda/mengxu/backup-code-review-PR
Buggify upload delay when backup workers upload data to blob
2020-03-19 21:42:28 -07:00
Balachandar Namasivayam 804fe1b22e Revert "Merge pull request #2257 from zjuLcg/report-conflicting-key"
This reverts commit 648dc4a933, reversing
changes made to 487d131b38.
2020-03-19 21:34:28 -07:00
Balachandar Namasivayam efd0c6cec0 Revert "Merge pull request #2833 from xumengpanda/mengxu/remove-test-PR"
This reverts commit 8d655d7e40, reversing
changes made to cd5be43cd9.
2020-03-19 21:33:47 -07:00
Meng Xu dfea2c2e55 BackupWorker:Remove assert in pop 2020-03-19 20:14:52 -07:00
Meng Xu a323b80439 BackupWorker:Improve code comments 2020-03-19 15:58:22 -07:00
Jingyu Zhou 8bdda0fe04 Backup Worker: Give a chance of saving progress before displaced
Move the exit loop after the saving of progress so that when doneTrigger is
active, we won't exit the loop immediately.
2020-03-19 14:59:38 -07:00
Jingyu Zhou 8d655d7e40
Merge pull request #2833 from xumengpanda/mengxu/remove-test-PR
Remove ReportConflictingKeys.txt workload
2020-03-19 12:41:27 -07:00
Meng Xu 4f61973ede Remove the ReportConflictingKeys.txt test from CMake 2020-03-19 10:49:54 -07:00
Meng Xu 308d82245c Remove ReportConflictingKeys.txt workload
The PR 2257 still has many correctness failures.
Remove the workload to avoid noise while we are working on the fixes.
2020-03-19 10:34:14 -07:00
Jingyu Zhou 5bf62c8f85 Reduce a call to getLogSystemConfig() 2020-03-19 10:08:19 -07:00
Alex Miller cd5be43cd9
Merge pull request #2816 from mpilman/features/docker-script
added script that will generate a dev-docker image
2020-03-19 02:02:58 -07:00
Meng Xu 94276076de BackupWorker:Buggify upload delay
Add questions to code as well.
2020-03-18 19:04:45 -07:00
Jingyu Zhou 9a91bb2b9e Add target version as the limit for version batches
If using partitioned logs, the mutations after the target version can be
included if this limit is not considered.
2020-03-18 16:44:17 -07:00
Jingyu Zhou 61f8cd2529 Add an exitEarly flag for backup worker
If a backup worker is on an old epoch, it can exit early if either of the
following is true:
- there are no backups
- all backups start at a version >= the endVersion

If this flag is set, the backup worker exits without doing any work, which
signals the master to update the oldest backup epoch.
2020-03-18 16:44:17 -07:00
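
The early-exit condition described in this commit can be sketched as follows. This is a minimal illustration, not the FoundationDB implementation: the function name and parameters are hypothetical, and versions are modeled as plain integers.

```python
from typing import Iterable


def should_exit_early(on_old_epoch: bool,
                      backup_start_versions: Iterable[int],
                      end_version: int) -> bool:
    """Return True when an old-epoch worker has nothing to back up.

    Hypothetical helper: the real flag is computed inside the backup
    worker; only the two conditions from the commit message are modeled.
    """
    if not on_old_epoch:
        return False
    # all() over an empty iterable is True, which covers "no backups";
    # otherwise every backup must start at or after end_version.
    return all(start >= end_version for start in backup_start_versions)
```

A worker on the current epoch never exits early in this sketch, since only old epochs have a fixed end version to compare against.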
Jingyu Zhou be8c9585c9 Skip setting backupStartedKey if using old mutation logs
For old submitBackup(), where partitionedLog is false, do not set the
backupStartedKey in BackupConfig, which signals backup workers to skip these
backups.
2020-03-18 16:44:17 -07:00
Jingyu Zhou 19f6394dc9 Fix oldest backup epoch for backup workers
The oldest backup epoch is piggybacked in LogSystemConfig from master to
cluster controller and then to all workers. Previously, this epoch was set
to the current master epoch, which is wrong.
2020-03-18 16:44:17 -07:00
Jingyu Zhou 3513bbefe6 StagingKey uses mutation instead of a vector of mutations for each log version
Because each log version contains commit version and subsequence number, each
key can only have one mutation for its log version. This simplifies
StagingKey::add() a lot.
2020-03-18 16:44:17 -07:00
Jingyu Zhou 9b11bd8ee4 Batch sending all mutations of a version from RestoreLoader
This optimization is to reduce the number of messages sent from loader to
applier, which was unintentionally done when introducing sub sequence numbers
for mutations.
2020-03-18 16:42:53 -07:00
Jingyu Zhou b697e46b19 Fix duplicated mutation in StagingKey
For a reason that is not yet understood, duplicated mutations can be added to
StagingKey and need to be filtered out. Otherwise, atomic operations can
result in corrupted data in the database.
2020-03-18 16:41:35 -07:00
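
To illustrate why duplicates matter for atomic operations, here is a minimal sketch that keys pending mutations by (commit version, subsequence) and ignores repeats. The class and its methods are hypothetical stand-ins, and an integer "add" stands in for an atomic operation; the real StagingKey is more involved.

```python
class StagingKeySketch:
    """Hypothetical model of one key's pending atomic-add mutations."""

    def __init__(self):
        # (commit_version, subsequence) -> delta of the atomic add
        self.pending = {}

    def add(self, commit_version, subsequence, delta):
        """Record a mutation; return False if it is a duplicate."""
        log_version = (commit_version, subsequence)
        if log_version in self.pending:
            return False  # same log version seen before: filter it out
        self.pending[log_version] = delta
        return True

    def apply(self, initial=0):
        """Apply pending adds in log-version order."""
        value = initial
        for log_version in sorted(self.pending):
            value += self.pending[log_version]
        return value
```

Without the duplicate check, a repeated add of 5 would be applied twice and the final value would be off by 5, which is the kind of corruption the commit message warns about.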
Jingyu Zhou 0fb9e943f2 Small code refactor 2020-03-18 16:41:35 -07:00
Jingyu Zhou d1ef6f1225 Fix missing mutations in splitMutation
When a range mutation is larger than the last split point, this mutation can
become missing in the RestoreLoader, which is fixed in this commit.
2020-03-18 16:41:35 -07:00
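
The failure mode described here, losing the part of a range that lies past the last split point, can be sketched with a simplified splitter. This is an assumption-laden illustration: `split_range` is a hypothetical function, keys are plain strings, and ranges are half-open `[begin, end)`; it is not the actual splitMutation code.

```python
def split_range(begin, end, split_points):
    """Split [begin, end) at every split point strictly inside it."""
    pieces = []
    cursor = begin
    for point in sorted(split_points):
        if point <= cursor:
            continue  # split point at or before the current position
        if point >= end:
            break  # split point at or past the end of the range
        pieces.append((cursor, point))
        cursor = point
    # Always emit the tail; dropping it loses everything past the
    # last split point.
    pieces.append((cursor, end))
    return pieces
```

For example, `split_range("a", "z", ["f", "m"])` keeps the tail piece `("m", "z")` in addition to the two pieces bounded by split points.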
Jingyu Zhou 3bb12bc844 Fix decode bug of missing mutations
After reading a new block, all mutations are sorted by version again, which
can invalidate previously read tuples. As a result, the decoded file will miss
some of the mutations.
2020-03-18 16:41:35 -07:00
Jingyu Zhou c3dd593113 Update latest backup worker progress after all previous epochs are done
If workers for previous epochs are still ongoing, we may end up with a
container that misses mutations in previous epochs. So the update only happens
once only the current epoch's backup workers remain.
2020-03-18 16:41:35 -07:00
Jingyu Zhou a855e871e0 Fix duplicate file removal for subset version ranges
Partitioned logs can have strict subset version ranges, which was not properly
handled -- we used to assume overlapping only happens for the same begin
version.
2020-03-18 16:41:35 -07:00
Jingyu Zhou d5250084bd Fix a time gap for monitoring backup keys
Backup worker starts by checking whether there are backup keys and then runs
monitorBackupKeyOrPullData() loop, which does the check again. The second check
can be delayed, which causes the loop to perform NOOP pops. The fix removes
this second check and uses the result of the first check to decide what to do
in the loop.
2020-03-18 16:41:35 -07:00
Jingyu Zhou 14b5925276 Allow overlapped versions in partitioned logs
The overlap can only happen between two generations, where the range from the
known committed version to the recovery version is copied from the old
generation to the new generation. Within a generation, there is no overlap.

The fix here is related to the calculation of continuous version ranges,
allowing the overlap to happen.
2020-03-18 16:41:35 -07:00
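
A continuous-range calculation that tolerates overlap might look like the sketch below. It assumes half-open `[begin, end)` version ranges and a hypothetical function name; the actual calculation in the backup container may differ.

```python
def continuous_end(ranges, start):
    """Return the end of the gap-free coverage that begins at `start`.

    `ranges` is an iterable of (begin, end) pairs, treated as half-open.
    Overlapping ranges are allowed; only a hole stops the scan.
    """
    end = start
    for begin, file_end in sorted(ranges):
        if begin > end:
            break  # hole: nothing covers versions in [end, begin)
        end = max(end, file_end)  # overlap is fine, just extend coverage
    return end
```

A stricter check that required each range to begin exactly where the previous one ended would reject the overlap between generations, which is what the commit relaxes.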
Jingyu Zhou ceb56cf49d Add done trigger so that backup progress can be set
Otherwise, when there are no mutations for the unfinished range, the empty file
may not be created when the worker is displaced, thus leaving holes in version
ranges.
2020-03-18 16:41:35 -07:00
Jingyu Zhou 4e09c7be83 Remove debug print out 2020-03-18 16:41:35 -07:00
Jingyu Zhou 03fd5cf3fa Give maximum subsequence number for snapshot mutations
This is needed so that mutations in partitioned logs are applied first and
snapshot mutations are applied later for the same commit version.
2020-03-18 16:41:35 -07:00
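
The ordering effect can be sketched by sorting on (commit version, subsequence). The constant below assumes an unsigned 32-bit subsequence number, and the function name and tuple layouts are hypothetical; none of these are taken from the source.

```python
# Assumption: subsequence numbers fit in an unsigned 32-bit integer.
MAX_SUBSEQUENCE = 2**32 - 1


def order_for_apply(log_mutations, snapshot_mutations):
    """Order mutations so that, per commit version, logs come first.

    log_mutations: iterable of (version, subsequence, mutation)
    snapshot_mutations: iterable of (version, mutation)
    """
    tagged = [((v, sub), m) for v, sub, m in log_mutations]
    # Snapshot mutations get the maximum subsequence number, so they
    # sort after every log mutation of the same commit version.
    tagged += [((v, MAX_SUBSEQUENCE), m) for v, m in snapshot_mutations]
    tagged.sort(key=lambda item: item[0])
    return [m for _, m in tagged]
```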
Jingyu Zhou 472849e45c Fix MacOS compiling error
clang doesn't allow capture references, so use copy for lambda's capture list.
2020-03-18 16:41:35 -07:00
Jingyu Zhou dbb05faa24 Fix asset end version if request.targetVersion is -1 2020-03-18 16:41:35 -07:00
Jingyu Zhou 7f3c64e326 Ignore mutation logs of size 0 in converter 2020-03-18 16:41:35 -07:00
Jingyu Zhou 937d8bcb8e Decode out of order mutations in old mutation logs
In the old mutation logs, a version's mutations are serialized as a buffer.
Then the buffer is split into smaller chunks, e.g., 10000 bytes each. When
writing chunks to the final mutation log file, these chunks can be flushed
out of order. For instance, the (version, chunk_part) can be in the order of
(3, 0), (4, 0), (3, 1). As a result, the decoder must read forward to find all
chunks of data for a version.

Another complication is that the files are organized into blocks, where (3, 1)
can be in a subsequent block. This change checks the value size for each
version. If the size is smaller than the expected size, the decoder will look
for the missing chunks in the next block.
2020-03-18 16:41:35 -07:00
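
The reassembly problem can be sketched as below. This simplification reads all blocks up front instead of looking ahead on demand, and it detects missing chunks by part numbering rather than by value size as the commit does; the function name and the `(version, part, data)` tuple layout are hypothetical.

```python
def reassemble(blocks):
    """Rebuild each version's buffer from out-of-order chunks.

    blocks: list of blocks, each a list of (version, part, data) tuples.
    Returns a dict mapping version -> concatenated bytes.
    """
    parts = {}
    for block in blocks:
        for version, part, data in block:
            parts.setdefault(version, {})[part] = data

    result = {}
    for version in sorted(parts):
        chunks = parts[version]
        # Parts must be numbered 0..n-1 with none missing.
        if sorted(chunks) != list(range(len(chunks))):
            raise ValueError(f"missing chunk for version {version}")
        result[version] = b"".join(chunks[i] for i in range(len(chunks)))
    return result
```

With the order from the commit message, `(3, 0), (4, 0)` in one block and `(3, 1)` in the next, version 3 is still rebuilt from both of its chunks.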
Jingyu Zhou 7d1538a9fc Fix wrong end version for restore loader
The restore cannot exceed the target version of the restore request. Otherwise,
the version restored is larger than the requested version.
2020-03-18 16:41:35 -07:00
Jingyu Zhou 6a302e6605 Add total number of tags to WorkerBackupStatus
This allows the backup worker to check the number of tags.
2020-03-18 16:41:35 -07:00
Jingyu Zhou ce2595821a Refactor to use std::find_if for more concise code 2020-03-18 16:41:35 -07:00
Jingyu Zhou 89d8f13038 Fix backup worker start version when logset start version is lower
The start version of tlog set can be smaller than the last epoch's end version.
In this case, set backup worker's start version as last epoch's end version to
avoid overlapping of version ranges among backup workers.
2020-03-18 16:41:35 -07:00
Jingyu Zhou 524b275a94 Add a flag to submitBackup for partitioned log
This is to distinguish it from old workloads so that they can work in simulation.
2020-03-18 16:41:35 -07:00
Jingyu Zhou be1d36bed3 Backup worker updates latest log versions in BackupConfig
If backup worker is enabled, the current epoch's worker of tag (-2,0) will be
responsible for monitoring the backup progress of all workers and for updating
the BackupConfig with the latest saved log version, which is the minimum
version of all tags.

This change has been incorporated in the getLatestRestorableVersion() so that
it is transparent to clients.
2020-03-18 16:41:35 -07:00
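
The "minimum version of all tags" rule can be sketched as follows. The function and its arguments are hypothetical; in particular, returning `None` until every tag has reported is an assumption made for the illustration, not behavior confirmed by the source.

```python
def latest_saved_log_version(saved_by_tag, total_tags):
    """Return the version up to which every tag has saved its logs.

    saved_by_tag: dict mapping tag id -> latest saved version
    total_tags: number of tags expected to report
    """
    if len(saved_by_tag) < total_tags:
        return None  # assumption: wait until all tags have reported
    # Logs are only complete up to the slowest tag.
    return min(saved_by_tag.values())
```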
Jingyu Zhou 15437ffb53 Add delay for master to recruit backup workers
This delay is to ensure old epoch's backup workers can save their progress in
the database. Otherwise, the new master could attempt to recruit backup
workers for the old epoch on version ranges that have already been popped. As
a result, the logs will lose data.
2020-03-18 16:41:35 -07:00
Jingyu Zhou b8c362cf44 Some correctness fixes 2020-03-18 16:41:35 -07:00
Jingyu Zhou 2c2d679a5d Partitioned logs should be filtered after sorting by tag IDs
The default sorting by begin and end version doesn't work with duplicates
removal, as tags are also compared.
2020-03-18 16:41:35 -07:00
Jingyu Zhou cade657682 Give a chance for backup worker to finish writing files
If a backup worker is cancelled, wait until it finishes writing files so that
we don't need to create these files in the next epoch.
2020-03-18 16:41:35 -07:00
Jingyu Zhou cc33a1e35e Filter partitioned logs with subset relationship
If a log file's progress is not saved, a new log file will be generated
with the same begin version. Then we can have a file that contains a subset
of contents in another log file. During restore, we should filter out files
whose contents are a subset of other files.
2020-03-18 16:41:35 -07:00
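
One way to filter subset files is sketched below, assuming each file is described by a hypothetical `(tag, begin, end)` tuple and that a file whose version range lies inside another file's range for the same tag is redundant. This is an illustration of the idea, not the restore code.

```python
def filter_subset_files(files):
    """Drop files whose version range is contained in another file's
    range for the same tag. Exact duplicates are dropped as well."""
    # Sort by tag, then begin ascending, then end descending, so any
    # file that could contain the current one has already been seen.
    ordered = sorted(set(files), key=lambda f: (f[0], f[1], -f[2]))
    kept = []
    for tag, begin, end in ordered:
        last = kept[-1] if kept else None
        # Same tag, earlier-or-equal begin (by sort order), and an end
        # at or past ours means the last kept file contains this one.
        if last is not None and last[0] == tag and last[2] >= end:
            continue
        kept.append((tag, begin, end))
    return kept
```

Sorting by tag first matters: with files ordered only by begin and end version, files of different tags interleave and the comparison against the last kept file no longer works.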
Jingyu Zhou a015277e49 Fix compiling error of reverse iterators
MacOS and Windows compilers don't like the use of the "!=" operator of
std::map::reverse_iterator.
2020-03-18 16:41:35 -07:00
Jingyu Zhou a0fb8ad5fc Fix version gap in old epoch's backup
When pull finished and message queue is empty, we should use end version as the
popVersion for backup files. Otherwise, there might be a version gap between
last message and end version.
2020-03-18 16:41:35 -07:00
Jingyu Zhou 70487cee1b Handle partial recovery in BackupProgress
A partial recovery can result in empty epoch that copies previous epoch's
version range. In this case, getOldEpochTagsVersionsInfo() will not return
previous epoch's information. To correctly compute the start version for a
backup worker, we need to check the previous epoch's saved versions. If they
are larger than this epoch's begin version, use the previously saved version as
the start version.
2020-03-18 16:41:35 -07:00
Jingyu Zhou 96eab2f3ec Consider previously pulled version for pulling version
Saving files only happens if we are not pulling, i.e., not in NOOP mode.
2020-03-18 16:41:35 -07:00