Commit Graph

20 Commits

Author SHA1 Message Date
Jingyu Zhou ca1a4ef9fd Ignore mutation logs of size 0 in converter 2020-03-20 20:15:08 -07:00
Jingyu Zhou 20df67ee6a Filter partitioned logs with subset relationship
If a log file's progress is not saved, a new log file will be generated
with the same begin version. We can then have a file whose contents are a
subset of the contents of another log file. During restore, we should filter
out files whose contents are a subset of another file's.
2020-03-20 20:15:08 -07:00
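
The subset filtering described in this commit could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `LogFile` struct, its fields, and `filterSubsets` are hypothetical names, and it assumes each file covers a half-open version range per tag.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one partitioned log file.
struct LogFile {
    int tagId;
    int64_t beginVersion; // inclusive
    int64_t endVersion;   // exclusive
    std::string name;
};

// Drop every file whose version range is covered by another file of the same tag.
std::vector<LogFile> filterSubsets(std::vector<LogFile> files) {
    // Order by tag, then begin version ascending, then end version descending,
    // so that a covering file always comes before the files it covers.
    std::sort(files.begin(), files.end(), [](const LogFile& a, const LogFile& b) {
        if (a.tagId != b.tagId) return a.tagId < b.tagId;
        if (a.beginVersion != b.beginVersion) return a.beginVersion < b.beginVersion;
        return a.endVersion > b.endVersion;
    });

    std::vector<LogFile> kept;
    for (const LogFile& f : files) {
        assert(f.beginVersion <= f.endVersion);
        // Kept files of one tag have strictly increasing end versions, so
        // comparing against the last kept file is enough to detect a subset.
        if (!kept.empty() && kept.back().tagId == f.tagId &&
            f.endVersion <= kept.back().endVersion) {
            continue;
        }
        kept.push_back(f);
    }
    return kept;
}
```

Exact duplicates (same tag, begin, and end version) are also collapsed to a single file by this comparison.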
Jingyu Zhou e15015ee6c Add mutation log version names
I.e., BACKUP_AGENT_MLOG_VERSION for 2001 and PARTITIONED_MLOG_VERSION for 4110.
2020-03-20 20:13:38 -07:00
Jingyu Zhou e3061ab713 Use ArenaReader to deserialize MutationRef 2020-02-13 10:28:08 -08:00
Andrew Noyes 09f3690f09 Fix OPEN_FOR_IDE build 2020-02-03 10:42:05 -08:00
Jingyu Zhou 690e93145e Fix some comments 2020-01-22 19:38:46 -08:00
Jingyu Zhou 06fb45f32a FileConverter skips mutation files without tag ID
FileConverter does not know the format of old mutation logs.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 7d1b9fe6d3 Add mutation file decoder 2020-01-22 19:38:46 -08:00
Jingyu Zhou 568a8a8e77 Use big endian for mutation log files
For each mutation, its version, sub-version, and size are written as a prefix
in big-endian representation. This is required, especially for the leading
version field, because we use 0xFF for padding. A little-endian version
number can easily collide with 0xFF, while big endian is guaranteed to have
0x00 as the first byte.
2020-01-22 19:38:46 -08:00
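
The byte-order argument in this commit can be illustrated with a minimal sketch. The field widths (8-byte version, 4-byte sub-version, 4-byte size) and all function names below are assumptions made for illustration; they are not taken from the commit itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append `value` to `out` in big-endian order, most significant byte first.
template <typename T>
void appendBigEndian(std::string& out, T value) {
    for (int shift = static_cast<int>(sizeof(T) - 1) * 8; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((value >> shift) & 0xFF));
    }
}

// Read a big-endian unsigned integer of type T starting at `offset`.
template <typename T>
T readBigEndian(const std::string& in, size_t offset) {
    assert(offset + sizeof(T) <= in.size());
    T value = 0;
    for (size_t i = 0; i < sizeof(T); ++i) {
        value = static_cast<T>((value << 8) | static_cast<uint8_t>(in[offset + i]));
    }
    return value;
}

// Hypothetical header layout: version, sub-version, payload size.
std::string encodeHeader(uint64_t version, uint32_t subVersion, uint32_t size) {
    std::string out;
    appendBigEndian<uint64_t>(out, version);
    appendBigEndian<uint32_t>(out, subVersion);
    appendBigEndian<uint32_t>(out, size);
    return out;
}

// Little-endian encoding of the version, shown only for contrast.
std::string encodeVersionLittleEndian(uint64_t version) {
    std::string out;
    for (int shift = 0; shift < 64; shift += 8) {
        out.push_back(static_cast<char>((version >> shift) & 0xFF));
    }
    return out;
}
```

With this layout, a version such as 0xFF starts with the byte 0x00 in big endian but with 0xFF in little endian, which a reader could mistake for padding.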
Jingyu Zhou 114e153bc8 Use block size encoded in file names
The log files have block size encoded in their names and the converter should
use these sizes.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 1123157ae0 Ignore mutations larger than the end version 2020-01-22 19:38:46 -08:00
Jingyu Zhou b92363bc29 Remove duplicated log files before the conversion
Duplicates can happen because backup workers may store the log for
old epochs successfully, but not update the progress before another
recovery happens. As a result, the next epoch retries and creates
duplicated log files.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 4327435601 Fix a data corruption bug
VersionedData used to include a MutationRef, which is made from BinaryReader.
Unfortunately, the StringRef inside MutationRef points to memory allocated from
the BinaryReader's arena, which is freed after the BinaryReader is destroyed.
Changing it to use a StringRef pointing to the serialized mutation fixes this bug.
2020-01-22 19:38:46 -08:00
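
The lifetime issue behind this fix can be sketched in isolation. The types below are simplified stand-ins, not the real StringRef, MutationRef, or VersionedData, and the one-byte key-length layout is purely hypothetical. The point is that the stored view refers to the serialized bytes in a long-lived buffer, and decoding happens on demand, so nothing depends on memory owned by a short-lived reader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Non-owning view of bytes; a simplified stand-in for StringRef.
struct BytesRef {
    const char* data;
    size_t size;
    std::string toString() const { return std::string(data, size); }
};

// Decoded form whose fields are views, similar in spirit to MutationRef.
struct DecodedMutation {
    BytesRef key;
    BytesRef value;
};

// Holds a view of the serialized bytes instead of a decoded object, so its
// validity depends only on the buffer that owns those bytes.
struct VersionedData {
    int64_t version;
    BytesRef serialized;
};

// Hypothetical layout: 1-byte key length, key bytes, then value bytes.
// The returned views point into `bytes` and stay valid as long as it does.
DecodedMutation decode(BytesRef bytes) {
    assert(bytes.size >= 1);
    size_t keyLen = static_cast<unsigned char>(bytes.data[0]);
    assert(bytes.size >= 1 + keyLen);
    return DecodedMutation{ BytesRef{ bytes.data + 1, keyLen },
                            BytesRef{ bytes.data + 1 + keyLen, bytes.size - 1 - keyLen } };
}
```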
Jingyu Zhou c1748c0460 Code refactoring
The BackupWorker does not yet produce files in blocks, which should be fixed.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 84a49cf389 Add merge sorting mutations from multiple files
This is implemented in MutationFilesReadProgress.
2020-01-22 19:38:46 -08:00
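
A k-way merge of this kind could look roughly like the sketch below. It is not the MutationFilesReadProgress implementation: the `Mutation` struct and `mergeSorted` are hypothetical, mutations are held in memory rather than read from files, and each input is assumed to be already ordered by (version, sub-version).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical in-memory form of one mutation read from a log file.
struct Mutation {
    int64_t version;
    int32_t subVersion;
    std::string payload;
};

// Merge several individually sorted mutation streams into one stream
// ordered by (version, subVersion), using a min-heap of per-file cursors.
std::vector<Mutation> mergeSorted(const std::vector<std::vector<Mutation>>& files) {
    struct Cursor {
        size_t file;
        size_t pos;
    };
    // Returns true when `a` points at a later mutation than `b`, which makes
    // std::priority_queue surface the earliest mutation first.
    auto later = [&files](const Cursor& a, const Cursor& b) {
        const Mutation& x = files[a.file][a.pos];
        const Mutation& y = files[b.file][b.pos];
        return std::tie(x.version, x.subVersion) > std::tie(y.version, y.subVersion);
    };
    std::priority_queue<Cursor, std::vector<Cursor>, decltype(later)> heap(later);
    for (size_t i = 0; i < files.size(); ++i) {
        if (!files[i].empty()) heap.push(Cursor{ i, 0 });
    }

    std::vector<Mutation> out;
    while (!heap.empty()) {
        Cursor c = heap.top();
        heap.pop();
        const Mutation& next = files[c.file][c.pos];
        // Holds only if every input file is itself sorted.
        assert(out.empty() || std::tie(out.back().version, out.back().subVersion) <=
                                  std::tie(next.version, next.subVersion));
        out.push_back(next);
        if (c.pos + 1 < files[c.file].size()) heap.push(Cursor{ c.file, c.pos + 1 });
    }
    return out;
}
```

The heap holds at most one cursor per file, so merging n mutations from k files takes O(n log k) comparisons.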
Jingyu Zhou 5ab9d0925c Add namespace file_converter 2020-01-22 19:38:46 -08:00
Jingyu Zhou 7f7ec99170 Serialize and deserialize new backup files
The BackupWorker writes files that can be read by FileConverter. Move
StringRefReader to the header file for reuse in FileConverter.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 5ac63ec526 Apply clang-format 2020-01-22 19:38:46 -08:00
Jingyu Zhou 674b468609 Add more parameter parsing 2020-01-22 19:38:46 -08:00
Jingyu Zhou 2707ab3eba Add fdbconvert command line utility
fdbconvert is intended to convert new backup files, which are tagged mutation
logs, to the old backup format. The actual conversion is not included in this
commit and will be added in future commits.

Note that the BackupContainer needs to be updated to support new backup files,
which is also not included in this commit.
2020-01-22 19:38:46 -08:00