If a log file's progress is not saved, a new log file will be generated
with the same begin version. Then we can have a file that contains a subset
of contents in another log file. During restore, we should filter out files
that their contents are subset of other files.
For each mutation, its version, sub-version, and size are prefixed with big
endian representation. This is required, especially for the first version
variable, because we use 0xFF for padding purpose. A little endian version
number can easily collide with 0xFF, while big endian is guaranteed to have
0x00 as the first byte.
Duplicates can happen because backup workers may store the log for
old epochs successfully, but do not update the progress before another
recovery happened. As a result, next epoch will retry and creates
duplicated log files.
VersionedData used to include a MutationRef, which is made from BinaryReader.
Unfortunately, the StringRef inside MutationRef points a memory allocated from
the BinaryReader's arena, which is free'd after BinaryReader is destroyed.
Change to use a StringRef pointing to the serialized mutation solves this bug.
fdbconvert is intended to convert new backup files which are tagged mutation
logs to old backup format. The actual conversion is not included in this commit
and will be added in future commits.
Note that the BackupContainer needs to be updated to support new backup files,
which is also not included in this commit.