The sanity check parses each range file to get its key range.
The parsing triggers a restore_unsupported_file_version error.
We need to include this sanity check before the 6.3 release.
For the oldest epoch the master gets, the begin version may be assumed to be 1,
which can be wrong. In this case, we use the saved backup progress to "true-up"
the real begin version.
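
A minimal sketch of the true-up step. It assumes the saved progress maps each
tag to the last version already persisted, and that the real begin version is
one past the smallest of those. The function name, the dictionary shape and
the "+ 1" rule are illustrative assumptions, not the actual implementation.

```python
def true_up_begin_version(assumed_begin, saved_versions):
    """Return the begin version to use for the oldest epoch.

    assumed_begin:  the begin version the master assumed (e.g. 1).
    saved_versions: hypothetical mapping of tag id -> last saved version.
    """
    if not saved_versions:
        # No saved progress: nothing to correct against.
        return assumed_begin
    # Assumption: everything up to the smallest saved version is already
    # backed up for all tags, so the backup can resume right after it.
    return max(assumed_begin, min(saved_versions.values()) + 1)


corrected = true_up_begin_version(1, {0: 100, 1: 120})
```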
The backup container can figure out whether partitioned logs are used by
looking at the mutation logs, so the API is consolidated into a single
describeBackup(), as before.
The backup container can have mutation log files that are not continuous
overall but contain a continuous range that covers the snapshots. So when
determining the continuous log ranges, try to find one that covers the first
snapshot's begin version.
When scanning from a known version, stop if the logs are not continuous.
However, when scanning from 0, we should reset minLogBegin to the next file and
continue scanning from that file's begin version.
The contiguous version scan should start from the max of the scanBegin version
and the minLogBegin version. Once we find a version that's larger, set it as
the contiguousLogEnd version.
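
The scanning rules above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the container code: log files are
modeled as (begin, end) version pairs with an exclusive end, and the function
name and the snapshot_begin parameter are hypothetical.

```python
def scan_contiguous_logs(files, scan_begin, snapshot_begin=None):
    """Return (min_log_begin, contiguous_log_end), or (None, None) if no logs.

    files:          iterable of (begin, end) version pairs, end exclusive.
    scan_begin:     known version to scan from, or 0 for a full scan.
    snapshot_begin: begin version of the first snapshot, if known.
    """
    min_log_begin = None
    contiguous_end = None
    for begin, end in sorted(files):
        if end <= scan_begin:
            continue  # file lies entirely before the scan start
        if min_log_begin is None:
            min_log_begin = begin
            # Start from the max of scanBegin and minLogBegin.
            contiguous_end = max(scan_begin, min_log_begin)
        if begin > contiguous_end:
            # Gap found. A scan from a known version stops here.
            if scan_begin > 0:
                break
            # A full scan also stops once the range covers the snapshot.
            if (snapshot_begin is not None
                    and min_log_begin <= snapshot_begin < contiguous_end):
                break
            # Otherwise restart the range at this file.
            min_log_begin = begin
            contiguous_end = begin
        if end > contiguous_end:
            contiguous_end = end
    return min_log_begin, contiguous_end


logs = [(0, 10), (10, 20), (30, 40), (40, 50)]
full_scan = scan_contiguous_logs(logs, 0, snapshot_begin=35)
known_scan = scan_contiguous_logs(logs, 5)
```

With the gap between 20 and 30, the full scan restarts at the third file and
reports the range that covers the snapshot, while the scan from version 5
stops at the gap.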
Partitioned logs can have version ranges that are strict subsets of others,
which was not properly handled -- we used to assume overlapping only happens
for the same begin version.
The overlap can only happen between two generations, where the range from the
known committed version to the recovery version is copied from the old
generation to the new generation. Within a generation, there is no overlap.
The fix here is related to the calculation of continuous version ranges,
allowing the overlap to happen.
If a log file's progress is not saved, a new log file will be generated
with the same begin version. Then we can have a file that contains a subset
of the contents of another log file. During restore, we should filter out
files whose contents are a subset of other files.
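
A sketch of the subset filtering, assuming each log file is described by a
(tag, begin, end) tuple with an exclusive end version and that a file is a
subset of another when both have the same tag and its version range is
contained in the other's. The tuple layout and function name are hypothetical.

```python
def filter_subset_files(files):
    """Drop files whose version range is contained in another file of the
    same tag. files: iterable of (tag, begin, end), end exclusive."""
    kept = []
    # Sort by tag, then begin ascending, then end descending, so a containing
    # file is always visited before the files it contains.
    for tag, begin, end in sorted(files, key=lambda f: (f[0], f[1], -f[2])):
        # Kept files of one tag have strictly increasing end versions, so
        # checking the most recently kept file is sufficient.
        if kept and kept[-1][0] == tag and kept[-1][2] >= end:
            continue  # range is covered by an earlier file
        kept.append((tag, begin, end))
    return kept


files = [(0, 100, 150), (0, 100, 200), (0, 120, 180), (0, 200, 300), (1, 100, 150)]
restorable = filter_subset_files(files)
```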
Because different tags may start at different versions, tag 0 can start at a
higher version. In this case, another tag's high version should be used as
the start version for continuous logs.
For partitioned logs, compute the continuous log end version starting from the
minimum log begin version. The old backup test keeps using describeBackup() so
that it stays correctness clean.
Rename partitioned log files so that the last number in the name is the block
size.
In parallel restore, use the new getPartitionedRestoreSet() to get a set
containing partitioned mutation logs. The loader uses a new parser to extract
mutations from partitioned logs.
TODO: fix unable to restore errors.
The mutation logs of backup workers are saved into the "mlogs" directory under
the container directory. The backup worker has been restructured to handle
multiple backups, where each one is stored in a separate backup container.
In the backup worker, mutations pulled from TLogs are buffered in a message
queue. When writing out to different containers, their corresponding mutation
ranges are used to check if a mutation should be written. When a new backup
is submitted by the client, "backupStartedKey" is updated. The worker monitors
this key and updates its internal map of backups; the next pull from the TLog
then waits until the new backup is ready. This ensures that when worker 0 marks
the backup as started, all workers have already been logging mutations for the
backup.
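
The per-container range check can be illustrated with a small sketch. It
assumes buffered mutations are (version, key) pairs touching a single key and
that each backup carries a list of [begin, end) key ranges; the names below
are hypothetical, and range mutations are not modeled.

```python
def select_mutations(buffered, ranges):
    """Return the buffered (version, key) mutations whose key falls inside
    any of the backup's [begin, end) key ranges."""
    return [(version, key) for version, key in buffered
            if any(begin <= key < end for begin, end in ranges)]


# Mutations pulled from the TLogs and held in the message queue.
buffered = [(10, b"apple"), (11, b"kiwi"), (12, b"zebra")]

# Internal map of active backups: backup id -> key ranges to back up.
backups = {
    "backupA": [(b"a", b"m")],
    "backupB": [(b"k", b"\xff")],
}

# Each backup container only receives the mutations in its own ranges.
per_container = {uid: select_mutations(buffered, ranges)
                 for uid, ranges in backups.items()}
```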
1) Use runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition of loop-choose-when in
the core routines of the restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
the file only contains functionality related to the restore worker.
Passed correctness test