In the old mutation logs, a version's mutations are serialized as a buffer.
Then the buffer is split into smaller chunks, e.g., 10000 bytes each. When
writing chunks to the final mutation log file, these chunks can be flushed
out of order. For instance, the (version, chunk_part) pairs can appear in the
order (3, 0), (4, 0), (3, 1). As a result, the decoder must read forward to
find all chunks of data for a version.
Another complication is that the files are organized into blocks, and (3, 1)
can be in a subsequent block. This change checks the value size for each
version; if the size is smaller than the expected size, the decoder looks for
the missing chunks in the next block.
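The reassembly rule above can be sketched as follows. This is a minimal illustration, not the actual decoder: the Chunk struct, its totalSize field (assumed to be the expected size of the whole serialized buffer, known when a chunk is read), and the ChunkAssembler class are all hypothetical. Chunks are buffered per version, a version is emitted only once its received bytes reach the expected size, and incomplete versions carry over to the next block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chunk: one (version, part) piece of a version's buffer.
struct Chunk {
    int64_t version;
    int part;
    size_t totalSize;  // assumed: expected size of the whole buffer
    std::string data;
};

class ChunkAssembler {
public:
    // Feeds one block's chunks and returns the versions completed so far,
    // with their parts concatenated in part order.
    std::map<int64_t, std::string> addBlock(const std::vector<Chunk>& block) {
        std::map<int64_t, std::string> done;
        for (const Chunk& c : block) {
            Pending& p = pending_[c.version];
            p.totalSize = c.totalSize;
            p.received += c.data.size();  // sketch: assumes no duplicate chunks
            p.parts[c.part] = c.data;
            if (p.received == p.totalSize) {
                std::string buf;
                for (const auto& kv : p.parts) buf += kv.second;  // ordered by part
                done[c.version] = std::move(buf);
                pending_.erase(c.version);
            }
        }
        // Versions still in pending_ are smaller than their expected size;
        // their missing chunks are looked for in the next block.
        return done;
    }

    size_t incompleteVersions() const { return pending_.size(); }

private:
    struct Pending {
        size_t totalSize = 0;
        size_t received = 0;
        std::map<int, std::string> parts;
    };
    std::map<int64_t, Pending> pending_;
};
```

Feeding a block with (3, 0) and (4, 0), then a second block with (3, 1), completes version 4 first and version 3 only after the second block, matching the example in the text.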
The start version of a tlog set can be smaller than the last epoch's end
version. In this case, set the backup worker's start version to the last
epoch's end version to avoid overlapping version ranges among backup workers.
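The clamping rule amounts to taking the larger of the two versions. The helper name below is hypothetical; only the rule comes from the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using Version = int64_t;  // FoundationDB versions are 64-bit integers

// Hypothetical helper: never start before the previous epoch's end version,
// so this worker's range cannot overlap the previous epoch's workers' ranges.
Version backupWorkerStartVersion(Version tlogSetStartVersion,
                                 Version lastEpochEndVersion) {
    return std::max(tlogSetStartVersion, lastEpochEndVersion);
}
```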
If backup workers are enabled, the current epoch's worker with tag (-2,0) is
responsible for monitoring the backup progress of all workers and updating
the BackupConfig with the latest saved log version, which is the minimum
saved version across all tags.
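A sketch of the progress computation, assuming progress is available as a map from a (simplified, integer) tag id to that tag's saved version. The function name and the expectedTags parameter are hypothetical; returning nothing until every tag has reported is an assumption, since a minimum over a partial set of tags would overstate progress.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

using Version = int64_t;

// The backup is only saved up to a version once every tag has saved up to
// it, so the latest saved log version is the minimum across tags.
std::optional<Version> latestSavedLogVersion(const std::map<int, Version>& savedByTag,
                                             size_t expectedTags) {
    // Assumption: with a tag still missing, the minimum is unknown.
    if (savedByTag.empty() || savedByTag.size() < expectedTags) return std::nullopt;
    Version minSaved = savedByTag.begin()->second;
    for (const auto& kv : savedByTag) minSaved = std::min(minSaved, kv.second);
    return minSaved;
}
```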
This change has been incorporated into getLatestRestorableVersion() so that
it is transparent to clients.
This delay ensures that the old epoch's backup workers can save their progress
in the database. Otherwise, the new master could attempt to recruit backup
workers for the old epoch on version ranges that have already been popped. As
a result, the backup logs would lose data.
If a log file's progress is not saved, a new log file will be generated
with the same begin version. Then we can have a file whose contents are a
subset of another log file's contents. During restore, we should filter out
files whose contents are a subset of other files.
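One way to express the filter, assuming a file's contents are fully described by its tag and its half-open version range [beginVersion, endVersion). The LogFile struct and the function names are hypothetical. A file is dropped when another file of the same tag covers its range; for exact duplicates only the first occurrence is kept. The quadratic loop favors clarity over speed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Version = int64_t;

// Hypothetical partitioned log file covering [beginVersion, endVersion).
struct LogFile {
    Version beginVersion;
    Version endVersion;
    int tagId;
    std::string name;
};

// True if a's range lies inside b's range for the same tag.
bool isSubsetOf(const LogFile& a, const LogFile& b) {
    return a.tagId == b.tagId && b.beginVersion <= a.beginVersion &&
           a.endVersion <= b.endVersion;
}

std::vector<LogFile> filterSubsetFiles(const std::vector<LogFile>& files) {
    std::vector<LogFile> kept;
    for (size_t i = 0; i < files.size(); ++i) {
        bool covered = false;
        for (size_t j = 0; j < files.size() && !covered; ++j) {
            if (i == j || !isSubsetOf(files[i], files[j])) continue;
            bool sameRange = files[i].beginVersion == files[j].beginVersion &&
                             files[i].endVersion == files[j].endVersion;
            // A strict subset is always dropped; of identical ranges, keep the first.
            covered = !sameRange || j < i;
        }
        if (!covered) kept.push_back(files[i]);
    }
    return kept;
}
```

For example, a regenerated file covering [100, 150) for tag 0 is dropped when a file covering [100, 200) for the same tag exists, while a tag 1 file with the same range is kept.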
When pulling has finished and the message queue is empty, we should use the
end version as the popVersion for backup files. Otherwise, there might be a
version gap between the last message and the end version.
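The choice can be sketched as a small function. Its name and parameters are hypothetical, and the fallback to the last saved message version when pulling is not yet finished is an assumption; the text only specifies the finished-and-empty case.

```cpp
#include <cassert>
#include <cstdint>

using Version = int64_t;

// Hypothetical helper: the version that a worker's backup files can claim
// to cover, and hence the version it may pop to.
Version choosePopVersion(bool pullFinished, bool queueEmpty,
                         Version lastSavedMessageVersion, Version endVersion) {
    if (pullFinished && queueEmpty) {
        // Everything up to endVersion has been processed; stopping at the
        // last message would leave a gap between it and endVersion.
        return endVersion;
    }
    // Assumption: otherwise only what has actually been saved is covered.
    return lastSavedMessageVersion;
}
```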
A partial recovery can result in an empty epoch that copies the previous
epoch's version range. In this case, getOldEpochTagsVersionsInfo() will not
return the previous epoch's information. To correctly compute the start
version for a backup worker, we need to check the previous epoch's saved
versions. If they are larger than this epoch's begin version, use the
previously saved version as the start version.
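A sketch of the start-version computation. The function name is hypothetical, and so is the assumption that the previous epoch's saved versions arrive as a plain list; when several saved versions exceed the begin version, the sketch takes the largest, which is an interpretation rather than something the text states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Version = int64_t;

// Starts at this epoch's begin version unless the previous epoch's workers
// already saved past it, in which case it starts from the saved version.
Version computeStartVersion(Version epochBeginVersion,
                            const std::vector<Version>& previousSavedVersions) {
    Version start = epochBeginVersion;
    for (Version saved : previousSavedVersions) {
        start = std::max(start, saved);  // assumed: take the largest saved version
    }
    return start;
}
```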
Previously, we allowed holes in version ranges in partitioned mutation logs.
This has been changed so that restore can easily figure out whether the
database is restorable. A specific problem was that if the backup worker
didn't find any mutations for an old epoch, the worker could just exit without
generating a log file, thus leaving holes in version ranges.
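With the no-holes contract, checking restorability for one tag reduces to checking that its files cover the target range contiguously. The sketch below assumes half-open [begin, end) ranges; VersionRange and coversWithoutHoles are hypothetical names. A worker that exits without writing any file for its range is exactly what makes this check fail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Version = int64_t;

struct VersionRange {
    Version begin;  // inclusive
    Version end;    // exclusive (assumed)
};

// True if the ranges cover [begin, end) with no gaps.
bool coversWithoutHoles(std::vector<VersionRange> ranges, Version begin, Version end) {
    std::sort(ranges.begin(), ranges.end(),
              [](const VersionRange& a, const VersionRange& b) { return a.begin < b.begin; });
    Version covered = begin;  // every version below `covered` is accounted for
    for (const VersionRange& r : ranges) {
        if (r.begin > covered) return false;  // hole in [covered, r.begin)
        covered = std::max(covered, r.end);
        if (covered >= end) return true;
    }
    return covered >= end;
}
```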
Another contract change is that if a backup key is set, then we must store
all mutations for that key, especially in the worker for an old epoch. As a
result, the worker must first check the backup key before pulling mutations
and uploading logs. Otherwise, we may lose mutations.
Finally, when a backup key is removed, mutations should be saved up to the
current version so that the backup worker doesn't exit too early, i.e., to
avoid the case where saved mutation versions are less than the snapshot
version.
The NOOP pop causes some mutation ranges to be dropped by backup workers. As
a result, the backup is incomplete. Specifically, waiting for
BACKUP_NOOP_POP_DELAY blocks the actor that monitors the backup key.
Because different tags may start at different versions, tag 0 can start at a
higher version. In this case, another tag's high version should be used as
the start version for continuous logs.
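Read as a rule, continuous logs can only start once every tag has data, so the start is the highest of the tags' first begin versions, which need not belong to tag 0. The sketch assumes one first-begin version per tag in a vector indexed by tag; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Version = int64_t;

// firstBeginVersions[i]: begin version of tag i's earliest log file (assumed).
Version continuousLogStart(const std::vector<Version>& firstBeginVersions) {
    assert(!firstBeginVersions.empty());
    // The highest begin version wins, whichever tag it belongs to.
    return *std::max_element(firstBeginVersions.begin(), firstBeginVersions.end());
}
```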
For partitioned logs, compute the continuous log end version from the min logs
begin version. The old backup test keeps using describeBackup() to stay
correctness clean.
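A possible shape for the end-version computation, under stated assumptions: each tag's files are half-open [begin, end) ranges, and starting from the min logs begin version, each tag is followed forward while its files stay contiguous. The overall continuous end is the minimum over tags, consistent with the per-tag minimum used for saved progress. TagLogFile, contiguousEnd, and continuousLogEnd are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using Version = int64_t;

// One log file of a single tag, covering [begin, end) (assumed half-open).
struct TagLogFile {
    Version begin;
    Version end;
};

// Follows one tag's files from `from` and returns the first uncovered version.
Version contiguousEnd(std::vector<TagLogFile> files, Version from) {
    std::sort(files.begin(), files.end(),
              [](const TagLogFile& a, const TagLogFile& b) { return a.begin < b.begin; });
    Version end = from;
    for (const TagLogFile& f : files) {
        if (f.begin > end) break;  // gap: continuity stops here
        end = std::max(end, f.end);
    }
    return end;
}

// The whole backup is only continuous up to the slowest tag.
Version continuousLogEnd(const std::vector<std::vector<TagLogFile>>& filesByTag,
                         Version minLogBegin) {
    assert(!filesByTag.empty());
    Version result = std::numeric_limits<Version>::max();
    for (const auto& files : filesByTag) {
        result = std::min(result, contiguousEnd(files, minLogBegin));
    }
    return result;
}
```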
Rename partitioned log files so that the last number in the file name is the
block size.
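With the block size as the last number, a reader can recover it without knowing the other fields. The sketch assumes comma-separated names, illustrated with the hypothetical name log,100,200,uid,0-of-4,1048576; blockSizeFromFileName is a hypothetical helper, not an existing API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Returns the trailing comma-separated number of a file name, assumed to be
// the block size, or nothing if the last field is not a plain number.
std::optional<int64_t> blockSizeFromFileName(const std::string& name) {
    size_t comma = name.rfind(',');
    if (comma == std::string::npos || comma + 1 == name.size()) return std::nullopt;
    std::string field = name.substr(comma + 1);
    if (field.size() > 18) return std::nullopt;  // keep std::stoll from overflowing
    for (char ch : field) {
        if (ch < '0' || ch > '9') return std::nullopt;
    }
    return std::stoll(field);
}
```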
For partitioned logs, mutations of the same version may be sent to an applier
out of order. If one loader advances to the next version, an applier may
receive later-version mutations from different loaders. So, dropping earlier
mutations is wrong.
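The fix implies the applier must keep earlier-version mutations even after it has seen a later version from another loader. A minimal sketch, with a hypothetical ApplierBuffer class and mutations simplified to strings: mutations are buffered per version in an ordered map and released in version order, so a late-arriving earlier version is slotted into place instead of being discarded.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Version = int64_t;

class ApplierBuffer {
public:
    // Accepts a mutation regardless of which versions were seen before.
    void receive(Version v, const std::string& mutation) {
        byVersion_[v].push_back(mutation);
    }

    // Releases all buffered mutations in version order. Within a version the
    // order here is arrival order; the subsequence number would refine it.
    std::vector<std::string> drainInVersionOrder() {
        std::vector<std::string> out;
        for (const auto& kv : byVersion_) {
            for (const std::string& m : kv.second) out.push_back(m);
        }
        byVersion_.clear();
        return out;
    }

private:
    std::map<Version, std::vector<std::string>> byVersion_;
};
```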
The subsequence number is needed so that mutations with the same commit
version number but from different partitioned logs can be correctly
reassembled in order.
For old backup files, the sub number is always 0. For partitioned mutation
logs, the actual sub number is used. For range files, the sub number is always
0.
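The two rules above can be combined in a short sketch: the effective sub number depends on the file kind, and mutations are ordered by (commit version, sub number). FileKind, TaggedMutation, effectiveSubNumber, and sortForApply are hypothetical names, and mutations are simplified to strings. A stable sort keeps the original relative order of entries whose keys are equal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

using Version = int64_t;

enum class FileKind { OldMutationLog, PartitionedMutationLog, RangeFile };

// Only partitioned mutation logs carry a real sub number; others use 0.
int32_t effectiveSubNumber(FileKind kind, int32_t subFromFile) {
    return kind == FileKind::PartitionedMutationLog ? subFromFile : 0;
}

struct TaggedMutation {
    Version commitVersion;
    int32_t sub;
    std::string mutation;
};

// Orders by commit version first, then by sub number within a version.
void sortForApply(std::vector<TaggedMutation>& ms) {
    std::stable_sort(ms.begin(), ms.end(),
                     [](const TaggedMutation& a, const TaggedMutation& b) {
                         return std::tie(a.commitVersion, a.sub) <
                                std::tie(b.commitVersion, b.sub);
                     });
}
```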
In parallel restore, use the new getPartitionedRestoreSet() to get a set
containing partitioned mutation logs. The loader uses a new parser to extract
mutations from partitioned logs.
TODO: fix unable to restore errors.