Because different tags may start at different versions, tag 0 can start at a
higher version. In this case, another tag's high version should be used as
the start version for continuous logs.
For partitioned logs, compute the continuous log end version from the minimum
log begin version. The old backup test keeps using describeBackup() so that it
stays correctness clean.
Rename partitioned log files so that the last number is the block size.
For partitioned logs, mutations of the same version may be sent to an applier
out of order. If one loader advances to the next version, an applier may
receive later-version mutations from different loaders. So dropping earlier
mutations is wrong.
The subsequence number is needed so that mutations of the same commit version
number, but from different partitioned logs can be correctly reassembled in
order.
For old backup files, the sub number is always 0. For partitioned mutation
logs, the actual sub number is used. For range files, the sub number is always
0.
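
A minimal sketch of the reassembly described above, assuming each partitioned
log yields mutations already sorted by (commit version, sub number). The
Mutation type, its field names, and reassemble() are hypothetical and exist
only for illustration; they are not the restore code.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Mutation(NamedTuple):
    # Hypothetical record; field names are assumptions for this sketch.
    version: int   # commit version
    sub: int       # subsequence number within the commit version
    payload: str   # stand-in for the real mutation bytes


def reassemble(*logs: Iterable[Mutation]) -> List[Mutation]:
    """Merge per-log streams, each already sorted by (version, sub)."""
    return list(heapq.merge(*logs, key=lambda m: (m.version, m.sub)))


# Two partitioned logs holding mutations of the same commit version 100.
log_a = [Mutation(100, 1, "set a"), Mutation(100, 3, "clear b")]
log_b = [Mutation(100, 2, "add a"), Mutation(101, 1, "set c")]
merged = reassemble(log_a, log_b)
```

Without the sub number, the three mutations at version 100 would have no
defined relative order across the two logs.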
In parallel restore, use the new getPartitionedRestoreSet() to get a set
containing partitioned mutation logs. The loader uses a new parser to extract
mutations from partitioned logs.
TODO: fix unable to restore errors.
If a backup worker is on an old epoch, it can exit early if either of the
following is true:
- there are no backups
- all backups start at a version >= the endVersion
If this flag is set, the backup worker exits without doing any work, which
signals the master to update the oldest backup epoch.
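
The early-exit condition above can be sketched as a single predicate. This is
an illustration only: can_exit_early() and its parameter names are
hypothetical, and the real backup worker reads this state from its own data
structures.

```python
from typing import Iterable


def can_exit_early(on_old_epoch: bool,
                   backup_start_versions: Iterable[int],
                   end_version: int) -> bool:
    """Return True if an old-epoch worker has nothing to back up."""
    if not on_old_epoch:
        return False
    starts = list(backup_start_versions)
    # No backups at all, or every backup starts at or after endVersion.
    return not starts or all(start >= end_version for start in starts)
```

For example, an old-epoch worker with endVersion 100 can exit when the only
backups start at versions 100 and 150, but not when one starts at version 50.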
For old submitBackup(), where partitionedLog is false, do not set the
backupStartedKey in BackupConfig, which signals backup workers to skip these
backups.
The oldest backup epoch is piggybacked in LogSystemConfig from master to
cluster controller and then to all workers. Previously, this epoch was set
to the current master epoch, which is wrong.
Because each log version contains commit version and subsequence number, each
key can only have one mutation for its log version. This simplifies
StagingKey::add() a lot.
This optimization reduces the number of messages sent from loader to applier.
The number of messages was unintentionally increased when sub sequence numbers
were introduced for mutations.
For a reason I have not yet identified, duplicate mutations can be added to
StagingKey and need to be filtered out. Otherwise, atomic operations can
corrupt data in the database.
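
A minimal sketch of the filtering, under the assumption that a duplicate is a
mutation arriving with a log version (commit version, sub number) that the key
has already seen. The source does not state how duplicates are detected, so
StagingKeySketch and its methods are hypothetical and are not the real
StagingKey.

```python
from typing import Dict, List, Tuple

LogVersion = Tuple[int, int]  # (commit version, subsequence number)


class StagingKeySketch:
    """Hypothetical stand-in: at most one mutation per log version."""

    def __init__(self) -> None:
        self.pending: Dict[LogVersion, str] = {}

    def add(self, version: int, sub: int, mutation: str) -> bool:
        """Store the mutation; return False if this log version was seen."""
        log_version = (version, sub)
        if log_version in self.pending:
            return False  # assumed duplicate: applying it twice is unsafe
        self.pending[log_version] = mutation
        return True

    def ordered(self) -> List[str]:
        """Mutations in log-version order, ready to apply."""
        return [self.pending[k] for k in sorted(self.pending)]


key = StagingKeySketch()
first = key.add(100, 1, "atomic add 5")
second = key.add(100, 1, "atomic add 5")  # same log version again
key.add(99, 7, "set 0")
```

Applying "atomic add 5" twice would change the result, which is why the
repeated add() is rejected rather than stored.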
After reading a new block, all mutations are sorted by version again, which
can invalidate the previous tuple. As a result, the decoded file will miss
some of the mutations.
If workers for previous epochs are still running, we may end up with a
container that misses mutations from previous epochs. So the update happens
only once the only remaining backup workers belong to the current epoch.
Partitioned logs can have strict subset version ranges, which was not properly
handled -- we used to assume overlapping only happens for the same begin
version.
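
A sketch of a continuous-range computation that tolerates strict subsets and
other overlaps, for the files of a single tag. It assumes half-open
[begin, end) version ranges; continuous_end() is a hypothetical helper and not
the function changed by this fix.

```python
from typing import Iterable, Optional, Tuple


def continuous_end(ranges: Iterable[Tuple[int, int]],
                   start: int) -> Optional[int]:
    """Return the end of gap-free coverage beginning at start, or None."""
    end = start
    for begin, stop in sorted(ranges):
        if begin > end:
            break  # hole in coverage: nothing reaches this range
        # A subset range such as (120, 150) inside (100, 200) leaves end as is.
        end = max(end, stop)
    return end if end > start else None


files = [(100, 200), (120, 150), (200, 300), (350, 400)]
covered_until = continuous_end(files, 100)
```

Here (120, 150) is a strict subset of (100, 200) with a different begin
version, and coverage stops at 300 because nothing covers versions 300 to 349.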
The backup worker starts by checking whether there are backup keys and then
runs the monitorBackupKeyOrPullData() loop, which does the check again. The
second check
can be delayed, which causes the loop to perform NOOP pops. The fix removes
this second check and uses the result of the first check to decide what to do
in the loop.
The overlap can only happen between two generations, where the range from the
known committed version to the recovery version is copied from the old
generation to the new generation. Within a generation, there is no overlap.
The fix here is related to the calculation of continuous version ranges,
allowing the overlap to happen.
Otherwise, when there are no mutations for the unfinished range, the empty file
may not be created when the worker is displaced, thus leaving holes in version
ranges.