Previously, we allowed holes in version ranges in partitioned mutation logs.
This has been changed so that restore can easily figure out whether the
database is restorable. A specific problem was that if a backup worker didn't
find any mutations for an old epoch, the worker could just exit without
generating a log file, thus leaving holes in version ranges.
Another contract change is that if a backup key is set, then we must store
all mutations for that key, especially in the worker for the old epoch. As a
result, the worker must first check the backup key before pulling mutations
and uploading logs. Otherwise, we may lose mutations.
Finally, when a backup key is removed, mutations should be saved up to the
current version so that the backup worker doesn't exit too early, i.e., we
avoid the case where the saved mutation versions are less than the version of
the snapshot taken.
The NOOP pop causes some mutation ranges to be dropped by backup workers. As a
result, the backup is incomplete. Specifically, the BACKUP_NOOP_POP_DELAY wait
blocks the actor that monitors the backup key.
For partitioned logs, compute the continuous log end version from the minimum
begin version across logs. The old backup test keeps using describeBackup()
so that it stays correctness clean. Rename partitioned log files so that the
last number is the block size.
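As an illustration, here is a minimal sketch (plain C++ with illustrative
names, not the actual FoundationDB code) of computing the continuous log end
version: sort the partitioned log files, start from the minimum begin version,
and extend the covered range until the first hole.

    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    using Version = int64_t;

    struct LogFile {
        Version beginVersion; // inclusive
        Version endVersion;   // exclusive
    };

    // Returns the end of the continuous version range covered by the files,
    // starting from the minimum begin version; -1 if there are no files.
    Version continuousLogEnd(std::vector<LogFile> files) {
        if (files.empty()) return -1;
        std::sort(files.begin(), files.end(),
                  [](const LogFile& a, const LogFile& b) {
                      return a.beginVersion < b.beginVersion;
                  });
        Version end = files.front().beginVersion; // min begin version
        for (const LogFile& f : files) {
            if (f.beginVersion > end) break; // hole: [end, f.beginVersion) missing
            end = std::max(end, f.endVersion);
        }
        return end;
    }

    int main() {
        // [100,200) and [200,350) are continuous; [400,500) is past a hole.
        std::vector<LogFile> files = { {100, 200}, {200, 350}, {400, 500} };
        std::cout << continuousLogEnd(files) << "\n"; // prints 350
    }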
For partitioned logs, mutations of the same version may be sent to an applier
out of order. If one loader advances to the next version, an applier may
receive later-version mutations from different loaders. So dropping early
mutations is wrong.
The subsequence number is needed so that mutations with the same commit
version number, but from different partitioned logs, can be correctly
reassembled in order.
For old backup files, the sub number is always 0. For partitioned mutation
logs, the actual sub number is used. For range files, the sub number is always
0.
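To make the ordering concrete, here is a minimal sketch (illustrative types,
not FoundationDB's API) of reassembling mutations from multiple partitioned
logs by sorting on the (commit version, subsequence) pair:

    #include <algorithm>
    #include <cstdint>
    #include <string>
    #include <tuple>
    #include <vector>

    using Version = int64_t;

    struct VersionedMutation {
        Version version;      // commit version
        uint32_t sub;         // subsequence; always 0 for old-format and range files
        std::string mutation;
    };

    // Merge mutations collected from different partitioned logs into the
    // total order in which they were committed.
    void reassemble(std::vector<VersionedMutation>& ms) {
        std::sort(ms.begin(), ms.end(),
                  [](const VersionedMutation& a, const VersionedMutation& b) {
                      return std::tie(a.version, a.sub) < std::tie(b.version, b.sub);
                  });
    }

    int main() {
        std::vector<VersionedMutation> ms = {
            { 700, 2, "clear b" }, { 700, 1, "set a=1" }, { 650, 0, "set c=2" }
        };
        reassemble(ms); // order: (650,0), (700,1), (700,2)
    }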
In parallel restore, use the new getPartitionedRestoreSet() to get a set
containing partitioned mutation logs. The loader uses a new parser to extract
mutations from partitioned logs.
TODO: fix "unable to restore" errors.
If a backup worker is on an old epoch, it could exit early if either of the
following is true:
- there are no backups
- all backups start at a version >= the endVersion
If this flag is set, the backup worker exits without doing any work, which
signals the master to update the oldest backup epoch.
The oldest backup epoch is piggybacked in LogSystemConfig from the master to
the cluster controller and then to all workers. Previously, this epoch was
set to the current master epoch, which was wrong.
Because each log version contains a commit version and a subsequence number,
each key can only have one mutation per log version. This simplifies
StagingKey::add() a lot.
This optimization reduces the number of messages sent from loaders to
appliers; the extra messages were unintentionally introduced when subsequence
numbers were added for mutations.
For reasons that are not yet understood, duplicate mutations can be added to
a StagingKey and need to be filtered out. Otherwise, atomic operations can
result in corrupted data in the database.
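A minimal sketch of the two StagingKey points above (hypothetical types, not
the actual StagingKey): keying the staged mutations by (commit version,
subsequence) both guarantees at most one mutation per log version and filters
out duplicates.

    #include <cstdint>
    #include <map>
    #include <string>
    #include <utility>

    using LogVersion = std::pair<int64_t, uint32_t>; // (commit version, subsequence)

    struct StagingKeySketch {
        std::map<LogVersion, std::string> mutations; // one mutation per log version

        void add(const LogVersion& v, const std::string& m) {
            // emplace is a no-op if this (version, sub) already exists, so a
            // duplicated mutation is ignored rather than applied twice, which
            // would corrupt data for atomic operations.
            mutations.emplace(v, m);
        }
    };

    int main() {
        StagingKeySketch k;
        k.add({ 700, 1 }, "ATOMIC_ADD +1");
        k.add({ 700, 1 }, "ATOMIC_ADD +1"); // duplicate: filtered out
    }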
If workers from previous epochs are still ongoing, we may end up with a
container that misses mutations from previous epochs. So the update only
happens after only the current epoch's backup workers remain.
A backup worker starts by checking whether there are backup keys and then runs
the monitorBackupKeyOrPullData() loop, which performs the check again. The
second check can be delayed, which causes the loop to perform NOOP pops. The
fix removes this second check and uses the result of the first check to decide
what to do in the loop.
Otherwise, when there are no mutations for the unfinished range, the empty
file may not be created when the worker is displaced, thus leaving holes in
version ranges.
The start version of a tlog set can be smaller than the last epoch's end
version. In this case, set the backup worker's start version to the last
epoch's end version to avoid overlapping version ranges among backup workers.
If backup workers are enabled, the current epoch's worker for tag (-2,0) is
responsible for monitoring the backup progress of all workers and updating
the BackupConfig with the latest saved log version, which is the minimum
version across all tags.
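For example, the monitor's computation might look like the following sketch
(illustrative types, not the actual code): the latest saved log version is
the minimum saved version across all tags, since every version below it has
been saved by every worker.

    #include <algorithm>
    #include <cstdint>
    #include <limits>
    #include <map>
    #include <tuple>

    using Version = int64_t;

    struct Tag { // illustrative stand-in for a backup worker tag
        int locality;
        int id;
        bool operator<(const Tag& o) const {
            return std::tie(locality, id) < std::tie(o.locality, o.id);
        }
    };

    // Latest saved log version = minimum progress across all tags.
    Version latestSavedLogVersion(const std::map<Tag, Version>& savedByTag) {
        Version v = std::numeric_limits<Version>::max();
        for (const auto& entry : savedByTag)
            v = std::min(v, entry.second);
        return v;
    }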
This change has been incorporated into getLatestRestorableVersion() so that
it is transparent to clients.
This delay ensures that the old epoch's backup workers can save their progress
in the database. Otherwise, the new master could attempt to recruit backup
workers for the old epoch on version ranges that have already been popped. As
a result, the logs would lose data.
When the pull has finished and the message queue is empty, we should use the
end version as the popVersion for backup files. Otherwise, there might be a
version gap between the last message and the end version.
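In sketch form (variable names are illustrative), the rule is just:

    #include <cstdint>

    using Version = int64_t;

    // Choose the pop version for saved backup files: once the pull has
    // finished and the queue is drained, pop all the way to endVersion so no
    // gap is left between the last message and the end version.
    Version choosePopVersion(bool pullFinished, bool queueEmpty,
                             Version lastMessageVersion, Version endVersion) {
        return (pullFinished && queueEmpty) ? endVersion : lastMessageVersion;
    }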
A partial recovery can result in an empty epoch that copies the previous
epoch's version range. In this case, getOldEpochTagsVersionsInfo() will not
return the previous epoch's information. To correctly compute the start
version for a backup worker, we need to check the previous epoch's saved
versions. If they are larger than this epoch's begin version, use the
previously saved version as the start version.
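A minimal sketch of the resulting rule (hypothetical helper, not the actual
recruitment code): the worker resumes from whichever is larger, this epoch's
begin version or the version the previous epoch's worker already saved.

    #include <algorithm>
    #include <cstdint>

    using Version = int64_t;

    Version backupWorkerStartVersion(Version epochBeginVersion,
                                     Version previousEpochSavedVersion) {
        // If the previous epoch's worker already saved past this epoch's
        // begin version, start there to avoid re-saving a version range.
        return std::max(epochBeginVersion, previousEpochSavedVersion);
    }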
With TLS, a worker (or process) can have both a TLS address and a non-TLS
address. When a process is created in simulation, the primary address is TLS
by default. The non-TLS address's port is the TLS address's port plus one.
In a connection between two workers, if their primary addresses do not both
enable or both disable TLS, one worker will swap its primary and secondary
addresses so that the TLS configurations of the two endpoints match.
The swap can make the primary address no longer the TLS one created when the
process was created, and the swap happens only for the worker, not for the
process struct in simulation. This swap can therefore cause
worker->address != process->address.
In the checkForExtraDataStores actor, we use worker->address to check whether
a process is killable and use process->address to kill the process. The
inconsistency can cause simulation to kill a protected process that is not
killable, which leads to simulation failure.
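A minimal sketch of the swap (simplified types; the real logic lives in the
networking and simulation code) and why it breaks the address invariant:

    #include <cstdint>
    #include <utility>

    struct NetworkAddress {
        uint32_t ip;
        uint16_t port;
        bool tls;
    };

    struct WorkerAddresses {
        NetworkAddress primary;   // TLS by default in simulation
        NetworkAddress secondary; // non-TLS, port = TLS port + 1
    };

    void matchTLS(WorkerAddresses& self, const NetworkAddress& peerPrimary) {
        if (self.primary.tls != peerPrimary.tls) {
            // Swap so both endpoints agree on TLS. Only the worker's copy is
            // swapped, not the simulated process struct, which is how
            // worker->address can diverge from process->address.
            std::swap(self.primary, self.secondary);
        }
    }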
The idea is that we keep around a TLSConfig holding the configuration the
user has provided; when we want to initialize an SSL context, we ask the
TLSConfig to load all certificates and return a LoadedTLSConfig that is a
concrete set of certificate bytes in memory. initTLS now just takes the
in-memory bytes and applies them to the SSL context.
This is a large refactor leading up to certificate refreshing, where we will
periodically check for changes to the certificates, and then re-load them and
apply them to a new SSL context.
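The shape of the refactor, as a sketch (signatures are assumptions, not the
exact FoundationDB API):

    #include <fstream>
    #include <sstream>
    #include <string>

    // Concrete certificate bytes in memory, ready for an SSL context.
    struct LoadedTLSConfig {
        std::string certBytes;
        std::string keyBytes;
        std::string caBytes;
    };

    // What the user configured: here just file paths, loaded on demand.
    struct TLSConfig {
        std::string certPath, keyPath, caPath;

        LoadedTLSConfig loadAll() const {
            auto slurp = [](const std::string& path) {
                std::ifstream in(path, std::ios::binary);
                std::ostringstream out;
                out << in.rdbuf();
                return out.str();
            };
            return LoadedTLSConfig{ slurp(certPath), slurp(keyPath), slurp(caPath) };
        }
    };

    // initTLS only ever sees in-memory bytes and never touches disk, so
    // refreshing is just: re-run loadAll(), then apply to a new SSL context
    // (elided here; in practice e.g. a boost::asio::ssl::context).
    void initTLS(const LoadedTLSConfig& loaded) {
        (void)loaded; // apply certBytes/keyBytes/caBytes to the context
    }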