While displaced backup workers wait for uploading to finish, they can get a
connection_failed error, which caused a spurious BackupFailed SevError. Fix this
by ignoring any errors from the uploading actor.
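A minimal sketch of the fix, using `std::future` in place of a Flow actor. `uploadMutations` and `waitForUploadIgnoringErrors` are hypothetical names: the displaced worker still waits for the upload, but any error (such as a connection failure) is swallowed instead of being escalated into a BackupFailed event.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>

// Hypothetical stand-in for the uploading actor: after the worker has been
// displaced, it may fail with a connection error.
void uploadMutations(bool failWithConnectionError) {
    if (failWithConnectionError) {
        throw std::runtime_error("connection_failed");
    }
}

// Wait for the upload to finish, but ignore any error it produces.
// Returns true only if the upload finished cleanly (for illustration).
bool waitForUploadIgnoringErrors(std::future<void>& upload) {
    try {
        upload.get();  // Rethrows any exception raised by the upload.
        return true;
    } catch (...) {
        return false;  // Ignored: the worker is being displaced anyway.
    }
}
```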
After an Arena object is counted, it can grow larger later, so we can't subtract
its (possibly larger) current size when releasing the memory later. Instead, we
use the arena size at the time mutations are inserted.
This introduces unhygienic macro variants that inline an `ENABLED &&`
before the TraceEvent. This way, they are compiled out entirely unless
enabled.
Then rewrite all debugMutation uses via sed.
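A minimal sketch of the macro pattern. `MUTATION_TRACKING_ENABLED`, `DEBUG_MUTATION`, and the simplified `debugMutation` signature are assumptions for illustration. Because the flag is a compile-time `false`, `&&` short-circuits and the call (and its TraceEvent) is never evaluated, so the compiler can drop it.

```cpp
#include <cassert>
#include <cstdio>

// Assumed compile-time switch; a constexpr false lets the guarded call
// be removed entirely.
constexpr bool MUTATION_TRACKING_ENABLED = false;

int debugMutationCalls = 0;  // Counts calls, for illustration only.

// Hypothetical stand-in for the function that emits the TraceEvent.
bool debugMutation(const char* context, long version) {
    ++debugMutationCalls;
    std::printf("%s at version %ld\n", context, version);
    return true;
}

// Unhygienic macro: it pastes `ENABLED &&` in front of the call, so when the
// flag is false the right-hand side is never evaluated.
#define DEBUG_MUTATION(context, version) \
    (MUTATION_TRACKING_ENABLED && debugMutation((context), (version)))
```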
I.e., do not allow the same version's mutations to be saved in different files.
Otherwise, a file may contain only part of a version's data, causing continuity
analysis of mutation logs to fail. This could also cause restore failures if the
target version's mutations are stored in two files.
In the above description, all mutation logs refer to the same tag's logs.
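A sketch of the rule, assuming a size-targeted file writer; `MutationLogWriter`, `targetBytes`, and the in-memory "files" are hypothetical. A new file is started only at a version boundary, so a version's mutations always land in a single file, even if that file overshoots its size target.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Version = int64_t;

struct Mutation {
    Version version;
    int bytes;
};

// Each "file" is modeled as a vector of mutations for a single tag.
class MutationLogWriter {
public:
    explicit MutationLogWriter(int targetBytes) : targetBytes_(targetBytes) {}

    void append(const Mutation& m) {
        // Only roll over to a new file when the version changes.
        bool versionBoundary = !current_.empty() && current_.back().version != m.version;
        if (versionBoundary && currentBytes_ >= targetBytes_) {
            finishFile();
        }
        current_.push_back(m);
        currentBytes_ += m.bytes;
    }

    // Flush the last (possibly undersized) file and return all files.
    std::vector<std::vector<Mutation>> close() {
        if (!current_.empty()) finishFile();
        return std::move(files_);
    }

private:
    void finishFile() {
        files_.push_back(std::move(current_));
        current_.clear();
        currentBytes_ = 0;
    }

    int targetBytes_;
    int currentBytes_ = 0;
    std::vector<Mutation> current_;
    std::vector<std::vector<Mutation>> files_;
};
```

With a 10-byte target, two 6-byte mutations at version 1 stay together in one 12-byte file rather than being split.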
For the first mutation log of a backup, we need to true up its begin version to
the exact version of the first mutation. This is needed to ensure a strict
less-than relationship between two mutation logs if one's version range is
within the other's.
A problematic scenario is as follows:
Epoch 1: a mutation log A [200, 900] is saved, but its progress is NOT saved.
Epoch 2: the master recruits a worker for [1, 1000], where 1000 is epoch 1's end
version. The new worker saves a mutation log B [100, 1000].
A's range is strictly within B's range, but A's size is larger than B's.
This happens because B's begin version is set to the backup's begin version,
which is not the actual version of the first mutation. Once B's begin version
is trued up to 300 (the version of its first mutation), we no longer have this
issue.
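The scenario can be checked with a small sketch. `LogRange`, `strictlyWithin`, and `trueUpFirstLog` are hypothetical helpers, and we assume B's first mutation is at version 300 as in the example. Before the true-up, A lies strictly within B; afterwards it no longer does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Version = int64_t;

struct LogRange {
    Version begin;
    Version end;
};

// True if a's range is contained in b's range and is not equal to it.
bool strictlyWithin(const LogRange& a, const LogRange& b) {
    bool contained = b.begin <= a.begin && a.end <= b.end;
    return contained && (b.begin < a.begin || a.end < b.end);
}

// True up a backup's first log: move its begin version up to the version of
// its first mutation. Assumes mutationVersions is non-empty and sorted.
LogRange trueUpFirstLog(const LogRange& log, const std::vector<Version>& mutationVersions) {
    return {std::max(log.begin, mutationVersions.front()), log.end};
}
```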
This fixes the consistency check failure where saving progress commits new
transactions. Popping is performed by the NOOP loop in monitorBackupKeyOrPullData.
When the Master recruits a backup worker for a previous epoch, the Master may
set the begin version to a very low number because the backup progress for
that epoch was not saved. This can cause problems for the log file, since these
low versions have already been popped.
The fix here is to advance savedVersion to the minimum of the backups' starting
versions if it is higher than the begin version set by the Master. This is safe
because these versions have not been popped. If they had been popped, their
progress should already be recorded and the Master would use a version higher
than the backup's starting version.
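A sketch of the adjustment under stated assumptions: `initialSavedVersion` is a hypothetical helper, and the exact off-by-one convention of savedVersion in the real code is not shown here. It takes the Master's begin version and the starting versions of the active backups, and moves up to the minimum starting version when that is higher.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Version = int64_t;

// beginVersion: the (possibly very low) begin version chosen by the Master.
// backupStartVersions: starting versions of the active backups (non-empty).
Version initialSavedVersion(Version beginVersion, const std::vector<Version>& backupStartVersions) {
    Version minStart = *std::min_element(backupStartVersions.begin(), backupStartVersions.end());
    // Versions from minStart onward have not been popped (otherwise progress
    // would be recorded and the Master would have chosen a higher begin
    // version), so skipping ahead to minStart is safe.
    return std::max(beginVersion, minStart);
}
```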
To pause/resume the backup workers, the fdbbackup command writes to the
backupPausedKey. Backup workers then notice that the value of the key has
changed and stop/resume pulling from the TLog.
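A minimal model of the worker side, with loudly stated assumptions: `BackupWorkerPauseState` is hypothetical, and treating the value "1" as paused (anything else, or an absent key, as running) is an assumed encoding, not necessarily the real one. A watch on backupPausedKey would call `onPausedKeyChanged`, and the pull loop consults `shouldPullFromTLog` before fetching more data.

```cpp
#include <cassert>
#include <optional>
#include <string>

class BackupWorkerPauseState {
public:
    // Called whenever a watch on backupPausedKey fires with the new value.
    // Assumption: "1" means paused; any other value or no value means running.
    void onPausedKeyChanged(const std::optional<std::string>& value) {
        paused_ = value.has_value() && *value == "1";
    }

    // The pull loop checks this before fetching more data from the TLog.
    bool shouldPullFromTLog() const { return !paused_; }

private:
    bool paused_ = false;
};
```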