29 Commits

Author SHA1 Message Date
FDB Formatster df90cc89de apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-10 10:18:07 -08:00
Markus Pilman 37d9e975e9 Fix multiple compiler warnings 2021-03-03 10:18:03 -07:00
Andrew Noyes d2cf700bd4 Fix compiler warnings 2020-07-28 18:30:26 +00:00
Jingyu Zhou d883426c6a Fix spammy GotBackupProgress events
Only print these types of events during master recovery, and don't log them for
backup workers.
2020-06-27 21:30:38 -07:00
Jingyu Zhou 3108102f26 Fix a backup progress true-up bug
Sometimes, the true-up has to go back multiple epochs to find saved versions,
because a tag's progress can be missing in an epoch. In other words, we need to
check progress for all tags.
2020-05-31 14:15:12 -07:00
Jingyu Zhou 9e9591a07a Fix an infinite loop 2020-05-19 15:15:39 -07:00
Jingyu Zhou 084afbd22d True-up backup progress may go back multiple epochs
Because the previous epoch may not save some tags, true-up backup progress may
need to go back more than one epoch.
2020-05-18 15:22:44 -07:00
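The multi-epoch lookback described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in BackupProgress.actor.cpp: the `ProgressByEpoch` layout, the function name `latestSavedPerTag`, and the use of plain integer tag ids are all assumptions made for the example. It walks from the newest older epoch backwards and stops once every tag has a saved version.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Version = std::int64_t;
using Epoch = std::int64_t;
// Hypothetical layout: progress[epoch][tag] = last version that tag saved in that epoch.
using ProgressByEpoch = std::map<Epoch, std::map<int, Version>>;

// For each tag in [0, totalTags), find the saved version from the newest epoch
// older than currentEpoch that recorded one. Tags with no record are omitted.
std::map<int, Version> latestSavedPerTag(const ProgressByEpoch& progress, Epoch currentEpoch, int totalTags) {
    assert(totalTags >= 0);
    std::map<int, Version> found;
    auto it = progress.lower_bound(currentEpoch); // first epoch >= currentEpoch
    // Keep stepping back one epoch at a time until every tag is covered.
    while (it != progress.begin() && static_cast<int>(found.size()) < totalTags) {
        --it;
        for (const auto& tagAndVersion : it->second) {
            if (tagAndVersion.first >= 0 && tagAndVersion.first < totalTags) {
                // emplace does not overwrite, so a value from a newer epoch wins.
                found.emplace(tagAndVersion.first, tagAndVersion.second);
            }
        }
    }
    return found;
}
```

If epoch 2 saved progress only for tag 0, tag 1 is resolved from epoch 1, which is the "more than one epoch" case the message describes.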
Jingyu Zhou 5528857934 Remove epoch's begin version check
Turns out the begin version can be a valid previous epoch's begin version, not
specifically 1.
2020-04-20 11:06:46 -07:00
Jingyu Zhou 4c66c8c377 Fix backup progress calculation
The oldest epoch the master gets can assume its begin version is 1, which can
be wrong. In this case, we use the saved backup progress to "true-up" the real
begin version.
2020-04-20 11:06:46 -07:00
Jingyu Zhou 7e5551ea19 Avoid overlapping version ranges for backup workers
Sometimes, an epoch's begin version is lower than the previous epoch's end
version. In some rare cases, the master ends up recruiting backup workers for
both epochs, which then have the overlapping range [epochBeginVersion,
prevEpochEndVersion]. Since popping is ordered by epoch, the previous epoch can
pop the mutations and save them to a log file. This epoch then misses the
popped mutations in the overlapping range, causing corrupted mutation logs.
2020-04-10 21:19:37 -07:00
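One way to keep the ranges disjoint is sketched below. The commit message does not say which side of the overlap is trimmed, so capping the older epoch's end at the newer epoch's begin is an assumption of this example, as are the `EpochRange` struct, the half-open range convention, and the `trimOverlaps` name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Version = std::int64_t;

// Hypothetical half-open version range [begin, end) owned by one epoch.
struct EpochRange {
    Version begin;
    Version end;
};

// epochs must be ordered oldest first. After the call, no epoch's range
// extends past the begin version of the epoch that follows it.
void trimOverlaps(std::vector<EpochRange>& epochs) {
    for (std::size_t i = 0; i + 1 < epochs.size(); ++i) {
        assert(epochs[i].begin <= epochs[i].end);
        // Assumption: the older epoch gives up the overlapping versions.
        epochs[i].end = std::min(epochs[i].end, epochs[i + 1].begin);
        // Never produce a range whose end is below its begin.
        epochs[i].end = std::max(epochs[i].end, epochs[i].begin);
    }
}
```

With epochs [1, 120) and [100, 200), the first range becomes [1, 100), so each version is popped and saved by exactly one epoch's workers.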
Jingyu Zhou 280bc94738 Do not recruit backup workers with wrong tags
In a rare scenario, the master can recruit backup workers with more tags than
the number of log router tags for an epoch. This can be caused by an
unsuccessful recovery, which uses more tags than the next epoch. When
recruiting for the next epoch, if no progress has been made yet, the recruiting
logic will look back at the previous epoch. If the previous epoch has saved past
this epoch's begin version, the current epoch's progress is updated with that
information, which can result in more tags being inserted into this epoch's
recruitment.
2020-03-28 21:19:41 -07:00
Jingyu Zhou 472f7bdd32 Rename a trace event to avoid confusion
Change from BackupRange to BackupVersionRange.
2020-03-25 11:03:05 -07:00
Jingyu Zhou edcbeb8992 Address review comments
Move transaction object outside of the loop and rename trace events.
2020-03-24 18:22:20 -07:00
Meng Xu 3f31ebf659 New backup:Revise event name and explain code 2020-03-23 10:55:44 -07:00
Jingyu Zhou d5250084bd Fix a time gap for monitoring backup keys
The backup worker starts by checking whether there are backup keys and then runs
the monitorBackupKeyOrPullData() loop, which does the check again. The second
check can be delayed, which causes the loop to perform no-op pops. The fix
removes this second check and uses the result of the first check to decide what
to do in the loop.
2020-03-18 16:41:35 -07:00
Jingyu Zhou 472849e45c Fix MacOS compiling error
clang doesn't allow capture references, so use copy for lambda's capture list.
2020-03-18 16:41:35 -07:00
Jingyu Zhou 6a302e6605 Add total number of tags to WorkerBackupStatus
This allows the backup worker to check the number of tags.
2020-03-18 16:41:35 -07:00
Jingyu Zhou ce2595821a Refactor to use std::find_if for more concise code 2020-03-18 16:41:35 -07:00
Jingyu Zhou b8c362cf44 Some correctness fixes 2020-03-18 16:41:35 -07:00
Jingyu Zhou a015277e49 Fix compiling error of reverse iterators
The macOS and Windows compilers don't like the use of the "!=" operator on
std::map::reverse_iterator.
2020-03-18 16:41:35 -07:00
Jingyu Zhou 70487cee1b Handle partial recovery in BackupProgress
A partial recovery can result in an empty epoch that copies the previous epoch's
version range. In this case, getOldEpochTagsVersionsInfo() will not return the
previous epoch's information. To correctly compute the start version for a
backup worker, we need to check the previous epoch's saved versions. If they are
larger than this epoch's begin version, use the previously saved version as the
start version.
2020-03-18 16:41:35 -07:00
Jingyu Zhou d8c6bf585d Include a total number of tags in partition log file names
This is needed for BackupContainer to check that partitioned mutation logs are
continuous, i.e., restorable to a version.
2020-03-18 16:39:40 -07:00
Jingyu Zhou 7662b8e47f Add copyright for BackupProgress.actor.cpp 2020-01-31 19:29:09 -08:00
Jingyu Zhou 1eaea91cb3 Address review comments 2020-01-22 19:42:13 -08:00
Jingyu Zhou c08a192c75 Add a backup start key
If the backup key is not set, do not recruit backup workers for old epochs.
2020-01-22 19:42:13 -08:00
Jingyu Zhou 4bed33031f Set backup worker start version to be savedVersion + 1
If no progress is found, the start version is set to epochBegin. Otherwise, the
start version is the one after the last saved version (from this epoch or from
the last epoch).
2020-01-22 19:42:13 -08:00
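The rule in this commit message can be written as a small helper. This is a sketch under stated assumptions rather than the actual FoundationDB code: the function name `backupStartVersion`, the map from integer tag id to saved version, and the parameter names are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Version = std::int64_t;

// savedVersions maps a tag id to the last version that tag has saved.
// Returns the version a backup worker for `tag` should start pulling from.
Version backupStartVersion(const std::map<int, Version>& savedVersions, int tag, Version epochBegin) {
    assert(tag >= 0);
    auto it = savedVersions.find(tag);
    if (it == savedVersions.end()) {
        return epochBegin; // no progress found: start at the epoch's begin version
    }
    return it->second + 1; // resume right after the last saved version
}
```

Starting at savedVersion + 1 rather than savedVersion avoids saving the last saved version a second time.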
Jingyu Zhou 4ed75e37f3 BackupProgress uses old epoch's begin version if no progress found
Get rid of the complex logic of choosing the largest saved version from the
previous epoch for the oldest epoch. Instead, use the begin version now
available from the log system.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 19eacac3ce Add a unit test for BackupProgress 2020-01-22 19:38:46 -08:00
Jingyu Zhou 64052f6349 Check and fill backup gaps for old epochs and tags
Sometimes a master recovery happens before the backup worker has updated its
progress in the system space. As a result, the next epoch doesn't know the
progress of previous ones. This change checks for such gaps and fills them with
the whole range [startVersion, endVersion).

The code is refactored into BackupProgress.actor.* to consolidate backup
progress processing for the master server.
2020-01-22 19:38:46 -08:00
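The gap check described in this commit might look roughly like the sketch below. The `VersionRange` struct, the `remainingRanges` name, and the assumption that tags are numbered 0 to totalTags - 1 are illustrative choices, not details taken from BackupProgress.actor.cpp. A tag with no recorded progress is assigned the whole range [startVersion, endVersion).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

using Version = std::int64_t;

// Hypothetical half-open range [begin, end) still to be backed up.
struct VersionRange {
    Version begin;
    Version end;
};

// For one old epoch covering [startVersion, endVersion), return the range each
// tag still has to back up. Tags that already saved everything are omitted.
std::map<int, VersionRange> remainingRanges(Version startVersion,
                                            Version endVersion,
                                            int totalTags,
                                            const std::map<int, Version>& savedVersions) {
    assert(startVersion <= endVersion);
    std::map<int, VersionRange> todo;
    for (int tag = 0; tag < totalTags; ++tag) {
        Version begin = startVersion; // default: no progress known, fill the whole range
        auto it = savedVersions.find(tag);
        if (it != savedVersions.end()) {
            begin = std::max(startVersion, it->second + 1);
        }
        if (begin < endVersion) {
            todo[tag] = VersionRange{ begin, endVersion };
        }
    }
    return todo;
}
```

Iterating over every expected tag, rather than over the tags that reported progress, is what surfaces the tags whose progress was never written.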