Commit Graph

8323 Commits

Author SHA1 Message Date
Daniel Smith 7aebe05ff9
Merge pull request #2 from apple/master
Merge upstream
2020-02-07 11:43:53 -05:00
Evan Tschannen 2d4f24ac48
Merge pull request #2645 from Daniel-B-Smith/only-one-conflict
Only use one MiniConflictSet
2020-02-06 15:55:09 -08:00
Evan Tschannen 844c8511c4
Merge pull request #2588 from jzhou77/backup-worker
Integrate new backup worker with existing backup command
2020-02-05 14:14:43 -08:00
Jingyu Zhou c43ac4c38f Backup worker: Construct range map on-demand
This is to reduce the number of map lookups in the original code.
2020-02-05 11:47:05 -08:00
Jingyu Zhou d5849af5c0 Address review comments 2020-02-05 10:33:51 -08:00
Jingyu Zhou e32750931b Backup worker: Remove stopped backups and fix block ends 2020-02-04 16:01:27 -08:00
Evan Tschannen 8449badb3e
Merge pull request #1868 from dongxinEric/fix/1827/error_instead_of_timeout
Send error back before put the GRV request with PRIORITY_BATCH into t…
2020-02-04 14:32:47 -08:00
Jingyu Zhou c95d52cd18 Make mutation log file continuous w.r.t. versions
Each file is encoded with [startVersion, endVersion) range so that we can
easily detect missing ranges (i.e., files) in a backup container.
2020-02-04 14:30:32 -08:00
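
The gap check described in the commit above can be sketched as follows. This is a minimal illustration, not the project's code: the function name `find_missing_ranges` and the tuple representation of files are assumptions made for this example.

```python
def find_missing_ranges(files, begin, end):
    """Return the [start, end) version gaps not covered by any log file.

    `files` is an iterable of (start_version, end_version) pairs, each a
    half-open range; `begin` and `end` bound the span that should be covered.
    (Hypothetical helper, for illustration only.)
    """
    gaps = []
    covered_up_to = begin
    for start, stop in sorted(files):
        if start > covered_up_to:
            # Nothing covers [covered_up_to, start): a file is missing.
            gaps.append((covered_up_to, min(start, end)))
        covered_up_to = max(covered_up_to, stop)
        if covered_up_to >= end:
            break
    if covered_up_to < end:
        gaps.append((covered_up_to, end))
    return gaps


files = [(0, 100), (100, 250), (300, 400)]
print(find_missing_ranges(files, 0, 400))
```

Because each range is half-open, two adjacent files are continuous exactly when one file's endVersion equals the next file's startVersion.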
Evan Tschannen bf7d7e2f1e
Merge pull request #2499 from ajbeamon/ratekeeper-durable-version-smoother-fix
Fix inaccurate limiting durability lag
2020-02-04 13:04:58 -08:00
Meng Xu 8fe8c4b1de
Merge pull request #2638 from atn34/atn34/fix-open-for-ide
Fix OPEN_FOR_IDE build
2020-02-04 11:24:19 -08:00
Jingyu Zhou 52c6737411 Rename backupLoggingEnabled as backupWorkerEnabled
To highlight the backup changes for 7.0. By default, the
backup_worker_enabled flag is set for version 7.0.
2020-02-04 10:09:16 -08:00
Daniel Smith 7153077f87 Only use one mini conflict set 2020-02-04 11:19:50 -05:00
Daniel Smith adbb1b56f7
Merge pull request #1 from apple/master
Merge upstream
2020-02-04 11:18:39 -05:00
Jingyu Zhou 28349e2b03 Backup worker checks backup ranges
Mutations are only logged when they are within the backup ranges, which means a
range clear mutation has to calculate the intersection ranges and divide a clear
into potentially multiple clear mutations. This part of the code is modeled after
how the proxy handles backup mutations.
2020-02-03 20:27:31 -08:00
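
The splitting of a range clear described above might look like the sketch below. It assumes keys are byte strings, ranges are half-open [begin, end) pairs, and the backup ranges do not overlap; `split_clear` is a hypothetical name, not a function from the codebase.

```python
def split_clear(clear_begin, clear_end, backup_ranges):
    """Intersect the clear [clear_begin, clear_end) with each backup range.

    Returns one (begin, end) clear per backup range that the original clear
    overlaps; parts outside every backup range are dropped.
    (Hypothetical helper, for illustration only.)
    """
    pieces = []
    for range_begin, range_end in sorted(backup_ranges):
        lo = max(clear_begin, range_begin)
        hi = min(clear_end, range_end)
        if lo < hi:  # non-empty intersection
            pieces.append((lo, hi))
    return pieces


backup_ranges = [(b"a", b"d"), (b"m", b"p"), (b"y", b"z")]
print(split_clear(b"b", b"x", backup_ranges))
```

Here a single clear of [b"b", b"x") becomes two clears, one per backup range it touches, and the third backup range is untouched.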
Jingyu Zhou 7c10683c77 Backup workers save logs into right containers
The mutation logs of backup workers are saved into "mlogs" directory under the
container directory. The backup worker has been restructured to handle multiple
backups, where each one is stored in a separate backup container.

In the backup worker, mutations pulled from TLogs are buffered in a message
queue. When writing out to different containers, their corresponding mutation
ranges are used to check if a mutation should be written. When a new backup
is submitted by the client, "backupStartedKey" is updated. The worker monitors
this key and updates its internal map of backups, and the next pull from TLog
waits until the new backup is ready. This ensures that when
worker 0 marks the backup as started, all workers have already been logging
mutations for the backup.
2020-02-03 20:27:14 -08:00
Jingyu Zhou 0db03f1d3c Use backup_logging_enabled flag
The default is to enable new backup workers. Users can disable this flag to
turn off the backup worker feature.
2020-02-03 20:03:22 -08:00
Alex Miller 60c699d624
Merge pull request #2630 from Daniel-B-Smith/resolver-conflict
Clean up SkipLists (formatting / dead code no op)
2020-02-03 18:51:50 -08:00
Evan Tschannen 4524831456
Merge pull request #2518 from vishesh/task/failmon-remove-server
FailureMonitoring: Server processes no longer need to talk to ClusterController
2020-02-03 17:22:50 -08:00
A.J. Beamon cc6539093c
Merge pull request #2633 from xumengpanda/mengxu/knob-explain
Explain knob SHARD_MAX_BYTES_PER_KSEC
2020-02-03 14:33:35 -08:00
Jingyu Zhou 2b81b128f0
Merge pull request #2636 from jzhou77/valgrind-fix
Fix valgrind found error of reading uninitialized data
2020-02-03 13:12:49 -08:00
Andrew Noyes f16c4cfa66 Fix second semantic error 2020-02-03 11:20:53 -08:00
Andrew Noyes f25e913db2 Fix semantic change 2020-02-03 11:15:28 -08:00
Andrew Noyes 09f3690f09 Fix OPEN_FOR_IDE build 2020-02-03 10:42:05 -08:00
Meng Xu aa601adcd7
Apply suggestions from code review
Correct typo.

Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-02-03 09:49:26 -08:00
Vishesh Yadav becc01923a
Merge pull request #2626 from ajbeamon/update-api-version-to-700
Update API version to 700
2020-02-03 09:49:24 -08:00
Vishesh Yadav 58e67c970d
Merge pull request #2622 from ajbeamon/fdbkeyvalue-use-uint8-t
Change FDBKeyValue's key and value members to have type uint8_t*
2020-02-03 09:48:08 -08:00
Jingyu Zhou d73a19fea4 Fix valgrind found error of reading uninitialized data 2020-02-02 13:16:23 -08:00
Jingyu Zhou 7662b8e47f Add copyright for BackupProgress.actor.cpp 2020-01-31 19:29:09 -08:00
Jingyu Zhou f303c7f34b Add backup_type to documentation 2020-01-31 19:29:09 -08:00
Jingyu Zhou 297f22726c Add backup_type database configuration option
Update simulation tests to randomly set backup types to be one of: old backup
(default), new backup (tagged), or both (default+tagged).
2020-01-31 19:29:09 -08:00
Jingyu Zhou 38aa1903fd Add a DB configuration option for backup workers
Right now, the default is to keep the old backup behavior, i.e., do NOT use
backup workers. Specifically, if BackupType is not set (or is set to default),
the master will not recruit backup workers and will not add pseudo locality for
backup workers.

The StartFullBackupTaskFunc is updated to check if backup worker is enabled.
Only when it is not enabled, starting a backup will wait on all backup workers
to be started.
2020-01-31 19:29:09 -08:00
Jingyu Zhou f7956cfbfc Clear backup UID from backupStartedKey when finish/abort backups
Clearing this key signals backup workers that backup is no longer needed. When
no backup is going on, the backup workers switch to the NOOP state.
2020-01-31 19:29:09 -08:00
Jingyu Zhou 19ef7f6bdb Skip watch of backup task's started key if it's already set
The backup task may be restarted multiple times so the started key for the
backup task may already be set. In this case, the wait on watch should be
skipped.
2020-01-31 19:29:09 -08:00
Jingyu Zhou 7cf2881fe8 Fix backup worker ID and remove some fields
The backup worker ID was changed to be the interface ID, not the request ID.
The lastSeenVersion is replaced with minKnownCommittedVersion, which is
incremented even when there are no mutations. This also avoids the problem of
setting popVersion to be higher than the actual committed version (no harm
here though, as there are no mutations).

The "backupStartedKey" handling is also fixed to correctly handle cases when
we wait for the stopping of backups.
2020-01-31 19:29:09 -08:00
Jingyu Zhou 7544ff88d9 Comment out frequent TLogPop trace event 2020-01-31 19:29:09 -08:00
Jingyu Zhou f8342f0884 Add keepRunning for start backup transaction
TaskBucket::keepRunning() needs to be called in backup transactions to be sure
that the task has not been cancelled. If it has, the task stops. Without this call,
the task can continue to run, causing multiple runs of the same task.

Another subtle issue is that the beginVersion is persisted on backupStartedKey.
So while reading it back from that key, we should set task's beginVersion with
the value persisted earlier.
2020-01-31 19:29:09 -08:00
Jingyu Zhou 5a602f58e8 Start backup with a wait on all backup workers running
This wait is to make sure that backup workers are already saving mutations so
that no mutations are missed. The idea is that the CLI sets a "backupStartedKey"
in the database and waits for allWorkerStarted() key of the backup to be set.

Backup workers monitor the changes to the "backupStartedKey" and start logging
mutations. Additionally, the backup worker for Tag(-2,0) checks that all other workers
have started (i.e., that their saved progress version is larger than the backup's
start version), and then sets the allWorkerStarted() key for the backup.
2020-01-31 19:29:09 -08:00
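
The readiness check performed by the Tag(-2,0) worker can be sketched as below. The dictionary of saved progress versions and the name `all_workers_started` are assumptions for this example; the only rule taken from the commit message is that every worker's saved progress version must be larger than the backup's start version.

```python
def all_workers_started(saved_versions, backup_start_version):
    """Return True once every worker has saved progress past the start version.

    `saved_versions` maps a worker identifier to its saved progress version,
    or None if the worker has not saved any progress yet.
    (Hypothetical helper, for illustration only.)
    """
    if not saved_versions:
        return False  # no known workers: treat as not started
    return all(
        version is not None and version > backup_start_version
        for version in saved_versions.values()
    )


print(all_workers_started({0: 120, 1: 105}, 100))
print(all_workers_started({0: 120, 1: None}, 100))
```

Only after this check passes would the allWorkerStarted() key be set, which is what the CLI waits on.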
Jingyu Zhou e9c7ad82cc Comment out pseudo tag pop trace event 2020-01-31 19:29:09 -08:00
Jingyu Zhou 83907bd453 Backup worker allows on and off of backups
The worker monitors the system key "backupStartedKey" and decides to be in one
of two modes: NOOP and backup. In the NOOP mode, the worker just pops TLogs. In
the backup mode, the worker pulls mutations from TLogs and saves the mutations
into logs.
2020-01-31 19:29:09 -08:00
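
The two modes can be pictured as a small state choice driven by the monitored key. The sketch below is an assumption-laden illustration: `Mode`, `choose_mode`, and `actions_for` are hypothetical names, and the rule that an empty or missing key value means NOOP is assumed rather than stated in the commit message.

```python
from enum import Enum


class Mode(Enum):
    NOOP = "noop"
    BACKUP = "backup"


def choose_mode(backup_started_value):
    """Pick the mode from the value read at "backupStartedKey".

    Assumption: a missing or empty value means no backup is running.
    """
    return Mode.BACKUP if backup_started_value else Mode.NOOP


def actions_for(mode):
    """List the actions the commit message attributes to each mode."""
    if mode is Mode.NOOP:
        return ["pop"]  # just pop TLogs
    return ["pull", "save"]  # pull mutations from TLogs, save them into logs


for value in (None, b"backup-uid"):
    mode = choose_mode(value)
    print(mode.name, actions_for(mode))
```

Whether the backup mode also pops TLogs after saving is not stated in the message, so the sketch lists only the actions it names.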
Meng Xu 962024d8b8 Explain knob SHARD_MAX_BYTES_PER_KSEC
Explain why it may cost 100MB data movement.
No code change.
2020-01-31 17:04:11 -08:00
Xin Dong 7016f7903b Fixed another build error. Do not use timeReplyIgnoreError since we do not want the logging inside that function, so it is no longer necessary. Change to use ready(), which basically ignores the error. 2020-01-31 15:48:29 -08:00
Xin Dong c1f992667b Fix build failure 2020-01-31 14:27:47 -08:00
Xin Dong 8d28c2a7f0 Added two new counters for transaction throttled error and removed the verbose trace event logging. Also changed a chain of 'if' statements into 'if-else' statements since they are mutually exclusive 2020-01-31 14:16:39 -08:00
A.J. Beamon 3ddadf0481
Merge pull request #2628 from ajbeamon/add-transaction-profiling-analyzer-tests
Add transaction_profiling_analyzer tests
2020-01-31 14:07:19 -08:00
Daniel Smith 3f6cacb477 Delete a bunch more code 2020-01-31 12:57:43 -05:00
Daniel Smith 3c64e77179 Delete unreachable code 2020-01-31 12:33:24 -05:00
Daniel Smith f5be0db366 clang-format 2020-01-31 12:32:58 -05:00
Xin Dong 800d2e560c Apply review comments. 2020-01-31 09:32:22 -08:00
Xin Dong f8513327dc
Update fdbserver/MasterProxyServer.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-01-31 09:30:37 -08:00
Xin Dong ca8a4c0096 Fix CTest failure. 2020-01-30 16:32:32 -08:00