Commit Graph

3960 Commits

Author SHA1 Message Date
Meng Xu 42df1e7792
Merge pull request #2879 from jzhou77/backup-progress
Update mutation bytes written for new backups
2020-03-30 21:42:45 -07:00
Meng Xu a85652375c
Merge pull request #2872 from jzhou77/backup-fix
Switch off old mutation logging on proxies for new backups
2020-03-30 21:42:10 -07:00
Meng Xu 60f6edc3b5
Merge pull request #2860 from zjuLcg/report-conflicting-key-roll-forward
Report conflicting key roll forward
2020-03-30 17:33:56 -07:00
Jingyu Zhou 411b4c28ac Update mutation bytes written for new backups
This will make the log bytes written available to backup status and describe
backup calls.
2020-03-29 21:23:34 -07:00
Jingyu Zhou 65e3b9192e Add an assert for probably dead code 2020-03-28 21:19:47 -07:00
Jingyu Zhou 280bc94738 Do not recruit backup workers with wrong tags
In a rare scenario, the master can recruit backup workers with more tags than
the number of log router tags for an epoch. This can be caused by an
unsuccessful recovery, which uses more tags than the next epoch. When
recruiting for the next epoch, if no progress has been made yet, the recruiting
logic will look back at the previous epoch. If the previous epoch has saved past
this epoch's begin version, the current epoch's progress is updated with that
information, which can result in more tags being inserted into this epoch's
recruitment.
2020-03-28 21:19:41 -07:00
Meng Xu 13f343ec96 Resolve minor review comment 2020-03-28 16:03:01 -07:00
Meng Xu 8a30526336 FastRestore:Remove commented assertion 2020-03-28 13:11:32 -07:00
Meng Xu 404a3e2619 FastRestore:Loader:Remove sanity check for the order of sending log and range mutations 2020-03-27 23:36:13 -07:00
Meng Xu 21a5c67f9a FastRestore:Remove assertion on mutation sending order 2020-03-27 17:09:31 -07:00
Meng Xu 75fc9af5c8 Apply clang format 2020-03-27 16:55:52 -07:00
Meng Xu 0222e8096c FastRestore:Send log mutations and range mutations in parallel
With the subversion extension, appliers can order log and range mutations
based on LogMessageVersion instead of sending order.
2020-03-27 16:54:19 -07:00
Meng Xu f7233bade7 Rename ParallelRestoreCorrectnessAtomicOpTinyData.txt by removing TinyData 2020-03-27 13:08:59 -07:00
Meng Xu 97f8e46388 Sanity check subversion for log mutations 2020-03-27 13:07:08 -07:00
Meng Xu 32b0ba1822 Merge branch 'master' into mengxu/parallel-range-log-file-loading-PR 2020-03-27 12:13:47 -07:00
Meng Xu 113d0fb48b Remove incorrect assertion 2020-03-27 12:13:30 -07:00
chaoguang 64148469e8 clang-format the pr 2020-03-26 15:52:30 -07:00
Jingyu Zhou 9a9af7d8a8 Add more trace event details on partitioned log 2020-03-26 13:57:31 -07:00
Meng Xu 6299ad3913 FastRestore:Load range and log files in parallel for new backup format 2020-03-26 13:17:44 -07:00
Jingyu Zhou 6be913a430 Add partitioned logs option to AtomicRestore workload 2020-03-26 13:04:00 -07:00
Jingyu Zhou aca458cd96 Set 50% chance to restore old backup files for fast restore 2020-03-26 13:04:00 -07:00
Jingyu Zhou 99f4ef6e0c Fix restore loader to handle mutation sub number
For the old backup format, give mutations a sub sequence number starting from 0
for each commit version.
2020-03-26 13:04:00 -07:00
Jingyu Zhou 40b17e1e9b Remove a no longer used knob 2020-03-26 13:04:00 -07:00
Jingyu Zhou 772ab70aee Add an option for fast restore to restore old backups
If "usePartitionedLogs" is set to false, then the workload uses old backups for
restore.
2020-03-26 13:04:00 -07:00
Meng Xu 1052b23ee1
Merge pull request #2370 from atn34/test-watch-outliving-transaction
Test watch outliving transaction
2020-03-26 12:40:38 -07:00
Andrew Noyes cdb6bbfc85 Test watch outliving transaction 2020-03-26 10:09:03 -07:00
Jingyu Zhou feedab02a0
Merge pull request #2855 from xumengpanda/mengxu/fr-api-atomicrestore-PR
Add ApiCorrectnessAtomicRestore workload for the new performant restore
2020-03-25 18:05:26 -07:00
Evan Tschannen bb5799bd20
Merge pull request #2642 from xumengpanda/mengxu/new-backup-format-PR
FastRestore:Integrate with new backup format
2020-03-25 15:47:55 -07:00
Jingyu Zhou 0f57bf9685 Remove a SevError event
The same mutation can be present in overlapping mutation logs. Thus we cannot
assert its absence. This can happen for multiple reasons. One possibility
is that new TLogs can copy mutations from old generation TLogs; another
is that a backup worker is recruited without knowing previously saved progress.
2020-03-25 15:23:21 -07:00
Meng Xu 1ba11dc74b Apply clang format 2020-03-25 11:20:17 -07:00
Meng Xu 120272f025 Change unlockDB from RestoreMaster to Agent 2020-03-25 11:04:49 -07:00
Jingyu Zhou 472f7bdd32 Rename a trace event to avoid confusion
Change from BackupRange to BackupVersionRange.
2020-03-25 11:03:05 -07:00
Evan Tschannen e0fbd9ecbe
Merge pull request #2847 from atn34/atn34/assert-no-return
Assert recoverAndEndEpoch does not become ready
2020-03-25 10:23:38 -07:00
Jingyu Zhou e2f317a0da Fix a crash failure 2020-03-25 09:18:49 -07:00
chaoguang 62627dd2ee Fix a randomness bug and naming issue in TraceEvent 2020-03-25 00:55:40 -07:00
Jingyu Zhou 00fb4c1a35 Fix an off by one error
Backup worker's saved version should start from its startVersion - 1, i.e.,
the startVersion is not saved yet. Otherwise, if the version range is just
the startVersion itself and there is no data, then the range [startVersion,
startVersion + 1) will be missing. This causes non-continuous partitioned logs.
2020-03-24 23:40:36 -07:00
Meng Xu ca8966a28b Move lockDB into submitRestore request from restore worker
AtomicRestore needs to lock DB before we start the restore worker.
So we cannot lock DB in restore worker with a different randomUID.
2020-03-24 23:39:35 -07:00
Meng Xu 6a8d6ddb8e Introduce ParallelRestoreApiCorrectnessAtomicRestore.txt test
This covers ApiCorrectnessTest as workload for parallel restore.
2020-03-24 22:30:51 -07:00
Jingyu Zhou 669916467e Add missing transaction reset call 2020-03-24 20:14:37 -07:00
Jingyu Zhou 5e729a5bcf Merge branch 'master' of https://github.com/apple/foundationdb into backup-worker-bak 2020-03-24 19:54:36 -07:00
Jingyu Zhou edcbeb8992 Address review comments
Move transaction object outside of the loop and rename trace events.
2020-03-24 18:22:20 -07:00
Andrew Noyes 289487559d Revert "Revert "Merge pull request #2257 from zjuLcg/report-conflicting-key""
This reverts commit 804fe1b22e.
2020-03-24 18:11:15 -07:00
Meng Xu b173929316 Add atomicParallelRestore to AtomicRestore workload 2020-03-24 15:58:49 -07:00
Meng Xu 81f7181c9e Refactor submitParallelRestore function into FileBackupAgent 2020-03-24 14:44:55 -07:00
Meng Xu 5584884c12 Refactor parallelRestoreFinish function into FileBackupAgent 2020-03-24 14:15:15 -07:00
Jingyu Zhou a3058e7d96 Fix incorrectly marking a backup job as stopped
This causes missing version ranges for mutation logs.
2020-03-23 22:05:58 -07:00
Jingyu Zhou 1155304cd5 Remove a spurious assertion
It's possible that there is a gap between backup's contiguousLogEnd and snapshot
version.
2020-03-23 21:39:40 -07:00
Jingyu Zhou 82a1790776 Fix backup worker crash due to aborted backup job
If a backup job is aborted, the "startedBackupWorkers" key can be cleared, thus
triggering the assertion failure.
2020-03-23 21:11:25 -07:00
Jingyu Zhou 243d078596 Fix off by one error
Epoch end version is saved version + 1, so need +1 for minBackupVersion.
2020-03-23 20:44:31 -07:00
Jingyu Zhou f1d7fbafb4 Stop actors for displaced backup workers
If the worker is displaced, it should not update backup containers.
2020-03-23 18:48:06 -07:00