Commit Graph

52 Commits

Author SHA1 Message Date
Meng Xu 5022566b35 Validate if key_not_found error ever happens 2020-06-08 16:59:00 -07:00
tclinken aca995d2c5 Removed dead backup agent code 2020-05-01 14:46:59 -07:00
Meng Xu 3f510d0653
Merge pull request #3036 from jzhou77/backup-cmd
Several bug fixes for new backups
2020-04-29 16:34:59 -07:00
Jingyu Zhou 9da76c35ea Fix a memory corruption error
The backup container URL should be Key instead of KeyRef.
2020-04-29 15:55:34 -07:00
Jingyu Zhou 7d59e53349 Consolidate makePadding() 2020-04-28 15:39:23 -07:00
Meng Xu f5e8345496 FastRestoreAgent:Use atomicParallelRestore to kick off restore
Replace the handcrafted version with atomicParallelRestore actor
which is simulation tested
2020-04-27 22:15:00 -07:00
Jingyu Zhou 9bfc5bbea8 Check RestorableFileSet's key ranges in simulation
Ranges written in the manifest file should match with actual file content.
2020-04-21 12:55:40 -07:00
Jingyu Zhou 9fb3fb9d82 Add pause/resume for new backups
To pause/resume the backup workers, the fdbbackup command will write to the
backupPausedKey. Then backup workers noticed the value of the key has been
changed and stops/resumes pulling from TLog.
2020-04-06 14:29:46 -07:00
Jingyu Zhou feedab02a0
Merge pull request #2855 from xumengpanda/mengxu/fr-api-atomicrestore-PR
Add ApiCorrectnessAtomicRestore workload for the new performant restore
2020-03-25 18:05:26 -07:00
Meng Xu 1ba11dc74b Apply clang format 2020-03-25 11:20:17 -07:00
Meng Xu 120272f025 Change unlockDB from RestoreMaster to Agent 2020-03-25 11:04:49 -07:00
Meng Xu ca8966a28b Move lockDB into submitRestore request from restore worker
AtomicRestore needs to lock DB before we start the restore worker.
So we cannot lock DB in restore worker with a different randomUID.
2020-03-24 23:39:35 -07:00
Meng Xu 241c2703c8 Fix atomicParallelRestore interface 2020-03-24 17:00:55 -07:00
Meng Xu 80d62f3cb8 Fix:Add atomicParallelRestore to header 2020-03-24 16:28:08 -07:00
Meng Xu 81f7181c9e Refactor submitParallelRestore function into FileBackupAgent 2020-03-24 14:44:55 -07:00
Meng Xu 5584884c12 Refactor parallelRestoreFinish function into FileBackupAgent 2020-03-24 14:15:15 -07:00
Jingyu Zhou 44c1996950 Change all worker started to be set after all workers updated a key
Previously, all worker started is set to be when saved log versions are higher.
However, saving the versions can be wrong, as the worker is not guaranteed to
write to the right container. For instance, if the watch is triggered later,
then mutation logs are written to previous containers. So we need to ensure the
right container is ready -- all workers have acknowledged seeing the container.
2020-03-22 16:40:12 -07:00
Jingyu Zhou 38def426f4 Add a flag to submitBackup for partitioned log
This is to distinguish with old workloads so that they can work in simulation.
2020-03-20 20:15:08 -07:00
Jingyu Zhou e9287407d6 Backup worker updates latest log versions in BackupConfig
If backup worker is enabled, the current epoch's worker of tag (-2,0) will be
responsible for monitoring the backup progress of all workers and update the
BackupConfig with the latest saved log version, which is the minimum version
of all tags.

This change has been incorporated in the getLatestRestorableVersion() so that
it is transparent to clients.
2020-03-20 20:15:08 -07:00
Jingyu Zhou 35aafefb89 Consolidate StringRefReader classes
Fix a compiler error of unused variable too.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 5a602f58e8 Start backup with a wait on all backup workers running
This wait is to make sure that backup workers are already saving mutations so
that no mutations are missed. The idea is that the CLI sets a "backupStartedKey"
in the database and waits for allWorkerStarted() key of the backup to be set.

Backup workers monitor the changes to the "backupStartedKey" and start logging
mutations. Additionally, backup worker for Tag(-2,0) monitors all other workers
have started (checking their saved progress version is larger than the backup's
start version), and then sets the allWorkerStarted() key for the backup.
2020-01-31 19:29:09 -08:00
Jingyu Zhou 7f7ec99170 Serialize and deserialize new backup files
The BackupWorker writes files that can be read by FileConverter. Move
StringRefReader to the header file for reuse in FileConverter.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 485d3d0feb Use Version instead of int64_t 2020-01-22 19:38:45 -08:00
Alvin Moore 7628d04fb9 Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2020-01-09 07:21:16 -08:00
Steve Atherton 4ff058e86b Backup and DR layer status document generation now uses snapshot reads for all keys read to avoid unnecessary conflicts when read during a status update or cleanup transaction. Since many of the keys read use wrapper functions, all of the underlying functions in BackupAgentBase and its two implementations also required a snapshot mode argument. All snapshot arguments default to false to match the underlying FDB API get/getrange methods. 2019-12-19 00:29:35 -08:00
Meng Xu d0147e5e5d Merge branch 'release-6.2' into mengxu/merge-release620-to-master-v3
Resolved Conflicts:
	documentation/sphinx/source/release-notes.rst
	fdbserver/DataDistribution.actor.cpp
	versions.target
2019-10-02 13:22:56 -07:00
Evan Tschannen ef01ad2ed8 optimized log range clearing to clear everything for each possible hash (256 clears) if that would be more efficient than one clear per second that has elapsed
aborting a DR without the —cleanup flag will still attempt to cleanup for 30 seconds before giving up
added a cleanup command to fdbbackup which can remove mutations from orphaned DRs which were stopped without the —cleanup flag
2019-09-27 18:32:27 -07:00
Meng Xu d160810662 FastRestore:Resolve review comments 2019-09-04 16:48:43 -07:00
Meng Xu 3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu 45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
Jingyu Zhou 4c0e824456 Include unactorcompiler.h at the end of *.actor.h 2019-07-10 14:51:52 -07:00
Meng Xu 477fd152c0 FastRestore:Refactor code
1) Use the runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition in loop-choose-when for
   the core routine of restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
   the file only has functionalities related to restore worker.

Passed correctness test
2019-06-04 11:22:47 -07:00
Meng Xu 529ce66b6c Merge branch 'apple/master' into mengxu/performant-restore-PR 2019-04-18 18:02:45 -07:00
Andrew Noyes 6207d724f8 Fix all -Wunused-variable warnings 2019-04-15 18:13:00 -07:00
Meng Xu 70d7c289f4 Merge branch 'master' into mengxu/restore/parallel-v7 2019-03-30 22:13:10 -07:00
Stephen Atherton d5e50e6963 Rewrote BackupAgentBase::parseTime() to use std::get_time() so it compiles on all supported platforms. Added unit test for parseTime() and formatTime(). 2019-03-20 01:18:37 -07:00
Steve Atherton 8aab719c22
Merge branch 'master' into feature-backup-json 2019-03-12 18:23:16 -07:00
Stephen Atherton bc0b2aa040 Merge branch 'release-6.0' of https://github.com/apple/foundationdb
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/BlobStore.actor.cpp
2019-03-12 04:49:12 -07:00
Stephen Atherton f0eae0295f Merge branch 'master' of https://github.com/apple/foundationdb into feature-backup-json 2019-03-12 03:35:03 -07:00
Stephen Atherton f2953db7d8 Added updating of backup snapshot shards behind in snapshot dispatcher so status can determine if a snapshot is lagging the configured speed. 2019-03-11 01:25:51 -07:00
Stephen Atherton 023bbb566f Renamed backup state enums for clarity, added backup state names. Changed Epochs to EpochSeconds in backup JSON along with some other renaming/moving of fields, and added information about snapshot dispatch. Changed timestamp format for input/output in all backup/restore contexts to be a fully qualified time with timezone offset. Added information about the last snapshot dispatch to backup config and status (not yet populated). 2019-03-10 16:00:01 -07:00
Stephen Atherton 06c11a316d Normalized timestamp to text format across backup and restore tooling. Added epochs field to JSON objects describing versions and timestamps in backup status and describe output, renamed some fields for clarity. 2019-03-06 22:34:25 -08:00
Stephen Atherton 1399aee532 Added --json option to fdbbackup status. 2019-03-06 21:32:46 -08:00
Balachandar Namasivayam f3391ea413
Merge pull request #1240 from satherton/feature-restore-by-timestamp
Restore by timestamp
2019-03-06 16:21:06 -08:00
Stephen Atherton 7778112f6a Bug fix, restore was using the destination cluster to look up timestamps when printing the backup description instead of (optionally) the original cluster which generated the backup. Made missing cluster file errors more clear. 2019-03-06 02:45:55 -08:00
Alex Miller c6a65389ae Remove noexcept macro and replace with BOOST_NOEXCEPT.
BOOST_NOEXCEPT does what the noexcept macro was supposed to do, but in a
way that is correctly maintained over time.
2019-03-05 22:06:12 -08:00
Steve Atherton 21f55e1878
Merge pull request #1190 from bnamasivayam/restore-multiple-ranges
Add support for restoring multiple ranges.
2019-03-05 10:15:55 -08:00
Balachandar Namasivayam a258df32f6 Skip switchover checks for force option. 2019-03-04 15:58:36 -08:00
Balachandar Namasivayam 7eba50b086 Add support for restoring multiple ranges. 2019-02-25 18:00:28 -08:00
mpilman f79a9594c1 Several bugfixes to make fdb build on non-ide 2019-02-19 15:16:59 -08:00