foundationdb

Commit Graph

Author	SHA1	Message	Date
Jingyu Zhou	6be913a430	Add partitioned logs option to AtomicRestore workload	2020-03-26 13:04:00 -07:00
Jingyu Zhou	aca458cd96	Set 50% chance to restore old backup files for fast restore	2020-03-26 13:04:00 -07:00
Jingyu Zhou	99f4ef6e0c	Fix restore loader to handle mutation sub number For old backup format, give them a sub sequence number starting from 0 for each commit version.	2020-03-26 13:04:00 -07:00
Jingyu Zhou	40b17e1e9b	Remove a no longer used knob	2020-03-26 13:04:00 -07:00
Jingyu Zhou	772ab70aee	Add an option for fast restore to restore old backups If "usePartitionedLogs" is set to false, then the workload uses old backups for restore.	2020-03-26 13:04:00 -07:00
Meng Xu	1052b23ee1	Merge pull request #2370 from atn34/test-watch-outliving-transaction Test watch outliving transaction	2020-03-26 12:40:38 -07:00
Andrew Noyes	cdb6bbfc85	Test watch outliving transaction	2020-03-26 10:09:03 -07:00
Jingyu Zhou	feedab02a0	Merge pull request #2855 from xumengpanda/mengxu/fr-api-atomicrestore-PR Add ApiCorrectnessAtomicRestore workload for the new performant restore	2020-03-25 18:05:26 -07:00
Evan Tschannen	bb5799bd20	Merge pull request #2642 from xumengpanda/mengxu/new-backup-format-PR FastRestore:Integrate with new backup format	2020-03-25 15:47:55 -07:00
Jingyu Zhou	0f57bf9685	Remove a SevError event The same mutation can be present in overlapping mutation logs. Thus we cannot assert its absence. This can happen for multiple reasons. One possibility is that new TLogs can copy mutations from old generation TLogs; another is that a backup worker is recruited without knowing previously saved progress.	2020-03-25 15:23:21 -07:00
Meng Xu	1ba11dc74b	Apply clang format	2020-03-25 11:20:17 -07:00
Meng Xu	120272f025	Change unlockDB from RestoreMaster to Agent	2020-03-25 11:04:49 -07:00
Jingyu Zhou	472f7bdd32	Rename a trace event to avoid confusion Change from BackupRange to BackupVersionRange.	2020-03-25 11:03:05 -07:00
Evan Tschannen	e0fbd9ecbe	Merge pull request #2847 from atn34/atn34/assert-no-return Assert recoverAndEndEpoch does not become ready	2020-03-25 10:23:38 -07:00
Jingyu Zhou	e2f317a0da	Fix a crash failure	2020-03-25 09:18:49 -07:00
Jingyu Zhou	00fb4c1a35	Fix an off by one error Backup worker's saved version should start from its startVersion - 1, i.e., the startVersion is not saved yet. Otherwise, if the version range is just the startVersion itself and there is no data, then the range [startVersion, startVersion + 1) will be missing. This causes non-continuous partitioned logs.	2020-03-24 23:40:36 -07:00
Meng Xu	ca8966a28b	Move lockDB into submitRestore request from restore worker AtomicRestore needs to lock DB before we start the restore worker. So we cannot lock DB in restore worker with a different randomUID.	2020-03-24 23:39:35 -07:00
Meng Xu	6a8d6ddb8e	Introduce ParallelRestoreApiCorrectnessAtomicRestore.txt test This covers ApiCorrectnessTest as workload for parallel restore.	2020-03-24 22:30:51 -07:00
Jingyu Zhou	669916467e	Add missing transaction reset call	2020-03-24 20:14:37 -07:00
Jingyu Zhou	5e729a5bcf	Merge branch 'master' of https://github.com/apple/foundationdb into backup-worker-bak	2020-03-24 19:54:36 -07:00
Jingyu Zhou	edcbeb8992	Address review comments Move transaction object outside of the loop and rename trace events.	2020-03-24 18:22:20 -07:00
Meng Xu	b173929316	Add atomicParallelRestore to AtomicRestore workload	2020-03-24 15:58:49 -07:00
Meng Xu	81f7181c9e	Refactor submitParallelRestore function into FileBackupAgent	2020-03-24 14:44:55 -07:00
Meng Xu	5584884c12	Refactor parallelRestoreFinish function into FileBackupAgent	2020-03-24 14:15:15 -07:00
Jingyu Zhou	a3058e7d96	Fix incorrectly marking a backup job as stopped This causes missing version ranges for mutation logs.	2020-03-23 22:05:58 -07:00
Jingyu Zhou	1155304cd5	Remove a spurious assertion It's possible that there is a gap between backup's contiguousLogEnd and snapshot version.	2020-03-23 21:39:40 -07:00
Jingyu Zhou	82a1790776	Fix backup worker crash due to aborted backup job If a backup job is aborted, the "startedBackupWorkers" key can be cleared, thus triggering the assertion failure.	2020-03-23 21:11:25 -07:00
Jingyu Zhou	243d078596	Fix off by one error Epoch end version is saved version + 1, so need +1 for minBackupVersion.	2020-03-23 20:44:31 -07:00
Jingyu Zhou	f1d7fbafb4	Stop actors for displaced backup workers If the worker is displaced, it should not update backup containers.	2020-03-23 18:48:06 -07:00
Jingyu Zhou	dd90845277	Fix assert failure Should be backup's contiguousLogEnd > maxRestorableVersion.	2020-03-23 14:49:05 -07:00
Jingyu Zhou	196127fb92	Address review comments	2020-03-23 14:15:36 -07:00
Jingyu Zhou	fd7643c322	Remove a variable	2020-03-23 13:45:48 -07:00
Jingyu Zhou	90b40e1d75	Merge branch 'mengxu/new-backup-format-PR-delta' of github.com:xumengpanda/foundationdb into backup-worker-bak Resolve Conflicts: fdbclient/BackupAgent.actor.h fdbserver/BackupWorker.actor.cpp fdbserver/RestoreMaster.actor.cpp fdbserver/masterserver.actor.cpp	2020-03-23 13:35:33 -07:00
Meng Xu	be67ab4d6a	Correct comment based on review	2020-03-23 12:53:40 -07:00
Jingyu Zhou	f0f4e42a4c	Add removal for backupWorkerCache	2020-03-23 12:47:42 -07:00
Andrew Noyes	fa8eaf9810	Assert recoverAndEndEpoch does not become ready	2020-03-23 12:40:00 -07:00
Meng Xu	0fcd6c98d4	Include simulator.h to RestoreWorker	2020-03-23 11:34:02 -07:00
Meng Xu	48db54424f	Add assassination workload to restore test workload Add assert to ensure restore worker is reliable and not killed.	2020-03-23 11:11:13 -07:00
Meng Xu	51047a6c1d	Protect restore worker from assassination in simulation	2020-03-23 11:06:40 -07:00
Meng Xu	3f31ebf659	New backup:Revise event name and explain code	2020-03-23 10:55:44 -07:00
Jingyu Zhou	a8c2acdba0	Count the unique number of tags in startedBackupWorkers	2020-03-23 10:44:26 -07:00
Jingyu Zhou	658504bc66	Add a cache to handle repeated delivery of backup recruitment messages	2020-03-23 10:22:24 -07:00
Jingyu Zhou	1552653f1c	Backup Worker: Cancel the actor when container is stopped	2020-03-22 21:08:11 -07:00
Jingyu Zhou	33ea027f84	Make sure only current epoch's backup workers update all workers So that backup workers from old epochs don't mess with the list of all workers.	2020-03-22 18:28:22 -07:00
Jingyu Zhou	44c1996950	Change all worker started to be set after all workers updated a key Previously, all worker started is set to be when saved log versions are higher. However, saving the versions can be wrong, as the worker is not guaranteed to write to the right container. For instance, if the watch is triggered later, then mutation logs are written to previous containers. So we need to ensure the right container is ready -- all workers have acknowledged seeing the container.	2020-03-22 16:40:12 -07:00
Jingyu Zhou	97702d91c8	Skip recruiting backup workers for older epochs before min backup version When master starts recruiting backup workers, if there is no active backup job or the min version of the backup job is greater than old epoch's end version, then these old epochs can be skipped.	2020-03-21 13:44:02 -07:00
Jingyu Zhou	0eacf1cdab	trackTlogRecovery listens on backup worker change events Old TLogs can only be removed when backup workers no longer need them (i.e., the oldest backup epoch == current epoch). As a result, the core state changes need to include backup worker changes, which update the oldest backup epoch.	2020-03-20 20:17:32 -07:00
Jingyu Zhou	818072f3cb	Set oldest backup epoch if not recruiting backup workers Since tlog is not kept until backup worker has pulled mutations from it, the old tlogs can only be displaced after oldest backup epoch equals current epoch. So if master is not recruiting backup workers, it should set the oldest backup epoch as the current epoch.	2020-03-20 20:16:43 -07:00
Jingyu Zhou	0fe2810425	Fix repeated backup progress checking in backup worker The delay is not used, which caused repeated progress checking in worker 0.	2020-03-20 20:16:43 -07:00
Jingyu Zhou	4a499a3c97	Remove backup worker's first and last pop The first pop of current epoch can pop old epoch's data before they are saved. The last pop of a stopped backup worker should be skipped so that after recovery, the data is still accessible in case the last epoch's progress saving transaction is delayed.	2020-03-20 20:16:43 -07:00


3939 Commits
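
The off-by-one fix in commit 00fb4c1a35 can be illustrated with a small model. This is only a sketch, not the FoundationDB implementation: the functions `log_ranges` and `first_gap`, and the assumption that each save writes the half-open range [saved + 1, end) and then records end - 1 as the saved version, are hypothetical and chosen to match the wording of the commit message.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # half-open version range [begin, end)


def log_ranges(initial_saved: int, end_versions: List[int]) -> List[Range]:
    """Hypothetical model of a backup worker writing mutation log ranges.

    Assumption: "saved" means every version <= saved is already persisted,
    so the next range to write starts at saved + 1.
    """
    saved = initial_saved
    ranges: List[Range] = []
    for end in end_versions:
        begin = saved + 1
        if end > begin:  # nothing to write for an empty range
            ranges.append((begin, end))
            saved = end - 1
    return ranges


def first_gap(expected_begin: int, ranges: List[Range]) -> Optional[Range]:
    """Return the first missing half-open range, or None if the ranges
    are continuous starting at expected_begin."""
    cursor = expected_begin
    for begin, end in ranges:
        if begin > cursor:
            return (cursor, begin)
        cursor = max(cursor, end)
    return None


start_version = 100

# Before the fix: saved version initialized to startVersion itself.
buggy = log_ranges(start_version, [start_version + 1, 105])

# After the fix: saved version initialized to startVersion - 1.
fixed = log_ranges(start_version - 1, [start_version + 1, 105])

print("buggy:", buggy, "gap:", first_gap(start_version, buggy))
print("fixed:", fixed, "gap:", first_gap(start_version, fixed))
```

In this model, initializing the saved version to startVersion treats that version as already persisted, so the range [startVersion, startVersion + 1) is never written and the logs are no longer continuous. Starting from startVersion - 1 closes the gap.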