Previously, "all workers started" was set when the saved log versions were high
enough. However, the saved versions alone can be wrong, as a worker is not
guaranteed to write to the right container. For instance, if the watch is
triggered late, mutation logs are written to previous containers. So we need to
ensure the right container is ready -- all workers have acknowledged seeing the
container.
When the master starts recruiting backup workers, if there is no active backup
job, or the minimum version of the backup job is greater than an old epoch's
end version, then these old epochs can be skipped.
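A minimal sketch of this skip check, assuming hypothetical names (the
std::optional is empty when there is no active backup job); this is not the
actual master code:

    #include <cstdint>
    #include <optional>

    // An old epoch needs no backup workers if there is no active backup job, or
    // if the job only needs versions strictly after the old epoch's end version.
    bool oldEpochCanBeSkipped(std::optional<int64_t> activeBackupMinVersion,
                              int64_t oldEpochEndVersion) {
        if (!activeBackupMinVersion.has_value()) return true; // no active backup job
        return *activeBackupMinVersion > oldEpochEndVersion;
    }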
Old TLogs can only be removed when backup workers no longer need them (i.e.,
the oldest backup epoch == current epoch). As a result, the core state changes
need to include backup worker changes, which update the oldest backup epoch.
Since a TLog is not removed until backup workers have pulled mutations from
it, old TLogs can only be displaced after the oldest backup epoch equals the
current epoch. So if the master is not recruiting backup workers, it should
set the oldest backup epoch to the current epoch.
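A hedged sketch of that rule with made-up names: if no backup workers are
recruited for old epochs, the oldest backup epoch is simply the current epoch;
otherwise it is the smallest epoch that still has a backup worker.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    int64_t chooseOldestBackupEpoch(const std::vector<int64_t>& epochsWithBackupWorkers,
                                    int64_t currentEpoch) {
        if (epochsWithBackupWorkers.empty()) return currentEpoch; // not recruiting backup workers
        int64_t oldest = *std::min_element(epochsWithBackupWorkers.begin(),
                                           epochsWithBackupWorkers.end());
        return std::min(oldest, currentEpoch);
    }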
The first pop of the current epoch can pop an old epoch's data before it is
saved. The last pop of a stopped backup worker should be skipped so that after
recovery, the data is still accessible in case the last epoch's progress-saving
transaction is delayed.
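An illustrative predicate for this pop rule (the names are assumptions, not
the worker's actual interface):

    // The final pop of a stopped backup worker is suppressed so the old epoch's
    // data remains readable after recovery, in case the progress-saving
    // transaction for the last epoch has not committed yet.
    bool allowPop(bool workerStopped, bool isFinalPop) {
        return !(workerStopped && isFinalPop);
    }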
If a backup worker is on an old epoch, it can exit early if either of the
following is true:
- there are no backups
- all backups have a start version >= the endVersion
If this flag is set, the backup worker exits without doing any work, which
signals the master to update the oldest backup epoch.
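A sketch of the early-exit test for those two conditions, with illustrative
names:

    #include <cstdint>
    #include <vector>

    // True when this old-epoch worker has nothing to do: there are no backups,
    // or every backup's start version is at or beyond this worker's endVersion.
    bool shouldExitEarly(const std::vector<int64_t>& backupStartVersions, int64_t endVersion) {
        for (int64_t start : backupStartVersions) {
            if (start < endVersion) return false; // this backup still needs this worker's range
        }
        return true;
    }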
For the old submitBackup(), where partitionedLog is false, do not set the
backupStartedKey in BackupConfig, which signals backup workers to skip these
backups.
The oldest backup epoch is piggybacked in LogSystemConfig from the master to
the cluster controller and then to all workers. Previously, this epoch was set
to the current master epoch, which is wrong.
Because each log version contains a commit version and a subsequence number,
each key can have at most one mutation per log version. This simplifies
StagingKey::add() a lot.
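The invariant can be illustrated with a sketch (type and member names are
assumptions, not the actual StagingKey code): keying staged mutations by
(commit version, subsequence) means add() reduces to a single map insert.

    #include <cstdint>
    #include <map>
    #include <string>

    struct LogVersion {
        int64_t commitVersion;
        int32_t subsequence;
        bool operator<(const LogVersion& r) const {
            return commitVersion != r.commitVersion ? commitVersion < r.commitVersion
                                                    : subsequence < r.subsequence;
        }
    };

    struct StagingKeySketch {
        std::map<LogVersion, std::string> mutations; // at most one mutation per log version

        void add(const LogVersion& v, const std::string& mutation) {
            mutations.emplace(v, mutation); // nothing to merge: one mutation per version
        }
    };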
This optimization reduces the number of messages sent from the loader to the
applier; the extra messages were unintentionally introduced when sub-sequence
numbers were added for mutations.
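A hedged sketch of the idea with placeholder names (VersionedMutation,
sendToApplier): the loader coalesces mutations bound for the same applier into
one batch message instead of sending one message per mutation.

    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    struct VersionedMutation {
        int64_t commitVersion;
        int32_t subsequence;
        std::string mutation;
    };

    using ApplierId = int;

    // Stand-in for the real RPC to an applier.
    void sendToApplier(ApplierId /*applier*/, const std::vector<VersionedMutation>& /*batch*/) {}

    void sendMutationsBatched(const std::multimap<ApplierId, VersionedMutation>& routed) {
        std::map<ApplierId, std::vector<VersionedMutation>> batches;
        for (const auto& [applier, m] : routed)
            batches[applier].push_back(m); // coalesce: one message per applier, not per mutation
        for (const auto& [applier, batch] : batches)
            sendToApplier(applier, batch);
    }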
For a reason I am not yet sure of, duplicated mutations can be added to
StagingKey and need to be filtered out. Otherwise, atomic operations can
result in corrupted data in the database.
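A tiny illustration of why such duplicates must be dropped: applying the same
atomic add twice produces a different final value than applying it once.

    #include <cstdint>
    #include <iostream>

    int64_t applyAtomicAdd(int64_t current, int64_t operand) { return current + operand; }

    int main() {
        int64_t base = 10;
        int64_t once = applyAtomicAdd(base, 5);                     // 15, the intended value
        int64_t twice = applyAtomicAdd(applyAtomicAdd(base, 5), 5); // 20, corrupted by the duplicate
        std::cout << once << " vs " << twice << "\n";
        return 0;
    }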
After reading a new block, all mutations are sorted by version again, which
can invalidate the previously saved tuple. As a result, the decoded file will
miss some of the mutations.
If workers for previous epochs are still ongoing, we may end up with a
container that misses mutations from previous epochs. So the update only
happens after only the current epoch's backup workers remain.
Partitioned logs can have version ranges that are strict subsets of one
another, which was not properly handled -- we used to assume overlap only
happens for ranges with the same begin version.
A backup worker starts by checking if there are backup keys and then runs the
monitorBackupKeyOrPullData() loop, which does the check again. The second
check can be delayed, which causes the loop to perform NOOP pops. The fix
removes this second check and uses the result of the first check to decide
what to do in the loop.
The overlap can only happen between two generations, where the range from the
known committed version to the recovery version is copied from the old
generation to the new generation. Within a generation, there is no overlap.
The fix here adjusts the calculation of continuous version ranges to allow
this overlap.
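A sketch of a continuity check that tolerates overlapping and strict-subset
ranges (not the actual implementation): sort by begin version and track the
furthest end version covered; any gap means the log is not continuous.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct VersionRange { int64_t begin, end; }; // [begin, end)

    bool isContinuous(std::vector<VersionRange> ranges, int64_t begin, int64_t end) {
        std::sort(ranges.begin(), ranges.end(),
                  [](const VersionRange& a, const VersionRange& b) { return a.begin < b.begin; });
        int64_t covered = begin;
        for (const auto& r : ranges) {
            if (r.begin > covered) return false;   // gap before this range starts
            covered = std::max(covered, r.end);    // overlaps and subsets never create a gap
            if (covered >= end) return true;
        }
        return covered >= end;
    }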
Otherwise, when there are no mutations for the unfinished range, the empty
file may not be created when the worker is displaced, thus leaving holes in
the version ranges.