foundationdb

Commit Graph

Author	SHA1	Message	Date
Jingyu Zhou	297da14aba	Fix backup worker not popping up to end version Previously, the pop version is the min of minKnownCommittedVersion and endVersion. In the case of backup worker for previous epoch, the endVersion should be used.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	40436a4e78	Filter out non-backup related mutations	2020-01-22 19:38:45 -08:00
Jingyu Zhou	ff512b0c93	Fix memory corruption due to invalid Arena For an ILogPeekCursor, the arena becomes invalid if hasMessage() is false. So the backup worker needs to keep a reference to the arena so that the message refers to a memory area that is still valid.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	12e91240cc	Fix a typo	2020-01-22 19:38:45 -08:00
Jingyu Zhou	9abdd16cc5	Add logic to skip non-backup related mutations If a mutation has txsTag, then it is a change to the in-memory key-value store, i.e., the transaction state store, and should be ignored by the backup worker. The only exception is the "metadataVersionKey", which needs to be stored in the backup.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	485d3d0feb	Use Version instead of int64_t	2020-01-22 19:38:45 -08:00
Jingyu Zhou	31a1106286	Save mutations to backup files in simulation This is the first step in the new backup's data pipeline. Verification of file content is needed in future commits. A clear documentation of file format is a work in progress.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	dafcaee844	Fix compiler errors.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	c7f51782b8	Use override for virtual functions.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	b745373163	Backup workers only save committed mutations.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	23985da6a0	Use backup worker failed error code during recovery And use override instead of virtual in TagPartitionedLogSystem.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	840e74d696	Allow storage server queue in consistency check The backup worker needs to update its progress even during consistency check by committing transactions to the database. Thus we can't really achieve zero storage server queue. So add a limit of 10,000 to pass the consistency check.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	8585d78bfd	Refactor to remove a trigger from backup worker	2020-01-22 19:38:45 -08:00
Jingyu Zhou	9d7a1a77d0	Small fixes.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	9567bf730d	Fix a crash due to null log system When a master starts, backup workers from old epochs may send BackupWorkerDoneRequest to it. The master can safely ignore it, since the checkRemoved logic of the backup worker lets the worker exit on its own.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	0c08161d8e	Remove old backup workers when done For backup workers working on old epochs, once their work is done, they will notify the master. Then the master removes them from the log system and acknowledges the backup workers so that they can gracefully shut down. The popping of a backup worker is stalled if there are workers from older epochs still working. Otherwise, workers from old epochs will lose data. However, allowing a newer epoch to start backup can cause holes in version ranges. The restore process must verify the backup progress to make sure there are no holes, otherwise it has to wait.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	85c4a4e422	Address review comments for PR #1625	2020-01-22 19:38:45 -08:00
Jingyu Zhou	116608a0a7	Set backup workers w.r.t. the correct epoch For backup workers created for previous epoch, we need to associate them with the correct epoch so that later peekLogRouter can get the correct peek cursor. Otherwise, the workers can never peek the missing range of mutations.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	22f4bef589	Fix a race that backup workers may not be registered After the backup worker recruitment is done, we need to force trigger the registration with cluster controller. Otherwise, the log system may not have the backup workers, which can stall backup workers from obtaining a cursor and result in mutations being kept in TLogs.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	d3f14699c4	Backup worker should aggressively advance versions Separate popping logic into an actor with shorter interval than the upload interval. More critically, even if there are no mutations (e.g., in quiet database period), the popped version should still be advanced.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	6c6a553dcc	Fix hang due to distributor death in QuietDatabase It's possible that after obtaining data distributor, the distributor then dies and a new one is recruited. Because the tester is still contacting the old one, it becomes stuck.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	73824faf65	Track pseudo tags popping for individual IDs For each log router ID, we track the popped version of each pseudo tag so that the popping is only applied to the minimum of these versions. Also add more tracing for popping and epochs.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	3509209d3f	Fix not setting epoch for old log system	2020-01-22 19:38:45 -08:00
Jingyu Zhou	a1095c8250	Remove epoch from DBCoreState Use existing recoveryCount if needed.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	580151e1d4	Refactor code using C++ 17 iterator	2020-01-22 19:38:45 -08:00
Jingyu Zhou	d5a92e1805	Fix pseudo locality usage bug Somehow pseudo localities are not saved to LogSystemConfig and getPseudoPopTag() should translate LogRouter tag to pseudo tags.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	c2b8ee3b53	Small improvement	2020-01-22 19:38:45 -08:00
Jingyu Zhou	19d6a889ff	Recruit backup workers for old epochs If there are unfinished ranges in the old epochs, the new master will recruit backup workers responsible for finishing these ranges. These workers remain in the cluster until the next epoch, when they will remove themselves.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	ac851619bb	Fix merge errors with master	2020-01-22 19:38:45 -08:00
Jingyu Zhou	11964733b7	WIP: should be divided into smaller commits.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	17002740bb	Add epoch and backup workers to DBCoreState This enables backup workers to know the end version of the epoch. Additionally, the master recovery only needs to deal with crashed backup workers by recruiting new workers to back up the unfinished version range.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	41f0cf2bb5	Add decode function for backup progress	2020-01-22 19:38:45 -08:00
Jingyu Zhou	f245084bf3	Refactor LogRouter with hasLogRouter()	2020-01-22 19:38:45 -08:00
Jingyu Zhou	03a17a30ef	Refactor: check displacement in LogSystemConfig	2020-01-22 19:38:45 -08:00
Jingyu Zhou	7da9f47f26	Enable pop from backup workers This is still WIP as some edge cases can trigger test failure, most likely due to not popping mutations by backup workers when epoch ends.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	a797958af6	Update peekLogRouter for backup workers to peek	2020-01-22 19:37:48 -08:00
Jingyu Zhou	c3e5a9550f	Fix too many files error caused by many recoveries	2020-01-22 19:37:48 -08:00
Jingyu Zhou	443c4995a2	Add file identifier in interfaces for flatbuffer	2020-01-22 19:37:48 -08:00
Jingyu Zhou	a4d6ebe79e	Recruit backup worker in newEpoch	2020-01-22 19:37:48 -08:00
Jingyu Zhou	ece3cadf8e	Recruit backup worker during master recovery Right now recruit the same number as TLogs. The backup worker does nothing.	2020-01-22 19:37:48 -08:00
Jingyu Zhou	eac49bca04	Add backup worker recruitment in master.	2020-01-22 19:35:30 -08:00
Jingyu Zhou	acc4ad276d	Add std:: namespace for vector	2020-01-22 19:35:30 -08:00
Jingyu Zhou	442738b6db	Small code refactoring	2020-01-22 19:35:30 -08:00
Jingyu Zhou	de8d953865	Add backup role, class, and worker skeleton	2020-01-22 19:35:30 -08:00
Jingyu Zhou	8221d33eb1	Use emplace_back instead of push_back for TLogServer	2020-01-22 19:35:30 -08:00
Evan Tschannen	38569e46c1	Merge pull request #2584 from etschannen/master updated old release notes	2020-01-22 15:59:15 -08:00
Evan Tschannen	6a26ca09b1	updated old release notes	2020-01-22 15:58:29 -08:00
mpilman	bb88458830	Merge branch 'features/documentation-server' of github.com:mpilman/foundationdb into features/documentation-server	2020-01-22 14:25:00 -08:00
mpilman	6b11a3ee21	handle cases where no username is set	2020-01-22 14:24:01 -08:00
Markus Pilman	ff11c29258	Typo Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>	2020-01-22 14:14:08 -08:00

8262 Commits