foundationdb

Commit Graph

Author	SHA1	Message	Date
Jingyu Zhou	7d1b9fe6d3	Add mutation file decoder	2020-01-22 19:38:46 -08:00
Jingyu Zhou	568a8a8e77	Use big endian for mutation log files For each mutation, its version, sub-version, and size are prefixed in big endian representation. This is required, especially for the leading version field, because we use 0xFF for padding. A little endian version number can easily collide with 0xFF, while big endian is guaranteed to have 0x00 as the first byte.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	114e153bc8	Use block size encoded in file names The log files have block size encoded in their names and the converter should use these sizes.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	1123157ae0	Ignore mutations larger than the end version	2020-01-22 19:38:46 -08:00
Jingyu Zhou	b92363bc29	Remove duplicated log files before the conversion Duplicates can happen because backup workers may store the log for old epochs successfully, but do not update the progress before another recovery happens. As a result, the next epoch will retry and create duplicate log files.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	4327435601	Fix a data corruption bug VersionedData used to include a MutationRef, which is made from BinaryReader. Unfortunately, the StringRef inside MutationRef points to memory allocated from the BinaryReader's arena, which is freed after BinaryReader is destroyed. Changing to a StringRef pointing to the serialized mutation solves this bug.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	954743977b	Add paddings to a block in mutation log files This is needed; otherwise decoding cannot be performed.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	c1748c0460	Code refactoring The BackupWorker produces files not in blocks, which should be fixed.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	84a49cf389	Add merge sorting mutations from multiple files This is implemented in MutationFilesReadProgress.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	e4aea9b66d	Use VectorRef<Tag> for VersionedMessage	2020-01-22 19:38:46 -08:00
Jingyu Zhou	5ab9d0925c	Add namespace file_converter	2020-01-22 19:38:46 -08:00
Jingyu Zhou	d8c74e7e1a	Extend BackupContainer to support tagged log files That is, the file name contains the log router tag ID as the last component, e.g., "log,39638169,42718056,016f52a4d16ef36fd3335db9c68abfc1,1048576,1".	2020-01-22 19:38:46 -08:00
Jingyu Zhou	7f7ec99170	Serialize and deserialize new backup files The BackupWorker writes files that can be read by FileConverter. Move StringRefReader to the header file for reuse in FileConverter.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	5ac63ec526	Apply clang-format	2020-01-22 19:38:46 -08:00
Jingyu Zhou	674b468609	Add more parameter parsing	2020-01-22 19:38:46 -08:00
Jingyu Zhou	2707ab3eba	Add fdbconvert command line utility fdbconvert is intended to convert new backup files which are tagged mutation logs to old backup format. The actual conversion is not included in this commit and will be added in future commits. Note that the BackupContainer needs to be updated to support new backup files, which is also not included in this commit.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	56f40a978e	Backport changes to OldTLogServer_6_2	2020-01-22 19:38:46 -08:00
Jingyu Zhou	f21d7ca44c	Add tag ID to backup log file names	2020-01-22 19:38:46 -08:00
Jingyu Zhou	2c83fbfe6c	Rename to BackupWorker.actor.cpp to be explicit There is already one file named backup.actor.cpp in "fdbbackup/".	2020-01-22 19:38:46 -08:00
Jingyu Zhou	2b2325036a	Fix compiler error of using override	2020-01-22 19:38:46 -08:00
Jingyu Zhou	4ed75e37f3	BackupProgress uses old epoch's begin version if no progress found Get rid of the complex logic of choosing the largest saved version from previous epoch for the oldest epoch. Instead, use the begin version now available from log system.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	42430e8f5e	Add epochBegin version to OldTLogCoreData/OldLogData/OldTLogConf This is to simplify the backup process so that whenever there is an old epoch in the log system, we always know its begin version and can back up from that version if no progress is known for that old epoch.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	250137a52f	Change BackupProgress to be a class Struct doesn't need addref() or delref() members, though.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	1e0753a327	Remove backup workers from DBCoreState This is no longer needed.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	19eacac3ce	Add a unit test for BackupProgress	2020-01-22 19:38:46 -08:00
Jingyu Zhou	64052f6349	Check and fill backup gaps for old epochs and tags Sometimes a master recovery happens before the backup worker has updated its progress in the system space. As a result, the next epoch doesn't know the progress of previous ones. This change checks for such missing gaps and fills them with the whole range [startVersion, endVersion). The code is refactored into BackupProgress.actor.* to consolidate backup progress processing for the master server.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	08d9f36071	Add tags for backup worker trace events	2020-01-22 19:38:46 -08:00
Jingyu Zhou	52bdaeee39	Do not save backup workers to core state and back Each master starts from an empty set of backup workers and recruits a new set. So there is no need to save current backup workers to DBCoreState. Note current backup workers need to be serialized to LogSystemConfig (in ServerDBInfo) so that backup workers can check if they have been displaced.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	ed54aaa09e	Fix a crash failure of empty backup interface	2020-01-22 19:38:46 -08:00
Jingyu Zhou	67ad260b9e	Fix OOM in backup worker For a backup worker working on old epochs, make it a contract that the worker won't pull messages after the end version. This potentially saves memory and simplifies the saving logic. Fix the wrong backup epoch when sending BackupWorkerDoneRequest.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	297da14aba	Fix backup worker not popping up to end version Previously, the pop version is the min of minKnownCommittedVersion and endVersion. In the case of backup worker for previous epoch, the endVersion should be used.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	40436a4e78	Filter out non-backup related mutations	2020-01-22 19:38:45 -08:00
Jingyu Zhou	ff512b0c93	Fix memory corruption due to invalid Arena For an ILogPeekCursor, the arena becomes invalid if hasMessage() is false. So the backup worker needs to keep a reference to the arena so that the message refers to memory area that is still valid.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	12e91240cc	Fix a typo	2020-01-22 19:38:45 -08:00
Jingyu Zhou	9abdd16cc5	Add logic to skip non-backup related mutations If a mutation has txsTag, then it is the change to in-memory key value store, i.e., the transaction state store, and should be ignored by the backup worker. The only exception is for the "metadataVersionKey", which needs to be stored in the backup.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	485d3d0feb	Use Version instead of int64_t	2020-01-22 19:38:45 -08:00
Jingyu Zhou	31a1106286	Save mutations to backup files in simulation This is the first step in the new backup's data pipeline. Verification of file content is needed in future commits. A clear documentation of file format is a work in progress.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	dafcaee844	Fix compiler errors.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	c7f51782b8	Use override for virtual functions.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	b745373163	Backup workers only save committed mutations.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	23985da6a0	Use backup worker failed error code during recovery And use override instead of virtual in TagPartitionedLogSystem.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	840e74d696	Allow storage server queue in consistency check The backup worker needs to update its progress even during consistency check by committing transactions to the database. Thus we can't really achieve zero storage server queue. So add a limit of 10,000 to pass the consistency check.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	8585d78bfd	Refactor to remove a trigger from backup worker	2020-01-22 19:38:45 -08:00
Jingyu Zhou	9d7a1a77d0	Small fixes.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	9567bf730d	Fix a crash due to null log system When a master starts, backup workers from old epochs may send BackupWorkerDoneRequest to it. The master can safely ignore it, since the checkRemoved logic of the backup worker lets the worker exit on its own.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	0c08161d8e	Remove old backup workers when done For backup workers working on old epochs, once their work is done, they will notify the master. Then the master removes them from the log system and acknowledges back to the backup workers so that they can gracefully shut down. The popping of a backup worker is stalled if there are workers from older epochs still working. Otherwise, workers from old epochs will lose data. However, allowing a newer epoch to start backup can cause holes in version ranges. The restore process must verify the backup progress to make sure there are no holes, otherwise it has to wait.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	85c4a4e422	Address review comments for PR #1625	2020-01-22 19:38:45 -08:00
Jingyu Zhou	116608a0a7	Set backup workers w.r.t. the correct epoch For backup workers created for previous epoch, we need to associate them with the correct epoch so that later peekLogRouter can get the correct peek cursor. Otherwise, the workers can never peek the missing range of mutations.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	22f4bef589	Fix a race that backup workers may not be registered After the backup worker recruitment is done, we need to force trigger the registration with cluster controller. Otherwise, the log system may not have the backup workers, which can stall backup workers from obtaining a cursor and result in mutations being kept in TLogs.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	d3f14699c4	Backup worker should aggressively advance versions Separate popping logic into an actor with a shorter interval than the upload interval. More critically, even if there are no mutations (e.g., in a quiet database period), the popped version should still be advanced.	2020-01-22 19:38:45 -08:00
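
Commit 568a8a8e77 above explains why the per-mutation prefix in mutation log files is big endian. The reasoning can be illustrated with a minimal Python sketch. The field widths (64-bit version, 32-bit sub-version, 32-bit size) and the names HEADER, PAD_BYTE, encode_header, decode_header, and starts_with_padding are assumptions made for illustration; they are not taken from the FoundationDB source.

```python
import struct

# Assumed layout (not confirmed by the commit log): signed 64-bit version,
# signed 32-bit sub-version, signed 32-bit size, all big endian (">").
HEADER = struct.Struct(">qii")
PAD_BYTE = 0xFF  # padding byte value named in the commit message


def encode_header(version: int, sub_version: int, size: int) -> bytes:
    """Pack the three prefix fields in big endian byte order."""
    return HEADER.pack(version, sub_version, size)


def decode_header(buf: bytes, offset: int = 0) -> tuple:
    """Unpack (version, sub_version, size) starting at offset."""
    return HEADER.unpack_from(buf, offset)


def starts_with_padding(buf: bytes, offset: int = 0) -> bool:
    """A reader can treat a leading 0xFF byte as the start of padding."""
    return buf[offset] == PAD_BYTE


# The same version number in both byte orders.
version = 255
big = struct.pack(">q", version)     # most significant byte first
little = struct.pack("<q", version)  # least significant byte first
```

Under these assumptions, little[0] is 0xFF and would be mistaken for padding, while big[0] is 0x00. More generally, any non-negative version below 2^56 starts with 0x00 in big endian, and no non-negative signed 64-bit version can start with 0xFF.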


8144 Commits