foundationdb

Commit Graph

Author	SHA1	Message	Date
Xin Dong	c901fc269b	Changed to use 'rate' instead of 'limit' after some discussion with Evan and AJ	2020-01-30 14:13:56 -08:00
Xin Dong	65c607bc13	Fix the error after the rebase	2020-01-30 14:13:56 -08:00
Xin Dong	1b313a4f7e	Address review comments. Rebased with latest master	2020-01-30 14:13:56 -08:00
Xin Dong	9aaf4bc107	Add code coverage mark when sending out the throttled error.	2020-01-30 14:13:56 -08:00
Xin Dong	e21426d12a	Send error back to the GRV requests with batch priority when the cluster is saturated, instead of blindly enqueuing the requests and letting the client time out.	2020-01-30 14:13:56 -08:00
A.J. Beamon	fa51a1abc5	Merge pull request #2604 from xumengpanda/mengxu/fast-restore-valgrind-fix-PR14 Performant restore [14/XX Add-on]: Fix initialized field in VersionBatch struct	2020-01-27 15:22:31 -08:00
Meng Xu	76f30e71dc	FastRestore: Init VersionBatch explicitly. Built-in variables may not be zero-initialized by the compiler-provided default constructor.	2020-01-26 13:15:45 -08:00
Alvin Moore	d03e49b4a1	Fixed the location of crc32c.h from fdbrpc to flow	2020-01-26 07:01:25 -08:00
Alex Miller	6945a6ea01	Merge pull request #2345 from zjuLcg/add-consistency-verification-in-mako-workload Add consistency verification in mako workload	2020-01-24 17:07:49 -08:00
Evan Tschannen	8f599e9d15	fix: backupWorker would crash when run outside of simulation	2020-01-23 19:06:39 -08:00
Evan Tschannen	76e192d490	Merge pull request #2538 from alexmiller-apple/hashlittle2-to-crc32c Convert more hashlittle{,2} uses to crc32c_append	2020-01-23 17:54:38 -08:00
Evan Tschannen	6c0b934dda	Merge pull request #2242 from alexmiller-apple/fix-10min-stall-again Fix the 10min multi-region recovery stall again	2020-01-23 17:53:02 -08:00
A.J. Beamon	b2c8a4a34c	Merge pull request #2519 from xumengpanda/mengxu/fast-restore-versionBatch-fixSize-PR Performant restore [14/XX]: Ensure each version-batch not exceed a configured size	2020-01-23 16:49:01 -08:00
A.J. Beamon	8a065b9da4	Merge pull request #2557 from alexmiller-apple/reduce-versionstamp-conflictranges Narrow the unreadable range of keys after a versionstamped key operation	2020-01-23 11:14:47 -08:00
Jingyu Zhou	6ddf73e26a	Remove code introduced when resolving merge conflicts	2020-01-22 21:23:38 -08:00
Jingyu Zhou	39fbacbc4f	Address review comments	2020-01-22 19:43:40 -08:00
Jingyu Zhou	acebfdc67b	Restore storage queue limit to 0 in consistency check. The storage queue is no longer going to be a problem failing tests. Now the backup worker life cycle is tied with backup. So consistency check only happens after the backup workload is done. Thus, we no longer need to save backup progress when consistency check is running.	2020-01-22 19:43:40 -08:00
Jingyu Zhou	c6c39ca99d	Update better master exist with backup workers. During recruitment, if there is no desired log router count, use tlog size instead, because the number of backup workers has to be larger than 0.	2020-01-22 19:43:40 -08:00
Jingyu Zhou	8b67a89eed	More review comments fixed.	2020-01-22 19:42:13 -08:00
Jingyu Zhou	1eaea91cb3	Address review comments	2020-01-22 19:42:13 -08:00
Jingyu Zhou	1311fec45a	Add an option to get minKnownCommittedVersion from Proxies. The backup worker needs to use this version for popping when running in NOOP mode. This option is added to GetReadVersionRequest and proxies will send back minKnownCommittedVersion if the option is set. Also add a couple of knobs for backup workers.	2020-01-22 19:42:13 -08:00
Jingyu Zhou	7989f3f015	Add NOOP to backup worker. The backup worker just blindly pops tags if the "backupStartedKey" is not set. Note the commit version from TLog cannot be used as the pop version, because for a single region, during a recovery the log router tags are used to recover mutations. The backup worker can potentially pop mutations that are needed for recovery, causing consistency errors. So the solution for now is to use commit version - 5,000,000, which is a version guaranteed to be persisted on all replicas.	2020-01-22 19:42:13 -08:00
Jingyu Zhou	c08a192c75	Add a backup start key. If the backup key is not set, do not recruit backup workers for old epochs.	2020-01-22 19:42:13 -08:00
Jingyu Zhou	e14246ac16	Add more information for trace events	2020-01-22 19:42:13 -08:00
Jingyu Zhou	4bed33031f	Set backup worker start version to be savedVersion + 1. If no progress is found, the start version is set to epochBegin. So the start version is the one after the last saved (or from last epoch's saved) version.	2020-01-22 19:42:13 -08:00
Jingyu Zhou	dcd0a46bc6	Fix a rare remote recovery bug. This bug was introduced when I added log router tags unconditionally to any configurations. In newEpoch(), the wait for remote recovery is conditioned on "logRouterTags == 0", which always becomes false. Thus remote recovery was not performed and remote TLogs won't copy data from previous epoch's TLogs (previous epoch is a single region configuration). As a result, storage servers cannot peek/get the data, and won't pop tags. Thus, waitForFullReplication() became stuck and the test eventually timed out.	2020-01-22 19:42:13 -08:00
Jingyu Zhou	56a2c37071	Recruit backup workers for single region. Enable log router tags for single region, which are popped by backup workers. Need to add noop for backup workers if there are no active backups.	2020-01-22 19:42:13 -08:00
Jingyu Zhou	0e5f5b50f0	Remove unused backup worker knobs	2020-01-22 19:38:46 -08:00
Jingyu Zhou	60f360c954	Log oldest backup epoch in the backup worker	2020-01-22 19:38:46 -08:00
Jingyu Zhou	568a8a8e77	Use big endian for mutation log files. For each mutation, its version, sub-version, and size are prefixed with big endian representation. This is required, especially for the first version variable, because we use 0xFF for padding purposes. A little endian version number can easily collide with 0xFF, while big endian is guaranteed to have 0x00 as the first byte.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	954743977b	Add padding to a block in mutation log files. This is needed; otherwise decoding cannot be performed.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	e4aea9b66d	Use VectorRef<Tag> for VersionedMessage	2020-01-22 19:38:46 -08:00
Jingyu Zhou	7f7ec99170	Serialize and deserialize new backup files. The BackupWorker writes files that can be read by FileConverter. Move StringRefReader to the header file for reuse in FileConverter.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	56f40a978e	Backport changes to OldTLogServer_6_2	2020-01-22 19:38:46 -08:00
Jingyu Zhou	f21d7ca44c	Add tag ID to backup log file names	2020-01-22 19:38:46 -08:00
Jingyu Zhou	2c83fbfe6c	Rename to BackupWorker.actor.cpp to be explicit. There is already one file named backup.actor.cpp in "fdbbackup/".	2020-01-22 19:38:46 -08:00
Jingyu Zhou	2b2325036a	Fix compiler error of using override	2020-01-22 19:38:46 -08:00
Jingyu Zhou	4ed75e37f3	BackupProgress uses old epoch's begin version if no progress found. Get rid of the complex logic of choosing the largest saved version from previous epoch for the oldest epoch. Instead, use the begin version now available from log system.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	42430e8f5e	Add epochBegin version to OldTLogCoreData/OldLogData/OldTLogConf. This is to simplify the backup process so that whenever there is an old epoch in the log system, we always know its begin version and can back up from that version if no progress is known for that old epoch.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	250137a52f	Change BackupProgress to be a class. Struct doesn't need addref() or delref() members, though.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	1e0753a327	Remove backup workers from DBCoreState. This is no longer needed.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	19eacac3ce	Add a unit test for BackupProgress	2020-01-22 19:38:46 -08:00
Jingyu Zhou	64052f6349	Check and fill backup gaps for old epochs and tags. Sometimes the backup worker has not updated progress to the system space and a master recovery happens. As a result, the next epoch doesn't know the progress of previous ones. This change is to check for such gaps and fill them with the whole range [startVersion, endVersion). The code is refactored into BackupProgress.actor.* to consolidate backup progress processing for the master server.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	08d9f36071	Add tags for backup worker trace events	2020-01-22 19:38:46 -08:00
Jingyu Zhou	52bdaeee39	Do not save backup workers to core state and back. Each master starts from an empty set of backup workers and recruits a new set. So there is no need to save current backup workers to DBCoreState. Note current backup workers need to be serialized to LogSystemConfig (in ServerDBInfo) so that backup workers can check if they have been displaced.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	ed54aaa09e	Fix a crash failure of empty backup interface	2020-01-22 19:38:46 -08:00
Jingyu Zhou	67ad260b9e	Fix OOM in backup worker. For a backup worker working on old epochs, make it a contract that the worker won't pull messages after the end version. This potentially saves memory and simplifies the saving logic. Fix the wrong backup epoch when sending BackupWorkerDoneRequest.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	297da14aba	Fix backup worker not popping up to end version. Previously, the pop version is the min of minKnownCommittedVersion and endVersion. In the case of backup worker for previous epoch, the endVersion should be used.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	40436a4e78	Filter out non-backup related mutations	2020-01-22 19:38:45 -08:00
Jingyu Zhou	ff512b0c93	Fix memory corruption due to invalid Arena. For an ILogPeekCursor, the arena becomes invalid if hasMessage() is false. So the backup worker needs to keep a reference to the arena so that the message refers to memory area that is still valid.	2020-01-22 19:38:45 -08:00
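
The reasoning in commit 568a8a8e77 (big endian for mutation log files) can be illustrated with a small sketch. This is not the FoundationDB implementation: the field widths (8-byte version, 4-byte sub-version, 4-byte size) and the helper names `encode_header` and `looks_like_padding` are assumptions made for illustration. Only the 0xFF padding byte and the ordering of version, sub-version, and size come from the commit message.

```python
import struct

PADDING_BYTE = 0xFF  # padding value named in the commit message


def encode_header(version: int, sub_version: int, size: int,
                  big_endian: bool = True) -> bytes:
    """Pack (version, sub_version, size) as 8 + 4 + 4 bytes.

    The widths are assumed for this sketch, not taken from the source.
    """
    fmt = ">qII" if big_endian else "<qII"
    return struct.pack(fmt, version, sub_version, size)


def looks_like_padding(header: bytes) -> bool:
    """A decoder that sees 0xFF as the first byte would treat it as padding."""
    return header[0] == PADDING_BYTE


# A version whose low-order byte happens to be 0xFF.
version = 0x1FF
be = encode_header(version, 0, 16)                    # big endian
le = encode_header(version, 0, 16, big_endian=False)  # little endian

print(looks_like_padding(be))  # high-order byte comes first
print(looks_like_padding(le))  # low-order byte comes first
```

With little endian, the low-order byte of the version is written first, so any version ending in 0xFF is indistinguishable from padding at the start of a record. With big endian, the first byte is the high-order byte, which stays 0x00 for any non-negative version below 2^56.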

3359 Commits