foundationdb

Commit Graph

Author	SHA1	Message	Date
Evan Tschannen	32c0169fc8	use the old logic for lifetime since we already have verified the cluster controller is correct	2020-07-20 10:26:47 -07:00
Evan Tschannen	6a38f81269	do not kill the master unless we have a dbInfo from the current cluster controller	2020-07-17 14:59:38 -07:00
Jingyu Zhou	5cc5d9cf1e	Log peer address whose failure can cause master recovery. When a master recovery is caused by a failed tlog, proxy, resolver, or log router, a trace event now reports which address the master thinks is dead.	2020-07-10 15:57:03 -07:00
Jingyu Zhou	d883426c6a	Fix spammy GotBackupProgress events. Only print these events during master recovery, and don't log them for backup workers.	2020-06-27 21:30:38 -07:00
Evan Tschannen	ced65cd30b	finished explicitly versioning everything stored in the database	2020-05-22 17:14:21 -07:00
Markus Pilman	c2bc75516f	Merge branch 'release-6.3' of github.com:apple/foundationdb into features/trace-roles	2020-05-14 10:34:53 -07:00
Markus Pilman	5f9b127e56	Emit traces regularly about role assignment. We currently emit Role transition traces when a role starts and when it ends. While this is useful for debugging, it doesn't work well with tools that inject data and might miss some trace lines. We do decorate each trace line with the roles assigned to that particular process; however, this is not sufficient for tools that can make use of the UID -> Role mapping.	2020-05-08 16:27:57 -07:00
Evan Tschannen	a442565e13	more work towards shrinking locality	2020-04-18 21:29:38 -07:00
Evan Tschannen	07cc0a8d74	code cleanup	2020-04-10 17:02:11 -07:00
Evan Tschannen	a51c92854a	Merge branch 'master' into feature-tree-broadcast # Conflicts: # fdbserver/WorkerInterface.actor.h # fdbserver/worker.actor.cpp	2020-04-06 21:09:44 -07:00
Evan Tschannen	2a1bd97120	fix compilation errors	2020-04-06 20:58:43 -07:00
Evan Tschannen	477d66b46d	implemented a tree broadcast for txn state message for proxies, and serverDBInfo for workers	2020-04-05 23:09:36 -07:00
Evan Tschannen	bb5799bd20	Merge pull request #2642 from xumengpanda/mengxu/new-backup-format-PR FastRestore:Integrate with new backup format	2020-03-25 15:47:55 -07:00
Jingyu Zhou	e2f317a0da	Fix a crash failure	2020-03-25 09:18:49 -07:00
Jingyu Zhou	243d078596	Fix off by one error Epoch end version is saved version + 1, so need +1 for minBackupVersion.	2020-03-23 20:44:31 -07:00
Jingyu Zhou	90b40e1d75	Merge branch 'mengxu/new-backup-format-PR-delta' of github.com:xumengpanda/foundationdb into backup-worker-bak Resolve Conflicts: fdbclient/BackupAgent.actor.h fdbserver/BackupWorker.actor.cpp fdbserver/RestoreMaster.actor.cpp fdbserver/masterserver.actor.cpp	2020-03-23 13:35:33 -07:00
Meng Xu	be67ab4d6a	Correct comment based on review	2020-03-23 12:53:40 -07:00
Andrew Noyes	fa8eaf9810	Assert recoverAndEndEpoch does not become ready	2020-03-23 12:40:00 -07:00
Meng Xu	3f31ebf659	New backup:Revise event name and explain code	2020-03-23 10:55:44 -07:00
Jingyu Zhou	97702d91c8	Skip recruiting backup workers for older epochs before min backup version When master starts recruiting backup workers, if there is no active backup job or the min version of the backup job is greater than old epoch's end version, then these old epochs can be skipped.	2020-03-21 13:44:02 -07:00
Jingyu Zhou	818072f3cb	Set oldest backup epoch if not recruiting backup workers Since tlog is not kept until backup worker has pulled mutations from it, the old tlogs can only be displaced after oldest backup epoch equals current epoch. So if master is not recruiting backup workers, it should set the oldest backup epoch as the current epoch.	2020-03-20 20:16:43 -07:00
Jingyu Zhou	5359528132	Reduce a call to getLogSystemConfig()	2020-03-20 20:15:09 -07:00
Jingyu Zhou	12ed8ad536	Fix backup worker start version when logset start version is lower The start version of tlog set can be smaller than the last epoch's end version. In this case, set backup worker's start version as last epoch's end version to avoid overlapping of version ranges among backup workers.	2020-03-20 20:15:08 -07:00
Jingyu Zhou	80d3fa1222	Add delay for master to recruit backup workers. This delay ensures that the old epoch's backup workers can save their progress in the database. Otherwise, the new master could attempt to recruit backup workers for the old epoch on version ranges that have already been popped. As a result, the logs would lose data.	2020-03-20 20:15:08 -07:00
Jingyu Zhou	fda6c08640	Include a total number of tags in partition log file names This is needed for BackupContainer to check partitioned mutation logs are continuous, i.e., restorable to a version.	2020-03-20 20:13:38 -07:00
Jingyu Zhou	5bf62c8f85	Reduce a call to getLogSystemConfig()	2020-03-19 10:08:19 -07:00
Jingyu Zhou	89d8f13038	Fix backup worker start version when logset start version is lower The start version of tlog set can be smaller than the last epoch's end version. In this case, set backup worker's start version as last epoch's end version to avoid overlapping of version ranges among backup workers.	2020-03-18 16:41:35 -07:00
Jingyu Zhou	15437ffb53	Add delay for master to recruit backup workers. This delay ensures that the old epoch's backup workers can save their progress in the database. Otherwise, the new master could attempt to recruit backup workers for the old epoch on version ranges that have already been popped. As a result, the logs would lose data.	2020-03-18 16:41:35 -07:00
Jingyu Zhou	d8c6bf585d	Include a total number of tags in partition log file names This is needed for BackupContainer to check partitioned mutation logs are continuous, i.e., restorable to a version.	2020-03-18 16:39:40 -07:00
Evan Tschannen	e08f0201f1	merge release 6.2 into master	2020-03-17 12:51:47 -07:00
Evan Tschannen	56dee89e6e	active generations should include the current one	2020-03-16 11:09:42 -07:00
Evan Tschannen	e5d53c863b	report in status the number of active generations	2020-03-16 10:29:17 -07:00
Evan Tschannen	818537ed2d	Update fdbserver/masterserver.actor.cpp Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>	2020-03-14 15:04:46 -07:00
Evan Tschannen	2f2f56020f	Update fdbserver/masterserver.actor.cpp Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>	2020-03-13 15:54:13 -07:00
Evan Tschannen	a39effa57d	delay recoveries after 70 outstanding generations, and stop recoveries after 100 outstanding generations to prevent a death spiral from filling up the coordinated state	2020-03-13 10:28:32 -07:00
Evan Tschannen	96258b9809	Merge branch 'release-6.2' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbcli/fdbcli.actor.cpp # fdbclient/ManagementAPI.actor.cpp # fdbrpc/FlowTransport.actor.cpp # fdbserver/ClusterController.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/DataDistribution.actor.h # fdbserver/DataDistributionQueue.actor.cpp # fdbserver/KeyValueStoreMemory.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/QuietDatabase.actor.cpp # fdbserver/SkipList.cpp # fdbserver/StorageMetrics.actor.h # fdbserver/TLogServer.actor.cpp # fdbserver/fdbserver.actor.cpp # fdbserver/storageserver.actor.cpp # fdbserver/workloads/KVStoreTest.actor.cpp # flow/CMakeLists.txt # flow/Knobs.cpp # flow/Knobs.h # flow/genericactors.actor.cpp # flow/serialize.h	2020-02-21 19:09:16 -08:00
A.J. Beamon	df2b0452b4	Step 3 of fixing storage server range reads: change return type of readRange from VectorRef<KeyValueRef> to RangeResultRef.	2020-02-06 13:19:24 -08:00
Jingyu Zhou	52c6737411	Rename backupLoggingEnabled as backupWorkerEnabled To highlight the changes for 7.0 backup changes. By default, backup_worker_enabled flag is set for 7.0 version.	2020-02-04 10:09:16 -08:00
Jingyu Zhou	0db03f1d3c	Use backup_logging_enabled flag The default is to enable new backup workers. Users can disable this flag to turn off the backup worker feature.	2020-02-03 20:03:22 -08:00
Jingyu Zhou	38aa1903fd	Add a DB configuration option for backup workers Right now, the default is to keep the old backup behavior, i.e., do NOT use backup workers. Specifically, if BackupType is not set (or is set to default), the master will not recruit backup workers and will not add pseudo locality for backup workers. The StartFullBackupTaskFunc is updated to check if backup worker is enabled. Only when it is not enabled, starting a backup will wait on all backup workers to be started.	2020-01-31 19:29:09 -08:00
Jingyu Zhou	8b67a89eed	More review comments fixed.	2020-01-22 19:42:13 -08:00
Jingyu Zhou	1eaea91cb3	Address review comments	2020-01-22 19:42:13 -08:00
Jingyu Zhou	e14246ac16	Add more information for trace events	2020-01-22 19:42:13 -08:00
Jingyu Zhou	4bed33031f	Set backup worker start version to be savedVersion + 1 If no progress found, start version is set to epochBegin. So the start version is the one after the last saved (or from last epoch's saved) version.	2020-01-22 19:42:13 -08:00
Jingyu Zhou	4ed75e37f3	BackupProgress uses old epoch's begin version if no progress found Get rid of the complex logic of choosing the largest saved version from previous epoch for the oldest epoch. Instead, use the begin version now available from log system.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	19eacac3ce	Add a unit test for BackupProgress	2020-01-22 19:38:46 -08:00
Jingyu Zhou	64052f6349	Check and fill backup gaps for old epochs and tags Sometimes the backup worker has not updated progress to the system space and a master recovery happens. As a result, next epoch doesn't know the progress of previous ones. This change is to check for such missing gaps and fill them with the whole range [startVersion, endVersion). The code is refactored into BackupProgress.actor.* to consolidate backup progress processing for the master server.	2020-01-22 19:38:46 -08:00
Jingyu Zhou	ed54aaa09e	Fix a crash failure of empty backup interface	2020-01-22 19:38:46 -08:00
Jingyu Zhou	23985da6a0	Use backup worker failed error code during recovery And use override instead of virtual in TagPartitionedLogSystem.	2020-01-22 19:38:45 -08:00
Jingyu Zhou	840e74d696	Allow storage server queue in consistency check. The backup worker needs to update its progress even during consistency check by committing transactions to the database. Thus we can't really achieve zero storage server queue. So add a limit of 10,000 to pass the consistency check.	2020-01-22 19:38:45 -08:00


198 Commits