This option allows clients to select the clock source for trace events, similar to the `--traceclock` command line parameter for `fdbserver`.
Using the `realtime` clock source makes loading event data into
OpenTracing systems like Jaeger more useful.
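For illustration, a minimal client-side sketch in C++; it assumes the option is exposed through the C API as a TRACE_CLOCK_SOURCE network option (the exact option name and accepted values are an assumption here, not confirmed by this note):

    // Minimal sketch; assumed option name: TRACE_CLOCK_SOURCE.
    #define FDB_API_VERSION 630
    #include <foundationdb/fdb_c.h>
    #include <cstring>

    static void set_opt(FDBNetworkOption opt, const char* value) {
        fdb_network_set_option(opt, reinterpret_cast<const uint8_t*>(value),
                               static_cast<int>(std::strlen(value)));
    }

    int main() {
        fdb_select_api_version(FDB_API_VERSION);
        set_opt(FDB_NET_OPTION_TRACE_ENABLE, ".");               // write client trace files here
        set_opt(FDB_NET_OPTION_TRACE_CLOCK_SOURCE, "realtime");  // assumed option name
        // fdb_setup_network() / fdb_run_network() would follow before any database use.
        return 0;
    }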
atomicOp imposes an amplified performance overhead on the cluster:
for example, the operand of an ADD operation can be small, but the
storage server (SS) has to load the existing value, which can be large,
in order to apply the operation.
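To make the amplification concrete, here is a toy model (not FoundationDB code) of a storage server applying an atomic op: the shipped mutation carries only a small operand, but applying it requires touching the whole stored value.

    #include <algorithm>
    #include <cstdint>
    #include <cstring>
    #include <map>
    #include <string>

    // Toy key-value store standing in for a storage server's data.
    using Store = std::map<std::string, std::string>;

    // Apply an ADD-style atomic op. The operand shipped in the mutation is
    // tiny, but the storage server still has to load and rewrite the existing
    // value; for other atomic ops that value can be arbitrarily large, so the
    // server-side cost is not reflected by the mutation size.
    void applyAtomicAdd(Store& store, const std::string& key, int64_t operand) {
        std::string& value = store[key];                 // load existing value
        int64_t current = 0;
        std::memcpy(&current, value.data(),
                    std::min(value.size(), sizeof(current)));
        current += operand;                              // the cheap part
        value.assign(reinterpret_cast<const char*>(&current), sizeof(current));
    }

    int main() {
        Store store;
        applyAtomicAdd(store, "counter", 1);   // mutation payload: a few bytes
        applyAtomicAdd(store, "counter", 41);  // cost includes the read of the value
        return store["counter"].size() == sizeof(int64_t) ? 0 : 1;
    }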
The mutation logs of backup workers are saved into the "mlogs" directory under the
container directory. The backup worker has been restructured to handle multiple
backups, where each backup is stored in a separate backup container.
In the backup worker, mutations pulled from TLogs are buffered in a message
queue. When writing out to different containers, each backup's mutation range
is used to check whether a mutation should be written to that container. When a
new backup is submitted by the client, the "backupStartedKey" is updated. The
worker monitors this key, updates its internal map of backups, and then the next
pull from the TLog waits for the new backup to become ready. This ensures that,
by the time worker 0 marks the backup as started, all workers are already logging
mutations for the backup.
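A rough sketch of this bookkeeping, with hypothetical names in place of the actual Flow actor code: mutations are buffered once, each backup filters by its own range when flushing to its container, and an observed change to "backupStartedKey" forces the next TLog pull to wait until every known backup is ready.

    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    using Version = int64_t;

    struct Mutation { Version version; std::string bytes; };

    struct PerBackupInfo {
        Version startVersion = 0;   // mutations at or after this version belong to the backup
        bool ready = false;         // container opened and bookkeeping set up
    };

    struct BackupWorkerModel {
        std::vector<Mutation> messages;                // single buffer of pulled mutations
        std::map<std::string, PerBackupInfo> backups;  // keyed by backup UID
        bool sawNewBackup = false;

        // Called when a change to "backupStartedKey" is observed.
        void onBackupStarted(const std::string& uid, Version startVersion) {
            backups[uid] = PerBackupInfo{startVersion, false};
            sawNewBackup = true;   // the next pull must wait for this backup's readiness
        }

        // The next pull from TLogs may proceed only when every known backup is
        // ready, so that no mutation for a newly started backup is missed.
        bool canPull() const {
            if (!sawNewBackup) return true;
            for (const auto& kv : backups)
                if (!kv.second.ready) return false;
            return true;
        }

        // Per backup, decide whether a buffered mutation belongs in its container.
        static bool shouldWrite(const Mutation& m, const PerBackupInfo& info) {
            return m.version >= info.startVersion;
        }
    };

    int main() {
        BackupWorkerModel w;
        w.onBackupStarted("uid-1", 100);
        bool blocked = !w.canPull();          // must wait until the new backup is ready
        w.backups["uid-1"].ready = true;
        return blocked && w.canPull() ? 0 : 1;
    }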
Reason: restore is an fdbserver process that does not register with the CC.
The new failure monitor changes how connections are established for clients and servers:
a client does not need to connect to the CC to get connected,
whereas a server has to connect to the CC to get connected.
The restore worker thus becomes a special role that behaves like a client but is actually a server.
Right now, the default is to keep the old backup behavior, i.e., do NOT use
backup workers. Specifically, if BackupType is not set (or is set to the default),
the master will not recruit backup workers and will not add pseudo locality for
backup workers.
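A minimal sketch, with hypothetical names, of the default described above: backup workers (and their pseudo locality) are only recruited when the configuration explicitly enables them.

    #include <iostream>

    struct DatabaseConfigurationModel {
        // Stand-in for the BackupType / backup-worker setting;
        // false models "not set / default", i.e. the old backup behavior.
        bool backupWorkerEnabled = false;
    };

    void recruitRolesModel(const DatabaseConfigurationModel& config) {
        if (config.backupWorkerEnabled) {
            std::cout << "recruit backup workers, add pseudo locality for them\n";
        } else {
            std::cout << "no backup workers recruited; old backup path is used\n";
        }
    }

    int main() {
        recruitRolesModel(DatabaseConfigurationModel{});       // default: old behavior
        recruitRolesModel(DatabaseConfigurationModel{true});   // backup workers enabled
        return 0;
    }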
StartFullBackupTaskFunc is updated to check whether the backup worker is enabled.
Only when it is not enabled does starting a backup wait for all backup workers
to be started.
The backup task may be restarted multiple times, so the started key for the
backup task may already be set. In this case, the wait on the watch should be
skipped.
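A sketch of that restart-safe check, using a toy key-value model (hypothetical names): if the started key was persisted by an earlier run of the task, the watch is skipped.

    #include <map>
    #include <optional>
    #include <string>

    using KV = std::map<std::string, std::string>;

    std::optional<std::string> get(const KV& db, const std::string& key) {
        auto it = db.find(key);
        if (it == db.end()) return std::nullopt;
        return it->second;
    }

    // Returns true if the task still needs to wait on a watch of the key,
    // false if a previous run of the (restartable) task already set it.
    bool needToWatchStartedKey(const KV& db, const std::string& startedKey) {
        return !get(db, startedKey).has_value();
    }

    int main() {
        KV db;
        bool first = needToWatchStartedKey(db, "backupStartedKey");   // true: wait on watch
        db["backupStartedKey"] = "beginVersion=100";
        bool again = needToWatchStartedKey(db, "backupStartedKey");   // false: skip the wait
        return first && !again ? 0 : 1;
    }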
TaskBucket::keepRunning() needs to be called in backup transactions to make sure
that the task has not been cancelled; if it has, the current execution is aborted.
Without this check, a cancelled task could continue to run, causing multiple runs
of the same task.
Another subtle issue is that the beginVersion is persisted on backupStartedKey.
So when reading it back from that key, we should set the task's beginVersion to
the value persisted earlier.
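A rough sketch of this pattern, with toy types standing in for TaskBucket and the transaction machinery: every backup transaction first verifies the task is still live (the role TaskBucket::keepRunning() plays), and the beginVersion is taken from the value persisted on backupStartedKey rather than recomputed.

    #include <cstdint>
    #include <map>
    #include <stdexcept>
    #include <string>

    using Version = int64_t;

    struct TaskModel { std::string id; };

    struct TaskBucketModel {
        std::map<std::string, bool> liveTasks;   // task id -> still scheduled?

        // Analogous to TaskBucket::keepRunning(): throw if the task was
        // cancelled, so the transaction (and the task body) stops here
        // instead of continuing and producing a duplicate run.
        void keepRunning(const TaskModel& task) const {
            auto it = liveTasks.find(task.id);
            if (it == liveTasks.end() || !it->second)
                throw std::runtime_error("task_removed");
        }
    };

    // The beginVersion is persisted on backupStartedKey; read it back instead
    // of recomputing it, so restarted tasks agree on the same beginVersion.
    Version beginVersionFromStartedKey(const std::map<std::string, Version>& db) {
        return db.at("backupStartedKey");
    }

    int main() {
        TaskBucketModel bucket;
        TaskModel task{"backup-task"};
        bucket.liveTasks[task.id] = true;
        std::map<std::string, Version> db{{"backupStartedKey", 100}};

        bucket.keepRunning(task);                       // would throw if cancelled
        Version beginVersion = beginVersionFromStartedKey(db);
        return beginVersion == 100 ? 0 : 1;
    }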
This wait makes sure that backup workers are already saving mutations so
that no mutations are missed. The idea is that the CLI sets a "backupStartedKey"
in the database and waits for the backup's allWorkerStarted() key to be set.
Backup workers monitor changes to the "backupStartedKey" and start logging
mutations. Additionally, the backup worker for Tag(-2,0) monitors whether all
other workers have started (by checking that their saved progress versions are
larger than the backup's start version), and then sets the allWorkerStarted()
key for the backup.
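A simplified model of that handshake (hypothetical names, single process instead of CLI plus workers): the CLI writes "backupStartedKey" and blocks on allWorkerStarted(); each worker records its progress once it is logging; the worker for Tag(-2,0) sets allWorkerStarted() once every worker's saved progress has passed the backup's start version.

    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    using Version = int64_t;

    struct BackupHandshakeModel {
        std::map<std::string, Version> db;          // toy key-value store
        std::vector<Version> workerProgress;        // saved progress per backup worker

        // CLI side: submit the backup by setting backupStartedKey.
        void submitBackup(Version startVersion, int numWorkers) {
            db["backupStartedKey"] = startVersion;
            workerProgress.assign(numWorkers, -1);  // no progress saved yet
        }

        // Each worker starts logging mutations and periodically saves its progress.
        void saveWorkerProgress(int worker, Version progress) {
            workerProgress[worker] = progress;
        }

        // Worker for Tag(-2,0): once every worker's saved progress is larger than
        // the backup's start version, set the allWorkerStarted() key.
        void maybeSetAllWorkerStarted() {
            Version start = db["backupStartedKey"];
            for (Version p : workerProgress)
                if (p <= start) return;             // some worker has not started yet
            db["allWorkerStarted"] = start;
        }

        // CLI side: the wait completes when allWorkerStarted() appears.
        bool cliWaitDone() const { return db.count("allWorkerStarted") > 0; }
    };

    int main() {
        BackupHandshakeModel m;
        m.submitBackup(/*startVersion=*/100, /*numWorkers=*/3);
        m.saveWorkerProgress(0, 120);
        m.saveWorkerProgress(1, 150);
        m.maybeSetAllWorkerStarted();               // worker 2 not started: no-op
        m.saveWorkerProgress(2, 130);
        m.maybeSetAllWorkerStarted();               // all started: key is set
        return m.cliWaitDone() ? 0 : 1;
    }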