Meng Xu
31a6ec34b7
Merge branch 'master' into mengxu/fast-restore-agent-PR
2020-02-18 16:17:59 -08:00
Meng Xu
a12a161fb3
Merge branch 'master' into mengxu/fast-restore-pipeline-PR
2020-02-18 14:49:52 -08:00
Meng Xu
c603b20e7e
FastRestore:Resolve review comments
2020-02-18 14:08:27 -08:00
A.J. Beamon
649fc6ba94
Merge pull request #2329 from davisp/trace-clock-source-network-option
Add network option for the trace clock source
2020-02-15 10:43:00 -08:00
Paul J. Davis
32e285a761
Add network option for the trace clock source
This option allows clients to select the clock source for trace events,
similar to the `--traceclock` command line parameter for `fdbserver`.
Using the `realtime` clock source makes loading event data into
OpenTracing systems like Jaeger more useful.
2020-02-15 11:30:43 -06:00
Evan Tschannen
663d176fdb
fix: coordinators auto could add 0.0.0.0:0 as a coordinator
2020-02-14 16:50:55 -08:00
Alex Miller
94e7f790d8
Merge pull request #2667 from atn34/atn34/remove-flatbuffers-knob
Remove USE_OBJECT_SERIALIZER knob
2020-02-14 15:44:38 -08:00
Evan Tschannen
96eec756b3
more simulation fixes
2020-02-12 15:12:43 -08:00
Xin Dong
1849939bc3
Added a delay to avoid getting stuck in a loop: the request is not versioned, so a storage server that is behind might not yet know it has been assigned a shard range that a proxy thinks it has.
2020-02-12 15:01:26 -08:00
Xin Dong
2e1d03cbe7
Addressed AJ's review comments
2020-02-12 14:57:40 -08:00
Xin Dong
03287a0214
Fix build error.
2020-02-12 14:57:40 -08:00
Xin Dong
57f0c11712
Address Evan's review comments
2020-02-12 14:57:40 -08:00
Xin Dong
d20ce99774
Resolved the review comment and renamed the functions
2020-02-12 14:57:40 -08:00
Xin Dong
d934aed1d7
When the user issues 'getStorageByteSample' on a large key range, which can be as large as the whole DB, a target key range may be moved or split by DD. The call to 'waitMetrics' on the corresponding storage server then returns a 'wrong_shard_server' error, and the whole 'waitStorageMetricsMultipleLocation' is retried on the large key range. This change modifies the behavior of 'waitStorageMetricsMultipleLocation' so that only the key range that caused the error is retried.
2020-02-12 14:57:40 -08:00
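The per-range retry idea in d934aed1d7 can be sketched as follows. This is a minimal, hypothetical Python model, not the FoundationDB client code: `locate`, `fetch_metrics`, and `WrongShardServer` are illustrative stand-ins for shard location lookup, `waitMetrics`, and the `wrong_shard_server` error. When one shard fails, only that sub-range is re-located and retried.

```python
from typing import Callable, List, Tuple

KeyRange = Tuple[bytes, bytes]  # half-open [begin, end)


class WrongShardServer(Exception):
    """Hypothetical stand-in for the wrong_shard_server error."""


def metrics_for_range(
    rng: KeyRange,
    locate: Callable[[KeyRange], List[KeyRange]],
    fetch_metrics: Callable[[KeyRange], int],
    depth: int = 0,
    max_depth: int = 5,
) -> int:
    """Sum a per-shard metric over rng, retrying only the shards that fail."""
    total = 0
    for shard in locate(rng):  # split rng along the current shard boundaries
        try:
            total += fetch_metrics(shard)
        except WrongShardServer:
            if depth >= max_depth:
                raise
            # The shard was moved or split: re-locate and retry just this
            # sub-range instead of restarting the whole (possibly huge) range.
            total += metrics_for_range(shard, locate, fetch_metrics, depth + 1, max_depth)
    return total
```

Recursing on the failed shard keeps already-collected results for the other shards, which is the point of the change when the requested range spans the whole database.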
Xin Dong
807204e676
Update fdbclient/MultiVersionTransaction.actor.cpp
Apply A.J.'s suggestion.
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-02-12 14:57:40 -08:00
Xin Dong
d5c3f821e2
Added missing pieces.
2020-02-12 14:57:40 -08:00
Xin Dong
70f89042fd
Remove comment that does not apply anymore
2020-02-12 14:57:40 -08:00
Xin Dong
0c16d43c2f
Added the necessary plumbing to expose the byte sample collected by storage servers to the fdb_c library
2020-02-12 14:57:40 -08:00
Andrew Noyes
1248d2b8b4
Remove USE_OBJECT_SERIALIZER knob
2020-02-12 10:41:52 -08:00
Evan Tschannen
38a5511b96
additional simulation fixes
2020-02-11 15:52:06 -08:00
Andrew Noyes
86089fdc1b
Merge branch 'release-6.2' into atn34/configure-locked
2020-02-11 13:51:41 -08:00
Evan Tschannen
fd5eb5946e
Merge pull request #2606 from ajbeamon/options-documentation-fix
Fix database default retry limit documentation.
2020-02-11 13:29:30 -08:00
Meng Xu
cda8fc189e
FastRestore:AtomicOp:Introduce weighted size for atomicOp
atomicOp has an amplified performance overhead on the cluster.
For example, an ADD operation can be small, but the SS has to load
the existing value to perform the operation, and that value can be large.
2020-02-11 12:48:05 -08:00
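A weighted size along the lines of cda8fc189e might look like the sketch below. It is an assumption-laden illustration: the commit does not state the weight or how FastRestore applies it, so `ATOMIC_OP_WEIGHT`, `Mutation`, and `batch_weight` are hypothetical names, and the multiplier value is arbitrary.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical knob: relative cost of an atomic op versus a plain set.
ATOMIC_OP_WEIGHT = 3.0


@dataclass
class Mutation:
    key: bytes
    value: bytes
    is_atomic: bool = False


def weighted_size(m: Mutation, atomic_weight: float = ATOMIC_OP_WEIGHT) -> float:
    raw = len(m.key) + len(m.value)
    # An atomic op makes the storage server read the existing value, so its
    # raw byte size understates its cost; scale it up by the weight.
    return raw * atomic_weight if m.is_atomic else float(raw)


def batch_weight(mutations: Iterable[Mutation]) -> float:
    """Weighted size used to decide when a batch is 'full'."""
    return sum(weighted_size(m) for m in mutations)
```

Using a weighted size rather than raw bytes makes a batch full of small atomic ops fill up sooner, which reflects the extra read work the SS does for each one.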
Andrew Noyes
7b5de42d43
Address review comments
2020-02-11 10:40:09 -08:00
mpilman
5a9d420cb7
Merge remote-tracking branch 'upstream/release-6.2' into release-merges/20200210
2020-02-10 10:02:05 -08:00
A.J. Beamon
ff44bd2b33
Merge pull request #2639 from atn34/atn34/include-port-in-address-default
Enable include_port_in_address by default for api version 700
2020-02-10 09:50:59 -08:00
Markus Pilman
e71fe44ee3
Merge branch 'master' into features/icc
2020-02-08 21:33:02 -08:00
Alex Miller
e390dbd36c
Add a non-FDBLibTLS verify peers framework to new TLS impl
2020-02-06 21:06:52 -08:00
Evan Tschannen
38d8d0d675
fixed simulation
2020-02-06 19:29:31 -08:00
Evan Tschannen
844c8511c4
Merge pull request #2588 from jzhou77/backup-worker
Integrate new backup worker with existing backup command
2020-02-05 14:14:43 -08:00
Jingyu Zhou
d5849af5c0
Address review comments
2020-02-05 10:33:51 -08:00
Andrew Noyes
90c1b2df88
Don't include header
2020-02-05 09:57:18 -08:00
Evan Tschannen
53d0867a17
limit the number of connections a process can attempt to establish in parallel
2020-02-04 18:15:10 -08:00
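The commit message for 53d0867a17 does not show the mechanism it uses. A common way to cap the number of connection attempts in flight is a counting semaphore; the Python sketch below illustrates that general technique with a hypothetical `connect` callback and `max_parallel` limit, and is not the Flow implementation.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

A = TypeVar("A")
R = TypeVar("R")


async def connect_all(
    addresses: Iterable[A],
    connect: Callable[[A], Awaitable[R]],
    max_parallel: int,
) -> List[R]:
    """Attempt every connection, but keep at most max_parallel in flight."""
    sem = asyncio.Semaphore(max_parallel)

    async def guarded(addr: A) -> R:
        async with sem:  # waits here once max_parallel attempts are running
            return await connect(addr)

    # gather preserves input order in its result list
    return list(await asyncio.gather(*(guarded(a) for a in addresses)))
```

Every address is still attempted; the semaphore only bounds concurrency, which matters when each attempt is expensive (for example, a TLS handshake).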
Meng Xu
08443ed18d
FastRestore:Remove debug trace for debugging connection errors
2020-02-04 17:06:02 -08:00
Evan Tschannen
c8c34333c1
increased connect parallelism
2020-02-04 14:59:20 -08:00
Evan Tschannen
84853dd1fd
switched SSL implementation to use boost ssl
2020-02-04 14:56:40 -08:00
Evan Tschannen
8449badb3e
Merge pull request #1868 from dongxinEric/fix/1827/error_instead_of_timeout
Send error back before put the GRV request with PRIORITY_BATCH into t…
2020-02-04 14:32:47 -08:00
mpilman
100402aadf
Don't call operator explicitly
2020-02-04 11:03:43 -08:00
mpilman
52ca752dd3
Merge remote-tracking branch 'origin/features/icc' into features/icc
2020-02-04 10:29:49 -08:00
mpilman
d09e07f1f5
Merge remote-tracking branch 'upstream/master' into features/icc
2020-02-04 10:26:18 -08:00
Jingyu Zhou
52c6737411
Rename backupLoggingEnabled as backupWorkerEnabled
This highlights the backup changes for 7.0. By default, the
backup_worker_enabled flag is set for version 7.0.
2020-02-04 10:09:16 -08:00
Jingyu Zhou
7c10683c77
Backup workers save logs into right containers
The mutation logs of backup workers are saved into the "mlogs" directory under the
container directory. The backup worker has been restructured to handle multiple
backups, each stored in a separate backup container.
In the backup worker, mutations pulled from TLogs are buffered in a message
queue. When writing out to different containers, each container's mutation
ranges are used to check whether a mutation should be written. When a new backup
is submitted by the client, "backupStartedKey" is updated. The worker monitors
this key, updates its internal map of backups, and then the next pull from the TLog
waits for the new backup to be ready. This ensures that when worker 0 marks the
backup as started, all workers are already logging mutations for the backup.
2020-02-03 20:27:14 -08:00
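The routing step in 7c10683c77 (buffer mutations, then write each one only to the containers whose ranges cover it) can be modeled as below. This is a simplified sketch under stated assumptions: `BackupContainer`, `flush`, and the `(version, key)` message shape are hypothetical, and real containers write files under "mlogs" rather than appending to a list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

KeyRange = Tuple[bytes, bytes]  # half-open [begin, end)
Message = Tuple[int, bytes]     # (version, mutated key)


@dataclass
class BackupContainer:
    """Hypothetical in-memory stand-in for one backup's container."""
    ranges: List[KeyRange]
    written: List[Message] = field(default_factory=list)

    def covers(self, key: bytes) -> bool:
        return any(begin <= key < end for begin, end in self.ranges)


def flush(queue: List[Message], backups: Dict[str, BackupContainer]) -> None:
    """Drain the buffered queue, writing each mutation to every backup
    whose key ranges contain the mutated key."""
    for version, key in queue:
        for container in backups.values():
            if container.covers(key):
                container.written.append((version, key))
    queue.clear()
```

Keying the map by backup UID mirrors the description of an internal map of backups that the worker updates when "backupStartedKey" changes.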
Jingyu Zhou
0db03f1d3c
Use backup_logging_enabled flag
The default is to enable new backup workers. Users can disable this flag to
turn off the backup worker feature.
2020-02-03 20:03:22 -08:00
Meng Xu
3b57bf1781
Merge branch 'master' into mengxu/fast-restore-agent-PR
2020-02-03 17:23:54 -08:00
Evan Tschannen
4524831456
Merge pull request #2518 from vishesh/task/failmon-remove-server
FailureMonitoring: Server processes no longer need to talk to ClusterController
2020-02-03 17:22:50 -08:00
Meng Xu
ca3b6135d0
FastRestore:Add debug to see why restore role is not connected
Reason: a restore worker is an fdbserver process that does not register with the CC.
The new failure monitor changes how connections work for clients and servers.
A client does not need to connect to the CC to get connected.
A server has to connect to the CC to get connected.
The restore worker is therefore a special role that behaves like a client but is a server.
2020-02-03 17:19:52 -08:00
Andrew Noyes
a043646d1d
Don't read in init database transaction
2020-02-03 15:32:01 -08:00
Andrew Noyes
61a727d701
Allow new databases to be configured as locked
2020-02-03 15:32:01 -08:00
Andrew Noyes
2ce887012c
Respect api version for include_port_in_address
2020-02-03 15:25:30 -08:00
Andrew Noyes
07a3051f0e
Enable include_port_in_address by default for api version 700
Resolves #2607
2020-02-03 15:10:00 -08:00
Meng Xu
9c2046b11b
FastRestore:Mimic fdbd to monitor coordinators
Monitor the coordinators before starting an fdb restore process.
2020-02-03 14:48:31 -08:00
Jingyu Zhou
297f22726c
Add backup_type database configuration option
Update simulation tests to randomly set backup types to be one of: old backup
(default), new backup (tagged), or both (default+tagged).
2020-01-31 19:29:09 -08:00
Jingyu Zhou
38aa1903fd
Add a DB configuration option for backup workers
Right now, the default is to keep the old backup behavior, i.e., do NOT use
backup workers. Specifically, if BackupType is not set (or is set to default),
the master will not recruit backup workers and will not add pseudo locality for
backup workers.
The StartFullBackupTaskFunc is updated to check whether backup workers are enabled.
Only when they are enabled does starting a backup wait for all backup workers
to be started.
2020-01-31 19:29:09 -08:00
Jingyu Zhou
f7956cfbfc
Clear backup UID from backupStartedKey when finish/abort backups
Clearing this key signals backup workers that backup is no longer needed. When
no backup is going on, the backup workers switch to the NOOP state.
2020-01-31 19:29:09 -08:00
Jingyu Zhou
19ef7f6bdb
Skip watch of backup task's started key if it's already set
The backup task may be restarted multiple times so the started key for the
backup task may already be set. In this case, the wait on watch should be
skipped.
2020-01-31 19:29:09 -08:00
Jingyu Zhou
f8342f0884
Add keepRunning for start backup transaction
TaskBucket::keepRunning() needs to be called in backup transactions to make sure
that the task has not been cancelled; if it has been, the task stops. Without
this call, a cancelled task can continue to run, causing multiple runs of the same task.
Another subtle issue is that the beginVersion is persisted in backupStartedKey.
So when reading it back from that key, we should set the task's beginVersion to
the value persisted earlier.
2020-01-31 19:29:09 -08:00
Jingyu Zhou
5a602f58e8
Start backup with a wait on all backup workers running
This wait makes sure that backup workers are already saving mutations so
that no mutations are missed. The idea is that the CLI sets a "backupStartedKey"
in the database and waits for the backup's allWorkerStarted() key to be set.
Backup workers monitor changes to the "backupStartedKey" and start logging
mutations. Additionally, the backup worker for Tag(-2,0) monitors whether all other
workers have started (by checking that their saved progress version is larger than
the backup's start version), and then sets the allWorkerStarted() key for the backup.
2020-01-31 19:29:09 -08:00
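The readiness check described in 5a602f58e8 reduces to a predicate over saved progress versions. The sketch below is a hypothetical model: progress is a plain dict from worker tag ID to saved version, and the `allWorkerStarted/<uid>` key layout is invented for illustration; the real key and storage format are not given in the commit.

```python
from typing import Dict, Iterable


def all_workers_started(progress: Dict[int, int], tags: Iterable[int], start_version: int) -> bool:
    """True once every expected worker's saved progress passes start_version."""
    # A worker with no saved progress yet (missing tag) has not started.
    return all(progress.get(tag, -1) > start_version for tag in tags)


def maybe_mark_started(
    db: Dict[str, bool],
    backup_uid: str,
    progress: Dict[int, int],
    tags: Iterable[int],
    start_version: int,
) -> bool:
    """Run by the coordinating worker: set the marker key once all workers are ready."""
    if all_workers_started(progress, tags, start_version):
        db[f"allWorkerStarted/{backup_uid}"] = True  # hypothetical key layout
        return True
    return False
```

Treating a missing tag as "not started" is the conservative choice: the CLI keeps waiting rather than risk a backup that misses mutations.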
Jingyu Zhou
e9c7ad82cc
Comment out pseudo tag pop trace event
2020-01-31 19:29:09 -08:00
Xin Dong
7016f7903b
Fixed another build error. Do not use timeReplyIgnoreError, since we do not want the logging inside that function, which makes it unnecessary. Use ready() instead, which effectively ignores the error.
2020-01-31 15:48:29 -08:00
Xin Dong
c1f992667b
Fix build failure
2020-01-31 14:27:47 -08:00
Xin Dong
8d28c2a7f0
Added two new counters for transaction throttled errors and removed the verbose trace event logging. Also changed a chain of 'if' statements into 'if-else' statements, since they are mutually exclusive.
2020-01-31 14:16:39 -08:00
Alex Miller
ee6490c9d1
Merge pull request #2314 from mengranwo/memory-engine
New Radix-Tree based Memory Storage Engine
2020-01-30 16:20:13 -08:00
Xin Dong
1b313a4f7e
Address review comments. Rebased with latest master
2020-01-30 14:13:56 -08:00
Xin Dong
e21426d12a
Send an error back to GRV requests with batch priority when the cluster is saturated, instead of blindly enqueuing the requests and letting the client time out.
2020-01-30 14:13:56 -08:00
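The fail-fast behavior in e21426d12a can be illustrated with a toy request queue. This is a hypothetical sketch, not proxy code: `GrvQueue`, `Priority`, the `batch_saturated` flag, and the `"batch_throttled"` reply string are all illustrative names.

```python
from collections import deque
from enum import Enum
from typing import Deque, Tuple


class Priority(Enum):
    BATCH = 0
    DEFAULT = 1


class GrvQueue:
    """Toy GRV request queue; names and reply strings are hypothetical."""

    def __init__(self) -> None:
        self.pending: Deque[Tuple[Priority, str]] = deque()

    def submit(self, request: str, priority: Priority, batch_saturated: bool) -> str:
        if priority is Priority.BATCH and batch_saturated:
            # Fail fast so the client can back off, rather than leaving the
            # request queued until the client times out.
            return "batch_throttled"
        self.pending.append((priority, request))
        return "queued"
```

Only batch-priority requests are rejected; default-priority requests are still queued even when batch work is saturated.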
A.J. Beamon
f46090d081
Fix typo
2020-01-29 14:25:24 -08:00
Xin Dong
68f6e7be97
Added some Javadoc and changed some text in fdboptions to further compare and clarify .asList() and .iterator()
2020-01-29 14:01:54 -08:00
A.J. Beamon
adc72dde43
Merge branch 'release-6.2' into merge-release-6.2-into-master
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2020-01-28 12:16:24 -08:00
Meng Xu
cab9d51e06
Merge branch 'master' into mengxu/fast-restore-pipeline-PR
2020-01-27 18:16:26 -08:00
Meng Xu
141609e80a
FastRestore:Improve code style and fix typos
2020-01-27 18:13:14 -08:00
A.J. Beamon
aa7acaf02c
Fix transaction retry limit description. Fix typo.
2020-01-27 10:33:31 -08:00
Meng Xu
76f30e71dc
FastRestore:Init VersionBatch explicitly
Built-in variables may not be zero-initialized by
the compiler-provided default constructor.
2020-01-26 13:15:45 -08:00
Evan Tschannen
1ed3ba7170
establishing 20 TLS connections in parallel is too expensive
2020-01-25 10:59:20 -08:00
Meng Xu
16f9ec45bd
Merge branch 'master' into mengxu/fast-restore-pipeline-PR
2020-01-23 20:15:21 -08:00
A.J. Beamon
b2c8a4a34c
Merge pull request #2519 from xumengpanda/mengxu/fast-restore-versionBatch-fixSize-PR
Performant restore [14/XX]: Ensure each version batch does not exceed a configured size
2020-01-23 16:49:01 -08:00
A.J. Beamon
8a065b9da4
Merge pull request #2557 from alexmiller-apple/reduce-versionstamp-conflictranges
Narrow the unreadable range of keys after a versionstamped key operation
2020-01-23 11:14:47 -08:00
Jingyu Zhou
1eaea91cb3
Address review comments
2020-01-22 19:42:13 -08:00
Jingyu Zhou
1311fec45a
Add an option to get minKnownCommittedVersion from Proxies
The backup worker needs to use this version for popping when running in a NOOP
mode. This option is added to GetReadVersionRequest and proxies will send back
minKnownCommittedVersion if the option is set.
Also add a couple of knobs for backup workers.
2020-01-22 19:42:13 -08:00
Jingyu Zhou
c08a192c75
Add a backup start key
If the backup key is not set, do not recruit backup workers for old epochs.
2020-01-22 19:42:13 -08:00
Jingyu Zhou
1123157ae0
Ignore mutations larger than the end version
2020-01-22 19:38:46 -08:00
Jingyu Zhou
d8c74e7e1a
Extend BackupContainer to support tagged log files
That is, the file name contains the log router tag ID as the last component,
e.g., "log,39638169,42718056,016f52a4d16ef36fd3335db9c68abfc1,1048576,1".
2020-01-22 19:38:46 -08:00
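A parser for the tagged file name in d8c74e7e1a might look like this. Only the last component (the log router tag ID) is stated in the commit; the meanings of the middle fields (begin version, end version, unique ID, block size) are an assumption made for illustration, and `parse_log_file_name` is a hypothetical helper, not the BackupContainer API.

```python
from typing import NamedTuple, Optional


class LogFileName(NamedTuple):
    begin_version: int
    end_version: int
    uid: str
    block_size: int
    tag_id: Optional[int]  # None for untagged (old-style) log files


def parse_log_file_name(name: str) -> LogFileName:
    """Parse 'log,<begin>,<end>,<uid>,<blockSize>[,<tagId>]' (assumed layout)."""
    parts = name.split(",")
    if parts[0] != "log" or len(parts) not in (5, 6):
        raise ValueError(f"not a log file name: {name!r}")
    tag = int(parts[5]) if len(parts) == 6 else None
    return LogFileName(int(parts[1]), int(parts[2]), parts[3], int(parts[4]), tag)
```

Making the tag optional lets the same parser accept both old backup log files and the new tagged ones.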
Jingyu Zhou
7f7ec99170
Serialize and deserialize new backup files
The BackupWorker writes files that can be read by FileConverter. Move
StringRefReader to the header file for reuse in FileConverter.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
f21d7ca44c
Add tag ID to backup log file names
2020-01-22 19:38:46 -08:00
Jingyu Zhou
485d3d0feb
Use Version instead of int64_t
2020-01-22 19:38:45 -08:00
Jingyu Zhou
dafcaee844
Fix compiler errors.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
c7f51782b8
Use override for virtual functions.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
19d6a889ff
Recruit backup workers for old epochs
If there are unfinished ranges in the old epochs, the new master will recruit
backup workers responsible for finishing these ranges. These workers remain in
the cluster until the next epoch, when they remove themselves.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
11964733b7
WIP: should be divided into smaller commits.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
41f0cf2bb5
Add decode function for backup progress
2020-01-22 19:38:45 -08:00
Jingyu Zhou
7da9f47f26
Enable pop from backup workers
This is still WIP, as some edge cases can trigger test failures, most likely because
backup workers do not pop mutations when an epoch ends.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
de8d953865
Add backup role, class, and worker skeleton
2020-01-22 19:35:30 -08:00
Evan Tschannen
73ad702d14
Clients which fetch status should not disconnect from the coordinators and cluster controller between each retrieval
2020-01-22 15:41:22 -08:00
Meng Xu
009fcdeb16
FastRestore:Sanity check each restore asset is processed exactly once
2020-01-21 17:17:45 -08:00
Alex Miller
914a693f3a
Copy the key size check over also.
It doesn't matter in this case, as getVersionstampKeyRange would have
already passed the check, but it should be done if anyone ever calls
this function in the future.
2020-01-21 15:35:27 -08:00
Alex Miller
18e6e36a8e
Fill in the versionstamp, but still leave the position.
Using range.begin() stripped off the last 4 bytes that specified where
the versionstamp is, so when the mutation got to the proxy, it wouldn't
get versionstamped correctly.
2020-01-21 15:24:57 -08:00
Meng Xu
4ac92d223b
Cleanup batch buffer for each restore request
2020-01-21 14:49:36 -08:00
Vishesh Yadav
daef5f011a
Merge remote-tracking branch 'apple/master' into task/failmon-remove-server
2020-01-21 13:20:15 -08:00
Meng Xu
d69bd2f661
FastRestore:Loader buffer data for multiple batches
2020-01-17 17:01:06 -08:00
Meng Xu
bfbf2164c4
FastRestore:Applier buffer data for multiple batches
2020-01-17 17:01:01 -08:00
Alex Miller
eb64eede8d
Make a smaller range inaccessible after writing a versionstamped key
A transaction's read version is the lower bound of what the transaction's
commit version could be. Thus, we can narrow the conflict range of a
versionstamped key, and thereby reduce the amount of the keyspace that is
rendered inaccessible, by filling in the read version on the
versionstamped key and using that as the lower bound of the conflict
range.
This still allows reads of versionstamped keys lower than
the read version of the current transaction.
2020-01-16 21:41:59 -08:00
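The narrowing in eb64eede8d can be sketched with byte strings. This is a simplified model under stated assumptions, not FoundationDB's getVersionstampKeyRange: it assumes the versionstamped-key parameter ends with a 4-byte little-endian offset of a 10-byte placeholder (8-byte big-endian commit version plus 2-byte batch order), and the helper name is hypothetical.

```python
import struct
from typing import Tuple

VERSIONSTAMP_LEN = 10  # 8-byte commit version + 2-byte batch order


def conflict_range_for_versionstamped_key(param: bytes, read_version: int) -> Tuple[bytes, bytes]:
    """Return a half-open [begin, end) conflict range for a versionstamped key."""
    offset = struct.unpack("<I", param[-4:])[0]  # position of the placeholder
    key = param[:-4]
    if offset + VERSIONSTAMP_LEN > len(key):
        raise ValueError("versionstamp offset out of range")
    prefix, suffix = key[:offset], key[offset + VERSIONSTAMP_LEN:]
    # The commit version can be no lower than the read version, so the
    # smallest possible key uses the read version with batch order 0.
    begin = prefix + struct.pack(">Q", read_version) + b"\x00\x00" + suffix
    # The largest possible key uses all 0xFF bytes in the placeholder;
    # appending 0x00 makes the half-open range include that key itself.
    end = prefix + b"\xff" * VERSIONSTAMP_LEN + suffix + b"\x00"
    return begin, end
```

A versionstamped key produced by an earlier commit (a version below the current read version) sorts before `begin`, so it stays readable.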
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00