chaoguang
6f90228a0b
change to krmSetRangeCoalescing
2020-03-12 11:31:36 -07:00
Meng Xu
1759d5c8c4
Apply clang-format
2020-03-12 10:18:53 -07:00
chaoguang
4e8cb0cb96
add krmSetRangeCoalescing for RYWTr
2020-03-12 09:53:00 -07:00
chaoguang
c2f0c41c52
use krmSetRange
2020-03-11 23:12:38 -07:00
chaoguang
0094293d50
add const vars
2020-03-11 23:11:49 -07:00
chaoguang
6ae60870fc
use krmSetRange
2020-03-11 13:20:40 -07:00
chaoguang
bdabb8638e
Change prefix
2020-03-11 12:40:40 -07:00
chaoguang
d1c56d3b57
add constant KeyRefs in SystemData
2020-03-11 12:25:50 -07:00
Meng Xu
bd345f85db
ConsistencyCheck:Fix failure due to address inconsistency between process and worker
...
With TLS, a worker (or process) can have a TLS address and non-TLS address.
When a process is created in simulation, the primary address is TLS by default.
The non-TLS one is the TLS address port plus one.
In a connection between two workers, if their primary addresses do not enable
or disable TLS together, one worker will swap its primary address and secondary address
so that the TLS config of the two endpoints can match.
The swap can mean the primary address is no longer the TLS address assigned
when the process was created. The swap also happens only on the worker, not on
the process struct in simulation.
As a result, worker->address can differ from process->address.
In the checkForExtraDataStores actor, we use worker->address to check whether a
process is killable and process->address to kill the process. The inconsistency
can cause simulation to kill a protected process that is not killable, leading
to simulation failure.
2020-03-10 21:07:16 -07:00
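A minimal Python sketch of the address mismatch described in the commit above. The Address and Endpoint types, swap_to_match, and same_process are hypothetical names for illustration, not FoundationDB code. Comparing full address sets is only one possible way to avoid the mismatch; the log does not show the actual fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    ip: str
    port: int
    tls: bool


@dataclass
class Endpoint:
    primary: Address
    secondary: Optional[Address] = None

    def swap_to_match(self, peer_tls: bool) -> None:
        """Swap primary and secondary if that makes the TLS setting match the peer."""
        if (
            self.primary.tls != peer_tls
            and self.secondary is not None
            and self.secondary.tls == peer_tls
        ):
            self.primary, self.secondary = self.secondary, self.primary

    def addresses(self) -> set:
        return {a for a in (self.primary, self.secondary) if a is not None}


def same_process(worker: Endpoint, process: Endpoint) -> bool:
    # Hypothetical check: compare the full address sets, not just the primaries.
    return bool(worker.addresses() & process.addresses())


# The non-TLS address is the TLS port plus one, as stated in the commit message.
tls_addr = Address("10.0.0.1", 4500, True)
plain_addr = Address("10.0.0.1", 4501, False)

process = Endpoint(tls_addr, plain_addr)
worker = Endpoint(tls_addr, plain_addr)

# The worker talks to a non-TLS peer and swaps; the process struct does not.
worker.swap_to_match(peer_tls=False)

primaries_differ = worker.primary != process.primary
still_same = same_process(worker, process)
```

After the swap the two primaries disagree, which is the worker->address != process->address condition, while the set-based comparison still identifies them as the same process.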
chaoguang
698198a09e
Merge remote-tracking branch 'upstream/master' into report-conflicting-key
2020-03-09 10:50:33 -07:00
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
15f1a75d4f
updated documentation for 6.2.18
2020-03-06 11:16:10 -08:00
Evan Tschannen
dbfc0cbcc0
Merge pull request #2781 from alexmiller-apple/certificate-refresh
...
Refresh certificates used for handshaking when they change on disk
2020-03-06 11:12:04 -08:00
A.J. Beamon
fd8d569b91
Fix a few typos.
2020-03-05 14:42:07 -08:00
A.J. Beamon
6479034645
Add more metrics to the TransactionMetrics event
2020-03-05 14:00:44 -08:00
Alex Miller
595dd77ed1
Merge remote-tracking branch 'upstream/release-6.2' into certificate-refresh
2020-03-04 20:25:42 -08:00
Alex Miller
9b5ef3416e
Refactor TLSParams into TLSConfig + LoadedTLSConfig
...
The idea is that we keep around a TLSConfig that holds the configuration
the user has provided, and then when we want to initialize an SSL
context, we ask the TLSConfig to load all certificates and return us a
LoadedTLSConfig that is a concrete set of certificate bytes in memory.
initTLS now just takes the in-memory bytes and applies them to the ssl
context.
This is a large refactor to lead up to certificate refreshing, where we
will periodically check for changes to the certificates, and then
re-load them and apply them to a new SSL context.
2020-03-04 20:14:47 -08:00
Xin Dong
39610d15f8
Revert this change since it somehow introduced a random crash detected on circus
2020-03-04 16:14:38 -08:00
Evan Tschannen
c73cae0feb
Merge pull request #2760 from ajbeamon/client-version-fixes
...
Improvements to client version reporting
2020-03-04 15:52:49 -08:00
A.J. Beamon
d80cef8308
Merge pull request #2775 from etschannen/release-6.2
...
fix: blobstore needs to handshake tls connections
2020-03-04 15:09:43 -08:00
chaoguang
7a76e9556d
Merge remote-tracking branch 'upstream/master' into report-conflicting-key
2020-03-04 11:24:39 -08:00
Meng Xu
1ef4cb432b
Merge branch 'master' into mengxu/fast-restore-robust-and-visibility-PR-v2
2020-03-01 20:08:07 -08:00
Meng Xu
ad9b3fb4a8
DD:Add trace for detailed relocate shard info
2020-02-29 13:45:10 -08:00
Evan Tschannen
b0062f58d3
fix: blobstore needs to handshake tls connections
2020-02-28 15:44:22 -08:00
Evan Tschannen
c11c24b79d
removed the fdbrpc version of platform.h
2020-02-28 14:56:10 -08:00
Evan Tschannen
6054c05963
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbserver/fdbserver.actor.cpp
# versions.target
2020-02-28 12:11:05 -08:00
A.J. Beamon
d1e1fea42d
Our binaries that act like clients (fdbcli, backup and DR binaries) were reporting an unknown client version. Clients did not react if the list of supported versions changed.
2020-02-28 09:35:21 -08:00
Xin Dong
13e72f7b3b
Merge pull request #2605 from dongxinEric/fix/1977/report-inability-to-flush-trace-log
...
Report inability to flush trace logs.
2020-02-27 12:36:55 -08:00
Evan Tschannen
c3299b8ebe
if tls cannot be initialized, throw an error from createDatabase
2020-02-26 18:53:06 -08:00
Evan Tschannen
d1598e7c99
set_verify_peers throws an error instead of returning a value
2020-02-26 16:06:16 -08:00
Evan Tschannen
2586bade68
re-added support for configuration TLS options with environment variables
2020-02-26 15:33:48 -08:00
Meng Xu
ca726fc68e
FastRestore:Introduce OOM protection
...
An actor is schedulable to run if the current worker has enough resources, i.e.,
the worker's memory usage is below the threshold;
Exception: If the actor is working on the current version batch, we have to schedule
the actor to run to avoid dead-lock.
Future: When we release the actors that are blocked by memory usage, we should release them
in increasing order of their version batch.
2020-02-26 14:09:18 -08:00
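A minimal Python sketch of the scheduling rule described in the commit above. The function names and parameters are hypothetical, and the real implementation lives in C++ actors. The sketch only restates the rule: run when memory is below the threshold, always run actors of the current version batch, and release blocked actors in increasing version-batch order.

```python
def is_schedulable(memory_used: int, memory_threshold: int,
                   actor_batch: int, current_batch: int) -> bool:
    """Decide whether an actor may run under the OOM protection rule."""
    # Exception: actors working on the current version batch must always run,
    # otherwise the batch can never finish and the restore deadlocks.
    if actor_batch == current_batch:
        return True
    # Otherwise the actor runs only while memory usage is below the threshold.
    return memory_used < memory_threshold


def release_order(blocked_batches: list) -> list:
    """Order in which blocked actors should be released (future work in the commit)."""
    return sorted(blocked_batches)
```

For example, with a threshold of 80 and usage at 90, an actor on batch 4 stays blocked while an actor on the current batch 3 still runs.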
Evan Tschannen
924d335aa7
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# flow/Knobs.cpp
# flow/Knobs.h
2020-02-25 18:25:19 -08:00
Evan Tschannen
d3bca19960
backup should also submit on the first proxy for similar reasons to DR
2020-02-25 15:57:32 -08:00
Evan Tschannen
a486ec2de0
pipelined fdbdr status
2020-02-25 15:48:00 -08:00
Xin Dong
090c89e90a
Addressed review comments. Fix the bug where issues on a worker may be wrongly cleared by a subsequent GetDBinfo request.
2020-02-25 15:39:38 -08:00
Xin Dong
1c346fcfb0
Added the new issues into Status Schema. Remove the issue reporting in lastError since:
...
- If the issue string contains the error number, status schema needs to be super verbose to include all possible issue strings
- If the issue string does not contain the error number, the generic issue string can be pretty useless.
Thus now specific issues are being reported before calling lastError
2020-02-25 15:38:14 -08:00
A.J. Beamon
71782ff803
Update fdbclient/MasterProxyInterface.h
2020-02-25 15:30:19 -08:00
Evan Tschannen
daee15cbb5
fix: starting a DR should do the commit on the first proxy to ensure all mutations from previous backups have been flushed
2020-02-25 12:35:24 -08:00
Evan Tschannen
13a523a355
fix: commit on first proxy did not always commit to the first proxy
2020-02-25 12:34:31 -08:00
Alvin Moore
0f64505d0b
Merge branch 'release-6.2' of github.com:apple/foundationdb
...
Needed to pull in changes to build docker
2020-02-23 23:27:53 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
A.J. Beamon
4c696d5bf2
Merge branch 'release-6.2' into dd-better-rebalance-logging
...
# Conflicts:
# fdbserver/DataDistributionQueue.actor.cpp
2020-02-21 17:41:00 -08:00
A.J. Beamon
6810a03283
Add more logging to valley filler and mountain chopper
2020-02-21 10:55:14 -08:00
Evan Tschannen
f04e311a1e
Merge commit 'b46d6e25e24993ab5a5f04091fd3235050b7cd09' into feature-boost-ssl
...
# Conflicts:
# fdbserver/SimulatedCluster.actor.cpp
# flow/Net2.actor.cpp
2020-02-20 17:36:38 -08:00
Evan Tschannen
efbc8141a0
fix: messed up define
2020-02-20 17:29:06 -08:00
Evan Tschannen
3bef06dd47
TLS_DISABLED also implies we do not have openssl
2020-02-20 17:20:48 -08:00
A.J. Beamon
e1fb568fd1
Merge branch 'release-6.2' into dd-use-available-space
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
2020-02-20 16:12:42 -08:00
A.J. Beamon
4f1301b2dd
Merge pull request #2583 from etschannen/feature-keep-status-connected
...
Clients should not disconnect from the CC after fetching status
2020-02-20 13:12:30 -08:00
Evan Tschannen
24c6f7616f
removed unused code
2020-02-20 11:57:54 -08:00
A.J. Beamon
4c9c736253
Data distribution uses available space instead of free space when evaluating whether processes are low on space and penalizing them.
2020-02-20 11:21:03 -08:00
Evan Tschannen
08c318d28a
re-added the connect lock in the fdbcli so that the timeout is not spent before a connection has been initiated (because of the handshake lock)
2020-02-20 10:43:34 -08:00
Evan Tschannen
fd8a58b035
re-added support for the TLS_DISABLED flag
2020-02-19 18:37:47 -08:00
Evan Tschannen
761da5a059
code cleanup
2020-02-19 17:59:45 -08:00
Evan Tschannen
e06c3e2eb7
fix: checkForExcludedServer needs to check both the tls and non-tls address
2020-02-19 15:10:54 -08:00
Alex Miller
88d36af9c7
Fix --tls_password and add better error logging
...
This refactors all tls settings into a TLSParams object so that we can
set the password before loading any certificates.
It turns out that the FDBLibTLS code did really nice things with error
logging, but I just didn't understand openssl enough before to realize
what pieces I should be copying.
2020-02-19 00:57:05 -08:00
Meng Xu
94d799552e
FastRestore:Apply clang-format against master
2020-02-18 16:41:59 -08:00
Meng Xu
132f5aa9ba
FastRestore:Improve trace name and cosmetic change
2020-02-18 16:41:19 -08:00
Steve Atherton
3d72c2a661
BackupContainerFilesystem no longer unnecessarily depends on abspath() to find the last part of a path string, since it shouldn't touch the local filesystem in the remote case.
2020-02-18 16:35:00 -08:00
Meng Xu
31a6ec34b7
Merge branch 'master' into mengxu/fast-restore-agent-PR
2020-02-18 16:17:59 -08:00
Meng Xu
a12a161fb3
Merge branch 'master' into mengxu/fast-restore-pipeline-PR
2020-02-18 14:49:52 -08:00
Meng Xu
c603b20e7e
FastRestore:Resolve review comments
2020-02-18 14:08:27 -08:00
A.J. Beamon
649fc6ba94
Merge pull request #2329 from davisp/trace-clock-source-network-option
...
Add network option for the trace clock source
2020-02-15 10:43:00 -08:00
Paul J. Davis
32e285a761
Add network option for the trace clock source
...
This option allows clients to select the clock source for trace events
similar to the `--traceclock` command line parameter for `fdbserver`.
Using the `realtime` clock source makes loading event data into
OpenTracing systems like Jaeger more useful.
2020-02-15 11:30:43 -06:00
Markus Pilman
ccf590e193
Merge branch 'master' of github.com:apple/foundationdb into features/boost70
2020-02-14 22:05:51 -08:00
mpilman
3a1e878a9b
Upgrade to boost 1.72
2020-02-14 18:10:13 -08:00
Evan Tschannen
663d176fdb
fix: coordinators auto could add 0.0.0.0:0 as a coordinator
2020-02-14 16:50:55 -08:00
Alex Miller
94e7f790d8
Merge pull request #2667 from atn34/atn34/remove-flatbuffers-knob
...
Remove USE_OBJECT_SERIALIZER knob
2020-02-14 15:44:38 -08:00
Evan Tschannen
96eec756b3
more simulation fixes
2020-02-12 15:12:43 -08:00
Xin Dong
1849939bc3
Added a delay to avoid getting stuck in a loop: the request is not versioned, so a storage server that is behind might not know it has been assigned a shard range that a proxy thinks it has.
2020-02-12 15:01:26 -08:00
Xin Dong
2e1d03cbe7
Addressed AJ's review comments
2020-02-12 14:57:40 -08:00
Xin Dong
03287a0214
Fix build error.
2020-02-12 14:57:40 -08:00
Xin Dong
57f0c11712
Address Evan's review comments
2020-02-12 14:57:40 -08:00
Xin Dong
d20ce99774
Resolved the review comment and renamed the functions
2020-02-12 14:57:40 -08:00
Xin Dong
d934aed1d7
When the user issues 'getStorageByteSample' on a large key range, which can be as large as the whole DB, a target key range may be moved or split by DD. The call to 'waitMetrics' on the corresponding storage server then returns a 'wrong_shard_server' error, and the whole 'waitStorageMetricsMultipleLocation' is retried on the large key range. Change 'waitStorageMetricsMultipleLocation' to retry only the key range that caused the error.
2020-02-12 14:57:40 -08:00
Xin Dong
807204e676
Update fdbclient/MultiVersionTransaction.actor.cpp
...
Apply A.J's suggestion.
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-02-12 14:57:40 -08:00
Xin Dong
d5c3f821e2
Added missing pieces.
2020-02-12 14:57:40 -08:00
Xin Dong
70f89042fd
Remove comment that does not apply anymore
2020-02-12 14:57:40 -08:00
Xin Dong
0c16d43c2f
Added necessary plumbing to expose byte sample collected by storage servers to fdb_c library
2020-02-12 14:57:40 -08:00
Andrew Noyes
1248d2b8b4
Remove USE_OBJECT_SERIALIZER knob
2020-02-12 10:41:52 -08:00
Evan Tschannen
38a5511b96
additional simulation fixes
2020-02-11 15:52:06 -08:00
Andrew Noyes
86089fdc1b
Merge branch 'release-6.2' into atn34/configure-locked
2020-02-11 13:51:41 -08:00
Evan Tschannen
fd5eb5946e
Merge pull request #2606 from ajbeamon/options-documentation-fix
...
Fix database default retry limit documentation.
2020-02-11 13:29:30 -08:00
Meng Xu
cda8fc189e
FastRestore:AtomicOp:Intro weighted size for atomicOp
...
atomicOp has an amplified performance overhead on the cluster.
For example, an ADD operation can be small, but the SS has to load
the value to do the operation, and the value can be large.
2020-02-11 12:48:05 -08:00
Andrew Noyes
7b5de42d43
Address review comments
2020-02-11 10:40:09 -08:00
mpilman
5a9d420cb7
Merge remote-tracking branch 'upstream/release-6.2' into release-merges/20200210
2020-02-10 10:02:05 -08:00
A.J. Beamon
ff44bd2b33
Merge pull request #2639 from atn34/atn34/include-port-in-address-default
...
Enable include_port_in_address by default for api version 700
2020-02-10 09:50:59 -08:00
Markus Pilman
e71fe44ee3
Merge branch 'master' into features/icc
2020-02-08 21:33:02 -08:00
Alex Miller
e390dbd36c
Add a non-FDBLibTLS verify peers framework to new TLS impl
2020-02-06 21:06:52 -08:00
Evan Tschannen
38d8d0d675
fixed simulation
2020-02-06 19:29:31 -08:00
Evan Tschannen
844c8511c4
Merge pull request #2588 from jzhou77/backup-worker
...
Integrate new backup worker with existing backup command
2020-02-05 14:14:43 -08:00
Jingyu Zhou
d5849af5c0
Address review comments
2020-02-05 10:33:51 -08:00
Andrew Noyes
90c1b2df88
Don't include header
2020-02-05 09:57:18 -08:00
Evan Tschannen
53d0867a17
limit the number of connections a process can attempt to establish in parallel
2020-02-04 18:15:10 -08:00
Meng Xu
08443ed18d
FastRestore:Remove debug trace for debugging connection errors
2020-02-04 17:06:02 -08:00
Evan Tschannen
c8c34333c1
increased connect parallelism
2020-02-04 14:59:20 -08:00
Evan Tschannen
84853dd1fd
switched SSL implementation to use boost ssl
2020-02-04 14:56:40 -08:00
Evan Tschannen
8449badb3e
Merge pull request #1868 from dongxinEric/fix/1827/error_instead_of_timeout
...
Send error back before put the GRV request with PRIORITY_BATCH into t…
2020-02-04 14:32:47 -08:00
mpilman
100402aadf
Don't call operator explicitly
2020-02-04 11:03:43 -08:00
mpilman
52ca752dd3
Merge remote-tracking branch 'origin/features/icc' into features/icc
2020-02-04 10:29:49 -08:00
mpilman
d09e07f1f5
Merge remote-tracking branch 'upstream/master' into features/icc
2020-02-04 10:26:18 -08:00
Jingyu Zhou
52c6737411
Rename backupLoggingEnabled as backupWorkerEnabled
...
To highlight the backup changes for 7.0. By default, the
backup_worker_enabled flag is set for version 7.0.
2020-02-04 10:09:16 -08:00
Jingyu Zhou
7c10683c77
Backup workers save logs into right containers
...
The mutation logs of backup workers are saved into "mlogs" directory under the
container directory. The backup worker has been restructured to handle multiple
backups, where each one is stored in a separate backup container.
In the backup worker, mutations pulled from TLogs are buffered in a message
queue. When writing out to different containers, their corresponding mutation
ranges are used to check if a mutation should be written. When a new backup
is submitted by the client, "backupStartedKey" is updated. The worker monitors
this key and updates its internal map of backups; the next pull from the TLogs
then waits until the new backup is ready. This ensures that when worker 0 marks
the backup as started, all workers are already logging mutations for the
backup.
2020-02-03 20:27:14 -08:00
Jingyu Zhou
0db03f1d3c
Use backup_logging_enabled flag
...
The default is to enable new backup workers. Users can disable this flag to
turn off the backup worker feature.
2020-02-03 20:03:22 -08:00
Meng Xu
3b57bf1781
Merge branch 'master' into mengxu/fast-restore-agent-PR
2020-02-03 17:23:54 -08:00
Evan Tschannen
4524831456
Merge pull request #2518 from vishesh/task/failmon-remove-server
...
FailureMonitoring: Server processes no longer need to talk to ClusterController
2020-02-03 17:22:50 -08:00
Meng Xu
ca3b6135d0
FastRestore:Add debug to see why restore role is not connected
...
Reason: restore is an fdbserver process that does not register with the CC.
The new failure monitor changes how connection works for client and server.
For client, it does not connect to CC to get connected.
For server, it has to connect to CC to get connected.
Restore worker becomes the special role that behaves like a client but is a server.
2020-02-03 17:19:52 -08:00
Andrew Noyes
a043646d1d
Don't read in init database transaction
2020-02-03 15:32:01 -08:00
Andrew Noyes
61a727d701
Allow new databases to be configured as locked
2020-02-03 15:32:01 -08:00
Andrew Noyes
2ce887012c
Respect api version for include_port_in_address
2020-02-03 15:25:30 -08:00
Andrew Noyes
07a3051f0e
Enable include_port_in_address by default for api version 700
...
Resolves #2607
2020-02-03 15:10:00 -08:00
Meng Xu
9c2046b11b
FastRestore:Mimic fdbd to monitor coordinators
...
Before we start a fdb restore process.
2020-02-03 14:48:31 -08:00
Jingyu Zhou
297f22726c
Add backup_type database configuration option
...
Update simulation tests to randomly set backup types to be one of: old backup
(default), new backup (tagged), or both (default+tagged).
2020-01-31 19:29:09 -08:00
Jingyu Zhou
38aa1903fd
Add a DB configuration option for backup workers
...
Right now, the default is to keep the old backup behavior, i.e., do NOT use
backup workers. Specifically, if BackupType is not set (or is set to default),
the master will not recruit backup workers and will not add pseudo locality for
backup workers.
The StartFullBackupTaskFunc is updated to check if backup worker is enabled.
Only when it is enabled will starting a backup wait on all backup workers
to be started.
2020-01-31 19:29:09 -08:00
Jingyu Zhou
f7956cfbfc
Clear backup UID from backupStartedKey when finish/abort backups
...
Clearing this key signals backup workers that backup is no longer needed. When
no backup is going on, the backup workers switch to the NOOP state.
2020-01-31 19:29:09 -08:00
Jingyu Zhou
19ef7f6bdb
Skip watch of backup task's started key if it's already set
...
The backup task may be restarted multiple times so the started key for the
backup task may already be set. In this case, the wait on watch should be
skipped.
2020-01-31 19:29:09 -08:00
Jingyu Zhou
f8342f0884
Add keepRunning for start backup transaction
...
TaskBucket::keepRunning() needs to be called in backup transactions to check
that the task has not been cancelled. If it has been, the task stops. Without
this check, a cancelled task can continue to run, causing multiple runs of the
same task.
Another subtle issue is that the beginVersion is persisted on backupStartedKey.
So while reading it back from that key, we should set task's beginVersion with
the value persisted earlier.
2020-01-31 19:29:09 -08:00
Jingyu Zhou
5a602f58e8
Start backup with a wait on all backup workers running
...
This wait is to make sure that backup workers are already saving mutations so
that no mutations are missed. The idea is that the CLI sets a "backupStartedKey"
in the database and waits for allWorkerStarted() key of the backup to be set.
Backup workers monitor the changes to the "backupStartedKey" and start logging
mutations. Additionally, the backup worker for Tag(-2,0) monitors whether all
other workers have started (by checking that their saved progress version is
larger than the backup's start version), and then sets the allWorkerStarted()
key for the backup.
2020-01-31 19:29:09 -08:00
Jingyu Zhou
e9c7ad82cc
Comment out pseudo tag pop trace event
2020-01-31 19:29:09 -08:00
Xin Dong
7016f7903b
Fixed another build error. Do not use timeReplyIgnoreError, since we do not want the logging inside that function and it is therefore no longer needed. Use ready() instead, which basically ignores the error.
2020-01-31 15:48:29 -08:00
Xin Dong
c1f992667b
Fix build failure
2020-01-31 14:27:47 -08:00
Xin Dong
8d28c2a7f0
Added two new counters for transaction throttled error and removed the verbose trace event logging. Also changed a chain of 'if' statements into 'if-else' statements since they are mutually exclusive
2020-01-31 14:16:39 -08:00
Alex Miller
ee6490c9d1
Merge pull request #2314 from mengranwo/memory-engine
...
New Radix-Tree based Memory Storage Engine
2020-01-30 16:20:13 -08:00
Xin Dong
1b313a4f7e
Address review comments. Rebased with latest master
2020-01-30 14:13:56 -08:00
Xin Dong
e21426d12a
Send an error back to GRV requests with batch priority when the cluster is saturated, instead of blindly enqueuing the requests and letting the client time out.
2020-01-30 14:13:56 -08:00
A.J. Beamon
f46090d081
Fix typo
2020-01-29 14:25:24 -08:00
Xin Dong
68f6e7be97
Added some java docs and changed some texts in fdboptions for further comparison and clarification around .asList() and .iterator()
2020-01-29 14:01:54 -08:00
A.J. Beamon
adc72dde43
Merge branch 'release-6.2' into merge-release-6.2-into-master
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2020-01-28 12:16:24 -08:00
Meng Xu
cab9d51e06
Merge branch 'master' into mengxu/fast-restore-pipeline-PR
2020-01-27 18:16:26 -08:00
Meng Xu
141609e80a
FastRestore:Improve code style and fix typos
2020-01-27 18:13:14 -08:00
A.J. Beamon
aa7acaf02c
Fix transaction retry limit description. Fix typo.
2020-01-27 10:33:31 -08:00
Meng Xu
76f30e71dc
FastRestore:Init VersionBatch explicitly
...
Built-in variables may not be zero-initialized by the
compiler-provided default constructor.
2020-01-26 13:15:45 -08:00
Evan Tschannen
1ed3ba7170
establishing 20 TLS connections in parallel is too expensive
2020-01-25 10:59:20 -08:00
Meng Xu
16f9ec45bd
Merge branch 'master' into mengxu/fast-restore-pipeline-PR
2020-01-23 20:15:21 -08:00
A.J. Beamon
b2c8a4a34c
Merge pull request #2519 from xumengpanda/mengxu/fast-restore-versionBatch-fixSize-PR
...
Performant restore [14/XX]: Ensure each version-batch does not exceed a configured size
2020-01-23 16:49:01 -08:00
A.J. Beamon
8a065b9da4
Merge pull request #2557 from alexmiller-apple/reduce-versionstamp-conflictranges
...
Narrow the unreadable range of keys after a versionstamped key operation
2020-01-23 11:14:47 -08:00
Jingyu Zhou
1eaea91cb3
Address review comments
2020-01-22 19:42:13 -08:00
Jingyu Zhou
1311fec45a
Add an option to get minKnownCommittedVersion from Proxies
...
The backup worker needs to use this version for popping when running in a NOOP
mode. This option is added to GetReadVersionRequest and proxies will send back
minKnownCommittedVersion if the option is set.
Also add a couple of knobs for backup workers.
2020-01-22 19:42:13 -08:00
Jingyu Zhou
c08a192c75
Add a backup start key
...
If the backup key is not set, do not recruit backup workers for old epochs.
2020-01-22 19:42:13 -08:00
Jingyu Zhou
1123157ae0
Ignore mutations larger than the end version
2020-01-22 19:38:46 -08:00
Jingyu Zhou
d8c74e7e1a
Extend BackupContainer to support tagged log files
...
That is, the file name contains the log router tag ID as the last component,
e.g., "log,39638169,42718056,016f52a4d16ef36fd3335db9c68abfc1,1048576,1".
2020-01-22 19:38:46 -08:00
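A small Python sketch of splitting such a tagged log file name into fields. Only the position of the tag ID (last component) comes from the commit message above. The other field names (begin_version, end_version, uid, block_size) are assumptions made for illustration, and parse_tagged_log_name is a hypothetical helper, not part of BackupContainer.

```python
from typing import NamedTuple


class TaggedLogFile(NamedTuple):
    begin_version: int  # assumed meaning of the 2nd component
    end_version: int    # assumed meaning of the 3rd component
    uid: str            # assumed meaning of the 4th component
    block_size: int     # assumed meaning of the 5th component
    tag_id: int         # last component: the log router tag ID


def parse_tagged_log_name(name: str) -> TaggedLogFile:
    """Split a comma-separated tagged log file name into typed fields."""
    parts = name.split(",")
    if len(parts) != 6 or parts[0] != "log":
        raise ValueError(f"not a tagged log file name: {name!r}")
    return TaggedLogFile(
        begin_version=int(parts[1]),
        end_version=int(parts[2]),
        uid=parts[3],
        block_size=int(parts[4]),
        tag_id=int(parts[5]),
    )


parsed = parse_tagged_log_name(
    "log,39638169,42718056,016f52a4d16ef36fd3335db9c68abfc1,1048576,1"
)
```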
Jingyu Zhou
7f7ec99170
Serialize and deserialize new backup files
...
The BackupWorker writes files that can be read by FileConverter. Move
StringRefReader to the header file for reuse in FileConverter.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
f21d7ca44c
Add tag ID to backup log file names
2020-01-22 19:38:46 -08:00
Jingyu Zhou
485d3d0feb
Use Version instead of int64_t
2020-01-22 19:38:45 -08:00
Jingyu Zhou
dafcaee844
Fix compiler errors.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
c7f51782b8
Use override for virtual functions.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
19d6a889ff
Recruit backup workers for old epochs
...
If there are unfinished ranges in the old epochs, the new master will recruit
backup workers responsible for finishing these ranges. These workers remain in
the cluster until the next epoch, when they remove themselves.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
11964733b7
WIP: should be divided into smaller commits.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
41f0cf2bb5
Add decode function for backup progress
2020-01-22 19:38:45 -08:00
Jingyu Zhou
7da9f47f26
Enable pop from backup workers
...
This is still WIP as some edge cases can trigger test failure, most likely due
to not popping mutations by backup workers when epoch ends.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
de8d953865
Add backup role, class, and worker skeleton
2020-01-22 19:35:30 -08:00
Evan Tschannen
73ad702d14
Clients which fetch status should not disconnect from the coordinators and cluster controller between each retrieval
2020-01-22 15:41:22 -08:00
Meng Xu
009fcdeb16
FastRestore:Sanity check each restore asset is processed exactly once
2020-01-21 17:17:45 -08:00
chaoguang
7224807d1d
fix typo in var name
2020-01-21 15:38:48 -08:00
Alex Miller
914a693f3a
Copy the key size check over also.
...
It doesn't matter in this case, as getVersionstampKeyRange would have
already passed the check, but it should be done if anyone ever calls
this function in the future.
2020-01-21 15:35:27 -08:00
Alex Miller
18e6e36a8e
Fill in the versionstamp, but still leave the position.
...
Using range.begin() stripped off the last 4 bytes that specified where
the versionstamp is, so when the mutation got to the proxy, it wouldn't
get versionstamped correctly.
2020-01-21 15:24:57 -08:00
chaoguang
5386b4ecdf
Adding prefix to returned keys, keeping consistency with query range
2020-01-21 15:18:16 -08:00
Meng Xu
4ac92d223b
Cleanup batch buffer for each restore request
2020-01-21 14:49:36 -08:00
Vishesh Yadav
daef5f011a
Merge remote-tracking branch 'apple/master' into task/failmon-remove-server
2020-01-21 13:20:15 -08:00
Meng Xu
d69bd2f661
FastRestore:Loader buffer data for multiple batches
2020-01-17 17:01:06 -08:00
Meng Xu
bfbf2164c4
FastRestore:Applier buffer data for multiple batches
2020-01-17 17:01:01 -08:00
chaoguang
5b62d672dc
update comments
2020-01-17 15:18:07 -08:00
chaoguang
17721a00f1
change vars names
2020-01-17 14:26:45 -08:00
chaoguang
670c3e629f
Fix arena issue and change std::vector back to VectorRef
2020-01-17 14:24:13 -08:00
chaoguang
d17d1c88cd
add const to local KeyRef
2020-01-17 11:31:43 -08:00
Alex Miller
eb64eede8d
Make a smaller range inaccessible after writing a versionstamped key
...
A transaction's read version is the lower bound of what a transaction's
commit version could be. Thus, we can narrow the conflict range of a
versionstamped key, and thus reduce the amount of the keyspace that is
rendered inaccessible, by filling in the read version on the
versionstamped key and using that as the lower bound of the conflict
range.
This allows reads to still be done to versionstamped keys lower than
the read version of the current transaction.
2020-01-16 21:41:59 -08:00
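A minimal Python sketch of the narrowing idea in the commit above. It assumes a 10-byte versionstamp made of an 8-byte big-endian version followed by 2 bytes of batch order, and it takes the versionstamp offset as an explicit argument. narrowed_conflict_range is a hypothetical helper, not the client code.

```python
import struct

VERSIONSTAMP_LEN = 10  # assumption: 8-byte big-endian version + 2-byte batch order


def narrowed_conflict_range(key_template: bytes, offset: int, read_version: int):
    """Return (begin, end) bounding every key the versionstamp could produce.

    The commit version is at least the read version, so the smallest possible
    key has the read version filled in at the versionstamp position.
    """
    prefix = key_template[:offset]
    suffix = key_template[offset + VERSIONSTAMP_LEN:]
    low_stamp = struct.pack(">Q", read_version) + b"\x00\x00"
    high_stamp = b"\xff" * VERSIONSTAMP_LEN
    begin = prefix + low_stamp + suffix
    # Exclusive end: the first key after the largest possible stamped key.
    end = prefix + high_stamp + suffix + b"\x00"
    return begin, end


template = b"log/" + b"\x00" * VERSIONSTAMP_LEN + b"/x"
begin, end = narrowed_conflict_range(template, offset=4, read_version=1000)

# A key stamped at an older version falls below the range and stays readable.
old_key = b"log/" + struct.pack(">Q", 999) + b"\x00\x01" + b"/x"
# A key stamped at or after the read version falls inside the range.
new_key = b"log/" + struct.pack(">Q", 1500) + b"\x00\x00" + b"/x"
```

Because the version is encoded big-endian, byte-wise key order matches version order, which is what lets the read version act as a lower bound.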
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
chaoguang
db901d626a
Update some var names and make the returned list in ResolveTransactionBatchReply Optional
2020-01-16 14:41:06 -08:00
chaoguang
1fa934837d
Update some var names
2020-01-16 13:14:13 -08:00
chaoguang
30b9858cf5
using std::vector instead of VectorRef to avoid unknown arena change behavior
2020-01-16 13:06:36 -08:00
mengranwo
227edd4248
change memory storage engine name from memory-radixtree to memory-radixtree-beta
2020-01-15 13:49:45 -08:00
mengranwo
f597aa7e18
WIP : deployable/stable version since Nov 3. Start rebase to master branch
2020-01-15 13:49:45 -08:00
chaoguang
72d39a31f1
Change to send back the index of the read_conflict_range instead of keys, which is faster but fails to give a narrowed key range
2020-01-15 10:21:51 -08:00
Evan Tschannen
c93ca04ea6
Do not merge more than 100 shards together to avoid creating untrackable shards
2020-01-15 09:33:27 -08:00
chaoguang
1cf3b5b807
Change to general prefix '\xff\xff/transaction/conflicting_keys/'
2020-01-13 20:54:16 -08:00
chaoguang
1c2b116688
Push sorting and union of overlapped keyranges to clients
2020-01-13 18:08:54 -08:00
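A short Python sketch of what sorting and unioning overlapping key ranges on the client could look like. Ranges are modelled as half-open (begin, end) byte-string pairs. union_key_ranges is a hypothetical helper, not the function used in the commit above.

```python
def union_key_ranges(ranges):
    """Sort half-open (begin, end) key ranges and merge overlapping or touching ones."""
    merged = []
    for begin, end in sorted(ranges):
        if merged and begin <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged
```

For instance, the ranges [c, f), [a, d) and [x, z) collapse to [a, f) and [x, z).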
Meng Xu
f436ea806e
FastRestore:Resolve review comment
...
1) Sort logfiles by endVersion
2) Exit program early when restore will not succeed
3) Do not increase nextVersion unnecessarily when
calculating version batches.
4) Change assert condition that ensures progress in
calculating version batches.
2020-01-13 14:08:27 -08:00
Evan Tschannen
8475da359c
Merge pull request #2527 from etschannen/feature-region-fixes
...
A database could perform poorly while a remote region catches up to the primary
2020-01-10 17:26:43 -08:00
Evan Tschannen
855f03a41f
ratekeeper needed to check remoteDC in another location
...
the storage server scoped a transaction incorrectly
2020-01-10 15:58:36 -08:00
chaoguang
1f633d3e5f
change local var from Key to KeyRef
2020-01-10 15:53:06 -08:00
Evan Tschannen
16bf3dbba3
Merge pull request #2512 from etschannen/feature-kill-ping
...
Improve the reliability of the kills from fdbcli
2020-01-10 13:03:49 -08:00
Evan Tschannen
c6087add51
Update fdbclient/Knobs.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-01-10 12:51:37 -08:00
Evan Tschannen
4aab9b7bc8
fix: clients would waste time attempting to read from a remote region when it was in the process of catching up
2020-01-10 12:23:59 -08:00
Evan Tschannen
da1be272cb
fix: servers which opened the database would use the full list of proxies
2020-01-10 12:20:30 -08:00
chaoguang
1cdb22c4a4
refine code according to comments
2020-01-09 15:42:21 -08:00
Alvin Moore
7628d04fb9
Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2020-01-09 07:21:16 -08:00
Meng Xu
a2b26906e8
FastRestore:Filter out empty files before distributing workload
...
and clean up unused code
2020-01-07 17:01:53 -08:00
Vishesh Yadav
6e6cfaff16
Cleanup old Failure Monitoring code
2020-01-07 15:53:32 -08:00
Vishesh Yadav
85c24dc074
Active Failure Monitoring no longer needed at server processes
...
This patch removes active failure monitoring at server processes.
Hence, like client processes, servers no longer need to continuously
publish their membership to the cluster controller.
When a process is marked as failed, we still need to know if it comes
back up at some point, particularly when the reference count is
incremented. In that case, loadBalance may see all alternatives as
failed (AllAlternativesFailed). To overcome this problem, whenever a
peer's reference count is incremented and the address is marked as failed,
connectionKeeper will bypass waiting for data and connect immediately to
check if the process is back up.
2020-01-07 15:53:32 -08:00
chaoguang
941ee6d9f9
update comments
2020-01-07 15:39:50 -08:00
Meng Xu
c29e380076
FastRestore:Remove prevVersion from LoadingParam
2020-01-07 14:59:17 -08:00
chaoguang
b8ffc72cca
Use a RYW in transactionInfo to track conflicting keys
2020-01-07 13:55:05 -08:00
chaoguang
57fb28af2c
Explicitly set read-version and clear the whole keyspace to make sure the getRange happens locally
2020-01-07 13:54:33 -08:00
Meng Xu
9df02512ab
FastRestore:Apply clang-format
2020-01-07 11:50:32 -08:00
Meng Xu
67e913c3d5
Change LoadingParam struct and endVersion definition
...
1) Remove endVersion field because it has been included in RestoreAsset;
2) Ensure endVersion in VersionBatch and RestoreAsset is always exclusive;
3) Revise ASSERT in loader and applier in situations when the dummy commit version
is endVersion, to avoid false positive ASSERT failure.
2020-01-07 11:48:03 -08:00
Meng Xu
c3f8f3b445
FastRestore:Build VersionBatch less than threshold size
2020-01-07 11:46:56 -08:00
chaoguang
10719200c3
A hacky way to call the API through getRange("\xff\xff/conflicting_keys\<start_key>", "\xff\xff/conflicting_keys\<end_key>").
2020-01-06 11:22:11 -08:00
Evan Tschannen
9a3dfec7c5
open a connection with processes before attempting to kill them to improve the reliability of the kill process
...
secondaryAddresses are included in the list of processes which can be killed
2020-01-03 16:10:44 -08:00
Meng Xu
8d6f511816
FastRestore:Resolve review comment
...
Filter out range mutations that do not overlap with the restore range.
Small changes on format.
2019-12-22 20:09:10 -08:00
Meng Xu
61b29de3ce
FastRestore:Self code review
...
Clean up commented code;
Add sanity check.
2019-12-20 22:24:34 -08:00