Alex Miller
1398e9a82e
Stop background eio threads on Net2::stop().
...
This will stop eio threads for both the client (`fdb_stop_network()`)
and the server. This change is being done more for the former, but I
don't see any harm in doing the latter as well.
2020-04-18 19:40:55 -07:00
Evan Tschannen
99a58f8ee5
fix compiler errors
2020-04-17 17:47:50 -07:00
Evan Tschannen
0ee62badcd
Merge branch 'feature-tree-broadcast' into feature-small-endpoint
2020-04-17 17:16:04 -07:00
Evan Tschannen
ba3e2af473
Merge commit '5288033bcfe40c3ade97c8bf2d04cf31b3f16cb1' into feature-tree-broadcast
2020-04-17 15:17:37 -07:00
Vishesh Yadav
992002cf34
FlowTransport: don't start connectionKeeper for local peer
...
getOrOpenPeer() is used for local addresses sometimes, which ends up
starting connectionKeeper() which is unnecessary.
2020-04-16 14:04:32 -07:00
Vishesh Yadav
43f19bd463
Don't skip filesystem check when KAIO is disabled
...
WSL, the known Linux system where KAIO is not supported, can run these checks.
2020-04-16 10:56:01 -07:00
Vishesh Yadav
8c8f23bff2
Merge remote-tracking branch 'apple/master' into task/issue-1017-slow-machine-poisoning
2020-04-16 00:45:35 -07:00
Vishesh Yadav
1901f49b97
Net2FileSystem: Add guards to honor DISABLE_POSIX_KERNEL_AIO
...
- Adds some asserts in KAIO to ensure that when knob is set, we don't
end up using KAIO in any case.
- Fixes a bug where we initialize AsyncFileKAIO on Linux builds even
when KAIO is disabled. This can cause problems in systems such as
Windows Subsystem for Linux where KAIO is not supported.
FIXES #2382
2020-04-15 23:47:37 -07:00
Vishesh Yadav
da7d0093ee
Cleanup unused code
2020-04-15 19:48:25 -07:00
Vishesh Yadav
023372a226
FailMon: Mark peer failed after retrying
2020-04-15 19:47:24 -07:00
Meng Xu
d6c1baa784
FastRestore:Filter out log mutations whose version is smaller than range mutation version
2020-04-15 19:45:03 -07:00
negoyal
b85dc16c6d
Merge branch 'master' into fdb_cache_subfeature2
2020-04-14 17:07:41 -07:00
Vishesh Yadav
f959af8228
Refactor per review comments
2020-04-14 11:30:40 -07:00
Alex Miller
2e53c8c5e2
Merge pull request #2915 from tclinken/move-optimizations
...
Avoid unnecessary copies in PromiseStream
2020-04-13 21:15:16 -07:00
Evan Tschannen
dbf6afc78e
more space efficient endpoint map
2020-04-12 23:51:20 -07:00
Evan Tschannen
ff5543b579
working implementation
2020-04-12 22:18:51 -07:00
Evan Tschannen
0c2e8b9462
only serialize a single endpoint for an interface
2020-04-12 16:04:48 -07:00
Evan Tschannen
07cc0a8d74
code cleanup
2020-04-10 17:02:11 -07:00
Evan Tschannen
e8d333733a
Merge branch 'master' into feature-tree-broadcast
2020-04-10 13:51:09 -07:00
Evan Tschannen
ac4654b09e
re-suppress trace event
2020-04-10 13:50:26 -07:00
Evan Tschannen
ce4493f679
many bug fixes
2020-04-10 13:45:16 -07:00
Vishesh Yadav
fed5c543d4
Remove leftover TODO code around centralized healthmonitor
2020-04-08 22:56:07 -07:00
Vishesh Yadav
13447f439f
fdbrpc: Add a constant to onFailedFor()
...
Since we mark an address as failed when its connection fails, this
patch adds a constant to compensate for the time needed to reconnect and
make sure the endpoint is actually down. This constant is equal to
FAILURE_MIN_DELAY, which was used by the centralized
FailureMonitoringClient that was removed earlier.
2020-04-08 19:34:40 -07:00
Vishesh Yadav
975e6b1d9a
Merge remote-tracking branch 'apple/master' into task/issue-1017-slow-machine-poisoning
...
Removed merge conflict with old build system.
2020-04-08 19:25:13 -07:00
tclinken
3a01d24970
Pass const ref to a_callback_fire
2020-04-08 14:50:41 -07:00
tclinken
488c20e58e
Fixed failing "/flow/flow/promisestream callbacks" unit test
2020-04-08 11:24:56 -07:00
tclinken
ff3e3fcc13
Added /flow/PromiseStream/move unit test
2020-04-08 00:45:56 -07:00
tclinken
6ab4a57123
Allow RequestStream::send to use move semantics
2020-04-07 11:06:55 -07:00
Markus Pilman
d4542dbb5a
Delete old build system
2020-04-07 11:03:45 -07:00
Vishesh Yadav
36d89db4e7
FlowTransport: Don't always close unused open connection
2020-04-06 23:05:29 -07:00
Vishesh Yadav
4214c45218
FlowTransport: Temporarily keep idle connections open for server processes
2020-04-06 21:27:54 -07:00
Evan Tschannen
a51c92854a
Merge branch 'master' into feature-tree-broadcast
...
# Conflicts:
# fdbserver/WorkerInterface.actor.h
# fdbserver/worker.actor.cpp
2020-04-06 21:09:44 -07:00
Evan Tschannen
2a1bd97120
fix compilation errors
2020-04-06 20:58:43 -07:00
Markus Pilman
8b5780c36c
don't include source and binary dir
...
This forces users to use include paths from the sources root.
So `#include "Arena.h"` won't work anymore, only
`#include "flow/Arena.h"` will.
2020-04-06 10:13:49 -07:00
Evan Tschannen
477d66b46d
implemented a tree broadcast for txn state message for proxies, and serverDBInfo for workers
2020-04-05 23:09:36 -07:00
Vishesh Yadav
fdc1048f75
Add knob to turn off marking unstable connections
2020-04-03 15:53:00 -07:00
Vishesh Yadav
613b8bb169
More TraceEvents and remove 'delayed' used for debugging
2020-04-03 15:53:00 -07:00
Vishesh Yadav
1d35f2ff5a
Mark a connection as failed for X seconds if closes too often
2020-04-03 15:53:00 -07:00
Vishesh Yadav
d90e168e24
Add HealthMonitoring skeleton code
2020-04-03 15:53:00 -07:00
Vishesh Yadav
04f925f770
Format FailureMonitor* files
2020-04-03 15:53:00 -07:00
Alvin Moore
78f0cddb14
Merge pull request #2684 from mpilman/features/boost70
...
Upgrade to boost 1.72
2020-04-03 09:30:59 -07:00
negoyal
a0c8946f31
Merge branch 'master' into fdb_cache_subfeature2
2020-04-02 12:27:04 -07:00
Meng Xu
ccbbdc4ba4
Unit test:Verify splitMutation by comparing with intersectingRanges result
2020-03-31 12:13:02 -07:00
Markus Pilman
28cd38913d
Merge branch 'master' of github.com:apple/foundationdb into features/boost70
2020-03-27 13:44:10 -07:00
negoyal
acaf91ac47
Merge branch 'master' into fdb_cache_subfeature2
2020-03-26 13:33:08 -07:00
Balachandar Namasivayam
a476127f5f
Merge pull request #2802 from xumengpanda/mengxu/debug-master-PR
...
Fix correctness failure on master branch
2020-03-18 16:07:36 -07:00
Evan Tschannen
e08f0201f1
merge release 6.2 into master
2020-03-17 12:51:47 -07:00
Evan Tschannen
c197520fa7
Merge pull request #2810 from alexmiller-apple/fdbcli-tlsinfo
...
Add a `tlsinfo` command to fdbcli that prints the certificate chain.
2020-03-16 15:47:32 -07:00
Meng Xu
7f559bc712
Cleanup code and apply clang-format
...
Self code review
2020-03-16 15:08:32 -07:00
Evan Tschannen
243c268d9d
Limit the amount of requests the proxy can queue up in memory
2020-03-13 10:17:49 -07:00
Alex Miller
0c558efcfe
Add a `tlsinfo` command to fdbcli that prints the certificate chain.
...
This requires the certificate chain to load successfully, otherwise
fdbcli will error out at an earlier point due to Net2 not being able to
configure TLS.
2020-03-13 00:11:53 -07:00
Meng Xu
0ef09539a9
addressMap[normalizedAddress]->address may not be equal to normalizedAddress
2020-03-12 13:01:25 -07:00
Meng Xu
1759d5c8c4
Apply clang-format
2020-03-12 10:18:53 -07:00
Meng Xu
a9136f3f72
Add waitForUnreliableExtraStoreReboot to wait for extra store to reboot
2020-03-12 10:18:31 -07:00
Meng Xu
bd345f85db
ConsistencyCheck:Fix failure due to address inconsistency between process and worker
...
With TLS, a worker (or process) can have a TLS address and a non-TLS address.
When a process is created in simulation, the primary address is TLS by default.
The non-TLS one is the TLS address port plus one.
In a connection between two workers, if their primary addresses do not both enable
or both disable TLS, one worker will swap its primary and secondary addresses
so that the TLS config of the two endpoints matches.
After the swap, the primary address may no longer be the TLS one that was assigned
when the process was created. The swap also happens only for the worker, not for the
process struct in simulation.
This swap can cause worker->address != process->address.
In the checkForExtraDataStores actor, we use worker->address to check if a process
is killable and use process->address to kill the process. The inconsistency
can cause simulation to kill a protected process that is not killable, which leads
to simulation failure.
2020-03-10 21:07:16 -07:00
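The address mismatch described in the commit above can be reproduced in a small standalone sketch. Everything here is hypothetical (`Address`, `AddressPair`, `matchTls`, `contains`, and `makeSimulatedAddresses` are illustrative names, not FoundationDB types), and the order-insensitive `contains` check is only one possible way to compare the two views, not necessarily what the commit does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a network address; not the FoundationDB type.
struct Address {
    std::string ip;
    uint16_t port;
    bool tls;
    bool operator==(const Address& o) const { return ip == o.ip && port == o.port && tls == o.tls; }
};

// Hypothetical pair of addresses owned by one worker or process.
struct AddressPair {
    Address primary;
    Address secondary;

    // Swap primary and secondary so the primary matches the peer's TLS setting.
    void matchTls(bool peerUsesTls) {
        if (primary.tls != peerUsesTls && secondary.tls == peerUsesTls) {
            std::swap(primary, secondary);
        }
    }

    // True if addr is either of the two addresses, regardless of their order.
    bool contains(const Address& addr) const { return primary == addr || secondary == addr; }
};

// As in the commit message: the primary is TLS, the non-TLS port is the TLS port plus one.
inline AddressPair makeSimulatedAddresses(const std::string& ip, uint16_t tlsPort) {
    return AddressPair{ Address{ ip, tlsPort, true },
                        Address{ ip, static_cast<uint16_t>(tlsPort + 1), false } };
}
```

If a worker's copy of the pair is swapped while the process's copy is not, comparing only the primary addresses reports a mismatch even though both describe the same process; a check such as `contains` is insensitive to the swap.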
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Evan Tschannen
dbfc0cbcc0
Merge pull request #2781 from alexmiller-apple/certificate-refresh
...
Refresh certificates used for handshaking when they change on disk
2020-03-06 11:12:04 -08:00
A.J. Beamon
effb6d2d49
Add ResolverMetrics trace event
2020-03-05 10:49:21 -08:00
Alex Miller
595dd77ed1
Merge remote-tracking branch 'upstream/release-6.2' into certificate-refresh
2020-03-04 20:25:42 -08:00
Alex Miller
9b5ef3416e
Refactor TLSParams into TLSConfig + LoadedTLSConfig
...
The idea is that we keep around a TLSConfig that holds the configuration
that the user has provided, and then when we want to initialize an SSL
context, we ask the TLSConfig to load all certificates and return us a
LoadedTLSConfig that is a concrete set of certificate bytes in memory.
initTLS now just takes the in-memory bytes and applies them to the ssl
context.
This is a large refactor to lead up into certificate refreshing, where we
will periodically check for changes to the certificates, and then
re-load them and apply them to a new SSL context.
2020-03-04 20:14:47 -08:00
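A minimal sketch of the split described in the commit above, assuming a configuration object that stores only paths and a loaded object that stores only bytes. The member names, the `load` signature, and the injected `FileReader` are assumptions made for illustration; the real TLSConfig and LoadedTLSConfig carry more fields and differ in detail.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical: a concrete set of certificate bytes held in memory.
struct LoadedTLSConfig {
    std::string certBytes;
    std::string keyBytes;
    std::string caBytes;
};

// Reads a whole file. Injected so the sketch can run without touching disk.
using FileReader = std::function<std::string(const std::string& path)>;

// Hypothetical: what the user configured, i.e. where the certificates live.
class TLSConfig {
public:
    TLSConfig(std::string certPath, std::string keyPath, std::string caPath)
      : certPath_(std::move(certPath)), keyPath_(std::move(keyPath)), caPath_(std::move(caPath)) {}

    // Each call re-reads every file, so calling load() again picks up changed certificates.
    LoadedTLSConfig load(const FileReader& readFile) const {
        return LoadedTLSConfig{ readFile(certPath_), readFile(keyPath_), readFile(caPath_) };
    }

private:
    std::string certPath_;
    std::string keyPath_;
    std::string caPath_;
};
```

Keeping the paths separate from the loaded bytes means a refresh is just another `load()` call whose result is applied to a fresh SSL context.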
Evan Tschannen
976c2fc7a8
Update fdbrpc/FlowTransport.actor.cpp
...
Co-Authored-By: Alex Miller <35046903+alexmiller-apple@users.noreply.github.com>
2020-03-04 16:13:59 -08:00
Evan Tschannen
820957025f
accept connections in batches of 20 to improve performance
2020-03-04 14:24:57 -08:00
Evan Tschannen
c11c24b79d
removed the fdbrpc version of platform.h
2020-02-28 14:56:10 -08:00
Evan Tschannen
6054c05963
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbserver/fdbserver.actor.cpp
# versions.target
2020-02-28 12:11:05 -08:00
Evan Tschannen
2586bade68
re-added support for configuring TLS options with environment variables
2020-02-26 15:33:48 -08:00
negoyal
cd949eca71
Merge branch 'master' into fdb_cache_subfeature2
2020-02-26 11:22:08 -08:00
Evan Tschannen
924d335aa7
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# flow/Knobs.cpp
# flow/Knobs.h
2020-02-25 18:25:19 -08:00
Evan Tschannen
d60268123b
updated comment
2020-02-25 16:00:46 -08:00
Evan Tschannen
6e7d2ff7dd
prevent the proxy from delaying too long based on an incorrect estimate of the compute time
2020-02-25 15:46:13 -08:00
Evan Tschannen
65fbe0d0bc
revert AcceptSocket priority change because of bad performance results
2020-02-21 19:22:14 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Evan Tschannen
6f1d3ccd35
Merge branch 'release-6.2' into feature-boost-ssl
2020-02-20 20:03:40 -08:00
Evan Tschannen
7056c73f20
fixed a number of problems with findBestPolicySetSimple
2020-02-20 20:00:54 -08:00
Evan Tschannen
dc3826e2fd
fix: tls throttling would re-insert the failure into the map
2020-02-20 18:17:39 -08:00
Evan Tschannen
f04e311a1e
Merge commit 'b46d6e25e24993ab5a5f04091fd3235050b7cd09' into feature-boost-ssl
...
# Conflicts:
# fdbserver/SimulatedCluster.actor.cpp
# flow/Net2.actor.cpp
2020-02-20 17:36:38 -08:00
Evan Tschannen
a50939417b
fix: zoneid is all lower case
2020-02-20 17:26:44 -08:00
Evan Tschannen
f7a37077cc
handshake takes time in simulation
2020-02-20 15:26:56 -08:00
Evan Tschannen
d7c841a28a
Merge pull request #2589 from etschannen/feature-proxy-delay
...
Improve version pipelining on the proxy
2020-02-20 15:23:30 -08:00
Evan Tschannen
8b768e66df
Merge pull request #2694 from dongxinEric/feature/2663/specialize-policy-for-zoneid-in-cc
...
Added a specialized algorithm for PolicyOne and PolicyAcross(,'zoneId…
2020-02-20 14:46:23 -08:00
Evan Tschannen
def8ca6da3
simulation advances timer() separately from now() to better model the real world
2020-02-20 12:10:20 -08:00
Xin Dong
298d6cb3d7
Address review comments.
2020-02-20 09:34:01 -08:00
Evan Tschannen
3c4d551647
improve prioritization of connection monitor and listen given that listen is no longer expensive (because handshake is done separately)
2020-02-19 18:50:21 -08:00
Evan Tschannen
761da5a059
code cleanup
2020-02-19 17:59:45 -08:00
Evan Tschannen
a6486766c2
fix: rebooting an unreliable process will make it reliable again, but while unreliable the files for that process could have already been corrupted so simulation will think a process is healthy that is actually corrupted
2020-02-19 15:18:57 -08:00
Evan Tschannen
46d5f5e325
do not trigger the resetPing if we cannot actually remove the peer, because it will cause us to reset the timeout, so repeated calls to removePeer can keep a dead peer from being removed
2020-02-19 15:17:50 -08:00
Xin Dong
efc0d7f9d5
Added a specialized algorithm for PolicyOne and PolicyAcross(,'zoneId',PolicyOne()) to find a set of TLog servers which will be able to fulfill the policy later.
2020-02-19 09:25:57 -08:00
Meng Xu
132f5aa9ba
FastRestore:Improve trace name and cosmetic change
2020-02-18 16:41:19 -08:00
Meng Xu
31a6ec34b7
Merge branch 'master' into mengxu/fast-restore-agent-PR
2020-02-18 16:17:59 -08:00
Alex Miller
9d88356468
Merge pull request #2686 from mpilman/features/avoid-unnecessary-template-instanciations
...
Removed dead code
2020-02-17 14:46:39 -08:00
Alex Miller
9144c3e8ca
Merge pull request #2087 from atn34/issue-1226
...
Allow member actors access to private variables
2020-02-17 14:39:31 -08:00
mpilman
aac94a766b
Removed dead code
2020-02-15 21:56:48 -08:00
mpilman
c2ccbbadd8
Revert "Several fixes to make FDB compile with clang-cl"
...
This reverts commit 0e1f9efb85.
2020-02-14 22:18:30 -08:00
Markus Pilman
ccf590e193
Merge branch 'master' of github.com:apple/foundationdb into features/boost70
2020-02-14 22:05:51 -08:00
Markus Pilman
0e1f9efb85
Several fixes to make FDB compile with clang-cl
2020-02-14 22:05:43 -08:00
mpilman
3a1e878a9b
Upgrade to boost 1.72
2020-02-14 18:10:13 -08:00
Andrew Noyes
1248d2b8b4
Remove USE_OBJECT_SERIALIZER knob
2020-02-12 10:41:52 -08:00
Evan Tschannen
dcbce3593e
fixed TLS in simulation
2020-02-10 14:00:21 -08:00
Markus Pilman
e71fe44ee3
Merge branch 'master' into features/icc
2020-02-08 21:33:02 -08:00
Alex Miller
6b921ac900
Stop building FDBLibTLS and stop linking against libtls.so
...
Which now means OpenSSL and LibreSSL are equally acceptable.
2020-02-06 21:13:58 -08:00
Evan Tschannen
38d8d0d675
fixed simulation
2020-02-06 19:29:31 -08:00
Evan Tschannen
69de430057
separate handshaking from connection to improve pipelining
2020-02-06 16:45:54 -08:00
negoyal
85cc35e81e
Merge branch 'master' into HEAD
2020-02-05 14:59:55 -08:00
Evan Tschannen
53d0867a17
limit the number of connections a process can attempt to establish in parallel
2020-02-04 18:15:10 -08:00
Andrew Noyes
fcefb4bf6d
Merge branch 'master' into issue-1226
2020-02-04 17:46:36 -08:00
Evan Tschannen
84853dd1fd
switched SSL implementation to use boost ssl
2020-02-04 14:56:40 -08:00
Evan Tschannen
8449badb3e
Merge pull request #1868 from dongxinEric/fix/1827/error_instead_of_timeout
...
Send error back before put the GRV request with PRIORITY_BATCH into t…
2020-02-04 14:32:47 -08:00
mpilman
d09e07f1f5
Merge remote-tracking branch 'upstream/master' into features/icc
2020-02-04 10:26:18 -08:00
Meng Xu
3b57bf1781
Merge branch 'master' into mengxu/fast-restore-agent-PR
2020-02-03 17:23:54 -08:00
Evan Tschannen
4524831456
Merge pull request #2518 from vishesh/task/failmon-remove-server
...
FailureMonitoring: Server processes no longer need to talk to ClusterController
2020-02-03 17:22:50 -08:00
Meng Xu
ca3b6135d0
FastRestore:Add debug to see why restore role is not connected
...
Reason: the restore worker is an fdbserver process that does not register with CC.
The new failure monitor changes how connection works for client and server.
A client does not connect to CC to get connected.
A server has to connect to CC to get connected.
The restore worker becomes the special role that behaves like a client but is a server.
2020-02-03 17:19:52 -08:00
Jingyu Zhou
d73a19fea4
Fix valgrind found error of reading uninitialized data
2020-02-02 13:16:23 -08:00
Xin Dong
e21426d12a
Send an error back to GRV requests with batch priority when the cluster is saturated, instead of blindly enqueueing the requests and letting the client time out.
2020-01-30 14:13:56 -08:00
Meng Xu
ff92401ed5
FastRestore:Add FastRestoreClass and blob option
...
To simplify test in circus framework, we need a fastrestore class;
To get data from blob in real mode, restore worker should set up
blob credentials in order to call BackupContainer interface to
get all backup files.
2020-01-28 20:25:05 -08:00
mpilman
1cca70d4ea
Added logic to keep locationCache up to date
2020-01-26 20:53:50 -08:00
Evan Tschannen
231d7830a0
more accurate calculation on the amount of time that proxy should wait before getting a version from the master
2020-01-26 19:47:12 -08:00
Alex Miller
d06d664ed7
Merge pull request #2149 from tapaswenipathak/ticket-2135
...
Add comments to explain functions in ReplicationUtils.cpp
2020-01-24 16:35:11 -08:00
Evan Tschannen
76e192d490
Merge pull request #2538 from alexmiller-apple/hashlittle2-to-crc32c
...
Convert more hashlittle{,2} uses to crc32c_append
2020-01-23 17:54:38 -08:00
Jingyu Zhou
1eaea91cb3
Address review comments
2020-01-22 19:42:13 -08:00
Jingyu Zhou
de8d953865
Add backup role, class, and worker skeleton
2020-01-22 19:35:30 -08:00
Evan Tschannen
78adbea834
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# flow/Knobs.h
# versions.target
2020-01-21 21:38:19 -08:00
Evan Tschannen
afd3ec13ff
added knobs
2020-01-21 18:58:34 -08:00
Evan Tschannen
4a716b85f6
we must still finish accept before the handshake completes
2020-01-21 16:55:34 -08:00
Evan Tschannen
7a4b459f07
wait for a tls handshake to complete before returning a connection
...
wait for multiple tls errors before throttling
2020-01-21 16:45:15 -08:00
Vishesh Yadav
daef5f011a
Merge remote-tracking branch 'apple/master' into task/failmon-remove-server
2020-01-21 13:20:15 -08:00
Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
Alex Miller
d23aa5f46c
Convert AsyncFile uses from hashlittle to crc32
2020-01-15 19:16:16 -08:00
Alex Miller
8d44a2a0d4
Convert sim2 from hashlittle to crc32c
2020-01-13 18:28:40 -08:00
Alex Miller
da73164eda
Move crc32c from fdbrpc to flow
...
So that we can use it from a piece of flow code without breaking module
boundaries.
Also rename generated-constants to crc32c-generated-constants so that
it's more apparent that they're related files.
2020-01-13 18:19:30 -08:00
Evan Tschannen
0e916fdbed
throttle client TLS errors longer than server errors so that when both happen simultaneously the server throttling will be disabled when the client makes its next attempt
2020-01-12 22:12:18 -08:00
Evan Tschannen
bc4d33a55b
fixed compiler error
2020-01-12 17:01:08 -08:00
Evan Tschannen
1f7eb1f738
throttle outgoing tls connections before establishing a network connection
...
store serverTLSConnectionThrottler map inside of g_network, so that it works properly with simulation
2020-01-12 16:44:30 -08:00
Balachandar Namasivayam
ccfbf04e20
Revert "Throttle both client and server side TLS connections if there is a handshake error."
...
This reverts commit 1b1be9f764.
2020-01-10 18:41:02 -08:00
Balachandar Namasivayam
1b1be9f764
Throttle both client and server side TLS connections if there is a handshake error.
2020-01-10 18:08:02 -08:00
Balachandar Namasivayam
249e5a73b6
Minor optimization that erases an entry from the map only when it is present.
2020-01-10 16:39:25 -08:00
Balachandar Namasivayam
741aa523e6
Establishing a TLS connection through the handshake process is expensive, and the fdbserver process can easily get saturated doing repeated TLS handshakes when only a few hundred clients have a bad certificate. Hence throttle the number of handshakes done on the server per client IP if the client has a bad certificate.
2020-01-10 16:19:41 -08:00
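The per-IP throttling idea in the commit above can be sketched as a small map keyed by client IP. This is an assumption-laden illustration: `HandshakeThrottler`, its parameters, and the fixed-window policy are hypothetical and are not taken from the FoundationDB source.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical sketch: refuse handshakes from an IP after repeated failures
// until that IP's throttle window expires.
class HandshakeThrottler {
public:
    HandshakeThrottler(int maxFailures, double throttleSeconds)
      : maxFailures_(maxFailures), throttleSeconds_(throttleSeconds) {}

    // Record a failed handshake from ip at time now (in seconds).
    void recordFailure(const std::string& ip, double now) {
        Entry& e = entries_[ip];
        if (now >= e.windowEnd) {
            // Start a new window for this IP.
            e.failures = 0;
            e.windowEnd = now + throttleSeconds_;
        }
        ++e.failures;
    }

    // True if a handshake from ip should be refused at time now.
    bool isThrottled(const std::string& ip, double now) {
        auto it = entries_.find(ip);
        if (it == entries_.end()) return false;
        if (now >= it->second.windowEnd) {
            // Window expired: forget the entry so the map does not grow without bound.
            entries_.erase(it);
            return false;
        }
        return it->second.failures >= maxFailures_;
    }

private:
    struct Entry {
        int failures = 0;
        double windowEnd = 0.0;
    };
    std::unordered_map<std::string, Entry> entries_;
    int maxFailures_;
    double throttleSeconds_;
};
```

A server would call `isThrottled` before starting the expensive handshake and `recordFailure` when a handshake fails; with `maxFailures` greater than one, a single transient error does not trigger throttling.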
Alvin Moore
7628d04fb9
Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2020-01-09 07:21:16 -08:00
Vishesh Yadav
598b2eaeb0
fdbrpc: Add warning when peer is unavailable for long time
2020-01-08 13:55:13 -08:00
Evan Tschannen
83ad9caf54
implemented a load balancing algorithm which evens out the number of requests processed by each proxy
2020-01-08 01:59:01 -08:00
Vishesh Yadav
6b8daeae6e
FailureMonitor: Cleanup code around reconnecting to failed connections
...
If there is a peer with a non-zero reference count, keep trying to reconnect when
a connection fails. We already make sure this doesn't happen more than once
every 2 seconds.
2020-01-07 15:53:32 -08:00
Vishesh Yadav
ba096f59f9
FailureMonitor: Update comment on how healthy/failed addresses are tracked
2020-01-07 15:53:32 -08:00
Vishesh Yadav
85c24dc074
Active Failure Monitoring no longer needed at server processes
...
This patch removes active failure monitoring at server processes.
Hence, like client processes, servers no longer need to continuously
publish their membership to the cluster controller.
When a process is marked as failed, we still need to know if it comes back
up at some point, particularly when the reference count is
incremented. In that case, loadBalance may see AllAlternativesFailed
as failed. To overcome this problem, whenever the peer reference count is
incremented and the address is marked as failed, connectionKeeper
will bypass waiting for data and connect immediately to check if the
process is back up.
2020-01-07 15:53:32 -08:00
Evan Tschannen
a9541f8066
Merge branch 'feature-addpeer-fix' of github.com:etschannen/foundationdb into feature-addpeer-fix
2020-01-03 12:15:45 -08:00
Evan Tschannen
7152469cc3
log the base trace event before the endpoint messages
2020-01-03 12:15:38 -08:00
Andrew Noyes
e16bdab3b4
Assert that a request hasn't already gotten a reply
2020-01-03 09:41:54 -08:00
Evan Tschannen
6b28e3b43b
Update fdbrpc/LoadBalance.actor.h
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-01-02 17:37:58 -08:00
Evan Tschannen
9e137d3b49
fix: addPeerReference only marks a connection as healthy if it is the first peerReference
...
added additional logging to long LoadBalance calls, and when the failure monitor state changes for an address
2019-12-19 18:26:29 -08:00
Alvin Moore
3bf971ba8b
Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/storageserver.actor.cpp
2019-12-12 07:13:12 -08:00
Andrew Noyes
b126570a20
Augment tests to catch A.J.'s counterexample
2019-12-05 10:27:12 -08:00
Andrew Noyes
bd9faae1e7
Add a unit test that repros #2406
2019-12-04 16:45:32 -08:00
Andrew Noyes
46d10dc7dc
Fix "null passed as argument declared not null"
...
Fix several such reports from ubsan
E.g.
/Users/anoyes/workspace/foundationdb/flow/Arena.h:794:16: runtime error: null pointer passed as argument 1, which is declared to never be null
2019-12-03 14:46:53 -08:00
Evan Tschannen
3c769fcf60
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbserver/ClusterController.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2019-11-22 15:39:19 -08:00
Evan Tschannen
ebcb2f79ed
Merge branch 'master' of github.com:apple/foundationdb
2019-11-22 15:34:49 -08:00
Evan Tschannen
746b357b7f
fix: simulation should not allow connections to dead processes
2019-11-21 20:36:40 -08:00
Evan Tschannen
27cb299d84
simulation can sometimes randomly hang or throw connection_failed, instead of always doing one or the other
2019-11-21 16:24:18 -08:00
Evan Tschannen
067dc55bfb
fix: making _conn a state variable was keeping connections open that should be closed
2019-11-21 16:08:32 -08:00
Evan Tschannen
569c6d4476
throws of connection_failed() from net()->connect did not result in clients marking a connection as failed in the failure monitor
2019-11-21 13:08:59 -08:00
Evan Tschannen
2727b91c46
simulation tests network connections failing due to errors instead of just hanging
2019-11-21 12:33:07 -08:00
Evan Tschannen
dbfa3dc217
Merge pull request #2200 from negoyal/storage-cache-subfeature1
...
Storage cache subfeature1
2019-11-20 13:59:06 -08:00
A.J. Beamon
ed8d3f163c
Rename hgVersion to sourceVersion.
2019-11-15 12:26:51 -08:00
Evan Tschannen
57fdbbf975
fix: in simulation dead connections need to stop receiving traffic after 1 second
2019-11-15 10:16:44 -08:00
Evan Tschannen
8d3ef89540
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbclient/MutationList.h
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2019-11-14 15:49:56 -08:00
A.J. Beamon
aad9fa3baa
Don't check for too many connections closed on client connections
2019-11-13 13:00:43 -08:00
negoyal
a4a0bf18f9
Merging with Master.
2019-11-12 13:01:29 -08:00
A.J. Beamon
ef801a6432
Rename LargePacket warnings to distinguish between sent and received packets. Also remove Net2_ prefix from packet size trace events.
2019-11-12 09:23:46 -08:00
Evan Tschannen
24bc9aeaf5
Merge pull request #2224 from atn34/test-buggified-delay
...
Replace /flow/delayOrdering with /flow/buggifiedDelay
2019-10-31 11:20:55 -07:00
Evan Tschannen
3325980c03
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# fdbserver/DataDistribution.actor.cpp
# fdbserver/OldTLogServer_6_0.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/WorkerInterface.actor.h
# fdbserver/worker.actor.cpp
# versions.target
2019-10-24 17:38:15 -07:00
mpilman
92ce9ef5dc
updated comment
2019-10-24 11:45:32 -07:00
mpilman
325a8e4213
remove confusing USE_ODIRECT knob
2019-10-24 11:44:03 -07:00
mpilman
f23392ec5a
Don't use O_DIRECT in EIO by default
2019-10-24 11:39:55 -07:00
mpilman
7ad0e20e48
Added knob to disable O_DIRECT
2019-10-24 11:20:14 -07:00
mpilman
f41f19b5f6
Introduced knob to set eio parallelism
2019-10-24 11:20:14 -07:00
mpilman
85977fb8d5
Use O_DIRECT with EIO
2019-10-24 11:20:14 -07:00
Evan Tschannen
688940b685
merge 6.2 into master
2019-10-21 11:43:46 -07:00
Tapasweni Pathak
0fab0d1a25
remove whitespaces
2019-10-17 22:18:26 +05:30
Tapasweni Pathak
4000ddadc0
remove comments from ReplicationUtils.cpp file
2019-10-17 22:18:26 +05:30
Tapasweni Pathak
795c951b7b
Add function documentation
2019-10-17 22:18:21 +05:30
Tapasweni Pathak
f2f65eccc9
Merge remote-tracking branch 'upstream/master' into ticket-2135
2019-10-16 23:25:38 +05:30
A.J. Beamon
562ce17eca
Initialize outgoingConnectionIdle in the constructor. Add back line to connectionKeeper that is needed in some looping cases
2019-10-10 12:48:35 -07:00
A.J. Beamon
ad8604f24a
Fix spurious ConnectionClosed event when starting a connection.
2019-10-10 10:34:44 -07:00
Andrew Noyes
69fe02933d
Replace /flow/delayOrdering with /flow/buggifiedDelay
...
Seems that we don't want the property that delays become ready in order
to hold, so make sure it doesn't hold in the simulator.
2019-10-09 12:59:01 -07:00
Vishesh Yadav
162b4efaea
Merge pull request #2180 from alexmiller-apple/cmake-staticify-libraries
...
Make FDBLibTLS and thirdparty static libraries.
2019-10-03 14:00:07 -07:00
Alex Miller
3b9678356e
Make FDBLibTLS and thirdparty static libraries.
...
They're statically linked anyway, and this fixes an issue with CMake
complaining that there are cyclic dependencies that are non-static.
2019-09-30 18:32:24 -07:00
Andrew Noyes
a2243b6501
Add test for delay ordering
...
See #2148
2019-09-26 12:40:19 -07:00
Tapasweni Pathak
50d43cff15
Add comments to explain functions in ReplicationUtils.cpp
2019-09-26 23:03:13 +05:30
Andrew Noyes
ab650c6fe6
Make test resilient to changes elsewhere in file
2019-09-12 09:21:17 -07:00
Andrew Noyes
8353dd4731
Test line pragmas in generated actor files
2019-09-11 17:12:55 -07:00
Andrew Noyes
9d531db985
Handle nested classes
2019-09-10 13:25:58 -07:00
Andrew Noyes
c487f021f0
WIP - seems to work for 1 level of nesting
2019-09-10 13:09:37 -07:00
Andrew Noyes
eeb2da5c9d
WIP - simple example compiles
2019-09-09 22:56:19 -07:00
Meng Xu
c2355f721e
Merge branch 'master' into mengxu/performant-restore-PR
2019-09-04 17:11:42 -07:00
Meng Xu
d160810662
FastRestore:Resolve review comments
2019-09-04 16:48:43 -07:00
Evan Tschannen
24aad14f06
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-08-30 17:23:58 -07:00
Evan Tschannen
a7237c4302
Merge pull request #2045 from atn34/disallow-scalar-network-messages
...
Disallow scalar network messages
2019-08-30 13:38:54 -07:00
Vishesh Yadav
cf56b005e8
Add comment for pinging incompatible clients
...
If client is incompatible, connectionMonitor relies on peer->resetPing
to be triggered whenever data is received to prevent ping timeout.
The server stopped sending pings since 6.2 which meant resetPing
doesn't get triggered.
2019-08-30 11:17:22 -07:00
A.J. Beamon
1fdabe62c2
Merge pull request #2048 from etschannen/feature-fix-connections
...
Fixed two different ways useful connections were being closed
2019-08-30 11:05:02 -07:00
Evan Tschannen
8fc28dd730
fix: continue pinging incompatible clients from the servers so that the client knows the server process is active
2019-08-29 16:51:03 -07:00
Evan Tschannen
1c0484cffc
fix: do not close connections which have outstanding tryGetReplies with the peer
2019-08-29 16:49:57 -07:00
Andrew Noyes
6aa0ada7b1
Replace scalar root types with proper messages
2019-08-28 14:40:50 -07:00
Meng Xu
d5b9c46de9
Increase delay in monitoring LeakedConnection
...
trackLeakedConnection actor should give server enough time to
close its connection due to idle connection.
The current logic waits for at least 24 seconds to detect and close
an idle connection.
The current trackLeakedConnection actor waits for about 30 seconds
to claim LeakedConnection error.
We increase the delay in trackLeakedConnection actor to avoid
false positive error in simulation test.
Co-authored by: Vishesh Yadav
2019-08-23 15:10:39 -07:00
Evan Tschannen
41b908752e
increased move keys parallelism to be less of a decrease just in case lowering this could affect normal data distribution
...
raised target durability lag versions to give more time for batch limiting to come into play before this limit is hit
changed max_bad_options to better reflect the name
2019-08-21 14:55:21 -07:00