Commit Graph

90 Commits

Author SHA1 Message Date
Evan Tschannen c3299b8ebe if tls cannot be initialized, throw an error from createDatabase 2020-02-26 18:53:06 -08:00
Evan Tschannen 65fbe0d0bc revert AcceptSocket priority change because of bad performance results 2020-02-21 19:22:14 -08:00
Evan Tschannen dc3826e2fd fix: tls throttling would re-insert the failure into the map 2020-02-20 18:17:39 -08:00
Evan Tschannen f04e311a1e Merge commit 'b46d6e25e24993ab5a5f04091fd3235050b7cd09' into feature-boost-ssl
# Conflicts:
#	fdbserver/SimulatedCluster.actor.cpp
#	flow/Net2.actor.cpp
2020-02-20 17:36:38 -08:00
Evan Tschannen 7d54acf4ca removed an unnecessary yield 2020-02-20 14:41:49 -08:00
Evan Tschannen 69b5a1fbe3 more priority improvements 2020-02-20 10:11:43 -08:00
Evan Tschannen fd8a58b035 re-added support for the TLS_DISABLED flag 2020-02-19 18:37:47 -08:00
Evan Tschannen 761da5a059 code cleanup 2020-02-19 17:59:45 -08:00
Alex Miller 88d36af9c7 Fix --tls_password and add better error logging
This refactors all tls settings into a TLSParams object so that we can
set the password before loading any certificates.

It turns out that the FDBLibTLS code did really nice things with error
logging, but I just didn't understand openssl enough before to realize
what pieces I should be copying.
2020-02-19 00:57:05 -08:00
Evan Tschannen 693e469003 Changed the handshake lock to a BoundedFlowLock, which will enforce that old handshakes complete before starting to initiate new handshakes 2020-02-14 16:49:52 -08:00
Evan Tschannen dcbce3593e fixed TLS in simulation 2020-02-10 14:00:21 -08:00
Alex Miller e390dbd36c Add a non-FDBLibTLS verify peers framework to new TLS impl 2020-02-06 21:06:52 -08:00
Evan Tschannen 38d8d0d675 fixed simulation 2020-02-06 19:29:31 -08:00
Evan Tschannen 69de430057 separate handshaking from connection to improve pipelining 2020-02-06 16:45:54 -08:00
Evan Tschannen 53d0867a17 limit the number of connections a process can attempt to establish in parallel 2020-02-04 18:15:10 -08:00
Evan Tschannen 84853dd1fd switched SSL implementation to use boost ssl 2020-02-04 14:56:40 -08:00
Evan Tschannen 231d7830a0 more accurate calculation on the amount of time that proxy should wait before getting a version from the master 2020-01-26 19:47:12 -08:00
Evan Tschannen 7a4b459f07 wait for a tls handshake to complete before returning a connection
wait for multiple tls errors before throttling
2020-01-21 16:45:15 -08:00
Evan Tschannen e65760eb46
Merge pull request #2536 from etschannen/feature-commit-latency
Improved commit latency in large clusters
2020-01-13 19:12:02 -08:00
Evan Tschannen 1f7eb1f738 throttle outgoing tls connections before establishing a network connection
store serverTLSConnectionThrottler map inside of g_network, so that it works properly with simulation
2020-01-12 16:44:30 -08:00
Evan Tschannen 2e20c12200
Merge pull request #2475 from ajbeamon/priority-busy-fixes
Fix PriorityBusy calculation and add PriorityMaxBusy
2020-01-10 12:47:17 -08:00
Evan Tschannen d8c3c2fda4 Improved prioritization of commit path on the proxies 2019-12-18 16:56:35 -08:00
A.J. Beamon a093021855 Fix priority time calculation. Track max priority busy rather than seconds squared. 2019-12-17 09:14:54 -08:00
Evan Tschannen 5a6bc2aa71 increase the priority of cluster controller recruitment to prefer recruitment over sending serverDBInfo 2019-12-04 16:28:41 -08:00
Evan Tschannen 5f1ef53f62 increase the priority at which the cluster controller registers workers to avoid having a saturated cluster controller recruit a master without all available workers 2019-12-04 16:17:41 -08:00
Evan Tschannen 1e5677b55a increase the priority of reboot and recruitment requests 2019-11-11 15:17:11 -08:00
A.J. Beamon fa6e45a852 Separate AsioReactor sleep and react into two different functions. Track slow tasks and time spent in react, track time spent in launch. Don't track react time at priority 0. 2019-08-28 14:35:48 -07:00
Evan Tschannen da8163fd5a allow one request every second to skip its all_alternatives_failed delay, because if a client has a timeout longer than the delay we will never invalidate the key servers cache 2019-08-09 13:03:40 -07:00
mpilman 370ba8b841 Remove --object-serializer flag from executables 2019-08-06 09:25:40 -07:00
Evan Tschannen 98c3b24036
Merge pull request #1869 from alexmiller-apple/sharded-txs-performance
Raise the priority of TLogRejoin above TLogPeekReply
2019-07-26 13:30:13 -07:00
Alex Miller df7f0cffa1 Raise the priority of TLogRejoin above the default work priority.
With sharded txs tags, the master now receives data from transaction
logs at an order of magnitude higher rate.  This is the intentional
desired result of sharding the txs tag.  With a sufficient number of
TLogs, the master will saturate its CPU time handling the peek
responses.

Performance tests revealed some unstable oddities in how long a recovery
would take, which was eventually root caused to a priority inversion
between TLogRejoin requests and TLog peek replies.

Once peek replies saturate the CPU, the master would proceed to ignore
further TLogRejoin messages.  TLogRejoin is what marks a TLog as
available to the failure monitor, which is also what decides between a
ServerPeekCursor and a MergePeekCursor for a SetPeekCursor.  Ignoring
TLogRejoins meant that the sharded txs locality tags for those servers
would be merge peeked over all TLogs.  This is much less efficient than
just peeking one copy of data from the one preferred server.

Depending on the race between TLogPeek replies saturating the CPU and
TLogRejoin requests being submitted, a variable number of tags would be
affected, and thus the performance test would have some variance in its
results.
2019-07-19 16:55:04 -07:00
Alex Miller c3a8ae4752
Merge pull request #1791 from fzhjon/fetch-keys-requests-priority
Introduce priority to fetchKeys requests from data distribution
2019-07-19 14:54:51 -07:00
mpilman 6a4a129cf5 fixed silly boost visitor bug 2019-07-16 15:10:55 -07:00
mpilman b18666d942 statically link libstdc++ on Linux and remove std::variant
this will hopefully fix #1610
2019-07-16 14:53:16 -07:00
Jon Fu f707d186fe added new priority for fetchkeys requests and adjusted ddmetrics workload to run parallel with mako 2019-07-11 09:56:58 -07:00
A.J. Beamon 69d7c4f79c Merge branch 'master' into track-run-loop-busyness
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	flow/Net2.actor.cpp
#	flow/network.h
2019-07-09 18:39:23 -07:00
Alex Miller 888f4f92e0 Fix errors and TaskPriority more priorities. 2019-07-03 21:03:58 -07:00
Alex Miller ea6898144d Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-07-03 20:44:15 -07:00
Jingyu Zhou 5ea2e69016 Remove an fdbrpc header from flow library
Flow should be an independent library.
2019-07-03 19:56:38 -07:00
Evan Tschannen 3fb0999e10 revert storage server priority changes 2019-07-02 16:54:47 -07:00
Alex Miller 8e1ab6e7db Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-28 17:32:54 -07:00
Evan Tschannen 4cef1d3937 Experimental change of storage write priority 2019-06-28 16:54:22 -07:00
A.J. Beamon 7f23814841 Track run loop busyness and report it in status. 2019-06-26 14:03:02 -07:00
Alex Miller bf883d7055 Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-25 14:26:50 -07:00
Evan Tschannen 0fe6edc254
Merge pull request #1678 from mpilman/features/external-workload
Features/external workload
2019-06-25 13:53:19 -07:00
Alex Miller 7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and make it easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
mpilman 2eff2b7e21 First simple test is working (but very buggy) 2019-06-19 13:03:41 -07:00
mpilman 68ce9a5e75 ProtocolVersion type - second try 2019-06-18 17:55:27 -07:00
Vishesh Yadav a8e408e268 run clang-format on changes 2019-06-10 14:10:24 -07:00
Vishesh Yadav 6b4d30c3ae failmon: Identify client vs server when starting failure monitoring client 2019-06-09 00:43:12 -07:00