
916 Commits

Author SHA1 Message Date
Evan Tschannen afd3ec13ff added knobs 2020-01-21 18:58:34 -08:00
Evan Tschannen 7a4b459f07 wait for a tls handshake to complete before returning a connection
wait for multiple tls errors before throttling
2020-01-21 16:45:15 -08:00
Evan Tschannen e65760eb46
Merge pull request #2536 from etschannen/feature-commit-latency
Improved commit latency in large clusters
2020-01-13 19:12:02 -08:00
Evan Tschannen 0e916fdbed throttle client TLS errors longer than server errors so that when both happen simultaneously the server throttling will be disabled when the client makes its next attempt 2020-01-12 22:12:18 -08:00
Evan Tschannen 1f7eb1f738 throttle outgoing tls connections before establishing a network connection
store serverTLSConnectionThrottler map inside of g_network, so that it works properly with simulation
2020-01-12 16:44:30 -08:00
Evan Tschannen ef5dfb87dc
Merge pull request #2529 from bnamasivayam/tls-throtlling
Establishing TLS connection through the handshake process is expensiv…
2020-01-12 14:56:21 -08:00
Balachandar Namasivayam 741aa523e6 Establishing a TLS connection through the handshake process is expensive, and the fdbserver process can easily become saturated by repeated TLS handshakes when only a few hundred clients have bad certificates. Hence, throttle the number of handshakes done on the server per client IP if the client has a bad certificate. 2020-01-10 16:19:41 -08:00
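A minimal sketch of per-client-IP handshake throttling of the kind this commit describes. It assumes a hypothetical `HandshakeThrottler` class, a failure threshold, and a fixed throttle window measured in caller-supplied seconds. None of these names or constants come from FoundationDB, and the real implementation and its knobs may differ.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch: count failed TLS handshakes (e.g. bad certificates)
// per client IP and refuse further handshakes from that IP for a while once
// it has failed `failuresBeforeThrottle` times.
class HandshakeThrottler {
public:
    HandshakeThrottler(int failuresBeforeThrottle, double throttleSeconds)
        : failuresBeforeThrottle_(failuresBeforeThrottle), throttleSeconds_(throttleSeconds) {
        assert(failuresBeforeThrottle_ >= 1);
    }

    // True if a handshake from `ip` may proceed at time `now` (seconds).
    bool allowHandshake(const std::string& ip, double now) const {
        auto it = state_.find(ip);
        return it == state_.end() || now >= it->second.throttledUntil;
    }

    // Record a failed handshake from `ip` at time `now`.
    void recordFailure(const std::string& ip, double now) {
        State& s = state_[ip];
        if (++s.failures >= failuresBeforeThrottle_) {
            s.throttledUntil = now + throttleSeconds_;
            s.failures = 0; // start counting again after this throttle window
        }
    }

    // A successful handshake clears the client's failure history.
    void recordSuccess(const std::string& ip) { state_.erase(ip); }

private:
    struct State {
        int failures = 0;
        double throttledUntil = 0.0;
    };
    int failuresBeforeThrottle_;
    double throttleSeconds_;
    std::map<std::string, State> state_;
};
```

Requiring more than one failure before throttling avoids penalizing a client for a single transient error; the threshold and window here are illustrative only.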
Evan Tschannen 2e20c12200
Merge pull request #2475 from ajbeamon/priority-busy-fixes
Fix PriorityBusy calculation and add PriorityMaxBusy
2020-01-10 12:47:17 -08:00
Evan Tschannen 176a1b6319
Merge pull request #2515 from ajbeamon/remove-timer-in-slowtask-profiler
Fix slow task profiler crash
2020-01-10 12:41:57 -08:00
Evan Tschannen a5f544818c
Merge pull request #2420 from ajbeamon/trace-clock-source-fix
Revert change to make g_trace_clock thread_local, ...
2020-01-10 12:36:38 -08:00
A.J. Beamon de5a591b15 Attempt a minor pointless change to fix the build 2020-01-06 15:17:13 -08:00
A.J. Beamon 6cf38790d6 Reorganize declaration of variable and add release note. 2020-01-06 12:27:56 -08:00
A.J. Beamon 4a52864023 Remove call of timer() from the slow task profiling signal handler, as it can lead to crashes if called at the wrong time. 2020-01-06 12:19:45 -08:00
Evan Tschannen 16b5af067c changed trace event name 2020-01-03 16:03:29 -08:00
Evan Tschannen deb032745a fix: do not set logged until the end of the function 2020-01-03 12:45:23 -08:00
Evan Tschannen 1867d30017 added asserts to protect against future actions on a trace event that has been logged 2020-01-03 12:31:06 -08:00
Evan Tschannen 7152469cc3 log the base trace event before the endpoint messages 2020-01-03 12:15:38 -08:00
Evan Tschannen 6e473c3a83 Merge branch 'release-6.2' into feature-addpeer-fix 2020-01-02 17:37:23 -08:00
Evan Tschannen 032797ca5c
Merge pull request #2430 from etschannen/release-6.2
Reduce recovery times caused by saturating the cluster controller
2020-01-02 17:35:59 -08:00
A.J. Beamon 3dd3ac3cfd Merge branch 'release-6.2' into trace-clock-source-fix
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2020-01-02 15:14:12 -08:00
A.J. Beamon ca01593067 Cap busyness to 1.0 at logging time to cover all cases where it could be measured above. 2020-01-02 15:10:42 -08:00
Evan Tschannen 9e137d3b49 fix: addPeerReference only marks a connection as healthy if it is the first peerReference
added additional logging to long LoadBalance calls, and when the failure monitor state changes for an address
2019-12-19 18:26:29 -08:00
A.J. Beamon 3b28c7f103 Throw the correct error in deleteFile 2019-12-19 14:13:09 -08:00
Evan Tschannen d8c3c2fda4 Improved prioritization of commit path on the proxies 2019-12-18 16:56:35 -08:00
A.J. Beamon a093021855 Fix priority time calculation. Track max priority busy rather than seconds squared. 2019-12-17 09:14:54 -08:00
Alvin Moore 5080b8293a Added Windows required library for function GetProcessMemoryInfo 2019-12-16 08:09:10 -08:00
Evan Tschannen 3c30215662 Merge branch 'release-6.2' of github.com:apple/foundationdb into release-6.2 2019-12-09 13:18:07 -08:00
A.J. Beamon 20eacdb434 Add missing include 2019-12-06 15:18:17 -08:00
A.J. Beamon 9866d1ce27 Revert change to make g_trace_clock thread_local, instead checking we are on the correct thread when getting the time. 2019-12-06 10:15:49 -08:00
Andrew Noyes 78b202f3a4 Apply A.J.'s suggestion to randomInt as well 2019-12-05 11:01:41 -08:00
Andrew Noyes cf5cdc4e93
Update flow/DeterministicRandom.cpp
Include equality now that we've adjusted the value by 1.

Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-12-05 11:01:03 -08:00
Andrew Noyes b09f0b334b Take A.J.'s suggestion, which fixes A.J.'s counterexample 2019-12-05 10:27:35 -08:00
Andrew Noyes 96e71bb109 Fix old Make build 2019-12-04 17:06:49 -08:00
Andrew Noyes 89a093e035 Accept UBSAN's suggestion
/home/anoyes/workspace/foundationdb/flow/DeterministicRandom.cpp:72:29: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int'; cast to an unsigned type to negate this value to itself
2019-12-04 16:45:45 -08:00
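A sketch of the idiom the UBSAN message above recommends: cast to an unsigned type before negating. The function name `magnitude` is hypothetical, and this is not the actual DeterministicRandom.cpp code. Negating `INT64_MIN` as a signed value overflows, because +2^63 does not fit in `int64_t`. Unsigned arithmetic wraps modulo 2^64, so the same negation on `uint64_t` is well defined.

```cpp
#include <cstdint>

// Magnitude of a signed 64-bit value, computed without signed overflow.
// For negative x, 0 - (uint64_t)x wraps modulo 2^64 and yields |x|,
// including |INT64_MIN| == 2^63, which is not representable as int64_t.
std::uint64_t magnitude(std::int64_t x) {
    std::uint64_t u = static_cast<std::uint64_t>(x);
    return x < 0 ? std::uint64_t{0} - u : u;
}
```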
Andrew Noyes 4f943be21d Move DeterministicRandom impl to its own translation unit
This will allow me to recompile faster after making changes, and should
(slightly) speed up overall compilation.

I manually verified that the unseed matched for one test before and
after this change, so I probably didn't screw up the refactor
2019-12-04 16:45:32 -08:00
Evan Tschannen 5a6bc2aa71 increase the priority of cluster controller recruitment to prefer recruitment over sending serverDBInfo 2019-12-04 16:28:41 -08:00
Evan Tschannen 5f1ef53f62 increase the priority at which the cluster controller registers workers to avoid having a saturated cluster controller recruit a master without all available workers 2019-12-04 16:17:41 -08:00
Andrew Noyes 46d10dc7dc Fix "null passed as argument declared not null"
Fix several such reports from ubsan

E.g.

/Users/anoyes/workspace/foundationdb/flow/Arena.h:794:16: runtime error: null pointer passed as argument 1, which is declared to never be null
2019-12-03 14:46:53 -08:00
Andrew Noyes 4022ca1381 Add USE_UBSAN cmake option 2019-12-02 16:20:06 -08:00
Andrew Noyes 7f263a2614 Require callers to meet alignment requirements for aligned_alloc 2019-12-02 15:30:54 -08:00
Andrew Noyes ff8758b1fd Request alignment that's at least sizeof(void*)
According to https://en.cppreference.com/w/c/memory/aligned_alloc#Notes,
aligned_alloc may return nullptr if it doesn't like the requested
alignment. Let's also detect if nullptr is returned.
2019-12-02 12:51:33 -08:00
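One way to honor the constraints this commit mentions is sketched below. It uses a hypothetical `allocateAligned` wrapper around `std::aligned_alloc` (C++17), not FoundationDB's allocator. The cppreference notes linked above state that POSIX-based implementations inherit `posix_memalign`'s requirement that the alignment be a power of two and a multiple of `sizeof(void*)`. C11 as originally published also required the size to be a multiple of the alignment, so the sketch rounds both up and checks for a null result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical wrapper: raise the alignment to at least sizeof(void*),
// round the size up to a multiple of the alignment, and turn a nullptr
// result into std::bad_alloc. Memory is released with std::free.
// (Overflow of `size + alignment - 1` is not handled in this sketch.)
void* allocateAligned(std::size_t alignment, std::size_t size) {
    // The requested alignment must be a nonzero power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (alignment < sizeof(void*)) alignment = sizeof(void*);
    std::size_t rounded = (size + alignment - 1) / alignment * alignment;
    if (rounded == 0) rounded = alignment; // avoid a zero-size request
    void* p = std::aligned_alloc(alignment, rounded);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}
```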
Meng Xu a2c84d932d
Merge pull request #2396 from atn34/atn34/fix-invalid-vptr
Fix invalid vptr
2019-11-27 21:16:49 -08:00
Meng Xu 7eaf76bacf
Merge pull request #2389 from atn34/atn34/default-init-flatbuffers
Default initialize absent flatbuffers members
2019-11-27 21:16:27 -08:00
Andrew Noyes 8fc74e3182 Fix UBSAN error
Since QuorumCallback<T> is a non-trivial type, we need to construct it
before we interact with it.

This change fixes the following UBSAN message
/Users/anoyes/workspace/foundationdb/flow/genericactors.actor.h:930:18: runtime error: member access within address 0x0001243f63d0 which does not point to an object of type 'Callback<Standalone<StringRef> >'
0x0001243f63d0: note: object has invalid vptr
2019-11-27 13:16:48 -08:00
Andrew Noyes c4e01301b0 Fix a potential UB instance
Writing a value which is not 0 or 1 to the underlying memory of a bool
is undefined behavior. Conformant flatbuffers implementations must
accept bytes that are not 0 or 1 as booleans [1]. (Conformant
implementations are only allowed to write the byte 0 or 1 as a boolean
[1])

So this protects us from undefined behavior if we ever read a
flatbuffers message written by an almost-conformant implementation.

[1]: https://github.com/dvidelabs/flatcc/blob/master/doc/binary-format.md#boolean
2019-11-26 11:18:17 -08:00
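A minimal sketch of the defensive decoding this commit motivates. It assumes a hypothetical raw byte buffer and the helper names `readBoolField` and `writeBoolField`, which are not the flatbuffers or FoundationDB API. The reader never copies a byte straight into a `bool`'s storage. It compares the byte against zero, so any byte an almost-conformant writer produces (for example 0x02) still decodes to a valid `bool`. The writer emits only 0 or 1, as the cited format requires of conformant implementations.

```cpp
#include <cstddef>
#include <cstdint>

// Decode a boolean stored as one byte: any nonzero byte means true.
bool readBoolField(const std::uint8_t* buffer, std::size_t offset) {
    return buffer[offset] != 0;
}

// Encode a boolean as exactly 0 or 1.
void writeBoolField(std::uint8_t* buffer, std::size_t offset, bool value) {
    buffer[offset] = value ? 1 : 0;
}
```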
Andrew Noyes 17ab2f8e00 Default initialize absent flatbuffers members 2019-11-26 10:58:29 -08:00
Evan Tschannen 27cb299d84 simulation can sometimes randomly hang or throw connection_failed, instead of always doing one or the other 2019-11-21 16:24:18 -08:00
Evan Tschannen 2727b91c46 simulation tests network connections failing due to errors instead of just hanging 2019-11-21 12:33:07 -08:00
Evan Tschannen 1e5677b55a increase the priority of reboot and recruitment requests 2019-11-11 15:17:11 -08:00
mpilman 325a8e4213 remove confusing USE_ODIRECT knob 2019-10-24 11:44:03 -07:00