Evan Tschannen
afd3ec13ff
added knobs
2020-01-21 18:58:34 -08:00
Evan Tschannen
7a4b459f07
wait for a tls handshake to complete before returning a connection
...
wait for multiple tls errors before throttling
2020-01-21 16:45:15 -08:00
Evan Tschannen
e65760eb46
Merge pull request #2536 from etschannen/feature-commit-latency
...
Improved commit latency in large clusters
2020-01-13 19:12:02 -08:00
Evan Tschannen
0e916fdbed
throttle client TLS errors longer than server errors so that when both happen simultaneously the server throttling will be disabled when the client makes its next attempt
2020-01-12 22:12:18 -08:00
Evan Tschannen
1f7eb1f738
throttle outgoing tls connections before establishing a network connection
...
store serverTLSConnectionThrottler map inside of g_network, so that it works properly with simulation
2020-01-12 16:44:30 -08:00
Evan Tschannen
ef5dfb87dc
Merge pull request #2529 from bnamasivayam/tls-throtlling
...
Establishing TLS connection through the handshake process is expensiv…
2020-01-12 14:56:21 -08:00
Balachandar Namasivayam
741aa523e6
Establishing a TLS connection through the handshake process is expensive, and the fdbserver process can easily become saturated by repeated TLS handshakes when only a few hundred clients have bad certificates. Hence, throttle the number of handshakes done on the server per client IP if the client has a bad certificate.
2020-01-10 16:19:41 -08:00
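The per-client-IP throttling described in the commit above can be illustrated with a small sketch. This is not FoundationDB's code: the class name `HandshakeThrottler`, its parameters `maxErrors` and `throttleSeconds`, and the explicit `now` timestamps are all hypothetical, chosen only to show the idea of refusing handshakes from an address once it has failed several times within a window.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical sketch of per-IP handshake throttling; names and policy are
// assumptions for illustration, not the FoundationDB implementation.
class HandshakeThrottler {
public:
    HandshakeThrottler(int maxErrors, double throttleSeconds)
      : maxErrors_(maxErrors), throttleSeconds_(throttleSeconds) {
        assert(maxErrors > 0 && throttleSeconds > 0);
    }

    // Record a failed handshake from `ip` at time `now` (in seconds).
    void recordFailure(const std::string& ip, double now) {
        Entry& e = entries_[ip];
        if (now >= e.windowEnd) {
            // The previous window has expired: start counting afresh.
            e.errors = 0;
            e.windowEnd = now + throttleSeconds_;
        }
        ++e.errors;
    }

    // True if new handshakes from `ip` should be refused at time `now`.
    bool shouldThrottle(const std::string& ip, double now) const {
        auto it = entries_.find(ip);
        if (it == entries_.end()) return false;
        return now < it->second.windowEnd && it->second.errors >= maxErrors_;
    }

private:
    struct Entry {
        int errors = 0;
        double windowEnd = 0.0;
    };
    int maxErrors_;
    double throttleSeconds_;
    std::unordered_map<std::string, Entry> entries_;
};
```

Requiring more than one failure before throttling means a single transient error does not lock out an otherwise healthy client.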
Evan Tschannen
2e20c12200
Merge pull request #2475 from ajbeamon/priority-busy-fixes
...
Fix PriorityBusy calculation and add PriorityMaxBusy
2020-01-10 12:47:17 -08:00
Evan Tschannen
176a1b6319
Merge pull request #2515 from ajbeamon/remove-timer-in-slowtask-profiler
...
Fix slow task profiler crash
2020-01-10 12:41:57 -08:00
Evan Tschannen
a5f544818c
Merge pull request #2420 from ajbeamon/trace-clock-source-fix
...
Revert change to make g_trace_clock thread_local, ...
2020-01-10 12:36:38 -08:00
A.J. Beamon
de5a591b15
Attempt a minor pointless change to fix the build
2020-01-06 15:17:13 -08:00
A.J. Beamon
6cf38790d6
Reorganize declaration of variable and add release note.
2020-01-06 12:27:56 -08:00
A.J. Beamon
4a52864023
Remove call of timer() from the slow task profiling signal handler, as it can lead to crashes if called at the wrong time.
2020-01-06 12:19:45 -08:00
Evan Tschannen
16b5af067c
changed trace event name
2020-01-03 16:03:29 -08:00
Evan Tschannen
deb032745a
fix: do not set logged until the end of the function
2020-01-03 12:45:23 -08:00
Evan Tschannen
1867d30017
added asserts to protect against future actions on a trace event that has been logged
2020-01-03 12:31:06 -08:00
Evan Tschannen
7152469cc3
log the base trace event before the endpoint messages
2020-01-03 12:15:38 -08:00
Evan Tschannen
6e473c3a83
Merge branch 'release-6.2' into feature-addpeer-fix
2020-01-02 17:37:23 -08:00
Evan Tschannen
032797ca5c
Merge pull request #2430 from etschannen/release-6.2
...
Reduce recovery times caused by saturating the cluster controller
2020-01-02 17:35:59 -08:00
A.J. Beamon
3dd3ac3cfd
Merge branch 'release-6.2' into trace-clock-source-fix
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2020-01-02 15:14:12 -08:00
A.J. Beamon
ca01593067
Cap busyness to 1.0 at logging time to cover all cases where it could be measured above.
2020-01-02 15:10:42 -08:00
Evan Tschannen
9e137d3b49
fix: addPeerReference only marks a connection as healthy if it is the first peerReference
...
added additional logging to long LoadBalance calls, and when the failure monitor state changes for an address
2019-12-19 18:26:29 -08:00
A.J. Beamon
3b28c7f103
Throw the correct error in deleteFile
2019-12-19 14:13:09 -08:00
Evan Tschannen
d8c3c2fda4
Improved prioritization of commit path on the proxies
2019-12-18 16:56:35 -08:00
A.J. Beamon
a093021855
Fix priority time calculation. Track max priority busy rather than seconds squared.
2019-12-17 09:14:54 -08:00
Alvin Moore
5080b8293a
Added Windows required library for function GetProcessMemoryInfo
2019-12-16 08:09:10 -08:00
Evan Tschannen
3c30215662
Merge branch 'release-6.2' of github.com:apple/foundationdb into release-6.2
2019-12-09 13:18:07 -08:00
A.J. Beamon
20eacdb434
Add missing include
2019-12-06 15:18:17 -08:00
A.J. Beamon
9866d1ce27
Revert change to make g_trace_clock thread_local, instead checking we are on the correct thread when getting the time.
2019-12-06 10:15:49 -08:00
Andrew Noyes
78b202f3a4
Apply A.J.'s suggestion to randomInt as well
2019-12-05 11:01:41 -08:00
Andrew Noyes
cf5cdc4e93
Update flow/DeterministicRandom.cpp
...
Include equality now that we've adjusted the value by 1.
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-12-05 11:01:03 -08:00
Andrew Noyes
b09f0b334b
Take A.J.'s suggestion, which fixes A.J.'s counterexample
2019-12-05 10:27:35 -08:00
Andrew Noyes
96e71bb109
Fix old Make build
2019-12-04 17:06:49 -08:00
Andrew Noyes
89a093e035
Accept UBSAN's suggestion
...
/home/anoyes/workspace/foundationdb/flow/DeterministicRandom.cpp:72:29: runtime error: negation of -9223372036854775808 cannot be represented in type 'long int'; cast to an unsigned type to negate this value to itself
2019-12-04 16:45:45 -08:00
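The UBSAN report above concerns negating the most negative 64-bit integer, which overflows in signed arithmetic. A minimal sketch of the usual remedy follows: do the negation in an unsigned type, where wraparound is well defined. The helper name `unsignedMagnitude` is hypothetical and is not taken from `DeterministicRandom.cpp`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: returns the magnitude of v as an unsigned value.
// Negating in uint64_t is modular arithmetic, so it is defined even for
// the minimum int64_t, whose magnitude does not fit in int64_t.
inline std::uint64_t unsignedMagnitude(std::int64_t v) {
    std::uint64_t u = static_cast<std::uint64_t>(v);
    return v < 0 ? (std::uint64_t{0} - u) : u;
}

// Small usage check, including the value from the UBSAN report.
inline void magnitudeExamples() {
    assert(unsignedMagnitude(-5) == 5u);
    assert(unsignedMagnitude(7) == 7u);
    assert(unsignedMagnitude(std::numeric_limits<std::int64_t>::min()) ==
           9223372036854775808ULL);
}
```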
Andrew Noyes
4f943be21d
Move DeterministicRandom impl to its own translation unit
...
This will allow me to recompile faster after making changes, and should
(slightly) speed up overall compilation.
I manually verified that the unseed matched for one test before and
after this change, so I probably didn't screw up the refactor
2019-12-04 16:45:32 -08:00
Evan Tschannen
5a6bc2aa71
increase the priority of cluster controller recruitment to prefer recruitment over sending serverDBInfo
2019-12-04 16:28:41 -08:00
Evan Tschannen
5f1ef53f62
increase the priority at which the cluster controller registers workers to avoid having a saturated cluster controller recruit a master without all available workers
2019-12-04 16:17:41 -08:00
Andrew Noyes
46d10dc7dc
Fix "null passed as argument declared not null"
...
Fix several such reports from ubsan
E.g.
/Users/anoyes/workspace/foundationdb/flow/Arena.h:794:16: runtime error: null pointer passed as argument 1, which is declared to never be null
2019-12-03 14:46:53 -08:00
Andrew Noyes
4022ca1381
Add USE_UBSAN cmake option
2019-12-02 16:20:06 -08:00
Andrew Noyes
7f263a2614
Require callers to meet alignment requirements for aligned_alloc
2019-12-02 15:30:54 -08:00
Andrew Noyes
ff8758b1fd
Request alignment that's at least sizeof(void*)
...
According to https://en.cppreference.com/w/c/memory/aligned_alloc#Notes ,
aligned_alloc may return nullptr if it doesn't like the requested
alignment. Let's also detect if nullptr is returned.
2019-12-02 12:51:33 -08:00
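The idea in the commit above (request an alignment of at least `sizeof(void*)` and check for a null return) could be sketched as follows. The wrapper name `alignedAllocOrThrow` is hypothetical, the rounding of the size to a multiple of the alignment is an extra precaution not mentioned in the commit, and `std::aligned_alloc` is assumed to be available (C++17, not provided by every platform).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical wrapper, for illustration only. `alignment` is assumed to be
// a power of two. Release the result with std::free.
inline void* alignedAllocOrThrow(std::size_t alignment, std::size_t size) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Never request less than pointer alignment.
    alignment = std::max(alignment, sizeof(void*));
    // Round the size up to a non-zero multiple of the alignment.
    std::size_t rounded = (size + alignment - 1) / alignment * alignment;
    if (rounded == 0) rounded = alignment;
    void* p = std::aligned_alloc(alignment, rounded);
    // aligned_alloc may return nullptr; surface that instead of ignoring it.
    if (p == nullptr) throw std::bad_alloc();
    return p;
}
```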
Meng Xu
a2c84d932d
Merge pull request #2396 from atn34/atn34/fix-invalid-vptr
...
Fix invalid vptr
2019-11-27 21:16:49 -08:00
Meng Xu
7eaf76bacf
Merge pull request #2389 from atn34/atn34/default-init-flatbuffers
...
Default initialize absent flatbuffers members
2019-11-27 21:16:27 -08:00
Andrew Noyes
8fc74e3182
Fix UBSAN error
...
Since QuorumCallback<T> is a non trivial type, we need to construct it
before we interact with it
This change fixes the following UBSAN message
/Users/anoyes/workspace/foundationdb/flow/genericactors.actor.h:930:18: runtime error: member access within address 0x0001243f63d0 which does not point to an object of type 'Callback<Standalone<StringRef> >'
0x0001243f63d0: note: object has invalid vptr
2019-11-27 13:16:48 -08:00
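The "invalid vptr" report above arises when raw memory is treated as an object of a polymorphic type before a constructor has run. A minimal sketch of the construct-then-use pattern follows; the `Callback` type here is a hypothetical stand-in and is not the `Callback<T>` from `genericactors.actor.h`.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical polymorphic type standing in for a non-trivial callback.
struct Callback {
    virtual ~Callback() = default;
    virtual int fire() { return 1; }
};

inline int constructThenUse() {
    void* raw = std::malloc(sizeof(Callback));
    if (raw == nullptr) throw std::bad_alloc();
    // Placement new runs the constructor, which sets up the vptr.
    // Calling fire() through `raw` without this step would be undefined.
    Callback* cb = new (raw) Callback();
    int result = cb->fire();
    assert(result == 1);
    cb->~Callback(); // end the object's lifetime before freeing storage
    std::free(raw);
    return result;
}
```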
Andrew Noyes
c4e01301b0
Fix a potential UB instance
...
Writing a value which is not 0 or 1 to the underlying memory of a bool
is undefined behavior. Conformant flatbuffers implementations must
accept bytes that are not 0 or 1 as booleans [1]. (Conformant
implementations are only allowed to write the byte 0 or 1 as a boolean
[1])
So this protects us from undefined behavior if we ever read a
flatbuffers message written by an almost-conformant implementation.
[1]: https://github.com/dvidelabs/flatcc/blob/master/doc/binary-format.md#boolean
2019-11-26 11:18:17 -08:00
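The concern in the commit above is that copying an arbitrary byte directly into a `bool` is undefined when the byte is neither 0 nor 1. One way to avoid that, sketched here with a hypothetical helper `loadBool` (not the project's flatbuffers code), is to read the byte as an integer and compare it with zero.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode a serialized boolean without ever storing a
// byte other than 0 or 1 in a bool object.
inline bool loadBool(const std::uint8_t* src) {
    assert(src != nullptr);
    std::uint8_t raw;
    std::memcpy(&raw, src, sizeof(raw));
    return raw != 0; // any non-zero byte is treated as true
}
```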
Andrew Noyes
17ab2f8e00
Default initialize absent flatbuffers members
2019-11-26 10:58:29 -08:00
Evan Tschannen
27cb299d84
simulation can sometimes randomly hang or throw connection_failed, instead of always doing one or the other
2019-11-21 16:24:18 -08:00
Evan Tschannen
2727b91c46
simulation tests network connections failing due to errors instead of just hanging
2019-11-21 12:33:07 -08:00
Evan Tschannen
1e5677b55a
increase the priority of reboot and recruitment requests
2019-11-11 15:17:11 -08:00
mpilman
325a8e4213
remove confusing USE_ODIRECT knob
2019-10-24 11:44:03 -07:00