Commit Graph

122 Commits

Author SHA1 Message Date
Vaidas Gasiunas e28a8401fb
Update coordinator list from cluster file (#7382)
* Log failed connection attempts in monitorProxies

* Update coordinator list from the cluster file after failing to connect to all coordinators

* Wiggle and upgrade test with legacy version monitoring; updating tests to use 7.1.9

* Update coordinator list from the cluster file: addressing review comments

* Update coordinator list from the cluster file: addressing review comments

* Wait on future for all setAndPersistConnectionString calls
2022-06-23 09:22:09 +02:00
Renxuan Wang 839af5701e
Fix bug in resolveTCPEndpoint() when hostname resolving fails. (#7375)
* Close trace file when error happens in runNetwork().

* Improve the bestCount algorithm in getLeader().

In the current implementation, if the nominees are [0,1], the chosen leader will be 1, which is an exception to other cases and our expectation that if 2 nominees have the same frequency, the one with lower id will be the leader.

* Remove unnecessary new statement.

stream will never be a nullptr.

* Move self->dnsCache out of lambda capture.

Member variables are not capture by default, thus, `host` and `service` are not captured. This somehow successfully compile, but throws std::bad_alloc or basic_string::_S_create exceptions when we call `host+":"+service` in dnsCache.remove().

* Revert unintended change.

* Address comments.
2022-06-13 20:24:30 -07:00
Renxuan Wang df4e0deb4d
coordinatorsKey should not always store IP addresses. (#7204)
* coordinatorsKey should not storing IP addresses.

Currently, when we do a commit of coordinator change, we are always converting hostnames to IP addresses and store the converted results in coordinatorsKey (\xff/coordinators). This result in ForwardRequest also sending IP addresses, and receivers will update their cluster files with IPs, then we lose the dynamic IP feature.

* Remove the legacy coordinators() function.

* Update async_resolve().

ip::basic_resolver::async_resolve(const query & q, ResolveHandler && handler) is deprecated.

* Clean code format.

* Fix typo.

* Remove SpecifiedQuorumChange and NoQuorumChange.
2022-05-23 11:42:56 -07:00
A.J. Beamon 993a79b42c
Restart protocol monitoring logic when we cycle the coordinator, even if its the same coordinator. Add some defensive code in case we fetch the protocol version for an absent peer. (#6397) 2022-05-19 10:00:23 +02:00
Renxuan Wang 30e124c09b Remove HostnameStatus and resolve trigger.
They are no longer needed since we have coordinators DNS cache; and they are introducing complex crashes.
2022-05-12 10:13:55 -07:00
Renxuan Wang 154de018ff
One place in PaxosConfigConsumer was missed out in #6926. (#7006)
* One place in PaxosConfigConsumer was missed out.

* Minor improvements.
2022-04-28 18:32:55 -07:00
Renxuan Wang c69a07a858
Check in the new Hostname logic. (#6926)
* Revert #6655.

20220407-031010-renxuan-c101052c21da8346           compressed=True data_size=31004844 duration=4310801 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:04:15 sanity=False started=100047 stopped=20220407-041425 submitted=20220407-031010 timeout=5400 username=renxuan

* Revert #6271.

20220407-051532-renxuan-470f0fe6aac1c217           compressed=True data_size=30982370 duration=3491067 ended=100002 fail_fast=10 max_runs=100000 pass=100002 priority=100 remaining=0 runtime=0:59:57 sanity=False started=100141 stopped=20220407-061529 submitted=20220407-051532 timeout=5400 username=renxuan

* Revert #6266.

Remove resolving-related functionalities in connection string. Connection string will be used for storing purpose only, and non-mutable.

20220407-175119-renxuan-55d30ee1a4b42c2f           compressed=True data_size=30970443 duration=5437659 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:59:31 sanity=False started=100154 stopped=20220407-185050 submitted=20220407-175119 timeout=5400 username=renxuan

* Add hostname to coordinator interfaces.

* Turn on the new hostname logic.

* Add the corresponding change in config txns.

The most notable change is before calling basicLoadBalance(), we need to call tryInitializeRequestStream() to initialize request streams first.

Passed correctness tests.

* Return error when hostnames cannot be resolved in coordinators command.

* Minor fixes.
2022-04-27 21:54:13 -07:00
Renxuan Wang e40cc8722c
A few hostname improvements. (#6825)
* Add tryResolveHostnames() in connection string.

* Add missing hostname to related interfaces.

* Do not pass RequestStream into *GetReplyFromHostname() functions.

Because we are using new RequestStream for each request anyways. Also, the passed in pointer could be nullptr, which results in seg faults.

* Add dynamic hostname resolve and reconnect intervals.

* Address comments.
2022-04-20 13:42:46 -07:00
Jingyu Zhou 17dc1a61f3 ClientDBInfo may be unintentionally not set
The ClientDBInfo's comparison is through an internal UID and shrinkProxyList()
can change proxies inside ClientDBInfo. Since the UID is not changed by that
function, subsequent set can be unintentionally skipped.

This was not a big issue before. However, VV introduces a change that the
client side compares the returned proxy ID with its known set of GRV proxies
and will retry GRV if the returned proxy ID is not in the set. Due the above
bug, GRV returned by a proxy is not within the client set, and results in
indefinite retrying GRVs.
2022-04-18 09:09:14 -07:00
Renxuan Wang 938e8ed996 Do not throw lookup_failed when resolving fails.
Instead, return an empty Optional<NetworkAddress>. For resolveWithRetry(), still return NetworkAddress because it retries until succeed.
2022-04-08 14:21:49 -07:00
Renxuan Wang 465ff712b6
Move Hostname to its own files. (#6759)
* Change DNS cache to use std::map.

Revert commit 90c259d84e, because if we use unordered_map, toString() can be inconsistent.

* Move ClientKnob::COORDINATOR_HOSTNAME_RESOLVE_DELAY to FlowKnob::HOSTNAME_RESOLVE_DELAY.

* Move Hostname to its own files.

Also, add resolve-related variables and functions in Hostname.
2022-04-04 19:04:51 -07:00
Renxuan Wang 2a59c5fd4e
Workers should monitor coordinators in submitCandidacy(). (#6655)
* Workers should monitor coordinators in submitCandidacy().

* Change re-resolve delay to a knob.
2022-03-24 19:20:42 -07:00
sfc-gh-tclinkenbeard a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
Renxuan Wang 3e761aef3b Add resetConnectionString() function to ClusterConnectionString. 2022-02-24 23:02:29 -08:00
A.J. Beamon 250a88e682 Enforce that trace event suppression calls happen first when using trace event call chaining. Fix various instances where we weren't following this requirement. 2022-02-24 12:25:52 -08:00
Renxuan Wang 3c1394578b Address comments. 2022-02-22 16:29:59 -08:00
Renxuan Wang 622d89b552 Rebase on main.
Since we changed ClusterConnectionString's status flag from boolean to enum in #6422, we need to update this PR correspondingly.
2022-02-22 16:29:59 -08:00
Renxuan Wang 8eb7a10404 Address comments. 2022-02-22 16:29:59 -08:00
Renxuan Wang 481587a8c6 Turn on hostname logic. 2022-02-22 16:29:59 -08:00
Renxuan Wang c29c522d3e Change ClusterConnectionString's status flag from boolean to enum.
So that when multiple threads try to resolve hostnames concurrently, we should have only one resolving and others wait on the result. Otherwise, they would add duplicated resolved results to coordinators.
2022-02-21 10:53:03 -08:00
Renxuan Wang f9f3735f73 Add resolveHostnamesBlocking() in ConnectionString and IClusterConnectionRecord.
Also, combine IClusterConnectionRecord::getConnectionString() and IClusterConnectionRecord::getMutableConnectionString() to IClusterConnectionRecord::getConnectionString(), and rename setConnectionString() to setAndPersistConnectionString().
2022-01-28 12:20:41 -08:00
Renxuan Wang 8c49391c51 Address comments. 2022-01-20 19:18:15 -08:00
Renxuan Wang 02f72dd377 Address comments. 2022-01-20 19:18:15 -08:00
Renxuan Wang a6c482ee91 Add the support of using hostname in ConnectionString.
This PR comes without simulation tests except some unit tests. The simulation tests will be in the PR that uses hostname in code logic.
2022-01-20 19:18:15 -08:00
Renxuan Wang 9f70e84a7b Remove trimFromHostname(). 2021-12-07 15:39:51 -08:00
Renxuan Wang 85dff214a4 Address comments. 2021-11-04 11:42:28 -07:00
Renxuan Wang f15ceb5489 Add Hostname struct, and fromHostname in NetworkAddress struct. 2021-11-04 11:42:28 -07:00
Renxuan Wang f503ba6a7c Move getConnectionString() to IClusterConnectionRecord.
ClusterConnectionString was moved to IClusterConnectionRecord in PR #5853.
2021-10-27 12:59:52 -07:00
A.J. Beamon 9358adcf49 Address some review comments. 2021-10-22 11:05:18 -07:00
A.J. Beamon e882eb33fc Abstract the cluster file into a cluster connection record that can be backed by something other than the filesystem. 2021-10-22 11:05:18 -07:00
Vaidas Gasiunas 16171d8252 Refactoring well-known endpoint registration
- List all well-known endpoints of FDB in a single enum
- Identify well-known endpoints by plain IDs
2021-09-21 11:05:31 -06:00
Xiaoge Su abf73047ca Enforce std:: specifier rather than using namespace 2021-09-16 19:40:28 -07:00
Renxuan Wang 96fcde45c2 Minor leader election code improvements.
1. Rename monitorLeaderRemotely* functions to monitorLeaderWithDelayedCandidacy*. "Remote" is not clearly describing what the functions are doing;
2. Rename monitorLeaderForProxies() to monitorLeaderAndGetClientInfo() to better describe the function;
3. Remove monitorLeaderRemotelyInternal() and monitorLeaderRemotely() in MonitorLeader.actor.cpp, to eliminate code duplication. They already exist in worker.actor.cpp;
4. Move the declaration of getLeader() from LeaderElection.actor.cpp to MonitorLeader.h;
5. Update a few comments.
2021-09-16 15:34:45 -07:00
FDB Formatster 69508b980f modify comments to make clang-format and coverage tool play nice 2021-08-27 17:18:00 -07:00
sfc-gh-tclinkenbeard 3c1cabf041 Coordinator lets client know if it cannot communicate with cluster controller 2021-07-13 17:26:28 -07:00
sfc-gh-tclinkenbeard 9379bab04e Move trim to anonymous namespace 2021-07-13 14:28:46 -07:00
sfc-gh-tclinkenbeard 371a38e6e5 Merge remote-tracking branch 'origin/master' into remove-extra-copies 2021-06-07 10:26:06 -07:00
Dan Lambright 64c10d3625 fix joshua failures, formatting 2021-05-27 08:08:07 -04:00
Dan Lambright fcfb78162c misc cleanup for publishing 2021-05-27 08:08:07 -04:00
Dan Lambright 742c22cef2 Don't allow changing desriptor if knob is set 2021-05-27 08:08:07 -04:00
sfc-gh-tclinkenbeard f28ac955c3 Remove unnecessary temporary objects while growing objects of type std::vector<std::pair<A, B>> 2021-05-10 16:32:50 -07:00
Renxuan Wang 652c5c4e84
Merge pull request #4668 from RenxuanW/worker
Improve logging on worker joining cluster
2021-04-30 10:24:02 -07:00
Andrew Noyes 904a39e473
Merge pull request #4667 from sfc-gh-ajbeamon/feature-mvc-monitor-protocol-version
Use fewer connections in the multi-version client
2021-04-28 14:13:17 -07:00
RenxuanW 0145eea684 Make `MonitorLeaderForwarding` and `LeaderForwarding` trackLatest events. 2021-04-27 15:17:20 -07:00
Andrew Noyes 8a00c6cdf8 Add -Wshift-sign-overflow
This catches the bug fixed in #4656 at compile time
2021-04-21 23:49:26 +00:00
A.J. Beamon b2d6930103 The multi-version client monitors the cluster's protocol version and only activates the client library that can connect. 2021-04-15 11:45:14 -07:00
FDB Formatster df90cc89de apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-10 10:18:07 -08:00
Andrew Noyes 79cec09255 Apply clang-tidy's performance-inefficient-vector-operation fix
I ran this command in my build directory after compiling with
OPEN_FOR_IDE. It took a few small tweaks to get it to compile, which is
outside the scope of this commit.

    $ python run-clang-tidy.py -j $(nproc) -checks='-*,performance-inefficient-vector-operation' -fix
2021-03-04 03:58:25 +00:00
Richard Chen c77d9e4abe merge conflicts 2020-12-02 21:53:19 +00:00
sfc-gh-tclinkenbeard 4669f837fa Add uses of makeReference 2020-11-07 22:10:18 -08:00