Commit Graph

40 Commits

Author SHA1 Message Date
Renxuan Wang 839af5701e
Fix bug in resolveTCPEndpoint() when hostname resolving fails. (#7375)
* Close trace file when error happens in runNetwork().

* Improve the bestCount algorithm in getLeader().

In the current implementation, if the nominees are [0,1], the chosen leader is 1. This is inconsistent with the other cases and with our expectation that if two nominees have the same frequency, the one with the lower id becomes the leader.

* Remove unnecessary new statement.

`stream` will never be a nullptr.

* Move self->dnsCache out of lambda capture.

Member variables are not captured by default, so `host` and `service` are not captured. This somehow compiles successfully, but throws std::bad_alloc or basic_string::_S_create exceptions when we call `host+":"+service` in dnsCache.remove().

* Revert unintended change.

* Address comments.
2022-06-13 20:24:30 -07:00
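The tie-breaking rule described in the getLeader() bullet of the commit above can be sketched as follows. This is a minimal illustration, not FoundationDB's actual getLeader() code: the function name `pickLeader` and the use of plain `int` ids are assumptions. Iterating an ordered map visits ids in ascending order, so with a strict `>` comparison the first id to reach the highest count (the lowest such id) is kept on ties.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

// Hypothetical helper (not FDB's getLeader()): pick the most frequently
// nominated id; when counts tie, the lower id wins.
std::optional<int> pickLeader(const std::vector<int>& nominees) {
    std::map<int, std::size_t> counts;  // std::map iterates ids in ascending order
    for (int id : nominees) {
        ++counts[id];
    }
    std::optional<int> best;
    std::size_t bestCount = 0;
    for (const auto& [id, count] : counts) {
        // Strictly greater: an equal count never displaces an earlier, lower id.
        if (count > bestCount) {
            best = id;
            bestCount = count;
        }
    }
    return best;  // empty when there are no nominees
}
```

With nominees [0,1], each id appears once, so the lower id 0 is chosen, matching the expectation stated in the commit message.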
Renxuan Wang 0500bcb3a7 Use unique pointer to avoid memory leak. 2022-05-27 14:15:21 -07:00
Renxuan Wang d502c47bfe Avoid creating new RequestStream when retry. 2022-05-27 14:15:21 -07:00
Renxuan Wang 30e124c09b Remove HostnameStatus and resolve trigger.
They are no longer needed now that we have the coordinators' DNS cache, and they introduce complex crashes.
2022-05-12 10:13:55 -07:00
Renxuan Wang c69a07a858
Check in the new Hostname logic. (#6926)
* Revert #6655.

20220407-031010-renxuan-c101052c21da8346           compressed=True data_size=31004844 duration=4310801 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:04:15 sanity=False started=100047 stopped=20220407-041425 submitted=20220407-031010 timeout=5400 username=renxuan

* Revert #6271.

20220407-051532-renxuan-470f0fe6aac1c217           compressed=True data_size=30982370 duration=3491067 ended=100002 fail_fast=10 max_runs=100000 pass=100002 priority=100 remaining=0 runtime=0:59:57 sanity=False started=100141 stopped=20220407-061529 submitted=20220407-051532 timeout=5400 username=renxuan

* Revert #6266.

Remove resolving-related functionality from the connection string. The connection string will be used for storage purposes only and will be immutable.

20220407-175119-renxuan-55d30ee1a4b42c2f           compressed=True data_size=30970443 duration=5437659 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:59:31 sanity=False started=100154 stopped=20220407-185050 submitted=20220407-175119 timeout=5400 username=renxuan

* Add hostname to coordinator interfaces.

* Turn on the new hostname logic.

* Add the corresponding change in config txns.

The most notable change is that before calling basicLoadBalance(), we need to call tryInitializeRequestStream() to initialize the request streams.

Passed correctness tests.

* Return error when hostnames cannot be resolved in coordinators command.

* Minor fixes.
2022-04-27 21:54:13 -07:00
Renxuan Wang e40cc8722c
A few hostname improvements. (#6825)
* Add tryResolveHostnames() in connection string.

* Add missing hostname to related interfaces.

* Do not pass RequestStream into *GetReplyFromHostname() functions.

This is because we use a new RequestStream for each request anyway. Also, the passed-in pointer could be nullptr, which results in segfaults.

* Add dynamic hostname resolve and reconnect intervals.

* Address comments.
2022-04-20 13:42:46 -07:00
Markus Pilman 16467262f0 Merge remote-tracking branch 'origin/main' into features/private-request-streams 2022-04-10 14:12:37 -06:00
Renxuan Wang 938e8ed996 Do not throw lookup_failed when resolving fails.
Instead, return an empty Optional<NetworkAddress>. resolveWithRetry() still returns a NetworkAddress because it retries until it succeeds.
2022-04-08 14:21:49 -07:00
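The pattern in the "Do not throw lookup_failed" commit above, reporting a failed lookup through an empty optional rather than an exception, can be sketched with `std::optional`. Everything here is hypothetical: `Address`, `tryResolve`, and the lookup table stand in for FDB's NetworkAddress, its Flow `Optional`, and real DNS resolution.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for FDB's NetworkAddress; fields are illustrative only.
struct Address {
    std::string ip;
    int port = 0;
};

// Hypothetical resolver: a lookup table stands in for real DNS resolution.
// On failure it returns an empty optional instead of throwing an error.
std::optional<Address> tryResolve(const std::unordered_map<std::string, Address>& table,
                                  const std::string& hostAndPort) {
    auto it = table.find(hostAndPort);
    if (it == table.end()) {
        return std::nullopt;  // resolution failed: the caller decides what to do
    }
    return it->second;
}
```

A caller branches on `has_value()`. A retrying variant in the spirit of resolveWithRetry() would loop until the optional is non-empty, which is why it can return a plain address.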
Renxuan Wang 70c49b1b87 DNS cache entry should also be removed in tryGetReplyFromHostname(). 2022-04-08 14:21:49 -07:00
Renxuan Wang f3a8ac21be Change resolve functions in hostname to return network address. 2022-04-08 14:21:49 -07:00
Renxuan Wang 3290c81f1d Fix tryGetReplyFromHostname(). 2022-04-08 14:21:49 -07:00
Josh Slocum 6276cebad9
Blob integration (#6808)
* Fixing leaked stream with explicit notify failed before destructor

* better logic to prevent races in change feed fetching

* Found new race that makes assert incorrect

* handle server overloaded in initial read from fdb

* Handling more blob error types in granule retry

* Fixing rollback metadata problem, added better debugging

* Fixing version race when fetching change feed metadata

* Better racing split request handling

* fixing assert

* Handle change feed popped check in the blob worker

* fix: do not use a RYW transaction for a versionstamp because of randomize API version (#6768)

* more merge conflict issues

* Change feed destroy fixes

* Fixing change feed destroy and move race

* Check error condition in BG file req

* Using relative endpoints for blob worker interface

* Fixing bug in previous fix

* More destroy and move race fixes

* Don't update empty version on destroy in case it gets rolled back. moved() and removing will take care of ensuring it is not read

* Bug fix (#6796)

* fix: do not use a RYW transaction for a versionstamp because of randomize API version

* fix: if the initialSnapshotVersion was pruned, granule history was incorrect

* added a way to compress null bytes in printable()

* Fixing durability issue with moving and destroying change feeds

* Adding fix for not fully deleting files for a granule that child granules need to re-snapshot

* More destroy and move races

* Fixing change feed destroy and pop races

* Renaming bg prune to purge, and adding a C api and unit test for it

* more cleanup

* review comments

* Observability for granule purging

* better handling for change feed not registered

* Fixed purging bugs (#6815)

* fix: do not use a RYW transaction for a versionstamp because of randomize API version

* fix: if the initialSnapshotVersion was pruned, granule history was incorrect

* added a way to compress null bytes in printable()

* fixed a few purging bugs

Co-authored-by: Evan Tschannen <evan.tschannen@snowflake.com>
2022-04-08 14:15:25 -07:00
Markus Pilman bf956f5630 Merge remote-tracking branch 'origin/main' into features/private-request-streams 2022-04-07 13:29:27 -06:00
Markus Pilman e2d7d4075d multiple bug fixes 2022-04-07 11:08:07 -06:00
Renxuan Wang 267c4deaee
Add tryGetReplyFromHostname() and retryGetReplyFromHostname(). (#6761)
* Add hostname to coordination interfaces.

* Add tryGetReplyFromHostname() and retryGetReplyFromHostname().

* Change tryGetReplyFromHostname() to call hostname.resolve().

* Add throw for actor_cancelled.
2022-04-06 10:47:00 -07:00
Markus Pilman b595d4462f Throw error on unauthorized access 2022-03-29 14:58:43 -06:00
Josh Slocum f27475e2f4 Merge branch 'main' into blob_integration 2022-03-22 11:41:58 -05:00
sfc-gh-tclinkenbeard a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
Josh Slocum 8205771e8f ReplyPromiseStreams need synchronous disconnect to avoid change feed races 2022-03-18 17:05:12 -05:00
Markus Pilman dc973fb67e Allow List and first test 2022-02-22 11:15:16 +01:00
Josh Slocum ddfc301d74 Improving memory footprint of change feeds and making it configurable 2022-02-04 16:41:25 -06:00
Josh Slocum 2592c3f0ae Implemented less efficient version of reply stream closing 2022-02-03 20:07:49 -06:00
Xiaoxi Wang 15347773d9 fix double destruction memory bug 2021-07-07 22:55:49 +00:00
Evan Tschannen 1cec223434 cherry pick streaming replies to master 2021-06-19 09:47:13 -07:00
FDB Formatster df90cc89de apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-10 10:18:07 -08:00
Markus Pilman bdd3dbfa7d remove duplicates 2020-11-10 14:01:07 -07:00
Xin Dong e21426d12a Send error back to the GRV requests with batch priority when the cluster is saturated, instead of blindly enqueue the requests and let the client timeout. 2020-01-30 14:13:56 -08:00
Evan Tschannen 1c0484cffc fix: do not close connections which have outstanding tryGetReplies with the peer 2019-08-29 16:49:57 -07:00
Alex Miller 7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and make it easy to accidentally pass the wrong value.
2019-06-25 02:47:35 -07:00
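The motivation for the TaskFooPriority -> TaskPriority::Foo rename above, keeping raw ints out of priority-taking APIs, is what a C++ scoped enum provides. The sketch below is an illustration under stated assumptions: the enumerator names, the numeric values, and the `priorityValue` function are hypothetical and do not reflect FoundationDB's real TaskPriority table.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical subset of priorities; names and values are illustrative, not FDB's.
enum class TaskPriority : int64_t {
    Low = 2000,
    Default = 7000,
    High = 9000,
};

// Hypothetical API that takes TaskPriority instead of a raw integer.
int64_t priorityValue(TaskPriority p) {
    return static_cast<int64_t>(p);
}

// A plain int does not implicitly convert to a scoped enum,
// so priorityValue(7000) would fail to compile.
static_assert(!std::is_convertible_v<int, TaskPriority>,
              "scoped enums reject implicit int conversion");
```

Passing an arbitrary integer now requires an explicit `static_cast<TaskPriority>(...)`, which makes accidental misuse visible at the call site.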
Andrew Noyes 781b6ece77 Fix OPEN_FOR_IDE -Wunused-variable warnings
CC #1255, #1173
2019-04-16 15:28:01 -07:00
Evan Tschannen e3400c13ae fixed a performance regression related to broadcasting a read version to too many transactions simultaneously 2019-03-22 18:37:39 -07:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen 712aa00261 a better fix to the windows build issue 2018-08-21 14:54:38 -07:00
Alex Miller fb31a6999f Rewrite all files to have #include actorcompiler.h as the last include. 2018-08-14 15:50:26 -07:00
Alex Miller 535b5701e5 Rewrite all `Void _ = wait(...)` -> `wait(...)`.
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen 0e97ce79b4 fix: destroy peers that are inactive
do not open new connections to send replies
2018-07-08 10:26:41 -07:00
Evan Tschannen 37a6a81634 Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
#	fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Evan Tschannen df74e2a373 re-added support for non-copying tlog recovery 2017-10-24 15:09:31 -07:00
FDB Dev Team a674cb4ef4 Initial repository commit 2017-05-25 13:48:44 -07:00