
256 Commits

Author SHA1 Message Date
Terry Wilson 31d777e212
grpclb: switch to use acceptResolvedAddresses() (#9568)
This is part of a migration to move all LB implementations from
handleResolvedAddresses() to this new method.
2022-10-10 13:31:46 -07:00
Trevor Edwards 73020a9dd7 grpclb: fix mismatched indices in addresses log 2022-10-10 09:56:48 -07:00
Eric Anderson 61f19d707a
Swap Animalsniffer to Java 8 and Android 19
Also added missing signatures. Swapping to version catalog will make
this process easier in the future.
2022-08-10 12:41:57 -07:00
Terry Wilson 7665d3850b
Revert "Add LoadBalancer.acceptResolvedAddresses() (#8742)" (#9414)
This reverts commit 70a29fbfe3.
2022-07-28 12:15:26 -07:00
Terry Wilson 70a29fbfe3
Add LoadBalancer.acceptResolvedAddresses() (#8742)
Introduces a new acceptResolvedAddresses() to the LoadBalancer.

This will now be the preferred way to handle addresses from the NameResolver. The existing handleResolvedAddresses() will eventually be deprecated.

The new method returns a boolean based on the LoadBalancer's ability to use the provided addresses. If it does not accept them, false is returned. LoadBalancer implementations using the new method should no longer implement canHandleEmptyAddressListFromNameResolution(), which will eventually be removed, along with handleResolvedAddresses().

Backward compatibility will be maintained so existing load balancers using handleResolvedAddresses() will continue to work.

Additionally, the previously deprecated handleResolvedAddressGroups() method is removed.
2022-07-26 09:23:37 -07:00
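The backward-compatibility idea described above can be sketched with hypothetical stand-in types (`SketchLoadBalancer`, addresses as `List<String>`), not the real `io.grpc.LoadBalancer` API. The assumed default `acceptResolvedAddresses()` rejects an empty list unless the legacy `canHandleEmptyAddressListFromNameResolution()` hook allows it, and otherwise delegates to the legacy `handleResolvedAddresses()`, so old policies keep working unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a load balancer; not the real io.grpc.LoadBalancer API.
abstract class SketchLoadBalancer {
  // New-style entry point: returns false when the addresses cannot be used.
  boolean acceptResolvedAddresses(List<String> addresses) {
    if (addresses.isEmpty() && !canHandleEmptyAddressListFromNameResolution()) {
      // Legacy policies that cannot handle an empty list reject it.
      return false;
    }
    // Bridge to the legacy method so existing implementations keep working.
    handleResolvedAddresses(addresses);
    return true;
  }

  // Legacy entry point, kept for backward compatibility.
  void handleResolvedAddresses(List<String> addresses) {}

  // Legacy hook that new-style implementations no longer override.
  boolean canHandleEmptyAddressListFromNameResolution() {
    return false;
  }
}

// A legacy-style policy that only overrides handleResolvedAddresses().
class LegacyPolicy extends SketchLoadBalancer {
  final List<String> seen = new ArrayList<>();

  @Override
  void handleResolvedAddresses(List<String> addresses) {
    seen.addAll(addresses);
  }
}

// A new-style policy that decides acceptance itself (hypothetical rule).
class NewStylePolicy extends SketchLoadBalancer {
  @Override
  boolean acceptResolvedAddresses(List<String> addresses) {
    return !addresses.isEmpty();
  }
}
```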
Eric Anderson 19ad4467db Service config parse failures should be UNAVAILABLE
INVALID_ARGUMENT is propagated to the data plane if no previous config
is available. INVALID_ARGUMENT is reserved for application use; LBs
should pretty much use UNAVAILABLE exclusively.

While most of the changes are in xds, there do not appear to be likely
xds code paths that would propagate a bad status to the data plane.
Internal policies either don't use parseLoadBalancingPolicyConfig() and
instead have their configuration objects constructed directly or are
constructed transitively through the cluster manager which uses INTERNAL
if there's a child failure. There was a worrisome hole before this
commit for StatusRuntimeExceptions received by the cluster manager, but
the audit didn't find any locations throwing such an exception.
User-selected policies produce a NACK and are protected from the
existing xds client watcher paths. The worst that appears likely to
happen is that the channel could panic (which uses INTERNAL) if a bug let
a bad configuration through.
2022-07-08 15:49:12 -07:00
Eric Anderson 0ff9f37b9e Use Gradle's task configuration avoidance APIs
This can avoid creating an additional 736 tasks (previously 502 out of
1591 were not created). That's not all that important as the build time
is essentially the same, but this lets us see the poor behavior of the
protobuf plugin in our own project and increase our understanding of how
to avoid task creation when developing the plugin. Of the tasks still
being created, protobuf is the highest contributor with 165 tasks,
followed by maven-publish with 76 and appengine with 53. The remaining
59 are from our own build, but indirectly caused by maven-publish.
2022-07-08 12:16:40 -07:00
Eric Anderson b06942d63b Use Gradle's version catalog
This moves our dependencies into a plain file that can be read and
updated by tooling. While the current tooling is not particularly better
than just using gradle-versions-plugin, it should put us on better
footing. gradle-versions-plugin is actually pretty nice, but will be
incompatible with Gradle 8, so we need to wait a bit to see what the
future holds.

Left libraries as an alias for libs to reduce the commit size and make
it easier to revert if we don't end up liking this approach.

We're using Gradle 7.3.3, where it was an incubating feature. It became
stable in Gradle 7.4.
2022-06-14 14:04:10 -07:00
Eric Anderson a206cda1a8 Change Attributes.Key debug strings to reference the API of the key
Users appear to be doing `attributes.toString()` to find keys they are
interested in and then unable to find the name of the Key in our API.
They work around the problem by scanning through `attributes.keys()`
looking for the key of interest. This is an abuse of the keys() API and
unnecessary user friction. They'd happily use the API if they just knew
where to find it.

I added "internal" to some strings to make it clear that you shouldn't go
looking to use it. There were many strings I didn't change. I focused on
keys most likely to be seen by users, which meant keys in grpc-api and
keys that are available via transport attributes.

See https://github.com/grpc/grpc-java/issues/1764#issuecomment-1139250061
2022-06-02 16:11:02 -07:00
Sergii Tkachenko 743b1ede11
grpclb: Include META-INF/services to //grpclb:grpclb (#9156) 2022-05-10 09:02:33 -07:00
sanjaypujare 538db03d56
api: add support for SocketAddress types in ManagedChannelProvider (#9076)
* api: add support for SocketAddress types in ManagedChannelProvider
Also add support for SocketAddress types in NameResolverProvider.
Use the scheme in the target URI to select a NameResolverProvider and get
that provider's supported SocketAddress types.
Implement selection in ManagedChannelRegistry of the appropriate
ManagedChannelProvider based on the NameResolver's SocketAddress types.
2022-04-22 09:10:55 -07:00
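The registry selection described in #9076 can be modeled as "pick the highest-priority provider whose supported address types cover the resolver's types". This is a sketch with hypothetical types (`LocalAddress`, `SketchChannelProvider`, `ProviderSelection`); the real `ManagedChannelRegistry` and `ManagedChannelProvider` APIs differ, and the `containsAll` rule is an assumption about how "appropriate" is decided.

```java
import java.net.SocketAddress;
import java.util.Collection;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical non-IP address type, e.g. for an in-process or local transport.
final class LocalAddress extends SocketAddress {}

// Hypothetical stand-in for a channel provider; not the real ManagedChannelProvider API.
final class SketchChannelProvider {
  final String name;
  final Set<Class<? extends SocketAddress>> supportedTypes;

  SketchChannelProvider(String name, Set<Class<? extends SocketAddress>> supportedTypes) {
    this.name = name;
    this.supportedTypes = supportedTypes;
  }
}

final class ProviderSelection {
  // Returns the first provider, in priority order, that supports every
  // SocketAddress type the selected name resolver may produce.
  static Optional<SketchChannelProvider> select(
      List<SketchChannelProvider> providersByPriority,
      Collection<Class<? extends SocketAddress>> resolverAddressTypes) {
    for (SketchChannelProvider provider : providersByPriority) {
      if (provider.supportedTypes.containsAll(resolverAddressTypes)) {
        return Optional.of(provider);
      }
    }
    return Optional.empty();
  }
}
```

If no provider covers all of the resolver's address types, the sketch returns an empty `Optional`, leaving the error handling to the caller.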
Zhouyihai Ding ad2c0f93f4
Support setting gRPClb initial fallback timeout by service config (#8980) 2022-03-16 09:55:15 -07:00
ZHANG Dapeng 042f9879d4
all: remove deprecated StreamInfo.transportAttrs (#8768)
APIs such as `StreamInfo.getTransportAttrs()` have been deprecated since 1.41.0 (see 860e97d12a). Removing them now.
2021-12-20 09:46:25 -08:00
Terry Wilson c1e19af86d
grpclb: fallback timer only when not already using fallback backends. (#8646)
Addresses a problem where we initially resolve addresses only to the backends, not to the load balancer, and later resolve addresses to both. In this situation, the fallback timer was started on the second resolution even though we were already using the fallback backends, which later caused the timer to fail.

This change ensures that a fallback timer is only ever started if we are not already using the fallback backends.

This is a follow-up fix to #8253.
2021-11-02 12:47:47 -07:00
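The guard from #8646 reduces to one condition checked before arming the timer. The sketch below uses a hypothetical class (`FallbackTimerGuard`) rather than the real `GrpclbState` code, and it counts timer starts instead of scheduling a real timer; the extra "timer already pending" check is an assumption added for illustration.

```java
// Hypothetical model of the fallback-timer guard; names do not come from GrpclbState.
final class FallbackTimerGuard {
  private boolean usingFallbackBackends;
  private boolean fallbackTimerPending;
  int timersStarted; // counts timer starts, for illustration only

  // Resolution result that contains only backend addresses: use them as fallback.
  void onResolvedBackendsOnly() {
    usingFallbackBackends = true;
    fallbackTimerPending = false;
  }

  // Resolution result that also contains balancer addresses.
  void onResolvedWithBalancers() {
    // The guard: never arm the fallback timer while already on fallback backends.
    if (!usingFallbackBackends && !fallbackTimerPending) {
      fallbackTimerPending = true;
      timersStarted++;
    }
  }
}
```

Replaying the scenario from the commit message (backends only, then backends plus balancers) starts no timer.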
Zhouyihai Ding 5396a1de3d
grpclb: remove redundant logs and add a system property to hide server lists in logs
The server list updates are very verbose and currently logged every second, causing a huge log spam if `ChannelLogger` is completely enabled. For debugging an internal issue, we need to turn on `ChannelLogger` but hide the server list updates from the logs to keep the log size reasonable.
2021-09-22 10:13:42 -07:00
ZHANG Dapeng 860e97d12a
all: API refactoring in preparation to support retry stats (#8355)
Rebased PR #8343 into the first commit of this PR, then (in the second commit) reverted the part for metric recording of retry attempts. The PR as a whole is mechanical refactoring with no behavior change (except that some of the old code that ran when the tracer was created is moved into the new method `streamCreated()`).

The API change is documented in go/grpc-stats-api-change-for-retry-java
2021-07-31 18:33:02 -07:00
ZhouyihaiDing 22aa8fcef1 Add Info log for gRPC LB's init response 2021-07-08 13:41:49 -05:00
Eric Anderson 0cabf5672a compiler: Add GrpcGenerated annotation to generated class
This can be used by annotation processors to avoid processing the
gRPC-generated code. The normal Generated annotation only has SOURCE
retention, so isn't available to annotation processors.

I don't include the service name within the annotation as that assumes
we'll never have need for any other type of generated class. If there's
a request for exposing service name via an annotation in the future, we
can make an RpcService annotation or the like.

Fixes #8158
2021-07-02 22:11:40 -07:00
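The retention point in the message above can be demonstrated with plain Java. A `SOURCE`-retention annotation is discarded by the compiler, so anything inspecting the compiled class cannot see it. Here that "anything" is reflection, while a `RUNTIME`-retention marker stays visible. The annotation names below are hypothetical stand-ins, not `io.grpc.stub.annotations.GrpcGenerated`, and this sketch does not assert which retention policy the real annotation uses.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker with SOURCE retention, like the standard Generated annotation.
@Retention(RetentionPolicy.SOURCE)
@Target(ElementType.TYPE)
@interface SourceOnlyGenerated {}

// Hypothetical marker that survives into the class file and is visible at runtime.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RuntimeGenerated {}

@SourceOnlyGenerated
@RuntimeGenerated
class SomeGeneratedStub {}

final class RetentionCheck {
  // True only if the annotation was retained in the compiled class.
  static boolean visibleAtRuntime(Class<?> type, Class<? extends Annotation> annotation) {
    return type.isAnnotationPresent(annotation);
  }
}
```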
Chengyuan Zhang 2cbc7fc3a5
grpclb: skip fallback if the LB is already in fallback mode (#8253)
Manually checks if the gRPCLB policy is already in fallback mode when trying to fallback due to receiving address update without LB addresses. 

Commit b956f8852d added an invariant check in the FallbackModeTask runnable to ensure the task is fired only when the LB is not already in fallback mode. However, that commit missed the case where receiving an address update without LB addresses triggers the FallbackModeTask runnable, because the existing implementation reused the code in FallbackModeTask. In that case, running FallbackModeTask could break the invariant check, as the LB policy may already be in fallback mode.

This change eliminates the reuse of FallbackModeTask for handling address updates without LB addresses. That is, every time an address update is received, we manually check whether we are already in fallback instead of reusing FallbackModeTask to perform the check.

Note that there was a discussion about whether we should force entering fallback (shut down existing subchannels) or keep the balancer connection. Different languages have already diverged on this. Go shuts down the balancer connection and all subchannel connections to force using fallback addresses. C-core keeps the balancer connection working and does not shut down subchannels; it only lets fallback happen after the existing balancer connection and subchannel connections become broken. Java shuts down the balancer connection but not subchannels. This change does not try to change the existing behavior; it only fixes the invariant check breakage.

-------------------
See the bug reported in b/190700476
2021-06-11 14:53:18 -07:00
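The #8253 fix can be sketched as checking the fallback state at the call site, so that the "enter fallback" step keeps its invariant. The class and method names below (`FallbackOnAddressUpdate`, `enterFallback`) are hypothetical and only model the behavior described. The `IllegalStateException` stands in for the invariant check that previously broke.

```java
// Hypothetical model of #8253; not the real GrpclbState / FallbackModeTask code.
final class FallbackOnAddressUpdate {
  private boolean usingFallback;
  int fallbackEntries; // how many times fallback was entered, for illustration

  // Called on every resolution update.
  void handleAddressUpdate(boolean hasBalancerAddresses) {
    // The fix: check the current mode here instead of reusing the fallback task,
    // so the task's invariant can never be violated by repeated updates.
    if (!hasBalancerAddresses && !usingFallback) {
      enterFallback();
    }
  }

  // Models the invariant: fallback must only be entered when not already in it.
  private void enterFallback() {
    if (usingFallback) {
      throw new IllegalStateException("already in fallback mode");
    }
    usingFallback = true;
    fallbackEntries++;
  }
}
```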
Eric Anderson 5642e01243
Replace failOnVersionConflict() with custom requireUpperBoundDeps
failOnVersionConflict has never been good for us. It is equivalent to
Maven dependencyConvergence which we discourage our users to use because
it is too temperamental and _creates_ version skew issues over time.
However, we had no real alternative for determining if our deps would be
misinterpreted by Maven.

failOnVersionConflict has been a constant drain and makes it really hard
to do seemingly-trivial upgrades. As evidenced by protobuf/build.gradle
in this change, it also caused _us_ to introduce a version downgrade.

This introduces our own custom requireUpperBoundDeps implementation so
that we can get back to simple dependency upgrades _and_ increase our
confidence in a consistent dependency tree.
2021-06-11 14:01:18 -07:00
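An upper-bound check, in the spirit of the Maven Enforcer rule of the same name, verifies that each module's resolved version is not lower than any version requested anywhere in the dependency tree. This is a minimal Java sketch of that idea, not the custom Gradle implementation from this commit. It assumes purely numeric dotted versions, so a qualifier such as `-jre` would need extra handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class UpperBoundCheck {
  // Compares dotted numeric versions, e.g. "1.2.10" > "1.2.9" (assumes digits only).
  static int compareVersions(String a, String b) {
    String[] x = a.split("\\.");
    String[] y = b.split("\\.");
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
      int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  // Reports every module whose resolved version is below some requested version.
  static List<String> violations(
      Map<String, String> resolved, Map<String, List<String>> requested) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : requested.entrySet()) {
      String module = e.getKey();
      String actual = resolved.get(module);
      for (String wanted : e.getValue()) {
        if (actual == null || compareVersions(actual, wanted) < 0) {
          out.add(module + ": requested " + wanted + " but resolved " + actual);
        }
      }
    }
    return out;
  }
}
```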
Penn (Dapeng) Zhang aa18b2c228 grpclb: update load_balancer.proto 2021-06-11 13:28:48 -07:00
Chengyuan Zhang c7afb89708
grpclb: use a standalone Context for gRPCLB control plane RPCs (#8154)
Inject a standalone Context that is independent of application RPCs to GrpclbLoadBalancer for control plane RPCs. The control plane RPC should be independent and not impacted by the lifetime of Context used for application RPCs.
2021-05-10 10:21:36 -07:00
Chengyuan Zhang 9614738a7d
core, grpclb, xds: let leaf LB policies explicitly refresh name resolution when subchannel connection is broken (#8048)
Currently each subchannel implicitly refreshes the name resolution when its state changes to IDLE or TRANSIENT_FAILURE. That is, this feature is built into the subchannel's internal implementation. Although it relieves LB implementations of the burden of refreshing the resolver when connections to backends are broken, it gives LB policies no chance to disable or override this refresh (e.g., in a complex load balancing hierarchy like xDS, LB policies may embed a resolver for resolving backends, so the refresh should be hooked to the resolver embedded in the LB policy instead of the one in the Channel).

To make this transition smooth, we add a check to SubchannelImpl that checks whether the LoadBalancer has explicitly called Helper.refreshNameResolution for broken subchannels it created. If not, it logs a warning and does the refresh.

A temporary LoadBalancer.Helper API ignoreRefreshNameResolution() is added to avoid false-positive warnings for xDS that intentionally does not want a refresh. Once the migration is done, this should be deleted.
2021-04-16 10:49:06 -07:00
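The transitional check from #8048 can be modeled as a per-event flag. If the LB policy did not call refresh itself, the channel warns and refreshes on its behalf, unless the temporary opt-out is set. The class below (`RefreshCheckModel`) is hypothetical and only mirrors the described behavior; it is not `SubchannelImpl` or the real `LoadBalancer.Helper` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the transitional refresh check; not the real SubchannelImpl.
final class RefreshCheckModel {
  boolean ignoreRefresh; // models the temporary ignoreRefreshNameResolution() opt-out
  int refreshCount;
  final List<String> warnings = new ArrayList<>();
  private boolean lbRefreshedExplicitly;

  // Called by the LB policy (models Helper.refreshNameResolution()).
  void refreshNameResolution() {
    lbRefreshedExplicitly = true;
    refreshCount++;
  }

  // Called after the LB policy has handled a subchannel going IDLE or TRANSIENT_FAILURE.
  void afterSubchannelBroken() {
    if (!lbRefreshedExplicitly && !ignoreRefresh) {
      warnings.add("LB policy did not refresh name resolution; refreshing on its behalf");
      refreshCount++;
    }
    lbRefreshedExplicitly = false; // reset for the next event
  }
}
```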
Chengyuan Zhang b956f8852d
grpclb: include fallback reason in error status of failing to fallback (#8035)
Enhance the error information in the RPC status when failing to fall back (aka, no fallback addresses provided by the resolver) by including the original cause of entering fallback. Cases that trigger fallback include:

  - When the fallback timer fires before we have received the first response from the balancer.
     - If no fallback addresses are found, RPCs will be failed with status {UNAVAILABLE, description="Unable to fallback, no fallback addresses found\n Timeout waiting for remote balancer", cause=null}
  - When the balancer RPC finishes before receiving any backend addresses
     - If no fallback addresses are found, RPCs will be failed with status {UNAVAILABLE, description="Unable to fallback, no fallback addresses found\n <description from the status of balancer RPC>", cause=<cause from the status of balancer RPC>}
  - When we get an explicit response from the balancer telling us to go into fallback
     - If no fallback addresses are found, RPCs will be failed with status {UNAVAILABLE, description="Unable to fallback, no fallback addresses found\n Fallback requested by balancer", cause=null}
  - When the balancer call has finished *and* we cannot connect to any of the backends in the last response we received from the balancer.
     - Whichever of the two happens last is the reason that triggers entering fallback. If no fallback addresses are found, RPCs will be failed with status {UNAVAILABLE, description="Unable to fallback, no fallback addresses found\n <description from the status of balancer RPC>", cause=<cause from the status of balancer RPC>} or {UNAVAILABLE, description="Unable to fallback, no fallback addresses found\n <description from the status of one of the broken subchannels>", cause=<cause from the status of one of the broken subchannels>}

Note that all RPCs fail with the UNAVAILABLE status code; the fallback reason is attached as the description and cause (if any).
2021-04-07 18:06:32 -07:00
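The status shapes listed above follow one pattern: a fixed prefix, a newline, then the fallback reason's description, with the reason's cause carried through. The sketch below uses a hypothetical `SketchStatus` value type instead of `io.grpc.Status`, and the helper name `noFallbackAddresses` is likewise illustrative. It reproduces the description format exactly as quoted in the commit message, including the space after `\n`.

```java
// Hypothetical simplified status; grpc-java's real type is io.grpc.Status.
final class SketchStatus {
  final String code;
  final String description;
  final Throwable cause;

  SketchStatus(String code, String description, Throwable cause) {
    this.code = code;
    this.description = description;
    this.cause = cause;
  }
}

final class FallbackErrors {
  static final String NO_FALLBACK = "Unable to fallback, no fallback addresses found";

  // Wraps the reason for entering fallback into an UNAVAILABLE status.
  static SketchStatus noFallbackAddresses(String reasonDescription, Throwable reasonCause) {
    return new SketchStatus("UNAVAILABLE", NO_FALLBACK + "\n " + reasonDescription, reasonCause);
  }
}
```

For example, the timer case would pass `"Timeout waiting for remote balancer"` with a null cause, while the balancer-RPC case would pass that RPC's description and cause.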
Chengyuan Zhang c3caafa5ae
grpclb: turn into TRANSIENT_FAILURE if given (by balancer or fallback) an empty list of addresses (#7960)
Turn the LB state into TRANSIENT_FAILURE if the list of backend addresses to use is empty. This applies both when the address list is given by the balancer and when it is given by the resolver (aka, the fallback list).
2021-03-12 17:25:32 -08:00
Chengyuan Zhang 9c562c8a6f
grpclb: should not ignore subchannels with CONNECTING state in aggregating the overall LB state (#7959)
We should treat both IDLE and CONNECTING subchannels as "connection in progress" when aggregating for the overall load balancing state. Otherwise, RPCs could fail prematurely if one subchannel enters TF while all the others are still in CONNECTING.

23d279660c made each individual subchannel stay in TF until READY if it previously was in TF. So subchannels with CONNECTING state are those in first time connecting. We should give them time to connect.
2021-03-11 16:48:01 -08:00
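The aggregation rule from #7959 can be sketched as follows: READY wins; otherwise any IDLE or CONNECTING subchannel means a connection is still in progress; only when none remain is the aggregate TRANSIENT_FAILURE. The enum and class are hypothetical (not `io.grpc.ConnectivityState` or the grpclb code), and reporting "in progress" as CONNECTING is an assumption. The empty-list case returns TRANSIENT_FAILURE to match the behavior described in #7960.

```java
import java.util.Collection;

// Hypothetical connectivity states for the sketch.
enum ConnState { IDLE, CONNECTING, READY, TRANSIENT_FAILURE }

final class StateAggregation {
  static ConnState aggregate(Collection<ConnState> states) {
    if (states.isEmpty()) {
      return ConnState.TRANSIENT_FAILURE; // empty backend list, as in #7960
    }
    boolean inProgress = false;
    for (ConnState s : states) {
      if (s == ConnState.READY) {
        return ConnState.READY;
      }
      // Treat both IDLE and CONNECTING as "connection in progress".
      if (s == ConnState.IDLE || s == ConnState.CONNECTING) {
        inProgress = true;
      }
    }
    return inProgress ? ConnState.CONNECTING : ConnState.TRANSIENT_FAILURE;
  }
}
```

With this rule, one subchannel in TRANSIENT_FAILURE while the others are still CONNECTING no longer fails RPCs prematurely.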
yifeizhuang 6a9c9901e4
grpclb: support multiple authorities in lb backends for all SRV records (#7951) 2021-03-11 13:29:41 -08:00
Chengyuan Zhang e5ab4d743d
grpclb: fix race between address update and LB stream recreation (#7934)
When the LB stream has been closed and a retry task is scheduled, receiving a ResolvedAddresses update with LB addresses immediately creates a new RPC stream. Then, when the retry task fires, an LB stream already exists.

This change cancels the retry task when an address update causes a new LB stream to be created.
2021-03-02 22:43:26 -08:00
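The race fix in #7934 amounts to "cancel the pending retry before starting a stream from an address update". This sketch uses `ScheduledExecutorService` with a hypothetical `LbStreamManager` class; the real grpclb code uses its own synchronization context and timer. The stream is represented only by a flag and a counter.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the LB-stream retry logic; not the real GrpclbState code.
final class LbStreamManager {
  private final ScheduledExecutorService timer;
  private ScheduledFuture<?> retryTask;
  private boolean streamActive;
  private int streamsStarted;

  LbStreamManager(ScheduledExecutorService timer) {
    this.timer = timer;
  }

  // The LB stream closed: schedule a retry after a backoff delay.
  synchronized void onStreamClosed(long backoffMillis) {
    streamActive = false;
    retryTask = timer.schedule(this::onRetryTimer, backoffMillis, TimeUnit.MILLISECONDS);
  }

  // A resolution update carrying balancer addresses arrived.
  synchronized void onAddressUpdateWithBalancers() {
    if (streamActive) {
      return;
    }
    // The fix: cancel the pending retry, because a stream is started right now.
    if (retryTask != null) {
      retryTask.cancel(false);
      retryTask = null;
    }
    startStream();
  }

  private synchronized void onRetryTimer() {
    retryTask = null;
    // Without the cancel above, this could run while a stream already exists.
    startStream();
  }

  private void startStream() {
    streamActive = true;
    streamsStarted++;
  }

  synchronized ScheduledFuture<?> pendingRetry() {
    return retryTask;
  }

  synchronized int streamsStarted() {
    return streamsStarted;
  }
}
```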
Kun Zhang 23d279660c
grpclb: keep RR Subchannel state in TRANSIENT_FAILURE until becoming READY (#7816)
If all RR servers are unhealthy, it's possible that at least one
connection is CONNECTING at every moment which causes RR to stay in
CONNECTING. It's better to keep the TRANSIENT_FAILURE state in that
case so that fail-fast RPCs can fail fast.

The same changes have been made for RoundRobinLoadBalancer in #6657
2021-01-15 15:19:52 -08:00
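The "sticky" TRANSIENT_FAILURE behavior from #7816 can be sketched as a filter on each subchannel's raw state. Once the reported state is TRANSIENT_FAILURE, IDLE and CONNECTING are ignored until the subchannel becomes READY. The enum and class names are hypothetical, not the real round-robin code.

```java
// Hypothetical subchannel states for the sketch.
enum SubState { IDLE, CONNECTING, READY, TRANSIENT_FAILURE }

// Hypothetical model of the state the RR picker sees for one subchannel.
final class StickyTransientFailure {
  private SubState reported = SubState.IDLE;

  SubState onRawStateChange(SubState raw) {
    // Stay in TRANSIENT_FAILURE until the subchannel actually becomes READY.
    if (reported == SubState.TRANSIENT_FAILURE && raw != SubState.READY) {
      return reported;
    }
    reported = raw;
    return reported;
  }
}
```

Because a failed subchannel keeps reporting TRANSIENT_FAILURE while it reconnects, the aggregate state is not held in CONNECTING, and fail-fast RPCs can fail fast.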
Chengyuan Zhang b66d182bb9
api: delete LoadBalancer.Helper APIs that had been deprecated for a long time (#7793) 2021-01-11 15:25:35 -08:00
ZHANG Dapeng 7d77f64773
compiler: remove some of the static imports in codegen (#7751)
Resolves #7741
Some of the static methods in generated code have the same method name but a different class, such as `ClientCalls.asyncClientStreamingCall` and `ServerCalls.asyncClientStreamingCall`. Using a static import is less readable than using the qualified method name in place.
2020-12-23 11:28:03 -08:00
ZHANG Dapeng ca12e7a339
grpclb: improve log for SRV lookup failure (#7647) 2020-11-23 11:09:28 -08:00
ZHANG Dapeng 6cdd537f0e
grpclb: enhance grpclb logging 2020-10-07 16:22:52 -07:00
ZHANG Dapeng 3abdb2859f
grpclb: cache requestConnection if no subchannel created
An issue was found during CBT RLS client testing: the RLS LB creates a grpclb child balancer, calls `grpclb.handleResolvedAddresses()`, then immediately calls `grpclb.requestConnection()`. At the moment `grpclb.requestConnection()` is called, `GrpclbState.currentPicker.pickList` contains only `GrpclbState.BUFFER_ENTRY`, so `requestConnection()` is a no-op and the RPC hangs.
2020-09-16 16:48:22 -07:00
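Caching the request can be sketched as a pending flag. If `requestConnection()` arrives before any subchannel exists, it is remembered and replayed once subchannels are created. The class below (`PendingConnectionRequest`) is hypothetical, uses strings for subchannels, and is not the real `GrpclbState` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of caching requestConnection(); not the real GrpclbState code.
final class PendingConnectionRequest {
  private final List<String> subchannels = new ArrayList<>();
  private boolean requestConnectionPending;
  final List<String> connectRequested = new ArrayList<>();

  void requestConnection() {
    if (subchannels.isEmpty()) {
      // Nothing to connect yet (the picker only has a buffer entry): remember it.
      requestConnectionPending = true;
      return;
    }
    connectRequested.addAll(subchannels);
  }

  // Called when a balancer response leads to subchannels being created.
  void onSubchannelsCreated(List<String> created) {
    subchannels.addAll(created);
    if (requestConnectionPending) {
      // Replay the cached request now that there is something to connect.
      requestConnectionPending = false;
      connectRequested.addAll(subchannels);
    }
  }
}
```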
Eric Anderson e92b2275f9 Update to Error Prone 2.4
Most of the changes should be semi-clear why they were made. However, BadImport
may not be as obvious: https://errorprone.info/bugpattern/BadImport . That
impacted classes named Type, Entry, and Factory. Also
PublicConstructorForAbstractClass:
https://errorprone.info/bugpattern/PublicConstructorForAbstractClass

The JdkObsolete issue is already resolved but is not yet in a release.
2020-08-06 10:56:16 -05:00
susinmotion 24731102c6
grpclb: Make ATTR_LB_ADDRS public (#7230) 2020-07-21 12:45:47 -07:00
ZHANG Dapeng 0044f8ce56
all: migrate gradle build to java-library plugin
- Use gradle configuration `api` for dependencies that are part of grpc public api signatures.
- Replace deprecated gradle configurations `compile`, `testCompile`, `runtime` and `testRuntime`.
- With minimal change in dependencies: If we need dep X and Y to compile our code, and if X transitively depends on Y, then our build would still pass even if we only include X as `compile`/`implementation` dependency for our project. Ideally we should include both X and Y explicitly as `implementation` dependency for our project, but in this PR we don't add the missing Y if it is previously missing.
2020-05-04 16:44:30 -07:00
ZHANG Dapeng 1086ee89c1
grpclb,xds: fix code lint 2020-04-02 18:18:32 -07:00
Eric Anderson 103c33e821 services,grpclb: Filter internal files from javadoc/jacoco 2020-04-02 08:57:31 -07:00
Jihun Cho 6dbdfcdbbc
grpclb: CachedSubchannelPool use new create subchannel (#6831) 2020-03-31 13:31:04 -07:00
Chengyuan Zhang ec25beb660
grpclb: clean up usage of raw load balancing config attributes in tests (#6798) 2020-03-03 09:52:12 -08:00
Chengyuan Zhang afc1f2e567
core, grpclb: clean up grpclb specific attributes in core (#6790)
Move ATTR_LB_ADDR_AUTHORITY and ATTR_LB_PROVIDED_BACKEND attributes definition in GrpcAttributes to GrpclbConstants. grpc-alts will have a compile dependency on grpc-grpclb.
2020-03-02 10:27:57 -08:00
Chengyuan Zhang 6a7e47b8a5
core, grpclb: change policy selection strategy for Grpclb policy (take two: move logic of querying SRV into Grpclb's own resolver) (#6723)
Eliminated the code path of resolving Grpclb balancer addresses in grpc-core and moved it into GrpclbNameResolver, which is a subclass of DnsNameResolver. Main changes:

- Slightly changed ResourceResolver and its JNDI implementation. ResourceResolver#resolveSrv(String) returns a list of SrvRecord, so it only parses SRV records and does nothing more. Using the information parsed from SRV records is the job of gRPC's name resolver.

- Created a GrpclbNameResolver class that extends DnsNameResolver. Logic of using information from SRV records to set balancer addresses as ResolutionResult attributes is implemented in GrpclbNameResolver only.

- Refactored DnsNameResolver, mainly the resolveAll(...) method. The logic for resolving backend addresses and service config is modularized into the resolveAddresses() and resolveServiceConfig() methods, respectively. They are shared implementations for subclasses (i.e., GrpclbNameResolver).
2020-03-02 01:03:25 -08:00
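The split between "parse SRV records" and "use them as balancer addresses" can be sketched in a few lines. This sketch assumes JNDI-style SRV text of the form `priority weight port target` and the `_grpclb._tcp.` query prefix used for grpclb balancer lookup. `SrvRecordSketch` is a hypothetical type; grpc-java's own `SrvRecord` class may differ.

```java
// Hypothetical value type; grpc-java's own SrvRecord class may differ.
final class SrvRecordSketch {
  final String host;
  final int port;

  SrvRecordSketch(String host, int port) {
    this.host = host;
    this.port = port;
  }

  // Parses SRV record text of the form "priority weight port target".
  static SrvRecordSketch parse(String rr) {
    String[] parts = rr.trim().split("\\s+");
    if (parts.length != 4) {
      throw new IllegalArgumentException("Bad SRV record: " + rr);
    }
    String target = parts[3];
    // Strip the trailing root dot of a fully qualified DNS name.
    if (target.endsWith(".")) {
      target = target.substring(0, target.length() - 1);
    }
    return new SrvRecordSketch(target, Integer.parseInt(parts[2]));
  }

  // Balancer SRV records are queried under the _grpclb._tcp. prefix of the target host.
  static String srvQueryName(String host) {
    return "_grpclb._tcp." + host;
  }
}
```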
Jihun Cho 4b201267c6
grpclb: add description to lb sends no backends status (#6751) 2020-02-25 11:05:36 -08:00
Jihun Cho abed707385
grpclb: handles empty address from LB (#6734) 2020-02-21 10:55:57 -08:00
Jihun Cho 774f2763c9
grpclb: add serviceName config to grpclb policy config (#6563) 2020-02-11 10:27:47 -08:00
Eric Anderson 255e5feb24 Sync grpc-proto to 1ff78907
This revealed that load_balancer.proto had local changes introduced
in #6549. This was not noticed by Bazel because grpclb was not using
the io_grpc_grpc_proto repository. These issues have been fixed.
2020-02-10 12:32:39 -08:00
Chengyuan Zhang e0ee52cc22
grpclb: fix lint warnings (#6670) 2020-02-03 10:34:30 -08:00
Chengyuan Zhang 295b64b5ff
grpclb: expose balancer address related attributes in internal accessor (#6669) 2020-01-31 17:50:04 -08:00
Chengyuan Zhang 26bff62ff3
grpclb: internal accessor for balancer address related attribute keys (#6667)
Creates an internal accessor for attribute keys in grpclb package that is used by name resolver implementations to set balancer addresses as name resolution result attributes.
2020-01-31 15:44:30 -08:00