The recommended way to load dependencies from `rules_jvm_external`
is to make use of the `@maven` workspace, and the most readable
way of doing that is to use the `artifact` macro provides.
This removes the need to generate the "compat" namespaces, which
`rules_jvm_external` provided for backwards compatibility with
older releases. This change also sets things up for supporting
`bzlmod`: this requires all workspaces accessed by a library to
be named "up front" in the `MODULE.bazel` file. This way, the
only repo that needs to be exported is `@maven`, rather than the
current huge list.
* core, netty, okhttp: implement new logic for nameResolverFactory API in channelBuilder
fix ManagedChannelImpl to use NameResolverRegistry instead of NameResolverFactory
fix the ManagedChannelImplBuilder and remove nameResolverFactory
* Integrate target parsing and NameResolverProvider searching
Actually creating the name resolver is now delayed to the end of
ManagedChannelImpl.getNameResolver; we don't want to call into the name
resolver to determine if we should use the name resolver.
Added getDefaultScheme() to NameResolverRegistry to avoid needing
NameResolver.Factory.
---------
Co-authored-by: Eric Anderson <ejona@google.com>
Instead of a boolean, we now return a Status object. Status.OK
represents accepted addresses and other non-acceptance. This allows the
LB to provide more information about why a set of addresses were not
acceptable.
The status will later be sent to the name resolver as well to allow it
to also better react to to bad addresses.
Motivation:
When multiple NameResolvers are created, the Classloader is scanned every time trying to figure out if the Platform is Android. This expensive work could be done only once.
Modification:
Cache isAndroid resolution in a constant.
Result:
Less expensive multiple NameResolvers instantiation.
Currently, the gRPC compiler isn't properly using the fully qualified
string name `java.lang.String` instead of `String`. Update the generator
to use the `$String$` alias to avoid compile issues with protobuf
messages called String.
Fixes#10316.
This avoids the (often missing) evaluationDependsOn and fixes using
results from other projects without propagating those through
Configuration. It also reduces the number of useless classes pulled in
by down-stream tests, reducing the probability of rebuilds.
The expectation of fixtures is they help testing down-stream code that
use the classes in main. That applies to all the classes here except for
FakeClock and StaticTestingClassLoader. It would also apply to many
internal classes in grpc-testing, but let's consider cleaning that up
future work.
This change is to address b/269159638 about GCP metadata server returning an unexpected REFUSED status instead of NXDOMAIN. This causes a 15s delay with older versions of systemd-resolved.
This issue was observed with systemd 245.4. Version 251 worked without issue.
Introduce an AsyncService interface in the generated code and move the methods from <service>ImplBase to default implementation of the interface.
* update pom files to allow java 1.8
* Add a bindService(<service>Async) method
* Change TestServiceImpl to use the interface and include a bind method instead of extending TestServiceImplBase.
Introduces a new acceptResolvedAddresses() to the LoadBalancer.
This will now be the preferred way to handle addresses from the NameResolver. The existing handleResolvedAddresses() will eventually be deprecated.
The new method returns a boolean based on the LoadBalancers ability to use the provided addresses. If it does not accept them, false is returned. LoadBalancer implementations using the new method should no longer implement the canHandleEmptyAddressListFromNameResolution(), which will eventually be removed, along with handleResolvedAddresses().
Backward compatibility will be maintained so existing load balancers using handleResolvedAddresses() will continue to work.
Additionally the previously deprecated handleResolvedAddressGroups() method is removed.
INVALID_ARGUMENT is propagated to the data plane if no previous config
is available. INVALID_ARGUMENT is reserved for application use; LBs
should pretty much use UNAVAILABLE exclusively.
While most of the changes are in xds, there do not appear to be likely
xds code paths that would propagate a bad status to the data plane.
Internal policies either don't use parseLoadBalancingPolicyConfig() and
instead have their configuration objects constructed directly or are
constructed transitively through the cluster manager which uses INTERNAL
if there's a child failure. There was a worrisome hole before this
commit for StatusRuntimeExceptions received by the cluster manager, but
the audit didn't find any locations throwing such an exception.
User-selected policies produce a NACK and are protected from the
existing xds client watcher paths. The worst that appears could happen
is the channel could panic (which uses INTERNAL) if a bug let a bad
configuration through.
This can avoid creating an additional 736 tasks (previously 502 out of
1591 were not created). That's not all that important as the build time
is essentially the same, but this lets us see the poor behavior of the
protobuf plugin in our own project and increase our understanding of how
to avoid task creation when developing the plugin. Of the tasks still
being created, protobuf is the highest contributor with 165 tasks,
followed by maven-publish with 76 and appengine with 53. The remaining
59 are from our own build, but indirectly caused by maven-publish.
This moves our depedencies into a plain file that can be read and
updated by tooling. While the current tooling is not particularly better
than just using gradle-versions-plugin, it should put us on better
footing. gradle-versions-plugin is actually pretty nice, but will be
incompatible with Gradle 8, so we need to wait a bit to see what the
future holds.
Left libraries as an alias for libs to reduce the commit size and make
it easier to revert if we don't end up liking this approach.
We're using Gradle 7.3.3 where it was an incubating fetaure. But in
Gradle 7.4 is became stable.
Users appear to be doing `attributes.toString()` to find keys they are
interested in and then unable to find the name of the Key in our API.
They workaround the problem by scanning through `attributes.keys()`
looking for the key of interest. This is an abuse of the keys() API and
unnecessary user friction. They'd happily use the API if they just knew
where to find it.
I added internal to some strings to make it clear that you shouldn't go
looking to use it. There were many strings I didn't change. I focused on
keys most likely to be seen by users, which meant keys in grpc-api and
keys that are available via transport attributes.
See https://github.com/grpc/grpc-java/issues/1764#issuecomment-1139250061
* api: add support for SocketAddress types in ManagedChannelProvider
also add support for SocketAddress types in NameResolverProvider
Use scheme in target URI to select a NameRseolverProvider and get
that provider's supported SocketAddress types.
implement selection in ManagedChannelRegistry of appropriate
ManagedChannelProvider based on NameResolver's SocketAddress types
Addresses a problem where we initially only resolve addresses to the backends, but not the load balancer and then later resolve addresses to both. In this situation the fallback timer was started during the second instance even if it resulted in the timer later failing as we were already using fallback backends.
This change assures that a fallback time is only ever started if we are not already using the fallback backends.
This is a follow-up fix to #8253.
The server list updates are very verbose and currently logged every second, causing a huge log spam if `ChannelLogger` is completely enabled. For debugging an internal issue, we need to turn on `ChannelLogger` but hide the server list updates from the logs to keep the log size reasonable.
Rebased PR #8343 into the first commit of this PR, then (the 2nd commit) reverted the part for metric recording of retry attempts. The PR as a whole is mechanical refactoring. No behavior change (except that some of the old code path when tracer is created is moved into the new method `streamCreated()`).
The API change is documented in go/grpc-stats-api-change-for-retry-java
This can be used by annotation processors to avoid processing the
gRPC-generated code. The normal Generated annotation only has SOURCE
retention, so isn't available to annotation processors.
I don't include the service name within the annotation as that assumes
we'll never have need for any other type of generated class. If there's
a request for exposing service name via an annotation in the future, we
can make an RpcService annotation or the like.
Fixes#8158
Manually checks if the gRPCLB policy is already in fallback mode when trying to fallback due to receiving address update without LB addresses.
Commit b956f8852d added an invariant check in the FallbackModeTask runnable to ensure the task is fired only when the LB is not already in fallback mode. However, that commit missed the case that receiving address updates without LB addresses can trigger the run of FallbackModeTask runnable, because the existing implementation chose to reuse the code in FallbackModeTask. In such case, running FallbackModeTask could break the invariant check as the LB policy may already in fallback mode.
This change eliminates the reuse of FallbackModeTask for handling address update without LB address. That is, every time receiving address update, we manually check if it is already in fallback instead of reusing to FallbackModeTask perform the check.
Note there was a discussion brought up whether we should force entering fallback (shutdown existing subchannels) or we should still keep the balancer connection. Different languages have already diverged on this. Go shuts down the balancer connection and all subchannel connections to force using fallback addresses. C-core keep the balancer connection working and does not shutdown subchannels, only let fallback happens after the existing balancer connection and subchannel connections become broken. Java shuts down the balancer connection but not subchannels. This change does not try to change the existing behavior, but only fixes the invariant check breakage.
-------------------
See bug reported in b/190700476
failOnVersionConflict has never been good for us. It is equivalent to
Maven dependencyConvergence which we discourage our users to use because
it is too tempermental and _creates_ version skew issues over time.
However, we had no real alternative for determining if our deps would be
misinterpeted by Maven.
failOnVersionConflict has been a constant drain and makes it really hard
to do seemingly-trivial upgrades. As evidenced by protobuf/build.gradle
in this change, it also caused _us_ to introduce a version downgrade.
This introduces our own custom requireUpperBoundDeps implementation so
that we can get back to simple dependency upgrades _and_ increase our
confidence in a consistent dependency tree.
Inject a standalone Context that is independent of application RPCs to GrpclbLoadBalancer for control plane RPCs. The control plane RPC should be independent and not impacted by the lifetime of Context used for application RPCs.
Currently each subchannel implicitly refreshes the name resolution when its state changes to IDLE or TRANSIENT_FAILURE. That is, this feature is built into subchannel's internal implementation. Although it eliminates the burden of having LB implementations refreshing the resolver when connections to backends are broken, this is gives LB policies no chance to disable or override this refresh (e.g., in some complex load balancing hierarchy like xDS, LB policies may embed a resolver inside for resolving backends so the refreshing resolution operation should be hooked to the resolver embedded in the LB policy instead of the one in Channel).
In order to make this transition smoothly, we add a check to SubchannelImpl that checks if the LoadBalancer has explicitly called Helper.refreshNameResolution for broken subchannels created by it. If not, it logs a warning and do the refresh.
A temporary LoadBalancer.Helper API ignoreRefreshNameResolution() is added to avoid false-positive warnings for xDS that intentionally does not want a refresh. Once the migration is done, this should be deleted.
Enhance error information reflected by RPC status when failing to fallback (aka, no fallback addresses provided by resolver), by including the original cause of entering fallback. Cases to fallback include:
- When the fallback timer fires before we have received the first response from the balancer.
- If no fallback addresses are found, RPCs will be failed with status {UNAVAILABLE, description="Unable to fallback, no fallback addresses found\n Timeout waiting for remote balancer", cause=null}
- When the balancer RPC finishes before receiving any backend addresses
- If no fallback addresses are found, RPCs will be failed with status {UNAVAILABLE, description="Unable to fallback, no fallback addresses found\n <description from the status of balancer RPC>", cause=<cause from the status of balancer RPC>}
- When we get an explicit response from the balancer telling us go into fallback
- If no fallback addresses are found, RPCs will be failed with status {UNAVAILABLE, description="Unable to fallback, no fallback addresses found\n Fallback requested by balancer", cause=null}
- When the balancer call has finished *and* we cannot connect to any of the backends in the last response we received from the balancer.
- Depending on whichever the two happened last, the last happening one is the reason that triggers entering fallback. If no fallback addresses are found, RPCs will be failed with status {UNAVAILABLE, description="Unable to fallback, no fallback addresses found\n <description from the status of balancer RPC>", cause=<cause from the status of balancer RPC>} or {UNAVAILABLE, description="Unable to fallback, no fallback addresses found\n <description from the status of one of the broken subchannels>", cause=<cause from the status of one of the broken subchannels>}
Note all RPCs will fail with UNAVAILABLE status code, the fallback reason will be attached as description and cause (if any).
Turn LB state into TRANSIENT_FAILURE if the list of backend addresses given to be used is found to be empty. It applies to both cases in which the address list is given by the balancer or the resolver (aka, the fallback list).
We should treat both IDLE and CONNECTING subchannels as "connection in progress" when aggregating for the overall load balancing state. Otherwise, RPCs could fail prematurely if one subchannel enters TF while all the others are still in CONNECTING.
23d279660c made each individual subchannel stay in TF until READY if it previously was in TF. So subchannels with CONNECTING state are those in first time connecting. We should give them time to connect.
When the LB stream has been closed and a retry task is scheduled. Receiving a ResolvedAddress update with LB addresses immediately creates a new RPC stream again. Then when the retry task fires, a LB stream already exists.
This change cancels the retry task when the address update causing a new LB stream to be created.
If all RR servers are unhealthy, it's possible that at least one
connection is CONNECTING at every moment which causes RR to stay in
CONNECTING. It's better to keep the TRANSIENT_FAILURE state in that
case so that fail-fast RPCs can fail fast.
The same changes have been made for RoundRobinLoadBalancer in #6657
Resolves#7741
Some of the static methods in generated code have the same method name but different package name, such `ClientCalls.asyncClientStreamingCall` and `ServerCalls.asyncClientStreamingCall`. It's less readable using static import than using full-qualified method name in-place.
An issue was found during CBT RLS client testing: The RLS lb creates grplb child balancer, calls `grpclb.handleResolvedAddress()` then immediately calls `grpclb.requestConnection()`. The subchannel in `GrpclbState.currentPicker.pickList` contains only `GrpclbState.BUFFER_ENTRY` at the moment `grpclb.requestConnection()` is called, and therefore the `requestConnection()` is no-op, and RPC is hanging.