FixedResultPicker can be used in more situations. Note that
WrrLocalityLoadBalancerTest's test was changed non-trivially. The
noChildLb test was particularly nasty as it assumed
LoadBalancer.ErrorPicker had same toString() as
GracefulSwitchLoadBalancer's ErrorPicker.
In ac35ab6 the logic in xDS Name resolver was changed to support encoded
authorities. This seems to cause an issue for xdstp replacements which
would percent encode the authority for the replacement causing double
encoding.
For example:
URI = xds:///path/to/service
Authority = path%2Fto%2Fservice
xdstp resource = xdstp:///envoy.config.listener.v3.Listener/path%252Fto%252Fservice
Here the authority is encoded due to slashes and during replacement we
percent encode it again causing %2F to change to %252F. To avoid this
issue, use the encoded authority only for the getServiceAuthority() API
and for all other use cases retain the unencoded authority.
The scripts used `git rev-parse --show-toplevel` so it appeared they
could be used from any directory. But references to "GIT_BASE_DIR"
weren't absolute, so it did matter the starting directory. And it
mattered in a big way for xds/import.sh as if you ran it from the
grpc-java directory it would delete the xds directory in grpc-java, not
third_party.
The trap that deleted the GIT_BASE_DIR was very broken. In addition to
potentially deleting the wrong directory, it was unnecessary because
that directory was in tmpdir. But you can only have one trap per signal,
so this unnecessary trap disabled the trap that deleted tmpdir.
The script needed a full clone because it needed to check out a specific
commit. To work with --depth 1 you have to use some convoluted syntax.
But just downloading a tar.gz is easy and seems should work fine on Mac.
protoc-gen-validate/import.sh didn't have the trap problem, but seemed
to have drifted from the other scritps. All the scripts were synced to
match.
The bootstrapping code currently does not log zone and subZone from locality correctly, and only logs region. This commit fixes the logging message format.
* Eliminate NPE by skipping further processing when stream is defined, but doesn't have a property for streamKey (header processing identified an error)
Fixes#10364
* Add unit test for missing content type
Encode the service authority before passing it into gRPC util in the xDS name resolver to handle xDS requests which might contain multiple slashes. Example: xds:///path/to/service:port.
As currently the underlying Java URI library does not break the encoded authority into host/port correctly simplify the check to just look for '@' as we are only interested in checking for user info to validate the authority for HTTP.
This change also leads to few changes in unit tests that relied on this check for invalid authorities which now will be considered valid.
Just like #9376, depending on Guava packages such as URLEscapers or PercentEscapers leads to internal failures(Ex: Unresolvable reference to com.google.common.escape.Escaper from io.grpc.internal.GrpcUtil). To avoid these issues create an in house version that is heavily inspired by grpc-go/grpc.
* Sort the policies in a rule by policy name when parsing from proto. This fixes the server sending a GOAWAY when an LDS update with no changes other than ordering is received.
* Remove use of deprecated method setSourceIp
* Fix style issues
* Update RbacFilterTest.java
Currently, the gRPC compiler isn't properly using the fully qualified
string name `java.lang.String` instead of `String`. Update the generator
to use the `$String$` alias to avoid compile issues with protobuf
messages called String.
Fixes#10316.
Plumbing through sourceSet lets cross-project dependencies work the same
way as artifacts published to Maven. This fixes an issue for
interop-testing where build/install would include all the raw files from
thirdparty in addition to the grpc-xds.jar. For example:
build/install/grpc-interop-testing/lib/com/github/xds/data/orca/v3/OrcaLoadReport$1.class
b/288577812
Since 44847bf4e, when we upgraded our JUnit version, the JUnit
exclusions have probably not been necessary. e0ac97c4f upgraded
Robolectric to a version that had the auto.service problem fixed.
* Suppress duplicate children and NACK if detect loops (child is an ancestors of the current CDS aggregate).
* Handle diamond shaped aggregations (same cluster appears under 2 distinct parents that doesn't create a loop).
These code locations just needed a generic gRPC server, without any
transport-specific configuration. Use vanilla ManagedServerBuilder
instead of hard-coding to Netty, as we would suggest to our users.
This avoids the (often missing) evaluationDependsOn and fixes using
results from other projects without propagating those through
Configuration. It also reduces the number of useless classes pulled in
by down-stream tests, reducing the probability of rebuilds.
The expectation of fixtures is they help testing down-stream code that
use the classes in main. That applies to all the classes here except for
FakeClock and StaticTestingClassLoader. It would also apply to many
internal classes in grpc-testing, but let's consider cleaning that up
future work.
The pinning is unreliable in Maven and ignored by Gradle. I'm not at all
convinced that we are pinning/not pinning in appropriate projects. The
pinning also serves less of a purpose since we started encouraging the
BOM and grpc-netty-shaded. Netty's HTTP/2 API has also become somewhat
stable compared to its earlier history. If we notice an up-tick in
version skew, we can reinstate it.
The pinning is annoying in the build.gradle code and causes Maven/Gradle
to download the version list once a day, which can be troublesome to
users unaware of how to tell the tools to work offline.
It also opens our users to platform issues like seen in #10043
and #10086 where Maven Central's version list was incorrectly generated.
Or like #9664 where Gradle Plugin's repository caches packages from
JCenter but the version list is not as cachable so exposed us to JCenter
instability.
This fixes#8357, by way of "we think we won't worry any more." See
90db93b9 when it was originally introduced. And issues
like #8337, #3634.
LoadBalancers in general should never throw, but
parseLoadBalancingConfig() in particular has a return value to
communicate the error. Throwing can be a bit unpredictable, but at its
most trivial form causes a channel panic. There's no reason to throw
explicitly and calls to JsonUtil have to be protected by a try-catch
because it can throw.
TlsTesting.loadCert() is a public API and so should be preferred over
our internal utility. It avoids creating a temp file that has to be
deleted by a shutdown hook. Usages that needed a file were not migrated.
ReadyPicker hasn't been necessary since 111ff60e, when we stopped
calling super.pickSubchannel(). EmptyPicker was only used in tests, and
we can just compare the class name instead of doing an instanceof check.
Unfortunately, calling getClass() caused Java to start casting the
return value of pickerCaptor.getValue() based on its generics. Captors
don't verify the type they capture, so using any type other than
SubchannelPicker for the pickerCaptor is misleading and hides a cast.
* xds: handle the handlerRemoved callback to skip updateSslContext processing
In handlerAdded we submit a callback to updateSslContext but before the
callback is executed the handler could be removed (e.g. bad connection)
in which case the callback should skip all of the processing.
Also added a unit test to check there is no exception.
Under normal conditions the child LB of `ClusterImplLoadBalancer` does
not fluctuate, based on the field used to configure load balancing in
the xDS `Cluster` proto it is either:
1. `WrrLocalityLoadBalancer` if the newer `load_balancing_policy` field
is used
2. `WeightedTargetLoadBalancer` if the legacy `lb_policy` field is used
`ClusterImplLoadBalancer` currently assumes that this child does not
change and so does not change the child LB when the name resolver sends an
update. If the control plane does switch to using a different field for
LB config, that update will have an LB config meant for the other child
LB type. This will result in a ClassCastException and a channel panic.
To address this, `ClusterImplLoadBalancer` will now use
`GracefulSwitchLoadBalancer` and makes sure if the child policy changes
the correct LB implementation is switched to.
Currently the code maintains one LoadStatsManager2 that collects all
stats. The problem with this is that in a federation situation there
will be multiple LrsClients that will be periodically picking up stats
from the manager and sending them to their respective control planes.
This creates a first-come-first-serve situation where the stats get
randomly distributed across the control planes.
This change creates separate LoadStatsManagers dedicated to their own
control planes, thus assuring no stats will get lost.
xds: Correctly start LRS clients in federation situations
The old code used a single member variable to indicate if load reporting
had already been started by XdsClientImpl. This boolean was used to
avoid starting a LoadReportClient more than twice. This works fine with
a single control plane server.
The problem occurs in federation situations where there is more than one
control plane and thus more than one LoadReportClient. Once the first
LoadReportClient is started, the member variable boolean is flipped to
true and no other LoadReportClients would be started.
This change removes the boolean member variable and relies on the fact
that starting an already started LoadReportClient is a no-op.
* In `handleRpcStreamClosed()`, move retry handling to before the call to `xdsResponseHandler.handleStreamClosed()` so that TSan doesn't report a race condition that is completely meaningless.
fixes#9920
When XdsClient learns that a control plane no longer tracks a resource,
it should only notify watchers associated with that control plane.
This matters in control plane federation cases when more than one
control plane is in use.
Introduce an AsyncService interface in the generated code and move the methods from <service>ImplBase to default implementation of the interface.
* update pom files to allow java 1.8
* Add a bindService(<service>Async) method
* Change TestServiceImpl to use the interface and include a bind method instead of extending TestServiceImplBase.
* xds: allow sum of cluster weights above MAX_INT up to max of unsigned int.
* Define nextLong(long bound) method in FakeRandom for WeightedRandomPickerTest.
Fix a bug. When any of the xds subscribers for a resource has the last watcher cancelled, the bug will accidentally remove that resource type from the map, which make xds stream not accepting response update for that resource type entirely(pass through, no ACK/NACK will send).
Trying to upgrade Gradle to 7.6 improved the checkstyle plugin such that
it appears to have been running in new occasions. That in turn exposed
us to https://github.com/checkstyle/checkstyle/issues/5088. That bug was
fixed in 8.28, which also fixed lots of other bugs. So now we have
better checking and some existing volations needed fixing. Since the
code style fixes generated a lot of noise, this is a pre-fix to reduce
the size of a Gradle upgrade.
I did not upgrade past 8.28 because at some point some other bugs were
introduced, in particular with the Indentation module. I chose the
oldest version that had the particular bug impacting me fixed. Upgrading
to this old-but-newer version still makes it easier to upgrade to a
newer version in the future.
If an artifact on Maven Central exposes a type from gRPC on its API
surface, then consumers of that artifact need that gRPC API in the
compile classpath. Bazel handles this by making hjars for transitive
dependencies, but if the dependencies are runtime_deps then Bazel won't
generate hjars containing the needed symbols.
We don't export netty-shaded because the classes already don't match
Maven Central. If an artifact on Maven Central is exposing a
netty-shaded class on its API surface, it wouldn't work anyway since the
class simply doesn't exist for the Bazel build.
Fixes#9772
* xds: Disallow duplicate addresses in the RingHashLB.
Removed test that was previously checking for specific expected behavior with duplicate addresses.
This change has these main aspects to it:
1. Removal of any name resolution responsibility from ManagedChannelImpl
2. Creation of a new RetryScheduler to own generic retry logic
- Can also be used outside the name resolution context
3. Creation of a new RetryingNameScheduler that can be used to wrap any
polling name resolver to add retry capability
4. A new facility in NameResolver to allow implementations to notify
listeners on the success of name resolution attempts
- RetryingNameScheduler relies on this
* xds:Change timer creation logic to wait until the adsStream is ready before creating the timer to mark resources absent.
* xds:When the ads stream is closed only send errors to subscribers that haven't yet gotten results to match spec.
* Use a blocking queue to avoid the 2-second sleep.
For some inexplicable reason the following call.verifyRequest fails only for the V2 test and only from command line not IDE unless there is some Thread.sleep, even if it is only 1-millis.
Fix ConcurrentModificationException in PriorityLoadBalancer by making copy of children values to iterate rather than directly using children in for loop.
We use state-of-the-world approach. For LDS/CDS, the control plane must return all resources that the client has subscribed to in each request. If some LDS/CDS resources are gone in a new update, their corresponding RDS/EDS resources names will be onAbsent(), unless there is cached data that is in use by other subscribers in other components.
The motivations to remove this "retained resource" logic between resource types are:
1. Already handled by the subscribers, e.g. a CDS state would shut down its childLBs on new updates. XdsResolver for LdsUpdate would cancel all existing RDS subscriptions. Therefore the onAbsent() notification is effectively no-op.
2. Complexity.
ClusterImplLoadBalancer adds the ATTR_CLUSTER_NAME and
ATTR_SSL_CONTEXT_PROVIDER_SUPPLIER attributes to the EAG list when it
creates a new subchannel, but they are lost on subsequent address
updates. This change assures the attributes are also included on address
updates.
If a child policy triggers an update to the parent priority policy
it will be ignored if an update is already in process.
This is the second attempt to make this change, the first one caused a
problem with the ring hash LB. A new test that uses actual control plane
and data plane servers is now included to prove the issue no longer
appears.
This extracts the startup and shutdown code for the control and data
plane server to reparate JUnit rules, which allows this logic to be
resued in other tests in a simple manner. Also makes the test easier to
read with the boiler plate init code removed.
Now the xds resources are dynamically managed in resourceStore in xdsClient. The types is a xdsResourceType, singleton.
There is no longer hardcoded static list of known resource types, the subscription list is the source of truth.
AbstractXdsClient that manages AdsStream will only accept the xds resource types that has already has watchers subscribed to, same behaviour as before.
This fixes a regression in commit e1ad984. I'd create a test, but the
NPE gets thrown away in the context of the current test setup so can't
be created as quickly as we'd like to fix this. I have manually tested
in a custom reproduction to confirm it resolves the NPE.
Seen at b/248326695
```
java.lang.AssertionError: java.lang.NullPointerException
at io.grpc.xds.ClientXdsClient$1.uncaughtException(ClientXdsClient.java:89)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:97)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
at io.grpc.xds.ClientXdsClient.cancelXdsResourceWatch(ClientXdsClient.java:327)
at io.grpc.xds.ClusterResolverLoadBalancer$ClusterResolverLbState$EdsClusterState.shutdown(ClusterResolverLoadBalancer.java:378)
at io.grpc.xds.ClusterResolverLoadBalancer$ClusterResolverLbState.shutdown(ClusterResolverLoadBalancer.java:206)
at io.grpc.util.GracefulSwitchLoadBalancer.shutdown(GracefulSwitchLoadBalancer.java:195)
at io.grpc.xds.ClusterResolverLoadBalancer.shutdown(ClusterResolverLoadBalancer.java:141)
at io.grpc.xds.CdsLoadBalancer2$CdsLbState.shutdown(CdsLoadBalancer2.java:136)
at io.grpc.xds.CdsLoadBalancer2.shutdown(CdsLoadBalancer2.java:110)
at io.grpc.util.GracefulSwitchLoadBalancer.shutdown(GracefulSwitchLoadBalancer.java:195)
at io.grpc.xds.ClusterManagerLoadBalancer$ChildLbState.shutdown(ClusterManagerLoadBalancer.java:256)
at io.grpc.xds.ClusterManagerLoadBalancer.shutdown(ClusterManagerLoadBalancer.java:138)
at io.grpc.internal.AutoConfiguredLoadBalancerFactory$AutoConfiguredLoadBalancer.shutdown(AutoConfiguredLoadBalancerFactory.java:164)
at io.grpc.internal.ManagedChannelImpl.shutdownNameResolverAndLoadBalancer(ManagedChannelImpl.java:381)
at io.grpc.internal.ManagedChannelImpl.access$8200(ManagedChannelImpl.java:118)
at io.grpc.internal.ManagedChannelImpl$DelayedTransportListener.transportTerminated(ManagedChannelImpl.java:2174)
at io.grpc.internal.DelayedClientTransport$3.run(DelayedClientTransport.java:122)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
at io.grpc.SynchronizationContext.execute(SynchronizationContext.java:127)
at io.grpc.internal.ManagedChannelImpl$RealChannel.shutdown(ManagedChannelImpl.java:1057)
at io.grpc.internal.ManagedChannelImpl.shutdown(ManagedChannelImpl.java:817)
at io.grpc.internal.ManagedChannelImpl.shutdownNow(ManagedChannelImpl.java:837)
at io.grpc.internal.ManagedChannelImpl.shutdownNow(ManagedChannelImpl.java:117)
at io.grpc.internal.ForwardingManagedChannel.shutdownNow(ForwardingManagedChannel.java:52)
at io.grpc.internal.ManagedChannelOrphanWrapper.shutdownNow(ManagedChannelOrphanWrapper.java:65)
at io.grpc.testing.integration.GrpclbFallbackTestClient.tearDown(GrpclbFallbackTestClient.java:178)
at io.grpc.testing.integration.GrpclbFallbackTestClient.main(GrpclbFallbackTestClient.java:67)
Caused by: java.lang.NullPointerException
at io.grpc.xds.ClientXdsClient.handleResourceResponse(ClientXdsClient.java:179)
at io.grpc.xds.AbstractXdsClient$AbstractAdsStream.handleRpcResponse(AbstractXdsClient.java:358)
at io.grpc.xds.AbstractXdsClient$AdsStreamV3$1$1.run(AbstractXdsClient.java:511)
at io.grpc.SynchronizationContext.drain(SynchronizationContext.java:95)
... 26 more
```
* xds: security code refactoring/renaming
1) move certprovider package under security
2) refactor inner Factory into CertProviderClientSslContextProviderFactory and CertProviderServerSslContextProviderFactory
3) Make CertProviderClientSslContextProvider and CertProviderServerSslContextProvider non-public
4) use only public (non package private) types like SslContextProvider (instead of CertProviderClientSslContextProvider etc)
Mainly refactor work to make type specific xds resources generic, e.g.
1. Define abstract class XdsResourceType to be extended by pluggable new resources. It mainly contains abstract method doParse() to parse unpacked proto messges and produce a ResourceUpdate. The common unpacking proto logic is in XdsResourceType default method parse()
2. Move the parsing/processing logics to specific XdsResourceType. Implementing:
XdsListenerResource for LDS
XdsRouteConfigureResource for RDS
XdsClusterResource for CDS
XdsEndpointResource for EDS
3. The XdsResourceTypes are singleton. To process for each XdsClient, context is passed in parameters, defined by XdsResourceType.Args.
4. Watchers will use generic APIs to subscribe to resource watchXdsResource(XdsResourceType, resourceName, watcher). Watcher and ResourceSubscribers becomes java generic class.
Removes the option of skipping the update of the priority LB state when
the failover timer is pending.
This consistency facilitates a future change weher we delay child LB
status updates if the priority LB is performing an update. The upcoming
priority LB policy gRFC also does not require this update to ever be
skipped.
The log id is an incrementing value starting from 0. That means the same
binary on two different machines will have the same hashes for each
consecutive Channel. That was not at all the intension of CHANNEL_ID.
From gRFC A42:
> This can be used in similar situations to where Envoy uses
> connection_properties to hash on the source IP address.
No logic changes, just cleans up warnings to make spotting real problems easier.
Remove "public" declarations on interfaces
Remove duplicate semicolons (Java lines ending in ";;")
Remove unneeded import
Change non-javadoc comment to not start with "/**"
Remove unneeded explicit type declarations from generics
Fix broken javadoc links
It's been 17 months since the check was introduced, which is plenty for
the migration. Leaving ignoreRefreshNameResolutionCheck() in-place to
let users delete their call sites. We'll remove the method after a few
releases.
Fixes#9409
This fixes builds including dependencies from Maven that use
io.grpc:grpc-services or io.grpc:grpc-xds. It resolves this error:
```
no such target '@io_grpc_grpc_java//services:services': target 'services' not declared in package 'services' defined by services/BUILD.bazel and referenced by '@maven//:io_grpc_grpc_services'
```
Fixes#9419
- Reduce nesting level by using `continue`
- Rearrange the order when it's possible to bail out early
- Add comments explaining what case it is, and the logic behind it
Introduces a new acceptResolvedAddresses() to the LoadBalancer.
This will now be the preferred way to handle addresses from the NameResolver. The existing handleResolvedAddresses() will eventually be deprecated.
The new method returns a boolean based on the LoadBalancers ability to use the provided addresses. If it does not accept them, false is returned. LoadBalancer implementations using the new method should no longer implement the canHandleEmptyAddressListFromNameResolution(), which will eventually be removed, along with handleResolvedAddresses().
Backward compatibility will be maintained so existing load balancers using handleResolvedAddresses() will continue to work.
Additionally the previously deprecated handleResolvedAddressGroups() method is removed.
%s is fairly safe (requires a Formattable to use Locale), so %d is the
main risk item. Places that really didn't need to use String.format()
were converted to plain string concatenation. Logging locations were
generally converted to using the log infrastructure's delayed
formatting, which is generally locale-sensitive but we're okay with
that. That wasn't done in okhttp, however, because Android frequently
doesn't use MessageFormat so we'd lose the parameters. Everywhere else
was explicitly defined to be Locale.US, to be consistent independent of
the default system locale.