* netty: implement UdsNameResolver and UdsNettyChannelProvider
When the scheme is "unix:", the UdsNettyChannelProvider creates a
NettyChannelBuilder with a DomainSocketAddress and the other related
parameters needed for UDS sockets.
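A minimal sketch of that flow, assuming the Netty epoll native transport is
available; the helper class and method names are illustrative, not the actual
provider implementation:
```java
// Rough sketch only: map a "unix:" target onto a DomainSocketAddress-based
// Netty channel. Not the shipped UdsNettyChannelProvider code.
import io.grpc.ManagedChannelBuilder;
import io.grpc.netty.NettyChannelBuilder;
import io.netty.channel.epoll.EpollDomainSocketChannel;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.unix.DomainSocketAddress;

final class UdsChannelSketch {
  /** Builds a channel builder for a target like "unix:///tmp/server.sock". */
  static ManagedChannelBuilder<?> forUdsPath(String path) {
    return NettyChannelBuilder
        .forAddress(new DomainSocketAddress(path))
        // UDS needs a native transport (epoll/kqueue); plain NIO cannot do it.
        .channelType(EpollDomainSocketChannel.class)
        .eventLoopGroup(new EpollEventLoopGroup())
        .usePlaintext();
  }
}
```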
This should fix test failures on aarch64.
```
expected to be less than: 0.0
but was : 0.0
at app//io.grpc.benchmarks.driver.LoadWorkerTest.assertWorkOccurred(LoadWorkerTest.java:198)
at app//io.grpc.benchmarks.driver.LoadWorkerTest.runUnaryBlockingClosedLoop(LoadWorkerTest.java:90)
```
runUnaryBlockingClosedLoop() has been failing while the other tests
succeed. The failure is complaining that getCount() == 0, which means
no RPCs completed. The slowest successful test has a mean RPC time of
226 ms (the unit was logged incorrectly), and compared to the x86 tests
runUnaryBlockingClosedLoop() is ~2x as slow because it executes first.
So this is probably _barely_ failing and 4 attempts instead of 3 would
be sufficient. While the test tries to wait for 10 RPCs to complete, it
seems likely it is stopping early even for the successful runs on
aarch64. There are 4 concurrent RPCs, so to get 10 RPCs we need to wait
for 3 batches of RPCs to complete which would be 1346 ms (5 loops)
assuming a 452 ms mean latency. Bumping timeout by 10x to give lots of
headroom.
When a problem happens, it will now report back quickly instead of
waiting until the timeout expires. The timeout exception will also
report each RPC's state.
This is to help diagnose aarch64 test failures.
The test appears to be slow because of classloading. The failure cases
were very slow at 14-16 seconds, but looking at other logs it succeeds
after 12 seconds. It is the first test in the class, and the other tests
run much faster. This could be solved with warmup code, but increasing
the RPC deadline is easier.
Two back-to-back failures on aarch64:
https://source.cloud.google.com/results/invocations/c4612a28-d594-42e9-b8ab-12c999690b40/targets
https://source.cloud.google.com/results/invocations/3d5d1dc2-6b47-493d-b15c-e99458067d73/targets
```
expected to be true
at app//io.grpc.rls.CachingRlsLbClientTest.rls_withCustomRlsChannelServiceConfig(CachingRlsLbClientTest.java:267)
```
And the next run failed on a different line but seems the same cause:
https://source.cloud.google.com/results/invocations/546b83d1-cd26-4b87-8871-a7a06a60dc06/targets
```
expected to be true
at app//io.grpc.rls.CachingRlsLbClientTest.rls_withCustomRlsChannelServiceConfig(CachingRlsLbClientTest.java:273)
```
Reproduced with:
```diff
diff --git a/rls/src/test/java/io/grpc/rls/CachingRlsLbClientTest.java b/rls/src/test/java/io/grpc/rls/CachingRlsLbClientTest.java
index 9fac852fa..631d632eb 100644
--- a/rls/src/test/java/io/grpc/rls/CachingRlsLbClientTest.java
+++ b/rls/src/test/java/io/grpc/rls/CachingRlsLbClientTest.java
@@ -264,6 +264,11 @@ public class CachingRlsLbClientTest {
// initial request
CachedRouteLookupResponse resp = getInSyncContext(routeLookupRequest);
+ try {
+ Thread.sleep(2000);
+ } catch (Exception e) {
+ throw new RuntimeException(e);
+ }
assertThat(resp.isPending()).isTrue();
// server response
```
1. Move orca from xds and from service to a new io.grpc.xds.orca package
2. Keep CallMetricsRecorder and InternalCallMetricsRecorder in service
3. Add APIs for recording utilization/requestCost/cpuUtilization/memoryUtilization for per-query requests, and add an internal data structure equivalent to OrcaLoadReport (see the sketch below)
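A hedged sketch of what per-query recording could look like with such an API.
The method names below follow io.grpc.services.CallMetricRecorder as I
understand it; treat them as assumptions rather than the exact released
signatures:
```java
// Sketch: record per-query metrics from within a server handler.
import io.grpc.services.CallMetricRecorder;

final class PerQueryMetricsSketch {
  static void recordForCurrentCall() {
    CallMetricRecorder.getCurrent()
        .recordCpuUtilizationMetric(0.45)             // fraction of CPU in use
        .recordMemoryUtilizationMetric(0.30)          // fraction of memory in use
        .recordUtilizationMetric("queue", 0.8)        // named utilization entry
        .recordRequestCostMetric("db_queries", 12.0); // per-request cost entry
  }
}
```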
This LB is the parent for weighted_target and will configure it based on the child policy it gets in its configuration and locality weights that come in a ResolvedAddresses attribute.
Described in [A52: gRPC xDS Custom Load Balancer Configuration](https://github.com/grpc/proposal/pull/298)
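A small sketch of the configuration this parent might hand to weighted_target,
under the assumption that the locality weights arrive as a simple map; the map
shape and names are illustrative, not the actual policy code:
```java
// Illustrative only: combine the configured child policy with per-locality
// weights (taken from a ResolvedAddresses attribute) into a
// weighted_target-style config of {locality -> {weight, childPolicy}}.
import java.util.LinkedHashMap;
import java.util.Map;

final class WrrLocalityConfigSketch {
  static Map<String, Object> buildWeightedTargetConfig(
      Map<String, Integer> localityWeights, Object childPolicyConfig) {
    Map<String, Object> targets = new LinkedHashMap<>();
    for (Map.Entry<String, Integer> entry : localityWeights.entrySet()) {
      Map<String, Object> target = new LinkedHashMap<>();
      target.put("weight", entry.getValue());
      target.put("childPolicy", childPolicyConfig);
      targets.put(entry.getKey(), target);
    }
    return targets;
  }
}
```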
grpc-observability was accidentally included in grpc-bom in 1.45 even
though it was not published to Maven Central. This is intended to reduce
the likelihood of such things reoccurring. We only include a project in
the bom if it is using maven-publish and if the publishing task is
enabled.
onlyIf is very similar to enabled, except it is processed just before
the task is run. We need a more static property here, so swap to
enabled. If a project uses onlyIf in the future, grpc-bom won't be able
to automatically exclude it.
Fix the issue with `Linux aarch64 (emulated)` builds failing with
```
Expiring Daemon because JVM heap space is exhausted
Daemon will be stopped at the end of the build after running out of JVM memory
```
This fixes the build itself, however certain tests still fail.
This reverts commit 0963f3151d. This
causes dependency problems when importing into Google, as
google-auth-library-java needs to be upgraded and that requires an
upgrade to google-http-java-client to bring in
https://github.com/googleapis/google-http-java-client/pull/1505 .
Reverting for now and will roll forward once those upgrades are
performed.
Remove unused xds/third_party/istio/src/main/proto/security/proto/providers/google/meshca.proto
and xds/src/generated/main/grpc/com/google/security/meshca/v1/MeshCertificateServiceGrpc.java
generated from it.
Retryable was added in google-auth-library 1.5.3 to make clear the
situations that deserve a retry of the RPC. Bump to that version and
swap away from the imprecise IOException heuristic.
go/auth-correct-retry
Fixes #6808
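Roughly, the new decision looks like the sketch below (the surrounding
call-credentials plumbing is omitted); Retryable and isRetryable() come from
google-auth-library 1.5.3 as noted above:
```java
// Sketch: decide whether to retry based on the explicit Retryable signal
// instead of the old "any IOException is retryable" heuristic.
import com.google.auth.Retryable;

final class AuthRetryDecision {
  static boolean shouldRetry(Throwable cause) {
    if (cause instanceof Retryable) {
      // The auth library states explicitly whether this failure is retryable.
      return ((Retryable) cause).isRetryable();
    }
    // Previously: return cause instanceof java.io.IOException;
    return false;
  }
}
```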
Proto updates:
- cncf/xds: Sort xds/import.sh protos alphabetically
- cncf/xds: Sync protos to cncf/xds@d92e9ce (commit 2021-12-16, corresponding to
envoy cl/440193522). It's a no-op for used protos, but helpful to import the
latest matcher.proto
- cncf/xds: Import xds/type/matcher/v3/matcher.proto with dependencies
- envoyproxy/protoc-gen-validate: Sync protos to
envoyproxy/protoc-gen-validate@dfcdc5e (commit 2022-03-10, corresponding to
envoy cl/440193522) to pick up ignore_empty field required for the following
envoy sync
- envoyproxy/envoy: Sync protos to envoyproxy/envoy@e33f444 (commit 2022-04-07,
cl/440193522). This is the minimal version needed to pick up
ClusterSpecifierPlugin.is_optional.
  a. Generated code: AggregatedDiscoveryServiceGrpc was regenerated from the
  updated proto. This is a no-op, just a minor change to the docblocks.
  b. Deprecated fields had to be taken care of manually; see "Manual updates
  to the code" below.
- envoyproxy/envoy: Sync protos to the latest imported version
envoyproxy/envoy@5d74719 (commit 2022-04-08, cl/443359189). Not needed for
anything specific, just the latest version, and it was easy to import.
Manual updates to the code as the result of envoyproxy/envoy@e33f444 sync:
- Deprecated ConfigSource.path replaced with the ConfigSource.path_config_source
in test fake resources. The ConfigSource.path isn't in active code paths, so
no prod code changes needed.
- Suppress CertificateValidationContext.match_subject_alt_names deprecations in
test files. Surprisingly, we don't report deprecations in prod files, despite
the fact this field is used in prod code a few times.
* api: add support for SocketAddress types in ManagedChannelProvider
Also add support for SocketAddress types in NameResolverProvider.
Use the scheme in the target URI to select a NameResolverProvider and get
that provider's supported SocketAddress types.
Implement selection in ManagedChannelRegistry of the appropriate
ManagedChannelProvider based on the NameResolver's SocketAddress types
(sketched below).
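A simplified sketch of the selection rule, using stand-in interfaces rather
than the real registry types: the resolver provider for the target's scheme
declares which SocketAddress types it produces, and the registry picks a
channel provider that supports all of them.
```java
// Stand-in types only; the real ManagedChannelRegistry/provider APIs differ.
// This just demonstrates the "supported address types must cover produced
// address types" matching rule.
import java.net.SocketAddress;
import java.util.Collection;
import java.util.List;

final class ProviderSelectionSketch {
  interface ChannelProviderLike {
    Collection<Class<? extends SocketAddress>> supportedSocketAddressTypes();
  }

  static ChannelProviderLike select(
      Collection<Class<? extends SocketAddress>> producedByResolver,
      List<ChannelProviderLike> providers) {
    for (ChannelProviderLike provider : providers) {
      if (provider.supportedSocketAddressTypes().containsAll(producedByResolver)) {
        return provider;
      }
    }
    throw new IllegalStateException(
        "No ManagedChannelProvider supports the resolver's SocketAddress types");
  }
}
```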
This refactoring is done in preparation of a larger change where LB configuration will be provided in the xDS Cluster proto message load_balancing_policy field. This field will allow for the configuration of custom LB policies with arbitrary configuration data.
- Instead of directly creating Java configuration objects, the client delegates to a new factory class to generate JSON configurations
- This factory is considered a "legacy" one as a separate factory will be introduced to build configs based on the new load_balancing_policy field
- The client will use a LoadBalancerProvider to parse the generated config to ensure it is valid (see the sketch below).
- Overlapping LB config validation that exists both in ClientXdsClient and LB providers will be removed from the client.
This is a second attempt at #8996 that was reverted by #9092.
The initial PR was reverted because the change caused the duplicate CDS update detection in ClientXdsClient to fail. This was because equality checking of PolicySelection instances cannot be relied on. This PR uses the JSON config instead - CdsLoadBalancer2 will handle the conversion from JSON config to PolicySelection.
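The validation step could look roughly like this sketch; the policy name and
config map are placeholders and error handling is simplified:
```java
// Sketch: hand the generated JSON-like config to the policy's provider so it
// can be rejected early if invalid. Per the description above, CdsLoadBalancer2
// later re-parses the JSON to produce the actual config object.
import io.grpc.LoadBalancerProvider;
import io.grpc.LoadBalancerRegistry;
import io.grpc.NameResolver.ConfigOrError;
import java.util.Map;

final class LegacyLbConfigValidationSketch {
  static void validate(String policyName, Map<String, ?> rawJsonConfig) {
    LoadBalancerProvider provider =
        LoadBalancerRegistry.getDefaultRegistry().getProvider(policyName);
    if (provider == null) {
      throw new IllegalArgumentException("Unknown LB policy: " + policyName);
    }
    ConfigOrError parsed = provider.parseLoadBalancingPolicyConfig(rawJsonConfig);
    if (parsed.getError() != null) {
      throw new IllegalArgumentException("Invalid LB config: " + parsed.getError());
    }
  }
}
```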
changes in priority:
Keep track of whether a child has seen TRANSIENT_FAILURE more recently than IDLE or READY, and use this to decide whether to restart the failover timer when a child reports CONNECTING. This ensures that we properly start the failover timer when the ring_hash child policy transitions from IDLE to CONNECTING at startup.
The behaviour change also affects address updates where the current priority stays in CONNECTING: previously it reported CONNECTING again, but now it does not report and instead waits because the failover timer is in effect. This helps it move on to the next priority.
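A hedged sketch of the per-child bookkeeping described above; the actual
priority LB keeps more state, and the timer wiring is omitted:
```java
// Track whether TRANSIENT_FAILURE was seen more recently than IDLE/READY and
// use that to decide whether a CONNECTING report should (re)start the failover
// timer. Illustrative only.
import io.grpc.ConnectivityState;

final class ChildFailoverStateSketch {
  private boolean seenTransientFailureSinceIdleOrReady;

  /** Returns whether the failover timer should be (re)started for this report. */
  boolean onChildStateChange(ConnectivityState newState) {
    switch (newState) {
      case TRANSIENT_FAILURE:
        seenTransientFailureSinceIdleOrReady = true;
        return false;
      case IDLE:
      case READY:
        seenTransientFailureSinceIdleOrReady = false;
        return false;
      case CONNECTING:
        // Covers ring_hash going IDLE -> CONNECTING at startup: the timer runs.
        return !seenTransientFailureSinceIdleOrReady;
      default:
        return false;
    }
  }
}
```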
* Revert "- Change config builder to a static factory class. - Remove validation and default value logic that already exists in providers from the factory. - Using the PolicySelection in CdsUpdate instead of the JSON config."
This reverts commit 54c72b945e.
* Revert "xds: ClientXdsClient to provide LB config in JSON"
This reverts commit 4903b44a82.
Users should be able to inject all executors. The transport shouldn't be
hard-coded to create the TIMER_SERVICE, especially since a scheduler is
already available to the builder.
This matches what we do in ManagedChannelImplBuilder and
NettyChannelBuilder. It also fixes a (probably unimportant) bug where
the factory returned from swapChannelCredentials() didn't have its own
references to the executors, so it could not outlive the parent factory.
- Remove validation and default value logic that already exists in providers from the factory.
- Use the PolicySelection in CdsUpdate instead of the JSON config.
This refactoring is done in preparation of a larger change where LB
configuration will be provided in the xDS Cluster proto message
load_balancing_policy field. This field will allow for the configuration
of custom LB policies with arbitrary configuration data.
- Instead of directly creating Java configuration objects, the client
delegates to a new builder to generate JSON configurations
- This factory is considered a "legacy" one as a separate factory will
be introduced to build configs based on the new load_balancing_policy
field
- The client will use a LoadBalancerProvider to parse the generated
config to ensure it is valid.
- CdsLoadBalancer2 will parse the config again to produce the LB config
object passed down to child LBs.
A substantial portion of the methods are unused. While these don't
contribute to the size of Android builds because of dead code
elimination in the build process, they still show up in static analysis
and raise questions like "when are we using MD5" or "when are we special
casing exception message text" (answer: "we're not").
This would limit LRS stream creation to one per second, even if the
old stream was considered good as it received a response. This is the
same change as made to ADS in 957079194a.
b/224833499
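A simplified sketch of that throttling, with illustrative names; the real code
ties into the existing backoff machinery:
```java
// Sketch: even after a healthy stream, wait until at least one second has
// elapsed since the previous stream was started before creating a new one.
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class LrsStreamRestartSketch {
  private final ScheduledExecutorService timer;
  private Long lastStreamStartNanos;  // null until the first stream starts

  LrsStreamRestartSketch(ScheduledExecutorService timer) {
    this.timer = timer;
  }

  void scheduleStreamStart(Runnable startNewStream) {
    long minIntervalNanos = TimeUnit.SECONDS.toNanos(1);
    long delayNanos = 0;
    if (lastStreamStartNanos != null) {
      long elapsed = System.nanoTime() - lastStreamStartNanos;
      delayNanos = Math.max(0, minIntervalNanos - elapsed);
    }
    timer.schedule(() -> {
      lastStreamStartNanos = System.nanoTime();
      startNewStream.run();
    }, delayNanos, TimeUnit.NANOSECONDS);
  }
}
```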
In the olden days, before LB policies, transports had to accept RPCs as
soon as they were created. This hasn't been true for a very long time,
so remove the tests.
Since a978c9ed we're using real, legit code flows in the tests. This
allowed TSAN to discover that `attributes` is read racily when a new
stream is created before the transport is ready. We could use a lock
or volatile, but the value of the attributes would still be incorrect
for any RPCs created before the transport is ready.
Since there's now only one test that delays the connection, I inline the
support code.
This greatly reduces the number of arguments passed to the constructor
and allows using the builder in tests to change specific arguments
without having to pass all the other arguments. It also makes it easier
to see where tests are doing something special.
While it is weird to expose fields as package-private for digging into
in the constructor, it's actually very similar to the pattern of passing
the builder instance into the constructor. In this case, the weirdness is
because the builder isn't a nested class of the transport and there is
an additional level of building going on (Builder and TransportFactory).
We already use this pattern in ManagedChannelImpl, which only has the one
level of building.
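For illustration only (these names are made up), the shape of the pattern is:
```java
// The transport-like class reads the builder's package-private fields in its
// constructor instead of taking a long positional argument list; tests can
// tweak just the fields they care about.
final class FakeTransportBuilder {
  // Package-private fields the "transport" digs into directly.
  String authority = "example.com";
  int flowControlWindow = 65535;
  boolean keepAliveEnabled = false;

  FakeTransportBuilder flowControlWindow(int window) {
    this.flowControlWindow = window;
    return this;
  }

  FakeTransport build() {
    return new FakeTransport(this);
  }
}

final class FakeTransport {
  final String authority;
  final int flowControlWindow;
  final boolean keepAliveEnabled;

  FakeTransport(FakeTransportBuilder builder) {
    this.authority = builder.authority;
    this.flowControlWindow = builder.flowControlWindow;
    this.keepAliveEnabled = builder.keepAliveEnabled;
  }
}
```
A test can then do `new FakeTransportBuilder().flowControlWindow(1 << 20).build()`
without restating every other argument.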
Ticker is powered by System.nanoTime() which is CLOCK_MONOTONIC.
TimeProvider is powered by System.currentTimeMillis() which is
CLOCK_REALTIME. For durations, the monotonic clock is appropriate, not
the wall time which can jump around.
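For example, a duration measured with a Ticker-backed Stopwatch stays correct
even if the wall clock is adjusted mid-measurement (sketch, using Guava):
```java
// Monotonic duration measurement via Ticker/Stopwatch (System.nanoTime under
// the hood), versus wall-clock time which can jump when the clock is adjusted.
import com.google.common.base.Stopwatch;
import com.google.common.base.Ticker;
import java.util.concurrent.TimeUnit;

final class DurationSketch {
  static long measureMillis(Runnable task) {
    // Ticker is backed by System.nanoTime() (CLOCK_MONOTONIC), so the elapsed
    // time is unaffected by wall-clock adjustments. Using
    // System.currentTimeMillis() (CLOCK_REALTIME) here could jump around.
    Stopwatch stopwatch = Stopwatch.createStarted(Ticker.systemTicker());
    task.run();
    return stopwatch.elapsed(TimeUnit.MILLISECONDS);
  }
}
```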