995 Commits

Author SHA1 Message Date
Eric Anderson 75fa441fc9
xds: Plumb the Cluster's filterMetadata to RPCs
This will be used by CSM observability, and may get exposed to further
uses in the future.
2024-05-24 15:08:36 -07:00
Eric Anderson fea577c804
core: Exit idle mode when delayed transport is in use
8844cf7b8 triggered a regression where a new RPC wouldn't cause the
channel to exit idle mode, if an RPC was still progressing on an old
transport. This was already possible previously, but was racy.
8844cf7b8 made it less racy and more obvious.

The two added `exitIdleMode()` calls in this commit are companions to
those in `enterIdleMode()`, which detect whether the channel should
immediately exit idle mode.

Noticed in cl/635819804.
2024-05-23 14:45:38 -07:00
Vindhya Ningegowda 77a1e77e11
xds, rls: Experimental metrics are disabled by default (#11196)
Experimental metrics (i.e., WRR and RLS metrics) are disabled by default. Users are expected to explicitly enable them when configuring metrics.
2024-05-10 17:46:58 -07:00
Eric Anderson 7a663f633c api: Hide internal metric APIs
Some APIs were marked experimental but had internal APIs in their
surface. These were all changed to internal. And then the internal APIs
were mostly hidden from generated documentation.

All these APIs will eventually become public and maybe even stable. But
they need some iteration before we're ready for others to start using
them.
2024-05-08 10:24:24 -07:00
Larry Safran 59b189bf91
Change HappyEyeballs and new pick first LB flags default value to false (#11120)
* Change HappyEyeballs flag default value to false since some G3 users are seeing problems.
Put the flag logic in a common place for PickFirstLeafLoadBalancer & WRR's test.

* Set expected requestConnection count based on whether happy eyeballs is enabled or not

* Disable new PickFirstLB

* Fix test expectations to handle both new and old PF LB paths.
2024-05-08 10:08:23 -07:00
Eric Anderson 45a91bd035 xds: Add WRR metric test with real channel 2024-05-08 07:50:09 -07:00
Terry Wilson 2bc4306940
xds: Include locality label in WRR metrics (#11170) 2024-05-07 11:40:03 -07:00
hakusai22 6ec744f2a0
Fix various typos (#11144) 2024-05-06 20:29:44 -07:00
Eric Anderson 077dcbf90f xds: Plumb locality in xds_cluster_impl and weighted_target
As part of gRFC A78:

> To support the locality label in the WRR metrics, we will extend the
> `weighted_target` LB policy (see A28) to define a resolver attribute
> that indicates the name of its child. This attribute will be passed
> down to each of its children with the appropriate value, so that any
> LB policy that sits underneath the `weighted_target` policy will be
> able to use it.

xds_cluster_impl is involved because it uses the child names in the
AddressFilter, which must match the names used by weighted_target.
Instead of using Locality.toString() in multiple policies and assuming
the policies agree, we now have xds_cluster_impl decide the locality's
name and pass it down explicitly. This allows us to change the name
format to match gRFC A78:

> If locality information is available, the value of this label will be
> of the form `{region="${REGION}", zone="${ZONE}",
> sub_zone="${SUB_ZONE}"}`, where `${REGION}`, `${ZONE}`, and
> `${SUB_ZONE}` are replaced with the actual values. If no locality
> information is available, the label will be set to the empty string.
2024-05-03 12:42:31 -07:00
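The label format quoted from gRFC A78 in 077dcbf90f can be illustrated with a small sketch. The class and method names below are hypothetical and are not grpc-java API; treating three `null` arguments as "no locality information" is likewise an assumption made only for this example.

```java
class LocalityLabelDemo {
    // Hypothetical helper (not grpc-java API): builds the A78-style locality label.
    // Assumption: passing null for all three fields means no locality is available.
    static String localityLabel(String region, String zone, String subZone) {
        if (region == null && zone == null && subZone == null) {
            return ""; // A78: empty string when no locality information is available
        }
        return "{region=\"" + region + "\", zone=\"" + zone
            + "\", sub_zone=\"" + subZone + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(localityLabel("us-east1", "us-east1-b", ""));
    }
}
```

Computing the string in one place mirrors the commit's point that a single policy should decide the name instead of several policies each calling `Locality.toString()` and assuming they agree.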
Terry Wilson 35a171bc1d
xds: include the target label in WRR metrics (#11141) 2024-05-01 15:20:38 -07:00
Eric Anderson 4c78a9746c
Plumb optional labels from LB to ClientStreamTracer
As part of gRFC A78:

> To support the locality label in the per-call metrics, we will provide
> a mechanism for LB picker to add optional labels to the call attempt
> tracer.
2024-04-29 16:30:51 -07:00
Terry Wilson 06df25b65d
core,xds: Metrics recording in WRR LB (#11129)
Adds the recording of the four metrics documented in:

https://github.com/grpc/proposal/blob/master/A78-grpc-metrics-wrr-pf-xds.md#weighted-round-robin-lb-policy
2024-04-26 15:59:49 -07:00
Eric Anderson 9de8e44384 util: Remove deactivation and GracefulSwitchLb from MultiChildLb
It is easy to manage these things outside of MultiChildLb and it makes
the shared code simpler and lets it use less memory. In particular, we don't want
to use many instances of GracefulSwitchLb in virtually every policy
simply because it was needed in one or two cases.
2024-04-22 07:48:49 -07:00
Eric Anderson 7f0a1910d3 xds: Directly manage deactivation in cluster manager 2024-04-22 07:48:49 -07:00
Eric Anderson 61bf21e2a1 xds: Swap RingHashLb to use lazy child, instead of deactivation 2024-04-22 07:48:49 -07:00
Eric Anderson 32d48ae89a bazel: Fix formatting with buildifier
Keys and dependencies are sorted.
2024-04-04 11:50:07 -07:00
Kannan J 097a46b761 Use an empty string instead of null for the endpoint identification algorithm to disable server hostname verification, since a null value is ignored by Sun's SSLEngine implementation. 2024-03-28 16:58:48 -07:00
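A minimal sketch of the technique named in 097a46b761, using only the standard `javax.net.ssl.SSLParameters` API. The class and method names are hypothetical, and this is not the actual grpc-java change; it only shows what passing an empty string looks like.

```java
import javax.net.ssl.SSLParameters;

class EndpointIdDemo {
    // Hypothetical helper: returns parameters with the endpoint identification
    // algorithm set to the empty string rather than null, as the commit describes.
    static SSLParameters withoutHostnameVerification() {
        SSLParameters params = new SSLParameters();
        params.setEndpointIdentificationAlgorithm("");
        return params;
    }

    public static void main(String[] args) {
        SSLParameters params = withoutHostnameVerification();
        // The getter now returns "" instead of the default null.
        System.out.println(params.getEndpointIdentificationAlgorithm().isEmpty());
    }
}
```

The resulting parameters would typically be applied with `SSLEngine.setSSLParameters(params)`. Disabling hostname verification weakens TLS, so it only makes sense where the peer's identity is checked some other way.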
David Burns 00649913b0
bazel: Use the `artifact` macro for loading maven deps
The recommended way to load dependencies from `rules_jvm_external`
is to make use of the `@maven` workspace, and the most readable
way of doing that is to use the `artifact` macro it provides.

This removes the need to generate the "compat" namespaces, which
`rules_jvm_external` provided for backwards compatibility with
older releases. This change also sets things up for supporting
`bzlmod`: this requires all workspaces accessed by a library to
be named "up front" in the `MODULE.bazel` file. This way, the
only repo that needs to be exported is `@maven`, rather than the
current huge list.
2024-03-28 14:33:32 -07:00
Eric Anderson e6305930de Specify a locale for upper/lower case conversions
None of these conversions should use the arbitrary system locale. Error
Prone will help prevent these getting introduced in the future.

Fixes #10372
2024-03-27 15:58:34 -07:00
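The motivation for e6305930de can be seen with a short standalone example. It is not taken from the commit; the class and method names are hypothetical. The Turkish locale stands in for an arbitrary system default, because it lowercases `I` to a dotless `ı`.

```java
import java.util.Locale;

class LocaleCaseDemo {
    // Locale-independent lowercasing: same result on every machine.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Lowercasing under Turkish rules, standing in for an unlucky default locale.
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr-TR"));
    }

    public static void main(String[] args) {
        System.out.println(lowerRoot("TITLE"));
        // Under Turkish rules the 'I' becomes U+0131 (dotless i), so the
        // result no longer equals "title".
        System.out.println(lowerTurkish("TITLE").equals("title"));
    }
}
```

Calling `toLowerCase()` with no argument uses the JVM's default locale, so protocol-level strings such as header names could change depending on where the code runs.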
Larry Safran 51f811df86
Enable Happy Eyeballs by default (#11022)
* Flip the flag

* Fix test flakiness where IPv6 was not considered loopback
2024-03-21 16:59:54 -07:00
Larry Safran 38f968fafb
Have EDS resource parse the additional addresses from envoy message (#11011)
* Have EDS resource parse the additional addresses from envoy message
* Update repositories.bzl to point to current grpc-proto instead of a 2021 version.
* Update repositories.bzl to point to recent cncf/xds and envoyproxy/data-plane-api
* Add cncf_upda to repositories.bzl
2024-03-15 12:26:21 -07:00
Sergii Tkachenko 0d749c5943
xds: Stabilize CsdsService (#11003)
To make it stable, this PR hides protobuf from being exposed via the
API.

Note: this breaks ABI of `CsdsService.streamClientStatus` and
`CsdsService.fetchClientStatus`, but these methods should not
normally be called by the user.

Closes #8016.
2024-03-11 16:14:05 -07:00
Larry Safran d1c406bd23
Prepare to switch flag to use new PickFirstLeafLoadBalancer by default (#10998)
* Fix PickFirstLeafLoadBalancer and tests to work when it is used.
* Actually use EAG attributes for subchannels.
2024-03-11 14:12:56 -07:00
Eric Anderson 9ee5e9f008 xds: Move node id logging out of xds.client
This removes a grpc-ism environment variable. Note that the logger is
still registered under XdsClientImpl. That could maybe change, but it is
a bit unclear what it should become and it seemed better for this to
have no behavior changes.
2024-03-06 07:25:54 -08:00
Eric Anderson 42b2cbdec3 xds: Move LR and RLS experimental flags to where they are used
That's better for code organization and also removes some grpc-isms from
XdsResourceType.
2024-03-06 07:25:33 -08:00
Eric Anderson 85e52cd113 xds: Remove WRR and PF experimental flags
They have been on by default for a good while, and seem stable. This
also removes some grpc-isms from XdsResourceType.
2024-03-05 13:21:53 -08:00
Eric Anderson 27824469bf xds: Provide default XdsResourceType.extractResourceName impl
This method is only needed sometimes, and with time will be needed less
and less. Don't require new types to implement it, instead relying on
control planes to use the new approach.
2024-03-05 11:30:35 -08:00
Anirudh Ramachandra 867e469404
xds: Support retrieving names from wrapped resource containers (#10975)
The xDS library only honored names retrieved from the inner resource
containers, but for wrapped resources the outer layer could contain the
required name. This commit prefers the name on the wrapped container
over the inner resource name.
2024-03-05 07:22:18 -08:00
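The name preference described in 867e469404 can be sketched as follows. The helper is hypothetical and works on plain strings rather than the real xDS protos; falling back to the inner name when the wrapper name is missing or empty is an assumption of this example.

```java
class ResourceNameDemo {
    // Hypothetical helper (not grpc-java API): prefer the name carried by the
    // wrapping container, and fall back to the inner resource's own name.
    static String resolveName(String wrapperName, String innerName) {
        if (wrapperName != null && !wrapperName.isEmpty()) {
            return wrapperName;
        }
        return innerName;
    }

    public static void main(String[] args) {
        System.out.println(resolveName("outer-name", "inner-name"));
        System.out.println(resolveName("", "inner-name"));
    }
}
```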
Eric Anderson ac62c8b055 Fix tests and warnings on Java 17
SelfSignedCertificate is not available on Java 17 because
OpenJdkSelfSignedCertGenerator is not available. This only impacted
tests.

AccessController is being removed, and these locations are doing simple
reflection which is unlikely to require it even when a security policy
is in effect. There are other places we do reflection without the
AccessController, so either no security policies care or the users can
update their policies to allow it.
2024-02-29 16:55:46 -08:00
Sergii Tkachenko feab4e5449
xds: Get rid of xDS v2 dependencies (#10968)
xDS v2 support was dropped about a year ago, but the xds package still
had a few xDS v2 usages. This PR:

- Removes all leftover usages of xDS v2 classes in grpc-xds
- Removes all imported xDS v2 protos and their leaf dependencies:
- Removes xDS v2 generated services
- Makes minor improvements to the xds import script output

### Before
```sh
# Imported 154 protos.
❯ find . -iname "*xds*.jar" -exec du -h {} \; | col -x
  13M ./build/libs/grpc-xds-1.63.0-SNAPSHOT-original.jar
  6.1M ./build/libs/grpc-xds-1.63.0-SNAPSHOT-sources.jar
  388K ./build/libs/grpc-xds-1.63.0-SNAPSHOT-javadoc.jar
  14M ./build/libs/grpc-xds-1.63.0-SNAPSHOT.jar

```

### After
```sh
# Imported 86 protos.
❯ find . -iname "*xds*.jar" -exec du -h {} \; | col -x
  9.1M ./build/libs/grpc-xds-1.63.0-SNAPSHOT-original.jar
  4.1M ./build/libs/grpc-xds-1.63.0-SNAPSHOT-sources.jar
  388K ./build/libs/grpc-xds-1.63.0-SNAPSHOT-javadoc.jar
  9.1M ./build/libs/grpc-xds-1.63.0-SNAPSHOT.jar
```

Reduction:
- Number of protos: 44%
- Jar size: 35%
2024-02-29 10:33:18 -08:00
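The reduction figures in feab4e5449 follow from the before and after numbers listed in that commit (154 to 86 protos, 14M to 9.1M for the main jar). A small sketch of the arithmetic, with a hypothetical helper name:

```java
class ReductionDemo {
    // Percentage reduction from `before` to `after`, rounded to a whole percent.
    static long percentReduction(double before, double after) {
        return Math.round((before - after) / before * 100.0);
    }

    public static void main(String[] args) {
        System.out.println(percentReduction(154, 86));   // prints 44
        System.out.println(percentReduction(14.0, 9.1)); // prints 35
    }
}
```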
Eric Anderson 9b53bcaf66 xds: Increase timeouts in XdsClientFederationTest
`isolatedResourceDeletions()` has failed with a timeout waiting on
onChanged when running under TSAN. TSAN can slow things down, so let's
increase the timeout to ensure it isn't just timeout flake.
2024-02-27 11:28:24 -08:00
Larry Safran 8087977c0b
Move xds classes for Stubby to xds.client package (#10912)
* Move bootstrap, XdsClient, load reporting, XdsLogger and XdsResourceType to xds.client package.
2024-02-26 16:41:16 -08:00
yifeizhuang 78b3972ff3
xds: fix xdsNameResolver virtual host lookup authority, use service authority instead of ldsResourceName (#10960) 2024-02-26 16:33:20 -08:00
Eric Anderson 569956e022 xds: Pre-add fallback to xds client pool accessor
When we implement A71, we're no longer going to have a single xds
client, but instead one per channel target. Add that parameter now, even
though it is unused, to avoid managing the (internal) API breakage when
we implement fallback.
2024-02-23 14:12:46 -08:00
Eric Anderson bfc0f959cd xds: Avoid nonexistent DNS resolution in XdsClientImplV3Test
The DNS lookups are taking considerable time on the Windows CI (~11s),
which causes the test to time out:

```
Wanted but not invoked:
ldsResourceWatcher.onError(<any>);
-> at io.grpc.xds.XdsClientImplTestBase.sendToNonexistentHost(XdsClientImplTestBase.java:3733)
Actually, there were zero interactions with this mock.

	at io.grpc.xds.XdsClientImplTestBase.sendToNonexistentHost(XdsClientImplTestBase.java:3733)
```

The ARM build, which uses an emulator, has had this test succeed, so the
failure seems unrelated to CPU usage. We want to avoid external I/O
anyway during tests, so removing the DNS lookup is good.

The TSAN comment referenced XdsClientImplTestBase.sendToNonexistentHost,
but the test no longer calls fakeClock.forwardTime so the comment was
out-of-date. Change the comment to make clear the race involved.
2024-02-23 11:44:06 -08:00
Eric Anderson d7628a3aba
xds: Fix flow control data race in ControlPlaneClient
As discovered by TSAN, the adsStream field is not synchronized.
```
WARNING: ThreadSanitizer: data race (pid=1625)
  Read of size 4 at 0x00009b66fc88 by thread T23 (mutexes: write M0):
    #0 io.grpc.xds.ControlPlaneClient.isReady()Z ControlPlaneClient.java:203
    #1 io.grpc.xds.ControlPlaneClient.readyHandler()V ControlPlaneClient.java:211
    #2 io.grpc.xds.ControlPlaneClient$AdsStream.onReady()V ControlPlaneClient.java:328
    #3 io.grpc.xds.GrpcXdsTransportFactory$EventHandlerToCallListenerAdapter.onReady()V GrpcXdsTransportFactory.java:145
    #4 io.grpc.PartialForwardingClientCallListener.onReady()V PartialForwardingClientCallListener.java:44
    #5 io.grpc.ForwardingClientCallListener.onReady()V ForwardingClientCallListener.java:23
    #6 io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onReady()V ForwardingClientCallListener.java:40
    #7 io.grpc.PartialForwardingClientCallListener.onReady()V PartialForwardingClientCallListener.java:44
    #8 io.grpc.ForwardingClientCallListener.onReady()V ForwardingClientCallListener.java:23
    #9 io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onReady()V ForwardingClientCallListener.java:40
    #10 io.grpc.internal.DelayedClientCall$DelayedListener.onReady()V DelayedClientCall.java:497
    #11 io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamOnReady.runInternal()V ClientCallImpl.java:781
    #12 io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamOnReady.runInContext()V ClientCallImpl.java:772
    #13 io.grpc.internal.ContextRunnable.run()V ContextRunnable.java:37
    #14 io.grpc.internal.SerializingExecutor.run()V SerializingExecutor.java:133
    #15 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V ThreadPoolExecutor.java:1130
    #16 java.util.concurrent.ThreadPoolExecutor$Worker.run()V ThreadPoolExecutor.java:630
    #17 java.lang.Thread.run()V Thread.java:830
    #18 (Generated Stub) <null>

  Previous write of size 4 at 0x00009b66fc88 by thread T4 (mutexes: write M1, write M2, write M3, write M4, write M5):
    #0 io.grpc.xds.ControlPlaneClient$AdsStream.cleanUp()V ControlPlaneClient.java:424
    #1 io.grpc.xds.ControlPlaneClient$AdsStream.close(Ljava/lang/Exception;)V ControlPlaneClient.java:418
    #2 io.grpc.xds.ControlPlaneClient$1.run()V ControlPlaneClient.java:130
    #3 io.grpc.SynchronizationContext.drain()V SynchronizationContext.java:94
    #4 io.grpc.SynchronizationContext.execute(Ljava/lang/Runnable;)V SynchronizationContext.java:126
    #5 io.grpc.xds.XdsClientImpl.shutdown()V XdsClientImpl.java:207
    #6 io.grpc.xds.SharedXdsClientPoolProvider$RefCountedXdsClientObjectPool.returnObject(Ljava/lang/Object;)Lio/grpc/xds/XdsClient; SharedXdsClientPoolProvider.java:144
    #7 io.grpc.xds.SharedXdsClientPoolProvider$RefCountedXdsClientObjectPool.returnObject(Ljava/lang/Object;)Ljava/lang/Object; SharedXdsClientPoolProvider.java:102
    #8 io.grpc.xds.XdsClientFederationTest.cleanUp()V XdsClientFederationTest.java:86
```
2024-02-22 14:43:29 -08:00
Eric Anderson f4cc166f18 xds: Copy data in least request to avoid picker data race
In 0d39bf50 the ReadyPicker was changed from holding List<Subchannel> to
List<ChildLbState>, but ChildLbState mutates over time and is not
synchronized. We want the picker to have a snapshot of the data, so copy
the data from ChildLbState instead of using it directly.

Unfortunately the tests depended on the ChildLbState a bit, so we need
to save the EAG only to use it in tests. That's okay for now, but in the
future we'll probably want to remove that unnecessary memory usage.
2024-02-16 13:44:25 -08:00
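The snapshot approach described in f4cc166f18 can be illustrated with a generic sketch. All types here are hypothetical stand-ins, not the grpc-java classes, and the copied fields are chosen only for illustration. The picker copies what it needs at construction time, so later mutation of the child state cannot be observed from the picking thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PickerSnapshotDemo {
    // Hypothetical stand-in for mutable, unsynchronized per-child state.
    static final class ChildState {
        String address;
        boolean ready;
        ChildState(String address, boolean ready) {
            this.address = address;
            this.ready = ready;
        }
    }

    // Immutable copy of the fields the picker reads.
    static final class ChildSnapshot {
        final String address;
        final boolean ready;
        ChildSnapshot(String address, boolean ready) {
            this.address = address;
            this.ready = ready;
        }
    }

    static final class Picker {
        private final List<ChildSnapshot> children;

        // Copy the data once, on the thread that owns the mutable state.
        Picker(List<ChildState> states) {
            List<ChildSnapshot> copy = new ArrayList<>(states.size());
            for (ChildState s : states) {
                copy.add(new ChildSnapshot(s.address, s.ready));
            }
            this.children = Collections.unmodifiableList(copy);
        }

        String addressAt(int index) {
            return children.get(index).address;
        }
    }

    public static void main(String[] args) {
        List<ChildState> states = new ArrayList<>();
        ChildState state = new ChildState("10.0.0.1:443", true);
        states.add(state);
        Picker picker = new Picker(states);
        state.address = "10.0.0.2:443"; // later mutation of the live state
        System.out.println(picker.addressAt(0)); // prints 10.0.0.1:443
    }
}
```

The cost is the extra copy per picker, which matches the commit's remark about accepting some unnecessary memory usage for now.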
Larry Safran 044749706a
util: MultiChildLoadBalancer cleanup (#10780)
* add final, change method permissions, add javadoc, cleanup unneeded, move updateOverallBalancingState to ClusterManagerLB and make it abstract

* Restructure to eliminate the flags as protected methods

* Move methods around so that the candidates for override are near the top.

* Reorder picker methods lower
2024-02-15 14:12:40 -08:00
Eric Anderson 7787673992 xds: Replace isEquivalentTo with equals in LeastRequest
This is similar to the changes to round robin in dca89b25.
2024-02-13 08:29:31 -08:00
Anirudh Ramachandra 608bb8499c
Mark a couple of helper functions in XdsClient as public. (#10871) 2024-02-09 12:53:03 -08:00
Anirudh Ramachandra 52b11c1d08
Expose the getOrCreate method via the InternalSharedXdsClientPoolProvider. This is needed for internal users to both set the bootstrap and interact with the XdsClient via the shared object pool (#10872) 2024-02-09 12:50:26 -08:00
yifeizhuang 03decafa1f
XdsClient is experimental (#10876) 2024-02-06 17:43:30 -08:00
yifeizhuang f6d9221b65
xds: hide TlsContextManager in XdsResourceType.Args (#10894) 2024-02-06 15:41:24 -08:00
Sergii Tkachenko 4f7ec131ec
xds: Googleapis proto sync to 2023-01-10 (#10896)
Sync googleapis protos to
googleapis/googleapis@114a745b28 for
consistency with Envoy and cncf/xds.

The same version is used in [envoy]
(62e7c59374/api/bazel/repository_locations.bzl (L69))
and [cncf/xds]
(0fa0005c9c/bazel/repository_locations.bzl (L23))
since Jan 2023.

Function-wise, this is a noop.
2024-02-05 18:52:55 -08:00
Sergii Tkachenko 68334a019a
xds: Envoy proto sync to 2024-01-24 (#10895)
`envoyproxy/envoy`: Sync protos to the latest imported version
147e6b9523
(commit 2024-01-24, cl/604403196).

Should be a noop, just a routine xDS proto update to make upcoming
RLQS-related imports simpler.
2024-02-05 17:21:51 -08:00
Anirudh Ramachandra 3202370684
Allow users to start watching xDS resources (#10864) 2024-02-01 09:21:00 -08:00
yifeizhuang a97f21b61e
xds: fix NPE in wrr in TF state (#10868) 2024-02-01 09:06:46 -08:00
yifeizhuang 8d280c97e3
xds: move filterRegistry and loadBalancerRegistry out of XdsResourceType.Args (#10843) 2024-01-31 09:45:37 -08:00
yifeizhuang 20abea47bc
xds: move tlsContextManager (#10859)
Minor refactor to the tlsContextManager to not expose itself on the xdsClientImpl constructor.
This is to allow people who plug in an xdsTransportFactory to use the API easily.
2024-01-30 12:59:33 -08:00
yifeizhuang 8e1cc943b0
xds: change controlPlaneClient and loadReportClient to use xdsTransportFactory (#10829) 2024-01-25 17:05:12 -08:00