6308 Commits

Author SHA1 Message Date
Eric Anderson f737cbc143 api: Hide internal metric APIs
Some APIs were marked experimental but had internal APIs in their
surface. These were all changed to internal. And then the internal APIs
were mostly hidden from generated documentation.

All these APIs will eventually become public and maybe even stable. But
they need some iteration before we're ready for others to start using
them.
2024-05-09 11:31:01 -07:00
Eric Anderson 1e731be49a opentelemetry: Rename and stabilize API OpenTelemetryModule
OpenTelemetryModule is renamed to GrpcOpenTelemetry. The Builder is now
`final`, although that should only impact mocks as it had a private
constructor.

Fixes #10591
2024-05-09 10:48:59 -07:00
Eric Anderson 8808d63338 opentelemetry: Missing locality should be empty string
From gRFC A78:

> If no locality information is available, the label will be set to the
> empty string.
2024-05-09 10:48:41 -07:00
Eric Anderson a239063c2b xds: Add WRR metric test with real channel 2024-05-09 10:48:24 -07:00
Terry Wilson 2897b39390 xds: Include locality label in WRR metrics (#11170) 2024-05-09 09:01:54 -07:00
Eric Anderson 3b6b1537d4 rls: Add metric test with real channel 2024-05-09 09:01:35 -07:00
Eric Anderson a639175c04 opentelemetry: Add optional grpc.lb.locality to per-call metrics
The optional label API was added in 4c78a974 and xds_cluster_impl was
plumbed in 077dcbf9.

From gRFC A78:

> ### Optional xDS Locality Label
>
> When xDS is used, it is desirable for some metrics to include an optional
> label indicating which xDS locality the metrics are associated with.
> We want to provide this optional label for the metrics in both the
> existing per-call metrics defined in [A66] and in the new metrics for
> the WRR LB policy, described below.
>
> If locality information is available, the value of this label will be of
> the form `{region="${REGION}", zone="${ZONE}", sub_zone="${SUB_ZONE}"}`,
> where `${REGION}`, `${ZONE}`, and `${SUB_ZONE}` are replaced with the
> actual values.  If no locality information is available, the label will
> be set to the empty string.
>
> #### Per-Call Metrics
>
> To support the locality label in the per-call metrics, we will provide
> a mechanism for LB picker to add optional labels to the call attempt
> tracer.  We will then use this mechanism in the `xds_cluster_impl`
> policy's picker to set the locality label. ...
>
> This label will be available on the following per-call metrics:
> - `grpc.client.attempt.duration`
> - `grpc.client.attempt.sent_total_compressed_message_size`
> - `grpc.client.attempt.rcvd_total_compressed_message_size`
2024-05-09 09:01:17 -07:00
Eric Anderson b6f7b693e7 Add gauge metric API and Otel implementation
This is needed by gRFC A78 for xds metrics, and for RLS metrics. Since
gauges need to acquire a lock (or other synchronization) in the
callback, the callback allows batching multiple gauges together to avoid
acquiring-and-reacquiring such locks.

Unlike other metrics, gauges are reported on-demand to the MetricSink.
This means not all sinks will receive the same data, as the sinks will
ask for the gauges at different times.
2024-05-08 19:28:28 -07:00
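To illustrate the batching idea in the gauge commit above, here is a minimal Java sketch. `BatchRecorder`, `BatchCallback`, `Cache`, and the `example.*` gauge names are hypothetical stand-ins invented for this sketch and are not the grpc-java API. The point it tries to show is only that one callback can report several gauges while taking the lock a single time, and that a sink obtains the values on demand by invoking the callback.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BatchedGaugeSketch {
  private BatchedGaugeSketch() {}

  /** Hypothetical recorder handed to a callback; NOT the grpc-java interface. */
  interface BatchRecorder {
    void recordLongGauge(String name, long value);
  }

  /** Hypothetical callback that may report any number of gauges per invocation. */
  interface BatchCallback {
    void accept(BatchRecorder recorder);
  }

  /** Toy component whose state is protected by a single lock. */
  static final class Cache {
    private final Object lock = new Object();
    private long entries; // guarded by lock
    private long bytes;   // guarded by lock

    void put(long sizeBytes) {
      synchronized (lock) {
        entries++;
        bytes += sizeBytes;
      }
    }

    /** Both gauges are read under one lock acquisition. */
    BatchCallback gaugeCallback() {
      return recorder -> {
        synchronized (lock) {
          recorder.recordLongGauge("example.cache_entries", entries);
          recorder.recordLongGauge("example.cache_size", bytes);
        }
      };
    }
  }

  /** Stand-in for a sink asking for the current gauge values on demand. */
  static Map<String, Long> poll(BatchCallback callback) {
    Map<String, Long> snapshot = new LinkedHashMap<>();
    callback.accept((name, value) -> snapshot.put(name, value));
    return snapshot;
  }
}
```

Because each sink calls `poll` at its own time, two sinks polling the same `Cache` at different moments can see different values, which matches the on-demand behavior described in the commit message.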
Eric Anderson 1994125c78 opentelemetry: Add grpc.target label to per-call metrics
As defined by gRFC A66, the target is on all client-side per-call
metrics (both call and attempt).
2024-05-08 19:28:14 -07:00
Eric Anderson 952ac022ee Add internal channel builder API to get target
This will be used for gRFC A66's OTel per-RPC metric label:

> `grpc.target` : Canonicalized target URI used when creating gRPC
> Channel, e.g. "dns:///pubsub.googleapis.com:443",
> "xds:///helloworld-gke:8000". Canonicalized target URI is the form
> with the scheme included if the user didn't mention the scheme
> (`scheme://[authority]/path`).

The majority of the changes are to move target computation from
ManagedChannelImpl into the builder. A small hack API was added to
ManagedChannelBuilder to get the target to create an interceptor.
2024-05-08 19:28:14 -07:00
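The canonicalization quoted from gRFC A66 above can be sketched roughly as follows. This is a simplified illustration and not the builder's actual logic: the `KNOWN_SCHEMES` set and the `dns` default are assumptions made for the sketch, and the class name is hypothetical. A set of known schemes is needed because a bare `host:port` string parses as a URI whose "scheme" is the host name.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TargetCanonicalizerSketch {
  private TargetCanonicalizerSketch() {}

  // Assumption for this sketch: the schemes that have a resolver available.
  static final Set<String> KNOWN_SCHEMES = new HashSet<>(Arrays.asList("dns", "xds"));
  // Assumption for this sketch: the scheme used when the user gave none.
  static final String DEFAULT_SCHEME = "dns";

  /** Returns the target with a scheme included, adding the default if needed. */
  static String canonicalize(String target) {
    try {
      String scheme = new URI(target).getScheme();
      if (scheme != null && KNOWN_SCHEMES.contains(scheme)) {
        return target; // already of the form scheme://[authority]/path
      }
    } catch (URISyntaxException e) {
      // Not parseable as a URI; treat it as a scheme-less target below.
    }
    return DEFAULT_SCHEME + ":///" + target;
  }

  public static void main(String[] args) {
    System.out.println(canonicalize("pubsub.googleapis.com:443"));
    System.out.println(canonicalize("xds:///helloworld-gke:8000"));
  }
}
```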
Eric Anderson affa470252 opentelemetry: Fix checking wrong metric for null 2024-05-08 19:27:47 -07:00
Eric Anderson 18cf46e456 Migrate GlobalInterceptors to ConfiguratorRegistry
This should preserve all the existing behavior of GlobalInterceptors as
used by grpc-gcp-observability, including it disabling the implicit
OpenCensus integration.

Both the old and new API are internal. I hid Configurator and
ConfiguratorRegistry behind Internal-prefixed classes, as had been
done with GlobalInterceptors, to further discourage use until the API is
ready.

GlobalInterceptorsTest was modified to become ConfiguratorRegistryTest.
2024-05-08 16:23:51 -07:00
Larry Safran f16b6f2080
Change HappyEyeballs and new pick first LB flags default value to false (#11120) (#11177)
* Change HappyEyeballs flag default value to false since some G3 users are seeing problems.
Put the flag logic in a common place for PickFirstLeafLoadBalancer & WRR's test.

* Set expected requestConnection count based on whether happy eyeballs is enabled or not

* Disable new PickFirstLB

* Fix test expectations to handle both new and old PF LB paths.
2024-05-08 16:06:36 -07:00
Eric Anderson 8f81bd2886 xds: Plumb locality in xds_cluster_impl and weighted_target
As part of gRFC A78:

> To support the locality label in the WRR metrics, we will extend the
> `weighted_target` LB policy (see A28) to define a resolver attribute
> that indicates the name of its child. This attribute will be passed
> down to each of its children with the appropriate value, so that any
> LB policy that sits underneath the `weighted_target` policy will be
> able to use it.

xds_cluster_impl is involved because it uses the child names in the
AddressFilter, which must match the names used by weighted_target.
Instead of using Locality.toString() in multiple policies and assuming
the policies agree, we now have xds_cluster_impl decide the locality's
name and pass it down explicitly. This allows us to change the name
format to match gRFC A78:

> If locality information is available, the value of this label will be
> of the form `{region="${REGION}", zone="${ZONE}",
> sub_zone="${SUB_ZONE}"}`, where `${REGION}`, `${ZONE}`, and
> `${SUB_ZONE}` are replaced with the actual values. If no locality
> information is available, the label will be set to the empty string.
2024-05-08 15:50:03 -07:00
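As a small illustration of the label format quoted from gRFC A78 above, the following Java sketch builds the string from a locality's three fields and falls back to the empty string when no locality is available. The `Locality` class here is a hypothetical stand-in written for this sketch, not the xds module's type, and no escaping of the field values is attempted.

```java
final class LocalityLabelSketch {
  private LocalityLabelSketch() {}

  /** Hypothetical value type standing in for an xDS locality. */
  static final class Locality {
    final String region;
    final String zone;
    final String subZone;

    Locality(String region, String zone, String subZone) {
      this.region = region;
      this.zone = zone;
      this.subZone = subZone;
    }
  }

  /** Formats the label; a null locality means "no locality information". */
  static String label(Locality locality) {
    if (locality == null) {
      return "";
    }
    return "{region=\"" + locality.region
        + "\", zone=\"" + locality.zone
        + "\", sub_zone=\"" + locality.subZone + "\"}";
  }

  public static void main(String[] args) {
    // prints {region="us-east1", zone="us-east1-b", sub_zone=""}
    System.out.println(label(new Locality("us-east1", "us-east1-b", "")));
  }
}
```

Computing the name in one place and passing it down avoids relying on several policies producing the same string independently, which is the motivation the commit message gives.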
Vindhya Ningegowda d6f1a9d569 Add MetricSink implementation for gRPC OpenTelemetry
This adds the following components that are required for the gRFC A79
non-per-call metrics architecture.

- MetricSink implementation for gRPC OpenTelemetry
- Configurator for plumbing per-call metrics ClientInterceptor and
  ServerStreamTracer.Factory via unified OpenTelemetryModule.
2024-05-08 15:49:49 -07:00
Sergii Tkachenko 79bb5e540d
buildscripts: simplify PSM interop Kokoro buildscripts (#11121) (#11158)
Integrates the new features of the Kokoro PSM Interop install library introduced in grpc/psm-interop#73.

Nearly all common functionality was moved from per-language/per-branch PSM Interop build scripts to [psm_interop_kokoro_lib.sh](https://github.com/grpc/psm-interop/blob/main/.kokoro/psm_interop_kokoro_lib.sh):
1. The list of tests in each test suite
2. Per-test-suite flag customization
3. `run_test` methods
4. `build_docker_images_if_needed` methods
5. Generic `build_test_app_docker_images` methods (simple docker build + docker push + docker tag). grpc-java is one exception, as it doesn't run docker directly but uses a cloudbuild flow.

Now all PSM Interop jobs use the same buildscripts, shared by all test suites:
1. buildscript that invokes the test: `psm-interop-test-{language}.sh` (configured as `build_file` in the build cfg)
2. buildscript that builds the xDS test client/server and publishes them as a Docker image: `psm-interop-build-{language}.sh` (conventional name called from `psm_interop_kokoro_lib.sh`)

`psm-interop-test-{language}.sh`:
1. Sets `GRPC_LANGUAGE`, `BUILD_SCRIPT_DIR` environment variables.
2. Downloads the shared `psm_interop_kokoro_lib.sh` from the main branch of the psm-interop repo.
3. Sources `psm-interop-build-{language}.sh`
4. Calls `psm::run "${PSM_TEST_SUITE}"` (`PSM_TEST_SUITE` configured in the cfg file).

`psm-interop-build-{language}.sh`:
1. Defines `psm::lang::build_docker_images` which is called from `psm_interop_kokoro_lib.sh`.
2. Invokes any repo-specific logic.
3. May use `psm::build::docker_images_generic` for generic Docker build, tag, push, or implement its own build/publish method.

References:
- b/288578634
- See the full list of the new features at grpc/psm-interop#73.
- Additional fixes to the shared lib: grpc/psm-interop#78, grpc/psm-interop#79
2024-05-06 16:10:47 -07:00
Terry Wilson a1d19327fe
rls: Add the target label to RLS counter metrics (#11142) 2024-05-01 16:19:56 -07:00
Terry Wilson 35a171bc1d
xds: include the target label to WRR metrics (#11141) 2024-05-01 15:20:38 -07:00
Terry Wilson a9fb272b78
rls: add counter metrics (#11138)
Adds the following metrics to the RlsLoadBalancer:
- grpc.lb.rls.default_target_picks
- grpc.lb.rls.target_picks
- grpc.lb.rls.failed_picks
2024-05-01 11:24:38 -07:00
Eric Anderson 4561bb5b80 Plumb target to load balancer
gRFC A78 has WRR and pick-first include a `grpc.target` label, defined
in A66:

> `grpc.target` : Canonicalized target URI used when creating gRPC
> Channel, e.g. "dns:///pubsub.googleapis.com:443",
> "xds:///helloworld-gke:8000". Canonicalized target URI is the form
> with the scheme included if the user didn't mention the scheme
> (`scheme://[authority]/path`). For channels such as inprocess channels
> where a target URI is not available, implementations can synthesize a
> target URI.
2024-05-01 09:19:45 -07:00
Eric Anderson 27d57585cd api: Return a noop MetricRecorder from Helper by default
Since 06df25b65d, WRR has been calling this method, and it will get an
exception. We don't want WRR to be broken until we have MetricRecorder
fully plumbed.
2024-04-30 07:18:56 -07:00
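The "no-op by default" approach described in the commit above can be sketched in Java as follows. The types here (`MetricRecorderLike`, `Helper`) are simplified, hypothetical stand-ins rather than the real `io.grpc` classes. The sketch only shows the pattern: an interface whose methods have empty default bodies, and a base class that returns such an instance instead of throwing.

```java
final class NoopDefaultSketch {
  private NoopDefaultSketch() {}

  /** Hypothetical recorder interface; every method defaults to doing nothing. */
  interface MetricRecorderLike {
    default void addLongCounter(String name, long value) {
      // Intentionally empty: the default is a no-op.
    }
  }

  /** Hypothetical helper base class that existing subclasses may not override. */
  abstract static class Helper {
    /** Returns a no-op recorder unless a subclass supplies a real one. */
    MetricRecorderLike getMetricRecorder() {
      return new MetricRecorderLike() {};
    }
  }

  public static void main(String[] args) {
    Helper helper = new Helper() {};
    // Callers can record unconditionally; nothing is thrown and nothing is stored.
    helper.getMetricRecorder().addLongCounter("example.counter", 1);
  }
}
```

With this shape, a caller such as a load-balancing policy can record metrics without first checking whether the surrounding helper has been wired to a real recorder.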
Eric Anderson 4c78a9746c
Plumb optional labels from LB to ClientStreamTracer
As part of gRFC A78:

> To support the locality label in the per-call metrics, we will provide
> a mechanism for LB picker to add optional labels to the call attempt
> tracer.
2024-04-29 16:30:51 -07:00
Terry Wilson 06df25b65d
core,xds: Metrics recording in WRR LB (#11129)
Adds the recording of the four metrics documented in:

https://github.com/grpc/proposal/blob/master/A78-grpc-metrics-wrr-pf-xds.md#weighted-round-robin-lb-policy
2024-04-26 15:59:49 -07:00
Vindhya Ningegowda 795ee0f6e3
Add MetricRecorder implementation (#11128)
* added MetricRecorderImpl and unit tests for MetricInstrumentRegistry

* updated MetricInstrumentRegistry to use array instead of ArrayList

* renamed record<>Counter APIs to add<>Counter. Added check for mismatched label values

* added lock for instruments array
2024-04-26 13:47:55 -07:00
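The "check for mismatched label values" mentioned in the commit above might look roughly like the following Java sketch. The `LongCounterInstrument` class, the `addLongCounter` signature, and the `total` field are assumptions made for illustration and do not claim to match the real `MetricRecorderImpl`.

```java
import java.util.List;

final class LabelCheckSketch {
  private LabelCheckSketch() {}

  /** Hypothetical instrument descriptor with its declared label keys. */
  static final class LongCounterInstrument {
    final String name;
    final List<String> requiredLabelKeys;
    long total; // running sum, kept only so the sketch has observable state

    LongCounterInstrument(String name, List<String> requiredLabelKeys) {
      this.name = name;
      this.requiredLabelKeys = requiredLabelKeys;
    }
  }

  /** Adds to the counter after verifying one value was supplied per label key. */
  static void addLongCounter(
      LongCounterInstrument instrument, long value, List<String> requiredLabelValues) {
    if (requiredLabelValues.size() != instrument.requiredLabelKeys.size()) {
      throw new IllegalArgumentException(
          "Expected " + instrument.requiredLabelKeys.size() + " label values for "
              + instrument.name + " but got " + requiredLabelValues.size());
    }
    instrument.total += value;
  }
}
```

Failing fast on a size mismatch keeps a caller's mistake from silently producing a metric whose values are attached to the wrong keys.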
Eric Anderson da619e2bde rls: Fix time handling in CachingRlsLbClient
`getMinEvictionTime()` was fixed to make sure only deltas were used for
comparisons (`a < b` is broken; `a - b < 0` is okay). It had also
returned `0` by default, which was meaningless as there is no epoch for
`System.nanoTime()`. LinkedHashLruCache now passes the current time into
a few more functions since the implementations need it and it was
sometimes already available. This made it easier to make some classes
static.
2024-04-25 15:38:39 -07:00
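The difference between `a < b` and `a - b < 0` for `System.nanoTime()` values, as described in the commit above, can be seen in this small Java sketch. The class and method names are made up for the example. The delta form stays correct across numeric overflow as long as the two readings are less than about 292 years apart, which is the guarantee the `System.nanoTime()` documentation relies on.

```java
final class NanoTimeCompareSketch {
  private NanoTimeCompareSketch() {}

  /** Overflow-safe: true if nanoTime reading a is earlier than reading b. */
  static boolean isBefore(long a, long b) {
    return a - b < 0;
  }

  /** Broken variant, included only for contrast. */
  static boolean isBeforeNaive(long a, long b) {
    return a < b;
  }

  public static void main(String[] args) {
    long start = Long.MAX_VALUE - 10; // a reading just before the counter wraps
    long later = start + 20;          // 20 ns later; wraps to a negative value
    System.out.println(isBefore(start, later));      // prints true
    System.out.println(isBeforeNaive(start, later)); // prints false
  }
}
```

The same reasoning explains why returning `0` as a default time is meaningless: nanoTime values have no fixed origin, so only differences between two readings carry information.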
Eric Anderson 056195401f rls: Document RefCountedChildPolicyWrapperFactory as non-threadsafe
Instead of having docs in RefCountedChildPolicyWrapperFactory saying
that every method was guarded by a lock, I added `@GuardedBy("lock")`
within CachingRlsLbClient, so now it is clearly not thread-safe and the
lock protects access. The AtomicLong was replaced with a long since
1) there was no multi-threading and 2) the logic was not atomic-safe
which was misleading.
2024-04-25 15:35:50 -07:00
Eric Anderson 2840fd6b47 opentelemetry: Remove delayed attempt recording
In OpenCensus recording an attempt was delayed in order to wait for
inboundUncompressedSize(). But we don't need that in OpenTelemetry, and
could have removed this code when copying from OpenCensus.
2024-04-25 14:59:33 -07:00
rtadepalli 5c9b492318
Add `StatusProto.toStatusException` overload to accept `Throwable` (#11083)
* Add `StatusProto.toStatusException` overload to accept `Throwable`
---------

Co-authored-by: Eric Anderson <ejona@google.com>
2024-04-24 18:05:54 -07:00
Ryan P. Brewster e036b1b198 netty: Allow deframer errors to close stream with a status code
Today, deframer errors cancel the stream without communicating a status code
to the peer. This change causes deframer errors to trigger a best-effort
attempt to send trailers with a status code so that the peer understands
why the stream is being closed.

Fixes #3996
2024-04-24 14:37:37 -07:00
Eric Anderson 11612b484a Upgrade OpenTelemetry to 1.36.0 2024-04-23 17:24:47 -07:00
Larry Safran 9bf04db0d3
reorder bazel rule parameters to satisfy CheckBzlFormat (#11118) 2024-04-22 16:14:47 -07:00
Benjamin Peterson fb9a10809f
netty: Release SendGrpcFrameCommand when stream is missing (#11116)
`sendGrpcFrame` owns the buffer in `SendGrpcFrameCommand`. If the frame is not handed off to netty, it needs to be released in the method.

https://github.com/grpc/grpc-java/issues/11115
2024-04-22 10:27:39 -07:00
Ashok Varma 77e59b29dd cronet: Update to StandardCharsets and assertNotNull APIs 2024-04-22 09:54:49 -07:00
Ashok Varma 163efa3716 cronet: Update to Java-8 APIs and tighten the scopes 2024-04-22 09:54:49 -07:00
Ashok Varma c703a1ee07 cronet: @javadoc update android permission MODIFY_NETWORK_ACCOUNTING (deprecated) => UPDATE_DEVICE_STATS 2024-04-22 09:54:49 -07:00
Ashok Varma 5a8da19f32 cronet: Update Cronet to latest release + Move to Stable Cronet APIs. 2024-04-22 09:54:49 -07:00
Eric Anderson 9de8e44384 util: Remove deactivation and GracefulSwitchLb from MultiChildLb
It is easy to manage these things outside of MultiChildLb and it makes
the shared code easier and use less memory. In particular, we don't want
to use many instances of GracefulSwitchLb in virtually every policy
simply because it was needed in one or two cases.
2024-04-22 07:48:49 -07:00
Eric Anderson 7f0a1910d3 xds: Directly manage deactivation in cluster manager 2024-04-22 07:48:49 -07:00
Eric Anderson 61bf21e2a1 xds: Swap RingHashLb to use lazy child, instead of deactivation 2024-04-22 07:48:49 -07:00
Alex Panchenko 8a21afcc9e
compiler: add option `@generated=omit` (#11086)
related to #11081
2024-04-18 18:34:04 -07:00
Laglangyue 52e65ec0d8
minor: remove the unnecessary final,static (#11098) 2024-04-18 15:26:01 -07:00
hvadehra add8c37a41
Add `load()` statements for the Bazel builtin top-level java symbols (#11105)
Loads are being added in preparation for moving the symbols out of Bazel and into `rules_java`.
2024-04-17 16:43:21 -07:00
Vindhya Ningegowda c404c9f66c
Add MetricRecorder and MetricSink interface (#11109)
Adds interfaces required for recording metrics from gRPC components. Also adds APIs to get a `MetricRecorder` from `LoadBalancer.Helper` and to add a `MetricSink` to `ManagedChannelBuilder`.
2024-04-17 15:10:57 -07:00
Sergii Tkachenko e490273edd
netty: Handle write queue promise failures (#11016)
Handles Netty write frame failures caused by issues in Netty
itself.

Normally we don't need to do anything on frame write failures because
the cause of a failed future would be an IO error that resulted in
the stream closure.  Prior to this PR we treated these issues as a
noop, except the initial headers write on the client side.

However, a case like netty/netty#13805 (a bug in generating next
stream id) resulted in an unclosed stream on our side. This PR adds
write frame future failure handlers that ensure the stream is
cancelled, and the cause is propagated via Status.

Fixes #10849
2024-04-16 16:27:51 -07:00
Vindhya Ningegowda 497e155217
Add Metric Instrument Registry (#11103)
* added metric instrument registry
2024-04-12 13:42:40 -07:00
Sergii Tkachenko 34e241a60e
buildscripts: Migrate PSM Interop to Artifact Registry (#11079)
From Container Registry (gcr.io) to Artifact Registry (pkg.dev).
2024-04-08 10:27:09 -07:00
yifeizhuang 167a2031e2
Update README etc to reference 1.63.0 (#11076) 2024-04-05 10:40:01 -07:00
Eric Anderson 1d6f1f1b42 bazel: Verify Maven deps in bzlmod and WORKSPACE match
The text between the GRPC_DEPS_{START,END} must be identical in
formatting. Probably not a problem in general and not necessarily bad.
But it is simplistic.

Eric waking up this morning:
> We need more sed.
2024-04-04 14:05:05 -07:00
Eric Anderson 32d48ae89a bazel: Fix formatting with buildifier
Keys and dependencies are sorted.
2024-04-04 11:50:07 -07:00
Keith Smiley d1890c0acc
bazel: Add support for bzlmod (#11046) 2024-04-04 08:36:55 -07:00