Commit Graph

6473 Commits

Author SHA1 Message Date
Ashok Varma 5a8da19f32 cronet: Update Cronet to latest release + Move to Stable Cronet APIs. 2024-04-22 09:54:49 -07:00
Eric Anderson 9de8e44384 util: Remove deactivation and GracefulSwitchLb from MultiChildLb
It is easy to manage these things outside of MultiChildLb and it makes
the shared code easier and use less memory. In particular, we don't want
to use many instances of GracefulSwitchLb in virtually every policy
simply because it was needed in one or two cases.
2024-04-22 07:48:49 -07:00
Eric Anderson 7f0a1910d3 xds: Directly manage deactivation in cluster manager 2024-04-22 07:48:49 -07:00
Eric Anderson 61bf21e2a1 xds: Swap RingHashLb to use lazy child, instead of deactivation 2024-04-22 07:48:49 -07:00
Alex Panchenko 8a21afcc9e
compiler: add option `@generated=omit` (#11086)
related to #11081
2024-04-18 18:34:04 -07:00
Laglangyue 52e65ec0d8
minor: remove the unnecessary final,static (#11098) 2024-04-18 15:26:01 -07:00
hvadehra add8c37a41
Add `load()` statements for the Bazel builtin top-level java symbols (#11105)
Loads are being added in preparation for moving the symbols out of Bazel and into `rules_java`.
2024-04-17 16:43:21 -07:00
Vindhya Ningegowda c404c9f66c
Add MetricRecorder and MetricSink interface (#11109)
Adds interfaces required for recording metrics from gRPC components. And added API to get `MetricRecorder` in `LoadBalancer.Helper` and add `MetricSink` to `ManagedChannelBuilder`.
2024-04-17 15:10:57 -07:00
Sergii Tkachenko e490273edd
netty: Handle write queue promise failures (#11016)
Handles Netty write frame failures caused by issues in the Netty
itself.

Normally we don't need to do anything on frame write failures because
the cause of a failed future would be an IO error that resulted in
the stream closure.  Prior to this PR we treated these issues as a
noop, except the initial headers write on the client side.

However, a case like netty/netty#13805 (a bug in generating next
stream id) resulted in an unclosed stream on our side. This PR adds
write frame future failure handlers that ensures the stream is
cancelled, and the cause is propagated via Status.

Fixes #10849
2024-04-16 16:27:51 -07:00
Vindhya Ningegowda 497e155217
Add Metric Instrument Registry (#11103)
* added metric instrument registry
2024-04-12 13:42:40 -07:00
Sergii Tkachenko 34e241a60e
buildscripts: Migrate PSM Interop to Artifact Registry (#11079)
From Container Registry (gcr.io) to Artifact Registry (pkg.dev).
2024-04-08 10:27:09 -07:00
yifeizhuang 167a2031e2
Update README etc to reference 1.63.0 (#11076) 2024-04-05 10:40:01 -07:00
Eric Anderson 1d6f1f1b42 bazel: Verify Maven deps in bzlmod and WORKSPACE match
The text between the GRPC_DEPS_{START,END} must be identical in
formatting. Probably not a problem in general and not necessarily bad.
But it is simplistic.

Eric waking up this morning:
> We need more sed.
2024-04-04 14:05:05 -07:00
Eric Anderson 32d48ae89a bazel: Fix formatting with buildifier
Keys and dependencies are sorted.
2024-04-04 11:50:07 -07:00
Keith Smiley d1890c0acc
bazel: Add support for bzlmod (#11046) 2024-04-04 08:36:55 -07:00
Sergii Tkachenko b6ca908de8
MAINTAINERS.md: Add Kannan (#11057)
- [kannanjgithub](https://github.com/kannanjgithub), Google LLC
2024-04-03 14:28:14 -07:00
Eric Anderson 6e97b180b4
rls: Synchronization fixes in CachingRlsLbClient
This started with combining handleNewRequest with asyncRlsCall, but that
emphasized pre-existing synchronization issues and trying to fix those
exposed others. It was hard to split this into smaller commits because
they were interconnected.

handleNewRequest was combined with asyncRlsCall to use a single code
flow for handling the completed future while also failing the pick
immediately for thottled requests. That flow was then reused for
refreshing after backoff and data stale. It no longer optimizes the RPC
completing immediately because that would not happen in real life; it
only happens in tests because of inprocess+directExecutor() and we don't
want to test a different code flow in tests. This did require updating
some of the tests.

One small behavior change to share the combined asyncRlsCall with
backoff is we now always invalidate an entry after the backoff.
Previously the code could replace the entry with its new value in one
operation if the asyncRlsCall future completed immediately. That only
mattered to a single test which now sees an EXPLICIT eviction.

SynchronizationContext used to provide atomic scheduling in
BackoffCacheEntry, but it was not guaranteeing the scheduledRunnable was
only accessed from the sync context. The same was true for calling up
the LB tree with `updateBalancingState()`. In particular, adding entries
to the cache during a pick could evict entries without running the
cleanup methods within the context, as well as the RLS channel
transitioning from TRANSIENT_FAILURE to READY. This was replaced with
using a bare Future with a lock to provide atomicity.

BackoffCacheEntry no longer uses the current time and instead waits for
the backoff timer to actually run before considering itself expired.
Previously, it could race with periodic cleanup and get evicted before
the timer ran, which would cancel the timer and forget the
backoffPolicy. Since the backoff timer invalidates the entry, it is
likely useless to claim it ever expires, but that level of behavior was
preserved since I didn't look into the LRU cache deeply.

propagateRlsError() was moved out of asyncRlsCall because it was not
guaranteed to run after the cache was updated. If something was already
running on the sync context, then RPCs would hang until another update
caused updateBalancingState().

Some methods were moved out of the CacheEntry classes to avoid
shared-state mutation in constructors. But if we add something in a
factory method, we want to remove it in a sibling method to the factory
method, so additional code is moved for symmetry. Moving shared-state
mutation ouf of constructors is important because 1) it is surprising
and 2) ErrorProne doesn't validate locking within constructors. In
general, having shared-state methods in CacheEntries also has the
problem that ErrorProne can't validate CachingRlsLbClient calls to
CacheEntry. ErrorProne can't know that "lock" is already held because
CacheEntry could have been created from a _different instance_ of
CachingRlsLbClient and there's no way for us to let ErrorProne prove it
is the same instance of "lock".

DataCacheEntry still mutates global state that requires a lock in its
constructor, but it is less severe of a problem and it requires more
choices to address.
2024-04-03 12:22:04 -07:00
Rostislav 58de563fa4
examples: support retry policy example for bazel build
According to the docs, I can use bazel to build examples, but
retry-example is not supported in bazel config. So If you'll try to
build this example with bazel, you'll get an error:


```
examples git:(master) ✗ bazel build :retrying-hello-world-server :retrying-hello-world-client
ERROR: Skipping ':retrying-hello-world-client': no such target '//:retrying-hello-world-client': target 'retrying-hello-world-client' not declared in package '' defined by /Users/rostik404/projects/grpc-java/examples/BUILD.bazel
ERROR: no such target '//:retrying-hello-world-client': target 'retrying-hello-world-client' not declared in package '' defined by /Users/rostik404/projects/grpc-java/examples/BUILD.bazel
INFO: Elapsed time: 0.331s
INFO: 0 processes.
ERROR: Build did NOT complete successfully
```
2024-04-01 15:07:10 -07:00
François JACQUES 8050723397
OkHttpServer: support maxConcurrentCallsPerConnection (Fixes #11062). (#11063)
* Add option in OkHttpServerBuilder
* Add value as MAX_CONCURRENT_STREAM setting in settings frame sent by the server to the client per connection
* Enforce limit by sending a RST frame with REFUSED_STREAM error
2024-04-01 08:39:33 -07:00
Alex Panchenko e36f099be9
StatusException/StatusRuntimeException hide stack trace in a simpler way (#11064) 2024-04-01 08:31:13 -07:00
Eric Anderson 0866e716d6 README.md: Mention the netty-shaded "transport"
You have to dig deep into SECURITY.md to hear about what's in
netty-shaded. Add a small blub for it in the README.

Inspired by discussion on #10931
2024-03-29 11:02:29 -07:00
François JACQUES d21fe32bea
okhttp: Remove finished stream even if a pending stream was started
Fixes #11053
2024-03-29 10:00:32 -07:00
Kannan J 097a46b761 Use empty string instead of null for endpoint identification algorithm to disable server hostname verification, since null value gets ignored in Sun's SSLEngine implementation. 2024-03-28 16:58:48 -07:00
David Burns 00649913b0
bazel: Use the `artifact` macro for loading maven deps
The recommended way to load dependencies from `rules_jvm_external`
is to make use of the `@maven` workspace, and the most readable
way of doing that is to use the `artifact` macro provides.

This removes the need to generate the "compat" namespaces, which
`rules_jvm_external` provided for backwards compatibility with
older releases. This change also sets things up for supporting
`bzlmod`: this requires all workspaces accessed by a library to
be named "up front" in the `MODULE.bazel` file. This way, the
only repo that needs to be exported is `@maven`, rather than the
current huge list.
2024-03-28 14:33:32 -07:00
Larry Safran 4ef1baddd2
core: Remove useless ExperimentalApi annotation from pick first
PickFirstLeafLoadBalancer is in internal and also isn't public.
2024-03-28 08:07:53 -07:00
Eric Anderson 569b426d62 core: Improve clarity of RpcProgress meanings 2024-03-27 16:11:26 -07:00
Eric Anderson e6305930de Specify a locale for upper/lower case conversions
None of these conversions should use the arbitrary system locale. Error
Prone will help prevent these getting introduced in the future.

Fixes #10372
2024-03-27 15:58:34 -07:00
Ran 37263b774d
Make setOnReadyThreshold() a noop method instead of abstract. (#11044)
Make setOnReadyThreshold() a noop method instead of abstract
2024-03-27 14:56:13 -07:00
Kannan J 2c5f0c22cd
core: Transition to CONNECTING immediately when exiting idle
The name resolver takes some time before it returns addresses. While waiting the channel will be IDLE instead of the proper CONNECTING. This generally doesn't matter since RPCs behave similarly for IDLE and CONNECTING, but is confusing for users when watching channel.getState() closely.

Fixes #10517.
2024-03-27 11:40:13 -07:00
hypnoce f7ee5f3182 servlet: Check log fine level before hex string conversion. Fixes #11031. 2024-03-25 08:18:44 -07:00
abtom 537dbe826a
binder: Helper class to allow in process servers to use peer uids in test (#11014) 2024-03-22 15:57:29 -07:00
Terry Wilson 10cb4a3bed
util: Status desc for outlier detection ejection (#11036)
Including a Status description makes it easier to debug subchannel
closure issues if it's clear that a subchannel became unavailable because
of an outlier detection ejection.
2024-03-22 14:40:06 -07:00
Larry Safran bdb623031f
Fix retry race condition that can lead to double decrementing inFlightSubStreams and so miss calling closed (#11026) 2024-03-22 10:01:58 -07:00
yifeizhuang b3ffb5078d
Start 1.64.0 development cycle (#11030) 2024-03-22 09:32:10 -07:00
Eric Anderson 3abab95e75
core: Provide DEADLINE_EXCEEDED insights for context deadline
We provided extra details when the RPC is killed by CallOptions'
Deadline, but didn't do the same for Context.

To avoid duplicating code, things were restructured, including the
threading. There are more code flows now, but I think the
multi-threading came out more obvious and less error-prone. I didn't
change the status when the deadline is already expired, because the
text is shared with DelayedClientCall and AbstractInteropTest doesn't
distinguish between the two cases.

This is a roll-forward that avoids a NPE when cancel() is called
without an earlier call to start().

As seen at b/300991330
2024-03-21 17:02:06 -07:00
Larry Safran 51f811df86
Enable Happy Eyeballs by default (#11022)
* Flip the flag

* Fix test flakiness where IPv6 was not considered loopback
2024-03-21 16:59:54 -07:00
James Duong 2c83ef0632
Allow configuration of the queued byte threshold at which a Stream is considered not ready (#10977)
* Allow the queued byte threshold for a Stream to be ready to be configurable

- on clients this is exposed by setting a CallOption
- on servers this is configured by calling a method on ServerCall or ServerStreamListener
2024-03-21 15:37:26 -07:00
Terry Wilson 68eb639b1c
Revert "core: Provide DEADLINE_EXCEEDED insights for context deadline" (#11024)
This reverts commit 0e31ac9303.
2024-03-19 10:06:01 -07:00
Larry Safran 38f968fafb
Have EDS resource parse the additional addresses from envoy message (#11011)
* Have EDS resource parse the additional addresses from envoy message
* Update respositories.bzl to point to current grpc-proto instead of a 2021 version.
* Update respositories.bzl to point to recent cncf/xds and envoyproxy/data-plane-api
* Add cncf_upda to repositories.bzl
2024-03-15 12:26:21 -07:00
Eric Anderson 0e31ac9303 core: Provide DEADLINE_EXCEEDED insights for context deadline
We provided extra details when the RPC is killed by CallOptions'
Deadline, but didn't do the same for Context.

To avoid duplicating code, things were restructured, including the
threading. There are more code flows now, but I think the
multi-threading came out more obvious and less error-prone. I didn't
change the status when the deadline is already expired, because the
text is shared with DelayedClientCall and AbstractInteropTest doesn't
distinguish between the two cases.

As seen at b/300991330
2024-03-15 10:44:44 -07:00
Sergii Tkachenko fafd99db52
(minor) Add missing update notes for cronet to libs.versions.toml (#11007)
Related issue blocking the update (#10396) was created during
dependency update #10359, but I forgot to add the note to
libs.versions.toml.
2024-03-14 10:40:00 -07:00
Larry Safran 36e9f0dfac
core: Eliminate NPE seen in PickFirstLeafLoadBalancer (#11013)
ref b/329420531
2024-03-13 20:53:34 -07:00
Larry Safran 8a9ce990b0
Enable new PF by default (#11002) 2024-03-12 14:50:40 -07:00
Larry Safran 27a3d8e278
Eliminate validation so that code path is more heavily exercised (#11005)
* Eliminate validation so that code path is more heavily exercised
2024-03-12 13:41:03 -07:00
Sergii Tkachenko b3475a0e46
api: Remove ExperimentalApi from Attributes.Builder.discard (#11004)
Now tracked together with Attributes:
https://github.com/grpc/grpc-java/issues/1764.

Closes #5777.
2024-03-11 16:15:03 -07:00
Sergii Tkachenko 0d749c5943
xds: Stabilize CsdsService (#11003)
To make it stable, this PR hides protobuf from being exposed via the
API.

Note: this breaks ABI of `CsdsService.streamClientStatus` and
`CsdsService.fetchClientStatus`, but these methods should not
normally be called by the user.

Closes #8016.
2024-03-11 16:14:05 -07:00
Eric Anderson 403aa8189d Upgrade to Gradle 8.6 and upgrade plugins
The gradle wrapper was removed from example-oauth because we don't want
to maintain the wrapper copy in each example (at least right now it
doesn't make sense for it to be the only other example to have the
gradle wrapper).
2024-03-11 15:44:20 -07:00
Larry Safran d1c406bd23
Prepare to switch flag to use new PickFirstLeafLoadBalancer by default (#10998)
* Fix PickFirstLeafLoadBalancer and tests to work when it is used.
* Actually use EAG attributes for subchannels.
2024-03-11 14:12:56 -07:00
Sergii Tkachenko ebbe0673f3
alts,census,gcp-observability: Explicitly set grpc-context as an implementation dependency (#10997)
Override google-auth `io.grpc:grpc-context` dependency with our own
`project(":grpc-context")`.

This fixes the issue with classes depending on `grpc-alts`
implementation receiving very (!) old `io.grpc:grpc-context:1.27.2`
as a transitive dependency of `com.google.auth`.

Projects `grpc-census` `grpc-gcp-observability` are affected in the
similar way, except `io.grpc:grpc-context:1.59.1` is pulled as  a
transitive dependency of `io.opencensus`.

### Before
```
❯ ./gradlew -q :grpc-xds:dependencyInsight --configuration=compileClasspath --dependency=io.grpc:grpc-context
*** Skipping the build of codegen and compilation of proto files because skipCodegen=true
  * Skipping the build of Android projects because skipAndroid=true
io.grpc:grpc-context:1.27.2
  Variant compile:
    | Attribute Name                 | Provided | Requested    |
    |--------------------------------|----------|--------------|
    | org.gradle.status              | release  |              |
    | org.gradle.category            | library  | library      |
    | org.gradle.libraryelements     | jar      | classes      |
    | org.gradle.usage               | java-api | java-api     |
    | org.gradle.dependency.bundling |          | external     |
    | org.gradle.jvm.environment     |          | standard-jvm |
    | org.gradle.jvm.version         |          | 8            |

io.grpc:grpc-context:1.27.2
\--- io.opencensus:opencensus-api:0.31.1
     +--- com.google.http-client:google-http-client:1.43.3
     |    +--- com.google.auth:google-auth-library-oauth2-http:1.22.0
     |    |    \--- project :grpc-alts
     |    |         \--- compileClasspath
     |    \--- com.google.http-client:google-http-client-gson:1.43.3
     |         \--- com.google.auth:google-auth-library-oauth2-http:1.22.0 (*)
     \--- io.opencensus:opencensus-contrib-http-util:0.31.1
          \--- com.google.http-client:google-http-client:1.43.3 (*)

(*) - Indicates repeated occurrences of a transitive dependency subtree. Gradle expands transitive dependency subtrees only once per project; repeat occurrences only display the root of the subtree, followed by this annotation.
```

### After
```
❯ ./gradlew -q :grpc-xds:dependencyInsight --configuration=compileClasspath --dependency=io.grpc:grpc-context
*** Skipping the build of codegen and compilation of proto files because skipCodegen=true
  * Skipping the build of Android projects because skipAndroid=true
project :grpc-context
  Variant apiElements:
    | Attribute Name                 | Provided | Requested    |
    |--------------------------------|----------|--------------|
    | org.gradle.category            | library  | library      |
    | org.gradle.dependency.bundling | external | external     |
    | org.gradle.jvm.version         | 8        | 8            |
    | org.gradle.libraryelements     | jar      | classes      |
    | org.gradle.usage               | java-api | java-api     |
    | org.gradle.jvm.environment     |          | standard-jvm |
   Selection reasons:
      - By conflict resolution: between versions 1.63.0-SNAPSHOT and 1.27.2

project :grpc-context
\--- project :grpc-alts
     \--- compileClasspath

io.grpc:grpc-context:1.27.2 -> project :grpc-context
\--- io.opencensus:opencensus-api:0.31.1
     +--- com.google.http-client:google-http-client:1.43.3
     |    +--- com.google.auth:google-auth-library-oauth2-http:1.22.0
     |    |    \--- project :grpc-alts
     |    |         \--- compileClasspath
     |    \--- com.google.http-client:google-http-client-gson:1.43.3
     |         \--- com.google.auth:google-auth-library-oauth2-http:1.22.0 (*)
     \--- io.opencensus:opencensus-contrib-http-util:0.31.1
          \--- com.google.http-client:google-http-client:1.43.3 (*)

(*) - Indicates repeated occurrences of a transitive dependency subtree. Gradle expands transitive dependency subtrees only once per project; repeat occurrences only display the root of the subtree, followed by this annotation.
```
2024-03-11 14:00:12 -07:00
Eric Anderson aa90768129
rls: Fix a local and remote race
The local race passes `rlsPicker` to the channel before
CachingRlsLbClient is finished constructing. `RlsPicker` can use
multiple of the fields not yet initialized. This seems not to be
happening in practice, because it appears like it would break things
very loudly (e.g., NPE).

The remote race seems incredibly hard to hit, because it requires an RPC
to complete before the pending data tracking the RPC is added to a map.
But with if a system is at 100% CPU utilization, maybe it can be hit. If
it is hit, all RPCs needing the impacted cache entry will forever be
buffered.
2024-03-08 09:47:11 -08:00