If a handshake is ongoing during shutdown, this would substantially
reduce the time it takes to shut down. Previously, you would need to use
channel.shutdownNow() to have fast shutdown behavior, which is an
unnecessary use of the variant.
When the current approach was written WriteBufferingAndExceptionHandler
didn't exist and so it was hard to predict how the pipeline would react
to events (particularly because of HTTP/2 handler's re-definition of
close()). Now that WBAEH exists, this is more straight-forward.
Implement a simple allocation-free xx_hash utility class without using sun.misc.Unsafe. The hash function mainly targets on xDS use case, which is mostly small strings (endpoint address, headers, etc) and primitive types.
In gRPC's use case, string characters need to be treated as ASCIIs to make the produced hash values match other implementations (Envoy, gRPC-Go, C-core, etc) would produce.
The hashing implementation and tests are borrowed from OpenHFT's XxHash implementation (https://github.com/OpenHFT/Zero-Allocation-Hashing/blob/master/src/main/java/net/openhft/hashing/XxHash.java, see commit 658079a50903c32c54f2ab5c86243244b3ac60ed), which is under Apache 2.0 license. For more details, see https://github.com/OpenHFT/Zero-Allocation-Hashing.
The code is made to be in third_party directory with LISENCE and NOTICE files.
The serviceName field in oobChannel grpclb config should not be null, otherwise it will default to the lbHelper.getAuthority(), which perviously defaulted to the lookup service before #7852, but has been overridden to the backend service for authentication in #7852.
This change adds two parts to XdsClient for receiving configurations that support hashing based load balancing policies:
- Each Route contains a list of HashPolicys, which specifies the hash value generation for requests routed to that Route.
- Each Cluster resource can specify lb policy other than "round_robin". If it is "ring_hash", it contains the configuration for mapping each RPC's hash value to one of the endpoints.
The previous fix#7878 didn't work because the server field is expected to be full hostname (without port number). Need strip the port part from the authority.
The server filed in lookup request as specified in go/dynamic-request-routing/#heading=h.eqjtcpo6u8ep should be the original target, not the RLS server where the lookup request is sent to.
RLS RPC deadline is configured by service config, and could be extremely long. When RLS lb is shutdown, any pending RLS PRC should be cancelled. Now using shutdownNow() to forcefully close the RLS channel.
This change cleans up most value-typed classes in EnvoyProtoData, which represent immutable xDS configurations used in gRPC. This introduces AutoValue for reducing the amount of boilerplate code for pure data classes.
Not all value-typed classes in xDS have been migrated, some would need more invasive refactoring and would be done next. This change is a pure no-op refactoring. No behavior change should be introduced.
For more details, see PR description.
This is part of the examples and other documentation, but a user
starting with the README would find things not working and it be very
unclear why.
Realized this was an issue because of
https://stackoverflow.com/q/66028045/4690866 .
This change reimplements stats recording for the client side:
1. Implemented the new stats objects: ClusterDropStats and ClusterLocalityStats, which match C-core's implementation. The XdsClient APIs for accessing stats objects are
- addClusterDropStats(String clusterName, String edsServiceName)
- addClusterLocalityStats(String clusterName, String edsServiceName, Locality locality)
2. Eliminated the LRS LB policy and incorporate locality load recording in ClusterImplLoadBalancer. The endpoint addresses resolved in ClusterResolverLoadBalancer will attach the locality in each address attributes. In ClusterImplLoadBalancer, its helper's createSubchannel() will populate the address locality and then call XdsClient.addClusterLocalityStats(...) to obtain the per-locality stats object for recording RPCs. This stats object is attached to the created subchannel's attribute. Therefore, ClusterImplLoadBalancer receives Picker update from its child LB policy, the Picker's subchannel will always have the per-locality stats object attached. Helper.pickSubchannel(...) will populate the per-locality stats object and wrap it into the stream tracer for counting RPCs. Note the subchannel's shutdown() is wrapped to call the stats object's Release().
See this PR in netty: https://github.com/netty/netty/pull/9798 . It's
possible that one peer has closed the stream, yet another frame from
peers arrives after it. This is largely harmless, as explained in the PR
from netty repository. If we don't do this, the log will be polluted with
these harmless logs.
Example that would no longer be logged:
```
Jan 25, 2021 6:23:51 PM io.grpc.netty.NettyServerHandler onStreamError
WARNING: Stream Error
io.netty.handler.codec.http2.Http2Exception$StreamException: Received DATA frame for an unknown stream 27
at io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:147)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.shouldIgnoreHeadersOrDataFrame(DefaultHttp2ConnectionDecoder.java:596)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onDataRead(DefaultHttp2ConnectionDecoder.java:239)
...
```
ManagedChannelImpl should not override authority for createResolvingOobChannel(target, creds), because ManagedChannelImpl does not know what target and creds are.
Enhance `ManagedChannelBuilder.overrideAuthority()` to make it impossible to use a different authority to a backend by wrapping ClientTransportFactory.newClientTransport() and setting ClientTransportOptions’ authority. To avoid confusing the LB policy, it would need to keep the original addresses to return during `Subchannel.getAddresses()`
The class `OverrideAuthorityNameResolverFactory` is deleted and its logic is moved into `ManagedChannelImpl`.
- Add APIs to `ClientTransportFactory`:
```java
public interface ClientTransportFactory {
/**
* Swaps to a new ChannelCredentials with all other settings unchanged. Returns null if the
* ChannelCredentials is not supported by the current ClientTransportFactory settings.
*/
SwapChannelCredentialsResult swapChannelCredentials(ChannelCredentials channelCreds);
final class SwapChannelCredentialsResult {
final ClientTransportFactory transportFactory;
@Nullable final CallCredentials callCredentials;
}
}
```
- Add `ChannelCredentials` to constructor args of `ManagedChannelImplBuilder`:
```java
public ManagedChannelImplBuilder(
String target, @Nullable ChannelCredentials channelCreds, @Nullable CallCredentials callCreds, ...)
```
Changes the xDS interop test client to support timeout test.
- Synced xDS test proto messages with grpc-proto.
- Changed RpcConfig to be the configuration for per test method type. Added timeoutSec for its deadline configuration.
- Changed accumulated stats to include RPC status instead of just succeeded/failed.
Improve the CallCredentialsApplyingTransport shutdown lifecycle management. Right now CallCredentialsApplyingTransport shutdown the delegated real transport too early. It should be waiting for the metadataAppliers to finish because they may execute asynchronously. In addition, there is no shutdown check on CallCredentialsApplyingTransport for newStream(). The degraded lifecycle implementation may cause RejectionExecutionException, or accepting new RPCs after the underlying transport is already closed during channel shutdown.
We added listener on metadataApplier to notify completion, a magic counter to track the pending metadataApplier for delaying shutdown, also added shutdown check for newStream().
This provides us a path forward with #7211 (hiding
AbstractManagedChannelImplBuilder and AbstractServerImplBuilder) while
providing users a migration path to manage the ABI breakage (#7552). We
do a .class hack so that recompiling avoids the internal class reference
yet the old methods are still available.
Leaving the classes as-is causes javac to compile two versions of each
method, one returning the public class (e.g. ServerBuilder) and one
returning the internal class (e.g., AbstractServerImplBuilder). However,
we rewrite the signature that is used at compile time so that new
compilations will not reference internal-returning methods.
This is intended to be temporary, just to give a migration path. Once we
have given users some time to recompile we will remove this rewriting
and change the generics to use public classes.