Currently ManagedChannelImpl will interpret an empty address list from
NameResolver as an error, and LoadBalancer will NEVER receive an empty
list from handleResolvedAddressGroups(). There is a case in the
request-routing design where a LoadBalancer only receives service
config with which it constructs children NameResolver/LoadBalancer
pairs, thus no address is expected at this level.
canHandleEmptyAddressListFromNameResolution() is a signal to Channel
(and to the parent LoadBalancer in case of hierarchical LoadBalancers)
about whether it accepts empty address lists. The default is false,
which is the current behavior.
The logic is currently duplicated in ManagedChannelImpl and
AutoConfiguredLoadBalancer. The one in ManagedChannelImpl will be
removed once we delete ManagedChannelBuilder.loadBalancerFactory() as
AutoConfiguredLoadBalancer will always be the top-level LoadBalancer
by then.
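A minimal sketch of the dispatch decision described above, written against plain Java stand-ins rather than the real gRPC classes. `SketchLoadBalancer`, `ConfigOnlyLoadBalancer`, `PlainLoadBalancer`, and `EmptyAddressDispatch` are hypothetical names, and addresses are modeled as strings for brevity.

```java
import java.util.List;

/** Hypothetical stand-in for a load balancer; not the real io.grpc.LoadBalancer. */
abstract class SketchLoadBalancer {
  String lastError;
  List<String> lastAddresses;

  /** Defaults to false, which preserves the existing behavior. */
  boolean canHandleEmptyAddressListFromNameResolution() {
    return false;
  }

  void handleResolvedAddresses(List<String> addresses) {
    lastAddresses = addresses;
  }

  void handleNameResolutionError(String description) {
    lastError = description;
  }
}

/** A balancer driven by service config only, so it expects no addresses. */
final class ConfigOnlyLoadBalancer extends SketchLoadBalancer {
  @Override
  boolean canHandleEmptyAddressListFromNameResolution() {
    return true;
  }
}

/** A balancer that keeps the default and therefore never sees an empty list. */
final class PlainLoadBalancer extends SketchLoadBalancer {}

final class EmptyAddressDispatch {
  /** What a channel (or a parent balancer) might do with a resolution result. */
  static void onResolved(SketchLoadBalancer lb, List<String> addresses) {
    if (addresses.isEmpty() && !lb.canHandleEmptyAddressListFromNameResolution()) {
      // Empty list is reported as an error unless the balancer opted in.
      lb.handleNameResolutionError("NameResolver returned an empty list");
    } else {
      lb.handleResolvedAddresses(addresses);
    }
  }
}
```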
Previously, overwriting an existing Key would cause the original CallOptions instance to also be mutated.
See #5142
Also adds a regression test for this issue.
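The kind of bug described here can be illustrated with a small copy-on-write holder. This is a sketch under assumptions, not the actual CallOptions code: `SketchOptions` is a hypothetical class, keys are plain strings, and the point is only that overwriting an existing key must replace the row in a fresh copy instead of writing into the shared array.

```java
import java.util.Arrays;

/** Hypothetical immutable options holder; not the real io.grpc.CallOptions. */
final class SketchOptions {
  static final SketchOptions DEFAULT = new SketchOptions(new Object[0][]);

  // Each row is {key, value}. Rows are never mutated after construction.
  private final Object[][] entries;

  private SketchOptions(Object[][] entries) {
    this.entries = entries;
  }

  SketchOptions withOption(String key, Object value) {
    int existing = indexOf(key);
    int newLength = existing == -1 ? entries.length + 1 : entries.length;
    // Copy the outer array so the original instance keeps its own rows.
    Object[][] copy = Arrays.copyOf(entries, newLength);
    int target = existing == -1 ? entries.length : existing;
    // Install a new row instead of writing into the shared one.
    copy[target] = new Object[] {key, value};
    return new SketchOptions(copy);
  }

  Object getOption(String key) {
    int i = indexOf(key);
    return i == -1 ? null : entries[i][1];
  }

  private int indexOf(String key) {
    for (int i = 0; i < entries.length; i++) {
      if (entries[i][0].equals(key)) {
        return i;
      }
    }
    return -1;
  }
}
```

A regression test for this shape would overwrite a key on a derived instance and then check that the original still returns the old value.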
ManagedChannelBuilder.loadBalancerFactory() overrides the proper
policy selection logic implemented in AutoConfiguredLoadBalancer, thus
has problems in cases where NameResolver returns balancer addresses,
because custom LoadBalancers normally don't differentiate between
normal server addresses and balancer addresses. The policy selection
logic will filter out balancer addresses.
When a service owner turns on grpclb through service config, it
shouldn't break existing clients that don't have grpclb in their
classpath.
Resolves #4602
There are currently three boolean flags, and there will be one more
soon. Put them all in the top-level class instead of passing them as
arguments on lower levels.
Raise visibility of AutoConfiguredLoadBalancerFactory as internal
tests need to access it from a different package.
Rename HealthCheckingLoadBalancerFactory.LoadBalancerImpl to
*.HealthCheckingLoadBalancer so that its toString() output is more
informative.
This is needed for internal issue b/119247688.
A particular test that runs the gRPC Android build in a non-Android
environment failed because RoundRobinLoadBalancerProvider was deleted
by ProGuard but the service-loader META-INF file still referred to it,
causing a loading failure.
This could be fixed by adding RoundRobinLoadBalancerProvider to the
hard-coded list, which ProGuard recognizes and uses to keep the
class.
However, we don't expect anyone to use RoundRobinLoadBalancerProvider
on Android, and including the class on Android would increase code size,
which Android apps are sensitive to. Hence we move
RoundRobinLoadBalancerProvider to a different package (util), which is
built internally as a separate artifact that Android users usually
don't depend on. (Note that in open source, util is in the same artifact as core,
which is unfortunate.)
Remove unused variables and prefer ArrayDeque to LinkedList. The swap to
Queue from Deque was just to make it more obvious what the usage was,
since the original swap to Deque was to avoid the same LinkedList lint
violation (3d51756d).
Introduce ChannelLogger, which is a utility provided to LoadBalancer implementations (potentially NameResolvers too) for recording events to channel trace. This is immediately required by client-side health checking (#4932, https://github.com/grpc/proposal/blob/master/A17-client-side-health-checking.md) to record an error about disabling health checking. It is also useful for any LoadBalancer implementations to record important information.
The ChannelLogger implementation is backed by the internal ChannelTracer/Channelz. Because Channelz limits the number of retained events, and events are lost once the process ends, I have expanded it to also log to a Java logger. This would provide a "last resort" in cases where there are too many events or off-line investigation is needed. All logs are prefixed with logId so that they can be easily associated with the involved Channel/Subchannel.
To prevent log spamming, the logs are all at FINE level or below so that they are not visible by default. They are logged to ChannelLogger's logger, so that users can have precise control.
There is also more verbose information that may not fit in ChannelTracer, but can be useful for debugging. It's desirable that these logs are associated with logId, but they currently include the logId manually, which is cumbersome and may result in inconsistency. For this use case, I added the DEBUG level for ChannelLogger, which formats the log in the same way as other levels but is not recorded to Channelz.
I have converted most logging and channel tracer recording in the Channel implementation and LoadBalancers.
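The behavior described above can be sketched with `java.util.logging` and a bounded queue. `SketchChannelLogger` is a hypothetical class, not the real ChannelLogger; the mapping of DEBUG to FINEST and of every other level to FINE is an assumption chosen only to satisfy "FINE level or below", and the bounded deque stands in for Channelz's event limit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical logger that records to a bounded trace and to a Java logger. */
final class SketchChannelLogger {
  enum ChannelLogLevel { DEBUG, INFO, WARNING, ERROR }

  private static final Logger javaLogger =
      Logger.getLogger(SketchChannelLogger.class.getName());

  private final String logId;
  private final int maxEvents;
  private final Deque<String> traceEvents = new ArrayDeque<>();

  SketchChannelLogger(String logId, int maxEvents) {
    this.logId = logId;
    this.maxEvents = maxEvents;
  }

  synchronized void log(ChannelLogLevel level, String message) {
    // DEBUG is for verbose output and is not recorded in the trace.
    if (level != ChannelLogLevel.DEBUG && maxEvents > 0) {
      if (traceEvents.size() == maxEvents) {
        traceEvents.removeFirst(); // drop the oldest retained event
      }
      traceEvents.addLast(level + ": " + message);
    }
    // Assumed mapping: everything stays at FINE or below, so nothing is
    // visible with the default logging configuration.
    Level javaLevel = level == ChannelLogLevel.DEBUG ? Level.FINEST : Level.FINE;
    // Every line carries the logId prefix in the same format.
    javaLogger.log(javaLevel, "[{0}] {1}", new Object[] {logId, message});
  }

  synchronized List<String> traceSnapshot() {
    return new ArrayList<>(traceEvents);
  }
}
```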
The ManagedChannelImpl change prevents any LB initialization failure
from producing a useless exception like:
java.lang.NullPointerException
at io.grpc.internal.ManagedChannelImpl.shutdownNameResolverAndLoadBalancer(ManagedChannelImpl.java:321)
at io.grpc.internal.ManagedChannelImpl.panic(ManagedChannelImpl.java:738)
at io.grpc.internal.ManagedChannelImpl$1.uncaughtException(ManagedChannelImpl.java:144)
Instead, now it will have the expected panic behavior of an INTERNAL
Status with a proper cause.
Since the NPE in AutoConfiguredLoadBalancerFactory wouldn't mean much to
users, it now has a more explicit message.
Following the [spec](https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md) on duplicate header names:
**Custom-Metadata** header order is not guaranteed to be preserved except for values with duplicate header names. Duplicate header names may have their values joined with "," as the delimiter and be considered semantically equivalent. Implementations must split Binary-Headers on "," before decoding the Base64-encoded values.
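As a rough illustration of the last sentence of the quoted spec text, the following sketch splits a joined binary header value on "," and then Base64-decodes each piece. `BinaryHeaderSplitter` is a hypothetical helper; it relies only on `java.util.Base64`, whose basic decoder accepts both padded and unpadded input.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

/** Hypothetical helper; splits first, decodes second. */
final class BinaryHeaderSplitter {
  static List<byte[]> decodeBinaryHeader(String joinedValue) {
    List<byte[]> values = new ArrayList<>();
    // "," is not part of the Base64 alphabet, so splitting before decoding
    // cannot cut through an encoded value.
    for (String part : joinedValue.split(",", -1)) {
      values.add(Base64.getDecoder().decode(part));
    }
    return values;
  }
}
```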
Because otherwise the user logic around Subchannel creation will
likely race with handleSubchannelState().
Will log a warning if LoadBalancer.Helper.createSubchannel() is called
outside of the SynchronizationContext.
Adds SynchronizationContext.throwIfNotInThisSynchronizationContext()
to facilitate this warning. It can also be used by LoadBalancer
implementations to make it a requirement.
LoadBalancerProvider is the interface that extends LoadBalancer.Factory. LoadBalancerRegistry is the one that loads the providers through service loader, and allows users to access providers through their names.
The pick_first and round_robin balancer factories, which are experimental public API, are now deprecated. Their providers are internal, as they are accessible by policy name.
AutoConfiguredLoadBalancerFactory is modified to access implementations purely by their names, thus hard-coded class names are no longer needed, and it can support arbitrary policy selected by service config.
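A name-keyed registry of this kind might look like the sketch below. `SketchProvider` and `SketchRegistry` are hypothetical; providers are registered by hand here instead of through the service loader, and the availability flag and priority tie-break are assumptions added for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical provider descriptor; not the real LoadBalancerProvider. */
final class SketchProvider {
  final String policyName;
  final int priority;
  final boolean available;

  SketchProvider(String policyName, int priority, boolean available) {
    this.policyName = policyName;
    this.priority = priority;
    this.available = available;
  }
}

/** Hypothetical registry that resolves providers purely by policy name. */
final class SketchRegistry {
  private final Map<String, SketchProvider> byName = new LinkedHashMap<>();

  void register(SketchProvider provider) {
    if (!provider.available) {
      return; // assumption: unavailable providers are ignored
    }
    SketchProvider existing = byName.get(provider.policyName);
    // Assumption: for duplicate names, the higher priority wins.
    if (existing == null || provider.priority > existing.priority) {
      byName.put(provider.policyName, provider);
    }
  }

  /** Returns null when no provider with that name is registered. */
  SketchProvider getProvider(String policyName) {
    return byName.get(policyName);
  }
}
```

Because lookup is by name only, a caller holding a policy name from service config needs no compile-time reference to the implementing class.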
The trailing dot denotes the hostname to be absolute. It is fine to
leave, but removing it makes the authority match the more common form
and hopefully reduces confusion.
This happens to work around SNI failures caused when using gRPC-LB,
since SNI prohibits the trailing dot. However, that is not the reason
for this change as we have to support users directly providing a
hostname with the trailing dot anyway (and doing so is not hard).
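The normalization itself is small. A sketch, assuming a hypothetical helper named `AuthorityNormalizer` that operates on the bare hostname (no port):

```java
/** Hypothetical helper; strips the trailing dot of an absolute hostname. */
final class AuthorityNormalizer {
  static String stripTrailingDot(String host) {
    // "example.com." and "example.com" name the same host; keep a lone "."
    // untouched so the result is never empty.
    if (host.length() > 1 && host.endsWith(".")) {
      return host.substring(0, host.length() - 1);
    }
    return host;
  }
}
```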
See #4912
Swapping MetadataApplier to an abstract class is not ABI-safe for
callers. So I revert back to the previous interface definition and
introduce a CallCredentials2.MetadataApplier which is an abstract class.
Once everyone is on CallCredentials2 then we can swap it to an abstract
class again.
Fixes #5002
Provides a `SynchronizationContext` for scheduling tasks, with and without delay, from LoadBalancer implementations. This absorbs and extends the internal utility `ChannelExecutor`. It supersedes `Helper.runSerialized()`, which is now deprecated.
# Motivation
I see multiple cases that schedule tasks with a delay while requiring the task to run in the "Channel Executor". There has been repeated work to wrap scheduled tasks and handle races between cancellation and task run (see the diff in `GrpclbState.java` for example). The LoadBalancer implementation (e.g., GrpclbLoadBalancer) also has to acquire the `ScheduledExecutorService` from somewhere and release it upon shutdown.
The upcoming HealthCheckLoadBalancer (#4932), which would use back-off policy to retry health-checking streams, would have to do all the things above. At this point I think we need to provide something that combines `runSerialized()` with a scheduled executor with the same synchronization guarantees.
# Design details
`SynchronizationContext` is similar to `ScheduledExecutorService` but tailored for use in `LoadBalancer` and potentially other cases outside of `LoadBalancer`. It offers task queuing and serialization and delayed scheduling. It guarantees non-reentrancy and happens-before among tasks. It owns no thread, but runs tasks on the caller's or caller-provided threads.
All channel-level state mutations and callback methods on `LoadBalancer` are done in a SynchronizationContext, which was previously referred to as "Channel Executor".
`SynchronizationContext.schedule()` returns a `ScheduledHandle` for status checking and cancellation. `ScheduledFuture` from `ScheduledExecutorService.schedule()` is too broad for our use cases (e.g., the blocking `get()` should never be used).
`SynchronizationContext.schedule()` requires a `ScheduledExecutorService`, which is now available through `Helper.getScheduledExecutorService()`. LoadBalancers don't need to worry about where to get a `ScheduledExecutorService` any more.
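The queuing, serialization, and non-reentrancy guarantees can be sketched with a queue and a single "draining thread" marker. This is a simplified illustration, not the actual class: `SketchSynchronizationContext` and `throwIfNotInThisContext()` are hypothetical names, delayed scheduling is left out, and there is no error handling for tasks that throw.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical serializing executor that owns no thread of its own. */
final class SketchSynchronizationContext {
  private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
  private final AtomicReference<Thread> drainingThread = new AtomicReference<>();

  /** Queues the task and runs pending tasks on the calling thread if possible. */
  void execute(Runnable task) {
    queue.add(task);
    drain();
  }

  private void drain() {
    do {
      if (!drainingThread.compareAndSet(null, Thread.currentThread())) {
        // Another thread, or this thread further up the stack, is already
        // draining. Returning here is what makes execute() non-reentrant.
        return;
      }
      try {
        Runnable next;
        while ((next = queue.poll()) != null) {
          next.run();
        }
      } finally {
        drainingThread.set(null);
      }
      // Re-check: a task may have been queued after the last poll() but
      // before the marker was cleared.
    } while (!queue.isEmpty());
  }

  /** Throws unless called from a task currently running in this context. */
  void throwIfNotInThisContext() {
    if (Thread.currentThread() != drainingThread.get()) {
      throw new IllegalStateException("Not called from the SynchronizationContext");
    }
  }
}
```

A task that calls `execute()` from inside the context sees its new task run only after it returns, which is the property that lets callbacks mutate shared state without reentrancy surprises.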
# Alternatives
Alternatively, we could keep `Helper.runSerialized()` and add something like `Helper.runSerializedWithDelay()`, but having them on their own interface allows a clean fake implementation by `FakeClock` for tests, and allows other components (potentially `InternalSubchannel` for reconnection backoff) to use it too.
Instead of asking caller of `schedule()` to provide the `ScheduledExecutorService`, we considered having SynchronizationContext take a `ScheduledExecutorService` at construction. It would be inconvenient for LoadBalancer implementations that don't use `schedule()`, as they would be forced to provide a fake `ScheduledExecutorService` (which is cumbersome).
Instead of making `SynchronizationContext` a (semi-)concrete class, we considered making it a pure abstract class. However, we found it nontrivial to implement `execute()` correctly with the non-reentrancy guarantee.
This reverts commit ef8a84421d.
Firebase is not yet ready to migrate to the new API. Will try again once we have made the release and migrated them to CallCredentials2.
This will allow enabling Error Prone on JDK 10+ (after
updating the net.ltgt.errorprone plugin), and is also a
prerequisite to that plugin update.
Also remove net.ltgt.apt plugin, as Gradle has native
support for annotationProcessor.
This change is mainly to fix a test, but it also is an implementation of the proposal here: https://github.com/grpc/proposal/pull/79
In short:
* Do not do SRV or TXT lookups when the target name is `localhost`. This can be overridden by a system property.
* Do not do SRV or TXT lookups when the target name is an IPv6 or IPv4 address. This _cannot_ be overridden. The constructed domains for these queries would themselves not be valid. (e.g. _grpclb._tcp.192.168.0.1)
* Speeds up initial connection when communicating over local host, since it is extremely uncommon that such a connection would need gRPCLB or SRV records
I expect to remove the system property after a release if no one asks about it.
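The rules in the list above could be expressed roughly as follows. `SrvTxtLookupPolicy` is a hypothetical helper, the boolean parameter stands in for the system property (whose name is not given here), and the IP-literal detection is a simplified assumption rather than the resolver's actual check.

```java
/** Hypothetical decision helper for SRV/TXT lookups. */
final class SrvTxtLookupPolicy {
  static boolean shouldLookUpSrvAndTxt(String host, boolean enableForLocalhost) {
    if (isIpLiteral(host)) {
      return false; // cannot be overridden: the query name would be invalid
    }
    if ("localhost".equalsIgnoreCase(host)) {
      return enableForLocalhost; // stand-in for the system property
    }
    return true;
  }

  /** Simplified check: any ':' means IPv6, otherwise look for a dotted quad. */
  static boolean isIpLiteral(String host) {
    if (host.indexOf(':') >= 0) {
      return true;
    }
    String[] parts = host.split("\\.", -1);
    if (parts.length != 4) {
      return false;
    }
    for (String part : parts) {
      if (part.isEmpty() || part.length() > 3) {
        return false;
      }
      for (char c : part.toCharArray()) {
        if (c < '0' || c > '9') {
          return false;
        }
      }
      if (Integer.parseInt(part) > 255) {
        return false;
      }
    }
    return true;
  }
}
```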
This is a rename of the pre-existing Netty builder method, so aliases
were added to the Netty builders.
Fixes #4050. This API was a minor rename to the pre-existing Netty API,
so has already undergone API review and thus is not ExperimentalApi.
There is a known issue that causes DNS lookup problems on network
switchover on Android. This issue is tracked separately in #4962.
This change simply disables DNS cache to avoid the issue on Android.
Those overrides are kept for backward compatibility and convenience
for callers. Documentation already says implementations should not
override them. Making them final reduces confusion around which
override should be verified in tests and be overridden in forwarding
classes, thus prevents bugs caused by such confusion.
Returns a Channel that allows a LoadBalancer to make auxiliary RPCs on already-established application connections. We need this to implement client-side health-checking (#4932)
See comments on the API for its semantics.
Notable changes:
- Transports are modified to use InUseStateAggregator so that they can exclude RPCs made on Subchannel.asChannel() when reporting in-use state for idle mode.
- OobChannel shares the same Executor as Subchannel.asChannel(). Because the latter is not a ManagedChannel and has no life cycle, it can't determine when to return the Executor to a pool, so the Executor is now returned only when ManagedChannelImpl is terminated.
This change does 3 main things (in 3 commits):
1. Refactor the resolution runnable to be testable
2. Add Finer level logging to aid in debugging
3. Check that there are addresses before passing them to ManagedChannelImpl.
This is the first step of smoothly changing the CallCredentials API.
Security level and authority are parameters required to be passed to
applyRequestMetadata(). This change wraps them, along with
MethodDescriptor and the transport attributes, into RequestInfo, which is
clearer to implementers.
ATTR_SECURITY_LEVEL is moved to the internal GrpcAttributes and
annotated as TransportAttr, because transports are required to set it,
but no user is actually reading them from
{Client,Server}Call.getAttributes().
ATTR_AUTHORITY is removed, because no transport is overriding it.
All involved interfaces are changed to abstract classes, as this will
make further API changes smoother.
The CallCredentials name is stabilized, thus we first introduce
CallCredentials2 and ask CallCredentials implementations to migrate to
it while gRPC accepts both at the same time, then replace
CallCredentials with CallCredentials2.
This will be used by LoadBalancer plugins that delegate to another,
which is what the new request routing (go/grpc-request-routing-design)
requires. This will also be used to wrap LoadBalancers to add
client-side health-checking functionality.
* doc: organize Attributes Keys with annotations.
Keys are annotated with the following annotations:
1. Grpc.TransportAttr: transport attributes returned by
{Client,Server}Call.getAttributes().
2. NameResolver.ResolutionResultAttr: attributes passed as the
argument of NameResolver.Listener.onAddresses() and
LoadBalancer.handleResolvedAddressGroups()
3. EquivalentAddressGroup.Attr: attributes from
EquivalentAddressGroups.
* Expand the usage of annotations to Attributes variables.
Log using new proto definition
- Remove io.grpc.BinaryLog.CallId because a call ID is now an AtomicLong
- Add the concept of "always included" and "never included" metadata
keys. This is needed because grpc-status-details-bin is already
logged in the binlog msg, and we will log grpc-trace-bin for the
census info.
- unit tests are effectively rewritten
* Remove redundant SubchannelPicker refreshes in RoundRobinLoadBalancer
- Ensure active subchannel list and round-robin index is only
regenerated/refreshed when it changes
- Make it so that Subchannels exist in subchannels map iff their state
!= SHUTDOWN
- Add EmptyPicker class since logic for this case is disjoint from the
non-empty case
* remove explicit initialization of boolean ready field
per @carl-mastrangelo's review comment
* minor restructuring to make logic clearer; more explanatory comments
* move some checks inside updateBalancingState method for clarity
* store current state and picker in RRLB, only update when new one is diff
* some more simplification/refactoring; improve test coverage
- remove now redundant check in handleSubchannelState
- collapse getAggregatedState() and getAggregatedError() into
handleBalancingState()
- have both pickers extend new RoundRobinPicker, move
areEquivalentPickers() logic into RoundRobinPicker.isEquivalentTo()
- extend unit tests to cover some additional cases
* Address latest review comments from @zhangkun83
- Use explicit check for non-empty list instead of assert
- Change EmptyPicker.status to be non-nullable
- Further test coverage improvement including explicit picker comparison
tests
* use EMPTY_OK instead of Status.OK for initial empty picker
The exact census span behavior wrt gRPC is not yet defined, so let's
punt on tight integration.
It may be fine to log grpc-trace-bin on server side because it is a
key visible to the application.
This is an API used to coordinate across packages and must live in
`io.grpc`.
Prepending `Internal` makes it easier to detect and hide this class
from public visibility when using certain build tools.
Fixes #4796
* Make the list of providers an immutable List
* Make obvious that the list is statically initialized
* Add documentation for when methods were added.
* Use RuntimeException, rather than IllegalStateException.
There seem to be some users converting from StatusRuntimeException
to StatusException using the following paradigm:
sre.getStatus().toException(sre.getTrailers())
Since there isn't a viable alternative, revert the warning.
grpc-netty is still really useful, but for most users who aren't doing
anything advanced using grpc-netty-shaded is much safer from a
dependency basis.
grpc-netty-shaded has seen more usage and has shown itself to be stable
and reduce the number of conflicts due to Netty versions.
RoundRobinLoadBalancerFactory creates a new Picker instance every time the set of provided address groups changes or the connection state of subchannels associated with existing address groups changes. In certain scenarios, such as deployment/replacement of the target service cluster, this can lead to high churn of Picker objects. Given that each new Picker's subchannel index is initialized to zero, in these scenarios requests can end up getting disproportionately routed through subchannels (and hence server nodes) which are earlier in the list of address groups.
At Netflix we have measured that some service nodes end up taking 3-4x the load of other nodes during deployment.
This commit randomizes the start index of the RoundRobinLoadBalancerFactory.Picker which eliminates this behavior.
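A minimal sketch of a picker with a randomized start index. `SketchRoundRobinPicker` is a hypothetical class that cycles over strings instead of Subchannels, and the `Random` is injected so the behavior can be tested deterministically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin picker whose first pick is randomized. */
final class SketchRoundRobinPicker {
  private final List<String> subchannels;
  private final AtomicInteger index;

  SketchRoundRobinPicker(List<String> subchannels, Random random) {
    if (subchannels.isEmpty()) {
      throw new IllegalArgumentException("empty list");
    }
    this.subchannels = new ArrayList<>(subchannels);
    // Starting at a random position means that frequently recreated pickers
    // do not all send their first requests to the head of the list.
    this.index = new AtomicInteger(random.nextInt(subchannels.size()));
  }

  String pick() {
    int i = index.getAndIncrement();
    // floorMod keeps the position valid even after the counter overflows.
    return subchannels.get(Math.floorMod(i, subchannels.size()));
  }
}
```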
Prevent multiple effective close calls either by successful completion
of a cancel or complete notification, or through successive exceptions
handled within a single call.
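One common way to make only the first close effective is an atomic flag. The sketch below assumes a hypothetical `CloseOnceGuard` that wraps the real close action; it illustrates the idea and is not the code from this change.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical guard that lets only the first close() reach the delegate. */
final class CloseOnceGuard {
  private final AtomicBoolean closed = new AtomicBoolean();
  private final Consumer<String> closeAction;

  CloseOnceGuard(Consumer<String> closeAction) {
    this.closeAction = closeAction;
  }

  /** Returns true if this call performed the close, false if it was a repeat. */
  boolean close(String reason) {
    // compareAndSet succeeds for exactly one caller, even under contention.
    if (!closed.compareAndSet(false, true)) {
      return false;
    }
    closeAction.accept(reason);
    return true;
  }
}
```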
DoNotMock was removed from error_prone_annotations in 2.1.3, because
there was no enforcement mechanism (which is in google/error-prone#572).
Guava and Truth also depend on error_prone_annotations and are beginning
to use newer versions, so our usage of DoNotMock is causing diamond
dependency problems. This allows us to update to 2.2.0.
The annotations were useful internally; we're solving that in cl/205294089.
Most of the changes are changing the signature of newClientTransport.
Since this is annoying, I chose to introduce a ClientTransportOptions
object to avoid the churn in the future.
With ClientTransportOptions in place, only a few lines of plumbing are
necessary for the Attributes: add the field to ClientTransportOptions
and populate it in InternalSubchannel. There are no consumers of the
field in this commit.
This avoids the need to flatten to EAGs for cases like PickFirst,
making the Attributes in EAGs able to be used in communication with
core. See #4302 for some discussion on the topic.
Instead of failing after a missing A/AAAA record, this change
makes the resolver keep going and try out SRV records too. This
is needed for use with ALTS, and is part of the gRPCLB spec.
This change also moved the JNDI code to a separate, reflectively
loaded file. This makes it easy to exclude the file and not worry
about the missing class references on Android. Additionally, if
javax.naming might be available on Android, this allows it to be
loaded. A key side effect of this is that DnsNameResolver is
smaller, and more cleanly tested.
The motivation here is in some cases we log the remote-addr that is set in the gRPC call attributes, and have to special case this type to support inprocess servers.
This PR adds an automatic gradle format checker and reformats all the *.gradle files. After this, new changes to *.gradle files will fail to build if not well formatted, just like a checkStyle failure.
Includes
- API plumbing for channel tracing `+ManagedChannelBuilder.maxTraceEvents()`
- trace logging for channel/subchannel creation events and name resolving events.
Trace logging for all other events will be added in subsequent PRs.
Add internal accessors for ServerInterceptors and ClientInterceptors because some helpers were pkg private
Fix tests that were once creating BinaryLogProvider instances; they should now only create io.grpc.BinaryLog instances.
Classes in internal now use the io.grpc.BinaryLog API;
io.grpc.BinaryLogProvider is an implementation of the API and will be
moved to io.grpc.services.
This is a class similar to `CallTracer`, to be used for Channel Tracing.
The constructor arg `maxEvents` is a param supposed to be provided by `ManagedChannelBuilder`.
Moved `setState()` inside of the `channelExecutor` runnable together with `setSubchannels`; otherwise the state may be inconsistent with subchannels. For example, the channel state could be the initial IDLE while subchannels is non-empty with active transports, or the channel state could be READY while subchannels is empty.
Deprecate static builder method, Keys.of(), add a notice of plans to
remove keys(), emphasize that the name is only a debug label.
The `@ExperimentalAPI` is left on the class because there are still
issues around hashCode/equals.
BinaryLog.java is the class that is responsible for intercepting
client and server calls. It now requires a CallId to be passed
in. The BinaryLogProviderImpl is responsible for generating a
CallId and passing it in.
Replace the tests with tests that just make sure binary log is
installed. Better tests exist today in BinaryLogProviderTest to make
sure the actual logic is correct.
Previously in `ManagedChannelImplTest`, new tests had to add more and more custom args to `createChannel(...)`, and each time a custom arg was needed, all the other default args had to be repeated in the `createChannel(...)` call. In most cases those args are just builder attributes. So this PR makes ManagedChannelImplTest easier to stub by providing a builder field; new tests then just set the necessary builder attributes before calling `createChannel()`.
Forward `toString()` method for forwarding classes to improve debug information.
For example, `ForwardingManagedChannel.toString()` will return something like:
```
ForwardingManagedChannel{delegate=ManagedChannelImpl{logId=tag-13247, target=localhost:8080}}
```
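A forwarding `toString()` of this shape can be sketched in plain Java as below. The types `SketchChannel`, `RealChannel`, `ForwardingSketchChannel`, and `LoggingChannel` are hypothetical, and the string is built by hand here rather than with whatever helper the real classes use.

```java
/** Hypothetical channel interface for the sketch. */
interface SketchChannel {
  String authority();
}

final class RealChannel implements SketchChannel {
  private final String target;

  RealChannel(String target) {
    this.target = target;
  }

  @Override
  public String authority() {
    return target;
  }

  @Override
  public String toString() {
    return "RealChannel{target=" + target + "}";
  }
}

/** Forwards every call, including toString(), to the delegate. */
abstract class ForwardingSketchChannel implements SketchChannel {
  protected abstract SketchChannel delegate();

  @Override
  public String authority() {
    return delegate().authority();
  }

  @Override
  public String toString() {
    // Show the wrapper's own class name and then the delegate's description.
    return getClass().getSimpleName() + "{delegate=" + delegate() + "}";
  }
}

final class LoggingChannel extends ForwardingSketchChannel {
  private final SketchChannel delegate;

  LoggingChannel(SketchChannel delegate) {
    this.delegate = delegate;
  }

  @Override
  protected SketchChannel delegate() {
    return delegate;
  }
}
```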
Then channelz GUI will take this into account. This is particularly
useful for InProcessTransport, where I have decided we do not need
special support in channelz. The server and channel stats are
already sufficient.
- replumbed `RetryPolicy` with `MethodInfo` without breaking the existing `RetryPolicyTest`.
- moved `ServiceConfigInterceptor.MethodInfo.RetryPolicy` out as a top level class so that `RetriableStream` does not import `ServiceConfigInterceptor`.
Previously StreamTracer.streamClosed() was called in
ServerStream.close(), but that is not exactly when the stream is
officially closed. ServerStreamListener.closed() is guaranteed to be
called and it is the official end of the stream.
Before:
`InputBufferStream.close()` does not close its buffer, so the buffer leaks.
After:
Override `close()` to close the buffer.
Resolves #4198.
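The shape of the fix can be sketched with a stream that releases its backing buffer on `close()`. `SketchBuffer` and `SketchBufferStream` are hypothetical stand-ins for the real buffer and stream classes, and the released flag only models what a real buffer would do when returning memory.

```java
import java.io.InputStream;

/** Hypothetical buffer that must be released to avoid a leak. */
final class SketchBuffer {
  private final byte[] data;
  private int position;
  private boolean released;

  SketchBuffer(byte[] data) {
    this.data = data;
  }

  int readableBytes() {
    return data.length - position;
  }

  int readUnsignedByte() {
    return data[position++] & 0xFF;
  }

  void release() {
    released = true;
  }

  boolean isReleased() {
    return released;
  }
}

/** InputStream view of a SketchBuffer that owns the buffer's lifetime. */
final class SketchBufferStream extends InputStream {
  private final SketchBuffer buffer;

  SketchBufferStream(SketchBuffer buffer) {
    this.buffer = buffer;
  }

  @Override
  public int read() {
    return buffer.readableBytes() == 0 ? -1 : buffer.readUnsignedByte();
  }

  @Override
  public int available() {
    return buffer.readableBytes();
  }

  @Override
  public void close() {
    // Without this override, InputStream.close() is a no-op and the buffer
    // would never be released.
    buffer.release();
  }
}
```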
The peer socket is read from TRANSPORT_ATTR_REMOTE_ADDR from the
stream attributes. We only log the peer on receive initial metadata.
The call ID assumes census is available. The call ID is read from the
context via SERVER_CALL_ID_CONTEXT_KEY on server side, and read from
CallOptions via CLIENT_CALL_ID_CALLOPTION_KEY on client side. The
value is copied from CONTEXT_SPAN_KEY which is set by census.
Pass around CallId with two longs, not a byte[].
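As an illustration of carrying an ID as two longs instead of a byte array, here is a sketch under assumptions: `SketchCallId` is a hypothetical class, and the 16-byte big-endian layout is assumed for the example rather than taken from the change.

```java
import java.nio.ByteBuffer;

/** Hypothetical call ID held as two longs rather than a byte[]. */
final class SketchCallId {
  final long hi;
  final long lo;

  SketchCallId(long hi, long lo) {
    this.hi = hi;
    this.lo = lo;
  }

  /** Assumes a 16-byte, big-endian encoding: first 8 bytes hi, last 8 lo. */
  static SketchCallId fromBytes(byte[] bytes) {
    if (bytes.length != 16) {
      throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
    }
    ByteBuffer buf = ByteBuffer.wrap(bytes); // ByteBuffer is big-endian by default
    return new SketchCallId(buf.getLong(), buf.getLong());
  }

  byte[] toBytes() {
    return ByteBuffer.allocate(16).putLong(hi).putLong(lo).array();
  }
}
```

Two longs are immutable once assigned and cheap to compare, whereas a shared `byte[]` can be modified by any holder.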