This is a big, but mostly mechanical, change. The newly added Test*StreamTracer classes are designed to be extended, which is why they are non-final and have protected fields. A few notable things about this change:
1. verifyNoMoreInteractions is gone. The StreamTracer API doesn't make this guarantee. That behavior is recovered by failing on duplicate calls, which also surfaced and fixed a few bugs in the test code.
2. StreamTracers cannot be mocked anymore. Tracers need to be thread-safe, which mocks simply are not; mocked tracers produce a huge number of spurious reports when trying to find real races in gRPC. Extending a tracer class instead is sketched after this list.
3. If these classes prove useful, we can promote them out of internal; they are placed there purely for convenience.
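As a rough illustration of the intended pattern (extending a tracer class rather than mocking it), here is a sketch against the public ClientStreamTracer API; the Test*StreamTracer helpers added here would be extended the same way. The class below is illustrative, not code from this change.

```java
import io.grpc.ClientStreamTracer;

import java.util.concurrent.atomic.AtomicBoolean;

// Thread-safe test tracer that fails loudly on a duplicate callback,
// replacing the old verifyNoMoreInteractions()-style checks.
class HeadersOnceTracer extends ClientStreamTracer {
  private final AtomicBoolean headersSeen = new AtomicBoolean();

  @Override
  public void inboundHeaders() {
    if (!headersSeen.compareAndSet(false, true)) {
      throw new AssertionError("inboundHeaders() called more than once");
    }
  }
}
```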
Moved the following APIs from `io.grpc.testing.TestUtils` to `io.grpc.internal.TestUtils`:
`InetSocketAddress testServerAddress(String host, int port)`
`InetSocketAddress testServerAddress(int port)`
`List<String> preferredTestCiphers()`
`File loadCert(String name)`
`X509Certificate loadX509Cert(String fileName)`
`SSLSocketFactory newSslSocketFactoryForCa(Provider provider, File certChainFile)`
`void sleepAtLeast(long millis)`
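For illustration, a call site for the moved methods now imports the internal package; the certificate name and port below are arbitrary example values, not taken from this change.

```java
import io.grpc.internal.TestUtils;  // moved from io.grpc.testing.TestUtils

import java.io.File;
import java.net.InetSocketAddress;

public class TestUtilsCaller {
  public static void main(String[] args) throws Exception {
    // Example arguments only; the method signatures are the ones listed above.
    File cert = TestUtils.loadCert("server1.pem");
    InetSocketAddress addr = TestUtils.testServerAddress(8443);
    System.out.println(cert + " -> " + addr);
  }
}
```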
APIs not to be moved:
`ServerInterceptor recordRequestHeadersInterceptor()`
`ServerInterceptor recordServerCallInterceptor()`
Static mutable flags are evil, so they have been turned into options on the channel
builder. Local stats recording is also restored by default, because these flags were
added out of concern for wire compatibility, not for the local feature.
Main implementation is in CensusTracingModule.
Also a few fix-ups in the stats implementation, CensusStatsModule:
- Changed the header key name from grpc-census-bin to grpc-tags-bin.
- The server no longer fails on header parse errors; it uses the default instead.
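For reference, a binary metadata key with the renamed header looks like the sketch below; the byte-array marshaller is a simplification, since the real module marshals the Census tag context.

```java
import io.grpc.Metadata;

final class TagsHeader {
  // Renamed from "grpc-census-bin"; the "-bin" suffix marks it as a binary header.
  static final Metadata.Key<byte[]> TAGS_KEY =
      Metadata.Key.of("grpc-tags-bin", Metadata.BINARY_BYTE_MARSHALLER);
}
```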
Protect Census-based stats and tracing with static flags: `GrpcUtil.enableCensusStats` and `GrpcUtil.enableCensusTracing`. These flags keep those features disabled by default until they, and especially their wire formats, have stabilized.
These asserts were added in #2863. In TestServiceClient, metricsExpected() returns
false because server-side stats are not available; client-side stats are there,
however, so these asserts would fail.
Resolves grpc/grpc/issues/10552
Per spec this metric should be calculated on the server and sent back to
the client, for which the mechanism is not currently defined. As it's
not a required metric, we remove the incorrect implementation for now.
Internal ref: b/37208451
Background
==========
LoadBalancer needs to track RPC measurements and status for
load-reporting. We need to introduce a "Tracer" API for that.
Since such API is very close to the current
Census(instrumentation)-based stats reporting mechanism in terms of what
are recorded, we will migrate the Census-based stats reporting under the
new Tracer API.
Alternatives
============
We considered plumbing the LB-related information from the LoadBalancer
to the core, and recording that information, along with the currently
recorded stats, to Census. The LB-related information, such as the LB_ID,
the reason for dropping requests, etc., would be added to the Census
StatsContext as tags.
Since tags are held by the StatsContext before eventually being recorded when
the measurements are provided, and the StatsContext is immutable, this would
require a way for the LoadBalancer to override the StatsContext, which means
the LoadBalancer API would have a direct reference to the Census StatsContext.
This is undesirable because the Census API is not stable yet.
Part of the LB-related information is whether the client has received
the initial headers from the server. While such information can be
grabbed by implementing a ClientInterceptor, it must be recorded along
with other information such as LB_ID to be useful, and LB_ID is only
available in GrpclbLoadBalancer.
Bottom line: trying to use solely the Census StatsContext API to record
LB load information would require extra data-plumbing channels between the
ClientInterceptor, the LoadBalancer, and the gRPC core, as well as exposing
the Census API on the gRPC API. Even with those extensive changes, we have
yet to find a working solution. Therefore, we abandoned this idea and
propose this PR instead.
Summary of changes
==================
API summary
-----------
Introduce "StreamTracer" API, a callback interface for receiving stats
and tracing related updates concerning **a single stream**.
"ClientStreamTracer" and "ServerStreamTracer" add side-specific
events. A stream can have zero or more tracers and report to all of
them.
On the client side, CallOptions now takes a list of
ClientStreamTracer.Factory. Upon creating a ClientStream, each
factory creates a ClientStreamTracer for the stream. This allows
ClientInterceptors to install their own tracer factories by overriding the
CallOptions.
Since a StreamTracer only tracks the span of a stream, tracking of a
ClientCall needs to be done in a ClientInterceptor. By installing its
own tracer factory when a ClientCall is created, a ClientInterceptor can
associate the updates for a Call with the updates for the Streams
created for that Call. This is how we keep the existing Census
reporting mechanism in CensusStreamTracerModule.
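A minimal sketch of this pattern, an interceptor installing its own tracer factory through CallOptions. The newClientStreamTracer overload shown matches later grpc-java releases and the tracer body is illustrative, so treat it as an assumption rather than the code of this change.

```java
import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ClientInterceptor;
import io.grpc.ClientStreamTracer;
import io.grpc.Metadata;
import io.grpc.MethodDescriptor;

final class PerCallTracingInterceptor implements ClientInterceptor {
  @Override
  public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
      MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
    ClientStreamTracer.Factory factory = new ClientStreamTracer.Factory() {
      @Override
      public ClientStreamTracer newClientStreamTracer(
          ClientStreamTracer.StreamInfo info, Metadata headers) {
        // One tracer per stream; updates seen here can be associated with the call.
        return new ClientStreamTracer() {
          @Override
          public void inboundHeaders() {
            // e.g. note that the server has responded
          }
        };
      }
    };
    // Override the CallOptions so every stream created for this call is traced.
    return next.newCall(method, callOptions.withStreamTracerFactory(factory));
  }
}
```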
On the server side, ServerStreamTracer.Factory is added through the
ServerBuilder and is used to create ServerStreamTracers for every
ServerStream.
The Tracer API supports propagation of stats/tracing information through
the Context and metadata. Both client-side and server-side tracer factories
have access to the headers object. The client-side tracer relies on an
interceptor to read the Context, while the server-side tracer has a
filterContext() method that can override the Context.
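A corresponding server-side sketch; the tracer body is illustrative, while the factory method and filterContext() hook are the ones described above.

```java
import io.grpc.Context;
import io.grpc.Metadata;
import io.grpc.ServerStreamTracer;

final class PerStreamServerTracerFactory extends ServerStreamTracer.Factory {
  @Override
  public ServerStreamTracer newServerStreamTracer(String fullMethodName, Metadata headers) {
    return new ServerStreamTracer() {
      @Override
      public Context filterContext(Context context) {
        // Attach stats/tracing state derived from the headers to the Context
        // before application code runs.
        return context;
      }
    };
  }
}

// Installed on the server builder, e.g.:
//   ServerBuilder.forPort(port).addStreamTracerFactory(new PerStreamServerTracerFactory())
```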
Implementation details
----------------------
Only real streams report stats. Pseudo streams, such as the delayed stream
and the failing stream, don't report. InProcess transport streams currently
don't report stats either.
"StatsTraceContext" which used to receive updates from core and report
directly to Census (StatsContext), now delegates to the StreamTracers of
a stream. On the client-side, the scope of a StatsTraceContext reduces
from ClientCall to a ClientStream to match the scope of StreamTracer.
The Census-specific logic that was in StatsTraceContext is moved into
CensusStreamTracerModule, which produces factories for StreamTracers
that report to Census.
Reporting with StatsTraceContext is moved out of the Channel/Call layer
into the Transport/Stream layer, to match the scope change of
StatsTraceContext.
Bug fixed
---------
The end of a server-side call was reported in ServerCallImpl's
ServerStreamListenerImpl.closed(), which was wrong, because closed()
receiving OK doesn't necessarily mean the RPC ended with OK; it means
the server has successfully sent the final status, which may be
non-OK, to the client.
Now the end of the call is reported both in ServerStream.close(any Status) and
before calling ServerStreamListener.closed(non-OK); whichever happens
first provides the reported status.
TODOs
=====
A follow-up change to the LoadBalancer API will add a
ClientStreamTracer.Factory to the PickResult to complete the API needed
by load-reporting.
Guava 20 introduced some overloading optimizations for Preconditions
that require using Guava 20+ at runtime. Unfortunately, Guava 20 removes
some things that are causing incompatibilities with other libraries, like
Cassandra. While the incompatibility did prompt some of those libraries
to improve compatibility with newer Guava versions, we'd like to give the
community more time to work through it. See #2688.
At this commit, we appear to be compatible with Guava 18+. It's not
clear whether we want to actually "support" 18, but it did compile. Guava 17
is missing at least MoreObjects, directExecutor, and firstNonNull.
Guava 21 compiles without warnings, so it should be compatible with
Guava 22 when it is released.
One test method will fail with the upcoming Guava 22, but this won't
impact applications. I made MoreThrowables to avoid using any
known-deprecated Guava methods in our JARs, to reduce pain for those
stuck with old versions of gRPC in the future (July 2018).
In the stand-alone Android apps I removed unnecessary explicit deps
instead of syncing the version used.
ErrorProne provides static analysis for common issues, including
misuse of variables guarded by @GuardedBy locks.
This increases build time by 60% for parallel builds and 30% for
non-parallel, so I've provided a way to disable the check. It is on by
default though and will be run in our CI environments.
Returned futures should almost universally be handled in some way,
either to receive the value or to cancel scheduled tasks to
prevent leaks.
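A small, standalone illustration of the pattern the check enforces (not code from this change):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class FutureHandlingExample {
  public static void main(String[] args) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    // Keeping the returned future lets the caller cancel the task and avoid a leak;
    // silently dropping it is what FutureReturnValueIgnored flags.
    ScheduledFuture<?> pending =
        executor.schedule(() -> System.out.println("timeout"), 10, TimeUnit.SECONDS);
    pending.cancel(false);
    executor.shutdown();
  }
}
```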
Netty is a bit of a special case though, since it constantly returns
futures that you ignore (even adding a listener returns the "this"
future). So we want to suppress the warning for code using Netty instead
of trying to fix it. When we enable ErrorProne in the build, we should
start passing -Xep:FutureReturnValueIgnored:OFF in the compilerArgs.
Commit 65e4d9f4 broke the jwt_token_creds test case. It is believed to be because
the JWT does not see the authority passed to overrideAuthority. So the
changes to the interop-testing client are temporarily reverted here. Note
that this breaks GRPC_PROXY_EXP testing, so the incompatibility needs to
be resolved.
Solving #2682 will allow reverting this change.
Fixes grpc/grpc#9497
Fixes #2680
In some environments DNS is not available and is performed by the
CONNECT proxy. Nothing "special" should need to be done for these
environments, but the previous support took shortcuts which knowingly
would not support such environments.
This change should fix both OkHttp and Netty. Netty's
Bootstrap.connect() resolved the name immediately whereas using
ChannelPipeline.connect() waits until the address reaches the end of the
pipeline. Netty uses NetUtil.toSocketAddressString() to get the name of
the address, which uses InetSocketAddress.getHostString() when
available.
OkHttp is still using InetSocketAddress.getHostName() which may issue
reverse DNS lookups. However, if the reverse DNS lookup fails, it should
convert the IP to a textual string like getHostString(). So as long as
the reverse DNS maps to the same machine as the IP, there should only be
performance concerns, not correctness issues. Since the DnsNameResolver
is creating unresolved addresses, the reverse DNS lookups shouldn't
occur in the common case.
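For illustration, the distinction the paragraphs above rely on, as a standalone JDK example (not code from this change):

```java
import java.net.InetSocketAddress;

final class HostStringExample {
  public static void main(String[] args) {
    // Creating an unresolved address never performs a DNS lookup.
    InetSocketAddress unresolved = InetSocketAddress.createUnresolved("example.com", 443);
    // getHostString() returns the literal host or IP without any reverse-DNS lookup.
    System.out.println(unresolved.getHostString());
    // getHostName() may trigger a reverse-DNS lookup when only an IP literal is known.
    System.out.println(unresolved.getHostName());
  }
}
```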
Because "core"'s test source already depends on "testing", e.g.,
`core/src/test/java/io/grpc/internal/ServerCallImplTest.java` uses
`testing/src/main/java/io/grpc/internal/testing/StatsTestUtils.java`,
which forms a circular dependency.
This change moves the StatsContext setter accessors from "core"'s test
source to "testing".
Resolves #2651
These are only used in internal tests. In production,
StatsContextFactory is loaded by the "Instrumentation" library and must
be one per process, thus we won't allow setting it on a per-channel or
per-server basis.
The interceptors are quite specific, and probably not helpful for
current testing strategies. They really are only useful for interop
testing. Moving them to interop-testing avoids them appearing to be in a
public API (even if that API is experimental).
No functional changes were made; just code movement.
If a LoadBalancer2 is passed in, the builder will create ManagedChannelImpl2 instead of ManagedChannelImpl. This allows us to test the LBv2 classes on a large scale.