Commit Graph

244 Commits

Author SHA1 Message Date
Kun Zhang 16f4de4636 core/stats: report message count metrics to Census. (#3312) 2017-08-04 15:58:21 -07:00
Carl Mastrangelo 02cb718767 testing,core: don't use mocks for stream tracers (#3305)
This is a big, but mostly mechanical change.  The newly added Test*StreamTracer classes are designed to be extended which is why they are non final and have protected fields.  There are a few notable things in this:

1.  verifyNoMoreInteractions is gone.   The API for StreamTracers doesn't make this guarantee.  I have recovered this behavior by failing duplicate calls.  This has resulted in a few bugs in the test code being fixed.

2.  StreamTracers cannot be mocked anymore.  Tracers need to be thread safe, which mocks simply are not.  This leads to a HUGE number of reports when trying to find real races in gRPC.

3.  If these classes are useful, we can promote them out of internal.  I just put them here out of convenience.
2017-08-03 16:59:06 -07:00
Eric Anderson c4f91272d2 all: Remove unused variables and squelch incorrect ErrorProne warning 2017-08-02 15:48:23 -07:00
Eric Gribkoff c0e010af8f interop-testing: implement compression interop tests 2017-07-31 16:08:23 -07:00
Kun Zhang 04e0450304 core: pass CallOptions to newClientStreamTracer(). (#3276)
Resolves #3256
2017-07-27 12:07:25 -07:00
Eric Anderson 924b0b2b00 Revert "interop-testing: overrideAuthority breaks JWT"
This reverts commit 57b9105c7f.

Issue #2682 is fixed, so we can revert the commit as planned. This
re-applies a previously-reverted modernization.
2017-07-19 13:54:32 -07:00
Ian Haken 677c84e1d4 okhttp: Add support for specifying a custom hostname verifier when using on OkHttpChannelBuilder. 2017-07-14 16:04:15 -07:00
ZHANG Dapeng ff0ad5fac3 testing: refactor part of TestUtils to internal
Moved the following APIs from `io.grpc.testing.TestUtils` to `io.grpc.internal.TestUtils`:

`InetSocketAddress testServerAddress(String host, int port)`
`InetSocketAddress testServerAddress(int port)`
`List<String> preferredTestCiphers()`
`File loadCert(String name)`
`X509Certificate loadX509Cert(String fileName)`
`SSLSocketFactory newSslSocketFactoryForCa(Provider provider, File certChainFile)`
`void sleepAtLeast(long millis)`

APIs not to be moved:

`ServerInterceptor recordRequestHeadersInterceptor()`
`ServerInterceptor recordServerCallInterceptor()`
2017-07-10 16:30:38 -07:00
ZHANG Dapeng 0a36c9d8a3 all: migrate .proto files from testing-proto to interop-testing
and provide a simple protobuf service for test in testing-proto instead.
2017-07-06 15:26:08 -07:00
Eric Gribkoff 7c60510794 all,interop-testing: update google-auth-library-oauth2-http (#3188) 2017-07-06 13:22:43 -07:00
Eric Anderson 361f0381be interop-testing: Support conscrypt at runtime with OkHttp
Conscrypt can thus be an alternative to Jetty ALPN for running these
OkHttp integration tests.
2017-06-30 17:07:38 -07:00
Eric Anderson 9052d4245f interop-testing: Skip Netty Jetty ALPN tests if it is unavailable 2017-06-30 16:25:31 -07:00
ZHANG Dapeng b7e50ddd05 testing: move NoopClientCall & NoopServerCall to internal
one step for #3105
2017-06-22 15:10:58 -07:00
ZHANG Dapeng 544ceded6b netty: rename HandlerSettings to InternalHandlerSettings 2017-06-19 09:04:04 -07:00
Eric Gribkoff a2a42e3396 interop-testing: add alpnagent to support okhttp in test client 2017-06-05 16:28:37 -07:00
Carl Mastrangelo 82fce837e4 core: don't return concrete type from AbstractServerImplBuilder 2017-06-01 14:38:05 -07:00
Carl Mastrangelo 166108a943 all: fix licence whitespace 2017-06-01 14:28:37 -07:00
Carl Mastrangelo 3bfd630bff all: update to Apache 2 licence
Also, update the authors.
2017-05-31 13:29:01 -07:00
Eric Anderson cefcd39396 Fix incorrect assertEquals argument ordering
The expected value should be first.
2017-05-24 08:47:02 -07:00
Eric Gribkoff 5dc8a124bf interop-testing,okhttp,testing: update tests to pass with ipv6 2017-05-16 17:01:05 -07:00
Kun Zhang be74e97b5d core: stop using static flags for Census control. (#2947)
Static mutable flags are evil.  Turned them to options on channel
builder.  Also restore the local stats recording by default, because
these flags were added with the concern of wire-compatibility, but not
local feature.
2017-04-26 10:50:55 -07:00
Kun Zhang 49bde5494b core: integrate instrumentation-java (Census) tracing (#2938)
Main implementation is in CensusTracingModule.

Also a few fix-ups in the stats implementation CensusStatsModule:

- Change header key name from grpc-census-bin to grpc-tags-bin
- Server does not fail on header parse errors. Uses the default instead.

Protect Census-based stats and tracing with static flags: `GrpcUtil.enableCensusStats` and `GrpcUtil.enableCensusTracing`. They keep those features disabled by default until they, especially their wire formats, are stabilized.
2017-04-25 13:53:29 -07:00
Eric Gribkoff b6bf252566 interop-testing: add cacheable_unary test 2017-04-21 10:22:05 -07:00
Kun Zhang 6618f9739e core: add inboundHeaders() to ClientStreamTracer. (#2921)
Also renamed headersSent to outboundHeaders
2017-04-17 14:01:34 -07:00
Carl Mastrangelo be61af42e9 core: use RESOURCE_EXHAUSTED for max message size failures 2017-04-13 08:35:48 -07:00
Kun Zhang 64938d3cb4 testing: remove negative asserts about stats. (#2894)
They were added in #2863. In TestServiceClient metricsExpected() returns
false because server-side stats is not available, but client-side stats
are there, thus these asserts would fail.

Resolves grpc/grpc/issues/10552
2017-04-10 14:23:26 -07:00
Kun Zhang 44cca5507d core: remove incorrect reporting of CLIENT_SERVER_ELAPSED_TIME. (#2891)
Per spec this metric should be calculated on the server and sent back to
the client, for which the mechanism is not currently defined. As it's
not a required metric, we remove the incorrect implementation for now.

Internal ref: b/37208451
2017-04-10 11:25:25 -07:00
Kun Zhang 903197b2aa core: StreamTracer (#2863)
Background
==========

LoadBalancer needs to track RPC measurements and status for
load-reporting.  We need to introduce a "Tracer" API for that.

Since such API is very close to the current
Census(instrumentation)-based stats reporting mechanism in terms of what
are recorded, we will migrate the Census-based stats reporting under the
new Tracer API.

Alternatives
============

We considered plumbing the LB-related information from the LoadBalancer
to the core, and recording those information along with the currently
recorded stats to Census. The LB-related information, such as LB_ID,
reason for dropping reqeusts etc, would be added to the Census
StatsContext as tags.

Since tags are held by StatsContext before eventually being recorded by
providing the measurements, and StatsContext is immutable, this would
require a way for LoadBalancer to override the StatsContext, which means
LoadBalancer API would has direct reference to the Census StatsContext.
This is undesirable because Census API is not stable yet.

Part of the LB-related information is whether the client has received
the initial headers from the server.  While such information can be
grabbed by implementing a ClientInterceptor, it must be recorded along
with other information such as LB_ID to be useful, and LB_ID is only
available in GrpclbLoadBalancer.

Bottom line, trying to use solely the Census StatsContext API to record
LB load information would require extra data plumbing channel between
ClientInterceptor, LoadBalancer and the gRPC core, as well as exposing
Census API on the gRPC API.  Even with those extensive changes, we are
yet to find a working solution. Therefore, we abandoned this idea and
propose this PR.

Summary of changes
==================

API summary
-----------
Introduce "StreamTracer" API, a callback interface for receiving stats
and tracing related updates concerning **a single stream**.
"ClientStreamTracer" and "ServerStreamTracer" add side-specific
events. A stream can have zero or more tracers and report to all of
them.

On the client-side, CallOptions now takes a list of
ClientStreamTracer.Factory. Opon creating a ClientStream, each of the
factory creates a ClientStreamTracer for the stream. This allows
ClientInterceptors to install its own tracer factories by overriding the
CallOptions.

Since StreamTracer only tracks the span of a stream, tracking of a
ClientCall needs to be done in a ClientInterceptor.  By installing its
own StreamTracer when a ClientCall is created, ClientInterceptor can
associate the updates for a Call with the updates for the Streams
created for that Call.  This is how we keep the existing Census
reporting mechanism in CensusStreamTracerModule.

On the server-side, ServerStreamTracer.Factory is added through the
ServerBuilder, and is used to create ServerStreamTracers for every
ServerStream.

The Tracer API supports propagation of stats/tracing information through
Context and metadata.  Both client-side and server-side tracer factories
have access to the headers object.  Client-side tracer relies on
interceptor to read the Context, while server-side tracer has
filterContext() method that can override the Context.

Implementation details
----------------------

Only real streams report stats.  Pseudo streams such as delayed stream,
failing stream don't report.  InProcess transport streams currently
don't report stats.

"StatsTraceContext" which used to receive updates from core and report
directly to Census (StatsContext), now delegates to the StreamTracers of
a stream.  On the client-side, the scope of a StatsTraceContext reduces
from ClientCall to a ClientStream to match the scope of StreamTracer.

The Census-specific logic that was in StatsTraceContext is moved into
CensusStreamTracerModule, which produces factories for StreamTracers
that report to Census.

Reporting with StatsTraceContext is moved out of the Channel/Call layer
into Transport/Stream layer, to match the scope change of
StatsTraceContext.

Bug fixed
----------------

The end of a server-side call was reported in ServerCallImpl's
ServerStreamListenerImpl.closed(), which was wrong.  Because closed()
receiving OK doesn't necessarily mean the RPC ended with OK.  Instead it
means the server has successfully sent the final status, which may be
non-OK, to the client.

Now the end report is done in both ServerStream.close(any Status) and
before calling ServerStreamListener.closed(non-OK).  Whichever happens
first is the reported status.

TODOs
=====

A follow-up change to the LoadBalancer API will add a
ClientStreamTracer.Factory to the PickResult to complete the API needed
by load-reporting.
2017-04-07 11:03:24 -07:00
Carl Mastrangelo 5c37a8360c core: cache Accept-Encoding headers (#2766)
* core: cache Accept-Encoding headers

This avoids rebuilding the raw bytes for each RPC.  The decode path
is not yet optimized to avoid pulling to much into a single commit.
Decompressors may still use invalid names, but this is equivalent to
previous behavior, as cleaing happens later in the caching.

Also, internal accessors on DecompressorRegistry are now hidden.

Before:
Benchmark                                                    (extraEncodings)    Mode      Cnt        Score    Error  Units
DecompressorRegistryBenchmark.marshalOld                                    0  sample   928744      124.104 ± 11.159  ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.00                   0  sample                84.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.50                   0  sample                94.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.90                   0  sample               107.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.95                   0  sample               114.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.99                   0  sample               202.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.999                  0  sample              4944.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.9999                 0  sample             12178.008           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p1.00                   0  sample           2056192.000           ns/op
DecompressorRegistryBenchmark.marshalOld                                    1  sample  1345050      150.123 ±  6.952  ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.00                   1  sample               109.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.50                   1  sample               127.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.90                   1  sample               142.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.95                   1  sample               152.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.99                   1  sample               243.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.999                  1  sample              4640.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.9999                 1  sample             11472.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p1.00                   1  sample           2101248.000           ns/op
DecompressorRegistryBenchmark.marshalOld                                    2  sample  1130903      175.846 ±  1.392  ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.00                   2  sample               131.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.50                   2  sample               148.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.90                   2  sample               164.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.95                   2  sample               174.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.99                   2  sample               311.000           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.999                  2  sample              6048.768           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.9999                 2  sample             12349.107           ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p1.00                   2  sample            112000.000           ns/op

After:
Benchmark                                                    (extraEncodings)    Mode      Cnt        Score   Error  Units
DecompressorRegistryBenchmark.marshalOld                                    0  sample  1095005       67.555 ± 5.529  ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.00                   0  sample                42.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.50                   0  sample                52.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.90                   0  sample                69.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.95                   0  sample                84.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.99                   0  sample               133.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.999                  0  sample              3324.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.9999                 0  sample             11056.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p1.00                   0  sample           1820672.000          ns/op
DecompressorRegistryBenchmark.marshalOld                                    1  sample  1437034       78.089 ± 0.723  ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.00                   1  sample                60.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.50                   1  sample                69.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.90                   1  sample                79.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.95                   1  sample                83.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.99                   1  sample                96.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.999                  1  sample              2728.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.9999                 1  sample             11104.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p1.00                   1  sample            105344.000          ns/op
DecompressorRegistryBenchmark.marshalOld                                    2  sample  1203782       95.213 ± 0.864  ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.00                   2  sample                68.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.50                   2  sample                85.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.90                   2  sample                98.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.95                   2  sample               101.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.99                   2  sample               119.000          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.999                  2  sample              3209.736          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p0.9999                 2  sample             11257.947          ns/op
DecompressorRegistryBenchmark.marshalOld:marshalOld·p1.00                   2  sample             63168.000          ns/op
2017-03-01 15:30:28 -08:00
Eric Anderson 437fafab1b Address review comments 2017-02-28 09:23:04 -08:00
Eric Anderson 2eeb5e3e9e all: Downgrade to Guava 19
Guava 20 introduced some overloading optimizations for Preconditions
that require using Guava 20+ at runtime. Unfortunately, Guava 20 removes
some things that is causing incompatibilities with other libraries, like
Cassandra. While the incompatibility did trigger some of those libraries
to improve compatibility for newer Guavas, we'd like to give the
community more time to work through it. See #2688

At this commit, we appear to be compatible with Guava 18+. It's not
clear if we want to actually "support" 18, but it did compile. Guava 17
doesn't have at least MoreObjects, directExecutor, and firstNotNull.
Guava 21 compiles without warnings, so it should be compatible with
Guava 22 when it is released.

One test method will fail with the upcoming Guava 22, but this won't
impact applications. I made MoreThrowables to avoid using any
known-deprecated Guava methods in our JARs, to reduce pain for those
stuck with old versions of gRPC in the future (July 2018).

In the stand-alone Android apps I removed unnecessary explicit deps
instead of syncing the version used.
2017-02-28 09:23:04 -08:00
Eric Anderson e67b6027af interop-testing: Remove useless tcnative configuration
It doesn't do anything, since tcnative has been specified as a normal
dependency for a while.
2017-02-27 17:12:07 -08:00
Eric Anderson 675080b208 all: Enable ErrorProne during compilation
ErrorProne provides static analysis for common issues, including
misused variables GuardedBy locks.

This increases build time by 60% for parallel builds and 30% for
non-parallel, so I've provided a way to disable the check. It is on by
default though and will be run in our CI environments.
2017-02-24 14:53:23 -08:00
Eric Anderson 9283b2b13f interop-testing: Remove useless Threads in test 2017-02-17 11:21:30 -08:00
Eric Anderson 72923dca87 all: prepare for ErrorProne's FutureReturnValueIgnored
Futures almost universally should be handled in some way when being
returned, either to receive the value or to cancel scheduled tasks to
prevent leaks.

Netty is a bit of a special case though, since it constantly returns
futures that you ignore (even adding a listener returns the "this"
future). So we want to suppress the warning for code using Netty instead
of trying to fix it. When we enable ErrorProne in the build, we should
start passing -Xep:FutureReturnValueIgnored:OFF in the compilerArgs.
2017-02-16 07:22:56 -08:00
Eric Gribkoff 48e1430909 interop-testing: fix flakes in Http2Client 2017-02-07 16:14:39 -08:00
Kun Zhang 7ab5e0e810 core: record server_elapsed_time on client (#2673)
It is defined as the time between the client sends out the headers, and the RPC finishes.
2017-02-03 13:29:06 -08:00
kpayson64 6fbe140959 interop-testing: Replace Mockito with StreamObserver for several tests (#2639) 2017-02-02 10:21:41 -08:00
Eric Anderson 57b9105c7f interop-testing: overrideAuthority breaks JWT
Commit 65e4d9f4 broke the jwt_token_creds. It is believed to be because
the JWT does not see the authority passed to overrideAuthority. So the
changes to interop-testing client are temporarily reverted here. Note
that this breaks GRPC_PROXY_EXP testing, so the incompatibility needs to
be resolved.

Solving #2682 will allow reverting this change.

Fixes grpc/grpc#9497
Fixes #2680
2017-02-01 08:38:53 -08:00
Eric Anderson 65e4d9f47a all: avoid DNS with GRPC_PROXY_EXP
In some environments DNS is not available and is performed by the
CONNECT proxy. Nothing "special" should need to be done for these
environments, but the previous support took shortcuts which knowingly
would not support such environments.

This change should fix both OkHttp and Netty. Netty's
Bootstrap.connect() resolved the name immediately whereas using
ChannelPipeline.connect() waits until the address reaches the end of the
pipeline. Netty uses NetUtil.toSocketAddressString() to get the name of
the address, which uses InetSocketAddress.getHostString() when
available.

OkHttp is still using InetSocketAddress.getHostName() which may issue
reverse DNS lookups. However, if the reverse DNS lookup fails, it should
convert the IP to a textual string like getHostString(). So as long as
the reverse DNS maps to the same machine as the IP, there should only be
performance concerns, not correctness issues. Since the DnsNameResolver
is creating unresolved addresses, the reverse DNS lookups shouldn't
occur in the common case.
2017-01-27 09:27:07 -08:00
Carl Mastrangelo 89bc2cd3b2 all: update to latest import ordering 2017-01-26 13:43:06 -08:00
Kun Zhang f088b81fc8 core: stop "testing" from depending on "core"'s test. (#2652)
Because "core"'s test source already depends on "testing", e.g.,
`core/src/test/java/io/grpc/internal/ServerCallImplTest.java` uses
`testing/src/main/java/io/grpc/internal/testing/StatsTestUtils.java`,
which forms a circular dependency.

This change moves the StatsContext setter accessors from "core"'s test
source to "testing".

Resolves #2651
2017-01-25 07:57:12 -08:00
Kun Zhang 737cd16a38 core: make StatsContextFactory setters protected (#2634)
These are only used in internal tests.  In production,
StatsContextFactory is loaded by the "Instrumentation" library and must
be one per process, thus we won't allow setting it on a per-channel or
per-server basis.
2017-01-20 17:20:44 -08:00
Eric Anderson f51316b84a testing: Move echo interceptors out of TestUtils
The interceptors are quite specific, and probably not helpful for
current testing strategies. They really are only useful for interop
testing. Moving them to interop-testing avoids them appearing to be in a
public API (even if that API is experimental).

No functional changes were made; just code movement.
2017-01-19 17:04:31 -08:00
Lukasz Strzalkowski 060eb45623 Rename attributes() to getAttributes() to make it consistent 2017-01-19 12:53:54 -08:00
Eric Anderson 1e99b299e1 all: ErrorProne fixes and avoid @Beta in Guava 2017-01-19 12:16:05 -08:00
Carl Mastrangelo 8e463ce1a8 core,stub: remove deprecated deadline methods 2017-01-10 15:54:38 -08:00
Kun Zhang d17a7b5bd4 core: abstract channel builder to accept LoadBalancer2 (#2583)
If a LoadBalancer2 is passed in, the builder will create ManagedChannelImpl2 instead of ManagedChannelImpl. This allows us to test the LBv2 classes on a large scale.
2017-01-10 15:30:12 -08:00
Carl Mastrangelo 8d49df28ee all: add max message size to client calls 2017-01-05 17:23:34 -08:00
ZHANG Dapeng 4a4f25ada4 weekly cleanup: errorprone, javastyle, unused (#2566) 2017-01-05 16:13:25 -08:00