5794 Commits

Author SHA1 Message Date
A.J. Beamon 38ae352fc5 Fix a merge issue 2019-07-10 09:46:23 -07:00
A.J. Beamon 69d7c4f79c Merge branch 'master' into track-run-loop-busyness
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	flow/Net2.actor.cpp
#	flow/network.h
2019-07-09 18:39:23 -07:00
Evan Tschannen 49121172ea
Merge pull request #1795 from alexmiller-apple/peek-from-satellites
Log Routers will prefer to peek from satellite logs.
2019-07-09 17:38:57 -07:00
Evan Tschannen c8d86516f0
Merge pull request #1800 from ajbeamon/rename-datacenter-version-difference
Rename datacenter_version_difference to datacenter_lag and include bo…
2019-07-09 17:29:27 -07:00
Meng Xu cce00bb413
Merge pull request #1808 from ajbeamon/improved-transaction-metrics
Improve TransactionMetrics
2019-07-09 16:46:17 -07:00
Evan Tschannen 5851ad0208
Merge pull request #1768 from vishesh/task/cheap-clients
Make clients cheaper
2019-07-09 15:27:08 -07:00
Vishesh Yadav 4b8eb27134 fdbrpc: Move setStatus line in addPeerReference 2019-07-09 15:01:12 -07:00
A.J. Beamon fdd580c878 Restore some variable initializations that were unintentionally removed. 2019-07-09 15:00:11 -07:00
Vishesh Yadav 2f29b2c3d1 simulator: Just do a wait() in setupAndRun to avoid destruction
It gets us out of the ACTOR, never clearing the systemActors, and lets the
simulator call exit().
2019-07-09 14:55:20 -07:00
Vishesh Yadav 983343978e fdbrpc: ConnectionMonitor should close unreferenced after delay
Potentially for cases where it goes up to 1 immediately.
2019-07-09 14:24:16 -07:00
Vishesh Yadav 22678267cd fdbrpc: Don't drop idle connections from server
Instead, try pinging the client and let that decide whether the client
is alive or not. Ideally, the ping should always fail, since a
well-behaved client would have closed the connection.
2019-07-09 14:24:16 -07:00
Vishesh Yadav 1f9c80f633 fdbrpc: Instead of tracking last sent data, track last sent non-ping data
* This will allow the client to continue monitoring peer connections while
the connection stays open, so that there is no period of "uncertainty",
unlike the previous no-monitoring approach.

* Use multiplier for incoming connection idle timeout

* Update idle connection timeout values and leaked connection timeout in
simulator.
2019-07-09 14:24:16 -07:00
Vishesh Yadav ae6c3e013a monitorClientInfo: Wait for master proxy endpoint failures rather than triggers
This will not initiate a request to get a new set of proxies unless we
know for a fact that the endpoint has indeed failed, not just because the
connection to the Peer was closed as it was sitting idle.
2019-07-09 14:24:16 -07:00
Vishesh Yadav 867986cdea fdbrpc: Reduced connection monitoring from clients
This patch makes two changes to connection monitoring:

1. Connection monitoring on the client side will check whether the
connection has stayed idle for some time. If the connection is unused
for a while, we close it. There is some weirdness involved here, as
ping messages are themselves connection traffic. We get around this by
making it a two-phase process: first checking for idle reliable
traffic, then disabling pings and checking for idle unreliable
traffic.

2. Connection monitoring of clients from the server will no longer send
pings to clients. Instead, it keeps monitoring the received bytes and
closes the connection after a certain period of inactivity.
2019-07-09 14:24:16 -07:00
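The two-phase idle check described in commit 867986cdea can be sketched roughly as follows. This is an illustrative model only, not the fdbrpc implementation: the `IdleMonitor` class, its field names, the single shared `idle_timeout`, and the state strings are all hypothetical, and it assumes that pings count as unreliable traffic and that any reliable packet restarts phase one.

```python
from dataclasses import dataclass


@dataclass
class IdleMonitor:
    """Hypothetical model of a client-side two-phase idle check."""
    idle_timeout: float              # assumed timeout in seconds, shared by both phases
    last_reliable: float = 0.0       # time of the last reliable (non-ping) packet
    last_unreliable: float = 0.0     # time of the last unreliable packet, pings included
    pings_disabled_at: float = 0.0   # when phase two started
    pings_enabled: bool = True
    closed: bool = False

    def on_packet(self, now: float, reliable: bool) -> None:
        if reliable:
            self.last_reliable = now
            self.pings_enabled = True  # assumption: real traffic restarts phase one
        else:
            self.last_unreliable = now

    def check(self, now: float) -> str:
        if self.closed:
            return "closed"
        if self.pings_enabled:
            # Phase one: pings keep flowing, so only reliable traffic counts.
            if now - self.last_reliable >= self.idle_timeout:
                self.pings_enabled = False
                self.pings_disabled_at = now
                return "pings_disabled"
            return "active"
        # Phase two: pings are off, so any remaining unreliable traffic is real.
        quiet_since = max(self.last_unreliable, self.pings_disabled_at)
        if now - quiet_since >= self.idle_timeout:
            self.closed = True
            return "closed"
        return "draining"
```

With a 10-second timeout, a reliable packet at t=0 and a ping at t=5, the monitor stays active at t=9, disables pings at t=10, and closes the connection at t=20 if nothing else arrives.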
Vishesh Yadav 7647d3e3c0 fdbrpc: Don't use RequestStream for pings in ConnectionMonitor
RequestStream adds another count to peerReference, which means that as long
as ConnectionMonitor is alive, we'll never get peerReference=0, potentially
keeping unnecessary connections alive.
2019-07-09 14:24:16 -07:00
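The reference-counting problem described in commit 7647d3e3c0 can be illustrated with a toy model. All names here (`Peer`, `CountedStream`, `Monitor`, `references_after_app_disconnects`) are hypothetical stand-ins rather than fdbrpc types. The sketch only assumes what the commit message states: a stream held by the monitor counts as one more peer reference.

```python
class Peer:
    """Toy peer whose connection may be dropped once nothing references it."""

    def __init__(self) -> None:
        self.peer_references = 0

    def can_close(self) -> bool:
        return self.peer_references == 0


class CountedStream:
    """Hypothetical stream whose existence counts as a peer reference."""

    def __init__(self, peer: Peer) -> None:
        self.peer = peer
        peer.peer_references += 1

    def close(self) -> None:
        self.peer.peer_references -= 1


class Monitor:
    """Hypothetical connection monitor that may or may not ping through a stream."""

    def __init__(self, peer: Peer, ping_via_stream: bool) -> None:
        self.peer = peer
        # Pinging through a counted stream pins the peer for the monitor's lifetime.
        self.stream = CountedStream(peer) if ping_via_stream else None


def references_after_app_disconnects(ping_via_stream: bool) -> int:
    peer = Peer()
    app_stream = CountedStream(peer)           # the application's own use of the peer
    monitor = Monitor(peer, ping_via_stream)   # stays alive while we inspect the count
    app_stream.close()
    return peer.peer_references
```

In this model the count stays at 1 while the monitor pings through a counted stream, and drops to 0 once the monitor stops holding one.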
Vishesh Yadav 78a1b2defc simulator: Destroy each process individually in its context
When simulation ends, all the actors are cancelled, and the
destructors which rely on `globals` may not have access to the right
globals (they see the default simulator process globals instead). This
patch calls destroy on each process individually after we context
switch to that process, so that the globals accessed in its destructors
are its own.

This issue arose when trying to get `Peer::peerReferences` in
NetNotifiedQueue, resulting in decrementing the reference count of
peers in the FlowTransport object of '0.0.0.0'.
2019-07-09 14:24:16 -07:00
Vishesh Yadav 3f4f71ff9f fdbrpc: Increment peerReferences correctly
The constructor of FlowReceiver, which handles reference counting of
peerReferences, relied on calling a virtual method from the constructor,
which does not behave correctly there. This patch passes the result of that
virtual method down from the derived constructor to the base constructor.
2019-07-09 14:24:16 -07:00
Vishesh Yadav 705059dea1 Trace: Add support to print pointers 2019-07-09 14:24:16 -07:00
A.J. Beamon 764a4591ad Add a comment to internal flag 2019-07-09 14:17:26 -07:00
Vishesh Yadav eabc610daa
Merge pull request #1813 from alexmiller-apple/log-version-4
Add a TLogVersion::V4
2019-07-09 08:42:20 -07:00
Alex Miller 44f11702a8 Log Routers will prefer to peek from satellite logs.
Formerly, they would prefer to peek from the primary's logs.  Testing of
a failed region rejoining the cluster revealed that this becomes quite a
strain on the primary logs when extremely large volumes of peek requests
are coming from the Log Routers.  It happens that we have satellites
that contain the same mutations with Log Router tags, that have no other
peeking load, so we can prefer to use the satellite to peek rather than
the primary to distribute load across TLogs better.

Unfortunately, this revealed a latent bug in how tagged mutations in the
KnownCommittedVersion->RecoveryVersion gap were copied across
generations when the number of log router tags was decreased.
Satellite TLogs would be assigned log router tags using the
team-building based logic in getPushLocations(), whereas TLogs would
internally re-index tags according to tag.id%logRouterTags.  This
mismatch would mean that we could have:

    Log0 -2:0 ----- -2:0  Log 0

    Log1 -2:1 \
               >--- -2:1,-2:0 (-2:2 mod 2 becomes -2:0)  Log 1
    Log2 -2:2 /

And now we have data that's tagged as -2:0 on a TLog that's not the
preferred location for -2:0, and therefore a BestLocationOnly cursor
would miss the mutations.

This was never noticed before, as we never
used a satellite as a preferred location to peek from.  Merge cursors
always peek from all locations, and thus a peek for -2:0 that needed
data from the satellites would have gone to both TLogs and merged the
results.

We now take this mod-based re-indexing into account when assigning which
TLogs need to recover which tags from the previous generation, to make
sure that tag.id%logRouterTags always results in the assigned TLog being
the preferred location.

Unfortunately, previously existing clusters will potentially have existing
satellites with log router tags indexed incorrectly, so this transition
needs to be gated on a `log_version` transition.  Old LogSets will have
an old LogVersion, and we won't prefer the satellite for peeking.  Log
Sets post-6.2 (opt-in) or post-6.3 (default) will be indexed correctly,
and therefore we can safely offload peeking onto the satellites.
2019-07-08 22:25:01 -07:00
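The tag mismatch described in commit 44f11702a8 can be reproduced with a small model of the diagram in that message. This is a simplified sketch under stated assumptions, not the TLog recovery code. The function names are hypothetical, the mutations are placeholder strings, and the preferred location for a tag is assumed to be the log at index `tag_id % log_count`, with the number of new logs equal to the new number of log router tags.

```python
def recover(assignment, old_data, log_router_tags):
    """Copy old-generation data into new logs.

    assignment: new log index -> old tag ids that log recovers.
    old_data:   old tag id -> list of mutations.
    Each log stores what it recovers under tag_id % log_router_tags.
    """
    new_logs = {}
    for log, old_tags in assignment.items():
        stored = {}
        for old_tag in old_tags:
            new_tag = old_tag % log_router_tags
            stored.setdefault(new_tag, []).extend(old_data[old_tag])
        new_logs[log] = stored
    return new_logs


def preferred_log(tag_id, log_count):
    # Assumption: the preferred location for a tag is log (tag_id % log_count).
    return tag_id % log_count


def peek_best_location_only(new_logs, tag_id):
    """Read a tag from its preferred log only."""
    log = preferred_log(tag_id, len(new_logs))
    return sorted(new_logs[log].get(tag_id, []))


def peek_merged(new_logs, tag_id):
    """Read a tag from every log and merge the results."""
    merged = []
    for stored in new_logs.values():
        merged.extend(stored.get(tag_id, []))
    return sorted(merged)


def mod_aware_assignment(old_tags, log_router_tags):
    """Assign each old tag to the log that will be preferred after re-indexing."""
    assignment = {log: [] for log in range(log_router_tags)}
    for tag in old_tags:
        assignment[tag % log_router_tags].append(tag)
    return assignment


# Three old log router tags shrink to two, as in the diagram.
old_data = {0: ["m0"], 1: ["m1"], 2: ["m2"]}
mismatched = {0: [0], 1: [1, 2]}  # -2:2 lands on Log 1, then becomes -2:0
broken = recover(mismatched, old_data, 2)
fixed = recover(mod_aware_assignment(old_data, 2), old_data, 2)
```

In the `broken` layout, a best-location-only peek for tag 0 reads only Log 0 and misses the mutation originally tagged 2, while a merged peek still finds it. In the `fixed` layout, the best-location-only peek sees both mutations.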
Alex Miller d2ef84a8f9 Add a TLogVersion::V4
And refactor some code to make adding more TLogVersions easier.
2019-07-08 22:22:45 -07:00
A.J. Beamon a5a6f8431c Add a random UID to TransactionMetrics in case a client opens multiple connections and also a field to indicate whether the connection is internal. Convert some of the metrics to our Counter object instead of running totals. 2019-07-08 14:01:04 -07:00
Evan Tschannen 7f427b60e2
Merge pull request #1807 from etschannen/master
Merge 6.1 into master
2019-07-08 09:05:03 -07:00
Evan Tschannen e4193b125b Merge branch 'master' of github.com:apple/foundationdb 2019-07-08 09:04:02 -07:00
Evan Tschannen ec11ef024b
Merge pull request #1798 from ajbeamon/merge-release-6.1-into-master
Merge release 6.1 into master
2019-07-08 09:02:56 -07:00
A.J. Beamon dd85edb08c
Merge pull request #1802 from xumengpanda/mengxu/DD-ensure-redundant-team-priority-as700-PR
TeamTracker:Set redundant team priority as PRIORITY_TEAM_REDUNDANT
2019-07-08 08:47:28 -07:00
Evan Tschannen b146da0dd1 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/DataDistribution.actor.cpp
#	versions.target
2019-07-07 20:54:59 -07:00
Evan Tschannen b7e762bd25 pull from master 2019-07-07 20:50:57 -07:00
Evan Tschannen 6ca3ca7944
Merge pull request #1806 from etschannen/post-release-cleanup-6.1.11
Post release cleanup 6.1.11
2019-07-07 20:49:14 -07:00
Evan Tschannen f80279d8c4 update installer WIX GUID following release 2019-07-07 20:48:22 -07:00
Evan Tschannen f840d40e05 update versions target to 6.1.12 2019-07-07 20:48:22 -07:00
Vishesh Yadav 8d3a826c63
Merge pull request #1804 from alexmiller-apple/cycle-verify-only
Add a checkOnly parameter to Cycle workload.
2019-07-05 21:59:52 -07:00
Jingyu Zhou 50e7593c5b
Merge pull request #1796 from ajbeamon/remove-trace-event-underscores
Remove trace event underscores
2019-07-05 21:45:55 -07:00
Alex Miller 14e5dd74fe Add a checkOnly parameter to Cycle workload.
So that it can be used in the real world for consistency checking of
backup and DR.
2019-07-05 19:09:09 -07:00
Evan Tschannen 310a5fe9a3 fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy 2019-07-05 17:28:22 -07:00
Meng Xu e8fb7564f5 Merge branch 'master' into mengxu/DD-ensure-redundant-team-priority-as700-PR 2019-07-05 17:28:12 -07:00
Evan Tschannen e7c0ecf729 fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy 2019-07-05 15:46:16 -07:00
Meng Xu 46d28a3b79 TeamTracker:Set redundant team priority as redundant
The redundant team removed by teamRemover will not exist
in the global teams data structure. So we will not find
the redundant team from shard-to-team mapping in the system key.

Before this change, teamTracker marks such a team as PRIORITY_TEAM_UNHEALTHY.
With this change, it marks it as PRIORITY_TEAM_REDUNDANT.
2019-07-05 15:24:00 -07:00
A.J. Beamon abb8503839 Add PR number 2019-07-05 14:37:28 -07:00
A.J. Beamon 4be08d9b2d Rename datacenter_version_difference to datacenter_lag and include both seconds and versions. 2019-07-05 14:36:18 -07:00
Andrew Noyes 6d74af93d3 Use true instead of 1 2019-07-05 14:07:02 -07:00
Andrew Noyes 15c6f2b864 Explain SFINAE for has_serialization_done 2019-07-05 14:07:02 -07:00
Andrew Noyes 9ed8eb2cdb Explain strange use of literal byte strings 2019-07-05 14:07:02 -07:00
Andrew Noyes 7350b3db30 Don't assume serializeReplicationPolicy succeeds 2019-07-05 14:07:02 -07:00
Andrew Noyes 889e153b81 Add object serializer flag to fdbcli 2019-07-05 14:07:02 -07:00
Andrew Noyes e2ed56fa56 Convert ownedPtr to unownedPtr for IReplicationPolicy
Remove WriteRawMemory feature

Remove deserialization_done
2019-07-05 14:07:02 -07:00
Andrew Noyes 9894d928a1 Re-use identical vtables 2019-07-05 14:07:02 -07:00
Andrew Noyes 4c5ebd7609 Avoid assert when collecting vtables 2019-07-05 14:07:02 -07:00
A.J. Beamon d06b961a4a
Merge pull request #1747 from alexmiller-apple/flowlock-api
A giant translation of TaskFooPriority -> TaskPriority::Foo
2019-07-05 14:06:14 -07:00