Commit Graph

956 Commits

Author SHA1 Message Date
Evan Tschannen a9541f8066 Merge branch 'feature-addpeer-fix' of github.com:etschannen/foundationdb into feature-addpeer-fix 2020-01-03 12:15:45 -08:00
Evan Tschannen 7152469cc3 log the base trace event before the endpoint messages 2020-01-03 12:15:38 -08:00
Andrew Noyes e16bdab3b4 Assert that a request hasn't already gotten a reply 2020-01-03 09:41:54 -08:00
Evan Tschannen 6b28e3b43b
Update fdbrpc/LoadBalance.actor.h
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-01-02 17:37:58 -08:00
Evan Tschannen 9e137d3b49 fix: addPeerReference only marks a connection as healthy if it is the first peerReference
added additional logging to long LoadBalance calls, and when the failure monitor state changes for an address
2019-12-19 18:26:29 -08:00
Alvin Moore 3bf971ba8b Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/storageserver.actor.cpp
2019-12-12 07:13:12 -08:00
Andrew Noyes b126570a20 Augment tests to catch A.J.'s counterexample 2019-12-05 10:27:12 -08:00
Andrew Noyes bd9faae1e7 Add a unit test that repros #2406 2019-12-04 16:45:32 -08:00
Andrew Noyes 46d10dc7dc Fix "null passed as argument declared not null"
Fix several such reports from ubsan

E.g.

/Users/anoyes/workspace/foundationdb/flow/Arena.h:794:16: runtime error: null pointer passed as argument 1, which is declared to never be null
2019-12-03 14:46:53 -08:00
Evan Tschannen 3c769fcf60 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	versions.target
2019-11-22 15:39:19 -08:00
Evan Tschannen ebcb2f79ed Merge branch 'master' of github.com:apple/foundationdb 2019-11-22 15:34:49 -08:00
Evan Tschannen 746b357b7f fix: simulation should not allow connections to dead processes 2019-11-21 20:36:40 -08:00
Evan Tschannen 27cb299d84 simulation can sometimes randomly hang or throw connection_failed, instead of always doing one or the other 2019-11-21 16:24:18 -08:00
Evan Tschannen 067dc55bfb fix: making _conn a state variable was keeping connections open that should be closed 2019-11-21 16:08:32 -08:00
Evan Tschannen 569c6d4476 throws of connection_failed() from net()->connect did not result in clients marking a connection as failed in the failure monitor 2019-11-21 13:08:59 -08:00
Evan Tschannen 2727b91c46 simulation tests network connections failing due to errors instead of just hanging 2019-11-21 12:33:07 -08:00
Evan Tschannen dbfa3dc217
Merge pull request #2200 from negoyal/storage-cache-subfeature1
Storage cache subfeature1
2019-11-20 13:59:06 -08:00
A.J. Beamon ed8d3f163c Rename hgVersion to sourceVersion. 2019-11-15 12:26:51 -08:00
Evan Tschannen 57fdbbf975 fix: in simulation dead connections need to stop receiving traffic after 1 second 2019-11-15 10:16:44 -08:00
Evan Tschannen 8d3ef89540 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/MutationList.h
#	fdbserver/MasterProxyServer.actor.cpp
#	versions.target
2019-11-14 15:49:56 -08:00
A.J. Beamon aad9fa3baa Don't check for too many connections closed on client connections 2019-11-13 13:00:43 -08:00
negoyal a4a0bf18f9 Merging with Master. 2019-11-12 13:01:29 -08:00
A.J. Beamon ef801a6432 Rename LargePacket warnings to distinguish between sent and received packets. Also remove Net2_ prefix from packet size trace events. 2019-11-12 09:23:46 -08:00
Evan Tschannen 24bc9aeaf5
Merge pull request #2224 from atn34/test-buggified-delay
Replace /flow/delayOrdering with /flow/buggifiedDelay
2019-10-31 11:20:55 -07:00
Evan Tschannen 3325980c03 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/OldTLogServer_6_0.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/WorkerInterface.actor.h
#	fdbserver/worker.actor.cpp
#	versions.target
2019-10-24 17:38:15 -07:00
mpilman 92ce9ef5dc updated comment 2019-10-24 11:45:32 -07:00
mpilman 325a8e4213 remove confusing USE_ODIRECT knob 2019-10-24 11:44:03 -07:00
mpilman f23392ec5a Don't use O_DIRECT in EIO by default 2019-10-24 11:39:55 -07:00
mpilman 7ad0e20e48 Added knob to disable O_DIRECT 2019-10-24 11:20:14 -07:00
mpilman f41f19b5f6 Introduced knob to set eio parallelism 2019-10-24 11:20:14 -07:00
mpilman 85977fb8d5 Use O_DIRECT with EIO 2019-10-24 11:20:14 -07:00
Evan Tschannen 688940b685 merge 6.2 into master 2019-10-21 11:43:46 -07:00
Tapasweni Pathak 0fab0d1a25 remove whitespaces 2019-10-17 22:18:26 +05:30
Tapasweni Pathak 4000ddadc0 remove comments from ReplicationUtils.cpp file 2019-10-17 22:18:26 +05:30
Tapasweni Pathak 795c951b7b Add function documentation 2019-10-17 22:18:21 +05:30
Tapasweni Pathak f2f65eccc9 Merge remote-tracking branch 'upstream/master' into ticket-2135 2019-10-16 23:25:38 +05:30
A.J. Beamon 562ce17eca Initialize outgoingConnectionIdle in the constructor. Add back line to connectionKeeper that is needed in some looping cases 2019-10-10 12:48:35 -07:00
A.J. Beamon ad8604f24a Fix spurious ConnectionClosed event when starting a connection. 2019-10-10 10:34:44 -07:00
Andrew Noyes 69fe02933d Replace /flow/delayOrdering with /flow/buggifiedDelay
Seems that we don't want the property that delays become ready in order
to hold, so make sure it doesn't hold in the simulator.
2019-10-09 12:59:01 -07:00
Vishesh Yadav 162b4efaea
Merge pull request #2180 from alexmiller-apple/cmake-staticify-libraries
Make FDBLibTLS and thirdparty static libraries.
2019-10-03 14:00:07 -07:00
Alex Miller 3b9678356e Make FDBLibTLS and thirdparty static libraries.
They're statically linked anyway, and this fixes an issue with CMake
complaining that there are cyclic dependencies that are non-static.
2019-09-30 18:32:24 -07:00
Andrew Noyes a2243b6501 Add test for delay ordering
See #2148
2019-09-26 12:40:19 -07:00
Tapasweni Pathak 50d43cff15 Add comments to explain functions in ReplicationUtils.cpp 2019-09-26 23:03:13 +05:30
Andrew Noyes ab650c6fe6 Make test resilient to changes elsewhere in file 2019-09-12 09:21:17 -07:00
Andrew Noyes 8353dd4731 Test line pragmas in generated actor files 2019-09-11 17:12:55 -07:00
Andrew Noyes 9d531db985 Handle nested classes 2019-09-10 13:25:58 -07:00
Andrew Noyes c487f021f0 WIP - seems to work for 1 level of nesting 2019-09-10 13:09:37 -07:00
Andrew Noyes eeb2da5c9d WIP - simple example compiles 2019-09-09 22:56:19 -07:00
Meng Xu c2355f721e Merge branch 'master' into mengxu/performant-restore-PR 2019-09-04 17:11:42 -07:00
Meng Xu d160810662 FastRestore:Resolve review comments 2019-09-04 16:48:43 -07:00
Evan Tschannen 24aad14f06 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-08-30 17:23:58 -07:00
Evan Tschannen a7237c4302
Merge pull request #2045 from atn34/disallow-scalar-network-messages
Disallow scalar network messages
2019-08-30 13:38:54 -07:00
Vishesh Yadav cf56b005e8 Add comment for pinging incompatible clients
If client is incompatible, connectionMonitor relies on peer->resetPing
to be triggered whenever data is received to prevent ping timeout.
The server stopped sending pings since 6.2 which meant resetPing
doesn't get triggered.
2019-08-30 11:17:22 -07:00
A.J. Beamon 1fdabe62c2
Merge pull request #2048 from etschannen/feature-fix-connections
Fixed two different ways useful connections were being closed
2019-08-30 11:05:02 -07:00
Evan Tschannen 8fc28dd730 fix: continue pinging incompatible clients from the servers so that the the client knows the server process is active 2019-08-29 16:51:03 -07:00
Evan Tschannen 1c0484cffc fix: do not close connections which have outstanding tryGetReplies with the peer 2019-08-29 16:49:57 -07:00
Andrew Noyes 6aa0ada7b1 Replace scalar root types with proper messages 2019-08-28 14:40:50 -07:00
Meng Xu d5b9c46de9 Increase delay in monitoring LeakedConnection
trackLeakedConnection actor should give server enough time to
close its connection due to idle connection.

The current logic waits for at least 24 seconds to detect and close
an idle connection.
The current trackLeakedConnection actor waits for about 30 seconds
to claim LeakedConnection error.

We increase the delay in trackLeakedConnection actor to avoid
false positive error in simulation test.

Co-authored by: Vishesh Yadav
2019-08-23 15:10:39 -07:00
Evan Tschannen 41b908752e increased move keys parallelism to be less of a decrease just in case lowering this could effect normal data distribution
raised target durability lag versions to give more time for batch limiting to come into play before this limit is hit
changed max_bad_options to better reflect the name
2019-08-21 14:55:21 -07:00
Evan Tschannen 51cedd24c8 load balance will send reads to remote servers if more than one alternative is failed or overloaded 2019-08-19 13:59:49 -07:00
Andrew Noyes 638c9379c6 Add [[flow_allow_discard]] to dsltest + FlowTests 2019-08-16 09:24:57 -07:00
Andrew Noyes fe6f2f59e0 Suppress what is apparently an intentional memory leak 2019-08-16 09:24:57 -07:00
Andrew Noyes a8cdcff0c2 Change --disable-actor-without-wait-warning to --disable-diagnostics
We probably just want to disable all actor diagnostics for the flow test
files.

Also add --generate-probes to the help text
2019-08-16 09:24:57 -07:00
Andrew Noyes 4ebb325ff9 Make cancellable actors [[nodiscard]] by default 2019-08-16 09:24:57 -07:00
Evan Tschannen da8163fd5a allow one requests every second to skip there all_alteratives_failed delay, because if a client has a timeout longer than the delay we will never invalidate the key servers cache 2019-08-09 13:03:40 -07:00
Evan Tschannen 5f7d3498ea The delay for all_alteratives_failed can scale all the way up to 30.0 at a much slow time ratio 2019-08-09 12:35:19 -07:00
Evan Tschannen 0eb0e7a44a made Peer reference counted to avoid other potential bugs involving accessing Peer after it has been destroyed 2019-08-09 11:52:12 -07:00
Evan Tschannen 84fd1003a5 do not close idle network connections with incompatible servers 2019-08-08 23:47:00 -07:00
Evan Tschannen 98b643b7ae fix: connectionReader could access self in its destructor after it has already the object has already been deleted if an error can be thrown from connectionMonitor while still on the stack from scanPackets 2019-08-08 23:35:44 -07:00
Meng Xu 7ff46e6772 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-07 20:31:56 -07:00
mpilman 370ba8b841 Remove --object-serializer flag from executables 2019-08-06 09:25:40 -07:00
Meng Xu 3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu 7ccaeddf05 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-01 13:23:17 -07:00
A.J. Beamon 863204a29d Update names in CMakeLists, vcxproj 2019-08-01 08:48:25 -07:00
A.J. Beamon e0736232d4 Rename file in comment header 2019-08-01 08:40:45 -07:00
A.J. Beamon e61cac4ed4 Fix spacing issue; rename fdbrpc/Stats.h to fdbrpc/TimedRequest.h 2019-08-01 08:39:52 -07:00
Markus Pilman 0e474ed47e
Update fdbrpc/Stats.h
Co-Authored-By: Evan Tschannen <36455792+etschannen@users.noreply.github.com>
2019-07-31 19:56:21 -07:00
mpilman 7d247af500 Two minor bug fixes from recent optimizations 2019-07-31 19:14:11 -07:00
Andrew Noyes 0569df00f6 Remove indirection in LoadBalancedReply serialization 2019-07-31 17:59:35 -07:00
Jingyu Zhou daf1e09af4 Explicitly check for clang and g++ 2019-07-30 20:08:56 -07:00
Jingyu Zhou 638d2d05f4 Adds attribute to non-windows compilers 2019-07-30 15:49:25 -07:00
Jingyu Zhou 83922f1f37 Fix clang compiling error without sse4.2 2019-07-30 15:02:24 -07:00
Evan Tschannen 90e3b50213 Merge branch 'master' into feature-coordinator-connection
# Conflicts:
#	fdbclient/DatabaseContext.h
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/workloads/KillRegion.actor.cpp
2019-07-26 15:05:02 -07:00
Meng Xu 45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
Evan Tschannen 4a866290b7 Clients keep a persistent connection open with coordinators to get updates to the list of proxies
Status still needs to be updated with client information with information from the coordinators
2019-07-23 19:22:44 -07:00
Evan Tschannen c70e762f0e
Merge pull request #1785 from xumengpanda/mengxu/server-team-remover-PR
Remove redundant server teams
2019-07-19 17:44:16 -07:00
mpilman 1ac2d01b03 Merge remote-tracking branch 'upstream/master' into flatbuffers-fixes2 2019-07-18 09:50:08 -07:00
Evan Tschannen 5d3e69b6fc
Merge pull request #1820 from fzhjon/load-balance-locality
Introduced a knob that can turn locality on/off
2019-07-16 16:40:43 -07:00
mpilman d5caf0c1b4 Merge branch 'flatbuffers-fixes2' of github.com:mpilman/foundationdb into flatbuffers-fixes2 2019-07-16 14:47:40 -07:00
Jon Fu 7d37040725 split the single knob into two for finer-grained control 2019-07-16 12:46:02 -07:00
Meng Xu 20f067e794 Merge with master:Resolve conflict with PR#1797 2019-07-16 10:52:28 -07:00
mpilman 6c6a1ca8f4 Expose serialization context too all traits 2019-07-15 12:58:31 -07:00
Meng Xu 1c0daa7f2c Resolve review comments:Remove unneeded code 2019-07-12 18:10:04 -07:00
mpilman 75d4b612cf Make object serializer versioned 2019-07-12 11:53:14 -07:00
Meng Xu cf935ff9e6 Remove debug message and format code 2019-07-11 22:05:20 -07:00
Andrew Noyes c96854fce2 Simplify IReplicationPolicy serialization 2019-07-11 17:35:37 -07:00
mpilman be3a07826d fixed serialization bug with ReplicationPolicy 2019-07-11 17:35:37 -07:00
Andrew Noyes ae6f17625e Support PacketBuffer's of arbitrary size 2019-07-11 17:35:37 -07:00
Andrew Noyes 59ddf091e8 Re-use writeToOffsets vector 2019-07-11 17:35:37 -07:00
Andrew Noyes b9af1f43b7 Implement new dynamic_size_traits 2019-07-11 17:35:37 -07:00
Steve Atherton 1700d492cf
Merge pull request #1823 from ajbeamon/cache-hit-rate-in-status
Tweak cache hit calculations and add cache hit rate to status
2019-07-11 14:06:06 -07:00
Meng Xu c6e42d6119 ReplicationPolicy:Add trace for the name of each keyIndex 2019-07-10 19:29:29 -07:00
A.J. Beamon b4dbc6d7fa Change the way cache hits and misses are tracked to avoid counting blind page writes as misses and count the results of partial page writes. Report cache hit rate in status. 2019-07-10 14:43:20 -07:00
Vishesh Yadav c694931e33
sim2: Remove obsolete comment 2019-07-10 14:06:06 -07:00
Jon Fu 19a91765f6 introduced a knob that can turn locality on/off 2019-07-09 16:16:15 -07:00
Vishesh Yadav 4b8eb27134 fdbrpc: Move setStatus line in addPeerReference 2019-07-09 15:01:12 -07:00
Vishesh Yadav 983343978e fdbrpc: ConnectionMonitor should close unreferenced after delay
Potentially for cases, where it goes up to 1 immediately.
2019-07-09 14:24:16 -07:00
Vishesh Yadav 22678267cd fdbrpc: Don't drop idle connections from server
Instead try pinging the client and let that decide whether the client
is alive or not. Ideally, it should always be failed since a well
behaved client would have closed the connection.
2019-07-09 14:24:16 -07:00
Vishesh Yadav 1f9c80f633 fdbrpc: Instead of tracking last sent data, track last sent non-ping data
* This will allow client to continue monitoring peer connections while
connection stays open, so that there is no period of "uncertainity"
without previous no-monitoring approach.

* Use multiplier for incoming connection idle timeout

* Update idle connection timeout values and leaked connection timeout in
simulator.
2019-07-09 14:24:16 -07:00
Vishesh Yadav 867986cdea fdbrpc: Reduced connection monitoring from clients
This patch does two changes to connection monitoring:

1. Connection monitoring at client side will check if the connection
has been stayed idle for some time. If connection is unused for a
while, we close the connection. There is some weirdness involved here
as ping messages are by themselves are connection traffic. We get over
this by making it two-phase process, first being checking idle
reliable traffic, followed by disabling pings and then checking for
idle unreliable traffic.

2. Connection monitoring of clients from server will no longer send
pings to clients. Instead, it keep monitor the received bytes and
close after certain period of inactivity.
2019-07-09 14:24:16 -07:00
Vishesh Yadav 7647d3e3c0 fdbrpc: Don't use RequestStream for pings in ConnectionMonitor
RequestStream add another count to peerReference, which means as long
as ConnectionMonitor is alive, we'll never get peerReference=0 keeping
unnecessary connections potentially alive.
2019-07-09 14:24:16 -07:00
Vishesh Yadav 3f4f71ff9f fdbrpc: Increment peerReferences correctly
The constructor of FlowReceiver which handled reference counting
peerReferences relied on calling a virtual method from constructor
whose behaviour isn't correct. This patch, bubbles down result of that
virtual method from derived constructor to base contructor.
2019-07-09 14:24:16 -07:00
Andrew Noyes 15c6f2b864 Explain SFINAE for has_serialization_done 2019-07-05 14:07:02 -07:00
Andrew Noyes 7350b3db30 Don't assume serializeReplicationPolicy succeeds 2019-07-05 14:07:02 -07:00
Andrew Noyes e2ed56fa56 Convert ownedPtr to unownedPtr for IReplicationPolicy
Remove WriteRawMemory feature

Remove deserialization_done
2019-07-05 14:07:02 -07:00
Andrew Noyes 4c5ebd7609 Avoid assert when collecting vtables 2019-07-05 14:07:02 -07:00
Alex Miller 888f4f92e0 Fix errors and TaskPriority more priorities. 2019-07-03 21:03:58 -07:00
Alex Miller 8e1ab6e7db Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-28 17:32:54 -07:00
Evan Tschannen 5ccffe3cb3
Merge pull request #1684 from jzhou77/large-packet
Better handling for large packets
2019-06-28 16:19:01 -07:00
Alex Miller bc4548e0d3 Fix sed accidentally rewriting a trace event to have an invalid field name. 2019-06-27 17:55:41 -07:00
Alex Miller bf883d7055 Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-25 14:26:50 -07:00
Evan Tschannen 0fe6edc254
Merge pull request #1678 from mpilman/features/external-workload
Features/external workload
2019-06-25 13:53:19 -07:00
Evan Tschannen 24937d8125
Merge pull request #1744 from vishesh/task/monitor-leader-on-demand
Fix setting enClientFailureMonitor global for client
2019-06-25 13:38:59 -07:00
Jingyu Zhou e6ff23c420 Allow next packet size to be read when receiving data
For large packet, allocate sizeof(uint32_t) more bytes for next packet size.
Also add knob MIN_PACKET_BUFFER_FREE_BYTES, which is used to trigger allocation
of a new arena when free bytes are lower than this threshold.
2019-06-25 10:18:56 -07:00
Jingyu Zhou c4e44e6697 Refactor: add missing include 2019-06-25 10:18:56 -07:00
Jingyu Zhou 54ac91342d Refactor: include ordering in FlowTransport.actor.cpp 2019-06-25 10:18:56 -07:00
Jingyu Zhou 9d07e84dc8 Update fdbrpc/FlowTransport.actor.cpp
Co-Authored-By: Evan Tschannen <36455792+etschannen@users.noreply.github.com>
2019-06-25 10:18:56 -07:00
Jingyu Zhou e0c3df899b Break connection read into multiple smaller reads
And add yield between consecutive reads.
2019-06-25 10:18:56 -07:00
Jingyu Zhou 559a4b6e26 Add const qualifier 2019-06-25 10:18:56 -07:00
Jingyu Zhou 2363326ecb Add knobs for various packet buffer sizes 2019-06-25 10:18:56 -07:00
Jingyu Zhou b151141965 Better handling for large packets
On the sending side, a large packet is split into smaller pieces. On the
receiving side, use packet length to allocate buffer to avoid multiple memcpy
and allocations.
2019-06-25 10:18:56 -07:00
Alex Miller 7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Vishesh Yadav cbc2398254 fdbrpc: Remove default parameter from FlowTransport::createInstance 2019-06-25 01:17:38 -07:00
A.J. Beamon 189541d42c LoadBalancedReplies were sending uninitialized versions of their subclasses when there was an error. 2019-06-20 13:57:56 -07:00
mpilman 844dd60202 FDB compiling with intel compiler 2019-06-20 09:29:01 -07:00
Alex Miller 12dbe13c9c Provide a no-op O_CLOEXEC on windows to fix the build. 2019-06-19 17:16:06 -07:00
Alex Miller df0baa0066
Merge pull request #1720 from mpilman/features/protocol-version
Make protocol version a type
2019-06-19 13:46:35 -07:00
mpilman 2eff2b7e21 First simple test is working (but very buggy) 2019-06-19 13:03:41 -07:00
mpilman 68ce9a5e75 ProtocolVersion type - second try 2019-06-18 17:55:27 -07:00
Alex Miller 4fa5dc0502 Merge remote-tracking branch 'upstream/master' into cloexec 2019-06-18 16:35:18 -07:00
mpilman 8576665a90 Revert "Revert "Make protocol version a type""
This reverts commit 455bf3b3ec.
2019-06-18 14:49:04 -07:00
Alex Miller 455bf3b3ec Revert "Make protocol version a type" 2019-06-18 10:59:17 -07:00
mpilman dc9522bd86 Address code-review comments 2019-06-16 09:59:15 -07:00
mpilman da53a92bec Make protocol version a type
This fixes #1214

The basic idea is that ProtocolVersion is now its own type. This
alone is an improvement as it makes many things more typesafe. For
each version, we can now add breaking features (for example Fearless).
After that, there's no need to test against actual (confusing) version
numbers. Instead a developer can simply test
`protocolVersion->hasFearless()` and this will return true iff the
protocolVersion is newer than the newest version that didn't support
fearless.
2019-06-16 09:59:15 -07:00
Evan Tschannen 6ececa94ce
Merge pull request #1640 from vishesh/task/client-failmon
Clients will no longer get failure monitoring info from cluster controller
2019-06-13 17:31:17 -07:00
Evan Tschannen 55f7e7d372 fix: The delay inside the disabledMap was causing the storage server updateStorage actor to run on the client process 2019-06-13 14:28:30 -07:00
Evan Tschannen dccb9bc26d fixed a number of correctness problems 2019-06-12 19:40:50 -07:00
Vishesh Yadav 42fafe8a42 Addressed review comments 2019-06-11 18:58:00 -07:00
Evan Tschannen 9fdbf0c846
Merge pull request #1477 from tclinken/features/local-rk
Local Ratekeeper
2019-06-11 13:24:08 -07:00
mpilman 4c62458172 Make valgrind work on Fedora 30 2019-06-11 10:27:08 -07:00
Trevor Clinkenbeard 8144882d7b Merge branch 'apple-master' into features/local-rk 2019-06-10 19:40:25 -07:00
Vishesh Yadav a8e408e268 run clang-format on changes 2019-06-10 14:10:24 -07:00
Vishesh Yadav 4316ef9ec6 failMon: For clients remove expireFailure and report failures only during connect 2019-06-09 00:43:38 -07:00
Vishesh Yadav 6fa7081a21 net: Don't make FailureMonitoring requests from client
This patch removes the need for clients to continuously contact
cluster coordinator for failure monitoring information. Instead, it
uses the FlowTransport to monitor the statuses of peers and update
FailureMonitor accordingly.
2019-06-09 00:43:38 -07:00
Vishesh Yadav 6b4d30c3ae failmon: Identify client vs server when starting failure monitoring client 2019-06-09 00:43:12 -07:00
Evan Tschannen 29b96414e2 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/NativeAPI.actor.cpp
#	fdbserver/Coordination.actor.cpp
#	flow/Arena.h
#	versions.target
2019-06-03 18:49:35 -07:00
Parallels 773f52d0a1 Merge remote-tracking branch 'upstream/master' into cloexec 2019-06-03 15:43:32 -07:00
Evan Tschannen 172a0a9009 added an extra trace event 2019-05-30 17:25:42 -07:00
sramamoorthy d68a229772 makefile changes to accommodate boost/process.hpp 2019-05-28 22:07:46 -07:00
A.J. Beamon 603721e125 Merge branch 'master' into thread-safe-random-number-generation
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/genericactors.actor.cpp
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/workloads/BulkSetup.actor.h
#	flow/ActorCollection.actor.cpp
#	flow/Net2.actor.cpp
#	flow/Trace.cpp
#	flow/flow.cpp
2019-05-23 08:35:47 -07:00
Evan Tschannen f4fbaac6b0 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-05-19 10:27:59 -07:00
Evan Tschannen 2b8b7954a9 in simulation, prevent data from being received over a connection 1 second after the connection is closed on the other end 2019-05-17 15:05:32 -07:00
Steve Atherton 5a8c97480a
Merge pull request #1506 from nikolas-ioannou/feature-pagecache-lru
AsyncFileCached: switch from a random to an LRU cache eviction policy
2019-05-17 13:42:21 -07:00
Alvin Moore 3acaa7343e Enabled C++17 for all Windows projects
Set Visual Studio version to 2017 (first version to support C++17)
2019-05-16 17:44:13 -07:00
Evan Tschannen a745a8094e do not close a connection due to a missed ping if the process is still receiving data from the connection 2019-05-16 17:26:48 -07:00
Alvin Moore 94aed513c7 Switched Windows tools within projects to 2017 2019-05-16 15:05:11 -07:00
Alex Miller 69fb852ee0 Add more CLOEXEC-like things.
From missed call sites found during/after code review.
2019-05-14 20:30:58 -10:00
Alex Miller 1f02cd30e2 Mark all opened files as close-on-exec. 2019-05-13 16:05:49 -10:00
mpilman 46e7a0ca56 address reviews and make compile with `-Wunused-variable` 2019-05-13 14:15:23 -07:00
mpilman 0713e06efc Started to work on Windows 2019-05-13 14:15:23 -07:00
mpilman 20c3f7f264 remove mixed-mode support 2019-05-13 14:15:23 -07:00
mpilman 42385c2f81 Fixed issues introduced during rebase 2019-05-13 14:15:23 -07:00
mpilman f5fa3a65b4 some more fixes 2019-05-13 14:15:23 -07:00
mpilman 44db3450ec Several flatbuffers bug fixes 2019-05-13 14:15:23 -07:00
mpilman 69fa3d3903 fixed compilation issues after rebase 2019-05-13 14:15:23 -07:00
mpilman 3bc5ee8c58 fix indentation 2019-05-13 14:15:22 -07:00
mpilman 47fdb78782 use EnsureTable for enums of enums 2019-05-13 14:15:22 -07:00
mpilman 642a96807b Fixed compilation issues after rebase 2019-05-13 14:15:22 -07:00
mpilman 6afce01744 Implementation complete (not yet working) 2019-05-13 14:15:22 -07:00
mpilman 9eeb48c43d Allow to turn on object serializer
This commit includes functionality to turn on
the object serializer for network communication.
This is done the following way:

- On incoming connections, a process will detect
  whether the client supports the object serializer
  and will only serialize responses with it, if it does
- On outgoing connections, the command line flag is used
  to determine whether the object serializer should be used
  to send data.

This way, a cluster can run in mixed mode. To upgrade one
can upgrade one process at a time and set the flag one process
at a time.

This is how this is tested on the simulator:
- The command line flag can take three options: on, off,
  and random.
- For off, the object serializer will never we used.
- For on, the object serializer will be always used.
- For random, the simulator will flip a coin for each
  process it starts up.
2019-05-13 14:15:22 -07:00
mpilman ba83c458a6 types implemented 2019-05-13 14:15:22 -07:00
mpilman fe81454ec2 basic functionality for object serializer
This commit includes:
- The flatbuffers implementation
- A draft on how it should be used for network messages
- A serializer that can be used independently

What is missing:
- All root objects will need a file identifier
- Many special classes can not be serialized yet as the
  corresponding traits are not yet implemented
- Object serialization can not yet be turned on (this will
  need a network option)
2019-05-13 14:15:22 -07:00
Evan Tschannen 8c3516951a Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-05-12 20:13:49 -07:00
Alex Miller c502ed3d15 Fix a variety of problems stemming from a wait() being added to push().
And that this code was previously insufficiently tested.
2019-05-10 14:55:11 -10:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
Alex Miller 510b0b2fcd Fix DiskQueue not replaceFile'ing frequently enough for the final time. 2019-05-08 23:08:25 -10:00
Nikolas Ioannou c2827f4fa3 Add page cache hit, miss, and eviction stats to SystemMonitor 2019-05-08 15:41:17 +02:00
Austin Seipp af00248df6 fdbrpc: fix some print/scan format warnings
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Jingyu Zhou 41ae9cd0d8 Remove ; for namespace declarations 2019-05-06 13:21:28 -07:00
Nikolas Ioannou d6170210e7 AsyncFileCached: throw (new) exception, through helper static fn, if cache eviction polity not recognized. 2019-05-06 10:11:46 +02:00
Nikolas Ioannou fdb3992990 AsyncFileCached: switch to string for eviction policy knob 2019-05-03 14:04:43 +02:00
Nikolas Ioannou b2280e15bf AsyncFileCached: support for lru cache eviction policy
- Added a knob to control page cache eviction policy (0=random default, 1=lru)
- Added page cache hit, miss, and eviction stats
2019-05-02 17:35:30 +02:00
Jingyu Zhou e193cac5ef Merge remote-tracking branch 'apple/master' into tlog
Resolve Conflicts: fdbserver/MasterProxyServer.actor.cpp
2019-05-01 17:18:00 -07:00
Evan Tschannen 2d5043c665 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-04-30 18:27:04 -07:00
Jingyu Zhou 024d58263e
Merge pull request #1505 from alexmiller-apple/reorder-unset-fit
UnsetFit > OkayFit, so order the cases that way
2019-04-30 13:21:37 -07:00
Nikolas Ioannou 2462d29235 AsyncFileCached: switch from a random to an LRU cache eviction policy 2019-04-30 11:56:34 +02:00
Alex Miller 48ce964225 UnsetFit > OkayFit, so order the cases that way
Keeping them sorted top->down makes keeping track of the rules of what
will be recruited where much easier.
2019-04-29 19:51:43 -07:00
Stephen Atherton b4f833c27e Added sampled logging for slow ftruncate/fallocate calls. 2019-04-23 13:40:23 -07:00
A.J. Beamon e0f76edf77
Merge pull request #1471 from AlvinMooreSr/release-6.1-merge
Merge Release 6.1 Into Master
2019-04-23 11:08:21 -07:00
Jingyu Zhou d19b0cf1c1 Refactor LogSet with two new constructors 2019-04-21 10:41:07 -07:00
Trevor Clinkenbeard 2ad6d4d994 Do not rethrow server_overloaded error from load balancer 2019-04-20 15:38:57 -07:00
Andrew Noyes ddd6e26e18 Fix unused-variable warning in linux-only code 2019-04-20 10:39:20 -07:00
Trevor Clinkenbeard 2967ceb807 Fixed checkAndProcessResult
checkAndProcessResult must handle errors in LoadBalancedReply objects in the
same way other errors are handled.
2019-04-19 18:29:58 -07:00
Meng Xu 529ce66b6c Merge branch 'apple/master' into mengxu/performant-restore-PR 2019-04-18 18:02:45 -07:00
Alvin Moore 2bea99591e Merge branch 'release-6.1' of copy of master
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-04-17 15:51:48 -07:00
Andrew Noyes 781b6ece77 Fix OPEN_FOR_IDE -Wunused-variable warnings
CC #1255, #1173
2019-04-16 15:28:01 -07:00
Andrew Noyes 75b9369583 Make checksumHistoryBudget optional
See https://github.com/apple/foundationdb/pull/1446#discussion_r275933381
2019-04-16 12:55:53 -07:00
Andrew Noyes 247f95a6e2 Add -Wunused-variable 2019-04-15 18:13:00 -07:00
Andrew Noyes 6207d724f8 Fix all -Wunused-variable warnings 2019-04-15 18:13:00 -07:00
Evan Tschannen cd5c9d91fa
Merge pull request #1443 from etschannen/master
Merge 6.1 into master
2019-04-10 17:43:07 -07:00
Evan Tschannen 8e05713a5d do not log a SevError trace event if we cannot deserialize the connect packet 2019-04-10 17:41:02 -07:00
Evan Tschannen 6220a5ce0f
Merge pull request #1370 from jzhou77/fix-unreferenced
Remove unused functions
2019-04-09 11:49:45 -07:00
A.J. Beamon 058d028099
Merge pull request #1301 from mpilman/features/cheaper-traces
Defer formatting in traces to make them cheaper
2019-04-09 10:11:04 -07:00
A.J. Beamon a7288e1325 Throw process_behind instead of future_version when all storage nodes on a team are behind. process_behind gets the same backoff behavior as not_committed. Add proxy_memory_limit_exceeded to the retryable predicate. 2019-04-08 14:21:24 -07:00
mpilman bdba8e22eb Added test and bugfixes 2019-04-08 11:05:29 -07:00
mpilman b944e0b116 generalized read guards, allow for penalty+error 2019-04-08 11:04:44 -07:00
Evan Tschannen 1358603c7a fix: getReplyUnlessFailedFor must still report endpoint failures even if the address is local 2019-04-08 10:42:58 -07:00
Evan Tschannen 1baae75ac9 merge 6.1 2019-04-07 23:24:31 -07:00
Balachandar Namasivayam 83e67d6b8f Do not start failure monitoring for local endpoints in getReplyUnlessFailedFor. 2019-04-05 20:21:22 -07:00
Andrew Noyes bd12e77213 Whitespace tweak 2019-04-05 16:30:42 -07:00
Andrew Noyes c882743afa Make actual build happy again 2019-04-05 16:30:42 -07:00
Andrew Noyes d7612a4426 Fix OPEN_FOR_IDE build errors 2019-04-05 16:30:42 -07:00
mpilman c008e16c81 Defer formatting in traces to make them cheaper
This is the first part of making `TraceEvent` cheaper. The main idea is
to defer calls to any code that formats string. These are the main
changes:

- TraceEvent::detail now takes a c-string instead of std::string for
  literals. This prevents unnecessary allocations if the trace is not
  going to be printed in the first place (for example for SevDebug).
  Before that `detail` expected a `std::string` as key, which mean that
  any string literal would be copied on each call.
- Templates Traceable and SpecialTraceMetricType. These templates can be
  specialized for any type that needs to be printed. The actual
  formatting will be deferred to after the `enabled` check. This
  provides two benefits: (1) if a TraceEvent is disabled, we don't pay
  for the formatting and (2) TraceEvent can trace types that it doesn't
  know about.
- TraceEvent::enabled will be set in the constructor if the Severity is
  passed. This will make sure that `TraceEvent::init` is not called.
- `TraceEvent::detail` will be inlined. So for disabled TraceEvent
  calls, a call to detail will only introduce a if-branch which is much
  cheaper than a function call.
2019-04-05 13:12:19 -07:00
Meng Xu c4a8a80d6f Merge branch 'apple/master' into mengxu/performant-restore-PR 2019-04-04 22:51:00 -07:00
Evan Tschannen 390ab9cfed A process will mark itself as degraded if it continually disconnects from a different process which the failure monitor thinks is healthy 2019-04-04 14:11:12 -07:00
Jingyu Zhou 2b75c2e684 Restore removed functions.
crc32c.cpp is 3rd party code.
orYield() in genericactors.actor.h might be used in the future code.
2019-04-04 13:24:55 -07:00
Markus Pilman 101a05ae77
Merge branch 'master' into features/client-simulator 2019-04-03 10:03:56 -08:00
Alex Miller 45c466e269 Open incrementalDelete files with OPEN_UNBUFFERED
This fixes crashes from AsyncFileWinASIO refusing to open a file that
didn't have OPEN_UNBUFFERED.
2019-04-01 17:25:08 -07:00
Jingyu Zhou 47b4b82628
Merge branch 'master' into fix-unreferenced 2019-04-01 14:07:19 -07:00
Jingyu Zhou 3f76be8f45 Merge remote-tracking branch 'apple/master' into fix-unreferenced 2019-04-01 14:00:43 -07:00
Jingyu Zhou f7f8ddd894 Fix warnings on unused variables
Found by -Wunused-variable flag.
2019-04-01 14:00:20 -07:00
mpilman e23e63c6ac Implemented JavaWorkload
This change allows a user to write a workload in Java.

The way this is implemented is by creating a JVM within the
simulator and calling the corresponding workload class. A
workload can then run in the simulator or on a testing cluster.

If the workload is executed within the simulator, the resulting
test will not be deterministic anymore as it will execute in a
different thread (and even without that it is not clear, whether
we could get determinism as the JVM does a lot of stuff that are
not deterministic).

This is intendet to get better testing of the Java client and
layer authors can use the simulator to test their layers on a single
machine but they can still simulate failing machines etc.
2019-03-31 17:57:43 -07:00
Meng Xu 70d7c289f4 Merge branch 'master' into mengxu/restore/parallel-v7 2019-03-30 22:13:10 -07:00
Balachandar Namasivayam 0bbdc15f71 Multi-test processes waits until a timeout if any of the tester processes restarts. Use getReplyUnlessFailedFor instead of getReply to detect the restarts and fail quickly instead of waiting for a timeout which is usually large. 2019-03-28 17:05:30 -07:00
Evan Tschannen b6008558d3 renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>()
eliminated an unnecessary copy from the proxy commit path
eliminated an unnecessary copy from buffered peek cursor
2019-03-28 11:52:50 -07:00
Evan Tschannen 34b9d5e722
Merge pull request #1364 from etschannen/feature-fast-serialize
A few performance optimizations
2019-03-27 20:57:25 -07:00
Evan Tschannen c10f1eea71 QueueModel changed to unordered_map 2019-03-27 20:56:44 -07:00
Evan Tschannen f1a4bdd70d changed failureMonitor to use an unordered_map 2019-03-27 19:17:08 -07:00
Evan Tschannen e5a80f2c94 optimized IPaddress 2019-03-27 18:21:13 -07:00
Jingyu Zhou a55f06e082 Remove unused functions
Found with -Wunused-function flag.
2019-03-27 15:45:28 -07:00
A.J. Beamon 71e2fdafb8 Changes to ratekeeper camel case 2019-03-27 08:24:25 -07:00
Evan Tschannen 3b5b03e435 ReplyPromise does not serialize an empty NetworkAddress 2019-03-26 12:05:43 -07:00
Evan Tschannen d45159ebf7
Merge pull request #1307 from jzhou77/ratekeeper
Monitor placement of Ratekeeper and DataDistributor
2019-03-24 17:26:07 -07:00
Evan Tschannen 1fc6937802 changed NetworkAddressList to at most two addresses for performance 2019-03-23 17:54:46 -07:00
Evan Tschannen 36ab852bb1 Merge branch 'master' into ratekeeper
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
2019-03-22 18:41:00 -07:00
Evan Tschannen efbcd18987 fixed a performance regression related to broadcasting a read version to too many transactions simultaneously 2019-03-22 16:05:20 -07:00
Jingyu Zhou 0fb6a03c07 First round of review comment fixes for PR#1307 2019-03-19 11:29:19 -07:00
Jingyu Zhou 254c78053c Fix a segfault error
After wait, ServerDBInfo may have changed. Using the old copy is wrong.
2019-03-15 22:11:13 -07:00
A.J. Beamon 85b3f11e71 Fix various compiler warnings 2019-03-15 10:34:57 -07:00
Meng Xu 5a10bf5dfc Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-14 10:35:12 -07:00
Steve Atherton be0da73938
Merge pull request #1290 from etschannen/feature-cheap-policy
Optimized a few uses of the replication policy engine
2019-03-13 17:01:19 -07:00
Evan Tschannen e7d1f9e5f1 fixed review comments 2019-03-13 15:59:03 -07:00
Evan Tschannen e8cb85ed8e optimize validateAllCombinations 2019-03-13 14:47:35 -07:00
Vishesh Yadav c32504f705 io: Add DISABLE_POSIX_KERNEL_AIO knob to use EIO instead of Kernel AIO
- Some Linux filesystems don't support O_DIRECT which is required by
Kernel AIO to function properly. Instead of using O_SYNC, EIO is
much better options in terms of performance penalty.
- Some systems may not support AIO at all. Eg. Windows Subsystem for
Linux.

FIXES #842
RELATED #274
2019-03-13 13:39:45 -07:00
Evan Tschannen a2108047aa removed LocalitySetRef and IRepPolicyRef typedefs, because for clarity the Ref suffix is reserved for arena allocated objects instead of reference counted objects. 2019-03-13 13:14:39 -07:00
Evan Tschannen e068c478b5 merge master 2019-03-12 18:31:25 -07:00
Evan Tschannen a7e45cff91
Merge pull request #1176 from jzhou77/ratekeeper
Make Ratekeeper a separate role
2019-03-12 15:58:59 -07:00
Meng Xu 85c24b0067 Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-12 15:20:54 -07:00
Balachandar Namasivayam 880e8643d1 Fix Windows link errors 2019-03-11 17:49:03 -07:00
Evan Tschannen 044b6b4f8a Merge branch 'master' into feature-degraded-tlog
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
2019-03-08 22:50:41 -05:00
Evan Tschannen 41c493f8d4 fix: connectPacket accessed uninitialized variables 2019-03-08 14:40:32 -05:00
Jingyu Zhou 5dcde9efe0 Fix locality per review comment and a mac compile error 2019-03-07 13:16:20 -08:00
Jingyu Zhou 3c86643822 Separate Ratekeeper from data distribution.
Add a new role for ratekeeper.

Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
Meng Xu 04880e3d4d Merge branch 'master' into mengxu/tls-switch-status-PR 2019-03-06 13:41:16 -08:00
Alex Miller c6a65389ae Remove noexcept macro and replace with BOOST_NOEXCEPT.
BOOST_NOEXCEPT does what the noexcept macro was supposed to do, but in a
way that is correctly maintained over time.
2019-03-05 22:06:12 -08:00
Alex Miller af617d68e6 boost 1.52.0 -> 1.67.0 in all vcxproj files 2019-03-05 22:06:12 -08:00
Meng Xu 820548223a Status: connected_coordinators misc minor changes
Change the rst document file;
Change the coding style to be consistent with the nearby code;
Ensure we always initilize the connectedCoordinatesNum to 0
even when the variable is not used.
2019-03-05 21:45:18 -08:00
anoyes 981426bac9 More ide fixes 2019-03-05 18:03:57 -08:00
Evan Tschannen 82d957e0bb
Merge pull request #1178 from vishesh/task/issue-963-IPv6
IPv6 Support
2019-03-05 17:14:16 -08:00
Meng Xu afd7c1d497 AsynFileWinASIO: Make error checking consistent with Linux
In Linux, KAIO uses ASSERT to make sure open() flags have
OPEN_UNBUFFERED set.

In Windows, we uses if-condition and return io_errors() when the
flag is not set.

This PR makes Windoes implementation always use ASSERT to check the
flag.
2019-03-04 16:36:04 -08:00
Vishesh Yadav 5cd8bac6cb fix: segfault due external assignment of Endpoint::addresses #1201
isLocal() now checks if the address is equal to default
NetworkAddress() which should match the behaviour before TLS changes.
2019-03-04 15:49:11 -08:00
Vishesh Yadav 1d3e62c4e3 net: Don't use a union of IP in ConnectPacket #963
Since keeping a union and using the packet size to figure out whether
the ConnectPacket is using IPv6 to IPv4 address is not easily
maintainable. For simplicity, we just serialize everything in
ConnectPacket and be backward compatible with older format.

However, some code for some much older stuff is removed.
2019-03-04 14:12:45 -08:00
Vishesh Yadav e93cd0ff21 Add some checks and comments to IPv6 changes #963 2019-03-04 14:12:45 -08:00
Vishesh Yadav 592e224155 net: add/use formatIpPort to format IP:PORT pairs #963 2019-03-04 14:12:45 -08:00
Vishesh Yadav cc9ad0e202 net: Use IPv6 in simulation testing #963
25% times we will use IPv6 addresses
2019-03-04 14:12:45 -08:00
Vishesh Yadav 57832e625d net: Support IPv6 #963
- NetworkAddress now contains IPAddress object which can be either
IPv4 or IPv6 address. 128bits are used even for IPv4 addresses,
however only 32bits are used when using/serializing IPv4 address.

- ConnectPacket is updated to store IPv6 address. Backward compatible
with old format since the first 32bits of IP address field is used
for serialization of IPv4.

- Mainly updates rest of the code to use IPAddress structure instead
of plain uint32_t.

- IPv6 address/pair ports should be represented as `[ip]:port` as per
convention. This applies to both cluster files and command line
arguments.
2019-03-04 14:12:41 -08:00
Meng Xu 94385447bc Status: Get if client configured TLS
To understand if all clients have configured TLS,
we check the tlsoption when a client tries to open database.
This is similar to how we track the versions of multi-version clients.
2019-03-01 15:17:01 -08:00
Stephen Atherton 7d287c6999 Merge branch 'release-6.0'
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
2019-02-28 14:01:00 -08:00
Stephen Atherton 887856b6b0 Bug fix in AsyncFileReadAhead where a file size that is an integer multiple of the read chunk size will cause a crash when reading the file's final block. BackupContainerLocalDirectory now uses AsyncFileReadAhead in simulation to get simulation coverage of that class, and FileBackup will generate file sizes which expose the bug. 2019-02-28 00:22:38 -08:00
Evan Tschannen 8afb7fbb9d
Merge pull request #1160 from alexmiller-apple/tstlog-fork
Spill-By-Reference TLog Part 2: New and Old TLogServers co-exist harmoniously
2019-02-26 18:00:04 -08:00
Alex Miller 2dc57568cb Change many things about log_version.
* log_version in the database (`/conf/log_version`) is now a hint that gets
  rounded to the nearest supported version.
* fdbcli and FDB enforce that only a valid log_version can be configured to
* TLogVersion is persisted in CoreTLogSet (and LogSet and TLogSet)
* Some comments here and there
* Add an assert on filename length to make sure KV-pairs in filename
  don't exceed a maximum length.
2019-02-26 16:47:04 -08:00
Evan Tschannen b8910ba7cd Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.h
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Trevor Clinkenbeard 25b397977c Never assign DataDistributor role to process of class CoordinatorClass 2019-02-20 17:22:01 -08:00
Trevor Clinkenbeard 1bb384db4d Merge branch 'master' of https://github.com/apple/foundationdb into add-no-assign-class 2019-02-20 13:13:12 -08:00
mpilman f14dee764b Use fwd decl for connectionReader - fdbrpc compiling 2019-02-19 15:16:59 -08:00
mpilman 3bd9b9047b Minor fixes - flow now compiling with intellisense 2019-02-19 15:16:59 -08:00
Evan Tschannen 065a45e05f Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Vishesh Yadav 0898686c9b Remove old TODO 2019-02-18 15:43:27 -08:00
Evan Tschannen 62603d11a1 updated the killRegion simulation test to test a much larger variety of failure scenarios 2019-02-18 15:32:51 -08:00
Vishesh Yadav e05b53d755 Merge remote-tracking branch 'apple/master' into task/tls-upgrade 2019-02-15 20:37:07 -08:00
Vishesh Yadav 345fd7e4da Prefer unencrypted ports at client side during transition 2019-02-15 20:23:07 -08:00
Evan Tschannen 83060c6e56
Merge pull request #1062 from jzhou77/PR
Add a new DataDistributor role.
2019-02-15 13:51:27 -08:00
mpilman 75f692b931 simplify actorcompiler and target to compile coveragetool 2019-02-15 00:01:42 -08:00
Jingyu Zhou c35d1bf2ef Fix according Alex's comment 2019-02-14 16:30:13 -08:00
Jingyu Zhou 886e7ab2ba Add a new DataDistributor role.
Let cluster controller to start a new data distributor role by sending a
message to a chosen worker.
Change MasterInterface usage in DataDistribution to masterId

Add DataDistributor rejoin handling.

This allows the data distributor to tell the new cluster controller of its
existence so that the controller doesn't spawn a new one. I.e., there should
be only ONE data distributor in the cluster.

If DataDistributor (DD) doesn't join in a while, then ClusterController (CC) tries
to recruit one as DD. CC also monitors DD and restarts one if it failed.

The Proxy is also monitoring the DD. If DD failed, the Proxy will ask CC for
the new DD.

Add GetRecoveryInfo RPC to master server, which is called by data distributor
to obtain the recovery Transaction version from the master server.
2019-02-14 16:30:13 -08:00
Vishesh Yadav 907446d0ce Merge remote-tracking branch 'apple/master' into task/tls-upgrade 2019-02-14 11:37:38 -08:00
A.J. Beamon 9272a41e5f
Merge pull request #1146 from atn34/fix-actor-warning
Fix actor warning for cmake build
2019-02-13 11:01:37 -08:00
Andrew Noyes 3a38bff8ee Use DISABLE_ACTOR_WITHOUT_WAIT_WARNING consistently 2019-02-13 10:30:35 -08:00
Andrew Noyes 067a445e06 Replace unused _ variables with wait(success(...)) 2019-02-12 17:30:30 -08:00
Andrew Noyes 874a58cb4f Suppress actor without wait for tests in cmake 2019-02-12 11:01:17 -08:00