Commit Graph

1472 Commits

Author SHA1 Message Date
Evan Tschannen 045175bd0e added tracking for the size of the system keyspace 2019-09-27 22:39:19 -07:00
Evan Tschannen 324d0bd3b0 Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-cleanup-mutations 2019-09-27 19:15:14 -07:00
Evan Tschannen 3cc5d484a5 the include and exclude commands do not need to set the moveKeysLockOwnerKey, which will kill the data distribution algorithm 2019-09-27 18:33:56 -07:00
Evan Tschannen ef01ad2ed8 optimized log range clearing to clear everything for each possible hash (256 clears) if that would be more efficient than one clear per second that has elapsed
aborting a DR without the --cleanup flag will still attempt to clean up for 30 seconds before giving up
added a cleanup command to fdbbackup which can remove mutations from orphaned DRs which were stopped without the --cleanup flag
2019-09-27 18:32:27 -07:00
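The trade-off described in the log range clearing commit above can be sketched as follows. This is a minimal illustration, not FoundationDB's implementation: the function names are hypothetical, and so is the assumption that the per-second strategy costs exactly one clear per elapsed second. Only the figure of 256 possible hash values comes from the commit message.

```python
from typing import Tuple

# Number of possible hash values, as stated in the commit message.
HASH_BUCKETS = 256


def clears_for_per_second_strategy(elapsed_seconds: int) -> int:
    """Assumed cost model: one range clear per second that has elapsed."""
    return max(elapsed_seconds, 0)


def choose_clear_strategy(elapsed_seconds: int) -> Tuple[str, int]:
    """Return the cheaper strategy and the number of clears it would issue."""
    per_second = clears_for_per_second_strategy(elapsed_seconds)
    if per_second > HASH_BUCKETS:
        # Clearing everything under each hash value is cheaper.
        return ("per_hash", HASH_BUCKETS)
    return ("per_second", per_second)


if __name__ == "__main__":
    print(choose_clear_strategy(30))    # short interval: per-second clears
    print(choose_clear_strategy(3600))  # long interval: one clear per hash
```

Under this cost model the number of clears never exceeds 256, however long the interval.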
sramamoorthy a4d38f1158 Fix #2057 snapshot cli to print UID in failure too 2019-09-17 05:18:28 -07:00
Evan Tschannen 9b4f7626bb cache the serialization of clientDBInfo 2019-09-11 15:19:42 -07:00
Andrew Noyes 34dedc9a62 Fix whitespace 2019-09-05 16:44:58 -07:00
Andrew Noyes c18c4c1b83 Use a transaction option to control includePort behavior 2019-09-05 14:58:39 -07:00
Andrew Noyes 11f6adf645 Treat \xff\xff prefix as 'includePort' for get_addresses_for_key 2019-09-04 17:47:40 -07:00
Evan Tschannen dc1d055b27
Merge pull request #2042 from senthil-ram/snap_cli_fix
fix fdbcli --exec 'snapshot create.sh' failure
2019-08-30 13:40:38 -07:00
Evan Tschannen a7237c4302
Merge pull request #2045 from atn34/disallow-scalar-network-messages
Disallow scalar network messages
2019-08-30 13:38:54 -07:00
Evan Tschannen 84e2c9e1a5
Merge pull request #2041 from senthil-ram/snap_error_reporting
improved error msgs for snapshot cmd
2019-08-30 12:58:41 -07:00
sramamoorthy b3277f2982 Fix #2009 posix compliant args for snapshot binary 2019-08-30 12:54:09 -07:00
A.J. Beamon 3f9e392668
Merge pull request #2014 from etschannen/feature-fdbcli-sleep
Added a sleep command to fdbcli
2019-08-30 11:22:13 -07:00
Evan Tschannen f3bc7e0abd do not duplicate data distribution disabled fields in status
fixed a few bugs related to the existing data distribution disabled fields in status
2019-08-29 18:41:34 -07:00
Andrew Noyes b5f9e9f307 Move comment above if 2019-08-28 15:21:58 -07:00
Andrew Noyes 6aa0ada7b1 Replace scalar root types with proper messages 2019-08-28 14:40:50 -07:00
sramamoorthy 7a9097ea01 make fdbcli --exec 'snapshot create.sh' to succeed 2019-08-27 16:44:19 -07:00
sramamoorthy 5d87443323 improved error msgs for snapshot cmd 2019-08-27 16:43:52 -07:00
Vishesh Yadav 2b941f51bd Revert "fix: Use getReply* instead of tryGetReply in `monitorProxies`"
This reverts commit e7c94a2411.
2019-08-26 18:31:08 -07:00
Vishesh Yadav e7c94a2411 fix: Use getReply* instead of tryGetReply in `monitorProxies`
`tryGetReply` is unreliable, and since `monitorProxies` expects a reply
after a long period, the connection to the coordinator gets closed due to
idle timeout, only to be reopened in the next loop to make an
`openDatabase` request.

When using `getReply`, our reliable message queue won't be empty and the
connection will stay open.
2019-08-26 18:24:49 -07:00
Evan Tschannen 0b0c9fe0ff data distribution status was combined into regular status 2019-08-21 14:44:15 -07:00
A.J. Beamon 2b80d836f4 Merge branch 'release-6.2' into add-coordinator-to-status-roles-list
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-08-19 15:03:59 -07:00
Evan Tschannen 3965a959b4
Merge pull request #2017 from ajbeamon/fix-fileconfigure-error-cases
Fix some validation logic when using the fdbcli fileconfigure command
2019-08-19 14:49:51 -07:00
A.J. Beamon 7953545331 Fix an unknown_error when the file passed to fileconfigure doesn't contain a valid object (e.g. if you omit the enclosing {} of your object).
Fix an internal error when configuring regions with some storage servers that don't have a datacenter set.
2019-08-19 11:28:15 -07:00
Evan Tschannen 2a436d5f6f fix: do not block fdbcli from starting if DataDistributionStatus is not available 2019-08-16 18:15:02 -07:00
A.J. Beamon b8e57f37d7 Add 'coordinator' to the list of roles that a process can have in status. 2019-08-15 14:42:49 -07:00
A.J. Beamon bb72cdd36a Report lag with the usual "seconds" and "versions" fields. Rename and deprecate the qos.*version_lag_storage_server fields. 2019-08-15 13:42:39 -07:00
A.J. Beamon 6581161dd3 Add ratekeeper's durability lag statistics to status 2019-08-15 11:07:04 -07:00
Evan Tschannen a067e6e812
Merge pull request #1990 from etschannen/release-6.2
Fixed status reporting bugs related to connected clients
2019-08-13 16:24:12 -07:00
Evan Tschannen 70ce678879 fix: max_protocol_clients were being added to the connected_clients list
fix: the clientCount included clients with unknown protocol versions. This has been changed back to the pre-6.2 behavior where it is just a count of clients with known versions, and now clients with unknown versions are tracked explicitly in their own supported_version section
2019-08-13 15:54:40 -07:00
A.J. Beamon c4004a4eea Don't count read version requests if we've already started one. Also avoid some other work that only needs to be done if we haven't started a read version request. 2019-08-12 15:55:48 -07:00
Evan Tschannen c9fa7237f1 Merge branch 'master' of github.com:apple/foundationdb 2019-08-06 16:40:14 -07:00
Evan Tschannen ba54508c47 code cleanup 2019-08-06 16:30:30 -07:00
mpilman 370ba8b841 Remove --object-serializer flag from executables 2019-08-06 09:25:40 -07:00
A.J. Beamon e61cac4ed4 Fix spacing issue; rename fdbrpc/Stats.h to fdbrpc/TimedRequest.h 2019-08-01 08:39:52 -07:00
mpilman 7d247af500 Two minor bug fixes from recent optimizations 2019-07-31 19:14:11 -07:00
Evan Tschannen f33969c9d4
Merge pull request #1949 from atn34/at-what-cost3
Improve read performance part 1 of 2
2019-07-31 18:05:05 -07:00
mpilman dabe516320 Avoid unnecessary timer calls 2019-07-31 17:59:35 -07:00
Andrew Noyes 2dd3a6afe1 Fully qualify base class members 2019-07-31 17:59:35 -07:00
Andrew Noyes 0569df00f6 Remove indirection in LoadBalancedReply serialization 2019-07-31 17:59:35 -07:00
Evan Tschannen 7d7aa27c2d
Merge pull request #1814 from dongxinEric/feature/1508/finer-grained-dd-controls
Added finer grained controls to DataDistribution in fdbcli.
2019-07-31 17:36:20 -07:00
Evan Tschannen f4bcfcd53c
Merge pull request #1944 from etschannen/master
more bug fixes
2019-07-31 17:13:58 -07:00
Evan Tschannen 0063ef62ea fix: the client would not shrink the proxy list in all cases 2019-07-31 16:06:51 -07:00
Balachandar Namasivayam e8a9931dbe
Merge pull request #1918 from atn34/at-what-cost
Avoid memcpy for small types
2019-07-31 11:39:38 -07:00
Xin Dong 5d20364423 Address review comments 2019-07-30 22:24:30 -07:00
Xin Dong 1922c39377 Resolve review comments. 100K run shows one suspicious ASSERT_WE_THINK failure which I think could be a race. 2019-07-30 22:24:30 -07:00
Xin Dong ae11efcb0a Made following changes:
- Make sure the disabled data distribution won't be accidentally enabled by the 'maintenance' command
- Make sure the status json reflects the status of DD accordingly
- Make sure the CLI handles the new DD states correctly, i.e. prints warnings when necessary
2019-07-30 22:20:45 -07:00
Xin Dong 4ecfc9830f Added finer grained controls to DataDistribution in fdbcli. What's happening under the hood is:
- Use pre-existing 'healthZone' key and write a special value to it in order to disable DD for all storage server failures
- Use a new system key 'rebalanceDDIgnored' to disable/enable DD for all rebalance reasons (MountainChopper and ValleyFiller)

Kicked off two 200K correctness and showed no related errors.
2019-07-30 22:17:21 -07:00
Evan Tschannen efb9131657
Merge pull request #1909 from etschannen/feature-client-proxy-connections
Clients only connect to three proxies to reduce the number of network connections
2019-07-30 17:53:41 -07:00
Evan Tschannen 54df2abe8e fix: trace event did not compile 2019-07-30 17:52:53 -07:00
Evan Tschannen 85767f2034
Update fdbclient/MonitorLeader.actor.cpp 2019-07-30 17:19:33 -07:00
Evan Tschannen 69e7ed3e53
Merge pull request #1932 from etschannen/master
Bug fixes for rare bugs found by simulation
2019-07-30 17:18:30 -07:00
Evan Tschannen 7aece7398b fix: it was reducing the list of proxies on the coordinators, which would have made all the clients talking to that coordinator connect to the same set of proxies
optimized the code to avoid re-randomizing the same list of proxies
2019-07-30 17:15:24 -07:00
Evan Tschannen 2cd10bc96c Merge branch 'master' into feature-client-proxy-connections 2019-07-30 17:06:23 -07:00
sramamoorthy 63941e0d96 disable DD with an in-memory flag and use in snapv2 2019-07-30 17:04:51 -07:00
Evan Tschannen 9f11f2ec53 Merge branch 'master' of github.com:apple/foundationdb 2019-07-30 16:55:56 -07:00
Evan Tschannen 5c978f6129 fix: switchConnectionFile could get the proxies out of the clientInfo and continue connecting to the wrong cluster 2019-07-30 16:32:26 -07:00
A.J. Beamon 59dd5916c5 Merge branch 'master' into rename-fault-tolerance-status-fields
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-07-30 15:11:25 -07:00
A.J. Beamon 14648e20f9
Merge pull request #1901 from ajbeamon/data-distribution-receives-bytes-input-rate
Send bytes input rate to data distribution
2019-07-30 15:01:36 -07:00
A.J. Beamon 438bc636d5 Rename max_machine_failures_without_losing_X to max_zone_failures_without_losing_X in status. 2019-07-30 14:02:31 -07:00
sramamoorthy 49eaa31984 Add a trace event for snap create failure 2019-07-29 20:28:22 -07:00
sramamoorthy 5a56f6b456 minor snap create client improvement and bug fixes 2019-07-29 20:28:22 -07:00
Andrew Noyes 9f32edf4df Avoid memcpy for small types
This is undefined behavior, since it's potentially a misaligned access.
But it's _probably_ not worse than the status quo
2019-07-29 17:11:45 -07:00
Evan Tschannen 8425f53fc5 clients only connect to three proxies 2019-07-28 23:52:29 -07:00
Evan Tschannen 5c98dcce6d revert the proxy forwarding path, because it is no longer necessary as clients keep a persistent connection open with coordinators 2019-07-27 16:46:22 -07:00
Evan Tschannen b509a441e7 Merge branch 'master' into feature-skip-confirm
# Conflicts:
#	bindings/flow/tester/Tester.actor.cpp
#	bindings/go/src/_stacktester/stacktester.go
#	bindings/java/src/test/com/apple/foundationdb/test/AsyncStackTester.java
#	bindings/java/src/test/com/apple/foundationdb/test/StackTester.java
#	bindings/python/tests/tester.py
#	bindings/ruby/tests/tester.rb
#	documentation/sphinx/source/api-c.rst
#	documentation/sphinx/source/api-python.rst
#	documentation/sphinx/source/api-ruby.rst
#	documentation/sphinx/source/data-modeling.rst
#	documentation/sphinx/source/developer-guide.rst
#	fdbclient/vexillographer/fdb.options
#	fdbserver/MasterProxyServer.actor.cpp
2019-07-27 15:08:13 -07:00
Evan Tschannen d1c7ab325b fix: getConnectionFile could crash when connectionFile is null 2019-07-27 13:02:06 -07:00
Evan Tschannen 1c4028d71e fixed merge conflict error 2019-07-26 16:09:46 -07:00
Evan Tschannen 27d9ce1143 fixed merge conflict 2019-07-26 15:23:36 -07:00
Evan Tschannen 90e3b50213 Merge branch 'master' into feature-coordinator-connection
# Conflicts:
#	fdbclient/DatabaseContext.h
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/workloads/KillRegion.actor.cpp
2019-07-26 15:05:02 -07:00
Evan Tschannen 8149b5b352
Merge pull request #1413 from atn34/change-connection-file
Switch cluster file feature
2019-07-26 13:27:37 -07:00
Evan Tschannen ee92f0574f fix: lastRequestTime was not updated
fix: COORDINATOR_REGISTER_INTERVAL was not set
fixed review comments
2019-07-26 13:23:56 -07:00
sramamoorthy 9afd162e2f remove snap v1 related code 2019-07-25 17:29:31 -07:00
Evan Tschannen be5d144b8b added status information on connected clients 2019-07-25 17:15:31 -07:00
A.J. Beamon b91795d288 Send bytes input rate to DD. 2019-07-25 16:27:32 -07:00
sramamoorthy a65c9f92ed get rid of all timeouts and other changes 2019-07-24 15:36:28 -07:00
sramamoorthy a2f2ad96ff code review comments and merge to master changes 2019-07-24 15:36:28 -07:00
sramamoorthy 869f77aef1 Few cosmetic edits and fixes 2019-07-24 15:36:28 -07:00
sramamoorthy ddd4523816 bug fix in timeout & header file re-arrange in DD 2019-07-24 15:36:28 -07:00
sramamoorthy 0598779a40 snap create timeout for client call 2019-07-24 15:36:28 -07:00
sramamoorthy ec9af66fc5 timeout snapshotdatabase call 2019-07-24 15:36:28 -07:00
sramamoorthy 62c14dae72 disable dd during snap and enable in restore 2019-07-24 15:36:28 -07:00
sramamoorthy d0793f5ca2 snap v2: master proxy related changes 2019-07-24 15:36:28 -07:00
sramamoorthy 209448807d snap v2: fdbclient related changes 2019-07-24 15:36:28 -07:00
Evan Tschannen 83f4b8ebb1
Merge pull request #1866 from senthil-ram/setDDBugFix
setDDMode should set moveKeysLockWriteKey
2019-07-24 15:35:18 -07:00
Trevor Clinkenbeard 9ad9bd4c1f Merge branch 'master' of https://github.com/apple/foundationdb into change-connection-file 2019-07-24 15:22:26 -07:00
Evan Tschannen 8b73a1c998 removed verbose trace messages 2019-07-24 15:07:41 -07:00
Evan Tschannen 4a866290b7 Clients keep a persistent connection open with coordinators to get updates to the list of proxies
Status still needs to be updated with client information from the coordinators
2019-07-23 19:22:44 -07:00
Vishesh Yadav c4d1e16ab5
Merge pull request #1873 from atn34/uninitialized-memory
Fix uninitialized memory
2019-07-23 10:36:01 -07:00
Andrew Noyes 1968db17e3 Initialize in default constructor for GetReadVersionReply
==10473== Uninitialised byte(s) found during client check request
==10473==    at 0x1BA9ACE: sendPacket(TransportData*, ISerializeSource const&, Endpoint const&, bool, bool) (FlowTransport.actor.cpp:1252)
==10473==    by 0x877C05: (anonymous namespace)::NetworkSenderActorState<GetReadVersionReply, (anonymous namespace)::NetworkSenderActor<GetReadVersionReply> >::a_body1cont2(GetReadVersionReply const&, int) [clone .isra.0] (networksender.actor.h:40)
==10473==    by 0x877CC6: a_body1when1 (networksender.actor.g.h:147)
==10473==    by 0x877CC6: a_callback_fire (networksender.actor.g.h:161)
==10473==    by 0x877CC6: ActorCallback<(anonymous namespace)::NetworkSenderActor<GetReadVersionReply>, 0, GetReadVersionReply>::fire(GetReadVersionReply const&) (flow.h:894)
==10473==    by 0xC343A7: send<GetReadVersionReply&> (flow.h:343)
==10473==    by 0xC343A7: send<GetReadVersionReply&> (fdbrpc.h:124)
==10473==    by 0xC343A7: (anonymous namespace)::ForwardProxyActorState<(anonymous namespace)::ForwardProxyActor>::a_body1loopBody1when2(ReplyPromise<GetReadVersionReply> const&, int) (MasterProxyServer.actor.cpp:1814)
==10473==    by 0xC33C10: (anonymous namespace)::ForwardProxyActorState<(anonymous namespace)::ForwardProxyActor>::a_body1loopBody1(int) (MasterProxyServer.actor.g.cpp:8167)
==10473==    by 0xC35434: a_body1loopHead1 (MasterProxyServer.actor.g.cpp:8152)
==10473==    by 0xC35434: a_body1loopBody1cont2 (MasterProxyServer.actor.g.cpp:8327)
==10473==    by 0xC35434: a_body1loopBody1cont1when1 (MasterProxyServer.actor.g.cpp:8333)
==10473==    by 0xC35434: a_body1loopBody1cont1when1 (MasterProxyServer.actor.g.cpp:8331)
==10473==    by 0xC35434: a_callback_fire (MasterProxyServer.actor.g.cpp:8347)
==10473==    by 0xC35434: ActorCallback<(anonymous namespace)::ForwardProxyActor, 3, Void>::fire(Void const&) (flow.h:894)
==10473==    by 0x7E7BE7: SAV<Void>::finishSendAndDelPromiseRef() (flow.h:375)
==10473==    by 0x8319FD: a_body1when1 (genericactors.actor.g.h:10892)
==10473==    by 0x8319FD: a_callback_fire (genericactors.actor.g.h:10920)
==10473==    by 0x8319FD: ActorCallback<(anonymous namespace)::ChooseActorActor<Void>, 0, Void>::fire(Void const&) (flow.h:894)
==10473==    by 0x891917: void SAV<Void>::send<Void>(Void&&) (flow.h:343)
==10473==    by 0x1C47ADC: send<Void> (flow.h:674)
==10473==    by 0x1C47ADC: execTask (sim2.actor.cpp:1632)
==10473==    by 0x1C47ADC: Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1cont1(Void const&, int) (sim2.actor.cpp:975)
==10473==    by 0x1C47FF2: a_body1loopBody1when1 (sim2.actor.g.cpp:5092)
==10473==    by 0x1C47FF2: Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1(int) (sim2.actor.g.cpp:5037)
==10473==    by 0x1C47A6C: a_body1loopHead1 (sim2.actor.g.cpp:5020)
==10473==    by 0x1C47A6C: Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1cont1(Void const&, int) (sim2.actor.g.cpp:5086)
==10473==  Address 0x12db1ba1 is 2,977 bytes inside a recently re-allocated block of size 4,096 alloc'd
==10473==    at 0x1CC5D7F: FastAllocator<4096>::allocate() (FastAlloc.cpp:290)
==10473==    by 0x1CFAA68: operator new (FastAlloc.h:193)
==10473==    by 0x1CFAA68: PacketWriter::nextBuffer() (Net2Packet.cpp:59)
==10473==    by 0x1CFABD6: PacketWriter::writeAhead(int, SplitBuffer*) (Net2Packet.cpp:81)
==10473==    by 0x1BA97EB: sendPacket(TransportData*, ISerializeSource const&, Endpoint const&, bool, bool) (FlowTransport.actor.cpp:1199)
==10473==    by 0x7DEAD1: a_body1cont2 (networksender.actor.h:40)
==10473==    by 0x7DEAD1: a_body1when1 (networksender.actor.g.h:147)
==10473==    by 0x7DEAD1: a_callback_fire (networksender.actor.g.h:161)
==10473==    by 0x7DEAD1: ActorCallback<(anonymous namespace)::NetworkSenderActor<GetValueReply>, 0, GetValueReply>::fire(GetValueReply const&) (flow.h:894)
==10473==    by 0xF22767: send<GetValueReply&> (flow.h:343)
==10473==    by 0xF22767: send<GetValueReply&> (fdbrpc.h:124)
==10473==    by 0xF22767: (anonymous namespace)::GetValueQActorState<(anonymous namespace)::GetValueQActor>::a_body1cont5(int) [clone .isra.0] (storageserver.actor.cpp:890)
==10473==    by 0xF2305C: (anonymous namespace)::GetValueQActorState<(anonymous namespace)::GetValueQActor>::a_body1cont3(int) [clone .isra.0] (storageserver.actor.g.cpp:1592)
==10473==    by 0xF23447: a_body1cont2when1 (storageserver.actor.g.cpp:1627)
==10473==    by 0xF23447: (anonymous namespace)::GetValueQActorState<(anonymous namespace)::GetValueQActor>::a_body1cont2(Void const&, int) [clone .isra.0] (storageserver.actor.g.cpp:1512)
==10473==    by 0xF23507: a_body1when1 (storageserver.actor.g.cpp:1523)
==10473==    by 0xF23507: a_callback_fire (storageserver.actor.g.cpp:1537)
==10473==    by 0xF23507: ActorCallback<(anonymous namespace)::GetValueQActor, 0, Void>::fire(Void const&) (flow.h:894)
==10473==    by 0x891917: void SAV<Void>::send<Void>(Void&&) (flow.h:343)
==10473==    by 0x1C47ADC: send<Void> (flow.h:674)
==10473==    by 0x1C47ADC: execTask (sim2.actor.cpp:1632)
==10473==    by 0x1C47ADC: Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1cont1(Void const&, int) (sim2.actor.cpp:975)
==10473==    by 0x1C47FF2: a_body1loopBody1when1 (sim2.actor.g.cpp:5092)
==10473==    by 0x1C47FF2: Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1(int) (sim2.actor.g.cpp:5037)
==10473==  Uninitialised value was created by a stack allocation
==10473==    at 0xC342D0: (anonymous namespace)::ForwardProxyActorState<(anonymous namespace)::ForwardProxyActor>::a_body1loopBody1when2(ReplyPromise<GetReadVersionReply> const&, int) (MasterProxyServer.actor.g.cpp:8213)
2019-07-22 13:54:52 -07:00
A.J. Beamon e29a6ea280
Merge pull request #1871 from bnamasivayam/tr-priority-add-client-log
Track the priority of sampled Transaction as part of GetReadVersion e…
2019-07-22 13:04:52 -07:00
Balachandar Namasivayam df652155fc Addressed review comments 2019-07-22 12:17:05 -07:00
Trevor Clinkenbeard 3507bfd52f Trigger masterProxiesChangeTrigger in switchConnectionFileImpl 2019-07-22 11:13:07 -07:00
Balachandar Namasivayam af267ba053 Track the priority of sampled Transaction as part of GetReadVersion event. 2019-07-19 17:31:49 -07:00
Evan Tschannen 041531d283
Merge pull request #1864 from ajbeamon/client-thread-safety-fix
Fix thread-safety issue with connection file when creating database on client
2019-07-19 16:42:26 -07:00
Evan Tschannen 846038b0e6
Merge pull request #1858 from bnamasivayam/rk-ssfetch-throttle
Ratekeeper throttling aggressively when unable to fetch storage server list
2019-07-19 16:41:58 -07:00
Evan Tschannen 3045826e3c
Merge pull request #1819 from mpilman/flatbuffers-fixes2
Flatbuffers fixes2
2019-07-19 16:33:50 -07:00
Alex Miller c3a8ae4752
Merge pull request #1791 from fzhjon/fetch-keys-requests-priority
Introduce priority to fetchKeys requests from data distribution
2019-07-19 14:54:51 -07:00
A.J. Beamon f6183df8b9
Merge pull request #1852 from vishesh/task/issue-1840-non-blocking-exclusion
fdbcli: Add `no_wait` option in `exclude` command to avoid blocking
2019-07-19 12:49:29 -07:00
A.J. Beamon b93a08ac6f Don't wrap the connection file in a reference until after we are on the main thread because references aren't thread safe. 2019-07-19 11:16:30 -07:00
sramamoorthy 0962641540 setDDMode should set moveKeysLockWriteKey
After takeMoveKeysLock notes down the owner and the
moveKeysLockWriteKey value, it monitors the above two in
pollMoveKeysLock and checks if anything is changed, but
setDDMode was not setting the moveKeysLockWriteKey and
so a sequence like disable, enable and disable would not
really disable DD.
2019-07-19 11:13:29 -07:00
A.J. Beamon bc5c65e5ab
Merge pull request #1756 from jzhou77/db-option
Add transaction getApproximateSize() API
2019-07-19 08:33:24 -07:00
Vishesh Yadav d9a8657096 fdbcli: Add `no_wait` option in `exclude` command to avoid blocking
RESOLVES #1840
2019-07-18 13:07:31 -07:00
Evan Tschannen 94c66f8d58
Merge pull request #1738 from bnamasivayam/consistency-check-disable
Disable/Re-enable consistency check through a database key.
2019-07-18 10:56:02 -07:00
Jingyu Zhou 7c0aca5b0c Fix review comments: release notes and formatting 2019-07-18 10:26:39 -07:00
Jingyu Zhou 4b36099097
Update fdbclient/ReadYourWrites.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-07-18 10:10:36 -07:00
mpilman 1ac2d01b03 Merge remote-tracking branch 'upstream/master' into flatbuffers-fixes2 2019-07-18 09:50:08 -07:00
Balachandar Namasivayam 406bcebdc4 Ratekeeper to throttle tpsLimit to 1 if it is not able to fetch storage server list for some configurable amount of time. 2019-07-17 18:08:17 -07:00
Jingyu Zhou e8e48e0dbd Fix size calculation
Mutations in writeRangeToNativeTransaction() are already counted, so there is no
need to count them again.
2019-07-16 15:21:13 -07:00
mpilman d5caf0c1b4 Merge branch 'flatbuffers-fixes2' of github.com:mpilman/foundationdb into flatbuffers-fixes2 2019-07-16 14:47:40 -07:00
Andrew Noyes e7e48a40ce Fix a few IncludeVersion bugs 2019-07-16 13:28:29 -07:00
Jingyu Zhou 14cb21285f Remove futureGetVersion in C binding and FutureVersion in Python binding 2019-07-16 10:46:07 -07:00
Jingyu Zhou d5cc2beb5f
Update fdbclient/ReadYourWrites.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-07-16 10:33:25 -07:00
Alec Grieser a89d09f1a6
Merge pull request #1844 from ajbeamon/configurable-trace-field-length
Make trace event field lengths default knobified and individually configurable.
2019-07-15 15:53:22 -07:00
A.J. Beamon 2cd05e9ac9
Merge pull request #1712 from tclinken/add-local-rk-to-status
Track the local ratekeeper rate in status
2019-07-15 15:17:11 -07:00
mpilman 54416f46fd Pass type as param to VectorRef instead of bool 2019-07-15 15:08:49 -07:00
A.J. Beamon 68f2b7a7f3 Event subclasses are managed through templates and don't have virtual functions, so don't use override. 2019-07-15 15:01:24 -07:00
Balachandar Namasivayam 9169232fa9 Add the new messages to Schema. 2019-07-15 13:47:27 -07:00
Alex Miller c8e94e601a
Merge pull request #1729 from etschannen/feature-fast-txs-recovery
Improve the recovery speed of the txnStateStore
2019-07-15 13:27:41 -07:00
mpilman 6c6a1ca8f4 Expose serialization context to all traits 2019-07-15 12:58:31 -07:00
Evan Tschannen 1a18c859c7 knobified the durability lag rate controls 2019-07-12 18:50:56 -07:00
Evan Tschannen c4c9e6cee7 fixed compiler errors 2019-07-12 18:28:41 -07:00
Evan Tschannen 02de53160d only skip confirm epoch live if CAUSAL_READ_RISKY is enabled
time checked on the proxy should be less than the time waited by the master to account for clock speed differences
setting REQUIRED_MIN_RECOVERY_DURATION and ENFORCED_MIN_RECOVERY_DURATION to 0 will go back to the old behavior
2019-07-12 17:58:16 -07:00
Jingyu Zhou 562bf6511a Fix approximate size calculation 2019-07-12 16:53:37 -07:00
A.J. Beamon d5051b08dd Make trace event field lengths (and total event sizes) default knobified and configurable. Add a transaction option to control the field length of transaction debug logging. Make the program start command line field less likely to be truncated. 2019-07-12 16:12:35 -07:00
Andrew Noyes 9001f5d23f Fix bug introduced in merge commit 2019-07-12 14:02:38 -07:00
Evan Tschannen a63969afb3 enforce a minimum recovery duration, which allows proxies to avoid checking if the epoch is alive as long as its last commit has been less than MINIMUM_RECOVERY_DURATION ago 2019-07-12 13:10:21 -07:00
mpilman 75d4b612cf Make object serializer versioned 2019-07-12 11:53:14 -07:00
Andrew Noyes 969957e619 Merge branch 'master' into change-connection-file 2019-07-12 11:39:19 -07:00
Jingyu Zhou 2dcc3cfd0a Deprecate fdb_future_get_version for version 620
Use fdb_future_get_int64 in all bindings.
2019-07-11 21:17:31 -07:00
Andrew Noyes 319a823b79 Fix whitespace 2019-07-11 17:35:37 -07:00
mpilman 7f21fc2bde Serialize range result to string for speed 2019-07-11 17:35:37 -07:00
Andrew Noyes ae6f17625e Support PacketBuffer's of arbitrary size 2019-07-11 17:35:37 -07:00
Jingyu Zhou b2a89c8b77 Address review comments for PR #1756
Use fdb_future_get_int64 for language bindings and get rid of using Version
with getApproximateSize API.
2019-07-11 16:41:29 -07:00
A.J. Beamon f31884c749 Merge branch 'master' into add-priority-starts-to-status
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-07-11 15:26:52 -07:00
Steve Atherton 1700d492cf
Merge pull request #1823 from ajbeamon/cache-hit-rate-in-status
Tweak cache hit calculations and add cache hit rate to status
2019-07-11 14:06:06 -07:00
A.J. Beamon 97609ad991 Add information about transaction starts at different priorities to status. 2019-07-11 13:54:44 -07:00
A.J. Beamon f4366e69ca Unknown options should not be used internally (i.e. underneath thread-safe API). This commit removes various checks that options exist and replaces them with an ASSERT. 2019-07-11 11:25:39 -07:00
Jon Fu 652cd77689 fixed merge conflicts and use new TaskPriority enum class 2019-07-11 09:56:59 -07:00
Jon Fu f12a3909f3 renamed workloads and made code style adjustments 2019-07-11 09:56:58 -07:00
Jon Fu 1e9d31597c removed extra parameter from getRange, added knob to guard new changes, and adjusted style/formatting in several places 2019-07-11 09:56:58 -07:00
Jon Fu f707d186fe added new priority for fetchkeys requests and adjusted ddmetrics workload to run parallel with mako 2019-07-11 09:56:58 -07:00
Evan Tschannen bbef631872 fix: do not access optionInfo unless the option already exists in the map 2019-07-10 18:48:54 -07:00
Jingyu Zhou c70a426f04 Update approximate size calculation 2019-07-10 15:00:50 -07:00
Jingyu Zhou 2c2836c6c7 Require API version to 620 for approximate size 2019-07-10 15:00:50 -07:00
Jingyu Zhou 5d1437c8e0 Push int directly to stack for getApproximateSize 2019-07-10 15:00:50 -07:00
Jingyu Zhou 0802df2c8f Convert size from int to string before pushing onto stack
Using int is troublesome because the size of int can be different from the
desired 64 bits. So, using a string representation seems to be more consistent.
2019-07-10 14:58:35 -07:00
Jingyu Zhou 9d12843a26 Push size as tuple to stack 2019-07-10 14:58:35 -07:00
Jingyu Zhou d5aaba3b15 Minor code fix 2019-07-10 14:58:07 -07:00
Jingyu Zhou 0ad2d2d16e Add binding test for getApproximateSize API 2019-07-10 14:58:07 -07:00
Jingyu Zhou 8ef8b59fcc Use ThreadFuture for getApproximateSize
Change return type to int64_t and fix C and Python binding to use the correct
type.
2019-07-10 14:58:07 -07:00
Jingyu Zhou 4c0e824456 Include unactorcompiler.h at the end of *.actor.h 2019-07-10 14:51:52 -07:00
Jingyu Zhou c50a675bf0 Add transaction getApproximateSize() API
The size is the summation of expected size of mutations, read conflict ranges,
and write conflict ranges.
2019-07-10 14:51:52 -07:00
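The summation described in the getApproximateSize() commit above can be sketched like this. It is only an illustration under stated assumptions: `TransactionSketch`, `Mutation`, and `range_size` are hypothetical names, and the per-item cost formulas (key plus value for a mutation, the two boundary keys for a conflict range) are assumed rather than taken from the FoundationDB source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

KeyRange = Tuple[bytes, bytes]  # (begin key, end key)


@dataclass
class Mutation:
    key: bytes
    value: bytes

    def expected_size(self) -> int:
        # Assumption: a mutation costs its key length plus its value length.
        return len(self.key) + len(self.value)


def range_size(key_range: KeyRange) -> int:
    # Assumption: a conflict range costs the length of its two boundary keys.
    begin, end = key_range
    return len(begin) + len(end)


@dataclass
class TransactionSketch:
    mutations: List[Mutation] = field(default_factory=list)
    read_conflict_ranges: List[KeyRange] = field(default_factory=list)
    write_conflict_ranges: List[KeyRange] = field(default_factory=list)

    def approximate_size(self) -> int:
        # Sum of the three components named in the commit message.
        return (
            sum(m.expected_size() for m in self.mutations)
            + sum(range_size(r) for r in self.read_conflict_ranges)
            + sum(range_size(r) for r in self.write_conflict_ranges)
        )


if __name__ == "__main__":
    tr = TransactionSketch()
    tr.mutations.append(Mutation(b"key", b"value"))
    tr.read_conflict_ranges.append((b"a", b"b"))
    tr.write_conflict_ranges.append((b"key", b"key\x00"))
    print(tr.approximate_size())
```

Keeping the three components separate makes it easy to see which part of a transaction dominates its estimated size.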
A.J. Beamon b4dbc6d7fa Change the way cache hits and misses are tracked to avoid counting blind page writes as misses and count the results of partial page writes. Report cache hit rate in status. 2019-07-10 14:43:20 -07:00
Evan Tschannen d8948c8be1 Merge branch 'master' into feature-fast-txs-recovery
# Conflicts:
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2019-07-10 13:59:52 -07:00
Evan Tschannen 7e919e361c
Merge pull request #1817 from etschannen/feature-proxy-forward
Proxies will forward clients to the next generation
2019-07-10 13:53:12 -07:00
Alec Grieser a72d5b526a
Merge pull request #1767 from ajbeamon/fix-mvc-default-options
Make default and persistent options specifiable via annotations to fdb.options...
2019-07-10 13:19:45 -07:00
A.J. Beamon 69d7c4f79c Merge branch 'master' into track-run-loop-busyness
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	flow/Net2.actor.cpp
#	flow/network.h
2019-07-09 18:39:23 -07:00
A.J. Beamon a174178be1 Merge branch 'master' into fix-mvc-default-options
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
2019-07-09 18:33:14 -07:00
Evan Tschannen c8d86516f0
Merge pull request #1800 from ajbeamon/rename-datacenter-version-difference
Rename datacenter_version_difference to datacenter_lag and include bo…
2019-07-09 17:29:27 -07:00
Evan Tschannen 7ad0d1a12b Merge branch 'master' into feature-proxy-forward
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
2019-07-09 17:26:15 -07:00
Meng Xu cce00bb413
Merge pull request #1808 from ajbeamon/improved-transaction-metrics
Improve TransactionMetrics
2019-07-09 16:46:17 -07:00
A.J. Beamon 15ecba59db Merge branch 'master' into fix-mvc-default-options
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-07-09 15:28:12 -07:00
A.J. Beamon fdd580c878 Restore some variable initializations that were unintentionally removed. 2019-07-09 15:00:11 -07:00
Vishesh Yadav ae6c3e013a monitorClientInfo: Wait for master proxy endpoint failures rather than triggers
This will not initiate a request to get a new set of proxies unless we
know for a fact that the endpoint has indeed failed, not just because the
connection to the Peer was closed as it was sitting idle.
2019-07-09 14:24:16 -07:00
A.J. Beamon 764a4591ad Add a comment to internal flag 2019-07-09 14:17:26 -07:00
Trevor Clinkenbeard 1582a2a24d Merge branch 'master' of https://github.com/apple/foundationdb into change-connection-file 2019-07-09 13:41:54 -07:00
Trevor Clinkenbeard 1bac04509e Track the local ratekeeper rate as a percentage
This value is reported in status for each storage server.
2019-07-09 12:46:53 -07:00
Vishesh Yadav eabc610daa
Merge pull request #1813 from alexmiller-apple/log-version-4
Add a TLogVersion::V4
2019-07-09 08:42:20 -07:00
Alex Miller d2ef84a8f9 Add a TLogVersion::V4
And refactor some code to make adding more TLogVersions easier.
2019-07-08 22:22:45 -07:00
A.J. Beamon a5a6f8431c Add a random UID to TransactionMetrics in case a client opens multiple connections and also a field to indicate whether the connection is internal. Convert some of the metrics to our Counter object instead of running totals. 2019-07-08 14:01:04 -07:00
Evan Tschannen c348b3da51 After a proxy dies, it will remain alive for an additional 10 seconds to forward clients to the new proxies 2019-07-08 12:53:40 -07:00
Evan Tschannen 15e894c724 Merge in master 2019-07-05 15:49:24 -07:00
A.J. Beamon 4be08d9b2d Rename datacenter_version_difference to datacenter_lag and include both seconds and versions. 2019-07-05 14:36:18 -07:00
Alex Miller ea6898144d Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-07-03 20:44:15 -07:00
A.J. Beamon 6b6012ee7b Add a break to setOption() switch statement. Better detection of missing options (and logging for present options). 2019-07-02 15:42:53 -07:00
Alec Grieser a84f481004
Merge pull request #1734 from ajbeamon/fix-onerror-retries-on-cluster-version-changed
If onError fails with cluster_version_changed, retry the error on the new transaction.
2019-07-03 00:23:36 +02:00
A.J. Beamon c9ed860277 Fix whitespace 2019-07-02 14:19:22 -07:00
A.J. Beamon 3f6ba3d737 Remove space 2019-07-02 11:18:45 -07:00
A.J. Beamon 1cf449db10 Undo formatting changes to otherwise unchanged code 2019-07-02 11:17:47 -07:00
A.J. Beamon e7218bbb28 Restore retry limiting on the client sampling transaction 2019-07-02 11:16:00 -07:00
A.J. Beamon 7e5b5a0536
Apply suggestions from code review
Use emplace_back instead of push_back

Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>
2019-07-02 11:09:46 -07:00
Alex Miller 8e1ab6e7db Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-28 17:32:54 -07:00
Evan Tschannen b9a6271375 local ratekeeper no longer globally limits 2019-06-28 16:54:22 -07:00
Evan Tschannen 92b32855ca ratekeeper’s control algorithm would oscillate when limited by local ratekeeper 2019-06-28 16:54:22 -07:00
A.J. Beamon aa1bc0087e Address some review comments 2019-06-28 14:17:25 -07:00
A.J. Beamon 2035b36257 Make default and persistent options specifiable via annotations to fdb.options. Fix some issues with persisting these options in the multi-version client. Make size limit option not persistent. 2019-06-28 13:24:32 -07:00
A.J. Beamon 7f23814841 Track run loop busyness and report it in status. 2019-06-26 14:03:02 -07:00
Alex Miller bf883d7055 Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-25 14:26:50 -07:00
Alex Miller d7c00f9cd2 And another. 2019-06-25 14:19:56 -07:00
Evan Tschannen 24937d8125
Merge pull request #1744 from vishesh/task/monitor-leader-on-demand
Fix setting enClientFailureMonitor global for client
2019-06-25 13:38:59 -07:00
Alex Miller 8e28930d12 Fix another hardcoded priority. 2019-06-25 10:36:32 -07:00
Alex Miller 7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Trevor Clinkenbeard f38414e26d Changes error type on aborted read
Throw transaction_too_old instead of all_alternatives_failed when the
cluster file changes while a read request is outstanding
2019-06-24 10:01:41 -07:00
sramamoorthy 5abc891b12 undo the partial retry logic in NativeAPI 2019-06-24 09:36:07 -07:00
Andrew Noyes f7cd9438d3 Use camelCase instead of snake_case 2019-06-24 09:32:14 -07:00
Andrew Noyes 70ebcb3baf Fix quietDatabase timeouts
Update the implementation to interact with the new "don't maintain a
connection to the cluster controller unless necessary" change, and
unlock the originalDB at the end of the workload.
2019-06-22 14:27:30 -07:00
Alec Grieser e8c75505d3
Merge pull request #1725 from jzhou77/db-option
Add transaction size option
2019-06-21 08:25:34 -07:00
Balachandar Namasivayam 7489f83a7f Disable/Re-enable consistency check through a database key.
fdbcli has a new command 'consistencycheck' to disable/re-enable consistency check.
cluster_healthy metric in status becomes false if consistencycheck is disabled.
2019-06-20 21:38:45 -07:00