Commit Graph

6332 Commits

Author SHA1 Message Date
Jingyu Zhou 16840453d6
Merge pull request #1876 from xumengpanda/mengxu/6.2-release-meng-PR
Add release notes for DD related changes in 6.2
2019-07-23 10:11:11 -07:00
A.J. Beamon 6e078a41a7
Merge pull request #1828 from mpilman/features/lexicographical-ordered-traces
Make trace files lexicographically ordered
2019-07-23 08:31:59 -07:00
A.J. Beamon e98cee016d Fix unsafe usage of now() function from multiple threads in trace logging. 2019-07-22 22:31:38 -07:00
Meng Xu e582219ec5 Remove unnecessary condition in DDQueue
Resolve the review comment.
2019-07-22 17:00:37 -07:00
Andrew Noyes 5cef65b6c4
Update flow/FileTraceLogWriter.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-07-22 16:38:18 -07:00
Meng Xu f003613847 Add release notes for Meng Xu changes in 6.2 2019-07-22 16:34:49 -07:00
mpilman de4fe4d640 Remove dead code and revert back to base 10
there can be at most 2^32 files with a distinct index. Therefore
`log10(index) < 10` - this means base 10 is good enough.
2019-07-22 16:20:03 -07:00
Andrew Noyes e34e5b750b Fix rst syntax 2019-07-22 16:13:55 -07:00
Andrew Noyes 2a3233a647 Update release notes 2019-07-22 16:11:11 -07:00
Jingyu Zhou 6959858b6d Add release note for large packet handling and txn size limit 2019-07-22 15:42:32 -07:00
Trevor Clinkenbeard e3a564e1a7 Added release notes for Trevor Clinkenbeard changes in 6.2 2019-07-22 14:23:46 -07:00
mpilman aa93f8411a Merge branch 'features/lexicographical-ordered-traces' of github.com:mpilman/foundationdb into features/lexicographical-ordered-traces 2019-07-22 14:22:51 -07:00
Markus Pilman a57cd7e688
Update documentation/sphinx/source/release-notes.rst
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-07-22 14:20:33 -07:00
mpilman 001bfaf80c made trace index unsigned 2019-07-22 14:20:09 -07:00
Markus Pilman a48bdc0095
Update flow/FileTraceLogWriter.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-07-22 14:16:34 -07:00
Andrew Noyes 1968db17e3 Initialize in default constructor for GetReadVersionReply
==10473== Uninitialised byte(s) found during client check request
==10473==    at 0x1BA9ACE: sendPacket(TransportData*, ISerializeSource const&, Endpoint const&, bool, bool) (FlowTransport.actor.cpp:1252)
==10473==    by 0x877C05: (anonymous namespace)::NetworkSenderActorState<GetReadVersionReply, (anonymous namespace)::NetworkSenderActor<GetReadVersionReply> >::a_body1cont2(GetReadVersionReply const&, int) [clone .isra.0] (networksender.actor.h:40)
==10473==    by 0x877CC6: a_body1when1 (networksender.actor.g.h:147)
==10473==    by 0x877CC6: a_callback_fire (networksender.actor.g.h:161)
==10473==    by 0x877CC6: ActorCallback<(anonymous namespace)::NetworkSenderActor<GetReadVersionReply>, 0, GetReadVersionReply>::fire(GetReadVersionReply const&) (flow.h:894)
==10473==    by 0xC343A7: send<GetReadVersionReply&> (flow.h:343)
==10473==    by 0xC343A7: send<GetReadVersionReply&> (fdbrpc.h:124)
==10473==    by 0xC343A7: (anonymous namespace)::ForwardProxyActorState<(anonymous namespace)::ForwardProxyActor>::a_body1loopBody1when2(ReplyPromise<GetReadVersionReply> const&, int) (MasterProxyServer.actor.cpp:1814)
==10473==    by 0xC33C10: (anonymous namespace)::ForwardProxyActorState<(anonymous namespace)::ForwardProxyActor>::a_body1loopBody1(int) (MasterProxyServer.actor.g.cpp:8167)
==10473==    by 0xC35434: a_body1loopHead1 (MasterProxyServer.actor.g.cpp:8152)
==10473==    by 0xC35434: a_body1loopBody1cont2 (MasterProxyServer.actor.g.cpp:8327)
==10473==    by 0xC35434: a_body1loopBody1cont1when1 (MasterProxyServer.actor.g.cpp:8333)
==10473==    by 0xC35434: a_body1loopBody1cont1when1 (MasterProxyServer.actor.g.cpp:8331)
==10473==    by 0xC35434: a_callback_fire (MasterProxyServer.actor.g.cpp:8347)
==10473==    by 0xC35434: ActorCallback<(anonymous namespace)::ForwardProxyActor, 3, Void>::fire(Void const&) (flow.h:894)
==10473==    by 0x7E7BE7: SAV<Void>::finishSendAndDelPromiseRef() (flow.h:375)
==10473==    by 0x8319FD: a_body1when1 (genericactors.actor.g.h:10892)
==10473==    by 0x8319FD: a_callback_fire (genericactors.actor.g.h:10920)
==10473==    by 0x8319FD: ActorCallback<(anonymous namespace)::ChooseActorActor<Void>, 0, Void>::fire(Void const&) (flow.h:894)
==10473==    by 0x891917: void SAV<Void>::send<Void>(Void&&) (flow.h:343)
==10473==    by 0x1C47ADC: send<Void> (flow.h:674)
==10473==    by 0x1C47ADC: execTask (sim2.actor.cpp:1632)
==10473==    by 0x1C47ADC: Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1cont1(Void const&, int) (sim2.actor.cpp:975)
==10473==    by 0x1C47FF2: a_body1loopBody1when1 (sim2.actor.g.cpp:5092)
==10473==    by 0x1C47FF2: Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1(int) (sim2.actor.g.cpp:5037)
==10473==    by 0x1C47A6C: a_body1loopHead1 (sim2.actor.g.cpp:5020)
==10473==    by 0x1C47A6C: Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1cont1(Void const&, int) (sim2.actor.g.cpp:5086)
==10473==  Address 0x12db1ba1 is 2,977 bytes inside a recently re-allocated block of size 4,096 alloc'd
==10473==    at 0x1CC5D7F: FastAllocator<4096>::allocate() (FastAlloc.cpp:290)
==10473==    by 0x1CFAA68: operator new (FastAlloc.h:193)
==10473==    by 0x1CFAA68: PacketWriter::nextBuffer() (Net2Packet.cpp:59)
==10473==    by 0x1CFABD6: PacketWriter::writeAhead(int, SplitBuffer*) (Net2Packet.cpp:81)
==10473==    by 0x1BA97EB: sendPacket(TransportData*, ISerializeSource const&, Endpoint const&, bool, bool) (FlowTransport.actor.cpp:1199)
==10473==    by 0x7DEAD1: a_body1cont2 (networksender.actor.h:40)
==10473==    by 0x7DEAD1: a_body1when1 (networksender.actor.g.h:147)
==10473==    by 0x7DEAD1: a_callback_fire (networksender.actor.g.h:161)
==10473==    by 0x7DEAD1: ActorCallback<(anonymous namespace)::NetworkSenderActor<GetValueReply>, 0, GetValueReply>::fire(GetValueReply const&) (flow.h:894)
==10473==    by 0xF22767: send<GetValueReply&> (flow.h:343)
==10473==    by 0xF22767: send<GetValueReply&> (fdbrpc.h:124)
==10473==    by 0xF22767: (anonymous namespace)::GetValueQActorState<(anonymous namespace)::GetValueQActor>::a_body1cont5(int) [clone .isra.0] (storageserver.actor.cpp:890)
==10473==    by 0xF2305C: (anonymous namespace)::GetValueQActorState<(anonymous namespace)::GetValueQActor>::a_body1cont3(int) [clone .isra.0] (storageserver.actor.g.cpp:1592)
==10473==    by 0xF23447: a_body1cont2when1 (storageserver.actor.g.cpp:1627)
==10473==    by 0xF23447: (anonymous namespace)::GetValueQActorState<(anonymous namespace)::GetValueQActor>::a_body1cont2(Void const&, int) [clone .isra.0] (storageserver.actor.g.cpp:1512)
==10473==    by 0xF23507: a_body1when1 (storageserver.actor.g.cpp:1523)
==10473==    by 0xF23507: a_callback_fire (storageserver.actor.g.cpp:1537)
==10473==    by 0xF23507: ActorCallback<(anonymous namespace)::GetValueQActor, 0, Void>::fire(Void const&) (flow.h:894)
==10473==    by 0x891917: void SAV<Void>::send<Void>(Void&&) (flow.h:343)
==10473==    by 0x1C47ADC: send<Void> (flow.h:674)
==10473==    by 0x1C47ADC: execTask (sim2.actor.cpp:1632)
==10473==    by 0x1C47ADC: Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1cont1(Void const&, int) (sim2.actor.cpp:975)
==10473==    by 0x1C47FF2: a_body1loopBody1when1 (sim2.actor.g.cpp:5092)
==10473==    by 0x1C47FF2: Sim2::RunLoopActorState<Sim2::RunLoopActor>::a_body1loopBody1(int) (sim2.actor.g.cpp:5037)
==10473==  Uninitialised value was created by a stack allocation
==10473==    at 0xC342D0: (anonymous namespace)::ForwardProxyActorState<(anonymous namespace)::ForwardProxyActor>::a_body1loopBody1when2(ReplyPromise<GetReadVersionReply> const&, int) (MasterProxyServer.actor.g.cpp:8213)
2019-07-22 13:54:52 -07:00
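The Valgrind report in commit 1968db17e3 traces uninitialised bytes in a sent packet back to a stack-allocated GetReadVersionReply. A minimal C++ sketch of the kind of fix the commit title describes is shown here. The struct name, its fields, and the default values are hypothetical stand-ins chosen for illustration, not the actual FoundationDB definition.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a reply message; the field names and default
// values are assumptions, not the real GetReadVersionReply layout.
struct GetReadVersionReplySketch {
    int64_t version;
    bool locked;

    // Initialize every member in the default constructor so that a
    // default-constructed reply never carries indeterminate member
    // values into serialization.
    GetReadVersionReplySketch() : version(-1), locked(false) {}
};

// Usage: a stack-allocated reply now has well-defined contents.
inline void exampleUsage() {
    GetReadVersionReplySketch reply;
    assert(reply.version == -1);
    assert(!reply.locked);
}
```

Without the constructor's initializer list, reading `version` or `locked` from a default-constructed stack object would yield indeterminate values, which is the class of problem Memcheck flags when those bytes reach a send call.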
Vishesh Yadav 2f9f3c184f
Merge pull request #1870 from alexmiller-apple/txnStateStore-workload
Add ability to bulk load data into TxnStateStore
2019-07-22 13:20:39 -07:00
A.J. Beamon e29a6ea280
Merge pull request #1871 from bnamasivayam/tr-priority-add-client-log
Track the priority of sampled Transaction as part of GetReadVersion e…
2019-07-22 13:04:52 -07:00
Balachandar Namasivayam df652155fc Addressed review comments 2019-07-22 12:17:05 -07:00
mpilman c26c68d6d2 Address review comments 2019-07-22 11:45:05 -07:00
Meng Xu b7478f5dd3 DD:Add comments to help understand code
Add comments to explain the functionalities of some code.
2019-07-22 11:23:16 -07:00
Trevor Clinkenbeard 3507bfd52f Trigger masterProxiesChangeTrigger in switchConnectionFileImpl 2019-07-22 11:13:07 -07:00
Meng Xu 378db79441 Resolve conflict when merge with master 2019-07-22 10:56:20 -07:00
mpilman a66dc937fc Make trace files lexicographically ordered
This resolves #1825

The basic idea is that we prefix every sequence number with its width,
separated by a dot. That way the filename length isn't increased
drastically (unlike using a fixed-width number with 0-padding)
and the file names will be lexicographically ordered.
2019-07-22 10:10:47 -07:00
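Commit a66dc937fc describes prefixing each trace-file sequence number with its width, and commit de4fe4d640 notes that `log10(index) < 10` for a 32-bit index. A minimal C++ sketch of that idea follows. `widthPrefixedIndex` is a hypothetical name, and using the digit count minus one as the prefix is an assumption made so that the prefix stays a single character. This is not the code from flow/FileTraceLogWriter.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: returns "<prefix>.<index>", where prefix is the
// number of base-10 digits in index minus one (assumed encoding).
std::string widthPrefixedIndex(uint32_t index) {
    std::string digits = std::to_string(index);
    // A 32-bit index has at most 10 decimal digits, so the prefix below
    // is always a single character in the range '0'..'9'.
    assert(digits.size() >= 1 && digits.size() <= 10);
    return std::to_string(digits.size() - 1) + "." + digits;
}
```

Under these assumptions, index 9 maps to `0.9` and index 10 maps to `1.10`, so the resulting strings sort in numeric order, whereas the plain strings `9` and `10` do not. Two indices with the same prefix have the same number of digits, so comparing them as strings matches comparing them as numbers.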
Meng Xu dae4436a3d TC:UnitTest:Change invariant due to alg change 2019-07-20 21:06:54 -07:00
Meng Xu 612a51fe00 Apply Clang format to PRIORITY_TEAM_REDUNDANT 2019-07-19 18:32:22 -07:00
Meng Xu ea76451f15 Count PRIORITY_TEAM_REDUNDANT as count PRIORITY_TEAM_UNHEALTHY 2019-07-19 18:30:01 -07:00
Alex Miller 4ac1a0f557 Add ability to bulk load data into TxnStateStore
* Changes BulkLoad workload to support a specific volume of data to load
* Changes BulkLoad and Cycle to correctly handle \xff in keyPrefix
* Adds BulkLoad to TxnStateStoreCycleTest
2019-07-19 18:01:24 -07:00
Evan Tschannen c70e762f0e
Merge pull request #1785 from xumengpanda/mengxu/server-team-remover-PR
Remove redundant server teams
2019-07-19 17:44:16 -07:00
Balachandar Namasivayam af267ba053 Track the priority of sampled Transaction as part of GetReadVersion event. 2019-07-19 17:31:49 -07:00
Alex Miller df7f0cffa1 Raise the priority of TLogRejoin above the default work priority.
With sharded txs tags, the master now receives data from transaction
logs at an order of magnitude higher rate.  This is the intentional
desired result of sharding the txs tag.  With a sufficient number of
TLogs, the master will saturate its CPU time handling the peek
responses.

Performance tests revealed some unstable oddities in how long a recovery
would take, which was eventually root caused to a priority inversion
between TLogRejoin requests and TLog peek replies.

Once peek replies saturate the CPU, the master would proceed to ignore
further TLogRejoin messages.  TLogRejoin is what marks a TLog as
available to the failure monitor, which is also what decides between a
ServerPeekCursor and a MergePeekCursor for a SetPeekCursor.  Ignoring
TLogRejoins meant that the sharded txs locality tags for those servers
would be merge peeked over all TLogs.  This is much less efficient than
just peeking one copy of data from the one preferred server.

Depending on the race between TLogPeek replies saturating the CPU and
TLogRejoin requests being submitted, a variable number of tags would be
affected, and thus the performance test would have some variance in its
results.
2019-07-19 16:55:04 -07:00
Meng Xu b001a9ebe8 ServerTeamRemover runs after machineTeamRemover finishes
If serverTeamRemover removes a team before machineTeamRemover brings
the machine team number down to the desired number, DD may create a new
team (due to teams removed by serverTeamRemover), which may be removed
later by machineTeamRemover. This causes unnecessary extra data movement.
2019-07-19 16:48:52 -07:00
Evan Tschannen 041531d283
Merge pull request #1864 from ajbeamon/client-thread-safety-fix
Fix thread-safety issue with connection file when creating database on client
2019-07-19 16:42:26 -07:00
Evan Tschannen 846038b0e6
Merge pull request #1858 from bnamasivayam/rk-ssfetch-throttle
Ratekeeper throttling aggressively when unable to fetch storage server list
2019-07-19 16:41:58 -07:00
Evan Tschannen d6cf18f2e1
Merge pull request #1851 from ajbeamon/fix-huge-arena-sample-thread-safety
Fix huge arena tracking thread-safety issues.
2019-07-19 16:41:03 -07:00
Evan Tschannen 6d694cc2ce
Merge pull request #1818 from alexmiller-apple/peek-cursor-timeout-bug
Fix parallel peek stalling for 10min when a TLog generation is destroyed
2019-07-19 16:39:31 -07:00
Evan Tschannen fa182befc7
Merge pull request #1863 from jzhou77/db-option
Log large transactions at proxy
2019-07-19 16:35:24 -07:00
Evan Tschannen 3045826e3c
Merge pull request #1819 from mpilman/flatbuffers-fixes2
Flatbuffers fixes2
2019-07-19 16:33:50 -07:00
Alex Miller c3a8ae4752
Merge pull request #1791 from fzhjon/fetch-keys-requests-priority
Introduce priority to fetchKeys requests from data distribution
2019-07-19 14:54:51 -07:00
Meng Xu f243e77afc Increase merge and split shard priority by 100
PRIORITY_TEAM_REDUNDANT should be in a different priority band from
PRIORITY_MERGE_SHARD and PRIORITY_SPLIT_SHARD, because
priority inversion happens within priorities in the same band.
2019-07-19 13:55:38 -07:00
Jingyu Zhou 8e9aaa767e
Merge pull request #1860 from alexmiller-apple/comma-operator
Remove `operator , (vector<T>, T)` as an append operator.
2019-07-19 12:53:32 -07:00
A.J. Beamon f6183df8b9
Merge pull request #1852 from vishesh/task/issue-1840-non-blocking-exclusion
fdbcli: Add `no_wait` option in `exclude` command to avoid blocking
2019-07-19 12:49:29 -07:00
Jingyu Zhou 63e37aebaf Reorder include files. 2019-07-19 11:24:26 -07:00
A.J. Beamon b93a08ac6f Don't wrap the connection file in a reference until after we are on the main thread because references aren't thread safe. 2019-07-19 11:16:30 -07:00
sramamoorthy 0962641540 setDDMode should set moveKeysLockWriteKey
After takeMoveKeysLock notes down the owner and the
moveKeysLockWriteKey value, it monitors the above two in
pollMoveKeysLock and checks if anything is changed, but
setDDMode was not setting the moveKeysLockWriteKey and
so a sequence like disable, enable and disable would not
really disable DD.
2019-07-19 11:13:29 -07:00
Jingyu Zhou d8fb1ea2d3 Log large transactions at proxy
This can help debugging where large transactions are coming from.
2019-07-19 11:10:48 -07:00
Jingyu Zhou fcf22cf264
Merge pull request #1861 from ajbeamon/binding-tester-fix
Size limit test only sets database defaults when run directly
2019-07-19 10:12:31 -07:00
A.J. Beamon cbc913f902 When run through external means (such as the binding tester), the size limits tests should not change db level defaults. 2019-07-19 09:03:11 -07:00
A.J. Beamon bc5c65e5ab
Merge pull request #1756 from jzhou77/db-option
Add transaction getApproximateSize() API
2019-07-19 08:33:24 -07:00
Alex Miller 9863ace96c Replace usages with initialization lists.
But C++ needs a bit of help to infer through the templates.
2019-07-18 22:27:36 -07:00