Commit Graph

341 Commits

Author SHA1 Message Date
Evan Tschannen d6d347f665 treat a tlog which takes a long time to create its disk queue as failed 2020-03-13 10:31:59 -07:00
Evan Tschannen 8129f74a10
Merge pull request #2698 from etschannen/feature-recruit-delay
The CC waits until no new workers register before starting a bad recruitment
2020-02-20 14:42:37 -08:00
A.J. Beamon fcbdcda490
Merge pull request #2650 from ajbeamon/fix-reverse-range-read-byte-limit-bug
Fix reverse range read performance bug
2020-02-20 12:47:17 -08:00
Evan Tschannen fbd45963d8 The cluster controller waits until no new workers register for 1.0 before starting a bad recruitment 2020-02-19 16:48:30 -08:00
A.J. Beamon 1d9140d874 Removed TLogVersion logging.
Added logging of SharedTLog ID for each TLog.
Switched ID logged for TLogRejoining event to the TLog instead of the SharedTLog.
Made some parameters to startRole passed by reference.
2020-02-14 12:33:43 -08:00
A.J. Beamon 56053c565b Improve TLog "Role" event by adding the worker ID, the TLog version, and under what circumstances the TLog is being started (Restored, Recruited, or Recovered).
The SharedTLog role was being started and stopped twice, so remove one instance of it.
2020-02-12 15:11:38 -08:00
A.J. Beamon df2b0452b4 Step 3 of fixing storage server range reads: change return type of readRange from VectorRef<KeyValueRef> to RangeResultRef. 2020-02-06 13:19:24 -08:00
Evan Tschannen 827cea74b5 fix: tlogs must send a recruitment reply even when actor cancelled or the recruitment endpoint will be marked as permanently failed 2020-01-16 17:37:17 -08:00
Evan Tschannen 396dccbc98 when peeking from satellites we do not need to limit the amount of peeking on log router tags, because that is the only thing that can be peeked from a satellite log 2019-11-08 18:34:05 -08:00
Evan Tschannen a8ca47beff optimized memory allocations by using VectorRef<Tag> instead of std::vector<Tag> 2019-11-05 18:07:30 -08:00
Evan Tschannen 85c315f684 Fix: parallelPeekMore was not enabled when peeking from log routers 2019-11-01 14:02:44 -07:00
Evan Tschannen 2722c8b188 avoid starting a new startSpillingActor with every TLog recruitment 2019-10-23 11:15:54 -07:00
Evan Tschannen e01e8371a6
Merge pull request #2256 from alexmiller-apple/spill-log-on-switch-6.2
Spill SharedTLog when there's more than one
2019-10-23 10:51:28 -07:00
Alex Miller 0c325c5351 Always check which SharedTLog is active
In case it is set before we get to the onChange()
2019-10-23 01:59:36 -07:00
Alex Miller 1e5b8c74e3 Continuing a parallel peek after a timeout would hang.
This is to guard against the case where

1. Peeks with sequence numbers 0-39 are submitted
2. A 15min pause happens, in which timeout removes the peek tracker data
3. Peeks with sequence numbers 40-59 are submitted, with the same peekId

The second round of peeks wouldn't have the data left that it's allowed
to start running peek 40 immediately, and thus would hang for 10min
until it gets cleaned up.

Also, guard against overflowing the sequence number.
2019-10-22 19:24:05 -07:00
Alex Miller 1eb3a70b96 Spill SharedTLog when there's more than one.
When switching between spill_type or log_version, a new instance of a
SharedTLog is created in the transaction log processes.  If this is done
in a saturated database, then doubling the amount of memory to hold
mutations in memory can cause TLogs to be uncomfortably close to the 8GB
OOM limit.

Instead, we now thread which UID of a SharedTLog is active, and the
other TLog spill out the majority of their mutations.

This is a backport of #2213 (fef89aa1) to release-6.2
2019-10-17 01:24:50 -07:00
Alex Miller 53bcf41805 Fix the build. 2019-09-12 18:46:30 -07:00
Alex Miller befa0646b3 Merge remote-tracking branch 'upstream/release-6.2' into faster-remote-dc 2019-09-12 18:46:03 -07:00
Evan Tschannen 6a7f109788 added logging on the TLog for the tag with smallest popped version 2019-09-12 16:22:01 -07:00
Alex Miller 99843bd4ba Add parallel peek support to log routers 2019-09-12 14:26:37 -07:00
Evan Tschannen dc1d055b27
Merge pull request #2042 from senthil-ram/snap_cli_fix
fix fdbcli --exec 'snapshot create.sh' failure
2019-08-30 13:40:38 -07:00
sramamoorthy b3277f2982 Fix #2009 posix compliant args for snapshot binary 2019-08-30 12:54:09 -07:00
Andrew Noyes 6aa0ada7b1 Replace scalar root types with proper messages 2019-08-28 14:40:50 -07:00
Evan Tschannen 3774ff55b0 There were still use cases where this checks are necessary 2019-07-31 17:45:21 -07:00
Evan Tschannen 854ee75664 we no longer need to special case for txs tag, because it will be initialized by createTagData 2019-07-31 17:13:15 -07:00
Evan Tschannen ff171e293e fix: always make sure to add txsTags to localTags for remote logs 2019-07-31 16:04:35 -07:00
Evan Tschannen 9f11f2ec53 Merge branch 'master' of github.com:apple/foundationdb 2019-07-30 16:55:56 -07:00
Evan Tschannen aaeeb605b2 Changes to degraded can cause master recoveries, which are not supposed to happen when speedUpSimulation is true 2019-07-30 16:33:40 -07:00
Evan Tschannen 6977e7d2e8 do not return recovered version as popped for txsTags because it could cause recovery to start over
optimized how buffered peek cursor discards popped data
2019-07-30 12:21:48 -07:00
Evan Tschannen 13203da199 fix: do not set the popped version of txsTag because it could be copied over at the recoveredAt version 2019-07-27 22:36:06 -07:00
Evan Tschannen 28df2c35bb
Merge pull request #1855 from alexmiller-apple/sharded-txs-safe-upgrade
Make sharded txsTag upgradeable and downgradeable
2019-07-26 13:29:39 -07:00
sramamoorthy 9afd162e2f remove snap v1 related code 2019-07-25 17:29:31 -07:00
sramamoorthy a65c9f92ed get rid of all timeouts and other changes 2019-07-24 15:36:28 -07:00
sramamoorthy a2f2ad96ff code review comments and merge to master changes 2019-07-24 15:36:28 -07:00
sramamoorthy 31c010b393 few minor fixes 2019-07-24 15:36:28 -07:00
sramamoorthy c73bdfad9f do not pop txsTag 2019-07-24 15:36:28 -07:00
sramamoorthy a335ed2011 includeCancelled for tLogSnapCreate 2019-07-24 15:36:28 -07:00
sramamoorthy 61cd690add enable/disable pop req with UID mis-match to fail 2019-07-24 15:36:28 -07:00
sramamoorthy f4e257e464 snap v2: TLog related changes 2019-07-24 15:36:28 -07:00
Evan Tschannen 6d694cc2ce
Merge pull request #1818 from alexmiller-apple/peek-cursor-timeout-bug
Fix parallel peek stalling for 10min when a TLog generation is destroyed
2019-07-19 16:39:31 -07:00
Alex Miller 9863ace96c Replace usages with intialization lists.
But C++ needs a bit of help to inference though the templates.
2019-07-18 22:27:36 -07:00
Alex Miller 55258709a0 Remove an ASSERT from testing and now inaccurate comment. 2019-07-17 01:30:01 -07:00
Alex Miller e9684a1f63 Fix issues configuring from sharded txs tag to not
Which is an intermingling of what should be two commits:

1. Rely on TLogVersion instead of txsTags==0

2. Copy and index sharded txsTags between KCV and RV as txsTag when
configuring log_version 4->3.
2019-07-17 01:25:09 -07:00
Alex Miller 812ce37bcd Remove buggify and unneeded safeguards.
The buggify was actually incorrect and broke an invariant, which I then
fixed on the other side, but this work was actually unneeded in total.

The real issue being fixed was returnIfBlock not sending an error, as
well as the other error cases.
2019-07-16 15:58:02 -07:00
Alex Miller 4cc60dc9b8 Merge remote-tracking branch 'upstream/master' into peek-cursor-timeout-bug 2019-07-15 17:05:39 -07:00
Alex Miller 2cbc05fc72 Address more issues that cause peek cursors to time out.
There were error cases that would cause a peek to terminate early or be
cancelled without sending anything to the next peek in line.  We would
thus end up with the first peek in a sequence waiting on its future, and
nothing that exists that would send to that future.
2019-07-15 16:03:37 -07:00
Alex Miller c8e94e601a
Merge pull request #1729 from etschannen/feature-fast-txs-recovery
Improve the recovery speed of the txnStateStore
2019-07-15 13:27:41 -07:00
Vishesh Yadav 2606794df6
Merge pull request #1812 from alexmiller-apple/improve-only-spilled
Improve the behavior of parallelPeekMore+onlySpilled.
2019-07-10 17:15:19 -07:00
Evan Tschannen d8948c8be1 Merge branch 'master' into feature-fast-txs-recovery
# Conflicts:
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2019-07-10 13:59:52 -07:00
Evan Tschannen 49121172ea
Merge pull request #1795 from alexmiller-apple/peek-from-satellites
Log Routers will prefer to peek from satellite logs.
2019-07-09 17:38:57 -07:00