Evan Tschannen
d6d347f665
treat a tlog which takes a long time to create its disk queue as failed
2020-03-13 10:31:59 -07:00
Evan Tschannen
8129f74a10
Merge pull request #2698 from etschannen/feature-recruit-delay
...
The CC waits until no new workers register before starting a bad recruitment
2020-02-20 14:42:37 -08:00
A.J. Beamon
fcbdcda490
Merge pull request #2650 from ajbeamon/fix-reverse-range-read-byte-limit-bug
...
Fix reverse range read performance bug
2020-02-20 12:47:17 -08:00
Evan Tschannen
fbd45963d8
The cluster controller waits until no new workers register for 1.0 before starting a bad recruitment
2020-02-19 16:48:30 -08:00
A.J. Beamon
1d9140d874
Removed TLogVersion logging.
...
Added logging of SharedTLog ID for each TLog.
Switched ID logged for TLogRejoining event to the TLog instead of the SharedTLog.
Made some parameters to startRole passed by reference.
2020-02-14 12:33:43 -08:00
A.J. Beamon
56053c565b
Improve TLog "Role" event by adding the worker ID, the TLog version, and under what circumstances the TLog is being started (Restored, Recruited, or Recovered).
...
The SharedTLog role was being started and stopped twice, so remove one instance of it.
2020-02-12 15:11:38 -08:00
A.J. Beamon
df2b0452b4
Step 3 of fixing storage server range reads: change return type of readRange from VectorRef<KeyValueRef> to RangeResultRef.
2020-02-06 13:19:24 -08:00
Evan Tschannen
827cea74b5
fix: tlogs must send a recruitment reply even when actor cancelled or the recruitment endpoint will be marked as permanently failed
2020-01-16 17:37:17 -08:00
Evan Tschannen
396dccbc98
when peeking from satellites we do not need to limit the amount of peeking on log router tags, because that is the only thing that can be peeked from a satellite log
2019-11-08 18:34:05 -08:00
Evan Tschannen
a8ca47beff
optimized memory allocations by using VectorRef<Tag> instead of std::vector<Tag>
2019-11-05 18:07:30 -08:00
Evan Tschannen
85c315f684
Fix: parallelPeekMore was not enabled when peeking from log routers
2019-11-01 14:02:44 -07:00
Evan Tschannen
2722c8b188
avoid starting a new startSpillingActor with every TLog recruitment
2019-10-23 11:15:54 -07:00
Evan Tschannen
e01e8371a6
Merge pull request #2256 from alexmiller-apple/spill-log-on-switch-6.2
...
Spill SharedTLog when there's more than one
2019-10-23 10:51:28 -07:00
Alex Miller
0c325c5351
Always check which SharedTLog is active
...
In case it is set before we get to the onChange()
2019-10-23 01:59:36 -07:00
Alex Miller
1e5b8c74e3
Continuing a parallel peek after a timeout would hang.
...
This is to guard against the case where
1. Peeks with sequence numbers 0-39 are submitted
2. A 15min pause happens, in which timeout removes the peek tracker data
3. Peeks with sequence numbers 40-59 are submitted, with the same peekId
The second round of peeks would no longer have the tracker data that
allows peek 40 to start running immediately, and thus would hang for
10min until it gets cleaned up.
Also, guard against overflowing the sequence number.
2019-10-22 19:24:05 -07:00
Alex Miller
1eb3a70b96
Spill SharedTLog when there's more than one.
...
When switching between spill_type or log_version, a new instance of a
SharedTLog is created in the transaction log processes. If this is done
in a saturated database, then doubling the amount of memory to hold
mutations in memory can cause TLogs to be uncomfortably close to the 8GB
OOM limit.
Instead, we now thread which UID of a SharedTLog is active, and the
other TLogs spill out the majority of their mutations.
This is a backport of #2213 (fef89aa1) to release-6.2
2019-10-17 01:24:50 -07:00
Alex Miller
53bcf41805
Fix the build.
2019-09-12 18:46:30 -07:00
Alex Miller
befa0646b3
Merge remote-tracking branch 'upstream/release-6.2' into faster-remote-dc
2019-09-12 18:46:03 -07:00
Evan Tschannen
6a7f109788
added logging on the TLog for the tag with smallest popped version
2019-09-12 16:22:01 -07:00
Alex Miller
99843bd4ba
Add parallel peek support to log routers
2019-09-12 14:26:37 -07:00
Evan Tschannen
dc1d055b27
Merge pull request #2042 from senthil-ram/snap_cli_fix
...
fix fdbcli --exec 'snapshot create.sh' failure
2019-08-30 13:40:38 -07:00
sramamoorthy
b3277f2982
Fix #2009 posix compliant args for snapshot binary
2019-08-30 12:54:09 -07:00
Andrew Noyes
6aa0ada7b1
Replace scalar root types with proper messages
2019-08-28 14:40:50 -07:00
Evan Tschannen
3774ff55b0
There were still use cases where these checks are necessary
2019-07-31 17:45:21 -07:00
Evan Tschannen
854ee75664
we no longer need to special case for txs tag, because it will be initialized by createTagData
2019-07-31 17:13:15 -07:00
Evan Tschannen
ff171e293e
fix: always make sure to add txsTags to localTags for remote logs
2019-07-31 16:04:35 -07:00
Evan Tschannen
9f11f2ec53
Merge branch 'master' of github.com:apple/foundationdb
2019-07-30 16:55:56 -07:00
Evan Tschannen
aaeeb605b2
Changes to degraded can cause master recoveries, which are not supposed to happen when speedUpSimulation is true
2019-07-30 16:33:40 -07:00
Evan Tschannen
6977e7d2e8
do not return recovered version as popped for txsTags because it could cause recovery to start over
...
optimized how buffered peek cursor discards popped data
2019-07-30 12:21:48 -07:00
Evan Tschannen
13203da199
fix: do not set the popped version of txsTag because it could be copied over at the recoveredAt version
2019-07-27 22:36:06 -07:00
Evan Tschannen
28df2c35bb
Merge pull request #1855 from alexmiller-apple/sharded-txs-safe-upgrade
...
Make sharded txsTag upgradeable and downgradeable
2019-07-26 13:29:39 -07:00
sramamoorthy
9afd162e2f
remove snap v1 related code
2019-07-25 17:29:31 -07:00
sramamoorthy
a65c9f92ed
get rid of all timeouts and other changes
2019-07-24 15:36:28 -07:00
sramamoorthy
a2f2ad96ff
code review comments and merge to master changes
2019-07-24 15:36:28 -07:00
sramamoorthy
31c010b393
few minor fixes
2019-07-24 15:36:28 -07:00
sramamoorthy
c73bdfad9f
do not pop txsTag
2019-07-24 15:36:28 -07:00
sramamoorthy
a335ed2011
includeCancelled for tLogSnapCreate
2019-07-24 15:36:28 -07:00
sramamoorthy
61cd690add
enable/disable pop req with UID mis-match to fail
2019-07-24 15:36:28 -07:00
sramamoorthy
f4e257e464
snap v2: TLog related changes
2019-07-24 15:36:28 -07:00
Evan Tschannen
6d694cc2ce
Merge pull request #1818 from alexmiller-apple/peek-cursor-timeout-bug
...
Fix parallel peek stalling for 10min when a TLog generation is destroyed
2019-07-19 16:39:31 -07:00
Alex Miller
9863ace96c
Replace usages with initialization lists.
...
But C++ needs a bit of help to infer through the templates.
2019-07-18 22:27:36 -07:00
Alex Miller
55258709a0
Remove an ASSERT from testing and a now-inaccurate comment.
2019-07-17 01:30:01 -07:00
Alex Miller
e9684a1f63
Fix issues configuring from sharded txs tag to not
...
Which is an intermingling of what should be two commits:
1. Rely on TLogVersion instead of txsTags==0
2. Copy and index sharded txsTags between KCV and RV as txsTag when
configuring log_version 4->3.
2019-07-17 01:25:09 -07:00
Alex Miller
812ce37bcd
Remove buggify and unneeded safeguards.
...
The buggify was actually incorrect and broke an invariant, which I then
fixed on the other side, but this work was actually unneeded in total.
The real issue being fixed was returnIfBlock not sending an error, as
well as the other error cases.
2019-07-16 15:58:02 -07:00
Alex Miller
4cc60dc9b8
Merge remote-tracking branch 'upstream/master' into peek-cursor-timeout-bug
2019-07-15 17:05:39 -07:00
Alex Miller
2cbc05fc72
Address more issues that cause peek cursors to time out.
...
There were error cases that would cause a peek to terminate early or be
cancelled without sending anything to the next peek in line. We would
thus end up with the first peek in a sequence waiting on its future, and
nothing that exists that would send to that future.
2019-07-15 16:03:37 -07:00
Alex Miller
c8e94e601a
Merge pull request #1729 from etschannen/feature-fast-txs-recovery
...
Improve the recovery speed of the txnStateStore
2019-07-15 13:27:41 -07:00
Vishesh Yadav
2606794df6
Merge pull request #1812 from alexmiller-apple/improve-only-spilled
...
Improve the behavior of parallelPeekMore+onlySpilled.
2019-07-10 17:15:19 -07:00
Evan Tschannen
d8948c8be1
Merge branch 'master' into feature-fast-txs-recovery
...
# Conflicts:
# fdbserver/TagPartitionedLogSystem.actor.cpp
2019-07-10 13:59:52 -07:00
Evan Tschannen
49121172ea
Merge pull request #1795 from alexmiller-apple/peek-from-satellites
...
Log Routers will prefer to peek from satellite logs.
2019-07-09 17:38:57 -07:00