2592 Commits

Author SHA1 Message Date
Evan Tschannen 4a597fdcce increase the task priority of popping 2019-11-05 15:03:41 -08:00
Evan Tschannen daac8a2c22 Knobified a few variables 2019-11-04 20:21:38 -08:00
Evan Tschannen 457896b80d remote logs use bufferedCursor when peeking from log routers to improve performance
bufferedCursor performance has been improved
2019-11-04 19:47:45 -08:00
Evan Tschannen 8a3521f945 Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-enable-parallel-peek 2019-11-01 14:04:22 -07:00
Evan Tschannen 85c315f684 Fix: parallelPeekMore was not enabled when peeking from log routers 2019-11-01 14:02:44 -07:00
A.J. Beamon 1dc5985062
Merge pull request #2305 from etschannen/release-6.2
merges crossing systemKeys.begin did not decrement systemSizeEstimate
2019-11-01 09:12:01 -07:00
Evan Tschannen 8f0348d5e0 fix: merges which cross over systemKeys.begin did not properly decrement the systemSizeEstimate 2019-10-31 16:38:33 -07:00
Evan Tschannen 5cf0045bc0
Merge pull request #2294 from satherton/feature-redwood
Bug fixes in Redwood
2019-10-25 14:56:01 -07:00
Stephen Atherton 2ee1782c19 Bug fixes in Redwood. BTree height was not being reset when a new empty root is written. IKeyValueStore wrapper was not obeying the row limit in a reverse range query. Added yields and delays to break up tasks and set IO priorities. 2019-10-25 14:52:06 -07:00
Evan Tschannen a7492aab0a fix: poppedVersion can update during a yield, so all work must be done immediately after getMore returns 2019-10-23 23:06:02 -07:00
Evan Tschannen f8e44d2f71 fix: If a storage server was offline, it would not be checked for being in an undesired dc 2019-10-23 23:04:39 -07:00
Evan Tschannen eb910b850b fixed a Windows build error 2019-10-23 13:48:24 -07:00
Evan Tschannen 2722c8b188 avoid starting a new startSpillingActor with every TLog recruitment 2019-10-23 11:15:54 -07:00
Evan Tschannen ae3f8132a7
Merge pull request #2280 from satherton/feature-redwood
Update redwood
2019-10-23 10:57:38 -07:00
Evan Tschannen 9197b03122
Merge pull request #2279 from ajbeamon/latency-band-ignore-batch
Ignore batch priority GRVs for latency band tracking
2019-10-23 10:52:44 -07:00
Evan Tschannen e01e8371a6
Merge pull request #2256 from alexmiller-apple/spill-log-on-switch-6.2
Spill SharedTLog when there's more than one
2019-10-23 10:51:28 -07:00
Evan Tschannen c1731e3b8d
Merge pull request #2276 from alexmiller-apple/fix-10min-stall-again-6.2
More fixes to prevent 10min stalls in recovering secondaries
2019-10-23 10:45:55 -07:00
A.J. Beamon a1bed51d34 Ignore batch priority GRVs for latency band tracking 2019-10-23 10:29:58 -07:00
Stephen Atherton 0e51a248b4 Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-redwood 2019-10-23 10:12:54 -07:00
Stephen Atherton 613bbaecc4 Bug fix in queue page footprint tracking. Added VersionedBTree::destroyAndCheckSanity() which clears the tree, processes the entire lazy delete queue, and then verifies some pager usage statistics. This check is currently disabled because it appears to find a bug where the final state has a few more pages in use than expected. StorageBytes now includes the delayed free list pages as free space since they will be reusable soon. 2019-10-23 09:31:06 -07:00
Alex Miller 0c325c5351 Always check which SharedTLog is active
In case it is set before we get to the onChange()
2019-10-23 01:59:36 -07:00
Alex Miller 1e5b8c74e3 Continuing a parallel peek after a timeout would hang.
This is to guard against the case where

1. Peeks with sequence numbers 0-39 are submitted
2. A 15min pause happens, in which timeout removes the peek tracker data
3. Peeks with sequence numbers 40-59 are submitted, with the same peekId

The second round of peeks would no longer have the data showing that
peek 40 is allowed to start running immediately, and thus would hang
for 10min until it gets cleaned up.

Also, guard against overflowing the sequence number.
2019-10-22 19:24:05 -07:00
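
The scenario in the commit message above can be illustrated with a small model. This is only a sketch under stated assumptions, not FoundationDB's actual implementation: the class `PeekTrackers`, the exception `PeekTimedOut`, the constant `MAX_SEQUENCE`, and the 600-second timeout are all hypothetical names and values chosen for illustration. It shows one possible guard: a peek that continues a sequence whose tracker data has already been removed fails immediately instead of waiting for a predecessor that will never arrive, and out-of-range sequence numbers are rejected.

```python
from dataclasses import dataclass

MAX_SEQUENCE = 2**31 - 1  # hypothetical upper bound on sequence numbers


class PeekTimedOut(Exception):
    """Raised when a peek continues a sequence whose tracker data is gone."""


@dataclass
class Tracker:
    next_sequence: int   # the next sequence number allowed to run
    last_update: float   # time of the most recent peek for this peekId


class PeekTrackers:
    """Hypothetical model of per-peekId tracking for parallel peeks."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.trackers = {}

    def expire(self, now):
        # Remove tracker data for every peekId that has been idle too long.
        stale = [pid for pid, t in self.trackers.items()
                 if now - t.last_update >= self.timeout_seconds]
        for pid in stale:
            del self.trackers[pid]

    def submit(self, peek_id, sequence, now):
        """Return True if this peek may run now, False if it must wait."""
        # Guard against overflowing the sequence number.
        if not 0 <= sequence < MAX_SEQUENCE:
            raise ValueError("sequence number out of range")
        tracker = self.trackers.get(peek_id)
        if tracker is None:
            if sequence != 0:
                # The tracker data was removed, so the predecessor this peek
                # would wait on can never complete. Fail fast instead.
                raise PeekTimedOut(peek_id)
            tracker = Tracker(next_sequence=0, last_update=now)
            self.trackers[peek_id] = tracker
        tracker.last_update = now
        if sequence != tracker.next_sequence:
            return False  # an earlier peek in the sequence has not run yet
        tracker.next_sequence += 1
        return True
```

In this model, submitting peeks 0-39, pausing past the timeout, and then submitting peek 40 with the same peekId raises `PeekTimedOut` right away, so the caller can restart from sequence 0 rather than stall.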
Evan Tschannen f65f0cd37a
Merge pull request #2274 from etschannen/feature-cleanup-destuidlookup
Automatically cleanup backup and DR sharing metadata
2019-10-22 19:11:23 -07:00
Alex Miller c008e7f8b3 When switching parallel->single->parallel, reset sequence and peekId
This fixes an issue where one could hang for 10min for the second
parallel peek to time out, if one happened to catch the edge of an
onlySpilled transition wrong.
2019-10-22 19:10:58 -07:00
Stephen Atherton 6a57fab431 Bug fixes in lazy subtree deletion, queue pushFront(), queue flush(), and advancing the oldest pager version. CommitSubtree no longer forces page rewrites due to boundary changes. IPager2 and IVersionedStore now have explicit async init() functions to avoid returning futures from some frequently used functions. 2019-10-22 17:17:29 -07:00
Evan Tschannen 35ac0071a8 fixed a compiler error 2019-10-22 17:06:54 -07:00
Evan Tschannen 2d74288d16 Added a comment to clarify why cleanup work is done in status 2019-10-22 16:33:44 -07:00
Evan Tschannen 3478652d06
Apply suggestions from code review
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-22 13:32:09 -07:00
Evan Tschannen d5c2147c0c
Update fdbserver/Status.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-22 13:27:52 -07:00
Evan Tschannen 2caad04d9c Keys in the destUIDLookupPrefix can be cleaned up automatically if they do not have an associated entry in the logRangesRange keyspace 2019-10-22 11:58:40 -07:00
Evan Tschannen 12c517ab16 limit the number of committed version updates in progress simultaneously to prevent running out of memory 2019-10-21 16:01:45 -07:00
Stephen Atherton 44175e0921 COWPager will no longer expire read Snapshots that are still in use. 2019-10-18 01:27:00 -07:00
Stephen Atherton 0e9d082805 Bug fixes in FIFOQueue concurrent nested reads and writes caused by the pager/freelist circular dependencies. 2019-10-17 21:34:17 -07:00
Evan Tschannen 43e99ef6a4 fix: better master exists must check if fitness is better for proxies or resolvers before looking at the count of either of them 2019-10-17 13:18:31 -07:00
Alex Miller 1eb3a70b96 Spill SharedTLog when there's more than one.
When switching between spill_type or log_version, a new instance of a
SharedTLog is created in the transaction log processes.  If this is done
in a saturated database, then doubling the amount of memory to hold
mutations in memory can cause TLogs to be uncomfortably close to the 8GB
OOM limit.

Instead, we now thread which UID of a SharedTLog is active, and the
other TLogs spill out the majority of their mutations.

This is a backport of #2213 (fef89aa1) to release-6.2
2019-10-17 01:24:50 -07:00
Evan Tschannen 42b7acf7b7
Merge pull request #2202 from etschannen/feature-share-mutations
Backup and DR would not share mutations if started on different versions of FDB
2019-10-16 20:28:39 -07:00
Evan Tschannen a81ff63147
Merge pull request #2250 from etschannen/feature-fix-proxy-slow-task
added a yield on the proxy to remove a slow task when processing large transactions
2019-10-16 20:22:05 -07:00
Evan Tschannen 587cbefe7f duplicate mutation stream checker did not have a timeout
duplicate mutation stream did not work properly when multiple ranges exist with the same begin key
2019-10-16 20:17:09 -07:00
Evan Tschannen 5be773f145
Update fdbserver/Status.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-16 16:35:24 -07:00
Evan Tschannen 2facfc090b
Update fdbserver/Status.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-16 16:35:12 -07:00
Evan Tschannen a85f69c62f
Merge pull request #2241 from etschannen/feature-recruitment-cleanup
Fixed a few small issues with recruitment logic on the cluster controller
2019-10-16 16:25:42 -07:00
Evan Tschannen 552eb44bf8
Merge pull request #2230 from ajbeamon/fix-fault-tolerance-reporting-with-remote-regions
Fix: status would fail to account for remote regions when...
2019-10-16 14:51:48 -07:00
Evan Tschannen 8b09cd16b2 Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-share-mutations 2019-10-16 14:50:37 -07:00
Evan Tschannen ac28e96bbf added a yield on the proxy to remove a slow task when processing large transactions 2019-10-16 14:31:59 -07:00
Stephen Atherton 6b7317da9b Bug and clarity fixes to tracking FIFOQueue page and item count. 2019-10-15 03:36:22 -07:00
Stephen Atherton c3e2bde987 Deferred subtree clears and expiring/reusing old pages is complete. Many bug fixes involving scheduled page freeing, page list queue flushing, and expiring old snapshots (this was mostly written but not used yet). Rewrote most of FIFOQueue (again) to more cleanly handle queue cyclical dependencies caused by having queues that use a pager which in turn uses the same queues for managing page freeing and allocation. Many debug output improvements, including making BTreePageIDs and LogicalPageIDs stringify the same way everywhere to make following a PageID easier. 2019-10-15 03:10:50 -07:00
Evan Tschannen 298b815109 one proxy or resolver with best fitness no longer prevents more proxies or resolvers from being recruited with good fitness 2019-10-14 18:32:17 -07:00
Evan Tschannen 5064d91b75 fix: the cluster controller would not change to a new set of satellite tlogs when they become available in a better satellite location 2019-10-14 18:31:23 -07:00
Evan Tschannen 35e816e9ad added the ability to configure satellite_logs by satellite location, this will overwrite the region configuration if both are present 2019-10-14 18:30:15 -07:00
Evan Tschannen 5667331729 added a buggify + minor code cleanup 2019-10-11 18:31:43 -07:00