Evan Tschannen
4a597fdcce
increase the task priority of popping
2019-11-05 15:03:41 -08:00
Evan Tschannen
daac8a2c22
Knobified a few variables
2019-11-04 20:21:38 -08:00
Evan Tschannen
457896b80d
remote logs use bufferedCursor when peeking from log routers to improve performance
...
bufferedCursor performance has been improved
2019-11-04 19:47:45 -08:00
Evan Tschannen
8a3521f945
Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-enable-parallel-peek
2019-11-01 14:04:22 -07:00
Evan Tschannen
85c315f684
Fix: parallelPeekMore was not enabled when peeking from log routers
2019-11-01 14:02:44 -07:00
A.J. Beamon
1dc5985062
Merge pull request #2305 from etschannen/release-6.2
...
merges crossing systemKeys.begin did not decrement systemSizeEstimate
2019-11-01 09:12:01 -07:00
Evan Tschannen
8f0348d5e0
fix: merges which cross over systemKeys.begin did not properly decrement the systemSizeEstimate
2019-10-31 16:38:33 -07:00
Evan Tschannen
5cf0045bc0
Merge pull request #2294 from satherton/feature-redwood
...
Bug fixes in Redwood
2019-10-25 14:56:01 -07:00
Stephen Atherton
2ee1782c19
Bug fixes in Redwood. BTree height was not being reset when a new empty root is written. IKeyValueStore wrapper was not obeying the row limit in a reverse range query. Added yields and delays to break up tasks and set IO priorities.
2019-10-25 14:52:06 -07:00
Evan Tschannen
a7492aab0a
fix: poppedVersion can update during a yield, so all work must be done immediately after getMore returns
2019-10-23 23:06:02 -07:00
Evan Tschannen
f8e44d2f71
fix: If a storage server was offline, it would not be checked for being in an undesired dc
2019-10-23 23:04:39 -07:00
Evan Tschannen
eb910b850b
fixed a Windows build error
2019-10-23 13:48:24 -07:00
Evan Tschannen
2722c8b188
avoid starting a new startSpillingActor with every TLog recruitment
2019-10-23 11:15:54 -07:00
Evan Tschannen
ae3f8132a7
Merge pull request #2280 from satherton/feature-redwood
...
Update redwood
2019-10-23 10:57:38 -07:00
Evan Tschannen
9197b03122
Merge pull request #2279 from ajbeamon/latency-band-ignore-batch
...
Ignore batch priority GRVs for latency band tracking
2019-10-23 10:52:44 -07:00
Evan Tschannen
e01e8371a6
Merge pull request #2256 from alexmiller-apple/spill-log-on-switch-6.2
...
Spill SharedTLog when there's more than one
2019-10-23 10:51:28 -07:00
Evan Tschannen
c1731e3b8d
Merge pull request #2276 from alexmiller-apple/fix-10min-stall-again-6.2
...
More fixes to prevent 10min stalls in recovering secondaries
2019-10-23 10:45:55 -07:00
A.J. Beamon
a1bed51d34
Ignore batch priority GRVs for latency band tracking
2019-10-23 10:29:58 -07:00
Stephen Atherton
0e51a248b4
Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-redwood
2019-10-23 10:12:54 -07:00
Stephen Atherton
613bbaecc4
Bug fix in queue page footprint tracking. Added VersionedBTree::destroyAndCheckSanity() which clears the tree, processes the entire lazy delete queue, and then verifies some pager usage statistics. This check is currently disabled because it appears to find a bug where the final state has a few more pages in use than expected. StorageBytes now includes the delayed free list pages as free space since they will be reusable soon.
2019-10-23 09:31:06 -07:00
Alex Miller
0c325c5351
Always check which SharedTLog is active
...
In case it is set before we get to the onChange()
2019-10-23 01:59:36 -07:00
Alex Miller
1e5b8c74e3
Continuing a parallel peek after a timeout would hang.
...
This is to guard against the case where
1. Peeks with sequence numbers 0-39 are submitted
2. A 15min pause happens, in which timeout removes the peek tracker data
3. Peeks with sequence numbers 40-59 are submitted, with the same peekId
The second round of peeks would no longer have the tracker data showing
that peek 40 is allowed to start immediately, and thus would hang for
10min until it gets cleaned up.
Also, guard against overflowing the sequence number.
2019-10-22 19:24:05 -07:00
Evan Tschannen
f65f0cd37a
Merge pull request #2274 from etschannen/feature-cleanup-destuidlookup
...
Automatically cleanup backup and DR sharing metadata
2019-10-22 19:11:23 -07:00
Alex Miller
c008e7f8b3
When switching parallel->single->parallel, reset sequence and peekId
...
This fixes an issue where one could hang for 10min for the second
parallel peek to time out, if one happened to catch the edge of an
onlySpilled transition wrong.
2019-10-22 19:10:58 -07:00
Stephen Atherton
6a57fab431
Bug fixes in lazy subtree deletion, queue pushFront(), queue flush(), and advancing the oldest pager version. CommitSubtree no longer forces page rewrites due to boundary changes. IPager2 and IVersionedStore now have explicit async init() functions to avoid returning futures from some frequently used functions.
2019-10-22 17:17:29 -07:00
Evan Tschannen
35ac0071a8
fixed a compiler error
2019-10-22 17:06:54 -07:00
Evan Tschannen
2d74288d16
Added a comment to clarify why cleanup work is done in status
2019-10-22 16:33:44 -07:00
Evan Tschannen
3478652d06
Apply suggestions from code review
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-22 13:32:09 -07:00
Evan Tschannen
d5c2147c0c
Update fdbserver/Status.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-22 13:27:52 -07:00
Evan Tschannen
2caad04d9c
Keys in the destUIDLookupPrefix can be cleaned up automatically if they do not have an associated entry in the logRangesRange keyspace
2019-10-22 11:58:40 -07:00
Evan Tschannen
12c517ab16
limit the number of committed version updates in progress simultaneously to prevent running out of memory
2019-10-21 16:01:45 -07:00
Stephen Atherton
44175e0921
COWPager will no longer expire read Snapshots that are still in use.
2019-10-18 01:27:00 -07:00
Stephen Atherton
0e9d082805
Bug fixes in FIFOQueue concurrent nested reads and writes caused by the pager/freelist circular dependencies.
2019-10-17 21:34:17 -07:00
Evan Tschannen
43e99ef6a4
fix: better master exists must check if fitness is better for proxies or resolvers before looking at the count of either of them
2019-10-17 13:18:31 -07:00
Alex Miller
1eb3a70b96
Spill SharedTLog when there's more than one.
...
When switching between spill_type or log_version, a new instance of a
SharedTLog is created in the transaction log processes. If this is done
in a saturated database, then doubling the amount of memory to hold
mutations in memory can cause TLogs to be uncomfortably close to the 8GB
OOM limit.
Instead, we now thread which UID of a SharedTLog is active, and the
other TLogs spill out the majority of their mutations.
This is a backport of #2213 (fef89aa1) to release-6.2
2019-10-17 01:24:50 -07:00
Evan Tschannen
42b7acf7b7
Merge pull request #2202 from etschannen/feature-share-mutations
...
Backup and DR would not share mutations if started on different versions of FDB
2019-10-16 20:28:39 -07:00
Evan Tschannen
a81ff63147
Merge pull request #2250 from etschannen/feature-fix-proxy-slow-task
...
added a yield on the proxy to remove a slow task when processing large transactions
2019-10-16 20:22:05 -07:00
Evan Tschannen
587cbefe7f
duplicate mutation stream checker did not have a timeout
...
duplicate mutation stream did not work properly when multiple ranges exist with the same begin key
2019-10-16 20:17:09 -07:00
Evan Tschannen
5be773f145
Update fdbserver/Status.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-16 16:35:24 -07:00
Evan Tschannen
2facfc090b
Update fdbserver/Status.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-16 16:35:12 -07:00
Evan Tschannen
a85f69c62f
Merge pull request #2241 from etschannen/feature-recruitment-cleanup
...
Fixed a few small issues with recruitment logic on the cluster controller
2019-10-16 16:25:42 -07:00
Evan Tschannen
552eb44bf8
Merge pull request #2230 from ajbeamon/fix-fault-tolerance-reporting-with-remote-regions
...
Fix: status would fail to account for remote regions when...
2019-10-16 14:51:48 -07:00
Evan Tschannen
8b09cd16b2
Merge branch 'release-6.2' of github.com:apple/foundationdb into feature-share-mutations
2019-10-16 14:50:37 -07:00
Evan Tschannen
ac28e96bbf
added a yield on the proxy to remove a slow task when processing large transactions
2019-10-16 14:31:59 -07:00
Stephen Atherton
6b7317da9b
Bug and clarity fixes to tracking FIFOQueue page and item count.
2019-10-15 03:36:22 -07:00
Stephen Atherton
c3e2bde987
Deferred subtree clears and expiring/reusing old pages is complete. Many bug fixes involving scheduled page freeing, page list queue flushing, and expiring old snapshots (this was mostly written but not used yet). Rewrote most of FIFOQueue (again) to more cleanly handle queue cyclical dependencies caused by having queues that use a pager which in turn uses the same queues for managing page freeing and allocation. Many debug output improvements, including making BTreePageIDs and LogicalPageIDs stringify the same way everywhere to make following a PageID easier.
2019-10-15 03:10:50 -07:00
Evan Tschannen
298b815109
one proxy or resolver with best fitness no longer prevents more proxies or resolvers from being recruited with good fitness
2019-10-14 18:32:17 -07:00
Evan Tschannen
5064d91b75
fix: the cluster controller would not change to a new set of satellite tlogs when they become available in a better satellite location
2019-10-14 18:31:23 -07:00
Evan Tschannen
35e816e9ad
added the ability to configure satellite_logs by satellite location; this will override the region configuration if both are present
2019-10-14 18:30:15 -07:00
Evan Tschannen
5667331729
added a buggify + minor code cleanup
2019-10-11 18:31:43 -07:00