We are currently emitting Role transition traces when a role starts and
when it ends. While this is useful for debugging, it doesn't work well
with tools that inject data and might potentially miss some trace lines.
We do decorate each trace lines with the roles assigned to that
particular process, however, this is not sufficient for tools that can
make use of the UID -> Role mapping
Parallel peek more code would prefer the begin version it was sent by
the previous parallel peek over the request's begin version. This means
that a merge cursor trying to advance past message versions would still
get old data that it would have to filter out.
A simple application of std::max fixes this.
This fixes an issue introduced in the previous patch, where pop would
immediately set `poppedLocationNeedsUpdate`, but setting the popped
version was now delayed. This means that we could:
1. Run the spill loop and persist all popped versions
2. Receive a pop, and set the poppedLocationNeedsUpdate flag
3. Run the dq-pop loop, and clear the poppedLocationNeedsUpdate flag
and now when we update the persistentPopped version again, we won't have
the flag set for dq-pop to know that it needs to scan the spilled data
again for the minLocation.
We could more carefully update the flag, but instead, I've just
converted it into a version that's kept in sync purely in the dq-pop
loop, to remove shared state between pop and the dq-pop loop.
This commit is to fix a bug where popping a tag between
updatePersistentData and popDiskQueue can cause the TLog to recover to
an incorrect understanding of what data it has available.
The following series of events need to happen to trigger this bug:
Tag 1:1 is popped to version 10
updatePersistentData is run...
updatePersistentPopped runs and we persistentData stores 1:1 as popped to 10
A mutation is spilled for 1:1 at version 11 at location 1000
A mutation is spilled for 1:1 at version 21 at location 5000
updatePersistentData finishes and commits the btree changes
Tag 1:1 is popped to version 20
popDiskQueue runs
The btree is read for spilled mutations with version >=20
The minimum location required for the disk queue is found to be location 5000
The disk queue is popped to location 5000
The TLog crashes
The worker restarts, and reloads the TLog files from disk
restorePersistentPopped restores tag 1:1 as having been popped to version 10
Parallel peeks are received for tag 1:1 starting at version 0
The first peek is less than the popped version, so we respond with no data, and an end version of 10
The second peek starts at version 10, which is greater than the popped version
The btree is read for spilled mutations, and we find that there is a mutation at version 11 at location 1000
Location 1000 is read in the DiskQueue
The resulting page read at Location 1000 was popped pre-crash, and thus
might either (a) be corrupt or (b) have an incorrect sequence number.
The fix to this is to force popDiskQueue/updatePoppedLocation to use the
popped version that was persisted to disk, and not the most recently
popped version for the given tag.
This bug doesn't manifest in simulation, because we don't have any code
that peeks at a lower version than what has been popped.
When switching between spill_type or log_version, a new instance of a
SharedTLog is created in the transaction log processes. If this is done
in a saturated database, then doubling the amount of memory to hold
mutations in memory can cause TLogs to be uncomfortably close to the 8GB
OOM limit.
Instead, we now thread which UID of a SharedTLog is active, and the
other TLog spill out the majority of their mutations.