So that we can use it from a piece of flow code without breaking module
boundaries.
Also rename generated-constants to crc32c-generated-constants so that
it's more apparent that they're related files.
Fix several such reports from ubsan
E.g.
/Users/anoyes/workspace/foundationdb/flow/Arena.h:794:16: runtime error: null pointer passed as argument 1, which is declared to never be null
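
Reports of this shape typically come from handing a null pointer to a libc routine such as memcpy together with a length of zero, for example when copying an empty range. The message does not show what line 794 contains, so the following is only a sketch of the usual guard, assuming the flagged call is a memcpy-style copy; `copyBytes` is a hypothetical helper, not a function from Arena.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not FoundationDB code): skip the libc call entirely
// when there is nothing to copy, so a null pointer paired with length 0 never
// reaches memcpy, whose pointer arguments are declared to never be null.
inline void* copyBytes(void* dst, const void* src, std::size_t n) {
    if (n == 0) {
        return dst; // empty ranges may legitimately have null begin pointers
    }
    return std::memcpy(dst, src, n);
}
```

With a guard like this, copying an empty range is a plain no-op rather than undefined behavior.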
Pushing was already a serialized, sequential operation.
Instead make it explicit that there are two waits as part of a push:
1. The setup work to reserve a spot in the file
2. The work of writing and sync'ing the data
And we return a Future<Future<Void>> to force these to be done sequentially.
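
The two-stage shape can be sketched outside of Flow with std::future. This is an illustration under stated assumptions, not the DiskQueue implementation: `ToyFile` and this `push` are hypothetical, std::future stands in for Flow's Future, and "sync" is reduced to an in-memory copy. The outer future resolves once the region is reserved; the inner one resolves once the data has been written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for the queue file.
struct ToyFile {
    std::size_t reservedEnd = 0; // next unreserved offset
    std::vector<char> bytes;     // "on-disk" contents
};

// Stage 1 (outer future): reserve a region of the file.
// Stage 2 (inner future): write the data into the reserved region.
// Deferred launch keeps the sketch single-threaded and deterministic.
std::future<std::future<void>> push(ToyFile& file, std::string data) {
    return std::async(std::launch::deferred, [&file, data = std::move(data)]() {
        const std::size_t offset = file.reservedEnd;
        file.reservedEnd += data.size();
        file.bytes.resize(file.reservedEnd);
        return std::async(std::launch::deferred, [&file, offset, data]() {
            std::copy(data.begin(), data.end(), file.bytes.data() + offset);
        });
    });
}
```

A caller gets the outer future first and can only obtain the write future by waiting on the reservation, so reservations fix the layout in order even if the writes themselves complete in a different order.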
So as to not make filesystem assumptions. This knob did technically
appear in (only the) 6.1.5 release, but this feature was broken in 6.1.5,
and thus impossible to use anyway.
And instead create a new file while incrementally truncating the old one
down. This avoids queueing up a massive number of filesystem metadata
operations in one call, thus flooding the disk with requests and
stalling out all other filesystem operations.
This sets the knobs so that a truncate of >10GB causes us to create a
new file rather than trying to truncate the old one.
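
The decision described above can be sketched as a small policy function. Only the ">10GB" figure comes from the message; the constant name, the reading of 10GB as 10 GiB, the enum, and both function names are assumptions made for illustration and are not the actual knobs.

```cpp
#include <cassert>
#include <cstdint>

enum class ShrinkAction { None, TruncateInPlace, ReplaceWithNewFile };

// Hypothetical threshold: the message only says ">10GB"; 10 GiB is assumed.
constexpr std::int64_t kMaxSingleTruncateBytes = 10LL << 30;

// Decide how to shrink a file from currentSize down to targetSize.
ShrinkAction chooseShrinkAction(std::int64_t currentSize, std::int64_t targetSize) {
    if (targetSize >= currentSize) {
        return ShrinkAction::None; // nothing to drop
    }
    const std::int64_t bytesToDrop = currentSize - targetSize;
    // A huge single truncate floods the filesystem with metadata work,
    // so beyond the threshold start a fresh file instead.
    return bytesToDrop > kMaxSingleTruncateBytes ? ShrinkAction::ReplaceWithNewFile
                                                 : ShrinkAction::TruncateInPlace;
}

// Size the old file should have after one incremental truncation step.
std::int64_t nextTruncatedSize(std::int64_t currentSize, std::int64_t stepBytes) {
    return currentSize > stepBytes ? currentSize - stepBytes : 0;
}
```

When the replacement path is taken, the old file would be cut down by repeated calls to something like `nextTruncatedSize` until it reaches zero, keeping each individual metadata operation small.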
DiskQueue shrinking was implemented for spill-by-reference, as now
a DiskQueue could grow "unboundedly" large.
Without a minimum file size, write burst workloads would cause the
DiskQueue to shrink down to 100MB, and then grow back to its usual ~4GB
size in a cycle. File growth means filesystem metadata mutations, which
we'd prefer to avoid if possible since they're more unpredictable in
terms of latency.
In a healthy cluster, the TLog never spills, so the size of a single
DiskQueue file should stay less than 2*TLOG_SPILL_THRESHOLD. In the
worst case of spill-by-value, the DiskQueue could grow to
2*TLOG_HARD_LIMIT. Therefore, having this limit will cause DiskQueue
shrinking to never behave sub-optimally for spill-by-value, and will
cause the DiskQueue files to return to the optimal size with
spill-by-reference.
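
The effect of a minimum file size on the shrink target can be sketched as a clamp. This is a minimal illustration, not the DiskQueue code: the function and parameter names are hypothetical, and the actual minimum is whatever the knob is set to.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: pick the size a queue file should shrink to.
// The file never shrinks below minFileSize, and shrinking never grows it.
std::int64_t shrinkTargetSize(std::int64_t fileSize,
                              std::int64_t bytesInUse,
                              std::int64_t minFileSize) {
    const std::int64_t target = std::max(bytesInUse, minFileSize);
    return std::min(fileSize, target);
}
```

With a floor near the file's usual steady-state size, a burst that briefly empties the queue no longer drives the file down to a tiny size only for it to grow back again.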
This time, track what location in the DiskQueue has been spilled in
persistent state, and then feed it back into the disk queue before
recovery.
This also introduces an ASSERT that recovery only reads exactly the
bytes that it needs to have in memory.
There are various ASSERT()s that assume firstPages is empty and enforce
things about `seq`. Some of these asserts have spuriously passed, since
uninitialized pages look like they have a `seq` of 0, which would be the
beginning of the disk queue.
Now they'll look like the end of the disk queue, which is far easier to
fail on.
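
Why a sentinel helps can be shown with a toy page header. The layout, the sentinel value, and the check below are assumptions for illustration only; the message does not say how the real pages are initialized, just that they now look like the end of the disk queue.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sentinel: the largest possible seq, i.e. "past the end".
constexpr std::uint64_t kUninitializedSeq = std::numeric_limits<std::uint64_t>::max();

// Hypothetical page header; a fresh page carries the sentinel instead of 0.
struct PageHeader {
    std::uint64_t seq = kUninitializedSeq;
};

// The kind of invariant an ASSERT might enforce: a page that is read back
// cannot claim a position beyond what has been written so far.
bool seqWithinWrittenRange(const PageHeader& page, std::uint64_t endOfQueueSeq) {
    return page.seq <= endOfQueueSeq;
}
```

A zero-filled page passes this check for any queue, which is how an assert can pass spuriously; a page carrying the sentinel fails it immediately.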
If the disk queue is called with the following series of operations:
Push(a) -> 1
Commit()
Pop(1)
Push(b)
Commit()
Commit()
Then the last Commit() should be a no-op and not actually write anything.
However, anyPopped was only set to `false` if no pages were pushed, and thus
we'd falsely think that an extra empty page commit needed to happen to
record the new popped position, but there actually was no new popped page
position to record.
Aside from the extra commit, this may have made getCommitOverhead slightly
inaccurate, but that's only used for some accounting inside of the memory
storage engine and at a quick glance doesn't look like it should have caused
any bad effects.
I dug through history, and this code has been this way since the initial commit
by Dave, and then no one has touched the anyPopped logic since.
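
The Push/Commit/Pop sequence above can be replayed against a toy model to see the extra commit. This is a sketch under stated assumptions, not the DiskQueue code: only the name `anyPopped` comes from the message, while `ToyQueue`, its other fields, and the `fixed` switch are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model of the commit bookkeeping.
struct ToyQueue {
    std::uint64_t nextSeq = 0;
    std::uint64_t poppedSeq = 0;
    bool anyPushed = false;
    bool anyPopped = false;
    int pagesWritten = 0;

    std::uint64_t push() { anyPushed = true; return ++nextSeq; }

    void pop(std::uint64_t upTo) {
        if (upTo > poppedSeq) { poppedSeq = upTo; anyPopped = true; }
    }

    // fixed == false models the old behavior: anyPopped is cleared only on
    // the empty-page path, i.e. when nothing was pushed.
    void commit(bool fixed) {
        if (!anyPushed && !anyPopped) return; // nothing to record: no-op
        ++pagesWritten; // the written page also records the popped position
        if (fixed || !anyPushed) anyPopped = false;
        anyPushed = false;
    }
};

// Replays: Push(a), Commit(), Pop(1), Push(b), Commit(), Commit().
int pagesWrittenForExample(bool fixed) {
    ToyQueue q;
    const std::uint64_t a = q.push();
    q.commit(fixed);
    q.pop(a);
    q.push();
    q.commit(fixed);
    q.commit(fixed);
    return q.pagesWritten;
}
```

In this model the old logic writes three pages for the sequence, the last one being an empty page that records nothing new, while the corrected logic writes two and treats the final Commit() as a no-op.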