I ran this command in my build directory after compiling with
OPEN_FOR_IDE. It took a few small tweaks to get it to compile, which is
outside the scope of this commit.
$ python run-clang-tidy.py -j $(nproc) -checks='-*,performance-inefficient-vector-operation' -fix
Use `clang-tidy -p . $file -checks='-*,modernize-use-override' -header-filter='.*' -fix`
to fix missing overrides, and then use git clang-format to reformat just
those changes. This went pretty well for most files.
Formatting the following files went off the rails, so I'm going to
follow up with a commit that's just clang-tidy and no clang-format.
- fdbclient/DatabaseBackupAgent.actor.cpp
- fdbclient/FileBackupAgent.actor.cpp
- fdbserver/OldTLogServer_4_6.actor.cpp
- fdbmonitor/SimpleIni.h
- fdbserver/workloads/ClientTransactionProfileCorrectness.actor.cpp
So that we can use it from a piece of flow code without breaking module
boundaries.
Also rename generated-constants to crc32c-generated-constants so that
it's more apparent that they're related files.
Fix several such reports from ubsan
E.g.
/Users/anoyes/workspace/foundationdb/flow/Arena.h:794:16: runtime error: null pointer passed as argument 1, which is declared to never be null
Pushing was already a serialized, sequential operation.
Instead make it explicit that there are two waits as part of a push:
1. The setup work to reserve a spot on in the file
2. The work of writing and sync'ing the data
And we return a Future<Future<Void>> to force these to be done sequentially.
So as to not make filesystem assumptions. This knob did technically
appear in (only the) 6.1.5 release, but this feature was broken 6.1.5,
so thus impossible to use anyway.
And instead create a new file while incrementally truncating the old one
down. This avoids queueing up a massive number of filesystem metadata
operations in one call, thus flooding the disk with requests and
stalling out all other filesystem operations.
This sets the knobs so that a truncate of >10GB causes us to create a
new file rather than trying to truncate the old one.
DiskQueue shrinking was implemented for spill-by-reference, as now
a DiskQueue could grow "unboundedly" large.
Without a minimum file size, write burst workloads would cause the
DiskQueue to shrink down to 100MB, and then grow back to its usual ~4GB
size in a cycle. File growth means filesystem metadata mutations, which
we'd prefer to avoid if possible since they're more unpredicatble in
terms of latency.
In a healthy cluster, the TLog never spills, so the disk of a single
DiskQueue file should stay less than 2*TLOG_SPILL_THRESHOLD. In the
worst case of spill-by-value, the DiskQueue could grow to
2*TLOG_HARD_LIMIT. Therefore, having this limit will cause DiskQueue
shrinking to never behave sub-optimally for spill-by-value, and will
cause the DiskQueue files to return to the optimal size with
spill-by-reference.
This time, track what location in the DiskQueue has been spilled in
persistent state, and then feed it back into the disk queue before
recovery.
This also introduces an ASSERT that recovery only reads exactly the
bytes that it needs to have in memory.
There's various ASSERT()'s that assume firstPages is empty, and enforces
things about `seq`. Some of these asserts have spuriously passed, since
uninitialized pages look like they have a `seq` of 0, which would be the
beginning of the disk queue.
Now they'll look like the end of the disk queue, which is far easier to
fail on.