This patch makes it possible to override knob values for a given test,
e.g.
[[test]]
[[test.knobs]]
watch_timeout = 999
sets the client knob WATCH_TIMEOUT to 999 during the test. The
original value is restored after the test finishes.
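A minimal sketch of the override-and-restore behavior described above, assuming an RAII guard whose lifetime matches the test. `Knobs` and `KnobOverrideGuard` are hypothetical stand-ins, not the actual FDB knob classes; knob names are stored lowercase to match the TOML keys.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical registry standing in for the real client knobs.
struct Knobs {
  std::map<std::string, long> values;
};

// RAII guard: applies per-test overrides on construction and restores the
// original values on destruction, i.e. when the test's scope ends.
class KnobOverrideGuard {
public:
  KnobOverrideGuard(Knobs& knobs, const std::map<std::string, long>& overrides)
      : knobs_(knobs) {
    // Validate every name first so a bad entry leaves the knobs untouched.
    for (const auto& entry : overrides) {
      if (knobs_.values.count(entry.first) == 0) {
        throw std::invalid_argument("unknown knob: " + entry.first);
      }
    }
    for (const auto& entry : overrides) {
      long& current = knobs_.values.at(entry.first);
      saved_[entry.first] = current;  // remember the original value
      current = entry.second;         // apply the override
    }
  }
  ~KnobOverrideGuard() {
    for (const auto& entry : saved_) knobs_.values[entry.first] = entry.second;
  }
  KnobOverrideGuard(const KnobOverrideGuard&) = delete;
  KnobOverrideGuard& operator=(const KnobOverrideGuard&) = delete;

private:
  Knobs& knobs_;
  std::map<std::string, long> saved_;
};
```

Validating before applying keeps the knobs consistent if a test file names a knob that does not exist.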
* Set default for USE_JEMALLOC initially in ConfigureCompiler
Instead of trying to change the value later on. This fixes the valgrind
build, which was previously incorrectly getting jemalloc involved.
* Check aligned_alloc result for null
If it is null, report OOM instead of asserting.
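A minimal sketch of the null check, assuming C++17 `std::aligned_alloc`. Throwing `std::bad_alloc` is a stand-in for FDB's own OOM handling; `allocateAligned` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Check aligned_alloc's result for null and report out-of-memory instead of
// asserting. Assumes 'alignment' is a power of two supported by the platform.
void* allocateAligned(std::size_t alignment, std::size_t size) {
  // aligned_alloc expects size to be a multiple of alignment; round up.
  std::size_t rounded = (size + alignment - 1) / alignment * alignment;
  void* p = std::aligned_alloc(alignment, rounded);
  if (p == nullptr) {
    throw std::bad_alloc();  // OOM path instead of an assertion failure
  }
  return p;
}
```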
* Check that we can allocate magazines with no internal fragmentation
We may want to do this so that the jemalloc heap profiler has some
knowledge of FastAlloc
* Populate TestFile field for noSim tests in TestHarness
* Remove handling for nonexistent "ActualRun"
The macOS warnings are format warnings, e.g., `format specifies type 'long' but the argument has type 'Version' (aka 'long long')`.
The Windows warnings are `ACTOR does not contain a wait() statement`.
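One portable way to silence this kind of format warning, sketched under the assumption that `Version` is a 64-bit signed integer (`long long` on macOS, `long` on Linux): use the `PRId64` macro from `<cinttypes>` instead of a hard-coded `%ld`. `formatVersion` is a hypothetical helper.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Assumption for illustration: Version is int64_t.
using Version = int64_t;

std::string formatVersion(Version v) {
  char buf[32];
  // PRId64 expands to the correct conversion for int64_t on each platform,
  // so the same format string compiles cleanly on macOS and Linux.
  std::snprintf(buf, sizeof(buf), "%" PRId64, v);
  return buf;
}
```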
* Fix a benign bug turned up by _GLIBCXX_DEBUG
Merely calling std::vector::operator[] with an out-of-bounds index is
undefined behavior, even if the result is never dereferenced.
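An illustrative sketch of the pattern (the actual call site is not named here): forming an end pointer with `&v[v.size()]` calls `operator[]` out of bounds, which `_GLIBCXX_DEBUG` rejects, while `v.data() + v.size()` is well defined. `sumRange` is hypothetical.

```cpp
#include <cassert>
#include <vector>

int sumRange(const std::vector<int>& v) {
  const int* begin = v.data();
  // Well-defined one-past-the-end pointer, instead of &v[v.size()].
  const int* end = v.data() + v.size();
  int total = 0;
  for (const int* p = begin; p != end; ++p) total += *p;
  return total;
}
```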
* Fix compilation issue with _GLIBCXX_DEBUG
For some reason, std::max with an initializer list isn't constexpr when
_GLIBCXX_DEBUG is set.
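A sketch of one possible workaround, assuming only the initializer-list overload loses `constexpr` under `_GLIBCXX_DEBUG` while the two-argument overload keeps it: nest two-argument `std::max` calls. `kMaxSize` is a hypothetical constant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Nested two-argument std::max instead of
// std::max({sizeof(int), sizeof(double), sizeof(long long)}).
constexpr std::size_t kMaxSize =
    std::max(sizeof(int), std::max(sizeof(double), sizeof(long long)));

static_assert(kMaxSize >= sizeof(double), "kMaxSize covers every listed type");
```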
* Add contrib/debug_determinism
Add an instrumentation-based technique for debugging unseen mismatches. Also guard a few existing sources of nondeterminism that don't affect unseen with the DEBUG_DETERMINISM macro.
Also change the simulated run loop to not run as the only task inside the real run loop, since that was a source of nondeterminism.
Also fix nondeterminism from calling timer_int
* Add StorageMetadataType::currentTime
Basically a deterministic-in-simulation version of timer_int that we can
use instead of timer_int for StorageMetadataType::createdTime
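A minimal sketch of the idea, assuming the simulator exposes a flag and a simulated clock. `TimeSource` and `currentTimeSeconds` are hypothetical; in the real code the simulation state comes from the runtime, and the units of `timer_int` are not assumed here (this sketch uses seconds).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for simulator state.
struct TimeSource {
  bool inSimulation = false;
  double simulatedNowSeconds = 0.0;  // advanced only by the simulator
};

// Deterministic in simulation (depends only on simulated time),
// wall-clock seconds otherwise.
int64_t currentTimeSeconds(const TimeSource& ts) {
  if (ts.inSimulation) {
    return static_cast<int64_t>(ts.simulatedNowSeconds);
  }
  auto now = std::chrono::system_clock::now().time_since_epoch();
  return std::chrono::duration_cast<std::chrono::seconds>(now).count();
}
```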
* Update StreamCipher ctx/cipher management to respect determinism
StreamCipher keeps a record of the CipherKeys it creates
(including globalCipherKey) to ensure the sensitive data gets
zeroed out and is not recorded as part of an FDB process dump. However,
the current code maintains them in an unordered_set indexed
by the object itself. Because that indexing is based on object
pointers, the approach adds nondeterminism.
This patch addresses the concern by updating the recording to use
a map indexed by UID.
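A sketch of a UID-keyed registry under stated assumptions: `UID`, `CipherKey`, and `CipherKeyRegistry` are simplified hypothetical types, not FDB's. The point is that `std::map` visits entries in key order, so cleanup order no longer depends on pointer values.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical UID: two 64-bit halves, ordered lexicographically.
struct UID {
  uint64_t first = 0;
  uint64_t second = 0;
  bool operator<(const UID& o) const {
    return first != o.first ? first < o.first : second < o.second;
  }
};

struct CipherKey {
  std::array<uint8_t, 32> bytes{};  // 256-bit key material
  // Production code would typically use a wipe the compiler cannot elide,
  // such as OPENSSL_cleanse.
  void zeroOut() { bytes.fill(0); }
};

class CipherKeyRegistry {
public:
  CipherKey& create(const UID& id) {
    auto& slot = keys_[id];
    if (!slot) slot = std::make_unique<CipherKey>();
    return *slot;
  }
  const CipherKey* find(const UID& id) const {
    auto it = keys_.find(id);
    return it == keys_.end() ? nullptr : it->second.get();
  }
  std::vector<UID> idsInOrder() const {
    std::vector<UID> ids;
    for (const auto& entry : keys_) ids.push_back(entry.first);
    return ids;
  }
  // Visits keys in UID order: deterministic across runs.
  void zeroAll() {
    for (auto& entry : keys_) entry.second->zeroOut();
  }

private:
  std::map<UID, std::unique_ptr<CipherKey>> keys_;
};
```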
* add storagemetadata
* add StorageWiggler;
* fix serverMetadataKey bug
* add metadata tracker in storage tracker
* finish StorageWiggler
* update next storage ID
* change pid to server id
* write metadata when seeding SS
* add status json fields
* remove pid based ppw iteration
* fix time expression
* fix tss metadata nonexistence; fix transaction retry when retrieving metadata
* fix checkMetadata bug when store type is wrong
* fix remove storage status json
* format code
* refactor updateNextWigglingStoragePID
* separate storage metadata tracker and store type tracker
* rename pid
* wiggler stats
* fix competition between waitServerListChange and storageRecruiter
* resolve review comments
* rename system key
* fix database lock timeout by adding lock_aware
* format code
* status json
* resolve code format/naming comments
* delete expireNow; change PerpetualStorageWiggleID's value to KeyBackedObjectMap<UID, StorageWiggleValue>
* fix omitted start round
* format code
* status json reset
* solve status json format
* improve status json latency; replace binarywriter/reader with objectwriter/reader; refactor storagewigglerstats transactions
* status timestamp
* add knob for trace event severity
* add knob for TraceEvent severity
* fix format
* fix switch format
* moved intToSeverity call inside __test initialization
* updated knob name
* fix line length format
* fix format
* git clang-format
* Upgrade AES 128 GCM -> AES 256, StreamCipher code refactor
Major changes proposed are:
1. Refactor StreamCipher code to enable instantiation of
multiple encryption keys. However, the code still retains
the globalEncryption key semantics used in the backup file
encryption use case.
2. Enhance StreamCipher to provide HMAC signature digest
generation. Further, the class implements an HMAC-based encryption
key derivation function.
3. Upgrade StreamCipher to use AES 256 GCM mode from currently
supported AES 128 GCM mode.
Note: the code changes the encryption key size; however, the
feature is NOT currently in use, so this should be OK.
4. Add an EncryptionOps validation and benchmark workload, usable from
TOML test files, that does the following:
workload, it does the following:
a. Allow user to configure encrypt-decrypt of a fixed size
buffer or variable size buffer [100, 512K]
b. Allow the user to configure the number of iterations. In each
iteration: generate random data, derive an encryption
key using the HMAC SHA256 method, encrypt the data, and
then decrypt it. It collects the following metrics:
i) time taken to derive encryption key.
ii) time taken to encrypt the buffer.
iii) time taken to decrypt the buffer.
iv) total bytes encrypted and/or decrypted
c. Along with the stats, it performs basic validation of the encrypted
and decrypted buffers.
d. On test completion, it records the above metrics
in trace files.
In unit and simulation testing, calls to pthread_setname_np may return errors,
as the threads may complete before the calls to setname can be executed. This change
adds better error handling for cases where ENOENT or ESRCH is returned during testing.
Previously, the ASSERT_EQ caused tests to fail whenever a non-zero return value was encountered.
This change traces at SevWarn when ENOENT or ESRCH is encountered; otherwise,
it traces at SevError and throws a platform_error.
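A sketch of the policy above as a pure function, so it can be exercised without creating threads. `handleSetNameResult` and `SetNameOutcome` are hypothetical; the caller would trace at SevWarn on `Warn`, and `std::system_error` stands in for `platform_error`.

```cpp
#include <cassert>
#include <cerrno>
#include <system_error>

enum class SetNameOutcome { Ok, Warn };

// Intended use (hypothetical): handleSetNameResult(pthread_setname_np(t, name));
SetNameOutcome handleSetNameResult(int rc) {
  if (rc == 0) return SetNameOutcome::Ok;
  // The thread may already have completed: benign in tests, so only warn.
  if (rc == ENOENT || rc == ESRCH) return SetNameOutcome::Warn;
  // Any other error is a real failure.
  throw std::system_error(rc, std::generic_category(), "pthread_setname_np");
}
```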
* Use localhost cluster for trace_partial_file_suffix_test
This way we get a predictable 127.0.0.1 in the trace file name
* Skip suspend test if pidof is not available
* Avoid writing to closed trace log
Calling fdb_network_stop sends a "close" message to the trace thread,
but the network thread might still be running and sending "flush"
messages to the trace thread. This change ignores any
flushes that arrive after a close.
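A minimal sketch of the message handling described above. `TraceWriterSketch` is hypothetical and records actions in a vector instead of touching a real log file.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Once "close" has been processed, later "flush" messages are ignored
// instead of writing to the closed log.
class TraceWriterSketch {
public:
  void onMessage(const std::string& msg) {
    if (msg == "close") {
      closed_ = true;
      actions_.push_back("closed");
    } else if (msg == "flush") {
      if (closed_) return;  // ignore flushes that arrive after close
      actions_.push_back("flushed");
    }
  }
  const std::vector<std::string>& actions() const { return actions_; }

private:
  bool closed_ = false;
  std::vector<std::string> actions_;
};
```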
* Ensure unique ports for multi-process tests
1. Introduce processDiskReadSeconds and processDiskWriteSeconds, which stand for the disk read/write times since the last logging. They can only be obtained on Linux and macOS, and are 0 on Windows and FreeBSD;
2. Rename `busyTicks` to `IOMilliSecs`;
3. On FreeBSD, the metrics should be collected across all devices.
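A sketch of how "time since the last logging" can be derived, assuming the OS exposes cumulative milliseconds spent reading and writing (Linux reports these in `/proc/diskstats`). `DiskTimeTracker` and its types are hypothetical; the first sample is measured from zero.

```cpp
#include <cassert>
#include <cstdint>

struct DiskTimeSample { uint64_t readMs = 0; uint64_t writeMs = 0; };
struct DiskTimeDelta { double readSeconds = 0; double writeSeconds = 0; };

class DiskTimeTracker {
public:
  // Report the increase since the previous sample, in seconds.
  DiskTimeDelta update(const DiskTimeSample& now) {
    DiskTimeDelta d;
    d.readSeconds = (now.readMs - last_.readMs) / 1000.0;
    d.writeSeconds = (now.writeMs - last_.writeMs) / 1000.0;
    last_ = now;
    return d;
  }

private:
  DiskTimeSample last_;
};
```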
* Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine""
Major changes include:
1. Re-revert Sequencer refactor commits listed below (in listed order):
1.a. This reverts commit bb17e194d9.
1.b. This reverts commit d174bb2e06.
1.c. This reverts commit 30b05b469c.
2. Update Status.actor to use the ClusterController interface to track
recovery status.
3. Introduce a ServerKnob to define the "cluster recovery trace event"
prefix; for now it is kept as "Master", but it should allow a
smooth transition to a "Cluster" prefix, which seems more appropriate.
At present, the cluster recovery process consists of the following steps:
1. The ClusterController clusterWatchDatabase actor recruits the
master/sequencer process.
2. The sequencer process implements the cluster recovery state machine
and is responsible for recruiting all other processes as well as
restoring the cluster state.
This patch proposes a scheme where the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the Sequencer process.
Advantages of the scheme could be:
1. A simplified design where the ClusterController recruits the "sequencer"
process like other worker processes, compared to the current scheme
where the "sequencer" process gets special treatment. In the new scheme,
the sequencer is responsible for maintaining/providing the
"committed version" (as expected).
2. The ClusterController is responsible for recruiting worker processes,
whereas the sequencer, although it orchestrates the recovery state
machine, needs to reach out to the ClusterController to recruit worker
processes, etc.
NOTE:
This patch moves the recovery state machine code from the
'sequencer' to the 'cluster-controller' process; however, it is not a
pure move: necessary updates were made for both functionality and
performance-improvement reasons.
Next Steps:
Cluster recovery documentation will be updated in near future.
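A deliberately simplified sketch of the proposed ownership: the ClusterController drives the recovery phases and recruits the sequencer through the same path as any other role. The phase names, role list, and `ClusterControllerSketch` type are hypothetical and do not mirror FDB's actual recovery states or recruitment RPCs.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

enum class RecoveryPhase { ReadingState, Recruiting, RecoveryTransaction, AcceptingCommits };

struct ClusterControllerSketch {
  RecoveryPhase phase = RecoveryPhase::ReadingState;
  std::vector<std::string> recruited;

  // The sequencer goes through the same recruitment path as other roles.
  void recruit(const std::string& role) { recruited.push_back(role); }

  // Advance the state machine by one phase.
  void step() {
    switch (phase) {
      case RecoveryPhase::ReadingState:
        phase = RecoveryPhase::Recruiting;
        break;
      case RecoveryPhase::Recruiting:
        for (const char* role : {"sequencer", "commit_proxy", "grv_proxy", "resolver", "tlog"}) {
          recruit(role);
        }
        phase = RecoveryPhase::RecoveryTransaction;
        break;
      case RecoveryPhase::RecoveryTransaction:
        phase = RecoveryPhase::AcceptingCommits;
        break;
      case RecoveryPhase::AcceptingCommits:
        break;  // recovery complete
    }
  }
};
```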
We had been disabling -Wdelete-non-virtual-dtor, because this seems to be done intentionally in the generated code of the actor compiler. I spent some time trying to rewrite it in a way that doesn't literally delete/destroy through a pointer to a base class without a virtual destructor, but I was unable to come up with something that passes correctness. My best guess is that we do this so that we can destroy actor state classes, call callbacks registered on the actor SAV, and then destroy the SAV.
Anyway, we'll now detect new instances of deleting through a pointer to a base class without a virtual destructor.
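A small illustration of what `-Wdelete-non-virtual-dtor` guards against: deleting a derived object through a pointer to a polymorphic base is undefined behavior unless the base destructor is virtual. The types here are hypothetical; with the virtual destructor, `~Derived` runs as expected.

```cpp
#include <cassert>
#include <memory>

int g_derivedDestroyed = 0;

struct Base {
  virtual ~Base() = default;  // without 'virtual', the delete below is UB and warns
  virtual int id() const { return 0; }
};

struct Derived : Base {
  ~Derived() override { ++g_derivedDestroyed; }
  int id() const override { return 1; }
};

void destroyThroughBase() {
  std::unique_ptr<Base> p = std::make_unique<Derived>();  // deleted via Base*
}
```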
* Unify flags implementation and change help text in backup.actor.cpp
Description
Testing
* Keep LOG_GROUP unchanged
* Convert hyphens to underscores in internal options and the user's input, EXCEPT leading hyphens
* Use a deep copy of the user's input flag to do the match
* Convert the _ to - in Option arrays of backup.actor.cpp
* Convert _ to - in files:
TLSConfig.actor.h, fdbcli.actor.cpp, fdbserver.actor.cpp, FileConverter.h, FileConverter.cpp
* Use another way to unify flags: use SO_O_ICASE_HYPHEN_AND_UNDERSCORE to determine whether to do the conversion in function IsEqual
* Change the config command's name from SO_O_ICASE_HYPHEN_AND_UNDERSCORE to SO_O_HYPHEN_TO_UNDERSCORE
* Update the comment for the SO_O_HYPHEN_TO_UNDERSCORE
* Fix leftover underscores in SOption arrays
* Convert _ to - in several files for commands
* Make the FDBService and fdbmonitor backward compatible
* Fix bugs about pointers
* Check underscore and hyphen at the same time for --knob_, --locality_ and --test_
And fix bugs in fdbmonitor and FDBService
* Simplify the function in fdbmonitor and FDBService about retrieving arguments.
And fix some documents in masterserver.actor.cpp
* Convert _ to - for knob in the setKnob functions
* Convert - to _ in the setKnob functions
This is needed because keys in the knob-related maps contain only _.
* Rename variables in fdbmonitor and FDBService for clarity
Co-authored-by: Chang Liu <chang.liu@snowflake.com>
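A minimal sketch of hyphen/underscore-insensitive flag matching that leaves leading hyphens alone and works on a copy of the user's input. `normalizeFlag` and `flagsEqual` are hypothetical helpers, not the SimpleOpt `IsEqual` implementation.

```cpp
#include <cassert>
#include <string>

std::string normalizeFlag(const std::string& flag) {
  std::string out = flag;  // copy: the user's input is not modified
  std::string::size_type i = out.find_first_not_of('-');  // skip leading hyphens
  if (i == std::string::npos) return out;
  for (; i < out.size(); ++i) {
    if (out[i] == '-') out[i] = '_';
  }
  return out;
}

bool flagsEqual(const std::string& a, const std::string& b) {
  return normalizeFlag(a) == normalizeFlag(b);
}
```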
1. Add a trace event when a database is created and move the cluster file / connection string from ClientStart to the new trace event
2. Add a detail for the path to the image being loaded
3. Add a detail for whether a client library is primary or not
4. Set a thread name for each external client thread that includes the release version
This patch improves handling of scenarios where either the commit or GRV
proxies value is updated to -1 OR `proxies_count` is reset.
The code splits the proxies between the two proxy types, ensuring that for an
invalid input configuration the minimum (read as 1) number of proxies gets
provisioned; otherwise, the split is based on the input values.
The patch also handles the scenario where the values supplied by a mutation to
update grv_proxies and/or commit_proxies are -1 but the total proxy count
is > 1: it uses DEFAULT_COMMIT_GRV_PROXIES_RATIO to split proxies between
grv_proxies and commit_proxies.
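A sketch of a ratio-based split, assuming the ratio means "commit proxies per GRV proxy". The actual value of DEFAULT_COMMIT_GRV_PROXIES_RATIO is not assumed here, so it is passed in; `splitProxies` is hypothetical. Totals below 2 fall back to the minimum of one proxy of each kind.

```cpp
#include <algorithm>
#include <cassert>

struct ProxySplit { int grv; int commit; };

ProxySplit splitProxies(int totalProxies, int commitPerGrvRatio) {
  if (totalProxies < 2) return {1, 1};  // invalid input: provision the minimum
  int ratio = std::max(1, commitPerGrvRatio);
  int grv = std::max(1, totalProxies / (ratio + 1));
  int commit = totalProxies - grv;  // at least 1 because grv <= total / 2
  return {grv, commit};
}
```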
This eliminates many useless warnings when compiling.
`#pragma message: The practice of declaring the Bind placeholders (_1, _2, ...) in the global namespace is deprecated. Please use <boost/bind/bind.hpp> + using namespace boost::placeholders, or define BOOST_BIND_GLOBAL_PLACEHOLDERS to retain the current behavior.`
Transactions (created on a separate thread) can read the `globals` field
at the same time as `setGlobal` is called on the main thread, causing a
potential race. TSAN surfaced this issue.
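A sketch of one way to remove such a race, not necessarily the fix that was applied: every access to the shared table goes through a mutex, so `setGlobal` on the main thread and reads from transaction threads are serialized. `GlobalsTable` and its integer index are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

class GlobalsTable {
public:
  void setGlobal(int index, std::string value) {
    std::lock_guard<std::mutex> lock(mutex_);
    globals_[index] = std::move(value);
  }
  std::optional<std::string> getGlobal(int index) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = globals_.find(index);
    if (it == globals_.end()) return std::nullopt;
    return it->second;  // copy out while the lock is held
  }

private:
  mutable std::mutex mutex_;
  std::map<int, std::string> globals_;
};
```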
* Redwood files now grow in large page chunks controlled by a knob to reduce truncate() calls for expansion. PriorityMultiLock has a limit on consecutive same-priority lock releases. Increased Redwood max priority level to 3 for more separation at higher BTree levels.
* Simulation fix, don't mark certain IO timeout errors as injected unless the simulated process has been set to have an unreliable disk.
* Pager writes now truncate gradually upward, one chunk at a time, in response to writes, which wait on only the necessary truncate operations. Increased buggified chunk size because truncate can be very slow in simulation.
* In simulation, ioTimeoutError() and ioDegradedOrTimeoutError() will wait until at least the target timeout interval past the point when simulation is sped up.
* PriorityMultiLock::toString() prints more info and is now public.
* Added queued time to PriorityMultiLock.
* Bug fix to handle when speedUpSimulation changes later than the configured time.
* Refactored mutation application in leaf nodes to do fewer comparisons and do in place value updates if the new value is the same size as the old value.
* Renamed updatingInPlace to updatingDeltaTree for clarity. Inlined switchToLinearMerge() since it is only used in one place.
* Updated extendToCover to be more clear by passing in the old extension future as a parameter. Fixed initialization warning.
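A sketch of chunked file growth, where `chunkBytes` stands in for the knob controlling the growth size (assumed positive). `nextFileSize` is hypothetical; it rounds the required size up to a chunk boundary so that many small extensions share one truncate() call.

```cpp
#include <cassert>
#include <cstdint>

int64_t nextFileSize(int64_t currentSize, int64_t requiredSize, int64_t chunkBytes) {
  if (requiredSize <= currentSize) return currentSize;  // no truncate needed
  // Round the required size up to the next multiple of the chunk size.
  return (requiredSize + chunkBytes - 1) / chunkBytes * chunkBytes;
}
```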
In this PR, the blob manager now recruits blob workers
(via communication with the cluster controller). Blob workers
are onboarded as blob worker processes enter the cluster.
When compiling FDB using clang++, a self-assign warning appears due to
the code
pos = littleEndian32(pos);
in Atomic.h, which expands to
pos = pos;
as littleEndian32 is defined as
#define littleEndian32(value) value
This warning is uninteresting but annoying; adding a no-side-effect
cast suppresses it.
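A sketch of the idea, assuming the cast is applied inside the macro (the exact form of the fix is an assumption). Clang's `-Wself-assign` looks through parentheses and implicit casts but not explicit ones, so `pos = static_cast<uint32_t>(pos);` no longer triggers it. `roundTrip` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Identity conversion on little-endian targets, wrapped in an explicit cast.
#define littleEndian32(value) static_cast<uint32_t>(value)

uint32_t roundTrip(uint32_t pos) {
  pos = littleEndian32(pos);  // no self-assign warning; value unchanged
  return pos;
}
```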