TokenSign was copying an unused Arena held by a Standalone instead of referring to it.
An Arena has to be used at least once before it holds a valid, copyable reference;
otherwise the copied Arena gets its own lifecycle rather than sharing the original's.
Thus, when the copied arena went out of scope,
the memory that the returned Standalone was supposed to hold was also released.
Fix: instead of copying, refer to the Standalone's arena.
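A minimal sketch of the ownership rule this fix relies on, using an illustrative helper name rather than the actual TokenSign code:

```cpp
#include "flow/Arena.h"

// Illustrative helper, not the real TokenSign diff.
Standalone<StringRef> makeSignedToken(StringRef payload) {
	Standalone<StringRef> result;
	// Refer to the Standalone's own arena when allocating the bytes so the
	// memory shares the returned value's lifetime. Copying a separate,
	// never-used Arena into the result would give that copy an independent
	// lifecycle, and the bytes would be freed when it went out of scope.
	((StringRef&)result) = StringRef(result.arena(), payload);
	return result;
}
```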
* Add tryResolveHostnames() in connection string.
* Add missing hostname to related interfaces.
* Do not pass a RequestStream into the *GetReplyFromHostname() functions:
a new RequestStream is created for each request anyway, and the passed-in pointer could be nullptr, which results in segfaults (see the sketch after this list).
* Add dynamic hostname resolve and reconnect intervals.
* Address comments.
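A hedged sketch of the shape this change could take; the signature, resolve call, and error type are assumptions for illustration, not the actual FDB declarations. The helper resolves the hostname and builds a fresh RequestStream for the single request, so callers no longer pass in a stream pointer that might be null.

```cpp
#include "fdbrpc/fdbrpc.h"
#include "flow/actorcompiler.h" // must be the last include in .actor files

// Illustrative only: names, parameters, and error handling are assumptions.
ACTOR template <class Req>
Future<ErrorOr<REPLY_TYPE(Req)>> tryGetReplyFromHostname(Req request, Hostname hostname, WellKnownEndpoints token) {
	// Resolve the hostname for this attempt.
	Optional<NetworkAddress> address = wait(hostname.resolve());
	if (!address.present()) {
		return ErrorOr<REPLY_TYPE(Req)>(lookup_failed());
	}
	// A brand-new RequestStream per request: nothing for the caller to pass in,
	// nothing that can dangle or be nullptr.
	state RequestStream<Req> stream(Endpoint::wellKnown({ address.get() }, token));
	ErrorOr<REPLY_TYPE(Req)> reply = wait(stream.tryGetReply(request));
	return reply;
}
```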
Implemented token management and testing
Code is still untested at this point, but testing code was written
* Fixing a leaked stream by explicitly notifying failure before the destructor
* better logic to prevent races in change feed fetching
* Found new race that makes assert incorrect
* handle server overloaded in initial read from fdb
* Handling more blob error types in granule retry
* Fixing rollback metadata problem, added better debugging
* Fixing version race when fetching change feed metadata
* Better handling of racing split requests
* fixing assert
* Handle change feed popped check in the blob worker
* fix: do not use a RYW transaction for a versionstamp because of randomize API version (#6768)
* more merge conflict issues
* Change feed destroy fixes
* Fixing change feed destroy and move race
* Check error condition in BG file req
* Using relative endpoints for blob worker interface
* Fixing bug in previous fix
* More destroy and move race fixes
* Don't update the empty version on destroy, in case the destroy gets rolled back; moved() and removal will take care of ensuring it is not read
* Bug fix (#6796)
* fix: do not use a RYW transaction for a versionstamp because of randomize API version
* fix: if the initialSnapshotVersion was pruned, granule history was incorrect
* added a way to compress null bytes in printable()
* Fixing durability issue with moving and destroying change feeds
* Adding fix for not fully deleting files for a granule that child granules need to re-snapshot
* More destroy and move races
* Fixing change feed destroy and pop races
* Renaming bg prune to purge, and adding a C API and unit test for it
* more cleanup
* review comments
* Observability for granule purging
* better handling for change feed not registered
* Fixed purging bugs (#6815)
* fix: do not use a RYW transaction for a versionstamp because of randomize API version
* fix: if the initialSnapshotVersion was pruned, granule history was incorrect
* added a way to compress null bytes in printable()
* fixed a few purging bugs
Co-authored-by: Evan Tschannen <evan.tschannen@snowflake.com>
* initial structure for remote IKVS server
* moved struct to .h file, added new files to CMakeList
* happy path implementation, connection error when testing
* saved minor local change
* changed tracing to debug
* fixed onClosed and getError being called before init is finished
* fix spawn process bug, now use absolute path
* added server knob to set ikvs process port number
* added server knob for remote/local kv store
* implement simulator remote process spawning
* fixed bug for simulator timeout
* commit all changes
* removed print lines in trace
* added FlowProcess implementation by Markus
* initial debug of FlowProcess, stuck at parent sending OpenKVStoreRequest to child
* temporary fix for process factory throwing segfault on create
* specify public address in command
* change remote kv store knob to false for jenkins build
* made port 0 open random unused port
* change remote store knob to true for benchmark
* set listening port to randomly opened port
* added print lines for jenkins run open kv store timeout debug
* removed most tracing and print lines
* removed tutorial changes
* update handleIOErrors error handling to handle remote-ikvs cases
* Push all debugging changes
* A version where worker bug exists
* A version where restarting tests fail
* Use both the name and the port to determine the child process
* Remove unnecessary update on local address
* Disable remote-kvs for DiskFailureCycle test
* A version where restarting stuck
* A version where most restarting tests green
* Reset connection with child process explicitly
* Remove change on unnecessary files
* Unify flags from _ to -
* fix merging unexpected changes
* fix trace .error() to .errorUnsuppressed()
* Add license header
* Remove unnecessary header in FlowProcess.actor.cpp
* Fix Windows build
* Fix Windows build, add missing ;
* Fix a bug caused by code dropped during a merge
* Disable remote kvs by default
* Pass the conn_file path to the flow process; it is not strictly needed, but buildNetwork is difficult to tune without it
* serialization change on readrange
* Update traces
* Refactor the RemoteIKVS interface
* Format files
* Update sim2 interface to not clog connections between parent and child processes in simulation
* Update comments; remove debugging symbols; Add error handling for remote_kvs_cancelled
* Add comments, format files
* Change method name from isBuggifyDisabled to isStableConnection; Decrease(0.1x) latency for stable connections
* Commit the IConnection interface change, forgot in previous commit
* Fix the issue that onClosed request is cancelled by ActorCollection
* Enable the remote kv store knob
* Remove FlowProcess.actor.cpp and move functions to RemoteIKeyValueStore.actor.cpp; add a remote kv store delay to avoid a race; bind the child process to die with the parent process (see the sketch below)
* Fix the bug where one process starts storage server more than once
* Add a please_reboot_remote_kv_store error to restart the storage server worker if remote kvs died abnormally
* Remove unreachable code path and add comments
* Clang format the code
* Fix a simple wait error
* Clang format after merging the main branch
* Testing mixed mode in simulation if remote_kvs knob is enabled, setting the default to false
* Disable remote kvs for PhysicalShardMove which is for RocksDB
* Cleanup #include orders, remove debugging traces
* Revert the reorder in fdbserver.actor.cpp, which fails the gcc build
Co-authored-by: Lincoln <lincoln.xiao@snowflake.com>
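One common Linux mechanism for the "bind the child process to die with the parent process" item above, sketched as an assumption rather than the actual implementation: the child asks the kernel to deliver SIGKILL when its parent exits, then re-checks the parent pid to close the race where the parent died before the call.

```cpp
#include <signal.h>
#include <sys/prctl.h>
#include <unistd.h>

// Illustrative helper, called in the child after fork (name is invented).
void bindLifetimeToParent(pid_t expectedParent) {
	// Ask the kernel to SIGKILL this process when its parent terminates.
	prctl(PR_SET_PDEATHSIG, SIGKILL);
	// If the parent already exited before the prctl() call, exit immediately.
	if (getppid() != expectedParent) {
		_exit(1);
	}
}
```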
* Set default for USE_JEMALLOC initially in ConfigureCompiler
Instead of trying to change the value later on. This fixes the valgrind
build, which was previously incorrectly getting jemalloc involved.
* Check aligned_alloc result for null
And report out-of-memory if so - don't assert (see the sketch after the next item)
* Check that we can allocate magazines with no internal fragmentation
We may want to do this so that the jemalloc heap profiler has some
knowledge of FastAlloc
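A generic sketch of the two allocation checks above; the helper names are invented and this is not the actual FastAlloc diff.

```cpp
#include <cstdlib>
#include <new>
#include <jemalloc/jemalloc.h> // nallocx(); assumes magazines come from jemalloc

// Treat a null aligned_alloc result as out-of-memory instead of asserting.
void* allocateAligned(size_t alignment, size_t bytes) {
	void* block = std::aligned_alloc(alignment, bytes);
	if (block == nullptr) {
		throw std::bad_alloc();
	}
	return block;
}

// nallocx() reports the size class jemalloc would actually use for a request,
// so equality means a magazine allocation wastes no bytes internally and the
// jemalloc heap profiler sees exactly what FastAlloc asked for.
bool magazineHasNoInternalFragmentation(size_t magazineBytes) {
	return nallocx(magazineBytes, 0) == magazineBytes;
}
```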
* Populate TestFile field for noSim tests in TestHarness
* Remove handling for nonexistent "ActualRun"
* Add contrib/debug_determinism
Add an instrumentation-based technique for debugging unseed mismatches. Also guard a few existing sources of nondeterminism that don't affect the unseed with the DEBUG_DETERMINISM macro.
Also change the simulated run loop to not run as the only task inside the real run loop, since that was a source of nondeterminism.
Also fix nondeterminism from calling timer_int().
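A hypothetical example of the kind of guard described above; the call site is invented for illustration and the exact guard form is assumed.

```cpp
#include "flow/Platform.h" // timer_monotonic(); DEBUG_DETERMINISM assumed defined here

// Invented call site: measure elapsed wall-clock time only when not debugging
// determinism, since the measurement makes otherwise identical runs diverge
// without affecting the unseed.
double maybeMeasureElapsed(double startTime) {
	double elapsed = 0;
#if !DEBUG_DETERMINISM // guard form assumed for illustration
	elapsed = timer_monotonic() - startTime;
#endif
	return elapsed;
}
```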
* Add StorageMetadataType::currentTime
Basically a deterministic-in-simulation version of timer_int that we can
use instead of timer_int for StorageMetadataType::createdTime
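A hedged sketch of the idea (assumed implementation, not the actual accessor): under simulation, derive the timestamp from the deterministic simulated clock; otherwise fall back to the real timer_int().

```cpp
#include "flow/Platform.h" // timer_int()
#include "flow/network.h"  // g_network

// Assumed shape of the helper: a nanosecond timestamp that is deterministic in
// simulation and real wall-clock time otherwise.
static int64_t deterministicCurrentTime() {
	if (g_network->isSimulated()) {
		// The simulated clock (seconds) is deterministic; convert to nanoseconds.
		return static_cast<int64_t>(g_network->now() * 1e9);
	}
	return timer_int(); // real nanosecond wall clock
}
```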
* Upgrade AES 128 GCM -> AES 256, StreamCipher code refactor
Major changes proposed are:
1. Refactor StreamCipher code to enable instantiation of
multiple encryption keys. However, the code still retains
the global encryption key semantics used in the backup file
encryption use case.
2. Enhance StreamCipher to provide HMAC signature digest
generation. Further, the class implements an HMAC-based
encryption key derivation function (see the sketch after this list).
3. Upgrade StreamCipher to use AES 256 GCM mode instead of the
currently supported AES 128 GCM mode.
Note: this changes the encryption key size; however, the
feature is NOT currently in use, so this should be OK.
4. Add an EncryptionOps validation and benchmark workload with
TOML support; it does the following:
a. Allow the user to configure encrypt-decrypt of a fixed-size
buffer or a variable-size buffer [100, 512K].
b. Allow the user to configure the number of iterations of the run;
in each iteration: generate random data, derive an encryption
key using the HMAC-SHA256 method, encrypt the data, and
then decrypt the data. It collects the following metrics:
i) time taken to derive encryption key.
ii) time taken to encrypt the buffer.
iii) time taken to decrypt the buffer.
iv) total bytes encrypted and/or decrypted
c. Along with the stats, it performs basic validations on the encrypted
and decrypted buffers.
d. On completion of the test, it records the above-mentioned metrics
in trace files.
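A small sketch of the per-iteration key-derivation step mentioned in items 2 and 4b, using OpenSSL directly; the helper name and parameters are illustrative, not the StreamCipher API.

```cpp
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <cstdint>
#include <vector>

// Derive a 256-bit AES key from a base secret and a per-iteration salt using
// HMAC-SHA256 (illustrative helper, not the actual StreamCipher interface).
std::vector<uint8_t> deriveAes256Key(const std::vector<uint8_t>& baseSecret,
                                     const std::vector<uint8_t>& salt) {
	std::vector<uint8_t> key(EVP_MAX_MD_SIZE);
	unsigned int keyLen = 0;
	HMAC(EVP_sha256(), baseSecret.data(), static_cast<int>(baseSecret.size()),
	     salt.data(), salt.size(), key.data(), &keyLen);
	key.resize(keyLen); // SHA-256 digest is 32 bytes, i.e. an AES-256 key
	return key;
}
```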
Major changes include:
1. Add a new FDB role: EncryptKeyProxy. The role is responsible
for exposing APIs to fetch encryption keys by interacting with an
external Encryption KeyManager interface.
2. The process is a FDB singleton process following similar recruitment
rules as other singleton processes in the system.
3. Code to recruit the worker process; given that the encryption keys are
needed during recovery (to decode TLog records), for now the process
is co-located in the same datacenter as the ClusterController.
4. Skeleton process actor code; more functionality will be added in
subsequent PRs.
NOTE: The code is protected by a SERVER_KNOB with a default
value of 'false' for now.
This eliminates many useless warnings when compiling.
`#pragma message: The practice of declaring the Bind placeholders (_1, _2, ...) in the global namespace is deprecated. Please use <boost/bind/bind.hpp> + using namespace boost::placeholders, or define BOOST_BIND_GLOBAL_PLACEHOLDERS to retain the current behavior.`
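The warning itself names two remedies; a sketch of both follows (whichever the commit actually chose):

```cpp
// Remedy 1: include the new bind header and pull the placeholders in
// explicitly instead of relying on the deprecated globals.
#include <boost/bind/bind.hpp>
using namespace boost::placeholders; // provides _1, _2, ...

// Remedy 2 (set in the build system rather than in code), e.g. in CMake:
//   add_compile_definitions(BOOST_BIND_GLOBAL_PLACEHOLDERS)
```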
* Redwood files now grow in large page chunks controlled by a knob to reduce truncate() calls for expansion (see the sketch after this list). PriorityMultiLock has a limit on consecutive same-priority lock releases. Increased Redwood max priority level to 3 for more separation at higher BTree levels.
* Simulation fix, don't mark certain IO timeout errors as injected unless the simulated process has been set to have an unreliable disk.
* Pager writes now truncate gradually upward, one chunk at a time, in response to writes, which wait on only the necessary truncate operations. Increased buggified chunk size because truncate can be very slow in simulation.
* In simulation, ioTimeoutError() and ioDegradedOrTimeoutError() will wait until at least the target timeout interval past the point when simulation is sped up.
* PriorityMultiLock::toString() prints more info and is now public.
* Added queued time to PriorityMultiLock.
* Bug fix to handle when speedUpSimulation changes later than the configured time.
* Refactored mutation application in leaf nodes to do fewer comparisons and do in place value updates if the new value is the same size as the old value.
* Renamed updatingInPlace to updatingDeltaTree for clarity. Inlined switchToLinearMerge() since it is only used in one place.
* Updated extendToCover to be more clear by passing in the old extension future as a parameter. Fixed initialization warning.
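A minimal sketch of the chunked-growth arithmetic from the first Redwood item above; the helper and the idea of a knob-controlled chunk size are assumptions, not the actual pager code.

```cpp
#include <cstdint>

// Round a requested pager file size up to the next growth-chunk boundary so
// expansion needs far fewer truncate() calls (chunk size would come from a knob).
int64_t roundUpToChunk(int64_t requestedBytes, int64_t growthChunkBytes) {
	return ((requestedBytes + growthChunkBytes - 1) / growthChunkBytes) * growthChunkBytes;
}

// Example: with a 16 MiB growth chunk, extending the file to 70 MiB truncates
// it upward once, to 80 MiB, rather than on every small expansion.
```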