Previously, we had Ref types outliving the arena's that owned them,
specifically encryptDomains in the getResolution actor. Refactor to use
Standalone's, which both fixes the memory error and makes this easier to
reason about.
Also fix a potential ODR violation.
Adding the following metrics:
* BlobCipherKeyCache hit/miss
* EKP: KMS requests latencies
* For each component that using encryption, they now need to pass a UsageType enum to the encryption helper methods (GetEncryptCipherKeys/GetLatestEncryptCipherKey/encrypt/decrypt) and those methods will help to log get cipher key latency samples and encryption/decryption cpu times accordingly.
* Move arena members to the end of serializer calls
See
https://github.com/apple/foundationdb/tree/main/flow#flatbuffersobjectserializer
for why this is necessary.
* Fix a heap-use-after-free
Previously memory owned by
EncryptKeyProxyData::baseCipherDomainIdKeyIdCache was borrowed by a call
to EncryptKeyProxyData::insertIntoBaseDomainIdCache where it was
invalidated and then used. Now
EncryptKeyProxyData::insertIntoBaseDomainIdCache takes shared ownership
by taking a Standalone.
And also rename some types to end in Ref to follow the flow conventions
described here: https://github.com/apple/foundationdb/tree/main/flow#arenas
* Cleaned up BlobGranule TODO + FIXMEs and addressed some
* popping feed at correct version
* blob worker taking over a granule will pop from where previous worker left off
* addressed fixme of blob worker not re-snapshotting from old change feed
* formatting
* more change feed popped fixes after pop updates
* Getting rid of change feed parallelism lock since it can cause deadlocks in fetching, and relying on full fetch lock
* New blob worker metric and fixing old one
* server-side popped checking still doesn't work because of pops at non-mutation versions
* format
* assorted bug fixes for blob granules
* Fixing transaction used after commit in blob manager recovery
* fixing race with granule merging across hard boundaries because it hadn't loaded them yet
* throttle the cluster when blob workers fall behind
* do not throttle on blob workers if they are not enabled
* remove an unnecessary actor
* fixed a compile error
* fetch blob worker metrics at the same interval as the rate is updated, avoid fetching the complete blob worker list too frequently
* fixed another compilation bug
* added a 5 second delay before bw throttling to prevent false positives caused by the 100e6 version jump during recovery. Lower the throttling thresholds to react much quicker to bw lag.
* fixed a number of problems
* changed the minBlobVersionRequest to look at storage server versions since this will be a lot more efficient
* fix: do not let desired go backwards
* fix: track the version of notAtLatest changefeeds for throttling
* ratekeeper now throttled blob workers by estimating the transaction per second throughput of the blob workers
* added metrics for blob worker change feeds
* added a knob to disable bw throttling
* fixed the transaction options in blob manager
* Granule purge cannot delete history entry for fully deleting granule until all children are completely done splitting
* Several purging fixes related to granule history
* Fixed typo in refactor
* fixing memory model for purgeRange
* formatting
* weakening granule purge test for now
* cleanup
* First version of force purging granules
* fixing issue in BW range assignment reporting
* Fixing incorrect assert with force purging
* Error handling when checking force purged state
* fixed force purging and recover/reassign range races and check
* Handling force purge + boundary change race
* more places to check for force purged status
* fixed manager restart in the middle of force purge bug
* fixing same-BM purge and assignment races in all cases
* weakening orphaned granule history check a bit because of difficult to solve races
* fixing txn options on retry
* loading force purged ranges at start to avoid resuming a merge that is being force purged
* cleanup
* Enabling purging in granule tests, and adding check for leaked change feeds in force purge
* formatting
* missed parameter in merge conflicts
* Fixing leaked change feed race with merge and force purge
* adding change feed cleanup when new blob manager recovers in-progress merge that raced with force purge
* added forcepurge fdbcli command
* 'main' of github.com:sfc-gh-nwijetunga/foundationdb: (32 commits)
Store rocksdb::DBOptions and rocksdb::ColumnFamilyOptions to (#7766)
Update CONTRIBUTING.md
Update tests/rare/SpecificUnitTests.toml
fix ASAN OOM problem
Update CONTRIBUTING.md
Write tracing and ALP special key errors as JSON
Fix: the static tenant map in the Java tester was being accessed concurrently from multiple threads. Make it a concurrent map. (#7805)
Run clang-format
Print SIGNAL output to stdout
Print to stderr only upon errors
Testing upgrades to a future version of FDB (#7780)
Flush gcov coverage upon SIGTERM
Report the unit tests being run in test harness
Fix a bug in a storage wiggler unit test where some servers were added with too recent a timestamp
Fix undefined behavior in versioned btree test due to integer overflow
When a transaction operation gets an unknown tenant error, it needs to reset the tenant ID so it can be updated in the next tenant lookup request.
Don't buggify max tenants per cluster globally; instead buggify it in specific tests
Remove non-existing unittest
Add unit tests to the correctness package
Add comment to INetwork
...
* Enable configuring the next future protocol version as the current protocol version in FDB client, fdbserver, and fdbcli
* Auto format python files used in upgrade tests
* Add a test for upgrading to a future FDB version
* Emphasize that the options for using future protocol version are intended for test purposes only
* Make the global variable for current protocol version visible only locally
* Refactirng to avoid using currentProtocolVersion() in static intialization
* Update go bindings
* better check for granule-ification
* Handling blob granule initial split too large
* Re-evaluating split size if too large, even if read doesn't get transaction_too_old
* reworked to have blob worker propose split key
* New GranuleStatusReply to avoid seqno issue stream side effects
* Handling retries on reevaluateInitialSplit properly
* Waiting for stream to be initialized
* Checking reevaluate split for additional split points beyond proposed
* Fixing more races in reevaluate initial split
* properly handling cleaning up old change feed after split re-evaluate
* fixing granule conversion bug with hard boundaries
* fixing clear and merge check race with cycle test
* refactor missed knob check for clearAndMerge
* Fixing formatting
* review comments and improving large range conversion
* fixing typo
* more formatting
* Using knownBlobRanges for blob granule ranges whether tenants are enabled or not
* Effectively disabled blob granule tests when tenants enabled to fix ctest
* Granule purge cannot delete history entry for fully deleting granule until all children are completely done splitting
* Several purging fixes related to granule history
* Fixed typo in refactor
* fixing memory model for purgeRange
* formatting
* weakening granule purge test for now
* cleanup
* review comments
* Encrypt BlobGranule delta files
Description
diff-1: Address review comments
Major changes proposed by the patch are:
1. Refactor code to allow caching of 'encryption key ctx' as part of
BlobFilePointerRef. The refactoring allows snapshot and/or delta files
to store their own file encryption context.
2. Enable BlobGranule delta file encryption/decryption semantics.
Testing
BlobGranuleCorrrectness
BlobGranuleCorrectnessClean
BlobGranuleFileUnitTestToml
Description
Testing