Previously, worker servers registered with the cluster controller only after local file recovery. This PR lets worker servers register before local file recovery, while indicating that although they can serve stateless roles, they cannot yet become storage or TLog servers. After local file recovery, a worker registers again to indicate that it can now become a storage or TLog server.
The change fixes a deadlock with encryption: when the only workers that can become master and EKP are stateful, those workers cannot register with the cluster controller, because no EKP is available to provide the cipher keys needed to decrypt their encrypted storage.
For now the change applies only when encryption is on; it will become the default behavior once we think it is stable.
* Some fixes.
* Enabled ShardedRocksDB in IKeyValueStore.h
* Added unit test for ShardedRocks for \xff\xff key space read/write.
* Resolved comments.
* Return empty read results if the physical shards are not initialized.
Co-authored-by: He Liu <heliu@apple.com>
The erased bytes from recent state transactions were accidentally set to 0,
causing anyPopped to always be false. As a result, totalStateBytes never
decreases, which can cause high latency between "Resolver.resolveBatch.Before" and
"Resolver.resolveBatch.AfterQueueSizeCheck", in the hundreds of milliseconds. This
causes high server-side commit latency.
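The accounting problem described above can be sketched as follows. This is a minimal Python illustration, not the resolver code: the function name, the deque of (version, bytes) pairs, and the version cutoff are all hypothetical; only `anyPopped` and `totalStateBytes` are names taken from the message.

```python
from collections import deque

def pop_old_state_transactions(recent, oldest_needed_version, total_state_bytes):
    """Drop entries older than oldest_needed_version.

    Returns (any_popped, new_total_state_bytes).
    """
    erased_bytes = 0
    while recent and recent[0][0] < oldest_needed_version:
        _version, size = recent.popleft()
        # If this accumulation is lost (erased_bytes stays 0), any_popped is
        # always False and the total never shrinks.
        erased_bytes += size
    any_popped = erased_bytes > 0
    return any_popped, total_state_bytes - erased_bytes

# Hypothetical data: (version, bytes) pairs, oldest first.
recent = deque([(10, 100), (20, 250), (30, 50)])
any_popped, total = pop_old_state_transactions(recent, 25, 400)
```

With the accumulation in place, the two oldest entries are popped and the running total drops by their combined size.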
* Shard based move.
* Clean up.
* Clear results on retry in getInitialDataDistribution.
* Remove assertion on SHARD_ENCODE_LOCATION_METADATA for compatibility.
* Resolved comments.
Co-authored-by: He Liu <heliu@apple.com>
* Add an internal C API to support memory connection records
* Track shared state in the client using a unique and immutable cluster ID from the cluster
* Add missing code to store the clusterId in the database state object
* Update some arguments to pass by const&
* Add JsonWebKeySet parser/stringifier
* Update header directory
* Make JWKS parser correctness clean for OpenSSL 1.x
Add RSA keygen support
* Make JWKS parser correctness clean for OpenSSL 3.x
+extend unique_ptr for scoped destruction of OpenSSL objects
* Use PKey::{sign|verify}() in TokenSign
* Apply AutoCPointer to MkCert
* Apply Clang format
* JWKS::toStringRef() returns StringRef > Optional<StringRef>
* Fix Mac/Windows build error
* Fix incorrect fix of Mac build
* Fix filename in license comment for AutoCPointer.h
* Refactor complex C macros into function templates
This PR adds support for TLog encryption through the commit proxy. The encryption is done on a per-mutation basis. As CP writes mutations to TLog, it inserts an encryption header alongside the encrypted mutations. Storage server (and other consumers of TLog such as storage cache and backup worker) decrypts the mutations as they peek TLog.
* Fix comments
* Add simulation value for SERVER_KNOBS->SNAP_CREATE_MAX_TIMEOUT
* A working version that is correctness clean
* Remove unnecessary comments; debugging symbols
* Only check secondary address for coordinators, same as before
* Change the trace to SevError and remove the ASSERT(false)
* Remove TLogSnapRequest handling on TlogServer, which is changed to use WorkerSnapRequest
* Add retry for network failures
* Add retry limit for network failures; still allow duplicate snapshots on processes that are both tlog and storage to avoid race
* Add retry limit as a knob and make backoff exponential
* Add getDatabaseConfiguration(Transaction* tr)
* revert to sending a request for each role once
* update some comments
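A bounded retry with exponential backoff, as mentioned in the bullets above, can be sketched roughly like this. It is a generic Python illustration under assumptions: `call_with_retries`, its parameters, and the use of `ConnectionError` for a network failure are hypothetical and do not reflect the actual knob names or snapshot code.

```python
import time

def call_with_retries(send, max_retries=3, initial_backoff=0.01, sleep=time.sleep):
    """Call send(); on ConnectionError retry up to max_retries times,
    doubling the delay between attempts."""
    backoff = initial_backoff
    for attempt in range(max_retries + 1):
        try:
            return send()
        except ConnectionError:
            if attempt == max_retries:
                raise  # retry limit reached, surface the failure
            sleep(backoff)
            backoff *= 2  # exponential backoff

# Usage: a request that fails twice before succeeding (simulated).
delays = []
calls = {"n": 0}

def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"

result = call_with_retries(flaky_request, max_retries=3,
                           initial_backoff=1.0, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.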
* extract tenant management into its own file and namespace
* rename the tenant management workload source file
* extract tenant special keys functions to a separate file
* extract some helper functions to GenericTransactionHelper.h
* convert StringRef -> TenantNameRef
* move some TenantMapEntry implementation into the cpp file
* add some helper functions to decode/encode a tenant mode
* Log failed connection attempts in monitorProxies
* Update coordinator list from the cluster file after failing to connect to all coordinators
* Wiggle and upgrade test with legacy version monitoring; updating tests to use 7.1.9
* Update coordinator list from the cluster file: addressing review comments
* Wait on future for all setAndPersistConnectionString calls
* Defer recoveredDiskFiles wait if Encryption data at-rest is enabled
Description
In the current code, ClusterController startup waits for the 'recoveredDiskFiles'
future to complete before triggering the 'clusterControllerCore' actor, which
in turn starts the 'EncryptKeyProxy' (EKP) actor responsible for fetching/refreshing
the encryption keys needed for ClusterRecovery as well as for interactions with
KMS.
The patch addresses a circular dependency where StorageServer initialization
depends on EKP, but CC doesn't recruit EKP until 'recoveredDiskFiles' completes,
which includes SS initialization. Given that 'recoveredDiskFiles' is an optimization,
the patch proposes deferring the 'recoveredDiskFiles' future completion until
new Master recruitment is done as part of ClusterRecovery (unblocking the EKP singleton).
Testing
Ran 500K correctness runs: 20220618-055310-ahusain-foundationdb-61c431d467557551
Recorded failures don't seem to be related to the change.
* Add a DD tenant-cache-assembly actor
* Add basic tenant list monitoring for tenant cache.
* Update DD tenant cache refresh to be more efficient and unit-testable
* Remove the DD prefix in the tenant cache class name (and associated impl and UT class names); there is nothing specific to DD in it; DD uses it; other modules may use it in the future
* Disable DD tenant awareness by default
* fix a fault injection bug in txn store recovery
* Update LogSystemDiskQueueAdapter.actor.cpp
typo
* recoverLoc can be overwritten, so on reset use the stored range start
Description
Major changes include:
1. GetEncryptByKeyIds cache elements can expire.
2. Update iterator after erasing an element during refresh encryption keys
operation.
Testing
EncryptKeyProxyTest
Adding encryption support for TxnStateStore. It is done by supporting encryption for KeyValueStoreMemory. Encryption is currently done at the operation level, as operations are written to the underlying log file. See inline comment for the encrypted data format.
This PR depends on #7252. It is part of the effort to support TLog encryption #6942.
* Don't fail test if log cursor times out during network partition
Also, exercise the codepath for handling timed_out in simulation, by
reverting this knob buggification behavior to that of 07976993e7.
* clang-format
* Remove unnecessary actorcompiler.h includes (from non-actor files)
* Make AsyncFileChaos a non-actor header file
* Add unactorcompiler.h include to the end of actor header files
* Add missing actorcompiler.h includes to actor header files
* KmsConnector implementation to support KMS driven CipherKey TTL
Description
diff-3: Address review comment
diff-2: Fix Valgrind issues discovered running tests
diff-1: Set expireTS for baseCipherId indexed cache
KMS CipherKeys can be of two types:
1. Revocable CipherKeys: having a finite lifetime, after which the CipherKey
shouldn't be used by the FDB.
2. Non-revocable CipherKeys: ciphers are not revocable, however, FDB would
still want to refresh ciphers to support KMS cipher rotation feature.
Patch proposes following change to incorporate support for above defined cipher-key
types:
1. Extend KmsConnector response to include optional 'refreshAfter' & 'expireAfter'
time intervals. EncryptKeyProxy (EKP) cache would define corresponding absolute refresh &
expiry timestamps for a given cipherKey. In the event of a transient KMS connectivity outage,
a caller of the EKP API for a non-revocable key should continue using the cached cipherKey until
it expires.
2. Simplify KmsConnector API arena handling by using VectorRef to represent component
structs and manage associated memory allocation/lifetime.
Testing
1. EncryptKeyProxyTest
2. RESTKmsConnectorTest
3. SimKmsConnectorTest
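The conversion of optional 'refreshAfter' / 'expireAfter' intervals into absolute timestamps might look roughly like the Python sketch below. The class and function names are hypothetical, and treating a missing interval as "never" is an assumption made for illustration, not something the patch description states.

```python
from dataclasses import dataclass
from typing import Optional

NEVER = float("inf")

@dataclass
class CachedCipherKey:
    key: bytes
    refresh_at: float  # absolute time after which a refresh should be attempted
    expire_at: float   # absolute time after which the key must not be used

def make_entry(key: bytes, now: float,
               refresh_after: Optional[float] = None,
               expire_after: Optional[float] = None) -> CachedCipherKey:
    """Turn the optional relative intervals into absolute timestamps."""
    refresh_at = now + refresh_after if refresh_after is not None else NEVER
    expire_at = now + expire_after if expire_after is not None else NEVER
    return CachedCipherKey(key, refresh_at, expire_at)

def needs_refresh(entry: CachedCipherKey, now: float) -> bool:
    return now >= entry.refresh_at

def is_usable(entry: CachedCipherKey, now: float) -> bool:
    # A key past its refresh time stays usable until it expires, which is
    # what lets callers ride out a transient KMS outage.
    return now < entry.expire_at

entry = make_entry(b"cipher", now=100.0, refresh_after=10.0, expire_after=60.0)
```

In this example the entry is due for refresh from t=110 but remains usable until t=160.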
Adding GetEncryptCipherKeys and GetLatestCipherKeys helper actors, which encapsulate cipher key fetch logic: getting cipher keys from the local BlobCipherKeyCache and, on a cache miss, fetching them from EKP (encrypt key proxy). These helper actors also handle the case where EKP is shut down in the middle of a request: they listen on ServerDBInfo to wait for a new EKP to start and send a new request there instead.
The PR also has other misc changes:
* EKP is by default started in simulation regardless of the ENABLE_ENCRYPTION knob, so that in restart tests, if ENABLE_ENCRYPTION is switched from on to off after restart, encrypted data can still be read.
* API tweaks for BlobCipher
* Adding an ENABLE_TLOG_ENCRYPTION knob which will be used in later PRs. The knob should normally be consistent with the ENABLE_ENCRYPTION knob, but could be used to disable TLog encryption alone.
This PR is split out from #6942.
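The cache-first lookup performed by the helper actors described above can be illustrated with a small Python sketch. It is not the actor code: `get_cipher_keys` and `fetch_from_proxy` are hypothetical names, the cache is a plain dict, and the handling of an EKP restart is left out entirely.

```python
def get_cipher_keys(key_ids, local_cache, fetch_from_proxy):
    """Return {key_id: key}, asking the proxy only for ids missing locally."""
    found = {k: local_cache[k] for k in key_ids if k in local_cache}
    missing = [k for k in key_ids if k not in local_cache]
    if missing:
        fetched = fetch_from_proxy(missing)  # single batched request
        local_cache.update(fetched)          # populate the cache for later calls
        found.update(fetched)
    return found

# Usage with a stand-in proxy that records what it was asked for.
requests = []

def fake_proxy(ids):
    requests.append(list(ids))
    return {i: f"key-{i}".encode() for i in ids}

cache = {1: b"key-1"}
first = get_cipher_keys([1, 2, 3], cache, fake_proxy)
second = get_cipher_keys([2, 3], cache, fake_proxy)
```

The second call is served entirely from the local cache, so the proxy sees only one request.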
Since memory is now limited by RSS size, add RSS size to status json for
reporting. Also change how available_bytes is calculated from:
(available + virtual memory) * process_limit / machine_limit
to:
(available memory) * process_limit / machine_limit
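To make the difference between the two formulas concrete, here is a small Python sketch. The function names and the sample numbers are made up for illustration; only the two expressions come from the message above.

```python
GiB = 2**30

def available_bytes_old(available, virtual_memory, process_limit, machine_limit):
    # Previous calculation: virtual memory counted toward what is available.
    return (available + virtual_memory) * process_limit / machine_limit

def available_bytes_new(available, process_limit, machine_limit):
    # New calculation: only available memory is scaled by the process share.
    return available * process_limit / machine_limit

# Hypothetical machine: 32 GiB total, 8 GiB available, 4 GiB virtual memory,
# and a process limit of 8 GiB.
old = available_bytes_old(8 * GiB, 4 * GiB, 8 * GiB, 32 * GiB)
new = available_bytes_new(8 * GiB, 8 * GiB, 32 * GiB)
```

With these numbers the old formula reports 3 GiB and the new one 2 GiB.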
* Fix a heap-use-after-free in a unit test
The data passed to IAsyncFile::write must remain valid until the future
is ready.
* Use holdWhile instead of a new state variable
* Add simulation test for 1 data hall + 1 machine failure case.
* Disable BUGGIFY for DEGRADED_RESET_INTERVAL.
A simulation test discovered a situation where machines attempting to connect
to a dead coordinator (with a well-known endpoint) were getting themselves
marked degraded. This flapping of the degraded state prevented recovery from
completing, as it started over any time it noticed that tlogs on degraded
hosts could be relocated to non-degraded ones.
bin/fdbserver -r simulation -f tests/rare/CycleWithDeadHall.toml -b on -s 276841956
Description
Major changes proposed include:
1. Update EncryptKeyServer APIs to be tenant aware.
2. Update KmsConnector APIs to be tenant aware.
Clients of the above APIs, such as CP, SS and BlobWorker, need to supply
encryption domain info that includes tenantId and tenantName.
Testing
1. Update EncryptKeyProxyTest
2. Update RESTKmsConnectorTest
3. Update SimKmsConnectorTest
* Add JWT support to TokenSign
* Encapsulate OpenSSL public/private key type
Type-safe passing around of keys without having to DER/PEM-serialize
(OpenSSL doesn't have distinct types for public and private key)
* Apply Clang format
* Add verify benchmark for JWT and FlatBuffers token
* Unit test base64url::{encode, decode}
* Make all payload fields optional
Let user code validate non-signature fields
* Make all payload fields optional
Completely defer field check to user code
* Move rapidjson from fdbclient to contrib
* Make fdbrpc's rapidjson linkage private
Currently only sources include them.
* Modify rapidjson path in apiversioner.py
* Algorithm::Unknown > Algorithm::UNKNOWN
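For reference, the kind of round-trip check a base64url unit test performs can be sketched in Python with the standard library. JWT uses the URL-safe alphabet without '=' padding; whether the `base64url::{encode, decode}` functions in this change strip padding in exactly this way is an assumption, and the Python function names are illustrative only.

```python
import base64

def base64url_encode(data: bytes) -> str:
    # URL-safe alphabet ('-' and '_'), with trailing '=' padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def base64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped before decoding.
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)

encoded = base64url_encode(b"\xfb\xff")
decoded = base64url_decode(encoded)
```

The two input bytes are chosen so that the standard alphabet would produce '+' and '/', which the URL-safe variant replaces.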
Cluster controller could send multiple recovery transaction versions
and not accepting them unconditionally could cause a discrepancy between
the recovery transaction versions on the sequencer and the resolvers,
resulting in a hung recovery (because the cluster controller won't be able
to commit the recovery transaction version).
* Fix a heap-use-after-free in PaxosConfigConsumer.actor.cpp
* Two more defensive local promises
* Two more defensive promise copies
* Fix latent logic error
Factors the updated version calculation path into a function and adds
unit tests for it. Some of the important test paths include making sure
the calculated version always increases, and to check issues with
overflowing integers.
* coordinatorsKey should not store IP addresses.
Currently, when we commit a coordinator change, we always convert hostnames to IP addresses and store the converted results in coordinatorsKey (\xff/coordinators). As a result, ForwardRequest also sends IP addresses, and receivers update their cluster files with IPs, so we lose the dynamic IP feature.
* Remove the legacy coordinators() function.
* Update async_resolve().
ip::basic_resolver::async_resolve(const query & q, ResolveHandler && handler) is deprecated.
* Clean code format.
* Fix typo.
* Remove SpecifiedQuorumChange and NoQuorumChange.
* Fixing simulation validation assert that was tripping incorrectly
* Commenting out debugging prints
* Fixed multiple error propagation issues in blob worker
* Add missing secondary queries tests for index prefetch
This change adds tests for missing secondary queries in index prefetch.
In this case, MATCHED_ONLY mode would NOT return the KV and UNMATCHED_ONLY
mode would return the KV.
* remove default value for params
* Enable MATCHED and UNMATCHED mode for index prefetch
MATCHED mode returns index entries whose secondary KVs are present,
UNMATCHED mode returns index entries whose secondary KVs are absent.
Note that the conflict read range of this txn is set in 2 steps:
* Set the conflict range for primary query according to request
* Set the conflict ranges for secondary queries according to responses.
As a result, conflicts of different match_index mode are taken care of.
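The MATCHED / UNMATCHED semantics described above can be illustrated with a small Python sketch. It models the index as (index key, secondary key) pairs and the secondary data as a dict; the function name and data layout are hypothetical and unrelated to the actual storage server code.

```python
def filter_index_entries(index_entries, secondary_store, mode):
    """Return index keys whose secondary KV is present (MATCHED)
    or absent (UNMATCHED)."""
    if mode not in ("MATCHED", "UNMATCHED"):
        raise ValueError(f"unsupported mode: {mode}")
    want_present = mode == "MATCHED"
    return [
        index_key
        for index_key, secondary_key in index_entries
        if (secondary_key in secondary_store) == want_present
    ]

# Hypothetical data: the record for "rec/2" is missing.
index_entries = [("idx/a", "rec/1"), ("idx/b", "rec/2"), ("idx/c", "rec/3")]
secondary_store = {"rec/1": b"v1", "rec/3": b"v3"}

matched = filter_index_entries(index_entries, secondary_store, "MATCHED")
unmatched = filter_index_entries(index_entries, secondary_store, "UNMATCHED")
```

Every index entry lands in exactly one of the two result lists.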
* Fix c binding
Description
Major changes proposed in the patch include:
1. Update EncryptKeyProxy EncryptBaseCipherKeyId cache to be indexed
using {encryptDomainId, baseCipherId} instead of only 'baseCipherId'
2. Enhance RESTKmsConnector 'error' tag to encapsulate errorMessage
and errorCode information
Testing
1. Updated EncryptKeyProxy test
2. Updated RESTKmsConnector unit test
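Why the cache index needs the domain id as well as the cipher id can be shown with a short Python sketch. The dict-based cache and the helper names are hypothetical; only the {encryptDomainId, baseCipherId} key shape comes from the description above.

```python
def put_key(cache, encrypt_domain_id, base_cipher_id, key):
    # Composite index: the same baseCipherId may appear in different domains.
    cache[(encrypt_domain_id, base_cipher_id)] = key

def get_key(cache, encrypt_domain_id, base_cipher_id):
    return cache.get((encrypt_domain_id, base_cipher_id))

cache = {}
put_key(cache, encrypt_domain_id=1, base_cipher_id=42, key=b"domain-1-key")
put_key(cache, encrypt_domain_id=2, base_cipher_id=42, key=b"domain-2-key")
```

Indexing by 'baseCipherId' alone would let the second insert overwrite the first; with the composite key both entries coexist.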