* flow: add ApiVersion to replace hard coding api version
Instead of hard coding api value, let's rely on feature versions akin to
ProtocolVersion.
* ApiVersion: remove use of -1 for latest and use LATEST_VERSION
* Move arena members to the end of serializer calls
See
https://github.com/apple/foundationdb/tree/main/flow#flatbuffersobjectserializer
for why this is necessary.
* Fix a heap-use-after-free
Previously memory owned by
EncryptKeyProxyData::baseCipherDomainIdKeyIdCache was borrowed by a call
to EncryptKeyProxyData::insertIntoBaseDomainIdCache where it was
invalidated and then used. Now
EncryptKeyProxyData::insertIntoBaseDomainIdCache takes shared ownership
by taking a Standalone.
And also rename some types to end in Ref to follow the flow conventions
described here: https://github.com/apple/foundationdb/tree/main/flow#arenas
* Make envvar access in envvar-knob parsing cross-platform
Variable `environ' is defined for Linux,
does not exist in Mac, and is deprecated in Windows
* Prevent env knob strings from being copied at each iteration
* Add headers for {Get|Free}EnvironmentStrings()
* Adjust for different memory layout of environ vs GetEnvironmentStrings()
GetEnvironmentStrings returns a single byte string
with adjacent key=value strings delimited with null bytes,
whereas environ is an array of key=value strings.
* Fix missing return value for Windows build
* Have just one return statement
* Implement TenantCacheEntry in-memory cache
Description
diff-4: TraceEvent usage improvements
diff-3: Address review comments
diff-2: Add APIs to read counter values, test improvements
diff-1: Address review comments
Major changes includes:
1. Implements an actor that enables an in-memory caching of
TenantCacheEntry object, allowing the caller to embed custom
information along with TenantCacheEntry.
2. The cache follows read-through cache semantics where the entry
gets loaded from underlying database on a miss.
3. The cache implements a "periodic poller" to refresh known Tenants
by consulting the database. Once a database keyrange-watch feature is
available, cache shall be updated.
Bonus:
Implement a 'recurringAsync' addition to genericActors allowing caller
to schedule a periodic task registering an "actor functor"; the routine
'waits' for the actor unlike existing 'recurring' implementation.
Testing
TenantEntryCache workload
devCorrectnessRun - 100K
* Update network address in trace logs; Add system monitor for flowprocess
* Create a new trace file with the correct process address for flowprocess
* Remove unused debugging traces
* Add a new error lock_file_failure; Change please_reboot_remote_kv_store to please_reboot_kv_store; Add the code to only reboot the kv store but not the worker; Remove some unnecessay traces
* Add error handling for file_not_found in handleIOErrors
* Format worker.actor.cpp file
Description
FDB native encryption data at-rest supports two type of cipher-keys
in-memory caching:
1. Revocable keys - with a definite expiry (future timestamp)
2. Non-revocable keys - with or without expiry timestamp and/or
refreshAt timestamp.
Patch update BlobCipherKey in-memory cache to respect EKP/KMS
supplied 'refreshAt' and 'expireAt' timestamp. GetLatestCipher
validates `cipher key freshness' as well as GetCipherKey checks
for 'cipher key liveness' before replying details to the caller.
Patch also optimizes the BlobCipher module logging by taking
following measures:
1. BLOB_CIPHER_DEBUG macro to guard spammy log messages needed
mostly for debugging failures.
2. Minimize log volume by logging cipherKey details for any new
key added to the cache, key-refreshes are not logged.
3. Categorize logs into: debug, info and warn on per-usecase basis
Testing
devRunCorrectness - 100K
EncryptOps.toml - 100K
Currently GRV is reporting proxy_memory_limit_exceeded error which has
error message claiming Commit proxy failing. This split should remove
such confusion.
* Enable configuring the next future protocol version as the current protocol version in FDB client, fdbserver, and fdbcli
* Auto format python files used in upgrade tests
* Add a test for upgrading to a future FDB version
* Emphasize that the options for using future protocol version are intended for test purposes only
* Make the global variable for current protocol version visible only locally
* Refactirng to avoid using currentProtocolVersion() in static intialization
* Update go bindings
Currently it invokes pow(10, uniform_rand(log_e(range_begin), log_e(range_end))),
which may overflow beyond UINT32_MAX.
Fix it by preventing log_e(0) case and change pow base to M_E
* Don't build fdb c shim with ubsan
This avoids duplicate symbols when linking. It doesn't really make sense
to assemble files with -fsanitize=undefined anyway, since it won't
insert instrumentation.
* Consolidate boost_asan with boost_target
* Fix ASAN build
Description
-diff-2: Fix Mac build issues
-diff-1: Address review comments
Patch addresses the issue where ASAN build failed after introducing
BlobGranule compression.
Testing
ASAN build
As of version 7.1, knob parsing started using std::stol which will
partially consume its input and return success, even if that's not
what we want, e.g. std::stol("4GiB") returns 4.
This changes returns the behavior of 7.0, which would consider the
above example invalid.
* proof of concept
* use code-probe instead of test
* code probe working on gcc
* code probe implemented
* renamed TestProbe to CodeProbe
* fixed refactoring typo
* support filtered output
* print probes at end of simulation
* fix missed probes print
* fix deduplication
* Fix refactoring issues
* revert bad refactor
* make sure file paths are relative
* fix more wrong refactor changes
* Attempt to fix windows build
The windows build is complaining that some symbols in flow are
duplicate, which seems fair since we're currently compiling all the flow
sources and _also_ linking to flow for flowlinktest. This change makes
it so that we only link flow, and don't compile the flow source files
again.
* Remove source files from fdbclient and fdbrpc link tests
* Shard based move.
* Clean up.
* Clear results on retry in getInitialDataDistribution.
* Remove assertion on SHARD_ENCODE_LOCATION_METADATA for compatibility.
* Resolved comments.
Co-authored-by: He Liu <heliu@apple.com>
Currently, we have code in different folders like `flow/` and `fdbrpc/`
that should remain isolated. For example, `flow/` files should not
include functionality from any other modules. `fdbrpc/` files should
only be able to include functionality from itself and from `flow/`.
However, when creating a shared library, the linker doesn't complain
about undefined symbols -- this only happens when creating an
executable. Thus, for example, it is possible to forward declare an
`fdbclient` function in an `fdbrpc` file and then use it, and nothing
will break (when it should, because this is illegal).
This change adds dummy executables for a few modules (`flow`, `fdbrpc`,
`fdbclient`) that will cause a linker error if there are included
symbols which the linker can't resolve.
* Add JsonWebKeySet parser/stringifier
* Update header directory
* Make JWKS parser correctness clean for OpenSSL 1.x
Add RSA keygen support
* Make JWKS parser correctness clean for OpenSSL 3.x
+extend unique_ptr for scoped destruction of OpenSSL objects
* Use PKey::{sign|verify}() in TokenSign
* Apply AutoCPointer to MkCert
* Apply Clang format
* JWKS::toStringRef() returns StringRef > Optional<StringRef>
* Fix Mac/Windows build error
* Fix incorrect fix of Mac build
* Fix filename in license comment for AutoCPointer.h
* Refactor complex C macros into function templates
This PR add support for TLog encryption through commit proxy. The encryption is done on per-mutation basis. As CP writes mutations to TLog, it inserts encryption header alongside encrypted mutations. Storage server (and other consumers of TLog such as storage cache and backup worker) decrypts the mutations as they peek TLog.
Currently, a std::string is copied unnecessarily for every key and value
in a trace event.
This actually showed up in a jemalloc heap profile while I was
investigating something unrelated. I was surprised to see it since these
allocations should have a very short lifetime.
Description
Major changes include:
1. GetEncryptByKeyIds cache elements can expire.
2. Update iterator after erasing an element during refresh encryption keys
operation.
Testing
EncryptKeyProxyTest
* Work around flow trace's data race bug
BaseTraceEvent::setNetworkThread() and flushTraceFile[()|Void()]
has a long-standing race condition for traceEventThrottlerCache global
when flushTraceFileVoid() is not called from the network thread.
This race dates back to 2017 (commit hash 80e5fecfe2),
so before the race itself is fixed, work around the problem.
* Remove call to flushTraceFileVoid() from MkCertCli
* Apply clang format
* Close trace file when error happens in runNetwork().
* Improve the bestCount algorithm in getLeader().
In the current implementation, if the nominees are [0,1], the chosen leader will be 1, which is an exception to other cases and our expectation that if 2 nominees have the same frequency, the one with lower id will be the leader.
* Remove unnecessary new statement.
stream will never be a nullptr.
* Move self->dnsCache out of lambda capture.
Member variables are not capture by default, thus, `host` and `service` are not captured. This somehow successfully compile, but throws std::bad_alloc or basic_string::_S_create exceptions when we call `host+":"+service` in dnsCache.remove().
* Revert unintended change.
* Address comments.
* fixup! Fix the XmlTraceLogFormatter
The original escape process uses a `loop` while the code is actually not
an ACTOR. So the actorcompiler is not reacting. This causes the escape
not escaping the XML fields properly.
* fixup! Reformat source
* KmsConnector implementation to support KMS driven CipherKey TTL
Description
KMS CipherKeys can be of two types:
1. Revocable CipherKeys: having a finite lifetime, after which the CipherKey
shouldn't be used by the FDB.
2. Non-revocable CipherKeys: ciphers are not revocable, however, FDB would
still want to refresh ciphers to support KMS cipher rotation feature.
Patch proposes following change to incorporate support for above defined cipher-key
types:
1. Extend KmsConnector response to include optional 'refreshAfter' & 'expireAfter'
time intervals. EncryptKeyProxy (EKP) cache would define corresponding absolute refresh &
expiry timestamp for a given cipherKey. On an event of transient KMS connectivity outage,
a caller of EKP API for a non-revocable key should continue using cached cipherKey until
it expires.
2. Simplify KmsConnector API arena handling by using VectorRef to represent component
structs and manage associated memory allocation/lifetime.
Testing
1. EncryptKeyProxyTest
2. RESTKmsConnectorTest
3. SimKmsConnectorTest
* KmsConnector implementation to support KMS driven CipherKey TTL
Description
diff-1: Set expireTS for baseCipherId indexed cache
KMS CipherKeys can be of two types:
1. Revocable CipherKeys: having a finite lifetime, after which the CipherKey
shouldn't be used by the FDB.
2. Non-revocable CipherKeys: ciphers are not revocable, however, FDB would
still want to refresh ciphers to support KMS cipher rotation feature.
Patch proposes following change to incorporate support for above defined cipher-key
types:
1. Extend KmsConnector response to include optional 'refreshAfter' & 'expireAfter'
time intervals. EncryptKeyProxy (EKP) cache would define corresponding absolute refresh &
expiry timestamp for a given cipherKey. On an event of transient KMS connectivity outage,
a caller of EKP API for a non-revocable key should continue using cached cipherKey until
it expires.
2. Simplify KmsConnector API arena handling by using VectorRef to represent component
structs and manage associated memory allocation/lifetime.
Testing
1. EncryptKeyProxyTest
2. RESTKmsConnectorTest
3. SimKmsConnectorTest
* KmsConnector implementation to support KMS driven CipherKey TTL
Description
diff-2: Fix Valgrind issues discovered runnign tests
diff-1: Set expireTS for baseCipherId indexed cache
KMS CipherKeys can be of two types:
1. Revocable CipherKeys: having a finite lifetime, after which the CipherKey
shouldn't be used by the FDB.
2. Non-revocable CipherKeys: ciphers are not revocable, however, FDB would
still want to refresh ciphers to support KMS cipher rotation feature.
Patch proposes following change to incorporate support for above defined cipher-key
types:
1. Extend KmsConnector response to include optional 'refreshAfter' & 'expireAfter'
time intervals. EncryptKeyProxy (EKP) cache would define corresponding absolute refresh &
expiry timestamp for a given cipherKey. On an event of transient KMS connectivity outage,
a caller of EKP API for a non-revocable key should continue using cached cipherKey until
it expires.
2. Simplify KmsConnector API arena handling by using VectorRef to represent component
structs and manage associated memory allocation/lifetime.
Testing
1. EncryptKeyProxyTest
2. RESTKmsConnectorTest
3. SimKmsConnectorTest
* KmsConnector implementation to support KMS driven CipherKey TTL
Description
diff-3: Address review comment
diff-2: Fix Valgrind issues discovered runnign tests
diff-1: Set expireTS for baseCipherId indexed cache
KMS CipherKeys can be of two types:
1. Revocable CipherKeys: having a finite lifetime, after which the CipherKey
shouldn't be used by the FDB.
2. Non-revocable CipherKeys: ciphers are not revocable, however, FDB would
still want to refresh ciphers to support KMS cipher rotation feature.
Patch proposes following change to incorporate support for above defined cipher-key
types:
1. Extend KmsConnector response to include optional 'refreshAfter' & 'expireAfter'
time intervals. EncryptKeyProxy (EKP) cache would define corresponding absolute refresh &
expiry timestamp for a given cipherKey. On an event of transient KMS connectivity outage,
a caller of EKP API for a non-revocable key should continue using cached cipherKey until
it expires.
2. Simplify KmsConnector API arena handling by using VectorRef to represent component
structs and manage associated memory allocation/lifetime.
Testing
1. EncryptKeyProxyTest
2. RESTKmsConnectorTest
3. SimKmsConnectorTest
* Add TRACING_SPAN_ATTRIBUTES_ENABLED Knob, default false.
In order to prevent accidental leakage of PII to external tracing collector services,
we've added a knob to prevent additional attributes to be added to spans unless explicitly
enabled by the user.
* Enable span attributes knob for unit tests.
Adding GetEncryptCipherKeys and GetLatestCipherKeys helper actors, which encapsulate cipher key fetch logic: getting cipher keys from local BlobCipherKeyCache, and on cache miss fetch from EKP (encrypt key proxy). These helper actors also handles the case if EKP get shutdown in the middle, they listen on ServerDBInfo to wait for new EKP start and send new request there instead.
The PR also have other misc changes:
* EKP is by default started in simulation regardless of. ENABLE_ENCRYPTION knob, so that in restart tests, if ENABLE_ENCRYPTION is switch from on to off after restart, encrypted data will still be able to be read.
* API tweaks for BlobCipher
* Adding a ENABLE_TLOG_ENCRYPTION knob which will be used in later PRs. The knob should normally be consistent with ENABLE_ENCRYPTION knob, but could be used to disable TLog encryption alone.
This PR is split out from #6942.
Description
Major changes proposed include:
1. Update EncryptKeyServer APIs to be tenant aware.
2. Update KmsConnector APIs to be tenant aware
Client of above APIs such as: CP, SS and BlobWorker need to supply
encryption domain info that includes: tenantId and tenantName
Testing
1. Update EncryptKeyProxyTest
2. Update RESTKmsConnectorTest
3. Update SimKmsConnectorTest
* Add JWT support to TokenSign
* Encapsulate OpenSSL public/private key type
Type-safe passing around of keys without having to DER/PEM-serialize
(OpenSSL doesn't have distinct types for public and private key)
* Apply Clang format
* Add verify benchmark for JWT and FlatBuffers token
* Unit test base64url::{encode, decode}
* Make all payload fields optional
Let user code validate non-signature fields
* Make all payload fields optional
Completely defer field check to user code
* Move rapidjson from fdbclient to contrib
* Make fdbrpc's rapidjson linkage private
Currently only sources include them.
* Modify rapidjson path in apiversioner.py
* Algorithm::Unknown > Algorithm::UNKNOWN
also in order to continue tracing the pending network thread activity.
Poll event throttler only in the network thread in order to avoid a race condition.
* coordinatorsKey should not storing IP addresses.
Currently, when we do a commit of coordinator change, we are always converting hostnames to IP addresses and store the converted results in coordinatorsKey (\xff/coordinators). This result in ForwardRequest also sending IP addresses, and receivers will update their cluster files with IPs, then we lose the dynamic IP feature.
* Remove the legacy coordinators() function.
* Update async_resolve().
ip::basic_resolver::async_resolve(const query & q, ResolveHandler && handler) is deprecated.
* Clean code format.
* Fix typo.
* Remove SpecifiedQuorumChange and NoQuorumChange.
The OTEL receiver does not utilize the parent Trace id, only parent Span id. Essentially
it relies on Trace ids implicitly matching during trace assembly. Therefore, we may remove
these 16 bytes in serialization.
* Enable debugId tracing for encryption requests
Description
diff-1: Minor fixes, address review comment
Proposed changes include:
1. Update EncryptKeyProxy API to embded Optional<UID> for debugging
request execution.
2. Encryption participant FDB processes can set 'debugId' enabling
tracing requests within FDB cluster processes and beyond.
3. The 'debugId' if available is embedded as part of 'request_json_payload'
by RESTKmsConnector, enabling tracing request between FDB <--> KMS.
4. Fix EncryptKeyProxyTest which got broken due to recent changes.
Testing
Updated following test:
1. EncryptKeyProxy simulation test.
2. RESTKmsConnector simulation test.
Description
Testing
* REST KmsConnector implementation
Description
diff-1: Address review comments.
Add utility interface to Platform namespace to
create and operate on tmpfile
diff-2: Address review comments
Link Boost::filesystem to CMake build process
Major changes includes:
1. Implement REST based KmsConnector implementation.
2. Salient features of the connector:
2.1. Two required configuration are:
a. Discovery KMS URLs - enable KMS discovery on bootstrap
b. Endpoint path configuration to construct URI to fetch/refresh
encryption keys
c. Configuration to provide "validationTokens" to connect with
external KMS. Patch implements file-based token validation scheme.
2.2. On startup, RESTKmsConnector discovers KMS Urls and caches
them in-memory. Extracts "validationTokens" based on input config.
2.3. Expose endpoints to allow fetch/refresh of encryption keys.
2.4. Defines JSON format to interact with external KMS - request &
response payload format.
3. Extend Platform namespace with an interface to create and operate on
tmp files.
4. Update Platform 'readFileBytes' and 'writeFileBytes' to leverage
fstream supported implementation.
NOTE: KMS URLs fetched after initial discovery will be persisted using
DynamicKnobs. It is TODO at the moment and shall be completed
once DynamicKnobs is feature complete
Testing
Unit test to validation following:
1. Parsing on "validation tokens" logic.
2. Construction and parsing of REST JSON request and response strings.