* Implemented AuditUtils.actor.cpp
Moved AuditUtils to fdbserver/
* Persist AuditStorageState.
* Passed persisted AuditStorageState test.
* Added audit_storage_error to indicate that a corruption was caught.
Throw/Send audit_storage_error when there is data corruption.
Added doAuditStorage() for resuming Audit.
* Load and resume AuditStorage when DD restarts.
* Generate audit id monotonically.
* Fixed minor issue where AuditId/Type was not set.
* Adding getLatestAuditStates.
* Improved persisted errors and added AuditStorageCommand.actor.cpp for
fdbcli.
* Added `audit_storage` fdbcli command.
* fmt.
* Fixed null shared_ptr issue.
* Improve audit data.
* Change DDAuditFailed to SevWarn.
* Sev.
* set SERVE_AUDIT_STORAGE_PARALLELISM to 1.
* Moved AuditUtils* to fdbclient/.
* Added getAuditStatus fdbcli command.
* Refactor audit storage fdb cli commands.
* Added auditStorage in sim.
* Cleanup.
* Resolved comments.
* Resolved comments.
* Added SystemData for metadata audit.
Refactored audit workflow to make sure all sub-tasks are executed w/o
early exit.
* Improvements.
* Persisted Failed state after too many retries.
* Added retryCount for resumeAuditStorage().
* resolving conflict.
* Resolved conflicts.
* allow-merged-to-run
* add timeout to audit client
* fmt
* validate replica
* add audit serverKey
* address comments and fmt
* fix audit_storage_exceeded_request_limit
* fix segfault in getLatestAuditStatesImpl
* fix bugs
* remove timeout from workload
* fix bugs
* audit local view of shard assignment
* fmt
* fix-stuck-issue-and-make-dd-audit-storage-self-retry
* fix timeout
* fix timeout
* fix bugs and cleanup
* fix nit
* change name state to coreState for audit metadata
* address comments
* code clean
* fmt
* setup debug
* cleanup
* clean up
* code cleanup
* code clean
* remove tmp file
* fmt
* trace portion of shards that belong to anonymous physical shards
* remove unnecessary actor cleanup
* do not give up when tr is too old
* address comments
* refactor
* clean
* fmt
* fix-command-help-text
* fix-auditstate-restore-and-enable-restore-to-metadata-audit
* address comments
* fmt
* debug and improve efficiency of resuming audits
* small change
* fix audit cli
* bypass completed audits when dd restarts
* fix auditStorageCommandActor
* make mismatched key ranges more visible
* address comments
* make local shard metadata check able to make progress via retries
* address comments
* address comments
* partition location metadata validation by range and server
* unset MIN_TRACE_SEVERITY
* address comments and SS auto proceed until failed then notify dd
* persistNewAuditState should checkMoveKeysLock
* audit storage location metadata partitioned by range and move shard assignment history def to the end of SS structure
* code cleanup
* fix error message in metadata validation
* fix registerAuditsForShardAssignmentHistoryCollection input for local shard validation
* add comments to code and add guard to make sure the SS audit does not proceed automatically too many times without being notified by DD --- to support audit cancellation later
* fix coalesceRangeList
* replace rangeOverlapping func with operator and use struct instead of complicated type for return value of getKeyServer/serverKey/shardInfo
* simplify shard assignment history
* shardAssignmentRecordRequests should be an unordered_map
* address comments, make trackShardAssignment simple, make anyChildAuditFailed cover all audit children, keep only one audit actor run at a time on each SS
* only run validate shard info once at a time; other audit types do not have this limitation
---------
Co-authored-by: He Liu <heliu05023@gmail.com>
Co-authored-by: He Liu <heliu@apple.com>
Co-authored-by: Zhe Wang <zhewang@Zhes-Laptop.local>
* fix metacluster get segfault
* update fdbcli tenant list function to take tenant group filter, support JSON, and report tenant IDs
* code review changes
* code formatting
* additional code review changes
* account for empty tenant groups
* reformat error catching in fdbcli command
* refactor json output and address code review comments
* add back mistakenly removed hint
* keep hints after 4th token
* add to tenant management workload
* fix compile error
* fix test range
* add more asserts to metacluster case
* nest test condition inside if block
* adjust tenant test layout
* refactor some test files
* reorganize test workload logic
Description
Patch removes an unused CODE_PROBE that checked the encryption header's
flag version being read is valid. Given the flag version is determined
by peeking into the std::variant index and only version-1 is supported
for now, the check is converted to an ASSERT.
Testing
EncryptionUnitTests.toml
EncryptionOps.toml
BlobGranuleCorrectness/Clean.toml
In clearData(), used by the simulation test to clean up, we re-used the same debugID for multiple transactions,
making it less clear when grepping for the debugID. This PR assigns a new debugID for each new transaction.
Some of the commented-out tracing code is already obsolete; I updated several of them, but am not sure if
the tracing should be enabled when enabling debug transactions.
Finally, use different but similar error messages with the same prefix for different call stacks.
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
Fix `RangeResult.readThrough` misuses:
1. KeyValueStores do not need to set readThrough, as it will not be
serialized and returned. Also, setting it to the last key of the result
is not right; it should at least be the keyAfter of the last key;
2. Fix NativeAPI not setting `RangeResult.more` in a few places;
3. Avoid `tryGetRange()` setting `readThrough` when `more` is false,
which was a workaround for the above item 2;
4. `tryGetRangeFromBlob()` doesn't set `more` but sets `readThrough` to
indicate the end, which I think was following the same workaround
above. Fixed that.
5. `getRangeStream()` sets `more` to true and then lets the
`readThrough` be its boundary.
Also added readThrough getter/setter functions to validate its usage.
* EaR: Implement Key Check Value semantics
Description
Key Check Value (KCV) is a checksum of a cryptographic encryption key
used to validate the encryption key's integrity. FDB Encryption at-rest
relies on an external KMS to supply encryption keys.
Patch proposes following major changes:
1. Implement Sha256 based KCV implementation to protect against
'baseCipher' corruption in two possible scenarios:
a) potential corruption external to FDB
b) potential corruption within FDB processes.
2. Scheme persists computed KCV token in block encryption header,
which then gets validated as part of header validation during
decryption.
3. FDB Encryption key derivation uses the HMAC_SHA256 digest generation
scheme, which allows at most 64 bytes of 'cipher buffer'; the patch adds
the required check to ensure the 'baseCipher' length is within bounds.
The underlying OpenSSL HMAC call ignores extra length if supplied;
however, that weakens the security guarantees, hence it is disallowed.
Testing
devRunCorrectness - multiple 500K runs
Valgrind & Asan - BlobCipherUnit, RESTKMSUnit, BlobGranuleCorrectness*,
EncryptionOps, EncryptKeyProxyTest
* EaR: Fix BlobCipher cache handling for cipher needs refresh and/or expired
Description
Patch fixes a BlobCipher cache bug related to handling of cipherKeys
that either 'needsRefresh' and/or have 'expired'.
Also, adds a unit-test to cover the following usecase:
1. Test refreshAt and expireAt properties of the cipherKey
2. Validate corresponding Counter value increments
Testing
Extended /blobCipher unit tests
Description
EKP failure detection time is chosen to be fairly low given EKP
availability could impact cluster availability. However, in
simulation runs, for various reasons such a low-latency failure
detection could cause EKP recruitment to be done in a loop.
Patch relaxes the failure detection time for simulation runs
Testing
Description
RESTKms validation tokens are provided using foundationdb.conf; the file
uses '#' as the comment start character. Update the token-name separator
to '$' instead of '#'.
Testing
RESTKmsConnectorUnit.toml
* Api Tester: Specify knobs in the toml file; Test loop profiler
* Gracefully stop the loop profiler thread
* Protect loop profiler thread by mutex
* Create loop profiler thread only if it is not stopped
In the pathological case that both key size and value size are 0,
the probability of choosing that key-value pair is 0, and we divide
by zero when computing the sampledSize.
This change adds documentation to that function, which was quite
difficult to understand. In addition, we add `probability` to the
returned values, since one of the callers was backing it out from
sampledSize and was itself in danger of dividing by zero.
* add a field to show average data movement bytes in MovingData trace
* change variable type
* Make changes to variable/function naming and add more comments
* move rolling window struct to a new file; deal with corner case: dd startup, elements are full
* format
* simplify codes
* modify file/struct name, universal to moving window
* fix typo
* add a new Knob to control MovingWindow::Deque size
* make the general use of dequeSize limit
* format
* format, use space rather than tab
* Add processID, networkAddress, and locality to layer status JSON for Backup Agents.
* Backup/dr agent determines the network address to report in Layer Status only once, when the status updater loop begins, since it is a blocking call that connects to the cluster. Also lots of code cleanup.
Description
Patch proposes two changes:
1. Avoid appending tls as part of URI for secure connections
2. RefreshEKs recurring task can be skipped if there are no keys to be refreshed
Testing
EncryptionOps.toml
EncryptKeyProxyTest.toml
devRunCorrectness
devRunCorrectnessFiltered 'Encrypt*'
* EaR: REST kms misc fixes
Description
Patch addresses following issues:
1. Fix the "return connection" routine; this fixes a regression
introduced by an earlier fix.
2. Update RESTConnectionPool::connectionPoolMap to an "unordered_map"
for O(1) lookups
3. Improve logging
4. Make RESTUrl parsing handle extra '/' for 'resource'
Testing
Standalone fdbserver connecting to external KMS and database create
Description
Knob 'REST_KMS_ALLOW_NOT_SECURE_CONNECTION' got renamed in a recent
patch; however, there are other places that need an update too.
Testing
devRunCorrectness - 100K
RESTUtilUnits.toml
RESTKmsConnectorUnits.toml
* EaR: REST KMS fixes - encryption integration testing
Description
Major changes:
1. Multiple fixes observed while performing integration end-to-end
testing for Encryption at-rest feature.
2. Improve REST module logging. Introduced FLOW_KNOBS->REST_LOG_LEVEL
to have more granular control of feature logging disconnected from
the cluster log level.
Testing
Integration testbed:
1. Run fdbserver standalone
2. Run external KMS http-server to serve encryption key fetch requests
* EaR: RESTClient HTTP compliance, fix json request content type
Description
diff-1: Address review comments
RESTClient is responsible for handling FDB <-> KMS communication
for Encryption and other usecases. By design, it only supports
"secure connection" i.e. "https"; however, there is a need to
expand the module to support "http" connections, for instance
in test and dev deployments. However, given RESTClient handles
highly sensitive content such as plaintext encryption ciphers
from a KMS, the feature is guarded using
CLIENT_KNOB->REST_KMS_ENABLE_NOT_SECURE_CONNECTION which is
settable using FDBServer command line argument
"--kms-rest-enable_not_secure_connection" (boolean)
Testing
Deployed a standalone fdbserver and communicated with a
simple "http" server
1. Some additional idempotency problems in metacluster tests
2. An assertion that checked that a rename had expected values could fail during concurrent restores, but it would only happen if the transaction itself would fail to commit
3. Tweak the parameters of the MetaclusterRecovery test to try to avoid rare cases of logging too many trace events
1. When retrying the transaction to register a restoring cluster, don't choose a new ID if the current ID matches the one recorded for the restoring cluster
2. A metacluster test was incorrectly handling the case where a transaction was retried with unknown result and had committed successfully
* Fix transaction option consistency in TagThrottleInfo getter
A subroutine of the getter actor function for throttled and recommended tags
was, upon retry, resetting the transaction object which the caller also uses,
resetting the transaction option and causing a key_outside_legal_range error for the caller.
Also, allowing a subroutine to conditionally and non-trivially modify the passed object
(i.e. transaction reset) is a risky pattern.
Fix: confine subroutine's responsibility to "attempting to" fetch and parse
"autoThrottlingEnabled" key. Let the calling function reset the object if needed.
* Apply Clang format
The commit proxy writes a `ProxyMetrics` trace every 5 seconds. This
event contains a lot of useful information, such as the number of commit
batches that arrived and exited, the number of mutations processed, the
number of bytes those mutations made up, etc. However, it is difficult
to tell what the workload pattern looks like within these 5 second
intervals when the metrics are being calculated.
This PR adds a new trace, `ProxyDetailedMetrics`, which logs itself
every 100ms. It currently only writes the number of mutations and the
number of mutation bytes that arrived during the 100ms time period. But
it should be easy to add more metrics in the future.
It's possible this increased logging could cause issues. Based off a
simulation run of the `WriteDuringRead` test, I got the following
results:
```
$ rg ProxyDetailedMetrics trace.json | wc -l
6877
$ rg "Roles\": \".*CP.*\"" trace.json | wc -l
11402
$ wc -l trace.json
96147 trace.json
```
So on processes running as a commit proxy, this approximately doubled
the number of lines logged. But relative to the cluster overall, it only
added about 5% overhead.
If we want to reduce this number, one possibility would be to not write
a trace if all the values being written are 0. I'm not sure if this
would help much in production, but in simulation the large majority of
the traces (99%+) consist of zero values.
* EaR: Update ApiWorkload to validate encryption at-rest guarantees
Description
FDB encryption at-rest guarantees that, if the cluster is configured with
the feature enabled, all data written to persistent disks is "encrypted".
Given FDB maintains multiple persistent storages during the lifecycle of
the data, the patch proposes a scheme to validate the invariant via
"simulation testing".
Patch proposes updating ApiCorrectness workload to do the following:
1. Client-supplied params and/or random selection enable the validation feature.
2. Validation when enabled, allows injecting a known "marker string"
to workload generated Key and Value data patterns.
3. On shutdown, if the validation is enabled, all test files are
scanned for the known "marker" pattern.
Simulation tests are already capable of doing the following:
1. Randomly select TenantMode (disabled/optional/required)
2. Randomly select EncryptionAtRestMode (cluster_aware/domain_aware)
Hence, the updated test validates all possible combinations. Also,
'defaultTenant' is present to cover 'domain_aware' encryption use cases.
Testing
devRunCorrectness
devRetryCorrectness - ApiCorrectness & EncryptedBackupCorrectness
1. Change the cluster ID each time the restore registers the cluster
2. Handle commit unknown result in the removal purge
3. Delete the active restore ID when a removal is first recorded rather than at the end of the removal
4. Delete any existing active restore IDs when registering a cluster in the management cluster
* initial commit to split tenant group metadata
* attempt to fix merge errors
* fix compile errors and adjust existing tests
* fix infinite loop and extra ACTOR tag
* direct assignment instead of store
* direct assign instead of store (missed a few)
Right now this only allows one server address to be excluded. This is useful
when the database is unavailable but we want the recruitment to skip some
particular processes.
Manually tested the concept works with a loopback cluster.