* EaR: Configurable encryption framework
Description
EaR implementation only supports fixed size on-disk encryption header format.
One drawback of the scheme is, introducing a newer encryption scheme as well
as updating header format in future may incur data migration restrictions.
Major changes proposed in the patch includes:
1. Flexible Encryption header format allowing the following:
1.1. Header flags (metadata) can evolve separately from the encryption algorithm
1.2. Specific encryption algorithm header to allow future extensions.
2. Update the BlobCipher encryption/decryption util classes to work with newer
encryption header format.
3. Continue supporting multiple encryption authentication schemes such as:
HMAC-SHA and AES-CMAC; also, supports no encryption-authentication schemes.
4. Refactor BlobCipher unit test to enable testing of new format.
5. Configuration knobs to control encryption header flags and algorithm
versions.
Note:
The on-disk header storage footprint savings due to the newer scheme is as follows:
1. No encryption authentication: 54% smaller compared to existing implementation.
3. AES-CMAC: 16% smaller compared to existing implementation.
3. HMAC-SHA encryption authentication: almost same size.
Testing
BlobCipherTest
EncryptionOpsTest
Also, to minimize audit log loss, handle token usage audit logging at each usage.
This has a side-effect of making the token use log less bursty.
This also subtly changes the dedup cache policy.
Dedup time window used to be 5 seconds (default) since the start of batch-logging.
Now it's 5 seconds from the first usage since the closing of the previous dedup window
The metacluster status command in fdbcli currently reports some useful metacluster information when run on a
management cluster. We should update this command to report a status even on data clusters of a metacluster and
standalone clusters that do not belong to any metacluster.
- On data clusters, this would report that the cluster is a data cluster as well its name and the name of the metacluster it is a part of.
- On standalone clusters, status should report that the cluster is not part of a metacluster.
Test plan:
- CI
- Manual test
- Added new test `metacluster_fdbcli_tests.py` that can be run with ctest `ctest -R metacluster_fdbcli_tests`
Changes:
1. Update `ConfigureDatabase` workload to test with Redwood, while previously it does not. Also when encryption is enabled, only test with Redwood and not test the migration to other storage engine types, as currently only Redwood supports encryption.
2. Update multiple restart tests so that when testing upgrade from/downgrade to 7.2, disable encryption. This is due to recent change to make encryption controlled by DB config instead of a knob make 7.3 encryption incompatible with 7.2 encryption, and 7.2 encryption is considered an incomplete feature. This is done by splitting `from_7.1.0` test directory into `from_7.1.0_until_7.2.0`/`from_7.2.0_until_7.3.0`/`from_7.3.0`, duplicating every test to these directories and add `disableEncryption=true` when needed. Except...
3. For tests that run `SnapTest`, keep them in `from_7.1.0_until_7.2.0` directory. These tests could fail with 7.2/7.3 restart test, and due to separate Joshua issue, these failures are not exposed when they stay in `from_7.1.0` directory. The plan is to only keep these snapshot tests in `from_7.3.0` (or the directory for restart from the latest version) once issues are fixed.
Otherwise, the Attrition can RebootAndDelete tlogs in remote DC such that the
remote is unusable and blocking recovery to fully_recovered state. In fact,
the FirstCycleTest can only reach the accepting_commits state.
In the part 2 of the restarting test, the runTests() wait for quietDatabase()
to reach fully fully_recovered state, but was stuck in the accepting_commits
state.
* Print an asan heap profile on OOM
* Use 32KiB stacks for boost coro
* Print 100%, 10 max contexts for asan OOM
* Lower machineCount to 30 in DataLossRecovery test
* Add asanMachineCount override to control ASAN memory usage
* Implemented AuditUtils.actor.cpp
Moved AuditUtils to fdbserver/
* Persist AuditStorageState.
* Passed persisted AuditStorageState test.
* Added audit_storage_error to indicate a corruption is caught.
Throw/Send audit_storage_error when there is a data corruption.
Added doAuditStorage() for resuming Audit.
* Load and resume AuditStorage when DD restarts.
* Generate audit id monotonically.
* Fixed minor issue AuditId/Type was not set.
* Adding getLatestAuditStates.
* Improved persisted errors and added AuditStorageCommand.actor.cpp for
fdbcli.
* Added `audit_storage` fdbcli command.
* fmt.
* Fixed null shared_ptr issue.
* Improve audit data.
* Change DDAuditFailed to SevWarn.
* Sev.
* set SERVE_AUDIT_STORAGE_PARALLELISM to 1.
* Moved AuditUtils* to fdbclient/.
* Added getAuditStatus fdbcli command.
* Refactor audit storage fdb cli commands.
* Added auditStorage in sim.
* Cleanup.
* Resolved comments.
* Resolved comments.
* Test disabling audit for sims.
* Cleanup.
Co-authored-by: He Liu <heliu@apple.com>