The PR is updating storage server and Redwood to enable encryption based on the encryption mode in DB config, which was previously controlled by a knob. High level changes are
1. Passing encryption mode in DB config to storage server
1.1 If it is a new storage server, pass the encryption mode through `InitializeStorageRequest`. the encryption mode is pass to Redwood for initialization
1.2 If it is an existing storage server, on restart the storage server will send `GetStorageServerRejoinInfoRequest` to commit proxy, and commit proxy will return the current encryption mode, which it get from DB config on its own initialization. Storage server will compare the DB config encryption mode to the local storage encryption mode, and fail if they don't match
2. Adding a new `encryptionMode()` method to `IKeyValueStore`, which return a future of local encryption mode of the KV store instance. A KV store supporting encryption would need to persist its own encryption mode, and return the mode via the API.
3. Redwood accepts encryption mode from its constructor. For a new Redwood instance, caller has to specific the encryption mode, which will be stored in Redwood per-instance file header. For existing instance, caller is supposed to not passing the encryption mode, and let Redwood find it out from its own header.
4. Refactoring in Redwood to accommodate the above changes.
* Fix Redwood tree height overgrowth when EaR and tenant page split are enabled, by removing the buildNewSubtree() logic.
* Fixing incorrect page upper bound for the last page created by writePages() without the buildNewSubtree() logic.
* Enable tenant page split if encryption mode is domain-aware encryption.
* Related test fixes:
- In simulation, pass encryption mode to storage/Redwood via knobs. This is a workaround to enable testing with Redwood encryption before we correctly pass the encryption mode via db config. Also temporarily disable tenant page split for restart tests.
- Disable raw access in FuzzApiCorrectness test if domain-aware encryption is enabled, to avoid test timeout
- Disable encryption for DrUpgradeRestart test, which is likely to fail due to a rare EKP deadlock issue blocking recovery. Will re-enable after the deadlock issue is fixed.
The unit test intentionally cause unexpected_encoding_type being thrown, which would hit a simulation only assert failure. Disabling that assert in this case.
Previously with EaR we always enable authentication (e.g. we encrypt Redwood pages). The authentication is a form of checksum, so dedicated page checksum was not needed. This PR adds back xxhash page checksum when authentication is disabled. Also change the knob to default disable authentication.
We have a recent redesign that no longer required to pass tenant name to get encryption key, and also not allowing optional tenant mode for tenant-aware encryption. This PR clean up Redwood code to remove tenant map usage, and update various checks accordingly.
Changes:
* Cleanup TenantPrefixIndex in TenantAwareEncryptionKeyProvider and related logic in storage server and Redwood for passing the map around.
* Cleanup and update DecodeBoundaryVerifier the reflect the new design.
* A minor fix to writePages() that avoid a page that's default domain encrypted having a lower bound key belonging to a non-default domain.
* Fix TenantAwareEncryptionKeyProvider::getEncryptionDomain() returning wrong prefix long for system domain.
* A minor change to add a context string to IoTimeoutError.
Redwood encrypt with page granularity. To do per-tenant encryption (i.e. each tenant are encrypted with different set of cipher keys), we need to split Redwood pages by tenant boundary. Moreover, it also needs to handle different tenant modes:
* tenantMode = disabled: do not split page by tenant, all data encrypt using default encryption domain.
* tenantMode = required: look at the prefix of the keys and split by tenant accordingly, and encrypt in tenant specific encryption domain accordingly.
* tenantMode = optional: some key ranges may not map to a tenant. In additional to looking at the key prefix, the key provider also query the tenant prefix index. For prefixes not found in the tenant prefix index, corresponding key should be encrypted using the default encryption domain.
The change also enforce data for each tenant forms a subtree, and key of the link to the subtree is exactly the tenant prefix.
This PR is building on top of #8172 and use the IPageEncryptionKeyProvider interface added there. Changes:
* In `writePages` and `splitPages`, query the key provider to split page accordingly.
* In `commitSubtree`, when doing in-place update (to both of leaf or internal page), check if the entry being insert belong to the same encryption domain as existing data of the page. If not, fallback to full page rebuild, where `writePages` will handle the page split.
* When updating the root, check if it is encrypted using non-default (i.e. tenant specific) domain. If so, add a redundant root node which will be encrypted with default encryption domain.
Tested with 100K run of `Lredwood/correctness/btree` unit test, where it uses `RandomEncryptionKeyProvider`, which is updated to support and generate random encryption domain with 4 byte domain prefixes.