When remote DC is down, the remote team collection of DD can initializing
waiting for the remote to recover (all_tlog_recruited state). However, the
getTeam request can already be served by the remote team collection. So, for
a RelocateShard (data movement such as split, move), it will get a team for
the remote DC. But the data movement can't make progress on the remote team
because the remote DC hasn't recovered yet. Because of the stuck of data
movement, the primary cannot reach the "storage_recovered" state and stay in
accepting_commit state.
The specifc test failure: slow/ApiCorrectness.toml -s 339026305 -b on
at commit: 0edd899d65
In this test, primary DC has 1 SS killed, remote DC has 2 TLog and 2 SS killed.
So the remote is dead, the remaining 2 SSes can't make progress because of the
loss of 2 TLogs. The repairDeadDatacenter() can't reach the "storage_recovered"
state due to DD's failure of moving shards away from the killed SS in the
primary.
The fix is to exclude all remote in repairDeadDatacenter() so that tells DD to
mark all SSes in the remote as unhealthy. Another fix is to return empty
results for getTeam request if the remote team collection is not ready. This
will allow the data movement to continue, essentially remote team is not changed
for the data movement.
Description
Set `enable_configurable_encryption` knob in the unit test to make
RandomUnitTest runs happy
Testing
BlobCipherUnitTest
EncryptionOps
RandomUnitTest
* EaR: Configurable encryption framework
Description
EaR implementation only supports fixed size on-disk encryption header format.
One drawback of the scheme is, introducing a newer encryption scheme as well
as updating header format in future may incur data migration restrictions.
Major changes proposed in the patch includes:
1. Flexible Encryption header format allowing the following:
1.1. Header flags (metadata) can evolve separately from the encryption algorithm
1.2. Specific encryption algorithm header to allow future extensions.
2. Update the BlobCipher encryption/decryption util classes to work with newer
encryption header format.
3. Continue supporting multiple encryption authentication schemes such as:
HMAC-SHA and AES-CMAC; also, supports no encryption-authentication schemes.
4. Refactor BlobCipher unit test to enable testing of new format.
5. Configuration knobs to control encryption header flags and algorithm
versions.
Note:
The on-disk header storage footprint savings due to the newer scheme is as follows:
1. No encryption authentication: 54% smaller compared to existing implementation.
3. AES-CMAC: 16% smaller compared to existing implementation.
3. HMAC-SHA encryption authentication: almost same size.
Testing
BlobCipherTest
EncryptionOpsTest