* EaR: Update KMS URL refresh policy and fix bugs
Description
RESTKmsConnector implements discovery and refresh semantics i.e.
on bootstrap it discovers KMS Urls and periodically refresh the
URLs (handle server upgrade scenario). The current implementation
caches the URLs in a min-heap, as part of serving a request, actor
pops out elements from min-heap and attempts connecting to the server,
on failure, the URL is temporarily stored in a stack, at the end of
the request processing, the stack is merged back into the heap.
The code doesn't work as expected if there are multiple requests
consumes the heap causing following issues:
1. Min-heap would retain old URLs replaced by latest refresh (stack merge)
2. URL discovery file is read more than expected as multiple requests can
empty heap, causing the code to read URLs from the file.
Patch proposes following policy to cache and maintain URLs priority:
1. Unresponsiveness penalty: KMS flaky connection or overload can cause
requests to timeout or fail; each such instance updates unresponsiveness
penalty of associated URL context. Further, the penalty is time bound and
deteriorate with time.
2. Cached URLs are sorted once a failure is encountered, priority followed
is:
2.1. Unresponsiveness penalty server(s) least preferred
2.2. Server(s) with high total-failures less preferred
2.3. Server(s) with high total-malformed response less preferred.
3. Updates RESTClient to throw 'retryable' error up to the client such as:
'connection_failed' and/or 'timeout'
4. Extend RESTUrl to support IPv6 format.
Testing
RESTUnit - 100K (new test added for coverage)
devRunCorrectness
* EaR: REST KMS fixes - encryption integration testing
Description
Major changes:
1. Multiple fixes observed while performing integration end-to-end
testing for Encryption at-rest feature.
2. Improve REST module logging. Introduced FLOW_KNOBS->REST_LOG_LEVEL
to have more granular control of feature logging disconnected from
the cluster log level.
Testing
Integration testbed:
1. Run fdbserver standalone
2. Run external KMS http-server to serve encryption key fetch requests
* EaR: RESTClient HTTP compliance, fix json request content type
Description
diff-1: Address review comments
RESTClient is responsible to handle FDB <-> KMS communication
for Encryption and other usecases. By design, it only supports
"secure connection" i.e. "https"; however, it seems there is a
need to expand the module to support "http" connection,
for instance: test and dev deployments for instance.
However, given RESTClient gets involved in handling high
sensitive contents such as: plaintext "encryption cipher
from a KMS", the feature is guarded using
CLIENT_KNOB->REST_KMS_ENABLE_NOT_SECURE_CONNECTION which is
settable using FDBServer command line argument
"--kms-rest-enable_not_secure_connection" (boolean)
Testing
Deployed a standalone fdbserver and communicate with a
simple "http" server
Description
Two major changes proposed are:
I)
Used following setup for testing:
1. Run `fdbserver` locally.
2. Run a mock python based HTTP server (encryption endpoints not implemented)
Expectation was RESTClient code should go in loop trying to establish connection
to the desired encryption endpoint. However, observation was the code loops for
one cycle and followup cycle SEGV while printing a log using RESTUrl object which
is obtained as a 'pointer' from the caller. Update the code to use RESTUrl object
instead of the pointer.
II) In above setup, KMSConnector would throw 'encrypt_key_fetch_failed' error
which wasn't handled by EKProxy, hence, causing the service to terminate. Add
code to re-throw the error to the caller.
Testing
Description
Patch fixes an issue where new connection for a corresponding
'connectKey' isn't getting added to the connectionPoolMap.
Testing
Standlone fdbserver triggering RESTClient connection path