`recoveredDiskFiles` is a promise that is fulfilled once all the local TLog and storage files in a process have been initialized. It was added so that a process would wait on it before joining the cluster, which kept a slowly recovering TLog from joining the cluster and slowing down cluster recovery.
With #7510, a process may join the cluster in a stateless role, while still being prevented from joining in a stateful role until its TLog and storage files are recovered. As such, the `recoveredDiskFiles` wait is no longer needed. This PR cleans up the logic.
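
The behavior described above can be modeled with a small `asyncio` sketch. This is only an illustration and not FoundationDB's Flow code: the class `JoiningProcess`, its methods, and the `disk_recovered` future are hypothetical names, and the future is just a stand-in for "local TLog and storage files are recovered". The process offers stateless roles immediately and stateful roles only after disk recovery completes.

```python
import asyncio


class JoiningProcess:
    """Toy model of a process joining a cluster (hypothetical names)."""

    def __init__(self):
        loop = asyncio.get_running_loop()
        # Stand-in for "local TLog and storage files have been recovered".
        self.disk_recovered = loop.create_future()
        self.roles_offered = []

    async def recover_disk_files(self, delay):
        # Simulate slow initialization of local TLog and storage files.
        await asyncio.sleep(delay)
        self.disk_recovered.set_result(True)

    async def join_cluster(self):
        # Join right away, but only as a candidate for stateless roles.
        self.roles_offered.append("stateless")
        # Stateful roles are offered only once disk recovery has finished.
        await self.disk_recovered
        self.roles_offered.append("stateful")


async def demo():
    proc = JoiningProcess()
    recovery = asyncio.create_task(proc.recover_disk_files(0.05))
    join = asyncio.create_task(proc.join_cluster())
    await asyncio.sleep(0)  # let both tasks run up to their first await
    early = list(proc.roles_offered)
    await asyncio.gather(recovery, join)
    return early, proc.roles_offered
```

Running `asyncio.run(demo())` returns the roles offered before and after recovery, which shows that a slow disk recovery no longer delays joining in a stateless role.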
* Extend TLog persistentStorage to persist encryption state
Description
diff-3: Address review comment.
diff-2: Extend ClusterController endpoints to allow querying the
cluster's encryptionAtRest status.
Update TLog recovery to ensure the on-disk encryption
status matches the encryptionAtRest mode persisted in the
cluster's cstate.
diff-1: Store encryptionAtRestMode state in Coordinators.
Major changes proposed are:
1. Extend TLog persistentStorage to persist encryption state.
2. The persisted encryption state is derived from the corresponding
db-config and relevant SERVER_KNOBS. The knobs are expected to be
removed in the near future.
3. On TLog startup, the persisted encryption state is compared
against the cluster configuration. On a mismatch, the TLog is killed
and not allowed to rejoin the cluster.
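
The startup check in item 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual TLog code: the enum values, the exception type, and the function name are hypothetical, and treating a missing persisted state as "adopt the cluster's mode" is an assumption made only for this example.

```python
from enum import Enum


class EncryptionAtRestMode(Enum):
    # Hypothetical values; the real set of modes is not listed here.
    DISABLED = 0
    ENABLED = 1


class EncryptionModeMismatch(Exception):
    """Raised when on-disk state disagrees with the cluster configuration."""


def check_tlog_encryption(persisted, cluster):
    """Return the mode the TLog should run with, or raise on a mismatch."""
    if persisted is None:
        # Assumption: a TLog with nothing persisted adopts the cluster's mode.
        return cluster
    if persisted != cluster:
        # In the description, this case kills the TLog and keeps it from
        # rejoining the cluster; here it is modeled as an exception.
        raise EncryptionModeMismatch(
            f"persisted={persisted.name}, cluster={cluster.name}"
        )
    return persisted
```

A caller would run this once at TLog startup and treat `EncryptionModeMismatch` as fatal for that TLog.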
Testing
devRunCorrectness - 100K
The logic to determine the validity of a process joining a cluster now
belongs on the worker and the cluster controller. It is no longer
restricted to tlogs and storages, but instead applies to all processes
(even stateless ones).
The cluster ID is now stored in the database instead of in the
txnStateStore. The cluster controller will read it on boot and send it
to all processes to persist.
Processes found to be invalid enter a "zombie" state: they cancel all
their actors and then wait forever, refusing to do any additional work
until they are manually handled by the operator.
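
A rough model of the cluster ID check and the "zombie" state, written with `asyncio`. All names (`Worker`, `on_cluster_id`, `wait_if_zombie`, `busy_actor`) are hypothetical and this is not the actual worker or cluster controller code. It assumes that a process with no stored cluster ID simply persists the one it receives.

```python
import asyncio


class Worker:
    """Toy model of a process; names are illustrative only."""

    def __init__(self, persisted_cluster_id=None):
        self.persisted_cluster_id = persisted_cluster_id
        self.tasks = []      # stand-ins for the process's running actors
        self.zombie = False

    def on_cluster_id(self, cluster_id):
        """Handle the ID sent by the cluster controller; True if valid."""
        if self.persisted_cluster_id is None:
            # Assumption: nothing stored yet, so persist the received ID.
            self.persisted_cluster_id = cluster_id
            return True
        if self.persisted_cluster_id == cluster_id:
            return True
        # Mismatch: cancel all actors and become a zombie.
        for task in self.tasks:
            task.cancel()
        self.zombie = True
        return False

    async def wait_if_zombie(self):
        if self.zombie:
            # This event is never set, so the wait never completes.
            await asyncio.Event().wait()


async def busy_actor():
    while True:
        await asyncio.sleep(0.01)


async def demo():
    worker = Worker(persisted_cluster_id="cluster-A")
    worker.tasks.append(asyncio.create_task(busy_actor()))
    accepted = worker.on_cluster_id("cluster-B")
    # Wait for the cancelled actors to finish unwinding.
    await asyncio.gather(*worker.tasks, return_exceptions=True)
    try:
        await asyncio.wait_for(worker.wait_if_zombie(), timeout=0.05)
        still_waiting = False
    except asyncio.TimeoutError:
        still_waiting = True
    return accepted, worker.tasks[0].cancelled(), still_waiting
```

In `demo()`, a worker that persisted `"cluster-A"` receives `"cluster-B"`, rejects it, cancels its actor, and then blocks until the demo's timeout gives up on it. In the real system only an operator ends that wait.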