* Test watch cleanup on cancel
* Fix clearing the database in Java integration tests
* Always cancel the futures wrapped by MVC abortable futures
* More tests for watch cleanup
* Fix clearing the database in some Java integration tests
* improve audit throughput
* if ssshard fails to do audit due to ssi failure, then global retry is required
* fix a trace event name
* fix budget release in doAudit
* avoid throttling in general simulation tests
* fix doAuditOnStorageServer throw error
* avoid starting a task that has been completed
* when ddaudit ssshard failed, check if ssi is removed, if yes, silently exit
* fix trace detail name of AuditUtilStorageServerRemovedEnd event
* redo schedule in doAuditOnStorageServer
* schedule does not wait for doAudit
* remove TESTING_AUDIT_STORAGE_THROTTLING
* ssaudit stops proceeding if ddauditstate is not in running phase
* make tester audit storage only happen in simulation, and randomly set CONCURRENT_AUDIT_TASK_COUNT_MAX
In some rare cases, using the same failure time may prevent a certain size from
getting a team, thus causing data movement to become stuck.
Reproduction:
Seed: -f ./tests/rare/DcLag.toml -s 924031356 -b on
commit: 0975874b9 on release-7.3
build: gcc
* Remove duplicate getRange() for DB handles and update existing GetRange to accept DB handles.
* Initial progress checkpoint on new ConsistencyScan role.
* Updated TODOs, finished most if not all state updates.
* placeholder
* Add more TODOs, documentation and comment improvements.
* Checkpoint round state to avoid advancing progress if commit fails.
* Bug fix, check is supposed to be for overlap, not lack of overlap.
* Added more TODOs and added faked read results / exceptions and faked DB size retrieval to prove the consistencyScanCore logic works.
* Update JSON schemas and command help.
* Add comment about lifetime stats reset.
* More TODO comments and some renames for clarity, some bug fixes.
* properly stopping consistency scan in simulation so that it doesn't run forever and cause quiet database to fail
* removing trailing comma from consistency_scan json schema
* Making CC inconsistency not an error if it's intentional tss corruption
* consistency scan actually reads storage locations
* added check that consistency scan actually completes a round in simulation, fixed bug and added debugging around consistency scan getting stuck
* made consistency scan properly fetch database size
* refactoring data check to be used in both consistency scan and consistency check
* checking that consistency scan always completes at least one round and doesn't get stuck
* cleanup
* fixing ide build
* consistencyscan fdbcli command wasn't actually changing db state
* consistencyscan fdbcli command always said enabled even when it wasn't
---------
Co-authored-by: Steve Atherton <steve.atherton@snowflake.com>
In the KillRegion workload, a forced recovery can intentionally kill a region
and recover from another region with data loss. When the ConsistencyScan (CS)
role checks a shard against two storage servers, the SS in the failed region
may reply first, at a higher version, and then be killed. CS later gets data
from the second SS in a different region, where the forced recovery rolled
back and a value at a lower version is returned.
In this case, even though the data is inconsistent, we shouldn't fail the test.
Reproduction:
commit: 8e0c9eab8 on release-7.3 branch
seed: -f ./tests/fast/KillRegionCycle.toml -s 2713020850 -b on
build: gcc
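The tolerated case described above can be sketched roughly as follows. This is
an illustrative sketch only, not the actual ConsistencyScan code: the names
ReplicaRead, ScanVerdict and compareReplicas are hypothetical, and it assumes
the caller already knows whether a forced recovery happened during the check.

```cpp
#include <cstdint>
#include <string>

// Hypothetical reply from one storage server; not a real FoundationDB type.
struct ReplicaRead {
    std::string value; // data returned for the checked shard
    int64_t version;   // version at which the read was served
};

// Hypothetical outcome of comparing two replies for the same shard.
enum class ScanVerdict { Consistent, ToleratedMismatch, Inconsistent };

// Compare the first reply (possibly from the region that was later killed)
// with the second reply (from the surviving region).
ScanVerdict compareReplicas(const ReplicaRead& first,
                            const ReplicaRead& second,
                            bool forcedRecoveryHappened) {
    if (first.value == second.value) {
        return ScanVerdict::Consistent;
    }
    // The data differs. Tolerate it only if a forced recovery happened and
    // the second reply was served at a lower version than the first, which
    // is the rollback pattern described in the commit message.
    const bool rolledBack = second.version < first.version;
    if (forcedRecoveryHappened && rolledBack) {
        return ScanVerdict::ToleratedMismatch;
    }
    return ScanVerdict::Inconsistent;
}
```

A mismatch without a forced recovery, or one where the second reply is not at
a lower version, is still reported as an inconsistency in this sketch.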
* EaR: REST based Simulated KMS Vault request handler interface
Description
diff-1: Address review comments
Improve unit test case coverage
diff-2: Extend RESTKmsConnectorUtil to generate HTTP::Header
EaR simulation testing is currently driven by the SimKmsConnector
interface, which exposes endpoints invoked directly by EKP to fetch
encryption keys. This approach leaves the RESTKms communication
path untested. The FDB codebase was recently extended with an
HTTPServer interface, whose absence was a gap preventing end-to-end
testing of EaR code.
This patch proposes the following changes:
1. Refactor RESTKmsConnector to move common code and definitions
to the RESTKmsConnectorUtil namespace
2. Introduce RESTSimKmsVault, which accepts HTTP format requests and
provides appropriate HTTP responses.
Testing
RESTUnit 100K + 5k valgrind
devRunCorrectness 100K
* Stop BGCC while blob restore is in progress
* Explicitly add back knob BG_CONSISTENCY_CHECK_ENABLED for possible later use
* Add back the knobs to test files to avoid race condition