* remainingBudgetForAuditTasks should be managed within audit
* fix CI
* add audit storage test for various ranges
* clean DD
* new auditStorageUserDataQ
* fix assert fail in startTrackShardAssignment
* fix assert fail in ssaudit
* address comments
* replace assert with audit_cancel in ss audits
* add audit check progress tool
* add observability to audit progress and fix audit bugs
* fix audit progress issues; add a sim test and trace events for audit progress; add an fdbcli command to track audit progress
* remove old audit storage on SS
* check audit progress when auditCore completes
* list audits
* cancel audits and corresponding tests
* make audit storage dblock aware
* increase the audit retry count now that audits can be cancelled
* fix updateAuditState and fdb github ci
* fmt
* fix fdbcli audit_storage and fix CI issue
* fix fdbcli
* address comments
* fmt
* Implemented AuditUtils.actor.cpp
Moved AuditUtils to fdbserver/
* Persist AuditStorageState.
* Passed persisted AuditStorageState test.
* Added audit_storage_error to indicate a corruption is caught.
Throw/Send audit_storage_error when there is a data corruption.
Added doAuditStorage() for resuming Audit.
* Load and resume AuditStorage when DD restarts.
* Generate audit id monotonically.
* Fixed minor issue AuditId/Type was not set.
* Adding getLatestAuditStates.
* Improved persisted errors and added AuditStorageCommand.actor.cpp for
fdbcli.
* Added `audit_storage` fdbcli command.
* fmt.
* Fixed null shared_ptr issue.
* Improve audit data.
* Change DDAuditFailed to SevWarn.
* Sev.
* set SERVE_AUDIT_STORAGE_PARALLELISM to 1.
* Moved AuditUtils* to fdbclient/.
* Added getAuditStatus fdbcli command.
* Refactor audit storage fdb cli commands.
* Added auditStorage in sim.
* Cleanup.
* Resolved comments.
* Resolved comments.
* Added SystemData for metadata audit.
Refactored audit workflow to make sure all sub-tasks are executed w/o
early exit.
* Improvements.
* Persisted Failed state after too many retries.
* Added retryCount for resumeAuditStorage().
* resolving conflict.
* Resolved conflicts.
* allow-merged-to-run
* add timeout to audit client
* fmt
* validate replica
* add audit serverKey
* address comments and fmt
* fix audit_storage_exceeded_request_limit
* fix segfault in getLatestAuditStatesImpl
* fix bugs
* remove timeout from workload
* fix bugs
* audit local view of shard assignment
* fmt
* fix stuck issue and make DD audit storage self-retry
* fix timeout
* fix timeout
* fix bugs and cleanup
* fix nit
* change name state to coreState for audit metadata
* address comments
* code clean
* fmt
* set up debugging
* cleanup
* clean up
* code cleanup
* code clean
* remove tmp file
* fmt
* trace the portion of shards that belong to the anonymous physical shard
* remove unnecessary actor cleanup
* do not give up when tr is too old
* address comments
* refactor
* clean
* fmt
* fix command help text
* fix AuditState restore and enable restore for metadata audit
* address comments
* fmt
* debug and improve the efficiency of resuming audits
* small change
* fix audit cli
* bypass completed audit when dd restart
* fix auditStorageCommandActor
* make mismatched key ranges more visible
* address comments
* make the local shard metadata check able to make progress across retries
* address comments
* address comments
* partition location metadata validation by range and server
* unset MIN_TRACE_SEVERITY
* address comments; SS proceeds automatically until it fails, then notifies DD
* persistNewAuditState should checkMoveKeysLock
* partition audit storage location metadata by range; move the shard assignment history definition to the end of the SS structure
* code cleanup
* fix error message in metadata validation
* fix registerAuditsForShardAssignmentHistoryCollection input for local shard validation
* add code comments and a guard ensuring the SS audit does not proceed automatically too many times without being notified by DD, to support audit cancellation later
* fix coalesceRangeList
* replace the rangeOverlapping function with an operator, and use a struct instead of a complicated type for the return values of getKeyServer/serverKey/shardInfo
* simplify shard assignment history
* shardAssignmentRecordRequests should be an unordered_map
* address comments; simplify trackShardAssignment; make anyChildAuditFailed cover all audit children; keep only one audit actor running at a time on each SS
* run only one shard-info validation at a time; other audit types do not have this limitation
---------
Co-authored-by: He Liu <heliu05023@gmail.com>
Co-authored-by: He Liu <heliu@apple.com>
Co-authored-by: Zhe Wang <zhewang@Zhes-Laptop.local>
* Test disabling audit for sims.
* Cleanup.
Co-authored-by: He Liu <heliu@apple.com>