Commit Graph

990 Commits

Author SHA1 Message Date
Jon Fu 7c5de05cdb
Separate failed and excluded servers on fdbcli output (#10089)
* separate failed and excluded servers on fdbcli output

* change formatting
2023-05-02 14:22:17 -04:00
A.J. Beamon 0035d9c519
Merge pull request #10074 from sfc-gh-ajbeamon/apply-black-format
Apply black format to most Python files
2023-05-02 08:20:47 -07:00
Yanqin Jin 8b1fe728be
Add configuration option `auto_tenant_assignment` to data clusters (#10058)
This PR adds auto_tenant_assignment option to register/configure data clusters.
Setting auto_tenant_assignment to disabled means the data cluster is a dedicated one and won't be
used for auto tenant assignment. This option is enabled by default (allowing auto tenant assignment).

Test plan:
simulation tests and metacluster_fdbcli_tests.py
---------

Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2023-05-01 21:58:49 -07:00
Zhe Wang d6e7b5f736
Audit storage: validate consistency of replica and shard location metadata (#9628)
* Implemented AuditUtils.actor.cpp

Moved AuditUtils to fdbserver/

* Persist AuditStorageState.

* Passed persisted AuditStorageState test.

* Added audit_storage_error to indicate a corruption is caught.

Throw/Send audit_storage_error when there is a data corruption.

Added doAuditStorage() for resuming Audit.

* Load and resume AuditStorage when DD restarts.

* Generate audit id monotonically.

* Fixed minor issue AuditId/Type was not set.

* Adding getLatestAuditStates.

* Improved persisted errors and added AuditStorageCommand.actor.cpp for
fdbcli.

* Added `audit_storage` fdbcli command.

* fmt.

* Fixed null shared_ptr issue.

* Improve audit data.

* Change DDAuditFailed to SevWarn.

* Sev.

* set SERVE_AUDIT_STORAGE_PARALLELISM to 1.

* Moved AuditUtils* to fdbclient/.

* Added getAuditStatus fdbcli command.

* Refactor audit storage fdb cli commands.

* Added auditStorage in sim.

* Cleanup.

* Resolved comments.

* Resolved comments.

* Added SystemData for metadata audit.

Refactored audit workflow to make sure all sub-tasks are executed w/o
early exit.

* Improvements.

* Persisted Failed state after too many retries.

* Added retryCount for resumeAuditStorage().

* resolving conflict.

* Resolved conflicts.

* allow-merged-to-run

* add timeout to audit client

* fmt

* validate replica

* add audit serverKey

* address comments and fmt

* fix audit_storage_exceeded_request_limit

* fix segfault in getLatestAuditStatesImpl

* fix bugs

* remove timeout from workload

* fix bugs

* audit local view of shard assignment

* fmt

* fix-stuck-issue-and-make-dd-audit-storage-self-retry

* fix timeout

* fix timeout

* fix bugs and cleanup

* fix nit

* change name state to coreState for audit metadata

* address comments

* code clean

* fmt

* setup debug

* cleanup

* clean up

* code cleanup

* code clean

* remove tmp file

* fmt

* trace portion of shards that of anonymous physical shard

* remove unnecessary actor cleanup

* do not give up when tr is too old

* address commits

* refactor

* clean

* fmt

* fix-command-help-text

* fix-auditstate-restore-and-enable-restore-to-metadata-audit

* address comments

* fmt

* debug and improve efficiency of resume audit

* small change

* fix audit cli

* bypass completed audit when dd restart

* fix auditStorageCommandActor

* make mismatch key range more visible

* address comments

* make local shard metadata check able to make progress via retries

* address comments

* address comments

* partition location metadata validation by range and server

* unset MIN_TRACE_SEVERITY

* address comments and SS auto proceed until failed then notify dd

* persistNewAuditState should checkMoveKeysLock

* audit storage location metadata partitioned by range and move shard assignment history def to the end of SS structure

* code cleanup

* fix error message in metadata validation

* fix registerAuditsForShardAssignmentHistoryCollection input for local shard validation

* add comments to code and add guard to make sure the SS audit does not proceed automatically many times without being notified by DD --- to support audit cancellation later

* fix coalesceRangeList

* replace rangeOverlapping func with operator and use struct instead of complicated type for return value of getKeyServer/serverKey/shardInfo

* simplify shard assignment history

* shardAssignmentRecordRequests should be unordered_map

* address comments, make trackShardAssignment simple, make anyChildAuditFailed cover all audit children, keep only one audit actor run at a time on each SS

* only run validate shard info once at a time, other audit types do not have this limitation

---------

Co-authored-by: He Liu <heliu05023@gmail.com>
Co-authored-by: He Liu <heliu@apple.com>
Co-authored-by: Zhe Wang <zhewang@Zhes-Laptop.local>
2023-05-01 10:35:52 -07:00
A.J. Beamon 182dc93ebd Apply black format to most Python files, excluding a few cases where we have Python 2 files and a few files written externally. Add external files as exclusions to the precommit checks. 2023-04-28 11:46:41 -07:00
Steve Atherton 69d6e43354 Added explanation of \u support to fdbcli token parsing. Small tweak to rangeconfig hints. Reformatted rangeconfig help to not be indented because the help printer does its own line wrapping which makes it look very messy. 2023-04-25 13:04:38 -07:00
Steve Atherton ab7b4c490e Add inline command line help for rangeconfig. 2023-04-25 12:16:04 -07:00
Steve Atherton b70ff34a66 Move custom shard test setup to a separate function. Add JSON utf-8 escaped bytes to fdbcli token parsing. 2023-04-25 10:48:54 -07:00
Steve Atherton 858b51a69b Address review comments. KeyRangeMapSnapshot is now ReferenceCounted and getSnapshot() returns a Reference to discourage copying. Added several comments for clarity. Added FormatUsingTraceable and changed all new formatters to use it except for Standalone<T> which redirects to the formatter for T. 2023-04-24 19:01:05 -07:00
Steve Atherton c57ed25987 Renamed SystemDBLockWriteNow() to SystemDBWriteLockedNow() and changed definition to be more direct / clear. 2023-04-22 13:17:41 -07:00
Steve Atherton 639d4d05ef Removed SYSTEM_PRIORITY_IMMEDIATE from KeyBackedTypes and all options from KeyBackedRangeMap database functions. Added SystemTransactionGenerator<> for wrapping Database types and generating transactions with selected system level options. 2023-04-21 19:00:29 -07:00
Steve Atherton 46cde666a5 Merge commit '9639192a88001043a104aeef0c394e99ca5d6a6e' into keybackedrangemap 2023-04-21 13:27:15 -07:00
Steve Atherton 948e2dd781 Bug fix in KeyBackedRangeMap::updateRange() where the range after the modified region could be set wrong. Added Database version of updateRange(). 2023-04-20 20:44:24 -07:00
Jon Fu a7cf82adb2
Update fdbcli tenant list function to take tenant group filter, support JSON, and report tenant IDs (#9967)
* fix metacluster get segfault

* update fdbcli tenant list function to take tenant group filter, support JSON, and report tenant IDs

* code review changes

* code formatting

* additional code review changes

* account for empty tenant groups

* reformat error catching in fdbcli command

* refactor json output and address code review comments

* add back mistakenly removed hint

* keep hints after 4th token

* add to tenant management workload

* fix compile error

* fix test range

* add more asserts to metacluster case

* nest test condition inside if block

* adjust tenant test layout

* refactor some test files

* reorganize test workload logic
2023-04-20 16:22:47 -04:00
Steve Atherton 2553aed118 KeyBackedRangeMap::updateRange() now coalesces adjacent matching ranges caused by the update, and supports replacing a range's config with a new explicit value. Added update command to rangeconfig cli. 2023-04-20 13:02:04 -07:00
Steve Atherton a164f8fa9d Add rangeconfig CLI. 2023-04-19 22:19:55 -07:00
Steve Atherton 53ee26d758 Changed KeyBackedTypes to an actor file. Added TypedKeySelectors for Map and Set classes and getRange() keySelector methods. Added debug macro for KeyBackedTypes. Rewrote KeyBackedRangeMap using keyselectors on KeyBackedMap. 2023-04-18 22:21:19 -07:00
Yanqin Jin 2959d07797
Add test coverage for metacluster operations via fdbcli (#9802)
Add test coverage for metacluster operations via fdbcli

Test plan:

```bash
mkdir build && cd build && cmake -G Ninja ..
ninja fdbcli fdbserver fdbmonitor
ctest -R metacluster_fdbcli_tests
```
2023-04-14 07:42:55 -07:00
Chaoguang Lin b9935ef6b4 Add wait at the end of versionepoch which triggers recovery; add start&end logging of each command test 2023-04-04 17:05:47 -07:00
A.J. Beamon 9c786e6d1e
Merge pull request #9854 from sfc-gh-ajbeamon/metacluster-separate-project
Metacluster refactoring
2023-04-04 09:43:41 -07:00
Hui Liu f2a406f609
Add blob manifest and mutation log status to "status json" (#9856) 2023-04-03 18:30:13 -07:00
A.J. Beamon 807646675c Refactor the metacluster project into smaller files, and reorganize the namespaces. Move some metacluster and tenant testing helpers into the metacluster project. 2023-03-30 16:20:09 -07:00
A.J. Beamon e61748c7d5 Move metacluster into its own directory and static library 2023-03-30 16:07:49 -07:00
Jon Fu 6c337e5c7a
fix metacluster get segfault (#9693) 2023-03-23 14:21:25 -04:00
A.J. Beamon 5504a58a12 Fix formatting 2023-03-13 16:04:25 -07:00
A.J. Beamon d39cda610a Merge branch 'main' into metacluster-improvements
# Conflicts:
#	fdbcli/TenantCommands.actor.cpp
2023-03-13 15:58:39 -07:00
Jon Fu b751d0f87c
Add tenant getId api to fdbcli (#9658)
* add tenant getId api to fdbcli

* add to fdbcli tests and address review comments

* fix fdbcli tests
2023-03-13 18:06:03 -04:00
A.J. Beamon 45056370b8 Merge branch 'main' into metacluster-improvements 2023-03-13 13:14:09 -07:00
A.J. Beamon 18cf523f49
Merge pull request #9660 from sfc-gh-ajbeamon/tenant-id-restore-safety
Disallow repopulating a management cluster from a data cluster with matching tenant ID prefix
2023-03-13 13:12:30 -07:00
Ankita Kejriwal ffa0aa4a7b
Update the expected error message from exclude in fdbcli_tests (#9648) 2023-03-10 19:49:20 -08:00
A.J. Beamon 6b2185c707 Fix off-by-one in token hint generator for metacluster restore 2023-03-10 18:50:34 -08:00
A.J. Beamon cbc330697c Disallow repopulating a management cluster from a data cluster with matching tenant ID prefix unless forced. Remember the largest used tenant ID on the data cluster and use it to update the management cluster tenant ID when force repopulating the same ID. 2023-03-10 15:36:37 -08:00
A.J. Beamon 06decf1141 Fix memory error in fdbcli tenant lock command 2023-03-10 10:31:59 -08:00
A.J. Beamon fa1f20a9ff Update the help text for some fdbcli tenant commands 2023-03-07 11:47:21 -08:00
Jingyu Zhou 684464cebb
Merge pull request #9594 from sfc-gh-jfu/fix-fdbcli-segfault
Fix fdbcli segfault
2023-03-07 08:28:14 -08:00
Steve Atherton 5ff0bc3f87
Merge pull request #9576 from sfc-gh-satherton/storage-configure-refactor
Storage and log engine configuration support / refactor a few things.
2023-03-07 02:10:14 -08:00
Chaoguang Lin 7273723a43
Add the hotrange fdbcli command (#9570)
* Add the hotrange fdbcli command

* Remove the unnecessary state

* Add the doc about the hotrange command
2023-03-06 14:46:52 -08:00
Jon Fu 9313e5f653 fix fdbcli segfault 2023-03-06 14:18:46 -08:00
Jingyu Zhou 7a0b3c05b9
Merge pull request #9540 from sfc-gh-huliu/timestamp
Report restore phase start time and eta
2023-03-06 14:06:23 -08:00
Jingyu Zhou 94a9e37583
Merge pull request #9526 from sfc-gh-xwang/fix/main/killall
wait extra time to make sure rebootWorker request sent to storage server
2023-03-06 11:25:09 -08:00
Steve Atherton 50d567b5a5 Refactored some parts of database configuration to support log_engine=<name> and storage_engine=<name> and generate these when converting a DatabaseConfig JSON object to a `configure` command. Refactored `fileconfigure` and simulation setup to use the same JSON -> configure function as the same code was copy/pasted to both places but only one has been kept up to date with new features. Renamed Redwood to `ssd-redwood-1` canonically but the experimental name is still supported for backward compatibility. 2023-03-04 20:52:31 -08:00
Hui Liu b2d497a3b2 Report restore phase start timestamp 2023-03-03 18:09:51 -08:00
Chaoguang Lin e123a51b50 remove the time guard on fdbcli commands in tests 2023-03-01 11:09:47 -08:00
Xiaoxi Wang 576c2a9c93 wait extra time to make sure rebootWorker request sent to storage server 2023-02-28 22:41:30 -08:00
Markus Pilman 20874d8575
Merge pull request #9502 from sfc-gh-ajbeamon/metacluster-tenant-lock-support
Metacluster tenant lock support
2023-02-27 21:19:03 -07:00
A.J. Beamon 469e77158f Add metacluster support for tenant locking 2023-02-27 16:53:13 -08:00
Jingyu Zhou 29a406948a
Merge pull request #9370 from sfc-gh-mpilman/features/tenant-lock-fdbcli
fdbcli commands for tenant lock
2023-02-27 16:18:51 -08:00
Markus Pilman ed1079fc7d Remove wrong comment 2023-02-27 15:27:16 -07:00
Markus Pilman 1689e36d14 Fix uid extraction from lock command output 2023-02-27 15:24:07 -07:00
Russell Sears bcc05b1058 Improve support for prebuilt boost 2023-02-27 15:38:58 -06:00