Commit Graph

13168 Commits

Author SHA1 Message Date
He Liu 29eab90528
Clean up dd traces (#11090)
* Clean up DD traces.

* clean up dd traces.
2023-12-06 15:53:04 -08:00
Yao Xiao 83e1f8ab7b
Logging improvement. (#11084) 2023-12-01 16:13:17 -08:00
neethuhaneesha 4f167f50be
Adding field length to audit storage trace events. (#11079) 2023-12-01 15:45:17 -08:00
He Liu a8cdb7367c
Throttle low-priority fetchKeys. (#11083) 2023-11-30 21:27:52 -08:00
Jingyu Zhou ee626409c0
Merge pull request #11064 from apple/features/coroutines
Added C++ Coroutine support to Flow
2023-11-22 09:58:32 -08:00
Jingyu Zhou 266a3f018b Disable hot shard throttling for restore
Otherwise, the restored database is not consistent.

To reproduce, at commit 45c459cfc of PR #11064:

-f ./tests/restarting/from_7.3.0/DrUpgradeRestart-1.toml -s 4210130489 -b off
-f ./tests/restarting/from_7.3.0/DrUpgradeRestart-2.toml --restarting -s 4210130490 -b on
2023-11-17 13:56:58 -08:00
Dan Lambright af53e9a532
Log ignored zones and reasons in RkUpdate (#11067) 2023-11-17 16:07:48 -05:00
Zhe Wang ba30d15dd3
nits for blobcipher test (#11073) 2023-11-17 10:47:27 -08:00
Zhe Wang 9df517618e
enable mutation tracking to work with encryption (#11068) 2023-11-16 14:04:18 -08:00
Yao Xiao af106e5bda
Update rocksdb knobs. (#11069) 2023-11-16 13:53:12 -08:00
He Liu 422da7d7c7
Throttle fetch keys (#11060)
* Added ThroughputLimiter class.

* Throttle fetchKeys.

* cleanup.

* Increased fetchKeys rate limit for sim tests.

* Added unit.

* Resolved comments.
2023-11-16 10:30:40 -08:00
Zhe Wang 1e9c5bb390
Propagate data move reason from DD to SS (#11063)
* encode reason to data move id

* address comments

* fix data move id decode bug and add assert for data move decode invariant

* address comments
2023-11-15 13:07:11 -08:00
Sreenath Bodagala 03f85e2e9b Merge remote-tracking branch 'apple-upstream/main' 2023-11-14 16:30:12 +00:00
Markus Pilman e07b3e35ca Added C++ Coroutine support to Flow 2023-11-14 10:10:11 +01:00
Hao Fu 9b17dd8caf
Fix backup workers stability issues (#11044)
This PR includes a few stability fixes for Backup Worker

* Fixed memory bookkeeping issue in Backup Worker. Previously
it didn't release flow lock correctly when erasing messages.

* Added TLogServer fix to return 0 from poppedVersion() for
unrecognized log router tags.
2023-11-13 15:55:25 -08:00
He Liu b8f1670a0e
Physical shard move tss (#11057)
* Refactored newDataMoveId() and decodeServerKeysValue().

* Enabled physical shard move for tss.

* Added unit test & cleanup.

* clean up test configs.
2023-11-13 11:34:07 -08:00
Sreenath Bodagala 5dd98de522 - Fetch storage server lag between data centers in the failover test 2023-11-13 19:22:54 +00:00
He Liu 29a10311d9
Speed up physical shard move (#11056)
* Allow applyUpdates in multiple batches.

* Persist lastAppliedVersion together with a batch of updates.

* Fixed repeatedly applying the same fetched mutation.
Apply updates at different versions.

* Fixed race between removeDataShard and isInVersionedData.

* Ignore mutations earlier than `lastAppliedVersion`.

* Buggify the batch limit for applying updates.

* Implemented async load of updates.

* Fixed out-of-order version issue.

* Cleanup.

* Batch commit MoveInUpdates.

* Avoid popping unpersisted updates.

* Increment MoveInUpdates::lastAppliedVersion in MoveInUpdates::next().
Fixed MoveInUpdates::hasNext().

* Fixed loadUpdates start version.

* cleanup.

* Cleanup.

* fmt

* Commit MoveInUpdates regardless of MoveInShard status.

* Enable move restore.

* Disabled move restore and fallBackToAddingShard from ingestion failure.

* Get rid of persisting MoveInShard states between phases.

* Make updateMoveShardMetadata synchronous.

* Recovered code deleted accidentally.

* Cleanup.

* cleanup.
2023-11-13 09:27:58 -08:00
Sreenath Bodagala b545f61319 Merge remote-tracking branch 'apple-upstream/main' 2023-11-09 16:48:33 +00:00
Sreenath Bodagala d433438284 - Address a review comment 2023-11-09 16:38:54 +00:00
Yao Xiao e7aa0333a9 Update RocksDB options. 2023-11-08 14:09:31 -08:00
Sreenath Bodagala b0be152aad - Capture data center log and storage server version differences in
ClusterController and include them in status json.
2023-11-08 17:28:36 +00:00
He Liu 967a546e15
Optimize physical shard move (#10962)
* Allow applyUpdates in multiple batches.

* Persist lastAppliedVersion together with a batch of updates.

* Fixed repeatedly applying the same fetched mutation.
Apply updates at different versions.

* Fixed race between removeDataShard and isInVersionedData.

* Ignore mutations earlier than `lastAppliedVersion`.

* Buggify the batch limit for applying updates.

* Implemented async load of updates.

* Fixed out-of-order version issue.

* Cleanup.

* Batch commit MoveInUpdates.

* Avoid popping unpersisted updates.

* Increment MoveInUpdates::lastAppliedVersion in MoveInUpdates::next().
Fixed MoveInUpdates::hasNext().

* Fixed loadUpdates start version.

* cleanup.

* Cleanup.

* fmt

* Commit MoveInUpdates regardless of MoveInShard status.

* Enable move restore.

* Disabled move restore and fallBackToAddingShard from ingestion failure.

* Resolved comments.

* Fixed leaks from other prs.
2023-11-07 14:19:18 -08:00
Sreenath Bodagala d8f0a21ecc Merge remote-tracking branch 'apple-upstream/main' 2023-11-07 21:10:43 +00:00
Sreenath Bodagala 58c0e79874 - Prevent failover when storage servers are behind. 2023-11-03 21:45:48 +00:00
neethuhaneesha 220ad87cc4
Rocksdb new options configuration (#11048) 2023-11-03 13:45:03 -07:00
Yao Xiao 33a29ddd85
Upgrade RocksDB version and disable CF range deletion optimization. (#11045)
* Upgrade RocksDB version and disable CF range deletion optimization.

* Disable iterator.
2023-11-02 17:25:11 -07:00
Dan Lambright 015167c17e
Throttle commits against hot shards (#10970)
* throttle hot shards

* expire throttled shards over time

* add backoff

* Parallelize messaging from RK to CP

* Obtain shards from a single SS

* handle expired transactions

* bump transaction_throttled_hot_shard

* Change SevError to SevWarn for CannotMonitorHotShardForSS

* Add log per request
2023-10-31 12:01:34 -04:00
neethuhaneesha 361af9e862
Perpetual wiggle option to have multiple SS in rebalance state during wiggling. (#11019) 2023-10-24 15:10:51 -07:00
Zhe Wang 55c023a815
Improve visibility of consistency checker (#11018)
* improve visibility consistency checker

* fix for CI failure
2023-10-24 13:21:07 -07:00
neethuhaneesha b7148ba67e
Compaction rate limiter changes. (#11015) 2023-10-23 13:39:46 -07:00
Jingyu Zhou ad259d48cf
Merge pull request #11001 from jzhou77/main
Fix RemoveServersSafely workload timeout error
2023-10-23 09:26:33 -07:00
Zhe Wang 4695bdfbbf
improve printSnapshotTeamsInfo (#10999) 2023-10-23 08:54:07 -07:00
Zhe Wang adaf03b6f9
Check write traffic when bypassing shard split (#10974)
* check write traffic when exiting shard split

* address comments
2023-10-20 15:00:14 -04:00
Jingyu Zhou d40902aefb Fix IDE build error 2023-10-20 09:03:32 -07:00
Jingyu Zhou 4e07fb982b Fix force flag for exclusion 2023-10-19 21:47:13 -07:00
Jingyu Zhou f99a8151a0 Fix RemoveServersSafely workload timeout error
This workload can hit a timeout error when using locality-based exclusion. The
sequence is:

1. The RemoveServersSafely workload excludes a locality by processid
2. Attrition reboots the target process, thus changing the processid, because
processid is generated for each worker process at fdbd()
3. RemoveServersSafely waits for the process exclusion, which never succeeds
4. Timeout

The fix monitors processid locality changes and reissues the exclusion with the
correct locality.

To reproduce:
seed: -f ./tests/fast/SwizzledRollbackSideband.toml -s 879108103 -b on
commit: a3dbd4baf release-7.1
2023-10-19 21:46:44 -07:00
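The failure sequence and fix described in commit f99a8151a0 can be sketched roughly as follows. This is a toy Python model, not the Flow/C++ workload code: `Worker`, `Cluster`, `exclude_and_wait`, and the `processid:<id>` locality string are hypothetical names introduced only for illustration.

```python
import itertools

_ids = itertools.count(1)  # stand-in for per-start processid generation


class Worker:
    """Toy worker whose processid changes on every (re)start."""

    def __init__(self):
        self.processid = next(_ids)

    def reboot(self):
        # A reboot regenerates the processid, as the commit message describes.
        self.processid = next(_ids)

    def locality(self):
        # Hypothetical locality string format, for illustration only.
        return f"processid:{self.processid}"


class Cluster:
    """Toy cluster that only tracks excluded locality strings."""

    def __init__(self):
        self.excluded = set()

    def exclude(self, locality):
        self.excluded.add(locality)

    def is_excluded(self, worker):
        return worker.locality() in self.excluded


def exclude_and_wait(cluster, worker, max_rounds=5, between_rounds=None):
    """Exclude by processid locality, reissuing whenever the locality changes."""
    issued = None
    for _ in range(max_rounds):
        current = worker.locality()
        if current != issued:
            # The monitoring step: the locality changed, so reissue the exclusion.
            cluster.exclude(current)
            issued = current
        if between_rounds is not None:
            between_rounds()  # e.g. a simulated Attrition reboot
        if cluster.is_excluded(worker):
            return True
    return False
```

Without the reissue step, an exclusion issued before a reboot refers to a stale processid and the wait can never complete, which matches the timeout in step 4.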
Jingyu Zhou e3c77044c0
Merge pull request #10987 from kakaiu/add-write-traffic-metrics-to-ddMetricsGetRange
Add write traffic metrics to ddMetricsGetRange
2023-10-19 11:04:31 -07:00
Johannes M. Scheuermann 9591a8e382 Correct the check for a machine being excluded 2023-10-19 11:42:46 +02:00
Zhe Wang 5f43fc91e7 pr-10985 2023-10-17 11:19:25 -07:00
Chaoguang Lin d24d9e6e21
Replace the wrong usage of g_simulator in the snapshot code (#10984) 2023-10-13 18:31:17 -07:00
Zhe Wang b0569f8717
fix corner cases of auditStorageServerShardQ (#10980) 2023-10-13 09:48:24 -07:00
Jingyu Zhou 9cd01ac58a
Merge pull request #10927 from sbodagala/main
Add support to fetch a specific group of status json fields
2023-10-11 12:57:04 -07:00
Jingyu Zhou 1896e5cd46 Fix an uninitialized variable
Valgrind complains about this for the RecentRocksDBBackgroundWorkStats event.
2023-10-09 15:13:02 -07:00
Sreenath Bodagala 3dcee84898 Merge remote-tracking branch 'apple-upstream/main' 2023-10-09 15:21:16 +00:00
Zhe Wang 5767fed414
AuditStorage check all DC replica (#10955)
* add trace events when update audit metadata

* audit all DCs in replica

* fix corner case of audit replica

* fmt

* address comments
2023-10-06 14:30:21 -07:00
Yao Xiao 45494e3bba
Add knob for fetch keys budget. (#10963) 2023-10-06 13:06:34 -07:00
Sreenath Bodagala fe2a8b10b2 - Address PR review comments (includes a special key related test) 2023-10-06 16:19:56 +00:00
He Liu 1e249143a4
Allow large shard (#10961)
* Added large shard.

* getMaxShardSize() returns the fixed max shard size.

* Resolved comments.

* fmt.
2023-10-05 13:35:06 -07:00
Zhe Wang be90185dd6
Throttle wiggling data moves (#10953)
* throttle wiggling data moves

* address comments
2023-10-03 09:29:24 -07:00
Sreenath Bodagala 9cde595207 - Re-use the current mechanism for gathering status. 2023-10-02 19:38:26 +00:00
Jingyu Zhou d27a81ff58 Increase DataCenter Lag gate to 100
Found test failure due to 30s gate being too small:

./fdbserver.6.3.16 -r simulation -f ./tests/restarting/from_6.3.13_until_7.2.0/DrUpgradeRestart-1.txt -s 3218805329 -b on --logsize 1GB
-f ./tests/restarting/from_6.3.13_until_7.2.0/DrUpgradeRestart-2.txt --restarting -s 3218805330 -b on

clang build
commit 7d5d1e082
2023-09-25 17:12:17 -07:00
Sreenath Bodagala 3c01b1befe - Add a special key in order to fetch a specific group of status json fields. 2023-09-25 16:23:19 +00:00
William Dowling 0f752473be
Merge branch 'main' into radixtree-production 2023-09-25 09:52:20 +02:00
neethuhaneesha ca2700cc35
Erase storageWiggleID from disk if not used (#10912) 2023-09-22 14:15:06 -07:00
Jingyu Zhou f42dd41ae8
Merge pull request #10810 from sfc-gh-tclinkenbeard/main-fix-clear-cost-estimation
Fix quota throttler clear cost estimation
2023-09-20 20:48:40 -07:00
Jingyu Zhou 3ebbe8c7a0
Merge pull request #10725 from sfc-gh-tclinkenbeard/main-throttle-multiple-write-tags
Monitor multiple write tags in `StorageQueueInfo::refreshCommitCost`
2023-09-20 20:44:34 -07:00
Zhe Wu 87083652a3
Merge pull request #10856 from halfprice/zhewu/wiggle-locality-list
Make `perpetual_storage_wiggle_locality` database option to take a list of localities
2023-09-20 15:42:25 -07:00
Zhe Wu d65a6a8a10 Address comments 2023-09-20 13:56:15 -07:00
Boris Korzun bb6c5ad187 Rename PAGE_SIZE constant 2023-09-20 22:14:31 +03:00
Zhe Wu 31d46b6fb2 Address comments 2023-09-19 10:35:51 -07:00
Jingyu Zhou f72eab118d Skip redwood for clearInflightCommits unit test
Redwood complains DB is invalid for this unit test:

Assertion keyProvider.isValid() || db.isValid() failed @ /root/src/foundationdb/fdbserver/VersionedBTree.actor.cpp 8029:
2023-09-18 14:34:36 -07:00
Jingyu Zhou 307491d68e Use getRange for server metadatas
To reduce read load on SSes that serve the reads.
2023-09-18 13:32:53 -07:00
Jingyu Zhou 12fe500633 ClusterController watches changes to storage metadata
Retrieving storage metadata for every status json request is very expensive
for clusters with a large number of storage servers. This change makes the
ClusterController actively monitor changes to storage metadata and retrieve
it only when there is a change.
2023-09-15 14:19:04 -07:00
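The idea in commit 12fe500633 (refetch storage metadata only after a change, rather than on every status request) can be illustrated with a small cache. This is a hedged Python sketch, not the ClusterController code: `MetadataStore`, `StatusCache`, and the version counter that stands in for a watch are all assumptions made for the example.

```python
class MetadataStore:
    """Toy metadata source; `version` stands in for a change notification."""

    def __init__(self):
        self._data = {}
        self.version = 0
        self.full_reads = 0  # counts the expensive full retrievals

    def set(self, server_id, metadata):
        self._data[server_id] = metadata
        self.version += 1

    def read_all(self):
        self.full_reads += 1
        return dict(self._data)


class StatusCache:
    """Serves status requests from a cached copy, refreshed only on change."""

    def __init__(self, store):
        self._store = store
        self._seen_version = None
        self._cached = {}

    def get(self):
        version = self._store.version
        if version != self._seen_version:
            # Something changed since the last refresh: do one full read.
            self._cached = self._store.read_all()
            self._seen_version = version
        return self._cached
```

With this shape, repeated status requests between changes cost no additional full reads, so the read load scales with the rate of metadata changes rather than the rate of status requests.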
Zhe Wang 1f6215808b
fix multiple requests from DDDoAudit (#10902) 2023-09-14 11:04:19 -07:00
Zhe Wang 29a2f63f8d
Fix SSShard Audit (#10896)
* fix ssshard

* address comments

* fmt
2023-09-13 21:15:12 -07:00
Jingyu Zhou 977851fa39 Fix sharded rocks failure by adding a shard 2023-09-13 15:53:27 -07:00
Jingyu Zhou 42df53b3bc Add a test case for storage engine
This test validates that an in-flight commit to the storage engine is properly
handled. As found in https://github.com/apple/foundationdb/pull/10714, an
engine could miss in-flight data and cause data corruption.

The test case is modeled after the above corruption: insert data, then clear
the data in the next commit to the storage engine, and finally verify that the
data is cleared.
2023-09-13 15:25:25 -07:00
Zhe Wu bebd1790db Resolve conflict with recent change in main 2023-09-13 14:28:21 -07:00
Zhe Wu af42816b0e Adding test for perpetual_storage_wiggle_locality to take a list of localities 2023-09-13 13:36:16 -07:00
Zhe Wu a86a2d752e Apply format 2023-09-13 13:36:16 -07:00
Zhe Wu 7158d702c6 Use locality list for host checking 2023-09-13 13:36:16 -07:00
Zhe Wu 4abd2edcad Parse wiggle locality as a list 2023-09-13 13:36:16 -07:00
A.J. Beamon 908d84c893
Merge pull request #10518 from oleg68/bugfix/clang-17
Fixed compiling foundationdb with the clang 17 compiler
2023-09-13 09:59:52 -07:00
Oleg Samarin b9b4e1ebe4 Fixed formatting 2023-09-13 18:56:25 +03:00
Oleg Samarin 3c75b5d7de
Apply suggestions from code review
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2023-09-13 17:18:06 +03:00
Zhe Wang 2a40fb4135
fix audit state serializer (#10889) 2023-09-12 14:48:04 -07:00
Jingyu Zhou d47e0eaaaa
Merge pull request #10877 from neethuhaneesha/wigglelocality
Locality check on perpetualStorageWiggleIDPrefix when DD restarts
2023-09-07 10:09:08 -07:00
Hui Liu b2ee1fb6c4
Tune blob restore default knob (#10871) 2023-09-07 09:36:51 -07:00
neethuhaneesha 9bafef4fd2 Locality check on perpetualStorageWiggleIDPrefix when DD restarts 2023-09-06 16:00:32 -07:00
Hui Liu 4d2a7d507d
Add a new blob restore state to fix a race after data copy (#10854) 2023-09-05 14:04:35 -07:00
Zhe Wu e2f5c50a7b
Merge pull request #10828 from halfprice/zhewu/clear-wiggle-storage-engine
Add option to set perpetual_storage_wiggle_engine to none
2023-09-05 11:06:21 -07:00
sfc-gh-tclinkenbeard dccc0e6773 Merge remote-tracking branch 'origin/main' into main-fix-clear-cost-estimation 2023-09-04 13:30:46 -07:00
sfc-gh-tclinkenbeard d82b66f5c6 Merge remote-tracking branch 'origin/main' into main-throttle-multiple-write-tags 2023-09-04 13:29:02 -07:00
Jingyu Zhou 3c5a38b120
Merge pull request #10859 from sfc-gh-ljoswiak/fixes/database-context-status
Avoid creating a DatabaseContext on every status json call
2023-09-02 10:25:47 -07:00
Jingyu Zhou 52209ef551
Merge pull request #10714 from neethuhaneesha/singlekeydelete
Storing the inprocess-commit deleted keysets until the commit is complete
2023-09-01 20:54:49 -07:00
Lukas Joswiak b3e58761d3 Avoid creating a DatabaseContext on every status json call
DatabaseContext currently leaks memory by creating `Counter`s with
unique IDs on construction. Each status json call creates a new
`DatabaseContext` object, causing a memory leak over time.
2023-09-01 19:55:17 -07:00
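The leak pattern described in commit b3e58761d3 can be sketched as below. This is an illustrative Python model only: `COUNTER_REGISTRY`, `Context`, `status_per_call`, and `status_shared` are hypothetical stand-ins, and the assumption that counters live in a long-lived registry is made for the example rather than taken from the commit.

```python
import itertools

# Hypothetical long-lived registry that every new counter is added to.
COUNTER_REGISTRY = {}
_next_id = itertools.count()


class Context:
    """Toy context that registers a counter under a unique ID on construction."""

    def __init__(self):
        self.counter_id = next(_next_id)
        COUNTER_REGISTRY[self.counter_id] = 0


def status_per_call():
    # Leaky shape: a fresh context (and registry entry) on every call.
    return Context().counter_id


_shared = None


def status_shared():
    # Fixed shape: create the context once and reuse it afterwards.
    global _shared
    if _shared is None:
        _shared = Context()
    return _shared.counter_id
```

In the per-call version the registry grows by one entry per status request; in the shared version it grows by exactly one entry in total.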
A.J. Beamon ead6c37e4a
Merge pull request #10662 from sfc-gh-tclinkenbeard/main-fix-grv-queue-leak
Fix GRV queue leak
2023-09-01 09:45:49 -07:00
Egor Zhdan 3428d4678c
[swift] Remove deprecated Swift attr spelling: `import_as_ref` (#10825)
This spelling is deprecated and will be removed in a future version of Swift:
```
__attribute__((swift_attr("import_as_ref")))
```

The new correct spelling is:
```
__attribute__((swift_attr("import_reference")))
```
2023-08-31 13:23:23 -05:00
neethuhaneesha 08cc6518bb Deleting rocksdb keysSet after the commit completes. 2023-08-30 15:25:23 -07:00
neethuhaneesha 68d5708e02 Storing the inprocess-commit deleted keysets until the commit is complete. (#10672) 2023-08-30 15:25:23 -07:00
Zhe Wu 83992d61ec Add a knob to guard the gray failure detection during TLog recovery 2023-08-29 14:49:39 -07:00
Zhe Wu abd535c2e0 Add test documents 2023-08-29 14:32:21 -07:00
Yi Wu 8d7f2e84ed
Merge pull request #10831 from sfc-gh-yiwu/ear_timeout
EaR: Handle KMS timeout in storage server and commit proxy
2023-08-28 20:59:22 -07:00
Zhe Wang 432c077b51
fix dd issue when dd skip audit (#10844) 2023-08-28 16:39:45 -07:00
Yi Wu 3287098b4a EaR: Handle KMS timeout in storage server and commit proxy 2023-08-28 16:17:43 -07:00
Xiaoge Su cb99a8f56f fixup! Reformat source 2023-08-28 14:56:39 -07:00
Xiaoge Su 5a12d12774 fixup! Enable sharded RocksDB storage engine for PhysicalShardMove 2023-08-28 14:56:39 -07:00
Jingyu Zhou 1a6114a66f
Merge pull request #10843 from xis19/main
fixup! Add back missing initializer
2023-08-28 12:51:54 -07:00
Xiaoge Su d0685f9fff fixup! Add back missing initializer 2023-08-28 11:05:07 -07:00
Hui Liu 30d4f07395
Catch initial flush error and avoid crashing blob manager (#10836) 2023-08-25 20:32:20 -07:00
Zhe Wang f43b20e15c
Audit location metadata in DD (#10820)
* Audit location metadata in DD

* nits
2023-08-25 17:11:11 -07:00
Josh Slocum 3bdcbef465
enabling median assignment limiting (#10805) 2023-08-25 17:52:54 -05:00
Yao Xiao b20dcf23a9
Support periodic compaction for sharded rocksdb. (#10815) 2023-08-25 15:38:01 -07:00
Jingyu Zhou adcedb9301
Merge pull request #10812 from xis19/main
A few test failure fixes
2023-08-25 11:07:33 -07:00
Zhe Wu 6610a228c7 Add option to set perpetual_storage_wiggle_engine to none 2023-08-24 13:43:58 -07:00
Yao Xiao 0a87b6039f
Add perpetual wiggle pause reason. (#10821) 2023-08-24 00:31:38 -07:00
Xiaoge Su d8d9b9402b fixup! Subobject-linkage
/codebuild/output/src3651401136/src/github.com/apple/foundationdb/fdbserver/DataDistribution.actor.cpp:111:8: error: 'DDAudit' has a field 'DDAudit::context' whose type uses the anonymous namespace [-Werror=subobject-linkage]
2023-08-23 13:47:41 -07:00
Xiaoge Su 26a87da578 fixup! Use isMocked method 2023-08-23 13:25:37 -07:00
Xiaoge Su 1ec5ed13b9
Merge branch 'main' into main 2023-08-23 11:35:21 -07:00
Zhe Wang 7e8f326277
Audit storage for specific engine (#10781)
* audit storage for specific engine

* fix getStorageType

* fix budget of skipAuditOnRange

* fix budget in scheduleAuditOnRange

* fix CI error

* improve trace events

* address comments
2023-08-23 10:51:24 -07:00
Zhe Wang f8311ae069
Add more trace event for TSS recruitment (#10809)
* add more trace event for tss

* update StorageServerInitProgress

* add more traces
2023-08-23 09:19:30 -07:00
Xiaoge Su ac2bf66a2a fixup! gcc does not allow a class member variable whose type uses an anonymous namespace
fdbserver/DataDistribution.actor.cpp:360:8: error: 'DataDistributor' has a field 'DataDistributor::audits' whose type uses the anonymous namespace [-Werror=subobject-linkage]

clang does not report the same error
2023-08-22 18:03:46 -07:00
Xiaoge Su 95c5c2e04c Disable PerpetualWiggleStorageMigrationWorkload when RocksDB is disabled
This can also be fixed in the CMake test files. It is unclear whether it should
be disabled there, but changing the C++ code might be simpler.
2023-08-22 16:57:07 -07:00
Xiaoge Su 47efd890ad Better error report when specified storage engine is not supported 2023-08-22 15:52:08 -07:00
Xiaoge Su 130102f1bd Disable AuditService when mocking DD 2023-08-22 15:51:38 -07:00
sfc-gh-tclinkenbeard d6a7e1eccd Normalize clear costs for quota throttler 2023-08-22 15:29:28 -07:00
Hui Liu 98958b32b8
Wait file deleted in blob manifest cleanup (#10802) 2023-08-22 10:50:03 -07:00
Josh Slocum b488abbee0
fixed split downsampling to happen as part of split range and to correctly deal with re-aligning keys after downsample (#10796) 2023-08-22 12:21:16 -05:00
Zhe Wang 83dc9ff6f7
Trace SS init progress (#10799)
* trace ss init progress

* improve trace events
2023-08-18 18:44:37 -07:00
Ata E Husain Bohra 22ddb8a92d
Prevent Status actor from bubbling up timeout error (#10791)
Description

The patch addresses occurrences where Status.actor bubbles a timeout error
up to the ClusterController, causing recovery to be triggered when
ClusterGetStatus times out for some reason.

Testing
2023-08-18 15:21:52 -07:00
Yao Xiao c63ee571e5
Export file metrics and add knob for file size multiplier. (#10785) 2023-08-16 11:27:33 -07:00
Hui Liu aea6fa5ca6
Set BLOB_RESTORE_SKIP_EMPTY_RANGES default value to false (#10784) 2023-08-16 10:02:06 -07:00
Ankita Kejriwal 7e424c7386
Stagger storage quota estimation requests and observability improvements (#10759)
* Rename and simplify fetch time variables

* Add RefreshTime detail to TenantCacheGetStorageUsageRefreshSlow trace

* Stagger storage estimation requests

* Update the value of a knob in simulation to reduce flakiness

* Improve names of TenantCache and StorageQuota related traces. Add slow refresh time.

* Convert potentially spammy TenantCache traces to SevDebug
2023-08-15 13:09:24 -07:00
Jingyu Zhou 5ace1911d1 Fix a few typos 2023-08-14 15:01:19 -07:00
Zhe Wang f1c17b27fc
Multiple improvements to AuditStorages (#10685)
* remove danger DDAudit assert, add AuditRate knob, add progress check when ssshard complete, add progress check for ssshard in fdbcli

* throttle progress check for ssshard

* fix getAuditProgressByServer

* fix trace event for ss audit

* using name -- checkMoveKeysLockForAudit

* new scheduleAuditLocationMetadata

* address comments

* shorten progress summary for ssshard

* simplify getAuditProgressByServer in fdbcli
2023-08-14 13:13:49 -07:00
Evan Tschannen 3209dc7b30
Fixed multiple bugs related to locality based exclusions (#10623)
* fix: Non-storage processes were not being checked for locality exclusions
fix: Data distribution did not detect when a newly added process was locality excluded
fix: RemoveServerSafely did not wait for processes to be excluded before killing them when excluding localities

* fix: do not allow locality based excludes if they cannot exclude the required addresses
2023-08-11 15:17:02 -07:00
Zhe Wang e7528aca09
Revert inappropriate setting of ApiCorrectness test (#10771)
* add max_manifest_file_size to rocksdb

* revert ApiCorrectness config change
2023-08-11 12:41:55 -07:00
Evan Tschannen bf0677f1d4
encryptionAtRestMode needs to be in the system keyspace so that it is encrypted with the cluster specific encryption key rather than a tenant key (#10777) 2023-08-11 12:20:22 -07:00
Zhe Wang 5868173a3e nits 2023-08-10 16:51:28 -05:00
Zhe Wang 81db1da5a9 address comments 2023-08-10 16:51:28 -05:00
Zhe Wang 5598f4f28f trace shardedrocks manifest size 2023-08-10 16:51:28 -05:00
Zhe Wu 726574243c Improve PerpetualWiggleStorageMigration reliability 2023-08-10 09:36:04 -07:00
Zhe Wu eb6f0c613d Add documentation for perpetual_storage_wiggle_engine config 2023-08-10 09:35:57 -07:00
Zhe Wu ab4ae712e8 Add PerpetualWiggleStorageMigrationWorkload documentation. 2023-08-10 09:35:57 -07:00
Zhe Wu 17ae952f15 Remove debugging notes 2023-08-10 09:35:57 -07:00
Zhe Wu fb703c0021 Making PerpetualWiggleStorageMigration test pass more reliably 2023-08-10 09:35:57 -07:00
Zhe Wu 863038a44c Add improvement for initializing storage server using new perpetual_wiggle_storage_engine config 2023-08-10 09:35:57 -07:00
Zhe Wu cdca09dc93 Making PerpetualWiggleStorageMigration pass reliably 2023-08-10 09:35:57 -07:00
Zhe Wu f65e3a35f2 Adding perpetual wiggle test to reproduce exclude/include case 2023-08-10 09:35:57 -07:00
Xiaoxi Wang c2cfebac73 Fix tss kill logic: only disable Tss check when zeroHealthyTeams=false 2023-08-09 20:43:15 -07:00
Jingyu Zhou ad9472d6c2
Merge pull request #10757 from jzhou77/main
Fix a crash error
2023-08-09 14:57:24 -07:00
He Liu df848005f8
Allow applyUpdates in multiple batches. (#10583) 2023-08-09 14:04:36 -07:00
Jingyu Zhou a4e4ab6411 Fix a crash error 2023-08-09 13:44:52 -07:00
Jingyu Zhou d1d9f185d8
Merge pull request #10715 from xis19/aa
fixup! Change the type of queryQueueMax from double to uint64_t in Ge…
2023-08-07 15:58:58 -06:00
Josh Slocum 4f6bf34fe1 fixing HBR to update key BM/BW are watching 2023-08-07 14:01:30 -05:00
Josh Slocum 20b94368ee fixing tenant watches 2023-08-07 14:01:30 -05:00
Jingyu Zhou bb328c1c36
Merge pull request #10724 from sfc-gh-etschannen/fix-grv-accounting-main
fix: when requests are dropped from the queue the txnRequestOut becomes inconsistent with txnRequestIn
2023-08-07 11:40:50 -06:00
sfc-gh-tclinkenbeard 2228cd3320 Monitor multiple write tags in StorageQueueInfo::refreshCommitCost 2023-08-02 15:52:57 -07:00
Yi Wu fff6b745d2 EaR: turn down the probability of enabling encryption in simulation 2023-08-02 10:11:18 -07:00