He Liu
29eab90528
Clean up dd traces ( #11090 )
* Clean up DD traces.
* clean up dd traces.
2023-12-06 15:53:04 -08:00
Yao Xiao
83e1f8ab7b
Logging improvement. ( #11084 )
2023-12-01 16:13:17 -08:00
neethuhaneesha
4f167f50be
Adding field length to audit storage trace events. ( #11079 )
2023-12-01 15:45:17 -08:00
He Liu
a8cdb7367c
Throttle low-priority fetchKeys. ( #11083 )
2023-11-30 21:27:52 -08:00
Jingyu Zhou
ee626409c0
Merge pull request #11064 from apple/features/coroutines
Added C++ Coroutine support to Flow
2023-11-22 09:58:32 -08:00
Jingyu Zhou
266a3f018b
Disable hot shard throttling for restore
Otherwise, the restored database is not consistent.
To reproduce, at commit 45c459cfc of PR #11064:
-f ./tests/restarting/from_7.3.0/DrUpgradeRestart-1.toml -s 4210130489 -b off
-f ./tests/restarting/from_7.3.0/DrUpgradeRestart-2.toml --restarting -s 4210130490 -b on
2023-11-17 13:56:58 -08:00
Dan Lambright
af53e9a532
Log ignored zones and reasons in RkUpdate ( #11067 )
2023-11-17 16:07:48 -05:00
Zhe Wang
ba30d15dd3
nits for blobcipher test ( #11073 )
2023-11-17 10:47:27 -08:00
Zhe Wang
9df517618e
enable mutation tracking work with encryption ( #11068 )
2023-11-16 14:04:18 -08:00
Yao Xiao
af106e5bda
Update rocksdb knobs. ( #11069 )
2023-11-16 13:53:12 -08:00
He Liu
422da7d7c7
Throttle fetch keys ( #11060 )
* Added ThroughputLimiter class.
* Throttle fetchKeys.
* cleanup.
* Increased fetchKeys rate limit for sim tests.
* Added unit.
* Resolved comments.
2023-11-16 10:30:40 -08:00
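The ThroughputLimiter class added in #11060 is not shown in this log. As a rough illustration of how a byte-rate budget for fetchKeys could work, here is a token-bucket sketch; the class name `TokenBucketLimiter`, the explicit `now` parameter, and the refill policy are assumptions, not FoundationDB's implementation.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical token-bucket limiter, not FoundationDB's ThroughputLimiter.
// Budget refills at `bytesPerSecond`, capped at `burstBytes`.
class TokenBucketLimiter {
public:
    TokenBucketLimiter(double bytesPerSecond, double burstBytes)
      : rate_(bytesPerSecond), capacity_(burstBytes), tokens_(burstBytes), lastTime_(0.0) {}

    // Consumes budget and returns true if `bytes` may be fetched at time `now` (seconds).
    bool tryAcquire(double bytes, double now) {
        refill(now);
        if (tokens_ < bytes) return false;
        tokens_ -= bytes;
        return true;
    }

    // Seconds to wait from `now` until `bytes` of budget would be available.
    double delayFor(double bytes, double now) {
        refill(now);
        return tokens_ >= bytes ? 0.0 : (bytes - tokens_) / rate_;
    }

private:
    void refill(double now) {
        if (now > lastTime_) {
            tokens_ = std::min(capacity_, tokens_ + (now - lastTime_) * rate_);
            lastTime_ = now;
        }
    }
    double rate_, capacity_, tokens_, lastTime_;
};
```

A caller would try `tryAcquire()` before issuing a fetch and otherwise wait `delayFor()` seconds. In this sketch a request larger than `burstBytes` can never pass, so a real limiter would need to split or cap such requests.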
Zhe Wang
1e9c5bb390
Propagate data move reason from DD to SS ( #11063 )
* encode reason to data move id
* address comments
* fix data move id decode bug and add assert for data move decode invariant
* address comments
2023-11-15 13:07:11 -08:00
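#11063 encodes the data move reason into the data move id and asserts a decode invariant. The real layout is not given here, so the following is only a sketch of the general idea using a 64-bit id for brevity; the `MoveReason` values and the 8-bit/56-bit split are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reason codes and bit layout; the encoding in #11063 may differ.
enum class MoveReason : uint8_t { Invalid = 0, Rebalance = 1, Wiggle = 2, TeamUnhealthy = 3 };

constexpr int kReasonShift = 56;                                     // top 8 bits: reason
constexpr uint64_t kPayloadMask = (uint64_t{1} << kReasonShift) - 1; // low 56 bits: unique part

MoveReason decodeReason(uint64_t id) { return static_cast<MoveReason>(id >> kReasonShift); }

uint64_t decodeUniquePart(uint64_t id) { return id & kPayloadMask; }

uint64_t encodeMoveId(uint64_t uniquePart, MoveReason reason) {
    uint64_t id = (static_cast<uint64_t>(reason) << kReasonShift) | (uniquePart & kPayloadMask);
    // Decode invariant: the reason must survive a round trip.
    assert(decodeReason(id) == reason);
    return id;
}
```

Any bits of `uniquePart` above bit 55 are discarded so they cannot corrupt the reason field.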
Sreenath Bodagala
03f85e2e9b
Merge remote-tracking branch 'apple-upstream/main'
2023-11-14 16:30:12 +00:00
Markus Pilman
e07b3e35ca
Added C++ Coroutine support to Flow
2023-11-14 10:10:11 +01:00
Hao Fu
9b17dd8caf
Fix backup workers stability issues ( #11044 )
This PR includes a few stability fixes for Backup Worker
* Fixed memory bookkeeping issue in Backup Worker. Previously
it didn't release flow lock correctly when erasing messages.
* Added TLogServer fix to return 0 from poppedVersion() for
unrecognized log router tags.
2023-11-13 15:55:25 -08:00
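The Backup Worker fix concerns releasing the flow lock correctly when erasing messages. A minimal sketch of that bookkeeping rule, assuming a simple byte budget in place of Flow's lock: each buffered message remembers exactly what it charged, and erasing it releases that same amount. `ByteBudget` and `MessageBuffer` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-in for a FlowLock-style memory budget.
class ByteBudget {
public:
    explicit ByteBudget(int64_t limit) : limit_(limit) {}
    bool take(int64_t n) {
        if (used_ + n > limit_) return false;
        used_ += n;
        return true;
    }
    void release(int64_t n) {
        assert(n <= used_); // releasing more than was taken indicates a bookkeeping bug
        used_ -= n;
    }
    int64_t used() const { return used_; }
private:
    int64_t limit_;
    int64_t used_ = 0;
};

// Pairs every admitted message with the bytes it charged.
class MessageBuffer {
public:
    explicit MessageBuffer(ByteBudget& b) : budget_(b) {}
    bool push(std::string msg) {
        int64_t cost = static_cast<int64_t>(msg.size());
        if (!budget_.take(cost)) return false;
        msgs_.push_back(Entry{std::move(msg), cost});
        return true;
    }
    // Erases up to n of the oldest messages, releasing what each one charged.
    void eraseFront(std::size_t n) {
        while (n-- > 0 && !msgs_.empty()) {
            budget_.release(msgs_.front().cost);
            msgs_.pop_front();
        }
    }
    std::size_t size() const { return msgs_.size(); }
private:
    struct Entry { std::string body; int64_t cost; };
    ByteBudget& budget_;
    std::deque<Entry> msgs_;
};
```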
He Liu
b8f1670a0e
Physical shard move tss ( #11057 )
* Refactored newDataMoveId() and decodeServerKeysValue().
* Enabled physical shard move for tss.
* Added unit test & cleanup.
* clean up test configs.
2023-11-13 11:34:07 -08:00
Sreenath Bodagala
5dd98de522
- Fetch storage server lag between data centers in the failover test
2023-11-13 19:22:54 +00:00
He Liu
29a10311d9
Speed up physical shard move ( #11056 )
* Allow applyUpdates in multiple batches.
* Persist lastAppliedVersion together with a batch of updates.
* Fixed repeatedly applying the same fetched mutation.
Apply updates at different versions.
* Fixed race between removeDataShard and isInVersionedData.
* Ignore mutations earlier than `lastAppliedVersion`.
* Buggify the batch limit for applying updates.
* Implemented async load of updates.
* Fixed out-of-order version issue.
* Cleanup.
* Batch commit MoveInUpdates.
* Avoid popping unpersisted updates.
* Increment MoveInUpdates::lastAppliedVersion in MoveInUpdates::next().
Fixed MoveInUpdates::hasNext().
* Fixed loadUpdates start version.
* cleanup.
* Cleanup.
* fmt
* Commit MoveInUpdates regardless of MoveInShard status.
* Enable move restore.
* Disabled move restore and fallBackToAddingShard from ingestion failure.
* Get rid of persisting MoveInShard states between phases.
* Make updateMoveShardMetadata synchronous.
* Recovered code deleted accidentally.
* Cleanup.
* cleanup.
2023-11-13 09:27:58 -08:00
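Several bullets in #11056 describe applying fetched updates in batches, persisting `lastAppliedVersion` together with each batch, and ignoring mutations at or below it. The sketch below illustrates that rule with an in-memory store; `Store`, `Mutation`, and `applyBatch` are hypothetical, and updates are assumed to arrive sorted by version. A batch only ends at a version boundary, since several mutations can share one version and a half-applied version would otherwise be skipped on the next call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Version = int64_t;
struct Mutation { Version version; std::string key; std::string value; };

// Hypothetical store where the data and lastAppliedVersion are persisted together.
struct Store {
    std::map<std::string, std::string> data;
    Version lastAppliedVersion = 0;
};

// Applies roughly `batchLimit` mutations newer than the persisted version
// and returns how many were applied. `updates` must be sorted by version.
std::size_t applyBatch(Store& store, const std::vector<Mutation>& updates, std::size_t batchLimit) {
    std::vector<const Mutation*> batch;
    Version newVersion = store.lastAppliedVersion;
    for (const Mutation& m : updates) {
        if (m.version <= store.lastAppliedVersion) continue; // already persisted
        // Stop only at a version boundary so no version is half-applied.
        if (batch.size() >= batchLimit && m.version != newVersion) break;
        batch.push_back(&m);
        newVersion = m.version;
    }
    for (const Mutation* m : batch) store.data[m->key] = m->value;
    store.lastAppliedVersion = newVersion; // "persisted" with the same batch
    return batch.size();
}
```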
Sreenath Bodagala
b545f61319
Merge remote-tracking branch 'apple-upstream/main'
2023-11-09 16:48:33 +00:00
Sreenath Bodagala
d433438284
- Address a review comment
2023-11-09 16:38:54 +00:00
Yao Xiao
e7aa0333a9
Update RocksDB options.
2023-11-08 14:09:31 -08:00
Sreenath Bodagala
b0be152aad
- Capture data center log and storage server version differences in ClusterController and include them in status json.
2023-11-08 17:28:36 +00:00
He Liu
967a546e15
Optimize physical shard move ( #10962 )
* Allow applyUpdates in multiple batches.
* Persist lastAppliedVersion together with a batch of updates.
* Fixed repeatedly applying the same fetched mutation.
Apply updates at different versions.
* Fixed race between removeDataShard and isInVersionedData.
* Ignore mutations earlier than `lastAppliedVersion`.
* Buggify the batch limit for applying updates.
* Implemented async load of updates.
* Fixed out-of-order version issue.
* Cleanup.
* Batch commit MoveInUpdates.
* Avoid popping unpersisted updates.
* Increment MoveInUpdates::lastAppliedVersion in MoveInUpdates::next().
Fixed MoveInUpdates::hasNext().
* Fixed loadUpdates start version.
* cleanup.
* Cleanup.
* fmt
* Commit MoveInUpdates regardless of MoveInShard status.
* Enable move restore.
* Disabled move restore and fallBackToAddingShard from ingestion failure.
* Resolved comments.
* Fixed leaks from other prs.
2023-11-07 14:19:18 -08:00
Sreenath Bodagala
d8f0a21ecc
Merge remote-tracking branch 'apple-upstream/main'
2023-11-07 21:10:43 +00:00
Sreenath Bodagala
58c0e79874
- Prevent failover when storage servers are behind.
2023-11-03 21:45:48 +00:00
neethuhaneesha
220ad87cc4
Rocksdb new options configuration ( #11048 )
2023-11-03 13:45:03 -07:00
Yao Xiao
33a29ddd85
Upgrade RocksDB version and disable CF range deletion optimization. ( #11045 )
* Upgrade RocksDB version and disable CF range deletion optimization.
* Disable iterator.
2023-11-02 17:25:11 -07:00
Dan Lambright
015167c17e
Throttle commits against hot shards ( #10970 )
* throttle hot shards
* expire throttled shards over time
* add backoff
* Parallelize messaging from RK to CP
* Obtain shards from a single SS
* handle expired transactions
* bump transaction_throttled_hot_shard
* Change SevError to SevWarn for CannotMonitorHotShardForSS
* Add log per request
2023-10-31 12:01:34 -04:00
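#10970 throttles commits against hot shards, expires throttled shards over time, and adds backoff. The following sketch shows one plausible shape for those three pieces; it is not the Ratekeeper or CommitProxy code, and `HotShardThrottler`, the TTL-based expiry, and the doubling backoff are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical table of hot key ranges, each with an expiry time.
class HotShardThrottler {
public:
    explicit HotShardThrottler(double ttlSeconds) : ttl_(ttlSeconds) {}

    // Marks [begin, end) hot; ranges are assumed not to overlap.
    void markHot(const std::string& begin, const std::string& end, double now) {
        ranges_[begin] = Entry{end, now + ttl_};
    }

    // True if a commit writing `key` should be throttled at `now`.
    // An expired entry is removed instead of throttling.
    bool isThrottled(const std::string& key, double now) {
        auto it = ranges_.upper_bound(key);
        if (it == ranges_.begin()) return false;
        --it; // last range whose begin <= key
        if (key >= it->second.end) return false;
        if (now >= it->second.expiry) {
            ranges_.erase(it);
            return false;
        }
        return true;
    }

    std::size_t size() const { return ranges_.size(); }

private:
    struct Entry { std::string end; double expiry; };
    double ttl_;
    std::map<std::string, Entry> ranges_;
};

// Client retry delay after being throttled: base * 2^attempt, capped at `cap`.
double backoffSeconds(int attempt, double base, double cap) {
    double d = base;
    for (int i = 0; i < attempt && d < cap; ++i) d *= 2;
    return std::min(d, cap);
}
```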
neethuhaneesha
361af9e862
Perpetual wiggle option to have multiple SS in rebalance state during wiggling. ( #11019 )
2023-10-24 15:10:51 -07:00
Zhe Wang
55c023a815
Improve visibility of consistency checker ( #11018 )
* improve visibility consistency checker
* fix for CI failure
2023-10-24 13:21:07 -07:00
neethuhaneesha
b7148ba67e
Compaction rate limiter changes. ( #11015 )
2023-10-23 13:39:46 -07:00
Jingyu Zhou
ad259d48cf
Merge pull request #11001 from jzhou77/main
Fix RemoveServersSafely workload timeout error
2023-10-23 09:26:33 -07:00
Zhe Wang
4695bdfbbf
improve printSnapshotTeamsInfo ( #10999 )
2023-10-23 08:54:07 -07:00
Zhe Wang
adaf03b6f9
Check write traffic when bypass shard split ( #10974 )
* check write traffic when exiting shard split
* address comments
2023-10-20 15:00:14 -04:00
Jingyu Zhou
d40902aefb
Fix IDE build error
2023-10-20 09:03:32 -07:00
Jingyu Zhou
4e07fb982b
Fix force flag for exclusion
2023-10-19 21:47:13 -07:00
Jingyu Zhou
f99a8151a0
Fix RemoveServersSafely workload timeout error
This workload can hit a timeout error when using locality-based exclusion. The
sequence is:
1. RemoveServerSafely workload excludes a locality by processid
2. Attrition reboots the target process, thus changing the processid, because
processid is generated for each worker process at fdbd()
3. RemoveServerSafely waits for the process exclusion, which never succeeds
4. Timeout
The fix monitors processid locality changes and reissues the exclusion with the
correct locality.
To reproduce:
seed: -f ./tests/fast/SwizzledRollbackSideband.toml -s 879108103 -b on
commit: a3dbd4baf
release-7.1
2023-10-19 21:46:44 -07:00
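The RemoveServersSafely fix above monitors processid changes and reissues the exclusion. A minimal sketch of that monitoring rule, with the actual exclude call replaced by recording the filter string: the class name is hypothetical, and the `locality_processid:` string is only illustrative of a locality exclusion target.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tracker: keeps a locality exclusion pointed at the target's
// current processid, reissuing it whenever a reboot produces a new processid.
class ProcessIdExclusionTracker {
public:
    // Returns true if a new exclusion had to be issued for this observation.
    bool observe(const std::string& processId) {
        if (processId == current_) return false; // exclusion already targets this id
        current_ = processId;
        issued_.push_back("locality_processid:" + processId); // stand-in for issuing the exclude
        return true;
    }
    const std::vector<std::string>& issued() const { return issued_; }

private:
    std::string current_;
    std::vector<std::string> issued_;
};
```

Without this, step 3 above waits on an exclusion for a processid that no longer exists.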
Jingyu Zhou
e3c77044c0
Merge pull request #10987 from kakaiu/add-write-traffic-metrics-to-ddMetricsGetRange
Add write traffic metrics to ddMetricsGetRange
2023-10-19 11:04:31 -07:00
Johannes M. Scheuermann
9591a8e382
Correct the check for a machine being excluded
2023-10-19 11:42:46 +02:00
Zhe Wang
5f43fc91e7
pr-10985
2023-10-17 11:19:25 -07:00
Chaoguang Lin
d24d9e6e21
Replace the wrong usage of g_simulator in the snapshot code ( #10984 )
2023-10-13 18:31:17 -07:00
Zhe Wang
b0569f8717
fix corner cases of auditStorageServerShardQ ( #10980 )
2023-10-13 09:48:24 -07:00
Jingyu Zhou
9cd01ac58a
Merge pull request #10927 from sbodagala/main
Add support to fetch a specific group of status json fields
2023-10-11 12:57:04 -07:00
Jingyu Zhou
1896e5cd46
Fix an uninitialized variable
Valgrind complains about this for the RecentRocksDBBackgroundWorkStats event.
2023-10-09 15:13:02 -07:00
Sreenath Bodagala
3dcee84898
Merge remote-tracking branch 'apple-upstream/main'
2023-10-09 15:21:16 +00:00
Zhe Wang
5767fed414
AuditStorage check all DC replica ( #10955 )
* add trace events when update audit metadata
* audit all DCs in replica
* fix corner case of audit replica
* fmt
* address comments
2023-10-06 14:30:21 -07:00
Yao Xiao
45494e3bba
Add knob for fetch keys budget. ( #10963 )
2023-10-06 13:06:34 -07:00
Sreenath Bodagala
fe2a8b10b2
- Address PR review comments (includes a special key related test)
2023-10-06 16:19:56 +00:00
He Liu
1e249143a4
Allow large shard ( #10961 )
* Added large shard.
* getMaxShardSize() returns the fixed max shard size.
* Resolved comments.
* fmt.
2023-10-05 13:35:06 -07:00
Zhe Wang
be90185dd6
Throttle wiggling data moves ( #10953 )
* throttle wiggling data moves
* address comments
2023-10-03 09:29:24 -07:00
Sreenath Bodagala
9cde595207
- Re-use the current mechanism for gathering status.
2023-10-02 19:38:26 +00:00
Jingyu Zhou
d27a81ff58
Increase DataCenter Lag gate to 100
Found test failure due to 30s gate being too small:
./fdbserver.6.3.16 -r simulation -f ./tests/restarting/from_6.3.13_until_7.2.0/DrUpgradeRestart-1.txt -s 3218805329 -b on --logsize 1GB
-f ./tests/restarting/from_6.3.13_until_7.2.0/DrUpgradeRestart-2.txt --restarting -s 3218805330 -b on
clang build
commit 7d5d1e082
2023-09-25 17:12:17 -07:00
Sreenath Bodagala
3c01b1befe
- Add a special key in order to fetch a specific group of status json fields.
2023-09-25 16:23:19 +00:00
William Dowling
0f752473be
Merge branch 'main' into radixtree-production
2023-09-25 09:52:20 +02:00
neethuhaneesha
ca2700cc35
Erase storageWiggleID from disk if not used ( #10912 )
2023-09-22 14:15:06 -07:00
Jingyu Zhou
f42dd41ae8
Merge pull request #10810 from sfc-gh-tclinkenbeard/main-fix-clear-cost-estimation
...
Fix quota throttler clear cost estimation
2023-09-20 20:48:40 -07:00
Jingyu Zhou
3ebbe8c7a0
Merge pull request #10725 from sfc-gh-tclinkenbeard/main-throttle-multiple-write-tags
...
Monitor multiple write tags in `StorageQueueInfo::refreshCommitCost`
2023-09-20 20:44:34 -07:00
Zhe Wu
87083652a3
Merge pull request #10856 from halfprice/zhewu/wiggle-locality-list
...
Make `perpetual_storage_wiggle_locality` database option to take a list of localities
2023-09-20 15:42:25 -07:00
Zhe Wu
d65a6a8a10
Address comments
2023-09-20 13:56:15 -07:00
Boris Korzun
bb6c5ad187
Rename PAGE_SIZE constant
2023-09-20 22:14:31 +03:00
Zhe Wu
31d46b6fb2
Address comments
2023-09-19 10:35:51 -07:00
Jingyu Zhou
f72eab118d
Skip redwood for clearInflightCommits unit test
Redwood complains that the DB is invalid for this unit test:
Assertion keyProvider.isValid() || db.isValid() failed @ /root/src/foundationdb/fdbserver/VersionedBTree.actor.cpp 8029:
2023-09-18 14:34:36 -07:00
Jingyu Zhou
307491d68e
Use getRange for server metadatas
To reduce read load on SSes that serve the reads.
2023-09-18 13:32:53 -07:00
Jingyu Zhou
12fe500633
ClusterController watches changes to storage metadata
Retrieving storage metadata for every status json request is very expensive
for clusters with a large number of storage servers. So I changed the logic so
that the ClusterController actively monitors changes to storage metadata and only
retrieves them when there is a change.
2023-09-15 14:19:04 -07:00
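The ClusterController change above replaces a per-request fetch with change-driven refreshes. The sketch below models that with a change counter bumped by a watcher callback; `StorageMetadataCache` and the counter scheme are assumptions, and the real code watches keys in the database rather than calling `onChange()` directly.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache: the expensive fetch runs only when the watched change
// counter differs from the one the cached copy was built at.
class StorageMetadataCache {
public:
    using Metadata = std::map<std::string, std::string>; // server id -> metadata
    explicit StorageMetadataCache(std::function<Metadata()> fetch) : fetch_(std::move(fetch)) {}

    // Called by the watcher when storage metadata changes.
    void onChange() { ++changeCounter_; }

    // Called on every status request; refetches only after a change.
    const Metadata& get() {
        if (!valid_ || cachedAt_ != changeCounter_) {
            cached_ = fetch_();
            cachedAt_ = changeCounter_;
            valid_ = true;
        }
        return cached_;
    }

private:
    std::function<Metadata()> fetch_;
    Metadata cached_;
    uint64_t changeCounter_ = 0;
    uint64_t cachedAt_ = 0;
    bool valid_ = false;
};
```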
Zhe Wang
1f6215808b
fix multiple requests from DDDoAudit ( #10902 )
2023-09-14 11:04:19 -07:00
Zhe Wang
29a2f63f8d
Fix SSShard Audit ( #10896 )
* fix ssshard
* address comments
* fmt
2023-09-13 21:15:12 -07:00
Jingyu Zhou
977851fa39
Fix sharded rocks failure by adding a shard
2023-09-13 15:53:27 -07:00
Jingyu Zhou
42df53b3bc
Add a test case for storage engine
This test validates that an in-flight commit to the storage engine is properly
handled. As found in https://github.com/apple/foundationdb/pull/10714 , an
engine could miss in-flight data and cause data corruption.
The test case is modeled after the above corruption: insert data, then clear
the data in the next commit to the storage engine, and finally verify that the
data is cleared.
2023-09-13 15:25:25 -07:00
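The shape of the storage engine test described above (insert, clear in the next commit, verify) can be sketched against a toy engine that buffers writes until `commit()`. `ToyEngine` and `insertThenClearIsVisible` are hypothetical, and the toy commits synchronously, so it does not model the overlapping in-flight commits where the real bug lived.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy engine: writes are buffered until commit(), then applied in order.
class ToyEngine {
public:
    void set(const std::string& k, const std::string& v) { pending_.push_back(Op{k, v, false}); }
    void clear(const std::string& k) { pending_.push_back(Op{k, "", true}); }
    void commit() {
        for (const Op& op : pending_) {
            if (op.isClear) data_.erase(op.key);
            else data_[op.key] = op.value;
        }
        pending_.clear();
    }
    std::optional<std::string> read(const std::string& k) const {
        auto it = data_.find(k);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }

private:
    struct Op { std::string key, value; bool isClear; };
    std::vector<Op> pending_;
    std::map<std::string, std::string> data_;
};

// Insert keys, clear them in the next commit, then verify none are readable.
bool insertThenClearIsVisible(ToyEngine& engine, const std::vector<std::string>& keys) {
    for (const auto& k : keys) engine.set(k, "value");
    engine.commit();
    for (const auto& k : keys) engine.clear(k);
    engine.commit();
    for (const auto& k : keys) {
        if (engine.read(k).has_value()) return false;
    }
    return true;
}
```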
Zhe Wu
bebd1790db
Resolve conflict with recent change in main
2023-09-13 14:28:21 -07:00
Zhe Wu
af42816b0e
Adding test for perpetual_storage_wiggle_locality to take a list of localities
2023-09-13 13:36:16 -07:00
Zhe Wu
a86a2d752e
Apply format
2023-09-13 13:36:16 -07:00
Zhe Wu
7158d702c6
Use locality list for host checking
2023-09-13 13:36:16 -07:00
Zhe Wu
4abd2edcad
Parse wiggle locality as a list
2023-09-13 13:36:16 -07:00
A.J. Beamon
908d84c893
Merge pull request #10518 from oleg68/bugfix/clang-17
Fixed compiling foundationdb with the clang 17 compiler
2023-09-13 09:59:52 -07:00
Oleg Samarin
b9b4e1ebe4
Fixed formatting
2023-09-13 18:56:25 +03:00
Oleg Samarin
3c75b5d7de
Apply suggestions from code review
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2023-09-13 17:18:06 +03:00
Zhe Wang
2a40fb4135
fix audit state serializer ( #10889 )
2023-09-12 14:48:04 -07:00
Jingyu Zhou
d47e0eaaaa
Merge pull request #10877 from neethuhaneesha/wigglelocality
Locality check on perpetualStorageWiggleIDPrefix when DD restarts
2023-09-07 10:09:08 -07:00
Hui Liu
b2ee1fb6c4
Tune blob restore default knob ( #10871 )
2023-09-07 09:36:51 -07:00
neethuhaneesha
9bafef4fd2
Locality check on perpetualStorageWiggleIDPrefix when DD restarts
2023-09-06 16:00:32 -07:00
Hui Liu
4d2a7d507d
Add a new blob restore state to fix a race after data copy ( #10854 )
2023-09-05 14:04:35 -07:00
Zhe Wu
e2f5c50a7b
Merge pull request #10828 from halfprice/zhewu/clear-wiggle-storage-engine
Add option to set perpetual_storage_wiggle_engine to none
2023-09-05 11:06:21 -07:00
sfc-gh-tclinkenbeard
dccc0e6773
Merge remote-tracking branch 'origin/main' into main-fix-clear-cost-estimation
2023-09-04 13:30:46 -07:00
sfc-gh-tclinkenbeard
d82b66f5c6
Merge remote-tracking branch 'origin/main' into main-throttle-multiple-write-tags
2023-09-04 13:29:02 -07:00
Jingyu Zhou
3c5a38b120
Merge pull request #10859 from sfc-gh-ljoswiak/fixes/database-context-status
Avoid creating a DatabaseContext on every status json call
2023-09-02 10:25:47 -07:00
Jingyu Zhou
52209ef551
Merge pull request #10714 from neethuhaneesha/singlekeydelete
Storing the inprocess-commit deleted keysets until the commit is complete
2023-09-01 20:54:49 -07:00
Lukas Joswiak
b3e58761d3
Avoid creating a DatabaseContext on every status json call
DatabaseContext currently leaks memory by creating `Counter`s with
unique IDs on construction. Each status json call creates a new
`DatabaseContext` object, causing a memory leak over time.
2023-09-01 19:55:17 -07:00
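The leak described in the DatabaseContext commit comes from constructing a new object per status call when each construction registers counters that are never freed. The sketch contrasts the two patterns with a stand-in type; `CountingContext` and `StatusHandler` are hypothetical and only model the registration side effect.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an object that registers uniquely named counters
// on construction; the registrations are never removed.
struct CountingContext {
    static inline int registrations = 0;
    CountingContext() { ++registrations; }
};

// Leaky pattern: a fresh context per request registers new counters every time.
int handleStatusRequestLeaky() {
    CountingContext ctx;
    return CountingContext::registrations;
}

// Fixed pattern: create the context once and reuse it for every request.
class StatusHandler {
public:
    int handleStatusRequest() {
        if (!ctx_) ctx_ = std::make_shared<CountingContext>();
        return CountingContext::registrations;
    }

private:
    std::shared_ptr<CountingContext> ctx_;
};
```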
A.J. Beamon
ead6c37e4a
Merge pull request #10662 from sfc-gh-tclinkenbeard/main-fix-grv-queue-leak
Fix GRV queue leak
2023-09-01 09:45:49 -07:00
Egor Zhdan
3428d4678c
[swift] Remove deprecated Swift attr spelling: `import_as_ref` ( #10825 )
This spelling is deprecated and will be removed in a future version of Swift:
```
__attribute__((swift_attr("import_as_ref")))
```
The new correct spelling is:
```
__attribute__((swift_attr("import_reference")))
```
2023-08-31 13:23:23 -05:00
neethuhaneesha
08cc6518bb
Deleting rocksdb keysSet after the commit completes.
2023-08-30 15:25:23 -07:00
neethuhaneesha
68d5708e02
Storing the inprocess-commit deleted keysets until the commit is complete. ( #10672 )
2023-08-30 15:25:23 -07:00
Zhe Wu
83992d61ec
Add a knob to guard the gray failure detection during TLog recovery
2023-08-29 14:49:39 -07:00
Zhe Wu
abd535c2e0
Add test documents
2023-08-29 14:32:21 -07:00
Yi Wu
8d7f2e84ed
Merge pull request #10831 from sfc-gh-yiwu/ear_timeout
...
EaR: Handle KMS timeout in storage server and commit proxy
2023-08-28 20:59:22 -07:00
Zhe Wang
432c077b51
fix dd issue when dd skip audit ( #10844 )
2023-08-28 16:39:45 -07:00
Yi Wu
3287098b4a
EaR: Handle KMS timeout in storage server and commit proxy
2023-08-28 16:17:43 -07:00
Xiaoge Su
cb99a8f56f
fixup! Reformat source
2023-08-28 14:56:39 -07:00
Xiaoge Su
5a12d12774
fixup! Enable sharded RocksDB storage engine for PhysicalShardMove
2023-08-28 14:56:39 -07:00
Jingyu Zhou
1a6114a66f
Merge pull request #10843 from xis19/main
fixup! Add back missing initializer
2023-08-28 12:51:54 -07:00
Xiaoge Su
d0685f9fff
fixup! Add back missing initializer
2023-08-28 11:05:07 -07:00
Hui Liu
30d4f07395
Catch initial flush error and avoid crashing blob manager ( #10836 )
2023-08-25 20:32:20 -07:00
Zhe Wang
f43b20e15c
Audit location metadata in DD ( #10820 )
...
* Audit location metadata in DD
* nits
2023-08-25 17:11:11 -07:00
Josh Slocum
3bdcbef465
enabling median assignment limiting ( #10805 )
2023-08-25 17:52:54 -05:00
Yao Xiao
b20dcf23a9
Support periodic compaction for sharded rocksdb. ( #10815 )
2023-08-25 15:38:01 -07:00
Jingyu Zhou
adcedb9301
Merge pull request #10812 from xis19/main
...
A few test failure fixes
2023-08-25 11:07:33 -07:00
Zhe Wu
6610a228c7
Add option to set perpetual_storage_wiggle_engine to none
2023-08-24 13:43:58 -07:00
Yao Xiao
0a87b6039f
Add perpetual wiggle pause reason. ( #10821 )
2023-08-24 00:31:38 -07:00
Xiaoge Su
d8d9b9402b
fixup! Subobject-linkage
/codebuild/output/src3651401136/src/github.com/apple/foundationdb/fdbserver/DataDistribution.actor.cpp:111:8: error: 'DDAudit' has a field 'DDAudit::context' whose type uses the anonymous namespace [-Werror=subobject-linkage]
2023-08-23 13:47:41 -07:00
Xiaoge Su
26a87da578
fixup! Use isMocked method
2023-08-23 13:25:37 -07:00
Xiaoge Su
1ec5ed13b9
Merge branch 'main' into main
2023-08-23 11:35:21 -07:00
Zhe Wang
7e8f326277
Audit storage for specific engine ( #10781 )
...
* audit storage for specific engine
* fix getStorageType
* fix budget of skipAuditOnRange
* fix budget in scheduleAuditOnRange
* fix CI error
* improve trace events
* address comments
2023-08-23 10:51:24 -07:00
Zhe Wang
f8311ae069
Add more trace event for TSS recruitment ( #10809 )
...
* add more trace event for tss
* update StorageServerInitProgress
* add more traces
2023-08-23 09:19:30 -07:00
Xiaoge Su
ac2bf66a2a
fixup! gcc does not allow class member variable uses anonymous namespace
fdbserver/DataDistribution.actor.cpp:360:8: error: 'DataDistributor' has a field 'DataDistributor::audits' whose type uses the anonymous namespace [-Werror=subobject-linkage]
clang does not report the same error
2023-08-22 18:03:46 -07:00
Xiaoge Su
95c5c2e04c
Disable PerpetualWiggleStorageMigrationWorkload when RocksDB is disabled
This can also be fixed in the CMake test files; I was not sure whether to disable it
there, but changing the C++ code seemed simpler.
2023-08-22 16:57:07 -07:00
Xiaoge Su
47efd890ad
Better error report when specified storage engine is not supported
2023-08-22 15:52:08 -07:00
Xiaoge Su
130102f1bd
Disable AuditService when mocking DD
2023-08-22 15:51:38 -07:00
sfc-gh-tclinkenbeard
d6a7e1eccd
Normalize clear costs for quota throttler
2023-08-22 15:29:28 -07:00
Hui Liu
98958b32b8
Wait file deleted in blob manifest cleanup ( #10802 )
2023-08-22 10:50:03 -07:00
Josh Slocum
b488abbee0
fixed split downsampling to happen as part of split range and to correctly deal with re-aligning keys after downsample ( #10796 )
2023-08-22 12:21:16 -05:00
Zhe Wang
83dc9ff6f7
Trace SS init progress ( #10799 )
* trace ss init progress
* improve trace events
2023-08-18 18:44:37 -07:00
Ata E Husain Bohra
22ddb8a92d
Prevent Status actor from bubbling up timeout error ( #10791 )
Description
Patch addresses occurrences where Status.actor ends up bubbling a timeout error
up to the ClusterController, causing recovery to be triggered when
ClusterGetStatus times out for some reason.
Testing
2023-08-18 15:21:52 -07:00
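#10791 stops a status timeout from escaping to the ClusterController. In Flow this is done with actor error handling; the sketch below uses ordinary C++ exceptions as an analogy, with a hypothetical `TimedOut` type and a made-up JSON error reply. Only the timeout is converted; any other error still propagates.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical error type standing in for a timed_out error.
struct TimedOut : std::runtime_error {
    TimedOut() : std::runtime_error("timed_out") {}
};

// Converts a timeout during status computation into a reply-level error
// instead of letting it escape to the caller.
std::string getStatusReply(const std::function<std::string()>& computeStatus) {
    try {
        return computeStatus();
    } catch (const TimedOut&) {
        return R"({"error":"status_timed_out"})";
    }
}
```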
Yao Xiao
c63ee571e5
Export file metrics and add knob for file size multiplier. ( #10785 )
2023-08-16 11:27:33 -07:00
Hui Liu
aea6fa5ca6
Set BLOB_RESTORE_SKIP_EMPTY_RANGES default value to false ( #10784 )
2023-08-16 10:02:06 -07:00
Ankita Kejriwal
7e424c7386
Stagger storage quota estimation requests and observability improvements ( #10759 )
* Rename and simplify fetch time variables
* Add RefreshTime detail to TenantCacheGetStorageUsageRefreshSlow trace
* Stagger storage estimation requests
* Update the value of a knob in simulation to reduce flakiness
* Improve names of TenantCache and StorageQuota related traces. Add slow refresh time.
* Convert potentially spammy TenantCache traces to SevDebug
2023-08-15 13:09:24 -07:00
Jingyu Zhou
5ace1911d1
Fix a few typos
2023-08-14 15:01:19 -07:00
Zhe Wang
f1c17b27fc
Multiple improvements to AuditStorages ( #10685 )
* remove danger DDAudit assert, add AuditRate knob, add progress check when ssshard complete, add progress check for ssshard in fdbcli
* throttle progress check for ssshard
* fix getAuditProgressByServer
* fix trace event for ss audit
* using name -- checkMoveKeysLockForAudit
* new scheduleAuditLocationMetadata
* address comments
* shorten progress summary for ssshard
* simplify getAuditProgressByServer in fdbcli
2023-08-14 13:13:49 -07:00
Evan Tschannen
3209dc7b30
Fixed multiple bugs related to locality based exclusions ( #10623 )
* fix: Non-storage processes were not being checked for locality exclusions
fix: Data distribution did not detect when a newly added process was locality excluded
fix: RemoveServerSafely did not wait for processes to be excluded before killing them when excluding localities
* fix: do not allow locality based excludes if they cannot exclude the required addresses
2023-08-11 15:17:02 -07:00
Zhe Wang
e7528aca09
Revert inappropriate setting of ApiCorrectness test ( #10771 )
* add max_manifest_file_size to rocksdb
* revert ApiCorrectness config change
2023-08-11 12:41:55 -07:00
Evan Tschannen
bf0677f1d4
encryptionAtRestMode needs to be in the system keyspace so that it is encrypted with the cluster specific encryption key rather than a tenant key ( #10777 )
2023-08-11 12:20:22 -07:00
Zhe Wang
5868173a3e
nits
2023-08-10 16:51:28 -05:00
Zhe Wang
81db1da5a9
address comments
2023-08-10 16:51:28 -05:00
Zhe Wang
5598f4f28f
trace shardedrocks manifest size
2023-08-10 16:51:28 -05:00
Zhe Wu
726574243c
Improve PerpetualWiggleStorageMigration reliability
2023-08-10 09:36:04 -07:00
Zhe Wu
eb6f0c613d
Add documentation for perpetual_storage_wiggle_engine config
2023-08-10 09:35:57 -07:00
Zhe Wu
ab4ae712e8
Add PerpetualWiggleStorageMigrationWorkload documentation.
2023-08-10 09:35:57 -07:00
Zhe Wu
17ae952f15
Remove debugging notes
2023-08-10 09:35:57 -07:00
Zhe Wu
fb703c0021
Making PerpetualWiggleStorageMigration test pass more reliably
2023-08-10 09:35:57 -07:00
Zhe Wu
863038a44c
Add improvement for initializing storage server using new perpetual_wiggle_storage_engine config
2023-08-10 09:35:57 -07:00
Zhe Wu
cdca09dc93
Making PerpetualWiggleStorageMigration pass reliably
2023-08-10 09:35:57 -07:00
Zhe Wu
f65e3a35f2
Adding perpetual wiggle test to reproduce exclude/include case
2023-08-10 09:35:57 -07:00
Xiaoxi Wang
c2cfebac73
Fix tss kill logic: only disable Tss check when zeroHealthyTeams=false
2023-08-09 20:43:15 -07:00
Jingyu Zhou
ad9472d6c2
Merge pull request #10757 from jzhou77/main
Fix a crash error
2023-08-09 14:57:24 -07:00
He Liu
df848005f8
Allow applyUpdates in multiple batches. ( #10583 )
2023-08-09 14:04:36 -07:00
Jingyu Zhou
a4e4ab6411
Fix a crash error
2023-08-09 13:44:52 -07:00
Jingyu Zhou
d1d9f185d8
Merge pull request #10715 from xis19/aa
fixup! Change the type of queryQueueMax from double to uint64_t in Ge…
2023-08-07 15:58:58 -06:00
Josh Slocum
4f6bf34fe1
fixing HBR to update key BM/BW are watching
2023-08-07 14:01:30 -05:00
Josh Slocum
20b94368ee
fixing tenant watches
2023-08-07 14:01:30 -05:00
Jingyu Zhou
bb328c1c36
Merge pull request #10724 from sfc-gh-etschannen/fix-grv-accounting-main
fix: when requests are dropped from the queue the txnRequestOut becomes inconsistent with txnRequestIn
2023-08-07 11:40:50 -06:00
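The GRV accounting fix above keeps txnRequestOut consistent with txnRequestIn when queued requests are dropped. A sketch of the invariant, assuming that "in minus out equals queue length" is the property being preserved; `RequestQueue` is hypothetical and the real counters live in the GRV proxy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical queue where requestsIn - requestsOut == queue size must always hold.
class RequestQueue {
public:
    void enqueue(int id) {
        q_.push_back(id);
        ++in_;
    }
    bool serveOne() {
        if (q_.empty()) return false;
        q_.pop_front();
        ++out_;
        return true;
    }
    // Dropped requests must also count as "out", or the counters drift apart.
    void dropOldest(std::size_t n) {
        while (n > 0 && !q_.empty()) {
            q_.pop_front();
            ++out_;
            --n;
        }
    }
    bool consistent() const { return in_ - out_ == static_cast<int64_t>(q_.size()); }

private:
    std::deque<int> q_;
    int64_t in_ = 0;
    int64_t out_ = 0;
};
```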
sfc-gh-tclinkenbeard
2228cd3320
Monitor multiple write tags in StorageQueueInfo::refreshCommitCost
2023-08-02 15:52:57 -07:00
Yi Wu
fff6b745d2
EaR: turn down the probability of enabling encryption in simulation
2023-08-02 10:11:18 -07:00