Commit Graph

7281 Commits

Author SHA1 Message Date
Jingyu Zhou aa1d005cc5
Merge pull request #10907 from jzhou77/main
ClusterController watches changes to storage metadata
2023-09-19 12:21:35 -07:00
Jingyu Zhou 307491d68e Use getRange for server metadatas
To reduce read load on SSes that serve the reads.
2023-09-18 13:32:53 -07:00
Zhe Wu 8706098916
Merge pull request #10904 from halfprice/zhewu/txn-window
Create MAX_WRITE_TRANSACTION_LIFE_VERSIONS client knob
2023-09-15 15:51:13 -07:00
Jingyu Zhou 12fe500633 ClusterController watches changes to storage metadata
Retrieving storage metadata for every status json request is very expensive
for clusters with a large number of storage servers. This change makes
ClusterController actively monitor changes to storage metadata and retrieve
them only when there is a change.
2023-09-15 14:19:04 -07:00
Zhe Wu aea57f6da4 Create MAX_WRITE_TRANSACTION_LIFE_VERSIONS client knob 2023-09-14 14:01:43 -07:00
Yao Xiao d90a5fa742
Disable read aware DD. (#10900) 2023-09-13 15:50:58 -07:00
Zhe Wang 2a40fb4135
fix audit state serializer (#10889) 2023-09-12 14:48:04 -07:00
Hao Fu 6d9c53f8c4 Add proxy to backup agent via global var
The backup agent itself does not have proxy info.
This change adds the proxy via a global var.
2023-09-08 10:27:01 -07:00
Zhe Wu dc48f1b325
Merge pull request #10876 from halfprice/zhewu/make-sure-end-is-not-persist
Make sure that storage and tlog are always set to a valid type
2023-09-07 10:58:35 -07:00
Hui Liu b2ee1fb6c4
Tune blob restore default knob (#10871) 2023-09-07 09:36:51 -07:00
Zhe Wu 9e5488dd3d Make sure that storage and tlog are always set to a valid type 2023-09-06 14:58:42 -07:00
Hui Liu 4d2a7d507d
Add a new blob restore state to fix a race after data copy (#10854) 2023-09-05 14:04:35 -07:00
Zhe Wu e2f5c50a7b
Merge pull request #10828 from halfprice/zhewu/clear-wiggle-storage-engine
Add option to set perpetual_storage_wiggle_engine to none
2023-09-05 11:06:21 -07:00
Hui Liu 00d3062728
Initialize apply mutations map for restore to version (#10857) 2023-09-05 10:03:35 -07:00
A.J. Beamon ead6c37e4a
Merge pull request #10662 from sfc-gh-tclinkenbeard/main-fix-grv-queue-leak
Fix GRV queue leak
2023-09-01 09:45:49 -07:00
Zhe Wu 83992d61ec Add a knob to guard the gray failure detection during TLog recovery 2023-08-29 14:49:39 -07:00
Zhe Wu 314d1b66a5 Fix StatusWorkload after adding perpetual_storage_wiggle_engine 2023-08-29 13:58:23 -07:00
Yi Wu 8d7f2e84ed
Merge pull request #10831 from sfc-gh-yiwu/ear_timeout
EaR: Handle KMS timeout in storage server and commit proxy
2023-08-28 20:59:22 -07:00
Zhe Wang 432c077b51
fix dd issue when dd skip audit (#10844) 2023-08-28 16:39:45 -07:00
Yi Wu 3287098b4a EaR: Handle KMS timeout in storage server and commit proxy 2023-08-28 16:17:43 -07:00
Zhe Wang f43b20e15c
Audit location metadata in DD (#10820)
* Audit location metadata in DD

* nits
2023-08-25 17:11:11 -07:00
Josh Slocum 3bdcbef465
enabling median assignment limiting (#10805) 2023-08-25 17:52:54 -05:00
Yao Xiao b20dcf23a9
Support periodic compaction for sharded rocksdb. (#10815) 2023-08-25 15:38:01 -07:00
Zhe Wu 6610a228c7 Add option to set perpetual_storage_wiggle_engine to none 2023-08-24 13:43:58 -07:00
Zhe Wang 7e8f326277
Audit storage for specific engine (#10781)
* audit storage for specific engine

* fix getStorageType

* fix budget of skipAuditOnRange

* fix budget in scheduleAuditOnRange

* fix CI error

* improve trace events

* address comments
2023-08-23 10:51:24 -07:00
Jingyu Zhou 2a616b3866
Merge pull request #10763 from w41ter/fix_8882
Fix guess region from s3 URL
2023-08-22 22:24:21 -07:00
Yao Xiao c63ee571e5
Export file metrics and add knob for file size multiplier. (#10785) 2023-08-16 11:27:33 -07:00
Hui Liu aea6fa5ca6
Set BLOB_RESTORE_SKIP_EMPTY_RANGES default value to false (#10784) 2023-08-16 10:02:06 -07:00
Ankita Kejriwal 7e424c7386
Stagger storage quota estimation requests and observability improvements (#10759)
* Rename and simplify fetch time variables

* Add RefreshTime detail to TenantCacheGetStorageUsageRefreshSlow trace

* Stagger storage estimation requests

* Update the value of a knob in simulation to reduce flakiness

* Improve names of TenantCache and StorageQuota related traces. Add slow refresh time.

* Convert potentially spammy TenantCache traces to SevDebug
2023-08-15 13:09:24 -07:00
Zhe Wang f1c17b27fc
Multiple improvements to AuditStorages (#10685)
* remove dangerous DDAudit assert, add AuditRate knob, add progress check when ssshard completes, add progress check for ssshard in fdbcli

* throttle progress check for ssshard

* fix getAuditProgressByServer

* fix trace event for ss audit

* using name -- checkMoveKeysLockForAudit

* new scheduleAuditLocationMetadata

* address comments

* shorten progress summary for ssshard

* simplify getAuditProgressByServer in fdbcli
2023-08-14 13:13:49 -07:00
Evan Tschannen 3209dc7b30
Fixed multiple bugs related to locality based exclusions (#10623)
* fix: Non-storage processes were not being checked for locality exclusions
fix: Data distribution did not detect that a newly added process was locality excluded
fix: RemoveServerSafely did not wait for processes to be excluded before killing them when excluding localities

* fix: do not allow locality based excludes if they cannot exclude the required addresses
2023-08-11 15:17:02 -07:00
Zhe Wang e7528aca09
Revert inappropriate setting of ApiCorrectness test (#10771)
* add max_manifest_file_size to rocksdb

* revert ApiCorrectness config change
2023-08-11 12:41:55 -07:00
Zhe Wang 5868173a3e nits 2023-08-10 16:51:28 -05:00
Zhe Wang 5598f4f28f trace shardedrocks manifest size 2023-08-10 16:51:28 -05:00
Zhe Wu ab4ae712e8 Add PerpetualWiggleStorageMigrationWorkload documentation. 2023-08-10 09:35:57 -07:00
Zhe Wu 863038a44c Add improvement for initializing storage server using new perpetual_wiggle_storage_engine config 2023-08-10 09:35:57 -07:00
w41ter 5e6d15a0de Fix guess region from s3 URL 2023-08-10 09:09:51 +00:00
Jingyu Zhou 4a925b18e7
Merge pull request #10755 from w41ter/main
Avoid to send request to empty resource
2023-08-09 14:07:23 -07:00
He Liu df848005f8
Allow applyUpdates in multiple batches. (#10583) 2023-08-09 14:04:36 -07:00
w41ter 47a054e124 Avoid to send request to empty resource 2023-08-09 11:21:48 +00:00
Ankita Kejriwal 07e4516667
Reduce the frequency of polling tenants' storage usage (#10678) 2023-07-31 15:03:37 -07:00
Zhe Wang d0742c79ac
Improving visibility to debug sharded rocksdb (#10694)
* logging storage commit stats

* add rocks flush and compaction listener

* remove used field in FlushStats and fix CI error

* reduce LOGGING_ROCKSDB_BG_WORK_PROBABILITY

* merge rocks event listeners

* avoid using mutex/spinloop in rocksdb event listener

* code clean

* fix OnCompactionBegin and OnFlushBegin

* add logReason to RecentRocksDBBackgroundWorkStats

* add error listener back
2023-07-31 14:45:26 -07:00
Xiaoxi Wang 53ace31765 fix read ops sample name 2023-07-26 15:40:58 -07:00
Hao Fu a5f4d53c45
Remove SS entries from RateKeeper once it is down (#10627)
* Remove SS entries from RateKeeper once it is down

Before this change, certain data structures in RateKeeper did not
delete data associated with a deleted/cancelled SS, which caused
significant unnecessary CPU usage and degraded GRV proxy
performance. This change fixes it.
2023-07-24 13:47:23 -07:00
sfc-gh-tclinkenbeard 345a5f2838 Buggify START_TRANSACTION_MAX_QUEUE_SIZE 2023-07-22 01:33:49 -07:00
Zhe Wang 3426fc3c1a
Add DD Security Mode (#10646)
* dd-security-mode

* address comments

* cleanup

* revise tr option set in loadAndUpdateAuditMetadataWithNewDDId

* address comments

* reset auditStorageInitStarted before DD init

* decouple audit resume and audit launch

* audit launch new request should wait for resuming existing requests

* address comment/clean up/fix

* fix

* fix initAuditMetadata retry

* fix initAuditMetadata retry should reset tr
2023-07-21 17:06:25 -07:00
Jingyu Zhou 8785840e52
Merge pull request #10638 from jzhou77/main
Fix a bug of hash value for backup log keys in getLogKey()
2023-07-21 16:50:20 -07:00
Jingyu Zhou 26906559e8
Merge pull request #10652 from yao-xiao-github/joshua
Set some rocksdb knobs to reduce simulation runtime.
2023-07-21 08:32:25 -07:00
Jingyu Zhou 24968674ce
Merge pull request #10643 from yao-xiao-github/write-buffer
Add knob.
2023-07-21 08:31:14 -07:00
Jingyu Zhou ae6ef0ce90 Add different block size for tests 2023-07-20 21:28:58 -07:00