Dimitris Apostolou
a88114c222
Fix typos
2024-02-07 01:16:00 +02:00
Zhe Wang
970175a8a2
cherrypick storage queue aware getteam (#11154)
2024-01-30 15:15:18 -08:00
Zhe Wang
ebb05f54c3
Add storage interface for checksum (#11144)
* add-storage-interface-for-check-sum
* address comments
2024-01-24 14:34:35 -08:00
Yao Xiao
2329e8327a
Add log cleaner for rocksdb logs. (#11134)
Co-authored-by: yaoxiao-github <yaoxiao@Yaos-MacBook-Pro-14.local>
2024-01-17 14:51:15 -08:00
Dan Lambright
0bfc99bf1f
ensure synthetic data is written to existing shards (#11128)
Co-authored-by: Dan Lambright <hlambright@apple.com>
2024-01-16 10:22:07 -05:00
Dan Lambright
54ebcde97b
Fix bug in synthetic data creation (#11115)
Co-authored-by: Dan Lambright <hlambright@apple.com>
2024-01-08 12:56:41 -05:00
Jingyu Zhou
75f7814ad1
Merge pull request #11112 from sfc-gh-jslocum/stuck_watch_fix_main
Stuck watch bug fix
2024-01-05 10:41:10 -08:00
Josh Slocum
611eb00fe1
stuck watch bug fix
* buggify watch version retry and fix multiple watch race after retry
* watch debugging improvements
2024-01-03 16:05:42 -06:00
Dan Lambright
86a2301faa
updated per review comments
2024-01-03 12:21:54 -05:00
Dan Lambright
20882507f4
sanity checks, fix knob
2024-01-02 12:09:32 -05:00
Dan Lambright
857e38b80b
bug fixes/cleanup
2023-12-21 16:39:22 -05:00
Dan Lambright
05571c59a9
Set tags on apply metadata mutations
2023-12-21 13:20:21 -05:00
Dan Lambright
2b4b4ae512
Synthesize data on SS based off parameters from new system transaction
2023-12-20 11:25:47 -05:00
Dan Lambright
5ebe8b0915
move data to value and parse it
2023-12-18 09:10:06 -05:00
Dan Lambright
a20f9d3475
Interfaces to synthesize data
2023-12-13 15:19:17 -05:00
neethuhaneesha
4f167f50be
Adding field length to audit storage trace events. (#11079)
2023-12-01 15:45:17 -08:00
He Liu
a8cdb7367c
Throttle low-priority fetchKeys. (#11083)
2023-11-30 21:27:52 -08:00
He Liu
422da7d7c7
Throttle fetch keys (#11060)
* Added ThroughputLimiter class.
* Throttle fetchKeys.
* cleanup.
* Increased fetchKeys rate limit for sim tests.
* Added unit.
* Resolved comments.
2023-11-16 10:30:40 -08:00
Zhe Wang
1e9c5bb390
Propagate data move reason from DD to SS (#11063)
* encode reason to data move id
* address comments
* fix data move id decode bug and add assert for data move decode invariant
* address comments
2023-11-15 13:07:11 -08:00
He Liu
b8f1670a0e
Physical shard move tss (#11057)
* Refactored newDataMoveId() and decodeServerKeysValue().
* Enabled physical shard move for tss.
* Added unit test & cleanup.
* clean up test configs.
2023-11-13 11:34:07 -08:00
He Liu
29a10311d9
Speed up physical shard move (#11056)
* Allow applyUpdates in multiple batches.
* Persist lastAppliedVersion together with a batch of updates.
* Fixed repeatedly applying the same fetched mutation.
Apply updates at different versions.
* Fixed race between removeDataShard and isInVersionedData.
* Ignore mutations earlier than `lastAppliedVersion`.
* Buggify the batch limit for applying updates.
* Implemented async load of updates.
* Fixed out-of-order version issue.
* Cleanup.
* Batch commit MoveInUpdates.
* Avoid popping unpersisted updates.
* Increment MoveInUpdates::lastAppliedVersion in MoveInUpdates::next().
Fixed MoveInUpdates::hasNext().
* Fixed loadUpdates start version.
* cleanup.
* Cleanup.
* fmt
* Commit MoveInUpdates regardless of MoveInShard status.
* Enable move restore.
* Disabled move restore and fallBackToAddingShard from ingestion failure.
* Get rid of persisting MoveInShard states between phases.
* Make updateMoveShardMetadata synchronous.
* Recovered code deleted accidentally.
* Cleanup.
* cleanup.
2023-11-13 09:27:58 -08:00
He Liu
967a546e15
Optimize physical shard move (#10962)
* Allow applyUpdates in multiple batches.
* Persist lastAppliedVersion together with a batch of updates.
* Fixed repeatedly applying the same fetched mutation.
Apply updates at different versions.
* Fixed race between removeDataShard and isInVersionedData.
* Ignore mutations earlier than `lastAppliedVersion`.
* Buggify the batch limit for applying updates.
* Implemented async load of updates.
* Fixed out-of-order version issue.
* Cleanup.
* Batch commit MoveInUpdates.
* Avoid popping unpersisted updates.
* Increment MoveInUpdates::lastAppliedVersion in MoveInUpdates::next().
Fixed MoveInUpdates::hasNext().
* Fixed loadUpdates start version.
* cleanup.
* Cleanup.
* fmt
* Commit MoveInUpdates regardless of MoveInShard status.
* Enable move restore.
* Disabled move restore and fallBackToAddingShard from ingestion failure.
* Resolved comments.
* Fixed leaks from other prs.
2023-11-07 14:19:18 -08:00
Dan Lambright
015167c17e
Throttle commits against hot shards (#10970)
* throttle hot shards
* expire throttled shards over time
* add backoff
* Parallelize messaging from RK to CP
* Obtain shards from a single SS
* handle expired transactions
* bump transaction_throttled_hot_shard
* Change SevError to SevWarn for CannotMonitorHotShardForSS
* Add log per request
2023-10-31 12:01:34 -04:00
Zhe Wang
b0569f8717
fix corner cases of auditStorageServerShardQ (#10980)
2023-10-13 09:48:24 -07:00
Zhe Wang
5767fed414
AuditStorage check all DC replica (#10955)
* add trace events when update audit metadata
* audit all DCs in replica
* fix corner case of audit replica
* fmt
* address comments
2023-10-06 14:30:21 -07:00
Yao Xiao
45494e3bba
Add knob for fetch keys budget. (#10963)
2023-10-06 13:06:34 -07:00
Zhe Wang
29a2f63f8d
Fix SSShard Audit (#10896)
* fix ssshard
* address comments
* fmt
2023-09-13 21:15:12 -07:00
Hui Liu
4d2a7d507d
Add a new blob restore state to fix a race after data copy (#10854)
2023-09-05 14:04:35 -07:00
Yi Wu
8d7f2e84ed
Merge pull request #10831 from sfc-gh-yiwu/ear_timeout
EaR: Handle KMS timeout in storage server and commit proxy
2023-08-28 20:59:22 -07:00
Zhe Wang
432c077b51
fix dd issue when dd skip audit (#10844)
2023-08-28 16:39:45 -07:00
Yi Wu
3287098b4a
EaR: Handle KMS timeout in storage server and commit proxy
2023-08-28 16:17:43 -07:00
Zhe Wang
f43b20e15c
Audit location metadata in DD (#10820)
* Audit location metadata in DD
* nits
2023-08-25 17:11:11 -07:00
Zhe Wang
f8311ae069
Add more trace event for TSS recruitment (#10809)
* add more trace event for tss
* update StorageServerInitProgress
* add more traces
2023-08-23 09:19:30 -07:00
Zhe Wang
83dc9ff6f7
Trace SS init progress (#10799)
* trace ss init progress
* improve trace events
2023-08-18 18:44:37 -07:00
Hui Liu
aea6fa5ca6
Set BLOB_RESTORE_SKIP_EMPTY_RANGES default value to false (#10784)
2023-08-16 10:02:06 -07:00
Zhe Wang
f1c17b27fc
Multiple improvements to AuditStorages (#10685)
* remove danger DDAudit assert, add AuditRate knob, add progress check when ssshard complete, add progress check for ssshard in fdbcli
* throttle progress check for ssshard
* fix getAuditProgressByServer
* fix trace event for ss audit
* using name -- checkMoveKeysLockForAudit
* new scheduleAuditLocationMetadata
* address comments
* shorten progress summary for ssshard
* simplify getAuditProgressByServer in fdbcli
2023-08-14 13:13:49 -07:00
He Liu
df848005f8
Allow applyUpdates in multiple batches. (#10583)
2023-08-09 14:04:36 -07:00
Zhe Wang
d0742c79ac
Improving visibility to debug sharded rocksdb (#10694)
* logging storage commit stats
* add rocks flush and compaction listener
* remove used field in FlushStats and fix CI error
* reduce LOGGING_ROCKSDB_BG_WORK_PROBABILITY
* merge rocks event listeners
* avoid using mutex/spinloop in rocksdb event listener
* code clean
* fix OnCompactionBegin and OnFlushBegin
* add logReason to RecentRocksDBBackgroundWorkStats
* add error listener back
2023-07-31 14:45:26 -07:00
Zhe Wang
3426fc3c1a
Add DD Security Mode (#10646)
* dd-security-mode
* address comments
* cleanup
* revise tr option set in loadAndUpdateAuditMetadataWithNewDDId
* address comments
* reset auditStorageInitStarted before DD init
* decouple audit resume and audit launch
* audit launch new request should wait for resuming existing requests
* address comment/clean up/fix
* fix
* fix initAuditMetadata retry
* fix initAuditMetadata retry should reset tr
2023-07-21 17:06:25 -07:00
Yao Xiao
70a7908fc9
Fix bytesPerCommit histogram.
2023-07-19 15:54:21 -07:00
Hui Liu
7c8c24bc8d
blob restore : Log and skip data copy if we miss data for a certain tenant (#10621)
2023-07-19 09:52:30 -07:00
Zhe Wang
63d387eb0b
Add complete check for location metadata by audit storage (#10636)
* cleanup traceevent
* add complete check
* fix and cleanup
* nit
* code cleanup
* code cleanup
* increase audit retry count
* revise comments and no code changes
2023-07-19 09:40:58 -07:00
Zhe Wang
522c9d4f0f
Add new implementation of audit storage for user data (#10613)
* remainingBudgetForAuditTasks should be managed within audit
* fix CI
* add audit storage test for various ranges
* clean DD
* new auditStorageUserDataQ
* fix assert fail in startTrackShardAssignment
* fix assert fail in ssaudit
* address comments
* replace assert with audit_cancel in ss audits
* add audit check progress tool
* add observability to audit progress and fix audit bugs
* fix audit progress issues and add sim test for audit progress and add trace event for the audit progress and add fdbcli to track the audit progress
* remove old audit storage on SS
* check audit progress when auditCore completes
2023-07-16 09:56:26 -07:00
Nim Wijetunga
7f2260bbd2
Add Encryption Related Latency Metrics (#10596)
* add ss and cp latency metrics
* make changes
2023-07-14 11:30:16 -07:00
Hui Liu
66a7acd960
Fix blob restore stuck issue (#10574)
2023-06-28 10:23:11 -07:00
He Liu
6337125712
Several minor improvements for ShardedRocksDB (#10520)
* Terminate DD if SHARD_ENCODE_LOCATION_METADATA is not enabled and storage_engine_type is ShardedRocksDB.
* Fixed Error in non-main thread.
* Minor improvements.
2023-06-24 16:07:14 -07:00
Zhe Wang
37689af3f2
Detect inconsistency of KeyServers and ServerKeys in real time (#10484)
* add framework
* add audit logic
* refactor audit loc metadata
* address comments
* add realtime audit timeout, add post validation logic
* fix input empty range to compareKeyServersAndServerKeys
* add context for auditKeyServersAndServerKeysInRealTime
* focus on moveShard
* remove space
* address comments
* cleanup
* add audit cleanup
* make validateRangeAssignment simple
* change trace name
* add shardAssigned
* stop DD when inconsistency detected
* fix ci
* small fix
* revert ss and auditUtl and simplify rt audit
* cleanup ss
* tiny change
* address comments and refactor code
* make auditLocationMetadataPreCheck retriable
* handle actor cancel in auditLocationMetadataPreCheck
* rm timeout and add new protection for failure of audit
* fix bugs
* import dataMoveId to validation
* improve trace event
* carefully propagate error and stop DD
* tiny fix
* small change
* remove a state var
* nit
* clean comments
* fmt
2023-06-23 17:40:21 -07:00
Evan Tschannen
ef682d304e
fix IKeyValueStore include
2023-06-16 13:28:40 -07:00
Josh Slocum
10c16dec41
fix retransmits in corruption check (#10491)
2023-06-14 10:41:06 -05:00
Evan Tschannen
359e178dcd
Merge branch 'main' into feature-durable-change-feed
# Conflicts:
# fdbclient/ClientKnobs.cpp
# fdbserver/BlobManager.actor.cpp
# fdbserver/worker.actor.cpp
2023-06-11 13:58:35 -07:00