Commit Graph

7458 Commits

Author SHA1 Message Date
Sreenath Bodagala fe2a8b10b2 - Address PR review comments (includes a special key related test) 2023-10-06 16:19:56 +00:00
He Liu 1e249143a4
Allow large shard (#10961)
* Added large shard.

* getMaxShardSize() returns the fixed max shard size.

* Resolved comments.

* fmt.
2023-10-05 13:35:06 -07:00
Jingyu Zhou 12ec4628f3
Merge pull request #10960 from jzhou77/main
Fix strtol return type and error checking
2023-10-04 11:25:46 -07:00
Jingyu Zhou 7cb38e0bab Fix strtol return type and error checking
Added an overflow test as well.
2023-10-04 09:10:34 -07:00
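
The pitfall this commit names is a general one: `strtol` returns `long`, and it reports overflow through `errno` rather than through its return value. The following is a minimal sketch of the usual checks, not the code from the PR; `parseLong` is a hypothetical helper name chosen for illustration.

```cpp
#include <cerrno>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper (not from the FoundationDB source): parse a base-10
// integer, rejecting empty input, trailing characters, and out-of-range values.
std::optional<long> parseLong(const std::string& s) {
    errno = 0; // strtol only sets errno on failure, so clear it first
    char* end = nullptr;
    const long value = std::strtol(s.c_str(), &end, 10); // keep the result as long, not int
    if (end == s.c_str()) return std::nullopt; // no digits were consumed
    if (*end != '\0') return std::nullopt;     // trailing characters after the number
    if (errno == ERANGE) return std::nullopt;  // value did not fit in a long
    return value;
}
```

Storing the result in an `int` would silently truncate on platforms where `long` is wider, which is why the sketch keeps the `long` return type end to end.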
Jingyu Zhou 912707f4f2
Merge pull request #10950 from jzhou77/main
Fix an invalid memory access
2023-10-03 11:24:34 -07:00
Zhe Wang be90185dd6
Throttle wiggling data moves (#10953)
* throttle wiggling data moves

* address comments
2023-10-03 09:29:24 -07:00
Jingyu Zhou ffb015db36 Fix an invalid memory access
Found by Valgrind
2023-10-02 12:53:43 -07:00
neethuhaneesha 0ec9b4d96e
Reducing canCommit wait to avoid bigger spikes on pending compaction bytes (#10943) 2023-09-28 13:03:25 -07:00
Jingyu Zhou 35ab0b3e10
Merge pull request #10523 from xis19/actor
Store the call stack of ACTORs
2023-09-26 21:05:54 -07:00
Sreenath Bodagala 3c01b1befe - Add a special key in order to fetch a specific group of status json fields. 2023-09-25 16:23:19 +00:00
William Dowling e5c4eafa9f Fix wrong curly brace placement 2023-09-25 10:47:05 +02:00
William Dowling 0f752473be
Merge branch 'main' into radixtree-production 2023-09-25 09:52:20 +02:00
Jingyu Zhou f42dd41ae8
Merge pull request #10810 from sfc-gh-tclinkenbeard/main-fix-clear-cost-estimation
Fix quota throttler clear cost estimation
2023-09-20 20:48:40 -07:00
Jingyu Zhou e0fba062c0
Merge pull request #10910 from johscheuer/correct-proxy-handling-backup-agent-main
Correct proxy handling backup agent main
2023-09-20 20:41:21 -07:00
Zhe Wu 87083652a3
Merge pull request #10856 from halfprice/zhewu/wiggle-locality-list
Make `perpetual_storage_wiggle_locality` database option to take a list of localities
2023-09-20 15:42:25 -07:00
Johannes M. Scheuermann 91c3c02673 Correct format 2023-09-20 17:52:59 +02:00
Johannes M. Scheuermann 1312e347be Correct the way how the backup agent makes use of the proxy and add proxy to trace events 2023-09-20 17:52:59 +02:00
Xiaoge Su 91ec1fdf10 Provide actor call backtrace
See design/AcAC.md
2023-09-19 20:58:33 -07:00
Jingyu Zhou aa1d005cc5
Merge pull request #10907 from jzhou77/main
ClusterController watches changes to storage metadata
2023-09-19 12:21:35 -07:00
Jingyu Zhou 307491d68e Use getRange for server metadatas
To reduce read load on SSes that serve the reads.
2023-09-18 13:32:53 -07:00
Zhe Wu 8706098916
Merge pull request #10904 from halfprice/zhewu/txn-window
Create MAX_WRITE_TRANSACTION_LIFE_VERSIONS client knob
2023-09-15 15:51:13 -07:00
Jingyu Zhou 12fe500633 ClusterController watches changes to storage metadata
To retrieve storage metadata for every status json request is very expensive
for clusters with a large number of storage servers. So I change the logic so
that ClusterController actively monitors changes to storage metadata, and only
retrieves them when there is a change.
2023-09-15 14:19:04 -07:00
Zhe Wu aea57f6da4 Create MAX_WRITE_TRANSACTION_LIFE_VERSIONS client knob 2023-09-14 14:01:43 -07:00
Yao Xiao d90a5fa742
Disable read aware DD. (#10900) 2023-09-13 15:50:58 -07:00
Zhe Wu dcb0ccbd37 Adding more test; adding documentations 2023-09-13 14:37:02 -07:00
Zhe Wu af42816b0e Adding test for perpetual_storage_wiggle_locality to take a list of localities 2023-09-13 13:36:16 -07:00
Zhe Wu a86a2d752e Apply format 2023-09-13 13:36:16 -07:00
Zhe Wu 4abd2edcad Parse wiggle locality as a list 2023-09-13 13:36:16 -07:00
Zhe Wu d599c7bcbd Use regex to validate wiggle locality 2023-09-13 13:36:14 -07:00
Zhe Wang 2a40fb4135
fix audit state serializer (#10889) 2023-09-12 14:48:04 -07:00
Hao Fu 6d9c53f8c4 Add proxy to backup agent via global var
backup agent itself does not have proxy info.
This change adds the proxy via a global var.
2023-09-08 10:27:01 -07:00
Zhe Wu dc48f1b325
Merge pull request #10876 from halfprice/zhewu/make-sure-end-is-not-persist
Make sure that storage and tlog are always set to a valid type
2023-09-07 10:58:35 -07:00
Hui Liu b2ee1fb6c4
Tune blob restore default knob (#10871) 2023-09-07 09:36:51 -07:00
Zhe Wu 9e5488dd3d Make sure that storage and tlog are always set to a valid type 2023-09-06 14:58:42 -07:00
Hui Liu 4d2a7d507d
Add a new blob restore state to fix a race after data copy (#10854) 2023-09-05 14:04:35 -07:00
Zhe Wu e2f5c50a7b
Merge pull request #10828 from halfprice/zhewu/clear-wiggle-storage-engine
Add option to set perpetual_storage_wiggle_engine to none
2023-09-05 11:06:21 -07:00
Hui Liu 00d3062728
Initialize apply mutations map for restore to version (#10857) 2023-09-05 10:03:35 -07:00
sfc-gh-tclinkenbeard dccc0e6773 Merge remote-tracking branch 'origin/main' into main-fix-clear-cost-estimation 2023-09-04 13:30:46 -07:00
A.J. Beamon ead6c37e4a
Merge pull request #10662 from sfc-gh-tclinkenbeard/main-fix-grv-queue-leak
Fix GRV queue leak
2023-09-01 09:45:49 -07:00
Zhe Wu 83992d61ec Add a knob to guard the gray failure detection during TLog recovery 2023-08-29 14:49:39 -07:00
Zhe Wu 314d1b66a5 Fix StatusWorkload after adding perpetual_storage_wiggle_engine 2023-08-29 13:58:23 -07:00
Yi Wu 8d7f2e84ed
Merge pull request #10831 from sfc-gh-yiwu/ear_timeout
EaR: Handle KMS timeout in storage server and commit proxy
2023-08-28 20:59:22 -07:00
Zhe Wang 432c077b51
fix dd issue when dd skip audit (#10844) 2023-08-28 16:39:45 -07:00
Yi Wu 3287098b4a EaR: Handle KMS timeout in storage server and commit proxy 2023-08-28 16:17:43 -07:00
Zhe Wang f43b20e15c
Audit location metadata in DD (#10820)
* Audit location metadata in DD

* nits
2023-08-25 17:11:11 -07:00
Josh Slocum 3bdcbef465
enabling median assignment limiting (#10805) 2023-08-25 17:52:54 -05:00
Yao Xiao b20dcf23a9
Support periodic compaction for sharded rocksdb. (#10815) 2023-08-25 15:38:01 -07:00
Zhe Wu 6610a228c7 Add option to set perpetual_storage_wiggle_engine to none 2023-08-24 13:43:58 -07:00
Zhe Wang 7e8f326277
Audit storage for specific engine (#10781)
* audit storage for specific engine

* fix getStorageType

* fix budget of skipAuditOnRange

* fix budget in scheduleAuditOnRange

* fix CI error

* improve trace events

* address comments
2023-08-23 10:51:24 -07:00
Jingyu Zhou 2a616b3866
Merge pull request #10763 from w41ter/fix_8882
Fix guess region from s3 URL
2023-08-22 22:24:21 -07:00
sfc-gh-tclinkenbeard 57eff6c5aa Track cost of point clears 2023-08-22 15:43:13 -07:00
Yao Xiao c63ee571e5
Export file metrics and add knob for file size multiplier. (#10785) 2023-08-16 11:27:33 -07:00
Hui Liu aea6fa5ca6
Set BLOB_RESTORE_SKIP_EMPTY_RANGES default value to false (#10784) 2023-08-16 10:02:06 -07:00
Ankita Kejriwal 7e424c7386
Stagger storage quota estimation requests and observability improvements (#10759)
* Rename and simplify fetch time variables

* Add RefreshTime detail to TenantCacheGetStorageUsageRefreshSlow trace

* Stagger storage estimation requests

* Update the value of a knob in simulation to reduce flakiness

* Improve names of TenantCache and StorageQuota related traces. Add slow refresh time.

* Convert potentially spammy TenantCache traces to SevDebug
2023-08-15 13:09:24 -07:00
Zhe Wang f1c17b27fc
Multiple improvements to AuditStorages (#10685)
* remove danger DDAudit assert, add AuditRate knob, add progress check when ssshard complete, add progress check for ssshard in fdbcli

* throttle progress check for ssshard

* fix getAuditProgressByServer

* fix trace event for ss audit

* using name -- checkMoveKeysLockForAudit

* new scheduleAuditLocationMetadata

* address comments

* shorten progress summary for ssshard

* simplify getAuditProgressByServer in fdbcli
2023-08-14 13:13:49 -07:00
Evan Tschannen 3209dc7b30
Fixed multiple bugs related to locality based exclusions (#10623)
* fix: Non-storage processes were not being checked for locality exclusions
fix: Data distribution when not detect a newly added process was locality excluded
fix: RemoveServerSafely did not wait for processes to be excluded before killing them when excluding localities

* fix: do not allow locality based excludes if they cannot exclude the required addresses
2023-08-11 15:17:02 -07:00
Zhe Wang e7528aca09
Revert inappropriate setting of ApiCorrectness test (#10771)
* add max_manifest_file_size to rocksdb

* revert ApiCorrectness config change
2023-08-11 12:41:55 -07:00
Zhe Wang 5868173a3e nits 2023-08-10 16:51:28 -05:00
Zhe Wang 5598f4f28f trace shardedrocks manifest size 2023-08-10 16:51:28 -05:00
Zhe Wu ab4ae712e8 Add PerpetualWiggleStorageMigrationWorkload documentation. 2023-08-10 09:35:57 -07:00
Zhe Wu 863038a44c Add improvement for initializing storage server using new perpetual_wiggle_storage_engine config 2023-08-10 09:35:57 -07:00
w41ter 5e6d15a0de Fix guess region from s3 URL 2023-08-10 09:09:51 +00:00
Jingyu Zhou 4a925b18e7
Merge pull request #10755 from w41ter/main
Avoid to send request to empty resource
2023-08-09 14:07:23 -07:00
He Liu df848005f8
Allow applyUpdates in multiple batches. (#10583) 2023-08-09 14:04:36 -07:00
w41ter 47a054e124 Avoid to send request to empty resource 2023-08-09 11:21:48 +00:00
Ankita Kejriwal 07e4516667
Reduce the frequency of polling tenants' storage usage (#10678) 2023-07-31 15:03:37 -07:00
Zhe Wang d0742c79ac
Improving visibility to debug sharded rocksdb (#10694)
* logging storage commit stats

* add rocks flush and compaction listener

* remove used field in FlushStats and fix CI error

* reduce LOGGING_ROCKSDB_BG_WORK_PROBABILITY

* merge rocks event listeners

* avoid using mutex/spinloop in rocksdb event listener

* code clean

* fix OnCompactionBegin and OnFlushBegin

* add logReason to RecentRocksDBBackgroundWorkStats

* add error listener back
2023-07-31 14:45:26 -07:00
Xiaoxi Wang 53ace31765 fix read ops sample name 2023-07-26 15:40:58 -07:00
Hao Fu a5f4d53c45
Remove SS entries from RateKeeper once it is down (#10627)
* Remove SS entries from RateKeeper once it is down

Before the change, certain data structures in RateKeeper would
not delete data associated with a deleted/cancelled SS, thus
it causes significant unnecessary CPU usage, results in degrades
of GRV proxy in performance.  This change fixes it.
2023-07-24 13:47:23 -07:00
sfc-gh-tclinkenbeard 345a5f2838 Buggify START_TRANSACTION_MAX_QUEUE_SIZE 2023-07-22 01:33:49 -07:00
Zhe Wang 3426fc3c1a
Add DD Security Mode (#10646)
* dd-security-mode

* address comments

* cleanup

* revise tr option set in loadAndUpdateAuditMetadataWithNewDDId

* address comments

* reset auditStorageInitStarted before DD init

* decouple audit resume and audit launch

* audit launch new request should wait for resuming existing requests

* address comment/clean up/fix

* fix

* fix initAuditMetadata retry

* fix initAuditMetadata retry should reset tr
2023-07-21 17:06:25 -07:00
Jingyu Zhou 8785840e52
Merge pull request #10638 from jzhou77/main
Fix a bug of hash value for backup log keys in getLogKey()
2023-07-21 16:50:20 -07:00
Jingyu Zhou 26906559e8
Merge pull request #10652 from yao-xiao-github/joshua
Set some rocksdb knobs to reduce simulation runtime.
2023-07-21 08:32:25 -07:00
Jingyu Zhou 24968674ce
Merge pull request #10643 from yao-xiao-github/write-buffer
Add knob.
2023-07-21 08:31:14 -07:00
Jingyu Zhou ae6ef0ce90 Add different block size for tests 2023-07-20 21:28:58 -07:00
Yao Xiao a4f04d2ffc 2 2023-07-20 15:08:33 -07:00
Hui Liu 7c8c24bc8d
blob restore : Log and skip data copy if we miss data for a certain tenant (#10621) 2023-07-19 09:52:30 -07:00
Zhe Wang 63d387eb0b
Add complete check for location metadata by audit storage (#10636)
* cleanup traceevent

* add complete check

* fix and cleanup

* nit

* code cleanup

* code cleanup

* increase audit retry count

* revise comments and no code changes
2023-07-19 09:40:58 -07:00
Yao Xiao ff2f752bf9 Add knob. 2023-07-18 16:54:06 -07:00
Jingyu Zhou 7d6481f6f9 Add a test case for getLogKey()
With input that exposes the bug.
2023-07-17 20:33:41 -07:00
Yi Wu e8d3e926b5 Merge REST_KMS_RESTCLIENT knobs with RESTCLIENT knobs 2023-07-17 20:06:02 -07:00
Yi Wu 333a08269a address comments 2023-07-17 20:06:02 -07:00
Yi Wu 6dd9d80b6a reduce KMS request timeout 2023-07-17 20:06:02 -07:00
Yi Wu e095309d51 add kms stable state to status json 2023-07-17 20:06:02 -07:00
Trevor Clinkenbeard ccc286b2ea
Merge pull request #10633 from sfc-gh-tclinkenbeard/main-simplify-limiting-tps
Simplify `GlobalTagThrottler` limiting TPS calculation
2023-07-17 16:46:37 -07:00
Jingyu Zhou 944e45e630 Fix a bug of hash value for backup log keys
Need to be consistent with getLogRanges().
2023-07-17 14:31:16 -07:00
Nim Wijetunga 01acd8d3c6
blob granule inplace encryption (#10619) 2023-07-17 10:44:11 -07:00
sfc-gh-tclinkenbeard 9c6b365267 Simplify limiting rate calculation for GlobalTagThrottler 2023-07-16 22:33:13 -07:00
Zhe Wang 522c9d4f0f
Add new implementation of audit storage for user data (#10613)
* remainingBudgetForAuditTasks should be managed within audit

* fix CI

* add audit storage test for various ranges

* clean DD

* new auditStorageUserDataQ

* fix assert fail in startTrackShardAssignment

* fix assert fail in ssaudit

* address comments

* replace assert with audit_cancel in ss audits

* add audit check progress tool

* add observability to audit progress and fix audit bugs

* fix audit progress issues and add sim test for audit progress and add trace event for the audit progress and add fdbcli to track the audit progress

* remove old audit storage on SS

* check audit progress when auditCore completes
2023-07-16 09:56:26 -07:00
Nim Wijetunga 7f2260bbd2
Add Encryption Related Latency Metrics (#10596)
* add ss and cp latency metrics

* make changes
2023-07-14 11:30:16 -07:00
Yanqin Jin 9c51fa082e
Merge pull request #10592 from sfc-gh-yajin/validate-no-data-outside-tenants
If tenant mode is REQUIRED, then we should verify that in the normal key space, no data exists outside tenants' prefixes. This applies to data clusters (also known as partition clusters) in a metacluster and standalone clusters with tenants.
For the management cluster of a metacluster, we should verify that no data exists outside the prefix ranges specified by tenant/ and metacluster/ in the normal key space.

Test plan:
devRunCorrectnessFiltered +Metacluster* +Tenant* --max-runs 100000

20230702-052847-yajin-082705d269588494. 0 Failure

devRunCorrectness --max-runs 100000

20230702-134219-yajin-e9cce7bd165e70a9. 1 Failure, unrelated to this change
2023-07-12 09:05:39 -07:00
Trevor Clinkenbeard c54e1101c3
Merge pull request #10586 from sfc-gh-tclinkenbeard/main-gtt-forget-old-ss
Add `ExpectStableThroughput` simulation test
2023-07-10 14:16:05 -07:00
A.J. Beamon c24709d630
Merge pull request #10552 from sfc-gh-ajbeamon/automatic-idempotency-fix
Automatic idempotency test flagged improper cleanup
2023-07-10 10:17:33 -07:00
Josh Slocum 8ec439f023
fixing apparent source of nondeterminism in bgfiles (#10605) 2023-07-10 12:59:30 -04:00
sfc-gh-tclinkenbeard 2673a727ac Merge remote-tracking branch 'origin/main' into main-gtt-forget-old-ss 2023-07-09 12:57:23 -07:00
William Dowling 8625eb68b0 Allow both memory-radixtree and memory-radixtree-beta modes 2023-07-06 14:51:23 +02:00
Yanqin Jin 2c8b682310 Fix build issues 2023-07-05 15:49:29 -07:00
Jon Fu d9e61d3d98
Fix bug in FDB MultiVersionTransaction.actor.cpp (#10576)
* throw error in tenantUpdater if result is an error

* avoid throwing error in thread future lambda
2023-07-05 15:23:38 -04:00
Yao Xiao ab72951034
Add knob for manifest file size and log rocksdb status. (#10567) 2023-07-05 10:40:43 -07:00
Yanqin Jin 13fac35f53 SNOW-791059 Verify no data outside tenants in REQUIRED mode (#489)
If tenant mode is REQUIRED, then we should verify that in the normal key space, no data exists outside
tenants' prefixes. This applies to data clusters (also known as partition clusters) in a metacluster and standalone clusters
with tenants.
For the management cluster of a metacluster, we should verify that no data exists outside the prefix ranges specified by `tenant/` and `metacluster/` in the normal key space.

Test plan:
devRunCorrectnessFiltered +Metacluster* +Tenant* --max-runs 100000

20230702-052847-yajin-082705d269588494. 0 Failure
devRunCorrectness --max-runs 100000

20230702-134219-yajin-e9cce7bd165e70a9. 1 Failure, unrelated to this change
2023-07-05 10:33:49 -07:00
William Dowling 3ea1ba1648 Remove beta status from RadixTree storage engine 2023-07-05 17:54:54 +02:00
Ata E Husain Bohra 7779c908b3
EaR: Remove usage of ENABLE_CONFIGURABLE_ENCRYPTION knob (#10570)
Description

Given Configurable encryption has been checked in and being tested via
simulation for more than a month and also to avoid penalty of accessing
KNOBS in inline commit path, patch retires the KNOB and make
ConfigurationEncryption default EaR mode for FDB.

BlobCipher still supports the old format header and encryption semantics,
will remove the dead code as a followup PR.

Testing

devRunCorrectness - 100K
2023-06-30 17:48:09 -07:00
A.J. Beamon 76b73de506 Disable automatic idempotency most of the time, and when it is disabled check that cleaning is working 2023-06-30 16:09:11 -07:00
sfc-gh-tclinkenbeard b7bd6fc602 Cleanup stale throughput statistics 2023-06-30 13:04:56 -07:00
sfc-gh-tclinkenbeard 4f711b15cd Fix comments for proxy knobs 2023-06-28 01:27:35 -07:00
sfc-gh-tclinkenbeard 86fae8b7b5 Increase default value of TAG_THROTTLE_MAX_EMPTY_QUEUE_BUDGET 2023-06-28 01:27:28 -07:00
sfc-gh-tclinkenbeard 6e86f94cb9 Decouple token bucket knobs for different types of throttlers 2023-06-28 01:27:19 -07:00
Jingyu Zhou 178cddbffd
Merge pull request #10563 from sfc-gh-etschannen/fix-durable-change-feeds
cancel durable change feed actors in DatabaseContext destructor
2023-06-27 13:39:20 -07:00
Hui Liu 788f3420c8
Fix tenant map update race when applying mutations (#10557) 2023-06-27 10:31:58 -07:00
Evan Tschannen b247f565b7 cancel durable change feed actors in DatabaseContext destructor 2023-06-27 09:22:47 -07:00
Zhe Wang 37689af3f2
Detect inconsistency of KeyServers and ServerKeys in real time (#10484)
* add framework

* add audit logic

* refactor audit loc metadata

* address comments

* add realtime audit timeout, add post validation logic

* fix input empty range to compareKeyServersAndServerKeys

* add context for auditKeyServersAndServerKeysInRealTime

* focus on moveShard

* remove space

* address comments

* cleanup

* add audit cleanup

* make validateRangeAssignment simple

* change trace name

* add shardAssigned

* stop DD when inconsistency detected

* fix ci

* small fix

* revert ss and auditUtl and simplify rt audit

* cleanup ss

* tiny change

* address comments and refactor code

* make auditLocationMetadataPreCheck retriable

* handle actor cancel in auditLocationMetadataPreCheck

* rm timeout and add new protection for failure of audit

* fix bugs

* import dataMoveId to validation

* improve trace event

* carefully propagate error and stop DD

* tiny fix

* small change

* remove a state var

* nit

* clean comments

* fmt
2023-06-23 17:40:21 -07:00
A.J. Beamon 1af1346cde Change a buggify and decrease the number of transactions per second so that the dr upgrade test runs faster 2023-06-23 09:03:34 -07:00
A.J. Beamon b5497bdc3e
Merge pull request #10529 from sfc-gh-ajbeamon/test-metacluster-use-after-restore
Test use of the metacluster after a restore
2023-06-21 18:50:18 -07:00
Ata E Husain Bohra 370e39fbdb
Update SimKmsVault unit test assert checking max encryption keys (#10535)
Description

SimKmsVault unit test when run as part of simulation Random test,
based on the test order, SimKmsVaultKeyCtx can be initialized as
part of some other test (FlowSingleton).
Update the test to handle the scenario.

Testing

devRunCorrectness - 100K
2023-06-21 17:17:21 -07:00
A.J. Beamon a2302e45ec
Merge pull request #10522 from sfc-gh-ajbeamon/fix-mapped-range-assert
Fix check in getExactRange that determines whether we can return early
2023-06-21 10:24:52 -07:00
A.J. Beamon cc68320333 Add testing that a metacluster can be used after a restore. Fix some bugs found by this related to the restore ID and tenant tombstones. 2023-06-21 08:59:22 -07:00
Zhe Wu aac6d95182
Merge pull request #10525 from halfprice/zhewu/turn-on-batch-empty-commit-by-default
Turn on batching empty peek by default
2023-06-21 07:57:17 -07:00
Zhe Wu f3e95f6698
Merge pull request #10521 from halfprice/zhewu/gray-failure-recovery
Worker health monitor tracks recovered peers
2023-06-21 07:56:55 -07:00
Zhe Wu 514561e167 Turn on batching empty peek by default 2023-06-20 20:24:20 -07:00
Zhe Wu 8eb526684a
Merge pull request #10437 from halfprice/zhewu/one-shard-per-physical-shard
Add an option to limit the number of shard per team
2023-06-20 19:43:31 -07:00
A.J. Beamon 75ec56bffb When redoing a key location request, wait until after we've checked whether we've satisfied our min rows 2023-06-20 16:02:12 -07:00
Zhe Wu ef638cd6a6 Worker health monitor keeps track of recovered peer 2023-06-20 12:32:29 -07:00
Evan Tschannen 902d87a72c
Merge pull request #10515 from sfc-gh-etschannen/fix-ikeyvaluestore
fix IKeyValueStore include
2023-06-20 06:28:23 -07:00
Hui Liu af20493ad0
Move lastFlushTs to BlobGranuleBackupConfig (#10505) 2023-06-16 16:12:10 -07:00
Jefferson Zhong 13853c9f89 Move stepSize knob from ClientKnobs to ServerKnobs 2023-06-16 14:48:11 -07:00
Evan Tschannen 60cbc7ec8d fix formatting 2023-06-16 13:43:25 -07:00
Evan Tschannen ef682d304e fix IKeyValueStore include 2023-06-16 13:28:40 -07:00
Jay Zhuang 5130fe99d3
Fix RandomStringSetGenerator may not generate enough unique key (#10503)
For small key set, RandomStringSetGenerator may not try enough random
key to generate unique key set.
Increase the retry for smaller key set number.
2023-06-15 17:23:13 -07:00
Jingyu Zhou b23dbd7105
Merge pull request #10488 from sfc-gh-huliu/describebackup
Fix computeRestoreEndVersion bug when outLogs is null
2023-06-14 21:36:02 -07:00
Yao Xiao 7d1c6e7f17
Improve sharded rocksdb init time. (#10475) 2023-06-15 09:37:32 +08:00
Jay Zhuang ea52e90f03
Remove unnecessary padding for encrypt/decrypt (#9942)
* No padding is needed for AES CTR mode

https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Padding

* Remove EVP_CIPHER_CTX_reset() so the encryptor could be reused

It's only reused in the unittest. In prod, new encryptor is created for
every new text.
2023-06-14 13:37:29 -07:00
Hui Liu 606d8db75f
Remove blobGranuleLockKeys after blob granule restore (#10477) 2023-06-14 12:41:42 -07:00
Evan Tschannen 74614161b2
Merge pull request #10460 from sfc-gh-etschannen/feature-durable-change-feed
Cache change feeds durably on blob workers
2023-06-14 08:35:52 -07:00
Ata E Husain Bohra bfbf8cd053
EaR: Update KMS URL refresh policy and fix bugs (#10382)
* EaR: Update KMS URL refresh policy and fix bugs

Description

RESTKmsConnector implements discovery and refresh semantics i.e.
on bootstrap it discovers KMS Urls and periodically refresh the
URLs (handle server upgrade scenario). The current implementation
caches the URLs in a min-heap, as part of serving a request, actor
pops out elements from min-heap and attempts connecting to the server,
on failure, the URL is temporarily stored in a stack, at the end of
the request processing, the stack is merged back into the heap.
The code doesn't work as expected if multiple requests
consume the heap, causing the following issues:
1. Min-heap would retain old URLs replaced by latest refresh (stack merge)
2. URL discovery file is read more than expected as multiple requests can
empty heap, causing the code to read URLs from the file.

Patch proposes following policy to cache and maintain URLs priority:
1. Unresponsiveness penalty: KMS flaky connection or overload can cause
requests to timeout or fail; each such instance updates unresponsiveness
penalty of associated URL context. Further, the penalty is time bound and
deteriorates with time.
2. Cached URLs are sorted once a failure is encountered, priority followed
is:
2.1. Unresponsiveness penalty server(s) least preferred
2.2. Server(s) with high total-failures less preferred
2.3. Server(s) with high total-malformed response less preferred.
3. Updates RESTClient to throw 'retryable' error up to the client such as:
'connection_failed' and/or 'timeout'
4. Extend RESTUrl to support IPv6 format.

Testing

RESTUnit - 100K (new test added for coverage)
devRunCorrectness
2023-06-14 08:06:39 -07:00
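
The ordering rules 2.1 to 2.3 in the commit message above can be sketched as a single comparator. This is an illustration under stated assumptions, not the RESTKmsConnector code: `UrlCtx`, its field names, and `sortUrls` are hypothetical, and the time-bound penalty is simplified to an expiry timestamp rather than a gradually decaying value.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a cached KMS URL entry; all names are illustrative.
struct UrlCtx {
    std::string url;
    double penaltyUntil = 0.0;       // assumed: time at which the unresponsiveness penalty expires
    int totalFailures = 0;           // cumulative failed requests
    int totalMalformedResponses = 0; // cumulative malformed responses

    bool penalized(double now) const { return now < penaltyUntil; }
};

// Sort most-preferred first: unpenalized before penalized (2.1), then fewer
// total failures (2.2), then fewer malformed responses (2.3).
void sortUrls(std::vector<UrlCtx>& urls, double now) {
    std::stable_sort(urls.begin(), urls.end(), [now](const UrlCtx& a, const UrlCtx& b) {
        return std::make_tuple(a.penalized(now), a.totalFailures, a.totalMalformedResponses) <
               std::make_tuple(b.penalized(now), b.totalFailures, b.totalMalformedResponses);
    });
}
```

Keeping the URLs in one sorted vector, re-sorted only after a failure, avoids the pop-and-merge behavior of the heap-plus-stack scheme that the commit message describes as problematic under concurrent requests.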
Hui Liu f84cedd361 Fix computeRestoreEndVersion bug when outLogs is null 2023-06-13 17:03:57 -07:00
Evan Tschannen eb772c0043 added a blob worker specific page cache size for redwood so that it does not have to be changed manually in fdb.conf for all blob worker processes 2023-06-13 10:35:13 -07:00
Hui Liu 630013cfd9
Fix MoveKeysClean.toml failure (#10470) 2023-06-13 08:45:03 -07:00
Josh Slocum 31e4610b56
misc operational and documentation improvements (#10465)
* misc operational and documentation improvements

* fixing doc build
2023-06-12 15:14:01 -05:00
Evan Tschannen 88eed268c3 added a knob for how many bytes are read from disk 2023-06-11 16:10:20 -07:00
Evan Tschannen b09d6d44eb disable change feed cache by default 2023-06-11 15:24:37 -07:00
Evan Tschannen a8ceadd917 actor cancellation still needs to unset storage 2023-06-11 14:55:05 -07:00
Evan Tschannen 359e178dcd Merge branch 'main' into feature-durable-change-feed
# Conflicts:
#	fdbclient/ClientKnobs.cpp
#	fdbserver/BlobManager.actor.cpp
#	fdbserver/worker.actor.cpp
2023-06-11 13:58:35 -07:00
Evan Tschannen f69f4c73ad addressed review comments 2023-06-11 13:54:38 -07:00
Evan Tschannen 7322e21e23 fixed compiler error 2023-06-11 09:25:05 -07:00
Evan Tschannen 334a868dfe fix: respect end when reading from disk; update the starting version when leaving a hole on disk 2023-06-11 09:24:09 -07:00
Evan Tschannen d03f08f914 fix: not all mutations were being made durable 2023-06-10 18:36:02 -07:00
Evan Tschannen be8d8a8f72 fix: popping the cache was removing too many versions 2023-06-09 16:20:48 -07:00
Evan Tschannen 33a7f57da5 fix: clear the cache when popping change feeds; do not insert versions into the cache that are already durable 2023-06-09 13:49:33 -07:00
Yi Wu 7048ad21a8
EaR: reduce metrics logging (#10453)
* EaR: reduce metrics logging

BlobCipherMetrics used to break down by usage types (whether it is for tlog, redwood, backup, etc), and these counters will be printed to trace log even when encryption is not enabled, or the specific usage is not happening on a node (e.g. a node with only stateless roles will also print blob cipher counters for redwood). We are reducing the BlobCipherMetrics logging by:
1. Default to not break down the metrics by usage type, and the behavior is controlled by the knob `ENCRYPT_KEY_CACHE_ENABLE_DETAIL_LOGGING`
2. When the detail breakdown is enabled, the counters are lazily initialized
3. Even if the counters are initialized, they will not be logged if the count is 0 (so like if a node was recruited as tlog but then drops the tlog role later on, the tlog counter inside BlobCipherMetrics will not be logged anymore).

* buggify BlobCipherMetrics detail logging knob

* format
2023-06-09 12:07:49 -07:00
neethuhaneesha a71de03cb9
Rocksdb max auto readahead size knob change to 64k (#10449) 2023-06-09 11:26:06 -07:00
Ata E Husain Bohra 31aa06cfbc
EaR: Add test case to validate decryption with invalid key (#10394)
* EaR: Add test case to validate decryption with invalid key

Description

Extend BlobCipher unit test to provide coverage for the scenario
where buffer got encrypted with a EncryptionKey K, however,
decryption for some reason got attempted with K'.

Testing

EncryptionUnit.toml - 100K

* EaR: Add test case to validate decryption with invalid key

Description

Address review comments

Testing
2023-06-08 22:32:15 -07:00
Jingyu Zhou f210cf708d
Merge pull request #10436 from jzhou77/main
Add getlocation and getall fdbcli debug commands
2023-06-08 22:05:03 -07:00
Zhe Wang d85f6e95c4
init (#10458) 2023-06-08 18:30:32 -07:00
Josh Slocum b209cd5d19
Consistency scan polish (#10445)
* added operational metrics and some polish

* moving consistency scan enablement in simulation tests to main tester workflow

* more stats and throttling polish
2023-06-08 14:18:58 -05:00
Zhe Wang 8119e3da87
Fix audit storage actor cancel issue (#10443)
* init

* add testAuditStorageConcurrentRunForSameType test
2023-06-08 09:53:32 -07:00
Hui Liu ef93caf344
BlobGranuleRestore - skip applying mutations if restore target version is less than begin version (#10442) 2023-06-08 09:19:25 -07:00
Zhe Wu 77f2caf030 Clean up documentation for PreferWithinShardCount 2023-06-07 22:20:04 -07:00
Lukas Joswiak c3d518409c Fix a bug where fulfilling a promise could cause it to get deleted
Make a local copy of the promise before calling `send` in case the
promise gets destroyed as a result of fulfilling it.

This issue was previously fixed for sending errors to the `result`
promise, but it was never fixed when fulfilling the promise. The issue
manifested as an invalid generation returned when running a `set`
against the configuration database immediately followed by a `get` with
a new transaction object.
2023-06-07 15:26:08 -07:00
Jingyu Zhou b8c0087ca6 Fix compiling errors 2023-06-07 15:10:00 -07:00
Zhe Wu 6b17f9fcf3 Adding PreferWithinShardLimit option 2023-06-07 14:38:58 -07:00
Jingyu Zhou 614686f737 Add getlocation and getall fdbcli debug commands
getlocation: returns the SS list for a key
getall: returns both the SS list and values on the SS for a key
2023-06-07 14:36:16 -07:00
sfc-gh-tclinkenbeard 6b28c53211 Merge remote-tracking branch 'origin/main' into main-fix-op-cost-bug 2023-06-07 09:58:29 -07:00
Evan Tschannen 197c39b552 cache change feeds using a storage engine to avoid reading them for the server on startup 2023-06-07 08:41:31 -07:00
Andrew Noyes bdec6bf5b9
Remove some unnecessary ref-counting in the PTree (#10401)
* Return const references in PTree accessors

Many usages do not require copying the reference (and incurring the
ref-counting overhead)

* Remove unnecessary refcounting for rotating ptree
2023-06-07 09:49:48 -05:00
Josh Slocum 220b7d1a37
Consistency scan test improvements (#10402)
* adding consistency scan clear stats and testing in simulation

* Adding test that intentionally injects corruption in consistency scan requests and ensures the scan finds it

* cleanup

* adding assert false to disabled code
2023-06-07 07:21:47 -05:00
Zhe Wang f8f8f72c4e
Add audit storage cancellation (#10386)
* list audits

* cancel audits and corresponding tests

* make audit storage dblock aware

* increase audit retry since we are able to cancel

* fix updateAuditState and fdb github ci

* fmt

* fix fdbcli audit_storage and fix CI issue

* fix fdb cli

* address comments

* fmt
2023-06-06 14:29:53 -07:00
He Liu fc8543125c
Added location_metadata fdbcli to query shard locations, assignments… (#10395)
* Added location_metadata fdbcli to query shard locations, assignments, numbers etc.

* Added `listshards` to get some random physical/non-physical shards.

* Resolved comments.
2023-06-06 10:33:48 -07:00
Hui Liu 8fcac8a9a9
Dump manifest by using multiple transactions (#10380) 2023-06-05 11:22:29 -07:00
Konrad `ktoso` Malawski c26aa0b2a3
Introduce initial Swift support in fdbserver (#10156)
* [fdbserver] workaround the FRT type layout issue to get Swift getVersion working

* MasterData.actor.h: fix comment typo

* masterserver.swift: some tweaks

* masterserver.swift: remove getVersion function, use the method

* masterserver.swift: print replied version to output for tracing

* [swift] add radar links for C++ interop issues found in getVersion bringup

* Update fdbserver.actor.cpp

* Migrate MasterData closer to full reference type

This removes the workaround for the FRT type layout issue, and gets us closer to making MasterData a full reference type

* [interop] require a new toolchain (>= Oct 19th) to build

* [Swift] fix computation of toAdd for getVersion Swift implementation

* add Swift to FDBClient and add async `atLeast` to NotifiedVersion

* fix

* use new atLeast API in master server

* =build fixup link dependencies in swift fdbclient

* clocks

* +clock implement Clock using Flow's notion of time

* [interop] workaround the immortal retain/release issue

* [swift] add script to get latest centos toolchain

* always install swift hooks; not only in "test" mode

* simulator - first thing running WIP

* cleanups

* more cleanup

* working snapshot

* remove sim debug printlns

* added convenience for whenAtLeast

* try Alex's workaround

* annotate nonnull

* cleanup clock a little bit

* fix missing impls after rebase

* Undo the swift_lookup_Map_UID_CommitProxyVersionReplies workaround

No longer needed - the issue was retain/release

* [flow][swift] add Swift version of BUGGIFY

* [swiftication] add CounterValue type to provide value semantics for Counter types on the Swift side

* remove extraneous requestingProxyUID local

* masterserver: initial Swift state prototype

* [interop] make the Swiftied getVersion work

* masterserver - remove the C++ implementation (it can't be supported as state is now missing)

* Remove unnecessary SWIFT_CXX_REF_IMMORTAL annotations from Flow types

* Remove C++ implementation of CommitProxyVersionReplies - it's in Swift now

* [swift interop] remove more SWIFT_CXX_REF_IMMORTAL

* [swift interop] add SWIFT_CXX_IMMORTAL_SINGLETON_TYPE annotation for semantically meaningful immortal uses

* rename SWIFT_CXX_REF_IMMORTAL -> UNSAFE_SWIFT_CXX_IMMORTAL_REF

* Move master server waitForPrev to swift

* =build fix linking swift in all modules

* =build single link option

* =cmake avoid manual math, just get "last" element from list

* implement Streams support (#18)

* [interop] update to new toolchain #6

* [interop] remove C++ vtable linking workarounds

* [interop] make MasterData proper reference counted SWIFT_CXX_REF_MASTERDATA

* [interop] use Swift array to pass UIDs to registerLastCommitProxyVersionReplies

* [interop] expose MasterServer actor to C++ without wrapper struct

* [interop] we no longer need expose on methods 🥳

* [interop] initial prototype of storing CheckedContinuation on the C++ side

* Example of invoking a synchronous swift function from a C++ unit test. (#21)

* move all "tests" we have in Swift, and priority support into real modules (#24)

* Make set continuation functions inline

* Split flow_swift into flow_swift and flow_swift_future to break circular dependency

* rename SwiftContinuationCallbackStruct to FlowCallbackForSwiftContinuation

* Future interop: use a method in a class template for continuation set call

* Revert "Merge pull request #22 from FoundationDB/cpp-continuation" (#30)

* Basic Swift Guide (#29)

Co-authored-by: Alex Lorenz <arphaman@gmail.com>

* Revert "Revert "Merge pull request #22 from FoundationDB/cpp-continuation" (#30)"

This reverts commit c025fe6258.

* Restore the C++ continuation, but it seems waitValue is broken for CInt somehow now

* disable broken tests - waitValue not accessible

* Streams can be async iterated over (#27)

Co-authored-by: Alex Lorenz <arphaman@gmail.com>

* remove work in progress things (#35)

* remove some not used (yet) code

* remove expose func for CInt, it's a primitive so we always have witness info (#37)

* +masterdata implement provideVersions in Swift (#36)

* serveLiveCommittedVersion in Swift (#38)

* Port updateLiveCommittedVersion to swift (#33)

Co-authored-by: Konrad `ktoso` Malawski <konrad_malawski@apple.com>

* Implement updateRecoveryData in Swift (#39)

Co-authored-by: Alex Lorenz <arphaman@gmail.com>

* Simplify flow_swift to avoid multiple targets and generate separate CheckedContinuation header

* Uncomment test which was blocked on extensions not being picked up (#31)

* [interop] Use a separate target for Swift-to-C++ header generation

* reduce boilerplate in future and stream support (#41)

* [interop] require interop v8 - that will fix linker issue (https://github.com/apple/swift/issues/62448)

* [interop] fix swift_stream_support.h Swift include

* [interop] bump up requirement to version 9

* [interop] Generalize the Flow.Optional -> Swift.Optional conversion using generics

* [WIP] masterServer func in Swift (#45)

* [interop] Try conforms_to with a SWIFT_CONFORMS_TO macro for Optional conformance (#49)

* [interop] include FlowOptionalProtocol source file when generating Flow_CheckedContinuation.h

This header generation step depends on the import of the C++ Flow module, which requires the presence of FlowOptionalProtocol

* conform Future to FlowFutureOps

* some notes

* move to value() so we can use discardable result for Flow.Void

* make calling into Swift async funcs nicer by returning Flow Futures

* [interop] hide initial use of FlowCheckedContinuation in flow.h to break dependency cycle

* [fdbserver] fix an EncryptionOpsUtils.h modularization issue (showed up with modularized libc++)

* Pass GCC toolchain using CMAKE_Swift_COMPILE_EXTERNAL_TOOLCHAIN to Swift's clang importer

* [interop] drop the no longer needed libstdc++ include directories

* [cmake] add a configuration check to ensure Swift can import C++ standard library

* [swift] include msgpack from msgpack_DIR

* [interop] make sure the FDB module maps have 'export' directive

* add import 'flow_swift' to swift_fdbserver_cxx_swift_value_conformance.swift

This is needed for CONFORMS_TO to work in imported modules

* make sure the Swift -> C++ manually bridged function signature matches generated signature

* [interop][workaround] force back use of @expose attribute before _Concurrency issue is fixed

* [interop] make getResolutionBalancer return a pointer to allow Swift to use it

We should revert back to a reference once compiler allows references again

* [interop] add a workaround for 'pop' being marked as unsafe in Swift

* masterserver.swift: MasterData returns the Swift actor pointer in an unsafe manner

* Add a 'getCopy' method to AsyncVar to make it more Swift friendly

* [interop] bump up the toolchain requirement

* Revert "[interop][workaround] force back use of @expose attribute before _Concurrency issue is fixed"

This reverts commit b01b271a76.

* [interop] add FIXME comments highlighting new issue workarounds

* [interop] adopt the new C++ interoperability compiler flag

* [interop] generate swift compile commands

* Do not deduplicate Swift compilation commands

* flow actorcompiler.h: add a SWIFT_ACTOR empty macro definition

This is needed to make the actor files parsable by clangd

* [cmake] add missing dependencies

* experimental cross compile

* [cmake] fix triple in cross-compiled cmake flags

* [interop] update to interop toolchain version 16

* [x-compile] add flags for cross-compiling boost

* cleanup x-compile cmake changes

* [cmake] fix typo in CMAKE_Swift_COMPILER_EXTERNAL_TOOLCHAIN config variable

* [interop] pass MasterDataActor from Swift to C++ and back to Swift

* [fdbserver] Swift->C++ header generation for FDBServer should use same module cache path

* Update swift_get_latest_toolchain.sh to fetch 5.9 toolchains

* set HAVE_FLAG_SEARCH_PATHS_FIRST for cross compilation

* Resolve conflicts in net2/sim2/actors, can't build yet

* undo SWIFT_ACTOR changes, not necessary for merge

* guard c++ compiler flags with is_cxx_compile

* Update flow/actorcompiler/ActorParser.cs

Co-authored-by: Evan Wilde <etceterawilde@gmail.com>

* update the boost dependency

* Include boost directory from the container for Swift

* conform flow's Optional to FlowOptionalProtocol again

* Guard entire RocksDBLogForwarder.h with SSD_ROCKSDB_EXPERIMENTAL to avoid failing on missing rocksdb APIs

* remove extraneous merge marker

* [swift] update swift_test_streams.swift to use vars in more places

* Add header guard to flow/include/flow/ThreadSafeQueue.h to fix modularization issue

* Update net and sim impls

* [cmake] use prebuilt libc++ boost only when we're actually using libc++

* [fdbserver] Swift->C++ header generation for FDBServer should use same module cache path

* fixups after merge

* remove CustomStringConvertible conformance that would not be used

* remove self-caused deprecation warnings in future_support

* handle newly added task priority

* reformatting

* future: make value() not mutating

* remove FIXME, not needed anymore

* future: clarify why as functions

* Support TraceEvent in Swift

* Enable TraceEvent using a class wrapper in Swift

* preparing WITH_SWIFT flag

* wip disabled failing Go stuff

* cleanup WITH_SWIFT_FLAG and reenable Go

* wip disabled failing Go stuff

* move setting flag before printing it

* Add SWIFT_IDE_SETUP and cleanup guides and build a bit

* Revert "Wipe packet buffers that held serialized WipedString (#10018)"

This reverts commit e2df6e3302.

* [Swift] Compile workaround in KeyBackedRangeMap; default init is incorrect

* [interop] do not add FlowFutureOps conformance when building flow clang module for Flow checked continuation header pre-generation

* make sure to show -DUSE_LIBCXX=OFF in readme

* readme updates

* do not print to stderr

* Update Swift and C++ code to build with latest Swift 5.9 toolchain now that we no longer support universal references and bridge the methods that take in a constant reference template parameter correctly

* Fix SERVER_KNOBS and enable using them for masterserver

* Bump to C++20, Swift is now able to handle it as well

* Put waitForPrev behind FLOW_WITH_SWIFT knob

* Forward declare updateLiveCommittedVersion

* Remove unused code

* fix wrong condition set for updateLiveCommittedVersion

* Revert "Revert "Wipe packet buffers that held serialized WipedString (#10018)""

This reverts commit 5ad8dce052.

* Enable go-bindings in cmake

* USE_SWIFT flag so we "build without swift" until ready to enable it by default

* uncomment a few tests which were disabled during USE_SWIFT enablement

* the option is WITH_SWIFT, not USE

* formatting

* Fix masterserver compile error

* Fix some build errors.

How did it not merge cleanly? :/

* remove initializer list from constructor

* Expect Swift toolchain only if WITH_SWIFT is enabled

* Don't require Flow_CheckedContinuation when Swift is disabled

* Don't compile FlowCheckedContinuation when WITH_SWIFT=OFF

* No-op Swift macros

* More compile guards

* fix typo

* Run clang-format

* Guard swift/bridging include in fdbrpc

* Remove printf to pass the test

* Remove some more printf to avoid potential issues

TODO: Need to be TraceEvents instead

* Remove __has_feature(nullability) as it's only used in Swift

* Don't use __FILENAME__

* Don't call generate_module_map outside WITH_SWIFT

* Add some more cmake stuff under WITH_SWIFT guard

* Some more guards

* Bring back TLSTest.cpp

* clang-format

* fix comment formatting

* Remove unused command line arg

* fix cmake formatting in some files

* Address some review comments

* fix clang-format error

---------

Co-authored-by: Alex Lorenz <arphaman@gmail.com>
Co-authored-by: Russell Sears <russell_sears@apple.com>
Co-authored-by: Evan Wilde <etceterawilde@gmail.com>
Co-authored-by: Alex Lorenz <aleksei_lorenz@apple.com>
Co-authored-by: Vishesh Yadav <vishesh_yadav@apple.com>
Co-authored-by: Vishesh Yadav <vishesh3y@gmail.com>
2023-06-02 16:09:28 -05:00
Nim Wijetunga 95bf14323f
EKP and KMS Health Check (#10341)
EKP and KMS Health Check
2023-06-01 16:24:04 -07:00
Trevor Clinkenbeard e6fb1e6a47
Merge pull request #10389 from sfc-gh-tclinkenbeard/main-holt-linear-smoother
Add `HoltLinearSmoother` class
2023-06-01 09:22:10 -07:00
sfc-gh-tclinkenbeard 3c6941192e Merge remote-tracking branch 'origin/main' into main-fix-op-cost-bug 2023-05-31 19:05:39 -07:00
sfc-gh-tclinkenbeard f647a1289e Split GLOBAL_TAG_THROTTLING_FOLDING_TIME into several knobs 2023-05-31 17:20:32 -07:00
Ata E Husain Bohra 4f21e0cfcd
EaR: Optimize logging from GetEncryptCipherKey (#10326)
Description

Optimize logging emitted from GetEncryptCipherKey module,
especially the one more useful for debugging and not very useful
in the production

Testing

SwizzledRollbackSideBand - randomSeed (276500218)
devRunCorrectness - 100k
2023-05-31 16:56:07 -07:00
Trevor Clinkenbeard 7cf901df43
Merge pull request #10354 from sfc-gh-tclinkenbeard/main-tag-counter-optimizations
Improve performance of `TransactionTagCounter`
2023-05-31 16:08:12 -07:00
Jingyu Zhou 36f2f9015e
Merge pull request #10357 from w41ter/main
Fix restore range loss
2023-05-31 15:43:25 -07:00
sfc-gh-tclinkenbeard b227c0ca27 Merge remote-tracking branch 'origin/main' into main-fix-op-cost-bug 2023-05-31 13:57:07 -07:00
He Liu aeb7f2bd4d Merge branch 'main' of https://github.com/apple/foundationdb into disable-physical-shard-move 2023-05-30 15:27:57 -07:00
Jingyu Zhou 43d67d6f98 Should repeat when speedUpSimulation is false 2023-05-30 11:08:48 -07:00
Jingyu Zhou 0674984ab1 Fix a simulation DR stuck issue
When buggify is enabled, it's possible the version map has 5 entries, which is
larger than BACKUP_MAP_KEY_LOWER_LIMIT, causing the range task to be delayed
infinitely: the BackupRangeTaskFunc::_execute() skips the execution and
schedules the task to be added back in BackupRangeTaskFunc::_finish().

Reproduction:
  Seed: -f ./tests/slow/SharedDefaultBackupCorrectness.toml -s 3202874095 -b on
        -f ./tests/slow/VersionStampBackupToDB.toml -s 1190111003 -b on
  Commit: 6e5773dd5 at release-7.3
  Build: clang
2023-05-30 09:50:24 -07:00
Zhe Wang 61aaca005e
SS Audit Storage Throttling (#10322)
* ss audit storage throttling

* add audit manager to ss

* reduce CONCURRENT_AUDIT_TASK_COUNT_MAX

* revises comments

* fix audit cli

* fix getAuditStates

* remove toStringForCLI
2023-05-29 14:43:47 -07:00
w41ter abd23958c2 Fix restore range loss 2023-05-29 11:39:07 +08:00
sfc-gh-tclinkenbeard 0cfbe4ccc1 Fix get*OperationCost functions for empty mutations/results 2023-05-28 13:17:24 -07:00
sfc-gh-tclinkenbeard a1b0a6b35e Merge remote-tracking branch 'origin/main' into main-tag-counter-optimizations 2023-05-27 13:04:11 -07:00
Jingyu Zhou d0e9a14b73
Merge pull request #10324 from liquid-helium/delete-data-move-checkpoints-by-id
Removed ENABLE_DD_PHYSICAL_SHARD_MOVE
2023-05-27 10:07:02 -07:00
sfc-gh-tclinkenbeard 926a7cbb4d Add AUTO_TAG_THROTTLE_SPRING_BYTES_STORAGE_SERVER knob 2023-05-26 16:10:37 -07:00
sfc-gh-tclinkenbeard 2385dd36f3 Update GLOBAL_TAG_THROTTLING_FOLDING_TIME default to 10.0 2023-05-26 16:10:37 -07:00
Zhe Wang 53675db306
Fix audit storage issue with multiple DDs (#10310)
* init

* add DDAuditContext

* move metadata update before runauditstorage

* revert DDAuditContext and replace ddAuditId with ddId

* cleanup
2023-05-26 15:56:03 -07:00
He Liu caed7ed374 Merge branch 'main' of https://github.com/apple/foundationdb into delete-data-move-checkpoints-by-id 2023-05-26 15:37:21 -07:00
He Liu 08de38120d Merge branch 'main' of https://github.com/apple/foundationdb into disable-physical-shard-move 2023-05-26 11:54:52 -07:00
Vaidas Gasiunas 60753b5b57
Fix a couple thread-safety issues (#10359)
* Make CodeProbeImpl::_hitCount atomic

* Structure access to TraceLog::logTraceEventMetrics so that it is written before a trace log is opened and only read from one thread after it is opened.

* Fix condition in assert

* Rename TraceLog::log to logMetrics and move initialization of trace log metrics into TraceLog::open
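
The first item above makes a hit counter atomic. A minimal sketch of that pattern, assuming a plain `std::atomic` counter, is shown here; `HitCounter`, `hit`, and `count` are hypothetical names and do not reproduce the actual `CodeProbeImpl` code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical counter that may be incremented from several threads.
struct HitCounter {
    // std::atomic makes each increment indivisible, so concurrent
    // hits are not lost the way they can be with a plain unsigned.
    std::atomic<unsigned> hitCount{ 0 };

    // Records one hit; relaxed ordering suffices when the value is
    // only used as a statistic and not to synchronize other data.
    void hit() { hitCount.fetch_add(1, std::memory_order_relaxed); }

    // Reads the current number of hits.
    unsigned count() const { return hitCount.load(std::memory_order_relaxed); }
};
```

Relaxed ordering is a design choice for a pure statistic; a counter that guards other shared state would need stronger ordering.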

---------

Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2023-05-26 19:36:02 +02:00
sfc-gh-tclinkenbeard 67c53eb203 Decrease default value for TAG_MEASUREMENT_INTERVAL to 5.0 2023-05-26 08:15:45 -07:00
sfc-gh-tclinkenbeard cf32ba4d8c Increase default value for SS_THROTTLE_TAGS_TRACKED to 5 2023-05-26 08:15:45 -07:00
sfc-gh-tclinkenbeard f741a584c0 Improve performance of TransactionTagCounter 2023-05-26 08:15:41 -07:00
sfc-gh-tclinkenbeard e724c90ffe Remove unnecessary GLOBAL_TAG_THROTTLING_MIN_TPS knob 2023-05-25 16:45:32 -07:00
sfc-gh-tclinkenbeard 71846070d6 Update default tag throttling knob values 2023-05-25 16:45:32 -07:00
He Liu d21d85e4b6 Merge branch 'main' of https://github.com/apple/foundationdb into disable-physical-shard-move 2023-05-25 12:25:44 -07:00
He Liu 1900b63acd Merge branch 'main' of https://github.com/apple/foundationdb into delete-data-move-checkpoints-by-id 2023-05-24 13:41:02 -07:00
Ankita Kejriwal 9373191e0a
Fix two bugs in checkExclusion() and add a trace event for better observability (#10330)
* Fix a division in checkExclusion() to be double and add a trace event

* Update the ssExcludedCount only if the role is storage
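
The division fix in the first item reflects a common C++ pitfall: dividing two integers truncates before the result is converted to `double`. A minimal sketch of the pitfall and the fix follows; the function names and parameters are hypothetical and are not taken from `checkExclusion()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper showing the bug: both operands are integers, so the
// division truncates toward zero before the result is converted to double.
inline double excludedFractionTruncated(int64_t excluded, int64_t total) {
    return excluded / total;
}

// Casting one operand first makes the whole division happen in double.
inline double excludedFraction(int64_t excluded, int64_t total) {
    return static_cast<double>(excluded) / total;
}
```

For 1 excluded out of 4, the truncating version yields 0.0 and the corrected version yields 0.25.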
2023-05-24 10:58:03 -07:00
Jingyu Zhou 1712691da5
Merge pull request #10328 from sfc-gh-jslocum/knob_allow_relative_path_blob_container
adding knob to allow relative paths for local backup containers
2023-05-24 10:30:02 -07:00