Commit Graph

26439 Commits

Author SHA1 Message Date
sfc-gh-tclinkenbeard 066f713242 Fix type of TagCost field in BusiestReadTag traces 2023-05-26 08:15:45 -07:00
sfc-gh-tclinkenbeard 5aca2c7beb Fix failing /fdbserver/TransactionTagCounter/IgnoreBelowMinRate unit test 2023-05-26 08:15:45 -07:00
sfc-gh-tclinkenbeard 67c53eb203 Decrease default value for TAG_MEASUREMENT_INTERVAL to 5.0 2023-05-26 08:15:45 -07:00
sfc-gh-tclinkenbeard cf32ba4d8c Increase default value for SS_THROTTLE_TAGS_TRACKED to 5 2023-05-26 08:15:45 -07:00
sfc-gh-tclinkenbeard f741a584c0 Improve performance of TransactionTagCounter 2023-05-26 08:15:41 -07:00
sfc-gh-tclinkenbeard e724c90ffe Remove unnecessary GLOBAL_TAG_THROTTLING_MIN_TPS knob 2023-05-25 16:45:32 -07:00
sfc-gh-tclinkenbeard 71846070d6 Update default tag throttling knob values 2023-05-25 16:45:32 -07:00
sfc-gh-tclinkenbeard c9350f66f2 Fix bug where updateLastLogged was called even for disabled trace events 2023-05-25 16:45:32 -07:00
sfc-gh-tclinkenbeard bac407630c Fix type of TagCost field in BusiestReadTag traces 2023-05-25 16:45:32 -07:00
Josh Slocum 2344a419bd
fixing consistency scan loop and completion check (#10345)
* fixing consistency scan loop and completion check

* fixing ide build
2023-05-25 18:29:03 -05:00
Jefferson Zhong 492c74b679 Implement IStorageMetricsService interface for BlobMigrator 2023-05-25 15:58:05 -07:00
Jingyu Zhou c734a8431e
Merge pull request #10320 from jzhou77/main
Slow down wiggler when in ConsistencyCheck
2023-05-25 14:45:55 -07:00
Jingyu Zhou 9d6413dd2d
Merge pull request #10346 from jzhou77/fix-head
Disable mutation tracking by default
2023-05-25 13:53:32 -07:00
He Liu 0d191b4cb3 Merge branch 'main' of https://github.com/apple/foundationdb into delete-data-move-checkpoints-by-id 2023-05-25 13:29:04 -07:00
Jingyu Zhou df4b18ebf0 Disable mutation tracking by default
This is causing TracedTooManyLines errors
2023-05-25 12:44:51 -07:00
He Liu d21d85e4b6 Merge branch 'main' of https://github.com/apple/foundationdb into disable-physical-shard-move 2023-05-25 12:25:44 -07:00
Zhe Wu bbf6b241da
Merge pull request #10325 from halfprice/zhewu/gray-failure-track-recovery
Refactor healthMonitor in worker.actor.cpp
2023-05-25 09:57:36 -07:00
Jingyu Zhou 70d4c94e5c Limit the change to be only during consistency check in simulation
ConfigureStorageMigrationTest.toml can require wiggler when speedUpSimulation
is on.
2023-05-24 20:50:04 -07:00
Jingyu Zhou 68de513fc8 Slow down wiggler when speedUpSimulation
Storage wiggle can cause consistency check to restart due to wrong_shard_server
error. This can sometimes repeatedly happen and cause consistency check timeout.

Reproduction:
  seed: -f ./tests/fast/MutationLogReaderCorrectness.toml -s 575471711 -b on
  commit: 2e62b577f at release-7.1
  build: clang
2023-05-24 20:50:04 -07:00
He Liu 241907d8ad Clear move-in-shards before terminating SS. 2023-05-24 19:02:30 -07:00
Jingyu Zhou d39a7b53c3
Merge pull request #10342 from liquid-helium/revert-api-correctness-test-toml
Revert unintended changes to APICorrectnessTest.
2023-05-24 15:27:09 -07:00
He Liu 6a18c1fa4c Revert unintended changes to APICorrectnessTest. 2023-05-24 14:53:05 -07:00
He Liu 1900b63acd Merge branch 'main' of https://github.com/apple/foundationdb into delete-data-move-checkpoints-by-id 2023-05-24 13:41:02 -07:00
Jingyu Zhou b2ece8cd8f
Merge pull request #10337 from jzhou77/fix-head
Remove mentioning of build directory in README.md
2023-05-24 11:32:47 -07:00
Ankita Kejriwal 9373191e0a
Fix two bugs in checkExclusion() and add a trace event for better observability (#10330)
* Fix a division in checkExclusion() to be double and add a trace event

* Update the ssExcludedCount only if the role is storage
2023-05-24 10:58:03 -07:00
Jingyu Zhou 1712691da5
Merge pull request #10328 from sfc-gh-jslocum/knob_allow_relative_path_blob_container
adding knob to allow relative paths for local backup containers
2023-05-24 10:30:02 -07:00
Jingyu Zhou d2b21322a6 Remove mentioning of build directory in README.md
Fix #3098
2023-05-24 10:22:02 -07:00
He Liu 9100507928 Disable physical shard move by default. 2023-05-24 08:51:13 -07:00
Yanqin Jin 16df5a8517
Make redwood tests terminate after certain amount of time (#10032)
This PR avoids "external timeout" for redwood correctness tests.

Update the logic in fdbserver.actor.cpp so that -1 instead of 0 is considered a noUnseed. If "noUnseed == true", then -1 will be logged as "RandomUnseed" at the end of the trace.

Tweak the finish condition of redwood unit tests so that if wall clock time reaches a certain threshold, finish the test and set noUnseed to true.
2023-05-23 21:29:45 -07:00
Jingyu Zhou 0ec2d7068d
Merge pull request #10329 from jzhou77/fix-head
Increase BW_RK_SIM_QUIESCE_DELAY to 400s
2023-05-23 20:53:00 -07:00
Jingyu Zhou 13800ae1a8 Increase BW_RK_SIM_QUIESCE_DELAY to 400s
The blob worker needs more time to catch up, about 388s in the failed simulation
test.

Reproduction:
  seed: -f ./tests/slow/BlobGranuleVerifyLargeClean.toml -s 4068151139 -b on
  commit: 3bdd71cb0 at release-7.3 branch
  build: gcc
2023-05-23 15:54:56 -07:00
Josh Slocum fb1eec0efe
adding disabled consistency check test to check singletons (#10314)
* adding disabled consistency check test to check singletons

* fixing warning
2023-05-23 17:20:59 -05:00
Josh Slocum 21468d9483
fixing too large packets from change feeds (#10315) 2023-05-23 17:20:48 -05:00
Josh Slocum 8f241632af adding knob to allow relative paths for local backup containers 2023-05-23 17:06:49 -05:00
Zhe Wu 2e00e0abed Refactor healthMonitor in worker.actor.cpp 2023-05-23 14:14:33 -07:00
He Liu 5160f91e78 Removed SHARD_ENCODE_LOCATION_METADATA. 2023-05-23 13:39:25 -07:00
Josh Slocum d038154d69
re-enabling change feed coalesce knob (#10317) 2023-05-23 14:43:11 -05:00
He Liu 8ad7ec6fdf
Psm ss (#9817)
* Update NativeAPI getCheckpointForRange().

* Implemented checkpoint in SS.

* clean up.

* Disabled StorageServerCheckpointTest.

* Serialized checkpoint creation and deletion.

Simplified checkpoint GC, via deleting CheckpointMetaData::dir.

* Fixed PhysicalShardMove test, where fetchCheckpoint target range was set incorrectly.

* Minor improvements on CheckpointMetaData and DataMoveMetaData.

* fmt.

* Optimized PhysicalShardMove test

cleanup.

* Refactored ShardedRocks checkpoint/restore for psm.

* Complete ShardedRocks::restore.

* dismiss operation_obsolete, and throw actor_cancelled.

* Validate checkpoint when !asKeyValues.

* fmt.

* Don't read from uninitialized physical shard.

* Resolved comments.

* cleanup.

* Added verify_checksum_before_restore for ShardedRocks.

* Added ShardedRocksDB checkpoint/restore unit test.

* Populate CheckpointMetaData::dir in RocksDB.

* Rename MovingIn as Adding.

* Added StorageServerUtils.

* Added physical shard move in SS.

* Fix on ApplyMetaData, doFetchFile error handling etc.

* Debugging incorrect shard size.

* Create/delete checkpoints only when Physical shard move is enabled.

* Added back SHARD_ENCODE_LOCATION_METADATA.

* Fixed bytesSample incorrect issue.

Essentially dedicated CheckpointRocksDBCF as key-value based checkpoint, will need to add a new format for the file-based checkpoint.

* Cleanup.

* Cleanup & compile rocksdb with 8.1 branch.

* clean up.

* clean up.

* Allowed request_maybe_delivered error type in FetchShard.

* Added FDBRocksDBVersion.h.

* Fixed stuck fetchShard.

* Don't create checkpoint on TSS.

* Upgrade to RocksDB 8.1.1

* Cleanup.

* Fixed accidentally deleted db_path and name fields.

* Improved trace event.

* Removed redundancies from previous ShardedRocksDB.

* Cleanup.

* cleanup.

* cleanup.

* rename `state`.

* Cleanup.

* Removed excessive TraceEvent.

* * Fixed shardMap race condition on different threads
* Added *Stats, logging data move rates.
* Added `DD_PHYSICAL_SHARD_MOVE_PROBABILITY` to support hybrid data move.

* Resolved comments.

* fmt.

* Use physical shard move in PhysicalShardMoveTest.

* Enforce physical-shard-move for PhysicalShardMoveTest.

* fmt
2023-05-23 11:18:35 -07:00
Xiaoxi Wang 96a110a9e6 Fix opsReadSample poll and getMetrics; 2023-05-23 09:46:34 -07:00
Xiaoxi Wang 969196d8ba Add read ops shard metrics notify bound 2023-05-23 09:46:34 -07:00
Josh Slocum 629b068145
Bg tenant metadata restarting (#10235)
* making blob metadata optionally deterministic across runs

* Non restarting test passes after refactor

* adding downgrade version test

* formatting
2023-05-23 11:24:13 -05:00
He Liu eaa934dac6
Added more logs about shard management. (#10303) 2023-05-22 18:00:00 -07:00
Yao Xiao bbf15be05f
Knobs to speed up DB open. (#10301) 2023-05-22 16:21:05 -07:00
Vaidas Gasiunas 9bc55f67c3
Fix releasing watches on future cancellation (#10304)
* Test watch cleanup on cancel

* Fix clearing the database in Java integration tests

* Always cancel the futures wrapped by MVC abortable futures

* More tests for watch cleanup

* Fix clearing the database in some Java integration tests
2023-05-22 22:01:27 +02:00
Zhe Wang 6c980862c3
Improve throughput of audit storage (#10245)
* improve audit throughput

* if ssshard fails to do audit due to ssi failure, then global retry is required

* fix a trace event name

* fix budget release in doAudit

* avoid throttling in general simulation tests

* fix doAuditOnStorageServer throw error

* avoid starting a task that has been completed

* when ddaudit ssshard failed, check if ssi is removed, if yes, silently exit

* fix trace detail name of AuditUtilStorageServerRemovedEnd event

* redo schedule in doAuditOnStorageServer

* schedule does not wait doAudit

* remove TESTING_AUDIT_STORAGE_THROTTLING

* ssaudit stops proceeding if ddauditstate is not in running phase

* make tester audit storage only happen in simulation, and randomly set CONCURRENT_AUDIT_TASK_COUNT_MAX
2023-05-22 12:09:08 -07:00
sfc-gh-tclinkenbeard 7ef66ab356 Add OutstandingWatches and WatchMapSize to TransactionMetrics 2023-05-22 12:07:10 -07:00
Ata E Husain Bohra 2b0a08dbe4
BlobMetadata: Move SimBlobMetadata store to SimKmsVault (#10269)
Description

Patch refactors SimKmsConnector to move SimBlobMetadata store to SimKmsVault

Testing

BlobGranuleCorrectness - 100K
/fdbserver/blob/connectionprovider - 100K
devRunCorrectness - 100K
2023-05-22 11:00:59 -07:00
Jingyu Zhou f820e92878
Merge pull request #10292 from sfc-gh-yajin/fix-heap-use-after-free 2023-05-19 20:48:33 -07:00
Jingyu Zhou e1a7335150
Merge pull request #10291 from jzhou77/main 2023-05-19 20:43:17 -07:00
Jingyu Zhou 8878de8c8f
Merge pull request #10288 from sfc-gh-yajin/update-test-pattern-1
Fix a test pattern so that simulator tests do not run non-sim tests
2023-05-19 16:34:01 -07:00