553 Commits

Author SHA1 Message Date
Hui Liu 0fba65a3cd Implement SplitMetric pagination in blob migrator 2023-02-22 16:00:49 -08:00
Steve Atherton bb4fb3d81d
Merge pull request #9419 from sfc-gh-satherton/page-rebuild-fix
Optimize/fix node rebuild vs update trigger in Redwood
2023-02-21 13:49:14 -08:00
Josh Slocum 958f3b531b
Plumbing blob worker mapping through commit proxy like storage server (#9401)
* Plumbing blob worker mapping through commit proxy like storage server mapping

* review comments

* formatting
2023-02-21 13:21:44 -06:00
sfc-gh-tclinkenbeard 398079db3a Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2023-02-20 17:54:06 -08:00
Steve Atherton e169e65021 Fix to BTree node rebuild logic - rebuild when imbalance hits a limit controlled by a new knob. 2023-02-19 16:40:28 -08:00
Yi Wu eac757d186
EaR: cleanup encryption knobs (#9386)
Changes:
* Cleanup all encryption knobs 
* Update simulated cluster to randomly enable encryption with higher probability
2023-02-18 13:18:20 -08:00
Ankita Kejriwal b74a35a986 Enable STORAGE_QUOTA_ENABLED knob by default 2023-02-17 21:14:46 -08:00
sfc-gh-tclinkenbeard 1aef6cb5f7 Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2023-02-17 20:41:59 -08:00
Hui Liu aa1d983132 Truncate logs after force-flushing cold blob granules 2023-02-17 10:17:04 -08:00
Steve Atherton 41fa3eada9
Merge branch 'main' into add-redwood-slack-knob 2023-02-12 19:31:20 -08:00
Hui Liu cb9d4d8bb5
Merge pull request #9276 from sfc-gh-huliu/manifest
Split blob manifest as segments when writing
2023-02-09 13:54:02 -08:00
Hui Liu 6b6959d35f Split blob manifest as segments when writing 2023-02-09 11:26:19 -08:00
Jingyu Zhou 1a9aed795f
Merge pull request #9327 from sfc-gh-tclinkenbeard/enable-gtt-by-default
Enable `GLOBAL_TAG_THROTTLING` by default
2023-02-08 14:03:39 -08:00
sfc-gh-tclinkenbeard 484dc4f74c Enable GLOBAL_TAG_THROTTLING by default 2023-02-08 11:34:31 -08:00
sfc-gh-tclinkenbeard 09ad864eb5 Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2023-02-08 11:25:14 -08:00
sfc-gh-tclinkenbeard dd2ba18d45 Create GLOBAL_TAG_THROTTLING_MIN_TPS knob 2023-02-08 09:48:33 -08:00
Yi Wu d3bc2afc8e
EaR: storage server uses encryption DB config (#9115)
The PR updates the storage server and Redwood to enable encryption based on the encryption mode in DB config, which was previously controlled by a knob. High-level changes are:
1. Passing encryption mode in DB config to storage server
    1.1 If it is a new storage server, pass the encryption mode through `InitializeStorageRequest`. The encryption mode is then passed to Redwood for initialization.
    1.2 If it is an existing storage server, on restart the storage server will send `GetStorageServerRejoinInfoRequest` to the commit proxy, and the commit proxy will return the current encryption mode, which it gets from DB config during its own initialization. The storage server will compare the DB config encryption mode to the local storage encryption mode, and fail if they don't match.
2. Adding a new `encryptionMode()` method to `IKeyValueStore`, which returns a future of the local encryption mode of the KV store instance. A KV store supporting encryption would need to persist its own encryption mode, and return the mode via the API.
3. Redwood accepts the encryption mode in its constructor. For a new Redwood instance, the caller has to specify the encryption mode, which will be stored in Redwood's per-instance file header. For an existing instance, the caller is supposed to not pass the encryption mode, and let Redwood find it from its own header.
4. Refactoring in Redwood to accommodate the above changes.
2023-02-06 14:02:31 -08:00
Josh Slocum 9eac2b5f8b
un-buggifying PEEK_TRACKER_EXPIRATION_TIME to invalid value (#9275) 2023-02-06 09:06:16 -06:00
Hui Liu 774446d3a0 Support pagination for StorageServer splitMetrics API 2023-02-02 14:21:38 -08:00
Hui Liu 1021878764
Merge pull request #9250 from sfc-gh-huliu/restoreurl
Define knob url for blob manifest files
2023-01-30 16:30:02 -08:00
Josh Slocum 0881c0e4e2
Bg perf 2 (#9052)
* added dynamic write amp calculations for blob granule compaction

* changing blob worker parallelism counts to bytes budget to handle less uniform operation sizes

* more snapshotting parallelism for behind feeds

* add a bit of observability when this happens

* adding knobs

* typo

* adjusting some knobs up with buggified granule size

* fixing bugs in dynamic write amp

* fixing formatting

* fixing bug in knob buggification

* fix formatting
2023-01-26 16:56:45 -06:00
Hui Liu 73bf89cd62 Define knob url for blob manifest files 2023-01-26 12:38:26 -08:00
neethuhaneesha 3d113ac150
Changing histogram type. (#9232) 2023-01-25 09:50:40 -08:00
Dan Adkins 5dcece90e1 Increase buggified lock bytes for backup workers to at least 256 MB.
We are still encountering simulation failures where the backup worker
is waiting on the lock and an assertion fails.
2023-01-19 17:36:06 -08:00
neethuhaneesha ca4a964df1
Adding rocksDB control compaction on deletion knobs. (#9144) 2023-01-18 15:40:34 -08:00
Ata E Husain Bohra 3f2404cc25
[EaR]: Update KMS request/response to embed version details (#9135)
* [EaR]: Update KMS request/response to embed version details

Description

 diff-1 : Address review comments

Patch embeds 'version_tag' details in the KMS JSON request/response
payload. This feature enables future expansion and opens
the path to supporting multiple versions simultaneously if needed

Testing

RESTKmsConnectorUnit.toml updated as per new code
devRunCorrectness - 100K
2023-01-16 12:18:25 -08:00
A.J. Beamon 811593e093 Merge branch 'main' into add-tenant-lookup-interface 2023-01-12 09:56:17 -08:00
A.J. Beamon 281083822b Trigger a commit if none happens within some amount of time when a tenant lookup is performed 2023-01-12 09:11:30 -08:00
Zhe Wu 37e026366c
Merge pull request #9119 from halfprice/zhewu/add-txn-server-initialization-event-1
Add event for txn server initialization and a warning for TLog slow catching up
2023-01-11 22:00:53 -08:00
Zhe Wu 087d37d10b Add event for txn server initialization and a warning for TLog slow catching up 2023-01-11 10:02:06 -08:00
Ata E Husain Bohra f673fce975
[EaR]: Update KMS APIs to split encryption keys endpoints (#9017)
* [EaR]: Update KMS APIs to split encryption keys endpoints

Description
  diff-1: Address review comments

Major changes proposed:
1. Extend fdbserver to allow parsing two endpoints for encryption at-rest
support: getEncrypitonKeys, getLatestEncryptionKeys
2. Update RESTKmsConnector to do the following:
 2.1. Split the getLatest and getCipher requests.
 2.2. "domain_id" for point lookup marked as 'optional'

Testing

devRunCorrectness - 100K
2023-01-09 10:55:53 -08:00
A.J. Beamon f999623bb1 Add a tenant lookup interface and use it when starting transactions 2023-01-06 15:51:12 -08:00
sfc-gh-tclinkenbeard 0f14647bbf Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2023-01-05 08:10:49 -08:00
Hui Liu e3bf79cf71 Add correctness test for blob restore 2023-01-04 11:10:34 -08:00
imperatorx 81e8afd3a2 Introduce new knob for Redwood slack balancing 2022-12-20 15:22:54 +01:00
Meng Xu a1d513b355 Fix: Exclusion stuck because DD cannot build new teams
Bug behavior:
When DD has zero healthy machine teams but more unhealthy machine teams
than the max machine teams DD plans to build, DD will stop building
new machine teams. With zero healthy machine teams (and zero healthy
server teams), DD cannot find a healthy destination team to relocate data.
When data relocation stops, exclusion stops progressing and gets stuck.

Bug happens when we *shrink* a k-host cluster by
first adding k/2 new hosts;
then quickly excluding all old hosts.

Fix:
Let DD build temporary extra teams to relocate data.
The extra teams will be cleaned up later by DD's remove extra teams logic.

Simulation test:
There is no simulation test to cover the cluster expansion scenario.
To most closely simulate this behavior, we intentionally overbuild all possible
machine teams to trigger the condition that the number of unhealthy teams is larger than
the maximum number of teams DD wants to build later.
2022-12-19 15:28:01 -08:00
Xiaoxi Wang 919c512cdc fix wiggler state setting 2022-12-15 12:14:40 -08:00
Xiaoxi Wang ab4778bd19 Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/ppwLoadBalance 2022-12-15 11:36:20 -08:00
He Liu 2024237e5d
Fetch checkpoint as key-value pairs (#9003)
* Allow multiple keyranges in CheckpointRequest.
Include DataMove ID in CheckpointMetaData.

* Use UID dataMoveId instead of Optional<UID>.

* Implemented ShardedRocks::checkpoint().

* Implementing createCheckpoint().

* Attempted to change getCheckpointMetaData*() for a single keyrange.

* Added getCheckpointMetaDataForRange.

* Minor fixes for NativeAPI.actor.cpp.

* Replace UID CheckpointMetaData::ssId with std::vector<UID>
CheckpointMetaData::src;

* Implemented getCheckpointMetaData() and completed checkpoint creation
and fetch in test.

* Refactoring CheckpointRequest and CheckpointMetaData

rename `dataMoveId` as `actionId` and make it Optional.

* Fixed ctor of CheckpointMetaData.

* Implemented ShardedRocksDB::restore().

* Tested checkpoint restore, and added range check for restore, so that
the target ranges can be a subset of the checkpoint ranges.

* Added test to partially restore a checkpoint.

* Refactor: added checkpointRestore().

* Sort ranges for comparison.

* Cleanups.

* Check restore ranges are empty; Add ranges in main thread.

* Resolved comments.

* Fixed GetCheckpointMetaData range check issue.

* Refactor CheckpointReader for CF checkpoint.

* Added CheckpointAsKeyValues as a parameter for newCheckpointReader.

* PhysicalShard::restoreKvs().

* Added `ranges` in fetchCheckpoint.

* Added RocksDBCheckpointKeyValues::ranges.

* Added ICheckpointIterator and implemented for RocksDBCheckpointReader.

* Refactored OpenAction for CheckpointReader, handled failure cases.

* Use RocksDBCheckpointIterator::end() in readRange.

* Set CheckpointReader timeout and other Rocks read options.

* Implementing fetchCheckpointRange().

* Added more CheckpointReader tests.

* Cleanup.

* More cleanup.

* Resolved comments.

Co-authored-by: He Liu <heliu@apple.com>
2022-12-14 17:44:47 -08:00
Andrew Noyes dd0036f09c
Automatically clean old idempotency ids (#9039)
* Add cleanIdempotencyIds

Delete zero or more idempotency ids older than minAgeSeconds

* Automatically clean idempotency ids from first proxy

* Add test for cleaner

* Fix formatting

* Address review comments
2022-12-14 14:24:24 -08:00
Xiaoxi Wang 16d11143fa add smallLoadThreshold logic and change knobs 2022-12-07 11:45:49 -05:00
Xiaoxi Wang 5d01d33531 Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/ppwLoadBalance 2022-12-07 09:11:55 -05:00
sfc-gh-tclinkenbeard 4b6098931c Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2022-12-04 08:40:04 -08:00
Hui Liu d76822bc12
Merge pull request #8850 from sfc-gh-huliu/applylog
blobrestore - apply mutation log
2022-12-02 13:03:13 -08:00
Jon Fu c7b4f80ac6
Merge pull request #8964 from sfc-gh-jfu/build-cop-too-many-traces-2
Disable buggify for DD_QUEUE_MAX_KEY_SERVERS knob
2022-12-01 16:06:42 -08:00
Hui Liu b38520ec4e blobrestore - apply mutation log 2022-12-01 14:16:18 -08:00
Jon Fu dace2927a5 disable buggify for DD_QUEUE_MAX_KEY_SERVERS knob 2022-12-01 14:10:01 -08:00
Jingyu Zhou c908f32d42 Increase buggified lock bytes for backup workers
To fix simulation failures where the knob value is too small.
2022-11-28 10:38:17 -08:00
sfc-gh-tclinkenbeard 453f3f44c6 Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2022-11-21 09:17:48 -08:00
Jingyu Zhou 2a74624cbd
Merge pull request #8848 from neethuhaneesha/suggest-compacts
Rocksdb suggest compact range checks
2022-11-17 20:01:19 -08:00