* added dynamic write amp calculations for blob granule compaction (a sketch follows this list)
* changing blob worker parallelism counts to a bytes budget to handle less uniform operation sizes
* more snapshotting parallelism for feeds that are behind
* add a bit of observability when this happens
* adding knobs
* typo
* adjusting some knobs up with buggified granule size
* fixing bugs in dynamic write amp
* fixing formatting
* fixing bug in knob buggification
* fix formatting
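A speculative sketch of what a dynamic write-amplification budget for granule compaction could look like; the function name, target value, and formula are illustrative assumptions, not the actual blob worker code:

    // Speculative sketch (not the actual blob worker logic): allow more delta
    // bytes to accumulate before re-snapshotting as the snapshot grows, so the
    // write amplification (snapshot + deltas) / deltas stays near a target
    // instead of using a fixed delta budget.
    #include <algorithm>
    #include <cstdint>

    int64_t deltaBytesBeforeResnapshot(int64_t snapshotBytes,
                                       double targetWriteAmp,   // e.g. 10.0 (assumed)
                                       int64_t minDeltaBytes) { // floor for tiny granules
        // write amp ~= 1 + snapshotBytes / deltaBytes, so the delta budget that
        // hits the target is snapshotBytes / (targetWriteAmp - 1).
        int64_t budget = int64_t(snapshotBytes / std::max(targetWriteAmp - 1.0, 1.0));
        return std::max(budget, minDeltaBytes);
    }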
* [EaR]: Update KMS request/response to embed version details
Description
diff-1: Address review comments
The patch embeds a 'version_tag' detail in the KMS JSON request/response
payload; this feature enables future expansion as well as the path to
supporting multiple versions simultaneously if needed.
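As a hedged illustration only (the real payload schema and field names beyond 'version_tag' are not reproduced here, and buildRequestWithVersionTag is a made-up helper), this is roughly how a 'version_tag' member can be embedded into a JSON request with rapidjson:

    #include <rapidjson/document.h>
    #include <rapidjson/stringbuffer.h>
    #include <rapidjson/writer.h>
    #include <string>

    // Attach a "version_tag" member to an otherwise unspecified KMS JSON
    // request so both sides can dispatch on the schema version and, if needed,
    // support several versions at once.
    std::string buildRequestWithVersionTag(const char* versionTag) {
        rapidjson::Document doc;
        doc.SetObject();
        auto& alloc = doc.GetAllocator();

        rapidjson::Value tag;
        tag.SetString(versionTag, alloc); // copies the string into the document
        doc.AddMember("version_tag", tag, alloc);

        // ... the remaining request fields (domain ids, validation tokens, ...)
        // are omitted in this sketch.

        rapidjson::StringBuffer buf;
        rapidjson::Writer<rapidjson::StringBuffer> writer(buf);
        doc.Accept(writer);
        return buf.GetString();
    }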
Testing
RESTKmsConnectorUnit.toml updated as per new code
devRunCorrectness - 100K
* [EaR]: Update KMS APIs to split encryption keys endpoints
Description
diff-1: Address review comments
Major changes proposed:
1. Extend fdbserver to allow parsing two endpoints for encryption-at-rest
support: getEncryptionKeys, getLatestEncryptionKeys
2. Update RESTKmsConnector to do the following:
2.1. Split the getLatest and getCipher requests.
2.2. "domain_id" for point lookup marked as 'optional'
Testing
devRunCorrectness - 100K
Bug behavior:
When DD has zero healthy machine teams but more unhealthy machine teams
than the maximum number of machine teams DD plans to build, DD will stop
building new machine teams. With zero healthy machine teams (and zero healthy
server teams), DD cannot find a healthy destination team to relocate data to.
When data relocation stops, exclusion stops progressing and gets stuck.
The bug happens when we *shrink* a k-host cluster by
first adding k/2 new hosts,
then quickly excluding all old hosts.
Fix:
Let DD build temporary extra teams to relocate data.
The extra teams will be cleaned up later by DD's remove extra teams logic.
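A minimal sketch of the gating condition this fix implies, using illustrative names and a simplified team-count model rather than the actual DDTeamCollection code:

    // Normally DD stops building machine teams once the cap is reached; with
    // zero healthy teams there is no relocation destination, so temporarily
    // ignore the cap and let the remove-extra-teams logic trim the surplus
    // once the cluster is healthy again.
    bool allowBuildingExtraMachineTeams(int healthyMachineTeams,
                                        int unhealthyMachineTeams,
                                        int maxMachineTeams) {
        bool capReached = healthyMachineTeams + unhealthyMachineTeams >= maxMachineTeams;
        return !capReached || healthyMachineTeams == 0;
    }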
Simulation test:
There is no simulation test covering the cluster expansion scenario.
To simulate this behavior as closely as possible, we intentionally overbuild all
possible machine teams to trigger the condition where the number of unhealthy
teams is larger than the maximum number of teams DD wants to build.
* Allow multiple keyranges in CheckpointRequest.
Include DataMove ID in CheckpointMetaData.
* Use UID dataMoveId instead of Optional<UID>.
* Implemented ShardedRocks::checkpoint().
* Implementing createCheckpoint().
* Attempted to change getCheckpointMetaData*() for a single keyrange.
* Added getCheckpointMetaDataForRange.
* Minor fixes for NativeAPI.actor.cpp.
* Replace UID CheckpointMetaData::ssId with std::vector<UID>
CheckpointMetaData::src;
* Implemented getCheckpointMetaData() and completed checkpoint creation
and fetch in test.
* Refactoring CheckpointRequest and CheckpointMetaData:
renamed `dataMoveId` to `actionId` and made it Optional.
* Fixed ctor of CheckpointMetaData.
* Implemented ShardedRocksDB::restore().
* Tested checkpoint restore, and added a range check for restore so that
the target ranges can be a subset of the checkpoint ranges (a sketch of
checkpoint creation and range-restricted restore follows this list).
* Added test to partially restore a checkpoint.
* Refactor: added checkpointRestore().
* Sort ranges for comparison.
* Cleanups.
* Check restore ranges are empty; Add ranges in main thread.
* Resolved comments.
* Fixed GetCheckpointMetaData range check issue.
* Refactor CheckpointReader for CF checkpoint.
* Added CheckpointAsKeyValues as a parameter for newCheckpointReader.
* PhysicalShard::restoreKvs().
* Added `ranges` in fetchCheckpoint.
* Added RocksDBCheckpointKeyValues::ranges.
* Added ICheckpointIterator and implemented for RocksDBCheckpointReader.
* Refactored OpenAction for CheckpointReader, handled failure cases.
* Use RocksDBCheckpointIterator::end() in readRange.
* Set CheckpointReader timeout and other RocksDB read options.
* Implementing fetchCheckpointRange().
* Added more CheckpointReader tests.
* Cleanup.
* More cleanup.
* Resolved comments.
Co-authored-by: He Liu <heliu@apple.com>
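A minimal sketch of the two steps the bullets above revolve around, creating a RocksDB checkpoint and restoring only the requested key ranges from it; it assumes the default column family only and uses plain RocksDB calls rather than the ShardedRocks code:

    #include <rocksdb/db.h>
    #include <rocksdb/utilities/checkpoint.h>
    #include <memory>
    #include <string>
    #include <vector>

    struct KeyRange { std::string begin, end; }; // [begin, end)

    // Step 1: hard-link the live SSTs into `dir` as a consistent checkpoint.
    rocksdb::Status createCheckpoint(rocksdb::DB* db, const std::string& dir) {
        rocksdb::Checkpoint* cp = nullptr;
        rocksdb::Status s = rocksdb::Checkpoint::Create(db, &cp);
        if (s.ok()) s = cp->CreateCheckpoint(dir);
        delete cp;
        return s;
    }

    // Step 2: restore a subset of the checkpoint's ranges by opening the
    // checkpoint read-only and re-inserting only keys inside the target ranges.
    rocksdb::Status restoreRanges(rocksdb::DB* target,
                                  const std::string& checkpointDir,
                                  const std::vector<KeyRange>& ranges) {
        rocksdb::DB* src = nullptr;
        rocksdb::Status s = rocksdb::DB::OpenForReadOnly(rocksdb::Options(), checkpointDir, &src);
        if (!s.ok()) return s;
        for (const auto& r : ranges) {
            std::unique_ptr<rocksdb::Iterator> it(src->NewIterator(rocksdb::ReadOptions()));
            for (it->Seek(r.begin); s.ok() && it->Valid() && it->key().ToString() < r.end; it->Next()) {
                s = target->Put(rocksdb::WriteOptions(), it->key(), it->value());
            }
            if (!s.ok()) break;
        }
        delete src;
        return s;
    }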
* Add cleanIdempotencyIds
Delete zero or more idempotency ids older than minAgeSeconds (a sketch follows this list)
* Automatically clean idempotency ids from first proxy
* Add test for cleaner
* Fix formatting
* Address review comments
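A hedged sketch of the cleaner's behavior, using an in-memory map and made-up names as a stand-in for the idempotency-id keyspace the real code operates on:

    #include <cstdint>
    #include <map>
    #include <string>

    // id -> insertion time, in seconds since epoch (simplified stand-in).
    using IdempotencyIdStore = std::map<std::string, int64_t>;

    // Deletes zero or more ids older than minAgeSeconds and returns how many
    // were removed; per the commits above, the real cleaner runs automatically
    // from the first proxy.
    int cleanIdempotencyIds(IdempotencyIdStore& store, int64_t nowSeconds, int64_t minAgeSeconds) {
        int removed = 0;
        for (auto it = store.begin(); it != store.end();) {
            if (nowSeconds - it->second > minAgeSeconds) {
                it = store.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }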
The memtable and write buffer sizes are too small in simulation, which causes
thousands of SST files and at least 6 levels of SSTs.
Both make compaction slower in simulation and contribute to timeout errors.
After increasing the sizes, the failure rate (timeout failures) when we run only the
rocksdb and sharded rocksdb engines in simulation drops from 10 out of 332339 tests
to 10 out of 497532 tests.
For Apple devs who want to look into the Joshua details:
before the change, the joshua ensemble id is 20221111-223720-mengxudebugrocks-505ede1c55664ddf
after the change, the joshua ensemble id is 20221114-192042-mengxurocksdebugknobchange-1e4c047d112e9a38
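For illustration, a sketch of the kind of RocksDB options involved; the specific values below are made up and are not the knob values used by this change:

    #include <rocksdb/options.h>

    rocksdb::Options simulationOptions() {
        rocksdb::Options options;
        // A tiny write buffer flushes constantly and produces thousands of small
        // SSTs spread over many levels; a larger memtable keeps the file count
        // down so compaction can keep up in simulation.
        options.write_buffer_size = 64 << 20;     // 64 MiB memtable (example value)
        options.max_write_buffer_number = 4;      // a few immutable memtables (example)
        options.target_file_size_base = 64 << 20; // larger, fewer SST files (example)
        return options;
    }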
* add byteLimit for prefetch
A fraction of byteLimit is used as the limit for fetching the index.
For the indexes fetched, their records are then fetched in batch.
byteLimit always counts the index size, and it also counts the record size if the
record exists. At least one index-record entry is returned, and the last entry is
always included even if adding it exceeds the limit.
There is a knob, STRICTLY_ENFORCE_BYTE_LIMIT: when it is set, records are
discarded once the byteLimit is hit, even though they have already been fetched.
Otherwise, the whole batch is returned.
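A hedged sketch of the byteLimit accounting described above, applied to an already-fetched batch; the Entry type and function name are illustrative, not the storage server code:

    #include <string>
    #include <vector>

    struct Entry {
        std::string index;
        std::string record; // empty when no record exists for this index
    };

    std::vector<Entry> applyByteLimit(const std::vector<Entry>& fetched,
                                      int byteLimit,
                                      bool strictlyEnforceByteLimit) {
        // Non-strict mode: the whole fetched batch is returned even if it
        // exceeds byteLimit.
        if (!strictlyEnforceByteLimit) return fetched;

        // Strict mode: charge each entry's index size plus its record size (if
        // any); at least one entry is returned, and the entry that crosses the
        // limit is still included.
        std::vector<Entry> result;
        int bytes = 0;
        for (const Entry& e : fetched) {
            bytes += int(e.index.size() + e.record.size());
            result.push_back(e);
            if (bytes >= byteLimit) break;
        }
        return result;
    }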