* better check for granule-ification
* Handling blob granule initial split too large
* Re-evaluating split size if too large, even if read doesn't get transaction_too_old
* reworked to have blob worker propose split key
* New GranuleStatusReply to avoid seqno issue stream side effects
* Handling retries on reevaluateInitialSplit properly
* Waiting for stream to be initialized
* Checking reevaluate split for additional split points beyond proposed
* Fixing more races in reevaluate initial split
* properly handling cleaning up old change feed after split re-evaluate
* fixing granule conversion bug with hard boundaries
* fixing clear and merge check race with cycle test
* refactor missed knob check for clearAndMerge
* Fixing formatting
* review comments and improving large range conversion
* fixing typo
* more formatting
* Using knownBlobRanges for blob granule ranges whether tenants are enabled or not
* Effectively disabled blob granule tests when tenants enabled to fix ctest
* Log OldBinary even if there are no trace files
DeterminismCheck and OldBinary attributes don't actually depend on
information in the ProgramStart event, so we can add them
unconditionally.
* Add JoshuaSeed attribute to Test element in test harness
* Add NoTraceFilesFound event in test harness
There's already something similar: NoTraceFileGenerated. It appears,
though, that the original author only wanted to log that event if the
process exited 0. I'm not sure what the reason for that is, so I think
it's safer to add a new event. This will make it clearer if, say, an old
binary is corrupt.
* Granule purge cannot delete history entry for fully deleting granule until all children are completely done splitting
* Several purging fixes related to granule history
* Fixed typo in refactor
* fixing memory model for purgeRange
* formatting
* weakening granule purge test for now
* cleanup
* review comments
* blob: read TenantMap during recovery
Future functionality in the blob subsystem will rely on the tenant data
being loaded. This commit loads the tenant data before completing
recovery, so that continued actions on existing blob granules have
access to it.
Example scenario with failover, where splits are restarted before the
tenant data is loaded:
BM - BlobManager

epoch 3:                      epoch 4:
BM record intent to split.
Epoch fails.
                              BM recovery begins.
BM fails to persist split.
                              BM recovery finishes.
                              BM.checkBlobWorkerList()
                                maybeSplitRange().
                              BM.monitorClientRanges().
                                loads tenant data.
bin/fdbserver -r simulation -f tests/slow/BlobGranuleCorrectness.toml \
-s 223570924 -b on --crash --trace_format json
* blob: add tuple key truncation for blob granule alignment
FDB has a backup system available using the blob manager and blob
granule subsystem. If we want to audit the data in the blobs, it's a lot
easier if we can align them to something meaningful.
When a blob granule is being split, we ask the storage metrics system
for split points as it holds approximate data distribution metrics.
These keys are then processed to determine if they are a tuple and
should be truncated according to the new knob,
BG_KEY_TUPLE_TRUNCATE_OFFSET.
Here we keep all aligned keys together in the same granule even if it is
larger than the allowed granule size. The following commit will address
this by adding merge boundaries.
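
A minimal Python sketch of the alignment idea, for illustration only (it is not the code in this commit). It assumes the split keys arrive sorted and already decoded into tuples, and that the knob counts how many leading tuple elements to keep. Both are assumptions, as is the helper name `align_split_points`. Non-tuple keys pass through unchanged, and candidates that truncate to the same key collapse into one split point, which is why aligned keys end up in the same granule.

```python
from typing import List, Tuple, Union

Key = Union[bytes, Tuple]  # raw key, or an already-decoded tuple key (assumption)


def align_split_points(split_points: List[Key], truncate_offset: int) -> List[Key]:
    """Truncate tuple keys to their first `truncate_offset` elements.

    Hypothetical helper: `truncate_offset` stands in for
    BG_KEY_TUPLE_TRUNCATE_OFFSET, whose exact semantics are assumed here.
    """
    aligned: List[Key] = []
    for key in split_points:
        if isinstance(key, tuple) and len(key) > truncate_offset:
            key = key[:truncate_offset]
        # Truncation can map several adjacent candidates to the same key; keep one.
        if not aligned or aligned[-1] != key:
            aligned.append(key)
    return aligned
```

For example, with an offset of 2, the candidates `("user", 1, "a")` and `("user", 1, "b")` both align to `("user", 1)`, so no split is placed between them.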
* blob: minor clean ups in merging code
1. Rename mergeNow -> seen. This is more in line with clocksweep naming
and removes the confusion between mergeNow and canMergeNow.
2. Make clearMergeCandidate() reset to MergeCandidateCannotMerge to make
it clear what we're accomplishing.
3. Rename canMergeNow() -> mergeEligible().
* blob: add explicit (hard) boundaries
Blob ranges can be specified either through explicit ranges or at the
tenant level. Right now this is managed implicitly. This commit aims to
make it a little more explicit.
Blobification begins in monitorClientRanges() which parses either the
explicit blob ranges or the tenant map. As we do this and add new
ranges, let's explicitly track what is a hard boundary and what isn't.
When blob merging occurs, we respect this boundary. When a hard boundary
is encountered, we submit the found eligible ranges and start looking
for a new range beginning with this hard boundary.
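
The scan described above can be sketched roughly as follows. This is an illustrative Python model, not the BlobManager code: the function name, the `(begin, end, eligible)` tuple shape, and the rule that a run needs at least two granules are all assumptions made for the example.

```python
from typing import List, Set, Tuple


def merge_candidate_runs(
    granules: List[Tuple[str, str, bool]],
    hard_boundaries: Set[str],
) -> List[List[Tuple[str, str]]]:
    """Group adjacent merge-eligible granules without crossing a hard boundary.

    `granules` is a sorted list of (begin_key, end_key, eligible) tuples.
    All names here are hypothetical and only illustrate the scan.
    """
    runs: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []

    def flush() -> None:
        nonlocal current
        # A single granule has nothing to merge with, so keep only runs of 2+.
        if len(current) > 1:
            runs.append(current)
        current = []

    for begin, end, eligible in granules:
        if begin in hard_boundaries:
            flush()  # submit what was found; restart at the hard boundary
        if eligible:
            current.append((begin, end))
        else:
            flush()
    flush()
    return runs
```

With a hard boundary at key `c`, four adjacent eligible granules spanning `a` to `e` are submitted as two separate runs (`a` to `c`, and `c` to `e`) rather than one.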
* blob: create BlobGranuleSplitPoints struct
This is a setup for the following commit. Our goal here is to provide a
structure for split points to be passed around. We need to carry
uncommitted state until it is committed, at which point we can apply
these mutations to the in-memory data structures.
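
As a rough illustration of the "carry until committed, then apply" pattern, here is a small Python sketch. `SplitPoints`, its fields, and `commit_then_apply` are hypothetical stand-ins; the real BlobGranuleSplitPoints layout is not described in this message.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SplitPoints:
    """Hypothetical stand-in for BlobGranuleSplitPoints."""
    keys: List[str] = field(default_factory=list)
    # Pending per-key metadata; not yet visible to in-memory state.
    boundaries: Dict[str, str] = field(default_factory=dict)


def commit_then_apply(
    points: SplitPoints,
    persist: Callable[[SplitPoints], None],
    in_memory_boundaries: Dict[str, str],
) -> None:
    """Persist first; touch in-memory state only after the commit succeeds."""
    persist(points)  # may raise, in which case in-memory state stays untouched
    in_memory_boundaries.update(points.boundaries)
```

The point of the ordering is that a failed or retried commit never leaves the in-memory structures ahead of what was persisted.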
* blob: implement soft boundaries
An earlier commit establishes the need to create data boundaries within
a tenant. The reality is we may encounter a set of keys that degenerate
to the same key prefix. We'll need to be able to split those across
granules, but we want to ensure we merge the split granules together
before merging with other granules.
This adds new BlobGranuleMergeBoundary items to the
BlobGranuleSplitPoints state. Each BlobGranuleMergeBoundary records
whether it is a left or right boundary. As with hard boundaries, this
information is used to force merging of like granules first.
We read the BlobGranuleMergeBoundary map into memory at recovery.
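
One possible reading of the left/right boundary state, sketched in Python for illustration. The `"left"`/`"right"` string encoding, the function name, and the grouping rule are assumptions for this example and are not taken from the implementation.

```python
from typing import Dict, List, Optional, Tuple


def soft_boundary_groups(
    granules: List[Tuple[str, str]],
    merge_boundaries: Dict[str, str],
) -> List[List[Tuple[str, str]]]:
    """Group granules that were split from the same aligned key range.

    `merge_boundaries` maps a key to "left" or "right" (hypothetical encoding):
    a "left" key opens a group at a granule's begin key, and a "right" key
    closes it at a granule's end key.
    """
    groups: List[List[Tuple[str, str]]] = []
    current: Optional[List[Tuple[str, str]]] = None
    for begin, end in granules:
        if merge_boundaries.get(begin) == "left":
            current = []
        if current is not None:
            current.append((begin, end))
            if merge_boundaries.get(end) == "right":
                groups.append(current)
                current = None
    return groups
```

Under this reading, each returned group would be merged back together first, and only then would the result be considered for merging with its neighbors.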
* Fixed ChangeServerKeysContext name issue.
* Update fdbserver/storageserver.actor.cpp
Co-authored-by: Andrew Noyes <andrew.noyes@snowflake.com>
Co-authored-by: He Liu <heliu@apple.com>
* Add getRange test coverage for SpecialKeyRangeAsyncImpl
* Fix the bug in SpecialKeyRangeAsyncImpl found by the test
* Refactor ConflictingKeysImpl::getRange to use containedRanges to simplify the code
* Fix file format
* Initialize SpecialKeyRangeAsyncImpl cache with correct end key
* Add release notes
* Revert "Refactor ConflictingKeysImpl::getRange to use containedRanges to simplify the code"
This reverts commit fdd298f469.