Commit Graph

21748 Commits

Author SHA1 Message Date
Xiaoxi Wang 24e50edd87
Merge pull request #7808 from sfc-gh-ajbeamon/fix-wiggler-unit-test
Fix a bug in a storage wiggler unit test where some servers were added with too recent a timestamp
2022-08-07 21:25:33 -07:00
A.J. Beamon aedc6fc349 Fix a bug in a storage wiggler unit test where some servers were added with too recent a timestamp 2022-08-07 06:45:39 -07:00
Xiaoxi Wang 8ecee1992b
Merge pull request #7777 from sfc-gh-xwang/feature/main/eligible-wiggle
Make storage wiggler support SS_MIN_AGE
2022-08-06 17:01:46 -07:00
Josh Slocum f866ffc36b
Better granule conversion (#7787)
* better check for granule-ification

* Handling blob granule initial split too large

* Re-evaluating split size if too large, even if read doesn't get transaction_too_old

* reworked to have blob worker propose split key

* New GranuleStatusReply to avoid seqno issue stream side effects

* Handling retries on reevaluateInitialSplit properly

* Waiting for stream to be initialized

* Checking reevaluate split for additional split points beyond proposed

* Fixing more races in reevaluate initial split

* properly handling cleaning up old change feed after split re-evaluate

* fixing granule conversion bug with hard boundaries

* fixing clear and merge check race with cycle test

* refactor missed knob check for clearAndMerge

* Fixing formatting

* review comments and improving large range conversion

* fixing typo

* more formatting
2022-08-05 18:12:17 -05:00
Hui Liu cd0020ccaf
add counters for blob compression (#7799) 2022-08-05 18:07:13 -05:00
A.J. Beamon a6567fab9c
Merge pull request #7797 from sfc-gh-ajbeamon/python-binding-tester-reset-size-limit
Reset the default transaction size limit at the end of the size limit test
2022-08-05 13:35:51 -07:00
A.J. Beamon c8451d5f97
Merge pull request #7794 from sfc-gh-ajbeamon/fix-permanently-failed-tenant-creation
Fix: tenant creation in metacluster failure
2022-08-05 12:26:35 -07:00
Hui Liu 29ad2c0654
fdbcli: show status details about # workers and # key ranges if blob granules enabled (#7792)
* fdbcli: show status if blob granules is enabled

* fdbcli: show status details for blob granules for # workers and # key ranges
2022-08-05 12:33:57 -05:00
Josh Slocum b2835921ba
Using knownBlobRanges for blob granule ranges whether tenants are enabled or not (#7788)
* Using knownBlobRanges for blob granule ranges whether tenants are enabled or not

* Effectively disabled blob granule tests when tenants enabled to fix ctest
2022-08-05 11:46:09 -05:00
A.J. Beamon 390bac4e11 Reset the default transaction size limit at the end of the size limit test so future tests aren't artificially limited 2022-08-05 09:41:18 -07:00
A.J. Beamon a0023ddde6
Merge pull request #7795 from sfc-gh-ajbeamon/fix-ub-overflow
Fix integer overflow in test from setting a range limit too large
2022-08-05 09:31:38 -07:00
Josh Slocum ddbf32d35e
adding knob and fixing blobrange cli check bug with too many granules (#7793) 2022-08-05 11:19:17 -05:00
A.J. Beamon 4b7911f68f Fix integer overflow in test from setting a range limit too large 2022-08-05 07:28:06 -07:00
A.J. Beamon 1a132dfd1a If we are retrying a creation from the beginning because of a permanent failure on a data cluster, there should be an existing entry on the management cluster. If it is gone, throw tenant_removed. 2022-08-05 07:21:15 -07:00
A.J. Beamon a7653b2859 Fix: when a tenant creation permanently failed on the data cluster and started over, it could incorrectly fail with a tenant already exists error if the subsequent retry successfully committed during a commit_unknown_result error. Also expand the tenant management concurrency workload to include configure operations. 2022-08-05 06:46:46 -07:00
Yao Xiao 38c79a0eb5
Enable RocksDB metrics (#7739)
* add metrics

* Pass stats to metric logger

* Add debugID
2022-08-05 00:38:59 -07:00
A.J. Beamon ff23d5994e
Merge pull request #7729 from sfc-gh-ajbeamon/feature-metacluster
Metacluster
2022-08-04 15:29:44 -07:00
Josh Slocum d721d1b850
More cf bug fixes (#7789)
* Fixing change feed fetch and rollback race

* Fixing validation issue for change feed validation

* Fixing shutdown segfault in blob worker
2022-08-04 15:57:42 -05:00
Jingyu Zhou 84d483605b
Merge pull request #7431 from xis19/main
Let the storage server report the busiest write tag
2022-08-04 10:23:31 -07:00
A.J. Beamon ff93b345e5 Remove unnecessary deletion of the cluster capacity index 2022-08-04 06:39:58 -07:00
A.J. Beamon 48f149a62e Fix a small test bug and remove some delays that don't seem to be needed anymore 2022-08-04 05:57:07 -07:00
Yao Xiao 5ee40bbc5f
Clean up empty physical shards in KVS. (#7674) 2022-08-04 02:12:04 -07:00
A.J. Beamon a2e66c9695 Don't buggify the assignment availability timeout too low or it will become impossible to make an assignment 2022-08-03 19:39:20 -07:00
Xiaoxi Wang 753d096ae1 fix unit test bug; pass 100k 2022-08-03 19:20:52 -07:00
A.J. Beamon fbe1a4a69a Use multiple databases in the metacluster management test. Fix a test bug as well as some issues with setting up multiple extra databases. 2022-08-03 19:10:34 -07:00
Xiaoxi Wang 4b3acf6b4d update SS_MIN_AGE=21 days 2022-08-03 17:18:15 -07:00
Xiaoxi Wang 327fe33491 refactor so getNextWigglingServerID can be tested directly through a unit test 2022-08-03 17:16:38 -07:00
Hui Liu 4f75f01882
fdbcli: show status if blob granules is enabled (#7784) 2022-08-03 19:15:34 -05:00
Andrew Noyes da1ffebcb0
Improve test harness logging when there are no trace files (#7785)
* Log OldBinary even if there are no trace files

DeterminismCheck and OldBinary attributes don't actually depend on
information in the ProgramStart event, so we can add them
unconditionally.

* Add JoshuaSeed attribute to Test element in test harness

* Add NoTraceFilesFound event in test harness

There's already something similar: NoTraceFileGenerated. However, it
appears that the original author only wanted to log that if the process
exited with status 0. I'm not sure what the reason for that is, so I
think it's safer to add a new event. This will make it clearer if, say,
an old binary is corrupt.
2022-08-03 17:14:33 -07:00
Josh Slocum 1cda8a2fc1
More blob granule operational stuff (#7783)
* More blob manager metrics

* making blobrange check command work for large ranges
2022-08-03 18:11:25 -05:00
Balachandar Namasivayam b9d156f0d9
Merge pull request #7771 from bnamasivayam/fix-transaction-analyzer
Add missing CommitInfo fields to the transaction profiling analyzer
2022-08-03 15:29:53 -07:00
Xiaoxi Wang de64d51e55 set MIN_ON_CHECK_DELAY 2022-08-03 15:29:40 -07:00
Josh Slocum 7f45cccb56
More granule purging fixes (#7756)
* Granule purge cannot delete history entry for fully deleting granule until all children are completely done splitting

* Several purging fixes related to granule history

* Fixed typo in refactor

* fixing memory model for purgeRange

* formatting

* weakening granule purge test for now

* cleanup

* review comments
2022-08-03 16:43:27 -05:00
He Liu 97cf3c99e7
Remove optimizeForSmallDB. (#7767)
Co-authored-by: He Liu <heliu@apple.com>
2022-08-03 13:51:57 -07:00
He Liu fa418fd784
Change SHARD_ENCODE_LOCATION_METADATA to a server knob. (#7770)
Co-authored-by: He Liu <heliu@apple.com>
2022-08-03 13:51:40 -07:00
A.J. Beamon 41af66bd4e Add a tenant consistency check and use it in the various tenant workloads 2022-08-03 13:33:45 -07:00
John Brownlee 4f4d32de8e
Merge pull request #7697 from brownleej/kubernetes-monitor-ip-family
Add IP family argument to fdb-kubernetes-monitor
2022-08-03 13:26:34 -07:00
Xiaoxi Wang 99bfc2406a update onCheck() and unit test; format code 2022-08-03 11:31:59 -07:00
Bala Namasivayam 996484191b Fix protocol version 2022-08-03 11:16:15 -07:00
A.J. Beamon 8e777a6330 Detect and handle inverted ranges in the get cluster list test. Remove some unused code. 2022-08-03 09:09:36 -07:00
Xiaoxi Wang 1243a1d8a3 add /StorageWiggler/MinAge test 2022-08-02 18:55:40 -07:00
Bala Namasivayam bf3009d6c9 Add missing CommitInfo fields to the transaction profiling analyzer 2022-08-02 17:13:03 -07:00
Xiaoxi Wang 2cd15073d5 make storage wiggler support SS_MIN_AGE 2022-08-02 16:21:41 -07:00
Trevor Clinkenbeard edf4e60fa9
Merge pull request #7631 from sfc-gh-tclinkenbeard/global-tag-throttling5
Improvements to `GlobalTagThrottler`
2022-08-02 16:04:20 -07:00
Dennis Zhou b34a54fa7f
blob: allow for alignment of granules to tuple boundaries (#7746)
* blob: read TenantMap during recovery

Future functionality in the blob subsystem will rely on the tenant data
being loaded. This commit guarantees that by loading the tenant data
before completing recovery, so that continued actions on existing blob
granules have access to the tenant data.

Example failover scenario in which splits are restarted before the
tenant data is loaded:
BM - BlobManager
epoch 3:                        epoch 4:
  BM record intent to split.
  Epoch fails.
                                BM recovery begins.
  BM fails to persist split.
                                BM recovery finishes.
                                BM.checkBlobWorkerList()
                                  maybeSplitRange().
                                BM.monitorClientRanges().
                                  loads tenant data.

bin/fdbserver -r simulation -f tests/slow/BlobGranuleCorrectness.toml \
    -s 223570924 -b on  --crash --trace_format json

* blob: add tuple key truncation for blob granule alignment

FDB has a backup system available using the blob manager and blob
granule subsystem. If we want to audit the data in the blobs, it's a lot
easier if we can align them to something meaningful.

When a blob granule is being split, we ask the storage metrics system
for split points as it holds approximate data distribution metrics.
These keys are then processed to determine if they are a tuple and
should be truncated according to the new knob,
BG_KEY_TUPLE_TRUNCATE_OFFSET.

Here we keep all aligned keys together in the same granule, even if the
resulting granule is larger than the allowed granule size. The following
commit addresses this by adding merge boundaries.
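The split-point alignment described above can be sketched roughly as follows. This is a simplified, hypothetical model, not FDB's implementation. Keys are shown as already-decoded Python tuples rather than packed bytes, and the function names are invented. The knob is assumed to mean "drop this many trailing tuple elements". Keeping the range begin and end unchanged and discarding points that no longer sort after the previous boundary are also assumptions.

```python
from typing import List, Tuple

# Hypothetical model: keys are decoded tuples whose elements are mutually
# comparable (e.g. all ints). Real FDB keys are packed byte strings.
Key = Tuple


def truncate_key(key: Key, offset: int) -> Key:
    """Drop the last `offset` elements (assumed meaning of the truncate knob).
    Keys with `offset` or fewer elements are returned unchanged."""
    if offset <= 0 or len(key) <= offset:
        return key
    return key[: len(key) - offset]


def align_split_points(points: List[Key], offset: int) -> List[Key]:
    """points[0] and points[-1] are the range begin/end and are kept as-is.
    Inner split points are truncated; any that no longer sort strictly after
    the previous boundary are dropped, so keys sharing a prefix stay together
    in one (possibly oversized) granule."""
    if len(points) <= 2:
        return list(points)
    aligned = [points[0]]
    for key in points[1:-1]:
        t = truncate_key(key, offset)
        if t > aligned[-1]:
            aligned.append(t)
    aligned.append(points[-1])
    return aligned
```

For example, with `offset=1` the split points `(1, 10, 1)` and `(1, 10, 2)` both truncate to `(1, 10)`. Only one boundary survives, so every key under that prefix lands in the same granule. A truncated key is a prefix of the original, so it never sorts after it. Each surviving inner boundary therefore stays below the range end.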

* blob: minor clean ups in merging code

1. Rename mergeNow -> seen. This is more in line with clock-sweep naming
   and removes the confusion between mergeNow and canMergeNow.
2. Make clearMergeCandidate() reset to MergeCandidateCannotMerge to make
   a clear distinction of what we're accomplishing.
3. Rename canMergeNow() -> mergeEligble().

* blob: add explicit (hard) boundaries

Blob ranges can be specified either through explicit ranges or at the
tenant level. Right now this is managed implicitly. This commit aims to
make it a little more explicit.

Blobification begins in monitorClientRanges() which parses either the
explicit blob ranges or the tenant map. As we do this and add new
ranges, let's explicitly track what is a hard boundary and what isn't.

When blob merging occurs, we respect this boundary. When a hard boundary
is encountered, we submit the found eligible ranges and start looking
for a new range beginning with this hard boundary.
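The hard-boundary rule above can be illustrated with a single left-to-right scan over adjacent granules. This is a hypothetical sketch: the `(begin, end)` string ranges, the per-granule eligibility flag, the `find_merge_candidates` name, and the "at least two granules" threshold are assumptions, not the blob manager's real data structures.

```python
from typing import List, Set, Tuple

# Hypothetical model: granules are contiguous (begin, end) ranges in key order,
# each paired with a merge-eligibility flag. Not FDB's actual types.
Granule = Tuple[str, str]


def find_merge_candidates(
    granules: List[Tuple[Granule, bool]], hard_boundaries: Set[str]
) -> List[List[Granule]]:
    """Group adjacent merge-eligible granules into candidate runs. An
    ineligible granule ends the current run; a granule that begins on a hard
    boundary also ends the run and starts a new one at that boundary."""
    candidates: List[List[Granule]] = []
    run: List[Granule] = []

    def flush() -> None:
        # Submit the run only if there is something to merge (2+ granules).
        if len(run) >= 2:
            candidates.append(list(run))
        run.clear()

    for granule, eligible in granules:
        begin, _ = granule
        if begin in hard_boundaries:
            flush()  # never merge across a hard boundary
        if eligible:
            run.append(granule)
        else:
            flush()
    flush()
    return candidates
```

With a hard boundary at `"c"`, the eligible granules `[a,b)`, `[b,c)`, `[c,d)`, `[d,e)` yield two separate candidates, `[a,c)` and `[c,e)`, rather than one merge spanning the boundary.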

* blob: create BlobGranuleSplitPoints struct

This sets up the following commit. Our goal here is to provide a
structure for passing split points around: we need to carry uncommitted
state until it is committed, at which point we can apply these mutations
to the in-memory data structures.

* blob: implement soft boundaries

An earlier commit establishes the need to create data boundaries within
a tenant. The reality is that we may encounter a set of keys that
degenerate to the same key prefix. We'll need to be able to split those
across granules, but we want to ensure we merge the split granules
together before merging with other granules.

This adds new BlobGranuleMergeBoundary items to the
BlobGranuleSplitPoints state. Each BlobGranuleMergeBoundary records
whether it is a left or right boundary. As with hard boundaries, this
information is used to force like granules to merge first.

We read the BlobGranuleMergeBoundary map into memory at recovery.
2022-08-02 16:06:25 -05:00
He Liu 013b9e3baa
Fixed ChangeServerKeysContext name issue. (#7761)
* Fixed ChangeServerKeysContext name issue.

* Update fdbserver/storageserver.actor.cpp

Co-authored-by: Andrew Noyes <andrew.noyes@snowflake.com>

Co-authored-by: He Liu <heliu@apple.com>
Co-authored-by: Andrew Noyes <andrew.noyes@snowflake.com>
2022-08-02 13:54:27 -07:00
sfc-gh-tclinkenbeard 025085ccb5 Fix GlobalTagThrottler::addRequests 2022-08-02 13:47:04 -07:00
Xiaoxi Wang 07eafcec93
Merge pull request #7763 from sfc-gh-xwang/feature/main/unittest
move waitForMost into generic actors
2022-08-02 13:30:30 -07:00
Andrew Noyes 5dbb6f1dd3
Make Tuple::pack return a Standalone<StringRef> (#7764)
This makes it less error-prone and more like other similar functions
like BinaryWriter::toValue

Closes #7748
2022-08-02 12:45:56 -07:00
Chaoguang Lin 48e46cbc81
Add test coverage for SpecialKeyRangeAsyncImpl::getRange (#7671)
* Add getRange test coverage for SpecialKeyRangeAsyncImpl

* Fix the bug in SpecialKeyRangeAsyncImpl found by the test

* Refactor ConflictingKeysImpl::getRange to use containedRanges to simplify the code

* Fix file format

* Initialize SpecialKeyRangeAsyncImpl cache with correct end key

* Add release notes

* Revert "Refactor ConflictingKeysImpl::getRange to use containedRanges to simplify the code"

This reverts commit fdd298f469.
2022-08-02 12:04:40 -07:00