Commit Graph

356 Commits

Author SHA1 Message Date
He Liu 0bbce98da2
Disable shard aware (#8072)
* Removed STORAGE_SERVER_SHARD_AWARE knob.

* Fixed PhysicalShardMove test.

Co-authored-by: He Liu <heliu@apple.com>
2022-09-02 09:07:34 -07:00
Josh Slocum e7a82c9283
Merge pull request #8059 from sfc-gh-jslocum/bw_rk_fix
Adding knob and increasing delay for simulation ratekeeper throttling assert
2022-08-31 15:55:05 -05:00
Josh Slocum dcfc03a247
Merge pull request #8061 from sfc-gh-jslocum/ss_ebrake_speed_up_sim
Minimizing effect of overly small buggified ss ebrake limits when speed up simulation is set
2022-08-31 15:54:41 -05:00
Yi Wu 49503987cc
Support Redwood encryption (#7376)
A new knob `ENABLE_STORAGE_SERVER_ENCRYPTION` is added; despite its name, only Redwood currently supports it. The knob is meant to be used only in tests, to exercise encryption in individual components; otherwise, encryption should be enabled through the general `ENABLE_ENCRYPTION` knob.

Under the hood, a new `Encryption` encoding type is added to `IPager`, which uses AES-256 to encrypt a page. With this encoding, a `BlobCipherEncryptHeader` is inserted into the page header to hold encryption metadata. Moreover, since we compute and store an SHA-256 auth token with the encryption header, we rely on it to checksum the data (and the encryption header) and skip the standard xxhash checksum.

`EncryptionKeyProvider` implements the `IEncryptionKeyProvider` interface to provide encryption keys, using the existing `getLatestEncryptCipherKey` and `getEncryptCipherKey` actors to fetch encryption keys from either the local cache or the EKP server. If multi-tenancy is used, when writing a new page `EncryptionKeyProvider` checks whether the page contains only data for a single tenant; if so, it fetches the tenant-specific encryption key; otherwise the system encryption key is used. The tenant check is done by extracting the tenant id from the page bound key prefixes. `EncryptionKeyProvider` also holds a reference to the `tenantPrefixIndex` map maintained by the storage server, which is used to check whether a tenant exists and to get the tenant name in order to get the encryption key.
2022-08-31 12:19:55 -07:00
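The tenant check described in the Redwood encryption commit above (#7376) can be sketched roughly as follows. This is a minimal illustration in Python, not the FoundationDB implementation: the fixed 8-byte tenant prefix, the plain dictionary standing in for `tenantPrefixIndex`, the `"system"` label, and the function name `choose_encryption_domain` are all assumptions made for the example. It also ignores subtleties such as an exclusive upper bound.

```python
TENANT_PREFIX_LEN = 8      # assumption: tenant id is a fixed-size key prefix
SYSTEM_DOMAIN = "system"   # hypothetical label for the system encryption key


def choose_encryption_domain(lower: bytes, upper: bytes, tenant_prefix_index: dict) -> str:
    """Pick a tenant key only if both page bounds share one known tenant prefix."""
    # Keys shorter than a tenant prefix cannot belong to a single tenant.
    if len(lower) < TENANT_PREFIX_LEN or len(upper) < TENANT_PREFIX_LEN:
        return SYSTEM_DOMAIN
    prefix = lower[:TENANT_PREFIX_LEN]
    # Different prefixes on the two bounds: the page may span several tenants.
    if upper[:TENANT_PREFIX_LEN] != prefix:
        return SYSTEM_DOMAIN
    # Unknown prefix: the tenant does not exist, so fall back to the system key.
    return tenant_prefix_index.get(prefix, SYSTEM_DOMAIN)
```

The returned name would then be used to look up the matching encryption key; any page that cannot be attributed to exactly one existing tenant falls back to the system key.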
Josh Slocum c587135988 Minimizing effect of overly small buggified ss ebrake limits when speed up simulation is set 2022-08-31 13:39:50 -05:00
Josh Slocum 9721de70b6 Adding knob and increasing delay for simulation ratekeeper throttling assert 2022-08-31 09:08:27 -05:00
Yao Xiao ac7a5823e2
Add knob for CF write buffer size. (#8038) 2022-08-30 17:52:29 -07:00
Yao Xiao 09f62acd14
Add delay to physical shard clean up. (#7989) 2022-08-29 11:30:50 -07:00
A.J. Beamon 2907d2d4dd
Merge pull request #8004 from sfc-gh-ajbeamon/fix-ub
Fix some undefined behavior in RK and a unit test
2022-08-29 09:16:11 -07:00
Evan Tschannen 8314e80371
Fixed a few bugs which caused ratekeeper to unnecessarily throttle a cluster (#8006)
* do not count recently created change feeds for throttling

* fix: blocked assignments were not decremented when force purging

* fix: created needs to be updated when the changefeed is reset

* added asserts to detect if ratekeeper is throttled on blob workers
2022-08-26 15:38:31 -07:00
A.J. Beamon 0e782412a8 Fix some undefined behavior: 1) a unit test was not initializing members of the WorkloadContext it was using, and 2) very large ratekeeper limits for batch priority were overflowing the types used to log them 2022-08-26 14:17:01 -07:00
Ata E Husain Bohra 00fe4863b6
Implement TenantCacheEntry in-memory cache (#7801)
* Implement TenantCacheEntry in-memory cache

Description

  diff-4: TraceEvent usage improvements 
  diff-3: Address review comments
  diff-2: Add APIs to read counter values, test improvements
  diff-1: Address review comments

Major changes include:
1. Implements an actor that enables an in-memory caching of
TenantCacheEntry object, allowing the caller to embed custom
information along with TenantCacheEntry.
2. The cache follows read-through cache semantics where the entry
gets loaded from underlying database on a miss.
3. The cache implements a "periodic poller" to refresh known Tenants
by consulting the database. Once a database keyrange-watch feature is
available, cache shall be updated.
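The read-through and periodic-refresh behavior in items 2 and 3 can be sketched as below. This is only an illustration under stated assumptions: the class name `ReadThroughCache`, the `load_one` and `load_all` callbacks, and the hit/miss counters are hypothetical stand-ins, and the real cache is an actor that talks to the database rather than to Python callables.

```python
class ReadThroughCache:
    """Toy read-through cache: misses load from a backing store."""

    def __init__(self, load_one, load_all):
        self._load_one = load_one    # hypothetical: fetch one entry, or None
        self._load_all = load_all    # hypothetical: fetch all entries as a dict
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        # Miss: read through to the backing store and remember the result.
        self.misses += 1
        value = self._load_one(key)
        if value is not None:
            self._entries[key] = value
        return value

    def refresh(self):
        """Body of the periodic poller: re-sync only the keys already cached."""
        latest = self._load_all()
        for key in list(self._entries):
            if key in latest:
                self._entries[key] = latest[key]
            else:
                del self._entries[key]
```

A caller would invoke `refresh()` on a timer; entries that were never requested are not loaded by the poller in this sketch.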

Bonus:
Implement a 'recurringAsync' addition to genericActors allowing caller
to schedule a periodic task registering an "actor functor"; the routine
'waits' for the actor unlike existing 'recurring' implementation.
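The difference described here, waiting for each run to finish before scheduling the next, can be illustrated with Python's `asyncio`. This is a loose analogy rather than the Flow implementation: the function name `recurring_async` and the `iterations` parameter (added only so the example terminates) are assumptions.

```python
import asyncio


async def recurring_async(task_factory, interval, iterations):
    """Run task_factory() repeatedly, awaiting each run before sleeping.

    Because every run is awaited to completion, two runs never overlap,
    even when a single run takes longer than `interval`.
    """
    for _ in range(iterations):
        await task_factory()
        await asyncio.sleep(interval)
```

A fire-and-forget variant would instead start the task and sleep immediately, which allows a slow run to overlap with the next one.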

Testing

TenantEntryCache workload
devCorrectnessRun - 100K
2022-08-25 11:42:26 -07:00
Ata E Husain Bohra d6b1ac056c
KMS connector to assist encryption enabled perf runs (#7978)
Description

FDB Native encryption requires integration with external
Key Management Services to fetch required encryption keys.
For simulation runs, there exists a SimKmsConnector implementation
that fakes interaction with an external KMS.

Major changes suggested in the patch:
1. Enable setting KMS_CONNECTOR_TYPE via command line arguments.
2. If "FDBPerfKmsConnector" is set as KMS_CONNECTOR_TYPE, then
allow using SimKmsConnector implementation.

Note: SimKmsConnector can handle process reboots.

Testing

devRunCorrectness - 100K
2022-08-25 10:00:46 -07:00
Chaoguang Lin 06aa6ee5ff
Add system monitor for flowprocess (#6925)
* Update network address in trace logs; Add system monitor for flowprocess

* Create a new trace file with the correct process address for flowprocess

* Remove unused debugging traces

* Add a new error lock_file_failure; Change please_reboot_remote_kv_store to please_reboot_kv_store; Add the code to only reboot the kv store but not the worker; Remove some unnecessary traces

* Add error handling for file_not_found in handleIOErrors

* Format worker.actor.cpp file
2022-08-24 00:40:38 -07:00
Evan Tschannen 493771b6a8
Throttle the cluster if the blob manager cannot assign ranges (#7900)
* Throttle the cluster if the blob manager cannot assign ranges

* fixed a number of different bugs which caused ratekeeper to throttle to zero because of blob worker lag

* fix: do not mark an assignment as blocked if it is cancelled

* remove asserts to merge bug fixes

* fix formatting

* restored old control flow to storage updater

* storage updater did not throw errors

* disable buggify to see if it fixes CI
2022-08-23 13:33:46 -05:00
Hui Liu 33411fd07e
add an option to read from blob storage for moving keys (#7848) 2022-08-23 08:07:17 -05:00
Jingyu Zhou b966c4de0c
Merge pull request #7936 from sfc-gh-tclinkenbeard/increase-tlog-max-create-duration-in-simulation
Increase `TLOG_MAX_CREATE_DURATION` to 15.0 in simulation
2022-08-22 09:31:11 -07:00
Josh Slocum 98a7ec1797
Blob Granules Cleanup (#7941)
* Cleaned up BlobGranule TODO + FIXMEs and addressed some

* popping feed at correct version

* blob worker taking over a granule will pop from where previous worker left off

* addressed fixme of blob worker not re-snapshotting from old change feed

* formatting

* more change feed popped fixes after pop updates

* Getting rid of change feed parallelism lock since it can cause deadlocks in fetching, and relying on full fetch lock

* New blob worker metric and fixing old one

* server-side popped checking still doesn't work because of pops at non-mutation versions

* format
2022-08-19 17:25:31 -07:00
Zhe Wang 2ceaae4219
dd physical shard core (#7703) 2022-08-19 14:47:00 -04:00
sfc-gh-tclinkenbeard 7449dbe5c4 Increase TLOG_MAX_CREATE_DURATION to 15.0 in simulation 2022-08-19 10:35:13 -07:00
He Liu 044f43b6c0
Fix shard mapping read (#7897)
* Read full range.

* Count empty shards.

* Added ROCKSDB_READ_RANGE_ROW_LIMIT.

* Fixed BytesLimit issue.

* TraceEvents.

* Cleaned up comments.

Co-authored-by: He Liu <heliu@apple.com>
2022-08-19 09:33:59 -07:00
Yao Xiao 9c20a07d35
Add delete_obsolete_files_period_micros to RocksDB options. (#7908) 2022-08-17 11:21:21 -07:00
Jingyu Zhou 120140b8b1
Merge pull request #7857 from yao-xiao-github/perf-metrics
Fix metrics and add tunable knobs.
2022-08-15 09:30:25 -07:00
Xiaoxi Wang 9133d4e16d
Merge pull request #7803 from sfc-gh-xwang/feature/main/ddvisibility
Add server selection counter in DDQueue
2022-08-12 15:10:25 -07:00
Evan Tschannen a9d3c9f9b3
Added throttling when a blob worker falls behind (#7751)
* throttle the cluster when blob workers fall behind

* do not throttle on blob workers if they are not enabled

* remove an unnecessary actor

* fixed a compile error

* fetch blob worker metrics at the same interval as the rate is updated, avoid fetching the complete blob worker list too frequently

* fixed another compilation bug

* added a 5 second delay before bw throttling to prevent false positives caused by the 100e6 version jump during recovery. Lower the throttling thresholds to react much quicker to bw lag.

* fixed a number of problems

* changed the minBlobVersionRequest to look at storage server versions since this will be a lot more efficient

* fix: do not let desired go backwards

* fix: track the version of notAtLatest changefeeds for throttling

* ratekeeper now throttles blob workers by estimating the transaction per second throughput of the blob workers

* added metrics for blob worker change feeds

* added a knob to disable bw throttling

* fixed the transaction options in blob manager
2022-08-12 13:15:56 -07:00
Jingyu Zhou 5929ac1d65
Merge pull request #7847 from xis19/knobCheck
Cleanup the knobs that are not being used
2022-08-11 21:52:52 -07:00
Xiaoge Su 114c266b04 fixup! Recover the KMS knob which is useful 2022-08-11 15:15:14 -07:00
Yao Xiao 599e4b86d5 Add more knobs 2022-08-11 15:13:42 -07:00
Josh Slocum 44f8bdd258
Blob Worker memory limit (#7858)
* Simulation version of blob_worker_full

* tracking blocked BM assignments

* actual memory estimation implementation
2022-08-11 15:07:08 -07:00
Trevor Clinkenbeard 583021c2d9
Merge pull request #7772 from sfc-gh-tclinkenbeard/global-tag-throttling6
Add status section for global tag throttler
2022-08-11 17:38:31 -03:00
Yao Xiao fcfe6a8c29 Fix metrics and add knobs 2022-08-11 11:46:21 -07:00
Xiaoge Su 85836cbec9 Cleanup the knobs that are not being used 2022-08-10 16:51:01 -07:00
Xiaoxi Wang 1cff154adb Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/ddvisibility 2022-08-10 12:03:42 -07:00
Xiaoxi Wang f6dce5dcee add knob to control summarize 2022-08-10 09:44:01 -07:00
Xiaoxi Wang 319ec2f43d add summarize event 2022-08-09 18:22:48 -07:00
Xiaoxi Wang 9032f6c394 add comments; change default knobs 2022-08-09 16:45:15 -07:00
Josh Slocum 62494f048c
several changes to manage blob worker memory more and to test that management (#7834) 2022-08-09 17:53:52 -05:00
Xiaoxi Wang ea0c60381f merge upstream/main 2022-08-09 12:28:57 -07:00
Jingyu Zhou eba77d78f4 Add knobs for min/max Ratekeeper limit
The default has no effects.
2022-08-08 15:27:21 -07:00
Xiaoxi Wang 2b2bc12cc1 Update Document; set log limit 2022-08-08 10:04:48 -07:00
Xiaoxi Wang b18e29dd87 Merge remote-tracking branch 'upstream' into feature/main/ddvisibility 2022-08-06 21:43:36 -07:00
Xiaoxi Wang 8ecee1992b
Merge pull request #7777 from sfc-gh-xwang/feature/main/eligible-wiggle
Make storage wiggler support SS_MIN_AGE
2022-08-06 17:01:46 -07:00
Xiaoxi Wang 06ae0a2f4c ddqueue.periodicalRefreshCounter() 2022-08-05 15:26:34 -07:00
Josh Slocum b2835921ba
Using knownBlobRanges for blob granule ranges whether tenants are enabled or not (#7788)
* Using knownBlobRanges for blob granule ranges whether tenants are enabled or not

* Effectively disabled blob granule tests when tenants enabled to fix ctest
2022-08-05 11:46:09 -05:00
sfc-gh-tclinkenbeard 1bd47a07b2 Add ENFORCE_TAG_THROTTLING_ON_PROXIES knob 2022-08-05 00:40:10 -07:00
Xiaoxi Wang 4b3acf6b4d update SS_MIN_AGE=21 day 2022-08-03 17:18:15 -07:00
He Liu fa418fd784
Change SHARD_ENCODE_LOCATION_METADATA to a server knob. (#7770)
Co-authored-by: He Liu <heliu@apple.com>
2022-08-03 13:51:40 -07:00
Xiaoxi Wang 2cd15073d5 make storage wiggler support SS_MIN_AGE 2022-08-02 16:21:41 -07:00
Trevor Clinkenbeard edf4e60fa9
Merge pull request #7631 from sfc-gh-tclinkenbeard/global-tag-throttling5
Improvements to `GlobalTagThrottler`
2022-08-02 16:04:20 -07:00
Dennis Zhou b34a54fa7f
blob: allow for alignment of granules to tuple boundaries (#7746)
* blob: read TenantMap during recovery

Future functionality in the blob subsystem will rely on the tenant data
being loaded. This fixes this issue by loading the tenant data before
completing recovery such that continued actions on existing blob
granules will have access to the tenant data.

Example scenario with failover, splits are restarted before loading the
tenant data:
BM - BlobManager
epoch 3:                        epoch 4:
  BM record intent to split.
  Epoch fails.
                                BM recovery begins.
  BM fails to persist split.
                                BM recovery finishes.
                                BM.checkBlobWorkerList()
                                  maybeSplitRange().
                                BM.monitorClientRanges().
                                  loads tenant data.

bin/fdbserver -r simulation -f tests/slow/BlobGranuleCorrectness.toml \
    -s 223570924 -b on  --crash --trace_format json

* blob: add tuple key truncation for blob granule alignment

FDB has a backup system available using the blob manager and blob
granule subsystem. If we want to audit the data in the blobs, it's a lot
easier if we can align them to something meaningful.

When a blob granule is being split, we ask the storage metrics system
for split points as it holds approximate data distribution metrics.
These keys are then processed to determine if they are a tuple and
should be truncated according to the new knob,
BG_KEY_TUPLE_TRUNCATE_OFFSET.

Here we keep all aligned keys together in the same granule even if it is
larger than the allowed granule size. The following commit will address
this by adding merge boundaries.
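As a rough illustration of the alignment idea, the sketch below truncates candidate split points and collapses the ones that become identical. It works on already-decoded Python tuples rather than encoded keys, and it assumes the truncate offset means "number of leading tuple elements to keep"; the actual semantics of BG_KEY_TUPLE_TRUNCATE_OFFSET and the function name `align_split_points` are assumptions, not taken from the patch.

```python
def align_split_points(split_points, truncate_offset):
    """Truncate sorted tuple split points and drop the resulting duplicates.

    split_points: sorted list of tuples (decoded keys, for illustration).
    truncate_offset: assumed number of leading elements to keep.
    """
    aligned = []
    for key in split_points:
        truncated = key[:truncate_offset]
        # Keys sharing the same truncated prefix collapse into one split
        # point, so they all stay in the same granule.
        if not aligned or aligned[-1] != truncated:
            aligned.append(truncated)
    return aligned
```

With fewer distinct split points, a granule holding many keys under one prefix can exceed the target size, which is the situation the merge boundaries mentioned above are meant to handle.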

* blob: minor clean ups in merging code

1. Rename mergeNow -> seen. This is more in line with clocksweep naming
   and removes the confusion between mergeNow and canMergeNow.
2. Make clearMergeCandidate() reset to MergeCandidateCannotMerge to make
   a clear distinction what we're accomplishing.
3. Rename canMergeNow() -> mergeEligble().

* blob: add explicit (hard) boundaries

Blob ranges can be specified either through explicit ranges or at the
tenant level. Right now this is managed implicitly. This commit aims to
make it a little more explicit.

Blobification begins in monitorClientRanges() which parses either the
explicit blob ranges or the tenant map. As we do this and add new
ranges, let's explicitly track what is a hard boundary and what isn't.

When blob merging occurs, we respect this boundary. When a hard boundary
is encountered, we submit the found eligible ranges and start looking
for a new range beginning with this hard boundary.
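The boundary-respecting scan can be sketched as follows. This is a simplified model, not the blob manager code: granules are represented as `(begin, end, eligible)` tuples, hard boundaries as a set of keys, and the function name `group_merge_ranges` is hypothetical.

```python
def group_merge_ranges(granules, hard_boundaries):
    """Group adjacent merge-eligible granules without crossing hard boundaries.

    granules: sorted list of (begin, end, eligible) tuples.
    hard_boundaries: set of keys that a merged range must not span.
    Returns a list of groups, each a list of (begin, end) ranges to merge.
    """
    groups, current = [], []

    def flush():
        nonlocal current
        # A single granule has nothing to merge with, so only submit 2+.
        if len(current) > 1:
            groups.append(current)
        current = []

    for begin, end, eligible in granules:
        if begin in hard_boundaries:
            # Submit what was found so far and restart at the hard boundary.
            flush()
        if eligible:
            current.append((begin, end))
        else:
            flush()
    flush()
    return groups
```

In this model an ineligible granule also ends the current run, so a group only ever contains contiguous eligible granules on one side of a hard boundary.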

* blob: create BlobGranuleSplitPoints struct

This is a setup for the following commit. Our goal here is to provide a
structure for split points to be passed around. The need is for us to be
able to carry uncommitted state until it is committed and we can apply
these mutations to the in-memory data structures.

* blob: implement soft boundaries

An earlier commit establishes the need to create data boundaries within
a tenant. The reality is we may encounter a set of keys that degenerate
to the same key prefix. We'll need to be able to split those across
granules, but we want to ensure we merge the split granules together
before merging with other granules.

This adds new BlobGranuleMergeBoundary items to the BlobGranuleSplitPoints
state. BlobGranuleMergeBoundary contains state
saying if it is a left or right boundary. This information is used to,
like hard boundaries, force merging of like granules first.

We read the BlobGranuleMergeBoundary map into memory at recovery.
2022-08-02 16:06:25 -05:00