391 Commits

Author SHA1 Message Date
Markus Pilman 5774249e5b
Revert "[DRAFT] Redwood PriorityMultiLock enable different launch limits to be specified based on different priority level." 2022-09-23 12:22:47 -06:00
Ata E Husain Bohra 52169d2b8e
Enable ZSTD compression support (#8014)
* Enable ZSTD compression filter

Description

  diff-4: Randomize Knob Compression filter selection
  diff-3: Minor refactoring
  diff-2: Limit ZSTD availability to CLANG compiler
  diff-1: Add ZSTD compression option to BlobGranule tests

Major changes include:
1. Update FDB CMake to download, install and build Boost with
ZSTD compatibility
2. Update CompressionUtils to enable boost::iostreams::zstd
compression filter

Testing

CompressionUtilsUnit.toml
BlobGranuleCorrectness/BlobGranuleCorrectnessClean
devRunCorrectness - 100K (in-progress)
2022-09-22 14:31:49 -07:00
Steve Atherton 04b4960786 Merge branch 'main' into fzhao/RedwoodIOLaunchLimit
# Conflicts:
#	fdbserver/VersionedBTree.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/ReadWrite.actor.cpp
2022-09-22 00:39:51 -07:00
neethuhaneesha c4f59ba654
Merge pull request #8261 from neethuhaneesha/rocksdb-cachesize
RocksDB block cache size increase.
2022-09-21 14:50:31 -07:00
Josh Slocum 6270016bed
Seq insert perf fixes main (#8264)
* Force flushing granules post-split to guarantee parent feeds get cleaned up

* fixing bug and cleaning up split finalize code
2022-09-21 12:36:02 -07:00
neethuhaneesha bc9e806f1c RocksDB block cache size increase. 2022-09-21 11:41:27 -07:00
Fuheng Zhao 92aee50dbf update format 2022-09-20 22:51:45 -07:00
Fuheng Zhao 7002150188 add storage server read concurrency knob 2022-09-20 21:23:54 -07:00
Steve Atherton 3c306cc558 Merge branch 'fzhao/feature-testing' into fzhao/RedwoodIOLaunchLimit 2022-09-20 01:14:04 -07:00
Steve Atherton 74b152e550 Removed two obsolete things: explicit maxPriority argument from PriorityMultiLock as it is redundant after the launch limit refactor, and the redwood read concurrency lock which is no longer needed after the StorageServer priority refactor as it will control the concurrency of requests sent to the KVS. 2022-09-18 18:23:15 -07:00
Steve Atherton 31a66f9f12
Add comment explaining STORAGESERVER_READ_RANKS 2022-09-17 22:44:45 -07:00
sfc-gh-ngoyal 1bd97fe628
Recruit new singleton for consistency checker. (#5804)
* Recruit new singleton for consistency checker.

* Recruit the consistency checker only if enabled.

* Add a yield in monitorConsistencyChecker().

* Minor fixes.

* Consistency check workload enhancements.

* Minor fixes and clarifications.

* clang format

* Clang format.

* Minor fixes, cleanup, debug tracing.

* Misc.

* Move the consistency scan information from dbconfig to a key backed object.

* Move consistency scan config out of db config to a state object and feature rename.

* ConsistencyCheck workload refactor.

* devFormat

* Update fdbcli/ConsistencyScanCommand.actor.cpp

* Review Comments.

Co-authored-by: negoyal <neelam.goyal@gmail.com>
Co-authored-by: Ata E Husain Bohra <ata.husain@snowflake.com>
2022-09-16 09:03:06 -07:00
Fuheng Zhao ac65c3f569 merge upstream main 2022-09-15 14:19:19 -07:00
Hui Liu 59be25848f bootstrap blob manager and blob worker from blob manifest 2022-09-15 09:50:12 -07:00
Ata E Husain Bohra b540a3d6b9
Disable zlib find_package, effectively disable gzip compression (#8179)
Description

find_package was used to find and link the `zlib` library needed to enable
the boost::gzip compression filter. However, the code adds dynamic linkage
of the zlib shared object to the generated binaries (fdbserver for instance).

For now disable the ZLIB find code to effectively disable GZIP compression
support.

Testing
2022-09-14 14:03:13 -07:00
Josh Slocum d4ba6c266c
Merge pull request #8176 from sfc-gh-jslocum/ss_cf_burst_fix_main
Fixing Thundering Herd problem of change feed stream retries in SS
2022-09-14 16:01:20 -05:00
Jingyu Zhou e70a18e638
Merge pull request #8122 from xumengpanda/mengxu/io-timeout-main
Add STORAGE_SERVER_REBOOT_ON_IO_TIMEOUT knob to reboot SS on io_timeout
2022-09-14 10:11:46 -07:00
Josh Slocum 3e5e49b635 Operational improvements to limit thundering herd effect of many change feed queries being retried simultaneously 2022-09-14 09:57:21 -05:00
Lukas Joswiak 09892df0b0 Remove unused knob 2022-09-13 16:53:54 -07:00
Lukas Joswiak b2d395a304 Delay cluster controller restart when pushing knob updates to workers
This gives the `ConfigBroadcaster` time to send the knob change to all
workers before applying the change to itself and restarting.
2022-09-13 16:53:54 -07:00
Fuheng Zhao ee99de7cf3
Merge branch 'apple:main' into RedwoodIOLaunchLimit 2022-09-12 10:58:12 -07:00
Josh Slocum e66015cbe4
Including change feed bytes towards storage queue (#8107) 2022-09-08 16:37:44 -07:00
Fuheng Zhao f64f245269 update storage server PML parameters 2022-09-08 10:49:18 -07:00
Meng Xu 358a0dd88d Add STORAGE_SERVER_REBOOT_ON_IO_TIMEOUT knob to reboot SS on io_timeout 2022-09-07 15:30:04 -07:00
He Liu 0bbce98da2
Disable shard aware (#8072)
* Removed STORAGE_SERVER_SHARD_AWARE knob.

* Fixed PhysicalShardMove test.

Co-authored-by: He Liu <heliu@apple.com>
2022-09-02 09:07:34 -07:00
Fuheng Zhao fe5f1fab19 sync 2022-08-31 16:11:21 -07:00
Fuheng Zhao 0aa096dc17 sync with upstream main 2022-08-31 15:46:39 -07:00
Josh Slocum e7a82c9283
Merge pull request #8059 from sfc-gh-jslocum/bw_rk_fix
Adding knob and increasing delay for simulation ratekeeper throttling assert
2022-08-31 15:55:05 -05:00
Josh Slocum dcfc03a247
Merge pull request #8061 from sfc-gh-jslocum/ss_ebrake_speed_up_sim
Minimizing effect of overly small buggified ss ebrake limits when speed up simulation is set
2022-08-31 15:54:41 -05:00
Yi Wu 49503987cc
Support Redwood encryption (#7376)
A new knob `ENABLE_STORAGE_SERVER_ENCRYPTION` is added. Despite its name, currently only Redwood supports it. The knob is meant to be used only in tests to exercise encryption in individual components; otherwise, encryption should be enabled through the general `ENABLE_ENCRYPTION` knob.

Under the hood, a new `Encryption` encoding type is added to `IPager`, which uses AES-256 to encrypt a page. With this encoding, a `BlobCipherEncryptHeader` is inserted into the page header to hold encryption metadata. Moreover, since we compute and store an SHA-256 auth token with the encryption header, we rely on it to checksum the data (and the encryption header) and skip the standard xxhash checksum.

`EncryptionKeyProvider` implements the `IEncryptionKeyProvider` interface to provide encryption keys. It uses the existing `getLatestEncryptCipherKey` and `getEncryptCipherKey` actors to fetch encryption keys from either the local cache or the EKP server. If multi-tenancy is used, when writing a new page `EncryptionKeyProvider` checks whether the page contains data for only a single tenant. If so, it fetches the tenant-specific encryption key; otherwise the system encryption key is used. The tenant check is done by extracting the tenant id from the page bound key prefixes. `EncryptionKeyProvider` also holds a reference to the `tenantPrefixIndex` map maintained by the storage server, which is used to check whether a tenant exists and to get the tenant name needed to fetch the encryption key.
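
The single-tenant check described above can be illustrated with a minimal sketch. It is not the code from this commit: the function names `tenantIdOf` and `singleTenantOfPage` are hypothetical, and the 8-byte big-endian tenant id prefix is an assumption made for illustration. The lookup against `tenantPrefixIndex` (to confirm the tenant exists) is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Assumption for this sketch: tenant keys start with an 8-byte big-endian tenant id.
constexpr std::size_t kTenantPrefixLen = 8;

// Decode the (hypothetical) tenant id prefix, or return nullopt if the key is too short.
std::optional<int64_t> tenantIdOf(const std::string& key) {
    if (key.size() < kTenantPrefixLen) return std::nullopt;
    uint64_t id = 0;
    for (std::size_t i = 0; i < kTenantPrefixLen; ++i)
        id = (id << 8) | static_cast<unsigned char>(key[i]);
    return static_cast<int64_t>(id);
}

// If both page bounds carry the same tenant prefix, every key between them
// (in lexicographic order) carries it too, so the page holds one tenant's data.
// Returns that tenant id, or nullopt to signal "use the system encryption key".
std::optional<int64_t> singleTenantOfPage(const std::string& lowerBound,
                                          const std::string& upperBound) {
    auto lo = tenantIdOf(lowerBound);
    auto hi = tenantIdOf(upperBound);
    if (lo && hi && *lo == *hi) return lo;
    return std::nullopt;
}
```

The sketch is deliberately conservative: a page whose upper bound is exactly the start of the next tenant's range would be treated as multi-tenant and fall back to the system key.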
2022-08-31 12:19:55 -07:00
Josh Slocum c587135988 Minimizing effect of overly small buggified ss ebrake limits when speed up simulation is set 2022-08-31 13:39:50 -05:00
Josh Slocum 9721de70b6 Adding knob and increasing delay for simulation ratekeeper throttling assert 2022-08-31 09:08:27 -05:00
Yao Xiao ac7a5823e2
Add knob for CF write buffer size. (#8038) 2022-08-30 17:52:29 -07:00
Fuheng Zhao 620c119e9a update storage server priorities 2022-08-30 12:07:45 -07:00
Yao Xiao 09f62acd14
Add delay to physical shard clean up. (#7989) 2022-08-29 11:30:50 -07:00
A.J. Beamon 2907d2d4dd
Merge pull request #8004 from sfc-gh-ajbeamon/fix-ub
Fix some undefined behavior in RK and a unit test
2022-08-29 09:16:11 -07:00
Evan Tschannen 8314e80371
Fixed a few bugs which caused ratekeeper to unnecessarily throttle a cluster (#8006)
* do not count recently created change feeds for throttling

* fix: blocked assignments were not decremented when force purging

* fix: created needs to be updated when the changefeed is reset

* added asserts to detect if ratekeeper is throttled on blob workers
2022-08-26 15:38:31 -07:00
A.J. Beamon 0e782412a8 Fix some undefined behavior: 1) a unit test was not initializing members of the WorkloadContext it was using, and 2) very large ratekeeper limits for batch priority were overflowing the types used to log them 2022-08-26 14:17:01 -07:00
Ata E Husain Bohra 00fe4863b6
Implement TenantCacheEntry in-memory cache (#7801)
* Implement TenantCacheEntry in-memory cache

Description

  diff-4: TraceEvent usage improvements 
  diff-3: Address review comments
  diff-2: Add APIs to read counter values, test improvements
  diff-1: Address review comments

Major changes include:
1. Implements an actor that enables in-memory caching of
TenantCacheEntry objects, allowing the caller to embed custom
information along with each TenantCacheEntry.
2. The cache follows read-through cache semantics where the entry
gets loaded from the underlying database on a miss.
3. The cache implements a "periodic poller" to refresh known Tenants
by consulting the database. Once a database keyrange-watch feature is
available, the cache shall be updated.
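
The read-through and refresh behaviour in items 2 and 3 can be sketched as follows. This is a synchronous, standalone illustration rather than the Flow actor added by the commit: `CacheEntry`, `ReadThroughCache`, `refreshAll`, and the `Loader` callback are hypothetical names, and the loader stands in for the database read.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a cached tenant entry carrying caller-defined data.
template <class Payload>
struct CacheEntry {
    int64_t tenantId;
    Payload payload;
};

template <class Payload>
class ReadThroughCache {
public:
    // The loader plays the role of the underlying database lookup.
    using Loader = std::function<CacheEntry<Payload>(const std::string&)>;

    explicit ReadThroughCache(Loader loader) : loader_(std::move(loader)) {}

    // Read-through: return the cached entry, loading it on a miss.
    const CacheEntry<Payload>& get(const std::string& name) {
        auto it = entries_.find(name);
        if (it == entries_.end()) {
            ++misses_;
            it = entries_.emplace(name, loader_(name)).first;
        } else {
            ++hits_;
        }
        return it->second;
    }

    // What a periodic poller would invoke: reload every known entry.
    void refreshAll() {
        for (auto& [name, entry] : entries_) entry = loader_(name);
    }

    int hits() const { return hits_; }
    int misses() const { return misses_; }

private:
    Loader loader_;
    std::unordered_map<std::string, CacheEntry<Payload>> entries_;
    int hits_ = 0;
    int misses_ = 0;
};
```

In this sketch a caller would construct the cache with a lambda that reads the tenant from storage, call `get` on the request path, and call `refreshAll` from a timer; the hit and miss counters mirror the "read counter values" idea mentioned in diff-2.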

Bonus:
Implement a 'recurringAsync' addition to genericActors, allowing the caller
to schedule a periodic task by registering an "actor functor"; the routine
'waits' for the actor, unlike the existing 'recurring' implementation.

Testing

TenantEntryCache workload
devCorrectnessRun - 100K
2022-08-25 11:42:26 -07:00
Ata E Husain Bohra d6b1ac056c
KMS connector to assist encryption enabled perf runs (#7978)
Description

FDB Native encryption requires integration with external
Key Management Services to fetch required encryption keys.
For simulation runs, there exists a SimKmsConnector implementation
that fakes interaction with an external KMS.

Major changes suggested in the patch:
1. Enable setting KMS_CONNECTOR_TYPE via command line arguments.
2. If "FDBPerfKmsConnector" is set as KMS_CONNECTOR_TYPE, then
allow using SimKmsConnector implementation.

Note: SimKmsConnector can handle process reboots.

Testing

devRunCorrectness - 100K
2022-08-25 10:00:46 -07:00
Chaoguang Lin 06aa6ee5ff
Add system monitor for flowprocess (#6925)
* Update network address in trace logs; Add system monitor for flowprocess

* Create a new trace file with the correct process address for flowprocess

* Remove unused debugging traces

* Add a new error lock_file_failure; Change please_reboot_remote_kv_store to please_reboot_kv_store; Add the code to only reboot the kv store but not the worker; Remove some unnecessary traces

* Add error handling for file_not_found in handleIOErrors

* Format worker.actor.cpp file
2022-08-24 00:40:38 -07:00
Evan Tschannen 493771b6a8
Throttle the cluster if the blob manager cannot assign ranges (#7900)
* Throttle the cluster if the blob manager cannot assign ranges

* fixed a number of different bugs which caused ratekeeper to throttle to zero because of blob worker lag

* fix: do not mark an assignment as blocked if it is cancelled

* remove asserts to merge bug fixes

* fix formatting

* restored old control flow to storage updater

* storage updater did not throw errors

* disable buggify to see if it fixes CI
2022-08-23 13:33:46 -05:00
Hui Liu 33411fd07e
add an option to read from blob storage for moving keys (#7848) 2022-08-23 08:07:17 -05:00
Jingyu Zhou b966c4de0c
Merge pull request #7936 from sfc-gh-tclinkenbeard/increase-tlog-max-create-duration-in-simulation
Increase `TLOG_MAX_CREATE_DURATION` to 15.0 in simulation
2022-08-22 09:31:11 -07:00
Josh Slocum 98a7ec1797
Blob Granules Cleanup (#7941)
* Cleaned up BlobGranule TODO + FIXMEs and addressed some

* popping feed at correct version

* blob worker taking over a granule will pop from where previous worker left off

* addressed fixme of blob worker not re-snapshotting from old change feed

* formatting

* more change feed popped fixes after pop updates

* Getting rid of change feed parallelism lock since it can cause deadlocks in fetching, and relying on full fetch lock

* New blob worker metric and fixing old one

* server-side popped checking still doesn't work because of pops at non-mutation versions

* format
2022-08-19 17:25:31 -07:00
Zhe Wang 2ceaae4219
dd physical shard core (#7703) 2022-08-19 14:47:00 -04:00
sfc-gh-tclinkenbeard 7449dbe5c4 Increase TLOG_MAX_CREATE_DURATION to 15.0 in simulation 2022-08-19 10:35:13 -07:00
He Liu 044f43b6c0
Fix shard mapping read (#7897)
* Read full range.

* Count empty shards.

* Added ROCKSDB_READ_RANGE_ROW_LIMIT.

* Fixed BytesLimit issue.

* TraceEvents.

* Cleaned up comments.

Co-authored-by: He Liu <heliu@apple.com>
2022-08-19 09:33:59 -07:00
Yao Xiao 9c20a07d35
Add delete_obsolete_files_period_micros to RocksDB options. (#7908) 2022-08-17 11:21:21 -07:00
Jingyu Zhou 120140b8b1
Merge pull request #7857 from yao-xiao-github/perf-metrics
Fix metrics and add tunable knobs.
2022-08-15 09:30:25 -07:00