529 Commits

Author SHA1 Message Date
sfc-gh-tclinkenbeard 003986fdb0 Randomize GLOBAL_TAG_THROTTLING knob 2022-10-18 15:16:24 -07:00
sfc-gh-tclinkenbeard 300840ea2e Enable GLOBAL_TAG_THROTTLING by default 2022-10-18 15:16:24 -07:00
Ankita Kejriwal fb014a4834 Merge branch 'main' of github.com:apple/foundationdb into monitorusage 2022-10-18 13:14:47 -07:00
sfc-gh-tclinkenbeard a556f21ed2 Merge remote-tracking branch 'origin/main' into limit-gtt-size 2022-10-15 10:26:16 -07:00
sfc-gh-tclinkenbeard 4ab947a9b3 Merge remote-tracking branch 'origin/main' into limit-gtt-size 2022-10-14 15:59:57 -07:00
Josh Slocum 914dfd7438
REST Blob metadata kms connector (#8474)
* Changing RESTKMSConnector request handling to not be synchronous, as that would be a huge perf bottleneck

* Implementation of blob kms fetch

* cleanup

* review comments
2022-10-14 17:49:00 -05:00
Josh Slocum 7cec0a5249
Blob metadata refresh (#8456)
* Adding EKP refresh of blob metadata

* Adding re-fetching blob metadata from BGTenantMap

* adding buggifies from code review comments
2022-10-14 08:17:50 -05:00
Hui Liu 169c341f79
Merge pull request #8386 from sfc-gh-huliu/blobmigrator
Add blob migrator to assist data copy from blob to storage server
2022-10-13 14:46:04 -07:00
Hui Liu 049df622f1 add a blob migrator 2022-10-13 13:21:45 -07:00
He Liu a43e424d8a Merge branch 'main' of https://github.com/apple/foundationdb into validate-data-consistency 2022-10-12 13:01:07 -07:00
Steve Atherton 52c831b7cd Merge commit 'd74c07238241305042bc2519e2952bdcf8f54351' into storageserver-pml
# Conflicts:
#	fdbclient/ServerKnobs.cpp
2022-10-11 00:19:20 -07:00
Steve Atherton d74c072382
Merge pull request #8448 from sfc-gh-yiwu/encrypt_disable
Add a knob to disable Redwood tenant page split
2022-10-10 21:17:43 -07:00
Yi Wu 5dc14accfd Add a knob to disable Redwood tenant page split 2022-10-10 17:02:37 -07:00
He Liu a730e32164 Merge branch 'main' of https://github.com/apple/foundationdb into validate-data-consistency 2022-10-10 13:17:17 -07:00
neethuhaneesha 4b238e3985
RocksDB disable WAL for experimentation. (#8443) 2022-10-10 12:32:05 -07:00
sfc-gh-tclinkenbeard b887b1e85c Limit number of tags tracked by GlobalTagThrottler 2022-10-10 12:28:26 -07:00
He Liu b52edd8658 Merge branch 'main' of https://github.com/apple/foundationdb into validate-data-consistency 2022-10-10 11:00:05 -07:00
Steve Atherton 3228afefd3 Unrevert #7578 - storage server PriorityMultiLock and PML rewrite. 2022-10-06 23:41:28 -07:00
He Liu 88c37c81d8 Merge branch 'main' of https://github.com/apple/foundationdb into thread-priority 2022-10-05 14:20:53 -07:00
Hui Liu 9799329b99
Merge pull request #8390 from sfc-gh-huliu/actor
Misc changes for blob manifest dumper
2022-10-05 10:30:56 -07:00
neethuhaneesha a565863189
RocksDB compaction knobs and stats. (#8392) 2022-10-04 15:13:50 -07:00
Hui Liu 50b3a8d3ba misc changes for manifest dumper
1) move dumper to a dedicated actor
2) include blob manager epoch into the manifest name
3) keep multiple manifest files(up to 5)
2022-10-04 10:33:43 -07:00
He Liu 5f975623fb Merge branch 'main' of https://github.com/apple/foundationdb into thread-priority 2022-09-30 08:57:53 -07:00
Jingyu Zhou c321d633c7
Merge pull request #8361 from halfprice/zhewu/update-default-gray-latency-percentile
Update default peer latency degradation percentile to 0.5
2022-09-30 08:34:31 -07:00
He Liu 63b8d775a3
Knob for RocksDb behaviors (#8360)
* Added knob for rocksdb suggest compact.

* Added ROCKSDB_LEVEL_COMPACTION_DYNAMIC_LEVEL_BYTES knob.

Co-authored-by: He Liu <heliu@apple.com>
2022-09-29 15:40:29 -07:00
Zhe Wu 78f29ecd88 Update default PEER_LATENCY_DEGRADATION_PERCENTILE to 0.5 2022-09-29 14:18:26 -07:00
Trevor Clinkenbeard ef13985feb
Merge pull request #8320 from sfc-gh-tclinkenbeard/read-write-fungible
Make read and write quotas fungible
2022-09-28 10:59:37 -07:00
sfc-gh-tclinkenbeard ba8fbc9573 Merge remote-tracking branch 'origin/main' into read-write-fungible 2022-09-26 23:26:14 -07:00
Zhe Wu 0188f2712b Update default PEER_DEGRADATION_CONNECTION_FAILURE_COUNT value 2022-09-26 14:45:47 -07:00
Evan Tschannen a900d8dfa9
increase the target blob worker lag to account for the slow startup time when a blob worker reboots (#8294) 2022-09-26 09:19:38 -07:00
sfc-gh-tclinkenbeard 7fc5c196c4 Make read and write quotas fungible 2022-09-25 21:00:11 -07:00
Markus Pilman 5774249e5b
Revert "[DRAFT] Redwood PriorityMultiLock enable different launch limits to be specified based on different priority level." 2022-09-23 12:22:47 -06:00
Ata E Husain Bohra 52169d2b8e
Enable ZSTD compression support (#8014)
* Enable ZSTD compression filter

Description

  diff-4: Randomize Knob Compression filter selection
  diff-3: Minor refactoring
  diff-2: Limit ZSTD availability to CLANG compiler
  diff-1: Add ZSTD compression option to BlobGranule tests

Major changes include:
1. Update FDB CMake to download, install and build Boost with
ZSTD compatibility
2. Update CompressionUtils to enable boost::iostreams::zstd
compression filter

Testing

CompressionUtilsUnit.toml
BlobGranuleCorrectness/BlobGranuleCorrectnessClean
devRunCorrectness - 100K (in-progress)
2022-09-22 14:31:49 -07:00
Steve Atherton 04b4960786 Merge branch 'main' into fzhao/RedwoodIOLaunchLimit
# Conflicts:
#	fdbserver/VersionedBTree.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/ReadWrite.actor.cpp
2022-09-22 00:39:51 -07:00
neethuhaneesha c4f59ba654
Merge pull request #8261 from neethuhaneesha/rocksdb-cachesize
RocksDB block cache size increase.
2022-09-21 14:50:31 -07:00
Josh Slocum 6270016bed
Seq insert perf fixes main (#8264)
* Force flushing granules post-split to guarantee parent feeds get cleaned up

* fixing bug and cleaning up split finalize code
2022-09-21 12:36:02 -07:00
neethuhaneesha bc9e806f1c RocksDB block cache size increase. 2022-09-21 11:41:27 -07:00
Fuheng Zhao 92aee50dbf update format 2022-09-20 22:51:45 -07:00
Fuheng Zhao 7002150188 add storage server read concurrency knob 2022-09-20 21:23:54 -07:00
Steve Atherton 3c306cc558 Merge branch 'fzhao/feature-testing' into fzhao/RedwoodIOLaunchLimit 2022-09-20 01:14:04 -07:00
He Liu 2e2c198df7 Added knobs for RocksDB reader/writer thread priorities. 2022-09-19 11:20:17 -07:00
Steve Atherton 74b152e550 Removed two obsolete things: explicit maxPriority argument from PriorityMultiLock as it is redundant after the launch limit refactor, and the redwood read concurrency lock which is no longer needed after the StorageServer priority refactor as it will control the concurrency of requests sent to the KVS. 2022-09-18 18:23:15 -07:00
Steve Atherton 31a66f9f12
Add comment explaining STORAGESERVER_READ_RANKS 2022-09-17 22:44:45 -07:00
sfc-gh-ngoyal 1bd97fe628
Recruit new singleton for consistency checker. (#5804)
* Recruit new singleton for consistency checker.

* Recruit the consistency checker only if enabled.

* Add a yield in monitorConsistencyChecker().

* Minor fixes.

* Consistency check workload enhancements.

* Minor fixes and clarifications.

* clang format

* Clang format.

* Minor fixes, cleanup, debug tracing.

* Misc.

* Move the consistency scan information from dbconfig to a key backed object.

* Move consistency scan config out of db config to a state object and feature rename.

* ConsistencyCheck workload refactor.

* devFormat

* Update fdbcli/ConsistencyScanCommand.actor.cpp

* Review Comments.

Co-authored-by: negoyal <neelam.goyal@gmail.com>
Co-authored-by: Ata E Husain Bohra <ata.husain@snowflake.com>
2022-09-16 09:03:06 -07:00
Fuheng Zhao ac65c3f569 merge upstream main 2022-09-15 14:19:19 -07:00
He Liu 6f7968b618 Merge branch 'main' of https://github.com/apple/foundationdb into validate-data-consistency 2022-09-15 10:15:33 -07:00
Hui Liu 59be25848f bootstrap blob manager and blob worker from blob manifest 2022-09-15 09:50:12 -07:00
Ata E Husain Bohra b540a3d6b9
Disable zlib find_package, effectively disable gzip compression (#8179)
Description

find_package was used to find and link `zlib` library needed to enable
boost::gzip compression filter. However, the code adds dynamic linkage
of zlib shared object with generated binaries (fdbserver for instance).

For now disable the ZLIB find code to effectively disable GZIP compression
support.

Testing
2022-09-14 14:03:13 -07:00
Josh Slocum d4ba6c266c
Merge pull request #8176 from sfc-gh-jslocum/ss_cf_burst_fix_main
Fixing Thundering Herd problem of change feed stream retries in SS
2022-09-14 16:01:20 -05:00
Jingyu Zhou e70a18e638
Merge pull request #8122 from xumengpanda/mengxu/io-timeout-main
Add STORAGE_SERVER_REBOOT_ON_IO_TIMEOUT knob to reboot SS on io_timeout
2022-09-14 10:11:46 -07:00
Josh Slocum 3e5e49b635 Operational improvements to limit thundering herd effect of many change feed queries being retried simultaneously 2022-09-14 09:57:21 -05:00
Lukas Joswiak 09892df0b0 Remove unused knob 2022-09-13 16:53:54 -07:00
Lukas Joswiak b2d395a304 Delay cluster controller restart when pushing knob updates to workers
This gives the `ConfigBroadcaster` time to send the knob change to all
workers before applying the change to itself and restarting.
2022-09-13 16:53:54 -07:00
He Liu 958b28497e Merge branch 'main' of https://github.com/apple/foundationdb into validate-data-consistency 2022-09-13 13:55:01 -07:00
Fuheng Zhao ee99de7cf3
Merge branch 'apple:main' into RedwoodIOLaunchLimit 2022-09-12 10:58:12 -07:00
Josh Slocum e66015cbe4
Including change feed bytes towards storage queue (#8107) 2022-09-08 16:37:44 -07:00
He Liu 9e911956d9 Handle GetKeyValuesReply errors. 2022-09-08 15:41:22 -07:00
Fuheng Zhao f64f245269 update storage server PML parameters 2022-09-08 10:49:18 -07:00
Meng Xu 358a0dd88d Add STORAGE_SERVER_REBOOT_ON_IO_TIMEOUT knob to reboot SS on io_timeout 2022-09-07 15:30:04 -07:00
He Liu 033741daab Audit should always complete, any failures are retried. 2022-09-02 09:11:19 -07:00
He Liu 0bbce98da2
Disable shard aware (#8072)
* Removed STORAGE_SERVER_SHARD_AWARE knob.

* Fixed PhysicalShardMove test.

Co-authored-by: He Liu <heliu@apple.com>
2022-09-02 09:07:34 -07:00
Fuheng Zhao fe5f1fab19 sync 2022-08-31 16:11:21 -07:00
Fuheng Zhao 0aa096dc17 sync with upstream main 2022-08-31 15:46:39 -07:00
Josh Slocum e7a82c9283
Merge pull request #8059 from sfc-gh-jslocum/bw_rk_fix
Adding knob and increasing delay for simulation ratekeeper throttling assert
2022-08-31 15:55:05 -05:00
Josh Slocum dcfc03a247
Merge pull request #8061 from sfc-gh-jslocum/ss_ebrake_speed_up_sim
Minimizing effect of overly small buggified ss ebrake limits when speed up simulation is set
2022-08-31 15:54:41 -05:00
Yi Wu 49503987cc
Support Redwood encryption (#7376)
A new knob `ENABLE_STORAGE_SERVER_ENCRYPTION` is added; despite its name, only Redwood currently supports it. The knob is meant to be used only in tests, to test encryption in individual components; otherwise, encryption should be enabled through the general `ENABLE_ENCRYPTION` knob.

Under the hood, a new `Encryption` encoding type is added to `IPager`, which uses AES-256 to encrypt a page. With this encoding, `BlobCipherEncryptHeader` is inserted into the page header for encryption metadata. Moreover, since we compute and store a SHA-256 auth token with the encryption header, we rely on it to checksum the data (and the encryption header), and skip the standard xxhash checksum.

`EncryptionKeyProvider` implements the `IEncryptionKeyProvider` interface to provide encryption keys, using the existing `getLatestEncryptCipherKey` and `getEncryptCipherKey` actors to fetch encryption keys from either the local cache or the EKP server. If multi-tenancy is used, then when writing a new page `EncryptionKeyProvider` checks whether the page contains data for only a single tenant; if so, it fetches the tenant-specific encryption key, otherwise the system encryption key is used. The tenant check is done by extracting the tenant id from the page bound key prefixes. `EncryptionKeyProvider` also holds a reference to the `tenantPrefixIndex` map maintained by the storage server, which is used to check whether a tenant exists and to get the tenant name in order to get the encryption key.
2022-08-31 12:19:55 -07:00
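The tenant check described in the Redwood encryption commit above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the 8-byte big-endian prefix layout, the `SYSTEM_DOMAIN` marker, and the function names are assumptions made for the example, and the real check works against the `tenantPrefixIndex` map rather than a plain set.

```python
import struct
from typing import Optional, Set

TENANT_PREFIX_LEN = 8  # assumption: tenant id is an 8-byte big-endian key prefix
SYSTEM_DOMAIN = -1     # hypothetical marker meaning "use the system encryption key"


def tenant_id_of(key: bytes) -> Optional[int]:
    """Return the tenant id encoded in the key prefix, or None if the key is too short."""
    if len(key) < TENANT_PREFIX_LEN:
        return None
    return struct.unpack(">q", key[:TENANT_PREFIX_LEN])[0]


def choose_encryption_domain(lower: bytes, upper: bytes, known_tenants: Set[int]) -> int:
    """Pick a tenant key only if both page bounds carry the same known tenant prefix."""
    lo, hi = tenant_id_of(lower), tenant_id_of(upper)
    if lo is not None and lo == hi and lo in known_tenants:
        return lo
    # Mixed tenants, unknown tenant, or non-tenant keys: fall back to the system key.
    return SYSTEM_DOMAIN
```

A page whose bounds both start with the prefix of tenant 7 would get tenant 7's key; a page spanning two tenants, or one whose tenant is not known, would get the system key.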
Josh Slocum c587135988 Minimizing effect of overly small buggified ss ebrake limits when speed up simulation is set 2022-08-31 13:39:50 -05:00
Josh Slocum 9721de70b6 Adding knob and increasing delay for simulation ratekeeper throttling assert 2022-08-31 09:08:27 -05:00
Yao Xiao ac7a5823e2
Add knob for CF write buffer size. (#8038) 2022-08-30 17:52:29 -07:00
Fuheng Zhao 620c119e9a update storage server priorities 2022-08-30 12:07:45 -07:00
Yao Xiao 09f62acd14
Add delay to physical shard clean up. (#7989) 2022-08-29 11:30:50 -07:00
A.J. Beamon 2907d2d4dd
Merge pull request #8004 from sfc-gh-ajbeamon/fix-ub
Fix some undefined behavior in RK and a unit test
2022-08-29 09:16:11 -07:00
Evan Tschannen 8314e80371
Fixed a few bugs which caused ratekeeper to unnecessarily throttle a cluster (#8006)
* do not count recently created change feeds for throttling

* fix: blocked assignments were not decremented when force purging

* fix: created needs to be updated when the changefeed is reset

* added asserts to detect if ratekeeper is throttled on blob workers
2022-08-26 15:38:31 -07:00
A.J. Beamon 0e782412a8 Fix some undefined behavior: 1) a unit test was not initializing members of the WorkloadContext it was using, and 2) very large ratekeeper limits for batch priority were overflowing the types used to log them 2022-08-26 14:17:01 -07:00
Ata E Husain Bohra 00fe4863b6
Implement TenantCacheEntry in-memory cache (#7801)
* Implement TenantCacheEntry in-memory cache

Description

  diff-4: TraceEvent usage improvements 
  diff-3: Address review comments
  diff-2: Add APIs to read counter values, test improvements
  diff-1: Address review comments

Major changes include:
1. Implements an actor that enables an in-memory caching of
TenantCacheEntry object, allowing the caller to embed custom
information along with TenantCacheEntry.
2. The cache follows read-through cache semantics where the entry
gets loaded from underlying database on a miss.
3. The cache implements a "periodic poller" to refresh known Tenants
by consulting the database. Once a database keyrange-watch feature is
available, cache shall be updated.

Bonus:
Implement a 'recurringAsync' addition to genericActors allowing caller
to schedule a periodic task registering an "actor functor"; the routine
'waits' for the actor unlike existing 'recurring' implementation.

Testing

TenantEntryCache workload
devCorrectnessRun - 100K
2022-08-25 11:42:26 -07:00
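The read-through cache and periodic poller described in the TenantCacheEntry commit above can be sketched as below. This is a minimal illustration under assumptions, not the actor from the commit: `ReadThroughCache`, `recurring_async`, the string-valued entries, and the bounded `rounds` parameter are all hypothetical names chosen for the example.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class ReadThroughCache:
    """Serve entries from memory; load from the backing store on a miss."""

    def __init__(self, load: Callable[[str], Awaitable[str]]):
        self._load = load                    # hypothetical async database lookup
        self._entries: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    async def get(self, name: str) -> str:
        if name in self._entries:
            self.hits += 1
            return self._entries[name]
        self.misses += 1
        value = await self._load(name)       # read-through on a miss
        self._entries[name] = value
        return value

    async def refresh(self) -> None:
        """Poller body: re-read every known entry from the backing store."""
        for name in list(self._entries):
            self._entries[name] = await self._load(name)


async def recurring_async(task: Callable[[], Awaitable[None]],
                          interval: float, rounds: int) -> None:
    """Run `task` repeatedly, waiting for each run to finish before sleeping."""
    for _ in range(rounds):
        await task()
        await asyncio.sleep(interval)
```

Awaiting the task before sleeping is the point of the `recurringAsync` variant the commit mentions: a slow refresh cannot overlap with the next one. The `rounds` bound exists only so the sketch terminates.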
Ata E Husain Bohra d6b1ac056c
KMS connector to assist encryption enabled perf runs (#7978)
Description

FDB Native encryption requires integration with external
Key Management Services to fetch required encryption keys.
For simulation runs, there exists SimKmsConnector implementation
that fakes interaction with external KMS.

Major changes suggested in the patch:
1. Enable setting KMS_CONNECTOR_TYPE via command line arguments.
2. If "FDBPerfKmsConnector" is set as KMS_CONNECTOR_TYPE, then
allow using SimKmsConnector implementation.

Note: SimKmsConnector can handle process reboots.

Testing

devRunCorrectness - 100K
2022-08-25 10:00:46 -07:00
Chaoguang Lin 06aa6ee5ff
Add system monitor for flowprocess (#6925)
* Update network address in trace logs; Add system monitor for flowprocess

* Create a new trace file with the correct process address for flowprocess

* Remove unused debugging traces

* Add a new error lock_file_failure; Change please_reboot_remote_kv_store to please_reboot_kv_store; Add the code to only reboot the kv store but not the worker; Remove some unnecessary traces

* Add error handling for file_not_found in handleIOErrors

* Format worker.actor.cpp file
2022-08-24 00:40:38 -07:00
Ankita Kejriwal c47e1b6f53 Add a knob for tenant cache storage usage refresh interval and some minor fixes 2022-08-23 17:52:17 -07:00
Evan Tschannen 493771b6a8
Throttle the cluster if the blob manager cannot assign ranges (#7900)
* Throttle the cluster if the blob manager cannot assign ranges

* fixed a number of different bugs which caused ratekeeper to throttle to zero because of blob worker lag

* fix: do not mark an assignment as blocked if it is cancelled

* remove asserts to merge bug fixes

* fix formatting

* restored old control flow to storage updater

* storage updater did not throw errors

* disable buggify to see if it fixes CI
2022-08-23 13:33:46 -05:00
Hui Liu 33411fd07e
add an option to read from blob storage for moving keys (#7848) 2022-08-23 08:07:17 -05:00
Jingyu Zhou b966c4de0c
Merge pull request #7936 from sfc-gh-tclinkenbeard/increase-tlog-max-create-duration-in-simulation
Increase `TLOG_MAX_CREATE_DURATION` to 15.0 in simulation
2022-08-22 09:31:11 -07:00
Josh Slocum 98a7ec1797
Blob Granules Cleanup (#7941)
* Cleaned up BlobGranule TODO + FIXMEs and addressed some

* popping feed at correct version

* blob worker taking over a granule will pop from where previous worker left off

* addressed fixme of blob worker not re-snapshotting from old change feed

* formatting

* more change feed popped fixes after pop updates

* Getting rid of change feed parallelism lock since it can cause deadlocks in fetching, and relying on full fetch lock

* New blob worker metric and fixing old one

* server-side popped checking still doesn't work because of pops at non-mutation versions

* format
2022-08-19 17:25:31 -07:00
Zhe Wang 2ceaae4219
dd physical shard core (#7703) 2022-08-19 14:47:00 -04:00
sfc-gh-tclinkenbeard 7449dbe5c4 Increase TLOG_MAX_CREATE_DURATION to 15.0 in simulation 2022-08-19 10:35:13 -07:00
He Liu 044f43b6c0
Fix shard mapping read (#7897)
* Read full range.

* Count empty shards.

* Added ROCKSDB_READ_RANGE_ROW_LIMIT.

* Fixed BytesLimit issue.

* TraceEvents.

* Cleaned up comments.

Co-authored-by: He Liu <heliu@apple.com>
2022-08-19 09:33:59 -07:00
Yao Xiao 9c20a07d35
Add delete_obsolete_files_period_micros to RocksDB options. (#7908) 2022-08-17 11:21:21 -07:00
Jingyu Zhou 120140b8b1
Merge pull request #7857 from yao-xiao-github/perf-metrics
Fix metrics and add tunable knobs.
2022-08-15 09:30:25 -07:00
Xiaoxi Wang 9133d4e16d
Merge pull request #7803 from sfc-gh-xwang/feature/main/ddvisibility
Add server selection counter in DDQueue
2022-08-12 15:10:25 -07:00
Evan Tschannen a9d3c9f9b3
Added throttling when a blob worker falls behind (#7751)
* throttle the cluster when blob workers fall behind

* do not throttle on blob workers if they are not enabled

* remove an unnecessary actor

* fixed a compile error

* fetch blob worker metrics at the same interval as the rate is updated, avoid fetching the complete blob worker list too frequently

* fixed another compilation bug

* added a 5 second delay before bw throttling to prevent false positives caused by the 100e6 version jump during recovery. Lower the throttling thresholds to react much quicker to bw lag.

* fixed a number of problems

* changed the minBlobVersionRequest to look at storage server versions since this will be a lot more efficient

* fix: do not let desired go backwards

* fix: track the version of notAtLatest changefeeds for throttling

* ratekeeper now throttles blob workers by estimating the transaction per second throughput of the blob workers

* added metrics for blob worker change feeds

* added a knob to disable bw throttling

* fixed the transaction options in blob manager
2022-08-12 13:15:56 -07:00
Jingyu Zhou 5929ac1d65
Merge pull request #7847 from xis19/knobCheck
Cleanup the knobs that are not being used
2022-08-11 21:52:52 -07:00
Xiaoge Su 114c266b04 fixup! Recover the KMS knob which is useful 2022-08-11 15:15:14 -07:00
Yao Xiao 599e4b86d5 Add more knobs 2022-08-11 15:13:42 -07:00
Josh Slocum 44f8bdd258
Blob Worker memory limit (#7858)
* Simulation version of blob_worker_full

* tracking blocked BM assignments

* actual memory estimation implementation
2022-08-11 15:07:08 -07:00
Trevor Clinkenbeard 583021c2d9
Merge pull request #7772 from sfc-gh-tclinkenbeard/global-tag-throttling6
Add status section for global tag throttler
2022-08-11 17:38:31 -03:00
Yao Xiao fcfe6a8c29 Fix metrics and add knobs 2022-08-11 11:46:21 -07:00
Xiaoge Su 85836cbec9 Cleanup the knobs that are not being used 2022-08-10 16:51:01 -07:00
Xiaoxi Wang 1cff154adb Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/ddvisibility 2022-08-10 12:03:42 -07:00
Xiaoxi Wang f6dce5dcee add knob to control summarize 2022-08-10 09:44:01 -07:00
Xiaoxi Wang 319ec2f43d add summarize event 2022-08-09 18:22:48 -07:00
Xiaoxi Wang 9032f6c394 add comments; change default knobs 2022-08-09 16:45:15 -07:00
Josh Slocum 62494f048c
several changes to manage blob worker memory more and to test that management (#7834) 2022-08-09 17:53:52 -05:00
He Liu fa3e462662 Added validateStorageQ in SS. 2022-08-09 15:05:20 -07:00
Xiaoxi Wang ea0c60381f merge upstream/main 2022-08-09 12:28:57 -07:00
Jingyu Zhou eba77d78f4 Add knobs for min/max Ratekeeper limit
The default has no effect.
2022-08-08 15:27:21 -07:00
Xiaoxi Wang 2b2bc12cc1 Update Document; set log limit 2022-08-08 10:04:48 -07:00
Xiaoxi Wang b18e29dd87 Merge remote-tracking branch 'upstream' into feature/main/ddvisibility 2022-08-06 21:43:36 -07:00
Xiaoxi Wang 8ecee1992b
Merge pull request #7777 from sfc-gh-xwang/feature/main/eligible-wiggle
Make storage wiggler support SS_MIN_AGE
2022-08-06 17:01:46 -07:00
Xiaoxi Wang 06ae0a2f4c ddqueue.periodicalRefreshCounter() 2022-08-05 15:26:34 -07:00
Josh Slocum b2835921ba
Using knownBlobRanges for blob granule ranges whether tenants are enabled or not (#7788)
* Using knownBlobRanges for blob granule ranges whether tenants are enabled or not

* Effectively disabled blob granule tests when tenants enabled to fix ctest
2022-08-05 11:46:09 -05:00
sfc-gh-tclinkenbeard 1bd47a07b2 Add ENFORCE_TAG_THROTTLING_ON_PROXIES knob 2022-08-05 00:40:10 -07:00
Fuheng Zhao d5c3679046 merge upstream main and resolve conflicts 2022-08-04 12:15:00 -07:00
Xiaoxi Wang 4b3acf6b4d update SS_MIN_AGE=21 day 2022-08-03 17:18:15 -07:00
He Liu fa418fd784
Change SHARD_ENCODE_LOCATION_METADATA to a server knob. (#7770)
Co-authored-by: He Liu <heliu@apple.com>
2022-08-03 13:51:40 -07:00
Xiaoxi Wang 2cd15073d5 make storage wiggler support SS_MIN_AGE 2022-08-02 16:21:41 -07:00
Trevor Clinkenbeard edf4e60fa9
Merge pull request #7631 from sfc-gh-tclinkenbeard/global-tag-throttling5
Improvements to `GlobalTagThrottler`
2022-08-02 16:04:20 -07:00
Dennis Zhou b34a54fa7f
blob: allow for alignment of granules to tuple boundaries (#7746)
* blob: read TenantMap during recovery

Future functionality in the blob subsystem will rely on the tenant data
being loaded. This fixes this issue by loading the tenant data before
completing recovery such that continued actions on existing blob
granules will have access to the tenant data.

Example scenario with failover, splits are restarted before loading the
tenant data:
BM - BlobManager
epoch 3:                        epoch 4:
  BM record intent to split.
  Epoch fails.
                                BM recovery begins.
  BM fails to persist split.
                                BM recovery finishes.
                                BM.checkBlobWorkerList()
                                  maybeSplitRange().
                                BM.monitorClientRanges().
                                  loads tenant data.

bin/fdbserver -r simulation -f tests/slow/BlobGranuleCorrectness.toml \
    -s 223570924 -b on  --crash --trace_format json

* blob: add tuple key truncation for blob granule alignment

FDB has a backup system available using the blob manager and blob
granule subsystem. If we want to audit the data in the blobs, it's a lot
easier if we can align them to something meaningful.

When a blob granule is being split, we ask the storage metrics system
for split points as it holds approximate data distribution metrics.
These keys are then processed to determine if they are a tuple and
should be truncated according to the new knob,
BG_KEY_TUPLE_TRUNCATE_OFFSET.

Here we keep all aligned keys together in the same granule even if it is
larger than the allowed granule size. The following commit will address
this by adding merge boundaries.

* blob: minor clean ups in merging code

1. Rename mergeNow -> seen. This is more in line with clocksweep naming
   and removes the confusion between mergeNow and canMergeNow.
2. Make clearMergeCandidate() reset to MergeCandidateCannotMerge to make
   a clear distinction what we're accomplishing.
3. Rename canMergeNow() -> mergeEligble().

* blob: add explicit (hard) boundaries

Blob ranges can be specified either through explicit ranges or at the
tenant level. Right now this is managed implicitly. This commit aims to
make it a little more explicit.

Blobification begins in monitorClientRanges() which parses either the
explicit blob ranges or the tenant map. As we do this and add new
ranges, let's explicitly track what is a hard boundary and what isn't.

When blob merging occurs, we respect this boundary. When a hard boundary
is encountered, we submit the found eligible ranges and start looking
for a new range beginning with this hard boundary.

* blob: create BlobGranuleSplitPoints struct

This is a setup for the following commit. Our goal here is to provide a
structure for split points to be passed around. The need is for us to be
able to carry uncommitted state until it is committed and we can apply
these mutations to the in-memory data structures.

* blob: implement soft boundaries

An earlier commit establishes the need to create data boundaries within
a tenant. The reality is we may encounter a set of keys that degenerate
to the same key prefix. We'll need to be able to split those across
granules, but we want to ensure we merge the split granules together
before merging with other granules.

This adds new BlobGranuleMergeBoundary items to the BlobGranuleSplitPoints
state. Each BlobGranuleMergeBoundary records whether it is a left or right
boundary. As with hard boundaries, this information is used to force
merging of like granules first.

We read the BlobGranuleMergeBoundary map into memory at recovery.
2022-08-02 16:06:25 -05:00
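Two ideas from the granule alignment commit above, tuple key truncation and merging that respects hard boundaries, can be sketched as follows. This is an illustration under assumptions rather than the blob manager code: keys are modelled as already-decoded Python tuples (the real code works on encoded keys), the offset is treated as the number of leading tuple elements to keep, a non-positive offset is treated as "disabled", and both function names are hypothetical.

```python
from typing import List, Set, Tuple

Key = Tuple  # assumption: keys are decoded tuples, sorted in key order


def align_split_points(points: List[Key], truncate_offset: int) -> List[Key]:
    """Truncate each split key to its leading elements and drop adjacent duplicates."""
    if truncate_offset <= 0:          # assumption: non-positive offset disables alignment
        return list(points)
    aligned: List[Key] = []
    for point in points:
        truncated = point[:truncate_offset]
        if not aligned or aligned[-1] != truncated:
            aligned.append(truncated)
    return aligned


def merge_groups(granule_starts: List[Key], eligible: List[bool],
                 hard_boundaries: Set[Key]) -> List[List[Key]]:
    """Group consecutive merge-eligible granules without crossing a hard boundary."""
    groups: List[List[Key]] = []
    current: List[Key] = []
    for start, ok in zip(granule_starts, eligible):
        # A hard boundary or an ineligible granule ends the run collected so far.
        if current and (start in hard_boundaries or not ok):
            if len(current) > 1:
                groups.append(current)
            current = []
        if ok:
            current.append(start)     # a new run may begin at the hard boundary itself
    if len(current) > 1:
        groups.append(current)
    return groups
```

Because truncation collapses several candidate split points into one, keys that share the truncated prefix stay in the same granule, which matches the commit's note that an aligned granule may exceed the allowed size.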
Josh Slocum 4b66645d80
Granule file performance benchmark and improvements (#7742)
* added cpu microbenchmark for blob granule files

* Added edge case read benchmarks, and sorting memory deltas

* Sorted merge for granule files

* key block comparison optimization in granule files

* More performance improvements to granule file read

* fixing zlib not supported build

* fixing formatting

* Added debug macro for new debugging prints

* review comments

* more strict compression size validation assert
2022-08-02 11:36:44 -05:00
Josh Slocum 4d2f90977d
Merge pull request #7656 from sfc-gh-jslocum/cf_bw_operational_fixes
Cf bw operational fixes
2022-07-26 16:24:26 -05:00
Josh Slocum c32e1da908
Merge pull request #7673 from sfc-gh-jslocum/delta_files_v2
Sorted Delta Files
2022-07-26 16:04:55 -05:00
Josh Slocum 15e7a4b186 addressing review comments 2022-07-26 14:20:35 -05:00
Fuheng Zhao f761f9a03a use DefaultEndPoint as the default priority for storage server reads 2022-07-25 10:10:42 -07:00
Josh Slocum ea9018460a cleanup and polish 2022-07-22 15:13:32 -05:00
Lukas Joswiak 703aa1d279 Mess with timeout values 2022-07-22 10:37:29 -07:00
Lukas Joswiak 40d403ed5f Reduce global configuration system key reads from proxy
Clients now poll the proxy for the latest global config for a specific
version. The proxy now periodically requests the latest global
configuration data and stores it in memory, enabling it to respond
immediately to clients with the appropriate version.
2022-07-22 10:37:29 -07:00
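The proxy-side caching described in the global configuration commit above can be sketched like this. It is a simplified, synchronous illustration: `ProxyConfigCache`, the `fetch` callback standing in for the system key read, and the rule that a client at the current version receives nothing are assumptions for the example, not the behaviour confirmed by the commit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ProxyConfigCache:
    # Hypothetical stand-in for reading the global configuration system keys.
    fetch: Callable[[], Tuple[int, Dict[str, str]]]
    version: int = 0
    data: Dict[str, str] = field(default_factory=dict)
    fetches: int = 0

    def refresh(self) -> None:
        """Run periodically by the proxy; the only place the system keys are read."""
        self.version, self.data = self.fetch()
        self.fetches += 1

    def poll(self, client_version: int) -> Optional[Tuple[int, Dict[str, str]]]:
        """Answer a client from memory: newer config, or None if the client is current."""
        if client_version >= self.version:
            return None
        return self.version, dict(self.data)
```

However many clients poll, the number of system key reads depends only on how often `refresh` runs, which is the reduction the commit title refers to.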
Lukas Joswiak 56dfdbda83 Add migration timeout 2022-07-22 10:37:29 -07:00
Lukas Joswiak 2e99d5f6cc Batch global config refresh requests 2022-07-22 10:37:29 -07:00
He Liu 7a8be255cd
Ss shard management (#7340)
* Storage server shard management with physical shards.

* Cleanup.

* Resolved comments.

* Added `UnlimintedCommitBytes`.

Co-authored-by: He Liu <heliu@apple.com>
2022-07-22 09:30:44 -07:00
Josh Slocum 0b674bf0c4 Merge branch 'main' into cf_bw_operational_fixes 2022-07-20 08:03:59 -05:00
Josh Slocum 78b6a96006 Merge branch 'main' into granule_merging_batch 2022-07-20 07:42:26 -05:00
sfc-gh-tclinkenbeard fe05cc5c72 Update busy read tag reporting in status json 2022-07-19 16:29:11 -07:00
Josh Slocum 306610bfcb batch periodic merging in blob manager 2022-07-15 15:52:10 -05:00
Ata E Husain Bohra f288abebc2 BlobFile Encryption and compression support
Fix formatting issues and rename KNOB

Description

Testing
2022-07-14 17:22:00 -07:00
Ata E Husain Bohra f6f117592d BlobFile Encryption and compression support
- Limit verbose logging under DEBUG_MACRO
 - Update/Add code documentation

Description

Testing
2022-07-14 17:04:14 -07:00
Ata E Husain Bohra 24b2de8de8 BlobFile Encryption and compression support
Description

Testing
2022-07-14 17:04:14 -07:00
Fuheng Zhao 312e160a12 use PriorityMultiLock in storage server 2022-07-14 15:29:54 -07:00
Lukas Joswiak 407300bfa6 Disable testing of the remote key value store in simulation 2022-07-13 18:32:50 -07:00
Josh Slocum b85fbaef52
Merge pull request #7395 from sfc-gh-jslocum/bg_file_chunking
Chunked Snapshot Files
2022-07-13 17:22:34 -05:00
Fuheng Zhao d77695b77f use explicit number for ioMaxPriority 2022-07-12 11:54:10 -07:00
Josh Slocum 0b0ac16a4c Merge branch 'main' into granule_merging 2022-07-12 09:09:30 -05:00
Josh Slocum c6700fe62f Merge branch 'main' into bg_file_chunking 2022-07-12 08:28:06 -05:00
Fuheng Zhao 39b37a80ef remove comments and format 2022-07-11 16:10:38 -07:00
Fuheng Zhao 358b592458 Merge branch 'main' of https://github.com/apple/foundationdb into RedwoodIOLaunchLimit 2022-07-11 15:19:35 -07:00
Fuheng Zhao 0955419418 move ParsingStringVector function to genericactor class 2022-07-11 11:13:32 -07:00
Josh Slocum 33fcdc4764 Change Feed and Blob Worker operational improvements 2022-07-11 12:29:51 -05:00
Chaoguang Lin 901d988de9
Add a knob SNAPSHOT_ALL_STATEFUL_PROCESSES to snapshot all processes with stateful class type(storage, log, transaction) even if they are not recruited (#7554) 2022-07-08 20:53:49 -07:00
He Liu bc5bfaffda
Shard based move (#6981)
* Shard based move.

* Clean up.

* Clear results on retry in getInitialDataDistribution.

* Remove assertion on SHARD_ENCODE_LOCATION_METADATA for compatibility.

* Resolved comments.

Co-authored-by: He Liu <heliu@apple.com>
2022-07-07 20:49:16 -07:00
Fuheng Zhao ba5c8bd86e start on RedwoodIO launch limit 2022-07-06 16:59:21 -07:00
Josh Slocum 9e64037b25 Merge branch 'main' into bg_file_chunking 2022-06-30 17:13:02 -05:00
Jingyu Zhou d60cab788e
Merge pull request #7502 from jzhou77/main
Add pipelining for secondary queries in index prefetch
2022-06-30 10:27:29 -07:00
Dan Lambright 98b18e3a18
Remove code obsoleted by commit c48d5690 (#7499) 2022-06-30 12:16:23 -04:00