foundationdb

Commit Graph

Author	SHA1	Message	Date
He Liu	0bbce98da2	Disable shard aware (#8072 ) * Removed STORAGE_SERVER_SHARD_AWARE knob. * Fixed PhysicalShardMove test. Co-authored-by: He Liu <heliu@apple.com>	2022-09-02 09:07:34 -07:00
Josh Slocum	e7a82c9283	Merge pull request #8059 from sfc-gh-jslocum/bw_rk_fix Adding knob and increasing delay for simulation ratekeeper throttling assert	2022-08-31 15:55:05 -05:00
Josh Slocum	dcfc03a247	Merge pull request #8061 from sfc-gh-jslocum/ss_ebrake_speed_up_sim Minimizing effect of overly small buggified ss ebrake limits when speed up simulation is set	2022-08-31 15:54:41 -05:00
Yi Wu	49503987cc	Support Redwood encryption (#7376 ) A new knob `ENABLE_STORAGE_SERVER_ENCRYPTION` is added, which despite its name, currently only Redwood supports it. The knob is mean to be only used in tests to test encryption in individual components, and otherwise enabling encryption should be done through the general `ENABLE_ENCRYPTION` knob. Under the hood, a new `Encryption` encoding type is added to `IPager`, which use AES-256 to encrypt a page. With this encoding, `BlobCipherEncryptHeader` is inserted into page header for encryption metadata. Moreover, since we compute and store an SHA-256 auth token with the encryption header, we rely on it to checksum the data (and the encryption header), and skip the standard xxhash checksum. `EncryptionKeyProvider` implements the `IEncryptionKeyProvider` interface to provide encryption keys, which utilizes the existing `getLatestEncryptCipherKey` and `getEncryptCipherKey` actors to fetch encryption keys from either local cache or EKP server. If multi-tenancy is used, for writing a new page, `EncryptionKeyProvider` checks if a page contain only data for a single tenant, if so, fetches tenant specific encryption key; otherwise system encryption key is used. The tenant check is done by extracting tenant id from page bound key prefixes. `EncryptionKeyProvider` also holds a reference of the `tenantPrefixIndex` map maintained by storage server, which is used to check if a tenant do exists, and getting the tenant name in order to get the encryption key.	2022-08-31 12:19:55 -07:00
Josh Slocum	c587135988	Minimizing effect of overly small buggified ss ebrake limits when speed up simulation is set	2022-08-31 13:39:50 -05:00
Josh Slocum	9721de70b6	Adding knob and increasing delay for simulation ratekeeper throttling assert	2022-08-31 09:08:27 -05:00
Yao Xiao	ac7a5823e2	Add knob for CF write buffer size. (#8038 )	2022-08-30 17:52:29 -07:00
Yao Xiao	09f62acd14	Add delay to physical shard clean up. (#7989 )	2022-08-29 11:30:50 -07:00
A.J. Beamon	2907d2d4dd	Merge pull request #8004 from sfc-gh-ajbeamon/fix-ub Fix some undefined bevavior in RK and a unit test	2022-08-29 09:16:11 -07:00
Evan Tschannen	8314e80371	Fixed a few bugs which caused ratekeeper to unnecessarily throttle a cluster (#8006 ) * do not count recently created change feeds for throttling * fix: blocked assignments were not decremented when force purging * fix: created needs to be updated when the changefeed is reset * added asserts to detect if ratekeeper is throttled on blob workers	2022-08-26 15:38:31 -07:00
A.J. Beamon	0e782412a8	Fix some undefined bevavior: 1) a unit test was not initializing members of the WorkloadContext it was using, and 2) very large ratekeeper limits for batch priority were overflowing the types used to log them	2022-08-26 14:17:01 -07:00
Ata E Husain Bohra	00fe4863b6	Implement TenantCacheEntry in-memory cache (#7801 ) * Implement TenantCacheEntry in-memory cache Description diff-4: TraceEvent usage improvements diff-3: Address review comments diff-2: Add APIs to read counter values, test improvements diff-1: Address review comments Major changes includes: 1. Implements an actor that enables an in-memory caching of TenantCacheEntry object, allowing the caller to embed custom information along with TenantCacheEntry. 2. The cache follows read-through cache semantics where the entry gets loaded from underlying database on a miss. 3. The cache implements a "periodic poller" to refresh known Tenants by consulting the database. Once a database keyrange-watch feature is available, cache shall be updated. Bonus: Implement a 'recurringAsync' addition to genericActors allowing caller to schedule a periodic task registering an "actor functor"; the routine 'waits' for the actor unlike existing 'recurring' implementation. Testing TenantEntryCache workload devCorrectnessRun - 100K	2022-08-25 11:42:26 -07:00
Ata E Husain Bohra	d6b1ac056c	KMS connector to assist encryption enabled perf runs (#7978 ) Description FDB Native encryption requires integration with external KeyMangement Services to fetch required encryption keys. For simulation runs, there exists SimKmsConnector implementation that fakes interaction with external KMS. Major changes suggested in the patch: 1. Enable setting KMS_CONNECTOR_TYPE via command line arguments. 2. If "FDBPerfKmsConnector" is set as KMS_CONNECTOR_TYPE, then allow using SimKmsConnector implementation. Note: SimKmsConnector can handle process reboots. Testing devRunCorrectness - 100K	2022-08-25 10:00:46 -07:00
Chaoguang Lin	06aa6ee5ff	Add system monitor for flowprocess (#6925 ) * Update network address in trace logs; Add system monitor for flowprocess * Create a new trace file with the correct process address for flowprocess * Remove unused debugging traces * Add a new error lock_file_failure; Change please_reboot_remote_kv_store to please_reboot_kv_store; Add the code to only reboot the kv store but not the worker; Remove some unnecessay traces * Add error handling for file_not_found in handleIOErrors * Format worker.actor.cpp file	2022-08-24 00:40:38 -07:00
Evan Tschannen	493771b6a8	Throttle the cluster if the blob manager cannot assign ranges (#7900 ) * Throttle the cluster if the blob manager cannot assign ranges * fixed a number of different bugs which caused ratekeeper to throttle to zero because of blob worker lag * fix: do not mark an assignment as block if it is cancelled * remove asserts to merge bug fixes * fix formatting * restored old control flow to storage updater * storage updater did not throw errors * disable buggify to see if it fixes CI	2022-08-23 13:33:46 -05:00
Hui Liu	33411fd07e	add an option to read from blob storage for moving keys (#7848 )	2022-08-23 08:07:17 -05:00
Jingyu Zhou	b966c4de0c	Merge pull request #7936 from sfc-gh-tclinkenbeard/increase-tlog-max-create-duration-in-simulation Increase `TLOG_MAX_CREATE_DURATION` to 15.0 in simulation	2022-08-22 09:31:11 -07:00
Josh Slocum	98a7ec1797	Blob Granules Cleanup (#7941 ) * Cleaned up BlobGranule TODO + FIXMEs and addressed some * popping feed at correct version * blob worker taking over a granule will pop from where previous worker left off * addressed fixme of blob worker not re-snapshotting from old change feed * formatting * more change feed popped fixes after pop updates * Getting rid of change feed parallelism lock since it can cause deadlocks in fetching, and relying on full fetch lock * New blob worker metric and fixing old one * server-side popped checking still doesn't work because of pops at non-mutation versions * format	2022-08-19 17:25:31 -07:00
Zhe Wang	2ceaae4219	dd physical shard core (#7703 )	2022-08-19 14:47:00 -04:00
sfc-gh-tclinkenbeard	7449dbe5c4	Increase TLOG_MAX_CREATE_DURATION to 15.0 in simulation	2022-08-19 10:35:13 -07:00
He Liu	044f43b6c0	Fix shard mapping read (#7897 ) * Read full range. * Count empty empty shards. * Added ROCKSDB_READ_RANGE_ROW_LIMIT. * Fixed BytesLimit issue. * TraceEvents. * Cleaned up comments. Co-authored-by: He Liu <heliu@apple.com>	2022-08-19 09:33:59 -07:00
Yao Xiao	9c20a07d35	Add delete_obsolete_files_period_micros to RocksDB options. (#7908 )	2022-08-17 11:21:21 -07:00
Jingyu Zhou	120140b8b1	Merge pull request #7857 from yao-xiao-github/perf-metrics Fix metrics and add tunable knobs.	2022-08-15 09:30:25 -07:00
Xiaoxi Wang	9133d4e16d	Merge pull request #7803 from sfc-gh-xwang/feature/main/ddvisibility Add server selection counter in DDQueue	2022-08-12 15:10:25 -07:00
Evan Tschannen	a9d3c9f9b3	Added throttling when a blob worker falls behind (#7751 ) * throttle the cluster when blob workers fall behind * do not throttle on blob workers if they are not enabled * remove an unnecessary actor * fixed a compile error * fetch blob worker metrics at the same interval as the rate is updated, avoid fetching the complete blob worker list too frequently * fixed another compilation bug * added a 5 second delay before bw throttling to prevent false positives caused by the 100e6 version jump during recovery. Lower the throttling thresholds to react much quicker to bw lag. * fixed a number of problems * changed the minBlobVersionRequest to look at storage server versions since this will be a lot more efficient * fix: do not let desired go backwards * fix: track the version of notAtLatest changefeeds for throttling * ratekeeper now throttled blob workers by estimating the transaction per second throughput of the blob workers * added metrics for blob worker change feeds * added a knob to disable bw throttling * fixed the transaction options in blob manager	2022-08-12 13:15:56 -07:00
Jingyu Zhou	5929ac1d65	Merge pull request #7847 from xis19/knobCheck Cleanup the knobs that are not being used	2022-08-11 21:52:52 -07:00
Xiaoge Su	114c266b04	fixup! Recover the KMS knob which is useful	2022-08-11 15:15:14 -07:00
Yao Xiao	599e4b86d5	Add more knobs	2022-08-11 15:13:42 -07:00
Josh Slocum	44f8bdd258	Blob Worker memory limit (#7858 ) * Simulation version of blob_worker_full * tracking blocked BM assignments * actual memory estimation implementation	2022-08-11 15:07:08 -07:00
Trevor Clinkenbeard	583021c2d9	Merge pull request #7772 from sfc-gh-tclinkenbeard/global-tag-throttling6 Add status section for global tag throttler	2022-08-11 17:38:31 -03:00
Yao Xiao	fcfe6a8c29	Fix metrics and add knobs	2022-08-11 11:46:21 -07:00
Xiaoge Su	85836cbec9	Cleanup the knobs that are not being used	2022-08-10 16:51:01 -07:00
Xiaoxi Wang	1cff154adb	Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/ddvisibility	2022-08-10 12:03:42 -07:00
Xiaoxi Wang	f6dce5dcee	add knob to control summarize	2022-08-10 09:44:01 -07:00
Xiaoxi Wang	319ec2f43d	add summarize event	2022-08-09 18:22:48 -07:00
Xiaoxi Wang	9032f6c394	add comments; change default knobs	2022-08-09 16:45:15 -07:00
Josh Slocum	62494f048c	several changes to manage blob worker memory more and to test that management (#7834 )	2022-08-09 17:53:52 -05:00
Xiaoxi Wang	ea0c60381f	merge upstream/main	2022-08-09 12:28:57 -07:00
Jingyu Zhou	eba77d78f4	Add knobs for min/max Ratekeeper limit The default has no effects.	2022-08-08 15:27:21 -07:00
Xiaoxi Wang	2b2bc12cc1	Update Document; set log limit	2022-08-08 10:04:48 -07:00
Xiaoxi Wang	b18e29dd87	Merge remote-tracking branch 'upstream' into feature/main/ddvisibility	2022-08-06 21:43:36 -07:00
Xiaoxi Wang	8ecee1992b	Merge pull request #7777 from sfc-gh-xwang/feature/main/eligible-wiggle Make storage wiggler support SS_MIN_AGE	2022-08-06 17:01:46 -07:00
Xiaoxi Wang	06ae0a2f4c	ddqueue.periodicalRefreshCounter()	2022-08-05 15:26:34 -07:00
Josh Slocum	b2835921ba	Using knownBlobRanges for blob granule ranges whether tenants are enabled or not (#7788 ) * Using knownBlobRanges for blob granule ranges whether tenants are enabled or not * Effectively disabled blob granule tests when tenants enabled to fix ctest	2022-08-05 11:46:09 -05:00
sfc-gh-tclinkenbeard	1bd47a07b2	Add ENFORCE_TAG_THROTTLING_ON_PROXIES knob	2022-08-05 00:40:10 -07:00
Xiaoxi Wang	4b3acf6b4d	update SS_MIN_AGE=21 day	2022-08-03 17:18:15 -07:00
He Liu	fa418fd784	Change SHARD_ENCODE_LOCATION_METADATA to a server knob. (#7770 ) Co-authored-by: He Liu <heliu@apple.com>	2022-08-03 13:51:40 -07:00
Xiaoxi Wang	2cd15073d5	make storage wiggler support SS_MIN_AGE	2022-08-02 16:21:41 -07:00
Trevor Clinkenbeard	edf4e60fa9	Merge pull request #7631 from sfc-gh-tclinkenbeard/global-tag-throttling5 Improvements to `GlobalTagThrottler`	2022-08-02 16:04:20 -07:00
Dennis Zhou	b34a54fa7f	blob: allow for alignment of granules to tuple boundaries (#7746 ) * blob: read TenantMap during recovery Future functionality in the blob subsystem will rely on the tenant data being loaded. This fixes this issue by loading the tenant data before completing recovery such that continued actions on existing blob granules will have access to the tenant data. Example scenario with failover, splits are restarted before loading the tenant data: BM - BlobManager epoch 3: epoch 4: BM record intent to split. Epoch fails. BM recovery begins. BM fails to persist split. BM recovery finishes. BM.checkBlobWorkerList() maybeSplitRange(). BM.monitorClientRanges(). loads tenant data. bin/fdbserver -r simulation -f tests/slow/BlobGranuleCorrectness.toml \ -s 223570924 -b on --crash --trace_format json * blob: add tuple key truncation for blob granule alignment FDB has a backup system available using the blob manager and blob granule subsystem. If we want to audit the data in the blobs, it's a lot easier if we can align them to something meaningful. When a blob granule is being split, we ask the storage metrics system for split points as it holds approximate data distribution metrics. These keys are then processed to determine if they are a tuple and should be truncated according to the new knob, BG_KEY_TUPLE_TRUNCATE_OFFSET. Here we keep all aligned keys together in the same granule even if it is larger than the allowed granule size. The following commit will address this by adding merge boundaries. * blob: minor clean ups in merging code 1. Rename mergeNow -> seen. This is more inline with clocksweep naming and removes the confusion between mergeNow and canMergeNow. 2. Make clearMergeCandidate() reset to MergeCandidateCannotMerge to make a clear distinction what we're accomplishing. 3. Rename canMergeNow() -> mergeEligble(). * blob: add explicit (hard) boundaries Blob ranges can be specified either through explicit ranges or at the tenant level. Right now this is managed implicitly. This commit aims to make it a little more explicit. Blobification begins in monitorClientRanges() which parses either the explicit blob ranges or the tenant map. As we do this and add new ranges, let's explicitly track what is a hard boundary and what isn't. When blob merging occurs, we respect this boundary. When a hard boundary is encountered, we submit the found eligible ranges and start looking for a new range beginning with this hard boundary. * blob: create BlobGranuleSplitPoints struct This is a setup for the following commit. Our goal here is to provide a structure for split points to be passed around. The need is for us to be able to carry uncommitted state until it is committed and we can apply these mutations to the in-memory data structures. * blob: implement soft boundaries An earlier commit establishes the need to create data boundaries within a tenant. The reality is we may encounter a set of keys that degnerate to the same key prefix. We'll need to be able to split those across granules, but we want to ensure we merge the split granules together before merging with other granules. This adds to the BlobGranuleSplitPoints state of new BlobGranuleMergeBoundary items. BlobGranuleMergeBoundary contains state saying if it is a left or right boundary. This information is used to, like hard boundaries, force merging of like granules first. We read the BlobGranuleMergeBoundary map into memory at recovery.	2022-08-02 16:06:25 -05:00

1 2 3 4 5 ...

356 Commits