foundationdb

Commit Graph

Author	SHA1	Message	Date
Josh Slocum	62494f048c	several changes to manage blob worker memory more and to test that management (#7834)	2022-08-09 17:53:52 -05:00
He Liu	fa3e462662	Added validateStorageQ in SS.	2022-08-09 15:05:20 -07:00
Xiaoxi Wang	ea0c60381f	merge upstream/main	2022-08-09 12:28:57 -07:00
Jingyu Zhou	eba77d78f4	Add knobs for min/max Ratekeeper limit. The default has no effect.	2022-08-08 15:27:21 -07:00
Xiaoxi Wang	2b2bc12cc1	Update Document; set log limit	2022-08-08 10:04:48 -07:00
Xiaoxi Wang	b18e29dd87	Merge remote-tracking branch 'upstream' into feature/main/ddvisibility	2022-08-06 21:43:36 -07:00
Xiaoxi Wang	8ecee1992b	Merge pull request #7777 from sfc-gh-xwang/feature/main/eligible-wiggle Make storage wiggler support SS_MIN_AGE	2022-08-06 17:01:46 -07:00
Xiaoxi Wang	06ae0a2f4c	ddqueue.periodicalRefreshCounter()	2022-08-05 15:26:34 -07:00
Josh Slocum	b2835921ba	Using knownBlobRanges for blob granule ranges whether tenants are enabled or not (#7788) * Using knownBlobRanges for blob granule ranges whether tenants are enabled or not * Effectively disabled blob granule tests when tenants enabled to fix ctest	2022-08-05 11:46:09 -05:00
sfc-gh-tclinkenbeard	1bd47a07b2	Add ENFORCE_TAG_THROTTLING_ON_PROXIES knob	2022-08-05 00:40:10 -07:00
Fuheng Zhao	d5c3679046	merge upstream main and resolve conflicts	2022-08-04 12:15:00 -07:00
Xiaoxi Wang	4b3acf6b4d	update SS_MIN_AGE=21 day	2022-08-03 17:18:15 -07:00
He Liu	fa418fd784	Change SHARD_ENCODE_LOCATION_METADATA to a server knob. (#7770) Co-authored-by: He Liu <heliu@apple.com>	2022-08-03 13:51:40 -07:00
Xiaoxi Wang	2cd15073d5	make storage wiggler support SS_MIN_AGE	2022-08-02 16:21:41 -07:00
Trevor Clinkenbeard	edf4e60fa9	Merge pull request #7631 from sfc-gh-tclinkenbeard/global-tag-throttling5 Improvements to `GlobalTagThrottler`	2022-08-02 16:04:20 -07:00
Dennis Zhou	b34a54fa7f	blob: allow for alignment of granules to tuple boundaries (#7746) * blob: read TenantMap during recovery Future functionality in the blob subsystem will rely on the tenant data being loaded. This fixes this issue by loading the tenant data before completing recovery such that continued actions on existing blob granules will have access to the tenant data. Example scenario with failover, splits are restarted before loading the tenant data: BM - BlobManager epoch 3: epoch 4: BM record intent to split. Epoch fails. BM recovery begins. BM fails to persist split. BM recovery finishes. BM.checkBlobWorkerList() maybeSplitRange(). BM.monitorClientRanges(). loads tenant data. bin/fdbserver -r simulation -f tests/slow/BlobGranuleCorrectness.toml \ -s 223570924 -b on --crash --trace_format json * blob: add tuple key truncation for blob granule alignment FDB has a backup system available using the blob manager and blob granule subsystem. If we want to audit the data in the blobs, it's a lot easier if we can align them to something meaningful. When a blob granule is being split, we ask the storage metrics system for split points as it holds approximate data distribution metrics. These keys are then processed to determine if they are a tuple and should be truncated according to the new knob, BG_KEY_TUPLE_TRUNCATE_OFFSET. Here we keep all aligned keys together in the same granule even if it is larger than the allowed granule size. The following commit will address this by adding merge boundaries. * blob: minor clean ups in merging code 1. Rename mergeNow -> seen. This is more in line with clocksweep naming and removes the confusion between mergeNow and canMergeNow. 2. Make clearMergeCandidate() reset to MergeCandidateCannotMerge to make a clear distinction what we're accomplishing. 3. Rename canMergeNow() -> mergeEligble(). * blob: add explicit (hard) boundaries Blob ranges can be specified either through explicit ranges or at the tenant level. Right now this is managed implicitly. This commit aims to make it a little more explicit. Blobification begins in monitorClientRanges() which parses either the explicit blob ranges or the tenant map. As we do this and add new ranges, let's explicitly track what is a hard boundary and what isn't. When blob merging occurs, we respect this boundary. When a hard boundary is encountered, we submit the found eligible ranges and start looking for a new range beginning with this hard boundary. * blob: create BlobGranuleSplitPoints struct This is a setup for the following commit. Our goal here is to provide a structure for split points to be passed around. The need is for us to be able to carry uncommitted state until it is committed and we can apply these mutations to the in-memory data structures. * blob: implement soft boundaries An earlier commit establishes the need to create data boundaries within a tenant. The reality is we may encounter a set of keys that degenerate to the same key prefix. We'll need to be able to split those across granules, but we want to ensure we merge the split granules together before merging with other granules. This adds to the BlobGranuleSplitPoints state of new BlobGranuleMergeBoundary items. BlobGranuleMergeBoundary contains state saying if it is a left or right boundary. This information is used to, like hard boundaries, force merging of like granules first. We read the BlobGranuleMergeBoundary map into memory at recovery.	2022-08-02 16:06:25 -05:00
Josh Slocum	4b66645d80	Granule file performance benchmark and improvements (#7742) * added cpu microbenchmark for blob granule files * Added edge case read benchmarks, and sorting memory deltas * Sorted merge for granule files * key block comparison optimization in granule files * More performance improvements to granule file read * fixing zlib not supported build * fixing formatting * Added debug macro for new debugging prints * review comments * more strict compression size validation assert	2022-08-02 11:36:44 -05:00
Josh Slocum	4d2f90977d	Merge pull request #7656 from sfc-gh-jslocum/cf_bw_operational_fixes Cf bw operational fixes	2022-07-26 16:24:26 -05:00
Josh Slocum	c32e1da908	Merge pull request #7673 from sfc-gh-jslocum/delta_files_v2 Sorted Delta Files	2022-07-26 16:04:55 -05:00
Josh Slocum	15e7a4b186	addressing review comments	2022-07-26 14:20:35 -05:00
Fuheng Zhao	f761f9a03a	use DefaultEndPoint as the default priority for storage server reads	2022-07-25 10:10:42 -07:00
Josh Slocum	ea9018460a	cleanup and polish	2022-07-22 15:13:32 -05:00
Lukas Joswiak	703aa1d279	Mess with timeout values	2022-07-22 10:37:29 -07:00
Lukas Joswiak	40d403ed5f	Reduce global configuration system key reads from proxy Clients now poll the proxy for the latest global config for a specific version. The proxy now periodically requests the latest global configuration data and stores it in memory, enabling it to respond immediately to clients with the appropriate version.	2022-07-22 10:37:29 -07:00
Lukas Joswiak	56dfdbda83	Add migration timeout	2022-07-22 10:37:29 -07:00
Lukas Joswiak	2e99d5f6cc	Batch global config refresh requests	2022-07-22 10:37:29 -07:00
He Liu	7a8be255cd	Ss shard management (#7340) * Storage server shard management with physical shards. * Cleanup. * Resolved comments. * Added `UnlimintedCommitBytes`. Co-authored-by: He Liu <heliu@apple.com>	2022-07-22 09:30:44 -07:00
Josh Slocum	0b674bf0c4	Merge branch 'main' into cf_bw_operational_fixes	2022-07-20 08:03:59 -05:00
Josh Slocum	78b6a96006	Merge branch 'main' into granule_merging_batch	2022-07-20 07:42:26 -05:00
sfc-gh-tclinkenbeard	fe05cc5c72	Update busy read tag reporting in status json	2022-07-19 16:29:11 -07:00
Josh Slocum	306610bfcb	batch periodic merging in blob manager	2022-07-15 15:52:10 -05:00
Ata E Husain Bohra	f288abebc2	BlobFile Encryption and compression support Fix formatting issues and rename KNOB Description Testing	2022-07-14 17:22:00 -07:00
Ata E Husain Bohra	f6f117592d	BlobFile Encryption and compression support - Limit verbose logging under DEBUG_MACRO - Update/Add code documentation Description Testing	2022-07-14 17:04:14 -07:00
Ata E Husain Bohra	24b2de8de8	BlobFile Encryption and compression support Description Testing	2022-07-14 17:04:14 -07:00
Fuheng Zhao	312e160a12	use PriorityMultiLock in storage server	2022-07-14 15:29:54 -07:00
Lukas Joswiak	407300bfa6	Disable testing of the remote key value store in simulation	2022-07-13 18:32:50 -07:00
Josh Slocum	b85fbaef52	Merge pull request #7395 from sfc-gh-jslocum/bg_file_chunking Chunked Snapshot Files	2022-07-13 17:22:34 -05:00
Fuheng Zhao	d77695b77f	use explicit number for ioMaxPriority	2022-07-12 11:54:10 -07:00
Josh Slocum	0b0ac16a4c	Merge branch 'main' into granule_merging	2022-07-12 09:09:30 -05:00
Josh Slocum	c6700fe62f	Merge branch 'main' into bg_file_chunking	2022-07-12 08:28:06 -05:00
Fuheng Zhao	39b37a80ef	remove comments and format	2022-07-11 16:10:38 -07:00
Fuheng Zhao	358b592458	Merge branch 'main' of https://github.com/apple/foundationdb into RedwoodIOLaunchLimit	2022-07-11 15:19:35 -07:00
Fuheng Zhao	0955419418	move ParsingStringVector function to genericactor class	2022-07-11 11:13:32 -07:00
Josh Slocum	33fcdc4764	Change Feed and Blob Worker operational improvements	2022-07-11 12:29:51 -05:00
Chaoguang Lin	901d988de9	Add a knob SNAPSHOT_ALL_STATEFUL_PROCESSES to snapshot all processes with stateful class type (storage, log, transaction) even if they are not recruited (#7554)	2022-07-08 20:53:49 -07:00
He Liu	bc5bfaffda	Shard based move (#6981) * Shard based move. * Clean up. * Clear results on retry in getInitialDataDistribution. * Remove assertion on SHARD_ENCODE_LOCATION_METADATA for compatibility. * Resolved comments. Co-authored-by: He Liu <heliu@apple.com>	2022-07-07 20:49:16 -07:00
Fuheng Zhao	ba5c8bd86e	start on RedwoodIO launch limit	2022-07-06 16:59:21 -07:00
Josh Slocum	9e64037b25	Merge branch 'main' into bg_file_chunking	2022-06-30 17:13:02 -05:00
Jingyu Zhou	d60cab788e	Merge pull request #7502 from jzhou77/main Add pipelining for secondary queries in index prefetch	2022-06-30 10:27:29 -07:00
Dan Lambright	98b18e3a18	Remove code obsoleted by commit `c48d5690` (#7499)	2022-06-30 12:16:23 -04:00
Jingyu Zhou	ba5e2ed72d	Add batching for secondary queries in index prefetch This is to reduce excessive load being issued concurrently, controlled by knob MAX_PARALLEL_QUICK_GET_VALUE.	2022-06-29 21:17:11 -07:00
Chaoguang Lin	29f98f3654	Avoid duplicate snapshot on one process if it serves as multiple roles (#7294) * Fix comments * Add simulation value for SERVER_KNOBS->SNAP_CREATE_MAX_TIMEOUT * A work version with correctness clean * Remove unnecessary comments; debugging symbols * Only check secondary address for coordinators, same as before * Change the trace to SevError and remove the ASSERT(false) * Remove TLogSnapRequest handling on TlogServer, which is changed to use WorkerSnapRequest * Add retry for network failures * Add retry limit for network failures; still allow duplicate snapshots on processes that are both tlog and storage to avoid race * Add retry limit as a knob and make backoff exponential * Add getDatabaseConfiguration(Transaction* tr) * revert back to send request for each role once * update some comments	2022-06-29 11:23:07 -07:00
Trevor Clinkenbeard	db769667ae	Merge pull request #7112 from sfc-gh-tclinkenbeard/global-tag-throttling3 Create Global Tag Throttler	2022-06-29 10:06:00 -07:00
sfc-gh-tclinkenbeard	be9a2002c3	Merge remote-tracking branch 'origin/main' into fix-knob-typo	2022-06-28 15:58:18 -07:00
sfc-gh-tclinkenbeard	086e4bff06	Merge remote-tracking branch 'origin/main' into global-tag-throttling3	2022-06-28 10:18:13 -07:00
Lukas Joswiak	9ac7a1aed8	Increase buggified max version batch size With a smaller byte limit, more `VersionBatch`s are created, resulting in many more traces.	2022-06-27 22:44:09 -07:00
sfc-gh-tclinkenbeard	0c2bd73e28	Fix typo in SHARD_READ_HOT_BANDWIDTH_MIN_PER_KSECONDS name	2022-06-27 09:56:02 -07:00
Markus Pilman	a47ed89018	Linux fixes and addressed review comments	2022-06-23 20:52:13 -06:00
sfc-gh-tclinkenbeard	840dac1fa3	Merge remote-tracking branch 'origin/main' into global-tag-throttling3	2022-06-22 22:17:33 -07:00
Bharadwaj V.R	8cf2be030f	Build a TenantCache for use by DD (#7207) * Add a DD tenant-cache-assembly actor * Add basic tenant list monitoring for tenant cache. * Update DD tenant cache refresh to be more efficient and unit-testable * Remove the DD prefix in the tenant cache class name (and associated impl and UT class names); there is nothing specific to DD in it; DD uses it; other modules may use it in the future * Disable DD tenant awareness by default	2022-06-21 16:29:30 -07:00
sfc-gh-tclinkenbeard	2391e58fb2	Merge remote-tracking branch 'origin/main' into global-tag-throttling3	2022-06-21 10:09:15 -07:00
sfc-gh-tclinkenbeard	0216740c0c	Add /GlobalTagThrottler/Simple unit test	2022-06-15 17:21:54 -07:00
Josh Slocum	3f871f4123	Chunked Snapshot Files passes unit test	2022-06-15 14:52:28 -05:00
sfc-gh-tclinkenbeard	b7fd69ed7f	Add GLOBAL_TAG_THROTTLING_TRACE_INTERVAL knob	2022-06-13 16:09:21 -07:00
Andrew Noyes	013b290ca5	Don't fail test if log cursor times out during network partition (#7330) * Don't fail test if log cursor times out during network partition Also, exercise the codepath for handling timed_out in simulation, by reverting this knob buggification behavior to that of `07976993e7`. * clang-format	2022-06-13 13:28:22 -07:00
sfc-gh-tclinkenbeard	df71a49bf6	Merge remote-tracking branch 'origin/main' into global-tag-throttling3	2022-06-13 10:03:10 -07:00
Josh Slocum	d6920cde28	Implemented blob granule merging	2022-06-09 10:50:53 -05:00
Sreenath Bodagala	fe5f11358f	Merge pull request #7318 from sbodagala/main Introduce a knob that controls the placement of remote storage server commit versions in version vector	2022-06-08 12:18:15 -04:00
Yi Wu	bbf8cb4b02	GetEncryptCipherKeys helper function and misc encryption changes (#7252) Adding GetEncryptCipherKeys and GetLatestCipherKeys helper actors, which encapsulate cipher key fetch logic: getting cipher keys from local BlobCipherKeyCache, and on cache miss fetch from EKP (encrypt key proxy). These helper actors also handle the case where EKP is shut down in the middle: they listen on ServerDBInfo to wait for a new EKP to start and send a new request there instead. The PR also has other misc changes: * EKP is by default started in simulation regardless of the ENABLE_ENCRYPTION knob, so that in restart tests, if ENABLE_ENCRYPTION is switched from on to off after restart, encrypted data will still be able to be read. * API tweaks for BlobCipher * Adding an ENABLE_TLOG_ENCRYPTION knob which will be used in later PRs. The knob should normally be consistent with the ENABLE_ENCRYPTION knob, but could be used to disable TLog encryption alone. This PR is split out from #6942.	2022-06-07 21:00:13 -07:00
Sreenath Bodagala	96a88e3847	Merge remote-tracking branch 'apple-upstream/main'	2022-06-07 18:38:35 +00:00
Dan Adkins	bd47f390bd	Add simulation test for three_data_hall configuration (#7305) * Add simulation test for 1 data hall + 1 machine failure case. * Disable BUGGIFY for DEGRADED_RESET_INTERVAL. A simulation test discovered a situation where machines attempting to connect to a dead coordinator (with a well-known endpoint) were getting themselves marked degraded. This flapping of the degraded state prevented recovery from completing, as it started over any time it noticed that tlogs on degraded hosts could be relocated to non-degraded ones. bin/fdbserver -r simulation -f tests/rare/CycleWithDeadHall.toml -b on -s 276841956	2022-06-06 13:14:49 -07:00
Sreenath Bodagala	a3c6ed2e86	- Introduce a knob that will control the placement of the commit versions of remote storage servers in version vector. This optimization will help reduce the size of version vector in HA configuration.	2022-06-03 19:29:54 +00:00
Josh Slocum	b650410a48	Merge branch 'main' into blob_granule_kms	2022-06-03 09:13:49 -05:00
Josh Slocum	fcd20c479d	addressing review comments	2022-06-03 08:36:07 -05:00
Evan Tschannen	473edf3d11	fix: the peek_using_streaming can cause memory corruption (#7286) * fix: the peek_using_streaming can cause memory corruption * changed how ALLOW_DANGEROUS_KNOBS is initialized * only buggify streaming when simulated	2022-05-31 16:04:28 -07:00
sfc-gh-tclinkenbeard	29ebcdbfa9	Merge remote-tracking branch 'origin/main' into global-tag-throttling3	2022-05-31 14:53:51 -07:00
Josh Slocum	ffa4255c65	Added blob metadata concept as new secret type, and verified blob workers can load it	2022-05-27 15:15:56 -05:00
Xiaoxi Wang	74748c20a0	a.fix heap-use-after-free caused by early noErrorsActors destroy	2022-05-27 10:14:10 -07:00
Xiaoxi Wang	5bea02cfde	Merge pull request #6686 from sfc-gh-xwang/readaware Read-aware Data Distributor (default disabled)	2022-05-26 11:05:03 -07:00
Xiaoxi Wang	13a77dd5a2	change priority knob; change PromiseStream to FutureStream; remove comments; add on_sr check	2022-05-25 17:09:34 -07:00
Josh Slocum	85af0a25b2	Enabling BM to understand tenant boundaries, and changing BlobGranuleCorrectness to use tenants	2022-05-25 17:16:56 -05:00
Xiaoxi Wang	73624bcd2a	Merge remote-tracking branch 'upstream/main' into readaware	2022-05-23 11:17:38 -07:00
Dan Lambright	5e5e454362	Merge pull request #7199 from apple/blocking-peek-is-1sec-bug	2022-05-20 17:34:56 -04:00
Dan Lambright	801f6d52ee	put buggify knob conditionals in standard format	2022-05-20 16:02:07 -04:00
He Liu	bc509d9572	Added fetchCheckpointKeyValuesQ in storage server.	2022-05-19 13:27:21 -07:00
Xiaoxi Wang	4f28af1ab7	add comment	2022-05-19 10:50:24 -07:00
Dan Lambright	a4fa68453c	In version vector, blocking peek timeout cannot be much greater than max_read_transaction_life_versions	2022-05-19 10:34:37 -04:00
Xiaoxi Wang	48e7689cc4	restore knob value	2022-05-17 22:20:35 -07:00
Xiaoxi Wang	8950822e36	add knobs; change knobs	2022-05-17 14:49:27 -07:00
Xiaoxi Wang	382f0fc4a2	merge upstream/main	2022-05-17 10:20:51 -07:00
Xiaoxi Wang	3b241955e7	add more informative trace info	2022-05-16 21:25:56 -07:00
Xiaoxi Wang	e9e11bf53b	change all criteria to knobs	2022-05-12 16:30:21 -07:00
Ata E Husain Bohra	a7cd61c5cf	Enable debugId tracing for encryption requests (#7111) * Enable debugId tracing for encryption requests Description diff-1: Minor fixes, address review comment Proposed changes include: 1. Update EncryptKeyProxy API to embed Optional<UID> for debugging request execution. 2. Encryption participant FDB processes can set 'debugId' enabling tracing requests within FDB cluster processes and beyond. 3. The 'debugId' if available is embedded as part of 'request_json_payload' by RESTKmsConnector, enabling tracing request between FDB <--> KMS. 4. Fix EncryptKeyProxyTest which got broken due to recent changes. Testing Updated following test: 1. EncryptKeyProxy simulation test. 2. RESTKmsConnector simulation test. Description Testing	2022-05-11 13:23:27 -07:00
Xiaoxi Wang	2717cee1f9	Merge branch 'features/read-skew' into readaware	2022-05-09 16:12:09 -07:00
sfc-gh-tclinkenbeard	835ee43a4a	Cleanup to make global tag throttling feature mergeable	2022-05-09 11:35:51 -07:00
sfc-gh-tclinkenbeard	970b30b6cc	Added SS_THROTTLE_TAGS_TRACKED knob	2022-05-07 15:58:47 -07:00
sfc-gh-tclinkenbeard	bfd145b299	Make GlobalTagThrottlerImpl::QuotaAndCounters::quota optional. This allows us to track transaction rates for tags that don't have a specified quota.	2022-05-07 15:58:46 -07:00
sfc-gh-tclinkenbeard	b786354340	Add tracing and improve testing environment for global tag throttling	2022-05-07 15:58:46 -07:00
sfc-gh-tclinkenbeard	ef61b1f804	Add GLOBAL_TAG_THROTTLING_FOLDING_TIME knob	2022-05-07 15:58:46 -07:00
sfc-gh-tclinkenbeard	45bad60cc3	Add GLOBAL_TAG_THROTTLING_MIN_RATE knob	2022-05-07 15:58:46 -07:00
sfc-gh-tclinkenbeard	a4f4af4d6b	Add transactionTag test parameter to workloads	2022-05-07 15:58:05 -07:00
sfc-gh-tclinkenbeard	5a1de67757	Add GLOBAL_TAG_THROTTLING knob	2022-05-07 15:58:04 -07:00
Ata E Husain Bohra	33ae398268	REST KmsConnector implementation (#6994) * REST KmsConnector implementation Description diff-1: Address review comments. Add utility interface to Platform namespace to create and operate on tmpfile diff-2: Address review comments Link Boost::filesystem to CMake build process Major changes include: 1. Implement REST based KmsConnector implementation. 2. Salient features of the connector: 2.1. Two required configurations are: a. Discovery KMS URLs - enable KMS discovery on bootstrap b. Endpoint path configuration to construct URI to fetch/refresh encryption keys c. Configuration to provide "validationTokens" to connect with external KMS. Patch implements file-based token validation scheme. 2.2. On startup, RESTKmsConnector discovers KMS Urls and caches them in-memory. Extracts "validationTokens" based on input config. 2.3. Expose endpoints to allow fetch/refresh of encryption keys. 2.4. Defines JSON format to interact with external KMS - request & response payload format. 3. Extend Platform namespace with an interface to create and operate on tmp files. 4. Update Platform 'readFileBytes' and 'writeFileBytes' to leverage fstream supported implementation. NOTE: KMS URLs fetched after initial discovery will be persisted using DynamicKnobs. It is TODO at the moment and shall be completed once DynamicKnobs is feature complete Testing Unit test to validate the following: 1. Parsing on "validation tokens" logic. 2. Construction and parsing of REST JSON request and response strings.	2022-05-07 13:18:35 -07:00
Xiaoxi Wang	69985ba251	Merge branch 'main' of https://github.com/apple/foundationdb into readaware	2022-05-02 10:53:22 -07:00
Josh Slocum	674e6a8fdc	Merge branch 'main' into min_shard_bytes_main	2022-04-29 16:00:27 -05:00
Josh Slocum	db6d7396ca	Add delay between quickly re-recruiting the same singleton process, to avoid recruit thrashing when there are temporarily multiple cluster controllers (#7000)	2022-04-28 15:45:09 -07:00
Lukas Joswiak	8b489ae0bf	Lower max compute duration cutoff and increase suppression duration	2022-04-28 10:48:57 -04:00
Lukas Joswiak	f13a8df5d9	Add logging measuring commit compute duration	2022-04-28 10:48:57 -04:00
Xiaoxi Wang	00b97ec829	add storage metric compare knob; timeThrottle with constant	2022-04-27 23:37:35 -07:00
Xiaoxi Wang	898a5b86b2	reset knobs	2022-04-26 17:16:55 -07:00
Xiaoxi Wang	87d358ce6c	reset all knobs	2022-04-26 17:11:38 -07:00
Xiaoxi Wang	0639810b66	merge upstream/main	2022-04-22 11:09:15 -07:00
Xiaoxi Wang	eb949ee3dc	change default shard size; enable valley filler best; top 10 random choice	2022-04-22 10:07:12 -07:00
Ata E Husain Bohra	670d40ef79	FDB native KMS Connector Framework (#6846) * FDB native KMS Connector Framework Description Major changes include: 1. Framework code to enable FDB native KMS connector implementation. 2. SERVER_KNOBS->KMS_CONNECTOR_TYPE controls the connector type selection. 3. KmsConnectorInterface endpoint definitions, every KMSConnector implementation needs to support defined endpoints. 4. Update EncryptKeyProxy to leverage KmsConnectorInterface endpoints to fetch encryption keys on-demand and/or periodic refreshes. Integrate SimKmsConnector implementation. 5. Implement SimKmsConnector by leveraging existing SimKeyProxy implementation. Testing Unit test: fdbserver/SimKmsConnector Simulation: EncryptKeyProxy	2022-04-22 08:53:39 -07:00
Zhe Wang	6c9ff6ee5e	Add sharded rocksdb type (#6862) * add-sharded-rocksdb-type * address comments Co-authored-by: Zhe Wang <zhewang@Zhes-MacBook-Pro.local>	2022-04-21 22:53:14 -04:00
Josh Slocum	ca68fefca2	Decreasing MIN_SHARD_BYTES knob	2022-04-20 18:46:11 -05:00
Xiaoxi Wang	8d3f851495	merge upstream/mainA	2022-04-12 17:03:09 -07:00
Xiaoxi Wang	ed97a35dc0	Merge branch 'main' into readaware	2022-04-12 16:47:15 -07:00
Ata E Husain Bohra	933e5bbd2e	EncryptKeyProxy server APIs for simulation runs. (#6727) * EncryptKeyProxy server APIs for simulation runs. Description diff-2: FlowSingleton util class Bug fixes diff-1: Expected errors returned to the caller Major changes proposed are: 1. EncryptKeyProxy server APIs: 1.1. Lookup Cipher details via BaseCipherId 1.2. Lookup latest Cipher details via encryption domainId. 2. EncryptKeyProxy implements caches indexed by: baseCipherId & encryptDomainId 3. Periodic task to refresh domainId indexed cache to support 'limiting cipher lifetime' abilities if supported by external KMS solutions. Testing EncryptKeyProxyTest workload to validate the newly added code.	2022-04-11 09:08:42 -07:00
Dan Lambright	9d433c1bef	Merge pull request #6764 from apple/vv version-vector-prototype to main branch	2022-04-08 18:50:12 -04:00
neethuhaneesha	b7096c410f	Merge pull request #6795 from neethuhaneesha/rocksdb-blocksize Adding rocksdb block size option.	2022-04-08 14:20:54 -07:00
Dan Lambright	1b3b4166c6	Merge branch 'main' into vv	2022-04-08 17:18:13 -04:00
Trevor Clinkenbeard	ba8fbca038	Merge pull request #6752 from sfc-gh-tclinkenbeard/improve-snapshot-fault-tolerance Improve fault tolerance of snapshots	2022-04-08 12:46:50 -07:00
Lukas Joswiak	73a7c32982	Add fdbcli command to read/write version epoch (#6480) * Initialize cluster version at wall-clock time Previously, new clusters would begin at version 0. After this change, clusters will initialize at a version matching wall-clock time. Instead of using the Unix epoch (or Windows epoch), FDB clusters will use a new epoch, defaulting to January 1, 2010, 01:00:00+00:00. In the future, this base epoch will be modifiable through fdbcli, allowing administrators to advance the cluster version. Basing the version off of time allows different FDB clusters to share data without running into version issues. * Send version epoch to master * Cleanup * Update fdbserver/storageserver.actor.cpp Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com> * Jump directly to expected version if possible * Fix initial version issue on storage servers * Add random recovery offset to start version in simulation * Type fixes * Disable reference time by default Enable on a cluster using the fdbcli command `versionepoch add 0`. * Use correct recoveryTransactionVersion when recovering * Allow version epoch to be adjusted forwards (to decrease the version) * Set version epoch in simulation * Add quiet database check to ensure small version offset * Fix initial version issue on storage servers * Disable reference time by default Enable on a cluster using the fdbcli command `versionepoch add 0`. * Add fdbcli command to read/write version epoch * Cause recovery when version epoch is set * Handle optional version epoch key * Add ability to clear the version epoch This causes version advancement to revert to the old methodology whereas versions attempt to advance by about a million versions per second, instead of trying to match the clock. * Update transaction access * Modify version epoch to use microseconds instead of seconds * Modify fdbcli version target API Move commands from `versionepoch` to `targetversion` top level command. * Add fdbcli tests for * Temporarily disable targetversion cli tests * Fix version epoch fetch issue * Fix Arena issue * Reduce max version jump in simulation to 1,000,000 * Rework fdbcli API It now requires two commands to fully switch a cluster to using the version epoch. First, enable the version epoch with `versionepoch enable` or `versionepoch set <versionepoch>`. At this point, versions will be given out at a faster or slower rate in an attempt to reach the expected version. Then, run `versionepoch commit` to perform a one time jump to the expected version. This is essentially irreversible. * Temporarily disable old targetversion tests * Cleanup * Move version epoch buggify to sequencer This will cause some issues with the QuietDatabase check for the version offset - namely, it won't do anything, since the version epoch is not being written to the txnStateStore in simulation. This will get fixed in the future. Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>	2022-04-08 12:33:19 -07:00
Dan Lambright	c106847e3e	Merge branch 'main' into vv	2022-04-08 15:05:51 -04:00
Xiaoxi Wang	5113277e9a	Merge branch 'main' into readaware	2022-04-08 11:50:01 -07:00
Steve Atherton	11a5d14a11	Merge pull request #6108 from sfc-gh-satherton/redwood-header-changes Redwood page format refactor to support format evolution, forensic analysis and future encryption scheme	2022-04-08 10:59:12 -07:00
Dan Lambright	5bdc525353	Merge branch 'main' into vv	2022-04-08 13:16:04 -04:00
sfc-gh-tclinkenbeard	e27b0d9ab5	Merge remote-tracking branch 'origin/main' into improve-snapshot-fault-tolerance	2022-04-07 23:30:16 -07:00
Zhe Wu	e017faa6c4	grey failure detection account for the case where the connection between primary and satellite DC becomes bad.	2022-04-07 17:34:13 -07:00
Xiaoxi Wang	150b4318aa	refactor datadistribution command; try dual-mode code	2022-04-06 22:10:23 -07:00
Neethu Haneesha Bingi	0d05669c22	Adding rocksdb block size option.	2022-04-06 19:15:17 -07:00
Zhe Wu	5fd494a57b	Allow worker health monitor to report recent destroyed peers who currently have roles in transaction systems	2022-04-06 13:33:50 -07:00
Xiaoxi Wang	aba9d85560	merge main	2022-04-06 09:57:52 -07:00
Steve Atherton	5961b801cc	Keep decode caches in page cache based on page height minimum set by a knob with a default value of 2.	2022-04-05 15:11:07 -07:00
Josh Slocum	aaaf42525a	misc bg operational fixes and improvements	2022-04-05 12:26:00 -05:00
Zhe Wu	1c6dfae48e	Making gray failure also monitors connection failures	2022-04-05 09:59:05 -07:00
Dan Lambright	60c55e0785	Merge remote-tracking branch 'origin/version-vector-prototype' into vv	2022-04-05 11:17:39 -04:00
Josh Slocum	cb918b9cef	Added basic blob granule consistency check	2022-04-04 11:38:42 -05:00
Josh Slocum	268caa5ac8	fixing shard size knobs outside of simulation	2022-04-04 11:38:18 -05:00
sfc-gh-tclinkenbeard	4f61c86b69	Add MAX_COORDINATOR_SNAPSHOT_FAULT_TOLERANCE knob	2022-04-03 23:28:57 -07:00
sfc-gh-tclinkenbeard	253db642be	Add MAX_SNAPSHOT_FAULT_TOLERANCE knob	2022-04-03 22:31:45 -07:00
Jingyu Zhou	4200ffe53b	Tune down BLOCKING_PEEK_TIMEOUT to 0.4s Many simulation failures seems to be due to this knob being too high: when a read range request comes, the version is bumped by about 1M, which causes read version to be before storage version.	2022-04-02 12:26:35 -07:00
Jingyu Zhou	64d4658034	Merge branch 'main' into vv Fix Conflicts: flow/error_definitions.h	2022-04-01 21:49:24 -07:00
Bharadwaj V.R	f749aac223	Merge branch 'apple:main' into ssupdateb4registration	2022-03-31 18:59:44 -07:00
Chaoguang Lin	7d365bd1bb	Remote ikvs debugging (#6465) * initial structure for remote IKVS server * moved struct to .h file, added new files to CMakeList * happy path implementation, connection error when testing * saved minor local change * changed tracing to debug * fixed onClosed and getError being called before init is finished * fix spawn process bug, now use absolute path * added server knob to set ikvs process port number * added server knob for remote/local kv store * implement simulator remote process spawning * fixed bug for simulator timeout * commit all changes * removed print lines in trace * added FlowProcess implementation by Markus * initial debug of FlowProcess, stuck at parent sending OpenKVStoreRequest to child * temporary fix for process factory throwing segfault on create * specify public address in command * change remote kv store knob to false for jenkins build * made port 0 open random unused port * change remote store knob to true for benchmark * set listening port to randomly opened port * added print lines for jenkins run open kv store timeout debug * removed most tracing and print lines * removed tutorial changes * update handleIOErrors error handling to handle remote-ikvs cases * Push all debugging changes * A version where worker bug exists * A version where restarting tests fail * Use both the name and the port to determine the child process * Remove unnecessary update on local address * Disable remote-kvs for DiskFailureCycle test * A version where restarting stuck * A version where most restarting tests green * Reset connection with child process explicitly * Remove change on unnecessary files * Unify flags from _ to - * fix merging unexpected changes * fix trac.error to .errorUnsuppressed * Add license header * Remove unnecessary header in FlowProcess.actor.cpp * Fix Windows build * Fix Windows build, add missing ; * Fix a stupid bug caused by code dropped by code merging * Disable remote kvs by default * Pass the conn_file path to the flow process, though not needed, but the buildNetwork is difficult to tune * serialization change on readrange * Update traces * Refactor the RemoteIKVS interface * Format files * Update sim2 interface to not clog connections between parent and child processes in simulation * Update comments; remove debugging symbols; Add error handling for remote_kvs_cancelled * Add comments, format files * Change method name from isBuggifyDisabled to isStableConnection; Decrease(0.1x) latency for stable connections * Commit the IConnection interface change, forgot in previous commit * Fix the issue that onClosed request is cancelled by ActorCollection * Enable the remote kv store knob * Remove FlowProcess.actor.cpp and move functions to RemoteIKeyValueStore.actor.cpp; Add remote kv store delay to avoid race; Bind the child process to die with parent process * Fix the bug where one process starts storage server more than once * Add a please_reboot_remote_kv_store error to restart the storage server worker if remote kvs died abnormally * Remove unreachable code path and add comments * Clang format the code * Fix a simple wait error * Clang format after merging the main branch * Testing mixed mode in simulation if remote_kvs knob is enabled, setting the default to false * Disable remote kvs for PhysicalShardMove which is for RocksDB * Cleanup #include orders, remove debugging traces * Revert the reorder in fdbserver.actor.cpp, which fails the gcc build Co-authored-by: “Lincoln <“lincoln.xiao@snowflake.com”>	2022-03-31 17:08:59 -07:00
Jingyu Zhou	4fd414a8ed	Merge branch 'main' into vv Fix Conflicts: fdbclient/NativeAPI.actor.cpp	2022-03-31 09:38:36 -07:00
Jingyu Zhou	cfcf0f152c	Merge branch 'main-4a085fc84' into vv Fix Conflicts: fdbclient/NativeAPI.actor.cpp fdbserver/ClusterRecovery.actor.cpp fdbserver/MasterInterface.h fdbserver/masterserver.actor.cpp flow/error_definitions.h	2022-03-30 22:28:06 -07:00
Jingyu Zhou	b34f4052cd	Merge branch 'main-f28dfc12b' into vv Fix Conflicts: fdbclient/MultiVersionTransaction.actor.cpp fdbclient/NativeAPI.actor.cpp fdbclient/NativeAPI.actor.h fdbserver/storageserver.actor.cpp	2022-03-30 21:01:25 -07:00
Jingyu Zhou	00b57d4cce	Merge branch 'main-67eba5ec7' into vv Fix Conflicts: fdbclient/CommitProxyInterface.h fdbclient/NativeAPI.actor.cpp fdbclient/StorageServerInterface.h fdbserver/CommitProxyServer.actor.cpp fdbserver/storageserver.actor.cpp	2022-03-30 20:05:55 -07:00
Jingyu Zhou	e9659b5dd4	Merge branch 'master-PR-6500' into vv Fix Conflicts: fdbclient/CommitProxyInterface.h fdbclient/NativeAPI.actor.cpp fdbserver/masterserver.actor.cpp	2022-03-30 14:53:49 -07:00
Bharadwaj V.R	726cb3a18f	merge commits from main	2022-03-28 22:49:03 -07:00
Bharadwaj V.R	6d7a4b91c8	Create a server knob to control server version convergence threshold before SSI registration. Watch version lag in SS update loop and register when within lag limit	2022-03-27 12:37:38 -07:00
Xiaoxi Wang	d93b57dd88	conflict solving	2022-03-24 20:45:51 -07:00
Xiaoxi Wang	1b631a9263	solve conflict with main	2022-03-24 16:29:11 -07:00
Ata E Husain Bohra	017709aec6	Introduce BlobCipher interface and cipher caching interface (#6391 ) * Introduce BlobCipher interface and cipher caching interface diff-3: Update the code to avoid deriving encryption key periodically. Implement EncyrptBuf interface to limit memcpys. Improve both unit test and simulation to better code coverage. diff-2: Add specific error code for OpenSSL AES call failures diff-1: Update encryption scheme to AES-256-CTR. Minor updates to Header to capture more information. Major changes proposed are: 1. Introduce encyrption header format. 2. Introduce a BlobCipher cipher key representation encoding following information: baseCipher details, derived encryption cipher details, creationTime and random salt. 3. Introduce interface to support block cipher encrytion and decrytion operations. Encyrption populates encryption header allowing client to persist them on-disk, this header is then read allowing decryption on reads. 4. Introduce interface to allow in-memory caching of cipher keys. The cache allowing mapping of "encryption domain" -> "base cipher id" -> "derived cipher keys" (3D hash map). This cache interface will be used by FDB processes participating in encryption to cache recently used ciphers (performance optimization). Testing: 1. Unit test to validate caching interface. 2. Update EncryptionOps simulation test to validate block cipher operations.	2022-03-24 07:31:49 -07:00
Evan Tschannen	4a085fc844	Merge pull request #6602 from apple/blob_integration Blob integration	2022-03-23 12:02:43 -07:00
Neethu Haneesha Bingi	790ef9ff36	Adding rocksdb compaction readahead size option.	2022-03-22 13:29:35 -07:00
Josh Slocum	f27475e2f4	Merge branch 'main' into blob_integration	2022-03-22 11:41:58 -05:00
sfc-gh-tclinkenbeard	a71099471b	Update copyright header dates	2022-03-21 13:36:23 -07:00
Yi Wu	264b3811af	update the lag knob	2022-03-18 14:56:04 -07:00
Yi Wu	27df6df950	Redwood: config remap cleanup by size instead of versions	2022-03-18 14:56:04 -07:00
Josh Slocum	37e7c80f26	Merge branch 'main' into blob_integration	2022-03-17 18:45:42 -05:00
Trevor Clinkenbeard	10c536c700	Merge pull request #6435 from sfc-gh-ljoswiak/fixes/dynamic-knobs-release-readiness Dynamic knobs improvements	2022-03-16 10:26:56 -07:00
He Liu	c3a68d661e	Physical Shard Move (#6264 ) Physical Shard Move part I: Checkpoint creation, transfer and restore.	2022-03-15 13:03:23 -07:00
Josh Slocum	67eba5ec7c	Limiting DD Moves by destination SS.	2022-03-15 13:52:19 -05:00
Josh Slocum	26cbe6863d	Adjusting simulation knobs to reduce change feed memory overhead for large merge cases	2022-03-14 11:48:26 -05:00
Ata E Husain Bohra	944ec48415	Introduce a simulate EncryptKeyVaultProxy interface (#6576 ) Description Major changes proposed are: 1. Rename ServerKnob->ENABLE_ENCRYPT_KEY_PROXY to ServerKnob->ENABLE_ENCRYPTION. Approach simplifies enabling controlling encyrption code change using a single knob (desirable) 2. Implement EncyrptKeyVaultProxy simulated interface to assist validating encyrption workflows in simulation runs. The interface is leveraged to satisfy "encryption keys" lookup which otherwise gets satisfied by integrating organization preferred Encryption Key Management solution. Testing Unit test to validate the newly added code	2022-03-10 12:06:49 -08:00
Tao Lin	e2c7c30faf	GetMappedRange support serializable & check RYW & continuation (#6181 )	2022-03-10 10:05:44 -08:00
Josh Slocum	c8c97e0256	Blob Worker focused cleanup	2022-03-10 09:55:23 -06:00
Josh Slocum	1f964ac085	BM focused cleanup	2022-03-09 15:04:17 -06:00
Josh Slocum	e71b3533f9	Merge branch 'main' into blob_integration	2022-03-09 08:59:56 -06:00
Xiaoxi Wang	8d20ee8432	change storage metrics of read sample calculation	2022-03-07 17:36:43 -08:00
neethuhaneesha	212deb05e9	Merge pull request #6501 from neethuhaneesha/rocksdb-CompBytesLimiter Rocksdb knobs for compaction, storageserver canCommit() waiting if rocksdb overloaded	2022-03-07 14:59:34 -08:00
Steve Atherton	8f8f95931b	In the SQLite storage engine, destroy and create new cursors after SQLITE_CURSOR_MAX_LIFETIME_BYTES of KV read usage because cursor usage increases page cache memory usage in SQLite (internal page cache, not AsyncFileCached) by pinning pages which are not freed until the cursor is destroyed.	2022-03-07 11:20:59 -08:00
Neethu Haneesha Bingi	8796a763a5	Rocksdb knobs for compaction, storageserver canCommit() waiting if rocksdb overloaded.	2022-03-04 12:41:17 -08:00
Neethu Haneesha Bingi	83e0368eaa	RocksDB increasing background threads to speedup compaction.	2022-03-03 15:14:37 -08:00
Trevor Clinkenbeard	fe957deef8	Merge pull request #6399 from sfc-gh-bvr/fdb#4271 Introduce a new server knob and use it to test if storage servers are…	2022-02-28 13:02:23 -08:00
Zhe Wang	f14e08a991	addRocksDBPerfContextMetrics	2022-02-23 22:29:07 -05:00
Dan Lambright	8cc9a5af1a	Rebase 02/23	2022-02-23 14:23:28 -05:00
Evan Tschannen	330b2b48ec	improved file cleanup execution and testing	2022-02-22 12:00:09 -08:00
Lukas Joswiak	a8828db58e	Load balance dynamic knob requests This commit also removes an attempt to read the latest configuration snapshot when a rollforward timeout occurs. The normal retry loop will eventually fetch an up to date snapshot and the rollforward will be retried.	2022-02-22 10:53:48 -08:00
Josh Slocum	38a75a8b89	Merge branch 'main' into blob_integration	2022-02-17 17:47:38 -06:00
Bharadwaj V.R	a54acb3720	Temporarily lower safety buffer knob. AtomicBackupCorrectness needs fixing	2022-02-16 19:26:40 -08:00
Bharadwaj V.R	27855bc5b5	Merge branch 'apple:main' into fdb#4271	2022-02-16 15:38:36 -08:00
Zhe Wu	9da735c38e	Batch empty peek reply	2022-02-16 15:28:56 -08:00
Bharadwaj V.R	949f1f1c3e	Switch to testing MIN_AVAILABLE_SPACE	2022-02-16 11:33:07 -08:00
Bharadwaj V.R	3fe6a952f1	Merge with upstream tcinfo refactor and move the server knob init to be adjacent to related knobs	2022-02-16 10:28:55 -08:00
Bharadwaj V.R	fe03e6f822	Introduce a new server knob and use it to test if storage servers are near the min bar for available space	2022-02-15 22:43:06 -08:00
Trevor Clinkenbeard	ef68e6fe0d	Merge pull request #6353 from sfc-gh-ljoswiak/fixes/dynamic-knobs Fix dynamic knobs correctness issues	2022-02-10 22:13:02 -08:00
Sreenath Bodagala	c82180501a	Merge remote-tracking branch 'apple-upstream/main' into version-vector-prototype	2022-02-10 22:22:34 +00:00
Zhe Wang	d684508540	Add RatekeeperLimitReasonDetails traceevent for RK	2022-02-10 13:59:47 -08:00
Josh Slocum	c8cd8c0622	Adding request timeout for blob worker	2022-02-09 15:49:33 -06:00
Lukas Joswiak	990e215a8d	Fix formatting Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>	2022-02-09 13:43:32 -08:00
Lukas Joswiak	d5a562e6b8	Fix dynamic knobs correctness issues	2022-02-09 13:43:32 -08:00
Sreenath Bodagala	72e06369a4	Merge remote-tracking branch 'apple-upstream/main' into version-vector-prototype	2022-02-08 17:47:57 +00:00
Josh Slocum	88cab7fb67	More change feed fetching improvements and optimizations	2022-02-07 11:19:51 -06:00
Josh Slocum	ddfc301d74	Improving memory footprint of change feeds and making it configurable	2022-02-04 16:41:25 -06:00
Dan Lambright	b29e4ce506	Enable unicast	2022-01-28 13:15:18 -05:00
Ata E Husain Bohra	87ee4cf958	Add new FDB EncryptKeyProxy role Major changes includes: 1. Add a new FDB role responsible- EncyrptKeyProxy. The role is responsible to expose APIs to fetch encyrption keys interacting with external Encryption KeyManager interface. 2. The process is a FDB singleton process following similar recruitment rules as other singleton processes in the system. 3. Code to recruit the worker process; given the encryption keys are needed during recovery (decode TLog records), for now the process is co-located in same datacenter as ClusterController. 4. Skeleton process actor code; more functionality will be added in subsequent PRs. NOTE: The code is protected under a SERVER_KNOB with the default value as 'false' for now.	2022-01-25 17:38:27 -08:00
Josh Slocum	cf45354833	switched buggified and expected shard size for simulation	2022-01-20 20:37:03 -08:00
Josh Slocum	4bfef29e4c	Changed small shards in simulation logic	2022-01-20 20:37:03 -08:00
Josh Slocum	6a8e9d71d2	Raising default minimum shard size, as it causes unecessary merging on growing clusters.	2022-01-20 20:37:03 -08:00
Steve Atherton	2384c5aeb9	Change Redwood default page size knob to 8192.	2022-01-20 20:31:26 -08:00
Dan Lambright	9544379cdf	rebase	2022-01-20 11:12:33 -05:00
Neethu Haneesha Bingi	162bce7a58	Rocksdb write rate limiter.	2022-01-18 13:23:00 -08:00
Neethu Haneesha Bingi	ef4038fe8d	Rocksdb read range iterator pool to reuse iterators.	2022-01-18 02:05:21 -08:00
Ata E Husain Bohra	936bf5336a	Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine" (#6191 ) * Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine"" Major changes includes: 1. Re-revert Sequencer refactor commits listed below (in listed order): 1.a. This reverts commit `bb17e194d9`. 1.b. This reverts commit `d174bb2e06`. 1.c. This reverts commit `30b05b469c`. 2. Update Status.actor to track ClusterController interface to track recovery status. 3. Introduce a ServerKnob to define "cluster recovery trace event" prefix; for now keeping it as "Master", however, it should allow smooth transition to "Cluster" prefix as it seems more appropriate.	2022-01-06 12:15:51 -08:00
Neethu Haneesha Bingi	1f30368e71	KeyValueStoreRocksDB histograms to track latencies	2021-12-21 23:09:46 -08:00
Suraj Gupta	310d990b12	Add debugging.	2021-12-10 16:43:58 -06:00
Suraj Gupta	3fe7a9f553	More fixes	2021-12-10 16:43:58 -06:00
Suraj Gupta	932f68e1b3	Refactor bstore and add watch key.	2021-12-10 16:43:58 -06:00
Dan Lambright	8aef860a8b	rebase	2021-12-10 11:24:07 -05:00
Dan Lambright	0222d8669d	fix simulation failures	2021-12-10 09:56:21 -05:00
Evan Tschannen	951bc4acd7	fix: do not call better master exists until the long lived stateless processes have settled into their desired locations	2021-12-06 13:12:27 -08:00
Tao Lin	9b0a9c4503	Return error when getRangeAndFlatMap has more & Improve simulation tests (#6029 )	2021-12-03 12:50:07 -08:00
Evan Tschannen	243927c964	added a knob	2021-12-03 10:31:51 -08:00
Steve Atherton	bed25f9571	Delay prioritized eviction of updated pages until after commit completes.	2021-11-28 21:03:44 -08:00
Evan Tschannen	8fa7085c78	added a comment	2021-11-24 11:40:41 -08:00
Evan Tschannen	c9ee83e1b1	fix: do not buggify PEEK_TRACKER_EXPIRATION_TIME to a value of 20	2021-11-24 11:28:57 -08:00
Steve Atherton	508429f30d	Redwood chunked file growth and low priority IO starvation prevention (#5936 ) * Redwood files now growth in large page chunks controlled by a knob to reduce truncate() calls for expansion. PriorityMultiLock has limit on consecutive same-priority lock release. Increased Redwood max priority level to 3 for more separation at higher BTree levels. * Simulation fix, don't mark certain IO timeout errors as injected unless the simulated process has been set to have an unreliable disk. * Pager writes now truncate gradually upward, one chunk at a time, in response to writes, which wait on only the necessary truncate operations. Increased buggified chunk size because truncate can be very slow in simulation. * In simulation, ioTimeoutError() and ioDegradedOrTimeoutError() will wait until at least the target timeout interval past the point when simulation is sped up. * PriorityMultiLock::toString() prints more info and is now public. * Added queued time to PriorityMultiLock. * Bug fix to handle when speedUpSimulation changes later than the configured time. * Refactored mutation application in leaf nodes to do fewer comparisons and do in place value updates if the new value is the same size as the old value. * Renamed updatingInPlace to updatingDeltaTree for clarity. Inlined switchToLinearMerge() since it is only used in one place. * Updated extendToCover to be more clear by passing in the old extension future as a parameter. Fixed initialization warning.	2021-11-12 13:47:07 -08:00
Dan Lambright	4979ccb889	commits recovered if written to every tlog minus failure tolerance.	2021-11-12 12:10:04 -05:00
Jingyu Zhou	ad9ff18f69	Flip PROXY_USE_RESOLVER_PRIVATE_MUTATIONS to true	2021-11-11 11:26:18 -08:00
Daniel Smith	394b9dc619	Code review changes	2021-11-10 11:53:27 -05:00
Daniel Smith	f6342b0a8d	Update defaults	2021-11-10 11:51:05 -05:00
Daniel Smith	66520eb1c1	Utilize read types to do selective throttling	2021-11-10 11:51:04 -05:00
Tao Lin	fdb3b72e35	Introduce GetRangeAndFlatMap to push computations down to FDB Re-introduce #5609	2021-11-09 13:52:28 -08:00
Tao Lin	586cc3b102	Revert "Introduce GetRangeAndFlatMap to push computations down to FDB"	2021-11-04 08:46:56 -07:00
Tao Lin	0853661d13	Introduce getRangeAndHop to push computations down to FDB	2021-11-03 13:21:16 -07:00
Dan Lambright	befe1993c4	fix conflict on rebase	2021-10-29 12:25:26 -04:00
Sreenath Bodagala	7b0c39572e	- Disable PROXY_USE_RESOLVER_PRIVATE_MUTATIONS in simulation tests	2021-10-28 21:31:53 +00:00
Xiaoxi Wang	1a2a838df3	add knob	2021-10-27 09:08:37 -07:00
Xiaoxi Wang	69190ed04e	format	2021-10-27 09:08:37 -07:00
Xiaoxi Wang	0053b4793e	change knob and delete redundant doBuildTeam	2021-10-27 09:08:37 -07:00
Evan Tschannen	2208b04174	Merge pull request #5855 from sfc-gh-etschannen/blob_full_clean Blob Granules V0	2021-10-26 09:57:35 -07:00
Lukas Joswiak	c96f560cbe	Verify rollback of a single version in simulation, other small fixes	2021-10-25 12:03:22 -07:00
Josh Slocum	0ff8ddc2b6	Merge branch 'master' into blob_full_clean	2021-10-25 13:38:48 -05:00
Steve Atherton	d153519188	Merge pull request #5813 from sfc-gh-jslocum/ss_ebrake_streaming_fix Fixes to ss e-brake, tlog streaming, and their interaction	2021-10-22 10:46:17 -07:00
Josh Slocum	773886515e	Merge branch 'feature-range-feed' into blob_full_clean	2021-10-22 11:07:51 -05:00
Zhe Wu	0cf829ef91	Reduce restore error message	2021-10-20 14:02:48 -07:00
sbodagala	48a0ecd647	Merge pull request #5787 from dlambrig/integrate-PR5700 version vector / Calculate TPCV on resolvers	2021-10-20 16:35:44 -04:00
Josh Slocum	8dd7f8f447	Fixes to ss e-brake, tlog streaming, and their interaction	2021-10-20 10:48:29 -05:00
Suraj Gupta	ff0d687704	Cleanup comments for server knobs.	2021-10-18 17:45:30 -04:00
Sreenath Bodagala	246f035afe	Merge remote-tracking branch 'apple-upstream/master' into version-vector-prototype	2021-10-18 19:07:33 +00:00
Dan Lambright	eb814ce070	Remove dead code, check replies match for all resolvers	2021-10-18 10:23:08 -04:00
A.J. Beamon	507a09893c	Add ClientCount to ClusterControllerMetrics (#5748 )	2021-10-17 20:47:11 -07:00
Jingyu Zhou	eaf0a00502	Enable PROXY_USE_RESOLVER_PRIVATE_MUTATIONS for simulation tests	2021-10-15 09:47:23 -04:00
Jingyu Zhou	0dc9c607f4	Add knob PROXY_USE_RESOLVER_PRIVATE_MUTATIONS To control proxy to use private mutations from resolvers or not.	2021-10-15 09:47:23 -04:00
Josh Slocum	5f0ec0612a	Merge branch 'feature-range-feed' into blob_full	2021-10-13 15:44:35 -05:00
Suraj Gupta	2ec8781224	Merge knobs into one.	2021-10-13 14:00:37 -04:00

529 Commits