
116 Commits

Author SHA1 Message Date
Trevor Clinkenbeard fe957deef8 Merge pull request #6399 from sfc-gh-bvr/fdb#4271
Introduce a new server knob and use it to test if storage servers are…
2022-02-28 13:02:23 -08:00
Zhe Wang f14e08a991 addRocksDBPerfContextMetrics 2022-02-23 22:29:07 -05:00
Bharadwaj V.R a54acb3720 Temporarily lower safety buffer knob. AtomicBackupCorrectness needs fixing 2022-02-16 19:26:40 -08:00
Bharadwaj V.R 27855bc5b5 Merge branch 'apple:main' into fdb#4271 2022-02-16 15:38:36 -08:00
Zhe Wu 9da735c38e Batch empty peek reply 2022-02-16 15:28:56 -08:00
Bharadwaj V.R 949f1f1c3e Switch to testing MIN_AVAILABLE_SPACE 2022-02-16 11:33:07 -08:00
Bharadwaj V.R 3fe6a952f1 Merge with upstream tcinfo refactor and move the server knob init to be adjacent to related knobs 2022-02-16 10:28:55 -08:00
Bharadwaj V.R fe03e6f822 Introduce a new server knob and use it to test if storage servers are near the min bar for available space 2022-02-15 22:43:06 -08:00
Trevor Clinkenbeard ef68e6fe0d Merge pull request #6353 from sfc-gh-ljoswiak/fixes/dynamic-knobs
Fix dynamic knobs correctness issues
2022-02-10 22:13:02 -08:00
Zhe Wang d684508540 Add RatekeeperLimitReasonDetails traceevent for RK 2022-02-10 13:59:47 -08:00
Lukas Joswiak 990e215a8d Fix formatting
Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>
2022-02-09 13:43:32 -08:00
Lukas Joswiak d5a562e6b8 Fix dynamic knobs correctness issues 2022-02-09 13:43:32 -08:00
Ata E Husain Bohra 87ee4cf958 Add new FDB EncryptKeyProxy role
Major changes include:

1. Add a new FDB role: EncryptKeyProxy. The role is
   responsible for exposing APIs that fetch encryption keys by interacting
   with the external Encryption KeyManager interface.
2. The process is an FDB singleton process following recruitment rules
   similar to those of the other singleton processes in the system.
3. Code to recruit the worker process; because the encryption keys are
   needed during recovery (to decode TLog records), for now the process
   is co-located in the same datacenter as the ClusterController.
4. Skeleton process actor code; more functionality will be added in
   subsequent PRs.

NOTE: The code is protected under a SERVER_KNOB with a default
      value of 'false' for now.
2022-01-25 17:38:27 -08:00
Josh Slocum cf45354833 switched buggified and expected shard size for simulation 2022-01-20 20:37:03 -08:00
Josh Slocum 4bfef29e4c Changed small shards in simulation logic 2022-01-20 20:37:03 -08:00
Josh Slocum 6a8e9d71d2 Raising the default minimum shard size, as it causes unnecessary merging on growing clusters. 2022-01-20 20:37:03 -08:00
Steve Atherton 2384c5aeb9 Change Redwood default page size knob to 8192. 2022-01-20 20:31:26 -08:00
Neethu Haneesha Bingi 162bce7a58 Rocksdb write rate limiter. 2022-01-18 13:23:00 -08:00
Neethu Haneesha Bingi ef4038fe8d Rocksdb read range iterator pool to reuse iterators. 2022-01-18 02:05:21 -08:00
Ata E Husain Bohra 936bf5336a Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine" (#6191)
* Revert "Revert "Refactor: ClusterController driving cluster-recovery state machine""

Major changes include:
1. Re-revert Sequencer refactor commits listed below (in listed order):
1.a. This reverts commit bb17e194d9.
1.b. This reverts commit d174bb2e06.
1.c. This reverts commit 30b05b469c.

2. Update Status.actor to track recovery status through the
   ClusterController interface.
3. Introduce a ServerKnob to define the "cluster recovery trace event"
   prefix; for now it remains "Master", but the knob should allow a
   smooth transition to the "Cluster" prefix, which seems more appropriate.
2022-01-06 12:15:51 -08:00
Neethu Haneesha Bingi 1f30368e71 KeyValueStoreRocksDB histograms to track latencies 2021-12-21 23:09:46 -08:00
Tao Lin 9b0a9c4503 Return error when getRangeAndFlatMap has more & Improve simulation tests (#6029) 2021-12-03 12:50:07 -08:00
Steve Atherton bed25f9571 Delay prioritized eviction of updated pages until after commit completes. 2021-11-28 21:03:44 -08:00
Evan Tschannen 8fa7085c78 added a comment 2021-11-24 11:40:41 -08:00
Evan Tschannen c9ee83e1b1 fix: do not buggify PEEK_TRACKER_EXPIRATION_TIME to a value of 20 2021-11-24 11:28:57 -08:00
Steve Atherton 508429f30d Redwood chunked file growth and low priority IO starvation prevention (#5936)
* Redwood files now grow in large page chunks, controlled by a knob, to reduce truncate() calls for expansion. PriorityMultiLock has a limit on consecutive same-priority lock releases. Increased the Redwood max priority level to 3 for more separation at higher BTree levels.

* Simulation fix, don't mark certain IO timeout errors as injected unless the simulated process has been set to have an unreliable disk.

* Pager writes now truncate gradually upward, one chunk at a time, in response to writes, which wait on only the necessary truncate operations.   Increased buggified chunk size because truncate can be very slow in simulation.

* In simulation, ioTimeoutError() and ioDegradedOrTimeoutError() will wait until at least the target timeout interval past the point when simulation is sped up.

* PriorityMultiLock::toString() prints more info and is now public.

* Added queued time to PriorityMultiLock.

* Bug fix to handle when speedUpSimulation changes later than the configured time.

* Refactored mutation application in leaf nodes to do fewer comparisons and do in place value updates if the new value is the same size as the old value.

* Renamed updatingInPlace to updatingDeltaTree for clarity.  Inlined switchToLinearMerge() since it is only used in one place.

* Updated extendToCover to be more clear by passing in the old extension future as a parameter.  Fixed initialization warning.
2021-11-12 13:47:07 -08:00
Daniel Smith 394b9dc619 Code review changes 2021-11-10 11:53:27 -05:00
Daniel Smith f6342b0a8d Update defaults 2021-11-10 11:51:05 -05:00
Daniel Smith 66520eb1c1 Utilize read types to do selective throttling 2021-11-10 11:51:04 -05:00
Tao Lin fdb3b72e35 Introduce GetRangeAndFlatMap to push computations down to FDB
Re-introduce #5609
2021-11-09 13:52:28 -08:00
Tao Lin 586cc3b102 Revert "Introduce GetRangeAndFlatMap to push computations down to FDB" 2021-11-04 08:46:56 -07:00
Tao Lin 0853661d13 Introduce getRangeAndHop to push computations down to FDB 2021-11-03 13:21:16 -07:00
Xiaoxi Wang 1a2a838df3 add knob 2021-10-27 09:08:37 -07:00
Xiaoxi Wang 69190ed04e format 2021-10-27 09:08:37 -07:00
Xiaoxi Wang 0053b4793e change knob and delete redundant doBuildTeam 2021-10-27 09:08:37 -07:00
Evan Tschannen 2208b04174 Merge pull request #5855 from sfc-gh-etschannen/blob_full_clean
Blob Granules V0
2021-10-26 09:57:35 -07:00
Lukas Joswiak c96f560cbe Verify rollback of a single version in simulation, other small fixes 2021-10-25 12:03:22 -07:00
Josh Slocum 0ff8ddc2b6 Merge branch 'master' into blob_full_clean 2021-10-25 13:38:48 -05:00
Steve Atherton d153519188 Merge pull request #5813 from sfc-gh-jslocum/ss_ebrake_streaming_fix
Fixes to ss e-brake, tlog streaming, and their interaction
2021-10-22 10:46:17 -07:00
Josh Slocum 773886515e Merge branch 'feature-range-feed' into blob_full_clean 2021-10-22 11:07:51 -05:00
Zhe Wu 0cf829ef91 Reduce restore error message 2021-10-20 14:02:48 -07:00
Josh Slocum 8dd7f8f447 Fixes to ss e-brake, tlog streaming, and their interaction 2021-10-20 10:48:29 -05:00
Suraj Gupta ff0d687704 Cleanup comments for server knobs. 2021-10-18 17:45:30 -04:00
A.J. Beamon 507a09893c Add ClientCount to ClusterControllerMetrics (#5748) 2021-10-17 20:47:11 -07:00
Josh Slocum 5f0ec0612a Merge branch 'feature-range-feed' into blob_full 2021-10-13 15:44:35 -05:00
Suraj Gupta 2ec8781224 Merge knobs into one. 2021-10-13 14:00:37 -04:00
Suraj Gupta a163619fbc Change default val for knob. 2021-10-13 09:58:09 -04:00
Suraj Gupta 5a6a052c55 Add a knob to gate blob-related work. 2021-10-13 09:48:02 -04:00
Zhe Wu c0fbe5471f Implement the core logic of grey failure triggered failover 2021-10-07 11:19:34 -07:00
Suraj Gupta 4d54669ccd Recruit the blob workers via blob manager.
In this PR, the blob manager now recruits blob workers
(via communication with the cluster controller). Blob workers
are onboarded as blob worker processes enter the cluster.
2021-10-04 11:07:08 -04:00