Commit Graph

10523 Commits

Author SHA1 Message Date
Xiaoxi Wang 9358aea097 fix busy loop with correct error handling in valley filler 2022-09-21 14:58:34 -07:00
Xiaoxi Wang f9e0230b86 DDQueue constructor with ITxnProcessor 2022-09-21 10:56:22 -07:00
Xiaoxi Wang 97fd5878d9 change DDTeamCollection constructor 2022-09-20 13:00:28 -07:00
Chaoguang Lin 125137b987
Change the special key space correctness workload to hit code probe (#8214) 2022-09-19 15:01:21 -07:00
Josh Slocum 0f3f493c28
Merge pull request #8218 from sfc-gh-jslocum/aligned_purge
fixes for non-aligned blob range calls
2022-09-19 15:00:02 -05:00
A.J. Beamon 4fd64630e8 Convert literal string ref instances to use _sr suffix 2022-09-19 11:35:58 -07:00
Markus Pilman bd8347d92e
Merge pull request #7769 from sfc-gh-mpilman/features/always-inject-faults
Separated normal workloads and failure injection
2022-09-19 11:48:36 -06:00
Markus Pilman 028a5dcb77 by default, add a failure injection workload at most 3 times 2022-09-19 09:42:34 -06:00
Markus Pilman e1627e0a78 Merge remote-tracking branch 'origin/main' into features/always-inject-faults 2022-09-19 09:38:55 -06:00
Josh Slocum 88f88707f5 fixes for non-aligned blob range calls 2022-09-16 19:06:15 -05:00
Yi Wu e66942ada4
Update Redwood encryption interface (#8172)
Update the Redwood encryption interface to make it better suited for per-tenant encryption, where we will need to do tenant page splits.
2022-09-16 15:56:05 -07:00
Markus Pilman ef2d301305 Merge remote-tracking branch 'origin/main' into features/always-inject-faults 2022-09-16 16:41:16 -06:00
Andrew Noyes 2bdfc52f97
Fix heap use after free (#8189)
Previously, we had Ref types outliving the arenas that owned them,
specifically encryptDomains in the getResolution actor. Refactor to use
Standalones, which both fixes the memory error and makes this easier to
reason about.

Also fix a potential ODR violation.
2022-09-16 13:46:05 -07:00
Markus Pilman 9441795c8e address review comments 2022-09-16 14:40:10 -06:00
Steve Atherton 4cbe8be459
Merge pull request #8201 from sfc-gh-satherton/kaio-latency-pctile
Changed AsyncFileKAIO Read/WriteSync latency metrics to LatencySample
2022-09-16 10:46:38 -07:00
Evan Tschannen f5161c362e
fix: persistentData->commit() was not protected by the persistentDataCommitLock (#8200)
* fix: persistentData->commit() was not protected by the persistentDataCommitLock, meaning it is possible for inconsistent data to be made durable on the tlog

* fixed a compilation error
2022-09-16 09:47:20 -07:00
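
The fix above is about a commit that ran without holding the lock other writers rely on. The following is only a minimal Python sketch of that general idea, not the FoundationDB C++/Flow code: `PersistentStore`, `update_pair`, and the keys `a` and `b` are hypothetical names invented for illustration. It shows that when the commit takes the same lock as a multi-step update, a half-applied update cannot become durable.

```python
import asyncio


class PersistentStore:
    """Hypothetical stand-in for a store with pending and durable state."""

    def __init__(self):
        self.commit_lock = asyncio.Lock()
        self.pending = {}
        self.durable = {}

    async def update_pair(self, value):
        # Two related keys are written with a yield point in between.
        async with self.commit_lock:
            self.pending["a"] = value
            await asyncio.sleep(0)  # another task may run here
            self.pending["b"] = value

    async def commit(self):
        # Taking the same lock keeps a half-applied update from
        # being copied into the durable state.
        async with self.commit_lock:
            self.durable = dict(self.pending)


async def run_once():
    store = PersistentStore()
    # The update starts first and yields midway; the commit must wait for it.
    await asyncio.gather(store.update_pair(1), store.commit())
    return store.durable


durable = asyncio.run(run_once())
```

Without the `async with self.commit_lock` in `commit`, the commit could run at the yield point and persist only key `a`.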
sfc-gh-ngoyal 1bd97fe628
Recruit new singleton for consistency checker. (#5804)
* Recruit new singleton for consistency checker.

* Recruit the consistency checker only if enabled.

* Add a yield in monitorConsistencyChecker().

* Minor fixes.

* Consistency check workload enhancements.

* Minor fixes and clarifications.

* clang format

* Clang format.

* Minor fixes, cleanup, debug tracing.

* Misc.

* Move the consistency scan information from dbconfig to a key backed object.

* Move consistency scan config out of db config to a state object and feature rename.

* ConsistencyCheck workload refactor.

* devFormat

* Update fdbcli/ConsistencyScanCommand.actor.cpp

* Review Comments.

Co-authored-by: negoyal <neelam.goyal@gmail.com>
Co-authored-by: Ata E Husain Bohra <ata.husain@snowflake.com>
2022-09-16 09:03:06 -07:00
Steve Atherton 0e0a10ab45 Merge commit '59b04d46cff720df5d267daa8dc8a60c25466f74' into kaio-latency-pctile 2022-09-15 20:27:02 -07:00
Steve Atherton 2bf90ca5ec Change KAIO latency metrics to use LatencySample for easier usability. Rename a SQLite-specific knob to indicate it is specific to SQLite. 2022-09-15 13:27:23 -07:00
Hao Fu e9a12b4a1a
Fix memory bug in index prefetch code path (#8187)
The same address was being reused for multiple queries incorrectly.
2022-09-15 16:18:37 -04:00
Hui Liu b19f1b5e3b
Merge pull request #8109 from sfc-gh-huliu/bmr
Add blob manifest for full restore
2022-09-15 11:05:41 -07:00
Hui Liu 59be25848f bootstrap blob manager and blob worker from blob manifest 2022-09-15 09:50:12 -07:00
sfc-gh-tclinkenbeard b9fbd8c130 Fix -Wlogical-op-parentheses warning 2022-09-15 09:00:33 -07:00
sfc-gh-tclinkenbeard 82adc1e856 Make g_simulator a pointer 2022-09-15 09:00:33 -07:00
Steve Atherton c7924c9fb3
Merge pull request #8173 from sfc-gh-satherton/read-stage-latencies
Add new latency samples for GetValue, GetRange, QueueWait, and VersionWait
2022-09-14 17:31:51 -07:00
Steve Atherton c254de5031 Rename read latency trace events to a different scheme. 2022-09-14 16:16:23 -07:00
Lukas Joswiak cfbd04ae78 Add explicit check for a simulated network 2022-09-14 14:14:49 -07:00
Josh Slocum d4ba6c266c
Merge pull request #8176 from sfc-gh-jslocum/ss_cf_burst_fix_main
Fixing Thundering Herd problem of change feed stream retries in SS
2022-09-14 16:01:20 -05:00
Steve Atherton a8d3349898 Add latency sample for getKey. 2022-09-14 13:22:16 -07:00
Ata E Husain Bohra d2b82d2c46
Introduce "default encryption domain" (#8139)
* Introduce "default encryption domain"

Description

In the current FDB native encryption data-at-rest implementation,
an entity getting encrypted (mutation, KV and/or file) is categorized
into one of the following encryption domains:
1. Tenant domain, where, Encryption domain == Tenant boundaries
2. FDB system keyspace - FDB metadata encryption domain
3. FDB Encryption Header domain - used to generate digest for
plaintext EncryptionHeader.

The scheme doesn't support encryption if an entity can't be categorized
into any of the above-mentioned encryption domains; for instance, non-tenant
mutations are NOT supported.

The patch extends encryption support to mutations for which the corresponding
Tenant information can't be obtained (Key length shorter than TenantPrefix)
and/or mutations that do not belong to any valid Tenant
(FDB management cluster data) by mapping such mutations to a
"default encryption domain".

TODO

The CommitProxy-driven TLog encryption implementation requires every transaction
mutation to contain 1 KV, not crossing Tenant boundaries. The only exception to
this rule is ClearRange mutations. For now, ClearRange mutations are mapped
to the 'default encryption domain'; a subsequent patch will propose appropriate
handling for ClearRange mutations.

Testing

devRunCorrectness - 100k
2022-09-14 10:58:32 -07:00
Steve Atherton 71aef89f8a
Merge pull request #8167 from sfc-gh-satherton/fix-ide-errors
Fix some IDE build errors and warnings.
2022-09-14 10:39:37 -07:00
Jingyu Zhou e70a18e638
Merge pull request #8122 from xumengpanda/mengxu/io-timeout-main
Add STORAGE_SERVER_REBOOT_ON_IO_TIMEOUT knob to reboot SS on io_timeout
2022-09-14 10:11:46 -07:00
Josh Slocum 3e5e49b635 Operational improvements to limit thundering herd effect of many change feed queries being retried simultaneously 2022-09-14 09:57:21 -05:00
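
The commit message above does not say how the retries are spread out. One common way to limit a thundering herd is randomized (jittered) exponential backoff, sketched here in Python as an illustration only. The function name `retry_delay` and the `base` and `cap` defaults are assumptions, not values taken from the FoundationDB change.

```python
import random


def retry_delay(attempt: int, base: float = 0.1, cap: float = 5.0, rng=random) -> float:
    """Full-jitter backoff: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    upper = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, upper)


# Five clients retrying after the same failure pick different delays,
# so their retries do not all arrive at the same instant.
rng = random.Random(42)
delays = [retry_delay(3, rng=rng) for _ in range(5)]
capped = retry_delay(20, rng=rng)  # large attempt numbers are bounded by cap
```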
Steve Atherton 3df6a86d22 Add new latency samples for GetValue, GetRange, QueueWait (the time between request receipt and the start of processing), and VersionWait (the time after QueueWait spent waiting for the necessary data version to arrive). 2022-09-14 01:28:16 -07:00
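
The two wait stages named in this commit can be read as differences between three timestamps. The Python sketch below only illustrates those definitions; `ReadTimestamps` and its field names are hypothetical and do not come from the FoundationDB source.

```python
from dataclasses import dataclass


@dataclass
class ReadTimestamps:
    """Hypothetical timestamps (in seconds) recorded while serving one read."""
    received: float          # request arrived at the server
    processing_start: float  # request left the queue and began processing
    version_ready: float     # required data version became available


def stage_latencies(ts: ReadTimestamps) -> dict:
    """Split the total wait into the two stages described in the commit message."""
    return {
        "QueueWait": ts.processing_start - ts.received,
        "VersionWait": ts.version_ready - ts.processing_start,
    }


sample = stage_latencies(
    ReadTimestamps(received=10.0, processing_start=10.5, version_ready=12.0)
)
```

Here the request spends 0.5 s queued and a further 1.5 s waiting for its version.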
Steve Atherton adff908c19 Fix compiler error: state pointer vars must be initialized. 2022-09-14 00:26:54 -07:00
Meng Xu 9e9efb69a0 Format code to repo style 2022-09-13 16:59:45 -07:00
Lukas Joswiak 8c50f98c00 Update type of coordinators hash
This fixes some serialization issues caused by `BinaryReader` not being
able to deserialize values of type `size_t`.
2022-09-13 16:53:54 -07:00
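
The commit message does not state which type replaced `size_t`. A usual remedy for platform-dependent integer sizes is to serialize with an explicit fixed width, which the Python sketch below illustrates under that assumption. The names `HASH_FORMAT`, `serialize_hash`, and `deserialize_hash` are hypothetical, and the 64-bit little-endian layout is chosen only for the example.

```python
import struct

# Assumption: the hash is stored as a fixed-width unsigned 64-bit integer.
HASH_FORMAT = "<Q"  # little-endian, exactly 8 bytes on every platform


def serialize_hash(value: int) -> bytes:
    """Encode the hash with an explicit width instead of a native size."""
    return struct.pack(HASH_FORMAT, value)


def deserialize_hash(data: bytes) -> int:
    """Decode a hash written by serialize_hash."""
    (value,) = struct.unpack(HASH_FORMAT, data)
    return value


encoded = serialize_hash(0x1234ABCD)
decoded = deserialize_hash(encoded)
```

Because the width is part of the format, a reader and a writer built for different platforms agree on the byte layout.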
Lukas Joswiak 424bb87f3e Apply feedback 2022-09-13 16:53:54 -07:00
Lukas Joswiak e22138d3d3 Update fdbserver/include/fdbserver/ClusterRecovery.actor.h
Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>
2022-09-13 16:53:54 -07:00
Lukas Joswiak 5a1f9f3e9e Update fdbserver/worker.actor.cpp
Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>
2022-09-13 16:53:54 -07:00
Lukas Joswiak 2a26777d64 Disable coordinator change when using simple configuration database 2022-09-13 16:53:54 -07:00
Lukas Joswiak 7ee6be9238 Simplify how ConfigBroadcastInterface is stored on worker 2022-09-13 16:53:54 -07:00
Lukas Joswiak 809d77c2ab Fix issue where annotations were not being serialized 2022-09-13 16:53:54 -07:00
Lukas Joswiak b2d395a304 Delay cluster controller restart when pushing knob updates to workers
This gives the `ConfigBroadcaster` time to send the knob change to all
workers before applying the change to itself and restarting.
2022-09-13 16:53:54 -07:00
Lukas Joswiak 8d237ba493 Fix various correctness and timeout issues
Contains the following fixes:

* When handling the special case rollforward where nodes can be rolled
  forward even if a majority are at version 0, we don't want to reset
  the live version of the node being rolled forward. This is because a
  quorum of nodes at version 0 can continue handing out and incrementing
  their live version, and if they are rolled forward there is the
  potential for them to go back in time in regard to their live version.
  So in this one special case, they should maintain their existing live
  version.
* Fixes some unseed issues due to fields not being initialized properly.
* Temporarily disables a coordinator restart in the recovery path (in
  the coordinated state) due to it causing a timeout. This needs more
  investigation in the future.
2022-09-13 16:53:54 -07:00
Lukas Joswiak 249ff2b2fd Fix configuration database unit tests 2022-09-13 16:53:54 -07:00
Lukas Joswiak cd2bbffa4c Add flag to disable the configuration database
The `--no-config-db` flag, passed to `fdbserver`, will disable the
configuration database. When this flag is specified, no `ConfigNode`s
will be started, the `ConfigBroadcaster` will not be started, and on a
coordinator change no attempt will be made to lock `ConfigNode`s.
2022-09-13 16:53:54 -07:00
Lukas Joswiak 74ac617a34 Add support for changing coordinators to the configuration database
Configuration database data lives on the coordinators. When a change
coordinators command is issued, the data must be sent to the new
coordinators to keep the database consistent.
2022-09-13 16:53:54 -07:00
Trevor Clinkenbeard b641bd6c04
Merge pull request #8168 from sfc-gh-tclinkenbeard/add-doappendiffits-unittest
Add `/Atomic/DoAppendIfFits` unit test
2022-09-13 15:18:59 -07:00
Xiaoxi Wang 2ae01bdf2d
Merge pull request #8162 from sfc-gh-xwang/feature/main/moveKey
Implement txnProcessor->moveKeys(const MoveKeysParams& params)
2022-09-13 14:11:49 -07:00