Commit Graph

167 Commits

Author SHA1 Message Date
Zhe Wang 33eecd0775
Real-time corruption detection with accumulative checksum (#11255)
* acs framework

* code refactor and fix bugs

* add ss crash loop protector

* use sharedptr instead of raw pointer

* fixed critical bugs and add private mutation acs to the framework

* enable ACS for all mutations except for clear serverTag mutation and fix bugs

* fix restarting tests

* refactor code and fix bugs

* fix AccumulativeChecksumState toString

* fix bugs

* allow all mutations in acs and fixed bugs

* fix bugs and code cleanup

* code clean up for adding recovery support

* simplify code and support recovery

* clear acs state at ss

* fix bug

* terminate validator if ss will be removed in the current batch

* simplify code

* add trace

* address comments

* optimize code

* deep copy when adding mutation to acs validator

* wrap encode and decode persist acs key

* make acstable private

* remove useless func

* remove useless func

* remove epoch in ACS validator

* add acs mutation counter in SS metrics

* code cleanup and make knob check better

* make mutation buffer global

* simplify code

* add comments

* make knob randomly set

* address comments

* ss reboot after acs mismatch found
2024-04-04 15:03:44 -07:00
Zhe Wang b10c7107bb
Enable Accumulative Checksum in MutationRef (#11225)
* code clean up and add accumulative checksum bits to mutation ref

* address comments and fix issues

* address comments

* propagate acs index from commit proxy to storage server

* address comments

* address comments

* address comments

* address comments
2024-03-11 09:51:31 -07:00
He Liu 9d8d52cbb7
Added checksum in MutationRef (#11181)
* Append checksum to param2.

* Pass sim tests w/o validating checksums.

* Code cleanup.

* Renew checksum.

* Remove checksum for all private mutations.

* Added checksum validation at SS.

* Fixed VERSION_TIMESTAMP.

* Disable Mutation Checksum by default.

* Cleanup.

* cleanup.
2024-02-09 13:36:41 -08:00
Dimitris Apostolou a88114c222
Fix typos 2024-02-07 01:16:00 +02:00
Dan Lambright 86a2301faa updated per review comments 2024-01-03 12:21:54 -05:00
Dan Lambright 20882507f4 sanity checks, fix knob 2024-01-02 12:09:32 -05:00
Dan Lambright 857e38b80b bug fixes/cleanup 2023-12-21 16:39:22 -05:00
Dan Lambright 05571c59a9 Set tags on apply metadata mutations 2023-12-21 13:20:21 -05:00
Dan Lambright 2b4b4ae512 Synthesize data on SS based off parameters from new system transaction 2023-12-20 11:25:47 -05:00
Dan Lambright 5ebe8b0915 move data to value and parse it 2023-12-18 09:10:06 -05:00
Dan Lambright a20f9d3475 Interfaces to synthesize data 2023-12-13 15:19:17 -05:00
Evan Tschannen ef682d304e fix IKeyValueStore include 2023-06-16 13:28:40 -07:00
Evan Tschannen 359e178dcd Merge branch 'main' into feature-durable-change-feed
# Conflicts:
#	fdbclient/ClientKnobs.cpp
#	fdbserver/BlobManager.actor.cpp
#	fdbserver/worker.actor.cpp
2023-06-11 13:58:35 -07:00
He Liu 8ad7ec6fdf
Psm ss (#9817)
* Update NativeAPI getCheckpointForRange().

* Implemented checkpoint in SS.

* clean up.

* Disabled StorageServerCheckpointTest.

* Serialized checkpoint creation and deletion.

Simplified checkpoint GC, via deleting CheckpointMetaData::dir.

* Fixed PhysicalShardMove test, where the fetchCheckpoint target range was set incorrectly.

* Minor improvements on CheckpointMetaData and DataMoveMetaData.

* fmt.

* Optimized PhysicalShardMove test

cleanup.

* Refactored ShardedRocks checkpoint/restore for psm.

* Complete ShardedRocks::restore.

* dismiss operation_obsolete, and throw actor_cancelled.

* Validate checkpoint when !asKeyValues.

* fmt.

* Don't read from uninitialized physical shard.

* Resolved comments.

* cleanup.

* Added verify_checksum_before_restore for ShardedRocks.

* Added ShardedRocksDB checkpoint/restore unit test.

* Populate CheckpointMetaData::dir in RocksDB.

* Rename MovingIn as Adding.

* Added StorageServerUtils.

* Added physical shard move in SS.

* Fix on ApplyMetaData, doFetchFile error handling etc.

* Debugging incorrect shard size.

* Create/delete checkpoints only when Physical shard move is enabled.

* Added back SHARD_ENCODE_LOCATION_METADATA.

* Fixed bytesSample incorrect issue.

Essentially, CheckpointRocksDBCF is now dedicated to key-value based checkpoints; a new format will be needed for file-based checkpoints.

* Cleanup.

* Cleanup & compile rocksdb with 8.1 branch.

* clean up.

* clean up.

* Allowed request_maybe_delivered error type in FetchShard.

* Added FDBRocksDBVersion.h.

* Fixed stuck fetchShard.

* Don't create checkpoint on TSS.

* Upgrade to RocksDB 8.1.1

* Cleanup.

* Fixed accidentally deleted db_path and name fields.

* Improved trace event.

* Removed redundancies from previous ShardedRocksDB.

* Cleanup.

* cleanup.

* cleanup.

* rename `state`.

* Cleanup.

* Removed excessive TraceEvent.

* Fixed shardMap race condition on different threads
* Added *Stats, logging data move rates.
* Added `DD_PHYSICAL_SHARD_MOVE_PROBABILITY` to support hybrid data move.

* Resolved comments.

* fmt.

* Use physical shard move in PhysicalShardMoveTest.

* Enforce physical-shard-move for PhysicalShardMoveTest.

* fmt
2023-05-23 11:18:35 -07:00
A.J. Beamon d8141c049d Add code probes for tenant code 2023-05-10 20:44:39 -07:00
Evan Tschannen 3dd86d6c22 move IKeyValueStore.h to the client 2023-05-10 15:41:47 -07:00
A.J. Beamon f1cbc86b94 Add a metacluster version to the MetaclusterRegistrationEntry and validate it when loading the entry from the cluster. 2023-04-27 10:04:57 -07:00
Steve Atherton 53ee26d758 Changed KeyBackedTypes to an actor file. Added TypedKeySelectors for Map and Set classes and getRange() keySelector methods. Added debug macro for KeyBackedTypes. Rewrote KeyBackedRangeMap using keyselectors on KeyBackedMap. 2023-04-18 22:21:19 -07:00
A.J. Beamon f0dfe68ded When enabling checkpoints in apply metadata mutation, validate the existence of any server tag key we try to use 2023-04-13 12:26:58 -07:00
A.J. Beamon 807646675c Refactor the metacluster project into smaller files, and reorganize the namespaces. Move some metacluster and tenant testing helpers into the metacluster project. 2023-03-30 16:20:09 -07:00
A.J. Beamon e61748c7d5 Move metacluster into its own directory and static library 2023-03-30 16:07:49 -07:00
A.J. Beamon dd650215d4 Store a smaller tenant object in the txn state store 2023-02-23 09:29:33 -08:00
Markus Pilman 4e31cd7582 Fix compilation error due to TenantLockState being moved to a different namespace 2023-02-22 15:21:47 -07:00
Markus Pilman 15d8548c0e Merge remote-tracking branch 'origin/main' into features/tenant-lock2
# Conflicts:
#	fdbserver/ApplyMetadataMutation.cpp
#	fdbserver/storageserver.actor.cpp
2023-02-21 13:39:35 -07:00
Yi Wu eac757d186
EaR: cleanup encryption knobs (#9386)
Changes:
* Cleanup all encryption knobs 
* Update simulated cluster to randomly enable encryption with higher probability
2023-02-18 13:18:20 -08:00
Yi Wu fe18c87ac6
EaR: commit proxy fetch additional cipher keys post-resolution (#9308)
Commit proxy needs to fetch additional cipher keys post-resolution, since tenant ids for raw access requests and cross-tenant clear ranges are calculated after resolution.
2023-02-14 13:05:51 -08:00
Markus Pilman 1c712df709 Tenant lock workload succeeding 2023-02-08 15:51:09 +01:00
Markus Pilman 280e50d42d fixed compilation issues 2023-02-06 11:56:39 +01:00
Markus Pilman 64c11c1f34 Implemented tenant lock and basic test 2023-01-26 15:47:39 +01:00
A.J. Beamon fd13bc04c8 Update the tenant maps to be keyed by ID 2023-01-23 14:09:12 -08:00
Nim Wijetunga 330ac71630
Tenant Deletion Support for Backup Mutation Log (#9103)
tenant deletion support for backup mutation log
2023-01-18 15:11:58 -08:00
Nim Wijetunga 1675502d76
Blob Worker Encryption doesn't use BG_METADATA_SOURCE (#9121)
* bw encrypt doesn't use knob

* Trigger Build
2023-01-11 14:03:25 -08:00
Nim Wijetunga 114eb4a3a6
Resolver uses Encryption DB Config (#9002)
Resolver uses encryption DB config
2023-01-10 17:11:14 -08:00
A.J. Beamon b93136b911 Update tenant maps on the commit proxy to be ID based and not store the full TenantMapEntries 2022-12-07 14:49:37 -08:00
He Liu 3d2124df80
Checkpoint restore sharded rocks (#8758)
* Allow multiple keyranges in CheckpointRequest.
Include DataMove ID in CheckpointMetaData.

* Use UID dataMoveId instead of Optional<UID>.

* Implemented ShardedRocks::checkpoint().

* Implementing createCheckpoint().

* Attempted to change getCheckpointMetaData*() for a single keyrange.

* Added getCheckpointMetaDataForRange.

* Minor fixes for NativeAPI.actor.cpp.

* Replace UID CheckpointMetaData::ssId with std::vector<UID>
CheckpointMetaData::src;

* Implemented getCheckpointMetaData() and completed checkpoint creation
and fetch in test.

* Refactoring CheckpointRequest and CheckpointMetaData

rename `dataMoveId` as `actionId` and make it Optional.

* Fixed ctor of CheckpointMetaData.

* Implemented ShardedRocksDB::restore().

* Tested checkpoint restore, and added range check for restore, so that
the target ranges can be a subset of the checkpoint ranges.

* Added test to partially restore a checkpoint.

* Refactor: added checkpointRestore().

* Sort ranges for comparison.

* Cleanups.

* Check restore ranges are empty; Add ranges in main thread.

* Resolved comments.

* Fixed GetCheckpointMetaData range check issue.

* Fixed error description.

Co-authored-by: He Liu <heliu@apple.com>
2022-11-30 08:22:14 -08:00
sfc-gh-tclinkenbeard c03f60c618 Update rare code probe annotations 2022-11-15 13:21:25 -08:00
A.J. Beamon fc8929cde7 During a restore, the tenant map may not be self-consistent. For example, it is possible for a tenant to exist with two names if it was renamed during a backup. This updates the tenant maps in SS and CP to allow there to be multiple tenants with the same ID, but it expects there to only be one such tenant once the restore is complete and the data is accessed. 2022-11-02 09:05:31 -07:00
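The relaxed tenant map described in the commit above can be pictured as an ID-keyed map that tolerates several names per ID while a restore is in flight, and expects exactly one when the data is accessed. The sketch below is a minimal Python illustration under that reading; the names `TenantMap`, `add`, `remove`, and `resolve` are hypothetical and are not the actual SS or CP data structures.

```python
from collections import defaultdict


class TenantMap:
    """Hypothetical ID-keyed tenant map that tolerates duplicate IDs during a restore."""

    def __init__(self):
        # tenant ID -> set of tenant names currently mapped to that ID
        self._names_by_id = defaultdict(set)

    def add(self, tenant_id, name):
        # During a restore, a renamed tenant may briefly appear under two names.
        self._names_by_id[tenant_id].add(name)

    def remove(self, tenant_id, name):
        names = self._names_by_id.get(tenant_id)
        if names is None:
            return
        names.discard(name)
        if not names:
            del self._names_by_id[tenant_id]

    def resolve(self, tenant_id):
        # Once the restore is complete, exactly one name is expected per ID.
        names = self._names_by_id.get(tenant_id, set())
        if len(names) != 1:
            raise LookupError(f"tenant {tenant_id} has {len(names)} names")
        return next(iter(names))
```

In this sketch, adding both the old and the new name for one ID is accepted, and `resolve` only succeeds after one of them has been removed.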
Jingyu Zhou 0135d9cee1
Merge pull request #8643 from jzhou77/fix
Remove unnecessary decodeServerTagValue calls
2022-11-01 12:28:08 -07:00
Jingyu Zhou bc098ff8d7 Save a copy for param2 2022-11-01 10:14:58 -07:00
Nim Wijetunga 24ce8c0fd0
Commit Proxy Encryption Code Probes (#8618)
* add commit proxy encryption code probes

* fix comment

* address pr comments

* address pr comments

* address pr comments

* address pr comments

* Trigger Build
2022-10-31 20:04:42 -07:00
Jingyu Zhou d76a003351 Remove unnecessary decodeServerTagValue calls 2022-10-31 16:58:08 -07:00
Lukas Joswiak 9d3c3b1efe Remove cluster ID logic from individual roles
The logic to determine the validity of a process joining a cluster now
belongs on the worker and the cluster controller. It is no longer
restricted to tlogs and storages, but instead applies to all processes
(even stateless ones).
2022-10-27 13:56:13 -07:00
Jingyu Zhou a8391caf23 Revert "Data loss protection v2" 2022-10-20 18:09:58 -05:00
Lukas Joswiak 72bc89cf39 Remove cluster ID logic from individual roles
The logic to determine the validity of a process joining a cluster now
belongs on the worker and the cluster controller. It is no longer
restricted to tlogs and storages, but instead applies to all processes
(even stateless ones).
2022-10-18 21:37:42 -07:00
Yi Wu ac6aaf3785
encryption: fix some data not being encrypted (#8403)
Changes:
1. Change `isEncryptionOpSupported` to check the ENABLE_ENCRYPTION server knob instead of `clientDBInfo.isEncryptionEnabled`. The problem with clientDBInfo is that its content is uninitialized until it has been broadcast to the workers, and during that window some data (e.g. item 2) is not encrypted when it should be.
2. Fix CommitProxy not encrypting metadata mutations which are recovered from txnStateStore
3. Fix partial transactions coming from recovery not being encrypted in KeyValueStoreMemory (and thus TxnStateStore)
4. Add new CODE_PROBE for the above fixes
5. Logging changes
2022-10-12 14:18:56 -07:00
Markus Pilman ea1325a552
Merge pull request #8319 from sfc-gh-tclinkenbeard/add-rare-code-probe-annotation
Add `rare` code probe decoration
2022-10-07 09:39:00 -06:00
sfc-gh-tclinkenbeard 985958c260 Add rare code probe decoration 2022-09-25 15:28:32 -07:00
A.J. Beamon fda0d7223d Update backup to include system key ranges needed for tenants. Run simulated backup tests with tenants. 2022-09-22 10:00:13 -07:00
Ata E Husain Bohra d2b82d2c46
Introduce "default encryption domain" (#8139)
* Introduce "default encryption domain"

Description

In the current FDB native encryption-at-rest implementation,
an entity being encrypted (mutation, KV and/or file) is categorized
into one of the following encryption domains:
1. Tenant domain, where the encryption domain == tenant boundaries
2. FDB system keyspace - FDB metadata encryption domain
3. FDB Encryption Header domain - used to generate the digest for
the plaintext EncryptionHeader.

The scheme doesn't support encryption if an entity can't be categorized
into any of the above encryption domains; for instance, non-tenant
mutations are NOT supported.

This patch extends encryption support to mutations for which the corresponding
Tenant information can't be obtained (key length shorter than TenantPrefix)
and/or mutations that do not belong to any valid Tenant
(FDB management cluster data) by mapping such mutations to a
"default encryption domain".

TODO

The CommitProxy-driven TLog encryption implementation requires every transaction
mutation to contain 1 KV and not cross Tenant boundaries. The only exception to
this rule is ClearRange mutations. For now, ClearRange mutations are mapped
to the 'default encryption domain'; a subsequent patch shall propose appropriate
handling for ClearRange mutations.

Testing

devRunCorrectness - 100k
2022-09-14 10:58:32 -07:00
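The domain mapping described in the commit message above (#8139) can be sketched as follows. This is a minimal Python illustration, not FDB's C++ implementation: the function name `encryption_domain`, the domain labels, the `known_tenants` set, the order of the checks, and the assumption that a tenant prefix is an 8-byte big-endian tenant ID are all hypothetical choices made for the example.

```python
import struct

# Assumption for illustration: a tenant prefix is the 8-byte big-endian tenant ID.
TENANT_PREFIX_LEN = 8
# FDB system keys start with the byte 0xFF.
SYSTEM_KEY_PREFIX = b"\xff"

SYSTEM_DOMAIN = "system"
DEFAULT_DOMAIN = "default"


def encryption_domain(key, known_tenants, is_clear_range=False):
    """Return a (hypothetical) encryption domain label for a mutation key."""
    # Per the commit message, ClearRange mutations go to the default domain for now.
    # Checking this first is an assumption of the sketch.
    if is_clear_range:
        return DEFAULT_DOMAIN
    # Metadata in the system keyspace has its own domain.
    if key.startswith(SYSTEM_KEY_PREFIX):
        return SYSTEM_DOMAIN
    # Key too short to carry a tenant prefix: fall back to the default domain.
    if len(key) < TENANT_PREFIX_LEN:
        return DEFAULT_DOMAIN
    (tenant_id,) = struct.unpack(">q", key[:TENANT_PREFIX_LEN])
    # Keys whose prefix matches a valid tenant use that tenant's domain.
    if tenant_id in known_tenants:
        return ("tenant", tenant_id)
    # Keys that belong to no valid tenant also use the default domain.
    return DEFAULT_DOMAIN
```

The point of the fallback is that every mutation gets some domain, so a mutation that cannot be attributed to a tenant no longer goes unsupported.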
Lukas Joswiak 74ac617a34 Add support for changing coordinators to the configuration database
Configuration database data lives on the coordinators. When a change
coordinators command is issued, the data must be sent to the new
coordinators to keep the database consistent.
2022-09-13 16:53:54 -07:00