7281 Commits

Author SHA1 Message Date
Sam Gwydir 6c16875c34
Add network option to disable non-TLS connections (#9984)
* Add network option to disable non-TLS connections

* add disable plaintext connection to fdbserver

* python doc

* Formatting

* Add tls disable plaintext connection to client api test

* review

* fix negative test

* formatting

* add TLS support to c client config tests

Adds support for TLS in the client and server separately

* add tests for disable_plaintext_connections

Test TLS and Plaintext Clusters and Clients

* Fix documentation

* Rename option to indicate it is client-only

* clearer formatting

* default to allowing plaintext connections

* add SetTLSDisablePlaintextConnection to go bindings
2023-05-13 00:14:11 +02:00
A.J. Beamon eacf817b2f Add metacluster code probes 2023-05-12 12:32:24 -07:00
Josh Slocum f82ea43198
copying headers into http request (#10227) 2023-05-11 20:18:12 -05:00
A.J. Beamon b15622c492 Fix formatting and unrelated windows build issue 2023-05-11 08:52:20 -07:00
neethuhaneesha 92d1da79a9
RocksDB WAL archive options. (#10211) 2023-05-10 21:36:18 -07:00
A.J. Beamon d8141c049d Add code probes for tenant code 2023-05-10 20:44:39 -07:00
Zhe Wang 8559d4f1a8
Adding cleanup of old audit metadata (#10137)
* clean up old audit metadata

* change comments

* fix audit cleanup rule as PR description claim and reduce timeout of auditStorageCorrectness in tester

* address comment

* clear audit metadata should not throw error

* cleanup progress metadata by type

* control number of AuditStatistic events

* carefully persist new audit state

* add unit tests and fix issues

* cleanup

* allow audit concurrent run for different types and fix some bug in auditutl

* fix ci issue and nits
2023-05-10 19:32:04 -07:00
Yao Xiao 995fba9254
Merge pull request #10152 from yao-xiao-github/main
Cherrypick multiple ShardedRocksDB improvements
2023-05-10 16:14:17 -07:00
Evan Tschannen 3dd86d6c22 move IKeyValueStore.h to the client 2023-05-10 15:41:47 -07:00
Yao Xiao 182d2cafbf Log physical shard size in KVS 2023-05-10 12:54:59 -07:00
Ata E Husain Bohra 18fd2702c4
EaR: Implement SimKmsVault interface, refactor SimKmsConnector (#10194)
Description

Patch implements a SimKmsVault interface allowing unittest/simulation
to satisfy encryption lookup usecases. It also refactors existing
SimKmsConnector to leverage SimKmsVault APIs

Testing

devRunCorrectness - 100K
/simKmsVault - asan & valgrind
EncryptionUnitTest
2023-05-10 12:44:53 -07:00
He Liu 66cd102821
Added `get_audit_status checkmigration` to print out the number of da… (#10188)
* Added `get_audit_status checkmigration` to print out the number of data shards and `physical shards`, so that we know the progress of migration to `shard_encode_location_metadata`

* Fixed print format.

* Addressed comments.
2023-05-10 12:26:39 -07:00
Yao Xiao 2d1b5d02e2 Range deletion memory usage improvements (#10048) 2023-05-10 10:23:01 -07:00
Yao Xiao fa101e1e11 Log background error and add knobs for memory tuning. (#9841)
* error logger

* recovery mode
2023-05-10 10:23:01 -07:00
Yao Xiao fa821c0ed6 Cherrypick #9746 2023-05-10 10:23:01 -07:00
Yao Xiao abd45c4486 Cherrypick #9665 2023-05-10 10:23:01 -07:00
Josh Slocum 9a2365daa8
fixing bugs with tenant_mode required on external clients and changin… (#10183)
* fixing bugs with tenant_mode required on external clients and changing test to find them

* Update fdbcli/BlobKeyCommand.actor.cpp

Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>

---------

Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2023-05-09 13:41:58 -05:00
Jay Zhuang 801a01bd38
Merge pull request #10159 from sfc-gh-jazhuang/redwood_test
Integrate the random key/value generator to Redwood test
2023-05-09 11:41:47 -07:00
Josh Slocum e69d54fbc0
Block unblobbify (#10182)
* strengthening check for not merging consecutive blob ranges

* implementing expanded unblobbify and changing tests to account
2023-05-09 11:43:11 -05:00
Josh Slocum 6be0c74d5b
Adding explicit blob range mutation log to handle large number of ranges (#10174)
* Adding explicit blob range mutation log to handle large number of ranges

* fixing ide build
2023-05-09 11:30:04 -05:00
Jay Zhuang 1c009bbd11 Update value size and maxCommitSize based on pageSize 2023-05-09 09:11:30 -07:00
Jay Zhuang 9f2f735d53 More random keys if there's fewer stringSet 2023-05-09 09:11:30 -07:00
Jay Zhuang 561db510e0 Add a helper class to container StringSetGenerator and StringGenerator 2023-05-09 09:11:30 -07:00
Jay Zhuang fd680782b5 Integrate the random key/value generator 2023-05-09 09:11:30 -07:00
Evan Tschannen c8e8505101
buggified max_shards_on_large_teams (#10105)
* buggified max_shards_on_large_teams, and had the consistency scan verify the proper number of shards have been overreplicated

* fix: when restarting the data distributor, do not allow more than max_shards_on_large_teams shards to be marked as healthy
2023-05-08 16:56:42 -07:00
Hui Liu 53e68065e7
Support blob manifest backup for fdbbackup cmdline (#10091) 2023-05-08 16:07:22 -07:00
Ankita Kejriwal 63354f68ad
Update knob values for Storage Quota polling intervals (#10154) 2023-05-08 10:06:29 -07:00
Hui Liu 65ed7775fd
Add manifest encryption (#10081) 2023-05-05 14:33:37 -07:00
Jingyu Zhou b844a92c1e
Merge pull request #10143 from neethuhaneesha/paranoidChecks
Rocksdb paranoid file checks knob.
2023-05-05 10:23:06 -07:00
Josh Slocum a4dffa087a
Adding Simulated HTTP Server and refactoring HTTP code (#10112)
* Adding Simulated HTTP Server and refactoring HTTP code

* fixing formatting

* fixing merge conflicts

* fixing more merge conflicts

* code review feedback

* changing reference counted interface

* more fixes

* fixing ide build i guess
2023-05-05 12:19:17 -05:00
Steve Atherton fb2fc6a260
Merge pull request #10157 from sfc-gh-satherton/systemkey-overlap
Bug fix, check is supposed to be for overlap, not lack of overlap.
2023-05-04 21:24:12 -07:00
Jingyu Zhou 78434517ff Increase buggified STORAGE_METRICS_SHARD_LIMIT value
The previous buggified value of 3 can equal the key location size, thus
causing splitStorageMetrics() to get stuck.
2023-05-04 19:31:43 -07:00
Steve Atherton d52113e7a3 Bug fix, check is supposed to be for overlap, not lack of overlap. 2023-05-04 18:08:37 -07:00
Josh Slocum fb950a9c81
adding blob ranges to backup keys to not lose blobbification on restore (#10059) 2023-05-04 13:55:20 -05:00
neethuhaneesha 8b2f3bcfdc Rocksdb paranoid file checks knob. 2023-05-04 11:49:38 -07:00
A.J. Beamon 9d647f827c
Merge pull request #10129 from sfc-gh-ajbeamon/require-reliable-coordinator-quorum
Do not allow changing the coordinators to a set that is unreliable in simulation
2023-05-04 08:18:29 -07:00
Jay Zhuang d0cb599c7a Fix a gcc build error
```
RandomKeyValueUtils.cpp:64:106: error: call of overloaded 'RandomKeyTupleGenerator(<brace-enclosed initializer list>)' is ambiguous
```
2023-05-03 16:33:04 -07:00
Jay Zhuang a18bb10bcf Merge branch 'main' into random-kv-generator 2023-05-03 15:39:37 -07:00
Zhe Wang d254fba6e5
Adding cleanup of audit progress metadata when audit complete (#10118)
* cleanup audit progress metadata and tester directly issue audit requests to DD instead of CC

* address comments and fix test dd issue request but dd not present
2023-05-03 15:39:22 -07:00
A.J. Beamon ccf61ac2e5 Do not allow changing the coordinators to a set that is unreliable, because otherwise we could delete our coordinated state 2023-05-03 15:03:03 -07:00
Xiaoxi Wang 91de1c880e remove PrepareBlobRestore waiting for inFlight moving 2023-05-03 14:43:23 -07:00
Xiaoxi Wang d7c089fd13 add timeout to blob migrator getReply to tackle recovery during preparation 2023-05-03 14:43:23 -07:00
Josh Slocum 22155c84f4
adding logic to disable splitting within a truncated tuple, and validating it in test (#10106) 2023-05-03 10:23:46 -05:00
Zhe Wu fffdfa5b3d Increase MAX_STORAGE_COMMIT_TIME to be inline with LOW_PRIORITY_DURABILITY_LAG 2023-05-02 11:12:52 -07:00
Josh Slocum d0c412b5e6
fixing incorrect uses of ThreadSafeAsyncVar (#10086) 2023-05-02 07:29:06 -05:00
Xiaoxi Wang 5ea53a797e check storage metadata and storage server interface in the same transaction 2023-05-01 18:08:08 -07:00
Xiaoxi Wang 3a8bdcca3d add metadata check to quiescent consistency check 2023-05-01 18:08:08 -07:00
Xiaoxi Wang 3605d8c74c populate storage metadata for tss 2023-05-01 18:08:08 -07:00
A.J. Beamon 85f5e206a7
Merge pull request #10047 from sfc-gh-ajbeamon/add-metacluster-version
Add a metacluster version to the MetaclusterRegistrationEntry and validate it when loading the entry
2023-05-01 12:32:37 -07:00
A.J. Beamon b258159d3a Change enum capitalization. Improve error reporting if we cannot read metacluster registration when fetching metacluster metrics. Improve timeliness of metacluster metrics updates. 2023-05-01 11:21:42 -07:00
Zhe Wang d6e7b5f736
Audit storage: validate consistency of replica and shard location metadata (#9628)
* Implemented AuditUtils.actor.cpp

Moved AuditUtils to fdbserver/

* Persist AuditStorageState.

* Passed persisted AuditStorageState test.

* Added audit_storage_error to indicate a corruption is caught.

Throw/Send audit_storage_error when there is a data corruption.

Added doAuditStorage() for resuming Audit.

* Load and resume AuditStorage when DD restarts.

* Generate audit id monotonically.

* Fixed minor issue AuditId/Type was not set.

* Adding getLatestAuditStates.

* Improved persisted errors and added AuditStorageCommand.actor.cpp for
fdbcli.

* Added `audit_storage` fdbcli command.

* fmt.

* Fixed null shared_ptr issue.

* Improve audit data.

* Change DDAuditFailed to SevWarn.

* Sev.

* set SERVE_AUDIT_STORAGE_PARALLELISM to 1.

* Moved AuditUtils* to fdbclient/.

* Added getAuditStatus fdbcli command.

* Refactor audit storage fdb cli commands.

* Added auditStorage in sim.

* Cleanup.

* Resolved comments.

* Resolved comments.

* Added SystemData for metadata audit.

Refactored audit workflow to make sure all sub-tasks are executed w/o
early exit.

* Improvements.

* Persisted Failed state after too many retries.

* Added retryCount for resumeAuditStorage().

* resolving conflict.

* Resolved conflicts.

* allow-merged-to-run

* add timeout to audit client

* fmt

* validate replica

* add audit serverKey

* address comments and fmt

* fix audit_storage_exceeded_request_limit

* fix segfault in getLatestAuditStatesImpl

* fix bugs

* remove timeout from workload

* fix bugs

* audit local view of shard assignment

* fmt

* fix-stuck-issue-and-make-dd-audit-storage-self-retry

* fix timeout

* fix timeout

* fix bugs and cleanup

* fix nit

* change name state to coreState for audit metadata

* address comments

* code clean

* fmt

* setup debug

* cleanup

* clean up

* code cleanup

* code clean

* remove tmp file

* fmt

* trace portion of shards that of anonymous physical shard

* remove unnecessary actor cleanup

* do not give up when tr is too old

* address commits

* refactor

* clean

* fmt

* fix-command-help-text

* fix-auditstate-restore-and-enable-restore-to-metadata-audit

* address comments

* fmrt

* debug and improve efficient of resume audit

* small change

* fix audit cli

* bypass completed audit when dd restart

* fix auditStorageCommandActor

* make mismatch key range more visible

* address comments

* make local shard metadata check can make progress by retries

* address comments

* address comments

* partition location metadata validation by range and server

* unset MIN_TRACE_SEVERITY

* address comments and SS auto proceed until failed then notify dd

* persistNewAuditState should checkMoveKeysLock

* audit storage location metadata partitioned by range and move shard assignment history def to the end of SS structure

* code cleanup

* fix error message in metadata validation

* fix registerAuditsForShardAssignmentHistoryCollection input for local shard validation

* add comments to code and add guard to make sure the SS audit does not proceeds automatically for many times without being notified by DD --- to support audit cancellation later

* fix coalesceRangeList

* replace rangeOverlapping func with operator and use struct instead of complicated type for return value of getKeyServer/serverKey/shardInfo

* simplify shard assignment history

* shardAssignmentRecordRequests should be unordered_map

* address comments, make trackShardAssignment simple, make anyChildAuditFailed cover all audit children, keep only one audit actor run at a time on each SS

* only run validate shard info once at a time, other audit type does not have this limitation

---------

Co-authored-by: He Liu <heliu05023@gmail.com>
Co-authored-by: He Liu <heliu@apple.com>
Co-authored-by: Zhe Wang <zhewang@Zhes-Laptop.local>
2023-05-01 10:35:52 -07:00
Steve Atherton 16d8b1d1f9
Merge pull request #9949 from sfc-gh-etschannen/fix-shard-count
fix: do not let too many shards use large teams
2023-04-29 23:50:49 -07:00
Steve Atherton e291d5e51f Merge commit '5e8feac8c980fbef6b6f523360e42d28dd120e5d' into random-kv-generator 2023-04-28 14:11:42 -07:00
Josh Slocum 5b47913882
disabling global connection pool for now (#10054) 2023-04-28 09:48:56 -05:00
neethuhaneesha 53fe07a709
Enabling auto_prefix_mode to true in rocksdb. (#10050) 2023-04-27 12:11:48 -07:00
A.J. Beamon f1cbc86b94 Add a metacluster version to the MetaclusterRegistrationEntry and validate it when loading the entry from the cluster. 2023-04-27 10:04:57 -07:00
Jay Zhuang 0ab691b707
Merge pull request #10002 from sfc-gh-jazhuang/readThrough
Fix RangeResult.readThrough misuse
2023-04-27 09:59:11 -07:00
Xiaoxi Wang a05e078c4a Remove locations.size() == expectedShardCount assertion and add comments 2023-04-26 14:23:09 -07:00
He Liu e11f804f96
ShardedRocks checkpoint/restore for physical shard move (#9752)
* Update NativeAPI getCheckpointForRange().

* Implemented checkpoint in SS.

* clean up.

* Disabled StorageServerCheckpointTest.

* Serialized checkpoint creation and deletion.

Simplified checkpoint GC, via deleting CheckpointMetaData::dir.

* Fixed PhysicalShardMove test. Where fetchCheckpoint target range is misset.

* Minor improvements on CheckpointMetaData and DataMoveMetaData.

* fmt.

* Optimized PhysicalShardMove test

cleanup.

* Refactored ShardedRocks checkpoint/restore for psm.

* Complete ShardedRocks::restore.

* dismiss operation_obsolete, and throw actor_cancelled.

* Validate checkpoint when !asKeyValues.

* fmt.

* Don't read from uninitialized physical shard.

* Resolved comments.

* cleanup.

* Added verify_checksum_before_restore for ShardedRocks.

* Added ShardedRocksDB checkpoint/restore unit test.

* Populate CheckpointMetaData::dir in RocksDB.

* Addressed comments.
2023-04-26 09:17:18 -07:00
Steve Atherton f9c8840fd6 Initial checkin of RandomKeyValueUtils.h/cpp and a unit test. 2023-04-26 01:34:56 -07:00
Steve Atherton 7f6d5f296a Merge commit 'e318fc260070ba6ba604930b8f259c9b655938ea' into keybackedrangemap
# Conflicts:
#	flow/include/flow/error_definitions.h
2023-04-25 14:21:23 -07:00
Jingyu Zhou 6b15d67928
Merge pull request #10010 from jzhou77/main
Properly handle proxy_memory_limit_exceeded error for GetKeyServerLocationsRequest
2023-04-25 11:18:03 -07:00
Jingyu Zhou 74bb659f71 Simplify backoff calls per comment 2023-04-25 09:15:16 -07:00
Steve Atherton 858b51a69b Address review comments. KeyRangeMapSnapshot is now ReferenceCounted and getSnapshot() returns a Reference to discourage copying. Added several comments for clarity. Added FormatUsingTraceable and changed all new formatters to use it except for Standalone<T> which redirects to the formatter for T. 2023-04-24 19:01:05 -07:00
Steve Atherton 7f3df82d98 Add code probes and trace events to range config setup. Make default and non-default ranges randomly on/off. Restore GetMappedRange getRange buggify on range read size. 2023-04-22 23:52:49 -07:00
Steve Atherton 2c7f3c2120 In KeyBackedTypes, many methods have two versions, one which accepts a Transaction and one which accepts a Database from which a Transaction is created and a retry loop is run via runTransaction(). This refactor combines both versions into a single function which uses a static check to call itself with runTransaction() if the passed object is a Transaction creator. 2023-04-22 13:58:00 -07:00
Steve Atherton c57ed25987 Renamed SystemDBLockWriteNow() to SystemDBWriteLockedNow() and changed definition to be more direct / clear. 2023-04-22 13:17:41 -07:00
Steve Atherton dd334f1b02 Added `transaction_option_setter<DB>` to determine if a DB-like thing has a `->setOptions(tr)` method. This method is called in `runTransaction()` templates at the top of the retry loop and in the manual retry loops in KeyBackedTypes. Added `if constexpr(` support to the ActorCompiler to support this. 2023-04-22 11:10:20 -07:00
Steve Atherton 893faf7d5a Change optional get() to ->. 2023-04-22 01:15:49 -07:00
Steve Atherton 639d4d05ef Removed SYSTEM_PRIORITY_IMMEDIATE from KeyBackedTypes and all options from KeyBackedRangeMap database functions. Added SystemTransactionGenerator<> for wrapping Database types and generating transactions with selected system level options. 2023-04-21 19:00:29 -07:00
Steve Atherton 46cde666a5 Merge commit '9639192a88001043a104aeef0c394e99ca5d6a6e' into keybackedrangemap 2023-04-21 13:27:15 -07:00
Steve Atherton 879e729cec WatchableTrigger::onChange() provides a signal when its baseline version is established. Wrote docs for it. Fixed DD race condition in DD config watching using this feature. 2023-04-21 13:18:31 -07:00
sfc-gh-tclinkenbeard 9639192a88 Add GLOBAL_TAG_THROTTLING_REPORT_ONLY knob 2023-04-21 11:13:42 -07:00
Steve Atherton 948e2dd781 Bug fix in KeyBackedRangeMap::updateRange() where the range after the modified region could be set wrong. Added Database version of updateRange(). 2023-04-20 20:44:24 -07:00
Jingyu Zhou c544985fe5 Add trace events and adjust backoff
On each success, halve the backoff until it falls below the initial backoff value, then
set the backoff to 0.
2023-04-20 15:56:06 -07:00
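
The backoff adjustment described in the commit above can be sketched as follows. This is only one reading of the commit message, not the FoundationDB code: the function names are hypothetical, and how the backoff grows after a failure is not stated in the message, so it is left out.

```python
def next_backoff_after_success(backoff: float, initial_backoff: float) -> float:
    """Halve the backoff after a success; once the halved value would fall
    below the initial backoff, reset it to 0 (assumed interpretation)."""
    halved = backoff / 2.0
    return halved if halved >= initial_backoff else 0.0


def success_sequence(backoff: float, initial_backoff: float, steps: int) -> list:
    """Apply the success rule `steps` times and record each resulting backoff."""
    seen = []
    for _ in range(steps):
        backoff = next_backoff_after_success(backoff, initial_backoff)
        seen.append(backoff)
    return seen
```

Starting from a backoff of 8 seconds with an initial value of 1 second, repeated successes shrink the delay geometrically and then remove it entirely, so a recovered database is not penalized for long.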
Jon Fu a7cf82adb2
Update fdbcli tenant list function to take tenant group filter, support JSON, and report tenant IDs (#9967)
* fix metacluster get segfault

* update fdbcli tenant list function to take tenant group filter, support JSON, and report tenant IDs

* code review changes

* code formatting

* additional code review changes

* account for empty tenant groups

* reformat error catching in fdbcli command

* refactor json output and address code review comments

* add back mistakenly removed hint

* keep hints after 4th token

* add to tenant management workload

* fix compile error

* fix test range

* add more asserts to metacluster case

* nest test condition inside if block

* adjust tenant test layout

* refactor some test files

* reorganize test workload logic
2023-04-20 16:22:47 -04:00
Steve Atherton 2553aed118 KeyBackedRangeMap::updateRange() now coalesces adjacent matching ranges caused by the update, and supports replacing a range's config with a new explicit value. Added update command to rangeconfig cli. 2023-04-20 13:02:04 -07:00
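
The coalescing behavior described in the commit above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not KeyBackedRangeMap itself: ranges are held as sorted, non-overlapping `(begin, end, value)` tuples with an exclusive end, and `update_range` and `coalesce` are hypothetical names.

```python
def coalesce(ranges):
    """Merge neighbors that touch (prev end == next begin) and share a value."""
    merged = []
    for begin, end, value in ranges:
        if merged and merged[-1][1] == begin and merged[-1][2] == value:
            merged[-1] = (merged[-1][0], end, value)
        else:
            merged.append((begin, end, value))
    return merged


def update_range(ranges, begin, end, value):
    """Set [begin, end) to `value`, trimming overlapped ranges, then coalesce."""
    out = []
    for b, e, v in ranges:
        if e <= begin or b >= end:
            out.append((b, e, v))        # untouched: no overlap with the update
            continue
        if b < begin:
            out.append((b, begin, v))    # keep the part left of the update
        if e > end:
            out.append((end, e, v))      # keep the part right of the update
    out.append((begin, end, value))
    out.sort(key=lambda r: r[0])         # begin keys are unique after trimming
    return coalesce(out)
```

In this model, overwriting the middle range of `x | y | x` with `x` collapses all three into a single range, which is the effect the commit message describes for adjacent matching ranges.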
Josh Slocum ae862e1b96 fixing code probe issues 2023-04-20 11:54:47 -07:00
Steve Atherton 2e9a7f927b Prevent KeyBackedRangeMap::getSnapshot() from touching a key outside of its map range. 2023-04-19 23:14:10 -07:00
Steve Atherton 183492cfb3 Change WatchableTrigger back to use Versionstamps and complete the watch() and onChange() implementations. 2023-04-19 21:45:56 -07:00
Ata E Husain Bohra a099d377fa
EaR: Remove unused CODE_PROBE handling encryption header flag version (#10020)
Description

Patch removes an unused CODE_PROBE that checked whether the flag version of
the encryption header being read is valid. Because the flag version is
determined by peeking into the std::variant index and only version 1 is
supported, the check is converted to an ASSERT for now.

Testing

EncryptionUnitTests.toml
EncryptionOps.toml
BlobGranuleCorrectness/Clean.toml
2023-04-19 18:02:31 -07:00
Jingyu Zhou a83295e3bd Add backoff to lookupTenantImpl for commit_proxy_memory_limit_exceeded error 2023-04-19 16:55:46 -07:00
Jingyu Zhou 3bfd353a22 Add backoff to getKeyLocation_internal as well 2023-04-19 16:45:50 -07:00
Jingyu Zhou ad1c7bba74 Make commit_proxy_memory_limit_exceeded error a database level backoff
Since this error means the database is overloaded.
2023-04-19 15:52:16 -07:00
Josh Slocum 377dd0d754
Fixing path for non-backup (blob granule) blob url use cases to not prepend /data to path (#10011) 2023-04-19 14:18:08 -05:00
Nim Wijetunga 021bdccc32
propagate encryption errors properly (#10012)
propagate encryption errors properly
2023-04-19 11:35:29 -07:00
Jingyu Zhou b49625d45b Properly handle proxy_memory_limit_exceeded error for GetKeyServerLocationsRequest
Also add buggify to inject the error in simulation.
2023-04-19 09:35:09 -07:00
Jay Zhuang 8e7a5b5b22 Add the helper function to support reverse range read 2023-04-19 08:31:52 -07:00
Steve Atherton edb071c6f2 Updated includes on newer files. 2023-04-18 22:51:27 -07:00
Steve Atherton ebb3c8d698 Apply clang-format, debug output changes. 2023-04-18 22:21:27 -07:00
Steve Atherton bc6b9cb83f Added toJSON for DD Range configuration. 2023-04-18 22:21:27 -07:00
Steve Atherton d210772825 Bug fix in reading KeyBackedRangeMap local snapshot where returned iterable range sets would iterate to nothing. 2023-04-18 22:21:27 -07:00
Steve Atherton 3877eb9019 Rewrote KeyBackedMap and KeyBackedSet getRange(KeySelectors..) to fix bugs. Removed TypedKeySelector::packBounded() because it's a bad idea. Added seek() to KeyBackedMap and KeyBackedSet which is a convenient way to find an item in the container which is <, <=, >, >= some query. Rewrote KeyBackedRangeMap on these new features to remove bugs. Simplified KeyBackedRangeMap::ValueType contract. Updated DDRangeConfig to get rid of the forceBoundary concept, replaced with the teamID concept. 2023-04-18 22:21:27 -07:00
Steve Atherton 38bdc8bcf4 Disable KeyBackedTypes debug by default. 2023-04-18 22:21:26 -07:00
Steve Atherton 53ee26d758 Changed KeyBackedTypes to an actor file. Added TypedKeySelectors for Map and Set classes and getRange() keySelector methods. Added debug macro for KeyBackedTypes. Rewrote KeyBackedRangeMap using keyselectors on KeyBackedMap. 2023-04-18 22:21:19 -07:00
Steve Atherton b7e68bbf51 DDConfiguration class for modeling user specified key range configuration options. Added KeyBackedRangeMapSnapshot, some other supporting changes to KeyBackedTypes. Added invalidKey to give KeyBackedTypes a safe prefix to avoid accidental userspace modification from uninitialized accessors. 2023-04-18 22:09:18 -07:00
Steve Atherton b1a17cce0d Added KeyBackedRangeMap and SystemKey. 2023-04-18 22:03:41 -07:00
Steve Atherton 585ddb9ffc Avoid unnecessary copy. 2023-04-18 21:48:55 -07:00
Steve Atherton a4438d4542 Refactored and simplified KeyBackedTypes to base types KeyBackedProperty, KeyBackedMap, and KeyBackedSet. Object and BinaryValue variations are now redefined as customizations of the base types. Added WatchableTrigger for using a key to track a last updated version, supported by all KeyBackedTypes. KeyBackedStruct is renamed to KeyBackedClass contains a WatchableTrigger to pass to contained KeyBackTypes. 2023-04-18 21:48:55 -07:00
Ankita Kejriwal 6b8e35fd19
Merge pull request #9964 from sfc-gh-xwang/fix/main/comments
fix HealthMetrics update bug and correct comments about readLoadKSecond()
2023-04-18 16:42:42 -07:00
Yanqin Jin 09ab501cb7
Improve logging for test (#9966)
In clearData() used by simulation test to clean up, we re-use the same debugID for multiple transactions,
making it less clear when grepping for the debugID. This PR assigns a new debugID for each new transaction.

Some of the commented-out tracing code is already obsolete. I updated several pieces of it, but am not sure if
the tracing should be enabled when enabling debug transaction.

Finally, use distinct error messages that share a common prefix for different call stacks.

Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2023-04-18 15:25:06 -07:00
Jay Zhuang 8da1b875df Update getReadThrough() to return Key directly
And a few comment update
2023-04-18 13:13:59 -07:00
Jay Zhuang 8865be10dd Add nextBeginKeySelector() to avoid key clone 2023-04-18 12:24:21 -07:00
Nim Wijetunga 22ba818133
Prevent Encryption Key Refresh for Non-Latest Keys (#9959)
prevent refresh for old encryption keys
2023-04-18 09:43:24 -07:00
Jay Zhuang b7da2ed16c Fix RangeResult.readThrough misuse
Fix `RangeResult.readThrough` misuses:
1. KeyValueStores do not need to set `readThrough`, as it will not be
   serialized and returned. Also, setting it to the last key of the result
   is not right; it should at least be the keyAfter of the last key;
2. Fix NativeAPI not setting `RangeResult.more` in a few places;
3. Avoid `tryGetRange()` setting `readThrough` when `more` is false,
   which was a workaround for item 2 above;
4. `tryGetRangeFromBlob()` didn't set `more` but set `readThrough` to
   indicate the end, which I think followed the same workaround.
   Fixed that.
5. `getRangeStream()` is going to set `more` to true and then let
   `readThrough` be its boundary.

Also added readThrough getter/setter functions to validate its usage.
2023-04-17 21:37:51 -07:00
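
Item 1 of the commit above says `readThrough` should be at least the keyAfter of the last returned key. A minimal sketch of why, assuming `readThrough` acts as an exclusive bound on what has been read and that keyAfter appends a single zero byte (the smallest key strictly greater than the input). The function names here are hypothetical Python stand-ins.

```python
def key_after(key: bytes) -> bytes:
    """Smallest key strictly greater than `key` in byte order."""
    return key + b"\x00"


def min_read_through(result_keys: list) -> bytes:
    """Lowest exclusive bound that still covers the last key of a sorted,
    non-empty result; using the last key itself would exclude it."""
    return key_after(result_keys[-1])
```

With an exclusive bound equal to the last key, that key would appear not to have been read; the keyAfter value covers it while claiming nothing beyond it.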
sfc-gh-tclinkenbeard 862c7e2ee8 Fix uninitialized memory issues in ratekeeper metrics 2023-04-17 15:27:19 -07:00
sfc-gh-tclinkenbeard 7076c050d2 Decrease MIN_TAG_*_PAGES_RATE defaults 2023-04-17 15:16:29 -07:00
Jingyu Zhou a31f0e1641
Merge pull request #9862 from hfu94/rmain
Revert matchIndex feature
2023-04-17 12:43:49 -07:00
Xiaoxi Wang 50e1f629fe EligibilityCounter use int type; resolve review comments; 2023-04-17 12:24:17 -07:00
Xiaoxi Wang bdab07cbc8 fix detailed HealthMetrics update bug
add omitted return value
2023-04-17 12:24:14 -07:00
sfc-gh-tclinkenbeard a7217055c8 Update default value of MIN_TAG_WRITE_PAGES_RATE to match default value of MIN_TAG_READ_PAGES_RATE 2023-04-17 11:58:50 -07:00
sfc-gh-tclinkenbeard 568518b6a3 Update semantics of MIN_TAG_WRITE_PAGES_RATE to reflect name 2023-04-17 11:58:50 -07:00
sfc-gh-tclinkenbeard 7e4d4d6527 Update semantics of MIN_TAG_READ_PAGES_RATE to reflect name 2023-04-17 11:58:50 -07:00
sfc-gh-tclinkenbeard 4be3c3e7ff Fix initialization of SERVER_KNOBS->MIN_TAG_READ_PAGES_RATE 2023-04-17 11:58:50 -07:00
sfc-gh-tclinkenbeard 0f0eb7c2b6 Add GLOBAL_TAG_THROTTLING_TRACE_INTERVAL knob 2023-04-17 10:09:38 -07:00
hao fu 29161b2fda Revert matchIndex feature
It is not protocol compatible, revert it to avoid deployment issue.
Will have a new PR to have the feature if moving forward.
2023-04-17 09:39:45 -07:00
Josh Slocum 370feaa3c9
refactoring and adding future compatibility to blob range metadata (#9955)
* refactoring and adding future compatibility to blob range metadata

* formatting
2023-04-13 15:06:50 -05:00
Evan Tschannen 12e507e06c rename knobs 2023-04-13 09:40:37 -07:00
Ata E Husain Bohra fe0a4df06a
EaR: Implement Key Check Value semantics (#9936)
* EaR: Implement Key Check Value semantics

Description

Key Check Value (KCV) is a checksum of a cryptographic encryption key
used to validate the encryption key's integrity. FDB Encryption at-rest
relies on an external KMS to supply encryption keys.

Patch proposes the following major changes:
1. Implement a SHA-256-based KCV to protect against
'baseCipher' corruption in two possible scenarios:
 a) potential corruption external to FDB
 b) potential corruption within FDB processes.
2. The scheme persists the computed KCV token in the block encryption header,
which then gets validated as part of header validation during
decryption.
3. FDB encryption key derivation uses the HMAC_SHA256 digest generation
scheme, which allows at most 64 bytes of 'cipher buffer'; the patch adds the
required check to ensure the 'baseCipher' length is within bounds.
The underlying OpenSSL HMAC call ignores extra length if supplied; however,
that weakens the security guarantees and is hence disallowed.

Testing

devRunCorrectness - multiple 500K runs
Valgrind & Asan - BlobCipherUnit, RESTKMSUnit, BlobGranuleCorrectness*,
EncryptionOps, EncryptKeyProxyTest
2023-04-12 14:29:31 -07:00
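
The KCV idea in the commit above can be sketched in a few lines. This is an illustration only, not the BlobCipher implementation: the function names are hypothetical, and using the full SHA-256 digest as the token is an assumption, since the commit message does not state the token's width or encoding. The 64-byte bound is the one named in the message.

```python
import hashlib
import hmac

MAX_BASE_CIPHER_LEN = 64  # bytes; upper bound stated in the commit message


def compute_kcv(base_cipher: bytes) -> bytes:
    """Return a check value for the key material (assumed: full SHA-256 digest)."""
    if not 0 < len(base_cipher) <= MAX_BASE_CIPHER_LEN:
        raise ValueError("baseCipher length out of bounds")
    return hashlib.sha256(base_cipher).digest()


def validate_kcv(base_cipher: bytes, stored_kcv: bytes) -> bool:
    """Recompute the check value and compare it with the one persisted in the
    header; a mismatch suggests the key material was corrupted."""
    return hmac.compare_digest(compute_kcv(base_cipher), stored_kcv)
```

A writer would store `compute_kcv(key)` alongside the encrypted block, and a reader would call `validate_kcv` before decrypting, rejecting the key if even one byte differs.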
Xiaoxi Wang f7061debde remove unused CPU knob; add comments for EligibilityCounter 2023-04-12 09:33:05 -07:00
Xiaoxi Wang b0fe14aed5 getTeam based on EligiblityCount 2023-04-12 09:33:05 -07:00
Xiaoxi Wang 7ca44124d4 explain what does pivot ratio mean; fix the knob assertion 2023-04-12 09:33:05 -07:00
Xiaoxi Wang 5648f827a0 adjust CPU pivot knobs to hack simulation test 2023-04-12 09:33:05 -07:00
Xiaoxi Wang 31fd4bb272 consider consistent low CPU status for 5min 2023-04-12 09:33:05 -07:00
Xiaoxi Wang 490a7b534a add getAverageCPU method; delete default value of GetTeamRequest
arguments (solve conflicts)
2023-04-12 09:33:05 -07:00
Xiaoxi Wang 67b737b44d add getStorageStats method to DatabaseContext 2023-04-12 09:33:05 -07:00
Xiaoxi Wang 1181b1dfab use max(ops*empty_penalty, readbytes) as new read load (solve conflicts) 2023-04-12 09:33:05 -07:00
Zhe Wu 10a6f3d2d0
Merge pull request #9890 from halfprice/zhewu/log-router-gray-failure
Gray failure detects disconnected remote log router and recover high DC lag
2023-04-07 16:25:11 -07:00
Ata E Husain Bohra 9d8e8d2f9e
Update fdbclient/BlobCipher.cpp
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2023-04-07 10:34:11 -07:00
Ata E Husain Bohra e10259f461 Fix Asan reporting heap overflow
Description

Fix Asan reporting heap overflow

Testing

BlobCipherUnitTest with failing seed
2023-04-07 10:24:22 -07:00
Josh Slocum d37b2b0a76
Adding BlobFailureInjection workload (#9833)
* Adding BlobFailureInjection workload

* fixing formatting
2023-04-06 15:10:36 -05:00
Josh Slocum aef5130da2
adding system priority option to getDatabaseConfiguration, and several debugging improvements (#9864) 2023-04-06 15:08:40 -05:00
Hui Liu 396f89a3f4
Cleanup stale disk files for double recruitment of storage server (#9794) 2023-04-06 12:13:59 -07:00
Ata E Husain Bohra ecc6d5a712
EaR: Fix BlobCipher cache handling for cipher needs refresh and/or expired (#9845)
* EaR: Fix BlobCipher cache handling for cipher needs refresh and/or expired

Description

Patch fixes a BlobCipher cache bug related to the handling of cipherKeys
that are marked 'needsRefresh' and/or 'expired'.
Also adds a unit test to cover the following use cases:
1. Test refreshAt and expireAt properties of the cipherKey
2. Validate corresponding Counter value increments

Testing

Extend /blobCipher unit tests
2023-04-06 11:43:10 -07:00
Nim Wijetunga c780d706d1
Remove EKP Interface from ServerDBInfo (#9909)
Remove ekp interface from ServerDBInfo
2023-04-06 11:35:47 -07:00
Hui Liu 711e040627
RestoreConfig - use restoreRangeSet to replace restoreRanges (#9912) 2023-04-06 11:16:05 -07:00
Ata E Husain Bohra 625eb00d36
Relax EKP failure detection time for simulated runs (#9908)
Description

EKP failure detection time is chosen to be fairly low given EKP
availability could impact cluster availability. However, in
simulation runs, for various reasons such a low-latency failure
detection could cause EKP recruitment to be done in a loop.
Patch relaxes the failure detection time for simulation runs

Testing
2023-04-05 18:12:25 -07:00
Nim Wijetunga 6e4e6ab2f4
Revert "Revert "Refactor GetEncryptCipherKeys (#9600)"" (#9903)
* Revert "Revert "Refactor GetEncryptCipherKeys (#9600)" (#9708)"
2023-04-05 10:03:48 -07:00
A.J. Beamon 6023ee866a Fix format 2023-04-04 16:10:56 -07:00
A.J. Beamon 974376b484 Use originalEntry rather than updatedTenantEntry to access the tenant name for the tenant index entry being erased. 2023-04-04 16:02:26 -07:00
A.J. Beamon e27908556a Update the tenant group index to be a tuple of (tenant group name, tenant name, tenant ID) 2023-04-04 14:46:15 -07:00
Zhe Wang 8102ac9a41
Serialize concurrent datamoves and their cleanups (#9421)
* retry when concurrent dm cleanups happens

* cleanup

* fix conflict data move and cleanup

* clean

* make code looks better

* use background cleanup

* fmt

* unset StartMoveShardsFoundConflictingDataMove error

* address comments
2023-04-04 11:38:24 -07:00
Zhe Wu 90f035d7f7
Merge pull request #9894 from halfprice/zhewu/disable-gc-generation
Disable GC Tlog generation due to a bug exposed by the new logic
2023-04-04 10:05:37 -07:00
A.J. Beamon 9c786e6d1e
Merge pull request #9854 from sfc-gh-ajbeamon/metacluster-separate-project
Metacluster refactoring
2023-04-04 09:43:41 -07:00
Jingyu Zhou f17e466aa4
Remove printable() from TSS trace events (#9889) 2023-04-04 08:46:39 -07:00
Zhe Wu 3751641704 Disable GC Tlog generation due to a bug exposed by the new logic in the existing code 2023-04-04 08:12:03 -07:00
Zhe Wu 675663327d Adding a knob to control log router triggered recovery 2023-04-03 22:31:22 -07:00
Nim Wijetunga 2867443f95
Recruit EKP without Enabling Encryption (#9885)
Recruit EKP without needing to enable encryption
2023-04-03 20:05:21 -07:00
neethuhaneesha 1d6908d3b4
Changing single key deletions to delete based on number of deletes instead of bytelimit. (#9867) 2023-04-03 13:55:58 -07:00
Ata E Husain Bohra 769226e5c0
EaR: Fix heap-over-flow in BlobCipherTest (#9877)
Description

Heap overflow was due to recent upgrade in BlobCipherTest to use
variable size 'baseCipher' buffer.

Testing

BlobCipherUnit test
2023-04-03 12:25:59 -07:00
Zhe Wu 7ab0ae8189
Merge pull request #9041 from halfprice/zhewu/gc-earlier-generations
Track TLog generation recovery and remove no longer needed TLog generations before recovery reaches `fully_recovered`
2023-04-03 10:16:19 -07:00
Zhe Wu 50a20946d1 Implement check if locality is already excluded in exclude locality command 2023-04-01 19:04:58 -07:00
Zhe Wu d903d0cc8d checkSafeExclusion should always create new ExclusionSafetyCheckRequest 2023-03-31 16:27:26 -07:00
Jingyu Zhou 92c182a842
Merge pull request #9858 from sfc-gh-nwijetunga/nim/fix-rest-connector-tls
Enable TLS with HTTPS connection
2023-03-31 10:02:58 -07:00
Jingyu Zhou 3560117f71
Merge pull request #9846 from johscheuer/fix-exclusion-check
Don't stop iterating over all storage processes in exclusion check
2023-03-31 09:59:00 -07:00
Ata E Husain Bohra 3f6fcada45
EaR - Misc fixes found using end-to-end integration testing (#9806)
* EaR - Misc fixes found using end-to-end integration testing

Description

Major changes proposed include:
1. RESTClient filtering of trailing `/` characters from
input URI resource path
2. Avoid EKP exponential backoff, given RESTClient supports
exponential-backoff retries for all retryable errors.
3. Memory allocation optimizations:
 3.1. BaseCipher key management using Standalone semantics
 in KMSConnector interface endpoints
 3.2. Optimize memcpy while looking up encryption keys in EKP endpoints
4. Avoid delay while starting EKP, given its criticality during
cluster recovery.
5. Update BlobCipher to handle variable size BaseCipher buffer
6. Improved logging

Testing

Setup:
1. External KMS server to supply encryption keys (inhouse)
2. Create cluster with: cluster_aware & domain_aware config

* Fix EncryptionOps test

Description

Testing

* EaR - Misc fixes found using end-to-end integration testing

Description

Major changes:
1. Cleanup EKP driven exponential backoff files.
2. Update EKP not to use #1.

Testing

* EaR - Misc fixes found using end-to-end integration testing

Description

Address review comments

Testing

* Fix AES 256 key length value

Description

Testing

* Address review comments

Description

Testing
2023-03-30 22:22:26 -07:00
Nim Wijetunga 4a68e6072a enable tls with https connection 2023-03-30 21:45:03 -07:00
A.J. Beamon 807646675c Refactor the metacluster project into smaller files, and reorganize the namespaces. Move some metacluster and tenant testing helpers into the metacluster project. 2023-03-30 16:20:09 -07:00
A.J. Beamon e61748c7d5 Move metacluster into its own directory and static library 2023-03-30 16:07:49 -07:00
A.J. Beamon 3164cadc6f
Merge pull request #9851 from sfc-gh-ajbeamon/improve-boolean-param
Allow boolean parameters to be nested inside of namespaces or classes
2023-03-30 16:06:03 -07:00
Ata E Husain Bohra 054358e63f
EaR: Update REST validation token-name separator from '#' -> '$' (#9848)
Description

RESTKms validation tokens are provided using foundationdb.conf, and the file
uses '#' as the comment start character. Update the token-name separator to
'$' instead of '#'.

Testing

RESTKmsConnectorUnit.toml
2023-03-30 15:40:09 -07:00
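The separator change above can be illustrated with a small sketch. It is written in Python for brevity, and every name in it is hypothetical: `strip_comment` models a naive conf reader that treats '#' as the start of a comment, and `parse_token_details` assumes a comma-separated `name<sep>path` layout for the token details. Neither is taken from the FoundationDB source.

```python
def strip_comment(line: str) -> str:
    """Drop everything from the first '#' onward (assumed conf-reader behavior)."""
    return line.split("#", 1)[0].strip()


def parse_token_details(value: str, separator: str) -> dict:
    """Parse 'name<sep>path,name<sep>path' into a dict (assumed layout)."""
    details = {}
    for entry in value.split(","):
        if not entry:
            continue  # tolerate empty entries such as a trailing comma
        name, sep, path = entry.partition(separator)
        if not sep:
            raise ValueError(f"missing separator {separator!r} in {entry!r}")
        details[name] = path
    return details


# With '#' as the separator, the conf reader cuts the value short.
old_style = strip_comment("tokens = t1#/path/a")
# With '$' as the separator, the full value survives.
new_style = strip_comment("tokens = t1$/path/a")
```

Under these assumptions, the '#'-separated value loses its path before it ever reaches the token parser, while the '$'-separated value is passed through intact.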
A.J. Beamon 64b6a5d257 Allow boolean parameters to be nested inside of namespaces or classes 2023-03-30 15:09:59 -07:00
Vaidas Gasiunas 894f555e95
Test loop profiler using API Tester (#7174)
* Api Tester: Specify knobs in the toml file; Test loop profiler

* Gracefully stop the loop profiler thread

* Protect loop profiler thread by mutex

* Create loop profiler thread only if it is not stopped
2023-03-30 23:00:23 +02:00
Johannes M. Scheuermann 6a612dd85f Don't stop iterating over all storage processes in exclusion check 2023-03-30 13:52:23 +02:00
Ata E Husain Bohra 0e720634f3
EaR: Allow RESTKmsConnector validation token newline char sanitization (#9831)
Description

Patch adds the ability to remove newline characters from KMSConnector
validation tokens

Testing

RESTKmsConnectorUnit.toml
2023-03-29 16:56:46 -07:00
Xiaoxi Wang 0b65f87422 Update fdbclient/include/fdbclient/ServerKnobs.h comment about pivot space ratio 2023-03-28 14:51:00 -07:00
Xiaoxi Wang d691e94af2 rename median ratio to pivot ratio; extract updatePivotAvailableSpaceRatio function; add related knobs 2023-03-28 14:51:00 -07:00
Josh Slocum c7fcf1e8f2
handling multiple in-flight rollbacks for tss change feed stream comparison (#9816) 2023-03-28 09:39:38 -05:00
Zhe Wu 33736ff9af Cleanup GcGeneration test and function documents 2023-03-27 12:31:44 -07:00
Zhe Wu 40dc54223c Add GC generation test, and make all simulation tests pass 2023-03-27 11:46:13 -07:00
Zhe Wu 2da86c37aa Add a knob to guard track tlog recovery 2023-03-27 11:42:27 -07:00
Zhe Wu 78bef8110b Track tlog recovery: tlog side implementation 2023-03-27 11:42:27 -07:00
Jay Zhuang cb389bf026
Merge pull request #9610 from sfc-gh-jazhuang/encrypt_inplace
Add inplace encryption and decryption API
which avoids the memory allocation and memcpy.
2023-03-27 11:21:06 -07:00
Zhe Wu 4a7f7cdfce
Merge pull request #9803 from halfprice/zhewu/exclude-check-existance
Do not update exclude/failed system metadata in excludeServers if the input list is already excluded/failed.
2023-03-27 09:38:04 -07:00
Zhe Wu 6e1bb08677 Update documentation 2023-03-24 15:29:47 -07:00
Zhe Wu 8211b5d097 Add a check in excludeServer function that if the exclusion list already exists, don't need to issue new writes. 2023-03-24 14:57:31 -07:00
Dan Adkins 02f0a44987
Avoid divide-by-zero in isKeyValueInSample. (#9799)
In the pathological case that both key size and value size are 0,
the probability of choosing that key-value pair is 0, and we divide
by zero when computing the sampledSize.

This change adds documentation to that function, which was quite
difficult to understand. In addition, we add `probability` to the
returned values, since one of the callers was backing it out from
sampledSize and itself in danger of dividing by zero.
2023-03-24 16:05:26 -04:00
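The divide-by-zero described in the commit above can be illustrated with a small sketch. It is written in Python rather than the project's C++. The constants are placeholder values, and the probability formula is an assumed simplification of byte sampling, not the actual body of isKeyValueInSample.

```python
from dataclasses import dataclass

# Placeholder values chosen for illustration; the real knobs may differ.
BYTE_SAMPLING_FACTOR = 250
BYTE_SAMPLING_OVERHEAD = 100


@dataclass
class SampleInfo:
    probability: float  # chance that this key-value pair is sampled
    sampled_size: int   # size scaled up by 1 / probability


def sample_info(key: bytes, value: bytes) -> SampleInfo:
    size = len(key) + len(value)
    # Assumed formula: larger pairs are more likely to be sampled.
    probability = size / (len(key) + BYTE_SAMPLING_OVERHEAD) / BYTE_SAMPLING_FACTOR
    probability = min(1.0, probability)
    if probability == 0.0:
        # Empty key and empty value: never sampled, so skip the division.
        return SampleInfo(0.0, 0)
    return SampleInfo(probability, int(size / probability))
```

Returning the probability alongside the sampled size means a caller does not have to recover it by dividing the two sizes, which is the second division the commit message warns about.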
Xiaoge Su 88eeb5a526 Remove WolfSSL support in FoundationDB 2023-03-23 20:17:18 -07:00
Jay Zhuang dba3555635 fix inplaceEncrypt() unittest issue 2023-03-23 15:26:22 -07:00
Jay Zhuang d9b37e527c Replace EncryptFinal() with CTX_reset() 2023-03-23 15:26:22 -07:00
Jay Zhuang 0efd403e59 Add inplace encryption/decryption API 2023-03-23 15:26:22 -07:00
Jingyu Zhou 18a0fa0263
Merge pull request #9468 from johscheuer/dont-block-exclusion-stateless-processes
Don't block the exclusion of stateless processes by the free capacity check
2023-03-23 08:59:43 -07:00
Johannes M. Scheuermann 694263ae5f Format code and update comment 2023-03-22 16:31:04 +01:00
neethuhaneesha 1e656210e1 Adding rocksdb bloom filter knobs. 2023-03-21 13:10:40 -07:00
neethuhaneesha 06657e1e0e
Rocksdb knob changes. (#9389) 2023-03-21 12:03:43 -07:00
He Liu 81c3cb8c50
Psm checkpoint (#9636)
* Update NativeAPI getCheckpointForRange().

* Implemented checkpoint in SS.

* clean up.

* Disabled StorageServerCheckpointTest.

* Serialized checkpoint creation and deletion.

Simplified checkpoint GC, via deleting CheckpointMetaData::dir.

* Fixed PhysicalShardMove test, where the fetchCheckpoint target range was set incorrectly.

* Minor improvements on CheckpointMetaData and DataMoveMetaData.

* fmt.

* Optimized PhysicalShardMove test

cleanup.

* dismiss operation_obsolete, and throw actor_cancelled.

* fmt.

* Resolved comments.
2023-03-21 09:14:10 -07:00
A.J. Beamon d324afe1bd
Merge pull request #9753 from sfc-gh-ajbeamon/fix-tenant-list-infinite-loop
Do not list renaming tenants twice when listing tenant metadata
2023-03-20 16:05:56 -07:00
Evan Tschannen a258d775c3
Merge pull request #9710 from sfc-gh-etschannen/feature-custom-dd
Added the ability to manually create a shard and also increase its replication factor
2023-03-20 15:35:10 -07:00
Zhongxing Zhang d2c1b3124e
add a field to show average data movement bytes in MovingData trace (#9591)
* add a field to show average data movement bytes in MovingData trace

* change variable type

* Make changes to variable/function naming and add more comments

* move rolling window struct to a new file; deal with corner case: dd startup, elements are full

* format

* simplify code

* modify file/struct name, universal to moving window

* fix typo

* add a new Knob to control MovingWindow::Deque size

* make the general use of dequeSize limit

* format

* format, use space rather than tab
2023-03-20 14:33:32 -07:00
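The rolling-window average described in the commit above can be sketched as follows. This is a minimal Python illustration, not the MovingWindow code from the patch. The class layout, method names, and eviction rules are assumptions, and `max_deque_size` stands in for the knob the commit mentions.

```python
from collections import deque


class MovingWindow:
    """Average bytes per second over a sliding time window (hypothetical API)."""

    def __init__(self, window_seconds: float, max_deque_size: int):
        self.window_seconds = window_seconds
        self.max_deque_size = max_deque_size
        self.samples = deque()  # (timestamp, moved_bytes) pairs, oldest first
        self.total = 0

    def _evict_expired(self, now: float) -> None:
        # Drop samples that fell out of the time window.
        while self.samples and now - self.samples[0][0] > self.window_seconds:
            _, old_bytes = self.samples.popleft()
            self.total -= old_bytes

    def add_sample(self, now: float, moved_bytes: int) -> None:
        self._evict_expired(now)
        if len(self.samples) >= self.max_deque_size:
            # Deque is full: drop the oldest sample to bound memory use.
            _, old_bytes = self.samples.popleft()
            self.total -= old_bytes
        self.samples.append((now, moved_bytes))
        self.total += moved_bytes

    def average_bytes_per_second(self, now: float) -> float:
        self._evict_expired(now)
        return self.total / self.window_seconds
```

Keeping a running total makes each update and each read cheap, and the size cap covers the corner case where samples arrive faster than they expire.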
A.J. Beamon 6becf12ecd Merge branch 'main' into fix-tenant-list-infinite-loop 2023-03-20 14:11:16 -07:00
A.J. Beamon 690a0a81ae Reading a list of tenant metadata ordered by tenant name would occasionally get stuck in an infinite loop if the last tenant in a batch was being renamed and has the same ID as the tenant read in the previous batch. This change removes rename destinations from the list and avoids this problem. 2023-03-20 13:30:27 -07:00
Evan Tschannen 8e4eb83ba7 addressed review comments 2023-03-20 11:41:23 -07:00
Xiaoxi Wang dc1eb1375b add a missing healthy_perpetual_wiggle enum 2023-03-20 09:46:36 -07:00
Xiaoxi Wang ef706e551f Add more details into priority comments. 2023-03-20 09:46:36 -07:00
Xiaoxi Wang e48fd10d8d add perpetual wiggle to .team_tracker field 2023-03-20 09:46:36 -07:00
Xiaoxi Wang f89a483f3d add informal classification of priority 2023-03-20 09:46:36 -07:00
Xiaoxi Wang c73577de7d Add team priority comments and document. 2023-03-20 09:46:36 -07:00
Steve Atherton 216d0be2cf
Add processID, networkAddress, and locality to layer status JSON for Backup Agents. (#9736)
* Add processID, networkAddress, and locality to layer status JSON for Backup Agents.

* Backup/dr agent determines network address to report in Layer Status only once, when the status updater loop begins, since it is a blocking call which connects to the cluster.  And lots of code cleanup.
2023-03-17 18:07:03 -07:00
A.J. Beamon dc2bd78aa7 The consistency check should retry if it couldn't find all the commit proxies when getting key server locations 2023-03-17 12:00:47 -07:00
Evan Tschannen 73767501d4 Merge branch 'main' into feature-custom-dd
# Conflicts:
#	fdbserver/tester.actor.cpp
2023-03-17 10:33:38 -07:00
Ata E Husain Bohra c492f83bf4
EaR: Avoid appending `tls` to the URL (#9734)
Description

Patch proposes two changes:

1. Avoid appending tls as part of URI for secure connections
2. RefreshEKs recurring task can be skipped if there are no keys to be refreshed

Testing

EncryptionOps.toml
EncryptKeyProxyTest.toml
devRunCorrectness 
devRunCorrectnessFiltered 'Encrypt*'
2023-03-16 22:52:51 -07:00
He Liu 0f5e75b34b
Added newDataMoveId(). (#9647)
* Added newDataMoveId().

* Added `ENABLE_DD_PHYSICAL_SHARD_MOVE`

* fmt.

* Replace `teamId` with `shardId`.
2023-03-16 18:06:06 -07:00
A.J. Beamon 484a414117 Increase the buggified tag measurement interval to reduce trace spam 2023-03-16 11:53:45 -07:00
Evan Tschannen ac54962533 code cleanup 2023-03-16 09:47:21 -07:00
A.J. Beamon 6d5ffa11f9 Merge branch 'main' into fix-tenant-id-increment 2023-03-15 17:56:42 -07:00
Josh Slocum b4eb665f1d
fixing copy constructor error and adding test for it (#9711) 2023-03-15 15:33:16 -07:00
A.J. Beamon 3881f1ccc6 More carefully validate tenant increments to avoid overflows 2023-03-15 14:56:12 -07:00
Ata E Husain Bohra dbcab0b1bd
Revert "Refactor GetEncryptCipherKeys (#9600)" (#9708)
This reverts commit 2702665e35.
2023-03-15 12:10:08 -07:00
Evan Tschannen aaf7b9b32b Added the ability to manually create a shard and also increase its replication factor 2023-03-15 11:26:15 -07:00
Markus Pilman aa09baadab
Merge pull request #9635 from sfc-gh-etschannen/fix-consistency-check
Fix: the consistency check did not properly report failed tests
2023-03-15 11:21:44 -07:00
Evan Tschannen 6c1d02a14f
Merge pull request #9703 from sfc-gh-jslocum/bg_file_logical_size
adding blob granule logical size
2023-03-15 09:59:57 -07:00
Evan Tschannen 2f96627d43 merge in main 2023-03-15 09:26:22 -07:00
Evan Tschannen 0a8435b742
Merge pull request #9702 from sfc-gh-jslocum/dbg_bg_ctest_timeout
fixing 2 bugs related to high delta file waitCommitted latency
2023-03-15 08:52:35 -07:00
Johannes M. Scheuermann b317928646 Only consider newly excluded processes 2023-03-15 15:36:15 +01:00
Josh Slocum a5b4212990 adding blob granule logical size 2023-03-15 08:54:49 -05:00
Josh Slocum 52c0dc56cc fixing 2 bugs related to high delta file waitCommitted latency 2023-03-15 08:39:42 -05:00
Evan Tschannen c435e8336a no message 2023-03-14 16:40:50 -07:00
He Liu a0a3f4bff3
Fetch byte sample file (#9657) 2023-03-14 16:24:08 -07:00
Zhe Wang 7d2766b692
Fix KeyRangeRef::isCovered() (#9675)
* fix KeyRangeRef::isCovered()

* reproduce bug

* more unit test added

* fmt
2023-03-14 12:41:18 -07:00
Jingyu Zhou c5e9bdc6e4
Merge pull request #9684 from sfc-gh-ahusain/ahusain-fix-rest-test
Fix RestUtilUnit test
2023-03-14 09:16:39 -07:00
A.J. Beamon d39cda610a Merge branch 'main' into metacluster-improvements
# Conflicts:
#	fdbcli/TenantCommands.actor.cpp
2023-03-13 15:58:39 -07:00
Ata E Husain Bohra aae8b131cb Remove 'printf'
Description

Testing
2023-03-13 15:50:04 -07:00
Ata E Husain Bohra a196f2fd75 Fix RestUtilUnit test
Description

Fix RestUtilUnit test

Testing

RESTUtilUnits.toml
2023-03-13 15:46:15 -07:00
A.J. Beamon 45056370b8 Merge branch 'main' into metacluster-improvements 2023-03-13 13:14:09 -07:00
A.J. Beamon 18cf523f49
Merge pull request #9660 from sfc-gh-ajbeamon/tenant-id-restore-safety
Disallow repopulating a management cluster from a data cluster with matching tenant ID prefix
2023-03-13 13:12:30 -07:00
Ata E Husain Bohra ea796eb3ec
EaR: REST kms misc fixes (#9664)
* EaR: REST kms misc fixes

Description

Patch addresses following issues:
1. Fix "return connection" routine, it fixes a regression introduced by
an earlier fix.
2. Update RESTConnectionPool::connectionPoolMap to an "unordered_map"
for O(1) lookups
3. Improve logging
4. Make RESTUrl parsing handle extra '/' for 'resource'

Testing

Standalone fdbserver connecting to external KMS and database create
2023-03-13 13:11:05 -07:00
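Item 4 of the commit above (tolerating extra '/' in the URL resource) can be illustrated with a short sketch. It is written in Python, the function name `normalize_resource` is hypothetical, and the exact rules (collapse repeated slashes, drop trailing ones) are an assumption about the intended behavior rather than the RESTUrl implementation.

```python
import re


def normalize_resource(resource: str) -> str:
    """Collapse runs of '/' and drop trailing '/' from a URL resource path."""
    collapsed = re.sub(r"/{2,}", "/", resource)
    stripped = collapsed.rstrip("/")
    # An empty or all-slash resource is treated as the root path.
    return stripped if stripped else "/"
```

With this normalization, `/get-keys`, `/get-keys/`, and `//get-keys//` all map to the same resource, so a stray slash in configuration does not change which endpoint is requested.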
A.J. Beamon cbc330697c Disallow repopulating a management cluster from a data cluster with matching tenant ID prefix unless forced. Remember the largest used tenant ID on the data cluster and use it to update the management cluster tenant ID when force repopulating the same ID. 2023-03-10 15:36:37 -08:00
Jingyu Zhou b755e668bf
Merge pull request #9601 from jzhou77/fix-head
Allow log router to detect slow peeks and to switch DC for peeking
2023-03-09 15:34:24 -08:00
Ata E Husain Bohra b227007ab0
EaR: Fix knob name (#9630)
Description

Knob 'REST_KMS_ALLOW_NOT_SECURE_CONNECTION' was renamed in a recent
patch; however, there are other places that need an update too.

Testing

devRunCorrectness - 100K
RESTUtilUnits.toml
RESTKmsConnectorUnits.toml
2023-03-08 17:37:39 -08:00
Nim Wijetunga 2702665e35
Refactor GetEncryptCipherKeys (#9600)
* initial commit

* address pr comments
2023-03-08 17:05:03 -08:00
Evan Tschannen 4a17ed363a Fix: the consistency check did not properly report failed tests 2023-03-08 16:56:23 -08:00
Nim Wijetunga 218ed4519f
Strengthen Snapshot Backup/Restore Asserts (#9552)
strengthen backup/restore asserts for encryption
2023-03-08 15:24:02 -08:00
Ata E Husain Bohra d0eec9d0ba
EaR: REST KMS fixes - encryption integration testing (#9598)
* EaR: REST KMS fixes - encryption integration testing

Description

Major changes:
1. Multiple fixes for issues observed while performing end-to-end
integration testing for the Encryption at-rest feature.
2. Improve REST module logging. Introduced FLOW_KNOBS->REST_LOG_LEVEL
to allow more granular control of feature logging, decoupled from
the cluster log level.

Testing

Integration testbed:
1. Run fdbserver standalone
2. Run external KMS http-server to serve encryption key fetch requests
2023-03-08 09:49:43 -08:00
Hui Liu c43f8b3fdc
Refactor - introduce BlobRestoreController for APIs to manage restore state (#9616) 2023-03-08 07:50:30 -08:00
Johannes M. Scheuermann c6eca3f398 Format code 2023-03-08 08:33:19 +01:00
Johannes M. Scheuermann 1550f3c596 Make use of precomputed exclude check 2023-03-08 08:19:42 +01:00
Johannes M. Scheuermann bae627f016 Fix syntax 2023-03-08 08:19:42 +01:00
Johannes M. Scheuermann db8c60c80f Don't block the exclusion of stateless processes by the free capacity check 2023-03-08 08:19:41 +01:00
A.J. Beamon de5f2c0fee Disallow cluster names that start with the `\xff` byte 2023-03-07 11:46:34 -08:00
Steve Atherton 5ff0bc3f87
Merge pull request #9576 from sfc-gh-satherton/storage-configure-refactor
Storage and log engine configuration support / refactor a few things.
2023-03-07 02:10:14 -08:00
Steve Atherton f6747adf89 Move implementations to cpp file. 2023-03-06 18:43:26 -08:00
Jingyu Zhou 0259a243ae Switch DC if log router peek becomes stuck
Try a different DC if this happens.
2023-03-06 17:41:56 -08:00
Ata E Husain Bohra a45de70003
EaR: RESTClient HTTP compliance, fix json request content type (#9544)
* EaR: RESTClient HTTP compliance, fix json request content type

Description

  diff-1: Address review comments

RESTClient is responsible for handling FDB <-> KMS communication
for Encryption and other use cases. By design, it only supports
"secure connection", i.e. "https"; however, there is a
need to expand the module to support "http" connections,
for instance in test and dev deployments.

However, given that RESTClient handles highly
sensitive content such as a plaintext "encryption cipher
from a KMS", the feature is guarded using
CLIENT_KNOB->REST_KMS_ENABLE_NOT_SECURE_CONNECTION, which is
settable using the FDBServer command line argument
"--kms-rest-enable_not_secure_connection" (boolean)

Testing

Deployed a standalone fdbserver and communicated with a
simple "http" server
2023-03-06 16:06:03 -08:00
Jingyu Zhou 0d8bde9dcd
Merge pull request #9505 from jzhou77/fix
Support multiple key prefix filters for fdbdecode
2023-03-06 15:57:03 -08:00
Chaoguang Lin 7273723a43
Add the hotrange fdbcli command (#9570)
* Add the hotrange fdbcli command

* Remove the unnecessary state

* Add the doc about the hotrange command
2023-03-06 14:46:52 -08:00
Jingyu Zhou 7a0b3c05b9
Merge pull request #9540 from sfc-gh-huliu/timestamp
Report restore phase start time and eta
2023-03-06 14:06:23 -08:00
A.J. Beamon 85c3cf702c
Merge pull request #9584 from sfc-gh-ajbeamon/fix-metacluster-create-error-msg
Fix metacluster create error message
2023-03-06 10:30:03 -08:00
A.J. Beamon ea907f10f5 Print the tenant mode string rather than integer value when reporting that we couldn't create a metacluster 2023-03-06 09:25:50 -08:00
Josh Slocum e1b620135b Merge branch 'main' into bg_latency_fixes 2023-03-06 09:23:11 -06:00
Steve Atherton 50d567b5a5 Refactored some parts of database configuration to support log_engine=<name> and storage_engine=<name> and generate these when converting a DatabaseConfig JSON object to a `configure` command. Refactored `fileconfigure` and simulation setup to use the same JSON -> configure function as the same code was copy/pasted to both places but only one has been kept up to date with new features. Renamed Redwood to `ssd-redwood-1` canonically but the experimental name is still supported for backward compatibility. 2023-03-04 20:52:31 -08:00