Zhe Wang
770ffb7419
[Release-7.3] TLS should accept same key with different values (#11763)
* fix tls
* address comment
2024-11-15 00:16:16 -08:00
flowguru
1edb499428
Fix backup dryrun bug
* Fix backup dryrun bug
Currently there is an out-of-scope issue; this change also adds
a knob to control whether to allow a dry run of backup
* fix another bug that misses a wait statement
---------
Co-authored-by: Hao <fdbflowguru@gmail.com>
2024-11-14 09:46:11 -08:00
Syed Paymaan Raza
c3979112cb
Urgent consistency checker fixes (cherrypick 7.3) (#11736)
* [fdbserver] Drop duplicate or conflicted requests from urgent consistency checker clients
* Fix edge case in urgent consistency check causing infinite loop
* fixup! Fix edge case in urgent consistency check causing infinite loop
self review
2024-10-26 09:50:51 -07:00
Vishesh Yadav
2b23111720
[release-7.3] Log all incoming connections (#11713)
* Log all incoming connections
* Address review comments
* Update FlowTransport.actor.cpp
* Update FlowTransport.actor.cpp
* Refactor
* Format
* initialize for simulation
2024-10-16 20:32:32 -07:00
Hao Fu
525183012a
Retry with dryrun in the presence of s3 token error (release-7.3) (#11602)
* Retry with dryrun in the presence of s3 token error
The S3 token is read from local disk and might be expired or invalid.
Before this change, backup retried uploading data to S3 indefinitely,
which wastes network bandwidth.
Now, on an S3 token error, it retries with a GET request that lists all buckets,
and retries the upload only once the token error disappears.
* Finish testing, set default to false
* Check bucket exist or not, rather than listBucket
* address comments
2024-09-06 11:47:33 -07:00
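The retry change above replaces blind re-uploads with a cheap probe: while the S3 token is bad, issue a lightweight request (a bucket listing, later changed to a bucket-existence check) and resend the upload only once the token error clears. A minimal C++ sketch of that control flow follows, assuming a hypothetical `S3Status` enum and hypothetical `upload` and `bucketExists` callbacks; none of these are FoundationDB's backup API, the real code runs inside Flow actors, and this sketch omits any delay between attempts.

```cpp
#include <cassert>
#include <functional>

// Hypothetical status codes; not FoundationDB's actual error types.
enum class S3Status { Ok, TokenError, OtherError };

// Sketch: on a token error, poll a cheap probe until the error clears,
// instead of resending the full upload payload on every retry.
// Returns the number of upload attempts used, or -1 if the sketch gives up.
int uploadWithTokenProbe(const std::function<S3Status()>& upload,
                         const std::function<S3Status()>& bucketExists,
                         int maxUploads, int maxProbes) {
    assert(maxUploads > 0 && maxProbes > 0);
    int probes = 0;
    for (int uploads = 1; uploads <= maxUploads; ++uploads) {
        S3Status s = upload();
        if (s == S3Status::Ok) return uploads;
        if (s != S3Status::TokenError) return -1;  // other errors: out of scope here
        // Token is bad: keep probing cheaply until it stops reporting a token error.
        while (bucketExists() == S3Status::TokenError) {
            if (++probes >= maxProbes) return -1;
        }
    }
    return -1;
}
```

The two bounds (`maxUploads`, `maxProbes`) are illustrative additions so the sketch always terminates; the commit message does not say how the real retry loop is bounded.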
Xiaoge Su
08554e4f57
Add PeerAddress to all PeerAddr/Peer TraceEvent [release-7.3] (#11521)
* Add PeerAddress to all PeerAddr/Peer TraceEvent
This is to address #4846
* fixup!
* Decorate TLS handshake errors with peerAddr (#10090)
* Use connection debug ID in N2_AcceptHandshakeError
* Decorate TLS handshake errors with peerIP
* only write one value to ostream
* Add PeerAddress to all PeerAddr/Peer TraceEvent
This is to address #4846
---------
Co-authored-by: Sam Gwydir <sam.gwydir@snowflake.com>
2024-07-23 17:31:19 -07:00
Yao Xiao
83ae9ac129
[release-7.3] Wait for TSS during finishMoveShards. (#11485)
* Fix wait
* Fix wait
2024-07-22 23:41:00 -07:00
Zhe Wang
6d93c4dcac
[Release-7.3] Cherrypick Consistency Check Urgent (#11228)
* Consistency Check Urgent (Cherrypick from Release-7.1) (#11217)
* cherry-pick-distributed-consistency-checker
* code cleanup
* refactor code, decouple consistencyCheckerUrgent and consistency checker
* fix workload for consistencycheckurgent
* add new consistencycheckurgent role type
* fix CI
* address comments
* fix-consistencycheckurgent-large-read (#11229)
2024-03-05 10:45:15 -08:00
Jingyu Zhou
9e54cd0b5f
Merge pull request #11198 from johscheuer/abort-fdb-if-non-zero-exit-7.3
[RELEASE-7.3] Abort process when abnormal shutdown is initiated to allow coredumps to be generated
2024-02-14 15:36:37 -08:00
He Liu
11d8a5ef92
Cherry-pick to 7.3: Added checksum in MutationRef (#11181) (#11193)
2024-02-14 14:48:53 -08:00
Johannes M. Scheuermann
fc59b9cbf6
Add knob to allow fdbserver to abort under abnormal behaviour
2024-02-14 10:17:32 +01:00
Johannes Scheuermann
2657754a53
Update link to abort function
2024-02-14 10:17:30 +01:00
Johannes M. Scheuermann
4f0361419b
Abort process when abnormal shutdown is initiated to allow coredumps to be generated
2024-02-14 10:17:28 +01:00
Johannes M. Scheuermann
e194d2bcaf
Fix compile issue for ALLOC_INSTRUMENTATION
2024-02-01 14:46:52 +01:00
yaoxiao-github
c8d092d067
Add log cleaner for rocksdb logs.
2024-01-17 14:55:55 -08:00
Jingyu Zhou
d4eea8f048
Fix picking of IPv4 addresses
2023-10-18 08:24:09 +02:00
Jingyu Zhou
84a68f94ab
Add a knob RESOLVE_PREFER_IPV4_ADDR to prefer IPv4 addresses
The default is to prefer IPv6 addresses.
2023-10-18 08:24:01 +02:00
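The `RESOLVE_PREFER_IPV4_ADDR` knob above decides which address family wins when hostname resolution returns both IPv4 and IPv6 addresses, with IPv6 preferred by default. A minimal sketch of that selection follows, assuming a simplified `ResolvedAddr` type and a hypothetical `pickAddress` helper (neither is FoundationDB's resolver code); falling back to the first address of the other family is also an assumption of this sketch.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-in for a resolved address; not FoundationDB's NetworkAddress.
struct ResolvedAddr {
    std::string ip;
    bool isV6;
};

// Pick the first address of the preferred family; if none exists,
// fall back to the first address of any family.
std::optional<ResolvedAddr> pickAddress(const std::vector<ResolvedAddr>& addrs,
                                        bool preferIPv4) {
    const bool wantV6 = !preferIPv4;  // default (knob off) prefers IPv6
    for (const auto& a : addrs) {
        if (a.isV6 == wantV6) return a;
    }
    if (!addrs.empty()) return addrs.front();
    return std::nullopt;
}
```

For example, with both `2001:db8::1` and `192.0.2.1` resolved, `pickAddress(addrs, false)` returns the IPv6 address (the default) and `pickAddress(addrs, true)` returns the IPv4 one.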
Andrew Noyes
6fc73483a5
Implement Event using std::latch (#10705)
2023-09-25 11:14:27 -07:00
Zhe Wang
57e2f08c50
[Release-7.3] Cherry-pick audit storage optimizations (#10840)
* Multiple improvements to AuditStorages (#10685)
* remove dangerous DDAudit assert, add AuditRate knob, add progress check when ssshard completes, add progress check for ssshard in fdbcli
* throttle progress check for ssshard
* fix getAuditProgressByServer
* fix trace event for ss audit
* using name -- checkMoveKeysLockForAudit
* new scheduleAuditLocationMetadata
* address comments
* shorten progress summary for ssshard
* simplify getAuditProgressByServer in fdbcli
* Audit storage for specific engine (#10781)
* audit storage for specific engine
* fix getStorageType
* fix budget of skipAuditOnRange
* fix budget in scheduleAuditOnRange
* fix CI error
* improve trace events
* address comments
* Audit location metadata in DD (#10820)
* Audit location metadata in DD
* nits
* Fix auditStorage: Audit task should not retry if the task is issued by an outdated DD (copy from main PR 10844)
2023-08-29 11:09:28 -07:00
Evan Tschannen
2ff2b2cf38
added a blob worker specific page cache size for redwood so that it does not have to be changed manually in fdb.conf for all blob worker processes
2023-08-07 17:20:04 -07:00
Evan Tschannen
6ccc6c9b5b
addressed review comments
2023-08-07 17:20:04 -07:00
Nim Wijetunga
7e14bd3389
add kms and ekp status to json
2023-08-07 14:59:58 -07:00
Josh Slocum
6f044dc339
reducing frequency of EKC latency logging to the standard for latencies in fdb
2023-07-19 14:05:00 -07:00
A.J. Beamon
0d9c581bd1
Rename TraceLog::log to logMetrics and move initialization of trace log metrics into TraceLog::open
2023-07-19 14:05:00 -07:00
A.J. Beamon
fa70b885fe
Fix condition in assert
2023-07-19 14:05:00 -07:00
A.J. Beamon
a73227f7c1
Structure access to TraceLog::logTraceEventMetrics so that it is written before a trace log is opened and only read from one thread after it is opened.
2023-07-19 14:05:00 -07:00
A.J. Beamon
20915b749f
Make CodeProbeImpl::_hitCount atomic
2023-07-19 14:05:00 -07:00
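Making `CodeProbeImpl::_hitCount` atomic matters if a probe can be hit from more than one thread, because unsynchronized `++` on a plain integer from several threads is a data race. A minimal sketch with `std::atomic` follows; `ProbeCounter` and `hitFromThreads` are hypothetical stand-ins, not FoundationDB's `CodeProbeImpl`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a code-probe hit counter.
struct ProbeCounter {
    // Atomic increments stay correct when several threads hit the probe at once.
    std::atomic<std::uint64_t> hitCount{0};

    void hit() { hitCount.fetch_add(1, std::memory_order_relaxed); }
    std::uint64_t get() const { return hitCount.load(std::memory_order_relaxed); }
};

// Hit the probe from several threads, wait for them all, and return the count.
std::uint64_t hitFromThreads(ProbeCounter& c, int threads, int hitsPerThread) {
    assert(threads > 0 && hitsPerThread >= 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&c, hitsPerThread] {
            for (int i = 0; i < hitsPerThread; ++i) c.hit();
        });
    }
    for (auto& w : workers) w.join();
    return c.get();
}
```

Relaxed ordering is enough for a pure counter here: no other data is published through it, and `join()` already makes every increment visible before `get()` runs.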
Zhe Wang
cc4781a4ec
Detect inconsistency of KeyServers and ServerKeys in real time (#10484)
* add framework
* add audit logic
* refactor audit loc metadata
* address comments
* add realtime audit timeout, add post validation logic
* fix input empty range to compareKeyServersAndServerKeys
* add context for auditKeyServersAndServerKeysInRealTime
* focus on moveShard
* remove space
* address comments
* cleanup
* add audit cleanup
* make validateRangeAssignment simple
* change trace name
* add shardAssigned
* stop DD when inconsistency detected
* fix ci
* small fix
* revert ss and auditUtl and simplify rt audit
* cleanup ss
* tiny change
* address comments and refactor code
* make auditLocationMetadataPreCheck retriable
* handle actor cancel in auditLocationMetadataPreCheck
* rm timeout and add new protection for failure of audit
* fix bugs
* import dataMoveId to validation
* improve trace event
* carefully propagate error and stop DD
* tiny fix
* small change
* remove a state var
* nit
* clean comments
* fmt
2023-07-06 10:00:31 -07:00
Xiaoge Su
32a60e9a29
[release-7.3] Report missing attributes when constructing status (#10524)
* Report missing attribute in StorageServer status
* fixup! reformat source
* fixup! Reformat source
* Retrigger CI
2023-06-21 14:02:43 -07:00
Zhe Wang
aa47c0c722
[Release 7.3] Cherry-pick Add audit storage cancellation (#10386) (#10430)
* Add audit storage cancellation (#10386)
* list audits
* cancel audits and corresponding tests
* make audit storage dblock aware
* increase audit retry since we are able to cancel
* fix updateAuditState and fdb github ci
* fmt
* fix fdbcli audit_storage and fix CI issue
* fix fdb cli
* address comments
* fmt
* Fix audit storage actor cancel issue (#10443)
* init
* add testAuditStorageConcurrentRunForSameType test
* init (#10458)
2023-06-09 10:43:54 -07:00
He Liu
f408c1bb08
Cherry pick psm and several other PRs (#10405)
* Psm ss (#9817)
* Update NativeAPI getCheckpointForRange().
* Implemented checkpoint in SS.
* clean up.
* Disabled StorageServerCheckpointTest.
* Serialized checkpoint creation and deletion.
Simplified checkpoint GC, via deleting CheckpointMetaData::dir.
* Fixed PhysicalShardMove test. Where fetchCheckpoint target range is misset.
* Minor improvements on CheckpointMetaData and DataMoveMetaData.
* fmt.
* Optimized PhysicalShardMove test
cleanup.
* Refactored ShardedRocks checkpoint/restore for psm.
* Complete ShardedRocks::restore.
* dismiss operation_obsolete, and throw actor_cancelled.
* Validate checkpoint when !asKeyValues.
* fmt.
* Don't read from uninitialized physical shard.
* Resolved comments.
* cleanup.
* Added verify_checksum_before_restore for ShardedRocks.
* Added ShardedRocksDB checkpoint/restore unit test.
* Populate CheckpointMetaData::dir in RocksDB.
* Rename MovingIn as Adding.
* Added StorageServerUtils.
* Added physical shard move in SS.
* Fix on ApplyMetaData, doFetchFile error handling etc.
* Debugging incorrect shard size.
* Create/delete checkpoints only when Physical shard move is enabled.
* Added back SHARD_ENCODE_LOCATION_METADATA.
* Fixed bytesSample incorrect issue.
Essentially dedicated CheckpointRocksDBCF as key-value based checkpoint, will need to add a new format for the file-based checkpoint.
* Cleanup.
* Cleanup & compile rocksdb with 8.1 branch.
* clean up.
* clean up.
* Allowed request_maybe_delivered error type in FetchShard.
* Added FDBRocksDBVersion.h.
* Fixed stuck fetchShard.
* Don't create checkpoint on TSS.
* Upgrade to RocksDB 8.1.1
* Cleanup.
* Fixed accidentally deleted db_path and name fields.
* Improved trace event.
* Removed redundant code from previous ShardedRocksDB.
* Cleanup.
* cleanup.
* cleanup.
* rename `state`.
* Cleanup.
* Removed excessive TraceEvent.
* Fixed shardMap race condition on different threads
* Added *Stats, logging data move rates.
* Added `DD_PHYSICAL_SHARD_MOVE_PROBABILITY` to support hybrid data move.
* Resolved comments.
* fmt.
* Use physical shard move in PhysicalShardMoveTest.
* Enforce physical-shard-move for PhysicalShardMoveTest.
* fmt
* Reverted unintended changes.
* Added more logs about shard management. #10303
* Removed ENABLE_DD_PHYSICAL_SHARD_MOVE #10324
* Delete a data move if key range is not consistent. #10334
Disable physical shard move by default. #10335
2023-06-06 13:29:24 -07:00
Zhe Wu
22a79b0b7a
Update FDB_AV_LATEST_BINDINGS_VERSION to 7.3
2023-05-23 13:17:09 -07:00
Zhe Wu
b75da0dda0
Add recovered at in CSTATE, and use a knob to guard the use of it
2023-05-22 09:57:13 -07:00
Josh Slocum
ff0c61aaf0
Adding BlobFailureInjection workload
2023-05-19 13:12:57 -07:00
Josh Slocum
8d504f3d2f
Adding explicit blob range mutation log to handle large number of ranges
2023-05-16 18:25:09 -07:00
Josh Slocum
d9c6ed9bb5
Passes existing tests
2023-05-16 17:28:11 -07:00
Josh Slocum
e13fcf3e5e
Adding Simulated HTTP Server and refactoring HTTP code
2023-05-16 16:32:34 -07:00
He Liu
a8cc5fdde0
Merge branch 'release-7.3' into cherry-pick-audit-storage
2023-05-11 10:41:43 -07:00
Zhe Wang
0d8406dde3
Adding cleanup of old audit metadata ( #10137 )
* clean up old audit metadata
* change comments
* fix audit cleanup rule as PR description claim and reduce timeout of auditStorageCorrectness in tester
* address comment
* clear audit metadata should not throw error
* cleanup progress metadata by type
* control number of AuditStatistic events
* carefully persist new audit state
* add unit tests and fix issues
* cleanup
* allow audit concurrent run for different types and fix some bug in auditutl
* fix ci issue and nits
2023-05-10 21:18:21 -07:00
Zhe Wu
2158c985fd
Update 7.3 branch to version 7.3
2023-05-10 13:59:36 -07:00
Xiaoge Su
2f70eb12f5
fixup! Cherry pick the change of genericactors.actor.h:store in #9991
2023-05-03 16:29:18 -07:00
Steve Atherton
dd653064dc
Address review comments. KeyRangeMapSnapshot is now ReferenceCounted and getSnapshot() returns a Reference to discourage copying. Added several comments for clarity. Added FormatUsingTraceable and changed all new formatters to use it except for Standalone<T> which redirects to the formatter for T.
2023-05-03 15:52:35 -07:00
Steve Atherton
5c413e48f3
Added `transaction_option_setter<DB>` to determine if a DB-like thing has a `->setOptions(tr)` method. This method is called in `runTransaction()` templates at the top of the retry loop and in the manual retry loops in KeyBackedTypes. Added `if constexpr(` support to the ActorCompiler to support this.
2023-05-03 15:52:09 -07:00
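The `transaction_option_setter<DB>` trait described above is an instance of the C++ detection idiom: it is true only when `db->setOptions(tr)` compiles, so a `runTransaction()`-style template can call it at the top of the retry loop via `if constexpr`. A minimal C++17 sketch follows, assuming a hypothetical `Transaction` type, two hypothetical database types, and a hypothetical `applyOptionsIfSupported` helper; the trait's exact signature in FoundationDB may differ. Inside Flow actors this dispatch needed the ActorCompiler change the commit mentions; in plain C++17 `if constexpr` is built in.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical transaction type; only records which options were applied.
struct Transaction {
    int options = 0;
};

// Primary template: by default, DB has no usable setOptions(tr).
template <class DB, class Tr, class = void>
struct transaction_option_setter : std::false_type {};

// Selected only when `db->setOptions(tr)` is a well-formed expression.
template <class DB, class Tr>
struct transaction_option_setter<
    DB, Tr, std::void_t<decltype(std::declval<DB&>()->setOptions(std::declval<Tr&>()))>>
    : std::true_type {};

// Hypothetical DB-like types: one without options, one that applies them.
struct PlainDB {};
struct DBWithOptions {
    int opt = 0;
    void setOptions(Transaction& tr) const { tr.options = opt; }
};

// Intended to run at the top of each retry-loop iteration.
template <class DB, class Tr>
void applyOptionsIfSupported(DB db, Tr& tr) {
    if constexpr (transaction_option_setter<DB, Tr>::value) {
        db->setOptions(tr);  // discarded at compile time when DB lacks setOptions
    }
}

static_assert(!transaction_option_setter<PlainDB*, Transaction>::value);
static_assert(transaction_option_setter<DBWithOptions*, Transaction>::value);
```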
Steve Atherton
32bf39ef8b
DataDistributor will restart if DDConfiguration changes.
2023-05-03 15:29:13 -07:00
Steve Atherton
8d35aa113d
Changed KeyBackedTypes to an actor file. Added TypedKeySelectors for Map and Set classes and getRange() keySelector methods. Added debug macro for KeyBackedTypes. Rewrote KeyBackedRangeMap using keyselectors on KeyBackedMap.
2023-05-03 15:27:57 -07:00
Steve Atherton
7b88d4368b
DDConfiguration class for modeling user specified key range configuration options. Added KeyBackedRangeMapSnapshot, some other supporting changes to KeyBackedTypes. Added invalidKey to give KeyBackedTypes a safe prefix to avoid accidental userspace modification from uninitialized accessors.
2023-05-03 15:27:57 -07:00
Chaoguang Lin
bbc6c6cc8f
Fix for correctness failures when issuing duplicate requests
Add comments; Disable failure injection in Snap test
2023-05-03 14:43:50 -07:00
Junhyun Shim
98bd49c676
Apply review suggestions
2023-05-02 17:38:51 -07:00
Junhyun Shim
4b3d8f43da
Extend WipedString guarantees to serialized packets
2023-05-02 17:38:51 -07:00
Ata E Husain Bohra
37e0a43106
Address review comments and fix compilation issue
Description
Testing
2023-05-01 19:24:04 -07:00