foundationdb

Commit Graph

Author	SHA1	Message	Date
Zhe Wang	d6e7b5f736	Audit storage: validate consistency of replica and shard location metadata (#9628) * Implemented AuditUtils.actor.cpp Moved AuditUtils to fdbserver/ * Persist AuditStorageState. * Passed persisted AuditStorageState test. * Added audit_storage_error to indicate a corruption is caught. Throw/Send audit_storage_error when there is a data corruption. Added doAuditStorage() for resuming Audit. * Load and resume AuditStorage when DD restarts. * Generate audit id monotonically. * Fixed minor issue AuditId/Type was not set. * Adding getLatestAuditStates. * Improved persisted errors and added AuditStorageCommand.actor.cpp for fdbcli. * Added `audit_storage` fdbcli command. * fmt. * Fixed null shared_ptr issue. * Improve audit data. * Change DDAuditFailed to SevWarn. * Sev. * set SERVE_AUDIT_STORAGE_PARALLELISM to 1. * Moved AuditUtils* to fdbclient/. * Added getAuditStatus fdbcli command. * Refactor audit storage fdb cli commands. * Added auditStorage in sim. * Cleanup. * Resolved comments. * Resolved comments. * Added SystemData for metadata audit. Refactored audit workflow to make sure all sub-tasks are executed w/o early exit. * Improvements. * Persisted Failed state after too many retries. * Added retryCount for resumeAuditStorage(). * resolving conflict. * Resolved conflicts. * allow-merged-to-run * add timeout to audit client * fmt * validate replica * add audit serverKey * address comments and fmt * fix audit_storage_exceeded_request_limit * fix segfault in getLatestAuditStatesImpl * fix bugs * remove timeout from workload * fix bugs * audit local view of shard assignment * fmt * fix-stuck-issue-and-make-dd-audit-storage-self-retry * fix timeout * fix timeout * fix bugs and cleanup * fix nit * change name state to coreState for audit metadata * address comments * code clean * fmt * setup debug * cleanup * clean up * code cleanup * code clean * remove tmp file * fmt * trace portion of shards that of anonymous physical shard * remove unnecessary actor cleanup * do not give up when tr is too old * address commits * refactor * clean * fmt * fix-command-help-text * fix-auditstate-restore-and-enable-restore-to-metadata-audit * address comments * fmt * debug and improve efficiency of resume audit * small change * fix audit cli * bypass completed audit when dd restart * fix auditStorageCommandActor * make mismatch key range more visible * address comments * make local shard metadata check can make progress by retries * address comments * address comments * partition location metadata validation by range and server * unset MIN_TRACE_SEVERITY * address comments and SS auto proceed until failed then notify dd * persistNewAuditState should checkMoveKeysLock * audit storage location metadata partitioned by range and move shard assignment history def to the end of SS structure * code cleanup * fix error message in metadata validation * fix registerAuditsForShardAssignmentHistoryCollection input for local shard validation * add comments to code and add guard to make sure the SS audit does not proceed automatically for many times without being notified by DD --- to support audit cancellation later * fix coalesceRangeList * replace rangeOverlapping func with operator and use struct instead of complicated type for return value of getKeyServer/serverKey/shardInfo * simplify shard assignment history * shardAssignmentRecordRequests should be unordered_map * address comments, make trackShardAssignment simple, make anyChildAuditFailed cover all audit children, keep only one audit actor run at a time on each SS * only run validate shard info once at a time, other audit type does not have this limitation --------- Co-authored-by: He Liu <heliu05023@gmail.com> Co-authored-by: He Liu <heliu@apple.com> Co-authored-by: Zhe Wang <zhewang@Zhes-Laptop.local>	2023-05-01 10:35:52 -07:00
Steve Atherton	16d8b1d1f9	Merge pull request #9949 from sfc-gh-etschannen/fix-shard-count fix: do not let too many shards use large teams	2023-04-29 23:50:49 -07:00
Steve Atherton	e291d5e51f	Merge commit '5e8feac8c980fbef6b6f523360e42d28dd120e5d' into random-kv-generator	2023-04-28 14:11:42 -07:00
Josh Slocum	5b47913882	disabling global connection pool for now (#10054)	2023-04-28 09:48:56 -05:00
neethuhaneesha	53fe07a709	Setting auto_prefix_mode to true in rocksdb. (#10050)	2023-04-27 12:11:48 -07:00
A.J. Beamon	f1cbc86b94	Add a metacluster version to the MetaclusterRegistrationEntry and validate it when loading the entry from the cluster.	2023-04-27 10:04:57 -07:00
Jay Zhuang	0ab691b707	Merge pull request #10002 from sfc-gh-jazhuang/readThrough Fix RangeResult.readThrough misuse	2023-04-27 09:59:11 -07:00
Xiaoxi Wang	a05e078c4a	Remove locations.size() == expectedShardCount assertion and add comments	2023-04-26 14:23:09 -07:00
He Liu	e11f804f96	ShardedRocks checkpoint/restore for physical shard move (#9752) * Update NativeAPI getCheckpointForRange(). * Implemented checkpoint in SS. * clean up. * Disabled StorageServerCheckpointTest. * Serialized checkpoint creation and deletion. Simplified checkpoint GC, via deleting CheckpointMetaData::dir. * Fixed PhysicalShardMove test, where the fetchCheckpoint target range was mis-set. * Minor improvements on CheckpointMetaData and DataMoveMetaData. * fmt. * Optimized PhysicalShardMove test cleanup. * Refactored ShardedRocks checkpoint/restore for psm. * Complete ShardedRocks::restore. * dismiss operation_obsolete, and throw actor_cancelled. * Validate checkpoint when !asKeyValues. * fmt. * Don't read from uninitialized physical shard. * Resolved comments. * cleanup. * Added verify_checksum_before_restore for ShardedRocks. * Added ShardedRocksDB checkpoint/restore unit test. * Populate CheckpointMetaData::dir in RocksDB. * Addressed comments.	2023-04-26 09:17:18 -07:00
Steve Atherton	f9c8840fd6	Initial checkin of RandomKeyValueUtils.h/cpp and a unit test.	2023-04-26 01:34:56 -07:00
Steve Atherton	7f6d5f296a	Merge commit 'e318fc260070ba6ba604930b8f259c9b655938ea' into keybackedrangemap # Conflicts: # flow/include/flow/error_definitions.h	2023-04-25 14:21:23 -07:00
Jingyu Zhou	6b15d67928	Merge pull request #10010 from jzhou77/main Properly handle proxy_memory_limit_exceeded error for GetKeyServerLocationsRequest	2023-04-25 11:18:03 -07:00
Jingyu Zhou	74bb659f71	Simplify backoff calls per comment	2023-04-25 09:15:16 -07:00
Steve Atherton	858b51a69b	Address review comments. KeyRangeMapSnapshot is now ReferenceCounted and getSnapshot() returns a Reference to discourage copying. Added several comments for clarity. Added FormatUsingTraceable and changed all new formatters to use it except for Standalone<T> which redirects to the formatter for T.	2023-04-24 19:01:05 -07:00
Steve Atherton	7f3df82d98	Add code probes and trace events to range config setup. Make default and non-default ranges randomly on/off. Restore GetMappedRange getRange buggify on range read size.	2023-04-22 23:52:49 -07:00
Steve Atherton	2c7f3c2120	In KeyBackedTypes, many methods have two versions, one which accepts a Transaction and one which accepts a Database from which a Transaction is created and a retry loop is run via runTransaction(). This refactor combines both versions into a single function which uses a static check to call itself with runTransaction() if the passed object is a Transaction creator.	2023-04-22 13:58:00 -07:00
Steve Atherton	c57ed25987	Renamed SystemDBLockWriteNow() to SystemDBWriteLockedNow() and changed definition to be more direct / clear.	2023-04-22 13:17:41 -07:00
Steve Atherton	dd334f1b02	Added `transaction_option_setter<DB>` to determine if a DB-like thing has a `->setOptions(tr)` method. This method is called in `runTransaction()` templates at the top of the retry loop and in the manual retry loops in KeyBackedTypes. Added `if constexpr(` support to the ActorCompiler to support this.	2023-04-22 11:10:20 -07:00
Steve Atherton	893faf7d5a	Change optional get() to ->.	2023-04-22 01:15:49 -07:00
Steve Atherton	639d4d05ef	Removed SYSTEM_PRIORITY_IMMEDIATE from KeyBackedTypes and all options from KeyBackedRangeMap database functions. Added SystemTransactionGenerator<> for wrapping Database types and generating transactions with selected system level options.	2023-04-21 19:00:29 -07:00
Steve Atherton	46cde666a5	Merge commit '9639192a88001043a104aeef0c394e99ca5d6a6e' into keybackedrangemap	2023-04-21 13:27:15 -07:00
Steve Atherton	879e729cec	WatchableTrigger::onChange() provides a signal when its baseline version is established. Wrote docs for it. Fixed DD race condition in DD config watching using this feature.	2023-04-21 13:18:31 -07:00
sfc-gh-tclinkenbeard	9639192a88	Add GLOBAL_TAG_THROTTLING_REPORT_ONLY knob	2023-04-21 11:13:42 -07:00
Steve Atherton	948e2dd781	Bug fix in KeyBackedRangeMap::updateRange() where the range after the modified region could be set wrong. Added Database version of updateRange().	2023-04-20 20:44:24 -07:00
Jingyu Zhou	c544985fe5	Add trace events and adjust backoff. For each success, halve the backoff until it is less than the initial backoff value, then set the backoff to 0.	2023-04-20 15:56:06 -07:00
Jon Fu	a7cf82adb2	Update fdbcli tenant list function to take tenant group filter, support JSON, and report tenant IDs (#9967) * fix metacluster get segfault * update fdbcli tenant list function to take tenant group filter, support JSON, and report tenant IDs * code review changes * code formatting * additional code review changes * account for empty tenant groups * reformat error catching in fdbcli command * refactor json output and address code review comments * add back mistakenly removed hint * keep hints after 4th token * add to tenant management workload * fix compile error * fix test range * add more asserts to metacluster case * nest test condition inside if block * adjust tenant test layout * refactor some test files * reorganize test workload logic	2023-04-20 16:22:47 -04:00
Steve Atherton	2553aed118	KeyBackedRangeMap::updateRange() now coalesces adjacent matching ranges caused by the update, and supports replacing a range's config with a new explicit value. Added update command to rangeconfig cli.	2023-04-20 13:02:04 -07:00
Josh Slocum	ae862e1b96	fixing code probe issues	2023-04-20 11:54:47 -07:00
Steve Atherton	2e9a7f927b	Prevent KeyBackedRangeMap::getSnapshot() from touching a key outside of its map range.	2023-04-19 23:14:10 -07:00
Steve Atherton	183492cfb3	Change WatchableTrigger back to use Versionstamps and complete the watch() and onChange() implementations.	2023-04-19 21:45:56 -07:00
Ata E Husain Bohra	a099d377fa	EaR: Remove unused CODE_PROBE handling encryption header flag version (#10020) Description Patch removes an unused CODE_PROBE checking that the flag version of the encryption header being read is valid. Since the flag version is determined by peeking into the std::variant index and only version 1 is currently supported, the check is converted to an ASSERT for now. Testing EncryptionUnitTests.toml EncryptionOps.toml BlobGranuleCorrectness/Clean.toml	2023-04-19 18:02:31 -07:00
Jingyu Zhou	a83295e3bd	Add backoff to lookupTenantImpl for commit_proxy_memory_limit_exceeded error	2023-04-19 16:55:46 -07:00
Jingyu Zhou	3bfd353a22	Add backoff to getKeyLocation_internal as well	2023-04-19 16:45:50 -07:00
Jingyu Zhou	ad1c7bba74	Make commit_proxy_memory_limit_exceeded error a database level backoff Since this error means the database is overloaded.	2023-04-19 15:52:16 -07:00
Josh Slocum	377dd0d754	Fixing path for non-backup (blob granule) blob url use cases to not prepend /data to path (#10011)	2023-04-19 14:18:08 -05:00
Nim Wijetunga	021bdccc32	propagate encryption errors properly (#10012) propagate encryption errors properly	2023-04-19 11:35:29 -07:00
Jingyu Zhou	b49625d45b	Properly handle proxy_memory_limit_exceeded error for GetKeyServerLocationsRequest Also add buggify to inject the error in simulation.	2023-04-19 09:35:09 -07:00
Jay Zhuang	8e7a5b5b22	Add the helper function to support reverse range read	2023-04-19 08:31:52 -07:00
Steve Atherton	edb071c6f2	Updated includes on newer files.	2023-04-18 22:51:27 -07:00
Steve Atherton	ebb3c8d698	Apply clang-format, debug output changes.	2023-04-18 22:21:27 -07:00
Steve Atherton	bc6b9cb83f	Added toJSON for DD Range configuration.	2023-04-18 22:21:27 -07:00
Steve Atherton	d210772825	Bug fix in reading KeyBackedRangeMap local snapshot where returned iterable range sets would iterate to nothing.	2023-04-18 22:21:27 -07:00
Steve Atherton	3877eb9019	Rewrote KeyBackedMap and KeyBackedSet getRange(KeySelectors..) to fix bugs. Removed TypedKeySelector::packBounded() because it's a bad idea. Added seek() to KeyBackedMap and KeyBackedSet which is a convenient way to find an item in the container which is <, <=, >, >= some query. Rewrote KeyBackedRangeMap on these new features to remove bugs. Simplified KeyBackedRangeMap::ValueType contract. Updated DDRangeConfig to get rid of the forceBoundary concept, replaced with the teamID concept.	2023-04-18 22:21:27 -07:00
Steve Atherton	38bdc8bcf4	Disable KeyBackedTypes debug by default.	2023-04-18 22:21:26 -07:00
Steve Atherton	53ee26d758	Changed KeyBackedTypes to an actor file. Added TypedKeySelectors for Map and Set classes and getRange() keySelector methods. Added debug macro for KeyBackedTypes. Rewrote KeyBackedRangeMap using keyselectors on KeyBackedMap.	2023-04-18 22:21:19 -07:00
Steve Atherton	b7e68bbf51	DDConfiguration class for modeling user specified key range configuration options. Added KeyBackedRangeMapSnapshot, some other supporting changes to KeyBackedTypes. Added invalidKey to give KeyBackedTypes a safe prefix to avoid accidental userspace modification from uninitialized accessors.	2023-04-18 22:09:18 -07:00
Steve Atherton	b1a17cce0d	Added KeyBackedRangeMap and SystemKey.	2023-04-18 22:03:41 -07:00
Steve Atherton	585ddb9ffc	Avoid unnecessary copy.	2023-04-18 21:48:55 -07:00
Steve Atherton	a4438d4542	Refactored and simplified KeyBackedTypes to base types KeyBackedProperty, KeyBackedMap, and KeyBackedSet. Object and BinaryValue variations are now redefined as customizations of the base types. Added WatchableTrigger for using a key to track a last updated version, supported by all KeyBackedTypes. KeyBackedStruct is renamed to KeyBackedClass, which contains a WatchableTrigger to pass to contained KeyBackedTypes.	2023-04-18 21:48:55 -07:00
Ankita Kejriwal	6b8e35fd19	Merge pull request #9964 from sfc-gh-xwang/fix/main/comments fix HealthMetrics update bug and correct comments about readLoadKSecond()	2023-04-18 16:42:42 -07:00
Yanqin Jin	09ab501cb7	Improve logging for test (#9966) In clearData() used by simulation test to clean up, we re-use the same debugID for multiple transactions, making it less clear when grepping for the debugID. This PR assigns a new debugID for each new transaction. Some of the commented-out tracing code is already obsolete; I updated several of them, but am not sure if the tracing should be enabled when enabling debug transaction. Finally, use different error messages sharing the same prefix for different call stacks. Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>	2023-04-18 15:25:06 -07:00
Jay Zhuang	8da1b875df	Update getReadThrough() to return Key directly, and a few comment updates	2023-04-18 13:13:59 -07:00
Jay Zhuang	8865be10dd	Add nextBeginKeySelector() to avoid key clone	2023-04-18 12:24:21 -07:00
Nim Wijetunga	22ba818133	Prevent Encryption Key Refresh for Non-Latest Keys (#9959) prevent refresh for old encryption keys	2023-04-18 09:43:24 -07:00
Jay Zhuang	b7da2ed16c	Fix RangeResult.readThrough misuse Fix `RangeResult.readThrough` misuses: 1. KeyValueStores do not need to set readThrough, as it will not be serialized and returned. Also, setting it to the last key of the result is not right; it should at least be the keyAfter of the last key; 2. Fix NativeAPI not setting `RangeResult.more` in a few places; 3. Avoid `tryGetRange()` setting `readThrough` when `more` is false, which was a workaround for item 2 above; 4. `tryGetRangeFromBlob()` doesn't set `more` but sets `readThrough` to indicate it is the end, which I think followed the same workaround. Fixed that. 5. `getRangeStream()` is going to set `more` to true and then let `readThrough` be its boundary. Also added readThrough getter/setter functions to validate its usage.	2023-04-17 21:37:51 -07:00
sfc-gh-tclinkenbeard	862c7e2ee8	Fix uninitialized memory issues in ratekeeper metrics	2023-04-17 15:27:19 -07:00
sfc-gh-tclinkenbeard	7076c050d2	Decrease MIN_TAG_*_PAGES_RATE defaults	2023-04-17 15:16:29 -07:00
Jingyu Zhou	a31f0e1641	Merge pull request #9862 from hfu94/rmain Revert matchIndex feature	2023-04-17 12:43:49 -07:00
Xiaoxi Wang	50e1f629fe	EligibilityCounter use int type; resolve review comments;	2023-04-17 12:24:17 -07:00
Xiaoxi Wang	bdab07cbc8	fix detailed HealthMetrics update bug add omitted return value	2023-04-17 12:24:14 -07:00
sfc-gh-tclinkenbeard	a7217055c8	Update default value of MIN_TAG_WRITE_PAGES_RATE to match default value of MIN_TAG_READ_PAGES_RATE	2023-04-17 11:58:50 -07:00
sfc-gh-tclinkenbeard	568518b6a3	Update semantics of MIN_TAG_WRITE_PAGES_RATE to reflect name	2023-04-17 11:58:50 -07:00
sfc-gh-tclinkenbeard	7e4d4d6527	Update semantics of MIN_TAG_READ_PAGES_RATE to reflect name	2023-04-17 11:58:50 -07:00
sfc-gh-tclinkenbeard	4be3c3e7ff	Fix initialization of SERVER_KNOBS->MIN_TAG_READ_PAGES_RATE	2023-04-17 11:58:50 -07:00
sfc-gh-tclinkenbeard	0f0eb7c2b6	Add GLOBAL_TAG_THROTTLING_TRACE_INTERVAL knob	2023-04-17 10:09:38 -07:00
hao fu	29161b2fda	Revert matchIndex feature It is not protocol compatible, revert it to avoid deployment issue. Will have a new PR to have the feature if moving forward.	2023-04-17 09:39:45 -07:00
Josh Slocum	370feaa3c9	refactoring and adding future compatibility to blob range metadata (#9955) * refactoring and adding future compatibility to blob range metadata * formatting	2023-04-13 15:06:50 -05:00
Evan Tschannen	12e507e06c	rename knobs	2023-04-13 09:40:37 -07:00
Ata E Husain Bohra	fe0a4df06a	EaR: Implement Key Check Value semantics (#9936) * EaR: Implement Key Check Value semantics Description Key Check Value (KCV) is a checksum of a cryptographic encryption key used to validate the encryption key's integrity. FDB Encryption at-rest relies on an external KMS to supply encryption keys. Patch proposes the following major changes: 1. Implement a Sha256-based KCV to protect against 'baseCipher' corruption in two possible scenarios: a) potential corruption external to FDB b) potential corruption within FDB processes. 2. Scheme persists the computed KCV token in the block encryption header, which then gets validated as part of header validation during decryption. 3. FDB Encryption key derivation uses the HMAC_SHA256 digest generation scheme, which allows a max of 64 bytes of 'cipher buffer'; patch adds the required check to ensure the 'baseCipher' length is within bounds. The underlying OpenSSL HMAC call ignores extra length if supplied; however, that weakens the security guarantees, hence it is disallowed. Testing devRunCorrectness - multiple 500K runs Valgrind & Asan - BlobCipherUnit, RESTKMSUnit, BlobGranuleCorrectness*, EncryptionOps, EncryptKeyProxyTest	2023-04-12 14:29:31 -07:00
Xiaoxi Wang	f7061debde	remove unused CPU knob; add comments for EligibilityCounter	2023-04-12 09:33:05 -07:00
Xiaoxi Wang	b0fe14aed5	getTeam based on EligiblityCount	2023-04-12 09:33:05 -07:00
Xiaoxi Wang	7ca44124d4	explain what pivot ratio means; fix the knob assertion	2023-04-12 09:33:05 -07:00
Xiaoxi Wang	5648f827a0	adjust CPU pivot knobs to hack simulation test	2023-04-12 09:33:05 -07:00
Xiaoxi Wang	31fd4bb272	consider consistent low CPU status for 5min	2023-04-12 09:33:05 -07:00
Xiaoxi Wang	490a7b534a	add getAverageCPU method; delete default value of GetTeamRequest arguments (solve conflicts)	2023-04-12 09:33:05 -07:00
Xiaoxi Wang	67b737b44d	add getStorageStats method to DatabaseContext	2023-04-12 09:33:05 -07:00
Xiaoxi Wang	1181b1dfab	use max(ops*empty_penalty, readbytes) as new read load (solve conflicts)	2023-04-12 09:33:05 -07:00
Zhe Wu	10a6f3d2d0	Merge pull request #9890 from halfprice/zhewu/log-router-gray-failure Gray failure detects disconnected remote log router and recover high DC lag	2023-04-07 16:25:11 -07:00
Ata E Husain Bohra	9d8e8d2f9e	Update fdbclient/BlobCipher.cpp Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>	2023-04-07 10:34:11 -07:00
Ata E Husain Bohra	e10259f461	Fix Asan reporting heap overflow Description Fix Asan reporting heap overflow Testing BlobCipherUnitTest with failing seed	2023-04-07 10:24:22 -07:00
Josh Slocum	d37b2b0a76	Adding BlobFailureInjection workload (#9833) * Adding BlobFailureInjection workload * fixing formatting	2023-04-06 15:10:36 -05:00
Josh Slocum	aef5130da2	adding system priority option to getDatabaseConfiguration, and several debugging improvements (#9864)	2023-04-06 15:08:40 -05:00
Hui Liu	396f89a3f4	Cleanup stale disk files for double recruitment of storage server (#9794)	2023-04-06 12:13:59 -07:00
Ata E Husain Bohra	ecc6d5a712	EaR: Fix BlobCipher cache handling for cipher needs refresh and/or expired (#9845) * EaR: Fix BlobCipher cache handling for cipher needs refresh and/or expired Description Patch fixes a BlobCipher cache bug related to the handling of cipherKeys that either 'needsRefresh' and/or 'expired'. Also, adds a unit test to cover the following use case: 1. Test refreshAt and expireAt properties of the cipherKey 2. Validate corresponding Counter value increments Testing Extend /blobCipher unit tests	2023-04-06 11:43:10 -07:00
Nim Wijetunga	c780d706d1	Remove EKP Interface from ServerDBInfo (#9909) Remove ekp interface from ServerDBInfo	2023-04-06 11:35:47 -07:00
Hui Liu	711e040627	RestoreConfig - use restoreRangeSet to replace restoreRanges (#9912)	2023-04-06 11:16:05 -07:00
Ata E Husain Bohra	625eb00d36	Relax EKP failure detection time for simulated runs (#9908) Description EKP failure detection time is chosen to be fairly low given EKP availability could impact cluster availability. However, in simulation runs, for various reasons, such low-latency failure detection could cause EKP recruitment to be done in a loop. Patch relaxes the failure detection time for simulation runs Testing	2023-04-05 18:12:25 -07:00
Nim Wijetunga	6e4e6ab2f4	Revert "Revert "Refactor GetEncryptCipherKeys (#9600)"" (#9903) * Revert "Revert "Refactor GetEncryptCipherKeys (#9600)" (#9708)"	2023-04-05 10:03:48 -07:00
A.J. Beamon	6023ee866a	Fix format	2023-04-04 16:10:56 -07:00
A.J. Beamon	974376b484	Use originalEntry rather than updatedTenantEntry to access the tenant name for the tenant index entry being erased.	2023-04-04 16:02:26 -07:00
A.J. Beamon	e27908556a	Update the tenant group index to be a tuple of (tenant group name, tenant name, tenant ID)	2023-04-04 14:46:15 -07:00
Zhe Wang	8102ac9a41	Serialize concurrent datamoves and their cleanups (#9421) * retry when concurrent dm cleanups happen * cleanup * fix conflict data move and cleanup * clean * make code look better * use background cleanup * fmt * unset StartMoveShardsFoundConflictingDataMove error * address comments	2023-04-04 11:38:24 -07:00
Zhe Wu	90f035d7f7	Merge pull request #9894 from halfprice/zhewu/disable-gc-generation Disable GC Tlog generation due to a bug exposed by the new logic	2023-04-04 10:05:37 -07:00
A.J. Beamon	9c786e6d1e	Merge pull request #9854 from sfc-gh-ajbeamon/metacluster-separate-project Metacluster refactoring	2023-04-04 09:43:41 -07:00
Jingyu Zhou	f17e466aa4	Remove printable() from TSS trace events (#9889)	2023-04-04 08:46:39 -07:00
Zhe Wu	3751641704	Disable GC Tlog generation due to a bug exposed by the new logic in the existing code	2023-04-04 08:12:03 -07:00
Zhe Wu	675663327d	Adding a knob to control log router triggered recovery	2023-04-03 22:31:22 -07:00
Nim Wijetunga	2867443f95	Recruit EKP without Enabling Encryption (#9885) Recruit EKP without needing to enable encryption	2023-04-03 20:05:21 -07:00
neethuhaneesha	1d6908d3b4	Changing single key deletions to delete based on number of deletes instead of byte limit. (#9867)	2023-04-03 13:55:58 -07:00
Ata E Husain Bohra	769226e5c0	EaR: Fix heap overflow in BlobCipherTest (#9877) Description Heap overflow was due to a recent upgrade in BlobCipherTest to use a variable-size 'baseCipher' buffer. Testing BlobCipherUnit test	2023-04-03 12:25:59 -07:00
Zhe Wu	7ab0ae8189	Merge pull request #9041 from halfprice/zhewu/gc-earlier-generations Track TLog generation recovery and remove no longer needed TLog generations before recovery reaches to `fully_recovered`	2023-04-03 10:16:19 -07:00
Zhe Wu	50a20946d1	Implement check if locality is already excluded in exclude locality command	2023-04-01 19:04:58 -07:00
Zhe Wu	d903d0cc8d	checkSafeExclusion should always create new ExclusionSafetyCheckRequest	2023-03-31 16:27:26 -07:00
Jingyu Zhou	92c182a842	Merge pull request #9858 from sfc-gh-nwijetunga/nim/fix-rest-connector-tls Enable TLS with HTTPS connection	2023-03-31 10:02:58 -07:00
Jingyu Zhou	3560117f71	Merge pull request #9846 from johscheuer/fix-exclusion-check Don't stop iterating over all storage processes in exclusion check	2023-03-31 09:59:00 -07:00
Ata E Husain Bohra	3f6fcada45	EaR - Misc fixes found using end-to-end integration testing (#9806) * EaR - Misc fixes found using end-to-end integration testing Description Major changes proposed include: 1. RESTClient filtering of trailing `/`(s) characters from input URI resource path 2. Avoid EKP exponential backoff given RESTClient supports exponential backoff retries for all retryable errors. 3. Memory allocation optimizations: 3.1. BaseCipher key management using Standalone semantics in KMSConnector interface endpoints 3.2. Optimize memcpy while looking up encryption keys in EKP endpoints 4. Avoid delay while starting EKP, given its criticality during cluster recovery. 5. Update BlobCipher to handle variable-size BaseCipher buffer 6. Improved logging Testing Setup: 1. External KMS server to supply encryption keys (in-house) 2. Create cluster with: cluster_aware & domain_aware config * Fix EncryptionOps test Description Testing * EaR - Misc fixes found using end-to-end integration testing Description Major changes: 1. Cleanup EKP driven exponential backoff files. 2. Update EKP not to use #1. Testing * EaR - Misc fixes found using end-to-end integration testing Description Address review comments Testing * Fix AES 256 key length value Description Testing * Address review comments Description Testing	2023-03-30 22:22:26 -07:00
Nim Wijetunga	4a68e6072a	enable tls with https connection	2023-03-30 21:45:03 -07:00
A.J. Beamon	807646675c	Refactor the metacluster project into smaller files, and reorganize the namespaces. Move some metacluster and tenant testing helpers into the metacluster project.	2023-03-30 16:20:09 -07:00
A.J. Beamon	e61748c7d5	Move metacluster into its own directory and static library	2023-03-30 16:07:49 -07:00
A.J. Beamon	3164cadc6f	Merge pull request #9851 from sfc-gh-ajbeamon/improve-boolean-param Allow boolean parameters to be nested inside of namespaces or classes	2023-03-30 16:06:03 -07:00
Ata E Husain Bohra	054358e63f	EaR: Update REST validation token-name separator from '#' -> '$' (#9848) Description RESTKms validation tokens are provided using foundationdb.conf, which uses '#' as the comment start character. Update the token-name separator to '$' instead of '#' Testing RESTKmsConnectorUnit.toml	2023-03-30 15:40:09 -07:00
A.J. Beamon	64b6a5d257	Allow boolean parameters to be nested inside of namespaces or classes	2023-03-30 15:09:59 -07:00
Vaidas Gasiunas	894f555e95	Test loop profiler using API Tester (#7174) * Api Tester: Specify knobs in the toml file; Test loop profiler * Gracefully stop the loop profiler thread * Protect loop profiler thread by mutex * Create loop profiler thread only if it is not stopped	2023-03-30 23:00:23 +02:00
Johannes M. Scheuermann	6a612dd85f	Don't stop iterating over all storage processes in exclusion check	2023-03-30 13:52:23 +02:00
Ata E Husain Bohra	0e720634f3	EaR: Allow RESTKmsConnector validation token newline char sanitization (#9831) Description Patch adds the ability to remove newline characters from KMSConnector validation tokens Testing RESTKmsConnectorUnit.toml	2023-03-29 16:56:46 -07:00
Xiaoxi Wang	0b65f87422	Update fdbclient/include/fdbclient/ServerKnobs.h comment about pivot space ratio	2023-03-28 14:51:00 -07:00
Xiaoxi Wang	d691e94af2	rename median ratio to pivot ratio; extract updatePivotAvailableSpaceRatio function; add related knobs	2023-03-28 14:51:00 -07:00
Josh Slocum	c7fcf1e8f2	handling multiple in-flight rollbacks for tss change feed stream comparison (#9816)	2023-03-28 09:39:38 -05:00
Zhe Wu	33736ff9af	Cleanup GcGeneration test and function documents	2023-03-27 12:31:44 -07:00
Zhe Wu	40dc54223c	Add GC generation test, and make all simulation test passing	2023-03-27 11:46:13 -07:00
Zhe Wu	2da86c37aa	Add a knob to guard track tlog recovery	2023-03-27 11:42:27 -07:00
Zhe Wu	78bef8110b	Track tlog recovery: tlog side implementation	2023-03-27 11:42:27 -07:00
Jay Zhuang	cb389bf026	Merge pull request #9610 from sfc-gh-jazhuang/encrypt_inplace Add inplace encryption and decryption API which avoids the memory allocation and memcpy.	2023-03-27 11:21:06 -07:00
Zhe Wu	4a7f7cdfce	Merge pull request #9803 from halfprice/zhewu/exclude-check-existance Do not update exclude/failed system metadata in excludeServers if the input list is already excluded/failed.	2023-03-27 09:38:04 -07:00
Zhe Wu	6e1bb08677	Update documentation	2023-03-24 15:29:47 -07:00
Zhe Wu	8211b5d097	Add a check in excludeServer function that if the exclusion list already exists, don't need to issue new writes.	2023-03-24 14:57:31 -07:00
Dan Adkins	02f0a44987	Avoid divide-by-zero in isKeyValueInSample. (#9799) In the pathological case that both key size and value size are 0, the probability of choosing that key-value pair is 0, and we divide by zero when computing the sampledSize. This change adds documentation to that function, which was quite difficult to understand. In addition, we add `probability` to the returned values, since one of the callers was backing it out from sampledSize and itself in danger of dividing by zero.	2023-03-24 16:05:26 -04:00
Xiaoge Su	88eeb5a526	Remove WolfSSL support in FoundationDB	2023-03-23 20:17:18 -07:00
Jay Zhuang	dba3555635	fix inplaceEncrypt() unittest issue	2023-03-23 15:26:22 -07:00
Jay Zhuang	d9b37e527c	Replace EncryptFinal() with CTX_reset()	2023-03-23 15:26:22 -07:00
Jay Zhuang	0efd403e59	Add inplace encryption/decryption API	2023-03-23 15:26:22 -07:00
Jingyu Zhou	18a0fa0263	Merge pull request #9468 from johscheuer/dont-block-exclusion-stateless-processes Don't block the exclusion of stateless processes by the free capacity check	2023-03-23 08:59:43 -07:00
Johannes M. Scheuermann	694263ae5f	Format code and update comment	2023-03-22 16:31:04 +01:00
neethuhaneesha	1e656210e1	Adding rocksdb bloom filter knobs.	2023-03-21 13:10:40 -07:00
neethuhaneesha	06657e1e0e	Rocksdb knob changes. (#9389)	2023-03-21 12:03:43 -07:00
He Liu	81c3cb8c50	Psm checkpoint (#9636) * Update NativeAPI getCheckpointForRange(). * Implemented checkpoint in SS. * clean up. * Disabled StorageServerCheckpointTest. * Serialized checkpoint creation and deletion. Simplified checkpoint GC, via deleting CheckpointMetaData::dir. * Fixed PhysicalShardMove test, where the fetchCheckpoint target range was mis-set. * Minor improvements on CheckpointMetaData and DataMoveMetaData. * fmt. * Optimized PhysicalShardMove test cleanup. * dismiss operation_obsolete, and throw actor_cancelled. * fmt. * Resolved comments.	2023-03-21 09:14:10 -07:00
A.J. Beamon	d324afe1bd	Merge pull request #9753 from sfc-gh-ajbeamon/fix-tenant-list-infinite-loop Do not list renaming tenants twice when listing tenant metadata	2023-03-20 16:05:56 -07:00
Evan Tschannen	a258d775c3	Merge pull request #9710 from sfc-gh-etschannen/feature-custom-dd Added the ability to manually create a shard and also increase its replication factor	2023-03-20 15:35:10 -07:00
Zhongxing Zhang	d2c1b3124e	add a field to show average data movement bytes in MovingData trace (#9591 ) * add a field to show average data movement bytes in MovingData trace * change variable type * Make changes to variable/function naming and add more comments * move rolling window struct to a new file; deal with corner case: dd startup, elements are full * format * simplify codes * modify file/struct name, universal to moving window * fix typo * add a new Knob to control MovingWindow::Deque size * make the general use of dequeSize limit * format * format, use space rather than tab	2023-03-20 14:33:32 -07:00
A.J. Beamon	6becf12ecd	Merge branch 'main' into fix-tenant-list-infinite-loop	2023-03-20 14:11:16 -07:00
A.J. Beamon	690a0a81ae	Reading a list of tenant metadata ordered by tenant name would occasionally get stuck in an infinite loop if the last tenant in a batch was being renamed and has the same ID as the tenant read in the previous batch. This change removes rename destinations from the list and avoids this problem.	2023-03-20 13:30:27 -07:00
Evan Tschannen	8e4eb83ba7	addressed review comments	2023-03-20 11:41:23 -07:00
Xiaoxi Wang	dc1eb1375b	add a missing healthy_perpetual_wiggle enum	2023-03-20 09:46:36 -07:00
Xiaoxi Wang	ef706e551f	Add more details into priority comments.	2023-03-20 09:46:36 -07:00
Xiaoxi Wang	e48fd10d8d	add perpetual wiggle to .team_tracker field	2023-03-20 09:46:36 -07:00
Xiaoxi Wang	f89a483f3d	add informal classification of priority	2023-03-20 09:46:36 -07:00
Xiaoxi Wang	c73577de7d	Add team priority comments and document.	2023-03-20 09:46:36 -07:00
Steve Atherton	216d0be2cf	Add processID, networkAddress, and locality to layer status JSON for Backup Agents. (#9736 ) * Add processID, networkAddress, and locality to layer status JSON for Backup Agents. * Backup/dr agent determines network address to report in Layer Status only once, when the status updater loop begins, since it is a blocking call which connects to the cluster. And lots of code cleanup.	2023-03-17 18:07:03 -07:00
A.J. Beamon	dc2bd78aa7	The consistency check should retry if it couldn't find all the commit proxies when getting key server locations	2023-03-17 12:00:47 -07:00
Evan Tschannen	73767501d4	Merge branch 'main' into feature-custom-dd # Conflicts: # fdbserver/tester.actor.cpp	2023-03-17 10:33:38 -07:00
Ata E Husain Bohra	c492f83bf4	EaR: Avoid appending `tls` to the URL (#9734 ) Description Patch proposes two changes: 1. Avoid appending tls as part of URI for secure connections 2. RefreshEKs recurring task can be skipped if there are no keys to be refreshed Testing EncryptionOps.toml EncryptKeyProxyTest.toml devRunCorrectness devRunCorrectnessFiltered 'Encrypt*'	2023-03-16 22:52:51 -07:00
He Liu	0f5e75b34b	Added newDataMoveId(). (#9647 ) * Added newDataMoveId(). * Added `ENABLE_DD_PHYSICAL_SHARD_MOVE` * fmt. * Replace `teamId` with `shardId`.	2023-03-16 18:06:06 -07:00
A.J. Beamon	484a414117	Increase the buggified tag measurement interval to reduce trace spam	2023-03-16 11:53:45 -07:00
Evan Tschannen	ac54962533	code cleanup	2023-03-16 09:47:21 -07:00
A.J. Beamon	6d5ffa11f9	Merge branch 'main' into fix-tenant-id-increment	2023-03-15 17:56:42 -07:00
Josh Slocum	b4eb665f1d	fixing copy constructor error and adding test for it (#9711 )	2023-03-15 15:33:16 -07:00
A.J. Beamon	3881f1ccc6	More carefully validate tenant increments to avoid overflows	2023-03-15 14:56:12 -07:00
Ata E Husain Bohra	dbcab0b1bd	Revert "Refactor GetEncryptCipherKeys (#9600 )" (#9708 ) This reverts commit `2702665e35`.	2023-03-15 12:10:08 -07:00
Evan Tschannen	aaf7b9b32b	Added the ability to manually create a shard and also increase its replication factor	2023-03-15 11:26:15 -07:00
Markus Pilman	aa09baadab	Merge pull request #9635 from sfc-gh-etschannen/fix-consistency-check Fix: the consistency check did not properly report failed tests	2023-03-15 11:21:44 -07:00
Evan Tschannen	6c1d02a14f	Merge pull request #9703 from sfc-gh-jslocum/bg_file_logical_size adding blob granule logical size	2023-03-15 09:59:57 -07:00
Evan Tschannen	2f96627d43	merge in main	2023-03-15 09:26:22 -07:00
Evan Tschannen	0a8435b742	Merge pull request #9702 from sfc-gh-jslocum/dbg_bg_ctest_timeout fixing 2 bugs related to high delta file waitCommitted latency	2023-03-15 08:52:35 -07:00
Johannes M. Scheuermann	b317928646	Only consider newly excluded processes	2023-03-15 15:36:15 +01:00
Josh Slocum	a5b4212990	adding blob granule logical size	2023-03-15 08:54:49 -05:00
Josh Slocum	52c0dc56cc	fixing 2 bugs related to high delta file waitCommitted latency	2023-03-15 08:39:42 -05:00
Evan Tschannen	c435e8336a	no message	2023-03-14 16:40:50 -07:00
He Liu	a0a3f4bff3	Fetch byte sample file (#9657 )	2023-03-14 16:24:08 -07:00
Zhe Wang	7d2766b692	Fix KeyRangeRef::isCovered() (#9675 ) * fix KeyRangeRef::isCovered() * reproduce bug * more unit test added * fmt	2023-03-14 12:41:18 -07:00
Jingyu Zhou	c5e9bdc6e4	Merge pull request #9684 from sfc-gh-ahusain/ahusain-fix-rest-test Fix RestUtilUnit test	2023-03-14 09:16:39 -07:00
A.J. Beamon	d39cda610a	Merge branch 'main' into metacluster-improvements # Conflicts: # fdbcli/TenantCommands.actor.cpp	2023-03-13 15:58:39 -07:00
Ata E Husain Bohra	aae8b131cb	Remove 'printf' Description Testing	2023-03-13 15:50:04 -07:00
Ata E Husain Bohra	a196f2fd75	Fix RestUtilUnit test Description Fix RestUtilUnit test Testing RESTUtilUnits.toml	2023-03-13 15:46:15 -07:00
A.J. Beamon	45056370b8	Merge branch 'main' into metacluster-improvements	2023-03-13 13:14:09 -07:00
A.J. Beamon	18cf523f49	Merge pull request #9660 from sfc-gh-ajbeamon/tenant-id-restore-safety Disallow repopulating a management cluster from a data cluster with matching tenant ID prefix	2023-03-13 13:12:30 -07:00
Ata E Husain Bohra	ea796eb3ec	EaR: REST kms misc fixes (#9664 ) * EaR: REST kms misc fixes Description Patch addresses following issues: 1. Fix "return connection" routine, it fixes a regression introduced by an earlier fix. 2. Update RESTConnectionPool::connectionPoolMap to an "unordered_map" for O(1) lookups 3. Improve logging 4. Make RESTUrl parsing handle extra '/' for 'resource' Testing Standalone fdbserver connecting to external KMS and database create	2023-03-13 13:11:05 -07:00
A.J. Beamon	cbc330697c	Disallow repopulating a management cluster from a data cluster with matching tenant ID prefix unless forced. Remember the largest used tenant ID on the data cluster and use it to update the management cluster tenant ID when force repopulating the same ID.	2023-03-10 15:36:37 -08:00
Jingyu Zhou	b755e668bf	Merge pull request #9601 from jzhou77/fix-head Allow log router to detect slow peeks and to switch DC for peeking	2023-03-09 15:34:24 -08:00
Ata E Husain Bohra	b227007ab0	EaR: Fix knob name (#9630 ) Description Knob 'REST_KMS_ALLOW_NOT_SECURE_CONNECTION' got renamed in a recent patch; however, there are other places that need an update too. Testing devRunCorrectness - 100K RESTUtilUnits.toml RESTKmsConnectorUnits.toml	2023-03-08 17:37:39 -08:00
Nim Wijetunga	2702665e35	Refactor GetEncryptCipherKeys (#9600 ) * inital commit * address pr comments	2023-03-08 17:05:03 -08:00
Evan Tschannen	4a17ed363a	Fix: the consistency check did not properly report failed tests	2023-03-08 16:56:23 -08:00
Nim Wijetunga	218ed4519f	Strengthen Snapshot Backup/Restore Asserts (#9552 ) strengthen backup/restore asserts for encryption	2023-03-08 15:24:02 -08:00
Ata E Husain Bohra	d0eec9d0ba	EaR: REST KMS fixes - encryption integration testing (#9598 ) * EaR: REST KMS fixes - encryption integration testing Description Major changes: 1. Multiple fixes observed while performing integration end-to-end testing for Encryption at-rest feature. 2. Improve REST module logging. Introduced FLOW_KNOBS->REST_LOG_LEVEL to have more granular control of feature logging disconnected from the cluster log level. Testing Integration testbed: 1. Run fdbserver standalone 2. Run external KMS http-server to serve encryption key fetch requests	2023-03-08 09:49:43 -08:00
Hui Liu	c43f8b3fdc	Refactor - introduce BlobRestoreController for APIs to manage restore state (#9616 )	2023-03-08 07:50:30 -08:00
Johannes M. Scheuermann	c6eca3f398	Format code	2023-03-08 08:33:19 +01:00
Johannes M. Scheuermann	1550f3c596	Make use of precomputed exclude check	2023-03-08 08:19:42 +01:00
Johannes M. Scheuermann	bae627f016	Fix syntax	2023-03-08 08:19:42 +01:00
Johannes M. Scheuermann	db8c60c80f	Don't block the exclusion of stateless processes by the free capacity check	2023-03-08 08:19:41 +01:00
A.J. Beamon	de5f2c0fee	Disallow cluster names that start with the `\xff` byte	2023-03-07 11:46:34 -08:00
Steve Atherton	5ff0bc3f87	Merge pull request #9576 from sfc-gh-satherton/storage-configure-refactor Storage and log engine configuration support / refactor a few things.	2023-03-07 02:10:14 -08:00
Steve Atherton	f6747adf89	Move implementations to cpp file.	2023-03-06 18:43:26 -08:00
Jingyu Zhou	0259a243ae	Switch DC if log router peek becomes stuck. Try a different DC if this happens.	2023-03-06 17:41:56 -08:00
Ata E Husain Bohra	a45de70003	EaR: RESTClient HTTP compliance, fix json request content type (#9544 ) * EaR: RESTClient HTTP compliance, fix json request content type Description diff-1: Address review comments RESTClient is responsible for handling FDB <-> KMS communication for Encryption and other usecases. By design, it only supports "secure connection" i.e. "https"; however, it seems there is a need to expand the module to support "http" connections, for instance in test and dev deployments. However, given RESTClient gets involved in handling highly sensitive contents such as: plaintext "encryption cipher from a KMS", the feature is guarded using CLIENT_KNOB->REST_KMS_ENABLE_NOT_SECURE_CONNECTION which is settable using FDBServer command line argument "--kms-rest-enable_not_secure_connection" (boolean) Testing Deployed a standalone fdbserver and communicate with a simple "http" server	2023-03-06 16:06:03 -08:00
Jingyu Zhou	0d8bde9dcd	Merge pull request #9505 from jzhou77/fix Support multiple key prefix filters for fdbdecode	2023-03-06 15:57:03 -08:00
Chaoguang Lin	7273723a43	Add the hotrange fdbcli command (#9570 ) * Add the hotrange fdbcli command * Remove the unnecessary state * Add the doc about the hotrange command	2023-03-06 14:46:52 -08:00
Jingyu Zhou	7a0b3c05b9	Merge pull request #9540 from sfc-gh-huliu/timestamp Report restore phase start time and eta	2023-03-06 14:06:23 -08:00
A.J. Beamon	85c3cf702c	Merge pull request #9584 from sfc-gh-ajbeamon/fix-metacluster-create-error-msg Fix metacluster create error message	2023-03-06 10:30:03 -08:00
A.J. Beamon	ea907f10f5	Print the tenant mode string rather than integer value when reporting that we couldn't create a metacluster	2023-03-06 09:25:50 -08:00
Josh Slocum	e1b620135b	Merge branch 'main' into bg_latency_fixes	2023-03-06 09:23:11 -06:00
Steve Atherton	50d567b5a5	Refactored some parts of database configuration to support log_engine=<name> and storage_engine=<name> and generate these when converting a DatabaseConfig JSON object to a `configure` command. Refactored `fileconfigure` and simulation setup to use the same JSON -> configure function as the same code was copy/pasted to both places but only one has been kept up to date with new features. Renamed Redwood to `ssd-redwood-1` canonically but the experimental name is still supported for backward compatibility.	2023-03-04 20:52:31 -08:00
Jingyu Zhou	df53bcd844	Merge branch 'main' of https://github.com/apple/foundationdb into fix	2023-03-03 20:32:29 -08:00
Hui Liu	b2d497a3b2	Report restore phase start timestamp	2023-03-03 18:09:51 -08:00
Jingyu Zhou	8847e70be0	Merge pull request #9306 from kakaiu/add-physical-shard-meta-data-to-checkpoint Dump checkpoint metadata to sst file	2023-03-03 14:45:50 -08:00
Jingyu Zhou	ca00c9485b	Merge branch 'main' of https://github.com/apple/foundationdb into fix	2023-03-03 11:12:40 -08:00
Xiaoxi Wang	b13b586b71	Merge pull request #9547 from sfc-gh-xwang/feature/main/minReadBand add knob for min read rebalance shard bandwidth	2023-03-03 09:37:23 -08:00
Jingyu Zhou	ee5154f478	Refactor decoder to read file as a whole once To reduce the number of network requests.	2023-03-03 09:32:12 -08:00
Zhe Wang	1766f412bb	address comments	2023-03-03 09:04:26 -08:00
Zhe Wang	e8aced0961	add sampled sample bytes to sst file	2023-03-03 09:04:26 -08:00
Zhe Wang	83a0547281	address comments and add test	2023-03-03 09:04:26 -08:00
Zhe Wang	338e0971a9	address comments	2023-03-03 09:04:25 -08:00
Zhe Wang	e283e067d3	clean and address comments	2023-03-03 09:04:25 -08:00
Zhe Wang	2e68b44579	dump-checkpoint-meta-data-to-sstfile	2023-03-03 09:04:25 -08:00
Josh Slocum	c6a21245d8	also using other GRVs BW already gets for committed version checking	2023-03-02 17:14:17 -06:00
Josh Slocum	57b120e702	Adding grv history so delta files can wait for less time to determine that they're committed	2023-03-02 17:14:17 -06:00
Xiaoxi Wang	26ffcf6b4a	add knob for min read rebalance shard bandwidth	2023-03-02 11:26:27 -08:00
Xiaoxi Wang	010d5590e3	Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/hotRangeDetect	2023-03-02 10:07:17 -08:00
Jingyu Zhou	de89d2cca1	Merge pull request #9521 from sfc-gh-ajbeamon/fix-metacluster-issues Fix some metacluster issues	2023-03-02 10:07:11 -08:00
Jingyu Zhou	ad778cbe5e	Merge branch 'main' of https://github.com/apple/foundationdb into fix	2023-03-02 09:56:30 -08:00
Josh Slocum	0c9397ef22	BW metric improvements for reads and file blocking	2023-03-02 10:57:51 -06:00
Josh Slocum	3861a2249b	increasing bw request timeout	2023-03-02 09:36:47 -06:00
Josh Slocum	40ed365303	fixing checkBlobSubRange to not increase version every retry	2023-03-02 09:34:55 -06:00
Xiaoxi Wang	2d78b126f6	rename splitCount to chunkCount	2023-03-01 21:51:51 -08:00
Xiaoxi Wang	179f0ba71c	new version of getReadHotRanges	2023-03-01 15:55:29 -08:00
A.J. Beamon	533f83b05e	Fix a few more issues in metacluster code and tests: 1. Some additional idempotency problems in metacluster tests 2. An assertion that checked that a rename had expected values could fail during concurrent restores, but it would only happen if the transaction itself would fail to commit 3. Tweak the parameters of the MetaclusterRecovery test to try to avoid rare cases of logging too many trace events	2023-03-01 15:31:36 -08:00
A.J. Beamon	544890a6cd	Merge branch 'main' into transaction-debug-logging	2023-03-01 10:09:17 -08:00
A.J. Beamon	2898a95c81	Fix two metacluster issues: 1. When retrying the transaction to register a restoring cluster, don't choose a new ID if the current ID matches the one recorded for the restoring cluster 2. A metacluster test was incorrectly handling the case where a transaction was retried with unknown result and had committed successfully	2023-02-28 15:40:04 -08:00
Junhyun Shim	6b26f5a6da	Fix transaction option consistency in TagThrottleInfo getter (#9513 ) * Fix transaction option consistency in TagThrottleInfo getter Subroutine of getter actor function for throttled and recommended tags was, upon retry, resetting the transaction object which the caller also uses, resetting the transaction option and causing a key_outside_legal_range by caller Also, allowing a subroutine to conditionally, non-trivially modify the passed object (i.e. transaction reset) is a risky pattern. Fix: confine subroutine's responsibility to "attempting to" fetch and parse "autoThrottlingEnabled" key. Let the calling function reset the object if needed. * Apply Clang format	2023-02-28 23:47:26 +01:00
A.J. Beamon	310fc2ff4e	Merge branch 'main' into transaction-debug-logging	2023-02-28 14:18:51 -08:00
Xiaoxi Wang	26237a291d	update read range reply field	2023-02-28 13:18:57 -08:00
Jingyu Zhou	6c955080e9	Merge pull request #9207 from sfc-gh-jslocum/disable_feed_coalesce disabling feed coalesce for now	2023-02-28 12:32:01 -08:00
Jingyu Zhou	a350a929b9	Merge pull request #9494 from sfc-gh-jslocum/bg_cp_improvements addressing review comments and fixmes in bg commit proxy code	2023-02-28 12:30:58 -08:00
Xiaoxi Wang	8cb2a1553a	add read ops sampler	2023-02-28 12:03:42 -08:00
Josh Slocum	f4308a0f6c	Merge branch 'main' into disable_feed_coalesce	2023-02-28 13:57:21 -06:00
A.J. Beamon	87ac857aeb	Make debug logging functions pure virtual on the transaction interfaces. Rename the function on TraceEvent to be more generic.	2023-02-28 11:11:06 -08:00
Markus Pilman	5bebb5b4aa	Merge pull request #9492 from sfc-gh-vgasiunas/vgasiunas-api-version-defs Centralize definition of API Version for Java, Python and C API	2023-02-28 12:04:02 -07:00
Josh Slocum	301f2fd201	disabling feed coalesce for now	2023-02-28 12:07:12 -06:00
Lukas Joswiak	47fc53ed6e	Adds more detailed mutation logging to commit proxy The commit proxy writes a `ProxyMetrics` trace every 5 seconds. This event contains a lot of useful information, such as the number of commit batches that arrived and exited, the number of mutations processed, the number of bytes those mutations made up, etc. However, it is difficult to tell what the workload pattern looks like within these 5 second intervals when the metrics are being calculated. This PR adds a new trace, `ProxyDetailedMetrics`, which logs itself every 100ms. It currently only writes the number of mutations and the number of mutation bytes that arrived during the 100ms time period. But it should be easy to add more metrics in the future. It's possible this increased logging could cause issues. Based off a simulation run of the `WriteDuringRead` test, I got the following results: ``` $ rg ProxyDetailedMetrics trace.json | wc -l 6877 $ rg "Roles\": \".CP.\"" trace.json | wc -l 11402 $ wc -l trace.json 96147 trace.json ``` So on processes running as a commit proxy, this approximately doubled the number of lines logged. But relative to the cluster overall, it only added about 5% overhead. If we want to reduce this number, one possibility would be to not write a trace if all the values being written are 0. I'm not sure if this would help much in production, but in simulation the large majority of the traces (99%+) consist of zero values.	2023-02-28 09:48:39 -08:00
A.J. Beamon	cb66a76d80	Merge branch 'main' into transaction-debug-logging	2023-02-28 09:21:30 -08:00
Jingyu Zhou	4ea70b1f59	Merge pull request #9512 from sfc-gh-mpilman/bugfixes/remove-lockid-from-txnstatestore Don't store lockid in txnStateStore	2023-02-28 09:16:45 -08:00
A.J. Beamon	0abb33a9a5	Add the ability to print messages or log trace events based on a transaction's result	2023-02-28 09:06:54 -08:00
Ata E Husain Bohra	2db1da26d9	EaR: Update ApiWorkload to validate encryption at-rest guarantees (#9466 ) * EaR: Update ApiWorkload to validate encryption at-rest guarantees Description FDB encryption data at-rest guarantees if cluster is configured with feature enabled, all data written to persistent disks shall be "encrypted". Given FDB maintains multiple persistent storages during lifecycle of the data, the patch proposes a scheme to validate the invariant via "simulation testing" Patch proposes updating ApiCorrectness workload to do the following: 1. Client supplied params and/randomly enable the validation feature. 2. Validation when enabled, allows injecting a known "marker string" to workload generated Key and Value data patterns. 3. On shutdown, if the validation is enabled, all test files are scanned for the known "marker" pattern. Simulation tests are already capable of doing the following: 1. Randomly select TenantMode (disabled/optional/required) 2. Randomly select EncryptionAtRestMode (cluster_aware/domain_aware) Hence, the updates test all possible combinations are validated. Also, 'defaultTenant' is present to cover 'domain_aware' encryption use cases. Testing devRunCorrectness devRetryCorrectness - ApiCorrectness & EncryptedBackupCorrectness	2023-02-27 21:40:46 -08:00
Markus Pilman	871a9676ea	Don't store lock id in txnStateStore	2023-02-27 21:25:42 -07:00
Markus Pilman	20874d8575	Merge pull request #9502 from sfc-gh-ajbeamon/metacluster-tenant-lock-support Metacluster tenant lock support	2023-02-27 21:19:03 -07:00
Jingyu Zhou	5ac526a3e5	Merge pull request #9474 from sfc-gh-xwang/feature/main/readaware enable read-aware DD by default and write release notes/doc	2023-02-27 20:04:04 -08:00
Jingyu Zhou	1313a7fa25	Use KeyspaceSnapshotFile to filter range files	2023-02-27 19:41:08 -08:00
Jingyu Zhou	842d485862	Merge pull request #9402 from yao-xiao-github/main Add shard consistency validation.	2023-02-27 17:05:30 -08:00
Jingyu Zhou	f414cd0ed8	Merge pull request #9486 from sfc-gh-ajbeamon/metacluster-management-concurrency-restore-support Add restore to the metacluster management concurrency workload	2023-02-27 17:05:05 -08:00
A.J. Beamon	469e77158f	Add metacluster support for tenant locking	2023-02-27 16:53:13 -08:00
Jingyu Zhou	29a406948a	Merge pull request #9370 from sfc-gh-mpilman/features/tenant-lock-fdbcli fdbcli commands for tenant lock	2023-02-27 16:18:51 -08:00
Xiaoxi Wang	eeade33c30	Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/readaware	2023-02-27 14:18:44 -08:00
Russell Sears	bcc05b1058	Improve support for prebuilt boost	2023-02-27 15:38:58 -06:00
Zhe Wu	7304e5cad0	Merge pull request #9485 from halfprice/zhewu/backup-size-estimate Enhance fdbbackup query command to estimate data processing from a specific snapshot to a target version	2023-02-27 13:35:54 -08:00
Vishesh Yadav	dd0ea8b0cf	Clang-format	2023-02-27 13:10:19 -08:00
Vishesh Yadav	3e6e31ad0b	Use the RangeMapFilters	2023-02-27 13:08:55 -08:00
Jingyu Zhou	dd4bc82862	Refactor code	2023-02-27 13:06:01 -08:00
Jingyu Zhou	46fce2710e	Use RangeMap for backup agent filtering This is more efficient than going through ranges one by one.	2023-02-27 12:21:52 -08:00
A.J. Beamon	a44c4c2e2e	Merge pull request #9478 from sfc-gh-ajbeamon/assert-comparison-all-types Allow performing assert comparisons with any traceable type	2023-02-27 11:03:27 -08:00
Xiaoxi Wang	da7d441436	Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/readaware	2023-02-27 09:09:35 -08:00
Josh Slocum	716a9c3817	addressing review comments and fixmes in bg commit proxy code	2023-02-27 10:51:47 -06:00
Ankita Kejriwal	f7108958bf	Merge pull request #9449 from sfc-gh-akejriwal/exclusion Improve space estimation in checkExclusion()	2023-02-27 08:31:52 -08:00
Markus Pilman	a0e347c7ba	Merge remote-tracking branch 'origin/main' into features/tenant-lock-fdbcli	2023-02-27 09:09:20 -07:00
Vaidas Gasiunas	ba726fac87	Replace hardcoded API version checks for 720 and 730	2023-02-27 16:18:01 +01:00
Steve Atherton	674c105050	Merge pull request #9473 from sfc-gh-etschannen/feature-change-feed-lock Replace fetchKeysParallelismFullLock to speed up fetch keys in idle clusters	2023-02-26 23:18:43 -08:00
Zhe Wu	ffa3467098	Explicitly use the min and max restorable versions from the backup description in the query command instead of going through snapshots	2023-02-26 12:17:07 -08:00
A.J. Beamon	364bf062cb	A few fixes to prevent a simultaneous restore and removal from both making progress: 1. Change the cluster ID each time the restore registers the cluster 2. Handle commit unknown result in the removal purge 3. Delete the active restore ID when a removal is first recorded rather than at the end of the removal 4. Delete any existing active restore IDs when registering a cluster in the management cluster	2023-02-25 22:52:02 -08:00
A.J. Beamon	e6a4b0489e	Run a couple restore transactions using the correct runRestoreManagamentTransaction function in order to verify that the restore is still valid	2023-02-25 19:59:59 -08:00
A.J. Beamon	8e231f8809	During a dry-run restore, it is possible that the tenants being restored are modified concurrently. Handle this case with an output message.	2023-02-25 19:59:29 -08:00
A.J. Beamon	b869f5a6ac	Don't validate the configuration sequence number of a tenant during tenant reconciliation until we've started a transaction and confirmed the restore is still valid	2023-02-25 19:58:41 -08:00
A.J. Beamon	b1111ce112	When renaming a tenant during a restore, use the rename destination for the tenant if it has one	2023-02-25 19:57:59 -08:00
A.J. Beamon	3aca47e600	When reconciling tenants during a restore, if a tenant is in the removing state on the management cluster, remove it from the data cluster	2023-02-25 19:54:30 -08:00
A.J. Beamon	040d44927b	Store the rename destination for tenant movements in the cluster tenant index with an ID of -1. Use this to filter out tenant aliases when modifying the tenant count during a tenant purge.	2023-02-25 19:51:21 -08:00
Zhe Wu	8a88df0e91	Query backup size from a specific snapshot	2023-02-25 17:38:27 -08:00
Zhe Wu	a94dd3a430	Fix fdbbackup query returning earliest version	2023-02-25 16:44:45 -08:00
A.J. Beamon	8c3ee768a2	Add an option to allow exceeding the tenant group capacity limit when changing tenant configuration	2023-02-24 21:01:36 -08:00
A.J. Beamon	1c71056e26	Merge pull request #9479 from sfc-gh-nwijetunga/nim/enforce-metacluster-tenant-mode Enforce Disabled Tenant Mode in Metacluster	2023-02-24 19:27:57 -08:00
Ankita Kejriwal	99a1fb52e3	Prevent division by 0	2023-02-24 18:36:55 -08:00
Jingyu Zhou	6b121de6a6	Merge pull request #9464 from jzhou77/fix Add exclude to fdbcli's configure command	2023-02-24 16:31:02 -08:00
Nim Wijetunga	c1087187d1	format change	2023-02-24 15:43:40 -08:00
Nim Wijetunga	c8b7cff10c	fix api test	2023-02-24 15:26:59 -08:00
Josh Slocum	6187811f71	Reworking getBlobGranuleRanges to also use commit proxy rpc for authz, and adding test (#9470 )	2023-02-24 17:15:32 -06:00
Nim Wijetunga	eca98afcb0	metacluster check tenant mode	2023-02-24 13:59:54 -08:00
Evan Tschannen	8872e5a462	Merge pull request #9347 from sfc-gh-etschannen/feature-change-feed-cache added a disk to blob workers	2023-02-24 13:59:03 -08:00
A.J. Beamon	4a38bb4c3f	Allow performing assert comparisons (e.g. ASSERT_EQ) with any traceable type	2023-02-24 12:53:01 -08:00
Xiaoxi Wang	998a5b7c0e	enable read-aware DD by default and write release notes/doc	2023-02-24 11:11:25 -08:00
Evan Tschannen	f3673d808b	Replaced the fetchKeysParallelismFullLock with a lock specifically for change feeds to avoid blocking fetches on idle clusters	2023-02-24 10:59:35 -08:00
Jingyu Zhou	9a257a60a4	Address review comments	2023-02-24 10:47:32 -08:00
Markus Pilman	6c15506c36	Fixed tests	2023-02-24 11:32:37 -07:00
A.J. Beamon	03fbc59bb1	Merge pull request #9461 from sfc-gh-ajbeamon/metacluster-concurrent-restore-testing Metacluster concurrent restore testing	2023-02-24 09:13:51 -08:00
Markus Pilman	ee9d511d16	Merge pull request #9463 from sfc-gh-mpilman/buildcop/2023-02-23/bugfixes/arm-awssdk Fix build issue with awssdk_target	2023-02-24 09:20:08 -07:00
Nim Wijetunga	29819b0645	Change Feed Bug Fix + Encryption Asserts (#9457 ) * add encryption asserts * modify function name * address pr comments * address pr comments * Trigger Build	2023-02-23 19:33:25 -08:00
A.J. Beamon	2b25cfef8b	Merge branch 'main' into metacluster-concurrent-restore-testing	2023-02-23 16:06:47 -08:00
Jon Fu	33f8e90f9f	Split tenant group metadata (#9446 ) * initial commit to split tenant group metadata * attempt to fix merge errors * fix compile errors and adjust existing tests * fix infinite loop and extra ACTOR tag * direct assignment instead of store * direct assign instead of store (missed a few)	2023-02-23 18:11:49 -05:00
Jingyu Zhou	1f1dc5e768	Allow a comma separated list of excluded addresses	2023-02-23 14:29:08 -08:00
Jingyu Zhou	6ac8720364	Add exclude to fdbcli's configure command Right now this only allows one server address being excluded. This is useful when the database is unavailable but we want the recruitment to skip some particular processes. Manually tested the concept works with a loopback cluster.	2023-02-23 14:28:20 -08:00
Markus Pilman	c1f80fe471	Merge remote-tracking branch 'origin/main' into features/tenant-lock-fdbcli	2023-02-23 15:16:14 -07:00
Jingyu Zhou	792950dbdc	Merge pull request #9434 from sfc-gh-huliu/splitmetrics Implement SplitMetric pagination in blob migrator	2023-02-23 14:10:27 -08:00
Markus Pilman	1862e65415	Fix build issue with awssdk_target	2023-02-23 15:05:17 -07:00
Markus Pilman	8759fd8f12	Fix refactoring mistake	2023-02-23 14:41:27 -07:00
A.J. Beamon	54955d54f2	Don't allow repopulating from a management cluster if there is another ID registered for the same cluster. Instead, the cluster must be unregistered first before repopulating from it. Also improves a trace event.	2023-02-23 13:28:10 -08:00
A.J. Beamon	c2d28377af	Set the restore ID in the data cluster after marking the cluster restoring in the management cluster	2023-02-23 13:28:10 -08:00

7281 Commits