* Proactively clean up idempotency ids for successful commits
This change also includes some minor changes from my branch that
implements an idempotency id cleaner, which I'd like to get merged
sooner rather than later.
- Adding a timestamp to idempotency values
- Making IdempotencyId an actor file
- Adding commit_unknown_result_fatal
- Checking idempotencyIdsExpiredVersion in determineCommitStatus (see
  the sketch after this list)
- Some testing quality-of-life changes
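As a rough sketch of the direction (the types and names below are
illustrative, not the actual FoundationDB code):

```cpp
// Sketch only: an idempotency value now carries the commit timestamp, so a
// background cleaner can expire old entries. Names are illustrative.
struct IdempotencyIdValue {
	Version commitVersion; // version the transaction committed at
	double commitTime;     // wall-clock commit time, used for expiration
};

// In determineCommitStatus (sketch): if the id's commit version is at or
// below the expired watermark, the true outcome can no longer be determined.
if (idCommitVersion <= idempotencyIdsExpiredVersion) {
	throw commit_unknown_result_fatal(); // the new error added by this change
}
```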
* Factor out decodeIdempotencyKey logic
* Fix formatting
* Update flow/include/flow/error_definitions.h
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
* Use KeyBackedObjectProperty for idempotencyIdsExpiredVersion
* Add IDEMPOTENCY_ID_IN_MEMORY_LIFETIME knob
* Rename ExpireIdempotencyKeyValuePairRequest
Also add a code probe for the case where an ExpireIdempotencyIdRequest is
received before the count is known, and add an assert
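A hedged sketch of the probe-and-assert pattern (the surrounding
container and request fields are illustrative):

```cpp
// Sketch: handling an ExpireIdempotencyIdRequest whose commit-version count
// is not yet known. CODE_PROBE marks the path so simulation can confirm it
// is exercised; the names below are illustrative.
auto it = expectedCounts.find(req.commitVersion);
if (it == expectedCounts.end()) {
	CODE_PROBE(true, "ExpireIdempotencyIdRequest received before count is known");
	pending[req.commitVersion].push_back(req); // defer until the count arrives
} else {
	ASSERT(it->second > 0); // a known count must still have ids outstanding
}
```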
* Fix formatting and add TODO for nwijetunga
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
The logic to determine the validity of a process joining a cluster now
belongs on the worker and the cluster controller. It is no longer
restricted to tlogs and storage servers, but instead applies to all
processes (even stateless ones).
The cluster ID is now stored in the database instead of in the
txnStateStore. The cluster controller will read it on boot and send it
to all processes to persist.
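A minimal sketch of that flow, assuming a dedicated system key for the
cluster ID (the key name and helper are illustrative):

```cpp
// Sketch: on boot, the cluster controller reads the cluster ID from the
// database, then includes it in its replies to registering workers, which
// persist it locally. "\xff/clusterId" is illustrative, not the real key.
ACTOR Future<UID> readClusterId(Database db) {
	state Transaction tr(db);
	loop {
		try {
			Optional<Value> v = wait(tr.get("\xff/clusterId"_sr));
			if (v.present()) {
				return BinaryReader::fromStringRef<UID>(v.get(), Unversioned());
			}
			return UID(); // unset: caller generates one and writes it back
		} catch (Error& e) {
			wait(tr.onError(e));
		}
	}
}
```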
The simulator tracks only active processes. Rebooted or killed processes
are removed from the list of processes and are only added back when the
process reboots and starts up again. This causes a problem for the
`RebootProcessAndSwitch` kill type, which wants to simultaneously reboot
all machines in a cluster and change their cluster file. If a machine is
in the middle of a reboot at that moment, it will miss the
`RebootProcessAndSwitch` command.
The fix is to add a check when a process is started in simulation. If
the process has had its cluster file changed, and the cluster is in a
state where all processes should have had their cluster files reverted
to the original value, the simulator will now send a
`RebootProcessAndSwitch` signal right when the process starts. This
causes an extra reboot, but it correctly switches the process back to
its original, correct cluster file, allowing all clusters to fully
recover.
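In sketch form, with the simulator bookkeeping fields named
illustratively:

```cpp
// Sketch: when simulation starts a process, detect that it missed the
// cluster-wide switch-back while it was down and reissue the kill signal.
if (process->switchedCluster && g_simulator->switchedBackToOriginalCluster) {
	// The process was down during the RebootProcessAndSwitch broadcast;
	// send it again so the process reverts to its original cluster file.
	g_simulator->rebootProcess(process, ISimulator::KillType::RebootProcessAndSwitch);
}
```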
Note that the above issue should only affect simulation, due to how the
simulator tracks processes and handles kill signals.
This commit also adds a field to each process struct indicating whether
the process is running in a DR cluster during the simulation run. This
is needed because simulation does not differentiate between processes in
different clusters (other than by IP), and some processes need to switch
clusters while others simply need to be rebooted.
Have these processes enter a "zombie" state in which they cancel all
their actors and then wait forever, refusing to do any additional work
until they are manually handled by the operator.
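A minimal sketch of the zombie state, assuming the worker keeps its
outstanding actors in a vector (illustrative):

```cpp
// Sketch: a process that learns it no longer belongs to the cluster cancels
// its work and parks forever, awaiting operator intervention.
ACTOR Future<Void> becomeZombie(std::vector<Future<Void>>* actors) {
	for (auto& a : *actors) {
		a.cancel(); // cancel every outstanding actor
	}
	actors->clear();
	TraceEvent(SevWarnAlways, "ProcessEnteredZombieState").log();
	wait(Never()); // refuse all further work until the operator steps in
	return Void();
}
```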
Currently, there is a cyclic reference:
DatabaseContext -> WatchMetadata -> watchStorageServerResp ->
DatabaseContext
If a watch is created in the DatabaseContext, then even if the
corresponding wait ACTOR is cancelled, the WatchMetadata still holds a
reference to the watchStorageServerResp ACTOR, which in turn holds a
reference to the DatabaseContext.
In this situation, any DatabaseContext that holds a watch will not be
automatically destroyed, since its reference count never drops to 0
until the watched value changes. Every time the cluster recovers,
several watches are created, and when the cluster restarts, a
DatabaseContext that is no longer in use cannot be destroyed because of
these watches.
With this patch, each wait on a watch is counted. Whether the watch is
triggered or cancelled, the corresponding count is decremented. When a
watch is no longer being waited on, it is cancelled, effectively
reducing the reference count of the DatabaseContext. This should fix
the issue described above.
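A sketch of the counting scheme (member names are illustrative):

```cpp
// Sketch: WatchMetadata counts outstanding waits. When the last waiter goes
// away, the watchStorageServerResp actor is cancelled, which releases its
// reference to the DatabaseContext and lets the context be destroyed.
struct WatchMetadata : ReferenceCounted<WatchMetadata> {
	int waiters = 0;            // outstanding waits on this watch
	Future<Void> watchFutureSS; // the watchStorageServerResp actor
};

void releaseWatch(DatabaseContext* cx, Reference<WatchMetadata> metadata, KeyRef key) {
	if (--metadata->waiters == 0) {
		metadata->watchFutureSS.cancel(); // drops the actor's reference to cx
		cx->deleteWatchMetadata(key);     // illustrative cleanup hook
	}
}
```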
The code was tested by 1) manually changing the number of logs of a
local cluster and observing the cluster recover and the previous
DatabaseContext get destroyed; and 2) a 100K joshua run with 1 failure,
where the same test also fails on the current git main branch.
The actor compiler transforms the loop into recursion, which can cause
stack overflows. I therefore added yield() to unwind the stack and
refactored the parsing code so that subsequent files are blocked until
previous ones have finished.
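The pattern, in sketch form (parseOneFile is a stand-in for the real
parsing actor):

```cpp
// Sketch: yield inside the loop so the actor-compiled recursion unwinds
// instead of growing the stack, and process files strictly in order.
ACTOR Future<Void> parseFiles(std::vector<std::string> files) {
	state int i = 0;
	for (; i < (int)files.size(); i++) {
		wait(parseOneFile(files[i])); // block until the previous file finishes
		wait(yield());                // unwind the actor-compiled call stack
	}
	return Void();
}
```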