foundationdb

Commit Graph

Author	SHA1	Message	Date
Jingyu Zhou	dc60f63f9b	Revert "Cancel watch when the key is not being waited" This reverts commit `639afbe62c`.	2022-10-27 19:46:05 -07:00
Jingyu Zhou	fbe9802be5	Revert "configurationMonitor does not need to check watch reference count" This reverts commit `ab0f827058`.	2022-10-27 19:46:05 -07:00
Jingyu Zhou	634bd529e7	Revert "Record the version of each watch" This reverts commit `4bd24e4d64`.	2022-10-27 19:46:05 -07:00
Jingyu Zhou	19ae4e7eb7	Revert "Reformat source" This reverts commit `ec47c261bf`.	2022-10-27 19:46:05 -07:00
Jingyu Zhou	e460933b52	Revert "Remove debugging output" This reverts commit `41d1d6404d`.	2022-10-27 19:46:05 -07:00
Jingyu Zhou	e7fd3eda00	Revert "Update fdbclient/NativeAPI.actor.cpp" This reverts commit `812243bafa`.	2022-10-27 19:46:05 -07:00
Lukas Joswiak	9625efd5b9	Add comment about configuration database	2022-10-27 13:56:13 -07:00
Lukas Joswiak	8e76621653	Disable shared state updates on configuration database	2022-10-27 13:56:13 -07:00
Lukas Joswiak	91146a03f0	Write cluster ID to `ClientDBInfo` This enables clients to receive the cluster ID.	2022-10-27 13:56:13 -07:00
Lukas Joswiak	28540e5962	Format	2022-10-27 13:56:13 -07:00
Lukas Joswiak	a8f8757f77	Rename cluster ID key In FDB 7.1, this key was stored in the txnStateStore. In 7.2, it has been moved to the database. This was causing protocol compatibility issues during upgrades, so we need to rename the key.	2022-10-27 13:56:13 -07:00
Lukas Joswiak	02bc5edbf8	Avoid blocking in choose when	2022-10-27 13:56:13 -07:00
Lukas Joswiak	9d3c3b1efe	Remove cluster ID logic from individual roles The logic to determine the validity of a process joining a cluster now belongs on the worker and the cluster controller. It is no longer restricted to tlogs and storages, but instead applies to all processes (even stateless ones).	2022-10-27 13:56:13 -07:00
Lukas Joswiak	1fca3b7ddc	Modify how cluster ID tests are run in simulation	2022-10-27 13:56:13 -07:00
Lukas Joswiak	bba05b7c9b	Move cluster ID from txnStateStore to the database The cluster ID is now stored in the database instead of in the txnStateStore. The cluster controller will read it on boot and send it to all processes to persist.	2022-10-27 13:56:13 -07:00
Lukas Joswiak	5ca2b89bdf	Fix simulation issue where process switch was ignored The simulator tracks only active processes. Rebooted or killed processes are removed from the list of processes, and only get added back when the process is rebooted and starts up again. This causes a problem for the `RebootProcessAndSwitch` kill type, which wants to simultaneously reboot all machines in a cluster and change their cluster file. If a machine is currently being rebooted, it will miss the reboot process and switch command. The fix is to add a check when a process is being started in simulation. If the process has had its cluster file changed and the cluster is in a state where all processes should have had their cluster files reverted to the original value, the simulator will now send a `RebootProcessAndSwitch` signal right when the process is started. This will cause an extra reboot, but should correctly switch the process back to its original, correct cluster file, allowing the cluster to fully recover all clusters. Note that the above issue should only affect simulation, due to how the simulator tracks processes and handles kill signals. This commit also adds a field to each process struct to determine whether the process is being run in a DR cluster in the simulation run. This is needed because simulation does not differentiate between processes in different clusters (other than by the IP), and some processes needed to switch clusters and some simply needed to be rebooted.	2022-10-27 13:56:13 -07:00
Lukas Joswiak	f43011e4b7	Notify processes joining the wrong cluster And have these processes enter a "zombie" state where they cancel all their actors and then wait forever, refusing to do any additional work until they are manually handled by the operator.	2022-10-27 13:56:13 -07:00
Lukas Joswiak	72a97afcd6	Avoid recruiting workers with different cluster ID	2022-10-27 13:56:13 -07:00
Lukas Joswiak	a72066be33	Add simulation support for changing the cluster file	2022-10-27 13:56:13 -07:00
Jingyu Zhou	6e0835f8a8	Merge pull request #8599 from technmsg/main updated copyright year on web site	2022-10-27 13:36:56 -07:00
Xiaoge Su	812243bafa	Update fdbclient/NativeAPI.actor.cpp Co-authored-by: Jingyu Zhou <jingyuzhou@gmail.com>	2022-10-27 12:42:05 -07:00
Xiaoge Su	41d1d6404d	Remove debugging output	2022-10-27 12:42:05 -07:00
Xiaoge Su	ec47c261bf	Reformat source	2022-10-27 12:42:05 -07:00
Xiaoge Su	4bd24e4d64	Record the version of each watch Consider the case: 1. A watch on key A is set, and the watchValueMap ACTOR, denoted X, starts waiting. 2. All watches are cleared due to a connection string change. 3. The watch on key A is restarted with watchValueMap ACTOR Y. 4. X receives the cancel exception and tries to dereference the counter. This causes Y to be cancelled: the reference count makes the watch terminate prematurely. Recording the version of each watch helps prevent this issue.	2022-10-27 12:42:05 -07:00
Xiaoge Su	ab0f827058	configurationMonitor does not need to check watch reference count	2022-10-27 12:42:05 -07:00
Xiaoge Su	639afbe62c	Cancel watch when the key is not being waited Currently, there is a cyclic reference: DatabaseContext -> WatchMetadata -> watchStorageServerResp -> DatabaseContext. If a watch is created in the DatabaseContext, then even if the corresponding wait ACTOR is cancelled, the WatchMetadata still holds a reference to the watchStorageServerResp ACTOR, which holds a reference to the DatabaseContext. In this situation, any DatabaseContext that held a watch will not be destructed automatically, since its reference count never drops to 0 until the watched value changes. Every time the cluster recovers, several watches are created, and when the cluster restarts, the DatabaseContext that is no longer in use cannot be destructed because of these watches. With this patch, each wait on the watch is counted. When the watch is either triggered or cancelled, the corresponding count is reduced. If a watch is not being waited on, it is cancelled, effectively reducing the reference count of the DatabaseContext. This will hopefully fix the issue mentioned above. The code was tested by 1) manually changing the number of logs of a local cluster and observing the cluster recovery and the previous DatabaseContext being destructed; 2) a 100K joshua run with 1 failure; the same test also fails on the current git main branch.	2022-10-27 12:42:05 -07:00
Xiaoge Su	03b102d86a	Clean up unused comment in flow.h	2022-10-27 12:42:05 -07:00
Alex Moundalexis	67049518b9	updated copyright year on web site	2022-10-27 15:05:52 -04:00
Nim Wijetunga	bf01d9b879	Bulk Setup Workload Improvements (#8573) * bulk setup workload improvements * fix workload * modify	2022-10-27 11:10:14 -07:00
Jingyu Zhou	fe66c026b4	Merge pull request #8598 from jzhou77/fix Fix restarting restore test failure	2022-10-27 10:44:17 -07:00
Josh Slocum	4d3553481f	Blob connection provider test (#8478) * Refactoring test blob metadata creation * Implementing BlobConnectionProviderTest * createRandomTestBlobMetadata supports blobstore and works outside simulation	2022-10-27 10:44:06 -05:00
Jingyu Zhou	6c0f890f78	Fix restarting restore test failure Old fdbserver may not set the "enableSnapshotBackupEncryption" key, thus we should allow the key to be not present.	2022-10-27 08:43:55 -07:00
Vaidas Gasiunas	c6adb3a98c	Building fdb_c_shim to a shared library (#8586)	2022-10-27 12:37:20 +02:00
Markus Pilman	2bf9c2f448	Merge pull request #8588 from sfc-gh-mpilman/bugfixes/fix-build-dependencies Fix AWS SDK build and removed check for old build system	2022-10-26 12:36:08 -06:00
Dennis Zhou	deeedfc3f8	Merge pull request #8537 from sfc-gh-dzhou/unblob blob: allow purge ranges to begin and end in unblobbified regions	2022-10-26 11:11:09 -07:00
Markus Pilman	989731f7f4	Fix AWS SDK build and removed check for old build system	2022-10-26 11:48:10 -06:00
Aaron Molitor	f620f391f5	make same change to Dockerfile.eks (from #8583)	2022-10-26 12:24:37 -05:00
Josh Slocum	623e6ef761	adding delay in bw forced shutdown to prevent crash races (#8552)	2022-10-26 12:22:41 -05:00
Nim Wijetunga	6f37f55917	Restore System Keys First in Backup/Restore Workloads (#8475) * system key restore ordering * restore system keys before regular data * atomic restore backup fix * change testing * fix compile error * fix compile issue * fix compile issues * Trigger Build * only split restore if encryption is enabled * revert knob changes * Update fdbserver/workloads/AtomicSwitchover.actor.cpp Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com> * Update fdbserver/workloads/AtomicSwitchover.actor.cpp Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com> * Update fdbserver/workloads/BackupCorrectness.actor.cpp Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com> * Update fdbserver/workloads/AtomicRestore.actor.cpp Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com> * add todo * strengthen check * seperate system restore for atomic restore * address pr comments * address pr comments Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>	2022-10-26 09:38:27 -07:00
Josh Slocum	ab6953be7d	Blob Granule read-driven compaction (#8572)	2022-10-26 09:02:50 -07:00
Aaron Molitor	b8b7b46d8f	update kubectl and awscli	2022-10-26 10:52:05 -05:00
Marian Dvorsky	3c5d3f7a94	Fix SpanContext for GP:getLiveCommittedVersion (#8565) * Fix SpanContext for GP:getLiveCommittedVersion	2022-10-26 16:29:28 +02:00
Junhyun Shim	32099bfce5	Merge pull request #8564 from sfc-gh-jshim/enable-authz-benchmark-in-mako Enable authz/TLS-enabled benchmark in mako	2022-10-26 14:55:53 +02:00
Junhyun Shim	2917598dc4	Merge remote-tracking branch 'origin/main' into enable-authz-benchmark-in-mako	2022-10-26 12:49:13 +02:00
Aaron Molitor	e4116f8aee	cleanup shell script, remove set -x, add more detailed logging	2022-10-25 23:23:22 -05:00
Xiaoxi Wang	bb0236433c	Merge pull request #8540 from sfc-gh-xwang/feature/main/storageMetrics Make MockStorageServer serve StorageMetrics related request	2022-10-25 17:29:21 -07:00
Xiaoxi Wang	0a5e596758	fix network failure check in unit test	2022-10-25 16:43:00 -07:00
Xiaoxi Wang	36d9de9072	change UNREACHABLE to ASSERT(false); change function name	2022-10-25 15:43:24 -07:00
Trevor Clinkenbeard	0f4fddfa17	Merge pull request #8480 from sfc-gh-tclinkenbeard/reject-tag-throttled-txns Reject transactions that have been tag throttled too long	2022-10-25 15:34:07 -07:00
Jingyu Zhou	744c391608	Merge pull request #8539 from vishesh/cc-fail-later Don't fail ConsistencyCheck on first mismatch	2022-10-25 15:33:11 -07:00
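
The watch race described in commit `4bd24e4d64` (steps 1 to 4) can be illustrated with a small model. This is only a sketch under stated assumptions, not FoundationDB code: `WatchRegistry`, `WatchEntry`, and every method name below are hypothetical, and the real logic lives in Flow actors in `fdbclient/NativeAPI.actor.cpp`. The idea shown is that each waiter remembers the version of the watch it registered against, so a late release from a cancelled predecessor cannot decrement the count of a newer watch on the same key.

```python
from dataclasses import dataclass


@dataclass
class WatchEntry:
    version: int      # generation of the watch currently registered for the key
    waiters: int = 0  # number of callers still waiting on that generation


class WatchRegistry:
    """Hypothetical stand-in for a per-key watch map; not FoundationDB code."""

    def __init__(self):
        self._entries = {}
        self._next_version = 0

    def add_waiter(self, key):
        """Register a waiter and return the watch version it is bound to."""
        entry = self._entries.get(key)
        if entry is None:
            self._next_version += 1
            entry = WatchEntry(version=self._next_version)
            self._entries[key] = entry
        entry.waiters += 1
        return entry.version

    def clear_all(self):
        """Drop every watch, e.g. after a connection string change."""
        self._entries.clear()

    def release(self, key, version):
        """Drop one waiter; ignore releases that belong to an older watch."""
        entry = self._entries.get(key)
        if entry is None or entry.version != version:
            return False  # stale release from a cancelled predecessor
        entry.waiters -= 1
        if entry.waiters == 0:
            del self._entries[key]  # nobody is waiting, so the watch is cancelled
        return True

    def is_watched(self, key):
        return key in self._entries


registry = WatchRegistry()
x = registry.add_waiter(b"A")   # step 1: watch X starts waiting on key A
registry.clear_all()            # step 2: all watches are cleared
y = registry.add_waiter(b"A")   # step 3: watch Y restarts the watch on key A
registry.release(b"A", x)       # step 4: X's late cancel is ignored
assert registry.is_watched(b"A")  # Y is still alive
```

Without the version check in `release`, step 4 would drop the count that belongs to Y and cancel it prematurely, which is the failure the commit message describes.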

23462 Commits
