foundationdb

Commit Graph

Author	SHA1	Message	Date
Lukas Joswiak	795b666e23	Fix a rare configuration database data loss bug See the comment contained in this commit. This bug could only manifest under a specific set of circumstances: 1. A coordinator change is started 2. The coordinator change succeeds, but its action of clearing `previousCoordinatorsKey` is delayed. 3. A minority of `ConfigNode`s have an old state of the configuration database, compared to the majority. 4. A `ConfigNode` in the majority dies and permanently loses data. 5. A long delay occurs on the `PaxosConfigConsumer` when it tries to read the latest changes from the `ConfigNode`s. In the above circumstances, the `ConfigBroadcaster` could incorrectly send a snapshot of an old state of the configuration database to a majority of `ConfigNode`s. This would cause new, durable, and acknowledged commit data to be overwritten. Note that this bug only affects the configuration database (used for knob storage). It does not affect the normal keyspace.	2022-11-22 11:20:04 -08:00
sfc-gh-tclinkenbeard	74212eeacf	Encapsulate CounterCollection	2022-10-25 10:17:15 -07:00
Lukas Joswiak	8c50f98c00	Update type of coordinators hash This fixes some serialization issues due to `BinaryReader` not being able to deserialize types of size_t.	2022-09-13 16:53:54 -07:00
Lukas Joswiak	7ee6be9238	Simplify how ConfigBroadcastInterface is stored on worker	2022-09-13 16:53:54 -07:00
Lukas Joswiak	809d77c2ab	Fix issue where annotations were not being serialized	2022-09-13 16:53:54 -07:00
Lukas Joswiak	b2d395a304	Delay cluster controller restart when pushing knob updates to workers This gives the `ConfigBroadcaster` time to send the knob change to all workers before applying the change to itself and restarting.	2022-09-13 16:53:54 -07:00
Lukas Joswiak	8d237ba493	Fix various correctness and timeout issues Contains the following fixes: * When handling the special case rollforward where nodes can be rolled forward even if a majority are at version 0, we don't want to reset the live version of the node being rolled forward. This is because a quorum of nodes at version 0 can continue handing out and incrementing their live version, and if they are rolled forward there is the potential for them to go back in time in regard to their live version. So in this one special case, they should maintain their existing live version. * Fixes some unseed issues due to fields not being initialized properly. * Temporarily disables a coordinator restart in the recovery path (in the coordinated state) due to it causing a timeout. This needs more investigation in the future.	2022-09-13 16:53:54 -07:00
Lukas Joswiak	249ff2b2fd	Fix configuration database unit tests	2022-09-13 16:53:54 -07:00
Lukas Joswiak	cd2bbffa4c	Add flag to disable the configuration database The `--no-config-db` flag, passed to `fdbserver`, will disable the configuration database. When this flag is specified, no `ConfigNode`s will be started, the `ConfigBroadcaster` will not be started, and on a coordinator change no attempt will be made to lock `ConfigNode`s.	2022-09-13 16:53:54 -07:00
Lukas Joswiak	74ac617a34	Add support for changing coordinators to the configuration database Configuration database data lives on the coordinators. When a change coordinators command is issued, the data must be sent to the new coordinators to keep the database consistent.	2022-09-13 16:53:54 -07:00
Lukas Joswiak	9ca8a3c683	Reenable status json for dynamic knobs, add unit test	2022-06-21 11:43:05 -07:00
sfc-gh-tclinkenbeard	a71099471b	Update copyright header dates	2022-03-21 13:36:23 -07:00
Lukas Joswiak	582ba5d519	Fix issue with stuck config nodes In rare circumstances where the cluster controller dies / moves to a new machine, sometimes only a minority of `ConfigNode`s received messages telling them they were registered. When the `ConfigNode`s attempt to register with the new broadcaster (on the new cluster controller), the knob system would get stuck because only a minority would be registered. Part of this change allows registration of unregistered `ConfigNode`s if there is no path to a majority of registered nodes.	2022-03-15 11:42:58 -07:00
Lukas Joswiak	a8828db58e	Load balance dynamic knob requests This commit also removes an attempt to read the latest configuration snapshot when a rollforward timeout occurs. The normal retry loop will eventually fetch an up to date snapshot and the rollforward will be retried.	2022-02-22 10:53:48 -08:00
Lukas Joswiak	f300cec6ed	Fast-track ConfigNode registration with Simple DB When using the `ConfigDBType::Simple` configuration database, allow nodes to immediately register with the broadcaster without having to wait for a quorum.	2022-02-09 14:18:48 -08:00
Lukas Joswiak	b5a3312a26	Factor out known replica update step	2022-02-09 13:43:33 -08:00
Lukas Joswiak	1d15aa5580	Fix internal function name	2022-02-09 13:43:32 -08:00
Lukas Joswiak	d5a562e6b8	Fix dynamic knobs correctness issues	2022-02-09 13:43:32 -08:00
Lukas Joswiak	7e6bc27863	Remove linear time loop	2021-08-23 14:02:41 -07:00
Lukas Joswiak	08892eab55	Move client failure cleanup	2021-08-23 12:54:03 -07:00
Lukas Joswiak	adc1025fa1	Clean up clientFailures periodically	2021-08-23 12:45:42 -07:00
Lukas Joswiak	d004703cc8	Add worker kill unit test	2021-08-23 12:45:42 -07:00
sfc-gh-tclinkenbeard	b6c669be23	Send ConfigBroadcastSnapshotReply to broadcaster	2021-08-19 14:45:30 -07:00
sfc-gh-tclinkenbeard	62303af832	Remove invalid assertion from ConfigBroadcastSnapshotRequest handling	2021-08-18 13:24:00 -07:00
sfc-gh-tclinkenbeard	0bacc310ef	Reenable consumer in config broadcaster	2021-08-17 12:09:12 -07:00
sfc-gh-tclinkenbeard	616a01d01d	Only register each worker once with config broadcaster (consumer currently disabled)	2021-08-17 11:45:50 -07:00
sfc-gh-tclinkenbeard	3418c20867	Merge remote-tracking branch 'origin/master' into paxos-config-db	2021-08-16 10:49:47 -07:00
Lukas Joswiak	1faec36bc6	Wait for all snapshot replies before sending incremental changes	2021-08-11 11:17:51 -07:00
Lukas Joswiak	c098a1128d	Push snapshot changes to local configuration on refresh	2021-08-11 09:13:22 -07:00
Lukas Joswiak	b112560c94	Reorder registerWorker to prevent potential conflict	2021-08-10 15:09:35 -07:00
Lukas Joswiak	f018af6ee4	Update fdbserver/ConfigBroadcaster.actor.cpp Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>	2021-08-10 13:24:41 -07:00
Lukas Joswiak	d27c9e2520	Revert error check	2021-08-10 12:41:41 -07:00
Lukas Joswiak	a838a47b0b	Use ActorCollection for consumer future	2021-08-10 12:27:19 -07:00
Lukas Joswiak	598b23f8d4	Merge branch 'features/broadcaster-push' of github.com:sfc-gh-ljoswiak/foundationdb into features/broadcaster-push	2021-08-10 12:08:16 -07:00
Lukas Joswiak	5dfd7c4b1a	Remove redundant dead worker check	2021-08-10 11:56:58 -07:00
Lukas Joswiak	cf81b0650d	Only register consumer once on the broadcaster	2021-08-10 11:56:16 -07:00
Lukas Joswiak	72e55ef72e	Add broadcaster error check to unit tests	2021-08-10 11:39:29 -07:00
Lukas Joswiak	564a3d69b7	Rename config broadcast interface messages	2021-08-10 11:39:29 -07:00
Lukas Joswiak	85fa264a16	Remove move constructor and assignment operator	2021-08-10 11:39:29 -07:00
Lukas Joswiak	305a17c811	Improve config broadcaster logic, fix unit tests	2021-08-10 11:39:29 -07:00
Lukas Joswiak	72e63db856	Send ConfigBroadcastInterface to ConfigBroadcaster instead of entire worker interface	2021-08-10 11:39:29 -07:00
Lukas Joswiak	3946cf94ff	Push updates to workers (clang-formatted files)	2021-08-10 11:39:29 -07:00
Lukas Joswiak	092ab4302b	Push updates to workers	2021-08-10 11:39:29 -07:00
Lukas Joswiak	3a607d9a38	Update fdbserver/ConfigBroadcaster.actor.cpp Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>	2021-08-10 09:36:39 -07:00
Lukas Joswiak	c97a1b3b4d	Remove move constructor and assignment operator	2021-08-09 15:33:01 -07:00
Lukas Joswiak	5249105b04	Improve config broadcaster logic, fix unit tests	2021-08-09 13:20:06 -07:00
sfc-gh-tclinkenbeard	82546853c0	Rename UseConfigDB to ConfigDBType	2021-08-09 10:04:35 -07:00
sfc-gh-tclinkenbeard	b15daf1886	Added PImpl class This class propagates the constness of methods to their pimpl implementations	2021-08-09 10:04:34 -07:00
Lukas Joswiak	fae29dbb1f	Send ConfigBroadcastInterface to ConfigBroadcaster instead of entire worker interface	2021-08-06 12:42:07 -07:00
Lukas Joswiak	38d05a2f49	Push updates to workers (clang-formatted files)	2021-08-05 18:57:12 -07:00

83 Commits