See the comment contained in this commit. This bug could only manifest
under a specific set of circumstances:
1. A coordinator change is started
2. The coordinator change succeeds, but its action of clearing
`previousCoordinatorsKey` is delayed.
3. A minority of `ConfigNode`s have an old state of the configuration
database, compared to the majority.
4. A `ConfigNode` in the majority dies and permanently loses data.
5. A long delay occurs on the `PaxosConfigConsumer` when it tries to
read the latest changes from the `ConfigNode`s.
In the above circumstances, the `ConfigBroadcaster` could incorrectly
send a snapshot of an old state of the configuration database to a
majority of `ConfigNode`s. This would cause new, durable, and
acknowledged commit data to be overwritten.
Note that this bug only affects the configuration database (used for
knob storage). It does not affect the normal keyspace.
fdbclient/PaxosConfigTransaction.actor.cpp:221:77: runtime error: shift exponent 32 is too large for 32-bit type 'int'
I confirmed that 1 << 30 is not UB
Configuration database data lives on the coordinators. When a change
coordinators command is issued, the data must be sent to the new
coordinators to keep the database consistent.
* proof of concept
* use code-probe instead of test
* code probe working on gcc
* code probe implemented
* renamed TestProbe to CodeProbe
* fixed refactoring typo
* support filtered output
* print probes at end of simulation
* fix missed probes print
* fix deduplication
* Fix refactoring issues
* revert bad refactor
* make sure file paths are relative
* fix more wrong refactor changes
* Fix a heap-use-after-free in PaxosConfigConsumer.actor.cpp
* Two more defensive local promises
* Two more defensive promise copies
* Fix latent logic error
* coordinatorsKey should not storing IP addresses.
Currently, when we do a commit of coordinator change, we are always converting hostnames to IP addresses and store the converted results in coordinatorsKey (\xff/coordinators). This result in ForwardRequest also sending IP addresses, and receivers will update their cluster files with IPs, then we lose the dynamic IP feature.
* Remove the legacy coordinators() function.
* Update async_resolve().
ip::basic_resolver::async_resolve(const query & q, ResolveHandler && handler) is deprecated.
* Clean code format.
* Fix typo.
* Remove SpecifiedQuorumChange and NoQuorumChange.
This commit also removes an attempt to read the latest configuration
snapshot when a rollforward timeout occurs. The normal retry loop will
eventually fetch an up to date snapshot and the rollforward will be
retried.