57 Commits

Author SHA1 Message Date
sfc-gh-tclinkenbeard 0f14647bbf Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2023-01-05 08:10:49 -08:00
FoundationDB CI 86d6106dc1
format source code after switch to clang 15 2022-12-08 17:26:45 +00:00
sfc-gh-tclinkenbeard 4b6098931c Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2022-12-04 08:40:04 -08:00
Lukas Joswiak 795b666e23 Fix a rare configuration database data loss bug
See the comment contained in this commit. This bug could only manifest
under a specific set of circumstances:

1. A coordinator change is started
2. The coordinator change succeeds, but its action of clearing
   `previousCoordinatorsKey` is delayed.
3. A minority of `ConfigNode`s have an old state of the configuration
   database, compared to the majority.
4. A `ConfigNode` in the majority dies and permanently loses data.
5. A long delay occurs on the `PaxosConfigConsumer` when it tries to
   read the latest changes from the `ConfigNode`s.

In the above circumstances, the `ConfigBroadcaster` could incorrectly
send a snapshot of an old state of the configuration database to a
majority of `ConfigNode`s. This would cause new, durable, and
acknowledged commit data to be overwritten.

Note that this bug only affects the configuration database (used for
knob storage). It does not affect the normal keyspace.
2022-11-22 11:20:04 -08:00
sfc-gh-tclinkenbeard 7fc85d86b4 Add fdb_transaction_get_tag_throttled_duration function to C bindings 2022-11-13 11:22:57 -08:00
sfc-gh-tclinkenbeard c2dab7b0e0 Add ISingleThreadTransaction::getTotalCost method 2022-10-16 21:58:11 -07:00
Andrew Noyes 0afa24bb3f
Fix undefined behavior when retries is too large (#8180)
fdbclient/PaxosConfigTransaction.actor.cpp:221:77: runtime error: shift exponent 32 is too large for 32-bit type 'int'

I confirmed that 1 << 30 is not UB
2022-09-14 11:46:15 -07:00
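
The sanitizer report in the commit above is about shifting a 32-bit `int` by 32, which is undefined behavior in C++. A minimal sketch of one way to avoid it, assuming the shift computes an exponential backoff multiplier from a retry count: cap the exponent before shifting. The names `backoffMultiplier` and `kMaxShift` are hypothetical and are not taken from the FoundationDB source; the cap of 30 mirrors the commit's note that `1 << 30` is not UB.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical constant (not from the FoundationDB source): the largest
// exponent we allow. 1 << 30 still fits in a signed 32-bit int, whereas
// shifting by 32 or more is undefined behavior.
constexpr int kMaxShift = 30;

// Hypothetical helper: returns 2^retries, with the exponent clamped to
// kMaxShift so the shift stays well-defined for any non-negative input.
inline int backoffMultiplier(int retries) {
    assert(retries >= 0); // a negative shift count is also undefined
    return 1 << std::min(retries, kMaxShift);
}
```

With this clamp, any retry count of 30 or more yields the same multiplier, so the backoff simply stops growing instead of overflowing.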
Lukas Joswiak 8c50f98c00 Update type of coordinators hash
This fixes some serialization issues due to `BinaryReader` not being
able to deserialize types of size_t.
2022-09-13 16:53:54 -07:00
Lukas Joswiak 424bb87f3e Apply feedback 2022-09-13 16:53:54 -07:00
Lukas Joswiak 7ee6be9238 Simplify how ConfigBroadcastInterface is stored on worker 2022-09-13 16:53:54 -07:00
Lukas Joswiak 2fe3fc5379 Fix issue with pointer dereference after actor cancellation 2022-09-13 16:53:54 -07:00
Lukas Joswiak b2d395a304 Delay cluster controller restart when pushing knob updates to workers
This gives the `ConfigBroadcaster` time to send the knob change to all
workers before applying the change to itself and restarting.
2022-09-13 16:53:54 -07:00
Lukas Joswiak 249ff2b2fd Fix configuration database unit tests 2022-09-13 16:53:54 -07:00
Lukas Joswiak 74ac617a34 Add support for changing coordinators to the configuration database
Configuration database data lives on the coordinators. When a change
coordinators command is issued, the data must be sent to the new
coordinators to keep the database consistent.
2022-09-13 16:53:54 -07:00
Markus Pilman 1de37afd52
Make TEST macros C++ only (#7558)
* proof of concept

* use code-probe instead of test

* code probe working on gcc

* code probe implemented

* renamed TestProbe to CodeProbe

* fixed refactoring typo

* support filtered output

* print probes at end of simulation

* fix missed probes print

* fix deduplication

* Fix refactoring issues

* revert bad refactor

* make sure file paths are relative

* fix more wrong refactor changes
2022-07-19 13:15:51 -07:00
Andrew Noyes 6f500b59c0
Fix a heap-use-after-free in PaxosConfigConsumer.actor.cpp (#7244)
* Fix a heap-use-after-free in PaxosConfigConsumer.actor.cpp

* Two more defensive local promises

* Two more defensive promise copies

* Fix latent logic error
2022-05-25 12:08:30 -07:00
Renxuan Wang df4e0deb4d
coordinatorsKey should not always store IP addresses. (#7204)
* coordinatorsKey should not store IP addresses.

Currently, when we commit a coordinator change, we always convert hostnames to IP addresses and store the converted results in coordinatorsKey (\xff/coordinators). As a result, ForwardRequest also sends IP addresses, receivers update their cluster files with IPs, and we lose the dynamic IP feature.

* Remove the legacy coordinators() function.

* Update async_resolve().

ip::basic_resolver::async_resolve(const query & q, ResolveHandler && handler) is deprecated.

* Clean code format.

* Fix typo.

* Remove SpecifiedQuorumChange and NoQuorumChange.
2022-05-23 11:42:56 -07:00
Renxuan Wang c69a07a858
Check in the new Hostname logic. (#6926)
* Revert #6655.

20220407-031010-renxuan-c101052c21da8346           compressed=True data_size=31004844 duration=4310801 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:04:15 sanity=False started=100047 stopped=20220407-041425 submitted=20220407-031010 timeout=5400 username=renxuan

* Revert #6271.

20220407-051532-renxuan-470f0fe6aac1c217           compressed=True data_size=30982370 duration=3491067 ended=100002 fail_fast=10 max_runs=100000 pass=100002 priority=100 remaining=0 runtime=0:59:57 sanity=False started=100141 stopped=20220407-061529 submitted=20220407-051532 timeout=5400 username=renxuan

* Revert #6266.

Remove resolving-related functionality from the connection string. The connection string will be used for storage only and will be immutable.

20220407-175119-renxuan-55d30ee1a4b42c2f           compressed=True data_size=30970443 duration=5437659 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:59:31 sanity=False started=100154 stopped=20220407-185050 submitted=20220407-175119 timeout=5400 username=renxuan

* Add hostname to coordinator interfaces.

* Turn on the new hostname logic.

* Add the corresponding change in config txns.

The most notable change is that, before calling basicLoadBalance(), we need to call tryInitializeRequestStream() to initialize the request streams.

Passed correctness tests.

* Return error when hostnames cannot be resolved in coordinators command.

* Minor fixes.
2022-04-27 21:54:13 -07:00
sfc-gh-tclinkenbeard a71099471b Update copyright header dates 2022-03-21 13:36:23 -07:00
sfc-gh-tclinkenbeard 0e7dc83f25 Fix compilation issues with ModelInterface construction in configuration database code 2022-03-16 14:25:32 -07:00
Trevor Clinkenbeard 10c536c700
Merge pull request #6435 from sfc-gh-ljoswiak/fixes/dynamic-knobs-release-readiness
Dynamic knobs improvements
2022-03-16 10:26:56 -07:00
A.J. Beamon c635dcd3ad Add tenant support in the FDB native client 2022-03-15 09:21:27 -07:00
Lukas Joswiak a8828db58e Load balance dynamic knob requests
This commit also removes an attempt to read the latest configuration
snapshot when a rollforward timeout occurs. The normal retry loop will
eventually fetch an up to date snapshot and the rollforward will be
retried.
2022-02-22 10:53:48 -08:00
Lukas Joswiak cdc1549282 Fix client timeout errors 2022-02-09 13:43:33 -08:00
Lukas Joswiak d5a562e6b8 Fix dynamic knobs correctness issues 2022-02-09 13:43:32 -08:00
Lukas Joswiak 92998fd20b Merge rollback message into rollforward message 2021-10-25 12:03:22 -07:00
Lukas Joswiak 7357d7714c Retry with well known endpoints, move last committed check to consumer 2021-10-25 12:03:22 -07:00
Lukas Joswiak 48dc91dd7f Add rollback and rollforward logic to ConfigBroadcaster 2021-10-25 12:03:22 -07:00
A.J. Beamon e882eb33fc Abstract the cluster file into a cluster connection record that can be backed by something other than the filesystem. 2021-10-22 11:05:18 -07:00
Steve Atherton 2ebaddcc1e
Bug fix: CommitQuorum::addRequestActor() accesses self after destruction due to ignoring actor_cancelled error. (#5744) 2021-10-11 12:17:09 -07:00
sfc-gh-tclinkenbeard 3a880b43d4 Fix and reenable PaxosConfigTransaction::onError 2021-08-27 15:06:33 -07:00
sfc-gh-tclinkenbeard 29d83291a1 Add CommitUnknownResult metric for ConfigIncrement workload 2021-08-27 00:44:12 -07:00
sfc-gh-tclinkenbeard d541b3804d Make SimpleConfigTransaction more consistent with PaxosConfigTransaction 2021-08-26 16:18:24 -07:00
sfc-gh-tclinkenbeard 28daab9f5c Use ActorCollection instead of std::vector<Future<Void>> in *Quorum classes, to listen for errors 2021-08-26 16:18:19 -07:00
sfc-gh-tclinkenbeard 96726a275a Create separate copies of commit request in PaxosConfigTransactionImpl 2021-08-25 15:21:36 -07:00
sfc-gh-tclinkenbeard 78b57629a1 Handle not_committed error in PaxosConfigTransaction::onError 2021-08-23 13:09:49 -07:00
sfc-gh-tclinkenbeard 2b3041f205 Add CommitQuorum to PaxosConfigTransaction.actor.cpp 2021-08-20 15:53:13 -07:00
sfc-gh-tclinkenbeard 3a067b9cc8 Expand scope of GetGenerationQuorum object in PaxosConfigTransactionImpl 2021-08-20 10:15:01 -07:00
sfc-gh-tclinkenbeard 04d33d3cba Remove anonymous namespace in PaxosConfigTransaction.actor.cpp 2021-08-16 14:30:05 -07:00
sfc-gh-tclinkenbeard a5b916cd8d PaxosConfigTransaction should only send read requests to valid replicas 2021-08-09 10:04:35 -07:00
sfc-gh-tclinkenbeard 79ba9c4e3a Add GetGenerationQuorum to get generation from a quorum of config nodes 2021-08-09 10:04:35 -07:00
sfc-gh-tclinkenbeard b15daf1886 Added PImpl class
This class propagates the constness of methods to their pimpl
implementations
2021-08-09 10:04:34 -07:00
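
The idea of propagating constness through a pimpl, as described in the commit above, can be illustrated with a small wrapper. This is only a sketch under stated assumptions: `ConstPropagatingPImpl`, `CounterImpl`, and `Counter` are hypothetical names for illustration and do not reproduce the actual `PImpl` class added in that commit. A plain `std::unique_ptr<T>` member lets a `const` method call non-const methods on the implementation; the wrapper below returns `const T*` from its `const` accessors, so the compiler rejects that.

```cpp
#include <cassert>
#include <memory>

// Hypothetical wrapper (not the FoundationDB PImpl class): owns the
// implementation and exposes it as const T when accessed through a
// const wrapper, and as mutable T otherwise.
template <class T>
class ConstPropagatingPImpl {
    std::unique_ptr<T> impl;

public:
    ConstPropagatingPImpl() : impl(std::make_unique<T>()) {}

    T* operator->() {
        assert(impl);
        return impl.get();
    }
    const T* operator->() const {
        assert(impl);
        return impl.get();
    }
};

// Hypothetical implementation class hidden behind the pimpl.
struct CounterImpl {
    int value = 0;
    void increment() { ++value; }
    int get() const { return value; }
};

// Hypothetical public class that forwards to the implementation.
class Counter {
    ConstPropagatingPImpl<CounterImpl> impl;

public:
    void increment() { impl->increment(); }
    // Calling impl->increment() here would fail to compile, because
    // the const overload of operator-> yields a const CounterImpl*.
    int get() const { return impl->get(); }
};
```

The design choice is that the const-correctness of the public interface is enforced by the compiler inside the forwarding methods, rather than relying on each method author to remember it.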
sfc-gh-tclinkenbeard a55e849da0 Add some documentation to ConfigGeneration and fix getReadVersion implementations 2021-07-28 13:04:05 -07:00
sfc-gh-tclinkenbeard 634aa2deae Fix IConfigTransaction::getReadVersion implementations 2021-07-26 19:37:12 -07:00
sfc-gh-tclinkenbeard 7de573faf8 Add simple PaxosConfigTransaction::commit implementation 2021-07-18 14:43:58 -07:00
sfc-gh-tclinkenbeard b24b46c862 Replace Standalone<RangeResultRef> with RangeResult in configuration database code 2021-07-18 14:26:15 -07:00
sfc-gh-tclinkenbeard 91e6b7d83d Implement several more PaxosConfigTransaction methods 2021-07-18 14:21:21 -07:00
sfc-gh-tclinkenbeard de871da75f Add some simple implementations to PaxosConfigTransaction methods 2021-07-18 14:02:45 -07:00
sfc-gh-tclinkenbeard 475abe301c Merge remote-tracking branch 'origin/master' into fix-ub 2021-07-14 10:47:02 -07:00
sfc-gh-tclinkenbeard 8cc40e3a2b Expand use of BOOLEAN_PARAM 2021-07-02 21:41:50 -07:00