Commit Graph

18217 Commits

Author SHA1 Message Date
Ata E Husain Bohra abd2959702 Refactor: ClusterController driving cluster-recovery state machine
diff-1: Address Jingyu's review comments

At present, the cluster recovery process consists of the following steps:
1. The ClusterController's clusterWatchDatabase actor recruits the
   master/sequencer process.
2. The sequencer process implements the cluster recovery state machine
   and is responsible for recruiting all other processes as well as
   restoring the cluster state.

This patch proposes a scheme in which the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the sequencer process.

Potential advantages of the scheme:
1. A simpler design, in which the ClusterController recruits the "sequencer"
   process like any other worker process, unlike the current scheme,
   where the "sequencer" process gets special treatment. In the new scheme,
   the sequencer is responsible for maintaining/providing the
   "committed version" (as expected).
2. The ClusterController is responsible for recruiting worker processes,
   whereas in the current scheme the sequencer, although it orchestrates the
   recovery state machine, needs to reach out to the ClusterController to
   recruit worker processes.

NOTE:
This patch moves the recovery state machine code from the
'sequencer' to the 'cluster-controller' process. It is not a verbatim move,
however: necessary updates were made for both functionality and
performance reasons.

Next Steps:
The cluster recovery documentation will be updated in the near future.
2021-12-22 14:06:27 -08:00
Ata E Husain Bohra dfe9d184ff Refactor: ClusterController driving cluster-recovery state machine
At present, the cluster recovery process consists of the following steps:
1. The ClusterController's clusterWatchDatabase actor recruits the
   master/sequencer process.
2. The sequencer process implements the cluster recovery state machine
   and is responsible for recruiting all other processes as well as
   restoring the cluster state.

This patch proposes a scheme in which the cluster recovery state machine
is implemented and driven by the ClusterController process instead
of the sequencer process.

Potential advantages of the scheme:
1. A simpler design, in which the ClusterController recruits the "sequencer"
   process like any other worker process, unlike the current scheme,
   where the "sequencer" process gets special treatment. In the new scheme,
   the sequencer is responsible for maintaining/providing the
   "committed version" (as expected).
2. The ClusterController is responsible for recruiting worker processes,
   whereas in the current scheme the sequencer, although it orchestrates the
   recovery state machine, needs to reach out to the ClusterController to
   recruit worker processes.

NOTE:
This patch moves the recovery state machine code from the
'sequencer' to the 'cluster-controller' process. It is not a verbatim move,
however: necessary updates were made for both functionality and
performance reasons.

Next Steps:
The cluster recovery documentation will be updated in the near future.
2021-12-22 14:06:27 -08:00
neethuhaneesha 3086941c12
Merge pull request #6149 from neethuhaneesha/rocksdbHistograms
KeyValueStoreRocksDB histograms to track latencies
2021-12-22 11:28:36 -08:00
Neethu Haneesha Bingi 1f30368e71 KeyValueStoreRocksDB histograms to track latencies 2021-12-21 23:09:46 -08:00
Andrew Noyes fba55557ae Update generated.go, and add a test to keep it up to date
Also remove some unnecessary cgo stuff, and add a description to
trace_partial_file_suffix
2021-12-21 15:16:50 -08:00
Andrew Noyes fd33d31ff5 Enable -Wdelete-non-virtual-dtor for clang build
We had been disabling -Wdelete-non-virtual-dtor because the actor compiler's generated code seems to do this intentionally. I spent some time trying to rewrite the generated code so that it doesn't literally delete/destroy through a pointer to a base class without a virtual destructor, but I was unable to come up with something that passes correctness. My best guess is that we do this so that we can destroy actor state classes, call callbacks registered on the actor SAV, and then destroy the SAV.

Anyway, we'll now detect new instances of deleting through a pointer to a base class without a virtual destructor.
2021-12-20 16:19:31 -08:00
Chris Douglas 6613ec282d
Merge pull request #6164 from cdouglas/awscli-baseimg
Move awscli to base image from YCSB
2021-12-17 12:58:59 -08:00
A.J. Beamon d8e161f89e Refactor NativeAPI transactions to create and pass around a reference-counted state object. Watches no longer use the transaction info object but instead use their own state. 2021-12-17 11:57:39 -08:00
Chris Douglas 3793e4e5f0 Remove redundant WORKDIR directive 2021-12-17 11:34:32 -08:00
Chris Douglas fec0fb9e9f Move awscli to base image from YCSB 2021-12-17 11:25:22 -08:00
A.J. Beamon 6ffe2067fe Set the CommitTransactionRequest span context upon construction/reset rather than in tryCommit. 2021-12-16 14:08:08 -08:00
Aaron Molitor 95d33cb363 copy packaging/docker to PROJECT_BINARY_DIR (undoing part of #5994),
fetch commit_sha from source_code_directory (don't assume we're in the source tree anymore),
allow custom tag (if a parameter is passed in as $1)
update README.md
2021-12-15 15:23:17 -08:00
A.J. Beamon 496000477c
Merge pull request #6144 from sfc-gh-ajbeamon/unify-flags
Convert command line arguments to use hyphens rather than underscores
2021-12-15 10:47:32 -08:00
Renxuan Wang 2227fc2943 Fix a bug in getDesiredCoordinators().
When no workers are chosen, we should return 0 coordinators.
2021-12-15 10:42:28 -08:00
A.J. Beamon c2a960b0f7 Invoke a different command on fdbbackup that doesn't hang when a cluster file is present but the cluster is unavailable. 2021-12-15 09:34:08 -08:00
zhenfeng yang 76974605c1
support lto (#6140)
* support lto

* use relative path

* add another variable to control lto

* remove unnecessary code
2021-12-14 15:45:07 -08:00
A.J. Beamon ca47b436ac
Apply suggestions from code review
Co-authored-by: Markus Pilman <markus.pilman@snowflake.com>
2021-12-14 14:44:20 -08:00
A.J. Beamon 16fb079c2d Undo some changes that aren't command line flags. 2021-12-14 12:35:49 -08:00
A.J. Beamon 30e2c2d9a6 Don't use new-style arguments in the test harness. 2021-12-14 12:31:12 -08:00
A.J. Beamon 1a893e8d32 Add a test that various binaries properly parse arguments. 2021-12-14 12:03:44 -08:00
A.J. Beamon 5c9b64e414 Backup agent was mistakenly modified in conf files. 2021-12-14 12:02:13 -08:00
A.J. Beamon ff1cb58174 Convert hyphens to underscores for all prefix-based arguments (e.g. --knob-, --locality-) 2021-12-14 12:01:44 -08:00
A.J. Beamon f24adc7b6a Fix a bunch of places where we used old-style arguments. Allow hyphens for profiler args. 2021-12-14 09:59:14 -08:00
Andrew Noyes 1452680d54
Merge pull request #6120 from sfc-gh-anoyes/anoyes/noexecstack
Link libfdb_c with `-z noexecstack`
2021-12-14 09:53:02 -08:00
A.J. Beamon f29f487823
Unify flags (#25)
* Unify flags implementation and change help text in backup.actor.cpp

* Keep LOG_GROUP unchanged

* Convert hyphens to underscores for internal options and the user's input, EXCEPT leading hyphens

* Use a deep copy of the user's input flag to do the match

* Convert _ to - in the Option arrays of backup.actor.cpp

* Convert _ to - in files:
        TLSConfig.actor.h, fdbcli.actor.cpp, fdbserver.actor.cpp, FileConverter.h, FileConverter.cpp

* Use a different way to unify flags: SO_O_ICASE_HYPHEN_AND_UNDERSCORE determines whether the function IsEqual performs the conversion

* Rename the option flag from SO_O_ICASE_HYPHEN_AND_UNDERSCORE to SO_O_HYPHEN_TO_UNDERSCORE

* Update the comment for SO_O_HYPHEN_TO_UNDERSCORE

* Fix remaining underscores in SOption arrays

* Convert _ to - in several files for commands

* Make FDBService and fdbmonitor backward compatible

* Fix pointer-related bugs

* Check underscore and hyphen at the same time for --knob_, --locality_ and --test_
And fix bugs in fdbmonitor and FDBService

* Simplify the argument-retrieval function in fdbmonitor and FDBService.
And fix some documentation in masterserver.actor.cpp

* Convert _ to - for knobs in the setKnob functions

* Convert - to _ in the setKnob functions, since keys in the knob-related maps contain only _

* Rename variables in fdbmonitor and FDBService for clarity

Co-authored-by: Chang Liu <chang.liu@snowflake.com>
2021-12-14 08:44:39 -08:00
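The matching rule described in the squashed commit above (hyphens become underscores in both the option names and the user's input, while leading hyphens are kept) can be illustrated with a short sketch. This is a minimal, hypothetical example: `normalizeFlag` and `flagsMatch` are invented names rather than the SimpleOpt or FoundationDB API, and leaving any inline `=value` untouched is an assumption made here so that values such as file paths keep their hyphens.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not FoundationDB's or SimpleOpt's actual code):
// normalize a flag so that "--knob-foo-bar" and "--knob_foo_bar" compare equal.
// Leading hyphens are preserved; hyphens inside the flag name become
// underscores; anything after the first '=' (an inline value) is left as-is.
std::string normalizeFlag(const std::string& arg) {
    std::string out = arg;  // work on a copy so the caller's input is unchanged
    std::size_t i = 0;
    // Skip the leading "-" or "--" prefix.
    while (i < out.size() && out[i] == '-') {
        ++i;
    }
    // Convert hyphens in the flag name, stopping at an inline "=value".
    for (; i < out.size() && out[i] != '='; ++i) {
        if (out[i] == '-') {
            out[i] = '_';
        }
    }
    return out;
}

// Hypothetical comparison: a user-supplied flag matches an option name
// when both normalize to the same string.
bool flagsMatch(const std::string& userArg, const std::string& optionName) {
    return normalizeFlag(userArg) == normalizeFlag(optionName);
}
```

With this rule, `flagsMatch("--cluster-file", "--cluster_file")` is true, so either spelling selects the same option, while a value such as `--logdir=/tmp/a-b` keeps its hyphens.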
Renxuan Wang 5b079acd66 Coordinator should reply clientInfo when it changes.
This bug was introduced in #5231.
2021-12-10 16:37:48 -08:00
Renxuan Wang 6978486e85 Coordinator should only reply client data if it's valid.
When a coordinator restarts or newly joins a cluster, a client trying to connect to it may already have client data while the coordinator does not. In this case, the coordinator should not reply with empty client data.
2021-12-10 16:37:48 -08:00
Aaron Molitor 8aab68303b add discrete tag_postfix to images, cleanup documentation 2021-12-10 15:37:04 -08:00
Josh Slocum 26a36535fb fixing formatting 2021-12-10 12:47:53 -06:00
Josh Slocum bd0ec5c69e Update bindings/c/test/mako/mako.c
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2021-12-10 12:47:53 -06:00
Josh Slocum 3afe9fb6e0 MVC bug fixes 2021-12-10 12:47:53 -06:00
Josh Slocum b7b2ad0a6f Handling timeout and transaction changes for ThreadResults properly 2021-12-10 12:47:53 -06:00
Josh Slocum da5d3e3ae8 Added new RETURN*_ON_ERROR variants to allow catching errors in other types of functions 2021-12-10 12:47:53 -06:00
Evan Tschannen 2bee1e4030
Merge pull request #6135 from sfc-gh-jslocum/add_cf_empty_back
Adding explicit empty version back to change feeds for now
2021-12-10 09:28:32 -08:00
Josh Slocum f6ea67120e Adding explicit empty version back to change feeds for now 2021-12-10 10:04:05 -06:00
Andrew Noyes 66b387916a Add test for correct permissions for libfdb_c execstack 2021-12-09 17:15:22 -08:00
Andrew Noyes e27d5f6208
Merge pull request #6131 from sfc-gh-anoyes/anoyes/sanitizer-fixes
Fix some problems when running ctest with ASAN builds
2021-12-09 09:13:25 -08:00
Andrew Noyes 79116ea887 Use boost coroutines by default for sanitizers 2021-12-08 16:44:35 -08:00
Andrew Noyes 1ce9c0faed Add sleep 1 after killing/suspending a process
So that it's more likely to actually deliver the message
2021-12-08 16:44:03 -08:00
Josh Slocum c24826430f Removing this check as it still won't work if the knob is set 2021-12-08 18:13:41 -06:00
neethuhaneesha 50ed545706
Merge pull request #6122 from neethuhaneesha/enableMetricsLogger
Enabling rocksdb metrics logger in simulation.
2021-12-08 11:00:26 -08:00
Andrew Noyes ffe51901ec
Merge pull request #6123 from sfc-gh-ajbeamon/increase-timeout-test-epsilon
Fix false positive in disconnected_timeout_test
2021-12-08 09:20:13 -08:00
A.J. Beamon 1a1f15323a When checking whether a timeout fired too early, use a larger epsilon from the target duration. 2021-12-07 18:50:49 -08:00
Zhe Wu e95ed4e9e0 Release notes for 6.3.23 2021-12-07 16:35:35 -08:00
Renxuan Wang 9f70e84a7b Remove trimFromHostname(). 2021-12-07 15:39:51 -08:00
Neethu Haneesha Bingi d23b8645f8 Enabling rocksdb metrics logger in simulation. 2021-12-07 15:18:29 -08:00
Evan Tschannen 20ee921986
Merge pull request #5923 from sfc-gh-sgwydir/minicycletest
Add MiniCycle Test
2021-12-07 11:30:16 -08:00
Andrew Noyes ef81252f31 Link libfdb_c with `-z noexecstack` 2021-12-07 10:51:10 -08:00
Sam Gwydir 31c0eef69c Add Minicycle Workload 2021-12-06 15:46:40 -08:00
Evan Tschannen fd2b27d7c4
Merge pull request #6103 from sfc-gh-ajbeamon/fix-dd-merge-too-soon
Fix: Merge too soon bug
2021-12-06 15:01:47 -08:00