Jingyu Zhou
3379f1e974
Move resolutionBalancing() back to master
...
This revert the behavior done by a recent refactor on master recovery in PR #6191 .
2022-03-23 09:57:31 -07:00
sfc-gh-tclinkenbeard
a71099471b
Update copyright header dates
2022-03-21 13:36:23 -07:00
A.J. Beamon
e882eb33fc
Abstract the cluster file into a cluster connection record that can be backed by something other than the filesystem.
2021-10-22 11:05:18 -07:00
Chaoguang Lin
65956ae6b7
Refactor configure command; refactor changeConfig to template code to reuse existing tests
2021-09-21 10:06:04 -07:00
Xiaoge Su
abf73047ca
Enforce std:: specifier rather than using namespace
2021-09-16 19:40:28 -07:00
Josh Slocum
6aabd9a03e
Adding FIXME for simulation issue
2021-08-17 18:18:28 -05:00
Steve Atherton
507c1f11e3
Add .log() to bare TraceEvent() invocations without any .detail()s to avoid clang-tidy warning about immediate destruction of object without use.
2021-07-26 19:55:10 -07:00
FDB Formatster
df90cc89de
apply clang-format to *.c, *.cpp, *.h, *.hpp files
2021-03-10 10:18:07 -08:00
sfc-gh-tclinkenbeard
7f0d14c8e4
Modernize/refactor workloads directory
2020-10-04 22:29:07 -07:00
Balachandar Namasivayam
a5af31de23
Addressed simple review comments
2020-03-31 18:34:13 -07:00
Balachandar Namasivayam
b1c3893d40
Fix some corner case bugs exposed by simulation.
...
In one case, when a SS joins the cluster and DD doesn't find any healthy server to form a team with the newly added server, then the SS does not get added to any team even when the other servers get healthy.
Another is an extreme case where a data center is down, and a SS in the active DC joins and then dies immediately but not before DD adds it to a destination team for a relocating shard which will result in DD waiting indefinitely for the dead data center to come back up for the cluster to be fully recovered.
2020-03-31 18:33:12 -07:00
Evan Tschannen
4a866290b7
Clients keep a persistent connection open with coordinators to get updates to the list of proxies
...
Status still needs to be updated with client information with information from the coordinators
2019-07-23 19:22:44 -07:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
Andrew Noyes
d7612a4426
Fix OPEN_FOR_IDE build errors
2019-04-05 16:30:42 -07:00
Evan Tschannen
b8910ba7cd
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.h
# fdbserver/DataDistribution.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Evan Tschannen
27e3617548
fix: remove bad teams needed to use dd_stall_check delay, because in simulation the buggified delay time could make us remove bad teams before they submit their ranges to the queue
2019-02-20 14:18:36 -08:00
mpilman
999ea09bfd
Use correct fwd decls in TesterInterface
...
Also TesterInterface.h -> TesterInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
699216f713
Use fwd decls in workloads
...
Also workloads.h -> workloads.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3f0fd2a20c
Use fwd decls in WorkerInterface
...
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
0bb60e5a3b
Use proper fwd decl in NativeAPI
...
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3cb2391b58
use proper fwd declarations in ManagementAPI
...
Also ManagementAPI.h -> ManagementAPI.actor.h
2019-02-19 15:16:59 -08:00
Evan Tschannen
8ed89fd711
fixed review comments
2019-02-19 11:26:53 -08:00
Evan Tschannen
ed9e20ce17
forgot to fix merge conflicts
2019-02-18 17:09:55 -08:00
Evan Tschannen
065a45e05f
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Evan Tschannen
62603d11a1
updated the killRegion simulation test to test a much larger variety of failure scenarios
2019-02-18 15:32:51 -08:00
Andrew Noyes
067a445e06
Replace unused _ variables with wait(success(...))
2019-02-12 17:30:30 -08:00
Evan Tschannen
4b5d0b4e2c
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/AsyncFileBlobStore.actor.cpp
# fdbclient/AsyncFileBlobStore.actor.h
# fdbclient/BlobStore.actor.cpp
# fdbclient/BlobStore.h
# fdbclient/HTTP.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/LoadBalance.actor.h
# fdbrpc/batcher.actor.h
# fdbrpc/fdbrpc.vcxproj
# fdbrpc/sim2.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen
c02690471d
added protection against configuration changes which cannot be immediately reverted
...
the configure database workload tests region configurations
2018-11-04 19:53:55 -08:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen
3922e477a5
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/LogSystemDiskQueueAdapter.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-10-03 16:57:18 -07:00
Evan Tschannen
c9f4109539
fix: add some additional time in the kill region workload to detect if we recovered successfully
2018-10-02 17:47:15 -07:00
Evan Tschannen
b560b94ebc
fix: do not force a recovery if the master was already in the other region (and therefore already recovered)
...
fix: reboot the remaining DC, because any storage server rejoins that were rolled back will cause that server to be unusable
2018-09-28 12:10:04 -07:00
Evan Tschannen
200e65fe61
added a workload which tests killing an entire region, and recovering from the failure with data loss.
...
fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore
fix: we have to modify all of history, we cannot stop after finding a local remote
2018-09-17 18:32:39 -07:00