Evan Tschannen
200e65fe61
added a workload which tests killing an entire region, and recovering from the failure with data loss.
...
fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore
fix: we have to modify all of history, we cannot stop after finding a local remote
2018-09-17 18:32:39 -07:00
Evan Tschannen
4dd2dda0a3
Merge branch 'release-6.0'
...
# Conflicts:
# fdbserver/worker.actor.cpp
2018-09-05 16:11:06 -07:00
A.J. Beamon
2de0b5d6d7
Add the roles running on a process as a field on trace events in the form of a comma-delimited string of role abbreviations.
2018-09-05 15:06:14 -07:00
A.J. Beamon
2a97139d5d
This is the first step in eliminating the usage of database names in our code. The C API remains the same, but underneath that all usage of database names is eliminated.
2018-08-16 10:24:12 -07:00
Alex Miller
86dbe1f0e9
Fix more instances of actorcompiler.h being in the wrong place.
2018-08-14 15:50:26 -07:00
Alex Miller
fb31a6999f
Rewrite all files to have #include actorcompiler.h as the last include.
2018-08-14 15:50:26 -07:00
Alex Miller
535b5701e5
Rewrite all `Void _ = wait(...)` -> `wait(...)`.
...
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen
1c29275672
call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.
2018-08-01 14:30:57 -07:00
Evan Tschannen
f72a9f60c0
only disable fearless if a datacenter has actually been killed
...
fix: we must prevent recovery into the dead datacenter while reducing usable_regions
2018-07-16 10:06:57 -07:00
Evan Tschannen
d42c9914d2
fix: future quiet databases need to be able to continue the reconfigure if the first one completes the repopulate but is cancelled before changing usable_regions
2018-07-08 19:56:55 -07:00
Evan Tschannen
ce6b0d4952
fix: consistency check must also configure usable regions to 1, because the remote log set might not be able to copy data
2018-07-08 18:25:01 -07:00
Evan Tschannen
866ccfe344
added the ability to allow the master to finish recovery before all storage servers in both regions have their mutations. This allows you to recover from scenarios where you lose all your tlogs in one dc.
2018-07-04 01:59:04 -04:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
...
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
A.J. Beamon
f1d389448c
Merge pull request #453 from apple/release-5.2
...
Merge release-5.2 into master
2018-06-08 10:41:44 -07:00
A.J. Beamon
6461478695
Merge pull request #452 from apple/release-5.1
...
Merge release-5.1 into release-5.2
2018-06-08 10:41:13 -07:00
A.J. Beamon
c9543791fd
Fix case of newSeverity detail in StderrSeverity trace event
2018-06-08 10:24:12 -07:00
Evan Tschannen
7ed64c821e
fix: recruiting a cluster controller takes longer after restarting tests because we wait until files have recovered from disk before starting
2018-05-05 17:20:48 -07:00
Evan Tschannen
656a817e74
fix: only reconfigure during the quiet database check, because excluding at the same time as reconfiguring causes the master to indefinitely restart recovery
2018-05-01 15:31:49 -07:00
yichic
ede5cab192
Merge pull request #89 from yichic/share-log-mutations-5.2
...
Share log mutations 5.2
2018-03-19 12:01:26 -07:00
Yichi Chiang
d6559b144f
Share log mutations between backups and DRs which have the same backup range
2018-03-19 11:32:50 -07:00
Yichi Chiang
26b93ff920
Share log mutations between backups and DRs which have the same backup range
2018-03-16 18:09:23 -07:00
Evan Tschannen
68606c7984
fix: sim2 logic for when a kill is safe was incorrect
2018-03-06 18:38:05 -08:00
A.J. Beamon
f2c804e14f
Reverting changes from merge of master into release-5.2 (b25810711c). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit.
2018-03-06 10:15:04 -08:00
Evan Tschannen
1194e3a361
added region-based configuration to support a large variety of fearless setups. Currently only 1 primary 1 remote setups are allowed.
2018-03-05 19:27:46 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Evan Tschannen
1b5628d2c5
testing a single configured fearless setup in simulated cluster
...
consolidated simulation connection disablers into one call in the tester
automatically reconfigure from a fearless setup in simulation
2018-02-18 12:59:43 -08:00
Evan Tschannen
1fedcba890
fix: do not use log router tags when configured without remote logs
...
fix: data distribution tracks undesired storage servers
re-enabled consistency check
2018-02-13 17:01:34 -08:00
Evan Tschannen
21482a45e1
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DBCoreState.h
# fdbserver/LogSystem.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Evan Tschannen
645dc5ead6
warmRange needs to get a read version occasionally to prevent it from overwhelming the proxy
...
quietDatabase waits for all data distribution to be completely finished so that databases are cached in a cleaner state
2018-01-14 12:50:52 -08:00
Evan Tschannen
5ac4f73978
Merge branch 'release-5.1' into feature-remote-logs
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/Locality.h
# fdbrpc/simulator.h
# fdbserver/ApplyMetadataMutation.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/masterserver.actor.cpp
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Yichi Chiang
defdc6550d
Exclude excluded processes when getting testers
2017-10-24 15:16:34 -07:00
Yichi Chiang
3865c5ae0e
Enable checkUsingDesiredClasses() in consistency check
2017-10-24 12:58:54 -07:00
Evan Tschannen
7a36fd2134
disabled a variety of simulation tests to get correctness clean
2017-10-19 15:49:54 -07:00
Evan Tschannen
e8b895c878
added the ability to disable connection failures for a period of time after one happens
2017-09-18 12:46:29 -07:00
Evan Tschannen
34f987f56d
added a test in simulation which ensures that a recovery after a single failure takes less than 15 seconds
2017-09-15 17:55:01 -07:00
Evan Tschannen
dc1f7ca6b7
testers now use client locality load balancing
2017-09-01 12:53:01 -07:00
FDB Dev Team
a674cb4ef4
Initial repository commit
2017-05-25 13:48:44 -07:00