Meng Xu
5051b35c61
TeamCollection: Use machine team to create server team
...
Current server team collection logic does not consider
the fact that multipe storage servers can run on the same machine.
When multiple machines fail, all servers on the machines will fail, and
the possibility of having one process team fail and lose data is very high.
To reduce the possibility of losing data when multiple machine fails,
we first create machine teams which span across different fault zones;
we then create server teams based on machine teams by
first picking 1 machine team, and then
picking 1 server from each machine in the machine team.
Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:53:22 -08:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen
90301f497f
Merge branch 'release-6.0'
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/TLSConnection.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/StatusWorkload.actor.cpp
# versions.target
2018-09-05 16:06:33 -07:00
Evan Tschannen
21f5cf9ce9
suppress spammy trace events
2018-09-04 17:12:26 -07:00
Alex Miller
fb31a6999f
Rewrite all files to have #include actorcompiler.h as the last include.
2018-08-14 15:50:26 -07:00
Alex Miller
535b5701e5
Rewrite all `Void _ = wait(...)` -> `wait(...)`.
...
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen
1c29275672
call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.
2018-08-01 14:30:57 -07:00
Evan Tschannen
2820b6e0bb
data inconsistency is always an error when detected by the consistency check
2018-07-09 22:26:13 -07:00
Evan Tschannen
89a4b2cd68
fix: consistency check could loop too long
2018-07-02 12:08:02 -04:00
Evan Tschannen
4a3247da69
fixed a few problems with the consistency check
2018-06-30 10:39:28 -07:00
Evan Tschannen
02f616eb68
fix: consistency check was broken when the key server key space is sharded
2018-06-28 23:16:32 -07:00
Evan Tschannen
45cf0067e4
fix: consistency check was not checking for data inconsistencies
2018-06-28 11:08:16 -07:00
Evan Tschannen
0913368651
added usable_regions to specify if we will replicate into a remote region
...
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
...
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Evan Tschannen
19762b847d
Merge branch 'release-5.2'
...
# Conflicts:
# fdbserver/DatabaseConfiguration.cpp
# fdbserver/SimulatedCluster.actor.cpp
2018-04-10 17:02:43 -07:00
Evan Tschannen
b95e68eb5a
fix: getDatabaseSize is really inefficient and causes slow tasks in the real world. Outside of simulation just assume the database is really large, because we only need the InvalidShardSize check in simulation
2018-03-26 17:35:11 -07:00
Evan Tschannen
65b532658f
added support for single region configurations
2018-03-15 10:59:30 -07:00
Evan Tschannen
3abf4d7fdf
Merge branch 'master' into feature-remote-logs
2018-03-09 14:50:04 -08:00
Evan Tschannen
91bb8faa45
Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa'
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-03-09 14:47:03 -08:00
Evan Tschannen
cf6dd1437b
suppress spammy trace events
2018-03-09 10:16:34 -08:00
Balachandar Namasivayam
e7309a3535
Add trace events to print the ranges in ConsistencyCheck.
2018-03-08 13:53:59 -08:00
Balachandar Namasivayam
4f58bca66a
Simple refactor of code...
2018-03-08 11:34:25 -08:00
Balachandar Namasivayam
1c1a497ea2
Refactor getKeyServers to be more readable.
...
Fix possible memory corruption by returning KeyRange instead of KeyRangeRef in getKeyServers.
Simplify getMasterProxies on DatabaseContext class.
2018-03-08 11:34:18 -08:00
Balachandar Namasivayam
03a40354e3
Having 1000 as the limit for Limit for GetKeyServerLocationsRequest sometimes generate large packet warnings. Reduce it to 100.
...
Fix the bug where some of the key server shards may not be fetched.
2018-03-08 11:34:11 -08:00
Evan Tschannen
1194e3a361
added region-based configuration to support a large variety of fearless setups. Currently only 1 primary 1 remote setups are allowed.
2018-03-05 19:27:46 -08:00
Evan Tschannen
470f5c01f3
changed remoteDcId to a vector of ids, to support future configurations where there are multiple remote databases
2018-02-26 17:09:09 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Evan Tschannen
719bb5bd0c
Merge pull request #4 from bnamasivayam/getKeyServers-refactor
...
Having 1000 as the limit for Limit for GetKeyServerLocationsRequest s…
2018-02-22 12:39:48 -08:00
Balachandar Namasivayam
2fe2b522d5
Simple refactor of code...
2018-02-22 12:38:14 -08:00
Balachandar Namasivayam
e2030db5a8
Refactor getKeyServers to be more readable.
...
Fix possible memory corruption by returning KeyRange instead of KeyRangeRef in getKeyServers.
Simplify getMasterProxies on DatabaseContext class.
2018-02-21 17:11:50 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Balachandar Namasivayam
6218934c7b
Having 1000 as the limit for Limit for GetKeyServerLocationsRequest sometimes generate large packet warnings. Reduce it to 100.
...
Fix the bug where some of the key server shards may not be fetched.
2018-02-20 17:41:34 -08:00
Evan Tschannen
1fedcba890
fix: do not use log router tags when configured without remote logs
...
fix: data distribution tracks undesired storage servers
re-enabled consistency check
2018-02-13 17:01:34 -08:00
Evan Tschannen
1dc9eceb6d
optimize GetKeyLocationRequests on the proxy so they only require a single map lookup, instead of doing 3 + (3* [number of ranges]) lookups
2017-12-15 20:13:44 -08:00
Evan Tschannen
73a0a07eac
clients ask for key location information directly from the proxy, instead of reading it from the database
2017-12-09 16:10:22 -08:00
Yichi Chiang
8ba0eaebff
Check cluster controller using desired process class in consistency check
2017-11-29 15:09:23 -08:00
Evan Tschannen
57aba0b3bc
fix: excluded servers were the same fitness as storage servers for the master role
...
fix: better master exists did not considers exclusion for master fitness
2017-11-03 17:09:14 -07:00
Yichi Chiang
c2a117fe07
Merge pull request #189 from cie/enable-check-desired-class
...
Enable checkUsingDesiredClasses() in consistency check
2017-10-24 15:18:21 -07:00
Yichi Chiang
3865c5ae0e
Enable checkUsingDesiredClasses() in consistency check
2017-10-24 12:58:54 -07:00
Evan Tschannen
ef41b07bb3
renamed past_version to transaction_too_old
...
implemented read_lock_aware option
2017-09-28 16:35:08 -07:00
Evan Tschannen
7b60e26660
Merge pull request #160 from cie/use-error-descriptions
...
Add the ability to access name and description in Error. Update error…
2017-09-28 16:00:39 -07:00
Evan Tschannen
73fca75239
added the ability to disable timeKeeper; disabled timeKeeper before consistency check in simulation
2017-09-28 13:13:24 -07:00
A.J. Beamon
d30c730f75
Add the ability to access name and description in Error. Update error descriptions.
2017-09-28 12:35:03 -07:00
Evan Tschannen
d61be4c760
Merge branch 'release-5.0'
2017-08-30 12:59:24 -07:00
Evan Tschannen
963e1c3f31
fix: we need to reboot the process even if it will result in too many files, because the check will not succeed without it
2017-08-30 12:58:46 -07:00
Alvin Moore
6020d70863
Added trace event to track reboots initiated by ConsistencyCheck workload in simulation
2017-08-29 11:41:27 -07:00
Alvin Moore
c95a1be5ec
Add trace event for rebooting process during simulation for consistency check
2017-08-29 11:00:44 -07:00
Alec Grieser
300b5a17ed
Merge branch 'release-5.0'
2017-08-25 18:55:33 -07:00
Evan Tschannen
272b4b984c
fix: fixed a rare bug where we do not wait for a file in the process of being deleted to shutdown before rebooting a machine
2017-08-25 10:12:58 -07:00
Alec Grieser
ca7437ecf6
Merge branch 'release-5.0'
2017-08-02 22:07:01 -07:00
John King
d0fbc41338
set LOCK_AWARE on several transactions used for getting cluster info for the consistency check
2017-07-28 18:50:32 -07:00
Yichi Chiang
53e1ae9f60
shard system keyspace
2017-07-26 13:47:31 -07:00
FDB Dev Team
a674cb4ef4
Initial repository commit
2017-05-25 13:48:44 -07:00