foundationdb

Commit Graph

Author	SHA1	Message	Date
Jingyu Zhou	254c78053c	Fix a segfault error After wait, ServerDBInfo may have changed. Using the old copy is wrong.	2019-03-15 22:11:13 -07:00
Jingyu Zhou	12ddd56698	Fix Ratekeeper and DataDistributor placement Make sure both RateKeeper and DataDistributor are placed in the same data center as the Master. Make sure only one RateKeeper is live in the cluster as well.	2019-03-15 17:09:28 -07:00
Jingyu Zhou	bb5686eb75	Fix monitoring of DD and RK	2019-03-15 16:02:17 -07:00
Jingyu Zhou	40860e0093	Attempt to fix.	2019-03-15 11:29:04 -07:00
Jingyu Zhou	9e59c9c253	Check DataDistributor and RateKeeper fitness Fail the test if they are not put in the best fitness.	2019-03-14 16:14:57 -07:00
Evan Tschannen	044b6b4f8a	Merge branch 'master' into feature-degraded-tlog # Conflicts: # fdbserver/ClusterController.actor.cpp	2019-03-08 22:50:41 -05:00
Evan Tschannen	710a64dc4e	replaced std::pair<WorkerInterface,ProcessClass> with a struct named WorkerDetails	2019-03-08 11:25:07 -05:00
anoyes	981426bac9	More ide fixes	2019-03-05 18:03:57 -08:00
Evan Tschannen	fafb66b0a8	Merge pull request #1126 from bnamasivayam/ratelimit-consistencycheck Dynamically rate limit consistency check.	2019-02-27 14:43:09 -08:00
Balachandar Namasivayam	ab99497695	Addressed review comments.	2019-02-25 18:29:30 -08:00
Evan Tschannen	d008de576e	Merge pull request #1139 from xumengpanda/mengxu/machine-team-upgrade-PR Add background actor to remove redundant teams	2019-02-22 14:22:07 -08:00
mpilman	999ea09bfd	Use correct fwd decls in TesterInterface Also TesterInterface.h -> TesterInterface.actor.h	2019-02-19 15:16:59 -08:00
mpilman	699216f713	Use fwd decls in workloads Also workloads.h -> workloads.actor.h	2019-02-19 15:16:59 -08:00
mpilman	0bb60e5a3b	Use proper fwd decl in NativeAPI Also NativeAPI.h -> NativeAPI.actor.h	2019-02-19 15:16:59 -08:00
mpilman	3cb2391b58	use proper fwd declarations in ManagementAPI Also ManagementAPI.h -> ManagementAPI.actor.h	2019-02-19 15:16:59 -08:00
Meng Xu	b35631365f	TeamRemover: Solve conflict when merging with PR 1061 The previous commit merged with master, which had just merged pull request #1062 from jzhou77/PR that adds a new DataDistribution role. The merge causes conflicts and errors in simulation tests. This commit resolves the code conflicts and tries to fix the new errors after incorporating the new DataDistribution role	2019-02-19 08:13:10 -08:00
Meng Xu	6d09ac483c	Merge with master	2019-02-15 17:03:40 -08:00
Meng Xu	5481851e82	TeamCollection: Add knobs for team remover Added three knobs to control team remover bool TR_FLAG_DISABLE_TEAM_REMOVER: Disable the teamRemover actor double TR_REMOVE_MACHINE_TEAM_DELAY: Wait for the specified time before trying to remove the next machine team double TR_WAIT_FOR_ALL_MACHINES_HEALTHY_DELAY: Wait before checking if all machines are healthy	2019-02-13 15:11:56 -08:00
Meng Xu	3ae8767ee8	TeamCollection: Apply clang-format	2019-02-12 13:41:18 -08:00
Balachandar Namasivayam	f44f26c232	Dynamically rate limit consistency check.	2019-02-07 16:08:39 -08:00
Meng Xu	2b73c89e98	TeamCollection: Test the number of teams Call the traceTeamCollectionInfo function to record the team numbers when we add a team directly from the shard information, instead of using addTeamsBestOf logic.	2019-02-05 15:58:16 -08:00
Meng Xu	f5171d1b57	TeamCollection: Test the number of teams The current simulator does not validate whether the number of teams in the system is larger than the maximum desired number of teams. This validation should be added because we do NOT want too many teams in the system, which may impede the system's availability when multiple fault zones (e.g., machines) crash at the same time. This commit adds the test at the consistency check in simulation. Since the current code does not handle the upgrading situation when we enforce the machine teams, the test is expected to fail. A later commit will handle the upgrading situation and gracefully remove the surplus teams.	2019-02-04 18:14:36 -08:00
Evan Tschannen	699f8dd617	fix: coordinators auto could put two coordinators in the same zone simulation now tests two machines in the same zone	2019-01-18 15:42:48 -08:00
Balachandar Namasivayam	a8e2e75cd5	Re-enable CheckDesiredClasses after making necessary changes for multi-region setup. Fixed a couple of bugs 1) A rare race condition where a worker is being roles even after it died. 2) Fix how RoleFitness is calculated for TLog and LogRouter. Only worst fitness is compared to see if a better fit is available.	2019-01-10 10:28:32 -08:00
Meng Xu	8de031f9a6	TeamCollection: clang-format Format the changes with git clang-format. No functional changes. Signed-off-by: Meng Xu <meng_xu@apple.com>	2018-11-21 11:18:26 -08:00
Meng Xu	f7a7e069f0	TeamCollection: Remove unnecessary comments Pass 41806 tests with no failure Signed-off-by: Meng Xu <meng_xu@apple.com>	2018-11-16 15:56:35 -08:00
Meng Xu	5051b35c61	TeamCollection: Use machine team to create server team Current server team collection logic does not consider the fact that multiple storage servers can run on the same machine. When multiple machines fail, all servers on the machines will fail, and the possibility of having one process team fail and lose data is very high. To reduce the possibility of losing data when multiple machines fail, we first create machine teams which span across different fault zones; we then create server teams based on machine teams by first picking 1 machine team, and then picking 1 server from each machine in the machine team. Signed-off-by: Meng Xu <meng_xu@apple.com>	2018-11-16 15:53:22 -08:00
Robert Escriva	268093a96d	Adjust all includes to be relative to the root. Remove the use of relative paths. A header at foo/bar.h could be included by files under foo/ with "bar.h", but would be included everywhere else as "foo/bar.h". Adjust so that every include references such a header with the latter form. Signed-off-by: Robert Escriva <rescriva@dropbox.com>	2018-10-19 17:35:33 +00:00
Evan Tschannen	90301f497f	Merge branch 'release-6.0' # Conflicts: # fdbclient/ManagementAPI.actor.cpp # fdbrpc/FlowTransport.actor.cpp # fdbrpc/TLSConnection.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/Status.actor.cpp # fdbserver/storageserver.actor.cpp # fdbserver/workloads/StatusWorkload.actor.cpp # versions.target	2018-09-05 16:06:33 -07:00
Evan Tschannen	21f5cf9ce9	suppress spammy trace events	2018-09-04 17:12:26 -07:00
Alex Miller	fb31a6999f	Rewrite all files to have #include actorcompiler.h as the last include.	2018-08-14 15:50:26 -07:00
Alex Miller	535b5701e5	Rewrite all `Void _ = wait(...)` -> `wait(...)`. This takes advantage of the new actorcompiler functionality to avoid having duplicate definitions of `Void _` when trying to feed the un-actorcompiled source through clang.	2018-08-14 15:50:26 -07:00
Evan Tschannen	1c29275672	call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.	2018-08-01 14:30:57 -07:00
Evan Tschannen	2820b6e0bb	data inconsistency is always an error when detected by the consistency check	2018-07-09 22:26:13 -07:00
Evan Tschannen	89a4b2cd68	fix: consistency check could loop too long	2018-07-02 12:08:02 -04:00
Evan Tschannen	4a3247da69	fixed a few problems with the consistency check	2018-06-30 10:39:28 -07:00
Evan Tschannen	02f616eb68	fix: consistency check was broken when the key server key space is sharded	2018-06-28 23:16:32 -07:00
Evan Tschannen	45cf0067e4	fix: consistency check was not checking for data inconsistencies	2018-06-28 11:08:16 -07:00
Evan Tschannen	0913368651	added usable_regions to specify if we will replicate into a remote region remote replication defaults to the primary replication removed remote_logs, because they should be specified as an override in the regions object	2018-06-17 19:31:15 -07:00
A.J. Beamon	e5488419cc	Attempt to normalize trace events: * Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check. * Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase. * Use seconds instead of milliseconds in details. Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed. This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.	2018-06-08 11:11:08 -07:00
Evan Tschannen	19762b847d	Merge branch 'release-5.2' # Conflicts: # fdbserver/DatabaseConfiguration.cpp # fdbserver/SimulatedCluster.actor.cpp	2018-04-10 17:02:43 -07:00
Evan Tschannen	b95e68eb5a	fix: getDatabaseSize is really inefficient and causes slow tasks in the real world. Outside of simulation just assume the database is really large, because we only need the InvalidShardSize check in simulation	2018-03-26 17:35:11 -07:00
Evan Tschannen	65b532658f	added support for single region configurations	2018-03-15 10:59:30 -07:00
Evan Tschannen	3abf4d7fdf	Merge branch 'master' into feature-remote-logs	2018-03-09 14:50:04 -08:00
Evan Tschannen	91bb8faa45	Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa' # Conflicts: # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-03-09 14:47:03 -08:00
Evan Tschannen	cf6dd1437b	suppress spammy trace events	2018-03-09 10:16:34 -08:00
Balachandar Namasivayam	e7309a3535	Add trace events to print the ranges in ConsistencyCheck.	2018-03-08 13:53:59 -08:00
Balachandar Namasivayam	4f58bca66a	Simple refactor of code...	2018-03-08 11:34:25 -08:00
Balachandar Namasivayam	1c1a497ea2	Refactor getKeyServers to be more readable. Fix possible memory corruption by returning KeyRange instead of KeyRangeRef in getKeyServers. Simplify getMasterProxies on DatabaseContext class.	2018-03-08 11:34:18 -08:00
Balachandar Namasivayam	03a40354e3	Having 1000 as the limit for GetKeyServerLocationsRequest sometimes generates large packet warnings. Reduce it to 100. Fix the bug where some of the key server shards may not be fetched.	2018-03-08 11:34:11 -08:00

79 Commits
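
Note on commit 5051b35c61: the two-step team construction described in that message (build machine teams that span distinct fault zones, then pick one machine team and one server from each of its machines) can be illustrated with a small sketch. This is not the FoundationDB implementation; the topology, the `MACHINES` table, and the function names `build_machine_teams` and `build_server_team` are hypothetical and exist only to make the idea concrete. It also assumes, for illustration, that a fault zone can contain more than one machine.

```python
import random
from itertools import combinations

# Hypothetical toy topology (an assumption for illustration only):
# machine id -> (fault zone, storage servers running on that machine).
MACHINES = {
    "m1": ("zoneA", ["s1", "s2"]),
    "m2": ("zoneB", ["s3", "s4"]),
    "m3": ("zoneC", ["s5", "s6"]),
    "m4": ("zoneA", ["s7"]),
}


def build_machine_teams(machines, team_size):
    """Return every group of `team_size` machines whose fault zones are all distinct."""
    teams = []
    for group in combinations(sorted(machines), team_size):
        zones = {machines[m][0] for m in group}
        if len(zones) == team_size:  # no two machines share a fault zone
            teams.append(group)
    return teams


def build_server_team(machines, machine_teams, rng):
    """Pick one machine team, then one server from each machine in that team."""
    machine_team = rng.choice(machine_teams)
    return [rng.choice(machines[m][1]) for m in machine_team]


machine_teams = build_machine_teams(MACHINES, team_size=3)
server_team = build_server_team(MACHINES, machine_teams, random.Random(0))
print(machine_teams)
print(server_team)
```

Because every server team is drawn from a single machine team, no two servers in it share a machine, so a single machine failure removes at most one replica of that team's data. The sketch enumerates all valid machine teams for simplicity; a real system would bound the number of teams, as the later team-remover commits in this log suggest.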