foundationdb

Commit Graph

Author	SHA1	Message	Date
Meng Xu	3ae8767ee8	TeamCollection: Apply clang-format	2019-02-12 13:41:18 -08:00
Meng Xu	7cfe6de27e	TeamCollection: Server team number must match machine team number. DESIRED_TEAMS_PER_MACHINE must be equal to DESIRED_TEAMS_PER_SERVER. Otherwise, we may have too few machine teams to create enough server teams. Note that the BUGGIFY macro value is based on a random number generator. When you have two BUGGIFY, one may be true and the other false. Also fix a bug in getting the number of healthy machine teams.	2019-02-07 13:53:55 -08:00
Meng Xu	76d022f71c	TeamCollection: Remove redundant teams. When the total number of teams is larger than the desired number, we should gracefully remove the redundant teams so that the number of teams is kept to a low number and the possibility of losing data is guaranteed to be extremely low even when multiple racks fail at the same time.	2019-02-07 11:24:51 -08:00
Meng Xu	455024b3fe	SimulationTest: Test the number of teams. Magnify the possibility that the number of created machine teams is larger than the number of desired machine teams if we do NOT try to remove the surplus machine teams. This helps test the upgrade to machine team in FDB 6.1	2019-02-06 11:04:41 -08:00
Meng Xu	2b73c89e98	TeamCollection: Test the number of teams. Call the traceTeamCollectionInfo function to record the team numbers when we add a team directly from the shard information, instead of using addTeamsBestOf logic.	2019-02-05 15:58:16 -08:00
Meng Xu	f5171d1b57	TeamCollection: Test the number of teams. The current simulator does not validate whether the number of teams in the system is larger than the maximum desired number of teams. This validation should be added because we do NOT want too many teams in the system, which may impede the system's availability when multiple fault zones (e.g., machines) crash at the same time. This commit adds the test at the consistency check in simulation. Since the current code does not handle the upgrading situation when we enforce the machine teams, the test is expected to fail. A later commit will handle the upgrading situation by gracefully removing the surplus teams.	2019-02-04 18:14:36 -08:00
Evan Tschannen	4b5d0b4e2c	Merge branch 'release-6.0' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbclient/AsyncFileBlobStore.actor.cpp # fdbclient/AsyncFileBlobStore.actor.h # fdbclient/BlobStore.actor.cpp # fdbclient/BlobStore.h # fdbclient/HTTP.actor.cpp # fdbclient/ManagementAPI.actor.cpp # fdbclient/NativeAPI.actor.cpp # fdbrpc/LoadBalance.actor.h # fdbrpc/batcher.actor.h # fdbrpc/fdbrpc.vcxproj # fdbrpc/sim2.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/DataDistributionTracker.actor.cpp # fdbserver/SimulatedCluster.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/masterserver.actor.cpp	2018-11-10 13:04:24 -08:00
Evan Tschannen	3e2484baf7	fix: a team tracker could downgrade the priority of a relocation issued by the team tracker for the other region	2018-11-09 10:07:55 -08:00
Evan Tschannen	c02690471d	added protection against configuration changes which cannot be immediately reverted the configure database workload tests region configurations	2018-11-04 19:53:55 -08:00
Robert Escriva	268093a96d	Adjust all includes to be relative to the root. Remove the use of relative paths. A header at foo/bar.h could be included by files under foo/ with "bar.h", but would be included everywhere else as "foo/bar.h". Adjust so that every include references such a header with the latter form. Signed-off-by: Robert Escriva <rescriva@dropbox.com>	2018-10-19 17:35:33 +00:00
Evan Tschannen	1314bcec9e	Merge branch 'release-6.0' # Conflicts: # documentation/sphinx/source/release-notes.rst	2018-10-05 12:54:00 -07:00
Evan Tschannen	daed31708b	fix: we can only repair dead DCs if we have a fearless configuration	2018-10-05 12:35:37 -07:00
A.J. Beamon	2a97139d5d	This is the first step in eliminating the usage of database names in our code. The C API remains the same, but underneath that all usage of database names is eliminated.	2018-08-16 10:24:12 -07:00
Alex Miller	fb31a6999f	Rewrite all files to have #include actorcompiler.h as the last include.	2018-08-14 15:50:26 -07:00
Alex Miller	535b5701e5	Rewrite all `Void _ = wait(...)` -> `wait(...)`. This takes advantage of the new actorcompiler functionality to avoid having duplicate definitions of `Void _` when trying to feed the un-actorcompiled source through clang.	2018-08-14 15:50:26 -07:00
Evan Tschannen	9c918a28f6	fix: status was reporting no replicas remaining when the remote datacenter was initially configured with usable_regions=2	2018-08-09 13:16:09 -07:00
Evan Tschannen	1c29275672	call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.	2018-08-01 14:30:57 -07:00
Evan Tschannen	f72a9f60c0	only disable fearless if a datacenter has actually been killed fix: we must prevent recovery into the dead datacenter while reducing usable_regions	2018-07-16 10:06:57 -07:00
Evan Tschannen	d42c9914d2	fix: future quiet databases need to be able to continue the reconfigure if the first one completes the repopulate but is cancelled before changing usable_regions	2018-07-08 19:56:55 -07:00
Evan Tschannen	ce6b0d4952	fix: consistency check must also configure usable regions to 1, because the remote log set might not be able to copy data	2018-07-08 18:25:01 -07:00
Evan Tschannen	cd4fb9285a	waitForExlusion requires both regions to be healthy, which is only possible if we do not kill all logs in a region	2018-07-05 14:04:42 -07:00
Evan Tschannen	507b3bacb0	fix: kill all tlogs in one region prevents the remote logs from recovering in that region, do not allow that to prevent us from configuring usable_regions=1. added more recovery states.	2018-07-05 00:08:51 -07:00
Evan Tschannen	e17dfea3b6	fix: desiredTLogCount was used instead of getDesiredLogs(), which caused problems with recruitment when desiredTLogCount was -1. canKillProcess logic was wrong. We still need to configure usable_regions because if datacenterVersionDifference is too large we cannot complete data movement.	2018-07-04 16:22:32 -04:00
Evan Tschannen	ea3365dc38	fix: quiet database only needs to use repopulate_anti_quorum instead of reducing usable_regions	2018-07-04 02:52:00 -04:00
A.J. Beamon	9f545ce002	Merge commit '892727e358c0b3f075564c60c2b7cedb64306f83' into trace-log-refactor	2018-06-26 11:37:23 -07:00
Evan Tschannen	0913368651	added usable_regions to specify if we will replicate into a remote region remote replication defaults to the primary replication removed remote_logs, because they should be specified as an override in the regions object	2018-06-17 19:31:15 -07:00
A.J. Beamon	0ca51989bb	Merge branch 'master' into trace-log-refactor # Conflicts: # fdbserver/QuietDatabase.actor.cpp # fdbserver/Status.actor.cpp # flow/Trace.cpp	2018-06-08 13:24:30 -07:00
A.J. Beamon	e5488419cc	Attempt to normalize trace events: * Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check. * Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase. * Use seconds instead of milliseconds in details. Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed. This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.	2018-06-08 11:11:08 -07:00
A.J. Beamon	78839b20fd	Merge branch 'master' into trace-log-refactor # Conflicts: # flow/Trace.cpp	2018-05-31 10:46:20 -07:00
A.J. Beamon	ce0c991e78	Refactor trace events to store a vector of fields that aren't encoded until write time. Better support for pre-network trace events. Rework how trace events are queried. Some initial work towards pluggable formatting of logs.	2018-05-02 10:44:38 -07:00
Evan Tschannen	656a817e74	fix: only reconfigure during the quiet database check, because excluding at the same time as reconfiguring causes the master to indefinitely restart recovery	2018-05-01 15:31:49 -07:00
Alec Grieser	551ea9c7f8	Merge remote-tracking branch 'upstream/release-5.2' into master-release-5.2-merge	2018-03-19 12:34:50 -07:00
Alec Grieser	70a05c1a9b	fix some compiler whinges	2018-03-13 15:00:16 -07:00
A.J. Beamon	f2c804e14f	Reverting changes from merge of master into release-5.2 (`b25810711c`). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit.	2018-03-06 10:15:04 -08:00
Evan Tschannen	37a6a81634	Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs # Conflicts: # fdbserver/workloads/RestartRecovery.actor.cpp	2018-02-23 12:33:28 -08:00
Alec Grieser	0bae9880f1	remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py	2018-02-21 10:25:11 -08:00
Evan Tschannen	1b5628d2c5	testing a single configured fearless setup in simulated cluster consolidated simulation connection disablers into one call in the tester automatically reconfigure from a fearless setup in simulation	2018-02-18 12:59:43 -08:00
Evan Tschannen	645dc5ead6	warmRange needs to get a read version occasionally to prevent it from overwhelming the proxy quietDatabase waits for all data distribution to be completely finished so that databases are cached in a cleaner state	2018-01-14 12:50:52 -08:00
A.J. Beamon	bb1297c686	Remove RkServerQueueInfo and RkTLogQueueInfo trace events, since this information is more or less already logged on the storage servers and tlogs. Update the quiet database check and magnesium to use the information from the logs and storage servers.	2017-11-14 12:59:42 -08:00
Yichi Chiang	3865c5ae0e	Enable checkUsingDesiredClasses() in consistency check	2017-10-24 12:58:54 -07:00
Evan Tschannen	e8b895c878	added the ability to disable connection failures for a period of time after one happens	2017-09-18 12:46:29 -07:00
John King	d0fbc41338	set LOCK_AWARE on several transactions used for getting cluster info for the consistency check	2017-07-28 18:50:32 -07:00
FDB Dev Team	a674cb4ef4	Initial repository commit	2017-05-25 13:48:44 -07:00

43 Commits