foundationdb

Commit Graph

Author	SHA1	Message	Date
Evan Tschannen	1818aab205	Apply suggestions from code review Co-Authored-By: jzhou77 <jingyuzhou@gmail.com>	2019-02-14 16:30:13 -08:00
Jingyu Zhou	886e7ab2ba	Add a new DataDistributor role. Let cluster controller to start a new data distributor role by sending a message to a chosen worker. Change MasterInterface usage in DataDistribution to masterId Add DataDistributor rejoin handling. This allows the data distributor to tell the new cluster controller of its existence so that the controller doesn't spawn a new one. I.e., there should be only ONE data distributor in the cluster. If DataDistributor (DD) doesn't join in a while, then ClusterController (CC) tries to recruit one as DD. CC also monitors DD and restarts one if it failed. The Proxy is also monitoring the DD. If DD failed, the Proxy will ask CC for the new DD. Add GetRecoveryInfo RPC to master server, which is called by data distributor to obtain the recovery Transaction version from the master server.	2019-02-14 16:30:13 -08:00
Vishesh Yadav	907446d0ce	Merge remote-tracking branch 'apple/master' into task/tls-upgrade	2019-02-14 11:37:38 -08:00
A.J. Beamon	b435d51061	Merge branch 'master' into track-server-request-latencies	2019-02-14 08:07:32 -08:00
Evan Tschannen	e9ddd94e27	The failure monitor is given a list of all IP addresses associated with a process The connect packet includes the correct remote address Did a lot of code cleanup Simulation test mixed TLS and non-TLS listeners on the same process	2019-01-31 18:20:14 -08:00
Evan Tschannen	a678f778fa	Increase severity to SevWarnAlways for TooManyStatusRequests trace Co-Authored-By: tclinken <trevorclinkenbeard@gmail.com>	2019-01-28 17:50:50 -08:00
Trevor Clinkenbeard	5b89db811a	Throttle status requests with MAX_STATUS_REQUESTS_PER_SECOND knob, whenever status batching is used.	2019-01-28 15:37:30 -08:00
Evan Tschannen	1d7fec3074	Merge commit '048bfc5c368063d9e009513078dab88be0cbd5b0' into task/tls-upgrade-2 # Conflicts: # .gitignore	2019-01-24 17:43:06 -08:00
A.J. Beamon	2198d24ce1	Merge commit '3b2700d25334c53d13496ca16682642aac951beb' into track-server-request-latencies # Conflicts: # fdbclient/MasterProxyInterface.h # fdbserver/ClusterController.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/ServerDBInfo.h # fdbserver/Status.actor.cpp # fdbserver/fdbserver.vcxproj # fdbserver/storageserver.actor.cpp	2019-01-24 11:43:26 -08:00
A.J. Beamon	8e05e95045	Added the ability to configure the latency band settings by setting a special key in \xff keyspace.	2019-01-18 16:18:34 -08:00
Evan Tschannen	7dbf06162e	Update fdbserver/ClusterController.actor.cpp Co-Authored-By: bnamasivayam <36455962+bnamasivayam@users.noreply.github.com>	2019-01-14 16:57:00 -08:00
Balachandar Namasivayam	ff661bca22	Fix a minor bug in the RoleFitness Class.	2019-01-14 14:54:54 -08:00
Balachandar Namasivayam	a8e2e75cd5	Re-enable CheckDesiredClasses after making necessary changes for multi-region setup. Fixed a couple of bugs 1) A rare race condition where a worker is being roles even after it died. 2) Fix how RoleFitness is calculated for TLog and LogRouter. Only worst fitness is compared to see if a better fit is available.	2019-01-10 10:28:32 -08:00
Vishesh Yadav	3eb9b23024	Listen to multiple addresses and start using vector<NetworkAdddress> in Endpoint - This patch will make FDB listen to multiple addresses given via command line. Although, we'll still use first address in most places, this patch starts using vector<NetworkAddress> in Endpoint at some basic places. - When sending packets to an endpoint, pick a random network address in endpoints - Renames Endpoint::address to Endpoint::addresses since it now holds a vector of addresses.	2018-12-13 13:36:52 -08:00
Vishesh Yadav	43e5a46f9b	Change Endpoint::address(NetworkAddress) to vector<NetworkAddress> Extend `Endpoint` class to take multiple NetworkAddresses instead of just one. Hence, to talk to an endpoint instead of one IP:PORT, we'll have multiple IP:PORT pairs. This patch simply adds the field and makes changes to compile the codebase. The first element of of `address` field is used everywhere. Hence the way we talk to remains same with this patch. NOTE: Directly accessing the first memeber of Endpoint::address is unsafe as Endpoint() doesn't enforces non-empty address list. However, since the correctness test pass for now and are anyway replacing all those unsafe accesses with ones considering the whole vector, this patch ignores to access them in safe way.	2018-12-13 13:36:52 -08:00
Evan Tschannen	4b5d0b4e2c	Merge branch 'release-6.0' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbclient/AsyncFileBlobStore.actor.cpp # fdbclient/AsyncFileBlobStore.actor.h # fdbclient/BlobStore.actor.cpp # fdbclient/BlobStore.h # fdbclient/HTTP.actor.cpp # fdbclient/ManagementAPI.actor.cpp # fdbclient/NativeAPI.actor.cpp # fdbrpc/LoadBalance.actor.h # fdbrpc/batcher.actor.h # fdbrpc/fdbrpc.vcxproj # fdbrpc/sim2.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/DataDistributionTracker.actor.cpp # fdbserver/SimulatedCluster.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/masterserver.actor.cpp	2018-11-10 13:04:24 -08:00
Evan Tschannen	04fa2a7202	fix: we could recover in a region with priority < 0	2018-11-05 10:14:26 -08:00
Evan Tschannen	87295cc263	suppressed spammy trace events, and avoid reporting a long master recovery duration when the cluster is first created	2018-11-04 23:07:56 -08:00
Evan Tschannen	c1bd279a4e	addressed review comments	2018-11-04 20:26:23 -08:00
Evan Tschannen	accba4fa1d	keep track of the last time a process became available to set a better starting value for remoteStartTime	2018-11-04 14:33:03 -08:00
Evan Tschannen	30fbc29af1	Renamed TimeKeeperStarted to TimeKeeperCommit	2018-11-02 12:57:03 -07:00
Evan Tschannen	278dbd5096	call debug transaction on timekeeper	2018-11-02 12:56:29 -07:00
Robert Escriva	268093a96d	Adjust all includes to be relative to the root. Remove the use of relative paths. A header at foo/bar.h could be included by files under foo/ with "bar.h", but would be included everywhere else as "foo/bar.h". Adjust so that every include references such a header with the latter form. Signed-off-by: Robert Escriva <rescriva@dropbox.com>	2018-10-19 17:35:33 +00:00
Evan Tschannen	3922e477a5	Merge branch 'release-6.0' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbclient/ManagementAPI.actor.cpp # fdbserver/ClusterController.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/LogSystemDiskQueueAdapter.actor.cpp # fdbserver/SimulatedCluster.actor.cpp # fdbserver/TLogServer.actor.cpp	2018-10-03 16:57:18 -07:00
Evan Tschannen	3fdf72c626	fix: we need to force recovery if the master is still attempting to read the txs tag	2018-09-28 13:33:33 -07:00
Evan Tschannen	22e6afbb18	fix: the cluster controller did not pass in its own locality when creating its database object, therefore it was not using locality aware load balancing	2018-09-28 12:12:06 -07:00
A.J. Beamon	92990d6aef	Merge release-6.0 into master	2018-09-21 16:14:39 -07:00
Evan Tschannen	6b6d7a087d	The cluster controller should never consider itself as failed (that will be handled by the coordinators) Simplified the check that the cluster controller is excluded	2018-09-20 17:01:11 -07:00
Evan Tschannen	03728db99b	do not trigger better master exists if the cluster controller is excluded, since the master will change anyways once the cluster controller is moved	2018-09-19 18:28:24 -07:00
Evan Tschannen	4dd2dda0a3	Merge branch 'release-6.0' # Conflicts: # fdbserver/worker.actor.cpp	2018-09-05 16:11:06 -07:00
Evan Tschannen	df406a340e	Merge pull request #742 from ajbeamon/roles-in-trace-events Add the roles running on a process as a field on trace events in the …	2018-09-05 16:08:12 -07:00
Evan Tschannen	90301f497f	Merge branch 'release-6.0' # Conflicts: # fdbclient/ManagementAPI.actor.cpp # fdbrpc/FlowTransport.actor.cpp # fdbrpc/TLSConnection.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/Status.actor.cpp # fdbserver/storageserver.actor.cpp # fdbserver/workloads/StatusWorkload.actor.cpp # versions.target	2018-09-05 16:06:33 -07:00
A.J. Beamon	2de0b5d6d7	Add the roles running on a process as a field on trace events in the form of a comma delimited string of role abbreviations.	2018-09-05 15:06:14 -07:00
Evan Tschannen	4eaff42e4f	Merge pull request #712 from ajbeamon/remove-database-name-internal Eliminate use of database names (phase 1)	2018-09-05 10:35:00 -07:00
Evan Tschannen	e60c668853	The cluster controller will increase its failure monitoring delay after there have been many unfinishedRecoveries	2018-08-31 10:51:55 -07:00
Evan Tschannen	a9987202d6	fixed merge problem	2018-08-22 08:47:47 -07:00
Evan Tschannen	717c43a69f	merge 6.0 into master	2018-08-22 00:28:04 -07:00
A.J. Beamon	2a97139d5d	This is the first step in eliminating the usage of database names in our code. The C API remains the same, but underneath that all usage of database names is eliminated.	2018-08-16 10:24:12 -07:00
Evan Tschannen	e770629229	fix: json_spirit::write_string is very CPU intensive, especially for large JSON documents. The cluster controller would call this function for each status reply it needed to send, resulting in a slow task.	2018-08-15 19:39:06 -07:00
Alex Miller	86dbe1f0e9	Fix more instances of actorcompiler.h being in the wrong place.	2018-08-14 15:50:26 -07:00
Alex Miller	fb31a6999f	Rewrite all files to have #include actorcompiler.h as the last include.	2018-08-14 15:50:26 -07:00
Alex Miller	535b5701e5	Rewrite all `Void _ = wait(...)` -> `wait(...)`. This takes advantage of the new actorcompiler functionality to avoid having duplicate definitions of `Void _` when trying to feed the un-actorompiled source through clang.	2018-08-14 15:50:26 -07:00
A.J. Beamon	574c5576a2	Merge branch 'release-6.0' of github.com:apple/foundationdb # Conflicts: # fdbrpc/TLSConnection.actor.cpp # versions.target	2018-08-10 14:31:58 -07:00
A.J. Beamon	3535ddad80	Merge pull request #674 from alexmiller-apple/glibcxx-debug-fixes Fix bugs uncovered by -D_GLIBCXX_DEBUG	2018-08-09 08:18:51 -07:00
Evan Tschannen	be1a4d74c7	tlogs serve reads to log routers at a low priority, to prevent them from using all their resources catching up a remote dc that has been down for a long time increase the amount of memory ratekeeper budgets for tlogs so that there is a gap after the spill threshold to prevent temporarily overshooting the budget	2018-08-04 10:31:30 -07:00
Alex Miller	1a7cda4149	Stop performing self-moves. (e.g. a = std::move(a)) self-moves are frowned upon in C++, and in our code this generally happens from calls to swap as part of trying to implement a "unordered erase" function via swap-to-the-end-and-pop_back. For convenience, a swapAndPop() function is now offered that performs this, while disallowing self-moves.	2018-08-01 18:09:54 -07:00
Evan Tschannen	1c29275672	call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.	2018-08-01 14:30:57 -07:00
Evan Tschannen	30b2f85020	fix: it is not safe to drop logs supporting the current primary datacenter, because configuring usable_regions down will drop the storage servers in the remote region, leaving you will no remaining logs	2018-07-14 16:26:45 -07:00
Evan Tschannen	28c0d96c90	fix: treat the local region as best when version difference is too large re-check requests when the version difference becomes small	2018-07-06 14:44:11 -07:00
Evan Tschannen	21347df254	fix: getting metrics did not handle broken_promise errors	2018-07-05 12:30:11 -07:00
Evan Tschannen	507b3bacb0	fix: kill all tlogs in one region prevents the remote logs from recovering in that region, do not allow that to prevent us from configuring usable_regions=1. added more recovery states.	2018-07-05 00:08:51 -07:00
Evan Tschannen	e17dfea3b6	fix: desiredTLogCount was used instead of getDesiredLogs(), which caused problems with recruitment when desiredTLogCount was -1. canKillProcess logic was wrong. We still need to configure usable_regions because if datacenterVersionDifference is too large we cannot complete data movement.	2018-07-04 16:22:32 -04:00
Evan Tschannen	f2ec80f10d	added trace events for cluster controller changing datacenters	2018-07-02 13:06:54 -04:00
Evan Tschannen	334a433238	spend less time before using satellite fallback, because the database will be unavailable during this waiting time	2018-07-02 12:50:52 -04:00
Evan Tschannen	7a12d3e130	added the (untested) ability to force a recovery to the remote datacenter, even if that results in data loss. If the DR lag is more than 1 week there could be potential data corruption if any primary storage servers are still alive.	2018-07-01 09:39:04 -04:00
Evan Tschannen	7e68bee692	update better machine classes first to give them a higher chance of becoming the next cluster controller	2018-06-29 01:11:59 -07:00
Evan Tschannen	e9ac8a1039	when the cluster controller is changing itself to a better dc fitness, it should notify itself first so another process does not take over	2018-06-29 00:10:29 -07:00
Evan Tschannen	a288d5b9a9	added a fallback satellite configuration, so that we can use two satellites if available, but do not have to failover to the remote datacenter if one satellite is down	2018-06-28 23:15:32 -07:00
Evan Tschannen	58c2f67ff6	checking outstanding requests can be CPU intensive, so rate limit checking requests	2018-06-27 23:02:08 -07:00
Evan Tschannen	a5b4698bc8	do not wait for good recruitment delay if the cluster controller is in the second best region	2018-06-27 21:05:55 -07:00
Evan Tschannen	c6313a79e3	fix: the cluster controller needs to continue to retry recruitment until after wait_for_good_remote_recruitment_delay	2018-06-25 18:20:16 -07:00
Evan Tschannen	398497f5c3	fix: wrong desired count used when checking good remote fitness	2018-06-22 12:24:01 -07:00
Evan Tschannen	96b0a91ab2	simplified betterCount logic	2018-06-22 10:38:36 -07:00
Evan Tschannen	5fc8199abc	Swapped OkayFit and UnsetFit, because generally if machine classes are set on one machine they are set everywhere and it helps with wait_for_good_recruitment logic wait_for_good_recruitment now requires that you have the desired count of each roll remote recruitment is given a much longer wait_for_good_recruitment time interval, which does not start until enough remote machines have registered	2018-06-22 10:15:24 -07:00
Evan Tschannen	8a8914f046	re-added the ability to configure the number of log routers. Many log routers are needed to get a sufficient number of sockets involved in copying data across the WAN	2018-06-22 00:04:00 -07:00
Evan Tschannen	9a91dad5bd	fixed compile issue	2018-06-21 16:34:36 -07:00
Evan Tschannen	678b4494f4	added logging for the datacenter version difference	2018-06-21 16:31:52 -07:00
Evan Tschannen	0913368651	added usable_regions to specify if we will replicate into a remote region remote replication defaults to the primary replication removed remote_logs, because they should be specified as an override in the regions object	2018-06-17 19:31:15 -07:00
Evan Tschannen	f694f7c9ca	removed hasBestPolicy	2018-06-15 12:36:19 -07:00
Evan Tschannen	0103b6f5ed	added datacenter_version_difference to status	2018-06-14 19:09:25 -07:00
Evan Tschannen	0c6825eb43	allow multiple regions with the same priority configurations must have at least one region with non-negative priority	2018-06-14 12:59:55 -07:00
Evan Tschannen	26b7dd32da	fix: cluster controller did not respect usable dcs	2018-06-14 12:56:48 -07:00
Evan Tschannen	889889323e	The master will tell the cluster controller if it is going to take a long time to recruit new logs in its DC; the cluster controller can determine if the other DC would be better and recruit there. The cluster controller will not switch to the other data center if remote logs are too far behind. We will not recruit in DCs with negative priority.	2018-06-13 18:14:14 -07:00
Alex Miller	fcfa00928b	Make RecoveryState an enum class. This means that all the == 7 or != 0 checks go away, and explicit names must be used.	2018-06-12 16:50:25 -07:00
A.J. Beamon	e5488419cc	Attempt to normalize trace events: * Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check. * Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase. * Use seconds instead of milliseconds in details. Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed. This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.	2018-06-08 11:11:08 -07:00
Evan Tschannen	c3f2e2bb38	fix: do not attempt to become the cluster controller before recovering files from disk	2018-05-01 12:05:43 -07:00
Evan Tschannen	d72087bfd3	fix: we may not be able to recruit enough log routers, in this case put multiple log routers on the same worker, but also properly rank this configuration lower in better master exists	2018-04-26 22:18:07 -07:00
Evan Tschannen	7af892f50b	first working version of non-copying recovery working with fearless configurations	2018-04-08 21:24:05 -07:00
Evan Tschannen	b36e08f08f	first version of non-copying recovery. Upgrades are broken, and it has not been tested using fearless configurations yet	2018-03-29 15:12:38 -07:00
Evan Tschannen	65b532658f	added support for single region configurations	2018-03-15 10:59:30 -07:00
Evan Tschannen	3abf4d7fdf	Merge branch 'master' into feature-remote-logs	2018-03-09 14:50:04 -08:00
Evan Tschannen	91bb8faa45	Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa' # Conflicts: # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-03-09 14:47:03 -08:00
Evan Tschannen	f9625f5b2f	fix: new cluster controllers should not consider anything failed until they have time to get failure monitoring updates fix: storage and log class machines wait 100MS before attempting to become the cluster controller	2018-03-08 18:08:41 -08:00
Evan Tschannen	8c88041608	fix: we must commit to the number of log routers we are going to use when recruiting the primary, because it determines the number of log router tags that will be attached to mutations	2018-03-06 16:31:21 -08:00
Evan Tschannen	1194e3a361	added region-based configuration to support a large variety of fearless setups. Currently only 1 primary 1 remote setups are allowed.	2018-03-05 19:27:46 -08:00
Evan Tschannen	470f5c01f3	changed remoteDcId to a vector of ids, to support future configurations where there are multiple remote databases	2018-02-26 17:09:09 -08:00
Evan Tschannen	37a6a81634	Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs # Conflicts: # fdbserver/workloads/RestartRecovery.actor.cpp	2018-02-23 12:33:28 -08:00
Alec Grieser	0bae9880f1	remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py	2018-02-21 10:25:11 -08:00
Evan Tschannen	42405c78a5	Merge commit '4038bd2fd968d88861f2cebd442ce511724816cb' into feature-remote-logs # Conflicts: # fdbserver/ClusterController.actor.cpp # fdbserver/Knobs.cpp	2018-02-10 12:08:52 -08:00
Evan Tschannen	c7b3be5b19	re-enabled better master exists the cluster controller can choose a better data center for itself and let the workers know where the next cluster controller should be recruited	2018-02-09 16:48:55 -08:00
Evan Tschannen	b7dde88029	fix: the cluster controller did not consider the master sharing the same process as the cluster controller as bad in all needed locations waited too long for good recruitment locations, which would add too much time to recoveries of clusters that do not use machine classes	2018-02-06 11:30:05 -08:00
Evan Tschannen	ebd94bb654	removed a separately configurable storage team size for the remote data center, because it did not make sense fix: the master did not monitor for the failure of remote logs stop merge attempts when a data center is failed fixed a variety of other problems with data distribution when a data center is failed	2018-02-02 11:46:04 -08:00
Evan Tschannen	8f58bdd1cd	fixed a large number of problems related to running without remote logs	2018-01-16 18:12:40 -08:00
Evan Tschannen	21482a45e1	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbserver/DBCoreState.h # fdbserver/LogSystem.h # fdbserver/LogSystemPeekCursor.actor.cpp # fdbserver/TLogServer.actor.cpp	2018-01-14 13:40:24 -08:00
Evan Tschannen	645f68212b	make timekeeper priority system immediate	2018-01-08 18:21:00 -08:00
Evan Tschannen	4e8bc273b3	added a version of getKeyRangeLocations that checks for endpoint failures fix: did not add the cluster controller to id_used in all cases removed obsolete fixmes	2018-01-07 15:32:43 -08:00
Evan Tschannen	5ac4f73978	Merge branch 'release-5.1' into feature-remote-logs # Conflicts: # fdbclient/NativeAPI.actor.cpp # fdbrpc/Locality.h # fdbrpc/simulator.h # fdbserver/ApplyMetadataMutation.h # fdbserver/ClusterController.actor.cpp # fdbserver/LogSystemPeekCursor.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/SimulatedCluster.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp # fdbserver/WorkerInterface.h # fdbserver/masterserver.actor.cpp # flow/Net2.actor.cpp # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-01-05 11:33:42 -08:00
Evan Tschannen	e11f461cbd	fix: better master exists needs to check master fitness before tlogs or proxies because that is the order of recruitment	2018-01-04 15:19:46 -08:00
Evan Tschannen	f2c4beed9f	fix: tlogFitness did not consider it better to have one tlog of a better fitness fix: checkStable was not used in all places in better master exists fix: we need to call checkOutstanding on worker registration in all cases fix: in case persistentData is keyValueStoreMemory, we need to make sure it is fully recovered before writing to it	2018-01-04 11:33:02 -08:00
Evan Tschannen	482ac38ca6	added knobs so that the client failure monitoring update rate and the server failure monitoring update rate are separate knobs	2017-12-01 13:04:32 -08:00
Yichi Chiang	d9a98aa968	Remove commented code	2017-11-16 17:25:37 -08:00
Yichi Chiang	0d5dc15ac8	Fix double recoveries	2017-11-16 16:58:55 -08:00
Evan Tschannen	ad456a939a	Merge pull request #206 from cie/change-excluded-cluster-controller Change excluded cluster controller	2017-11-15 17:28:33 -08:00
Yichi Chiang	f96faf72d9	Add fullyRecoveredConfig for checking exclusions	2017-11-15 17:15:24 -08:00
Yichi Chiang	df922bc973	Change excluded cluster controller	2017-11-14 13:57:37 -08:00
A.J. Beamon	3b952efb4e	Remove events from cluster controller that get logged for roughly every worker upon recovery, master registration, etc.	2017-11-14 10:15:45 -08:00
Evan Tschannen	706bf1e018	fix: we cannot trigger better master exists before a master is fully recovered because exclusions changed by the provisional master will not be committed until the master is fully recovered	2017-11-04 12:48:04 -07:00
Evan Tschannen	57aba0b3bc	fix: excluded servers were the same fitness as storage servers for the master role fix: better master exists did not considers exclusion for master fitness	2017-11-03 17:09:14 -07:00
Balachandar Namasivayam	3efaaec479	onMasterProxiesChanged was being triggered when any member of ClientDBInfo changed. Change the behavior to be triggered only when proxies field in ClientDBInfo is changed.	2017-11-01 18:29:56 -07:00
A.J. Beamon	7cf17df821	Merge branch 'master' into log-group-for-unsupported-clients # Conflicts: # flow/Net2.actor.cpp # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2017-11-01 11:31:02 -07:00
Balachandar Namasivayam	988bc0207f	Reset Client Transaction profiling parameters when the config keys are cleared.	2017-10-31 15:40:57 -07:00
Yichi Chiang	4d54a73f5b	Merge pull request #191 from cie/count-cluster-controller-role Take cluster controller role into consideration when recruiting workers	2017-10-25 12:09:15 -07:00
Yichi Chiang	f39cce9b8d	Use processId instead of address for comparison	2017-10-25 11:35:29 -07:00
Yichi Chiang	5fcef911f0	Take cluster controller role into consideration when recruiting workers	2017-10-25 10:35:46 -07:00
Yichi Chiang	c2a117fe07	Merge pull request #189 from cie/enable-check-desired-class Enable checkUsingDesiredClasses() in consistency check	2017-10-24 15:18:21 -07:00
Yichi Chiang	3865c5ae0e	Enable checkUsingDesiredClasses() in consistency check	2017-10-24 12:58:54 -07:00
Evan Tschannen	e2c1e87df6	made a large number of fixes to make fearless DR correctness clean.	2017-10-19 15:36:32 -07:00
Bhaskar Muppana	2007f3799f	Don't ignore TimeKeeper failures.	2017-10-18 14:31:31 -07:00
Evan Tschannen	215bcb8d3e	Merge pull request #157 from cie/choose-leader-on-stateless-processes Catch and update processClass change from DBSource	2017-10-13 14:03:29 -07:00
Yichi Chiang	5bcdd37c0d	Move UID generation and add initialClass	2017-10-13 13:46:37 -07:00
Yichi Chiang	12edd27281	Introduce prevChangeID to CandidacyRequest and LeaderHeartbeatRequest	2017-10-12 17:11:58 -07:00
Evan Tschannen	15962cf079	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbrpc/Locality.cpp # fdbrpc/Locality.h # fdbserver/ClusterController.actor.cpp # fdbserver/ClusterRecruitmentInterface.h # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp # fdbserver/WorkerInterface.h # fdbserver/fdbserver.vcxproj.filters # fdbserver/masterserver.actor.cpp # fdbserver/worker.actor.cpp # flow/error_definitions.h	2017-10-05 17:09:44 -07:00
Yichi Chiang	3edc2824a9	Add initialClass to RegisterWorkerRequest 2	2017-10-05 11:03:25 -07:00
Yichi Chiang	05f7626e39	Add initialClass to RegisterWorkerRequest	2017-10-04 17:11:12 -07:00
Yichi Chiang	3c70df57b5	Fix cluster controller review comments	2017-10-04 15:48:55 -07:00
Alex Miller	e55cc447d2	Address code review comments. * Fixed memory corruption with SystemData key constants * Removed duplication in ClusterController * Reworked fdbcli actions to better represent explicit vs default assignments	2017-10-04 13:36:18 -07:00
Yichi Chiang	636ce4a131	Replace leader when find a better one	2017-09-29 16:34:55 -07:00
Yichi Chiang	d4f75630de	Support log group field in status json	2017-09-28 16:31:29 -07:00
Evan Tschannen	73fca75239	added the ability to disable timeKeeper; disabled timeKeeper before consistency check in simulation	2017-09-28 13:13:24 -07:00
Bhaskar Muppana	0f8ff26029	Merge pull request #158 from bmuppana/master <rdar://problem/34557380> Need a way to map real time to version	2017-09-27 17:56:42 -07:00
Bhaskar Muppana	6a0b1d6808	Fixing PR comments <rdar://problem/34557380> Need a way to map real time to version	2017-09-27 17:56:01 -07:00
Bhaskar Muppana	0bf5bdb23a	<rdar://problem/34557380> Need a way to map real time to version	2017-09-25 12:51:37 -07:00
Yichi Chiang	6758c649fc	Catch and update processClass change from DBSource	2017-09-25 10:36:03 -07:00
Evan Tschannen	cce4eeb52d	fix: the master was sending the cluster controller uninitialized configurations	2017-09-22 16:59:24 -07:00
Evan Tschannen	fbd67ea547	fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded) fix: better master exists did not use exclusions because the configuration was reset	2017-09-20 11:48:26 -07:00
Evan Tschannen	f3b7aa615d	fix: seed storage servers are recruited based on the storage policy	2017-09-14 17:06:00 -07:00
Alvin Moore	9404d226d0	Merge branch 'release-5.0'	2017-09-13 16:49:00 -07:00
Alvin Moore	cb92194772	Fixed problem with master being recruited on excluded servers	2017-09-13 16:48:27 -07:00
Evan Tschannen	8cb53fd608	Merge pull request #149 from cie/choose-leader-on-stateless-processes choose leader on the perferred process class	2017-09-13 13:58:49 -07:00
Evan Tschannen	aea7a78cff	cluster controller changes were not maintained during merge	2017-09-11 17:40:46 -07:00
Evan Tschannen	d343d37274	fixed merge problems	2017-09-11 16:37:10 -07:00
A.J. Beamon	9a0a3b6329	Merge commit '66528becb82d826e81fa644bb378212584ab580e'	2017-08-28 16:47:59 -07:00
Yichi Chiang	9fe927127f	choose leader on the perferred process class	2017-08-28 14:41:04 -07:00
Alvin Moore	44e0df78c5	Added support for tracking roles for simulation workers Fixed the exclusion and inclusion address simulation API and integration within workloads Added more information within trace events for simulation	2017-08-28 11:25:37 -07:00
Evan Tschannen	9fd5955e92	Merge branch 'master' into removing-old-dc-code	2017-06-26 16:27:10 -07:00
Alvin Moore	0b9ed67e12	Fixed support for RemoveServers Workload Added availability functions to simulation	2017-05-26 14:20:11 -07:00
FDB Dev Team	a674cb4ef4	Initial repository commit	2017-05-25 13:48:44 -07:00

347 Commits