foundationdb


Author	SHA1	Message	Date
A.J. Beamon	7c801513e2	Fix cases where latency band config could be discarded during recovery or process start.	2019-11-20 11:44:18 -08:00
Evan Tschannen	ffc89d1182	fix: dd test recruitment should prefer the location of ratekeeper over other used processes	2019-11-13 12:58:55 -08:00
Balachandar Namasivayam	2e41497580	This commit tries to distribute RK and DD among other empty available processes.	2019-11-12 17:52:42 -08:00
Balachandar Namasivayam	f5282f2c7e	Fix bug where DD or RK could be halted and re-recruited in a loop for certain valid process class configurations. Specifically, recruitment of DD or RK takes into account that master process is preferred over proxy, resolver or cc. But check for better DD only looks for better machine class ignoring that the new recruit could share a proxy or resolver or CC. Also try to balance the distribution of the DD and RK role if there are enough processes to do so.	2019-11-12 14:22:36 -08:00
Evan Tschannen	43e99ef6a4	fix: better master exists must check if fitness is better for proxies or resolvers before looking at the count of either of them	2019-10-17 13:18:31 -07:00
Evan Tschannen	298b815109	one proxy or resolver with best fitness no longer prevents more proxies or resolvers from being recruited with good fitness	2019-10-14 18:32:17 -07:00
Evan Tschannen	5064d91b75	fix: the cluster controller would not change to a new set of satellite tlogs when they become available in a better satellite location	2019-10-14 18:31:23 -07:00
Evan Tschannen	35e816e9ad	added the ability to configure satellite_logs by satellite location, this will overwrite the region configure if both are present	2019-10-14 18:30:15 -07:00
A.J. Beamon	31ce56eddf	Add cluster controller metrics	2019-10-03 15:29:11 -07:00
Evan Tschannen	a62862c105	add yieldedFutures to prevent slow tasks	2019-09-11 16:26:48 -07:00
Evan Tschannen	945cff1e5b	the cluster controller caches the serialization of serverDBInfo, to avoid regenerating it many times	2019-09-10 14:27:22 -07:00
Evan Tschannen	90e3b50213	Merge branch 'master' into feature-coordinator-connection # Conflicts: # fdbclient/DatabaseContext.h # fdbclient/NativeAPI.actor.cpp # fdbclient/NativeAPI.actor.h # fdbserver/workloads/KillRegion.actor.cpp	2019-07-26 15:05:02 -07:00
Evan Tschannen	be5d144b8b	added status information on connected clients	2019-07-25 17:15:31 -07:00
Jingyu Zhou	bbeaf0ebbb	Add a monitorServerInfoConfig() call back This was deleted during a code refactor in `ef868f5`. Because no tests were complaining, we didn't find this until now.	2019-07-25 15:17:26 -07:00
Evan Tschannen	4a866290b7	Clients keep a persistent connection open with coordinators to get updates to the list of proxies Status still needs to be updated with client information with information from the coordinators	2019-07-23 19:22:44 -07:00
Jingyu Zhou	50e7593c5b	Merge pull request #1796 from ajbeamon/remove-trace-event-underscores Remove trace event underscores	2019-07-05 21:45:55 -07:00
A.J. Beamon	9f4b6fd770	Remove additional underscores	2019-07-05 08:12:25 -07:00
Alex Miller	7a500cd37f	A giant translation of TaskFooPriority -> TaskPriority::Foo This is so that APIs that take priorities don't take ints, which are common and easy to accidentally pass the wrong thing.	2019-06-25 02:47:35 -07:00
Vishesh Yadav	a8e408e268	run clang-format on changes	2019-06-10 14:10:24 -07:00
Vishesh Yadav	6fa7081a21	net: Don't make FailureMonitoring requests from client This patch removes the need for clients to continuously contact cluster coordinator for failure monitoring information. Instead, it uses the FlowTransport to monitor the statuses of peers and update FailureMonitor accordingly.	2019-06-09 00:43:38 -07:00
Evan Tschannen	29b96414e2	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbclient/NativeAPI.actor.cpp # fdbserver/Coordination.actor.cpp # flow/Arena.h # versions.target	2019-06-03 18:49:35 -07:00
Evan Tschannen	7c333dbc16	If a process receives a message in its clusterControllerInterface before becoming the cluster controller, if the process does not become the cluster controller in the next minute it should destroy the interface to prevent a memory leak.	2019-05-29 16:57:13 -07:00
A.J. Beamon	5f55f3f613	Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.	2019-05-10 14:01:52 -07:00
Andrew Noyes	6207d724f8	Fix all -Wunused-variable warnings	2019-04-15 18:13:00 -07:00
mpilman	1c16f87a4e	Remove trace-calls to printable (in non-workloads)	2019-04-05 13:12:19 -07:00
mpilman	c008e16c81	Defer formatting in traces to make them cheaper This is the first part of making `TraceEvent` cheaper. The main idea is to defer calls to any code that formats strings. These are the main changes: - TraceEvent::detail now takes a c-string instead of std::string for literals. This prevents unnecessary allocations if the trace is not going to be printed in the first place (for example for SevDebug). Before that `detail` expected a `std::string` as key, which meant that any string literal would be copied on each call. - Templates Traceable and SpecialTraceMetricType. These templates can be specialized for any type that needs to be printed. The actual formatting will be deferred to after the `enabled` check. This provides two benefits: (1) if a TraceEvent is disabled, we don't pay for the formatting and (2) TraceEvent can trace types that it doesn't know about. - TraceEvent::enabled will be set in the constructor if the Severity is passed. This will make sure that `TraceEvent::init` is not called. - `TraceEvent::detail` will be inlined. So for disabled TraceEvent calls, a call to detail will only introduce an if-branch which is much cheaper than a function call.	2019-04-05 13:12:19 -07:00
Evan Tschannen	8ebf771392	cleanup cluster controller trace events	2019-03-30 14:17:18 -07:00
A.J. Beamon	71e2fdafb8	Changes to ratekeeper camel case	2019-03-27 08:24:25 -07:00
Evan Tschannen	5e03e178de	Merge pull request #1345 from ajbeamon/support-multiple-client-or-worker-issues Add support for a client or worker having multiple issues.	2019-03-24 17:27:50 -07:00
Evan Tschannen	d45159ebf7	Merge pull request #1307 from jzhou77/ratekeeper Monitor placement of Ratekeeper and DataDistributor	2019-03-24 17:26:07 -07:00
Evan Tschannen	d6ad027d37	ratekeeper needs to be recruited for proxies to make progress, so if one has not registered with the cluster controller by the time we are accepting commits, recruit a new one	2019-03-24 16:48:24 -07:00
Evan Tschannen	f426d732ea	fix: forgot to remove one location where id_used was incremented for distributor and ratekeeper	2019-03-24 16:04:59 -07:00
Evan Tschannen	e8948726e8	once we recruit a ratekeeper, do not allow any other ratekeepers to register	2019-03-24 11:04:39 -07:00
Jingyu Zhou	40eec20252	Restore master PID in worker registration This fix is lost during merge.	2019-03-23 21:02:11 -07:00
Jingyu Zhou	3ef26e6be3	Fix fitness assignment statements Found by MacOS build.	2019-03-23 19:16:04 -07:00
Evan Tschannen	1fc6937802	changed NetworkAddressList to at most two addresses for performance	2019-03-23 17:54:46 -07:00
Evan Tschannen	b51a24453e	the data distributor and ratekeeper are not included in id_used, but when comparing equally good options we prefer to avoid sharing with those roles excluded data distributor and ratekeeper were improperly killed when the best option was also excluded	2019-03-23 13:25:36 -07:00
Jingyu Zhou	fdc5b5ddbf	Fix: spurious ratekeeper registration A rare race condition: -r simulation -f ./foundationdb/tests/slow/WriteDuringReadAtomicRestore.txt -s 114256311 -b on - A is the ratekeeper. - CC recruit B and B starts - CC halts ratekeeper A and A is halted - A registers back with CC, which then halts B. CC sets A to be the ratekeeper. CC starts recruiting and finds A is the best machine. But skips recruiting because CC thinks A is already used. Now the cluster is left with no ratekeeper. Fix by disallowing ratekeeper registration with previous ID.	2019-03-23 11:03:51 -07:00
Jingyu Zhou	6523cd4931	Fix: recruit ratekeeper is not triggered	2019-03-23 09:20:54 -07:00
Evan Tschannen	2da46e3172	fix: halt if datacenters are different	2019-03-22 23:53:21 -07:00
Evan Tschannen	d34c56c9a5	ensure that the processId exists in id_worker before accessing it	2019-03-22 18:54:39 -07:00
Evan Tschannen	36ab852bb1	Merge branch 'master' into ratekeeper # Conflicts: # fdbserver/ClusterController.actor.cpp	2019-03-22 18:41:00 -07:00
Evan Tschannen	ddb6058770	simplified ratekeeper monitoring loop	2019-03-22 18:22:45 -07:00
Jingyu Zhou	12917d8c7d	Add actors to store halt request futures Address best fitness in checking better DD or RK.	2019-03-22 18:06:38 -07:00
Jingyu Zhou	e8977aeb98	Remove clusterControllerDcId check This is no longer needed since it'll be set in the ctor.	2019-03-22 18:01:54 -07:00
Evan Tschannen	82bc447e29	startRatekeeper is responsible for updating serverDBInfo	2019-03-22 17:56:16 -07:00
Evan Tschannen	82c80c225d	make sure id_worker is updated before setting ratekeeper or data distribution	2019-03-22 17:08:54 -07:00
Evan Tschannen	6a9c9d79cc	Update fdbserver/ClusterController.actor.cpp	2019-03-22 17:00:58 -07:00
Evan Tschannen	70b1c88cdd	Update fdbserver/ClusterController.actor.cpp	2019-03-22 17:00:52 -07:00
Jingyu Zhou	16f54577ee	Restore master PID in cluster controller worker registration CC may think master failed and clear the master PID, which can block both data distributor and ratekeeper recruitment. Fix by restoring it during worker registration.	2019-03-22 14:53:05 -07:00


272 Commits