foundationdb

Commit Graph

Author	SHA1	Message	Date
Evan Tschannen	1a4c1759a4	Merge pull request #1429 from jzhou77/pprof Dump heap profiler when memory usage is high	2019-04-29 16:31:44 -07:00
Jingyu Zhou	6870e132b2	Merge branch 'master' into pprof	2019-04-19 14:06:44 -07:00
Andrew Noyes	ef04471a66	Fix more unused-variable warnings	2019-04-17 16:04:10 -07:00
Evan Tschannen	cd5c9d91fa	Merge pull request #1443 from etschannen/master Merge 6.1 into master	2019-04-10 17:43:07 -07:00
Evan Tschannen	05869a8383	do not log a degraded reset message if the previous reset was more than a week ago	2019-04-07 23:00:58 -07:00
Jingyu Zhou	4b08042a88	Change memory profiling threshold to a flag	2019-04-05 16:33:51 -07:00
Jingyu Zhou	09b2c35d11	Dump heap profiler when memory usage is high Set the threshold of dump to 2GB.	2019-04-05 16:12:23 -07:00
mpilman	1c16f87a4e	Remove trace-calls to printable (in non-workloads)	2019-04-05 13:12:19 -07:00
mpilman	c008e16c81	Defer formatting in traces to make them cheaper This is the first part of making `TraceEvent` cheaper. The main idea is to defer calls to any code that formats strings. These are the main changes: - TraceEvent::detail now takes a c-string instead of std::string for literals. This prevents unnecessary allocations if the trace is not going to be printed in the first place (for example for SevDebug). Before that, `detail` expected a `std::string` as key, which meant that any string literal would be copied on each call. - Templates Traceable and SpecialTraceMetricType. These templates can be specialized for any type that needs to be printed. The actual formatting will be deferred to after the `enabled` check. This provides two benefits: (1) if a TraceEvent is disabled, we don't pay for the formatting and (2) TraceEvent can trace types that it doesn't know about. - TraceEvent::enabled will be set in the constructor if the Severity is passed. This will make sure that `TraceEvent::init` is not called. - `TraceEvent::detail` will be inlined. So for disabled TraceEvent calls, a call to detail will only introduce an if-branch, which is much cheaper than a function call.	2019-04-05 13:12:19 -07:00
Jingyu Zhou	5be592632b	Change trace event message If heap profiler is not running, we can't take a snapshot of the profile.	2019-04-04 15:29:50 -07:00
Jingyu Zhou	f538df5e6c	Add TraceEvent if unable to invoke heap profiler	2019-04-04 15:26:41 -07:00
Evan Tschannen	390ab9cfed	A process will mark itself as degraded if it continually disconnects from a different process which the failure monitor thinks is healthy	2019-04-04 14:11:12 -07:00
Alex Miller	8f49be480b	Update fdbserver/worker.actor.cpp Co-Authored-By: jzhou77 <jingyuzhou@gmail.com>	2019-04-04 13:32:10 -07:00
Jingyu Zhou	eaaf58ee34	Refactor profiler into cpu and heap profilers	2019-04-03 20:54:30 -07:00
Jingyu Zhou	3371cf22d4	Add manually triggered heap profiling At client side: fdb> profile ERROR: Usage: profile <client|list|flow|heap> fdb> profile heap 127.0.0.1:4500 On the server side: $ HEAPPROFILE=/tmp/fdbserver bin/fdbserver -C ../test.cluster -p 127.0.0.1:4500 Starting tracking the heap FDBD joined cluster. Dumping heap profile to /tmp/fdbserver.0001.heap (1024 MB allocated cumulatively, 13 MB currently in use) Dumping heap profile to /tmp/fdbserver.0002.heap (User triggered heap dump)	2019-04-03 16:00:54 -07:00
Jingyu Zhou	49fdc35e5e	Gperftools Profiling fix. Fix a bug and update gperftools compiling flags The added flags are recommended by gperftools here: https://github.com/gperftools/gperftools Verified that heap profiles are saved with the following command: HEAPPROFILE=/tmp/fdbserver fdbserver [args...]	2019-04-01 14:42:18 -07:00
A.J. Beamon	91014d4529	Add file changes that I accidentally failed to commit; fix naming issue in worker.	2019-03-27 08:41:19 -07:00
A.J. Beamon	71e2fdafb8	Changes to ratekeeper camel case	2019-03-27 08:24:25 -07:00
A.J. Beamon	d508658569	Make ratekeeper one word to match our existing convention	2019-03-27 08:15:19 -07:00
Evan Tschannen	5e03e178de	Merge pull request #1345 from ajbeamon/support-multiple-client-or-worker-issues Add support for a client or worker having multiple issues.	2019-03-24 17:27:50 -07:00
Evan Tschannen	d45159ebf7	Merge pull request #1307 from jzhou77/ratekeeper Monitor placement of Ratekeeper and DataDistributor	2019-03-24 17:26:07 -07:00
Steve Atherton	09f37cf3d2	Merge pull request #533 from ajbeamon/fix-parent-directory Fixes to parentDirectory() and abspath()	2019-03-22 23:53:46 -07:00
Evan Tschannen	36ab852bb1	Merge branch 'master' into ratekeeper # Conflicts: # fdbserver/ClusterController.actor.cpp	2019-03-22 18:41:00 -07:00
A.J. Beamon	4eb5715689	Add support for a client or worker having multiple issues.	2019-03-22 08:29:41 -07:00
Jingyu Zhou	da338c3ad6	Avoid unnecessary recruiting of DD or RK While waiting to recruit a data distributor or ratekeeper, a previous one could have already joined. So we can skip this unnecessary recruiting. Revert the change of worker.actor.cpp for ratekeeper. Instead, recruiting ratekeeper should avoid the process with an existing one. This fixes a bug where the ratekeeper interface became a zombie, killing other healthy ratekeepers but doing no useful work. Found by: -r simulation --crash -f tests/fast/WriteDuringRead.txt -s 31858110 -b on	2019-03-21 22:40:07 -07:00
Jingyu Zhou	48324ad4be	Fix a race during ratekeeper registration When a ratekeeper registers, the monitorRatekeeper wakes up and recruits a new ratekeeper. Adding a 0s delay to avoid this. If a ratekeeper is recruited on an existing machine, update the interface so that the cluster controller can clear the ratekeeperID.	2019-03-21 12:56:56 -07:00
Evan Tschannen	5b9c45ea0b	clients do not attempt to connect to provisional proxies	2019-03-19 13:37:50 -07:00
Jingyu Zhou	0fb6a03c07	First round of review comment fixes for PR#1307	2019-03-19 11:29:19 -07:00
Alex Miller	29ab7370cd	Clear versionLocation when spilling, and pop DQ separately. Popping the disk queue now requires potentially recovering the location to which we can pop from the spilled data itself, and for each tag we must maintain the first location with relevant data. The previous queue we had to represent the ordering, queueOrder, was used by spilling, and popped when a TLog had been spilled. This means that as soon as a TLog has been fully spilled, we have no idea how it relates in order to other fully spilled TLogs. Instead, use queueOrder to keep track of all the TLog UIDs until they're removed, and use spillOrder to keep track of the order only for spilling.	2019-03-18 15:09:22 -07:00
Stephen Atherton	2efb6f4c0d	Added cleanPath() which puts a path in a canonical form without .., ., or duplicate separators without using the filesystem or resolving symbolic links. absPath() redefined to use cleanPath() so it will return the same result for a path without symbolic links regardless of whether or not the path actually exists. Redefined parentDirectory() to use absPath() and error on certain inputs. Added comments describing behavior of these functions, and added a unit test which verbosely tests many inputs to them.	2019-03-15 23:54:33 -07:00
Alex Miller	bf247eeed0	If TLogVersion >= 3, use crc32c for the DiskQueue hash for TLogs. We don't have a forward compatibility story for the memory storage engine, so its DiskQueue will still be hashlittle2 until one exists.	2019-03-15 21:01:16 -07:00
Jingyu Zhou	99d521ef4f	Monitor Ratekeeper and DataDistributor to use stateless processes Since Ratekeeper and DataDistributor are no longer running with Master, they might be running with stateful processes before a new Master becomes alive, which is undesirable. This PR adds a monitoring of both Ratekeeper and DataDistributor at Cluster Controller -- if Master runs on a stateless class and RK/DD runs at a worse class, then RK/DD will be killed. I.e., RK/DD should be running at their own classes or on the same stateless process as Master. After restart, RK/DD should be running at a better process class.	2019-03-14 15:00:57 -07:00
Evan Tschannen	e068c478b5	merge master	2019-03-12 18:31:25 -07:00
Jingyu Zhou	2b0139670e	Fix review comment for PR 1176	2019-03-12 12:02:30 -07:00
Evan Tschannen	c6e94293bf	reset a process to not be degraded after 2 days	2019-03-10 22:39:21 -07:00
Evan Tschannen	53f16b5347	when a tlog queue commit takes longer than 5 seconds, its process is marked as degraded	2019-03-08 11:46:34 -05:00
Jingyu Zhou	3c86643822	Separate Ratekeeper from data distribution. Add a new role for ratekeeper. Remove StorageServerChanges from data distribution. Ratekeeper monitors storage servers, which borrows the idea from DataDistribution.	2019-03-07 13:16:20 -08:00
A.J. Beamon	999ee68609	Improve avoidance of transient issues when logging IncorrectClusterFileContents SevWarnAlways events by making it time based.	2019-02-27 10:08:24 -08:00
Alex Miller	2dc57568cb	Change many things about log_version. * log_version in the database (`/conf/log_version`) is now a hint that gets rounded to the nearest supported version. * fdbcli and FDB enforce that only a valid log_version can be configured * TLogVersion is persisted in CoreTLogSet (and LogSet and TLogSet) * Some comments here and there * Add an assert on filename length to make sure KV-pairs in filename don't exceed a maximum length.	2019-02-26 16:47:04 -08:00
Alex Miller	6d23eb2d1a	Implement log_version. This mega-commit introduces a new configuration setting, `log_version`, that controls the TLog implementations and features that are available within FDB, so that users can opt in to new features if they're willing to sacrifice backwards compatibility.	2019-02-22 12:15:23 -08:00
Alex Miller	7bd63cf5ea	Added missing switch case.	2019-02-20 15:26:41 -08:00
Alex Miller	fa1bfbc0c5	Replace TLogSpillType with TLogVersion in worker and filenames.	2019-02-19 22:30:15 -08:00
Alex Miller	7b1afdc71e	Hacky plumbing of spill type and file renaming.	2019-02-19 22:18:10 -08:00
mpilman	999ea09bfd	Use correct fwd decls in TesterInterface Also TesterInterface.h -> TesterInterface.actor.h	2019-02-19 15:16:59 -08:00
mpilman	3f0fd2a20c	Use fwd decls in WorkerInterface Also WorkerInterface.h -> WorkerInterface.actor.h	2019-02-19 15:16:59 -08:00
mpilman	3a0f9839b9	Fix minor IDE build errors	2019-02-19 15:16:59 -08:00
mpilman	0bb60e5a3b	Use proper fwd decl in NativeAPI Also NativeAPI.h -> NativeAPI.actor.h	2019-02-19 15:16:59 -08:00
Vishesh Yadav	e05b53d755	Merge remote-tracking branch 'apple/master' into task/tls-upgrade	2019-02-15 20:37:07 -08:00
Jingyu Zhou	5e6577cc82	Final cleanup per review comments Make distributor interface optional in ServerDBInfo and many other small changes.	2019-02-14 16:37:17 -08:00
Jingyu Zhou	578473a974	Various review comments fixes	2019-02-14 16:37:16 -08:00

167 Commits