foundationdb

Commit Graph

Author	SHA1	Message	Date
Alex Miller	df7f0cffa1	Raise the priority of TLogRejoin above the default work priority. With sharded txs tags, the master now receives data from transaction logs at an order of magnitude higher rate. This is the intentional desired result of sharding the txs tag. With a sufficient number of TLogs, the master will saturate its CPU time handling the peek responses. Performance tests revealed some unstable oddities in how long a recovery would take, which was eventually root caused to a priority inversion between TLogRejoin requests and TLog peek replies. Once peek replies saturate the CPU, the master would proceed to ignore further TLogRejoin messages. TLogRejoin is what marks a TLog as available to the failure monitor, which is also what decides between a ServerPeekCursor and a MergePeekCursor for a SetPeekCursor. Ignoring TLogRejoins meant that the sharded txs locality tags for those servers would be merge peeked over all TLogs. This is much less efficient than just peeking one copy of data from the one preferred server. Depending on the race between TLogPeek replies saturating the CPU and TLogRejoin requests being submitted, a variable number of tags would be affected, and thus the performance test would have some variance in its results.	2019-07-19 16:55:04 -07:00
Alex Miller	c3a8ae4752	Merge pull request #1791 from fzhjon/fetch-keys-requests-priority Introduce priority to fetchKeys requests from data distribution	2019-07-19 14:54:51 -07:00
mpilman	6a4a129cf5	fixed silly boost visitor bug	2019-07-16 15:10:55 -07:00
mpilman	b18666d942	statically link libstdc++ on Linux and remove std::variant this will hopefully fix #1610	2019-07-16 14:53:16 -07:00
Jon Fu	f707d186fe	added new priority for fetchkeys requests and adjusted ddmetrics workload to run parallel with mako	2019-07-11 09:56:58 -07:00
A.J. Beamon	69d7c4f79c	Merge branch 'master' into track-run-loop-busyness # Conflicts: # documentation/sphinx/source/release-notes.rst # flow/Net2.actor.cpp # flow/network.h	2019-07-09 18:39:23 -07:00
Alex Miller	888f4f92e0	Fix errors and TaskPriority more priorities.	2019-07-03 21:03:58 -07:00
Alex Miller	ea6898144d	Merge remote-tracking branch 'upstream/master' into flowlock-api	2019-07-03 20:44:15 -07:00
Jingyu Zhou	5ea2e69016	Remove a fdbrpc header from flow library Flow should be an independent library.	2019-07-03 19:56:38 -07:00
Evan Tschannen	3fb0999e10	revert storage server priority changes	2019-07-02 16:54:47 -07:00
Alex Miller	8e1ab6e7db	Merge remote-tracking branch 'upstream/master' into flowlock-api	2019-06-28 17:32:54 -07:00
Evan Tschannen	4cef1d3937	Experimental change of storage write priority	2019-06-28 16:54:22 -07:00
A.J. Beamon	7f23814841	Track run loop busyness and report it in status.	2019-06-26 14:03:02 -07:00
Alex Miller	bf883d7055	Merge remote-tracking branch 'upstream/master' into flowlock-api	2019-06-25 14:26:50 -07:00
Evan Tschannen	0fe6edc254	Merge pull request #1678 from mpilman/features/external-workload Features/external workload	2019-06-25 13:53:19 -07:00
Alex Miller	7a500cd37f	A giant translation of TaskFooPriority -> TaskPriority::Foo This is so that APIs that take priorities don't take ints, which are common and easy to accidentally pass the wrong thing.	2019-06-25 02:47:35 -07:00
mpilman	2eff2b7e21	First simple test is working (but very buggy)	2019-06-19 13:03:41 -07:00
mpilman	68ce9a5e75	ProtocolVersion type - second try	2019-06-18 17:55:27 -07:00
Vishesh Yadav	a8e408e268	run clang-format on changes	2019-06-10 14:10:24 -07:00
Vishesh Yadav	6b4d30c3ae	failmon: Identify client vs server when starting failure monitoring client	2019-06-09 00:43:12 -07:00
mpilman	f5fa3a65b4	some more fixes	2019-05-13 14:15:23 -07:00
mpilman	44db3450ec	Several flatbuffers bug fixes	2019-05-13 14:15:23 -07:00
mpilman	69fa3d3903	fixed compilation issues after rebase	2019-05-13 14:15:23 -07:00
mpilman	9eeb48c43d	Allow to turn on object serializer This commit includes functionality to turn on the object serializer for network communication. This is done the following way: - On incoming connections, a process will detect whether the client supports the object serializer and will only serialize responses with it, if it does - On outgoing connections, the command line flag is used to determine whether the object serializer should be used to send data. This way, a cluster can run in mixed mode. To upgrade one can upgrade one process at a time and set the flag one process at a time. This is how this is tested on the simulator: - The command line flag can take three options: on, off, and random. - For off, the object serializer will never be used. - For on, the object serializer will be always used. - For random, the simulator will flip a coin for each process it starts up.	2019-05-13 14:15:22 -07:00
mpilman	fe81454ec2	basic functionality for object serializer This commit includes: - The flatbuffers implementation - A draft on how it should be used for network messages - A serializer that can be used independently What is missing: - All root objects will need a file identifier - Many special classes can not be serialized yet as the corresponding traits are not yet implemented - Object serialization can not yet be turned on (this will need a network option)	2019-05-13 14:15:22 -07:00
Evan Tschannen	2d5043c665	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # versions.target	2019-04-30 18:27:04 -07:00
Evan Tschannen	4fa1c008f9	Highly prioritize storageServerRejoin messages on the proxy, so that storage servers can rejoin the cluster even when a proxy is CPU saturated	2019-04-23 20:56:01 -07:00
Alex Miller	c25fac9421	Lower the priority of spilled peeks to below that of spilling. This prevents peeking from starving the TLog of CPU time to spill, and thus impacting write throughput at prolonged saturation.	2019-04-22 18:39:21 -07:00
mpilman	c008e16c81	Defer formatting in traces to make them cheaper This is the first part of making `TraceEvent` cheaper. The main idea is to defer calls to any code that formats strings. These are the main changes: - TraceEvent::detail now takes a c-string instead of std::string for literals. This prevents unnecessary allocations if the trace is not going to be printed in the first place (for example for SevDebug). Before that `detail` expected a `std::string` as key, which meant that any string literal would be copied on each call. - Templates Traceable and SpecialTraceMetricType. These templates can be specialized for any type that needs to be printed. The actual formatting will be deferred to after the `enabled` check. This provides two benefits: (1) if a TraceEvent is disabled, we don't pay for the formatting and (2) TraceEvent can trace types that it doesn't know about. - TraceEvent::enabled will be set in the constructor if the Severity is passed. This will make sure that `TraceEvent::init` is not called. - `TraceEvent::detail` will be inlined. So for disabled TraceEvent calls, a call to detail will only introduce an if-branch which is much cheaper than a function call.	2019-04-05 13:12:19 -07:00
Evan Tschannen	aa368c08a2	changed NetworkAddress hash function to use more bytes from the IP address	2019-03-28 17:47:13 -07:00
Evan Tschannen	80ecb12190	change the IPv6 hash function to be more efficient	2019-03-28 14:07:46 -07:00
Evan Tschannen	34b9d5e722	Merge pull request #1364 from etschannen/feature-fast-serialize A few performance optimizations	2019-03-27 20:57:25 -07:00
Evan Tschannen	6997075917	changed back to isV6addr instead of isV4addr for compatibility	2019-03-27 19:55:36 -07:00
Evan Tschannen	e5a80f2c94	optimized IPaddress	2019-03-27 18:21:13 -07:00
A.J. Beamon	71e2fdafb8	Changes to ratekeeper camel case	2019-03-27 08:24:25 -07:00
Evan Tschannen	1fc6937802	changed NetworkAddressList to at most two addresses for performance	2019-03-23 17:54:46 -07:00
Alex Miller	7f5bc2981f	Checksum DiskQueue pages on read, but at a lower priority. If a server has its data spilled, then it's behind the 5s window. Feeding it data is less important than committing, so we can hide the extra CPU usage from checksumming the read amplified disk queue pages.	2019-03-15 21:01:19 -07:00
Jingyu Zhou	3c86643822	Separate Ratekeeper from data distribution. Add a new role for ratekeeper. Remove StorageServerChanges from data distribution. Ratekeeper monitors storage servers, which borrows the idea from DataDistribution.	2019-03-07 13:16:20 -08:00
Vishesh Yadav	96ee95b9ad	fix: macOS build #963 Use the boost representation of IPv6 address internally and make sure it uses std::array.	2019-03-05 14:03:14 -08:00
Vishesh Yadav	592e224155	net: add/use formatIpPort to format IP:PORT pairs #963	2019-03-04 14:12:45 -08:00
Vishesh Yadav	82b2da4b78	net: Add IPAddress::parse() util #963	2019-03-04 14:12:45 -08:00
Vishesh Yadav	25daabdc02	net: TraceEvent and toIPVectorString for new IPAddress structure #963	2019-03-04 14:12:45 -08:00
Vishesh Yadav	57832e625d	net: Support IPv6 #963 - NetworkAddress now contains IPAddress object which can be either IPv4 or IPv6 address. 128bits are used even for IPv4 addresses, however only 32bits are used when using/serializing IPv4 address. - ConnectPacket is updated to store IPv6 address. Backward compatible with old format since the first 32bits of IP address field is used for serialization of IPv4. - Mainly updates rest of the code to use IPAddress structure instead of plain uint32_t. - IPv6 address/pair ports should be represented as `[ip]:port` as per convention. This applies to both cluster files and command line arguments.	2019-03-04 14:12:41 -08:00
Evan Tschannen	e9ddd94e27	The failure monitor is given a list of all IP addresses associated with a process The connect packet includes the correct remote address Did a lot of code cleanup Simulation test mixed TLS and non-TLS listeners on the same process	2019-01-31 18:20:14 -08:00
Vishesh Yadav	42dffd4dff	Take a vector of network addresses from CLI to start FDB server Extends the CLI interface to take multiple public and listen addresses. We however do not do anything with those extra addresses and just consider the first one for now.	2018-12-13 13:36:52 -08:00
Vishesh Yadav	e8e01b2406	Remove unused localAddress parameter from newNet2 and Net2 classes	2018-12-13 13:36:52 -08:00
Robert Escriva	268093a96d	Adjust all includes to be relative to the root. Remove the use of relative paths. A header at foo/bar.h could be included by files under foo/ with "bar.h", but would be included everywhere else as "foo/bar.h". Adjust so that every include references such a header with the latter form. Signed-off-by: Robert Escriva <rescriva@dropbox.com>	2018-10-19 17:35:33 +00:00
Evan Tschannen	be1a4d74c7	tlogs serve reads to log routers at a low priority, to prevent them from using all their resources catching up a remote dc that has been down for a long time increase the amount of memory ratekeeper budgets for tlogs so that there is a gap after the spill threshold to prevent temporarily overshooting the budget	2018-08-04 10:31:30 -07:00
Evan Tschannen	65057b4788	Merge branch 'release-5.2' into release-6.0 # Conflicts: # documentation/sphinx/source/downloads.rst # documentation/sphinx/source/release-notes.rst # fdbclient/MasterProxyInterface.h # packaging/msi/FDBInstaller.wxs	2018-08-02 11:29:40 -07:00
Evan Tschannen	a361a785e8	fix: clients which cannot talk to storage servers poll the proxy for new storage server interfaces. If there are too many clients polling, we saturate the proxies with these requests, which prevents the storage servers from updating their interfaces.	2018-07-31 16:57:23 -07:00
Balachandar Namasivayam	529d0497f1	Proxy going OOM when applying high volumes of writes to a proxy, particularly in a sudden fashion before ratekeeper can control the workload. Address this issue by proactively monitoring the memory used by commit batches and dropping requests if a certain memory limit is exceeded.	2018-06-01 15:21:40 -07:00
Balachandar Namasivayam	d3b5cfb93c	Support latest TLS plugin. Add support for https in backup.	2018-05-08 16:28:13 -07:00
Alec Grieser	0bae9880f1	remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py	2018-02-21 10:25:11 -08:00
Evan Tschannen	660cee0254	increased the priority of getKeyServersLocations, because once a client gets a read version, answering their reads should be higher priority than starting new transactions	2018-01-12 13:46:20 -08:00
Evan Tschannen	de119f192d	fixed a priority inversion where the tlog would prefer to copy data from the previous generation rather than make data durable (leading to being ratekeeper controlled)	2018-01-11 16:09:49 -08:00
Stephen Atherton	e3aee45a74	Backup tools and agent now accept blob account credentials via files containing JSON which are specified using command line arguments and/or an environment variable. Improved fdbbackup help, clarifying which options are for which operations. Fdbbackup operations which do not need to use a database no longer require a cluster file parameter. Added eat() commands to StringRef for incrementally tokenizing strings using separator strings.	2017-12-21 01:58:15 -08:00
Stephen Atherton	e934604f67	Added DNS resolution. Interface is INetworkConnections::resolveTCPEndpoint() to resolve, or for convenience INetworkConnections::connect(host, service) will resolve host and service (port number or service name like http) and connect to one of the addresses at random. BlobStoreEndpoint now only accepts hostnames and an optional service, so this update is not compatible with the previous URL formats having many IP addresses.	2017-10-15 21:51:11 -07:00
Stephen Atherton	fa4fdb1f1d	Merge branch 'fix-io-timeout-handling' into release-5.0 # Conflicts: # fdbserver/optimisttest.actor.cpp	2017-05-31 17:03:15 -07:00
Stephen Atherton	98604d33a0	Merge branch 'fix-io-timeout-handling' # Conflicts: # fdbrpc/AsyncFileKAIO.actor.h # fdbrpc/sim2.actor.cpp # fdbserver/KeyValueStoreSQLite.actor.cpp # fdbserver/optimisttest.actor.cpp # fdbserver/worker.actor.cpp # fdbserver/workloads/MachineAttrition.actor.cpp # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2017-05-26 18:43:08 -07:00
FDB Dev Team	a674cb4ef4	Initial repository commit	2017-05-25 13:48:44 -07:00


110 Commits