foundationdb

Commit Graph

Author	SHA1	Message	Date
Markus Pilman	d853892b41	Bugfix Co-authored-by: Lukas Joswiak <lukas.joswiak@snowflake.com>	2020-11-12 13:48:23 -07:00
Markus Pilman	bdd3dbfa7d	remove duplicates	2020-11-10 14:01:07 -07:00
sfc-gh-tclinkenbeard	4669f837fa	Add uses of makeReference	2020-11-07 22:10:18 -08:00
Andrew Noyes	1c05b70942	Add INetwork::timer_monotonic	2020-11-05 17:07:49 +00:00
Russell Sears	32c87bbb33	Lightweight, power of two spaced histogram implementation + automatic reporting	2020-11-02 11:13:16 -08:00
Richard Chen	545ee4269d	master conflicts	2020-10-19 01:03:54 +00:00
Richard Chen	1c533e7363	fix flowtransport conflicts	2020-10-19 01:00:03 +00:00
Richard Chen	4eb20a1283	merge anoyes/stable-interface and add back in isCompatible	2020-10-12 20:39:37 +00:00
Richard Chen	2f5b0bef08	switch to test newer incompatible version. Fix PR comments. Modify schema	2020-10-12 18:29:16 +00:00
Richard Chen	bbf5bdf6da	fix stable interfaces test and corresponding changes in simulator	2020-10-12 18:25:12 +00:00
Richard Chen	5488ff1d81	draft diff protocol	2020-10-12 18:24:03 +00:00
Richard Chen	b233f44d2d	remove some print statements and spin lock that was used for debugging	2020-10-12 18:20:37 +00:00
Richard Chen	41843f07e6	add simulator support for different process versions and ProtocolVersion test	2020-10-12 18:19:31 +00:00
Richard Chen	76d0027fa2	merge anoyes/stable-interface and add back in isCompatible	2020-10-12 18:18:30 +00:00
sfc-gh-tclinkenbeard	a9607bdcec	Explicitly seal classes that inherit but aren't inherited from	2020-10-07 21:58:24 -07:00
sfc-gh-tclinkenbeard	71d0ef676c	Use override where applicable in fdbrpc	2020-10-07 19:55:05 -07:00
sfc-gh-tclinkenbeard	390c26b352	Replace NULL with nullptr in fdbrpc	2020-09-20 11:33:18 -07:00
sfc-gh-tclinkenbeard	c8b774a30a	Explicitly mark IAsyncFile functions as overrided	2020-08-19 18:17:32 -07:00
sfc-gh-tclinkenbeard	157700e5b6	Make IAsyncFile const-correct	2020-08-19 17:34:56 -07:00
Markus Pilman	a88d2f72e4	Simulation test now working	2020-08-11 15:34:59 -06:00
Markus Pilman	6d84bcb568	Merge remote-tracking branch 'origin/master' into features/udp	2020-08-06 14:08:34 -06:00
Markus Pilman	8976694ba1	UDP implementation (untested)	2020-08-06 14:06:50 -06:00
Meng Xu	a2089b354a	RemoveServersSafely:Safety check toKill1 to avoid cluster getting stuck toKill1 and toKill2 are a random subset of all processes. If simply kill all processes in toKill1 or toKill2, we may kill too many processes to make the cluster unavailable and stuck. Similar as what toKill2 were modified if it can cause cluster unavailable, we should do the same thing for toKill1	2020-07-28 21:07:31 -07:00
Meng Xu	6f2e12be42	Minor improvement on comments	2020-07-12 18:32:47 -07:00
Russell Sears	fcaaf11678	Merge pull request #3402 from sfc-gh-tclinkenbeard/improve-const-correctness Added more const-correctness improvements	2020-07-02 16:43:06 -07:00
Meng Xu	22f7f804b8	Merge branch 'release-6.3' into mengxu/merge-6.3-PR	2020-06-28 11:19:39 -07:00
Meng Xu	10e043da9d	Format sim2 code a bit	2020-06-23 10:07:39 -07:00
sfc-gh-tclinkenbeard	a59925dd73	Added more const-correctness improvements	2020-06-20 22:15:19 -07:00
Meng Xu	d268b6c055	Sim2:Remove an useless line	2020-06-18 12:12:15 -07:00
sfc-gh-tclinkenbeard	99bf993815	Replace BOOST_NOEXCEPT with noexcept	2020-06-09 22:39:19 -07:00
A.J. Beamon	fa08384bc3	Merge pull request #3239 from apple/release-6.3 Merge release 6.3 into master	2020-05-27 08:29:28 -07:00
A.J. Beamon	86f712657e	Eliminate the undefined behavior of calling run_network twice, instead returning an error.	2020-05-22 13:31:06 -07:00
A.J. Beamon	d128252e90	Merge release-6.3 into master	2020-05-22 09:25:32 -07:00
Evan Tschannen	5f979c0178	another compiler fix, reduced the size of the endpointNotFound map	2020-05-20 12:30:26 -07:00
tclinken	7003a68ba1	Removed outdated comments and errorCounts() declaration	2020-05-15 11:04:26 -07:00
Evan Tschannen	ad900135dd	Simulation did not properly track exclusions of tls processes	2020-05-07 10:53:13 -07:00
Evan Tschannen	c87aa33941	Merge branch 'release-6.2' # Conflicts: # CMakeLists.txt # bindings/go/src/fdb/generated.go # documentation/sphinx/source/api-common.rst.inc # documentation/sphinx/source/api-ruby.rst # documentation/sphinx/source/release-notes.rst # fdbclient/FailureMonitorClient.actor.cpp # fdbclient/NativeAPI.actor.cpp # fdbclient/vexillographer/fdb.options # fdbrpc/FlowTransport.actor.cpp # fdbserver/OldTLogServer_6_0.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/fdbserver.actor.cpp # versions.target	2020-04-23 13:47:53 -07:00
Alex Miller	8b004fe8e3	Move stop callbacks to be called after run() in sim2.	2020-04-21 20:22:16 -07:00
Alex Miller	2ce539ef6d	Respect flow<->fdbrpc module boundaries. Which fixes a compilation error due to a circular dependency between flow.a and fdbrpc.a. However, this is now done at the cost of newNet2 users have to remember to add Net2FileSystem::stop() as a callback.	2020-04-20 02:53:07 -07:00
Balachandar Namasivayam	a476127f5f	Merge pull request #2802 from xumengpanda/mengxu/debug-master-PR Fix correctness failure on master branch	2020-03-18 16:07:36 -07:00
Evan Tschannen	e08f0201f1	merge release 6.2 into master	2020-03-17 12:51:47 -07:00
Meng Xu	7f559bc712	Cleanup code and apply clang-format Self code review	2020-03-16 15:08:32 -07:00
Alex Miller	0c558efcfe	Add a `tlsinfo` command to fdbcli that prints the certificate chain. This requires the certificate chain to load successfully, otherwise fdbcli will error out at an earlier point due to Net2 not being able to configure TLS.	2020-03-13 00:11:53 -07:00
Meng Xu	0ef09539a9	addressMap[normalizedAddress]->address may not equal to normalizedAddress	2020-03-12 13:01:25 -07:00
Meng Xu	1759d5c8c4	Apply clang-format	2020-03-12 10:18:53 -07:00
Meng Xu	bd345f85db	ConsistencyCheck:Fix failure due to address inconsistency between process and worker With TLS, a worker (or process) can have a TLS address and non-TLS address. When a process is created in simulation, the primary address is TLS by default. The non-TLS one is the TLS address port plus one. In a connection between two workers, if their primary addresses do not enable or disable TLS together, one worker will swap its primary address and secondary address so that the TLS config of the two endpoints can match. The swap can make the primary address no longer the TLS one that was created when the process is created. And the swap only happens for worker instead of process struct in simulation. This swap can cause worker->address != process->address. In checkForExtraDataStores actor, we use worker->address to check if a process is killable and use the process->address to kill the process. The inconsistency can cause simulation to kill a protected process that is not killable and leads to simulation failure.	2020-03-10 21:07:16 -07:00
Evan Tschannen	303df197cf	Merge branch 'release-6.2' # Conflicts: # CMakeLists.txt # bindings/c/test/mako/mako.c # documentation/sphinx/source/release-notes.rst # fdbbackup/backup.actor.cpp # fdbclient/NativeAPI.actor.cpp # fdbclient/NativeAPI.actor.h # fdbserver/DataDistributionQueue.actor.cpp # fdbserver/Knobs.cpp # fdbserver/Knobs.h # fdbserver/LogRouter.actor.cpp # fdbserver/SkipList.cpp # fdbserver/fdbserver.actor.cpp # flow/CMakeLists.txt # flow/Knobs.cpp # flow/Knobs.h # flow/flow.vcxproj # flow/flow.vcxproj.filters # versions.target	2020-03-06 18:22:46 -08:00
Alex Miller	9b5ef3416e	Refactor TLSParams into TLSConfig + LoadedTLSConfig The idea being that we keep around a TLSConfig that is the configuration that the user has provided, and then when we want to initialize an SSL context, we ask the TLSConfig to load all certificates and return us a LoadedTLSConfig that is a concrete set of certificate bytes in memory. initTLS now just takes the in-memory bytes and applies them to the ssl context. This is a large refactor to lead up into certificate refreshing, where we will periodically check for changes to the certificates, and then re-load them and apply them to a new SSL context.	2020-03-04 20:14:47 -08:00
Evan Tschannen	c11c24b79d	removed the fdbrpc version of platform.h	2020-02-28 14:56:10 -08:00
Evan Tschannen	924d335aa7	Merge branch 'release-6.2' # Conflicts: # documentation/sphinx/source/release-notes.rst # flow/Knobs.cpp # flow/Knobs.h	2020-02-25 18:25:19 -08:00
Evan Tschannen	d60268123b	updated comment	2020-02-25 16:00:46 -08:00
Evan Tschannen	6e7d2ff7dd	prevent the proxy from delaying too long based on an incorrect estimate of the compute time	2020-02-25 15:46:13 -08:00
Evan Tschannen	96258b9809	Merge branch 'release-6.2' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbcli/fdbcli.actor.cpp # fdbclient/ManagementAPI.actor.cpp # fdbrpc/FlowTransport.actor.cpp # fdbserver/ClusterController.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/DataDistribution.actor.h # fdbserver/DataDistributionQueue.actor.cpp # fdbserver/KeyValueStoreMemory.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/QuietDatabase.actor.cpp # fdbserver/SkipList.cpp # fdbserver/StorageMetrics.actor.h # fdbserver/TLogServer.actor.cpp # fdbserver/fdbserver.actor.cpp # fdbserver/storageserver.actor.cpp # fdbserver/workloads/KVStoreTest.actor.cpp # flow/CMakeLists.txt # flow/Knobs.cpp # flow/Knobs.h # flow/genericactors.actor.cpp # flow/serialize.h	2020-02-21 19:09:16 -08:00
Evan Tschannen	f04e311a1e	Merge commit 'b46d6e25e24993ab5a5f04091fd3235050b7cd09' into feature-boost-ssl # Conflicts: # fdbserver/SimulatedCluster.actor.cpp # flow/Net2.actor.cpp	2020-02-20 17:36:38 -08:00
Evan Tschannen	f7a37077cc	handshake takes time in simulation	2020-02-20 15:26:56 -08:00
Evan Tschannen	d7c841a28a	Merge pull request #2589 from etschannen/feature-proxy-delay Improve version pipelining on the proxy	2020-02-20 15:23:30 -08:00
Evan Tschannen	def8ca6da3	simulation advances timer() separately from now() to better model the real world	2020-02-20 12:10:20 -08:00
Evan Tschannen	761da5a059	code cleanup	2020-02-19 17:59:45 -08:00
Evan Tschannen	a6486766c2	fix: rebooting an unreliable process will make it reliable again, but while unreliable the files for that process could have already been corrupted so simulation will think a process is healthy that is actually corrupted	2020-02-19 15:18:57 -08:00
Evan Tschannen	dcbce3593e	fixed TLS in simulation	2020-02-10 14:00:21 -08:00
Evan Tschannen	38d8d0d675	fixed simulation	2020-02-06 19:29:31 -08:00
Evan Tschannen	69de430057	separate handshaking from connection to improve pipelining	2020-02-06 16:45:54 -08:00
Evan Tschannen	231d7830a0	more accurate calculation on the amount of time that proxy should wait before getting a version from the master	2020-01-26 19:47:12 -08:00
Alex Miller	8d44a2a0d4	Convert sim2 from hashlittle to crc32c	2020-01-13 18:28:40 -08:00
Evan Tschannen	3c769fcf60	Merge branch 'release-6.2' # Conflicts: # CMakeLists.txt # documentation/sphinx/source/release-notes.rst # fdbserver/ClusterController.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # versions.target	2019-11-22 15:39:19 -08:00
Evan Tschannen	746b357b7f	fix: simulation should not allow connections to dead processes	2019-11-21 20:36:40 -08:00
Evan Tschannen	27cb299d84	simulation can sometimes randomly hang or throw connection_failed, instead of always doing one or the other	2019-11-21 16:24:18 -08:00
Evan Tschannen	2727b91c46	simulation tests network connections failing due to errors instead of just hanging	2019-11-21 12:33:07 -08:00
Evan Tschannen	57fdbbf975	fix: in simulation dead connections need to stop receiving traffic after 1 second	2019-11-15 10:16:44 -08:00
Meng Xu	d5b9c46de9	Increase delay in monitoring LeakedConnection trackLeakedConnection actor should give server enough time to close its connection due to idle connection. The current logic waits for at least 24 seconds to detect and close an idle connection. The current trackLeakedConnection actor waits for about 30 seconds to claim LeakedConnection error. We increase the delay in trackLeakedConnection actor to avoid false positive error in simulation test. Co-authored by: Vishesh Yadav	2019-08-23 15:10:39 -07:00
Andrew Noyes	4ebb325ff9	Make cancellable actors [[nodiscard]] by default	2019-08-16 09:24:57 -07:00
mpilman	370ba8b841	Remove --object-serializer flag from executables	2019-08-06 09:25:40 -07:00
Vishesh Yadav	c694931e33	sim2: Remove obsolete comment	2019-07-10 14:06:06 -07:00
Vishesh Yadav	1f9c80f633	fdbrpc: Instead of tracking last sent data, track last sent non-ping data * This will allow client to continue monitoring peer connections while connection stays open, so that there is no period of "uncertainty" without previous no-monitoring approach. * Use multiplier for incoming connection idle timeout * Update idle connection timeout values and leaked connection timeout in simulator.	2019-07-09 14:24:16 -07:00
Vishesh Yadav	867986cdea	fdbrpc: Reduced connection monitoring from clients This patch makes two changes to connection monitoring: 1. Connection monitoring at client side will check if the connection has stayed idle for some time. If connection is unused for a while, we close the connection. There is some weirdness involved here as ping messages are themselves connection traffic. We get over this by making it a two-phase process, first checking for idle reliable traffic, followed by disabling pings and then checking for idle unreliable traffic. 2. Connection monitoring of clients from server will no longer send pings to clients. Instead, it keeps monitoring the received bytes and closes after a certain period of inactivity.	2019-07-09 14:24:16 -07:00
Alex Miller	bc4548e0d3	Fix sed accidentally rewriting a trace event to have an invalid field name.	2019-06-27 17:55:41 -07:00
Alex Miller	bf883d7055	Merge remote-tracking branch 'upstream/master' into flowlock-api	2019-06-25 14:26:50 -07:00
Evan Tschannen	0fe6edc254	Merge pull request #1678 from mpilman/features/external-workload Features/external workload	2019-06-25 13:53:19 -07:00
Alex Miller	7a500cd37f	A giant translation of TaskFooPriority -> TaskPriority::Foo This is so that APIs that take priorities don't take ints, which are common and easy to accidentally pass the wrong thing.	2019-06-25 02:47:35 -07:00
Alex Miller	12dbe13c9c	Provide a no-op O_CLOEXEC on windows to fix the build.	2019-06-19 17:16:06 -07:00
mpilman	2eff2b7e21	First simple test is working (but very buggy)	2019-06-19 13:03:41 -07:00
Parallels	773f52d0a1	Merge remote-tracking branch 'upstream/master' into cloexec	2019-06-03 15:43:32 -07:00
A.J. Beamon	603721e125	Merge branch 'master' into thread-safe-random-number-generation # Conflicts: # fdbclient/ManagementAPI.actor.cpp # fdbrpc/AsyncFileCached.actor.h # fdbrpc/genericactors.actor.cpp # fdbrpc/sim2.actor.cpp # fdbserver/DiskQueue.actor.cpp # fdbserver/workloads/BulkSetup.actor.h # flow/ActorCollection.actor.cpp # flow/Net2.actor.cpp # flow/Trace.cpp # flow/flow.cpp	2019-05-23 08:35:47 -07:00
Evan Tschannen	f4fbaac6b0	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # versions.target	2019-05-19 10:27:59 -07:00
Evan Tschannen	2b8b7954a9	in simulation, prevent data from being received over a connection 1 second after the connection is closed on the other end	2019-05-17 15:05:32 -07:00
Alex Miller	69fb852ee0	Add more CLOEXEC-like things. From missed call sites found during/after code review.	2019-05-14 20:30:58 -10:00
mpilman	20c3f7f264	remove mixed-mode support	2019-05-13 14:15:23 -07:00
mpilman	642a96807b	Fixed compilation issues after rebase	2019-05-13 14:15:22 -07:00
mpilman	9eeb48c43d	Allow to turn on object serializer This commit includes functionality to turn on the object serializer for network communication. This is done the following way: - On incoming connections, a process will detect whether the client supports the object serializer and will only serialize responses with it, if it does - On outgoing connections, the command line flag is used to determine whether the object serializer should be used to send data. This way, a cluster can run in mixed mode. To upgrade one can upgrade one process at a time and set the flag one process at a time. This is how this is tested on the simulator: - The command line flag can take three options: on, off, and random. - For off, the object serializer will never be used. - For on, the object serializer will be always used. - For random, the simulator will flip a coin for each process it starts up.	2019-05-13 14:15:22 -07:00
Evan Tschannen	8c3516951a	Merge branch 'release-6.1' # Conflicts: # documentation/sphinx/source/release-notes.rst # versions.target	2019-05-12 20:13:49 -07:00
Alex Miller	c502ed3d15	Fix a variety of problems stemming from a wait() being added to push(). And that this code was previously insufficiently tested.	2019-05-10 14:55:11 -10:00
A.J. Beamon	5f55f3f613	Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.	2019-05-10 14:01:52 -07:00
Alex Miller	510b0b2fcd	Fix DiskQueue not replaceFile'ing frequently enough for the final time.	2019-05-08 23:08:25 -10:00
Austin Seipp	af00248df6	fdbrpc: fix some print/scan format warnings Signed-off-by: Austin Seipp <aseipp@pobox.com>	2019-05-06 13:35:29 -07:00
Andrew Noyes	781b6ece77	Fix OPEN_FOR_IDE -Wunused-variable warnings CC #1255, #1173	2019-04-16 15:28:01 -07:00
Andrew Noyes	6207d724f8	Fix all -Wunused-variable warnings	2019-04-15 18:13:00 -07:00
Evan Tschannen	6220a5ce0f	Merge pull request #1370 from jzhou77/fix-unreferenced Remove unused functions	2019-04-09 11:49:45 -07:00
mpilman	c008e16c81	Defer formatting in traces to make them cheaper This is the first part of making `TraceEvent` cheaper. The main idea is to defer calls to any code that formats string. These are the main changes: - TraceEvent::detail now takes a c-string instead of std::string for literals. This prevents unnecessary allocations if the trace is not going to be printed in the first place (for example for SevDebug). Before that `detail` expected a `std::string` as key, which mean that any string literal would be copied on each call. - Templates Traceable and SpecialTraceMetricType. These templates can be specialized for any type that needs to be printed. The actual formatting will be deferred to after the `enabled` check. This provides two benefits: (1) if a TraceEvent is disabled, we don't pay for the formatting and (2) TraceEvent can trace types that it doesn't know about. - TraceEvent::enabled will be set in the constructor if the Severity is passed. This will make sure that `TraceEvent::init` is not called. - `TraceEvent::detail` will be inlined. So for disabled TraceEvent calls, a call to detail will only introduce a if-branch which is much cheaper than a function call.	2019-04-05 13:12:19 -07:00
Jingyu Zhou	a55f06e082	Remove unused functions Found with -Wunused-function flag.	2019-03-27 15:45:28 -07:00
Evan Tschannen	1fc6937802	changed NetworkAddressList to at most two addresses for performance	2019-03-23 17:54:46 -07:00
Alex Miller	c6a65389ae	Remove noexcept macro and replace with BOOST_NOEXCEPT. BOOST_NOEXCEPT does what the noexcept macro was supposed to do, but in a way that is correctly maintained over time.	2019-03-05 22:06:12 -08:00
Vishesh Yadav	cc9ad0e202	net: Use IPv6 in simulation testing #963 25% times we will use IPv6 addresses	2019-03-04 14:12:45 -08:00
Vishesh Yadav	57832e625d	net: Support IPv6 #963 - NetworkAddress now contains IPAddress object which can be either IPv4 or IPv6 address. 128bits are used even for IPv4 addresses, however only 32bits are used when using/serializing IPv4 address. - ConnectPacket is updated to store IPv6 address. Backward compatible with old format since the first 32bits of IP address field is used for serialization of IPv4. - Mainly updates rest of the code to use IPAddress structure instead of plain uint32_t. - IPv6 address/pair ports should be represented as `[ip]:port` as per convention. This applies to both cluster files and command line arguments.	2019-03-04 14:12:41 -08:00
Evan Tschannen	8afb7fbb9d	Merge pull request #1160 from alexmiller-apple/tstlog-fork Spill-By-Reference TLog Part 2: New and Old TLogServers co-exist harmoniously	2019-02-26 18:00:04 -08:00
Alex Miller	2dc57568cb	Change many things about log_version. * log_version in the database (`/conf/log_version`) is now a hint that gets rounded to the nearest supported version. * fdbcli and FDB enforce that only a valid log_version can be configured to * TLogVersion is persisted in CoreTLogSet (and LogSet and TLogSet) * Some comments here and there * Add an assert on filename length to make sure KV-pairs in filename don't exceed a maximum length.	2019-02-26 16:47:04 -08:00
Evan Tschannen	b8910ba7cd	Merge branch 'master' into feature-fix-force-recovery # Conflicts: # fdbclient/ManagementAPI.actor.h # fdbserver/DataDistribution.actor.cpp # fdbserver/storageserver.actor.cpp # fdbserver/workloads/KillRegion.actor.cpp	2019-02-22 14:38:13 -08:00
Evan Tschannen	065a45e05f	Merge branch 'master' into feature-fix-force-recovery # Conflicts: # fdbclient/ManagementAPI.actor.cpp # fdbserver/ClusterController.actor.cpp # fdbserver/workloads/KillRegion.actor.cpp	2019-02-18 17:09:06 -08:00
Evan Tschannen	62603d11a1	updated the killRegion simulation test to test a much larger variety of failure scenarios	2019-02-18 15:32:51 -08:00
Vishesh Yadav	907446d0ce	Merge remote-tracking branch 'apple/master' into task/tls-upgrade	2019-02-14 11:37:38 -08:00
Evan Tschannen	486e0e13c3	Merge pull request #1116 from alexmiller-apple/tstlog Random cleanups that prepare for Spill-By-Reference TLog	2019-02-05 18:09:06 -08:00
Alex Miller	6668b7c544	Make simulation enforce what KAIO requires.	2019-02-04 18:04:22 -08:00
Evan Tschannen	e9ddd94e27	The failure monitor is given a list of all IP addresses associated with a process The connect packet includes the correct remote address Did a lot of code cleanup Simulation test mixed TLS and non-TLS listeners on the same process	2019-01-31 18:20:14 -08:00
A.J. Beamon	05b38167d0	Update fdbrpc/sim2.actor.cpp Co-Authored-By: etschannen <36455792+etschannen@users.noreply.github.com>	2019-01-29 11:35:02 -08:00
Evan Tschannen	699f8dd617	fix: coordinators auto could put two coordinators in the same zone simulation now tests two machines in the same zone	2019-01-18 15:42:48 -08:00
Vishesh Yadav	51b89ae083	WIP	2019-01-09 07:41:02 -08:00
Alex Miller	cebdb83def	Revert "Merge pull request #977 from alexmiller-apple/abspath" This reverts commit `9881b1d074`, reversing changes made to `6d278e466b`.	2019-01-08 16:52:09 -08:00
Markus Pilman	4ae701d8a9	minor bugfix to look up correct filename in cache (manually cherry-picked from flat-buffers branch)	2018-12-13 22:21:25 -08:00
Markus Pilman	0207831fd6	Use abspath when dealing with the simulator file-cache The simulator uses a hash table to cache all open files to make sure that several simulated processes don't open the file more than once. This currently doesn't work properly and deleted files are often kept open forever. As a result, we often ran out of file descriptors. The problem is luckily quite simple: files are often opened with an absolute path but later a relative path is passed for deletion. This is not working because the map that is used to store the file descriptors is not aware of paths - so deleted files are often not removed from this map. The fix that works for us is to just always work with absolute paths when adding and removing files from this map.	2018-12-13 22:21:06 -08:00
Vishesh Yadav	e04abf25f7	simulator: Support multiple listeners on single process Sim2Listener can now take the network address to listen on. This is used to listen to multiple ports in simulator and test the patch which added multiple network addresses to single endpoint.	2018-12-13 13:36:52 -08:00
Vishesh Yadav	e8e01b2406	Remove unused localAddress parameter from newNet2 and Net2 classes	2018-12-13 13:36:52 -08:00
Evan Tschannen	4b5d0b4e2c	Merge branch 'release-6.0' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbclient/AsyncFileBlobStore.actor.cpp # fdbclient/AsyncFileBlobStore.actor.h # fdbclient/BlobStore.actor.cpp # fdbclient/BlobStore.h # fdbclient/HTTP.actor.cpp # fdbclient/ManagementAPI.actor.cpp # fdbclient/NativeAPI.actor.cpp # fdbrpc/LoadBalance.actor.h # fdbrpc/batcher.actor.h # fdbrpc/fdbrpc.vcxproj # fdbrpc/sim2.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/DataDistributionTracker.actor.cpp # fdbserver/SimulatedCluster.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/masterserver.actor.cpp	2018-11-10 13:04:24 -08:00
A.J. Beamon	58a0e22d3c	Remove sim2 dependency on fdbclient: * Remove unused 'exclusionSet' that used a type from fdbclient. * Replace usages of describe(x) with x.toString(). Also removed some using statements.	2018-10-26 09:23:12 -07:00
Robert Escriva	268093a96d	Adjust all includes to be relative to the root. Remove the use of relative paths. A header at foo/bar.h could be included by files under foo/ with "bar.h", but would be included everywhere else as "foo/bar.h". Adjust so that every include references such a header with the latter form. Signed-off-by: Robert Escriva <rescriva@dropbox.com>	2018-10-19 17:35:33 +00:00
Evan Tschannen	3922e477a5	Merge branch 'release-6.0' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbclient/ManagementAPI.actor.cpp # fdbserver/ClusterController.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/LogSystemDiskQueueAdapter.actor.cpp # fdbserver/SimulatedCluster.actor.cpp # fdbserver/TLogServer.actor.cpp	2018-10-03 16:57:18 -07:00
Evan Tschannen	200e65fe61	added a workload which tests killing an entire region, and recovering from the failure with data loss. fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore fix: we have to modify all of history, we cannot stop after finding a local remote	2018-09-17 18:32:39 -07:00
Alex Miller	535b5701e5	Rewrite all `Void _ = wait(...)` -> `wait(...)`. This takes advantage of the new actorcompiler functionality to avoid having duplicate definitions of `Void _` when trying to feed the un-actorcompiled source through clang.	2018-08-14 15:50:26 -07:00
A.J. Beamon	3535ddad80	Merge pull request #674 from alexmiller-apple/glibcxx-debug-fixes Fix bugs uncovered by -D_GLIBCXX_DEBUG	2018-08-09 08:18:51 -07:00
Steve Atherton	fb46385a39	Merge pull request #628 from alexmiller-apple/reloadcertificates Reload certificates if changed. This is a cherry-pick of #628 back to release-6.0	2018-08-06 18:04:04 -07:00
Evan Tschannen	538e684f1c	Merge branch 'release-6.0' # Conflicts: # versions.target	2018-08-03 11:41:46 -07:00
Alex Miller	1a7cda4149	Stop performing self-moves. (e.g. a = std::move(a)) self-moves are frowned upon in C++, and in our code this generally happens from calls to swap as part of trying to implement a "unordered erase" function via swap-to-the-end-and-pop_back. For convenience, a swapAndPop() function is now offered that performs this, while disallowing self-moves.	2018-08-01 18:09:54 -07:00
Evan Tschannen	1c29275672	call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.	2018-08-01 14:30:57 -07:00
Alex Miller	262af775eb	Implement overly simple file write timestamps for simulation, and clean up code.	2018-07-24 17:20:31 -07:00
Alex Miller	2d26e98d07	Add a cross-platform getLastWrite() to get a file's mtime.	2018-07-20 19:00:32 -07:00
Evan Tschannen	e0caa28758	code cleanup	2018-07-16 15:56:43 -07:00
Evan Tschannen	f72a9f60c0	only disable fearless if a datacenter has actually been killed fix: we must prevent recovery into the dead datacenter while reducing usable_regions	2018-07-16 10:06:57 -07:00
Evan Tschannen	82cc30be62	added testing for two_satellite_fast and two_satellite_safe	2018-07-09 22:01:46 -07:00
Evan Tschannen	6d7172ef7e	fix: canKillProcesses did not take into account the remoteTLogPolicy when checking notEnoughLeft	2018-07-05 21:36:09 -07:00
Evan Tschannen	6f4ca2eba2	fix: get all processes did not include rebooting processes	2018-07-05 21:13:56 -07:00
Evan Tschannen	cd4fb9285a	waitForExlusion requires both regions to be healthy, which is only possible if we do not kill all logs in a region	2018-07-05 14:04:42 -07:00
Evan Tschannen	7315e5da55	fix: isExcluded and isCleared were exactly wrong fix: isCleared should mean the process is dead	2018-07-05 02:22:22 -07:00
Evan Tschannen	e17dfea3b6	fix: desiredTLogCount was used instead of getDesiredLogs(), which caused problems with recruitment when desiredTLogCount was -1. canKillProcess logic was wrong. We still need to configure usable_regions because if datacenterVersionDifference is too large we cannot complete data movement.	2018-07-04 16:22:32 -04:00
Evan Tschannen	0913368651	added usable_regions to specify if we will replicate into a remote region remote replication defaults to the primary replication removed remote_logs, because they should be specified as an override in the regions object	2018-06-17 19:31:15 -07:00
Evan Tschannen	372ed67497	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbserver/DataDistribution.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp	2018-06-11 11:34:10 -07:00
Evan Tschannen	48fbc407fd	fix: we cannot kill all of the remote tlogs, because we still need their data to copy to the next generation in the same data center	2018-06-08 15:28:44 -07:00
A.J. Beamon	e5488419cc	Attempt to normalize trace events: * Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check. * Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase. * Use seconds instead of milliseconds in details. Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed. This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.	2018-06-08 11:11:08 -07:00
Evan Tschannen	8f984cb2c9	Merge branch 'release-5.2' # Conflicts: # fdbrpc/TLSConnection.h	2018-05-10 09:13:22 -07:00
Balachandar Namasivayam	d3b5cfb93c	Support latest TLS plugin. Add support for https in backup.	2018-05-08 16:28:13 -07:00
Evan Tschannen	68606c7984	fix: sim2 logic for when a kill is safe was incorrect	2018-03-06 18:38:05 -08:00
A.J. Beamon	f2c804e14f	Reverting changes from merge of master into release-5.2 (`b25810711c`). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit.	2018-03-06 10:15:04 -08:00
Evan Tschannen	e3c6b66240	fix: do not commit more data after being stopped fix: prioritize dc locality above exclusion to prevent being stuck after excluding all machines in a data center	2018-02-26 13:13:37 -08:00
Evan Tschannen	37a6a81634	Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs # Conflicts: # fdbserver/workloads/RestartRecovery.actor.cpp	2018-02-23 12:33:28 -08:00
Alec Grieser	0bae9880f1	remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py	2018-02-21 10:25:11 -08:00
Evan Tschannen	cb25564d38	simulated cluster supports fearless configurations; removed unused simulation variables; run the simulation with only 1 coordinator most of the time, since we protect the coordinator from being killed, and protecting too many things is bad for simulation	2018-02-15 18:32:39 -08:00
Evan Tschannen	ebd94bb654	removed a separately configurable storage team size for the remote data center, because it did not make sense; fix: the master did not monitor for the failure of remote logs; stop merge attempts when a data center is failed; fixed a variety of other problems with data distribution when a data center is failed	2018-02-02 11:46:04 -08:00
Evan Tschannen	5ac4f73978	Merge branch 'release-5.1' into feature-remote-logs # Conflicts: # fdbclient/NativeAPI.actor.cpp # fdbrpc/Locality.h # fdbrpc/simulator.h # fdbserver/ApplyMetadataMutation.h # fdbserver/ClusterController.actor.cpp # fdbserver/LogSystemPeekCursor.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/SimulatedCluster.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp # fdbserver/WorkerInterface.h # fdbserver/masterserver.actor.cpp # flow/Net2.actor.cpp # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-01-05 11:33:42 -08:00
Evan Tschannen	e2c1e87df6	made a large number of fixes to make fearless DR correctness clean.	2017-10-19 15:36:32 -07:00
Stephen Atherton	e934604f67	Added DNS resolution. Interface is INetworkConnections::resolveTCPEndpoint() to resolve, or for convenience INetworkConnections::connect(host, service) will resolve host and service (port number or service name like http) and connect to one of the addresses at random. BlobStoreEndpoint now only accepts hostnames and an optional service, so this update is not compatible with the previous URL formats having many IP addresses.	2017-10-15 21:51:11 -07:00
Alvin Moore	de8f875038	Fixed call to IsClear; Changed killMachine and killDataCenter interface to return final killtype; Updated TESTs for DataCenter to ensure that DataCenter was killed; Added assertion to ensure that failed DC kills were not downgrades	2017-10-05 03:07:20 -07:00
Alvin Moore	5257b99d3f	Fixed problem with machines RebootedAndCleared not being considered dead in availability consideration	2017-10-03 10:48:16 -07:00
Alvin Moore	d099656557	Merge branch 'release-5.0'	2017-10-02 12:05:24 -07:00
Alvin Moore	25513d8e2c	Added tests for DataCenter kills	2017-10-02 12:04:28 -07:00
Alvin Moore	298b54104e	Merge branch 'release-5.0'	2017-09-26 11:16:14 -07:00
Alvin Moore	02525d7b14	Added TESTs to ensure that all of the different kills are performed during simulation	2017-09-26 11:15:39 -07:00
Evan Tschannen	e8b895c878	added the ability to disable connection failures for a period of time after one happens	2017-09-18 12:46:29 -07:00
Alvin Moore	4a6fb10a42	Added TraceEvents for remaining and killed workers when killing DataCenter; Fixed consideration of excluded workers when checking cluster availability	2017-09-12 13:33:13 -07:00
Alvin Moore	44e0df78c5	Added support for tracking roles for simulation workers; Fixed the exclusion and inclusion address simulation API and integration within workloads; Added more information within trace events for simulation	2017-08-28 11:25:37 -07:00
Evan Tschannen	272b4b984c	fix: fixed a rare bug where we do not wait for a file in the process of being deleted to shutdown before rebooting a machine	2017-08-25 10:12:58 -07:00
Alvin Moore	17c6392295	Added support for printing out information on the current simulation workers	2017-08-22 16:56:33 -07:00
Alvin Moore	6d19580789	Merge branch 'release-5.0' of github.com:apple/foundationdb into release-5.0 # Conflicts: # fdbrpc/simulator.h	2017-06-19 17:39:37 -07:00
Alvin Moore	9553458b78	Updated simulation to support managing exclusion and inclusion address; Added method for identifying acceptable availability process classes; Extended cluster availability function to ensure coordinators can be auto configured; Fixed availability function to allow protected processes to be considered as dead if not available; Added debug trace events for providing machine state when considering availability; Added trace event for protected coordinators	2017-06-19 16:48:15 -07:00
Stephen Atherton	430bb6224e	Merge branch 'release-4.6' into release-5.0 # Conflicts: # fdbrpc/AsyncFileKAIO.actor.h # fdbrpc/Net2FileSystem.cpp # fdbrpc/sim2.actor.cpp	2017-06-16 02:14:19 -07:00
Stephen Atherton	f405c8d88e	Merge branch 'release-4.6' into release-5.0 # Conflicts: # fdbrpc/AsyncFileKAIO.actor.h # fdbrpc/sim2.actor.cpp # fdbserver/optimisttest.actor.cpp # versions.target	2017-06-15 17:40:19 -07:00
Stephen Atherton	fa4fdb1f1d	Merge branch 'fix-io-timeout-handling' into release-5.0 # Conflicts: # fdbserver/optimisttest.actor.cpp	2017-05-31 17:03:15 -07:00
Stephen Atherton	98604d33a0	Merge branch 'fix-io-timeout-handling' # Conflicts: # fdbrpc/AsyncFileKAIO.actor.h # fdbrpc/sim2.actor.cpp # fdbserver/KeyValueStoreSQLite.actor.cpp # fdbserver/optimisttest.actor.cpp # fdbserver/worker.actor.cpp # fdbserver/workloads/MachineAttrition.actor.cpp # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2017-05-26 18:43:08 -07:00
Alvin Moore	0b9ed67e12	Fixed support for RemoveServers Workload; Added availability functions to simulation	2017-05-26 14:20:11 -07:00
Alvin Moore	16cc0821b1	Removed dead machine option from simulation	2017-05-25 16:29:02 -07:00
FDB Dev Team	a674cb4ef4	Initial repository commit	2017-05-25 13:48:44 -07:00

327 Commits