foundationdb

Commit Graph

Author	SHA1	Message	Date
Evan Tschannen	3915d6825c	we need to check the server list at a higher priority, because if we do not notice a storage server interface change for a long period of time, we will mark it as failed	2018-01-12 12:51:07 -08:00
Evan Tschannen	de119f192d	fixed a priority inversion where the tlog would prefer to copy data from the previous generation rather than make data durable (leading to being ratekeeper controlled)	2018-01-11 16:09:49 -08:00
Evan Tschannen	29ebb19388	Merge branch 'release-5.0' into release-5.1	2018-01-11 15:43:37 -08:00
Evan Tschannen	22e5a0b257	formatting	2018-01-11 14:44:09 -08:00
Evan Tschannen	173a8de3ed	DBCoreState supports upgrades from 3.0 versions	2018-01-11 14:39:51 -08:00
A.J. Beamon	2f5073d00f	Some visual studio project cleanup.	2018-01-10 10:07:18 -08:00
Evan Tschannen	022df3b91b	backup and restore sometimes took too long in simulation	2018-01-09 17:26:42 -08:00
Evan Tschannen	645f68212b	make timekeeper priority system immediate	2018-01-08 18:21:00 -08:00
Evan Tschannen	370e8a9903	fix: split metrics could fail an assert in a very rare scenario	2018-01-08 18:20:22 -08:00
Evan Tschannen	9630deba3a	fixed a number of bugs related to running fearless without remote logs	2018-01-08 12:04:19 -08:00
Evan Tschannen	d3116fb336	masterRecoveryDuration is only a sevWarnAlways outside of simulation	2018-01-07 15:37:45 -08:00
Evan Tschannen	4e8bc273b3	added a version of getKeyRangeLocations that checks for endpoint failures fix: did not add the cluster controller to id_used in all cases removed obsolete fixmes	2018-01-07 15:32:43 -08:00
Evan Tschannen	30710f7493	syncLogId was not necessary	2018-01-06 14:52:39 -08:00
Evan Tschannen	3ec45d38a0	Merge branch 'master' into feature-remote-logs # Conflicts: # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-01-06 13:54:45 -08:00
Evan Tschannen	10c3fc165e	fix: after recovering from disk, only allow peeking data that was fully recovered	2018-01-06 13:49:13 -08:00
Stephen Atherton	b86f68ceb8	Added new test that combines atomic backup/restore. Added randomization to delays in AtomicRestore workload.	2018-01-05 14:43:21 -08:00
Evan Tschannen	63751fb0e2	fix: remote logs are not in the log system until the recovery is complete so they cannot be used to determine if this is the correct log system to recover from	2018-01-05 14:15:25 -08:00
Evan Tschannen	5ac4f73978	Merge branch 'release-5.1' into feature-remote-logs # Conflicts: # fdbclient/NativeAPI.actor.cpp # fdbrpc/Locality.h # fdbrpc/simulator.h # fdbserver/ApplyMetadataMutation.h # fdbserver/ClusterController.actor.cpp # fdbserver/LogSystemPeekCursor.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/SimulatedCluster.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp # fdbserver/WorkerInterface.h # fdbserver/masterserver.actor.cpp # flow/Net2.actor.cpp # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-01-05 11:33:42 -08:00
A.J. Beamon	5015119115	Generalize the message that gets displayed in status if a cluster file's contents are incorrect.	2018-01-05 10:29:47 -08:00
Evan Tschannen	e11f461cbd	fix: better master exists needs to check master fitness before tlogs or proxies because that is the order of recruitment	2018-01-04 15:19:46 -08:00
Evan Tschannen	f8f1c48d83	sometimes test pausing backups	2018-01-04 11:40:08 -08:00
Evan Tschannen	f2c4beed9f	fix: tlogFitness did not consider it better to have one tlog of a better fitness fix: checkStable was not used in all places in better master exists fix: we need to call checkOutstanding on worker registration in all cases fix: in case persistentData is keyValueStoreMemory, we need to make sure it is fully recovered before writing to it	2018-01-04 11:33:02 -08:00
Evan Tschannen	6d5dd9bd27	fix: we cannot pipeline disk queue commits until after the first commit is successful	2018-01-02 13:30:27 -08:00
Evan Tschannen	86958cb08d	Merge pull request #226 from cie/fix-taskBucket-unblockFuture Modify TaskBucketCorrectness to support chain and multiple tasks	2017-12-20 18:00:54 -08:00
Yichi Chiang	91e5abeaa6	Modify TaskBucketCorrectness to support chain and multiple tasks	2017-12-20 17:02:49 -08:00
Alex Miller	f70e3b9fe8	Add or change a bunch of comments to provide descriptions of function contracts. This cleans up a bit of the VersionStamp DR work I did, and leaves hints and advice for anyone who will be touching mutation applying code in the future.	2017-12-20 16:57:14 -08:00
Evan Tschannen	982f0dcb1e	Merge pull request #222 from cie/alexmiller/drtimefix2 Fix yet another VersionStamp DR issue.	2017-12-20 15:09:23 -08:00
Alex Miller	b5a6bc0ab7	Fix VersionStamp problems by instead adding a COMMIT_ON_FIRST_PROXY transaction option. Simulation identified the fact that we can violate the VersionStamps-are-always-increasing promise via the following series of events: 1. On proxy 0, dumpData adds commit requests to proxy 0's commit promise stream 2. To any proxy, a client submits the first transaction of abortBackup, which stops further dumpData calls on proxy 0. 3. To any proxy that is not proxy 0, submit a transaction that checks if it needs to upgrade the destination version. 4. The transaction from (3) is committed 5. Transactions from (1) are committed This is possible because the dumpData transactions have no read conflict ranges, and thus it's impossible to make them abort due to "conflicting" transactions. There's also no promise that if client C sends a commit to proxy A, and later a client D sends a commit to proxy B, that B must log its commit after A. (We only promise that if C is told it was committed before D is told it was committed, then A committed before B.) There was a failed attempt to fix this problem. We tried to add read conflict ranges to dumpData transactions so that they could be aborted by "conflicting" transactions. However, this failed because this now means that dumpData transactions require conflict resolution, and the stale read version that they use can cause them to be aborted with a transaction_too_old error. (Transactions that don't have read conflict ranges will never return transaction_too_old, because with no reads, the read snapshot version is effectively meaningless.) This was never previously possible, so the existing code doesn't retry commits, and to make things more complicated, the dumpData commits must be applied in order. 
This would require either adding dependencies to transactions (if A is going to commit then B must also be/have committed), which would be complicated, or submitting transactions with a fixed read version, and replaying the failed commits with a higher read version once we get a transaction_too_old error, which would unacceptably slow down the maximum throughput of dumpData. Thus, we've instead elected to add a special transaction option that bypasses proxy load balancing for commits, and always commits against proxy 0. We can know for certain that after the transaction from (2) is committed, all of the dumpData transactions that will be committed have been added to the commit promise stream on proxy 0. Thus, if we enqueue another transaction against proxy 0, we can know that it will be placed into the promise stream after all of the dumpData transactions, thus providing the semantics that we require: no dumpData transaction can commit after the destination version upgrade transaction.	2017-12-20 15:04:04 -08:00
Stephen Atherton	e0d9cea008	Merge branch 'master' into continuous-backup # Conflicts: # fdbclient/FileBackupAgent.actor.cpp # fdbrpc/BlobStore.actor.cpp	2017-12-19 23:02:14 -08:00
Alex Miller	c7dbd31a1e	Refactoring: Create a common prefixRange and do UID->Key once in backup.	2017-12-19 17:17:50 -08:00
Alex Miller	1488c12c18	Simulation will return an error and print if any non-suppressed SevError events were logged. This means that loops like `seed=1; while ./fdbserver -r simulation -s $seed; do seed=$(($seed+1)); done` can be used to find an example of an often failing test. This also means joshua will report ExitCode errors on anything that has a SevError in the log. As a part of this, we also implicitly downgrade any injected errors to SevWarnAlways.	2017-12-19 17:17:50 -08:00
Stephen Atherton	e28641886d	TraceEvent improvements. Minor bug fix, restore log writing tasks didn't have the log file endVersion but it's only for logging purposes.	2017-12-19 15:27:04 -08:00
Evan Tschannen	a5601877b3	fix: valgrind issue with destruction ordering	2017-12-18 15:31:59 -08:00
Evan Tschannen	1dc9eceb6d	optimize GetKeyLocationRequests on the proxy so they only require a single map lookup, instead of doing 3 + (3* [number of ranges]) lookups	2017-12-15 20:13:44 -08:00
Stephen Atherton	33f9f1a95c	Added SnapshotDispatch task for writing snapshots in random order over a specified period of time and adapting speed to a growing or shrinking database. TaskBucket now supports scheduling tasks. TaskFuture now correctly recognizes multiple tasks in its callback space. TaskBucket extendTimeout() now supports specifying the new timeout version. Submitting a backup now requires a snapshot duration.	2017-12-14 01:44:38 -08:00
Evan Tschannen	7ce93426ed	fix: connection disabler in removeServerSafely needs to run for the whole test to avoid getting stuck on include all	2017-12-12 18:38:57 -08:00
Alec Grieser	4495a19299	Merge pull request #220 from cie/alexmiller/flowprofcircus Add class restrictions to CpuProfiler, and fix metric crash.	2017-12-11 14:13:22 -08:00
Evan Tschannen	73a0a07eac	clients ask for key location information directly from the proxy, instead of reading it from the database	2017-12-09 16:10:22 -08:00
Alex Miller	48660e9ce5	Add class restrictions to CpuProfiler, and fix metric crash. This change largely refactors away the old meaning of the value given to flow_profiler, which was the number of machines that we'd be profiling, and instead replaces it with the classes of processes to profile for the duration of the test. Most importantly, this means that one can profile in circus with a configuration that has "ssd" in it, and the circus run will still complete (as long as the argument isn't "storage"). And also finally add some other fixes I had to the same file to conditionally change the name of the metric we're looking for to comply with what's actually written.	2017-12-07 19:28:29 -08:00
Stephen Atherton	abb2dd1ebc	Merge pull request #214 from cie/alexmiller/fallocate Use fallocate to zero ranges instead of writing zeroes	2017-12-06 13:47:40 -08:00
Evan Tschannen	5a947212ed	fix: ensure all prior commits have completed before returning that a commit has committed from the disk queue	2017-12-06 12:31:07 -08:00
Stephen Atherton	f8e89a40ac	Bug fixes, take(1) is incorrect usage of FlowLock.	2017-12-04 10:25:47 -08:00
Evan Tschannen	49dac11a5f	added a SevWarnAlways for when a disk queue file grows larger than 20GB	2017-12-01 15:05:17 -08:00
Evan Tschannen	482ac38ca6	added knobs so that the client failure monitoring update rate and the server failure monitoring update rate are separate knobs	2017-12-01 13:04:32 -08:00
Evan Tschannen	c3918d892a	do not use bandwidth splitting on the keyServer shard, lots of sets and clears to this shard generally means you do not want to create additional data distribution work	2017-11-30 18:28:16 -08:00
Alex Miller	196258080b	Refactor zeroing a chunk of a file from DiskQueue into IAsyncFile. If we're going to do the work to provide more optimized ways to zero files, then I'd feel better with this being in a more common place, so that any other zero-ers are likely to reuse it. It also makes testing easier/more obvious. Also, because it's needed for correctness, fix the aligned_alloc for OSX, which wasn't aligned, and use an actually aligned allocation function.	2017-11-30 17:57:55 -08:00
Alex Miller	c7a120c59d	Rename IAsyncFile::incrementalDelete -> IAsyncFileSystem::incrementalDeleteFile. `deleteFile` existed in IAsyncFileSystem, so an incremental delete function seems to belong more as a virtual method on IAsyncFileSystem than a static method on IAsyncFile, and the naming should match. As long as we're here, change IAsyncFile to declare a virtual destructor, so that it has good and proper C++ behavior. I presume this is what was vaguely intended by the default constructor definition that previously existed?	2017-11-30 17:19:10 -08:00
Evan Tschannen	7f72aa7de5	fix: a storage server does not ever need to rollback before a version restored from disk	2017-11-30 11:19:43 -08:00
Evan Tschannen	e5a682948c	Merge pull request #212 from cie/check-cluster-controller-desired-class Check cluster controller using desired process class in consistency c…	2017-11-29 15:57:51 -08:00
Yichi Chiang	8ba0eaebff	Check cluster controller using desired process class in consistency check	2017-11-29 15:09:23 -08:00
Evan Tschannen	8c51bc4ac4	fixed low latency tests in a way that gives us better test coverage	2017-11-28 18:20:29 -08:00
Evan Tschannen	dc624a54dc	fix: avoid flushing large queues in simulation when checking latency	2017-11-27 17:23:20 -08:00
Stephen Atherton	1b1c8e985a	Merge branch 'master' into backup-container-refactor # Conflicts: # fdbclient/FileBackupAgent.actor.cpp	2017-11-25 19:54:51 -08:00
Stephen Atherton	6695c9e6a2	Bug fixes and improvements to error handling and trace events. The most serious bug was that restore would start at the wrong version, possibly skipping early log and range files.	2017-11-25 00:46:16 -08:00
Alex Miller	f19cb3bbbd	Merge pull request #208 from cie/alexmiller/grvtfix Fix the GRV performance regression	2017-11-17 15:00:44 -08:00
Yichi Chiang	d9a98aa968	Remove commented code	2017-11-16 17:25:37 -08:00
Yichi Chiang	0d5dc15ac8	Fix double recoveries	2017-11-16 16:58:55 -08:00
Alex Miller	e9412bbb11	Fix the GRV performance regression introduced by adding the policy engine to GRV calculations. Construction of LocalityGroup from LocalityData is expensive, and the previous code greatly ran afoul of that. The policy engine does a large amount of interning of strings and building compressed maps to make the expected many future selectReplica calls cheap. Unfortunately we don't call selectReplicas, so much of this work is undesirable for us, and a large amount of CPU time is spent doing this initialization work. The new changes aggressively do the minimal LocalityGroup::add() calls necessary, and make them as cheap as possible by removing all elements from LocalityData that don't need to be considered by the policy. This optimization was also applied to the PeekCursor used during recovery, which should speed recoveries up by a small amount.	2017-11-16 16:15:52 -08:00
Evan Tschannen	ad456a939a	Merge pull request #206 from cie/change-excluded-cluster-controller Change excluded cluster controller	2017-11-15 17:28:33 -08:00
Yichi Chiang	f96faf72d9	Add fullyRecoveredConfig for checking exclusions	2017-11-15 17:15:24 -08:00
Evan Tschannen	30464e943c	Merge pull request #205 from cie/cleanup-spammy-traceevents Cleanup spammy traceevents	2017-11-15 12:41:37 -08:00
Evan Tschannen	e113dba0e3	added a new trace event tracking master recovery durations	2017-11-15 12:38:26 -08:00
Stephen Atherton	a77162b53d	Merge branch 'master' into backup-container-refactor # Conflicts: # fdbclient/BackupAgent.h # fdbclient/FileBackupAgent.actor.cpp # fdbclient/KeyBackedTypes.h	2017-11-15 08:14:47 -08:00
Stephen Atherton	3dfaf13b67	IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored are now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius as the previous backup container was just a thin wrapper that presented a single level list of files and offered no methods for managing or interpreting the file structure so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse. Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup. The addition of IBackupFile and its finish() method simplified the log and range writer tasks. Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes. Added KeyBackedSet<T> type. Moved JSONDoc to its own header. Added platform::findFilesRecursively(). Still to do: update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.	2017-11-14 23:33:17 -08:00
Yichi Chiang	df922bc973	Change excluded cluster controller	2017-11-14 13:57:37 -08:00
A.J. Beamon	bb1297c686	Remove RkServerQueueInfo and RkTLogQueueInfo trace events, since this information is more or less already logged on the storage servers and tlogs. Update the quiet database check and magnesium to use the information from the logs and storage servers.	2017-11-14 12:59:42 -08:00
A.J. Beamon	3b952efb4e	Remove events from cluster controller that get logged for roughly every worker upon recovery, master registration, etc.	2017-11-14 10:15:45 -08:00
A.J. Beamon	0fea5e9c2f	Convert client_invalid_operation errors to ASSERTs.	2017-11-13 11:38:34 -08:00
A.J. Beamon	cd085764f1	Do not automatically change a cluster file that does not match what you expect.	2017-11-10 14:12:45 -08:00
Alex Miller	311d1ca87d	A variety of fixes that collectively fix using flow profiling in circus. To run, use --co=flow_profiling=-1, because reasons.	2017-11-07 13:55:16 -08:00
Evan Tschannen	706bf1e018	fix: we cannot trigger better master exists before a master is fully recovered because exclusions changed by the provisional master will not be committed until the master is fully recovered	2017-11-04 12:48:04 -07:00
Evan Tschannen	57aba0b3bc	fix: excluded servers were the same fitness as storage servers for the master role fix: better master exists did not consider exclusion for master fitness	2017-11-03 17:09:14 -07:00
Yichi Chiang	42fad5efe5	Introduce cluster controller process class in circus	2017-11-03 14:22:55 -07:00
Yichi Chiang	dcc9aafab7	Merge branch 'master' of github.com:apple/foundationdb	2017-11-02 10:47:59 -07:00
Yichi Chiang	c033d8efd8	Fix typo message and remove extra TraceEvent which overwrites the expected one	2017-11-02 10:47:51 -07:00
Balachandar Namasivayam	3efaaec479	onMasterProxiesChanged was being triggered when any member of ClientDBInfo changed. Change the behavior to be triggered only when proxies field in ClientDBInfo is changed.	2017-11-01 18:29:56 -07:00
A.J. Beamon	7cf17df821	Merge branch 'master' into log-group-for-unsupported-clients # Conflicts: # flow/Net2.actor.cpp # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2017-11-01 11:31:02 -07:00
A.J. Beamon	31caac67dc	Rename supported_versions[x].clients to supported_versions[x].connected_clients	2017-11-01 10:41:30 -07:00
Balachandar Namasivayam	988bc0207f	Reset Client Transaction profiling parameters when the config keys are cleared.	2017-10-31 15:40:57 -07:00
Alec Grieser	5a4a5985fd	Merge branch 'release-5.0'	2017-10-30 08:31:23 -07:00
Alec Grieser	87321f5017	Merge branch 'release-4.6' into release-5.0	2017-10-30 08:31:01 -07:00
Evan Tschannen	54d82c0d92	Merge pull request #194 from cie/alexmiller/valgrind Fix valgrind errors	2017-10-27 17:25:12 -07:00
Alex Miller	e0d33ef8d7	Preemptively fix profiler-related valgrind errors/straight out bugs. I forgot to initialize some fields in requests.	2017-10-27 17:20:19 -07:00
Evan Tschannen	aa0c2ae317	only increase the max shard size if the shard begins in the keyServer keyspace, do not increase the minimum shard size	2017-10-27 14:22:26 -07:00
Evan Tschannen	3a4078bdda	the keyservers shards are always a fixed large size	2017-10-27 11:52:11 -07:00
Balachandar Namasivayam	cfefab18fb	Merge branch 'master' into add-new-atomic-ops	2017-10-25 18:03:34 -07:00
Balachandar Namasivayam	3d5658940a	Addressed Review Comments	2017-10-25 16:42:05 -07:00
Balachandar Namasivayam	9dd588dcce	Addressed review comments. Changed naming for NewMin and NewAnd to MinV2 and AndV2	2017-10-25 14:48:05 -07:00
Evan Tschannen	d852a53ae4	Merge pull request #181 from cie/throttle-spammy-logs Throttle spammy logs	2017-10-25 13:45:55 -07:00
Balachandar Namasivayam	2f6d55a52f	Add correctness tests for all atomic ops	2017-10-25 13:36:49 -07:00
Yichi Chiang	4d54a73f5b	Merge pull request #191 from cie/count-cluster-controller-role Take cluster controller role into consideration when recruiting workers	2017-10-25 12:09:15 -07:00
Yichi Chiang	f39cce9b8d	Use processId instead of address for comparison	2017-10-25 11:35:29 -07:00
Yichi Chiang	5fcef911f0	Take cluster controller role into consideration when recruiting workers	2017-10-25 10:35:46 -07:00
Evan Tschannen	48901a9223	added a list of tlog IDs that are missing to status	2017-10-24 16:28:50 -07:00
Yichi Chiang	c2a117fe07	Merge pull request #189 from cie/enable-check-desired-class Enable checkUsingDesiredClasses() in consistency check	2017-10-24 15:18:21 -07:00
Yichi Chiang	defdc6550d	Exclude excluded processes when getting testers	2017-10-24 15:16:34 -07:00
Evan Tschannen	df74e2a373	re-added support for non-copying tlog recovery	2017-10-24 15:09:31 -07:00
Yichi Chiang	3865c5ae0e	Enable checkUsingDesiredClasses() in consistency check	2017-10-24 12:58:54 -07:00
Balachandar Namasivayam	8c3bdc5b3b	Make atomic ops differentiate between unset and empty values.	2017-10-23 16:48:13 -07:00
Evan Tschannen	7a36fd2134	disabled a variety of simulation tests to get correctness clean	2017-10-19 15:49:54 -07:00
Evan Tschannen	e2c1e87df6	made a large number of fixes to make fearless DR correctness clean.	2017-10-19 15:36:32 -07:00
Bhaskar Muppana	360b777b78	Fail with correct error code in case of abort or discontinue of non-existing backups.	2017-10-18 23:17:48 -07:00
Alec Grieser	dd6d8f3b0e	Merge branch 'master' into add-new-atomic-ops	2017-10-18 16:36:44 -07:00
Bhaskar Muppana	2007f3799f	Don't ignore TimeKeeper failures.	2017-10-18 14:31:31 -07:00
Bhaskar Muppana	314511f4d7	Fixing spaces in BackupCorrectness TraceEvents.	2017-10-18 14:27:52 -07:00
Alex Miller	7b9bc1d715	Merge pull request #170 from cie/alexmiller/flowprofile Add support for profiling a running fdb cluster to fdbcli, fix security issues, and add an improved backtrace.	2017-10-16 16:51:53 -07:00
Alex Miller	f997cb9038	Add a string knob to hold the Log directory, and write profiles to it. This is the combination of two small changes. 1. Add support for a string knob type. 2. Change profiles to be written to the log directory instead of the working directory. We have three options of where to write files: the working directory, the data directory, and the log directory. The working directory may be set to a non-writable location, and likely contains the fdb binaries. Allowing these files to be overwritten would likely not be a wise idea. The data directory hosts our sqlite b-trees. It would also be very unfortunate if these were ever overwritten by an unfortunate profile name. The log directory contains logs. Out of the three, these matter the least if they disappear or become corrupted. Thus, we write to the log directory.	2017-10-16 16:05:02 -07:00
Alex Miller	c5fbe33df6	Disallow arbitrary paths for storing profiles. Previously, one could request profiles to be stored at "../../../../../../etc/passwd". Now we expand the paths, including symlinks, and ensure that the target is a child of the targeted subdirectory. This was the least convoluted way I could figure out to handle paths.	2017-10-16 16:05:02 -07:00
Alex Miller	91a26a170c	Add toggleable profiling support to fdbserver+fdbcli. This adds the fdbcli commands: * profile list -- Lists all workers in a way that doesn't fill `kill`'s list. * profile flow run -- Allows starting flow profiling on a set of hosts for a specified interval. And threads through all the support for enabling and disabling profiling as an RPC.	2017-10-16 16:05:02 -07:00
Balachandar Namasivayam	312f614133	Add the new ops and AND to NON_ASSOCIATIVE_MASK. In the storage server, read the entire value if the op is ByteMin or ByteMax.	2017-10-16 11:06:31 -07:00
Alec Grieser	e0be1ef1e0	Merge branch 'release-5.0'	2017-10-16 10:08:11 -07:00
Alec Grieser	432726ba2d	Merge branch 'release-4.6' into release-5.0	2017-10-16 09:54:21 -07:00
Stephen Atherton	68eccb681e	Merge pull request #173 from bmuppana/master Backup log messages.	2017-10-13 18:31:53 -07:00
Evan Tschannen	215bcb8d3e	Merge pull request #157 from cie/choose-leader-on-stateless-processes Catch and update processClass change from DBSource	2017-10-13 14:03:29 -07:00
Yichi Chiang	5bcdd37c0d	Move UID generation and add initialClass	2017-10-13 13:46:37 -07:00
Yichi Chiang	12edd27281	Introduce prevChangeID to CandidacyRequest and LeaderHeartbeatRequest	2017-10-12 17:11:58 -07:00
Bhaskar Muppana	d1e9d28239	Backup log messages.	2017-10-12 16:12:42 -07:00
Stephen Atherton	11517f7bfc	Merge branch 'master' into continuous-backup # Conflicts: # fdbclient/FileBackupAgent.actor.cpp	2017-10-12 11:03:23 -07:00
Alex Miller	c24b941485	Fix erroneous std::move in indexed set, and clean up addMetric users. This is a follow-on to c4eb73d0. Thanks to Bala for pointing out the unchanged std::move usage, and there appeared to not be many existing users of addMetric anyway.	2017-10-11 17:36:51 -07:00
Balachandar Namasivayam	8e0bea2795	Update API_VERSION from 500 to 510	2017-10-11 13:49:38 -07:00
Stephen Atherton	c3d8412abb	Merge pull request #166 from cie/alexmiller/deathservice Fix potential division by zero issues via RPC.	2017-10-10 16:47:38 -07:00
Evan Tschannen	ff1b49be2e	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbserver/DatabaseConfiguration.cpp	2017-10-10 16:07:59 -07:00
Evan Tschannen	8feb3b8fbc	fixed conflict range workload by just disabling timeKeeper instead of the check, because it should be a more robust fix	2017-10-10 16:01:02 -07:00
Balachandar Namasivayam	eeebf10030	Modified existing behavior of MIN and AND atomic ops. The new behavior results in a 'SET' if the atomic op is performed on a non-existing key. Added new atomic ops ByteMin and ByteMax that do lexicographic comparison of byte strings.	2017-10-10 13:02:22 -07:00
Evan Tschannen	c8525dc3e7	timekeeper is constantly changing keys in the system keyspace, so do not report errors on key mismatches on keys in the system keyspace	2017-10-10 12:04:56 -07:00
Evan Tschannen	3d2103075d	data distribution tracks teams for each data center separately	2017-10-10 10:36:33 -07:00
Evan Tschannen	5e6eba365b	fix: always set confChange, because popVersion is not deterministic across proxies, and confChange needs to be set deterministically	2017-10-06 18:37:08 -07:00
Evan Tschannen	93b3d0e4e7	fix: toMap didn’t report logs, proxies, and resolvers	2017-10-06 15:55:50 -07:00
Evan Tschannen	15962cf079	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbrpc/Locality.cpp # fdbrpc/Locality.h # fdbserver/ClusterController.actor.cpp # fdbserver/ClusterRecruitmentInterface.h # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp # fdbserver/WorkerInterface.h # fdbserver/fdbserver.vcxproj.filters # fdbserver/masterserver.actor.cpp # fdbserver/worker.actor.cpp # flow/error_definitions.h	2017-10-05 17:09:44 -07:00
Alex Miller	a21c8a820b	Move cpuProfilerRequest from WorkerInterface to ClientWorkerInterface. A way to access this stream is required if we wish to be able to toggle profiling from fdbcli. There are two ways to do this: 1. Use `monitorLeader()` to get a `ClusterControllerFullInterface`, and use `getWorkers` from there to get a list of `WorkerInterface`s, from which we can access cpuProfilerRequest. 2. Move cpuProfilerRequest to ClientWorkerInterface and use the existing code in the client that can fetch a list of all `ClientWorkerInterface`s. The split between WorkerInterface and ClientWorkerInterface appears to be what a client might have a need to call versus what is fdbserver-internal (and thus no client should even want to call). Thus, it seems to make more sense to acknowledge that profiling is useful to be able to toggle from a client, and go with option (2).	2017-10-05 14:08:28 -07:00
Yichi Chiang	3edc2824a9	Add initialClass to RegisterWorkerRequest 2	2017-10-05 11:03:25 -07:00
Yichi Chiang	05f7626e39	Add initialClass to RegisterWorkerRequest	2017-10-04 17:11:12 -07:00
Yichi Chiang	3c70df57b5	Fix cluster controller review comments	2017-10-04 15:48:55 -07:00
Alex Miller	e55cc447d2	Address code review comments. * Fixed memory corruption with SystemData key constants * Removed duplication in ClusterController * Reworked fdbcli actions to better represent explicit vs default assignments	2017-10-04 13:36:18 -07:00
A.J. Beamon	5063793f36	Revert line ending change	2017-10-04 11:19:19 -07:00
Alex Miller	706427ee62	Fix potential division by zero issues via RPC. A carefully crafted SplitMetricRequest could have caused division by zero. It's not really great to offer Division By Zero As A Service, so let's just return an error instead.	2017-10-03 22:11:08 -07:00
Evan Tschannen	3a2ddcc84a	Add destinations that are read-write to the source list, so that cancelled data movement can contribute to copying the data for the next movement.	2017-10-03 17:39:08 -07:00
Balachandar Namasivayam	0e153cdd35	Throttle Spammy logs. Three knobs are added. Trace Events are sampled and cached with an expiration set. Every TraceEvent above SevDebug is checked against this cache to see if it exceeded a set threshold. If yes, then throttle the TraceEvent. If a TraceEvent is throttled, a warning msg is logged.	2017-10-02 18:43:11 -07:00
Evan Tschannen	6ea9903c82	Merge branch 'release-5.0' # Conflicts: # fdbbackup/backup.actor.cpp # fdbserver/ClusterController.actor.cpp # versions.target	2017-10-01 18:46:44 -07:00
Evan Tschannen	0949c4be65	Revert "Fixed problem with master being recruited on excluded servers" This reverts commit 1f7b624734a8ad6e896dd3f01f9cdf334ca62486.	2017-10-01 16:30:19 -07:00
Evan Tschannen	696d432462	Revert "fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded)" This reverts commit 83b2ce68c8e1a29fc1559598cc38d3ef7eb46101.	2017-10-01 16:29:32 -07:00
Evan Tschannen	0dde15f1d2	fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded) fix: better master exists did not use exclusions because the configuration was reset	2017-10-01 16:26:58 -07:00
Yichi Chiang	636ce4a131	Replace leader when a better one is found	2017-09-29 16:34:55 -07:00
Alex Miller	11668bb359	Fixing code review comments.	2017-09-29 15:58:36 -07:00
Alex Miller	b7ce9d996c	Comment out verbose TraceEvents in preparation for pushing.	2017-09-29 15:58:36 -07:00
Alex Miller	c40c1bb5fe	Add a new workload: BackupToDBAbort, which does an ACI switchover. This is to allow easier testing of non-durable switchovers without having to wiggle into BackupToDBCorrectness's view of the world.	2017-09-29 15:58:36 -07:00
Alex Miller	9e9a96ae76	Make VersionStamp workload able to run with DR-style workloads. * It is now tolerant of locked database errors, and handles them correctly. * There is an option to specify which database to verify against.	2017-09-29 15:58:36 -07:00
Alex Miller	34630b6130	Make VersionStamp workload handle commit_unknown_result. Previously, if a transaction failed with commit_unknown_result, and was actually committed, it would look like data that magically appeared in the database and verification would fail. Now, we explicitly re-read and check to see if the commit happened, so that we may maintain an accurate understanding of what the database state should be.	2017-09-29 15:58:36 -07:00
Alex Miller	23945b9fea	VersionStamp can co-exist with other workloads that write data to the database. VersionStamp previously would range-read the entire database during validation. This has the unfortunate effect of making it fail during validation if run with any other workload that writes keys to the database. Now, all keys written and read are done with a configurable prefix, so that it may co-exist with a variety of other workloads.	2017-09-29 15:58:36 -07:00
Alex Miller	370a6afb80	Make VersionStamp have an option to be tolerant of data being lost.	2017-09-29 15:58:36 -07:00
Alex Miller	8f4c45418b	Make atomicSwitchover preserve an ever-increasing commit version.	2017-09-29 15:58:36 -07:00
Alex Miller	69523ce151	Hackish version of a test, but it does fail.	2017-09-29 15:58:36 -07:00
Alex Miller	65713b226f	Fix whitespace and line endings.	2017-09-29 15:58:36 -07:00
Evan Tschannen	e2b65e86ed	added configurable memory limits for backup and dr executables added a default memory limit of 8GB for fdbcli	2017-09-29 10:35:40 -07:00
Bhaskar Muppana	91975244fe	Fixing OSX build.	2017-09-28 19:35:44 -07:00
Bhaskar Muppana	942c04e992	Merge pull request #162 from bmuppana/master Fixing TimeKeeperCorrectness to deal with network delays.	2017-09-28 17:04:39 -07:00
Bhaskar Muppana	3d2bafc3a6	Fixing TimeKeeperCorrectness to deal with network delays.	2017-09-28 16:52:28 -07:00
Evan Tschannen	ef41b07bb3	renamed past_version to transaction_too_old implemented read_lock_aware option	2017-09-28 16:35:08 -07:00
Yichi Chiang	d4f75630de	Support log group field in status json	2017-09-28 16:31:29 -07:00
Evan Tschannen	7b60e26660	Merge pull request #160 from cie/use-error-descriptions Add the ability to access name and description in Error. Update error…	2017-09-28 16:00:39 -07:00
Evan Tschannen	5f4b997400	emergency teams are bad for performance, because we will route client read requests to servers that do not have the data, therefore getting many wrong shard server errors. emergency teams only protect us from data loss in very rare scenarios, we may want to add them in again in the future, but make sure load balance knows which storage servers used to be destinations so they can only route to them as a last resort.	2017-09-28 13:20:01 -07:00
Evan Tschannen	73fca75239	added the ability to disable timeKeeper; disabled timeKeeper before consistency check in simulation	2017-09-28 13:13:24 -07:00
A.J. Beamon	d30c730f75	Add the ability to access name and description in Error. Update error descriptions.	2017-09-28 12:35:03 -07:00
Bhaskar Muppana	0f8ff26029	Merge pull request #158 from bmuppana/master <rdar://problem/34557380> Need a way to map real time to version	2017-09-27 17:56:42 -07:00
Bhaskar Muppana	6a0b1d6808	Fixing PR comments <rdar://problem/34557380> Need a way to map real time to version	2017-09-27 17:56:01 -07:00
Evan Tschannen	4b21da1cd6	fix: lastVersionWithData was not updated when fetchKeys injects mutations	2017-09-27 10:44:34 -07:00
Evan Tschannen	acb7e66d01	fix: failed logs do not count even if they have returned a result	2017-09-25 18:14:40 -07:00
Evan Tschannen	2bf042a559	fix: file_corrupt was not checking for fault injection latency threshold was too long	2017-09-25 17:22:41 -07:00
Bhaskar Muppana	0bf5bdb23a	<rdar://problem/34557380> Need a way to map real time to version	2017-09-25 12:51:37 -07:00
Yichi Chiang	6758c649fc	Catch and update processClass change from DBSource	2017-09-25 10:36:03 -07:00
Evan Tschannen	cce4eeb52d	fix: the master was sending the cluster controller uninitialized configurations	2017-09-22 16:59:24 -07:00
Evan Tschannen	180438d41e	fix: use the number of present logServers rather than the total size of the vector	2017-09-22 16:19:16 -07:00
Evan Tschannen	738ae21c3a	fix: an optimization in buggified locking can cause recovery to break because it would not restart if a locked process was killed when the remaining logs cannot obtain a quorum	2017-09-22 15:07:57 -07:00
Alex Miller	585c9bf68f	Quick fix to reduce CPU usage of ensureEpochLive. It is suspected that policy recomputations are driving proxy CPU usage up, and thus latency and throughput down. To quickly confirm this theory, we're forcing ensureEpochLive to wait until it has RF responses, which means we'll probably only validate the policy once per call.	2017-09-21 18:22:24 -07:00
Evan Tschannen	fbd67ea547	fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded) fix: better master exists did not use exclusions because the configuration was reset	2017-09-20 11:48:26 -07:00
Evan Tschannen	cb43563b2d	fix: toMap properly lists the redundancy mode of the cluster	2017-09-19 16:35:42 -07:00
Evan Tschannen	f75dfc3153	do not register with the master until recovery of the queue is complete, to avoid having the master wait a long time for a peek response	2017-09-18 17:39:12 -07:00
Alex Miller	567d663afd	Fix SimulationConfig never generating a custom config. A 0 was changed to a 1 when rewriting code, and `case 0:` was never being hit. :( Thankfully, it looks like nothing was broken by this in the meantime.	2017-09-18 17:29:36 -07:00
Evan Tschannen	e8b895c878	added the ability to disable connection failures for a period of time after one happens	2017-09-18 12:46:29 -07:00
Evan Tschannen	489332533c	all timeouts longer than two minutes have been can be lowered to 60.0 with buggification added a workload that tries for a 50 second maximum latency in the presence of one failure with both buggification and connection failures	2017-09-18 11:04:51 -07:00
Evan Tschannen	34f987f56d	added a test in simulation which ensures that a recovery after a single failure takes less than 15 seconds	2017-09-15 17:55:01 -07:00
Evan Tschannen	d9b64899c5	fix: we need to wait for log server failures if we have not locked all of the logs	2017-09-15 13:11:21 -07:00
Evan Tschannen	36c98f18e9	do not register a worker with the cluster controller until it has finished recovering all files from disk	2017-09-15 10:57:58 -07:00
Evan Tschannen	f3b7aa615d	fix: seed storage servers are recruited based on the storage policy	2017-09-14 17:06:00 -07:00
Alvin Moore	9404d226d0	Merge branch 'release-5.0'	2017-09-13 16:49:00 -07:00
Alvin Moore	cb92194772	Fixed problem with master being recruited on excluded servers	2017-09-13 16:48:27 -07:00
Alex Miller	5e14f19875	Merge pull request #147 from cie/alexmiller/grvtlogs Only verify a quorum of TLogs are unlocked for a GRV request	2017-09-13 16:07:25 -07:00
Alex Miller	d6b3be98fe	Fix whitespace.	2017-09-13 15:49:39 -07:00
Alex Miller	06a9c7a772	Remove unnecessary policy recomputations in confirmEpochLive. Watching for interface changes on readied servers was done as a workaround for a case where all futures could be ready, but the policy verification would never succeed. This turns out to be because stopping a tlog causes an error to be returned. However, if a TLog is stopped, then we know that we can't do any more commits, so we can just immediately stop trying and never mark our future as ready.	2017-09-13 15:45:09 -07:00
Evan Tschannen	8cb53fd608	Merge pull request #149 from cie/choose-leader-on-stateless-processes choose leader on the perferred process class	2017-09-13 13:58:49 -07:00
Evan Tschannen	aea7a78cff	cluster controller changes were not maintained during merge	2017-09-11 17:40:46 -07:00
Evan Tschannen	d343d37274	fixed merge problems	2017-09-11 16:37:10 -07:00
Evan Tschannen	76e7988663	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbserver/ClusterController.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/OldTLogServer.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/WorkerInterface.h # flow/Net2.actor.cpp	2017-09-11 15:15:56 -07:00
A.J. Beamon	4fa2415553	Merge branch 'release-5.0'	2017-09-08 17:28:12 -07:00
A.J. Beamon	bb8a245bdb	circus: throughput test scales latency error by the target latency	2017-09-08 17:27:54 -07:00
Evan Tschannen	ea26bc1c43	passed first tests which kill entire datacenters added configuration options for the remote data center and satellite data centers updated cluster controller recruitment logic refactors how master writes core state updated log recovery, and log system peeking	2017-09-07 15:32:08 -07:00
Yichi Chiang	bd1c7e7295	Use addTeamsBestOf() instead of addAllTeams() when team size is greater than 3	2017-09-07 12:31:01 -07:00
Bhaskar Muppana	c7df951f7c	Using BackupConfig from backup.actor.cpp to reduce intermediate functions.	2017-09-07 08:36:36 -07:00
Bhaskar Muppana	fe208d6adf	Merge branch 'master' of github.com:apple/foundationdb into backup	2017-09-06 10:01:55 -07:00
Bhaskar Muppana	83810edabc	Backup/Restore tag can be std::string instad of Key.	2017-09-05 11:38:40 -07:00
Evan Tschannen	dc1f7ca6b7	testers now use client locality load balancing	2017-09-01 12:53:01 -07:00
A.J. Beamon	cc24072a5d	Add the multi version API to the list of APIs to choose in the APICorrectness tester. Support for the multi-version client already existed.	2017-08-31 16:23:55 -07:00
Evan Tschannen	d61be4c760	Merge branch 'release-5.0'	2017-08-30 12:59:24 -07:00
Evan Tschannen	963e1c3f31	fix: we need to reboot the process even if it will result in too many files, because the check will not succeed without it	2017-08-30 12:58:46 -07:00
Alex Miller	8d97a15c3f	BUGGIFY recovery to lock only the minimum number of TLogs required to prevent a quorum. This is to test the quorum logic introduced in the previous patch, and should flush out any other bugs that rely on TLog locking during recovery.	2017-08-29 14:43:40 -07:00
Alex Miller	f8486d1368	Only ensure a quorum of TLogs are unlocked to confirm the epoch hasn't ended. Currently, GRV will wait to hear back from (almost) all TLogs to confirm that they're unlocked and that the current epoch hasn't ended. This confirms that there isn't a new set of proxies and using the commit version from the old set of proxies would violate causal consistency. However, during recovery, we ensure that no quorum of TLogs exists before starting a new epoch and allowing new commits on the new TLogs. Thus, we only need to wait until we have a quorum of TLogs that are unlocked. This should be a significant improvement in latency particularly for the cases when we start running >10 TLogs.	2017-08-29 14:43:40 -07:00
Alex Miller	4c1d61cd08	Assorted minor changes. In which we: * Clarify some math in a comment * Remove misleading debugging information * Add a useful trace event	2017-08-29 14:43:40 -07:00
Alex Miller	dbfa94f735	LF -> CRLF It appears a previous patch left parts of this file ending with LF, and the majority of the file ends in CRLF. I see no reason to keep this inconsistency, but these line ending wars are going to drive me insane.	2017-08-29 14:43:40 -07:00
Alvin Moore	6020d70863	Added trace event to track reboots initiated by ConsistencyCheck workload in simulation	2017-08-29 11:41:27 -07:00
Alvin Moore	c95a1be5ec	Add trace event for rebooting process during simulation for consistency check	2017-08-29 11:00:44 -07:00
A.J. Beamon	86774f6e42	Merge branch 'release-5.0'	2017-08-28 17:17:00 -07:00
A.J. Beamon	03478561b9	fix: Set lock aware at the transaction level for latency probe to avoid having to fill the shard cache every time.	2017-08-28 17:16:46 -07:00
A.J. Beamon	9a0a3b6329	Merge commit '66528becb82d826e81fa644bb378212584ab580e'	2017-08-28 16:47:59 -07:00
Yichi Chiang	9fe927127f	choose leader on the perferred process class	2017-08-28 14:41:04 -07:00
Alvin Moore	44e0df78c5	Added support for tracking roles for simulation workers Fixed the exclusion and inclusion address simulation API and integration within workloads Added more information within trace events for simulation	2017-08-28 11:25:37 -07:00
Alvin Moore	581bd6c8ed	Added option to delay the displaying of the simulation workers	2017-08-28 10:53:56 -07:00
Alec Grieser	300b5a17ed	Merge branch 'release-5.0'	2017-08-25 18:55:33 -07:00
Evan Tschannen	272b4b984c	fix: fixed a rare bug where we do not wait for a file in the process of being deleted to shutdown before rebooting a machine	2017-08-25 10:12:58 -07:00
Evan Tschannen	26a5b5e422	rollback workload now clogs the communication between one of the proxies and the tlogs, since that is what will cause a rollback	2017-08-23 16:08:13 -07:00
A.J. Beamon	4c706d33e9	Merge branch 'release-5.0'	2017-08-23 14:59:43 -07:00
Evan Tschannen	be941b4bd1	sending void to committed could cause self to be deleted, so call cleanup before sending	2017-08-23 13:56:18 -07:00
Alvin Moore	7729f663e9	Ensured that the circus id is always lowercase	2017-08-23 13:45:00 -07:00
Evan Tschannen	f9308b8fa6	Merge pull request #145 from cie/alexmiller/simrefactor Refactor simulation to pull all configuration parameters into one struct.	2017-08-23 12:54:21 -07:00
Evan Tschannen	4b40f817f1	fix: is recovery is cancelled before the copy is complete, remove the tlog	2017-08-23 12:26:03 -07:00
Alvin Moore	8056b78414	Merge branch 'release-5.0'	2017-08-22 13:51:19 -07:00
Alvin Moore	814e471689	Added support for displaying initial workers via printf within simulation using a workload	2017-08-22 13:38:24 -07:00
Alex Miller	7b78035365	Have SimulationConfig wrap DatabaseConfiguration to reduce code duplication. This effectively turns initializing SimulationConfig into the equivalent of building a config string and calling buildConfiguration on it.	2017-08-22 10:13:57 -07:00
Alex Miller	9b25c72971	Pull database config and cluster config into one struct. This will allow us to specify custom situations to be chosen more frequently, and in particular control machines and processes.	2017-08-21 22:35:44 -07:00
Alec Grieser	5ee07b1a9e	Merge branch 'release-5.0'	2017-08-14 16:56:58 -07:00
Evan Tschannen	de1b590a8a	The TLog did not delete data from removed logs The TLog continued to make data from removed logs persistent	2017-08-11 18:08:09 -07:00
Stephen Atherton	50fb44be92	Merge branch 'release-5.0' # Conflicts: # versions.target	2017-08-09 23:36:12 -07:00
Evan Tschannen	2335fc73f2	fix: peek cursors were being timed out every 10 minutes, instead of 10 minutes after the last use fix: if an interface is changed while we are not waiting in getMore, we will not reset the sequence to 0.	2017-08-09 15:58:06 -07:00
Evan Tschannen	47a37f3f1e	Merge pull request #135 from cie/switch-for-data-distribution Add a switch to turn off data distribution in CLI	2017-08-07 12:54:08 -07:00
Evan Tschannen	c22708b6d6	added tag localities fix: remote logs need to stop the master when they are stopped	2017-08-03 16:16:36 -07:00
Alec Grieser	ca7437ecf6	Merge branch 'release-5.0'	2017-08-02 22:07:01 -07:00
John King	d0fbc41338	set LOCK_AWARE on several transactions used for getting cluster info for the consistency check	2017-07-28 18:50:32 -07:00
Yichi Chiang	6a8a5c41b0	Add a switch to turn off data distribution in CLI	2017-07-28 18:14:55 -07:00
A.J. Beamon	4243486f54	fix: TLogMetrics was being track latested with the wrong ID	2017-07-28 14:37:23 -07:00
Yichi Chiang	37e5e2acbb	Fix parentheses issue in StorageMetrics.actor.h	2017-07-27 12:03:36 -07:00
Yichi Chiang	cdc62e265c	Merge pull request #133 from cie/shard-system-keyspace Shard system keyspace	2017-07-26 17:09:13 -07:00
A.J. Beamon	41c90bcdea	Merge commit '89ac94853c70d08289e7fb58055bc5d0cd4e494d'	2017-07-26 15:35:36 -07:00
A.J. Beamon	d8e308c18f	Enable use of incremental delete when deleting disk queue and sqlite KVS sqlite files.	2017-07-26 14:11:11 -07:00
Yichi Chiang	53e1ae9f60	shard system keyspace	2017-07-26 13:47:31 -07:00
Stephen Atherton	4aaee86c2a	Moved MetricLogger actor to fdbclient so applications other than fdbserver can use it.	2017-07-24 13:13:06 -07:00
Evan Tschannen	2ae445782e	fix: cannot rely on the bestServer’s version because other logs may have higher versions	2017-07-21 19:21:49 -07:00
Evan Tschannen	f6826f1e15	fix: log routers were popped at too high of a version fix: make sure tlogs make everything durable fix: make cluster controller’s temporary remote log recruitment not interfere with better master exists	2017-07-20 16:26:05 -07:00
Evan Tschannen	7fec378830	do not continue copying data from prior generations after being locked	2017-07-19 15:11:18 -07:00
Evan Tschannen	5852a6301b	fixed even more bugs	2017-07-15 15:15:03 -07:00
Alec Grieser	c860f09d8a	Merge branch 'release-5.0'	2017-07-14 16:01:15 -07:00
Alec Grieser	660729839c	moved Notified.h from flow -> fdbclient ; flow bindings package does better job when excluding testers	2017-07-14 15:49:30 -07:00
Evan Tschannen	5ac4de8775	fix: the same tag could be in the server tags list twice	2017-07-13 16:31:55 -07:00
Evan Tschannen	57ba9d36af	fixed a large number of bugs	2017-07-13 12:29:21 -07:00
Alec Grieser	f75b6f333b	Merge branch 'release-5.0'	2017-07-13 11:21:18 -07:00
Evan Tschannen	415458deef	made LogSet reference counted, fixed a few bugs	2017-07-11 15:48:10 -07:00
Evan Tschannen	81ae263ad9	implemented setPeekCursor removed oldTLogServer first compiling version	2017-07-10 17:41:32 -07:00
Evan Tschannen	979ebcef6c	changed to using a vector of logSets instead of a duplicate set of logs for remote servers finished porting changes to the tlog everything but peeking is finished in the TagPartitionedLogSystem	2017-07-09 14:46:16 -07:00
Evan Tschannen	aa1c903b52	fix: do not log that data distribution is initialized until readyToStart is ready	2017-06-30 16:21:59 -07:00
Evan Tschannen	0906250e78	merged everything from feature-remote-logs besides the tlog and tagpartitionedlogsystem re-included tags in messages to the tlog previously never committed the LogRouter	2017-06-29 15:50:19 -07:00
A.J. Beamon	a2d229ff00	Merge branch 'release-5.0'	2017-06-29 08:16:22 -07:00
Evan Tschannen	69c862ed6e	updated release notes for 5.0.1	2017-06-28 16:52:45 -07:00
Evan Tschannen	663535e00a	add the same number for shared metrics and local metrics	2017-06-28 15:21:54 -07:00
Evan Tschannen	fe37d0b056	added additional logging about bytesInput and bytesDurable	2017-06-28 13:29:40 -07:00
Alvin Moore	31d562ff7b	Merge branch 'release-5.0' # Conflicts: # fdbclient/ManagementAPI.actor.cpp # fdbserver/DatabaseConfiguration.cpp # versions.target	2017-06-27 11:16:08 -07:00
Evan Tschannen	9fd5955e92	Merge branch 'master' into removing-old-dc-code	2017-06-26 16:27:10 -07:00
Evan Tschannen	15cb498aa7	removed fast_recovery_double and fast_recovery_triple from the fdbcli	2017-06-23 16:18:23 -07:00
Evan Tschannen	533dca95d8	fix: bytesDurable was not correctly increased when a log was removed re-added many TLogMetrics added a new role for the shared tlog	2017-06-22 17:21:42 -07:00
Alvin Moore	6d19580789	Merge branch 'release-5.0' of github.com:apple/foundationdb into release-5.0 # Conflicts: # fdbrpc/simulator.h	2017-06-19 17:39:37 -07:00
Alvin Moore	96f40d8eb0	Added support for optionally re-including excluded servers Removed unneeded code to protect coordinators Ensured that simulation exclusion list is updated for excluded processes	2017-06-19 16:51:07 -07:00
Alvin Moore	9553458b78	Updated simulation to support managing exclusion and inclusion address Added method for identifying acceptable availability process classes Extended cluster availability function to ensure coordinators can be auto configured Fixed availability function to allow protected processes to be considered as dead if not available Added debug trace events for providing machine state when considering availability Added trace event for protected coordinators	2017-06-19 16:48:15 -07:00
Stephen Atherton	f405c8d88e	Merge branch 'release-4.6' into release-5.0 # Conflicts: # fdbrpc/AsyncFileKAIO.actor.h # fdbrpc/sim2.actor.cpp # fdbserver/optimisttest.actor.cpp # versions.target	2017-06-15 17:40:19 -07:00
Stephen Atherton	5de6c703cf	Fixed line endings which were changed to UNIX style during a recent merge.	2017-06-15 16:57:35 -07:00
Evan Tschannen	4bdcd8fc12	Merge branch 'release-4.6' into release-5.0 # Conflicts: # bindings/bindingtester/run_binding_tester.sh # fdbrpc/AsyncFileKAIO.actor.h	2017-06-14 16:43:53 -07:00
Evan Tschannen	cefaa2391d	fix: tlog actor needs to be cancelled when worker shutdown	2017-06-02 17:56:47 -07:00
Evan Tschannen	766dc23e26	fix: do not use TLS in protectedAddresses	2017-06-02 13:52:21 -07:00
Evan Tschannen	b108c1ea0d	fix: do not delay before setting logData->version	2017-06-02 11:27:37 -07:00
Evan Tschannen	2d0dbd57e8	randomized the delays in atomic switchover workload	2017-06-01 12:08:21 -07:00
Evan Tschannen	bfcbb5623f	fixed build from merge error	2017-06-01 12:07:30 -07:00
Evan Tschannen	276073d91b	Merge branch 'release-4.6' into release-5.0	2017-06-01 11:54:54 -07:00
Stephen Atherton	fa4fdb1f1d	Merge branch 'fix-io-timeout-handling' into release-5.0 # Conflicts: # fdbserver/optimisttest.actor.cpp	2017-05-31 17:03:15 -07:00
Evan Tschannen	1626e16377	Merge branch 'release-4.6' into release-5.0	2017-05-31 16:23:37 -07:00
Alvin Moore	333f2e4865	Added connection fault disabler within setup of backup submission. It should be reviewed to determine the amount of time to wait before disabling	2017-05-31 14:21:50 -07:00
Stephen Atherton	98604d33a0	Merge branch 'fix-io-timeout-handling' # Conflicts: # fdbrpc/AsyncFileKAIO.actor.h # fdbrpc/sim2.actor.cpp # fdbserver/KeyValueStoreSQLite.actor.cpp # fdbserver/optimisttest.actor.cpp # fdbserver/worker.actor.cpp # fdbserver/workloads/MachineAttrition.actor.cpp # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2017-05-26 18:43:08 -07:00
Alvin Moore	ba606c2135	Removed NOT_IN_CLEAN macro from file	2017-05-26 14:52:06 -07:00
Alvin Moore	b28ed397a2	Fixed printf field width specifier to reduce compilation warnings within OS X	2017-05-26 14:51:34 -07:00
Alvin Moore	0b9ed67e12	Fixed support for RemoveServers Workload Added availability functions to simulation	2017-05-26 14:20:11 -07:00
Alvin Moore	16cc0821b1	Removed dead machine option from simulation	2017-05-25 16:29:02 -07:00
FDB Dev Team	a674cb4ef4	Initial repository commit	2017-05-25 13:48:44 -07:00
587 Commits