foundationdb

Commit Graph

Author	SHA1	Message	Date
Evan Tschannen	3abf4d7fdf	Merge branch 'master' into feature-remote-logs	2018-03-09 14:50:04 -08:00
Evan Tschannen	91bb8faa45	Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa' # Conflicts: # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-03-09 14:47:03 -08:00
Evan Tschannen	28ea983487	Merge branch 'release-5.1' into release-5.2 # Conflicts: # flow/Trace.cpp # versions.target	2018-03-09 14:40:31 -08:00
A.J. Beamon	bb9f51bb5c	Don't try to extract attributes from the program start trace events if they couldn't be collected.	2018-03-09 11:55:57 -08:00
Evan Tschannen	cf6dd1437b	suppress spammy trace events	2018-03-09 10:16:34 -08:00
Evan Tschannen	ae7d8e90b2	Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1	2018-03-09 09:56:09 -08:00
Evan Tschannen	5390af8be4	suppress spammy logs	2018-03-09 09:40:36 -08:00
A.J. Beamon	1bf9f0ec6b	Merge pull request #54 from etschannen/release-5.1 fix: new cluster controllers should not consider anything failed unti…	2018-03-09 09:28:21 -08:00
Evan Tschannen	f9625f5b2f	fix: new cluster controllers should not consider anything failed until they have time to get failure monitoring updates fix: storage and log class machines wait 100MS before attempting to become the cluster controller	2018-03-08 18:08:41 -08:00
Balachandar Namasivayam	e7309a3535	Add trace events to print the ranges in ConsistencyCheck.	2018-03-08 13:53:59 -08:00
Evan Tschannen	cf9d02cdbd	Merge pull request #48 from apple/release-5.2 Merge release-5.2 into master	2018-03-08 13:21:26 -08:00
A.J. Beamon	2c92ef8ff8	Merge pull request #47 from apple/release-5.1 Merge Release 5.1 into Release 5.2	2018-03-08 13:18:45 -08:00
A.J. Beamon	73cec8abad	Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1	2018-03-08 11:47:44 -08:00
Balachandar Namasivayam	4f58bca66a	Simple refactor of code...	2018-03-08 11:34:25 -08:00
Balachandar Namasivayam	1c1a497ea2	Refactor getKeyServers to be more readable. Fix possible memory corruption by returning KeyRange instead of KeyRangeRef in getKeyServers. Simplify getMasterProxies on DatabaseContext class.	2018-03-08 11:34:18 -08:00
Balachandar Namasivayam	03a40354e3	Having 1000 as the limit for GetKeyServerLocationsRequest sometimes generates large packet warnings. Reduce it to 100. Fix the bug where some of the key server shards may not be fetched.	2018-03-08 11:34:11 -08:00
A.J. Beamon	fdcaf473ae	Don't pass a copy of the StorageServerInterface to storageServerRollbackRebooter. This prevents a situation where the storage server has terminated but the request streams are left open until the underlying KV-store gets closed.	2018-03-08 11:14:24 -08:00
Evan Tschannen	fa7eaea7cf	fix: shards affected by team failure did not properly handle separate teams for the remote and primary data centers	2018-03-08 10:50:05 -08:00
bnamasivayam	f838bc077e	Merge pull request #36 from ajbeamon/release-5.2 Set the address in consistency check processes…	2018-03-07 15:00:14 -08:00
Evan Tschannen	9d4cdc828b	fix: inactive cursors are still useful if their version is larger than the current version	2018-03-07 12:54:53 -08:00
Evan Tschannen	68606c7984	fix: sim2 logic for when a kill is safe was incorrect	2018-03-06 18:38:05 -08:00
Alec Grieser	2a2ac56529	Merge pull request #22 from alecgrieser/37844532-expose-append-if-fits Expose APPEND_IF_FITS to clients	2018-03-06 16:31:36 -08:00
Evan Tschannen	8c88041608	fix: we must commit to the number of log routers we are going to use when recruiting the primary, because it determines the number of log router tags that will be attached to mutations	2018-03-06 16:31:21 -08:00
A.J. Beamon	232bd496bf	Set the address in consistency check processes in the same way we set it for clients so that it shows up in trace logs. Disallow specifying a public address for consistency check processes.	2018-03-06 15:40:04 -08:00
A.J. Beamon	7f8f655b9c	Revert "Fix build errors" This reverts commit `51804f0504`.	2018-03-06 10:28:39 -08:00
A.J. Beamon	f2c804e14f	Reverting changes from merge of master into release-5.2 (`b25810711c`). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit.	2018-03-06 10:15:04 -08:00
Evan Tschannen	1194e3a361	added region-based configuration to support a large variety of fearless setups. Currently only 1 primary 1 remote setups are allowed.	2018-03-05 19:27:46 -08:00
Balachandar Namasivayam	aea1f7ba21	Add tests for Client Transaction Profiling correctness	2018-03-05 18:55:23 -08:00
Balachandar Namasivayam	51804f0504	Fix build errors	2018-03-05 15:18:14 -08:00
A.J. Beamon	b25810711c	Merge branch 'master' into release-5.2	2018-03-05 10:32:57 -08:00
Balachandar Namasivayam	8ae640c062	Addressed review comments.	2018-03-02 17:56:49 -08:00
Alec Grieser	218b7a41e2	add APPEND_IF_FITS to workload and remove guard ; add command to vexillographer	2018-03-02 17:43:39 -08:00
Balachandar Namasivayam	11df1aeabf	Add new api to get shared tlogs id and address	2018-03-02 16:50:30 -08:00
Evan Tschannen	470f5c01f3	changed remoteDcId to a vector of ids, to support future configurations where there are multiple remote databases	2018-02-26 17:09:09 -08:00
Evan Tschannen	a67296b373	do not test fearless configurations to merge with master	2018-02-26 13:31:06 -08:00
Evan Tschannen	8e966fdf9c	simulated cluster tests all configurations. Still needs to randomize the remote and satellite replication, along with the number of remote tlogs, log routers, and satellite tlogs	2018-02-26 13:15:44 -08:00
Evan Tschannen	e3c6b66240	fix: do not commit more data after being stopped fix: prioritize dc locality above exclusion to prevent being stuck after excluding all machines in a data center	2018-02-26 13:13:37 -08:00
Evan Tschannen	37a6a81634	Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs # Conflicts: # fdbserver/workloads/RestartRecovery.actor.cpp	2018-02-23 12:33:28 -08:00
Evan Tschannen	cfcf98cffc	fix: log router tags were not stored at a best location	2018-02-23 12:26:19 -08:00
Evan Tschannen	a49e43000e	fix: did not peek from log routers correctly	2018-02-22 16:13:56 -08:00
Evan Tschannen	719bb5bd0c	Merge pull request #4 from bnamasivayam/getKeyServers-refactor Having 1000 as the limit for Limit for GetKeyServerLocationsRequest s…	2018-02-22 12:39:48 -08:00
Balachandar Namasivayam	2fe2b522d5	Simple refactor of code...	2018-02-22 12:38:14 -08:00
Alec Grieser	e1162e9238	Merge remote-tracking branch 'upstream/release-5.1'	2018-02-22 11:16:12 -08:00
Balachandar Namasivayam	e2030db5a8	Refactor getKeyServers to be more readable. Fix possible memory corruption by returning KeyRange instead of KeyRangeRef in getKeyServers. Simplify getMasterProxies on DatabaseContext class.	2018-02-21 17:11:50 -08:00
Evan Tschannen	2aa273df96	addStorageServer was advancing tags too much because of read errors	2018-02-21 17:05:39 -08:00
Evan Tschannen	310f56d98a	fix: tlogs was resized incorrectly	2018-02-21 15:28:02 -08:00
Evan Tschannen	ddb484143c	fix: do not peek from remote logs if they are not fully recovered	2018-02-21 14:06:44 -08:00
Alec Grieser	0bae9880f1	remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py	2018-02-21 10:25:11 -08:00
Balachandar Namasivayam	6218934c7b	Having 1000 as the limit for GetKeyServerLocationsRequest sometimes generates large packet warnings. Reduce it to 100. Fix the bug where some of the key server shards may not be fetched.	2018-02-20 17:41:34 -08:00
Evan Tschannen	1dc6a8d4bd	fix: the tlog can peek from log systems that have been recovered even if it does not match its recoverFrom set	2018-02-20 14:50:13 -08:00
Alec Grieser	aadc06de99	Merge remote-tracking branch 'upstream/release-5.1'	2018-02-20 14:28:29 -08:00
Evan Tschannen	9ea963ddd6	fix: the master did not detect core state changes if it changed while writing fix: do not attempt to use three_data_hall when in a fearless deployment fix: log router tags are ephemeral and can be cleared after every recovery	2018-02-19 16:49:57 -08:00
Evan Tschannen	1b5628d2c5	testing a single configured fearless setup in simulated cluster consolidated simulation connection disablers into one call in the tester automatically reconfigure from a fearless setup in simulation	2018-02-18 12:59:43 -08:00
Evan Tschannen	31b89a638f	added satellite_none and remote_none options to unconfigure from a fearless setup fix: log_router configuration was broken	2018-02-17 13:51:17 -08:00
Stephen Atherton	54fc81b260	Improved backup error reporting in backup status. The most recent error for each error type is reported along with how long ago the error occurred, and errors are divided into two categories based on whether or not they occurred since the most recent backup progress.	2018-02-16 19:38:31 -08:00
Evan Tschannen	dc93759e15	suppressed trace events that are spammy	2018-02-16 16:01:19 -08:00
Evan Tschannen	cb25564d38	simulated cluster supports fearless configurations removed unused simulation variables run the simulation with only 1 coordinator most of the time, since we protect the coordinator from being killed, and protecting too many things is bad for simulation	2018-02-15 18:32:39 -08:00
Evan Tschannen	ad19d3926b	fix: make sure there are enough machines in each dc to support triple replication for the configure workload	2018-02-14 17:06:22 -08:00
Evan Tschannen	5303962af6	re-enabled configure database and remove servers safely, even though they do not work with fearless	2018-02-14 16:07:23 -08:00
Evan Tschannen	ead3892e77	fix: prevent fast spin for future version	2018-02-14 15:16:18 -08:00
Evan Tschannen	110309272c	fix: do not count a server as read-write unless it has a recent version, because it could have been readable a long time ago	2018-02-14 15:09:19 -08:00
A.J. Beamon	3300c2efed	Enable slow task profiling in the consistency check processes.	2018-02-14 09:50:12 -08:00
Evan Tschannen	d2b0c07558	storage servers continue to attempt to pop old tags after the log system updates	2018-02-13 18:34:13 -08:00
Evan Tschannen	1fedcba890	fix: do not use log router tags when configured without remote logs fix: data distribution tracks undesired storage servers re-enabled consistency check	2018-02-13 17:01:34 -08:00
Evan Tschannen	a52ea4eb78	restored 5.1 functionality of simulated cluster. Will test assigned primary and remote data centers. Does not test remote replication or satellite logs	2018-02-10 13:27:51 -08:00
Evan Tschannen	42405c78a5	Merge commit '4038bd2fd968d88861f2cebd442ce511724816cb' into feature-remote-logs # Conflicts: # fdbserver/ClusterController.actor.cpp # fdbserver/Knobs.cpp	2018-02-10 12:08:52 -08:00
Evan Tschannen	fbadcc6eea	changing a storage server’s tag must be the first mutations applied in a version, because privatized mutations applied earlier in the same version will use the old tag	2018-02-09 18:21:29 -08:00
Evan Tschannen	c7b3be5b19	re-enabled better master exists the cluster controller can choose a better data center for itself and let the workers know where the next cluster controller should be recruited	2018-02-09 16:48:55 -08:00
Stephen Atherton	acb876d520	Merge branch 'release-5.1'	2018-02-07 15:11:52 -08:00
Evan Tschannen	d0caffd339	fix: knob was set to incorrect value	2018-02-06 18:11:45 -08:00
Stephen Atherton	3a49211c44	Merge branch 'release-5.1'	2018-02-06 13:58:35 -08:00
Stephen Atherton	7de40413d5	Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1	2018-02-06 13:44:25 -08:00
Stephen Atherton	0792d5e3dd	Fix: last restorable version for a backup tag name (a separate value from the latest restorable version for a configured backup) was not being updated. Fix: backup blob speed was sometimes an error because the JSON $sum merge operator did not support mixed numeric types. Fix: JSON merge operator handling was squashing errors in some cases, which was generally obscuring the backup speed metric issue. Cleaned up some of the JSON object merging logic. Improved error messages in JSON merge operators. Added JSON merge operator tests for mixed numeric math and improved readability of test output.	2018-02-06 13:44:04 -08:00
Evan Tschannen	b7dde88029	fix: the cluster controller did not consider the master sharing the same process as the cluster controller as bad in all needed locations waited too long for good recruitment locations, which would add too much time to recoveries of clusters that do not use machine classes	2018-02-06 11:30:05 -08:00
Evan Tschannen	63a9f2aed6	fix: history tags were being incorrectly popped fix: history tags were not cleared when a storage server was removed	2018-02-03 12:20:18 -08:00
Evan Tschannen	ebd94bb654	removed a separately configurable storage team size for the remote data center, because it did not make sense fix: the master did not monitor for the failure of remote logs stop merge attempts when a data center is failed fixed a variety of other problems with data distribution when a data center is failed	2018-02-02 11:46:04 -08:00
Evan Tschannen	766964ff48	fix: dest tags were not repopulated when the tag cache was cleared	2018-01-31 17:35:48 -08:00
A.J. Beamon	0c601d6f85	Purge past version references	2018-01-31 12:05:41 -08:00
Evan Tschannen	6b54d56ca7	gracefully exit if attempting to upgrade from 4.X versions	2018-01-30 17:10:50 -08:00
Evan Tschannen	b48d8ce96d	getTeam will return an unhealthy exact match if all teams are unhealthy. Resubmit relocation requests once healthy teams are available	2018-01-30 17:00:51 -08:00
Evan Tschannen	4160765fa1	added a buggify which reboots a server immediately after it has changed its locality	2018-01-29 18:21:28 -08:00
Evan Tschannen	af97a512f5	to support more complicated policies in the future for determining the best location for a tag within a set of tlogs, use an integer instead of a bool	2018-01-29 17:48:18 -08:00
Evan Tschannen	497bc3fe83	fix: txsTag needs to choose the same best location as 5.X version of the software	2018-01-29 17:09:35 -08:00
Evan Tschannen	29c5d4ad3d	upgrades from 5.X mostly supported, still some remaining correctness problems	2018-01-28 11:52:54 -08:00
Evan Tschannen	79d94214a4	Merge commit 'f4ffc9752b5ec66ac47f5f684a5d8be06a7eae6e' into feature-remote-logs	2018-01-25 10:12:06 -08:00
A.J. Beamon	2744646090	Merge branch 'release-5.0' into release-5.1	2018-01-22 11:57:58 -08:00
A.J. Beamon	188562ccbc	fix: Status should create its DatabaseConfiguration using fromKeyValues(). This makes sure that various state is correctly set if not specified in the configuration.	2018-01-22 11:40:08 -08:00
Evan Tschannen	66b2218989	added tlog support for upgrading from 5.X clusters. Does not support upgrading from 4.X or earlier. Untested, storage servers still need the ability to change their tag.	2018-01-21 12:21:46 -08:00
Evan Tschannen	698ef4117e	Merge branch 'master' into feature-remote-logs	2018-01-20 10:34:30 -08:00
Evan Tschannen	b5eba4f13a	fix: do not check for desired data centers if they have not been set	2018-01-20 10:28:59 -08:00
A.J. Beamon	35b91bfb55	Add back (in different form) some ratekeeper trace events when a storage server or log doesn't respond. Add actualTPS (named TPSBasis) to RkUpdate.	2018-01-18 14:51:38 -08:00
Evan Tschannen	b78e0a362a	fix: do not pause when running multiple backup tests simultaneously	2018-01-18 12:24:33 -08:00
Evan Tschannen	2e46ee3dba	fix: getTeam works when there are no teams	2018-01-17 17:49:13 -08:00
Evan Tschannen	264dc44dfa	fixed many more bugs associated with running without remote logs	2018-01-17 17:03:17 -08:00
Stephen Atherton	93b34a945f	Major usability and performance improvements to backup management. Backup descriptions now calculate and display timestamps using TimeKeeper data (if given a cluster) and restorability of snapshots. Expire now requires a --force option to leave a backup unrestorable or unrestorable after a given point in time, specified by version or timestamp. BackupContainerFilesystem now maintains metadata on key version boundaries in order to avoid large list operations for describe and expire operations. Blob parallel recursive list operations can now take a path (aka prefix) filter function. New describe and expire options are available in fdbbackup.	2018-01-17 04:09:43 -08:00
Evan Tschannen	8f58bdd1cd	fixed a large number of problems related to running without remote logs	2018-01-16 18:12:40 -08:00
Evan Tschannen	316e200a0c	fix: compilation errors after merge	2018-01-16 10:48:50 -08:00
Evan Tschannen	21482a45e1	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbserver/DBCoreState.h # fdbserver/LogSystem.h # fdbserver/LogSystemPeekCursor.actor.cpp # fdbserver/TLogServer.actor.cpp	2018-01-14 13:40:24 -08:00
Evan Tschannen	645dc5ead6	warmRange needs to get a read version occasionally to prevent it from overwhelming the proxy quietDatabase waits for all data distribution to be completely finished so that databases are cached in a cleaner state	2018-01-14 12:50:52 -08:00
Evan Tschannen	be643d6937	fix: the tlog did not cancel recovery properly when stopped	2018-01-12 17:18:14 -08:00
Evan Tschannen	3915d6825c	we need to check the server list at a higher priority, because if we do not notice a storage server interface change for a long period of time, we will mark it as failed	2018-01-12 12:51:07 -08:00
Evan Tschannen	de119f192d	fixed a priority inversion where the tlog would prefer to copy data from the previous generation rather than make data durable (leading to being ratekeeper controlled)	2018-01-11 16:09:49 -08:00
Evan Tschannen	29ebb19388	Merge branch 'release-5.0' into release-5.1	2018-01-11 15:43:37 -08:00
Evan Tschannen	22e5a0b257	formatting	2018-01-11 14:44:09 -08:00
Evan Tschannen	173a8de3ed	DBCoreState supports upgrades from 3.0 versions	2018-01-11 14:39:51 -08:00
A.J. Beamon	2f5073d00f	Some visual studio project cleanup.	2018-01-10 10:07:18 -08:00
Evan Tschannen	022df3b91b	backup and restore sometimes took too long in simulation	2018-01-09 17:26:42 -08:00
Evan Tschannen	645f68212b	make timekeeper priority system immediate	2018-01-08 18:21:00 -08:00
Evan Tschannen	370e8a9903	fix: split metrics could fail an assert in a very rare scenario	2018-01-08 18:20:22 -08:00
Evan Tschannen	9630deba3a	fixed a number of bugs related to running fearless without remote logs	2018-01-08 12:04:19 -08:00
Evan Tschannen	d3116fb336	masterRecoveryDuration is only a sevWarnAlways outside of simulation	2018-01-07 15:37:45 -08:00
Evan Tschannen	4e8bc273b3	added a version of getKeyRangeLocations that checks for endpoint failures fix: did not add the cluster controller to id_used in all cases removed obsolete fixmes	2018-01-07 15:32:43 -08:00
Evan Tschannen	30710f7493	syncLogId was not necessary	2018-01-06 14:52:39 -08:00
Evan Tschannen	3ec45d38a0	Merge branch 'master' into feature-remote-logs # Conflicts: # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-01-06 13:54:45 -08:00
Evan Tschannen	10c3fc165e	fix: after recovering from disk, only allow peeking data that was fully recovered	2018-01-06 13:49:13 -08:00
Stephen Atherton	b86f68ceb8	Added new test that combines atomic backup/restore. Added randomization to delays in AtomicRestore workload.	2018-01-05 14:43:21 -08:00
Evan Tschannen	63751fb0e2	fix: remote logs are not in the log system until the recovery is complete so they cannot be used to determine if this is the correct log system to recover from	2018-01-05 14:15:25 -08:00
Evan Tschannen	5ac4f73978	Merge branch 'release-5.1' into feature-remote-logs # Conflicts: # fdbclient/NativeAPI.actor.cpp # fdbrpc/Locality.h # fdbrpc/simulator.h # fdbserver/ApplyMetadataMutation.h # fdbserver/ClusterController.actor.cpp # fdbserver/LogSystemPeekCursor.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/SimulatedCluster.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp # fdbserver/WorkerInterface.h # fdbserver/masterserver.actor.cpp # flow/Net2.actor.cpp # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2018-01-05 11:33:42 -08:00
A.J. Beamon	5015119115	Generalize the message that gets displayed in status if a cluster file's contents are incorrect.	2018-01-05 10:29:47 -08:00
Evan Tschannen	e11f461cbd	fix: better master exists needs to check master fitness before tlogs or proxies because that is the order of recruitment	2018-01-04 15:19:46 -08:00
Evan Tschannen	f8f1c48d83	sometimes test pausing backups	2018-01-04 11:40:08 -08:00
Evan Tschannen	f2c4beed9f	fix: tlogFitness did not consider it better to have one tlog of a better fitness fix: checkStable was not used in all places in better master exists fix: we need to call checkOutstanding on worker registration in all cases fix: in case persistentData is keyValueStoreMemory, we need to make sure it is fully recovered before writing to it	2018-01-04 11:33:02 -08:00
Evan Tschannen	6d5dd9bd27	fix: we cannot pipeline disk queue commits until after the first commit is successful	2018-01-02 13:30:27 -08:00
Evan Tschannen	86958cb08d	Merge pull request #226 from cie/fix-taskBucket-unblockFuture Modify TaskBucketCorrectness to support chain and multiple tasks	2017-12-20 18:00:54 -08:00
Yichi Chiang	91e5abeaa6	Modify TaskBucketCorrectness to support chain and multiple tasks	2017-12-20 17:02:49 -08:00
Alex Miller	f70e3b9fe8	Add or change a bunch of comments to provide descriptions of function contracts. This cleans up a bit of the VersionStamp DR work I did, and leaves hints and advice for anyone who will be touching mutation applying code in the future.	2017-12-20 16:57:14 -08:00
Evan Tschannen	982f0dcb1e	Merge pull request #222 from cie/alexmiller/drtimefix2 Fix yet another VersionStamp DR issue.	2017-12-20 15:09:23 -08:00
Alex Miller	b5a6bc0ab7	Fix VersionStamp problems by instead adding a COMMIT_ON_FIRST_PROXY transaction option. Simulation identified the fact that we can violate the VersionStamps-are-always-increasing promise via the following series of events: 1. On proxy 0, dumpData adds commit requests to proxy 0's commit promise stream 2. To any proxy, a client submits the first transaction of abortBackup, which stops further dumpData calls on proxy 0. 3. To any proxy that is not proxy 0, submit a transaction that checks if it needs to upgrade the destination version. 4. The transaction from (3) is committed 5. Transactions from (1) are committed This is possible because the dumpData transactions have no read conflict ranges, and thus it's impossible to make them abort due to "conflicting" transactions. There's also no promise that if client C sends a commit to proxy A, and later a client D sends a commit to proxy B, that B must log its commit after A. (We only promise that if C is told it was committed before D is told it was committed, then A committed before B.) There was a failed attempt to fix this problem. We tried to add read conflict ranges to dumpData transactions so that they could be aborted by "conflicting" transactions. However, this failed because this now means that dumpData transactions require conflict resolution, and the stale read version that they use can cause them to be aborted with a transaction_too_old error. (Transactions that don't have read conflict ranges will never return transaction_too_old, because with no reads, the read snapshot version is effectively meaningless.) This was never previously possible, so the existing code doesn't retry commits, and to make things more complicated, the dumpData commits must be applied in order. This would require either adding dependencies to transactions (if A is going to commit then B must also be/have committed), which would be complicated, or submitting transactions with a fixed read version, and replaying the failed commits with a higher read version once we get a transaction_too_old error, which would unacceptably slow down the maximum throughput of dumpData. Thus, we've instead elected to add a special transaction option that bypasses proxy load balancing for commits, and always commits against proxy 0. We can know for certain that after the transaction from (2) is committed, all of the dumpData transactions that will be committed have been added to the commit promise stream on proxy 0. Thus, if we enqueue another transaction against proxy 0, we can know that it will be placed into the promise stream after all of the dumpData transactions, thus providing the semantics that we require: no dumpData transaction can commit after the destination version upgrade transaction.	2017-12-20 15:04:04 -08:00
Stephen Atherton	e0d9cea008	Merge branch 'master' into continuous-backup # Conflicts: # fdbclient/FileBackupAgent.actor.cpp # fdbrpc/BlobStore.actor.cpp	2017-12-19 23:02:14 -08:00
Alex Miller	c7dbd31a1e	Refactoring: Create a common prefixRange and do UID->Key once in backup.	2017-12-19 17:17:50 -08:00
Alex Miller	1488c12c18	Simulation will return an error and print if any non-suppressed SevError events were logged. This means that loops like `seed=1; while ./fdbserver -r simulation -s $seed; do seed=$(($seed+1)); done` can be used to find an example of an often failing test. This also means joshua will report ExitCode errors on anything that has a SevError in the log. As a part of this, we also implicitly downgrade any injected errors to SevWarnAlways.	2017-12-19 17:17:50 -08:00
Stephen Atherton	e28641886d	TraceEvent improvements. Minor bug fix, restore log writing tasks didn't have the log file endVersion but it's only for logging purposes.	2017-12-19 15:27:04 -08:00
Evan Tschannen	a5601877b3	fix: valgrind issue with destruction ordering	2017-12-18 15:31:59 -08:00
Evan Tschannen	1dc9eceb6d	optimize GetKeyLocationRequests on the proxy so they only require a single map lookup, instead of doing 3 + (3* [number of ranges]) lookups	2017-12-15 20:13:44 -08:00
Stephen Atherton	33f9f1a95c	Added SnapshotDispatch task for writing snapshots in random order over a specified period of time and adapting speed to a growing or shrinking database. TaskBucket now supports scheduling tasks. TaskFuture now correctly recognizes multiple tasks in its callback space. TaskBucket extendTimeout() now supports specifying the new timeout version. Submitting a backup now requires a snapshot duration.	2017-12-14 01:44:38 -08:00
Evan Tschannen	7ce93426ed	fix: connection disabler in removeServerSafely needs to run for the whole test to avoid getting stuck on include all	2017-12-12 18:38:57 -08:00
Alec Grieser	4495a19299	Merge pull request #220 from cie/alexmiller/flowprofcircus Add class restrictions to CpuProfiler, and fix metric crash.	2017-12-11 14:13:22 -08:00
Evan Tschannen	73a0a07eac	clients ask for key location information directly from the proxy, instead of reading it from the database	2017-12-09 16:10:22 -08:00
Alex Miller	48660e9ce5	Add class restrictions to CpuProfiler, and fix metric crash. This change largely refactors away the old meaning of the value given to flow_profiler, which was the number of machines that we'd be profiling, and instead replaces it with the classes of processes to profile for the duration of the test. Most importantly, this means that one can profile in circus with a configuration that has "ssd" in it, and the circus run will still complete (as long as the argument isn't "storage"). And also finally add some other fixes I had to the same file to conditionally change the name of the metric we're looking for to comply with what's actually written.	2017-12-07 19:28:29 -08:00
Stephen Atherton	abb2dd1ebc	Merge pull request #214 from cie/alexmiller/fallocate Use fallocate to zero ranges instead of writing zeroes	2017-12-06 13:47:40 -08:00
Evan Tschannen	5a947212ed	fix: ensure all prior commits have completed before returning that a commit has committed from the disk queue	2017-12-06 12:31:07 -08:00
Stephen Atherton	f8e89a40ac	Bug fixes, take(1) is incorrect usage of FlowLock.	2017-12-04 10:25:47 -08:00
Evan Tschannen	49dac11a5f	added a SevWarnAlways for when a disk queue file grows larger than 20GB	2017-12-01 15:05:17 -08:00
Evan Tschannen	482ac38ca6	added knobs so that the client failure monitoring update rate and the server failure monitoring update rate are separate knobs	2017-12-01 13:04:32 -08:00
Evan Tschannen	c3918d892a	do not use bandwidth splitting on the keyServer shard, lots of sets and clears to this shard generally means you do not want to create additional data distribution work	2017-11-30 18:28:16 -08:00
Alex Miller	196258080b	Refactor zeroing a chunk of a file from DiskQueue into IAsyncFile. If we're going to do the work to provide more optimized ways to zero files, then I'd feel better with this being in a more common place, so that any other zero-ers are likely to reuse it. It also makes testing easier/more obvious. Also, because it's needed for correctness, fix the aligned_alloc for OSX, which wasn't aligned, and use an actually aligned allocation function.	2017-11-30 17:57:55 -08:00
Alex Miller	c7a120c59d	Rename IAsyncFile::incrementalDelete -> IAsyncFileSystem::incrementalDeleteFile. `deleteFile` existed in IAsyncFileSystem, so an incremental delete function seems to belong more as a virtual method on IAsyncFileSystem than a static method on IAsyncFile, and the naming should match. As long as we're here, change IAsyncFile to declare a virtual destructor, so that it has good and proper C++ behavior. I presume this is what was vaguely intended by the default constructor definition that previously existed?	2017-11-30 17:19:10 -08:00
Evan Tschannen	7f72aa7de5	fix: a storage server does not ever need to rollback before a version restored from disk	2017-11-30 11:19:43 -08:00
Evan Tschannen	e5a682948c	Merge pull request #212 from cie/check-cluster-controller-desired-class Check cluster controller using desired process class in consistency c…	2017-11-29 15:57:51 -08:00
Yichi Chiang	8ba0eaebff	Check cluster controller using desired process class in consistency check	2017-11-29 15:09:23 -08:00
Evan Tschannen	8c51bc4ac4	fixed low latency tests in a way that gives us better test coverage	2017-11-28 18:20:29 -08:00
Evan Tschannen	dc624a54dc	fix: avoid flushing large queues in simulation when checking latency	2017-11-27 17:23:20 -08:00
Stephen Atherton	1b1c8e985a	Merge branch 'master' into backup-container-refactor # Conflicts: # fdbclient/FileBackupAgent.actor.cpp	2017-11-25 19:54:51 -08:00
Stephen Atherton	6695c9e6a2	Bug fixes and improvements to error handling and trace events. The most serious bug was that restore would start at the wrong version, possibly skipping early log and range files.	2017-11-25 00:46:16 -08:00
Alex Miller	f19cb3bbbd	Merge pull request #208 from cie/alexmiller/grvtfix Fix the GRV performance regression	2017-11-17 15:00:44 -08:00
Yichi Chiang	d9a98aa968	Remove commented code	2017-11-16 17:25:37 -08:00
Yichi Chiang	0d5dc15ac8	Fix double recoveries	2017-11-16 16:58:55 -08:00
Alex Miller	e9412bbb11	Fix the GRV performance regression introduced by adding the policy engine to GRV calculations. Construction of LocalityGroup from LocalityData is expensive, and the previous code greatly ran afoul of that. The policy engine does a large amount of interning of strings and building compressed maps to make the expected many future selectReplica calls cheap. Unfortunately we don't call selectReplicas, so much of this work is undesirable for us, and a large amount of CPU time is spent doing this initialization work. The new changes aggressively do the minimal LocalityGroup::add() calls necessary, and make them as cheap as possible by removing all elements from LocalityData that don't need to be considered by the policy. This optimization was also applied to the PeekCursor used during recovery, which should speed recoveries up by a small amount.	2017-11-16 16:15:52 -08:00
Evan Tschannen	ad456a939a	Merge pull request #206 from cie/change-excluded-cluster-controller Change excluded cluster controller	2017-11-15 17:28:33 -08:00
Yichi Chiang	f96faf72d9	Add fullyRecoveredConfig for checking exclusions	2017-11-15 17:15:24 -08:00
Evan Tschannen	30464e943c	Merge pull request #205 from cie/cleanup-spammy-traceevents Cleanup spammy traceevents	2017-11-15 12:41:37 -08:00
Evan Tschannen	e113dba0e3	added a new trace event tracking master recovery durations	2017-11-15 12:38:26 -08:00
Stephen Atherton	a77162b53d	Merge branch 'master' into backup-container-refactor # Conflicts: # fdbclient/BackupAgent.h # fdbclient/FileBackupAgent.actor.cpp # fdbclient/KeyBackedTypes.h	2017-11-15 08:14:47 -08:00
Stephen Atherton	3dfaf13b67	IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored are now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius as the previous backup container was just a thin wrapper that presented a single level list of files and offered no methods for managing or interpreting the file structure so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse. Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup. The addition of IBackupFile and its finish() method simplified the log and range writer tasks. Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes. Added KeyBackedSet<T> type. Moved JSONDoc to its own header. Added platform::findFilesRecursively(). Still to do: update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.	2017-11-14 23:33:17 -08:00
Yichi Chiang	df922bc973	Change excluded cluster controller	2017-11-14 13:57:37 -08:00
A.J. Beamon	bb1297c686	Remove RkServerQueueInfo and RkTLogQueueInfo trace events, since this information is more or less already logged on the storage servers and tlogs. Update the quiet database check and magnesium to use the information from the logs and storage servers.	2017-11-14 12:59:42 -08:00
A.J. Beamon	3b952efb4e	Remove events from cluster controller that get logged for roughly every worker upon recovery, master registration, etc.	2017-11-14 10:15:45 -08:00
A.J. Beamon	0fea5e9c2f	Convert client_invalid_operation errors to ASSERTs.	2017-11-13 11:38:34 -08:00
A.J. Beamon	cd085764f1	Do not automatically change a cluster file that does not match what you expect.	2017-11-10 14:12:45 -08:00
Alex Miller	311d1ca87d	A variety of fixes that collectively fix using flow profiling in circus. To run, use --co=flow_profiling=-1, because reasons.	2017-11-07 13:55:16 -08:00
Evan Tschannen	706bf1e018	fix: we cannot trigger better master exists before a master is fully recovered because exclusions changed by the provisional master will not be committed until the master is fully recovered	2017-11-04 12:48:04 -07:00
Evan Tschannen	57aba0b3bc	fix: excluded servers were the same fitness as storage servers for the master role fix: better master exists did not consider exclusions for master fitness	2017-11-03 17:09:14 -07:00
Yichi Chiang	42fad5efe5	Introduce cluster controller process class in circus	2017-11-03 14:22:55 -07:00
Yichi Chiang	dcc9aafab7	Merge branch 'master' of github.com:apple/foundationdb	2017-11-02 10:47:59 -07:00
Yichi Chiang	c033d8efd8	Fix typo message and remove extra TraceEvent which overwrites the expected one	2017-11-02 10:47:51 -07:00
Balachandar Namasivayam	3efaaec479	onMasterProxiesChanged was being triggered when any member of ClientDBInfo changed. Change the behavior to be triggered only when proxies field in ClientDBInfo is changed.	2017-11-01 18:29:56 -07:00
A.J. Beamon	7cf17df821	Merge branch 'master' into log-group-for-unsupported-clients # Conflicts: # flow/Net2.actor.cpp # tests/fast/SidebandWithStatus.txt # tests/rare/LargeApiCorrectnessStatus.txt # tests/slow/DDBalanceAndRemoveStatus.txt	2017-11-01 11:31:02 -07:00
A.J. Beamon	31caac67dc	Rename supported_versions[x].clients to supported_versions[x].connected_clients	2017-11-01 10:41:30 -07:00
Balachandar Namasivayam	988bc0207f	Reset Client Transaction profiling parameters when the config keys are cleared.	2017-10-31 15:40:57 -07:00
Alec Grieser	5a4a5985fd	Merge branch 'release-5.0'	2017-10-30 08:31:23 -07:00
Alec Grieser	87321f5017	Merge branch 'release-4.6' into release-5.0	2017-10-30 08:31:01 -07:00
Evan Tschannen	54d82c0d92	Merge pull request #194 from cie/alexmiller/valgrind Fix valgrind errors	2017-10-27 17:25:12 -07:00
Alex Miller	e0d33ef8d7	Preemptively fix profiler-related valgrind errors/straight out bugs. I forgot to initialize some fields in requests.	2017-10-27 17:20:19 -07:00
Evan Tschannen	aa0c2ae317	only increase the max shard size if the shard begins in the keyServer keyspace, do not increase the minimum shard size	2017-10-27 14:22:26 -07:00
Evan Tschannen	3a4078bdda	the keyservers shards are always a fixed large size	2017-10-27 11:52:11 -07:00
Balachandar Namasivayam	cfefab18fb	Merge branch 'master' into add-new-atomic-ops	2017-10-25 18:03:34 -07:00
Balachandar Namasivayam	3d5658940a	Addressed Review Comments	2017-10-25 16:42:05 -07:00
Balachandar Namasivayam	9dd588dcce	Addressed review comments. Changed naming for NewMin and NewAnd to MinV2 and AndV2	2017-10-25 14:48:05 -07:00
Evan Tschannen	d852a53ae4	Merge pull request #181 from cie/throttle-spammy-logs Throttle spammy logs	2017-10-25 13:45:55 -07:00
Balachandar Namasivayam	2f6d55a52f	Add correctness tests for all atomic ops	2017-10-25 13:36:49 -07:00
Yichi Chiang	4d54a73f5b	Merge pull request #191 from cie/count-cluster-controller-role Take cluster controller role into consideration when recruiting workers	2017-10-25 12:09:15 -07:00
Yichi Chiang	f39cce9b8d	Use processId instead of address for comparison	2017-10-25 11:35:29 -07:00
Yichi Chiang	5fcef911f0	Take cluster controller role into consideration when recruiting workers	2017-10-25 10:35:46 -07:00
Evan Tschannen	48901a9223	added a list of tlog IDs that are missing to status	2017-10-24 16:28:50 -07:00
Yichi Chiang	c2a117fe07	Merge pull request #189 from cie/enable-check-desired-class Enable checkUsingDesiredClasses() in consistency check	2017-10-24 15:18:21 -07:00
Yichi Chiang	defdc6550d	Exclude excluded processes when getting testers	2017-10-24 15:16:34 -07:00
Evan Tschannen	df74e2a373	re-added support for non-copying tlog recovery	2017-10-24 15:09:31 -07:00
Yichi Chiang	3865c5ae0e	Enable checkUsingDesiredClasses() in consistency check	2017-10-24 12:58:54 -07:00
Balachandar Namasivayam	8c3bdc5b3b	Make atomic ops differentiate between unset and empty values.	2017-10-23 16:48:13 -07:00
Evan Tschannen	7a36fd2134	disabled a variety of simulation tests to get correctness clean	2017-10-19 15:49:54 -07:00
Evan Tschannen	e2c1e87df6	made a large number of fixes to make fearless DR correctness clean.	2017-10-19 15:36:32 -07:00
Bhaskar Muppana	360b777b78	Fail with correct error code in case of abort or discontinue of non-existing backups.	2017-10-18 23:17:48 -07:00
Alec Grieser	dd6d8f3b0e	Merge branch 'master' into add-new-atomic-ops	2017-10-18 16:36:44 -07:00
Bhaskar Muppana	2007f3799f	Don't ignore TimeKeeper failures.	2017-10-18 14:31:31 -07:00
Bhaskar Muppana	314511f4d7	Fixing spaces in BackupCorrectness TraceEvents.	2017-10-18 14:27:52 -07:00
Alex Miller	7b9bc1d715	Merge pull request #170 from cie/alexmiller/flowprofile Add support for profiling a running fdb cluster to fdbcli, fix security issues, and add an improved backtrace.	2017-10-16 16:51:53 -07:00
Alex Miller	f997cb9038	Add a string knob to hold the Log directory, and write profiles to it. This is the combination of two small changes. 1. Add support for a string knob type. 2. Change profiles to be written to the log directory instead of the working directory. We have three options of where to write files: the working directory, the data directory, and the log directory. The working directory may be set to a non-writable location, and likely contains the fdb binaries. Allowing these files to be overwritten would likely not be a wise idea. The data directory hosts our sqlite b-trees. It would also be very unfortunate if these were ever overwritten by an unfortunate profile name. The log directory contains logs. Out of the three, these matter the least if they disappear or become corrupted. Thus, we write to the log directory.	2017-10-16 16:05:02 -07:00
Alex Miller	c5fbe33df6	Disallow arbitrary paths for storing profiles. Previously, one could request profiles to be stored at "../../../../../../etc/passwd". Now we expand the paths, including symlinks, and ensure that the target is a child of the targeted subdirectory. This was the least convoluted way I could figure out to handle paths.	2017-10-16 16:05:02 -07:00
Alex Miller	91a26a170c	Add toggleable profiling support to fdbserver+fdbcli. This adds the fdbcli commands: * profile list -- Lists all workers in a way that doesn't fill `kill`'s list. * profile flow run -- Allows starting flow profiling on a set of hosts for a specified interval. And threads through all the support for enabling and disabling profiling as an RPC.	2017-10-16 16:05:02 -07:00
Balachandar Namasivayam	312f614133	Add the new ops and AND to NON_ASSOCIATIVE_MASK. In the storage server, read the entire value if the op is ByteMin or ByteMax.	2017-10-16 11:06:31 -07:00
Alec Grieser	e0be1ef1e0	Merge branch 'release-5.0'	2017-10-16 10:08:11 -07:00
Alec Grieser	432726ba2d	Merge branch 'release-4.6' into release-5.0	2017-10-16 09:54:21 -07:00
Stephen Atherton	68eccb681e	Merge pull request #173 from bmuppana/master Backup log messages.	2017-10-13 18:31:53 -07:00
Evan Tschannen	215bcb8d3e	Merge pull request #157 from cie/choose-leader-on-stateless-processes Catch and update processClass change from DBSource	2017-10-13 14:03:29 -07:00
Yichi Chiang	5bcdd37c0d	Move UID generation and add initialClass	2017-10-13 13:46:37 -07:00
Yichi Chiang	12edd27281	Introduce prevChangeID to CandidacyRequest and LeaderHeartbeatRequest	2017-10-12 17:11:58 -07:00
Bhaskar Muppana	d1e9d28239	Backup log messages.	2017-10-12 16:12:42 -07:00
Stephen Atherton	11517f7bfc	Merge branch 'master' into continuous-backup # Conflicts: # fdbclient/FileBackupAgent.actor.cpp	2017-10-12 11:03:23 -07:00
Alex Miller	c24b941485	Fix erroneous std::move in indexed set, and clean up addMetric users. This is a follow-on to c4eb73d0. Thanks to Bala for pointing out the unchanged std::move usage, and there appeared to not be many existing users of addMetric anyway.	2017-10-11 17:36:51 -07:00
Balachandar Namasivayam	8e0bea2795	Update API_VERSION from 500 to 510	2017-10-11 13:49:38 -07:00
Stephen Atherton	c3d8412abb	Merge pull request #166 from cie/alexmiller/deathservice Fix potential division by zero issues via RPC.	2017-10-10 16:47:38 -07:00
Evan Tschannen	ff1b49be2e	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbserver/DatabaseConfiguration.cpp	2017-10-10 16:07:59 -07:00
Evan Tschannen	8feb3b8fbc	fixed conflict range workload by just disabling timeKeeper instead of the check, because it should be a more robust fix	2017-10-10 16:01:02 -07:00
Balachandar Namasivayam	eeebf10030	Modified existing behavior of MIN and AND atomic ops. The new behavior results in a 'SET' if the atomic op is performed on a non-existing key. Added new atomic ops ByteMin and ByteMax that do lexicographic comparison of byte strings.	2017-10-10 13:02:22 -07:00
Evan Tschannen	c8525dc3e7	timekeeper is constantly changing keys in the system keyspace, so do not report errors on key mismatches on keys in the system keyspace	2017-10-10 12:04:56 -07:00
Evan Tschannen	3d2103075d	data distribution tracks teams for each data center separately	2017-10-10 10:36:33 -07:00
Evan Tschannen	5e6eba365b	fix: always set confChange, because popVersion is not deterministic across proxies, and confChange needs to be set deterministically	2017-10-06 18:37:08 -07:00
Evan Tschannen	93b3d0e4e7	fix: toMap didn’t report logs proxies and resolvers	2017-10-06 15:55:50 -07:00
Evan Tschannen	15962cf079	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbrpc/Locality.cpp # fdbrpc/Locality.h # fdbserver/ClusterController.actor.cpp # fdbserver/ClusterRecruitmentInterface.h # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp # fdbserver/WorkerInterface.h # fdbserver/fdbserver.vcxproj.filters # fdbserver/masterserver.actor.cpp # fdbserver/worker.actor.cpp # flow/error_definitions.h	2017-10-05 17:09:44 -07:00
Alex Miller	a21c8a820b	Move cpuProfilerRequest from WorkerInterface to ClientWorkerInterface. A way to access this stream is required if we wish to be able to toggle profiling from fdbcli. There's two ways to do this: 1. Use `monitorLeader()` to get a `ClusterControllerFullInterface`, and use `getWorkers` from there to get a list of `WorkerInterface`s, from which we can access cpuProfilerRequest. 2. Move cpuProfilerRequest to ClientWorkerInterface and use the existing code in the client that can fetch a list of all `ClientWorkerInterface`s. The split between WorkerInterface and ClientWorkerInterface appears to be what a client might have a need to call versus what is fdbserver-internal (and thus no client should even want to call). Thus, it seems to make more sense to acknowledge that profiling is useful to be able to toggle from a client, and go with option (2).	2017-10-05 14:08:28 -07:00
Yichi Chiang	3edc2824a9	Add initialClass to RegisterWorkerRequest 2	2017-10-05 11:03:25 -07:00
Yichi Chiang	05f7626e39	Add initialClass to RegisterWorkerRequest	2017-10-04 17:11:12 -07:00
Yichi Chiang	3c70df57b5	Fix cluster controller review comments	2017-10-04 15:48:55 -07:00
Alex Miller	e55cc447d2	Address code review comments. * Fixed memory corruption with SystemData key constants * Removed duplication in ClusterController * Reworked fdbcli actions to better represent explicit vs default assignments	2017-10-04 13:36:18 -07:00
A.J. Beamon	5063793f36	Revert line ending change	2017-10-04 11:19:19 -07:00
Alex Miller	706427ee62	Fix potential division by zero issues via RPC. A carefully crafted SplitMetricRequest could have caused division by zero. It's not really great to offer Division By Zero As A Service, so let's just return an error instead.	2017-10-03 22:11:08 -07:00
Evan Tschannen	3a2ddcc84a	Add destinations that are read-write to the source list, so that cancelled data movement can contribute to copying the data for the next movement.	2017-10-03 17:39:08 -07:00
Balachandar Namasivayam	0e153cdd35	Throttle Spammy logs. Three knobs are added. Trace Events are sampled and cached with an expiration set. Every TraceEvent above SevDebug is checked against this cache to see if it exceeded a set threshold. If yes, then throttle the TraceEvent. If a TraceEvent is throttled, a warning msg is logged.	2017-10-02 18:43:11 -07:00
Evan Tschannen	6ea9903c82	Merge branch 'release-5.0' # Conflicts: # fdbbackup/backup.actor.cpp # fdbserver/ClusterController.actor.cpp # versions.target	2017-10-01 18:46:44 -07:00
Evan Tschannen	0949c4be65	Revert "Fixed problem with master being recruited on excluded servers" This reverts commit 1f7b624734a8ad6e896dd3f01f9cdf334ca62486.	2017-10-01 16:30:19 -07:00
Evan Tschannen	696d432462	Revert "fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded)" This reverts commit 83b2ce68c8e1a29fc1559598cc38d3ef7eb46101.	2017-10-01 16:29:32 -07:00
Evan Tschannen	0dde15f1d2	fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded) fix: better master exists did not use exclusions because the configuration was reset	2017-10-01 16:26:58 -07:00
Yichi Chiang	636ce4a131	Replace leader when find a better one	2017-09-29 16:34:55 -07:00
Alex Miller	11668bb359	Fixing code review comments.	2017-09-29 15:58:36 -07:00
Alex Miller	b7ce9d996c	Comment out verbose TraceEvents in preparation for pushing.	2017-09-29 15:58:36 -07:00
Alex Miller	c40c1bb5fe	Add a new workload: BackupToDBAbort, which does an ACI switchover. This is to allow easier testing of non-durable switchovers without having to wiggle into BackupToDBCorrectness's view of the world.	2017-09-29 15:58:36 -07:00
Alex Miller	9e9a96ae76	Make VersionStamp workload able to run with DR-style workloads. * It is now tolerant of locked database errors, and handles them correctly. * There is an option to specify which database to verify against.	2017-09-29 15:58:36 -07:00
Alex Miller	34630b6130	Make VersionStamp workload handle commit_unknown_result. Previously, if a transaction failed with commit_unknown_result, and was actually committed, it would look like data that magically appeared in the database and verification would fail. Now, we explicitly re-read and check to see if the commit happened, so that we may maintain an accurate understanding of what the database state should be.	2017-09-29 15:58:36 -07:00
Alex Miller	23945b9fea	VersionStamp can co-exist with other workloads that write data to the database. VersionStamp previously would range-read the entire database during validation. This has the unfortunate effect of making it fail during validation if run with any other workload that writes keys to the database. Now, all keys written and read are done with a configurable prefix, so that it may co-exist with a variety of other workloads.	2017-09-29 15:58:36 -07:00
Alex Miller	370a6afb80	Make VersionStamp have an option to be tolerant of data being lost.	2017-09-29 15:58:36 -07:00

587 Commits