foundationdb

Commit Graph

Author	SHA1	Message	Date
Evan Tschannen	1314bcec9e	Merge branch 'release-6.0' # Conflicts: # documentation/sphinx/source/release-notes.rst	2018-10-05 12:54:00 -07:00
Evan Tschannen	06be70bace	fix: if localEnd is smaller than begin, we cannot peek from the local dc	2018-10-05 12:36:34 -07:00
Evan Tschannen	3922e477a5	Merge branch 'release-6.0' # Conflicts: # documentation/sphinx/source/release-notes.rst # fdbclient/ManagementAPI.actor.cpp # fdbserver/ClusterController.actor.cpp # fdbserver/DataDistribution.actor.cpp # fdbserver/LogSystemDiskQueueAdapter.actor.cpp # fdbserver/SimulatedCluster.actor.cpp # fdbserver/TLogServer.actor.cpp	2018-10-03 16:57:18 -07:00
Evan Tschannen	aa51d69b2d	fix: set peekLocality for upgraded tags	2018-10-03 13:54:59 -07:00
Evan Tschannen	69711a107b	fix: because of forced recovery, 0 log router tags does not mean we are a special tlog set	2018-10-02 17:45:11 -07:00
Evan Tschannen	e7e1c634e0	fix: we need to restart the peek cursor when the known committed version becomes available	2018-10-02 17:44:14 -07:00
Evan Tschannen	59335aa757	fix: the latest generation of remote transaction logs might have less data than a previous generation, because they take over at known committed version. Detect this case and end at the version that has the most data	2018-09-28 12:25:27 -07:00
Evan Tschannen	c577840020	fix: forced recovery should remove all references to the old primary tlogs in all generations of logs to help the peek logic avoid attempting to read from them	2018-09-28 12:23:09 -07:00
Evan Tschannen	05e7f08b26	added a peek method which will attempt to read the txsTag from the local region as much as possible	2018-09-28 12:21:08 -07:00
Evan Tschannen	200e65fe61	added a workload which tests killing an entire region, and recovering from the failure with data loss. fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore; fix: we have to modify all of history, we cannot stop after finding a local remote	2018-09-17 18:32:39 -07:00
Alex Miller	fb31a6999f	Rewrite all files to have #include actorcompiler.h as the last include.	2018-08-14 15:50:26 -07:00
Alex Miller	535b5701e5	Rewrite all `Void _ = wait(...)` -> `wait(...)`. This takes advantage of the new actorcompiler functionality to avoid having duplicate definitions of `Void _` when trying to feed the un-actorcompiled source through clang.	2018-08-14 15:50:26 -07:00
Evan Tschannen	1c29275672	call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.	2018-08-01 14:30:57 -07:00
Evan Tschannen	30b2f85020	fix: it is not safe to drop logs supporting the current primary datacenter, because configuring usable_regions down will drop the storage servers in the remote region, leaving you with no remaining logs	2018-07-14 16:26:45 -07:00
Evan Tschannen	cd63c7a7cc	added a buffered cursor, which efficiently merges lots of peek cursors	2018-07-12 12:09:48 -07:00
Evan Tschannen	c148c865e3	optimized log peek cursors to use much less CPU when using the policy engine	2018-07-11 15:43:55 -07:00
Evan Tschannen	f0494f18b1	added a trace event for forced recovery	2018-07-06 17:09:29 -07:00
Evan Tschannen	43b5cb28ba	fix: properly handle zero logRouterTags, this is important for forced recovery	2018-07-06 16:52:25 -07:00
Evan Tschannen	866ccfe344	added the ability to allow the master to finish recovery before all storage servers in both regions have their mutations. This allows you to recover from scenarios where you lose all your tlogs in one dc.	2018-07-04 01:59:04 -04:00
Evan Tschannen	c69d6166e3	another attempt at forced recovery	2018-07-03 13:42:58 -04:00
Evan Tschannen	57a8c6862e	fix: force recovery did not work if the latest log set did not recover th	2018-07-02 23:48:22 -04:00
Evan Tschannen	9eb8dc3a59	fix: previous attempt at force recovery did not work because we need to treat the remote logs as local for peeking	2018-07-02 22:35:18 -04:00
Evan Tschannen	7a12d3e130	added the (untested) ability to force a recovery to the remote datacenter, even if that results in data loss. If the DR lag is more than 1 week there could be potential data corruption if any primary storage servers are still alive.	2018-07-01 09:39:04 -04:00
Evan Tschannen	a288d5b9a9	added a fallback satellite configuration, so that we can use two satellites if available, but do not have to failover to the remote datacenter if one satellite is down	2018-06-28 23:15:32 -07:00
Evan Tschannen	00167b0157	renamed some uses of knownCommittedVersion to durableKnownCommittedVersion; epochEnd exclusively refers to the last version a set of logs is responsible for serving peek requests for; recoverAt and recoveredAt refer to the last committed version of the previous generation	2018-06-26 18:20:28 -07:00
Evan Tschannen	8a8914f046	re-added the ability to configure the number of log routers. Many log routers are needed to get a sufficient number of sockets involved in copying data across the WAN	2018-06-22 00:04:00 -07:00
Evan Tschannen	68ac3bdc4c	log routers now calculate a precise version to pop for their log router tag	2018-06-21 15:29:46 -07:00
Evan Tschannen	e7999e7a3e	log routers need to use parallelGetMore when peeking because the latency to the primary datacenter makes the bandwidth of normal peeking too low.	2018-06-19 22:16:45 -07:00
Evan Tschannen	50e1e03130	fix: for configurations with anti-quorums to work, the push actors need to be put in the proxy’s actor collection	2018-06-18 15:25:54 -07:00
Evan Tschannen	0913368651	added usable_regions to specify if we will replicate into a remote region; remote replication defaults to the primary replication; removed remote_logs, because they should be specified as an override in the regions object	2018-06-17 19:31:15 -07:00
Evan Tschannen	f637c680f1	fix: populateSatelliteTagLocations was broken; fix: satellites do not index the upgraded locality	2018-06-17 13:29:17 -07:00
Evan Tschannen	6931a00993	satellite log push locations are static per tag, which will reduce the number of tags each satellite log has to index, and reduce the proxy cpu when calculating push locations	2018-06-16 17:39:02 -07:00
Evan Tschannen	f694f7c9ca	removed hasBestPolicy	2018-06-15 12:36:19 -07:00
Evan Tschannen	0d87186821	use a specific locality for satellites	2018-06-15 11:06:38 -07:00
Evan Tschannen	1796e00149	do not pop tags from logs that are not indexing that tag	2018-06-14 12:55:33 -07:00
Evan Tschannen	889889323e	The master will tell the cluster controller if it is going to take a long time to recruit new logs in its DC; the cluster controller can determine if the other DC would be better and recruit there. The cluster controller will not switch to the other data center if remote logs are too far behind. We will not recruit in DCs with negative priority.	2018-06-13 18:14:14 -07:00
Evan Tschannen	8dfda1e57b	fixed another trace event	2018-06-11 12:53:07 -07:00
Evan Tschannen	372ed67497	Merge branch 'master' into feature-remote-logs # Conflicts: # fdbserver/DataDistribution.actor.cpp # fdbserver/MasterProxyServer.actor.cpp # fdbserver/TLogServer.actor.cpp # fdbserver/TagPartitionedLogSystem.actor.cpp	2018-06-11 11:34:10 -07:00
Evan Tschannen	b60264024a	fix: we need to copy the txsTag on satellite logs	2018-06-10 20:30:44 -07:00
Evan Tschannen	8a24bf6124	describe did not list all the log sets	2018-06-10 12:38:50 -07:00
A.J. Beamon	e5488419cc	Attempt to normalize trace events: * Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check. * Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase. * Use seconds instead of milliseconds in details. Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed. This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.	2018-06-08 11:11:08 -07:00
Evan Tschannen	c519339adb	avoid peeking from logs that do not match the tag’s locality	2018-06-01 18:42:48 -07:00
Evan Tschannen	81c7bddaf8	fix: must check for log router errors while waiting on satellite replies because the recruitmentID will not be updated if it threw an error	2018-05-06 18:15:12 -07:00
Evan Tschannen	8371afb565	fix: log routers need to know if the log system is stopped to determine how they should peek the last log generation	2018-05-05 17:56:00 -07:00
Evan Tschannen	e8ea02e054	fix: storage servers need to fail if they can no longer peek data	2018-05-05 17:19:59 -07:00
Evan Tschannen	e1e43cff28	endEpoch implemented using getDurableVersion	2018-04-30 18:32:04 -07:00
Evan Tschannen	5143871fed	passed debug ids into all versions of peek() to assist debugging	2018-04-30 13:36:35 -07:00
Evan Tschannen	9cdabfed0e	added useful trace events	2018-04-29 18:54:47 -07:00
Evan Tschannen	2e286b768d	fix: locality is needed for a logSet to call getPushLocations; fix: accidentally deleted allowPops assignment on the log router	2018-04-29 13:47:32 -07:00
Evan Tschannen	dbdeeaa5cf	fix: log routers are given all the information they need to add remote tags in their initialization request	2018-04-28 18:04:57 -07:00

137 Commits