204 Commits

Author SHA1 Message Date
Evan Tschannen 7a12d3e130 added the (untested) ability to force a recovery to the remote datacenter, even if that results in data loss. If the DR lag is more than 1 week there could be potential data corruption if any primary storage servers are still alive. 2018-07-01 09:39:04 -04:00
Evan Tschannen fb0d10635d the first location in a satellite team is the one that will serve peek requests. Make sure we properly balance peek traffic by having the first servers on each team be used an equal number of times 2018-06-27 22:14:50 -07:00
Evan Tschannen 00167b0157 renamed some uses of knownCommittedVersion to durableKnownCommittedVersion
epochEnd exclusively refers to the last version a set of logs is responsible for serving peek requests for
recoverAt and recoveredAt refer to the last committed version of the previous generation
2018-06-26 18:20:28 -07:00
Evan Tschannen 68ac3bdc4c log routers now calculate a precise version to pop for their log router tag 2018-06-21 15:29:46 -07:00
Evan Tschannen 127c2ad775 fix: prevent adding the same location multiple times for satellite logs 2018-06-18 15:27:28 -07:00
Evan Tschannen 50e1e03130 fix: for configurations with anti-quorums to work, the push actors need to be put in the proxy’s actor collection 2018-06-18 15:25:54 -07:00
Evan Tschannen f637c680f1 fix: populateSatelliteTagLocations was broken
fix: satellites do not index the upgraded locality
2018-06-17 13:29:17 -07:00
Evan Tschannen 6931a00993 satellite log push locations are static per tag, which will reduce the number of tags each satellite log has to index, and reduce the proxy cpu when calculating push locations 2018-06-16 17:39:02 -07:00
Evan Tschannen f694f7c9ca removed hasBestPolicy 2018-06-15 12:36:19 -07:00
Evan Tschannen 889889323e The master will tell the cluster controller if it is going to take a long time to recruit new logs in its DC; the cluster controller can determine if the other DC would be better and recruit there.
The cluster controller will not switch to the other data center if remote logs are too far behind.
We will not recruit in DCs with negative priority.
2018-06-13 18:14:14 -07:00
Evan Tschannen 372ed67497 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
A.J. Beamon 99c9958db7 Some more trace event normalization 2018-06-08 13:57:00 -07:00
Evan Tschannen c519339adb avoid peeking from logs that do not match the tag’s locality 2018-06-01 18:42:48 -07:00
Evan Tschannen 5143871fed passed debug ids into all versions of peek() to assist debugging 2018-04-30 13:36:35 -07:00
Evan Tschannen 9cdabfed0e added useful trace events 2018-04-29 18:54:47 -07:00
Evan Tschannen dbdeeaa5cf fix: log routers are given all the information they need to add remote tags in their initialization request 2018-04-28 18:04:57 -07:00
Evan Tschannen abcfb0604a fix: cloneNoMore needs to pass useBestSet 2018-04-26 18:32:12 -07:00
Evan Tschannen 7e434348ce fix: storage servers did not properly pull data when configuring from a fearless setup to a non-fearless setup 2018-04-25 18:20:28 -07:00
Evan Tschannen 73597f190e fix: new tlogs are initialized with exactly the tags which existed at the recovery version 2018-04-22 20:28:01 -07:00
Evan Tschannen a6d9e889f0 a cleaner solution to preventing tlogs from peeking log routers 2018-04-20 13:25:22 -07:00
Evan Tschannen f5c3417905 fix: prevent tlogs from peeking the wrong log routers 2018-04-20 00:30:37 -07:00
Evan Tschannen 447c7bd15b fix: log routers use durable known committed version at the time of the pop to determine what is safe to pop from their logs
fix: storage server does not advance its version across large version increase until it has data associated with the version
2018-04-18 12:07:29 -07:00
Evan Tschannen 7af892f50b first working version of non-copying recovery with fearless configurations 2018-04-08 21:24:05 -07:00
Evan Tschannen 579ba58930 pop old tags only looks at recovered tags, and checks if they are still being used 2018-03-30 19:08:01 -07:00
Evan Tschannen 43cb63df25 fix: the collectTags bool was set incorrectly 2018-03-29 18:19:29 -07:00
Evan Tschannen 1a4ded1c99 support upgrades by merging tags associated with the different peek requests 2018-03-29 17:54:08 -07:00
Evan Tschannen b36e08f08f first version of non-copying recovery. Upgrades are broken, and it has not been tested using fearless configurations yet 2018-03-29 15:12:38 -07:00
Evan Tschannen ccd70fd005 The tlog uses the tags embedded in the message instead of a separate vector of locations
optimized remote tlog committing to avoid re-serializing the message
2018-03-16 16:47:05 -07:00
Evan Tschannen 820382ea68 optimized the log router commit path to avoid re-serializing the data 2018-03-16 11:40:21 -07:00
Evan Tschannen 37a6a81634 Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
#	fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Evan Tschannen cfcf98cffc fix: log router tags were not stored at a best location 2018-02-23 12:26:19 -08:00
Evan Tschannen ddb484143c fix: do not peek from remote logs if they are not fully recovered 2018-02-21 14:06:44 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Evan Tschannen af97a512f5 to support more complicated policies in the future for determining the best location for a tag within a set of tlogs, use an integer instead of a bool 2018-01-29 17:48:18 -08:00
Evan Tschannen 497bc3fe83 fix: txsTag needs to choose the same best location as 5.X version of the software 2018-01-29 17:09:35 -08:00
Evan Tschannen 29c5d4ad3d upgrades from 5.X mostly supported, still some remaining correctness problems 2018-01-28 11:52:54 -08:00
Evan Tschannen 264dc44dfa fixed many more bugs associated with running without remote logs 2018-01-17 17:03:17 -08:00
Evan Tschannen 316e200a0c fix: compilation errors after merge 2018-01-16 10:48:50 -08:00
Evan Tschannen 21482a45e1 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DBCoreState.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Evan Tschannen de119f192d fixed a priority inversion where the tlog would prefer to copy data from the previous generation rather than make data durable (leading to being ratekeeper controlled) 2018-01-11 16:09:49 -08:00
Evan Tschannen 9630deba3a fixed a number of bugs related to running fearless without remote logs 2018-01-08 12:04:19 -08:00
Evan Tschannen 5ac4f73978 Merge branch 'release-5.1' into feature-remote-logs
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/Locality.h
#	fdbrpc/simulator.h
#	fdbserver/ApplyMetadataMutation.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/masterserver.actor.cpp
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Evan Tschannen 76e7988663 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/OldTLogServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/WorkerInterface.h
#	flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00
Evan Tschannen ea26bc1c43 passed first tests which kill entire datacenters
added configuration options for the remote data center and satellite data centers
updated cluster controller recruitment logic
refactors how master writes core state
updated log recovery, and log system peeking
2017-09-07 15:32:08 -07:00
Evan Tschannen 2335fc73f2 fix: peek cursors were being timed out every 10 minutes, instead of 10 minutes after the last use
fix: if an interface is changed while we are not waiting in getMore, we will not reset the sequence to 0.
2017-08-09 15:58:06 -07:00
Evan Tschannen c22708b6d6 added tag localities
fix: remote logs need to stop the master when they are stopped
2017-08-03 16:16:36 -07:00
Evan Tschannen 5852a6301b fixed even more bugs 2017-07-15 15:15:03 -07:00
Evan Tschannen 5ac4de8775 fix: the same tag could be in the server tags list twice 2017-07-13 16:31:55 -07:00
Evan Tschannen 57ba9d36af fixed a large number of bugs 2017-07-13 12:29:21 -07:00
Evan Tschannen 415458deef made LogSet reference counted,
fixed a few bugs
2017-07-11 15:48:10 -07:00
Evan Tschannen 81ae263ad9 implemented setPeekCursor
removed oldTLogServer
first compiling version
2017-07-10 17:41:32 -07:00
Evan Tschannen 979ebcef6c changed to using a vector of logSets instead of a duplicate set of logs for remote servers
finished porting changes to the tlog
everything but peeking is finished in the TagPartitionedLogSystem
2017-07-09 14:46:16 -07:00
Evan Tschannen 0906250e78 merged everything from feature-remote-logs besides the tlog and tagpartitionedlogsystem
re-included tags in messages to the tlog
previously never committed the LogRouter
2017-06-29 15:50:19 -07:00
FDB Dev Team a674cb4ef4 Initial repository commit 2017-05-25 13:48:44 -07:00