114 Commits

Author SHA1 Message Date
Evan Tschannen f637c680f1 fix: populateSatelliteTagLocations was broken
fix: satellites do not index the upgraded locality
2018-06-17 13:29:17 -07:00
Evan Tschannen 0d87186821 use a specific locality for satellites 2018-06-15 11:06:38 -07:00
Evan Tschannen feb8578c06 fix: only flush and exit in simulation 2018-06-14 13:48:30 -07:00
Alex Miller fcfa00928b Make RecoveryState an enum class.
This means that all the == 7 or != 0 checks go away, and explicit names must be used.
2018-06-12 16:50:25 -07:00
Evan Tschannen 372ed67497 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
Evan Tschannen 588eaf4b36 fix: previous delay 0 could still cause us to recruit a tlog before processing disk errors 2018-06-11 11:26:30 -07:00
Evan Tschannen a5c2a8ee8a fix: allow disk errors to cancel the actor before recruiting logs 2018-06-10 20:27:19 -07:00
A.J. Beamon e5488419cc Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.

Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.

This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Evan Tschannen bf65e745a9 tlogs do not index tags for other localities 2018-06-01 22:51:08 -07:00
Evan Tschannen c519339adb avoid peeking from logs that do not match the tag’s locality 2018-06-01 18:42:48 -07:00
A.J. Beamon 1f0b519a73 Rename several variables in TLogServer.actor.cpp to follow our normal camel case conventions. I didn't rename every variable here, because some appear to be data structures (like a map) following the pattern keydesc_valuedesc, and I wasn't sure that the straightforward keydescValuedesc rename made sense. I did rename a couple of instances of these where it seemed reasonable, though. 2018-06-01 10:18:07 -07:00
Evan Tschannen 529bd34cf9 fix: when a tlog is stopped by another recruitment it no longer has the opportunity for commtingQueue to be set 2018-05-06 20:37:44 -07:00
Evan Tschannen 81c7bddaf8 fix: must check for log router errors while waiting on satellite replies because the recruitmentID will not be updated if it threw an error 2018-05-06 18:15:12 -07:00
Evan Tschannen b4bd03e67e fix: we cannot set queueCommitEnd until we have popped the log system to prevent the popped version from going backwards 2018-05-01 22:20:25 -07:00
Evan Tschannen 12ef63b698 knobify replace contents bytes 2018-05-01 19:43:35 -07:00
Evan Tschannen 10d25927cd Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
2018-04-30 22:15:39 -07:00
Evan Tschannen 5143871fed passed debug ids into all versions of peek() to assist debugging 2018-04-30 13:36:35 -07:00
Evan Tschannen 883f2318a0 test fearless configurations 2018-04-30 13:17:29 -07:00
Evan Tschannen 92b134eb98 fix: errors from removed were not handled properly 2018-04-29 23:05:08 -07:00
Evan Tschannen 6f318dbff2 fix: do not reply to recruitment until we are sure the log commits to the queue 2018-04-29 22:08:24 -07:00
Evan Tschannen 9cdabfed0e added useful trace events 2018-04-29 18:54:47 -07:00
Evan Tschannen f77c1ec14e fix: fixed rare bug where a log stopped by a different recruitment would still respond successfully to the recruitment message 2018-04-28 13:34:06 -07:00
Evan Tschannen af63dac5dd fix: remote logs need to wait until the durable known committed version is greater than the recovery version before completing recovery to ensure we will not pick a start version that we do not have 2018-04-27 12:18:42 -07:00
Evan Tschannen ae1de575f1 fix: remote logs are not considered fully recovered until they are at recoveredAt 2018-04-23 17:49:46 -07:00
Evan Tschannen 3ec09ce9f6 fix: only peekSingle needs to throw worker_removed, because tlogs have other ways to get notified they are no longer needed
fix: we need to wait until tags are popped past recoveredAt instead of unrecovered before
2018-04-23 16:43:08 -07:00
Evan Tschannen 126fc53d10 fix: the start version for peek cursors that merge with multiple log sets is the maximum of the individual start versions 2018-04-23 12:42:51 -07:00
Dennis Schafroth 290122637b Using ASSERT_ABORT in destructors 2018-04-23 14:05:10 +02:00
Evan Tschannen 73597f190e fix: new tlogs are initialized with exactly the tags which existed at the recovery version 2018-04-22 20:28:01 -07:00
Evan Tschannen a520d03397 fix: if we cannot find a tag, it must have been popped at the recovery version. 2018-04-22 15:08:38 -07:00
Evan Tschannen 1d1e2cd367 fix: initialize the known committed version on the tlog 2018-04-21 00:41:15 -07:00
Evan Tschannen 8d350ceb5f fix: persist the known committed version on the tlogs 2018-04-20 17:55:46 -07:00
Evan Tschannen a6d9e889f0 a cleaner solution to preventing tlogs from peeking log routers 2018-04-20 13:25:22 -07:00
Evan Tschannen f5c3417905 fix: prevent tlogs from peeking the wrong log routers 2018-04-20 00:30:37 -07:00
Evan Tschannen 5da452db8e fix: pop the log routers again after the log system updates 2018-04-19 14:33:31 -07:00
Evan Tschannen 22526ef996 fix: do not tell storage servers about large sections of empty versions, because it can lead them to make mutations durable which have not been committed 2018-04-18 16:06:44 -07:00
Evan Tschannen 447c7bd15b fix: log routers use durable known committed version at the time of the pop to determine what is safe to pop from their logs
fix: storage server does not advance its version across large version increase until it has data associated with the version
2018-04-18 12:07:29 -07:00
Evan Tschannen 760bc8bc99 fix: log router version needs to be fetched before it is available
fix: tlog did not fetch known committed version if start version was exactly equal to it
2018-04-17 11:16:48 -07:00
Evan Tschannen 3e40505f4a Revert "fix: remote logs should reply until they have recovered through recoverAt"
This reverts commit 3c0c03c004.
2018-04-16 23:17:16 -07:00
Evan Tschannen 3c0c03c004 fix: remote logs should reply until they have recovered through recoverAt 2018-04-16 17:25:49 -07:00
Evan Tschannen 3018a7b1b3 fix: the known committed version of a newly initialized log is 1, since by definition the first commit must have succeeded 2018-04-16 10:42:48 -07:00
Evan Tschannen 5533016f1e fix: tlogs are now initialized immediately, instead of when starting the core, this must be done to pop the log routers during recovery
fix: log router start version must be the same as remote log start version
2018-04-15 14:33:07 -07:00
Evan Tschannen 65e69620a7 fix: unrecoveredBefore on a new log is at minimum 1 2018-04-13 10:41:30 -07:00
Evan Tschannen 1af5ac0d9d fix: a number of different problems prevented tlogs from using log routers during recovery 2018-04-12 15:20:54 -07:00
Evan Tschannen a738c4bec1 fix: if the known committed version is equal to the recovery version we do not need to copy any data 2018-04-09 20:48:55 -07:00
Evan Tschannen 419951f601 fix: need to initialize tlog versions to less than the startVersion 2018-04-09 17:17:11 -07:00
Evan Tschannen 4c89f721cd fix: do not include logRouter tags in lock results 2018-04-09 10:48:57 -07:00
Evan Tschannen 7af892f50b first working version of non-copying recovery with fearless configurations 2018-04-08 21:24:05 -07:00
Evan Tschannen 331e707684 fix: pop all tags that did not have data at the recovery version because fully popped tags may come back when pullAsyncData re-indexes the mutations 2018-03-31 16:47:56 -07:00
Evan Tschannen 96fffe2cea fix: do not update version if the log has been stopped 2018-03-30 22:11:42 -07:00
Evan Tschannen 1a4ded1c99 support upgrades by merging tags associated with the different peek requests 2018-03-29 17:54:08 -07:00