Evan Tschannen
f637c680f1
fix: populateSatelliteTagLocations was broken
fix: satellites do not index the upgraded locality
2018-06-17 13:29:17 -07:00
Evan Tschannen
0d87186821
use a specific locality for satellites
2018-06-15 11:06:38 -07:00
Evan Tschannen
feb8578c06
fix: only flush and exit in simulation
2018-06-14 13:48:30 -07:00
Alex Miller
fcfa00928b
Make RecoveryState an enum class.
This means that all the == 7 or != 0 checks go away, and explicit names must be used.
2018-06-12 16:50:25 -07:00
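A minimal C++ sketch of why the switch to an enum class removes the numeric checks. The enumerator names and values below are hypothetical stand-ins, not the actual members of `RecoveryState`. A scoped enum has no implicit conversion to `int`, so a comparison such as `state == 7` no longer compiles and must be written against a named value.

```cpp
#include <type_traits>

// Hypothetical enumerators for illustration only; not the real RecoveryState members.
enum class RecoveryState { Uninitialized = 0, ReadingState = 1, FullyRecovered = 7 };

// A scoped enum does not convert implicitly to int, so `state == 7` is rejected
// at compile time and callers are forced to use an explicit name.
static_assert(!std::is_convertible<RecoveryState, int>::value,
              "enum class values must not convert implicitly to int");

constexpr bool isFullyRecovered(RecoveryState state) {
    return state == RecoveryState::FullyRecovered;
}
```

When a raw number is genuinely needed (for example, in a trace event), `static_cast<int>(state)` still makes the conversion explicit at the call site.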
Evan Tschannen
372ed67497
Merge branch 'master' into feature-remote-logs
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
Evan Tschannen
588eaf4b36
fix: previous delay 0 could still cause us to recruit a tlog before processing disk errors
2018-06-11 11:26:30 -07:00
Evan Tschannen
a5c2a8ee8a
fix: allow disk errors to cancel the actor before recruiting logs
2018-06-10 20:27:19 -07:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
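The first two naming rules above are mechanical enough to check in code. The following is a hypothetical C++ sketch of such a validator, not the check that was actually added. The function names `isValidDetailName` and `isValidTypeName` are invented for illustration, and, as the commit notes, camel case itself is not enforced.

```cpp
// Hypothetical validator mirroring the first two rules in the commit message.
// It does not enforce camel case, which the message says was harder to check.

constexpr bool isUpper(char c) { return c >= 'A' && c <= 'Z'; }

// Detail names: start with an uppercase character and contain no underscores.
constexpr bool isValidDetailName(const char* name) {
    if (name == nullptr || !isUpper(name[0])) return false;
    for (const char* p = name; *p != '\0'; ++p) {
        if (*p == '_') return false;
    }
    return true;
}

// Type names: same rules, except one underscore is allowed (Context_Type),
// and the character immediately after it must also be uppercase.
constexpr bool isValidTypeName(const char* name) {
    if (name == nullptr || !isUpper(name[0])) return false;
    int underscores = 0;
    for (const char* p = name; *p != '\0'; ++p) {
        if (*p == '_') {
            ++underscores;
            // p[1] is at worst the terminating '\0', which fails isUpper.
            if (underscores > 1 || !isUpper(p[1])) return false;
        }
    }
    return true;
}
```

Because both functions are `constexpr`, they can be exercised with `static_assert` as well as called at runtime when an event is logged.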
Evan Tschannen
bf65e745a9
tlogs do not index tags for other localities
2018-06-01 22:51:08 -07:00
Evan Tschannen
c519339adb
avoid peeking from logs that do not match the tag’s locality
2018-06-01 18:42:48 -07:00
A.J. Beamon
1f0b519a73
Rename several variables in TLogServer.actor.cpp to follow our normal camel case conventions. I didn't rename every variable here, because some appear to be data structures (like a map) following the pattern keydesc_valuedesc, and I wasn't sure that the straightforward keydescValuedesc rename made sense. I did rename a couple of instances of these where it seemed reasonable, though.
2018-06-01 10:18:07 -07:00
Evan Tschannen
529bd34cf9
fix: when a tlog is stopped by another recruitment it no longer has the opportunity for commtingQueue to be set
2018-05-06 20:37:44 -07:00
Evan Tschannen
81c7bddaf8
fix: must check for log router errors while waiting on satellite replies because the recruitmentID will not be updated if it threw an error
2018-05-06 18:15:12 -07:00
Evan Tschannen
b4bd03e67e
fix: we cannot set queueCommitEnd until we have popped the log system to prevent the popped version from going backwards
2018-05-01 22:20:25 -07:00
Evan Tschannen
12ef63b698
knobify replace contents bytes
2018-05-01 19:43:35 -07:00
Evan Tschannen
10d25927cd
Merge branch 'master' into feature-remote-logs
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
2018-04-30 22:15:39 -07:00
Evan Tschannen
5143871fed
passed debug ids into all versions of peek() to assist debugging
2018-04-30 13:36:35 -07:00
Evan Tschannen
883f2318a0
test fearless configurations
2018-04-30 13:17:29 -07:00
Evan Tschannen
92b134eb98
fix: errors from removed were not handled properly
2018-04-29 23:05:08 -07:00
Evan Tschannen
6f318dbff2
fix: do not reply to recruitment until we are sure the log commits to the queue
2018-04-29 22:08:24 -07:00
Evan Tschannen
9cdabfed0e
added useful trace events
2018-04-29 18:54:47 -07:00
Evan Tschannen
f77c1ec14e
fix: fixed rare bug where a log stopped by a different recruitment would still respond successfully to the recruitment message
2018-04-28 13:34:06 -07:00
Evan Tschannen
af63dac5dd
fix: remote logs need to wait until the durable known committed version is greater than the recovery version before completing recovery to ensure we will not pick a start version that we do not have
2018-04-27 12:18:42 -07:00
Evan Tschannen
ae1de575f1
fix: remote logs are not considered fully recovered until they are at recoveredAt
2018-04-23 17:49:46 -07:00
Evan Tschannen
3ec09ce9f6
fix: only peekSingle needs to throw worker_removed, because tlogs have other ways to get notified they are no longer needed
fix: we need to wait until tags are popped past recoveredAt instead of unrecovered before
2018-04-23 16:43:08 -07:00
Evan Tschannen
126fc53d10
fix: the start version for peek cursors that merge with multiple log sets is the maximum of the individual start versions
2018-04-23 12:42:51 -07:00
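The rule in this commit can be sketched in a few lines of C++. This assumes versions are modeled as 64-bit integers, and `mergedStartVersion` is a hypothetical helper, not the actual peek cursor code. One plausible reading of the rule: the merged cursor can only serve versions that every log set can serve, so it cannot start before the latest individual start.

```cpp
#include <algorithm>
#include <cstdint>
#include <initializer_list>

using Version = int64_t;  // assumption: versions modeled as 64-bit integers

// Start version of a cursor merging several log sets: the maximum of the
// individual start versions. Precondition: `starts` is non-empty.
constexpr Version mergedStartVersion(std::initializer_list<Version> starts) {
    return std::max(starts);
}
```

For example, merging sets whose cursors start at versions 100, 250, and 175 yields a merged start of 250.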
Dennis Schafroth
290122637b
Using ASSERT_ABORT in destructors
2018-04-23 14:05:10 +02:00
Evan Tschannen
73597f190e
fix: new tlogs are initialized with exactly the tags which existed at the recovery version
2018-04-22 20:28:01 -07:00
Evan Tschannen
a520d03397
fix: if we cannot find a tag, it must have been popped at the recovery version.
2018-04-22 15:08:38 -07:00
Evan Tschannen
1d1e2cd367
fix: initialize the known committed version on the tlog
2018-04-21 00:41:15 -07:00
Evan Tschannen
8d350ceb5f
fix: persist the known committed version on the tlogs
2018-04-20 17:55:46 -07:00
Evan Tschannen
a6d9e889f0
a cleaner solution to preventing tlogs from peeking log routers
2018-04-20 13:25:22 -07:00
Evan Tschannen
f5c3417905
fix: prevent tlogs from peeking the wrong log routers
2018-04-20 00:30:37 -07:00
Evan Tschannen
5da452db8e
fix: pop the log routers again after the log system updates
2018-04-19 14:33:31 -07:00
Evan Tschannen
22526ef996
fix: do not tell storage servers about large sections of empty versions, because it can lead them to make mutations durable which have not been committed
2018-04-18 16:06:44 -07:00
Evan Tschannen
447c7bd15b
fix: log routers use durable known committed version at the time of the pop to determine what is safe to pop from their logs
fix: storage server does not advance its version across a large version increase until it has data associated with the version
2018-04-18 12:07:29 -07:00
Evan Tschannen
760bc8bc99
fix: log router version needs to be fetched before it is available
fix: tlog did not fetch known committed version if start version was exactly equal to it
2018-04-17 11:16:48 -07:00
Evan Tschannen
3e40505f4a
Revert "fix: remote logs should reply until they have recovered through recoverAt"
This reverts commit 3c0c03c004.
2018-04-16 23:17:16 -07:00
Evan Tschannen
3c0c03c004
fix: remote logs should reply until they have recovered through recoverAt
2018-04-16 17:25:49 -07:00
Evan Tschannen
3018a7b1b3
fix: the known committed version of a newly initialized log is 1, since by definition the first commit must have succeeded
2018-04-16 10:42:48 -07:00
Evan Tschannen
5533016f1e
fix: tlogs are now initialized immediately instead of when starting the core; this must be done to pop the log routers during recovery
fix: log router start version must be the same as remote log start version
2018-04-15 14:33:07 -07:00
Evan Tschannen
65e69620a7
fix: unrecoveredBefore on a new log is at minimum 1
2018-04-13 10:41:30 -07:00
Evan Tschannen
1af5ac0d9d
fix: a number of different problems prevented tlogs from using log routers during recovery
2018-04-12 15:20:54 -07:00
Evan Tschannen
a738c4bec1
fix: if the known committed version is equal to the recovery version we do not need to copy any data
2018-04-09 20:48:55 -07:00
Evan Tschannen
419951f601
fix: need to initialize tlog versions to less than the startVersion
2018-04-09 17:17:11 -07:00
Evan Tschannen
4c89f721cd
fix: do not include logRouter tags in lock results
2018-04-09 10:48:57 -07:00
Evan Tschannen
7af892f50b
first working version of non-copying recovery with fearless configurations
2018-04-08 21:24:05 -07:00
Evan Tschannen
331e707684
fix: pop all tags that did not have data at the recovery version because fully popped tags may come back when pullAsyncData re-indexes the mutations
2018-03-31 16:47:56 -07:00
Evan Tschannen
96fffe2cea
fix: do not update version if the log has been stopped
2018-03-30 22:11:42 -07:00
Evan Tschannen
1a4ded1c99
support upgrades by merging tags associated with the different peek requests
2018-03-29 17:54:08 -07:00