Commit Graph

204 Commits

Author SHA1 Message Date
Evan Tschannen 9e3ec2cb33 fix: when resetting the peekCursor, we cannot discard the popped data if the adapter has already processed data 2019-07-30 13:25:25 -07:00
Evan Tschannen 1d326e3dc8 removed debugging message 2019-07-30 12:42:50 -07:00
Evan Tschannen 5d79e4141f fix: buffered cursor messageVersion should be set to the version we will be at after exhausting everything in messages 2019-07-30 12:38:44 -07:00
Evan Tschannen 45f7b41b48 fix: multi-cursor could discard popped commits after already returning data 2019-07-29 21:36:42 -07:00
Evan Tschannen 5bb322b483 implement popped on bufferedCursor 2019-07-29 21:19:47 -07:00
Evan Tschannen 28df2c35bb
Merge pull request #1855 from alexmiller-apple/sharded-txs-safe-upgrade
Make sharded txsTag upgradeable and downgradeable
2019-07-26 13:29:39 -07:00
sramamoorthy 9afd162e2f remove snap v1 related code 2019-07-25 17:29:31 -07:00
Alex Miller 95487861be Make sharded txsTag gated on TLogVersion::V4.
To allow a potential 6.2 -> 6.1 rollback.
2019-07-16 19:09:53 -07:00
Alex Miller 9396eedd11 Const some random functions that are trivially const.
For code hygiene reasons only.
2019-07-16 19:09:09 -07:00
Evan Tschannen 15e894c724 Merge in master 2019-07-05 15:49:24 -07:00
Alex Miller bf883d7055 Merge remote-tracking branch 'upstream/master' into flowlock-api 2019-06-25 14:26:50 -07:00
Alex Miller 7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Evan Tschannen 1c005d5878
Merge pull request #1584 from alexmiller-apple/spilled-only-peek
Save TLog resources by letting peek request only spilled data.
2019-06-20 18:22:31 -07:00
Alex Miller 26343f557a Update getMore() contract.
MultiCursor already did this.
2019-06-20 17:48:24 -07:00
Evan Tschannen e0be631414 shard the txs tag so that more transaction logs are involved in its recovery 2019-06-19 18:15:09 -07:00
Alex Miller 51fd42a4d2 Merge remote-tracking branch 'upstream/master' into spilled-only-peek 2019-06-18 17:33:52 -07:00
mpilman 8576665a90 Revert "Revert "Make protocol version a type""
This reverts commit 455bf3b3ec.
2019-06-18 14:49:04 -07:00
Alex Miller 455bf3b3ec Revert "Make protocol version a type" 2019-06-18 10:59:17 -07:00
mpilman da53a92bec Make protocol version a type
This fixes #1214

The basic idea is that ProtocolVersion is now its own type. This
alone is an improvement as it makes many things more typesafe. For
each version, we can now add breaking features (for example Fearless).
After that, there's no need to test against actual (confusing) version
numbers. Instead a developer can simply test
`protocolVersion->hasFearless()` and this will return true iff the
protocolVersion is newer than the newest version that didn't support
fearless.
2019-06-16 09:59:15 -07:00
sramamoorthy 3d5998e9dd tlog: when pops are disabled, store them & replay
In TLogs, pops are disabled while taking snapshots. Previously, TLogs
ignored any pop requests received while pops were disabled. With this
change, instead of ignoring such a pop, the TLog remembers the list of
pops in memory and replays them once popping is re-enabled.
2019-05-28 22:07:46 -07:00
sramamoorthy 4bc4c615da exec op to all tlog, restore change in test &other
- exec operation to go to all the TLogs
- minor bug fix in tlog
- restore implementation for the simulator
- restore snap UID to be stored in restartInfo.ini
- test cases added
- indentation and trace file fixes
2019-05-28 22:07:46 -07:00
sramamoorthy 69edefe68b Snapshot based backup and restore implementation 2019-05-28 22:07:46 -07:00
Jingyu Zhou b8e7fc1b84 Refactor: add std:: qualifier and use emplace_back 2019-05-17 09:38:50 -10:00
Alex Miller 4eb4c03ce5 Save TLog resources by letting peek request only spilled data.
If a peek is entirely fulfilled from spilled data, then it's likely that
the next peek will be also.  It is thus wasteful for each of these peeks
to call peekMessagesFromMemory, which memcpy's excessively, and then
throw all that data away without using it.

Now, TLogs will give a hint back to peek cursors about if the provided
reply was served entirely from the spilled data, which peek cursors then
feed back as the hint into their next request.

At some point, a cursor will send a request for only spilled data, get
an incomplete response, and then be told to send its next request as one
that peeks from memory as well, and then it will fully catch up.
2019-05-14 15:38:48 -10:00
Jingyu Zhou 8b5449e608 Fix review comments for PR #1473 2019-04-29 16:45:42 -07:00
Jingyu Zhou 5462f560e7 Add pseudo locality for log routers and tlogs
This changes the logic of pop operations from log routers (LG):
- LG pops tagLocalityLogRouterMapped from TLogs;
- TLog converts tagLocalityLogRouterMapped back to tagLocalityLogRouter before
  popping.

Later when we add more pseudo localities, the same pattern can be used.
2019-04-23 21:35:56 -07:00
Jingyu Zhou 010f825aff Remove pseudoLocalities from LogSet, TLogSet, and CoreTLogSet 2019-04-21 10:41:07 -07:00
Jingyu Zhou 7befce6bf1 More pseudoLocalities and refactors. 2019-04-21 10:41:07 -07:00
Jingyu Zhou 966ec30fcc Add pseudoLocalities for special tag consumers 2019-04-21 10:41:07 -07:00
Jingyu Zhou d19b0cf1c1 Refactor LogSet with two new constructors 2019-04-21 10:41:07 -07:00
Jingyu Zhou 0b1984978a Small code refactoring. 2019-04-21 10:41:07 -07:00
Jingyu Zhou ec1bc5cfca Add LogSystemType enum 2019-04-21 10:41:07 -07:00
Evan Tschannen b6008558d3 renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>()
eliminated an unnecessary copy from the proxy commit path
eliminated an unnecessary copy from buffered peek cursor
2019-03-28 11:52:50 -07:00
Evan Tschannen 5a00f567be fix CheckSatelliteTagLocation 2019-03-20 09:30:11 -07:00
Balachandar Namasivayam f9560e1abd Addressed Review Comments 2019-03-19 15:23:14 -07:00
Balachandar Namasivayam 5471725db5 Support config where the primary and remote DC's can be used as satellites. 2019-03-18 12:17:59 -07:00
Evan Tschannen a2108047aa removed LocalitySetRef and IRepPolicyRef typedefs, because for clarity the Ref suffix is reserved for arena allocated objects instead of reference counted objects. 2019-03-13 13:14:39 -07:00
Evan Tschannen 8afb7fbb9d
Merge pull request #1160 from alexmiller-apple/tstlog-fork
Spill-By-Reference TLog Part 2: New and Old TLogServers co-exist harmoniously
2019-02-26 18:00:04 -08:00
Alex Miller 2dc57568cb Change many things about log_version.
* log_version in the database (`/conf/log_version`) is now a hint that gets
  rounded to the nearest supported version.
* fdbcli and FDB enforce that only a valid log_version can be configured to
* TLogVersion is persisted in CoreTLogSet (and LogSet and TLogSet)
* Some comments here and there
* Add an assert on filename length to make sure KV-pairs in filename
  don't exceed a maximum length.
2019-02-26 16:47:04 -08:00
Evan Tschannen b8910ba7cd Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.h
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Evan Tschannen 0e19b5a935 fix: allow the txnStateStore to be recovered from a process in a down datacenter, so that the cluster controller can know to switch to the other region 2019-02-21 16:52:27 -08:00
mpilman 3f0fd2a20c Use fwd decls in WorkerInterface
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
Evan Tschannen 9cfadad41b fix: if the tagPartitionedLogSystem cannot do a forced recovery, the master should not execute its forced-recovery-based modifications either 2019-02-18 15:13:18 -08:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen e7e1c634e0 fix: we need to restart the peek cursor when the known committed version becomes available 2018-10-02 17:44:14 -07:00
Evan Tschannen 05e7f08b26 added a peek method which will attempt to read the txsTag from the local region as much as possible 2018-09-28 12:21:08 -07:00
Evan Tschannen 30b2f85020 fix: it is not safe to drop logs supporting the current primary datacenter, because configuring usable_regions down will drop the storage servers in the remote region, leaving you with no remaining logs 2018-07-14 16:26:45 -07:00
Evan Tschannen cd63c7a7cc added a buffered cursor, which efficiently merges lots of peek cursors 2018-07-12 12:09:48 -07:00
Evan Tschannen c148c865e3 optimized log peek cursors to use much less CPU when using the policy engine 2018-07-11 15:43:55 -07:00
Evan Tschannen 6cf5354425 checkSatelliteTagLocations is not an error if the same zoneId is used multiple times 2018-07-05 13:00:13 -07:00
Evan Tschannen 7a12d3e130 added the (untested) ability to force a recovery to the remote datacenter, even if that results in data loss. If the DR lag is more than 1 week there could be potential data corruption if any primary storage servers are still alive. 2018-07-01 09:39:04 -04:00
Evan Tschannen fb0d10635d the first location in a satellite team is the one that will serve peek requests. Make sure we properly balance peek traffic by having the first servers on each team be used an equal number of times 2018-06-27 22:14:50 -07:00
Evan Tschannen 00167b0157 renamed some uses of knownCommittedVersion to durableKnownCommittedVersion
epochEnd exclusively refers to the last version a set of logs is responsible for serving peek requests for
recoverAt and recoveredAt refer to the last committed version of the previous generation
2018-06-26 18:20:28 -07:00
Evan Tschannen 68ac3bdc4c log routers now calculate a precise version to pop for their log router tag 2018-06-21 15:29:46 -07:00
Evan Tschannen 127c2ad775 fix: prevent adding the same location multiple times for satellite logs 2018-06-18 15:27:28 -07:00
Evan Tschannen 50e1e03130 fix: for configurations with anti-quorums to work, the push actors need to be put in the proxy’s actor collection 2018-06-18 15:25:54 -07:00
Evan Tschannen f637c680f1 fix: populateSatelliteTagLocations was broken
fix: satellites do not index the upgraded locality
2018-06-17 13:29:17 -07:00
Evan Tschannen 6931a00993 satellite log push locations are static per tag, which will reduce the number of tags each satellite log has to index, and reduce the proxy cpu when calculating push locations 2018-06-16 17:39:02 -07:00
Evan Tschannen f694f7c9ca removed hasBestPolicy 2018-06-15 12:36:19 -07:00
Evan Tschannen 889889323e The master will tell the cluster controller if it is going to take a long time to recruit new logs in its DC; the cluster controller can determine if the other DC would be better and recruit there.
The cluster controller will not switch to the other data center if remote logs are too far behind.
We will not recruit in DCs with negative priority.
2018-06-13 18:14:14 -07:00
Evan Tschannen 372ed67497 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
A.J. Beamon 99c9958db7 Some more trace event normalization 2018-06-08 13:57:00 -07:00
Evan Tschannen c519339adb avoid peeking from logs that do not match the tag’s locality 2018-06-01 18:42:48 -07:00
Evan Tschannen 5143871fed passed debug ids into all versions of peek() to assist debugging 2018-04-30 13:36:35 -07:00
Evan Tschannen 9cdabfed0e added useful trace events 2018-04-29 18:54:47 -07:00
Evan Tschannen dbdeeaa5cf fix: log routers are given all the information they need to add remote tags in their initialization request 2018-04-28 18:04:57 -07:00
Evan Tschannen abcfb0604a fix: cloneNoMore needs to pass useBestSet 2018-04-26 18:32:12 -07:00
Evan Tschannen 7e434348ce fix: storage servers did not properly pull data when configuring from a fearless setup to a non-fearless setup 2018-04-25 18:20:28 -07:00
Evan Tschannen 73597f190e fix: new tlogs are initialized with exactly the tags which existed at the recovery version 2018-04-22 20:28:01 -07:00
Evan Tschannen a6d9e889f0 a cleaner solution to preventing tlogs from peeking log routers 2018-04-20 13:25:22 -07:00
Evan Tschannen f5c3417905 fix: prevent tlogs from peeking the wrong log routers 2018-04-20 00:30:37 -07:00
Evan Tschannen 447c7bd15b fix: log routers use durable known committed version at the time of the pop to determine what is safe to pop from their logs
fix: storage server does not advance its version across large version increase until it has data associated with the version
2018-04-18 12:07:29 -07:00
Evan Tschannen 7af892f50b first working version of non-copying recovery working with fearless configurations 2018-04-08 21:24:05 -07:00
Evan Tschannen 579ba58930 pop old tags only looks at recovered tags, and checks if they are still being used 2018-03-30 19:08:01 -07:00
Evan Tschannen 43cb63df25 fix: the collectTags bool was set incorrectly 2018-03-29 18:19:29 -07:00
Evan Tschannen 1a4ded1c99 support upgrades by merging tags associated with the different peek requests 2018-03-29 17:54:08 -07:00
Evan Tschannen b36e08f08f first version of non-copying recovery. Upgrades are broken, and it has not been tested using fearless configurations yet 2018-03-29 15:12:38 -07:00
Evan Tschannen ccd70fd005 The tlog uses the tags embedded in the message instead of a separate vector of locations
optimized remote tlog committing to avoid re-serializing the message
2018-03-16 16:47:05 -07:00
Evan Tschannen 820382ea68 optimized the log router commit path to avoid re-serializing the data 2018-03-16 11:40:21 -07:00
Evan Tschannen 37a6a81634 Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
#	fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Evan Tschannen cfcf98cffc fix: log router tags were not stored at a best location 2018-02-23 12:26:19 -08:00
Evan Tschannen ddb484143c fix: do not peek from remote logs if they are not fully recovered 2018-02-21 14:06:44 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Evan Tschannen af97a512f5 to support more complicated policies in the future for determining the best location for a tag within a set of tlogs, use an integer instead of a bool 2018-01-29 17:48:18 -08:00
Evan Tschannen 497bc3fe83 fix: txsTag needs to choose the same best location as 5.X version of the software 2018-01-29 17:09:35 -08:00
Evan Tschannen 29c5d4ad3d upgrades from 5.X mostly supported, still some remaining correctness problems 2018-01-28 11:52:54 -08:00
Evan Tschannen 264dc44dfa fixed many more bugs associated with running without remote logs 2018-01-17 17:03:17 -08:00
Evan Tschannen 316e200a0c fix: compilation errors after merge 2018-01-16 10:48:50 -08:00
Evan Tschannen 21482a45e1 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DBCoreState.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Evan Tschannen de119f192d fixed a priority inversion where the tlog would prefer to copy data from the previous generation rather than make data durable (leading to being ratekeeper controlled) 2018-01-11 16:09:49 -08:00
Evan Tschannen 9630deba3a fixed a number of bugs related to running fearless without remote logs 2018-01-08 12:04:19 -08:00
Evan Tschannen 5ac4f73978 Merge branch 'release-5.1' into feature-remote-logs
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/Locality.h
#	fdbrpc/simulator.h
#	fdbserver/ApplyMetadataMutation.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/masterserver.actor.cpp
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Evan Tschannen 76e7988663 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/OldTLogServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/WorkerInterface.h
#	flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00
Evan Tschannen ea26bc1c43 passed first tests which kill entire datacenters
added configuration options for the remote data center and satellite data centers
updated cluster controller recruitment logic
refactors how master writes core state
updated log recovery, and log system peeking
2017-09-07 15:32:08 -07:00
Evan Tschannen 2335fc73f2 fix: peek cursors were being timed out every 10 minutes, instead of 10 minutes after the last use
fix: if an interface is changed while we are not waiting in getMore, we will not reset the sequence to 0.
2017-08-09 15:58:06 -07:00
Evan Tschannen c22708b6d6 added tag localities
fix: remote logs need to stop the master when they are stopped
2017-08-03 16:16:36 -07:00
Evan Tschannen 5852a6301b fixed even more bugs 2017-07-15 15:15:03 -07:00
Evan Tschannen 5ac4de8775 fix: the same tag could be in the server tags list twice 2017-07-13 16:31:55 -07:00
Evan Tschannen 57ba9d36af fixed a large number of bugs 2017-07-13 12:29:21 -07:00
Evan Tschannen 415458deef made LogSet reference counted,
fixed a few bugs
2017-07-11 15:48:10 -07:00
Evan Tschannen 81ae263ad9 implemented setPeekCursor
removed oldTLogServer
first compiling version
2017-07-10 17:41:32 -07:00
Evan Tschannen 979ebcef6c changed to using a vector of logSets instead of a duplicate set of logs for remote servers
finished porting changes to the tlog
everything but peeking is finished in the TagPartitionedLogSystem
2017-07-09 14:46:16 -07:00
Evan Tschannen 0906250e78 merged everything from feature-remote-logs besides the tlog and tagpartitionedlogsystem
re-included tags in messages to the tlog
previously never committed the LogRouter
2017-06-29 15:50:19 -07:00
FDB Dev Team a674cb4ef4 Initial repository commit 2017-05-25 13:48:44 -07:00