Evan Tschannen
9e3ec2cb33
fix: when resetting the peekCursor, we cannot discard the popped data if the adapter has already processed data
2019-07-30 13:25:25 -07:00
Evan Tschannen
1d326e3dc8
removed debugging message
2019-07-30 12:42:50 -07:00
Evan Tschannen
5d79e4141f
fix: buffered cursor messageVersion should be set to the version we will be at after exhausting everything in messages
2019-07-30 12:38:44 -07:00
Evan Tschannen
45f7b41b48
fix: multi-cursor could discard popped commits after already returning data
2019-07-29 21:36:42 -07:00
Evan Tschannen
5bb322b483
implement popped on bufferedCursor
2019-07-29 21:19:47 -07:00
Evan Tschannen
28df2c35bb
Merge pull request #1855 from alexmiller-apple/sharded-txs-safe-upgrade
...
Make sharded txsTag upgradeable and downgradeable
2019-07-26 13:29:39 -07:00
sramamoorthy
9afd162e2f
remove snap v1 related code
2019-07-25 17:29:31 -07:00
Alex Miller
95487861be
Make sharded txsTag gated on TLogVersion::V4.
...
To allow a potential 6.2 -> 6.1 rollback.
2019-07-16 19:09:53 -07:00
Alex Miller
9396eedd11
Const some random functions that are trivially const.
...
For code hygiene reasons only.
2019-07-16 19:09:09 -07:00
Evan Tschannen
15e894c724
Merge in master
2019-07-05 15:49:24 -07:00
Alex Miller
bf883d7055
Merge remote-tracking branch 'upstream/master' into flowlock-api
2019-06-25 14:26:50 -07:00
Alex Miller
7a500cd37f
A giant translation of TaskFooPriority -> TaskPriority::Foo
...
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Evan Tschannen
1c005d5878
Merge pull request #1584 from alexmiller-apple/spilled-only-peek
...
Save TLog resources by letting peek request only spilled data.
2019-06-20 18:22:31 -07:00
Alex Miller
26343f557a
Update getMore() contract.
...
MultiCursor already did this.
2019-06-20 17:48:24 -07:00
Evan Tschannen
e0be631414
shard the txs tag so that more transaction logs are involved in its recovery
2019-06-19 18:15:09 -07:00
Alex Miller
51fd42a4d2
Merge remote-tracking branch 'upstream/master' into spilled-only-peek
2019-06-18 17:33:52 -07:00
mpilman
8576665a90
Revert "Revert "Make protocol version a type""
...
This reverts commit 455bf3b3ec
.
2019-06-18 14:49:04 -07:00
Alex Miller
455bf3b3ec
Revert "Make protocol version a type"
2019-06-18 10:59:17 -07:00
mpilman
da53a92bec
Make protocol version a type
...
This fixes #1214
The basic idea is that ProtocolVersion is now its own type. This
alone is an improvement as it makes many things more typesafe. For
each version, we can now add breaking features (for example Fearless).
After that, there's no need to test against actual (confusing) version
numbers. Instead a developer can simply test
`protocolVersion->hasFearless()` and this will return true iff the
protocolVersion is newer than the newest version that didn't support
fearless.
2019-06-16 09:59:15 -07:00
sramamoorthy
3d5998e9dd
tlog: when pops are disabled, store them & replay
...
In Tlogs, disable pop is done whlie taking snapshots. Earlier, tlogs
were ignoring the pops if it got pop requests when pops were
disabled. In this change, instead of ignoring the pop - it remembers
the list of pops in-memory and plays them once the popping is
enabled.
2019-05-28 22:07:46 -07:00
sramamoorthy
4bc4c615da
exec op to all tlog, restore change in test &other
...
- exec operation to go to all the TLogs
- minor bug fix in tlog
- restore implementation for the simulator
- restore snap UID to be stored in restartInfo.ini
- test cases added
- indentation and trace file fixes
2019-05-28 22:07:46 -07:00
sramamoorthy
69edefe68b
Snapshot based backup and resotre implementation
2019-05-28 22:07:46 -07:00
Jingyu Zhou
b8e7fc1b84
Refactor: add std:: qualifier and use emplace_back
2019-05-17 09:38:50 -10:00
Alex Miller
4eb4c03ce5
Save TLog resources by letting peek request only spilled data.
...
If a peek is entirely fulfilled from spilled data, then it's likely that
the next peek will be also. It is thus wasteful for each of these peeks
to call peekMessagesFromMemory, which memcpy's excessively, and then
throw all that data away without using it.
Now, TLogs will give a hint back to peek cursors about if the provided
reply was served entirely from the spilled data, which peek curors then
feed back as the hint into their next request.
At some point, a cursor will send a request for only spilled data, get
an incomplete response, and then be told to send its next request as one
that peeks from memory as well, and then it will fully catch up.
2019-05-14 15:38:48 -10:00
Jingyu Zhou
8b5449e608
Fix review comments for PR #1473
2019-04-29 16:45:42 -07:00
Jingyu Zhou
5462f560e7
Add pseudo locality for log routers and tlogs
...
This changes the logic of pop operations from log routers (LG):
- LG pops tagLocalityLogRouterMapped from TLogs;
- TLog converts tagLocalityLogRouterMapped back to tagLocalityLogRouter before
popping.
Later when we add more psuedo localities, the same pattern can be used.
2019-04-23 21:35:56 -07:00
Jingyu Zhou
010f825aff
Remove pseudoLocalities from LogSet, TLogSet, and CoreTLogSet
2019-04-21 10:41:07 -07:00
Jingyu Zhou
7befce6bf1
More pseudoLocalities and refactors.
2019-04-21 10:41:07 -07:00
Jingyu Zhou
966ec30fcc
Add pseudoLocalities for special tag consumers
2019-04-21 10:41:07 -07:00
Jingyu Zhou
d19b0cf1c1
Refactor LogSet with two new constructors
2019-04-21 10:41:07 -07:00
Jingyu Zhou
0b1984978a
Small code refactoring.
2019-04-21 10:41:07 -07:00
Jingyu Zhou
ec1bc5cfca
Add LogSystemType enum
2019-04-21 10:41:07 -07:00
Evan Tschannen
b6008558d3
renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>()
...
eliminated an unnecessary copy from the proxy commit path
eliminated an unnecessary copy from buffered peek cursor
2019-03-28 11:52:50 -07:00
Evan Tschannen
5a00f567be
fix CheckSatelliteTagLocation
2019-03-20 09:30:11 -07:00
Balachandar Namasivayam
f9560e1abd
Addressed Review Comments
2019-03-19 15:23:14 -07:00
Balachandar Namasivayam
5471725db5
Support config where the primary and remote DC's can be used as satellites.
2019-03-18 12:17:59 -07:00
Evan Tschannen
a2108047aa
removed LocalitySetRef and IRepPolicyRef typedefs, because for clarity the Ref suffix is reserved for arena allocated objects instead of reference counted objects.
2019-03-13 13:14:39 -07:00
Evan Tschannen
8afb7fbb9d
Merge pull request #1160 from alexmiller-apple/tstlog-fork
...
Spill-By-Reference TLog Part 2: New and Old TLogServers co-exist harmoniously
2019-02-26 18:00:04 -08:00
Alex Miller
2dc57568cb
Change many things about log_version.
...
* log_version in the database (`/conf/log_version`) is now a hint that gets
rounded to the nearest supported version.
* fdbcli and FDB enforce that only a valid log_version can be configured to
* TLogVersion is persisted in CoreTLogSet (and LogSet and TLogSet)
* Some comments here and there
* Add an assert on filename length to make sure KV-pairs in filename
don't exceed a maximum length.
2019-02-26 16:47:04 -08:00
Evan Tschannen
b8910ba7cd
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.h
# fdbserver/DataDistribution.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Evan Tschannen
0e19b5a935
fix: allow the txnStateStore to be recovered from a process in a down datacenter, so that the cluster controller can know to switch to the other region
2019-02-21 16:52:27 -08:00
mpilman
3f0fd2a20c
Use fwd decls in WorkerInterface
...
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
Evan Tschannen
9cfadad41b
fix: if the tagPartitionedLogSystem cannot do a forced recovery, the master should not execute it forced recovery based modifications either
2019-02-18 15:13:18 -08:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen
e7e1c634e0
fix: we need to restart the peek cursor when the known committed version becomes available
2018-10-02 17:44:14 -07:00
Evan Tschannen
05e7f08b26
added a peek method which will attempt to read the txsTag from the local region as much as possible
2018-09-28 12:21:08 -07:00
Evan Tschannen
30b2f85020
fix: it is not safe to drop logs supporting the current primary datacenter, because configuring usable_regions down will drop the storage servers in the remote region, leaving you will no remaining logs
2018-07-14 16:26:45 -07:00
Evan Tschannen
cd63c7a7cc
added a buffered cursor, which efficiently merges lots of peek cursors
2018-07-12 12:09:48 -07:00
Evan Tschannen
c148c865e3
optimized log peek cursors to use much less CPU when using the policy engine
2018-07-11 15:43:55 -07:00
Evan Tschannen
6cf5354425
checkSatelliteTagLocations is not an error if the same zoneId is used multiple times
2018-07-05 13:00:13 -07:00
Evan Tschannen
7a12d3e130
added the (untested) ability to force a recovery to the remote datacenter, even if that results in data loss. If the DR lag is more than 1 week there could be potential data corruption if any primary storage servers are still alive.
2018-07-01 09:39:04 -04:00
Evan Tschannen
fb0d10635d
the first location in a satellite team is the one that will serve peek requests. Make sure we probably balance peek traffic by having the first servers on each team be used an equal amount of times
2018-06-27 22:14:50 -07:00
Evan Tschannen
00167b0157
renamed some uses of knownCommittedVersion to durableKnownCommittedVersion
...
epochEnd exclusively refers to the last version a set of logs is responsible for serving peek requests for
recoverAt and recoveredAt refer to the last committed version of the previous generation
2018-06-26 18:20:28 -07:00
Evan Tschannen
68ac3bdc4c
log routers now calculate a precise version to pop for their log router tag
2018-06-21 15:29:46 -07:00
Evan Tschannen
127c2ad775
fix: prevent adding the same location multiple times for satellite logs
2018-06-18 15:27:28 -07:00
Evan Tschannen
50e1e03130
fix: for configurations with anti-quorums to work, the push actors need to be put in the proxy’s actor collection
2018-06-18 15:25:54 -07:00
Evan Tschannen
f637c680f1
fix: populateSatelliteTagLocations was broken
...
fix: satellites do not index the upgraded locality
2018-06-17 13:29:17 -07:00
Evan Tschannen
6931a00993
satellite log push locations are static per tag, which will reduce the number of tags each satellite log has to index, and reduce the proxy cpu when calculating push locations
2018-06-16 17:39:02 -07:00
Evan Tschannen
f694f7c9ca
removed hasBestPolicy
2018-06-15 12:36:19 -07:00
Evan Tschannen
889889323e
The master will tell the cluster controller if it is going to take a long time to recruit new logs in its DC; the cluster controller can determine if the other DC would be better and recruit there.
...
The cluster controller will not switch to the other data center if remote logs are too far behind.
We will not recruit in DCs with negative priority.
2018-06-13 18:14:14 -07:00
Evan Tschannen
372ed67497
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
A.J. Beamon
99c9958db7
Some more trace event normalization
2018-06-08 13:57:00 -07:00
Evan Tschannen
c519339adb
avoid peeking from logs that do not match the tag’s locality
2018-06-01 18:42:48 -07:00
Evan Tschannen
5143871fed
passed debug ids into all versions of peek() to assist debugging
2018-04-30 13:36:35 -07:00
Evan Tschannen
9cdabfed0e
added useful trace events
2018-04-29 18:54:47 -07:00
Evan Tschannen
dbdeeaa5cf
fix: log routers are given all the information they need to add remote tags in their initialization request
2018-04-28 18:04:57 -07:00
Evan Tschannen
abcfb0604a
fix: cloneNoMore needs to pass useBestSet
2018-04-26 18:32:12 -07:00
Evan Tschannen
7e434348ce
fix: storage servers did not properly pull data when configuring from a fearless setup to a non-fearless setup
2018-04-25 18:20:28 -07:00
Evan Tschannen
73597f190e
fix: new tlogs are initialized with exactly the tags which existed at the recovery version
2018-04-22 20:28:01 -07:00
Evan Tschannen
a6d9e889f0
a cleaner solution to preventing tlogs from peeking log routers
2018-04-20 13:25:22 -07:00
Evan Tschannen
f5c3417905
fix: prevent tlogs from peeking the wrong log routers
2018-04-20 00:30:37 -07:00
Evan Tschannen
447c7bd15b
fix: log routers use durable known committed version at the time of the pop to determine what is safe to pop from their logs
...
fix: storage server does not advance its version across large version increase until it has data associated with the version
2018-04-18 12:07:29 -07:00
Evan Tschannen
7af892f50b
first working version of non-copying recovery working with fearless configurations
2018-04-08 21:24:05 -07:00
Evan Tschannen
579ba58930
pop old tags only looks are recovered tags, and checks if they are still being used
2018-03-30 19:08:01 -07:00
Evan Tschannen
43cb63df25
fix: the collectTags bool was set incorrectly
2018-03-29 18:19:29 -07:00
Evan Tschannen
1a4ded1c99
support upgrades by merging tags associated with the different peek requests
2018-03-29 17:54:08 -07:00
Evan Tschannen
b36e08f08f
first version of non-copying recovery. Upgrades are broken, and it has not been tested using fearless configurations yet
2018-03-29 15:12:38 -07:00
Evan Tschannen
ccd70fd005
The tlog uses the tags embedded in the message instead of a separate vector of locations
...
optimized remote tlog committing to avoid re-serializing the message
2018-03-16 16:47:05 -07:00
Evan Tschannen
820382ea68
optimized the log router commit path to avoid re-serializing the data
2018-03-16 11:40:21 -07:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Evan Tschannen
cfcf98cffc
fix: log router tags were not stored at a best location
2018-02-23 12:26:19 -08:00
Evan Tschannen
ddb484143c
fix: do not peek from remote logs if they are not fully recovered
2018-02-21 14:06:44 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Evan Tschannen
af97a512f5
to support more complicated policies in the future for determining the best location for a tag within a set of tlogs, use an integer instead of a bool
2018-01-29 17:48:18 -08:00
Evan Tschannen
497bc3fe83
fix: txsTag needs to choose the same best location as 5.X version of the software
2018-01-29 17:09:35 -08:00
Evan Tschannen
29c5d4ad3d
upgrades from 5.X mostly supported, still some remaining correctness problems
2018-01-28 11:52:54 -08:00
Evan Tschannen
264dc44dfa
fixed many more bugs associated with running without remote logs
2018-01-17 17:03:17 -08:00
Evan Tschannen
316e200a0c
fix: compilation errors after merge
2018-01-16 10:48:50 -08:00
Evan Tschannen
21482a45e1
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DBCoreState.h
# fdbserver/LogSystem.h
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Evan Tschannen
de119f192d
fixed a priority inversion where the tlog would prefer to copy data from the previous generation rather than make data durable (leading to being ratekeeper controlled)
2018-01-11 16:09:49 -08:00
Evan Tschannen
9630deba3a
fixed a number of bugs related to running fearless without remote logs
2018-01-08 12:04:19 -08:00
Evan Tschannen
5ac4f73978
Merge branch 'release-5.1' into feature-remote-logs
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/Locality.h
# fdbrpc/simulator.h
# fdbserver/ApplyMetadataMutation.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/masterserver.actor.cpp
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Evan Tschannen
76e7988663
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/OldTLogServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/WorkerInterface.h
# flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00
Evan Tschannen
ea26bc1c43
passed first tests which kill entire datacenters
...
added configuration options for the remote data center and satellite data centers
updated cluster controller recruitment logic
refactors how master writes core state
updated log recovery, and log system peeking
2017-09-07 15:32:08 -07:00
Evan Tschannen
2335fc73f2
fix: peek cursors were being timed out every 10 minutes, instead of 10 minutes after the last use
...
fix: if an interface is changed while we are not waiting in getMore, we will not reset the sequence to 0.
2017-08-09 15:58:06 -07:00
Evan Tschannen
c22708b6d6
added tag localities
...
fix: remote logs need to stop the master when they are stopped
2017-08-03 16:16:36 -07:00
Evan Tschannen
5852a6301b
fixed even more bugs
2017-07-15 15:15:03 -07:00
Evan Tschannen
5ac4de8775
fix: the same tag could be in the server tags list twice
2017-07-13 16:31:55 -07:00
Evan Tschannen
57ba9d36af
fixed a large number of bugs
2017-07-13 12:29:21 -07:00
Evan Tschannen
415458deef
made LogSet reference counted,
...
fixed a few bugs
2017-07-11 15:48:10 -07:00
Evan Tschannen
81ae263ad9
implemented setPeekCursor
...
removed oldTLogServer
first compiling version
2017-07-10 17:41:32 -07:00
Evan Tschannen
979ebcef6c
changed to using a vector of logSets instead of a duplicate set of logs for remote servers
...
finished porting changes to the tlog
everything but peeking is finished in the TagPartitionedLogSystem
2017-07-09 14:46:16 -07:00
Evan Tschannen
0906250e78
merged everything from feature-remote-logs besides the tlog and tagpartitionedlogsystem
...
re-included tags in messages to the tlog
previously never committed the LogRouter
2017-06-29 15:50:19 -07:00
FDB Dev Team
a674cb4ef4
Initial repository commit
2017-05-25 13:48:44 -07:00