Evan Tschannen
80c3f2f8e2
added status fields detailing which processes are degraded, and also the total number of degraded processes
2019-03-10 22:58:15 -07:00
Evan Tschannen
044b6b4f8a
Merge branch 'master' into feature-degraded-tlog
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
2019-03-08 22:50:41 -05:00
Evan Tschannen
53f16b5347
when a tlog queue commit takes longer than 5 seconds, its process is marked as degraded
2019-03-08 11:46:34 -05:00
Alex Miller
c6a65389ae
Remove noexcept macro and replace with BOOST_NOEXCEPT.
...
BOOST_NOEXCEPT does what the noexcept macro was supposed to do, but in a
way that is correctly maintained over time.
2019-03-05 22:06:12 -08:00
Alex Miller
244903a9de
Spill txsTag by value under TagMsg/ and not TagMsgRef/
...
There's not a tremendous reason as to why this matters now, but I feel
like I might regret sometime later not keeping the same schema under the
same key.
2019-03-04 01:42:39 -08:00
Alex Miller
72c2cf11ab
Replace ResourceLimiter with FlowLock.
2019-03-04 01:42:38 -08:00
Alex Miller
aff9ebe21a
Spill (start,length) instead of (begin,end) to save a few bytes.
2019-03-04 01:42:38 -08:00
Alex Miller
9ef283d4e7
Implement hard limiting of memory used to serve peek requests.
2019-03-04 01:42:38 -08:00
Alex Miller
e3506ad9af
Add a yield to parseMessagesForTag
2019-03-04 01:42:38 -08:00
Alex Miller
742f6e1847
Solve overreading via pre-calculating tag bytes per commit
2019-03-04 01:42:38 -08:00
Alex Miller
e7d8520c63
Batch more when spilling data.
2019-03-04 01:42:38 -08:00
Alex Miller
04e1170c88
Spill txsTag by value
2019-03-04 01:42:38 -08:00
Alex Miller
ba31f8f1f9
Remove all code related to writing and cleaning up old-style spilling.
2019-02-26 18:13:49 -08:00
Alex Miller
0539c1df00
Peek from new spilled data and not old spilled data.
2019-02-26 18:13:49 -08:00
Alex Miller
d687a4bb85
Implement proper cleanup of disk queue when spilling refs.
2019-02-26 18:00:55 -08:00
Alex Miller
84dc41c206
Add a comment.
2019-02-26 18:00:55 -08:00
Alex Miller
cade914645
Temporarily add verification of spilled tag data.
2019-02-26 18:00:55 -08:00
Alex Miller
f659825575
Persist start,end of message when spilling tags.
2019-02-26 18:00:55 -08:00
Alex Miller
8d76cbed02
Track both start and end versions in versionLocation.
2019-02-26 18:00:55 -08:00
Evan Tschannen
8afb7fbb9d
Merge pull request #1160 from alexmiller-apple/tstlog-fork
...
Spill-By-Reference TLog Part 2: New and Old TLogServers co-exist harmoniously
2019-02-26 18:00:04 -08:00
Alex Miller
2dc57568cb
Change many things about log_version.
...
* log_version in the database (`/conf/log_version`) is now a hint that gets
rounded to the nearest supported version.
* fdbcli and FDB enforce that only a valid log_version can be configured to
* TLogVersion is persisted in CoreTLogSet (and LogSet and TLogSet)
* Some comments here and there
* Add an assert on filename length to make sure KV-pairs in filename
don't exceed a maximum length.
2019-02-26 16:47:04 -08:00
Evan Tschannen
b8910ba7cd
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.h
# fdbserver/DataDistribution.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Alex Miller
91e05575a2
Rename OldTLogServer -> OldTLogServer_4_6
2019-02-19 22:18:10 -08:00
mpilman
3f0fd2a20c
Use fwd decls in WorkerInterface
...
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
0bb60e5a3b
Use proper fwd decl in NativeAPI
...
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
Evan Tschannen
065a45e05f
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Evan Tschannen
3247d59498
partially restored an optimization on remote storage servers where a behind storage server will keep less data in memory. This optimization was fully maintained on the primary storage servers, but remote storage servers can only use a version which is known to be durable on all remote transaction logs
2019-02-18 16:47:38 -08:00
Vishesh Yadav
e05b53d755
Merge remote-tracking branch 'apple/master' into task/tls-upgrade
2019-02-15 20:37:07 -08:00
Jingyu Zhou
7897616164
Fix wait failure bug on cluster controller
...
The setDistributor() sets an AsyncVar and then runs waitFailureClient. This
ordering is wrong because the AsyncVar::set triggers the other loop to run
first, which will wait on Never(). The correct code should wait on the Future
returned by the waitFailureClient.
2019-02-14 16:37:16 -08:00
Evan Tschannen
1d7fec3074
Merge commit '048bfc5c368063d9e009513078dab88be0cbd5b0' into task/tls-upgrade-2
...
# Conflicts:
# .gitignore
2019-01-24 17:43:06 -08:00
anoyes
6a4d87802b
Replace & operator with variadic function
2018-12-28 11:33:42 -08:00
Vishesh Yadav
3eb9b23024
Listen to multiple addresses and start using vector<NetworkAdddress> in Endpoint
...
- This patch will make FDB listen to multiple addresses given via
command line. Although, we'll still use first address in most places,
this patch starts using vector<NetworkAddress> in Endpoint at some basic
places.
- When sending packets to an endpoint, pick a random network address in
endpoints
- Renames Endpoint::address to Endpoint::addresses since it
now holds a vector of addresses.
2018-12-13 13:36:52 -08:00
Vishesh Yadav
43e5a46f9b
Change Endpoint::address(NetworkAddress) to vector<NetworkAddress>
...
Extend `Endpoint` class to take multiple NetworkAddresses instead of
just one. Hence, to talk to an endpoint instead of one IP:PORT, we'll
have multiple IP:PORT pairs.
This patch simply adds the field and makes changes to compile the
codebase. The first element of of `address` field is used everywhere.
Hence the way we talk to remains same with this patch.
NOTE:
Directly accessing the first memeber of Endpoint::address is unsafe
as Endpoint() doesn't enforces non-empty address list. However, since
the correctness test pass for now and are anyway replacing all those
unsafe accesses with ones considering the whole vector, this patch
ignores to access them in safe way.
2018-12-13 13:36:52 -08:00
Evan Tschannen
4b5d0b4e2c
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/AsyncFileBlobStore.actor.cpp
# fdbclient/AsyncFileBlobStore.actor.h
# fdbclient/BlobStore.actor.cpp
# fdbclient/BlobStore.h
# fdbclient/HTTP.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/LoadBalance.actor.h
# fdbrpc/batcher.actor.h
# fdbrpc/fdbrpc.vcxproj
# fdbrpc/sim2.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen
f045c041eb
fix: if a storage server already exists in a remote region after converting to fearless, it did not receive mutations between the known committed version and the recovery version
2018-11-02 14:11:39 -07:00
Evan Tschannen
1b5d28386a
fix: the Tlog would not update the durable version properly when version_sizes was empty
2018-11-02 13:05:54 -07:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen
db71b60d72
Merge pull request #819 from satherton/feature-redwood
...
Redwood storage engine, initial/experimental version
2018-10-18 18:38:11 -07:00
Evan Tschannen
0217aed74c
Merge branch 'release-6.0'
...
# Conflicts:
# bindings/go/README.md
# documentation/sphinx/source/release-notes.rst
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2018-10-15 18:38:51 -07:00
Evan Tschannen
ecddeab2ae
fixed review comments; demote killRegionCycle test for now
2018-10-08 10:39:39 -07:00
Stephen Atherton
22f8a4efa9
Normalized all unit test names to begin with "/" if they should be included in random unit testing.
2018-10-05 22:09:58 -07:00
Stephen Atherton
7c1dc305cb
Merge commit 'a72c8f5cb2e79a673abc0ed3d27ef1c51028fb13' into feature-redwood
2018-10-05 10:15:10 -07:00
Evan Tschannen
3922e477a5
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/LogSystemDiskQueueAdapter.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
2018-10-03 16:57:18 -07:00
Evan Tschannen
15ce215c1b
fix: parallel peek requests leaked memory
2018-10-02 17:28:39 -07:00
Evan Tschannen
a24eadd73a
fix: for remote logs, their known committed version cannot be set to 1, because they can be used when their durable version is 0, leading to a known committed version being greater than a queue committed version
2018-09-28 12:17:21 -07:00
A.J. Beamon
92990d6aef
Merge release-6.0 into master
2018-09-21 16:14:39 -07:00
Evan Tschannen
77e2fb787e
Merge branch 'release-6.0' into feature-fix-forced-recovery
2018-09-21 14:55:37 -07:00
Evan Tschannen
31d0b0315f
fix: tlog spill policy would spill everything when it wanted to spill nothing
...
use a flow lock to protect updatePersistData and initPersistentState from committing simultaneously
2018-09-20 15:33:38 -07:00
Stephen Atherton
2fc86c5ff3
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbrpc/AsyncFileCached.actor.h
# fdbserver/IKeyValueStore.h
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/workloads/StatusWorkload.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-09-20 03:39:55 -07:00
Evan Tschannen
270b1b24a6
fix: we have to use durableKnownCommittedVersion, because the is the true lower bound on the recovery version of the remote logs
...
fixed a compiler error
2018-09-18 16:29:03 -07:00
Evan Tschannen
200e65fe61
added a workload which tests killing an entire region, and recovering from the failure with data loss.
...
fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore
fix: we have to modify all of history, we cannot stop after finding a local remote
2018-09-17 18:32:39 -07:00
Evan Tschannen
4dd2dda0a3
Merge branch 'release-6.0'
...
# Conflicts:
# fdbserver/worker.actor.cpp
2018-09-05 16:11:06 -07:00
Evan Tschannen
df406a340e
Merge pull request #742 from ajbeamon/roles-in-trace-events
...
Add the roles running on a process as a field on trace events in the …
2018-09-05 16:08:12 -07:00
Evan Tschannen
90301f497f
Merge branch 'release-6.0'
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbrpc/TLSConnection.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/Status.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/StatusWorkload.actor.cpp
# versions.target
2018-09-05 16:06:33 -07:00
A.J. Beamon
2de0b5d6d7
Add the roles running on a process as a field on trace events in the form of a comma delimited string of role abbreviations.
2018-09-05 15:06:14 -07:00
Evan Tschannen
1022e0a5c6
added yields to the log router and tlogs after processing a version
2018-09-04 17:16:44 -07:00
Evan Tschannen
717c43a69f
merge 6.0 into master
2018-08-22 00:28:04 -07:00
Evan Tschannen
84e1f7b2b5
added overhead bytes durable to complement overhead bytes input
2018-08-21 22:35:04 -07:00
Evan Tschannen
74f7412975
added separate logging for overhead bytes
2018-08-21 22:18:38 -07:00
Evan Tschannen
d7c01f0419
added a separate knob for tlog’s recoverMemoryLimit
2018-08-21 21:11:23 -07:00
Alex Miller
74a9d2f836
Remove a couple more `Void _ = wait`s that crept in from rebase.
2018-08-14 15:50:26 -07:00
Alex Miller
fb31a6999f
Rewrite all files to have #include actorcompiler.h as the last include.
2018-08-14 15:50:26 -07:00
Alex Miller
69aa33eed5
Fix a case of a bool being used as an integer.
2018-08-14 15:50:26 -07:00
Alex Miller
535b5701e5
Rewrite all `Void _ = wait(...)` -> `wait(...)`.
...
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen
7c5d414f7b
fix: during destruction logData could attempt to dereference tLogData after it has been deleted
2018-08-09 12:38:35 -07:00
Evan Tschannen
c757c68bfa
fix: nextVersion needs to be set to logData->version if version_sizes is empty
2018-08-04 23:53:37 -07:00
Evan Tschannen
fec285146c
significant cpu optimization in update storage
2018-08-04 12:36:48 -07:00
Evan Tschannen
be1a4d74c7
tlogs serve reads to log routers at a low priority, to prevent them from using all their resources catching up a remote dc that has been down for a long time
...
increase the amount of memory ratekeeper budgets for tlogs so that there is a gap after the spill threshold to prevent temporarily overshooting the budget
2018-08-04 10:31:30 -07:00
Evan Tschannen
71f89f372f
changed a trace event name to avoid scope type mismatch on the tag field
2018-08-03 15:53:38 -07:00
Evan Tschannen
2619234477
Merge branch 'release-5.2' into release-6.0
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2018-08-03 11:40:24 -07:00
Evan Tschannen
501033c5af
fix: tlog spilling on a stopped log was only making one version durable at a time
2018-08-03 11:38:12 -07:00
Evan Tschannen
1c29275672
call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.
2018-08-01 14:30:57 -07:00
Stephen Atherton
40762d9f9b
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-07-25 17:58:52 -07:00
Evan Tschannen
0f59dc4086
fix: do not write to the persistent queue when we are terminated, which could happen if shutdown was caused by setting a promise in the asyncPullData loop
2018-07-13 17:01:31 -07:00
Evan Tschannen
cd63c7a7cc
added a buffered cursor, which efficiently merges lots of peek cursors
2018-07-12 12:09:48 -07:00
Stephen Atherton
96389c74cd
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 16:42:34 -07:00
Stephen Atherton
1bc95862b7
Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 04:16:02 -07:00
Evan Tschannen
6b40f2764d
fix: off by one error on popping missing tags
2018-07-09 15:43:22 -07:00
Evan Tschannen
2718176927
fix: remote logs did not pop all of the data for removed logs on recovery because data for the missing tag was not recorded yet at the time of recovery
2018-07-06 16:10:41 -07:00
Evan Tschannen
507b3bacb0
fix: kill all tlogs in one region prevents the remote logs from recovering in that region, do not allow that to prevent us from configuring usable_regions=1.
...
added more recovery states.
2018-07-05 00:08:51 -07:00
Stephen Atherton
b95a2bd6c1
Merge commit 'b17c8359ec22892ed4daeaa569f2f5e105477251' into feature-redwood
...
# Conflicts:
# flow/Trace.cpp
2018-06-30 23:18:29 -07:00
Evan Tschannen
2987f85177
fix: known committed version must be updated before creating the tlogQueueEntryRef
2018-06-26 23:21:30 -07:00
Evan Tschannen
6e19622872
fix: durableKnownCommittedVersion was incorrect because we could update knownCommittedVersion before pushing a TLogQueueEntry with that known committed version
2018-06-26 18:02:55 -07:00
Evan Tschannen
2ec8744ab3
fix: parallel get more needs to verify the begin version matches the end of the previous request, because when a peek cursor expires we lose all history, so the same sequence number could start at different versions
2018-06-25 11:15:49 -07:00
Evan Tschannen
68ac3bdc4c
log routers now calculate a precise version to pop for their log router tag
2018-06-21 15:29:46 -07:00
Evan Tschannen
f755961c42
use parallelGetMore during log recovery
2018-06-20 17:04:06 -07:00
Stephen Atherton
e5c48d453a
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-18 22:45:27 -07:00
Evan Tschannen
df4c445e25
fix: we need to return from commit when stopped
2018-06-18 22:12:46 -07:00
Evan Tschannen
403fb5a2e9
removed unnecessary suppressFor
2018-06-18 17:59:29 -07:00
Evan Tschannen
b79feaddd3
added a hard memory limit to the TLog to prevent it from running out of memory. Because remote logs are not ratekeeper controlled this is their only protection
2018-06-18 17:22:40 -07:00
Stephen Atherton
90c8288c68
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-17 14:55:05 -07:00
Evan Tschannen
7aef5ec6f1
got rid of persistUnrecoveredBefore
...
added persistLocality
2018-06-17 14:44:33 -07:00
Evan Tschannen
f637c680f1
fix: populateSatelliteTagLocations was broken
...
fix: satellites do not index the upgraded locality
2018-06-17 13:29:17 -07:00
Evan Tschannen
0d87186821
use a specific locality for satellites
2018-06-15 11:06:38 -07:00
Evan Tschannen
feb8578c06
fix: only flush and exit in simulation
2018-06-14 13:48:30 -07:00
Stephen Atherton
1eae9d621b
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-13 15:58:21 -07:00
Stephen Atherton
2878f30f29
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbserver/IKeyValueStore.h
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/storageserver.actor.cpp
2018-06-13 15:56:06 -07:00
Alex Miller
fcfa00928b
Make RecoveryState an enum class.
...
This means that all the == 7 or != 0 checks go away, and explicit names must be used.
2018-06-12 16:50:25 -07:00
Evan Tschannen
372ed67497
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
Evan Tschannen
588eaf4b36
fix: previous delay 0 could still cause us to recruit a tlog before processing disk errors
2018-06-11 11:26:30 -07:00