265 Commits

Author SHA1 Message Date
Evan Tschannen 80c3f2f8e2 added status fields detailing which processes are degraded, and also the total number of degraded processes 2019-03-10 22:58:15 -07:00
Evan Tschannen 044b6b4f8a Merge branch 'master' into feature-degraded-tlog
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
2019-03-08 22:50:41 -05:00
Evan Tschannen 53f16b5347 when a tlog queue commit takes longer than 5 seconds, its process is marked as degraded 2019-03-08 11:46:34 -05:00
Alex Miller c6a65389ae Remove noexcept macro and replace with BOOST_NOEXCEPT.
BOOST_NOEXCEPT does what the noexcept macro was supposed to do, but in a
way that is correctly maintained over time.
2019-03-05 22:06:12 -08:00
Alex Miller 244903a9de Spill txsTag by value under TagMsg/ and not TagMsgRef/
There's not a tremendous reason as to why this matters now, but I feel
like I might regret sometime later not keeping the same schema under the
same key.
2019-03-04 01:42:39 -08:00
Alex Miller 72c2cf11ab Replace ResourceLimiter with FlowLock. 2019-03-04 01:42:38 -08:00
Alex Miller aff9ebe21a Spill (start,length) instead of (begin,end) to save a few bytes. 2019-03-04 01:42:38 -08:00
Alex Miller 9ef283d4e7 Implement hard limiting of memory used to serve peek requests. 2019-03-04 01:42:38 -08:00
Alex Miller e3506ad9af Add a yield to parseMessagesForTag 2019-03-04 01:42:38 -08:00
Alex Miller 742f6e1847 Solve overreading via pre-calculating tag bytes per commit 2019-03-04 01:42:38 -08:00
Alex Miller e7d8520c63 Batch more when spilling data. 2019-03-04 01:42:38 -08:00
Alex Miller 04e1170c88 Spill txsTag by value 2019-03-04 01:42:38 -08:00
Alex Miller ba31f8f1f9 Remove all code related to writing and cleaning up old-style spilling. 2019-02-26 18:13:49 -08:00
Alex Miller 0539c1df00 Peek from new spilled data and not old spilled data. 2019-02-26 18:13:49 -08:00
Alex Miller d687a4bb85 Implement proper cleanup of disk queue when spilling refs. 2019-02-26 18:00:55 -08:00
Alex Miller 84dc41c206 Add a comment. 2019-02-26 18:00:55 -08:00
Alex Miller cade914645 Temporarily add verification of spilled tag data. 2019-02-26 18:00:55 -08:00
Alex Miller f659825575 Persist start,end of message when spilling tags. 2019-02-26 18:00:55 -08:00
Alex Miller 8d76cbed02 Track both start and end versions in versionLocation. 2019-02-26 18:00:55 -08:00
Evan Tschannen 8afb7fbb9d
Merge pull request #1160 from alexmiller-apple/tstlog-fork
Spill-By-Reference TLog Part 2: New and Old TLogServers co-exist harmoniously
2019-02-26 18:00:04 -08:00
Alex Miller 2dc57568cb Change many things about log_version.
* log_version in the database (`/conf/log_version`) is now a hint that gets
  rounded to the nearest supported version.
* fdbcli and FDB enforce that only a valid log_version can be configured to
* TLogVersion is persisted in CoreTLogSet (and LogSet and TLogSet)
* Some comments here and there
* Add an assert on filename length to make sure KV-pairs in filename
  don't exceed a maximum length.
2019-02-26 16:47:04 -08:00
Evan Tschannen b8910ba7cd Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.h
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Alex Miller 91e05575a2 Rename OldTLogServer -> OldTLogServer_4_6 2019-02-19 22:18:10 -08:00
mpilman 3f0fd2a20c Use fwd decls in WorkerInterface
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman 0bb60e5a3b Use proper fwd decl in NativeAPI
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
Evan Tschannen 065a45e05f Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Evan Tschannen 3247d59498 partially restored an optimization on remote storage servers where a behind storage server will keep less data in memory. This optimization was fully maintained on the primary storage servers, but remote storage servers can only use a version which is known to be durable on all remote transaction logs 2019-02-18 16:47:38 -08:00
Vishesh Yadav e05b53d755 Merge remote-tracking branch 'apple/master' into task/tls-upgrade 2019-02-15 20:37:07 -08:00
Jingyu Zhou 7897616164 Fix wait failure bug on cluster controller
The setDistributor() sets an AsyncVar and then runs waitFailureClient. This
ordering is wrong because the AsyncVar::set triggers the other loop to run
first, which will wait on Never(). The correct code should wait on the Future
returned by the waitFailureClient.
2019-02-14 16:37:16 -08:00
Evan Tschannen 1d7fec3074 Merge commit '048bfc5c368063d9e009513078dab88be0cbd5b0' into task/tls-upgrade-2
# Conflicts:
#	.gitignore
2019-01-24 17:43:06 -08:00
anoyes 6a4d87802b Replace & operator with variadic function 2018-12-28 11:33:42 -08:00
Vishesh Yadav 3eb9b23024 Listen to multiple addresses and start using vector<NetworkAddress> in Endpoint
- This patch makes FDB listen to multiple addresses given via the
  command line. Although we still use the first address in most places,
  this patch starts using vector<NetworkAddress> in Endpoint in some basic
  places.
- When sending packets to an endpoint, pick a random network address from
  the endpoint's addresses
- Renames Endpoint::address to Endpoint::addresses since it
  now holds a vector of addresses.
2018-12-13 13:36:52 -08:00
Vishesh Yadav 43e5a46f9b Change Endpoint::address(NetworkAddress) to vector<NetworkAddress>
Extend `Endpoint` class to take multiple NetworkAddresses instead of
just one. Hence, to talk to an endpoint instead of one IP:PORT, we'll
have multiple IP:PORT pairs.

This patch simply adds the field and makes the changes needed to compile the
codebase. The first element of the `address` field is used everywhere.
Hence the way we talk to an endpoint remains the same with this patch.

NOTE:

Directly accessing the first member of Endpoint::address is unsafe
as Endpoint() doesn't enforce a non-empty address list. However, since
the correctness tests pass for now and we are anyway replacing all those
unsafe accesses with ones that consider the whole vector, this patch
does not attempt to access them in a safe way.
2018-12-13 13:36:52 -08:00
Evan Tschannen 4b5d0b4e2c Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/AsyncFileBlobStore.actor.cpp
#	fdbclient/AsyncFileBlobStore.actor.h
#	fdbclient/BlobStore.actor.cpp
#	fdbclient/BlobStore.h
#	fdbclient/HTTP.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/LoadBalance.actor.h
#	fdbrpc/batcher.actor.h
#	fdbrpc/fdbrpc.vcxproj
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen f045c041eb fix: if a storage server already exists in a remote region after converting to fearless, it did not receive mutations between the known committed version and the recovery version 2018-11-02 14:11:39 -07:00
Evan Tschannen 1b5d28386a fix: the Tlog would not update the durable version properly when version_sizes was empty 2018-11-02 13:05:54 -07:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen db71b60d72
Merge pull request #819 from satherton/feature-redwood
Redwood storage engine, initial/experimental version
2018-10-18 18:38:11 -07:00
Evan Tschannen 0217aed74c Merge branch 'release-6.0'
# Conflicts:
#	bindings/go/README.md
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/MasterProxyServer.actor.cpp
#	versions.target
2018-10-15 18:38:51 -07:00
Evan Tschannen ecddeab2ae fixed review comments; demote killRegionCycle test for now 2018-10-08 10:39:39 -07:00
Stephen Atherton 22f8a4efa9 Normalized all unit test names to begin with "/" if they should be included in random unit testing. 2018-10-05 22:09:58 -07:00
Stephen Atherton 7c1dc305cb Merge commit 'a72c8f5cb2e79a673abc0ed3d27ef1c51028fb13' into feature-redwood 2018-10-05 10:15:10 -07:00
Evan Tschannen 3922e477a5 Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/LogSystemDiskQueueAdapter.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
2018-10-03 16:57:18 -07:00
Evan Tschannen 15ce215c1b fix: parallel peek requests leaked memory 2018-10-02 17:28:39 -07:00
Evan Tschannen a24eadd73a fix: for remote logs, their known committed version cannot be set to 1, because they can be used when their durable version is 0, leading to a known committed version being greater than a queue committed version 2018-09-28 12:17:21 -07:00
A.J. Beamon 92990d6aef Merge release-6.0 into master 2018-09-21 16:14:39 -07:00
Evan Tschannen 77e2fb787e Merge branch 'release-6.0' into feature-fix-forced-recovery 2018-09-21 14:55:37 -07:00
Evan Tschannen 31d0b0315f fix: tlog spill policy would spill everything when it wanted to spill nothing
use a flow lock to protect updatePersistData and initPersistentState from committing simultaneously
2018-09-20 15:33:38 -07:00
Stephen Atherton 2fc86c5ff3 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbrpc/AsyncFileCached.actor.h
#	fdbserver/IKeyValueStore.h
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/workloads/StatusWorkload.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-09-20 03:39:55 -07:00
Evan Tschannen 270b1b24a6 fix: we have to use durableKnownCommittedVersion, because that is the true lower bound on the recovery version of the remote logs
fixed a compiler error
2018-09-18 16:29:03 -07:00
Evan Tschannen 200e65fe61 added a workload which tests killing an entire region, and recovering from the failure with data loss.
fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore
fix: we have to modify all of history, we cannot stop after finding a local remote
2018-09-17 18:32:39 -07:00
Evan Tschannen 4dd2dda0a3 Merge branch 'release-6.0'
# Conflicts:
#	fdbserver/worker.actor.cpp
2018-09-05 16:11:06 -07:00
Evan Tschannen df406a340e
Merge pull request #742 from ajbeamon/roles-in-trace-events
Add the roles running on a process as a field on trace events in the …
2018-09-05 16:08:12 -07:00
Evan Tschannen 90301f497f Merge branch 'release-6.0'
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbrpc/TLSConnection.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/Status.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/StatusWorkload.actor.cpp
#	versions.target
2018-09-05 16:06:33 -07:00
A.J. Beamon 2de0b5d6d7 Add the roles running on a process as a field on trace events in the form of a comma delimited string of role abbreviations. 2018-09-05 15:06:14 -07:00
Evan Tschannen 1022e0a5c6 added yields to the log router and tlogs after processing a version 2018-09-04 17:16:44 -07:00
Evan Tschannen 717c43a69f merge 6.0 into master 2018-08-22 00:28:04 -07:00
Evan Tschannen 84e1f7b2b5 added overhead bytes durable to complement overhead bytes input 2018-08-21 22:35:04 -07:00
Evan Tschannen 74f7412975 added separate logging for overhead bytes 2018-08-21 22:18:38 -07:00
Evan Tschannen d7c01f0419 added a separate knob for tlog’s recoverMemoryLimit 2018-08-21 21:11:23 -07:00
Alex Miller 74a9d2f836 Remove a couple more `Void _ = wait`s that crept in from rebase. 2018-08-14 15:50:26 -07:00
Alex Miller fb31a6999f Rewrite all files to have #include actorcompiler.h as the last include. 2018-08-14 15:50:26 -07:00
Alex Miller 69aa33eed5 Fix a case of a bool being used as an integer. 2018-08-14 15:50:26 -07:00
Alex Miller 535b5701e5 Rewrite all `Void _ = wait(...)` -> `wait(...)`.
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen 7c5d414f7b fix: during destruction logData could attempt to dereference tLogData after it has been deleted 2018-08-09 12:38:35 -07:00
Evan Tschannen c757c68bfa fix: nextVersion needs to be set to logData->version if version_sizes is empty 2018-08-04 23:53:37 -07:00
Evan Tschannen fec285146c significant cpu optimization in update storage 2018-08-04 12:36:48 -07:00
Evan Tschannen be1a4d74c7 tlogs serve reads to log routers at a low priority, to prevent them from using all their resources catching up a remote dc that has been down for a long time
increase the amount of memory ratekeeper budgets for tlogs so that there is a gap after the spill threshold to prevent temporarily overshooting the budget
2018-08-04 10:31:30 -07:00
Evan Tschannen 71f89f372f changed a trace event name to avoid scope type mismatch on the tag field 2018-08-03 15:53:38 -07:00
Evan Tschannen 2619234477 Merge branch 'release-5.2' into release-6.0
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2018-08-03 11:40:24 -07:00
Evan Tschannen 501033c5af fix: tlog spilling on a stopped log was only making one version durable at a time 2018-08-03 11:38:12 -07:00
Evan Tschannen 1c29275672 call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details. 2018-08-01 14:30:57 -07:00
Stephen Atherton 40762d9f9b Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-07-25 17:58:52 -07:00
Evan Tschannen 0f59dc4086 fix: do not write to the persistent queue when we are terminated, which could happen if shutdown was caused by setting a promise in the asyncPullData loop 2018-07-13 17:01:31 -07:00
Evan Tschannen cd63c7a7cc added a buffered cursor, which efficiently merges lots of peek cursors 2018-07-12 12:09:48 -07:00
Stephen Atherton 96389c74cd Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 16:42:34 -07:00
Stephen Atherton 1bc95862b7 Merge branch 'release-6.0' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-07-10 04:16:02 -07:00
Evan Tschannen 6b40f2764d fix: off by one error on popping missing tags 2018-07-09 15:43:22 -07:00
Evan Tschannen 2718176927 fix: remote logs did not pop all of the data for removed logs on recovery because data for the missing tag was not recorded yet at the time of recovery 2018-07-06 16:10:41 -07:00
Evan Tschannen 507b3bacb0 fix: killing all tlogs in one region prevents the remote logs from recovering in that region; do not allow that to prevent us from configuring usable_regions=1.
added more recovery states.
2018-07-05 00:08:51 -07:00
Stephen Atherton b95a2bd6c1 Merge commit 'b17c8359ec22892ed4daeaa569f2f5e105477251' into feature-redwood
# Conflicts:
#	flow/Trace.cpp
2018-06-30 23:18:29 -07:00
Evan Tschannen 2987f85177 fix: known committed version must be updated before creating the tlogQueueEntryRef 2018-06-26 23:21:30 -07:00
Evan Tschannen 6e19622872 fix: durableKnownCommittedVersion was incorrect because we could update knownCommittedVersion before pushing a TLogQueueEntry with that known committed version 2018-06-26 18:02:55 -07:00
Evan Tschannen 2ec8744ab3 fix: parallel get more needs to verify the begin version matches the end of the previous request, because when a peek cursor expires we lose all history, so the same sequence number could start at different versions 2018-06-25 11:15:49 -07:00
Evan Tschannen 68ac3bdc4c log routers now calculate a precise version to pop for their log router tag 2018-06-21 15:29:46 -07:00
Evan Tschannen f755961c42 use parallelGetMore during log recovery 2018-06-20 17:04:06 -07:00
Stephen Atherton e5c48d453a Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-06-18 22:45:27 -07:00
Evan Tschannen df4c445e25 fix: we need to return from commit when stopped 2018-06-18 22:12:46 -07:00
Evan Tschannen 403fb5a2e9 removed unnecessary suppressFor 2018-06-18 17:59:29 -07:00
Evan Tschannen b79feaddd3 added a hard memory limit to the TLog to prevent it from running out of memory. Because remote logs are not ratekeeper controlled this is their only protection 2018-06-18 17:22:40 -07:00
Stephen Atherton 90c8288c68 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-06-17 14:55:05 -07:00
Evan Tschannen 7aef5ec6f1 got rid of persistUnrecoveredBefore
added persistLocality
2018-06-17 14:44:33 -07:00
Evan Tschannen f637c680f1 fix: populateSatelliteTagLocations was broken
fix: satellites do not index the upgraded locality
2018-06-17 13:29:17 -07:00
Evan Tschannen 0d87186821 use a specific locality for satellites 2018-06-15 11:06:38 -07:00
Evan Tschannen feb8578c06 fix: only flush and exit in simulation 2018-06-14 13:48:30 -07:00
Stephen Atherton 1eae9d621b Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2018-06-13 15:58:21 -07:00
Stephen Atherton 2878f30f29 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/IKeyValueStore.h
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/storageserver.actor.cpp
2018-06-13 15:56:06 -07:00
Alex Miller fcfa00928b Make RecoveryState an enum class.
This means that all the == 7 or != 0 checks go away, and explicit names must be used.
2018-06-12 16:50:25 -07:00
Evan Tschannen 372ed67497 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
Evan Tschannen 588eaf4b36 fix: previous delay 0 could still cause us to recruit a tlog before processing disk errors 2018-06-11 11:26:30 -07:00