Merge branch 'master' into transaction-tagging

2020-05-07 14:52:22 -07:00 · 2020-05-07 14:52:22 -07:00 · aed97a9f20
parent 92730b2f05 ff992060cd
commit aed97a9f20
14 changed files with 340 additions and 85 deletions
--- a/documentation/sphinx/source/api-python.rst
+++ b/documentation/sphinx/source/api-python.rst
@ -837,6 +837,8 @@ Transaction options

    .. warning:: |option-priority-system-immediate-warning|

+.. _api-python-option-set-causal-read-risky:
+
 .. method:: Transaction.options.set_causal_read_risky

    |option-causal-read-risky-blurb|
--- a/documentation/sphinx/source/developer-guide.rst
+++ b/documentation/sphinx/source/developer-guide.rst
@ -371,7 +371,7 @@ An additional important property, though technically not part of ACID, is also g

 FoundationDB implements these properties using multiversion concurrency control (MVCC) for reads and optimistic concurrency for writes. As a result, neither reads nor writes are blocked by other readers or writers. Instead, conflicting transactions will fail at commit time and will usually be retried by the client.

-In particular, the reads in a transaction take place from an instantaneous snapshot of the database. From the perspective of the transaction this snapshot is not modified by the writes of other, concurrent transactions. When the transaction is ready to be committed, the FoundationDB cluster checks that it does not conflict with any previously committed transaction (i.e. that no value read by a transaction has been modified by another transaction since the read occurred) and, if it does conflict, rejects it. Rejected conflicting transactions are usually retried by the client. Accepted transactions are written to disk on multiple cluster nodes and then reported accepted to the client.
+In particular, the reads in a transaction take place from an instantaneous snapshot of the database. From the perspective of the transaction this snapshot is not modified by the writes of other, concurrent transactions. When the read-write transaction is ready to be committed (read-only transactions don't get committed and therefore never conflict), the FoundationDB cluster checks that it does not conflict with any previously committed transaction (i.e. that no value read by a transaction has been modified by another transaction since the read occurred) and, if it does conflict, rejects it. Rejected conflicting transactions are usually retried by the client. Accepted transactions are written to disk on multiple cluster nodes and then reported accepted to the client.

 * For more background on transactions, see Wikipedia articles for `Database transaction <http://en.wikipedia.org/wiki/Database_transaction>`_, `Atomicity (database systems) <http://en.wikipedia.org/wiki/Atomicity_(database_systems)>`_, and `Concurrency Control <http://en.wikipedia.org/wiki/Concurrency_control>`_.

@ -823,3 +823,136 @@ Loading data is a common task in any database. Loading data in FoundationDB will
 * Use multiple processes loading in parallel if a single one is CPU-bound.

 Using these techniques, our cluster of 24 nodes and 48 SSDs loads about 3 billion (100 byte) key-value pairs per hour.
+
+Implementation Details
+======================
+
+These following sections go into some of the gritty details of FoundationDB. Most users don't need to read or understand this in order to use FoundationDB efficiently.
+
+How FoundationDB Detects Conflicts
+----------------------------------
+
+As written above, FoundationDB implements serializable transactions with external consistency. The underlying algorithm uses multi-version concurrency control. At commit time, each transaction is checked for read-write conflicts.
+
+Conceptually this algorithm is quite simple. Each transaction will get a read version assigned when it issues the first read or before it tries to commit. All reads that happen during that transaction will be read as of that version. Writes will go into a local cache and will be sent to FoundationDB during commit time. The transaction can successfully commit if it is conflict free; it will then get a commit-version assigned. A transaction is conflict free if and only if there have been no writes to any key that was read by that transaction between the time the transaction started and the commit time. This is true if there was no transaction with a commit version larger than our read version but smaller than our commit version that wrote to any of the keys that we read.
+
+This form of conflict detection, while simple, can often be confusing for people who are familiar with databases that check for write-write conflicts.
+
+Some interesting properties of FoundationDB transactions are:
+
+* FoundationDB transactions are optimistic: we never block on reads or writes (there are no locks), instead we abort transactions at commit time.
+* Read-only transactions will never conflict and never cause conflicts with other transactions.
+* Write-only transactions will never conflict but might cause future transactions to conflict.
+* For read-write transactions: A read will never cause any other transaction to be aborted - but reading a key might result in the current transaction being aborted at commit time. A write will never cause a conflict in the current transaction but might cause conflicts in transactions that try to commit in the future.
+* FoundationDB only uses the read conflict set and the write conflict set to resolve transactions. A user can read from and write to FoundationDB without adding entries to these sets. If not done carefully, this can cause non-serializable executions (see :ref:`Snapshot Reads <api-python-snapshot-reads>` and the :ref:`no-write-conflict-range option <api-python-no-write-conflict-range>` option).
+
+How Versions are Generated and Assigned
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Versions are generated by the process that runs the *master* role. FoundationDB guarantees that no version will be generated twice and that the versions are monotonically increasing.
+
+In order to assign read and commit versions to transactions, a client will never talk to the master. Instead it will get both from a proxy. Getting a read version is more complex than a commit version. Let's first look at commit versions:
+
+1. The client will send a commit message to a proxy.
+1. The proxy will put this commit message in a queue in order to build a batch.
+1. In parallel, the proxy will ask for a new version from the master (note that this means that only proxies will ever ask for new versions - which scales much better as it puts less stress on the network).
+1. The proxy will then resolve all transactions within that batch (discussed later) and assign the version it got from the master to *all* transactions within that batch. It will then write the transactions to the transaction log system to make it durable.
+1. If the transaction succeeded, it will send back the version as commit version to the client. Otherwise it will send back an error.
+
+As mentioned before, the algorithm to assign read versions is a bit more complex. At the start of a transaction, a client will ask a proxy server for a read version. The proxy will reply with the last committed version as of the time it received the request - this is important to guarantee external consistency. This is how this is achieved:
+
+#. The client will send a GRV (get read version) request to a proxy.
+#. The proxy will batch GRV requests for a short amount of time (it depends on load and configuartion how big these batches will be).
+#. The proxy will do the following steps in parallel:
+   * Ask all other proxies for their most recent committed version (the largest version they received from the master for which it successfully wrote the transactions to the transaction log system).
+   * Send a message to the transaction log system to verify that it is still writable. This is to prevent that we fetch read versions from a proxy that has been declared to be dead.
+#. It will then take the largest committed version from all proxies (including its own) and send it back to the clients.
+
+Checking whether the log-system is still writeable can be especially expensive if a clusters runs in a multi-region configuration. If a user is fine to sacrifice strict serializability they can use :ref:`option-causal-read-risky <api-python-option-set-causal-read-risky>`.
+
+Conflict Detection
+~~~~~~~~~~~~~~~~~~
+
+This section will only explain conceptually how transactions are resolved in FoundationDB. The implementation will use multiple servers running the *Resolver* role and the keyspace will be sharded across them. It will also only allow resolving transactions whose read versions are less than 5 million versions older than their commit version (around 5 seconds).
+
+A resolver will keep a map in memory which stores the written keys of each commit version. A simpified resolver state could look like this:
+
+=======    =======
+Version    Keys
+=======    =======
+1000       a, b
+1200       f, q, c
+1210       a
+1340       t, u, x
+=======    =======
+
+Now let's assume we have a transaction with read version *1200* and the assigned commit version will be something larger than 1340 - let's say it is *1450*. In that transaction we read keys ``b, m, s`` and we want to write to ``a``. Note that we didn't read ``a`` - so we will issue a blind write. The resolver will check whether any of the read keys (``b, m, or s``) appers in any line between version *1200* and the most recent version, *1450*. The last write to ``b`` was at version 1000 which was before the read version. This means that transaction read the most recent value. We don't know about any recent writes to the other keys. Therefore the resolver will decide that this transaction does *NOT* conflict and it can be committed. It will then add this new write set to its internal state so that it can resolve future transactions. The new state will look like this:
+
+=======    =======
+Version    Keys
+=======    =======
+1000       a, b
+1200       f, q, c
+1210       a
+1340       t, u, x
+1450       a
+=======    =======
+
+Note that the resolver didn't use the write set at all in order to make a decision whether the transaction can commit or not. This means that blind writes (writes to keys without reading them first) will never cause a conflict. But since the resolver will then remember these writes, blind writes can cause future transactions to conflict.
+
+Error Handling
+--------------
+
+When using FoundationDB we strongly recommend users to use the retry-loop. In Python the retry loop would look this this:
+
+.. code-block:: python
+
+   tr = tr.create_transaction()
+   while True:
+       try:
+           # execute reads and writes on FDB using the tr object
+           tr.commit().wait()
+           break
+       except FDBError as e:
+           tr.on_error(e.code).wait()
+
+This is also what the transaction decoration in python does, if you pass a ``Database`` object to a decorated function. There are some interesting properies of this retry loop:
+
+* We never create a new transaction within that loop. Instead ``tr.on_error`` will create a soft reset on the transaction.
+* ``tr.on_error`` returns a future. This is because ``on_error`` will do back off to make sure we don't overwhelm the cluster.
+* If ``tr.on_error`` throws an error, we exit the retry loop.
+
+If you use this retry loop, there are very few caveats. If you write your own and you are not careful, some things might behave differently than you would expect. The following sections will go over the most common errors you will see, the guarantees FoundationDB provides during failures, and common caveats. This retry loop will take care of most of these errors, but it might still be beneficial to understand those.
+
+Errors where we know the State of the Transaction
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The most common errors you will see are errors where we know that the transaction failed to commit. In this case, we're guaranteed that nothing that we attempted to write was written to the database. The most common error codes for this are:
+
+* ``not_committed`` is thrown whenever there was a conflict. This will only be thrown by a ``commit``, read and write operations won't generate this error.
+* ``transaction_too_old`` is thrown if your transaction runs for more than five seconds. If you see this error often, you should try to make your transactions shorter.
+* ``future_version`` is one of the slightly more complex errors. There are a couple ways this error could be generated: if you set the read version of your transaction manually to something larger than exists or if the storage servers are falling behind. The second case should be more common. This is usually caused by a write load that is too high for FoundationDB to handle or by faulty/slow disks.
+
+The good thing about these errors is that retrying is simple: you know that the transaction didn't commit and therefore you can retry even without thinking much about weird corner cases.
+
+The ``commit_unknown_result`` Error
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``commit_unknown_result`` can be thrown during a commit. This error is difficult to handle as you won't know whether your transaction was committed or not. There are mostly two reasons why you might see this error:
+
+#. The client lost the connection to the proxy to which it did send the commit. So it never got a reply and therefore can't know whether the commit was successful or not.
+#. There was a FoundationDB failure - for example a proxy failed during the commit. In that case there is no way for the client know whether the transaction succeeded or not.
+
+However, there is one guarantee FoundationDB gives to the caller: at the point of time where you receive this error, the transaction either committed or not and if it didn't commit, it will never commit in the future. Or: it is guaranteed that the transaction is not in-flight anymore. This is an important guarantee as it means that if your transaction is idempotent you can simply retry. For more explanations see developer-guide-unknown-results_.
+
+Non-Retryable Errors
+~~~~~~~~~~~~~~~~~~~~
+
+The trickiest errors are non-retryable errors. ``Transaction.on_error`` will rethrow these. Some examples of non-retryable errors are:
+
+#. ``transaction_timed_out``. If you set a timeout for a transaction, the transaction will throw this error as soon as that timeout occurs.
+#. ``operation_cancelled``. This error is thrown if you call ``cancel()`` on any future returned by a transaction. So if this future is shared by multiple threads or coroutines, all other waiters will see this error.
+
+If you see one of those errors, the best way of action is to fail the client.
+
+At a first glance this looks very similar to an ``commit_unknown_result``. However, these errors lack the one guarantee ``commit_unknown_result`` still gives to the user: if the commit has already been sent to the database, the transaction could get committed at a later point in time. This means that if you retry the transaction, your new transaction might race with the old transaction. While this technically doesn't violate any consistency guarantees, abandoning a transaction means that there are no causality guaranatees.
--- a/documentation/sphinx/source/kv-architecture.rst
+++ b/documentation/sphinx/source/kv-architecture.rst
@ -38,7 +38,7 @@ The transaction logs make mutations durable to disk for fast commit latencies. T
 Resolvers
 =========

-The resolvers are responsible determining conflicts between transactions. A transaction conflicts if it reads a key that has been written between the transaction's read version and commit version. The resolver does this by holding the last 5 seconds of committed writes in memory, and comparing a new transaction's reads against this set of commits.
+The resolvers are responsible determining conflicts between transactions. A read-write transaction conflicts if it reads a key that has been written between the transaction's read version and commit version. The resolver does this by holding the last 5 seconds of committed writes in memory, and comparing a new transaction's reads against this set of commits.

 Storage Servers
 ===============
--- a/fdbclient/FileBackupAgent.actor.cpp
+++ b/fdbclient/FileBackupAgent.actor.cpp
@ -4614,11 +4614,10 @@ public:
 	// Similar to atomicRestore, only used in simulation test.
 	// locks the database before discontinuing the backup and that same lock is then used while doing the restore.
 	// the tagname of the backup must be the same as the restore.
-	ACTOR static Future<Void> atomicParallelRestore(FileBackupAgent* backupAgent, Database cx, Key tagName,
+	static Future<Void> atomicParallelRestore(FileBackupAgent* backupAgent, Database cx, Key tagName,
 	                                                Standalone<VectorRef<KeyRangeRef>> ranges, Key addPrefix,
 	                                                Key removePrefix) {
-		Version ver = wait(atomicRestore(backupAgent, cx, tagName, ranges, addPrefix, removePrefix, true));
-		return Void();
+		return success(atomicRestore(backupAgent, cx, tagName, ranges, addPrefix, removePrefix, true));
 	}
 };

--- a/fdbrpc/FlowTransport.actor.cpp
+++ b/fdbrpc/FlowTransport.actor.cpp
@ -437,12 +437,13 @@ ACTOR Future<Void> connectionKeeper( Reference<Peer> self,
 		.detail("ConnSet", (bool)conn);
 	ASSERT_WE_THINK(FlowTransport::transport().getLocalAddress() != self->destination);

+	state Future<Void> delayedHealthUpdateF;
 	state Optional<double> firstConnFailedTime = Optional<double>();
 	state int retryConnect = false;

 	loop {
 		try {
-			state Future<Void> delayedHealthUpdateF = Future<Void>();
+			delayedHealthUpdateF = Future<Void>();

 			if (!conn) {  // Always, except for the first loop with an incoming connection
 				self->outgoingConnectionIdle = true;
--- a/fdbserver/CMakeLists.txt
+++ b/fdbserver/CMakeLists.txt
@ -77,6 +77,7 @@ set(FDBSERVER_SRCS
  RestoreWorker.actor.cpp
  Resolver.actor.cpp
  ResolverInterface.h
+  ServerDBInfo.actor.h
  ServerDBInfo.h
  SimulatedCluster.actor.cpp
  SimulatedCluster.h
--- a/fdbserver/RestoreMaster.actor.cpp
+++ b/fdbserver/RestoreMaster.actor.cpp
@ -193,7 +193,6 @@ ACTOR Future<Void> distributeRestoreSysInfo(Reference<RestoreMasterData> masterD
 ACTOR Future<Void> startProcessRestoreRequests(Reference<RestoreMasterData> self, Database cx) {
 	state UID randomUID = deterministicRandom()->randomUniqueID();
 	state Standalone<VectorRef<RestoreRequest>> restoreRequests = wait(collectRestoreRequests(cx));
-	state int numTries = 0;
 	state int restoreIndex = 0;

 	TraceEvent("FastRestoreMasterWaitOnRestoreRequests", self->id()).detail("RestoreRequests", restoreRequests.size());
--- a/fdbserver/ServerDBInfo.actor.h
+++ b/fdbserver/ServerDBInfo.actor.h
@ -0,0 +1,101 @@
+/*
+ * ServerDBInfo.actor.h
+ *
+ * This source file is part of the FoundationDB open source project
+ *
+ * Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#if defined(NO_INTELLISENSE) && !defined(FDBSERVER_SERVERDBINFO_ACTOR_G_H)
+#define FDBSERVER_SERVERDBINFO_ACTOR_G_H
+#include "fdbserver/ServerDBInfo.actor.g.h"
+#elif !defined(FDBSERVER_SERVERDBINFO_ACTOR_H)
+#define FDBSERVER_SERVERDBINFO_ACTOR_H
+#define FDBSERVER_SERVERDBINFO_H
+#pragma once
+
+#include "fdbserver/DataDistributorInterface.h"
+#include "fdbserver/MasterInterface.h"
+#include "fdbserver/LogSystemConfig.h"
+#include "fdbserver/RatekeeperInterface.h"
+#include "fdbserver/RecoveryState.h"
+#include "fdbserver/LatencyBandConfig.h"
+#include "fdbserver/WorkerInterface.actor.h"
+#include "flow/actorcompiler.h" // This must be the last #include.
+
+struct ServerDBInfo {
+	constexpr static FileIdentifier file_identifier = 13838807;
+	// This structure contains transient information which is broadcast to all workers for a database,
+	// permitting them to communicate with each other.  It is not available to the client.  This mechanism
+	// (see GetServerDBInfoRequest) is closely parallel to OpenDatabaseRequest for the client.
+
+	UID id;  // Changes each time any other member changes
+	ClusterControllerFullInterface clusterInterface;
+	ClientDBInfo client;           // After a successful recovery, eventually proxies that communicate with it
+	Optional<DataDistributorInterface> distributor;  // The best guess of current data distributor.
+	MasterInterface master;        // The best guess as to the most recent master, which might still be recovering
+	Optional<RatekeeperInterface> ratekeeper;
+	std::vector<ResolverInterface> resolvers;
+	DBRecoveryCount recoveryCount; // A recovery count from DBCoreState.  A successful master recovery increments it twice; unsuccessful recoveries may increment it once. Depending on where the current master is in its recovery process, this might not have been written by the current master.
+	RecoveryState recoveryState;
+	LifetimeToken masterLifetime;  // Used by masterserver to detect not being the currently chosen master
+	LocalityData myLocality;       // (Not serialized) Locality information, if available, for the *local* process
+	LogSystemConfig logSystemConfig;
+	std::vector<UID> priorCommittedLogServers;   // If !fullyRecovered and logSystemConfig refers to a new log system which may not have been committed to the coordinated state yet, then priorCommittedLogServers are the previous, fully committed generation which need to stay alive in case this recovery fails
+	Optional<LatencyBandConfig> latencyBandConfig;
+	std::vector<std::pair<uint16_t,StorageServerInterface>> storageCaches;
+	int64_t infoGeneration;
+
+	ServerDBInfo() : recoveryCount(0), recoveryState(RecoveryState::UNINITIALIZED), logSystemConfig(0), infoGeneration(0) {}
+
+	bool operator == (ServerDBInfo const& r) const { return id == r.id; }
+	bool operator != (ServerDBInfo const& r) const { return id != r.id; }
+
+	template <class Ar>
+	void serialize( Ar& ar ) {
+		serializer(ar, id, clusterInterface, client, distributor, master, ratekeeper, resolvers, recoveryCount, recoveryState, masterLifetime, logSystemConfig, priorCommittedLogServers, latencyBandConfig, storageCaches, infoGeneration);
+	}
+};
+
+struct UpdateServerDBInfoRequest {
+	constexpr static FileIdentifier file_identifier = 9467438;
+	Standalone<StringRef> serializedDbInfo;
+	std::vector<Endpoint> broadcastInfo;
+	ReplyPromise<std::vector<Endpoint>> reply;
+
+	template <class Ar>
+	void serialize(Ar& ar) {
+		serializer(ar, serializedDbInfo, broadcastInfo, reply);
+	}
+};
+
+struct GetServerDBInfoRequest {
+	constexpr static FileIdentifier file_identifier = 9467439;
+	UID knownServerInfoID;
+	ReplyPromise<struct ServerDBInfo> reply;
+
+	template <class Ar>
+	void serialize(Ar& ar) {
+		serializer(ar, knownServerInfoID, reply);
+	}
+};
+
+
+ACTOR Future<Void> broadcastTxnRequest(TxnStateRequest req, int sendAmount, bool sendReply);
+
+ACTOR Future<std::vector<Endpoint>> broadcastDBInfoRequest(UpdateServerDBInfoRequest req, int sendAmount, Optional<Endpoint> sender, bool sendReply);
+
+#include "flow/unactorcompiler.h"
+#endif
--- a/fdbserver/ServerDBInfo.h
+++ b/fdbserver/ServerDBInfo.h
@ -22,74 +22,6 @@
 #define FDBSERVER_SERVERDBINFO_H
 #pragma once

-#include "fdbserver/DataDistributorInterface.h"
-#include "fdbserver/MasterInterface.h"
-#include "fdbserver/LogSystemConfig.h"
-#include "fdbserver/RatekeeperInterface.h"
-#include "fdbserver/RecoveryState.h"
-#include "fdbserver/LatencyBandConfig.h"
-#include "fdbserver/WorkerInterface.actor.h"
-
-struct ServerDBInfo {
-	constexpr static FileIdentifier file_identifier = 13838807;
-	// This structure contains transient information which is broadcast to all workers for a database,
-	// permitting them to communicate with each other.  It is not available to the client.  This mechanism
-	// (see GetServerDBInfoRequest) is closely parallel to OpenDatabaseRequest for the client.
-
-	UID id;  // Changes each time any other member changes
-	ClusterControllerFullInterface clusterInterface;
-	ClientDBInfo client;           // After a successful recovery, eventually proxies that communicate with it
-	Optional<DataDistributorInterface> distributor;  // The best guess of current data distributor.
-	MasterInterface master;        // The best guess as to the most recent master, which might still be recovering
-	Optional<RatekeeperInterface> ratekeeper;
-	std::vector<ResolverInterface> resolvers;
-	DBRecoveryCount recoveryCount; // A recovery count from DBCoreState.  A successful master recovery increments it twice; unsuccessful recoveries may increment it once. Depending on where the current master is in its recovery process, this might not have been written by the current master.
-	RecoveryState recoveryState;
-	LifetimeToken masterLifetime;  // Used by masterserver to detect not being the currently chosen master
-	LocalityData myLocality;       // (Not serialized) Locality information, if available, for the *local* process
-	LogSystemConfig logSystemConfig;
-	std::vector<UID> priorCommittedLogServers;   // If !fullyRecovered and logSystemConfig refers to a new log system which may not have been committed to the coordinated state yet, then priorCommittedLogServers are the previous, fully committed generation which need to stay alive in case this recovery fails
-	Optional<LatencyBandConfig> latencyBandConfig;
-	std::vector<std::pair<uint16_t,StorageServerInterface>> storageCaches;
-	int64_t infoGeneration;
-
-	ServerDBInfo() : recoveryCount(0), recoveryState(RecoveryState::UNINITIALIZED), logSystemConfig(0), infoGeneration(0) {}
-
-	bool operator == (ServerDBInfo const& r) const { return id == r.id; }
-	bool operator != (ServerDBInfo const& r) const { return id != r.id; }
-
-	template <class Ar>
-	void serialize( Ar& ar ) {
-		serializer(ar, id, clusterInterface, client, distributor, master, ratekeeper, resolvers, recoveryCount, recoveryState, masterLifetime, logSystemConfig, priorCommittedLogServers, latencyBandConfig, storageCaches, infoGeneration);
-	}
-};
-
-struct UpdateServerDBInfoRequest {
-	constexpr static FileIdentifier file_identifier = 9467438;
-	Standalone<StringRef> serializedDbInfo;
-	std::vector<Endpoint> broadcastInfo;
-	ReplyPromise<std::vector<Endpoint>> reply;
-
-	template <class Ar>
-	void serialize(Ar& ar) {
-		serializer(ar, serializedDbInfo, broadcastInfo, reply);
-	}
-};
-
-struct GetServerDBInfoRequest {
-	constexpr static FileIdentifier file_identifier = 9467439;
-	UID knownServerInfoID;
-	ReplyPromise<struct ServerDBInfo> reply;
-
-	template <class Ar>
-	void serialize(Ar& ar) {
-		serializer(ar, knownServerInfoID, reply);
-	}
-};
-
-
-Future<Void> broadcastTxnRequest(TxnStateRequest const& req, int const& sendAmount, bool const& sendReply);
-
-Future<std::vector<Endpoint>> broadcastDBInfoRequest(UpdateServerDBInfoRequest const& req, int const& sendAmount, Optional<Endpoint> const& sender, bool const& sendReply);
+#include "fdbserver/ServerDBInfo.actor.h"

 #endif
--- a/fdbserver/Status.actor.cpp
+++ b/fdbserver/Status.actor.cpp
@ -974,9 +974,9 @@ ACTOR static Future<JsonBuilderObject> recoveryStateStatusFetcher(WorkerDetails
 		// TODO:  time_in_recovery: 0.5
 		//        time_in_state: 0.1

-		TraceEventFields md = wait(activeGens);
-		if(md.size()) {
-			int activeGenerations = md.getInt("ActiveGenerations");
+		TraceEventFields mdActiveGens = wait(activeGens);
+		if(mdActiveGens.size()) {
+			int activeGenerations = mdActiveGens.getInt("ActiveGenerations");
 			message["active_generations"] = activeGenerations;
 		}

--- a/fdbserver/workloads/BackupAndParallelRestoreCorrectness.actor.cpp
+++ b/fdbserver/workloads/BackupAndParallelRestoreCorrectness.actor.cpp
@ -335,7 +335,6 @@ struct BackupAndParallelRestoreCorrectnessWorkload : TestWorkload {
 		state bool extraTasks = false;
 		state UID randomID = nondeterministicRandom()->randomUniqueID();
 		state int restoreIndex = 0;
-		state bool restoreDone = false;
 		state ReadYourWritesTransaction tr2(cx);

 		TraceEvent("BARW_Arguments")
--- a/flow/error_definitions.h
+++ b/flow/error_definitions.h
@ -187,7 +187,7 @@ ERROR( backup_duplicate, 2311, "Backup duplicate request")
 ERROR( backup_unneeded, 2312, "Backup unneeded request")
 ERROR( backup_bad_block_size, 2313, "Backup file block size too small")
 ERROR( backup_invalid_url, 2314, "Backup Container URL invalid")
-ERROR( backup_invalid_info, 2315, "Backup Container URL invalid")
+ERROR( backup_invalid_info, 2315, "Backup Container info invalid")
 ERROR( backup_cannot_expire, 2316, "Cannot expire requested data from backup without violating minimum restorability")
 ERROR( backup_auth_missing, 2317, "Cannot find authentication details (such as a password or secret key) for the specified Backup Container URL")
 ERROR( backup_auth_unreadable, 2318, "Cannot read or parse one or more sources of authentication information for Backup Container URLs")
--- a/flow/flat_buffers.cpp
+++ b/flow/flat_buffers.cpp
@ -347,6 +347,16 @@ TEST_CASE("flow/FlatBuffers/vectorBool") {

 } // namespace unit_tests

+template <>
+struct string_serialized_traits<Void> : std::true_type {
+	int32_t getSize(const Void& item) const { return 0; }
+	uint32_t save(uint8_t* out, const Void& t) const { return 0; }
+	template <class Context>
+	uint32_t load(const uint8_t* data, Void& t, Context& context) {
+		return 0;
+	}
+};
+
 namespace unit_tests {

 struct Y1 {
@ -499,4 +509,57 @@ TEST_CASE("/flow/FlatBuffers/Void") {
 	return Void();
 }

+TEST_CASE("/flow/FlatBuffers/EmptyStrings") {
+	int kSize = deterministicRandom()->randomInt(0, 100);
+	Standalone<StringRef> msg = ObjectWriter::toValue(std::vector<StringRef>(kSize), Unversioned());
+	ObjectReader rd(msg.begin(), Unversioned());
+	std::vector<StringRef> xs;
+	rd.deserialize(xs);
+	ASSERT(xs.size() == kSize);
+	for (const auto& x : xs) {
+		ASSERT(x.size() == 0);
+	}
+	return Void();
+}
+
+TEST_CASE("/flow/FlatBuffers/EmptyVectors") {
+	int kSize = deterministicRandom()->randomInt(0, 100);
+	Standalone<StringRef> msg = ObjectWriter::toValue(std::vector<std::vector<Void>>(kSize), Unversioned());
+	ObjectReader rd(msg.begin(), Unversioned());
+	std::vector<std::vector<Void>> xs;
+	rd.deserialize(xs);
+	ASSERT(xs.size() == kSize);
+	for (const auto& x : xs) {
+		ASSERT(x.size() == 0);
+	}
+	return Void();
+}
+
+TEST_CASE("/flow/FlatBuffers/EmptyVectorRefs") {
+	int kSize = deterministicRandom()->randomInt(0, 100);
+	Standalone<StringRef> msg = ObjectWriter::toValue(std::vector<VectorRef<Void>>(kSize), Unversioned());
+	ObjectReader rd(msg.begin(), Unversioned());
+	std::vector<VectorRef<Void>> xs;
+	rd.deserialize(xs);
+	ASSERT(xs.size() == kSize);
+	for (const auto& x : xs) {
+		ASSERT(x.size() == 0);
+	}
+	return Void();
+}
+
+TEST_CASE("/flow/FlatBuffers/EmptyPreSerVectorRefs") {
+	int kSize = deterministicRandom()->randomInt(0, 100);
+	Standalone<StringRef> msg =
+	    ObjectWriter::toValue(std::vector<VectorRef<Void, VecSerStrategy::String>>(kSize), Unversioned());
+	ObjectReader rd(msg.begin(), Unversioned());
+	std::vector<VectorRef<Void, VecSerStrategy::String>> xs;
+	rd.deserialize(xs);
+	ASSERT(xs.size() == kSize);
+	for (const auto& x : xs) {
+		ASSERT(x.size() == 0);
+	}
+	return Void();
+}
+
 } // namespace unit_tests
--- a/flow/flat_buffers.h
+++ b/flow/flat_buffers.h
@ -387,10 +387,17 @@ struct PrecomputeSize : Context {
 	void write(const void*, int offset, int /*len*/) { current_buffer_size = std::max(current_buffer_size, offset); }

 	template <class T>
-	std::enable_if_t<is_dynamic_size<T>> visitDynamicSize(const T& t) {
+	std::enable_if_t<is_dynamic_size<T>, bool> visitDynamicSize(const T& t) {
 		uint32_t size = dynamic_size_traits<T>::size(t, this->context());
+		if (size == 0 && emptyVector.value != -1) {
+			return true;
+		}
 		int start = RightAlign(current_buffer_size + size + 4, 4);
 		current_buffer_size = std::max(current_buffer_size, start);
+		if (size == 0) {
+			emptyVector = RelativeOffset{ current_buffer_size };
+		}
+		return false;
 	}

 	struct Noop {
@ -416,6 +423,9 @@ struct PrecomputeSize : Context {
 	const int buffer_length = -1; // Dummy, the value of this should not affect anything.
 	const int vtable_start = -1; // Dummy, the value of this should not affect anything.
 	std::vector<int> writeToOffsets;
+
+	// We only need to write an empty vector once, then we can re-use the relative offset.
+	RelativeOffset emptyVector{ -1 };
 };

 template <class Member, class Context>
@ -473,8 +483,11 @@ struct WriteToBuffer : Context {
 	}

 	template <class T>
-	std::enable_if_t<is_dynamic_size<T>> visitDynamicSize(const T& t) {
+	std::enable_if_t<is_dynamic_size<T>, bool> visitDynamicSize(const T& t) {
 		uint32_t size = dynamic_size_traits<T>::size(t, this->context());
+		if (size == 0 && emptyVector.value != -1) {
+			return true;
+		}
 		int padding = 0;
 		int start = RightAlign(current_buffer_size + size + 4, 4, &padding);
 		write(&size, start, 4);
@ -482,11 +495,16 @@ struct WriteToBuffer : Context {
 		dynamic_size_traits<T>::save(&buffer[buffer_length - start], t, this->context());
 		start -= size;
 		memset(&buffer[buffer_length - start], 0, padding);
+		if (size == 0) {
+			emptyVector = RelativeOffset{ current_buffer_size };
+		}
+		return false;
 	}

 	const int buffer_length;
 	const int vtable_start;
 	int current_buffer_size = 0;
+	RelativeOffset emptyVector{ -1 };

 private:
 	void copy_memory(const void* src, int offset, int len) {
@ -1031,7 +1049,7 @@ struct LoadSaveHelper : Context {
 	template <class U, class Writer, typename = std::enable_if_t<is_dynamic_size<U>>>
 	RelativeOffset save(const U& message, Writer& writer, const VTableSet*,
 	                    std::enable_if_t<is_dynamic_size<U>, int> _ = 0) {
-		writer.visitDynamicSize(message);
+		if (writer.visitDynamicSize(message)) return writer.emptyVector;
 		return RelativeOffset{ writer.current_buffer_size };
 	}

@ -1053,6 +1071,9 @@ struct LoadSaveHelper : Context {
 		using T = typename VectorTraits::value_type;
 		constexpr auto size = fb_size<T>;
 		uint32_t num_entries = VectorTraits::num_entries(members, this->context());
+		if (num_entries == 0 && writer.emptyVector.value != -1) {
+			return writer.emptyVector;
+		}
 		uint32_t len = num_entries * size;
 		auto self = writer.getMessageWriter(len);
 		auto iter = VectorTraits::begin(members, this->context());
@ -1062,10 +1083,14 @@ struct LoadSaveHelper : Context {
 			++iter;
 		}
 		int padding = 0;
-		int start = RightAlign(writer.current_buffer_size + len, std::max(4, fb_align<T>), &padding) + 4;
+		int start =
+		    RightAlign(writer.current_buffer_size + len, std::max(4, num_entries == 0 ? 0 : fb_align<T>), &padding) + 4;
 		writer.write(&num_entries, start, sizeof(uint32_t));
 		self.writeTo(writer, start - sizeof(uint32_t));
 		writer.write(&zeros, start - len - 4, padding);
+		if (num_entries == 0) {
+			writer.emptyVector = RelativeOffset{ writer.current_buffer_size };
+		}
 		return RelativeOffset{ writer.current_buffer_size };
 	}
 };