Merge commit '72b9202c953fe00e8899877a8da0b02851863e9f' into update-release-notes

# Conflicts:
#	documentation/sphinx/source/release-notes.rst
A.J. Beamon 2019-04-02 11:26:22 -07:00
commit 972e29f2b9
6 changed files with 49 additions and 24 deletions

View File

@@ -14,7 +14,7 @@ Features
* Added configuration option to choose log spilling implementation `(PR #1160) <https://github.com/apple/foundationdb/pull/1160>`_
* Added configuration option to choose log system implementation `(PR #1160) <https://github.com/apple/foundationdb/pull/1160>`_
* Batch priority transactions are now limited separately by ratekeeper and will be throttled at lower levels of cluster saturation. This makes it possible to run a more intense background load at saturation without significantly affecting normal priority transactions. It is still recommended not to run excessive loads at batch priority. `(PR #1198) <https://github.com/apple/foundationdb/pull/1198>`_
* Restore now requires the destnation cluster to be specified explicitly to avoid confusion. `(PR #1240) <https://github.com/apple/foundationdb/pull/1240>`_
* Restore now requires the destination cluster to be specified explicitly to avoid confusion. `(PR #1240) <https://github.com/apple/foundationdb/pull/1240>`_
* Restore now accepts a timestamp that can be used to determine the restore version if the original cluster is available. `(PR #1240) <https://github.com/apple/foundationdb/pull/1240>`_
* Backup ``status`` and ``describe`` commands now have a ``--json`` output option. `(PR #1248) <https://github.com/apple/foundationdb/pull/1248>`_
* Separated data distribution from master into its own role. `(PR #1062) <https://github.com/apple/foundationdb/pull/1062>`_
@@ -29,11 +29,23 @@ Features
* Deprecated transaction option ``TRANSACTION_LOGGING_ENABLE``. Added two new transaction options ``DEBUG_TRANSACTION_IDENTIFIER`` and ``LOG_TRANSACTION`` that set an identifier for the transaction and log the transaction to the trace file, respectively. `(PR #1200) <https://github.com/apple/foundationdb/pull/1200>`_
* Clients can now specify default transaction timeouts and retry limits for all transactions through a database option. `(Issue #775) <https://github.com/apple/foundationdb/issues/775>`_
* The "timeout", "max retry delay", and "retry limit" transaction options are no longer reset when the transaction is reset after a call to ``onError`` (as of API version 610). `(Issue #775) <https://github.com/apple/foundationdb/issues/775>`_
* Added the ``force_recovery_with_data_loss`` command to fdbcli. When a cluster is configured with usable_regions=2, this command will force the database to recover in the remote region. `(PR #1168) <https://github.com/apple/foundationdb/pull/1168>`_
* Added a limit to the number of status requests the cluster controller will handle. `(PR #1093) <https://github.com/apple/foundationdb/pull/1093>`_ (submitted by tclinken)
* Added a ``coordinator`` process class. Processes with this class can only be used as a coordinator, and ``coordinators auto`` will prefer to choose processes of this class. `(PR #1069) <https://github.com/apple/foundationdb/pull/1069>`_ (submitted by tclinken)
* The ``consistencycheck`` fdbserver role will check the entire database at most once every week. `(PR #1126) <https://github.com/apple/foundationdb/pull/1126>`_
* Added the metadata version key (``\xff/metadataVersion``). The value of this key is sent with every read version. It is intended to help clients cache rarely changing metadata. `(PR #1213) <https://github.com/apple/foundationdb/pull/1213>`_
* The ``fdbdr switch`` command verifies a ``dr_agent`` exists in both directions. `(Issue #1220) <https://github.com/apple/foundationdb/issues/1220>`_
* Transaction logs that cannot commit to disk for more than 5 seconds are marked as degraded. The cluster controller will prefer to recruit transaction logs on other processes before using degraded processes. `(Issue #690) <https://github.com/apple/foundationdb/issues/690>`_
* The ``memory`` storage engine configuration now uses the ssd engine for transaction log spilling. Transaction log spilling only happens when the transaction logs are using too much memory, so using the memory storage engine for this purpose can cause the process to run out of memory. Existing clusters will NOT automatically change their configuration. `(PR #1314) <https://github.com/apple/foundationdb/pull/1314>`_
* Trace logs can be output as JSON instead of XML using the ``--trace_format`` command line option. `(PR #976) <https://github.com/apple/foundationdb/pull/976>`_ (by atn34)
* Added ``modify`` command to fdbbackup for modifying parameters of a running backup. `(PR #1237) <https://github.com/apple/foundationdb/pull/1237>`_
* Added 'header' parameter to blobstore backup URLs for setting custom HTTP headers. `(PR #1237) <https://github.com/apple/foundationdb/pull/1237>`_
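A minimal sketch of the database-level default timeout and retry limit described above (Issue #775), as they might be set through the C API from C++. This is an editor's illustration, not part of this commit: error handling is omitted, and the option identifiers ``FDB_DB_OPTION_TRANSACTION_TIMEOUT`` and ``FDB_DB_OPTION_TRANSACTION_RETRY_LIMIT`` are assumed to match the names generated from fdb.options.

#define FDB_API_VERSION 610
#include <foundationdb/fdb_c.h>
#include <cstdint>
#include <thread>

int main() {
    fdb_select_api_version(FDB_API_VERSION);   // request the 6.1 API surface
    fdb_setup_network();
    std::thread netThread([] { fdb_run_network(); });

    FDBDatabase* db = nullptr;
    fdb_create_database(nullptr, &db);         // nullptr = default cluster file

    // Every transaction created from db now defaults to a 5 second timeout and
    // at most 10 retries; individual transactions can still override these.
    int64_t timeoutMs = 5000;
    fdb_database_set_option(db, FDB_DB_OPTION_TRANSACTION_TIMEOUT,
                            reinterpret_cast<const uint8_t*>(&timeoutMs), sizeof(timeoutMs));
    int64_t retryLimit = 10;
    fdb_database_set_option(db, FDB_DB_OPTION_TRANSACTION_RETRY_LIMIT,
                            reinterpret_cast<const uint8_t*>(&retryLimit), sizeof(retryLimit));

    fdb_database_destroy(db);
    fdb_stop_network();
    netThread.join();
}

As the ``onError`` note above says, as of API version 610 the timeout and retry settings are no longer cleared when a transaction is reset by ``onError``.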
Performance
-----------
* Java: Successful commits and range reads no longer create ``FDBException`` objects to reduce memory pressure. `(Issue #1235) <https://github.com/apple/foundationdb/issues/1235>`_
* Increased the get read version batch size in the client. This change reduces the load on the proxies when doing many transactions with only a few operations per transaction. `(PR #1311) <https://github.com/apple/foundationdb/pull/1311>`_
* Clients no longer attempt to connect to the master during recovery. `(PR #1317) <https://github.com/apple/foundationdb/pull/1317>`_
Fixes
-----
@@ -42,11 +54,22 @@ Fixes
* In some cases, calling ``OnError`` with a non-retryable error would partially reset a transaction. As of API version 610, the transaction will no longer be reset in these cases and will instead put the transaction into an error state. `(PR #1298) <https://github.com/apple/foundationdb/pull/1298>`_
* Standardized datetime string format across all backup and restore command options and outputs. `(PR #1248) <https://github.com/apple/foundationdb/pull/1248>`_
* Read workload status metrics would disappear when a storage server was missing. `(PR #1348) <https://github.com/apple/foundationdb/pull/1348>`_
* The ``coordinators auto`` command could recruit multiple coordinators with the same zone ID. `(Issue #988) <https://github.com/apple/foundationdb/issues/988>`_
* The data version of a cluster after a restore could have been lower than the restore version, which could cause versionstamp operations to produce smaller values than expected. `(PR #1213) <https://github.com/apple/foundationdb/pull/1213>`_
* Fixed a few thread safety issues with slow task profiling. `(PR #1085) <https://github.com/apple/foundationdb/pull/1085>`_
* Changing the class of a process would not change its preference for becoming the cluster controller. `(PR #1350) <https://github.com/apple/foundationdb/pull/1350>`_
* The Go bindings reported an incorrect required version when trying to load an incompatible fdb_c library. `(PR #1053) <https://github.com/apple/foundationdb/pull/1053>`_
* The ``include`` command in fdbcli would falsely include all machines with IP addresses that
have the included IP address as a prefix (for example ``include 1.0.0.1`` would also include
``1.0.0.10``). `(PR #1121) <https://github.com/apple/foundationdb/pull/1121>`_
* Restore could crash when reading a file that ends on a block boundary (1MB default). `(PR #1205) <https://github.com/apple/foundationdb/pull/1205>`_
* Java: Successful commits and range reads no longer create ``FDBException`` objects, which avoids wasting resources and reduces memory pressure. `(Issue #1235) <https://github.com/apple/foundationdb/issues/1235>`_
Status
------
* Report the number of connected coordinators for each client. This aids in monitoring client TLS support when enabling TLS on a live cluster. `(PR #1222) <https://github.com/apple/foundationdb/pull/1222>`_
* Degraded processes are reported in ``status json``. `(Issue #690) <https://github.com/apple/foundationdb/issues/690>`_
Bindings
--------
@@ -79,6 +102,7 @@ Other Changes
-------------
* Migrated to Boost 1.67. `(PR #1242) <https://github.com/apple/foundationdb/pull/1242>`_
* IPv4 address in trace log filename is no longer zero-padded. `(PR #1157) <https://github.com/apple/foundationdb/pull/1157>`_
Earlier release notes
---------------------

View File

@@ -40,6 +40,7 @@
#include "fdbcli/FlowLineNoise.h"
#include <type_traits>
#include <signal.h>
#ifdef __unixish__
@@ -1311,7 +1312,7 @@ void printStatus(StatusObjectReader statusObj, StatusClient::StatusLevel level,
outputStringCache = outputString;
try {
// constructs process performance details output
std::map<int, std::string> workerDetails;
std::map<NetworkAddress, std::string> workerDetails;
for (auto proc : processesMap.obj()){
StatusObjectReader procObj(proc.second);
std::string address;
@@ -1319,27 +1320,24 @@ void printStatus(StatusObjectReader statusObj, StatusClient::StatusLevel level,
std::string line;
// Windows does not support the "hh" width specifier so just using unsigned int to be safe.
unsigned int a, b, c, d, port;
int tokens = sscanf(address.c_str(), "%u.%u.%u.%u:%u", &a, &b, &c, &d, &port);
// If did not get exactly 5 tokens, or one of the integers is too large, address is invalid.
if (tokens != 5 || a & ~0xFF || b & ~0xFF || c & ~0xFF || d & ~0xFF || port & ~0xFFFF)
{
NetworkAddress parsedAddress;
try {
parsedAddress = NetworkAddress::parse(address);
} catch (Error& e) {
// Groups all invalid IP address/port pairs at the end of this detail group.
line = format(" %-22s (invalid IP address or port)", address.c_str());
std::string &lastline = workerDetails[std::numeric_limits<int>::max()];
IPAddress::IPAddressStore maxIp;
for (int i = 0; i < maxIp.size(); ++i) {
maxIp[i] = std::numeric_limits<std::remove_reference<decltype(maxIp[0])>::type>::max();
}
std::string& lastline =
workerDetails[NetworkAddress(IPAddress(maxIp), std::numeric_limits<uint16_t>::max())];
if (!lastline.empty())
lastline.append("\n");
lastline += line;
continue;
}
// Create addrNum as a 48 bit number {A}{B}{C}{D}{PORT}
uint64_t addrNum = 0;
for (auto i : { a, b, c, d })
addrNum = (addrNum << 8) | i;
addrNum = (addrNum << 16) | port;
try {
double tx = -1, rx = -1, mCPUUtil = -1;
int64_t processTotalSize;
@@ -1401,12 +1399,12 @@ void printStatus(StatusObjectReader statusObj, StatusClient::StatusLevel level,
}
}
workerDetails[addrNum] = line;
workerDetails[parsedAddress] = line;
}
catch (std::runtime_error& e) {
std::string noMetrics = format(" %-22s (no metrics available)", address.c_str());
workerDetails[addrNum] = noMetrics;
workerDetails[parsedAddress] = noMetrics;
}
}
for (auto w : workerDetails)
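The change above drops the sscanf parse and the hand-packed 48-bit sort key in favour of NetworkAddress::parse and a map keyed by NetworkAddress, while keeping the trick of filing unparseable addresses under a maximal sentinel key so that they sort to the end of the worker listing. Below is a standalone sketch of that sentinel idea using only standard-library types; it is an editor's illustration, and parseAddress is a hypothetical stand-in, not the FDB API.

#include <cstdint>
#include <cstdio>
#include <iostream>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for NetworkAddress::parse; IPv4-only, like the removed sscanf code.
std::optional<std::pair<uint32_t, uint16_t>> parseAddress(const std::string& s) {
    unsigned a, b, c, d, port;
    if (std::sscanf(s.c_str(), "%u.%u.%u.%u:%u", &a, &b, &c, &d, &port) != 5 ||
        a > 255 || b > 255 || c > 255 || d > 255 || port > 65535)
        return std::nullopt;
    return std::make_pair((a << 24) | (b << 16) | (c << 8) | d, (uint16_t)port);
}

int main() {
    // Keys sort numerically, so the all-ones sentinel orders after every real address,
    // just as the NetworkAddress built from a max-valued IPAddress does above.
    std::map<std::pair<uint32_t, uint16_t>, std::string> workerDetails;
    const std::pair<uint32_t, uint16_t> invalidKey{UINT32_MAX, UINT16_MAX};

    for (const std::string& addr : {"10.0.0.2:4500", "not-an-address", "10.0.0.1:4500"}) {
        if (auto parsed = parseAddress(addr)) {
            workerDetails[*parsed] = "  " + addr + "  (process metrics would go here)";
        } else {
            // Append to one shared entry so all invalid addresses are grouped at the end.
            std::string& lastline = workerDetails[invalidKey];
            if (!lastline.empty()) lastline += "\n";
            lastline += "  " + addr + "  (invalid IP address or port)";
        }
    }
    for (const auto& entry : workerDetails)
        std::cout << entry.second << "\n";
}

Presumably the switch to NetworkAddress::parse also lets the listing handle address forms the fixed %u.%u.%u.%u:%u pattern never could, such as IPv6 addresses.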

View File

@@ -3772,7 +3772,7 @@ public:
printf("%s\n", details.c_str());
}
ERestoreState status_ = wait(restore.stateEnum().getD(tr));
ERestoreState status_ = wait(restore.stateEnum().getD(tr));
status = status_;
state bool runnable = wait(restore.isRunnable(tr));
@@ -4251,9 +4251,10 @@ public:
wait(tr->commit());
break;
} catch(Error &e) {
if(e.code() != error_code_restore_duplicate_tag) {
wait(tr->onError(e));
if(e.code() == error_code_restore_duplicate_tag) {
throw;
}
wait(tr->onError(e));
}
}
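The second hunk above inverts the error check: error_code_restore_duplicate_tag is now rethrown out of the retry loop, and every other error goes to onError for the usual retry-or-rethrow handling. Here is a runnable sketch of that loop shape with toy types; it is an editor's illustration, and FakeTransaction and the error classes are stand-ins, not the Flow transaction API.

#include <iostream>
#include <stdexcept>

// Toy error types: one that must reach the caller, one that is retryable.
struct DuplicateTag : std::runtime_error {
    DuplicateTag() : std::runtime_error("restore_duplicate_tag") {}
};
struct NotCommitted : std::runtime_error {
    NotCommitted() : std::runtime_error("not_committed") {}
};

// Toy transaction: commit fails transiently twice, then succeeds.
struct FakeTransaction {
    int attempts = 0;
    void commit() {
        if (++attempts < 3) throw NotCommitted{};
    }
    void onError(const std::exception&) {
        // The real onError decides retryability and backs off; the toy always retries.
    }
};

void registerTag(FakeTransaction& tr, bool tagAlreadyExists) {
    for (;;) {
        try {
            if (tagAlreadyExists) throw DuplicateTag{};  // must be seen by the caller
            tr.commit();
            return;
        } catch (const DuplicateTag&) {
            throw;          // never swallowed by the retry loop, as in the new code
        } catch (const std::exception& e) {
            tr.onError(e);  // transient errors loop around and retry
        }
    }
}

int main() {
    FakeTransaction tr;
    registerTag(tr, /*tagAlreadyExists=*/false);
    std::cout << "committed after " << tr.attempts << " attempts\n";
}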

View File

@@ -2945,7 +2945,7 @@ ACTOR Future<Void> readVersionBatcher( DatabaseContext *cx, FutureStream< std::p
if (requests.size() == CLIENT_KNOBS->MAX_BATCH_SIZE)
send_batch = true;
else if (!timeout.isValid())
timeout = delay(batchTime, cx->taskID);
timeout = delay(batchTime, TaskProxyGetConsistentReadVersion);
}
when(wait(timeout.isValid() ? timeout : Never())) {
send_batch = true;
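For context, readVersionBatcher collects get-read-version requests and flushes a batch either when it reaches MAX_BATCH_SIZE or when a timer armed at the first queued request fires; the hunk above only changes the task priority passed to that delay. Below is a standalone sketch of the size-or-deadline batching shape; it is an editor's illustration, and the constants and request type are placeholders rather than FDB's.

#include <chrono>
#include <cstddef>
#include <iostream>
#include <optional>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Batcher {
    static constexpr std::size_t kMaxBatchSize = 4;             // placeholder for MAX_BATCH_SIZE
    static constexpr std::chrono::milliseconds kBatchTime{5};   // placeholder for batchTime

    std::vector<int> requests;                    // placeholder request type
    std::optional<Clock::time_point> deadline;    // armed when the first request arrives

    void add(int request, Clock::time_point now) {
        requests.push_back(request);
        if (requests.size() >= kMaxBatchSize) {
            flush("batch full");
        } else if (!deadline) {
            deadline = now + kBatchTime;          // like: timeout = delay(batchTime, ...)
        }
    }

    void poll(Clock::time_point now) {
        if (deadline && now >= *deadline) flush("deadline reached");
    }

    void flush(const char* reason) {
        std::cout << "sending batch of " << requests.size() << " (" << reason << ")\n";
        requests.clear();
        deadline.reset();
    }
};

int main() {
    Batcher b;
    Clock::time_point t0 = Clock::now();
    for (int i = 0; i < 6; ++i) b.add(i, t0);     // the first four go out as a full batch
    b.poll(t0 + std::chrono::milliseconds(10));   // the remaining two go out on the deadline
}

Batching get read versions this way is the mechanism behind the proxy-load release note above (PR #1311): many small transactions share a single GetReadVersion request to the proxies.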

View File

@@ -88,7 +88,7 @@ ACTOR static Future<Void> incrementalDeleteHelper( std::string filename, bool mu
state bool exists = fileExists(filename);
if(exists) {
Reference<IAsyncFile> f = wait(IAsyncFileSystem::filesystem()->open(filename, IAsyncFile::OPEN_READWRITE | IAsyncFile::OPEN_UNCACHED, 0));
Reference<IAsyncFile> f = wait(IAsyncFileSystem::filesystem()->open(filename, IAsyncFile::OPEN_READWRITE | IAsyncFile::OPEN_UNCACHED | IAsyncFile::OPEN_UNBUFFERED, 0));
file = f;
int64_t fileSize = wait(file->size());

View File

@@ -326,10 +326,12 @@ ACTOR Future<Void> newTLogServers( Reference<MasterData> self, RecruitFromConfig
Future<RecruitRemoteFromConfigurationReply> fRemoteWorkers = brokenPromiseToNever( self->clusterController.recruitRemoteFromConfiguration.getReply( RecruitRemoteFromConfigurationRequest( self->configuration, remoteDcId, recr.tLogs.size() * std::max<int>(1, self->configuration.desiredLogRouterCount / std::max<int>(1, recr.tLogs.size())), exclusionWorkerIds) ) );
self->primaryLocality = self->dcId_locality[recr.dcId];
self->logSystem = Reference<ILogSystem>();
Reference<ILogSystem> newLogSystem = wait( oldLogSystem->newEpoch( recr, fRemoteWorkers, self->configuration, self->cstate.myDBState.recoveryCount + 1, self->primaryLocality, self->dcId_locality[remoteDcId], self->allTags, self->recruitmentStalled ) );
self->logSystem = newLogSystem;
} else {
self->primaryLocality = tagLocalitySpecial;
self->logSystem = Reference<ILogSystem>();
Reference<ILogSystem> newLogSystem = wait( oldLogSystem->newEpoch( recr, Never(), self->configuration, self->cstate.myDBState.recoveryCount + 1, self->primaryLocality, tagLocalitySpecial, self->allTags, self->recruitmentStalled ) );
self->logSystem = newLogSystem;
}
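In both branches above, self->logSystem is explicitly emptied before the wait on newEpoch and reassigned only once the new log system exists, so between those two points the member holds nothing rather than a stale log system (the previous one stays reachable through oldLogSystem). A small standalone illustration of that ordering follows; it is an editor's sketch in which std::shared_ptr stands in for Flow's Reference<> and a plain function call stands in for the wait.

#include <iostream>
#include <memory>
#include <string>

struct LogSystem {
    int epoch;
};

struct Master {
    std::shared_ptr<LogSystem> logSystem;
};

// What any other code would report if it looked at the member at this point.
void observe(const Master& self, const std::string& when) {
    if (self.logSystem)
        std::cout << when << ": epoch " << self.logSystem->epoch << "\n";
    else
        std::cout << when << ": no log system\n";
}

// Stands in for wait(oldLogSystem->newEpoch(...)) while new transaction logs are recruited.
std::shared_ptr<LogSystem> newEpoch(int epoch) {
    return std::make_shared<LogSystem>(LogSystem{epoch});
}

int main() {
    std::shared_ptr<LogSystem> oldLogSystem = std::make_shared<LogSystem>(LogSystem{1});
    Master self;
    self.logSystem = oldLogSystem;
    observe(self, "before recovery");

    self.logSystem = nullptr;                        // clear first, as in the hunk
    observe(self, "while recruiting");               // empty, not stale
    std::shared_ptr<LogSystem> fresh = newEpoch(2);  // old system still alive via oldLogSystem
    self.logSystem = fresh;                          // assign only once it exists
    observe(self, "after recovery");
}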