tmp merge

Young Liu 2020-06-16 20:32:07 -07:00
commit 4dfb903a3a
52 changed files with 1069 additions and 944 deletions

@ -1034,6 +1034,9 @@ int parse_transaction(mako_args_t* args, char* optarg) {
op = 0;
while (*ptr) {
// Clang gives false positive array bounds warning, which must be ignored:
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Warray-bounds"
if (strncmp(ptr, "grv", 3) == 0) {
op = OP_GETREADVERSION;
ptr += 3;
@ -1080,6 +1083,7 @@ int parse_transaction(mako_args_t* args, char* optarg) {
error = 1;
break;
}
#pragma clang diagnostic pop
/* count */
num = 0;
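The mako hunk above wraps the token-matching chain in `parse_transaction` in a `#pragma clang diagnostic push`/`pop` pair, so `-Warray-bounds` is suppressed only for that region rather than for the whole file. Below is a minimal sketch of the same pattern. It assumes a hypothetical `parse_op` helper and made-up `OP_*` codes; mako's real definitions are not part of this diff.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical operation codes for illustration only (not mako's real values).
enum { OP_INVALID = -1, OP_GETREADVERSION = 0, OP_GET = 1 };

// Match one operation token at *ptr and advance *ptr past it.
// Longer tokens must be tested before their prefixes ("grv" before "g").
int parse_op(const char** ptr) {
    assert(ptr != nullptr && *ptr != nullptr);
#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Warray-bounds"
#endif
    if (std::strncmp(*ptr, "grv", 3) == 0) {
        *ptr += 3;
        return OP_GETREADVERSION;
    }
    if (std::strncmp(*ptr, "g", 1) == 0) {
        *ptr += 1;
        return OP_GET;
    }
#ifdef __clang__
#pragma clang diagnostic pop  // restore the previous diagnostic state
#endif
    return OP_INVALID;
}
```

The `#ifdef __clang__` guard is an addition of this sketch, not part of the commit. It keeps GCC from reporting an unknown pragma when built with `-Wall`.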

@ -10,38 +10,38 @@ macOS
The macOS installation package is supported on macOS 10.7+. It includes the client and (optionally) the server.
* `FoundationDB-6.3.1.pkg <https://www.foundationdb.org/downloads/6.3.1/macOS/installers/FoundationDB-6.3.1.pkg>`_
* `FoundationDB-6.3.2.pkg <https://www.foundationdb.org/downloads/6.3.2/macOS/installers/FoundationDB-6.3.2.pkg>`_
Ubuntu
------
The Ubuntu packages are supported on 64-bit Ubuntu 12.04+, but beware of the Linux kernel bug in Ubuntu 12.x.
* `foundationdb-clients-6.3.1-1_amd64.deb <https://www.foundationdb.org/downloads/6.3.1/ubuntu/installers/foundationdb-clients_6.3.1-1_amd64.deb>`_
* `foundationdb-server-6.3.1-1_amd64.deb <https://www.foundationdb.org/downloads/6.3.1/ubuntu/installers/foundationdb-server_6.3.1-1_amd64.deb>`_ (depends on the clients package)
* `foundationdb-clients-6.3.2-1_amd64.deb <https://www.foundationdb.org/downloads/6.3.2/ubuntu/installers/foundationdb-clients_6.3.2-1_amd64.deb>`_
* `foundationdb-server-6.3.2-1_amd64.deb <https://www.foundationdb.org/downloads/6.3.2/ubuntu/installers/foundationdb-server_6.3.2-1_amd64.deb>`_ (depends on the clients package)
RHEL/CentOS EL6
---------------
The RHEL/CentOS EL6 packages are supported on 64-bit RHEL/CentOS 6.x.
* `foundationdb-clients-6.3.1-1.el6.x86_64.rpm <https://www.foundationdb.org/downloads/6.3.1/rhel6/installers/foundationdb-clients-6.3.1-1.el6.x86_64.rpm>`_
* `foundationdb-server-6.3.1-1.el6.x86_64.rpm <https://www.foundationdb.org/downloads/6.3.1/rhel6/installers/foundationdb-server-6.3.1-1.el6.x86_64.rpm>`_ (depends on the clients package)
* `foundationdb-clients-6.3.2-1.el6.x86_64.rpm <https://www.foundationdb.org/downloads/6.3.2/rhel6/installers/foundationdb-clients-6.3.2-1.el6.x86_64.rpm>`_
* `foundationdb-server-6.3.2-1.el6.x86_64.rpm <https://www.foundationdb.org/downloads/6.3.2/rhel6/installers/foundationdb-server-6.3.2-1.el6.x86_64.rpm>`_ (depends on the clients package)
RHEL/CentOS EL7
---------------
The RHEL/CentOS EL7 packages are supported on 64-bit RHEL/CentOS 7.x.
* `foundationdb-clients-6.3.1-1.el7.x86_64.rpm <https://www.foundationdb.org/downloads/6.3.1/rhel7/installers/foundationdb-clients-6.3.1-1.el7.x86_64.rpm>`_
* `foundationdb-server-6.3.1-1.el7.x86_64.rpm <https://www.foundationdb.org/downloads/6.3.1/rhel7/installers/foundationdb-server-6.3.1-1.el7.x86_64.rpm>`_ (depends on the clients package)
* `foundationdb-clients-6.3.2-1.el7.x86_64.rpm <https://www.foundationdb.org/downloads/6.3.2/rhel7/installers/foundationdb-clients-6.3.2-1.el7.x86_64.rpm>`_
* `foundationdb-server-6.3.2-1.el7.x86_64.rpm <https://www.foundationdb.org/downloads/6.3.2/rhel7/installers/foundationdb-server-6.3.2-1.el7.x86_64.rpm>`_ (depends on the clients package)
Windows
-------
The Windows installer is supported on 64-bit Windows XP and later. It includes the client and (optionally) the server.
* `foundationdb-6.3.1-x64.msi <https://www.foundationdb.org/downloads/6.3.1/windows/installers/foundationdb-6.3.1-x64.msi>`_
* `foundationdb-6.3.2-x64.msi <https://www.foundationdb.org/downloads/6.3.2/windows/installers/foundationdb-6.3.2-x64.msi>`_
API Language Bindings
=====================
@ -58,18 +58,18 @@ On macOS and Windows, the FoundationDB Python API bindings are installed as part
If you need to use the FoundationDB Python API from other Python installations or paths, use the Python package manager ``pip`` (``pip install foundationdb``) or download the Python package:
* `foundationdb-6.3.1.tar.gz <https://www.foundationdb.org/downloads/6.3.1/bindings/python/foundationdb-6.3.1.tar.gz>`_
* `foundationdb-6.3.2.tar.gz <https://www.foundationdb.org/downloads/6.3.2/bindings/python/foundationdb-6.3.2.tar.gz>`_
Ruby 1.9.3/2.0.0+
-----------------
* `fdb-6.3.1.gem <https://www.foundationdb.org/downloads/6.3.1/bindings/ruby/fdb-6.3.1.gem>`_
* `fdb-6.3.2.gem <https://www.foundationdb.org/downloads/6.3.2/bindings/ruby/fdb-6.3.2.gem>`_
Java 8+
-------
* `fdb-java-6.3.1.jar <https://www.foundationdb.org/downloads/6.3.1/bindings/java/fdb-java-6.3.1.jar>`_
* `fdb-java-6.3.1-javadoc.jar <https://www.foundationdb.org/downloads/6.3.1/bindings/java/fdb-java-6.3.1-javadoc.jar>`_
* `fdb-java-6.3.2.jar <https://www.foundationdb.org/downloads/6.3.2/bindings/java/fdb-java-6.3.2.jar>`_
* `fdb-java-6.3.2-javadoc.jar <https://www.foundationdb.org/downloads/6.3.2/bindings/java/fdb-java-6.3.2-javadoc.jar>`_
Go 1.11+
--------

@ -514,6 +514,7 @@
"data_distribution_disabled_for_ss_failures":true,
"data_distribution_disabled_for_rebalance":true,
"data_distribution_disabled":true,
"active_primary_dc":"pv",
"configuration":{
"log_anti_quorum":0,
"log_replicas":2,
@ -575,6 +576,7 @@
"ssd-1",
"ssd-2",
"ssd-redwood-experimental",
"ssd-rocksdb-experimental",
"memory"
]},
"coordinators_count":1,

@ -2,6 +2,14 @@
Release Notes
#############
6.2.23
======
Status
------
* Added ``cluster.active_primary_dc`` that indicates which datacenter is serving as the primary datacenter in multi-region setups. `(PR #3320) <https://github.com/apple/foundationdb/pull/3320>`_
6.2.22
======
@ -21,6 +29,7 @@ Fixes
* ``fdbrestore`` prefix options required exactly a single hyphen instead of the standard two. `(PR #3056) <https://github.com/apple/foundationdb/pull/3056>`_
* Commits could stall on a newly elected proxy because of inaccurate compute estimates. `(PR #3123) <https://github.com/apple/foundationdb/pull/3123>`_
* A transaction class process with a bad disk could be repeatedly recruited as a transaction log. `(PR #3268) <https://github.com/apple/foundationdb/pull/3268>`_
* Fix a potential race condition that could lead to undefined behavior when connecting to a database using the multi-version client API. `(PR #3265) <https://github.com/apple/foundationdb/pull/3265>`_
Features
--------

@ -97,11 +97,19 @@ Other Changes
* The ``\xff\xff/worker_interfaces/`` keyspace now begins at a key which includes a trailing ``/`` (previously ``\xff\xff/worker_interfaces``). Range reads to this range now respect the end key passed into the range and include the keyspace prefix in the resulting keys. `(PR #3095) <https://github.com/apple/foundationdb/pull/3095>`_
* Added FreeBSD support. `(PR #2634) <https://github.com/apple/foundationdb/pull/2634>`_
* Updated boost to 1.72. `(PR #2684) <https://github.com/apple/foundationdb/pull/2684>`_
* Calling ``fdb_run_network`` multiple times in a single run of a client program now returns an error instead of causing undefined behavior. [6.3.1] `(PR #3229) <https://github.com/apple/foundationdb/pull/3229>`_
Fixes from previous versions
----------------------------
* The 6.3.1 patch release includes all fixes from the patch releases 6.2.21 and 6.2.22. :doc:`(6.2 Release Notes) </old-release-notes/release-notes-620>`
Fixes only impacting 6.3.0+
---------------------------
* Renamed ``MIN_DELAY_STORAGE_CANDIDACY_SECONDS`` knob to ``MIN_DELAY_CC_WORST_FIT_CANDIDACY_SECONDS``. [6.3.2] `(PR #3327) <https://github.com/apple/foundationdb/pull/3327>`_
* Refreshing TLS certificates could cause crashes. [6.3.2] `(PR #3352) <https://github.com/apple/foundationdb/pull/3352>`_
* All storage class processes attempted to connect to the same coordinator. [6.3.2] `(PR #3361) <https://github.com/apple/foundationdb/pull/3361>`_
Earlier release notes
---------------------

@ -268,6 +268,8 @@ StatusObject DatabaseConfiguration::toJSON(bool noPolicies) const {
result["storage_engine"] = "ssd-2";
} else if( tLogDataStoreType == KeyValueStoreType::SSD_BTREE_V2 && storageServerStoreType == KeyValueStoreType::SSD_REDWOOD_V1 ) {
result["storage_engine"] = "ssd-redwood-experimental";
} else if (tLogDataStoreType == KeyValueStoreType::SSD_BTREE_V2 && storageServerStoreType == KeyValueStoreType::SSD_ROCKSDB_V1) {
result["storage_engine"] = "ssd-rocksdb-experimental";
} else if( tLogDataStoreType == KeyValueStoreType::MEMORY && storageServerStoreType == KeyValueStoreType::MEMORY ) {
result["storage_engine"] = "memory-1";
} else if( tLogDataStoreType == KeyValueStoreType::SSD_BTREE_V2 && storageServerStoreType == KeyValueStoreType::MEMORY_RADIXTREE ) {
@ -498,7 +500,7 @@ bool DatabaseConfiguration::isExcludedServer( NetworkAddressList a ) const {
return get( encodeExcludedServersKey( AddressExclusion(a.address.ip, a.address.port) ) ).present() ||
get( encodeExcludedServersKey( AddressExclusion(a.address.ip) ) ).present() ||
get( encodeFailedServersKey( AddressExclusion(a.address.ip, a.address.port) ) ).present() ||
get( encodeFailedServersKey( AddressExclusion(a.address.ip) ) ).present() ||
get( encodeFailedServersKey( AddressExclusion(a.address.ip) ) ).present() ||
( a.secondaryAddress.present() && (
get( encodeExcludedServersKey( AddressExclusion(a.secondaryAddress.get().ip, a.secondaryAddress.get().port) ) ).present() ||
get( encodeExcludedServersKey( AddressExclusion(a.secondaryAddress.get().ip) ) ).present() ||

@ -223,13 +223,11 @@ public:
bool enableLocalityLoadBalance;
struct VersionRequest {
SpanID spanContext;
Promise<GetReadVersionReply> reply;
TagSet tags;
Optional<UID> debugID;
VersionRequest(SpanID spanContext, TagSet tags = TagSet(), Optional<UID> debugID = Optional<UID>())
: spanContext(spanContext), tags(tags), debugID(debugID) {}
VersionRequest(TagSet tags = TagSet(), Optional<UID> debugID = Optional<UID>()) : tags(tags), debugID(debugID) {}
};
// Transaction start request batching

@ -36,7 +36,6 @@ typedef uint64_t Sequence;
typedef StringRef KeyRef;
typedef StringRef ValueRef;
typedef int64_t Generation;
typedef UID SpanID;
enum {
tagLocalitySpecial = -1,
@ -697,6 +696,7 @@ struct KeyValueStoreType {
SSD_BTREE_V2,
SSD_REDWOOD_V1,
MEMORY_RADIXTREE,
SSD_ROCKSDB_V1,
END
};
@ -716,6 +716,7 @@ struct KeyValueStoreType {
case SSD_BTREE_V1: return "ssd-1";
case SSD_BTREE_V2: return "ssd-2";
case SSD_REDWOOD_V1: return "ssd-redwood-experimental";
case SSD_ROCKSDB_V1: return "ssd-rocksdb-experimental";
case MEMORY: return "memory";
case MEMORY_RADIXTREE: return "memory-radixtree-beta";
default: return "unknown";

@ -106,6 +106,9 @@ std::map<std::string, std::string> configForToken( std::string const& mode ) {
} else if (mode == "ssd-redwood-experimental") {
logType = KeyValueStoreType::SSD_BTREE_V2;
storeType = KeyValueStoreType::SSD_REDWOOD_V1;
} else if (mode == "ssd-rocksdb-experimental") {
logType = KeyValueStoreType::SSD_BTREE_V2;
storeType = KeyValueStoreType::SSD_ROCKSDB_V1;
} else if (mode == "memory" || mode == "memory-2") {
logType = KeyValueStoreType::SSD_BTREE_V2;
storeType= KeyValueStoreType::MEMORY;
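This commit threads the new `ssd-rocksdb-experimental` engine through several places that must agree:
* the `KeyValueStoreType` enum and its `toString`,
* `configForToken`, which maps a token to a log/storage type pair,
* `DatabaseConfiguration::toJSON`, which maps the pair back to a token.

Below is a hypothetical table-driven sketch of that round trip. It uses a simplified `StoreType` enum rather than FoundationDB's own types, and it lists only the two experimental modes visible in the diff.

```cpp
#include <optional>
#include <string>
#include <utility>

// Simplified stand-in for KeyValueStoreType (illustrative subset).
enum class StoreType { SSD_BTREE_V2, SSD_REDWOOD_V1, SSD_ROCKSDB_V1, MEMORY };

// One row per engine mode: its token and its (log type, storage type) pair.
struct EngineMode {
    const char* token;
    StoreType logType;
    StoreType storeType;
};

// Hypothetical table containing only the experimental modes shown in the diff.
constexpr EngineMode kModes[] = {
    {"ssd-redwood-experimental", StoreType::SSD_BTREE_V2, StoreType::SSD_REDWOOD_V1},
    {"ssd-rocksdb-experimental", StoreType::SSD_BTREE_V2, StoreType::SSD_ROCKSDB_V1},
};

// Token -> (log type, storage type), the direction configForToken handles.
std::optional<std::pair<StoreType, StoreType>> parseMode(const std::string& token) {
    for (const auto& m : kModes)
        if (token == m.token) return std::make_pair(m.logType, m.storeType);
    return std::nullopt;  // unrecognized token
}

// (log type, storage type) -> token, the direction toJSON handles.
std::string modeName(StoreType logType, StoreType storeType) {
    for (const auto& m : kModes)
        if (m.logType == logType && m.storeType == storeType) return m.token;
    return "unknown";
}
```

With a single table, both directions stay consistent by construction. The code in the commit instead keeps them as separate `if`/`else` chains, which is why each new engine touches several files.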

@ -153,7 +153,6 @@ struct CommitTransactionRequest : TimedRequest {
bool firstInBatch() const { return (flags & FLAG_FIRST_IN_BATCH) != 0; }
Arena arena;
SpanID spanContext;
CommitTransactionRef transaction;
ReplyPromise<CommitID> reply;
uint32_t flags;
@ -163,7 +162,7 @@ struct CommitTransactionRequest : TimedRequest {
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, transaction, reply, arena, flags, debugID, spanContext);
serializer(ar, transaction, reply, arena, flags, debugID);
}
};
@ -210,7 +209,6 @@ struct GetReadVersionRequest : TimedRequest {
FLAG_PRIORITY_MASK = PRIORITY_SYSTEM_IMMEDIATE,
};
SpanID spanContext;
uint32_t transactionCount;
uint32_t flags;
TransactionPriority priority;
@ -221,11 +219,9 @@ struct GetReadVersionRequest : TimedRequest {
ReplyPromise<GetReadVersionReply> reply;
GetReadVersionRequest() : transactionCount(1), flags(0) {}
GetReadVersionRequest(SpanID spanContext, uint32_t transactionCount, TransactionPriority priority,
uint32_t flags = 0, TransactionTagMap<uint32_t> tags = TransactionTagMap<uint32_t>(),
Optional<UID> debugID = Optional<UID>())
: spanContext(spanContext), transactionCount(transactionCount), priority(priority), flags(flags), tags(tags),
debugID(debugID) {
GetReadVersionRequest(uint32_t transactionCount, TransactionPriority priority, uint32_t flags = 0, TransactionTagMap<uint32_t> tags = TransactionTagMap<uint32_t>(), Optional<UID> debugID = Optional<UID>())
: transactionCount(transactionCount), priority(priority), flags(flags), tags(tags), debugID(debugID)
{
flags = flags & ~FLAG_PRIORITY_MASK;
switch(priority) {
case TransactionPriority::BATCH:
@ -241,12 +237,12 @@ struct GetReadVersionRequest : TimedRequest {
ASSERT(false);
}
}
bool operator < (GetReadVersionRequest const& rhs) const { return priority < rhs.priority; }
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, transactionCount, flags, tags, debugID, reply, spanContext);
serializer(ar, transactionCount, flags, tags, debugID, reply);
if(ar.isDeserializing) {
if((flags & PRIORITY_SYSTEM_IMMEDIATE) == PRIORITY_SYSTEM_IMMEDIATE) {
@ -279,7 +275,6 @@ struct GetKeyServerLocationsReply {
struct GetKeyServerLocationsRequest {
constexpr static FileIdentifier file_identifier = 9144680;
Arena arena;
SpanID spanContext;
KeyRef begin;
Optional<KeyRef> end;
int limit;
@ -287,28 +282,24 @@ struct GetKeyServerLocationsRequest {
ReplyPromise<GetKeyServerLocationsReply> reply;
GetKeyServerLocationsRequest() : limit(0), reverse(false) {}
GetKeyServerLocationsRequest(SpanID spanContext, KeyRef const& begin, Optional<KeyRef> const& end, int limit,
bool reverse, Arena const& arena)
: spanContext(spanContext), begin(begin), end(end), limit(limit), reverse(reverse), arena(arena) {}
template <class Ar>
GetKeyServerLocationsRequest( KeyRef const& begin, Optional<KeyRef> const& end, int limit, bool reverse, Arena const& arena ) : begin( begin ), end( end ), limit( limit ), reverse( reverse ), arena( arena ) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, begin, end, limit, reverse, reply, spanContext, arena);
serializer(ar, begin, end, limit, reverse, reply, arena);
}
};
struct GetRawCommittedVersionRequest {
constexpr static FileIdentifier file_identifier = 12954034;
SpanID spanContext;
Optional<UID> debugID;
ReplyPromise<GetReadVersionReply> reply;
explicit GetRawCommittedVersionRequest(SpanID spanContext, Optional<UID> const& debugID = Optional<UID>()) : spanContext(spanContext), debugID(debugID) {}
explicit GetRawCommittedVersionRequest() : spanContext(), debugID() {}
explicit GetRawCommittedVersionRequest(Optional<UID> const& debugID = Optional<UID>()) : debugID(debugID) {}
template <class Ar>
void serialize( Ar& ar ) {
serializer(ar, debugID, reply, spanContext);
serializer(ar, debugID, reply);
}
};
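`GetReadVersionRequest` stores its priority inside `flags`. The constructor clears the bits under `FLAG_PRIORITY_MASK` and then sets a pattern chosen by `priority`. Deserialization reads the priority back, starting with the check `(flags & PRIORITY_SYSTEM_IMMEDIATE) == PRIORITY_SYSTEM_IMMEDIATE`.

Below is a minimal sketch of that encode/decode pair. It assumes a simplified `Priority` enum and bit values in which `PRIORITY_SYSTEM_IMMEDIATE` contains the other patterns; the real constants are not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for TransactionPriority.
enum class Priority { BATCH, DEFAULT, IMMEDIATE };

// Assumed bit patterns for illustration; SYSTEM_IMMEDIATE's bits include the others'.
constexpr uint32_t PRIORITY_BATCH = 1u << 24;
constexpr uint32_t PRIORITY_DEFAULT = 8u << 24;
constexpr uint32_t PRIORITY_SYSTEM_IMMEDIATE = 15u << 24;
constexpr uint32_t FLAG_PRIORITY_MASK = PRIORITY_SYSTEM_IMMEDIATE;

// Replace any priority bits already in `flags` with the encoding of `priority`.
uint32_t encodePriority(uint32_t flags, Priority priority) {
    flags &= ~FLAG_PRIORITY_MASK;
    switch (priority) {
    case Priority::BATCH: return flags | PRIORITY_BATCH;
    case Priority::DEFAULT: return flags | PRIORITY_DEFAULT;
    case Priority::IMMEDIATE: return flags | PRIORITY_SYSTEM_IMMEDIATE;
    }
    assert(false && "unknown priority");
    return flags;
}

// Recover the priority after deserialization. Test the widest pattern first,
// because in this layout its bits contain PRIORITY_DEFAULT's and PRIORITY_BATCH's.
Priority decodePriority(uint32_t flags) {
    if ((flags & PRIORITY_SYSTEM_IMMEDIATE) == PRIORITY_SYSTEM_IMMEDIATE) return Priority::IMMEDIATE;
    if ((flags & PRIORITY_DEFAULT) == PRIORITY_DEFAULT) return Priority::DEFAULT;
    if ((flags & PRIORITY_BATCH) == PRIORITY_BATCH) return Priority::BATCH;
    assert(false && "no priority bits set");
    return Priority::DEFAULT;
}
```

In this assumed layout the patterns overlap, so the decoder has to compare with `==` against the full pattern rather than only testing for a nonzero result.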

@ -36,7 +36,6 @@
#include "fdbclient/ClusterInterface.h"
#include "fdbclient/CoordinationInterface.h"
#include "fdbclient/DatabaseContext.h"
#include "fdbclient/FDBOptions.g.h"
#include "fdbclient/KeyRangeMap.h"
#include "fdbclient/Knobs.h"
#include "fdbclient/ManagementAPI.actor.h"
@ -47,7 +46,6 @@
#include "fdbclient/SpecialKeySpace.actor.h"
#include "fdbclient/StorageServerInterface.h"
#include "fdbclient/SystemData.h"
#include "fdbclient/versions.h"
#include "fdbrpc/LoadBalance.h"
#include "fdbrpc/Net2FileSystem.h"
#include "fdbrpc/simulator.h"
@ -61,10 +59,12 @@
#include "flow/Platform.h"
#include "flow/SystemMonitor.h"
#include "flow/TLSConfig.actor.h"
#include "flow/Tracing.h"
#include "flow/Trace.h"
#include "flow/UnitTest.h"
#include "flow/serialize.h"
#include "fdbclient/versions.h"
#ifdef WIN32
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
@ -1539,7 +1539,6 @@ ACTOR Future<Optional<vector<StorageServerInterface>>> transactionalGetServerInt
//If isBackward == true, returns the shard containing the key before 'key' (an infinitely long, inexpressible key). Otherwise returns the shard containing key
ACTOR Future< pair<KeyRange,Reference<LocationInfo>> > getKeyLocation_internal( Database cx, Key key, TransactionInfo info, bool isBackward = false ) {
state Span span("NAPI:getKeyLocation"_loc, { info.span->context });
if (isBackward) {
ASSERT( key != allKeys.begin && key <= allKeys.end );
} else {
@ -1553,10 +1552,7 @@ ACTOR Future< pair<KeyRange,Reference<LocationInfo>> > getKeyLocation_internal(
++cx->transactionKeyServerLocationRequests;
choose {
when ( wait( cx->onMasterProxiesChanged() ) ) {}
when(GetKeyServerLocationsReply rep = wait(basicLoadBalance(
cx->getMasterProxies(info.useProvisionalProxies), &MasterProxyInterface::getKeyServersLocations,
GetKeyServerLocationsRequest(span->context, key, Optional<KeyRef>(), 100, isBackward, key.arena()),
TaskPriority::DefaultPromiseEndpoint))) {
when ( GetKeyServerLocationsReply rep = wait( basicLoadBalance( cx->getMasterProxies(info.useProvisionalProxies), &MasterProxyInterface::getKeyServersLocations, GetKeyServerLocationsRequest(key, Optional<KeyRef>(), 100, isBackward, key.arena()), TaskPriority::DefaultPromiseEndpoint ) ) ) {
++cx->transactionKeyServerLocationRequestsCompleted;
if( info.debugID.present() )
g_traceBatch.addEvent("TransactionDebug", info.debugID.get().first(), "NativeAPI.getKeyLocation.After");
@ -1592,7 +1588,6 @@ Future<pair<KeyRange, Reference<LocationInfo>>> getKeyLocation(Database const& c
}
ACTOR Future< vector< pair<KeyRange,Reference<LocationInfo>> > > getKeyRangeLocations_internal( Database cx, KeyRange keys, int limit, bool reverse, TransactionInfo info ) {
state Span span("NAPI:getKeyRangeLocations"_loc, { info.span->context });
if( info.debugID.present() )
g_traceBatch.addEvent("TransactionDebug", info.debugID.get().first(), "NativeAPI.getKeyLocations.Before");
@ -1600,10 +1595,7 @@ ACTOR Future< vector< pair<KeyRange,Reference<LocationInfo>> > > getKeyRangeLoca
++cx->transactionKeyServerLocationRequests;
choose {
when ( wait( cx->onMasterProxiesChanged() ) ) {}
when(GetKeyServerLocationsReply _rep = wait(basicLoadBalance(
cx->getMasterProxies(info.useProvisionalProxies), &MasterProxyInterface::getKeyServersLocations,
GetKeyServerLocationsRequest(span->context, keys.begin, keys.end, limit, reverse, keys.arena()),
TaskPriority::DefaultPromiseEndpoint))) {
when ( GetKeyServerLocationsReply _rep = wait( basicLoadBalance( cx->getMasterProxies(info.useProvisionalProxies), &MasterProxyInterface::getKeyServersLocations, GetKeyServerLocationsRequest(keys.begin, keys.end, limit, reverse, keys.arena()), TaskPriority::DefaultPromiseEndpoint ) ) ) {
++cx->transactionKeyServerLocationRequestsCompleted;
state GetKeyServerLocationsReply rep = _rep;
if( info.debugID.present() )
@ -1694,7 +1686,6 @@ Future<Void> Transaction::warmRange(Database cx, KeyRange keys) {
ACTOR Future<Optional<Value>> getValue( Future<Version> version, Key key, Database cx, TransactionInfo info, Reference<TransactionLogInfo> trLogInfo, TagSet tags )
{
state Version ver = wait( version );
state Span span("NAPI:getValue"_loc, { info.span->context });
cx->validateVersion(ver);
loop {
@ -1727,12 +1718,10 @@ ACTOR Future<Optional<Value>> getValue( Future<Version> version, Key key, Databa
}
choose {
when(wait(cx->connectionFileChanged())) { throw transaction_too_old(); }
when(GetValueReply _reply = wait(
loadBalance(cx.getPtr(), ssi.second, &StorageServerInterface::getValue,
GetValueRequest(span->context, key, ver,
cx->sampleReadTags() ? tags : Optional<TagSet>(), getValueID),
TaskPriority::DefaultPromiseEndpoint, false,
cx->enableLocalityLoadBalance ? &cx->queueModel : nullptr))) {
when(GetValueReply _reply =
wait(loadBalance(cx.getPtr(), ssi.second, &StorageServerInterface::getValue,
GetValueRequest(key, ver, cx->sampleReadTags() ? tags : Optional<TagSet>(), getValueID), TaskPriority::DefaultPromiseEndpoint, false,
cx->enableLocalityLoadBalance ? &cx->queueModel : nullptr))) {
reply = _reply;
}
}
@ -1790,7 +1779,6 @@ ACTOR Future<Key> getKey( Database cx, KeySelector k, Future<Version> version, T
wait(success(version));
state Optional<UID> getKeyID = Optional<UID>();
state Span span("NAPI:getKey"_loc, { info.span->context });
if( info.debugID.present() ) {
getKeyID = nondeterministicRandom()->randomUniqueID();
@ -1819,11 +1807,9 @@ ACTOR Future<Key> getKey( Database cx, KeySelector k, Future<Version> version, T
choose {
when(wait(cx->connectionFileChanged())) { throw transaction_too_old(); }
when(GetKeyReply _reply =
wait(loadBalance(cx.getPtr(), ssi.second, &StorageServerInterface::getKey,
GetKeyRequest(span->context, k, version.get(),
cx->sampleReadTags() ? tags : Optional<TagSet>(), getKeyID),
TaskPriority::DefaultPromiseEndpoint, false,
cx->enableLocalityLoadBalance ? &cx->queueModel : nullptr))) {
wait(loadBalance(cx.getPtr(), ssi.second, &StorageServerInterface::getKey, GetKeyRequest(k, version.get(), cx->sampleReadTags() ? tags : Optional<TagSet>(), getKeyID),
TaskPriority::DefaultPromiseEndpoint, false,
cx->enableLocalityLoadBalance ? &cx->queueModel : nullptr))) {
reply = _reply;
}
}
@ -1856,15 +1842,12 @@ ACTOR Future<Key> getKey( Database cx, KeySelector k, Future<Version> version, T
}
}
ACTOR Future<Version> waitForCommittedVersion( Database cx, Version version, SpanID spanContext ) {
state Span span("NAPI:waitForCommittedVersion"_loc, { spanContext });
ACTOR Future<Version> waitForCommittedVersion( Database cx, Version version ) {
try {
loop {
choose {
when ( wait( cx->onMasterProxiesChanged() ) ) {}
when(GetReadVersionReply v = wait(basicLoadBalance(
cx->getMasterProxies(false), &MasterProxyInterface::getConsistentReadVersion,
GetReadVersionRequest(span->context, 0, TransactionPriority::IMMEDIATE), cx->taskID))) {
when ( GetReadVersionReply v = wait( basicLoadBalance( cx->getMasterProxies(false), &MasterProxyInterface::getConsistentReadVersion, GetReadVersionRequest( 0, TransactionPriority::IMMEDIATE ), cx->taskID ) ) ) {
cx->minAcceptableReadVersion = std::min(cx->minAcceptableReadVersion, v.version);
if (v.version >= version)
@ -1880,14 +1863,11 @@ ACTOR Future<Version> waitForCommittedVersion( Database cx, Version version, Spa
}
}
ACTOR Future<Version> getRawVersion( Database cx, SpanID spanContext ) {
state Span span("NAPI:getRawVersion"_loc, { spanContext });
ACTOR Future<Version> getRawVersion( Database cx ) {
loop {
choose {
when ( wait( cx->onMasterProxiesChanged() ) ) {}
when(GetReadVersionReply v =
wait(basicLoadBalance(cx->getMasterProxies(false), &MasterProxyInterface::getConsistentReadVersion,
GetReadVersionRequest(spanContext, 0, TransactionPriority::IMMEDIATE), cx->taskID))) {
when ( GetReadVersionReply v = wait( basicLoadBalance( cx->getMasterProxies(false), &MasterProxyInterface::getConsistentReadVersion, GetReadVersionRequest( 0, TransactionPriority::IMMEDIATE ), cx->taskID ) ) ) {
return v.version;
}
}
@ -1901,7 +1881,6 @@ ACTOR Future<Void> readVersionBatcher(
ACTOR Future<Void> watchValue(Future<Version> version, Key key, Optional<Value> value, Database cx,
TransactionInfo info, TagSet tags) {
state Version ver = wait( version );
state Span span(deterministicRandom()->randomUniqueID(), "NAPI:watchValue"_loc, { info.span->context });
cx->validateVersion(ver);
ASSERT(ver != latestVersion);
@ -1918,11 +1897,9 @@ ACTOR Future<Void> watchValue(Future<Version> version, Key key, Optional<Value>
}
state WatchValueReply resp;
choose {
when(WatchValueReply r = wait(
loadBalance(cx.getPtr(), ssi.second, &StorageServerInterface::watchValue,
WatchValueRequest(span->context, key, value, ver,
cx->sampleReadTags() ? tags : Optional<TagSet>(), watchValueID),
TaskPriority::DefaultPromiseEndpoint))) {
when(WatchValueReply r = wait(loadBalance(cx.getPtr(), ssi.second, &StorageServerInterface::watchValue,
WatchValueRequest(key, value, ver, cx->sampleReadTags() ? tags : Optional<TagSet>(), watchValueID),
TaskPriority::DefaultPromiseEndpoint))) {
resp = r;
}
when(wait(cx->connectionFile ? cx->connectionFile->onChange() : Never())) { wait(Never()); }
@ -1933,7 +1910,7 @@ ACTOR Future<Void> watchValue(Future<Version> version, Key key, Optional<Value>
//FIXME: wait for known committed version on the storage server before replying,
//cannot do this until the storage server is notified on knownCommittedVersion changes from tlog (faster than the current update loop)
Version v = wait(waitForCommittedVersion(cx, resp.version, span->context));
Version v = wait(waitForCommittedVersion(cx, resp.version));
//TraceEvent("WatcherCommitted").detail("CommittedVersion", v).detail("WatchVersion", resp.version).detail("Key", key ).detail("Value", value);
@ -1986,7 +1963,6 @@ ACTOR Future<Standalone<RangeResultRef>> getExactRange( Database cx, Version ver
KeyRange keys, GetRangeLimits limits, bool reverse, TransactionInfo info, TagSet tags )
{
state Standalone<RangeResultRef> output;
state Span span("NAPI:getExactRange"_loc, { info.span->context });
//printf("getExactRange( '%s', '%s' )\n", keys.begin.toString().c_str(), keys.end.toString().c_str());
loop {
@ -2000,7 +1976,6 @@ ACTOR Future<Standalone<RangeResultRef>> getExactRange( Database cx, Version ver
req.version = version;
req.begin = firstGreaterOrEqual( range.begin );
req.end = firstGreaterOrEqual( range.end );
req.spanContext = span->context;
transformRangeLimits(limits, reverse, req);
ASSERT(req.limitBytes > 0 && req.limit != 0 && req.limit < 0 == reverse);
@ -2245,7 +2220,6 @@ ACTOR Future<Standalone<RangeResultRef>> getRange( Database cx, Reference<Transa
state KeySelector originalBegin = begin;
state KeySelector originalEnd = end;
state Standalone<RangeResultRef> output;
state Span span("NAPI:getRange"_loc, info.span);
try {
state Version version = wait( fVersion );
@ -2298,7 +2272,6 @@ ACTOR Future<Standalone<RangeResultRef>> getRange( Database cx, Reference<Transa
req.tags = cx->sampleReadTags() ? tags : Optional<TagSet>();
req.debugID = info.debugID;
req.spanContext = span->context;
try {
if( info.debugID.present() ) {
g_traceBatch.addEvent("TransactionDebug", info.debugID.get().first(), "NativeAPI.getRange.Before");
@ -2636,7 +2609,7 @@ ACTOR Future<Void> watch(Reference<Watch> watch, Database cx, TagSet tags, Trans
}
Future<Version> Transaction::getRawReadVersion() {
return ::getRawVersion(cx, info.span->context);
return ::getRawVersion(cx);
}
Future< Void > Transaction::watch( Reference<Watch> watch ) {
@ -2947,22 +2920,31 @@ TransactionOptions::TransactionOptions(Database const& cx) {
}
}
TransactionOptions::TransactionOptions() {
memset(this, 0, sizeof(*this));
void TransactionOptions::clear() {
maxBackoff = CLIENT_KNOBS->DEFAULT_MAX_BACKOFF;
getReadVersionFlags = 0;
sizeLimit = CLIENT_KNOBS->TRANSACTION_SIZE_LIMIT;
tags = TagSet();
readTags = TagSet();
maxTransactionLoggingFieldLength = 0;
checkWritesEnabled = false;
causalWriteRisky = false;
commitOnFirstProxy = false;
debugDump = false;
lockAware = false;
readOnly = false;
firstInBatch = false;
includePort = false;
reportConflictingKeys = false;
tags = TagSet{};
readTags = TagSet{};
priority = TransactionPriority::DEFAULT;
}
TransactionOptions::TransactionOptions() {
clear();
}
void TransactionOptions::reset(Database const& cx) {
memset(this, 0, sizeof(*this));
maxBackoff = CLIENT_KNOBS->DEFAULT_MAX_BACKOFF;
sizeLimit = CLIENT_KNOBS->TRANSACTION_SIZE_LIMIT;
tags = TagSet();
readTags = TagSet();
priority = TransactionPriority::DEFAULT;
clear();
lockAware = cx->lockAware;
if (cx->apiVersionAtLeast(630)) {
includePort = true;
@ -2990,7 +2972,6 @@ void Transaction::reset() {
void Transaction::fullReset() {
reset();
info.span = Span(info.span->location);
backoff = CLIENT_KNOBS->DEFAULT_BACKOFF;
}
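The `TransactionOptions` hunk a few lines up replaces `memset(this, 0, sizeof(*this))` with an explicit `clear()` that assigns every field. The header also adds the reminder "update clear function if you add a new field".

Zeroing an object whose members have non-trivial constructors (such as a container) is undefined behavior in C++, which is one plausible motivation for the change. Below is a reduced sketch of the pattern. It uses a hypothetical `Options` struct, a `std::vector` standing in for `TagSet`, and a made-up `kDefaultMaxBackoff` in place of the client knob.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for CLIENT_KNOBS->DEFAULT_MAX_BACKOFF.
constexpr double kDefaultMaxBackoff = 1.0;

// Reduced, hypothetical options struct; field names echo the diff.
struct Options {
    double maxBackoff;
    bool lockAware;
    bool readOnly;
    std::vector<std::string> tags;  // non-trivial member: memset(this, 0, ...) would be UB

    Options() { clear(); }

    // Reuse the object; lockAware comes from the caller (cx->lockAware in the diff).
    void reset(bool contextLockAware) {
        clear();
        lockAware = contextLockAware;
    }

private:
    // Assign every field explicitly; update this when adding a field.
    void clear() {
        maxBackoff = kDefaultMaxBackoff;
        lockAware = false;
        readOnly = false;
        tags.clear();
    }
};
```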
@ -3107,8 +3088,6 @@ ACTOR void checkWrites( Database cx, Future<Void> committed, Promise<Void> outCo
ACTOR static Future<Void> commitDummyTransaction( Database cx, KeyRange range, TransactionInfo info, TransactionOptions options ) {
state Transaction tr(cx);
state int retries = 0;
state Span span("NAPI:dummyTransaction"_loc, info.span);
tr.info.span->parents.insert(span->context);
loop {
try {
TraceEvent("CommitDummyTransaction").detail("Key", range.begin).detail("Retries", retries);
@ -3155,8 +3134,6 @@ void Transaction::setupWatches() {
ACTOR static Future<Void> tryCommit( Database cx, Reference<TransactionLogInfo> trLogInfo, CommitTransactionRequest req, Future<Version> readVersion, TransactionInfo info, Version* pCommittedVersion, Transaction* tr, TransactionOptions options) {
state TraceInterval interval( "TransactionCommit" );
state double startTime = now();
state Span span("NAPI:tryCommit"_loc, { info.span->context });
req.spanContext = span->context;
if (info.debugID.present())
TraceEvent(interval.begin()).detail( "Parent", info.debugID.get() );
try {
@ -3580,14 +3557,6 @@ void Transaction::setOption( FDBTransactionOptions::Option option, Optional<Stri
options.readTags.addTag(value.get());
break;
case FDBTransactionOptions::SPAN_PARENT:
validateOptionValue(value, true);
if (value.get().size() != 16) {
throw invalid_option_value();
}
info.span->parents.emplace(BinaryReader::fromStringRef<UID>(value.get(), Unversioned()));
break;
case FDBTransactionOptions::REPORT_CONFLICTING_KEYS:
validateOptionValue(value, false);
options.reportConflictingKeys = true;
@ -3598,16 +3567,13 @@ void Transaction::setOption( FDBTransactionOptions::Option option, Optional<Stri
}
}
ACTOR Future<GetReadVersionReply> getConsistentReadVersion(Span parentSpan, DatabaseContext* cx, uint32_t transactionCount,
TransactionPriority priority, uint32_t flags,
TransactionTagMap<uint32_t> tags, Optional<UID> debugID) {
state Span span("NAPI:getConsistentReadVersion"_loc, parentSpan);
ACTOR Future<GetReadVersionReply> getConsistentReadVersion( DatabaseContext *cx, uint32_t transactionCount, TransactionPriority priority, uint32_t flags, TransactionTagMap<uint32_t> tags, Optional<UID> debugID ) {
try {
++cx->transactionReadVersionBatches;
if( debugID.present() )
g_traceBatch.addEvent("TransactionDebug", debugID.get().first(), "NativeAPI.getConsistentReadVersion.Before");
loop {
state GetReadVersionRequest req( span->context, transactionCount, priority, flags, tags, debugID );
state GetReadVersionRequest req( transactionCount, priority, flags, tags, debugID );
choose {
when ( wait( cx->onMasterProxiesChanged() ) ) {}
when ( GetReadVersionReply v = wait( basicLoadBalance( cx->getMasterProxies(flags & GetReadVersionRequest::FLAG_USE_PROVISIONAL_PROXIES), &MasterProxyInterface::getConsistentReadVersion, req, cx->taskID ) ) ) {
@ -3658,7 +3624,6 @@ ACTOR Future<Void> readVersionBatcher( DatabaseContext *cx, FutureStream<Databas
state PromiseStream<double> replyTimes;
state PromiseStream<Error> _errorStream;
state double batchTime = 0;
state Span span("NAPI:readVersionBatcher"_loc);
loop {
send_batch = false;
choose {
@ -3669,7 +3634,6 @@ ACTOR Future<Void> readVersionBatcher( DatabaseContext *cx, FutureStream<Databas
}
g_traceBatch.addAttach("TransactionAttachID", req.debugID.get().first(), debugID.get().first());
}
span->parents.insert(req.spanContext);
requests.push_back(req.reply);
for(auto tag : req.tags) {
++tags[tag];
@ -3697,10 +3661,9 @@ ACTOR Future<Void> readVersionBatcher( DatabaseContext *cx, FutureStream<Databas
addActor.send(ready(timeReply(GRVReply.getFuture(), replyTimes)));
Future<Void> batch = incrementalBroadcastWithError(
getConsistentReadVersion(span, cx, count, priority, flags, std::move(tags), std::move(debugID)),
getConsistentReadVersion(cx, count, priority, flags, std::move(tags), std::move(debugID)),
std::move(requests), CLIENT_KNOBS->BROADCAST_BATCH_SIZE);
span = Span("NAPI:readVersionBatcher"_loc);
tags.clear();
debugID = Optional<UID>();
requests.clear();
@ -3710,11 +3673,7 @@ ACTOR Future<Void> readVersionBatcher( DatabaseContext *cx, FutureStream<Databas
}
}
ACTOR Future<Version> extractReadVersion(Span parentSpan, DatabaseContext* cx, TransactionPriority priority,
Reference<TransactionLogInfo> trLogInfo, Future<GetReadVersionReply> f,
bool lockAware, double startTime, Promise<Optional<Value>> metadataVersion,
TagSet tags) {
// parentSpan here is only used to keep the parent alive until the request completes
ACTOR Future<Version> extractReadVersion(DatabaseContext* cx, TransactionPriority priority, Reference<TransactionLogInfo> trLogInfo, Future<GetReadVersionReply> f, bool lockAware, double startTime, Promise<Optional<Value>> metadataVersion, TagSet tags) {
GetReadVersionReply rep = wait(f);
double latency = now() - startTime;
cx->GRVLatencies.addSample(latency);
@@ -3836,12 +3795,10 @@ Future<Version> Transaction::getReadVersion(uint32_t flags) {
batcher.actor = readVersionBatcher( cx.getPtr(), batcher.stream.getFuture(), options.priority, flags );
}
Span span("NAPI:getReadVersion"_loc, info.span);
auto const req = DatabaseContext::VersionRequest(span->context, options.tags, info.debugID);
auto const req = DatabaseContext::VersionRequest(options.tags, info.debugID);
batcher.stream.send(req);
startTime = now();
readVersion = extractReadVersion(span, cx.getPtr(), options.priority, trLogInfo, req.reply.getFuture(),
options.lockAware, startTime, metadataVersion, options.tags);
readVersion = extractReadVersion( cx.getPtr(), options.priority, trLogInfo, req.reply.getFuture(), options.lockAware, startTime, metadataVersion, options.tags);
}
return readVersion;
}
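In the hunks above, readVersionBatcher collects queued VersionRequests, issues one getConsistentReadVersion call per batch, and broadcasts that single reply to every waiting request. A minimal, synchronous sketch of the same batching pattern in plain C++ follows. VersionBatcher, request, flush and fetchVersion are hypothetical names; the sketch does not model flow actors, batch timing, tags, spans, or the chunked delivery done by incrementalBroadcastWithError.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for readVersionBatcher (not FoundationDB code).
class VersionBatcher {
public:
    // Invoked once per waiter; std::nullopt signals that the batch's request failed.
    using Reply = std::function<void(std::optional<int64_t>)>;

    // Queue a request; it is answered on the next flush().
    void request(Reply onReply) { pending_.push_back(std::move(onReply)); }

    // Issue ONE backend call for everything queued so far and broadcast its result.
    // Returns how many waiters were answered.
    std::size_t flush(const std::function<std::optional<int64_t>()>& fetchVersion) {
        if (pending_.empty()) return 0;
        std::vector<Reply> batch;
        batch.swap(pending_);  // requests arriving later go into a fresh batch
        std::optional<int64_t> version = fetchVersion();
        for (auto& reply : batch) reply(version);
        return batch.size();
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::vector<Reply> pending_;
};
```

The point of the pattern is that N concurrent callers cost one round trip to the proxies instead of N.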


@@ -19,8 +19,6 @@
*/
#pragma once
#include "flow/IRandom.h"
#include "flow/Tracing.h"
#if defined(NO_INTELLISENSE) && !defined(FDBCLIENT_NATIVEAPI_ACTOR_G_H)
#define FDBCLIENT_NATIVEAPI_ACTOR_G_H
#include "fdbclient/NativeAPI.actor.g.h"
@@ -139,26 +137,28 @@ struct TransactionOptions {
TagSet tags; // All tags set on transaction
TagSet readTags; // Tags that can be sent with read requests
// update clear function if you add a new field
TransactionOptions(Database const& cx);
TransactionOptions();
void reset(Database const& cx);
private:
void clear();
};
class ReadYourWritesTransaction; // workaround cyclic dependency
struct TransactionInfo {
Optional<UID> debugID;
TaskPriority taskID;
Span span;
bool useProvisionalProxies;
// Used to save conflicting keys if FDBTransactionOptions::REPORT_CONFLICTING_KEYS is enabled
// prefix/<key1> : '1' - any keys equal or larger than this key are (probably) conflicting keys
// prefix/<key2> : '0' - any keys equal or larger than this key are (definitely) not conflicting keys
std::shared_ptr<CoalescedKeyRangeMap<Value>> conflictingKeys;
explicit TransactionInfo(TaskPriority taskID)
: taskID(taskID), span(deterministicRandom()->randomUniqueID(), "Transaction"_loc), useProvisionalProxies(false) {
}
explicit TransactionInfo( TaskPriority taskID ) : taskID(taskID), useProvisionalProxies(false) {}
};
struct TransactionLogInfo : public ReferenceCounted<TransactionLogInfo>, NonCopyable {
@@ -334,7 +334,7 @@ private:
Future<Void> committing;
};
ACTOR Future<Version> waitForCommittedVersion(Database cx, Version version, SpanID spanContext);
ACTOR Future<Version> waitForCommittedVersion(Database cx, Version version);
ACTOR Future<Standalone<VectorRef<DDMetricsRef>>> waitDataDistributionMetricsList(Database cx, KeyRange keys,
int shardLimit);


@@ -150,7 +150,7 @@ const KeyRef JSONSchemas::statusSchema = LiteralStringRef(R"statusSchema(
"fractional_cost": 0.0,
"estimated_cost":{
"hz": 0.0
}
}
}
}
],
@@ -612,6 +612,7 @@ const KeyRef JSONSchemas::statusSchema = LiteralStringRef(R"statusSchema(
"ssd-1",
"ssd-2",
"ssd-redwood-experimental",
"ssd-rocksdb-experimental",
"memory",
"memory-1",
"memory-2",


@@ -37,13 +37,17 @@ std::unordered_map<SpecialKeySpace::MODULE, KeyRange> SpecialKeySpace::moduleToB
// This function will move the given KeySelector as far as possible to the standard form:
// orEqual == false && offset == 1 (Standard form)
// If the corresponding key is not in the underlying key range, it will move over the range
// The cache object stores the result of the first rpc read made during key resolution;
// any later key resolution or result filtering reads from this cache object instead of
// issuing another rpc call, so the results stay consistent
ACTOR Future<Void> moveKeySelectorOverRangeActor(const SpecialKeyRangeBaseImpl* skrImpl, ReadYourWritesTransaction* ryw,
KeySelector* ks) {
KeySelector* ks, Optional<Standalone<RangeResultRef>>* cache) {
ASSERT(!ks->orEqual); // should be removed before calling
ASSERT(ks->offset != 1); // never being called if KeySelector is already normalized
state Key startKey(skrImpl->getKeyRange().begin);
state Key endKey(skrImpl->getKeyRange().end);
state Standalone<RangeResultRef> result;
if (ks->offset < 1) {
// less than the given key
@@ -60,7 +64,15 @@ ACTOR Future<Void> moveKeySelectorOverRangeActor(const SpecialKeyRangeBaseImpl*
.detail("SpecialKeyRangeStart", skrImpl->getKeyRange().begin)
.detail("SpecialKeyRangeEnd", skrImpl->getKeyRange().end);
Standalone<RangeResultRef> result = wait(skrImpl->getRange(ryw, KeyRangeRef(startKey, endKey)));
if (skrImpl->isAsync()) {
const SpecialKeyRangeAsyncImpl* ptr = dynamic_cast<const SpecialKeyRangeAsyncImpl*>(skrImpl);
Standalone<RangeResultRef> result_ = wait(ptr->getRange(ryw, KeyRangeRef(startKey, endKey), cache));
result = result_;
} else {
Standalone<RangeResultRef> result_ = wait(skrImpl->getRange(ryw, KeyRangeRef(startKey, endKey)));
result = result_;
}
if (result.size() == 0) {
TraceEvent(SevDebug, "ZeroElementsIntheRange").detail("Start", startKey).detail("End", endKey);
return Void();
@@ -107,7 +119,8 @@ void onModuleRead(ReadYourWritesTransaction* ryw, SpecialKeySpace::MODULE module
// to maintain; Thus, separate each part to make the code easy to understand and more compact
ACTOR Future<Void> normalizeKeySelectorActor(SpecialKeySpace* sks, ReadYourWritesTransaction* ryw, KeySelector* ks,
Optional<SpecialKeySpace::MODULE>* lastModuleRead, int* actualOffset,
Standalone<RangeResultRef>* result) {
Standalone<RangeResultRef>* result,
Optional<Standalone<RangeResultRef>>* cache) {
state RangeMap<Key, SpecialKeyRangeBaseImpl*, KeyRangeRef>::Iterator iter =
ks->offset < 1 ? sks->getImpls().rangeContainingKeyBefore(ks->getKey())
: sks->getImpls().rangeContaining(ks->getKey());
@@ -115,7 +128,7 @@ ACTOR Future<Void> normalizeKeySelectorActor(SpecialKeySpace* sks, ReadYourWrite
(ks->offset > 1 && iter != sks->getImpls().ranges().end())) {
onModuleRead(ryw, sks->getModules().rangeContaining(iter->begin())->value(), *lastModuleRead);
if (iter->value() != nullptr) {
wait(moveKeySelectorOverRangeActor(iter->value(), ryw, ks));
wait(moveKeySelectorOverRangeActor(iter->value(), ryw, ks, cache));
}
ks->offset < 1 ? --iter : ++iter;
}
@@ -164,13 +177,16 @@ SpecialKeySpace::getRangeAggregationActor(SpecialKeySpace* sks, ReadYourWritesTr
// This function handles ranges which cover more than one keyrange and aggregates all results
// KeySelector, GetRangeLimits and reverse are all handled here
state Standalone<RangeResultRef> result;
state Standalone<RangeResultRef> pairs;
state RangeMap<Key, SpecialKeyRangeBaseImpl*, KeyRangeRef>::Iterator iter;
state int actualBeginOffset;
state int actualEndOffset;
state Optional<SpecialKeySpace::MODULE> lastModuleRead;
// used to cache result from potential first read
state Optional<Standalone<RangeResultRef>> cache;
wait(normalizeKeySelectorActor(sks, ryw, &begin, &lastModuleRead, &actualBeginOffset, &result));
wait(normalizeKeySelectorActor(sks, ryw, &end, &lastModuleRead, &actualEndOffset, &result));
wait(normalizeKeySelectorActor(sks, ryw, &begin, &lastModuleRead, &actualBeginOffset, &result, &cache));
wait(normalizeKeySelectorActor(sks, ryw, &end, &lastModuleRead, &actualEndOffset, &result, &cache));
// Handle all corner cases like what RYW does
// return if range inverted
if (actualBeginOffset >= actualEndOffset && begin.getKey() >= end.getKey()) {
@@ -195,7 +211,14 @@ SpecialKeySpace::getRangeAggregationActor(SpecialKeySpace* sks, ReadYourWritesTr
KeyRangeRef kr = iter->range();
KeyRef keyStart = kr.contains(begin.getKey()) ? begin.getKey() : kr.begin;
KeyRef keyEnd = kr.contains(end.getKey()) ? end.getKey() : kr.end;
Standalone<RangeResultRef> pairs = wait(iter->value()->getRange(ryw, KeyRangeRef(keyStart, keyEnd)));
if (iter->value()->isAsync() && cache.present()) {
const SpecialKeyRangeAsyncImpl* ptr = dynamic_cast<const SpecialKeyRangeAsyncImpl*>(iter->value());
Standalone<RangeResultRef> pairs_ = wait(ptr->getRange(ryw, KeyRangeRef(keyStart, keyEnd), &cache));
pairs = pairs_;
} else {
Standalone<RangeResultRef> pairs_ = wait(iter->value()->getRange(ryw, KeyRangeRef(keyStart, keyEnd)));
pairs = pairs_;
}
result.arena().dependsOn(pairs.arena());
// limits handler
for (int i = pairs.size() - 1; i >= 0; --i) {
@@ -218,7 +241,14 @@ SpecialKeySpace::getRangeAggregationActor(SpecialKeySpace* sks, ReadYourWritesTr
KeyRangeRef kr = iter->range();
KeyRef keyStart = kr.contains(begin.getKey()) ? begin.getKey() : kr.begin;
KeyRef keyEnd = kr.contains(end.getKey()) ? end.getKey() : kr.end;
Standalone<RangeResultRef> pairs = wait(iter->value()->getRange(ryw, KeyRangeRef(keyStart, keyEnd)));
if (iter->value()->isAsync() && cache.present()) {
const SpecialKeyRangeAsyncImpl* ptr = dynamic_cast<const SpecialKeyRangeAsyncImpl*>(iter->value());
Standalone<RangeResultRef> pairs_ = wait(ptr->getRange(ryw, KeyRangeRef(keyStart, keyEnd), &cache));
pairs = pairs_;
} else {
Standalone<RangeResultRef> pairs_ = wait(iter->value()->getRange(ryw, KeyRangeRef(keyStart, keyEnd)));
pairs = pairs_;
}
result.arena().dependsOn(pairs.arena());
// limits handler
for (int i = 0; i < pairs.size(); ++i) {
@@ -316,7 +346,7 @@ Future<Standalone<RangeResultRef>> ConflictingKeysImpl::getRange(ReadYourWritesT
return result;
}
ACTOR Future<Standalone<RangeResultRef>> ddStatsGetRangeActor(ReadYourWritesTransaction* ryw, KeyRangeRef kr) {
ACTOR Future<Standalone<RangeResultRef>> ddMetricsGetRangeActor(ReadYourWritesTransaction* ryw, KeyRangeRef kr) {
try {
auto keys = kr.removePrefix(ddStatsRange.begin);
Standalone<VectorRef<DDMetricsRef>> resultWithoutPrefix =
@@ -339,10 +369,10 @@ ACTOR Future<Standalone<RangeResultRef>> ddStatsGetRangeActor(ReadYourWritesTran
}
}
DDStatsRangeImpl::DDStatsRangeImpl(KeyRangeRef kr) : SpecialKeyRangeBaseImpl(kr) {}
DDStatsRangeImpl::DDStatsRangeImpl(KeyRangeRef kr) : SpecialKeyRangeAsyncImpl(kr) {}
Future<Standalone<RangeResultRef>> DDStatsRangeImpl::getRange(ReadYourWritesTransaction* ryw, KeyRangeRef kr) const {
return ddStatsGetRangeActor(ryw, kr);
return ddMetricsGetRangeActor(ryw, kr);
}
class SpecialKeyRangeTestImpl : public SpecialKeyRangeBaseImpl {
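moveKeySelectorOverRangeActor and normalizeKeySelectorActor push a selector toward the standard form orEqual == false, offset == 1. For reference, FoundationDB documents selector resolution as: find the last key less than the reference key (or equal to it, if orEqual is set), then move forward by offset keys (backward if offset is negative). The minimal sketch below applies that rule to an in-memory sorted key list. SimpleKeySelector, resolve and makeFirstGreaterOrEqual are hypothetical helpers, not the fdbclient API, and they ignore module boundaries and the cache.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a key selector (not fdbclient's KeySelector type).
struct SimpleKeySelector {
    std::string key;
    bool orEqual;
    int offset;
};

// Resolve a selector against a sorted, duplicate-free key list.
// Returns an index into sortedKeys; -1 means "before the first key" and
// sortedKeys.size() means "past the last key".
long resolve(const std::vector<std::string>& sortedKeys, const SimpleKeySelector& sel) {
    // Locate the last key < sel.key (or <= sel.key when orEqual is set).
    auto it = sel.orEqual ? std::upper_bound(sortedKeys.begin(), sortedKeys.end(), sel.key)
                          : std::lower_bound(sortedKeys.begin(), sortedKeys.end(), sel.key);
    long lastIndex = static_cast<long>(it - sortedKeys.begin()) - 1;  // -1 if no such key
    // Then move by offset keys, clamping to the ends of the list.
    long n = static_cast<long>(sortedKeys.size());
    return std::clamp(lastIndex + sel.offset, -1L, n);
}

// Standard form (orEqual == false, offset == 1): the first key >= the given key.
SimpleKeySelector makeFirstGreaterOrEqual(std::string key) { return {std::move(key), false, 1}; }
```

In this model, normalizing to the standard form amounts to rewriting any selector so that it resolves via a plain lower bound, which is what lets the special key space hand each module a simple key range.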


@@ -40,6 +40,10 @@ public:
explicit SpecialKeyRangeBaseImpl(KeyRangeRef kr) : range(kr) {}
KeyRangeRef getKeyRange() const { return range; }
// true if the getRange call can emit more than one rpc call;
// we cache the results to keep them consistent within the same getRange lifetime
// TODO : give this function a more descriptive name
virtual bool isAsync() const { return false; }
virtual ~SpecialKeyRangeBaseImpl() {}
@@ -47,6 +51,45 @@ protected:
KeyRange range; // underlying key range for this function
};
class SpecialKeyRangeAsyncImpl : public SpecialKeyRangeBaseImpl {
public:
explicit SpecialKeyRangeAsyncImpl(KeyRangeRef kr) : SpecialKeyRangeBaseImpl(kr) {}
Future<Standalone<RangeResultRef>> getRange(ReadYourWritesTransaction* ryw, KeyRangeRef kr) const = 0;
// calling with a cache object to have consistent results if we need to call rpc
Future<Standalone<RangeResultRef>> getRange(ReadYourWritesTransaction* ryw, KeyRangeRef kr,
Optional<Standalone<RangeResultRef>>* cache) const {
return getRangeAsyncActor(this, ryw, kr, cache);
}
bool isAsync() const override { return true; }
ACTOR static Future<Standalone<RangeResultRef>> getRangeAsyncActor(const SpecialKeyRangeBaseImpl* skrAyncImpl,
ReadYourWritesTransaction* ryw, KeyRangeRef kr,
Optional<Standalone<RangeResultRef>>* cache) {
ASSERT(skrAyncImpl->getKeyRange().contains(kr));
ASSERT(cache != nullptr);
if (!cache->present()) {
// For simplicity, every time we need to cache, we read the whole range
// Although sometimes the range can be narrowed,
// there is not a general way to do it in complicated scenarios
Standalone<RangeResultRef> result_ = wait(skrAyncImpl->getRange(ryw, skrAyncImpl->getKeyRange()));
*cache = result_;
}
const auto& allResults = cache->get();
int start = 0, end = allResults.size();
while (start < allResults.size() && allResults[start].key < kr.begin) ++start;
while (end > 0 && allResults[end - 1].key >= kr.end) --end;
if (start < end) {
Standalone<RangeResultRef> result = RangeResultRef(allResults.slice(start, end), false);
result.arena().dependsOn(allResults.arena());
return result;
} else
return Standalone<RangeResultRef>();
}
};
class SpecialKeySpace {
public:
enum class MODULE {
@@ -148,7 +191,7 @@ public:
Future<Standalone<RangeResultRef>> getRange(ReadYourWritesTransaction* ryw, KeyRangeRef kr) const override;
};
class DDStatsRangeImpl : public SpecialKeyRangeBaseImpl {
class DDStatsRangeImpl : public SpecialKeyRangeAsyncImpl {
public:
explicit DDStatsRangeImpl(KeyRangeRef kr);
Future<Standalone<RangeResultRef>> getRange(ReadYourWritesTransaction* ryw, KeyRangeRef kr) const override;
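SpecialKeyRangeAsyncImpl::getRangeAsyncActor above reads the implementation's whole key range into the caller-owned cache on first use, then answers each sub-range request by slicing that snapshot, so every read within one getRange call sees the same data. The sketch below reproduces only that read-once-then-slice logic with standard containers, assuming rows arrive sorted by key as the original loop does. getRangeCached, Rows and fetchWholeRange are hypothetical names; there are no actors, arenas or transactions here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, std::string>;
using Rows = std::vector<KV>;  // assumed sorted by key

// Hypothetical analogue of getRangeAsyncActor: fetch the whole range once per cache
// object, then serve every [begin, end) request from that snapshot.
Rows getRangeCached(const std::function<Rows()>& fetchWholeRange,
                    const std::string& begin, const std::string& end,
                    std::optional<Rows>* cache) {
    assert(cache != nullptr);
    if (!cache->has_value()) {
        *cache = fetchWholeRange();  // the only "rpc" made for this cache object
    }
    const Rows& all = **cache;
    std::size_t start = 0, stop = all.size();
    while (start < all.size() && all[start].first < begin) ++start;  // drop keys < begin
    while (stop > 0 && all[stop - 1].first >= end) --stop;            // drop keys >= end
    if (start >= stop) return {};
    return Rows(all.begin() + start, all.begin() + stop);
}
```

As the original comment notes, reading the whole range is a simplification; narrowing the first read would save work but makes later sub-range requests harder to serve from the cache.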


@@ -169,7 +169,6 @@ struct GetValueReply : public LoadBalancedReply {
struct GetValueRequest : TimedRequest {
constexpr static FileIdentifier file_identifier = 8454530;
SpanID spanContext;
Key key;
Version version;
Optional<TagSet> tags;
@@ -177,12 +176,11 @@ struct GetValueRequest : TimedRequest {
ReplyPromise<GetValueReply> reply;
GetValueRequest(){}
GetValueRequest(SpanID spanContext, const Key& key, Version ver, Optional<TagSet> tags, Optional<UID> debugID)
: spanContext(spanContext), key(key), version(ver), tags(tags), debugID(debugID) {}
template <class Ar>
GetValueRequest(const Key& key, Version ver, Optional<TagSet> tags, Optional<UID> debugID) : key(key), version(ver), tags(tags), debugID(debugID) {}
template <class Ar>
void serialize( Ar& ar ) {
serializer(ar, key, version, tags, debugID, reply, spanContext);
serializer(ar, key, version, tags, debugID, reply);
}
};
@@ -202,7 +200,6 @@ struct WatchValueReply {
struct WatchValueRequest {
constexpr static FileIdentifier file_identifier = 14747733;
SpanID spanContext;
Key key;
Optional<Value> value;
Version version;
@@ -211,13 +208,11 @@ struct WatchValueRequest {
ReplyPromise<WatchValueReply> reply;
WatchValueRequest(){}
WatchValueRequest(SpanID spanContext, const Key& key, Optional<Value> value, Version ver, Optional<TagSet> tags,
Optional<UID> debugID)
: spanContext(spanContext), key(key), value(value), version(ver), tags(tags), debugID(debugID) {}
template <class Ar>
WatchValueRequest(const Key& key, Optional<Value> value, Version ver, Optional<TagSet> tags, Optional<UID> debugID) : key(key), value(value), version(ver), tags(tags), debugID(debugID) {}
template <class Ar>
void serialize( Ar& ar ) {
serializer(ar, key, value, version, tags, debugID, reply, spanContext);
serializer(ar, key, value, version, tags, debugID, reply);
}
};
@@ -239,7 +234,6 @@ struct GetKeyValuesReply : public LoadBalancedReply {
struct GetKeyValuesRequest : TimedRequest {
constexpr static FileIdentifier file_identifier = 6795746;
SpanID spanContext;
Arena arena;
KeySelectorRef begin, end;
Version version; // or latestVersion
@@ -252,7 +246,7 @@ struct GetKeyValuesRequest : TimedRequest {
GetKeyValuesRequest() : isFetchKeys(false) {}
template <class Ar>
void serialize( Ar& ar ) {
serializer(ar, begin, end, version, limit, limitBytes, isFetchKeys, tags, debugID, reply, spanContext, arena);
serializer(ar, begin, end, version, limit, limitBytes, isFetchKeys, tags, debugID, reply, arena);
}
};
@@ -272,7 +266,6 @@ struct GetKeyReply : public LoadBalancedReply {
struct GetKeyRequest : TimedRequest {
constexpr static FileIdentifier file_identifier = 10457870;
SpanID spanContext;
Arena arena;
KeySelectorRef sel;
Version version; // or latestVersion
@@ -281,13 +274,11 @@ struct GetKeyRequest : TimedRequest {
ReplyPromise<GetKeyReply> reply;
GetKeyRequest() {}
GetKeyRequest(SpanID spanContext, KeySelectorRef const& sel, Version version, Optional<TagSet> tags,
Optional<UID> debugID)
: spanContext(spanContext), sel(sel), version(version), debugID(debugID) {}
GetKeyRequest(KeySelectorRef const& sel, Version version, Optional<TagSet> tags, Optional<UID> debugID) : sel(sel), version(version), debugID(debugID) {}
template <class Ar>
void serialize( Ar& ar ) {
serializer(ar, sel, version, tags, debugID, reply, spanContext, arena);
serializer(ar, sel, version, tags, debugID, reply, arena);
}
};


@@ -268,8 +268,6 @@ description is not currently required but encouraged.
description="Adds a tag to the transaction that can be used to apply manual targeted throttling. At most 5 tags can be set on a transaction." />
<Option name="auto_throttle_tag" code="801" paramType="String" paramDescription="String identifier used to associated this transaction with a throttling group. Must not exceed 16 characters."
description="Adds a tag to the transaction that can be used to apply manual or automatic targeted throttling. At most 5 tags can be set on a transaction." />
<Option name="span_parent" code="900" paramType="Bytes" paramDescription="A byte string of length 16 used to associate the span of this transaction with a parent"
description="Adds a parent to the Span of this transaction. Used for transaction tracing. A span can be identified with any 16 bytes"/>
</Scope>
<!-- The enumeration values matter - do not change them without


@@ -32,8 +32,6 @@
#include "fdbserver/WorkerInterface.actor.h"
#include "flow/Error.h"
#include "flow/IRandom.h"
#include "flow/Tracing.h"
#include "flow/actorcompiler.h" // This must be the last #include.
#define SevDebugMemory SevVerbose
@@ -431,9 +429,8 @@ struct BackupData {
}
ACTOR static Future<Version> _getMinKnownCommittedVersion(BackupData* self) {
state Span span(deterministicRandom()->randomUniqueID(), "BA:GetMinCommittedVersion"_loc);
loop {
GetReadVersionRequest request(span->context, 1, TransactionPriority::DEFAULT,
GetReadVersionRequest request(1, TransactionPriority::DEFAULT,
GetReadVersionRequest::FLAG_USE_MIN_KNOWN_COMMITTED_VERSION);
choose {
when(wait(self->cx->onMasterProxiesChanged())) {}


@@ -29,6 +29,7 @@ set(FDBSERVER_SRCS
IVersionedStore.h
KeyValueStoreCompressTestData.actor.cpp
KeyValueStoreMemory.actor.cpp
KeyValueStoreRocksDB.actor.cpp
KeyValueStoreSQLite.actor.cpp
Knobs.cpp
Knobs.h


@@ -215,10 +215,6 @@ TEST_CASE("/fdbserver/Coordination/localGenerationReg/simple") {
}
ACTOR Future<Void> openDatabase(ClientData* db, int* clientCount, Reference<AsyncVar<bool>> hasConnectedClients, OpenDatabaseCoordRequest req) {
if(db->clientInfo->get().read().id != req.knownClientInfoID && !db->clientInfo->get().read().forward.present()) {
req.reply.send( db->clientInfo->get() );
return Void();
}
++(*clientCount);
hasConnectedClients->set(true);
@@ -247,11 +243,6 @@ ACTOR Future<Void> openDatabase(ClientData* db, int* clientCount, Reference<Asyn
}
ACTOR Future<Void> remoteMonitorLeader( int* clientCount, Reference<AsyncVar<bool>> hasConnectedClients, Reference<AsyncVar<Optional<LeaderInfo>>> currentElectedLeader, ElectionResultRequest req ) {
if (currentElectedLeader->get().present() && req.knownLeader != currentElectedLeader->get().get().changeID) {
req.reply.send( currentElectedLeader->get() );
return Void();
}
++(*clientCount);
hasConnectedClients->set(true);
@@ -293,16 +284,24 @@ ACTOR Future<Void> leaderRegister(LeaderElectionRegInterface interf, Key key) {
loop choose {
when ( OpenDatabaseCoordRequest req = waitNext( interf.openDatabase.getFuture() ) ) {
if(!leaderMon.isValid()) {
leaderMon = monitorLeaderForProxies(req.clusterKey, req.coordinators, &clientData, currentElectedLeader);
if (clientData.clientInfo->get().read().id != req.knownClientInfoID && !clientData.clientInfo->get().read().forward.present()) {
req.reply.send(clientData.clientInfo->get());
} else {
if(!leaderMon.isValid()) {
leaderMon = monitorLeaderForProxies(req.clusterKey, req.coordinators, &clientData, currentElectedLeader);
}
actors.add(openDatabase(&clientData, &clientCount, hasConnectedClients, req));
}
actors.add(openDatabase(&clientData, &clientCount, hasConnectedClients, req));
}
when ( ElectionResultRequest req = waitNext( interf.electionResult.getFuture() ) ) {
if(!leaderMon.isValid()) {
leaderMon = monitorLeaderForProxies(req.key, req.coordinators, &clientData, currentElectedLeader);
if (currentElectedLeader->get().present() && req.knownLeader != currentElectedLeader->get().get().changeID) {
req.reply.send(currentElectedLeader->get());
} else {
if(!leaderMon.isValid()) {
leaderMon = monitorLeaderForProxies(req.key, req.coordinators, &clientData, currentElectedLeader);
}
actors.add(remoteMonitorLeader(&clientCount, hasConnectedClients, currentElectedLeader, req));
}
actors.add( remoteMonitorLeader( &clientCount, hasConnectedClients, currentElectedLeader, req ) );
}
when ( GetLeaderRequest req = waitNext( interf.getLeader.getFuture() ) ) {
if (currentNominee.present() && currentNominee.get().changeID != req.knownLeader) {
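In the leaderRegister hunk, the stale-ID check moves out of openDatabase and remoteMonitorLeader and into the request loop: a client whose knownClientInfoID (or knownLeader) is already out of date gets an immediate reply, and only up-to-date clients start the leader monitor and wait for a change. A minimal long-poll sketch of that check-then-park pattern follows. LongPollRegistry, open and update are hypothetical names, and the sketch leaves out the forward check, client counting and leader monitoring.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical long-poll registry (not FoundationDB code).
class LongPollRegistry {
public:
    using Reply = std::function<void(uint64_t)>;

    explicit LongPollRegistry(uint64_t currentId) : currentId_(currentId) {}

    // Answer at once if the client's view is stale; otherwise park the request.
    // Returns true if the request was answered immediately.
    bool open(uint64_t knownId, Reply reply) {
        if (knownId != currentId_) {
            reply(currentId_);
            return true;
        }
        waiters_.push_back(std::move(reply));
        return false;
    }

    // Publish a new value and wake every parked request.
    void update(uint64_t newId) {
        currentId_ = newId;
        std::vector<Reply> toWake;
        toWake.swap(waiters_);
        for (auto& r : toWake) r(currentId_);
    }

    std::size_t waiting() const { return waiters_.size(); }

private:
    uint64_t currentId_;
    std::vector<Reply> waiters_;
};
```

Doing the check before parking means a stale request never allocates per-request state, which mirrors why the diff no longer spawns an actor for it.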


@@ -822,7 +822,9 @@ ACTOR Future<Void> fetchShardMetricsList_impl( DataDistributionTracker* self, Ge
// list of metrics, regenerate on loop when full range unsuccessful
Standalone<VectorRef<DDMetricsRef>> result;
Future<Void> onChange;
for (auto t : self->shards.containedRanges(req.keys)) {
auto beginIter = self->shards.containedRanges(req.keys).begin();
auto endIter = self->shards.intersectingRanges(req.keys).end();
for (auto t = beginIter; t != endIter; ++t) {
auto &stats = t.value().stats;
if( !stats->get().present() ) {
onChange = stats->onChange();
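The fetchShardMetricsList_impl loop now starts at containedRanges(req.keys).begin() but stops at intersectingRanges(req.keys).end(), which appears intended to also visit a shard that straddles the end of req.keys. Assuming those names mean what they suggest (check flow's RangeMap for the exact semantics), the sketch below shows the difference on a toy shard map where shard i covers [bounds[i], bounds[i+1]). containedShards and intersectingShards are hypothetical helpers, not the RangeMap API.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Range = std::pair<std::string, std::string>;  // [begin, end)

// Shards whose whole extent lies inside keys.
std::vector<std::size_t> containedShards(const std::vector<std::string>& bounds, const Range& keys) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i + 1 < bounds.size(); ++i)
        if (bounds[i] >= keys.first && bounds[i + 1] <= keys.second) out.push_back(i);
    return out;
}

// Shards that overlap keys at all, including partial overlaps at either end.
std::vector<std::size_t> intersectingShards(const std::vector<std::string>& bounds, const Range& keys) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i + 1 < bounds.size(); ++i)
        if (bounds[i] < keys.second && bounds[i + 1] > keys.first) out.push_back(i);
    return out;
}
```

With bounds {"", "c", "f", "k", "z"} and keys ["b", "g"), only shard 1 ([c, f)) is contained, while shards 0, 1 and 2 intersect; a loop bounded by the contained set alone would miss the partial shard [f, k).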


@@ -87,6 +87,7 @@ protected:
extern IKeyValueStore* keyValueStoreSQLite( std::string const& filename, UID logID, KeyValueStoreType storeType, bool checkChecksums=false, bool checkIntegrity=false );
extern IKeyValueStore* keyValueStoreRedwoodV1( std::string const& filename, UID logID);
extern IKeyValueStore* keyValueStoreRocksDB(std::string const& path, UID logID, KeyValueStoreType storeType, bool checkChecksums=false, bool checkIntegrity=false);
extern IKeyValueStore* keyValueStoreMemory(std::string const& basename, UID logID, int64_t memoryLimit,
std::string ext = "fdq",
KeyValueStoreType storeType = KeyValueStoreType::MEMORY);
@@ -102,8 +103,10 @@ inline IKeyValueStore* openKVStore( KeyValueStoreType storeType, std::string con
return keyValueStoreMemory( filename, logID, memoryLimit );
case KeyValueStoreType::SSD_REDWOOD_V1:
return keyValueStoreRedwoodV1( filename, logID );
case KeyValueStoreType::MEMORY_RADIXTREE:
return keyValueStoreMemory(filename, logID, memoryLimit, "fdr", KeyValueStoreType::MEMORY_RADIXTREE); // for radixTree type, set file ext to "fdr"
case KeyValueStoreType::SSD_ROCKSDB_V1:
return keyValueStoreRocksDB(filename, logID, storeType);
case KeyValueStoreType::MEMORY_RADIXTREE:
return keyValueStoreMemory(filename, logID, memoryLimit, "fdr", KeyValueStoreType::MEMORY_RADIXTREE); // for radixTree type, set file ext to "fdr"
default:
UNREACHABLE();
}


@@ -0,0 +1,431 @@
#ifdef SSD_ROCKSDB_EXPERIMENTAL
#include <rocksdb/env.h>
#include <rocksdb/db.h>
#include "flow/flow.h"
#include "fdbrpc/AsyncFileCached.actor.h"
#include "fdbserver/CoroFlow.h"
#endif // SSD_ROCKSDB_EXPERIMENTAL
#include "fdbserver/IKeyValueStore.h"
#include "flow/actorcompiler.h" // has to be last include
#ifdef SSD_ROCKSDB_EXPERIMENTAL
namespace {
class FlowLogger : public rocksdb::Logger, public FastAllocated<FlowLogger> {
UID id;
std::string loggerName;
size_t logSize = 0;
public:
explicit FlowLogger(UID id, const std::string& loggerName, const rocksdb::InfoLogLevel log_level = rocksdb::InfoLogLevel::INFO_LEVEL)
: rocksdb::Logger(log_level)
, id(id)
, loggerName(loggerName) {}
rocksdb::Status Close() override { return rocksdb::Status::OK(); }
void Logv(const char* fmtString, va_list ap) override {
Logv(rocksdb::InfoLogLevel::INFO_LEVEL, fmtString, ap);
}
void Logv(const rocksdb::InfoLogLevel log_level, const char* fmtString, va_list ap) override {
Severity sev;
switch (log_level) {
case rocksdb::InfoLogLevel::DEBUG_LEVEL:
sev = SevDebug;
break;
case rocksdb::InfoLogLevel::INFO_LEVEL:
case rocksdb::InfoLogLevel::HEADER_LEVEL:
case rocksdb::InfoLogLevel::NUM_INFO_LOG_LEVELS:
sev = SevInfo;
break;
case rocksdb::InfoLogLevel::WARN_LEVEL:
sev = SevWarn;
break;
case rocksdb::InfoLogLevel::ERROR_LEVEL:
sev = SevWarnAlways;
break;
case rocksdb::InfoLogLevel::FATAL_LEVEL:
sev = SevError;
break;
}
std::string outStr;
auto sz = vsformat(outStr, fmtString, ap);
if (sz < 0) {
TraceEvent(SevError, "RocksDBLogFormatError", id)
.detail("Logger", loggerName)
.detail("FormatString", fmtString);
return;
}
logSize += sz;
TraceEvent(sev, "RocksDBLogMessage", id)
.detail("Msg", outStr);
}
size_t GetLogFileSize() const override {
return logSize;
}
};
rocksdb::Slice toSlice(StringRef s) {
return rocksdb::Slice(reinterpret_cast<const char*>(s.begin()), s.size());
}
StringRef toStringRef(rocksdb::Slice s) {
return StringRef(reinterpret_cast<const uint8_t*>(s.data()), s.size());
}
rocksdb::Options getOptions(const std::string& path) {
rocksdb::Options options;
bool exists = directoryExists(path);
options.create_if_missing = !exists;
return options;
}
rocksdb::ColumnFamilyOptions getCFOptions() {
return {};
}
struct RocksDBKeyValueStore : IKeyValueStore {
using DB = rocksdb::DB*;
using CF = rocksdb::ColumnFamilyHandle*;
struct Writer : IThreadPoolReceiver {
DB& db;
UID id;
explicit Writer(DB& db, UID id) : db(db), id(id) {}
~Writer() {
if (db) {
delete db;
}
}
void init() override {}
Error statusToError(const rocksdb::Status& s) {
if (s == rocksdb::Status::IOError()) {
return io_error();
} else {
return unknown_error();
}
}
struct OpenAction : TypedAction<Writer, OpenAction> {
std::string path;
ThreadReturnPromise<Void> done;
double getTimeEstimate() {
return SERVER_KNOBS->COMMIT_TIME_ESTIMATE;
}
};
void action(OpenAction& a) {
std::vector<rocksdb::ColumnFamilyDescriptor> defaultCF = { rocksdb::ColumnFamilyDescriptor{
"default", getCFOptions() } };
std::vector<rocksdb::ColumnFamilyHandle*> handle;
auto status = rocksdb::DB::Open(getOptions(a.path), a.path, defaultCF, &handle, &db);
if (!status.ok()) {
TraceEvent(SevError, "RocksDBError").detail("Error", status.ToString()).detail("Method", "Open");
a.done.sendError(statusToError(status));
} else {
a.done.send(Void());
}
}
struct CommitAction : TypedAction<Writer, CommitAction> {
std::unique_ptr<rocksdb::WriteBatch> batchToCommit;
ThreadReturnPromise<Void> done;
double getTimeEstimate() override { return SERVER_KNOBS->COMMIT_TIME_ESTIMATE; }
};
void action(CommitAction& a) {
rocksdb::WriteOptions options;
options.sync = true;
auto s = db->Write(options, a.batchToCommit.get());
if (!s.ok()) {
TraceEvent(SevError, "RocksDBError").detail("Error", s.ToString()).detail("Method", "Commit");
a.done.sendError(statusToError(s));
} else {
a.done.send(Void());
}
}
struct CloseAction : TypedAction<Writer, CloseAction> {
ThreadReturnPromise<Void> done;
double getTimeEstimate() override { return SERVER_KNOBS->COMMIT_TIME_ESTIMATE; }
};
void action(CloseAction& a) {
auto s = db->Close();
if (!s.ok()) {
TraceEvent(SevError, "RocksDBError").detail("Error", s.ToString()).detail("Method", "Close");
}
a.done.send(Void());
}
};
struct Reader : IThreadPoolReceiver {
DB& db;
rocksdb::ReadOptions readOptions;
std::unique_ptr<rocksdb::Iterator> cursor = nullptr;
explicit Reader(DB& db)
: db(db)
{
readOptions.total_order_seek = true;
}
void init() override {}
struct ReadValueAction : TypedAction<Reader, ReadValueAction> {
Key key;
Optional<UID> debugID;
ThreadReturnPromise<Optional<Value>> result;
ReadValueAction(KeyRef key, Optional<UID> debugID)
: key(key), debugID(debugID)
{}
double getTimeEstimate() override { return SERVER_KNOBS->READ_VALUE_TIME_ESTIMATE; }
};
void action(ReadValueAction& a) {
Optional<TraceBatch> traceBatch;
if (a.debugID.present()) {
traceBatch = { TraceBatch{} };
traceBatch.get().addEvent("GetValueDebug", a.debugID.get().first(), "Reader.Before");
}
rocksdb::PinnableSlice value;
auto s = db->Get(readOptions, db->DefaultColumnFamily(), toSlice(a.key), &value);
if (a.debugID.present()) {
traceBatch.get().addEvent("GetValueDebug", a.debugID.get().first(), "Reader.After");
traceBatch.get().dump();
}
if (s.ok()) {
a.result.send(Value(toStringRef(value)));
} else {
if (!s.IsNotFound()) {
TraceEvent(SevError, "RocksDBError").detail("Error", s.ToString()).detail("Method", "ReadValue");
}
a.result.send(Optional<Value>());
}
}
struct ReadValuePrefixAction : TypedAction<Reader, ReadValuePrefixAction> {
Key key;
int maxLength;
Optional<UID> debugID;
ThreadReturnPromise<Optional<Value>> result;
ReadValuePrefixAction(Key key, int maxLength, Optional<UID> debugID) : key(key), maxLength(maxLength), debugID(debugID) {};
virtual double getTimeEstimate() { return SERVER_KNOBS->READ_VALUE_TIME_ESTIMATE; }
};
void action(ReadValuePrefixAction& a) {
rocksdb::PinnableSlice value;
Optional<TraceBatch> traceBatch;
if (a.debugID.present()) {
traceBatch = { TraceBatch{} };
traceBatch.get().addEvent("GetValuePrefixDebug", a.debugID.get().first(),
"Reader.Before"); //.detail("TaskID", g_network->getCurrentTask());
}
auto s = db->Get(readOptions, db->DefaultColumnFamily(), toSlice(a.key), &value);
if (a.debugID.present()) {
traceBatch.get().addEvent("GetValuePrefixDebug", a.debugID.get().first(),
"Reader.After"); //.detail("TaskID", g_network->getCurrentTask());
traceBatch.get().dump();
}
if (s.ok()) {
a.result.send(Value(StringRef(reinterpret_cast<const uint8_t*>(value.data()),
std::min(value.size(), size_t(a.maxLength)))));
} else {
TraceEvent(SevError, "RocksDBError").detail("Error", s.ToString()).detail("Method", "ReadValuePrefix");
a.result.send(Optional<Value>());
}
}
struct ReadRangeAction : TypedAction<Reader, ReadRangeAction>, FastAllocated<ReadRangeAction> {
KeyRange keys;
int rowLimit, byteLimit;
ThreadReturnPromise<Standalone<RangeResultRef>> result;
ReadRangeAction(KeyRange keys, int rowLimit, int byteLimit) : keys(keys), rowLimit(rowLimit), byteLimit(byteLimit) {}
virtual double getTimeEstimate() { return SERVER_KNOBS->READ_RANGE_TIME_ESTIMATE; }
};
void action(ReadRangeAction& a) {
auto cursor = std::unique_ptr<rocksdb::Iterator>(db->NewIterator(readOptions));
Standalone<RangeResultRef> result;
int accumulatedBytes = 0;
if (a.rowLimit >= 0) {
cursor->Seek(toSlice(a.keys.begin));
while (cursor->Valid() && toStringRef(cursor->key()) < a.keys.end && result.size() < a.rowLimit &&
accumulatedBytes < a.byteLimit) {
KeyValueRef kv(toStringRef(cursor->key()), toStringRef(cursor->value()));
accumulatedBytes += sizeof(KeyValueRef) + kv.expectedSize();
result.push_back_deep(result.arena(), kv);
cursor->Next();
}
} else {
cursor->Seek(toSlice(a.keys.end));
if (!cursor->Valid()) {
cursor->SeekToLast();
} else {
cursor->Prev();
}
while (cursor->Valid() && toStringRef(cursor->key()) >= a.keys.begin && result.size() < -a.rowLimit &&
accumulatedBytes < a.byteLimit) {
KeyValueRef kv(toStringRef(cursor->key()), toStringRef(cursor->value()));
accumulatedBytes += sizeof(KeyValueRef) + kv.expectedSize();
result.push_back_deep(result.arena(), kv);
cursor->Prev();
}
}
auto s = cursor->status();
if (!s.ok()) {
TraceEvent(SevError, "RocksDBError").detail("Error", s.ToString()).detail("Method", "ReadRange");
}
result.more = (result.size() == a.rowLimit);
if (result.more) {
result.readThrough = result[result.size()-1].key;
}
a.result.send(result);
}
};
DB db = nullptr;
std::string path;
UID id;
size_t diskBytesUsed = 0;
Reference<IThreadPool> writeThread;
Reference<IThreadPool> readThreads;
unsigned nReaders = 2;
Promise<Void> errorPromise;
Promise<Void> closePromise;
std::unique_ptr<rocksdb::WriteBatch> writeBatch;
explicit RocksDBKeyValueStore(const std::string& path, UID id)
: path(path)
, id(id)
{
writeThread = createGenericThreadPool();
readThreads = createGenericThreadPool();
writeThread->addThread(new Writer(db, id));
for (unsigned i = 0; i < nReaders; ++i) {
readThreads->addThread(new Reader(db));
}
}
Future<Void> getError() override {
return errorPromise.getFuture();
}
ACTOR static void doClose(RocksDBKeyValueStore* self, bool deleteOnClose) {
wait(self->readThreads->stop());
auto a = new Writer::CloseAction{};
auto f = a->done.getFuture();
self->writeThread->post(a);
wait(f);
wait(self->writeThread->stop());
// TODO: delete data on close
if (self->closePromise.canBeSet()) self->closePromise.send(Void());
if (self->errorPromise.canBeSet()) self->errorPromise.send(Never());
if (deleteOnClose) {
std::vector<rocksdb::ColumnFamilyDescriptor> defaultCF = { rocksdb::ColumnFamilyDescriptor{
"default", getCFOptions() } };
rocksdb::DestroyDB(self->path, getOptions(self->path), defaultCF);
}
delete self;
}
Future<Void> onClosed() override {
return closePromise.getFuture();
}
void dispose() override {
doClose(this, true);
}
void close() override {
doClose(this, false);
}
KeyValueStoreType getType() override {
return KeyValueStoreType(KeyValueStoreType::SSD_ROCKSDB_V1);
}
Future<Void> init() override {
std::unique_ptr<Writer::OpenAction> a(new Writer::OpenAction());
a->path = path;
auto res = a->done.getFuture();
writeThread->post(a.release());
return res;
}
void set(KeyValueRef kv, const Arena*) override {
if (writeBatch == nullptr) {
writeBatch.reset(new rocksdb::WriteBatch());
}
writeBatch->Put(toSlice(kv.key), toSlice(kv.value));
}
void clear(KeyRangeRef keyRange, const Arena*) override {
if (writeBatch == nullptr) {
writeBatch.reset(new rocksdb::WriteBatch());
}
writeBatch->DeleteRange(toSlice(keyRange.begin), toSlice(keyRange.end));
}
Future<Void> commit(bool) override {
// If there is nothing to write, don't write.
if (writeBatch == nullptr) {
return Void();
}
auto a = new Writer::CommitAction();
a->batchToCommit = std::move(writeBatch);
auto res = a->done.getFuture();
writeThread->post(a);
return res;
}
Future<Optional<Value>> readValue(KeyRef key, Optional<UID> debugID) override {
auto a = new Reader::ReadValueAction(key, debugID);
auto res = a->result.getFuture();
readThreads->post(a);
return res;
}
Future<Optional<Value>> readValuePrefix(KeyRef key, int maxLength, Optional<UID> debugID) override {
auto a = new Reader::ReadValuePrefixAction(key, maxLength, debugID);
auto res = a->result.getFuture();
readThreads->post(a);
return res;
}
Future<Standalone<RangeResultRef>> readRange(KeyRangeRef keys, int rowLimit, int byteLimit) override {
auto a = new Reader::ReadRangeAction(keys, rowLimit, byteLimit);
auto res = a->result.getFuture();
readThreads->post(a);
return res;
}
StorageBytes getStorageBytes() override {
int64_t free;
int64_t total;
g_network->getDiskBytes(path, free, total);
return StorageBytes(free, total, diskBytesUsed, free);
}
};
} // namespace
#endif // SSD_ROCKSDB_EXPERIMENTAL
IKeyValueStore* keyValueStoreRocksDB(std::string const& path, UID logID, KeyValueStoreType storeType, bool checkChecksums, bool checkIntegrity) {
#ifdef SSD_ROCKSDB_EXPERIMENTAL
return new RocksDBKeyValueStore(path, logID);
#else
TraceEvent(SevError, "RocksDBEngineInitFailure").detail("Reason", "Built without RocksDB");
ASSERT(false);
return nullptr;
#endif // SSD_ROCKSDB_EXPERIMENTAL
}
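`ReadRangeAction` uses this convention: a non-negative `rowLimit` returns the first rows of the range in ascending order, and a negative one returns the last rows in descending order. Both directions are also bounded by `byteLimit`. The following is a minimal, self-contained sketch of that convention. It uses a `std::map<std::string, std::string>` as a stand-in for the RocksDB iterator. `readRange`, `KV` and `RangeResult` are hypothetical names. The per-row byte cost (key size plus value size) is a simplification of `sizeof(KeyValueRef) + kv.expectedSize()`.

The sketch sets `more` whenever either the absolute row limit or the byte budget stopped the scan. The handler above only compares `result.size()` with `a.rowLimit`.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for KeyValueRef / RangeResultRef.
struct KV {
    std::string key;
    std::string value;
};

struct RangeResult {
    std::vector<KV> rows;
    bool more = false;        // a limit (not the end of the range) stopped the scan
    std::string readThrough;  // last key returned when more == true
};

// Reads keys in [begin, end). rowLimit >= 0: first rows ascending;
// rowLimit < 0: last rows descending. byteLimit bounds accumulated key+value bytes.
RangeResult readRange(const std::map<std::string, std::string>& db, const std::string& begin,
                      const std::string& end, int rowLimit, int byteLimit) {
    RangeResult result;
    const std::size_t maxRows = static_cast<std::size_t>(std::abs(rowLimit));
    int accumulatedBytes = 0;

    // True while neither the row limit nor the byte budget has been reached.
    auto underLimits = [&] { return result.rows.size() < maxRows && accumulatedBytes < byteLimit; };
    auto take = [&](const std::string& k, const std::string& v) {
        result.rows.push_back({k, v});
        accumulatedBytes += static_cast<int>(k.size() + v.size());
    };

    if (rowLimit >= 0) {
        // Forward scan, analogous to Seek(begin) followed by Next().
        for (auto it = db.lower_bound(begin); it != db.end() && it->first < end && underLimits(); ++it) {
            take(it->first, it->second);
        }
    } else {
        // Reverse scan: start just before the first key >= end, analogous to Seek(end) + Prev().
        auto it = db.lower_bound(end);
        while (it != db.begin() && underLimits()) {
            --it;
            if (it->first < begin) break;
            take(it->first, it->second);
        }
    }

    result.more = !result.rows.empty() && !underLimits();
    if (result.more) result.readThrough = result.rows.back().key;
    return result;
}
```

In this sketch, `more` only records that a limit stopped the scan. It does not guarantee that further rows exist.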

@ -163,39 +163,35 @@ struct GetCommitVersionReply {
struct GetCommitVersionRequest {
constexpr static FileIdentifier file_identifier = 16683181;
SpanID spanContext;
uint64_t requestNum;
uint64_t mostRecentProcessedRequestNum;
UID requestingProxy;
ReplyPromise<GetCommitVersionReply> reply;
GetCommitVersionRequest() { }
GetCommitVersionRequest(SpanID spanContext, uint64_t requestNum, uint64_t mostRecentProcessedRequestNum,
UID requestingProxy)
: spanContext(spanContext), requestNum(requestNum), mostRecentProcessedRequestNum(mostRecentProcessedRequestNum),
requestingProxy(requestingProxy) {}
GetCommitVersionRequest(uint64_t requestNum, uint64_t mostRecentProcessedRequestNum, UID requestingProxy)
: requestNum(requestNum), mostRecentProcessedRequestNum(mostRecentProcessedRequestNum), requestingProxy(requestingProxy) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, requestNum, mostRecentProcessedRequestNum, requestingProxy, reply, spanContext);
serializer(ar, requestNum, mostRecentProcessedRequestNum, requestingProxy, reply);
}
};
struct ReportRawCommittedVersionRequest {
constexpr static FileIdentifier file_identifier = 1853148;
SpanID spanContext;
Version version;
bool locked;
Optional<Value> metadataVersion;
ReplyPromise<Void> reply;
ReportRawCommittedVersionRequest() : spanContext(), version(invalidVersion), locked(false) {}
ReportRawCommittedVersionRequest(SpanID spanContext, Version version, bool locked, Optional<Value> metadataVersion) : spanContext(spanContext), version(version), locked(locked), metadataVersion(metadataVersion) {}
ReportRawCommittedVersionRequest() : version(invalidVersion), locked(false) {}
ReportRawCommittedVersionRequest(Version version, bool locked, Optional<Value> metadataVersion) : version(version), locked(locked), metadataVersion(metadataVersion) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, version, locked, metadataVersion, reply, spanContext);
serializer(ar, version, locked, metadataVersion, reply);
}
};

@ -44,13 +44,10 @@
#include "fdbserver/WaitFailure.h"
#include "fdbserver/WorkerInterface.actor.h"
#include "flow/ActorCollection.h"
#include "flow/IRandom.h"
#include "flow/Knobs.h"
#include "flow/Stats.h"
#include "flow/TDMetric.actor.h"
#include "flow/Tracing.h"
#include "flow/actorcompiler.h" // This must be the last #include.
#include <tuple>
ACTOR Future<Void> broadcastTxnRequest(TxnStateRequest req, int sendAmount, bool sendReply) {
state ReplyPromise<Void> reply = req.reply;
@ -290,9 +287,9 @@ ACTOR Future<Void> getRate(UID myID, Reference<AsyncVar<ServerDBInfo>> db, int64
ACTOR Future<Void> queueTransactionStartRequests(
Reference<AsyncVar<ServerDBInfo>> db,
SpannedDeque<GetReadVersionRequest> *systemQueue,
SpannedDeque<GetReadVersionRequest> *defaultQueue,
SpannedDeque<GetReadVersionRequest> *batchQueue,
Deque<GetReadVersionRequest> *systemQueue,
Deque<GetReadVersionRequest> *defaultQueue,
Deque<GetReadVersionRequest> *batchQueue,
FutureStream<GetReadVersionRequest> readVersionRequests,
PromiseStream<Void> GRVTimer, double *lastGRVTime,
double *GRVBatchTime, FutureStream<double> replyTimes,
@ -329,11 +326,9 @@ ACTOR Future<Void> queueTransactionStartRequests(
if (req.priority >= TransactionPriority::IMMEDIATE) {
stats->txnSystemPriorityStartIn += req.transactionCount;
systemQueue->push_back(req);
systemQueue->span->parents.insert(req.spanContext);
} else if (req.priority >= TransactionPriority::DEFAULT) {
stats->txnDefaultPriorityStartIn += req.transactionCount;
defaultQueue->push_back(req);
defaultQueue->span->parents.insert(req.spanContext);
} else {
// Return error for batch_priority GRV requests
int64_t proxiesCount = std::max((int)db->get().client.proxies.size(), 1);
@ -345,7 +340,6 @@ ACTOR Future<Void> queueTransactionStartRequests(
stats->txnBatchPriorityStartIn += req.transactionCount;
batchQueue->push_back(req);
batchQueue->span->parents.insert(req.spanContext);
}
}
}
@ -511,11 +505,8 @@ struct ResolutionRequestBuilder {
// [CommitTransactionRef_Index][Resolver_Index][Read_Conflict_Range_Index_on_Resolver]
// -> read_conflict_range's original index in the commitTransactionRef
ResolutionRequestBuilder(ProxyCommitData* self, Version version, Version prevVersion, Version lastReceivedVersion,
Span& parentSpan)
: self(self), requests(self->resolvers.size()) {
for (auto& req : requests) {
req.spanContext = parentSpan->context;
ResolutionRequestBuilder( ProxyCommitData* self, Version version, Version prevVersion, Version lastReceivedVersion) : self(self), requests(self->resolvers.size()) {
for(auto& req : requests) {
req.prevVersion = prevVersion;
req.version = version;
req.lastReceivedVersion = lastReceivedVersion;
@ -799,7 +790,6 @@ ACTOR Future<Void> commitBatch(
state Optional<UID> debugID;
state bool forceRecovery = false;
state int batchOperations = 0;
state Span span("MP:commitBatch"_loc);
int64_t batchBytes = 0;
for (int t = 0; t<trs.size(); t++) {
batchOperations += trs[t].transaction.mutations.size();
@ -822,7 +812,6 @@ ACTOR Future<Void> commitBatch(
debugID = nondeterministicRandom()->randomUniqueID();
g_traceBatch.addAttach("CommitAttachID", trs[t].debugID.get().first(), debugID.get().first());
}
span->parents.insert(trs[t].spanContext);
}
if(localBatchNumber == 2 && !debugID.present() && self->firstProxy && !g_network->isSimulated()) {
@ -843,7 +832,7 @@ ACTOR Future<Void> commitBatch(
if (debugID.present())
g_traceBatch.addEvent("CommitDebug", debugID.get().first(), "MasterProxyServer.commitBatch.GettingCommitVersion");
GetCommitVersionRequest req(span->context, self->commitVersionRequestNumber++, self->mostRecentProcessedRequestNumber, self->dbgid);
GetCommitVersionRequest req(self->commitVersionRequestNumber++, self->mostRecentProcessedRequestNumber, self->dbgid);
GetCommitVersionReply versionReply = wait( brokenPromiseToNever(self->master.getCommitVersion.getReply(req, TaskPriority::ProxyMasterVersionReply)) );
self->mostRecentProcessedRequestNumber = versionReply.requestNum;
@ -864,7 +853,7 @@ ACTOR Future<Void> commitBatch(
if (debugID.present())
g_traceBatch.addEvent("CommitDebug", debugID.get().first(), "MasterProxyServer.commitBatch.GotCommitVersion");
ResolutionRequestBuilder requests( self, commitVersion, prevVersion, self->version, span );
ResolutionRequestBuilder requests( self, commitVersion, prevVersion, self->version );
int conflictRangeCount = 0;
state int64_t maxTransactionBytes = 0;
for (int t = 0; t<trs.size(); t++) {
@ -1177,21 +1166,17 @@ ACTOR Future<Void> commitBatch(
// We prevent this by limiting the number of versions which are semi-committed but not fully committed to be less than the MVCC window
if(self->committedVersion.get() < commitVersion - SERVER_KNOBS->MAX_READ_TRANSACTION_LIFE_VERSIONS) {
computeDuration += g_network->timer() - computeStart;
state Span waitVersionSpan;
while (self->committedVersion.get() < commitVersion - SERVER_KNOBS->MAX_READ_TRANSACTION_LIFE_VERSIONS) {
// This should be *extremely* rare in the real world, but knob buggification should make it happen in simulation
TEST(true); // Semi-committed pipeline limited by MVCC window
//TraceEvent("ProxyWaitingForCommitted", self->dbgid).detail("CommittedVersion", self->committedVersion.get()).detail("NeedToCommit", commitVersion);
waitVersionSpan = Span(deterministicRandom()->randomUniqueID(), "MP:overMaxReadTransactionLifeVersions"_loc, {span->context});
choose{
when(wait(self->committedVersion.whenAtLeast(commitVersion - SERVER_KNOBS->MAX_READ_TRANSACTION_LIFE_VERSIONS))) {
wait(yield());
break;
}
when(GetReadVersionReply v = wait(self->getConsistentReadVersion.getReply(
GetReadVersionRequest(waitVersionSpan->context, 0, TransactionPriority::IMMEDIATE,
GetReadVersionRequest::FLAG_CAUSAL_READ_RISKY)))) {
if (v.version > self->committedVersion.get()) {
when(GetReadVersionReply v = wait(self->getConsistentReadVersion.getReply(GetReadVersionRequest(0, TransactionPriority::IMMEDIATE, GetReadVersionRequest::FLAG_CAUSAL_READ_RISKY)))) {
if(v.version > self->committedVersion.get()) {
self->locked = v.locked;
self->metadataVersion = v.metadataVersion;
self->committedVersion.set(v.version);
@ -1202,7 +1187,6 @@ ACTOR Future<Void> commitBatch(
}
}
}
waitVersionSpan = Span{};
computeStart = g_network->timer();
}
@ -1292,11 +1276,6 @@ ACTOR Future<Void> commitBatch(
ASSERT(p.second.isReady());
}
if (SERVER_KNOBS->ASK_READ_VERSION_FROM_MASTER) {
// Let master know this commit version so that every other proxy can know.
wait(self->master.reportLiveCommittedVersion.getReply(ReportRawCommittedVersionRequest(span->context, commitVersion, lockedAfter, metadataVersionAfter), TaskPriority::ProxyMasterVersionReply));
}
TEST(self->committedVersion.get() > commitVersion); // A later version was reported committed first
if( commitVersion > self->committedVersion.get() ) {
self->locked = lockedAfter;
@ -1402,14 +1381,13 @@ ACTOR Future<Void> updateLastCommit(ProxyCommitData* self, Optional<UID> debugID
return Void();
}
ACTOR Future<GetReadVersionReply> getLiveCommittedVersion(Span parentSpan, ProxyCommitData* commitData, uint32_t flags, vector<MasterProxyInterface> *otherProxies, Optional<UID> debugID,
ACTOR Future<GetReadVersionReply> getLiveCommittedVersion(ProxyCommitData* commitData, uint32_t flags, vector<MasterProxyInterface> *otherProxies, Optional<UID> debugID,
int transactionCount, int systemTransactionCount, int defaultPriTransactionCount, int batchPriTransactionCount)
{
// Returns a version which (1) is committed, and (2) is >= the latest version reported committed (by a commit response) when this request was sent
// (1) The version returned is the committedVersion of some proxy at some point before the request returns, so it is committed.
// (2) No proxy on our list reported committed a higher version before this request was received, because then its committedVersion would have been higher,
// and no other proxy could have already committed anything without first ending the epoch
state Span span("MP:getLiveCommittedVersion"_loc, parentSpan);
++commitData->stats.txnStartBatch;
state vector<Future<GetReadVersionReply>> proxyVersions;
state Future<GetReadVersionReply> replyFromMasterFuture;
@ -1525,16 +1503,15 @@ ACTOR static Future<Void> transactionStarter(
state TransactionRateInfo normalRateInfo(10);
state TransactionRateInfo batchRateInfo(0);
state SpannedDeque<GetReadVersionRequest> systemQueue("MP:transactionStarterSystemQueue"_loc);
state SpannedDeque<GetReadVersionRequest> defaultQueue("MP:transactionStarterDefaultQueue"_loc);
state SpannedDeque<GetReadVersionRequest> batchQueue("MP:transactionStarterBatchQueue"_loc);
state Deque<GetReadVersionRequest> systemQueue;
state Deque<GetReadVersionRequest> defaultQueue;
state Deque<GetReadVersionRequest> batchQueue;
state vector<MasterProxyInterface> otherProxies;
state TransactionTagMap<uint64_t> transactionTagCounter;
state PrioritizedTransactionTagMap<ClientTagThrottleLimits> throttledTags;
state PromiseStream<double> replyTimes;
state Span span;
addActor.send(getRate(proxy.id(), db, &transactionCount, &batchTransactionCount, &normalRateInfo, &batchRateInfo, healthMetricsReply, detailedHealthMetricsReply, &transactionTagCounter, &throttledTags));
addActor.send(queueTransactionStartRequests(db, &systemQueue, &defaultQueue, &batchQueue, proxy.getConsistentReadVersion.getFuture(),
@ -1576,16 +1553,13 @@ ACTOR static Future<Void> transactionStarter(
int requestsToStart = 0;
while (requestsToStart < SERVER_KNOBS->START_TRANSACTION_MAX_REQUESTS_TO_START) {
SpannedDeque<GetReadVersionRequest>* transactionQueue;
Deque<GetReadVersionRequest>* transactionQueue;
if(!systemQueue.empty()) {
transactionQueue = &systemQueue;
span = systemQueue.resetSpan();
} else if(!defaultQueue.empty()) {
transactionQueue = &defaultQueue;
span = defaultQueue.resetSpan();
} else if(!batchQueue.empty()) {
transactionQueue = &batchQueue;
span = batchQueue.resetSpan();
} else {
break;
}
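`queueTransactionStartRequests` routes each read-version request by priority. Requests at IMMEDIATE and above go to the system queue, requests at DEFAULT and above go to the default queue, and the rest go to the batch queue. `transactionStarter` then always takes from the highest-priority non-empty queue. The following is a minimal sketch of that routing and selection using `std::deque`. `Priority`, `Request`, `StarterQueues` and its methods are hypothetical names, and only the relative ordering of the priorities is modeled. Rate limiting, span bookkeeping and the batch-priority rejection path are omitted.

```cpp
#include <cassert>
#include <deque>
#include <initializer_list>
#include <optional>

// Hypothetical stand-ins; only the ordering of the priorities matters here.
enum class Priority { Batch = 0, Default = 1, Immediate = 2 };

struct Request {
    int id;
    Priority priority;
};

class StarterQueues {
public:
    // Route an incoming request to the queue for its priority class.
    void push(const Request& req) {
        if (req.priority >= Priority::Immediate) {
            system_.push_back(req);
        } else if (req.priority >= Priority::Default) {
            default_.push_back(req);
        } else {
            batch_.push_back(req);
        }
    }

    // Pop from the highest-priority non-empty queue; FIFO within each queue.
    std::optional<Request> popNext() {
        for (std::deque<Request>* q : {&system_, &default_, &batch_}) {
            if (!q->empty()) {
                Request req = q->front();
                q->pop_front();
                return req;
            }
        }
        return std::nullopt;  // all queues empty, like the `break` in the loop above
    }

private:
    std::deque<Request> system_;
    std::deque<Request> default_;
    std::deque<Request> batch_;
};
```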
@ -1650,9 +1624,7 @@ ACTOR static Future<Void> transactionStarter(
for (int i = 0; i < start.size(); i++) {
if (start[i].size()) {
Future<GetReadVersionReply> readVersionReply = getLiveCommittedVersion(
span, commitData, i, &otherProxies, debugID, transactionsStarted[i], systemTransactionsStarted[i],
defaultPriTransactionsStarted[i], batchPriTransactionsStarted[i]);
Future<GetReadVersionReply> readVersionReply = getLiveCommittedVersion(commitData, i, &otherProxies, debugID, transactionsStarted[i], systemTransactionsStarted[i], defaultPriTransactionsStarted[i], batchPriTransactionsStarted[i]);
addActor.send(sendGrvReplies(readVersionReply, start[i], &commitData->stats,
commitData->minKnownCommittedVersion, throttledTags));
@ -1662,7 +1634,6 @@ ACTOR static Future<Void> transactionStarter(
}
}
}
span.reset();
}
}
@ -2121,7 +2092,6 @@ ACTOR Future<Void> masterProxyServerCore(
}
when(GetRawCommittedVersionRequest req = waitNext(proxy.getRawCommittedVersion.getFuture())) {
//TraceEvent("ProxyGetRCV", proxy.id());
Span span("MP:getRawCommittedReadVersion"_loc, { req.spanContext });
if (req.debugID.present())
g_traceBatch.addEvent("TransactionDebug", req.debugID.get().first(), "MasterProxyServer.masterProxyServerCore.GetRawCommittedVersion");
GetReadVersionReply rep;

@ -20,9 +20,6 @@
#ifndef FDBSERVER_RESOLVERINTERFACE_H
#define FDBSERVER_RESOLVERINTERFACE_H
#include "fdbclient/CommitTransaction.h"
#include "fdbrpc/Locality.h"
#include "fdbrpc/fdbrpc.h"
#pragma once
#include "fdbrpc/Locality.h"
@ -97,19 +94,17 @@ struct ResolveTransactionBatchRequest {
constexpr static FileIdentifier file_identifier = 16462858;
Arena arena;
SpanID spanContext;
Version prevVersion;
Version version; // FIXME: ?
Version lastReceivedVersion;
VectorRef<struct CommitTransactionRef> transactions;
VectorRef<CommitTransactionRef> transactions;
VectorRef<int> txnStateTransactions; // Offsets of elements of transactions that have (transaction subsystem state) mutations
ReplyPromise<ResolveTransactionBatchReply> reply;
Optional<UID> debugID;
template <class Archive>
void serialize(Archive& ar) {
serializer(ar, prevVersion, version, lastReceivedVersion, transactions, txnStateTransactions, reply, arena,
debugID, spanContext);
serializer(ar, prevVersion, version, lastReceivedVersion, transactions, txnStateTransactions, reply, arena, debugID);
}
};

@ -447,6 +447,7 @@ ACTOR Future<Version> waitForVersionNoTooOld( StorageCacheData* data, Version ve
ACTOR Future<Void> getValueQ( StorageCacheData* data, GetValueRequest req ) {
state int64_t resultSize = 0;
try {
++data->counters.getValueQueries;
++data->counters.allQueries;
@ -456,13 +457,12 @@ ACTOR Future<Void> getValueQ( StorageCacheData* data, GetValueRequest req ) {
// Active load balancing runs at a very high priority (to obtain accurate queue lengths)
// so we need to downgrade here
//TODO what's this?
wait( delay(0, TaskPriority::DefaultEndpoint) );
if( req.debugID.present() ) {
if( req.debugID.present() )
g_traceBatch.addEvent("GetValueDebug", req.debugID.get().first(), "getValueQ.DoRead"); //.detail("TaskID", g_network->getCurrentTask());
//FIXME
}
state Optional<Value> v;
state Version version = wait( waitForVersion( data, req.version ) );

@ -2993,7 +2993,7 @@ public:
VersionedBTree(IPager2* pager, std::string name)
: m_pager(pager), m_writeVersion(invalidVersion), m_lastCommittedVersion(invalidVersion), m_pBuffer(nullptr),
m_commitReadLock(SERVER_KNOBS->REDWOOD_COMMIT_CONCURRENT_READS), m_name(name) {
m_commitReadLock(new FlowLock(SERVER_KNOBS->REDWOOD_COMMIT_CONCURRENT_READS)), m_name(name) {
m_lazyClearActor = 0;
m_init = init_impl(this);
@ -3441,7 +3441,7 @@ private:
Version m_writeVersion;
Version m_lastCommittedVersion;
Version m_newOldestVersion;
FlowLock m_commitReadLock;
Reference<FlowLock> m_commitReadLock;
Future<Void> m_latestCommit;
Future<Void> m_init;
std::string m_name;
@ -4134,8 +4134,9 @@ private:
state Version writeVersion = self->getLastCommittedVersion() + 1;
wait(self->m_commitReadLock.take());
state FlowLock::Releaser readLock(self->m_commitReadLock);
state Reference<FlowLock> commitReadLock = self->m_commitReadLock;
wait(commitReadLock->take());
state FlowLock::Releaser readLock(*commitReadLock);
state Reference<const IPage> page =
wait(readPage(snapshot, rootID, update->decodeLowerBound, update->decodeUpperBound));
readLock.release();
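The Redwood hunks replace a by-value `FlowLock` member with `Reference<FlowLock>`. Each actor copies that reference into its own state (`commitReadLock`, `readLock`) before calling `take()`. Presumably this keeps the lock alive for as long as any waiter or `Releaser` still uses it, even if the owning object is destroyed first.

The following is a minimal sketch of the same ownership pattern in standard C++. It uses `std::shared_ptr` in place of `Reference`, and a simple synchronous counting lock in place of the asynchronous `FlowLock`. `CountingLock`, `Releaser`, `Store` and `readAfterOwnerDestroyed` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical synchronous stand-in for FlowLock: a counter of available permits.
class CountingLock {
public:
    explicit CountingLock(int permits) : available_(permits) {}
    bool tryTake() {
        if (available_ == 0) return false;
        --available_;
        return true;
    }
    void release() { ++available_; }
    int available() const { return available_; }

private:
    int available_;
};

// RAII helper in the spirit of FlowLock::Releaser: returns one permit on
// destruction unless release() was already called explicitly.
class Releaser {
public:
    explicit Releaser(CountingLock& lock) : lock_(&lock) {}
    ~Releaser() { release(); }
    Releaser(const Releaser&) = delete;
    Releaser& operator=(const Releaser&) = delete;
    void release() {
        if (lock_ != nullptr) {
            lock_->release();
            lock_ = nullptr;
        }
    }

private:
    CountingLock* lock_;
};

// Owner of the shared lock, like KeyValueStoreRedwoodUnversioned::m_concurrentReads.
struct Store {
    std::shared_ptr<CountingLock> concurrentReads = std::make_shared<CountingLock>(2);
};

// Simulates a read that outlives its Store: the local shared_ptr copy keeps
// the lock alive until the Releaser has returned the permit.
int readAfterOwnerDestroyed() {
    auto store = std::make_unique<Store>();
    std::shared_ptr<CountingLock> readLock = store->concurrentReads;  // like `state Reference<FlowLock> readLock`
    if (!readLock->tryTake()) return -1;
    Releaser releaser(*readLock);
    store.reset();        // owner destroyed while the "read" is still in flight
    releaser.release();   // safe: readLock still owns the CountingLock
    return readLock->available();
}
```

`readLock` is a local copy of the `shared_ptr`, rather than a dereference of `store->concurrentReads` at release time. That copy is what keeps the lock valid after `store.reset()`.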
@ -5443,7 +5444,7 @@ RedwoodRecordRef VersionedBTree::dbEnd(LiteralStringRef("\xff\xff\xff\xff\xff"))
class KeyValueStoreRedwoodUnversioned : public IKeyValueStore {
public:
KeyValueStoreRedwoodUnversioned(std::string filePrefix, UID logID)
: m_filePrefix(filePrefix), m_concurrentReads(SERVER_KNOBS->REDWOOD_KVSTORE_CONCURRENT_READS) {
: m_filePrefix(filePrefix), m_concurrentReads(new FlowLock(SERVER_KNOBS->REDWOOD_KVSTORE_CONCURRENT_READS)) {
// TODO: This constructor should really just take an IVersionedStore
IPager2* pager = new DWALPager(SERVER_KNOBS->REDWOOD_DEFAULT_PAGE_SIZE, filePrefix, 0);
m_tree = new VersionedBTree(pager, filePrefix);
@ -5519,8 +5520,9 @@ public:
state VersionedBTree::BTreeCursor cur;
wait(self->m_tree->initBTreeCursor(&cur, self->m_tree->getLastCommittedVersion()));
wait(self->m_concurrentReads.take());
state FlowLock::Releaser releaser(self->m_concurrentReads);
state Reference<FlowLock> readLock = self->m_concurrentReads;
wait(readLock->take());
state FlowLock::Releaser releaser(*readLock);
++g_redwoodMetrics.opGetRange;
state Standalone<RangeResultRef> result;
@ -5599,8 +5601,9 @@ public:
state VersionedBTree::BTreeCursor cur;
wait(self->m_tree->initBTreeCursor(&cur, self->m_tree->getLastCommittedVersion()));
wait(self->m_concurrentReads.take());
state FlowLock::Releaser releaser(self->m_concurrentReads);
state Reference<FlowLock> readLock = self->m_concurrentReads;
wait(readLock->take());
state FlowLock::Releaser releaser(*readLock);
++g_redwoodMetrics.opGet;
wait(cur.seekGTE(key, 0));
@ -5619,8 +5622,9 @@ public:
state VersionedBTree::BTreeCursor cur;
wait(self->m_tree->initBTreeCursor(&cur, self->m_tree->getLastCommittedVersion()));
wait(self->m_concurrentReads.take());
state FlowLock::Releaser releaser(self->m_concurrentReads);
state Reference<FlowLock> readLock = self->m_concurrentReads;
wait(readLock->take());
state FlowLock::Releaser releaser(*readLock);
++g_redwoodMetrics.opGet;
wait(cur.seekGTE(key, 0));
@ -5645,7 +5649,7 @@ private:
Future<Void> m_init;
Promise<Void> m_closed;
Promise<Void> m_error;
FlowLock m_concurrentReads;
Reference<FlowLock> m_concurrentReads;
template <typename T>
inline Future<T> catchError(Future<T> f) {

@ -20,9 +20,6 @@
// There's something in one of the files below that defines
// a macro that makes boost interprocess break on Windows.
#include "flow/Tracing.h"
#include <cctype>
#include <iterator>
#define BOOST_DATE_TIME_NO_LIB
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/algorithm/string.hpp>
@ -81,7 +78,7 @@
// clang-format off
enum {
OPT_CONNFILE, OPT_SEEDCONNFILE, OPT_SEEDCONNSTRING, OPT_ROLE, OPT_LISTEN, OPT_PUBLICADDR, OPT_DATAFOLDER, OPT_LOGFOLDER, OPT_PARENTPID, OPT_TRACER, OPT_NEWCONSOLE,
OPT_CONNFILE, OPT_SEEDCONNFILE, OPT_SEEDCONNSTRING, OPT_ROLE, OPT_LISTEN, OPT_PUBLICADDR, OPT_DATAFOLDER, OPT_LOGFOLDER, OPT_PARENTPID, OPT_NEWCONSOLE,
OPT_NOBOX, OPT_TESTFILE, OPT_RESTARTING, OPT_RESTORING, OPT_RANDOMSEED, OPT_KEY, OPT_MEMLIMIT, OPT_STORAGEMEMLIMIT, OPT_CACHEMEMLIMIT, OPT_MACHINEID,
OPT_DCID, OPT_MACHINE_CLASS, OPT_BUGGIFY, OPT_VERSION, OPT_CRASHONERROR, OPT_HELP, OPT_NETWORKIMPL, OPT_NOBUFSTDOUT, OPT_BUFSTDOUTERR, OPT_TRACECLOCK,
OPT_NUMTESTERS, OPT_DEVHELP, OPT_ROLLSIZE, OPT_MAXLOGS, OPT_MAXLOGSSIZE, OPT_KNOB, OPT_TESTSERVERS, OPT_TEST_ON_SERVERS, OPT_METRICSCONNFILE,
@ -114,7 +111,6 @@ CSimpleOpt::SOption g_rgOptions[] = {
{ OPT_MAXLOGSSIZE, "--maxlogssize", SO_REQ_SEP },
{ OPT_LOGGROUP, "--loggroup", SO_REQ_SEP },
{ OPT_PARENTPID, "--parentpid", SO_REQ_SEP },
{ OPT_TRACER, "--tracer", SO_REQ_SEP },
#ifdef _WIN32
{ OPT_NEWCONSOLE, "-n", SO_NONE },
{ OPT_NEWCONSOLE, "--newconsole", SO_NONE },
@ -518,9 +514,6 @@ static void printUsage( const char *name, bool devhelp ) {
printf(" --trace_format FORMAT\n"
" Select the format of the log files. xml (the default) and json\n"
" are supported.\n");
printf(" --tracer TRACER\n"
" Select a tracer for transaction tracing. Currently disabled\n"
" (the default) and log_file are supported.\n");
printf(" -i ID, --machine_id ID\n"
" Machine and zone identifier key (up to 16 hex characters).\n"
" Defaults to a random value shared by all fdbserver processes\n"
@ -1176,22 +1169,6 @@ private:
break;
}
#endif
case OPT_TRACER:
{
std::string arg = args.OptionArg();
std::string tracer;
std::transform(arg.begin(), arg.end(), std::back_inserter(tracer), [](char c) { return tolower(c); });
if (tracer == "none" || tracer == "disabled") {
openTracer(TracerType::DISABLED);
} else if (tracer == "logfile" || tracer == "file" || tracer == "log_file") {
openTracer(TracerType::LOG_FILE);
} else {
fprintf(stderr, "ERROR: Unknown or unsupported tracer: `%s'", args.OptionArg());
printHelpTeaser(argv[0]);
flushAndExit(FDB_EXIT_ERROR);
}
break;
}
case OPT_TESTFILE:
testFile = args.OptionArg();
break;
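The `OPT_TRACER` case lower-cases its argument and accepts several spellings for each tracer type. The following is a minimal sketch of that case-insensitive mapping as a standalone function. `TracerType` mirrors the enum named in the hunk. `parseTracer`, and its use of `std::optional` to signal an unknown value, are hypothetical.

The sketch converts each character through `unsigned char` before calling `std::tolower`. Passing a negative `char` value to `tolower` is undefined behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Mirrors the two tracer types handled by the OPT_TRACER case.
enum class TracerType { DISABLED, LOG_FILE };

// Returns the tracer selected by `arg`, or std::nullopt for an unknown value.
std::optional<TracerType> parseTracer(const std::string& arg) {
    std::string tracer(arg.size(), '\0');
    std::transform(arg.begin(), arg.end(), tracer.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (tracer == "none" || tracer == "disabled") return TracerType::DISABLED;
    if (tracer == "logfile" || tracer == "file" || tracer == "log_file") return TracerType::LOG_FILE;
    return std::nullopt;
}
```

On `std::nullopt`, a caller would report the error and exit, as the hunk does with `printHelpTeaser` and `flushAndExit`.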

@ -921,7 +921,6 @@ ACTOR Future<Void> recoverFrom( Reference<MasterData> self, Reference<ILogSystem
}
ACTOR Future<Void> getVersion(Reference<MasterData> self, GetCommitVersionRequest req) {
state Span span("M:getVersion"_loc, { req.spanContext });
state std::map<UID, ProxyVersionReplies>::iterator proxyItr = self->lastProxyVersionReplies.find(req.requestingProxy); // lastProxyVersionReplies never changes
if (proxyItr == self->lastProxyVersionReplies.end()) {
@ -1008,7 +1007,6 @@ ACTOR Future<Void> serveLiveCommittedVersion(Reference<MasterData> self) {
loop {
choose {
when(GetRawCommittedVersionRequest req = waitNext(self->myInterface.getLiveCommittedVersion.getFuture())) {
Span span("MS:getLiveCommittedVersion"_loc, { req.spanContext });
if (req.debugID.present())
g_traceBatch.addEvent("TransactionDebug", req.debugID.get().first(), "MasterServer.serveLiveCommittedVersion.GetRawCommittedVersion");
@ -1022,7 +1020,6 @@ ACTOR Future<Void> serveLiveCommittedVersion(Reference<MasterData> self) {
req.reply.send(reply);
}
when(ReportRawCommittedVersionRequest req = waitNext(self->myInterface.reportLiveCommittedVersion.getFuture())) {
Span span("MS:reportLiveCommittedVersion"_loc, { req.spanContext });
if (req.version > self->liveCommittedVersion) {
self->liveCommittedVersion = req.version;
self->databaseLocked = req.locked;

@ -21,9 +21,6 @@
#include <cinttypes>
#include "fdbrpc/fdbrpc.h"
#include "fdbrpc/LoadBalance.h"
#include "flow/Arena.h"
#include "flow/IRandom.h"
#include "flow/Tracing.h"
#include "flow/IndexedSet.h"
#include "flow/Hash3.h"
#include "flow/ActorCollection.h"
@ -715,7 +712,7 @@ public:
}
template<class Request, class HandleFunction>
Future<Void> readGuard(const Span& parentSpan, const Request& request, const HandleFunction& fun) {
Future<Void> readGuard(const Request& request, const HandleFunction& fun) {
auto rate = currentRate();
if (rate < SERVER_KNOBS->STORAGE_DURABILITY_LAG_REJECT_THRESHOLD && deterministicRandom()->random01() > std::max(SERVER_KNOBS->STORAGE_DURABILITY_LAG_MIN_RATE, rate/SERVER_KNOBS->STORAGE_DURABILITY_LAG_REJECT_THRESHOLD)) {
//request.error = future_version();
@ -723,7 +720,7 @@ public:
++counters.readsRejected;
return Void();
}
return fun(this, request, parentSpan);
return fun(this, request);
}
};
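`readGuard` sheds reads probabilistically. When `currentRate()` falls below `STORAGE_DURABILITY_LAG_REJECT_THRESHOLD`, a read is rejected if a uniform random draw exceeds `max(STORAGE_DURABILITY_LAG_MIN_RATE, rate / threshold)`. The probability of admitting a read therefore shrinks in proportion to the rate, but never drops below the minimum rate.

The following is a minimal sketch of just that decision. The random draw is passed in so the function is deterministic. `shouldRejectRead` and its parameters are hypothetical names standing in for the knobs.

```cpp
#include <algorithm>
#include <cassert>

// Returns true if a read should be rejected. `random01` is a uniform draw in [0, 1).
bool shouldRejectRead(double rate, double rejectThreshold, double minRate, double random01) {
    if (rate >= rejectThreshold) return false;  // at or above the threshold: admit every read
    // Below the threshold, admit with probability max(minRate, rate / rejectThreshold).
    double admitProbability = std::max(minRate, rate / rejectThreshold);
    return random01 > admitProbability;
}
```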
@ -849,8 +846,7 @@ updateProcessStats(StorageServer* self)
#pragma region Queries
#endif
ACTOR Future<Version> waitForVersionActor(StorageServer* data, Version version, SpanID spanContext) {
state Span span("SS.WaitForVersion"_loc, { spanContext });
ACTOR Future<Version> waitForVersionActor(StorageServer* data, Version version) {
choose {
when(wait(data->version.whenAtLeast(version))) {
// FIXME: A bunch of these can block with or without the following delay 0.
@ -869,7 +865,7 @@ ACTOR Future<Version> waitForVersionActor(StorageServer* data, Version version,
}
}
Future<Version> waitForVersion(StorageServer* data, Version version, SpanID spanContext) {
Future<Version> waitForVersion(StorageServer* data, Version version) {
if (version == latestVersion) {
version = std::max(Version(1), data->version.get());
}
@ -887,7 +883,7 @@ Future<Version> waitForVersion(StorageServer* data, Version version, SpanID span
if (deterministicRandom()->random01() < 0.001) {
TraceEvent("WaitForVersion1000x");
}
return waitForVersionActor(data, version, spanContext);
return waitForVersionActor(data, version);
}
ACTOR Future<Version> waitForVersionNoTooOld( StorageServer* data, Version version ) {
@ -911,7 +907,7 @@ ACTOR Future<Version> waitForVersionNoTooOld( StorageServer* data, Version versi
}
}
ACTOR Future<Void> getValueQ( StorageServer* data, GetValueRequest req, Span span ) {
ACTOR Future<Void> getValueQ( StorageServer* data, GetValueRequest req ) {
state int64_t resultSize = 0;
try {
@ -928,7 +924,7 @@ ACTOR Future<Void> getValueQ( StorageServer* data, GetValueRequest req, Span spa
g_traceBatch.addEvent("GetValueDebug", req.debugID.get().first(), "getValueQ.DoRead"); //.detail("TaskID", g_network->getCurrentTask());
state Optional<Value> v;
state Version version = wait( waitForVersion( data, req.version, req.spanContext ) );
state Version version = wait( waitForVersion( data, req.version ) );
if( req.debugID.present() )
g_traceBatch.addEvent("GetValueDebug", req.debugID.get().first(), "getValueQ.AfterVersion"); //.detail("TaskID", g_network->getCurrentTask());
@ -1012,8 +1008,7 @@ ACTOR Future<Void> getValueQ( StorageServer* data, GetValueRequest req, Span spa
return Void();
};
ACTOR Future<Void> watchValue_impl( StorageServer* data, WatchValueRequest req, SpanID parent ) {
state Span span("SS:WatchValueImpl"_loc, { parent });
ACTOR Future<Void> watchValue_impl( StorageServer* data, WatchValueRequest req ) {
try {
++data->counters.watchQueries;
@ -1024,21 +1019,25 @@ ACTOR Future<Void> watchValue_impl( StorageServer* data, WatchValueRequest req,
if( req.debugID.present() )
g_traceBatch.addEvent("WatchValueDebug", req.debugID.get().first(), "watchValueQ.AfterVersion"); //.detail("TaskID", g_network->getCurrentTask());
state Version minVersion = data->data().latestVersion;
state Future<Void> watchFuture = data->watches.onChange(req.key);
loop {
try {
state Version latest = data->data().latestVersion;
state Future<Void> watchFuture = data->watches.onChange(req.key);
state Span getValueSpan(deterministicRandom()->randomUniqueID(), "SS:GetValue"_loc, { span->context });
GetValueRequest getReq( getValueSpan->context, req.key, latest, req.tags, req.debugID );
state Future<Void> getValue = getValueQ( data, getReq, span ); //we are relying on the delay zero at the top of getValueQ, if removed we need one here
state Version latest = data->version.get();
TEST(latest >= minVersion && latest < data->data().latestVersion); // Starting watch loop with latestVersion > data->version
GetValueRequest getReq( req.key, latest, req.tags, req.debugID );
state Future<Void> getValue = getValueQ( data, getReq ); //we are relying on the delay zero at the top of getValueQ, if removed we need one here
GetValueReply reply = wait( getReq.reply.getFuture() );
getValueSpan.reset();
//TraceEvent("WatcherCheckValue").detail("Key", req.key ).detail("Value", req.value ).detail("CurrentValue", v ).detail("Ver", latest);
if(reply.error.present()) {
ASSERT(reply.error.get().code() != error_code_future_version);
throw reply.error.get();
}
if(BUGGIFY) {
throw transaction_too_old();
}
DEBUG_MUTATION("ShardWatchValue", latest, MutationRef(MutationRef::DebugKey, req.key, reply.value.present() ? StringRef( reply.value.get() ) : LiteralStringRef("<null>") ) );
if( req.debugID.present() )
@ -1058,7 +1057,16 @@ ACTOR Future<Void> watchValue_impl( StorageServer* data, WatchValueRequest req,
++data->numWatches;
data->watchBytes += ( req.key.expectedSize() + req.value.expectedSize() + 1000 );
try {
wait( watchFuture );
if(latest < minVersion) {
// If the version we read is less than minVersion, then we may fail to be notified of any changes that occur up to or including minVersion
// To prevent that, we'll check the key again once the version reaches our minVersion
watchFuture = watchFuture || data->version.whenAtLeast(minVersion);
}
if(BUGGIFY) {
// Simulate a trigger on the watch that results in the loop going around without the value changing
watchFuture = watchFuture || delay(deterministicRandom()->random01());
}
wait(watchFuture);
--data->numWatches;
data->watchBytes -= ( req.key.expectedSize() + req.value.expectedSize() + 1000 );
} catch( Error &e ) {
@ -1067,9 +1075,15 @@ ACTOR Future<Void> watchValue_impl( StorageServer* data, WatchValueRequest req,
throw;
}
} catch( Error &e ) {
if( e.code() != error_code_transaction_too_old )
if( e.code() != error_code_transaction_too_old ) {
throw;
}
TEST(true); // Reading a watched key failed with transaction_too_old
}
watchFuture = data->watches.onChange(req.key);
wait(data->version.whenAtLeast(data->data().latestVersion));
}
} catch (Error& e) {
if(!canReplyWith(e))
@ -1079,8 +1093,8 @@ ACTOR Future<Void> watchValue_impl( StorageServer* data, WatchValueRequest req,
return Void();
}
ACTOR Future<Void> watchValueQ( StorageServer* data, WatchValueRequest req, Span span ) {
state Future<Void> watch = watchValue_impl( data, req, span->context );
ACTOR Future<Void> watchValueQ( StorageServer* data, WatchValueRequest req ) {
state Future<Void> watch = watchValue_impl( data, req );
state double startTime = now();
loop {
@ -1185,7 +1199,7 @@ void merge( Arena& arena, VectorRef<KeyValueRef, VecSerStrategy::String>& output
// If limit>=0, it returns the first rows in the range (sorted ascending), otherwise the last rows (sorted descending).
// readRange has O(|result|) + O(log |data|) cost
ACTOR Future<GetKeyValuesReply> readRange( StorageServer* data, Version version, KeyRange range, int limit, int* pLimitBytes, Span parentSpan ) {
ACTOR Future<GetKeyValuesReply> readRange( StorageServer* data, Version version, KeyRange range, int limit, int* pLimitBytes ) {
state GetKeyValuesReply result;
state StorageServer::VersionedData::ViewAtVersion view = data->data().at(version);
state StorageServer::VersionedData::iterator vCurrent = view.end();
@ -1193,7 +1207,6 @@ ACTOR Future<GetKeyValuesReply> readRange( StorageServer* data, Version version,
state KeyRef readEnd;
state Key readBeginTemp;
state int vCount = 0;
state Span span("SS:readRange"_loc, parentSpan);
// for caching the storage queue results during the first PTree traversal
state VectorRef<KeyValueRef> resultCache;
@ -1365,7 +1378,7 @@ ACTOR Future<GetKeyValuesReply> readRange( StorageServer* data, Version version,
// return sel.getKey() >= range.begin && (sel.isBackward() ? sel.getKey() <= range.end : sel.getKey() < range.end);
//}
ACTOR Future<Key> findKey( StorageServer* data, KeySelectorRef sel, Version version, KeyRange range, int* pOffset, SpanID parentSpan)
ACTOR Future<Key> findKey( StorageServer* data, KeySelectorRef sel, Version version, KeyRange range, int* pOffset)
// Attempts to find the key indicated by sel in the data at version, within range.
// Precondition: selectorInRange(sel, range)
// If it is found, offset is set to 0 and a key is returned which falls inside range.
@ -1382,7 +1395,6 @@ ACTOR Future<Key> findKey( StorageServer* data, KeySelectorRef sel, Version vers
state int sign = forward ? +1 : -1;
state bool skipEqualKey = sel.orEqual == forward;
state int distance = forward ? sel.offset : 1-sel.offset;
state Span span("SS.findKey"_loc, { parentSpan });
//Don't limit the number of bytes if this is a trivial key selector (there will be at most two items returned from the read range in this case)
state int maxBytes;
@ -1391,18 +1403,14 @@ ACTOR Future<Key> findKey( StorageServer* data, KeySelectorRef sel, Version vers
else
maxBytes = BUGGIFY ? SERVER_KNOBS->BUGGIFY_LIMIT_BYTES : SERVER_KNOBS->STORAGE_LIMIT_BYTES;
state GetKeyValuesReply rep = wait(
readRange(data, version,
forward ? KeyRangeRef(sel.getKey(), range.end) : KeyRangeRef(range.begin, keyAfter(sel.getKey())),
(distance + skipEqualKey) * sign, &maxBytes, span));
state GetKeyValuesReply rep = wait( readRange( data, version, forward ? KeyRangeRef(sel.getKey(), range.end) : KeyRangeRef(range.begin, keyAfter(sel.getKey())), (distance + skipEqualKey)*sign, &maxBytes ) );
state bool more = rep.more && rep.data.size() != distance + skipEqualKey;
//If we get only one result in the reverse direction as a result of the data being too large, we could get stuck in a loop
if(more && !forward && rep.data.size() == 1) {
TEST(true); //Reverse key selector returned only one result in range read
maxBytes = std::numeric_limits<int>::max();
GetKeyValuesReply rep2 =
wait(readRange(data, version, KeyRangeRef(range.begin, keyAfter(sel.getKey())), -2, &maxBytes, span));
GetKeyValuesReply rep2 = wait( readRange( data, version, KeyRangeRef(range.begin, keyAfter(sel.getKey())), -2, &maxBytes ) );
rep = rep2;
more = rep.more && rep.data.size() != distance + skipEqualKey;
ASSERT(rep.data.size() == 2 || !more);
@ -1457,7 +1465,7 @@ KeyRange getShardKeyRange( StorageServer* data, const KeySelectorRef& sel )
return i->range();
}
ACTOR Future<Void> getKeyValuesQ( StorageServer* data, GetKeyValuesRequest req, Span span )
ACTOR Future<Void> getKeyValuesQ( StorageServer* data, GetKeyValuesRequest req )
// Throws a wrong_shard_server if the keys in the request or result depend on data outside this server OR if a large selector offset prevents
// all data from being read in one range read
{
@ -1482,7 +1490,7 @@ ACTOR Future<Void> getKeyValuesQ( StorageServer* data, GetKeyValuesRequest req,
try {
if( req.debugID.present() )
g_traceBatch.addEvent("TransactionDebug", req.debugID.get().first(), "storageserver.getKeyValues.Before");
state Version version = wait( waitForVersion( data, req.version, span->context ) );
state Version version = wait( waitForVersion( data, req.version ) );
state uint64_t changeCounter = data->shardChangeCounter;
// try {
@ -1500,8 +1508,8 @@ ACTOR Future<Void> getKeyValuesQ( StorageServer* data, GetKeyValuesRequest req,
state int offset1;
state int offset2;
state Future<Key> fBegin = req.begin.isFirstGreaterOrEqual() ? Future<Key>(req.begin.getKey()) : findKey( data, req.begin, version, shard, &offset1, span->context );
state Future<Key> fEnd = req.end.isFirstGreaterOrEqual() ? Future<Key>(req.end.getKey()) : findKey( data, req.end, version, shard, &offset2, span->context );
state Future<Key> fBegin = req.begin.isFirstGreaterOrEqual() ? Future<Key>(req.begin.getKey()) : findKey( data, req.begin, version, shard, &offset1 );
state Future<Key> fEnd = req.end.isFirstGreaterOrEqual() ? Future<Key>(req.end.getKey()) : findKey( data, req.end, version, shard, &offset2 );
state Key begin = wait(fBegin);
state Key end = wait(fEnd);
if( req.debugID.present() )
@ -1535,7 +1543,7 @@ ACTOR Future<Void> getKeyValuesQ( StorageServer* data, GetKeyValuesRequest req,
} else {
state int remainingLimitBytes = req.limitBytes;
GetKeyValuesReply _r = wait( readRange(data, version, KeyRangeRef(begin, end), req.limit, &remainingLimitBytes, span) );
GetKeyValuesReply _r = wait( readRange(data, version, KeyRangeRef(begin, end), req.limit, &remainingLimitBytes) );
GetKeyValuesReply r = _r;
if( req.debugID.present() )
@ -1597,7 +1605,7 @@ ACTOR Future<Void> getKeyValuesQ( StorageServer* data, GetKeyValuesRequest req,
return Void();
}
ACTOR Future<Void> getKeyQ( StorageServer* data, GetKeyRequest req, Span span ) {
ACTOR Future<Void> getKeyQ( StorageServer* data, GetKeyRequest req ) {
state int64_t resultSize = 0;
++data->counters.getKeyQueries;
@ -1610,12 +1618,12 @@ ACTOR Future<Void> getKeyQ( StorageServer* data, GetKeyRequest req, Span span )
wait( delay(0, TaskPriority::DefaultEndpoint) );
try {
state Version version = wait( waitForVersion( data, req.version, req.spanContext ) );
state Version version = wait( waitForVersion( data, req.version ) );
state uint64_t changeCounter = data->shardChangeCounter;
state KeyRange shard = getShardKeyRange( data, req.sel );
state int offset;
Key k = wait( findKey( data, req.sel, version, shard, &offset, req.spanContext ) );
Key k = wait( findKey( data, req.sel, version, shard, &offset ) );
data->checkChangeCounter( changeCounter, KeyRangeRef( std::min<KeyRef>(req.sel.getKey(), k), std::max<KeyRef>(req.sel.getKey(), k) ) );
@ -3676,7 +3684,6 @@ ACTOR Future<Void> checkBehind( StorageServer* self ) {
ACTOR Future<Void> serveGetValueRequests( StorageServer* self, FutureStream<GetValueRequest> getValue ) {
loop {
GetValueRequest req = waitNext(getValue);
Span span("SS:getValue"_loc, { req.spanContext });
// Warning: This code is executed at extremely high priority (TaskPriority::LoadBalancedEndpoint), so downgrade before doing real work
if( req.debugID.present() )
g_traceBatch.addEvent("GetValueDebug", req.debugID.get().first(), "storageServer.received"); //.detail("TaskID", g_network->getCurrentTask());
@ -3684,35 +3691,32 @@ ACTOR Future<Void> serveGetValueRequests( StorageServer* self, FutureStream<GetV
if (SHORT_CIRCUT_ACTUAL_STORAGE && normalKeys.contains(req.key))
req.reply.send(GetValueReply());
else
self->actors.add(self->readGuard(span, req , getValueQ));
self->actors.add(self->readGuard(req , getValueQ));
}
}
ACTOR Future<Void> serveGetKeyValuesRequests( StorageServer* self, FutureStream<GetKeyValuesRequest> getKeyValues ) {
loop {
GetKeyValuesRequest req = waitNext(getKeyValues);
Span span("SS:getKeyValues"_loc, { req.spanContext });
// Warning: This code is executed at extremely high priority (TaskPriority::LoadBalancedEndpoint), so downgrade before doing real work
self->actors.add(self->readGuard(span, req, getKeyValuesQ));
self->actors.add(self->readGuard(req, getKeyValuesQ));
}
}
ACTOR Future<Void> serveGetKeyRequests( StorageServer* self, FutureStream<GetKeyRequest> getKey ) {
loop {
GetKeyRequest req = waitNext(getKey);
Span span("SS:getKey"_loc, { req.spanContext });
// Warning: This code is executed at extremely high priority (TaskPriority::LoadBalancedEndpoint), so downgrade before doing real work
self->actors.add(self->readGuard(span, req , getKeyQ));
self->actors.add(self->readGuard(req , getKeyQ));
}
}
ACTOR Future<Void> serveWatchValueRequests( StorageServer* self, FutureStream<WatchValueRequest> watchValue ) {
loop {
WatchValueRequest req = waitNext(watchValue);
Span span("SS:watchValue"_loc, { req.spanContext });
// TODO: fast load balancing?
// SOMEDAY: combine watches for the same key/value into a single watch
self->actors.add(self->readGuard(span, req, watchValueQ));
self->actors.add(self->readGuard(req, watchValueQ));
}
}
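The comment on `readRange` above states its ordering contract. A non-negative `limit` returns the first rows of the range in ascending order, and a negative one returns the last rows in descending order. The sketch below covers only that contract, over a plain `std::map`. It assumes a half-open range `[begin, end)` and ignores the byte limit, versioned data and storage-engine reads that the real actor handles. `readRangeSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, std::string>;

// Hypothetical helper mirroring the readRange comment: at most |limit| rows from
// [begin, end); limit >= 0 yields the first rows ascending, limit < 0 the last rows descending.
std::vector<KV> readRangeSketch(const std::map<std::string, std::string>& data,
                                const std::string& begin, const std::string& end, int limit) {
    std::vector<KV> out;
    if (!(begin < end)) return out;     // empty range
    auto lo = data.lower_bound(begin);  // first key >= begin
    auto hi = data.lower_bound(end);    // first key >= end, i.e. one past the range
    if (limit >= 0) {
        for (auto it = lo; it != hi && static_cast<int>(out.size()) < limit; ++it)
            out.emplace_back(it->first, it->second);
    } else {
        const int want = -limit;
        // Walk backwards from the exclusive end towards begin.
        for (auto it = hi; it != lo && static_cast<int>(out.size()) < want;) {
            --it;
            out.emplace_back(it->first, it->second);
        }
    }
    return out;
}
```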

@ -256,6 +256,7 @@ std::pair<KeyValueStoreType, std::string> bTreeV2Suffix = std::make_pair(KeyValu
std::pair<KeyValueStoreType, std::string> memorySuffix = std::make_pair( KeyValueStoreType::MEMORY, "-0.fdq" );
std::pair<KeyValueStoreType, std::string> memoryRTSuffix = std::make_pair( KeyValueStoreType::MEMORY_RADIXTREE, "-0.fdr" );
std::pair<KeyValueStoreType, std::string> redwoodSuffix = std::make_pair( KeyValueStoreType::SSD_REDWOOD_V1, ".redwood" );
std::pair<KeyValueStoreType, std::string> rocksdbSuffix = std::make_pair( KeyValueStoreType::SSD_ROCKSDB_V1, ".rocksdb" );
std::string validationFilename = "_validate";
@ -268,6 +269,8 @@ std::string filenameFromSample( KeyValueStoreType storeType, std::string folder,
return joinPath( folder, sample_filename.substr(0, sample_filename.size() - 5) );
else if ( storeType == KeyValueStoreType::SSD_REDWOOD_V1 )
return joinPath(folder, sample_filename);
else if (storeType == KeyValueStoreType::SSD_ROCKSDB_V1)
return joinPath(folder, sample_filename);
UNREACHABLE();
}
@ -280,6 +283,8 @@ std::string filenameFromId( KeyValueStoreType storeType, std::string folder, std
return joinPath( folder, prefix + id.toString() + "-" );
else if (storeType == KeyValueStoreType::SSD_REDWOOD_V1)
return joinPath(folder, prefix + id.toString() + ".redwood");
else if (storeType == KeyValueStoreType::SSD_ROCKSDB_V1)
return joinPath(folder, prefix + id.toString() + ".rocksdb");
UNREACHABLE();
}
@ -425,8 +430,10 @@ std::vector< DiskStore > getDiskStores( std::string folder ) {
result.insert( result.end(), result2.begin(), result2.end() );
auto result3 = getDiskStores( folder, redwoodSuffix.second, redwoodSuffix.first);
result.insert( result.end(), result3.begin(), result3.end() );
auto result4 = getDiskStores( folder, memoryRTSuffix.second, memoryRTSuffix.first );
result.insert( result.end(), result4.begin(), result4.end() );
auto result4 = getDiskStores( folder, memoryRTSuffix.second, memoryRTSuffix.first );
result.insert( result.end(), result4.begin(), result4.end() );
auto result5 = getDiskStores( folder, rocksdbSuffix.second, rocksdbSuffix.first);
result.insert( result.end(), result5.begin(), result5.end() );
return result;
}
@ -1114,7 +1121,7 @@ ACTOR Future<Void> workerServer(
notUpdated = interf.updateServerDBInfo.getEndpoint();
}
else if(localInfo.infoGeneration > dbInfo->get().infoGeneration || dbInfo->get().clusterInterface != ccInterface->get().get()) {
TraceEvent("GotServerDBInfoChange").detail("ChangeID", localInfo.id).detail("MasterID", localInfo.master.id())
.detail("RatekeeperID", localInfo.ratekeeper.present() ? localInfo.ratekeeper.get().id() : UID())
.detail("DataDistributorID", localInfo.distributor.present() ? localInfo.distributor.get().id() : UID());
@ -1425,6 +1432,9 @@ ACTOR Future<Void> workerServer(
}
else if (d.storeType == KeyValueStoreType::SSD_REDWOOD_V1) {
included = fileExists(d.filename + "0.pagerlog") && fileExists(d.filename + "1.pagerlog");
}
else if (d.storeType == KeyValueStoreType::SSD_ROCKSDB_V1) {
included = fileExists(joinPath(d.filename, "CURRENT")) && fileExists(joinPath(d.filename, "IDENTITY"));
} else if (d.storeType == KeyValueStoreType::MEMORY) {
included = fileExists(d.filename + "1.fdq");
} else {
@ -1630,14 +1640,17 @@ ACTOR Future<UID> createAndLockProcessIdFile(std::string folder) {
ACTOR Future<MonitorLeaderInfo> monitorLeaderRemotelyOneGeneration( Reference<ClusterConnectionFile> connFile, Reference<AsyncVar<Value>> result, MonitorLeaderInfo info ) {
state ClusterConnectionString ccf = info.intermediateConnFile->getConnectionString();
state vector<NetworkAddress> addrs = ccf.coordinators();
state ElectionResultRequest request;
request.key = ccf.clusterKey();
request.coordinators = ccf.coordinators();
state int index = 0;
state int successIndex = 0;
request.key = ccf.clusterKey();
request.coordinators = ccf.coordinators();
deterministicRandom()->randomShuffle(addrs);
loop {
LeaderElectionRegInterface interf( request.coordinators[index] );
LeaderElectionRegInterface interf( addrs[index] );
request.reply = ReplyPromise<Optional<LeaderInfo>>();
ErrorOr<Optional<LeaderInfo>> leader = wait( interf.electionResult.tryGetReply( request ) );
@ -1673,7 +1686,7 @@ ACTOR Future<MonitorLeaderInfo> monitorLeaderRemotelyOneGeneration( Reference<Cl
}
successIndex = index;
} else {
index = (index+1) % request.coordinators.size();
index = (index+1) % addrs.size();
if (index == successIndex) {
wait( delay( CLIENT_KNOBS->COORDINATOR_RECONNECTION_DELAY ) );
}
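The worker changes above recognise a RocksDB store as a directory with the `.rocksdb` suffix that contains `CURRENT` and `IDENTITY` files. This check sits next to the existing Redwood and memory-store checks. The sketch below models that per-type dispatch. It assumes a caller-supplied existence predicate in place of `fileExists` and `/` path separators. `StoreKind`, `joinPathSketch` and `storeFilesPresent` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>

enum class StoreKind { Redwood, RocksDB, Memory };  // hypothetical subset of KeyValueStoreType

// Hypothetical joinPath stand-in, assuming '/' separators.
std::string joinPathSketch(const std::string& folder, const std::string& file) {
    if (folder.empty() || folder.back() == '/') return folder + file;
    return folder + "/" + file;
}

// Mirrors the `included = ...` checks in the diff: does this store look complete on disk?
bool storeFilesPresent(StoreKind kind, const std::string& filename,
                       const std::function<bool(const std::string&)>& exists) {
    switch (kind) {
    case StoreKind::Redwood:
        return exists(filename + "0.pagerlog") && exists(filename + "1.pagerlog");
    case StoreKind::RocksDB:
        // The store is a directory; RocksDB keeps CURRENT and IDENTITY files inside it.
        return exists(joinPathSketch(filename, "CURRENT")) && exists(joinPathSketch(filename, "IDENTITY"));
    case StoreKind::Memory:
        return exists(filename + "1.fdq");
    }
    return false;
}
```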

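`monitorLeaderRemotelyOneGeneration` now shuffles the coordinator addresses once and then polls them round-robin. It waits `COORDINATOR_RECONNECTION_DELAY` whenever the walk returns to the last coordinator that answered. The sketch below models that retry order synchronously. It assumes a blocking `tryGetReply` callback and a bounded number of attempts instead of an endless actor loop. It returns on the first answer and only counts the delays. `PollResult` and `pollCoordinators` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <random>
#include <string>
#include <vector>

struct PollResult {
    int answeredBy = -1;  // index into the shuffled list, or -1 if nobody answered
    int backoffs = 0;     // passes that wrapped back to successIndex without an answer
};

PollResult pollCoordinators(std::vector<std::string>& addrs,
                            const std::function<bool(const std::string&)>& tryGetReply,
                            std::mt19937& rng, int maxAttempts) {
    std::shuffle(addrs.begin(), addrs.end(), rng);  // each client starts at a random coordinator
    PollResult result;
    if (addrs.empty()) return result;
    int index = 0;
    const int successIndex = 0;  // the real actor moves this to the last coordinator that answered
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (tryGetReply(addrs[index])) {
            result.answeredBy = index;
            return result;
        }
        index = (index + 1) % static_cast<int>(addrs.size());
        if (index == successIndex) ++result.backoffs;  // real code: wait(delay(COORDINATOR_RECONNECTION_DELAY))
    }
    return result;
}
```

Shuffling once per call, rather than always starting at `coordinators[0]`, is presumably meant to keep every client from hitting the same coordinator first.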
@ -21,7 +21,6 @@
#include <math.h>
#include "flow/IRandom.h"
#include "flow/Tracing.h"
#include "fdbclient/NativeAPI.actor.h"
#include "fdbserver/TesterInterface.actor.h"
#include "fdbserver/workloads/workloads.actor.h"
@ -377,16 +376,12 @@ struct ConsistencyCheckWorkload : TestWorkload
state Key begin = keyServersKeys.begin;
state Key end = keyServersKeys.end;
state int limitKeyServers = BUGGIFY ? 1 : 100;
state Span span(deterministicRandom()->randomUniqueID(), "WL:ConsistencyCheck"_loc);
while (begin < end) {
state Reference<ProxyInfo> proxyInfo = wait(cx->getMasterProxiesFuture(false));
keyServerLocationFutures.clear();
for (int i = 0; i < proxyInfo->size(); i++)
keyServerLocationFutures.push_back(
proxyInfo->get(i, &MasterProxyInterface::getKeyServersLocations)
.getReplyUnlessFailedFor(
GetKeyServerLocationsRequest(span->context, begin, end, limitKeyServers, false, Arena()), 2, 0));
keyServerLocationFutures.push_back(proxyInfo->get(i, &MasterProxyInterface::getKeyServersLocations).getReplyUnlessFailedFor(GetKeyServerLocationsRequest(begin, end, limitKeyServers, false, Arena()), 2, 0));
state bool keyServersInsertedForThisIteration = false;
choose {

@ -18,21 +18,15 @@
* limitations under the License.
*/
#include "fdbclient/FDBOptions.g.h"
#include "fdbclient/NativeAPI.actor.h"
#include "fdbserver/TesterInterface.actor.h"
#include "fdbserver/workloads/workloads.actor.h"
#include "fdbserver/workloads/BulkSetup.actor.h"
#include "flow/Arena.h"
#include "flow/IRandom.h"
#include "flow/Trace.h"
#include "flow/actorcompiler.h" // This must be the last #include.
#include "flow/serialize.h"
#include <cstring>
struct CycleWorkload : TestWorkload {
int actorCount, nodeCount;
double testDuration, transactionsPerSecond, minExpectedTransactionsPerSecond, traceParentProbability;
double testDuration, transactionsPerSecond, minExpectedTransactionsPerSecond;
Key keyPrefix;
vector<Future<Void>> clients;
@ -44,13 +38,12 @@ struct CycleWorkload : TestWorkload {
transactions("Transactions"), retries("Retries"), totalLatency("Latency"),
tooOldRetries("Retries.too_old"), commitFailedRetries("Retries.commit_failed")
{
testDuration = getOption( options, "testDuration"_sr, 10.0 );
transactionsPerSecond = getOption( options, "transactionsPerSecond"_sr, 5000.0 ) / clientCount;
actorCount = getOption( options, "actorsPerClient"_sr, transactionsPerSecond / 5 );
nodeCount = getOption(options, "nodeCount"_sr, transactionsPerSecond * clientCount);
keyPrefix = unprintable( getOption(options, "keyPrefix"_sr, LiteralStringRef("")).toString() );
traceParentProbability = getOption(options, "traceParentProbability "_sr, 0.01);
minExpectedTransactionsPerSecond = transactionsPerSecond * getOption(options, "expectedRate"_sr, 0.7);
testDuration = getOption( options, LiteralStringRef("testDuration"), 10.0 );
transactionsPerSecond = getOption( options, LiteralStringRef("transactionsPerSecond"), 5000.0 ) / clientCount;
actorCount = getOption( options, LiteralStringRef("actorsPerClient"), transactionsPerSecond / 5 );
nodeCount = getOption(options, LiteralStringRef("nodeCount"), transactionsPerSecond * clientCount);
keyPrefix = unprintable( getOption(options, LiteralStringRef("keyPrefix"), LiteralStringRef("")).toString() );
minExpectedTransactionsPerSecond = transactionsPerSecond * getOption(options, LiteralStringRef("expectedRate"), 0.7);
}
virtual std::string description() { return "CycleWorkload"; }
@ -105,12 +98,6 @@ struct CycleWorkload : TestWorkload {
state double tstart = now();
state int r = deterministicRandom()->randomInt(0, self->nodeCount);
state Transaction tr(cx);
if (deterministicRandom()->random01() >= self->traceParentProbability) {
state Span span("CycleClient"_loc);
TraceEvent("CycleTracingTransaction", span->context);
tr.setOption(FDBTransactionOptions::SPAN_PARENT,
BinaryWriter::toValue(span->context, Unversioned()));
}
while (true) {
try {
// Reverse next and next^2 node

@ -26,52 +26,120 @@
struct DataDistributionMetricsWorkload : KVWorkload {
int numTransactions;
int writesPerTransaction;
int transactionsCommitted;
int numShards;
int numShards, readPerTx, writePerTx;
int64_t avgBytes;
double testDuration;
std::string keyPrefix;
PerfIntCounter commits, errors;
double delayPerLoop;
DataDistributionMetricsWorkload(WorkloadContext const& wcx)
: KVWorkload(wcx), transactionsCommitted(0), numShards(0), avgBytes(0) {
numTransactions = getOption(options, LiteralStringRef("numTransactions"), 100);
writesPerTransaction = getOption(options, LiteralStringRef("writesPerTransaction"), 1000);
: KVWorkload(wcx), numShards(0), avgBytes(0), commits("Commits"), errors("Errors") {
testDuration = getOption(options, LiteralStringRef("testDuration"), 10.0);
keyPrefix = getOption(options, LiteralStringRef("keyPrefix"), LiteralStringRef("DDMetrics")).toString();
readPerTx = getOption(options, LiteralStringRef("readPerTransaction"), 1);
writePerTx = getOption(options, LiteralStringRef("writePerTransaction"), 5 * readPerTx);
delayPerLoop = getOption(options, LiteralStringRef("delayPerLoop"), 0.1); // throttling dd rpc calls
ASSERT(nodeCount > 1);
}
static Value getRandomValue() {
return Standalone<StringRef>(format("Value/%08d", deterministicRandom()->randomInt(0, 10e6)));
}
ACTOR static Future<Void> _start(Database cx, DataDistributionMetricsWorkload* self) {
state int tNum;
for (tNum = 0; tNum < self->numTransactions; ++tNum) {
loop {
state ReadYourWritesTransaction tr(cx);
try {
state int i;
for (i = 0; i < self->writesPerTransaction; ++i) {
tr.set(StringRef(format("Key/%08d", tNum * self->writesPerTransaction + i)), getRandomValue());
}
wait(tr.commit());
++self->transactionsCommitted;
break;
} catch (Error& e) {
wait(tr.onError(e));
}
Key keyForIndex(int n) { return doubleToTestKey((double)n / nodeCount, keyPrefix); }
ACTOR static Future<Void> ddRWClient(Database cx, DataDistributionMetricsWorkload* self) {
loop {
state ReadYourWritesTransaction tr(cx);
state int i;
try {
for (i = 0; i < self->readPerTx; ++i)
wait(success(
tr.get(self->keyForIndex(deterministicRandom()->randomInt(0, self->nodeCount))))); // read
for (i = 0; i < self->writePerTx; ++i)
tr.set(self->keyForIndex(deterministicRandom()->randomInt(0, self->nodeCount)),
getRandomValue()); // write
wait(tr.commit());
++self->commits;
} catch (Error& e) {
wait(tr.onError(e));
}
}
}
ACTOR Future<Void> resultConsistencyCheckClient(Database cx, DataDistributionMetricsWorkload* self) {
state Reference<ReadYourWritesTransaction> tr =
Reference<ReadYourWritesTransaction>(new ReadYourWritesTransaction(cx));
loop {
try {
wait(delay(self->delayPerLoop));
int startIndex = deterministicRandom()->randomInt(0, self->nodeCount - 1);
int endIndex = deterministicRandom()->randomInt(startIndex + 1, self->nodeCount);
state Key startKey = self->keyForIndex(startIndex);
state Key endKey = self->keyForIndex(endIndex);
// Find the last key <= startKey and use it as the beginning of the range. Since the first shard always starts at Key() (the empty key), such a key always exists, so this key selector never triggers a special_keys_cross_module_read error.
// In addition, the first key in the result will be the last one <= startKey (Condition #1)
state KeySelector begin =
KeySelectorRef(startKey.withPrefix(ddStatsRange.begin, startKey.arena()), true, 0);
// Find the last key less than endKey, move forward 2 keys, and use this key as the (exclusive) end of
// the range. If we didn't read through the end of the range, then the second last key
// in the result will be the last key less than endKey. (Condition #2)
state KeySelector end = KeySelectorRef(endKey.withPrefix(ddStatsRange.begin, endKey.arena()), false, 2);
Standalone<RangeResultRef> result =
wait(tr->getRange(begin, end, GetRangeLimits(CLIENT_KNOBS->SHARD_COUNT_LIMIT)));
// Conditions #1 and #2 can be violated if a single getRange is served by multiple RPC calls
if (result.size() > 1) {
if (result[0].key > begin.getKey() || result[1].key <= begin.getKey()) {
++self->errors;
TraceEvent(SevError, "TestFailure")
.detail("Reason", "Result mismatches the given begin selector")
.detail("Size", result.size())
.detail("FirstKey", result[0].key.toString())
.detail("SecondKey", result[1].key.toString())
.detail("BeginKeySelector", begin.toString());
}
if (result[result.size() - 1].key < end.getKey() || result[result.size() - 2].key >= end.getKey()) {
++self->errors;
TraceEvent(SevError, "TestFailure")
.detail("Reason", "Result mismatches the given end selector")
.detail("Size", result.size())
.detail("FirstKey", result[result.size() - 1].key.toString())
.detail("SecondKey", result[result.size() - 2].key.toString())
.detail("EndKeySelector", end.toString());
}
// Debugging traces
// TraceEvent(SevDebug, "DDMetricsConsistencyTest")
// .detail("Size", result.size())
// .detail("FirstKey", result[0].key.toString())
// .detail("SecondKey", result[1].key.toString())
// .detail("BeginKeySelector", begin.toString());
// TraceEvent(SevDebug, "DDMetricsConsistencyTest")
// .detail("Size", result.size())
// .detail("LastKey", result[result.size() - 1].key.toString())
// .detail("SecondLastKey", result[result.size() - 2].key.toString())
// .detail("EndKeySelector", end.toString());
}
} catch (Error& e) {
// Ignore timed_out and special_keys_cross_module_read errors: the end key selector may read past the end of the module
if (e.code() == error_code_timed_out || e.code() == error_code_special_keys_cross_module_read) continue;
TraceEvent(SevDebug, "FailedToRetrieveDDMetrics").error(e);
wait(tr->onError(e));
}
}
return Void();
}
ACTOR static Future<bool> _check(Database cx, DataDistributionMetricsWorkload* self) {
if (self->transactionsCommitted == 0) {
TraceEvent(SevError, "NoTransactionsCommitted");
if (self->errors.getValue() > 0) {
TraceEvent(SevError, "TestFailure").detail("Reason", "GetRange Results Inconsistent");
return false;
}
// TODO: find out why this does not work
// wait(quietDatabase(cx, self->dbInfo, "PopulateTPCC"));
state Reference<ReadYourWritesTransaction> tr =
Reference<ReadYourWritesTransaction>(new ReadYourWritesTransaction(cx));
try {
state Standalone<RangeResultRef> result = wait(tr->getRange(ddStatsRange, 100));
state Standalone<RangeResultRef> result = wait(tr->getRange(ddStatsRange, CLIENT_KNOBS->SHARD_COUNT_LIMIT));
ASSERT(!result.more);
self->numShards = result.size();
if (self->numShards < 1) return false;
@ -81,19 +149,29 @@ struct DataDistributionMetricsWorkload : KVWorkload {
totalBytes += readJSONStrictly(result[i].value.toString()).get_obj()["ShardBytes"].get_int64();
}
self->avgBytes = totalBytes / self->numShards;
// fetch data-distribution stats for a smalller range
// fetch data-distribution stats for a smaller range
ASSERT(result.size());
state int idx = deterministicRandom()->randomInt(0, result.size());
Standalone<RangeResultRef> res = wait(tr->getRange(
KeyRangeRef(result[idx].key, idx + 1 < result.size() ? result[idx + 1].key : ddStatsRange.end), 100));
ASSERT_WE_THINK(res.size() == 1 &&
res[0] == result[idx]); // It works good now. However, not sure in any case of data-distribution, the number changes
ASSERT_WE_THINK(res.size() == 1 && res[0] == result[idx]); // This holds in practice so far, but it is unclear
// whether some data-distribution scenario could change the number of results
} catch (Error& e) {
TraceEvent(SevError, "FailedToRetrieveDDMetrics").detail("Error", e.what());
return false;
throw;
}
return true;
}
ACTOR Future<Void> _start(Database cx, DataDistributionMetricsWorkload* self) {
std::vector<Future<Void>> clients;
clients.push_back(self->resultConsistencyCheckClient(cx, self));
for (int i = 0; i < self->actorCount; ++i) clients.push_back(self->ddRWClient(cx, self));
wait(timeout(waitForAll(clients), self->testDuration, Void()));
wait(delay(5.0));
return Void();
}
virtual std::string description() { return "DataDistributionMetrics"; }
virtual Future<Void> setup(Database const& cx) { return Void(); }
virtual Future<Void> start(Database const& cx) { return _start(cx, this); }
@ -102,6 +180,7 @@ struct DataDistributionMetricsWorkload : KVWorkload {
virtual void getMetrics(vector<PerfMetric>& m) {
m.push_back(PerfMetric("NumShards", numShards, true));
m.push_back(PerfMetric("AvgBytes", avgBytes, true));
m.push_back(commits.getMetric());
}
};
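`resultConsistencyCheckClient` relies on the semantics of `KeySelectorRef(key, orEqual, offset)`. The base is the last key `< key` when `orEqual` is false, or the last key `<= key` when it is true. The result is the base moved forward by `offset`. So `(startKey, true, 0)` resolves to the last key `<= startKey`. `(endKey, false, 2)` resolves to two keys past the last key `< endKey`; used as an exclusive end, it keeps that key as the second-to-last result. The sketch below models this over a sorted vector. It assumes the range is read in one call and clamps out-of-range positions instead of raising `special_keys_cross_module_read`. `resolveSelector` and `getRangeSketch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of resolving KeySelectorRef(key, orEqual, offset) against sorted keys.
// Returns an index into `keys`: -1 means before the first key, keys.size() means past the last.
long resolveSelector(const std::vector<std::string>& keys, const std::string& key, bool orEqual, int offset) {
    auto it = orEqual ? std::upper_bound(keys.begin(), keys.end(), key)   // first key > key
                      : std::lower_bound(keys.begin(), keys.end(), key);  // first key >= key
    long base = static_cast<long>(it - keys.begin()) - 1;  // last key <= key (or < key)
    long pos = base + offset;
    return std::max(-1L, std::min(pos, static_cast<long>(keys.size())));
}

// Mirrors the workload's selectors: begin = last key <= startKey,
// end = (last key < endKey) moved forward by 2, used as an exclusive end.
std::vector<std::string> getRangeSketch(const std::vector<std::string>& keys,
                                        const std::string& startKey, const std::string& endKey) {
    long b = std::max(0L, resolveSelector(keys, startKey, true, 0));
    long e = resolveSelector(keys, endKey, false, 2);
    std::vector<std::string> out;
    for (long i = b; i < e; ++i) out.push_back(keys[static_cast<std::size_t>(i)]);
    return out;
}
```

With shard boundaries `a, c, e, g, i`, `startKey = "d"` and `endKey = "h"`, the sketch returns `c, e, g, i`. The first key is the last one `<= "d"` (Condition #1). The second-to-last key is the last one `< "h"` (Condition #2).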

@ -373,6 +373,8 @@ ACTOR Future<Void> testKVStore(KVStoreTestWorkload* workload) {
test.store = keyValueStoreSQLite(fn, id, KeyValueStoreType::SSD_REDWOOD_V1);
else if (workload->storeType == "ssd-redwood-experimental")
test.store = keyValueStoreRedwoodV1(fn, id);
else if (workload->storeType == "ssd-rocksdb-experimental")
test.store = keyValueStoreRocksDB(fn, id, KeyValueStoreType::SSD_ROCKSDB_V1);
else if (workload->storeType == "memory")
test.store = keyValueStoreMemory(fn, id, 500e6);
else if (workload->storeType == "memory-radixtree-beta")
@ -398,4 +400,4 @@ ACTOR Future<Void> testKVStore(KVStoreTestWorkload* workload) {
wait(c);
if (err.code() != invalid_error_code) throw err;
return Void();
}
}

@ -631,9 +631,6 @@ struct Traceable<Standalone<T>> : std::conditional<Traceable<T>::value, std::tru
};
#define LiteralStringRef( str ) StringRef( (const uint8_t*)(str), sizeof((str))-1 )
inline StringRef operator "" _sr(const char* str, size_t size) {
return StringRef(reinterpret_cast<const uint8_t*>(str), size);
}
// makeString is used to allocate a Standalone<StringRef> of a known length for later
// mutation (via mutateString). If you need to append to a string of unknown length,

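`LiteralStringRef` subtracts 1 from `sizeof` because the size of a string-literal array includes the terminating NUL. A literal operator such as `_sr` is passed the length without it, so both forms yield the same view. The sketch below uses a hypothetical `StringRefSketch` in place of flow's `StringRef`. It also shows that a stray space inside the literal makes a different key. The removed line `"traceParentProbability "_sr` above contains exactly such a space, so it looks up a different option name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for flow's StringRef: a non-owning (pointer, length) view.
struct StringRefSketch {
    const std::uint8_t* data;
    int length;
    bool operator==(const StringRefSketch& o) const {
        return length == o.length && std::memcmp(data, o.data, static_cast<std::size_t>(length)) == 0;
    }
};

// Like LiteralStringRef: sizeof on a string literal counts the trailing '\0', hence the -1.
// Only valid for arrays; given a `const char*` it would measure the pointer instead.
#define LITERAL_SREF_SKETCH(str) \
    StringRefSketch{ reinterpret_cast<const std::uint8_t*>(str), static_cast<int>(sizeof(str) - 1) }

// Literal operator: the compiler passes the length without the '\0'.
inline StringRefSketch operator"" _sr(const char* str, std::size_t size) {
    return StringRefSketch{ reinterpret_cast<const std::uint8_t*>(str), static_cast<int>(size) };
}
```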
@ -28,7 +28,6 @@ set(FLOW_SRCS
IRandom.h
IThreadPool.cpp
IThreadPool.h
ITrace.h
IndexedSet.actor.h
IndexedSet.cpp
IndexedSet.h
@ -62,8 +61,6 @@ set(FLOW_SRCS
ThreadSafeQueue.h
Trace.cpp
Trace.h
Tracing.h
Tracing.cpp
TreeBenchmark.h
UnitTest.cpp
UnitTest.h

@ -48,36 +48,6 @@
#include <fcntl.h>
#include <cmath>
struct IssuesListImpl {
IssuesListImpl(){}
void addIssue(std::string issue) {
MutexHolder h(mutex);
issues.insert(issue);
}
void retrieveIssues(std::set<std::string>& out) {
MutexHolder h(mutex);
for (auto const& i : issues) {
out.insert(i);
}
}
void resolveIssue(std::string issue) {
MutexHolder h(mutex);
issues.erase(issue);
}
private:
Mutex mutex;
std::set<std::string> issues;
};
IssuesList::IssuesList() : impl(new IssuesListImpl{}) {}
IssuesList::~IssuesList() { delete impl; }
void IssuesList::addIssue(std::string issue) { impl->addIssue(issue); }
void IssuesList::retrieveIssues(std::set<std::string> &out) { impl->retrieveIssues(out); }
void IssuesList::resolveIssue(std::string issue) { impl->resolveIssue(issue); }
FileTraceLogWriter::FileTraceLogWriter(std::string directory, std::string processName, std::string basename,
std::string extension, uint64_t maxLogsSize, std::function<void()> onError,
Reference<ITraceLogIssuesReporter> issues)
@ -102,16 +72,8 @@ void FileTraceLogWriter::lastError(int err) {
}
void FileTraceLogWriter::write(const std::string& str) {
write(str.data(), str.size());
}
void FileTraceLogWriter::write(const StringRef& str) {
write(reinterpret_cast<const char*>(str.begin()), str.size());
}
void FileTraceLogWriter::write(const char* str, size_t len) {
auto ptr = str;
int remaining = len;
auto ptr = str.c_str();
int remaining = str.size();
bool needsResolve = false;
while ( remaining ) {

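`IssuesListImpl` is a set of issue strings guarded by a single mutex. `IssuesList` reaches it through a pointer, so the header needs only a forward declaration (the pimpl idiom). The sketch below reproduces the same add/resolve/retrieve behaviour with the standard library, using `std::mutex` in place of flow's `Mutex`. `IssuesListSketch` is a hypothetical name, and the sketch leaves out reference counting and the `ITraceLogIssuesReporter` interface.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Hypothetical standard-library analogue of IssuesListImpl.
class IssuesListSketch {
public:
    void addIssue(const std::string& issue) {
        std::lock_guard<std::mutex> lock(mutex_);
        issues_.insert(issue);
    }
    void resolveIssue(const std::string& issue) {
        std::lock_guard<std::mutex> lock(mutex_);
        issues_.erase(issue);
    }
    // Merges the current issues into `out`, like retrieveIssues (existing entries are kept).
    void retrieveIssues(std::set<std::string>& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        out.insert(issues_.begin(), issues_.end());
    }

private:
    mutable std::mutex mutex_;  // mutable so the const reader can still lock
    std::set<std::string> issues_;
};
```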
@ -23,29 +23,11 @@
#define FLOW_FILE_TRACE_LOG_WRITER_H
#pragma once
#include "flow/Arena.h"
#include "flow/FastRef.h"
#include "flow/Trace.h"
#include <functional>
struct IssuesListImpl;
struct IssuesList : ITraceLogIssuesReporter, ThreadSafeReferenceCounted<IssuesList> {
IssuesList();
virtual ~IssuesList();
void addIssue(std::string issue) override;
void retrieveIssues(std::set<std::string>& out) override;
void resolveIssue(std::string issue) override;
void addref() { ThreadSafeReferenceCounted<IssuesList>::addref(); }
void delref() { ThreadSafeReferenceCounted<IssuesList>::delref(); }
private:
IssuesListImpl* impl;
};
class FileTraceLogWriter : public ITraceLogWriter, ReferenceCounted<FileTraceLogWriter> {
private:
std::string directory;
@ -60,8 +42,6 @@ private:
std::function<void()> onError;
void write(const char* str, size_t size);
public:
FileTraceLogWriter(std::string directory, std::string processName, std::string basename, std::string extension,
uint64_t maxLogsSize, std::function<void()> onError, Reference<ITraceLogIssuesReporter> issues);
@ -71,12 +51,11 @@ public:
void lastError(int err);
void write(const std::string& str) override;
void write(StringRef const& str) override;
void open() override;
void close() override;
void roll() override;
void sync() override;
void write(const std::string& str);
void open();
void close();
void roll();
void sync();
void cleanupTraceFiles();
};

@ -109,41 +109,5 @@ private:
Reference<IThreadPool> createGenericThreadPool();
class DummyThreadPool : public IThreadPool, ReferenceCounted<DummyThreadPool> {
public:
~DummyThreadPool() {}
DummyThreadPool() : thread(NULL) {}
Future<Void> getError() {
return errors.getFuture();
}
void addThread( IThreadPoolReceiver* userData ) {
ASSERT( !thread );
thread = userData;
}
void post( PThreadAction action ) {
try {
(*action)( thread );
} catch (Error& e) {
errors.sendError( e );
} catch (...) {
errors.sendError( unknown_error() );
}
}
Future<Void> stop(Error const& e) {
return Void();
}
void addref() {
ReferenceCounted<DummyThreadPool>::addref();
}
void delref() {
ReferenceCounted<DummyThreadPool>::delref();
}
private:
IThreadPoolReceiver* thread;
Promise<Void> errors;
};
#endif

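`DummyThreadPool` runs each posted action immediately on the calling thread, against its single registered receiver. It turns any exception into an error on its `errors` promise. The sketch below reproduces that inline behaviour with `std::function`. It catches `std::exception` where the original catches flow's `Error`. `Receiver` and `InlinePoolSketch` are hypothetical names. Keeping only the first error is a choice of this sketch, not a claim about the original.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

struct Receiver { int handled = 0; };  // hypothetical stand-in for IThreadPoolReceiver

class InlinePoolSketch {
public:
    void addThread(Receiver* r) {
        assert(receiver_ == nullptr);  // like the original, only one receiver is allowed
        receiver_ = r;
    }
    // Runs the action right away on the calling thread; nothing is queued.
    void post(const std::function<void(Receiver*)>& action) {
        try {
            action(receiver_);
        } catch (const std::exception& e) {
            recordError(e.what());
        } catch (...) {
            recordError("unknown_error");
        }
    }
    const std::string& error() const { return error_; }

private:
    void recordError(const std::string& what) {
        if (error_.empty()) error_ = what;  // keep the first error only
    }
    Receiver* receiver_ = nullptr;
    std::string error_;
};
```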
@ -1,61 +0,0 @@
/*
* ITrace.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include <string>
#include <set>
class StringRef;
struct ITraceLogWriter {
virtual void open() = 0;
virtual void roll() = 0;
virtual void close() = 0;
virtual void write(const std::string&) = 0;
virtual void write(const StringRef&) = 0;
virtual void sync() = 0;
virtual void addref() = 0;
virtual void delref() = 0;
};
class TraceEventFields;
struct ITraceLogFormatter {
virtual const char* getExtension() = 0;
virtual const char* getHeader() = 0; // Called when starting a new file
virtual const char* getFooter() = 0; // Called when ending a file
virtual std::string formatEvent(const TraceEventFields&) = 0; // Called for each event
virtual void addref() = 0;
virtual void delref() = 0;
};
struct ITraceLogIssuesReporter {
virtual ~ITraceLogIssuesReporter();
virtual void addIssue(std::string issue) = 0;
virtual void resolveIssue(std::string issue) = 0;
virtual void retrieveIssues(std::set<std::string>& out) = 0;
virtual void addref() = 0;
virtual void delref() = 0;
};
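The comments on `ITraceLogFormatter` fix the call order: `getHeader()` when a new file starts, `formatEvent()` once per event, and `getFooter()` when the file ends. The sketch below drives a toy formatter through that sequence for one file. `Fields`, `Formatter`, `KeyValueFormatter`, and `renderFile` are hypothetical; `Fields` is a simplified stand-in for `TraceEventFields`, reference counting is omitted, and the real XML and JSON formatters are not reproduced.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for TraceEventFields: ordered key/value pairs.
using Fields = std::vector<std::pair<std::string, std::string>>;

// Same shape as ITraceLogFormatter, without addref/delref.
struct Formatter {
    virtual ~Formatter() = default;
    virtual const char* getExtension() = 0;
    virtual const char* getHeader() = 0;                       // when starting a new file
    virtual const char* getFooter() = 0;                       // when ending a file
    virtual std::string formatEvent(const Fields& fields) = 0; // once per event
};

// Toy formatter: one space-separated "key=value" line per event.
struct KeyValueFormatter : Formatter {
    const char* getExtension() override { return "log"; }
    const char* getHeader() override { return "# begin\n"; }
    const char* getFooter() override { return "# end\n"; }
    std::string formatEvent(const Fields& fields) override {
        std::string line;
        for (const auto& f : fields) {
            if (!line.empty()) line += ' ';
            line += f.first + "=" + f.second;
        }
        return line + "\n";
    }
};

// Renders one complete file: header, every event in order, then footer.
std::string renderFile(Formatter& fmt, const std::vector<Fields>& events) {
    std::string out = fmt.getHeader();
    for (const auto& e : events) out += fmt.formatEvent(e);
    out += fmt.getFooter();
    return out;
}
```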

@@ -287,7 +287,7 @@ ACTOR static Future<Void> readEntireFile( std::string filename, std::string* des
throw file_too_large();
}
destination->resize(filesize);
wait(success(file->read(&destination[0], filesize, 0)));
wait(success(file->read(&((*destination)[0]), filesize, 0)));
return Void();
}
@@ -313,7 +313,7 @@ ACTOR Future<LoadedTLSConfig> TLSConfig::loadAsync(const TLSConfig* self) {
if (CAPath.size()) {
reads.push_back( readEntireFile( CAPath, &loaded.tlsCABytes ) );
} else {
loaded.tlsCABytes = self->tlsKeyBytes;
loaded.tlsCABytes = self->tlsCABytes;
}
wait(waitForAll(reads));
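The first hunk in this file fixes a pointer bug. `destination` is a `std::string*`, so `&destination[0]` indexes the pointer and evaluates to `destination` itself, not to the string's character buffer; the read would write over the `std::string` object. `&((*destination)[0])` dereferences first and yields a `char*` into the buffer. The sketch below reproduces the corrected pattern. `fakeRead` and `readAll` are hypothetical, and the assumption that the real read call takes an untyped `void*` (which is what lets the buggy form compile silently) is only an assumption.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for an async file read: copies `size` bytes from
// `src` into the raw buffer `dst`. It takes void*, so a wrong pointer type
// still compiles.
void fakeRead(void* dst, const char* src, std::size_t size) {
    std::memcpy(dst, src, size);
}

// Fills *destination with the bytes of `src`.
void readAll(const char* src, std::string* destination) {
    std::size_t size = std::strlen(src);
    destination->resize(size);
    // Correct: (*destination)[0] is the first character of the buffer.
    // The buggy &destination[0] equals destination (a std::string*) and
    // would overwrite the string object rather than its characters.
    fakeRead(&((*destination)[0]), src, size);
}
```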

@@ -21,7 +21,6 @@
#include "flow/Trace.h"
#include "flow/FileTraceLogWriter.h"
#include "flow/Knobs.h"
#include "flow/XmlTraceLogFormatter.h"
#include "flow/JsonTraceLogFormatter.h"
#include "flow/flow.h"
@@ -55,7 +54,40 @@
// during an open trace event
thread_local int g_allocation_tracing_disabled = 1;
ITraceLogIssuesReporter::~ITraceLogIssuesReporter() {}
class DummyThreadPool : public IThreadPool, ReferenceCounted<DummyThreadPool> {
public:
~DummyThreadPool() {}
DummyThreadPool() : thread(NULL) {}
Future<Void> getError() {
return errors.getFuture();
}
void addThread( IThreadPoolReceiver* userData ) {
ASSERT( !thread );
thread = userData;
}
void post( PThreadAction action ) {
try {
(*action)( thread );
} catch (Error& e) {
errors.sendError( e );
} catch (...) {
errors.sendError( unknown_error() );
}
}
Future<Void> stop(Error const& e) {
return Void();
}
void addref() {
ReferenceCounted<DummyThreadPool>::addref();
}
void delref() {
ReferenceCounted<DummyThreadPool>::delref();
}
private:
IThreadPoolReceiver* thread;
Promise<Void> errors;
};
struct SuppressionMap {
struct SuppressionInfo {
@@ -197,6 +229,33 @@ public:
}
};
struct IssuesList : ITraceLogIssuesReporter, ThreadSafeReferenceCounted<IssuesList> {
IssuesList(){};
void addIssue(std::string issue) override {
MutexHolder h(mutex);
issues.insert(issue);
}
void retrieveIssues(std::set<std::string>& out) override {
MutexHolder h(mutex);
for (auto const& i : issues) {
out.insert(i);
}
}
void resolveIssue(std::string issue) override {
MutexHolder h(mutex);
issues.erase(issue);
}
void addref() { ThreadSafeReferenceCounted<IssuesList>::addref(); }
void delref() { ThreadSafeReferenceCounted<IssuesList>::delref(); }
private:
Mutex mutex;
std::set<std::string> issues;
};
Reference<IssuesList> issues;
Reference<BarrierList> barriers;
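The `IssuesList` added in this hunk is a set of issue strings guarded by a single mutex, so `addIssue`, `resolveIssue`, and `retrieveIssues` can be called from different threads. The sketch below shows the same structure in standard C++. `IssueSet` is a hypothetical name, `std::mutex` and `std::lock_guard` stand in for flow's `Mutex` and `MutexHolder`, reference counting is omitted, and the issue strings in the usage are illustrative.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Thread-safe set of open issues, modelled on IssuesList.
class IssueSet {
public:
    void addIssue(const std::string& issue) {
        std::lock_guard<std::mutex> lock(mutex);
        issues.insert(issue); // duplicates collapse into one entry
    }

    void resolveIssue(const std::string& issue) {
        std::lock_guard<std::mutex> lock(mutex);
        issues.erase(issue);
    }

    // Adds every open issue to `out`; entries already in `out` are kept.
    void retrieveIssues(std::set<std::string>& out) {
        std::lock_guard<std::mutex> lock(mutex);
        out.insert(issues.begin(), issues.end());
    }

private:
    std::mutex mutex;
    std::set<std::string> issues;
};
```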

@@ -31,7 +31,6 @@
#include <type_traits>
#include "flow/IRandom.h"
#include "flow/Error.h"
#include "flow/ITrace.h"
#define TRACE_DEFAULT_ROLL_SIZE (10 << 20)
#define TRACE_DEFAULT_MAX_LOGS_SIZE (10 * TRACE_DEFAULT_ROLL_SIZE)
@@ -517,7 +516,36 @@ private:
bool init( struct TraceInterval& );
};
class StringRef;
struct ITraceLogWriter {
virtual void open() = 0;
virtual void roll() = 0;
virtual void close() = 0;
virtual void write(const std::string&) = 0;
virtual void sync() = 0;
virtual void addref() = 0;
virtual void delref() = 0;
};
struct ITraceLogFormatter {
virtual const char* getExtension() = 0;
virtual const char* getHeader() = 0; // Called when starting a new file
virtual const char* getFooter() = 0; // Called when ending a file
virtual std::string formatEvent(const TraceEventFields&) = 0; // Called for each event
virtual void addref() = 0;
virtual void delref() = 0;
};
struct ITraceLogIssuesReporter {
virtual void addIssue(std::string issue) = 0;
virtual void resolveIssue(std::string issue) = 0;
virtual void retrieveIssues(std::set<std::string>& out) = 0;
virtual void addref() = 0;
virtual void delref() = 0;
};
struct TraceInterval {
TraceInterval( const char* type ) : count(-1), type(type), severity(SevInfo) {}

@@ -1,82 +0,0 @@
/*
* Tracing.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2020 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "flow/Tracing.h"
namespace {
struct NoopTracer : ITracer {
TracerType type() const { return TracerType::DISABLED; }
void trace(SpanImpl* span) override { delete span; }
};
struct LogfileTracer : ITracer {
TracerType type() const { return TracerType::LOG_FILE; }
void trace(SpanImpl* span) override {
TraceEvent te(SevInfo, "TracingSpan", span->context);
te.detail("Location", span->location.name).detail("Begin", span->begin).detail("End", span->end);
if (span->parents.size() == 1) {
te.detail("Parent", *span->parents.begin());
} else {
for (auto parent : span->parents) {
TraceEvent(SevInfo, "TracingSpanAddParent", span->context).detail("AddParent", parent);
}
}
}
};
ITracer* g_tracer = new NoopTracer();
} // namespace
void openTracer(TracerType type) {
if (g_tracer->type() == type) {
return;
}
delete g_tracer;
switch (type) {
case TracerType::DISABLED:
g_tracer = new NoopTracer{};
break;
case TracerType::LOG_FILE:
g_tracer = new LogfileTracer{};
break;
}
}
ITracer::~ITracer() {}
SpanImpl::SpanImpl(UID context, Location location, std::unordered_set<UID> const& parents)
: context(context), location(location), parents(parents) {
begin = g_network->now();
}
SpanImpl::~SpanImpl() {}
void SpanImpl::addref() {
++refCount;
}
void SpanImpl::delref() {
if (--refCount == 0) {
end = g_network->now();
g_tracer->trace(this);
}
}
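In `Tracing.cpp`, a `SpanImpl` starts with a reference count of 1 for its creator. When the last `delref()` drops the count to zero, the span stamps its end time and passes itself to `g_tracer->trace()`, which takes ownership (`NoopTracer` simply deletes it). The sketch below models that hand-off. `MiniSpan`, `RecordingTracer`, and `g_recorder` are hypothetical; `delref()` takes the current time as a parameter instead of reading `g_network->now()`, which is an assumption made to keep the example self-contained.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct MiniSpan;

// Hypothetical tracer: records (name, duration) for each finished span and,
// like NoopTracer, deletes the span it was handed.
struct RecordingTracer {
    std::vector<std::pair<std::string, double>> finished;
    void trace(MiniSpan* span); // takes ownership of span
};

RecordingTracer g_recorder; // hypothetical stand-in for g_tracer

// Intrusively reference-counted span modelled on SpanImpl.
struct MiniSpan {
    MiniSpan(std::string name, double begin) : name(std::move(name)), begin(begin) {}

    void addref() { ++refCount; }

    // The final delref() records the end time and hands the span to the
    // tracer instead of deleting it directly.
    void delref(double now) {
        if (--refCount == 0) {
            end = now;
            g_recorder.trace(this); // `this` is deleted inside trace()
        }
    }

    std::string name;
    double begin;
    double end = 0;

private:
    std::atomic<unsigned> refCount{ 1 }; // creator holds the first reference
};

void RecordingTracer::trace(MiniSpan* span) {
    finished.push_back({ span->name, span->end - span->begin });
    delete span;
}
```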

@@ -1,114 +0,0 @@
/*
* Tracing.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2020 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbclient/FDBTypes.h"
#include "flow/IRandom.h"
#include <unordered_set>
#include <atomic>
struct Location {
StringRef name;
};
inline Location operator"" _loc(const char* str, size_t size) {
return Location{ StringRef(reinterpret_cast<const uint8_t*>(str), size) };
}
struct SpanImpl {
explicit SpanImpl(UID contex, Location location,
std::unordered_set<UID> const& parents = std::unordered_set<UID>());
SpanImpl(const SpanImpl&) = delete;
SpanImpl(SpanImpl&&) = delete;
SpanImpl& operator=(const SpanImpl&) = delete;
SpanImpl& operator=(SpanImpl&&) = delete;
void addref();
void delref();
~SpanImpl();
UID context;
double begin, end;
Location location;
std::unordered_set<UID> parents;
private:
std::atomic<unsigned> refCount = 1;
};
class Span {
Reference<SpanImpl> impl;
public:
Span(UID context, Location location, std::unordered_set<UID> const& parents = std::unordered_set<UID>())
: impl(new SpanImpl(context, location, parents)) {}
Span(Location location, std::unordered_set<UID> const& parents = std::unordered_set<UID>())
: impl(new SpanImpl(deterministicRandom()->randomUniqueID(), location, parents)) {}
Span(Location location, Span const& parent)
: impl(new SpanImpl(deterministicRandom()->randomUniqueID(), location, { parent->context })) {}
Span(Location location, std::initializer_list<Span> const& parents)
: impl(new SpanImpl(deterministicRandom()->randomUniqueID(), location)) {
for (const auto& parent : parents) {
impl->parents.insert(parent->context);
}
}
Span(const Span&) = default;
Span(Span&&) = default;
Span() {}
Span& operator=(Span&&) = default;
Span& operator=(const Span&) = default;
SpanImpl* operator->() const { return impl.getPtr(); }
SpanImpl& operator*() const { return *impl; }
void reset() { impl.clear(); }
};
enum class TracerType { DISABLED, LOG_FILE };
struct ITracer {
virtual ~ITracer();
virtual TracerType type() const = 0;
// passed ownership to the tracer
virtual void trace(SpanImpl* span) = 0;
};
void openTracer(TracerType type);
template <class T>
struct SpannedDeque : Deque<T> {
Span span;
explicit SpannedDeque(Location loc) : span(deterministicRandom()->randomUniqueID(), loc) {}
explicit SpannedDeque(Span span) : span(span) {}
SpannedDeque(SpannedDeque&& other) : Deque<T>(std::move(other)), span(std::move(other.span)) {}
SpannedDeque(SpannedDeque const& other) : Deque<T>(other), span(other.span) {}
SpannedDeque& operator=(SpannedDeque const& other) {
*static_cast<Deque<T>*>(this) = other;
span = other.span;
return *this;
}
SpannedDeque& operator=(SpannedDeque&& other) {
*static_cast<Deque<T>*>(this) = std::move(other);
span = std::move(other.span);
}
Span resetSpan() {
auto res = span;
span = Span(span->location);
return res;
}
};
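As shown in this removed header, `SpannedDeque`'s move-assignment operator ends without `return *this;`. Flowing off the end of a function declared to return a reference is undefined behaviour, unlike the copy-assignment operator just above it, which returns correctly. The sketch below shows both operators for a container that derives from a deque and adds one member. `LabeledDeque` and its `label` field are hypothetical stand-ins for `SpannedDeque` and its `Span`, and `std::deque` replaces flow's `Deque`.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>

// Hypothetical analogue of SpannedDeque: a deque plus an extra label.
template <class T>
struct LabeledDeque : std::deque<T> {
    std::string label;

    explicit LabeledDeque(std::string l) : label(std::move(l)) {}
    LabeledDeque(const LabeledDeque&) = default;
    LabeledDeque(LabeledDeque&&) = default;

    LabeledDeque& operator=(const LabeledDeque& other) {
        static_cast<std::deque<T>&>(*this) = other; // copy the base part
        label = other.label;
        return *this;
    }

    LabeledDeque& operator=(LabeledDeque&& other) {
        static_cast<std::deque<T>&>(*this) = std::move(other); // move the base part
        label = std::move(other.label); // the derived member is still intact
        return *this; // the statement missing from SpannedDeque
    }
};
```

Declaring both assignment operators `= default` would generate the same memberwise behaviour, including the correct return value.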

@@ -32,7 +32,7 @@
<Wix xmlns='http://schemas.microsoft.com/wix/2006/wi'>
<Product Name='$(var.Title)'
Id='{D1DF8A00-7A76-448F-AA71-8734A5D8F7D3}'
Id='{0EFA1E57-0081-4CB5-8502-F0779A0C59F5}'
UpgradeCode='{A95EA002-686E-4164-8356-C715B7F8B1C8}'
Version='$(var.Version)'
Manufacturer='$(var.Manufacturer)'

@@ -75,6 +75,7 @@ if(WITH_PYTHON)
add_fdb_test(TEST_FILES RedwoodPerfSet.txt IGNORE)
add_fdb_test(TEST_FILES RedwoodPerfPrefixCompression.txt IGNORE)
add_fdb_test(TEST_FILES RedwoodPerfSequentialInsert.txt IGNORE)
add_fdb_test(TEST_FILES RocksDBTest.txt IGNORE)
add_fdb_test(TEST_FILES SampleNoSimAttrition.txt IGNORE)
if (NOT USE_UBSAN) # TODO re-enable in UBSAN after https://github.com/apple/foundationdb/issues/2410 is resolved
add_fdb_test(TEST_FILES SimpleExternalTest.txt)

@@ -1,21 +1,24 @@
testTitle=DataDistributionMetrics
testTitle=DataDistributionMetricsCorrectness
testName=DataDistributionMetrics
testDuration=10.0
nodeCount=100000
actorCount=64
keyPrefix=DDMetrics
testName=Cycle
transactionsPerSecond=2500.0
testDuration=10.0
expectedRate=0.025
testName=DataDistributionMetrics
numTransactions=100
writesPerTransaction=1000
testName=Attrition
machinesToKill=1
machinesToLeave=3
reboot=true
testDuration=10.0
testName=Attrition
machinesToKill=1
machinesToLeave=3
reboot=true
testDuration=10.0
testName=Mako
testDuration=10.0
transactionsPerSecond=2500
rows=100000
sampleSize=100
valueBytes=16
keyBytes=16
operations=u8i
actorCountPerClient=64
populateData=true
runBenchmark=true
preserveData=false

tests/RocksDBTest.txt

@@ -0,0 +1,48 @@
testTitle=Insert
testName=KVStoreTest
testDuration=0.0
operationsPerSecond=28000
commitFraction=0.001
setFraction=0.01
nodeCount=20000000
keyBytes=16
valueBytes=96
filename=bttest
storeType=ssd-rocksdb-experimental
setup=true
clear=false
count=false
useDB=false
testTitle=RandomWriteSaturation
testName=KVStoreTest
testDuration=20.0
saturation=true
operationsPerSecond=10000
commitFraction=0.00005
setFraction=1.0
nodeCount=20000000
keyBytes=16
valueBytes=96
filename=bttest
storeType=ssd-rocksdb-experimental
setup=false
clear=false
count=false
useDB=false
testTitle=Scan
testName=KVStoreTest
testDuration=20.0
operationsPerSecond=28000
commitFraction=0.0001
setFraction=0.01
nodeCount=20000000
keyBytes=16
valueBytes=96
filename=bttest
storeType=ssd-rocksdb-experimental
setup=false
clear=false
count=true
useDB=false

@@ -1,146 +0,0 @@
def buildScmInfo
stage("Build") {
node('test-dynamic-slave') {
cleanWs()
sfScmInfo = checkout([$class: 'GitSCM',
branches: [[name: '*']],
doGenerateSubmoduleConfigurations: false,
extensions: [[$class: 'RelativeTargetDirectory', relativeTargetDir: 'snowflake']],
submoduleCfg: [],
userRemoteConfigs: [[credentialsId: 'a0395839-84c7-4ceb-90e2-bcf66b2d6885', url: 'ssh://bitbucket-internal.int.snowflakecomputing.com:7999/opfdb/fdb_snowflake.git']]
])
println("$sfScmInfo")
buildScmInfo = checkout([
$class: 'GitSCM',
branches: scm.branches,
doGenerateSubmoduleConfigurations: scm.doGenerateSubmoduleConfigurations,
extensions: scm.extensions,
userRemoteConfigs: scm.userRemoteConfigs,
extensions: [[$class: 'RelativeTargetDirectory', relativeTargetDir: 'snowflake/jenkins/foundationdb']]
])
println("$buildScmInfo")
sh """
|export GIT_SPECIFIER=${buildScmInfo.GIT_COMMIT}
|virtualenv -p python3.4 venv
|source venv/bin/activate
|pip3 install docker-compose
|docker-compose --version
|git config --global user.name jenkins
|git config --global user.email fdb-devs@snowflake.net
|cd snowflake/jenkins
|./build.sh check_uploaded package sql sql_upload upload
""".stripMargin()
}
}
def makeTestStep(iteration) {
return {
node("test-dynamic-slave") {
cleanWs()
sfScmInfo = checkout([$class: 'GitSCM',
branches: [[name: '*']],
doGenerateSubmoduleConfigurations: false,
extensions: [[$class: 'RelativeTargetDirectory', relativeTargetDir: 'snowflake']],
submoduleCfg: [],
userRemoteConfigs: [[credentialsId: 'a0395839-84c7-4ceb-90e2-bcf66b2d6885', url: 'ssh://bitbucket-internal.int.snowflakecomputing.com:7999/opfdb/fdb_snowflake.git']]
])
println("$sfScmInfo")
scmInfo = checkout([
$class: 'GitSCM',
branches: scm.branches,
doGenerateSubmoduleConfigurations: scm.doGenerateSubmoduleConfigurations,
extensions: scm.extensions,
userRemoteConfigs: scm.userRemoteConfigs,
extensions: [[$class: 'RelativeTargetDirectory', relativeTargetDir: 'snowflake/jenkins/foundationdb']]
])
println("$scmInfo")
sh """
|# Clean up the jenkins output; gets messy with too many iterations
|set +x
|exec 3>&1
|exec 1> \$WORKSPACE/setup_${iteration}.log
|exec 2>&1
|
|export GIT_SPECIFIER=${scmInfo.GIT_COMMIT}
|virtualenv -p python3.4 venv
|source venv/bin/activate
|pip3 install docker-compose
|docker-compose --version
|git config --global user.name jenkins
|git config --global user.email fdb-devs@snowflake.net
|
|cd snowflake/jenkins
|echo Iteration ${iteration} building >&3
|./build.sh configure download test sql sql_upload > \$WORKSPACE/iteration_${iteration}.log 2>&1
|rc=\$?
|seed=\$(find . -name traces.json -exec grep -m 1 CMakeSEED {} \\; | awk '{print \$2}' | head -1 | tr -d '"}')
|echo Iteration ${iteration} completed with \$rc - seed \$seed >&3
|mv \$WORKSPACE/iteration_${iteration}.log \$WORKSPACE/iteration_${iteration}_\${seed}.log
|find . -name traces.json -exec gzip -c {} > \$WORKSPACE/traces_${iteration}_\${seed}.json.gz \\;
|#cat \$WORKSPACE/iteration_${iteration}.log
""".stripMargin()
archiveArtifacts artifacts: 'setup_*log,iteration_*log,traces_*.json.gz',
optional: true,
onlyIfSuccessful: false
}
}
}
stage("Test") {
def testSteps = [:]
for (int i = 0; i < 4; i++) {
testSteps["Iteration ${i}"] = makeTestStep(i)
}
println(testSteps)
parallel testSteps
build job: "NotifyGitHub",
parameters: [
string(name: 'pr_branch', value: buildScmInfo.GIT_BRANCH),
string(name: 'publish_url', value: "https://foo.bar/stuff")
],
propagate: false
}
stage("Report") {
node('test-dynamic-slave') {
cleanWs()
sfScmInfo = checkout([$class: 'GitSCM',
branches: [[name: '*']],
doGenerateSubmoduleConfigurations: false,
extensions: [[$class: 'RelativeTargetDirectory', relativeTargetDir: 'snowflake']],
submoduleCfg: [],
userRemoteConfigs: [[credentialsId: 'a0395839-84c7-4ceb-90e2-bcf66b2d6885', url: 'ssh://bitbucket-internal.int.snowflakecomputing.com:7999/opfdb/fdb_snowflake.git']]
])
println("$sfScmInfo")
buildScmInfo = checkout([
$class: 'GitSCM',
branches: scm.branches,
doGenerateSubmoduleConfigurations: scm.doGenerateSubmoduleConfigurations,
extensions: scm.extensions,
userRemoteConfigs: scm.userRemoteConfigs,
extensions: [[$class: 'RelativeTargetDirectory', relativeTargetDir: 'snowflake/jenkins/foundationdb']]
])
println("$buildScmInfo")
sh """
|export GIT_SPECIFIER=${buildScmInfo.GIT_COMMIT}
|virtualenv -p python3.4 venv
|source venv/bin/activate
|git config --global user.name jenkins
|git config --global user.email fdb-devs@snowflake.net
|cd snowflake/jenkins
|./build.sh sql_create_report
|GIT_TREE=(\$(cd foundationdb && git rev-parse HEAD^{tree}))
|cp -f fdb6-report.txt fdb6-report-\${GIT_TREE}.txt
""".stripMargin()
archiveArtifacts artifacts: '**/fdb6-report-*.txt',
optional: true,
onlyIfSuccessful: false
}
}