Merge branch 'main' into tenantquota

Ankita Kejriwal 2022-08-18 10:59:00 -07:00 committed by GitHub
commit ff6fe48909
199 changed files with 7664 additions and 2464 deletions

View File

@@ -20,7 +20,7 @@ If you have questions, we encourage you to engage in discussion on the [communit
## Before you get started
### Community Guidelines
-We want the FoundationDB community to be as welcoming and inclusive as possible, and have adopted a [Code of Conduct](CODE_OF_CONDUCT.md) that we ask all community members to read and observe.
+We want the FoundationDB community to be as welcoming and inclusive as possible, and have adopted a [Code of Conduct](CODE_OF_CONDUCT.md) that we ask all community members to read and abide by.
### Project Licensing
By submitting a pull request, you represent that you have the right to license your contribution to Apple and the community, and agree by submitting the patch that your contributions are licensed under the Apache 2.0 license.
@@ -34,7 +34,7 @@ Members of the Apple FoundationDB team are part of the core committers helping r
## Contributing
### Opening a Pull Request
-We love pull requests! For minor changes, feel free to open up a PR directly. For larger feature development and any changes that may require community discussion, we ask that you discuss your ideas on the [community forums](https://forums.foundationdb.org) prior to opening a PR, and then reference that thread within your PR comment. Please refer to [FoundationDB Commit Process](https://github.com/apple/foundationdb/wiki/FoundationDB-Commit-Process) for more detailed guidelines.
+We love pull requests! For minor changes, feel free to open up a PR directly. For larger feature development and any changes that may require community discussion, we ask that you discuss your ideas on the [community forums](https://forums.foundationdb.org) prior to opening a PR, and then reference that thread within your PR comment. Please refer to the [FoundationDB Commit Process](https://github.com/apple/foundationdb/wiki/FoundationDB-Commit-Process) for more detailed guidelines.
CI will be run automatically for core committers, and for community PRs it will be initiated by the request of a core committer. Tests can also be run locally via `ctest`, and core committers can run additional validation on pull requests prior to merging them.
@@ -46,10 +46,10 @@ To report a security issue, please **DO NOT** start by filing a public issue or
## Project Communication
### Community Forums
-We encourage your participation asking questions and helping improve the FoundationDB project. Check out the [FoundationDB community forums](https://forums.foundationdb.org), which serve a similar function as mailing lists in many open source projects. The forums are organized into three sections:
+We encourage your participation asking questions and helping improve the FoundationDB project. Check out the [FoundationDB community forums](https://forums.foundationdb.org), which serve a similar function as mailing lists in many open source projects. The forums are organized into three categories:
* [Development](https://forums.foundationdb.org/c/development): For discussing the internals and development of the FoundationDB core, as well as layers.
-* [Using FoundationDB](https://forums.foundationdb.org/c/using-foundationdb): For discussing user-facing topics. Getting started and have a question? This is the place for you.
+* [Using FoundationDB](https://forums.foundationdb.org/c/using-foundationdb): For discussing user-facing topics. Getting started and have a question? This is the category for you.
* [Site Feedback](https://forums.foundationdb.org/c/site-feedback): A category for discussing the forums and the OSS project, its organization, how it works, and how we can improve it.
### Using GitHub Issues and Community Forums
@@ -63,4 +63,4 @@ GitHub Issues should be used for tracking tasks. If you know the specific code t
* Implementing an agreed upon feature: *GitHub Issues*
### Project and Development Updates
-Stay connected to the project and the community! For project and community updates, follow the [FoundationDB project blog](https://www.foundationdb.org/blog/). Development announcements will be made via the community forums' [dev-announce](https://forums.foundationdb.org/c/development/dev-announce) section.
+Stay connected to the project and the community! For project and community updates, follow the [FoundationDB project blog](https://www.foundationdb.org/blog/). Development announcements will be made via the community forums' [dev-announce](https://forums.foundationdb.org/c/development/dev-announce) category.

View File

@@ -141,6 +141,7 @@ if(NOT WIN32)
test/apitester/TesterBlobGranuleCorrectnessWorkload.cpp
test/apitester/TesterCancelTransactionWorkload.cpp
test/apitester/TesterCorrectnessWorkload.cpp
test/apitester/TesterExampleWorkload.cpp
test/apitester/TesterKeyValueStore.cpp
test/apitester/TesterKeyValueStore.h
test/apitester/TesterOptions.h
@@ -332,6 +333,26 @@ if(NOT WIN32)
@SERVER_CA_FILE@
)
add_test(NAME fdb_c_upgrade_to_future_version
COMMAND ${CMAKE_SOURCE_DIR}/tests/TestRunner/upgrade_test.py
--build-dir ${CMAKE_BINARY_DIR}
--test-file ${CMAKE_SOURCE_DIR}/bindings/c/test/apitester/tests/upgrade/MixedApiWorkloadMultiThr.toml
--upgrade-path "7.2.0" "7.3.0" "7.2.0"
--process-number 3
)
set_tests_properties("fdb_c_upgrade_to_future_version" PROPERTIES ENVIRONMENT "${SANITIZER_OPTIONS}")
if (0) # re-enable after stabilizing the test
add_test(NAME fdb_c_upgrade_to_future_version_blob_granules
COMMAND ${CMAKE_SOURCE_DIR}/tests/TestRunner/upgrade_test.py
--build-dir ${CMAKE_BINARY_DIR}
--test-file ${CMAKE_SOURCE_DIR}/bindings/c/test/apitester/tests/upgrade/ApiBlobGranulesCorrectness.toml
--upgrade-path "7.2.0" "7.3.0" "7.2.0"
--blob-granules-enabled
--process-number 3
)
endif()
if(CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64" AND NOT USE_SANITIZER)
add_test(NAME fdb_c_upgrade_single_threaded_630api
COMMAND ${CMAKE_SOURCE_DIR}/tests/TestRunner/upgrade_test.py
@@ -439,7 +460,7 @@ if (OPEN_FOR_IDE)
target_link_libraries(fdb_c_shim_lib_tester PRIVATE fdb_c_shim SimpleOpt fdb_cpp Threads::Threads)
target_include_directories(fdb_c_shim_lib_tester PUBLIC ${CMAKE_CURRENT_SOURCE_DIR} ${CMAKE_CURRENT_BINARY_DIR}/foundationdb/ ${CMAKE_SOURCE_DIR}/flow/include)
-elseif(NOT WIN32 AND NOT APPLE AND NOT USE_UBSAN) # Linux Only, non-ubsan only
+elseif(NOT WIN32 AND NOT APPLE AND NOT USE_SANITIZER) # Linux Only, non-sanitizer only
set(SHIM_LIB_OUTPUT_DIR ${CMAKE_CURRENT_BINARY_DIR})
@@ -492,7 +513,7 @@ elseif(NOT WIN32 AND NOT APPLE AND NOT USE_UBSAN) # Linux Only, non-ubsan only
--api-test-dir ${CMAKE_SOURCE_DIR}/bindings/c/test/apitester/tests
)
-endif() # End Linux only, non-ubsan only
+endif() # End Linux only, non-sanitizer only
@@ -537,7 +558,7 @@ fdb_install(
DESTINATION_SUFFIX "/cmake/${targets_export_name}"
COMPONENT clients)
-if(NOT WIN32 AND NOT APPLE AND NOT USE_UBSAN) # Linux Only, non-ubsan only
+if(NOT WIN32 AND NOT APPLE AND NOT USE_SANITIZER) # Linux Only, non-sanitizer only
fdb_install(
FILES foundationdb/fdb_c_shim.h

View File

@@ -79,9 +79,10 @@ extern "C" DLLEXPORT fdb_bool_t fdb_error_predicate(int predicate_test, fdb_erro
if (predicate_test == FDBErrorPredicates::RETRYABLE_NOT_COMMITTED) {
return code == error_code_not_committed || code == error_code_transaction_too_old ||
code == error_code_future_version || code == error_code_database_locked ||
-code == error_code_proxy_memory_limit_exceeded || code == error_code_batch_transaction_throttled ||
-code == error_code_process_behind || code == error_code_tag_throttled ||
-code == error_code_unknown_tenant;
+code == error_code_grv_proxy_memory_limit_exceeded ||
+code == error_code_commit_proxy_memory_limit_exceeded ||
+code == error_code_batch_transaction_throttled || code == error_code_process_behind ||
+code == error_code_tag_throttled || code == error_code_unknown_tenant;
}
return false;
}
@@ -238,6 +239,10 @@ fdb_error_t fdb_future_get_version_v619(FDBFuture* f, int64_t* out_version) {
CATCH_AND_RETURN(*out_version = TSAV(Version, f)->get(););
}
extern "C" DLLEXPORT fdb_error_t fdb_future_get_bool(FDBFuture* f, fdb_bool_t* out_value) {
CATCH_AND_RETURN(*out_value = TSAV(bool, f)->get(););
}
extern "C" DLLEXPORT fdb_error_t fdb_future_get_int64(FDBFuture* f, int64_t* out_value) {
CATCH_AND_RETURN(*out_value = TSAV(int64_t, f)->get(););
}
@@ -493,6 +498,54 @@ extern "C" DLLEXPORT FDBFuture* fdb_database_wait_purge_granules_complete(FDBDat
FDBFuture*)(DB(db)->waitPurgeGranulesComplete(StringRef(purge_key_name, purge_key_name_length)).extractPtr());
}
extern "C" DLLEXPORT FDBFuture* fdb_database_blobbify_range(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length) {
return (FDBFuture*)(DB(db)
->blobbifyRange(KeyRangeRef(StringRef(begin_key_name, begin_key_name_length),
StringRef(end_key_name, end_key_name_length)))
.extractPtr());
}
extern "C" DLLEXPORT FDBFuture* fdb_database_unblobbify_range(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length) {
return (FDBFuture*)(DB(db)
->unblobbifyRange(KeyRangeRef(StringRef(begin_key_name, begin_key_name_length),
StringRef(end_key_name, end_key_name_length)))
.extractPtr());
}
extern "C" DLLEXPORT FDBFuture* fdb_database_list_blobbified_ranges(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length,
int rangeLimit) {
return (FDBFuture*)(DB(db)
->listBlobbifiedRanges(KeyRangeRef(StringRef(begin_key_name, begin_key_name_length),
StringRef(end_key_name, end_key_name_length)),
rangeLimit)
.extractPtr());
}
extern "C" DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_database_verify_blob_range(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length,
int64_t version) {
return (FDBFuture*)(DB(db)
->verifyBlobRange(KeyRangeRef(StringRef(begin_key_name, begin_key_name_length),
StringRef(end_key_name, end_key_name_length)),
version)
.extractPtr());
}
extern "C" DLLEXPORT fdb_error_t fdb_tenant_create_transaction(FDBTenant* tenant, FDBTransaction** out_transaction) { extern "C" DLLEXPORT fdb_error_t fdb_tenant_create_transaction(FDBTenant* tenant, FDBTransaction** out_transaction) {
CATCH_AND_RETURN(*out_transaction = (FDBTransaction*)TENANT(tenant)->createTransaction().extractPtr();); CATCH_AND_RETURN(*out_transaction = (FDBTransaction*)TENANT(tenant)->createTransaction().extractPtr(););
} }
@ -855,11 +908,12 @@ extern "C" DLLEXPORT FDBFuture* fdb_transaction_get_blob_granule_ranges(FDBTrans
uint8_t const* begin_key_name, uint8_t const* begin_key_name,
int begin_key_name_length, int begin_key_name_length,
uint8_t const* end_key_name, uint8_t const* end_key_name,
int end_key_name_length) { int end_key_name_length,
int rangeLimit) {
RETURN_FUTURE_ON_ERROR( RETURN_FUTURE_ON_ERROR(
Standalone<VectorRef<KeyRangeRef>>, Standalone<VectorRef<KeyRangeRef>>,
KeyRangeRef range(KeyRef(begin_key_name, begin_key_name_length), KeyRef(end_key_name, end_key_name_length)); KeyRangeRef range(KeyRef(begin_key_name, begin_key_name_length), KeyRef(end_key_name, end_key_name_length));
return (FDBFuture*)(TXN(tr)->getBlobGranuleRanges(range).extractPtr());); return (FDBFuture*)(TXN(tr)->getBlobGranuleRanges(range, rangeLimit).extractPtr()););
} }
extern "C" DLLEXPORT FDBResult* fdb_transaction_read_blob_granules(FDBTransaction* tr, extern "C" DLLEXPORT FDBResult* fdb_transaction_read_blob_granules(FDBTransaction* tr,
@ -964,6 +1018,10 @@ extern "C" DLLEXPORT const char* fdb_get_client_version() {
return API->getClientVersion(); return API->getClientVersion();
} }
extern "C" DLLEXPORT void fdb_use_future_protocol_version() {
API->useFutureProtocolVersion();
}
#if defined(__APPLE__) #if defined(__APPLE__)
#include <dlfcn.h> #include <dlfcn.h>
__attribute__((constructor)) static void initialize() { __attribute__((constructor)) static void initialize() {

View File

@@ -227,6 +227,8 @@ DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_future_set_callback(FDBFuture* f,
DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_future_get_error(FDBFuture* f);
#endif
DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_future_get_bool(FDBFuture* f, fdb_bool_t* out);
DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_future_get_int64(FDBFuture* f, int64_t* out);
DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_future_get_uint64(FDBFuture* f, uint64_t* out);
@@ -321,6 +323,32 @@ DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_database_wait_purge_granules_complet
uint8_t const* purge_key_name,
int purge_key_name_length);
DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_database_blobbify_range(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length);
DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_database_unblobbify_range(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length);
DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_database_list_blobbified_ranges(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length,
int rangeLimit);
DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_database_verify_blob_range(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length,
int64_t version);
DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_tenant_create_transaction(FDBTenant* tenant,
FDBTransaction** out_transaction);
@@ -479,7 +507,8 @@ DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_transaction_get_blob_granule_ranges(
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
-int end_key_name_length);
+int end_key_name_length,
+int rangeLimit);
/* LatestVersion (-2) for readVersion means get read version from transaction
Separated out as optional because BG reads can support longer-lived reads than normal FDB transactions */
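A minimal usage sketch (not part of this diff) of the database-level declarations above, assuming an open FDBDatabase* db and single-byte demo keys; error handling is elided. Each call returns an FDBFuture*, whose result is read with the matching accessor once ready:

FDBFuture* f = fdb_database_blobbify_range(db, (const uint8_t*)"a", 1, (const uint8_t*)"b", 1);
fdb_future_block_until_ready(f);
fdb_bool_t ok = 0;
fdb_future_get_bool(f, &ok); /* true if the recording of the range was successful */
fdb_future_destroy(f);

f = fdb_database_list_blobbified_ranges(db, (const uint8_t*)"a", 1, (const uint8_t*)"b", 1, 1000);
fdb_future_block_until_ready(f);
const FDBKeyRange* ranges;
int count;
fdb_future_get_keyrange_array(f, &ranges, &count); /* range memory is owned by f */
fdb_future_destroy(f);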

View File

@@ -49,6 +49,8 @@ DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_future_get_shared_state(FDBFuture*
DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_create_database_from_connection_string(const char* connection_string,
FDBDatabase** out_database);
DLLEXPORT void fdb_use_future_protocol_version();
#ifdef __cplusplus
}
#endif

View File

@@ -124,8 +124,10 @@ private:
} else if (err.code() != error_code_success) {
ctx->onError(err);
} else {
-auto& [out_kv, out_count, out_more] = out;
+auto resCopy = copyKeyValueArray(out);
+auto& [resVector, out_more] = resCopy;
ASSERT(!out_more);
results.get()->assign(resVector.begin(), resVector.end());
if (!seenReadSuccess) {
info("BlobGranuleCorrectness::randomReadOp first success\n");
}
@@ -178,7 +180,7 @@
}
execTransaction(
[begin, end, results](auto ctx) {
-fdb::Future f = ctx->tx().getBlobGranuleRanges(begin, end).eraseType();
+fdb::Future f = ctx->tx().getBlobGranuleRanges(begin, end, 1000).eraseType();
ctx->continueAfter(
f,
[ctx, f, results]() {

View File

@@ -0,0 +1,65 @@
/*
* TesterExampleWorkload.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2022 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "TesterWorkload.h"
#include "TesterUtil.h"
namespace FdbApiTester {
class SetAndGetWorkload : public WorkloadBase {
public:
fdb::Key keyPrefix;
Random random;
SetAndGetWorkload(const WorkloadConfig& config) : WorkloadBase(config) {
keyPrefix = fdb::toBytesRef(fmt::format("{}/", workloadId));
}
void start() override { setAndGet(NO_OP_TASK); }
void setAndGet(TTaskFct cont) {
fdb::Key key = keyPrefix + random.randomStringLowerCase(10, 100);
fdb::Value value = random.randomStringLowerCase(10, 1000);
execTransaction(
[key, value](auto ctx) {
ctx->tx().set(key, value);
ctx->commit();
},
[this, key, value, cont]() {
execTransaction(
[this, key, value](auto ctx) {
auto future = ctx->tx().get(key, false);
ctx->continueAfter(future, [this, ctx, future, value]() {
std::optional<fdb::Value> res = copyValueRef(future.get());
if (!res.has_value()) {
error(fmt::format("expected: {} actual: <missing>", fdb::toCharsRef(value)));
} else if (res != value) {
error(fmt::format(
"expected: {} actual: {}", fdb::toCharsRef(value), fdb::toCharsRef(res.value())));
}
ctx->done();
});
},
cont);
});
}
};
WorkloadFactory<SetAndGetWorkload> SetAndGetWorkloadFactory("SetAndGet");
} // namespace FdbApiTester

View File

@@ -38,6 +38,7 @@ public:
std::string logGroup;
std::string externalClientLibrary;
std::string externalClientDir;
std::string futureVersionClientLibrary;
std::string tmpDir;
bool disableLocalClient = false;
std::string testFile;

View File

@@ -165,9 +165,12 @@ void WorkloadManager::add(std::shared_ptr<IWorkload> workload, TTaskFct cont) {
void WorkloadManager::run() {
std::vector<std::shared_ptr<IWorkload>> initialWorkloads;
{
std::unique_lock<std::mutex> lock(mutex);
for (auto iter : workloads) {
initialWorkloads.push_back(iter.second.ref);
}
}
for (auto iter : initialWorkloads) {
iter->init(this);
}

View File

@@ -46,6 +46,7 @@ enum TesterOptionId {
OPT_KNOB,
OPT_EXTERNAL_CLIENT_LIBRARY,
OPT_EXTERNAL_CLIENT_DIRECTORY,
OPT_FUTURE_VERSION_CLIENT_LIBRARY,
OPT_TMP_DIR,
OPT_DISABLE_LOCAL_CLIENT,
OPT_TEST_FILE,
@@ -72,6 +73,7 @@ CSimpleOpt::SOption TesterOptionDefs[] = //
{ OPT_KNOB, "--knob-", SO_REQ_SEP },
{ OPT_EXTERNAL_CLIENT_LIBRARY, "--external-client-library", SO_REQ_SEP },
{ OPT_EXTERNAL_CLIENT_DIRECTORY, "--external-client-dir", SO_REQ_SEP },
{ OPT_FUTURE_VERSION_CLIENT_LIBRARY, "--future-version-client-library", SO_REQ_SEP },
{ OPT_TMP_DIR, "--tmp-dir", SO_REQ_SEP },
{ OPT_DISABLE_LOCAL_CLIENT, "--disable-local-client", SO_NONE },
{ OPT_TEST_FILE, "-f", SO_REQ_SEP },
@@ -110,6 +112,8 @@ void printProgramUsage(const char* execName) {
" Path to the external client library.\n"
" --external-client-dir DIR\n"
" Directory containing external client libraries.\n"
" --future-version-client-library FILE\n"
" Path to a client library to be used with a future protocol version.\n"
" --tmp-dir DIR\n"
" Directory for temporary files of the client.\n"
" --disable-local-client DIR\n"
@@ -204,6 +208,9 @@ bool processArg(TesterOptions& options, const CSimpleOpt& args) {
case OPT_EXTERNAL_CLIENT_DIRECTORY:
options.externalClientDir = args.OptionArg();
break;
case OPT_FUTURE_VERSION_CLIENT_LIBRARY:
options.futureVersionClientLibrary = args.OptionArg();
break;
case OPT_TMP_DIR:
options.tmpDir = args.OptionArg();
break;
@@ -296,6 +303,11 @@ void applyNetworkOptions(TesterOptions& options) {
}
}
if (!options.futureVersionClientLibrary.empty()) {
fdb::network::setOption(FDBNetworkOption::FDB_NET_OPTION_FUTURE_VERSION_CLIENT_LIBRARY,
options.futureVersionClientLibrary);
}
if (options.testSpec.multiThreaded) {
fdb::network::setOption(FDBNetworkOption::FDB_NET_OPTION_CLIENT_THREADS_PER_VERSION, options.numFdbThreads);
}
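The tester reaches this through its fdb::network wrapper; a client driving the C API directly would set the same option with a sketch like the following. The FDB_NET_OPTION_FUTURE_VERSION_CLIENT_LIBRARY constant from the generated fdb_c_options.g.h is assumed, the library path is hypothetical, and the call must run after fdb_select_api_version() and before fdb_setup_network():

const char* path = "/path/to/future/libfdb_c.so"; /* hypothetical path */
fdb_network_set_option(FDB_NET_OPTION_FUTURE_VERSION_CLIENT_LIBRARY,
                       (const uint8_t*)path, (int)strlen(path));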

View File

@@ -0,0 +1,23 @@
[[test]]
title = 'Mixed Workload for Upgrade Tests with a Multi-Threaded Client'
multiThreaded = true
buggify = true
databasePerTransaction = false
minFdbThreads = 2
maxFdbThreads = 8
minDatabases = 2
maxDatabases = 8
minClientThreads = 2
maxClientThreads = 8
minClients = 2
maxClients = 8
[[test.workload]]
name = 'ApiBlobGranuleCorrectness'
minKeyLength = 1
maxKeyLength = 64
minValueLength = 1
maxValueLength = 1000
maxKeysPerTransaction = 50
initialSize = 100
runUntilStop = true

View File

@@ -33,3 +33,13 @@ maxClients = 8
initialSize = 100
runUntilStop = true
readExistingKeysRatio = 0.9
[[test.workload]]
name = 'AtomicOpsCorrectness'
initialSize = 0
runUntilStop = true
[[test.workload]]
name = 'WatchAndWait'
initialSize = 0
runUntilStop = true

View File

@@ -31,3 +31,13 @@ maxClients = 8
initialSize = 100
runUntilStop = true
readExistingKeysRatio = 0.9
[[test.workload]]
name = 'AtomicOpsCorrectness'
initialSize = 0
runUntilStop = true
[[test.workload]]
name = 'WatchAndWait'
initialSize = 0
runUntilStop = true

View File

@@ -559,9 +559,9 @@ public:
reverse);
}
-TypedFuture<future_var::KeyRangeRefArray> getBlobGranuleRanges(KeyRef begin, KeyRef end) {
+TypedFuture<future_var::KeyRangeRefArray> getBlobGranuleRanges(KeyRef begin, KeyRef end, int rangeLimit) {
return native::fdb_transaction_get_blob_granule_ranges(
-tr.get(), begin.data(), intSize(begin), end.data(), intSize(end));
+tr.get(), begin.data(), intSize(begin), end.data(), intSize(end), rangeLimit);
}
Result readBlobGranules(KeyRef begin,
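A sketch (not from this diff) of the pagination pattern the new rangeLimit parameter enables, shown against the underlying C API: a call that returns exactly `limit` ranges may have more to report, so the caller repeats the request with begin moved up to the end of the last returned range. Assumes a live FDBTransaction* tr; buffer sizing and error handling are elided.

const int limit = 1000;
uint8_t begin[1024] = { 'a' };
int beginLen = 1;
for (;;) {
    FDBFuture* f = fdb_transaction_get_blob_granule_ranges(
        tr, begin, beginLen, (const uint8_t*)"b", 1, limit);
    fdb_future_block_until_ready(f);
    const FDBKeyRange* ranges;
    int count;
    fdb_future_get_keyrange_array(f, &ranges, &count);
    /* ... consume ranges[0..count) ... */
    if (count < limit) {
        fdb_future_destroy(f);
        break; /* final batch */
    }
    /* The key memory is owned by the future, so copy the continuation
       point out before destroying it. */
    beginLen = ranges[count - 1].end_key_length;
    memcpy(begin, ranges[count - 1].end_key, beginLen);
    fdb_future_destroy(f);
}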

View File

@@ -356,9 +356,15 @@ fdb_error_t Transaction::add_conflict_range(std::string_view begin_key,
tr_, (const uint8_t*)begin_key.data(), begin_key.size(), (const uint8_t*)end_key.data(), end_key.size(), type);
}
-KeyRangeArrayFuture Transaction::get_blob_granule_ranges(std::string_view begin_key, std::string_view end_key) {
-return KeyRangeArrayFuture(fdb_transaction_get_blob_granule_ranges(
-tr_, (const uint8_t*)begin_key.data(), begin_key.size(), (const uint8_t*)end_key.data(), end_key.size()));
+KeyRangeArrayFuture Transaction::get_blob_granule_ranges(std::string_view begin_key,
+std::string_view end_key,
+int rangeLimit) {
+return KeyRangeArrayFuture(fdb_transaction_get_blob_granule_ranges(tr_,
+(const uint8_t*)begin_key.data(),
+begin_key.size(),
+(const uint8_t*)end_key.data(),
+end_key.size(),
+rangeLimit));
}
KeyValueArrayResult Transaction::read_blob_granules(std::string_view begin_key,
std::string_view end_key,

View File

@@ -348,7 +348,7 @@ public:
// Wrapper around fdb_transaction_add_conflict_range.
fdb_error_t add_conflict_range(std::string_view begin_key, std::string_view end_key, FDBConflictRangeType type);
-KeyRangeArrayFuture get_blob_granule_ranges(std::string_view begin_key, std::string_view end_key);
+KeyRangeArrayFuture get_blob_granule_ranges(std::string_view begin_key, std::string_view end_key, int rangeLimit);
KeyValueArrayResult read_blob_granules(std::string_view begin_key,
std::string_view end_key,
int64_t beginVersion,

View File

@@ -2853,7 +2853,7 @@ TEST_CASE("Blob Granule Functions") {
// test ranges
while (1) {
-fdb::KeyRangeArrayFuture f = tr.get_blob_granule_ranges(key("bg"), key("bh"));
+fdb::KeyRangeArrayFuture f = tr.get_blob_granule_ranges(key("bg"), key("bh"), 1000);
fdb_error_t err = wait_future(f);
if (err) {
fdb::EmptyFuture f2 = tr.on_error(err);

View File

@@ -239,6 +239,13 @@ func (o NetworkOptions) SetClientThreadsPerVersion(param int64) error {
return o.setOpt(65, int64ToBytes(param))
}
// Adds an external client library to be used with a future protocol version. This option can be used for testing purposes only!
//
// Parameter: path to client library
func (o NetworkOptions) SetFutureVersionClientLibrary(param string) error {
return o.setOpt(66, []byte(param))
}
// Disables logging of client statistics, such as sampled transaction activity.
func (o NetworkOptions) SetDisableClientStatisticsLogging() error {
return o.setOpt(70, nil)

View File

@@ -34,9 +34,11 @@ set(JAVA_BINDING_SRCS
src/main/com/apple/foundationdb/FDBDatabase.java
src/main/com/apple/foundationdb/FDBTenant.java
src/main/com/apple/foundationdb/FDBTransaction.java
src/main/com/apple/foundationdb/FutureBool.java
src/main/com/apple/foundationdb/FutureInt64.java
src/main/com/apple/foundationdb/FutureKey.java
src/main/com/apple/foundationdb/FutureKeyArray.java
src/main/com/apple/foundationdb/FutureKeyRangeArray.java
src/main/com/apple/foundationdb/FutureResult.java
src/main/com/apple/foundationdb/FutureResults.java
src/main/com/apple/foundationdb/FutureMappedResults.java
@@ -56,6 +58,7 @@ set(JAVA_BINDING_SRCS
src/main/com/apple/foundationdb/RangeQuery.java
src/main/com/apple/foundationdb/MappedRangeQuery.java
src/main/com/apple/foundationdb/KeyArrayResult.java
src/main/com/apple/foundationdb/KeyRangeArrayResult.java
src/main/com/apple/foundationdb/RangeResult.java
src/main/com/apple/foundationdb/MappedRangeResult.java
src/main/com/apple/foundationdb/RangeResultInfo.java

View File

@@ -25,9 +25,11 @@
#include "com_apple_foundationdb_FDB.h"
#include "com_apple_foundationdb_FDBDatabase.h"
#include "com_apple_foundationdb_FDBTransaction.h"
#include "com_apple_foundationdb_FutureBool.h"
#include "com_apple_foundationdb_FutureInt64.h"
#include "com_apple_foundationdb_FutureKey.h"
#include "com_apple_foundationdb_FutureKeyArray.h"
#include "com_apple_foundationdb_FutureKeyRangeArray.h"
#include "com_apple_foundationdb_FutureResult.h"
#include "com_apple_foundationdb_FutureResults.h"
#include "com_apple_foundationdb_FutureStrings.h"
@@ -55,7 +57,11 @@ static jclass mapped_range_result_class;
static jclass mapped_key_value_class;
static jclass string_class;
static jclass key_array_result_class;
static jclass keyrange_class;
static jclass keyrange_array_result_class;
static jmethodID key_array_result_init;
static jmethodID keyrange_init;
static jmethodID keyrange_array_result_init;
static jmethodID range_result_init;
static jmethodID mapped_range_result_init;
static jmethodID mapped_key_value_from_bytes;
@@ -278,6 +284,23 @@ JNIEXPORT void JNICALL Java_com_apple_foundationdb_NativeFuture_Future_1releaseM
fdb_future_release_memory(var);
}
JNIEXPORT jboolean JNICALL Java_com_apple_foundationdb_FutureBool_FutureBool_1get(JNIEnv* jenv, jobject, jlong future) {
if (!future) {
throwParamNotNull(jenv);
return 0;
}
FDBFuture* f = (FDBFuture*)future;
fdb_bool_t value = false;
fdb_error_t err = fdb_future_get_bool(f, &value);
if (err) {
safeThrow(jenv, getThrowable(jenv, err));
return 0;
}
return (jboolean)value;
}
JNIEXPORT jlong JNICALL Java_com_apple_foundationdb_FutureInt64_FutureInt64_1get(JNIEnv* jenv, jobject, jlong future) {
if (!future) {
throwParamNotNull(jenv);
@@ -407,6 +430,61 @@ JNIEXPORT jobject JNICALL Java_com_apple_foundationdb_FutureKeyArray_FutureKeyAr
return result;
}
JNIEXPORT jobject JNICALL Java_com_apple_foundationdb_FutureKeyRangeArray_FutureKeyRangeArray_1get(JNIEnv* jenv,
jobject,
jlong future) {
if (!future) {
throwParamNotNull(jenv);
return JNI_NULL;
}
FDBFuture* f = (FDBFuture*)future;
const FDBKeyRange* fdbKr;
int count;
fdb_error_t err = fdb_future_get_keyrange_array(f, &fdbKr, &count);
if (err) {
safeThrow(jenv, getThrowable(jenv, err));
return JNI_NULL;
}
jobjectArray kr_values = jenv->NewObjectArray(count, keyrange_class, NULL);
if (!kr_values) {
if (!jenv->ExceptionOccurred())
throwOutOfMem(jenv);
return JNI_NULL;
}
for (int i = 0; i < count; i++) {
jbyteArray beginArr = jenv->NewByteArray(fdbKr[i].begin_key_length);
if (!beginArr) {
if (!jenv->ExceptionOccurred())
throwOutOfMem(jenv);
return JNI_NULL;
}
jbyteArray endArr = jenv->NewByteArray(fdbKr[i].end_key_length);
if (!endArr) {
if (!jenv->ExceptionOccurred())
throwOutOfMem(jenv);
return JNI_NULL;
}
jenv->SetByteArrayRegion(beginArr, 0, fdbKr[i].begin_key_length, (const jbyte*)fdbKr[i].begin_key);
jenv->SetByteArrayRegion(endArr, 0, fdbKr[i].end_key_length, (const jbyte*)fdbKr[i].end_key);
jobject kr = jenv->NewObject(keyrange_class, keyrange_init, beginArr, endArr);
if (jenv->ExceptionOccurred())
return JNI_NULL;
jenv->SetObjectArrayElement(kr_values, i, kr);
if (jenv->ExceptionOccurred())
return JNI_NULL;
}
jobject krarr = jenv->NewObject(keyrange_array_result_class, keyrange_array_result_init, kr_values);
if (jenv->ExceptionOccurred())
return JNI_NULL;
return krarr;
}
// SOMEDAY: explore doing this more efficiently with Direct ByteBuffers
JNIEXPORT jobject JNICALL Java_com_apple_foundationdb_FutureResults_FutureResults_1get(JNIEnv* jenv,
jobject,
@@ -830,6 +908,142 @@ Java_com_apple_foundationdb_FDBDatabase_Database_1waitPurgeGranulesComplete(JNIE
return (jlong)f;
}
JNIEXPORT jlong JNICALL Java_com_apple_foundationdb_FDBDatabase_Database_1blobbifyRange(JNIEnv* jenv,
jobject,
jlong dbPtr,
jbyteArray beginKeyBytes,
jbyteArray endKeyBytes) {
if (!dbPtr || !beginKeyBytes || !endKeyBytes) {
throwParamNotNull(jenv);
return 0;
}
FDBDatabase* database = (FDBDatabase*)dbPtr;
uint8_t* beginKeyArr = (uint8_t*)jenv->GetByteArrayElements(beginKeyBytes, JNI_NULL);
if (!beginKeyArr) {
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
uint8_t* endKeyArr = (uint8_t*)jenv->GetByteArrayElements(endKeyBytes, JNI_NULL);
if (!endKeyArr) {
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)beginKeyArr, JNI_ABORT);
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
FDBFuture* f = fdb_database_blobbify_range(
database, beginKeyArr, jenv->GetArrayLength(beginKeyBytes), endKeyArr, jenv->GetArrayLength(endKeyBytes));
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)beginKeyArr, JNI_ABORT);
jenv->ReleaseByteArrayElements(endKeyBytes, (jbyte*)endKeyArr, JNI_ABORT);
return (jlong)f;
}
JNIEXPORT jlong JNICALL Java_com_apple_foundationdb_FDBDatabase_Database_1unblobbifyRange(JNIEnv* jenv,
jobject,
jlong dbPtr,
jbyteArray beginKeyBytes,
jbyteArray endKeyBytes) {
if (!dbPtr || !beginKeyBytes || !endKeyBytes) {
throwParamNotNull(jenv);
return 0;
}
FDBDatabase* database = (FDBDatabase*)dbPtr;
uint8_t* beginKeyArr = (uint8_t*)jenv->GetByteArrayElements(beginKeyBytes, JNI_NULL);
if (!beginKeyArr) {
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
uint8_t* endKeyArr = (uint8_t*)jenv->GetByteArrayElements(endKeyBytes, JNI_NULL);
if (!endKeyArr) {
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)beginKeyArr, JNI_ABORT);
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
FDBFuture* f = fdb_database_unblobbify_range(
database, beginKeyArr, jenv->GetArrayLength(beginKeyBytes), endKeyArr, jenv->GetArrayLength(endKeyBytes));
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)beginKeyArr, JNI_ABORT);
jenv->ReleaseByteArrayElements(endKeyBytes, (jbyte*)endKeyArr, JNI_ABORT);
return (jlong)f;
}
JNIEXPORT jlong JNICALL Java_com_apple_foundationdb_FDBDatabase_Database_1listBlobbifiedRanges(JNIEnv* jenv,
jobject,
jlong dbPtr,
jbyteArray beginKeyBytes,
jbyteArray endKeyBytes,
jint rangeLimit) {
if (!dbPtr || !beginKeyBytes || !endKeyBytes) {
throwParamNotNull(jenv);
return 0;
}
FDBDatabase* tr = (FDBDatabase*)dbPtr;
uint8_t* startKey = (uint8_t*)jenv->GetByteArrayElements(beginKeyBytes, JNI_NULL);
if (!startKey) {
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
uint8_t* endKey = (uint8_t*)jenv->GetByteArrayElements(endKeyBytes, JNI_NULL);
if (!endKey) {
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)startKey, JNI_ABORT);
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
FDBFuture* f = fdb_database_list_blobbified_ranges(
tr, startKey, jenv->GetArrayLength(beginKeyBytes), endKey, jenv->GetArrayLength(endKeyBytes), rangeLimit);
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)startKey, JNI_ABORT);
jenv->ReleaseByteArrayElements(endKeyBytes, (jbyte*)endKey, JNI_ABORT);
return (jlong)f;
}
JNIEXPORT jlong JNICALL Java_com_apple_foundationdb_FDBDatabase_Database_1verifyBlobRange(JNIEnv* jenv,
jobject,
jlong dbPtr,
jbyteArray beginKeyBytes,
jbyteArray endKeyBytes,
jlong version) {
if (!dbPtr || !beginKeyBytes || !endKeyBytes) {
throwParamNotNull(jenv);
return 0;
}
FDBDatabase* tr = (FDBDatabase*)dbPtr;
uint8_t* startKey = (uint8_t*)jenv->GetByteArrayElements(beginKeyBytes, JNI_NULL);
if (!startKey) {
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
uint8_t* endKey = (uint8_t*)jenv->GetByteArrayElements(endKeyBytes, JNI_NULL);
if (!endKey) {
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)startKey, JNI_ABORT);
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
FDBFuture* f = fdb_database_verify_blob_range(
tr, startKey, jenv->GetArrayLength(beginKeyBytes), endKey, jenv->GetArrayLength(endKeyBytes), version);
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)startKey, JNI_ABORT);
jenv->ReleaseByteArrayElements(endKeyBytes, (jbyte*)endKey, JNI_ABORT);
return (jlong)f;
}
JNIEXPORT jboolean JNICALL Java_com_apple_foundationdb_FDB_Error_1predicate(JNIEnv* jenv,
jobject,
jint predicate,
@@ -1307,6 +1521,41 @@ Java_com_apple_foundationdb_FDBTransaction_Transaction_1getRangeSplitPoints(JNIE
return (jlong)f;
}
JNIEXPORT jlong JNICALL
Java_com_apple_foundationdb_FDBTransaction_Transaction_1getBlobGranuleRanges(JNIEnv* jenv,
jobject,
jlong tPtr,
jbyteArray beginKeyBytes,
jbyteArray endKeyBytes,
jint rowLimit) {
if (!tPtr || !beginKeyBytes || !endKeyBytes || !rowLimit) {
throwParamNotNull(jenv);
return 0;
}
FDBTransaction* tr = (FDBTransaction*)tPtr;
uint8_t* startKey = (uint8_t*)jenv->GetByteArrayElements(beginKeyBytes, JNI_NULL);
if (!startKey) {
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
uint8_t* endKey = (uint8_t*)jenv->GetByteArrayElements(endKeyBytes, JNI_NULL);
if (!endKey) {
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)startKey, JNI_ABORT);
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
FDBFuture* f = fdb_transaction_get_blob_granule_ranges(
tr, startKey, jenv->GetArrayLength(beginKeyBytes), endKey, jenv->GetArrayLength(endKeyBytes), rowLimit);
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)startKey, JNI_ABORT);
jenv->ReleaseByteArrayElements(endKeyBytes, (jbyte*)endKey, JNI_ABORT);
return (jlong)f;
}
JNIEXPORT void JNICALL Java_com_apple_foundationdb_FDBTransaction_Transaction_1set(JNIEnv* jenv,
jobject,
jlong tPtr,
@@ -1746,6 +1995,15 @@ jint JNI_OnLoad(JavaVM* vm, void* reserved) {
key_array_result_init = env->GetMethodID(local_key_array_result_class, "<init>", "([B[I)V");
key_array_result_class = (jclass)(env)->NewGlobalRef(local_key_array_result_class);
jclass local_keyrange_class = env->FindClass("com/apple/foundationdb/Range");
keyrange_init = env->GetMethodID(local_keyrange_class, "<init>", "([B[B)V");
keyrange_class = (jclass)(env)->NewGlobalRef(local_keyrange_class);
jclass local_keyrange_array_result_class = env->FindClass("com/apple/foundationdb/KeyRangeArrayResult");
keyrange_array_result_init =
env->GetMethodID(local_keyrange_array_result_class, "<init>", "([Lcom/apple/foundationdb/Range;)V");
keyrange_array_result_class = (jclass)(env)->NewGlobalRef(local_keyrange_array_result_class);
jclass local_range_result_summary_class = env->FindClass("com/apple/foundationdb/RangeResultSummary");
range_result_summary_init = env->GetMethodID(local_range_result_summary_class, "<init>", "([BIZ)V");
range_result_summary_class = (jclass)(env)->NewGlobalRef(local_range_result_summary_class);
@@ -1770,6 +2028,12 @@ void JNI_OnUnload(JavaVM* vm, void* reserved) {
if (range_result_class != JNI_NULL) {
env->DeleteGlobalRef(range_result_class);
}
if (keyrange_array_result_class != JNI_NULL) {
env->DeleteGlobalRef(keyrange_array_result_class);
}
if (keyrange_class != JNI_NULL) {
env->DeleteGlobalRef(keyrange_class);
}
if (mapped_range_result_class != JNI_NULL) {
env->DeleteGlobalRef(mapped_range_result_class);
}

View File

@@ -161,6 +161,20 @@ public interface Database extends AutoCloseable, TransactionContext {
*/
double getMainThreadBusyness();
/**
* Runs {@link #purgeBlobGranules(byte[], byte[], long, boolean, Executor)} on the default executor.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param purgeVersion version to purge at
* @param force if true delete all data, if not keep data >= purgeVersion
*
* @return the key to watch for purge complete
*/
default CompletableFuture<byte[]> purgeBlobGranules(byte[] beginKey, byte[] endKey, long purgeVersion, boolean force) {
return purgeBlobGranules(beginKey, endKey, purgeVersion, force, getExecutor());
}
/**
* Queues a purge of blob granules for the specified key range, at the specified version.
*
@@ -168,17 +182,126 @@
* @param endKey end of the key range
* @param purgeVersion version to purge at
* @param force if true delete all data, if not keep data >= purgeVersion
* @param e the {@link Executor} to use for asynchronous callbacks
* @return the key to watch for purge complete
*/
CompletableFuture<byte[]> purgeBlobGranules(byte[] beginKey, byte[] endKey, long purgeVersion, boolean force, Executor e);
/**
-* Wait for a previous call to purgeBlobGranules to complete
+* Runs {@link #waitPurgeGranulesComplete(byte[], Executor)} on the default executor.
*
* @param purgeKey key to watch
*/
default CompletableFuture<Void> waitPurgeGranulesComplete(byte[] purgeKey) {
return waitPurgeGranulesComplete(purgeKey, getExecutor());
}
/**
* Wait for a previous call to purgeBlobGranules to complete.
*
* @param purgeKey key to watch
* @param e the {@link Executor} to use for asynchronous callbacks
*/
CompletableFuture<Void> waitPurgeGranulesComplete(byte[] purgeKey, Executor e);
/**
* Runs {@link #blobbifyRange(byte[], byte[], Executor)} on the default executor.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @return if the recording of the range was successful
*/
default CompletableFuture<Boolean> blobbifyRange(byte[] beginKey, byte[] endKey) {
return blobbifyRange(beginKey, endKey, getExecutor());
}
/**
* Sets a range to be blobbified in the database. Must be a completely unblobbified range.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param e the {@link Executor} to use for asynchronous callbacks
* @return if the recording of the range was successful
*/
CompletableFuture<Boolean> blobbifyRange(byte[] beginKey, byte[] endKey, Executor e);
/**
* Runs {@link #unblobbifyRange(byte[], byte[], Executor)} on the default executor.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @return if the recording of the range was successful
*/
default CompletableFuture<Boolean> unblobbifyRange(byte[] beginKey, byte[] endKey) {
return unblobbifyRange(beginKey, endKey, getExecutor());
}
/**
* Sets a range to be unblobbified in the database.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param e the {@link Executor} to use for asynchronous callbacks
* @return if the recording of the range was successful
*/
CompletableFuture<Boolean> unblobbifyRange(byte[] beginKey, byte[] endKey, Executor e);
/**
* Runs {@link #listBlobbifiedRanges(byte[], byte[], int, Executor)} on the default executor.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param rangeLimit batch size
* @return a future with the list of blobbified ranges.
*/
default CompletableFuture<KeyRangeArrayResult> listBlobbifiedRanges(byte[] beginKey, byte[] endKey, int rangeLimit) {
return listBlobbifiedRanges(beginKey, endKey, rangeLimit, getExecutor());
}
/**
* Lists blobbified ranges in the database. There may be more if result.size() == rangeLimit.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param rangeLimit batch size
* @param e the {@link Executor} to use for asynchronous callbacks
* @return a future with the list of blobbified ranges.
*/
CompletableFuture<KeyRangeArrayResult> listBlobbifiedRanges(byte[] beginKey, byte[] endKey, int rangeLimit, Executor e);
/**
* Runs {@link #verifyBlobRange(byte[], byte[], long, Executor)} on the default executor.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param version version to read at
*
* @return a future with the version of the last blob granule.
*/
default CompletableFuture<Long> verifyBlobRange(byte[] beginKey, byte[] endKey, long version) {
return verifyBlobRange(beginKey, endKey, version, getExecutor());
}
/**
* Checks if a blob range is blobbified.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param version version to read at
* @param e the {@link Executor} to use for asynchronous callbacks
*
* @return a future with the version of the last blob granule.
*/
CompletableFuture<Long> verifyBlobRange(byte[] beginKey, byte[] endKey, long version, Executor e);
/**
* Runs a read-only transactional function against this {@code Database} with retry logic.
* {@link Function#apply(Object) apply(ReadTransaction)} will be called on the

View File

@@ -201,20 +201,60 @@ class FDBDatabase extends NativeObjectWrapper implements Database, OptionConsume
}
@Override
-public CompletableFuture<byte[]> purgeBlobGranules(byte[] beginKey, byte[] endKey, long purgeVersion, boolean force, Executor executor) {
+public CompletableFuture<byte[]> purgeBlobGranules(byte[] beginKey, byte[] endKey, long purgeVersion, boolean force, Executor e) {
pointerReadLock.lock();
try {
-return new FutureKey(Database_purgeBlobGranules(getPtr(), beginKey, endKey, purgeVersion, force), executor, eventKeeper);
+return new FutureKey(Database_purgeBlobGranules(getPtr(), beginKey, endKey, purgeVersion, force), e, eventKeeper);
} finally {
pointerReadLock.unlock();
}
}
@Override
-public CompletableFuture<Void> waitPurgeGranulesComplete(byte[] purgeKey, Executor executor) {
+public CompletableFuture<Void> waitPurgeGranulesComplete(byte[] purgeKey, Executor e) {
pointerReadLock.lock();
try {
-return new FutureVoid(Database_waitPurgeGranulesComplete(getPtr(), purgeKey), executor);
+return new FutureVoid(Database_waitPurgeGranulesComplete(getPtr(), purgeKey), e);
} finally {
pointerReadLock.unlock();
}
}
@Override
public CompletableFuture<Boolean> blobbifyRange(byte[] beginKey, byte[] endKey, Executor e) {
pointerReadLock.lock();
try {
return new FutureBool(Database_blobbifyRange(getPtr(), beginKey, endKey), e);
} finally {
pointerReadLock.unlock();
}
}
@Override
public CompletableFuture<Boolean> unblobbifyRange(byte[] beginKey, byte[] endKey, Executor e) {
pointerReadLock.lock();
try {
return new FutureBool(Database_unblobbifyRange(getPtr(), beginKey, endKey), e);
} finally {
pointerReadLock.unlock();
}
}
@Override
public CompletableFuture<KeyRangeArrayResult> listBlobbifiedRanges(byte[] beginKey, byte[] endKey, int rangeLimit, Executor e) {
pointerReadLock.lock();
try {
return new FutureKeyRangeArray(Database_listBlobbifiedRanges(getPtr(), beginKey, endKey, rangeLimit), e);
} finally {
pointerReadLock.unlock();
}
}
@Override
public CompletableFuture<Long> verifyBlobRange(byte[] beginKey, byte[] endKey, long version, Executor e) {
pointerReadLock.lock();
try {
return new FutureInt64(Database_verifyBlobRange(getPtr(), beginKey, endKey, version), e);
} finally {
pointerReadLock.unlock();
}
@@ -237,4 +277,8 @@ class FDBDatabase extends NativeObjectWrapper implements Database, OptionConsume
private native double Database_getMainThreadBusyness(long cPtr);
private native long Database_purgeBlobGranules(long cPtr, byte[] beginKey, byte[] endKey, long purgeVersion, boolean force);
private native long Database_waitPurgeGranulesComplete(long cPtr, byte[] purgeKey);
private native long Database_blobbifyRange(long cPtr, byte[] beginKey, byte[] endKey);
private native long Database_unblobbifyRange(long cPtr, byte[] beginKey, byte[] endKey);
private native long Database_listBlobbifiedRanges(long cPtr, byte[] beginKey, byte[] endKey, int rangeLimit);
private native long Database_verifyBlobRange(long cPtr, byte[] beginKey, byte[] endKey, long version);
}

View File

@@ -97,6 +97,11 @@ class FDBTransaction extends NativeObjectWrapper implements Transaction, OptionC
return FDBTransaction.this.getRangeSplitPoints(range, chunkSize);
}
@Override
public CompletableFuture<KeyRangeArrayResult> getBlobGranuleRanges(byte[] begin, byte[] end, int rowLimit) {
return FDBTransaction.this.getBlobGranuleRanges(begin, end, rowLimit);
}
@Override
public AsyncIterable<MappedKeyValue> getMappedRange(KeySelector begin, KeySelector end, byte[] mapper,
int limit, int matchIndex, boolean reverse,
@@ -352,6 +357,16 @@ class FDBTransaction extends NativeObjectWrapper implements Transaction, OptionC
return this.getRangeSplitPoints(range.begin, range.end, chunkSize);
}
@Override
public CompletableFuture<KeyRangeArrayResult> getBlobGranuleRanges(byte[] begin, byte[] end, int rowLimit) {
pointerReadLock.lock();
try {
return new FutureKeyRangeArray(Transaction_getBlobGranuleRanges(getPtr(), begin, end, rowLimit), executor);
} finally {
pointerReadLock.unlock();
}
}
@Override
public AsyncIterable<MappedKeyValue> getMappedRange(KeySelector begin, KeySelector end, byte[] mapper, int limit,
int matchIndex, boolean reverse, StreamingMode mode) {
@@ -842,4 +857,5 @@ class FDBTransaction extends NativeObjectWrapper implements Transaction, OptionC
private native long Transaction_getKeyLocations(long cPtr, byte[] key);
private native long Transaction_getEstimatedRangeSizeBytes(long cPtr, byte[] keyBegin, byte[] keyEnd);
private native long Transaction_getRangeSplitPoints(long cPtr, byte[] keyBegin, byte[] keyEnd, long chunkSize);
private native long Transaction_getBlobGranuleRanges(long cPtr, byte[] keyBegin, byte[] keyEnd, int rowLimit);
}

View File

@@ -0,0 +1,37 @@
/*
* FutureBool.java
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2019 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.apple.foundationdb;
import java.util.concurrent.Executor;
class FutureBool extends NativeFuture<Boolean> {
FutureBool(long cPtr, Executor executor) {
super(cPtr);
registerMarshalCallback(executor);
}
@Override
protected Boolean getIfDone_internal(long cPtr) throws FDBException {
return FutureBool_get(cPtr);
}
private native boolean FutureBool_get(long cPtr) throws FDBException;
}

View File

@ -0,0 +1,37 @@
/*
* FutureKeyRangeArray.java
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2019 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.apple.foundationdb;
import java.util.concurrent.Executor;
class FutureKeyRangeArray extends NativeFuture<KeyRangeArrayResult> {
FutureKeyRangeArray(long cPtr, Executor executor) {
super(cPtr);
registerMarshalCallback(executor);
}
@Override
protected KeyRangeArrayResult getIfDone_internal(long cPtr) throws FDBException {
return FutureKeyRangeArray_get(cPtr);
}
private native KeyRangeArrayResult FutureKeyRangeArray_get(long cPtr) throws FDBException;
}

View File

@ -0,0 +1,36 @@
/*
* KeyRangeArrayResult.java
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2020 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.apple.foundationdb;
import java.util.Arrays;
import java.util.List;
public class KeyRangeArrayResult {
final List<Range> keyRanges;
public KeyRangeArrayResult(Range[] keyRangeArr) {
this.keyRanges = Arrays.asList(keyRangeArr);
}
public List<Range> getKeyRanges() {
return keyRanges;
}
}

View File

@ -513,6 +513,17 @@ public interface ReadTransaction extends ReadTransactionContext {
*/ */
CompletableFuture<KeyArrayResult> getRangeSplitPoints(Range range, long chunkSize); CompletableFuture<KeyArrayResult> getRangeSplitPoints(Range range, long chunkSize);
/**
 * Gets the blob granule ranges for a given region.
 * Results are returned in batches; to cover the full region, call again with the begin key
 * advanced past the last range returned.
 *
 * @param begin beginning of the range (inclusive)
 * @param end end of the range (exclusive)
 * @param rowLimit maximum number of ranges to return in one batch
 * @return list of blob granule ranges in the given region; may not cover the entire region
 */
CompletableFuture<KeyRangeArrayResult> getBlobGranuleRanges(byte[] begin, byte[] end, int rowLimit);
/** /**
* Returns a set of options that can be set on a {@code Transaction} * Returns a set of options that can be set on a {@code Transaction}

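The same begin-key-advancing contract exists in the native C++ client, whose `Transaction::getBlobGranuleRanges` appears later in this commit. A sketch of the pagination loop in flow-style C++ (not part of this commit; the `rangeLimit` parameter is assumed to mirror the JNI signature added above):

```c++
// Sketch only: list every blob granule range in [range.begin, range.end) by
// fetching batches of at most rangeLimit ranges and advancing the begin key
// past the last range returned.
ACTOR Future<Void> listBlobGranules(Database db, KeyRange range, int rangeLimit) {
    state Key begin = range.begin;
    state Transaction tr(db);
    loop {
        try {
            Standalone<VectorRef<KeyRangeRef>> granules =
                wait(tr.getBlobGranuleRanges(KeyRangeRef(begin, range.end), rangeLimit));
            for (auto& g : granules) {
                printf("[%s - %s)\n", g.begin.printable().c_str(), g.end.printable().c_str());
            }
            // A short batch, or one that reaches the end key, means we are done.
            if (granules.size() < rangeLimit || granules.back().end >= range.end) {
                return Void();
            }
            begin = granules.back().end; // move the begin key up for the next batch
        } catch (Error& e) {
            wait(tr.onError(e));
        }
    }
}
```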
View File

@ -29,6 +29,7 @@ import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger; import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.CompletableFuture; import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import com.apple.foundationdb.Database; import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB; import com.apple.foundationdb.FDB;
@ -64,7 +65,7 @@ abstract class Context implements Runnable, AutoCloseable {
private List<Thread> children = new LinkedList<>(); private List<Thread> children = new LinkedList<>();
private static Map<String, TransactionState> transactionMap = new HashMap<>(); private static Map<String, TransactionState> transactionMap = new HashMap<>();
private static Map<Transaction, AtomicInteger> transactionRefCounts = new HashMap<>(); private static Map<Transaction, AtomicInteger> transactionRefCounts = new HashMap<>();
private static Map<byte[], Tenant> tenantMap = new HashMap<>(); private static Map<byte[], Tenant> tenantMap = new ConcurrentHashMap<>();
Context(Database db, byte[] prefix) { Context(Database db, byte[] prefix) {
this.db = db; this.db = db;

View File

@ -66,6 +66,9 @@ def test_size_limit_option(db):
except fdb.FDBError as e: except fdb.FDBError as e:
assert(e.code == 2101) # Transaction exceeds byte limit (2101) assert(e.code == 2101) # Transaction exceeds byte limit (2101)
# Reset the size limit for future tests
db.options.set_transaction_size_limit(10000000)
@fdb.transactional @fdb.transactional
def test_get_approximate_size(tr): def test_get_approximate_size(tr):
tr[b'key1'] = b'value1' tr[b'key1'] = b'value1'

View File

@ -142,7 +142,7 @@ function(add_fdb_test)
${VALGRIND_OPTION} ${VALGRIND_OPTION}
${ADD_FDB_TEST_TEST_FILES} ${ADD_FDB_TEST_TEST_FILES}
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}) WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
set_tests_properties("${test_name}" PROPERTIES ENVIRONMENT UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1) set_tests_properties("${test_name}" PROPERTIES ENVIRONMENT "${SANITIZER_OPTIONS}")
get_filename_component(test_dir_full ${first_file} DIRECTORY) get_filename_component(test_dir_full ${first_file} DIRECTORY)
if(NOT ${test_dir_full} STREQUAL "") if(NOT ${test_dir_full} STREQUAL "")
get_filename_component(test_dir ${test_dir_full} NAME) get_filename_component(test_dir ${test_dir_full} NAME)
@ -172,8 +172,7 @@ function(stage_correctness_package)
file(MAKE_DIRECTORY ${STAGE_OUT_DIR}/bin) file(MAKE_DIRECTORY ${STAGE_OUT_DIR}/bin)
string(LENGTH "${CMAKE_SOURCE_DIR}/tests/" base_length) string(LENGTH "${CMAKE_SOURCE_DIR}/tests/" base_length)
foreach(test IN LISTS TEST_NAMES) foreach(test IN LISTS TEST_NAMES)
if(("${TEST_TYPE_${test}}" STREQUAL "simulation") AND if((${test} MATCHES ${TEST_PACKAGE_INCLUDE}) AND
(${test} MATCHES ${TEST_PACKAGE_INCLUDE}) AND
(NOT ${test} MATCHES ${TEST_PACKAGE_EXCLUDE})) (NOT ${test} MATCHES ${TEST_PACKAGE_EXCLUDE}))
foreach(file IN LISTS TEST_FILES_${test}) foreach(file IN LISTS TEST_FILES_${test})
string(SUBSTRING ${file} ${base_length} -1 rel_out_file) string(SUBSTRING ${file} ${base_length} -1 rel_out_file)
@ -404,7 +403,7 @@ endfunction()
# Creates a single cluster before running the specified command (usually a ctest test) # Creates a single cluster before running the specified command (usually a ctest test)
function(add_fdbclient_test) function(add_fdbclient_test)
set(options DISABLED ENABLED DISABLE_LOG_DUMP API_TEST_BLOB_GRANULES_ENABLED TLS_ENABLED) set(options DISABLED ENABLED DISABLE_TENANTS DISABLE_LOG_DUMP API_TEST_BLOB_GRANULES_ENABLED TLS_ENABLED)
set(oneValueArgs NAME PROCESS_NUMBER TEST_TIMEOUT WORKING_DIRECTORY) set(oneValueArgs NAME PROCESS_NUMBER TEST_TIMEOUT WORKING_DIRECTORY)
set(multiValueArgs COMMAND) set(multiValueArgs COMMAND)
cmake_parse_arguments(T "${options}" "${oneValueArgs}" "${multiValueArgs}" "${ARGN}") cmake_parse_arguments(T "${options}" "${oneValueArgs}" "${multiValueArgs}" "${ARGN}")
@ -431,6 +430,9 @@ function(add_fdbclient_test)
if(T_DISABLE_LOG_DUMP) if(T_DISABLE_LOG_DUMP)
list(APPEND TMP_CLUSTER_CMD --disable-log-dump) list(APPEND TMP_CLUSTER_CMD --disable-log-dump)
endif() endif()
if(T_DISABLE_TENANTS)
list(APPEND TMP_CLUSTER_CMD --disable-tenants)
endif()
if(T_API_TEST_BLOB_GRANULES_ENABLED) if(T_API_TEST_BLOB_GRANULES_ENABLED)
list(APPEND TMP_CLUSTER_CMD --blob-granules-enabled) list(APPEND TMP_CLUSTER_CMD --blob-granules-enabled)
endif() endif()
@ -449,7 +451,7 @@ function(add_fdbclient_test)
# default timeout # default timeout
set_tests_properties("${T_NAME}" PROPERTIES TIMEOUT 300) set_tests_properties("${T_NAME}" PROPERTIES TIMEOUT 300)
endif() endif()
set_tests_properties("${T_NAME}" PROPERTIES ENVIRONMENT UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1) set_tests_properties("${T_NAME}" PROPERTIES ENVIRONMENT "${SANITIZER_OPTIONS}")
endfunction() endfunction()
# Creates a cluster file for a nonexistent cluster before running the specified command # Creates a cluster file for a nonexistent cluster before running the specified command
@ -483,7 +485,7 @@ function(add_unavailable_fdbclient_test)
# default timeout # default timeout
set_tests_properties("${T_NAME}" PROPERTIES TIMEOUT 60) set_tests_properties("${T_NAME}" PROPERTIES TIMEOUT 60)
endif() endif()
set_tests_properties("${T_NAME}" PROPERTIES ENVIRONMENT UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1) set_tests_properties("${T_NAME}" PROPERTIES ENVIRONMENT "${SANITIZER_OPTIONS}")
endfunction() endfunction()
# Creates 3 distinct clusters before running the specified command. # Creates 3 distinct clusters before running the specified command.

View File

@ -69,6 +69,7 @@ if(WIN32)
add_definitions(-DWIN32_LEAN_AND_MEAN) add_definitions(-DWIN32_LEAN_AND_MEAN)
add_definitions(-D_ITERATOR_DEBUG_LEVEL=0) add_definitions(-D_ITERATOR_DEBUG_LEVEL=0)
add_definitions(-DNOGDI) # WinGDI.h defines macro ERROR add_definitions(-DNOGDI) # WinGDI.h defines macro ERROR
add_definitions(-D_USE_MATH_DEFINES) # Math constants
endif() endif()
if (USE_CCACHE) if (USE_CCACHE)
@ -191,6 +192,7 @@ else()
endif() endif()
if(USE_GCOV) if(USE_GCOV)
add_compile_options(--coverage)
add_link_options(--coverage) add_link_options(--coverage)
endif() endif()

View File

@ -302,6 +302,7 @@ namespace SummarizeTest
uniqueFileSet.Add(file.Substring(0, file.LastIndexOf("-"))); // all restarting tests end with -1.txt or -2.txt uniqueFileSet.Add(file.Substring(0, file.LastIndexOf("-"))); // all restarting tests end with -1.txt or -2.txt
} }
uniqueFiles = uniqueFileSet.ToArray(); uniqueFiles = uniqueFileSet.ToArray();
Array.Sort(uniqueFiles);
testFile = random.Choice(uniqueFiles); testFile = random.Choice(uniqueFiles);
// The on-disk format changed in 4.0.0, and 5.x can't load files from 3.x. // The on-disk format changed in 4.0.0, and 5.x can't load files from 3.x.
string oldBinaryVersionLowerBound = "4.0.0"; string oldBinaryVersionLowerBound = "4.0.0";
@ -334,8 +335,9 @@ namespace SummarizeTest
// thus, by definition, if "until_" appears, we do not want to run with the current binary version // thus, by definition, if "until_" appears, we do not want to run with the current binary version
oldBinaries = oldBinaries.Concat(currentBinary); oldBinaries = oldBinaries.Concat(currentBinary);
} }
List<string> oldBinariesList = oldBinaries.ToList<string>(); string[] oldBinariesList = oldBinaries.ToArray<string>();
if (oldBinariesList.Count == 0) { Array.Sort(oldBinariesList);
if (oldBinariesList.Count() == 0) {
// In theory, restarting tests are named to have at least one old binary version to run // In theory, restarting tests are named to have at least one old binary version to run
// But if none of the provided old binaries fall in the range, we just skip the test // But if none of the provided old binaries fall in the range, we just skip the test
Console.WriteLine("No available old binary version from {0} to {1}", oldBinaryVersionLowerBound, oldBinaryVersionUpperBound); Console.WriteLine("No available old binary version from {0} to {1}", oldBinaryVersionLowerBound, oldBinaryVersionUpperBound);
@ -347,6 +349,7 @@ namespace SummarizeTest
else else
{ {
uniqueFiles = Directory.GetFiles(testDir); uniqueFiles = Directory.GetFiles(testDir);
Array.Sort(uniqueFiles);
testFile = random.Choice(uniqueFiles); testFile = random.Choice(uniqueFiles);
} }
} }
@ -718,7 +721,7 @@ namespace SummarizeTest
process.Refresh(); process.Refresh();
if (process.HasExited) if (process.HasExited)
return; return;
long mem = process.PrivateMemorySize64; long mem = process.PagedMemorySize64;
MaxMem = Math.Max(MaxMem, mem); MaxMem = Math.Max(MaxMem, mem);
//Console.WriteLine(string.Format("Process used {0} bytes", MaxMem)); //Console.WriteLine(string.Format("Process used {0} bytes", MaxMem));
Thread.Sleep(1000); Thread.Sleep(1000);
@ -927,6 +930,10 @@ namespace SummarizeTest
{ {
xout.Add(new XElement(ev.Type, new XAttribute("File", ev.Details.File), new XAttribute("Line", ev.Details.Line))); xout.Add(new XElement(ev.Type, new XAttribute("File", ev.Details.File), new XAttribute("Line", ev.Details.Line)));
} }
if (ev.Type == "RunningUnitTest")
{
xout.Add(new XElement(ev.Type, new XAttribute("Name", ev.Details.Name), new XAttribute("File", ev.Details.File), new XAttribute("Line", ev.Details.Line)));
}
if (ev.Type == "TestsExpectedToPass") if (ev.Type == "TestsExpectedToPass")
testCount = int.Parse(ev.Details.Count); testCount = int.Parse(ev.Details.Count);
if (ev.Type == "TestResults" && ev.Details.Passed == "1") if (ev.Type == "TestResults" && ev.Details.Passed == "1")

View File

@ -165,7 +165,6 @@ def centos_image_with_fdb_helper(versioned: bool) -> Iterator[Optional[Image]]:
container = Container("centos:7", initd=True) container = Container("centos:7", initd=True)
for rpm in rpms: for rpm in rpms:
container.copy_to(rpm, "/opt") container.copy_to(rpm, "/opt")
container.run(["bash", "-c", "yum update -y"])
container.run( container.run(
["bash", "-c", "yum install -y prelink"] ["bash", "-c", "yum install -y prelink"]
) # this is for testing libfdb_c execstack permissions ) # this is for testing libfdb_c execstack permissions
@ -327,7 +326,7 @@ def test_execstack_permissions_libfdb_c(linux_container: Container, snapshot):
[ [
"bash", "bash",
"-c", "-c",
"execstack -q $(ldconfig -p | grep libfdb_c | awk '{print $(NF)}')", "execstack -q $(ldconfig -p | grep libfdb_c.so | awk '{print $(NF)}')",
] ]
) )

View File

@ -284,6 +284,12 @@ class ErrorCommitInfo(BaseInfo):
if protocol_version >= PROTOCOL_VERSION_6_3: if protocol_version >= PROTOCOL_VERSION_6_3:
self.report_conflicting_keys = bb.get_bool() self.report_conflicting_keys = bb.get_bool()
if protocol_version >= PROTOCOL_VERSION_7_1:
lock_aware = bb.get_bool()
if bb.get_bool():
spanId = bb.get_bytes(16)
class UnsupportedProtocolVersionError(Exception): class UnsupportedProtocolVersionError(Exception):
def __init__(self, protocol_version): def __init__(self, protocol_version):
super().__init__("Unsupported protocol version 0x%0.2X" % protocol_version) super().__init__("Unsupported protocol version 0x%0.2X" % protocol_version)

View File

@ -0,0 +1,5 @@
# ThreadSanitizer suppressions file for FDB
# https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions
# FDB signal handler is not async-signal safe
signal:crashHandler

View File

@ -20,7 +20,7 @@ Data distribution manages the lifetime of storage servers, decides which storage
**RelocateShard (`struct RelocateShard`)**: A `RelocateShard` records the key range that needs to be moved among servers and the data movement's priority. DD always moves shards with higher priorities first. **RelocateShard (`struct RelocateShard`)**: A `RelocateShard` records the key range that needs to be moved among servers and the data movement's priority. DD always moves shards with higher priorities first.
**Data distribution queue (`struct DDQueueData`)**: It receives shards to be relocated (i.e., RelocateShards), decides which shard should be moved to which server team, prioritizes the data movement based on the relocating shards' priorities, and controls the progress of data movement based on the servers' workload. **Data distribution queue (`struct DDQueue`)**: It receives shards to be relocated (i.e., RelocateShards), decides which shard should be moved to which server team, prioritizes the data movement based on the relocating shards' priorities, and controls the progress of data movement based on the servers' workload.
**Special keys in the system keyspace**: DD saves its state in the system keyspace to recover from failure and to ensure every process (e.g., commit proxies, tLogs and storage servers) has a consistent view of which storage server is responsible for which key range. **Special keys in the system keyspace**: DD saves its state in the system keyspace to recover from failure and to ensure every process (e.g., commit proxies, tLogs and storage servers) has a consistent view of which storage server is responsible for which key range.
@ -153,3 +153,25 @@ CPU utilization. This metric is in a positive relationship with “FinishedQueri
* The typical movement size under a read-skew scenario is 100M ~ 600M under the default knob values `READ_REBALANCE_MAX_SHARD_FRAC=0.2, READ_REBALANCE_SRC_PARALLELISM = 20`. Increasing those knobs may accelerate the convergence speed, at the risk of data movement churn that overwhelms the destination and leaves the source too cold. * The typical movement size under a read-skew scenario is 100M ~ 600M under the default knob values `READ_REBALANCE_MAX_SHARD_FRAC=0.2, READ_REBALANCE_SRC_PARALLELISM = 20`. Increasing those knobs may accelerate the convergence speed, at the risk of data movement churn that overwhelms the destination and leaves the source too cold.
* The upper bound of `READ_REBALANCE_MAX_SHARD_FRAC` is 0.5. Any value larger than 0.5 can result in hot server switching. * The upper bound of `READ_REBALANCE_MAX_SHARD_FRAC` is 0.5. Any value larger than 0.5 can result in hot server switching.
* When a deeper diagnosis of read-aware DD is needed, the `BgDDMountainChopper_New` and `BgDDValleyFiller_New` trace events are where to go. * When a deeper diagnosis of read-aware DD is needed, the `BgDDMountainChopper_New` and `BgDDValleyFiller_New` trace events are where to go.
## Data Distribution Diagnosis Q&A
* Why hasn't read-aware DD been triggered when there's a read imbalance?
  * Check the `SkipReason` field of the `BgDDMountainChopper_New` and `BgDDValleyFiller_New` trace events.
* Read-aware DD was triggered, and some data movement happened, but it didn't help the read balance. Why?
  * You need to figure out which servers were selected as the source and destination. That information is in the `DestTeam` and `SourceTeam` fields of `BgDDMountainChopper*` and `BgDDValleyFiller*`.
  * Also, the `DDQueueServerCounter` event records how many times a server has been a source or destination (defined in
```c++
enum CountType : uint8_t { ProposedSource = 0, QueuedSource, LaunchedSource, LaunchedDest };
```
) for each relocation reason (`Other`, `RebalanceDisk`, and so on) and each phase within `DD_QUEUE_COUNTER_REFRESH_INTERVAL` (default 60) seconds. For example,
```xml
<Event Severity="10" Time="1659974950.984176" DateTime="2022-08-08T16:09:10Z" Type="DDQueueServerCounter" ID="0000000000000000" ServerId="0000000000000004" OtherPQSD="0 1 3 2" RebalanceDiskPQSD="0 0 1 4" RebalanceReadPQSD="2 0 0 5" MergeShardPQSD="0 0 1 0" SizeSplitPQSD="0 0 5 0" WriteSplitPQSD="1 0 0 0" ThreadID="9733255463206053180" Machine="0.0.0.0:0" LogGroup="default" Roles="TS" />
```
`RebalanceReadPQSD="2 0 0 5"` means server `0000000000000004` has been proposed as a source for read rebalancing twice, but those relocations have not yet been queued or launched. The server has also been a destination for read rebalancing 5 times in the past minute. Note that a field is skipped if all 4 of its numbers are 0. To avoid spammy traces, if the knob `DD_QUEUE_COUNTER_SUMMARIZE = true` is enabled, the event `DDQueueServerCounterTooMany` summarizes the otherwise unreported servers involved in launched relocations (i.e., those whose `LaunchedSource` or `LaunchedDest` counts are non-zero):
```xml
<Event Severity="10" Time="1660095057.995837" DateTime="2022-08-10T01:30:57Z" Type="DDQueueServerCounterTooMany" ID="0000000000000000" RemainedLaunchedSources="000000000000007f,00000000000000d9,00000000000000e8,000000000000014c,0000000000000028,00000000000000d6,0000000000000067,000000000000003e,000000000000007d,000000000000000a,00000000000000cb,0000000000000106,00000000000000c1,000000000000003c,000000000000016e,00000000000000e4,000000000000013c,0000000000000016,0000000000000179,0000000000000061,00000000000000c2,000000000000005a,0000000000000001,00000000000000c9,000000000000012a,00000000000000fb,0000000000000146," RemainedLaunchedDestinations="0000000000000079,0000000000000115,000000000000018e,0000000000000167,0000000000000135,0000000000000139,0000000000000077,0000000000000118,00000000000000bb,0000000000000177,00000000000000c0,000000000000014d,000000000000017f,00000000000000c3,000000000000015c,00000000000000fb,0000000000000186,0000000000000157,00000000000000b6,0000000000000072,0000000000000144," ThreadID="1322639651557440362" Machine="0.0.0.0:0" LogGroup="default" Roles="TS" />
```
* How can I track the lifecycle of a relocation attempt for balancing?
  * First, find the `TraceId` fields in `BgDDMountainChopper*` and `BgDDValleyFiller*`, which indicate that a relocation has been triggered.
  * (Only when enabled) Find the `QueuedRelocation` event with the same `BeginPair` and `EndPair` as the original `TraceId`. This means the relocation request has been queued.
  * Find the `RelocateShard` event whose `BeginPair` and `EndPair` fields match the `TraceId`. This event means the relocation is ongoing.

View File

@ -128,3 +128,17 @@ On storage servers, every `SERVER_KNOBS->TAG_MEASUREMENT_INTERVAL` seconds, ther
### Status ### Status
For each storage server, the busiest read tag is reported in the full status output, along with its cost and fractional busyness. For each storage server, the busiest read tag is reported in the full status output, along with its cost and fractional busyness.
At the path `.cluster.qos.global_tag_throttler`, the throttling limits for each tag are reported:
```
{
"<tagName>": {
"desired_tps": <number>,
"reserved_tps": <number>,
"limiting_tps": [<number>|"unset"],
"target_tps": <number>
},
...
}
```

View File

@ -373,3 +373,302 @@ with the ``multitest`` role:
fdbserver -r multitest -f testfile.txt fdbserver -r multitest -f testfile.txt
This command will block until all tests are completed. This command will block until all tests are completed.
##########
API Tester
##########
Introduction
============
API tester is a framework for implementing end-to-end tests of the FDB C API, i.e. testing the API on a real
FDB cluster through all layers of the FDB client. Its executable is ``fdb_c_api_tester``, and the source
code is located in ``bindings/c/test/apitester``. The structure of API Tests is similar to that of the
Simulation Tests. The tests are implemented as workloads using the FDB API, which are all built into
``fdb_c_api_tester``. A concrete test configuration is defined as a TOML file, which specifies the
combination of workloads to be executed by the test together with their parameters. The test can then be
executed by passing the TOML file as a parameter to ``fdb_c_api_tester``.
Since simulation tests rely on the actor model to execute the tests deterministically in single-threaded
mode, they are not suitable for testing various multi-threaded aspects of the FDB client. End-to-end API
tests complement the simulation tests by testing the FDB Client layers above the single-threaded Native
Client.
The specific testing goals of the end-to-end tests are:

- Checking the functional correctness of the Multi-Version Client (MVC) and Thread-Safe Client.
- Detecting race conditions. They can be caused by accessing the state of the Native Client from the wrong
  threads or by introducing other shared state without proper synchronization.
- Detecting memory management errors. Thread-safe reference counting must be used where necessary. MVC
  works with multiple client libraries. Memory allocated by one client library must also be deallocated
  by the same library.
- Maintaining interoperability with other client versions. The client functionality is made available
  depending on the selected API version, and API changes must be correctly adapted.
- Checking that the client API behaves correctly in case of cluster upgrades. Database and transaction state
  must be correctly migrated to the upgraded connections, and pending operations canceled and successfully
  retried on the upgraded connections.
Implementing a Workload
=======================
Each workload is declared as a direct or indirect subclass of ``WorkloadBase`` implementing a constructor
with ``WorkloadConfig`` as a parameter and the method ``start()``, which defines the entry point of the
workload.
``WorkloadBase`` provides a set of methods that serve as building blocks for implementation of a workload:
.. function:: execTransaction(start, cont, failOnError = true)
creates and executes an FDB transaction. Here ``start`` is a function that takes a transaction context
as a parameter and implements the starting point of the transaction, and ``cont`` is a function implementing
a continuation to be executed after finishing the transaction execution. Transactions are automatically
retried on retryable errors by calling the ``start`` function again. In case
of a fatal error, the entire workload is considered failed unless ``failOnError`` is set to ``false``.
.. function:: schedule(task)
schedules a task for asynchronous execution. It is usually used in the continuations to schedule
the next step of the workload.
.. function:: info(msg)
error(msg)
are used for logging a message with a tag identifying the workload. Issuing an error message marks
the workload as failed.
The transaction context provides methods for implementing the transaction logic:
.. function:: tx()
the reference to the FDB transaction object
.. function:: continueAfter(future, cont, retryOnError = true)
sets a continuation to be executed when the future is ready. The ``retryOnError`` flag controls whether
the transaction should be automatically retried in case the future results in a retriable error.
.. function:: continueAfterAll(futures, cont)
takes a vector of futures and sets a continuation to be executed when all of the futures get ready.
The transaction is retried if at least one of the futures results in an error. This method is useful
for handling multiple concurrent reads.
.. function:: commit()
commits and finishes the transaction. If the commit is successful, the execution proceeds to the
continuation of ``execTransaction()``. In case of a retriable error the transaction is
automatically retried. A fatal error results in a failure of the workload.
.. function:: done()
finishes the transaction without committing. This method should be used to finish read transactions.
The transaction gets destroyed and execution proceeds to the continuation of ``execTransaction()``.
Each transaction must be finished either by ``commit()`` or ``done()``, because otherwise
the framework considers that the transaction is still being executed, so it won't destroy it and
won't call the continuation.
.. function:: onError(err)
handles an error: restarts the transaction in case of a retriable error, otherwise fails the workload.
This method is typically used in the continuation of ``continueAfter`` called with
``retryOnError=false`` as a fallback to the default error handling.
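For illustration, the sketch below (not code from the FDB sources) shows how these building blocks can
compose: a read-only step that fetches a key and finishes with ``done()``. The exact ``ctx->tx().get()``
signature is an assumption based on the internal C++ API described later.

.. code-block:: C++

    void readKey(fdb::Key key, TTaskFct cont) {
        execTransaction(
            // start: issue the read and attach a continuation to its future
            [key](auto ctx) {
                auto f = ctx->tx().get(key, /*snapshot*/ false); // assumed signature
                ctx->continueAfter(f, [ctx, f]() {
                    // f is ready here; a real workload would validate f.get()
                    ctx->done(); // finish the read transaction without committing
                });
            },
            // cont: executed after the transaction has finished
            [this, cont]() { schedule(cont); });
    }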
A workload execution ends automatically when it is marked as failed or its last continuation does not
schedule any new task or transaction.
The workload class should be defined in the ``FdbApiTester`` namespace. The file name convention is
``Tester{Name}Workload.cpp``, so that workload sources are distinguished from those of simulation workloads.
Basic Workload Example
======================
The code below implements a workload that consists of only two transactions. The first one sets a
randomly generated key to a randomly generated value, and the second one reads the key and checks if
the returned value matches the written one.
.. literalinclude:: ../../../bindings/c/test/apitester/TesterExampleWorkload.cpp
:language: C++
:lines: 21-
The workload is implemented in the method ``setAndGet``. It generates a random key and a random value
and executes a transaction that writes that key-value pair and commits. In the continuation of the
first ``execTransaction`` call, we execute the second transaction that reads the same key. The read
operation returns a future. So we call ``continueAfter`` to set a continuation for that future. In the
continuation we check if the returned value matches the written one and finish the transaction by
calling ``ctx->done()``. After completing the second transaction, we execute the continuation that was passed
as a parameter to the ``setAndGet`` method by the ``start`` method. In this case it is ``NO_OP_TASK``, which
does nothing and thus finishes the workload.
Finally, we declare an instance of ``WorkloadFactory`` to register this workload with the name ``SetAndGet``.
Note that we use ``workloadId`` as a key prefix. This is necessary for isolating the key space of this
workload, because the framework may be instructed to create multiple instances of the ``SetAndGet``
workload. If we do not isolate the key space, another workload can write a different value for the
same key and so break the assumption of the test.
The workload is implemented using the internal C++ API, defined in ``fdb_api.hpp``. It introduces
a set of classes representing the FDB objects (transactions, futures, etc.). These classes provide C++-style
methods wrapping FDB C API calls and automate memory management by means of reference counting.
Implementing Control Structures
===============================
Our basic workload executes just 2 transactions, but in practice we want to have workloads that generate
multiple transactions. The following code demonstrates how we can modify our basic workload to generate
multiple transactions in a loop.
.. code-block:: C++
class SetAndGetWorkload : public WorkloadBase {
public:
    ...
    int numIterations;
    int iterationsLeft;

    SetAndGetWorkload(const WorkloadConfig& config) : WorkloadBase(config) {
        keyPrefix = fdb::toBytesRef(fmt::format("{}/", workloadId));
        numIterations = config.getIntOption("numIterations", 1000);
    }

    void start() override {
        iterationsLeft = numIterations;
        setAndGetLoop();
    }

    void setAndGetLoop() {
        if (iterationsLeft == 0) {
            return;
        }
        iterationsLeft--;
        setAndGet([this]() { setAndGetLoop(); });
    }

    ...
}
We introduce a workload parameter ``numIterations`` to specify the number of iterations. If not specified
in the test configuration it defaults to 1000.
The method ``setAndGetLoop`` implements a loop that decrements the ``iterationsLeft`` counter until it reaches 0;
each iteration calls ``setAndGet`` with a continuation that returns execution to the loop. As you
can see, ``setAndGet`` itself does not need any changes; we just call it with another continuation.
The pattern of passing a continuation as a parameter can also be used to decompose the workload into a
sequence of steps. For example, we can add setup and cleanup steps to our workload and modify
``setAndGetLoop`` to make it composable with an arbitrary continuation:
.. code-block:: C++
void start() override {
    setup([this]() {
        iterationsLeft = numIterations;
        setAndGetLoop([this]() {
            cleanup(NO_OP_TASK);
        });
    });
}

void setAndGetLoop(TTaskFct cont) {
    if (iterationsLeft == 0) {
        schedule(cont);
        return;
    }
    iterationsLeft--;
    setAndGet([this, cont]() { setAndGetLoop(cont); });
}

void setup(TTaskFct cont) { ... }

void cleanup(TTaskFct cont) { ... }
Note that we call ``schedule(cont)`` in ``setAndGetLoop`` instead of calling the continuation directly.
In this way we avoid keeping ``setAndGetLoop`` on the call stack when executing the next step.
Subclassing ApiWorkload
=======================
``ApiWorkload`` is an abstract subclass of ``WorkloadBase`` that provides a framework for a typical
implementation of API test workloads. It implements a workflow consisting of cleaning up the key space
of the workload, populating it with newly generated data and then running a loop consisting of random
database operations. The concrete subclasses of ``ApiWorkload`` are expected to override the method
``randomOperation`` with an implementation of concrete random operations.
The ``ApiWorkload`` maintains a local key-value store that mirrors the part of the database state
relevant to the workload. A successful database write operation should be followed by a continuation
that performs equivalent changes in the local store, and the results of a database read operation should
be validated against the values from the local store.
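For instance (an illustrative sketch only: the ``randomOperation`` signature, the local ``store`` handle,
and the key/value generator helpers are assumed names here, not verified API), a subclass might look like:

.. code-block:: C++

    class SetOnlyWorkload : public ApiWorkload {
    public:
        SetOnlyWorkload(const WorkloadConfig& config) : ApiWorkload(config) {}

        // Assumed override point: perform one random operation, then invoke cont.
        void randomOperation(TTaskFct cont) override {
            fdb::Key key = randomKey();       // assumed helper
            fdb::Value value = randomValue(); // assumed helper
            execTransaction(
                [key, value](auto ctx) {
                    ctx->tx().set(key, value);
                    ctx->commit();
                },
                [this, key, value, cont]() {
                    store.set(key, value); // mirror the write in the local store
                    schedule(cont);
                });
        }
    };

    WorkloadFactory<SetOnlyWorkload> SetOnlyWorkloadFactory("SetOnly");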
Test Configuration
==================
A concrete test configuration is specified by a TOML file. The file must contain one ``[[test]]`` section
specifying the general settings for test execution followed by one or more ``[[test.workload]]``
configuration sections, specifying the workloads to be executed and their parameters. The specified
workloads are started all at once and executed concurrently.
The ``[[test]]`` section can contain the following options:
- ``title``: descriptive title of the test
- ``multiThreaded``: enable multi-threading (default: false)
- ``minFdbThreads`` and ``maxFdbThreads``: the number of FDB (network) threads to be randomly selected
from the given range (default: 1-1). Used only if ``multiThreaded=true``. It is also important to use
multiple database instances to make use of the multithreading.
- ``minDatabases`` and ``maxDatabases``: the number of database instances to be randomly selected from
the given range (default 1-1). The transactions of all workloads are randomly load-balanced over the
pool of database instances.
- ``minClients`` and ``maxClients``: the number of clients, i.e. instances of each workload, to be
randomly selected from the given range (default 1-8).
- ``minClientThreads`` and ``maxClientThreads``: the number of client threads, i.e. the threads used
for execution of the workload, to be randomly selected from the given range (default 1-1).
- ``blockOnFutures``: use blocking waits on futures instead of scheduling future callbacks asynchronously
(default: false)
- ``buggify``: Enable client-side failure injection (default: false)
- ``databasePerTransaction``: Create a separate database instance for each transaction (default: false).
It is a special mode useful for testing bugs related to creation and destruction of database instances.
- ``fdbCallbacksOnExternalThreads``: Enables the option ``FDB_NET_OPTION_CALLBACKS_ON_EXTERNAL_THREADS``
causing the callbacks of futures to be executed directly on the threads of the external FDB clients
rather than on the thread of the local FDB client.
The workload section ``[[test.workload]]`` must contain the attribute name matching the registered name
of the workload to be executed. Other options are workload-specific.
The subclasses of the ``ApiWorkload`` inherit the following configuration options:
- ``minKeyLength`` and ``maxKeyLength``: the size range of randomly generated keys (default: 1-64)
- ``minValueLength`` and ``maxValueLength``: the size range of randomly generated values
(default: 1-1000)
- ``maxKeysPerTransaction``: the maximum number of keys per transaction (default: 50)
- ``initialSize``: the number of key-value pairs in the initially populated database (default: 1000)
- ``readExistingKeysRatio``: the probability of choosing an existing key for read operations
(default: 0.9)
- ``numRandomOperations``: the number of random operations to be executed per workload (default: 1000)
- ``runUntilStop``: run the workload indefinitely until the stop command is received (default: false).
  This execution mode is used in upgrade tests and other scripted tests, where the workload needs to
  keep generating operations continuously until the scripted test completes.
- ``numOperationsForProgressCheck``: the number of operations to be performed to confirm a progress
check (default: 10). This option is used in combination with ``runUntilStop``. Progress checks are
initiated by a test script to check if the client workload is successfully progressing after a
cluster change.
Executing the Tests
===================
The ``fdb_c_api_tester`` executable takes a single TOML file as a parameter and executes the test
according to its specification. Before that, we must create an FDB cluster and pass its cluster file as
a parameter to ``fdb_c_api_tester``. Note that multithreaded tests must also be provided with an
external client library.
For example, we can create a temporary cluster and use it for execution of one of the existing API tests:
.. code-block:: bash
${srcDir}/tests/TestRunner/tmp_cluster.py --build-dir ${buildDir} -- \
${buildDir}/bin/fdb_c_api_tester \
--cluster-file @CLUSTER_FILE@ \
--external-client-library=${buildDir}/bindings/c/libfdb_c_external.so \
--test-file ${srcDir}/bindings/c/test/apitester/tests/CApiCorrectnessMultiThr.toml
The test specifications added to the ``bindings/c/test/apitester/tests/`` directory are executed as a part
of the regression test suite. They can be executed using the ``ctest`` target ``fdb_c_api_tests``:
.. code-block:: bash
ctest -R fdb_c_api_tests -VV

View File

@ -379,7 +379,9 @@
"log_server_min_free_space", "log_server_min_free_space",
"log_server_min_free_space_ratio", "log_server_min_free_space_ratio",
"storage_server_durability_lag", "storage_server_durability_lag",
"storage_server_list_fetch_failed" "storage_server_list_fetch_failed",
"blob_worker_lag",
"blob_worker_missing"
] ]
}, },
"description":"The database is not being saturated by the workload." "description":"The database is not being saturated by the workload."
@ -400,7 +402,9 @@
"log_server_min_free_space", "log_server_min_free_space",
"log_server_min_free_space_ratio", "log_server_min_free_space_ratio",
"storage_server_durability_lag", "storage_server_durability_lag",
"storage_server_list_fetch_failed" "storage_server_list_fetch_failed",
"blob_worker_lag",
"blob_worker_missing"
] ]
}, },
"description":"The database is not being saturated by the workload." "description":"The database is not being saturated by the workload."
@ -599,7 +603,7 @@
"counter":0, "counter":0,
"roughness":0.0 "roughness":0.0
}, },
"memory_errors":{ // measures number of proxy_memory_limit_exceeded errors "memory_errors":{ // measures number of (commit/grv)_proxy_memory_limit_exceeded errors
"hz":0.0, "hz":0.0,
"counter":0, "counter":0,
"roughness":0.0 "roughness":0.0

View File

@ -131,6 +131,9 @@ min_free_space_ratio Running out of space (approaching 5% limit).
log_server_min_free_space Log server running out of space (approaching 100MB limit). log_server_min_free_space Log server running out of space (approaching 100MB limit).
log_server_min_free_space_ratio Log server running out of space (approaching 5% limit). log_server_min_free_space_ratio Log server running out of space (approaching 5% limit).
storage_server_durability_lag Storage server durable version falling behind. storage_server_durability_lag Storage server durable version falling behind.
storage_server_list_fetch_failed Unable to fetch storage server list.
blob_worker_lag Blob worker granule version falling behind.
blob_worker_missing No blob workers are reporting metrics.
=================================== ==================================================== =================================== ====================================================
The JSON path ``cluster.qos.throttled_tags``, when it exists, is an Object containing ``"auto"`` , ``"manual"`` and ``"recommended"``. The possible fields for those object are in the following table: The JSON path ``cluster.qos.throttled_tags``, when it exists, is an Object containing ``"auto"`` , ``"manual"`` and ``"recommended"``. The possible fields for those object are in the following table:

View File

@ -2,6 +2,19 @@
Release Notes Release Notes
############# #############
7.1.19
======
* Same as 7.1.18 release with AVX enabled.
7.1.18
======
* Released with AVX disabled.
* Added knobs for the minimum and the maximum of the Ratekeeper's default priority. `(PR #7820) <https://github.com/apple/foundationdb/pull/7820>`_
* Fixed bugs in ``getRange`` of the special key space. `(PR #7778) <https://github.com/apple/foundationdb/pull/7778>`_, `(PR #7720) <https://github.com/apple/foundationdb/pull/7720>`_
* Added debug ID for secondary queries in index prefetching. `(PR #7755) <https://github.com/apple/foundationdb/pull/7755>`_
* Changed hostname resolving to prefer IPv6 addresses. `(PR #7750) <https://github.com/apple/foundationdb/pull/7750>`_
* Added more transaction debug events for prefetch queries. `(PR #7732) <https://github.com/apple/foundationdb/pull/7732>`_
7.1.17 7.1.17
====== ======
* Same as 7.1.16 release with AVX enabled. * Same as 7.1.16 release with AVX enabled.
@ -15,7 +28,7 @@ Release Notes
* Fixed ScopeEventFieldTypeMismatch error for TLogMetrics. `(PR #7640) <https://github.com/apple/foundationdb/pull/7640>`_ * Fixed ScopeEventFieldTypeMismatch error for TLogMetrics. `(PR #7640) <https://github.com/apple/foundationdb/pull/7640>`_
* Added getMappedRange latency metrics. `(PR #7632) <https://github.com/apple/foundationdb/pull/7632>`_ * Added getMappedRange latency metrics. `(PR #7632) <https://github.com/apple/foundationdb/pull/7632>`_
* Fixed a version vector performance bug due to not updating client side tag cache. `(PR #7616) <https://github.com/apple/foundationdb/pull/7616>`_ * Fixed a version vector performance bug due to not updating client side tag cache. `(PR #7616) <https://github.com/apple/foundationdb/pull/7616>`_
* Fixed DiskReadSeconds and DiskWriteSeconds calculaion in ProcessMetrics. `(PR #7609) <https://github.com/apple/foundationdb/pull/7609>`_ * Fixed DiskReadSeconds and DiskWriteSeconds calculation in ProcessMetrics. `(PR #7609) <https://github.com/apple/foundationdb/pull/7609>`_
* Added Rocksdb compression and data size stats. `(PR #7596) <https://github.com/apple/foundationdb/pull/7596>`_ * Added Rocksdb compression and data size stats. `(PR #7596) <https://github.com/apple/foundationdb/pull/7596>`_
7.1.15 7.1.15
@ -74,7 +87,7 @@ Release Notes
* Added support of the reboot command in go bindings. `(PR #7270) <https://github.com/apple/foundationdb/pull/7270>`_ * Added support of the reboot command in go bindings. `(PR #7270) <https://github.com/apple/foundationdb/pull/7270>`_
* Fixed several issues in profiling special keys using GlobalConfig. `(PR #7120) <https://github.com/apple/foundationdb/pull/7120>`_ * Fixed several issues in profiling special keys using GlobalConfig. `(PR #7120) <https://github.com/apple/foundationdb/pull/7120>`_
* Fixed a stuck transaction system bug due to inconsistent recovery transaction version. `(PR #7261) <https://github.com/apple/foundationdb/pull/7261>`_ * Fixed a stuck transaction system bug due to inconsistent recovery transaction version. `(PR #7261) <https://github.com/apple/foundationdb/pull/7261>`_
* Fixed a unknown_error crash due to not resolving hostnames. `(PR #7254) <https://github.com/apple/foundationdb/pull/7254>`_ * Fixed an unknown_error crash due to not resolving hostnames. `(PR #7254) <https://github.com/apple/foundationdb/pull/7254>`_
* Fixed a heap-use-after-free bug. `(PR #7250) <https://github.com/apple/foundationdb/pull/7250>`_ * Fixed a heap-use-after-free bug. `(PR #7250) <https://github.com/apple/foundationdb/pull/7250>`_
* Fixed a performance issue that remote TLogs are sending too many pops to log routers. `(PR #7235) <https://github.com/apple/foundationdb/pull/7235>`_ * Fixed a performance issue that remote TLogs are sending too many pops to log routers. `(PR #7235) <https://github.com/apple/foundationdb/pull/7235>`_
* Fixed an issue that SharedTLogs are not displaced and leaking disk space. `(PR #7246) <https://github.com/apple/foundationdb/pull/7246>`_ * Fixed an issue that SharedTLogs are not displaced and leaking disk space. `(PR #7246) <https://github.com/apple/foundationdb/pull/7246>`_

View File

@ -22,6 +22,8 @@ Each special key that existed before api version 630 is its own module. These ar
#. ``\xff\xff/cluster_file_path`` - See :ref:`cluster file client access <cluster-file-client-access>` #. ``\xff\xff/cluster_file_path`` - See :ref:`cluster file client access <cluster-file-client-access>`
#. ``\xff\xff/status/json`` - See :doc:`Machine-readable status <mr-status>` #. ``\xff\xff/status/json`` - See :doc:`Machine-readable status <mr-status>`
#. ``\xff\xff/worker_interfaces`` - keys are the workers' network addresses and values are their serialized ``ClientWorkerInterface`` objects; reads are not transactional
Prior to api version 630, it was also possible to read a range starting at ``\xff\xff/worker_interfaces``. This is mostly an implementation detail of fdbcli, Prior to api version 630, it was also possible to read a range starting at ``\xff\xff/worker_interfaces``. This is mostly an implementation detail of fdbcli,
but it's available in api version 630 as a module with prefix ``\xff\xff/worker_interfaces/``. but it's available in api version 630 as a module with prefix ``\xff\xff/worker_interfaces/``.
@ -210,6 +212,7 @@ that process, and wait for necessary data to be moved away.
#. ``\xff\xff/management/options/failed_locality/force`` Read/write. Setting this key disables safety checks for writes to ``\xff\xff/management/failed_locality/<locality>``. Setting this key only has an effect in the current transaction and is not persisted on commit. #. ``\xff\xff/management/options/failed_locality/force`` Read/write. Setting this key disables safety checks for writes to ``\xff\xff/management/failed_locality/<locality>``. Setting this key only has an effect in the current transaction and is not persisted on commit.
#. ``\xff\xff/management/tenant/map/<tenant>`` Read/write. Setting a key in this range to any value will result in a tenant being created with name ``<tenant>``. Clearing a key in this range will delete the tenant with name ``<tenant>``. Reading all or a portion of this range will return the list of tenants currently present in the cluster, excluding any changes in this transaction. Values read in this range will be JSON objects containing the metadata for the associated tenants. #. ``\xff\xff/management/tenant/map/<tenant>`` Read/write. Setting a key in this range to any value will result in a tenant being created with name ``<tenant>``. Clearing a key in this range will delete the tenant with name ``<tenant>``. Reading all or a portion of this range will return the list of tenants currently present in the cluster, excluding any changes in this transaction. Values read in this range will be JSON objects containing the metadata for the associated tenants.
#. ``\xff\xff/management/tenant/rename/<tenant>`` Read/write. Setting a key in this range to an unused tenant name will result in the tenant with the name ``<tenant>`` to be renamed to the value provided. If the rename operation is a transaction retried in a loop, it is possible for the rename to be applied twice, in which case ``tenant_not_found`` or ``tenant_already_exists`` errors may be returned. This can be avoided by checking for the tenant's existence first. #. ``\xff\xff/management/tenant/rename/<tenant>`` Read/write. Setting a key in this range to an unused tenant name will result in the tenant with the name ``<tenant>`` to be renamed to the value provided. If the rename operation is a transaction retried in a loop, it is possible for the rename to be applied twice, in which case ``tenant_not_found`` or ``tenant_already_exists`` errors may be returned. This can be avoided by checking for the tenant's existence first.
#. ``\xff\xff/management/options/worker_interfaces/verify`` Read/write. Setting this key adds a verification phase when reading ``\xff\xff/worker_interfaces``: the client tries to establish a connection to every worker in the list returned by the Cluster Controller and returns only the workers it can connect to. Setting this key only has an effect in the current transaction and is not persisted on commit. This option is currently only used by the fdbcli commands ``kill``, ``suspend`` and ``expensive_data_check`` to populate the worker list, as sketched below.
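For illustration, a sketch (not from the FDB sources) of combining the verify option with a read of
``\xff\xff/worker_interfaces`` from a flow C++ client; the range end key and the use of
``CLIENT_KNOBS->TOO_MANY`` as a row limit are assumptions:

.. code-block:: C++

    // Sketch: request verification, then read the worker interface list.
    ACTOR Future<Void> listWorkers(Database db) {
        state ReadYourWritesTransaction tr(db);
        loop {
            try {
                tr.setOption(FDBTransactionOptions::SPECIAL_KEY_SPACE_ENABLE_WRITES);
                // Transaction-local option: verify connectivity to each worker
                tr.set(LiteralStringRef("\xff\xff/management/options/worker_interfaces/verify"), ValueRef());
                RangeResult kvs = wait(tr.getRange(
                    KeyRangeRef(LiteralStringRef("\xff\xff/worker_interfaces/"),
                                LiteralStringRef("\xff\xff/worker_interfaces0")),
                    CLIENT_KNOBS->TOO_MANY));
                for (const auto& kv : kvs) {
                    printf("%s\n", kv.key.printable().c_str()); // key suffix is the worker address
                }
                return Void();
            } catch (Error& e) {
                wait(tr.onError(e));
            }
        }
    }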
An exclusion is syntactically either an ip address (e.g. ``127.0.0.1``), or An exclusion is syntactically either an ip address (e.g. ``127.0.0.1``), or
an ip address and port (e.g. ``127.0.0.1:4500``) or any locality (e.g ``locality_dcid:primary-satellite`` or an ip address and port (e.g. ``127.0.0.1:4500``) or any locality (e.g ``locality_dcid:primary-satellite`` or

View File

@ -928,7 +928,7 @@ void parentWatcher(void* parentHandle) {
static void printVersion() { static void printVersion() {
printf("FoundationDB " FDB_VT_PACKAGE_NAME " (v" FDB_VT_VERSION ")\n"); printf("FoundationDB " FDB_VT_PACKAGE_NAME " (v" FDB_VT_VERSION ")\n");
printf("source version %s\n", getSourceVersion()); printf("source version %s\n", getSourceVersion());
printf("protocol %llx\n", (long long)currentProtocolVersion.version()); printf("protocol %llx\n", (long long)currentProtocolVersion().version());
} }
static void printBuildInformation() { static void printBuildInformation() {

View File

@ -23,6 +23,7 @@
#include "fdbclient/FDBOptions.g.h" #include "fdbclient/FDBOptions.g.h"
#include "fdbclient/IClientApi.h" #include "fdbclient/IClientApi.h"
#include "fdbclient/ManagementAPI.actor.h" #include "fdbclient/ManagementAPI.actor.h"
#include "fdbclient/NativeAPI.actor.h"
#include "flow/Arena.h" #include "flow/Arena.h"
#include "flow/FastRef.h" #include "flow/FastRef.h"
@ -31,33 +32,6 @@
namespace { namespace {
// copy to standalones for krm
ACTOR Future<Void> setBlobRange(Database db, Key startKey, Key endKey, Value value) {
state Reference<ReadYourWritesTransaction> tr = makeReference<ReadYourWritesTransaction>(db);
loop {
try {
tr->setOption(FDBTransactionOptions::ACCESS_SYSTEM_KEYS);
tr->setOption(FDBTransactionOptions::PRIORITY_SYSTEM_IMMEDIATE);
// FIXME: check that the set range is currently inactive, and that a revoked range is currently its own
// range in the map and fully set.
tr->set(blobRangeChangeKey, deterministicRandom()->randomUniqueID().toString());
// This is not coalescing because we want to keep each range logically separate.
wait(krmSetRange(tr, blobRangeKeys.begin, KeyRange(KeyRangeRef(startKey, endKey)), value));
wait(tr->commit());
printf("Successfully updated blob range [%s - %s) to %s\n",
startKey.printable().c_str(),
endKey.printable().c_str(),
value.printable().c_str());
return Void();
} catch (Error& e) {
wait(tr->onError(e));
}
}
}
ACTOR Future<Version> getLatestReadVersion(Database db) { ACTOR Future<Version> getLatestReadVersion(Database db) {
state Transaction tr(db); state Transaction tr(db);
loop { loop {
@ -78,7 +52,7 @@ ACTOR Future<Void> printAfterDelay(double delaySeconds, std::string message) {
return Void(); return Void();
} }
ACTOR Future<Void> doBlobPurge(Database db, Key startKey, Key endKey, Optional<Version> version) { ACTOR Future<Void> doBlobPurge(Database db, Key startKey, Key endKey, Optional<Version> version, bool force) {
state Version purgeVersion; state Version purgeVersion;
if (version.present()) { if (version.present()) {
purgeVersion = version.get(); purgeVersion = version.get();
@ -86,7 +60,7 @@ ACTOR Future<Void> doBlobPurge(Database db, Key startKey, Key endKey, Optional<V
wait(store(purgeVersion, getLatestReadVersion(db))); wait(store(purgeVersion, getLatestReadVersion(db)));
} }
state Key purgeKey = wait(db->purgeBlobGranules(KeyRange(KeyRangeRef(startKey, endKey)), purgeVersion, {})); state Key purgeKey = wait(db->purgeBlobGranules(KeyRange(KeyRangeRef(startKey, endKey)), purgeVersion, {}, force));
fmt::print("Blob purge registered for [{0} - {1}) @ {2}\n", startKey.printable(), endKey.printable(), purgeVersion); fmt::print("Blob purge registered for [{0} - {1}) @ {2}\n", startKey.printable(), endKey.printable(), purgeVersion);
@ -99,65 +73,10 @@ ACTOR Future<Void> doBlobPurge(Database db, Key startKey, Key endKey, Optional<V
return Void(); return Void();
} }
ACTOR Future<Version> checkBlobSubrange(Database db, KeyRange keyRange, Optional<Version> version) {
state Transaction tr(db);
state Version readVersionOut = invalidVersion;
loop {
try {
wait(success(tr.readBlobGranules(keyRange, 0, version, &readVersionOut)));
return readVersionOut;
} catch (Error& e) {
wait(tr.onError(e));
}
}
}
ACTOR Future<Void> doBlobCheck(Database db, Key startKey, Key endKey, Optional<Version> version) { ACTOR Future<Void> doBlobCheck(Database db, Key startKey, Key endKey, Optional<Version> version) {
state Transaction tr(db);
state Version readVersionOut = invalidVersion;
state double elapsed = -timer_monotonic(); state double elapsed = -timer_monotonic();
state KeyRange range = KeyRange(KeyRangeRef(startKey, endKey));
state Standalone<VectorRef<KeyRangeRef>> allRanges;
loop {
try {
wait(store(allRanges, tr.getBlobGranuleRanges(range)));
break;
} catch (Error& e) {
wait(tr.onError(e));
}
}
if (allRanges.empty()) { state Version readVersionOut = wait(db->verifyBlobRange(KeyRangeRef(startKey, endKey), version));
fmt::print("ERROR: No blob ranges for [{0} - {1})\n", startKey.printable(), endKey.printable());
return Void();
}
fmt::print("Loaded {0} blob ranges to check\n", allRanges.size());
state std::vector<Future<Version>> checkParts;
// Chunk up to smaller ranges than this limit. Must be smaller than BG_TOO_MANY_GRANULES to not hit the limit
int maxChunkSize = CLIENT_KNOBS->BG_TOO_MANY_GRANULES / 2;
KeyRange currentChunk;
int currentChunkSize = 0;
for (auto& it : allRanges) {
if (currentChunkSize == maxChunkSize) {
checkParts.push_back(checkBlobSubrange(db, currentChunk, version));
currentChunkSize = 0;
}
if (currentChunkSize == 0) {
currentChunk = it;
} else if (it.begin != currentChunk.end) {
fmt::print("ERROR: Blobrange check failed, gap in blob ranges from [{0} - {1})\n",
currentChunk.end.printable(),
it.begin.printable());
return Void();
} else {
currentChunk = KeyRangeRef(currentChunk.begin, it.end);
}
currentChunkSize++;
}
checkParts.push_back(checkBlobSubrange(db, currentChunk, version));
wait(waitForAll(checkParts));
readVersionOut = checkParts.back().get();
elapsed += timer_monotonic(); elapsed += timer_monotonic();
@@ -201,7 +120,7 @@ ACTOR Future<bool> blobRangeCommandActor(Database localDb,
 		fmt::print("Invalid blob range [{0} - {1})\n", tokens[2].printable(), tokens[3].printable());
 	} else {
 		if (tokencmp(tokens[1], "start") || tokencmp(tokens[1], "stop")) {
-			bool starting = tokencmp(tokens[1], "start");
+			state bool starting = tokencmp(tokens[1], "start");
 			if (tokens.size() > 4) {
 				printUsage(tokens[0]);
 				return false;
@@ -210,9 +129,22 @@ ACTOR Future<bool> blobRangeCommandActor(Database localDb,
 			           starting ? "Starting" : "Stopping",
 			           tokens[2].printable().c_str(),
 			           tokens[3].printable().c_str());
-			wait(setBlobRange(localDb, begin, end, starting ? LiteralStringRef("1") : StringRef()));
+			state bool success = false;
+			if (starting) {
+				wait(store(success, localDb->blobbifyRange(KeyRangeRef(begin, end))));
+			} else {
+				wait(store(success, localDb->unblobbifyRange(KeyRangeRef(begin, end))));
+			}
+			if (!success) {
+				fmt::print("{0} blobbify range for [{1} - {2}) failed\n",
+				           starting ? "Starting" : "Stopping",
+				           tokens[2].printable().c_str(),
+				           tokens[3].printable().c_str());
+			}
+			return success;
-		} else if (tokencmp(tokens[1], "purge") || tokencmp(tokens[1], "check")) {
-			bool purge = tokencmp(tokens[1], "purge");
+		} else if (tokencmp(tokens[1], "purge") || tokencmp(tokens[1], "forcepurge") || tokencmp(tokens[1], "check")) {
+			bool purge = tokencmp(tokens[1], "purge") || tokencmp(tokens[1], "forcepurge");
+			bool forcePurge = tokencmp(tokens[1], "forcepurge");
 			Optional<Version> version;
 			if (tokens.size() > 4) {
@@ -225,17 +157,18 @@ ACTOR Future<bool> blobRangeCommandActor(Database localDb,
 				version = v;
 			}
-			fmt::print("{0} blob range [{1} - {2})",
+			fmt::print("{0} blob range [{1} - {2}){3}",
 			           purge ? "Purging" : "Checking",
 			           tokens[2].printable(),
-			           tokens[3].printable());
+			           tokens[3].printable(),
+			           forcePurge ? " (force)" : "");
 			if (version.present()) {
 				fmt::print(" @ {0}", version.get());
 			}
 			fmt::print("\n");
 			if (purge) {
-				wait(doBlobPurge(localDb, begin, end, version));
+				wait(doBlobPurge(localDb, begin, end, version, forcePurge));
 			} else {
 				wait(doBlobCheck(localDb, begin, end, version));
 			}
@@ -247,8 +180,7 @@ ACTOR Future<bool> blobRangeCommandActor(Database localDb,
 	return true;
 }

-CommandFactory blobRangeFactory("blobrange",
-                                CommandHelp("blobrange <start|stop|purge|check> <startkey> <endkey> [version]",
-                                            "",
-                                            ""));
+CommandFactory blobRangeFactory(
+    "blobrange",
+    CommandHelp("blobrange <start|stop|check|purge|forcepurge> <startkey> <endkey> [version]", "", ""));
 } // namespace fdb_cli
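
The CLI now routes start/stop through the database-level blobbify API instead of writing the blob range keys directly. A minimal sketch of driving the same calls programmatically, assuming a connected Database handle and the Flow actor toolchain (error handling elided; treat this as illustrative rather than the canonical API surface):

// Sketch only: start managing a range as blob granules, then verify it.
ACTOR Future<Void> blobbifyAndVerify(Database db, KeyRange range) {
	// blobbifyRange returns false if the range could not be converted.
	bool started = wait(db->blobbifyRange(range));
	if (!started) {
		fmt::print("blobbify failed for [{0} - {1})\n", range.begin.printable(), range.end.printable());
		return Void();
	}
	// verifyBlobRange returns the version at which the range was verified
	// readable from blob storage (invalidVersion on failure).
	Version v = wait(db->verifyBlobRange(range, Optional<Version>()));
	fmt::print("verified blob range at version {0}\n", v);
	return Void();
}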

View File

@@ -46,7 +46,7 @@ ACTOR Future<bool> expensiveDataCheckCommandActor(
 	if (tokens.size() == 1) {
 		// initialize worker interfaces
 		address_interface->clear();
-		wait(getWorkerInterfaces(tr, address_interface));
+		wait(getWorkerInterfaces(tr, address_interface, true));
 	}
 	if (tokens.size() == 1 || tokencmp(tokens[1], "list")) {
 		if (address_interface->size() == 0) {

View File

@@ -44,7 +44,7 @@ ACTOR Future<bool> killCommandActor(Reference<IDatabase> db,
 	if (tokens.size() == 1) {
 		// initialize worker interfaces
 		address_interface->clear();
-		wait(getWorkerInterfaces(tr, address_interface));
+		wait(getWorkerInterfaces(tr, address_interface, true));
 	}
 	if (tokens.size() == 1 || tokencmp(tokens[1], "list")) {
 		if (address_interface->size() == 0) {

View File

@@ -257,11 +257,9 @@ ACTOR Future<bool> metaclusterGetCommand(Reference<IDatabase> db, std::vector<St
 			fmt::print("{}\n", json_spirit::write_string(json_spirit::mValue(obj), json_spirit::pretty_print).c_str());
 		} else {
 			fmt::print("  connection string: {}\n", metadata.connectionString.toString().c_str());
+			fmt::print("  cluster state: {}\n", DataClusterEntry::clusterStateToString(metadata.entry.clusterState));
 			fmt::print("  tenant group capacity: {}\n", metadata.entry.capacity.numTenantGroups);
 			fmt::print("  allocated tenant groups: {}\n", metadata.entry.allocated.numTenantGroups);
-			if (metadata.entry.locked) {
-				fmt::print("  locked: true\n");
-			}
 		}
 	} catch (Error& e) {
 		if (useJson) {

View File

@@ -43,7 +43,7 @@ ACTOR Future<bool> suspendCommandActor(Reference<IDatabase> db,
 	if (tokens.size() == 1) {
 		// initialize worker interfaces
 		address_interface->clear();
-		wait(getWorkerInterfaces(tr, address_interface));
+		wait(getWorkerInterfaces(tr, address_interface, true));
 		if (address_interface->size() == 0) {
 			printf("\nNo addresses can be suspended.\n");
 		} else if (address_interface->size() == 1) {

View File

@@ -62,23 +62,18 @@ ACTOR Future<std::string> getSpecialKeysFailureErrorMessage(Reference<ITransacti
 	return valueObj["message"].get_str();
 }

-ACTOR Future<Void> verifyAndAddInterface(std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface,
-                                         Reference<FlowLock> connectLock,
-                                         KeyValue kv) {
-	wait(connectLock->take());
-	state FlowLock::Releaser releaser(*connectLock);
-	state ClientWorkerInterface workerInterf;
+void addInterfacesFromKVs(RangeResult& kvs,
+                          std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface) {
+	for (const auto& kv : kvs) {
+		ClientWorkerInterface workerInterf;
 		try {
 			// the interface is back-ward compatible, thus if parsing failed, it needs to upgrade cli version
 			workerInterf = BinaryReader::fromStringRef<ClientWorkerInterface>(kv.value, IncludeVersion());
 		} catch (Error& e) {
 			fprintf(stderr, "Error: %s; CLI version is too old, please update to use a newer version\n", e.what());
-			return Void();
+			return;
 		}
-	state ClientLeaderRegInterface leaderInterf(workerInterf.address());
-	choose {
-		when(Optional<LeaderInfo> rep =
-		         wait(brokenPromiseToNever(leaderInterf.getLeader.getReply(GetLeaderRequest())))) {
+		ClientLeaderRegInterface leaderInterf(workerInterf.address());
 		StringRef ip_port =
 		    (kv.key.endsWith(LiteralStringRef(":tls")) ? kv.key.removeSuffix(LiteralStringRef(":tls")) : kv.key)
 		        .removePrefix(LiteralStringRef("\xff\xff/worker_interfaces/"));
@@ -93,25 +88,26 @@ ACTOR Future<Void> verifyAndAddInterface(std::map<Key, std::pair<Value, ClientLe
 			(*address_interface)[ip_port2] = std::make_pair(kv.value, leaderInterf);
 		}
 	}
-		when(wait(delay(CLIENT_KNOBS->CLI_CONNECT_TIMEOUT))) {}
-	}
-	return Void();
 }

 ACTOR Future<Void> getWorkerInterfaces(Reference<ITransaction> tr,
-                                       std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface) {
+                                       std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface,
+                                       bool verify) {
+	if (verify) {
+		tr->setOption(FDBTransactionOptions::SPECIAL_KEY_SPACE_ENABLE_WRITES);
+		tr->set(workerInterfacesVerifyOptionSpecialKey, ValueRef());
+	}
 	// Hold the reference to the standalone's memory
 	state ThreadFuture<RangeResult> kvsFuture = tr->getRange(
 	    KeyRangeRef(LiteralStringRef("\xff\xff/worker_interfaces/"), LiteralStringRef("\xff\xff/worker_interfaces0")),
 	    CLIENT_KNOBS->TOO_MANY);
-	RangeResult kvs = wait(safeThreadFutureToFuture(kvsFuture));
+	state RangeResult kvs = wait(safeThreadFutureToFuture(kvsFuture));
 	ASSERT(!kvs.more);
-	auto connectLock = makeReference<FlowLock>(CLIENT_KNOBS->CLI_CONNECT_PARALLELISM);
-	std::vector<Future<Void>> addInterfs;
-	for (auto it : kvs) {
-		addInterfs.push_back(verifyAndAddInterface(address_interface, connectLock, it));
-	}
-	wait(waitForAll(addInterfs));
+	if (verify) {
+		// remove the option if set
+		tr->clear(workerInterfacesVerifyOptionSpecialKey);
+	}
+	addInterfacesFromKVs(kvs, address_interface);
 	return Void();
 }
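
With the new flag, callers opt into connectivity verification through the management-module special key instead of issuing their own RPCs from the CLI. A minimal sketch of the calling pattern, assuming the fdbcli helpers above (printing only the verified addresses):

ACTOR Future<Void> printReachableWorkers(Reference<ITransaction> tr) {
	state std::map<Key, std::pair<Value, ClientLeaderRegInterface>> interfaces;
	// verify == true: the special key space filters out workers the client
	// cannot actually talk to before the range is returned.
	wait(getWorkerInterfaces(tr, &interfaces, true));
	for (const auto& [addr, interf] : interfaces) {
		printf("%s\n", addr.toString().c_str());
	}
	return Void();
}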

View File

@@ -103,6 +103,7 @@ enum {
 	OPT_DEBUG_TLS,
 	OPT_API_VERSION,
 	OPT_MEMORY,
+	OPT_USE_FUTURE_PROTOCOL_VERSION
 };

 CSimpleOpt::SOption g_rgOptions[] = { { OPT_CONNFILE, "-C", SO_REQ_SEP },
@@ -127,6 +128,7 @@ CSimpleOpt::SOption g_rgOptions[] = { { OPT_CONNFILE, "-C", SO_REQ_SEP },
 	                                  { OPT_DEBUG_TLS, "--debug-tls", SO_NONE },
 	                                  { OPT_API_VERSION, "--api-version", SO_REQ_SEP },
 	                                  { OPT_MEMORY, "--memory", SO_REQ_SEP },
+	                                  { OPT_USE_FUTURE_PROTOCOL_VERSION, "--use-future-protocol-version", SO_NONE },
 	                                  TLS_OPTION_FLAGS,
 	                                  SO_END_OF_OPTIONS };
@@ -475,6 +477,9 @@ static void printProgramUsage(const char* name) {
 	       "                 Useful in reporting and diagnosing TLS issues.\n"
 	       "  --build-flags  Print build information and exit.\n"
 	       "  --memory       Resident memory limit of the CLI (defaults to 8GiB).\n"
+	       "  --use-future-protocol-version\n"
+	       "                 Use the simulated future protocol version to connect to the cluster.\n"
+	       "                 This option can be used for testing purposes only!\n"
 	       "  -v, --version  Print FoundationDB CLI version information and exit.\n"
 	       "  -h, --help     Display this help and exit.\n");
 }
@@ -578,7 +583,7 @@ void initHelp() {
 void printVersion() {
 	printf("FoundationDB CLI " FDB_VT_PACKAGE_NAME " (v" FDB_VT_VERSION ")\n");
 	printf("source version %s\n", getSourceVersion());
-	printf("protocol %" PRIx64 "\n", currentProtocolVersion.version());
+	printf("protocol %" PRIx64 "\n", currentProtocolVersion().version());
 }

 void printBuildInformation() {
@@ -872,6 +877,7 @@ struct CLIOptions {
 	Optional<std::string> exec;
 	bool initialStatusCheck = true;
 	bool cliHints = true;
+	bool useFutureProtocolVersion = false;
 	bool debugTLS = false;
 	std::string tlsCertPath;
 	std::string tlsKeyPath;
@@ -973,6 +979,10 @@ struct CLIOptions {
 			break;
 		case OPT_NO_HINTS:
 			cliHints = false;
+			break;
+		case OPT_USE_FUTURE_PROTOCOL_VERSION:
+			useFutureProtocolVersion = true;
+			break;

 		// TLS Options
 		case TLSConfig::OPT_TLS_PLUGIN:
@@ -1040,36 +1050,6 @@ Future<T> stopNetworkAfter(Future<T> what) {
 	}
 }

-ACTOR Future<Void> addInterface(std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface,
-                                Reference<FlowLock> connectLock,
-                                KeyValue kv) {
-	wait(connectLock->take());
-	state FlowLock::Releaser releaser(*connectLock);
-	state ClientWorkerInterface workerInterf =
-	    BinaryReader::fromStringRef<ClientWorkerInterface>(kv.value, IncludeVersion());
-	state ClientLeaderRegInterface leaderInterf(workerInterf.address());
-	choose {
-		when(Optional<LeaderInfo> rep =
-		         wait(brokenPromiseToNever(leaderInterf.getLeader.getReply(GetLeaderRequest())))) {
-			StringRef ip_port =
-			    (kv.key.endsWith(LiteralStringRef(":tls")) ? kv.key.removeSuffix(LiteralStringRef(":tls")) : kv.key)
-			        .removePrefix(LiteralStringRef("\xff\xff/worker_interfaces/"));
-			(*address_interface)[ip_port] = std::make_pair(kv.value, leaderInterf);
-
-			if (workerInterf.reboot.getEndpoint().addresses.secondaryAddress.present()) {
-				Key full_ip_port2 =
-				    StringRef(workerInterf.reboot.getEndpoint().addresses.secondaryAddress.get().toString());
-				StringRef ip_port2 = full_ip_port2.endsWith(LiteralStringRef(":tls"))
-				                         ? full_ip_port2.removeSuffix(LiteralStringRef(":tls"))
-				                         : full_ip_port2;
-				(*address_interface)[ip_port2] = std::make_pair(kv.value, leaderInterf);
-			}
-		}
-		when(wait(delay(CLIENT_KNOBS->CLI_CONNECT_TIMEOUT))) {}
-	}
-	return Void();
-}

 ACTOR Future<int> cli(CLIOptions opt, LineNoise* plinenoise) {
 	state LineNoise& linenoise = *plinenoise;
 	state bool intrans = false;
@@ -2199,6 +2179,9 @@ int main(int argc, char** argv) {
 	try {
 		API->selectApiVersion(opt.apiVersion);
+		if (opt.useFutureProtocolVersion) {
+			API->useFutureProtocolVersion();
+		}
 		API->setupNetwork();
 		opt.setupKnobs();
 		if (opt.exit_code != -1) {

View File

@@ -120,6 +120,7 @@ extern const KeyRangeRef processClassSourceSpecialKeyRange;
 extern const KeyRangeRef processClassTypeSpecialKeyRange;
 // Other special keys
 inline const KeyRef errorMsgSpecialKey = LiteralStringRef("\xff\xff/error_message");
+inline const KeyRef workerInterfacesVerifyOptionSpecialKey = "\xff\xff/management/options/worker_interfaces/verify"_sr;
 // help functions (Copied from fdbcli.actor.cpp)
 // get all workers' info
@@ -132,13 +133,14 @@ void printUsage(StringRef command);
 // Pre: tr failed with special_keys_api_failure error
 // Read the error message special key and return the message
 ACTOR Future<std::string> getSpecialKeysFailureErrorMessage(Reference<ITransaction> tr);
-// Using \xff\xff/worker_interfaces/ special key, get all worker interfaces
+// Using \xff\xff/worker_interfaces/ special key, get all worker interfaces.
+// A worker list will be returned from CC.
+// If verify, we will try to establish connections to all workers returned.
+// In particular, it will deserialize \xff\xff/worker_interfaces/<address>:=<ClientInterface> kv pairs and issue RPC
+// calls, then only return interfaces(kv pairs) the client can talk to
 ACTOR Future<Void> getWorkerInterfaces(Reference<ITransaction> tr,
-                                       std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface);
-// Deserialize \xff\xff/worker_interfaces/<address>:=<ClientInterface> k-v pair and verify by a RPC call
-ACTOR Future<Void> verifyAndAddInterface(std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface,
-                                         Reference<FlowLock> connectLock,
-                                         KeyValue kv);
+                                       std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface,
+                                       bool verify = false);
 // print cluster status info
 void printStatus(StatusObjectReader statusObj,
                  StatusClient::StatusLevel level,

View File

@@ -288,12 +288,47 @@ Reference<IBackupContainer> IBackupContainer::openContainer(const std::string& u
 #ifdef BUILD_AZURE_BACKUP
 	else if (u.startsWith("azure://"_sr)) {
 		u.eat("azure://"_sr);
-		auto accountName = u.eat("@"_sr).toString();
-		auto endpoint = u.eat("/"_sr).toString();
-		auto containerName = u.eat("/"_sr).toString();
-		r = makeReference<BackupContainerAzureBlobStore>(
-		    endpoint, accountName, containerName, encryptionKeyFileName);
+		auto address = u.eat("/"_sr);
+		if (address.endsWith(std::string(azure::storage_lite::constants::default_endpoint_suffix))) {
+			CODE_PROBE(true, "Azure backup url with standard azure storage account endpoint");
+			// <account>.<service>.core.windows.net/<resource_path>
+			auto endPoint = address.toString();
+			auto accountName = address.eat("."_sr).toString();
+			auto containerName = u.eat("/"_sr).toString();
+			r = makeReference<BackupContainerAzureBlobStore>(
+			    endPoint, accountName, containerName, encryptionKeyFileName);
+		} else {
+			// resolve the network address if necessary
+			std::string endpoint(address.toString());
+			Optional<NetworkAddress> parsedAddress = NetworkAddress::parseOptional(endpoint);
+			if (!parsedAddress.present()) {
+				try {
+					auto hostname = Hostname::parse(endpoint);
+					auto resolvedAddress = hostname.resolveBlocking();
+					if (resolvedAddress.present()) {
+						CODE_PROBE(true, "Azure backup url with hostname in the endpoint");
+						parsedAddress = resolvedAddress.get();
+					}
+				} catch (Error& e) {
+					TraceEvent(SevError, "InvalidAzureBackupUrl").error(e).detail("Endpoint", endpoint);
+					throw backup_invalid_url();
+				}
+			}
+			if (!parsedAddress.present()) {
+				TraceEvent(SevError, "InvalidAzureBackupUrl").detail("Endpoint", endpoint);
+				throw backup_invalid_url();
+			}
+			auto accountName = u.eat("/"_sr).toString();
+			// Avoid including ":tls" and "(fromHostname)"
+			// note: the endpoint needs to contain the account name
+			// so either "<account_name>.blob.core.windows.net" or "<ip>:<port>/<account_name>"
+			endpoint =
+			    fmt::format("{}/{}", formatIpPort(parsedAddress.get().ip, parsedAddress.get().port), accountName);
+			auto containerName = u.eat("/"_sr).toString();
+			r = makeReference<BackupContainerAzureBlobStore>(
+			    endpoint, accountName, containerName, encryptionKeyFileName);
+		}
 	}
 #endif
 	else {
 		lastOpenError = "invalid URL prefix";

View File

@@ -1523,12 +1523,47 @@ Reference<BackupContainerFileSystem> BackupContainerFileSystem::openContainerFS(
 #ifdef BUILD_AZURE_BACKUP
 	else if (u.startsWith("azure://"_sr)) {
 		u.eat("azure://"_sr);
-		auto accountName = u.eat("@"_sr).toString();
-		auto endpoint = u.eat("/"_sr).toString();
-		auto containerName = u.eat("/"_sr).toString();
-		r = makeReference<BackupContainerAzureBlobStore>(
-		    endpoint, accountName, containerName, encryptionKeyFileName);
+		auto address = u.eat("/"_sr);
+		if (address.endsWith(std::string(azure::storage_lite::constants::default_endpoint_suffix))) {
+			CODE_PROBE(true, "Azure backup url with standard azure storage account endpoint");
+			// <account>.<service>.core.windows.net/<resource_path>
+			auto endPoint = address.toString();
+			auto accountName = address.eat("."_sr).toString();
+			auto containerName = u.eat("/"_sr).toString();
+			r = makeReference<BackupContainerAzureBlobStore>(
+			    endPoint, accountName, containerName, encryptionKeyFileName);
+		} else {
+			// resolve the network address if necessary
+			std::string endpoint(address.toString());
+			Optional<NetworkAddress> parsedAddress = NetworkAddress::parseOptional(endpoint);
+			if (!parsedAddress.present()) {
+				try {
+					auto hostname = Hostname::parse(endpoint);
+					auto resolvedAddress = hostname.resolveBlocking();
+					if (resolvedAddress.present()) {
+						CODE_PROBE(true, "Azure backup url with hostname in the endpoint");
+						parsedAddress = resolvedAddress.get();
+					}
+				} catch (Error& e) {
+					TraceEvent(SevError, "InvalidAzureBackupUrl").error(e).detail("Endpoint", endpoint);
+					throw backup_invalid_url();
+				}
+			}
+			if (!parsedAddress.present()) {
+				TraceEvent(SevError, "InvalidAzureBackupUrl").detail("Endpoint", endpoint);
+				throw backup_invalid_url();
+			}
+			auto accountName = u.eat("/"_sr).toString();
+			// Avoid including ":tls" and "(fromHostname)"
+			// note: the endpoint needs to contain the account name
+			// so either "<account_name>.blob.core.windows.net" or "<ip>:<port>/<account_name>"
+			endpoint =
+			    fmt::format("{}/{}", formatIpPort(parsedAddress.get().ip, parsedAddress.get().port), accountName);
+			auto containerName = u.eat("/"_sr).toString();
+			r = makeReference<BackupContainerAzureBlobStore>(
+			    endpoint, accountName, containerName, encryptionKeyFileName);
+		}
 	}
 #endif
 	else {
 		lastOpenError = "invalid URL prefix";
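
Both parsers above accept the same two azure:// URL shapes. For reference, a hedged sketch of opening each form (account, container, and addresses are placeholders; the trailing arguments of openContainer are abbreviated in this diff, so treat the call shape as illustrative):

// Form 1: standard storage-account endpoint; the account name is parsed from
// the first hostname label:
//   azure://myaccount.blob.core.windows.net/mycontainer
// Form 2: explicit <ip>:<port> (or a resolvable hostname); the account name
// is the first path segment:
//   azure://10.0.0.1:10000/myaccount/mycontainer
auto c = IBackupContainer::openContainer("azure://myaccount.blob.core.windows.net/mycontainer", {}, {});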

View File

@@ -40,6 +40,7 @@
 #include <cstring>
 #include <fstream> // for perf microbenchmark
+#include <limits>
 #include <vector>

 #define BG_READ_DEBUG false
@@ -209,16 +210,21 @@ namespace {
 BlobGranuleFileEncryptionKeys getEncryptBlobCipherKey(const BlobGranuleCipherKeysCtx cipherKeysCtx) {
 	BlobGranuleFileEncryptionKeys eKeys;

+	// Cipher key reconstructed is 'never' inserted into BlobCipherKey cache, choose 'neverExpire'
 	eKeys.textCipherKey = makeReference<BlobCipherKey>(cipherKeysCtx.textCipherKey.encryptDomainId,
 	                                                   cipherKeysCtx.textCipherKey.baseCipherId,
 	                                                   cipherKeysCtx.textCipherKey.baseCipher.begin(),
 	                                                   cipherKeysCtx.textCipherKey.baseCipher.size(),
-	                                                   cipherKeysCtx.textCipherKey.salt);
+	                                                   cipherKeysCtx.textCipherKey.salt,
+	                                                   std::numeric_limits<int64_t>::max(),
+	                                                   std::numeric_limits<int64_t>::max());
 	eKeys.headerCipherKey = makeReference<BlobCipherKey>(cipherKeysCtx.headerCipherKey.encryptDomainId,
 	                                                     cipherKeysCtx.headerCipherKey.baseCipherId,
 	                                                     cipherKeysCtx.headerCipherKey.baseCipher.begin(),
 	                                                     cipherKeysCtx.headerCipherKey.baseCipher.size(),
-	                                                     cipherKeysCtx.headerCipherKey.salt);
+	                                                     cipherKeysCtx.headerCipherKey.salt,
+	                                                     std::numeric_limits<int64_t>::max(),
+	                                                     std::numeric_limits<int64_t>::max());
 	return eKeys;
 }
@@ -346,7 +352,9 @@ struct IndexBlockRef {
 			decrypt(cipherKeysCtx.get(), *this, arena);
 		} else {
+			if (BG_ENCRYPT_COMPRESS_DEBUG) {
 				TraceEvent("IndexBlockSize").detail("Sz", buffer.size());
+			}

 			ObjectReader dataReader(buffer.begin(), IncludeVersion());
 			dataReader.deserialize(FileIdentifierFor<IndexBlock>::value, block, arena);
@@ -368,7 +376,11 @@ struct IndexBlockRef {
 			    arena, ObjectWriter::toValue(block, IncludeVersion(ProtocolVersion::withBlobGranuleFile())).contents());
 		}

-		TraceEvent(SevDebug, "IndexBlockSize").detail("Sz", buffer.size()).detail("Encrypted", cipherKeysCtx.present());
+		if (BG_ENCRYPT_COMPRESS_DEBUG) {
+			TraceEvent(SevDebug, "IndexBlockSize")
+			    .detail("Sz", buffer.size())
+			    .detail("Encrypted", cipherKeysCtx.present());
+		}
 	}

 	template <class Ar>

View File

@@ -90,8 +90,8 @@ add_flow_target(LINK_TEST NAME fdbclientlinktest SRCS LinkTest.cpp)
 target_link_libraries(fdbclientlinktest PRIVATE fdbclient rapidxml) # re-link rapidxml due to private link interface

 if(BUILD_AZURE_BACKUP)
-  target_link_libraries(fdbclient PRIVATE curl uuid azure-storage-lite)
-  target_link_libraries(fdbclient_sampling PRIVATE curl uuid azure-storage-lite)
+  target_link_libraries(fdbclient PRIVATE curl azure-storage-lite)
+  target_link_libraries(fdbclient_sampling PRIVATE curl azure-storage-lite)
 endif()

 if(BUILD_AWS_BACKUP)

View File

@@ -42,10 +42,6 @@ void ClientKnobs::initialize(Randomize randomize) {
 	init( FAILURE_MAX_DELAY,                       5.0 );
 	init( FAILURE_MIN_DELAY,                       4.0 ); if( randomize && BUGGIFY ) FAILURE_MIN_DELAY = 1.0;
-	init( FAILURE_TIMEOUT_DELAY,                   FAILURE_MIN_DELAY );
-	init( CLIENT_FAILURE_TIMEOUT_DELAY,            FAILURE_MIN_DELAY );
-	init( FAILURE_EMERGENCY_DELAY,                 30.0 );
-	init( FAILURE_MAX_GENERATIONS,                 10 );
 	init( RECOVERY_DELAY_START_GENERATION,         70 );
 	init( RECOVERY_DELAY_SECONDS_PER_GENERATION,   60.0 );
 	init( MAX_GENERATIONS,                         100 );
@@ -64,6 +60,7 @@ void ClientKnobs::initialize(Randomize randomize) {
 	init( WRONG_SHARD_SERVER_DELAY,                .01 ); if( randomize && BUGGIFY ) WRONG_SHARD_SERVER_DELAY = deterministicRandom()->random01(); // FLOW_KNOBS->PREVENT_FAST_SPIN_DELAY; // SOMEDAY: This delay can limit performance of retrieving data when the cache is mostly wrong (e.g. dumping the database after a test)
 	init( FUTURE_VERSION_RETRY_DELAY,              .01 ); if( randomize && BUGGIFY ) FUTURE_VERSION_RETRY_DELAY = deterministicRandom()->random01();// FLOW_KNOBS->PREVENT_FAST_SPIN_DELAY;
+	init( GRV_ERROR_RETRY_DELAY,                   5.0 ); if( randomize && BUGGIFY ) GRV_ERROR_RETRY_DELAY = 0.01 + 5 * deterministicRandom()->random01();
 	init( UNKNOWN_TENANT_RETRY_DELAY,              0.0 ); if( randomize && BUGGIFY ) UNKNOWN_TENANT_RETRY_DELAY = deterministicRandom()->random01();
 	init( REPLY_BYTE_LIMIT,                        80000 );
 	init( DEFAULT_BACKOFF,                         .01 ); if( randomize && BUGGIFY ) DEFAULT_BACKOFF = deterministicRandom()->random01();
@@ -159,8 +156,6 @@ void ClientKnobs::initialize(Randomize randomize) {
 	init( BACKUP_AGGREGATE_POLL_RATE_UPDATE_INTERVAL, 60);
 	init( BACKUP_AGGREGATE_POLL_RATE,              2.0 ); // polls per second target for all agents on the cluster
 	init( BACKUP_LOG_WRITE_BATCH_MAX_SIZE,         1e6 ); //Must be much smaller than TRANSACTION_SIZE_LIMIT
-	init( BACKUP_LOG_ATOMIC_OPS_SIZE,              1000 );
-	init( BACKUP_OPERATION_COST_OVERHEAD,          50 );
 	init( BACKUP_MAX_LOG_RANGES,                   21 ); if( randomize && BUGGIFY ) BACKUP_MAX_LOG_RANGES = 4;
 	init( BACKUP_SIM_COPY_LOG_RANGES,              100 );
 	init( BACKUP_VERSION_DELAY,                    5*CORE_VERSIONSPERSECOND );
@@ -279,10 +274,6 @@ void ClientKnobs::initialize(Randomize randomize) {
 	init( BUSYNESS_SPIKE_START_THRESHOLD,          0.100 );
 	init( BUSYNESS_SPIKE_SATURATED_THRESHOLD,      0.500 );

-	// multi-version client control
-	init( MVC_CLIENTLIB_CHUNK_SIZE,                8*1024 );
-	init( MVC_CLIENTLIB_CHUNKS_PER_TRANSACTION,    32 );
-
 	// Blob granules
 	init( BG_MAX_GRANULE_PARALLELISM,              10 );
 	init( BG_TOO_MANY_GRANULES,                    1000 );
@@ -291,7 +282,7 @@ void ClientKnobs::initialize(Randomize randomize) {
 	init( CHANGE_QUORUM_BAD_STATE_RETRY_DELAY,     2.0 );

 	// Tenants and Metacluster
-	init( MAX_TENANTS_PER_CLUSTER,                 1e6 ); if ( randomize && BUGGIFY ) MAX_TENANTS_PER_CLUSTER = deterministicRandom()->randomInt(20, 100);
+	init( MAX_TENANTS_PER_CLUSTER,                 1e6 );
 	init( TENANT_TOMBSTONE_CLEANUP_INTERVAL,       60 ); if ( randomize && BUGGIFY ) TENANT_TOMBSTONE_CLEANUP_INTERVAL = deterministicRandom()->random01() * 30;
 	init( MAX_DATA_CLUSTERS,                       1e5 );
 	init( REMOVE_CLUSTER_TENANT_BATCH_SIZE,        1e4 ); if ( randomize && BUGGIFY ) REMOVE_CLUSTER_TENANT_BATCH_SIZE = 1;

View File

@@ -2559,7 +2559,7 @@ TEST_CASE("/ManagementAPI/AutoQuorumChange/checkLocality") {
 			                       ProcessClass(ProcessClass::CoordinatorClass, ProcessClass::CommandLineSource),
 			                       "",
 			                       "",
-			                       currentProtocolVersion);
+			                       currentProtocolVersion());
 		}
 		workers.push_back(data);

View File

@@ -24,8 +24,48 @@
 FDB_DEFINE_BOOLEAN_PARAM(AddNewTenants);
 FDB_DEFINE_BOOLEAN_PARAM(RemoveMissingTenants);

+std::string DataClusterEntry::clusterStateToString(DataClusterState clusterState) {
+	switch (clusterState) {
+	case DataClusterState::READY:
+		return "ready";
+	case DataClusterState::REMOVING:
+		return "removing";
+	case DataClusterState::RESTORING:
+		return "restoring";
+	default:
+		UNREACHABLE();
+	}
+}
+
+DataClusterState DataClusterEntry::stringToClusterState(std::string stateStr) {
+	if (stateStr == "ready") {
+		return DataClusterState::READY;
+	} else if (stateStr == "removing") {
+		return DataClusterState::REMOVING;
+	} else if (stateStr == "restoring") {
+		return DataClusterState::RESTORING;
+	}
+
+	UNREACHABLE();
+}
+
+json_spirit::mObject DataClusterEntry::toJson() const {
+	json_spirit::mObject obj;
+	obj["capacity"] = capacity.toJson();
+	obj["allocated"] = allocated.toJson();
+	obj["cluster_state"] = DataClusterEntry::clusterStateToString(clusterState);
+	return obj;
+}
+
 json_spirit::mObject ClusterUsage::toJson() const {
 	json_spirit::mObject obj;
 	obj["num_tenant_groups"] = numTenantGroups;
 	return obj;
 }
+
+KeyBackedObjectProperty<MetaclusterRegistrationEntry, decltype(IncludeVersion())>&
+MetaclusterMetadata::metaclusterRegistration() {
+	static KeyBackedObjectProperty<MetaclusterRegistrationEntry, decltype(IncludeVersion())> instance(
+	    "\xff/metacluster/clusterRegistration"_sr, IncludeVersion());
+	return instance;
+}
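
Because the CLI prints the state with clusterStateToString and management tooling parses it back with stringToClusterState, the two helpers must stay inverse to each other. A small sanity sketch, assuming the Metacluster headers:

// Every enum value must survive the to-string/from-string round trip,
// otherwise metacluster status output and parsing would disagree.
void checkClusterStateRoundTrip() {
	for (auto s : { DataClusterState::READY, DataClusterState::REMOVING, DataClusterState::RESTORING }) {
		ASSERT(DataClusterEntry::stringToClusterState(DataClusterEntry::clusterStateToString(s)) == s);
	}
}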

View File

@@ -40,9 +40,12 @@ ACTOR Future<Reference<IDatabase>> openDatabase(ClusterConnectionString connecti
 	}
 }

-KeyBackedObjectMap<ClusterName, DataClusterEntry, decltype(IncludeVersion())> ManagementClusterMetadata::dataClusters(
-    "metacluster/dataCluster/metadata/"_sr,
-    IncludeVersion());
+KeyBackedObjectMap<ClusterName, DataClusterEntry, decltype(IncludeVersion())>&
+ManagementClusterMetadata::dataClusters() {
+	static KeyBackedObjectMap<ClusterName, DataClusterEntry, decltype(IncludeVersion())> instance(
+	    "metacluster/dataCluster/metadata/"_sr, IncludeVersion());
+	return instance;
+}

 KeyBackedMap<ClusterName,
              ClusterConnectionString,
@@ -56,4 +59,9 @@ KeyBackedMap<ClusterName, int64_t, TupleCodec<ClusterName>, BinaryCodec<int64_t>
 KeyBackedSet<Tuple> ManagementClusterMetadata::clusterTenantIndex("metacluster/dataCluster/tenantMap/"_sr);
 KeyBackedSet<Tuple> ManagementClusterMetadata::clusterTenantGroupIndex("metacluster/dataCluster/tenantGroupMap/"_sr);

+TenantMetadataSpecification& ManagementClusterMetadata::tenantMetadata() {
+	static TenantMetadataSpecification instance(""_sr);
+	return instance;
+}
+
 }; // namespace MetaclusterAPI
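
Both refactors above replace namespace-scope metadata objects with function-local statics. The idiom (a Meyers singleton) defers construction to first use and is thread-safe since C++11, which avoids the static initialization order problem the old global definitions were exposed to. A standalone sketch of the pattern:

#include <iostream>
#include <string>

struct Registry {
	std::string prefix;
	explicit Registry(std::string p) : prefix(std::move(p)) { std::cout << "constructed\n"; }
};

// Constructed on the first call rather than at program startup, so other
// globals can safely use registry() from their own constructors.
Registry& registry() {
	static Registry instance("metacluster/");
	return instance;
}

int main() {
	std::cout << "before first use\n";
	std::cout << registry().prefix << "\n"; // construction happens on this first call
}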

View File

@@ -663,69 +663,43 @@ ACTOR Future<Void> asyncDeserializeClusterInterface(Reference<AsyncVar<Value>> s
 	}
 }

-struct ClientStatusStats {
-	int count;
-	std::vector<std::pair<NetworkAddress, Key>> examples;
-
-	ClientStatusStats() : count(0) { examples.reserve(CLIENT_KNOBS->CLIENT_EXAMPLE_AMOUNT); }
-};
+namespace {
+
+void tryInsertIntoSamples(OpenDatabaseRequest::Samples& samples,
+                          const NetworkAddress& networkAddress,
+                          const Key& traceLogGroup) {
+	++samples.count;
+	if (samples.samples.size() < static_cast<size_t>(CLIENT_KNOBS->CLIENT_EXAMPLE_AMOUNT)) {
+		samples.samples.insert({ networkAddress, traceLogGroup });
+	}
+}
+
+} // namespace

 OpenDatabaseRequest ClientData::getRequest() {
 	OpenDatabaseRequest req;
-	std::map<StringRef, ClientStatusStats> issueMap;
-	std::map<ClientVersionRef, ClientStatusStats> versionMap;
-	std::map<StringRef, ClientStatusStats> maxProtocolMap;
-	int clientCount = 0;
-
-	// SOMEDAY: add a yield in this loop
 	for (auto& ci : clientStatusInfoMap) {
-		for (auto& it : ci.second.issues) {
-			auto& entry = issueMap[it];
-			entry.count++;
-			if (entry.examples.size() < CLIENT_KNOBS->CLIENT_EXAMPLE_AMOUNT) {
-				entry.examples.emplace_back(ci.first, ci.second.traceLogGroup);
-			}
-		}
-		if (ci.second.versions.size()) {
-			clientCount++;
-			StringRef maxProtocol;
-			for (auto& it : ci.second.versions) {
-				maxProtocol = std::max(maxProtocol, it.protocolVersion);
-				auto& entry = versionMap[it];
-				entry.count++;
-				if (entry.examples.size() < CLIENT_KNOBS->CLIENT_EXAMPLE_AMOUNT) {
-					entry.examples.emplace_back(ci.first, ci.second.traceLogGroup);
-				}
-			}
-			auto& maxEntry = maxProtocolMap[maxProtocol];
-			maxEntry.count++;
-			if (maxEntry.examples.size() < CLIENT_KNOBS->CLIENT_EXAMPLE_AMOUNT) {
-				maxEntry.examples.emplace_back(ci.first, ci.second.traceLogGroup);
-			}
-		} else {
-			auto& entry = versionMap[ClientVersionRef()];
-			entry.count++;
-			if (entry.examples.size() < CLIENT_KNOBS->CLIENT_EXAMPLE_AMOUNT) {
-				entry.examples.emplace_back(ci.first, ci.second.traceLogGroup);
-			}
-		}
-	}
-
-	req.issues.reserve(issueMap.size());
-	for (auto& it : issueMap) {
-		req.issues.push_back(ItemWithExamples<Key>(it.first, it.second.count, it.second.examples));
-	}
-	req.supportedVersions.reserve(versionMap.size());
-	for (auto& it : versionMap) {
-		req.supportedVersions.push_back(
-		    ItemWithExamples<Standalone<ClientVersionRef>>(it.first, it.second.count, it.second.examples));
-	}
-	req.maxProtocolSupported.reserve(maxProtocolMap.size());
-	for (auto& it : maxProtocolMap) {
-		req.maxProtocolSupported.push_back(ItemWithExamples<Key>(it.first, it.second.count, it.second.examples));
-	}
-	req.clientCount = clientCount;
+		const auto& networkAddress = ci.first;
+		const auto& traceLogGroup = ci.second.traceLogGroup;
+
+		for (auto& issue : ci.second.issues) {
+			tryInsertIntoSamples(req.issues[issue], networkAddress, traceLogGroup);
+		}
+
+		if (!ci.second.versions.size()) {
+			tryInsertIntoSamples(req.supportedVersions[ClientVersionRef()], networkAddress, traceLogGroup);
+			continue;
+		}
+
+		++req.clientCount;
+		StringRef maxProtocol;
+		for (auto& it : ci.second.versions) {
+			maxProtocol = std::max(maxProtocol, it.protocolVersion);
+			tryInsertIntoSamples(req.supportedVersions[it], networkAddress, traceLogGroup);
+		}
+		tryInsertIntoSamples(req.maxProtocolSupported[maxProtocol], networkAddress, traceLogGroup);
+	}

 	return req;
 }
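
tryInsertIntoSamples keeps an exact count but only the first CLIENT_EXAMPLE_AMOUNT distinct examples, so the request size stays bounded no matter how many clients report. A standalone sketch of the idea (names are illustrative, not the FDB API):

#include <cstdint>
#include <set>
#include <string>

// Bounded example collection: exact count, at most 'cap' stored samples.
struct Samples {
	int64_t count = 0;
	std::set<std::string> samples;
};

void tryInsert(Samples& s, const std::string& example, size_t cap) {
	++s.count;
	if (s.samples.size() < cap) {
		s.samples.insert(example);
	}
}

int main() {
	Samples s;
	for (int i = 0; i < 100; ++i) {
		tryInsert(s, "client-" + std::to_string(i), 5);
	}
	// s.count == 100, but only 5 examples are retained.
}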

View File

@@ -257,13 +257,14 @@ ThreadFuture<Standalone<VectorRef<KeyRef>>> DLTransaction::getRangeSplitPoints(c
 	});
 }

-ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> DLTransaction::getBlobGranuleRanges(const KeyRangeRef& keyRange) {
+ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> DLTransaction::getBlobGranuleRanges(const KeyRangeRef& keyRange,
+                                                                                     int rangeLimit) {
 	if (!api->transactionGetBlobGranuleRanges) {
 		return unsupported_operation();
 	}

 	FdbCApi::FDBFuture* f = api->transactionGetBlobGranuleRanges(
-	    tr, keyRange.begin.begin(), keyRange.begin.size(), keyRange.end.begin(), keyRange.end.size());
+	    tr, keyRange.begin.begin(), keyRange.begin.size(), keyRange.end.begin(), keyRange.end.size(), rangeLimit);
 	return toThreadFuture<Standalone<VectorRef<KeyRangeRef>>>(api, f, [](FdbCApi::FDBFuture* f, FdbCApi* api) {
 		const FdbCApi::FDBKeyRange* keyRanges;
 		int keyRangesLength;
@@ -583,6 +584,71 @@ ThreadFuture<Void> DLDatabase::waitPurgeGranulesComplete(const KeyRef& purgeKey)
 	return toThreadFuture<Void>(api, f, [](FdbCApi::FDBFuture* f, FdbCApi* api) { return Void(); });
 }

+ThreadFuture<bool> DLDatabase::blobbifyRange(const KeyRangeRef& keyRange) {
+	if (!api->databaseBlobbifyRange) {
+		return unsupported_operation();
+	}
+
+	FdbCApi::FDBFuture* f = api->databaseBlobbifyRange(
+	    db, keyRange.begin.begin(), keyRange.begin.size(), keyRange.end.begin(), keyRange.end.size());
+
+	return toThreadFuture<bool>(api, f, [](FdbCApi::FDBFuture* f, FdbCApi* api) {
+		bool ret = false;
+		ASSERT(!api->futureGetBool(f, &ret));
+		return ret;
+	});
+}
+
+ThreadFuture<bool> DLDatabase::unblobbifyRange(const KeyRangeRef& keyRange) {
+	if (!api->databaseUnblobbifyRange) {
+		return unsupported_operation();
+	}
+
+	FdbCApi::FDBFuture* f = api->databaseUnblobbifyRange(
+	    db, keyRange.begin.begin(), keyRange.begin.size(), keyRange.end.begin(), keyRange.end.size());
+
+	return toThreadFuture<bool>(api, f, [](FdbCApi::FDBFuture* f, FdbCApi* api) {
+		bool ret = false;
+		ASSERT(!api->futureGetBool(f, &ret));
+		return ret;
+	});
+}
+
+ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> DLDatabase::listBlobbifiedRanges(const KeyRangeRef& keyRange,
+                                                                                  int rangeLimit) {
+	if (!api->databaseListBlobbifiedRanges) {
+		return unsupported_operation();
+	}
+
+	FdbCApi::FDBFuture* f = api->databaseListBlobbifiedRanges(
+	    db, keyRange.begin.begin(), keyRange.begin.size(), keyRange.end.begin(), keyRange.end.size(), rangeLimit);
+
+	return toThreadFuture<Standalone<VectorRef<KeyRangeRef>>>(api, f, [](FdbCApi::FDBFuture* f, FdbCApi* api) {
+		const FdbCApi::FDBKeyRange* keyRanges;
+		int keyRangesLength;
+		FdbCApi::fdb_error_t error = api->futureGetKeyRangeArray(f, &keyRanges, &keyRangesLength);
+		ASSERT(!error);
+		// The memory for this is stored in the FDBFuture and is released when the future gets destroyed.
+		return Standalone<VectorRef<KeyRangeRef>>(VectorRef<KeyRangeRef>((KeyRangeRef*)keyRanges, keyRangesLength),
+		                                          Arena());
+	});
+}
+
+ThreadFuture<Version> DLDatabase::verifyBlobRange(const KeyRangeRef& keyRange, Optional<Version> version) {
+	if (!api->databaseVerifyBlobRange) {
+		return unsupported_operation();
+	}
+
+	FdbCApi::FDBFuture* f = api->databaseVerifyBlobRange(
+	    db, keyRange.begin.begin(), keyRange.begin.size(), keyRange.end.begin(), keyRange.end.size(), version);
+
+	return toThreadFuture<Version>(api, f, [](FdbCApi::FDBFuture* f, FdbCApi* api) {
+		Version version = invalidVersion;
+		ASSERT(!api->futureGetInt64(f, &version));
+		return version;
+	});
+}
+
 // DLApi

 // Loads the specified function from a dynamic library
@@ -626,6 +692,8 @@ void DLApi::init() {
 	loadClientFunction(&api->selectApiVersion, lib, fdbCPath, "fdb_select_api_version_impl", headerVersion >= 0);
 	loadClientFunction(&api->getClientVersion, lib, fdbCPath, "fdb_get_client_version", headerVersion >= 410);
+	loadClientFunction(
+	    &api->useFutureProtocolVersion, lib, fdbCPath, "fdb_use_future_protocol_version", headerVersion >= 720);
 	loadClientFunction(&api->setNetworkOption, lib, fdbCPath, "fdb_network_set_option", headerVersion >= 0);
 	loadClientFunction(&api->setupNetwork, lib, fdbCPath, "fdb_setup_network", headerVersion >= 0);
 	loadClientFunction(&api->runNetwork, lib, fdbCPath, "fdb_run_network", headerVersion >= 0);
@@ -668,6 +736,13 @@ void DLApi::init() {
 	                   fdbCPath,
 	                   "fdb_database_wait_purge_granules_complete",
 	                   headerVersion >= 710);
+	loadClientFunction(&api->databaseBlobbifyRange, lib, fdbCPath, "fdb_database_blobbify_range", headerVersion >= 720);
+	loadClientFunction(
+	    &api->databaseUnblobbifyRange, lib, fdbCPath, "fdb_database_unblobbify_range", headerVersion >= 720);
+	loadClientFunction(
+	    &api->databaseListBlobbifiedRanges, lib, fdbCPath, "fdb_database_list_blobbified_ranges", headerVersion >= 720);
+	loadClientFunction(
+	    &api->databaseVerifyBlobRange, lib, fdbCPath, "fdb_database_verify_blob_range", headerVersion >= 720);
 	loadClientFunction(
 	    &api->tenantCreateTransaction, lib, fdbCPath, "fdb_tenant_create_transaction", headerVersion >= 710);
@@ -742,6 +817,7 @@ void DLApi::init() {
 	                   fdbCPath,
 	                   headerVersion >= 620 ? "fdb_future_get_int64" : "fdb_future_get_version",
 	                   headerVersion >= 0);
+	loadClientFunction(&api->futureGetBool, lib, fdbCPath, "fdb_future_get_bool", headerVersion >= 720);
 	loadClientFunction(&api->futureGetUInt64, lib, fdbCPath, "fdb_future_get_uint64", headerVersion >= 700);
 	loadClientFunction(&api->futureGetError, lib, fdbCPath, "fdb_future_get_error", headerVersion >= 0);
 	loadClientFunction(&api->futureGetKey, lib, fdbCPath, "fdb_future_get_key", headerVersion >= 0);
@@ -788,6 +864,14 @@ const char* DLApi::getClientVersion() {
 	return api->getClientVersion();
 }

+void DLApi::useFutureProtocolVersion() {
+	if (!api->useFutureProtocolVersion) {
+		return;
+	}
+	api->useFutureProtocolVersion();
+}
+
 void DLApi::setNetworkOption(FDBNetworkOptions::Option option, Optional<StringRef> value) {
 	throwIfError(api->setNetworkOption(static_cast<FDBNetworkOption>(option),
 	                                   value.present() ? value.get().begin() : nullptr,
@@ -1069,9 +1153,10 @@ ThreadFuture<Standalone<VectorRef<KeyRef>>> MultiVersionTransaction::getRangeSpl
 }

 ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> MultiVersionTransaction::getBlobGranuleRanges(
-    const KeyRangeRef& keyRange) {
+    const KeyRangeRef& keyRange,
+    int rangeLimit) {
 	auto tr = getTransaction();
-	auto f = tr.transaction ? tr.transaction->getBlobGranuleRanges(keyRange)
+	auto f = tr.transaction ? tr.transaction->getBlobGranuleRanges(keyRange, rangeLimit)
 	                        : makeTimeout<Standalone<VectorRef<KeyRangeRef>>>();
 	return abortableFuture(f, tr.onChange);
 }
@@ -1579,6 +1664,32 @@ ThreadFuture<Void> MultiVersionDatabase::waitPurgeGranulesComplete(const KeyRef&
 	return abortableFuture(f, dbState->dbVar->get().onChange);
 }

+ThreadFuture<bool> MultiVersionDatabase::blobbifyRange(const KeyRangeRef& keyRange) {
+	auto dbVar = dbState->dbVar->get();
+	auto f = dbVar.value ? dbVar.value->blobbifyRange(keyRange) : ThreadFuture<bool>(Never());
+	return abortableFuture(f, dbVar.onChange);
+}
+
+ThreadFuture<bool> MultiVersionDatabase::unblobbifyRange(const KeyRangeRef& keyRange) {
+	auto dbVar = dbState->dbVar->get();
+	auto f = dbVar.value ? dbVar.value->unblobbifyRange(keyRange) : ThreadFuture<bool>(Never());
+	return abortableFuture(f, dbVar.onChange);
+}
+
+ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> MultiVersionDatabase::listBlobbifiedRanges(const KeyRangeRef& keyRange,
+                                                                                            int rangeLimit) {
+	auto dbVar = dbState->dbVar->get();
+	auto f = dbVar.value ? dbVar.value->listBlobbifiedRanges(keyRange, rangeLimit)
+	                     : ThreadFuture<Standalone<VectorRef<KeyRangeRef>>>(Never());
+	return abortableFuture(f, dbVar.onChange);
+}
+
+ThreadFuture<Version> MultiVersionDatabase::verifyBlobRange(const KeyRangeRef& keyRange, Optional<Version> version) {
+	auto dbVar = dbState->dbVar->get();
+	auto f = dbVar.value ? dbVar.value->verifyBlobRange(keyRange, version) : ThreadFuture<Version>(Never());
+	return abortableFuture(f, dbVar.onChange);
+}
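
With the DL and multi-version layers both implementing the calls, a holder of an IDatabase can use the blob-management API without caring which client library is active. A hedged sketch (the blocking accessors below are assumptions for brevity; real callers would bridge the ThreadFutures with safeThreadFutureToFuture):

// Sketch only: blockUntilReady/get usage here is illustrative.
void blobbifyViaIDatabase(Reference<IDatabase> db, const KeyRangeRef& range) {
	auto f = db->blobbifyRange(range);
	f.blockUntilReady(); // assumed blocking wait on a ThreadFuture
	if (f.get()) {
		auto v = db->verifyBlobRange(range, Optional<Version>());
		v.blockUntilReady();
		printf("verified at version %lld\n", (long long)v.get());
	}
}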
 // Returns the protocol version reported by the coordinator this client is connected to
 // If an expected version is given, the future won't return until the protocol version is different than expected
 // Note: this will never return if the server is running a protocol from FDB 5.0 or older
@@ -1644,7 +1755,7 @@ ThreadFuture<Void> MultiVersionDatabase::DatabaseState::monitorProtocolVersion()
 		}

 		ProtocolVersion clusterVersion =
-		    !cv.isError() ? cv.get() : self->dbProtocolVersion.orDefault(currentProtocolVersion);
+		    !cv.isError() ? cv.get() : self->dbProtocolVersion.orDefault(currentProtocolVersion());
 		onMainThreadVoid([self, clusterVersion]() { self->protocolVersionChanged(clusterVersion); });
 		return ErrorOr<Void>(Void());
 	});
@@ -1974,6 +2085,10 @@ const char* MultiVersionApi::getClientVersion() {
 	return localClient->api->getClientVersion();
 }

+void MultiVersionApi::useFutureProtocolVersion() {
+	localClient->api->useFutureProtocolVersion();
+}
+
 namespace {

 void validateOption(Optional<StringRef> value, bool canBePresent, bool canBeAbsent, bool canBeEmpty = true) {
@@ -2006,7 +2121,7 @@ void MultiVersionApi::setCallbacksOnExternalThreads() {
 	callbackOnMainThread = false;
 }

-void MultiVersionApi::addExternalLibrary(std::string path) {
+void MultiVersionApi::addExternalLibrary(std::string path, bool useFutureVersion) {
 	std::string filename = basename(path);

 	if (filename.empty() || !fileExists(path)) {
@@ -2023,8 +2138,8 @@ void MultiVersionApi::addExternalLibrary(std::string path) {
 	threadCount = std::max(threadCount, 1);

 	if (externalClientDescriptions.count(filename) == 0) {
-		TraceEvent("AddingExternalClient").detail("LibraryPath", filename);
-		externalClientDescriptions.emplace(std::make_pair(filename, ClientDesc(path, true)));
+		TraceEvent("AddingExternalClient").detail("LibraryPath", filename).detail("UseFutureVersion", useFutureVersion);
+		externalClientDescriptions.emplace(std::make_pair(filename, ClientDesc(path, true, useFutureVersion)));
 	}
 }
@@ -2044,7 +2159,7 @@ void MultiVersionApi::addExternalLibraryDirectory(std::string path) {
 			std::string lib = abspath(joinPath(path, filename));
 			if (externalClientDescriptions.count(filename) == 0) {
 				TraceEvent("AddingExternalClient").detail("LibraryPath", filename);
-				externalClientDescriptions.emplace(std::make_pair(filename, ClientDesc(lib, true)));
+				externalClientDescriptions.emplace(std::make_pair(filename, ClientDesc(lib, true, false)));
 			}
 		}
 	}
@@ -2182,7 +2297,7 @@ void MultiVersionApi::setNetworkOptionInternal(FDBNetworkOptions::Option option,
 		setCallbacksOnExternalThreads();
 	} else if (option == FDBNetworkOptions::EXTERNAL_CLIENT_LIBRARY) {
 		validateOption(value, true, false, false);
-		addExternalLibrary(abspath(value.get().toString()));
+		addExternalLibrary(abspath(value.get().toString()), false);
 	} else if (option == FDBNetworkOptions::EXTERNAL_CLIENT_DIRECTORY) {
 		validateOption(value, true, false, false);
 		addExternalLibraryDirectory(value.get().toString());
@@ -2213,6 +2328,9 @@ void MultiVersionApi::setNetworkOptionInternal(FDBNetworkOptions::Option option,
 	} else if (option == FDBNetworkOptions::CLIENT_TMP_DIR) {
 		validateOption(value, true, false, false);
 		tmpDir = abspath(value.get().toString());
+	} else if (option == FDBNetworkOptions::FUTURE_VERSION_CLIENT_LIBRARY) {
+		validateOption(value, true, false, false);
+		addExternalLibrary(abspath(value.get().toString()), true);
 	} else {
 		forwardOption = true;
 	}
@@ -2251,13 +2369,14 @@ void MultiVersionApi::setupNetwork() {
 		for (auto i : externalClientDescriptions) {
 			std::string path = i.second.libPath;
 			std::string filename = basename(path);
+			bool useFutureVersion = i.second.useFutureVersion;

 			// Copy external lib for each thread
 			if (externalClients.count(filename) == 0) {
 				externalClients[filename] = {};
 				for (const auto& tmp : copyExternalLibraryPerThread(path)) {
 					externalClients[filename].push_back(Reference<ClientInfo>(
-					    new ClientInfo(new DLApi(tmp.first, tmp.second /*unlink on load*/), path)));
+					    new ClientInfo(new DLApi(tmp.first, tmp.second /*unlink on load*/), path, useFutureVersion)));
 				}
 			}
 		}
@@ -2297,6 +2416,9 @@ void MultiVersionApi::setupNetwork() {
 	runOnExternalClientsAllThreads([this](Reference<ClientInfo> client) {
 		TraceEvent("InitializingExternalClient").detail("LibraryPath", client->libPath);
 		client->api->selectApiVersion(apiVersion);
+		if (client->useFutureVersion) {
+			client->api->useFutureProtocolVersion();
+		}
 		client->loadVersion();
 	});
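
End to end, a tester can now exercise the simulated future protocol version either with the new fdbcli --use-future-protocol-version flag or by registering an external client library through the new network option. A hedged C-API sketch of the latter (the option enum name is an assumption derived from FDBNetworkOptions::FUTURE_VERSION_CLIENT_LIBRARY in this diff; the library path is a placeholder):

#include <string.h>
#include "foundationdb/fdb_c.h"

void configureFutureVersionClient(void) {
	// Hypothetical library path; must be an fdb_c build that understands the
	// simulated future protocol version. Enum name assumed, not verified.
	const char* lib = "/usr/lib/libfdb_c_future.so";
	fdb_network_set_option(FDB_NET_OPTION_FUTURE_VERSION_CLIENT_LIBRARY,
	                       (const uint8_t*)lib,
	                       (int)strlen(lib));
}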

View File

@@ -23,6 +23,7 @@
 #include <algorithm>
 #include <cstdio>
 #include <iterator>
+#include <limits>
 #include <memory>
 #include <regex>
 #include <unordered_set>
@@ -1278,32 +1279,6 @@ void DatabaseContext::registerSpecialKeysImpl(SpecialKeySpace::MODULE module,
 ACTOR Future<RangeResult> getWorkerInterfaces(Reference<IClusterConnectionRecord> clusterRecord);
 ACTOR Future<Optional<Value>> getJSON(Database db);

-struct WorkerInterfacesSpecialKeyImpl : SpecialKeyRangeReadImpl {
-	Future<RangeResult> getRange(ReadYourWritesTransaction* ryw,
-	                             KeyRangeRef kr,
-	                             GetRangeLimits limitsHint) const override {
-		if (ryw->getDatabase().getPtr() && ryw->getDatabase()->getConnectionRecord()) {
-			Key prefix = Key(getKeyRange().begin);
-			return map(getWorkerInterfaces(ryw->getDatabase()->getConnectionRecord()),
-			           [prefix = prefix, kr = KeyRange(kr)](const RangeResult& in) {
-				           RangeResult result;
-				           for (const auto& [k_, v] : in) {
-					           auto k = k_.withPrefix(prefix);
-					           if (kr.contains(k))
-						           result.push_back_deep(result.arena(), KeyValueRef(k, v));
-				           }
-
-				           std::sort(result.begin(), result.end(), KeyValueRef::OrderByKey{});
-				           return result;
-			           });
-		} else {
-			return RangeResult();
-		}
-	}
-
-	explicit WorkerInterfacesSpecialKeyImpl(KeyRangeRef kr) : SpecialKeyRangeReadImpl(kr) {}
-};
-
 struct SingleSpecialKeyImpl : SpecialKeyRangeReadImpl {
 	Future<RangeResult> getRange(ReadYourWritesTransaction* ryw,
 	                             KeyRangeRef kr,
@@ -1826,6 +1801,9 @@ DatabaseContext::~DatabaseContext() {
 		it->second->notifyContextDestroyed();
 	ASSERT_ABORT(server_interf.empty());
 	locationCache.insert(allKeys, Reference<LocationInfo>());
+	for (auto& it : notAtLatestChangeFeeds) {
+		it.second->context = nullptr;
+	}
 	TraceEvent("DatabaseContextDestructed", dbId).backtrace();
 }
@@ -2992,11 +2970,9 @@ Future<KeyRangeLocationInfo> getKeyLocation(Reference<TransactionState> trState,
 	                                 isBackward,
 	                                 version);

-	if (trState->tenant().present() && useTenant && trState->tenantId == TenantInfo::INVALID_TENANT) {
+	if (trState->tenant().present() && useTenant && trState->tenantId() == TenantInfo::INVALID_TENANT) {
 		return map(f, [trState](const KeyRangeLocationInfo& locationInfo) {
-			if (trState->tenantId == TenantInfo::INVALID_TENANT) {
-				trState->tenantId = locationInfo.tenantEntry.id;
-			}
+			trState->trySetTenantId(locationInfo.tenantEntry.id);
 			return locationInfo;
 		});
 	} else {
@@ -3134,12 +3110,10 @@ Future<std::vector<KeyRangeLocationInfo>> getKeyRangeLocations(Reference<Transac
 	    trState->useProvisionalProxies,
 	    version);

-	if (trState->tenant().present() && useTenant && trState->tenantId == TenantInfo::INVALID_TENANT) {
+	if (trState->tenant().present() && useTenant && trState->tenantId() == TenantInfo::INVALID_TENANT) {
 		return map(f, [trState](const std::vector<KeyRangeLocationInfo>& locationInfo) {
 			ASSERT(!locationInfo.empty());
-			if (trState->tenantId == TenantInfo::INVALID_TENANT) {
-				trState->tenantId = locationInfo[0].tenantEntry.id;
-			}
+			trState->trySetTenantId(locationInfo[0].tenantEntry.id);
 			return locationInfo;
 		});
 	} else {
@@ -3259,8 +3233,8 @@ TenantInfo TransactionState::getTenantInfo(AllowInvalidTenantID allowInvalidId /
 		}
 	}

-	ASSERT(allowInvalidId || tenantId != TenantInfo::INVALID_TENANT);
-	return TenantInfo(t, authToken, tenantId);
+	ASSERT(allowInvalidId || tenantId_ != TenantInfo::INVALID_TENANT);
+	return TenantInfo(t, authToken, tenantId_);
 }

 // Returns the tenant used in this transaction. If the tenant is unset and raw access isn't specified, then the default
@@ -3288,6 +3262,13 @@ bool TransactionState::hasTenant() const {
 	return tenantSet && tenant_.present();
 }

+Future<Void> TransactionState::handleUnknownTenant() {
+	tenantId_ = TenantInfo::INVALID_TENANT;
+	ASSERT(tenant().present());
+	cx->invalidateCachedTenant(tenant().get());
+	return delay(CLIENT_KNOBS->UNKNOWN_TENANT_RETRY_DELAY, taskID);
+}
+
 Future<Void> Transaction::warmRange(KeyRange keys) {
 	return warmRange_impl(trState, keys, getReadVersion());
 }
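
This consolidates the recovery that getValue, getKey, and the other read paths previously inlined: reset the stale tenant id, invalidate the cached mapping, and back off before the caller retries. A standalone sketch of the same invalidate-then-retry shape (plain C++, illustrative only; none of these names are from the FDB API):

#include <chrono>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>
#include <unordered_map>

// Illustrative analogue of handleUnknownTenant: on a stale-mapping failure,
// drop the cached entry and delay before the next attempt.
long resolveWithRetry(std::unordered_map<std::string, long>& cache,
                      const std::string& name,
                      const std::function<long(const std::string&)>& resolve) {
	for (int attempt = 0; attempt < 5; ++attempt) {
		try {
			return resolve(name);
		} catch (const std::runtime_error&) {
			cache.erase(name); // like invalidateCachedTenant
			std::this_thread::sleep_for(std::chrono::milliseconds(10)); // like UNKNOWN_TENANT_RETRY_DELAY
		}
	}
	throw std::runtime_error("still unknown after retries");
}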
@@ -3406,9 +3387,8 @@ ACTOR Future<Optional<Value>> getValue(Reference<TransactionState> trState,
trState->cx->invalidateCache(locationInfo.tenantEntry.prefix, key);
wait(delay(CLIENT_KNOBS->WRONG_SHARD_SERVER_DELAY, trState->taskID));
} else if (e.code() == error_code_unknown_tenant) {
-ASSERT(useTenant && trState->tenant().present());
-trState->cx->invalidateCachedTenant(trState->tenant().get());
-wait(delay(CLIENT_KNOBS->UNKNOWN_TENANT_RETRY_DELAY, trState->taskID));
+ASSERT(useTenant);
+wait(trState->handleUnknownTenant());
} else {
if (trState->trLogInfo && recordLogInfo)
trState->trLogInfo->addLog(FdbClientLogEvents::EventGetError(startTimeD,
@@ -3517,9 +3497,8 @@ ACTOR Future<Key> getKey(Reference<TransactionState> trState,
wait(delay(CLIENT_KNOBS->WRONG_SHARD_SERVER_DELAY, trState->taskID));
} else if (e.code() == error_code_unknown_tenant) {
-ASSERT(useTenant && trState->tenant().present());
-trState->cx->invalidateCachedTenant(trState->tenant().get());
-wait(delay(CLIENT_KNOBS->UNKNOWN_TENANT_RETRY_DELAY, trState->taskID));
+ASSERT(useTenant);
+wait(trState->handleUnknownTenant());
} else {
TraceEvent(SevInfo, "GetKeyError").error(e).detail("AtKey", k.getKey()).detail("Offset", k.offset);
throw e;
@@ -3530,8 +3509,8 @@ ACTOR Future<Key> getKey(Reference<TransactionState> trState,
ACTOR Future<Version> waitForCommittedVersion(Database cx, Version version, SpanContext spanContext) {
state Span span("NAPI:waitForCommittedVersion"_loc, spanContext);
+try {
loop {
-try {
choose {
when(wait(cx->onProxiesChanged())) {}
when(GetReadVersionReply v = wait(basicLoadBalance(
@@ -3557,12 +3536,18 @@ ACTOR Future<Version> waitForCommittedVersion(Database cx, Version version, Span
wait(delay(CLIENT_KNOBS->FUTURE_VERSION_RETRY_DELAY, cx->taskID));
}
}
+}
} catch (Error& e) {
+if (e.code() == error_code_batch_transaction_throttled ||
+    e.code() == error_code_grv_proxy_memory_limit_exceeded) {
+// GRV Proxy returns an error
+wait(delayJittered(CLIENT_KNOBS->GRV_ERROR_RETRY_DELAY));
+} else {
TraceEvent(SevError, "WaitForCommittedVersionError").error(e);
throw;
}
}
+}
+}
ACTOR Future<Version> getRawVersion(Reference<TransactionState> trState) {
state Span span("NAPI:getRawVersion"_loc, trState->spanContext);
@@ -3753,7 +3738,7 @@ ACTOR Future<Void> sameVersionDiffValue(Database cx, Reference<WatchParameters>
}
// val_3 == val_2 (storage server value matches value passed into the function -> new watch)
-if (valSS == parameters->value && tr.getTransactionState()->tenantId == parameters->tenant.tenantId) {
+if (valSS == parameters->value && tr.getTransactionState()->tenantId() == parameters->tenant.tenantId) {
metadata = makeReference<WatchMetadata>(parameters);
cx->setWatchMetadata(metadata);
@@ -4061,9 +4046,8 @@ Future<RangeResultFamily> getExactRange(Reference<TransactionState> trState,
wait(delay(CLIENT_KNOBS->WRONG_SHARD_SERVER_DELAY, trState->taskID));
break;
} else if (e.code() == error_code_unknown_tenant) {
-ASSERT(useTenant && trState->tenant().present());
-trState->cx->invalidateCachedTenant(trState->tenant().get());
-wait(delay(CLIENT_KNOBS->UNKNOWN_TENANT_RETRY_DELAY, trState->taskID));
+ASSERT(useTenant);
+wait(trState->handleUnknownTenant());
break;
} else {
TraceEvent(SevInfo, "GetExactRangeError")
@@ -4532,9 +4516,8 @@ Future<RangeResultFamily> getRange(Reference<TransactionState> trState,
wait(delay(CLIENT_KNOBS->WRONG_SHARD_SERVER_DELAY, trState->taskID));
} else if (e.code() == error_code_unknown_tenant) {
-ASSERT(useTenant && trState->tenant().present());
-trState->cx->invalidateCachedTenant(trState->tenant().get());
-wait(delay(CLIENT_KNOBS->UNKNOWN_TENANT_RETRY_DELAY, trState->taskID));
+ASSERT(useTenant);
+wait(trState->handleUnknownTenant());
} else {
if (trState->trLogInfo)
trState->trLogInfo->addLog(
@@ -4993,9 +4976,7 @@ ACTOR Future<Void> getRangeStreamFragment(Reference<TransactionState> trState,
wait(delay(CLIENT_KNOBS->WRONG_SHARD_SERVER_DELAY, trState->taskID));
break;
} else if (e.code() == error_code_unknown_tenant) {
-ASSERT(trState->tenant().present());
-trState->cx->invalidateCachedTenant(trState->tenant().get());
-wait(delay(CLIENT_KNOBS->UNKNOWN_TENANT_RETRY_DELAY, trState->taskID));
+wait(trState->handleUnknownTenant());
break;
} else {
results->sendError(e);
@@ -5278,7 +5259,7 @@ ACTOR Future<TenantInfo> getTenantMetadata(Reference<TransactionState> trState,
Future<TenantInfo> populateAndGetTenant(Reference<TransactionState> trState, Key const& key, Version version) {
if (!trState->tenant().present() || key == metadataVersionKey) {
return TenantInfo();
-} else if (trState->tenantId != TenantInfo::INVALID_TENANT) {
+} else if (trState->tenantId() != TenantInfo::INVALID_TENANT) {
return trState->getTenantInfo();
} else {
return getTenantMetadata(trState, key, version);
@@ -5772,7 +5753,9 @@ double Transaction::getBackoff(int errCode) {
returnedBackoff *= deterministicRandom()->random01();
// Set backoff for next time
-if (errCode == error_code_proxy_memory_limit_exceeded) {
+if (errCode == error_code_commit_proxy_memory_limit_exceeded ||
+    errCode == error_code_grv_proxy_memory_limit_exceeded) {
backoff = std::min(backoff * CLIENT_KNOBS->BACKOFF_GROWTH_RATE, CLIENT_KNOBS->RESOURCE_CONSTRAINED_MAX_BACKOFF);
} else {
backoff = std::min(backoff * CLIENT_KNOBS->BACKOFF_GROWTH_RATE, trState->options.maxBackoff);
@@ -5978,7 +5961,7 @@ ACTOR static Future<Void> commitDummyTransaction(Reference<TransactionState> trS
tr.trState->options = trState->options;
tr.trState->taskID = trState->taskID;
tr.trState->authToken = trState->authToken;
-tr.trState->tenantId = trState->tenantId;
+tr.trState->trySetTenantId(trState->tenantId());
if (!trState->hasTenant()) {
tr.setOption(FDBTransactionOptions::RAW_ACCESS);
} else {
@@ -6156,8 +6139,11 @@ ACTOR static Future<Void> tryCommit(Reference<TransactionState> trState,
}
try {
if (CLIENT_BUGGIFY) {
-throw deterministicRandom()->randomChoice(std::vector<Error>{
-    not_committed(), transaction_too_old(), proxy_memory_limit_exceeded(), commit_unknown_result() });
+throw deterministicRandom()->randomChoice(std::vector<Error>{ not_committed(),
+                                                              transaction_too_old(),
+                                                              commit_proxy_memory_limit_exceeded(),
+                                                              grv_proxy_memory_limit_exceeded(),
+                                                              commit_unknown_result() });
}
if (req.tagSet.present() && trState->options.priority < TransactionPriority::IMMEDIATE) {
@@ -6316,12 +6302,15 @@ ACTOR static Future<Void> tryCommit(Reference<TransactionState> trState,
// retry it anyway (relying on transaction idempotence) but a client might do something else.
throw commit_unknown_result();
} else if (e.code() == error_code_unknown_tenant) {
+// Rather than reset the tenant and retry just the commit, we need to throw this error to the user and let
+// them retry the whole transaction
ASSERT(trState->tenant().present());
trState->cx->invalidateCachedTenant(trState->tenant().get());
throw;
} else {
if (e.code() != error_code_transaction_too_old && e.code() != error_code_not_committed &&
-    e.code() != error_code_database_locked && e.code() != error_code_proxy_memory_limit_exceeded &&
+    e.code() != error_code_database_locked && e.code() != error_code_commit_proxy_memory_limit_exceeded &&
+    e.code() != error_code_grv_proxy_memory_limit_exceeded &&
    e.code() != error_code_batch_transaction_throttled && e.code() != error_code_tag_throttled &&
    e.code() != error_code_process_behind && e.code() != error_code_future_version &&
    e.code() != error_code_tenant_not_found) {
@@ -6765,9 +6754,12 @@ ACTOR Future<GetReadVersionReply> getConsistentReadVersion(SpanContext parentSpa
}
}
} catch (Error& e) {
-if (e.code() != error_code_broken_promise && e.code() != error_code_batch_transaction_throttled)
+if (e.code() != error_code_broken_promise && e.code() != error_code_batch_transaction_throttled &&
+    e.code() != error_code_grv_proxy_memory_limit_exceeded)
TraceEvent(SevError, "GetConsistentReadVersionError").error(e);
-if (e.code() == error_code_batch_transaction_throttled && !cx->apiVersionAtLeast(630)) {
+if ((e.code() == error_code_batch_transaction_throttled ||
+     e.code() == error_code_grv_proxy_memory_limit_exceeded) &&
+    !cx->apiVersionAtLeast(630)) {
wait(delayJittered(5.0));
} else {
throw;
@@ -7211,14 +7203,15 @@ Future<Void> Transaction::onError(Error const& e) {
return client_invalid_operation();
}
if (e.code() == error_code_not_committed || e.code() == error_code_commit_unknown_result ||
-    e.code() == error_code_database_locked || e.code() == error_code_proxy_memory_limit_exceeded ||
-    e.code() == error_code_process_behind || e.code() == error_code_batch_transaction_throttled ||
-    e.code() == error_code_tag_throttled) {
+    e.code() == error_code_database_locked || e.code() == error_code_commit_proxy_memory_limit_exceeded ||
+    e.code() == error_code_grv_proxy_memory_limit_exceeded || e.code() == error_code_process_behind ||
+    e.code() == error_code_batch_transaction_throttled || e.code() == error_code_tag_throttled) {
if (e.code() == error_code_not_committed)
++trState->cx->transactionsNotCommitted;
else if (e.code() == error_code_commit_unknown_result)
++trState->cx->transactionsMaybeCommitted;
-else if (e.code() == error_code_proxy_memory_limit_exceeded)
+else if (e.code() == error_code_commit_proxy_memory_limit_exceeded ||
+         e.code() == error_code_grv_proxy_memory_limit_exceeded)
++trState->cx->transactionsResourceConstrained;
else if (e.code() == error_code_process_behind)
++trState->cx->transactionsProcessBehind;
@@ -7606,9 +7599,7 @@ ACTOR Future<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(Reference<Transa
trState->cx->invalidateCache(locations[0].tenantEntry.prefix, keys);
wait(delay(CLIENT_KNOBS->WRONG_SHARD_SERVER_DELAY, TaskPriority::DataDistribution));
} else if (e.code() == error_code_unknown_tenant) {
-ASSERT(trState->tenant().present());
-trState->cx->invalidateCachedTenant(trState->tenant().get());
-wait(delay(CLIENT_KNOBS->UNKNOWN_TENANT_RETRY_DELAY, trState->taskID));
+wait(trState->handleUnknownTenant());
} else {
TraceEvent(SevError, "GetRangeSplitPoints").error(e);
throw;
@@ -7630,14 +7621,10 @@ ACTOR Future<TenantMapEntry> blobGranuleGetTenantEntry(Transaction* self, Key ra
self->trState->useProvisionalProxies,
Reverse::False,
latestVersion));
-if (self->trState->tenantId == TenantInfo::INVALID_TENANT) {
-self->trState->tenantId = l.tenantEntry.id;
-}
+self->trState->trySetTenantId(l.tenantEntry.id);
return l.tenantEntry;
} else {
-if (self->trState->tenantId == TenantInfo::INVALID_TENANT) {
-self->trState->tenantId = cachedLocationInfo.get().tenantEntry.id;
-}
+self->trState->trySetTenantId(cachedLocationInfo.get().tenantEntry.id);
return cachedLocationInfo.get().tenantEntry;
}
}
@@ -7651,7 +7638,9 @@ Future<Standalone<VectorRef<KeyRef>>> Transaction::getRangeSplitPoints(KeyRange
// the blob granule requests are a bit funky because they piggyback off the existing transaction to read from the system
// keyspace
-ACTOR Future<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRangesActor(Transaction* self, KeyRange keyRange) {
+ACTOR Future<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRangesActor(Transaction* self,
+                                                                           KeyRange keyRange,
+                                                                           int rangeLimit) {
// FIXME: use streaming range read
state KeyRange currentRange = keyRange;
state Standalone<VectorRef<KeyRangeRef>> results;
@@ -7674,7 +7663,7 @@ ACTOR Future<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRangesActor(Trans
// basically krmGetRange, but enable it to not use tenant without RAW_ACCESS by doing manual getRange with
// UseTenant::False
-GetRangeLimits limits(1000);
+GetRangeLimits limits(2 * rangeLimit + 2);
limits.minRows = 2;
RangeResult rawMapping = wait(getRange(self->trState,
self->getReadVersion(),
@@ -7696,6 +7685,9 @@ ACTOR Future<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRangesActor(Trans
if (blobGranuleMapping[i].value.size()) {
results.push_back(results.arena(),
                  KeyRangeRef(blobGranuleMapping[i].key, blobGranuleMapping[i + 1].key));
+if (results.size() == rangeLimit) {
+return results;
+}
}
}
results.arena().dependsOn(blobGranuleMapping.arena());
@@ -7707,8 +7699,8 @@ ACTOR Future<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRangesActor(Trans
}
}
-Future<Standalone<VectorRef<KeyRangeRef>>> Transaction::getBlobGranuleRanges(const KeyRange& range) {
-return ::getBlobGranuleRangesActor(this, range);
+Future<Standalone<VectorRef<KeyRangeRef>>> Transaction::getBlobGranuleRanges(const KeyRange& range, int rangeLimit) {
+return ::getBlobGranuleRangesActor(this, range, rangeLimit);
}
// hack (for now) to get blob worker interface into load balance
@@ -8020,6 +8012,71 @@ ACTOR Future<Version> setPerpetualStorageWiggle(Database cx, bool enable, LockAw
return version;
}
ACTOR Future<Version> checkBlobSubrange(Database db, KeyRange keyRange, Optional<Version> version) {
state Transaction tr(db);
state Version readVersionOut = invalidVersion;
loop {
try {
wait(success(tr.readBlobGranules(keyRange, 0, version, &readVersionOut)));
return readVersionOut;
} catch (Error& e) {
wait(tr.onError(e));
}
}
}
ACTOR Future<Version> verifyBlobRangeActor(Reference<DatabaseContext> cx, KeyRange range, Optional<Version> version) {
state Database db(cx);
state Transaction tr(db);
state Standalone<VectorRef<KeyRangeRef>> allRanges;
state KeyRange curRegion = KeyRangeRef(range.begin, range.begin);
state Version readVersionOut = invalidVersion;
state int batchSize = CLIENT_KNOBS->BG_TOO_MANY_GRANULES / 2;
loop {
try {
wait(store(allRanges, tr.getBlobGranuleRanges(KeyRangeRef(curRegion.begin, range.end), 20 * batchSize)));
} catch (Error& e) {
wait(tr.onError(e));
}
if (allRanges.empty()) {
if (curRegion.begin < range.end) {
return invalidVersion;
}
return readVersionOut;
}
state std::vector<Future<Version>> checkParts;
// Chunk up to smaller ranges than this limit. Must be smaller than BG_TOO_MANY_GRANULES to not hit the limit
int batchCount = 0;
for (auto& it : allRanges) {
if (it.begin != curRegion.end) {
return invalidVersion;
}
curRegion = KeyRangeRef(curRegion.begin, it.end);
batchCount++;
if (batchCount == batchSize) {
checkParts.push_back(checkBlobSubrange(db, curRegion, version));
batchCount = 0;
curRegion = KeyRangeRef(curRegion.end, curRegion.end);
}
}
if (!curRegion.empty()) {
checkParts.push_back(checkBlobSubrange(db, curRegion, version));
}
wait(waitForAll(checkParts));
readVersionOut = checkParts.back().get();
curRegion = KeyRangeRef(curRegion.end, curRegion.end);
}
}
Future<Version> DatabaseContext::verifyBlobRange(const KeyRange& range, Optional<Version> version) {
return verifyBlobRangeActor(Reference<DatabaseContext>::addRef(this), range, version);
}
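Note: verifyBlobRangeActor above chunks the blobbified ranges into batches of BG_TOO_MANY_GRANULES / 2 granule ranges, issues a parallel checkBlobSubrange() read per batch, and returns invalidVersion as soon as a gap in granule coverage is found. A hedged usage sketch (the wrapper actor and the range literals are assumptions, not part of this commit):

    ACTOR Future<Void> exampleVerify(Database db) {
        state KeyRange range = KeyRangeRef("a"_sr, "b"_sr);
        // An empty Optional<Version> means "verify at the latest granule version".
        Version v = wait(db->verifyBlobRange(range, Optional<Version>()));
        if (v == invalidVersion) {
            // some subrange had no readable granule
            TraceEvent(SevWarn, "ExampleBlobRangeNotReadable")
                .detail("Begin", range.begin)
                .detail("End", range.end);
        }
        return Void();
    }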
ACTOR Future<std::vector<std::pair<UID, StorageWiggleValue>>> readStorageWiggleValues(Database cx,
                                                                                      bool primary,
                                                                                      bool use_system_priority) {
@@ -8649,7 +8706,7 @@ Future<DatabaseSharedState*> DatabaseContext::initSharedState() {
}
void DatabaseContext::setSharedState(DatabaseSharedState* p) {
-ASSERT(p->protocolVersion == currentProtocolVersion);
+ASSERT(p->protocolVersion == currentProtocolVersion());
sharedStatePtr = p;
sharedStatePtr->refCount++;
}
@@ -8705,6 +8762,39 @@ Reference<ChangeFeedStorageData> DatabaseContext::getStorageData(StorageServerIn
return it->second;
}
Version DatabaseContext::getMinimumChangeFeedVersion() {
Version minVersion = std::numeric_limits<Version>::max();
for (auto& it : changeFeedUpdaters) {
minVersion = std::min(minVersion, it.second->version.get());
}
for (auto& it : notAtLatestChangeFeeds) {
if (it.second->getVersion() > 0) {
minVersion = std::min(minVersion, it.second->getVersion());
}
}
return minVersion;
}
void DatabaseContext::setDesiredChangeFeedVersion(Version v) {
for (auto& it : changeFeedUpdaters) {
if (it.second->version.get() < v && it.second->desired.get() < v) {
it.second->desired.set(v);
}
}
}
ChangeFeedData::ChangeFeedData(DatabaseContext* context)
: dbgid(deterministicRandom()->randomUniqueID()), context(context), notAtLatest(1) {
if (context) {
context->notAtLatestChangeFeeds[dbgid] = this;
}
}
ChangeFeedData::~ChangeFeedData() {
if (context) {
context->notAtLatestChangeFeeds.erase(dbgid);
}
}
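Note: the constructor/destructor pair above keeps a ChangeFeedData registered in its database's notAtLatestChangeFeeds map exactly while the feed exists and may be lagging; the stream actors changed below erase the entry once a feed reports atLatestVersion and re-insert it when the feed falls behind again. The lifecycle, as a comment-only sketch (no new API assumed):

    // ChangeFeedData(context)         -> context->notAtLatestChangeFeeds[dbgid] = this
    // stream reports atLatestVersion  -> context->notAtLatestChangeFeeds.erase(dbgid)
    // stream falls behind again       -> re-inserted by the stream actors below
    // ~ChangeFeedData()               -> erased unconditionally
    // getMinimumChangeFeedVersion() therefore only folds in feeds that still lag.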
Version ChangeFeedData::getVersion() {
return lastReturnedVersion.get();
}
@@ -8896,6 +8986,9 @@ ACTOR Future<Void> partialChangeFeedStream(StorageServerInterface interf,
if (refresh.canBeSet() && !atLatestVersion && rep.atLatestVersion) {
atLatestVersion = true;
feedData->notAtLatest.set(feedData->notAtLatest.get() - 1);
+if (feedData->notAtLatest.get() == 0 && feedData->context) {
+feedData->context->notAtLatestChangeFeeds.erase(feedData->dbgid);
+}
}
if (refresh.canBeSet() && rep.minStreamVersion > storageData->version.get()) {
storageData->version.set(rep.minStreamVersion);
@@ -9099,6 +9192,9 @@ ACTOR Future<Void> mergeChangeFeedStream(Reference<DatabaseContext> db,
results->storageData.push_back(db->getStorageData(interfs[i].first));
}
results->notAtLatest.set(interfs.size());
+if (results->context) {
+results->context->notAtLatestChangeFeeds[results->dbgid] = results.getPtr();
+}
refresh.send(Void());
for (int i = 0; i < interfs.size(); i++) {
@@ -9187,10 +9283,21 @@ ACTOR Future<Void> singleChangeFeedStreamInternal(KeyRange range,
// update lastReturned once the previous mutation has been consumed
if (*begin - 1 > results->lastReturnedVersion.get()) {
results->lastReturnedVersion.set(*begin - 1);
+if (!refresh.canBeSet()) {
+try {
+// refresh is set if and only if this actor is cancelled
+wait(Future<Void>(Void()));
+// Catch any unexpected behavior if the above contract is broken
+ASSERT(false);
+} catch (Error& e) {
+ASSERT(e.code() == error_code_actor_cancelled);
+throw;
+}
+}
}
loop {
+ASSERT(refresh.canBeSet());
state ChangeFeedStreamReply feedReply = waitNext(results->streams[0].getFuture());
*begin = feedReply.mutations.back().version + 1;
@@ -9240,6 +9347,9 @@ ACTOR Future<Void> singleChangeFeedStreamInternal(KeyRange range,
if (!atLatest && feedReply.atLatestVersion) {
atLatest = true;
results->notAtLatest.set(0);
+if (results->context) {
+results->context->notAtLatestChangeFeeds.erase(results->dbgid);
+}
}
if (feedReply.minStreamVersion > results->storageData[0]->version.get()) {
@@ -9291,6 +9401,9 @@ ACTOR Future<Void> singleChangeFeedStream(Reference<DatabaseContext> db,
Promise<Void> refresh = results->refresh;
results->refresh = Promise<Void>();
results->notAtLatest.set(1);
+if (results->context) {
+results->context->notAtLatestChangeFeeds[results->dbgid] = results.getPtr();
+}
refresh.send(Void());
wait(results->streams[0].onError() || singleChangeFeedStreamInternal(range, results, rangeID, begin, end));
@@ -9417,6 +9530,9 @@ ACTOR Future<Void> getChangeFeedStreamActor(Reference<DatabaseContext> db,
}
if (results->notAtLatest.get() == 0) {
results->notAtLatest.set(1);
+if (results->context) {
+results->context->notAtLatestChangeFeeds[results->dbgid] = results.getPtr();
+}
}
if (e.code() == error_code_wrong_shard_server || e.code() == error_code_all_alternatives_failed ||
@@ -9670,6 +9786,7 @@ Reference<DatabaseContext::TransactionT> DatabaseContext::createTransaction() {
return makeReference<ReadYourWritesTransaction>(Database(Reference<DatabaseContext>::addRef(this)));
}
+// BlobGranule API.
ACTOR Future<Key> purgeBlobGranulesActor(Reference<DatabaseContext> db,
                                         KeyRange range,
                                         Version purgeVersion,
@@ -9681,11 +9798,6 @@ ACTOR Future<Key> purgeBlobGranulesActor(Reference<DatabaseContext> db,
state KeyRange purgeRange = range;
state bool loadedTenantPrefix = false;
-// FIXME: implement force
-if (force) {
-throw unsupported_operation();
-}
loop {
try {
tr.setOption(FDBTransactionOptions::ACCESS_SYSTEM_KEYS);
@@ -9766,6 +9878,89 @@ Future<Void> DatabaseContext::waitPurgeGranulesComplete(Key purgeKey) {
return waitPurgeGranulesCompleteActor(Reference<DatabaseContext>::addRef(this), purgeKey);
}
ACTOR Future<bool> setBlobRangeActor(Reference<DatabaseContext> cx, KeyRange range, bool active) {
state Database db(cx);
state Reference<ReadYourWritesTransaction> tr = makeReference<ReadYourWritesTransaction>(db);
state Value value = active ? blobRangeActive : blobRangeInactive;
loop {
try {
tr->setOption(FDBTransactionOptions::ACCESS_SYSTEM_KEYS);
tr->setOption(FDBTransactionOptions::PRIORITY_SYSTEM_IMMEDIATE);
if (active) {
state RangeResult results = wait(krmGetRanges(tr, blobRangeKeys.begin, range));
ASSERT(results.size() >= 2);
if (results[0].key == range.begin && results[1].key == range.end &&
results[0].value == blobRangeActive) {
return true;
} else {
for (int i = 0; i < results.size(); i++) {
if (results[i].value == blobRangeActive) {
return false;
}
}
}
}
tr->set(blobRangeChangeKey, deterministicRandom()->randomUniqueID().toString());
// This is not coalescing because we want to keep each range logically separate.
wait(krmSetRange(tr, blobRangeKeys.begin, range, value));
wait(tr->commit());
printf("Successfully updated blob range [%s - %s) to %s\n",
range.begin.printable().c_str(),
range.end.printable().c_str(),
value.printable().c_str());
return true;
} catch (Error& e) {
wait(tr->onError(e));
}
}
}
Future<bool> DatabaseContext::blobbifyRange(KeyRange range) {
return setBlobRangeActor(Reference<DatabaseContext>::addRef(this), range, true);
}
Future<bool> DatabaseContext::unblobbifyRange(KeyRange range) {
return setBlobRangeActor(Reference<DatabaseContext>::addRef(this), range, false);
}
ACTOR Future<Standalone<VectorRef<KeyRangeRef>>> listBlobbifiedRangesActor(Reference<DatabaseContext> cx,
KeyRange range,
int rangeLimit) {
state Database db(cx);
state Reference<ReadYourWritesTransaction> tr = makeReference<ReadYourWritesTransaction>(db);
state Standalone<VectorRef<KeyRangeRef>> blobRanges;
loop {
try {
tr->setOption(FDBTransactionOptions::ACCESS_SYSTEM_KEYS);
state RangeResult results = wait(krmGetRanges(tr, blobRangeKeys.begin, range, 2 * rangeLimit + 2));
blobRanges.arena().dependsOn(results.arena());
for (int i = 0; i < results.size() - 1; i++) {
if (results[i].value == LiteralStringRef("1")) {
blobRanges.push_back(blobRanges.arena(), KeyRangeRef(results[i].key, results[i + 1].key));
}
if (blobRanges.size() == rangeLimit) {
return blobRanges;
}
}
return blobRanges;
} catch (Error& e) {
wait(tr->onError(e));
}
}
}
Future<Standalone<VectorRef<KeyRangeRef>>> DatabaseContext::listBlobbifiedRanges(KeyRange range, int rowLimit) {
return listBlobbifiedRangesActor(Reference<DatabaseContext>::addRef(this), range, rowLimit);
}
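Note: a hedged end-to-end sequence for the new management entry points (the wrapper actor and range literals are assumptions; semantics follow the actors above):

    ACTOR Future<Void> exampleBlobbify(Database db) {
        state KeyRange range = KeyRangeRef("a"_sr, "b"_sr);
        // Per setBlobRangeActor: returns false if part of the range is
        // already marked active under a different boundary.
        bool ok = wait(db->blobbifyRange(range));
        if (ok) {
            Standalone<VectorRef<KeyRangeRef>> ranges =
                wait(db->listBlobbifiedRanges(range, 100)); // at most 100 ranges back
            TraceEvent("ExampleBlobbified").detail("NumRanges", ranges.size());
        }
        return Void();
    }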
int64_t getMaxKeySize(KeyRef const& key) {
return getMaxWriteKeySize(key, true);
}

View File

@@ -1783,7 +1783,8 @@ Future<Standalone<VectorRef<KeyRef>>> ReadYourWritesTransaction::getRangeSplitPo
return waitOrError(tr.getRangeSplitPoints(range, chunkSize), resetPromise.getFuture());
}
-Future<Standalone<VectorRef<KeyRangeRef>>> ReadYourWritesTransaction::getBlobGranuleRanges(const KeyRange& range) {
+Future<Standalone<VectorRef<KeyRangeRef>>> ReadYourWritesTransaction::getBlobGranuleRanges(const KeyRange& range,
+                                                                                           int rangeLimit) {
if (checkUsedDuringCommit()) {
return used_during_commit();
}
@@ -1794,7 +1795,7 @@ Future<Standalone<VectorRef<KeyRangeRef>>> ReadYourWritesTransaction::getBlobGra
if (range.begin > maxKey || range.end > maxKey)
return key_outside_legal_range();
-return waitOrError(tr.getBlobGranuleRanges(range), resetPromise.getFuture());
+return waitOrError(tr.getBlobGranuleRanges(range, rangeLimit), resetPromise.getFuture());
}
Future<Standalone<VectorRef<BlobGranuleChunkRef>>> ReadYourWritesTransaction::readBlobGranules(

View File

@@ -427,7 +427,9 @@ const KeyRef JSONSchemas::statusSchema = LiteralStringRef(R"statusSchema(
"log_server_min_free_space",
"log_server_min_free_space_ratio",
"storage_server_durability_lag",
-"storage_server_list_fetch_failed"
+"storage_server_list_fetch_failed",
+"blob_worker_lag",
+"blob_worker_missing"
]
},
"description":"The database is not being saturated by the workload."
@@ -448,7 +450,9 @@ const KeyRef JSONSchemas::statusSchema = LiteralStringRef(R"statusSchema(
"log_server_min_free_space",
"log_server_min_free_space_ratio",
"storage_server_durability_lag",
-"storage_server_list_fetch_failed"
+"storage_server_list_fetch_failed",
+"blob_worker_lag",
+"blob_worker_missing"
]
},
"description":"The database is not being saturated by the workload."

View File

@@ -50,7 +50,6 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
// TLogs
init( TLOG_TIMEOUT, 0.4 ); //cannot buggify because of availability
init( TLOG_SLOW_REJOIN_WARN_TIMEOUT_SECS, 60 ); if( randomize && BUGGIFY ) TLOG_SLOW_REJOIN_WARN_TIMEOUT_SECS = deterministicRandom()->randomInt(5,10);
-init( RECOVERY_TLOG_SMART_QUORUM_DELAY, 0.25 ); if( randomize && BUGGIFY ) RECOVERY_TLOG_SMART_QUORUM_DELAY = 0.0; // smaller might be better for bug amplification
init( TLOG_STORAGE_MIN_UPDATE_INTERVAL, 0.5 );
init( BUGGIFY_TLOG_STORAGE_MIN_UPDATE_INTERVAL, 30 );
init( DESIRED_TOTAL_BYTES, 150000 ); if( randomize && BUGGIFY ) DESIRED_TOTAL_BYTES = 10000;
@@ -58,10 +57,6 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( UPDATE_DELAY, 0.001 );
init( MAXIMUM_PEEK_BYTES, 10e6 );
init( APPLY_MUTATION_BYTES, 1e6 );
-init( RECOVERY_DATA_BYTE_LIMIT, 100000 );
-init( BUGGIFY_RECOVERY_DATA_LIMIT, 1000 );
-init( LONG_TLOG_COMMIT_TIME, 0.25 ); //cannot buggify because of recovery time
-init( LARGE_TLOG_COMMIT_BYTES, 4<<20 );
init( BUGGIFY_RECOVER_MEMORY_LIMIT, 1e6 );
init( BUGGIFY_WORKER_REMOVED_MAX_LAG, 30 );
init( UPDATE_STORAGE_BYTE_LIMIT, 1e6 );
@@ -133,16 +128,15 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( BG_REBALANCE_POLLING_INTERVAL, 10.0 );
init( BG_REBALANCE_SWITCH_CHECK_INTERVAL, 5.0 ); if (randomize && BUGGIFY) BG_REBALANCE_SWITCH_CHECK_INTERVAL = 1.0;
init( DD_QUEUE_LOGGING_INTERVAL, 5.0 );
+init( DD_QUEUE_COUNTER_REFRESH_INTERVAL, 60.0 );
+// 100 / 60 < 2 trace/sec ~ 2 * 200 = 400b/sec
+init( DD_QUEUE_COUNTER_MAX_LOG, 100 ); if( randomize && BUGGIFY ) DD_QUEUE_COUNTER_MAX_LOG = 1;
+init( DD_QUEUE_COUNTER_SUMMARIZE, true );
init( RELOCATION_PARALLELISM_PER_SOURCE_SERVER, 2 ); if( randomize && BUGGIFY ) RELOCATION_PARALLELISM_PER_SOURCE_SERVER = 1;
init( RELOCATION_PARALLELISM_PER_DEST_SERVER, 10 ); if( randomize && BUGGIFY ) RELOCATION_PARALLELISM_PER_DEST_SERVER = 1; // Note: if this is smaller than FETCH_KEYS_PARALLELISM, this will artificially reduce performance. The current default of 10 is probably too high but is set conservatively for now.
init( DD_QUEUE_MAX_KEY_SERVERS, 100 ); if( randomize && BUGGIFY ) DD_QUEUE_MAX_KEY_SERVERS = 1;
init( DD_REBALANCE_PARALLELISM, 50 );
init( DD_REBALANCE_RESET_AMOUNT, 30 );
-init( BG_DD_MAX_WAIT, 120.0 );
-init( BG_DD_MIN_WAIT, 0.1 );
-init( BG_DD_INCREASE_RATE, 1.10 );
-init( BG_DD_DECREASE_RATE, 1.02 );
-init( BG_DD_SATURATION_DELAY, 1.0 );
init( INFLIGHT_PENALTY_HEALTHY, 1.0 );
init( INFLIGHT_PENALTY_UNHEALTHY, 500.0 );
init( INFLIGHT_PENALTY_ONE_LEFT, 1000.0 );
@@ -250,7 +244,6 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( SERVER_LIST_DELAY, 1.0 );
init( RECRUITMENT_IDLE_DELAY, 1.0 );
init( STORAGE_RECRUITMENT_DELAY, 10.0 );
-init( BLOB_WORKER_RECRUITMENT_DELAY, 10.0 );
init( TSS_HACK_IDENTITY_MAPPING, false ); // THIS SHOULD NEVER BE SET IN PROD. Only for performance testing
init( TSS_RECRUITMENT_TIMEOUT, 3*STORAGE_RECRUITMENT_DELAY ); if (randomize && BUGGIFY ) TSS_RECRUITMENT_TIMEOUT = 1.0; // Super low timeout should cause tss recruitments to fail
init( TSS_DD_CHECK_INTERVAL, 60.0 ); if (randomize && BUGGIFY ) TSS_DD_CHECK_INTERVAL = 1.0; // May kill all TSS quickly
@@ -292,6 +285,7 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( DD_TEAM_ZERO_SERVER_LEFT_LOG_DELAY, 120 ); if( randomize && BUGGIFY ) DD_TEAM_ZERO_SERVER_LEFT_LOG_DELAY = 5;
init( DD_STORAGE_WIGGLE_PAUSE_THRESHOLD, 10 ); if( randomize && BUGGIFY ) DD_STORAGE_WIGGLE_PAUSE_THRESHOLD = 1000;
init( DD_STORAGE_WIGGLE_STUCK_THRESHOLD, 20 );
+init( DD_STORAGE_WIGGLE_MIN_SS_AGE_SEC, isSimulated ? 2 : 21 * 60 * 60 * 24 ); if(randomize && BUGGIFY) DD_STORAGE_WIGGLE_MIN_SS_AGE_SEC = isSimulated ? 0: 120;
init( DD_TENANT_AWARENESS_ENABLED, false );
init( TENANT_CACHE_LIST_REFRESH_INTERVAL, 2.0 );
@@ -400,6 +394,7 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( ROCKSDB_PERFCONTEXT_ENABLE, false ); if( randomize && BUGGIFY ) ROCKSDB_PERFCONTEXT_ENABLE = deterministicRandom()->coinflip() ? false : true;
init( ROCKSDB_PERFCONTEXT_SAMPLE_RATE, 0.0001 );
+init( ROCKSDB_METRICS_SAMPLE_INTERVAL, 0.0);
init( ROCKSDB_MAX_SUBCOMPACTIONS, 2 );
init( ROCKSDB_SOFT_PENDING_COMPACT_BYTES_LIMIT, 64000000000 ); // 64GB, Rocksdb option, Writes will slow down.
init( ROCKSDB_HARD_PENDING_COMPACT_BYTES_LIMIT, 100000000000 ); // 100GB, Rocksdb option, Writes will stall.
@@ -412,6 +407,10 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( ROCKSDB_COMPACTION_READAHEAD_SIZE, 32768 ); // 32 KB, performs bigger reads when doing compaction.
init( ROCKSDB_BLOCK_SIZE, 32768 ); // 32 KB, size of the block in rocksdb cache.
init( ENABLE_SHARDED_ROCKSDB, false );
+init( ROCKSDB_WRITE_BUFFER_SIZE, 1 << 30 ); // 1G
+init( ROCKSDB_MAX_TOTAL_WAL_SIZE, 0 ); // RocksDB default.
+init( ROCKSDB_MAX_BACKGROUND_JOBS, 2 ); // RocksDB default.
+init( ROCKSDB_DELETE_OBSOLETE_FILE_PERIOD, 21600 ); // 6h, RocksDB default.
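Note: for orientation, a hedged sketch of how these four knobs would typically be applied to rocksdb::Options (the mapping below is an assumption for illustration; the authoritative wiring lives in the RocksDB key-value store implementation, not in this commit):

    #include <rocksdb/options.h>

    rocksdb::Options exampleOptions(const ServerKnobs& knobs) {
        rocksdb::Options options;
        options.write_buffer_size = knobs.ROCKSDB_WRITE_BUFFER_SIZE;   // bytes per memtable
        options.max_total_wal_size = knobs.ROCKSDB_MAX_TOTAL_WAL_SIZE; // 0 keeps RocksDB's default
        options.max_background_jobs = knobs.ROCKSDB_MAX_BACKGROUND_JOBS;
        // the knob is in seconds; RocksDB expects microseconds
        options.delete_obsolete_files_period_micros = knobs.ROCKSDB_DELETE_OBSOLETE_FILE_PERIOD * 1000000;
        return options;
    }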
// Leader election
bool longLeaderElection = randomize && BUGGIFY;
@@ -613,6 +612,8 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( RATEKEEPER_DEFAULT_LIMIT, 1e6 ); if( randomize && BUGGIFY ) RATEKEEPER_DEFAULT_LIMIT = 0;
init( RATEKEEPER_LIMIT_REASON_SAMPLE_RATE, 0.1 );
init( RATEKEEPER_PRINT_LIMIT_REASON, false ); if( randomize && BUGGIFY ) RATEKEEPER_PRINT_LIMIT_REASON = true;
+init( RATEKEEPER_MIN_RATE, 0.0 );
+init( RATEKEEPER_MAX_RATE, 1e9 );
bool smallStorageTarget = randomize && BUGGIFY;
init( TARGET_BYTES_PER_STORAGE_SERVER, 1000e6 ); if( smallStorageTarget ) TARGET_BYTES_PER_STORAGE_SERVER = 3000e3;
@@ -662,6 +663,16 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( DURABILITY_LAG_REDUCTION_RATE, 0.9999 );
init( DURABILITY_LAG_INCREASE_RATE, 1.001 );
init( STORAGE_SERVER_LIST_FETCH_TIMEOUT, 20.0 );
+init( BW_THROTTLING_ENABLED, true );
+init( TARGET_BW_LAG, 50.0 );
+init( TARGET_BW_LAG_BATCH, 20.0 );
+init( TARGET_BW_LAG_UPDATE, 9.0 );
+init( MIN_BW_HISTORY, 10 );
+init( BW_ESTIMATION_INTERVAL, 10.0 );
+init( BW_LAG_INCREASE_AMOUNT, 1.1 );
+init( BW_LAG_DECREASE_AMOUNT, 0.9 );
+init( BW_FETCH_WORKERS_INTERVAL, 5.0 );
+init( BW_RW_LOGGING_INTERVAL, 5.0 );
init( MAX_AUTO_THROTTLED_TRANSACTION_TAGS, 5 ); if(randomize && BUGGIFY) MAX_AUTO_THROTTLED_TRANSACTION_TAGS = 1;
init( MAX_MANUAL_THROTTLED_TRANSACTION_TAGS, 40 ); if(randomize && BUGGIFY) MAX_MANUAL_THROTTLED_TRANSACTION_TAGS = 1;
@@ -676,6 +687,7 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( AUTO_TAG_THROTTLING_ENABLED, true ); if(randomize && BUGGIFY) AUTO_TAG_THROTTLING_ENABLED = false;
init( SS_THROTTLE_TAGS_TRACKED, 1 ); if(randomize && BUGGIFY) SS_THROTTLE_TAGS_TRACKED = deterministicRandom()->randomInt(1, 10);
init( GLOBAL_TAG_THROTTLING, false );
+init( ENFORCE_TAG_THROTTLING_ON_PROXIES, false );
init( GLOBAL_TAG_THROTTLING_MIN_RATE, 1.0 );
init( GLOBAL_TAG_THROTTLING_FOLDING_TIME, 10.0 );
@@ -703,7 +715,6 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( FETCH_KEYS_PARALLELISM, 2 );
init( FETCH_KEYS_PARALLELISM_FULL, 10 );
init( FETCH_KEYS_LOWER_PRIORITY, 0 );
-init( FETCH_CHANGEFEED_PARALLELISM, 4 );
init( SERVE_FETCH_CHECKPOINT_PARALLELISM, 4 );
init( BUGGIFY_BLOCK_BYTES, 10000 );
init( STORAGE_RECOVERY_VERSION_LAG_LIMIT, 2 * MAX_READ_TRANSACTION_LIFE_VERSIONS );
@@ -712,7 +723,6 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( STORAGE_DURABILITY_LAG_REJECT_THRESHOLD, 0.25 );
init( STORAGE_DURABILITY_LAG_MIN_RATE, 0.1 );
init( STORAGE_COMMIT_INTERVAL, 0.5 ); if( randomize && BUGGIFY ) STORAGE_COMMIT_INTERVAL = 2.0;
-init( UPDATE_SHARD_VERSION_INTERVAL, 0.25 ); if( randomize && BUGGIFY ) UPDATE_SHARD_VERSION_INTERVAL = 1.0;
init( BYTE_SAMPLING_FACTOR, 250 ); //cannot buggify because of differences in restarting tests
init( BYTE_SAMPLING_OVERHEAD, 100 );
init( MAX_STORAGE_SERVER_WATCH_BYTES, 100e6 ); if( randomize && BUGGIFY ) MAX_STORAGE_SERVER_WATCH_BYTES = 10e3;
@@ -721,7 +731,6 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( BYTE_SAMPLE_LOAD_PARALLELISM, 8 ); if( randomize && BUGGIFY ) BYTE_SAMPLE_LOAD_PARALLELISM = 1;
init( BYTE_SAMPLE_LOAD_DELAY, 0.0 ); if( randomize && BUGGIFY ) BYTE_SAMPLE_LOAD_DELAY = 0.1;
init( BYTE_SAMPLE_START_DELAY, 1.0 ); if( randomize && BUGGIFY ) BYTE_SAMPLE_START_DELAY = 0.0;
-init( UPDATE_STORAGE_PROCESS_STATS_INTERVAL, 5.0 );
init( BEHIND_CHECK_DELAY, 2.0 );
init( BEHIND_CHECK_COUNT, 2 );
init( BEHIND_CHECK_VERSIONS, 5 * VERSIONS_PER_SECOND );
@@ -785,7 +794,6 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
// Dynamic Knobs (implementation)
init( COMPACTION_INTERVAL, isSimulated ? 5.0 : 300.0 );
-init( UPDATE_NODE_TIMEOUT, 3.0 );
init( GET_COMMITTED_VERSION_TIMEOUT, 3.0 );
init( GET_SNAPSHOT_AND_CHANGES_TIMEOUT, 3.0 );
init( FETCH_CHANGES_TIMEOUT, 3.0 );
@@ -801,14 +809,6 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( DISABLE_DUPLICATE_LOG_WARNING, false );
init( HISTOGRAM_REPORT_INTERVAL, 300.0 );
-// IPager
-init( PAGER_RESERVED_PAGES, 1 );
-// IndirectShadowPager
-init( FREE_PAGE_VACUUM_THRESHOLD, 1 );
-init( VACUUM_QUEUE_SIZE, 100000 );
-init( VACUUM_BYTES_PER_SECOND, 1e6 );
// Timekeeper
init( TIME_KEEPER_DELAY, 10 );
init( TIME_KEEPER_MAX_ENTRIES, 3600 * 24 * 30 * 6 ); if( randomize && BUGGIFY ) { TIME_KEEPER_MAX_ENTRIES = 2; }
@@ -827,11 +827,9 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( FASTRESTORE_ROLE_LOGGING_DELAY, 5 ); if( randomize && BUGGIFY ) { FASTRESTORE_ROLE_LOGGING_DELAY = deterministicRandom()->random01() * 60 + 1; }
init( FASTRESTORE_UPDATE_PROCESS_STATS_INTERVAL, 5 ); if( randomize && BUGGIFY ) { FASTRESTORE_UPDATE_PROCESS_STATS_INTERVAL = deterministicRandom()->random01() * 60 + 1; }
init( FASTRESTORE_ATOMICOP_WEIGHT, 1 ); if( randomize && BUGGIFY ) { FASTRESTORE_ATOMICOP_WEIGHT = deterministicRandom()->random01() * 200 + 1; }
-init( FASTRESTORE_APPLYING_PARALLELISM, 10000 ); if( randomize && BUGGIFY ) { FASTRESTORE_APPLYING_PARALLELISM = deterministicRandom()->random01() * 10 + 1; }
init( FASTRESTORE_MONITOR_LEADER_DELAY, 5 ); if( randomize && BUGGIFY ) { FASTRESTORE_MONITOR_LEADER_DELAY = deterministicRandom()->random01() * 100; }
init( FASTRESTORE_STRAGGLER_THRESHOLD_SECONDS, 60 ); if( randomize && BUGGIFY ) { FASTRESTORE_STRAGGLER_THRESHOLD_SECONDS = deterministicRandom()->random01() * 240 + 10; }
init( FASTRESTORE_TRACK_REQUEST_LATENCY, false ); if( randomize && BUGGIFY ) { FASTRESTORE_TRACK_REQUEST_LATENCY = false; }
-init( FASTRESTORE_TRACK_LOADER_SEND_REQUESTS, false ); if( randomize && BUGGIFY ) { FASTRESTORE_TRACK_LOADER_SEND_REQUESTS = true; }
init( FASTRESTORE_MEMORY_THRESHOLD_MB_SOFT, 6144 ); if( randomize && BUGGIFY ) { FASTRESTORE_MEMORY_THRESHOLD_MB_SOFT = 1; }
init( FASTRESTORE_WAIT_FOR_MEMORY_LATENCY, 10 ); if( randomize && BUGGIFY ) { FASTRESTORE_WAIT_FOR_MEMORY_LATENCY = 60; }
init( FASTRESTORE_HEARTBEAT_DELAY, 10 ); if( randomize && BUGGIFY ) { FASTRESTORE_HEARTBEAT_DELAY = deterministicRandom()->random01() * 120 + 2; }
@@ -926,11 +924,14 @@ void ServerKnobs::initialize(Randomize randomize, ClientKnobs* clientKnobs, IsSi
init( BG_MERGE_CANDIDATE_DELAY_SECONDS, BG_MERGE_CANDIDATE_THRESHOLD_SECONDS / 10.0 );
init( BLOB_WORKER_INITIAL_SNAPSHOT_PARALLELISM, 8 ); if( randomize && BUGGIFY ) BLOB_WORKER_INITIAL_SNAPSHOT_PARALLELISM = 1;
+init( BLOB_WORKER_RESNAPSHOT_PARALLELISM, 40 ); if( randomize && BUGGIFY ) BLOB_WORKER_RESNAPSHOT_PARALLELISM = deterministicRandom()->randomInt(1, 10);
+init( BLOB_WORKER_DELTA_FILE_WRITE_PARALLELISM, 2000 ); if( randomize && BUGGIFY ) BLOB_WORKER_DELTA_FILE_WRITE_PARALLELISM = deterministicRandom()->randomInt(10, 100);
init( BLOB_WORKER_TIMEOUT, 10.0 ); if( randomize && BUGGIFY ) BLOB_WORKER_TIMEOUT = 1.0;
init( BLOB_WORKER_REQUEST_TIMEOUT, 5.0 ); if( randomize && BUGGIFY ) BLOB_WORKER_REQUEST_TIMEOUT = 1.0;
init( BLOB_WORKERLIST_FETCH_INTERVAL, 1.0 );
init( BLOB_WORKER_BATCH_GRV_INTERVAL, 0.1 );
+init( BLOB_WORKER_DO_REJECT_WHEN_FULL, true ); if ( randomize && BUGGIFY ) BLOB_WORKER_DO_REJECT_WHEN_FULL = false;
+init( BLOB_WORKER_REJECT_WHEN_FULL_THRESHOLD, 0.9 );
init( BLOB_MANAGER_STATUS_EXP_BACKOFF_MIN, 0.1 );
init( BLOB_MANAGER_STATUS_EXP_BACKOFF_MAX, 5.0 );

View File

@@ -133,7 +133,8 @@ std::unordered_map<std::string, KeyRange> SpecialKeySpace::actorLineageApiComman
std::set<std::string> SpecialKeySpace::options = { "excluded/force",
"failed/force",
"excluded_locality/force",
-"failed_locality/force" };
+"failed_locality/force",
+"worker_interfaces/verify" };
std::set<std::string> SpecialKeySpace::tracingOptions = { kTracingTransactionIdKey, kTracingTokenKey };
@@ -1603,7 +1604,8 @@ Future<RangeResult> TracingOptionsImpl::getRange(ReadYourWritesTransaction* ryw,
void TracingOptionsImpl::set(ReadYourWritesTransaction* ryw, const KeyRef& key, const ValueRef& value) {
if (ryw->getApproximateSize() > 0) {
-ryw->setSpecialKeySpaceErrorMsg("tracing options must be set first");
+ryw->setSpecialKeySpaceErrorMsg(
+    ManagementAPIError::toJsonString(false, "configure trace", "tracing options must be set first"));
ryw->getSpecialKeySpaceWriteMap().insert(key, std::make_pair(true, Optional<Value>()));
return;
}
@@ -1616,7 +1618,8 @@ void TracingOptionsImpl::set(ReadYourWritesTransaction* ryw, const KeyRef& key,
} else if (value.toString() == "false") {
ryw->setToken(0);
} else {
-ryw->setSpecialKeySpaceErrorMsg("token must be set to true/false");
+ryw->setSpecialKeySpaceErrorMsg(
+    ManagementAPIError::toJsonString(false, "configure trace token", "token must be set to true/false"));
throw special_keys_api_failure();
}
}
@@ -1630,12 +1633,12 @@ Future<Optional<std::string>> TracingOptionsImpl::commit(ReadYourWritesTransacti
}
void TracingOptionsImpl::clear(ReadYourWritesTransaction* ryw, const KeyRangeRef& range) {
-ryw->setSpecialKeySpaceErrorMsg("clear range disabled");
+ryw->setSpecialKeySpaceErrorMsg(ManagementAPIError::toJsonString(false, "clear trace", "clear range disabled"));
throw special_keys_api_failure();
}
void TracingOptionsImpl::clear(ReadYourWritesTransaction* ryw, const KeyRef& key) {
-ryw->setSpecialKeySpaceErrorMsg("clear disabled");
+ryw->setSpecialKeySpaceErrorMsg(ManagementAPIError::toJsonString(false, "clear trace", "clear disabled"));
throw special_keys_api_failure();
}
@@ -2180,7 +2183,8 @@ ACTOR static Future<RangeResult> actorLineageGetRangeActor(ReadYourWritesTransac
state std::vector<StringRef> endValues = kr.end.removePrefix(prefix).splitAny("/"_sr);
// Require index (either "state" or "time") and address:port.
if (beginValues.size() < 2 || endValues.size() < 2) {
-ryw->setSpecialKeySpaceErrorMsg("missing required parameters (index, host)");
+ryw->setSpecialKeySpaceErrorMsg(
+    ManagementAPIError::toJsonString(false, "read actor_lineage", "missing required parameters (index, host)"));
throw special_keys_api_failure();
}
@@ -2199,12 +2203,14 @@ ACTOR static Future<RangeResult> actorLineageGetRangeActor(ReadYourWritesTransac
parse(endValues.begin() + 1, endValues.end(), endRangeHost, timeEnd, waitStateEnd, seqEnd);
}
} else {
-ryw->setSpecialKeySpaceErrorMsg("invalid index in actor_lineage");
+ryw->setSpecialKeySpaceErrorMsg(
+    ManagementAPIError::toJsonString(false, "read actor_lineage", "invalid index in actor_lineage"));
throw special_keys_api_failure();
}
} catch (Error& e) {
if (e.code() != special_keys_api_failure().code()) {
-ryw->setSpecialKeySpaceErrorMsg("failed to parse key");
+ryw->setSpecialKeySpaceErrorMsg(
+    ManagementAPIError::toJsonString(false, "read actor_lineage", "failed to parse key"));
throw special_keys_api_failure();
} else {
throw e;
@@ -2214,7 +2220,8 @@ ACTOR static Future<RangeResult> actorLineageGetRangeActor(ReadYourWritesTransac
if (kr.begin != kr.end && host != endRangeHost) {
// The client doesn't know about all the hosts, so a get range covering
// multiple hosts has no way of knowing which IP:port combos to use.
-ryw->setSpecialKeySpaceErrorMsg("the host must remain the same on both ends of the range");
+ryw->setSpecialKeySpaceErrorMsg(ManagementAPIError::toJsonString(
+    false, "read actor_lineage", "the host must remain the same on both ends of the range"));
throw special_keys_api_failure();
}
@ -2748,6 +2755,64 @@ Future<Optional<std::string>> FailedLocalitiesRangeImpl::commit(ReadYourWritesTr
return excludeLocalityCommitActor(ryw, true); return excludeLocalityCommitActor(ryw, true);
} }
// Defined in ReadYourWrites.actor.cpp
ACTOR Future<RangeResult> getWorkerInterfaces(Reference<IClusterConnectionRecord> clusterRecord);
// Defined in NativeAPI.actor.cpp
ACTOR Future<bool> verifyInterfaceActor(Reference<FlowLock> connectLock, ClientWorkerInterface workerInterf);
ACTOR static Future<RangeResult> workerInterfacesImplGetRangeActor(ReadYourWritesTransaction* ryw,
KeyRef prefix,
KeyRangeRef kr) {
if (!ryw->getDatabase().getPtr() || !ryw->getDatabase()->getConnectionRecord())
return RangeResult();
state RangeResult interfs = wait(getWorkerInterfaces(ryw->getDatabase()->getConnectionRecord()));
// for options' special keys, the boolean flag indicates if it's a SET operation
auto [verify, _] = ryw->getSpecialKeySpaceWriteMap()[SpecialKeySpace::getManagementApiCommandOptionSpecialKey(
"worker_interfaces", "verify")];
state RangeResult result;
if (verify) {
// if verify option is set, we try to talk to every worker and only returns those we can talk to
Reference<FlowLock> connectLock(new FlowLock(CLIENT_KNOBS->CLI_CONNECT_PARALLELISM));
state std::vector<Future<bool>> verifyInterfs;
for (const auto& [k_, value] : interfs) {
auto k = k_.withPrefix(prefix);
if (kr.contains(k)) {
ClientWorkerInterface workerInterf =
BinaryReader::fromStringRef<ClientWorkerInterface>(value, IncludeVersion());
verifyInterfs.push_back(verifyInterfaceActor(connectLock, workerInterf));
} else {
verifyInterfs.push_back(false);
}
}
wait(waitForAll(verifyInterfs));
// state int index;
for (int index = 0; index < interfs.size(); index++) {
if (verifyInterfs[index].get()) {
// if we can establish a connection, add the kv pair into the result
result.push_back_deep(result.arena(),
KeyValueRef(interfs[index].key.withPrefix(prefix), interfs[index].value));
}
}
} else {
for (const auto& [k_, v] : interfs) {
auto k = k_.withPrefix(prefix);
if (kr.contains(k))
result.push_back_deep(result.arena(), KeyValueRef(k, v));
}
}
std::sort(result.begin(), result.end(), KeyValueRef::OrderByKey{});
return result;
}
WorkerInterfacesSpecialKeyImpl::WorkerInterfacesSpecialKeyImpl(KeyRangeRef kr) : SpecialKeyRangeReadImpl(kr) {}
Future<RangeResult> WorkerInterfacesSpecialKeyImpl::getRange(ReadYourWritesTransaction* ryw,
KeyRangeRef kr,
GetRangeLimits limitsHint) const {
return workerInterfacesImplGetRangeActor(ryw, getKeyRange().begin, kr);
}
ACTOR Future<Void> validateSpecialSubrangeRead(ReadYourWritesTransaction* ryw, ACTOR Future<Void> validateSpecialSubrangeRead(ReadYourWritesTransaction* ryw,
KeySelector begin, KeySelector begin,
KeySelector end, KeySelector end,
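
The hunks above all make the same change: special-key-space failures that used to carry a bare string now go through ManagementAPIError::toJsonString, so callers get a structured error tagged with the command that failed. A rough, runnable illustration of the difference (the JSON field names below are assumptions for illustration, not FDB's exact schema):

```
#include <cstdio>
#include <string>

// Stand-in for ManagementAPIError::toJsonString (assumed output shape, not FDB's).
static std::string toJsonString(bool retriable, const std::string& command, const std::string& msg) {
    return "{ \"retriable\": " + std::string(retriable ? "true" : "false") + ", \"command\": \"" + command +
           "\", \"message\": \"" + msg + "\" }";
}

int main() {
    // old: setSpecialKeySpaceErrorMsg("token must be set to true/false");
    // new: the same text, wrapped with enough context for a tool to dispatch on
    std::printf("%s\n", toJsonString(false, "configure trace token", "token must be set to true/false").c_str());
}
```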

@@ -356,7 +356,7 @@ const Key storageCacheServerKey(UID id) {
 }

 const Value storageCacheServerValue(const StorageServerInterface& ssi) {
-	auto protocolVersion = currentProtocolVersion;
+	auto protocolVersion = currentProtocolVersion();
 	protocolVersion.addObjectSerializerFlag();
 	return ObjectWriter::toValue(ssi, IncludeVersion(protocolVersion));
 }

@@ -666,7 +666,7 @@ const KeyRangeRef tagLocalityListKeys(LiteralStringRef("\xff/tagLocalityList/"),
 const KeyRef tagLocalityListPrefix = tagLocalityListKeys.begin;

 const Key tagLocalityListKeyFor(Optional<Value> dcID) {
-	BinaryWriter wr(AssumeVersion(currentProtocolVersion));
+	BinaryWriter wr(AssumeVersion(currentProtocolVersion()));
 	wr.serializeBytes(tagLocalityListKeys.begin);
 	wr << dcID;
 	return wr.toValue();

@@ -679,7 +679,7 @@ const Value tagLocalityListValue(int8_t const& tagLocality) {
 }

 Optional<Value> decodeTagLocalityListKey(KeyRef const& key) {
 	Optional<Value> dcID;
-	BinaryReader rd(key.removePrefix(tagLocalityListKeys.begin), AssumeVersion(currentProtocolVersion));
+	BinaryReader rd(key.removePrefix(tagLocalityListKeys.begin), AssumeVersion(currentProtocolVersion()));
 	rd >> dcID;
 	return dcID;
 }

@@ -695,7 +695,7 @@ const KeyRangeRef datacenterReplicasKeys(LiteralStringRef("\xff\x02/datacenterRe
 const KeyRef datacenterReplicasPrefix = datacenterReplicasKeys.begin;

 const Key datacenterReplicasKeyFor(Optional<Value> dcID) {
-	BinaryWriter wr(AssumeVersion(currentProtocolVersion));
+	BinaryWriter wr(AssumeVersion(currentProtocolVersion()));
 	wr.serializeBytes(datacenterReplicasKeys.begin);
 	wr << dcID;
 	return wr.toValue();

@@ -708,7 +708,7 @@ const Value datacenterReplicasValue(int const& replicas) {
 }

 Optional<Value> decodeDatacenterReplicasKey(KeyRef const& key) {
 	Optional<Value> dcID;
-	BinaryReader rd(key.removePrefix(datacenterReplicasKeys.begin), AssumeVersion(currentProtocolVersion));
+	BinaryReader rd(key.removePrefix(datacenterReplicasKeys.begin), AssumeVersion(currentProtocolVersion()));
 	rd >> dcID;
 	return dcID;
 }

@@ -729,14 +729,14 @@ const KeyRangeRef tLogDatacentersKeys(LiteralStringRef("\xff\x02/tLogDatacenters
 const KeyRef tLogDatacentersPrefix = tLogDatacentersKeys.begin;

 const Key tLogDatacentersKeyFor(Optional<Value> dcID) {
-	BinaryWriter wr(AssumeVersion(currentProtocolVersion));
+	BinaryWriter wr(AssumeVersion(currentProtocolVersion()));
 	wr.serializeBytes(tLogDatacentersKeys.begin);
 	wr << dcID;
 	return wr.toValue();
 }

 Optional<Value> decodeTLogDatacentersKey(KeyRef const& key) {
 	Optional<Value> dcID;
-	BinaryReader rd(key.removePrefix(tLogDatacentersKeys.begin), AssumeVersion(currentProtocolVersion));
+	BinaryReader rd(key.removePrefix(tLogDatacentersKeys.begin), AssumeVersion(currentProtocolVersion()));
 	rd >> dcID;
 	return dcID;
 }

@@ -755,7 +755,7 @@ const Key serverListKeyFor(UID serverID) {
 }

 const Value serverListValue(StorageServerInterface const& server) {
-	auto protocolVersion = currentProtocolVersion;
+	auto protocolVersion = currentProtocolVersion();
 	protocolVersion.addObjectSerializerFlag();
 	return ObjectWriter::toValue(server, IncludeVersion(protocolVersion));
 }

@@ -787,7 +787,7 @@ StorageServerInterface decodeServerListValue(ValueRef const& value) {
 }

 Value swVersionValue(SWVersion const& swversion) {
-	auto protocolVersion = currentProtocolVersion;
+	auto protocolVersion = currentProtocolVersion();
 	protocolVersion.addObjectSerializerFlag();
 	return ObjectWriter::toValue(swversion, IncludeVersion(protocolVersion));
 }

@@ -1331,6 +1331,9 @@ int64_t decodeBlobManagerEpochValue(ValueRef const& value) {
 }

 // blob granule data
+const KeyRef blobRangeActive = LiteralStringRef("1");
+const KeyRef blobRangeInactive = LiteralStringRef("0");
+
 const KeyRangeRef blobGranuleFileKeys(LiteralStringRef("\xff\x02/bgf/"), LiteralStringRef("\xff\x02/bgf0"));
 const KeyRangeRef blobGranuleMappingKeys(LiteralStringRef("\xff\x02/bgm/"), LiteralStringRef("\xff\x02/bgm0"));
 const KeyRangeRef blobGranuleLockKeys(LiteralStringRef("\xff\x02/bgl/"), LiteralStringRef("\xff\x02/bgl0"));

@@ -1340,7 +1343,8 @@ const KeyRangeRef blobGranuleMergeBoundaryKeys(LiteralStringRef("\xff\x02/bgmerg
                                                LiteralStringRef("\xff\x02/bgmergebounds0"));
 const KeyRangeRef blobGranuleHistoryKeys(LiteralStringRef("\xff\x02/bgh/"), LiteralStringRef("\xff\x02/bgh0"));
 const KeyRangeRef blobGranulePurgeKeys(LiteralStringRef("\xff\x02/bgp/"), LiteralStringRef("\xff\x02/bgp0"));
-const KeyRangeRef blobGranuleVersionKeys(LiteralStringRef("\xff\x02/bgv/"), LiteralStringRef("\xff\x02/bgv0"));
+const KeyRangeRef blobGranuleForcePurgedKeys(LiteralStringRef("\xff\x02/bgpforce/"),
+                                             LiteralStringRef("\xff\x02/bgpforce0"));
 const KeyRef blobGranulePurgeChangeKey = LiteralStringRef("\xff\x02/bgpChange");

 const uint8_t BG_FILE_TYPE_DELTA = 'D';
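
A pattern repeated throughout this file: the `currentProtocolVersion` constant becomes a `currentProtocolVersion()` call. Routing every read through a function lets the process substitute a different value at runtime, which the `useFutureProtocolVersion()` hook added later in this commit appears to rely on. A minimal sketch of that accessor pattern, with an assumed default value and an illustrative override hook:

```
#include <cstdint>

struct ProtocolVersion {
    uint64_t version;
};

namespace {
// assumed default; the real value lives in FDB's build configuration
ProtocolVersion g_currentProtocolVersion{ 0x0FDB00B071010000ULL };
} // namespace

// function-level accessor: every caller sees the current value at call time
ProtocolVersion currentProtocolVersion() {
    return g_currentProtocolVersion;
}

// illustrative stand-in for a test hook such as useFutureProtocolVersion()
void overrideProtocolVersion(ProtocolVersion v) {
    g_currentProtocolVersion = v;
}
```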

@@ -47,6 +47,10 @@ std::string TenantMapEntry::tenantStateToString(TenantState tenantState) {
 		return "removing";
 	case TenantState::UPDATING_CONFIGURATION:
 		return "updating configuration";
+	case TenantState::RENAMING_FROM:
+		return "renaming from";
+	case TenantState::RENAMING_TO:
+		return "renaming to";
 	case TenantState::ERROR:
 		return "error";
 	default:

@@ -63,6 +67,10 @@ TenantState TenantMapEntry::stringToTenantState(std::string stateStr) {
 		return TenantState::REMOVING;
 	} else if (stateStr == "updating configuration") {
 		return TenantState::UPDATING_CONFIGURATION;
+	} else if (stateStr == "renaming from") {
+		return TenantState::RENAMING_FROM;
+	} else if (stateStr == "renaming to") {
+		return TenantState::RENAMING_TO;
 	} else if (stateStr == "error") {
 		return TenantState::ERROR;
 	}

@@ -127,7 +135,7 @@ std::string TenantMapEntry::toJson(int apiVersion) const {
 }

 bool TenantMapEntry::matchesConfiguration(TenantMapEntry const& other) const {
-	return tenantGroup == other.tenantGroup;
+	return tenantGroup == other.tenantGroup && encrypted == other.encrypted;
 }

 void TenantMapEntry::configure(Standalone<StringRef> parameter, Optional<Value> value) {

@@ -139,6 +147,16 @@ void TenantMapEntry::configure(Standalone<StringRef> parameter, Optional<Value>
 	}
 }

+TenantMetadataSpecification& TenantMetadata::instance() {
+	static TenantMetadataSpecification _instance = TenantMetadataSpecification("\xff/"_sr);
+	return _instance;
+}
+
+Key TenantMetadata::tenantMapPrivatePrefix() {
+	static Key _prefix = "\xff"_sr.withSuffix(tenantMap().subspace.begin);
+	return _prefix;
+}
+
 TEST_CASE("/fdbclient/TenantMapEntry/Serialization") {
 	TenantMapEntry entry1(1, TenantState::READY, false);
 	ASSERT(entry1.prefix == "\x00\x00\x00\x00\x00\x00\x00\x01"_sr);
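
TenantMetadata::instance() and tenantMapPrivatePrefix() replace static data members with function-local statics. The usual reason for this shape (the commit does not state one) is initialization order: a function-local static is constructed on first use, so no other global can observe it before it exists. A minimal runnable sketch of the idiom:

```
#include <string>

struct Spec {
    std::string prefix;
    explicit Spec(std::string p) : prefix(std::move(p)) {}
};

// Constructed on first call, which sidesteps the static-initialization-order
// hazard that a `static inline Spec` class member can hit when other globals
// use it during their own construction.
Spec& instance() {
    static Spec _instance("\xff/");
    return _instance;
}

int main() {
    return instance().prefix.size() == 2 ? 0 : 1;
}
```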

@@ -21,6 +21,7 @@
 #include "fdbclient/BlobGranuleFiles.h"
 #include "fdbclient/ClusterConnectionFile.h"
 #include "fdbclient/ClusterConnectionMemoryRecord.h"
+#include "fdbclient/CoordinationInterface.h"
 #include "fdbclient/ThreadSafeTransaction.h"
 #include "fdbclient/DatabaseContext.h"
 #include "fdbclient/versions.h"

@@ -143,13 +144,47 @@ ThreadFuture<Void> ThreadSafeDatabase::waitPurgeGranulesComplete(const KeyRef& p
 	return onMainThread([db, key]() -> Future<Void> { return db->waitPurgeGranulesComplete(key); });
 }

-ThreadSafeDatabase::ThreadSafeDatabase(Reference<IClusterConnectionRecord> connectionRecord, int apiVersion) {
+ThreadFuture<bool> ThreadSafeDatabase::blobbifyRange(const KeyRangeRef& keyRange) {
+	DatabaseContext* db = this->db;
+	KeyRange range = keyRange;
+	return onMainThread([=]() -> Future<bool> { return db->blobbifyRange(range); });
+}
+
+ThreadFuture<bool> ThreadSafeDatabase::unblobbifyRange(const KeyRangeRef& keyRange) {
+	DatabaseContext* db = this->db;
+	KeyRange range = keyRange;
+	return onMainThread([=]() -> Future<bool> { return db->blobbifyRange(range); });
+}
+
+ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> ThreadSafeDatabase::listBlobbifiedRanges(const KeyRangeRef& keyRange,
+                                                                                          int rangeLimit) {
+	DatabaseContext* db = this->db;
+	KeyRange range = keyRange;
+	return onMainThread(
+	    [=]() -> Future<Standalone<VectorRef<KeyRangeRef>>> { return db->listBlobbifiedRanges(range, rangeLimit); });
+}
+
+ThreadFuture<Version> ThreadSafeDatabase::verifyBlobRange(const KeyRangeRef& keyRange, Optional<Version> version) {
+	DatabaseContext* db = this->db;
+	KeyRange range = keyRange;
+	return onMainThread([=]() -> Future<Version> { return db->verifyBlobRange(range, version); });
+}
+
+ThreadSafeDatabase::ThreadSafeDatabase(ConnectionRecordType connectionRecordType,
+                                       std::string connectionRecordString,
+                                       int apiVersion) {
 	// Allocate memory for the Database from this thread (so the pointer is known for subsequent method calls)
 	// but run its constructor on the main thread
 	DatabaseContext* db = this->db = DatabaseContext::allocateOnForeignThread();
-	onMainThreadVoid([db, connectionRecord, apiVersion]() {
+	onMainThreadVoid([db, connectionRecordType, connectionRecordString, apiVersion]() {
 		try {
+			Reference<IClusterConnectionRecord> connectionRecord =
+			    connectionRecordType == ConnectionRecordType::FILE
+			        ? Reference<IClusterConnectionRecord>(ClusterConnectionFile::openOrDefault(connectionRecordString))
+			        : Reference<IClusterConnectionRecord>(
+			              new ClusterConnectionMemoryRecord(ClusterConnectionString(connectionRecordString)));
+
 			Database::createDatabase(connectionRecord, apiVersion, IsInternal::False, LocalityData(), db).extractPtr();
 		} catch (Error& e) {
 			new (db) DatabaseContext(e);

@@ -350,13 +385,14 @@ ThreadFuture<Standalone<VectorRef<const char*>>> ThreadSafeTransaction::getAddre
 }

 ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> ThreadSafeTransaction::getBlobGranuleRanges(
-    const KeyRangeRef& keyRange) {
+    const KeyRangeRef& keyRange,
+    int rangeLimit) {
 	ISingleThreadTransaction* tr = this->tr;
 	KeyRange r = keyRange;
-	return onMainThread([tr, r]() -> Future<Standalone<VectorRef<KeyRangeRef>>> {
+	return onMainThread([=]() -> Future<Standalone<VectorRef<KeyRangeRef>>> {
 		tr->checkDeferredError();
-		return tr->getBlobGranuleRanges(r);
+		return tr->getBlobGranuleRanges(r, rangeLimit);
 	});
 }

@@ -563,19 +599,25 @@ void ThreadSafeTransaction::reset() {
 extern const char* getSourceVersion();

-ThreadSafeApi::ThreadSafeApi()
-  : apiVersion(-1), clientVersion(format("%s,%s,%llx", FDB_VT_VERSION, getSourceVersion(), currentProtocolVersion)),
-    transportId(0) {}
+ThreadSafeApi::ThreadSafeApi() : apiVersion(-1), transportId(0) {}

 void ThreadSafeApi::selectApiVersion(int apiVersion) {
 	this->apiVersion = apiVersion;
 }

 const char* ThreadSafeApi::getClientVersion() {
-	// There is only one copy of the ThreadSafeAPI, and it never gets deleted. Also, clientVersion is never modified.
+	// There is only one copy of the ThreadSafeAPI, and it never gets deleted.
+	// Also, clientVersion is initialized on demand and never modified afterwards.
+	if (clientVersion.empty()) {
+		clientVersion = format("%s,%s,%llx", FDB_VT_VERSION, getSourceVersion(), currentProtocolVersion());
+	}
 	return clientVersion.c_str();
 }

+void ThreadSafeApi::useFutureProtocolVersion() {
+	::useFutureProtocolVersion();
+}
+
 void ThreadSafeApi::setNetworkOption(FDBNetworkOptions::Option option, Optional<StringRef> value) {
 	if (option == FDBNetworkOptions::EXTERNAL_CLIENT_TRANSPORT_ID) {
 		if (value.present()) {

@@ -632,12 +674,12 @@ void ThreadSafeApi::stopNetwork() {
 Reference<IDatabase> ThreadSafeApi::createDatabase(const char* clusterFilePath) {
 	return Reference<IDatabase>(
-	    new ThreadSafeDatabase(ClusterConnectionFile::openOrDefault(clusterFilePath), apiVersion));
+	    new ThreadSafeDatabase(ThreadSafeDatabase::ConnectionRecordType::FILE, clusterFilePath, apiVersion));
 }

 Reference<IDatabase> ThreadSafeApi::createDatabaseFromConnectionString(const char* connectionString) {
 	return Reference<IDatabase>(new ThreadSafeDatabase(
-	    makeReference<ClusterConnectionMemoryRecord>(ClusterConnectionString(connectionString)), apiVersion));
+	    ThreadSafeDatabase::ConnectionRecordType::CONNECTION_STRING, connectionString, apiVersion));
 }

 void ThreadSafeApi::addNetworkThreadCompletionHook(void (*hook)(void*), void* hookParameter) {
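
Every ThreadSafeDatabase/ThreadSafeTransaction method above follows one pattern: copy the arguments into locals, then run the real work on the network thread via onMainThread. A runnable stand-in for that hand-off, using a plain queue in place of FDB's network loop (all names here are illustrative):

```
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <queue>

std::queue<std::function<void()>> mainQueue; // stand-in for the network thread's task queue

// Stand-in for onMainThread(): queue a closure and hand the caller a future for its result.
template <class F>
auto onMainThread(F f) -> std::future<decltype(f())> {
    auto task = std::make_shared<std::packaged_task<decltype(f())()>>(std::move(f));
    mainQueue.push([task] { (*task)(); });
    return task->get_future();
}

int main() {
    // like blobbifyRange(): capture arguments by value, then defer the real call
    int range = 42;
    auto fut = onMainThread([=] { return range * 2; });
    while (!mainQueue.empty()) { // the "network loop" drains the queue
        mainQueue.front()();
        mainQueue.pop();
    }
    assert(fut.get() == 84);
}
```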

@@ -29,7 +29,7 @@ namespace {
 std::string const notFoundErrorCode = "404";

 void printAzureError(std::string const& operationName, azure::storage_lite::storage_error const& err) {
-	printf("(%s) : Error from Azure SDK : %s (%s) : %s",
+	printf("(%s) : Error from Azure SDK : %s (%s) : %s\n",
 	       operationName.c_str(),
 	       err.code_name.c_str(),
 	       err.code.c_str(),

@@ -109,9 +109,9 @@ public:
 class WriteFile final : public IAsyncFile, ReferenceCounted<WriteFile> {
 	AsyncTaskThread* asyncTaskThread;
-	std::shared_ptr<AzureClient> client;
 	std::string containerName;
 	std::string blobName;
+	std::shared_ptr<AzureClient> client;
 	int64_t m_cursor{ 0 };
 	// Ideally this buffer should not be a string, but
 	// the Azure SDK only supports/tests uploading to append

@@ -318,7 +318,7 @@ BackupContainerAzureBlobStore::BackupContainerAzureBlobStore(const std::string&
 	std::string accountKey = _accountKey;
 	auto credential = std::make_shared<azure::storage_lite::shared_key_credential>(accountName, accountKey);
 	auto storageAccount = std::make_shared<azure::storage_lite::storage_account>(
-	    accountName, credential, true, format("https://%s", endpoint.c_str()));
+	    accountName, credential, true, fmt::format("https://{}", endpoint));
 	client = std::make_unique<AzureClient>(storageAccount, 1);
 }

@@ -342,6 +342,7 @@ Future<Void> BackupContainerAzureBlobStore::create() {
 	Future<Void> encryptionSetupFuture = usesEncryption() ? encryptionSetupComplete() : Void();
 	return createContainerFuture && encryptionSetupFuture;
 }
+
 Future<bool> BackupContainerAzureBlobStore::exists() {
 	TraceEvent(SevDebug, "BCAzureBlobStoreCheckContainerExists").detail("ContainerName", containerName);
 	return asyncTaskThread.execAsync([containerName = this->containerName, client = this->client] {

@@ -0,0 +1,33 @@
+# Set up the Azure Backup Testing Environment
+
+Make sure we built FDB with `-DBUILD_AZURE_BACKUP=ON`
+
+# Test
+
+If you run the _BackupToBlob_ and _RestoreFromBlob_ workloads with the parameter _backupURL_ starting with `azure://`,
+the workload will back up to and restore from Azure blob storage.
+For example, _BackupAzureBlobCorrectness.toml_
+
+## URL format
+
+The code now supports the following styles of URLs:
+
+- `azure://<account_name>.blob.core.windows.net/<container_name>` (the formal URL format for the blob service provided by the Azure storage account)
+- `azure://<ip|hostname>:<port>/<account_name>/<container_name>` (directly providing the endpoint address for the blob service, usually for local testing)
+
+## Local test environment
+
+We need to use _Azurite_ to simulate an Azure blob service locally.
+Please follow the [tutorial](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azurite?tabs=docker-hub) to start your service locally.
+
+For example,
+```
+docker run -p 10000:10000 -v `pwd`:<path> -w <path> mcr.microsoft.com/azure-storage/azurite azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --oauth basic --cert ./<...>.pem --key ./<...>.key.pem --debug ./<log_file_path>
+```
+
+### Notice
+
+- To use _https_, we need to provide the certificates via `--cert` and `--key`.
+  See the detailed [tutorial](https://github.com/Azure/Azurite/blob/main/README.md#https-setup) to set up HTTPS. (We tested with the `mkcert` method.)
+- To use Azure SDKs, we need to pass the `--oauth basic` option.
+- Please take a look at the [differences](https://github.com/Azure/Azurite/blob/main/README.md#differences-between-azurite-and-azure-storage) between Azurite and Azure Storage.

@@ -1,3 +1,5 @@
+cmake_minimum_required(VERSION 3.13)
+
 project(azurestorage-download)

 include(ExternalProject)

@@ -25,8 +25,6 @@
 #include "fdbclient/AsyncTaskThread.h"
 #include "fdbclient/BackupContainerFileSystem.h"

-#include "storage_credential.h"
-#include "storage_account.h"
 #include "blob/blob_client.h"

 class BackupContainerAzureBlobStore final : public BackupContainerFileSystem,

@@ -252,7 +252,6 @@ struct BlobGranuleMergeBoundary {
 struct BlobGranuleHistoryValue {
 	constexpr static FileIdentifier file_identifier = 991434;
 	UID granuleID;
-	// VectorRef<std::pair<KeyRangeRef, Version>> parentGranules;
 	VectorRef<KeyRef> parentBoundaries;
 	VectorRef<Version> parentVersions;

@@ -41,16 +41,31 @@ struct BlobWorkerStats {
 	Counter readRequestsWithBegin;
 	Counter readRequestsCollapsed;
 	Counter flushGranuleReqs;
+	Counter compressionBytesRaw;
+	Counter compressionBytesFinal;
+	Counter fullRejections;

 	int numRangesAssigned;
 	int mutationBytesBuffered;
 	int activeReadRequests;
 	int granulesPendingSplitCheck;
+	Version minimumCFVersion;
+	int notAtLatestChangeFeeds;
+	int64_t lastResidentMemory;
+	int64_t estimatedMaxResidentMemory;
+
+	Reference<FlowLock> initialSnapshotLock;
+	Reference<FlowLock> resnapshotLock;
+	Reference<FlowLock> deltaWritesLock;

 	Future<Void> logger;

 	// Current stats maintained for a given blob worker process
-	explicit BlobWorkerStats(UID id, double interval)
+	explicit BlobWorkerStats(UID id,
+	                         double interval,
+	                         Reference<FlowLock> initialSnapshotLock,
+	                         Reference<FlowLock> resnapshotLock,
+	                         Reference<FlowLock> deltaWritesLock)
 	  : cc("BlobWorkerStats", id.toString()),
 	    s3PutReqs("S3PutReqs", cc), s3GetReqs("S3GetReqs", cc), s3DeleteReqs("S3DeleteReqs", cc),

@@ -64,12 +79,25 @@ struct BlobWorkerStats {
 	    readReqDeltaBytesReturned("ReadReqDeltaBytesReturned", cc), commitVersionChecks("CommitVersionChecks", cc),
 	    granuleUpdateErrors("GranuleUpdateErrors", cc), granuleRequestTimeouts("GranuleRequestTimeouts", cc),
 	    readRequestsWithBegin("ReadRequestsWithBegin", cc), readRequestsCollapsed("ReadRequestsCollapsed", cc),
-	    flushGranuleReqs("FlushGranuleReqs", cc), numRangesAssigned(0), mutationBytesBuffered(0), activeReadRequests(0),
-	    granulesPendingSplitCheck(0) {
+	    flushGranuleReqs("FlushGranuleReqs", cc), compressionBytesRaw("CompressionBytesRaw", cc),
+	    compressionBytesFinal("CompressionBytesFinal", cc), fullRejections("FullRejections", cc), numRangesAssigned(0),
+	    mutationBytesBuffered(0), activeReadRequests(0), granulesPendingSplitCheck(0), minimumCFVersion(0),
+	    notAtLatestChangeFeeds(0), lastResidentMemory(0), estimatedMaxResidentMemory(0),
+	    initialSnapshotLock(initialSnapshotLock), resnapshotLock(resnapshotLock), deltaWritesLock(deltaWritesLock) {
 		specialCounter(cc, "NumRangesAssigned", [this]() { return this->numRangesAssigned; });
 		specialCounter(cc, "MutationBytesBuffered", [this]() { return this->mutationBytesBuffered; });
 		specialCounter(cc, "ActiveReadRequests", [this]() { return this->activeReadRequests; });
 		specialCounter(cc, "GranulesPendingSplitCheck", [this]() { return this->granulesPendingSplitCheck; });
+		specialCounter(cc, "MinimumChangeFeedVersion", [this]() { return this->minimumCFVersion; });
+		specialCounter(cc, "NotAtLatestChangeFeeds", [this]() { return this->notAtLatestChangeFeeds; });
+		specialCounter(cc, "LastResidentMemory", [this]() { return this->lastResidentMemory; });
+		specialCounter(cc, "EstimatedMaxResidentMemory", [this]() { return this->estimatedMaxResidentMemory; });
+		specialCounter(cc, "InitialSnapshotsActive", [this]() { return this->initialSnapshotLock->activePermits(); });
+		specialCounter(cc, "InitialSnapshotsWaiting", [this]() { return this->initialSnapshotLock->waiters(); });
+		specialCounter(cc, "ReSnapshotsActive", [this]() { return this->resnapshotLock->activePermits(); });
+		specialCounter(cc, "ReSnapshotsWaiting", [this]() { return this->resnapshotLock->waiters(); });
+		specialCounter(cc, "DeltaFileWritesActive", [this]() { return this->deltaWritesLock->activePermits(); });
+		specialCounter(cc, "DeltaFileWritesWaiting", [this]() { return this->deltaWritesLock->waiters(); });
 		logger = traceCounters("BlobWorkerMetrics", id, interval, &cc, "BlobWorkerMetrics");
 	}
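
The new FlowLock-backed specialCounter registrations above sample point-in-time values (active permits, waiters) rather than monotonically increasing counters. A runnable stand-in for that lambda-gauge idea (GaugeRegistry and friends are illustrative, not FDB types):

```
#include <cstdint>
#include <cstdio>
#include <functional>
#include <map>
#include <string>

struct GaugeRegistry {
    std::map<std::string, std::function<int64_t()>> gauges;
    void specialCounter(const std::string& name, std::function<int64_t()> f) { gauges[name] = std::move(f); }
    // Gauges are sampled at flush time, so they need no manual update on every event.
    void flush() {
        for (auto& [name, f] : gauges)
            std::printf("%s=%lld\n", name.c_str(), static_cast<long long>(f()));
    }
};

int main() {
    GaugeRegistry reg;
    int64_t activePermits = 3; // stands in for FlowLock::activePermits()
    reg.specialCounter("InitialSnapshotsActive", [&] { return activePermits; });
    reg.flush(); // prints InitialSnapshotsActive=3
}
```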

@@ -32,13 +32,14 @@ struct BlobWorkerInterface {
 	constexpr static FileIdentifier file_identifier = 8358753;
 	// TODO: mimic what StorageServerInterface does with sequential endpoint IDs
 	RequestStream<ReplyPromise<Void>> waitFailure;
-	RequestStream<struct BlobGranuleFileRequest> blobGranuleFileRequest;
+	PublicRequestStream<struct BlobGranuleFileRequest> blobGranuleFileRequest;
 	RequestStream<struct AssignBlobRangeRequest> assignBlobRangeRequest;
 	RequestStream<struct RevokeBlobRangeRequest> revokeBlobRangeRequest;
 	RequestStream<struct GetGranuleAssignmentsRequest> granuleAssignmentsRequest;
 	RequestStream<struct GranuleStatusStreamRequest> granuleStatusStreamRequest;
 	RequestStream<struct HaltBlobWorkerRequest> haltBlobWorker;
 	RequestStream<struct FlushGranuleRequest> flushGranuleRequest;
+	RequestStream<struct MinBlobVersionRequest> minBlobVersionRequest;

 	struct LocalityData locality;
 	UID myId;

@@ -57,6 +58,7 @@ struct BlobWorkerInterface {
 		streams.push_back(granuleStatusStreamRequest.getReceiver());
 		streams.push_back(haltBlobWorker.getReceiver());
 		streams.push_back(flushGranuleRequest.getReceiver());
+		streams.push_back(minBlobVersionRequest.getReceiver());
 		FlowTransport::transport().addEndpoints(streams);
 	}
 	UID id() const { return myId; }

@@ -72,7 +74,7 @@ struct BlobWorkerInterface {
 		serializer(ar, myId, locality, waitFailure);
 		if (Archive::isDeserializing) {
 			blobGranuleFileRequest =
-			    RequestStream<struct BlobGranuleFileRequest>(waitFailure.getEndpoint().getAdjustedEndpoint(1));
+			    PublicRequestStream<struct BlobGranuleFileRequest>(waitFailure.getEndpoint().getAdjustedEndpoint(1));
 			assignBlobRangeRequest =
 			    RequestStream<struct AssignBlobRangeRequest>(waitFailure.getEndpoint().getAdjustedEndpoint(2));
 			revokeBlobRangeRequest =

@@ -85,6 +87,8 @@ struct BlobWorkerInterface {
 			    RequestStream<struct HaltBlobWorkerRequest>(waitFailure.getEndpoint().getAdjustedEndpoint(6));
 			flushGranuleRequest =
 			    RequestStream<struct FlushGranuleRequest>(waitFailure.getEndpoint().getAdjustedEndpoint(7));
+			minBlobVersionRequest =
+			    RequestStream<struct MinBlobVersionRequest>(waitFailure.getEndpoint().getAdjustedEndpoint(8));
 		}
 	}
 };

@@ -114,6 +118,8 @@ struct BlobGranuleFileRequest {

 	BlobGranuleFileRequest() {}

+	bool verify() const { return tenantInfo.isAuthorized(); }
+
 	template <class Ar>
 	void serialize(Ar& ar) {
 		serializer(ar, keyRange, beginVersion, readVersion, canCollapseBegin, tenantInfo, reply, arena);

@@ -137,6 +143,28 @@ struct RevokeBlobRangeRequest {
 	}
 };

+struct MinBlobVersionReply {
+	constexpr static FileIdentifier file_identifier = 6857512;
+	Version version;
+
+	template <class Ar>
+	void serialize(Ar& ar) {
+		serializer(ar, version);
+	}
+};
+
+struct MinBlobVersionRequest {
+	constexpr static FileIdentifier file_identifier = 4833278;
+	Version grv;
+	ReplyPromise<MinBlobVersionReply> reply;
+
+	MinBlobVersionRequest() {}
+
+	template <class Ar>
+	void serialize(Ar& ar) {
+		serializer(ar, grv, reply);
+	}
+};
+
 /*
  * Continue: Blob worker should continue handling a granule that was evaluated for a split
  * Normal: Blob worker should open the granule and start processing it

@@ -172,6 +200,7 @@ struct GranuleStatusReply : public ReplyPromiseStreamReply {
 	KeyRange granuleRange;
 	bool doSplit;
 	bool writeHotSplit;
+	bool initialSplitTooBig;
 	int64_t continueEpoch;
 	int64_t continueSeqno;
 	UID granuleID;

@@ -180,11 +209,13 @@ struct GranuleStatusReply : public ReplyPromiseStreamReply {
 	bool mergeCandidate;
 	int64_t originalEpoch;
 	int64_t originalSeqno;
+	Optional<Key> proposedSplitKey;

 	GranuleStatusReply() {}
 	explicit GranuleStatusReply(KeyRange range,
 	                            bool doSplit,
 	                            bool writeHotSplit,
+	                            bool initialSplitTooBig,
 	                            int64_t continueEpoch,
 	                            int64_t continueSeqno,
 	                            UID granuleID,

@@ -193,11 +224,15 @@ struct GranuleStatusReply : public ReplyPromiseStreamReply {
 	                            bool mergeCandidate,
 	                            int64_t originalEpoch,
 	                            int64_t originalSeqno)
-	  : granuleRange(range), doSplit(doSplit), writeHotSplit(writeHotSplit), continueEpoch(continueEpoch),
-	    continueSeqno(continueSeqno), granuleID(granuleID), startVersion(startVersion), blockedVersion(blockedVersion),
-	    mergeCandidate(mergeCandidate), originalEpoch(originalEpoch), originalSeqno(originalSeqno) {}
+	  : granuleRange(range), doSplit(doSplit), writeHotSplit(writeHotSplit), initialSplitTooBig(initialSplitTooBig),
+	    continueEpoch(continueEpoch), continueSeqno(continueSeqno), granuleID(granuleID), startVersion(startVersion),
+	    blockedVersion(blockedVersion), mergeCandidate(mergeCandidate), originalEpoch(originalEpoch),
+	    originalSeqno(originalSeqno) {}

-	int expectedSize() const { return sizeof(GranuleStatusReply) + granuleRange.expectedSize(); }
+	int expectedSize() const {
+		return sizeof(GranuleStatusReply) + granuleRange.expectedSize() +
+		       (proposedSplitKey.present() ? proposedSplitKey.get().expectedSize() : 0);
+	}

 	template <class Ar>
 	void serialize(Ar& ar) {

@@ -207,6 +242,7 @@ struct GranuleStatusReply : public ReplyPromiseStreamReply {
 		           granuleRange,
 		           doSplit,
 		           writeHotSplit,
+		           initialSplitTooBig,
 		           continueEpoch,
 		           continueSeqno,
 		           granuleID,

@@ -214,7 +250,8 @@ struct GranuleStatusReply : public ReplyPromiseStreamReply {
 		           blockedVersion,
 		           mergeCandidate,
 		           originalEpoch,
-		           originalSeqno);
+		           originalSeqno,
+		           proposedSplitKey);
 	}
 };
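
Two related changes above: blobGranuleFileRequest becomes a PublicRequestStream, and BlobGranuleFileRequest gains a verify() predicate that checks tenant authorization. The apparent intent is that externally reachable requests can be vetted before the server acts on them. A runnable stand-in for that gate (types here are simplified, not FDB's transport):

```
#include <cassert>

struct TenantInfo {
    bool authorized = false;
    bool isAuthorized() const { return authorized; }
};

struct FileRequest {
    TenantInfo tenantInfo;
    bool verify() const { return tenantInfo.isAuthorized(); }
};

// Transport-side gate: a public-facing stream can reject a request
// up front instead of letting the handler process it.
template <class Req>
bool admitPublicRequest(const Req& req) {
    return req.verify();
}

int main() {
    FileRequest ok{ { true } };
    FileRequest bad{ { false } };
    assert(admitPublicRequest(ok) && !admitPublicRequest(bad));
}
```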

@@ -39,10 +39,6 @@ public:
 	double FAILURE_MAX_DELAY;
 	double FAILURE_MIN_DELAY;
-	double FAILURE_TIMEOUT_DELAY;
-	double CLIENT_FAILURE_TIMEOUT_DELAY;
-	double FAILURE_EMERGENCY_DELAY;
-	double FAILURE_MAX_GENERATIONS;
 	double RECOVERY_DELAY_START_GENERATION;
 	double RECOVERY_DELAY_SECONDS_PER_GENERATION;
 	double MAX_GENERATIONS;

@@ -61,6 +57,7 @@ public:
 	double WRONG_SHARD_SERVER_DELAY; // SOMEDAY: This delay can limit performance of retrieving data when the cache is
 	                                 // mostly wrong (e.g. dumping the database after a test)
 	double FUTURE_VERSION_RETRY_DELAY;
+	double GRV_ERROR_RETRY_DELAY;
 	double UNKNOWN_TENANT_RETRY_DELAY;
 	int REPLY_BYTE_LIMIT;
 	double DEFAULT_BACKOFF;

@@ -161,10 +158,8 @@ public:
 	double BACKUP_AGGREGATE_POLL_RATE;
 	double BACKUP_AGGREGATE_POLL_RATE_UPDATE_INTERVAL;
 	int BACKUP_LOG_WRITE_BATCH_MAX_SIZE;
-	int BACKUP_LOG_ATOMIC_OPS_SIZE;
 	int BACKUP_MAX_LOG_RANGES;
 	int BACKUP_SIM_COPY_LOG_RANGES;
-	int BACKUP_OPERATION_COST_OVERHEAD;
 	int BACKUP_VERSION_DELAY;
 	int BACKUP_MAP_KEY_LOWER_LIMIT;
 	int BACKUP_MAP_KEY_UPPER_LIMIT;

@@ -269,10 +264,6 @@ public:
 	double BUSYNESS_SPIKE_START_THRESHOLD;
 	double BUSYNESS_SPIKE_SATURATED_THRESHOLD;

-	// multi-version client control
-	int MVC_CLIENTLIB_CHUNK_SIZE;
-	int MVC_CLIENTLIB_CHUNKS_PER_TRANSACTION;
-
 	// Blob Granules
 	int BG_MAX_GRANULE_PARALLELISM;
 	int BG_TOO_MANY_GRANULES;

@@ -98,32 +98,44 @@ struct ClusterControllerClientInterface {
 	}
 };

-template <class T>
-struct ItemWithExamples {
-	T item;
-	int count;
-	std::vector<std::pair<NetworkAddress, Key>> examples;
-
-	ItemWithExamples() : item{}, count(0) {}
-	ItemWithExamples(T const& item, int count, std::vector<std::pair<NetworkAddress, Key>> const& examples)
-	  : item(item), count(count), examples(examples) {}
-
-	template <class Ar>
-	void serialize(Ar& ar) {
-		serializer(ar, item, count, examples);
-	}
-};
-
 struct OpenDatabaseRequest {
 	constexpr static FileIdentifier file_identifier = 2799502;
 	// Sent by the native API to the cluster controller to open a database and track client
 	// info changes. Returns immediately if the current client info id is different from
 	// knownClientInfoID; otherwise returns when it next changes (or perhaps after a long interval)

-	int clientCount;
-	std::vector<ItemWithExamples<Key>> issues;
-	std::vector<ItemWithExamples<Standalone<ClientVersionRef>>> supportedVersions;
-	std::vector<ItemWithExamples<Key>> maxProtocolSupported;
+	struct Samples {
+		int count;
+
+		// network address / trace log group
+		std::set<std::pair<NetworkAddress, Key>> samples;
+
+		Samples() : count(0), samples{} {}
+
+		template <typename Ar>
+		void serialize(Ar& ar) {
+			serializer(ar, count, samples);
+		}
+
+		// Merges a set of Samples into *this
+		Samples& operator+=(const Samples& other) {
+			count += other.count;
+			samples.insert(std::begin(other.samples), std::end(other.samples));
+			return *this;
+		}
+	};
+
+	int clientCount = 0;
+
+	// Maps issue to Samples
+	std::map<Key, Samples> issues;
+	// Maps ClientVersionRef to Samples
+	std::map<Standalone<ClientVersionRef>, Samples> supportedVersions;
+	// Maps max protocol to Samples
+	std::map<Key, Samples> maxProtocolSupported;

 	UID knownClientInfoID;
 	ReplyPromise<struct ClientDBInfo> reply;
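
The Samples struct above replaces the per-item example vectors with a count plus a deduplicating set, and operator+= makes per-source results mergeable. A runnable sketch of that aggregation (NetworkAddress and Key are stand-ins here):

```
#include <cassert>
#include <set>
#include <string>
#include <utility>

struct Samples {
    int count = 0;
    std::set<std::pair<std::string, std::string>> samples; // (network address, trace log group)

    Samples& operator+=(const Samples& other) {
        count += other.count; // counts accumulate
        samples.insert(other.samples.begin(), other.samples.end()); // examples deduplicate
        return *this;
    }
};

int main() {
    Samples a{ 2, { { "10.0.0.1:4500", "groupA" } } };
    Samples b{ 3, { { "10.0.0.2:4500", "groupB" }, { "10.0.0.1:4500", "groupA" } } };
    a += b;
    assert(a.count == 5 && a.samples.size() == 2); // the duplicate example collapses
}
```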

@@ -25,6 +25,7 @@
 #include "fdbclient/FDBTypes.h"
 #include "fdbclient/Knobs.h"
 #include "fdbclient/Tracing.h"
+#include "flow/BlobCipher.h"

 // The versioned message has wire format : -1, version, messages
 static const int32_t VERSION_HEADER = -1;

@@ -79,7 +80,7 @@ struct MutationRef {
 		CompareAndClear,
 		Reserved_For_SpanContextMessage /* See fdbserver/SpanContextMessage.h */,
 		Reserved_For_OTELSpanContextMessage,
-		Reserved_For_EncryptedMutationMessage /* See fdbserver/EncryptedMutationMessage.actor.h */,
+		Encrypted, /* Represents an encrypted mutation and cannot be used directly before decrypting */
 		MAX_ATOMIC_OP
 	};
 	// This is stored this way for serialization purposes.

@@ -128,6 +129,64 @@ struct MutationRef {
 		}
 	}

+	// An encrypted mutation has type Encrypted, the encryption header (which contains encryption metadata) as param1,
+	// and the payload as param2. It can be serialized/deserialized as a normal mutation, but can only be used after
+	// decryption via decrypt().
+	bool isEncrypted() const { return type == Encrypted; }
+
+	const BlobCipherEncryptHeader* encryptionHeader() const {
+		ASSERT(isEncrypted());
+		return reinterpret_cast<const BlobCipherEncryptHeader*>(param1.begin());
+	}
+
+	MutationRef encrypt(const std::unordered_map<EncryptCipherDomainId, Reference<BlobCipherKey>>& cipherKeys,
+	                    const EncryptCipherDomainId& domainId,
+	                    Arena& arena) const {
+		ASSERT_NE(domainId, ENCRYPT_INVALID_DOMAIN_ID);
+		auto textCipherItr = cipherKeys.find(domainId);
+		auto headerCipherItr = cipherKeys.find(ENCRYPT_HEADER_DOMAIN_ID);
+		ASSERT(textCipherItr != cipherKeys.end() && textCipherItr->second.isValid());
+		ASSERT(headerCipherItr != cipherKeys.end() && headerCipherItr->second.isValid());
+		uint8_t iv[AES_256_IV_LENGTH] = { 0 };
+		deterministicRandom()->randomBytes(iv, AES_256_IV_LENGTH);
+		BinaryWriter bw(AssumeVersion(ProtocolVersion::withEncryptionAtRest()));
+		bw << *this;
+		EncryptBlobCipherAes265Ctr cipher(textCipherItr->second,
+		                                  headerCipherItr->second,
+		                                  iv,
+		                                  AES_256_IV_LENGTH,
+		                                  ENCRYPT_HEADER_AUTH_TOKEN_MODE_SINGLE);
+		BlobCipherEncryptHeader* header = new (arena) BlobCipherEncryptHeader;
+		StringRef headerRef(reinterpret_cast<const uint8_t*>(header), sizeof(BlobCipherEncryptHeader));
+		StringRef payload =
+		    cipher.encrypt(static_cast<const uint8_t*>(bw.getData()), bw.getLength(), header, arena)->toStringRef();
+		return MutationRef(Encrypted, headerRef, payload);
+	}
+
+	MutationRef encryptMetadata(const std::unordered_map<EncryptCipherDomainId, Reference<BlobCipherKey>>& cipherKeys,
+	                            Arena& arena) const {
+		return encrypt(cipherKeys, SYSTEM_KEYSPACE_ENCRYPT_DOMAIN_ID, arena);
+	}
+
+	MutationRef decrypt(const std::unordered_map<BlobCipherDetails, Reference<BlobCipherKey>>& cipherKeys,
+	                    Arena& arena,
+	                    StringRef* buf = nullptr) const {
+		const BlobCipherEncryptHeader* header = encryptionHeader();
+		auto textCipherItr = cipherKeys.find(header->cipherTextDetails);
+		auto headerCipherItr = cipherKeys.find(header->cipherHeaderDetails);
+		ASSERT(textCipherItr != cipherKeys.end() && textCipherItr->second.isValid());
+		ASSERT(headerCipherItr != cipherKeys.end() && headerCipherItr->second.isValid());
+		DecryptBlobCipherAes256Ctr cipher(textCipherItr->second, headerCipherItr->second, header->iv);
+		StringRef plaintext = cipher.decrypt(param2.begin(), param2.size(), *header, arena)->toStringRef();
+		if (buf != nullptr) {
+			*buf = plaintext;
+		}
+		ArenaReader reader(arena, plaintext, AssumeVersion(ProtocolVersion::withEncryptionAtRest()));
+		MutationRef mutation;
+		reader >> mutation;
+		return mutation;
+	}
+
 	// These masks define which mutation types have particular properties (they are used to implement
 	// isSingleKeyMutation() etc)
 	enum {
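
The flow implemented above: serialize the whole mutation, encrypt those bytes, carry the encryption header in param1 and the ciphertext in param2, then reverse the steps in decrypt(). A runnable stand-in for that round trip, with XOR standing in purely as a placeholder for AES-256-CTR and simplified types throughout:

```
#include <cassert>
#include <cstdint>
#include <string>

struct Mutation {
    uint8_t type;
    std::string param1, param2; // for Encrypted: header, then ciphertext
};

static std::string xorWith(std::string s, uint8_t k) {
    for (auto& c : s)
        c = static_cast<char>(c ^ k); // placeholder "cipher", not real encryption
    return s;
}

int main() {
    const uint8_t Encrypted = 255; // stand-in for MutationRef::Encrypted
    const uint8_t key = 0x5a;      // placeholder; real code resolves BlobCipherKeys by domain
    Mutation m{ 0, "foo", "bar" };

    // encrypt(): serialize the mutation, encrypt the bytes, record metadata in the header
    std::string plaintext = m.param1 + "|" + m.param2;
    Mutation enc{ Encrypted, "header(cipher ids, iv)", xorWith(plaintext, key) };
    assert(enc.type == Encrypted);

    // decrypt(): use the header's details to pick keys, then deserialize
    std::string roundTrip = xorWith(enc.param2, key);
    assert(roundTrip == plaintext);
}
```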

@@ -25,6 +25,7 @@
 #include "flow/FastRef.h"
 #include "fdbclient/GlobalConfig.actor.h"
 #include "fdbclient/StorageServerInterface.h"
+#include "flow/IRandom.h"
 #include "flow/genericactors.actor.h"
 #include <vector>
 #include <unordered_map>

@@ -180,6 +181,8 @@ struct ChangeFeedData : ReferenceCounted<ChangeFeedData> {
 	Version getVersion();
 	Future<Void> whenAtLeast(Version version);

+	UID dbgid;
+	DatabaseContext* context;
 	NotifiedVersion lastReturnedVersion;
 	std::vector<Reference<ChangeFeedStorageData>> storageData;
 	AsyncVar<int> notAtLatest;

@@ -189,7 +192,8 @@ struct ChangeFeedData : ReferenceCounted<ChangeFeedData> {
 	Version popVersion =
 	    invalidVersion; // like TLog pop version, set by SS and client can check it to see if they missed data

-	ChangeFeedData() : notAtLatest(1) {}
+	explicit ChangeFeedData(DatabaseContext* context = nullptr);
+	~ChangeFeedData();
 };

 struct EndpointFailureInfo {

@@ -374,12 +378,18 @@ public:
 	Future<OverlappingChangeFeedsInfo> getOverlappingChangeFeeds(KeyRangeRef ranges, Version minVersion);
 	Future<Void> popChangeFeedMutations(Key rangeID, Version version);

+	// BlobGranule API.
 	Future<Key> purgeBlobGranules(KeyRange keyRange,
 	                              Version purgeVersion,
 	                              Optional<TenantName> tenant,
 	                              bool force = false);
 	Future<Void> waitPurgeGranulesComplete(Key purgeKey);
+	Future<bool> blobbifyRange(KeyRange range);
+	Future<bool> unblobbifyRange(KeyRange range);
+	Future<Standalone<VectorRef<KeyRangeRef>>> listBlobbifiedRanges(KeyRange range, int rangeLimit);
+	Future<Version> verifyBlobRange(const KeyRange& range, Optional<Version> version);

 	// private:
 	explicit DatabaseContext(Reference<AsyncVar<Reference<IClusterConnectionRecord>>> connectionRecord,
 	                         Reference<AsyncVar<ClientDBInfo>> clientDBInfo,

@@ -468,8 +478,11 @@ public:
 	// map from changeFeedId -> changeFeedRange
 	std::unordered_map<Key, KeyRange> changeFeedCache;
 	std::unordered_map<UID, Reference<ChangeFeedStorageData>> changeFeedUpdaters;
+	std::map<UID, ChangeFeedData*> notAtLatestChangeFeeds;

 	Reference<ChangeFeedStorageData> getStorageData(StorageServerInterface interf);
+	Version getMinimumChangeFeedVersion();
+	void setDesiredChangeFeedVersion(Version v);

 	// map from ssid -> ss tag
 	// @note this map allows the client to identify the latest commit versions

@@ -1418,7 +1418,7 @@ struct DatabaseSharedState {
 	std::atomic<int> refCount;

 	DatabaseSharedState()
-	  : protocolVersion(currentProtocolVersion), mutexLock(Mutex()), grvCacheSpace(GRVCacheSpace()), refCount(0) {}
+	  : protocolVersion(currentProtocolVersion()), mutexLock(Mutex()), grvCacheSpace(GRVCacheSpace()), refCount(0) {}
 };

 inline bool isValidPerpetualStorageWiggleLocality(std::string locality) {

@@ -1465,7 +1465,7 @@ struct StorageMetadataType {
 	bool wrongConfigured = false;

 	StorageMetadataType() : createdTime(0) {}
-	StorageMetadataType(uint64_t t, KeyValueStoreType storeType = KeyValueStoreType::END, bool wrongConfigured = false)
+	StorageMetadataType(double t, KeyValueStoreType storeType = KeyValueStoreType::END, bool wrongConfigured = false)
 	  : createdTime(t), storeType(storeType), wrongConfigured(wrongConfigured) {}

 	static double currentTime() { return g_network->timer(); }

@@ -480,7 +480,7 @@ Future<ConfigurationResult> changeConfig(Reference<DB> db, std::map<std::string,
 	if (newConfig.tenantMode != oldConfig.tenantMode) {
 		Optional<MetaclusterRegistrationEntry> metaclusterRegistration =
-		    wait(MetaclusterMetadata::metaclusterRegistration.get(tr));
+		    wait(MetaclusterMetadata::metaclusterRegistration().get(tr));
 		if (metaclusterRegistration.present()) {
 			return ConfigurationResult::DATABASE_IS_REGISTERED;
 		}

@@ -1,4 +1,3 @@
 /*
  * GrvProxyInterface.h
  *

@@ -78,7 +78,8 @@ public:
 	virtual ThreadFuture<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(const KeyRangeRef& range,
 	                                                                        int64_t chunkSize) = 0;
-	virtual ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(const KeyRangeRef& keyRange) = 0;
+	virtual ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(const KeyRangeRef& keyRange,
+	                                                                              int rowLimit) = 0;

 	virtual ThreadResult<RangeResult> readBlobGranules(const KeyRangeRef& keyRange,
 	                                                   Version beginVersion,

@@ -172,6 +173,13 @@ public:
 	virtual ThreadFuture<Key> purgeBlobGranules(const KeyRangeRef& keyRange, Version purgeVersion, bool force) = 0;
 	virtual ThreadFuture<Void> waitPurgeGranulesComplete(const KeyRef& purgeKey) = 0;

+	virtual ThreadFuture<bool> blobbifyRange(const KeyRangeRef& keyRange) = 0;
+	virtual ThreadFuture<bool> unblobbifyRange(const KeyRangeRef& keyRange) = 0;
+	virtual ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> listBlobbifiedRanges(const KeyRangeRef& keyRange,
+	                                                                              int rangeLimit) = 0;
+	virtual ThreadFuture<Version> verifyBlobRange(const KeyRangeRef& keyRange, Optional<Version> version) = 0;
+
 	// Interface to manage shared state across multiple connections to the same Database
 	virtual ThreadFuture<DatabaseSharedState*> createSharedState() = 0;
 	virtual void setSharedState(DatabaseSharedState* p) = 0;

@@ -190,6 +198,7 @@ public:
 	virtual void selectApiVersion(int apiVersion) = 0;
 	virtual const char* getClientVersion() = 0;
+	virtual void useFutureProtocolVersion() = 0;

 	virtual void setNetworkOption(FDBNetworkOptions::Option option,
 	                              Optional<StringRef> value = Optional<StringRef>()) = 0;

@@ -55,7 +55,7 @@ public:
 	Future<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(KeyRange const& range, int64_t chunkSize) override {
 		throw client_invalid_operation();
 	}
-	Future<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(KeyRange const& range) override {
+	Future<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(KeyRange const& range, int rowLimit) override {
 		throw client_invalid_operation();
 	}
 	Future<Standalone<VectorRef<BlobGranuleChunkRef>>> readBlobGranules(KeyRange const& range,

@@ -80,7 +80,7 @@ public:
 	virtual Future<Standalone<VectorRef<const char*>>> getAddressesForKey(Key const& key) = 0;
 	virtual Future<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(KeyRange const& range, int64_t chunkSize) = 0;
 	virtual Future<int64_t> getEstimatedRangeSizeBytes(KeyRange const& keys) = 0;
-	virtual Future<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(KeyRange const& range) = 0;
+	virtual Future<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(KeyRange const& range, int rangeLimit) = 0;
 	virtual Future<Standalone<VectorRef<BlobGranuleChunkRef>>> readBlobGranules(KeyRange const& range,
 	                                                                            Version begin,
 	                                                                            Optional<Version> readVersion,

@@ -25,6 +25,7 @@
 #include "fdbclient/ClientBooleanParams.h"
 #include "fdbclient/CommitTransaction.h"
+#include "fdbclient/FDBOptions.g.h"
 #include "fdbclient/GenericTransactionHelper.h"
 #include "fdbclient/Subspace.h"
 #include "flow/ObjectSerializer.h"

@@ -53,16 +53,25 @@ struct Traceable<ClusterUsage> : std::true_type {
    }
};

+// Represents the various states that a data cluster could be in.
+//
+// READY - the data cluster is active
+// REMOVING - the data cluster is being removed and cannot have its configuration changed or any tenants created
+// RESTORING - the data cluster is being restored and cannot have its configuration changed or any tenants
+// created/updated/deleted.
+enum class DataClusterState { READY, REMOVING, RESTORING };
+
struct DataClusterEntry {
    constexpr static FileIdentifier file_identifier = 929511;

+   static std::string clusterStateToString(DataClusterState clusterState);
+   static DataClusterState stringToClusterState(std::string stateStr);
+
    UID id;
    ClusterUsage capacity;
    ClusterUsage allocated;

-   // If true, then tenant groups cannot be assigned to this cluster. This is used when a cluster is being forcefully
-   // removed.
-   bool locked = false;
+   DataClusterState clusterState = DataClusterState::READY;

    DataClusterEntry() = default;
    DataClusterEntry(ClusterUsage capacity) : capacity(capacity) {}
@@ -81,19 +90,11 @@ struct DataClusterEntry {
        return ObjectReader::fromStringRef<DataClusterEntry>(value, IncludeVersion());
    }

-   json_spirit::mObject toJson() const {
-       json_spirit::mObject obj;
-       obj["capacity"] = capacity.toJson();
-       obj["allocated"] = allocated.toJson();
-       if (locked) {
-           obj["locked"] = locked;
-       }
-       return obj;
-   }
+   json_spirit::mObject toJson() const;

    template <class Ar>
    void serialize(Ar& ar) {
-       serializer(ar, id, capacity, allocated, locked);
+       serializer(ar, id, capacity, allocated, clusterState);
    }
};
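The to/from-string helpers declared above are not defined in this hunk. A minimal sketch of what the definition could look like, assuming the obvious lowercase state names (the exact strings are an assumption, not taken from this diff):

    // Hypothetical definition; the real one lives in the corresponding .cpp file.
    std::string DataClusterEntry::clusterStateToString(DataClusterState clusterState) {
        switch (clusterState) {
        case DataClusterState::READY:
            return "ready";
        case DataClusterState::REMOVING:
            return "removing";
        case DataClusterState::RESTORING:
            return "restoring";
        default:
            throw internal_error(); // unreachable if the enum stays in sync
        }
    }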
@@ -176,10 +177,7 @@ struct Traceable<MetaclusterRegistrationEntry> : std::true_type {
struct MetaclusterMetadata {
    // Registration information for a metacluster, stored on both management and data clusters
-   static inline KeyBackedObjectProperty<MetaclusterRegistrationEntry, decltype(IncludeVersion())>
-       metaclusterRegistration = KeyBackedObjectProperty<MetaclusterRegistrationEntry, decltype(IncludeVersion())>(
-           "\xff/metacluster/clusterRegistration"_sr,
-           IncludeVersion());
+   static KeyBackedObjectProperty<MetaclusterRegistrationEntry, decltype(IncludeVersion())>& metaclusterRegistration();
};

#endif
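This hunk converts the registration property from a static inline member into a function returning a reference, i.e. the construct-on-first-use idiom, which avoids static initialization order problems across translation units. A sketch of the likely out-of-line definition, reusing the key and codec from the removed initializer:

    KeyBackedObjectProperty<MetaclusterRegistrationEntry, decltype(IncludeVersion())>&
    MetaclusterMetadata::metaclusterRegistration() {
        static KeyBackedObjectProperty<MetaclusterRegistrationEntry, decltype(IncludeVersion())> instance(
            "\xff/metacluster/clusterRegistration"_sr, IncludeVersion());
        return instance;
    }

The same pattern presumably backs the other member-to-function conversions below (tenantMetadata(), dataClusters(), and friends).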

View File

@@ -93,10 +93,10 @@ struct ManagementClusterMetadata {
        }
    };

-   static inline TenantMetadataSpecification tenantMetadata = TenantMetadataSpecification(""_sr);
+   static TenantMetadataSpecification& tenantMetadata();

    // A map from cluster name to the metadata associated with a cluster
-   static KeyBackedObjectMap<ClusterName, DataClusterEntry, decltype(IncludeVersion())> dataClusters;
+   static KeyBackedObjectMap<ClusterName, DataClusterEntry, decltype(IncludeVersion())>& dataClusters();

    // A map from cluster name to the connection string for the cluster
    static KeyBackedMap<ClusterName, ClusterConnectionString, TupleCodec<ClusterName>, ConnectionStringCodec>

@@ -108,10 +108,10 @@ struct ManagementClusterMetadata {
    // A map from cluster name to a count of tenants
    static KeyBackedMap<ClusterName, int64_t, TupleCodec<ClusterName>, BinaryCodec<int64_t>> clusterTenantCount;

-   // A set of cluster/tenant pairings ordered by cluster
+   // A set of (cluster name, tenant name, tenant ID) tuples ordered by cluster
    static KeyBackedSet<Tuple> clusterTenantIndex;

-   // A set of cluster/tenant group pairings ordered by cluster
+   // A set of (cluster, tenant group name) tuples ordered by cluster
    static KeyBackedSet<Tuple> clusterTenantGroupIndex;
};
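The widened clusterTenantIndex entries can be seen being written later in this same change; an index entry now carries the tenant ID as a third tuple element (names adapted from the CreateTenantImpl hunk below):

    // The tenant ID is appended to the index tuple.
    ManagementClusterMetadata::clusterTenantIndex.insert(
        tr, Tuple::makeTuple(tenantEntry.assignedCluster.get(), tenantName, tenantEntry.id));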
@@ -124,7 +124,8 @@ Future<Optional<DataClusterMetadata>> tryGetClusterTransaction(Transaction tr, C
    state Future<Void> metaclusterRegistrationCheck =
        TenantAPI::checkTenantMode(tr, ClusterType::METACLUSTER_MANAGEMENT);

-   state Future<Optional<DataClusterEntry>> clusterEntryFuture = ManagementClusterMetadata::dataClusters.get(tr, name);
+   state Future<Optional<DataClusterEntry>> clusterEntryFuture =
+       ManagementClusterMetadata::dataClusters().get(tr, name);
    state Future<Optional<ClusterConnectionString>> connectionRecordFuture =
        ManagementClusterMetadata::dataClusterConnectionRecords.get(tr, name);

@@ -222,7 +223,7 @@ struct MetaclusterOperationContext {
        // Get the metacluster registration information
        state Optional<MetaclusterRegistrationEntry> currentMetaclusterRegistration =
-           wait(MetaclusterMetadata::metaclusterRegistration.get(tr));
+           wait(MetaclusterMetadata::metaclusterRegistration().get(tr));

        state Optional<DataClusterMetadata> currentDataClusterMetadata;
        if (self->clusterName.present()) {

@@ -303,7 +304,7 @@ struct MetaclusterOperationContext {
        tr->setOption(FDBTransactionOptions::ACCESS_SYSTEM_KEYS);
        state Optional<MetaclusterRegistrationEntry> currentMetaclusterRegistration =
-           wait(MetaclusterMetadata::metaclusterRegistration.get(tr));
+           wait(MetaclusterMetadata::metaclusterRegistration().get(tr));

        // Check that this is the expected data cluster and is part of the right metacluster
        if (!currentMetaclusterRegistration.present() ||

@@ -371,7 +372,7 @@ struct MetaclusterOperationContext {
template <class Transaction>
Future<Optional<TenantMapEntry>> tryGetTenantTransaction(Transaction tr, TenantName name) {
    tr->setOption(FDBTransactionOptions::RAW_ACCESS);
-   return ManagementClusterMetadata::tenantMetadata.tenantMap.get(tr, name);
+   return ManagementClusterMetadata::tenantMetadata().tenantMap.get(tr, name);
}

ACTOR template <class DB>

@@ -413,7 +414,7 @@ Future<TenantMapEntry> getTenant(Reference<DB> db, TenantName name) {
ACTOR template <class Transaction>
Future<Void> managementClusterCheckEmpty(Transaction tr) {
    state Future<KeyBackedRangeResult<std::pair<TenantName, TenantMapEntry>>> tenantsFuture =
-       TenantMetadata::tenantMap.getRange(tr, {}, {}, 1);
+       TenantMetadata::tenantMap().getRange(tr, {}, {}, 1);
    state typename transaction_future_type<Transaction, RangeResult>::type dbContentsFuture =
        tr->getRange(normalKeys, 1);
@@ -440,7 +441,7 @@ Future<Optional<std::string>> createMetacluster(Reference<DB> db, ClusterName na
            tr->setOption(FDBTransactionOptions::ACCESS_SYSTEM_KEYS);
            state Future<Optional<MetaclusterRegistrationEntry>> metaclusterRegistrationFuture =
-               MetaclusterMetadata::metaclusterRegistration.get(tr);
+               MetaclusterMetadata::metaclusterRegistration().get(tr);

            wait(managementClusterCheckEmpty(tr));

@@ -461,8 +462,8 @@ Future<Optional<std::string>> createMetacluster(Reference<DB> db, ClusterName na
                metaclusterUid = deterministicRandom()->randomUniqueID();
            }
-           MetaclusterMetadata::metaclusterRegistration.set(tr,
-                                                            MetaclusterRegistrationEntry(name, metaclusterUid.get()));
+           MetaclusterMetadata::metaclusterRegistration().set(
+               tr, MetaclusterRegistrationEntry(name, metaclusterUid.get()));

            wait(buggifiedCommit(tr, BUGGIFY_WITH_PROB(0.1)));
            break;

@@ -483,10 +484,7 @@ Future<Void> decommissionMetacluster(Reference<DB> db) {
        try {
            tr->setOption(FDBTransactionOptions::ACCESS_SYSTEM_KEYS);

-           state Future<ClusterType> clusterTypeFuture = TenantAPI::getClusterType(tr);
-           wait(managementClusterCheckEmpty(tr));
-           ClusterType clusterType = wait(clusterTypeFuture);
+           ClusterType clusterType = wait(TenantAPI::getClusterType(tr));
            if (clusterType != ClusterType::METACLUSTER_MANAGEMENT) {
                if (firstTry) {
                    throw invalid_metacluster_operation();

@@ -495,7 +493,15 @@ Future<Void> decommissionMetacluster(Reference<DB> db) {
                }
            }

-           MetaclusterMetadata::metaclusterRegistration.clear(tr);
+           // Erase all metadata not associated with specific tenants prior to checking
+           // cluster emptiness
+           ManagementClusterMetadata::tenantMetadata().tenantCount.clear(tr);
+           ManagementClusterMetadata::tenantMetadata().lastTenantId.clear(tr);
+           ManagementClusterMetadata::tenantMetadata().tenantTombstones.clear(tr);
+           ManagementClusterMetadata::tenantMetadata().tombstoneCleanupData.clear(tr);
+
+           wait(managementClusterCheckEmpty(tr));
+           MetaclusterMetadata::metaclusterRegistration().clear(tr);

            firstTry = false;
            wait(buggifiedCommit(tr, BUGGIFY_WITH_PROB(0.1)));
@@ -535,10 +541,10 @@ void updateClusterMetadata(Transaction tr,
                           Optional<DataClusterEntry> const& updatedEntry) {

    if (updatedEntry.present()) {
-       if (previousMetadata.entry.locked) {
-           throw cluster_locked();
+       if (previousMetadata.entry.clusterState == DataClusterState::REMOVING) {
+           throw cluster_removed();
        }
-       ManagementClusterMetadata::dataClusters.set(tr, name, updatedEntry.get());
+       ManagementClusterMetadata::dataClusters().set(tr, name, updatedEntry.get());
        updateClusterCapacityIndex(tr, name, previousMetadata.entry, updatedEntry.get());
    }
    if (updatedConnectionString.present()) {

@@ -584,7 +590,7 @@ struct RegisterClusterImpl {
        // Check whether this cluster has already been registered
        state Optional<MetaclusterRegistrationEntry> existingRegistration =
-           wait(MetaclusterMetadata::metaclusterRegistration.get(tr));
+           wait(MetaclusterMetadata::metaclusterRegistration().get(tr));
        if (existingRegistration.present()) {
            if (existingRegistration.get().clusterType != ClusterType::METACLUSTER_DATA ||
                existingRegistration.get().name != self->clusterName ||

@@ -612,7 +618,7 @@ struct RegisterClusterImpl {
        }

        self->clusterEntry.id = deterministicRandom()->randomUniqueID();
-       MetaclusterMetadata::metaclusterRegistration.set(
+       MetaclusterMetadata::metaclusterRegistration().set(
            tr,
            self->ctx.metaclusterRegistration.get().toDataClusterRegistration(self->clusterName,
                                                                              self->clusterEntry.id));

@@ -647,7 +653,7 @@ struct RegisterClusterImpl {
            ManagementClusterMetadata::clusterCapacityIndex.insert(
                tr, Tuple::makeTuple(self->clusterEntry.allocated.numTenantGroups, self->clusterName));
        }
-       ManagementClusterMetadata::dataClusters.set(tr, self->clusterName, self->clusterEntry);
+       ManagementClusterMetadata::dataClusters().set(tr, self->clusterName, self->clusterEntry);
        ManagementClusterMetadata::dataClusterConnectionRecords.set(tr, self->clusterName, self->connectionString);
    }
@@ -703,21 +709,21 @@ struct RemoveClusterImpl {
    // Initialization parameters
    bool forceRemove;

-   // Parameters set in lockDataCluster
+   // Parameters set in markClusterRemoving
    Optional<int64_t> lastTenantId;

    RemoveClusterImpl(Reference<DB> managementDb, ClusterName clusterName, bool forceRemove)
      : ctx(managementDb, clusterName), forceRemove(forceRemove) {}

    // Returns false if the cluster is no longer present, or true if it is present and the removal should proceed.
-   ACTOR static Future<bool> lockDataCluster(RemoveClusterImpl* self, Reference<typename DB::TransactionT> tr) {
+   ACTOR static Future<bool> markClusterRemoving(RemoveClusterImpl* self, Reference<typename DB::TransactionT> tr) {
        if (!self->forceRemove && self->ctx.dataClusterMetadata.get().entry.allocated.numTenantGroups > 0) {
            throw cluster_not_empty();
-       } else if (!self->ctx.dataClusterMetadata.get().entry.locked) {
-           // Lock the cluster while we finish the remaining removal steps to prevent new tenants from being
-           // assigned to it.
+       } else if (self->ctx.dataClusterMetadata.get().entry.clusterState != DataClusterState::REMOVING) {
+           // Mark the cluster in a removing state while we finish the remaining removal steps. This prevents new
+           // tenants from being assigned to it.
            DataClusterEntry updatedEntry = self->ctx.dataClusterMetadata.get().entry;
-           updatedEntry.locked = true;
+           updatedEntry.clusterState = DataClusterState::REMOVING;
            updatedEntry.capacity.numTenantGroups = 0;

            updateClusterMetadata(tr,

@@ -734,11 +740,11 @@ struct RemoveClusterImpl {
        // Get the last allocated tenant ID to be used on the detached data cluster
        if (self->forceRemove) {
-           Optional<int64_t> lastId = wait(ManagementClusterMetadata::tenantMetadata.lastTenantId.get(tr));
+           Optional<int64_t> lastId = wait(ManagementClusterMetadata::tenantMetadata().lastTenantId.get(tr));
            self->lastTenantId = lastId;
        }

-       TraceEvent("LockedDataCluster")
+       TraceEvent("MarkedDataClusterRemoving")
            .detail("Name", self->ctx.clusterName.get())
            .detail("Version", tr->getCommittedVersion());

@@ -748,18 +754,18 @@ struct RemoveClusterImpl {
    // Delete metacluster metadata from the data cluster
    ACTOR static Future<Void> updateDataCluster(RemoveClusterImpl* self, Reference<ITransaction> tr) {
        // Delete metacluster related metadata
-       MetaclusterMetadata::metaclusterRegistration.clear(tr);
-       TenantMetadata::tenantTombstones.clear(tr);
-       TenantMetadata::tombstoneCleanupData.clear(tr);
+       MetaclusterMetadata::metaclusterRegistration().clear(tr);
+       TenantMetadata::tenantTombstones().clear(tr);
+       TenantMetadata::tombstoneCleanupData().clear(tr);

        // If we are force removing a cluster, then it will potentially contain tenants that have IDs
        // larger than the next tenant ID to be allocated on the cluster. To avoid collisions, we advance
        // the ID so that it will be the larger of the current one on the data cluster and the management
        // cluster.
        if (self->lastTenantId.present()) {
-           Optional<int64_t> lastId = wait(TenantMetadata::lastTenantId.get(tr));
+           Optional<int64_t> lastId = wait(TenantMetadata::lastTenantId().get(tr));
            if (!lastId.present() || lastId.get() < self->lastTenantId.get()) {
-               TenantMetadata::lastTenantId.set(tr, self->lastTenantId.get());
+               TenantMetadata::lastTenantId().set(tr, self->lastTenantId.get());
            }
        }
@@ -774,6 +780,8 @@ struct RemoveClusterImpl {
    ACTOR static Future<bool> purgeTenants(RemoveClusterImpl* self,
                                           Reference<typename DB::TransactionT> tr,
                                           std::pair<Tuple, Tuple> clusterTupleRange) {
+       ASSERT(self->ctx.dataClusterMetadata.get().entry.clusterState == DataClusterState::REMOVING);
+
        // Get the list of tenants
        state Future<KeyBackedRangeResult<Tuple>> tenantEntriesFuture =
            ManagementClusterMetadata::clusterTenantIndex.getRange(

@@ -784,7 +792,8 @@ struct RemoveClusterImpl {
        // Erase each tenant from the tenant map on the management cluster
        for (Tuple entry : tenantEntries.results) {
            ASSERT(entry.getString(0) == self->ctx.clusterName.get());
-           ManagementClusterMetadata::tenantMetadata.tenantMap.erase(tr, entry.getString(1));
+           ManagementClusterMetadata::tenantMetadata().tenantMap.erase(tr, entry.getString(1));
+           ManagementClusterMetadata::tenantMetadata().tenantIdIndex.erase(tr, entry.getInt(2));
        }

        // Erase all of the tenants processed in this transaction from the cluster tenant index

@@ -795,7 +804,7 @@ struct RemoveClusterImpl {
                Tuple::makeTuple(self->ctx.clusterName.get(), keyAfter(tenantEntries.results.rbegin()->getString(1))));
        }

-       ManagementClusterMetadata::tenantMetadata.tenantCount.atomicOp(
+       ManagementClusterMetadata::tenantMetadata().tenantCount.atomicOp(
            tr, -tenantEntries.results.size(), MutationRef::AddValue);
        ManagementClusterMetadata::clusterTenantCount.atomicOp(
            tr, self->ctx.clusterName.get(), -tenantEntries.results.size(), MutationRef::AddValue);

@@ -807,6 +816,8 @@ struct RemoveClusterImpl {
    ACTOR static Future<bool> purgeTenantGroupsAndDataCluster(RemoveClusterImpl* self,
                                                              Reference<typename DB::TransactionT> tr,
                                                              std::pair<Tuple, Tuple> clusterTupleRange) {
+       ASSERT(self->ctx.dataClusterMetadata.get().entry.clusterState == DataClusterState::REMOVING);
+
        // Get the list of tenant groups
        state Future<KeyBackedRangeResult<Tuple>> tenantGroupEntriesFuture =
            ManagementClusterMetadata::clusterTenantGroupIndex.getRange(

@@ -817,9 +828,9 @@ struct RemoveClusterImpl {
        for (Tuple entry : tenantGroupEntries.results) {
            ASSERT(entry.getString(0) == self->ctx.clusterName.get());
            TenantGroupName tenantGroup = entry.getString(1);
-           ManagementClusterMetadata::tenantMetadata.tenantGroupTenantIndex.erase(
+           ManagementClusterMetadata::tenantMetadata().tenantGroupTenantIndex.erase(
                tr, Tuple::makeTuple(tenantGroup), Tuple::makeTuple(keyAfter(tenantGroup)));
-           ManagementClusterMetadata::tenantMetadata.tenantGroupMap.erase(tr, tenantGroup);
+           ManagementClusterMetadata::tenantMetadata().tenantGroupMap.erase(tr, tenantGroup);
        }

        if (!tenantGroupEntries.results.empty()) {

@@ -833,7 +844,7 @@ struct RemoveClusterImpl {
        // Erase the data cluster record from the management cluster if processing our last batch
        if (!tenantGroupEntries.more) {
-           ManagementClusterMetadata::dataClusters.erase(tr, self->ctx.clusterName.get());
+           ManagementClusterMetadata::dataClusters().erase(tr, self->ctx.clusterName.get());
            ManagementClusterMetadata::dataClusterConnectionRecords.erase(tr, self->ctx.clusterName.get());
            ManagementClusterMetadata::clusterTenantCount.erase(tr, self->ctx.clusterName.get());
        }
@@ -874,8 +885,21 @@ struct RemoveClusterImpl {
    }

    ACTOR static Future<Void> run(RemoveClusterImpl* self) {
-       bool clusterIsPresent = wait(self->ctx.runManagementTransaction(
-           [self = self](Reference<typename DB::TransactionT> tr) { return lockDataCluster(self, tr); }));
+       state bool clusterIsPresent;
+       try {
+           wait(store(clusterIsPresent,
+                      self->ctx.runManagementTransaction([self = self](Reference<typename DB::TransactionT> tr) {
+                          return markClusterRemoving(self, tr);
+                      })));
+       } catch (Error& e) {
+           // If the transaction retries after success or if we are trying a second time to remove the cluster, it will
+           // throw an error indicating that the removal has already started
+           if (e.code() == error_code_cluster_removed) {
+               clusterIsPresent = true;
+           } else {
+               throw;
+           }
+       }

        if (clusterIsPresent) {
            try {
@@ -921,7 +945,7 @@ Future<std::map<ClusterName, DataClusterMetadata>> listClustersTransaction(Trans
    state Future<Void> tenantModeCheck = TenantAPI::checkTenantMode(tr, ClusterType::METACLUSTER_MANAGEMENT);

    state Future<KeyBackedRangeResult<std::pair<ClusterName, DataClusterEntry>>> clusterEntriesFuture =
-       ManagementClusterMetadata::dataClusters.getRange(tr, begin, end, limit);
+       ManagementClusterMetadata::dataClusters().getRange(tr, begin, end, limit);
    state Future<KeyBackedRangeResult<std::pair<ClusterName, ClusterConnectionString>>> connectionStringFuture =
        ManagementClusterMetadata::dataClusterConnectionRecords.getRange(tr, begin, end, limit);
@@ -975,12 +999,12 @@ void managementClusterAddTenantToGroup(Transaction tr,
    }

    if (!groupAlreadyExists) {
-       ManagementClusterMetadata::tenantMetadata.tenantGroupMap.set(
+       ManagementClusterMetadata::tenantMetadata().tenantGroupMap.set(
            tr, tenantEntry.tenantGroup.get(), TenantGroupEntry(tenantEntry.assignedCluster));
        ManagementClusterMetadata::clusterTenantGroupIndex.insert(
            tr, Tuple::makeTuple(tenantEntry.assignedCluster.get(), tenantEntry.tenantGroup.get()));
    }
-   ManagementClusterMetadata::tenantMetadata.tenantGroupTenantIndex.insert(
+   ManagementClusterMetadata::tenantMetadata().tenantGroupTenantIndex.insert(
        tr, Tuple::makeTuple(tenantEntry.tenantGroup.get(), tenantName));
}

@@ -1004,11 +1028,11 @@ Future<Void> managementClusterRemoveTenantFromGroup(Transaction tr,
                                                    DataClusterMetadata* clusterMetadata) {
    state bool updateClusterCapacity = !tenantEntry.tenantGroup.present();
    if (tenantEntry.tenantGroup.present()) {
-       ManagementClusterMetadata::tenantMetadata.tenantGroupTenantIndex.erase(
+       ManagementClusterMetadata::tenantMetadata().tenantGroupTenantIndex.erase(
            tr, Tuple::makeTuple(tenantEntry.tenantGroup.get(), tenantName));

        state KeyBackedSet<Tuple>::RangeResultType result =
-           wait(ManagementClusterMetadata::tenantMetadata.tenantGroupTenantIndex.getRange(
+           wait(ManagementClusterMetadata::tenantMetadata().tenantGroupTenantIndex.getRange(
                tr,
                Tuple::makeTuple(tenantEntry.tenantGroup.get()),
                Tuple::makeTuple(keyAfter(tenantEntry.tenantGroup.get())),

@@ -1018,7 +1042,7 @@ Future<Void> managementClusterRemoveTenantFromGroup(Transaction tr,
            ManagementClusterMetadata::clusterTenantGroupIndex.erase(
                tr, Tuple::makeTuple(tenantEntry.assignedCluster.get(), tenantEntry.tenantGroup.get()));
-           ManagementClusterMetadata::tenantMetadata.tenantGroupMap.erase(tr, tenantEntry.tenantGroup.get());
+           ManagementClusterMetadata::tenantMetadata().tenantGroupMap.erase(tr, tenantEntry.tenantGroup.get());
            updateClusterCapacity = true;
        }
    }
@@ -1091,12 +1115,15 @@ struct CreateTenantImpl {
            // The previous creation is permanently failed, so cleanup the tenant and create it again from scratch
            // We don't need to remove it from the tenant map because we will overwrite the existing entry later in
            // this transaction.
-           ManagementClusterMetadata::tenantMetadata.tenantCount.atomicOp(tr, -1, MutationRef::AddValue);
+           ManagementClusterMetadata::tenantMetadata().tenantIdIndex.erase(tr, existingEntry.get().id);
+           ManagementClusterMetadata::tenantMetadata().tenantCount.atomicOp(tr, -1, MutationRef::AddValue);
            ManagementClusterMetadata::clusterTenantCount.atomicOp(
                tr, existingEntry.get().assignedCluster.get(), -1, MutationRef::AddValue);

            ManagementClusterMetadata::clusterTenantIndex.erase(
-               tr, Tuple::makeTuple(existingEntry.get().assignedCluster.get(), self->tenantName));
+               tr,
+               Tuple::makeTuple(
+                   existingEntry.get().assignedCluster.get(), self->tenantName, existingEntry.get().id));

            state DataClusterMetadata previousAssignedClusterMetadata =
                wait(getClusterTransaction(tr, existingEntry.get().assignedCluster.get()));

@@ -1117,8 +1144,9 @@ struct CreateTenantImpl {
        // If our tenant group is already assigned, then we just use that assignment
        state Optional<TenantGroupEntry> groupEntry;
        if (self->tenantEntry.tenantGroup.present()) {
-           Optional<TenantGroupEntry> _groupEntry = wait(
-               ManagementClusterMetadata::tenantMetadata.tenantGroupMap.get(tr, self->tenantEntry.tenantGroup.get()));
+           Optional<TenantGroupEntry> _groupEntry =
+               wait(ManagementClusterMetadata::tenantMetadata().tenantGroupMap.get(
+                   tr, self->tenantEntry.tenantGroup.get()));
            groupEntry = _groupEntry;

            if (groupEntry.present()) {
@@ -1191,14 +1219,15 @@ struct CreateTenantImpl {
        state Future<Void> setClusterFuture = self->ctx.setCluster(tr, assignment.first);

        // Create a tenant entry in the management cluster
-       Optional<int64_t> lastId = wait(ManagementClusterMetadata::tenantMetadata.lastTenantId.get(tr));
+       Optional<int64_t> lastId = wait(ManagementClusterMetadata::tenantMetadata().lastTenantId.get(tr));
        self->tenantEntry.setId(lastId.orDefault(-1) + 1);
-       ManagementClusterMetadata::tenantMetadata.lastTenantId.set(tr, self->tenantEntry.id);
+       ManagementClusterMetadata::tenantMetadata().lastTenantId.set(tr, self->tenantEntry.id);

        self->tenantEntry.tenantState = TenantState::REGISTERING;
-       ManagementClusterMetadata::tenantMetadata.tenantMap.set(tr, self->tenantName, self->tenantEntry);
+       ManagementClusterMetadata::tenantMetadata().tenantMap.set(tr, self->tenantName, self->tenantEntry);
+       ManagementClusterMetadata::tenantMetadata().tenantIdIndex.set(tr, self->tenantEntry.id, self->tenantName);

-       ManagementClusterMetadata::tenantMetadata.tenantCount.atomicOp(tr, 1, MutationRef::AddValue);
+       ManagementClusterMetadata::tenantMetadata().tenantCount.atomicOp(tr, 1, MutationRef::AddValue);
        ManagementClusterMetadata::clusterTenantCount.atomicOp(
            tr, self->tenantEntry.assignedCluster.get(), 1, MutationRef::AddValue);

@@ -1211,14 +1240,14 @@ struct CreateTenantImpl {
        // Updated indexes to include the new tenant
        ManagementClusterMetadata::clusterTenantIndex.insert(
-           tr, Tuple::makeTuple(self->tenantEntry.assignedCluster.get(), self->tenantName));
+           tr, Tuple::makeTuple(self->tenantEntry.assignedCluster.get(), self->tenantName, self->tenantEntry.id));

        wait(setClusterFuture);

        // If we are part of a tenant group that is assigned to a cluster being removed from the metacluster,
        // then we fail with an error.
-       if (self->ctx.dataClusterMetadata.get().entry.locked) {
-           throw cluster_locked();
+       if (self->ctx.dataClusterMetadata.get().entry.clusterState == DataClusterState::REMOVING) {
+           throw cluster_removed();
        }

        managementClusterAddTenantToGroup(

@@ -1251,7 +1280,7 @@ struct CreateTenantImpl {
        if (managementEntry.get().tenantState == TenantState::REGISTERING) {
            TenantMapEntry updatedEntry = managementEntry.get();
            updatedEntry.tenantState = TenantState::READY;
-           ManagementClusterMetadata::tenantMetadata.tenantMap.set(tr, self->tenantName, updatedEntry);
+           ManagementClusterMetadata::tenantMetadata().tenantMap.set(tr, self->tenantName, updatedEntry);
        }

        return Void();
@@ -1359,7 +1388,7 @@ struct DeleteTenantImpl {
        if (tenantEntry.get().tenantState != TenantState::REMOVING) {
            TenantMapEntry updatedEntry = tenantEntry.get();
            updatedEntry.tenantState = TenantState::REMOVING;
-           ManagementClusterMetadata::tenantMetadata.tenantMap.set(tr, self->tenantName, updatedEntry);
+           ManagementClusterMetadata::tenantMetadata().tenantMap.set(tr, self->tenantName, updatedEntry);
        }

        return Void();

@@ -1377,16 +1406,17 @@ struct DeleteTenantImpl {
        ASSERT(tenantEntry.get().tenantState == TenantState::REMOVING);

        // Erase the tenant entry itself
-       ManagementClusterMetadata::tenantMetadata.tenantMap.erase(tr, self->tenantName);
+       ManagementClusterMetadata::tenantMetadata().tenantMap.erase(tr, self->tenantName);
+       ManagementClusterMetadata::tenantMetadata().tenantIdIndex.erase(tr, tenantEntry.get().id);

        // This is idempotent because this function is only called if the tenant is in the map
-       ManagementClusterMetadata::tenantMetadata.tenantCount.atomicOp(tr, -1, MutationRef::AddValue);
+       ManagementClusterMetadata::tenantMetadata().tenantCount.atomicOp(tr, -1, MutationRef::AddValue);
        ManagementClusterMetadata::clusterTenantCount.atomicOp(
            tr, tenantEntry.get().assignedCluster.get(), -1, MutationRef::AddValue);

        // Remove the tenant from the cluster -> tenant index
        ManagementClusterMetadata::clusterTenantIndex.erase(
-           tr, Tuple::makeTuple(tenantEntry.get().assignedCluster.get(), self->tenantName));
+           tr, Tuple::makeTuple(tenantEntry.get().assignedCluster.get(), self->tenantName, tenantEntry.get().id));

        // Remove the tenant from its tenant group
        wait(managementClusterRemoveTenantFromGroup(

@@ -1439,7 +1469,7 @@ Future<std::vector<std::pair<TenantName, TenantMapEntry>>> listTenantsTransactio
    tr->setOption(FDBTransactionOptions::RAW_ACCESS);

    KeyBackedRangeResult<std::pair<TenantName, TenantMapEntry>> results =
-       wait(ManagementClusterMetadata::tenantMetadata.tenantMap.getRange(tr, begin, end, limit));
+       wait(ManagementClusterMetadata::tenantMetadata().tenantMap.getRange(tr, begin, end, limit));

    return results.results;
}
@@ -1508,7 +1538,7 @@ struct ConfigureTenantImpl {
        }

        state Optional<TenantGroupEntry> tenantGroupEntry =
-           wait(ManagementClusterMetadata::tenantMetadata.tenantGroupMap.get(tr, desiredGroup.get()));
+           wait(ManagementClusterMetadata::tenantMetadata().tenantGroupMap.get(tr, desiredGroup.get()));

        // If we are creating a new tenant group, we need to have capacity on the current cluster
        if (!tenantGroupEntry.present()) {

@@ -1566,7 +1596,7 @@ struct ConfigureTenantImpl {
        }

        ++self->updatedEntry.configurationSequenceNum;
-       ManagementClusterMetadata::tenantMetadata.tenantMap.set(tr, self->tenantName, self->updatedEntry);
+       ManagementClusterMetadata::tenantMetadata().tenantMap.set(tr, self->tenantName, self->updatedEntry);

        return Void();
    }

@@ -1601,7 +1631,7 @@ struct ConfigureTenantImpl {
        }

        tenantEntry.get().tenantState = TenantState::READY;
-       ManagementClusterMetadata::tenantMetadata.tenantMap.set(tr, self->tenantName, tenantEntry.get());
+       ManagementClusterMetadata::tenantMetadata().tenantMap.set(tr, self->tenantName, tenantEntry.get());

        return Void();
    }

View File

@@ -122,6 +122,8 @@ struct FdbCApi : public ThreadSafeReferenceCounted<FdbCApi> {
    // Network
    fdb_error_t (*selectApiVersion)(int runtimeVersion, int headerVersion);
    const char* (*getClientVersion)();
+   void (*useFutureProtocolVersion)();
+
    fdb_error_t (*setNetworkOption)(FDBNetworkOption option, uint8_t const* value, int valueLength);
    fdb_error_t (*setupNetwork)();
    fdb_error_t (*runNetwork)();
@@ -169,6 +171,32 @@ struct FdbCApi : public ThreadSafeReferenceCounted<FdbCApi> {
                                     uint8_t const* purge_key_name,
                                     int purge_key_name_length);

+   FDBFuture* (*databaseBlobbifyRange)(FDBDatabase* db,
+                                       uint8_t const* begin_key_name,
+                                       int begin_key_name_length,
+                                       uint8_t const* end_key_name,
+                                       int end_key_name_length);
+
+   FDBFuture* (*databaseUnblobbifyRange)(FDBDatabase* db,
+                                         uint8_t const* begin_key_name,
+                                         int begin_key_name_length,
+                                         uint8_t const* end_key_name,
+                                         int end_key_name_length);
+
+   FDBFuture* (*databaseListBlobbifiedRanges)(FDBDatabase* db,
+                                              uint8_t const* begin_key_name,
+                                              int begin_key_name_length,
+                                              uint8_t const* end_key_name,
+                                              int end_key_name_length,
+                                              int rangeLimit);
+
+   FDBFuture* (*databaseVerifyBlobRange)(FDBDatabase* db,
+                                         uint8_t const* begin_key_name,
+                                         int begin_key_name_length,
+                                         uint8_t const* end_key_name,
+                                         int end_key_name_length,
+                                         Optional<Version> version);
+
    // Tenant
    fdb_error_t (*tenantCreateTransaction)(FDBTenant* tenant, FDBTransaction** outTransaction);
@@ -274,7 +302,8 @@ struct FdbCApi : public ThreadSafeReferenceCounted<FdbCApi> {
                                                 uint8_t const* begin_key_name,
                                                 int begin_key_name_length,
                                                 uint8_t const* end_key_name,
-                                                int end_key_name_length);
+                                                int end_key_name_length,
+                                                int rangeLimit);

    FDBResult* (*transactionReadBlobGranules)(FDBTransaction* db,
                                              uint8_t const* begin_key_name,
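A hypothetical call through the updated function pointer; the transaction handle, key buffers, and the limit of 1000 are illustrative only:

    /* rangeLimit is the new trailing parameter capping how many ranges come back */
    FDBFuture* f = api->transactionGetBlobGranuleRanges(
        tr, beginKey, beginKeyLen, endKey, endKeyLen, /* rangeLimit */ 1000);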
@@ -374,7 +403,8 @@ public:
    ThreadFuture<int64_t> getEstimatedRangeSizeBytes(const KeyRangeRef& keys) override;
    ThreadFuture<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(const KeyRangeRef& range,
                                                                    int64_t chunkSize) override;
-   ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(const KeyRangeRef& keyRange) override;
+   ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(const KeyRangeRef& keyRange,
+                                                                         int rangeLimit) override;

    ThreadResult<RangeResult> readBlobGranules(const KeyRangeRef& keyRange,
                                               Version beginVersion,
@@ -474,6 +504,12 @@ public:
    ThreadFuture<Key> purgeBlobGranules(const KeyRangeRef& keyRange, Version purgeVersion, bool force) override;
    ThreadFuture<Void> waitPurgeGranulesComplete(const KeyRef& purgeKey) override;

+   ThreadFuture<bool> blobbifyRange(const KeyRangeRef& keyRange) override;
+   ThreadFuture<bool> unblobbifyRange(const KeyRangeRef& keyRange) override;
+   ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> listBlobbifiedRanges(const KeyRangeRef& keyRange,
+                                                                         int rangeLimit) override;
+   ThreadFuture<Version> verifyBlobRange(const KeyRangeRef& keyRange, Optional<Version> version) override;
+
    ThreadFuture<DatabaseSharedState*> createSharedState() override;
    void setSharedState(DatabaseSharedState* p) override;
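Taken together, the new declarations form a small blob management surface on the database handle. A hedged usage sketch (the key range literals and limit are illustrative):

    // Mark a range for blob granule storage, then enumerate blobbified ranges with a cap on results.
    ThreadFuture<bool> made = db->blobbifyRange(KeyRangeRef("a"_sr, "b"_sr));
    ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> ranges =
        db->listBlobbifiedRanges(KeyRangeRef(""_sr, "\xff"_sr), /*rangeLimit*/ 100);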
@@ -492,6 +528,7 @@ public:
    void selectApiVersion(int apiVersion) override;
    const char* getClientVersion() override;
+   void useFutureProtocolVersion() override;
    void setNetworkOption(FDBNetworkOptions::Option option, Optional<StringRef> value = Optional<StringRef>()) override;
    void setupNetwork() override;
@@ -571,7 +608,8 @@ public:
    ThreadFuture<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(const KeyRangeRef& range,
                                                                    int64_t chunkSize) override;
-   ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(const KeyRangeRef& keyRange) override;
+   ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(const KeyRangeRef& keyRange,
+                                                                         int rangeLimit) override;

    ThreadResult<RangeResult> readBlobGranules(const KeyRangeRef& keyRange,
                                               Version beginVersion,
@@ -655,8 +693,10 @@ private:
struct ClientDesc {
    std::string const libPath;
    bool const external;
+   bool const useFutureVersion;

-   ClientDesc(std::string libPath, bool external) : libPath(libPath), external(external) {}
+   ClientDesc(std::string libPath, bool external, bool useFutureVersion)
+     : libPath(libPath), external(external), useFutureVersion(useFutureVersion) {}
};

struct ClientInfo : ClientDesc, ThreadSafeReferenceCounted<ClientInfo> {

@@ -668,11 +708,11 @@ struct ClientInfo : ClientDesc, ThreadSafeReferenceCounted<ClientInfo> {
    std::vector<std::pair<void (*)(void*), void*>> threadCompletionHooks;

    ClientInfo()
-     : ClientDesc(std::string(), false), protocolVersion(0), api(nullptr), failed(true), initialized(false) {}
+     : ClientDesc(std::string(), false, false), protocolVersion(0), api(nullptr), failed(true), initialized(false) {}
    ClientInfo(IClientApi* api)
-     : ClientDesc("internal", false), protocolVersion(0), api(api), failed(false), initialized(false) {}
+     : ClientDesc("internal", false, false), protocolVersion(0), api(api), failed(false), initialized(false) {}
-   ClientInfo(IClientApi* api, std::string libPath)
-     : ClientDesc(libPath, true), protocolVersion(0), api(api), failed(false), initialized(false) {}
+   ClientInfo(IClientApi* api, std::string libPath, bool useFutureVersion)
+     : ClientDesc(libPath, true, useFutureVersion), protocolVersion(0), api(api), failed(false), initialized(false) {}

    void loadVersion();
    bool canReplace(Reference<ClientInfo> other) const;
@@ -812,6 +852,12 @@ public:
    ThreadFuture<Key> purgeBlobGranules(const KeyRangeRef& keyRange, Version purgeVersion, bool force) override;
    ThreadFuture<Void> waitPurgeGranulesComplete(const KeyRef& purgeKey) override;

+   ThreadFuture<bool> blobbifyRange(const KeyRangeRef& keyRange) override;
+   ThreadFuture<bool> unblobbifyRange(const KeyRangeRef& keyRange) override;
+   ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> listBlobbifiedRanges(const KeyRangeRef& keyRange,
+                                                                         int rangeLimit) override;
+   ThreadFuture<Version> verifyBlobRange(const KeyRangeRef& keyRange, Optional<Version> version) override;
+
    ThreadFuture<DatabaseSharedState*> createSharedState() override;
    void setSharedState(DatabaseSharedState* p) override;
@@ -919,6 +965,7 @@ class MultiVersionApi : public IClientApi {
public:
    void selectApiVersion(int apiVersion) override;
    const char* getClientVersion() override;
+   void useFutureProtocolVersion() override;
    void setNetworkOption(FDBNetworkOptions::Option option, Optional<StringRef> value = Optional<StringRef>()) override;
    void setupNetwork() override;
@@ -965,7 +1012,7 @@ private:
    void disableMultiVersionClientApi();
    void setCallbacksOnExternalThreads();
-   void addExternalLibrary(std::string path);
+   void addExternalLibrary(std::string path, bool useFutureVersion);
    void addExternalLibraryDirectory(std::string path);
    // Return a vector of (pathname, unlink_on_close) pairs. Makes threadCount - 1 copies of the library stored in
    // path, and returns a vector of length threadCount.
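The useFutureVersion flag threads from this registration call down into each ClientDesc above. A hypothetical registration (the library path is illustrative, and MultiVersionApi::api is assumed to be the usual singleton accessor):

    // Passing false preserves the old behavior; true opts this client library
    // into the future protocol version.
    MultiVersionApi::api->addExternalLibrary("/usr/lib/libfdb_c.so", /*useFutureVersion*/ true);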

View File

@@ -239,7 +239,6 @@ FDB_DECLARE_BOOLEAN_PARAM(AllowInvalidTenantID);
struct TransactionState : ReferenceCounted<TransactionState> {
    Database cx;
-   int64_t tenantId = TenantInfo::INVALID_TENANT;
    Optional<Standalone<StringRef>> authToken;
    Reference<TransactionLogInfo> trLogInfo;
    TransactionOptions options;
@@ -285,8 +284,18 @@ struct TransactionState : ReferenceCounted<TransactionState> {
    Optional<TenantName> const& tenant();
    bool hasTenant() const;
+   int64_t tenantId() const { return tenantId_; }
+   void trySetTenantId(int64_t tenantId) {
+       if (tenantId_ == TenantInfo::INVALID_TENANT) {
+           tenantId_ = tenantId;
+       }
+   }
+
+   Future<Void> handleUnknownTenant();

private:
    Optional<TenantName> tenant_;
+   int64_t tenantId_ = TenantInfo::INVALID_TENANT;
    bool tenantSet;
};
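The new accessor pair makes the resolved ID effectively write-once: trySetTenantId() latches only the first value. The behavior restated in isolation outside the class (assuming TenantInfo::INVALID_TENANT is the -1 sentinel):

    int64_t tenantId_ = -1; // stand-in for TenantInfo::INVALID_TENANT
    auto trySet = [&](int64_t id) {
        if (tenantId_ == -1) {
            tenantId_ = id; // first resolution wins
        }
    };
    trySet(42); // latches 42
    trySet(43); // ignored; tenantId_ is still 42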
@@ -406,7 +415,7 @@ public:
    // The returned list would still be in form of [keys.begin, splitPoint1, splitPoint2, ... , keys.end]
    Future<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(KeyRange const& keys, int64_t chunkSize);

-   Future<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(const KeyRange& range);
+   Future<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(const KeyRange& range, int rangeLimit);
    Future<Standalone<VectorRef<BlobGranuleChunkRef>>> readBlobGranules(const KeyRange& range,
                                                                        Version begin,
                                                                        Optional<Version> readVersion,

View File

@@ -20,6 +20,7 @@
#ifndef FDBCLIENT_READYOURWRITES_H
#define FDBCLIENT_READYOURWRITES_H
+#include "Status.h"
#pragma once

#include "fdbclient/NativeAPI.actor.h"
@@ -120,7 +121,7 @@ public:
    Future<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(const KeyRange& range, int64_t chunkSize) override;
    Future<int64_t> getEstimatedRangeSizeBytes(const KeyRange& keys) override;

-   Future<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(const KeyRange& range) override;
+   Future<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(const KeyRange& range, int rangeLimit) override;
    Future<Standalone<VectorRef<BlobGranuleChunkRef>>> readBlobGranules(const KeyRange& range,
                                                                        Version begin,
                                                                        Optional<Version> readVersion,
@@ -192,7 +193,17 @@ public:
    KeyRangeMap<std::pair<bool, Optional<Value>>>& getSpecialKeySpaceWriteMap() { return specialKeySpaceWriteMap; }
    bool readYourWritesDisabled() const { return options.readYourWritesDisabled; }
    const Optional<std::string>& getSpecialKeySpaceErrorMsg() { return specialKeySpaceErrorMsg; }
-   void setSpecialKeySpaceErrorMsg(const std::string& msg) { specialKeySpaceErrorMsg = msg; }
+   void setSpecialKeySpaceErrorMsg(const std::string& msg) {
+       if (g_network && g_network->isSimulated()) {
+           try {
+               readJSONStrictly(msg);
+           } catch (Error& e) {
+               TraceEvent(SevError, "InvalidSpecialKeySpaceErrorMessage").error(e).detail("Message", msg);
+               ASSERT(false);
+           }
+       }
+       specialKeySpaceErrorMsg = msg;
+   }
    Transaction& getTransaction() { return tr; }
    Optional<TenantName> getTenant() { return tr.getTenant(); }
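In simulation the setter now insists that every special-key-space error message parses as strict JSON, turning malformed messages into test failures instead of silent bad output. A hypothetical conforming call (the exact message shape is an assumption; in practice callers build these strings with helpers such as ManagementAPIError):

    // A bare string like "bad key" would now trip the simulation-only ASSERT above.
    ryw->setSpecialKeySpaceErrorMsg("{ \"retriable\": false, \"message\": \"tenant not found\" }");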

View File

@@ -50,7 +50,6 @@ public:
    bool PEEK_USING_STREAMING;
    double TLOG_TIMEOUT; // tlog OR commit proxy failure - master's reaction time
    double TLOG_SLOW_REJOIN_WARN_TIMEOUT_SECS; // Warns if a tlog takes too long to rejoin
-   double RECOVERY_TLOG_SMART_QUORUM_DELAY; // smaller might be better for bug amplification
    double TLOG_STORAGE_MIN_UPDATE_INTERVAL;
    double BUGGIFY_TLOG_STORAGE_MIN_UPDATE_INTERVAL;
    int DESIRED_TOTAL_BYTES;

@@ -58,10 +57,6 @@ public:
    double UPDATE_DELAY;
    int MAXIMUM_PEEK_BYTES;
    int APPLY_MUTATION_BYTES;
-   int RECOVERY_DATA_BYTE_LIMIT;
-   int BUGGIFY_RECOVERY_DATA_LIMIT;
-   double LONG_TLOG_COMMIT_TIME;
-   int64_t LARGE_TLOG_COMMIT_BYTES;
    double BUGGIFY_RECOVER_MEMORY_LIMIT;
    double BUGGIFY_WORKER_REMOVED_MAX_LAG;
    int64_t UPDATE_STORAGE_BYTE_LIMIT;
@@ -123,16 +118,16 @@ public:
    double BG_REBALANCE_POLLING_INTERVAL;
    double BG_REBALANCE_SWITCH_CHECK_INTERVAL;
    double DD_QUEUE_LOGGING_INTERVAL;
+   double DD_QUEUE_COUNTER_REFRESH_INTERVAL;
+   double DD_QUEUE_COUNTER_MAX_LOG; // max number of servers for which trace events will be generated in each round of
+                                    // DD_QUEUE_COUNTER_REFRESH_INTERVAL duration
+   bool DD_QUEUE_COUNTER_SUMMARIZE; // Enable summary of remaining servers when the number of servers with ongoing
+                                    // relocations in the last minute exceeds DD_QUEUE_COUNTER_MAX_LOG
    double RELOCATION_PARALLELISM_PER_SOURCE_SERVER;
    double RELOCATION_PARALLELISM_PER_DEST_SERVER;
    int DD_QUEUE_MAX_KEY_SERVERS;
    int DD_REBALANCE_PARALLELISM;
    int DD_REBALANCE_RESET_AMOUNT;
-   double BG_DD_MAX_WAIT;
-   double BG_DD_MIN_WAIT;
-   double BG_DD_INCREASE_RATE;
-   double BG_DD_DECREASE_RATE;
-   double BG_DD_SATURATION_DELAY;
    double INFLIGHT_PENALTY_HEALTHY;
    double INFLIGHT_PENALTY_REDUNDANT;
    double INFLIGHT_PENALTY_UNHEALTHY;
@@ -195,7 +190,6 @@ public:
    double SERVER_LIST_DELAY;
    double RECRUITMENT_IDLE_DELAY;
    double STORAGE_RECRUITMENT_DELAY;
-   double BLOB_WORKER_RECRUITMENT_DELAY;
    bool TSS_HACK_IDENTITY_MAPPING;
    double TSS_RECRUITMENT_TIMEOUT;
    double TSS_DD_CHECK_INTERVAL;

@@ -234,6 +228,8 @@ public:
    int DD_TEAM_ZERO_SERVER_LEFT_LOG_DELAY;
    int DD_STORAGE_WIGGLE_PAUSE_THRESHOLD; // How many unhealthy relocations are ongoing will pause storage wiggle
    int DD_STORAGE_WIGGLE_STUCK_THRESHOLD; // How many times bestTeamStuck accumulate will pause storage wiggle
+   int64_t
+       DD_STORAGE_WIGGLE_MIN_SS_AGE_SEC; // Minimal age of a correct-configured server before it's chosen to be wiggled

    bool DD_TENANT_AWARENESS_ENABLED;
    int TENANT_CACHE_LIST_REFRESH_INTERVAL; // How often the TenantCache is refreshed
@@ -328,6 +324,7 @@ public:
    std::string DEFAULT_FDB_ROCKSDB_COLUMN_FAMILY;
    bool ROCKSDB_PERFCONTEXT_ENABLE; // Enable rocks perf context metrics. May cause performance overhead
    double ROCKSDB_PERFCONTEXT_SAMPLE_RATE;
+   double ROCKSDB_METRICS_SAMPLE_INTERVAL;
    int ROCKSDB_MAX_SUBCOMPACTIONS;
    int64_t ROCKSDB_SOFT_PENDING_COMPACT_BYTES_LIMIT;
    int64_t ROCKSDB_HARD_PENDING_COMPACT_BYTES_LIMIT;

@@ -337,6 +334,10 @@ public:
    int64_t ROCKSDB_COMPACTION_READAHEAD_SIZE;
    int64_t ROCKSDB_BLOCK_SIZE;
    bool ENABLE_SHARDED_ROCKSDB;
+   int64_t ROCKSDB_WRITE_BUFFER_SIZE;
+   int64_t ROCKSDB_MAX_TOTAL_WAL_SIZE;
+   int64_t ROCKSDB_MAX_BACKGROUND_JOBS;
+   int64_t ROCKSDB_DELETE_OBSOLETE_FILE_PERIOD;

    // Leader election
    int MAX_NOTIFICATIONS;
@@ -548,6 +549,8 @@ public:
    double RATEKEEPER_DEFAULT_LIMIT;
    double RATEKEEPER_LIMIT_REASON_SAMPLE_RATE;
    bool RATEKEEPER_PRINT_LIMIT_REASON;
+   double RATEKEEPER_MIN_RATE;
+   double RATEKEEPER_MAX_RATE;

    int64_t TARGET_BYTES_PER_STORAGE_SERVER;
    int64_t SPRING_BYTES_STORAGE_SERVER;

@@ -591,6 +594,8 @@ public:
    // Use global tag throttling strategy. i.e. throttle based on the cluster-wide
    // throughput for tags and their associated quotas.
    bool GLOBAL_TAG_THROTTLING;
+   // Enforce tag throttling on proxies rather than on clients
+   bool ENFORCE_TAG_THROTTLING_ON_PROXIES;
    // Minimum number of transactions per second that the global tag throttler must allow for each tag
    double GLOBAL_TAG_THROTTLING_MIN_RATE;
    // Used by global tag throttling counters
@@ -618,8 +623,17 @@ public:
	double INITIAL_DURABILITY_LAG_MULTIPLIER;
	double DURABILITY_LAG_REDUCTION_RATE;
	double DURABILITY_LAG_INCREASE_RATE;
	double STORAGE_SERVER_LIST_FETCH_TIMEOUT;
+	bool BW_THROTTLING_ENABLED;
+	double TARGET_BW_LAG;
+	double TARGET_BW_LAG_BATCH;
+	double TARGET_BW_LAG_UPDATE;
+	int MIN_BW_HISTORY;
+	double BW_ESTIMATION_INTERVAL;
+	double BW_LAG_INCREASE_AMOUNT;
+	double BW_LAG_DECREASE_AMOUNT;
+	double BW_FETCH_WORKERS_INTERVAL;
+	double BW_RW_LOGGING_INTERVAL;

	// disk snapshot
	int64_t MAX_FORKED_PROCESS_OUTPUT;
@@ -663,7 +677,6 @@ public:
	int FETCH_KEYS_PARALLELISM;
	int FETCH_KEYS_PARALLELISM_FULL;
	int FETCH_KEYS_LOWER_PRIORITY;
-	int FETCH_CHANGEFEED_PARALLELISM;
	int SERVE_FETCH_CHECKPOINT_PARALLELISM;
	int BUGGIFY_BLOCK_BYTES;
	int64_t STORAGE_RECOVERY_VERSION_LAG_LIMIT;

@@ -672,7 +685,6 @@ public:
	int STORAGE_COMMIT_BYTES;
	int STORAGE_FETCH_BYTES;
	double STORAGE_COMMIT_INTERVAL;
-	double UPDATE_SHARD_VERSION_INTERVAL;
	int BYTE_SAMPLING_FACTOR;
	int BYTE_SAMPLING_OVERHEAD;
	int MAX_STORAGE_SERVER_WATCH_BYTES;

@@ -681,7 +693,6 @@ public:
	int BYTE_SAMPLE_LOAD_PARALLELISM;
	double BYTE_SAMPLE_LOAD_DELAY;
	double BYTE_SAMPLE_START_DELAY;
-	double UPDATE_STORAGE_PROCESS_STATS_INTERVAL;
	double BEHIND_CHECK_DELAY;
	int BEHIND_CHECK_COUNT;
	int64_t BEHIND_CHECK_VERSIONS;
@@ -752,7 +763,6 @@ public:
	// Dynamic Knobs (implementation)
	double COMPACTION_INTERVAL;
-	double UPDATE_NODE_TIMEOUT;
	double GET_COMMITTED_VERSION_TIMEOUT;
	double GET_SNAPSHOT_AND_CHANGES_TIMEOUT;
	double FETCH_CHANGES_TIMEOUT;

@@ -768,14 +778,6 @@ public:
	bool DISABLE_DUPLICATE_LOG_WARNING;
	double HISTOGRAM_REPORT_INTERVAL;

-	// IPager
-	int PAGER_RESERVED_PAGES;
-
-	// IndirectShadowPager
-	int FREE_PAGE_VACUUM_THRESHOLD;
-	int VACUUM_QUEUE_SIZE;
-	int VACUUM_BYTES_PER_SECOND;
-
	// Timekeeper
	int64_t TIME_KEEPER_DELAY;
	int64_t TIME_KEEPER_MAX_ENTRIES;

@@ -798,11 +800,9 @@ public:
	int64_t FASTRESTORE_ROLE_LOGGING_DELAY;
	int64_t FASTRESTORE_UPDATE_PROCESS_STATS_INTERVAL; // How quickly to update process metrics for restore
	int64_t FASTRESTORE_ATOMICOP_WEIGHT; // workload amplification factor for atomic ops
-	int64_t FASTRESTORE_APPLYING_PARALLELISM; // number of outstanding txns writing to dest. DB
	int64_t FASTRESTORE_MONITOR_LEADER_DELAY;
	int64_t FASTRESTORE_STRAGGLER_THRESHOLD_SECONDS;
	bool FASTRESTORE_TRACK_REQUEST_LATENCY; // true to track reply latency of each request in a request batch
-	bool FASTRESTORE_TRACK_LOADER_SEND_REQUESTS; // whether to track loader requests that send mutations to appliers
	int64_t FASTRESTORE_MEMORY_THRESHOLD_MB_SOFT; // threshold at which pipelined actors should be delayed
	int64_t FASTRESTORE_WAIT_FOR_MEMORY_LATENCY;
	int64_t FASTRESTORE_HEARTBEAT_DELAY; // interval for master to ping loaders and appliers
@@ -903,10 +903,15 @@ public:
	int BG_KEY_TUPLE_TRUNCATE_OFFSET;

	int BLOB_WORKER_INITIAL_SNAPSHOT_PARALLELISM;
+	int BLOB_WORKER_RESNAPSHOT_PARALLELISM;
+	int BLOB_WORKER_DELTA_FILE_WRITE_PARALLELISM;
	double BLOB_WORKER_TIMEOUT; // Blob Manager's reaction time to a blob worker failure
	double BLOB_WORKER_REQUEST_TIMEOUT; // Blob Worker's server-side request timeout
	double BLOB_WORKERLIST_FETCH_INTERVAL;
	double BLOB_WORKER_BATCH_GRV_INTERVAL;
+	bool BLOB_WORKER_DO_REJECT_WHEN_FULL;
+	double BLOB_WORKER_REJECT_WHEN_FULL_THRESHOLD;

	double BLOB_MANAGER_STATUS_EXP_BACKOFF_MIN;
	double BLOB_MANAGER_STATUS_EXP_BACKOFF_MAX;

View File

@@ -548,6 +548,15 @@ public:
	Future<Optional<std::string>> commit(ReadYourWritesTransaction* ryw) override;
};

+class WorkerInterfacesSpecialKeyImpl : public SpecialKeyRangeReadImpl {
+public:
+	explicit WorkerInterfacesSpecialKeyImpl(KeyRangeRef kr);
+
+	Future<RangeResult> getRange(ReadYourWritesTransaction* ryw,
+	                             KeyRangeRef kr,
+	                             GetRangeLimits limitsHint) const override;
+};
+
// If the underlying set of key-value pairs of a key space is not changing, then we expect repeating a read to give the
// same result. Additionally, we can generate the expected result of any read if that read is reading a subrange. This
// actor performs a read of an arbitrary subrange of [begin, end) and validates the results.
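
The new class plugs into the special-key machinery, whose results are computed by the module's `getRange()` rather than read from storage. Below is a hedged usage sketch, not part of this patch; the exact `\xff\xff/worker_interfaces/` registration prefix is an assumption not shown in this diff.

```cpp
// Hedged sketch: reading a range served by a SpecialKeyRangeReadImpl like the
// one above. The prefix is an assumption; the retry loop is the standard pattern.
ACTOR Future<Void> printWorkerInterfaceKeys(Database cx) {
	state ReadYourWritesTransaction tr(cx);
	loop {
		try {
			// Special keys live in the \xff\xff keyspace and are synthesized on read.
			RangeResult result = wait(tr.getRange(
			    KeyRangeRef("\xff\xff/worker_interfaces/"_sr, "\xff\xff/worker_interfaces0"_sr),
			    CLIENT_KNOBS->TOO_MANY));
			for (auto const& kv : result) {
				TraceEvent("WorkerInterfaceKey").detail("Key", kv.key);
			}
			return Void();
		} catch (Error& e) {
			wait(tr.onError(e));
		}
	}
}
```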

View File

@@ -594,6 +594,8 @@ const Value blobManagerEpochValueFor(int64_t epoch);
int64_t decodeBlobManagerEpochValue(ValueRef const& value);

// blob granule keys
+extern const StringRef blobRangeActive;
+extern const StringRef blobRangeInactive;

extern const uint8_t BG_FILE_TYPE_DELTA;
extern const uint8_t BG_FILE_TYPE_SNAPSHOT;

@@ -621,7 +623,8 @@ extern const KeyRangeRef blobGranuleHistoryKeys;
// \xff\x02/bgp/(start,end) = (version, force)
extern const KeyRangeRef blobGranulePurgeKeys;
-extern const KeyRangeRef blobGranuleVersionKeys;
+// \xff\x02/bgpforce/(start) = {1|0} (key range map)
+extern const KeyRangeRef blobGranuleForcePurgedKeys;
extern const KeyRef blobGranulePurgeChangeKey;

const Key blobGranuleFileKeyFor(UID granuleID, Version fileVersion, uint8_t fileType);

View File

@@ -44,14 +44,23 @@ typedef Standalone<TenantGroupNameRef> TenantGroupName;
// REMOVING - the tenant has been marked for removal and is being removed on the data cluster
// UPDATING_CONFIGURATION - the tenant configuration has changed on the management cluster and is being applied to the
// data cluster
-// ERROR - currently unused
+// RENAMING_FROM - the tenant is being renamed to a new name and is awaiting the rename to complete on the data cluster
+// RENAMING_TO - the tenant is being created as a rename from an existing tenant and is awaiting the rename to complete
+// on the data cluster
+// ERROR - the tenant is in an error state
//
// A tenant in any configuration is allowed to be removed. Only tenants in the READY or UPDATING_CONFIGURATION phases
-// can have their configuration updated. A tenant must not exist or be in the REGISTERING phase to be created.
+// can have their configuration updated. A tenant must not exist or be in the REGISTERING phase to be created. To be
+// renamed, a tenant must be in the READY or RENAMING_FROM state. In the latter case, the rename destination must match
+// the original rename attempt.
//
// If an operation fails and the tenant is left in a non-ready state, re-running the same operation is legal. If
// successful, the tenant will return to the READY state.
-enum class TenantState { REGISTERING, READY, REMOVING, UPDATING_CONFIGURATION, ERROR };
+enum class TenantState { REGISTERING, READY, REMOVING, UPDATING_CONFIGURATION, RENAMING_FROM, RENAMING_TO, ERROR };

+// Represents the lock state the tenant could be in.
+// Can be used in conjunction with the other tenant states above.
+enum class TenantLockState { UNLOCKED, READ_ONLY, LOCKED };

struct TenantMapEntry {
	constexpr static FileIdentifier file_identifier = 12247338;
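
The transition rules above live only in comments; restated as code they are compact. A hedged sketch with hypothetical helper names, not part of the patch:

```cpp
// Hypothetical helpers (not in the patch) restating the documented rules.
bool canConfigure(TenantState state) {
	// Only READY or UPDATING_CONFIGURATION tenants may change configuration.
	return state == TenantState::READY || state == TenantState::UPDATING_CONFIGURATION;
}

bool canStartRename(TenantState state) {
	// A rename may begin from READY, or be retried from RENAMING_FROM; in the
	// retry case the destination must match the original attempt (checked elsewhere).
	return state == TenantState::READY || state == TenantState::RENAMING_FROM;
}
```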
@@ -65,10 +74,15 @@ struct TenantMapEntry {
	int64_t id = -1;
	Key prefix;
	TenantState tenantState = TenantState::READY;
+	TenantLockState tenantLockState = TenantLockState::UNLOCKED;
	Optional<TenantGroupName> tenantGroup;
	bool encrypted = false;
	Optional<ClusterName> assignedCluster;
	int64_t configurationSequenceNum = 0;
+	Optional<TenantName> renamePair;
+
+	// Can be set to an error string if the tenant is in the ERROR state
+	std::string error;

	constexpr static int PREFIX_SIZE = sizeof(id);
@@ -89,7 +103,16 @@ struct TenantMapEntry {

	template <class Ar>
	void serialize(Ar& ar) {
-		serializer(ar, id, tenantState, tenantGroup, encrypted, assignedCluster, configurationSequenceNum);
+		serializer(ar,
+		           id,
+		           tenantState,
+		           tenantLockState,
+		           tenantGroup,
+		           encrypted,
+		           assignedCluster,
+		           configurationSequenceNum,
+		           renamePair,
+		           error);
		if constexpr (Ar::isDeserializing) {
			if (id >= 0) {
				prefix = idToPrefix(id);
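
Note that `prefix` is intentionally not serialized; it is recomputed from `id` on deserialization so the two can never disagree. A hedged sketch of that derivation, assuming a big-endian encoding consistent with `PREFIX_SIZE == sizeof(id)` (the real encoding lives in `TenantMapEntry::idToPrefix`, which is not shown here):

```cpp
// Hedged sketch (not the patch's implementation): derive the tenant prefix
// as the 8-byte big-endian encoding of the id, so prefixes sort by id.
Key idToPrefixSketch(int64_t id) {
	int64_t swapped = bigEndian64(id); // flow helper: host order -> big-endian
	return Key(StringRef(reinterpret_cast<const uint8_t*>(&swapped), sizeof(swapped)));
}
```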
@@ -141,6 +164,7 @@ struct TenantMetadataSpecification {
	Key subspace;

	KeyBackedObjectMap<TenantName, TenantMapEntry, decltype(IncludeVersion()), NullCodec> tenantMap;
+	KeyBackedMap<int64_t, TenantName> tenantIdIndex;
	KeyBackedProperty<int64_t> lastTenantId;
	KeyBackedBinaryValue<int64_t> tenantCount;
	KeyBackedSet<int64_t> tenantTombstones;
@@ -150,26 +174,27 @@ struct TenantMetadataSpecification {
	TenantMetadataSpecification(KeyRef prefix)
	  : subspace(prefix.withSuffix("tenant/"_sr)), tenantMap(subspace.withSuffix("map/"_sr), IncludeVersion()),
-	    lastTenantId(subspace.withSuffix("lastId"_sr)), tenantCount(subspace.withSuffix("count"_sr)),
-	    tenantTombstones(subspace.withSuffix("tombstones/"_sr)),
+	    tenantIdIndex(subspace.withSuffix("idIndex/"_sr)), lastTenantId(subspace.withSuffix("lastId"_sr)),
+	    tenantCount(subspace.withSuffix("count"_sr)), tenantTombstones(subspace.withSuffix("tombstones/"_sr)),
	    tombstoneCleanupData(subspace.withSuffix("tombstoneCleanup"_sr), IncludeVersion()),
	    tenantGroupTenantIndex(subspace.withSuffix("tenantGroup/tenantIndex/"_sr)),
	    tenantGroupMap(subspace.withSuffix("tenantGroup/map/"_sr), IncludeVersion()) {}
};

struct TenantMetadata {
-	static inline TenantMetadataSpecification instance = TenantMetadataSpecification("\xff/"_sr);
-
-	static inline auto& subspace = instance.subspace;
-	static inline auto& tenantMap = instance.tenantMap;
-	static inline auto& lastTenantId = instance.lastTenantId;
-	static inline auto& tenantCount = instance.tenantCount;
-	static inline auto& tenantTombstones = instance.tenantTombstones;
-	static inline auto& tombstoneCleanupData = instance.tombstoneCleanupData;
-	static inline auto& tenantGroupTenantIndex = instance.tenantGroupTenantIndex;
-	static inline auto& tenantGroupMap = instance.tenantGroupMap;
-
-	static inline Key tenantMapPrivatePrefix = "\xff"_sr.withSuffix(tenantMap.subspace.begin);
+	static TenantMetadataSpecification& instance();
+
+	static inline auto& subspace() { return instance().subspace; }
+	static inline auto& tenantMap() { return instance().tenantMap; }
+	static inline auto& tenantIdIndex() { return instance().tenantIdIndex; }
+	static inline auto& lastTenantId() { return instance().lastTenantId; }
+	static inline auto& tenantCount() { return instance().tenantCount; }
+	static inline auto& tenantTombstones() { return instance().tenantTombstones; }
+	static inline auto& tombstoneCleanupData() { return instance().tombstoneCleanupData; }
+	static inline auto& tenantGroupTenantIndex() { return instance().tenantGroupTenantIndex; }
+	static inline auto& tenantGroupMap() { return instance().tenantGroupMap; }
+
+	static Key tenantMapPrivatePrefix();
};

typedef VersionedMap<TenantName, TenantMapEntry> TenantMap;
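
Every accessor now routes through `instance()`, so the specification is constructed on first use instead of at static-initialization time. A hedged sketch of what the out-of-line definitions might look like; the actual .cpp is not part of this excerpt, so treat this as an assumption:

```cpp
// Hedged sketch of the likely out-of-line definitions: a function-local static
// (Meyers singleton) is constructed on first use, which sidesteps the
// static-initialization-order problems of the old static inline member.
TenantMetadataSpecification& TenantMetadata::instance() {
	static TenantMetadataSpecification _instance("\xff/"_sr);
	return _instance;
}

Key TenantMetadata::tenantMapPrivatePrefix() {
	return "\xff"_sr.withSuffix(tenantMap().subspace.begin);
}
```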

View File

@@ -40,7 +40,7 @@ namespace TenantAPI {

template <class Transaction>
Future<Optional<TenantMapEntry>> tryGetTenantTransaction(Transaction tr, TenantName name) {
	tr->setOption(FDBTransactionOptions::RAW_ACCESS);
-	return TenantMetadata::tenantMap.get(tr, name);
+	return TenantMetadata::tenantMap().get(tr, name);
}

ACTOR template <class DB>

@@ -82,7 +82,7 @@ Future<TenantMapEntry> getTenant(Reference<DB> db, TenantName name) {
ACTOR template <class Transaction>
Future<ClusterType> getClusterType(Transaction tr) {
	Optional<MetaclusterRegistrationEntry> metaclusterRegistration =
-	    wait(MetaclusterMetadata::metaclusterRegistration.get(tr));
+	    wait(MetaclusterMetadata::metaclusterRegistration().get(tr));

	return metaclusterRegistration.present() ? metaclusterRegistration.get().clusterType : ClusterType::STANDALONE;
}
@@ -111,11 +111,11 @@ TenantMode tenantModeForClusterType(ClusterType clusterType, TenantMode tenantMo
// that we no longer keep tombstones for it, an error is thrown.
ACTOR template <class Transaction>
Future<bool> checkTombstone(Transaction tr, int64_t id) {
-	state Future<bool> tombstoneFuture = TenantMetadata::tenantTombstones.exists(tr, id);
+	state Future<bool> tombstoneFuture = TenantMetadata::tenantTombstones().exists(tr, id);

	// If we are trying to create a tenant older than the oldest tombstones we still maintain, then we fail it
	// with an error.
-	Optional<TenantTombstoneCleanupData> tombstoneCleanupData = wait(TenantMetadata::tombstoneCleanupData.get(tr));
+	Optional<TenantTombstoneCleanupData> tombstoneCleanupData = wait(TenantMetadata::tombstoneCleanupData().get(tr));

	if (tombstoneCleanupData.present() && tombstoneCleanupData.get().tombstonesErasedThrough >= id) {
		throw tenant_creation_permanently_failed();
	}

@@ -151,7 +151,7 @@ Future<std::pair<Optional<TenantMapEntry>, bool>> createTenantTransaction(
	    (clusterType == ClusterType::STANDALONE) ? false : checkTombstone(tr, tenantEntry.id);
	state Future<Optional<TenantGroupEntry>> existingTenantGroupEntryFuture;
	if (tenantEntry.tenantGroup.present()) {
-		existingTenantGroupEntryFuture = TenantMetadata::tenantGroupMap.get(tr, tenantEntry.tenantGroup.get());
+		existingTenantGroupEntryFuture = TenantMetadata::tenantGroupMap().get(tr, tenantEntry.tenantGroup.get());
	}

	wait(tenantModeCheck);
@@ -176,23 +176,25 @@ Future<std::pair<Optional<TenantMapEntry>, bool>> createTenantTransaction(
	tenantEntry.tenantState = TenantState::READY;
	tenantEntry.assignedCluster = Optional<ClusterName>();

-	TenantMetadata::tenantMap.set(tr, name, tenantEntry);
+	TenantMetadata::tenantMap().set(tr, name, tenantEntry);
+	TenantMetadata::tenantIdIndex().set(tr, tenantEntry.id, name);

	if (tenantEntry.tenantGroup.present()) {
-		TenantMetadata::tenantGroupTenantIndex.insert(tr, Tuple::makeTuple(tenantEntry.tenantGroup.get(), name));
+		TenantMetadata::tenantGroupTenantIndex().insert(tr, Tuple::makeTuple(tenantEntry.tenantGroup.get(), name));

		// Create the tenant group associated with this tenant if it doesn't already exist
		Optional<TenantGroupEntry> existingTenantGroup = wait(existingTenantGroupEntryFuture);
		if (!existingTenantGroup.present()) {
-			TenantMetadata::tenantGroupMap.set(tr, tenantEntry.tenantGroup.get(), TenantGroupEntry());
+			TenantMetadata::tenantGroupMap().set(tr, tenantEntry.tenantGroup.get(), TenantGroupEntry());
		}
	}

	// This is idempotent because we only add an entry to the tenant map if it isn't already there
-	TenantMetadata::tenantCount.atomicOp(tr, 1, MutationRef::AddValue);
+	TenantMetadata::tenantCount().atomicOp(tr, 1, MutationRef::AddValue);

	// Read the tenant count after incrementing the counter so that simultaneous attempts to create
	// tenants in the same transaction are properly reflected.
-	int64_t tenantCount = wait(TenantMetadata::tenantCount.getD(tr, Snapshot::False, 0));
+	int64_t tenantCount = wait(TenantMetadata::tenantCount().getD(tr, Snapshot::False, 0));
	if (tenantCount > CLIENT_KNOBS->MAX_TENANTS_PER_CLUSTER) {
		throw cluster_no_capacity();
	}
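
The capacity check depends on read-your-writes ordering: the atomic add lands before the non-snapshot read, so two creates in the same transaction observe counts n+1 and n+2 rather than both seeing n. A hedged sketch isolating that pattern (the helper itself is hypothetical):

```cpp
// Hypothetical helper (not in the patch) isolating the capacity-check pattern:
// increment first, then read back non-snapshot so that creates issued earlier
// in the same transaction are counted too.
ACTOR template <class Transaction>
Future<Void> reserveTenantSlot(Transaction tr) {
	TenantMetadata::tenantCount().atomicOp(tr, 1, MutationRef::AddValue);
	int64_t count = wait(TenantMetadata::tenantCount().getD(tr, Snapshot::False, 0));
	if (count > CLIENT_KNOBS->MAX_TENANTS_PER_CLUSTER) {
		throw cluster_no_capacity();
	}
	return Void();
}
```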
@@ -202,7 +204,7 @@ Future<std::pair<Optional<TenantMapEntry>, bool>> createTenantTransaction(

ACTOR template <class Transaction>
Future<int64_t> getNextTenantId(Transaction tr) {
-	Optional<int64_t> lastId = wait(TenantMetadata::lastTenantId.get(tr));
+	Optional<int64_t> lastId = wait(TenantMetadata::lastTenantId().get(tr));
	int64_t tenantId = lastId.orDefault(-1) + 1;
	if (BUGGIFY) {
		tenantId += deterministicRandom()->randomSkewedUInt32(1, 1e9);

@@ -244,7 +246,7 @@ Future<Optional<TenantMapEntry>> createTenant(Reference<DB> db,
	if (generateTenantId) {
		int64_t tenantId = wait(tenantIdFuture);
		tenantEntry.setId(tenantId);
-		TenantMetadata::lastTenantId.set(tr, tenantId);
+		TenantMetadata::lastTenantId().set(tr, tenantId);
	}

	state std::pair<Optional<TenantMapEntry>, bool> newTenant =
@@ -297,20 +299,22 @@ Future<Void> deleteTenantTransaction(Transaction tr,
	}

	// This is idempotent because we only erase an entry from the tenant map if it is present
-	TenantMetadata::tenantMap.erase(tr, name);
-	TenantMetadata::tenantCount.atomicOp(tr, -1, MutationRef::AddValue);
+	TenantMetadata::tenantMap().erase(tr, name);
+	TenantMetadata::tenantIdIndex().erase(tr, tenantEntry.get().id);
+	TenantMetadata::tenantCount().atomicOp(tr, -1, MutationRef::AddValue);

	if (tenantEntry.get().tenantGroup.present()) {
-		TenantMetadata::tenantGroupTenantIndex.erase(tr,
+		TenantMetadata::tenantGroupTenantIndex().erase(tr,
		                                             Tuple::makeTuple(tenantEntry.get().tenantGroup.get(), name));
-		KeyBackedSet<Tuple>::RangeResultType tenantsInGroup = wait(TenantMetadata::tenantGroupTenantIndex.getRange(
+		KeyBackedSet<Tuple>::RangeResultType tenantsInGroup =
+		    wait(TenantMetadata::tenantGroupTenantIndex().getRange(
		    tr,
		    Tuple::makeTuple(tenantEntry.get().tenantGroup.get()),
		    Tuple::makeTuple(keyAfter(tenantEntry.get().tenantGroup.get())),
		    2));
		if (tenantsInGroup.results.empty() ||
		    (tenantsInGroup.results.size() == 1 && tenantsInGroup.results[0].getString(1) == name)) {
-			TenantMetadata::tenantGroupMap.erase(tr, tenantEntry.get().tenantGroup.get());
+			TenantMetadata::tenantGroupMap().erase(tr, tenantEntry.get().tenantGroup.get());
		}
	}
@@ -318,8 +322,8 @@ Future<Void> deleteTenantTransaction(Transaction tr,
	if (clusterType == ClusterType::METACLUSTER_DATA) {
		// In data clusters, we store a tombstone
		state Future<KeyBackedRangeResult<int64_t>> latestTombstoneFuture =
-		    TenantMetadata::tenantTombstones.getRange(tr, {}, {}, 1, Snapshot::False, Reverse::True);
-		state Optional<TenantTombstoneCleanupData> cleanupData = wait(TenantMetadata::tombstoneCleanupData.get(tr));
+		    TenantMetadata::tenantTombstones().getRange(tr, {}, {}, 1, Snapshot::False, Reverse::True);
+		state Optional<TenantTombstoneCleanupData> cleanupData = wait(TenantMetadata::tombstoneCleanupData().get(tr));
		state Version transactionReadVersion = wait(safeThreadFutureToFuture(tr->getReadVersion()));

		// If it has been long enough since we last cleaned up the tenant tombstones, we do that first

@@ -327,7 +331,7 @@ Future<Void> deleteTenantTransaction(Transaction tr,
			state int64_t deleteThroughId = cleanupData.present() ? cleanupData.get().nextTombstoneEraseId : -1;
			// Delete all tombstones up through the one currently marked in the cleanup data
			if (deleteThroughId >= 0) {
-				TenantMetadata::tenantTombstones.erase(tr, 0, deleteThroughId + 1);
+				TenantMetadata::tenantTombstones().erase(tr, 0, deleteThroughId + 1);
			}

			KeyBackedRangeResult<int64_t> latestTombstone = wait(latestTombstoneFuture);

@@ -345,15 +349,15 @@ Future<Void> deleteTenantTransaction(Transaction tr,
			    transactionReadVersion +
			    CLIENT_KNOBS->TENANT_TOMBSTONE_CLEANUP_INTERVAL * CLIENT_KNOBS->VERSIONS_PER_SECOND;

-			TenantMetadata::tombstoneCleanupData.set(tr, updatedCleanupData);
+			TenantMetadata::tombstoneCleanupData().set(tr, updatedCleanupData);

			// If the tenant being deleted is within the tombstone window, record the tombstone
			if (tenantId.get() > updatedCleanupData.tombstonesErasedThrough) {
-				TenantMetadata::tenantTombstones.insert(tr, tenantId.get());
+				TenantMetadata::tenantTombstones().insert(tr, tenantId.get());
			}
		} else if (tenantId.get() > cleanupData.get().tombstonesErasedThrough) {
			// If the tenant being deleted is within the tombstone window, record the tombstone
-			TenantMetadata::tenantTombstones.insert(tr, tenantId.get());
+			TenantMetadata::tenantTombstones().insert(tr, tenantId.get());
		}
	}
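
The cleanup schedule converts a wall-clock knob into a version horizon relative to the transaction's read version. A hedged arithmetic sketch of the computation above (the 1e6 versions/second figure is FoundationDB's usual default; the one-hour interval is purely illustrative):

```cpp
// Hedged sketch of the horizon computation. With VERSIONS_PER_SECOND at its
// usual 1e6 and, say, a 1-hour TENANT_TOMBSTONE_CLEANUP_INTERVAL, the next
// cleanup is scheduled ~3.6e9 versions past this transaction's read version.
Version nextTombstoneEraseVersion(Version readVersion) {
	return readVersion +
	       CLIENT_KNOBS->TENANT_TOMBSTONE_CLEANUP_INTERVAL * CLIENT_KNOBS->VERSIONS_PER_SECOND;
}
```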
@@ -406,8 +410,10 @@ Future<Void> configureTenantTransaction(Transaction tr,
                                        TenantNameRef tenantName,
                                        TenantMapEntry originalEntry,
                                        TenantMapEntry updatedTenantEntry) {
+	ASSERT(updatedTenantEntry.id == originalEntry.id);
+
	tr->setOption(FDBTransactionOptions::RAW_ACCESS);
-	TenantMetadata::tenantMap.set(tr, tenantName, updatedTenantEntry);
+	TenantMetadata::tenantMap().set(tr, tenantName, updatedTenantEntry);

	// If the tenant group was changed, we need to update the tenant group metadata structures
	if (originalEntry.tenantGroup != updatedTenantEntry.tenantGroup) {

@@ -416,11 +422,11 @@ Future<Void> configureTenantTransaction(Transaction tr,
		}
		if (originalEntry.tenantGroup.present()) {
			// Remove this tenant from the original tenant group index
-			TenantMetadata::tenantGroupTenantIndex.erase(tr,
-			                                             Tuple::makeTuple(originalEntry.tenantGroup.get(), tenantName));
+			TenantMetadata::tenantGroupTenantIndex().erase(
+			    tr, Tuple::makeTuple(originalEntry.tenantGroup.get(), tenantName));

			// Check if the original tenant group is now empty. If so, remove the tenant group.
-			KeyBackedSet<Tuple>::RangeResultType tenants = wait(TenantMetadata::tenantGroupTenantIndex.getRange(
+			KeyBackedSet<Tuple>::RangeResultType tenants = wait(TenantMetadata::tenantGroupTenantIndex().getRange(
			    tr,
			    Tuple::makeTuple(originalEntry.tenantGroup.get()),
			    Tuple::makeTuple(keyAfter(originalEntry.tenantGroup.get())),
@@ -428,19 +434,19 @@ Future<Void> configureTenantTransaction(Transaction tr,
			if (tenants.results.empty() ||
			    (tenants.results.size() == 1 && tenants.results[0].getString(1) == tenantName)) {
-				TenantMetadata::tenantGroupMap.erase(tr, originalEntry.tenantGroup.get());
+				TenantMetadata::tenantGroupMap().erase(tr, originalEntry.tenantGroup.get());
			}
		}

		if (updatedTenantEntry.tenantGroup.present()) {
			// If this is creating a new tenant group, add it to the tenant group map
			Optional<TenantGroupEntry> entry =
-			    wait(TenantMetadata::tenantGroupMap.get(tr, updatedTenantEntry.tenantGroup.get()));
+			    wait(TenantMetadata::tenantGroupMap().get(tr, updatedTenantEntry.tenantGroup.get()));
			if (!entry.present()) {
-				TenantMetadata::tenantGroupMap.set(tr, updatedTenantEntry.tenantGroup.get(), TenantGroupEntry());
+				TenantMetadata::tenantGroupMap().set(tr, updatedTenantEntry.tenantGroup.get(), TenantGroupEntry());
			}

			// Insert this tenant in the tenant group index
-			TenantMetadata::tenantGroupTenantIndex.insert(
+			TenantMetadata::tenantGroupTenantIndex().insert(
			    tr, Tuple::makeTuple(updatedTenantEntry.tenantGroup.get(), tenantName));
		}
	}
@@ -456,7 +462,7 @@ Future<std::vector<std::pair<TenantName, TenantMapEntry>>> listTenantsTransactio
	tr->setOption(FDBTransactionOptions::RAW_ACCESS);

	KeyBackedRangeResult<std::pair<TenantName, TenantMapEntry>> results =
-	    wait(TenantMetadata::tenantMap.getRange(tr, begin, end, limit));
+	    wait(TenantMetadata::tenantMap().getRange(tr, begin, end, limit));

	return results.results;
}

@@ -494,13 +500,15 @@ Future<Void> renameTenantTransaction(Transaction tr, TenantNameRef oldName, Tena
	if (newEntry.present()) {
		throw tenant_already_exists();
	}
-	TenantMetadata::tenantMap.erase(tr, oldName);
-	TenantMetadata::tenantMap.set(tr, newName, oldEntry.get());
+	TenantMetadata::tenantMap().erase(tr, oldName);
+	TenantMetadata::tenantMap().set(tr, newName, oldEntry.get());
+	TenantMetadata::tenantIdIndex().set(tr, oldEntry.get().id, newName);

	// Update the tenant group index to reflect the new tenant name
	if (oldEntry.get().tenantGroup.present()) {
-		TenantMetadata::tenantGroupTenantIndex.erase(tr, Tuple::makeTuple(oldEntry.get().tenantGroup.get(), oldName));
-		TenantMetadata::tenantGroupTenantIndex.insert(tr, Tuple::makeTuple(oldEntry.get().tenantGroup.get(), newName));
+		TenantMetadata::tenantGroupTenantIndex().erase(tr, Tuple::makeTuple(oldEntry.get().tenantGroup.get(), oldName));
+		TenantMetadata::tenantGroupTenantIndex().insert(tr,
+		                                                Tuple::makeTuple(oldEntry.get().tenantGroup.get(), newName));
	}

	return Void();
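
Create, delete, and rename now all keep `tenantIdIndex` in lockstep with the name-keyed map. A hedged sketch of the invariant that buys, written as a hypothetical debug check (not part of the patch):

```cpp
// Hypothetical debug check: for every tenant entry, the id index must point
// back at the current name, across create/delete/rename.
ACTOR template <class Transaction>
Future<Void> checkTenantIdIndex(Transaction tr, TenantName name) {
	state Optional<TenantMapEntry> entry = wait(TenantMetadata::tenantMap().get(tr, name));
	if (entry.present()) {
		Optional<TenantName> indexedName = wait(TenantMetadata::tenantIdIndex().get(tr, entry.get().id));
		ASSERT(indexedName.present() && indexedName.get() == name);
	}
	return Void();
}
```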

View File

@@ -137,7 +137,7 @@ private:
	    std::map<TenantName, std::vector<std::pair<Standalone<StringRef>, Optional<Value>>>> tenants,
	    std::map<TenantGroupName, int>* tenantGroupNetTenantDelta) {
		state Future<int64_t> tenantCountFuture =
-		    TenantMetadata::tenantCount.getD(&ryw->getTransaction(), Snapshot::False, 0);
+		    TenantMetadata::tenantCount().getD(&ryw->getTransaction(), Snapshot::False, 0);
		int64_t _nextId = wait(TenantAPI::getNextTenantId(&ryw->getTransaction()));
		state int64_t nextId = _nextId;

@@ -146,7 +146,7 @@ private:
			createFutures.push_back(createTenant(ryw, tenant, config, nextId++, tenantGroupNetTenantDelta));
		}

-		TenantMetadata::lastTenantId.set(&ryw->getTransaction(), nextId - 1);
+		TenantMetadata::lastTenantId().set(&ryw->getTransaction(), nextId - 1);
		wait(waitForAll(createFutures));

		state int numCreatedTenants = 0;

@@ -240,14 +240,14 @@ private:
		ASSERT(tenantDelta < 0);
		state int removedTenants = -tenantDelta;
		KeyBackedSet<Tuple>::RangeResultType tenantsInGroup =
-		    wait(TenantMetadata::tenantGroupTenantIndex.getRange(&ryw->getTransaction(),
+		    wait(TenantMetadata::tenantGroupTenantIndex().getRange(&ryw->getTransaction(),
		                                                         Tuple::makeTuple(tenantGroup),
		                                                         Tuple::makeTuple(keyAfter(tenantGroup)),
		                                                         removedTenants + 1));

		ASSERT(tenantsInGroup.results.size() >= removedTenants);
		if (tenantsInGroup.results.size() == removedTenants) {
-			TenantMetadata::tenantGroupMap.erase(&ryw->getTransaction(), tenantGroup);
+			TenantMetadata::tenantGroupMap().erase(&ryw->getTransaction(), tenantGroup);
		}

		return Void();

View File

@@ -62,6 +62,13 @@ public:
	ThreadFuture<Key> purgeBlobGranules(const KeyRangeRef& keyRange, Version purgeVersion, bool force) override;
	ThreadFuture<Void> waitPurgeGranulesComplete(const KeyRef& purgeKey) override;

+	ThreadFuture<bool> blobbifyRange(const KeyRangeRef& keyRange) override;
+	ThreadFuture<bool> unblobbifyRange(const KeyRangeRef& keyRange) override;
+	ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> listBlobbifiedRanges(const KeyRangeRef& keyRange,
+	                                                                      int rangeLimit) override;
+
+	ThreadFuture<Version> verifyBlobRange(const KeyRangeRef& keyRange, Optional<Version> version) override;
+
	ThreadFuture<DatabaseSharedState*> createSharedState() override;
	void setSharedState(DatabaseSharedState* p) override;
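
The four new methods form the client-side surface of blob range management. A hedged sketch of how a caller might sequence them; the method names come from the patch, but the workflow ordering is an assumption:

```cpp
// Hedged usage sketch (not in the patch). `db` stands in for a
// Reference<ThreadSafeDatabase>; normalKeys covers the normal keyspace.
void manageBlobRanges(Reference<ThreadSafeDatabase> db) {
	// Ask the cluster to start managing the whole normal keyspace as blob granules.
	ThreadFuture<bool> blobbified = db->blobbifyRange(normalKeys);

	// Verify the range is readable as blob granules; an empty Optional
	// presumably means "at the latest version".
	ThreadFuture<Version> readableAt = db->verifyBlobRange(normalKeys, Optional<Version>());

	// Enumerate up to 1000 blobbified ranges overlapping the keyspace.
	ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> ranges =
	    db->listBlobbifiedRanges(normalKeys, 1000);
}
```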
@@ -72,7 +79,8 @@ private:
	DatabaseContext* db;

public: // Internal use only
-	ThreadSafeDatabase(Reference<IClusterConnectionRecord> connectionRecord, int apiVersion);
+	enum class ConnectionRecordType { FILE, CONNECTION_STRING };
+	ThreadSafeDatabase(ConnectionRecordType connectionRecordType, std::string connectionRecord, int apiVersion);
	ThreadSafeDatabase(DatabaseContext* db) : db(db) {}
	DatabaseContext* unsafeGetPtr() const { return db; }
};
@@ -148,7 +156,8 @@ public:
	ThreadFuture<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(const KeyRangeRef& range,
	                                                                int64_t chunkSize) override;

-	ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(const KeyRangeRef& keyRange) override;
+	ThreadFuture<Standalone<VectorRef<KeyRangeRef>>> getBlobGranuleRanges(const KeyRangeRef& keyRange,
+	                                                                      int rangeLimit) override;

	ThreadResult<RangeResult> readBlobGranules(const KeyRangeRef& keyRange,
	                                           Version beginVersion,

@@ -205,6 +214,7 @@ class ThreadSafeApi : public IClientApi, ThreadSafeReferenceCounted<ThreadSafeAp
public:
	void selectApiVersion(int apiVersion) override;
	const char* getClientVersion() override;
+	void useFutureProtocolVersion() override;

	void setNetworkOption(FDBNetworkOptions::Option option, Optional<StringRef> value = Optional<StringRef>()) override;
	void setupNetwork() override;

@@ -221,7 +231,7 @@ private:
	ThreadSafeApi();

	int apiVersion;
-	const std::string clientVersion;
+	std::string clientVersion;
	uint64_t transportId;

	Mutex lock;

View File

@@ -115,6 +115,9 @@ description is not currently required but encouraged.
	<Option name="client_threads_per_version" code="65"
	        paramType="Int" paramDescription="Number of client threads to be spawned. Each cluster will be serviced by a single client thread."
	        description="Spawns multiple worker threads for each version of the client that is loaded. Setting this to a number greater than one implies disable_local_client." />
+	<Option name="future_version_client_library" code="66"
+	        paramType="String" paramDescription="path to client library"
+	        description="Adds an external client library to be used with a future version protocol. This option can be used for testing purposes only!" />
	<Option name="disable_client_statistics_logging" code="70"
	        description="Disables logging of client statistics, such as sampled transaction activity." />
	<Option name="enable_slow_task_profiling" code="71"

Some files were not shown because too many files have changed in this diff.