sync with upstream main

Fuheng Zhao 2022-08-31 15:46:39 -07:00
parent 620c119e9a
commit 0aa096dc17
367 changed files with 27212 additions and 4712 deletions

View File

@ -20,7 +20,7 @@ If you have questions, we encourage you to engage in discussion on the [communit
## Before you get started
### Community Guidelines
We want the FoundationDB community to be as welcoming and inclusive as possible, and have adopted a [Code of Conduct](CODE_OF_CONDUCT.md) that we ask all community members to read and observe.
We want the FoundationDB community to be as welcoming and inclusive as possible, and have adopted a [Code of Conduct](CODE_OF_CONDUCT.md) that we ask all community members to read and abide by.
### Project Licensing
By submitting a pull request, you represent that you have the right to license your contribution to Apple and the community, and agree by submitting the patch that your contributions are licensed under the Apache 2.0 license.
@ -34,7 +34,7 @@ Members of the Apple FoundationDB team are part of the core committers helping r
## Contributing
### Opening a Pull Request
We love pull requests! For minor changes, feel free to open up a PR directly. For larger feature development and any changes that may require community discussion, we ask that you discuss your ideas on the [community forums](https://forums.foundationdb.org) prior to opening a PR, and then reference that thread within your PR comment. Please refer to [FoundationDB Commit Process](https://github.com/apple/foundationdb/wiki/FoundationDB-Commit-Process) for more detailed guidelines.
We love pull requests! For minor changes, feel free to open up a PR directly. For larger feature development and any changes that may require community discussion, we ask that you discuss your ideas on the [community forums](https://forums.foundationdb.org) prior to opening a PR, and then reference that thread within your PR comment. Please refer to the [FoundationDB Commit Process](https://github.com/apple/foundationdb/wiki/FoundationDB-Commit-Process) for more detailed guidelines.
CI will be run automatically for core committers, and for community PRs it will be initiated by the request of a core committer. Tests can also be run locally via `ctest`, and core committers can run additional validation on pull requests prior to merging them.
@ -46,10 +46,10 @@ To report a security issue, please **DO NOT** start by filing a public issue or
## Project Communication
### Community Forums
We encourage your participation asking questions and helping improve the FoundationDB project. Check out the [FoundationDB community forums](https://forums.foundationdb.org), which serve a similar function as mailing lists in many open source projects. The forums are organized into three sections:
We encourage your participation asking questions and helping improve the FoundationDB project. Check out the [FoundationDB community forums](https://forums.foundationdb.org), which serve a similar function as mailing lists in many open source projects. The forums are organized into three categories:
* [Development](https://forums.foundationdb.org/c/development): For discussing the internals and development of the FoundationDB core, as well as layers.
* [Using FoundationDB](https://forums.foundationdb.org/c/using-foundationdb): For discussing user-facing topics. Getting started and have a question? This is the place for you.
* [Using FoundationDB](https://forums.foundationdb.org/c/using-foundationdb): For discussing user-facing topics. Getting started and have a question? This is the category for you.
* [Site Feedback](https://forums.foundationdb.org/c/site-feedback): A category for discussing the forums and the OSS project, its organization, how it works, and how we can improve it.
### Using GitHub Issues and Community Forums
@ -63,4 +63,4 @@ GitHub Issues should be used for tracking tasks. If you know the specific code t
* Implementing an agreed upon feature: *GitHub Issues*
### Project and Development Updates
Stay connected to the project and the community! For project and community updates, follow the [FoundationDB project blog](https://www.foundationdb.org/blog/). Development announcements will be made via the community forums' [dev-announce](https://forums.foundationdb.org/c/development/dev-announce) section.
Stay connected to the project and the community! For project and community updates, follow the [FoundationDB project blog](https://www.foundationdb.org/blog/). Development announcements will be made via the community forums' [dev-announce](https://forums.foundationdb.org/c/development/dev-announce) category.

View File

@ -139,8 +139,12 @@ if(NOT WIN32)
test/apitester/TesterTestSpec.cpp
test/apitester/TesterTestSpec.h
test/apitester/TesterBlobGranuleCorrectnessWorkload.cpp
test/apitester/TesterBlobGranuleErrorsWorkload.cpp
test/apitester/TesterBlobGranuleUtil.cpp
test/apitester/TesterBlobGranuleUtil.h
test/apitester/TesterCancelTransactionWorkload.cpp
test/apitester/TesterCorrectnessWorkload.cpp
test/apitester/TesterExampleWorkload.cpp
test/apitester/TesterKeyValueStore.cpp
test/apitester/TesterKeyValueStore.h
test/apitester/TesterOptions.h
@ -332,6 +336,24 @@ if(NOT WIN32)
@SERVER_CA_FILE@
)
add_test(NAME fdb_c_upgrade_to_future_version
COMMAND ${CMAKE_SOURCE_DIR}/tests/TestRunner/upgrade_test.py
--build-dir ${CMAKE_BINARY_DIR}
--test-file ${CMAKE_SOURCE_DIR}/bindings/c/test/apitester/tests/upgrade/MixedApiWorkloadMultiThr.toml
--upgrade-path "7.2.0" "7.3.0" "7.2.0"
--process-number 3
)
set_tests_properties("fdb_c_upgrade_to_future_version" PROPERTIES ENVIRONMENT "${SANITIZER_OPTIONS}")
add_test(NAME fdb_c_upgrade_to_future_version_blob_granules
COMMAND ${CMAKE_SOURCE_DIR}/tests/TestRunner/upgrade_test.py
--build-dir ${CMAKE_BINARY_DIR}
--test-file ${CMAKE_SOURCE_DIR}/bindings/c/test/apitester/tests/upgrade/ApiBlobGranulesCorrectness.toml
--upgrade-path "7.2.0" "7.3.0" "7.2.0"
--blob-granules-enabled
--process-number 3
)
if(CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64" AND NOT USE_SANITIZER)
add_test(NAME fdb_c_upgrade_single_threaded_630api
COMMAND ${CMAKE_SOURCE_DIR}/tests/TestRunner/upgrade_test.py
@ -439,7 +461,7 @@ if (OPEN_FOR_IDE)
target_link_libraries(fdb_c_shim_lib_tester PRIVATE fdb_c_shim SimpleOpt fdb_cpp Threads::Threads)
target_include_directories(fdb_c_shim_lib_tester PUBLIC ${CMAKE_CURRENT_SOURCE_DIR} ${CMAKE_CURRENT_BINARY_DIR}/foundationdb/ ${CMAKE_SOURCE_DIR}/flow/include)
elseif(NOT WIN32 AND NOT APPLE AND NOT USE_UBSAN) # Linux Only, non-ubsan only
elseif(NOT WIN32 AND NOT APPLE AND NOT USE_SANITIZER) # Linux Only, non-sanitizer only
set(SHIM_LIB_OUTPUT_DIR ${CMAKE_CURRENT_BINARY_DIR})
@ -465,7 +487,7 @@ elseif(NOT WIN32 AND NOT APPLE AND NOT USE_UBSAN) # Linux Only, non-ubsan only
DEPENDS ${IMPLIBSO_SRC}
COMMENT "Generating source code for C shim library")
add_library(fdb_c_shim SHARED ${SHIM_LIB_GEN_SRC} foundationdb/fdb_c_shim.h fdb_c_shim.cpp)
add_library(fdb_c_shim STATIC ${SHIM_LIB_GEN_SRC} foundationdb/fdb_c_shim.h fdb_c_shim.cpp)
target_link_options(fdb_c_shim PRIVATE "LINKER:--version-script=${CMAKE_CURRENT_SOURCE_DIR}/fdb_c.map,-z,nodelete,-z,noexecstack")
target_link_libraries(fdb_c_shim PUBLIC dl)
target_include_directories(fdb_c_shim PUBLIC
@ -492,7 +514,7 @@ elseif(NOT WIN32 AND NOT APPLE AND NOT USE_UBSAN) # Linux Only, non-ubsan only
--api-test-dir ${CMAKE_SOURCE_DIR}/bindings/c/test/apitester/tests
)
endif() # End Linux only, non-ubsan only
endif() # End Linux only, non-sanitizer only
# TODO: re-enable once the old vcxproj-based build system is removed.
#generate_export_header(fdb_c EXPORT_MACRO_NAME "DLLEXPORT"
@ -537,7 +559,7 @@ fdb_install(
DESTINATION_SUFFIX "/cmake/${targets_export_name}"
COMPONENT clients)
if(NOT WIN32 AND NOT APPLE AND NOT USE_UBSAN) # Linux Only, non-ubsan only
if(NOT WIN32 AND NOT APPLE AND NOT USE_SANITIZER) # Linux Only, non-sanitizer only
fdb_install(
FILES foundationdb/fdb_c_shim.h

View File

@ -79,9 +79,10 @@ extern "C" DLLEXPORT fdb_bool_t fdb_error_predicate(int predicate_test, fdb_erro
if (predicate_test == FDBErrorPredicates::RETRYABLE_NOT_COMMITTED) {
return code == error_code_not_committed || code == error_code_transaction_too_old ||
code == error_code_future_version || code == error_code_database_locked ||
code == error_code_proxy_memory_limit_exceeded || code == error_code_batch_transaction_throttled ||
code == error_code_process_behind || code == error_code_tag_throttled ||
code == error_code_unknown_tenant;
code == error_code_grv_proxy_memory_limit_exceeded ||
code == error_code_commit_proxy_memory_limit_exceeded ||
code == error_code_batch_transaction_throttled || code == error_code_process_behind ||
code == error_code_tag_throttled || code == error_code_unknown_tenant;
}
return false;
}
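For context, this predicate is the client-facing way to decide whether a failed, possibly non-idempotent transaction can be retried without risking a double commit. Below is a minimal caller-side sketch, not part of this commit; it assumes API version 720 and the generated C constant `FDB_ERROR_PREDICATE_RETRYABLE_NOT_COMMITTED`, and the helper name is illustrative.

```c
#define FDB_API_VERSION 720 /* assumed API version for this sketch */
#include <foundationdb/fdb_c.h>

/* Returns non-zero when the error guarantees the commit was NOT applied,
 * so even a non-idempotent transaction can safely be retried. The list now
 * includes the split grv/commit proxy memory-limit errors added above. */
int safe_to_retry_non_idempotent(fdb_error_t err) {
    return fdb_error_predicate(FDB_ERROR_PREDICATE_RETRYABLE_NOT_COMMITTED, err);
}
```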
@ -238,6 +239,10 @@ fdb_error_t fdb_future_get_version_v619(FDBFuture* f, int64_t* out_version) {
CATCH_AND_RETURN(*out_version = TSAV(Version, f)->get(););
}
extern "C" DLLEXPORT fdb_error_t fdb_future_get_bool(FDBFuture* f, fdb_bool_t* out_value) {
CATCH_AND_RETURN(*out_value = TSAV(bool, f)->get(););
}
extern "C" DLLEXPORT fdb_error_t fdb_future_get_int64(FDBFuture* f, int64_t* out_value) {
CATCH_AND_RETURN(*out_value = TSAV(int64_t, f)->get(););
}
@ -493,6 +498,54 @@ extern "C" DLLEXPORT FDBFuture* fdb_database_wait_purge_granules_complete(FDBDat
FDBFuture*)(DB(db)->waitPurgeGranulesComplete(StringRef(purge_key_name, purge_key_name_length)).extractPtr());
}
extern "C" DLLEXPORT FDBFuture* fdb_database_blobbify_range(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length) {
return (FDBFuture*)(DB(db)
->blobbifyRange(KeyRangeRef(StringRef(begin_key_name, begin_key_name_length),
StringRef(end_key_name, end_key_name_length)))
.extractPtr());
}
extern "C" DLLEXPORT FDBFuture* fdb_database_unblobbify_range(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length) {
return (FDBFuture*)(DB(db)
->unblobbifyRange(KeyRangeRef(StringRef(begin_key_name, begin_key_name_length),
StringRef(end_key_name, end_key_name_length)))
.extractPtr());
}
extern "C" DLLEXPORT FDBFuture* fdb_database_list_blobbified_ranges(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length,
int rangeLimit) {
return (FDBFuture*)(DB(db)
->listBlobbifiedRanges(KeyRangeRef(StringRef(begin_key_name, begin_key_name_length),
StringRef(end_key_name, end_key_name_length)),
rangeLimit)
.extractPtr());
}
extern "C" DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_database_verify_blob_range(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length,
int64_t version) {
return (FDBFuture*)(DB(db)
->verifyBlobRange(KeyRangeRef(StringRef(begin_key_name, begin_key_name_length),
StringRef(end_key_name, end_key_name_length)),
version)
.extractPtr());
}
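A hedged usage sketch for the new database-level range-management calls (not part of this commit, assuming API version 720): it shows only `fdb_database_blobbify_range`, whose future resolves to a boolean readable through the new `fdb_future_get_bool` accessor; the helper name is illustrative.

```c
#define FDB_API_VERSION 720 /* assumed API version for this sketch */
#include <foundationdb/fdb_c.h>
#include <string.h>

/* Marks [begin, end) for blobbification and reports whether the range was accepted. */
fdb_error_t blobbify_range(FDBDatabase* db, const char* begin, const char* end, fdb_bool_t* accepted) {
    FDBFuture* f = fdb_database_blobbify_range(db,
                                               (const uint8_t*)begin, (int)strlen(begin),
                                               (const uint8_t*)end, (int)strlen(end));
    fdb_error_t err = fdb_future_block_until_ready(f);
    if (!err)
        err = fdb_future_get_bool(f, accepted); /* also surfaces the future's error, if any */
    fdb_future_destroy(f);
    return err;
}
```

The other calls follow the same pattern: `fdb_database_unblobbify_range` also yields a boolean, `fdb_database_list_blobbified_ranges` is read with `fdb_future_get_keyrange_array`, and `fdb_database_verify_blob_range` with `fdb_future_get_int64`.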
extern "C" DLLEXPORT fdb_error_t fdb_tenant_create_transaction(FDBTenant* tenant, FDBTransaction** out_transaction) {
CATCH_AND_RETURN(*out_transaction = (FDBTransaction*)TENANT(tenant)->createTransaction().extractPtr(););
}
@ -855,11 +908,12 @@ extern "C" DLLEXPORT FDBFuture* fdb_transaction_get_blob_granule_ranges(FDBTrans
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length) {
int end_key_name_length,
int rangeLimit) {
RETURN_FUTURE_ON_ERROR(
Standalone<VectorRef<KeyRangeRef>>,
KeyRangeRef range(KeyRef(begin_key_name, begin_key_name_length), KeyRef(end_key_name, end_key_name_length));
return (FDBFuture*)(TXN(tr)->getBlobGranuleRanges(range).extractPtr()););
return (FDBFuture*)(TXN(tr)->getBlobGranuleRanges(range, rangeLimit).extractPtr()););
}
extern "C" DLLEXPORT FDBResult* fdb_transaction_read_blob_granules(FDBTransaction* tr,
@ -889,6 +943,57 @@ extern "C" DLLEXPORT FDBResult* fdb_transaction_read_blob_granules(FDBTransactio
return (FDBResult*)(TXN(tr)->readBlobGranules(range, beginVersion, rv, context).extractPtr()););
}
extern "C" DLLEXPORT FDBFuture* fdb_transaction_read_blob_granules_start(FDBTransaction* tr,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length,
int64_t beginVersion,
int64_t readVersion,
int64_t* readVersionOut) {
Optional<Version> rv;
if (readVersion != latestVersion) {
rv = readVersion;
}
return (FDBFuture*)(TXN(tr)
->readBlobGranulesStart(KeyRangeRef(KeyRef(begin_key_name, begin_key_name_length),
KeyRef(end_key_name, end_key_name_length)),
beginVersion,
rv,
readVersionOut)
.extractPtr());
}
extern "C" DLLEXPORT FDBResult* fdb_transaction_read_blob_granules_finish(FDBTransaction* tr,
FDBFuture* f,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length,
int64_t beginVersion,
int64_t readVersion,
FDBReadBlobGranuleContext* granule_context) {
// FIXME: better way to convert?
ReadBlobGranuleContext context;
context.userContext = granule_context->userContext;
context.start_load_f = granule_context->start_load_f;
context.get_load_f = granule_context->get_load_f;
context.free_load_f = granule_context->free_load_f;
context.debugNoMaterialize = granule_context->debugNoMaterialize;
context.granuleParallelism = granule_context->granuleParallelism;
ThreadFuture<Standalone<VectorRef<BlobGranuleChunkRef>>> startFuture(
TSAV(Standalone<VectorRef<BlobGranuleChunkRef>>, f));
return (FDBResult*)(TXN(tr)
->readBlobGranulesFinish(startFuture,
KeyRangeRef(KeyRef(begin_key_name, begin_key_name_length),
KeyRef(end_key_name, end_key_name_length)),
beginVersion,
readVersion,
context)
.extractPtr());
}
#include "fdb_c_function_pointers.g.h"
#define FDB_API_CHANGED(func, ver) \
@ -964,6 +1069,10 @@ extern "C" DLLEXPORT const char* fdb_get_client_version() {
return API->getClientVersion();
}
extern "C" DLLEXPORT void fdb_use_future_protocol_version() {
API->useFutureProtocolVersion();
}
#if defined(__APPLE__)
#include <dlfcn.h>
__attribute__((constructor)) static void initialize() {

View File

@ -227,6 +227,8 @@ DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_future_set_callback(FDBFuture* f,
DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_future_get_error(FDBFuture* f);
#endif
DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_future_get_bool(FDBFuture* f, fdb_bool_t* out);
DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_future_get_int64(FDBFuture* f, int64_t* out);
DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_future_get_uint64(FDBFuture* f, uint64_t* out);
@ -321,6 +323,32 @@ DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_database_wait_purge_granules_complet
uint8_t const* purge_key_name,
int purge_key_name_length);
DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_database_blobbify_range(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length);
DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_database_unblobbify_range(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length);
DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_database_list_blobbified_ranges(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length,
int rangeLimit);
DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_database_verify_blob_range(FDBDatabase* db,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length,
int64_t version);
DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_tenant_create_transaction(FDBTenant* tenant,
FDBTransaction** out_transaction);
@ -479,7 +507,8 @@ DLLEXPORT WARN_UNUSED_RESULT FDBFuture* fdb_transaction_get_blob_granule_ranges(
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length);
int end_key_name_length,
int rangeLimit);
/* LatestVersion (-2) for readVersion means get read version from transaction
Separated out as optional because BG reads can support longer-lived reads than normal FDB transactions */
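As a rough illustration of the new `rangeLimit` parameter (not part of this commit, assuming API version 720 and the existing `fdb_future_get_keyrange_array` accessor), a caller might enumerate granule boundaries like this:

```c
#define FDB_API_VERSION 720 /* assumed API version for this sketch */
#include <foundationdb/fdb_c.h>
#include <stdio.h>

/* Prints up to 1000 granule ranges covering [begin, end), mirroring the limit the testers use. */
fdb_error_t print_granule_ranges(FDBTransaction* tr,
                                 const uint8_t* begin, int begin_len,
                                 const uint8_t* end, int end_len) {
    FDBFuture* f = fdb_transaction_get_blob_granule_ranges(tr, begin, begin_len, end, end_len, 1000);
    fdb_error_t err = fdb_future_block_until_ready(f);
    if (!err) {
        const FDBKeyRange* ranges;
        int count;
        err = fdb_future_get_keyrange_array(f, &ranges, &count);
        for (int i = 0; !err && i < count; i++) {
            printf("[%.*s, %.*s)\n",
                   ranges[i].begin_key_length, (const char*)ranges[i].begin_key,
                   ranges[i].end_key_length, (const char*)ranges[i].end_key);
        }
    }
    fdb_future_destroy(f);
    return err;
}
```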

View File

@ -49,6 +49,29 @@ DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_future_get_shared_state(FDBFuture*
DLLEXPORT WARN_UNUSED_RESULT fdb_error_t fdb_create_database_from_connection_string(const char* connection_string,
FDBDatabase** out_database);
DLLEXPORT void fdb_use_future_protocol_version();
// the logical read_blob_granules is broken out (at different points depending on the client type) into the asynchronous
// start() that happens on the fdb network thread, and synchronous finish() that happens off it
DLLEXPORT FDBFuture* fdb_transaction_read_blob_granules_start(FDBTransaction* tr,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length,
int64_t beginVersion,
int64_t readVersion,
int64_t* readVersionOut);
DLLEXPORT FDBResult* fdb_transaction_read_blob_granules_finish(FDBTransaction* tr,
FDBFuture* f,
uint8_t const* begin_key_name,
int begin_key_name_length,
uint8_t const* end_key_name,
int end_key_name_length,
int64_t beginVersion,
int64_t readVersion,
FDBReadBlobGranuleContext* granuleContext);
#ifdef __cplusplus
}
#endif
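To make the start()/finish() split described in the comment above concrete, here is a rough caller-side sketch, not part of this commit. It assumes API version 720, that this internal header is reachable as `foundationdb/fdb_c_internal.h`, and that granule files are readable under a local base path, as in `TesterBlobGranuleUtil.cpp` above; the helper and callback names are illustrative and error handling is minimal.

```c
#define FDB_API_VERSION 720 /* assumed API version for this sketch */
#include <foundationdb/fdb_c.h>
#include <foundationdb/fdb_c_internal.h> /* declares the start()/finish() split above */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy in-progress load table keyed by load id; a real client would track loads per request. */
#define MAX_LOADS 64
static uint8_t* g_loads[MAX_LOADS];
static int64_t g_next_id = 0;

static int64_t start_load(const char* name, int name_len, int64_t off, int64_t len, int64_t full_len, void* ctx) {
    char path[1024];
    snprintf(path, sizeof(path), "%s%.*s", (const char*)ctx, name_len, name);
    int64_t id = g_next_id++;
    uint8_t* buf = (uint8_t*)malloc((size_t)len);
    FILE* fp = fopen(path, "rb");
    if (fp) {
        fseek(fp, (long)off, SEEK_SET);
        size_t n = fread(buf, 1, (size_t)len, fp);
        (void)n;
        fclose(fp);
    }
    (void)full_len;
    g_loads[id % MAX_LOADS] = buf;
    return id;
}
static uint8_t* get_load(int64_t id, void* ctx) { (void)ctx; return g_loads[id % MAX_LOADS]; }
static void free_load(int64_t id, void* ctx) { (void)ctx; free(g_loads[id % MAX_LOADS]); g_loads[id % MAX_LOADS] = NULL; }

/* The asynchronous start() runs on the fdb network thread; once its future is ready,
 * the synchronous finish() materializes the result off that thread using the callbacks. */
FDBResult* read_granules_split(FDBTransaction* tr,
                               const uint8_t* begin, int begin_len,
                               const uint8_t* end, int end_len,
                               const char* base_path) {
    int64_t read_version_out = -1;
    FDBFuture* start = fdb_transaction_read_blob_granules_start(
        tr, begin, begin_len, end, end_len, 0 /* beginVersion */, -2 /* latestVersion */, &read_version_out);
    if (fdb_future_block_until_ready(start) != 0) {
        fdb_future_destroy(start);
        return NULL;
    }

    FDBReadBlobGranuleContext ctx;
    ctx.userContext = (void*)base_path;
    ctx.start_load_f = &start_load;
    ctx.get_load_f = &get_load;
    ctx.free_load_f = &free_load;
    ctx.debugNoMaterialize = 0;
    ctx.granuleParallelism = 1;

    /* Cleanup of the start future and full error handling are elided for brevity. */
    return fdb_transaction_read_blob_granules_finish(
        tr, start, begin, begin_len, end, end_len, 0, read_version_out, &ctx);
}
```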

View File

@ -18,61 +18,13 @@
* limitations under the License.
*/
#include "TesterApiWorkload.h"
#include "TesterBlobGranuleUtil.h"
#include "TesterUtil.h"
#include <memory>
#include <fmt/format.h>
namespace FdbApiTester {
class TesterGranuleContext {
public:
std::unordered_map<int64_t, uint8_t*> loadsInProgress;
int64_t nextId = 0;
std::string basePath;
~TesterGranuleContext() {
// if there was an error or not all loads finished, delete data
for (auto& it : loadsInProgress) {
uint8_t* dataToFree = it.second;
delete[] dataToFree;
}
}
};
static int64_t granule_start_load(const char* filename,
int filenameLength,
int64_t offset,
int64_t length,
int64_t fullFileLength,
void* context) {
TesterGranuleContext* ctx = (TesterGranuleContext*)context;
int64_t loadId = ctx->nextId++;
uint8_t* buffer = new uint8_t[length];
std::ifstream fin(ctx->basePath + std::string(filename, filenameLength), std::ios::in | std::ios::binary);
fin.seekg(offset);
fin.read((char*)buffer, length);
ctx->loadsInProgress.insert({ loadId, buffer });
return loadId;
}
static uint8_t* granule_get_load(int64_t loadId, void* context) {
TesterGranuleContext* ctx = (TesterGranuleContext*)context;
return ctx->loadsInProgress.at(loadId);
}
static void granule_free_load(int64_t loadId, void* context) {
TesterGranuleContext* ctx = (TesterGranuleContext*)context;
auto it = ctx->loadsInProgress.find(loadId);
uint8_t* dataToFree = it->second;
delete[] dataToFree;
ctx->loadsInProgress.erase(it);
}
class ApiBlobGranuleCorrectnessWorkload : public ApiWorkload {
public:
ApiBlobGranuleCorrectnessWorkload(const WorkloadConfig& config) : ApiWorkload(config) {
@ -80,9 +32,12 @@ public:
if (Random::get().randomInt(0, 1) == 0) {
excludedOpTypes.push_back(OP_CLEAR_RANGE);
}
// FIXME: remove! this bug is fixed in another PR
excludedOpTypes.push_back(OP_GET_RANGES);
}
private:
// FIXME: use other new blob granule apis!
enum OpType { OP_INSERT, OP_CLEAR, OP_CLEAR_RANGE, OP_READ, OP_GET_RANGES, OP_LAST = OP_GET_RANGES };
std::vector<OpType> excludedOpTypes;
@ -101,16 +56,8 @@ private:
execTransaction(
[this, begin, end, results, tooOld](auto ctx) {
ctx->tx().setOption(FDB_TR_OPTION_READ_YOUR_WRITES_DISABLE);
TesterGranuleContext testerContext;
testerContext.basePath = ctx->getBGBasePath();
fdb::native::FDBReadBlobGranuleContext granuleContext;
granuleContext.userContext = &testerContext;
granuleContext.debugNoMaterialize = false;
granuleContext.granuleParallelism = 1;
granuleContext.start_load_f = &granule_start_load;
granuleContext.get_load_f = &granule_get_load;
granuleContext.free_load_f = &granule_free_load;
TesterGranuleContext testerContext(ctx->getBGBasePath());
fdb::native::FDBReadBlobGranuleContext granuleContext = createGranuleContext(&testerContext);
fdb::Result res = ctx->tx().readBlobGranules(
begin, end, 0 /* beginVersion */, -2 /* latest read version */, granuleContext);
@ -124,8 +71,10 @@ private:
} else if (err.code() != error_code_success) {
ctx->onError(err);
} else {
auto& [out_kv, out_count, out_more] = out;
auto resCopy = copyKeyValueArray(out);
auto& [resVector, out_more] = resCopy;
ASSERT(!out_more);
results.get()->assign(resVector.begin(), resVector.end());
if (!seenReadSuccess) {
info("BlobGranuleCorrectness::randomReadOp first success\n");
}
@ -178,7 +127,7 @@ private:
}
execTransaction(
[begin, end, results](auto ctx) {
fdb::Future f = ctx->tx().getBlobGranuleRanges(begin, end).eraseType();
fdb::Future f = ctx->tx().getBlobGranuleRanges(begin, end, 1000).eraseType();
ctx->continueAfter(
f,
[ctx, f, results]() {
@ -196,11 +145,25 @@ private:
for (int i = 0; i < results->size(); i++) {
// no empty or inverted ranges
if ((*results)[i].beginKey >= (*results)[i].endKey) {
error(fmt::format("Empty/inverted range [{0} - {1}) for getBlobGranuleRanges({2} - {3})",
fdb::toCharsRef((*results)[i].beginKey),
fdb::toCharsRef((*results)[i].endKey),
fdb::toCharsRef(begin),
fdb::toCharsRef(end)));
}
ASSERT((*results)[i].beginKey < (*results)[i].endKey);
}
for (int i = 1; i < results->size(); i++) {
// ranges contain entire requested key range
if ((*results)[i].beginKey != (*results)[i - 1].endKey) {
error(fmt::format("Non-contiguous range [{0} - {1}) for getBlobGranuleRanges({2} - {3})",
fdb::toCharsRef((*results)[i].beginKey),
fdb::toCharsRef((*results)[i].endKey),
fdb::toCharsRef(begin),
fdb::toCharsRef(end)));
}
ASSERT((*results)[i].beginKey == (*results)[i - 1].endKey);
}

View File

@ -0,0 +1,145 @@
/*
* TesterBlobGranuleErrorsWorkload.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2022 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "TesterApiWorkload.h"
#include "TesterBlobGranuleUtil.h"
#include "TesterUtil.h"
#include <memory>
#include <fmt/format.h>
namespace FdbApiTester {
class BlobGranuleErrorsWorkload : public ApiWorkload {
public:
BlobGranuleErrorsWorkload(const WorkloadConfig& config) : ApiWorkload(config) {}
private:
enum OpType {
OP_READ_NO_MATERIALIZE,
OP_READ_FILE_LOAD_ERROR,
OP_READ_TOO_OLD,
OP_CANCEL_RANGES,
OP_LAST = OP_CANCEL_RANGES
};
// Allow reads at the start to get blob_granule_transaction_too_old if BG data isn't initialized yet
// FIXME: should still guarantee a read succeeds eventually somehow
bool seenReadSuccess = false;
void doErrorOp(TTaskFct cont,
std::string basePathAddition,
bool doMaterialize,
int64_t readVersion,
fdb::native::fdb_error_t expectedError) {
fdb::Key begin = randomKeyName();
fdb::Key end = begin;
// [K - K) empty range will succeed read because there is trivially nothing to do, so don't do it
while (end == begin) {
end = randomKeyName();
}
if (begin > end) {
std::swap(begin, end);
}
execTransaction(
[this, begin, end, basePathAddition, doMaterialize, readVersion, expectedError](auto ctx) {
ctx->tx().setOption(FDB_TR_OPTION_READ_YOUR_WRITES_DISABLE);
TesterGranuleContext testerContext(ctx->getBGBasePath() + basePathAddition);
fdb::native::FDBReadBlobGranuleContext granuleContext = createGranuleContext(&testerContext);
granuleContext.debugNoMaterialize = !doMaterialize;
fdb::Result res =
ctx->tx().readBlobGranules(begin, end, 0 /* beginVersion */, readVersion, granuleContext);
auto out = fdb::Result::KeyValueRefArray{};
fdb::Error err = res.getKeyValueArrayNothrow(out);
if (err.code() == error_code_success) {
error(fmt::format("Operation succeeded in error test!"));
}
ASSERT(err.code() != error_code_success);
if (err.code() != error_code_blob_granule_transaction_too_old) {
seenReadSuccess = true;
}
if (err.code() != expectedError) {
info(fmt::format("incorrect error. Expected {}, Got {}", err.code(), expectedError));
if (err.code() == error_code_blob_granule_transaction_too_old) {
ASSERT(!seenReadSuccess);
ctx->done();
} else {
ctx->onError(err);
}
} else {
ctx->done();
}
},
[this, cont]() { schedule(cont); });
}
void randomOpReadNoMaterialize(TTaskFct cont) {
// ensure setting noMaterialize flag produces blob_granule_not_materialized
doErrorOp(cont, "", false, -2 /*latest read version */, error_code_blob_granule_not_materialized);
}
void randomOpReadFileLoadError(TTaskFct cont) {
// point to a file path that doesn't exist by adding an extra suffix
doErrorOp(cont, "extrapath/", true, -2 /*latest read version */, error_code_blob_granule_file_load_error);
}
void randomOpReadTooOld(TTaskFct cont) {
// read at a version (1) that should predate granule data
doErrorOp(cont, "", true, 1, error_code_blob_granule_transaction_too_old);
}
void randomCancelGetRangesOp(TTaskFct cont) {
fdb::Key begin = randomKeyName();
fdb::Key end = randomKeyName();
if (begin > end) {
std::swap(begin, end);
}
execTransaction(
[begin, end](auto ctx) {
fdb::Future f = ctx->tx().getBlobGranuleRanges(begin, end, 1000).eraseType();
ctx->done();
},
[this, cont]() { schedule(cont); });
}
void randomOperation(TTaskFct cont) override {
OpType txType = (OpType)Random::get().randomInt(0, OP_LAST);
switch (txType) {
case OP_READ_NO_MATERIALIZE:
randomOpReadNoMaterialize(cont);
break;
case OP_READ_FILE_LOAD_ERROR:
randomOpReadFileLoadError(cont);
break;
case OP_READ_TOO_OLD:
randomOpReadTooOld(cont);
break;
case OP_CANCEL_RANGES:
randomCancelGetRangesOp(cont);
break;
}
}
};
WorkloadFactory<BlobGranuleErrorsWorkload> BlobGranuleErrorsWorkloadFactory("BlobGranuleErrors");
} // namespace FdbApiTester

View File

@ -0,0 +1,80 @@
/*
* TesterBlobGranuleUtil.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2022 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "TesterBlobGranuleUtil.h"
#include "TesterUtil.h"
#include <fstream>
namespace FdbApiTester {
// FIXME: avoid duplicating this between files!
static int64_t granule_start_load(const char* filename,
int filenameLength,
int64_t offset,
int64_t length,
int64_t fullFileLength,
void* context) {
TesterGranuleContext* ctx = (TesterGranuleContext*)context;
int64_t loadId = ctx->nextId++;
uint8_t* buffer = new uint8_t[length];
std::ifstream fin(ctx->basePath + std::string(filename, filenameLength), std::ios::in | std::ios::binary);
if (fin.fail()) {
delete[] buffer;
buffer = nullptr;
} else {
fin.seekg(offset);
fin.read((char*)buffer, length);
}
ctx->loadsInProgress.insert({ loadId, buffer });
return loadId;
}
static uint8_t* granule_get_load(int64_t loadId, void* context) {
TesterGranuleContext* ctx = (TesterGranuleContext*)context;
return ctx->loadsInProgress.at(loadId);
}
static void granule_free_load(int64_t loadId, void* context) {
TesterGranuleContext* ctx = (TesterGranuleContext*)context;
auto it = ctx->loadsInProgress.find(loadId);
uint8_t* dataToFree = it->second;
delete[] dataToFree;
ctx->loadsInProgress.erase(it);
}
fdb::native::FDBReadBlobGranuleContext createGranuleContext(const TesterGranuleContext* testerContext) {
fdb::native::FDBReadBlobGranuleContext granuleContext;
granuleContext.userContext = (void*)testerContext;
granuleContext.debugNoMaterialize = false;
granuleContext.granuleParallelism = 1 + Random::get().randomInt(0, 3);
granuleContext.start_load_f = &granule_start_load;
granuleContext.get_load_f = &granule_get_load;
granuleContext.free_load_f = &granule_free_load;
return granuleContext;
}
} // namespace FdbApiTester

View File

@ -0,0 +1,49 @@
/*
* TesterBlobGranuleUtil.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2022 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#ifndef APITESTER_BLOBGRANULE_UTIL_H
#define APITESTER_BLOBGRANULE_UTIL_H
#include "TesterUtil.h"
#include "test/fdb_api.hpp"
#include <unordered_map>
namespace FdbApiTester {
class TesterGranuleContext {
public:
std::unordered_map<int64_t, uint8_t*> loadsInProgress;
std::string basePath;
int64_t nextId;
TesterGranuleContext(const std::string& basePath) : basePath(basePath), nextId(0) {}
~TesterGranuleContext() {
// this should now never happen with proper memory management
ASSERT(loadsInProgress.empty());
}
};
fdb::native::FDBReadBlobGranuleContext createGranuleContext(const TesterGranuleContext* testerContext);
} // namespace FdbApiTester
#endif

View File

@ -0,0 +1,65 @@
/*
* TesterExampleWorkload.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2022 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "TesterWorkload.h"
#include "TesterUtil.h"
namespace FdbApiTester {
class SetAndGetWorkload : public WorkloadBase {
public:
fdb::Key keyPrefix;
Random random;
SetAndGetWorkload(const WorkloadConfig& config) : WorkloadBase(config) {
keyPrefix = fdb::toBytesRef(fmt::format("{}/", workloadId));
}
void start() override { setAndGet(NO_OP_TASK); }
void setAndGet(TTaskFct cont) {
fdb::Key key = keyPrefix + random.randomStringLowerCase(10, 100);
fdb::Value value = random.randomStringLowerCase(10, 1000);
execTransaction(
[key, value](auto ctx) {
ctx->tx().set(key, value);
ctx->commit();
},
[this, key, value, cont]() {
execTransaction(
[this, key, value](auto ctx) {
auto future = ctx->tx().get(key, false);
ctx->continueAfter(future, [this, ctx, future, value]() {
std::optional<fdb::Value> res = copyValueRef(future.get());
if (res != value) {
error(fmt::format(
"expected: {} actual: {}", fdb::toCharsRef(value), fdb::toCharsRef(res.value())));
}
ctx->done();
});
},
cont);
});
}
};
WorkloadFactory<SetAndGetWorkload> SetAndGetWorkloadFactory("SetAndGet");
} // namespace FdbApiTester

View File

@ -38,6 +38,7 @@ public:
std::string logGroup;
std::string externalClientLibrary;
std::string externalClientDir;
std::string futureVersionClientLibrary;
std::string tmpDir;
bool disableLocalClient = false;
std::string testFile;

View File

@ -165,8 +165,11 @@ void WorkloadManager::add(std::shared_ptr<IWorkload> workload, TTaskFct cont) {
void WorkloadManager::run() {
std::vector<std::shared_ptr<IWorkload>> initialWorkloads;
for (auto iter : workloads) {
initialWorkloads.push_back(iter.second.ref);
{
std::unique_lock<std::mutex> lock(mutex);
for (auto iter : workloads) {
initialWorkloads.push_back(iter.second.ref);
}
}
for (auto iter : initialWorkloads) {
iter->init(this);
@ -324,4 +327,4 @@ std::unordered_map<std::string, IWorkloadFactory*>& IWorkloadFactory::factories(
return theFactories;
}
} // namespace FdbApiTester
} // namespace FdbApiTester

View File

@ -0,0 +1,22 @@
[[test]]
title = 'Blob Granule Errors Multi Threaded'
multiThreaded = true
buggify = true
minFdbThreads = 2
maxFdbThreads = 8
minDatabases = 2
maxDatabases = 8
minClientThreads = 2
maxClientThreads = 8
minClients = 2
maxClients = 8
[[test.workload]]
name = 'BlobGranuleErrors'
minKeyLength = 1
maxKeyLength = 64
minValueLength = 1
maxValueLength = 1000
maxKeysPerTransaction = 50
initialSize = 100
numRandomOperations = 100

View File

@ -0,0 +1,22 @@
[[test]]
title = 'Blob Granule Errors Multi Threaded'
multiThreaded = true
buggify = true
minFdbThreads = 2
maxFdbThreads = 8
minDatabases = 2
maxDatabases = 8
minClientThreads = 2
maxClientThreads = 8
minClients = 2
maxClients = 8
[[test.workload]]
name = 'BlobGranuleErrors'
minKeyLength = 1
maxKeyLength = 64
minValueLength = 1
maxValueLength = 1000
maxKeysPerTransaction = 50
initialSize = 100
numRandomOperations = 100

View File

@ -0,0 +1,15 @@
[[test]]
title = 'Blob Granule Errors Single Threaded'
minClients = 1
maxClients = 3
multiThreaded = false
[[test.workload]]
name = 'BlobGranuleErrors'
minKeyLength = 1
maxKeyLength = 64
minValueLength = 1
maxValueLength = 1000
maxKeysPerTransaction = 50
initialSize = 100
numRandomOperations = 100

View File

@ -46,6 +46,7 @@ enum TesterOptionId {
OPT_KNOB,
OPT_EXTERNAL_CLIENT_LIBRARY,
OPT_EXTERNAL_CLIENT_DIRECTORY,
OPT_FUTURE_VERSION_CLIENT_LIBRARY,
OPT_TMP_DIR,
OPT_DISABLE_LOCAL_CLIENT,
OPT_TEST_FILE,
@ -72,6 +73,7 @@ CSimpleOpt::SOption TesterOptionDefs[] = //
{ OPT_KNOB, "--knob-", SO_REQ_SEP },
{ OPT_EXTERNAL_CLIENT_LIBRARY, "--external-client-library", SO_REQ_SEP },
{ OPT_EXTERNAL_CLIENT_DIRECTORY, "--external-client-dir", SO_REQ_SEP },
{ OPT_FUTURE_VERSION_CLIENT_LIBRARY, "--future-version-client-library", SO_REQ_SEP },
{ OPT_TMP_DIR, "--tmp-dir", SO_REQ_SEP },
{ OPT_DISABLE_LOCAL_CLIENT, "--disable-local-client", SO_NONE },
{ OPT_TEST_FILE, "-f", SO_REQ_SEP },
@ -110,6 +112,8 @@ void printProgramUsage(const char* execName) {
" Path to the external client library.\n"
" --external-client-dir DIR\n"
" Directory containing external client libraries.\n"
" --future-version-client-library FILE\n"
" Path to a client library to be used with a future protocol version.\n"
" --tmp-dir DIR\n"
" Directory for temporary files of the client.\n"
" --disable-local-client DIR\n"
@ -204,6 +208,9 @@ bool processArg(TesterOptions& options, const CSimpleOpt& args) {
case OPT_EXTERNAL_CLIENT_DIRECTORY:
options.externalClientDir = args.OptionArg();
break;
case OPT_FUTURE_VERSION_CLIENT_LIBRARY:
options.futureVersionClientLibrary = args.OptionArg();
break;
case OPT_TMP_DIR:
options.tmpDir = args.OptionArg();
break;
@ -296,6 +303,11 @@ void applyNetworkOptions(TesterOptions& options) {
}
}
if (!options.futureVersionClientLibrary.empty()) {
fdb::network::setOption(FDBNetworkOption::FDB_NET_OPTION_FUTURE_VERSION_CLIENT_LIBRARY,
options.futureVersionClientLibrary);
}
if (options.testSpec.multiThreaded) {
fdb::network::setOption(FDBNetworkOption::FDB_NET_OPTION_CLIENT_THREADS_PER_VERSION, options.numFdbThreads);
}
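The same option can also be set directly through the C API before the network is started; a minimal sketch (not part of this commit, with the API version and library path as placeholders) follows:

```c
#define FDB_API_VERSION 720 /* assumed API version for this sketch */
#include <foundationdb/fdb_c.h>
#include <string.h>

/* Must run after fdb_select_api_version() and before fdb_setup_network(). */
fdb_error_t enable_future_version_client(const char* lib_path) {
    return fdb_network_set_option(FDB_NET_OPTION_FUTURE_VERSION_CLIENT_LIBRARY,
                                  (const uint8_t*)lib_path, (int)strlen(lib_path));
}
```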

View File

@ -0,0 +1,23 @@
[[test]]
title = 'Mixed Workload for Upgrade Tests with a Multi-Threaded Client'
multiThreaded = true
buggify = true
databasePerTransaction = false
minFdbThreads = 2
maxFdbThreads = 8
minDatabases = 2
maxDatabases = 8
minClientThreads = 2
maxClientThreads = 8
minClients = 2
maxClients = 8
[[test.workload]]
name = 'ApiBlobGranuleCorrectness'
minKeyLength = 1
maxKeyLength = 64
minValueLength = 1
maxValueLength = 1000
maxKeysPerTransaction = 50
initialSize = 100
runUntilStop = true

View File

@ -32,4 +32,14 @@ maxClients = 8
maxKeysPerTransaction = 50
initialSize = 100
runUntilStop = true
readExistingKeysRatio = 0.9
readExistingKeysRatio = 0.9
[[test.workload]]
name = 'AtomicOpsCorrectness'
initialSize = 0
runUntilStop = true
[[test.workload]]
name = 'WatchAndWait'
initialSize = 0
runUntilStop = true

View File

@ -30,4 +30,14 @@ maxClients = 8
maxKeysPerTransaction = 50
initialSize = 100
runUntilStop = true
readExistingKeysRatio = 0.9
readExistingKeysRatio = 0.9
[[test.workload]]
name = 'AtomicOpsCorrectness'
initialSize = 0
runUntilStop = true
[[test.workload]]
name = 'WatchAndWait'
initialSize = 0
runUntilStop = true

View File

@ -559,9 +559,9 @@ public:
reverse);
}
TypedFuture<future_var::KeyRangeRefArray> getBlobGranuleRanges(KeyRef begin, KeyRef end) {
TypedFuture<future_var::KeyRangeRefArray> getBlobGranuleRanges(KeyRef begin, KeyRef end, int rangeLimit) {
return native::fdb_transaction_get_blob_granule_ranges(
tr.get(), begin.data(), intSize(begin), end.data(), intSize(end));
tr.get(), begin.data(), intSize(begin), end.data(), intSize(end), rangeLimit);
}
Result readBlobGranules(KeyRef begin,

View File

@ -26,6 +26,9 @@
extern thread_local mako::Logger logr;
// FIXME: use the same implementation as the api tester! this implementation was from back when mako was written in C
// and is inferior.
namespace mako::blob_granules::local_file {
int64_t startLoad(const char* filename,

View File

@ -356,9 +356,15 @@ fdb_error_t Transaction::add_conflict_range(std::string_view begin_key,
tr_, (const uint8_t*)begin_key.data(), begin_key.size(), (const uint8_t*)end_key.data(), end_key.size(), type);
}
KeyRangeArrayFuture Transaction::get_blob_granule_ranges(std::string_view begin_key, std::string_view end_key) {
return KeyRangeArrayFuture(fdb_transaction_get_blob_granule_ranges(
tr_, (const uint8_t*)begin_key.data(), begin_key.size(), (const uint8_t*)end_key.data(), end_key.size()));
KeyRangeArrayFuture Transaction::get_blob_granule_ranges(std::string_view begin_key,
std::string_view end_key,
int rangeLimit) {
return KeyRangeArrayFuture(fdb_transaction_get_blob_granule_ranges(tr_,
(const uint8_t*)begin_key.data(),
begin_key.size(),
(const uint8_t*)end_key.data(),
end_key.size(),
rangeLimit));
}
KeyValueArrayResult Transaction::read_blob_granules(std::string_view begin_key,
std::string_view end_key,

View File

@ -348,7 +348,7 @@ public:
// Wrapper around fdb_transaction_add_conflict_range.
fdb_error_t add_conflict_range(std::string_view begin_key, std::string_view end_key, FDBConflictRangeType type);
KeyRangeArrayFuture get_blob_granule_ranges(std::string_view begin_key, std::string_view end_key);
KeyRangeArrayFuture get_blob_granule_ranges(std::string_view begin_key, std::string_view end_key, int rangeLimit);
KeyValueArrayResult read_blob_granules(std::string_view begin_key,
std::string_view end_key,
int64_t beginVersion,

View File

@ -2853,7 +2853,7 @@ TEST_CASE("Blob Granule Functions") {
// test ranges
while (1) {
fdb::KeyRangeArrayFuture f = tr.get_blob_granule_ranges(key("bg"), key("bh"));
fdb::KeyRangeArrayFuture f = tr.get_blob_granule_ranges(key("bg"), key("bh"), 1000);
fdb_error_t err = wait_future(f);
if (err) {
fdb::EmptyFuture f2 = tr.on_error(err);

View File

@ -239,6 +239,13 @@ func (o NetworkOptions) SetClientThreadsPerVersion(param int64) error {
return o.setOpt(65, int64ToBytes(param))
}
// Adds an external client library to be used with a future version protocol. This option can be used for testing purposes only!
//
// Parameter: path to client library
func (o NetworkOptions) SetFutureVersionClientLibrary(param string) error {
return o.setOpt(66, []byte(param))
}
// Disables logging of client statistics, such as sampled transaction activity.
func (o NetworkOptions) SetDisableClientStatisticsLogging() error {
return o.setOpt(70, nil)
@ -615,6 +622,13 @@ func (o TransactionOptions) SetUseGrvCache() error {
return o.setOpt(1101, nil)
}
// Attach given authorization token to the transaction such that subsequent tenant-aware requests are authorized
//
// Parameter: A JSON Web Token authorized to access data belonging to one or more tenants, indicated by 'tenants' claim of the token's payload.
func (o TransactionOptions) SetAuthorizationToken(param string) error {
return o.setOpt(2000, []byte(param))
}
type StreamingMode int
const (

View File

@ -34,9 +34,11 @@ set(JAVA_BINDING_SRCS
src/main/com/apple/foundationdb/FDBDatabase.java
src/main/com/apple/foundationdb/FDBTenant.java
src/main/com/apple/foundationdb/FDBTransaction.java
src/main/com/apple/foundationdb/FutureBool.java
src/main/com/apple/foundationdb/FutureInt64.java
src/main/com/apple/foundationdb/FutureKey.java
src/main/com/apple/foundationdb/FutureKeyArray.java
src/main/com/apple/foundationdb/FutureKeyRangeArray.java
src/main/com/apple/foundationdb/FutureResult.java
src/main/com/apple/foundationdb/FutureResults.java
src/main/com/apple/foundationdb/FutureMappedResults.java
@ -56,6 +58,7 @@ set(JAVA_BINDING_SRCS
src/main/com/apple/foundationdb/RangeQuery.java
src/main/com/apple/foundationdb/MappedRangeQuery.java
src/main/com/apple/foundationdb/KeyArrayResult.java
src/main/com/apple/foundationdb/KeyRangeArrayResult.java
src/main/com/apple/foundationdb/RangeResult.java
src/main/com/apple/foundationdb/MappedRangeResult.java
src/main/com/apple/foundationdb/RangeResultInfo.java

View File

@ -25,9 +25,11 @@
#include "com_apple_foundationdb_FDB.h"
#include "com_apple_foundationdb_FDBDatabase.h"
#include "com_apple_foundationdb_FDBTransaction.h"
#include "com_apple_foundationdb_FutureBool.h"
#include "com_apple_foundationdb_FutureInt64.h"
#include "com_apple_foundationdb_FutureKey.h"
#include "com_apple_foundationdb_FutureKeyArray.h"
#include "com_apple_foundationdb_FutureKeyRangeArray.h"
#include "com_apple_foundationdb_FutureResult.h"
#include "com_apple_foundationdb_FutureResults.h"
#include "com_apple_foundationdb_FutureStrings.h"
@ -55,7 +57,11 @@ static jclass mapped_range_result_class;
static jclass mapped_key_value_class;
static jclass string_class;
static jclass key_array_result_class;
static jclass keyrange_class;
static jclass keyrange_array_result_class;
static jmethodID key_array_result_init;
static jmethodID keyrange_init;
static jmethodID keyrange_array_result_init;
static jmethodID range_result_init;
static jmethodID mapped_range_result_init;
static jmethodID mapped_key_value_from_bytes;
@ -278,6 +284,23 @@ JNIEXPORT void JNICALL Java_com_apple_foundationdb_NativeFuture_Future_1releaseM
fdb_future_release_memory(var);
}
JNIEXPORT jboolean JNICALL Java_com_apple_foundationdb_FutureBool_FutureBool_1get(JNIEnv* jenv, jobject, jlong future) {
if (!future) {
throwParamNotNull(jenv);
return 0;
}
FDBFuture* f = (FDBFuture*)future;
fdb_bool_t value = false;
fdb_error_t err = fdb_future_get_bool(f, &value);
if (err) {
safeThrow(jenv, getThrowable(jenv, err));
return 0;
}
return (jboolean)value;
}
JNIEXPORT jlong JNICALL Java_com_apple_foundationdb_FutureInt64_FutureInt64_1get(JNIEnv* jenv, jobject, jlong future) {
if (!future) {
throwParamNotNull(jenv);
@ -407,6 +430,61 @@ JNIEXPORT jobject JNICALL Java_com_apple_foundationdb_FutureKeyArray_FutureKeyAr
return result;
}
JNIEXPORT jobject JNICALL Java_com_apple_foundationdb_FutureKeyRangeArray_FutureKeyRangeArray_1get(JNIEnv* jenv,
jobject,
jlong future) {
if (!future) {
throwParamNotNull(jenv);
return JNI_NULL;
}
FDBFuture* f = (FDBFuture*)future;
const FDBKeyRange* fdbKr;
int count;
fdb_error_t err = fdb_future_get_keyrange_array(f, &fdbKr, &count);
if (err) {
safeThrow(jenv, getThrowable(jenv, err));
return JNI_NULL;
}
jobjectArray kr_values = jenv->NewObjectArray(count, keyrange_class, NULL);
if (!kr_values) {
if (!jenv->ExceptionOccurred())
throwOutOfMem(jenv);
return JNI_NULL;
}
for (int i = 0; i < count; i++) {
jbyteArray beginArr = jenv->NewByteArray(fdbKr[i].begin_key_length);
if (!beginArr) {
if (!jenv->ExceptionOccurred())
throwOutOfMem(jenv);
return JNI_NULL;
}
jbyteArray endArr = jenv->NewByteArray(fdbKr[i].end_key_length);
if (!endArr) {
if (!jenv->ExceptionOccurred())
throwOutOfMem(jenv);
return JNI_NULL;
}
jenv->SetByteArrayRegion(beginArr, 0, fdbKr[i].begin_key_length, (const jbyte*)fdbKr[i].begin_key);
jenv->SetByteArrayRegion(endArr, 0, fdbKr[i].end_key_length, (const jbyte*)fdbKr[i].end_key);
jobject kr = jenv->NewObject(keyrange_class, keyrange_init, beginArr, endArr);
if (jenv->ExceptionOccurred())
return JNI_NULL;
jenv->SetObjectArrayElement(kr_values, i, kr);
if (jenv->ExceptionOccurred())
return JNI_NULL;
}
jobject krarr = jenv->NewObject(keyrange_array_result_class, keyrange_array_result_init, kr_values);
if (jenv->ExceptionOccurred())
return JNI_NULL;
return krarr;
}
// SOMEDAY: explore doing this more efficiently with Direct ByteBuffers
JNIEXPORT jobject JNICALL Java_com_apple_foundationdb_FutureResults_FutureResults_1get(JNIEnv* jenv,
jobject,
@ -830,6 +908,142 @@ Java_com_apple_foundationdb_FDBDatabase_Database_1waitPurgeGranulesComplete(JNIE
return (jlong)f;
}
JNIEXPORT jlong JNICALL Java_com_apple_foundationdb_FDBDatabase_Database_1blobbifyRange(JNIEnv* jenv,
jobject,
jlong dbPtr,
jbyteArray beginKeyBytes,
jbyteArray endKeyBytes) {
if (!dbPtr || !beginKeyBytes || !endKeyBytes) {
throwParamNotNull(jenv);
return 0;
}
FDBDatabase* database = (FDBDatabase*)dbPtr;
uint8_t* beginKeyArr = (uint8_t*)jenv->GetByteArrayElements(beginKeyBytes, JNI_NULL);
if (!beginKeyArr) {
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
uint8_t* endKeyArr = (uint8_t*)jenv->GetByteArrayElements(endKeyBytes, JNI_NULL);
if (!endKeyArr) {
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)beginKeyArr, JNI_ABORT);
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
FDBFuture* f = fdb_database_blobbify_range(
database, beginKeyArr, jenv->GetArrayLength(beginKeyBytes), endKeyArr, jenv->GetArrayLength(endKeyBytes));
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)beginKeyArr, JNI_ABORT);
jenv->ReleaseByteArrayElements(endKeyBytes, (jbyte*)endKeyArr, JNI_ABORT);
return (jlong)f;
}
JNIEXPORT jlong JNICALL Java_com_apple_foundationdb_FDBDatabase_Database_1unblobbifyRange(JNIEnv* jenv,
jobject,
jlong dbPtr,
jbyteArray beginKeyBytes,
jbyteArray endKeyBytes) {
if (!dbPtr || !beginKeyBytes || !endKeyBytes) {
throwParamNotNull(jenv);
return 0;
}
FDBDatabase* database = (FDBDatabase*)dbPtr;
uint8_t* beginKeyArr = (uint8_t*)jenv->GetByteArrayElements(beginKeyBytes, JNI_NULL);
if (!beginKeyArr) {
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
uint8_t* endKeyArr = (uint8_t*)jenv->GetByteArrayElements(endKeyBytes, JNI_NULL);
if (!endKeyArr) {
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)beginKeyArr, JNI_ABORT);
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
FDBFuture* f = fdb_database_unblobbify_range(
database, beginKeyArr, jenv->GetArrayLength(beginKeyBytes), endKeyArr, jenv->GetArrayLength(endKeyBytes));
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)beginKeyArr, JNI_ABORT);
jenv->ReleaseByteArrayElements(endKeyBytes, (jbyte*)endKeyArr, JNI_ABORT);
return (jlong)f;
}
JNIEXPORT jlong JNICALL Java_com_apple_foundationdb_FDBDatabase_Database_1listBlobbifiedRanges(JNIEnv* jenv,
jobject,
jlong dbPtr,
jbyteArray beginKeyBytes,
jbyteArray endKeyBytes,
jint rangeLimit) {
if (!dbPtr || !beginKeyBytes || !endKeyBytes) {
throwParamNotNull(jenv);
return 0;
}
FDBDatabase* tr = (FDBDatabase*)dbPtr;
uint8_t* startKey = (uint8_t*)jenv->GetByteArrayElements(beginKeyBytes, JNI_NULL);
if (!startKey) {
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
uint8_t* endKey = (uint8_t*)jenv->GetByteArrayElements(endKeyBytes, JNI_NULL);
if (!endKey) {
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)startKey, JNI_ABORT);
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
FDBFuture* f = fdb_database_list_blobbified_ranges(
tr, startKey, jenv->GetArrayLength(beginKeyBytes), endKey, jenv->GetArrayLength(endKeyBytes), rangeLimit);
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)startKey, JNI_ABORT);
jenv->ReleaseByteArrayElements(endKeyBytes, (jbyte*)endKey, JNI_ABORT);
return (jlong)f;
}
JNIEXPORT jlong JNICALL Java_com_apple_foundationdb_FDBDatabase_Database_1verifyBlobRange(JNIEnv* jenv,
jobject,
jlong dbPtr,
jbyteArray beginKeyBytes,
jbyteArray endKeyBytes,
jlong version) {
if (!dbPtr || !beginKeyBytes || !endKeyBytes) {
throwParamNotNull(jenv);
return 0;
}
FDBDatabase* tr = (FDBDatabase*)dbPtr;
uint8_t* startKey = (uint8_t*)jenv->GetByteArrayElements(beginKeyBytes, JNI_NULL);
if (!startKey) {
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
uint8_t* endKey = (uint8_t*)jenv->GetByteArrayElements(endKeyBytes, JNI_NULL);
if (!endKey) {
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)startKey, JNI_ABORT);
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
FDBFuture* f = fdb_database_verify_blob_range(
tr, startKey, jenv->GetArrayLength(beginKeyBytes), endKey, jenv->GetArrayLength(endKeyBytes), version);
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)startKey, JNI_ABORT);
jenv->ReleaseByteArrayElements(endKeyBytes, (jbyte*)endKey, JNI_ABORT);
return (jlong)f;
}
JNIEXPORT jboolean JNICALL Java_com_apple_foundationdb_FDB_Error_1predicate(JNIEnv* jenv,
jobject,
jint predicate,
@ -1307,6 +1521,41 @@ Java_com_apple_foundationdb_FDBTransaction_Transaction_1getRangeSplitPoints(JNIE
return (jlong)f;
}
JNIEXPORT jlong JNICALL
Java_com_apple_foundationdb_FDBTransaction_Transaction_1getBlobGranuleRanges(JNIEnv* jenv,
jobject,
jlong tPtr,
jbyteArray beginKeyBytes,
jbyteArray endKeyBytes,
jint rowLimit) {
if (!tPtr || !beginKeyBytes || !endKeyBytes || !rowLimit) {
throwParamNotNull(jenv);
return 0;
}
FDBTransaction* tr = (FDBTransaction*)tPtr;
uint8_t* startKey = (uint8_t*)jenv->GetByteArrayElements(beginKeyBytes, JNI_NULL);
if (!startKey) {
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
uint8_t* endKey = (uint8_t*)jenv->GetByteArrayElements(endKeyBytes, JNI_NULL);
if (!endKey) {
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)startKey, JNI_ABORT);
if (!jenv->ExceptionOccurred())
throwRuntimeEx(jenv, "Error getting handle to native resources");
return 0;
}
FDBFuture* f = fdb_transaction_get_blob_granule_ranges(
tr, startKey, jenv->GetArrayLength(beginKeyBytes), endKey, jenv->GetArrayLength(endKeyBytes), rowLimit);
jenv->ReleaseByteArrayElements(beginKeyBytes, (jbyte*)startKey, JNI_ABORT);
jenv->ReleaseByteArrayElements(endKeyBytes, (jbyte*)endKey, JNI_ABORT);
return (jlong)f;
}
JNIEXPORT void JNICALL Java_com_apple_foundationdb_FDBTransaction_Transaction_1set(JNIEnv* jenv,
jobject,
jlong tPtr,
@ -1746,6 +1995,15 @@ jint JNI_OnLoad(JavaVM* vm, void* reserved) {
key_array_result_init = env->GetMethodID(local_key_array_result_class, "<init>", "([B[I)V");
key_array_result_class = (jclass)(env)->NewGlobalRef(local_key_array_result_class);
jclass local_keyrange_class = env->FindClass("com/apple/foundationdb/Range");
keyrange_init = env->GetMethodID(local_keyrange_class, "<init>", "([B[B)V");
keyrange_class = (jclass)(env)->NewGlobalRef(local_keyrange_class);
jclass local_keyrange_array_result_class = env->FindClass("com/apple/foundationdb/KeyRangeArrayResult");
keyrange_array_result_init =
env->GetMethodID(local_keyrange_array_result_class, "<init>", "([Lcom/apple/foundationdb/Range;)V");
keyrange_array_result_class = (jclass)(env)->NewGlobalRef(local_keyrange_array_result_class);
jclass local_range_result_summary_class = env->FindClass("com/apple/foundationdb/RangeResultSummary");
range_result_summary_init = env->GetMethodID(local_range_result_summary_class, "<init>", "([BIZ)V");
range_result_summary_class = (jclass)(env)->NewGlobalRef(local_range_result_summary_class);
@ -1770,6 +2028,12 @@ void JNI_OnUnload(JavaVM* vm, void* reserved) {
if (range_result_class != JNI_NULL) {
env->DeleteGlobalRef(range_result_class);
}
if (keyrange_array_result_class != JNI_NULL) {
env->DeleteGlobalRef(keyrange_array_result_class);
}
if (keyrange_class != JNI_NULL) {
env->DeleteGlobalRef(keyrange_class);
}
if (mapped_range_result_class != JNI_NULL) {
env->DeleteGlobalRef(mapped_range_result_class);
}

View File

@ -161,6 +161,20 @@ public interface Database extends AutoCloseable, TransactionContext {
*/
double getMainThreadBusyness();
/**
* Runs {@link #purgeBlobGranules(Function)} on the default executor.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param purgeVersion version to purge at
* @param force if true, delete all data; otherwise keep data >= purgeVersion
*
* @return the key to watch for purge complete
*/
default CompletableFuture<byte[]> purgeBlobGranules(byte[] beginKey, byte[] endKey, long purgeVersion, boolean force) {
return purgeBlobGranules(beginKey, endKey, purgeVersion, force, getExecutor());
}
/**
* Queues a purge of blob granules for the specified key range, at the specified version.
*
@ -168,17 +182,126 @@ public interface Database extends AutoCloseable, TransactionContext {
* @param endKey end of the key range
* @param purgeVersion version to purge at
* @param force if true, delete all data; otherwise keep data >= purgeVersion
* @param e the {@link Executor} to use for asynchronous callbacks
* @return the key to watch for purge complete
*/
CompletableFuture<byte[]> purgeBlobGranules(byte[] beginKey, byte[] endKey, long purgeVersion, boolean force, Executor e);
/**
* Wait for a previous call to purgeBlobGranules to complete
* Runs {@link #waitPurgeGranulesComplete(Function)} on the default executor.
*
* @param purgeKey key to watch
*/
default CompletableFuture<Void> waitPurgeGranulesComplete(byte[] purgeKey) {
return waitPurgeGranulesComplete(purgeKey, getExecutor());
}
/**
* Wait for a previous call to purgeBlobGranules to complete.
*
* @param purgeKey key to watch
* @param e the {@link Executor} to use for asynchronous callbacks
*/
CompletableFuture<Void> waitPurgeGranulesComplete(byte[] purgeKey, Executor e);
/**
* Runs {@link #blobbifyRange(Function)} on the default executor.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @return if the recording of the range was successful
*/
default CompletableFuture<Boolean> blobbifyRange(byte[] beginKey, byte[] endKey) {
return blobbifyRange(beginKey, endKey, getExecutor());
}
/**
* Sets a range to be blobbified in the database. Must be a completely unblobbified range.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param e the {@link Executor} to use for asynchronous callbacks
* @return if the recording of the range was successful
*/
CompletableFuture<Boolean> blobbifyRange(byte[] beginKey, byte[] endKey, Executor e);
/**
* Runs {@link #unblobbifyRange(Function)} on the default executor.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @return if the recording of the range was successful
*/
default CompletableFuture<Boolean> unblobbifyRange(byte[] beginKey, byte[] endKey) {
return unblobbifyRange(beginKey, endKey, getExecutor());
}
/**
* Unsets a blobbified range in the database. The range must be aligned to known blob ranges.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param e the {@link Executor} to use for asynchronous callbacks
* @return if the recording of the range was successful
*/
CompletableFuture<Boolean> unblobbifyRange(byte[] beginKey, byte[] endKey, Executor e);
/**
* Runs {@link #listBlobbifiedRanges(Function)} on the default executor.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param rangeLimit batch size
* @return a future with the list of blobbified ranges: [lastLessThan(beginKey), firstGreaterThanOrEqual(endKey)]
*/
default CompletableFuture<KeyRangeArrayResult> listBlobbifiedRanges(byte[] beginKey, byte[] endKey, int rangeLimit) {
return listBlobbifiedRanges(beginKey, endKey, rangeLimit, getExecutor());
}
/**
* Lists blobbified ranges in the database. There may be more if result.size() == rangeLimit.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param rangeLimit batch size
* @param e the {@link Executor} to use for asynchronous callbacks
* @return a future with the list of blobbified ranges: [lastLessThan(beginKey), firstGreaterThanOrEqual(endKey)]
*/
CompletableFuture<KeyRangeArrayResult> listBlobbifiedRanges(byte[] beginKey, byte[] endKey, int rangeLimit, Executor e);
/**
* Runs {@link #verifyBlobRange(byte[], byte[], long, Executor)} on the default executor.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param version version to read at
*
* @return a future with the version of the last blob granule.
*/
default CompletableFuture<Long> verifyBlobRange(byte[] beginKey, byte[] endKey, long version) {
return verifyBlobRange(beginKey, endKey, version, getExecutor());
}
/**
* Checks if a blob range is blobbified.
*
* @param beginKey start of the key range
* @param endKey end of the key range
* @param version version to read at
* @param e the {@link Executor} to use for asynchronous callbacks
*
* @return a future with the version of the last blob granule.
*/
CompletableFuture<Long> verifyBlobRange(byte[] beginKey, byte[] endKey, long version, Executor e);
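Taken together, the new range-management methods might be exercised as follows. This is an illustrative sketch only, not part of this diff: the class name, the API version passed to selectAPIVersion, and the key range are assumptions.

import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.KeyRangeArrayResult;
import com.apple.foundationdb.ReadTransaction;
import java.nio.charset.StandardCharsets;

final class BlobRangeSketch {
    public static void main(String[] args) {
        FDB fdb = FDB.selectAPIVersion(720);                        // version number is an assumption
        try (Database db = fdb.open()) {
            byte[] begin = "bg/".getBytes(StandardCharsets.UTF_8);  // illustrative key range
            byte[] end = "bg0".getBytes(StandardCharsets.UTF_8);

            // Register the range for blob granules; the future completes to true on success.
            boolean recorded = db.blobbifyRange(begin, end).join();

            // Check how far the range can be read as blob granules.
            long readVersion = db.readAsync(ReadTransaction::getReadVersion).join();
            long verifiedVersion = db.verifyBlobRange(begin, end, readVersion).join();

            // Enumerate the blobbified ranges overlapping [begin, end), at most 100 per call.
            KeyRangeArrayResult ranges = db.listBlobbifiedRanges(begin, end, 100).join();
            System.out.printf("recorded=%b verified=%d ranges=%d%n",
                    recorded, verifiedVersion, ranges.getKeyRanges().size());
        }
    }
}

The blocking join() calls keep the sketch short; real callers would more likely compose the returned futures asynchronously.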
/**
* Runs a read-only transactional function against this {@code Database} with retry logic.
* {@link Function#apply(Object) apply(ReadTransaction)} will be called on the

View File

@ -201,20 +201,60 @@ class FDBDatabase extends NativeObjectWrapper implements Database, OptionConsume
}
@Override
public CompletableFuture<byte[]> purgeBlobGranules(byte[] beginKey, byte[] endKey, long purgeVersion, boolean force, Executor executor) {
public CompletableFuture<byte[]> purgeBlobGranules(byte[] beginKey, byte[] endKey, long purgeVersion, boolean force, Executor e) {
pointerReadLock.lock();
try {
return new FutureKey(Database_purgeBlobGranules(getPtr(), beginKey, endKey, purgeVersion, force), executor, eventKeeper);
return new FutureKey(Database_purgeBlobGranules(getPtr(), beginKey, endKey, purgeVersion, force), e, eventKeeper);
} finally {
pointerReadLock.unlock();
}
}
@Override
public CompletableFuture<Void> waitPurgeGranulesComplete(byte[] purgeKey, Executor executor) {
public CompletableFuture<Void> waitPurgeGranulesComplete(byte[] purgeKey, Executor e) {
pointerReadLock.lock();
try {
return new FutureVoid(Database_waitPurgeGranulesComplete(getPtr(), purgeKey), executor);
return new FutureVoid(Database_waitPurgeGranulesComplete(getPtr(), purgeKey), e);
} finally {
pointerReadLock.unlock();
}
}
@Override
public CompletableFuture<Boolean> blobbifyRange(byte[] beginKey, byte[] endKey, Executor e) {
pointerReadLock.lock();
try {
return new FutureBool(Database_blobbifyRange(getPtr(), beginKey, endKey), e);
} finally {
pointerReadLock.unlock();
}
}
@Override
public CompletableFuture<Boolean> unblobbifyRange(byte[] beginKey, byte[] endKey, Executor e) {
pointerReadLock.lock();
try {
return new FutureBool(Database_unblobbifyRange(getPtr(), beginKey, endKey), e);
} finally {
pointerReadLock.unlock();
}
}
@Override
public CompletableFuture<KeyRangeArrayResult> listBlobbifiedRanges(byte[] beginKey, byte[] endKey, int rangeLimit, Executor e) {
pointerReadLock.lock();
try {
return new FutureKeyRangeArray(Database_listBlobbifiedRanges(getPtr(), beginKey, endKey, rangeLimit), e);
} finally {
pointerReadLock.unlock();
}
}
@Override
public CompletableFuture<Long> verifyBlobRange(byte[] beginKey, byte[] endKey, long version, Executor e) {
pointerReadLock.lock();
try {
return new FutureInt64(Database_verifyBlobRange(getPtr(), beginKey, endKey, version), e);
} finally {
pointerReadLock.unlock();
}
@ -237,4 +277,8 @@ class FDBDatabase extends NativeObjectWrapper implements Database, OptionConsume
private native double Database_getMainThreadBusyness(long cPtr);
private native long Database_purgeBlobGranules(long cPtr, byte[] beginKey, byte[] endKey, long purgeVersion, boolean force);
private native long Database_waitPurgeGranulesComplete(long cPtr, byte[] purgeKey);
private native long Database_blobbifyRange(long cPtr, byte[] beginKey, byte[] endKey);
private native long Database_unblobbifyRange(long cPtr, byte[] beginKey, byte[] endKey);
private native long Database_listBlobbifiedRanges(long cPtr, byte[] beginKey, byte[] endKey, int rangeLimit);
private native long Database_verifyBlobRange(long cPtr, byte[] beginKey, byte[] endKey, long version);
}

View File

@ -97,6 +97,11 @@ class FDBTransaction extends NativeObjectWrapper implements Transaction, OptionC
return FDBTransaction.this.getRangeSplitPoints(range, chunkSize);
}
@Override
public CompletableFuture<KeyRangeArrayResult> getBlobGranuleRanges(byte[] begin, byte[] end, int rowLimit) {
return FDBTransaction.this.getBlobGranuleRanges(begin, end, rowLimit);
}
@Override
public AsyncIterable<MappedKeyValue> getMappedRange(KeySelector begin, KeySelector end, byte[] mapper,
int limit, int matchIndex, boolean reverse,
@ -352,6 +357,16 @@ class FDBTransaction extends NativeObjectWrapper implements Transaction, OptionC
return this.getRangeSplitPoints(range.begin, range.end, chunkSize);
}
@Override
public CompletableFuture<KeyRangeArrayResult> getBlobGranuleRanges(byte[] begin, byte[] end, int rowLimit) {
pointerReadLock.lock();
try {
return new FutureKeyRangeArray(Transaction_getBlobGranuleRanges(getPtr(), begin, end, rowLimit), executor);
} finally {
pointerReadLock.unlock();
}
}
@Override
public AsyncIterable<MappedKeyValue> getMappedRange(KeySelector begin, KeySelector end, byte[] mapper, int limit,
int matchIndex, boolean reverse, StreamingMode mode) {
@ -842,4 +857,5 @@ class FDBTransaction extends NativeObjectWrapper implements Transaction, OptionC
private native long Transaction_getKeyLocations(long cPtr, byte[] key);
private native long Transaction_getEstimatedRangeSizeBytes(long cPtr, byte[] keyBegin, byte[] keyEnd);
private native long Transaction_getRangeSplitPoints(long cPtr, byte[] keyBegin, byte[] keyEnd, long chunkSize);
private native long Transaction_getBlobGranuleRanges(long cPtr, byte[] keyBegin, byte[] keyEnd, int rowLimit);
}

View File

@ -0,0 +1,37 @@
/*
* FutureBool.java
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2019 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.apple.foundationdb;
import java.util.concurrent.Executor;
class FutureBool extends NativeFuture<Boolean> {
FutureBool(long cPtr, Executor executor) {
super(cPtr);
registerMarshalCallback(executor);
}
@Override
protected Boolean getIfDone_internal(long cPtr) throws FDBException {
return FutureBool_get(cPtr);
}
private native boolean FutureBool_get(long cPtr) throws FDBException;
}

View File

@ -0,0 +1,37 @@
/*
* FutureKeyRangeArray.java
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2019 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.apple.foundationdb;
import java.util.concurrent.Executor;
class FutureKeyRangeArray extends NativeFuture<KeyRangeArrayResult> {
FutureKeyRangeArray(long cPtr, Executor executor) {
super(cPtr);
registerMarshalCallback(executor);
}
@Override
protected KeyRangeArrayResult getIfDone_internal(long cPtr) throws FDBException {
return FutureKeyRangeArray_get(cPtr);
}
private native KeyRangeArrayResult FutureKeyRangeArray_get(long cPtr) throws FDBException;
}

View File

@ -0,0 +1,36 @@
/*
* KeyRangeArrayResult.java
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2020 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.apple.foundationdb;
import java.util.Arrays;
import java.util.List;
public class KeyRangeArrayResult {
final List<Range> keyRanges;
public KeyRangeArrayResult(Range[] keyRangeArr) {
this.keyRanges = Arrays.asList(keyRangeArr);
}
public List<Range> getKeyRanges() {
return keyRanges;
}
}

View File

@ -513,6 +513,17 @@ public interface ReadTransaction extends ReadTransactionContext {
*/
CompletableFuture<KeyArrayResult> getRangeSplitPoints(Range range, long chunkSize);
/**
* Gets the blob granule ranges for a given region.
* Results are returned in batches; to fetch the remainder, call again with begin advanced past the last returned range.
*
* @param begin beginning of the range (inclusive)
* @param end end of the range (exclusive)
* @param rowLimit maximum number of ranges returned in one batch
* @return a future with a batch of blob granule ranges in the given region; more may remain if rowLimit ranges are returned
*/
CompletableFuture<KeyRangeArrayResult> getBlobGranuleRanges(byte[] begin, byte[] end, int rowLimit);
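Because a single call may return only part of the region, callers are expected to page through the results as described above. A minimal paging sketch (editorial illustration, not part of this diff; the helper class name, the Database handle, and the row limit are assumptions):

import com.apple.foundationdb.Database;
import com.apple.foundationdb.Range;
import java.util.ArrayList;
import java.util.List;

final class GranuleRangePager {
    // Collects every blob granule range in [begin, end) by fetching rowLimit-sized batches
    // (rowLimit is assumed to be positive).
    static List<Range> allGranuleRanges(Database db, byte[] begin, byte[] end, int rowLimit) {
        List<Range> all = new ArrayList<>();
        byte[] cursor = begin;
        while (true) {
            final byte[] from = cursor;
            List<Range> batch =
                    db.read(tr -> tr.getBlobGranuleRanges(from, end, rowLimit).join()).getKeyRanges();
            all.addAll(batch);
            if (batch.size() < rowLimit) {
                return all;                            // last (possibly partial) batch
            }
            cursor = batch.get(batch.size() - 1).end;  // resume after the last returned range
        }
    }
}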
/**
* Returns a set of options that can be set on a {@code Transaction}

View File

@ -29,6 +29,7 @@ import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
@ -64,7 +65,7 @@ abstract class Context implements Runnable, AutoCloseable {
private List<Thread> children = new LinkedList<>();
private static Map<String, TransactionState> transactionMap = new HashMap<>();
private static Map<Transaction, AtomicInteger> transactionRefCounts = new HashMap<>();
private static Map<byte[], Tenant> tenantMap = new HashMap<>();
private static Map<byte[], Tenant> tenantMap = new ConcurrentHashMap<>();
Context(Database db, byte[] prefix) {
this.db = db;

View File

@ -66,6 +66,9 @@ def test_size_limit_option(db):
except fdb.FDBError as e:
assert(e.code == 2101) # Transaction exceeds byte limit (2101)
# Reset the size limit for future tests
db.options.set_transaction_size_limit(10000000)
@fdb.transactional
def test_get_approximate_size(tr):
tr[b'key1'] = b'value1'

View File

@ -142,7 +142,7 @@ function(add_fdb_test)
${VALGRIND_OPTION}
${ADD_FDB_TEST_TEST_FILES}
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
set_tests_properties("${test_name}" PROPERTIES ENVIRONMENT UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1)
set_tests_properties("${test_name}" PROPERTIES ENVIRONMENT "${SANITIZER_OPTIONS}")
get_filename_component(test_dir_full ${first_file} DIRECTORY)
if(NOT ${test_dir_full} STREQUAL "")
get_filename_component(test_dir ${test_dir_full} NAME)
@ -172,8 +172,7 @@ function(stage_correctness_package)
file(MAKE_DIRECTORY ${STAGE_OUT_DIR}/bin)
string(LENGTH "${CMAKE_SOURCE_DIR}/tests/" base_length)
foreach(test IN LISTS TEST_NAMES)
if(("${TEST_TYPE_${test}}" STREQUAL "simulation") AND
(${test} MATCHES ${TEST_PACKAGE_INCLUDE}) AND
if((${test} MATCHES ${TEST_PACKAGE_INCLUDE}) AND
(NOT ${test} MATCHES ${TEST_PACKAGE_EXCLUDE}))
foreach(file IN LISTS TEST_FILES_${test})
string(SUBSTRING ${file} ${base_length} -1 rel_out_file)
@ -199,16 +198,17 @@ function(stage_correctness_package)
set(src_dir "${src_dir}/")
string(SUBSTRING ${src_dir} ${dir_len} -1 dest_dir)
string(SUBSTRING ${file} ${dir_len} -1 rel_out_file)
set(out_file ${STAGE_OUT_DIR}/${rel_out_file})
set(out_file ${STAGE_OUT_DIR}/${rel_out_file})
list(APPEND external_files ${out_file})
add_custom_command(
add_custom_command(
OUTPUT ${out_file}
DEPENDS ${file}
COMMAND ${CMAKE_COMMAND} -E copy ${file} ${out_file}
COMMENT "Copying ${STAGE_CONTEXT} external file ${file}"
)
DEPENDS ${file}
COMMAND ${CMAKE_COMMAND} -E copy ${file} ${out_file}
COMMENT "Copying ${STAGE_CONTEXT} external file ${file}"
)
endforeach()
endforeach()
list(APPEND package_files ${STAGE_OUT_DIR}/bin/fdbserver
${STAGE_OUT_DIR}/bin/coverage.fdbserver.xml
${STAGE_OUT_DIR}/bin/coverage.fdbclient.xml
@ -218,6 +218,7 @@ function(stage_correctness_package)
${STAGE_OUT_DIR}/bin/TraceLogHelper.dll
${STAGE_OUT_DIR}/CMakeCache.txt
)
add_custom_command(
OUTPUT ${package_files}
DEPENDS ${CMAKE_BINARY_DIR}/CMakeCache.txt
@ -239,6 +240,20 @@ function(stage_correctness_package)
${STAGE_OUT_DIR}/bin
COMMENT "Copying files for ${STAGE_CONTEXT} package"
)
set(test_harness_dir "${CMAKE_SOURCE_DIR}/contrib/TestHarness2")
file(GLOB_RECURSE test_harness2_files RELATIVE "${test_harness_dir}" CONFIGURE_DEPENDS "${test_harness_dir}/*.py")
foreach(file IN LISTS test_harness2_files)
set(src_file "${test_harness_dir}/${file}")
set(out_file "${STAGE_OUT_DIR}/${file}")
get_filename_component(dir "${out_file}" DIRECTORY)
file(MAKE_DIRECTORY "${dir}")
add_custom_command(OUTPUT ${out_file}
COMMAND ${CMAKE_COMMAND} -E copy "${src_file}" "${out_file}"
DEPENDS "${src_file}")
list(APPEND package_files "${out_file}")
endforeach()
list(APPEND package_files ${test_files} ${external_files})
if(STAGE_OUT_FILES)
set(${STAGE_OUT_FILES} ${package_files} PARENT_SCOPE)
@ -404,7 +419,7 @@ endfunction()
# Creates a single cluster before running the specified command (usually a ctest test)
function(add_fdbclient_test)
set(options DISABLED ENABLED DISABLE_LOG_DUMP API_TEST_BLOB_GRANULES_ENABLED TLS_ENABLED)
set(options DISABLED ENABLED DISABLE_TENANTS DISABLE_LOG_DUMP API_TEST_BLOB_GRANULES_ENABLED TLS_ENABLED)
set(oneValueArgs NAME PROCESS_NUMBER TEST_TIMEOUT WORKING_DIRECTORY)
set(multiValueArgs COMMAND)
cmake_parse_arguments(T "${options}" "${oneValueArgs}" "${multiValueArgs}" "${ARGN}")
@ -431,6 +446,9 @@ function(add_fdbclient_test)
if(T_DISABLE_LOG_DUMP)
list(APPEND TMP_CLUSTER_CMD --disable-log-dump)
endif()
if(T_DISABLE_TENANTS)
list(APPEND TMP_CLUSTER_CMD --disable-tenants)
endif()
if(T_API_TEST_BLOB_GRANULES_ENABLED)
list(APPEND TMP_CLUSTER_CMD --blob-granules-enabled)
endif()
@ -447,9 +465,13 @@ function(add_fdbclient_test)
set_tests_properties("${T_NAME}" PROPERTIES TIMEOUT ${T_TEST_TIMEOUT})
else()
# default timeout
set_tests_properties("${T_NAME}" PROPERTIES TIMEOUT 300)
if(USE_SANITIZER)
set_tests_properties("${T_NAME}" PROPERTIES TIMEOUT 1200)
else()
set_tests_properties("${T_NAME}" PROPERTIES TIMEOUT 300)
endif()
endif()
set_tests_properties("${T_NAME}" PROPERTIES ENVIRONMENT UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1)
set_tests_properties("${T_NAME}" PROPERTIES ENVIRONMENT "${SANITIZER_OPTIONS}")
endfunction()
# Creates a cluster file for a nonexistent cluster before running the specified command
@ -483,7 +505,7 @@ function(add_unavailable_fdbclient_test)
# default timeout
set_tests_properties("${T_NAME}" PROPERTIES TIMEOUT 60)
endif()
set_tests_properties("${T_NAME}" PROPERTIES ENVIRONMENT UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1)
set_tests_properties("${T_NAME}" PROPERTIES ENVIRONMENT "${SANITIZER_OPTIONS}")
endfunction()
# Creates 3 distinct clusters before running the specified command.

View File

@ -69,6 +69,7 @@ if(WIN32)
add_definitions(-DWIN32_LEAN_AND_MEAN)
add_definitions(-D_ITERATOR_DEBUG_LEVEL=0)
add_definitions(-DNOGDI) # WinGDI.h defines macro ERROR
add_definitions(-D_USE_MATH_DEFINES) # Math constants
endif()
if (USE_CCACHE)
@ -191,6 +192,7 @@ else()
endif()
if(USE_GCOV)
add_compile_options(--coverage)
add_link_options(--coverage)
endif()
@ -199,6 +201,8 @@ else()
-fsanitize=undefined
# TODO(atn34) Re-enable -fsanitize=alignment once https://github.com/apple/foundationdb/issues/1434 is resolved
-fno-sanitize=alignment
# https://github.com/apple/foundationdb/issues/7955
-fno-sanitize=function
-DBOOST_USE_UCONTEXT)
list(APPEND SANITIZER_LINK_OPTIONS -fsanitize=undefined)
endif()

View File

@ -11,7 +11,7 @@ endif()
include(ExternalProject)
ExternalProject_Add(awssdk_project
GIT_REPOSITORY https://github.com/aws/aws-sdk-cpp.git
GIT_TAG 2af3ce543c322cb259471b3b090829464f825972 # v1.9.200
GIT_TAG e4b4b310d8631bc7e9a797b6ac03a73c6f210bf6 # v1.9.331
SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}/awssdk-src"
BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}/awssdk-build"
GIT_CONFIG advice.detachedHead=false
@ -35,6 +35,7 @@ ExternalProject_Add(awssdk_project
"${CMAKE_CURRENT_BINARY_DIR}/awssdk-build/install/lib64/libaws-c-event-stream.a"
"${CMAKE_CURRENT_BINARY_DIR}/awssdk-build/install/lib64/libaws-c-http.a"
"${CMAKE_CURRENT_BINARY_DIR}/awssdk-build/install/lib64/libaws-c-mqtt.a"
"${CMAKE_CURRENT_BINARY_DIR}/awssdk-build/install/lib64/libaws-c-sdkutils.a"
"${CMAKE_CURRENT_BINARY_DIR}/awssdk-build/install/lib64/libaws-c-io.a"
"${CMAKE_CURRENT_BINARY_DIR}/awssdk-build/install/lib64/libaws-checksums.a"
"${CMAKE_CURRENT_BINARY_DIR}/awssdk-build/install/lib64/libaws-c-compression.a"
@ -75,6 +76,10 @@ add_library(awssdk_c_io STATIC IMPORTED)
add_dependencies(awssdk_c_io awssdk_project)
set_target_properties(awssdk_c_io PROPERTIES IMPORTED_LOCATION "${CMAKE_CURRENT_BINARY_DIR}/awssdk-build/install/lib64/libaws-c-io.a")
add_library(awssdk_c_sdkutils STATIC IMPORTED)
add_dependencies(awssdk_c_sdkutils awssdk_project)
set_target_properties(awssdk_c_sdkutils PROPERTIES IMPORTED_LOCATION "${CMAKE_CURRENT_BINARY_DIR}/awssdk-build/install/lib64/libaws-c-sdkutils.a")
add_library(awssdk_checksums STATIC IMPORTED)
add_dependencies(awssdk_checksums awssdk_project)
set_target_properties(awssdk_checksums PROPERTIES IMPORTED_LOCATION "${CMAKE_CURRENT_BINARY_DIR}/awssdk-build/install/lib64/libaws-checksums.a")
@ -94,4 +99,4 @@ set_target_properties(awssdk_c_common PROPERTIES IMPORTED_LOCATION "${CMAKE_CURR
# link them all together in one interface target
add_library(awssdk_target INTERFACE)
target_include_directories(awssdk_target SYSTEM INTERFACE ${CMAKE_CURRENT_BINARY_DIR}/awssdk-build/install/include)
target_link_libraries(awssdk_target INTERFACE awssdk_core awssdk_crt awssdk_c_s3 awssdk_c_auth awssdk_c_eventstream awssdk_c_http awssdk_c_mqtt awssdk_c_io awssdk_checksums awssdk_c_compression awssdk_c_cal awssdk_c_common curl)
target_link_libraries(awssdk_target INTERFACE awssdk_core awssdk_crt awssdk_c_s3 awssdk_c_auth awssdk_c_eventstream awssdk_c_http awssdk_c_mqtt awssdk_c_sdkutils awssdk_c_io awssdk_checksums awssdk_c_compression awssdk_c_cal awssdk_c_common curl)

View File

@ -4,4 +4,6 @@
export ASAN_OPTIONS="detect_leaks=0"
OLDBINDIR="${OLDBINDIR:-/app/deploy/global_data/oldBinaries}"
mono bin/TestHarness.exe joshua-run "${OLDBINDIR}" false
#mono bin/TestHarness.exe joshua-run "${OLDBINDIR}" false
python3 -m test_harness.app -s ${JOSHUA_SEED} --old-binaries-path ${OLDBINDIR}

View File

@ -1,4 +1,4 @@
#!/bin/bash -u
for file in `find . -name 'trace*.xml'` ; do
mono ./bin/TestHarness.exe summarize "${file}" summary.xml "" JoshuaTimeout true
done
python3 -m test_harness.timeout

View File

@ -1,3 +1,3 @@
#!/bin/sh
OLDBINDIR="${OLDBINDIR:-/app/deploy/global_data/oldBinaries}"
mono bin/TestHarness.exe joshua-run "${OLDBINDIR}" true
python3 -m test_harness.app -s ${JOSHUA_SEED} --old-binaries-path ${OLDBINDIR} --use-valgrind

View File

@ -1,6 +1,2 @@
#!/bin/bash -u
for file in `find . -name 'trace*.xml'` ; do
for valgrindFile in `find . -name 'valgrind*.xml'` ; do
mono ./bin/TestHarness.exe summarize "${file}" summary.xml "${valgrindFile}" JoshuaTimeout true
done
done
python3 -m test_harness.timeout --use-valgrind

View File

@ -19,6 +19,7 @@
*/
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
@ -302,6 +303,7 @@ namespace SummarizeTest
uniqueFileSet.Add(file.Substring(0, file.LastIndexOf("-"))); // all restarting tests end with -1.txt or -2.txt
}
uniqueFiles = uniqueFileSet.ToArray();
Array.Sort(uniqueFiles);
testFile = random.Choice(uniqueFiles);
// The on-disk format changed in 4.0.0, and 5.x can't load files from 3.x.
string oldBinaryVersionLowerBound = "4.0.0";
@ -334,8 +336,9 @@ namespace SummarizeTest
// thus, by definition, if "until_" appears, we do not want to run with the current binary version
oldBinaries = oldBinaries.Concat(currentBinary);
}
List<string> oldBinariesList = oldBinaries.ToList<string>();
if (oldBinariesList.Count == 0) {
string[] oldBinariesList = oldBinaries.ToArray<string>();
Array.Sort(oldBinariesList);
if (oldBinariesList.Count() == 0) {
// In theory, restarting tests are named to have at least one old binary version to run
// But if none of the provided old binaries fall in the range, we just skip the test
Console.WriteLine("No available old binary version from {0} to {1}", oldBinaryVersionLowerBound, oldBinaryVersionUpperBound);
@ -347,6 +350,7 @@ namespace SummarizeTest
else
{
uniqueFiles = Directory.GetFiles(testDir);
Array.Sort(uniqueFiles);
testFile = random.Choice(uniqueFiles);
}
}
@ -487,6 +491,16 @@ namespace SummarizeTest
useValgrind ? "on" : "off");
}
IDictionary data = Environment.GetEnvironmentVariables();
foreach (DictionaryEntry i in data)
{
string k = (string)i.Key;
string v = (string)i.Value;
if (k.StartsWith("FDB_KNOB")) {
process.StartInfo.EnvironmentVariables[k] = v;
}
}
process.Start();
// SOMEDAY: Do we want to actually do anything with standard output or error?
@ -718,7 +732,7 @@ namespace SummarizeTest
process.Refresh();
if (process.HasExited)
return;
long mem = process.PrivateMemorySize64;
long mem = process.PagedMemorySize64;
MaxMem = Math.Max(MaxMem, mem);
//Console.WriteLine(string.Format("Process used {0} bytes", MaxMem));
Thread.Sleep(1000);
@ -744,16 +758,28 @@ namespace SummarizeTest
AppendToSummary(summaryFileName, xout);
}
// Parses the valgrind XML file and returns a list of "what" tags for each error.
static string ParseValgrindStack(XElement stackElement) {
string backtrace = "";
foreach (XElement frame in stackElement.Elements()) {
backtrace += " " + frame.Element("ip").Value.ToLower();
}
if (backtrace.Length > 0) {
backtrace = "addr2line -e fdbserver.debug -p -C -f -i" + backtrace;
}
return backtrace;
}
// Parses the valgrind XML file and returns a list of error elements.
// All errors for which the "kind" tag starts with "Leak" are ignored
static string[] ParseValgrindOutput(string valgrindOutputFileName, bool traceToStdout)
static XElement[] ParseValgrindOutput(string valgrindOutputFileName, bool traceToStdout)
{
if (!traceToStdout)
{
Console.WriteLine("Reading vXML file: " + valgrindOutputFileName);
}
ISet<string> whats = new HashSet<string>();
IList<XElement> errors = new List<XElement>();
XElement xdoc = XDocument.Load(valgrindOutputFileName).Element("valgrindoutput");
foreach(var elem in xdoc.Elements()) {
if (elem.Name != "error")
@ -761,9 +787,29 @@ namespace SummarizeTest
string kind = elem.Element("kind").Value;
if(kind.StartsWith("Leak"))
continue;
whats.Add(elem.Element("what").Value);
XElement errorElement = new XElement("ValgrindError",
new XAttribute("Severity", (int)Magnesium.Severity.SevError));
int num = 1;
string suffix = "";
foreach (XElement sub in elem.Elements()) {
if (sub.Name == "what") {
errorElement.SetAttributeValue("What", sub.Value);
} else if (sub.Name == "auxwhat") {
suffix = "Aux" + num++;
errorElement.SetAttributeValue("What" + suffix, sub.Value);
} else if (sub.Name == "stack") {
errorElement.SetAttributeValue("Backtrace" + suffix, ParseValgrindStack(sub));
} else if (sub.Name == "origin") {
errorElement.SetAttributeValue("WhatOrigin", sub.Element("what").Value);
errorElement.SetAttributeValue("BacktraceOrigin", ParseValgrindStack(sub.Element("stack")));
}
}
errors.Add(errorElement);
}
return whats.ToArray();
return errors.ToArray();
}
delegate IEnumerable<Magnesium.Event> parseDelegate(System.IO.Stream stream, string file,
@ -927,6 +973,10 @@ namespace SummarizeTest
{
xout.Add(new XElement(ev.Type, new XAttribute("File", ev.Details.File), new XAttribute("Line", ev.Details.Line)));
}
if (ev.Type == "RunningUnitTest")
{
xout.Add(new XElement(ev.Type, new XAttribute("Name", ev.Details.Name), new XAttribute("File", ev.Details.File), new XAttribute("Line", ev.Details.Line)));
}
if (ev.Type == "TestsExpectedToPass")
testCount = int.Parse(ev.Details.Count);
if (ev.Type == "TestResults" && ev.Details.Passed == "1")
@ -1065,12 +1115,10 @@ namespace SummarizeTest
try
{
// If there are any errors reported "ok" will be set to false
var whats = ParseValgrindOutput(valgrindOutputFileName, traceToStdout);
foreach (var what in whats)
var valgrindErrors = ParseValgrindOutput(valgrindOutputFileName, traceToStdout);
foreach (var vError in valgrindErrors)
{
xout.Add(new XElement("ValgrindError",
new XAttribute("Severity", (int)Magnesium.Severity.SevError),
new XAttribute("What", what)));
xout.Add(vError);
ok = false;
error = true;
}

2
contrib/TestHarness2/.gitignore vendored Normal file
View File

@ -0,0 +1,2 @@
/tmp/
/venv

View File

@ -0,0 +1,2 @@
# Currently this file is left intentionally empty. Its main job for now is to indicate that this directory
# should be used as a module.

View File

@ -0,0 +1,25 @@
import argparse
import sys
import traceback
from test_harness.config import config
from test_harness.run import TestRunner
from test_harness.summarize import SummaryTree
if __name__ == '__main__':
try:
parser = argparse.ArgumentParser('TestHarness', formatter_class=argparse.ArgumentDefaultsHelpFormatter)
config.build_arguments(parser)
args = parser.parse_args()
config.extract_args(args)
test_runner = TestRunner()
if not test_runner.run():
exit(1)
except Exception as e:
_, _, exc_traceback = sys.exc_info()
error = SummaryTree('TestHarnessError')
error.attributes['Severity'] = '40'
error.attributes['ErrorMessage'] = str(e)
error.attributes['Trace'] = repr(traceback.format_tb(exc_traceback))
error.dump(sys.stdout)
exit(1)

View File

@ -0,0 +1,263 @@
from __future__ import annotations
import argparse
import collections
import copy
import os
import random
from enum import Enum
from pathlib import Path
from typing import List, Any, OrderedDict, Dict
class BuggifyOptionValue(Enum):
ON = 1
OFF = 2
RANDOM = 3
class BuggifyOption:
def __init__(self, val: str | None = None):
self.value = BuggifyOptionValue.RANDOM
if val is not None:
v = val.lower()
if v in ['on', '1', 'true']:
self.value = BuggifyOptionValue.ON
elif v in ['off', '0', 'false']:
self.value = BuggifyOptionValue.OFF
elif v in ['random', 'rnd', 'r']:
pass
else:
assert False, 'Invalid value {} -- use true, false, or random'.format(v)
class ConfigValue:
def __init__(self, name: str, **kwargs):
self.name = name
self.value = None
self.kwargs = kwargs
if 'default' in self.kwargs:
self.value = self.kwargs['default']
def get_arg_name(self) -> str:
if 'long_name' in self.kwargs:
return self.kwargs['long_name']
else:
return self.name
def add_to_args(self, parser: argparse.ArgumentParser):
kwargs = copy.copy(self.kwargs)
long_name = self.name
short_name = None
if 'long_name' in kwargs:
long_name = kwargs['long_name']
del kwargs['long_name']
if 'short_name' in kwargs:
short_name = kwargs['short_name']
del kwargs['short_name']
if 'action' in kwargs and kwargs['action'] in ['store_true', 'store_false']:
del kwargs['type']
long_name = long_name.replace('_', '-')
if short_name is None:
# line below is useful for debugging
# print('add_argument(\'--{}\', [{{{}}}])'.format(long_name, ', '.join(['\'{}\': \'{}\''.format(k, v)
# for k, v in kwargs.items()])))
parser.add_argument('--{}'.format(long_name), **kwargs)
else:
# line below is useful for debugging
# print('add_argument(\'-{}\', \'--{}\', [{{{}}}])'.format(short_name, long_name,
# ', '.join(['\'{}\': \'{}\''.format(k, v)
# for k, v in kwargs.items()])))
parser.add_argument('-{}'.format(short_name), '--{}'.format(long_name), **kwargs)
def get_value(self, args: argparse.Namespace) -> tuple[str, Any]:
return self.name, args.__getattribute__(self.get_arg_name())
class Config:
"""
This is the central configuration class for the test harness. The values in this class are exposed globally through
a global variable, test_harness.config.config. This class provides some "magic" to keep the test harness flexible.
Each parameter can further be configured using an `_args` member variable which is expected to be a dictionary.
* The value of any variable can be set through the command line. For a variable named `variable_name` we will
by default create a new command line option `--variable-name` (`_` is automatically changed to `-`). This
default can be changed by setting the `'long_name'` property in the `_arg` dict.
* In addition, the user can optionally set a short name. This can be achieved by setting the `'short_name'`
property in the `_arg` dictionary.
* All additional properties in `_args` are passed to `argparse.add_argument`.
* If the default of a variable is `None` the user should explicitly set the `'type'` property to an appropriate
type.
* In addition to command line flags, all configuration options can also be controlled through environment variables.
By default, `variable-name` can be changed by setting the environment variable `TH_VARIABLE_NAME`. This default
can be changed by setting the `'env_name'` property.
* Test harness comes with multiple executables. Each of these should use the config facility. For this,
`Config.build_arguments` should be called first with the `argparse` parser. Then `Config.extract_args` needs
to be called with the result of `argparse.ArgumentParser.parse_args`. A minimal example could look like this:
```
parser = argparse.ArgumentParser('TestHarness', formatter_class=argparse.ArgumentDefaultsHelpFormatter)
config.build_arguments(parser)
args = parser.parse_args()
config.extract_args(args)
```
* Changing the default value for all executables might not always be desirable. If it should only be changed for
one executable, Config.change_default should be used.
"""
def __init__(self):
self.random = random.Random()
self.cluster_file: str | None = None
self.cluster_file_args = {'short_name': 'C', 'type': str, 'help': 'Path to fdb cluster file', 'required': False,
'env_name': 'JOSHUA_CLUSTER_FILE'}
self.joshua_dir: str | None = None
self.joshua_dir_args = {'type': str, 'help': 'Where to write FDB data to', 'required': False,
'env_name': 'JOSHUA_APP_DIR'}
self.stats: str | None = None
self.stats_args = {'type': str, 'help': 'A base64 encoded list of statistics (used to reproduce runs)',
'required': False}
self.random_seed: int | None = None
self.random_seed_args = {'type': int,
'help': 'Force given seed given to fdbserver -- mostly useful for debugging',
'required': False}
self.kill_seconds: int = 30 * 60
self.kill_seconds_args = {'help': 'Timeout for individual test'}
self.buggify_on_ratio: float = 0.8
self.buggify_on_ratio_args = {'help': 'Probability that buggify is turned on'}
self.write_run_times = False
self.write_run_times_args = {'help': 'Write back probabilities after each test run',
'action': 'store_true'}
self.unseed_check_ratio: float = 0.05
self.unseed_check_ratio_args = {'help': 'Probability for doing determinism check'}
self.test_dirs: List[str] = ['slow', 'fast', 'restarting', 'rare', 'noSim']
self.test_dirs_args: dict = {'nargs': '*', 'help': 'test_directories to look for files in'}
self.trace_format: str = 'json'
self.trace_format_args = {'choices': ['json', 'xml'], 'help': 'What format fdb should produce'}
self.crash_on_error: bool = True
self.crash_on_error_args = {'long_name': 'no_crash', 'action': 'store_false',
'help': 'Don\'t crash on first error'}
self.max_warnings: int = 10
self.max_warnings_args = {'short_name': 'W'}
self.max_errors: int = 10
self.max_errors_args = {'short_name': 'E'}
self.old_binaries_path: Path = Path('/app/deploy/global_data/oldBinaries/')
self.old_binaries_path_args = {'help': 'Path to the directory containing the old fdb binaries'}
self.use_valgrind: bool = False
self.use_valgrind_args = {'action': 'store_true'}
self.buggify = BuggifyOption('random')
self.buggify_args = {'short_name': 'b', 'choices': ['on', 'off', 'random']}
self.pretty_print: bool = False
self.pretty_print_args = {'short_name': 'P', 'action': 'store_true'}
self.clean_up: bool = True
self.clean_up_args = {'long_name': 'no_clean_up', 'action': 'store_false'}
self.run_dir: Path = Path('tmp')
self.joshua_seed: int = random.randint(0, 2 ** 32 - 1)
self.joshua_seed_args = {'short_name': 's', 'help': 'A random seed', 'env_name': 'JOSHUA_SEED'}
self.print_coverage = False
self.print_coverage_args = {'action': 'store_true'}
self.binary = Path('bin') / ('fdbserver.exe' if os.name == 'nt' else 'fdbserver')
self.binary_args = {'help': 'Path to executable'}
self.hit_per_runs_ratio: int = 20000
self.hit_per_runs_ratio_args = {'help': 'Maximum test runs before each code probe hit at least once'}
self.output_format: str = 'xml'
self.output_format_args = {'short_name': 'O', 'choices': ['json', 'xml'],
'help': 'What format TestHarness should produce'}
self.include_test_files: str = r'.*'
self.include_test_files_args = {'help': 'Only consider test files whose path match against the given regex'}
self.exclude_test_files: str = r'.^'
self.exclude_test_files_args = {'help': 'Don\'t consider test files whose path match against the given regex'}
self.include_test_classes: str = r'.*'
self.include_test_classes_args = {'help': 'Only consider tests whose names match against the given regex'}
self.exclude_test_names: str = r'.^'
self.exclude_test_names_args = {'help': 'Don\'t consider tests whose names match against the given regex'}
self.details: bool = False
self.details_args = {'help': 'Print detailed results', 'short_name': 'c', 'action': 'store_true'}
self.success: bool = False
self.success_args = {'help': 'Print successful results', 'action': 'store_true'}
self.cov_include_files: str = r'.*'
self.cov_include_files_args = {'help': 'Only consider coverage traces that originated in files matching regex'}
self.cov_exclude_files: str = r'.^'
self.cov_exclude_files_args = {'help': 'Ignore coverage traces that originated in files matching regex'}
self.max_stderr_bytes: int = 1000
self.write_stats: bool = True
self.read_stats: bool = True
self.reproduce_prefix: str | None = None
self.reproduce_prefix_args = {'type': str, 'required': False,
'help': 'When printing the results, prepend this string to the command'}
self._env_names: Dict[str, str] = {}
self._config_map = self._build_map()
self._read_env()
self.random.seed(self.joshua_seed, version=2)
def change_default(self, attr: str, default_val):
assert attr in self._config_map, 'Unknown config attribute {}'.format(attr)
self.__setattr__(attr, default_val)
self._config_map[attr].kwargs['default'] = default_val
def _get_env_name(self, var_name: str) -> str:
return self._env_names.get(var_name, 'TH_{}'.format(var_name.upper()))
def dump(self):
for attr in dir(self):
obj = getattr(self, attr)
if attr == 'random' or attr.startswith('_') or callable(obj) or attr.endswith('_args'):
continue
print('config.{}: {} = {}'.format(attr, type(obj), obj))
def _build_map(self) -> OrderedDict[str, ConfigValue]:
config_map: OrderedDict[str, ConfigValue] = collections.OrderedDict()
for attr in dir(self):
obj = getattr(self, attr)
if attr == 'random' or attr.startswith('_') or callable(obj):
continue
if attr.endswith('_args'):
name = attr[0:-len('_args')]
assert name in config_map
assert isinstance(obj, dict)
for k, v in obj.items():
if k == 'env_name':
self._env_names[name] = v
else:
config_map[name].kwargs[k] = v
else:
# attribute_args has to be declared after the attribute
assert attr not in config_map
val_type = type(obj)
kwargs = {'type': val_type, 'default': obj}
config_map[attr] = ConfigValue(attr, **kwargs)
return config_map
def _read_env(self):
for attr in dir(self):
obj = getattr(self, attr)
if attr == 'random' or attr.startswith('_') or attr.endswith('_args') or callable(obj):
continue
env_name = self._get_env_name(attr)
attr_type = self._config_map[attr].kwargs['type']
assert type(None) != attr_type
e = os.getenv(env_name)
if e is not None:
# Use the env var to supply the default value, so that if the
# environment variable is set and the corresponding command line
# flag is not, the environment variable has an effect.
self._config_map[attr].kwargs['default'] = attr_type(e)
def build_arguments(self, parser: argparse.ArgumentParser):
for val in self._config_map.values():
val.add_to_args(parser)
def extract_args(self, args: argparse.Namespace):
for val in self._config_map.values():
k, v = val.get_value(args)
if v is not None:
config.__setattr__(k, v)
self.random.seed(self.joshua_seed, version=2)
config = Config()
if __name__ == '__main__':
# test the config setup
parser = argparse.ArgumentParser('TestHarness Config Tester',
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
config.build_arguments(parser)
args = parser.parse_args()
config.extract_args(args)
config.dump()

View File

@ -0,0 +1,144 @@
from __future__ import annotations
from typing import OrderedDict, Tuple, List
import collections
import fdb
import fdb.tuple
import struct
from test_harness.run import StatFetcher, TestDescription
from test_harness.config import config
from test_harness.summarize import SummaryTree, Coverage
# Before increasing this, make sure that all Joshua clusters (at Apple and Snowflake) have been upgraded.
# This version needs to be changed if we either need newer features from FDB or the current API version is
# getting retired.
fdb.api_version(630)
def str_to_tuple(s: str | None):
if s is None:
return s
return tuple(s.split(','))
fdb_db = None
def open_db(cluster_file: str | None):
global fdb_db
if fdb_db is None:
fdb_db = fdb.open(cluster_file)
return fdb_db
def chunkify(iterable, sz: int):
res = []
for item in iterable:
res.append(item)
if len(res) >= sz:
yield res
res = []
if len(res) > 0:
yield res
@fdb.transactional
def write_coverage_chunk(tr, path: Tuple[str, ...], metadata: Tuple[str, ...],
coverage: List[Tuple[Coverage, bool]], initialized: bool) -> bool:
cov_dir = fdb.directory.create_or_open(tr, path)
if not initialized:
metadata_dir = fdb.directory.create_or_open(tr, metadata)
v = tr[metadata_dir['initialized']]
initialized = v.present()
for cov, covered in coverage:
if not initialized or covered:
tr.add(cov_dir.pack((cov.file, cov.line, cov.comment)), struct.pack('<I', 1 if covered else 0))
return initialized
@fdb.transactional
def set_initialized(tr, metadata: Tuple[str, ...]):
metadata_dir = fdb.directory.create_or_open(tr, metadata)
tr[metadata_dir['initialized']] = fdb.tuple.pack((True,))
def write_coverage(cluster_file: str | None, cov_path: Tuple[str, ...], metadata: Tuple[str, ...],
coverage: OrderedDict[Coverage, bool]):
db = open_db(cluster_file)
assert config.joshua_dir is not None
initialized: bool = False
for chunk in chunkify(coverage.items(), 100):
initialized = write_coverage_chunk(db, cov_path, metadata, chunk, initialized)
if not initialized:
set_initialized(db, metadata)
@fdb.transactional
def _read_coverage(tr, cov_path: Tuple[str, ...]) -> OrderedDict[Coverage, int]:
res = collections.OrderedDict()
cov_dir = fdb.directory.create_or_open(tr, cov_path)
for k, v in tr[cov_dir.range()]:
file, line, comment = cov_dir.unpack(k)
count = struct.unpack('<I', v)[0]
res[Coverage(file, line, comment)] = count
return res
def read_coverage(cluster_file: str | None, cov_path: Tuple[str, ...]) -> OrderedDict[Coverage, int]:
db = open_db(cluster_file)
return _read_coverage(db, cov_path)
class TestStatistics:
def __init__(self, runtime: int, run_count: int):
self.runtime: int = runtime
self.run_count: int = run_count
class Statistics:
def __init__(self, cluster_file: str | None, joshua_dir: Tuple[str, ...]):
self.db = open_db(cluster_file)
self.stats_dir = self.open_stats_dir(self.db, joshua_dir)
self.stats: OrderedDict[str, TestStatistics] = self.read_stats_from_db(self.db)
@fdb.transactional
def open_stats_dir(self, tr, app_dir: Tuple[str]):
stats_dir = app_dir + ('runtime_stats',)
return fdb.directory.create_or_open(tr, stats_dir)
@fdb.transactional
def read_stats_from_db(self, tr) -> OrderedDict[str, TestStatistics]:
result = collections.OrderedDict()
for k, v in tr[self.stats_dir.range()]:
test_name = self.stats_dir.unpack(k)[0]
runtime, run_count = struct.unpack('<II', v)
result[test_name] = TestStatistics(runtime, run_count)
return result
@fdb.transactional
def _write_runtime(self, tr, test_name: str, time: int) -> None:
key = self.stats_dir.pack((test_name,))
tr.add(key, struct.pack('<II', time, 1))
def write_runtime(self, test_name: str, time: int) -> None:
assert self.db is not None
self._write_runtime(self.db, test_name, time)
class FDBStatFetcher(StatFetcher):
def __init__(self, tests: OrderedDict[str, TestDescription],
joshua_dir: Tuple[str] = str_to_tuple(config.joshua_dir)):
super().__init__(tests)
self.statistics = Statistics(config.cluster_file, joshua_dir)
def read_stats(self):
for k, v in self.statistics.stats.items():
if k in self.tests.keys():
self.tests[k].total_runtime = v.runtime
self.tests[k].num_runs = v.run_count
def add_run_time(self, test_name: str, runtime: int, out: SummaryTree):
self.statistics.write_runtime(test_name, runtime)
super().add_run_time(test_name, runtime, out)

View File

@ -0,0 +1,161 @@
from __future__ import annotations
import collections
import io
import sys
import xml.sax
import xml.sax.handler
from pathlib import Path
from typing import List, OrderedDict, Set
from joshua import joshua_model
import test_harness.run
from test_harness.config import config
from test_harness.summarize import SummaryTree
class ToSummaryTree(xml.sax.handler.ContentHandler):
def __init__(self):
super().__init__()
self.root: SummaryTree | None = None
self.stack: List[SummaryTree] = []
def result(self) -> SummaryTree:
assert len(self.stack) == 0 and self.root is not None, 'Parse Error'
return self.root
def startElement(self, name, attrs):
new_child = SummaryTree(name)
for k, v in attrs.items():
new_child.attributes[k] = v
self.stack.append(new_child)
def endElement(self, name):
closed = self.stack.pop()
assert closed.name == name
if len(self.stack) == 0:
self.root = closed
else:
self.stack[-1].children.append(closed)
def _print_summary(summary: SummaryTree, commands: Set[str]):
cmd = []
if config.reproduce_prefix is not None:
cmd.append(config.reproduce_prefix)
cmd.append('fdbserver')
if 'TestFile' in summary.attributes:
file_name = summary.attributes['TestFile']
role = 'test' if test_harness.run.is_no_sim(Path(file_name)) else 'simulation'
cmd += ['-r', role, '-f', file_name]
else:
cmd += ['-r', 'simulation', '-f', '<ERROR>']
if 'RandomSeed' in summary.attributes:
cmd += ['-s', summary.attributes['RandomSeed']]
else:
cmd += ['-s', '<Error>']
if 'BuggifyEnabled' in summary.attributes:
arg = 'on'
if summary.attributes['BuggifyEnabled'].lower() in ['0', 'off', 'false']:
arg = 'off'
cmd += ['-b', arg]
else:
cmd += ['-b', '<ERROR>']
cmd += ['--crash', '--trace_format', config.trace_format]
key = ' '.join(cmd)
count = 1
while key in commands:
key = '{} # {}'.format(' '.join(cmd), count)
count += 1
# we want the command as the first attribute
attributes = {'Command': ' '.join(cmd)}
for k, v in summary.attributes.items():
if k == 'Errors':
attributes['ErrorCount'] = v
else:
attributes[k] = v
summary.attributes = attributes
if config.details:
key = str(len(commands))
str_io = io.StringIO()
summary.dump(str_io, prefix=(' ' if config.pretty_print else ''))
if config.output_format == 'json':
sys.stdout.write('{}"Test{}": {}'.format(' ' if config.pretty_print else '',
key, str_io.getvalue()))
else:
sys.stdout.write(str_io.getvalue())
if config.pretty_print:
sys.stdout.write('\n' if config.output_format == 'xml' else ',\n')
return key
error_count = 0
warning_count = 0
small_summary = SummaryTree('Test')
small_summary.attributes = attributes
errors = SummaryTree('Errors')
warnings = SummaryTree('Warnings')
buggifies: OrderedDict[str, List[int]] = collections.OrderedDict()
for child in summary.children:
if 'Severity' in child.attributes and child.attributes['Severity'] == '40' and error_count < config.max_errors:
error_count += 1
errors.append(child)
if 'Severity' in child.attributes and child.attributes[
'Severity'] == '30' and warning_count < config.max_warnings:
warning_count += 1
warnings.append(child)
if child.name == 'BuggifySection':
file = child.attributes['File']
line = int(child.attributes['Line'])
buggifies.setdefault(file, []).append(line)
buggifies_elem = SummaryTree('Buggifies')
for file, lines in buggifies.items():
lines.sort()
if config.output_format == 'json':
buggifies_elem.attributes[file] = ' '.join(str(line) for line in lines)
else:
child = SummaryTree('Buggify')
child.attributes['File'] = file
child.attributes['Lines'] = ' '.join(str(line) for line in lines)
small_summary.append(child)
small_summary.children.append(buggifies_elem)
if len(errors.children) > 0:
small_summary.children.append(errors)
if len(warnings.children) > 0:
small_summary.children.append(warnings)
output = io.StringIO()
small_summary.dump(output, prefix=(' ' if config.pretty_print else ''))
if config.output_format == 'json':
sys.stdout.write('{}"{}": {}'.format(' ' if config.pretty_print else '', key, output.getvalue().strip()))
else:
sys.stdout.write('{}{}'.format(' ' if config.pretty_print else '', output.getvalue().strip()))
sys.stdout.write('\n' if config.output_format == 'xml' else ',\n')
def print_errors(ensemble_id: str):
joshua_model.open(config.cluster_file)
properties = joshua_model.get_ensemble_properties(ensemble_id)
compressed = properties["compressed"] if "compressed" in properties else False
for rec in joshua_model.tail_results(ensemble_id, errors_only=(not config.success), compressed=compressed):
if len(rec) == 5:
version_stamp, result_code, host, seed, output = rec
elif len(rec) == 4:
version_stamp, result_code, host, output = rec
seed = None
elif len(rec) == 3:
version_stamp, result_code, output = rec
host = None
seed = None
elif len(rec) == 2:
version_stamp, seed = rec
output = str(joshua_model.fdb.tuple.unpack(seed)[0]) + "\n"
result_code = None
host = None
seed = None
else:
raise Exception("Unknown result format")
lines = output.splitlines()
commands: Set[str] = set()
for line in lines:
summary = ToSummaryTree()
xml.sax.parseString(line, summary)
commands.add(_print_summary(summary.result(), commands))

View File

@ -0,0 +1,144 @@
from __future__ import annotations
import argparse
import io
import json
import re
import sys
import test_harness.fdb
from typing import List, Tuple, OrderedDict
from test_harness.summarize import SummaryTree, Coverage
from test_harness.config import config
from xml.sax.saxutils import quoteattr
class GlobalStatistics:
def __init__(self):
self.total_probes_hit: int = 0
self.total_cpu_time: int = 0
self.total_test_runs: int = 0
self.total_missed_probes: int = 0
class EnsembleResults:
def __init__(self, cluster_file: str | None, ensemble_id: str):
self.global_statistics = GlobalStatistics()
self.fdb_path = ('joshua', 'ensembles', 'results', 'application', ensemble_id)
self.coverage_path = self.fdb_path + ('coverage',)
self.statistics = test_harness.fdb.Statistics(cluster_file, self.fdb_path)
coverage_dict: OrderedDict[Coverage, int] = test_harness.fdb.read_coverage(cluster_file, self.coverage_path)
self.coverage: List[Tuple[Coverage, int]] = []
self.min_coverage_hit: int | None = None
self.ratio = self.global_statistics.total_test_runs / config.hit_per_runs_ratio
for cov, count in coverage_dict.items():
if re.search(config.cov_include_files, cov.file) is None:
continue
if re.search(config.cov_exclude_files, cov.file) is not None:
continue
self.global_statistics.total_probes_hit += count
self.coverage.append((cov, count))
if count <= self.ratio:
self.global_statistics.total_missed_probes += 1
if self.min_coverage_hit is None or self.min_coverage_hit > count:
self.min_coverage_hit = count
self.coverage.sort(key=lambda x: (x[1], x[0].file, x[0].line))
self.stats: List[Tuple[str, int, int]] = []
for k, v in self.statistics.stats.items():
self.global_statistics.total_test_runs += v.run_count
self.global_statistics.total_cpu_time += v.runtime
self.stats.append((k, v.runtime, v.run_count))
self.stats.sort(key=lambda x: x[1], reverse=True)
if self.min_coverage_hit is not None:
self.coverage_ok = self.min_coverage_hit > self.ratio
else:
self.coverage_ok = False
def dump(self, prefix: str):
errors = 0
out = SummaryTree('EnsembleResults')
out.attributes['TotalRuntime'] = str(self.global_statistics.total_cpu_time)
out.attributes['TotalTestRuns'] = str(self.global_statistics.total_test_runs)
out.attributes['TotalProbesHit'] = str(self.global_statistics.total_probes_hit)
out.attributes['MinProbeHit'] = str(self.min_coverage_hit)
out.attributes['TotalProbes'] = str(len(self.coverage))
out.attributes['MissedProbes'] = str(self.global_statistics.total_missed_probes)
for cov, count in self.coverage:
severity = 10 if count > self.ratio else 40
if severity == 40:
errors += 1
if (severity == 40 and errors <= config.max_errors) or config.details:
child = SummaryTree('CodeProbe')
child.attributes['Severity'] = str(severity)
child.attributes['File'] = cov.file
child.attributes['Line'] = str(cov.line)
child.attributes['Comment'] = '' if cov.comment is None else cov.comment
child.attributes['HitCount'] = str(count)
out.append(child)
if config.details:
for k, runtime, run_count in self.stats:
child = SummaryTree('Test')
child.attributes['Name'] = k
child.attributes['Runtime'] = str(runtime)
child.attributes['RunCount'] = str(run_count)
out.append(child)
if errors > 0:
out.attributes['Errors'] = str(errors)
str_io = io.StringIO()
out.dump(str_io, prefix=prefix, new_line=config.pretty_print)
if config.output_format == 'xml':
sys.stdout.write(str_io.getvalue())
else:
sys.stdout.write('{}"EnsembleResults":{}{}'.format(' ' if config.pretty_print else '',
'\n' if config.pretty_print else ' ',
str_io.getvalue()))
def write_header(ensemble_id: str):
if config.output_format == 'json':
if config.pretty_print:
print('{')
print(' "{}": {},\n'.format('ID', json.dumps(ensemble_id.strip())))
else:
sys.stdout.write('{{"{}": {},'.format('ID', json.dumps(ensemble_id.strip())))
elif config.output_format == 'xml':
sys.stdout.write('<Ensemble ID={}>'.format(quoteattr(ensemble_id.strip())))
if config.pretty_print:
sys.stdout.write('\n')
else:
assert False, 'unknown output format {}'.format(config.output_format)
def write_footer():
if config.output_format == 'xml':
sys.stdout.write('</Ensemble>\n')
elif config.output_format == 'json':
sys.stdout.write('}\n')
else:
assert False, 'unknown output format {}'.format(config.output_format)
if __name__ == '__main__':
parser = argparse.ArgumentParser('TestHarness Results', formatter_class=argparse.ArgumentDefaultsHelpFormatter)
config.change_default('pretty_print', True)
config.change_default('max_warnings', 0)
config.build_arguments(parser)
parser.add_argument('ensemble_id', type=str, help='The ensemble to fetch the result for')
args = parser.parse_args()
config.extract_args(args)
config.output_format = args.output_format
write_header(args.ensemble_id)
try:
import test_harness.joshua
test_harness.joshua.print_errors(args.ensemble_id)
except ModuleNotFoundError:
child = SummaryTree('JoshuaNotFound')
child.attributes['Severity'] = '30'
child.attributes['Message'] = 'Could not import Joshua -- set PYTHONPATH to joshua checkout dir'
child.dump(sys.stdout, prefix=(' ' if config.pretty_print else ''), new_line=config.pretty_print)
results = EnsembleResults(config.cluster_file, args.ensemble_id)
results.dump(' ' if config.pretty_print else '')
write_footer()
exit(0 if results.coverage_ok else 1)

View File

@ -0,0 +1,465 @@
from __future__ import annotations
import array
import base64
import collections
import math
import os
import resource
import shutil
import subprocess
import re
import sys
import threading
import time
import uuid
from functools import total_ordering
from pathlib import Path
from test_harness.version import Version
from test_harness.config import config
from typing import List, Pattern, OrderedDict
from test_harness.summarize import Summary, SummaryTree
@total_ordering
class TestDescription:
def __init__(self, path: Path, name: str, priority: float):
self.paths: List[Path] = [path]
self.name = name
self.priority: float = priority
# we only measure in seconds. Otherwise, keeping determinism will be difficult
self.total_runtime: int = 0
self.num_runs: int = 0
def __lt__(self, other):
if isinstance(other, TestDescription):
return self.name < other.name
else:
return self.name < str(other)
def __eq__(self, other):
if isinstance(other, TestDescription):
return self.name == other.name
else:
return self.name == str(other)
class StatFetcher:
def __init__(self, tests: OrderedDict[str, TestDescription]):
self.tests = tests
def read_stats(self):
pass
def add_run_time(self, test_name: str, runtime: int, out: SummaryTree):
self.tests[test_name].total_runtime += runtime
class TestPicker:
def __init__(self, test_dir: Path):
if not test_dir.exists():
raise RuntimeError('{} is neither a directory nor a file'.format(test_dir))
self.include_files_regex = re.compile(config.include_test_files)
self.exclude_files_regex = re.compile(config.exclude_test_files)
self.include_tests_regex = re.compile(config.include_test_classes)
self.exclude_tests_regex = re.compile(config.exclude_test_names)
self.test_dir: Path = test_dir
self.tests: OrderedDict[str, TestDescription] = collections.OrderedDict()
self.restart_test: Pattern = re.compile(r".*-\d+\.(txt|toml)")
self.follow_test: Pattern = re.compile(r".*-[2-9]\d*\.(txt|toml)")
for subdir in self.test_dir.iterdir():
if subdir.is_dir() and subdir.name in config.test_dirs:
self.walk_test_dir(subdir)
self.stat_fetcher: StatFetcher
if config.stats is not None or config.joshua_dir is None:
self.stat_fetcher = StatFetcher(self.tests)
else:
from test_harness.fdb import FDBStatFetcher
self.stat_fetcher = FDBStatFetcher(self.tests)
if config.stats is not None:
self.load_stats(config.stats)
else:
self.fetch_stats()
def add_time(self, test_file: Path, run_time: int, out: SummaryTree) -> None:
# getting the test name is fairly inefficient. But since we only have 100s of tests, I won't bother
test_name: str | None = None
test_desc: TestDescription | None = None
for name, test in self.tests.items():
for p in test.paths:
test_files: List[Path]
if self.restart_test.match(p.name):
test_files = self.list_restart_files(p)
else:
test_files = [p]
for file in test_files:
if file.absolute() == test_file.absolute():
test_name = name
test_desc = test
break
if test_name is not None:
break
if test_name is not None:
break
assert test_name is not None and test_desc is not None
self.stat_fetcher.add_run_time(test_name, run_time, out)
out.attributes['TotalTestTime'] = str(test_desc.total_runtime)
out.attributes['TestRunCount'] = str(test_desc.num_runs)
def dump_stats(self) -> str:
res = array.array('I')
for _, spec in self.tests.items():
res.append(spec.total_runtime)
return base64.standard_b64encode(res.tobytes()).decode('utf-8')
def fetch_stats(self):
self.stat_fetcher.read_stats()
def load_stats(self, serialized: str):
times = array.array('I')
times.frombytes(base64.standard_b64decode(serialized))
assert len(times) == len(self.tests.items())
for idx, (_, spec) in enumerate(self.tests.items()):
spec.total_runtime = times[idx]
def parse_txt(self, path: Path):
if self.include_files_regex.search(str(path)) is None or self.exclude_files_regex.search(str(path)) is not None:
return
with path.open('r') as f:
test_name: str | None = None
test_class: str | None = None
priority: float | None = None
for line in f:
line = line.strip()
kv = line.split('=')
if len(kv) != 2:
continue
kv[0] = kv[0].strip()
kv[1] = kv[1].strip(' \r\n\t\'"')
if kv[0] == 'testTitle' and test_name is None:
test_name = kv[1]
if kv[0] == 'testClass' and test_class is None:
test_class = kv[1]
if kv[0] == 'testPriority' and priority is None:
try:
priority = float(kv[1])
except ValueError:
raise RuntimeError("Can't parse {} -- testPriority in {} should be set to a float".format(kv[1],
path))
if test_name is not None and test_class is not None and priority is not None:
break
if test_name is None:
return
if test_class is None:
test_class = test_name
if priority is None:
priority = 1.0
if self.include_tests_regex.search(test_class) is None \
or self.exclude_tests_regex.search(test_class) is not None:
return
if test_class not in self.tests:
self.tests[test_class] = TestDescription(path, test_class, priority)
else:
self.tests[test_class].paths.append(path)
def walk_test_dir(self, test: Path):
if test.is_dir():
for file in test.iterdir():
self.walk_test_dir(file)
else:
# check whether we're looking at a restart test
if self.follow_test.match(test.name) is not None:
return
if test.suffix == '.txt' or test.suffix == '.toml':
self.parse_txt(test)
@staticmethod
def list_restart_files(start_file: Path) -> List[Path]:
name = re.sub(r'-\d+.(txt|toml)', '', start_file.name)
res: List[Path] = []
for test_file in start_file.parent.iterdir():
if test_file.name.startswith(name):
res.append(test_file)
assert len(res) > 1
res.sort()
return res
def choose_test(self) -> List[Path]:
min_runtime: float | None = None
candidates: List[TestDescription] = []
for _, v in self.tests.items():
this_time = v.total_runtime * v.priority
if min_runtime is None or this_time < min_runtime:
min_runtime = this_time
candidates = [v]
elif this_time == min_runtime:
candidates.append(v)
candidates.sort()
choice = config.random.randint(0, len(candidates) - 1)
test = candidates[choice]
result = test.paths[config.random.randint(0, len(test.paths) - 1)]
if self.restart_test.match(result.name):
return self.list_restart_files(result)
else:
return [result]
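# choose_test favors the test class with the smallest priority-weighted total runtime
# (ties are broken uniformly at random), so over many runs every test class receives
# roughly the same amount of wall-clock time divided by its priority. A minimal usage
# sketch, assuming config has already been initialized elsewhere:
#
#     picker = TestPicker(Path('tests'))
#     files = picker.choose_test()   # one file, or the whole -1/-2/... restart chain
#
# For restart tests the returned list contains every file of the chain, in order.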
class OldBinaries:
def __init__(self):
self.first_file_expr = re.compile(r'.*-1\.(txt|toml)')
self.old_binaries_path: Path = config.old_binaries_path
self.binaries: OrderedDict[Version, Path] = collections.OrderedDict()
if not self.old_binaries_path.exists() or not self.old_binaries_path.is_dir():
return
exec_pattern = re.compile(r'fdbserver-\d+\.\d+\.\d+(\.exe)?')
for file in self.old_binaries_path.iterdir():
if not file.is_file() or not os.access(file, os.X_OK):
continue
if exec_pattern.fullmatch(file.name) is not None:
self._add_file(file)
def _add_file(self, file: Path):
version_str = file.name.split('-')[1]
if version_str.endswith('.exe'):
version_str = version_str[0:-len('.exe')]
ver = Version.parse(version_str)
self.binaries[ver] = file
def choose_binary(self, test_file: Path) -> Path:
if len(self.binaries) == 0:
return config.binary
max_version = Version.max_version()
min_version = Version.parse('5.0.0')
dirs = test_file.parent.parts
if 'restarting' not in dirs:
return config.binary
version_expr = dirs[-1].split('_')
first_file = self.first_file_expr.match(test_file.name) is not None
if first_file and version_expr[0] == 'to':
# downgrade test -- first binary should be current one
return config.binary
if not first_file and version_expr[0] == 'from':
# upgrade test -- we only return an old version for the first test file
return config.binary
if version_expr[0] == 'from' or version_expr[0] == 'to':
min_version = Version.parse(version_expr[1])
if len(version_expr) == 4 and version_expr[2] == 'until':
max_version = Version.parse(version_expr[3])
candidates: List[Path] = []
for ver, binary in self.binaries.items():
if min_version <= ver <= max_version:
candidates.append(binary)
if len(candidates) == 0:
return config.binary
return config.random.choice(candidates)
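# Old binaries are picked based on the directory naming convention of restart tests:
# in tests/restarting/from_X.Y.Z[_until_A.B.C] the first part of the chain runs on an old
# fdbserver between X.Y.Z and A.B.C (or the newest available) and the later parts run on
# the current binary, while to_X.Y.Z directories do the opposite. A hedged example of the
# expected layout (the paths are illustrative, not taken from the source):
#
#     tests/restarting/from_7.1.0/SomeTestRestart-1.toml   -> old binary >= 7.1.0
#     tests/restarting/from_7.1.0/SomeTestRestart-2.toml   -> current binary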
def is_restarting_test(test_file: Path):
for p in test_file.parts:
if p == 'restarting':
return True
return False
def is_no_sim(test_file: Path):
return test_file.parts[-2] == 'noSim'
class ResourceMonitor(threading.Thread):
def __init__(self):
super().__init__()
self.start_time = time.time()
self.end_time: float | None = None
self._stop_monitor = False
self.max_rss = 0
def run(self) -> None:
while not self._stop_monitor:
time.sleep(1)
resources = resource.getrusage(resource.RUSAGE_CHILDREN)
self.max_rss = max(resources.ru_maxrss, self.max_rss)
def stop(self):
self.end_time = time.time()
self._stop_monitor = True
def time(self):
return self.end_time - self.start_time
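# ResourceMonitor samples resource.getrusage(RUSAGE_CHILDREN) once a second on a
# background thread, so the peak RSS of the fdbserver child is known even though the
# process is gone by the time the summary is written (on Linux ru_maxrss is reported in
# kilobytes). A minimal usage sketch:
#
#     monitor = ResourceMonitor()
#     monitor.start()
#     ...          # run the child process to completion
#     monitor.stop()
#     monitor.join()
#     elapsed, peak = monitor.time(), monitor.max_rss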
class TestRun:
def __init__(self, binary: Path, test_file: Path, random_seed: int, uid: uuid.UUID,
restarting: bool = False, test_determinism: bool = False, buggify_enabled: bool = False,
stats: str | None = None, expected_unseed: int | None = None, will_restart: bool = False):
self.binary = binary
self.test_file = test_file
self.random_seed = random_seed
self.uid = uid
self.restarting = restarting
self.test_determinism = test_determinism
self.stats: str | None = stats
self.expected_unseed: int | None = expected_unseed
self.use_valgrind: bool = config.use_valgrind
self.old_binary_path: Path = config.old_binaries_path
self.buggify_enabled: bool = buggify_enabled
self.fault_injection_enabled: bool = True
self.trace_format: str | None = config.trace_format
if Version.of_binary(self.binary) < "6.1.0":
self.trace_format = None
self.temp_path = config.run_dir / str(self.uid)
# state for the run
self.retryable_error: bool = False
self.summary: Summary = Summary(binary, uid=self.uid, stats=self.stats, expected_unseed=self.expected_unseed,
will_restart=will_restart)
self.run_time: int = 0
self.success = self.run()
def log_test_plan(self, out: SummaryTree):
test_plan: SummaryTree = SummaryTree('TestPlan')
test_plan.attributes['TestUID'] = str(self.uid)
test_plan.attributes['RandomSeed'] = str(self.random_seed)
test_plan.attributes['TestFile'] = str(self.test_file)
test_plan.attributes['Buggify'] = '1' if self.buggify_enabled else '0'
test_plan.attributes['FaultInjectionEnabled'] = '1' if self.fault_injection_enabled else '0'
test_plan.attributes['DeterminismCheck'] = '1' if self.test_determinism else '0'
out.append(test_plan)
def delete_simdir(self):
shutil.rmtree(self.temp_path / Path('simfdb'))
def run(self):
command: List[str] = []
valgrind_file: Path | None = None
if self.use_valgrind:
command.append('valgrind')
valgrind_file = self.temp_path / Path('valgrind-{}.xml'.format(self.random_seed))
dbg_path = os.getenv('FDB_VALGRIND_DBGPATH')
if dbg_path is not None:
command.append('--extra-debuginfo-path={}'.format(dbg_path))
command += ['--xml=yes', '--xml-file={}'.format(valgrind_file.absolute()), '-q']
command += [str(self.binary.absolute()),
'-r', 'test' if is_no_sim(self.test_file) else 'simulation',
'-f', str(self.test_file),
'-s', str(self.random_seed)]
if self.trace_format is not None:
command += ['--trace_format', self.trace_format]
if Version.of_binary(self.binary) >= '7.1.0':
command += ['-fi', 'on' if self.fault_injection_enabled else 'off']
if self.restarting:
command.append('--restarting')
if self.buggify_enabled:
command += ['-b', 'on']
if config.crash_on_error:
command.append('--crash')
self.temp_path.mkdir(parents=True, exist_ok=True)
# self.log_test_plan(out)
resources = ResourceMonitor()
resources.start()
process = subprocess.Popen(command, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, cwd=self.temp_path,
text=True)
did_kill = False
timeout = 20 * config.kill_seconds if self.use_valgrind else config.kill_seconds
err_out: str
try:
_, err_out = process.communicate(timeout=timeout)
except subprocess.TimeoutExpired:
process.kill()
_, err_out = process.communicate()
did_kill = True
resources.stop()
resources.join()
# we're rounding times up, otherwise we will prefer running very short tests (<1s)
self.run_time = math.ceil(resources.time())
self.summary.runtime = resources.time()
self.summary.max_rss = resources.max_rss
self.summary.was_killed = did_kill
self.summary.valgrind_out_file = valgrind_file
self.summary.error_out = err_out
self.summary.summarize(self.temp_path, ' '.join(command))
return self.summary.ok()
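# The assembled command line is the usual simulation invocation, optionally wrapped in
# valgrind. A hedged sketch of what a single run may look like (the test path, seed and
# enabled options are illustrative, not taken from the source):
#
#     fdbserver -r simulation -f tests/fast/SomeTest.toml -s 123456 \
#         --trace_format json -fi on -b on --crash
#
# noSim tests use '-r test' instead of '-r simulation', and '--restarting' is appended for
# the second and later parts of a restart chain.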
def decorate_summary(out: SummaryTree, test_file: Path, seed: int, buggify: bool):
"""Sometimes a test can crash before ProgramStart is written to the traces. These
tests are then hard to reproduce (they can be reproduced through TestHarness but
require the user to run in the joshua docker container). To account for this we
will write the necessary information into the attributes if it is missing."""
if 'TestFile' not in out.attributes:
out.attributes['TestFile'] = str(test_file)
if 'RandomSeed' not in out.attributes:
out.attributes['RandomSeed'] = str(seed)
if 'BuggifyEnabled' not in out.attributes:
out.attributes['BuggifyEnabled'] = '1' if buggify else '0'
class TestRunner:
def __init__(self):
self.uid = uuid.uuid4()
self.test_path: Path = Path('tests')
self.cluster_file: str | None = None
self.fdb_app_dir: str | None = None
self.binary_chooser = OldBinaries()
self.test_picker = TestPicker(self.test_path)
def backup_sim_dir(self, seed: int):
temp_dir = config.run_dir / str(self.uid)
src_dir = temp_dir / 'simfdb'
assert src_dir.is_dir()
dest_dir = temp_dir / 'simfdb.{}'.format(seed)
assert not dest_dir.exists()
shutil.copytree(src_dir, dest_dir)
def restore_sim_dir(self, seed: int):
temp_dir = config.run_dir / str(self.uid)
src_dir = temp_dir / 'simfdb.{}'.format(seed)
assert src_dir.exists()
dest_dir = temp_dir / 'simfdb'
shutil.rmtree(dest_dir)
shutil.move(src_dir, dest_dir)
def run_tests(self, test_files: List[Path], seed: int, test_picker: TestPicker) -> bool:
result: bool = True
for count, file in enumerate(test_files):
will_restart = count + 1 < len(test_files)
binary = self.binary_chooser.choose_binary(file)
unseed_check = not is_no_sim(file) and config.random.random() < config.unseed_check_ratio
buggify_enabled: bool = config.random.random() < config.buggify_on_ratio
if unseed_check and count != 0:
# for restarting tests we will need to restore the sim2 after the first run
self.backup_sim_dir(seed + count - 1)
run = TestRun(binary, file.absolute(), seed + count, self.uid, restarting=count != 0,
stats=test_picker.dump_stats(), will_restart=will_restart, buggify_enabled=buggify_enabled)
result = result and run.success
test_picker.add_time(test_files[0], run.run_time, run.summary.out)
decorate_summary(run.summary.out, file, seed + count, run.buggify_enabled)
if unseed_check and run.summary.unseed:
run.summary.out.append(run.summary.list_simfdb())
run.summary.out.dump(sys.stdout)
if not result:
return False
if unseed_check and run.summary.unseed is not None:
if count != 0:
self.restore_sim_dir(seed + count - 1)
run2 = TestRun(binary, file.absolute(), seed + count, self.uid, restarting=count != 0,
stats=test_picker.dump_stats(), expected_unseed=run.summary.unseed,
will_restart=will_restart, buggify_enabled=buggify_enabled)
test_picker.add_time(file, run2.run_time, run.summary.out)
decorate_summary(run2.summary.out, file, seed + count, run.buggify_enabled)
run2.summary.out.dump(sys.stdout)
result = result and run2.success
if not result:
return False
return result
def run(self) -> bool:
seed = config.random_seed if config.random_seed is not None else config.random.randint(0, 2 ** 32 - 1)
test_files = self.test_picker.choose_test()
success = self.run_tests(test_files, seed, self.test_picker)
if config.clean_up:
shutil.rmtree(config.run_dir / str(self.uid))
return success
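# Putting the pieces together: TestRunner picks a test (or restart chain) with TestPicker,
# chooses a binary with OldBinaries, runs each part as a TestRun, and re-runs with the same
# seed whenever the unseed check is selected in order to catch non-determinism. A minimal
# driver sketch, assuming config has been populated from the command line elsewhere by the
# harness entry point:
#
#     runner = TestRunner()
#     ok = runner.run()
#     sys.exit(0 if ok else 1)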

View File

@@ -0,0 +1,620 @@
from __future__ import annotations
import collections
import inspect
import json
import os
import re
import sys
import traceback
import uuid
import xml.sax
import xml.sax.handler
import xml.sax.saxutils
from pathlib import Path
from typing import List, Dict, TextIO, Callable, Optional, OrderedDict, Any, Tuple, Iterator, Iterable
from test_harness.config import config
from test_harness.valgrind import parse_valgrind_output
class SummaryTree:
def __init__(self, name: str):
self.name = name
self.children: List[SummaryTree] = []
self.attributes: Dict[str, str] = {}
def append(self, element: SummaryTree):
self.children.append(element)
def to_dict(self, add_name: bool = True) -> Dict[str, Any] | List[Any]:
if len(self.children) > 0 and len(self.attributes) == 0:
children = []
for child in self.children:
children.append(child.to_dict())
if add_name:
return {self.name: children}
else:
return children
res: Dict[str, Any] = {}
if add_name:
res['Type'] = self.name
for k, v in self.attributes.items():
res[k] = v
children = []
child_keys: Dict[str, int] = {}
for child in self.children:
if child.name in child_keys:
child_keys[child.name] += 1
else:
child_keys[child.name] = 1
for child in self.children:
if child_keys[child.name] == 1 and child.name not in self.attributes:
res[child.name] = child.to_dict(add_name=False)
else:
children.append(child.to_dict())
if len(children) > 0:
res['children'] = children
return res
def to_json(self, out: TextIO, prefix: str = ''):
res = json.dumps(self.to_dict(), indent=(' ' if config.pretty_print else None))
for line in res.splitlines(False):
out.write('{}{}\n'.format(prefix, line))
def to_xml(self, out: TextIO, prefix: str = ''):
# minidom doesn't support omitting the xml declaration which is a problem for joshua
# However, our xml is very simple and therefore serializing manually is easy enough
attrs = []
print_width = 120
try:
print_width, _ = os.get_terminal_size()
except OSError:
pass
for k, v in self.attributes.items():
attrs.append('{}={}'.format(k, xml.sax.saxutils.quoteattr(v)))
elem = '{}<{}{}'.format(prefix, self.name, ('' if len(attrs) == 0 else ' '))
out.write(elem)
if config.pretty_print:
curr_line_len = len(elem)
for i in range(len(attrs)):
attr_len = len(attrs[i])
if i == 0 or attr_len + curr_line_len + 1 <= print_width:
if i != 0:
out.write(' ')
out.write(attrs[i])
curr_line_len += attr_len
else:
out.write('\n')
out.write(' ' * len(elem))
out.write(attrs[i])
curr_line_len = len(elem) + attr_len
else:
out.write(' '.join(attrs))
if len(self.children) == 0:
out.write('/>')
else:
out.write('>')
for child in self.children:
if config.pretty_print:
out.write('\n')
child.to_xml(out, prefix=(' {}'.format(prefix) if config.pretty_print else prefix))
if len(self.children) > 0:
out.write('{}{}</{}>'.format(('\n' if config.pretty_print else ''), prefix, self.name))
def dump(self, out: TextIO, prefix: str = '', new_line: bool = True):
if config.output_format == 'json':
self.to_json(out, prefix=prefix)
else:
self.to_xml(out, prefix=prefix)
if new_line:
out.write('\n')
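# SummaryTree is a tiny write-only DOM: attributes become XML attributes (or JSON keys)
# and children become nested elements, and config.output_format decides which form dump()
# emits. A minimal sketch of the XML form:
#
#     t = SummaryTree('Test')
#     t.attributes['Ok'] = '1'
#     t.dump(sys.stdout)        # -> <Test Ok="1"/>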
ParserCallback = Callable[[Dict[str, str]], Optional[str]]
class ParseHandler:
def __init__(self, out: SummaryTree):
self.out = out
self.events: OrderedDict[Optional[Tuple[str, Optional[str]]], List[ParserCallback]] = collections.OrderedDict()
def add_handler(self, attr: Tuple[str, Optional[str]], callback: ParserCallback) -> None:
self.events.setdefault(attr, []).append(callback)
def _call(self, callback: ParserCallback, attrs: Dict[str, str]) -> str | None:
try:
return callback(attrs)
except Exception as e:
_, _, exc_traceback = sys.exc_info()
child = SummaryTree('NonFatalParseError')
child.attributes['Severity'] = '30'
child.attributes['ErrorMessage'] = str(e)
child.attributes['Trace'] = repr(traceback.format_tb(exc_traceback))
self.out.append(child)
return None
def handle(self, attrs: Dict[str, str]):
if None in self.events:
for callback in self.events[None]:
self._call(callback, attrs)
for k, v in attrs.items():
if (k, None) in self.events:
for callback in self.events[(k, None)]:
remap = self._call(callback, attrs)
if remap is not None:
v = remap
attrs[k] = v
if (k, v) in self.events:
for callback in self.events[(k, v)]:
remap = self._call(callback, attrs)
if remap is not None:
v = remap
attrs[k] = v
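# Handlers are keyed by (attribute, value) pairs: the None key fires for every event,
# (k, None) fires for every event carrying attribute k, and (k, v) only when k has exactly
# the value v. A callback may return a string to remap the attribute value before the more
# specific handlers run, which is how event severities get rewritten by the
# RemapEventSeverity machinery in Summary.register_handlers below. A small sketch:
#
#     handler = ParseHandler(out)
#     handler.add_handler(('Type', 'ProgramStart'), lambda attrs: print(attrs['RandomSeed']))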
class Parser:
def parse(self, file: TextIO, handler: ParseHandler) -> None:
pass
class XmlParser(Parser, xml.sax.handler.ContentHandler):
def __init__(self):
super().__init__()
self.handler: ParseHandler | None = None
    def parse(self, file: TextIO, handler: ParseHandler) -> None:
        self.handler = handler
        xml.sax.parse(file, self)
def startElement(self, name, attrs) -> None:
attributes: Dict[str, str] = {}
        for attr_name in attrs.getNames():
            attributes[attr_name] = attrs.getValue(attr_name)
assert self.handler is not None
self.handler.handle(attributes)
class JsonParser(Parser):
def __init__(self):
super().__init__()
def parse(self, file: TextIO, handler: ParseHandler):
for line in file:
obj = json.loads(line)
handler.handle(obj)
class Coverage:
def __init__(self, file: str, line: str | int, comment: str | None = None):
self.file = file
self.line = int(line)
self.comment = comment
def to_tuple(self) -> Tuple[str, int, str | None]:
return self.file, self.line, self.comment
def __eq__(self, other) -> bool:
if isinstance(other, tuple) and len(other) == 3:
return self.to_tuple() == other
elif isinstance(other, Coverage):
return self.to_tuple() == other.to_tuple()
else:
return False
def __lt__(self, other) -> bool:
if isinstance(other, tuple) and len(other) == 3:
return self.to_tuple() < other
elif isinstance(other, Coverage):
return self.to_tuple() < other.to_tuple()
else:
return False
def __le__(self, other) -> bool:
if isinstance(other, tuple) and len(other) == 3:
return self.to_tuple() <= other
elif isinstance(other, Coverage):
return self.to_tuple() <= other.to_tuple()
else:
return False
def __gt__(self, other: Coverage) -> bool:
if isinstance(other, tuple) and len(other) == 3:
return self.to_tuple() > other
elif isinstance(other, Coverage):
return self.to_tuple() > other.to_tuple()
else:
return False
def __ge__(self, other):
if isinstance(other, tuple) and len(other) == 3:
return self.to_tuple() >= other
elif isinstance(other, Coverage):
return self.to_tuple() >= other.to_tuple()
else:
return False
def __hash__(self):
return hash((self.file, self.line, self.comment))
class TraceFiles:
def __init__(self, path: Path):
self.path: Path = path
self.timestamps: List[int] = []
self.runs: OrderedDict[int, List[Path]] = collections.OrderedDict()
trace_expr = re.compile(r'trace.*\.(json|xml)')
for file in self.path.iterdir():
if file.is_file() and trace_expr.match(file.name) is not None:
ts = int(file.name.split('.')[6])
if ts in self.runs:
self.runs[ts].append(file)
else:
self.timestamps.append(ts)
self.runs[ts] = [file]
self.timestamps.sort(reverse=True)
def __getitem__(self, idx: int) -> List[Path]:
res = self.runs[self.timestamps[idx]]
res.sort()
return res
def __len__(self) -> int:
return len(self.runs)
def items(self) -> Iterator[List[Path]]:
class TraceFilesIterator(Iterable[List[Path]]):
def __init__(self, trace_files: TraceFiles):
self.current = 0
self.trace_files: TraceFiles = trace_files
def __iter__(self):
return self
def __next__(self) -> List[Path]:
if len(self.trace_files) <= self.current:
raise StopIteration
self.current += 1
return self.trace_files[self.current - 1]
return TraceFilesIterator(self)
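# TraceFiles groups the trace files found in a run directory by the timestamp embedded in
# their names and orders the groups newest-first, so trace_files[0] is always the most
# recent run. The split('.')[6] above assumes the usual simulation trace naming, in which
# the seventh dot-separated field is the run's timestamp; files that do not match the
# trace.*.json/xml pattern are ignored entirely.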
class Summary:
def __init__(self, binary: Path, runtime: float = 0, max_rss: int | None = None,
was_killed: bool = False, uid: uuid.UUID | None = None, expected_unseed: int | None = None,
exit_code: int = 0, valgrind_out_file: Path | None = None, stats: str | None = None,
                 error_out: str | None = None, will_restart: bool = False):
self.binary = binary
self.runtime: float = runtime
self.max_rss: int | None = max_rss
self.was_killed: bool = was_killed
self.expected_unseed: int | None = expected_unseed
self.exit_code: int = exit_code
self.out: SummaryTree = SummaryTree('Test')
self.test_begin_found: bool = False
self.test_end_found: bool = False
self.unseed: int | None = None
self.valgrind_out_file: Path | None = valgrind_out_file
self.severity_map: OrderedDict[tuple[str, int], int] = collections.OrderedDict()
self.error: bool = False
self.errors: int = 0
self.warnings: int = 0
self.coverage: OrderedDict[Coverage, bool] = collections.OrderedDict()
self.test_count: int = 0
self.tests_passed: int = 0
self.error_out = error_out
self.stderr_severity: str = '40'
self.will_restart: bool = will_restart
self.test_dir: Path | None = None
if uid is not None:
self.out.attributes['TestUID'] = str(uid)
if stats is not None:
self.out.attributes['Statistics'] = stats
self.out.attributes['JoshuaSeed'] = str(config.joshua_seed)
self.out.attributes['WillRestart'] = '1' if self.will_restart else '0'
self.handler = ParseHandler(self.out)
self.register_handlers()
def summarize_files(self, trace_files: List[Path]):
assert len(trace_files) > 0
for f in trace_files:
self.parse_file(f)
self.done()
def summarize(self, trace_dir: Path, command: str):
self.test_dir = trace_dir
trace_files = TraceFiles(trace_dir)
if len(trace_files) == 0:
self.error = True
child = SummaryTree('NoTracesFound')
child.attributes['Severity'] = '40'
child.attributes['Path'] = str(trace_dir.absolute())
child.attributes['Command'] = command
self.out.append(child)
return
self.summarize_files(trace_files[0])
if config.joshua_dir is not None:
import test_harness.fdb
test_harness.fdb.write_coverage(config.cluster_file,
test_harness.fdb.str_to_tuple(config.joshua_dir) + ('coverage',),
test_harness.fdb.str_to_tuple(config.joshua_dir) + ('coverage-metadata',),
self.coverage)
def list_simfdb(self) -> SummaryTree:
res = SummaryTree('SimFDB')
res.attributes['TestDir'] = str(self.test_dir)
if self.test_dir is None:
return res
simfdb = self.test_dir / Path('simfdb')
if not simfdb.exists():
res.attributes['NoSimDir'] = "simfdb doesn't exist"
return res
elif not simfdb.is_dir():
res.attributes['NoSimDir'] = 'simfdb is not a directory'
return res
for file in simfdb.iterdir():
child = SummaryTree('Directory' if file.is_dir() else 'File')
child.attributes['Name'] = file.name
res.append(child)
return res
def ok(self):
return not self.error
def done(self):
if config.print_coverage:
for k, v in self.coverage.items():
child = SummaryTree('CodeCoverage')
child.attributes['File'] = k.file
child.attributes['Line'] = str(k.line)
if not v:
child.attributes['Covered'] = '0'
if k.comment is not None and len(k.comment):
child.attributes['Comment'] = k.comment
self.out.append(child)
if self.warnings > config.max_warnings:
child = SummaryTree('WarningLimitExceeded')
child.attributes['Severity'] = '30'
child.attributes['WarningCount'] = str(self.warnings)
self.out.append(child)
if self.errors > config.max_errors:
child = SummaryTree('ErrorLimitExceeded')
child.attributes['Severity'] = '40'
child.attributes['ErrorCount'] = str(self.errors)
self.out.append(child)
if self.was_killed:
child = SummaryTree('ExternalTimeout')
child.attributes['Severity'] = '40'
self.out.append(child)
self.error = True
if self.max_rss is not None:
self.out.attributes['PeakMemory'] = str(self.max_rss)
if self.valgrind_out_file is not None:
try:
valgrind_errors = parse_valgrind_output(self.valgrind_out_file)
for valgrind_error in valgrind_errors:
if valgrind_error.kind.startswith('Leak'):
continue
self.error = True
child = SummaryTree('ValgrindError')
child.attributes['Severity'] = '40'
child.attributes['What'] = valgrind_error.what.what
child.attributes['Backtrace'] = valgrind_error.what.backtrace
aux_count = 0
for aux in valgrind_error.aux:
child.attributes['WhatAux{}'.format(aux_count)] = aux.what
child.attributes['BacktraceAux{}'.format(aux_count)] = aux.backtrace
aux_count += 1
self.out.append(child)
except Exception as e:
self.error = True
child = SummaryTree('ValgrindParseError')
child.attributes['Severity'] = '40'
child.attributes['ErrorMessage'] = str(e)
_, _, exc_traceback = sys.exc_info()
child.attributes['Trace'] = repr(traceback.format_tb(exc_traceback))
self.out.append(child)
if not self.test_end_found:
child = SummaryTree('TestUnexpectedlyNotFinished')
child.attributes['Severity'] = '40'
self.out.append(child)
if self.error_out is not None and len(self.error_out) > 0:
lines = self.error_out.splitlines()
stderr_bytes = 0
for line in lines:
if line.endswith("WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!"):
# When running ASAN we expect to see this message. Boost coroutine should be using the correct asan annotations so that it shouldn't produce any false positives.
continue
if line.endswith("Warning: unimplemented fcntl command: 1036"):
# Valgrind produces this warning when F_SET_RW_HINT is used
continue
if self.stderr_severity == '40':
self.error = True
remaining_bytes = config.max_stderr_bytes - stderr_bytes
if remaining_bytes > 0:
out_err = line[0:remaining_bytes] + ('...' if len(line) > remaining_bytes else '')
child = SummaryTree('StdErrOutput')
child.attributes['Severity'] = self.stderr_severity
child.attributes['Output'] = out_err
self.out.append(child)
stderr_bytes += len(line)
if stderr_bytes > config.max_stderr_bytes:
child = SummaryTree('StdErrOutputTruncated')
child.attributes['Severity'] = self.stderr_severity
                child.attributes['BytesRemaining'] = str(stderr_bytes - config.max_stderr_bytes)
self.out.append(child)
self.out.attributes['Ok'] = '1' if self.ok() else '0'
if not self.ok():
reason = 'Unknown'
if self.error:
reason = 'ProducedErrors'
elif not self.test_end_found:
reason = 'TestDidNotFinish'
elif self.tests_passed == 0:
reason = 'NoTestsPassed'
elif self.test_count != self.tests_passed:
reason = 'Expected {} tests to pass, but only {} did'.format(self.test_count, self.tests_passed)
self.out.attributes['FailReason'] = reason
def parse_file(self, file: Path):
parser: Parser
if file.suffix == '.json':
parser = JsonParser()
elif file.suffix == '.xml':
parser = XmlParser()
else:
child = SummaryTree('TestHarnessBug')
child.attributes['File'] = __file__
frame = inspect.currentframe()
if frame is not None:
child.attributes['Line'] = str(inspect.getframeinfo(frame).lineno)
child.attributes['Details'] = 'Unexpected suffix {} for file {}'.format(file.suffix, file.name)
self.error = True
self.out.append(child)
return
with file.open('r') as f:
try:
parser.parse(f, self.handler)
except Exception as e:
child = SummaryTree('SummarizationError')
child.attributes['Severity'] = '40'
child.attributes['ErrorMessage'] = str(e)
self.out.append(child)
def register_handlers(self):
def remap_event_severity(attrs):
if 'Type' not in attrs or 'Severity' not in attrs:
return None
k = (attrs['Type'], int(attrs['Severity']))
if k in self.severity_map:
return str(self.severity_map[k])
self.handler.add_handler(('Severity', None), remap_event_severity)
def program_start(attrs: Dict[str, str]):
if self.test_begin_found:
return
self.test_begin_found = True
self.out.attributes['RandomSeed'] = attrs['RandomSeed']
self.out.attributes['SourceVersion'] = attrs['SourceVersion']
self.out.attributes['Time'] = attrs['ActualTime']
self.out.attributes['BuggifyEnabled'] = attrs['BuggifyEnabled']
self.out.attributes['DeterminismCheck'] = '0' if self.expected_unseed is None else '1'
if self.binary.name != 'fdbserver':
self.out.attributes['OldBinary'] = self.binary.name
if 'FaultInjectionEnabled' in attrs:
self.out.attributes['FaultInjectionEnabled'] = attrs['FaultInjectionEnabled']
self.handler.add_handler(('Type', 'ProgramStart'), program_start)
def set_test_file(attrs: Dict[str, str]):
test_file = Path(attrs['TestFile'])
cwd = Path('.').absolute()
try:
test_file = test_file.relative_to(cwd)
except ValueError:
pass
self.out.attributes['TestFile'] = str(test_file)
self.handler.add_handler(('Type', 'Simulation'), set_test_file)
self.handler.add_handler(('Type', 'NonSimulationTest'), set_test_file)
def set_elapsed_time(attrs: Dict[str, str]):
if self.test_end_found:
return
self.test_end_found = True
self.unseed = int(attrs['RandomUnseed'])
if self.expected_unseed is not None and self.unseed != self.expected_unseed:
severity = 40 if ('UnseedMismatch', 40) not in self.severity_map \
else self.severity_map[('UnseedMismatch', 40)]
if severity >= 30:
child = SummaryTree('UnseedMismatch')
child.attributes['Unseed'] = str(self.unseed)
child.attributes['ExpectedUnseed'] = str(self.expected_unseed)
child.attributes['Severity'] = str(severity)
if severity >= 40:
self.error = True
self.out.append(child)
self.out.attributes['SimElapsedTime'] = attrs['SimTime']
self.out.attributes['RealElapsedTime'] = attrs['RealTime']
if self.unseed is not None:
self.out.attributes['RandomUnseed'] = str(self.unseed)
self.handler.add_handler(('Type', 'ElapsedTime'), set_elapsed_time)
def parse_warning(attrs: Dict[str, str]):
self.warnings += 1
if self.warnings > config.max_warnings:
return
child = SummaryTree(attrs['Type'])
for k, v in attrs.items():
if k != 'Type':
child.attributes[k] = v
self.out.append(child)
self.handler.add_handler(('Severity', '30'), parse_warning)
def parse_error(attrs: Dict[str, str]):
self.errors += 1
self.error = True
if self.errors > config.max_errors:
return
child = SummaryTree(attrs['Type'])
for k, v in attrs.items():
child.attributes[k] = v
self.out.append(child)
self.handler.add_handler(('Severity', '40'), parse_error)
def coverage(attrs: Dict[str, str]):
covered = True
if 'Covered' in attrs:
covered = int(attrs['Covered']) != 0
comment = ''
if 'Comment' in attrs:
comment = attrs['Comment']
c = Coverage(attrs['File'], attrs['Line'], comment)
if covered or c not in self.coverage:
self.coverage[c] = covered
self.handler.add_handler(('Type', 'CodeCoverage'), coverage)
def expected_test_pass(attrs: Dict[str, str]):
self.test_count = int(attrs['Count'])
self.handler.add_handler(('Type', 'TestsExpectedToPass'), expected_test_pass)
def test_passed(attrs: Dict[str, str]):
if attrs['Passed'] == '1':
self.tests_passed += 1
self.handler.add_handler(('Type', 'TestResults'), test_passed)
def remap_event_severity(attrs: Dict[str, str]):
self.severity_map[(attrs['TargetEvent'], int(attrs['OriginalSeverity']))] = int(attrs['NewSeverity'])
self.handler.add_handler(('Type', 'RemapEventSeverity'), remap_event_severity)
def buggify_section(attrs: Dict[str, str]):
if attrs['Type'] == 'FaultInjected' or attrs.get('Activated', '0') == '1':
child = SummaryTree(attrs['Type'])
child.attributes['File'] = attrs['File']
child.attributes['Line'] = attrs['Line']
self.out.append(child)
self.handler.add_handler(('Type', 'BuggifySection'), buggify_section)
self.handler.add_handler(('Type', 'FaultInjected'), buggify_section)
def running_unit_test(attrs: Dict[str, str]):
child = SummaryTree('RunningUnitTest')
child.attributes['Name'] = attrs['Name']
child.attributes['File'] = attrs['File']
            child.attributes['Line'] = attrs['Line']
            self.out.append(child)
self.handler.add_handler(('Type', 'RunningUnitTest'), running_unit_test)
def stderr_severity(attrs: Dict[str, str]):
if 'NewSeverity' in attrs:
self.stderr_severity = attrs['NewSeverity']
self.handler.add_handler(('Type', 'StderrSeverity'), stderr_severity)
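# Typical use mirrors TestRun.run() in the runner module: construct a Summary for the
# binary that was executed, let it walk the trace directory, and render the result. A
# minimal sketch, assuming 'temp_path' holds the run directory and 'cmd' the command line:
#
#     summary = Summary(Path('bin/fdbserver'), runtime=12.3, was_killed=False)
#     summary.summarize(temp_path, cmd)
#     summary.out.dump(sys.stdout)
#     success = summary.ok()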

View File

@@ -0,0 +1,16 @@
import sys
from test_harness.valgrind import parse_valgrind_output
from pathlib import Path
if __name__ == '__main__':
errors = parse_valgrind_output(Path(sys.argv[1]))
for valgrind_error in errors:
print('ValgrindError: what={}, kind={}'.format(valgrind_error.what.what, valgrind_error.kind))
print('Backtrace: {}'.format(valgrind_error.what.backtrace))
        counter = 0
        for aux in valgrind_error.aux:
            print('Aux {}:'.format(counter))
            print('  What: {}'.format(aux.what))
            print('  Backtrace: {}'.format(aux.backtrace))
            counter += 1

View File

@@ -0,0 +1,60 @@
import argparse
import re
import sys
from pathlib import Path
from test_harness.config import config
from test_harness.summarize import Summary, TraceFiles
from typing import Pattern, List
def files_matching(path: Path, pattern: Pattern, recurse: bool = True) -> List[Path]:
res: List[Path] = []
for file in path.iterdir():
if file.is_file() and pattern.match(file.name) is not None:
res.append(file)
elif file.is_dir() and recurse:
res += files_matching(file, pattern, recurse)
return res
def dirs_with_files_matching(path: Path, pattern: Pattern, recurse: bool = True) -> List[Path]:
res: List[Path] = []
sub_directories: List[Path] = []
has_file = False
for file in path.iterdir():
if file.is_file() and pattern.match(file.name) is not None:
has_file = True
elif file.is_dir() and recurse:
sub_directories.append(file)
if has_file:
res.append(path)
if recurse:
for file in sub_directories:
res += dirs_with_files_matching(file, pattern, recurse=True)
res.sort()
return res
if __name__ == '__main__':
parser = argparse.ArgumentParser('TestHarness Timeout', formatter_class=argparse.ArgumentDefaultsHelpFormatter)
config.build_arguments(parser)
args = parser.parse_args()
config.extract_args(args)
valgrind_files: List[Path] = []
if config.use_valgrind:
valgrind_files = files_matching(Path.cwd(), re.compile(r'valgrind.*\.xml'))
for directory in dirs_with_files_matching(Path.cwd(), re.compile(r'trace.*\.(json|xml)'), recurse=True):
trace_files = TraceFiles(directory)
for files in trace_files.items():
if config.use_valgrind:
for valgrind_file in valgrind_files:
summary = Summary(Path('bin/fdbserver'), was_killed=True)
summary.valgrind_out_file = valgrind_file
summary.summarize_files(files)
summary.out.dump(sys.stdout)
else:
summary = Summary(Path('bin/fdbserver'), was_killed=True)
summary.summarize_files(files)
summary.out.dump(sys.stdout)

View File

@@ -0,0 +1,141 @@
import enum
import xml
import xml.sax.handler
from pathlib import Path
from typing import List
class ValgrindWhat:
def __init__(self):
self.what: str = ''
self.backtrace: str = ''
class ValgrindError:
def __init__(self):
self.what: ValgrindWhat = ValgrindWhat()
self.kind: str = ''
self.aux: List[ValgrindWhat] = []
# noinspection PyArgumentList
class ValgrindParseState(enum.Enum):
ROOT = enum.auto()
ERROR = enum.auto()
ERROR_AUX = enum.auto()
KIND = enum.auto()
WHAT = enum.auto()
TRACE = enum.auto()
AUX_WHAT = enum.auto()
STACK = enum.auto()
STACK_AUX = enum.auto()
STACK_IP = enum.auto()
STACK_IP_AUX = enum.auto()
class ValgrindHandler(xml.sax.handler.ContentHandler):
def __init__(self):
super().__init__()
self.stack: List[ValgrindError] = []
self.result: List[ValgrindError] = []
self.state_stack: List[ValgrindParseState] = []
def state(self) -> ValgrindParseState:
if len(self.state_stack) == 0:
return ValgrindParseState.ROOT
return self.state_stack[-1]
@staticmethod
def from_content(content):
# pdb.set_trace()
if isinstance(content, bytes):
return content.decode()
assert isinstance(content, str)
return content
def characters(self, content):
# pdb.set_trace()
state = self.state()
if len(self.state_stack) == 0:
return
else:
assert len(self.stack) > 0
if state is ValgrindParseState.KIND:
self.stack[-1].kind += self.from_content(content)
elif state is ValgrindParseState.WHAT:
self.stack[-1].what.what += self.from_content(content)
elif state is ValgrindParseState.AUX_WHAT:
self.stack[-1].aux[-1].what += self.from_content(content)
elif state is ValgrindParseState.STACK_IP:
self.stack[-1].what.backtrace += self.from_content(content)
elif state is ValgrindParseState.STACK_IP_AUX:
self.stack[-1].aux[-1].backtrace += self.from_content(content)
def startElement(self, name, attrs):
# pdb.set_trace()
if name == 'error':
self.stack.append(ValgrindError())
self.state_stack.append(ValgrindParseState.ERROR)
if len(self.stack) == 0:
return
if name == 'kind':
self.state_stack.append(ValgrindParseState.KIND)
elif name == 'what':
self.state_stack.append(ValgrindParseState.WHAT)
elif name == 'auxwhat':
assert self.state() in [ValgrindParseState.ERROR, ValgrindParseState.ERROR_AUX]
self.state_stack.pop()
self.state_stack.append(ValgrindParseState.ERROR_AUX)
self.state_stack.append(ValgrindParseState.AUX_WHAT)
self.stack[-1].aux.append(ValgrindWhat())
elif name == 'stack':
state = self.state()
assert state in [ValgrindParseState.ERROR, ValgrindParseState.ERROR_AUX]
if state == ValgrindParseState.ERROR:
self.state_stack.append(ValgrindParseState.STACK)
else:
self.state_stack.append(ValgrindParseState.STACK_AUX)
elif name == 'ip':
state = self.state()
assert state in [ValgrindParseState.STACK, ValgrindParseState.STACK_AUX]
if state == ValgrindParseState.STACK:
self.state_stack.append(ValgrindParseState.STACK_IP)
if len(self.stack[-1].what.backtrace) == 0:
self.stack[-1].what.backtrace = 'addr2line -e fdbserver.debug -p -C -f -i '
else:
self.stack[-1].what.backtrace += ' '
else:
self.state_stack.append(ValgrindParseState.STACK_IP_AUX)
if len(self.stack[-1].aux[-1].backtrace) == 0:
self.stack[-1].aux[-1].backtrace = 'addr2line -e fdbserver.debug -p -C -f -i '
else:
self.stack[-1].aux[-1].backtrace += ' '
def endElement(self, name):
# pdb.set_trace()
if name == 'error':
self.result.append(self.stack.pop())
self.state_stack.pop()
elif name == 'kind':
assert self.state() == ValgrindParseState.KIND
self.state_stack.pop()
elif name == 'what':
assert self.state() == ValgrindParseState.WHAT
self.state_stack.pop()
elif name == 'auxwhat':
assert self.state() == ValgrindParseState.AUX_WHAT
self.state_stack.pop()
elif name == 'stack':
assert self.state() in [ValgrindParseState.STACK, ValgrindParseState.STACK_AUX]
self.state_stack.pop()
elif name == 'ip':
self.state_stack.pop()
state = self.state()
assert state in [ValgrindParseState.STACK, ValgrindParseState.STACK_AUX]
def parse_valgrind_output(valgrind_out_file: Path) -> List[ValgrindError]:
handler = ValgrindHandler()
with valgrind_out_file.open('r') as f:
xml.sax.parse(f, handler)
return handler.result
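# The parser does not symbolize backtraces itself: it concatenates the instruction
# pointers of each <stack> into a ready-to-run 'addr2line -e fdbserver.debug -p -C -f -i'
# command line, so resolving a ValgrindError is a copy-paste away (assuming a matching
# fdbserver.debug file is available in the working directory).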

View File

@@ -0,0 +1,66 @@
from functools import total_ordering
from pathlib import Path
from typing import Tuple
@total_ordering
class Version:
def __init__(self):
self.major: int = 0
self.minor: int = 0
self.patch: int = 0
def version_tuple(self):
return self.major, self.minor, self.patch
def _compare(self, other) -> int:
lhs: Tuple[int, int, int] = self.version_tuple()
rhs: Tuple[int, int, int]
if isinstance(other, Version):
rhs = other.version_tuple()
else:
rhs = Version.parse(str(other)).version_tuple()
if lhs < rhs:
return -1
elif lhs > rhs:
return 1
else:
return 0
def __eq__(self, other) -> bool:
return self._compare(other) == 0
def __lt__(self, other) -> bool:
return self._compare(other) < 0
def __hash__(self):
return hash(self.version_tuple())
def __str__(self):
        return '{}.{}.{}'.format(self.major, self.minor, self.patch)
@staticmethod
def of_binary(binary: Path):
parts = binary.name.split('-')
if len(parts) != 2:
return Version.max_version()
return Version.parse(parts[1])
@staticmethod
def parse(version: str):
version_tuple = version.split('.')
self = Version()
self.major = int(version_tuple[0])
if len(version_tuple) > 1:
self.minor = int(version_tuple[1])
if len(version_tuple) > 2:
self.patch = int(version_tuple[2])
return self
@staticmethod
def max_version():
self = Version()
self.major = 2**32 - 1
self.minor = 2**32 - 1
self.patch = 2**32 - 1
return self
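# Versions compare by (major, minor, patch), and the right-hand side of a comparison may
# be a plain string, which is what allows checks such as Version.of_binary(binary) < "6.1.0"
# in the runner. A couple of illustrative cases:
#
#     Version.parse('7.1.0') > '6.3.24'               # True
#     Version.of_binary(Path('fdbserver-6.3.24'))     # -> 6.3.24
#     Version.of_binary(Path('fdbserver'))            # -> max_version(), i.e. treated as current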

View File

@@ -0,0 +1,431 @@
<form theme="light">
<label>FoundationDB - Details</label>
<description>Details for FoundationDB Cluster</description>
<fieldset submitButton="false">
<input type="text" token="Index" searchWhenChanged="true">
<label>Index</label>
<default>*</default>
</input>
<input type="text" token="LogGroup" searchWhenChanged="true">
<label>LogGroup</label>
<default>*</default>
</input>
<input type="time" token="TimeRange" searchWhenChanged="true">
<label>Time Range</label>
<default>
<earliest>-60m@m</earliest>
<latest>now</latest>
</default>
</input>
<input type="dropdown" token="Span" searchWhenChanged="true">
<label>Timechart Resolution</label>
<choice value="bins=100">Default</choice>
<choice value="span=5s">5 seconds</choice>
<choice value="span=1m">1 minute</choice>
<choice value="span=10m">10 minutes</choice>
<choice value="span=1h">1 hour</choice>
<choice value="span=1d">1 day</choice>
<default>bins=100</default>
<initialValue>bins=100</initialValue>
</input>
<input type="dropdown" token="Roles" searchWhenChanged="true">
<label>Roles</label>
<choice value="">All</choice>
<choice value="Roles=*SS*">Storage Server</choice>
<choice value="Roles=*TL*">Transaction Log</choice>
<choice value="Roles=*MP*">Proxy</choice>
<choice value="Roles=*RV*">Resolver</choice>
<choice value="Roles=*MS*">Master</choice>
<choice value="Roles=*CC*">Cluster Controller</choice>
<choice value="Roles=*LR*">Log Router</choice>
<choice value="Roles=*DD*">Data Distributor</choice>
<choice value="Roles=*RK*">Ratekeeper</choice>
<choice value="Roles=*TS*">Tester</choice>
<default></default>
</input>
<input type="text" token="Host" searchWhenChanged="true">
<label>Host</label>
<default>*</default>
</input>
<input type="text" token="Machine" searchWhenChanged="true">
<label>Machine</label>
<default>*</default>
</input>
</fieldset>
<row>
<panel>
<chart>
<title>Storage Queue Size</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=StorageMetrics $Roles$ host=$Host$ Machine=$Machine$ TrackLatestType=Original | rex field=BytesInput "(?&lt;InputRate&gt;.*) (?&lt;InputRoughness&gt;.*) (?&lt;InputCounter&gt;.*)" | rex field=BytesDurable "(?&lt;DurableRate&gt;.*) (?&lt;DurableRoughness&gt;.*) (?&lt;DurableCounter&gt;.*)" | eval QueueSize=InputCounter-DurableCounter | timechart $Span$ avg(QueueSize) by Machine</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<chart>
<title>Storage Input Rate</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=StorageMetrics $Roles$ host=$Host$ Machine=$Machine$ TrackLatestType=Original | rex field=BytesInput "(?&lt;InputRate&gt;.*) (?&lt;InputRoughness&gt;.*) (?&lt;InputCounter&gt;.*)" | timechart $Span$ avg(InputRate) by Machine</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<chart>
<title>Storage Bytes Queried</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=StorageMetrics $Roles$ host=$Host$ Machine=$Machine$ TrackLatestType=Original | rex field=BytesQueried "(?&lt;Rate&gt;.*) (?&lt;Roughness&gt;.*) (?&lt;Counter&gt;.*)" | timechart $Span$ avg(Rate) by Machine</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<chart>
<title>Average Process CPU by Role (capped at 2; beware kernel bug)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=ProcessMetrics $Roles$ host=$Host$ Machine=$Machine$ TrackLatestType=Original | eval Cpu=CPUSeconds/Elapsed | timechart $Span$ avg(Cpu) by Roles</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.axisY.maximumNumber">2</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<chart>
<title>Max Process CPU by Role (capped at 2; beware kernel bug)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=ProcessMetrics $Roles$ host=$Host$ Machine=$Machine$ TrackLatestType=Original | eval Cpu=CPUSeconds/Elapsed | timechart $Span$ max(Cpu) by Roles</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.axisY.maximumNumber">2</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<chart>
<title>Disk Busyness</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ $Roles$ host=$Host$ Machine=$Machine$ Type=ProcessMetrics TrackLatestType=Original | eval DiskBusyPercentage=(Elapsed-DiskIdleSeconds)/Elapsed | timechart $Span$ avg(DiskBusyPercentage) by Machine</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<chart>
<title>Max Run Loop Busyness by Role (for &lt;=6.1, S2Pri1)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ $Roles$ host=$Host$ Machine=$Machine$ Type=NetworkMetrics NOT TrackLatestType=Rolled | eval Busyness=if(isnull(PriorityStarvedBelow1), if(isnull(PriorityBusy1), S2Pri1, PriorityBusy1/Elapsed), PriorityStarvedBelow1/Elapsed) | timechart $Span$ max(Busyness) by Roles</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<chart>
<title>Max Run Loop Busyness by Priority (6.2+ only)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ $Roles$ host=$Host$ Machine=$Machine$ Type=NetworkMetrics TrackLatestType=Original | foreach PriorityBusy* [eval Busyness&lt;&lt;MATCHSTR&gt;&gt;=PriorityBusy&lt;&lt;MATCHSTR&gt;&gt;/Elapsed] | timechart $Span$ max(Busyness*)</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<chart>
<title>TLog Queue Size</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=TLogMetrics $Roles$ host=$Host$ Machine=$Machine$ TrackLatestType=Original | eval QueueSize=SharedBytesInput-SharedBytesDurable | timechart $Span$ avg(QueueSize) by Machine</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<chart>
<title>Connection Timeouts (counted on both sides of connection)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ (Type=ConnectionTimeout OR Type=ConnectionTimedOut) $Roles$ host=$Host$ | eval WithAddr=if(Type=="ConnectionTimedOut", PeerAddr, WithAddr) | rex field=WithAddr "(?&lt;OtherAddr&gt;[^:]*:[^:]*).*" | eval Machine=Machine+","+OtherAddr | makemv delim="," Machine | search Machine=$Machine$ | eval Count=1+SuppressedEventCount | timechart sum(Count) by Machine useother=f</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.chart.nullValueMode">zero</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<chart>
<title>Pairwise Connection Timeouts Between Datacenters</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ (Type=ConnectionTimeout OR Type=ConnectionTimedOut) host=* Machine=* NOT TrackLatestType=Rolled
| eval WithAddr=if(Type=="ConnectionTimedOut", PeerAddr, WithAddr)
| rex field=host "(?&lt;Datacenter&gt;..).*"
| eval Datacenter=if(isnotnull(pie_work_unit), pie_work_unit, Datacenter)
| rex field=WithAddr "(?&lt;OtherIP&gt;[^:]*):.*"
| join OtherIP
[search index=$Index$ LogGroup=$LogGroup$ Type=ProcessMetrics NOT TrackLatestType=Rolled
| rex field=Machine "(?&lt;OtherIP&gt;[^:]*):.*"
| rex field=host "(?&lt;OtherDatacenter&gt;..).*"
| eval OtherDatacenter=if(isnotnull(pie_work_unit), pie_work_unit, OtherDatacenter)]
| eval DC1=if(Datacenter&gt;OtherDatacenter, Datacenter, OtherDatacenter), DC2=if(Datacenter&gt;OtherDatacenter, OtherDatacenter, Datacenter)
| eval Connection=DC1+" &lt;-&gt; " + DC2
| eval Count=1+SuppressedEventCount
| timechart count by Connection</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<table>
<title>Pairwise Connection Timeouts Between Known Server Processes (Sorted by Count, descending)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ (Type=ConnectionTimeout OR Type=ConnectionTimedOut OR Type=ProcessMetrics) $Roles$ host=$Host$ Machine=$Machine$ NOT TrackLatestType=Rolled | eval WithAddr=if(Type=="ConnectionTimedOut", PeerAddr, WithAddr), Reason=if(Type=="ConnectionTimedOut", "Timed out trying to connect", "Established connection timed out") | rex field=Machine "(?&lt;IP&gt;[^:]*):.*" | rex field=host "(?&lt;Datacenter&gt;..).*" | rex field=WithAddr "(?&lt;OtherIP&gt;[^:]*):.*" | eventstats values(Roles) as Roles by IP | join OtherIP [search index=$Index$ LogGroup=$LogGroup$ Type=ProcessMetrics NOT TrackLatestType=Rolled | rex field=Machine "(?&lt;OtherIP&gt;[^:]*):.*" | rex field=host "(?&lt;OtherDatacenter&gt;..).*" | stats values(Roles) as OtherRoles by OtherIP, OtherDatacenter | eval OtherRoles="("+mvjoin(OtherRoles,",")+")"] | eval Roles="("+mvjoin(Roles,",")+")" | eval IP=Datacenter+": "+IP+" "+Roles, OtherIP=OtherDatacenter+": "+OtherIP+" "+OtherRoles | eval Addr1=if(IP&gt;OtherIP, IP, OtherIP), Addr2=if(IP&gt;OtherIP, OtherIP, IP) | eval Connection=Addr1+" &lt;-&gt; " + Addr2 | eval Count=1+SuppressedEventCount | stats sum(Count) as Count, values(Reason) as Reasons by Connection | sort -Count</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="count">10</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<chart>
<title>Lazy Deletion Rate (making space available for reuse)</title>
<search>
          <query>index=$Index$ LogGroup=$LogGroup$ $Roles$ host=$Host$ Machine=$Machine$ Type=SpringCleaningMetrics | eval Metric=LazyDeletePages | streamstats current=f global=f window=1 first(Metric) as NextMetric, first(Time) as NextTime by ID | eval Rate=4096*(NextMetric-Metric)/(NextTime-Time) | timechart $Span$ avg(Rate) by Machine</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<chart>
<title>Vacuuming Rate (shrinking file)</title>
<search>
          <query>index=$Index$ LogGroup=$LogGroup$ $Roles$ host=$Host$ Machine=$Machine$ Type=SpringCleaningMetrics | eval Metric=VacuumedPages | streamstats current=f global=f window=1 first(Metric) as NextMetric, first(Time) as NextTime by ID | eval Rate=4096*(NextMetric-Metric)/(NextTime-Time) | timechart $Span$ avg(Rate) by Machine</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<chart>
<title>Roles</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ $Roles$ host=$Host$ Machine=$Machine$ NOT TrackLatestType=Rolled | makemv delim="," Roles | mvexpand Roles | timechart $Span$ distinct_count(Machine) by Roles</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.axisY.scale">log</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<table>
<title>Slow Tasks (Sorted by Duration, Descending)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=SlowTask $Roles$ host=$Host$ Machine=$Machine$ | sort -Duration | table _time, Duration, Machine, TaskID, Roles</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
<panel>
<table>
<title>Event Counts (Sorted by Severity and Count, Descending)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ $Roles$ host=$Host$ Machine=$Machine$ NOT TrackLatestType=Rolled | stats count as Count by Type, Severity | sort -Severity, -Count</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="count">10</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
<panel>
<table>
<title>Errors</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Severity=40 $Roles$ host=$Host$ Machine=$Machine$ NOT TrackLatestType=Rolled | table _time, Type, Machine, Roles</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<table>
<title>Recoveries (Ignores Filters)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=MasterRecoveryState TrackLatestType=Original (StatusCode=0 OR StatusCode=11) | eval RecoveryResetInterval=10 | sort _time | streamstats earliest(_time) as RecoveryStart, count as EventCount reset_after="(StatusCode=11)" | where StatusCode=11 | eval EventCount=if(EventCount==1, 2, EventCount), RecoveryStart=if(RecoveryStart==_time, _time-RecoveryDuration, RecoveryStart) | sort -_time | streamstats current=f global=f window=1 first(RecoveryStart) as NextRecoveryStart | eval RecoverySpan=NextRecoveryStart-_time, FailedRecoveries=EventCount-2, SuccessfulRecoveries=1 | eval AvailableSeconds=if(RecoverySpan&lt;RecoveryResetInterval, RecoverySpan, 0) | sort _time | streamstats earliest(RecoveryStart) as RecoveryStart, sum(FailedRecoveries) as FailedRecoveryCount, sum(SuccessfulRecoveries) as SuccessfulRecoveryCount, sum(AvailableSeconds) as AvailableSeconds reset_after="(NOT RecoverySpan &lt; RecoveryResetInterval)" | where NOT RecoverySpan &lt; RecoveryResetInterval | eval Duration=_time-RecoveryStart, StartTime=strftime(RecoveryStart, "%F %X.%Q"), ShortLivedRecoveryCount=SuccessfulRecoveryCount-1 | table StartTime, Duration, FailedRecoveryCount, ShortLivedRecoveryCount, AvailableSeconds | sort -StartTime</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="count">10</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
<panel>
<table>
<title>Process (Re)starts</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=ProgramStart TrackLatestType=Original $Roles$ host=$Host$ Machine=$Machine$ | table _time, Machine | sort -_time</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="count">10</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
<panel>
<chart>
<title>Failure Detection (Machine Filter Only)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=FailureDetectionStatus System=$Machine$ | sort _time | eval Failed=if(Status=="Failed", 1, 0) | streamstats current=t global=f window=2 first(Failed) as PrevFailed by System | where PrevFailed=1 OR Failed=1 | eval Failed=PrevFailed + "," + Failed | makemv delim="," Failed | mvexpand Failed | timechart $Span$ max(Failed) by System</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.axisY.maximumNumber">1</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<table>
<title>Storage Server Space Usage (Sorted by Available Space Percentage, Ascending)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=StorageMetrics $Roles$ host=$Host$ Machine=$Machine$ TrackLatestType=Original | eval AvailableSpacePercent=KvstoreBytesAvailable/KvstoreBytesTotal, FreeSpacePercent=KvstoreBytesFree/KvstoreBytesTotal, GBUsed=KvstoreBytesUsed/1e9, GBStored=BytesStored/1e9, Overhead=KvstoreBytesUsed/BytesStored, GBTotalSpace=KvstoreBytesTotal/1e9 | stats latest(AvailableSpacePercent) as AvailableSpacePercent, latest(FreeSpacePercent) as FreeSpacePercent, latest(GBStored) as GBStored, latest(GBUsed) as GBUsed, latest(Overhead) as OverheadFactor, latest(GBTotalSpace) as GBTotalSpace by Machine | sort AvailableSpacePercent</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="count">10</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<table>
<title>TLog Server Space Usage (Sorted by Available Space Percentage, Ascending)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=TLogMetrics host=* Machine=* TrackLatestType=Original Roles=TL | eval AvailableSpacePercent=KvstoreBytesAvailable/KvstoreBytesTotal, FreeDiskSpacePercent=KvstoreBytesFree/KvstoreBytesTotal, GBUsed=KvstoreBytesUsed/1e9, GBTotalSpace=KvstoreBytesTotal/1e9 | stats latest(AvailableSpacePercent) as AvailableSpacePercent, latest(FreeDiskSpacePercent) as FreeDiskSpacePercent, latest(GBUsed) as GBUsed, latest(GBTotalSpace) as GBTotalSpace by Machine | sort AvailableSpacePercent</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="count">10</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<chart>
<title>Data Movement by Type (Log Scale, Ignores Filters)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=MovingData TrackLatestType=Original | timechart avg(Priority*) as *</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.axisY.scale">log</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<chart>
<title>Storage Server Max Bytes Stored by Host</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=StorageMetrics $Roles$ host=$Host$ Machine=$Machine$ TrackLatestType=Original | eval GBStored=BytesStored/1e9 | timechart max(GBStored) by host limit=100</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<table>
<title>Master Failed Clients</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=WaitFailureClient
| stats count by FailedEndpoint</query>
<earliest>$TimeRange.earliest$</earliest>
<latest>$TimeRange.latest$</latest>
</search>
<option name="drilldown">none</option>
</table>
</panel>
</row>
</form>

View File

@@ -0,0 +1,323 @@
<form theme="dark">
<label>FoundationDB - Performance Overview (Dev WiP)</label>
<fieldset submitButton="false" autoRun="true">
<input type="text" token="Index" searchWhenChanged="true">
<label>Index</label>
<default>*</default>
</input>
<input type="text" token="LogGroup" searchWhenChanged="true">
<label>LogGroup</label>
<default></default>
</input>
<input type="time" token="TimeSpan" searchWhenChanged="true">
<label>TimeSpan</label>
<default>
<earliest>-60m@m</earliest>
<latest>now</latest>
</default>
</input>
<input type="dropdown" token="UpdateRateTypeToken" searchWhenChanged="true">
<label>RK: Normal or Batch Txn</label>
<choice value="">Normal</choice>
<choice value="Batch">Batch</choice>
<default></default>
</input>
<input type="text" token="ChartBinSizeToken" searchWhenChanged="true">
<label>Chart Bin Size</label>
<default>60s</default>
</input>
</fieldset>
<row>
<panel>
<title>Transaction Rate measured on Proxies</title>
<chart>
<title>Sum per $ChartBinSizeToken$ bin</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ host=* Machine=* (Type="ProxyMetrics" OR Type="GrvProxyMetrics") AND TrackLatestType="Original"
| makemv delim=" " TxnRequestIn | makemv delim=" " TxnRequestOut | makemv delim=" " TxnStartIn | makemv delim=" " TxnStartOut | makemv delim=" " TxnThrottled
| eval TxnRequestInRate=mvindex(TxnRequestIn, 0), TxnRequestOutRate=mvindex(TxnRequestOut, 0), TxnStartInRate=mvindex(TxnStartIn, 0), TxnStartOutRate=mvindex(TxnStartOut, 0), TxnThrottledRate=mvindex(TxnThrottled, 0)
| timechart span=$ChartBinSizeToken$ sum(TxnRequestInRate) as StartedTxnBatchRate, sum(TxnRequestOutRate) as FinishedTxnBatchRate, sum(TxnStartInRate) as StartedTxnRate, sum(TxnStartOutRate) as FinishedTxnRate, sum(TxnThrottledRate) as ThrottledTxnRate</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Read Rate measured on Storage Servers</title>
<chart>
<title>Average per $ChartBinSizeToken$ bin</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=StorageMetrics TrackLatestType="Original"
| rex field=BytesQueried "(?&lt;RRate&gt;.*) (?&lt;RRoughness&gt;.*) (?&lt;RCounter&gt;.*)"
| rex field=RowsQueried "(?&lt;KRate&gt;.*) (?&lt;KRoughness&gt;.*) (?&lt;KCounter&gt;.*)"
| rex field=BytesInput "(?&lt;WRate&gt;.*) (?&lt;WRoughness&gt;.*) (?&lt;WCounter&gt;.*)"
| rex field=BytesFetched "(?&lt;FRate&gt;.*) (?&lt;FRoughness&gt;.*) (?&lt;FCounter&gt;.*)"
| timechart span=$ChartBinSizeToken$ avg(RRate) as BytesReadPerSecond, avg(KRate) as RowsReadPerSecond, avg(FRate) as DDReadPerSecond</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
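<!-- Note on the query above: FDB trace-event counters such as BytesQueried are typically emitted as a space-separated "rate roughness total" triple; the rex captures the first element (the per-second rate), so the chart plots rates rather than cumulative totals. -->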
<option name="charting.axisY.scale">linear</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Write Rate measured on Proxies</title>
<chart>
<title>Average per $ChartBinSizeToken$ bin</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ host=* Machine=* (Type="ProxyMetrics" OR Type="GrvProxyMetrics") AND TrackLatestType="Original"
| makemv delim=" " MutationBytes
| makemv delim=" " Mutations
| eval MutationBytesRate=mvindex(MutationBytes, 0), MutationsRate=mvindex(Mutations,0)
| bucket span=5s _time
| stats sum(MutationBytesRate) as MutationBytes, sum(MutationsRate) as Mutations by _time
|eval MutationMB=MutationBytes/1024/1024, MutationsK=Mutations/1000
| timechart span=$ChartBinSizeToken$ avg(MutationMB) as MutationMB, avg(MutationsK) as MutationsK</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
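<!-- Note on the query above: MutationBytes and Mutations are counter triples ("rate roughness total"); mvindex(..., 0) selects the per-second rate, which is summed across proxies in 5s buckets and then averaged per chart bin. -->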
<option name="charting.axisY.abbreviation">none</option>
<option name="charting.axisY.scale">linear</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.layout.splitSeries">0</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Write Rate measured on Storage Servers</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=StorageMetrics TrackLatestType="Original"
| rex field=BytesInput "(?&lt;WRate&gt;.*) (?&lt;WRoughness&gt;.*) (?&lt;WCounter&gt;.*)"
| rex field=BytesFetched "(?&lt;FRate&gt;.*) (?&lt;FRoughness&gt;.*) (?&lt;FCounter&gt;.*)"
| timechart span=$ChartBinSizeToken$ avg(WRate) as BytesPerSecond, avg(FRate) as DDBytesWrittenPerSecond</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>GRV Latency measured on all Proxies</title>
<chart>
<title>Seconds</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=GRVLatencyMetrics AND TrackLatestType="Original"
| timechart span=$ChartBinSizeToken$ avg(Max) as maxLatency, avg(Mean) as meanLatency, avg(P99) as P99Latency, avg(P99.9) as P999Latency, avg(P95) as P95Latency</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Commit Latency measured on all Proxies</title>
<chart>
<title>Seconds</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=CommitLatencyMetrics AND TrackLatestType="Original"
| timechart span=$ChartBinSizeToken$ avg(Max) as maxLatency, avg(Mean) as meanLatency, avg(P99) as P99Latency, avg(P99.9) as P999Latency, avg(P95) as P95Latency</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Read Latency measured on all Storage Servers</title>
<chart>
<title>Seconds</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=ReadLatencyMetrics AND TrackLatestType="Original"
| timechart span=$ChartBinSizeToken$ avg(Max) as maxLatency, avg(Mean) as meanLatency, avg(P99) as P99Latency, avg(P99.9) as P999Latency, avg(P95) as P95Latency</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>RateKeeper: ReleasedTPS vs TPSLimit</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=RkUpdate$UpdateRateTypeToken$ AND TrackLatestType="Original"
| replace inf with 100000000000
| eval _time=Time
| table _time ReleasedTPS TPSLimit
| timechart span=$ChartBinSizeToken$ avg(ReleasedTPS) avg(TPSLimit)</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
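<!-- Note on the query above: ReleasedTPS/TPSLimit may appear as the string "inf"; replacing it with a large numeric sentinel (1e11) keeps the series plottable on the log-scale Y axis. -->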
<option name="charting.axisY.scale">log</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="height">251</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>RateKeeper: Throttling Reason</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=RkUpdate$UpdateRateTypeToken$ AND TrackLatestType="Original"
| replace inf with 100000000000
| eval _time=Time
| table _time Reason</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.axisLabelsY.majorUnit">1</option>
<option name="charting.axisY.abbreviation">none</option>
<option name="charting.axisY.scale">linear</option>
<option name="charting.chart">area</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.mode">standard</option>
<option name="height">249</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>RateKeeper: Throttling Server</title>
<table>
<title>Ratekeeper: Limit Reason: ReasonServerID (Most recent 10 records)</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=RkUpdate AND TrackLatestType="Original"
| streamstats count as numOfEvents
| where numOfEvents &lt;= 10
| eval DateTime=strftime(Time, "%Y-%m-%dT%H:%M:%S")
| table DateTime, ReasonServerID</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Disk Overhead = Disk Usage / Logical KV Size</title>
<chart>
<title>Y-axis is capped at 10</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ host=* Machine=* (Type=StorageMetrics OR Type=DDTrackerStats) TrackLatestType=Original
| bucket _time span=5s
| stats sum(KvstoreBytesUsed) as StorageDiskUsedBytes, sum(KvstoreBytesTotal) as StorageDiskTotalBytes, avg(TotalSizeBytes) as LogicalKVBytes by _time
| eval overhead=StorageDiskUsedBytes/LogicalKVBytes
| timechart avg(overhead)</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
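<!-- Note on the query above: overhead is the summed storage-engine bytes used (KvstoreBytesUsed across storage servers) divided by the logical key-value size reported by data distribution (TotalSizeBytes); values above 1 reflect replication plus storage-engine overhead (interpretation; only the ratio itself comes from the query). -->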
<option name="charting.axisY.maximumNumber">10</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
</chart>
</panel>
<panel>
<title>KV Data Size</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Roles=*DD* host=* Machine=* Type=DDTrackerStats TrackLatestType=Original
| eval TotalKVGB=TotalSizeBytes/1024/1024/1024, SystemKVGB=SystemSizeBytes/1024/1024/1024
|timechart avg(TotalKVGB), avg(SystemKVGB), avg(Shards)</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Disk Usage</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ host=* Machine=* Type=StorageMetrics TrackLatestType=Original
| bucket _time span=5s
| stats sum(KvstoreBytesUsed) as StorageDiskUsedBytes, sum(KvstoreBytesTotal) as StorageDiskTotalBytes by _time
|eval StorageDiskTotalMB = StorageDiskTotalBytes/1024/1024, StorageDiskUsedMB=StorageDiskUsedBytes/1024/1024
| timechart avg(StorageDiskTotalMB) as StorageDiskTotalMB, avg(StorageDiskUsedMB) as StorageDiskUsedMB</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="charting.legend.placement">bottom</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Cluster Roles</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=ProcessMetrics TrackLatestType="Original"
| rex field=host "(?&lt;HostDC&gt;..).*-..(?&lt;HostConfig&gt;..).*"
| eval HostDC=if(isnotnull(pie_work_unit), pie_work_unit, HostDC)
| makemv delim="," Roles
| stats dc(Machine) as MachineCount by Roles, HostDC
| stats list(HostDC), list(MachineCount) by Roles
| sort Roles</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="drilldown">none</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Storage Engine</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=Role Origination=Recruited As=StorageServer | table StorageEngine, OriginalDateTime, DateTime |head 2</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
<panel>
<title>Cluster Generations</title>
<chart>
<title>Generation increases indicate FDB recoveries</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=TLogMetrics |timechart max(Generation)</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
</form>

View File

@@ -0,0 +1,928 @@
<form theme="dark">
<label>FoundationDB - RateKeeper (Dev)</label>
<fieldset submitButton="false">
<input type="text" token="Index" searchWhenChanged="true">
<label>Index</label>
<default>*</default>
</input>
<input type="text" token="LogGroup" searchWhenChanged="true">
<label>LogGroup</label>
<default></default>
</input>
<input type="time" token="TimeSpan" searchWhenChanged="true">
<label>TimeSpan</label>
<default>
<earliest>-60m@m</earliest>
<latest>now</latest>
</default>
</input>
<input type="dropdown" token="UpdateRateTypeToken" searchWhenChanged="true">
<label>RKChart: Normal or Batch</label>
<choice value="">Normal</choice>
<choice value="Batch">Batch</choice>
<default></default>
</input>
<input type="text" token="ChartBinSizeToken" searchWhenChanged="true">
<label>Chart Bin Size</label>
<default>30s</default>
</input>
<input type="dropdown" token="ChartByMachineToken" searchWhenChanged="true">
<label>ClusterStateMetric byMachine</label>
<choice value="by Machine">Yes</choice>
<choice value="">No</choice>
<default></default>
</input>
<input type="dropdown" token="RolePerformanceChartToken" searchWhenChanged="true">
<label>Role for Proc Perf Charts</label>
<choice value="MasterServer">MasterServer</choice>
<choice value="MasterProxyServer">MasterProxyServer</choice>
<choice value="StorageServer">StorageServer</choice>
<choice value="TLog">TLog</choice>
<choice value="Resolver">Resolver</choice>
<choice value="GrvProxyServer">GrvProxyServer</choice>
<choice value="CommitProxyServer">CommitProxyServer</choice>
</input>
<input type="dropdown" token="SourcePerfConnectionToken" searchWhenChanged="true">
<label>Source for Perf Connection</label>
<choice value="MasterServer">MasterServer</choice>
<choice value="MasterProxyServer">MasterProxyServer</choice>
<choice value="Resolver">Resolver</choice>
<choice value="TLog">TLog</choice>
<choice value="StorageServer">StorageServer</choice>
<choice value="GrvProxyServer">GrvProxyServer</choice>
<choice value="CommitProxyServer">CommitProxyServer</choice>
</input>
<input type="dropdown" token="DestinationPerfConnectionToken" searchWhenChanged="true">
<label>Dest for Perf Connection</label>
<choice value="MasterServer">MasterServer</choice>
<choice value="MasterProxyServer">MasterProxyServer</choice>
<choice value="Resolver">Resolver</choice>
<choice value="TLog">TLog</choice>
<choice value="StorageServer">StorageServer</choice>
<choice value="GrvProxyServer">GrvProxyServer</choice>
<choice value="CommitProxyServer">CommitProxyServer</choice>
</input>
</fieldset>
<row>
<panel>
<title>Aggregated Storage Server Bandwidth</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=StorageMetrics TrackLatestType="Original"
| rex field=BytesQueried "(?&lt;RRate&gt;.*) (?&lt;RRoughness&gt;.*) (?&lt;RCounter&gt;.*)"
| rex field=BytesInput "(?&lt;WRate&gt;.*) (?&lt;WRoughness&gt;.*) (?&lt;WCounter&gt;.*)"
| rex field=BytesFetched "(?&lt;FRate&gt;.*) (?&lt;FRoughness&gt;.*) (?&lt;FCounter&gt;.*)"
| bin span=5s _time
| stats sum(RRate) as ReadSum, sum(WRate) as WriteSum, sum(FRate) as FetchedKeyRate by _time
| eval ReadSpeedMB=ReadSum/1024/1024, WriteSpeedMB=WriteSum/1024/1024, FetchedKeyRateMB=FetchedKeyRate/1024/1024
|timechart avg(ReadSpeedMB), avg(WriteSpeedMB), avg(FetchedKeyRateMB)</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
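<!-- Note on the query above: per-storage-server rates are summed in 5s bins to approximate cluster-wide read/write/fetch bandwidth, converted to MB (bytes/1024/1024), and then averaged per chart point. -->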
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Aggregated Proxy Bandwidth</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ (Type="ProxyMetrics" OR Type="GrvProxyMetrics") AND TrackLatestType="Original"
| makemv delim=" " TxnRequestIn | makemv delim=" " TxnRequestOut | makemv delim=" " TxnStartIn | makemv delim=" " TxnStartOut | makemv delim=" " MutationBytes
| eval TxnRequestInRate=mvindex(TxnRequestIn, 0), TxnRequestOutRate=mvindex(TxnRequestOut, 0), TxnStartInRate=mvindex(TxnStartIn, 0), TxnStartOutRate=mvindex(TxnStartOut, 0), MutationBytesRate=mvindex(MutationBytes, 0)
| bin span=60s _time
| stats avg(TxnRequestInRate) as TxnRequestInRatePerHost, avg(TxnRequestOutRate) as TxnRequestOutRatePerHost, avg(TxnStartInRate) as TxnStartInRatePerHost, avg(TxnStartOutRate) as TxnStartOutRatePerHost, avg(MutationBytesRate) as MutationBytesRatePerHost by Machine,_time
| eval WriteThroughputKB=MutationBytesRatePerHost/1000
| timechart span=1m sum(TxnRequestInRatePerHost), sum(TxnRequestOutRatePerHost), sum(TxnStartInRatePerHost), sum(TxnStartOutRatePerHost), sum(WriteThroughputKB)</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 1: Overview - GRV Arrivals and Leaves per Second Seen by Proxies</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ (Type="ProxyMetrics" OR Type="GrvProxyMetrics") AND TrackLatestType="Original"
| eval TxnRequestIn=mvindex(TxnRequestIn, 0), TxnRequestOut=mvindex(TxnRequestOut, 0), TxnStartIn=mvindex(TxnStartIn, 0), TxnStartOut=mvindex(TxnStartOut, 0)
| timechart span=30s avg(TxnRequestIn) avg(TxnRequestOut) avg(TxnStartIn) avg(TxnStartOut) by Machine</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.axisY.scale">log</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="height">249</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 2: RKOverview - Input ReleasedTPS and Output TPSLimit</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=RkUpdate$UpdateRateTypeToken$ AND TrackLatestType="Original"
| replace inf with 100000000000
| eval _time=Time
| table _time ReleasedTPS TPSLimit
| timechart span=$ChartBinSizeToken$ avg(ReleasedTPS) avg(TPSLimit)</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.axisY.scale">log</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="height">251</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 3: RKOverview - RKLimitReason</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=RkUpdate$UpdateRateTypeToken$ AND TrackLatestType="Original"
| replace inf with 100000000000
| eval _time=Time
| table _time Reason</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.axisLabelsY.majorUnit">1</option>
<option name="charting.axisY.abbreviation">none</option>
<option name="charting.axisY.scale">linear</option>
<option name="charting.chart">area</option>
<option name="charting.drilldown">none</option>
<option name="height">249</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 4: Transactions Not Processed - RkSSListFetchTimeout (TpsLimit = 0)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="RkSSListFetchTimeout"
| timechart span=1s count</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 5: Transactions Not Processed - RkTlogMinFreeSpaceZero (TpsLimit = 0)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="RkTlogMinFreeSpaceZero"
| timechart span=1s count</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 6: Transactions Not Processed - ProxyGRVThresholdExceeded</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ (Type="ProxyGRVThresholdExceeded*") AND TrackLatestType="Original"
| timechart span=1s count by Type</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 7: RKLimitReasonCandidate - LimitingStorageServerDurabilityLag (MVCCVersionInMemory)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=RkUpdate$UpdateRateTypeToken$ AND TrackLatestType="Original"
| replace inf with 100000000000
| timechart span=$ChartBinSizeToken$ avg(LimitingStorageServerDurabilityLag)</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 8: RKLimitReasonCandidate - LimitingStorageServerVersionLag (TLogVer-SSVer)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=RkUpdate$UpdateRateTypeToken$ AND TrackLatestType="Original"
| replace inf with 100000000000
| timechart span=$ChartBinSizeToken$ avg(LimitingStorageServerVersionLag)</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 9: RKLimitReasonCandidate - LimitingStorageServerQueue</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=RkUpdate$UpdateRateTypeToken$ AND TrackLatestType="Original"
| replace inf with 100000000000
| timechart span=$ChartBinSizeToken$ avg(LimitingStorageServerQueue)</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 10: Runtime Monitoring - StorageServer MVCCVersionInMemory (storage_server_durability_lag)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type="StorageMetrics" AND TrackLatestType="Original"
| eval NonDurableVersions=Version-DurableVersion
| timechart span=$ChartBinSizeToken$ limit=0 avg(NonDurableVersions) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
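<!-- Note on the query above: NonDurableVersions = Version - DurableVersion approximates the versions a storage server still holds only in memory; sustained growth corresponds to the storage_server_durability_lag limit reason named in the title. -->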
<option name="charting.axisY.scale">linear</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="height">251</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 11: Runtime Monitoring - StorageServer LocalRate (higher MVCCVersionInMemory -&gt; lower LocalRate)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type="StorageMetrics"
| timechart limit=0 avg(LocalRate) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 12: Runtime Monitoring - StorageServer ReadsRejected (lower LocalRate -&gt; higher probability of rejecting reads)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type="StorageMetrics"
| timechart limit=0 avg(ReadsRejected) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 13: Runtime Monitoring - Version Lag between StorageServer and Tlog (storage_server_readable_behind)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type="StorageMetrics" AND TrackLatestType="Original"
| eval SSFallBehindVersions=VersionLag
| timechart span=$ChartBinSizeToken$ limit=0 avg(SSFallBehindVersions) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.axisY.scale">linear</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 14: Runtime Monitoring - StorageServerBytes (storage_server_write_queue_size)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type="StorageMetrics" AND TrackLatestType="Original"
| makemv delim=" " BytesInput | makemv delim=" " BytesDurable | makemv delim=" " BytesFetched | makemv delim=" " MutationBytes
| eval BytesInput=mvindex(BytesInput, 2), BytesDurable=mvindex(BytesDurable, 2), BytesFetched=mvindex(BytesFetched, 2), MutationBytes=mvindex(MutationBytes, 2), BytesInMemoryQueue=BytesInput-BytesDurable
| timechart span=$ChartBinSizeToken$ limit=0 avg(BytesInMemoryQueue) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
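<!-- Note on the query above: mvindex(..., 2) selects the cumulative total from each "rate roughness total" counter triple, so BytesInMemoryQueue is total bytes received minus total bytes made durable, i.e. the current in-memory write queue. -->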
<option name="charting.axisY.scale">linear</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 15: Runtime Monitoring - StorageServer KVStore Free Space Ratio (storage_server_min_free_space)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type="StorageMetrics" AND TrackLatestType="Original"
| eval KvstoreBytesFreeRatio=KvstoreBytesFree/KvstoreBytesTotal
| timechart span=$ChartBinSizeToken$ limit=0 avg(KvstoreBytesFreeRatio) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 16: Runtime Monitoring - TLog Queue Free Space Ratio (log_server_min_free_space)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type="TLogMetrics" AND TrackLatestType="Original"
| eval QueueBytesFreeRatio=QueueDiskBytesFree/QueueDiskBytesTotal
| timechart span=$ChartBinSizeToken$ limit=0 avg(QueueBytesFreeRatio) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 17: Runtime Monitoring - TLog KVStore Free Space Ratio (log_server_min_free_space)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type="TLogMetrics" AND TrackLatestType="Original"
| eval KvstoreBytesFreeRatio=KvstoreBytesFree/KvstoreBytesTotal
| timechart span=$ChartBinSizeToken$ limit=0 avg(KvstoreBytesFreeRatio) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 18: Runtime Monitoring - TLogBytes (log_server_write_queue)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type="TLogMetrics" AND TrackLatestType="Original"
| makemv delim=" " BytesInput
| makemv delim=" " BytesDurable
| eval BytesInput=mvindex(BytesInput, 2), BytesDurable=mvindex(BytesDurable, 2), BytesInMemoryQueue=BytesInput-BytesDurable | timechart span=$ChartBinSizeToken$ limit=0 avg(BytesInMemoryQueue) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.axisY.scale">log</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 19: Runtime Monitoring - Proxy Throughput</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ (Type="ProxyMetrics" OR Type="GrvProxyMetrics") AND TrackLatestType="Original"
| timechart span=$ChartBinSizeToken$ limit=0 avg(TxnRequestIn) avg(TxnRequestOut) avg(TxnStartIn) avg(TxnStartOut) avg(TxnStartBatch) avg(TxnStartErrors) avg(TxnCommitIn) avg(TxnCommitVersionAssigned) avg(TxnCommitResolving) avg(TxnCommitResolved) avg(TxnCommitOut) avg(TxnCommitOutSuccess) avg(TxnCommitErrors) avg(TxnThrottled) avg(TxnConflicts) avg(CommitBatchIn) avg(CommitBatchOut) avg(TxnRejectedForQueuedTooLong) avg(Mutations) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.axisY.scale">log</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 20: Runtime Monitoring - Proxy Queue Length</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ (Type="ProxyMetrics" OR Type="GrvProxyMetrics") AND TrackLatestType="Original" | timechart span=$ChartBinSizeToken$ limit=0 avg(*QueueSize*) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 21: Runtime Monitoring - TLog UnpoppedVersion</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type="TLogMetrics" AND TrackLatestType="Original"
| eval UnpoppedVersion=PersistentDataDurableVersion-QueuePoppedVersion
| timechart span=$ChartBinSizeToken$ limit=0 avg(UnpoppedVersion) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.axisY.scale">log</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 22: Runtime Monitoring - Storage Server Disk (AIODiskStall)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type="ProcessMetrics"
| join Machine
[ search index=$Index$ LogGroup=$LogGroup$ Type="Role" AND As="StorageServer"
| stats first(Machine) by Machine
| rename first(Machine) as Machine
| table Machine]
| timechart span=$ChartBinSizeToken$ limit=0 avg(AIODiskStall) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
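<!-- Note on the query above: the join subsearch keeps only machines that have reported a StorageServer Role event, so AIODiskStall is charted for storage hosts only. -->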
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 23: Runtime Monitoring - StorageServer Query Queue Length</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type="StorageMetrics" AND TrackLatestType="Original"
| makemv QueryQueue | eval QueryQueue=mvindex(QueryQueue, 1) | table _time QueryQueue Machine
| timechart span=$ChartBinSizeToken$ limit=0 avg(QueryQueue) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.axisY.scale">log</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 24: Transaction Trace Stats - GRV Latency (shows CC transactions by default; client transactions appear only when client transaction tracing is enabled manually)</title>
<input type="dropdown" token="GRVByMachineStatsToken" searchWhenChanged="true">
<label>By Machine</label>
<choice value="Machine">Yes</choice>
<choice value="">No</choice>
<default></default>
</input>
<input type="text" token="StatsGRVSpanToken" searchWhenChanged="true">
<label>Span</label>
<default>500ms</default>
</input>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="TransactionDebug" AND (*ProxyServer.masterProxyServerCore.Broadcast OR *ProxyServer.getLiveCommittedVersion.confirmEpochLive OR *ProxyServer.getLiveCommittedVersion.After)
| table Time Type ID Location Machine Roles
| append
[ search index=$Index$ LogGroup=$LogGroup$ Type="TransactionDebug" AND (*ProxyServer.queueTransactionStartRequests.Before)
| rename ID as ParentID
| table Time Type ParentID Location Machine Roles
| join ParentID
[ search index=$Index$ LogGroup=$LogGroup$ Type="TransactionAttachID"
| rename ID as ParentID
| rename To as ID
| table ParentID ID]
| table Time Type ID Location Machine Roles]
| table Time Type ID Location Machine Roles
| sort 0 Time
| table Machine Location Time Roles Type ID
| stats list(*) by ID
| rename list(*) as *
| eval TBegin=mvindex(Time, 0), TEnd=mvindex(Time, -1), TimeSpan=TEnd-TBegin, _time=TBegin
| bin bins=20 span=$StatsGRVSpanToken$ TimeSpan
| chart limit=0 count by TimeSpan $GRVByMachineStatsToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
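<!-- Note on the query above (reading of the search): GRV debug events are stitched together per request by joining the queueTransactionStartRequests events to their broadcast/confirmEpochLive events via TransactionAttachID (parent ID to child ID); the latency bucketed below is the last event time minus the first event time for each ID. -->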
<option name="charting.axisY.scale">log</option>
<option name="charting.chart">column</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 25: Transaction Trace Stats - GetValue Latency (shows CC transactions by default; client transactions appear only when client transaction tracing is enabled manually)</title>
<input type="dropdown" token="GetValueByMachineStatsToken" searchWhenChanged="true">
<label>By Machine</label>
<choice value="Machine">Yes</choice>
<choice value="">No</choice>
<default></default>
</input>
<input type="text" token="StatsReadSpanToken" searchWhenChanged="true">
<label>Span</label>
<default>500ms</default>
</input>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(storageServer.received OR getValueQ.DoRead OR getValueQ.AfterVersion OR Reader.Before OR Reader.After OR getValueQ.AfterRead OR NativeAPI.getKeyLocation.Before OR NativeAPI.getKeyLocation.After)
| table Machine Location Time Roles ID Type
| eval Order=case(Location=="NativeAPI.getKeyLocation.Before", 0, Location=="NativeAPI.getKeyLocation.After", 1, Location=="NativeAPI.getValue.Before", 2, Location=="storageServer.received", 3, Location=="getValueQ.DoRead", 4, Location=="getValueQ.AfterVersion", 5, Location=="Reader.Before", 6, Location=="Reader.After", 7, Location=="getValueQ.AfterRead", 8, Location=="NativeAPI.getValue.After", 9, Location=="NativeAPI.getValue.Error", 10)
| sort 0 Time Order
| stats list(*) by ID
| rename list(*) as *
| table Machine Location Time Roles ID Type
| eval count = mvcount(Location)
| search count&gt;2
| eval TEnd=mvindex(Time, -1), TBegin=mvindex(Time, 0), TimeSpan=TEnd-TBegin, _time=TBegin
| table _time ID TimeSpan Machine Location Time
| bin bins=20 span=$StatsReadSpanToken$ TimeSpan
| chart limit=0 count by TimeSpan $GetValueByMachineStatsToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.axisY.scale">log</option>
<option name="charting.chart">column</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 26: Transaction Trace Stats - Commit Latency (shows CC transactions by default; client transactions appear only when client transaction tracing is enabled manually)</title>
<input type="dropdown" token="CommitByMachineStatsToken">
<label>By Machine</label>
<choice value="Machine">Yes</choice>
<choice value="">No</choice>
<default>Machine</default>
</input>
<input type="text" token="StatsCommitSpanToken" searchWhenChanged="true">
<label>Span</label>
<default>500ms</default>
</input>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="CommitDebug" AND (*ProxyServer.commitBatch.Before OR *ProxyServer.commitBatch.GettingCommitVersion OR *ProxyServer.commitBatch.GotCommitVersion OR *ProxyServer.commitBatch.ProcessingMutations OR *ProxyServer.commitBatch.AfterStoreCommits OR *ProxyServer.commitBatch.AfterLogPush OR *ProxyServer.commitBatch.AfterResolution)
| table Time Type ID Location Machine Roles
| sort 0 Time
| table Machine Location Time Roles Type ID
| stats list(*) by ID
| rename list(*) as *
| eval Count=mvcount(Location)
| search Count&gt;=2
| eval TBegin=mvindex(Time, 0), TEnd=mvindex(Time, -1), TimeSpan=TEnd-TBegin, _time=TBegin
| table _time TimeSpan Machine
| bin bins=20 span=$StatsCommitSpanToken$ TimeSpan
| chart limit=0 count by TimeSpan $CommitByMachineStatsToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.axisY.scale">log</option>
<option name="charting.chart">column</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 27: Transaction Tracing - GRV Latency (shows CC transactions by default; client transactions appear only when client transaction tracing is enabled manually)</title>
<input type="dropdown" token="GRVLatencyByMachineToken" searchWhenChanged="true">
<label>By Machine</label>
<choice value="by Machine">Yes</choice>
<choice value="">No</choice>
<default></default>
</input>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="TransactionDebug" AND (*ProxyServer.*ProxyServerCore.Broadcast OR *ProxyServer.getLiveCommittedVersion.confirmEpochLive OR *ProxyServer.getLiveCommittedVersion.After)
| table Time Type ID Location Machine Roles
| append
[ search index=$Index$ LogGroup=$LogGroup$ Type="TransactionDebug" AND (*ProxyServer.queueTransactionStartRequests.Before)
| rename ID as ParentID
| table Time Type ParentID Location Machine Roles
| join ParentID
[ search index=$Index$ LogGroup=$LogGroup$ Type="TransactionAttachID"
| rename ID as ParentID
| rename To as ID
| table ParentID ID]
| table Time Type ID Location Machine Roles]
| table Time Type ID Location Machine Roles
| eval Order = case(Location=="NativeAPI.getConsistentReadVersion.Before", 0, Location like "%ProxyServer.queueTransactionStartRequests.Before", 1, Location="MasterProxyServer.masterProxyServerCore.Broadcast", 2, Location like "%ProxyServer.getLiveCommittedVersion.confirmEpochLive", 3, Location like "%ProxyServer.getLiveCommittedVersion.After", 5, Location=="NativeAPI.getConsistentReadVersion.After", 6)
| table Time Order Type ID Location Machine Roles
| sort 0 Order Time
| table Machine Location Time Roles Type ID
| stats list(*) by ID
| rename list(*) as *
| eval T1=mvindex(Time, 0), T2=mvindex(Time, 1), T3=mvindex(Time, 2), T4=mvindex(Time, 3), TimeInQueue = T2-T1, TimeGetVersionFromProxies = if(mvcount(Time)==4, T3-T2, -0.0000001), TimeConfirmLivenessFromTLogs = if(mvcount(Time)==4, T4-T3, T3-T2), TimeSpan=if(mvcount(Time)==4,T4-T1,T3-T1), _time=T1
| table _time TimeSpan TimeInQueue TimeGetVersionFromProxies TimeConfirmLivenessFromTLogs Machine
| timechart span=$ChartBinSizeToken$ limit=0 avg(TimeSpan), avg(TimeInQueue), avg(TimeGetVersionFromProxies), avg(TimeConfirmLivenessFromTLogs) $GRVLatencyByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
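<!-- Note on the query above (reading of the Order eval): with all four events present, T1..T4 correspond to queueTransactionStartRequests.Before, Broadcast, confirmEpochLive, and getLiveCommittedVersion.After, so TimeInQueue = T2-T1, TimeGetVersionFromProxies = T3-T2, and TimeConfirmLivenessFromTLogs = T4-T3. -->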
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 28: Transaction Tracing - GetValue Latency (shows CC transactions by default; client transactions appear only when client transaction tracing is enabled manually)</title>
<input type="dropdown" token="GetValueLatencyByMachineToken" searchWhenChanged="true">
<label>By Machine</label>
<choice value="by Machine">Yes</choice>
<choice value="">No</choice>
<default></default>
</input>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(storageServer.received OR getValueQ.DoRead OR getValueQ.AfterVersion OR Reader.Before OR Reader.After OR getValueQ.AfterRead OR NativeAPI.getKeyLocation.Before OR NativeAPI.getKeyLocation.After)
| table Machine Location Time Roles ID Type
| eval Order=case(Location=="NativeAPI.getKeyLocation.Before", 0, Location=="NativeAPI.getKeyLocation.After", 1, Location=="NativeAPI.getValue.Before", 2, Location=="storageServer.received", 3, Location=="getValueQ.DoRead", 4, Location=="getValueQ.AfterVersion", 5, Location=="Reader.Before", 6, Location=="Reader.After", 7, Location=="getValueQ.AfterRead", 8, Location=="NativeAPI.getValue.After", 9, Location=="NativeAPI.getValue.Error", 10)
| sort 0 Time Order
| stats list(*) by ID
| rename list(*) as *
| table Machine Location Time Roles ID Type
| eval count = mvcount(Location)
| search count&gt;2
| eval TEnd=mvindex(Time, -1), TBegin=mvindex(Time, 0), TimeSpan=TEnd-TBegin, _time=TBegin
| table _time TimeSpan
| timechart span=30s limit=0 avg(TimeSpan) $GetValueLatencyByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 29: Transaction Tracing - Commit Latency (shows CC transactions by default; client transactions appear only when client transaction tracing is enabled manually)</title>
<input type="dropdown" token="CommitByMachineToken" searchWhenChanged="true">
<label>By Machine</label>
<choice value="By Machine">Yes</choice>
<choice value="">No</choice>
<default></default>
</input>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="CommitDebug" AND (*ProxyServer.commitBatch.Before OR *ProxyServer.commitBatch.GettingCommitVersion OR *ProxyServer.commitBatch.GotCommitVersion OR *ProxyServer.commitBatch.ProcessingMutations OR *ProxyServer.commitBatch.AfterStoreCommits OR *ProxyServer.commitBatch.AfterLogPush OR *ProxyServer.commitBatch.AfterResolution)
| table Time Type ID Location Machine Roles
| eval Order=case(Location=="NativeAPI.commit.Before", 0, Location like "%ProxyServer.batcher", 1, Location like "%ProxyServer.commitBatch.Before", 2, Location like "%ProxyServer.commitBatch.GettingCommitVersion", 3, Location like "%ProxyServer.commitBatch.GotCommitVersion", 4, Location=="Resolver.resolveBatch.Before", 5, Location=="Resolver.resolveBatch.AfterQueueSizeCheck", 6, Location=="Resolver.resolveBatch.AfterOrderer", 7, Location=="Resolver.resolveBatch.After", 8, Location like "%ProxyServer.commitBatch.AfterResolution", 8.5, Location like "%ProxyServer.commitBatch.ProcessingMutations", 9, Location like "%ProxyServer.commitBatch.AfterStoreCommits", 10, Location=="TLog.tLogCommit.BeforeWaitForVersion", 11, Location=="TLog.tLogCommit.Before", 12, Location=="TLog.tLogCommit.AfterTLogCommit", 13, Location=="TLog.tLogCommit.After", 14, Location like "%ProxyServer.commitBatch.AfterLogPush", 15, Location=="NativeAPI.commit.After", 16)
| table Time Order Type ID Location Machine Roles
| sort 0 Time Order
| table Machine Location Time Roles Type ID
| stats list(*) by ID
| rename list(*) as *
| eval Count=mvcount(Location)
| search Count=7
| eval T1=mvindex(Time, 0), T2=mvindex(Time, 1), T3=mvindex(Time, 2), T4=mvindex(Time, 3), T5=mvindex(Time, 4), T6=mvindex(Time, 5), T7=mvindex(Time, 6), TimeSpan=T7-T1, TimeResolution=T4-T3, TimePostResolution=T5-T4, TimeProcessingMutation=T6-T5, TimeTLogPush=T7-T6, _time=T1
| table _time TimeSpan TimeResolution TimePostResolution TimeProcessingMutation TimeTLogPush Machine
| timechart span=$ChartBinSizeToken$ limit=0 avg(TimeSpan), avg(TimeResolution), avg(TimePostResolution), avg(TimeProcessingMutation), avg(TimeTLogPush) $CommitByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
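<!-- Note on the query above (reading of the Order eval): for commits with all seven commitBatch events, stage durations come from consecutive timestamps: TimeResolution = GotCommitVersion to AfterResolution, TimePostResolution = AfterResolution to ProcessingMutations, TimeProcessingMutation = ProcessingMutations to AfterStoreCommits, TimeTLogPush = AfterStoreCommits to AfterLogPush, and TimeSpan = Before to AfterLogPush. -->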
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 30: Transaction Tracing - Commit - TLogPush and Resolver Latency (shows CC transactions by default; client transactions appear only when client transaction tracing is enabled manually)</title>
<input type="dropdown" token="TLogResolverByMachineToken" searchWhenChanged="true">
<label>By Machine</label>
<choice value="MachineStep">Yes</choice>
<choice value="Step">No</choice>
<default>Step</default>
</input>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="CommitDebug" AND (Resolver.resolveBatch.Before OR Resolver.resolveBatch.AfterQueueSizeCheck OR Resolver.resolveBatch.AfterOrderer OR Resolver.resolveBatch.After OR TLog.tLogCommit.BeforeWaitForVersion OR TLog.tLogCommit.Before OR TLog.tLogCommit.AfterTLogCommit OR TLog.tLogCommit.After)
| table Time Type ID Location Machine Roles
| eval Order=case(Location=="NativeAPI.commit.Before", 0, Location=="MasterProxyServer.batcher", 1, Location=="MasterProxyServer.commitBatch.Before", 2, Location=="MasterProxyServer.commitBatch.GettingCommitVersion", 3, Location=="MasterProxyServer.commitBatch.GotCommitVersion", 4, Location=="Resolver.resolveBatch.Before", 5, Location=="Resolver.resolveBatch.AfterQueueSizeCheck", 6, Location=="Resolver.resolveBatch.AfterOrderer", 7, Location=="Resolver.resolveBatch.After", 8, Location=="MasterProxyServer.commitBatch.AfterResolution", 8.5, Location=="MasterProxyServer.commitBatch.ProcessingMutations", 9, Location=="MasterProxyServer.commitBatch.AfterStoreCommits", 10, Location=="TLog.tLogCommit.BeforeWaitForVersion", 11, Location=="TLog.tLogCommit.Before", 12, Location=="TLog.tLogCommit.AfterTLogCommit", 13, Location=="TLog.tLogCommit.After", 14, Location=="MasterProxyServer.commitBatch.AfterLogPush", 15, Location=="NativeAPI.commit.After", 16)
| table Time Order Type ID Location Machine Roles
| sort 0 Time Order
| table Machine Location Time Roles Type ID
| stats list(*) by ID
| rename list(*) as *
| eval Count=mvcount(Location), Step=case(Count=4 and (mvindex(Location, 0) like "TLog%"), "TimeTLogCommit", Count=4 and (mvindex(Location, 0) like "Resolver%"), "TimeResolver", Count=10, "TimeSpan"), BeginTime=mvindex(Time, 0), EndTime=mvindex(Time, -1), Duration=EndTime-BeginTime, _time=BeginTime
| search Count=4
| eval Machinei=mvindex(Machine, 0), MachineStep = Step."-".Machinei
| table _time Step Duration Machinei Location Machine MachineStep
| timechart span=$ChartBinSizeToken$ limit=0 avg(Duration) by $TLogResolverByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 31: Machine Performance - CPU Utilization (CPU Time divided by Elapsed)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=ProcessMetrics AND TrackLatestType="Original"
| table _time Machine CPUSeconds DiskFreeBytes DiskIdleSeconds DiskQueueDepth DiskReadsCount DiskWriteSectors DiskTotalBytes DiskWritesCount FileReads MbpsReceived MbpsSent Memory ResidentMemory UnusedAllocatedMemory Elapsed
| join Machine
[ search index=$Index$ LogGroup=$LogGroup$ Type="Role" AND As=$RolePerformanceChartToken$
| stats first(Machine) by Machine
| rename first(Machine) as Machine
| table Machine]
| eval Utilization=CPUSeconds/Elapsed
| timechart span=$ChartBinSizeToken$ avg(Utilization) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
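<!-- Note on the query above: Utilization = CPUSeconds / Elapsed, i.e. the fraction of wall-clock time the process spent on CPU during each ProcessMetrics interval; the join restricts the chart to machines running the role selected in $RolePerformanceChartToken$. -->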
<option name="charting.axisY.scale">linear</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 32: Machine Performance - Memory Utilization (ResidentMemory divided by Memory)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=ProcessMetrics AND TrackLatestType="Original"
| table _time Machine CPUSeconds DiskFreeBytes DiskIdleSeconds DiskQueueDepth DiskReadsCount DiskWriteSectors DiskTotalBytes DiskWritesCount FileReads MbpsReceived MbpsSent Memory ResidentMemory UnusedAllocatedMemory
| join Machine
[ search index=$Index$ LogGroup=$LogGroup$ Type="Role" AND As=$RolePerformanceChartToken$
| stats first(Machine) by Machine
| rename first(Machine) as Machine
| table Machine]
| eval Utilization = ResidentMemory/Memory
| timechart span=$ChartBinSizeToken$ avg(Utilization) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.axisY.scale">linear</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 33: Machine Performance - Disk Utilization ((DiskTotalBytes-DiskFreeBytes)/DiskTotalBytes)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=ProcessMetrics AND TrackLatestType="Original"
| table _time Machine CPUSeconds DiskFreeBytes DiskIdleSeconds DiskQueueDepth DiskReadsCount DiskWriteSectors DiskTotalBytes DiskWritesCount FileReads MbpsReceived MbpsSent Memory ResidentMemory UnusedAllocatedMemory
| join Machine
[ search index=$Index$ LogGroup=$LogGroup$ Type="Role" AND As=$RolePerformanceChartToken$
| stats first(Machine) by Machine
| rename first(Machine) as Machine
| table Machine]
| eval Utilization = (DiskTotalBytes-DiskFreeBytes)/DiskTotalBytes
| timechart span=$ChartBinSizeToken$ avg(Utilization) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 34: Machine Performance - Network (Mbps Received and Mbps Sent)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=ProcessMetrics AND TrackLatestType="Original"
| table _time Machine CPUSeconds DiskFreeBytes DiskIdleSeconds DiskQueueDepth DiskReadsCount DiskWriteSectors DiskTotalBytes DiskWritesCount FileReads MbpsReceived MbpsSent Memory ResidentMemory UnusedAllocatedMemory
| join Machine
[ search index=$Index$ LogGroup=$LogGroup$ Type="Role" AND As=$RolePerformanceChartToken$
| stats first(Machine) by Machine
| rename first(Machine) as Machine
| table Machine]
| timechart span=$ChartBinSizeToken$ avg(MbpsReceived) avg(MbpsSent) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.axisY.scale">log</option>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 35: Machine Performance - Disk (Reads Count and Writes Count)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type=ProcessMetrics AND TrackLatestType="Original"
| table _time Machine CPUSeconds DiskFreeBytes DiskIdleSeconds DiskQueueDepth DiskReadsCount DiskWriteSectors DiskTotalBytes DiskWritesCount FileReads MbpsReceived MbpsSent Memory ResidentMemory UnusedAllocatedMemory
| join Machine
[ search index=$Index$ LogGroup=$LogGroup$ Type="Role" AND As=$RolePerformanceChartToken$
| stats first(Machine) by Machine
| rename first(Machine) as Machine
| table Machine]
| timechart span=$ChartBinSizeToken$ avg(DiskReadsCount) avg(DiskWritesCount) $ChartByMachineToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 36: Network Performance - Timeout</title>
<input type="dropdown" token="TimeoutByConnectionToken" searchWhenChanged="true">
<label>By Connection</label>
<choice value="By Connection">Yes</choice>
<choice value="">No</choice>
<default></default>
</input>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Type=ConnectionTimedOut OR Type=ConnectionTimeout)
| replace *:tls with * in PeerAddr
| join Machine
[ search index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND ($SourcePerfConnectionToken$))
| dedup ID]
| join PeerAddr
[ search index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND ($DestinationPerfConnectionToken$))
| dedup ID
| rename Machine as PeerAddr]
| eval Connection=Machine."-".PeerAddr
| timechart useother=0 span=$ChartBinSizeToken$ count $TimeoutByConnectionToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
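<!-- Note on the query above: the ":tls" suffix is stripped from PeerAddr so it matches the Machine value of the destination role; the two joins keep only timeouts whose source machine runs the selected source role and whose peer runs the selected destination role. -->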
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
<panel>
<title>Chart 37: Network Performance - PingLatency</title>
<input type="dropdown" token="PingLatencyByConnectionToken" searchWhenChanged="true">
<label>By Connection</label>
<choice value="By Connection">Yes</choice>
<choice value="">No</choice>
<default></default>
</input>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Type=PingLatency)
| replace *:tls with * in PeerAddr
| join Machine
[ search index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND ($SourcePerfConnectionToken$))
| dedup ID]
| join PeerAddr
[ search index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND ($DestinationPerfConnectionToken$))
| dedup ID
| rename Machine as PeerAddr]
| eval Connection=Machine."-".PeerAddr
| timechart useother=0 span=$ChartBinSizeToken$ avg(MeanLatency) avg(MaxLatency) $PingLatencyByConnectionToken$</query>
<earliest>$TimeSpan.earliest$</earliest>
<latest>$TimeSpan.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
</form>

View File

@@ -0,0 +1,873 @@
<form theme="dark">
<label>FoundationDB - Long Recovery (Dev)</label>
<fieldset submitButton="false" autoRun="false"></fieldset>
<row>
<panel>
<title>Table 1: Find long recoveries (enter Index and LogGroup, and select a time span).</title>
<input type="text" token="IndexForOverview" searchWhenChanged="true">
<label>Index</label>
<default>*</default>
</input>
<input type="text" token="LogGroupForOverview" searchWhenChanged="true">
<label>LogGroup</label>
<default></default>
</input>
<input type="time" token="time_token_for_recoveryhistorytable" searchWhenChanged="true">
<label>Select a time span</label>
<default>
<earliest>-0s</earliest>
<latest>now</latest>
</default>
</input>
<table>
<search>
<query>index=$IndexForOverview$ LogGroup=$LogGroupForOverview$
((Type="MasterRecoveryState" AND (Status="reading_coordinated_state" OR Status="fully_recovered" OR Status="accepting_commits")) OR (Type="Role" AND As="MasterServer" AND ("Transition"="Begin" OR "Transition"="End")) OR Type="MasterTerminated") AND (NOT TrackLatestType="Rolled") | eval DateTime=strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| table ID Machine Type Transition As Status DateTime Time ErrorDescription LogGroup
| search NOT ErrorDescription="Success"
| eval EventType=case(Transition="Begin" AND As="MasterServer" AND Type="Role", "MasterStart", Type="MasterRecoveryState" AND Status="fully_recovered", "FullRecovery", Type="MasterRecoveryState" AND Status="reading_coordinated_state", "StartRecoveryAttempt", Transition="End" AND As="MasterServer" AND Type="Role", "MasterTerminated", Type="MasterTerminated", "MasterTerminated", Type="MasterRecoveryState" AND Status="accepting_commits", "AcceptingCommits")
| table ID Machine EventType DateTime Time ErrorDescription LogGroup
| fillnull value="-"
| sort -Time
| eval ifMasterTerminatedEvent=if(EventType="MasterTerminated", 1, 0)
| stats list(*) by ID Machine ifMasterTerminatedEvent
| rename list(*) as *
| table ID Machine EventType DateTime Time ErrorDescription LogGroup
| sort -Time
| eval LastTime=mvindex(Time, 0), FirstTime=mvindex(Time, -1), Duration=LastTime-FirstTime
| table ID Machine Duration EventType DateTime Time ErrorDescription LogGroup</query>
<earliest>$time_token_for_recoveryhistorytable.earliest$</earliest>
<latest>$time_token_for_recoveryhistorytable.latest$</latest>
</search>
<option name="count">15</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Table 2: Select a time span containing the long recovery to see all recovery attempts in that span (the input Index, LogGroup, and time span apply to all following tables and charts)</title>
<input type="text" token="Index" searchWhenChanged="true">
<label>Index</label>
<default>*</default>
</input>
<input type="text" searchWhenChanged="true" token="LogGroup">
<label>LogGroup</label>
</input>
<input type="time" token="ReoveryTime" searchWhenChanged="true">
<label>RecoveryTimeSpan</label>
<default>
<earliest>-0s@s</earliest>
<latest>now</latest>
</default>
</input>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Type="MasterRecoveryState" OR (Type="MasterTerminated") OR (Type="Role" AND As="MasterServer" AND "Transition"="End") OR Type="RecoveryInternal" OR Type="ProxyReplies" OR Type="CommitProxyReplies" OR Type="ResolverReplies" OR Type="MasterRecruitedInitialStorageServers") AND (NOT TrackLatestType="Rolled")
| rename ID as MasterID
| eval DateTime=strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| table MasterID Machine Status Step Type DateTime Time StatusCode MyRecoveryCount ErrorDescription Reason ErrorCode
| fillnull value="-" ErrorDescription Reason ErrorCode
| eval Status=case(Type=="MasterRecoveryState", Status, Type=="Role", "RoleEnd", Type=="MasterTerminated", "MasterTerminated", Type=="RecoveryInternal", Status."/".Step, Type=="ProxyReplies" OR Type=="CommitProxyReplies", "initializing_transaction_servers/ProxyReplies", Type="ResolverReplies", "initializing_transaction_servers/ResolverReplies", Type=="MasterRecruitedInitialStorageServers", "initializing_transaction_servers/MasterRecruitedInitialStorageServers"), StatusCode=case(Type=="ProxyReplies" OR Type=="CommitProxyReplies" OR Type=="ResolverReplies" OR Type=="MasterRecruitedInitialStorageServers", "8", Type!="ProxyReplies" AND Type!="CommitProxyReplies" AND Type!="ResolverReplies" AND Type!="MasterRecruitedInitialStorageServers", StatusCode)
| fillnull value="-" StatusCode
| sort 0 -Time -StatusCode
| stats list(*) by MasterID Machine
| rename list(*) as *
| eval FirstTime=mvindex(Time, -1), LastTime=mvindex(Time, 0), Duration=LastTime-FirstTime
| table MasterID Machine MyRecoveryCount Duration ErrorDescription Reason ErrorCode StatusCode Status DateTime Time
| sort -MyRecoveryCount
| fillnull value="-" MyRecoveryCount</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="count">3</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Table 3: Why was recovery triggered? Uses the WaitFailureClient event, in which Machine A detects Machine B's failure. The first column is the time when WaitFailureClient happens; columns 2-5 describe A, and columns 6-7 describe B.</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="WaitFailureClient"
| table Type Time Machine FailedEndpoint
| replace *:tls with * in FailedEndpoint
| join Machine type=left
[ search index=$Index$ LogGroup=$LogGroup$ Type="Role" AND Transition="End"
| eval EndTime=strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| rename As as Role
| table ID EndTime Machine Role]
| join FailedEndpoint type=left
[ search index=$Index$ LogGroup=$LogGroup$ Type="Role"
| stats latest(*) by ID | rename latest(*) as *
| rename Machine as FailedEndpoint
| eval FailedEndpointLatestRoleEventInfo=As."/".ID."/".Type.Transition."/".strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| stats list(*) by FailedEndpoint
| rename list(*) as *
| table FailedEndpoint FailedEndpointLatestRoleEventInfo]
| eval FailureDetectedTime=strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| makemv delim=" " FailedEndpointLatestRoleEventInfo
| table FailureDetectedTime Machine ID Role EndTime FailedEndpoint FailedEndpointLatestRoleEventInfo</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Table 4: New Recruitment Configuration (using MasterRecoveredConfig event)</title>
<event>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="MasterRecoveredConfig" AND TrackLatestType="Original"
| eval Configuration=replace(Conf, "&amp;quot;", "\"")
| rename Configuration as _raw</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="list.drilldown">none</option>
<option name="refresh.display">progressbar</option>
</event>
</panel>
</row>
<row>
<panel>
<title>Table 5: Data Centers (using ProcessMetrics event)</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type=ProcessMetrics
| dedup DCID
| rename DCID as DataCenterID
| table DataCenterID pie_work_unit
| fillnull value="-"</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
<panel>
<title>Table 6: New Role (using Role event joined by ProcessMetrics event)</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND ((As="ClusterController") OR (As="MasterServer") OR (As="TLog") OR (As="Resolver") OR (As="MasterProxyServer") OR (As="CommitProxyServer") OR (As="GrvProxyServer") OR (As="LogRouter")) AND (NOT TrackLatestType="Rolled") AND (NOT Transition="Refresh"))
| eventstats count by ID
| rename As as Role
| search count=1 AND Transition="Begin"
| table ID Role Machine
| join type=left Machine
[ search index=$Index$ LogGroup=$LogGroup$ Type=ProcessMetrics
| dedup Machine, DCID
| rename DCID as DataCenter
| table Machine DataCenter]
| table ID Role Machine DataCenter
| fillnull value="null" DataCenter
| stats count by Role DataCenter</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Table 7: Role Details</title>
<input type="multiselect" token="RolesToken" searchWhenChanged="true">
<label>Roles</label>
<choice value="MasterServer">MasterServer</choice>
<choice value="TLog">TLog</choice>
<choice value="Resolver">Resolver</choice>
<choice value="MasterProxyServer">MasterProxyServer (for &lt;7.0)</choice>
<choice value="LogRouter">LogRouter</choice>
<choice value="CommitProxyServer">CommitProxyServer (for 7.0+)</choice>
<choice value="GrvProxyServer">GrvProxyServer (for 7.0+)</choice>
<valuePrefix>As="</valuePrefix>
<valueSuffix>"</valueSuffix>
<delimiter> OR </delimiter>
</input>
<input type="dropdown" token="RoleDetailTableWhichRoleToken" searchWhenChanged="true">
<label>Begin/End</label>
<choice value="count=1 AND Transition=&quot;Begin&quot;">Begin</choice>
<choice value="count=1 AND Transition=&quot;End&quot;">End</choice>
<choice value="count=2">Begin-&gt;End</choice>
<default>count=1 AND Transition="Begin"</default>
</input>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND ($RolesToken$) AND (NOT TrackLatestType="Rolled") AND (NOT Transition="Refresh"))
| eventstats count by ID
| rename As as Role
| search $RoleDetailTableWhichRoleToken$
| table ID Role Machine Time
| join type=left Machine
[ search index=$Index$ LogGroup=$LogGroup$ Type=ProcessMetrics
| dedup Machine, DCID
| rename DCID as DataCenter
| table Machine DataCenter]
| table ID Role Machine DataCenter Time
| fillnull value="null" DataCenter
| eval DateTime=strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| table ID Role Machine DataCenter DateTime
| sort 0 -DateTime</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="count">10</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Table 8: CC Recruitment SevWarn OR SevError (uses events from clusterRecruitFromConfiguration and clusterRecruitRemoteFromConfiguration)</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="RecruitFromConfigurationNotAvailable" OR Type="RecruitFromConfigurationRetry" OR Type="RecruitFromConfigurationError" OR Type="RecruitRemoteFromConfigurationNotAvailable" OR Type="RecruitRemoteFromConfigurationRetry" OR Type="RecruitRemoteFromConfigurationError"
| eval DateTime=strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)"), GoodRecruitmentTimeReady=case(Type=="RecruitFromConfigurationNotAvailable" OR Type=="RecruitRemoteFromConfigurationNotAvailable", "True", Type=="RecruitFromConfigurationRetry" OR Type=="RecruitRemoteFromConfigurationRetry", GoodRecruitmentTimeReady, Type=="RecruitFromConfigurationError" OR Type=="RecruitRemoteFromConfigurationError", "-")
| table Type GoodRecruitmentTimeReady Time DateTime</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="drilldown">none</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Table 9: RecoveryCount of the selected TLog (in Table 11)</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(ID=$row.TLogID$ AND Type="TLogStart") OR (LogId=$row.TLogID$ AND Type="TLogPersistentStateRestore")
| eval ID=if(Type="TLogStart", ID, LogId), DateTime=strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| table ID RecoveryCount Type DateTime | fillnull value="Not found. The fdb version is somewhat old."</query>
<earliest>-7d@h</earliest>
<latest>now</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
<panel>
<title>Table 10: Which roles the selected TLog (in Table 11) talks to</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
((Type="TLogRejoining" AND ID=$row.TLogID$) OR ((Type="TLogJoinedMe" OR Type="TLogJoinedMeUnknown" OR Type="TLogRejoinSlow") AND TLog=$row.TLogID$) OR ((Type="TLogLockStarted" OR Type="TLogLocked") AND TLog=$row.TLogID$) OR (Type="TLogStop" AND ID=$row.TLogID$) OR (Type="TLogStop2" AND LogId=$row.TLogID$) OR (Type="Role" AND As="TLog" AND NOT Transition="Refresh" AND ID=$row.TLogID$)) AND (NOT TrackLatestType="Rolled")
| sort -Time
| eval TLogID=case((Type="TLogRejoining"), ID, (Type="TLogJoinedMe") OR (Type="TLogJoinedMeUnknown") OR (Type="TLogRejoinSlow"), TLog, (Type="TLogLockStarted") OR (Type="TLogLocked"), TLog, (Type="TLogStop"), ID, (Type="TLogStop2"), LogId, Type="Role", ID), TLogEvents=case((Type="TLogRejoining"), Time." ".Type." ".Master, (Type="TLogJoinedMe") OR (Type="TLogJoinedMeUnknown") OR (Type="TLogRejoinSlow") OR (Type="TLogLockStarted") OR (Type="TLogLocked"), Time." ".Type." ".ID." "."Null", (Type="TLogStop") OR (Type="TLogStop2"), Time." ".Type." "."Null", (Type="Role" AND As="TLog" AND NOT Transition="Refresh" AND NOT TrackLatestType="Rolled"), Time." "."Role".Transition." "."Null")
| stats list(*) by TLogID
| rename list(*) As *
| table TLogID TLogEvents
| eval ignore = if(mvcount(TLogEvents)==1 AND like(mvindex(TLogEvents, 0), "% RoleEnd"), 1, 0)
| search ignore=0
| sort TLogID
| table TLogID TLogEvents
| mvexpand TLogEvents
| eval temp=split(TLogEvents," "), Time=mvindex(temp,0), Event=mvindex(temp,1), MasterID=mvindex(temp,2)
| fields - temp - TLogEvents
| sort 0 -Time
| search NOT MasterID="NULL"
| dedup MasterID
| rename MasterID as ID
| join type=left ID
[ search index=$Index$ LogGroup=$LogGroup$
(Type="Role")
| sort 0 -Time
| dedup ID
| table ID Machine As]
| table ID Machine As | fillnull value="null" Machine As</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Table 11: TLog Events (Collecting all TLogs that produce interesting events during the time span)</title>
<input type="text" token="SeeLogEventDetailTableToken" searchWhenChanged="true">
<label>Input * to search</label>
</input>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Type="TLogRecover") OR (Type="TLogReady") OR (Type="TLogStart") OR
((Type="TLogLockStarted") OR (Type="TLogLocked") OR (Type="TLogStop") OR (Type="TLogStop2")) OR (Type="Role" AND As="TLog" AND NOT Transition="Refresh") AND (NOT TrackLatestType="Rolled") AND $SeeLogEventDetailTableToken$
| sort -Time
| eval TLogID=case((Type="TLogRecover"), LogId, (Type="TLogReady"), ID, (Type="TLogStart"), ID, (Type="TLogLockStarted") OR (Type="TLogLocked"), TLog, (Type="TLogStop"), ID, (Type="TLogStop2"), LogId, Type="Role", ID), TLogEvents=case((Type="TLogRecover"), Time." ".Type." "."null", (Type="TLogReady"), Time." ".Type." "."null", (Type="TLogStart"), Time." ".Type." "."null", (Type="TLogLockStarted") OR (Type="TLogLocked"), Time." ".Type." ".ID." "."null", (Type="TLogStop") OR (Type="TLogStop2"), Time." ".Type." "."null", (Type="Role" AND As="TLog" AND NOT Transition="Refresh" AND NOT TrackLatestType="Rolled"), Time." "."Role".Transition." "."null")
| stats list(TLogEvents) by TLogID
| rename list(TLogEvents) As TLogEvents
| eval EarliestEvent=mvindex(TLogEvents, -1) , LatestEvent=mvindex(TLogEvents, 0)
| table TLogID TLogEvents EarliestEvent LatestEvent
| eval ignore = if(mvcount(TLogEvents)==1 AND like(mvindex(TLogEvents, 0), "% RoleEnd"), 1, 0)
| search ignore=0
| sort TLogID
| join type=left TLogID
[ search index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND As="TLog")
| sort 0 -Time
| dedup ID
| rename ID as TLogID
| table TLogID host LogGroup Machine]
| table TLogID Machine LogGroup host EarliestEvent LatestEvent
| fillnull value="null" Machine host LogGroup
| eval temp=split(LatestEvent," "), LatestTime=mvindex(temp,0), LatestEvent=mvindex(temp,1), temp2=split(EarliestEvent," "), EarliestTime=mvindex(temp2,0), EarliestEvent=mvindex(temp2,1), Duration=LatestTime-EarliestTime
| table TLogID Machine EarliestTime Duration LogGroup host
| join type=left Machine
[ search index=$Index$ LogGroup=$LogGroup$
Type=ProcessMetrics
| dedup Machine, DCID
| rename DCID as DataCenter
| table Machine DataCenter]
| fillnull value="null" DataCenter
| table TLogID Machine DataCenter EarliestTime Duration host LogGroup
| join type=left TLogID
[ search index=$Index$ LogGroup=$LogGroup$
((Type="TLogRejoining") OR ((Type="TLogJoinedMe" OR Type="TLogJoinedMeUnknown" OR Type="TLogRejoinSlow")) OR ((Type="TLogLockStarted" OR Type="TLogLocked")) OR (Type="TLogStop") OR (Type="TLogStop2") OR (Type="Role" AND As="TLog" AND NOT Transition="Refresh")) AND (NOT TrackLatestType="Rolled")
| sort -Time
| eval TLogID=case((Type="TLogRejoining"), ID, (Type="TLogJoinedMe") OR (Type="TLogJoinedMeUnknown") OR (Type="TLogRejoinSlow"), TLog, (Type="TLogLockStarted") OR (Type="TLogLocked"), TLog, (Type="TLogStop"), ID, (Type="TLogStop2"), LogId, Type="Role", ID), TLogEvents=case((Type="TLogRejoining"), Time." ".Type." ".Master, (Type="TLogJoinedMe") OR (Type="TLogJoinedMeUnknown") OR (Type="TLogRejoinSlow") OR (Type="TLogLockStarted") OR (Type="TLogLocked"), Time." ".Type." ".ID." "."Null", (Type="TLogStop") OR (Type="TLogStop2"), Time." ".Type." "."Null", (Type="Role" AND As="TLog" AND NOT Transition="Refresh" AND NOT TrackLatestType="Rolled"), Time." "."Role".Transition." "."Null")
| stats list(*) by TLogID
| rename list(*) As *
| table TLogID TLogEvents
| eval ignore = if(mvcount(TLogEvents)==1 AND like(mvindex(TLogEvents, 0), "% RoleEnd"), 1, 0)
| search ignore=0
| sort TLogID
| table TLogID TLogEvents
| mvexpand TLogEvents
| eval temp=split(TLogEvents," "), Time=mvindex(temp,0), Event=mvindex(temp,1), RoleID=mvindex(temp,2)
| fields - temp - TLogEvents
| sort 0 -Time
| search NOT RoleID="NULL"
| table TLogID RoleID MasterMachine
| stats list(*) by TLogID
| rename list(*) as *
| streamstats count
| mvexpand RoleID
| dedup count RoleID
| fields - count
| stats count by TLogID
| rename count as Roles
| table TLogID Roles]
| table TLogID Machine DataCenter Roles EarliestTime Duration host LogGroup
| join type=left TLogID
[ search index=$Index$ LogGroup=$LogGroup$
(Type="TLogRecover") OR (Type="TLogReady") OR (Type="TLogStart") OR
((Type="TLogRejoinSlow") OR (Type="TLogLockStarted") OR (Type="TLogLocked") OR (Type="TLogStop") OR (Type="TLogStop2") OR (Type="Role" AND As="TLog" AND NOT Transition="Refresh") AND (NOT TrackLatestType="Rolled"))
| sort -Time
| eval TLogID=case((Type="TLogRecover"), LogId, (Type="TLogReady"), ID, (Type="TLogStart"), ID, (Type="TLogRejoinSlow"), TLog, (Type="TLogLockStarted") OR (Type="TLogLocked"), TLog, (Type="TLogStop"), ID, (Type="TLogStop2"), LogId, Type="Role", ID), TLogEvents=if(Type="Role", Type.Transition, Type)
| sort 0 TLogEvents
| stats list(TLogEvents) by TLogID
| rename list(TLogEvents) As TLogEvents
| table TLogID TLogEvents
| eval ignore = if(mvcount(TLogEvents)==1 AND like(mvindex(TLogEvents, 0), "% RoleEnd"), 1, 0)
| search ignore=0
| mvcombine delim=" " TLogEvents
| table TLogID TLogEvents]
| table TLogID Machine DataCenter Roles Duration TLogEvents EarliestTime host LogGroup
| eval EarliestDateTime=strftime(EarliestTime, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| table TLogID Machine DataCenter Roles Duration TLogEvents EarliestDateTime host LogGroup
| join type=left TLogID
[ search index=$Index$ LogGroup=$LogGroup$
(Type="TLogStart") OR (Type="TLogPersistentStateRestore")
| eval TLogID=if(Type="TLogStart", ID, LogId)
| table TLogID RecoveryCount]
| table TLogID RecoveryCount Machine DataCenter Roles Duration TLogEvents EarliestDateTime host LogGroup
| fillnull value="TLog too old, click and see details" RecoveryCount</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="count">10</option>
<option name="drilldown">cell</option>
<option name="wrap">false</option>
<drilldown>
<set token="row.TLogID">$click.value$</set>
</drilldown>
</table>
</panel>
<panel>
<title>Table 12: Event Details (Including rejoining events) of the selected TLog (in Table 11)</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Type="TLogRecover" AND LogId=$row.TLogID$) OR (Type="TLogReady" AND ID=$row.TLogID$) OR (Type="TLogStart" AND ID=$row.TLogID$) OR
((Type="TLogRejoining" AND ID=$row.TLogID$) OR ((Type="TLogJoinedMe" OR Type="TLogJoinedMeUnknown" OR Type="TLogRejoinSlow") AND TLog=$row.TLogID$) OR ((Type="TLogLockStarted" OR Type="TLogLocked") AND TLog=$row.TLogID$) OR (Type="TLogStop" AND ID=$row.TLogID$) OR (Type="TLogStop2" AND LogId=$row.TLogID$) OR (Type="Role" AND As="TLog" AND NOT Transition="Refresh" AND ID=$row.TLogID$)) AND (NOT TrackLatestType="Rolled")
| sort -Time
| eval TLogID=case((Type="TLogRecover"), LogId, (Type="TLogReady"), ID, (Type="TLogStart"), ID, (Type="TLogRejoining"), ID, (Type="TLogJoinedMe") OR (Type="TLogJoinedMeUnknown") OR (Type="TLogRejoinSlow"), TLog, (Type="TLogLockStarted") OR (Type="TLogLocked"), TLog, (Type="TLogStop"), ID, (Type="TLogStop2"), LogId, Type="Role", ID), TLogEvents=case((Type="TLogRecover"), Time." ".Type." "."-"." "."-", (Type="TLogReady"), Time." ".Type." "."-"." "."-", (Type="TLogStart"), Time." ".Type." "."-"." "."-", (Type="TLogRejoining"), Time." ".Type." ".Master." "."-", (Type="TLogJoinedMe") OR (Type="TLogJoinedMeUnknown") OR (Type="TLogRejoinSlow") OR (Type="TLogLockStarted") OR (Type="TLogLocked"), Time." ".Type." ".ID." "."-", (Type="TLogStop") OR (Type="TLogStop2"), Time." ".Type." "."-"." "."-", (Type="Role" AND As="TLog" AND Transition="Begin" AND NOT TrackLatestType="Rolled"), Time." "."Role".Transition." "."-"." ".Origination, (Type="Role" AND As="TLog" AND Transition="End" AND NOT TrackLatestType="Rolled"), Time." "."Role".Transition." "."-"." "."-")
| stats list(*) by TLogID
| rename list(*) As *
| table TLogID TLogEvents
| eval ignore = if(mvcount(TLogEvents)==1 AND like(mvindex(TLogEvents, 0), "% RoleEnd"), 1, 0)
| search ignore=0
| sort TLogID
| join type=left TLogID
[ search index=$Index$ LogGroup=$LogGroup$ (Type="Role" AND As="TLog" AND ID=$row.TLogID$)
| dedup ID
| rename ID as TLogID
| table TLogID Machine]
| table TLogID Machine TLogEvents
| fillnull value="-" Machine
| mvexpand TLogEvents
| eval temp=split(TLogEvents," "), Time=mvindex(temp,0), Event=mvindex(temp,1), ToID=mvindex(temp,2), Origination= mvindex(temp,3)
| fields - temp - TLogEvents
| join type=left
[ search index=$Index$ LogGroup=$LogGroup$ (Type="Role")
| dedup ID
| rename ID as ToID
| rename As as ToRole
| rename Machine as ToMachine
| table ToID ToRole ToMachine]
| sort 0 -Time
| fillnull value="-" ToRole ToMachine
| eval DateTime=strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| table TLogID Machine Event DateTime ToID ToRole ToMachine Time DateTime</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="count">14</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Table 13: All Tags of the selected TLog (in Table 11) that have been popped by SSes (using TLogPoppedTag event)</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(ID=$row.TLogID$ AND Type="TLogPoppedTag")
| eval DateTime=strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| rename ID as TLogID
| rename Tags as UnpoppedRecoveredTagCount
| rename Tag as TagPopped
| rename DurableKCVer as DurableKnownCommittedVersion
| search TagPopped!="-1:2"
| table TLogID DateTime UnpoppedRecoveredTagCount TagPopped DurableKnownCommittedVersion RecoveredAt
| sort 0 -UnpoppedRecoveredTagCount
| join TagPopped type=left
[ search index=$Index$ LogGroup=$LogGroup$
(Type="StorageMetrics")
| stats latest(*) by Machine
| rename latest(*) as *
| rename Tag as TagPopped
| table TagPopped ID Machine]
| table TLogID DateTime UnpoppedRecoveredTagCount TagPopped DurableKnownCommittedVersion RecoveredAt ID Machine
| join type=left Machine
[ search index=$Index$ LogGroup=$LogGroup$
Type=ProcessMetrics
| dedup Machine, DCID
| rename DCID as DataCenter
| table Machine DataCenter]
| rename ID as SSID
| rename Machine as SSMachine
| rename DataCenter as SSDataCenter
| table TLogID DateTime UnpoppedRecoveredTagCount TagPopped SSID SSMachine SSDataCenter DurableKnownCommittedVersion RecoveredAt
| fillnull value="-"</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="count">10</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
<option name="wrap">false</option>
</table>
</panel>
<panel>
<title>Table 14: All Tags of the selected TLog (in Table 11) to be popped by SSes (using TLogReady event)</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(ID=$row.TLogID$ AND Type="TLogReady")
| eval DateTime=strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| rename ID as TLogID
| table TLogID Type AllTags Locality
| makemv delim="," AllTags
| mvexpand AllTags
| rename AllTags as Tag | sort 0 Tag
| join Tag type=left
[ search index=$Index$ LogGroup=$LogGroup$
(Type="StorageMetrics")
| stats latest(*) by Machine
| rename latest(*) as *
| table Tag ID Machine]
| table TLogID Tag ID Machine
| join type=left Machine
[ search index=$Index$ LogGroup=$LogGroup$
Type=ProcessMetrics
| dedup Machine, DCID
| rename DCID as DataCenter
| table Machine DataCenter]
| fillnull value="-"
| table TLogID Tag ID Machine DataCenter
| rename ID as SSID | rename Machine as SSMachine | rename DataCenter as SSDataCenter
| search Tag!="-1:2"</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="count">10</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Table 15: The Tags of the selected TLog (in Table 11) that are not popped by SSes (using the set difference of the tags in Table 13 and Table 14) (if the result contains "...", the result of Table 15 is wrong)</title>
<table>
<search>
<query>| set diff
[ search index=$Index$ LogGroup=$LogGroup$
(ID=$row.TLogID$ AND Type="TLogReady")
| table AllTags
| makemv delim="," AllTags
| mvexpand AllTags
| rename AllTags as Tag
| table Tag]
[ search index=$Index$ LogGroup=$LogGroup$
(ID=$row.TLogID$ AND Type="TLogPoppedTag")
| table Tag]</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
<panel>
<title>Table 16: All Current Storage Servers (assume each machine has at most one SS)</title>
<input type="text" token="TriggerSSTableToken" searchWhenChanged="true">
<label>Input * to search</label>
</input>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Type="StorageMetrics") AND $TriggerSSTableToken$
| stats latest(*) by Machine
| rename latest(*) as *
| table Tag ID Machine
| join type=left Machine
[ search index=$Index$ LogGroup=$LogGroup$
Type=ProcessMetrics
| dedup Machine, DCID
| rename DCID as DataCenter
| table Machine DataCenter]
| table ID Machine DataCenter Tag
| join ID
[ search index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND ((As="StorageServer")) AND (NOT TrackLatestType="Rolled"))
| stats latest(*) by Machine
| rename latest(*) as *
| rename As as Role
| table ID Role Machine
| join type=left Machine
[ search index=$Index$ LogGroup=$LogGroup$
Type=ProcessMetrics
| dedup Machine, DCID
| rename DCID as DataCenter
| table Machine DataCenter]
| table ID Role Machine DataCenter
| fillnull value="null" DataCenter]
| sort 0 DataCenter
| table Tag ID Machine DataCenter | sort 0 Tag</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Chart 1: Timeout/TimedOut event distribution grouped by source (Machine)</title>
<input type="text" token="TimeoutEventByMachineTableTimeSpanToken" searchWhenChanged="true">
<label>TimeSpan</label>
<default>5s</default>
</input>
<input type="multiselect" token="TimeoutbyMachineTableSourceRoleToken" searchWhenChanged="true">
<label>Select Source Roles</label>
<choice value="TLog">TLog</choice>
<choice value="MasterServer">MasterServer</choice>
<choice value="MasterProxyServer">MasterProxyServer (for version &lt; 7)</choice>
<choice value="Resolver">Resolver</choice>
<choice value="ClusterController">ClusterController</choice>
<choice value="SharedTLog">SharedTLog</choice>
<choice value="LogRouter">LogRouter</choice>
<choice value="Coordinator">Coordinator</choice>
<choice value="StorageServer">StorageServer</choice>
<choice value="CommitProxyServer">CommitProxyServer (for version 7+)</choice>
<choice value="GrvProxyServer">GrvProxyServer (for ver 7+)</choice>
<valuePrefix>As="</valuePrefix>
<valueSuffix>"</valueSuffix>
<delimiter> OR </delimiter>
</input>
<input type="multiselect" token="TimeoutbyMachineTableDestinationRoleToken" searchWhenChanged="true">
<label>Select Destination Roles</label>
<choice value="TLog">TLog</choice>
<choice value="MasterServer">MasterServer</choice>
<choice value="MasterProxyServer">MasterProxyServer (for version &lt;7)</choice>
<choice value="Resolver">Resolver</choice>
<choice value="ClusterController">ClusterController</choice>
<choice value="SharedTLog">SharedTLog</choice>
<choice value="LogRouter">LogRouter</choice>
<choice value="Coordinator">Coordinator</choice>
<choice value="StorageServer">StorageServer</choice>
<choice value="CommitProxyServer">CommitProxyServer (for version 7+)</choice>
<choice value="GrvProxyServer">GrvProxyServer (for version 7+)</choice>
<valuePrefix>As="</valuePrefix>
<valueSuffix>"</valueSuffix>
<delimiter> OR </delimiter>
</input>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Type=ConnectionTimedOut OR Type=ConnectionTimeout)
| replace *:tls with * in PeerAddr
| join Machine
[ search index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND ($TimeoutbyMachineTableSourceRoleToken$))
| dedup ID]
| join PeerAddr
[ search index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND ($TimeoutbyMachineTableDestinationRoleToken$))
| dedup ID
| rename Machine as PeerAddr]
| timechart useother=0 span=$TimeoutEventByMachineTableTimeSpanToken$ count by Machine</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="height">233</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Chart 2: Timeout/TimedOut event distribution grouped by destination (PeerAddr)</title>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Type=ConnectionTimedOut OR Type=ConnectionTimeout)
| replace *:tls with * in PeerAddr
| join Machine
[ search index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND ($TimeoutbyMachineTableSourceRoleToken$))
| dedup ID]
| join PeerAddr
[ search index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND ($TimeoutbyMachineTableDestinationRoleToken$))
| dedup ID
| rename Machine as PeerAddr]
| timechart useother=0 span=$TimeoutEventByMachineTableTimeSpanToken$ count by PeerAddr</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="height">219</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Table 17: Check Type=ConnectionTimedOut OR Type=ConnectionTimeout events between transaction roles in the recovery (including roles that refresh/begin/end in the timespan)</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Type=ConnectionTimedOut OR Type=ConnectionTimeout)
| replace *:tls with * in PeerAddr
| stats count as TotalTimeouts by Machine PeerAddr
| table Machine PeerAddr TotalTimeouts
| join Machine
[ search index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND ($TimeoutbyMachineTableSourceRoleToken$))
| stats latest(*) by ID
| rename latest(*) as *
| eval Role = As."/".ID."/".Type.Transition."/".strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| stats list(Role) AS MachineRoleLatestEvent BY Machine
]
| join PeerAddr
[ search index=$Index$ LogGroup=$LogGroup$
(Type="Role" AND ($TimeoutbyMachineTableDestinationRoleToken$))
| stats latest(*) by ID
| rename latest(*) as *
| eval Role = As."/".ID."/".Type.Transition."/".strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| stats list(Role) AS PeerRoleLatestEvent BY Machine
| rename Machine AS PeerAddr
]
| table Machine PeerAddr TotalTimeouts MachineRoleLatestEvent PeerRoleLatestEvent</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="count">10</option>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Table 18: Proxy 0</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Type="ProxyReplies" OR Type="CommitProxyReplies") AND FirstProxy="True"
| eval DateTime=strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| table WorkerID LogGroup FirstProxy Time DateTime
| sort 0 -Time
| join type=left WorkerID
[ search index=$Index$ LogGroup=$LogGroup$
Type="Role" AND As="Worker" AND Transition="Refresh"
| dedup ID
| rename ID as WorkerID
| stats list(*) by WorkerID
| rename list(*) as *
| table WorkerID Machine Roles]
| table WorkerID Machine Roles LogGroup FirstProxy Time DateTime
| join type=left Machine
[ search index=$Index$ LogGroup=$LogGroup$
Type="Role" AND (As="MasterProxyServer" OR As="CommitProxyServer") AND Transition="Refresh"
| dedup ID
| rename ID as ProxyID
| table Machine ProxyID]
| table ProxyID Machine LogGroup FirstProxy</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Table 19: Latest Role Events on the input Machine (Input Machine, like 172.27.113.121:4500)</title>
<input type="text" token="SearchMachineToken" searchWhenChanged="true">
<label>Machine (IP:PORT)</label>
</input>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="Role" AND Machine=$SearchMachineToken$
| stats latest(*) by ID Transition
| rename latest(*) as *
| eval DateTime=strftime(Time, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| table DateTime Machine ID Transition As Roles LogGroup Error ErrorDescription Reason
| sort 0 -DateTime
| fillnull value="-"</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Chart 3: severity&gt;=20 event distribution (including roles that refresh/begin/end in the timespan)</title>
<input type="text" token="BadEvents" searchWhenChanged="true">
<label>Events</label>
<default>*</default>
</input>
<input type="multiselect" token="BadEventRoleToken" searchWhenChanged="true">
<label>Roles</label>
<choice value="TLog">TLog</choice>
<choice value="MasterServer">MasterServer</choice>
<choice value="MasterProxyServer">MasterProxyServer (for version &lt;7)</choice>
<choice value="Resolver">Resolver</choice>
<choice value="ClusterController">ClusterController</choice>
<choice value="SharedTLog">SharedTLog</choice>
<choice value="LogRouter">LogRouter</choice>
<choice value="Coordinator">Coordinator</choice>
<choice value="StorageServer">StorageServer</choice>
<choice value="CommitProxyServer">CommitProxyServer (for version 7+)</choice>
<choice value="GrvProxyServer">GrvProxyServer (for version 7+)</choice>
<valuePrefix>As="</valuePrefix>
<valueSuffix>"</valueSuffix>
<delimiter> OR </delimiter>
</input>
<input type="dropdown" token="BadEventChartBy" searchWhenChanged="true">
<label>By</label>
<choice value="Type">EventType</choice>
<choice value="Machine">Machine</choice>
<choice value="Severity">Severity</choice>
<default>Type</default>
</input>
<input type="text" token="BadEventChartTimeSpanToken" searchWhenChanged="true">
<label>TimeSpan</label>
<default>5s</default>
</input>
<chart>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Severity&gt;10 AND $BadEvents$
| join Machine
[ search index=$Index$ LogGroup=$LogGroup$
Type="Role" AND ($BadEventRoleToken$)
| dedup ID | table Machine]
| table Machine Type Severity _time
| timechart useother=0 span=$BadEventChartTimeSpanToken$ count by $BadEventChartBy$</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="charting.chart">line</option>
<option name="charting.drilldown">none</option>
<option name="height">305</option>
<option name="refresh.display">progressbar</option>
</chart>
</panel>
</row>
<row>
<panel>
<title>Table 20: Check severity&gt;=20 events of roles in the recovery (including roles that refresh/begin/end in the timespan)</title>
<table>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Severity&gt;10
| stats count by Machine Type
| rename count as Count
| join Machine
[ search index=$Index$ LogGroup=$LogGroup$
Type="Role" AND ($BadEventRoleToken$)
| dedup ID
| eval Role=As."-".ID
| stats list(Role) by Machine
| rename list(Role) as Roles
| table Machine Roles]
| table Type Count Roles Machine
| sort -Count</query>
<earliest>$ReoveryTime.earliest$</earliest>
<latest>$ReoveryTime.latest$</latest>
</search>
<option name="drilldown">none</option>
<option name="refresh.display">progressbar</option>
<option name="wrap">false</option>
</table>
</panel>
</row>
</form>

View File

@ -0,0 +1,247 @@
<form theme="dark">
<label>FoundationDB - Tracing GRV and Commit Long Latency of CC Transactions (6.3 and 7.0+) (DEV)</label>
<description>Designed for ClusterController-issued transactions.</description>
<fieldset submitButton="false" autoRun="true">
<input type="text" token="Index" searchWhenChanged="true">
<label>Index</label>
<default></default>
</input>
<input type="text" token="LogGroup" searchWhenChanged="true">
<label>LogGroup</label>
<default>*</default>
</input>
<input type="text" token="transactionID">
<label>Hex Transaction ID (optional)</label>
<default>*</default>
</input>
<input type="time" token="time_token" searchWhenChanged="true">
<label>Time span</label>
<default>
<earliest>@d</earliest>
<latest>now</latest>
</default>
</input>
</fieldset>
<row>
<panel>
<title>All Transactions (currently, this table also does not cover getrange operations or operations that do not commit).</title>
<table>
<title>for FDB 6.3 and 7.0+</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ ID=$transactionID$
(Type="TransactionAttachID" OR Type="GetValueAttachID" OR Type="CommitAttachID")
| eval To=case(Type=="TransactionAttachID", "0"."-".To, Type="GetValueAttachID", "1"."-".To, Type=="CommitAttachID", "2"."-".To)
| stats list(To) by ID
| rename list(To) as ToList
| table ID ToList
| eval Count = mvcount(ToList)
| search Count=3
| eval To0=mvindex(ToList,0), To1=mvindex(ToList,1), To2=mvindex(ToList,2), To0=split(To0,"-"), To1=split(To1,"-"), To2=split(To2,"-"), GrvID=case(mvindex(To0, 0)=="0", mvindex(To0, 1), mvindex(To1, 0)=="0", mvindex(To1, 1), mvindex(To2, 0)=="0", mvindex(To2, 1)), ReadID=case(mvindex(To0, 0)=="1", mvindex(To0, 1), mvindex(To1, 0)=="1", mvindex(To1, 1), mvindex(To2, 0)=="1", mvindex(To2, 1)), CommitID=case(mvindex(To0, 0)=="2", mvindex(To0, 1), mvindex(To1, 0)=="2", mvindex(To1, 1), mvindex(To2, 0)=="2", mvindex(To2, 1))
| table ID GrvID ReadID CommitID
| join GrvID
[ search index=$Index$ LogGroup=$LogGroup$
(Type="TransactionDebug" AND Location="NativeAPI.getConsistentReadVersion.Before")
| rename ID as GrvID
| rename Time as BeginTime
| table GrvID BeginTime
]
| join GrvID
[ search index=$Index$ LogGroup=$LogGroup$
(Type="TransactionDebug" AND Location="NativeAPI.getConsistentReadVersion.After")
| rename ID as GrvID
| rename Time as GRVDoneTime
| table GrvID GRVDoneTime
]
| join ReadID
[ search index=$Index$ LogGroup=$LogGroup$
(Type="GetValueDebug" AND Location="NativeAPI.getValue.After")
| rename ID as ReadID
| rename Time as ReadDoneTime
| table ReadID ReadDoneTime
]
| join CommitID
[ search index=$Index$ LogGroup=$LogGroup$
(Type="CommitDebug" AND Location="NativeAPI.commit.After")
| rename ID as CommitID
| rename Time as CommitDoneTime
| table CommitID CommitDoneTime
]
| rename ID as TransactionID
| eval BeginToGRVDone = GRVDoneTime-BeginTime, GRVDoneToReadDone = ReadDoneTime-GRVDoneTime, ReadDoneToCommitDone = CommitDoneTime-ReadDoneTime, Duration=CommitDoneTime-BeginTime, BeginTimeScope=BeginTime-1, EndTimeScope=CommitDoneTime+1, BeginDateTime=strftime(BeginTime, "%Y-%m-%d %H:%M:%S.%Q (%Z)")
| table TransactionID Duration BeginDateTime BeginToGRVDone GRVDoneToReadDone ReadDoneToCommitDone Duration GrvID ReadID CommitID BeginTimeScope EndTimeScope | sort -Duration</query>
<earliest>$time_token.earliest$</earliest>
<latest>$time_token.latest$</latest>
</search>
<option name="drilldown">cell</option>
<drilldown>
<set token="BeginTime">$row.BeginTimeScope$</set>
<set token="EndTime">$row.EndTimeScope$</set>
<set token="ReadID">$row.ReadID$</set>
<set token="GrvID">$row.GrvID$</set>
<set token="CommitID">$row.CommitID$</set>
</drilldown>
</table>
</panel>
</row>
<row>
<panel>
<title>Step1: GRV</title>
<table>
<title>for FDB 6.3 and 7.0+</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="TransactionDebug" AND (NOT MasterProxyServer.masterProxyServerCore.GetRawCommittedVersion)
AND (ID=$GrvID$ OR ID=
[ search index=$Index$ LogGroup=$LogGroup$
Type="TransactionAttachID" AND ID=$GrvID$
| return $To])
| table Time Type ID Location Machine Roles
| eventstats min(Time) as MinTime
| eval Delta = Time - MinTime, Order = case(Location=="NativeAPI.getConsistentReadVersion.Before", 0, Location like "%ProxyServer.queueTransactionStartRequests.Before", 1, Location=="MasterProxyServer.masterProxyServerCore.Broadcast", 2, Location=="GrvProxyServer.transactionStarter.AskLiveCommittedVersionFromMaster", 2.1, Location like "%ProxyServer.getLiveCommittedVersion.confirmEpochLive", 3, Location=="MasterServer.serveLiveCommittedVersion.GetRawCommittedVersion", 4, Location like "%ProxyServer.getLiveCommittedVersion.After", 5, Location=="NativeAPI.getConsistentReadVersion.After", 6)
| table Time Delta Order Type ID Location Machine Roles
| sort 0 Order
| table Machine Location Delta Time Roles ID Type</query>
<earliest>$BeginTime$</earliest>
<latest>$EndTime$</latest>
</search>
<option name="drilldown">none</option>
</table>
</panel>
<panel>
<title>Step1: (Only for FDB v6.3): GRV --- Get Committed Version (MasterProxyServer.masterProxyServerCore.GetRawCommittedVersion Events)</title>
<table>
<title>only for FDB 6.3</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="TransactionDebug" AND Location="MasterProxyServer.masterProxyServerCore.GetRawCommittedVersion"
AND ID=
[ search index=$Index$ LogGroup=$LogGroup$
Type="TransactionAttachID" AND ID=$GrvID$
| return $To]
| table Time Type ID Location Machine Roles
| eventstats min(Time) as MinTime
| eval Delta = Time - MinTime
| sort 0 -Time
| table Machine Delta Time Roles ID Type</query>
<earliest>$BeginTime$</earliest>
<latest>$EndTime$</latest>
</search>
<option name="drilldown">none</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Step2: GetValue</title>
<table>
<title>for FDB 6.3 and 7.0+</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$ Type="GetValueDebug" AND ID=$ReadID$
| eventstats min(Time) as MinTime
| eval Delta = Time-MinTime
| table Machine Location Delta Time Roles ID Type
| eval Order=case(Location=="NativeAPI.getKeyLocation.Before", 0, Location=="NativeAPI.getKeyLocation.After", 1, Location=="NativeAPI.getValue.Before", 2, Location=="storageServer.received", 3, Location=="getValueQ.DoRead", 4, Location=="getValueQ.AfterVersion", 5, Location=="Reader.Before", 6, Location=="Reader.After", 7, Location=="getValueQ.AfterRead", 8, Location=="NativeAPI.getValue.After", 9, Location=="NativeAPI.getValue.Error", 10)
| sort 0 Order
| table Machine Location Delta Time Roles ID Type</query>
<earliest>$time_token.earliest$</earliest>
<latest>$time_token.latest$</latest>
</search>
<option name="drilldown">none</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Step3: Commit</title>
<table>
<title>for FDB 6.3 and 7.0+</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
Type="CommitDebug" AND (ID=$CommitID$ OR ID=
[ search index=$Index$ LogGroup=$LogGroup$
Type="CommitAttachID" AND ID=$CommitID$
| return $To])
| table Time Type ID Location Machine Roles
| eventstats min(Time) as MinTime
| eval Delta = Time-MinTime
| table Machine Location Delta Time Roles ID Type
| eval Order=case(Location=="NativeAPI.commit.Before", 0, Location like "%ProxyServer.batcher", 1, Location like "%ProxyServer.commitBatch.Before", 2, Location like "%ProxyServer.commitBatch.GettingCommitVersion", 3, Location like "%ProxyServer.commitBatch.GotCommitVersion", 4, Location=="Resolver.resolveBatch.Before", 5, Location=="Resolver.resolveBatch.AfterQueueSizeCheck", 6, Location=="Resolver.resolveBatch.AfterOrderer", 7, Location=="Resolver.resolveBatch.After", 8, Location like "%ProxyServer.commitBatch.AfterResolution", 8.5, Location like "%ProxyServer.commitBatch.ProcessingMutations", 9, Location like "%ProxyServer.commitBatch.AfterStoreCommits", 10, Location=="TLogServer.tLogCommit.BeforeWaitForVersion", 11, Location=="TLogServer.tLogCommit.Before", 12, Location=="TLogServer.tLogCommit.AfterTLogCommit", 13, Location=="TLogServer.tLogCommit.After", 14, Location like "%ProxyServer.commitBatch.AfterLogPush", 15, Location=="NativeAPI.commit.After", 16)
| sort 0 Order
| table Machine Location Delta Time Roles ID Type</query>
<earliest>$BeginTime$</earliest>
<latest>$EndTime$</latest>
</search>
<option name="drilldown">none</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Step3: Commit --- Resolver</title>
<table>
<title>for FDB 6.3 and 7.0+</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Location="Resolver*")
| join ID
[ search index=$Index$ LogGroup=$LogGroup$
Type="CommitAttachID" AND ID=
[ search index=$Index$ LogGroup=$LogGroup$
Type="CommitAttachID" AND ID=$CommitID$
| return $To]
| rename To as ID
| table ID]
| eventstats min(Time) as MinTime
| eval Delta = Time-MinTime
| eval Order=case(Location=="Resolver.resolveBatch.Before", 5, Location=="Resolver.resolveBatch.AfterQueueSizeCheck", 6, Location=="Resolver.resolveBatch.AfterOrderer", 7, Location=="Resolver.resolveBatch.After", 8)
| sort 0 Time Order
| stats list(*) by Type ID Machine Roles
| rename list(*) as *
| eval T1=mvindex(Time, 0), T2=mvindex(Time, 3), Duration=T2-T1 | sort -Duration
| table Machine Roles Duration Location Delta Time
| join type=left Machine
[ search index=$Index$ LogGroup=$LogGroup$ Type=ProcessMetrics
| dedup Machine, DCID
| rename DCID as DataCenter
| table Machine DataCenter]
| table Machine DataCenter Roles Duration Location Delta Time</query>
<earliest>$time_token.earliest$</earliest>
<latest>$time_token.latest$</latest>
</search>
<option name="drilldown">none</option>
</table>
</panel>
</row>
<row>
<panel>
<title>Step3: Commit --- Commit to TLogs (CommitDebug Events), grouped by Machine and sorted by Duration</title>
<table>
<title>for FDB 6.3 and 7.0+</title>
<search>
<query>index=$Index$ LogGroup=$LogGroup$
(Location="TLog*")
| join ID
[ search index=$Index$ LogGroup=$LogGroup$
Type="CommitAttachID" AND ID=
[ search index=$Index$ LogGroup=$LogGroup$
Type="CommitAttachID" AND ID=$CommitID$
| return $To]
| rename To as ID
| table ID]
| eventstats min(Time) as MinTime
| eval Delta = Time-MinTime
| sort 0 Time
| stats list(*) by Type ID Machine Roles
| rename list(*) as *
| eval T1=mvindex(Time, 0), T2=mvindex(Time, 3), Duration=T2-T1 | sort -Duration
| table Machine Roles Duration Location Delta Time</query>
<earliest>$BeginTime$</earliest>
<latest>$EndTime$</latest>
</search>
<option name="count">10</option>
<option name="drilldown">none</option>
</table>
</panel>
</row>
</form>

View File

@ -165,7 +165,6 @@ def centos_image_with_fdb_helper(versioned: bool) -> Iterator[Optional[Image]]:
container = Container("centos:7", initd=True)
for rpm in rpms:
container.copy_to(rpm, "/opt")
container.run(["bash", "-c", "yum update -y"])
container.run(
["bash", "-c", "yum install -y prelink"]
) # this is for testing libfdb_c execstack permissions
@ -327,7 +326,7 @@ def test_execstack_permissions_libfdb_c(linux_container: Container, snapshot):
[
"bash",
"-c",
"execstack -q $(ldconfig -p | grep libfdb_c | awk '{print $(NF)}')",
"execstack -q $(ldconfig -p | grep libfdb_c.so | awk '{print $(NF)}')",
]
)

View File

@ -284,6 +284,12 @@ class ErrorCommitInfo(BaseInfo):
        if protocol_version >= PROTOCOL_VERSION_6_3:
            self.report_conflicting_keys = bb.get_bool()
        if protocol_version >= PROTOCOL_VERSION_7_1:
            lock_aware = bb.get_bool()
            if bb.get_bool():
                spanId = bb.get_bytes(16)
class UnsupportedProtocolVersionError(Exception):
    def __init__(self, protocol_version):
        super().__init__("Unsupported protocol version 0x%0.2X" % protocol_version)

View File

@ -0,0 +1,5 @@
# ThreadSanitizer suppressions file for FDB
# https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions
# FDB signal handler is not async-signal safe
signal:crashHandler

View File

@ -20,7 +20,7 @@ Data distribution manages the lifetime of storage servers, decides which storage
**RelocateShard (`struct RelocateShard`)**: A `RelocateShard` records the key range that needs to be moved among servers and the data movement's priority. DD always moves shards with higher priorities first.
**Data distribution queue (`struct DDQueueData`)**: It receives shards to be relocated (i.e., RelocateShards), decides which shard should be moved to which server team, prioritizes the data movements based on the relocate shards' priorities, and controls the progress of data movement based on the servers' workload.
**Data distribution queue (`struct DDQueue`)**: It receives shards to be relocated (i.e., RelocateShards), decides which shard should be moved to which server team, prioritizes the data movements based on the relocate shards' priorities, and controls the progress of data movement based on the servers' workload.
**Special keys in the system keyspace**: DD saves its state in the system keyspace to recover from failure and to ensure every process (e.g., commit proxies, tLogs and storage servers) has a consistent view of which storage server is responsible for which key range.
@ -153,3 +153,25 @@ CPU utilization. This metric is in a positive relationship with “FinishedQueri
* The typical movement size under a read-skew scenario is 100M ~ 600M under the default knob values `READ_REBALANCE_MAX_SHARD_FRAC=0.2, READ_REBALANCE_SRC_PARALLELISM = 20`. Increasing those knobs may accelerate the convergence speed, at the risk of data movement churn that overwhelms the destination and over-cools the source.
* The upper bound of `READ_REBALANCE_MAX_SHARD_FRAC` is 0.5. Any value larger than 0.5 can result in hot server switching.
* When a deeper diagnosis of read-aware DD is needed, the `BgDDMountainChopper_New` and `BgDDValleyFiller_New` trace events are where to go.
## Data Distribution Diagnosis Q&A
* Why hasn't Read-aware DD been triggered when there's a read imbalance?
* Check the `SkipReason` field of the `BgDDMountainChopper_New` and `BgDDValleyFiller_New` events.
* Read-aware DD was triggered and some data movement happened, but it doesn't help the read balance. Why?
* You need to figure out which servers were selected as the source and destination. The information is in the `DestTeam` and `SourceTeam` fields of `BgDDMountainChopper*` and `BgDDValleyFiller*`.
* Also, the `DDQueueServerCounter` event tells how many times a server has been a source or destination (defined in
```c++
enum CountType : uint8_t { ProposedSource = 0, QueuedSource, LaunchedSource, LaunchedDest };
```
) for different relocation reasons (`Other`, `RebalanceDisk`, and so on) in different phases within `DD_QUEUE_COUNTER_REFRESH_INTERVAL` (default 60) seconds. For example,
```xml
<Event Severity="10" Time="1659974950.984176" DateTime="2022-08-08T16:09:10Z" Type="DDQueueServerCounter" ID="0000000000000000" ServerId="0000000000000004" OtherPQSD="0 1 3 2" RebalanceDiskPQSD="0 0 1 4" RebalanceReadPQSD="2 0 0 5" MergeShardPQSD="0 0 1 0" SizeSplitPQSD="0 0 5 0" WriteSplitPQSD="1 0 0 0" ThreadID="9733255463206053180" Machine="0.0.0.0:0" LogGroup="default" Roles="TS" />
```
`RebalanceReadPQSD="2 0 0 5"` means server `0000000000000004` has been proposed as a source for read balancing twice, but those relocations have not been queued or executed yet. This server has also been a destination for read balancing 5 times in the past minute. Note that the field is skipped if all 4 numbers are 0. To avoid spammy traces, if the knob `DD_QUEUE_COUNTER_SUMMARIZE = true` is enabled, the `DDQueueServerCounterTooMany` event summarizes the unreported servers involved in launched relocations (i.e., servers whose `LaunchedSource` or `LaunchedDest` counts are non-zero):
```xml
<Event Severity="10" Time="1660095057.995837" DateTime="2022-08-10T01:30:57Z" Type="DDQueueServerCounterTooMany" ID="0000000000000000" RemainedLaunchedSources="000000000000007f,00000000000000d9,00000000000000e8,000000000000014c,0000000000000028,00000000000000d6,0000000000000067,000000000000003e,000000000000007d,000000000000000a,00000000000000cb,0000000000000106,00000000000000c1,000000000000003c,000000000000016e,00000000000000e4,000000000000013c,0000000000000016,0000000000000179,0000000000000061,00000000000000c2,000000000000005a,0000000000000001,00000000000000c9,000000000000012a,00000000000000fb,0000000000000146," RemainedLaunchedDestinations="0000000000000079,0000000000000115,000000000000018e,0000000000000167,0000000000000135,0000000000000139,0000000000000077,0000000000000118,00000000000000bb,0000000000000177,00000000000000c0,000000000000014d,000000000000017f,00000000000000c3,000000000000015c,00000000000000fb,0000000000000186,0000000000000157,00000000000000b6,0000000000000072,0000000000000144," ThreadID="1322639651557440362" Machine="0.0.0.0:0" LogGroup="default" Roles="TS" />
```
* How to track the lifecycle of a relocation attempt for balancing?
* First find the `TraceId` fields in `BgDDMountainChopper*` and `BgDDValleyFiller*`, which indicate that a relocation has been triggered.
* (Only when enabled) Find the `QueuedRelocation` event with the same `BeginPair` and `EndPair` as the original `TraceId`. This means the relocation request has been queued.
* Find the `RelocateShard` event whose `BeginPair` and `EndPair` fields are the same as the `TraceId`. This event means the relocation is ongoing. A minimal trace-scanning sketch is given after this list.
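For a quick manual check, a line-oriented scan of the trace files is often enough. The snippet below is an illustrative sketch only, not part of the code base: it assumes XML trace logs with one event per line under a hypothetical `trace_dir`, and a `TraceId` value copied from one of the `BgDD*` events above.
```python
# Sketch only: scan FDB XML trace files for the events belonging to one
# relocation attempt. Assumes one trace event per line; the directory and
# the TraceId value used below are hypothetical placeholders.
import glob

RELOCATION_EVENTS = ("BgDDMountainChopper", "BgDDValleyFiller",
                     "QueuedRelocation", "RelocateShard")

def relocation_lifecycle(trace_dir, trace_id):
    for path in sorted(glob.glob(f"{trace_dir}/trace.*.xml")):
        with open(path) as trace_file:
            for line in trace_file:
                # QueuedRelocation / RelocateShard reference the id through
                # their BeginPair / EndPair fields, so a plain substring
                # match is enough for a first pass.
                if trace_id in line and any(e in line for e in RELOCATION_EVENTS):
                    yield line.strip()

for event_line in relocation_lifecycle("/var/log/foundationdb", "0x1a2b3c4d"):
    print(event_line)
```
The same filtering can of course be done in whatever log aggregation system already ingests the trace events.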

420
design/dynamic-knobs.md Normal file
View File

@ -0,0 +1,420 @@
# Dynamic Knobs
This document is largely adapted from original design documents by Markus
Pilman and Trevor Clinkenbeard.
## Background
FoundationDB parameters control the behavior of the database, including whether
certain features are available and the value of internal constants. Parameters
will be referred to as knobs for the remainder of this document. Currently,
these knobs are configured through arguments passed to `fdbserver` processes,
often controlled by `fdbmonitor`. This has a number of problems:
1. Updating knobs involves updating `foundationdb.conf` files on each host in a
cluster. This has a lot of overhead and typically requires external tooling
for large scale changes.
2. All knob changes require a process restart.
3. We can't easily track the history of knob changes.
## Overview
The dynamic knobs project creates a strictly serializable quorum-based
configuration database stored on the coordinators. Each `fdbserver` process
specifies a configuration path and applies knob overrides from the
configuration database for its specified classes.
### Caveats
The configuration database explicitly does not support the following:
1. A high load. The update rate, while not specified, should be relatively low.
2. A large amount of data. The database is meant to be relatively small (under
one megabyte). Data is not sharded and every coordinator stores a complete
copy.
3. Concurrent writes. At most one write can succeed at a time, and clients must
retry their failed writes.
## Design
### Configuration Path
Each `fdbserver` process can now include a `--config_path` argument specifying
its configuration path. A configuration path is a hierarchical list of
configuration classes specifying which knob overrides the `fdbserver` process
should apply from the configuration database. For example:
```bash
$ fdbserver --config_path classA/classB/classC ...
```
Knob overrides follow descending priority:
1. Manually specified command line knobs.
2. Individual configuration class overrides.
* Subdirectories override parent directories. For example, if the
configuration path is `az-1/storage/gp3`, the `gp3` configuration takes
priority over the `storage` configuration, which takes priority over the
`az-1` configuration.
3. Global configuration knobs.
4. Default knob values.
#### Example
For example, imagine an `fdbserver` process run as follows:
```bash
$ fdbserver --datadir /mnt/fdb/storage/4500 --logdir /var/log/foundationdb --public_address auto:4500 --config_path az-1/storage/gp3 --knob_disable_asserts false
```
And the configuration database contains:
| ConfigClass | KnobName | KnobValue |
|-------------|---------------------|-----------|
| az-2 | page_cache_4k | 8e9 |
| storage | min_trace_severity | 20 |
| az-1 | compaction_interval | 280 |
| storage | compaction_interval | 350 |
| az-1 | disable_asserts | true |
| \<global\> | max_metric_size | 5000 |
| gp3 | max_metric_size | 1000 |
The final configuration for the process will be:
| KnobName | KnobValue | Explanation |
|---------------------|-------------|-------------|
| page_cache_4k | \<default\> | The configuration database knob override for `az-2` is ignored, so the compiled default is used |
| min_trace_severity | 20 | Because the `storage` configuration class is part of the process's configuration path, the corresponding knob override is applied from the configuration database |
| compaction_interval | 350 | The `storage` knob override takes precedence over the `az-1` knob override |
| disable_asserts | false | This knob is manually overridden, so all other overrides are ignored |
| max_metric_size | 1000 | Knob overrides for specific configuration classes take precedence over global knob overrides, so the global override is ignored |
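For illustration only, the resolution order can be sketched as a small function; this is not the `fdbserver` implementation, and all names below are made up. Running it with the tables above reproduces the final configuration (with `<default>` standing in for the compiled default).
```python
# Toy sketch of knob override resolution: defaults < global overrides <
# configuration classes in path order (later classes win) < manual knobs.
def resolve_knobs(defaults, global_overrides, class_overrides, config_path, manual_knobs):
    knobs = dict(defaults)
    knobs.update(global_overrides)
    for config_class in config_path.split("/"):   # e.g. az-1, then storage, then gp3
        knobs.update(class_overrides.get(config_class, {}))
    knobs.update(manual_knobs)                    # command line knobs always win
    return knobs

print(resolve_knobs(
    defaults={"page_cache_4k": "<default>"},
    global_overrides={"max_metric_size": "5000"},
    class_overrides={
        "az-2": {"page_cache_4k": "8e9"},
        "storage": {"min_trace_severity": "20", "compaction_interval": "350"},
        "az-1": {"compaction_interval": "280", "disable_asserts": "true"},
        "gp3": {"max_metric_size": "1000"},
    },
    config_path="az-1/storage/gp3",
    manual_knobs={"disable_asserts": "false"},
))
```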
### Clients
Clients can write to the configuration database using transactions.
Configuration database transactions are differentiated from regular
transactions through specification of the `USE_CONFIG_DATABASE` database
option.
In configuration transactions, the client uses the tuple layer to interact with
the configuration database. Keys are tuples of size two, where the first item
is the configuration class being written, and the second item is the knob name.
The value should be specified as a string. It will be converted to the
appropriate type based on the declared type of the knob being set.
Below is a sample Python script to write to the configuration database.
```python
import fdb

fdb.api_version(720)

@fdb.transactional
def set_knob(tr, knob_name, knob_value, config_class, description):
    tr['\xff\xff/description'] = description
    tr[fdb.tuple.pack((config_class, knob_name,))] = knob_value

# This function performs two knob changes transactionally.
@fdb.transactional
def set_multiple_knobs(tr):
    tr['\xff\xff/description'] = 'description'
    tr[fdb.tuple.pack((None, 'min_trace_severity',))] = '10'
    tr[fdb.tuple.pack(('az-1', 'min_trace_severity',))] = '20'

db = fdb.open()
db.options.set_use_config_database()

set_knob(db, 'min_trace_severity', '10', None, 'description')
set_knob(db, 'min_trace_severity', '20', 'az-1', 'description')
```
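Reads presumably use the same tuple-layer keys; the helper below is a hypothetical sketch (not part of the FoundationDB bindings) that reuses the `fdb` module and `db` handle from the sample above.
```python
# Hypothetical read helper, assuming the configuration database serves reads
# through the same (config_class, knob_name) tuple keys used for writes.
@fdb.transactional
def get_knob(tr, knob_name, config_class):
    value = tr[fdb.tuple.pack((config_class, knob_name,))]
    return value if value.present() else None

print(get_knob(db, 'min_trace_severity', 'az-1'))
```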
### Disable the Configuration Database
The configuration database includes both client and server changes and is
enabled by default. Thus, to disable the configuration database, changes must
be made to both.
#### Server
The configuration database can be disabled by specifying the ``fdbserver``
command line option ``--no-config-db``. Note that this option must be specified
for *every* ``fdbserver`` process.
#### Client
The only client-side change introduced by the configuration database is to the change
coordinators command, which is not considered
successful until the configuration database is readable on the new
coordinators. This will cause the change coordinators command to hang if run
against a database with dynamic knobs disabled. To disable the client side
configuration database liveness check, specify the ``--no-config-db`` flag when
changing coordinators. For example:
```
fdbcli> coordinators auto --no-config-db
```
## Status
The current state of the configuration database is output as part of `status
json`. The configuration path for each process can be determined from the
``command_line`` key associated with each process.
Sample from ``status json``:
```
"configuration_database" : {
"commits" : [
{
"description" : "set some knobs",
"timestamp" : 1659570000,
"version" : 1
},
{
"description" : "make some other changes",
"timestamp" : 1659570000,
"version" : 2
}
],
"last_compacted_version" : 0,
"most_recent_version" : 2,
"mutations" : [
{
"config_class" : "<global>",
"knob_name" : "min_trace_severity",
"knob_value" : "int:5",
"type" : "set",
"version" : 1
},
{
"config_class" : "<global>",
"knob_name" : "compaction_interval",
"knob_value" : "double:30.000000",
"type" : "set",
"version" : 1
},
{
"config_class" : "az-1",
"knob_name" : "compaction_interval",
"knob_value" : "double:60.000000",
"type" : "set",
"version" : 1
},
{
"config_class" : "<global>",
"knob_name" : "compaction_interval",
"type" : "clear",
"version" : 2
},
{
"config_class" : "<global>",
"knob_name" : "update_node_timeout",
"knob_value" : "double:4.000000",
"type" : "set",
"version" : 2
}
],
"snapshot" : {
"<global>" : {
"min_trace_severity" : "int:5",
"update_node_timeout" : "double:4.000000"
},
"az-1" : {
"compaction_interval" : "double:60.000000"
}
}
}
```
After compaction, ``status json`` would show:
```
"configuration_database" : {
"commits" : [
],
"last_compacted_version" : 2,
"most_recent_version" : 2,
"mutations" : [
],
"snapshot" : {
"<global>" : {
"min_trace_severity" : "int:5",
"update_node_timeout" : "double:4.000000"
},
"az-1" : {
"compaction_interval" : "double:60.000000"
}
}
}
```
## Detailed Implementation
The configuration database is implemented as a replicated state machine living
on the coordinators. This allows configuration database transactions to
continue to function in the event of a catastrophic loss of the transaction
subsystem.
To commit a transaction, clients run the two-phase Paxos protocol. First, the
client asks for a live version from a quorum of coordinators. When a
coordinator receives a request for its live version, it increments its local
live version by one and returns it to the client. Then, the client submits its
writes at the live version it received in the previous step. A coordinator will
accept the commit if it is still on the same live version. If a majority of
coordinators accept the commit, it is considered committed.
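As a mental model only, that flow can be sketched with a toy in-memory stand-in for the coordinators. The real system exchanges ``ConfigTransactionInterface`` requests, and taking the largest live-version reply is an assumption made here for the sketch.
```python
# Toy model of the client-driven two-phase commit described above; not the
# actual ConfigNode implementation.
class ToyCoordinator:
    def __init__(self):
        self.live_version = 0
        self.log = []

    def get_live_version(self):
        # Phase 1: bump the local live version and hand it to the client.
        self.live_version += 1
        return self.live_version

    def try_commit(self, version, mutations):
        # Phase 2: accept only if still on the same live version.
        if version != self.live_version:
            return False
        self.log.append((version, mutations))
        return True

def commit(coordinators, mutations):
    quorum = len(coordinators) // 2 + 1
    # Ask a quorum for live versions; using the largest reply is an assumption.
    version = max(c.get_live_version() for c in coordinators[:quorum])
    accepts = sum(c.try_commit(version, mutations) for c in coordinators)
    return accepts >= quorum  # committed only with majority acceptance

nodes = [ToyCoordinator() for _ in range(3)]
print(commit(nodes, [("az-1", "min_trace_severity", "20")]))  # True
```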
### Coordinator
Each coordinator runs a ``ConfigNode`` which serves as a replica storing one
full copy of the configuration database. Coordinators never communicate with
other coordinators while processing configuration database transactions.
Instead, the client runs the transaction and determines when it has quorum
agreement.
Coordinators serve the following ``ConfigTransactionInterface`` to allow
clients to read from and write to the configuration database.
#### ``ConfigTransactionInterface``
| Request | Request fields | Reply fields | Explanation |
|------------------|----------------------------------------------------------------|-----------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| GetGeneration | (coordinatorsHash) | (generation) or (coordinators_changed error) | Get a new read version. This read version is used for all future requests in the transaction |
| Get | (configuration class, knob name, coordinatorsHash, generation) | (knob value or empty) or (coordinators_changed error) or (transaction_too_old error) | Returns the current value stored at the specified configuration class and knob name, or empty if no value exists |
| GetConfigClasses | (coordinatorsHash, generation) | (configuration classes) or (coordinators_changed error) or (transaction_too_old error) | Returns a list of all configuration classes stored in the configuration database |
| GetKnobs | (configuration class, coordinatorsHash, generation) | (knob names) or (coordinators_changed error) or (transaction_too_old error) | Returns a list of all knob names stored for the provided configuration class |
| Commit | (mutation list, coordinatorsHash, generation) | ack or (coordinators_changed error) or (commit_unknown_result error) or (not_committed error) | Commit mutations set by the transaction |
Coordinators also serve the following ``ConfigFollowerInterface`` to provide
access to (and modification of) their current state. Most interaction through
this interface is done by the cluster controller through its
``IConfigConsumer`` implementation living on the ``ConfigBroadcaster``.
#### ``ConfigFollowerInterface``
| Request | Request fields | Reply fields | Explanation |
|-----------------------|----------------------------------------------------------------------|-----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| GetChanges | (lastSeenVersion, mostRecentVersion) | (mutation list, version) or (version_already_compacted error) or (process_behind error) | Request changes since the last seen version, receive a new most recent version, as well as recent mutations |
| GetSnapshotAndChanges | (mostRecentVersion) | (snapshot, snapshotVersion, changes) | Request the full configuration database, in the form of a base snapshot and changes to apply on top of the snapshot |
| Compact | (version) | ack | Compact mutations up to the provided version |
| Rollforward | (rollbackTo, lastKnownCommitted, target, changes, specialZeroQuorum) | ack or (version_already_compacted error) or (transaction_too_old error) | Rollback/rollforward mutations on a node to catch it up with the majority |
| GetCommittedVersion | () | (registered, lastCompacted, lastLive, lastCommitted) | Request version information from a ``ConfigNode`` |
| Lock | (coordinatorsHash) | ack | Lock a ``ConfigNode`` to prevent it from serving requests during a coordinator change |
### Cluster Controller
The cluster controller runs a singleton ``ConfigBroadcaster`` which is
responsible for periodically polling the ``ConfigNode``s for updates, then
broadcasting these updates to workers through the ``ConfigBroadcastInterface``.
When workers join the cluster, they register themselves and their
``ConfigBroadcastInterface`` with the broadcaster. The broadcaster then pushes
new updates to registered workers.
The ``ConfigBroadcastInterface`` is also used by ``ConfigNode``s to register
with the ``ConfigBroadcaster``. ``ConfigNode``s need to register with the
broadcaster because the broadcaster decides when the ``ConfigNode`` may begin
serving requests, based on global information about the status of other
``ConfigNode``s. For example, if a system with three ``ConfigNode``s suffers a
fault where one ``ConfigNode`` loses data, the faulty ``ConfigNode`` should
not be allowed to begin serving requests again until it has been rolled forward
and is up to date with the latest state of the configuration database.
#### ``ConfigBroadcastInterface``
| Request | Request fields | Reply fields | Explanation |
|------------|------------------------------------------------------------|-------------------------------|---------------------------------------------------------------------------------------------|
| Snapshot | (snapshot, version, restartDelay) | ack | A snapshot of the configuration database sent by the broadcaster to workers |
| Changes | (changes, mostRecentVersion, restartDelay) | ack | A list of changes up to and including mostRecentVersion, sent by the broadcaster to workers |
| Registered | () | (registered, lastSeenVersion) | Sent by the broadcaster to new ``ConfigNode``s to determine their registration status |
| Ready | (snapshot, snapshotVersion, liveVersion, coordinatorsHash) | ack | Sent by the broadcaster to new ``ConfigNode``s to allow them to start serving requests |
### Worker
Each worker runs a ``LocalConfiguration`` instance which receives and applies
knob updates from the ``ConfigBroadcaster``. The local configuration maintains
a durable ``KeyValueStoreMemory`` containing the following:
* The latest known configuration version
* The most recently used configuration path
* All knob overrides corresponding to the configuration path at the latest known version
Once a worker starts, it will (see the sketch after this list):
* Apply manually set knobs
* Read its local configuration file
* If the stored configuration path does not match the configuration path
specified on the command line, delete the local configuration file
* Otherwise, apply knob updates from the local configuration file. Manually
specified knobs will not be overridden
* Register with the broadcaster to receive new updates for its configuration
classes
* Persist these updates when received and restart if necessary
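A rough sketch of that startup decision follows; it is not ``fdbserver`` code, and the data shapes are hypothetical.
```python
# Hypothetical sketch of the worker startup flow above. `stored` stands for the
# contents of the durable KeyValueStoreMemory, or None if there is no local file.
def startup_overrides(manual_knobs, config_path, stored):
    overrides = {}
    if stored is not None and stored["config_path"] == config_path:
        # Same configuration path: reuse the locally persisted knob overrides.
        overrides.update(stored["overrides"])
    # A mismatched path means the local file is discarded and the worker waits
    # for a fresh snapshot from the ConfigBroadcaster after registering.
    overrides.update(manual_knobs)  # manually specified knobs are never overridden
    return overrides

print(startup_overrides({"disable_asserts": "false"}, "az-1/storage/gp3",
                        {"config_path": "az-1/storage/gp3",
                         "overrides": {"min_trace_severity": "20"}}))
```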
### Knob Atomicity
All knobs are classified as either atomic or non-atomic. Atomic knobs require a
process restart when changed, while non-atomic knobs do not.
### Compaction
``ConfigNode``s store individual mutations in order to be able to update other,
out of date ``ConfigNode``s without needing to send a full snapshot. Each
configuration database commit also contains additional metadata such as a
timestamp and a text description of the changes being made. To keep the size of
the configuration database manageable, a compaction process runs periodically
(defaulting to every five minutes) which compacts individual mutations into a
simplified snapshot of key-value pairs. Compaction is controlled by the
``ConfigBroadcaster``, using information it periodically requests from
``ConfigNode``s. Compaction will only compact up to the minimum known version
across *all* ``ConfigNode``s. This means that if one ``ConfigNode`` is
permanently partitioned from the ``ConfigBroadcaster`` or from clients, no
compaction will ever take place.
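As a toy illustration of that rule (not the ``ConfigBroadcaster`` implementation), compaction can be modeled as folding every mutation at or below the minimum version known across all ``ConfigNode``s into the snapshot, leaving only newer mutations behind.
```python
# Toy model of compaction. Mutations use the same shape as the "mutations"
# array shown in the status json sample above.
def compact(snapshot, mutations, node_versions):
    target = min(node_versions)  # a single lagging ConfigNode holds everything back
    for m in (m for m in mutations if m["version"] <= target):
        knobs = snapshot.setdefault(m["config_class"], {})
        if m["type"] == "set":
            knobs[m["knob_name"]] = m["knob_value"]
        else:  # "clear"
            knobs.pop(m["knob_name"], None)
    remaining = [m for m in mutations if m["version"] > target]
    return snapshot, remaining, target

# With node_versions = [2, 2, 2], every mutation folds into the snapshot and the
# compacted version becomes 2, matching the post-compaction status output.
```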
### Rollback / Rollforward
It is necessary to be able to roll ``ConfigNode``s backward and forward with
respect to their committed versions due to the nature of quorum logic and
unreliable networks.
Consider a case where a client commit gets persisted durably on one out of
three ``ConfigNode``s (assume commit messages to the other two nodes are lost).
Since the value is not committed on a majority of ``ConfigNode``s, it cannot be
considered committed. But it is also incorrect to have the value persist on one
out of three nodes as future commits are made. In this case, the most common
result is that the ``ConfigNode`` will be rolled back when the next commit from
a different client is made, and then rolled forward to contain the data from
the commit. ``PaxosConfigConsumer`` contains logic to recognize ``ConfigNode``
minorities and update them to match the quorum.
### Changing Coordinators
Since the configuration database lives on the coordinators and the
[coordinators can be
changed](https://apple.github.io/foundationdb/configuration.html#configuration-changing-coordination-servers),
it is necessary to copy the configuration database from the old to the new
coordinators during such an event. A coordinator change performs the following
steps with regard to the configuration database:
1. Write ``\xff/coordinatorsKey`` with the new coordinators string. The key
``\xff/previousCoordinators`` contains the current (old) set of
coordinators.
2. Lock the old ``ConfigNode``s so they can no longer serve client requests.
3. Start a recovery, causing a new cluster controller (and therefore
``ConfigBroadcaster``) to be selected.
4. Read ``\xff/previousCoordinators`` on the ``ConfigBroadcaster`` and, if
present, read an up-to-date snapshot of the configuration database on the
old coordinators.
5. Determine if each registering ``ConfigNode`` needs an up-to-date snapshot of
the configuration database sent to it, based on its reported version and the
snapshot version of the database received from the old coordinators.
* Some new coordinators which were also coordinators in the previous
configuration may not need a snapshot.
6. Send ready requests to new ``ConfigNode``s, including an up-to-date snapshot
if necessary. This allows the new coordinators to begin serving
configuration database requests from clients.
## Testing
The ``ConfigDatabaseUnitTests`` class unit tests a number of different
configuration database dimensions.
The ``ConfigIncrement`` workload tests contention between clients attempting to
write to the configuration database, paired with machine failure and
coordinator changes.

View File

@ -125,6 +125,3 @@ In each test, the `GlobalTagThrottlerTesting::monitor` function is used to perio
On the ratekeeper, every `SERVER_KNOBS->TAG_THROTTLE_PUSH_INTERVAL` seconds, `GlobalTagThrottler::getClientRates` is called. At the end of the rate calculation for each tag, a trace event of type `GlobalTagThrottler_GotClientRate` is produced. This trace event reports the relevant inputs that went into the rate calculation and can be used for debugging.
On storage servers, every `SERVER_KNOBS->TAG_MEASUREMENT_INTERVAL` seconds, there are `BusyReadTag` events for every tag that has sufficient read cost to be reported to the ratekeeper. Both cost and fractional busyness are reported.
### Status
For each storage server, the busiest read tag is reported in the full status output, along with its cost and fractional busyness.

View File

@ -14,8 +14,12 @@ Detailed FoundationDB Architecture
The FoundationDB architecture chooses a decoupled design, where
processes are assigned different heterogeneous roles (e.g.,
Coordinators, Storage Servers, Master). Scaling the database is achieved
by horizontally expanding the number of processes for separate roles:
Coordinators, Storage Servers, Master). The cluster attempts to recruit
different roles as separate processes; however, it is possible that
multiple stateless roles get colocated (recruited) on a single
process to meet the cluster recruitment goals. Scaling the database
is achieved by horizontally expanding the number of processes for
separate roles:
Coordinators
~~~~~~~~~~~~

View File

@ -373,3 +373,302 @@ with the ``multitest`` role:
fdbserver -r multitest -f testfile.txt
This command will block until all tests are completed.
##########
API Tester
##########
Introduction
============
The API tester is a framework for implementing end-to-end tests of the FDB C API, i.e. testing the API on a real
FDB cluster through all layers of the FDB client. Its executable is ``fdb_c_api_tester``, and the source
code is located in ``bindings/c/test/apitester``. The structure of API Tests is similar to that of the
Simulation Tests. The tests are implemented as workloads using FDB API, which are all built into the
``fdb_c_api_tester``. A concrete test configuration is defined as a TOML file, which specifies the
combination of workloads to be executed by the test together with their parameters. The test can then be
executed by passing the TOML file as a parameter to ``fdb_c_api_tester``.
Since simulation tests rely on the actor model to execute the tests deterministically in single-threaded
mode, they are not suitable for testing various multi-threaded aspects of the FDB client. End-to-end API
tests complement the simulation tests by testing the FDB Client layers above the single-threaded Native
Client.
The specific testing goals of the end-to-end tests are:
- Check functional correctness of the Multi-Version Client (MVC) and Thread-Safe Client
- Detecting race conditions. They can be caused by accessing the state of the Native Client from wrong
threads or introducing other shared state without proper synchronization
- Detecting memory management errors. Thread-safe reference counting must be used where necessary. MVC
works with multiple client libraries. Memory allocated by one client library must also be deallocated
by the same library.
- Maintaining interoperability with other client versions. The client functionality is made available
depending on the selected API version. The API changes are correctly adapted.
- Client API behaves correctly in case of cluster upgrades. Database and transaction state is correctly
migrated to the upgraded connections. Pending operations are canceled and successfully retried on the
upgraded connections.
Implementing a Workload
=======================
Each workload is declared as a direct or indirect subclass of ``WorkloadBase`` implementing a constructor
with ``WorkloadConfig`` as a parameter and the method ``start()``, which defines the entry point of the
workload.
``WorkloadBase`` provides a set of methods that serve as building blocks for implementation of a workload:
.. function:: execTransaction(start, cont, failOnError = true)
creates and executes an FDB transaction. Here ``start`` is a function that takes a transaction context
as parameter and implements the starting point of the transaction, and ``cont`` is a function implementing
a continuation to be executed after finishing the transaction execution. Transactions are automatically
retried on retryable errors. Transactions are retried by calling the ``start`` function again. In case
of a fatal error, the entire workload is considered failed unless ``failOnError`` is set to ``false``.
.. function:: schedule(task)
schedules a task for asynchronous execution. It is usually used in the continuations to schedule
the next step of the workload.
.. function:: info(msg)
error(msg)
are used for logging a message with a tag identifying the workload. Issuing an error message marks
the workload as failed.
The transaction context provides methods for implementing the transaction logic:
.. function:: tx()
the reference to the FDB transaction object
.. function:: continueAfter(future, cont, retryOnError = true)
set a continuation to be executed when the future is ready. The ``retryOnError`` flag controls whether
the transaction should be automatically retried in case the future results in a retriable error.
.. function:: continueAfterAll(futures, cont)
takes a vector of futures and sets a continuation to be executed when all of the futures get ready.
The transaction is retried if at least one of the futures results in an error. This method is useful
for handling multiple concurrent reads.
.. function:: commit()
commit and finish the transaction. If the commit is successful, the execution proceeds to the
continuation of ``execTransaction()``. In case of a retriable error the transaction is
automatically retried. A fatal error results in a failure of the workload.
.. function:: done()
finish the transaction without committing. This method should be used to finish read transactions.
The transaction gets destroyed and execution proceeds to the continuation of ``execTransaction()``.
Each transaction must be finished either by ``commit()`` or ``done()``, because otherwise
the framework considers that the transaction is still being executed, so it won't destroy it and
won't call the continuation.
.. function:: onError(err)
Handle an error: restart the transaction in case of a retriable error, otherwise fail the workload.
This method is typically used in the continuation of ``continueAfter`` called with
``retryOnError=false`` as a fallback to the default error handling.
A workload execution ends automatically when it is marked as failed or its last continuation does not
schedule any new task or transaction.
The workload class should be defined in the namespace FdbApiTester. The file name convention is
``Tester{Name}Workload.cpp`` so that we distinguish them from the source files of simulation workloads.
Basic Workload Example
======================
The code below implements a workload that consists of only two transactions. The first one sets a
randomly generated key to a randomly generated value, and the second one reads the key and checks if
the returned value matches the written one.
.. literalinclude:: ../../../bindings/c/test/apitester/TesterExampleWorkload.cpp
:language: C++
:lines: 21-
The workload is implemented in the method ``setAndGet``. It generates a random key and a random value
and executes a transaction that writes that key-value pair and commits. In the continuation of the
first ``execTransaction`` call, we execute the second transaction that reads the same key. The read
operation returns a future. So we call ``continueAfter`` to set a continuation for that future. In the
continuation we check if the returned value matches the written one and finish the transaction by
calling ``ctx->done()``. After completing the second transaction we execute the continuation passed
as a parameter to the ``setAndGet`` method by the ``start`` method. In this case it is ``NO_OP_TASK``, which
does nothing and so finishes the workload.
Finally, we declare an instance of ``WorkloadFactory`` to register this workload with the name ``SetAndGet``.
Note that we use ``workloadId`` as a key prefix. This is necessary for isolating the key space of this
workload, because the framework may be instructed to create multiple instances of the ``SetAndGet``
workload. If we do not isolate the key space, another workload can write a different value for the
same key and so break the assumption of the test.
The workload is implemented using the internal C++ API, implemented in ``fdb_api.hpp``. It introduces
a set of classes representing the FDB objects (transactions, futures, etc.). These classes provide C++-style
methods wrapping FDB C API calls and automate memory management by means of reference counting.
Implementing Control Structures
===============================
Our basic workload executes just 2 transactions, but in practice we want to have workloads that generate
multiple transactions. The following code demonstrates how we can modify our basic workload to generate
multiple transactions in a loop.
.. code-block:: C++
    class SetAndGetWorkload : public WorkloadBase {
    public:
        ...
        int numIterations;
        int iterationsLeft;

        SetAndGetWorkload(const WorkloadConfig& config) : WorkloadBase(config) {
            keyPrefix = fdb::toBytesRef(fmt::format("{}/", workloadId));
            numIterations = config.getIntOption("numIterations", 1000);
        }

        void start() override {
            iterationsLeft = numIterations;
            setAndGetLoop();
        }

        void setAndGetLoop() {
            if (iterationsLeft == 0) {
                return;
            }
            iterationsLeft--;
            setAndGet([this]() { setAndGetLoop(); });
        }
        ...
    };
We introduce a workload parameter ``numIterations`` to specify the number of iterations. If not specified
in the test configuration it defaults to 1000.
The method ``setAndGetLoop`` implements the loop that decrements the ``iterationsLeft`` counter until it reaches 0;
each iteration calls ``setAndGet`` with a continuation that returns execution to the loop. As you
can see, we don't need any change in ``setAndGet``; we just call it with another continuation.
The pattern of passing a continuation as a parameter can also be used to decompose the workload into a
sequence of steps. For example, we can introduce ``setup`` and ``cleanup`` steps to our workload and modify the
``setAndGetLoop`` to make it composable with an arbitrary continuation:
.. code-block:: C++
    void start() override {
        setup([this]() {
            iterationsLeft = numIterations;
            setAndGetLoop([this]() {
                cleanup(NO_OP_TASK);
            });
        });
    }

    void setAndGetLoop(TTaskFct cont) {
        if (iterationsLeft == 0) {
            schedule(cont);
            return;
        }
        iterationsLeft--;
        setAndGet([this, cont]() { setAndGetLoop(cont); });
    }

    void setup(TTaskFct cont) { ... }

    void cleanup(TTaskFct cont) { ... }
Note that we call ``schedule(cont)`` in ``setAndGetLoop`` instead of calling the continuation directly.
In this way we avoid keeping ``setAndGetLoop`` in the call stack, when executing the next step.
Subclassing ApiWorkload
=======================
``ApiWorkload`` is an abstract subclass of ``WorkloadBase`` that provides a framework for a typical
implementation of API test workloads. It implements a workflow consisting of cleaning up the key space
of the workload, populating it with newly generated data and then running a loop consisting of random
database operations. The concrete subclasses of ``ApiWorkload`` are expected to override the method
``randomOperation`` with an implementation of concrete random operations.
The ``ApiWorkload`` maintains a local key-value store that mirrors the part of the database state
relevant to the workload. A successful database write operation should be followed by a continuation
that performs equivalent changes in the local store, and the results of a database read operation should
be validated against the values from the local store.
Test Configuration
==================
A concrete test configuration is specified by a TOML file. The file must contain one ``[[test]]`` section
specifying the general settings for test execution followed by one or more ``[[test.workload]]``
configuration sections, specifying the workloads to be executed and their parameters. The specified
workloads are started all at once and executed concurrently.
The ``[[test]]`` section can contain the following options:
- ``title``: descriptive title of the test
- ``multiThreaded``: enable multi-threading (default: false)
- ``minFdbThreads`` and ``maxFdbThreads``: the number of FDB (network) threads to be randomly selected
from the given range (default: 1-1). Used only if ``multiThreaded=true``. It is also important to use
multiple database instances to make use of the multithreading.
- ``minDatabases`` and ``maxDatabases``: the number of database instances to be randomly selected from
the given range (default 1-1). The transactions of all workloads are randomly load-balanced over the
pool of database instances.
- ``minClients`` and ``maxClients``: the number of clients, i.e. instances of each workload, to be
randomly selected from the given range (default 1-8).
- ``minClientThreads`` and ``maxClientThreads``: the number of client threads, i.e. the threads used
for execution of the workload, to be randomly selected from the given range (default 1-1).
- ``blockOnFutures``: use blocking waits on futures instead of scheduling future callbacks asynchronously
(default: false)
- ``buggify``: Enable client-side failure injection (default: false)
- ``databasePerTransaction``: Create a separate database instance for each transaction (default: false).
It is a special mode useful for testing bugs related to creation and destruction of database instances.
- ``fdbCallbacksOnExternalThreads``: Enables the option ``FDB_NET_OPTION_CALLBACKS_ON_EXTERNAL_THREADS``
causing the callbacks of futures to be executed directly on the threads of the external FDB clients
rather than on the thread of the local FDB client.
The workload section ``[[test.workload]]`` must contain the attribute name matching the registered name
of the workload to be executed. Other options are workload-specific.
The subclasses of the ``ApiWorkload`` inherit the following configuration options:
- ``minKeyLength`` and ``maxKeyLength``: the size range of randomly generated keys (default: 1-64)
- ``minValueLength`` and ``maxValueLength``: the size range of randomly generated values
(default: 1-1000)
- ``maxKeysPerTransaction``: the maximum number of keys per transaction (default: 50)
- ``initialSize``: the number of key-value pairs in the initially populated database (default: 1000)
- ``readExistingKeysRatio``: the probability of choosing an existing key for read operations
(default: 0.9)
- ``numRandomOperations``: the number of random operations to be executed per workload (default: 1000)
- ``runUntilStop``: run the workload indefinitely until the stop command is received (default: false).
This execution mode is used in upgrade tests and other scripted tests, where the workload needs to
be generated continuously until completion of the scripted test.
- ``numOperationsForProgressCheck``: the number of operations to be performed to confirm a progress
check (default: 10). This option is used in combination with ``runUntilStop``. Progress checks are
initiated by a test script to check if the client workload is successfully progressing after a
cluster change.
Executing the Tests
===================
The ``fdb_c_api_tester`` executable takes a single TOML file as a parameter and executes the test
according to its specification. Before that, we must create an FDB cluster and pass its cluster file as
a parameter to ``fdb_c_api_tester``. Note that multithreaded tests also need to be provided with an
external client library.
For example, we can create a temporary cluster and use it for execution of one of the existing API tests:
.. code-block:: bash
${srcDir}/tests/TestRunner/tmp_cluster.py --build-dir ${buildDir} -- \
${buildDir}/bin/fdb_c_api_tester \
--cluster-file @CLUSTER_FILE@ \
--external-client-library=${buildDir}/bindings/c/libfdb_c_external.so \
--test-file ${srcDir}/bindings/c/test/apitester/tests/CApiCorrectnessMultiThr.toml
The test specifications added to the ``bindings/c/test/apitester/tests/`` directory are executed as a part
of the regression test suite. They can be executed using the ``ctest`` target ``fdb_c_api_tests``:
.. code-block:: bash
ctest -R fdb_c_api_tests -VV

View File

@ -416,6 +416,9 @@ FoundationDB will never use processes on the same machine for the replication of
``three_data_hall`` mode
FoundationDB stores data in triplicate, with one copy on a storage server in each of three data halls. The transaction logs are replicated four times, with two data halls containing two replicas apiece. Four available machines (two in each of two data halls) are therefore required to make progress. This configuration enables the cluster to remain available after losing a single data hall and one machine in another data hall.
``three_data_hall_fallback`` mode
FoundationDB stores data in duplicate, with one copy each on a storage server in two of three data halls. The transaction logs are replicated four times, with two data halls containing two replicas apiece. Four available machines (two in each of two data halls) are therefore required to make progress. This configuration is similar to ``three_data_hall``, differing only in that data is stored on two instead of three replicas. This configuration is useful to unblock data distribution when a data hall becomes temporarily unavailable. Because ``three_data_hall_fallback`` reduces the redundancy level to two, it should only be used as a temporary measure to restore cluster health during a datacenter outage.
Datacenter-aware mode
---------------------

View File

@ -379,7 +379,9 @@
"log_server_min_free_space",
"log_server_min_free_space_ratio",
"storage_server_durability_lag",
"storage_server_list_fetch_failed"
"storage_server_list_fetch_failed",
"blob_worker_lag",
"blob_worker_missing"
]
},
"description":"The database is not being saturated by the workload."
@ -400,7 +402,9 @@
"log_server_min_free_space",
"log_server_min_free_space_ratio",
"storage_server_durability_lag",
"storage_server_list_fetch_failed"
"storage_server_list_fetch_failed",
"blob_worker_lag",
"blob_worker_missing"
]
},
"description":"The database is not being saturated by the workload."
@ -599,7 +603,7 @@
"counter":0,
"roughness":0.0
},
"memory_errors":{ // measures number of proxy_memory_limit_exceeded errors
"memory_errors":{ // measures number of (commit/grv)_proxy_memory_limit_exceeded errors
"hz":0.0,
"counter":0,
"roughness":0.0

View File

@ -131,6 +131,9 @@ min_free_space_ratio Running out of space (approaching 5% limit).
log_server_min_free_space Log server running out of space (approaching 100MB limit).
log_server_min_free_space_ratio Log server running out of space (approaching 5% limit).
storage_server_durability_lag Storage server durable version falling behind.
storage_server_list_fetch_failed Unable to fetch storage server list.
blob_worker_lag Blob worker granule version falling behind.
blob_worker_missing No blob workers are reporting metrics.
=================================== ====================================================
The JSON path ``cluster.qos.throttled_tags``, when it exists, is an Object containing ``"auto"`` , ``"manual"`` and ``"recommended"``. The possible fields for those object are in the following table:

View File

@ -2,6 +2,30 @@
Release Notes
#############
7.1.21
======
* Same as 7.1.20 release with AVX enabled.
7.1.20
======
* Released with AVX disabled.
* Fixed missing localities for fdbserver that can cause cross DC calls among storage servers. `(PR #7995) <https://github.com/apple/foundationdb/pull/7995>`_
* Removed extremely spammy trace event in FetchKeys and fixed transaction_profiling_analyzer.py. `(PR #7934) <https://github.com/apple/foundationdb/pull/7934>`_
* Fixed bugs when GRV proxy returns an error. `(PR #7860) <https://github.com/apple/foundationdb/pull/7860>`_
7.1.19
======
* Same as 7.1.18 release with AVX enabled.
7.1.18
======
* Released with AVX disabled.
* Added knobs for the minimum and the maximum of the Ratekeeper's default priority. `(PR #7820) <https://github.com/apple/foundationdb/pull/7820>`_
* Fixed bugs in ``getRange`` of the special key space. `(PR #7778) <https://github.com/apple/foundationdb/pull/7778>`_, `(PR #7720) <https://github.com/apple/foundationdb/pull/7720>`_
* Added debug ID for secondary queries in index prefetching. `(PR #7755) <https://github.com/apple/foundationdb/pull/7755>`_
* Changed hostname resolving to prefer IPv6 addresses. `(PR #7750) <https://github.com/apple/foundationdb/pull/7750>`_
* Added more transaction debug events for prefetch queries. `(PR #7732) <https://github.com/apple/foundationdb/pull/7732>`_
7.1.17
======
* Same as 7.1.16 release with AVX enabled.
@ -15,7 +39,7 @@ Release Notes
* Fixed ScopeEventFieldTypeMismatch error for TLogMetrics. `(PR #7640) <https://github.com/apple/foundationdb/pull/7640>`_
* Added getMappedRange latency metrics. `(PR #7632) <https://github.com/apple/foundationdb/pull/7632>`_
* Fixed a version vector performance bug due to not updating client side tag cache. `(PR #7616) <https://github.com/apple/foundationdb/pull/7616>`_
* Fixed DiskReadSeconds and DiskWriteSeconds calculaion in ProcessMetrics. `(PR #7609) <https://github.com/apple/foundationdb/pull/7609>`_
* Fixed DiskReadSeconds and DiskWriteSeconds calculation in ProcessMetrics. `(PR #7609) <https://github.com/apple/foundationdb/pull/7609>`_
* Added Rocksdb compression and data size stats. `(PR #7596) <https://github.com/apple/foundationdb/pull/7596>`_
7.1.15
@ -74,7 +98,7 @@ Release Notes
* Added support of the reboot command in go bindings. `(PR #7270) <https://github.com/apple/foundationdb/pull/7270>`_
* Fixed several issues in profiling special keys using GlobalConfig. `(PR #7120) <https://github.com/apple/foundationdb/pull/7120>`_
* Fixed a stuck transaction system bug due to inconsistent recovery transaction version. `(PR #7261) <https://github.com/apple/foundationdb/pull/7261>`_
* Fixed a unknown_error crash due to not resolving hostnames. `(PR #7254) <https://github.com/apple/foundationdb/pull/7254>`_
* Fixed an unknown_error crash due to not resolving hostnames. `(PR #7254) <https://github.com/apple/foundationdb/pull/7254>`_
* Fixed a heap-use-after-free bug. `(PR #7250) <https://github.com/apple/foundationdb/pull/7250>`_
* Fixed a performance issue that remote TLogs are sending too many pops to log routers. `(PR #7235) <https://github.com/apple/foundationdb/pull/7235>`_
* Fixed an issue that SharedTLogs are not displaced and leaking disk space. `(PR #7246) <https://github.com/apple/foundationdb/pull/7246>`_

View File

@ -22,6 +22,8 @@ Each special key that existed before api version 630 is its own module. These ar
#. ``\xff\xff/cluster_file_path`` - See :ref:`cluster file client access <cluster-file-client-access>`
#. ``\xff\xff/status/json`` - See :doc:`Machine-readable status <mr-status>`
#. ``\xff\xff/worker_interfaces`` - key as the worker's network address and value as the serialized ClientWorkerInterface, not transactional
Prior to api version 630, it was also possible to read a range starting at ``\xff\xff/worker_interfaces``. This is mostly an implementation detail of fdbcli,
but it's available in api version 630 as a module with prefix ``\xff\xff/worker_interfaces/``.
@ -210,6 +212,7 @@ that process, and wait for necessary data to be moved away.
#. ``\xff\xff/management/options/failed_locality/force`` Read/write. Setting this key disables safety checks for writes to ``\xff\xff/management/failed_locality/<locality>``. Setting this key only has an effect in the current transaction and is not persisted on commit.
#. ``\xff\xff/management/tenant/map/<tenant>`` Read/write. Setting a key in this range to any value will result in a tenant being created with name ``<tenant>``. Clearing a key in this range will delete the tenant with name ``<tenant>``. Reading all or a portion of this range will return the list of tenants currently present in the cluster, excluding any changes in this transaction. Values read in this range will be JSON objects containing the metadata for the associated tenants.
#. ``\xff\xff/management/tenant/rename/<tenant>`` Read/write. Setting a key in this range to an unused tenant name will result in the tenant with the name ``<tenant>`` being renamed to the value provided. If the rename operation is a transaction retried in a loop, it is possible for the rename to be applied twice, in which case ``tenant_not_found`` or ``tenant_already_exists`` errors may be returned. This can be avoided by checking for the tenant's existence first.
#. ``\xff\xff/management/options/worker_interfaces/verify`` Read/write. Setting this key will add a verification phase in reading ``\xff\xff/worker_interfaces``. Setting this key only has an effect in the current transaction and is not persisted on commit. Try to establish connections with every worker from the list returned by Cluster Controller and only return those workers that the client can connect to. This option is now only used in fdbcli commands ``kill``, ``suspend`` and ``expensive_data_check`` to populate the worker list.
An exclusion is syntactically either an ip address (e.g. ``127.0.0.1``), or
an ip address and port (e.g. ``127.0.0.1:4500``) or any locality (e.g ``locality_dcid:primary-satellite`` or

View File

@ -49,7 +49,7 @@ All operations performed within a tenant transaction will occur within the tenan
Raw access
----------
When operating in the tenant mode ``required_experimental``, transactions are not ordinarily permitted to run without using a tenant. In order to access the system keys or perform maintenance operations that span multiple tenants, it is required to use the ``RAW_ACCESS`` transaction option to access the global key-space. It is an error to specify ``RAW_ACCESS`` on a transaction that is configured to use a tenant.
When operating in the tenant mode ``required_experimental`` or using a metacluster, transactions are not ordinarily permitted to run without using a tenant. In order to access the system keys or perform maintenance operations that span multiple tenants, it is required to use the ``RAW_ACCESS`` transaction option to access the global key-space. It is an error to specify ``RAW_ACCESS`` on a transaction that is configured to use a tenant.
.. note :: Setting the ``READ_SYSTEM_KEYS`` or ``ACCESS_SYSTEM_KEYS`` options implies ``RAW_ACCESS`` for your transaction.

View File

@ -928,7 +928,7 @@ void parentWatcher(void* parentHandle) {
static void printVersion() {
printf("FoundationDB " FDB_VT_PACKAGE_NAME " (v" FDB_VT_VERSION ")\n");
printf("source version %s\n", getSourceVersion());
printf("protocol %llx\n", (long long)currentProtocolVersion.version());
printf("protocol %llx\n", (long long)currentProtocolVersion().version());
}
static void printBuildInformation() {

View File

@ -23,6 +23,7 @@
#include "fdbclient/FDBOptions.g.h"
#include "fdbclient/IClientApi.h"
#include "fdbclient/ManagementAPI.actor.h"
#include "fdbclient/NativeAPI.actor.h"
#include "flow/Arena.h"
#include "flow/FastRef.h"
@ -31,33 +32,6 @@
namespace {
// copy to standalones for krm
ACTOR Future<Void> setBlobRange(Database db, Key startKey, Key endKey, Value value) {
state Reference<ReadYourWritesTransaction> tr = makeReference<ReadYourWritesTransaction>(db);
loop {
try {
tr->setOption(FDBTransactionOptions::ACCESS_SYSTEM_KEYS);
tr->setOption(FDBTransactionOptions::PRIORITY_SYSTEM_IMMEDIATE);
// FIXME: check that the set range is currently inactive, and that a revoked range is currently its own
// range in the map and fully set.
tr->set(blobRangeChangeKey, deterministicRandom()->randomUniqueID().toString());
// This is not coalescing because we want to keep each range logically separate.
wait(krmSetRange(tr, blobRangeKeys.begin, KeyRange(KeyRangeRef(startKey, endKey)), value));
wait(tr->commit());
printf("Successfully updated blob range [%s - %s) to %s\n",
startKey.printable().c_str(),
endKey.printable().c_str(),
value.printable().c_str());
return Void();
} catch (Error& e) {
wait(tr->onError(e));
}
}
}
ACTOR Future<Version> getLatestReadVersion(Database db) {
state Transaction tr(db);
loop {
@ -78,7 +52,7 @@ ACTOR Future<Void> printAfterDelay(double delaySeconds, std::string message) {
return Void();
}
ACTOR Future<Void> doBlobPurge(Database db, Key startKey, Key endKey, Optional<Version> version) {
ACTOR Future<Void> doBlobPurge(Database db, Key startKey, Key endKey, Optional<Version> version, bool force) {
state Version purgeVersion;
if (version.present()) {
purgeVersion = version.get();
@ -86,7 +60,7 @@ ACTOR Future<Void> doBlobPurge(Database db, Key startKey, Key endKey, Optional<V
wait(store(purgeVersion, getLatestReadVersion(db)));
}
state Key purgeKey = wait(db->purgeBlobGranules(KeyRange(KeyRangeRef(startKey, endKey)), purgeVersion, {}));
state Key purgeKey = wait(db->purgeBlobGranules(KeyRange(KeyRangeRef(startKey, endKey)), purgeVersion, {}, force));
fmt::print("Blob purge registered for [{0} - {1}) @ {2}\n", startKey.printable(), endKey.printable(), purgeVersion);
@ -99,65 +73,10 @@ ACTOR Future<Void> doBlobPurge(Database db, Key startKey, Key endKey, Optional<V
return Void();
}
ACTOR Future<Version> checkBlobSubrange(Database db, KeyRange keyRange, Optional<Version> version) {
state Transaction tr(db);
state Version readVersionOut = invalidVersion;
loop {
try {
wait(success(tr.readBlobGranules(keyRange, 0, version, &readVersionOut)));
return readVersionOut;
} catch (Error& e) {
wait(tr.onError(e));
}
}
}
ACTOR Future<Void> doBlobCheck(Database db, Key startKey, Key endKey, Optional<Version> version) {
state Transaction tr(db);
state Version readVersionOut = invalidVersion;
state double elapsed = -timer_monotonic();
state KeyRange range = KeyRange(KeyRangeRef(startKey, endKey));
state Standalone<VectorRef<KeyRangeRef>> allRanges;
loop {
try {
wait(store(allRanges, tr.getBlobGranuleRanges(range)));
break;
} catch (Error& e) {
wait(tr.onError(e));
}
}
if (allRanges.empty()) {
fmt::print("ERROR: No blob ranges for [{0} - {1})\n", startKey.printable(), endKey.printable());
return Void();
}
fmt::print("Loaded {0} blob ranges to check\n", allRanges.size());
state std::vector<Future<Version>> checkParts;
// chunk up to smaller ranges than max
int maxChunkSize = 1000;
KeyRange currentChunk;
int currentChunkSize = 0;
for (auto& it : allRanges) {
if (currentChunkSize == maxChunkSize) {
checkParts.push_back(checkBlobSubrange(db, currentChunk, version));
currentChunkSize = 0;
}
if (currentChunkSize == 0) {
currentChunk = it;
} else if (it.begin != currentChunk.end) {
fmt::print("ERROR: Blobrange check failed, gap in blob ranges from [{0} - {1})\n",
currentChunk.end.printable(),
it.begin.printable());
return Void();
} else {
currentChunk = KeyRangeRef(currentChunk.begin, it.end);
}
currentChunkSize++;
}
checkParts.push_back(checkBlobSubrange(db, currentChunk, version));
wait(waitForAll(checkParts));
readVersionOut = checkParts.back().get();
state Version readVersionOut = wait(db->verifyBlobRange(KeyRangeRef(startKey, endKey), version));
elapsed += timer_monotonic();
@ -201,7 +120,7 @@ ACTOR Future<bool> blobRangeCommandActor(Database localDb,
fmt::print("Invalid blob range [{0} - {1})\n", tokens[2].printable(), tokens[3].printable());
} else {
if (tokencmp(tokens[1], "start") || tokencmp(tokens[1], "stop")) {
bool starting = tokencmp(tokens[1], "start");
state bool starting = tokencmp(tokens[1], "start");
if (tokens.size() > 4) {
printUsage(tokens[0]);
return false;
@ -210,9 +129,22 @@ ACTOR Future<bool> blobRangeCommandActor(Database localDb,
starting ? "Starting" : "Stopping",
tokens[2].printable().c_str(),
tokens[3].printable().c_str());
wait(setBlobRange(localDb, begin, end, starting ? LiteralStringRef("1") : StringRef()));
} else if (tokencmp(tokens[1], "purge") || tokencmp(tokens[1], "check")) {
bool purge = tokencmp(tokens[1], "purge");
state bool success = false;
if (starting) {
wait(store(success, localDb->blobbifyRange(KeyRangeRef(begin, end))));
} else {
wait(store(success, localDb->unblobbifyRange(KeyRangeRef(begin, end))));
}
if (!success) {
fmt::print("{0} blobbify range for [{1} - {2}) failed\n",
starting ? "Starting" : "Stopping",
tokens[2].printable().c_str(),
tokens[3].printable().c_str());
}
return success;
} else if (tokencmp(tokens[1], "purge") || tokencmp(tokens[1], "forcepurge") || tokencmp(tokens[1], "check")) {
bool purge = tokencmp(tokens[1], "purge") || tokencmp(tokens[1], "forcepurge");
bool forcePurge = tokencmp(tokens[1], "forcepurge");
Optional<Version> version;
if (tokens.size() > 4) {
@ -225,17 +157,18 @@ ACTOR Future<bool> blobRangeCommandActor(Database localDb,
version = v;
}
fmt::print("{0} blob range [{1} - {2})",
fmt::print("{0} blob range [{1} - {2}){3}",
purge ? "Purging" : "Checking",
tokens[2].printable(),
tokens[3].printable());
tokens[3].printable(),
forcePurge ? " (force)" : "");
if (version.present()) {
fmt::print(" @ {0}", version.get());
}
fmt::print("\n");
if (purge) {
wait(doBlobPurge(localDb, begin, end, version));
wait(doBlobPurge(localDb, begin, end, version, forcePurge));
} else {
wait(doBlobCheck(localDb, begin, end, version));
}
@ -247,8 +180,7 @@ ACTOR Future<bool> blobRangeCommandActor(Database localDb,
return true;
}
CommandFactory blobRangeFactory("blobrange",
CommandHelp("blobrange <start|stop|purge|check> <startkey> <endkey> [version]",
"",
""));
CommandFactory blobRangeFactory(
"blobrange",
CommandHelp("blobrange <start|stop|check|purge|forcepurge> <startkey> <endkey> [version]", "", ""));
} // namespace fdb_cli

View File

@ -272,6 +272,10 @@ ACTOR Future<bool> configureCommandActor(Reference<IDatabase> db,
stderr,
"WARN: Sharded RocksDB storage engine type is still in experimental stage, not yet production tested.\n");
break;
case ConfigurationResult::DATABASE_IS_REGISTERED:
fprintf(stderr, "ERROR: A cluster cannot change its tenant mode while part of a metacluster.\n");
ret = false;
break;
default:
ASSERT(false);
ret = false;

View File

@ -46,7 +46,7 @@ ACTOR Future<bool> expensiveDataCheckCommandActor(
if (tokens.size() == 1) {
// initialize worker interfaces
address_interface->clear();
wait(getWorkerInterfaces(tr, address_interface));
wait(getWorkerInterfaces(tr, address_interface, true));
}
if (tokens.size() == 1 || tokencmp(tokens[1], "list")) {
if (address_interface->size() == 0) {

View File

@ -44,7 +44,7 @@ ACTOR Future<bool> killCommandActor(Reference<IDatabase> db,
if (tokens.size() == 1) {
// initialize worker interfaces
address_interface->clear();
wait(getWorkerInterfaces(tr, address_interface));
wait(getWorkerInterfaces(tr, address_interface, true));
}
if (tokens.size() == 1 || tokencmp(tokens[1], "list")) {
if (address_interface->size() == 0) {

View File

@ -0,0 +1,432 @@
/*
* MetaclusterCommands.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2022 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbcli/fdbcli.actor.h"
#include "fdbclient/FDBOptions.g.h"
#include "fdbclient/IClientApi.h"
#include "fdbclient/Knobs.h"
#include "fdbclient/MetaclusterManagement.actor.h"
#include "fdbclient/Schemas.h"
#include "flow/Arena.h"
#include "flow/FastRef.h"
#include "flow/ThreadHelper.actor.h"
#include "flow/actorcompiler.h" // This must be the last #include.
namespace fdb_cli {
Optional<std::pair<Optional<ClusterConnectionString>, Optional<DataClusterEntry>>>
parseClusterConfiguration(std::vector<StringRef> const& tokens, DataClusterEntry const& defaults, int startIndex) {
Optional<DataClusterEntry> entry;
Optional<ClusterConnectionString> connectionString;
std::set<std::string> usedParams;
for (int tokenNum = startIndex; tokenNum < tokens.size(); ++tokenNum) {
StringRef token = tokens[tokenNum];
bool foundEquals;
StringRef param = token.eat("=", &foundEquals);
if (!foundEquals) {
fmt::print(stderr,
"ERROR: invalid configuration string `{}'. String must specify a value using `='.\n",
param.toString().c_str());
return {};
}
std::string value = token.toString();
if (!usedParams.insert(value).second) {
fmt::print(
stderr, "ERROR: configuration parameter `{}' specified more than once.\n", param.toString().c_str());
return {};
}
if (tokencmp(param, "max_tenant_groups")) {
entry = defaults;
int n;
if (sscanf(value.c_str(), "%d%n", &entry.get().capacity.numTenantGroups, &n) != 1 || n != value.size() ||
entry.get().capacity.numTenantGroups < 0) {
fmt::print(stderr, "ERROR: invalid number of tenant groups `{}'.\n", value.c_str());
return {};
}
} else if (tokencmp(param, "connection_string")) {
connectionString = ClusterConnectionString(value);
} else {
fmt::print(stderr, "ERROR: unrecognized configuration parameter `{}'.\n", param.toString().c_str());
return {};
}
}
return std::make_pair(connectionString, entry);
}
void printMetaclusterConfigureOptionsUsage() {
fmt::print("max_tenant_groups sets the maximum number of tenant groups that can be assigned\n"
"to the named data cluster.\n");
fmt::print("connection_string sets the connection string for the named data cluster.\n");
}
// metacluster create command
ACTOR Future<bool> metaclusterCreateCommand(Reference<IDatabase> db, std::vector<StringRef> tokens) {
if (tokens.size() != 3) {
fmt::print("Usage: metacluster create_experimental <NAME>\n\n");
fmt::print("Configures the cluster to be a management cluster in a metacluster.\n");
fmt::print("NAME is an identifier used to distinguish this metacluster from other metaclusters.\n");
return false;
}
Optional<std::string> errorStr = wait(MetaclusterAPI::createMetacluster(db, tokens[2]));
if (errorStr.present()) {
fmt::print("ERROR: {}.\n", errorStr.get());
} else {
fmt::print("The cluster has been configured as a metacluster.\n");
}
return true;
}
// metacluster decommission command
ACTOR Future<bool> metaclusterDecommissionCommand(Reference<IDatabase> db, std::vector<StringRef> tokens) {
if (tokens.size() != 2) {
fmt::print("Usage: metacluster decommission\n\n");
fmt::print("Converts the current cluster from a metacluster management cluster back into an\n");
fmt::print("ordinary cluster. It must be called on a cluster with no registered data clusters.\n");
return false;
}
wait(MetaclusterAPI::decommissionMetacluster(db));
fmt::print("The cluster is no longer a metacluster.\n");
return true;
}
// metacluster register command
ACTOR Future<bool> metaclusterRegisterCommand(Reference<IDatabase> db, std::vector<StringRef> tokens) {
if (tokens.size() < 4) {
fmt::print("Usage: metacluster register <NAME> connection_string=<CONNECTION_STRING>\n"
"[max_tenant_groups=<NUM_GROUPS>]\n\n");
fmt::print("Adds a data cluster to a metacluster.\n");
fmt::print("NAME is used to identify the cluster in future commands.\n");
printMetaclusterConfigureOptionsUsage();
return false;
}
DataClusterEntry defaultEntry;
auto config = parseClusterConfiguration(tokens, defaultEntry, 3);
if (!config.present()) {
return false;
} else if (!config.get().first.present()) {
fmt::print(stderr, "ERROR: connection_string must be configured when registering a cluster.\n");
return false;
}
wait(MetaclusterAPI::registerCluster(
db, tokens[2], config.get().first.get(), config.get().second.orDefault(defaultEntry)));
fmt::print("The cluster `{}' has been added\n", printable(tokens[2]).c_str());
return true;
}
// metacluster remove command
ACTOR Future<bool> metaclusterRemoveCommand(Reference<IDatabase> db, std::vector<StringRef> tokens) {
if (tokens.size() < 3 || tokens.size() > 4 || (tokens.size() == 4 && tokens[2] != "FORCE"_sr)) {
fmt::print("Usage: metacluster remove [FORCE] <NAME> \n\n");
fmt::print("Removes the specified data cluster from a metacluster.\n");
fmt::print("If FORCE is specified, then the cluster will be detached even if it has\n"
"tenants assigned to it.\n");
return false;
}
state ClusterNameRef clusterName = tokens[tokens.size() - 1];
wait(MetaclusterAPI::removeCluster(db, clusterName, tokens.size() == 4));
fmt::print("The cluster `{}' has been removed\n", printable(clusterName).c_str());
return true;
}
// metacluster configure command
ACTOR Future<bool> metaclusterConfigureCommand(Reference<IDatabase> db, std::vector<StringRef> tokens) {
if (tokens.size() < 4) {
fmt::print("Usage: metacluster configure <NAME> <max_tenant_groups=<NUM_GROUPS>|\n"
"connection_string=<CONNECTION_STRING>> ...\n\n");
fmt::print("Updates the configuration of the metacluster.\n");
printMetaclusterConfigureOptionsUsage();
return false;
}
state Reference<ITransaction> tr = db->createTransaction();
loop {
try {
tr->setOption(FDBTransactionOptions::ACCESS_SYSTEM_KEYS);
tr->setOption(FDBTransactionOptions::SPECIAL_KEY_SPACE_ENABLE_WRITES);
Optional<DataClusterMetadata> metadata = wait(MetaclusterAPI::tryGetClusterTransaction(tr, tokens[2]));
if (!metadata.present()) {
throw cluster_not_found();
}
auto config = parseClusterConfiguration(tokens, metadata.get().entry, 3);
if (!config.present()) {
return false;
}
MetaclusterAPI::updateClusterMetadata(
tr, tokens[2], metadata.get(), config.get().first, config.get().second);
wait(safeThreadFutureToFuture(tr->commit()));
break;
} catch (Error& e) {
wait(safeThreadFutureToFuture(tr->onError(e)));
}
}
return true;
}
// metacluster list command
ACTOR Future<bool> metaclusterListCommand(Reference<IDatabase> db, std::vector<StringRef> tokens) {
if (tokens.size() > 5) {
fmt::print("Usage: metacluster list [BEGIN] [END] [LIMIT]\n\n");
fmt::print("Lists the data clusters in a metacluster.\n");
fmt::print("Only cluster names in the range BEGIN - END will be printed.\n");
fmt::print("An optional LIMIT can be specified to limit the number of results (default 100).\n");
return false;
}
state ClusterNameRef begin = tokens.size() > 2 ? tokens[2] : ""_sr;
state ClusterNameRef end = tokens.size() > 3 ? tokens[3] : "\xff"_sr;
int limit = 100;
if (tokens.size() > 4) {
int n = 0;
if (sscanf(tokens[4].toString().c_str(), "%d%n", &limit, &n) != 1 || n != tokens[4].size() || limit < 0) {
fmt::print(stderr, "ERROR: invalid limit {}\n", tokens[4].toString().c_str());
return false;
}
}
std::map<ClusterName, DataClusterMetadata> clusters = wait(MetaclusterAPI::listClusters(db, begin, end, limit));
if (clusters.empty()) {
if (tokens.size() == 2) {
fmt::print("The metacluster has no registered data clusters\n");
} else {
fmt::print("The metacluster has no registered data clusters in the specified range\n");
}
}
int index = 0;
for (auto cluster : clusters) {
fmt::print(" {}. {}\n", ++index, printable(cluster.first).c_str());
}
return true;
}
// metacluster get command
ACTOR Future<bool> metaclusterGetCommand(Reference<IDatabase> db, std::vector<StringRef> tokens) {
if (tokens.size() > 4 || (tokens.size() == 4 && tokens[3] != "JSON"_sr)) {
fmt::print("Usage: metacluster get <NAME> [JSON]\n\n");
fmt::print("Prints metadata associated with the given data cluster.\n");
fmt::print("If JSON is specified, then the output will be in JSON format.\n");
return false;
}
state bool useJson = tokens.size() == 4;
try {
DataClusterMetadata metadata = wait(MetaclusterAPI::getCluster(db, tokens[2]));
if (useJson) {
json_spirit::mObject obj;
obj["type"] = "success";
obj["cluster"] = metadata.toJson();
fmt::print("{}\n", json_spirit::write_string(json_spirit::mValue(obj), json_spirit::pretty_print).c_str());
} else {
fmt::print(" connection string: {}\n", metadata.connectionString.toString().c_str());
fmt::print(" cluster state: {}\n", DataClusterEntry::clusterStateToString(metadata.entry.clusterState));
fmt::print(" tenant group capacity: {}\n", metadata.entry.capacity.numTenantGroups);
fmt::print(" allocated tenant groups: {}\n", metadata.entry.allocated.numTenantGroups);
}
} catch (Error& e) {
if (useJson) {
json_spirit::mObject obj;
obj["type"] = "error";
obj["error"] = e.what();
fmt::print("{}\n", json_spirit::write_string(json_spirit::mValue(obj), json_spirit::pretty_print).c_str());
return false;
} else {
throw;
}
}
return true;
}
// metacluster status command
ACTOR Future<bool> metaclusterStatusCommand(Reference<IDatabase> db, std::vector<StringRef> tokens) {
if (tokens.size() < 2 || tokens.size() > 3) {
fmt::print("Usage: metacluster status [JSON]\n\n");
fmt::print("Prints metacluster metadata.\n");
fmt::print("If JSON is specified, then the output will be in JSON format.\n");
return false;
}
state bool useJson = tokens.size() == 3;
try {
std::map<ClusterName, DataClusterMetadata> clusters =
wait(MetaclusterAPI::listClusters(db, ""_sr, "\xff"_sr, CLIENT_KNOBS->MAX_DATA_CLUSTERS));
ClusterUsage totalCapacity;
ClusterUsage totalAllocated;
for (auto cluster : clusters) {
totalCapacity.numTenantGroups +=
std::max(cluster.second.entry.capacity.numTenantGroups, cluster.second.entry.allocated.numTenantGroups);
totalAllocated.numTenantGroups += cluster.second.entry.allocated.numTenantGroups;
}
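// For example, a data cluster with a configured capacity of 5 tenant groups but 7 allocated contributes 7 to
// totalCapacity, so the reported capacity never falls below the reported allocation.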
if (useJson) {
json_spirit::mObject obj;
obj["type"] = "success";
json_spirit::mObject metaclusterObj;
metaclusterObj["data_clusters"] = (int)clusters.size();
metaclusterObj["capacity"] = totalCapacity.toJson();
metaclusterObj["allocated"] = totalAllocated.toJson();
obj["metacluster"] = metaclusterObj;
fmt::print("{}\n", json_spirit::write_string(json_spirit::mValue(obj), json_spirit::pretty_print).c_str());
} else {
fmt::print(" number of data clusters: {}\n", clusters.size());
fmt::print(" tenant group capacity: {}\n", totalCapacity.numTenantGroups);
fmt::print(" allocated tenant groups: {}\n", totalAllocated.numTenantGroups);
}
return true;
} catch (Error& e) {
if (useJson) {
json_spirit::mObject obj;
obj["type"] = "error";
obj["error"] = e.what();
fmt::print("{}\n", json_spirit::write_string(json_spirit::mValue(obj), json_spirit::pretty_print).c_str());
return false;
} else {
throw;
}
}
}
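// A sketch of the JSON shape produced by `metacluster status JSON` above (values are illustrative, and the
// contents of "capacity"/"allocated" depend on ClusterUsage::toJson(), which is not shown here):
//   { "type": "success", "metacluster": { "data_clusters": 2, "capacity": { ... }, "allocated": { ... } } }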
// metacluster command
Future<bool> metaclusterCommand(Reference<IDatabase> db, std::vector<StringRef> tokens) {
if (tokens.size() == 1) {
printUsage(tokens[0]);
return true;
} else if (tokencmp(tokens[1], "create_experimental")) {
return metaclusterCreateCommand(db, tokens);
} else if (tokencmp(tokens[1], "decommission")) {
return metaclusterDecommissionCommand(db, tokens);
} else if (tokencmp(tokens[1], "register")) {
return metaclusterRegisterCommand(db, tokens);
} else if (tokencmp(tokens[1], "remove")) {
return metaclusterRemoveCommand(db, tokens);
} else if (tokencmp(tokens[1], "configure")) {
return metaclusterConfigureCommand(db, tokens);
} else if (tokencmp(tokens[1], "list")) {
return metaclusterListCommand(db, tokens);
} else if (tokencmp(tokens[1], "get")) {
return metaclusterGetCommand(db, tokens);
} else if (tokencmp(tokens[1], "status")) {
return metaclusterStatusCommand(db, tokens);
} else {
printUsage(tokens[0]);
return true;
}
}
void metaclusterGenerator(const char* text,
const char* line,
std::vector<std::string>& lc,
std::vector<StringRef> const& tokens) {
if (tokens.size() == 1) {
const char* opts[] = {
"create_experimental", "decommission", "register", "remove", "configure", "list", "get", "status", nullptr
};
arrayGenerator(text, line, opts, lc);
} else if (tokens.size() > 1 && (tokencmp(tokens[1], "register") || tokencmp(tokens[1], "configure"))) {
const char* opts[] = { "max_tenant_groups=", "connection_string=", nullptr };
arrayGenerator(text, line, opts, lc);
} else if ((tokens.size() == 2 && tokencmp(tokens[1], "status")) ||
(tokens.size() == 3 && tokencmp(tokens[1], "get"))) {
const char* opts[] = { "JSON", nullptr };
arrayGenerator(text, line, opts, lc);
}
}
std::vector<const char*> metaclusterHintGenerator(std::vector<StringRef> const& tokens, bool inArgument) {
if (tokens.size() == 1) {
return { "<create_experimental|decommission|register|remove|configure|list|get|status>", "[ARGS]" };
} else if (tokencmp(tokens[1], "create_experimental")) {
return { "<NAME>" };
} else if (tokencmp(tokens[1], "decommission")) {
return {};
} else if (tokencmp(tokens[1], "register") && tokens.size() < 5) {
static std::vector<const char*> opts = { "<NAME>",
"connection_string=<CONNECTION_STRING>",
"[max_tenant_groups=<NUM_GROUPS>]" };
return std::vector<const char*>(opts.begin() + tokens.size() - 2, opts.end());
} else if (tokencmp(tokens[1], "remove") && tokens.size() < 4) {
static std::vector<const char*> opts = { "[FORCE]", "<NAME>" };
if (tokens.size() == 2) {
return opts;
} else if (tokens.size() == 3 && (inArgument || tokens[2].size() == "FORCE"_sr.size()) &&
"FORCE"_sr.startsWith(tokens[2])) {
return std::vector<const char*>(opts.begin() + tokens.size() - 2, opts.end());
} else {
return {};
}
} else if (tokencmp(tokens[1], "configure")) {
static std::vector<const char*> opts = {
"<NAME>", "<max_tenant_groups=<NUM_GROUPS>|connection_string=<CONNECTION_STRING>>"
};
return std::vector<const char*>(opts.begin() + std::min<int>(1, tokens.size() - 2), opts.end());
} else if (tokencmp(tokens[1], "list") && tokens.size() < 5) {
static std::vector<const char*> opts = { "[BEGIN]", "[END]", "[LIMIT]" };
return std::vector<const char*>(opts.begin() + tokens.size() - 2, opts.end());
} else if (tokencmp(tokens[1], "get") && tokens.size() < 4) {
static std::vector<const char*> opts = { "<NAME>", "[JSON]" };
return std::vector<const char*>(opts.begin() + tokens.size() - 2, opts.end());
} else if (tokencmp(tokens[1], "status") && tokens.size() == 2) {
return { "[JSON]" };
} else {
return {};
}
}
CommandFactory metaclusterRegisterFactory(
"metacluster",
CommandHelp("metacluster <create_experimental|decommission|register|remove|configure|list|get|status> [ARGS]",
"view and manage a metacluster",
"`create_experimental' and `decommission' set up or deconfigure a metacluster.\n"
"`register' and `remove' add and remove data clusters from the metacluster.\n"
"`configure' updates the configuration of a data cluster.\n"
"`list' prints a list of data clusters in the metacluster.\n"
"`get' prints the metadata for a particular data cluster.\n"
"`status' prints metacluster metadata.\n"),
&metaclusterGenerator,
&metaclusterHintGenerator);
} // namespace fdb_cli

View File

@ -411,6 +411,7 @@ void printStatus(StatusObjectReader statusObj,
outputString += "\nConfiguration:";
std::string outputStringCache = outputString;
bool isOldMemory = false;
bool blobGranuleEnabled{ false };
try {
// Configuration section
// FIXME: Should we suppress this if there are cluster messages implying that the database has no
@ -434,7 +435,6 @@ void printStatus(StatusObjectReader statusObj,
outputString += "unknown";
int intVal = 0;
bool blobGranuleEnabled{ false };
if (statusObjConfig.get("blob_granules_enabled", intVal) && intVal) {
blobGranuleEnabled = true;
}
@ -1110,6 +1110,15 @@ void printStatus(StatusObjectReader statusObj,
outputString += "\n\nCoordination servers:";
outputString += getCoordinatorsInfoString(statusObj);
}
if (blobGranuleEnabled) {
outputString += "\n\nBlob Granules:";
StatusObjectReader statusObjBlobGranules = statusObjCluster["blob_granules"];
auto numWorkers = statusObjBlobGranules["number_of_blob_workers"].get_int();
outputString += "\n Number of Workers - " + format("%d", numWorkers);
auto numKeyRanges = statusObjBlobGranules["number_of_key_ranges"].get_int();
outputString += "\n Number of Key Ranges - " + format("%d", numKeyRanges);
}
}
// client time

View File

@ -43,7 +43,7 @@ ACTOR Future<bool> suspendCommandActor(Reference<IDatabase> db,
if (tokens.size() == 1) {
// initialize worker interfaces
address_interface->clear();
wait(getWorkerInterfaces(tr, address_interface));
wait(getWorkerInterfaces(tr, address_interface, true));
if (address_interface->size() == 0) {
printf("\nNo addresses can be suspended.\n");
} else if (address_interface->size() == 1) {

View File

@ -25,6 +25,7 @@
#include "fdbclient/IClientApi.h"
#include "fdbclient/Knobs.h"
#include "fdbclient/ManagementAPI.actor.h"
#include "fdbclient/MetaclusterManagement.actor.h"
#include "fdbclient/TenantManagement.actor.h"
#include "fdbclient/Schemas.h"
@ -100,9 +101,9 @@ Key makeConfigKey(TenantNameRef tenantName, StringRef configName) {
return tenantConfigSpecialKeyRange.begin.withSuffix(Tuple().append(tenantName).append(configName).pack());
}
void applyConfiguration(Reference<ITransaction> tr,
TenantNameRef tenantName,
std::map<Standalone<StringRef>, Optional<Value>> configuration) {
void applyConfigurationToSpecialKeys(Reference<ITransaction> tr,
TenantNameRef tenantName,
std::map<Standalone<StringRef>, Optional<Value>> configuration) {
for (auto [configName, value] : configuration) {
if (value.present()) {
tr->set(makeConfigKey(tenantName, configName), value.get());
@ -136,21 +137,32 @@ ACTOR Future<bool> createTenantCommandActor(Reference<IDatabase> db, std::vector
}
loop {
tr->setOption(FDBTransactionOptions::SPECIAL_KEY_SPACE_ENABLE_WRITES);
try {
if (!doneExistenceCheck) {
// Hold the reference to the standalone's memory
state ThreadFuture<Optional<Value>> existingTenantFuture = tr->get(tenantNameKey);
Optional<Value> existingTenant = wait(safeThreadFutureToFuture(existingTenantFuture));
if (existingTenant.present()) {
throw tenant_already_exists();
tr->setOption(FDBTransactionOptions::SPECIAL_KEY_SPACE_ENABLE_WRITES);
tr->setOption(FDBTransactionOptions::READ_SYSTEM_KEYS);
state ClusterType clusterType = wait(TenantAPI::getClusterType(tr));
if (clusterType == ClusterType::METACLUSTER_MANAGEMENT) {
TenantMapEntry tenantEntry;
for (auto const& [name, value] : configuration.get()) {
tenantEntry.configure(name, value);
}
doneExistenceCheck = true;
wait(MetaclusterAPI::createTenant(db, tokens[1], tenantEntry));
} else {
if (!doneExistenceCheck) {
// Hold the reference to the standalone's memory
state ThreadFuture<Optional<Value>> existingTenantFuture = tr->get(tenantNameKey);
Optional<Value> existingTenant = wait(safeThreadFutureToFuture(existingTenantFuture));
if (existingTenant.present()) {
throw tenant_already_exists();
}
doneExistenceCheck = true;
}
tr->set(tenantNameKey, ValueRef());
applyConfigurationToSpecialKeys(tr, tokens[1], configuration.get());
wait(safeThreadFutureToFuture(tr->commit()));
}
tr->set(tenantNameKey, ValueRef());
applyConfiguration(tr, tokens[1], configuration.get());
wait(safeThreadFutureToFuture(tr->commit()));
break;
} catch (Error& e) {
state Error err(e);
@ -167,10 +179,12 @@ ACTOR Future<bool> createTenantCommandActor(Reference<IDatabase> db, std::vector
return true;
}
CommandFactory createTenantFactory("createtenant",
CommandHelp("createtenant <TENANT_NAME> [tenant_group=<TENANT_GROUP>]",
"creates a new tenant in the cluster",
"Creates a new tenant in the cluster with the specified name."));
CommandFactory createTenantFactory(
"createtenant",
CommandHelp("createtenant <TENANT_NAME> [tenant_group=<TENANT_GROUP>]",
"creates a new tenant in the cluster",
"Creates a new tenant in the cluster with the specified name. An optional group can be specified"
"that will require this tenant to be placed on the same cluster as other tenants in the same group."));
// deletetenant command
ACTOR Future<bool> deleteTenantCommandActor(Reference<IDatabase> db, std::vector<StringRef> tokens, int apiVersion) {
@ -184,20 +198,27 @@ ACTOR Future<bool> deleteTenantCommandActor(Reference<IDatabase> db, std::vector
state bool doneExistenceCheck = false;
loop {
tr->setOption(FDBTransactionOptions::SPECIAL_KEY_SPACE_ENABLE_WRITES);
try {
if (!doneExistenceCheck) {
// Hold the reference to the standalone's memory
state ThreadFuture<Optional<Value>> existingTenantFuture = tr->get(tenantNameKey);
Optional<Value> existingTenant = wait(safeThreadFutureToFuture(existingTenantFuture));
if (!existingTenant.present()) {
throw tenant_not_found();
tr->setOption(FDBTransactionOptions::SPECIAL_KEY_SPACE_ENABLE_WRITES);
tr->setOption(FDBTransactionOptions::READ_SYSTEM_KEYS);
state ClusterType clusterType = wait(TenantAPI::getClusterType(tr));
if (clusterType == ClusterType::METACLUSTER_MANAGEMENT) {
wait(MetaclusterAPI::deleteTenant(db, tokens[1]));
} else {
if (!doneExistenceCheck) {
// Hold the reference to the standalone's memory
state ThreadFuture<Optional<Value>> existingTenantFuture = tr->get(tenantNameKey);
Optional<Value> existingTenant = wait(safeThreadFutureToFuture(existingTenantFuture));
if (!existingTenant.present()) {
throw tenant_not_found();
}
doneExistenceCheck = true;
}
doneExistenceCheck = true;
tr->clear(tenantNameKey);
wait(safeThreadFutureToFuture(tr->commit()));
}
tr->clear(tenantNameKey);
wait(safeThreadFutureToFuture(tr->commit()));
break;
} catch (Error& e) {
state Error err(e);
@ -228,8 +249,8 @@ ACTOR Future<bool> listTenantsCommandActor(Reference<IDatabase> db, std::vector<
return false;
}
StringRef beginTenant = ""_sr;
StringRef endTenant = "\xff\xff"_sr;
state StringRef beginTenant = ""_sr;
state StringRef endTenant = "\xff\xff"_sr;
state int limit = 100;
if (tokens.size() >= 2) {
@ -256,12 +277,26 @@ ACTOR Future<bool> listTenantsCommandActor(Reference<IDatabase> db, std::vector<
loop {
try {
// Hold the reference to the standalone's memory
state ThreadFuture<RangeResult> kvsFuture =
tr->getRange(firstGreaterOrEqual(beginTenantKey), firstGreaterOrEqual(endTenantKey), limit);
RangeResult tenants = wait(safeThreadFutureToFuture(kvsFuture));
tr->setOption(FDBTransactionOptions::READ_SYSTEM_KEYS);
state ClusterType clusterType = wait(TenantAPI::getClusterType(tr));
state std::vector<TenantNameRef> tenantNames;
if (clusterType == ClusterType::METACLUSTER_MANAGEMENT) {
std::vector<std::pair<TenantName, TenantMapEntry>> tenants =
wait(MetaclusterAPI::listTenantsTransaction(tr, beginTenant, endTenant, limit));
for (auto tenant : tenants) {
tenantNames.push_back(tenant.first);
}
} else {
// Hold the reference to the standalone's memory
state ThreadFuture<RangeResult> kvsFuture =
tr->getRange(firstGreaterOrEqual(beginTenantKey), firstGreaterOrEqual(endTenantKey), limit);
RangeResult tenants = wait(safeThreadFutureToFuture(kvsFuture));
for (auto tenant : tenants) {
tenantNames.push_back(tenant.key.removePrefix(tenantMapSpecialKeyRange(apiVersion).begin));
}
}
if (tenants.empty()) {
if (tenantNames.empty()) {
if (tokens.size() == 1) {
fmt::print("The cluster has no tenants\n");
} else {
@ -270,10 +305,8 @@ ACTOR Future<bool> listTenantsCommandActor(Reference<IDatabase> db, std::vector<
}
int index = 0;
for (auto tenant : tenants) {
fmt::print(" {}. {}\n",
++index,
printable(tenant.key.removePrefix(tenantMapSpecialKeyRange(apiVersion).begin)).c_str());
for (auto tenantName : tenantNames) {
fmt::print(" {}. {}\n", ++index, printable(tenantName).c_str());
}
return true;
@ -309,15 +342,24 @@ ACTOR Future<bool> getTenantCommandActor(Reference<IDatabase> db, std::vector<St
loop {
try {
// Hold the reference to the standalone's memory
state ThreadFuture<Optional<Value>> tenantFuture = tr->get(tenantNameKey);
Optional<Value> tenant = wait(safeThreadFutureToFuture(tenantFuture));
if (!tenant.present()) {
throw tenant_not_found();
tr->setOption(FDBTransactionOptions::READ_SYSTEM_KEYS);
state ClusterType clusterType = wait(TenantAPI::getClusterType(tr));
state std::string tenantJson;
if (clusterType == ClusterType::METACLUSTER_MANAGEMENT) {
TenantMapEntry entry = wait(MetaclusterAPI::getTenantTransaction(tr, tokens[1]));
tenantJson = entry.toJson(apiVersion);
} else {
// Hold the reference to the standalone's memory
state ThreadFuture<Optional<Value>> tenantFuture = tr->get(tenantNameKey);
Optional<Value> tenant = wait(safeThreadFutureToFuture(tenantFuture));
if (!tenant.present()) {
throw tenant_not_found();
}
tenantJson = tenant.get().toString();
}
json_spirit::mValue jsonObject;
json_spirit::read_string(tenant.get().toString(), jsonObject);
json_spirit::read_string(tenantJson, jsonObject);
if (useJson) {
json_spirit::mObject resultObj;
@ -333,6 +375,7 @@ ACTOR Future<bool> getTenantCommandActor(Reference<IDatabase> db, std::vector<St
std::string prefix;
std::string tenantState;
std::string tenantGroup;
std::string assignedCluster;
doc.get("id", id);
@ -344,6 +387,7 @@ ACTOR Future<bool> getTenantCommandActor(Reference<IDatabase> db, std::vector<St
doc.get("tenant_state", tenantState);
bool hasTenantGroup = doc.tryGet("tenant_group.printable", tenantGroup);
bool hasAssignedCluster = doc.tryGet("assigned_cluster", assignedCluster);
fmt::print(" id: {}\n", id);
fmt::print(" prefix: {}\n", printable(prefix).c_str());
@ -351,8 +395,10 @@ ACTOR Future<bool> getTenantCommandActor(Reference<IDatabase> db, std::vector<St
if (hasTenantGroup) {
fmt::print(" tenant group: {}\n", tenantGroup.c_str());
}
if (hasAssignedCluster) {
fmt::print(" assigned cluster: {}\n", printable(assignedCluster).c_str());
}
}
return true;
} catch (Error& e) {
try {
@ -408,10 +454,17 @@ ACTOR Future<bool> configureTenantCommandActor(Reference<IDatabase> db, std::vec
state Reference<ITransaction> tr = db->createTransaction();
loop {
tr->setOption(FDBTransactionOptions::SPECIAL_KEY_SPACE_ENABLE_WRITES);
try {
applyConfiguration(tr, tokens[1], configuration.get());
wait(safeThreadFutureToFuture(tr->commit()));
tr->setOption(FDBTransactionOptions::SPECIAL_KEY_SPACE_ENABLE_WRITES);
tr->setOption(FDBTransactionOptions::READ_SYSTEM_KEYS);
ClusterType clusterType = wait(TenantAPI::getClusterType(tr));
if (clusterType == ClusterType::METACLUSTER_MANAGEMENT) {
TenantMapEntry tenantEntry;
wait(MetaclusterAPI::configureTenant(db, tokens[1], configuration.get()));
} else {
applyConfigurationToSpecialKeys(tr, tokens[1], configuration.get());
wait(safeThreadFutureToFuture(tr->commit()));
}
break;
} catch (Error& e) {
state Error err(e);
@ -456,50 +509,56 @@ ACTOR Future<bool> renameTenantCommandActor(Reference<IDatabase> db, std::vector
state Key tenantOldNameKey = tenantMapSpecialKeyRange(apiVersion).begin.withSuffix(tokens[1]);
state Key tenantNewNameKey = tenantMapSpecialKeyRange(apiVersion).begin.withSuffix(tokens[2]);
state bool firstTry = true;
state int64_t id;
state int64_t id = -1;
loop {
tr->setOption(FDBTransactionOptions::SPECIAL_KEY_SPACE_ENABLE_WRITES);
try {
// Hold the reference to the standalone's memory
state ThreadFuture<Optional<Value>> oldEntryFuture = tr->get(tenantOldNameKey);
state ThreadFuture<Optional<Value>> newEntryFuture = tr->get(tenantNewNameKey);
state Optional<Value> oldEntry = wait(safeThreadFutureToFuture(oldEntryFuture));
state Optional<Value> newEntry = wait(safeThreadFutureToFuture(newEntryFuture));
if (firstTry) {
if (!oldEntry.present()) {
throw tenant_not_found();
}
if (newEntry.present()) {
throw tenant_already_exists();
}
// Store the id we see when first reading this key
id = getTenantId(oldEntry.get());
firstTry = false;
tr->setOption(FDBTransactionOptions::SPECIAL_KEY_SPACE_ENABLE_WRITES);
tr->setOption(FDBTransactionOptions::READ_SYSTEM_KEYS);
state ClusterType clusterType = wait(TenantAPI::getClusterType(tr));
if (clusterType == ClusterType::METACLUSTER_MANAGEMENT) {
wait(MetaclusterAPI::renameTenant(db, tokens[1], tokens[2]));
} else {
// If we got commit_unknown_result, the rename may have already occurred.
if (newEntry.present()) {
int64_t checkId = getTenantId(newEntry.get());
if (id == checkId) {
ASSERT(!oldEntry.present() || getTenantId(oldEntry.get()) != id);
return true;
// Hold the reference to the standalone's memory
state ThreadFuture<Optional<Value>> oldEntryFuture = tr->get(tenantOldNameKey);
state ThreadFuture<Optional<Value>> newEntryFuture = tr->get(tenantNewNameKey);
state Optional<Value> oldEntry = wait(safeThreadFutureToFuture(oldEntryFuture));
state Optional<Value> newEntry = wait(safeThreadFutureToFuture(newEntryFuture));
if (firstTry) {
if (!oldEntry.present()) {
throw tenant_not_found();
}
if (newEntry.present()) {
throw tenant_already_exists();
}
// Store the id we see when first reading this key
id = getTenantId(oldEntry.get());
firstTry = false;
} else {
// If we got commit_unknown_result, the rename may have already occurred.
if (newEntry.present()) {
int64_t checkId = getTenantId(newEntry.get());
if (id == checkId) {
ASSERT(!oldEntry.present() || getTenantId(oldEntry.get()) != id);
return true;
}
// If the new entry is present but does not match, then
// the rename should fail, so we throw an error.
throw tenant_already_exists();
}
if (!oldEntry.present()) {
throw tenant_not_found();
}
int64_t checkId = getTenantId(oldEntry.get());
// If the id has changed since we made our first attempt,
// then it's possible we've already moved the tenant. Don't move it again.
if (id != checkId) {
throw tenant_not_found();
}
// If the new entry is present but does not match, then
// the rename should fail, so we throw an error.
throw tenant_already_exists();
}
if (!oldEntry.present()) {
throw tenant_not_found();
}
int64_t checkId = getTenantId(oldEntry.get());
// If the id has changed since we made our first attempt,
// then it's possible we've already moved the tenant. Don't move it again.
if (id != checkId) {
throw tenant_not_found();
}
tr->set(tenantRenameKey, tokens[2]);
wait(safeThreadFutureToFuture(tr->commit()));
}
tr->set(tenantRenameKey, tokens[2]);
wait(safeThreadFutureToFuture(tr->commit()));
break;
} catch (Error& e) {
state Error err(e);

View File

@ -62,56 +62,52 @@ ACTOR Future<std::string> getSpecialKeysFailureErrorMessage(Reference<ITransacti
return valueObj["message"].get_str();
}
ACTOR Future<Void> verifyAndAddInterface(std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface,
Reference<FlowLock> connectLock,
KeyValue kv) {
wait(connectLock->take());
state FlowLock::Releaser releaser(*connectLock);
state ClientWorkerInterface workerInterf;
try {
// the interface is backward compatible, so if parsing fails, the CLI version needs to be upgraded
workerInterf = BinaryReader::fromStringRef<ClientWorkerInterface>(kv.value, IncludeVersion());
} catch (Error& e) {
fprintf(stderr, "Error: %s; CLI version is too old, please update to use a newer version\n", e.what());
return Void();
}
state ClientLeaderRegInterface leaderInterf(workerInterf.address());
choose {
when(Optional<LeaderInfo> rep =
wait(brokenPromiseToNever(leaderInterf.getLeader.getReply(GetLeaderRequest())))) {
StringRef ip_port =
(kv.key.endsWith(LiteralStringRef(":tls")) ? kv.key.removeSuffix(LiteralStringRef(":tls")) : kv.key)
.removePrefix(LiteralStringRef("\xff\xff/worker_interfaces/"));
(*address_interface)[ip_port] = std::make_pair(kv.value, leaderInterf);
if (workerInterf.reboot.getEndpoint().addresses.secondaryAddress.present()) {
Key full_ip_port2 =
StringRef(workerInterf.reboot.getEndpoint().addresses.secondaryAddress.get().toString());
StringRef ip_port2 = full_ip_port2.endsWith(LiteralStringRef(":tls"))
? full_ip_port2.removeSuffix(LiteralStringRef(":tls"))
: full_ip_port2;
(*address_interface)[ip_port2] = std::make_pair(kv.value, leaderInterf);
}
void addInterfacesFromKVs(RangeResult& kvs,
std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface) {
for (const auto& kv : kvs) {
ClientWorkerInterface workerInterf;
try {
// the interface is backward compatible, so if parsing fails, the CLI version needs to be upgraded
workerInterf = BinaryReader::fromStringRef<ClientWorkerInterface>(kv.value, IncludeVersion());
} catch (Error& e) {
fprintf(stderr, "Error: %s; CLI version is too old, please update to use a newer version\n", e.what());
return;
}
ClientLeaderRegInterface leaderInterf(workerInterf.address());
StringRef ip_port =
(kv.key.endsWith(LiteralStringRef(":tls")) ? kv.key.removeSuffix(LiteralStringRef(":tls")) : kv.key)
.removePrefix(LiteralStringRef("\xff\xff/worker_interfaces/"));
(*address_interface)[ip_port] = std::make_pair(kv.value, leaderInterf);
if (workerInterf.reboot.getEndpoint().addresses.secondaryAddress.present()) {
Key full_ip_port2 =
StringRef(workerInterf.reboot.getEndpoint().addresses.secondaryAddress.get().toString());
StringRef ip_port2 = full_ip_port2.endsWith(LiteralStringRef(":tls"))
? full_ip_port2.removeSuffix(LiteralStringRef(":tls"))
: full_ip_port2;
(*address_interface)[ip_port2] = std::make_pair(kv.value, leaderInterf);
}
when(wait(delay(CLIENT_KNOBS->CLI_CONNECT_TIMEOUT))) {}
}
return Void();
}
ACTOR Future<Void> getWorkerInterfaces(Reference<ITransaction> tr,
std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface) {
std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface,
bool verify) {
if (verify) {
tr->setOption(FDBTransactionOptions::SPECIAL_KEY_SPACE_ENABLE_WRITES);
tr->set(workerInterfacesVerifyOptionSpecialKey, ValueRef());
}
// Hold the reference to the standalone's memory
state ThreadFuture<RangeResult> kvsFuture = tr->getRange(
KeyRangeRef(LiteralStringRef("\xff\xff/worker_interfaces/"), LiteralStringRef("\xff\xff/worker_interfaces0")),
CLIENT_KNOBS->TOO_MANY);
RangeResult kvs = wait(safeThreadFutureToFuture(kvsFuture));
state RangeResult kvs = wait(safeThreadFutureToFuture(kvsFuture));
ASSERT(!kvs.more);
auto connectLock = makeReference<FlowLock>(CLIENT_KNOBS->CLI_CONNECT_PARALLELISM);
std::vector<Future<Void>> addInterfs;
for (auto it : kvs) {
addInterfs.push_back(verifyAndAddInterface(address_interface, connectLock, it));
if (verify) {
// remove the option if set
tr->clear(workerInterfacesVerifyOptionSpecialKey);
}
wait(waitForAll(addInterfs));
addInterfacesFromKVs(kvs, address_interface);
return Void();
}

View File

@ -103,6 +103,7 @@ enum {
OPT_DEBUG_TLS,
OPT_API_VERSION,
OPT_MEMORY,
OPT_USE_FUTURE_PROTOCOL_VERSION
};
CSimpleOpt::SOption g_rgOptions[] = { { OPT_CONNFILE, "-C", SO_REQ_SEP },
@ -127,6 +128,7 @@ CSimpleOpt::SOption g_rgOptions[] = { { OPT_CONNFILE, "-C", SO_REQ_SEP },
{ OPT_DEBUG_TLS, "--debug-tls", SO_NONE },
{ OPT_API_VERSION, "--api-version", SO_REQ_SEP },
{ OPT_MEMORY, "--memory", SO_REQ_SEP },
{ OPT_USE_FUTURE_PROTOCOL_VERSION, "--use-future-protocol-version", SO_NONE },
TLS_OPTION_FLAGS,
SO_END_OF_OPTIONS };
@ -475,6 +477,9 @@ static void printProgramUsage(const char* name) {
" Useful in reporting and diagnosing TLS issues.\n"
" --build-flags Print build information and exit.\n"
" --memory Resident memory limit of the CLI (defaults to 8GiB).\n"
" --use-future-protocol-version\n"
" Use the simulated future protocol version to connect to the cluster.\n"
" This option can be used testing purposes only!\n"
" -v, --version Print FoundationDB CLI version information and exit.\n"
" -h, --help Display this help and exit.\n");
}
@ -578,7 +583,7 @@ void initHelp() {
void printVersion() {
printf("FoundationDB CLI " FDB_VT_PACKAGE_NAME " (v" FDB_VT_VERSION ")\n");
printf("source version %s\n", getSourceVersion());
printf("protocol %" PRIx64 "\n", currentProtocolVersion.version());
printf("protocol %" PRIx64 "\n", currentProtocolVersion().version());
}
void printBuildInformation() {
@ -872,6 +877,7 @@ struct CLIOptions {
Optional<std::string> exec;
bool initialStatusCheck = true;
bool cliHints = true;
bool useFutureProtocolVersion = false;
bool debugTLS = false;
std::string tlsCertPath;
std::string tlsKeyPath;
@ -973,6 +979,10 @@ struct CLIOptions {
break;
case OPT_NO_HINTS:
cliHints = false;
break;
case OPT_USE_FUTURE_PROTOCOL_VERSION:
useFutureProtocolVersion = true;
break;
// TLS Options
case TLSConfig::OPT_TLS_PLUGIN:
@ -1040,36 +1050,6 @@ Future<T> stopNetworkAfter(Future<T> what) {
}
}
ACTOR Future<Void> addInterface(std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface,
Reference<FlowLock> connectLock,
KeyValue kv) {
wait(connectLock->take());
state FlowLock::Releaser releaser(*connectLock);
state ClientWorkerInterface workerInterf =
BinaryReader::fromStringRef<ClientWorkerInterface>(kv.value, IncludeVersion());
state ClientLeaderRegInterface leaderInterf(workerInterf.address());
choose {
when(Optional<LeaderInfo> rep =
wait(brokenPromiseToNever(leaderInterf.getLeader.getReply(GetLeaderRequest())))) {
StringRef ip_port =
(kv.key.endsWith(LiteralStringRef(":tls")) ? kv.key.removeSuffix(LiteralStringRef(":tls")) : kv.key)
.removePrefix(LiteralStringRef("\xff\xff/worker_interfaces/"));
(*address_interface)[ip_port] = std::make_pair(kv.value, leaderInterf);
if (workerInterf.reboot.getEndpoint().addresses.secondaryAddress.present()) {
Key full_ip_port2 =
StringRef(workerInterf.reboot.getEndpoint().addresses.secondaryAddress.get().toString());
StringRef ip_port2 = full_ip_port2.endsWith(LiteralStringRef(":tls"))
? full_ip_port2.removeSuffix(LiteralStringRef(":tls"))
: full_ip_port2;
(*address_interface)[ip_port2] = std::make_pair(kv.value, leaderInterf);
}
}
when(wait(delay(CLIENT_KNOBS->CLI_CONNECT_TIMEOUT))) {}
}
return Void();
}
ACTOR Future<int> cli(CLIOptions opt, LineNoise* plinenoise) {
state LineNoise& linenoise = *plinenoise;
state bool intrans = false;
@ -1967,6 +1947,13 @@ ACTOR Future<int> cli(CLIOptions opt, LineNoise* plinenoise) {
continue;
}
if (tokencmp(tokens[0], "metacluster")) {
bool _result = wait(makeInterruptable(metaclusterCommand(db, tokens)));
if (!_result)
is_error = true;
continue;
}
fprintf(stderr, "ERROR: Unknown command `%s'. Try `help'?\n", formatStringRef(tokens[0]).c_str());
is_error = true;
}
@ -2192,6 +2179,9 @@ int main(int argc, char** argv) {
try {
API->selectApiVersion(opt.apiVersion);
if (opt.useFutureProtocolVersion) {
API->useFutureProtocolVersion();
}
API->setupNetwork();
opt.setupKnobs();
if (opt.exit_code != -1) {

View File

@ -120,6 +120,7 @@ extern const KeyRangeRef processClassSourceSpecialKeyRange;
extern const KeyRangeRef processClassTypeSpecialKeyRange;
// Other special keys
inline const KeyRef errorMsgSpecialKey = LiteralStringRef("\xff\xff/error_message");
inline const KeyRef workerInterfacesVerifyOptionSpecialKey = "\xff\xff/management/options/worker_interfaces/verify"_sr;
// help functions (Copied from fdbcli.actor.cpp)
// get all workers' info
@ -132,13 +133,14 @@ void printUsage(StringRef command);
// Pre: tr failed with special_keys_api_failure error
// Read the error message special key and return the message
ACTOR Future<std::string> getSpecialKeysFailureErrorMessage(Reference<ITransaction> tr);
// Using \xff\xff/worker_interfaces/ special key, get all worker interfaces
// Using \xff\xff/worker_interfaces/ special key, get all worker interfaces.
// A worker list will be returned from CC.
// If verify is set, we will try to establish connections to all workers returned.
// In particular, it will deserialize \xff\xff/worker_interfaces/<address>:=<ClientInterface> kv pairs and issue RPC
// calls, then only return the interfaces (kv pairs) the client can talk to.
ACTOR Future<Void> getWorkerInterfaces(Reference<ITransaction> tr,
std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface);
// Deserialize \xff\xff/worker_interfaces/<address>:=<ClientInterface> k-v pair and verify by a RPC call
ACTOR Future<Void> verifyAndAddInterface(std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface,
Reference<FlowLock> connectLock,
KeyValue kv);
std::map<Key, std::pair<Value, ClientLeaderRegInterface>>* address_interface,
bool verify = false);
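// A minimal caller sketch, assuming a transaction created from the connected database (it mirrors the suspend
// command shown earlier, which passes verify=true):
//   state std::map<Key, std::pair<Value, ClientLeaderRegInterface>> address_interface;
//   wait(getWorkerInterfaces(tr, &address_interface, true));
//   // address_interface now holds only the workers the client could reach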
// print cluster status info
void printStatus(StatusObjectReader statusObj,
StatusClient::StatusLevel level,
@ -200,6 +202,10 @@ ACTOR Future<bool> listTenantsCommandActor(Reference<IDatabase> db, std::vector<
// lock/unlock command
ACTOR Future<bool> lockCommandActor(Reference<IDatabase> db, std::vector<StringRef> tokens);
ACTOR Future<bool> unlockDatabaseActor(Reference<IDatabase> db, UID uid);
// metacluster command
Future<bool> metaclusterCommand(Reference<IDatabase> db, std::vector<StringRef> tokens);
// changefeed command
ACTOR Future<bool> changeFeedCommandActor(Database localDb,
Optional<TenantMapEntry> tenantEntry,

View File

@ -288,11 +288,46 @@ Reference<IBackupContainer> IBackupContainer::openContainer(const std::string& u
#ifdef BUILD_AZURE_BACKUP
else if (u.startsWith("azure://"_sr)) {
u.eat("azure://"_sr);
auto accountName = u.eat("@"_sr).toString();
auto endpoint = u.eat("/"_sr).toString();
auto containerName = u.eat("/"_sr).toString();
r = makeReference<BackupContainerAzureBlobStore>(
endpoint, accountName, containerName, encryptionKeyFileName);
auto address = u.eat("/"_sr);
if (address.endsWith(std::string(azure::storage_lite::constants::default_endpoint_suffix))) {
CODE_PROBE(true, "Azure backup url with standard azure storage account endpoint");
// <account>.<service>.core.windows.net/<resource_path>
auto endPoint = address.toString();
auto accountName = address.eat("."_sr).toString();
auto containerName = u.eat("/"_sr).toString();
r = makeReference<BackupContainerAzureBlobStore>(
endPoint, accountName, containerName, encryptionKeyFileName);
} else {
// resolve the network address if necessary
std::string endpoint(address.toString());
Optional<NetworkAddress> parsedAddress = NetworkAddress::parseOptional(endpoint);
if (!parsedAddress.present()) {
try {
auto hostname = Hostname::parse(endpoint);
auto resolvedAddress = hostname.resolveBlocking();
if (resolvedAddress.present()) {
CODE_PROBE(true, "Azure backup url with hostname in the endpoint");
parsedAddress = resolvedAddress.get();
}
} catch (Error& e) {
TraceEvent(SevError, "InvalidAzureBackupUrl").error(e).detail("Endpoint", endpoint);
throw backup_invalid_url();
}
}
if (!parsedAddress.present()) {
TraceEvent(SevError, "InvalidAzureBackupUrl").detail("Endpoint", endpoint);
throw backup_invalid_url();
}
auto accountName = u.eat("/"_sr).toString();
// Avoid including ":tls" and "(fromHostname)"
// note: the endpoint needs to contain the account name
// so either "<account_name>.blob.core.windows.net" or "<ip>:<port>/<account_name>"
endpoint =
fmt::format("{}/{}", formatIpPort(parsedAddress.get().ip, parsedAddress.get().port), accountName);
auto containerName = u.eat("/"_sr).toString();
r = makeReference<BackupContainerAzureBlobStore>(
endpoint, accountName, containerName, encryptionKeyFileName);
}
}
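// Illustrative URL shapes handled by the two branches above (account, host, and container names are placeholders,
// and the exact endpoint suffix comes from azure::storage_lite, so it may differ):
//   azure://myaccount.blob.core.windows.net/mycontainer/...   (standard storage-account endpoint)
//   azure://10.0.0.1:10000/myaccount/mycontainer/...          (explicit <ip>:<port>/<account> endpoint)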
#endif
else {

View File

@ -1523,11 +1523,46 @@ Reference<BackupContainerFileSystem> BackupContainerFileSystem::openContainerFS(
#ifdef BUILD_AZURE_BACKUP
else if (u.startsWith("azure://"_sr)) {
u.eat("azure://"_sr);
auto accountName = u.eat("@"_sr).toString();
auto endpoint = u.eat("/"_sr).toString();
auto containerName = u.eat("/"_sr).toString();
r = makeReference<BackupContainerAzureBlobStore>(
endpoint, accountName, containerName, encryptionKeyFileName);
auto address = u.eat("/"_sr);
if (address.endsWith(std::string(azure::storage_lite::constants::default_endpoint_suffix))) {
CODE_PROBE(true, "Azure backup url with standard azure storage account endpoint");
// <account>.<service>.core.windows.net/<resource_path>
auto endPoint = address.toString();
auto accountName = address.eat("."_sr).toString();
auto containerName = u.eat("/"_sr).toString();
r = makeReference<BackupContainerAzureBlobStore>(
endPoint, accountName, containerName, encryptionKeyFileName);
} else {
// resolve the network address if necessary
std::string endpoint(address.toString());
Optional<NetworkAddress> parsedAddress = NetworkAddress::parseOptional(endpoint);
if (!parsedAddress.present()) {
try {
auto hostname = Hostname::parse(endpoint);
auto resolvedAddress = hostname.resolveBlocking();
if (resolvedAddress.present()) {
CODE_PROBE(true, "Azure backup url with hostname in the endpoint");
parsedAddress = resolvedAddress.get();
}
} catch (Error& e) {
TraceEvent(SevError, "InvalidAzureBackupUrl").error(e).detail("Endpoint", endpoint);
throw backup_invalid_url();
}
}
if (!parsedAddress.present()) {
TraceEvent(SevError, "InvalidAzureBackupUrl").detail("Endpoint", endpoint);
throw backup_invalid_url();
}
auto accountName = u.eat("/"_sr).toString();
// Avoid including ":tls" and "(fromHostname)"
// note: the endpoint needs to contain the account name
// so either "<account_name>.blob.core.windows.net" or "<ip>:<port>/<account_name>"
endpoint =
fmt::format("{}/{}", formatIpPort(parsedAddress.get().ip, parsedAddress.get().port), accountName);
auto containerName = u.eat("/"_sr).toString();
r = makeReference<BackupContainerAzureBlobStore>(
endpoint, accountName, containerName, encryptionKeyFileName);
}
}
#endif
else {

View File

@ -0,0 +1,45 @@
/*
* BlobGranuleCommon.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2022 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/BlobGranuleCommon.h"
BlobGranuleSummaryRef summarizeGranuleChunk(Arena& ar, const BlobGranuleChunkRef& chunk) {
BlobGranuleSummaryRef summary;
ASSERT(chunk.snapshotFile.present());
ASSERT(chunk.snapshotVersion != invalidVersion);
ASSERT(chunk.includedVersion >= chunk.snapshotVersion);
ASSERT(chunk.newDeltas.empty());
if (chunk.tenantPrefix.present()) {
summary.keyRange = KeyRangeRef(ar, chunk.keyRange.removePrefix(chunk.tenantPrefix.get()));
} else {
summary.keyRange = KeyRangeRef(ar, chunk.keyRange);
}
summary.snapshotVersion = chunk.snapshotVersion;
summary.snapshotSize = chunk.snapshotFile.get().length;
summary.deltaVersion = chunk.includedVersion;
summary.deltaSize = 0;
for (auto& it : chunk.deltaFiles) {
summary.deltaSize += it.length;
}
return summary;
}

View File

@ -40,6 +40,7 @@
#include <cstring>
#include <fstream> // for perf microbenchmark
#include <limits>
#include <vector>
#define BG_READ_DEBUG false
@ -209,16 +210,21 @@ namespace {
BlobGranuleFileEncryptionKeys getEncryptBlobCipherKey(const BlobGranuleCipherKeysCtx cipherKeysCtx) {
BlobGranuleFileEncryptionKeys eKeys;
// The reconstructed cipher key is 'never' inserted into the BlobCipherKey cache, so choose 'neverExpire'
eKeys.textCipherKey = makeReference<BlobCipherKey>(cipherKeysCtx.textCipherKey.encryptDomainId,
cipherKeysCtx.textCipherKey.baseCipherId,
cipherKeysCtx.textCipherKey.baseCipher.begin(),
cipherKeysCtx.textCipherKey.baseCipher.size(),
cipherKeysCtx.textCipherKey.salt);
cipherKeysCtx.textCipherKey.salt,
std::numeric_limits<int64_t>::max(),
std::numeric_limits<int64_t>::max());
eKeys.headerCipherKey = makeReference<BlobCipherKey>(cipherKeysCtx.headerCipherKey.encryptDomainId,
cipherKeysCtx.headerCipherKey.baseCipherId,
cipherKeysCtx.headerCipherKey.baseCipher.begin(),
cipherKeysCtx.headerCipherKey.baseCipher.size(),
cipherKeysCtx.headerCipherKey.salt);
cipherKeysCtx.headerCipherKey.salt,
std::numeric_limits<int64_t>::max(),
std::numeric_limits<int64_t>::max());
return eKeys;
}
@ -346,7 +352,9 @@ struct IndexBlockRef {
decrypt(cipherKeysCtx.get(), *this, arena);
} else {
TraceEvent("IndexBlockSize").detail("Sz", buffer.size());
if (BG_ENCRYPT_COMPRESS_DEBUG) {
TraceEvent("IndexBlockSize").detail("Sz", buffer.size());
}
ObjectReader dataReader(buffer.begin(), IncludeVersion());
dataReader.deserialize(FileIdentifierFor<IndexBlock>::value, block, arena);
@ -368,7 +376,11 @@ struct IndexBlockRef {
arena, ObjectWriter::toValue(block, IncludeVersion(ProtocolVersion::withBlobGranuleFile())).contents());
}
TraceEvent(SevDebug, "IndexBlockSize").detail("Sz", buffer.size()).detail("Encrypted", cipherKeysCtx.present());
if (BG_ENCRYPT_COMPRESS_DEBUG) {
TraceEvent(SevDebug, "IndexBlockSize")
.detail("Sz", buffer.size())
.detail("Encrypted", cipherKeysCtx.present());
}
}
template <class Ar>
@ -804,10 +816,6 @@ static Standalone<VectorRef<ParsedDeltaBoundaryRef>> loadSnapshotFile(
ASSERT(file.indexBlockRef.block.children.size() >= 2);
// TODO: refactor this out of delta tree
// int commonPrefixLen = commonPrefixLength(index.dataBlockOffsets.front().first,
// index.dataBlockOffsets.back().first);
// find range of blocks needed to read
ChildBlockPointerRef* currentBlock = file.findStartBlock(keyRange.begin);
@ -1157,10 +1165,6 @@ Standalone<VectorRef<ParsedDeltaBoundaryRef>> loadChunkedDeltaFile(const Standal
ASSERT(file.indexBlockRef.block.children.size() >= 2);
// TODO: refactor this out of delta tree
// int commonPrefixLen = commonPrefixLength(index.dataBlockOffsets.front().first,
// index.dataBlockOffsets.back().first);
// find range of blocks needed to read
ChildBlockPointerRef* currentBlock = file.findStartBlock(keyRange.begin);
@ -1169,7 +1173,8 @@ Standalone<VectorRef<ParsedDeltaBoundaryRef>> loadChunkedDeltaFile(const Standal
return deltas;
}
// TODO: could cpu optimize first block a bit more by seeking right to start
// FIXME: shared prefix for key comparison
// FIXME: could cpu optimize first block a bit more by seeking right to start
bool lastBlock = false;
bool prevClearAfter = false;
while (!lastBlock) {
@ -1553,12 +1558,23 @@ RangeResult materializeBlobGranule(const BlobGranuleChunkRef& chunk,
return mergeDeltaStreams(chunk, streams, startClears);
}
struct GranuleLoadFreeHandle : NonCopyable, ReferenceCounted<GranuleLoadFreeHandle> {
const ReadBlobGranuleContext* granuleContext;
int64_t loadId;
GranuleLoadFreeHandle(const ReadBlobGranuleContext* granuleContext, int64_t loadId)
: granuleContext(granuleContext), loadId(loadId) {}
~GranuleLoadFreeHandle() { granuleContext->free_load_f(loadId, granuleContext->userContext); }
};
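// GranuleLoadFreeHandle ties each started load to a reference-counted handle: when the last reference is dropped
// (by clearing freeHandles after each chunk, or by unwinding on error), the destructor calls free_load_f, so the
// corresponding loads are always released.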
struct GranuleLoadIds {
Optional<int64_t> snapshotId;
std::vector<int64_t> deltaIds;
std::vector<Reference<GranuleLoadFreeHandle>> freeHandles;
};
static void startLoad(const ReadBlobGranuleContext granuleContext,
static void startLoad(const ReadBlobGranuleContext* granuleContext,
const BlobGranuleChunkRef& chunk,
GranuleLoadIds& loadIds) {
@ -1568,12 +1584,13 @@ static void startLoad(const ReadBlobGranuleContext granuleContext,
// FIXME: remove when we implement file multiplexing
ASSERT(chunk.snapshotFile.get().offset == 0);
ASSERT(chunk.snapshotFile.get().length == chunk.snapshotFile.get().fullFileLength);
loadIds.snapshotId = granuleContext.start_load_f(snapshotFname.c_str(),
snapshotFname.size(),
chunk.snapshotFile.get().offset,
chunk.snapshotFile.get().length,
chunk.snapshotFile.get().fullFileLength,
granuleContext.userContext);
loadIds.snapshotId = granuleContext->start_load_f(snapshotFname.c_str(),
snapshotFname.size(),
chunk.snapshotFile.get().offset,
chunk.snapshotFile.get().length,
chunk.snapshotFile.get().fullFileLength,
granuleContext->userContext);
loadIds.freeHandles.push_back(makeReference<GranuleLoadFreeHandle>(granuleContext, loadIds.snapshotId.get()));
}
loadIds.deltaIds.reserve(chunk.deltaFiles.size());
for (int deltaFileIdx = 0; deltaFileIdx < chunk.deltaFiles.size(); deltaFileIdx++) {
@ -1581,13 +1598,14 @@ static void startLoad(const ReadBlobGranuleContext granuleContext,
// FIXME: remove when we implement file multiplexing
ASSERT(chunk.deltaFiles[deltaFileIdx].offset == 0);
ASSERT(chunk.deltaFiles[deltaFileIdx].length == chunk.deltaFiles[deltaFileIdx].fullFileLength);
int64_t deltaLoadId = granuleContext.start_load_f(deltaFName.c_str(),
deltaFName.size(),
chunk.deltaFiles[deltaFileIdx].offset,
chunk.deltaFiles[deltaFileIdx].length,
chunk.deltaFiles[deltaFileIdx].fullFileLength,
granuleContext.userContext);
int64_t deltaLoadId = granuleContext->start_load_f(deltaFName.c_str(),
deltaFName.size(),
chunk.deltaFiles[deltaFileIdx].offset,
chunk.deltaFiles[deltaFileIdx].length,
chunk.deltaFiles[deltaFileIdx].fullFileLength,
granuleContext->userContext);
loadIds.deltaIds.push_back(deltaLoadId);
loadIds.freeHandles.push_back(makeReference<GranuleLoadFreeHandle>(granuleContext, deltaLoadId));
}
}
@ -1606,17 +1624,16 @@ ErrorOr<RangeResult> loadAndMaterializeBlobGranules(const Standalone<VectorRef<B
GranuleLoadIds loadIds[files.size()];
// Kick off first file reads if parallelism > 1
for (int i = 0; i < parallelism - 1 && i < files.size(); i++) {
startLoad(granuleContext, files[i], loadIds[i]);
}
try {
// Kick off first file reads if parallelism > 1
for (int i = 0; i < parallelism - 1 && i < files.size(); i++) {
startLoad(&granuleContext, files[i], loadIds[i]);
}
RangeResult results;
for (int chunkIdx = 0; chunkIdx < files.size(); chunkIdx++) {
// Kick off files for this granule if parallelism == 1, or future granule if parallelism > 1
if (chunkIdx + parallelism - 1 < files.size()) {
startLoad(granuleContext, files[chunkIdx + parallelism - 1], loadIds[chunkIdx + parallelism - 1]);
startLoad(&granuleContext, files[chunkIdx + parallelism - 1], loadIds[chunkIdx + parallelism - 1]);
}
RangeResult chunkRows;
@ -1632,7 +1649,8 @@ ErrorOr<RangeResult> loadAndMaterializeBlobGranules(const Standalone<VectorRef<B
}
}
StringRef deltaData[files[chunkIdx].deltaFiles.size()];
// +1 to avoid a UBSAN error for a variable-length array of size zero
StringRef deltaData[files[chunkIdx].deltaFiles.size() + 1];
for (int i = 0; i < files[chunkIdx].deltaFiles.size(); i++) {
deltaData[i] =
StringRef(granuleContext.get_load_f(loadIds[chunkIdx].deltaIds[i], granuleContext.userContext),
@ -1650,12 +1668,8 @@ ErrorOr<RangeResult> loadAndMaterializeBlobGranules(const Standalone<VectorRef<B
results.arena().dependsOn(chunkRows.arena());
results.append(results.arena(), chunkRows.begin(), chunkRows.size());
if (loadIds[chunkIdx].snapshotId.present()) {
granuleContext.free_load_f(loadIds[chunkIdx].snapshotId.get(), granuleContext.userContext);
}
for (int i = 0; i < loadIds[chunkIdx].deltaIds.size(); i++) {
granuleContext.free_load_f(loadIds[chunkIdx].deltaIds[i], granuleContext.userContext);
}
// free once done by forcing FreeHandles to trigger
loadIds[chunkIdx].freeHandles.clear();
}
return ErrorOr<RangeResult>(results);
} catch (Error& e) {
@ -2372,7 +2386,6 @@ void checkDeltaRead(const KeyValueGen& kvGen,
std::string filename = randomBGFilename(
deterministicRandom()->randomUniqueID(), deterministicRandom()->randomUniqueID(), readVersion, ".delta");
Standalone<BlobGranuleChunkRef> chunk;
// TODO need to add cipher keys meta
chunk.deltaFiles.emplace_back_deep(
chunk.arena(), filename, 0, serialized->size(), serialized->size(), kvGen.cipherKeys);
chunk.keyRange = kvGen.allRange;
@ -2429,7 +2442,6 @@ static std::tuple<KeyRange, Version, Version> randomizeKeyAndVersions(const KeyV
}
}
// TODO randomize begin and read version to sometimes +/- 1 and readRange begin and end to keyAfter sometimes
return { readRange, beginVersion, readVersion };
}
@ -2653,7 +2665,11 @@ TEST_CASE("/blobgranule/files/granuleReadUnitTest") {
serializedDeltaFiles,
inMemoryDeltas);
for (int i = 0; i < std::min(100, 5 + snapshotData.size() * deltaData.size()); i++) {
// prevent overflow by doing min before multiply
int maxRuns = 100;
int snapshotAndDeltaSize = 5 + std::min(maxRuns, snapshotData.size()) * std::min(maxRuns, deltaData.size());
int lim = std::min(maxRuns, snapshotAndDeltaSize);
for (int i = 0; i < lim; i++) {
auto params = randomizeKeyAndVersions(kvGen, deltaData);
fmt::print("Partial test {0}: [{1} - {2}) @ {3} - {4}\n",
i,

View File

@ -31,13 +31,6 @@
#include "fdbclient/FDBTypes.h"
#include "flow/actorcompiler.h" // This must be the last #include.
// TODO more efficient data structure besides std::map? PTree is unnecessary since this isn't versioned, but some other
// sorted thing could work. And if it used arenas it'd probably be more efficient with allocations, since everything
// else is in 1 arena and discarded at the end.
// TODO could refactor the file reading code from here and the delta file function into another actor,
// then this part would also be testable? but meh
ACTOR Future<Standalone<StringRef>> readFile(Reference<BlobConnectionProvider> bstoreProvider, BlobFilePointerRef f) {
try {
state Arena arena;
@ -140,3 +133,66 @@ ACTOR Future<Void> readBlobGranules(BlobGranuleFileRequest request,
return Void();
}
// Return true if a given range is fully covered by blob chunks
bool isRangeFullyCovered(KeyRange range, Standalone<VectorRef<BlobGranuleChunkRef>> blobChunks) {
std::vector<KeyRangeRef> blobRanges;
for (const BlobGranuleChunkRef& chunk : blobChunks) {
blobRanges.push_back(chunk.keyRange);
}
return range.isCovered(blobRanges);
}
void testAddChunkRange(KeyRef begin, KeyRef end, Standalone<VectorRef<BlobGranuleChunkRef>>& chunks) {
BlobGranuleChunkRef chunk;
chunk.keyRange = KeyRangeRef(begin, end);
chunks.push_back(chunks.arena(), chunk);
}
TEST_CASE("/fdbserver/blobgranule/isRangeCoveredByBlob") {
Standalone<VectorRef<BlobGranuleChunkRef>> chunks;
// chunk1 key_a1 - key_a9
testAddChunkRange("key_a1"_sr, "key_a9"_sr, chunks);
// chunk2 key_b1 - key_b9
testAddChunkRange("key_b1"_sr, "key_b9"_sr, chunks);
// check empty range. not covered
{ ASSERT(isRangeFullyCovered(KeyRangeRef(), chunks) == false); }
// check empty chunks. not covered
{
Standalone<VectorRef<BlobGranuleChunkRef>> emptyChunks;
ASSERT(isRangeFullyCovered(KeyRangeRef(), emptyChunks) == false);
}
// check '' to \xff
{ ASSERT(isRangeFullyCovered(KeyRangeRef(LiteralStringRef(""), LiteralStringRef("\xff")), chunks) == false); }
// check {key_a1, key_a9}
{ ASSERT(isRangeFullyCovered(KeyRangeRef("key_a1"_sr, "key_a9"_sr), chunks)); }
// check {key_a1, key_a3}
{ ASSERT(isRangeFullyCovered(KeyRangeRef("key_a1"_sr, "key_a3"_sr), chunks)); }
// check {key_a0, key_a3}
{ ASSERT(isRangeFullyCovered(KeyRangeRef("key_a0"_sr, "key_a3"_sr), chunks) == false); }
// check {key_a5, key_b5}
{
auto range = KeyRangeRef("key_a5"_sr, "key_b5"_sr);
ASSERT(isRangeFullyCovered(range, chunks) == false);
ASSERT(range.begin == "key_a5"_sr);
ASSERT(range.end == "key_b5"_sr);
}
// check continued chunks
{
Standalone<VectorRef<BlobGranuleChunkRef>> continuedChunks;
testAddChunkRange("key_a1"_sr, "key_a9"_sr, continuedChunks);
testAddChunkRange("key_a9"_sr, "key_b1"_sr, continuedChunks);
testAddChunkRange("key_b1"_sr, "key_b9"_sr, continuedChunks);
ASSERT(isRangeFullyCovered(KeyRangeRef("key_a1"_sr, "key_b9"_sr), continuedChunks) == false);
}
return Void();
}

View File

@ -90,8 +90,8 @@ add_flow_target(LINK_TEST NAME fdbclientlinktest SRCS LinkTest.cpp)
target_link_libraries(fdbclientlinktest PRIVATE fdbclient rapidxml) # re-link rapidxml due to private link interface
if(BUILD_AZURE_BACKUP)
target_link_libraries(fdbclient PRIVATE curl uuid azure-storage-lite)
target_link_libraries(fdbclient_sampling PRIVATE curl uuid azure-storage-lite)
target_link_libraries(fdbclient PRIVATE curl azure-storage-lite)
target_link_libraries(fdbclient_sampling PRIVATE curl azure-storage-lite)
endif()
if(BUILD_AWS_BACKUP)

View File

@ -42,10 +42,6 @@ void ClientKnobs::initialize(Randomize randomize) {
init( FAILURE_MAX_DELAY, 5.0 );
init( FAILURE_MIN_DELAY, 4.0 ); if( randomize && BUGGIFY ) FAILURE_MIN_DELAY = 1.0;
init( FAILURE_TIMEOUT_DELAY, FAILURE_MIN_DELAY );
init( CLIENT_FAILURE_TIMEOUT_DELAY, FAILURE_MIN_DELAY );
init( FAILURE_EMERGENCY_DELAY, 30.0 );
init( FAILURE_MAX_GENERATIONS, 10 );
init( RECOVERY_DELAY_START_GENERATION, 70 );
init( RECOVERY_DELAY_SECONDS_PER_GENERATION, 60.0 );
init( MAX_GENERATIONS, 100 );
@ -64,6 +60,7 @@ void ClientKnobs::initialize(Randomize randomize) {
init( WRONG_SHARD_SERVER_DELAY, .01 ); if( randomize && BUGGIFY ) WRONG_SHARD_SERVER_DELAY = deterministicRandom()->random01(); // FLOW_KNOBS->PREVENT_FAST_SPIN_DELAY; // SOMEDAY: This delay can limit performance of retrieving data when the cache is mostly wrong (e.g. dumping the database after a test)
init( FUTURE_VERSION_RETRY_DELAY, .01 ); if( randomize && BUGGIFY ) FUTURE_VERSION_RETRY_DELAY = deterministicRandom()->random01();// FLOW_KNOBS->PREVENT_FAST_SPIN_DELAY;
init( GRV_ERROR_RETRY_DELAY, 5.0 ); if( randomize && BUGGIFY ) GRV_ERROR_RETRY_DELAY = 0.01 + 5 * deterministicRandom()->random01();
init( UNKNOWN_TENANT_RETRY_DELAY, 0.0 ); if( randomize && BUGGIFY ) UNKNOWN_TENANT_RETRY_DELAY = deterministicRandom()->random01();
init( REPLY_BYTE_LIMIT, 80000 );
init( DEFAULT_BACKOFF, .01 ); if( randomize && BUGGIFY ) DEFAULT_BACKOFF = deterministicRandom()->random01();
@ -84,6 +81,7 @@ void ClientKnobs::initialize(Randomize randomize) {
init( CHANGE_FEED_CACHE_SIZE, 100000 ); if( randomize && BUGGIFY ) CHANGE_FEED_CACHE_SIZE = 1;
init( CHANGE_FEED_POP_TIMEOUT, 10.0 );
init( CHANGE_FEED_STREAM_MIN_BYTES, 1e4 ); if( randomize && BUGGIFY ) CHANGE_FEED_STREAM_MIN_BYTES = 1;
init( CHANGE_FEED_START_INTERVAL, 10.0 );
init( MAX_BATCH_SIZE, 1000 ); if( randomize && BUGGIFY ) MAX_BATCH_SIZE = 1;
init( GRV_BATCH_TIMEOUT, 0.005 ); if( randomize && BUGGIFY ) GRV_BATCH_TIMEOUT = 0.1;
@ -159,8 +157,6 @@ void ClientKnobs::initialize(Randomize randomize) {
init( BACKUP_AGGREGATE_POLL_RATE_UPDATE_INTERVAL, 60);
init( BACKUP_AGGREGATE_POLL_RATE, 2.0 ); // polls per second target for all agents on the cluster
init( BACKUP_LOG_WRITE_BATCH_MAX_SIZE, 1e6 ); //Must be much smaller than TRANSACTION_SIZE_LIMIT
init( BACKUP_LOG_ATOMIC_OPS_SIZE, 1000 );
init( BACKUP_OPERATION_COST_OVERHEAD, 50 );
init( BACKUP_MAX_LOG_RANGES, 21 ); if( randomize && BUGGIFY ) BACKUP_MAX_LOG_RANGES = 4;
init( BACKUP_SIM_COPY_LOG_RANGES, 100 );
init( BACKUP_VERSION_DELAY, 5*CORE_VERSIONSPERSECOND );
@ -279,18 +275,21 @@ void ClientKnobs::initialize(Randomize randomize) {
init( BUSYNESS_SPIKE_START_THRESHOLD, 0.100 );
init( BUSYNESS_SPIKE_SATURATED_THRESHOLD, 0.500 );
// multi-version client control
init( MVC_CLIENTLIB_CHUNK_SIZE, 8*1024 );
init( MVC_CLIENTLIB_CHUNKS_PER_TRANSACTION, 32 );
// Blob granules
init( BG_MAX_GRANULE_PARALLELISM, 10 );
init( BG_TOO_MANY_GRANULES, 10000 );
init( CHANGE_QUORUM_BAD_STATE_RETRY_TIMES, 3 );
init( CHANGE_QUORUM_BAD_STATE_RETRY_DELAY, 2.0 );
// Tenants and Metacluster
init( MAX_TENANTS_PER_CLUSTER, 1e6 ); if ( randomize && BUGGIFY ) MAX_TENANTS_PER_CLUSTER = deterministicRandom()->randomInt(20, 100);
init( MAX_TENANTS_PER_CLUSTER, 1e6 );
init( TENANT_TOMBSTONE_CLEANUP_INTERVAL, 60 ); if ( randomize && BUGGIFY ) TENANT_TOMBSTONE_CLEANUP_INTERVAL = deterministicRandom()->random01() * 30;
init( MAX_DATA_CLUSTERS, 1e5 );
init( REMOVE_CLUSTER_TENANT_BATCH_SIZE, 1e4 ); if ( randomize && BUGGIFY ) REMOVE_CLUSTER_TENANT_BATCH_SIZE = 1;
init( METACLUSTER_ASSIGNMENT_CLUSTERS_TO_CHECK, 5 ); if ( randomize && BUGGIFY ) METACLUSTER_ASSIGNMENT_CLUSTERS_TO_CHECK = 1;
init( METACLUSTER_ASSIGNMENT_FIRST_CHOICE_DELAY, 1.0 ); if ( randomize && BUGGIFY ) METACLUSTER_ASSIGNMENT_FIRST_CHOICE_DELAY = deterministicRandom()->random01() * 60;
init( METACLUSTER_ASSIGNMENT_AVAILABILITY_TIMEOUT, 10.0 ); if ( randomize && BUGGIFY ) METACLUSTER_ASSIGNMENT_AVAILABILITY_TIMEOUT = 1 + deterministicRandom()->random01() * 59;
// clang-format on
}
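The knob lines above all follow the same shape: init( KNOB, default ) followed by an optional BUGGIFY override in simulation. A minimal standalone sketch of that pattern is below; DemoKnobs, buggify(), and random01() are hypothetical stand-ins for ClientKnobs, BUGGIFY, and deterministicRandom()->random01().

#include <cstdio>
#include <random>

// Hypothetical stand-ins for FDB's BUGGIFY and deterministicRandom(); illustration only.
static std::mt19937 rng(42);
static bool buggify() { return rng() % 4 == 0; }
static double random01() { return std::uniform_real_distribution<double>(0.0, 1.0)(rng); }

struct DemoKnobs {
	double FUTURE_VERSION_RETRY_DELAY;
	double GRV_ERROR_RETRY_DELAY;

	void initialize(bool randomize) {
		// Default value first; optionally replaced with a randomized value in simulation,
		// mirroring the `init( KNOB, value ); if( randomize && BUGGIFY ) KNOB = ...;` lines above.
		FUTURE_VERSION_RETRY_DELAY = 0.01;
		if (randomize && buggify())
			FUTURE_VERSION_RETRY_DELAY = random01();

		GRV_ERROR_RETRY_DELAY = 5.0;
		if (randomize && buggify())
			GRV_ERROR_RETRY_DELAY = 0.01 + 5 * random01();
	}
};

int main() {
	DemoKnobs knobs;
	knobs.initialize(/*randomize=*/true);
	std::printf("FUTURE_VERSION_RETRY_DELAY=%f GRV_ERROR_RETRY_DELAY=%f\n",
	            knobs.FUTURE_VERSION_RETRY_DELAY,
	            knobs.GRV_ERROR_RETRY_DELAY);
	return 0;
}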

View File

@ -23,6 +23,7 @@
#include "fdbclient/CommitTransaction.h"
#include "fdbclient/FDBTypes.h"
#include "fdbclient/ReadYourWrites.h"
#include "flow/UnitTest.h"
#include "flow/actorcompiler.h" // has to be last include
void KeyRangeActorMap::getRangesAffectedByInsertion(const KeyRangeRef& keys, std::vector<KeyRange>& affectedRanges) {
@ -35,32 +36,54 @@ void KeyRangeActorMap::getRangesAffectedByInsertion(const KeyRangeRef& keys, std
affectedRanges.push_back(KeyRangeRef(keys.end, e.end()));
}
RangeResult krmDecodeRanges(KeyRef mapPrefix, KeyRange keys, RangeResult kv) {
RangeResult krmDecodeRanges(KeyRef mapPrefix, KeyRange keys, RangeResult kv, bool align) {
ASSERT(!kv.more || kv.size() > 1);
KeyRange withPrefix =
KeyRangeRef(mapPrefix.toString() + keys.begin.toString(), mapPrefix.toString() + keys.end.toString());
ValueRef beginValue, endValue;
if (kv.size() && kv[0].key.startsWith(mapPrefix))
beginValue = kv[0].value;
if (kv.size() && kv.end()[-1].key.startsWith(mapPrefix))
endValue = kv.end()[-1].value;
RangeResult result;
result.arena().dependsOn(kv.arena());
result.arena().dependsOn(keys.arena());
result.push_back(result.arena(), KeyValueRef(keys.begin, beginValue));
// Always push a kv pair whose key is <= keys.begin.
KeyRef beginKey = keys.begin;
if (!align && !kv.empty() && kv.front().key.startsWith(mapPrefix) && kv.front().key < withPrefix.begin) {
beginKey = kv[0].key.removePrefix(mapPrefix);
}
ValueRef beginValue;
if (!kv.empty() && kv.front().key.startsWith(mapPrefix) && kv.front().key <= withPrefix.begin) {
beginValue = kv.front().value;
}
result.push_back(result.arena(), KeyValueRef(beginKey, beginValue));
for (int i = 0; i < kv.size(); i++) {
if (kv[i].key > withPrefix.begin && kv[i].key < withPrefix.end) {
KeyRef k = kv[i].key.removePrefix(mapPrefix);
result.push_back(result.arena(), KeyValueRef(k, kv[i].value));
} else if (kv[i].key >= withPrefix.end)
} else if (kv[i].key >= withPrefix.end) {
kv.more = false;
// There should be at most 1 value past mapPrefix + keys.end.
ASSERT(i == kv.size() - 1);
break;
}
}
if (!kv.more)
result.push_back(result.arena(), KeyValueRef(keys.end, endValue));
if (!kv.more) {
KeyRef endKey = keys.end;
if (!align && !kv.empty() && kv.back().key.startsWith(mapPrefix) && kv.back().key >= withPrefix.end) {
endKey = kv.back().key.removePrefix(mapPrefix);
}
ValueRef endValue;
if (!kv.empty()) {
// In the aligned case, carry the last value to be the end value.
if (align && kv.back().key.startsWith(mapPrefix) && kv.back().key > withPrefix.end) {
endValue = result.back().value;
} else {
endValue = kv.back().value;
}
}
result.push_back(result.arena(), KeyValueRef(endKey, endValue));
}
result.more = kv.more;
return result;
@ -93,6 +116,37 @@ ACTOR Future<RangeResult> krmGetRanges(Reference<ReadYourWritesTransaction> tr,
return krmDecodeRanges(mapPrefix, keys, kv);
}
// Returns the transitional points in keys and their values. Unlike krmGetRanges, the first and last results
// are not clamped to keys.begin and keys.end; when the map's boundary entries fall outside the requested
// range, their actual keys are returned instead.
ACTOR Future<RangeResult> krmGetRangesUnaligned(Transaction* tr,
Key mapPrefix,
KeyRange keys,
int limit,
int limitBytes) {
KeyRange withPrefix =
KeyRangeRef(mapPrefix.toString() + keys.begin.toString(), mapPrefix.toString() + keys.end.toString());
state GetRangeLimits limits(limit, limitBytes);
limits.minRows = 2;
RangeResult kv = wait(tr->getRange(lastLessOrEqual(withPrefix.begin), firstGreaterThan(withPrefix.end), limits));
return krmDecodeRanges(mapPrefix, keys, kv, false);
}
ACTOR Future<RangeResult> krmGetRangesUnaligned(Reference<ReadYourWritesTransaction> tr,
Key mapPrefix,
KeyRange keys,
int limit,
int limitBytes) {
KeyRange withPrefix =
KeyRangeRef(mapPrefix.toString() + keys.begin.toString(), mapPrefix.toString() + keys.end.toString());
state GetRangeLimits limits(limit, limitBytes);
limits.minRows = 2;
RangeResult kv = wait(tr->getRange(lastLessOrEqual(withPrefix.begin), firstGreaterThan(withPrefix.end), limits));
return krmDecodeRanges(mapPrefix, keys, kv, false);
}
void krmSetPreviouslyEmptyRange(Transaction* tr,
const KeyRef& mapPrefix,
const KeyRangeRef& keys,
@ -254,3 +308,87 @@ Future<Void> krmSetRangeCoalescing(Reference<ReadYourWritesTransaction> const& t
Value const& value) {
return holdWhile(tr, krmSetRangeCoalescing_(tr.getPtr(), mapPrefix, range, maxRange, value));
}
TEST_CASE("/keyrangemap/decoderange/aligned") {
Arena arena;
Key prefix = LiteralStringRef("/prefix/");
StringRef fullKeyA = StringRef(arena, LiteralStringRef("/prefix/a"));
StringRef fullKeyB = StringRef(arena, LiteralStringRef("/prefix/b"));
StringRef fullKeyC = StringRef(arena, LiteralStringRef("/prefix/c"));
StringRef fullKeyD = StringRef(arena, LiteralStringRef("/prefix/d"));
StringRef keyA = StringRef(arena, LiteralStringRef("a"));
StringRef keyB = StringRef(arena, LiteralStringRef("b"));
StringRef keyC = StringRef(arena, LiteralStringRef("c"));
StringRef keyD = StringRef(arena, LiteralStringRef("d"));
StringRef keyE = StringRef(arena, LiteralStringRef("e"));
StringRef keyAB = StringRef(arena, LiteralStringRef("ab"));
StringRef keyCD = StringRef(arena, LiteralStringRef("cd"));
// Fake getRange() call.
RangeResult kv;
kv.push_back(arena, KeyValueRef(fullKeyA, keyA));
kv.push_back(arena, KeyValueRef(fullKeyB, keyB));
kv.push_back(arena, KeyValueRef(fullKeyC, keyC));
kv.push_back(arena, KeyValueRef(fullKeyD, keyD));
// [A, AB(start), B, C, CD(end), D]
RangeResult decodedRanges = krmDecodeRanges(prefix, KeyRangeRef(keyAB, keyCD), kv);
ASSERT(decodedRanges.size() == 4);
ASSERT(decodedRanges.front().key == keyAB);
ASSERT(decodedRanges.front().value == keyA);
ASSERT(decodedRanges.back().key == keyCD);
ASSERT(decodedRanges.back().value == keyC);
// [""(start), A, B, C, D, E(end)]
decodedRanges = krmDecodeRanges(prefix, KeyRangeRef(StringRef(), keyE), kv);
ASSERT(decodedRanges.size() == 6);
ASSERT(decodedRanges.front().key == StringRef());
ASSERT(decodedRanges.front().value == StringRef());
ASSERT(decodedRanges.back().key == keyE);
ASSERT(decodedRanges.back().value == keyD);
return Void();
}
TEST_CASE("/keyrangemap/decoderange/unaligned") {
Arena arena;
Key prefix = LiteralStringRef("/prefix/");
StringRef fullKeyA = StringRef(arena, LiteralStringRef("/prefix/a"));
StringRef fullKeyB = StringRef(arena, LiteralStringRef("/prefix/b"));
StringRef fullKeyC = StringRef(arena, LiteralStringRef("/prefix/c"));
StringRef fullKeyD = StringRef(arena, LiteralStringRef("/prefix/d"));
StringRef keyA = StringRef(arena, LiteralStringRef("a"));
StringRef keyB = StringRef(arena, LiteralStringRef("b"));
StringRef keyC = StringRef(arena, LiteralStringRef("c"));
StringRef keyD = StringRef(arena, LiteralStringRef("d"));
StringRef keyE = StringRef(arena, LiteralStringRef("e"));
StringRef keyAB = StringRef(arena, LiteralStringRef("ab"));
StringRef keyCD = StringRef(arena, LiteralStringRef("cd"));
// Fake getRange() call.
RangeResult kv;
kv.push_back(arena, KeyValueRef(fullKeyA, keyA));
kv.push_back(arena, KeyValueRef(fullKeyB, keyB));
kv.push_back(arena, KeyValueRef(fullKeyC, keyC));
kv.push_back(arena, KeyValueRef(fullKeyD, keyD));
// [A, AB(start), B, C, CD(end), D]
RangeResult decodedRanges = krmDecodeRanges(prefix, KeyRangeRef(keyAB, keyCD), kv, false);
ASSERT(decodedRanges.size() == 4);
ASSERT(decodedRanges.front().key == keyA);
ASSERT(decodedRanges.front().value == keyA);
ASSERT(decodedRanges.back().key == keyD);
ASSERT(decodedRanges.back().value == keyD);
// [""(start), A, B, C, D, E(end)]
decodedRanges = krmDecodeRanges(prefix, KeyRangeRef(StringRef(), keyE), kv, false);
ASSERT(decodedRanges.size() == 6);
ASSERT(decodedRanges.front().key == StringRef());
ASSERT(decodedRanges.front().value == StringRef());
ASSERT(decodedRanges.back().key == keyE);
ASSERT(decodedRanges.back().value == keyD);
return Void();
}
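The two test cases above pin down the difference between aligned and unaligned decoding: aligned results clamp the first and last keys to the requested boundaries, while unaligned results keep the map's actual boundary keys. A simplified standalone sketch of that boundary handling follows, using std::map and std::string in place of RangeResult and KeyRef; it is illustrative only, not the krmDecodeRanges implementation.

#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, std::string>;

std::vector<KV> decodeRanges(const std::map<std::string, std::string>& kv,
                             const std::string& begin,
                             const std::string& end,
                             bool align) {
	std::vector<KV> result;
	// Boundary entry at or before `begin`: aligned output clamps the key to `begin`,
	// unaligned output keeps the stored key when it lies strictly before `begin`.
	auto it = kv.upper_bound(begin);
	std::string beginKey = begin, beginValue;
	if (it != kv.begin()) {
		--it;
		beginValue = it->second;
		if (!align && it->first < begin)
			beginKey = it->first;
		++it;
	}
	result.push_back({ beginKey, beginValue });
	// Interior transition points are emitted unchanged.
	for (; it != kv.end() && it->first < end; ++it)
		result.push_back({ it->first, it->second });
	// Boundary entry at or after `end`: clamped to `end` when aligned, kept as stored otherwise.
	std::string endKey = end, endValue = result.back().second;
	if (it != kv.end()) {
		if (!align)
			endKey = it->first;
		endValue = align ? result.back().second : it->second;
	}
	result.push_back({ endKey, endValue });
	return result;
}

int main() {
	std::map<std::string, std::string> kv = { { "a", "a" }, { "b", "b" }, { "c", "c" }, { "d", "d" } };

	auto aligned = decodeRanges(kv, "ab", "cd", /*align=*/true);
	assert(aligned.size() == 4 && aligned.front().first == "ab" && aligned.back().first == "cd");

	auto unaligned = decodeRanges(kv, "ab", "cd", /*align=*/false);
	assert(unaligned.size() == 4 && unaligned.front().first == "a" && unaligned.back().first == "d");
	return 0;
}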

View File

@ -2559,7 +2559,7 @@ TEST_CASE("/ManagementAPI/AutoQuorumChange/checkLocality") {
ProcessClass(ProcessClass::CoordinatorClass, ProcessClass::CommandLineSource),
"",
"",
currentProtocolVersion);
currentProtocolVersion());
}
workers.push_back(data);

fdbclient/Metacluster.cpp Normal file
View File

@ -0,0 +1,71 @@
/*
* Metacluster.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2022 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/Metacluster.h"
#include "fdbclient/MetaclusterManagement.actor.h"
FDB_DEFINE_BOOLEAN_PARAM(AddNewTenants);
FDB_DEFINE_BOOLEAN_PARAM(RemoveMissingTenants);
std::string DataClusterEntry::clusterStateToString(DataClusterState clusterState) {
switch (clusterState) {
case DataClusterState::READY:
return "ready";
case DataClusterState::REMOVING:
return "removing";
case DataClusterState::RESTORING:
return "restoring";
default:
UNREACHABLE();
}
}
DataClusterState DataClusterEntry::stringToClusterState(std::string stateStr) {
if (stateStr == "ready") {
return DataClusterState::READY;
} else if (stateStr == "removing") {
return DataClusterState::REMOVING;
} else if (stateStr == "restoring") {
return DataClusterState::RESTORING;
}
UNREACHABLE();
}
json_spirit::mObject DataClusterEntry::toJson() const {
json_spirit::mObject obj;
obj["capacity"] = capacity.toJson();
obj["allocated"] = allocated.toJson();
obj["cluster_state"] = DataClusterEntry::clusterStateToString(clusterState);
return obj;
}
json_spirit::mObject ClusterUsage::toJson() const {
json_spirit::mObject obj;
obj["num_tenant_groups"] = numTenantGroups;
return obj;
}
KeyBackedObjectProperty<MetaclusterRegistrationEntry, decltype(IncludeVersion())>&
MetaclusterMetadata::metaclusterRegistration() {
static KeyBackedObjectProperty<MetaclusterRegistrationEntry, decltype(IncludeVersion())> instance(
"\xff/metacluster/clusterRegistration"_sr, IncludeVersion());
return instance;
}

View File

@ -0,0 +1,67 @@
/*
* MetaclusterManagement.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2022 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/ClusterConnectionMemoryRecord.h"
#include "fdbclient/DatabaseContext.h"
#include "fdbclient/FDBTypes.h"
#include "fdbclient/MetaclusterManagement.actor.h"
#include "fdbclient/ThreadSafeTransaction.h"
#include "flow/actorcompiler.h" // has to be last include
namespace MetaclusterAPI {
ACTOR Future<Reference<IDatabase>> openDatabase(ClusterConnectionString connectionString) {
if (g_network->isSimulated()) {
Reference<IClusterConnectionRecord> clusterFile =
makeReference<ClusterConnectionMemoryRecord>(connectionString);
Database nativeDb = Database::createDatabase(clusterFile, -1);
Reference<IDatabase> threadSafeDb =
wait(unsafeThreadFutureToFuture(ThreadSafeDatabase::createFromExistingDatabase(nativeDb)));
return MultiVersionDatabase::debugCreateFromExistingDatabase(threadSafeDb);
} else {
return MultiVersionApi::api->createDatabaseFromConnectionString(connectionString.toString().c_str());
}
}
KeyBackedObjectMap<ClusterName, DataClusterEntry, decltype(IncludeVersion())>&
ManagementClusterMetadata::dataClusters() {
static KeyBackedObjectMap<ClusterName, DataClusterEntry, decltype(IncludeVersion())> instance(
"metacluster/dataCluster/metadata/"_sr, IncludeVersion());
return instance;
}
KeyBackedMap<ClusterName,
ClusterConnectionString,
TupleCodec<ClusterName>,
ManagementClusterMetadata::ConnectionStringCodec>
ManagementClusterMetadata::dataClusterConnectionRecords("metacluster/dataCluster/connectionString/"_sr);
KeyBackedSet<Tuple> ManagementClusterMetadata::clusterCapacityIndex("metacluster/clusterCapacityIndex/"_sr);
KeyBackedMap<ClusterName, int64_t, TupleCodec<ClusterName>, BinaryCodec<int64_t>>
ManagementClusterMetadata::clusterTenantCount("metacluster/clusterTenantCount/"_sr);
KeyBackedSet<Tuple> ManagementClusterMetadata::clusterTenantIndex("metacluster/dataCluster/tenantMap/"_sr);
KeyBackedSet<Tuple> ManagementClusterMetadata::clusterTenantGroupIndex("metacluster/dataCluster/tenantGroupMap/"_sr);
TenantMetadataSpecification& ManagementClusterMetadata::tenantMetadata() {
static TenantMetadataSpecification instance(""_sr);
return instance;
}
}; // namespace MetaclusterAPI
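The dataClusters() and tenantMetadata() accessors above, like metaclusterRegistration() in Metacluster.cpp, return function-local statics rather than namespace-scope globals. A minimal sketch of that construct-on-first-use idiom is below; KeyBackedRegistry and dataClusterRegistry are hypothetical stand-ins for the KeyBacked* metadata types and their accessors.

#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a KeyBacked* metadata object keyed by a subspace prefix.
struct KeyBackedRegistry {
	std::string prefix;
	explicit KeyBackedRegistry(std::string p) : prefix(std::move(p)) {}
};

// Construct-on-first-use: the object is built the first time the accessor runs, which
// sidesteps cross-translation-unit static initialization order problems and avoids
// constructing the object at all if it is never used.
KeyBackedRegistry& dataClusterRegistry() {
	static KeyBackedRegistry instance("metacluster/dataCluster/metadata/");
	return instance;
}

int main() {
	assert(&dataClusterRegistry() == &dataClusterRegistry()); // same instance on every call
	assert(dataClusterRegistry().prefix == "metacluster/dataCluster/metadata/");
	return 0;
}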

View File

@ -663,69 +663,43 @@ ACTOR Future<Void> asyncDeserializeClusterInterface(Reference<AsyncVar<Value>> s
}
}
struct ClientStatusStats {
int count;
std::vector<std::pair<NetworkAddress, Key>> examples;
namespace {
ClientStatusStats() : count(0) { examples.reserve(CLIENT_KNOBS->CLIENT_EXAMPLE_AMOUNT); }
};
void tryInsertIntoSamples(OpenDatabaseRequest::Samples& samples,
const NetworkAddress& networkAddress,
const Key& traceLogGroup) {
++samples.count;
if (samples.samples.size() < static_cast<size_t>(CLIENT_KNOBS->CLIENT_EXAMPLE_AMOUNT)) {
samples.samples.insert({ networkAddress, traceLogGroup });
}
}
} // namespace
OpenDatabaseRequest ClientData::getRequest() {
OpenDatabaseRequest req;
std::map<StringRef, ClientStatusStats> issueMap;
std::map<ClientVersionRef, ClientStatusStats> versionMap;
std::map<StringRef, ClientStatusStats> maxProtocolMap;
int clientCount = 0;
// SOMEDAY: add a yield in this loop
for (auto& ci : clientStatusInfoMap) {
for (auto& it : ci.second.issues) {
auto& entry = issueMap[it];
entry.count++;
if (entry.examples.size() < CLIENT_KNOBS->CLIENT_EXAMPLE_AMOUNT) {
entry.examples.emplace_back(ci.first, ci.second.traceLogGroup);
}
}
if (ci.second.versions.size()) {
clientCount++;
StringRef maxProtocol;
for (auto& it : ci.second.versions) {
maxProtocol = std::max(maxProtocol, it.protocolVersion);
auto& entry = versionMap[it];
entry.count++;
if (entry.examples.size() < CLIENT_KNOBS->CLIENT_EXAMPLE_AMOUNT) {
entry.examples.emplace_back(ci.first, ci.second.traceLogGroup);
}
}
auto& maxEntry = maxProtocolMap[maxProtocol];
maxEntry.count++;
if (maxEntry.examples.size() < CLIENT_KNOBS->CLIENT_EXAMPLE_AMOUNT) {
maxEntry.examples.emplace_back(ci.first, ci.second.traceLogGroup);
}
} else {
auto& entry = versionMap[ClientVersionRef()];
entry.count++;
if (entry.examples.size() < CLIENT_KNOBS->CLIENT_EXAMPLE_AMOUNT) {
entry.examples.emplace_back(ci.first, ci.second.traceLogGroup);
}
}
}
const auto& networkAddress = ci.first;
const auto& traceLogGroup = ci.second.traceLogGroup;
req.issues.reserve(issueMap.size());
for (auto& it : issueMap) {
req.issues.push_back(ItemWithExamples<Key>(it.first, it.second.count, it.second.examples));
for (auto& issue : ci.second.issues) {
tryInsertIntoSamples(req.issues[issue], networkAddress, traceLogGroup);
}
if (!ci.second.versions.size()) {
tryInsertIntoSamples(req.supportedVersions[ClientVersionRef()], networkAddress, traceLogGroup);
continue;
}
++req.clientCount;
StringRef maxProtocol;
for (auto& it : ci.second.versions) {
maxProtocol = std::max(maxProtocol, it.protocolVersion);
tryInsertIntoSamples(req.supportedVersions[it], networkAddress, traceLogGroup);
}
tryInsertIntoSamples(req.maxProtocolSupported[maxProtocol], networkAddress, traceLogGroup);
}
req.supportedVersions.reserve(versionMap.size());
for (auto& it : versionMap) {
req.supportedVersions.push_back(
ItemWithExamples<Standalone<ClientVersionRef>>(it.first, it.second.count, it.second.examples));
}
req.maxProtocolSupported.reserve(maxProtocolMap.size());
for (auto& it : maxProtocolMap) {
req.maxProtocolSupported.push_back(ItemWithExamples<Key>(it.first, it.second.count, it.second.examples));
}
req.clientCount = clientCount;
return req;
}
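The refactor above replaces the per-bucket ClientStatusStats maps with OpenDatabaseRequest::Samples populated by tryInsertIntoSamples, which counts every client but stores at most CLIENT_EXAMPLE_AMOUNT examples per bucket. Below is a standalone sketch of that bounded-examples counter, with std::string stand-ins for NetworkAddress and Key and a hypothetical kExampleLimit in place of the knob.

#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>

constexpr std::size_t kExampleLimit = 2; // hypothetical stand-in for CLIENT_KNOBS->CLIENT_EXAMPLE_AMOUNT

struct Samples {
	int64_t count = 0;
	std::set<std::pair<std::string, std::string>> samples; // (network address, trace log group)
};

// Count every occurrence, but remember at most kExampleLimit distinct examples.
void tryInsertIntoSamples(Samples& s, const std::string& address, const std::string& traceLogGroup) {
	++s.count;
	if (s.samples.size() < kExampleLimit) {
		s.samples.insert({ address, traceLogGroup });
	}
}

int main() {
	Samples issueSamples;
	tryInsertIntoSamples(issueSamples, "10.0.0.1:4500", "groupA");
	tryInsertIntoSamples(issueSamples, "10.0.0.2:4500", "groupA");
	tryInsertIntoSamples(issueSamples, "10.0.0.3:4500", "groupB"); // counted, but not kept as an example
	assert(issueSamples.count == 3);
	assert(issueSamples.samples.size() == kExampleLimit);
	return 0;
}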

Some files were not shown because too many files have changed in this diff.