Add option to limit resident memory and remove default memory limit (#6719)

Changing the `memory` option, in the config file and in the fdbserver/fdbbackup/fdbcli command-line arguments, to limit resident memory instead of virtual memory. Since `rlimit` doesn't support limiting resident memory, the current implementation has both fdbmonitor and the fdbserver/fdbbackup process check the process RSS periodically, and kill and restart the process if the limit is exceeded.

Adding a new `memory-vsize` option to limit virtual memory, for when the previous (backward-compatible) behavior is desired.

closes #6671, closes #6672
Yi Wu 2022-04-06 20:06:24 -07:00 committed by GitHub
parent f62904187e
commit 994b8c92f8
GPG Key ID: 4AEE18F83AFDEB23
12 changed files with 263 additions and 23 deletions

@@ -271,6 +271,7 @@ Using the default parameters, a process will restart immediately if it fails and
# maxlogssize = 100MiB
# class =
# memory = 8GiB
# memory-vsize =
# storage-memory = 1GiB
# cache-memory = 2GiB
# locality-machineid =
@@ -292,7 +293,8 @@ Contains default parameters for all fdbserver processes on this machine. These s
* ``logsize``: Roll over to a new log file after the current log file reaches the specified size. The default value is 10MiB.
* ``maxlogssize``: Delete the oldest log file when the total size of all log files exceeds the specified size. If set to 0B, old log files will not be deleted. The default value is 100MiB.
* ``class``: Process class specifying the roles that will be taken in the cluster. Recommended options are ``storage``, ``transaction``, ``stateless``. See :ref:`guidelines-process-class-config` for process class config recommendations.
* ``memory``: Maximum memory used by the process. The default value is 8GiB. When specified without a unit, MiB is assumed. This parameter does not change the memory allocation of the program. Rather, it sets a hard limit beyond which the process will kill itself and be restarted. The default value of 8GiB is double the intended memory usage in the default configuration (providing an emergency buffer to deal with memory leaks or similar problems). It is *not* recommended to decrease the value of this parameter below its default value. It may be *increased* if you wish to allocate a very large amount of storage engine memory or cache. In particular, when the ``storage-memory`` or ``cache-memory`` parameters are increased, the ``memory`` parameter should be increased by an equal amount.
* ``memory``: Maximum resident memory used by the process. The default value is 8GiB. When specified without a unit, MiB is assumed. Setting to 0 means unlimited. This parameter does not change the memory allocation of the program. Rather, it sets a hard limit beyond which the process will kill itself and be restarted. The default value of 8GiB is double the intended memory usage in the default configuration (providing an emergency buffer to deal with memory leaks or similar problems). It is *not* recommended to decrease the value of this parameter below its default value. It may be *increased* if you wish to allocate a very large amount of storage engine memory or cache. In particular, when the ``storage-memory`` or ``cache-memory`` parameters are increased, the ``memory`` parameter should be increased by an equal amount.
* ``memory-vsize``: Maximum virtual memory used by the process. The default value is 0, which means unlimited. When specified without a unit, MiB is assumed. As with ``memory``, this parameter does not change the memory allocation of the program. Rather, it sets a hard limit beyond which the process will kill itself and be restarted. If both limits are set, ``memory-vsize`` must be no less than ``memory``.
* ``storage-memory``: Maximum memory used for data storage. This parameter is used *only* with the memory storage engine, not the ssd storage engine. The default value is 1GiB. When specified without a unit, MB is assumed. Clusters will be restricted to using this amount of memory per process for purposes of data storage. Memory overhead associated with storing the data is counted against this total. If you increase the ``storage-memory`` parameter, you should also increase the ``memory`` parameter by the same amount.
* ``cache-memory``: Maximum memory used for caching pages from disk. The default value is 2GiB. When specified without a unit, MiB is assumed. If you increase the ``cache-memory`` parameter, you should also increase the ``memory`` parameter by the same amount.
* ``locality-machineid``: Machine identifier key. All processes on a machine should share a unique id. By default, processes on a machine determine a unique id to share. This does not generally need to be set.
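The sizing rule in the ``memory`` bullet (raise ``memory`` by the same amount as any increase to ``storage-memory`` or ``cache-memory``) amounts to a small calculation. The helper below is a hypothetical sketch, not part of FoundationDB; it assumes the documented defaults of 8GiB, 1GiB and 2GiB, and that ``storage-memory`` only counts when the memory storage engine is in use.

```cpp
#include <cassert>
#include <cstdint>

// Documented defaults for `memory`, `storage-memory` and `cache-memory`.
constexpr uint64_t kGiB = 1ull << 30;
constexpr uint64_t kDefaultMemory = 8 * kGiB;
constexpr uint64_t kDefaultStorageMemory = 1 * kGiB;
constexpr uint64_t kDefaultCacheMemory = 2 * kGiB;

// Hypothetical helper: suggest a value for `memory` by adding any increase of
// `storage-memory` (memory engine only) or `cache-memory` above its default.
// Decreases are ignored, since lowering `memory` below its default is not recommended.
uint64_t suggestedMemoryLimit(uint64_t storageMemory, uint64_t cacheMemory, bool memoryEngine) {
    uint64_t limit = kDefaultMemory;
    if (memoryEngine && storageMemory > kDefaultStorageMemory) {
        limit += storageMemory - kDefaultStorageMemory;
    }
    if (cacheMemory > kDefaultCacheMemory) {
        limit += cacheMemory - kDefaultCacheMemory;
    }
    return limit;
}
```

For example, raising ``cache-memory`` from 2GiB to 4GiB suggests ``memory = 10GiB``.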

@@ -29,6 +29,7 @@ Bindings
Other Changes
-------------
* OpenTracing support is now deprecated in favor of OpenTelemetry tracing, which will be enabled in a future release. `(PR #6478) <https://github.com/apple/foundationdb/pull/6478/files>`_
* Changed the ``memory`` option to limit resident memory instead of virtual memory. Added a new ``memory-vsize`` option if limiting virtual memory is desired. `(PR #6719) <https://github.com/apple/foundationdb/pull/6719>`_

Earlier release notes
---------------------

@@ -24,6 +24,7 @@
#include "flow/Arena.h"
#include "flow/ArgParseUtil.h"
#include "flow/Error.h"
#include "flow/SystemMonitor.h"
#include "flow/Trace.h"
#define BOOST_DATE_TIME_NO_LIB
#include <boost/interprocess/managed_shared_memory.hpp>
@@ -171,6 +172,7 @@ enum {
OPT_KNOB,
OPT_TRACE_LOG_GROUP,
OPT_MEMLIMIT,
OPT_VMEMLIMIT,
OPT_LOCALITY,
// DB constants
@@ -212,6 +214,7 @@ CSimpleOpt::SOption g_rgAgentOptions[] = {
{ OPT_LOCALITY, "--locality-", SO_REQ_SEP },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -257,6 +260,7 @@ CSimpleOpt::SOption g_rgBackupStartOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -283,6 +287,7 @@ CSimpleOpt::SOption g_rgBackupModifyOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -323,6 +328,7 @@ CSimpleOpt::SOption g_rgBackupStatusOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -352,6 +358,7 @@ CSimpleOpt::SOption g_rgBackupAbortOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -378,6 +385,7 @@ CSimpleOpt::SOption g_rgBackupCleanupOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -410,6 +418,7 @@ CSimpleOpt::SOption g_rgBackupDiscontinueOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -440,6 +449,7 @@ CSimpleOpt::SOption g_rgBackupWaitOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -466,6 +476,7 @@ CSimpleOpt::SOption g_rgBackupPauseOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -495,6 +506,7 @@ CSimpleOpt::SOption g_rgBackupExpireOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -531,6 +543,7 @@ CSimpleOpt::SOption g_rgBackupDeleteOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -561,6 +574,7 @@ CSimpleOpt::SOption g_rgBackupDescribeOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -593,6 +607,7 @@ CSimpleOpt::SOption g_rgBackupDumpOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -640,6 +655,7 @@ CSimpleOpt::SOption g_rgBackupListOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -675,6 +691,7 @@ CSimpleOpt::SOption g_rgBackupQueryOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -720,6 +737,7 @@ CSimpleOpt::SOption g_rgRestoreOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -757,6 +775,7 @@ CSimpleOpt::SOption g_rgDBAgentOptions[] = {
{ OPT_LOCALITY, "--locality-", SO_REQ_SEP },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -788,6 +807,7 @@ CSimpleOpt::SOption g_rgDBStartOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -820,6 +840,7 @@ CSimpleOpt::SOption g_rgDBStatusOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -851,6 +872,7 @@ CSimpleOpt::SOption g_rgDBSwitchOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -883,6 +905,7 @@ CSimpleOpt::SOption g_rgDBAbortOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -911,6 +934,7 @@ CSimpleOpt::SOption g_rgDBPauseOptions[] = {
{ OPT_CRASHONERROR, "--crash", SO_NONE },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_HELP, "-?", SO_NONE },
{ OPT_HELP, "-h", SO_NONE },
{ OPT_HELP, "--help", SO_NONE },
@@ -3411,6 +3435,7 @@ int main(int argc, char* argv[]) {
DstOnly dstOnly{ false };
LocalityData localities;
uint64_t memLimit = 8LL << 30;
uint64_t virtualMemLimit = 0; // unlimited
Optional<uint64_t> ti;
BackupTLSConfig tlsConfig;
Version dumpBegin = 0;
@@ -3756,6 +3781,15 @@ int main(int argc, char* argv[]) {
}
memLimit = ti.get();
break;
case OPT_VMEMLIMIT:
ti = parse_with_suffix(args->OptionArg(), "MiB");
if (!ti.present()) {
fprintf(stderr, "ERROR: Could not parse virtual memory limit from `%s'\n", args->OptionArg());
printHelpTeaser(argv[0]);
flushAndExit(FDB_EXIT_ERROR);
}
virtualMemLimit = ti.get();
break;
case OPT_BLOB_CREDENTIALS:
tlsConfig.blobCredentials.push_back(args->OptionArg());
break;
@@ -3885,7 +3919,7 @@ int main(int argc, char* argv[]) {
Error::init();
std::set_new_handler(&platform::outOfMemory);
setMemoryQuota(memLimit);
setMemoryQuota(virtualMemLimit);
Database db;
Database sourceDb;
@@ -3902,6 +3936,8 @@ int main(int argc, char* argv[]) {
return FDB_EXIT_ERROR;
}
Future<Void> memoryUsageMonitor = startMemoryUsageMonitor(memLimit);
IKnobCollection::setupKnobs(knobs);
// Reinitialize knobs in order to update knobs that are dependent on explicitly set knobs
IKnobCollection::getMutableGlobalKnobCollection().initialize(Randomize::False, IsSimulated::False);

@@ -42,10 +42,12 @@
#include "fdbclient/Tuple.h"
#include "fdbclient/ThreadSafeTransaction.h"
#include "flow/flow.h"
#include "flow/ArgParseUtil.h"
#include "flow/DeterministicRandom.h"
#include "flow/FastRef.h"
#include "flow/Platform.h"
#include "flow/SystemMonitor.h"
#include "flow/TLSConfig.actor.h"
#include "flow/ThreadHelper.actor.h"
@@ -98,6 +100,7 @@ enum {
OPT_KNOB,
OPT_DEBUG_TLS,
OPT_API_VERSION,
OPT_MEMORY,
};
CSimpleOpt::SOption g_rgOptions[] = { { OPT_CONNFILE, "-C", SO_REQ_SEP },
@@ -121,6 +124,7 @@ CSimpleOpt::SOption g_rgOptions[] = { { OPT_CONNFILE, "-C", SO_REQ_SEP },
{ OPT_KNOB, "--knob-", SO_REQ_SEP },
{ OPT_DEBUG_TLS, "--debug-tls", SO_NONE },
{ OPT_API_VERSION, "--api-version", SO_REQ_SEP },
{ OPT_MEMORY, "--memory", SO_REQ_SEP },
#ifndef TLS_DISABLED
TLS_OPTION_FLAGS
@@ -453,6 +457,7 @@ static void printProgramUsage(const char* name) {
" --debug-tls Prints the TLS configuration and certificate chain, then exits.\n"
" Useful in reporting and diagnosing TLS issues.\n"
" --build-flags Print build information and exit.\n"
" --memory Resident memory limit of the CLI (defaults to 8GiB).\n"
" -v, --version Print FoundationDB CLI version information and exit.\n"
" -h, --help Display this help and exit.\n");
}
@@ -990,6 +995,7 @@ struct CLIOptions {
std::string tlsVerifyPeers;
std::string tlsCAPath;
std::string tlsPassword;
uint64_t memLimit = 8uLL << 30;
std::vector<std::pair<std::string, std::string>> knobs;
@@ -1053,6 +1059,11 @@ struct CLIOptions {
}
break;
}
case OPT_MEMORY: {
std::string memoryArg(args.OptionArg());
memLimit = parse_with_suffix(memoryArg, "MiB").orDefault(8uLL << 30);
break;
}
case OPT_TRACE:
trace = true;
break;
@@ -2117,8 +2128,6 @@ int main(int argc, char** argv) {
platformInit();
Error::init();
std::set_new_handler(&platform::outOfMemory);
uint64_t memLimit = 8LL << 30;
setMemoryQuota(memLimit);
registerCrashHandler();
@@ -2230,6 +2239,7 @@ int main(int argc, char** argv) {
if (opt.exit_code != -1) {
return opt.exit_code;
}
Future<Void> memoryUsageMonitor = startMemoryUsageMonitor(opt.memLimit);
Future<int> cliFuture = runCli(opt);
Future<Void> timeoutFuture = opt.exit_timeout ? timeExit(opt.exit_timeout) : Never();
auto f = stopNetworkAfter(success(cliFuture) || timeoutFuture);

@@ -80,6 +80,9 @@
#include "fdbclient/SimpleIni.h"
#include "fdbclient/versions.h"
constexpr uint64_t DEFAULT_MEMORY_LIMIT = 8LL << 30;
constexpr double MEMORY_CHECK_INTERVAL = 2.0; // seconds
#ifdef __linux__
typedef fd_set* fdb_fd_set;
#elif defined(__APPLE__) || defined(__FreeBSD__)
@@ -397,6 +400,47 @@ int mkdir(std::string const& directory) {
return 0;
}
// Parse a size value using the same format as parse_with_suffix in flow.h. Returns 0 on failure.
uint64_t parseWithSuffix(const char* to_parse, const char* default_unit = nullptr) {
char* end_ptr = nullptr;
uint64_t ret = strtoull(to_parse, &end_ptr, 10);
if (end_ptr == to_parse) {
// failed to parse
return 0;
}
const char* unit = default_unit;
if (*end_ptr != 0) {
unit = end_ptr;
}
if (unit == nullptr) {
// no unit found
return 0;
}
if (strcmp(unit, "B") == 0) {
// do nothing
} else if (strcmp(unit, "KB") == 0) {
ret *= static_cast<uint64_t>(1e3);
} else if (strcmp(unit, "KiB") == 0) {
ret *= 1ull << 10;
} else if (strcmp(unit, "MB") == 0) {
ret *= static_cast<uint64_t>(1e6);
} else if (strcmp(unit, "MiB") == 0) {
ret *= 1ull << 20;
} else if (strcmp(unit, "GB") == 0) {
ret *= static_cast<uint64_t>(1e9);
} else if (strcmp(unit, "GiB") == 0) {
ret *= 1ull << 30;
} else if (strcmp(unit, "TB") == 0) {
ret *= static_cast<uint64_t>(1e12);
} else if (strcmp(unit, "TiB") == 0) {
ret *= 1ull << 40;
} else {
// unrecognized unit
ret = 0;
}
return ret;
}
struct Command {
private:
std::vector<std::string> commands;
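The `parseWithSuffix` helper above accepts a decimal number followed by an optional unit: decimal units (KB, MB, ...) are powers of 1000, binary units (KiB, MiB, ...) are powers of 1024, and a result of 0 signals a parse failure. The following standalone sketch, `parseSizeSketch` (a hypothetical name, not FoundationDB code), mirrors that behavior with a lookup table instead of the `strcmp` chain; the overflow check is an extra safeguard that the diff does not have.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <limits>

// Hypothetical standalone version of fdbmonitor's parseWithSuffix().
// Returns the size in bytes, or 0 if the input cannot be parsed.
uint64_t parseSizeSketch(const char* text, const char* defaultUnit = nullptr) {
    struct UnitScale {
        const char* name;
        uint64_t scale;
    };
    static const UnitScale kUnits[] = {
        { "B", 1ull },
        { "KB", 1000ull },                      { "KiB", 1ull << 10 },
        { "MB", 1000ull * 1000 },               { "MiB", 1ull << 20 },
        { "GB", 1000ull * 1000 * 1000 },        { "GiB", 1ull << 30 },
        { "TB", 1000ull * 1000 * 1000 * 1000 }, { "TiB", 1ull << 40 },
    };

    char* end = nullptr;
    uint64_t value = std::strtoull(text, &end, 10);
    if (end == text) {
        return 0; // no digits at all
    }
    // An explicit suffix wins; otherwise fall back to the default unit.
    const char* unit = (*end != '\0') ? end : defaultUnit;
    if (unit == nullptr) {
        return 0; // bare number and no default unit
    }
    for (const auto& u : kUnits) {
        if (std::strcmp(unit, u.name) == 0) {
            // Extra safeguard not present in the original: reject values that overflow.
            if (value > std::numeric_limits<uint64_t>::max() / u.scale) {
                return 0;
            }
            return value * u.scale;
        }
    }
    return 0; // unrecognized unit
}
```

With a bare number such as `512`, the result depends entirely on the default unit, which is why the config-file parser passes `"MiB"` for `memory`.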
@@ -416,6 +460,7 @@ public:
const char* delete_envvars;
bool deconfigured;
bool kill_on_configuration_change;
uint64_t memory_rss;
// one pair for each of stdout and stderr
int pipes[2][2];
@@ -423,7 +468,7 @@ public:
Command() : argv(nullptr) {}
Command(const CSimpleIni& ini, std::string _section, ProcessID id, fdb_fd_set fds, int* maxfd)
: fds(fds), argv(nullptr), section(_section), fork_retry_time(-1), quiet(false), delete_envvars(nullptr),
deconfigured(false), kill_on_configuration_change(true) {
deconfigured(false), kill_on_configuration_change(true), memory_rss(0) {
char _ssection[strlen(section.c_str()) + 22];
snprintf(_ssection, strlen(section.c_str()) + 22, "%s", id.c_str());
ssection = _ssection;
@@ -529,6 +574,22 @@ public:
log_msg(SevError, "Unable to resolve command for %s\n", ssection.c_str());
return;
}
const char* mem_rss = get_value_multi(ini, "memory", ssection.c_str(), section.c_str(), "general", nullptr);
#ifdef __linux__
if (mem_rss) {
memory_rss = parseWithSuffix(mem_rss, "MiB");
} else {
memory_rss = DEFAULT_MEMORY_LIMIT;
}
#else
if (mem_rss) {
// fdbmonitor does not currently implement the memory check on non-Linux systems, but the "memory" option is
// still passed to fdbserver, which will crash itself if the limit is exceeded.
log_msg(SevWarn, "Memory monitoring by fdbmonitor is not supported by current system\n");
}
#endif
std::stringstream ss(binary);
std::copy(std::istream_iterator<std::string>(ss),
std::istream_iterator<std::string>(),
@@ -537,6 +598,7 @@ public:
const char* id_s = ssection.c_str() + strlen(section.c_str()) + 1;
for (auto i : keys) {
// The "memory" option is handled by fdbmonitor, but we still pass it to fdbserver.
if (isParameterNameEqual(i.pItem, "command") || isParameterNameEqual(i.pItem, "restart-delay") ||
isParameterNameEqual(i.pItem, "initial-restart-delay") ||
isParameterNameEqual(i.pItem, "restart-backoff") ||
@@ -642,6 +704,31 @@ CSimpleOpt::SOption g_rgOptions[] = { { OPT_CONFFILE, "--conffile", SO_REQ_SEP }
{ OPT_HELP, "--help", SO_NONE },
SO_END_OF_OPTIONS };
// Return resident memory in bytes for the given process, or 0 on error.
uint64_t getRss(ProcessID id) {
#ifndef __linux__
// TODO: implement for non-linux
return 0;
#else
pid_t pid = id_pid[id];
char stat_path[100];
snprintf(stat_path, sizeof(stat_path), "/proc/%d/statm", pid);
FILE* stat_file = fopen(stat_path, "r");
if (stat_file == nullptr) {
log_msg(SevWarn, "Unable to open stat file for %s\n", id.c_str());
return 0;
}
long rss = 0;
int ret = fscanf(stat_file, "%*s%ld", &rss);
fclose(stat_file);
if (ret != 1) {
log_msg(SevWarn, "Unable to parse rss size for %s\n", id.c_str());
return 0;
}
return static_cast<uint64_t>(rss) * sysconf(_SC_PAGESIZE);
#endif
}
void start_process(Command* cmd, ProcessID id, uid_t uid, gid_t gid, int delay, sigset_t* mask) {
if (!cmd->argv)
return;
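On Linux, the second field of `/proc/<pid>/statm` is the resident set size in pages, so `getRss` multiplies it by the page size. A minimal standalone sketch of the same lookup, `residentBytesOf` (a hypothetical name), closes the file on every path and treats any `fscanf` result other than 1 as a failure. It is Linux-only and assumes `/proc` is mounted.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical Linux-only helper: resident memory of `pid` in bytes, or 0 on error.
uint64_t residentBytesOf(pid_t pid) {
    char path[64];
    std::snprintf(path, sizeof(path), "/proc/%d/statm", static_cast<int>(pid));
    std::FILE* f = std::fopen(path, "r");
    if (f == nullptr) {
        return 0; // process gone, or /proc not available
    }
    // statm fields (in pages): size resident shared text lib data dt.
    long residentPages = 0;
    int matched = std::fscanf(f, "%*s %ld", &residentPages);
    std::fclose(f); // close before checking, so no path leaks the FILE*
    if (matched != 1 || residentPages < 0) {
        return 0;
    }
    long pageSize = sysconf(_SC_PAGESIZE);
    if (pageSize <= 0) {
        return 0;
    }
    return static_cast<uint64_t>(residentPages) * static_cast<uint64_t>(pageSize);
}
```

Returning 0 on error means a process whose RSS cannot be read is never treated as over its limit.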
@@ -797,7 +884,7 @@ bool argv_equal(const char** a1, const char** a2) {
return true;
}
void kill_process(ProcessID id, bool wait = true) {
void kill_process(ProcessID id, bool wait = true, bool cleanup = true) {
pid_t pid = id_pid[id];
log_msg(SevInfo, "Killing process %d\n", pid);
@@ -807,9 +894,11 @@ void kill_process(ProcessID id, bool wait = true) {
waitpid(pid, nullptr, 0);
}
if (cleanup) {
pid_id.erase(pid);
id_pid.erase(id);
}
}
void load_conf(const char* confpath, uid_t& uid, gid_t& gid, sigset_t* mask, fdb_fd_set rfds, int* maxfd) {
log_msg(SevInfo, "Loading configuration %s\n", confpath);
@@ -1477,6 +1566,7 @@ int main(int argc, char** argv) {
#endif
bool reload = true;
double last_rss_check = timer();
while (1) {
if (reload) {
reload = false;
@@ -1534,21 +1624,37 @@ int main(int argc, char** argv) {
}
double end_time = std::numeric_limits<double>::max();
double now = timer();
// True if any process has a resident memory limit
bool need_rss_check = false;
for (auto& i : id_command) {
if (i.second->fork_retry_time >= 0) {
end_time = std::min(i.second->fork_retry_time, end_time);
}
// If process has a resident memory limit and is currently running
if (i.second->memory_rss > 0 && id_pid.count(i.first) > 0) {
need_rss_check = true;
}
}
bool timeout_for_rss_check = false;
if (need_rss_check && end_time > last_rss_check + MEMORY_CHECK_INTERVAL) {
end_time = last_rss_check + MEMORY_CHECK_INTERVAL;
timeout_for_rss_check = true;
}
struct timespec tv;
double timeout = -1;
if (end_time < std::numeric_limits<double>::max()) {
timeout = std::max(0.0, end_time - timer());
timeout = std::max(0.0, end_time - now);
if (timeout > 0) {
tv.tv_sec = timeout;
tv.tv_nsec = 1e9 * (timeout - tv.tv_sec);
}
}
bool is_timeout = false;
#ifdef __linux__
/* Block until something interesting happens (while atomically
unblocking signals) */
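The fdbmonitor loop now wakes up for one of two reasons: a pending fork retry, or, when at least one running process has a resident memory limit, the next RSS check `MEMORY_CHECK_INTERVAL` seconds after the previous one. The `timeout_for_rss_check` flag records whether the RSS check set the deadline, so that such a timeout does not trigger a configuration reload. A sketch of just that decision as a pure function; `WakeUp` and `nextWakeUp` are hypothetical names, and times are plain doubles in seconds as returned by `timer()`.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical result type: when to wake up, and whether the RSS check set it.
struct WakeUp {
    double endTime;
    bool forRssCheck;
};

// Mirrors the deadline logic of the fdbmonitor loop.
// forkRetryTimes: one entry per command; negative means "no retry pending".
WakeUp nextWakeUp(const std::vector<double>& forkRetryTimes,
                  bool needRssCheck,
                  double lastRssCheck,
                  double checkInterval) {
    double endTime = std::numeric_limits<double>::max(); // max() means "block indefinitely"
    for (double t : forkRetryTimes) {
        if (t >= 0) {
            endTime = std::min(endTime, t);
        }
    }
    bool forRssCheck = false;
    // The RSS check only sets the deadline if it is strictly earlier than any fork retry.
    if (needRssCheck && endTime > lastRssCheck + checkInterval) {
        endTime = lastRssCheck + checkInterval;
        forRssCheck = true;
    }
    return { endTime, forRssCheck };
}
```

The loop then turns `end_time` into a relative timeout with `std::max(0.0, end_time - now)`.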
@@ -1561,8 +1667,11 @@ int main(int argc, char** argv) {
}
if (nfds == 0) {
is_timeout = true;
if (!timeout_for_rss_check) {
reload = true;
}
}
#elif defined(__APPLE__) || defined(__FreeBSD__)
int nev = 0;
if (timeout < 0) {
@@ -1572,8 +1681,11 @@ int main(int argc, char** argv) {
}
if (nev == 0) {
is_timeout = true;
if (!timeout_for_rss_check) {
reload = true;
}
}
if (nev > 0) {
switch (ev.filter) {
@@ -1621,6 +1733,39 @@ int main(int argc, char** argv) {
}
#endif
if (is_timeout && timeout_for_rss_check) {
last_rss_check = timer();
std::vector<ProcessID> oom_ids;
for (auto& i : id_command) {
if (id_pid.count(i.first) == 0) {
// process is not running
continue;
}
uint64_t rss_limit = i.second->memory_rss;
if (rss_limit == 0) {
continue;
}
uint64_t current_rss = getRss(i.first);
if (current_rss > rss_limit) {
log_process_msg(SevWarn,
i.second->ssection.c_str(),
"Process %d being killed for exceeding resident memory limit, current %" PRIu64
", limit %" PRIu64 "\n",
id_pid[i.first],
current_rss,
i.second->memory_rss);
oom_ids.push_back(i.first);
}
}
// kill process without waiting, and rely on the SIGCHLD handling logic below to restart the process.
for (auto& id : oom_ids) {
kill_process(id, false /*wait*/, false /*cleanup*/);
}
if (oom_ids.size() > 0) {
child_exited = true;
}
}
/* select() could have returned because an exit signal was received */
if (exit_signal > 0) {
switch (exit_signal) {
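The RSS check first collects the IDs of over-limit processes and only then kills them, calling `kill_process` with `wait = false` and `cleanup = false`; setting `child_exited` leaves reaping, map cleanup and the restart to the existing SIGCHLD handling, as the code comment says. The selection step on its own, using hypothetical types (`ProcessInfo`, `findOverLimit`) and an injected RSS reader standing in for `getRss()` so it can be tested without real processes:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-process state; rssLimit == 0 means "no limit".
struct ProcessInfo {
    bool running;
    uint64_t rssLimit;
};

// Return the IDs of running processes whose resident memory exceeds their limit.
std::vector<std::string> findOverLimit(const std::map<std::string, ProcessInfo>& processes,
                                       const std::function<uint64_t(const std::string&)>& readRss) {
    std::vector<std::string> over;
    for (const auto& [id, info] : processes) {
        if (!info.running || info.rssLimit == 0) {
            continue; // not running, or unlimited
        }
        if (readRss(id) > info.rssLimit) {
            over.push_back(id);
        }
    }
    return over;
}
```

Separating selection from killing keeps the policy (which processes are over their limit) testable independently of the side effects.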

@@ -2182,10 +2182,13 @@ ACTOR Future<Void> commitProxyServerCore(CommitProxyInterface proxy,
// ((SERVER_MEM_LIMIT * COMMIT_BATCHES_MEM_FRACTION_OF_TOTAL) / COMMIT_BATCHES_MEM_TO_TOTAL_MEM_SCALE_FACTOR) is
// only an approximate formula for limiting the memory used. COMMIT_BATCHES_MEM_TO_TOTAL_MEM_SCALE_FACTOR is an
// estimate based on experiments and not an accurate one.
state int64_t commitBatchesMemoryLimit = std::min(
SERVER_KNOBS->COMMIT_BATCHES_MEM_BYTES_HARD_LIMIT,
state int64_t commitBatchesMemoryLimit = SERVER_KNOBS->COMMIT_BATCHES_MEM_BYTES_HARD_LIMIT;
if (SERVER_KNOBS->SERVER_MEM_LIMIT > 0) {
commitBatchesMemoryLimit = std::min(
commitBatchesMemoryLimit,
static_cast<int64_t>((SERVER_KNOBS->SERVER_MEM_LIMIT * SERVER_KNOBS->COMMIT_BATCHES_MEM_FRACTION_OF_TOTAL) /
SERVER_KNOBS->COMMIT_BATCHES_MEM_TO_TOTAL_MEM_SCALE_FACTOR));
}
TraceEvent(SevInfo, "CommitBatchesMemoryLimit").detail("BytesLimit", commitBatchesMemoryLimit);
addActor.send(monitorRemoteCommitted(&commitData));
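Because `memory` may now be 0 (unlimited), `SERVER_MEM_LIMIT` can be 0 as well, so the commit proxy starts from `COMMIT_BATCHES_MEM_BYTES_HARD_LIMIT` and only applies the cap derived from `SERVER_MEM_LIMIT` when that knob is positive. The same selection as a plain function; the parameters stand in for the knobs, and the numbers in the checks are made up for illustration rather than being FoundationDB defaults.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Mirrors the commitBatchesMemoryLimit computation. Parameters stand in for
// COMMIT_BATCHES_MEM_BYTES_HARD_LIMIT, SERVER_MEM_LIMIT,
// COMMIT_BATCHES_MEM_FRACTION_OF_TOTAL and COMMIT_BATCHES_MEM_TO_TOTAL_MEM_SCALE_FACTOR.
int64_t commitBatchesMemoryLimit(int64_t hardLimit,
                                 int64_t serverMemLimit,
                                 double fractionOfTotal,
                                 double scaleFactor) {
    int64_t limit = hardLimit;
    if (serverMemLimit > 0) { // 0 means no resident memory limit is configured
        limit = std::min(limit, static_cast<int64_t>((serverMemLimit * fractionOfTotal) / scaleFactor));
    }
    return limit;
}
```

For instance, with an 8GiB server limit, a fraction of 0.5 and a scale factor of 2.0, the derived cap is 2GiB.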

@@ -103,7 +103,7 @@ using namespace std::literals;
// clang-format off
enum {
OPT_CONNFILE, OPT_SEEDCONNFILE, OPT_SEEDCONNSTRING, OPT_ROLE, OPT_LISTEN, OPT_PUBLICADDR, OPT_DATAFOLDER, OPT_LOGFOLDER, OPT_PARENTPID, OPT_TRACER, OPT_NEWCONSOLE,
OPT_NOBOX, OPT_TESTFILE, OPT_RESTARTING, OPT_RESTORING, OPT_RANDOMSEED, OPT_KEY, OPT_MEMLIMIT, OPT_STORAGEMEMLIMIT, OPT_CACHEMEMLIMIT, OPT_MACHINEID,
OPT_NOBOX, OPT_TESTFILE, OPT_RESTARTING, OPT_RESTORING, OPT_RANDOMSEED, OPT_KEY, OPT_MEMLIMIT, OPT_VMEMLIMIT, OPT_STORAGEMEMLIMIT, OPT_CACHEMEMLIMIT, OPT_MACHINEID,
OPT_DCID, OPT_MACHINE_CLASS, OPT_BUGGIFY, OPT_VERSION, OPT_BUILD_FLAGS, OPT_CRASHONERROR, OPT_HELP, OPT_NETWORKIMPL, OPT_NOBUFSTDOUT, OPT_BUFSTDOUTERR,
OPT_TRACECLOCK, OPT_NUMTESTERS, OPT_DEVHELP, OPT_ROLLSIZE, OPT_MAXLOGS, OPT_MAXLOGSSIZE, OPT_KNOB, OPT_UNITTESTPARAM, OPT_TESTSERVERS, OPT_TEST_ON_SERVERS, OPT_METRICSCONNFILE,
OPT_METRICSPREFIX, OPT_LOGGROUP, OPT_LOCALITY, OPT_IO_TRUST_SECONDS, OPT_IO_TRUST_WARN_ONLY, OPT_FILESYSTEM, OPT_PROFILER_RSS_SIZE, OPT_KVFILE,
@@ -153,6 +153,7 @@ CSimpleOpt::SOption g_rgOptions[] = {
{ OPT_KEY, "--key", SO_REQ_SEP },
{ OPT_MEMLIMIT, "-m", SO_REQ_SEP },
{ OPT_MEMLIMIT, "--memory", SO_REQ_SEP },
{ OPT_VMEMLIMIT, "--memory-vsize", SO_REQ_SEP },
{ OPT_STORAGEMEMLIMIT, "-M", SO_REQ_SEP },
{ OPT_STORAGEMEMLIMIT, "--storage-memory", SO_REQ_SEP },
{ OPT_CACHEMEMLIMIT, "--cache-memory", SO_REQ_SEP },
@@ -634,7 +635,10 @@ static void printUsage(const char* name, bool devhelp) {
" Define a locality key. LOCALITYKEY is case-insensitive though"
" LOCALITYVALUE is not.");
printOptionUsage("-m SIZE, --memory SIZE",
" Memory limit. The default value is 8GiB. When specified"
" Resident memory limit. The default value is 8GiB. When specified"
" without a unit, MiB is assumed.");
printOptionUsage("--memory-vsize SIZE",
" Virtual memory limit. The default value is unlimited. When specified"
" without a unit, MiB is assumed.");
printOptionUsage("-M SIZE, --storage-memory SIZE",
" Maximum amount of memory used for storage. The default"
@@ -1002,9 +1006,10 @@ struct CLIOptions {
NetworkAddressList publicAddresses, listenAddresses;
const char* targetKey = nullptr;
int64_t memLimit =
uint64_t memLimit =
8LL << 30; // Nice to maintain the same default value for memLimit and SERVER_KNOBS->SERVER_MEM_LIMIT and
// SERVER_KNOBS->COMMIT_BATCHES_MEM_BYTES_HARD_LIMIT
uint64_t virtualMemLimit = 0; // unlimited
uint64_t storageMemLimit = 1LL << 30;
bool buggifyEnabled = false, faultInjectionEnabled = true, restarting = false;
Optional<Standalone<StringRef>> zoneId;
@@ -1434,6 +1439,15 @@ private:
}
memLimit = ti.get();
break;
case OPT_VMEMLIMIT:
ti = parse_with_suffix(args.OptionArg(), "MiB");
if (!ti.present()) {
fprintf(stderr, "ERROR: Could not parse virtual memory limit from `%s'\n", args.OptionArg());
printHelpTeaser(argv[0]);
flushAndExit(FDB_EXIT_ERROR);
}
virtualMemLimit = ti.get();
break;
case OPT_STORAGEMEMLIMIT:
ti = parse_with_suffix(args.OptionArg(), "MB");
if (!ti.present()) {
@@ -1783,19 +1797,25 @@ int main(int argc, char* argv[]) {
auto& g_knobs = IKnobCollection::getMutableGlobalKnobCollection();
g_knobs.setKnob("log_directory", KnobValue::create(opts.logFolder));
g_knobs.setKnob("conn_file", KnobValue::create(opts.connFile));
if (role != ServerRole::Simulation) {
g_knobs.setKnob("commit_batches_mem_bytes_hard_limit", KnobValue::create(int64_t{ opts.memLimit }));
if (role != ServerRole::Simulation && opts.memLimit > 0) {
g_knobs.setKnob("commit_batches_mem_bytes_hard_limit",
KnobValue::create(static_cast<int64_t>(opts.memLimit)));
}
IKnobCollection::setupKnobs(opts.knobs);
g_knobs.setKnob("server_mem_limit", KnobValue::create(int64_t{ opts.memLimit }));
g_knobs.setKnob("server_mem_limit", KnobValue::create(static_cast<int64_t>(opts.memLimit)));
// Reinitialize knobs in order to update knobs that are dependent on explicitly set knobs
g_knobs.initialize(Randomize::True, role == ServerRole::Simulation ? IsSimulated::True : IsSimulated::False);
// evictionPolicyStringToEnum will throw an exception if the string is not recognized as a valid
EvictablePageCache::evictionPolicyStringToEnum(FLOW_KNOBS->CACHE_EVICTION_POLICY);
if (opts.memLimit <= FLOW_KNOBS->PAGE_CACHE_4K) {
if (opts.memLimit > 0 && opts.virtualMemLimit > 0 && opts.memLimit > opts.virtualMemLimit) {
fprintf(stderr, "ERROR: --memory-vsize has to be no less than --memory\n");
flushAndExit(FDB_EXIT_ERROR);
}
if (opts.memLimit > 0 && opts.memLimit <= FLOW_KNOBS->PAGE_CACHE_4K) {
fprintf(stderr, "ERROR: --memory has to be larger than --cache-memory\n");
flushAndExit(FDB_EXIT_ERROR);
}
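fdbserver now rejects inconsistent memory flags at startup, treating 0 as "unlimited" for both limits: `--memory-vsize` must be no smaller than `--memory`, and a non-zero `--memory` must exceed `FLOW_KNOBS->PAGE_CACHE_4K` (which the error message refers to as `--cache-memory`). The same checks as a function that returns an error message instead of exiting; `validateMemoryLimits` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical mirror of the fdbserver startup checks; 0 means "unlimited".
std::optional<std::string> validateMemoryLimits(uint64_t memLimit,
                                                uint64_t virtualMemLimit,
                                                uint64_t pageCache4k) {
    if (memLimit > 0 && virtualMemLimit > 0 && memLimit > virtualMemLimit) {
        return std::string("--memory-vsize has to be no less than --memory");
    }
    if (memLimit > 0 && memLimit <= pageCache4k) {
        return std::string("--memory has to be larger than --cache-memory");
    }
    return std::nullopt; // limits are consistent
}
```

Returning the message rather than calling `flushAndExit` keeps the rule testable; the real code prints it and exits with `FDB_EXIT_ERROR`.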
@@ -1912,11 +1932,13 @@ int main(int argc, char* argv[]) {
.detail("BuggifyEnabled", opts.buggifyEnabled)
.detail("FaultInjectionEnabled", opts.faultInjectionEnabled)
.detail("MemoryLimit", opts.memLimit)
.detail("VirtualMemoryLimit", opts.virtualMemLimit)
.trackLatest("ProgramStart");
Error::init();
std::set_new_handler(&platform::outOfMemory);
setMemoryQuota(opts.memLimit);
Future<Void> memoryUsageMonitor = startMemoryUsageMonitor(opts.memLimit);
setMemoryQuota(opts.virtualMemLimit);
Future<Optional<Void>> f;

@@ -68,6 +68,8 @@ void FlowKnobs::initialize(Randomize randomize, IsSimulated isSimulated) {
init( HUGE_ARENA_LOGGING_BYTES, 100e6 );
init( HUGE_ARENA_LOGGING_INTERVAL, 5.0 );
init( MEMORY_USAGE_CHECK_INTERVAL, 1.0 );
// Chaos testing - enabled for simulation by default
init( ENABLE_CHAOS_FEATURES, isSimulated );
init( CHAOS_LOGGING_INTERVAL, 5.0 );

@@ -130,6 +130,8 @@ public:
double HUGE_ARENA_LOGGING_BYTES;
double HUGE_ARENA_LOGGING_INTERVAL;
double MEMORY_USAGE_CHECK_INTERVAL;
// Chaos testing
bool ENABLE_CHAOS_FEATURES;
double CHAOS_LOGGING_INTERVAL;

@@ -1942,6 +1942,9 @@ std::string epochsToGMTString(double epochs) {
}
void setMemoryQuota(size_t limit) {
if (limit == 0) {
return;
}
#if defined(USE_SANITIZER)
// ASAN doesn't work with memory quotas: https://github.com/google/sanitizers/wiki/AddressSanitizer#ulimit--v
return;

@@ -415,3 +415,15 @@ SystemStatistics customSystemMonitor(std::string const& eventName, StatisticsSta
statState->networkState = netData;
return currentStats;
}
Future<Void> startMemoryUsageMonitor(uint64_t memLimit) {
if (memLimit == 0) {
return Void();
}
auto checkMemoryUsage = [=]() {
if (getResidentMemoryUsage() > memLimit) {
platform::outOfMemory();
}
};
return recurring(checkMemoryUsage, FLOW_KNOBS->MEMORY_USAGE_CHECK_INTERVAL);
}
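`startMemoryUsageMonitor` re-checks resident memory every `FLOW_KNOBS->MEMORY_USAGE_CHECK_INTERVAL` seconds (1.0 by default, per the knob added in this commit) and calls `platform::outOfMemory()` once the limit is exceeded; a limit of 0 disables the monitor. Flow's `recurring` actor has no direct standard-library equivalent, so the sketch below substitutes a background `std::thread`. `MemoryMonitor` is a hypothetical class, and the injected reader and callback stand in for `getResidentMemoryUsage()` and `platform::outOfMemory()`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical thread-based stand-in for startMemoryUsageMonitor().
class MemoryMonitor {
public:
    MemoryMonitor(uint64_t limit,
                  std::chrono::milliseconds interval,
                  std::function<uint64_t()> readRss,
                  std::function<void()> onExceeded) {
        if (limit == 0) {
            return; // 0 means unlimited: no thread is started
        }
        worker_ = std::thread(
            [this, limit, interval, readRss = std::move(readRss), onExceeded = std::move(onExceeded)] {
                // This sketch checks once immediately, then every `interval` until stopped.
                do {
                    if (readRss() > limit) {
                        onExceeded(); // fdbserver calls platform::outOfMemory() here
                        return;
                    }
                    std::this_thread::sleep_for(interval);
                } while (!stop_.load());
            });
    }
    ~MemoryMonitor() {
        stop_.store(true);
        if (worker_.joinable()) {
            worker_.join();
        }
    }
    bool active() const { return worker_.joinable(); }

private:
    std::atomic<bool> stop_{ false }; // declared before worker_, so it is initialized first
    std::thread worker_;
};
```

In fdbserver the callback never returns because it terminates the process; the early `return` in the sketch only keeps the thread well-behaved when the callback does return.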

@@ -156,4 +156,6 @@ SystemStatistics customSystemMonitor(std::string const& eventName,
bool machineMetrics = false);
SystemStatistics getSystemStatistics();
Future<Void> startMemoryUsageMonitor(uint64_t memLimit);
#endif /* FLOW_SYSTEM_MONITOR_H */