fix: the master did not monitor for the failure of remote logs
stop merge attempts when a data center is failed
fixed a variety of other problems with data distribution when a data center is failed
Simulation identified the fact that we can violate the
VersionStamps-are-always-increasing promise via the following series of events:
1. On proxy 0, dumpData adds commit requests to proxy 0's commit promise stream
2. To any proxy, a client submits the first transaction of abortBackup, which stops further dumpData calls on proxy 0.
3. To any proxy that is not proxy 0, submit a transaction that checks if it needs to upgrade the destination version.
4. The transaction from (3) is committed
5. Transactions from (1) are committed
This is possible because the dumpData transactions have no read conflict
ranges, and thus it's impossible to make them abort due to "conflicting"
transactions. There's also no promise that if client C sends a commit to proxy
A, and later a client D sends a commit to proxy B, that B must log its commit
after A. (We only promise that if C is told it was committed before D is told
it was committed, then A committed before B.)
There was a failed attempt to fix this problem. We tried to add read conflict
ranges to dumpData transactions so that they could be aborted by "conflicting"
transactions. However, this failed because this now means that dumpData
transactions require conflict resolution, and the stale read version that they
use can cause them to be aborted with a transaction_too_old error.
(Transactions that don't have read conflict ranges will never return
transaction_too_old, because with no reads, the read snapshot version is
effectively meaningless.) This was never previously possible, so the existing
code doesn't retry commits, and to make things more complicated, the dumpData
commits must be applied in order. This would require either adding
dependencies to transactions (if A is going to commit then B must also be/have
committed), which would be complicated, or submitting transactions with a fixed
read version, and replaying the failed commits with a higher read version once
we get a transaction_too_old error, which would unacceptably slow down the
maximum throughput of dumpData.
Thus, we've instead elected to add a special transaction option that bypasses
proxy load balancing for commits, and always commits against proxy 0. We can
know for certain that after the transaction from (2) is committed, all of the
dumpData transactions that will be committed have been added to the commit
promise stream on proxy 0. Thus, if we enqueue another transaction against
proxy 0, we can know that it will be placed into the promise stream after all
of the dumpData transactions, thus providing the semantics that we require: no
dumpData transaction can commit after the destination version upgrade
transaction.
This means that loops like `seed=1; while ./fdbserver -r simulation -s $seed;
do seed=$(($seed+1)); done` to find an example of an often failing test. This
also means joshua will report ExitCode errors on anything that has a SevError
in the log.
As a part of this, we also implicitly downgrade any injected errors to SevWarnAlways.
If we're going to do the work to provide more optimized ways to zero files,
then I'd feel better with this being in a more common place, so that any other
zero-ers are likely to reuse it. It also makes testing easier/more obvious.
Also, because it's needed for correctness, fix the aligned_alloc for OSX, which
wasn't aligned, and use an actually aligned allocation function.
Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup. The addition of IBackupFile and its finish() method simplified the log and range writer tasks. Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes. Added KeyBackedSet<T> type. Moved JSONDoc to its own header. Added platform::findFilesRecursively().
Still to do: update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.
Added an optimization to use a separate set for throttled events. Since this set is expected to be small, comparison of every event against this set is going to be cheaper.
std::is_pod<> being less restrictive than is_binary_serializable<> meant that
structs that both were POD and had a serialize method defined would be binary
serialized instead of using the defined serialize(). This means that it would
also serialize any padding that the struct contained, which would cause mass
waves of valgrind failures from uninitialized memory.
Included in this change is additional uses of valgrind client requests so that
attempts to send uninitialized memory are reported at the sending site, versus
as part of checksum calculation in sending the packet.
* Fixed fdbcli to be more idiomatic.
* Removed is_binary_serializable in favor of std::is_pod<>
* Removed custom enable_if<> in favor of std::enable_if<>
* Removed HEY REVIEWER comments
* Removed print from prof.py
* Added FLOW_PROFILER_ENABLED=yes to circus components that wished to enable the flow profiler.
backtrace() gives a list of return addresses, which means that addr2line will
print out the line after the caller. GetStackTrace returns the list of caller
addresses, so the addr2line results should be accurate. The flow profiler was
also changed to use the new backtracing code, so flow profiles will now be
accurate as well. Unfortunately, the abseil code doesn't work on MacOS, so we
still fall back to backtrace() in this case.
For the stack unwinder to work, we must disable -fomit-frame-pointer. This can
result in a small performance penalty, as it effectively reduces the number of
general purpose registers available by one. (I'm also curious if this has
anything to do with the overly frequent "<value optimized out>" messages from
gdb.) If this shows up as a problem, we can make release builds still have
-fomit-frame-pointer, and fall back to backtrace when it's enabled then as
well.
This code is all Apache 2 licensed, and all headers were maintained when
concatinated, so we should be completely fine from a legal standpoint.
I've scriptified the steps that I took so that if we need to update this code
in the future, it hopefully shouldn't be too much of a hassle.
This is the combination of two small changes.
1. Add support for a string knob type.
2. Change profiles to be written to the log directory instead of the working
directory.
We have three options of where to write files: the working directory, the data
directory, and the log directory.
The working directory may be set to a non-writable location, and likely
contains the fdb binaries. Allowing these files to be overwritten would likely
not be a wise idea.
The data directory hosts our sqlite b-trees. It would also be very unfortunate
if these were ever overwritten by an unfortunate profile name.
The log directory contains logs. Out of the three, these matter the least if
they disappear or become corrupted.
Thus, we write to the log directory.
This adds the fdbcli commands:
* profile list -- Lists all workers in a way that doesn't fill `kill`'s list.
* profile flow run -- Allows starting flow profiling on a set of hosts for a specified interval.
And threads through all the support for enabling and disabling profiling as an RPC.
BlobStoreEndpoint now only accepts hostnames and an optional service, so this update is not compatible with the previous URL formats having many IP addresses.
This is a follow-on to c4eb73d0. Thanks to Bala for pointing out the unchanged
std::move usage, and there appeared to not be many existing users of addMetric
anyway.
The existing code tried to work around the complexities of optionally using
rvalue references' move capabilities if they exist. As seen in the previous
MapPair, there's a combinatorial explosion of prototypes to declare as the
parameter length increases. Because of this, addMetric ended up with a strange
API, and there was a wrapper to make a copy for insert.
Instead, we can apply the idiom of using universal/forwarding references and
std::forward to allow the compiler to instantiate the combinations that are
needed. There's a TagData struct with no copy constructor that validates that
move constructors can be properly called still.
I measured a 12-byte difference between before and after this change, so no
template bloat was introduced.
Trace Events are sampled and cached with an expiration set. Every TraceEvent above SevDebug is checked against this cache to see if it exceeded a set threshold. If yes, then throttle the TraceEvent.
If a TraceEvent is throttled, a warning msg is logged.