With sharded txs tags, the master now receives data from transaction
logs at an order of magnitude higher rate. This is the intentional
desires result of sharding the txs tag. With a sufficient number of
TLogs, the master will saturate its CPU time handling the peek
responses.
Performance tests revealed some unstable oddities in how long a recovery
would take, which was eventually root caused to a priority inversion
between TLogRejoin requests and TLog peek replies.
Once peek replies saturate the CPU, the master would proceed to ignore
further TLogRejoin messages. TLogRejoin is what marks a TLog as
available to the failure monitor, which is also what decides between a
ServerPeekCursor and a MergePeekCursor for a SetPeekCursor. Ignoring
TLogRejoins meant that the sharded txs locality tags for those servers
would be merge peeked over all TLogs. This is much less efficient than
just peeking one copy of data from the one preferred server.
Depending on the race between TLogPeek replies saturating the CPU and
TLogRejoin requests being submitted, a variable number of tags would be
affected, and thus the performance test would have some variance in its
results.
This commit includes functionality to turn on
the object serializer for network communication.
This is done the following way:
- On incoming connections, a process will detect
whether the client supports the object serializer
and will only serialize responses with it, if it does
- On outgoing connections, the command line flag is used
to determine whether the object serializer should be used
to send data.
This way, a cluster can run in mixed mode. To upgrade one
can upgrade one process at a time and set the flag one process
at a time.
This is how this is tested on the simulator:
- The command line flag can take three options: on, off,
and random.
- For off, the object serializer will never we used.
- For on, the object serializer will be always used.
- For random, the simulator will flip a coin for each
process it starts up.
This commit includes:
- The flatbuffers implementation
- A draft on how it should be used for network messages
- A serializer that can be used independently
What is missing:
- All root objects will need a file identifier
- Many special classes can not be serialized yet as the
corresponding traits are not yet implemented
- Object serialization can not yet be turned on (this will
need a network option)
This is the first part of making `TraceEvent` cheaper. The main idea is
to defer calls to any code that formats string. These are the main
changes:
- TraceEvent::detail now takes a c-string instead of std::string for
literals. This prevents unnecessary allocations if the trace is not
going to be printed in the first place (for example for SevDebug).
Before that `detail` expected a `std::string` as key, which mean that
any string literal would be copied on each call.
- Templates Traceable and SpecialTraceMetricType. These templates can be
specialized for any type that needs to be printed. The actual
formatting will be deferred to after the `enabled` check. This
provides two benefits: (1) if a TraceEvent is disabled, we don't pay
for the formatting and (2) TraceEvent can trace types that it doesn't
know about.
- TraceEvent::enabled will be set in the constructor if the Severity is
passed. This will make sure that `TraceEvent::init` is not called.
- `TraceEvent::detail` will be inlined. So for disabled TraceEvent
calls, a call to detail will only introduce a if-branch which is much
cheaper than a function call.
If a server has its data spilled, then it's behind the 5s window.
Feeding it data is less important than committing, so we can hide the
extra CPU usage from checksumming the read amplified disk queue pages.
Add a new role for ratekeeper.
Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
- NetworkAddress now contains IPAddress object which can be either
IPv4 or IPv6 address. 128bits are used even for IPv4 addresses,
however only 32bits are used when using/serializing IPv4 address.
- ConnectPacket is updated to store IPv6 address. Backward compatible
with old format since the first 32bits of IP address field is used
for serialization of IPv4.
- Mainly updates rest of the code to use IPAddress structure instead
of plain uint32_t.
- IPv6 address/pair ports should be represented as `[ip]:port` as per
convention. This applies to both cluster files and command line
arguments.
Extends the CLI interface to take multiple public and listen addresses.
We however do not do anything with those extra addresses and just
consider the first one for now.
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
increase the amount of memory ratekeeper budgets for tlogs so that there is a gap after the spill threshold to prevent temporarily overshooting the budget
BlobStoreEndpoint now only accepts hostnames and an optional service, so this update is not compatible with the previous URL formats having many IP addresses.