Backup workers on old epochs notify the master once their work is done. The
master then removes them from the log system and acknowledges back to the
backup workers so that they can gracefully shut down.
The popping of a backup worker is stalled if there are workers from older
epochs still working; otherwise, workers from old epochs would lose data.
However, allowing a newer epoch to start backup can cause holes in the version
ranges. The restore process must verify the backup progress to make sure there
are no holes; otherwise it has to wait.
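As a rough illustration of the hole check (a minimal sketch with hypothetical names, not the actual FDB restore code), the saved progress can be treated as a set of [begin, end) version ranges that must cover the full range being restored:

    #include <algorithm>
    #include <cstdint>
    #include <utility>
    #include <vector>

    // Hypothetical check: restore may only proceed if the recorded
    // [begin, end) version ranges cover [beginVersion, endVersion) without gaps.
    bool coversWithoutHoles(std::vector<std::pair<int64_t, int64_t>> ranges,
                            int64_t beginVersion, int64_t endVersion) {
        std::sort(ranges.begin(), ranges.end());
        int64_t covered = beginVersion;
        for (const auto& [b, e] : ranges) {
            if (b > covered) return false;        // a hole before this range
            covered = std::max(covered, e);
            if (covered >= endVersion) return true;
        }
        return covered >= endVersion;
    }

If a hole is found, restore waits and re-checks the progress later.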
With sharded txs tags, the master now receives data from the transaction
logs at an order of magnitude higher rate. This is the intended result of
sharding the txs tag. With a sufficient number of TLogs, the master will
saturate its CPU time handling the peek responses.
Performance tests showed unstable recovery times, which were eventually
root-caused to a priority inversion between TLogRejoin requests and TLog
peek replies.
Once peek replies saturated the CPU, the master would ignore further
TLogRejoin messages. TLogRejoin is what marks a TLog as available to the
failure monitor, which in turn decides between a ServerPeekCursor and a
MergePeekCursor for a SetPeekCursor. Ignoring TLogRejoins meant that the
sharded txs locality tags for those servers would be merge-peeked over all
TLogs, which is much less efficient than peeking one copy of the data from
the one preferred server.
Depending on the race between TLog peek replies saturating the CPU and
TLogRejoin requests being submitted, a variable number of tags would be
affected, and thus the performance test results would vary.
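The cursor choice described above can be modeled roughly as follows (a simplified sketch with stand-in types, not the actual SetPeekCursor logic): if the failure monitor knows about the preferred TLog, peek just that one server; otherwise fall back to merge-peeking across all TLogs.

    #include <memory>
    #include <vector>

    // Stand-in types for illustration; not the real FDB classes.
    struct TLog { bool knownToFailureMonitor; };
    struct PeekCursor { virtual ~PeekCursor() = default; };
    struct ServerPeekCursor : PeekCursor { explicit ServerPeekCursor(const TLog&) {} };
    struct MergePeekCursor : PeekCursor { explicit MergePeekCursor(const std::vector<TLog>&) {} };

    // If the preferred server's TLogRejoin was processed, peek one copy from
    // it; otherwise merge-peek the tag across every TLog, which is far more
    // expensive.
    std::unique_ptr<PeekCursor> makeCursor(const TLog& preferred, const std::vector<TLog>& all) {
        if (preferred.knownToFailureMonitor)
            return std::make_unique<ServerPeekCursor>(preferred);
        return std::make_unique<MergePeekCursor>(all);
    }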
Let the cluster controller start a new data distributor role by sending a
message to a chosen worker.
Change MasterInterface usage in DataDistribution to masterId
Add DataDistributor rejoin handling.
This allows the data distributor to tell the new cluster controller of its
existence so that the controller doesn't spawn a new one. I.e., there should
be only ONE data distributor in the cluster.
If the DataDistributor (DD) doesn't rejoin for a while, the ClusterController
(CC) tries to recruit a new one. The CC also monitors the DD and restarts it
if it fails. The Proxy also monitors the DD; if the DD fails, the Proxy asks
the CC for the new DD.
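A rough sketch of the recruit/monitor loop on the cluster controller side (hypothetical names and plain C++ stubs standing in for the actual Flow actors):

    #include <chrono>
    #include <optional>

    // Hypothetical stand-in for the cluster controller's view of the data
    // distributor; the real code uses Flow actors and different names.
    struct DataDistributorInterface { int workerId = 0; };

    // Placeholder hooks: in the real system these wait on network events.
    std::optional<DataDistributorInterface> waitForRejoin(std::chrono::seconds) { return std::nullopt; }
    DataDistributorInterface recruitDataDistributor() { return DataDistributorInterface{ 1 }; }
    void waitForFailure(const DataDistributorInterface&) {}  // returns once the DD is reported failed

    // Give an existing distributor a chance to rejoin; if none shows up,
    // recruit one on a chosen worker; whenever the current one fails, recruit
    // a replacement. At most one distributor exists at a time.
    void monitorDataDistributor() {
        std::optional<DataDistributorInterface> dd = waitForRejoin(std::chrono::seconds(5));
        while (true) {
            if (!dd) dd = recruitDataDistributor();
            waitForFailure(*dd);  // wait until the distributor dies
            dd.reset();           // then recruit a new one on the next iteration
        }
    }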
Add a GetRecoveryInfo RPC to the master server, which is called by the data
distributor to obtain the recovery transaction version from the master server.
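For illustration, the request/reply could look roughly like this (hypothetical shapes and names; the actual FDB structs and the Flow RPC machinery differ):

    #include <cstdint>
    #include <future>

    struct GetRecoveryInfoReply {
        int64_t recoveryTransactionVersion = 0;
    };

    struct GetRecoveryInfoRequest {
        std::promise<GetRecoveryInfoReply> reply;  // fulfilled by the master
    };

    // Master side: answer with the version chosen during recovery.
    void handleGetRecoveryInfo(GetRecoveryInfoRequest& req, int64_t recoveryTxnVersion) {
        req.reply.set_value(GetRecoveryInfoReply{ recoveryTxnVersion });
    }

    // Data distributor side: send the request and wait for the reply
    // (the direct call below stands in for the RPC round trip).
    int64_t fetchRecoveryVersion(int64_t masterVersion) {
        GetRecoveryInfoRequest req;
        std::future<GetRecoveryInfoReply> f = req.reply.get_future();
        handleGetRecoveryInfo(req, masterVersion);
        return f.get().recoveryTransactionVersion;
    }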
- This patch makes FDB listen to multiple addresses given via the
  command line. Although we'll still use the first address in most places,
  this patch starts using vector<NetworkAddress> in Endpoint in some basic
  places.
- When sending packets to an endpoint, pick a random network address from
  the endpoint's address list (see the sketch after this list).
- Renames Endpoint::address to Endpoint::addresses since it now holds a
  vector of addresses.
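A minimal sketch of the random pick when sending (simplified stand-in types; the real NetworkAddress and transport code differ):

    #include <cstdlib>
    #include <string>
    #include <vector>

    struct NetworkAddress { std::string ip; int port; };
    struct Endpoint { std::vector<NetworkAddress> addresses; };

    // Pick one of the endpoint's addresses at random when sending a packet.
    // Assumes the endpoint has at least one address.
    const NetworkAddress& pickAddress(const Endpoint& e) {
        return e.addresses[std::rand() % e.addresses.size()];
    }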
Extend the `Endpoint` class to hold multiple NetworkAddresses instead of
just one. Hence, to talk to an endpoint we'll have multiple IP:PORT pairs
instead of a single IP:PORT.
This patch simply adds the field and makes the changes needed to compile the
codebase. The first element of the `address` field is used everywhere, so
the way we talk to an endpoint remains the same with this patch.
NOTE:
Directly accessing the first member of Endpoint::address is unsafe, as
Endpoint() doesn't enforce a non-empty address list. However, since the
correctness tests pass for now and we are anyway going to replace all those
unsafe accesses with ones that consider the whole vector, this patch does
not guard those accesses.
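The unsafe access mentioned in the NOTE could eventually be guarded along these lines (hypothetical accessor on the simplified Endpoint shape from the earlier sketch):

    #include <stdexcept>
    #include <string>
    #include <vector>

    struct NetworkAddress { std::string ip; int port; };
    struct Endpoint {
        std::vector<NetworkAddress> addresses;

        // addresses.front() is undefined behaviour if the list is empty, since
        // Endpoint() does not enforce a non-empty address list; a guarded
        // accessor like this (hypothetical) one makes that failure explicit.
        const NetworkAddress& getFirstAddress() const {
            if (addresses.empty()) throw std::runtime_error("Endpoint has no addresses");
            return addresses.front();
        }
    };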
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
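For example:

    // Before (only resolved from files under foo/):
    #include "bar.h"

    // After (the same header, included identically from anywhere in the tree):
    #include "foo/bar.h"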
Signed-off-by: Robert Escriva <rescriva@dropbox.com>