fixup! Update documentation per comments

This commit is contained in:
Xiaoge Su 2022-07-15 17:21:41 -07:00
parent 6e7716d078
commit bb05c83e19
1 changed files with 32 additions and 26 deletions

View File

@ -1,8 +1,16 @@
# Load Balancing in FoundationDB
In FoundationDB, often multiple *interface*s are available for the same type of *request*s. A load balancer can be used to distribute the requests to those interfaces, while awaring the possible failures.
## Introduction
Two load balancer are provided: `basicLoadBalance` and `loadBalance`, both defined in `LoadBalance.actor.h`. The `basicLoadBalance` is a simple
FoundationDB is a distributed key-value database. A FoundationDB cluster is constituted by one or more processes over one or more physical machines, where each process is a *worker* and takes certain *role*s, such as coordinator, proxy, TLog, storage server, etc., in the system.
The interpocess communications (IPC) between the processes are supported by the [`flow`](https://github.com/apple/foundationdb/tree/main/flow) infrastructure. In the `flow` context, each process will expose one or more *interface*(s). Each interface is able to accept given type of *request*s, and *reply* `Void`, requested data or error. The interfaces and the corresponding request/reply pairs forms the IPC protocol of FoundationDB.
In many cases, the same request can be proceed by multiple processes, e.g. all commit proxies can accept commit requests, and multiple storage server processes can provide values for a given key in double/triple redundancy mode. A load balancer (LB) can be used to distribute the requests over the possible interfaces, preventing one or a few processes getting overloaded. The interface candidates are also referred as *alternative*s. The LB is also able to react when one or more interfaces are (temporarily) unavailable by retrying, or re-routing the request to other candidates. The interface candidates are also known as *alternative*s.
Two LBs are provided in FoundationDB: `basicLoadBalance` and `loadBalance`, both defined in [`LoadBalance.actor.h`](https://github.com/apple/foundationdb/blob/main/fdbrpc/include/fdbrpc/LoadBalance.actor.h). The `basicLoadBalance` is a simple load balancer which each interface is equally chosen; while the `loadBalance` accepts a model object, which provides [datacenter](https://apple.github.io/foundationdb/configuration.html#configuring-regions) (DC) awaring balancing algorithms, allowing requests being sent to interfaces in the same DC.
In the following sections, the two LBs will be discussed in details.
## `basicLoadBalance`
@ -12,12 +20,12 @@ Two load balancer are provided: `basicLoadBalance` and `loadBalance`, both defin
* GetReadVersion proxy interface
* ConfigFollower interface
The interface is assumed to be always *fresh*, i.e. the list of the servers is fixed.
Here, the interfaces are assumed to be always *fresh*, i.e. the list of the servers is fixed.
```mermaid
graph LR
H0{Has alternatives?}
H1[Pick up an alternative]
H1[Pick an alternative]
H2[Backoff]
H3[Request]
H4([Reply])
@ -26,24 +34,24 @@ graph LR
H((Start)) --> H0
H0 --No--> H6
H0 --Yes--> H1
H1 --No healthy alternatives--> H2 --> H1
H1 --No healthy alternatives--> H2 --Retry--> H1
H1 --Has alternative--> H3 --Success--> H4
H3 --Exception--> H5
H3 --Broken Promise --> H2
```
### Alternative pick up algorithm
### Alternative pick algorithm
In `basicLoadBalance`, a *best* alternative is picked up and used at the beginning. At this stage, this alternative is randomly picked up among all alternatives. If the best alternative does not work, it will iteratively try other interfaces, see [here](#picking-up-an-alternative-in-basic-load-balancing-algorithm).
In `basicLoadBalance`, a *best* alternative is picked and used at the beginning. At this stage, this alternative is randomly picked among all alternatives. If the best alternative does not work, it will iteratively try other interfaces, see [here](#picking-up-an-alternative-in-basic-load-balancing-algorithm).
## `loadBalance`
`loadBalance` provides a more sophisticated implementation of load balancing. In addition to the basic load balancing, it also provides a variety of features, such as
`loadBalance` provides a more sophisticated implementation of load balancing. In addition of the basic load balancing, it also provides a variety of features:
* Support for Test Storage Server ([TSS](https://github.com/apple/foundationdb/blob/main/documentation/sphinx/source/tss.rst))
* Distance-based candidate election
* Able to handle timeouts and exceptions with retries
* etc.
* Datacenter awaring alternative election
* Recording the latency and penalty from interfaces, and [prioritize the interfaces based on previously stored data](#with-queuemodel).
* Able to handle timeouts and SS exceptions with retries.
Currently it is used for
@ -58,7 +66,7 @@ graph LR
H0{Has alternatives?}
H1[Choose initial candidates]
H4([Never])
H5[Pick up an alternative]
H5[pick an alternative]
H6[Send request]
H7[Wait for available alternative]
H8([Response])
@ -77,34 +85,32 @@ graph LR
Note:
* Response could be an exception, e.g. `process_behind` or `request_maybe_delivered`, and will be delivered as `Error` to the caller.
* The response could be either a reply, or an `Error`, e.g. `process_behind` or `request_maybe_delivered`.
### Choose initial candidates
Two initial candidates will be picked up before the requests start. They will be selected as the first two alternatives for the load balancer. If both of them failed, other alternatives are used in a round-robin way.
Two initial candidates will be picked before the requests start. They will be selected as the first two alternatives for the load balancer. If both of them failed, other alternatives are used in a round-robin way.
#### No `QueueModel`
If no `QueueModel` is provided, the initial candidates are picked up randomly. The first candidate, or the *best* alternative, will always be one of local workers.
If no `QueueModel` is provided, the initial candidates are picked randomly. The first candidate, or the *best* alternative, will be the one that in the same DC, if possible.
#### With `QueueModel`
`QueueModel` holds information about each candidate related to future version, latency and penalty.
* If the storage server is returning a future version error, it is marked as not available until some certain time.
* Penalty is reported by storage server in each response (see `storageserver.actor.cpp:StorageServer::getPenalty`). It is determined by the write queue length and the version lagging.
* Penalty is reported by storage server in each response (see `storageserver.actor.cpp:StorageServer::getPenalty`). It is determined by the write queue length and the durability lagging.
If `QueueModel` exists, the candidates will be picked base on the penalty. Workers with high penalties will be avoided when picking up the first two candidates.
If `QueueModel` exists, the candidates will be picked base on the penalty. Workers with high penalties will be avoided when picking the first two candidates.
### Pick up an alternative
### Pick an alternative
As mentioned above, the alternatives are chosen in the round-robin way when the first two candidates failed.
If all alternatives failed, a flag is set, so if the next request fails with `process_behind`, the caller will receive the `process_behind` error.
The alternatives are chosen in the round-robin way when the first two candidates failed. If all alternatives failed, a flag is set, and if the next request fails with `process_behind`, the caller will receive the `process_behind` error.
### Send requests to workers
Here it is assumed that there is at least one alternative available.
Here it is assumed that there are at least one alternative available. If no alternative is available, the LB will wait.
```mermaid
graph LR
@ -126,7 +132,7 @@ graph LR
H4 --Additional request failed--> H3
```
If the first request failed, it is reset and the next request will be considered as the first request. Certain types of errors can also be returned as response, e.g. `request_may_be_delivered` or `process_behind`, which may not trigger a load-balancer retry.
The first request has a timeout option. If the LB is not able to retrieve the response within the timout, more requests will be sent to secondary and other available interfaces. If the first request failed, it is reset and the next request will be considered as the first request. Certain types of errors can also be returned as response, e.g. `request_may_be_delivered` or `process_behind`, which may not trigger a load-balancer retry.
### Wait for available alternative
@ -151,7 +157,7 @@ graph LR
H2 --Failed-->H4
```
Note that "Wait for alternatives" will only timeout if the alternatives are not always fresh, i.e. this only happens when accessing storage servers.
Note that "Wait for alternatives" will only timeout if the alternatives are always not fresh, i.e. this only happens when accessing storage servers. LB will throw `all_alternatives_failed` when timeout in this case.
#### Requests
@ -163,9 +169,9 @@ Original requests in `loadBalancer` are wrapped by `LoadBalance.actor.h:RequestD
## Appendix
### Picking up an alternative in load balancing algorithm
### Picking an alternative in basic load balancing algorithm
The following script simulates the alternative picking up algorithm. The chosen alternatives will be printed out one-by-one.
The following script simulates the alternative picking up algorithm. The chosen alternatives will be printed out one-by-one. The `loadBalance` function uses a similar approach, though the interfaces in the same DC are used firstly.
```python
#! /usr/bin/env python3