<meta charset="utf-8">

# Forward Compatibility for Transaction Logs

## Background

A repeated concern with adopting FoundationDB has been that upgrades are one
way, with no supported rollback. If one were to upgrade a cluster running 6.0
to 6.1, then there's no way to roll back to 6.0 if the new version results in
worse client application performance or unavailability. In the interest of
increasing adoption, work has begun on supporting on-disk forward
compatibility, which allows for upgrades to be rolled back.

The traditional way of allowing rollbacks is to have one version, `N`, that
introduces a feature but leaves it disabled. `N+1` enables the feature, and
then `N+2` removes whatever was deprecated in `N`. However, FDB currently has
a 6 month release cadence, and waiting 6 months to be able to use a new feature
in production is unacceptably long. Thus, the goal is a sane, user-friendly
upgrade path that supports rollback, but still allows features to be used
immediately if desired.

This document also carries two specific restrictions to the scope of what it covers:

1. This document specifically is **not** a discussion of network protocol
   compatibility nor of supporting rolling upgrades. Rolling upgrades of FDB are
   still discouraged, and minor versions are still protocol incompatible with
   each other.
2. This only covers the proposed design of how forward compatibility for
   transaction logs will be handled, and not forward compatibility for
   FoundationDB as a whole. There are other parts of the system that durably
   store data, namely the coordinators and storage servers, that will not be
   discussed.

## Overview

A new configuration option, `log_version`, will be introduced to allow a user
to control which on-disk format the transaction logs are allowed to use. Not
every release will affect the on-disk format of the transaction logs, so
`log_version` is an opaque integer that is incremented by one whenever the
on-disk format of the transaction log is changed.

`log_version` is set from `fdbcli`, with an invocation looking like
`$ fdbcli -C cluster.file --exec "configure log_version:=2"`. Note that `:=`
is used instead of `=`, to keep the convention in `fdbcli` that configuration
options that users aren't expected to need (or wish) to modify are set with
`:=`.

Right now, FDB releases and `log_version` values are as follows:

| Release | Log Version |
| ------- | ----------- |
| pre-5.2 | 1           |
| 5.2-6.0 | 2           |
| 6.1     | 3           |
| 6.2     | 4           |
| 6.3     | 5           |

If a user does not specify any configuration for `log_version`, then
`log_version` will be set so that rolling back to the previous minor version of
FDB will be possible. FDB will always support loading files generated by the
default `log_version` of the next minor version. It will be possible to
configure `log_version` to a higher value on the release that introduces it, if
the user is willing to sacrifice the ability to roll back.

This means FDB's releases will work like the following:

|              | 6.0 | 6.1 | 6.2   | 6.3     |
|--------------|-----|-----|-------|---------|
| Configurable | 2   | 2,3 | 3,4   | 4,5     |
| Default      | 2   | 2   | 3     | 4       |
| Recoverable  | 2   | 2,3 | 2,3,4 | 2,3,4,5 |

Where...

* "configurable" means values considered an acceptable configuration setting for `fdbcli> configure log_version:=N`.
* "default" means what `log_version` will be if you don't configure it.
* "recoverable" means that FDB can load files that were generated from the specified `log_version`.

Configuring a `log_version` will cause FDB to use the maximum of that
`log_version` and the default `log_version`. The default `log_version` will
always be the minimum configurable log version. This is done so that manually
setting `log_version` once, and then upgrading FDB multiple times, will
eventually cause a low `log_version` left in the database configuration to act
as a request for the default. Configuring `log_version` to a very high number
(e.g. 9999) will cause FDB to always use the highest available log version.

As a concrete example, 6.1 will introduce a new transaction log feature with
on-disk format implications. If you wish to use it, you'll first have to
`configure log_version:=3`. Otherwise, after upgrading to FDB 6.2, it will
become the default. If problems are discovered when upgrading to FDB 6.2, then
roll back to FDB 6.1. (Theoretically. See scope restrictions above.)

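The selection rule described in this section (the maximum of the configured and default `log_version`, capped at the highest version a release supports) can be sketched in a few lines. This is only an illustration: the values are copied from the compatibility table above, while `RELEASES` and `effective_log_version` are hypothetical names and not FDB identifiers.

```python
# Hypothetical sketch; values come from the compatibility table above.
# "default" is the lowest configurable log_version for the release and
# "highest" is the highest configurable one.
RELEASES = {
    "6.0": {"default": 2, "highest": 2},
    "6.1": {"default": 2, "highest": 3},
    "6.2": {"default": 3, "highest": 4},
    "6.3": {"default": 4, "highest": 5},
}


def effective_log_version(release, configured=None):
    """Return the log_version a release would use for new transaction logs."""
    info = RELEASES[release]
    if configured is None:
        return info["default"]
    # A stale, low setting behaves like a request for the default.
    version = max(configured, info["default"])
    # A very high setting (e.g. 9999) resolves to the highest available version.
    return min(version, info["highest"])
```

Under these assumptions, a `log_version:=3` left in the configuration resolves to 3 on 6.1 and 6.2, and to 4 on 6.3, where it has fallen below the default.
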
## Detailed Implementation

`fdbcli> configure log_version:=3` sets `\xff/conf/log_version` to `3`. This
version is also persisted as part of the `LogSystemConfig` and thus
`DBCoreState`, so that any code handling the log system will have access to the
`log_version` that was used to create it.

Changing `log_version` will result in a recovery, and FoundationDB will recover
into the requested transaction log implementation. This involves locking the
previous generation of transaction logs, and then recruiting a new generation
of transaction logs. FDB will load `\xff/conf/log_version` as the requested
`log_version`, and when sending an `InitializeTLogRequest` to recruit a new
transaction log, it uses the maximum of the requested log version and the
default `log_version`.

A worker, when receiving an `InitializeTLogRequest`, will initialize a
transaction log corresponding to the requested `log_version`. Transaction logs
can pack multiple generations of transaction logs into the same shared entity,
a `SharedTLog`. `SharedTLog` instances correspond to one set of files, and
will only contain transaction log generations of the same `log_version`.

This allows one worker to run multiple generations of transaction logs that
have different `log_version`s. If the worker crashes and restarts, it needs to
be able to recreate those transaction log instances.

Transaction logs maintain two types of files: one is a pair of files prefixed
with `logqueue-` that are the DiskQueue, and the other is the metadata store,
which is normally a mini `ssd-2` storage engine running within the transaction
log.

When a worker first starts, it scans its data directory for any files that were
instances of a transaction log. It then needs to construct a transaction log
instance that can read the format of the file to be able to reconnect the data
in the files back to the FDB cluster, so that it can be used in a recovery if
needed.

This presents a problem: the worker needs to know all the configuration
options that were used to decide the file format of the transaction log
*before* it can rejoin a cluster and get far enough through a recovery to find
out what that configuration was. To get around this, the relevant
configuration options have been added to the file name so that they're
available when scanning the list of files.

Currently, FDB identifies a transaction log instance by seeing a file that
starts with `log-`, which represents the metadata store. This filename has the
format `log-<UUID>.<SUFFIX>`, where UUID is the `logId`, and SUFFIX tells us if
the metadata store is a memory or ssd storage engine file.

This format is being changed to `log2-<KV PAIRS>-<UUID>.<SUFFIX>`, where KV
PAIRS is a small amount of information encoded into the file name to give us
the metadata *about* the file that is required. According to POSIX, the
characters allowed for "fully portable filenames" are `A–Z a–z 0–9 . _ -` and
the filename length should stay under 255 characters. This leaves `_` as the
only character not already used. Therefore, the KV pair encoding is
`K1_V1_K2_V2_...`: keys and values are separated by an `_`, and KV pairs are
also separated by an `_`.

The currently supported keys are:

V
: A copy of `log_version`

LS
: `log_spill`, a new configuration option in 6.1

Any unrecognized keys are ignored, which will likely help forward compatibility.

An example file name is `log2-V_3_LS_2-46a5f353ac18d787852d44c3a2e51527-0.fdq`.

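To make the file name encoding concrete, here is a minimal sketch of how a name in the new format could be decoded while scanning a data directory. It is not FDB's parser: `parse_tlog_filename` and `LOG2_NAME` are hypothetical names, and the sketch assumes that the UUID is 32 lowercase hex digits (as in the example file name) and that option values are non-negative integers.

```python
import re

# Assumed shape of the new naming scheme: log2-<KV PAIRS>-<UUID><rest>.
# The KV section cannot contain "-", so the first "-" after it ends the section.
LOG2_NAME = re.compile(r"^log2-([A-Za-z0-9_]*)-([0-9a-f]{32})(.*)$")

# The two keys listed above, mapped to the options they carry.
KNOWN_KEYS = {"V": "log_version", "LS": "log_spill"}


def parse_tlog_filename(name):
    """Split a log2- file name into (options, log_id, rest), or return None."""
    match = LOG2_NAME.match(name)
    if match is None:
        return None  # not a file in the new format
    kv_text, log_id, rest = match.groups()
    fields = kv_text.split("_") if kv_text else []
    if len(fields) % 2 != 0:
        return None  # keys and values must come in pairs
    options = {}
    for key, value in zip(fields[0::2], fields[1::2]):
        # Unrecognized keys are skipped rather than treated as errors.
        if key in KNOWN_KEYS and value.isdigit():
            options[KNOWN_KEYS[key]] = int(value)
    return options, log_id, rest
```

Applied to the example file name above, this yields `log_version` 3 and `log_spill` 2, with everything after the UUID (`-0.fdq`) returned untouched as `rest`. An old-style `log-<UUID>.<SUFFIX>` name does not match and returns `None`.
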
### Testing

`SimulationConfig` has been changed to randomly set `log_version` according to
what is supported. This means that with restarting upgrade tests that simulate
upgrading from `N` to `N+1`, the `N+1` version will see files that came from an
FDB running with any `log_version` value that was previously supported. If
`N+1` can't handle the files correctly, then the simulation test will fail.

`ConfigureTest` tries randomly toggling `log_version` up and down in a live
database, along with all the other log related options. Some of the resulting
configurations are valid; others are invalid and should either be rejected or
will cause ASSERTs in later parts of the code.

I've added a new test, `ConfigureTestRestart`, that tests changing
configurations and then upgrading FDB, to cover testing that upgrades still
happen correctly when `log_version` has been changed. This also verifies that
on-disk formats for those `log_version`s are still loadable by future FDB
versions.

There are no tests that mix the `ConfigureDatabase` and `Attrition` workloads.
It would be good to do so, to cover the case of `log_version` changes in the
presence of failures, but one cannot be added easily. The simulator calculates
what processes/machines are safe to kill by looking at the current
configuration. For `ConfigureTest`, this isn't good enough, because `triple`
could mean that there are three replicas, or that the FDB cluster just changed
from `single` to `triple` and only has one replica of data until data
distribution finishes. It would be good to add a `ConfigureKillTest` sometime
in the future.

For FDB to actually announce that rolling back from `N+1` to `N` is supported,
there will need to be downgrade tests from `N+1` to `N` also. The default in
`N+1` should always be recoverable within `N`. As FDB isn't promising forward
compatibility yet, these tests haven't been implemented.

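The invariant that such downgrade tests would exercise, that the default `log_version` of `N+1` is recoverable by `N`, can at least be checked against the compatibility table from the Overview. The following is a rough sketch of that check, not a replacement for real downgrade tests; the table values are copied from this document, and all names are illustrative.

```python
# Values copied from the compatibility table in the Overview; names are
# illustrative only.
RELEASES = ["6.0", "6.1", "6.2", "6.3"]
DEFAULT = {"6.0": 2, "6.1": 2, "6.2": 3, "6.3": 4}
RECOVERABLE = {
    "6.0": {2},
    "6.1": {2, 3},
    "6.2": {2, 3, 4},
    "6.3": {2, 3, 4, 5},
}


def unsafe_default_rollbacks():
    """Return (newer, older) pairs where the newer release's default
    log_version could not be loaded by the older release."""
    problems = []
    for older, newer in zip(RELEASES, RELEASES[1:]):
        if DEFAULT[newer] not in RECOVERABLE[older]:
            problems.append((newer, older))
    return problems
```

For the table as written, the function returns an empty list: each release's default is loadable by the release before it.
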
# Transaction Log Forward Compatibility Operational Guide

## Notable Behavior Changes

When release notes mention that a new `log_version` is available, it's worth
considering upgrading `log_version` after deploying that release. Doing so
allows a controlled upgrade, and reduces the number of new changes that will
take effect when upgrading to the next release.

## Observability

* When running with a non-default `log_version`, the setting will appear in `fdbcli> status`.

## Monitoring and Alerting

If anything relies on the file names that the transaction log uses, be aware that those file names are changing.

<!-- Force long-style table of contents -->
<script>window.markdeepOptions={}; window.markdeepOptions.tocStyle="long";</script>
<!-- When printed, top level section headers should force page breaks -->
<style>.md h1, .md .nonumberh1 {page-break-before:always}</style>
<!-- Markdeep: -->
<style class="fallback">body{visibility:hidden;white-space:pre;font-family:monospace}</style><script src="markdeep.min.js" charset="utf-8"></script><script src="https://casual-effects.com/markdeep/latest/markdeep.min.js" charset="utf-8"></script><script>window.alreadyProcessedMarkdeep||(document.body.style.visibility="visible")</script>