<meta charset="utf-8">
# Forward Compatibility for Transaction Logs
## Background
A repeated concern with adopting FoundationDB has been that upgrades are one
way, with no supported rollback. If one were to upgrade a cluster running 6.0
to 6.1, there would be no way to roll back to 6.0 if the new version resulted in
worse client application performance or unavailability. In the interest of
increasing adoption, work has begun on supporting on-disk forward
compatibility, which allows upgrades to be rolled back.
The traditional way of allowing rollbacks is to have one version, `N`, that
introduces a feature but leaves it disabled. `N+1` enables the feature, and
then `N+2` removes whatever was deprecated in `N`. However, FDB currently has
a six-month release cadence, and waiting six months to be able to use a new feature
in production is unacceptably long. Thus, the goal is a sane, user-friendly
upgrade path that supports rollback, while still allowing features to be used
immediately if desired.
This document's scope is restricted in two specific ways:
1. This document specifically is **not** a discussion of network protocol
compatibility nor supporting rolling upgrades. Rolling upgrades of FDB are
still discouraged, and minor versions are still protocol incompatible with
each other.
2. This only covers the proposed design of how forward compatibility for
transaction logs will be handled, and not forward compatibility for
FoundationDB as a whole. Other parts of the system that durably store data,
namely the coordinators and storage servers, are not discussed.
## Overview
A new configuration option, `log_version`, will be introduced to allow a user
to control which on-disk format the transaction logs are allowed to use. Not
every release will affect the on-disk format of the transaction logs, so
`log_version` is an opaque integer that is incremented by one whenever the
on-disk format of the transaction log is changed.
`log_version` is set from `fdbcli`, with an invocation looking like
`$ fdbcli -C cluster.file --exec "configure log_version:=2"`. Note that `:=`
is used instead of `=`, to keep the convention in `fdbcli` that configuration
options that users aren't expected to need (or wish) to modify are set with
`:=`.
Right now, FDB releases and `log_version` values are as follows:
| Release | Log Version |
| ------- | ----------- |
| pre-5.2 | 1 |
| 5.2-6.0 | 2 |
| 6.1 | 3 |
| 6.2 | 4 |
| 6.3 | 5 |
If a user does not specify any configuration for `log_version`, then
`log_version` will be set so that rolling back to the previous minor version of
FDB will be possible. In other words, each release will always be able to load
the files that the next minor version writes with its default `log_version`. It
will be possible to configure `log_version` to a higher value on the release
that introduces it, if the user is willing to sacrifice the ability to roll back.
This means FDB's releases will work like the following:
| | 6.0 | 6.1 | 6.2 | 6.3 |
|--------------|-----|-----|-------|---------|
| Configurable | 2 | 2,3 | 3,4 | 4,5 |
| Default | 2 | 2 | 3 | 4 |
| Recoverable | 2 | 2,3 | 2,3,4 | 2,3,4,5 |
Where...
* "configurable" means values considered an acceptable configuration setting for `fdbcli> configure log_version:=N`.
* "default" means what `log_version` will be if you don't configure it.
* "recoverable" means that FDB can load files that were generated from the specified `log_version`.
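The rules behind this table can be checked mechanically. The sketch below is a minimal Python model of the table (`Release`, `RELEASES`, `check_release_table`, and `can_roll_back` are hypothetical names, not FDB code). It verifies the two properties stated in this section: the default is the minimum configurable value, and each release's default is recoverable by the previous release, which is what makes rolling back one minor version possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Hypothetical model of one column of the table above."""
    configurable: frozenset  # values accepted by `configure log_version:=N`
    default: int             # value used when log_version is not configured
    recoverable: frozenset   # log_versions whose files this release can load


RELEASES = {
    "6.0": Release(frozenset({2}), 2, frozenset({2})),
    "6.1": Release(frozenset({2, 3}), 2, frozenset({2, 3})),
    "6.2": Release(frozenset({3, 4}), 3, frozenset({2, 3, 4})),
    "6.3": Release(frozenset({4, 5}), 4, frozenset({2, 3, 4, 5})),
}
ORDER = ["6.0", "6.1", "6.2", "6.3"]


def check_release_table(releases, order):
    """Return a list of rule violations (empty if the table is consistent)."""
    problems = []
    for name in order:
        r = releases[name]
        if r.default != min(r.configurable):
            problems.append(f"{name}: default is not the minimum configurable value")
        if not r.configurable <= r.recoverable:
            problems.append(f"{name}: a configurable value is not recoverable")
    for older, newer in zip(order, order[1:]):
        # Rolling back from `newer` to `older` requires `older` to load
        # files written with `newer`'s default log_version.
        if releases[newer].default not in releases[older].recoverable:
            problems.append(f"cannot roll back from {newer} to {older}")
    return problems


def can_roll_back(releases, older, log_version):
    """True if files written at `log_version` can be loaded by `older`."""
    return log_version in releases[older].recoverable
```

`check_release_table(RELEASES, ORDER)` returns an empty list for the table above. `can_roll_back(RELEASES, "6.1", 3)` is true, while a 6.2 cluster configured with `log_version:=4` cannot roll back to 6.1, since `can_roll_back(RELEASES, "6.1", 4)` is false.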
Configuring a `log_version` will cause FDB to use the maximum of that
`log_version` and the default `log_version`. The default `log_version` will always
be the minimum configurable log version. This is done so that if `log_version`
is set manually once and FDB is then upgraded multiple times, the low
`log_version` left in the database configuration eventually acts as a request
for the default. Configuring `log_version` to a very high number (e.g. 9999)
will cause FDB to always use the highest available log version.
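A minimal sketch of this selection rule follows; `effective_log_version` and its `max_supported` parameter are hypothetical names, not FDB code. Taking the maximum with the default is stated above, while the clamp to the highest supported version is an assumption about how a value like 9999 turns into "the highest available log version".

```python
from typing import Optional


def effective_log_version(configured: Optional[int], default: int,
                          max_supported: int) -> int:
    """Pick the log_version a release would use (hypothetical helper).

    configured:    value from `configure log_version:=N`, or None if never set
    default:       the release's default (= minimum configurable) log_version
    max_supported: highest log_version this release can write
    """
    if configured is None:
        return default
    # A stale, low setting left over from an older release acts as a
    # request for the default, because the maximum wins.
    requested = max(configured, default)
    # Assumption: values above what the release supports (e.g. 9999)
    # mean "use the newest available format".
    return min(requested, max_supported)
```

Using 6.2's column of the release table (default 3, highest configurable 4): an unset value or a stale value of 2 yields 3, and both 4 and 9999 yield 4. The 6.1 example that follows corresponds to `effective_log_version(3, 2, 3)`, which yields 3.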
As a concrete example, 6.1 will introduce a new transaction log feature with
on-disk format implications. If you wish to use it, you'll first have to
`configure log_version:=3`. Otherwise, it will become the default after
upgrading to FDB 6.2. If problems are discovered after upgrading to FDB 6.2,
you can roll back to FDB 6.1. (Theoretically; see the scope restrictions above.)
## Detailed Implementation
`fdbcli> configure log_version:=3` sets `\xff/conf/log_version` to `3`. This
version is also persisted as part of the `LogSystemConfig` and thus
`DBCoreState`, so that any code handling the log system will have access to the
`log_version` that was used to create it.
Changing `log_version` will result in a recovery, and FoundationDB will recover
into the requested transaction log implementation. This involves locking the
previous generation of transaction logs, and then recruiting a new generation
of transaction logs. FDB will load `\xff/conf/log_version` as the requested
`log_version`, and when sending an `InitializeTLogRequest` to recruit a new
transaction log, it uses the maximum of the requested log version and the
default `log_version`.
A worker, when receiving an `InitializeTLogRequest`, will initialize a
transaction log corresponding to the requested `log_version`. Transaction logs
can pack multiple generations of transaction logs into the same shared entity,
a `SharedTLog`. `SharedTLog` instances correspond to one set of files, and
will only contain transaction log generations of the same `log_version`.
This allows one worker to run multiple generations of transaction logs with
different `log_version`s. If the worker crashes and restarts, it must be able
to recreate all of those transaction log instances.
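The grouping rule can be sketched as follows, assuming a worker keeps at most one `SharedTLog` per `log_version` and adds each newly recruited generation to the matching one. `Worker`, `SharedTLogSketch`, and `recruit` are hypothetical names, and everything a real worker does with files, actors, and recovery is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class SharedTLogSketch:
    """Stand-in for a SharedTLog: one set of files, one log_version."""
    log_version: int
    generations: list = field(default_factory=list)  # logIds it hosts

    def add_generation(self, log_id: str) -> None:
        self.generations.append(log_id)


class Worker:
    """Hypothetical worker that groups TLog generations by log_version."""

    def __init__(self) -> None:
        self.shared_tlogs: dict = {}  # log_version -> SharedTLogSketch

    def recruit(self, log_id: str, log_version: int) -> SharedTLogSketch:
        # Handle an InitializeTLogRequest: reuse the SharedTLog whose files
        # already use this log_version, or start a new one, so that one
        # SharedTLog never mixes on-disk formats.
        shared = self.shared_tlogs.get(log_version)
        if shared is None:
            shared = SharedTLogSketch(log_version)
            self.shared_tlogs[log_version] = shared
        shared.add_generation(log_id)
        return shared
```

Recruiting generations at versions 2, 3, and 3 leaves the worker with two shared instances, the second hosting both version-3 generations.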
Transaction logs maintain two types of files: one is a pair of files prefixed with
`logqueue-` that make up the DiskQueue, and the other is the metadata store, which
is normally a mini `ssd-2` storage engine running within the transaction log.
When a worker first starts, it scans its data directory for any files that were
instances of a transaction log. It then needs to construct a transaction log
instance that can read the format of the file to be able to reconnect the data
in the files back to the FDB cluster, so that it can be used in a recovery if
needed.
This presents a problem: the worker needs to know all the configuration
options that were used to decide the file format of the transaction log
*before* it can rejoin a cluster and get far enough through a recovery to find
out what that configuration was. To get around this, the relevant
configuration options have been added to the file name so that they're
available when scanning the list of files.
Currently, FDB identifies a transaction log instance by a file whose name starts
with `log-`, which represents the metadata store. This filename has the format
of `log-<UUID>.<SUFFIX>` where UUID is the `logId`, and SUFFIX tells us if the
metadata store is a memory or ssd storage engine file.
This format is being changed to `log2-<KV PAIRS>-<UUID>.<SUFFIX>`, where KV
PAIRS is a small amount of information encoded into the file name to give us
the metadata *about* the file that is required. According to POSIX, the
characters allowed for "fully portable filenames" are A to Z, a to z, 0 to 9,
`.`, `_`, and `-`, and the filename length should stay under 255 characters.
Since `-` and `.` already act as separators in the file name, `_` is the only
character left. Therefore, the KV pairs are encoded as `K1_V1_K2_V2_...`: each
key is separated from its value by `_`, and consecutive pairs are also
separated by `_`.
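The encoding can be sketched in a few lines of Python. `encode_kv_pairs` and `tlog_metadata_filename` are hypothetical helpers, and the check that keys and values contain only letters and digits is an assumption added so that `_`, `-`, and `.` stay unambiguous as separators. The suffix is passed with its leading separator so that the sketch reproduces this section's example file name, which ends in `-0.fdq`.

```python
import re

# Assumption: keys and values contain only letters and digits, so the
# separators `_` (inside KV PAIRS), `-` (between fields) and `.` stay
# unambiguous.
_TOKEN = re.compile(r"[A-Za-z0-9]+")
MAX_NAME_LEN = 255  # limit taken from the text above


def encode_kv_pairs(pairs):
    """Encode [(key, value), ...] as K1_V1_K2_V2_..."""
    tokens = []
    for key, value in pairs:
        for token in (str(key), str(value)):
            if not _TOKEN.fullmatch(token):
                raise ValueError(f"cannot encode {token!r} in a file name")
            tokens.append(token)
    return "_".join(tokens)


def tlog_metadata_filename(pairs, log_id, suffix):
    """Build log2-<KV PAIRS>-<UUID><suffix> (hypothetical helper).

    `suffix` includes its leading separator, e.g. "-0.fdq".
    """
    name = f"log2-{encode_kv_pairs(pairs)}-{log_id}{suffix}"
    if len(name) >= MAX_NAME_LEN:
        raise ValueError("file name too long")
    return name
```

For example, `encode_kv_pairs([("V", 3), ("LS", 2)])` produces `V_3_LS_2`, while a value containing `_` is rejected rather than silently producing an ambiguous name.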
The currently supported keys are:
V
: A copy of `log_version`
LS
: `log_spill`, a new configuration option in 6.1
Any unrecognized keys are ignored, which will likely help forward compatibility.
An example file name is `log2-V_3_LS_2-46a5f353ac18d787852d44c3a2e51527-0.fdq`.
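A matching parser, plus a helper that mimics the startup scan of a data directory, might look like the following. This is a hedged sketch rather than FDB's code: `parse_tlog_filename`, `scan_tlog_files`, and `KNOWN_KEYS` are hypothetical, the UUID is assumed to be 32 lowercase hex digits as in the example, everything after the UUID is treated as the suffix, and both known values are assumed to be integers. It accepts the old `log-` and new `log2-` forms and drops unrecognized keys, as described above.

```python
import re

KNOWN_KEYS = {"V": "log_version", "LS": "log_spill"}

_NEW = re.compile(r"log2-(?P<kv>[A-Za-z0-9_]*)-(?P<uuid>[0-9a-f]{32})(?P<suffix>.*)")
_OLD = re.compile(r"log-(?P<uuid>[0-9a-f]{32})(?P<suffix>.*)")


def parse_tlog_filename(name):
    """Return (log_id, options, suffix) for a TLog metadata file, else None."""
    m = _NEW.fullmatch(name)
    if m:
        tokens = m.group("kv").split("_") if m.group("kv") else []
        if len(tokens) % 2:
            return None  # malformed: a key without a value
        options = {}
        for key, value in zip(tokens[0::2], tokens[1::2]):
            if key not in KNOWN_KEYS:
                continue  # unrecognized keys are ignored
            if not value.isdigit():
                return None
            options[KNOWN_KEYS[key]] = int(value)
        return m.group("uuid"), options, m.group("suffix")
    m = _OLD.fullmatch(name)
    if m:
        # Old-style names carry no metadata in the file name itself.
        return m.group("uuid"), {}, m.group("suffix")
    return None


def scan_tlog_files(filenames):
    """Map logId -> options for every TLog metadata file in a listing."""
    found = {}
    for name in filenames:
        parsed = parse_tlog_filename(name)
        if parsed is not None:
            log_id, options, _suffix = parsed
            found[log_id] = options
    return found
```

DiskQueue files (`logqueue-...`) and other unrelated files return `None` and are skipped by the scan.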
### Testing
`SimulationConfig` has been changed to randomly set `log_version` according to
what is supported. This means that with restarting upgrade tests that simulate
upgrading from `N` to `N+1`, the `N+1` version will see files that came from an
FDB running with any `log_version` value that was previously supported. If
`N+1` can't handle the files correctly, then the simulation test will fail.
`ConfigureTest` tries randomly toggling `log_version` up and down in a live
database, along with all the other log-related options. Some combinations are
valid, while others are invalid and should be rejected, or would cause ASSERTs
in later parts of the code.
I've added a new test, `ConfigureTestRestart`, that tests changing
configurations and then upgrading FDB, to verify that upgrades still
happen correctly when `log_version` has been changed. This also verifies that
on-disk formats for those `log_version`s are still loadable by future FDB
versions.
There are no tests that mix the `ConfigureDatabase` and `Attrition` workloads.
It would be good to add one, to cover `log_version` changes in the
presence of failures, but such a test cannot easily be added. The simulator
calculates which processes/machines are safe to kill by looking at the current
configuration. For `ConfigureTest`, this isn't good enough, because `triple`
could mean that there are three replicas, or that the FDB cluster just changed
from `single` to `triple` and has only one replica of data until data
distribution finishes. It would be good to add a `ConfigureKillTest` sometime
in the future.
For FDB to actually announce that rolling back from `N+1` to `N` is supported,
there will need to be downgrade tests from `N+1` to `N` as well. The default in
`N+1` should always be recoverable by `N`. As FDB isn't promising forward
compatibility yet, these tests haven't been implemented.
# Transaction Log Forward Compatibility Operational Guide
## Notable Behavior Changes
When release notes mention that a new `log_version` is available, it's worth
considering raising `log_version` after deploying that release. Doing so allows
a controlled upgrade and reduces the number of new changes that take effect
when upgrading to the next release.
## Observability
* When running with a non-default `log_version`, the setting will appear in `fdbcli> status`.
## Monitoring and Alerting
Anything that relies on the file names used by the transaction log will need to be updated, because those names are changing.
<!-- Force long-style table of contents -->
<script>window.markdeepOptions={}; window.markdeepOptions.tocStyle="long";</script>
<!-- When printed, top level section headers should force page breaks -->
<style>.md h1, .md .nonumberh1 {page-break-before:always}</style>
<!-- Markdeep: -->
<style class="fallback">body{visibility:hidden;white-space:pre;font-family:monospace}</style><script src="markdeep.min.js" charset="utf-8"></script><script src="https://casual-effects.com/markdeep/latest/markdeep.min.js" charset="utf-8"></script><script>window.alreadyProcessedMarkdeep||(document.body.style.visibility="visible")</script>