foundationdb/design/backup-dataFormat.md

# FDB Backup Data Format

## Introduction

This document describes the data format of the files generated by the FoundationDB (FDB) backup procedure. The readers who may benefit from this document include:

  • developers who make changes to the current backup or restore procedure;
  • developers who write tools to digest the backup data for analytical purposes;
  • anyone who wants to understand the internals of how backup and restore work.

The description of the backup data format is based on FDB 5.2 to FDB 6.1. The backup data format may (although unlikely) change after FDB 6.1.

## Files generated by backup

The backup procedure generates two types of files: range files and log files.

  • A range file describes key-value pairs in a range at the version when the backup procedure takes a snapshot of that range. Different range files contain data for different ranges at different versions.
  • A log file describes the mutations applied to the database from a version v1 to a version v2 during the backup procedure.

With the key-value pairs in the range files and the mutations in the log files, the restore procedure can restore the database to a consistent state at a user-provided version vk, provided the restore procedure determines that the backup data is restorable at vk. (The details of determining whether a set of backup data is restorable at a version are out of the scope of this document and can be found in backup.md.)

## Filename conventions

The backup files are saved in a directory (i.e., a URL) specified by the user. Under that directory, the range files are in the snapshots folder, and the log files are in the logs folder.

The convention of the range filename is `snapshots/snapshot,beginVersion,beginVersion,blockSize`, where `beginVersion` is the version when the key-values in the range file are recorded, and `blockSize` is the size of data blocks in the range file.

The convention of the log filename is `logs/versionPrefix/log,beginVersion,endVersion,randomUID,blockSize`, where `versionPrefix` is a 2-level path (`x/y`) under which `beginVersion` should go such that `x/y/*` contains (10^smallestBucket) possible versions; `randomUID` is a random UID; `beginVersion` and `endVersion` are the version range (left inclusive, right exclusive) during which the mutations were recorded; and `blockSize` is the data block size in the log file.

We will use an example to explain what each field in the range and log filenames means. Suppose that under the backup directory, we have a range file `snapshots/snapshot,78994177,78994177,97` and a log file `logs/0000/0000/log,78655645,98655645,149a0bdfedecafa2f648219d5eba816e,1048576`. The range file's filename tells us that all key-value pairs decoded from the file are the key-values in the database at version 78994177, and that the data block size is 97 bytes. The log file's filename tells us that the mutations in the log file were applied to the database during the version range [78655645, 98655645), and that the data block size is 1048576 bytes.
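
The naming convention above can be illustrated with a small sketch that splits a filename into its fields. The helper names here are hypothetical, written for this document; they are not part of the FDB codebase:

```python
def parse_range_filename(path):
    # e.g. "snapshots/snapshot,78994177,78994177,97"
    name = path.split("/")[-1]
    prefix, begin, _begin_again, block_size = name.split(",")
    assert prefix == "snapshot"
    return {"beginVersion": int(begin), "blockSize": int(block_size)}

def parse_log_filename(path):
    # e.g. "logs/0000/0000/log,78655645,98655645,<uid>,1048576"
    name = path.split("/")[-1]
    prefix, begin, end, uid, block_size = name.split(",")
    assert prefix == "log"
    return {"beginVersion": int(begin), "endVersion": int(end),
            "randomUID": uid, "blockSize": int(block_size)}
```

Running `parse_log_filename` on the example log filename above yields begin version 78655645, end version 98655645, and a block size of 1048576.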

## Data format in a range file

A range file can have one to many data blocks. Each data block has a set of key-value pairs. A data block is encoded as follows: `Header startKey k1v1 k2v2 Padding`.

Example:

The client code writes keys in this sequence:
         a c d e f g h i j z
The backup procedure records the key-value pairs in the database into a range file.

H = header   P = padding   a...z = keys  v = value | = block boundary

Encoded file:  H a cv dv ev P | H e ev fv gv hv P | H h hv iv jv z
Decoded in blocks yields:
           Block 1: range [a, e) with kv pairs cv, dv
           Block 2: range [e, h) with kv pairs ev, fv, gv
           Block 3: range [h, z) with kv pairs hv, iv, jv

NOTE: All blocks except for the final block will have one last value which will not be used. This isn't actually a waste, since if the next KV pair wouldn't fit within the block after that value, the space after the final key up to the next 1MB boundary would just be padding anyway.

The code related to how a range file is written is in the struct `RangeFileWriter` in namespace `fileBackup`.

The code that decodes a range block is in `ACTOR Future<Standalone<VectorRef<KeyValueRef>>> decodeRangeFileBlock(Reference<IAsyncFile> file, int64_t offset, int len)`.

## Data format in a log file

A log file can have one to many data blocks. Each block is encoded as `Header, [Param1, Param2]... padding`. The first 32 bits of Param1 and of Param2 specify the lengths of Param1 and Param2, respectively. Param1 specifies the version when the mutations happened; Param2 encodes the group of mutations that happened at that version.

Note that if a group of mutations is bigger than the block size, the mutation group will be split across multiple data blocks. For example, we may get [Param1, Param2_part0], [Param1, Param2_part1]. By concatenating Param2_part0 and Param2_part1, we can get the group of all mutations that happened at the version specified in Param1.

The encoding format for Param1 is as follows: `hashValue|commitVersion|part`, where `hashValue` is the hash of the commit version, `commitVersion` is the version when the mutations in the Param2(s) were taken, and `part` is the part number used when the Param2s must be concatenated to recover the full group of mutations. `hashValue` takes 8 bits, `commitVersion` takes 64 bits, and `part` takes 32 bits.
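
As a sketch, the fixed-width fields of Param1 (after its 32-bit length prefix has been stripped) can be unpacked with Python's `struct` module. The field widths come from this document, and the big-endian byte order follows the Endianness section below; the helper name is hypothetical:

```python
import struct

def decode_param1(param1: bytes):
    # hashValue: 8 bits, commitVersion: 64 bits, part: 32 bits,
    # all serialized big-endian in the backup file.
    hash_value, commit_version, part = struct.unpack(">BQI", param1)
    return hash_value, commit_version, part
```

For example, a Param1 payload is 13 bytes: 1 byte of hash, 8 bytes of commit version, and 4 bytes of part number.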

Note that when concatenating the partial groups of mutations in the Param2s to recover the full group of mutations, the part numbers should be consecutive.

The encoding format for the group of mutations, which is Param2 or the concatenated Param2s in the case of a partial group of mutations in a block, is as follows: `length_of_the_mutation_group | encoded_mutation_1 | … | encoded_mutation_k`. Each `encoded_mutation_i` is encoded as `type|kLen|vLen|Key|Value`, where `type` is the mutation type (such as Set or Clear), `kLen` and `vLen` are respectively the lengths of the key and value in the mutation, and `Key` and `Value` are the serialized values of the key and value in the mutation.
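
The layout above can be sketched as a decoder for a fully concatenated mutation group. The overall layout comes from this document, but the assumption that `length_of_the_mutation_group`, `type`, `kLen`, and `vLen` are each 32-bit big-endian integers, and that the length counts only the encoded mutations that follow it, is mine; the authoritative framing is defined by `LogFileWriter`:

```python
import struct

def decode_mutation_group(param2: bytes):
    # Assumed: a 32-bit big-endian length, counting the bytes of the
    # encoded mutations that follow it.
    group_len, = struct.unpack_from(">I", param2, 0)
    mutations = []
    offset, end = 4, 4 + group_len
    while offset < end:
        # Assumed: type, kLen, vLen are each 32-bit big-endian integers.
        mtype, klen, vlen = struct.unpack_from(">III", param2, offset)
        offset += 12
        key = param2[offset:offset + klen]
        offset += klen
        value = param2[offset:offset + vlen]
        offset += vlen
        mutations.append((mtype, key, value))
    return mutations
```

Under these assumptions, a group containing a single Set of key `k` to value `v` decodes to one `(type, key, value)` tuple.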

The code related to how a log file is written is in the struct `LogFileWriter` in namespace `fileBackup`.

The code that decodes a mutation block is in `ACTOR Future<Standalone<VectorRef<KeyValueRef>>> decodeLogFileBlock(Reference<IAsyncFile> file, int64_t offset, int len)`.

## Endianness

When the restore procedure decodes a serialized integer from a backup file, it needs to convert the serialized value from big endian to little endian.

The reason is as follows: when the backup procedure transfers the data to a remote blob store, the backup data is encoded in big endian. However, FoundationDB currently runs only on little-endian machines. Since endianness affects the interpretation of an integer, we must perform the endianness conversion.
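
For example, decoding a 64-bit version with Python's `struct` module (a sketch, not the restore code itself):

```python
import struct

# A commit version is stored in the backup file as a big-endian 64-bit integer.
raw = struct.pack(">Q", 78994177)       # bytes as they appear on disk

version = struct.unpack(">Q", raw)[0]   # decode as big-endian: correct
wrong = struct.unpack("<Q", raw)[0]     # reinterpret little-endian: wrong value
```

Reading the bytes with the machine's native little-endian layout would silently produce a different number, which is why the restore must byte-swap on decode.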