Commit Graph

151 Commits

Author SHA1 Message Date
Meng Xu 07a9a05683 FastRestore:Agent:Fix restore requests 2020-04-30 16:20:20 -07:00
Meng Xu 6ee78aa3a4 Fix:Disable sanity check backup metadata file
Which can increase the false positive rate of TooManyFiles error
2020-04-30 16:16:14 -07:00
Jingyu Zhou a8becb9027 Use calculated block size for range files in unit tests 2020-04-28 21:13:18 -07:00
Jingyu Zhou ba261eda36 Fix a backup container unit test
Write a valid range file instead of random data so that checking its content is
fine.
2020-04-28 15:39:24 -07:00
Meng Xu 59217ddf1e Remove sanity check on metadata
The sanity check parses each range file to get the key range of each
range file.
The parsing incurs restore_unsupported_file_version error.

We need to include this sanity check before 6.3 release.
2020-04-23 14:59:58 -07:00
Jingyu Zhou 9bfc5bbea8 Check RestorableFileSet's key ranges in simulation
Ranges written in the manifest file should match with actual file content.
2020-04-21 12:55:40 -07:00
Jingyu Zhou 0938e45c6a Set continuous version to invalidVersion when snapshot version is the target version 2020-04-20 22:26:42 -07:00
Jingyu Zhou 930b175c4c Add range files' key ranges to RestorableFileSet
Also add continuous logs' begin and end version in RestorableFileSet.
2020-04-20 22:26:42 -07:00
Jingyu Zhou a2b867c6f9 Fix a unit test failure 2020-04-20 22:26:42 -07:00
Jingyu Zhou 3063611355 Write range files' begin & end keys to manifest file
This information can be very useful in knowing the content in these files,
especially for restores.
2020-04-20 22:26:42 -07:00
Jingyu Zhou 4c66c8c377 Fix backup progress calculation
The oldest epoch the master gets can assume its begin version is 1, which can
be wrong. In this case, we use the saved backup progress to "true-up" the real
begin version.
2020-04-20 11:06:46 -07:00
Jingyu Zhou db2cef844b Write mutation log type as a backup property
This can solve the problem when listing log files returns empty results.
2020-04-09 09:29:24 -07:00
Jingyu Zhou 4d06e837dc Remove getPartitionedRestoreSet() API
Use getRestoreSet() instead for both old and new partitioned logs.
2020-04-08 20:12:09 -07:00
Jingyu Zhou fd9caa88a0 Remove isPartitionedBackup()
This is no longer needed, since describeBackup() figures this out.
2020-04-08 16:09:18 -07:00
Jingyu Zhou 0cf6013357 Refactor to remove describePartitionedBackup()
The backup container can figure out if partitioned logs are used by looking at
mutation logs, thus consolidating the API to a single describeBackup() as
before.
2020-04-08 15:59:37 -07:00
Jingyu Zhou 241f9c123e Try to find continuous log ranges that cover snapshot begin version
Backup container can have mutation log files that are not continuous overall,
but contain a continuous range that covers the snapshots. So when determining
the continuous log ranges, try to find one that covers the first snapshot's
begin version.
2020-03-28 21:19:47 -07:00
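The idea in the commit above can be sketched as follows. This is a minimal illustration, not the BackupContainer code: the `(begin, end)` tuple representation, the half-open `[begin, end)` interpretation, and the function names `continuous_ranges` and `range_covering` are all assumptions made for the example.

```python
def continuous_ranges(files):
    """Merge (begin_version, end_version) pairs into maximal continuous ranges.

    Assumption: each pair is a half-open range [begin, end), and two files
    are continuous when the next one begins at or before the current end.
    """
    ranges = []
    for begin, end in sorted(files):
        if ranges and begin <= ranges[-1][1]:
            # Touches or overlaps the current range: extend it.
            ranges[-1][1] = max(ranges[-1][1], end)
        else:
            # Gap before this file: start a new range.
            ranges.append([begin, end])
    return [tuple(r) for r in ranges]


def range_covering(files, snapshot_begin):
    """Return the continuous range containing snapshot_begin, or None."""
    for begin, end in continuous_ranges(files):
        if begin <= snapshot_begin < end:
            return (begin, end)
    return None


# Hypothetical log files with a gap between versions 150 and 300.
example_files = [(0, 100), (100, 150), (300, 400), (400, 600)]
covering = range_covering(example_files, 350)
```

Here the logs are not continuous overall, yet a snapshot beginning at version 350 is still covered by the second continuous range.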
Jingyu Zhou 3261eae8c1 Initialize contiguousLogEnd to the lowest possible version
If set to the first file's end version, then we may erroneously set to an
unrestorable version.
2020-03-28 17:38:02 -07:00
Jingyu Zhou 34745d71fa Fix continuous end version
When scanning from a known version, stop if logs are not continuous. However,
if scanning from 0, we should reset minLogBegin to the next file and continue
scanning from that file's begin version.
2020-03-28 17:38:02 -07:00
Jingyu Zhou abcb458a44 Fix contiguousLogEnd calculation
The contiguous version should scan from the max of scanBegin version and the
minLogBegin version. Once we find a version that's larger, set it as the
contiguousLogEnd version.
2020-03-28 17:38:02 -07:00
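A rough sketch of the calculation described above, under stated assumptions: files are `(begin_version, end_version)` pairs treated as half-open ranges, and `contiguous_log_end` is a hypothetical helper name. The names `scan_begin` and `min_log_begin` mirror scanBegin and minLogBegin from the commit message; the real implementation may differ in detail.

```python
def contiguous_log_end(files, scan_begin):
    """Return the end of the contiguous log range starting at the scan point.

    Assumption: the scan starts at max(scan_begin, min_log_begin) and stops
    at the first gap. If no file extends the start point, the start point
    itself is returned. Returns None when there are no files.
    """
    if not files:
        return None
    ordered = sorted(files)
    min_log_begin = ordered[0][0]
    end = max(scan_begin, min_log_begin)
    for begin, file_end in ordered:
        if begin > end:
            # Gap: logs are no longer contiguous.
            break
        if file_end > end:
            # Found a larger version: it becomes the new contiguous end.
            end = file_end
    return end


# Hypothetical files with a gap between versions 200 and 250.
log_files = [(0, 100), (100, 200), (250, 300)]
```

Scanning from a version inside the first two files stops at the gap, while scanning from a version inside the last file reaches that file's end.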
Meng Xu 32b0ba1822 Merge branch 'master' into mengxu/parallel-range-log-file-loading-PR 2020-03-27 12:13:47 -07:00
Meng Xu 113d0fb48b Remove incorrect assertion 2020-03-27 12:13:30 -07:00
Jingyu Zhou 772ab70aee Add an option for fast restore to restore old backups
If "usePartitionedLogs" is set to false, then the workload uses old backups for
restore.
2020-03-26 13:04:00 -07:00
Jingyu Zhou 799f0b4b0e Small code refactor 2020-03-20 20:15:09 -07:00
Jingyu Zhou 4c75c61f39 Fix duplicate file removal for subset version ranges
Partitioned logs can have strict subset version ranges, which was not properly
handled -- we used to assume overlapping only happens for the same begin
version.
2020-03-20 20:15:09 -07:00
Jingyu Zhou c63493c34f Allow overlapped versions in partitioned logs
The overlapping can only happen between two generations, where the range from
the known committed version to the recovery version is copied from the old
generation to the new generation. Within a generation, there is no overlap.

The fix here is related to the calculation of continuous version ranges,
allowing the overlap to happen.
2020-03-20 20:15:09 -07:00
Jingyu Zhou 4f4ce93f8c Remove debug print out 2020-03-20 20:15:09 -07:00
Jingyu Zhou fe6b4a4398 Some correctness fixes 2020-03-20 20:15:08 -07:00
Jingyu Zhou 5ce9fc0e4c Partitioned logs should be filtered after sorting by tag IDs
The default sorting by begin and end version doesn't work with duplicates
removal, as tags are also compared.
2020-03-20 20:15:08 -07:00
Jingyu Zhou 20df67ee6a Filter partitioned logs with subset relationship
If a log file's progress is not saved, a new log file will be generated
with the same begin version. Then we can have a file that contains a subset
of contents in another log file. During restore, we should filter out files
whose contents are a subset of other files.
2020-03-20 20:15:08 -07:00
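The subset filtering described above can be illustrated with a small sketch. It is not the actual BackupContainer logic: the `LogFile` tuple, its field names, and `filter_subset_files` are assumptions for the example, and a file is treated as a subset of another when both share a tag and its version range lies within the other's.

```python
from collections import defaultdict
from typing import NamedTuple


class LogFile(NamedTuple):
    # Hypothetical record; field names are illustrative only.
    tag_id: int
    begin_version: int
    end_version: int


def filter_subset_files(files):
    """Drop files whose version range is contained in another file's range
    for the same tag. Exact duplicates are reduced to a single copy."""
    by_tag = defaultdict(list)
    for f in files:
        by_tag[f.tag_id].append(f)

    kept = []
    for tag_id in sorted(by_tag):
        # Sort by begin ascending, then end descending, so that any file
        # containing another one is visited first.
        ordered = sorted(by_tag[tag_id],
                         key=lambda f: (f.begin_version, -f.end_version))
        max_end = None
        for f in ordered:
            if max_end is not None and f.end_version <= max_end:
                # An earlier file begins no later and ends no earlier.
                continue
            kept.append(f)
            max_end = f.end_version
    return kept


sample = [
    LogFile(0, 100, 200),
    LogFile(0, 100, 150),  # same begin version, subset of the first file
    LogFile(0, 120, 180),  # strict subset with a different begin version
    LogFile(0, 200, 300),
    LogFile(1, 100, 150),  # different tag, so it is never compared with tag 0
]
filtered = filter_subset_files(sample)
```

Grouping by tag before sorting reflects the point made in a neighboring commit that sorting by begin and end version alone does not work when tags are also compared.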
Jingyu Zhou 05b87cf288 Partitioned logs need to compute continuous begin Version
Because different tags may start at different versions, tag 0 can start at a
higher version. In this case, another tag's high version should be used as
the start version for continuous logs.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 1f95cba53e Add describePartitionedBackup() for parallel restore
For partitioned logs, compute the continuous log end version from the minimum
log begin version. The old backup test keeps using describeBackup() to stay
correctness clean.

Rename partitioned log file so that the last number is block size.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 938a6f358d Describe backup uses partitioned logs to find continuous end version
For partitioned logs, the continuous end version has to be computed range by
range, where each range must contain continuous versions for all tags.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 659843ff51 Check partitioned log files are continuous for RestoreSet
The idea of checking is to use Tag 0 to find out ranges and their number of
tags. Then for each tag 1 and above, check versions are continuous.
2020-03-20 20:13:38 -07:00
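The per-tag check described above might look roughly like this. It is a sketch under assumptions: the version range and the total number of tags are taken as already derived from tag 0, files are `(begin, end)` pairs treated as half-open ranges, and `covers` and `all_tags_continuous` are hypothetical names.

```python
def covers(files, begin, end):
    """True if the (begin, end) pairs in files cover [begin, end) with no gap.

    Overlapping files are allowed; only gaps make the check fail.
    """
    reached = begin
    for file_begin, file_end in sorted(files):
        if file_begin > reached:
            # Gap before this file.
            break
        reached = max(reached, file_end)
        if reached >= end:
            return True
    return reached >= end


def all_tags_continuous(files_by_tag, total_tags, begin, end):
    """Check that every tag in 0..total_tags-1 has continuous logs over
    [begin, end). A tag with no files at all fails the check."""
    return all(covers(files_by_tag.get(tag, []), begin, end)
               for tag in range(total_tags))


# Hypothetical data: two tags, both continuous over [100, 300).
good = {0: [(100, 200), (200, 300)], 1: [(100, 250), (250, 300)]}
# Tag 1 has a gap between versions 200 and 220.
bad = {0: [(100, 200), (200, 300)], 1: [(100, 200), (220, 300)]}
```

Knowing the total number of tags matters because a tag with no files cannot otherwise be told apart from a tag that never existed, which is consistent with the later commit that adds the tag count to the file names.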
Jingyu Zhou fda6c08640 Include a total number of tags in partition log file names
This is needed for BackupContainer to check partitioned mutation logs are
continuous, i.e., restorable to a version.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 64859467e4 Return partitioned logs for RestorableFileSet 2020-03-20 20:13:38 -07:00
Jingyu Zhou 88ad28e576 Integrate parallel restore with partitioned logs
In parallel restore, use new getPartitionedRestoreSet() to get a set containing
partitioned mutation logs. The loader uses a new parser to extract mutations
from partitioned logs.

TODO: fix unable to restore errors.
2020-03-20 20:13:38 -07:00
Jingyu Zhou ec352c03c9 Add partitioned logs to BackupContainer 2020-03-20 20:13:38 -07:00
Evan Tschannen c11c24b79d removed the fdbrpc version of platform.h 2020-02-28 14:56:10 -08:00
Evan Tschannen 96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
Steve Atherton 3d72c2a661 BackupContainerFilesystem no longer unnecessarily depends on abspath() to find the last part of a path string, since it shouldn't touch the local filesystem in the remote case. 2020-02-18 16:35:00 -08:00
Jingyu Zhou 7c10683c77 Backup workers save logs into right containers
The mutation logs of backup workers are saved into "mlogs" directory under the
container directory. The backup worker has been restructured to handle multiple
backups, where each one is stored in a separate backup container.

In the backup worker, mutations pulled from TLogs are buffered in a message
queue. When writing out to different containers, their corresponding mutation
ranges are used to check if a mutation should be written. When a new backup
is submitted by the client, "backupStartedKey" is updated. The worker monitors
this key and updates its internal map of backups, and the next pull from the
TLogs then waits until the new backup is ready. This ensures that when worker 0
marks the backup as started, all workers are already logging mutations for the
backup.
2020-02-03 20:27:14 -08:00
Jingyu Zhou d8c74e7e1a Extend BackupContainer to support tagged log files
That is, the file name contains the log router tag ID as the last component,
e.g., "log,39638169,42718056,016f52a4d16ef36fd3335db9c68abfc1,1048576,1".
2020-01-22 19:38:46 -08:00
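As an illustration of the file name layout mentioned above, here is a small parsing sketch. Only the position of the tag ID (the last component) comes from the commit message; reading the other fields as begin version, end version, a unique ID, and block size is an assumption, and `TaggedLogFile` and `parse_tagged_log_name` are hypothetical names. Later commits in this log change the layout of partitioned log file names, so this applies only to the format shown here.

```python
from typing import NamedTuple


class TaggedLogFile(NamedTuple):
    # Field meanings other than tag_id are assumptions, not confirmed by
    # the commit message.
    begin_version: int
    end_version: int
    uid: str
    block_size: int
    tag_id: int


def parse_tagged_log_name(name: str) -> TaggedLogFile:
    """Split a comma-separated tagged log file name into its components."""
    parts = name.split(",")
    if len(parts) != 6 or parts[0] != "log":
        raise ValueError(f"not a tagged log file name: {name!r}")
    _, begin, end, uid, block_size, tag_id = parts
    return TaggedLogFile(int(begin), int(end), uid, int(block_size), int(tag_id))


parsed = parse_tagged_log_name(
    "log,39638169,42718056,016f52a4d16ef36fd3335db9c68abfc1,1048576,1"
)
```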
Jingyu Zhou f21d7ca44c Add tag ID to backup log file names 2020-01-22 19:38:46 -08:00
Jingyu Zhou dafcaee844 Fix compiler errors. 2020-01-22 19:38:45 -08:00
Jingyu Zhou c7f51782b8 Use override for virtual functions. 2020-01-22 19:38:45 -08:00
Steve Atherton 9a031bfc47 Function was renamed. 2019-12-11 11:00:12 -08:00
Stephen Atherton 09e8d804e8 Added BlobStoreEndpoint::listBuckets(), renamed listBucket() and several related functions with similar names to listObjects() to avoid confusion and more closely match what they actually do. Added a bytesDeleted output statistic to BlobStoreEndpoint::deleteRecursively. 2019-12-06 00:14:13 -08:00
Meng Xu 3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu 45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
Meng Xu 477fd152c0 FastRestore:Refactor code
1) Use the runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition in loop-choose-when for
   the core routine of restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
   the file only has functionalities related to restore worker.

Passed correctness test
2019-06-04 11:22:47 -07:00