Commit Graph

225 Commits

Author SHA1 Message Date
Young Liu 1ad5e17458 add support for comparing original and current impls 2020-09-05 11:14:59 -07:00
Jon Fu b9baa996e3 Merge branch 'master' of https://github.com/apple/foundationdb into jfu-incremental-backup-only 2020-09-01 14:38:02 -04:00
Evan Tschannen 12edadd059 Merge branch 'release-6.3'
# Conflicts:
#	CMakeLists.txt
#	fdbclient/Knobs.cpp
#	fdbclient/MasterProxyInterface.h
#	fdbrpc/simulator.h
#	fdbserver/MasterProxyServer.actor.cpp
#	tests/fast/CycleAndLock.txt
#	tests/fast/TxnStateStoreCycleTest.txt
#	tests/fast/VersionStamp.txt
#	tests/slow/ParallelRestoreOldBackupApiCorrectnessAtomicRestore.txt
#	tests/slow/ParallelRestoreOldBackupCorrectnessCycle.txt
#	versions.target
2020-08-31 19:33:34 -07:00
Young Liu b6c0299d09 Add help message in backup CLI for added options 2020-08-31 09:31:57 -07:00
Young Liu 33aa10b461 Minor optimizations 2020-08-29 20:10:45 -07:00
Young Liu fd7198d874 Extend backup container interface to support query restorable files set by key ranges 2020-08-29 19:58:07 -07:00
Jon Fu 00c77ba2b4 Added beginVersion cmd line option and addressed code review comments 2020-08-28 14:29:22 -04:00
Meng Xu ca9b1f5b34 Merge branch 'release-6.3' into mengxu/fr-sched-PR
Resolve conflict at BackupContainer.actor.cpp
2020-08-27 16:54:00 -07:00
Meng Xu 369000a125 BackupContainer:Remove link to filename with random string 2020-08-26 15:55:36 -07:00
Meng Xu a2ab709a0c BackupContainer:Use processId as the process filename
instead of using a randomly generated string, which changes every time
a file is opened.

Having too many files will trigger the TOO_MANY_FILES error.
2020-08-26 15:54:34 -07:00
Meng Xu 6256bedf8d BackupContainer:Use processId as the process filename
instead of using a randomly generated string, which changes every time
a file is opened.

Having too many files will trigger the TOO_MANY_FILES error.
2020-08-25 12:25:09 -07:00
Jon Fu ae999aa118 Merge branch 'master' of https://github.com/apple/foundationdb into jfu-incremental-backup-only 2020-08-19 16:36:47 -04:00
Jon Fu 7dce3a9187 fixed issue with mutations not applying and allow backup to non-empty db 2020-08-11 15:39:21 -04:00
Jon Fu 21635f8a28 update backup restore for local testing 2020-08-04 15:48:43 -04:00
Evan Tschannen a49cb41de7 Merge branch 'release-6.3'
# Conflicts:
#	CMakeLists.txt
#	cmake/ConfigureCompiler.cmake
#	fdbserver/Knobs.cpp
#	fdbserver/StorageCache.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	flow/ThreadHelper.actor.h
#	flow/serialize.h
#	tests/CMakeLists.txt
2020-07-29 00:31:55 -07:00
Andrew Noyes d2cf700bd4 Fix compiler warnings 2020-07-28 18:30:26 +00:00
Xin Dong 2ac7df8a18 Merge pull request #3563 from xumengpanda/mengxu/fr-filesize-PR
FastRestore:Add trace for file size and bc progress
2020-07-25 20:08:57 -07:00
Meng Xu 99d8399f4e FastRestore:Add trace for file size and bc progress 2020-07-25 19:12:25 -07:00
Evan Tschannen e1dedff7b3 Merge branch 'release-6.2' into release-6.3
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	cmake/ConfigureCompiler.cmake
#	documentation/sphinx/source/downloads.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/FileBackupAgent.actor.cpp
#	packaging/msi/FDBInstaller.wxs
#	versions.target
2020-07-24 12:10:44 -07:00
Oleg Samarin 18964fae2a Unclear error message in fdbbackup if the backup url uses a symlink 2020-07-14 15:39:31 +03:00
Andrew Noyes 6446b4c082 WIP 2020-07-09 22:02:43 +00:00
David Youngworth f22a8845e4 Finish putting actors in Platform.actor.cpp 2020-05-20 18:14:29 -07:00
David Youngworth 5559643403 Fix rebase issue 2020-05-06 13:39:18 -07:00
David Youngworth a09f30e48e Make backupContainer's listFiles step asynchronous 2020-05-06 10:42:40 -07:00
Meng Xu 07a9a05683 FastRestore:Agent:Fix restore requests 2020-04-30 16:20:20 -07:00
Meng Xu 6ee78aa3a4 Fix:Disable sanity check backup metadata file
The check can increase the false-positive rate of the TooManyFiles error.
2020-04-30 16:16:14 -07:00
Jingyu Zhou a8becb9027 Use calculated block size for range files in unit tests 2020-04-28 21:13:18 -07:00
Jingyu Zhou ba261eda36 Fix a backup container unit test
Write a valid range file instead of random data so that checking its content
succeeds.
2020-04-28 15:39:24 -07:00
Meng Xu 59217ddf1e Remove sanity check on metadata
The sanity check parses each range file to get the key range of each
range file.
The parsing triggers the restore_unsupported_file_version error.

We need to include this sanity check before the 6.3 release.
2020-04-23 14:59:58 -07:00
Jingyu Zhou 9bfc5bbea8 Check RestorableFileSet's key ranges in simulation
Ranges written in the manifest file should match the actual file content.
2020-04-21 12:55:40 -07:00
Jingyu Zhou 0938e45c6a Set continuous version to invalidVersion when snapshot version is the target version 2020-04-20 22:26:42 -07:00
Jingyu Zhou 930b175c4c Add range files' key ranges to RestorableFileSet
Also add continuous logs' begin and end version in RestorableFileSet.
2020-04-20 22:26:42 -07:00
Jingyu Zhou a2b867c6f9 Fix a unit test failure 2020-04-20 22:26:42 -07:00
Jingyu Zhou 3063611355 Write range files' begin & end keys to manifest file
This information can be very useful in knowing the content in these files,
especially for restores.
2020-04-20 22:26:42 -07:00
Jingyu Zhou 4c66c8c377 Fix backup progress calculation
The oldest epoch the master gets can assume its begin version is 1, which can
be wrong. In this case, we use the saved backup progress to "true-up" the real
begin version.
2020-04-20 11:06:46 -07:00
Jingyu Zhou db2cef844b Write mutation log type as a backup property
This can solve the problem where listing log files returns empty results.
2020-04-09 09:29:24 -07:00
Jingyu Zhou 4d06e837dc Remove getPartitionedRestoreSet() API
Use getRestoreSet() instead for both old and new partitioned logs.
2020-04-08 20:12:09 -07:00
Jingyu Zhou fd9caa88a0 Remove isPartitionedBackup()
This is no longer needed, since describeBackup() figures this out.
2020-04-08 16:09:18 -07:00
Jingyu Zhou 0cf6013357 Refactor to remove describePartitionedBackup()
The backup container can figure out if partitioned logs are used by looking at
mutation logs, thus consolidating the API to a single describeBackup() as
before.
2020-04-08 15:59:37 -07:00
Jingyu Zhou 241f9c123e Try to find continuous log ranges that cover snapshot begin version
Backup container can have mutation log files that are not continuous overall,
but contain a continuous range that covers the snapshots. So when determining the
continuous log ranges, try to find one that covers the first snapshot's begin
version.
2020-03-28 21:19:47 -07:00
Jingyu Zhou 3261eae8c1 Initialize contiguousLogEnd to the lowest possible version
If set to the first file's end version, then we may erroneously set it to an
unrestorable version.
2020-03-28 17:38:02 -07:00
Jingyu Zhou 34745d71fa Fix continuous end version
When scanning from a known version, stop if logs are not continuous. However, if
scanning from 0, we should reset minLogBegin to the next file and continue scanning
from that file's begin version.
2020-03-28 17:38:02 -07:00
Jingyu Zhou abcb458a44 Fix contiguousLogEnd calculation
The contiguous version should scan from the max of scanBegin version and the
minLogBegin version. Once we find a version that's larger, set it as the
contiguousLogEnd version.
2020-03-28 17:38:02 -07:00
Meng Xu 32b0ba1822 Merge branch 'master' into mengxu/parallel-range-log-file-loading-PR 2020-03-27 12:13:47 -07:00
Meng Xu 113d0fb48b Remove incorrect assertion 2020-03-27 12:13:30 -07:00
Jingyu Zhou 772ab70aee Add an option for fast restore to restore old backups
If "usePartitionedLogs" is set to false, then the workload uses old backups for
restore.
2020-03-26 13:04:00 -07:00
Jingyu Zhou 799f0b4b0e Small code refactor 2020-03-20 20:15:09 -07:00
Jingyu Zhou 4c75c61f39 Fix duplicate file removal for subset version ranges
Partitioned logs can have strict subset version ranges, which was not properly
handled -- we used to assume overlapping only happens for the same begin
version.
2020-03-20 20:15:09 -07:00
Jingyu Zhou c63493c34f Allow overlapped versions in partitioned logs
Overlapping can only happen between two generations, where the range from the
known committed version to the recovery version is copied from the old generation
to the new generation. Within a generation, there is no overlap.

The fix here is related to the calculation of continuous version ranges,
allowing the overlap to happen.
2020-03-20 20:15:09 -07:00
Jingyu Zhou 4f4ce93f8c Remove debug print out 2020-03-20 20:15:09 -07:00
Jingyu Zhou fe6b4a4398 Some correctness fixes 2020-03-20 20:15:08 -07:00
Jingyu Zhou 5ce9fc0e4c Partitioned logs should be filtered after sorting by tag IDs
The default sorting by begin and end version doesn't work with duplicates
removal, as tags are also compared.
2020-03-20 20:15:08 -07:00
Jingyu Zhou 20df67ee6a Filter partitioned logs with subset relationship
If a log file's progress is not saved, a new log file will be generated
with the same begin version. Then we can have a file that contains a subset
of contents in another log file. During restore, we should filter out files
whose contents are subsets of other files.
2020-03-20 20:15:08 -07:00
Jingyu Zhou 05b87cf288 Partitioned logs need to compute continuous begin Version
Because different tags may start at different versions, tag 0 can start at a
higher version. In this case, another tag's high version should be used as
the start version for continuous logs.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 1f95cba53e Add describePartitionedBackup() for parallel restore
For partitioned logs, compute the continuous log end version from the minimum log
begin version. Old backup test keeps using describeBackup() to be correctness clean.

Rename partitioned log file so that the last number is block size.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 938a6f358d Describe backup uses partitioned logs to find continuous end version
For partitioned logs, the continuous end version has to be done range by range,
where each range must contain continuous version for all tags.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 659843ff51 Check partitioned log files are continuous for RestoreSet
The idea of checking is to use Tag 0 to find out ranges and their number of
tags. Then for each tag 1 and above, check versions are continuous.
2020-03-20 20:13:38 -07:00
Jingyu Zhou fda6c08640 Include a total number of tags in partition log file names
This is needed for BackupContainer to check partitioned mutation logs are
continuous, i.e., restorable to a version.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 64859467e4 Return partitioned logs for RestorableFileSet 2020-03-20 20:13:38 -07:00
Jingyu Zhou 88ad28e576 Integrate parallel restore with partitioned logs
In parallel restore, use new getPartitionedRestoreSet() to get a set containing
partitioned mutation logs. The loader uses a new parser to extract mutations
from partitioned logs.

TODO: fix unable to restore errors.
2020-03-20 20:13:38 -07:00
Jingyu Zhou ec352c03c9 Add partitioned logs to BackupContainer 2020-03-20 20:13:38 -07:00
Evan Tschannen c11c24b79d removed the fdbrpc version of platform.h 2020-02-28 14:56:10 -08:00
Evan Tschannen 96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
Steve Atherton 3d72c2a661 BackupContainerFilesystem no longer unnecessarily depends on abspath() to find the last part of a path string, since it shouldn't touch the local filesystem in the remote case. 2020-02-18 16:35:00 -08:00
Jingyu Zhou 7c10683c77 Backup workers save logs into right containers
The mutation logs of backup workers are saved into "mlogs" directory under the
container directory. The backup worker has been restructured to handle multiple
backups, where each one is stored in a separate backup container.

In the backup worker, mutations pulled from TLogs are buffered in a message
queue. When writing out to different containers, their corresponding mutation
ranges are used to check if a mutation should be written. When a new backup
is submitted by the client, "backupStartedKey" is updated. The worker monitors
this key, updates its internal map of backups, and then the next pull from TLog
needs to wait for the readiness of the new backup. This ensures that when
worker 0 marks the backup as started, all workers have already been logging
mutations for the backup.
2020-02-03 20:27:14 -08:00
Jingyu Zhou d8c74e7e1a Extend BackupContainer to support tagged log files
That is, the file name contains the log router tag ID as the last component,
e.g., "log,39638169,42718056,016f52a4d16ef36fd3335db9c68abfc1,1048576,1".
2020-01-22 19:38:46 -08:00
Jingyu Zhou f21d7ca44c Add tag ID to backup log file names 2020-01-22 19:38:46 -08:00
Jingyu Zhou dafcaee844 Fix compiler errors. 2020-01-22 19:38:45 -08:00
Jingyu Zhou c7f51782b8 Use override for virtual functions. 2020-01-22 19:38:45 -08:00
Steve Atherton 9a031bfc47 Function was renamed. 2019-12-11 11:00:12 -08:00
Stephen Atherton 09e8d804e8 Added BlobStoreEndpoint::listBuckets(), renamed listBucket() and several related functions with similar names to listObjects() to avoid confusion and more closely match what it actually does. Added a bytesDeleted output statistic to BlobStoreEndpoint::deleteRecursively. 2019-12-06 00:14:13 -08:00
Meng Xu 3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu 45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
Meng Xu 477fd152c0 FastRestore:Refactor code
1) Use the runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition in loop-choose-when for
   the core routine of restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
   the file only has functionalities related to restore worker.

Passed correctness test
2019-06-04 11:22:47 -07:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
Austin Seipp b5cbffc1b8 fdbclient: fix some print/scan format warnings
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Meng Xu 529ce66b6c Merge branch 'apple/master' into mengxu/performant-restore-PR 2019-04-18 18:02:45 -07:00
mpilman 1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
Meng Xu 70d7c289f4 Merge branch 'master' into mengxu/restore/parallel-v7 2019-03-30 22:13:10 -07:00
Meng Xu 589fb76826 FastRestore:Attempt to fix old restore 2019-03-30 15:19:30 -07:00
Meng Xu 5e9a6edfe6 FastRestore:bug fix: Lock DB successfully 2019-03-29 13:31:38 -07:00
Stephen Atherton cabe7ca844 Stopped using %z to parse timezone offset with strptime() because it only seems to work as expected on MacOS. Updated time input/output unit tests so that they don't assume what the local timezone is. 2019-03-21 19:38:07 -07:00
Stephen Atherton d5e50e6963 Rewrote BackupAgentBase::parseTime() to use std::get_time() so it compiles on all supported platforms. Added unit test for parseTime() and formatTime(). 2019-03-20 01:18:37 -07:00
Stephen Atherton c6edcc7f06 Added schema version string to backup JSON status docs. Bug fix in backup status JSON, the document was being created outside the transaction retry loop so retries would combine partial element sets across all tries into the result. 2019-03-14 02:10:14 -07:00
Steve Atherton 8aab719c22 Merge branch 'master' into feature-backup-json 2019-03-12 18:23:16 -07:00
Stephen Atherton bc0b2aa040 Merge branch 'release-6.0' of https://github.com/apple/foundationdb
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/BlobStore.actor.cpp
2019-03-12 04:49:12 -07:00
Stephen Atherton 023bbb566f Renamed backup state enums for clarity, added backup state names. Changed Epochs to EpochSeconds in backup JSON along with some other renaming/moving of fields, and added information about snapshot dispatch. Changed timestamp format for input/output in all backup/restore contexts to be a fully qualified time with timezone offset. Added information about the last snapshot dispatch to backup config and status (not yet populated). 2019-03-10 16:00:01 -07:00
Stephen Atherton 06c11a316d Normalized timestamp to text format across backup and restore tooling. Added epochs field to JSON objects describing versions and timestamps in backup status and describe output, renamed some fields for clarity. 2019-03-06 22:34:25 -08:00
Stephen Atherton ca8bbad657 Added --json option to fdbbackup describe. Also added expired percentage indicator to snapshot details. 2019-03-06 14:14:06 -08:00
Stephen Atherton 87d0c5bae0 Bug/usability fix: The output URLs of fdbbackup list were meant to be directly usable for backup management operations but they were missing the bucket URL parameter. 2019-03-05 21:14:21 -08:00
Stephen Atherton d3377722d5 Added blob store Backup URL parameter 'header' which enables addition of custom HTTP header fields to blob store HTTP requests. Added 'fdbbackup modify' command line tool for changing the backup URL and parameters, default snapshot interval, and/or current snapshot interval of a running backup. 2019-03-05 04:00:11 -08:00
Stephen Atherton 7d287c6999 Merge branch 'release-6.0'
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
2019-02-28 14:01:00 -08:00
Stephen Atherton 887856b6b0 Bug fix in AsyncFileReadAhead where a file size that is an integer multiple of the read chunk size will cause a crash when reading the file's final block. BackupContainerLocalDirectory now uses AsyncFileReadAhead in simulation to get simulation coverage of that class, and FileBackup will generate file sizes which expose the bug. 2019-02-28 00:22:38 -08:00
mpilman 479a4d7c22 Minor fixes in fdbclient for intellisense 2019-02-19 15:16:59 -08:00
Andrew Noyes 067a445e06 Replace unused _ variables with wait(success(...)) 2019-02-12 17:30:30 -08:00
Alex Miller 0750dc0418 Change store from (Future, T&) to (T&, Future).
LHS = RHS, and the name of what's being modified is easier to find.
2019-02-04 18:04:22 -08:00
Meng Xu 9f5e06099f Circus test: get performance result from circus
A worker may die, which prevents the restore from finishing.
The restore speed is only 30MB per second, which needs improvement.
2019-02-01 14:09:40 -08:00
Meng Xu 550f2e2682 Merge with master to use the latest backup container
In fdb 6.0.15, the backup container changed how it organizes backup data.
Backups made by fdb > 6.0.15 have to be restored with fdb > 6.0.15.
Merge with master so that the fast restore uses fdb > 6.0.15.
2019-01-30 12:05:15 -08:00
Meng Xu 2e11b38f3f Add print in fast restore agent about backup info 2019-01-30 11:18:11 -08:00
Evan Tschannen 684a22a52b Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/HTTP.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/BackupCorrectness.actor.cpp
#	versions.target
2019-01-09 16:14:46 -08:00