61 Commits

Author SHA1 Message Date
Meng Xu bbc7ce581e Resolve conflicts merging from 6.3 to master 2020-09-25 16:08:31 -07:00
Meng Xu 862336de8f Merge branch 'master' into mengxu/merge-to-master-PR 2020-09-24 17:06:00 -07:00
Young Liu fd7198d874 Extend backup container interface to support query restorable files set by key ranges 2020-08-29 19:58:07 -07:00
Jon Fu 00c77ba2b4 Added beginVersion cmd line option and addressed code review comments 2020-08-28 14:29:22 -04:00
Jon Fu 21635f8a28 update backup restore for local testing 2020-08-04 15:48:43 -04:00
Jingyu Zhou 7d59e53349 Consolidate makePadding() 2020-04-28 15:39:23 -07:00
Jingyu Zhou b315163033 Fix a memory corruption error 2020-04-28 15:39:23 -07:00
Jingyu Zhou 9bfc5bbea8 Check RestorableFileSet's key ranges in simulation
Ranges written in the manifest file should match with actual file content.
2020-04-21 12:55:40 -07:00
Jingyu Zhou 0938e45c6a Set continuous version to invalidVersion when snapshot version is the target version 2020-04-20 22:26:42 -07:00
Jingyu Zhou 930b175c4c Add range files' key ranges to RestorableFileSet
Also add continuous logs' begin and end version in RestorableFileSet.
2020-04-20 22:26:42 -07:00
Jingyu Zhou 3063611355 Write range files' begin & end keys to manifest file
This information can be very useful in knowing the content in these files,
especially for restores.
2020-04-20 22:26:42 -07:00
Jingyu Zhou 4d06e837dc Remove getPartitionedRestoreSet() API
Use getRestoreSet() instead for both old and new partitioned logs.
2020-04-08 20:12:09 -07:00
Jingyu Zhou fd9caa88a0 Remove isPartitionedBackup()
This is no longer needed, since describeBackup() figures this out.
2020-04-08 16:09:18 -07:00
Jingyu Zhou 0cf6013357 Refactor to remove describePartitionedBackup()
The backup container can figure out if partitioned logs are used by looking at
mutation logs, thus consolidating the API to a single describeBackup() as
before.
2020-04-08 15:59:37 -07:00
Jingyu Zhou 772ab70aee Add an option for fast restore to restore old backups
If "usePartitionedLogs" is set to false, then the workload uses old backups for
restore.
2020-03-26 13:04:00 -07:00
Jingyu Zhou 4c75c61f39 Fix duplicate file removal for subset version ranges
Partitioned logs can have strict subset version ranges, which was not properly
handled -- we used to assume overlapping only happens for the same begin
version.
2020-03-20 20:15:09 -07:00
Jingyu Zhou 20df67ee6a Filter partitioned logs with subset relationship
If a log file's progress is not saved, a new log file will be generated
with the same begin version. Then we can have a file that contains a subset
of contents in another log file. During restore, we should filter out files
whose contents are a subset of other files.
2020-03-20 20:15:08 -07:00
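A minimal sketch of the filtering idea described in this commit message, not the actual BackupContainer code. It assumes each log file is described by a tag ID plus a begin and end version (the `LogFile` type and `filter_subset_files` name are hypothetical), and it drops any file whose version range is fully covered by another file with the same tag:

```python
from typing import Iterable, List, NamedTuple


class LogFile(NamedTuple):
    # Hypothetical description of one partitioned log file.
    tag_id: int
    begin_version: int
    end_version: int


def filter_subset_files(files: Iterable[LogFile]) -> List[LogFile]:
    """Keep only files whose version range is not covered by another
    file with the same tag. Exact duplicates collapse to one file."""
    # Sort by tag, then begin ascending, then end descending, so that a
    # covering file is always visited before the files it covers.
    ordered = sorted(files, key=lambda f: (f.tag_id, f.begin_version, -f.end_version))
    kept: List[LogFile] = []
    current_tag = None
    max_end = None
    for f in ordered:
        if f.tag_id != current_tag:
            # New tag: reset the sweep state.
            current_tag = f.tag_id
            max_end = None
        if max_end is not None and f.end_version <= max_end:
            # An earlier kept file starts no later and ends no earlier.
            continue
        kept.append(f)
        max_end = f.end_version
    return kept
```

Because the sweep compares against the largest end version seen so far, it handles both files that share a begin version and files that are strict subsets of a wider range.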
Jingyu Zhou 1f95cba53e Add describePartitionedBackup() for parallel restore
For partitioned logs, compute the continuous log end version from the minimum log begin
version. The old backup test keeps using describeBackup() to stay correctness clean.

Rename partitioned log file so that the last number is block size.
2020-03-20 20:13:38 -07:00
Jingyu Zhou fda6c08640 Include a total number of tags in partition log file names
This is needed for BackupContainer to check partitioned mutation logs are
continuous, i.e., restorable to a version.
2020-03-20 20:13:38 -07:00
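Why the total tag count matters for a continuity check can be illustrated with a simplified sketch. It is an assumption-laden illustration, not BackupContainer's algorithm: the `PartitionedLog` type and `is_continuous` function are hypothetical, end versions are assumed to be exclusive, and the tag count is assumed to stay fixed over the whole range.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class PartitionedLog(NamedTuple):
    # Hypothetical description of one partitioned mutation log file.
    tag_id: int
    begin_version: int
    end_version: int  # assumed exclusive


def is_continuous(files: Iterable[PartitionedLog], total_tags: int,
                  begin: int, end: int) -> bool:
    """Return True if every tag in [0, total_tags) has log files that
    cover the version range [begin, end) without a gap."""
    by_tag = defaultdict(list)
    for f in files:
        by_tag[f.tag_id].append(f)
    for tag in range(total_tags):
        covered = begin  # versions below this are known to be covered
        for f in sorted(by_tag[tag], key=lambda f: f.begin_version):
            if f.begin_version > covered:
                break  # gap before this file starts
            covered = max(covered, f.end_version)
        if covered < end:
            # Either a gap was found or this tag has no files at all.
            return False
    return True
```

Without the total number of tags, a reader of the file list could not tell a tag with no files from a tag that never existed, so a missing partition would go unnoticed.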
Jingyu Zhou 88ad28e576 Integrate parallel restore with partitioned logs
In parallel restore, use new getPartitionedRestoreSet() to get a set containing
partitioned mutation logs. The loader uses a new parser to extract mutations
from partitioned logs.

TODO: fix unable to restore errors.
2020-03-20 20:13:38 -07:00
Jingyu Zhou e15015ee6c Add mutation log version names
I.e., BACKUP_AGENT_MLOG_VERSION for 2001 and PARTITIONED_MLOG_VERSION for 4110.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 1123157ae0 Ignore mutations larger than the end version 2020-01-22 19:38:46 -08:00
Jingyu Zhou d8c74e7e1a Extend BackupContainer to support tagged log files
That is, the file name contains the log router tag ID as the last component,
e.g., "log,39638169,42718056,016f52a4d16ef36fd3335db9c68abfc1,1048576,1".
2020-01-22 19:38:46 -08:00
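The example file name in this commit message can be split into fields as sketched below. Only the position of the tag ID (last component) is stated in the commit message; the meaning assigned to the other fields (begin version, end version, an identifier, block size) is an assumption, and `TaggedLogName` and `parse_tagged_log_name` are hypothetical names, not FoundationDB APIs.

```python
from typing import NamedTuple


class TaggedLogName(NamedTuple):
    begin_version: int  # assumed meaning of the 2nd component
    end_version: int    # assumed meaning of the 3rd component
    uid: str            # assumed to be an opaque identifier
    block_size: int     # assumed meaning of the 5th component
    tag_id: int         # last component, per the commit message


def parse_tagged_log_name(name: str) -> TaggedLogName:
    """Split a comma-separated tagged log file name into typed fields."""
    parts = name.split(",")
    if len(parts) != 6 or parts[0] != "log":
        raise ValueError(f"not a tagged log file name: {name!r}")
    _, begin, end, uid, block_size, tag_id = parts
    return TaggedLogName(int(begin), int(end), uid, int(block_size), int(tag_id))


example = parse_tagged_log_name(
    "log,39638169,42718056,016f52a4d16ef36fd3335db9c68abfc1,1048576,1"
)
```

Later commits in this history change the name layout (a total tag count is added and the block size becomes the last number), so a parser like this applies only to the format shown in this commit.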
Jingyu Zhou f21d7ca44c Add tag ID to backup log file names 2020-01-22 19:38:46 -08:00
Meng Xu e345c9061f FastRestore:Refine debug messages 2019-11-04 11:47:38 -08:00
Meng Xu d160810662 FastRestore:Resolve review comments 2019-09-04 16:48:43 -07:00
Meng Xu 3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu 477fd152c0 FastRestore:Refactor code
1) Use the runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition in loop-choose-when for
   the core routine of restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
   the file only has functionalities related to restore worker.

Passed correctness test
2019-06-04 11:22:47 -07:00
Meng Xu a08a6776f5 FastRestore: Refactor to smaller components
The current code uses one restore interface to handle the work
for all restore roles, i.e., master, loader and applier.
This makes it harder to review or maintain or scale.

This commit splits the restore into multiple roles by mimicking the FDB
transaction system:
1) It uses a RestoreWorker as the process to host restore roles;
   This commit assumes one restore role per RestoreWorker; but
   it should be easy to extend to support multiple roles per RestoreWorker;
2) It creates 3 restore roles:
   RestoreMaster: Coordinate the restore process and send commands to the other two roles;
   RestoreLoader: Parse backup files to mutations and send mutations to appliers;
   RestoreApplier: Sort received mutations and apply them to DB in order.

Compilable version. To be tested in correctness.
2019-05-10 14:20:06 -07:00
Meng Xu 4c3ccebe8a FastRestore: Cleanup code
Remove unused code and comments.
2019-04-12 13:49:55 -07:00
Meng Xu 70d7c289f4 Merge branch 'master' into mengxu/restore/parallel-v7 2019-03-30 22:13:10 -07:00
Stephen Atherton ca8bbad657 Added --json option to fdbbackup describe. Also added expired percentage indicator to snapshot details. 2019-03-06 14:14:06 -08:00
mpilman 0bb60e5a3b Use proper fwd decl in NativeAPI
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
Meng Xu 550f2e2682 Merge with master to use the latest backup container
In fdb 6.0.15, the backup container changed how it organizes backup data.
A backup made by fdb > 6.0.15 has to be restored with fdb > 6.0.15.
Merge with master so that fast restore uses fdb > 6.0.15.
2019-01-30 12:05:15 -08:00
Evan Tschannen 684a22a52b Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/HTTP.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/BackupCorrectness.actor.cpp
#	versions.target
2019-01-09 16:14:46 -08:00
Meng Xu 5f406d864c WiP:Has bug: assign roles and distribute files
In real mode, each process has its own copy of the global variables, BUT
in simulation mode they are NOT separate.
Workers in different simulated processes can still access the same global variables,
so one worker can overwrite another worker's global variable value.

To solve this problem, which happens in other FDB code as well,
we allocate the global variable on the heap and pass the pointers across functions,
as DataDistribution.actor does.

The other approach is to create two code paths: one for real mode and the other for the simulator,
as g_simulator does to create the process context for each simulated process
2019-01-04 22:23:02 -08:00
Meng Xu 338f7ebe16 WiP: not compilable, get restore files
Need to see how existing code passes complex structs around
E.g., TaskBucket
2018-12-26 20:57:56 -08:00
Stephen Atherton 928876be81 Backup dumpFileList() can now take a version range. 2018-12-21 22:42:29 -08:00
Stephen Atherton f64d5321e9 Backup describe, expire, and delete will now clearly indicate when there is no backup at the given URL. 2018-12-20 18:05:23 -08:00
Stephen Atherton 354abebf64 Added progress reporting to backup expiration. Simplified backup delete progress. 2018-12-20 00:23:26 -08:00
Stephen Atherton 69d847dbbd Backup describe can now analyze a backup assuming log data starts at a given version. This is used by expire both as an optimization and because doing so enables expire to repair a bad metadata state that can result from a cancelled or failed expire operation from fdbbackup <= 6.0.17. ListLogFiles() and ListRangeFiles() no longer sort results as in most cases the sort is not required and the result set can be very large. Describe will now update minLogBegin metadata to the beginning of the log range known to be present when possible. Several serial log and range list pairings are now done in parallel. 2018-12-18 04:33:37 -08:00
Stephen Atherton 223b19f5ba Rewrote backup metadata scheme to fix several design flaws involving cancelled or concurrent operations. Most importantly, before an expire deletes any data it marks the end of the range being deleted as 'unreliable' so further reads of the backup will treat versions before that point as having data holes. 2018-12-16 00:18:13 -08:00
Meng Xu 1b085a9817 sequential restore: pass 1 test case
-r simulation --logsize 1024MiB -f foundationdb/tests/fast/ParallelRestoreCorrectness.txt -b off -s 95208406
2018-12-03 10:57:30 -08:00
Evan Tschannen 1f3b6e8bdf Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/BlobStore.actor.cpp
#	versions.target
2018-11-27 14:41:46 -08:00
Stephen Atherton ec9410492d Changed backup folder scheme to use a much smaller number of unique folders for kv range files, which will speed up list and expire operations. The old scheme is still readable but will no longer be written except in simulation in order to test backward compatibility. 2018-11-23 05:23:56 -08:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Stephen Atherton 7f0b7311b9 Corrected function name to timeKeeperVersionFromDatetime(). 'Fdbbackup expire' now allows an expire_before version of 0 if explicitly passed by version or by timestamp. 2018-01-23 00:19:51 -08:00
Stephen Atherton 51a1bd9327 Timekeeper lookup improvements, moved both function declarations to BackupContainer.h. VersionFromEpochs() now uses versions/sec to adjust the lookup result to improve accuracy. Conversions in both directions look for the latest record less than the target conversion value, but failing that they will now fall back on any available data point and adjust from there using versions/sec. 2018-01-22 23:57:01 -08:00
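The lookup-and-adjust behavior described in this commit message can be sketched as follows. This is an illustration only, not the TimeKeeper code: `version_from_epoch` is a hypothetical function, the records are assumed to be a non-empty list of (epoch seconds, version) pairs sorted by epoch, and the rate of 1,000,000 versions per second is an assumed constant.

```python
import bisect
from typing import List, Tuple

# Assumed nominal rate; treat this value as an assumption of the sketch.
VERSIONS_PER_SECOND = 1_000_000


def version_from_epoch(records: List[Tuple[int, int]], target_epoch: int) -> int:
    """Estimate the version at target_epoch from sorted
    (epoch_seconds, version) records."""
    if not records:
        raise ValueError("no timekeeper records available")
    epochs = [epoch for epoch, _ in records]
    # Index of the latest record with epoch strictly less than the target.
    idx = bisect.bisect_left(epochs, target_epoch) - 1
    if idx < 0:
        # No earlier record: fall back on an available data point
        # (here, the earliest one) and adjust from there.
        idx = 0
    epoch, version = records[idx]
    estimate = version + (target_epoch - epoch) * VERSIONS_PER_SECOND
    # Clamping at zero is a choice made for this sketch.
    return max(0, estimate)
```

The reverse conversion (version to epoch) would follow the same pattern, searching by version and dividing the difference by the rate.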
Stephen Atherton 93b34a945f Major usability and performance improvements to backup management. Backup descriptions now calculate and display timestamps using TimeKeeper data (if given a cluster) and restorability of snapshots. Expire now requires a --force option to leave a backup unrestorable or unrestorable after a given point in time, specified by version or timestamp. BackupContainerFilesystem now maintains metadata on key version boundaries in order to avoid large list operations for describe and expire operations. Blob parallel recursive list operations can now take a path (aka prefix) filter function. New describe and expire options are available in fdbbackup. 2018-01-17 04:09:43 -08:00