Young Liu
1ad5e17458
add support for comparing original and current impls
2020-09-05 11:14:59 -07:00
Jon Fu
b9baa996e3
Merge branch 'master' of https://github.com/apple/foundationdb into jfu-incremental-backup-only
2020-09-01 14:38:02 -04:00
Evan Tschannen
12edadd059
Merge branch 'release-6.3'
...
# Conflicts:
# CMakeLists.txt
# fdbclient/Knobs.cpp
# fdbclient/MasterProxyInterface.h
# fdbrpc/simulator.h
# fdbserver/MasterProxyServer.actor.cpp
# tests/fast/CycleAndLock.txt
# tests/fast/TxnStateStoreCycleTest.txt
# tests/fast/VersionStamp.txt
# tests/slow/ParallelRestoreOldBackupApiCorrectnessAtomicRestore.txt
# tests/slow/ParallelRestoreOldBackupCorrectnessCycle.txt
# versions.target
2020-08-31 19:33:34 -07:00
Young Liu
b6c0299d09
Add help message in backup CLI for added options
2020-08-31 09:31:57 -07:00
Young Liu
33aa10b461
Minor optimizations
2020-08-29 20:10:45 -07:00
Young Liu
fd7198d874
Extend backup container interface to support query restorable files set by key ranges
2020-08-29 19:58:07 -07:00
Jon Fu
00c77ba2b4
Added beginVersion cmd line option and addressed code review comments
2020-08-28 14:29:22 -04:00
Meng Xu
ca9b1f5b34
Merge branch 'release-6.3' into mengxu/fr-sched-PR
...
Resolve conflict at BackupContainer.actor.cpp
2020-08-27 16:54:00 -07:00
Meng Xu
369000a125
BackupContainer:Remove link to filename with random string
2020-08-26 15:55:36 -07:00
Meng Xu
a2ab709a0c
BackupContainer:Use processId as the process filename
...
instead of using a randomly generated string, which changes every time
a file is opened.
Having too many files will trigger TOO_MANY_FILES error
2020-08-26 15:54:34 -07:00
Meng Xu
6256bedf8d
BackupContainer:Use processId as the process filename
...
instead of using a randomly generated string, which changes every time
a file is opened.
Having too many files will trigger TOO_MANY_FILES error
2020-08-25 12:25:09 -07:00
Jon Fu
ae999aa118
Merge branch 'master' of https://github.com/apple/foundationdb into jfu-incremental-backup-only
2020-08-19 16:36:47 -04:00
Jon Fu
7dce3a9187
fixed issue with mutations not applying and allow backup to non-empty db
2020-08-11 15:39:21 -04:00
Jon Fu
21635f8a28
update backup restore for local testing
2020-08-04 15:48:43 -04:00
Evan Tschannen
a49cb41de7
Merge branch 'release-6.3'
...
# Conflicts:
# CMakeLists.txt
# cmake/ConfigureCompiler.cmake
# fdbserver/Knobs.cpp
# fdbserver/StorageCache.actor.cpp
# fdbserver/storageserver.actor.cpp
# flow/ThreadHelper.actor.h
# flow/serialize.h
# tests/CMakeLists.txt
2020-07-29 00:31:55 -07:00
Andrew Noyes
d2cf700bd4
Fix compiler warnings
2020-07-28 18:30:26 +00:00
Xin Dong
2ac7df8a18
Merge pull request #3563 from xumengpanda/mengxu/fr-filesize-PR
...
FastRestore:Add trace for file size and bc progress
2020-07-25 20:08:57 -07:00
Meng Xu
99d8399f4e
FastRestore:Add trace for file size and bc progress
2020-07-25 19:12:25 -07:00
Evan Tschannen
e1dedff7b3
Merge branch 'release-6.2' into release-6.3
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# cmake/ConfigureCompiler.cmake
# documentation/sphinx/source/downloads.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/FileBackupAgent.actor.cpp
# packaging/msi/FDBInstaller.wxs
# versions.target
2020-07-24 12:10:44 -07:00
Oleg Samarin
18964fae2a
Unclear error message in fdbbackup if the backup url uses a symlink
2020-07-14 15:39:31 +03:00
Andrew Noyes
6446b4c082
WIP
2020-07-09 22:02:43 +00:00
David Youngworth
f22a8845e4
Finish putting actors in Platform.actor.cpp
2020-05-20 18:14:29 -07:00
David Youngworth
5559643403
Fix rebase issue
2020-05-06 13:39:18 -07:00
David Youngworth
a09f30e48e
Make backupContainer's listFiles step asynchronous
2020-05-06 10:42:40 -07:00
Meng Xu
07a9a05683
FastRestore:Agent:Fix restore requests
2020-04-30 16:20:20 -07:00
Meng Xu
6ee78aa3a4
Fix:Disable sanity check backup metadata file
...
Which can increase the false positive rate of TooManyFiles error
2020-04-30 16:16:14 -07:00
Jingyu Zhou
a8becb9027
Use calculated block size for range files in unit tests
2020-04-28 21:13:18 -07:00
Jingyu Zhou
ba261eda36
Fix a backup container unit test
...
Write a valid range file instead of random data so that checking its content is
fine.
2020-04-28 15:39:24 -07:00
Meng Xu
59217ddf1e
Remove sanity check on metadata
...
The sanity check parses each range file to get the key range of each
range file.
The parsing incurs restore_unsupported_file_version error.
We need to include this sanity check before 6.3 release.
2020-04-23 14:59:58 -07:00
Jingyu Zhou
9bfc5bbea8
Check RestorableFileSet's key ranges in simulation
...
Ranges written in the manifest file should match with actual file content.
2020-04-21 12:55:40 -07:00
Jingyu Zhou
0938e45c6a
Set continuous version to invalidVersion when snapshot version is the target version
2020-04-20 22:26:42 -07:00
Jingyu Zhou
930b175c4c
Add range files' key ranges to RestorableFileSet
...
Also add continuous logs' begin and end version in RestorableFileSet.
2020-04-20 22:26:42 -07:00
Jingyu Zhou
a2b867c6f9
Fix a unit test failure
2020-04-20 22:26:42 -07:00
Jingyu Zhou
3063611355
Write range files' begin & end keys to manifest file
...
This information can be very useful in knowing the content in these files,
especially for restores.
2020-04-20 22:26:42 -07:00
Jingyu Zhou
4c66c8c377
Fix backup progress calculation
...
The oldest epoch the master gets can assume its begin version is 1, which can
be wrong. In this case, we use the saved backup progress to "true-up" the real
begin version.
2020-04-20 11:06:46 -07:00
Jingyu Zhou
db2cef844b
Write mutation log type as a backup property
...
This can solve the problem when listing log files returns empty results.
2020-04-09 09:29:24 -07:00
Jingyu Zhou
4d06e837dc
Remove getPartitionedRestoreSet() API
...
Use getRestoreSet() instead for both old and new partitioned logs.
2020-04-08 20:12:09 -07:00
Jingyu Zhou
fd9caa88a0
Remove isPartitionedBackup()
...
This is no longer needed, since describeBackup() figures this out.
2020-04-08 16:09:18 -07:00
Jingyu Zhou
0cf6013357
Refactor to remove describePartitionedBackup()
...
The backup container can figure out if partitioned logs are used by looking at
mutation logs, thus consolidating the API to a single describeBackup() as
before.
2020-04-08 15:59:37 -07:00
Jingyu Zhou
241f9c123e
Try to find continuous log ranges that cover snapshot begin version
...
Backup container can have mutation log files that are not continuous overall,
but contain a continuous range that covers the snapshots. So when determining the
continuous log ranges, try to find one that covers the first snapshot's begin
version.
2020-03-28 21:19:47 -07:00
Jingyu Zhou
3261eae8c1
Initialize contiguousLogEnd to the lowest possible version
...
If it is set to the first file's end version, then we may erroneously set it to an
unrestorable version.
2020-03-28 17:38:02 -07:00
Jingyu Zhou
34745d71fa
Fix continuous end version
...
When scanning from a known version, stop if logs are not continuous. However, if
scanning from 0, we should reset minLogBegin to the next file and continue scanning
from that file's begin version.
2020-03-28 17:38:02 -07:00
Jingyu Zhou
abcb458a44
Fix contiguousLogEnd calculation
...
The contiguous version should scan from the max of scanBegin version and the
minLogBegin version. Once we find a version that's larger, set it as the
contiguousLogEnd version.
2020-03-28 17:38:02 -07:00
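The scan described in the contiguousLogEnd commits above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from BackupContainer.actor.cpp: the function name, the (begin, end) tuple representation, and the end-exclusive version convention are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def contiguous_log_end(files: List[Tuple[int, int]], scan_begin: int) -> Optional[int]:
    """Return the end of the gap-free run of log files that starts at
    max(scan_begin, min_log_begin), or None if no file covers that version.

    Each file is a hypothetical (begin_version, end_version) pair with an
    exclusive end version.
    """
    if not files:
        return None
    ordered = sorted(files)
    min_log_begin = ordered[0][0]
    start = max(scan_begin, min_log_begin)
    end = None
    for begin, file_end in ordered:
        if file_end <= start:
            continue  # file lies entirely before the scan start
        if end is None:
            if begin > start:
                return None  # no file covers the start version
            end = file_end
        elif begin > end:
            break  # gap found: the contiguous run stops here
        else:
            end = max(end, file_end)
    return end


# Three files with a gap between versions 20 and 25.
result = contiguous_log_end([(0, 10), (10, 20), (25, 30)], scan_begin=5)
```

In this sketch the run stops at the first gap, so later files such as (25, 30) do not extend the contiguous end version.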
Meng Xu
32b0ba1822
Merge branch 'master' into mengxu/parallel-range-log-file-loading-PR
2020-03-27 12:13:47 -07:00
Meng Xu
113d0fb48b
Remove incorrect assertion
2020-03-27 12:13:30 -07:00
Jingyu Zhou
772ab70aee
Add an option for fast restore to restore old backups
...
If "usePartitionedLogs" is set to false, then the workload uses old backups for
restore.
2020-03-26 13:04:00 -07:00
Jingyu Zhou
799f0b4b0e
Small code refactor
2020-03-20 20:15:09 -07:00
Jingyu Zhou
4c75c61f39
Fix duplicate file removal for subset version ranges
...
Partitioned logs can have strict subset version ranges, which was not properly
handled -- we used to assume overlapping only happens for the same begin
version.
2020-03-20 20:15:09 -07:00
Jingyu Zhou
c63493c34f
Allow overlapped versions in partitioned logs
...
The overlapping can only happen between two generations, where the known
committed version to recovery version is copied from old generation to the new
generation. Within a generation, there is no overlap.
The fix here is related to the calculation of continuous version ranges,
allowing the overlap to happen.
2020-03-20 20:15:09 -07:00
Jingyu Zhou
4f4ce93f8c
Remove debug print out
2020-03-20 20:15:09 -07:00
Jingyu Zhou
fe6b4a4398
Some correctness fixes
2020-03-20 20:15:08 -07:00
Jingyu Zhou
5ce9fc0e4c
Partitioned logs should be filtered after sorting by tag IDs
...
The default sorting by begin and end version doesn't work with duplicate
removal, as tags are also compared.
2020-03-20 20:15:08 -07:00
Jingyu Zhou
20df67ee6a
Filter partitioned logs with subset relationship
...
If a log file's progress is not saved, a new log file will be generated
with the same begin version. Then we can have a file that contains a subset
of the contents of another log file. During restore, we should filter out files
whose contents are a subset of other files.
2020-03-20 20:15:08 -07:00
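One way to picture the subset filtering described above is the sketch below. It is an assumption-laden illustration in Python rather than the actual implementation: the LogFile type, its field names, and the sort-then-sweep approach are hypothetical, and version ranges are treated as end-exclusive.

```python
from typing import List, NamedTuple


class LogFile(NamedTuple):
    # Hypothetical record for one partitioned log file.
    tag_id: int
    begin_version: int
    end_version: int


def drop_subset_files(files: List[LogFile]) -> List[LogFile]:
    """Keep only files whose version range is not contained in the range
    of another file with the same tag."""
    # Sort by tag, then begin version, then longest range first, so any
    # file that could contain a given file appears before it.
    ordered = sorted(files, key=lambda f: (f.tag_id, f.begin_version, -f.end_version))
    kept: List[LogFile] = []
    current_tag = None
    max_end = None
    for f in ordered:
        if f.tag_id != current_tag:
            current_tag, max_end = f.tag_id, None
        if max_end is not None and f.end_version <= max_end:
            continue  # range is covered by an earlier file of this tag
        kept.append(f)
        max_end = f.end_version
    return kept


files = [
    LogFile(0, 100, 200),
    LogFile(0, 100, 150),  # same begin version, shorter: dropped
    LogFile(0, 120, 180),  # strict subset with a later begin: dropped
    LogFile(0, 200, 300),
    LogFile(1, 100, 150),  # different tag: kept
]
kept = drop_subset_files(files)
```

Sorting by tag first keeps files of different tags from masking each other, which matches the note in a later commit that filtering must happen after sorting by tag IDs.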
Jingyu Zhou
05b87cf288
Partitioned logs need to compute continuous begin Version
...
Because different tags may start at different versions, tag 0 can start at a
higher version. In this case, another tag's high version should be used as
the start version for continuous logs.
2020-03-20 20:13:38 -07:00
Jingyu Zhou
1f95cba53e
Add describePartitionedBackup() for parallel restore
...
For partitioned logs, compute the continuous log end version from the minimum log begin
version. Old backup test keeps using describeBackup() to be correctness clean.
Rename partitioned log file so that the last number is block size.
2020-03-20 20:13:38 -07:00
Jingyu Zhou
938a6f358d
Describe backup uses partitioned logs to find continuous end version
...
For partitioned logs, the continuous end version has to be done range by range,
where each range must contain continuous version for all tags.
2020-03-20 20:13:38 -07:00
Jingyu Zhou
659843ff51
Check partitioned log files are continuous for RestoreSet
...
The idea of checking is to use Tag 0 to find out ranges and their number of
tags. Then for each tag 1 and above, check versions are continuous.
2020-03-20 20:13:38 -07:00
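The continuity check described above could look something like the following simplified sketch. It assumes the version range and the total number of tags are already known (the commit derives them from Tag 0's files), that ranges are end-exclusive, and that files are plain (tag_id, begin, end) tuples; none of these names come from the FoundationDB source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (begin_version, end_version), end exclusive


def covers(ranges: List[Range], begin: int, end: int) -> bool:
    """True if the union of ranges covers every version in [begin, end)."""
    reached = begin
    for b, e in sorted(ranges):
        if b > reached:
            break  # gap before this file
        reached = max(reached, e)
        if reached >= end:
            return True
    return reached >= end


def tags_continuous(files: List[Tuple[int, int, int]], total_tags: int,
                    begin: int, end: int) -> bool:
    """Check that every tag in [0, total_tags) has gap-free coverage
    of [begin, end)."""
    by_tag: Dict[int, List[Range]] = defaultdict(list)
    for tag_id, b, e in files:
        by_tag[tag_id].append((b, e))
    return all(covers(by_tag[tag], begin, end) for tag in range(total_tags))


ok = tags_continuous([(0, 0, 100), (1, 0, 50), (1, 50, 100)], 2, 0, 100)
gap = tags_continuous([(0, 0, 100), (1, 0, 50), (1, 60, 100)], 2, 0, 100)
```

A tag with no files at all fails the check, which is why the total tag count in the file name matters for deciding restorability.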
Jingyu Zhou
fda6c08640
Include a total number of tags in partition log file names
...
This is needed for BackupContainer to check partitioned mutation logs are
continuous, i.e., restorable to a version.
2020-03-20 20:13:38 -07:00
Jingyu Zhou
64859467e4
Return partitioned logs for RestorableFileSet
2020-03-20 20:13:38 -07:00
Jingyu Zhou
88ad28e576
Integrate parallel restore with partitioned logs
...
In parallel restore, use new getPartitionedRestoreSet() to get a set containing
partitioned mutation logs. The loader uses a new parser to extract mutations
from partitioned logs.
TODO: fix unable to restore errors.
2020-03-20 20:13:38 -07:00
Jingyu Zhou
ec352c03c9
Add partitioned logs to BackupContainer
2020-03-20 20:13:38 -07:00
Evan Tschannen
c11c24b79d
removed the fdbrpc version of platform.h
2020-02-28 14:56:10 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Steve Atherton
3d72c2a661
BackupContainerFilesystem no longer unnecessarily depends on abspath() to find the last part of a path string, since it shouldn't touch the local filesystem in the remote case.
2020-02-18 16:35:00 -08:00
Jingyu Zhou
7c10683c77
Backup workers save logs into right containers
...
The mutation logs of backup workers are saved into "mlogs" directory under the
container directory. The backup worker has been restructured to handle multiple
backups, where each one is stored in a separate backup container.
In the backup worker, mutations pulled from TLogs are buffered in a message
queue. When writing out to different containers, their corresponding mutation
ranges are used to check if a mutation should be written. When a new backup
is submitted by the client, "backupStartedKey" is updated. The worker monitors
this key, updates its internal map of backups, and then the next pull from TLog
needs to wait for the readiness of the new backup. This is to ensure that when
worker 0 marks the backup as started, all workers have already been logging
mutations for the backup.
2020-02-03 20:27:14 -08:00
Jingyu Zhou
d8c74e7e1a
Extend BackupContainer to support tagged log files
...
That is, the file name contains the log router tag ID as the last component,
e.g., "log,39638169,42718056,016f52a4d16ef36fd3335db9c68abfc1,1048576,1".
2020-01-22 19:38:46 -08:00
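For illustration, the example file name above can be split into its components as in this Python sketch. Only the position of the tag ID (last component) is stated by the commit message; the meanings given to the other fields (begin version, end version, UID, block size) and all names in the code are assumptions. A later commit in this history also moves the block size to the last position, so this layout applies only to the format as of this commit.

```python
from typing import NamedTuple


class TaggedLogFile(NamedTuple):
    # Field meanings other than tag_id are assumed for this example.
    begin_version: int
    end_version: int
    uid: str
    block_size: int
    tag_id: int


def parse_tagged_log_name(name: str) -> TaggedLogFile:
    """Split a comma-separated tagged log file name into typed fields."""
    parts = name.split(",")
    if len(parts) != 6 or parts[0] != "log":
        raise ValueError(f"not a tagged log file name: {name!r}")
    _, begin, end, uid, block_size, tag_id = parts
    return TaggedLogFile(int(begin), int(end), uid, int(block_size), int(tag_id))


example = parse_tagged_log_name(
    "log,39638169,42718056,016f52a4d16ef36fd3335db9c68abfc1,1048576,1"
)
```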
Jingyu Zhou
f21d7ca44c
Add tag ID to backup log file names
2020-01-22 19:38:46 -08:00
Jingyu Zhou
dafcaee844
Fix compiler errors.
2020-01-22 19:38:45 -08:00
Jingyu Zhou
c7f51782b8
Use override for virtual functions.
2020-01-22 19:38:45 -08:00
Steve Atherton
9a031bfc47
Function was renamed.
2019-12-11 11:00:12 -08:00
Stephen Atherton
09e8d804e8
Added BlobStoreEndpoint::listBuckets(), renamed listBucket() and several related functions with similar names to listObjects() to avoid confusion and more closely match what they actually do. Added a bytesDeleted output statistic to BlobStoreEndpoint::deleteRecursively.
2019-12-06 00:14:13 -08:00
Meng Xu
3b54363780
FastRestore:Apply Clang-format
2019-08-01 18:09:12 -07:00
Meng Xu
45083edf74
Merge branch 'master' into mengxu/performant-restore-PR
...
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
Meng Xu
477fd152c0
FastRestore:Refactor code
...
1) Use the runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition in loop-choose-when for
the core routine of restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
the file only has functionalities related to restore worker.
Passed correctness test
2019-06-04 11:22:47 -07:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
Austin Seipp
b5cbffc1b8
fdbclient: fix some print/scan format warnings
...
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Meng Xu
529ce66b6c
Merge branch 'apple/master' into mengxu/performant-restore-PR
2019-04-18 18:02:45 -07:00
mpilman
1c16f87a4e
Remove trace-calls to printable (in non-workloads)
2019-04-05 13:12:19 -07:00
Meng Xu
70d7c289f4
Merge branch 'master' into mengxu/restore/parallel-v7
2019-03-30 22:13:10 -07:00
Meng Xu
589fb76826
FastRestore:Attempt to fix old restore
2019-03-30 15:19:30 -07:00
Meng Xu
5e9a6edfe6
FastRestore:bug fix: Lock DB successfully
2019-03-29 13:31:38 -07:00
Stephen Atherton
cabe7ca844
Stopped using %z to parse timezone offset with strptime() because it only seems to work as expected on MacOS. Updated time input/output unit tests so that they don't assume what the local timezone is.
2019-03-21 19:38:07 -07:00
Stephen Atherton
d5e50e6963
Rewrote BackupAgentBase::parseTime() to use std::get_time() so it compiles on all supported platforms. Added unit test for parseTime() and formatTime().
2019-03-20 01:18:37 -07:00
Stephen Atherton
c6edcc7f06
Added schema version string to backup JSON status docs. Bug fix in backup status JSON: the document was being created outside the transaction retry loop, so retries would combine partial element sets across all tries into the result.
2019-03-14 02:10:14 -07:00
Steve Atherton
8aab719c22
Merge branch 'master' into feature-backup-json
2019-03-12 18:23:16 -07:00
Stephen Atherton
bc0b2aa040
Merge branch 'release-6.0' of https://github.com/apple/foundationdb
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/BlobStore.actor.cpp
2019-03-12 04:49:12 -07:00
Stephen Atherton
023bbb566f
Renamed backup state enums for clarity, added backup state names. Changed Epochs to EpochSeconds in backup JSON along with some other renaming/moving of fields, and added information about snapshot dispatch. Changed timestamp format for input/output in all backup/restore contexts to be a fully qualified time with timezone offset. Added information about the last snapshot dispatch to backup config and status (not yet populated).
2019-03-10 16:00:01 -07:00
Stephen Atherton
06c11a316d
Normalized timestamp to text format across backup and restore tooling. Added epochs field to JSON objects describing versions and timestamps in backup status and describe output, renamed some fields for clarity.
2019-03-06 22:34:25 -08:00
Stephen Atherton
ca8bbad657
Added --json option to fdbbackup describe. Also added expired percentage indicator to snapshot details.
2019-03-06 14:14:06 -08:00
Stephen Atherton
87d0c5bae0
Bug/usability fix: The output URLs of fdbbackup list were meant to be directly usable for backup management operations but they were missing the bucket URL parameter.
2019-03-05 21:14:21 -08:00
Stephen Atherton
d3377722d5
Added blob store Backup URL parameter 'header' which enables addition of custom HTTP header fields to blob store HTTP requests. Added 'fdbbackup modify' command line tool for changing the backup URL and parameters, default snapshot interval, and/or current snapshot interval of a running backup.
2019-03-05 04:00:11 -08:00
Stephen Atherton
7d287c6999
Merge branch 'release-6.0'
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
2019-02-28 14:01:00 -08:00
Stephen Atherton
887856b6b0
Bug fix in AsyncFileReadAhead where a file size that is an integer multiple of the read chunk size will cause a crash when reading the file's final block. BackupContainerLocalDirectory now uses AsyncFileReadAhead in simulation to get simulation coverage of that class, and FileBackup will generate file sizes which expose the bug.
2019-02-28 00:22:38 -08:00
mpilman
479a4d7c22
Minor fixes in fdbclient for intellisense
2019-02-19 15:16:59 -08:00
Andrew Noyes
067a445e06
Replace unused _ variables with wait(success(...))
2019-02-12 17:30:30 -08:00
Alex Miller
0750dc0418
Change store from (Future, T&) to (T&, Future).
...
LHS = RHS, and the name of what's being modified is easier to find.
2019-02-04 18:04:22 -08:00
Meng Xu
9f5e06099f
Circus test: get performance result from circus
...
A worker may die which prevents the restore from finishing.
The restore speed is only 30MB per second, which needs improvement
2019-02-01 14:09:40 -08:00
Meng Xu
550f2e2682
Merge with master to use the latest backup container
...
In fdb 6.0.15, the backup container changed how it organizes the backup data.
The backup made by fdb > 6.0.15 has to be restored with fdb > 6.0.15.
Merge with master so that the fast restore uses fdb > 6.0.15
2019-01-30 12:05:15 -08:00
Meng Xu
2e11b38f3f
Add print in fast restore agent about backup info
2019-01-30 11:18:11 -08:00
Evan Tschannen
684a22a52b
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/BackupContainer.actor.cpp
# fdbclient/HTTP.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/BackupCorrectness.actor.cpp
# versions.target
2019-01-09 16:14:46 -08:00