Commit Graph

290 Commits

Author SHA1 Message Date
Meng Xu f5e8345496 FastRestoreAgent:Use atomicParallelRestore to kick off restore
Replace the handcrafted version with atomicParallelRestore actor
which is simulation tested
2020-04-27 22:15:00 -07:00
Meng Xu d4509090d4 Generate fastrestore_agent symlink in CMake 2020-04-23 07:55:44 -07:00
A.J. Beamon af4e0088ba
Merge pull request #2896 from tclinken/atomically-update-dependent-knobs
Atomically update dependent knobs
2020-04-08 15:00:49 -07:00
tclinken 52860043c9 Merge remote-tracking branch 'origin' into atomically-update-dependent-knobs 2020-04-08 12:26:21 -07:00
Markus Pilman 7ba2cec9ef Merge branch 'master' of github.com:apple/foundationdb into features/no-make 2020-04-08 10:23:46 -07:00
Markus Pilman d4542dbb5a Delete old build system 2020-04-07 11:03:45 -07:00
Jingyu Zhou 9fb3fb9d82 Add pause/resume for new backups
To pause/resume the backup workers, the fdbbackup command will write to the
backupPausedKey. Then backup workers noticed the value of the key has been
changed and stops/resumes pulling from TLog.
2020-04-06 14:29:46 -07:00
Markus Pilman e4611e8ae4 fix versions.h stupidity 2020-04-06 10:28:55 -07:00
Markus Pilman 8b5780c36c don't include source and binary dir
This forces users to use include paths from the sources root.

So `#include "Arena.h"` won't work anymore, only
`#include "flow/Arena.h"` will.
2020-04-06 10:13:49 -07:00
Markus Pilman bbd2fe62cc Merge branch 'master' of github.com:apple/foundationdb into features/boost70 2020-04-02 09:21:01 -07:00
Meng Xu 4cf796e838
Merge pull request #2891 from jzhou77/backup-cmd
Add an option for fdbbackup to use new backup system
2020-04-01 16:12:10 -07:00
tclinken 884e92bb49 Atomically update dependent knobs 2020-04-01 15:18:49 -07:00
Jingyu Zhou 906174e3e8 Add an option for fdbbackup to use new backup system
I.e., "-p", or "--partitioned_log" to enable it. By default, old backup system
is used.
2020-03-31 14:54:32 -07:00
Jingyu Zhou 3c76722504 Fix valgrind error of invalid memory access 2020-03-30 16:38:22 -07:00
Jingyu Zhou 3c32835cce Fix decoder for unfinished version batch in a log
A mutation log's version batch data can be split into multiple blocks, and some
of the blocks can be spread across two mutation logs. Thus, the decoder needs
copy unfinished version batch data from previous file progress to the next one.
2020-03-30 11:34:51 -07:00
Markus Pilman 28cd38913d Merge branch 'master' of github.com:apple/foundationdb into features/boost70 2020-03-27 13:44:10 -07:00
Jingyu Zhou feedab02a0
Merge pull request #2855 from xumengpanda/mengxu/fr-api-atomicrestore-PR
Add ApiCorrectnessAtomicRestore workload for the new performant restore
2020-03-25 18:05:26 -07:00
Meng Xu a93f13cfd7 Remove redundant restoreRequestDone break in backup.actor 2020-03-25 15:19:46 -07:00
Meng Xu 495afe2e0b Improve how to wati for restore to finish
Remove default parameter for atomicRestore as suggested in review.
2020-03-25 13:54:21 -07:00
Jingyu Zhou 7831bec2b0 Address review comments on trace events 2020-03-24 10:54:12 -07:00
Jingyu Zhou b18f192831 Fix decode bug of missing mutations
After reading a new block, all mutations are sorted by version again, which
can invalidate previously tuple. As a result, the decoded file will miss some
of the mutations.
2020-03-20 20:15:09 -07:00
Jingyu Zhou ca1a4ef9fd Ignore mutation logs of size 0 in converter 2020-03-20 20:15:08 -07:00
Jingyu Zhou d0a24dd20d Decode out of order mutations in old mutation logs
In the old mutation logs, a version's mutations are serialized as a buffer.
Then the buffer is split into smaller chunks, e.g., 10000 bytes each. When
writting chunks to the final mutation log file, these chunks can be flushed
out of order. For instance, the (version, chunck_part) can be in the order of
(3, 0), (4, 0), (3, 1). As a result, the decoder must read forward to find all
chunks of data for a version.

Another complication is that the files are organized into blocks, where (3, 1)
can be in a subsequent block. This change checks the value size for each
version, if the size is smaller than the right size, the decoder will look
for the missing chucks in the next block.
2020-03-20 20:15:08 -07:00
Jingyu Zhou 20df67ee6a Filter partitioned logs with subset relationship
If a log file's progress is not saved, a new log file will be generated
with the same begin version. Then we can have a file that contains a subset
of contents in another log file. During restore, we should filter out files
that their contents are subset of other files.
2020-03-20 20:15:08 -07:00
Jingyu Zhou e15015ee6c Add mutation log version names
I.e., BACKUP_AGENT_MLOG_VERSION for 2001 and PARTITIONED_MLOG_VERSION for 4110.
2020-03-20 20:13:38 -07:00
Evan Tschannen 303df197cf Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	bindings/c/test/mako/mako.c
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbclient/NativeAPI.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/LogRouter.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/flow.vcxproj
#	flow/flow.vcxproj.filters
#	versions.target
2020-03-06 18:22:46 -08:00
Alex Miller 595dd77ed1 Merge remote-tracking branch 'upstream/release-6.2' into certificate-refresh 2020-03-04 20:25:42 -08:00
Alex Miller 9b5ef3416e Refactor TLSParams into TLSConfig + LoadedTLSConfig
The idea being that we keep around a TLSConfig that the configuration
that the user has provided, and then when we want to intialize an SSL
context, we ask the TLSConfig to load all certificates and return us a
LoadedTLSConfig that is a concrete set of certificate bytes in memory.

initTLS now just takes the in-memory bytes and applies them to the ssl
context.

This is a large refactor to lead up into certificate refeshing, where we
will periodically check for changes to the certificates, and then
re-load them and apply them to a new SSL context.
2020-03-04 20:14:47 -08:00
Evan Tschannen cdcb816866
Update fdbbackup/backup.actor.cpp 2020-03-04 16:08:45 -08:00
A.J. Beamon 58e621eca1 Invalid knobs or knob values are treated as warnings rather than errors. Apply this change to backup as well. 2020-03-04 15:50:04 -08:00
Evan Tschannen c11c24b79d removed the fdbrpc version of platform.h 2020-02-28 14:56:10 -08:00
Evan Tschannen 96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
Alex Miller 927cff3317 Report errors on TLS misconfigurations ... or at least try to. 2020-02-20 16:57:29 -08:00
Evan Tschannen 761da5a059 code cleanup 2020-02-19 17:59:45 -08:00
Meng Xu 132f5aa9ba FastRestore:Improve trace name and cosmetic change 2020-02-18 16:41:19 -08:00
Meng Xu 31a6ec34b7 Merge branch 'master' into mengxu/fast-restore-agent-PR 2020-02-18 16:17:59 -08:00
mpilman 3a1e878a9b Upgrade to boost 1.72 2020-02-14 18:10:13 -08:00
Jingyu Zhou e3061ab713 Use ArenaReader to deserialize MutationRef 2020-02-13 10:28:08 -08:00
Meng Xu 72110de7e2 FastRestore:Add trace for quick perf. measurement 2020-02-06 19:48:26 -08:00
Andrew Noyes 09f3690f09 Fix OPEN_FOR_IDE build 2020-02-03 10:42:05 -08:00
Jingyu Zhou 690e93145e Fix some comments 2020-01-22 19:38:46 -08:00
Jingyu Zhou 06fb45f32a FileConverter skips mutation files without tag ID
Fileconverter doesn't know the format of old mutation logs.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 3854d9a49a Fix decoder to handle multi-part values
When a value (i.e., mutations for a version) is large, it will be split into
multiple key value pairs. This is not handled previously and fixing it also
consolidate the interface of DecodeProgress.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 7d1b9fe6d3 Add mutation file decoder 2020-01-22 19:38:46 -08:00
Jingyu Zhou 568a8a8e77 Use big endian for mutation log files
For each mutation, its version, sub-version, and size are prefixed with big
endian representation. This is required, especially for the first version
variable, because we use 0xFF for padding purpose. A little endian version
number can easily collide with 0xFF, while big endian is guaranteed to have
0x00 as the first byte.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 114e153bc8 Use block size encoded in file names
The log files have block size encoded in their names and the converter should
use these sizes.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 1123157ae0 Ignore mutations large than the end version 2020-01-22 19:38:46 -08:00
Jingyu Zhou b92363bc29 Remove duplicated log files before the conversion
Duplicates can happen because backup workers may store the log for
old epochs successfully, but do not update the progress before another
recovery happened.	As a result, next epoch will retry and creates
duplicated log files.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 4327435601 Fix a data corruption bug
VersionedData used to include a MutationRef, which is made from BinaryReader.
Unfortunately, the StringRef inside MutationRef points a memory allocated from
the BinaryReader's arena, which is free'd after BinaryReader is destroyed.
Change to use a StringRef pointing to the serialized mutation solves this bug.
2020-01-22 19:38:46 -08:00
Jingyu Zhou c1748c0460 Code refactoring
The BackupWorker produces files not in blocks, which should be fixed.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 84a49cf389 Add merge sorting mutations from multiple files
This is implemented in MutationFilesReadProgress.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 5ab9d0925c Add namespace file_converter 2020-01-22 19:38:46 -08:00
Jingyu Zhou 7f7ec99170 Serialize and deserialize new backup files
The BackupWorker writes files that can be read by FileConverter. Move
StringRefReader to the header file for reuse in FileConverter.
2020-01-22 19:38:46 -08:00
Jingyu Zhou 5ac63ec526 Apply clang-format 2020-01-22 19:38:46 -08:00
Jingyu Zhou 674b468609 Add more parameter parsing 2020-01-22 19:38:46 -08:00
Jingyu Zhou 2707ab3eba Add fdbconvert command line utility
fdbconvert is intended to convert new backup files which are tagged mutation
logs to old backup format. The actual conversion is not included in this commit
and will be added in future commits.

Note that the BackupContainer needs to be updated to support new backup files,
which is also not included in this commit.
2020-01-22 19:38:46 -08:00
Alvin Moore 7628d04fb9 Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2020-01-09 07:21:16 -08:00
Steve Atherton 4ff058e86b Backup and DR layer status document generation now uses snapshot reads for all keys read to avoid unnecessary conflicts when read during a status update or cleanup transaction. Since many of the keys read use wrapper functions, all of the underlying functions in BackupAgentBase and its two implementations also required a snapshot mode argument. All snapshot arguments default to false to match the underlying FDB API get/getrange methods. 2019-12-19 00:29:35 -08:00
Alvin Moore 3bf971ba8b Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/storageserver.actor.cpp
2019-12-12 07:13:12 -08:00
Alvin Moore 363feafacd Changed the name of a cmake functions
Added working directory for cmake command
2019-12-11 13:59:03 -08:00
Alvin Moore aece864143 Added support for creating symlinks to fdbbackup within bin and packages/bin directory 2019-12-10 11:44:45 -08:00
A.J. Beamon ed8d3f163c Rename hgVersion to sourceVersion. 2019-11-15 12:26:51 -08:00
Evan Tschannen 4de60fc437 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/TLogServer.actor.cpp
2019-11-01 15:48:04 -07:00
A.J. Beamon f175ed30b3 Cleanup the fdbbackup cleanup command output. Add cleanup to the usage output printed for fdbbackup. 2019-10-31 09:52:21 -07:00
Andrew Noyes d4de608bb6 Fix OPEN_FOR_IDE build 2019-10-25 10:42:22 -07:00
Meng Xu 7599dd0093
Merge pull request #2205 from jzhou77/backup-fix
Fix format string with more portable code
2019-10-02 20:03:35 -07:00
Jingyu Zhou ceb39c0279 Fix format string with more portable code 2019-10-02 15:25:14 -07:00
Meng Xu d0147e5e5d Merge branch 'release-6.2' into mengxu/merge-release620-to-master-v3
Resolved Conflicts:
	documentation/sphinx/source/release-notes.rst
	fdbserver/DataDistribution.actor.cpp
	versions.target
2019-10-02 13:22:56 -07:00
Evan Tschannen 9463b91594
Update fdbbackup/backup.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-09-30 13:16:55 -07:00
Evan Tschannen 4d659662b8 made cleanup handle retries better 2019-09-30 12:44:20 -07:00
Evan Tschannen ef01ad2ed8 optimized log range clearing to clear everything for each possible hash (256 clears) if that would be more efficient than one clear per second that has elapsed
aborting a DR without the —cleanup flag will still attempt to cleanup for 30 seconds before giving up
added a cleanup command to fdbbackup which can remove mutations from orphaned DRs which were stopped without the —cleanup flag
2019-09-27 18:32:27 -07:00
Alex Miller 9a64dc7f24
Merge pull request #1792 from ajbeamon/remove-dl-iterate-phdr-hack
Remove dl_iterate_phdr caching used by slow task profiler
2019-09-17 16:51:30 -07:00
Evan Tschannen b495cc697b Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-09-13 09:25:08 -07:00
Andrew Noyes bd7678e71b Remove --object-serializer help text 2019-09-05 11:57:59 -07:00
Meng Xu c2355f721e Merge branch 'master' into mengxu/performant-restore-PR 2019-09-04 17:11:42 -07:00
Meng Xu d160810662 FastRestore:Resolve review comments 2019-09-04 16:48:43 -07:00
A.J. Beamon e465cbef07 Merge branch 'master' into remove-dl-iterate-phdr-hack
# Conflicts:
#	fdbserver/fdbserver.actor.cpp
2019-08-22 10:13:23 -07:00
A.J. Beamon f02799455e Add --loggroup to fdbserver and fdbbackup help text. 2019-08-19 12:59:14 -07:00
Meng Xu 7ff46e6772 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-07 20:31:56 -07:00
mpilman 370ba8b841 Remove --object-serializer flag from executables 2019-08-06 09:25:40 -07:00
Meng Xu 3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu 7ccaeddf05 Merge branch 'master' into mengxu/performant-restore-PR 2019-08-01 13:23:17 -07:00
A.J. Beamon b5d2234a13 Merge branch 'release-6.1' into merge-release-6.1-into-master
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbserver/MoveKeys.actor.cpp
#	flow/FastAlloc.h
#	versions.target
2019-07-30 16:23:42 -07:00
Stephen Atherton 9e1042d903 Bug fix, restore with --dryrun option would still require the --dest_cluster_file option even though it does not connect to a cluster. 2019-07-29 16:21:12 -07:00
Stephen Atherton b644f15b87 Bug fix: fdbrestore commands other than "start" were using default cluster file argument handling (but without the -C flag) instead of using the --dest_cluster_file argument. 2019-07-29 13:19:28 -07:00
Meng Xu b0c31f28af FastRestore:Fix bug that blocks restore
1) Should recruit only configured number of roles;
2) Should never register a restore master interface as a restore worker (loader or applier) interface.
2019-07-25 17:55:37 -07:00
Meng Xu 45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
mpilman 1ac2d01b03 Merge remote-tracking branch 'upstream/master' into flatbuffers-fixes2 2019-07-18 09:50:08 -07:00
mpilman d5caf0c1b4 Merge branch 'flatbuffers-fixes2' of github.com:mpilman/foundationdb into flatbuffers-fixes2 2019-07-16 14:47:40 -07:00
A.J. Beamon d5051b08dd Make trace event field lengths (and total event sizes) default knobified and configurable. Add a transaction option to control the field length of transaction debug logging. Make the program start command line field less likely to be truncated. 2019-07-12 16:12:35 -07:00
Andrew Noyes 5be0c47409 Add some missing help text 2019-07-12 10:46:46 -07:00
Andrew Noyes 269f077196 Add --object-serializer flag to fdbbackup etc 2019-07-12 10:46:45 -07:00
A.J. Beamon a5a6f8431c Add a random UID to TransactionMetrics in case a client opens multiple connections and also a field to indicate whether the connection is internal. Convert some of the metrics to our Counter object instead of running totals. 2019-07-08 14:01:04 -07:00
A.J. Beamon 6a899ddff3 Remove dl_iterate_phdr results caching that is used by slow task profiler, instead favoring disabling and reenabling profiling around the call. Add a mechanism to handle deferred profile requests. 2019-07-03 12:48:36 -07:00
Evan Tschannen 37c1df2491
Merge pull request #1705 from bnamasivayam/suspend-process
Extend RebootRequest API to include time to suspend the process befor…
2019-06-20 17:36:25 -07:00
mpilman 68ce9a5e75 ProtocolVersion type - second try 2019-06-18 17:55:27 -07:00
Evan Tschannen 20e3edeb0a Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/storageserver.actor.cpp
#	versions.target
2019-06-14 12:42:59 -07:00
Balachandar Namasivayam 5eb833759e Extend RebootRequest API to include time to suspend the process before reboot. This is intended to be used for testing purposes to simulate failures. 2019-06-14 11:35:38 -07:00
Stephen Atherton 235d24ee26 Bug fix: fdbrestore abort, wait, and status were not using the --dest_cluster_file argument. 2019-06-13 17:42:11 -07:00
Meng Xu 3fcb6ec0a1 FastRestore:Refactor RestoreLoader and fix bugs
Refactor RestoreLoader code and
Fix a bug in notifying restore finish.
2019-06-04 21:53:31 -07:00