Meng Xu
f5e8345496
FastRestoreAgent:Use atomicParallelRestore to kick off restore
...
Replace the handcrafted version with atomicParallelRestore actor
which is simulation tested
2020-04-27 22:15:00 -07:00
Meng Xu
d4509090d4
Generate fastrestore_agent symlink in CMake
2020-04-23 07:55:44 -07:00
A.J. Beamon
af4e0088ba
Merge pull request #2896 from tclinken/atomically-update-dependent-knobs
...
Atomically update dependent knobs
2020-04-08 15:00:49 -07:00
tclinken
52860043c9
Merge remote-tracking branch 'origin' into atomically-update-dependent-knobs
2020-04-08 12:26:21 -07:00
Markus Pilman
7ba2cec9ef
Merge branch 'master' of github.com:apple/foundationdb into features/no-make
2020-04-08 10:23:46 -07:00
Markus Pilman
d4542dbb5a
Delete old build system
2020-04-07 11:03:45 -07:00
Jingyu Zhou
9fb3fb9d82
Add pause/resume for new backups
...
To pause/resume the backup workers, the fdbbackup command will write to the
backupPausedKey. Then backup workers noticed the value of the key has been
changed and stops/resumes pulling from TLog.
2020-04-06 14:29:46 -07:00
Markus Pilman
e4611e8ae4
fix versions.h stupidity
2020-04-06 10:28:55 -07:00
Markus Pilman
8b5780c36c
don't include source and binary dir
...
This forces users to use include paths from the sources root.
So `#include "Arena.h"` won't work anymore, only
`#include "flow/Arena.h"` will.
2020-04-06 10:13:49 -07:00
Markus Pilman
bbd2fe62cc
Merge branch 'master' of github.com:apple/foundationdb into features/boost70
2020-04-02 09:21:01 -07:00
Meng Xu
4cf796e838
Merge pull request #2891 from jzhou77/backup-cmd
...
Add an option for fdbbackup to use new backup system
2020-04-01 16:12:10 -07:00
tclinken
884e92bb49
Atomically update dependent knobs
2020-04-01 15:18:49 -07:00
Jingyu Zhou
906174e3e8
Add an option for fdbbackup to use new backup system
...
I.e., "-p", or "--partitioned_log" to enable it. By default, old backup system
is used.
2020-03-31 14:54:32 -07:00
Jingyu Zhou
3c76722504
Fix valgrind error of invalid memory access
2020-03-30 16:38:22 -07:00
Jingyu Zhou
3c32835cce
Fix decoder for unfinished version batch in a log
...
A mutation log's version batch data can be split into multiple blocks, and some
of the blocks can be spread across two mutation logs. Thus, the decoder needs
copy unfinished version batch data from previous file progress to the next one.
2020-03-30 11:34:51 -07:00
Markus Pilman
28cd38913d
Merge branch 'master' of github.com:apple/foundationdb into features/boost70
2020-03-27 13:44:10 -07:00
Jingyu Zhou
feedab02a0
Merge pull request #2855 from xumengpanda/mengxu/fr-api-atomicrestore-PR
...
Add ApiCorrectnessAtomicRestore workload for the new performant restore
2020-03-25 18:05:26 -07:00
Meng Xu
a93f13cfd7
Remove redundant restoreRequestDone break in backup.actor
2020-03-25 15:19:46 -07:00
Meng Xu
495afe2e0b
Improve how to wati for restore to finish
...
Remove default parameter for atomicRestore as suggested in review.
2020-03-25 13:54:21 -07:00
Jingyu Zhou
7831bec2b0
Address review comments on trace events
2020-03-24 10:54:12 -07:00
Jingyu Zhou
b18f192831
Fix decode bug of missing mutations
...
After reading a new block, all mutations are sorted by version again, which
can invalidate previously tuple. As a result, the decoded file will miss some
of the mutations.
2020-03-20 20:15:09 -07:00
Jingyu Zhou
ca1a4ef9fd
Ignore mutation logs of size 0 in converter
2020-03-20 20:15:08 -07:00
Jingyu Zhou
d0a24dd20d
Decode out of order mutations in old mutation logs
...
In the old mutation logs, a version's mutations are serialized as a buffer.
Then the buffer is split into smaller chunks, e.g., 10000 bytes each. When
writting chunks to the final mutation log file, these chunks can be flushed
out of order. For instance, the (version, chunck_part) can be in the order of
(3, 0), (4, 0), (3, 1). As a result, the decoder must read forward to find all
chunks of data for a version.
Another complication is that the files are organized into blocks, where (3, 1)
can be in a subsequent block. This change checks the value size for each
version, if the size is smaller than the right size, the decoder will look
for the missing chucks in the next block.
2020-03-20 20:15:08 -07:00
Jingyu Zhou
20df67ee6a
Filter partitioned logs with subset relationship
...
If a log file's progress is not saved, a new log file will be generated
with the same begin version. Then we can have a file that contains a subset
of contents in another log file. During restore, we should filter out files
that their contents are subset of other files.
2020-03-20 20:15:08 -07:00
Jingyu Zhou
e15015ee6c
Add mutation log version names
...
I.e., BACKUP_AGENT_MLOG_VERSION for 2001 and PARTITIONED_MLOG_VERSION for 4110.
2020-03-20 20:13:38 -07:00
Evan Tschannen
303df197cf
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# bindings/c/test/mako/mako.c
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/LogRouter.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/fdbserver.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/flow.vcxproj
# flow/flow.vcxproj.filters
# versions.target
2020-03-06 18:22:46 -08:00
Alex Miller
595dd77ed1
Merge remote-tracking branch 'upstream/release-6.2' into certificate-refresh
2020-03-04 20:25:42 -08:00
Alex Miller
9b5ef3416e
Refactor TLSParams into TLSConfig + LoadedTLSConfig
...
The idea being that we keep around a TLSConfig that the configuration
that the user has provided, and then when we want to intialize an SSL
context, we ask the TLSConfig to load all certificates and return us a
LoadedTLSConfig that is a concrete set of certificate bytes in memory.
initTLS now just takes the in-memory bytes and applies them to the ssl
context.
This is a large refactor to lead up into certificate refeshing, where we
will periodically check for changes to the certificates, and then
re-load them and apply them to a new SSL context.
2020-03-04 20:14:47 -08:00
Evan Tschannen
cdcb816866
Update fdbbackup/backup.actor.cpp
2020-03-04 16:08:45 -08:00
A.J. Beamon
58e621eca1
Invalid knobs or knob values are treated as warnings rather than errors. Apply this change to backup as well.
2020-03-04 15:50:04 -08:00
Evan Tschannen
c11c24b79d
removed the fdbrpc version of platform.h
2020-02-28 14:56:10 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Alex Miller
927cff3317
Report errors on TLS misconfigurations ... or at least try to.
2020-02-20 16:57:29 -08:00
Evan Tschannen
761da5a059
code cleanup
2020-02-19 17:59:45 -08:00
Meng Xu
132f5aa9ba
FastRestore:Improve trace name and cosmetic change
2020-02-18 16:41:19 -08:00
Meng Xu
31a6ec34b7
Merge branch 'master' into mengxu/fast-restore-agent-PR
2020-02-18 16:17:59 -08:00
mpilman
3a1e878a9b
Upgrade to boost 1.72
2020-02-14 18:10:13 -08:00
Jingyu Zhou
e3061ab713
Use ArenaReader to deserialize MutationRef
2020-02-13 10:28:08 -08:00
Meng Xu
72110de7e2
FastRestore:Add trace for quick perf. measurement
2020-02-06 19:48:26 -08:00
Andrew Noyes
09f3690f09
Fix OPEN_FOR_IDE build
2020-02-03 10:42:05 -08:00
Jingyu Zhou
690e93145e
Fix some comments
2020-01-22 19:38:46 -08:00
Jingyu Zhou
06fb45f32a
FileConverter skips mutation files without tag ID
...
Fileconverter doesn't know the format of old mutation logs.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
3854d9a49a
Fix decoder to handle multi-part values
...
When a value (i.e., mutations for a version) is large, it will be split into
multiple key value pairs. This is not handled previously and fixing it also
consolidate the interface of DecodeProgress.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
7d1b9fe6d3
Add mutation file decoder
2020-01-22 19:38:46 -08:00
Jingyu Zhou
568a8a8e77
Use big endian for mutation log files
...
For each mutation, its version, sub-version, and size are prefixed with big
endian representation. This is required, especially for the first version
variable, because we use 0xFF for padding purpose. A little endian version
number can easily collide with 0xFF, while big endian is guaranteed to have
0x00 as the first byte.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
114e153bc8
Use block size encoded in file names
...
The log files have block size encoded in their names and the converter should
use these sizes.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
1123157ae0
Ignore mutations large than the end version
2020-01-22 19:38:46 -08:00
Jingyu Zhou
b92363bc29
Remove duplicated log files before the conversion
...
Duplicates can happen because backup workers may store the log for
old epochs successfully, but do not update the progress before another
recovery happened. As a result, next epoch will retry and creates
duplicated log files.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
4327435601
Fix a data corruption bug
...
VersionedData used to include a MutationRef, which is made from BinaryReader.
Unfortunately, the StringRef inside MutationRef points a memory allocated from
the BinaryReader's arena, which is free'd after BinaryReader is destroyed.
Change to use a StringRef pointing to the serialized mutation solves this bug.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
c1748c0460
Code refactoring
...
The BackupWorker produces files not in blocks, which should be fixed.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
84a49cf389
Add merge sorting mutations from multiple files
...
This is implemented in MutationFilesReadProgress.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
5ab9d0925c
Add namespace file_converter
2020-01-22 19:38:46 -08:00
Jingyu Zhou
7f7ec99170
Serialize and deserialize new backup files
...
The BackupWorker writes files that can be read by FileConverter. Move
StringRefReader to the header file for reuse in FileConverter.
2020-01-22 19:38:46 -08:00
Jingyu Zhou
5ac63ec526
Apply clang-format
2020-01-22 19:38:46 -08:00
Jingyu Zhou
674b468609
Add more parameter parsing
2020-01-22 19:38:46 -08:00
Jingyu Zhou
2707ab3eba
Add fdbconvert command line utility
...
fdbconvert is intended to convert new backup files which are tagged mutation
logs to old backup format. The actual conversion is not included in this commit
and will be added in future commits.
Note that the BackupContainer needs to be updated to support new backup files,
which is also not included in this commit.
2020-01-22 19:38:46 -08:00
Alvin Moore
7628d04fb9
Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2020-01-09 07:21:16 -08:00
Steve Atherton
4ff058e86b
Backup and DR layer status document generation now uses snapshot reads for all keys read to avoid unnecessary conflicts when read during a status update or cleanup transaction. Since many of the keys read use wrapper functions, all of the underlying functions in BackupAgentBase and its two implementations also required a snapshot mode argument. All snapshot arguments default to false to match the underlying FDB API get/getrange methods.
2019-12-19 00:29:35 -08:00
Alvin Moore
3bf971ba8b
Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/storageserver.actor.cpp
2019-12-12 07:13:12 -08:00
Alvin Moore
363feafacd
Changed the name of a cmake functions
...
Added working directory for cmake command
2019-12-11 13:59:03 -08:00
Alvin Moore
aece864143
Added support for creating symlinks to fdbbackup within bin and packages/bin directory
2019-12-10 11:44:45 -08:00
A.J. Beamon
ed8d3f163c
Rename hgVersion to sourceVersion.
2019-11-15 12:26:51 -08:00
Evan Tschannen
4de60fc437
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/TLogServer.actor.cpp
2019-11-01 15:48:04 -07:00
A.J. Beamon
f175ed30b3
Cleanup the fdbbackup cleanup command output. Add cleanup to the usage output printed for fdbbackup.
2019-10-31 09:52:21 -07:00
Andrew Noyes
d4de608bb6
Fix OPEN_FOR_IDE build
2019-10-25 10:42:22 -07:00
Meng Xu
7599dd0093
Merge pull request #2205 from jzhou77/backup-fix
...
Fix format string with more portable code
2019-10-02 20:03:35 -07:00
Jingyu Zhou
ceb39c0279
Fix format string with more portable code
2019-10-02 15:25:14 -07:00
Meng Xu
d0147e5e5d
Merge branch 'release-6.2' into mengxu/merge-release620-to-master-v3
...
Resolved Conflicts:
documentation/sphinx/source/release-notes.rst
fdbserver/DataDistribution.actor.cpp
versions.target
2019-10-02 13:22:56 -07:00
Evan Tschannen
9463b91594
Update fdbbackup/backup.actor.cpp
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-09-30 13:16:55 -07:00
Evan Tschannen
4d659662b8
made cleanup handle retries better
2019-09-30 12:44:20 -07:00
Evan Tschannen
ef01ad2ed8
optimized log range clearing to clear everything for each possible hash (256 clears) if that would be more efficient than one clear per second that has elapsed
...
aborting a DR without the —cleanup flag will still attempt to cleanup for 30 seconds before giving up
added a cleanup command to fdbbackup which can remove mutations from orphaned DRs which were stopped without the —cleanup flag
2019-09-27 18:32:27 -07:00
Alex Miller
9a64dc7f24
Merge pull request #1792 from ajbeamon/remove-dl-iterate-phdr-hack
...
Remove dl_iterate_phdr caching used by slow task profiler
2019-09-17 16:51:30 -07:00
Evan Tschannen
b495cc697b
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-09-13 09:25:08 -07:00
Andrew Noyes
bd7678e71b
Remove --object-serializer help text
2019-09-05 11:57:59 -07:00
Meng Xu
c2355f721e
Merge branch 'master' into mengxu/performant-restore-PR
2019-09-04 17:11:42 -07:00
Meng Xu
d160810662
FastRestore:Resolve review comments
2019-09-04 16:48:43 -07:00
A.J. Beamon
e465cbef07
Merge branch 'master' into remove-dl-iterate-phdr-hack
...
# Conflicts:
# fdbserver/fdbserver.actor.cpp
2019-08-22 10:13:23 -07:00
A.J. Beamon
f02799455e
Add --loggroup to fdbserver and fdbbackup help text.
2019-08-19 12:59:14 -07:00
Meng Xu
7ff46e6772
Merge branch 'master' into mengxu/performant-restore-PR
2019-08-07 20:31:56 -07:00
mpilman
370ba8b841
Remove --object-serializer flag from executables
2019-08-06 09:25:40 -07:00
Meng Xu
3b54363780
FastRestore:Apply Clang-format
2019-08-01 18:09:12 -07:00
Meng Xu
7ccaeddf05
Merge branch 'master' into mengxu/performant-restore-PR
2019-08-01 13:23:17 -07:00
A.J. Beamon
b5d2234a13
Merge branch 'release-6.1' into merge-release-6.1-into-master
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbserver/MoveKeys.actor.cpp
# flow/FastAlloc.h
# versions.target
2019-07-30 16:23:42 -07:00
Stephen Atherton
9e1042d903
Bug fix, restore with --dryrun option would still require the --dest_cluster_file option even though it does not connect to a cluster.
2019-07-29 16:21:12 -07:00
Stephen Atherton
b644f15b87
Bug fix: fdbrestore commands other than "start" were using default cluster file argument handling (but without the -C flag) instead of using the --dest_cluster_file argument.
2019-07-29 13:19:28 -07:00
Meng Xu
b0c31f28af
FastRestore:Fix bug that blocks restore
...
1) Should recruit only configured number of roles;
2) Should never register a restore master interface as a restore worker (loader or applier) interface.
2019-07-25 17:55:37 -07:00
Meng Xu
45083edf74
Merge branch 'master' into mengxu/performant-restore-PR
...
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
mpilman
1ac2d01b03
Merge remote-tracking branch 'upstream/master' into flatbuffers-fixes2
2019-07-18 09:50:08 -07:00
mpilman
d5caf0c1b4
Merge branch 'flatbuffers-fixes2' of github.com:mpilman/foundationdb into flatbuffers-fixes2
2019-07-16 14:47:40 -07:00
A.J. Beamon
d5051b08dd
Make trace event field lengths (and total event sizes) default knobified and configurable. Add a transaction option to control the field length of transaction debug logging. Make the program start command line field less likely to be truncated.
2019-07-12 16:12:35 -07:00
Andrew Noyes
5be0c47409
Add some missing help text
2019-07-12 10:46:46 -07:00
Andrew Noyes
269f077196
Add --object-serializer flag to fdbbackup etc
2019-07-12 10:46:45 -07:00
A.J. Beamon
a5a6f8431c
Add a random UID to TransactionMetrics in case a client opens multiple connections and also a field to indicate whether the connection is internal. Convert some of the metrics to our Counter object instead of running totals.
2019-07-08 14:01:04 -07:00
A.J. Beamon
6a899ddff3
Remove dl_iterate_phdr results caching that is used by slow task profiler, instead favoring disabling and reenabling profiling around the call. Add a mechanism to handle deferred profile requests.
2019-07-03 12:48:36 -07:00
Evan Tschannen
37c1df2491
Merge pull request #1705 from bnamasivayam/suspend-process
...
Extend RebootRequest API to include time to suspend the process befor…
2019-06-20 17:36:25 -07:00
mpilman
68ce9a5e75
ProtocolVersion type - second try
2019-06-18 17:55:27 -07:00
Evan Tschannen
20e3edeb0a
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/storageserver.actor.cpp
# versions.target
2019-06-14 12:42:59 -07:00
Balachandar Namasivayam
5eb833759e
Extend RebootRequest API to include time to suspend the process before reboot. This is intended to be used for testing purposes to simulate failures.
2019-06-14 11:35:38 -07:00
Stephen Atherton
235d24ee26
Bug fix: fdbrestore abort, wait, and status were not using the --dest_cluster_file argument.
2019-06-13 17:42:11 -07:00
Meng Xu
3fcb6ec0a1
FastRestore:Refactor RestoreLoader and fix bugs
...
Refactor RestoreLoader code and
Fix a bug in notifying restore finish.
2019-06-04 21:53:31 -07:00