Commit Graph

227 Commits

Author SHA1 Message Date
Evan Tschannen 924d335aa7 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	flow/Knobs.cpp
#	flow/Knobs.h
2020-02-25 18:25:19 -08:00
Evan Tschannen d3bca19960 backup should also submit on the first proxy for similar reasons to DR 2020-02-25 15:57:32 -08:00
Markus Pilman e71fe44ee3
Merge branch 'master' into features/icc 2020-02-08 21:33:02 -08:00
Jingyu Zhou d5849af5c0 Address review comments 2020-02-05 10:33:51 -08:00
mpilman d09e07f1f5 Merge remote-tracking branch 'upstream/master' into features/icc 2020-02-04 10:26:18 -08:00
Jingyu Zhou 52c6737411 Rename backupLoggingEnabled as backupWorkerEnabled
To highlight the changes for 7.0 backup changes. By default,
backup_worker_enabled flag is set for 7.0 version.
2020-02-04 10:09:16 -08:00
Jingyu Zhou 0db03f1d3c Use backup_logging_enabled flag
The default is to enable new backup workers. Users can disable this flag to
turn off the backup worker feature.
2020-02-03 20:03:22 -08:00
Jingyu Zhou 38aa1903fd Add a DB configuration option for backup workers
Right now, the default is to keep the old backup behavior, i.e., do NOT use
backup workers. Specifically, if BackupType is not set (or is set to default),
the master will not recruit backup workers and will not add pseudo locality for
backup workers.

The StartFullBackupTaskFunc is updated to check if backup worker is enabled.
Only when it is not enabled, starting a backup will wait on all backup workers
to be started.
2020-01-31 19:29:09 -08:00
Jingyu Zhou f7956cfbfc Clear backup UID from backupStartedKey when finish/abort backups
Clearing this key signals backup workers that backup is no longer needed. When
no backup is going on, the backup workers switch to the NOOP state.
2020-01-31 19:29:09 -08:00
Jingyu Zhou 19ef7f6bdb Skip watch of backup task's started key if it's already set
The backup task may be restarted multiple times so the started key for the
backup task may already be set. In this case, the wait on watch should be
skipped.
2020-01-31 19:29:09 -08:00
Jingyu Zhou f8342f0884 Add keepRunning for start backup transaction
TaskBucket::keepRunning() needs to be called in backup transactions to be sure
that the task has not been cancelled. If so, the task is cancelled. Otherwise,
the task can continue run, causing multiple runs of the same task.

Another subtle issue is that the beginVersion is persisted on backupStartedKey.
So while reading it back from that key, we should set task's beginVersion with
the value persisted earlier.
2020-01-31 19:29:09 -08:00
Jingyu Zhou 5a602f58e8 Start backup with a wait on all backup workers running
This wait is to make sure that backup workers are already saving mutations so
that no mutations are missed. The idea is that the CLI sets a "backupStartedKey"
in the database and waits for allWorkerStarted() key of the backup to be set.

Backup workers monitor the changes to the "backupStartedKey" and start logging
mutations. Additionally, backup worker for Tag(-2,0) monitors all other workers
have started (checking their saved progress version is larger than the backup's
start version), and then sets the allWorkerStarted() key for the backup.
2020-01-31 19:29:09 -08:00
Jingyu Zhou 7f7ec99170 Serialize and deserialize new backup files
The BackupWorker writes files that can be read by FileConverter. Move
StringRefReader to the header file for reuse in FileConverter.
2020-01-22 19:38:46 -08:00
Alvin Moore 7628d04fb9 Merge branch 'release-6.2' of github.com:apple/foundationdb into release_6.2_merge
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2020-01-09 07:21:16 -08:00
Meng Xu e98b2a0d1c FastRestore:Introduce RestoreAsset 2019-12-20 18:00:10 -08:00
Steve Atherton 4ff058e86b Backup and DR layer status document generation now uses snapshot reads for all keys read to avoid unnecessary conflicts when read during a status update or cleanup transaction. Since many of the keys read use wrapper functions, all of the underlying functions in BackupAgentBase and its two implementations also required a snapshot mode argument. All snapshot arguments default to false to match the underlying FDB API get/getrange methods. 2019-12-19 00:29:35 -08:00
Stephen Atherton f3ed3f462b Added code comments in backup log range task. 2019-12-03 09:44:15 -08:00
Andrew Noyes b7b5d2ead3 Remove several nonsensical const uses
These seem to be all the ones that clang's -Wignored-qualifiers
complains about
2019-10-26 14:30:34 -07:00
Evan Tschannen 688940b685 merge 6.2 into master 2019-10-21 11:43:46 -07:00
Evan Tschannen 628b4e0220 added a warning if multiple log ranges exist for the same range 2019-10-02 17:06:19 -07:00
Meng Xu d0147e5e5d Merge branch 'release-6.2' into mengxu/merge-release620-to-master-v3
Resolved Conflicts:
	documentation/sphinx/source/release-notes.rst
	fdbserver/DataDistribution.actor.cpp
	versions.target
2019-10-02 13:22:56 -07:00
Evan Tschannen 9ec9f41d34 backup and DR would not share mutations if they were started on different versions of FDB 2019-10-01 18:52:07 -07:00
Evan Tschannen 4d659662b8 made cleanup handle retries better 2019-09-30 12:44:20 -07:00
Evan Tschannen ef01ad2ed8 optimized log range clearing to clear everything for each possible hash (256 clears) if that would be more efficient than one clear per second that has elapsed
aborting a DR without the —cleanup flag will still attempt to cleanup for 30 seconds before giving up
added a cleanup command to fdbbackup which can remove mutations from orphaned DRs which were stopped without the —cleanup flag
2019-09-27 18:32:27 -07:00
Meng Xu d160810662 FastRestore:Resolve review comments 2019-09-04 16:48:43 -07:00
Meng Xu 2602cb3591 FastRestore:Rename RestoreConfig to RestoreConfigFR to fix link problem in windows
Because the current restore has defined RestoreConfig, windows linker complains.
This commit rename the RestoreConfig used in FastRestore as RestoreConfigFR.
2019-08-02 23:00:12 -07:00
Meng Xu 3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu 45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
mpilman 844dd60202 FDB compiling with intel compiler 2019-06-20 09:29:01 -07:00
Meng Xu 477fd152c0 FastRestore:Refactor code
1) Use the runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition in loop-choose-when for
   the core routine of restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
   the file only has functionalities related to restore worker.

Passed correctness test
2019-06-04 11:22:47 -07:00
A.J. Beamon 603721e125 Merge branch 'master' into thread-safe-random-number-generation
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/genericactors.actor.cpp
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/workloads/BulkSetup.actor.h
#	flow/ActorCollection.actor.cpp
#	flow/Net2.actor.cpp
#	flow/Trace.cpp
#	flow/flow.cpp
2019-05-23 08:35:47 -07:00
Evan Tschannen f4fbaac6b0 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-05-19 10:27:59 -07:00
Stephen Atherton e495afdc6e Removed unnecessary c_str(). 2019-05-16 17:26:37 -07:00
Stephen Atherton 7b8e140788 Added UID field to fdbbackup status --json output. 2019-05-16 17:23:33 -07:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
Austin Seipp b5cbffc1b8 fdbclient: fix some print/scan format warnings
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Meng Xu 529ce66b6c Merge branch 'apple/master' into mengxu/performant-restore-PR 2019-04-18 18:02:45 -07:00
Andrew Noyes 781b6ece77 Fix OPEN_FOR_IDE -Wunused-variable warnings
CC #1255, #1173
2019-04-16 15:28:01 -07:00
Andrew Noyes 6207d724f8 Fix all -Wunused-variable warnings 2019-04-15 18:13:00 -07:00
mpilman 1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
Meng Xu c4a8a80d6f Merge branch 'apple/master' into mengxu/performant-restore-PR 2019-04-04 22:51:00 -07:00
Meng Xu 4ed8e9c16f FastRestore: Comment out fprintf stderr
We used fprintf(stderr,) to make sure the message is flushed out.
When we run correctness, we should comment the unnecessary stderr
to avoid false positive errors.
2019-04-01 18:35:54 -07:00
Meng Xu 068ba2e082 FastRestore: Rename RestoreFile to RestoreFileFR to avoid weird running error
When two struct have the same name but never used in the same scope,
the compiler will NOT report any error in compilation, but
the program will arbitrarily choose one of the struct at the linker time,
and experience weird error in running time. The runtime error is caused by the
corrrupted memory when we assign a struct content to a different struct type.
2019-04-01 18:23:38 -07:00
Stephen Atherton bae815a777 Bug fix, starting a restore on a tag already in-use would spinloop forever and eventually run out of memory. 2019-04-01 15:00:24 -07:00
Meng Xu 70d7c289f4 Merge branch 'master' into mengxu/restore/parallel-v7 2019-03-30 22:13:10 -07:00
Meng Xu 589fb76826 FastRestore:Attempt to fix old restore 2019-03-30 15:19:30 -07:00
Meng Xu 5e9a6edfe6 FastRestore:bug fix: Lock DB successfully 2019-03-29 13:31:38 -07:00
Stephen Atherton c6edcc7f06 Added schema version string to backup JSON status docs. Bug fix in backup status JSON, the document was being created outside the transaction retry loop so retries would combine partial element sets across all tries into the result. 2019-03-14 02:10:14 -07:00
Steve Atherton 8aab719c22
Merge branch 'master' into feature-backup-json 2019-03-12 18:23:16 -07:00
A.J. Beamon a25e224cda
Merge pull request #1213 from etschannen/feature-metadata-version
Added a metadata version key
2019-03-12 13:36:33 -07:00