Commit Graph

851 Commits

Author SHA1 Message Date
Stephen Atherton 0f20068e82 Renamed all TaskBucket backup tasks to more appropriate names. Created the ability to make task aliases and used this to direct old task names to a task definition which will abort backups created before version 5.1. 2018-01-04 22:53:31 -08:00
A.J. Beamon 653a46f12f Update error string for cluster_version_changed error 2018-01-04 15:06:09 -08:00
Evan Tschannen f8f1c48d83 sometimes test pausing backups 2018-01-04 11:40:08 -08:00
Stephen Atherton 529716972d Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1 2018-01-04 11:38:41 -08:00
Evan Tschannen f2c4beed9f fix: tlogFitness did not consider it better to have one tlog of a better fitness
fix: checkStable was not used in all places in better master exists
fix: we need to call checkOutstanding on worker registration in all cases
fix: in case persistentData is keyValueStoreMemory, we need to make sure it is fully recovered before writing to it
2018-01-04 11:33:02 -08:00
Stephen Atherton d43e80cf48 Bug fix, atomicRestore would fail to get a commit version after a commit_unknown_result where the transaction actually was committed. This would cause the restore to target version -1 so it would use one of the first available restorable versions in the backup instead of the version at which the database was locked. 2018-01-03 16:24:02 -08:00
Stephen Atherton cec9f4d7a4 Bug fix in DNS resolution. When the result is an error the result promise was being set twice. 2018-01-03 13:05:38 -08:00
Stephen Atherton fd3f3aa647 Increased system key size limit to fix some rare backup use cases. 2018-01-03 12:05:12 -08:00
Stephen Atherton 96c479dc71 Rare bug fix. It turns out that backup log files must be written with unique names, otherwise a re-written >1 block log file overwritten after a restore has begun could read some blocks from before the rewrite and some blocks after, but due to random content ordering this would be incorrect and produce a corrupt restore. This bug is very rare because restore would detect an error unless the rewritten log file has exactly the same size as the original file, but this is unlikely because the random content order affects block padding and therefore usable content bytes per block. 2018-01-02 23:38:01 -08:00
Stephen Atherton 371dee70e6 Improved backup folder structure to be shallower but spread files more uniformly and make each folder's entries lexically sort into version order regardless of numeric length. Improved backup container test to use a random version multiplier on the file versions created in order to test a wide range of versioned folder paths. 2018-01-02 23:22:35 -08:00
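
The property described in the commit above, folder entries that sort lexically into version order regardless of numeric length, can be illustrated with a small sketch. It uses fixed-width zero padding, which is one common way to get that property and is an assumption here, not necessarily the scheme this commit implements. The helper name `version_folder_name` and the width of 19 digits are hypothetical.

```python
def version_folder_name(version, width=19):
    """Hypothetical helper: zero-pad a version so string order equals numeric order."""
    if version < 0:
        raise ValueError("version must be non-negative")
    name = str(version).zfill(width)
    if len(name) > width:
        raise ValueError("version does not fit in the chosen width")
    return name


versions = [9, 10, 100000, 2, 123456789012]

# Plain decimal strings sort incorrectly once their lengths differ.
naive = sorted(str(v) for v in versions)

# Zero-padded names sort lexically into numeric order.
padded = sorted(version_folder_name(v) for v in versions)
```

Comparing `naive` with `padded` shows why unpadded decimal names would break version ordering in a directory listing.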
Stephen Atherton 78430425e8 Blob bucket listings will now use parallel recursive requests on CommonPrefixes, up to a max depth, if a delimiter is provided. 2018-01-02 23:17:52 -08:00
Stephen Atherton 07fde9dfb4 Bug fix, error code 429 was not being treated as retryable in the recent refactor. 2018-01-02 23:15:25 -08:00
Evan Tschannen 6d5dd9bd27 fix: we cannot pipeline disk queue commits until after the first commit is successful 2018-01-02 13:30:27 -08:00
Evan Tschannen 3e4d968308 Merge commit 'f3a3799b1d7af148949cb50c4fc349494976a6f8' into release-5.1 2018-01-02 11:25:46 -08:00
Stephen Atherton f324afc13f Bug fix in blob store listing when it requires multiple serial requests. Added more trace events to FileBackup and BlobStoreEndpoint with suppression and added suppression to existing trace events. 2017-12-22 17:08:25 -08:00
Stephen Atherton f2524ffd33 AsyncFileBlobStoreWrite was prohibiting the writing of 0-byte files. Improved HTTP verbose logging to stdout. Added writing a 0-byte file to BackupContainer unit test. Added backup log and snapshot sizes to backup description. 2017-12-21 21:15:26 -08:00
Stephen Atherton aa8b4c52d5 Removed backup URL from trace events. 2017-12-21 18:22:14 -08:00
Stephen Atherton 93e426ccd2 Merge branch 'master' of github.com:apple/foundationdb 2017-12-21 17:21:20 -08:00
Stephen Atherton e8f9568bbe Simulation improvement, readCommitted() calls that run for a long time would sometimes go too slow depending on the buggified limit, so now the limit is updated for each fetch loop. 2017-12-21 17:21:05 -08:00
Evan Tschannen 69f7409c37 fix: latestRestorable was incorrect 2017-12-21 17:09:21 -08:00
Evan Tschannen 5ed080721d fix: atomic restore must wait until the restorable version is greater than the lock version
fix: latestRestorableVersion calculation was wrong
2017-12-21 15:45:10 -08:00
Yichi Chiang e750682b74 Merge pull request #227 from cie/add-datetime-conversion-error-message
Add datetime conversion error message in backup
2017-12-21 14:51:04 -08:00
Yichi Chiang 3616035415 Add datetime conversion error message in backup 2017-12-21 14:45:05 -08:00
Evan Tschannen 95b502e1d7 fix: we did not restore to the target version in all cases 2017-12-21 14:11:44 -08:00
Stephen Atherton e0ef5a9a20 Whitespace normalization. 2017-12-21 12:07:29 -08:00
Stephen Atherton 9dc952e3b2 Added blob credentials details to fdbbackup help. 2017-12-21 02:50:02 -08:00
Stephen Atherton ec28c77353 Merge branch 'master' of github.com:apple/foundationdb 2017-12-21 01:58:47 -08:00
Stephen Atherton e3aee45a74 Backup tools and agent now accept blob account credentials via files containing JSON which are specified using command line arguments and/or an environment variable. Improved fdbbackup help, clarifying which options are for which operations. Fdbbackup operations which do not need to use a database no longer require a cluster file parameter. Added eat() commands to StringRef for incrementally tokenizing strings using separator strings. 2017-12-21 01:58:15 -08:00
Evan Tschannen 86958cb08d Merge pull request #226 from cie/fix-taskBucket-unblockFuture
Modify TaskBucketCorrectness to support chain and multiple tasks
2017-12-20 18:00:54 -08:00
Yichi Chiang 91e5abeaa6 Modify TaskBucketCorrectness to support chain and multiple tasks 2017-12-20 17:02:49 -08:00
Alex Miller f70e3b9fe8 Add or change a bunch of comments to provide descriptions of function contracts.
This cleans up a bit of the VersionStamp DR work I did, and leaves hints and
advice for anyone who will be touching mutation applying code in the future.
2017-12-20 16:57:14 -08:00
Evan Tschannen 38cff7d4a5 every transaction which clears applyMutation keys does so on the first proxy 2017-12-20 15:41:47 -08:00
Evan Tschannen 982f0dcb1e Merge pull request #222 from cie/alexmiller/drtimefix2
Fix yet another VersionStamp DR issue.
2017-12-20 15:09:23 -08:00
Alex Miller b5a6bc0ab7 Fix VersionStamp problems by instead adding a COMMIT_ON_FIRST_PROXY transaction option.
Simulation identified the fact that we can violate the
VersionStamps-are-always-increasing promise via the following series of events:

1. On proxy 0, dumpData adds commit requests to proxy 0's commit promise stream
2. To any proxy, a client submits the first transaction of abortBackup, which stops further dumpData calls on proxy 0.
3. To any proxy that is not proxy 0, submit a transaction that checks if it needs to upgrade the destination version.
4. The transaction from (3) is committed
5. Transactions from (1) are committed

This is possible because the dumpData transactions have no read conflict
ranges, and thus it's impossible to make them abort due to "conflicting"
transactions.  There's also no promise that if client C sends a commit to proxy
A, and later a client D sends a commit to proxy B, that B must log its commit
after A.  (We only promise that if C is told it was committed before D is told
it was committed, then A committed before B.)

There was a failed attempt to fix this problem.  We tried to add read conflict
ranges to dumpData transactions so that they could be aborted by "conflicting"
transactions.  However, this failed because this now means that dumpData
transactions require conflict resolution, and the stale read version that they
use can cause them to be aborted with a transaction_too_old error.
(Transactions that don't have read conflict ranges will never return
transaction_too_old, because with no reads, the read snapshot version is
effectively meaningless.)  This was never previously possible, so the existing
code doesn't retry commits, and to make things more complicated, the dumpData
commits must be applied in order.  This would require either adding
dependencies to transactions (if A is going to commit then B must also be/have
committed), which would be complicated, or submitting transactions with a fixed
read version, and replaying the failed commits with a higher read version once
we get a transaction_too_old error, which would unacceptably slow down the
maximum throughput of dumpData.

Thus, we've instead elected to add a special transaction option that bypasses
proxy load balancing for commits, and always commits against proxy 0.  We can
know for certain that after the transaction from (2) is committed, all of the
dumpData transactions that will be committed have been added to the commit
promise stream on proxy 0.  Thus, if we enqueue another transaction against
proxy 0, we can know that it will be placed into the promise stream after all
of the dumpData transactions, thus providing the semantics that we require:  no
dumpData transaction can commit after the destination version upgrade
transaction.
2017-12-20 15:04:04 -08:00
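
The ordering argument in the commit message above can be sketched with a toy model. This is a simplified illustration under stated assumptions, not FoundationDB code: each proxy is modeled as a plain FIFO queue, the `Proxy` class and `run` function are hypothetical, and the interleaving chosen is the worst case in which the other proxy commits first.

```python
from collections import deque


class Proxy:
    """Toy proxy (assumption): commits requests strictly in the order they were enqueued."""

    def __init__(self):
        self.queue = deque()

    def submit(self, txn):
        self.queue.append(txn)

    def drain(self, log):
        while self.queue:
            log.append(self.queue.popleft())


def run(upgrade_on_first_proxy):
    proxies = [Proxy(), Proxy()]
    # Step 1: dumpData commits are already queued on proxy 0.
    for i in range(3):
        proxies[0].submit(f"dumpData-{i}")
    # Step 3: the destination version upgrade goes to proxy 0 or to another proxy.
    target = proxies[0] if upgrade_on_first_proxy else proxies[1]
    target.submit("upgrade")
    # Worst-case interleaving: the other proxy commits before proxy 0 does.
    log = []
    proxies[1].drain(log)
    proxies[0].drain(log)
    return log
```

In this model, `run(False)` lets the upgrade commit before the queued dumpData transactions, while `run(True)` always places it after them, which is the guarantee the commit message describes for commits routed to proxy 0.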
Evan Tschannen 07efdc70c8 more fixes for windows compile 2017-12-20 14:39:23 -08:00
Evan Tschannen c51de3bb88 fixed windows compile issues 2017-12-20 13:48:31 -08:00
Stephen Atherton c1958b335a Compile fix on windows, can't access protected parent class member from static function, apparently. 2017-12-20 12:13:25 -08:00
Evan Tschannen 0ab0cf51a3 fix: snapshotDispatch signaled completion after the first snapshot finished 2017-12-20 12:07:35 -08:00
Evan Tschannen 50bc25d3c7 Merge pull request #225 from cie/continuous-backup
Continuous backup
2017-12-20 11:13:36 -08:00
Stephen Atherton b77276d2f0 First snapshot of a backup should go as fast as possible instead of using the configured snapshot interval. 2017-12-20 01:07:03 -08:00
Stephen Atherton 7caa012fbf Added snapshot interval option to "fdbbackup start" which defaults to a new knob's value. Added snapshot info to backup status text. Improvements to fdbbackup help. 2017-12-20 00:49:08 -08:00
Stephen Atherton d87aa521e9 Merge branch 'backup-container-refactor' into continuous-backup 2017-12-19 23:39:00 -08:00
Stephen Atherton 193c216f52 Merge pull request #224 from cie/add-fdbbackup-interface
Add fdbbackup interface
2017-12-19 23:33:17 -08:00
Stephen Atherton e0d9cea008 Merge branch 'master' into continuous-backup
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
#	fdbrpc/BlobStore.actor.cpp
2017-12-19 23:02:14 -08:00
Stephen Atherton 2cd1ff6aae Bug fix, in restore dispatch the apply lag was being retrieved before updating the apply end version which would make it look like mutations were finished applying early. 2017-12-19 18:11:40 -08:00
Stephen Atherton 61a043ebfa Added tr->reset() to prevent initial transaction loop attempts from having a higher chance of expiring. 2017-12-19 17:33:45 -08:00
Alex Miller c7dbd31a1e Refactoring: Create a common prefixRange and do UID->Key once in backup. 2017-12-19 17:17:50 -08:00
Alex Miller 1488c12c18 Simulation will return an error and print if any non-suppressed SevError events were logged.
This means that loops like `seed=1; while ./fdbserver -r simulation -s $seed;
do seed=$(($seed+1)); done` can be used to find an example of an often failing test.  This
also means joshua will report ExitCode errors on anything that has a SevError
in the log.

As a part of this, we also implicitly downgrade any injected errors to SevWarnAlways.
2017-12-19 17:17:50 -08:00
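
The seed-search loop quoted in the commit message above could also be written as a small script. This is a minimal sketch: the `./fdbserver -r simulation -s <seed>` command line is taken from the commit message, while the function names and the `limit` bound are assumptions added here (the original shell loop is unbounded).

```python
import subprocess
from typing import Callable, Optional


def first_failing_seed(run_seed: Callable[[int], int], start: int = 1,
                       limit: int = 1000) -> Optional[int]:
    """Return the first seed in [start, start + limit) whose run exits nonzero, else None."""
    for seed in range(start, start + limit):
        if run_seed(seed) != 0:
            return seed
    return None


def run_simulation(seed: int) -> int:
    """Invoke the command quoted in the commit message and return its exit code."""
    cmd = ["./fdbserver", "-r", "simulation", "-s", str(seed)]
    return subprocess.run(cmd).returncode
```

Calling `first_failing_seed(run_simulation)` from a directory containing the `fdbserver` binary would stop at the first seed whose simulation run exits with a nonzero code.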
Stephen Atherton aa5169bd3c Removed unnecessary trace event. 2017-12-19 15:29:22 -08:00
Stephen Atherton e28641886d TraceEvent improvements. Minor bug fix, restore log writing tasks didn't have the log file endVersion but it's only for logging purposes. 2017-12-19 15:27:04 -08:00