Commit Graph

906 Commits

Author SHA1 Message Date
Evan Tschannen 30710f7493 syncLogId was not necessary 2018-01-06 14:52:39 -08:00
Evan Tschannen 3ec45d38a0 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Evan Tschannen 11ade6b6c5 Merge commit '6a756891a4cd9fd1980cecd0890b4a4f17840c5f' 2018-01-06 13:51:06 -08:00
Evan Tschannen 14e956c4fc fix: getRange did not check for endpoint failures
refactored checking endpoint failures
2018-01-06 13:49:47 -08:00
Evan Tschannen 10c3fc165e fix: after recovering from disk, only allow peeking data that was fully recovered 2018-01-06 13:49:13 -08:00
Stephen Atherton 96cb06cbc7 Bug fixes. Fdbbackup delete was broken. Blobstore backup container deletion would do too much listing before deletions began due to list operations queueing up ahead of and starving the delete operations. Created new knob and blob endpoint limit for concurrent list operations to fix this. Increased blob request timeout default because some requests were taking longer. Crash fixes in blobstore doRequest() which wasn't checking that response object is valid before using it in error conditions. Filesystem-like backup container class (covering blobstore and local dirs) now ignores unrecognized filenames for describe() and expire() operations. 2018-01-05 23:06:39 -08:00
Stephen Atherton b86f68ceb8 Added new test that combines atomic backup/restore. Added randomization to delays in AtomicRestore workload. 2018-01-05 14:43:21 -08:00
Stephen Atherton 236799c77f Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1 2018-01-05 14:38:06 -08:00
A.J. Beamon 1b8cae5f8e Merge branch 'release-5.1' 2018-01-05 14:24:15 -08:00
Stephen Atherton cbecc44ee2 Merge pull request #231 from cie/fdbmonitor-deconfigure-on-delete
Fdbmonitor deconfigure on delete
2018-01-05 14:23:34 -08:00
Alex Miller f021934792 Fix yet another VersionStamp DR bug.
In this episode, we discover that having a transaction retry loop in which the
transaction conditionally has write conflict ranges is potentially troublesome.

To simplify the problem, if we have two concurrent transaction loops:

    retry {
      if (rand() > .5) tr->set('x', rand());
      if (rand() > .5) tr->set('y', rand());
    }

and

    retry {
      x = tr->get('x')
      y = tr->get('y')
      if (x > y) {
        tr->set('y', x)
      }
      tr->commit();
    }

It is not guaranteed that x <= y in the database after the second transaction
commits.  This is because it could read an older snapshot of x and y, in which
x was not greater than y, and thus not invoke set.  This means that `tr` is now a
read-only transaction, which no-ops out of committing as an "optimization".  If
we add any write conflict range to `tr`, it will then be conflict checked and
committed, which would guarantee that x <= y when it commits.

Replace the first transaction with dumpData, and the second with version
upgrade transaction, and you have the bug that we're fixing, why, and how.
2018-01-05 14:23:11 -08:00
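The scenario described in the commit message above can be reproduced with a toy model. The following is a minimal sketch in Python, not FoundationDB code: `ToyDB`, `ToyTransaction`, `add_write_conflict_key`, `fix_up`, `bump_x`, and `make_db` are hypothetical names, and conflict detection is reduced to a per-key version check. It only illustrates why a transaction that ends up read-only escapes conflict checking, and why adding a write conflict key forces the check.

```python
class Conflict(Exception):
    """Raised when a key read by the transaction changed before commit."""


class ToyDB:
    """Hypothetical in-memory store with optimistic concurrency control."""

    def __init__(self):
        self.data = {}
        self.version = 0
        self.last_write = {}  # key -> version of the most recent write


class ToyTransaction:
    def __init__(self, db):
        self.db = db
        self.read_version = db.version
        self.snapshot = dict(db.data)  # reads see the state at read_version
        self.reads = set()
        self.writes = {}
        self.write_conflict_keys = set()

    def get(self, key):
        self.reads.add(key)
        return self.writes.get(key, self.snapshot.get(key))

    def set(self, key, value):
        self.writes[key] = value
        self.write_conflict_keys.add(key)

    def add_write_conflict_key(self, key):
        self.write_conflict_keys.add(key)

    def commit(self):
        # A transaction with no writes and no write conflict keys is treated
        # as read-only and returns without any conflict check.
        if not self.writes and not self.write_conflict_keys:
            return False
        for key in self.reads:
            if self.db.last_write.get(key, 0) > self.read_version:
                raise Conflict(key)
        self.db.version += 1
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.last_write[key] = self.db.version
        return True


def make_db():
    """Seed the store with x=1, y=5, so the fix-up has nothing to do at first."""
    db = ToyDB()
    seed = ToyTransaction(db)
    seed.set("x", 1)
    seed.set("y", 5)
    seed.commit()
    return db


def bump_x(db):
    """Stand-in for the first loop: a concurrent writer raises x above y."""
    writer = ToyTransaction(db)
    writer.set("x", 10)
    writer.commit()


def fix_up(db, concurrent_write, force_conflict_check):
    """Run `if x > y: y = x` once while another writer commits in between."""
    tr = ToyTransaction(db)
    x, y = tr.get("x"), tr.get("y")
    if x > y:
        tr.set("y", x)
    if force_conflict_check:
        tr.add_write_conflict_key("y")
    concurrent_write(db)  # commits after tr has read, before tr commits
    return tr.commit()
```

Calling `fix_up(make_db(), bump_x, force_conflict_check=False)` returns without checking anything and leaves x greater than y in the store. With `force_conflict_check=True` the same interleaving raises `Conflict`, so a retry loop would rerun the fix-up against the newer snapshot.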
A.J. Beamon 27e0cdc0f4 Merge branch 'release-5.1' 2018-01-05 14:18:36 -08:00
Evan Tschannen 63751fb0e2 fix: remote logs are not in the log system until the recovery is complete so they cannot be used to determine if this is the correct log system to recover from 2018-01-05 14:15:25 -08:00
Stephen Atherton cbeff0f789 Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1 2018-01-05 14:13:27 -08:00
Stephen Atherton 2763713cbc Bug fix, backup snapshot dispatch was calculating that all shards must be done immediately. 2018-01-05 14:12:00 -08:00
Alec Grieser 70cc02d572 Merge pull request #230 from cie/vexillographer-binding-specific-disables
Vexillographer binding specific disables
2018-01-05 13:26:12 -08:00
A.J. Beamon 30067d2f53 Whitespace fixes and removal of change to java's AbstractTester 2018-01-05 13:21:54 -08:00
A.J. Beamon 9f2e6bfbd1 Merge branch 'release-5.1' into vexillographer-binding-specific-disables
# Conflicts:
#	fdbclient/vexillographer/fdb.options
2018-01-05 13:16:41 -08:00
Stephen Atherton 097cd01968 Merge pull request #213 from cie/fdbmonitor-deconfigure-on-delete
fdbmonitor now deconfigures processes when the config file is deleted.
2018-01-05 13:10:19 -08:00
Evan Tschannen d10d42a015 Merge pull request #229 from cie/status-generalize-incorrect-cluster-file
Status generalize incorrect cluster file
2018-01-05 12:42:50 -08:00
Evan Tschannen 2a869a4178 fixed merge errors 2018-01-05 12:17:17 -08:00
Evan Tschannen 5ac4f73978 Merge branch 'release-5.1' into feature-remote-logs
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/Locality.h
#	fdbrpc/simulator.h
#	fdbserver/ApplyMetadataMutation.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/masterserver.actor.cpp
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
A.J. Beamon 5015119115 Generalize the message that gets displayed in status if a cluster file's contents are incorrect. 2018-01-05 10:29:47 -08:00
Stephen Atherton 0f20068e82 Renamed all TaskBucket backup tasks to more appropriate names. Created the ability to make task aliases and used this to direct old task names to a task definition which will abort backups created before version 5.1. 2018-01-04 22:53:31 -08:00
Alex Miller b264a98aea Fix yet another VersionStamp DR bug.
In this episode, we discover that having a transaction retry loop in which the
transaction conditionally has write conflict ranges is potentially troublesome.

To simplify the problem, if we have two concurrent transaction loops:

    retry {
      if (rand() > .5) tr->set('x', rand());
      if (rand() > .5) tr->set('y', rand());
    }

and

    retry {
      x = tr->get('x')
      y = tr->get('y')
      if (x > y) {
        tr->set('y', x)
      }
      tr->commit();
    }

It is not guaranteed that x <= y in the database after the second transaction
commits.  This is because it could read an older snapshot of x and y, in which
x was not greater than y, and thus not invoke set.  This means that `tr` is now a
read-only transaction, which no-ops out of committing as an "optimization".  If
we add any write conflict range to `tr`, it will then be conflict checked and
committed, which would guarantee that x <= y when it commits.

Replace the first transaction with dumpData, and the second with version
upgrade transaction, and you have the bug that we're fixing, why, and how.
2018-01-04 17:29:43 -08:00
Evan Tschannen e11f461cbd fix: better master exists needs to check master fitness before tlogs or proxies because that is the order of recruitment 2018-01-04 15:19:46 -08:00
A.J. Beamon 653a46f12f Update error string for cluster_version_changed error 2018-01-04 15:06:09 -08:00
A.J. Beamon 7d6c93122f Merge branch 'release-5.1' 2018-01-04 11:41:29 -08:00
Evan Tschannen f8f1c48d83 sometimes test pausing backups 2018-01-04 11:40:08 -08:00
Stephen Atherton 529716972d Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1 2018-01-04 11:38:41 -08:00
Evan Tschannen f2c4beed9f fix: tlogFitness did not consider it better to have one tlog of a better fitness
fix: checkStable was not used in all places in better master exists
fix: we need to call checkOutstanding on worker registration in all cases
fix: in case persistentData is keyValueStoreMemory, we need to make sure it is fully recovered before writing to it
2018-01-04 11:33:02 -08:00
A.J. Beamon 8891804b1b Add some comments to watch setting in fdbmonitor. 2018-01-04 11:31:01 -08:00
A.J. Beamon c0d37864cf Remove the watch on parent of missing directory if we detect that the directory has just shown up. 2018-01-04 11:30:41 -08:00
A.J. Beamon d52280b628 Merge branch 'release-5.1' into fdbmonitor-deconfigure-on-delete
# Conflicts:
#	fdbmonitor/fdbmonitor.cpp
2018-01-04 11:23:35 -08:00
A.J. Beamon 6f452ae9fa Remove unused variable 2018-01-04 10:30:02 -08:00
Stephen Atherton d43e80cf48 Bug fix, atomicRestore would fail to get a commit version after a commit_unknown_result where the transaction actually was committed. This would cause the restore to target version -1 so it would use one of the first available restorable versions in the backup instead of the version at which the database was locked. 2018-01-03 16:24:02 -08:00
Stephen Atherton cec9f4d7a4 Bug fix in DNS resolution. When the result is an error the result promise was being set twice. 2018-01-03 13:05:38 -08:00
Stephen Atherton fd3f3aa647 Increased system key size limit to fix some rare backup use cases. 2018-01-03 12:05:12 -08:00
Stephen Atherton 96c479dc71 Rare bug fix. It turns out that backup log files must be written with unique names. Otherwise, if a log file of more than one block is rewritten after a restore has begun, the restore could read some blocks from before the rewrite and some from after, and because content ordering is random the result would be incorrect and produce a corrupt restore. This bug is very rare because restore would detect an error unless the rewritten log file has exactly the same size as the original file, which is unlikely because the random content order affects block padding and therefore usable content bytes per block. 2018-01-02 23:38:01 -08:00
Stephen Atherton 371dee70e6 Improved backup folder structure to be shallower but spread files more uniformly and make each folder's entries lexically sort into version order regardless of numeric length. Improved backup container test to use a random version multiplier on the file versions created in order to test a wide range of versioned folder paths. 2018-01-02 23:22:35 -08:00
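The point about entries sorting lexically into version order regardless of numeric length can be illustrated with a short Python sketch. The commit message does not say how the folder names are built, so fixed-width zero-padding is an assumption here, and `version_folder` and its `width` parameter are hypothetical.

```python
def version_folder(version, width=19):
    """Hypothetical folder name: zero-pad so string order equals numeric order.

    A width of 19 digits is enough for any non-negative signed 64-bit value.
    """
    return f"{version:0{width}d}"


versions = [9, 10, 2_000_000, 150]

# Plain decimal strings sort by first character, so "9" lands after "2000000".
naive = sorted(str(v) for v in versions)

# Padded names all have the same length, so lexical order is numeric order.
padded = sorted(version_folder(v) for v in versions)
```

Any scheme that makes every name the same length, or that prefixes each name with its length, gives the same property. Padding is simply the shortest one to show.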
Stephen Atherton 78430425e8 Blob bucket listings will now use parallel recursive requests on CommonPrefixes, up to a max depth, if a delimiter is provided. 2018-01-02 23:17:52 -08:00
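A rough sketch of the listing strategy described above, in Python with `asyncio`. This is not the blob store client: `list_level` is a hypothetical stand-in for a single listing request against an in-memory list of keys, and `list_recursive` with its `max_depth` parameter is an assumed shape for the recursion.

```python
import asyncio


async def list_level(keys, prefix, delimiter):
    """Fake listing request: objects under prefix, plus common prefixes.

    With an empty delimiter, every key under the prefix is returned as an object.
    """
    objects, common = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common)


async def list_recursive(keys, prefix="", delimiter="/", max_depth=2, depth=0):
    """List everything under prefix, fanning out over common prefixes."""
    # Past max_depth, stop splitting and do one flat listing of the subtree.
    use_delimiter = delimiter if depth < max_depth else ""
    objects, common = await list_level(keys, prefix, use_delimiter)
    # Each common prefix is listed by its own concurrent request.
    children = await asyncio.gather(
        *(list_recursive(keys, p, delimiter, max_depth, depth + 1) for p in common)
    )
    for child in children:
        objects.extend(child)
    return objects
```

For example, `asyncio.run(list_recursive(["a/1", "a/b/2", "a/b/c/3", "d"]))` issues one request for the root, one for `a/`, and one flat request for `a/b/`, and returns all four keys.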
Stephen Atherton 07fde9dfb4 Bug fix, error code 429 was not being treated as retryable in the recent refactor. 2018-01-02 23:15:25 -08:00
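As an illustration of treating HTTP 429 as retryable, here is a minimal Python sketch. It is not the blob store request code: `request_with_retry`, its parameters, and the backoff policy are hypothetical, and the presence of 503 in the retryable set is an assumption (only 429 comes from the commit message).

```python
import time

# Only 429 comes from the commit message; 503 is an assumed extra entry.
RETRYABLE_STATUS = {429, 503}


def request_with_retry(send, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any callable returning a (status, body) pair.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt + 1 < max_attempts:
            sleep(base_delay * 2 ** attempt)  # exponential backoff
    # Every attempt was retryable: hand the last response back to the caller.
    return status, body
```

Passing `sleep` as a parameter keeps the function easy to exercise without real delays.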
Evan Tschannen 6d5dd9bd27 fix: we cannot pipeline disk queue commits until after the first commit is successful 2018-01-02 13:30:27 -08:00
Evan Tschannen e648ddc3fe Merge branch 'release-5.1' 2018-01-02 11:30:08 -08:00
Evan Tschannen b5ed01c053 updated version to 5.2 2018-01-02 11:28:30 -08:00
Evan Tschannen 3e4d968308 Merge commit 'f3a3799b1d7af148949cb50c4fc349494976a6f8' into release-5.1 2018-01-02 11:25:46 -08:00
Stephen Atherton f324afc13f Bug fix in blob store listing when it requires multiple serial requests. Added more trace events to FileBackup and BlobStoreEndpoint with suppression and added suppression to existing trace events. 2017-12-22 17:08:25 -08:00
Stephen Atherton f2524ffd33 AsyncFileBlobStoreWrite was prohibiting the writing of 0-byte files. Improved HTTP verbose logging to stdout. Added writing a 0-byte file to BackupContainer unit test. Added backup log and snapshot sizes to backup description. 2017-12-21 21:15:26 -08:00
Stephen Atherton aa8b4c52d5 Removed backup URL from trace events. 2017-12-21 18:22:14 -08:00
Stephen Atherton 93e426ccd2 Merge branch 'master' of github.com:apple/foundationdb 2017-12-21 17:21:20 -08:00