Commit Graph

887 Commits

Author SHA1 Message Date
Evan Tschannen be643d6937 fix: the tlog did not cancel recovery properly when stopped 2018-01-12 17:18:14 -08:00
Alvin Moore 2e6ce03224 Merge pull request #232 from cie/build-dont-compile-hpp
Filter out .hpp files from *_BUILD_SOURCES (like we do with .h files)…
2018-01-12 14:09:25 -08:00
Evan Tschannen 660cee0254 increased the priority of getKeyServersLocations, because once a client gets a read version, answering their reads should be higher priority than starting new transactions 2018-01-12 13:46:20 -08:00
Evan Tschannen 3915d6825c we need to check the server list at a higher priority, because if we do not notice a storage server interface change for a long period of time, we will mark it as failed 2018-01-12 12:51:07 -08:00
Evan Tschannen 721a891d1f fix: never request more than 100 shards from the proxy at a time to avoid large packets 2018-01-12 10:51:53 -08:00
Evan Tschannen de119f192d fixed a priority inversion where the tlog would prefer to copy data from the previous generation rather than make data durable (leading to being ratekeeper controlled) 2018-01-11 16:09:49 -08:00
Evan Tschannen 29ebb19388 Merge branch 'release-5.0' into release-5.1 2018-01-11 15:43:37 -08:00
Evan Tschannen 22e5a0b257 formatting 2018-01-11 14:44:09 -08:00
Evan Tschannen 173a8de3ed DBCoreState supports upgrades from 3.0 versions 2018-01-11 14:39:51 -08:00
Evan Tschannen 02bd83ff76 changed incompatibleDataRead to an asyncTrigger 2018-01-11 13:35:56 -08:00
A.J. Beamon 80b84c23ac Filter out .hpp files from *_BUILD_SOURCES (like we do with .h files). Add xml2json.hpp to our fdbrpc project. 2018-01-10 13:51:57 -08:00
A.J. Beamon ce93d98b50 Temporarily remove xml2json.hpp from fdbrpc vcxproj 2018-01-10 10:18:44 -08:00
A.J. Beamon 2f5073d00f Some visual studio project cleanup. 2018-01-10 10:07:18 -08:00
Evan Tschannen 022df3b91b backup and restore sometimes took too long in simulation 2018-01-09 17:26:42 -08:00
Evan Tschannen 645f68212b make timekeeper priority system immediate 2018-01-08 18:21:00 -08:00
Evan Tschannen 370e8a9903 fix: split metrics could fail an assert in a very rare scenario 2018-01-08 18:20:22 -08:00
Stephen Atherton 0e7d538c94 Bug fix, in recursive blob folder listings the recent removal of common prefixes from the result stream caused the list marker to not be set correctly when a folder level requires multiple requests due to folder size. 2018-01-06 20:58:48 -08:00
Stephen Atherton 96cb06cbc7 Bug fixes. Fdbbackup delete was broken. Blobstore backup container deletion would do too much listing before deletions began due to list operations queueing up ahead of and starving the delete operations. Created new knob and blob endpoint limit for concurrent list operations to fix this. Increased blob request timeout default because some requests were taking longer. Crash fixes in blobstore doRequest() which wasn't checking that response object is valid before using it in error conditions. Filesystem-like backup container class (covering blobstore and local dirs) now ignores unrecognized filenames for describe() and expire() operations. 2018-01-05 23:06:39 -08:00
Stephen Atherton b86f68ceb8 Added new test that combines atomic backup/restore. Added randomization to delays in AtomicRestore workload. 2018-01-05 14:43:21 -08:00
Stephen Atherton 236799c77f Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1 2018-01-05 14:38:06 -08:00
Stephen Atherton cbecc44ee2 Merge pull request #231 from cie/fdbmonitor-deconfigure-on-delete
Fdbmonitor deconfigure on delete
2018-01-05 14:23:34 -08:00
Alex Miller f021934792 Fix yet another VersionStamp DR bug.
In this episode, we discover that having a transaction retry loop in which the
transaction conditionally has write conflict ranges is potentially troublesome.

To simplify the problem, if we have two concurrent transaction loops:

    retry {
      if (rand() > .5) tr->set('x', rand());
      if (rand() > .5) tr->set('y', rand());
    }

and

    retry {
      x = tr->get('x')
      y = tr->get('y')
      if (x > y) {
        tr->set('y', x)
      }
      tr->commit();
    }

Is not guaranteed that x > y in the database after the second transaction
commits.  This is because it could read an older snapshot of x and y, in which
x was greater than y, and thus not invoke set.  This means that `tr` is now a
read-only transaction, which no-ops out of committing as an "optimization".  If
we add any write conflict range to `tr`, it then will conflict checked and
committed, which would guarantee that x>y when it commits.

Replace the first transaction with dumpData, and the second with version
upgrade transaction, and you have the bug that we're fixing, why, and how.
2018-01-05 14:23:11 -08:00
Stephen Atherton cbeff0f789 Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1 2018-01-05 14:13:27 -08:00
Stephen Atherton 2763713cbc Bug fix, backup snapshot dispatch was calculating that all shards must be done immediately. 2018-01-05 14:12:00 -08:00
Alec Grieser 70cc02d572 Merge pull request #230 from cie/vexillographer-binding-specific-disables
Vexillographer binding specific disables
2018-01-05 13:26:12 -08:00
A.J. Beamon 30067d2f53 Whitespace fixes and removal of change to java's AbstractTester 2018-01-05 13:21:54 -08:00
A.J. Beamon 9f2e6bfbd1 Merge branch 'release-5.1' into vexillographer-binding-specific-disables
# Conflicts:
#	fdbclient/vexillographer/fdb.options
2018-01-05 13:16:41 -08:00
Evan Tschannen d10d42a015 Merge pull request #229 from cie/status-generalize-incorrect-cluster-file
Status generalize incorrect cluster file
2018-01-05 12:42:50 -08:00
A.J. Beamon 5015119115 Generalize the message that gets displayed in status if a cluster file's contents are incorrect. 2018-01-05 10:29:47 -08:00
Stephen Atherton 0f20068e82 Renamed all TaskBucket backup tasks to more appropriate names. Created the ability to make task aliases and used this to direct old task names to a task definition which will abort backups created before version 5.1. 2018-01-04 22:53:31 -08:00
Evan Tschannen e11f461cbd fix: better master exists needs to check master fitness before tlogs or proxies because that is the order of recruitment 2018-01-04 15:19:46 -08:00
A.J. Beamon 653a46f12f Update error string fro cluster_version_changed error 2018-01-04 15:06:09 -08:00
Evan Tschannen f8f1c48d83 sometimes test pausing backups 2018-01-04 11:40:08 -08:00
Stephen Atherton 529716972d Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1 2018-01-04 11:38:41 -08:00
Evan Tschannen f2c4beed9f fix: tlogFitness did not consider it better to have one tlog of a better fitness
fix: checkStable was not used in all places in better master exists
fix: we need to call checkOutstanding on worker registration in all cases
fix: in case persistentData is keyValueStoreMemory, we need to make sure it is fully recovered before writing to it
2018-01-04 11:33:02 -08:00
A.J. Beamon 8891804b1b Add some comments to watch setting in fdbmonitor. 2018-01-04 11:31:01 -08:00
A.J. Beamon c0d37864cf Remove the watch on parent of missing directory if we detect that the directory has just shown up. 2018-01-04 11:30:41 -08:00
A.J. Beamon d52280b628 Merge branch 'release-5.1' into fdbmonitor-deconfigure-on-delete
# Conflicts:
#	fdbmonitor/fdbmonitor.cpp
2018-01-04 11:23:35 -08:00
A.J. Beamon 6f452ae9fa Remove unused variable 2018-01-04 10:30:02 -08:00
Stephen Atherton d43e80cf48 Bug fix, atomicRestore would fail to get a commit version after a commit_unknown_result where the transaction actually was committed. This would cause the restore to target version -1 so it would use one of the first available restorable versions in the backup instead of the version at which the database was locked. 2018-01-03 16:24:02 -08:00
Stephen Atherton cec9f4d7a4 Bug fix in DNS resolution. When the result is an error the result promise was being set twice. 2018-01-03 13:05:38 -08:00
Stephen Atherton fd3f3aa647 Increased system key size limit to fix some rare backup use cases. 2018-01-03 12:05:12 -08:00
Stephen Atherton 96c479dc71 Rare bug fix. It turns out that backup log files must be written with unique names, otherwise a re-written >1 block log file overwritten after a restore has begun could read some blocks from before the rewrite and some blocks after, but due to random content ordering this would be incorrect and produce a corrupt restore. This bug is very rare because restore would detect an error unless the rewritten log file has exactly same size as the original file, but this is unlikely because the random content order affects block padding and therefore usable content bytes per block. 2018-01-02 23:38:01 -08:00
Stephen Atherton 371dee70e6 Improved backup folder structure to be shallower but spread files more uniformly and make each folder's entries lexically sort into version order regardless of numeric length. Improved backup container test to use a random version multiplier on the file versions created in order to test a wide range of versioned folder paths. 2018-01-02 23:22:35 -08:00
Stephen Atherton 78430425e8 Blob bucket listings will now use parallel recursive requests on CommonPrefixes, up to a max depth, if a delimiter is provided. 2018-01-02 23:17:52 -08:00
Stephen Atherton 07fde9dfb4 Bug fix, error code 429 was not being treated as retryable in the recent refactor. 2018-01-02 23:15:25 -08:00
Evan Tschannen 6d5dd9bd27 fix: we cannot pipeline disk queue commits until after the first commit is successful 2018-01-02 13:30:27 -08:00
Evan Tschannen 3e4d968308 Merge commit 'f3a3799b1d7af148949cb50c4fc349494976a6f8' into release-5.1 2018-01-02 11:25:46 -08:00
Stephen Atherton f324afc13f Bug fix in blob store listing when it requires multiple serial requests Added more trace events to FileBackup and BlobStoreEndpoint with suppression and added suppression to existing trace events. 2017-12-22 17:08:25 -08:00
Stephen Atherton f2524ffd33 AsyncFileBlobStoreWrite was prohibiting the writing of 0-byte files. Improved HTTP verbose logging to stdout. Added writing a 0-byte file to BackupContainer unit test. Added backup log and snapshot sizes to backup description. 2017-12-21 21:15:26 -08:00