Commit Graph

908 Commits

Author SHA1 Message Date
Evan Tschannen c3918d892a do not use bandwidth splitting on the keyServer shard; lots of sets and clears to this shard generally mean you do not want to create additional data distribution work 2017-11-30 18:28:16 -08:00
Evan Tschannen 0c986f25ed Merge pull request #215 from cie/alexmiller/drtimefix
Fix a race between dumpData and version upgrades.
2017-11-30 18:17:19 -08:00
Alex Miller 7bab3a4ece AsyncFileKAIO will prefer using fallocate's ZERO_RANGE for AsyncFile::zero().
For situations in which we have support for FALLOC_FL_ZERO_RANGE, it's much
faster to use fallocate than manually overwrite the file with zero bytes.  Note
that this support depends on having a kernel from late 2014 or newer, and being
on ext4 or xfs.  If these conditions aren't met, we'll fall back to writing
zeros in 1MB chunks as normal.
2017-11-30 17:57:55 -08:00
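A minimal sketch of the "try ZERO_RANGE, else write zeros" strategy described above. It is Linux-only and assumes glibc's `fallocate()` and the `FALLOC_FL_ZERO_RANGE` flag from `<linux/falloc.h>`. The helper name `zeroRange` is hypothetical and this is not the AsyncFileKAIO code. Only "unsupported" errors trigger the fallback; other errors are reported.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <fcntl.h>         // fallocate() (glibc with _GNU_SOURCE), open()
#include <linux/falloc.h>  // FALLOC_FL_ZERO_RANGE
#include <unistd.h>        // pwrite(), pread(), close(), unlink()
#include <vector>

// Hypothetical helper: zero the byte range [offset, offset + length) of fd.
// Returns 0 on success or -1 with errno set.
int zeroRange(int fd, off_t offset, off_t length) {
    assert(offset >= 0 && length >= 0);
    if (length == 0) return 0;  // fallocate() rejects a zero length with EINVAL

    // Fast path: ask the filesystem to zero the range without writing data.
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, length) == 0) return 0;
    if (errno != EOPNOTSUPP && errno != ENOSYS) return -1;  // a real error, not "unsupported"

    // Fallback: overwrite the range with zeros, 1 MiB at a time.
    const off_t kChunk = off_t(1) << 20;
    std::vector<char> zeros(static_cast<size_t>(std::min(kChunk, length)), 0);
    off_t done = 0;
    while (done < length) {
        const size_t n = static_cast<size_t>(std::min(kChunk, length - done));
        const ssize_t written = pwrite(fd, zeros.data(), n, offset + done);
        if (written < 0) {
            if (errno == EINTR) continue;  // retry interrupted writes
            return -1;
        }
        done += written;
    }
    return 0;
}
```

The same function covers both cases. On ext4 or xfs with a new enough kernel, the fast path does the work. On filesystems without ZERO_RANGE support, such as tmpfs, it falls through to the chunked writes.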
Alex Miller 196258080b Refactor zeroing a chunk of a file from DiskQueue into IAsyncFile.
If we're going to do the work to provide more optimized ways to zero files,
then I'd feel better with this being in a more common place, so that any other
zero-ers are likely to reuse it.  It also makes testing easier/more obvious.

Also, because it's needed for correctness, fix aligned_alloc on OSX, which
did not actually return aligned memory, by switching to a truly aligned allocation function.
2017-11-30 17:57:55 -08:00
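The commit does not name the replacement allocator. As one possibility, here is a sketch using POSIX `posix_memalign`, which is available on both Linux and macOS. The wrapper name `alignedAlloc` is hypothetical. Memory from `posix_memalign` is released with `free()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>  // posix_memalign(), free() (POSIX)

// Hypothetical wrapper: allocate `size` bytes whose address is a multiple of
// `alignment`. posix_memalign requires a power-of-two alignment that is also
// a multiple of sizeof(void*). Returns nullptr on failure; release with free().
void* alignedAlloc(std::size_t alignment, std::size_t size) {
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
    void* p = nullptr;
    if (posix_memalign(&p, alignment, size) != 0) return nullptr;
    return p;
}
```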
Alex Miller e583beb8f6 Fix a race between dumpData and version upgrades.
This fixes the occasional VersionStampBackupToDB failures that were caused by
the version upgrade comparison happening before dumpData invocations were
stopped.  Committing the first transaction stops dumpData, and thus we can then
do the primary vs secondary version check correctly.
2017-11-30 17:37:00 -08:00
Alex Miller c7a120c59d Rename IAsyncFile::incrementalDelete -> IAsyncFileSystem::incrementalDeleteFile.
`deleteFile` already exists in IAsyncFileSystem, so an incremental delete function
seems to belong more as a virtual method on IAsyncFileSystem than a static
method on IAsyncFile, and the naming should match.

As long as we're here, change IAsyncFile to declare a virtual destructor, so
that it has good and proper C++ behavior.  I presume this is what was vaguely
intended by the default constructor definition that previously existed?
2017-11-30 17:19:10 -08:00
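Why the virtual destructor matters: deleting a derived object through a pointer to a base class without one is undefined behavior, and in practice the derived destructor is skipped. The sketch below uses a hypothetical, heavily simplified interface `IFile` as a stand-in for IAsyncFile. It is not the real declaration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified interface standing in for IAsyncFile.
class IFile {
public:
    // Virtual destructor: deleting through an IFile* runs the derived destructor.
    virtual ~IFile() = default;
    virtual long size() const = 0;
};

// Hypothetical implementation that records each time it is destroyed.
class CountingFile final : public IFile {
public:
    explicit CountingFile(int& destroyed) : destroyed_(destroyed) {}
    ~CountingFile() override { ++destroyed_; }
    long size() const override { return 0; }
private:
    int& destroyed_;
};
```

Owning the object through `std::unique_ptr<IFile>` then cleans up the concrete type correctly when the pointer goes out of scope.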
Stephen Atherton aeebe711ce TaskBucket’s saveAndExtend() is now accomplished through extendTimeout() with an option to save parameters. SaveAndExtendIncrementally() has been removed because it is no longer needed: TaskBucket’s normal execution loop calls extendTimeout() periodically as long as the TaskFunc’s execute() actor has not finished or thrown. If a TaskFunc wants to save changes to its task parameters, checkpointing progress so that a restarted task can benefit from it, it can call extendTimeout() explicitly with the updateParams flag set to true. 2017-11-30 17:18:57 -08:00
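A rough, thread-based model of the execution loop described above. The real TaskBucket uses Flow actors, not `std::async`. `runWithLeaseExtension`, and the shapes of `execute` and `extendTimeout(updateParams)` used here, are hypothetical stand-ins. The loop keeps extending the task's timeout, without saving parameters, until execute() finishes or throws.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <utility>

// Hypothetical model: run `execute` and call `extendTimeout(false)` once per
// `interval` while it is still running. Returns the number of extensions made.
int runWithLeaseExtension(std::function<void()> execute,
                          const std::function<void(bool)>& extendTimeout,
                          std::chrono::milliseconds interval) {
    std::future<void> work = std::async(std::launch::async, std::move(execute));
    int extensions = 0;
    // Periodic extension does not save parameters; a task that wants to
    // checkpoint would call extendTimeout(true) itself.
    while (work.wait_for(interval) != std::future_status::ready) {
        extendTimeout(/*updateParams=*/false);
        ++extensions;
    }
    work.get();  // rethrows if execute() threw
    return extensions;
}
```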
A.J. Beamon 9d83501800 fdbmonitor now deconfigures processes when the config file is deleted. Depending on the specified settings, running processes may or may not be killed in that case. If they are not killed, they are reattached to the configuration if it reappears.
In this implementation, the deletion or moving of a parent directory does not trigger an immediate reload of the configuration, but the change will be detected the next time the configuration is reloaded for other reasons.
2017-11-30 13:57:05 -08:00
Stephen Atherton 1e643239f9 Improved blob connection reuse: the oldest connections in the pool are now used first. 2017-11-30 12:57:29 -08:00
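One way to get "oldest first" reuse is a FIFO pool. Reading "oldest" as the connection that has been idle in the pool longest is an assumption. `Connection` and `ConnectionPool` are hypothetical names, not the blob store client's types. Released connections go to the back and are acquired from the front. A LIFO stack would instead keep reusing the most recently returned connection.

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Hypothetical connection handle; a real pool would hold sockets and metadata.
struct Connection {
    int id;
};

// FIFO pool: the connection that has waited in the pool longest is reused first.
class ConnectionPool {
public:
    void release(Connection c) { idle_.push_back(c); }

    // Returns std::nullopt when no idle connection exists (caller opens a new one).
    std::optional<Connection> acquire() {
        if (idle_.empty()) return std::nullopt;
        Connection c = idle_.front();
        idle_.pop_front();
        return c;
    }

private:
    std::deque<Connection> idle_;
};
```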
Evan Tschannen 7f72aa7de5 fix: a storage server never needs to roll back to before a version restored from disk 2017-11-30 11:19:43 -08:00
Evan Tschannen e5a682948c Merge pull request #212 from cie/check-cluster-controller-desired-class
Check cluster controller using desired process class in consistency c…
2017-11-29 15:57:51 -08:00
Yichi Chiang 8ba0eaebff Check cluster controller using desired process class in consistency check 2017-11-29 15:09:23 -08:00
Evan Tschannen 8c51bc4ac4 fixed low latency tests in a way that gives us better test coverage 2017-11-28 18:20:29 -08:00
Evan Tschannen dc624a54dc fix: avoid flushing large queues in simulation when checking latency 2017-11-27 17:23:20 -08:00
Stephen Atherton 39edda1804 Bug fix, and some code cleanup along the way. If a range backup task dies in finish() the re-run of the task will start at begin == end, which wasn’t being handled correctly. 2017-11-27 15:57:19 -08:00
Evan Tschannen 062d7ad400 fix: client might not notice a cluster controller which has changed ids because of process class or exclusion changes 2017-11-27 15:08:03 -08:00
Stephen Atherton d9c2f6d705 Bug fix. The terminator argument of readCommitted() previously did nothing, and end_of_stream() was always sent to the output stream. The parameter was fixed to enable changing this behavior, but the original behavior was not being correctly preserved in at least one case. 2017-11-26 22:52:47 -08:00
Stephen Atherton 9ce9fd8692 Added comments to describe IBackupFile contract. 2017-11-26 22:02:14 -08:00
Stephen Atherton 1d3af8f4f0 Bug fix. 2017-11-25 21:13:56 -08:00
Stephen Atherton 1b1c8e985a Merge branch 'master' into backup-container-refactor
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
2017-11-25 19:54:51 -08:00
Stephen Atherton 6695c9e6a2 Bug fixes and improvements to error handling and trace events. The most serious bug was that restore would start at the wrong version, possibly skipping early log and range files. 2017-11-25 00:46:16 -08:00
Stephen Atherton 3449bc4cdc Bug fix, range end was wrong for final range file of backup range task. 2017-11-19 04:44:33 -08:00
Stephen Atherton a31216f3f7 Added toString() to Backup/Restore TaskFunc interface so tasks can provide a method to describe important task parameters for the default handleError() methods to use. 2017-11-19 04:39:18 -08:00
Stephen Atherton 32903ffa77 Trace event improvements and severity changes. 2017-11-19 04:34:28 -08:00
Stephen Atherton 9354a8cbb4 Added new backup container method to list everything in a backup. 2017-11-19 04:28:22 -08:00
Evan Tschannen f9efdf1fc1 fix: typeString was not static, so it added a lot of memory to MutationRef 2017-11-17 23:36:09 -08:00
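Why a non-static member inflates every object: a per-instance table is stored inside each object, while a static table exists once per program. The structs and type names below are illustrative only and are not the real MutationRef definition.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: every object carries its own copy of the name table...
struct MutationWithMemberTable {
    std::uint8_t type = 0;
    const char* typeString[3] = {"SetValue", "ClearRange", "AddValue"};
};

// ...while a static table is shared and adds nothing to sizeof.
struct MutationWithStaticTable {
    std::uint8_t type = 0;
    static constexpr const char* typeString[3] = {"SetValue", "ClearRange", "AddValue"};
};
```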
Alex Miller f19cb3bbbd Merge pull request #208 from cie/alexmiller/grvtfix
Fix the GRV performance regression
2017-11-17 15:00:44 -08:00
A.J. Beamon 3ded271153 Dispose of Cluster objects in fdb.open() 2017-11-17 12:21:14 -08:00
Alec Grieser f657be8136 add a space to match the bracing style used in this file 2017-11-17 09:55:11 -08:00
A.J. Beamon 0981e0dcdd Dispose of newly created transactions if transfer() fails. 2017-11-17 09:47:17 -08:00
Bhaskar Muppana 1bf84cd51a Merge pull request #210 from bmuppana/backup-logs
Adding TraceEvents for BackupRangeTask.
2017-11-16 19:12:04 -08:00
Bhaskar Muppana 5e596ea670 Adding TraceEvents for BackupRangeTask. 2017-11-16 19:11:31 -08:00
Yichi Chiang d9a98aa968 Remove commented code 2017-11-16 17:25:37 -08:00
Evan Tschannen 7211a397b0 Merge pull request #209 from cie/fix-double-recoveries
Fix double recoveries
2017-11-16 17:16:39 -08:00
Yichi Chiang 0d5dc15ac8 Fix double recoveries 2017-11-16 16:58:55 -08:00
Stephen Atherton 07c19098fe Improved the backup container unit test: added file reading/verification, more data, and a series of expirations with validation of the expected results. Then fixed the bugs that this new testing discovered. 2017-11-16 16:19:56 -08:00
Alex Miller e9412bbb11 Fix the GRV performance regression introduced by adding the policy engine to GRV calculations.
Construction of LocalityGroup from LocalityData is expensive, and the previous
code greatly ran afoul of that.  The policy engine does a large amount of
interning of strings and building compressed maps to make the many expected
future selectReplica calls cheap.  Unfortunately we don't call selectReplicas,
so much of this work is undesirable for us, and a large amount of CPU time is
spent doing this initialization work.

The new changes aggressively do the minimal LocalityGroup::add() calls
necessary, and make them as cheap as possible by removing all elements from
LocalityData that don't need to be considered by the policy.

This optimization was also applied to the PeekCursor used during recovery,
which should speed recoveries up by a small amount.
2017-11-16 16:15:52 -08:00
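The "remove everything the policy does not consider" step can be sketched as a key filter over a locality map. This is done before each add() into the expensive group structure. `Locality` and `stripToPolicyKeys` are hypothetical stand-ins for LocalityData and the real code. The set of policy keys is assumed to be known up front.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for LocalityData: attribute name -> attribute value.
using Locality = std::map<std::string, std::string>;

// Return a copy of `full` holding only the attributes the replication policy
// inspects, so each group add() has fewer strings to intern and index.
Locality stripToPolicyKeys(const Locality& full, const std::set<std::string>& policyKeys) {
    Locality minimal;
    for (const auto& [key, value] : full) {
        if (policyKeys.count(key) != 0) minimal.emplace(key, value);
    }
    return minimal;
}
```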
A.J. Beamon 5b5850e097 The dispose in Database.createTransaction was supposed to happen on error, not in the finally block 2017-11-16 10:50:13 -08:00
Stephen Atherton f105204aca Shifted version distribution over folders. 2017-11-15 23:13:04 -08:00
Stephen Atherton cc47d0e161 Bug fix in restore dispatch, begin file was not being incremented. Removed try/catch because the inherited handleError() is better. 2017-11-15 22:38:31 -08:00
Alex Miller e900333dbf Fix a subtle valgrind error.
If there was an error in waiting for the read version, we would attempt to
serialize and eventually commit a CommitTransactionRef that had an
uninitialized read_snapshot.
2017-11-15 19:21:20 -08:00
Evan Tschannen ad456a939a Merge pull request #206 from cie/change-excluded-cluster-controller
Change excluded cluster controller
2017-11-15 17:28:33 -08:00
Yichi Chiang f96faf72d9 Add fullyRecoveredConfig for checking exclusions 2017-11-15 17:15:24 -08:00
A.J. Beamon db017317ac Update the Java bindings to add missing dispose calls. 2017-11-15 15:56:50 -08:00
Stephen Atherton ab0017f023 TaskBucket’s TaskFunc interface now has an optional handleError() which is called on any task that throws an error from execute() or finish(). Restore and Backup tasks use this to ensure that any errors that occur are placed in the backup or restore config’s lastError property. Bug fixes in log and range file encodings. 2017-11-15 13:33:09 -08:00
A.J. Beamon 02a9978612 Merge pull request #204 from cie/fdbcli-warn-incompatible-peers
Added a counter to keep track of active outgoing incompatible connect…
2017-11-15 12:52:01 -08:00
Evan Tschannen 30464e943c Merge pull request #205 from cie/cleanup-spammy-traceevents
Cleanup spammy traceevents
2017-11-15 12:41:37 -08:00
Evan Tschannen e113dba0e3 added a new trace event tracking master recovery durations 2017-11-15 12:38:26 -08:00
Stephen Atherton a77162b53d Merge branch 'master' into backup-container-refactor
# Conflicts:
#	fdbclient/BackupAgent.h
#	fdbclient/FileBackupAgent.actor.cpp
#	fdbclient/KeyBackedTypes.h
2017-11-15 08:14:47 -08:00
Stephen Atherton e07dcb9ada Fixed header paths. 2017-11-15 00:05:20 -08:00