Evan Tschannen
c3918d892a
do not use bandwidth splitting on the keyServer shard; lots of sets and clears to this shard generally mean you do not want to create additional data distribution work
2017-11-30 18:28:16 -08:00
Evan Tschannen
0c986f25ed
Merge pull request #215 from cie/alexmiller/drtimefix
...
Fix a race between dumpData and version upgrades.
2017-11-30 18:17:19 -08:00
Alex Miller
7bab3a4ece
AsyncFileKAIO will prefer using fallocate's ZERO_RANGE for AsyncFile::zero().
...
For situations in which we have support for FALLOC_FL_ZERO_RANGE, it's much
faster to use fallocate than manually overwrite the file with zero bytes. Note
that this support depends on having a kernel from late 2014 or newer, and being
on ext4 or xfs. If these conditions aren't met, we'll fall back to writing
zeros in 1MB chunks as normal.
2017-11-30 17:57:55 -08:00
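The zeroing strategy described in 7bab3a4ece above can be sketched roughly as follows. This is a minimal, Linux-only illustration and not the AsyncFileKAIO code: `zeroRange` is a hypothetical helper name, the calls are synchronous rather than asynchronous, and the fallback treats any `fallocate` failure as "not supported".

```cpp
#include <fcntl.h>         // open, fallocate (glibc; g++ defines _GNU_SOURCE by default)
#include <linux/falloc.h>  // FALLOC_FL_ZERO_RANGE
#include <unistd.h>        // pwrite, pread, close, unlink
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: zero the byte range [offset, offset + length) of fd.
// Returns true on success, false if the fallback write fails.
bool zeroRange(int fd, off_t offset, off_t length) {
    assert(offset >= 0 && length >= 0);
#ifdef FALLOC_FL_ZERO_RANGE
    // Fast path: ask the filesystem to zero the range directly.
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, length) == 0)
        return true;
    // Any failure (for example an unsupported kernel or filesystem) falls through.
#endif
    // Slow path: overwrite the range with zero bytes in 1 MB chunks.
    static const off_t kChunk = 1 << 20;
    std::vector<char> zeros(static_cast<size_t>(kChunk), 0);
    while (length > 0) {
        size_t n = static_cast<size_t>(std::min(length, kChunk));
        ssize_t written = pwrite(fd, zeros.data(), n, offset);
        if (written <= 0)
            return false;  // sketch only: no retry on EINTR
        offset += written;
        length -= written;
    }
    return true;
}
```

Either path leaves the same bytes on disk, so a caller does not need to know which one ran.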
Alex Miller
196258080b
Refactor zeroing a chunk of a file from DiskQueue into IAsyncFile.
...
If we're going to do the work to provide more optimized ways to zero files,
then I'd feel better with this being in a more common place, so that any other
zero-ers are likely to reuse it. It also makes testing easier/more obvious.
Also, because it's needed for correctness, fix the aligned_alloc for OSX, which
wasn't aligned, and use an actually aligned allocation function.
2017-11-30 17:57:55 -08:00
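For the alignment fix mentioned in 196258080b above, the commit does not say which allocation function was adopted. As an assumption for illustration only, the sketch below uses POSIX `posix_memalign`, which is available on both Linux and OSX; `allocateAligned` and `isAligned` are hypothetical names.

```cpp
#include <stdlib.h>  // posix_memalign, free (POSIX)
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper: returns size bytes aligned to alignment, or nullptr on failure.
// posix_memalign requires a power-of-two alignment that is a multiple of sizeof(void*).
void* allocateAligned(size_t alignment, size_t size) {
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
    void* p = nullptr;
    if (posix_memalign(&p, alignment, size) != 0)
        return nullptr;
    return p;  // release with free()
}

// True if p is a multiple of alignment.
bool isAligned(const void* p, size_t alignment) {
    return reinterpret_cast<uintptr_t>(p) % alignment == 0;
}
```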
Alex Miller
e583beb8f6
Fix a race between dumpData and version upgrades.
...
This fixes the occasional VersionStampBackupToDB failures that were caused by
the version upgrade comparison happening before dumpData invocations were
stopped. Committing the first transaction stops dumpData, and thus we can then
do the primary vs secondary version check correctly.
2017-11-30 17:37:00 -08:00
Alex Miller
c7a120c59d
Rename IAsyncFile::incrementalDelete -> IAsyncFileSystem::incrementalDeleteFile.
...
`deleteFile` existed in IAsyncFileSystem, so an incremental delete function
seems to belong more as a virtual method on IAsyncFileSystem than a static
method on IAsyncFile, and the naming should match.
As long as we're here, change IAsyncFile to declare a virtual destructor, so
that it has good and proper C++ behavior. I presume this is what was vaguely
intended by the default constructor definition that previously existed?
2017-11-30 17:19:10 -08:00
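The virtual destructor point in c7a120c59d above is general C++ behavior and can be shown in isolation. The types below (`IFileLike`, `TrackedFile`) are hypothetical stand-ins, not the real IAsyncFile hierarchy: with a virtual destructor on the interface, deleting through a base pointer runs the derived destructor; without one, that delete is undefined behavior.

```cpp
#include <cassert>
#include <memory>

// Hypothetical interface standing in for a file abstraction.
struct IFileLike {
    virtual ~IFileLike() = default;  // makes delete-through-base well defined
    virtual int size() const = 0;
};

// Hypothetical implementation that counts live instances so destruction is observable.
struct TrackedFile : IFileLike {
    static int liveCount;
    TrackedFile() { ++liveCount; }
    ~TrackedFile() override { --liveCount; }
    int size() const override { return 0; }
};
int TrackedFile::liveCount = 0;

// Destroys the object through the interface pointer only.
void closeThroughBase(IFileLike* f) {
    assert(f != nullptr);
    delete f;  // dispatches to ~TrackedFile() because the base destructor is virtual
}
```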
Stephen Atherton
aeebe711ce
TaskBucket’s saveAndExtend() is now accomplished through extendTimeout() with an option to save parameters. SaveAndExtendIncrementally() has been removed because it is no longer needed: TaskBucket’s normal execution loop calls extendTimeout() periodically as long as the TaskFunc’s execute() actor has not finished or thrown. If a TaskFunc wants to save changes to task parameters, checkpointing progress so that task restarts can benefit from it, it can call extendTimeout() explicitly with the updateParams flag set to true.
2017-11-30 17:18:57 -08:00
A.J. Beamon
9d83501800
fdbmonitor now deconfigures processes when the config file is deleted. Depending on the specified settings, running processes may or may not be killed in that case. If they are not killed, then they are rejoined with the config if it reappears.
...
In this implementation, the deletion or moving of a parent directory does not trigger an immediate reload of the configuration, but the change will be detected the next time the configuration is reloaded for other reasons.
2017-11-30 13:57:05 -08:00
Stephen Atherton
1e643239f9
Improvement in blob connection reuse: the oldest connections in the pool are now used first.
2017-11-30 12:57:29 -08:00
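One way to read "oldest connections in pool are used first" (1e643239f9 above) is a first-in, first-out idle list. The sketch below assumes that reading; `Connection` and `OldestFirstPool` are hypothetical names and do not reflect the actual blob store connection code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical connection handle; a real one would wrap a socket.
struct Connection {
    int id;
};

// Idle connections are appended at the back and handed out from the front,
// so the connection that has been idle longest is reused first.
class OldestFirstPool {
public:
    void release(Connection c) {
        assert(c.id >= 0);
        idle_.push_back(c);
    }

    // Writes the oldest idle connection to out; returns false if the pool is empty.
    bool acquire(Connection& out) {
        if (idle_.empty())
            return false;
        out = idle_.front();
        idle_.pop_front();
        return true;
    }

    size_t idleCount() const { return idle_.size(); }

private:
    std::deque<Connection> idle_;
};
```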
Evan Tschannen
7f72aa7de5
fix: a storage server does not ever need to rollback before a version restored from disk
2017-11-30 11:19:43 -08:00
Evan Tschannen
e5a682948c
Merge pull request #212 from cie/check-cluster-controller-desired-class
...
Check cluster controller using desired process class in consistency c…
2017-11-29 15:57:51 -08:00
Yichi Chiang
8ba0eaebff
Check cluster controller using desired process class in consistency check
2017-11-29 15:09:23 -08:00
Evan Tschannen
8c51bc4ac4
fixed low latency tests in a way that gives us better test coverage
2017-11-28 18:20:29 -08:00
Evan Tschannen
dc624a54dc
fix: avoid flushing large queues in simulation when checking latency
2017-11-27 17:23:20 -08:00
Stephen Atherton
39edda1804
Bug fix, and some code cleanup along the way. If a range backup task dies in finish() the re-run of the task will start at begin == end, which wasn’t being handled correctly.
2017-11-27 15:57:19 -08:00
Evan Tschannen
062d7ad400
fix: client might not notice a cluster controller which has changed ids because of process class or exclusion changes
2017-11-27 15:08:03 -08:00
Stephen Atherton
d9c2f6d705
Bug fix. The terminator argument of readCommitted() previously did nothing, and end_of_stream() was always sent to the output stream. The parameter was fixed to enable changing this behavior, but the original behavior was not being correctly preserved in at least one case.
2017-11-26 22:52:47 -08:00
Stephen Atherton
9ce9fd8692
Added comments to describe IBackupFile contract.
2017-11-26 22:02:14 -08:00
Stephen Atherton
1d3af8f4f0
Bug fix.
2017-11-25 21:13:56 -08:00
Stephen Atherton
1b1c8e985a
Merge branch 'master' into backup-container-refactor
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
2017-11-25 19:54:51 -08:00
Stephen Atherton
6695c9e6a2
Bug fixes and improvements to error handling and trace events. The most serious bug was that restore would start at the wrong version, possibly skipping early log and range files.
2017-11-25 00:46:16 -08:00
Stephen Atherton
3449bc4cdc
Bug fix, range end was wrong for final range file of backup range task.
2017-11-19 04:44:33 -08:00
Stephen Atherton
a31216f3f7
Added toString() to Backup/Restore TaskFunc interface so tasks can provide a method to describe important task parameters for the default handleError() methods to use.
2017-11-19 04:39:18 -08:00
Stephen Atherton
32903ffa77
Trace event improvements and severity changes.
2017-11-19 04:34:28 -08:00
Stephen Atherton
9354a8cbb4
Added new backup container method to list everything in a backup.
2017-11-19 04:28:22 -08:00
Evan Tschannen
f9efdf1fc1
fix: typeString was not static, so it added a lot of memory to MutationRef
2017-11-17 23:36:09 -08:00
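The memory effect described in f9efdf1fc1 above follows from how C++ lays out members: a non-static array is stored in every instance, while a static one is stored once per type. The structs and string values below are hypothetical placeholders, not the real MutationRef definition.

```cpp
#include <cassert>
#include <cstdint>

// Non-static table: every instance carries its own copy of the four pointers.
struct MutationWithInstanceTable {
    const char* typeString[4] = {"SetValue", "ClearRange", "AddValue", "Max"};
    uint8_t type = 0;
};

// Static table: one copy shared by all instances, so each object holds only `type`.
struct MutationWithStaticTable {
    static const char* const typeString[4];
    uint8_t type = 0;

    const char* typeName() const {
        assert(type < 4);
        return typeString[type];
    }
};
const char* const MutationWithStaticTable::typeString[4] = {
    "SetValue", "ClearRange", "AddValue", "Max"};

static_assert(sizeof(MutationWithStaticTable) < sizeof(MutationWithInstanceTable),
              "the static table does not contribute to the per-object size");
```

Exact sizes depend on the platform's pointer width and padding, which is why the check above only compares the two.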
Alex Miller
f19cb3bbbd
Merge pull request #208 from cie/alexmiller/grvtfix
...
Fix the GRV performance regression
2017-11-17 15:00:44 -08:00
A.J. Beamon
3ded271153
Dispose of Cluster objects in fdb.open()
2017-11-17 12:21:14 -08:00
Alec Grieser
f657be8136
add a space to match the bracing style used in this file
2017-11-17 09:55:11 -08:00
A.J. Beamon
0981e0dcdd
Dispose of newly created transactions if transfer() fails.
2017-11-17 09:47:17 -08:00
Bhaskar Muppana
1bf84cd51a
Merge pull request #210 from bmuppana/backup-logs
...
Adding TraceEvents for BackupRangeTask.
2017-11-16 19:12:04 -08:00
Bhaskar Muppana
5e596ea670
Adding TraceEvents for BackupRangeTask.
2017-11-16 19:11:31 -08:00
Yichi Chiang
d9a98aa968
Remove commented code
2017-11-16 17:25:37 -08:00
Evan Tschannen
7211a397b0
Merge pull request #209 from cie/fix-double-recoveries
...
Fix double recoveries
2017-11-16 17:16:39 -08:00
Yichi Chiang
0d5dc15ac8
Fix double recoveries
2017-11-16 16:58:55 -08:00
Stephen Atherton
07c19098fe
Improved backup container unit test: added file reading / verification, more data, and a series of expirations with validation of the expected result. Then fixed the bugs that this new testing discovered.
2017-11-16 16:19:56 -08:00
Alex Miller
e9412bbb11
Fix the GRV performance regression introduced by adding the policy engine to GRV calculations.
...
Construction of LocalityGroup from LocalityData is expensive, and the previous
code greatly ran afoul of that. The policy engine does a large amount of
interning of strings and building compressed maps to make the expected many
future selectReplica calls cheap. Unfortunately we don't call selectReplicas,
so much of this work is undesirable for us, and a large amount of CPU time is
spent doing this initialization work.
The new changes aggressively do the minimal LocalityGroup::add() calls
necessary, and make them as cheap as possible by removing all elements from
LocalityData that don't need to be considered by the policy.
This optimization was also applied to the PeekCursor used during recovery,
which should speed recoveries up by a small amount.
2017-11-16 16:15:52 -08:00
A.J. Beamon
5b5850e097
The dispose in Database.createTransaction was supposed to happen on error, not in the finally block
2017-11-16 10:50:13 -08:00
Stephen Atherton
f105204aca
Shifted version distribution over folders.
2017-11-15 23:13:04 -08:00
Stephen Atherton
cc47d0e161
Bug fix in restore dispatch, begin file was not being incremented. Removed try/catch because the inherited handleError() is better.
2017-11-15 22:38:31 -08:00
Alex Miller
e900333dbf
Fix a subtle valgrind error.
...
If there was an error in waiting for the read version, we would attempt to
serialize and eventually commit a CommitTransactionRef that had an
uninitialized read_snapshot.
2017-11-15 19:21:20 -08:00
Evan Tschannen
ad456a939a
Merge pull request #206 from cie/change-excluded-cluster-controller
...
Change excluded cluster controller
2017-11-15 17:28:33 -08:00
Yichi Chiang
f96faf72d9
Add fullyRecoveredConfig for checking exclusions
2017-11-15 17:15:24 -08:00
A.J. Beamon
db017317ac
Update the Java bindings to add missing dispose calls.
2017-11-15 15:56:50 -08:00
Stephen Atherton
ab0017f023
TaskBucket’s TaskFunc interface now has an optional handleError() which is called on any task that throws an error from execute() or finish(). Restore and Backup tasks use this to ensure that any errors that occur are placed in the backup or restore config’s lastError property. Bug fixes in log and range file encodings.
2017-11-15 13:33:09 -08:00
A.J. Beamon
02a9978612
Merge pull request #204 from cie/fdbcli-warn-incompatible-peers
...
Added a counter to keep track of active outgoing incompatible connect…
2017-11-15 12:52:01 -08:00
Evan Tschannen
30464e943c
Merge pull request #205 from cie/cleanup-spammy-traceevents
...
Cleanup spammy traceevents
2017-11-15 12:41:37 -08:00
Evan Tschannen
e113dba0e3
added a new trace event tracking master recovery durations
2017-11-15 12:38:26 -08:00
Stephen Atherton
a77162b53d
Merge branch 'master' into backup-container-refactor
...
# Conflicts:
# fdbclient/BackupAgent.h
# fdbclient/FileBackupAgent.actor.cpp
# fdbclient/KeyBackedTypes.h
2017-11-15 08:14:47 -08:00
Stephen Atherton
e07dcb9ada
Fixed header paths.
2017-11-15 00:05:20 -08:00