711 Commits

Author SHA1 Message Date
Stephen Atherton aa5169bd3c Removed unnecessary trace event. 2017-12-19 15:29:22 -08:00
Stephen Atherton e28641886d TraceEvent improvements. Minor bug fix, restore log writing tasks didn't have the log file endVersion but it's only for logging purposes. 2017-12-19 15:27:04 -08:00
Stephen Atherton a276985baf Bug fix, if there are range files in a restore which begin at exactly the restore version they will be repeatedly dispatched forever. 2017-12-18 17:48:18 -08:00
Stephen Atherton 005a4a0706 Restore status bug fix, during restore the apply lag would appear as a large negative number until the first restore batch is completed. Test improvement, snapshot dispatch now chooses a random number of tasks to dispatch per commit. 2017-12-18 15:56:57 -08:00
Stephen Atherton 937fa75bec Bug fix, if target snapshot end version is at or before the begin version then no progress would be made. 2017-12-18 00:13:25 -08:00
Stephen Atherton d32a770648 Bug fix, backup never went to differential mode once it was restorable which caused waitBackup to only return once the backup was discontinued. 2017-12-17 23:22:18 -08:00
Stephen Atherton 2b92815e8c Bug fix. The snapshot dispatch add task retry loop was incorrectly deciding that the second and subsequent transactions of an execution were already committed and therefore skipping them, resulting in missing ranges in the snapshot. 2017-12-17 21:01:31 -08:00
Stephen Atherton afd2603576 Refactored backup task flow and config to support ongoing snapshots and allow stopping the backup cleanly between snapshots. The previously separate tasks for initial and differential mode log dispatching have been merged into BackupLogsDispatchTask. 2017-12-17 14:29:57 -08:00
Stephen Atherton 18305ab326 Bug fixes. Added snapshotBatchSize to backupConfig to enable detecting if a transaction for adding a group of tasks to a batch had already completed. Changed KeyRangeMap usage so that each range value to be dispatched has a unique integer value, enabling more efficient range coalescing and avoiding some iterator invalidation bugs. 2017-12-15 01:39:50 -08:00
Stephen Atherton 33f9f1a95c Added SnapshotDispatch task for writing snapshots in random order over a specified period of time and adapting speed to a growing or shrinking database. TaskBucket now supports scheduling tasks. TaskFuture now correctly recognizes multiple tasks in its callback space. TaskBucket extendTimeout() now supports specifying the new timeout version. Submitting a backup now requires a snapshot duration. 2017-12-14 01:44:38 -08:00
Stephen Atherton 47a9a7ab0e Finished backup container discovery / listing via base URL. 2017-12-12 17:44:03 -08:00
Stephen Atherton b6cfe010a1 Bug fix in URL encoding of delimiter. 2017-12-12 17:31:19 -08:00
Stephen Atherton 4bc7d0b86a Updated error names and severities. 2017-12-06 15:42:44 -08:00
Stephen Atherton d3b4a81ed0 Blobstore connection details in unit tests now come from environment variables. 2017-12-06 14:38:45 -08:00
Stephen Atherton 4068ed3554 Merge branch 'backup-container-refactor' of github.com:apple/foundationdb into backup-container-refactor 2017-12-06 14:12:26 -08:00
Stephen Atherton ce6c49e173 Corrected a bunch of retry loops to not reset the backoff timer. 2017-12-06 14:11:40 -08:00
Balachandar Namasivayam 1f949240f5 Make fdbbackup s3 compatible.
S3 sends its response in XML, but FDB backup expects a JSON response. Added a new library, xml2json, to convert XML to JSON.
2017-12-05 17:13:15 -08:00
Stephen Atherton f8e89a40ac Bug fixes, take(1) is incorrect usage of FlowLock. 2017-12-04 10:25:47 -08:00
Stephen Atherton 86ae6c09c7 Bug fixes, take(1) is incorrect usage of FlowLock. 2017-12-04 10:20:50 -08:00
Stephen Atherton 42c6f7db34 TaskBucket bug fix, caused by accidental removal of task function lookup. Added extendMutex to Task for use around transaction loops that call extendTimeout() to reduce conflicts. 2017-12-03 20:52:09 -08:00
Stephen Atherton 3a6708707f Removed unnecessary duplicate variable. 2017-12-02 07:03:34 -08:00
Stephen Atherton 20a8aae241 Old bug fix, transaction reset() not being called in a retry loop. 2017-12-02 07:02:26 -08:00
Stephen Atherton eadf93826d Bug fixes with transaction options and exception handling that were causing internal errors. 2017-12-01 15:16:44 -08:00
Stephen Atherton aeebe711ce TaskBucket’s saveAndExtend() is now accomplished through extendTimeout() with an option to save parameters. SaveAndExtendIncrementally() has been removed as it is no longer needed because TaskBucket’s normal execution loop calls extendTimeout() periodically as long as the TaskFunc’s execute() actor has not finished or thrown. If a TaskFunc wants to save changes to task parameters to checkpoint progress, so that task restarts can benefit from it, it can call extendTimeout() explicitly with the updateParams flag set to true. 2017-11-30 17:18:57 -08:00
Stephen Atherton 1e643239f9 Improvement in blob connection reuse, oldest connections in pool are now used first. 2017-11-30 12:57:29 -08:00
Stephen Atherton 39edda1804 Bug fix, and some code cleanup along the way. If a range backup task dies in finish() the re-run of the task will start at begin == end, which wasn’t being handled correctly. 2017-11-27 15:57:19 -08:00
Stephen Atherton d9c2f6d705 Bug fix. The terminator argument of readCommitted() previously did nothing, and end_of_stream() was always sent to the output stream. The parameter was fixed to enable changing this behavior but the original behavior was not being correctly preserved in at least one case. 2017-11-26 22:52:47 -08:00
Stephen Atherton 9ce9fd8692 Added comments to describe IBackupFile contract. 2017-11-26 22:02:14 -08:00
Stephen Atherton 1d3af8f4f0 Bug fix. 2017-11-25 21:13:56 -08:00
Stephen Atherton 1b1c8e985a Merge branch 'master' into backup-container-refactor
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
2017-11-25 19:54:51 -08:00
Stephen Atherton 6695c9e6a2 Bug fixes and improvements to error handling and trace events. The most serious bug was that restore would start at the wrong version, possibly skipping early log and range files. 2017-11-25 00:46:16 -08:00
Stephen Atherton 3449bc4cdc Bug fix, range end was wrong for final range file of backup range task. 2017-11-19 04:44:33 -08:00
Stephen Atherton a31216f3f7 Added toString() to Backup/Restore TaskFunc interface so tasks can provide a method to describe important task parameters for the default handleError() methods to use. 2017-11-19 04:39:18 -08:00
Stephen Atherton 32903ffa77 Trace event improvements and severity changes. 2017-11-19 04:34:28 -08:00
Stephen Atherton 9354a8cbb4 Added new backup container method to list everything in a backup. 2017-11-19 04:28:22 -08:00
Evan Tschannen f9efdf1fc1 fix: typeString was not static, so it added a lot of memory to MutationRef 2017-11-17 23:36:09 -08:00
Alex Miller f19cb3bbbd Merge pull request #208 from cie/alexmiller/grvtfix
Fix the GRV performance regression
2017-11-17 15:00:44 -08:00
Bhaskar Muppana 1bf84cd51a Merge pull request #210 from bmuppana/backup-logs
Adding TraceEvents for BackupRangeTask.
2017-11-16 19:12:04 -08:00
Bhaskar Muppana 5e596ea670 Adding TraceEvents for BackupRangeTask. 2017-11-16 19:11:31 -08:00
Yichi Chiang d9a98aa968 Remove commented code 2017-11-16 17:25:37 -08:00
Evan Tschannen 7211a397b0 Merge pull request #209 from cie/fix-double-recoveries
Fix double recoveries
2017-11-16 17:16:39 -08:00
Yichi Chiang 0d5dc15ac8 Fix double recoveries 2017-11-16 16:58:55 -08:00
Stephen Atherton 07c19098fe Improved backup container unit test, added file reading / verification, more data, and a series of expirations and validating the expected result. Then fixed the bugs that this new testing discovered. 2017-11-16 16:19:56 -08:00
Alex Miller e9412bbb11 Fix the GRV performance regression introduced by adding the policy engine to GRV calculations.
Construction of LocalityGroup from LocalityData is expensive, and the previous
code greatly ran afoul of that.  The policy engine does a large amount of
interning of strings and building compressed maps to make the expected many
future selectReplica calls cheap.  Unfortunately we don't call selectReplicas,
so much of this work is undesirable for us, and a large amount of CPU time is
spent doing this initialization work.

The new changes aggressively do the minimal LocalityGroup::add() calls
necessary, and make them as cheap as possible by removing all elements from
LocalityData that don't need to be considered by the policy.

This optimization was also applied to the PeekCursor used during recovery,
which should speed recoveries up by a small amount.
2017-11-16 16:15:52 -08:00
Stephen Atherton f105204aca Shifted version distribution over folders. 2017-11-15 23:13:04 -08:00
Stephen Atherton cc47d0e161 Bug fix in restore dispatch, begin file was not being incremented. Removed try/catch because the inherited handleError() is better. 2017-11-15 22:38:31 -08:00
Alex Miller e900333dbf Fix a subtle valgrind error.
If there was an error in waiting for the read version, we would attempt to
serialize and eventually commit a CommitTransactionRef that had an
uninitialized read_snapshot.
2017-11-15 19:21:20 -08:00
Evan Tschannen ad456a939a Merge pull request #206 from cie/change-excluded-cluster-controller
Change excluded cluster controller
2017-11-15 17:28:33 -08:00
Yichi Chiang f96faf72d9 Add fullyRecoveredConfig for checking exclusions 2017-11-15 17:15:24 -08:00
Stephen Atherton ab0017f023 TaskBucket’s TaskFunc interface now has an optional handleError() which is called on any task that throws an error from execute() or finish(). Restore and Backup tasks use this to ensure that any errors that occur are placed in the backup or restore config’s lastError property. Bug fixes in log and range file encodings. 2017-11-15 13:33:09 -08:00