Stephen Atherton
aa5169bd3c
Removed unnecessary trace event.
2017-12-19 15:29:22 -08:00
Stephen Atherton
e28641886d
TraceEvent improvements. Minor bug fix, restore log writing tasks didn't have the log file endVersion but it's only for logging purposes.
2017-12-19 15:27:04 -08:00
Stephen Atherton
a276985baf
Bug fix, if there are range files in a restore which begin at exactly the restore version they will be repeatedly dispatched forever.
2017-12-18 17:48:18 -08:00
Stephen Atherton
005a4a0706
Restore status bug fix, during restore the apply lag would appear as a large negative number until the first restore batch is completed. Test improvement, snapshot dispatch now chooses a random number of tasks to dispatch per commit.
2017-12-18 15:56:57 -08:00
Stephen Atherton
937fa75bec
Bug fix, if target snapshot end version is at or before the begin version then no progress would be made.
2017-12-18 00:13:25 -08:00
Stephen Atherton
d32a770648
Bug fix, backup never went to differential mode once it was restorable which caused waitBackup to only return once the backup was discontinued.
2017-12-17 23:22:18 -08:00
Stephen Atherton
2b92815e8c
Bug fix. The snapshot dispatch add task retry loop was incorrectly deciding that the second and subsequent transactions of an execution were already committed and therefore skipping them, resulting in missing ranges in the snapshot.
2017-12-17 21:01:31 -08:00
Stephen Atherton
afd2603576
Refactored backup task flow and config to support ongoing snapshots and allow stopping the backup cleanly between snapshots. The previously separate tasks for initial and differential mode log dispatching have been merged into BackupLogsDispatchTask.
2017-12-17 14:29:57 -08:00
Stephen Atherton
18305ab326
Bug fixes. Added snapshotBatchSize to backupConfig to enable detecting if a transaction for adding a group of tasks to a batch had already completed. Changed KeyRangeMap usage so that each range value to be dispatched has a unique integer value, enabling more efficient range coalescing and avoiding some iterator invalidation bugs.
2017-12-15 01:39:50 -08:00
Stephen Atherton
33f9f1a95c
Added SnapshotDispatch task for writing snapshots in random order over a specified period of time and adapting speed to a growing or shrinking database. TaskBucket now supports scheduling tasks. TaskFuture now correctly recognizes multiple tasks in its callback space. TaskBucket extendTimeout() now supports specifying the new timeout version. Submitting a backup now requires a snapshot duration.
2017-12-14 01:44:38 -08:00
Stephen Atherton
47a9a7ab0e
Finished backup container discovery / listing via base URL.
2017-12-12 17:44:03 -08:00
Stephen Atherton
b6cfe010a1
Bug fix in URL encoding of delimiter.
2017-12-12 17:31:19 -08:00
Stephen Atherton
4bc7d0b86a
Updated error names and severities.
2017-12-06 15:42:44 -08:00
Stephen Atherton
d3b4a81ed0
Blobstore connection details in unit tests now come from environment variables.
2017-12-06 14:38:45 -08:00
Stephen Atherton
4068ed3554
Merge branch 'backup-container-refactor' of github.com:apple/foundationdb into backup-container-refactor
2017-12-06 14:12:26 -08:00
Stephen Atherton
ce6c49e173
Corrected a bunch of retry loops to not reset the backoff timer.
2017-12-06 14:11:40 -08:00
Balachandar Namasivayam
1f949240f5
Make fdbbackup s3 compatible.
...
s3 sends response in XML. FDB backup expects json response. Added a new library xml2json to convert xml to json.
2017-12-05 17:13:15 -08:00
Stephen Atherton
f8e89a40ac
Bug fixes, take(1) is incorrect usage of FlowLock.
2017-12-04 10:25:47 -08:00
Stephen Atherton
86ae6c09c7
Bug fixes, take(1) is incorrect usage of FlowLock.
2017-12-04 10:20:50 -08:00
Stephen Atherton
42c6f7db34
TaskBucket bug fix, caused by accidental removal of task function lookup. Added extendMutex to Task for use around transaction loops that call extendTimeout() to reduce conflicts.
2017-12-03 20:52:09 -08:00
Stephen Atherton
3a6708707f
Removed unnecessary duplicate variable.
2017-12-02 07:03:34 -08:00
Stephen Atherton
20a8aae241
Old bug fix, transaction reset() not being called in a retry loop.
2017-12-02 07:02:26 -08:00
Stephen Atherton
eadf93826d
Bug fixes with transaction options and exception handling that were causing internal errors.
2017-12-01 15:16:44 -08:00
Stephen Atherton
aeebe711ce
TaskBucket’s saveAndExtend() is now accomplished through extendTimeout() with an option to save parameters. SaveAndExtendIncrementally() has been removed as it is no longer needed because TaskBucket’s normal execution loop calls extendTimeout() periodically as long as the TaskFunc’s execute() actor has not finished or thrown. If a TaskFunc wants to save changes to task parameters, checkpointing progress so that task restarts can benefit from it, it can call extendTimeout() explicitly with the updateParams flag set to true.
2017-11-30 17:18:57 -08:00
Stephen Atherton
1e643239f9
Improvement in blob connection reuse, oldest connections in pool are now used first.
2017-11-30 12:57:29 -08:00
Stephen Atherton
39edda1804
Bug fix, and some code cleanup along the way. If a range backup task dies in finish() the re-run of the task will start at begin == end, which wasn’t being handled correctly.
2017-11-27 15:57:19 -08:00
Stephen Atherton
d9c2f6d705
Bug fix. The terminator argument of readCommitted() previously did nothing, and end_of_stream() was always sent to the output stream. The parameter was fixed to enable changing this behavior but the original behavior was not being correctly preserved in at least one case.
2017-11-26 22:52:47 -08:00
Stephen Atherton
9ce9fd8692
Added comments to describe IBackupFile contract.
2017-11-26 22:02:14 -08:00
Stephen Atherton
1d3af8f4f0
Bug fix.
2017-11-25 21:13:56 -08:00
Stephen Atherton
1b1c8e985a
Merge branch 'master' into backup-container-refactor
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
2017-11-25 19:54:51 -08:00
Stephen Atherton
6695c9e6a2
Bug fixes and improvements to error handling and trace events. The most serious bug was that restore would start at the wrong version, possibly skipping early log and range files.
2017-11-25 00:46:16 -08:00
Stephen Atherton
3449bc4cdc
Bug fix, range end was wrong for final range file of backup range task.
2017-11-19 04:44:33 -08:00
Stephen Atherton
a31216f3f7
Added toString() to Backup/Restore TaskFunc interface so tasks can provide a method to describe important task parameters for the default handleError() methods to use.
2017-11-19 04:39:18 -08:00
Stephen Atherton
32903ffa77
Trace event improvements and severity changes.
2017-11-19 04:34:28 -08:00
Stephen Atherton
9354a8cbb4
Added new backup container method to list everything in a backup.
2017-11-19 04:28:22 -08:00
Evan Tschannen
f9efdf1fc1
fix: typeString was not static, so it added a lot of memory to MutationRef
2017-11-17 23:36:09 -08:00
Alex Miller
f19cb3bbbd
Merge pull request #208 from cie/alexmiller/grvtfix
...
Fix the GRV performance regression
2017-11-17 15:00:44 -08:00
Bhaskar Muppana
1bf84cd51a
Merge pull request #210 from bmuppana/backup-logs
...
Adding TraceEvents for BackupRangeTask.
2017-11-16 19:12:04 -08:00
Bhaskar Muppana
5e596ea670
Adding TraceEvents for BackupRangeTask.
2017-11-16 19:11:31 -08:00
Yichi Chiang
d9a98aa968
Remove commented code
2017-11-16 17:25:37 -08:00
Evan Tschannen
7211a397b0
Merge pull request #209 from cie/fix-double-recoveries
...
Fix double recoveries
2017-11-16 17:16:39 -08:00
Yichi Chiang
0d5dc15ac8
Fix double recoveries
2017-11-16 16:58:55 -08:00
Stephen Atherton
07c19098fe
Improved backup container unit test: added file reading / verification, more data, and a series of expirations with validation of the expected result. Then fixed the bugs that this new testing discovered.
2017-11-16 16:19:56 -08:00
Alex Miller
e9412bbb11
Fix the GRV performance regression introduced by adding the policy engine to GRV calculations.
...
Construction of LocalityGroup from LocalityData is expensive, and the previous
code greatly ran afoul of that. The policy engine does a large amount of
interning of strings and building compressed maps to make the expected many
future selectReplica calls cheap. Unfortunately we don't call selectReplicas,
so much of this work is undesirable for us, and a large amount of CPU time is
spent doing this initialization work.
The new changes aggressively do the minimal LocalityGroup::add() calls
necessary, and make them as cheap as possible by removing all elements from
LocalityData that don't need to be considered by the policy.
This optimization was also applied to the PeekCursor used during recovery,
which should speed recoveries up by a small amount.
2017-11-16 16:15:52 -08:00
Stephen Atherton
f105204aca
Shifted version distribution over folders.
2017-11-15 23:13:04 -08:00
Stephen Atherton
cc47d0e161
Bug fix in restore dispatch, begin file was not being incremented. Removed try/catch because the inherited handleError() is better.
2017-11-15 22:38:31 -08:00
Alex Miller
e900333dbf
Fix a subtle valgrind error.
...
If there was an error in waiting for the read version, we would attempt to
serialize and eventually commit a CommitTransactionRef that had an
uninitialized read_snapshot.
2017-11-15 19:21:20 -08:00
Evan Tschannen
ad456a939a
Merge pull request #206 from cie/change-excluded-cluster-controller
...
Change excluded cluster controller
2017-11-15 17:28:33 -08:00
Yichi Chiang
f96faf72d9
Add fullyRecoveredConfig for checking exclusions
2017-11-15 17:15:24 -08:00
Stephen Atherton
ab0017f023
TaskBucket’s TaskFunc interface now has an optional handleError() which is called on any task that throws an error from execute() or finish(). Restore and Backup tasks use this to ensure that any errors that occur are placed in the backup or restore config’s lastError property. Bug fixes in log and range file encodings.
2017-11-15 13:33:09 -08:00