226 Commits

Author SHA1 Message Date
Jingyu Zhou 4f4ce93f8c Remove debug print out 2020-03-20 20:15:09 -07:00
Jingyu Zhou fe6b4a4398 Some correctness fixes 2020-03-20 20:15:08 -07:00
Jingyu Zhou 5ce9fc0e4c Partitioned logs should be filtered after sorting by tag IDs
The default sorting by begin and end version doesn't work with duplicates
removal, as tags are also compared.
2020-03-20 20:15:08 -07:00
Jingyu Zhou 20df67ee6a Filter partitioned logs with subset relationship
If a log file's progress is not saved, a new log file will be generated
with the same begin version. Then we can have a file that contains a subset
of contents in another log file. During restore, we should filter out files
whose contents are a subset of other files.
2020-03-20 20:15:08 -07:00
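
The subset filtering described in the commit above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the commit: the `LogFile` record, its field names, and `filter_subset_files()` are hypothetical, and the end version is assumed to be exclusive.

```python
from typing import List, NamedTuple


class LogFile(NamedTuple):
    # Hypothetical record; field names are assumptions for illustration.
    tag_id: int
    begin: int  # first version covered by the file
    end: int    # assumed exclusive end version
    name: str


def filter_subset_files(files: List[LogFile]) -> List[LogFile]:
    """Drop files whose version range is contained in another file of the same tag."""
    kept: List[LogFile] = []
    # Sort by tag first, then begin ascending and end descending, so that the
    # widest file for a given (tag, begin) is seen before its subsets.
    for f in sorted(files, key=lambda x: (x.tag_id, x.begin, -x.end)):
        last = kept[-1] if kept else None
        if (last is not None and last.tag_id == f.tag_id
                and last.begin <= f.begin and f.end <= last.end):
            continue  # f is a subset of the last kept file for this tag
        kept.append(f)
    return kept


files = [
    LogFile(0, 100, 200, "a"),
    LogFile(0, 100, 150, "b"),  # same begin version as "a", smaller range
    LogFile(1, 100, 150, "c"),  # different tag, so it is not a subset of "a"
    LogFile(0, 200, 300, "d"),
]
kept_names = [f.name for f in filter_subset_files(files)]
```

Sorting by tag before filtering matters here because two files with identical version ranges but different tags are not duplicates of each other.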
Jingyu Zhou 05b87cf288 Partitioned logs need to compute continuous begin Version
Because different tags may start at different versions, tag 0 can start at a
higher version. In this case, another tag's high version should be used as
the start version for continuous logs.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 1f95cba53e Add describePartitionedBackup() for parallel restore
For partitioned logs, compute the continuous log end version from the minimum
log begin version. The old backup test keeps using describeBackup() to stay
correctness clean.

Rename partitioned log file so that the last number is block size.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 938a6f358d Describe backup uses partitioned logs to find continuous end version
For partitioned logs, the continuous end version has to be done range by range,
where each range must contain continuous version for all tags.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 659843ff51 Check partitioned log files are continuous for RestoreSet
The idea of checking is to use Tag 0 to find out ranges and their number of
tags. Then for each tag 1 and above, check versions are continuous.
2020-03-20 20:13:38 -07:00
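
A simplified sketch of a per-tag continuity check is shown below. It does not reproduce the commit's approach of deriving ranges and tag counts from Tag 0; it only illustrates the idea that every tag must cover the target range without gaps. The function names and the half-open `(begin, end)` representation are assumptions.

```python
from typing import Dict, List, Tuple

VersionRange = Tuple[int, int]  # (begin, end), end assumed exclusive


def is_continuous(ranges: List[VersionRange], begin: int, end: int) -> bool:
    """Return True if the ranges of a single tag cover [begin, end) with no gap."""
    covered = begin
    for b, e in sorted(ranges):
        if b > covered:
            return False  # gap between what is covered so far and the next file
        covered = max(covered, e)
        if covered >= end:
            return True
    return covered >= end


def all_tags_continuous(files_by_tag: Dict[int, List[VersionRange]],
                        begin: int, end: int) -> bool:
    """A version range is restorable only if every tag covers it continuously."""
    return all(is_continuous(r, begin, end) for r in files_by_tag.values())


ok = all_tags_continuous({0: [(0, 20)], 1: [(0, 5), (5, 20)]}, 0, 20)
gap = all_tags_continuous({0: [(0, 20)], 1: [(0, 5), (6, 20)]}, 0, 20)
```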
Jingyu Zhou fda6c08640 Include a total number of tags in partition log file names
This is needed for BackupContainer to check partitioned mutation logs are
continuous, i.e., restorable to a version.
2020-03-20 20:13:38 -07:00
Jingyu Zhou 64859467e4 Return partitioned logs for RestorableFileSet 2020-03-20 20:13:38 -07:00
Jingyu Zhou 88ad28e576 Integrate parallel restore with partitioned logs
In parallel restore, use new getPartitionedRestoreSet() to get a set containing
partitioned mutation logs. The loader uses a new parser to extract mutations
from partitioned logs.

TODO: fix unable to restore errors.
2020-03-20 20:13:38 -07:00
Jingyu Zhou ec352c03c9 Add partitioned logs to BackupContainer 2020-03-20 20:13:38 -07:00
Evan Tschannen c11c24b79d removed the fdbrpc version of platform.h 2020-02-28 14:56:10 -08:00
Evan Tschannen 96258b9809 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbcli/fdbcli.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/FlowTransport.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistribution.actor.h
#	fdbserver/DataDistributionQueue.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/SkipList.cpp
#	fdbserver/StorageMetrics.actor.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KVStoreTest.actor.cpp
#	flow/CMakeLists.txt
#	flow/Knobs.cpp
#	flow/Knobs.h
#	flow/genericactors.actor.cpp
#	flow/serialize.h
2020-02-21 19:09:16 -08:00
Steve Atherton 3d72c2a661 BackupContainerFilesystem no longer unnecessarily depends on abspath() to find the last part of a path string, since it shouldn't touch the local filesystem in the remote case. 2020-02-18 16:35:00 -08:00
Jingyu Zhou 7c10683c77 Backup workers save logs into right containers
The mutation logs of backup workers are saved into "mlogs" directory under the
container directory. The backup worker has been restructured to handle multiple
backups, where each one is stored in a separate backup container.

In the backup worker, mutations pulled from TLogs are buffered in a message
queue. When writing out to different containers, their corresponding mutation
ranges are used to check if a mutation should be written. When a new backup
is submitted by the client, "backupStartedKey" is updated. The worker monitors
this key and updates its internal map of backups, and the next pull from TLogs
then waits until the new backup is ready. This ensures that by the time
worker 0 marks the backup as started, all workers are already logging
mutations for the backup.
2020-02-03 20:27:14 -08:00
Jingyu Zhou d8c74e7e1a Extend BackupContainer to support tagged log files
That is, the file name contains the log router tag ID as the last component,
e.g., "log,39638169,42718056,016f52a4d16ef36fd3335db9c68abfc1,1048576,1".
2020-01-22 19:38:46 -08:00
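
As an illustration of the file name layout in the commit above, the example name can be split into its comma-separated components. Only the position of the tag ID (last component) is stated by the commit; the meaning assigned here to the other fields (begin version, end version, unique ID, block size) is an assumption, and `parse_tagged_log_name()` is a hypothetical helper. Later commits in this list change the layout again, so this sketch applies to this format only.

```python
from typing import NamedTuple


class TaggedLogName(NamedTuple):
    begin_version: int  # assumed meaning of the 2nd component
    end_version: int    # assumed meaning of the 3rd component
    uid: str            # assumed to be a unique ID in hex
    block_size: int     # assumed meaning of the 5th component
    tag_id: int         # last component, per the commit message


def parse_tagged_log_name(name: str) -> TaggedLogName:
    """Parse a name such as 'log,<begin>,<end>,<uid>,<block size>,<tag id>'."""
    parts = name.split(",")
    if len(parts) != 6 or parts[0] != "log":
        raise ValueError(f"not a tagged log file name: {name!r}")
    _, begin, end, uid, block_size, tag_id = parts
    return TaggedLogName(int(begin), int(end), uid, int(block_size), int(tag_id))


parsed = parse_tagged_log_name(
    "log,39638169,42718056,016f52a4d16ef36fd3335db9c68abfc1,1048576,1")
```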
Jingyu Zhou f21d7ca44c Add tag ID to backup log file names 2020-01-22 19:38:46 -08:00
Jingyu Zhou dafcaee844 Fix compiler errors. 2020-01-22 19:38:45 -08:00
Jingyu Zhou c7f51782b8 Use override for virtual functions. 2020-01-22 19:38:45 -08:00
Steve Atherton 9a031bfc47 Function was renamed. 2019-12-11 11:00:12 -08:00
Stephen Atherton 09e8d804e8 Added BlobStoreEndpoint::listBuckets(), renamed listBucket() and several related functions with similar names to listObjects() to avoid confusion and more closely match what they actually do. Added a bytesDeleted output statistic to BlobStoreEndpoint::deleteRecursively. 2019-12-06 00:14:13 -08:00
Meng Xu 3b54363780 FastRestore:Apply Clang-format 2019-08-01 18:09:12 -07:00
Meng Xu 45083edf74 Merge branch 'master' into mengxu/performant-restore-PR
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
Meng Xu 477fd152c0 FastRestore:Refactor code
1) Use the runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition in loop-choose-when for
   the core routine of restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
   the file only has functionalities related to restore worker.

Passed correctness test
2019-06-04 11:22:47 -07:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
Austin Seipp b5cbffc1b8 fdbclient: fix some print/scan format warnings
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Meng Xu 529ce66b6c Merge branch 'apple/master' into mengxu/performant-restore-PR 2019-04-18 18:02:45 -07:00
mpilman 1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
Meng Xu 70d7c289f4 Merge branch 'master' into mengxu/restore/parallel-v7 2019-03-30 22:13:10 -07:00
Meng Xu 589fb76826 FastRestore:Attempt to fix old restore 2019-03-30 15:19:30 -07:00
Meng Xu 5e9a6edfe6 FastRestore:bug fix: Lock DB successfully 2019-03-29 13:31:38 -07:00
Stephen Atherton cabe7ca844 Stopped using %z to parse timezone offset with strptime() because it only seems to work as expected on MacOS. Updated time input/output unit tests so that they don't assume what the local timezone is. 2019-03-21 19:38:07 -07:00
Stephen Atherton d5e50e6963 Rewrote BackupAgentBase::parseTime() to use std::get_time() so it compiles on all supported platforms. Added unit test for parseTime() and formatTime(). 2019-03-20 01:18:37 -07:00
Stephen Atherton c6edcc7f06 Added schema version string to backup JSON status docs. Bug fix in backup status JSON, the document was being created outside the transaction retry loop so retries would combine partial element sets across all tries into the result. 2019-03-14 02:10:14 -07:00
Steve Atherton 8aab719c22 Merge branch 'master' into feature-backup-json 2019-03-12 18:23:16 -07:00
Stephen Atherton bc0b2aa040 Merge branch 'release-6.0' of https://github.com/apple/foundationdb
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/BlobStore.actor.cpp
2019-03-12 04:49:12 -07:00
Stephen Atherton 023bbb566f Renamed backup state enums for clarity, added backup state names. Changed Epochs to EpochSeconds in backup JSON along with some other renaming/moving of fields, and added information about snapshot dispatch. Changed timestamp format for input/output in all backup/restore contexts to be a fully qualified time with timezone offset. Added information about the last snapshot dispatch to backup config and status (not yet populated). 2019-03-10 16:00:01 -07:00
Stephen Atherton 06c11a316d Normalized timestamp to text format across backup and restore tooling. Added epochs field to JSON objects describing versions and timestamps in backup status and describe output, renamed some fields for clarity. 2019-03-06 22:34:25 -08:00
Stephen Atherton ca8bbad657 Added --json option to fdbbackup describe. Also added expired percentage indicator to snapshot details. 2019-03-06 14:14:06 -08:00
Stephen Atherton 87d0c5bae0 Bug/usability fix: The output URLs of fdbbackup list were meant to be directly usable for backup management operations but they were missing the bucket URL parameter. 2019-03-05 21:14:21 -08:00
Stephen Atherton d3377722d5 Added blob store Backup URL parameter 'header' which enables addition of custom HTTP header fields to blob store HTTP requests. Added 'fdbbackup modify' command line tool for changing the backup URL and parameters, default snapshot interval, and/or current snapshot interval of a running backup. 2019-03-05 04:00:11 -08:00
Stephen Atherton 7d287c6999 Merge branch 'release-6.0'
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
2019-02-28 14:01:00 -08:00
Stephen Atherton 887856b6b0 Bug fix in AsyncFileReadAhead where a file size that is an integer multiple of the read chunk size will cause a crash when reading the file's final block. BackupContainerLocalDirectory now uses AsyncFileReadAhead in simulation to get simulation coverage of that class, and FileBackup will generate file sizes which expose the bug. 2019-02-28 00:22:38 -08:00
mpilman 479a4d7c22 Minor fixes in fdbclient for intellisense 2019-02-19 15:16:59 -08:00
Andrew Noyes 067a445e06 Replace unused _ variables with wait(success(...)) 2019-02-12 17:30:30 -08:00
Alex Miller 0750dc0418 Change store from (Future, T&) to (T&, Future).
LHS = RHS, and the name of what's being modified is easier to find.
2019-02-04 18:04:22 -08:00
Meng Xu 9f5e06099f Circus test: get performance result from circus
A worker may die which prevents the restore from finishing.
The restore speed is only 30MB per second, which needs improvement.
2019-02-01 14:09:40 -08:00
Meng Xu 550f2e2682 Merge with master to use the latest backup container
In fdb 6.0.15, backup container is changed on how to organize the backup data.
The backup made by fdb >6.0.15 has to be restored with fdb > 6.0.15.
Merge with master so that the fast restore uses fdb > 6.0.15
2019-01-30 12:05:15 -08:00
Meng Xu 2e11b38f3f Add print in fast restore agent about backup info 2019-01-30 11:18:11 -08:00
Evan Tschannen 684a22a52b Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/HTTP.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/BackupCorrectness.actor.cpp
#	versions.target
2019-01-09 16:14:46 -08:00
Stephen Atherton bb2de3479b Updated unit test to new backup container behavior. 2019-01-08 16:29:00 -08:00
Stephen Atherton 2951369006 When starting a restore, if range files are missing their names will be logged before an error is thrown. The error thrown is now restore_missing_data instead of restore_corrupted_data. 2019-01-08 16:28:40 -08:00
Stephen Atherton 928876be81 Backup dumpFileList() can now take a version range. 2018-12-21 22:42:29 -08:00
Stephen Atherton f64d5321e9 Backup describe, expire, and delete will now clearly indicate when there is no backup at the given URL. 2018-12-20 18:05:23 -08:00
Stephen Atherton 354abebf64 Added progress reporting to backup expiration. Simplified backup delete progress. 2018-12-20 00:23:26 -08:00
Stephen Atherton fcb34a8768 In backup describe, when not using the original cluster to resolve versions to date/time strings the versions will be converted to approximate day deltas from maxLogEndVersion (if available) using core_versionspersecond. 2018-12-19 16:53:39 -08:00
Stephen Atherton 00568f4011 Error code was misleading, added comment. 2018-12-19 13:14:48 -08:00
Stephen Atherton fd4a62fbfd Backup expire will fail earlier if force option is needed but not specified. 2018-12-19 10:36:25 -08:00
Stephen Atherton 172c3f2021 Bug fix in backup describe, do not update log begin/end metadata in backup if describe was given a start version override as it can result in incorrect metadata. 2018-12-19 10:35:06 -08:00
Stephen Atherton afa243bc97 Added fdbbackup expire options to calculate approximate version boundaries based on a number of days prior to the latest log file found in the backup container. This enables expiration operations based on time (with reasonable precision) without accessing the source cluster. 2018-12-18 18:55:44 -08:00
Stephen Atherton 69d847dbbd Backup describe can now analyze a backup assuming log data starts at a given version. This is used by expire both as an optimization and because doing so enables expire to repair a bad metadata state that can result from a cancelled or failed expire operation from fdbbackup <= 6.0.17. ListLogFiles() and ListRangeFiles() no longer sort results as in most cases the sort is not required and the result set can be very large. Describe will now update minLogBegin metadata to the beginning of the log range known to be present when possible. Several serial log and range list pairings are now done in parallel. 2018-12-18 04:33:37 -08:00
Stephen Atherton 9ef9041fba Bug fix, metadata read futures were moved to a vector but then not waited on. 2018-12-17 13:13:35 -08:00
Stephen Atherton dac1827d23 Backup describe's "deep scan" mode should only ignore log begin/end versions, not expire and unreliable end versions. 2018-12-16 00:41:38 -08:00
Stephen Atherton 5951e9d577 Added backup URL and exceptions to trace events where applicable. 2018-12-16 00:33:30 -08:00
Stephen Atherton 223b19f5ba Rewrote backup metadata scheme to fix several design flaws involving cancelled or concurrent operations. Most importantly, before an expire deletes any data it marks the end of the range being deleted as 'unreliable' so further reads of the backup will treat versions before that point as having data holes. 2018-12-16 00:18:13 -08:00
Evan Tschannen 1f3b6e8bdf Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/BlobStore.actor.cpp
#	versions.target
2018-11-27 14:41:46 -08:00
A.J. Beamon 975711c389 Merge branch 'release-6.0' of github.com:apple/foundationdb
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2018-11-27 09:50:39 -08:00
Stephen Atherton 3d68d6b994 Bug fix, clarified a comment. 2018-11-24 18:41:39 -08:00
Stephen Atherton 0610a19e4d Rewrote backup container unit test to use randomness to cover a wider variety of data patterns and edge cases. 2018-11-24 17:24:54 -08:00
Stephen Atherton aa648daabf Compile fix for linux, can't make a move iterator from a const container reference. 2018-11-23 12:49:10 -08:00
Stephen Atherton ec9410492d Changed backup folder scheme to use a much smaller number of unique folders for kv range files, which will speed up list and expire operations. The old scheme is still readable but will no longer be written except in simulation in order to test backward compatibility. 2018-11-23 05:23:56 -08:00
Stephen Atherton 1f2223fcf5 Bug fix in backup expiration. After the range file scan, it was being asserted that the range files found have a version < expiration version but this isn't guaranteed because the expiration version is likely shifted backward a bit after the file scan based on the log files found. 2018-11-15 02:15:25 -08:00
Evan Tschannen e45952bc53 Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/BlobStore.actor.cpp
#	fdbclient/HTTP.actor.cpp
#	tests/BlobStore.txt
#	versions.target
2018-11-13 16:06:39 -08:00
Stephen Atherton 3125b807b3 Added documentation of the 'bucket' URL parameter for blobstore:// backup URLs. 2018-11-13 06:23:58 -08:00
Stephen Atherton 983bd3346a BackupContainerBlobStore no longer uses a hardcoded bucket name. BlobStoreEndpoint creating from a URL string now supports having additional parameters in the URL which it does not consume but rather returns to the caller, and BackupContainerBlobStore uses this to accept a "bucket" parameter. 2018-11-13 03:00:59 -08:00
Evan Tschannen 4b5d0b4e2c Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/AsyncFileBlobStore.actor.cpp
#	fdbclient/AsyncFileBlobStore.actor.h
#	fdbclient/BlobStore.actor.cpp
#	fdbclient/BlobStore.h
#	fdbclient/HTTP.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/LoadBalance.actor.h
#	fdbrpc/batcher.actor.h
#	fdbrpc/fdbrpc.vcxproj
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
A.J. Beamon 776b289bfe Move AsyncFileBlobStore and related files to fdbclient. 2018-10-26 13:49:42 -07:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Stephen Atherton 22f8a4efa9 Normalized all unit test names to begin with "/" if they should be included in random unit testing. 2018-10-05 22:09:58 -07:00
Evan Tschannen 3922e477a5 Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/LogSystemDiskQueueAdapter.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
2018-10-03 16:57:18 -07:00
Evan Tschannen 3f86905ea7 fix: restore did not take into account that the end version of a log file does not exist in that file. This resulted in restores done at the same version a snapshot completes to not apply the mutations at that final version. 2018-09-21 11:48:28 -07:00
Bhaskar Muppana 920fd3fe97 Merge branch 'release-6.0' 2018-09-06 14:24:02 -07:00
Stephen Atherton cfce27b0f4 Timestamp to Version lookup using Timekeeper data was not lock aware. 2018-09-05 16:16:22 -07:00
Alex Miller 535b5701e5 Rewrite all `Void _ = wait(...)` -> `wait(...)`.
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen 1c29275672 call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details. 2018-08-01 14:30:57 -07:00
A.J. Beamon e5488419cc Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.

Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.

This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Stephen Atherton 9c901983f0 Clarity improvement, resetting backup description variable because it's no longer valid due to some of its contents being std::move'd. 2018-03-09 12:03:10 -08:00
Stephen Atherton 3a7288924a Bug fixes. During expiration, the backup container's log range metadata could be updated incorrectly if force was required and not specified or if a backup had no log begin metadata and an expire was done which covered 1 or more log file. In the latter case a backup could be left in a state where the container metadata suggests the backup has more log coverage than it actually does. 2018-03-09 11:29:23 -08:00
Stephen Atherton cb68885328 If backup expiration determines that force is required but the force parameter is not set, it will no longer throw an error unless the backup contains data from prior to the expire_before_version. 2018-03-08 11:27:15 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Stephen Atherton d8879dc3f3 HTTP::doRequest() now reads responses in parallel with sending requests, so if the server responds before receiving all of the request the client can stop sending the remainder of the request. For PUT requests which upload files, this prevents sending potentially several megabytes of unnecessary bytes if the server responds with an error (such as 429) before the request is completely sent. Updated the backup container unit test to use more parallelism in order to test this new behavior. 2018-02-07 10:38:31 -08:00
Stephen Atherton 2f291d8955 Bug fix in blob backup container deletion. The list/delete loop could end before deleting all of the files, but the index entry would still be deleted. Also preemptively made the same code change in listBucket() - Although it is technically correct as written it is a dangerous style because it is not obvious that the addition of a wait() call in the second 'when' block would create a bug. Consolidated deleteContainer() and deleteBucket() as they differ by only 1 line. 2018-01-29 00:32:41 -08:00
Stephen Atherton 83409fb067 Bug fix, versionFolderString() was not reducing the precision of the number in the output string. Not technically a 'bug' as the scheme will still work but produces an overly deep and sparse folder structure. 2018-01-24 10:29:37 -08:00
Stephen Atherton 40d38880fe Changed version-based folder naming scheme to something simpler, a fixed width 0-padded 19 digit number (the longest a Version can be) with /'s inserted to limit the size of each folder level. Comparisons using these folder names ignore the /'s so any future change to the splitting scheme would still be compatible with the current listing/reading logic. 2018-01-23 15:02:15 -08:00
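
The folder naming scheme in the commit above can be illustrated with a short sketch. The 19-digit zero padding and the slash-insensitive comparison come from the commit message; the split positions (a slash after every `group` digits) and both function names are hypothetical, since the actual splitting is not given here.

```python
def padded_version_path(version: int, group: int = 4) -> str:
    """Format a version as a 19-digit zero-padded number split by '/'.

    The grouping is an assumption for illustration only.
    """
    if version < 0:
        raise ValueError("version must be non-negative")
    digits = f"{version:019d}"
    return "/".join(digits[i:i + group] for i in range(0, len(digits), group))


def comparison_key(path: str) -> str:
    """Ignore the slashes so that ordering depends only on the padded digits."""
    return path.replace("/", "")


example = padded_version_path(42)
same_key = comparison_key(padded_version_path(42, group=4)) == \
    comparison_key(padded_version_path(42, group=6))
```

Because the digits are fixed width, the lexical order of the keys matches the numeric order of the versions, and a different split scheme yields the same key.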
Stephen Atherton 7db7a51440 Changes to backup folder structure in BackupContainerBlobStore at the top level inside the backup bucket. All data files now live under data/<backup_name> and there is an 'index' at backups/<backup_name> which indicates what backups exist. Backup names can now contain '/' characters. 2018-01-23 11:46:16 -08:00
Stephen Atherton 7f0b7311b9 Corrected function name to timeKeeperVersionFromDatetime(). 'Fdbbackup expire' now allows an expire_before version of 0 if explicitly passed by version or by timestamp. 2018-01-23 00:19:51 -08:00
Stephen Atherton 51a1bd9327 Timekeeper lookup improvements, moved both function declarations to BackupContainer.h. VersionFromEpochs() now uses versions/sec to adjust the lookup result to improve accuracy. Conversions in both directions look for the latest record less than the target conversion value, but failing that they will now fall back on any available data point and adjust from there using versions/sec. 2018-01-22 23:57:01 -08:00
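
The versions/sec adjustment mentioned in the commit above amounts to a linear extrapolation from a known (epoch, version) data point. The sketch below is a hypothetical illustration, not the commit's code: the function name, the default rate of 1,000,000 versions per second, and the clamping at version 0 are all assumptions.

```python
def estimate_version(target_epoch: float,
                     record_epoch: float,
                     record_version: int,
                     versions_per_second: int = 1_000_000) -> int:
    """Extrapolate a version for target_epoch from one (epoch, version) record."""
    delta_seconds = target_epoch - record_epoch
    estimate = record_version + int(delta_seconds * versions_per_second)
    return max(0, estimate)  # assumed clamp: never return a negative version


later = estimate_version(110, 100, 5_000_000)    # 10 seconds after the record
earlier = estimate_version(90, 100, 5_000_000)   # 10 seconds before the record
```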
Stephen Atherton f086ba9d9d Improved version to timestamp lookup - if there are no older versioned records in the database then the next available record, if any, will be used to calculate a result. 2018-01-22 22:47:57 -08:00
Stephen Atherton b6dd06d945 Bug fix in version to timestamp conversion. 2018-01-18 02:54:12 -08:00
Stephen Atherton 307e04c0ad Updated backup container unit test to match new safer behavior of expireData(). Rewrote BackupContainerLocalDirectory::deleteContainer() to actually delete the whole directory but only if it appears to be a backup with either log or snapshot data. 2018-01-18 00:36:28 -08:00
Stephen Atherton cdd1e784dc Added yields to writing backup snapshot manifests to avoid slow tasks. 2018-01-17 13:28:56 -08:00
Stephen Atherton f6f0816bc1 Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1 2018-01-17 12:12:12 -08:00
Stephen Atherton 8fece71662 Bug fix in backup metadata handling if logEnd becomes less than logBegin, which can happen if an expire is done without logEnd being updated. 2018-01-17 12:12:04 -08:00
Stephen Atherton d7f8fe218a Bug fix, resolving versions to timestamps for use in backup descriptions did not work on a locked database. Added some trace events to AtomicRestore to show progress. 2018-01-17 12:03:19 -08:00
A.J. Beamon 4bfbdbf454 Extract getLocalTime to platform.cpp 2018-01-17 11:35:34 -08:00
Stephen Atherton a6fc30209e Compile fix on windows, localtime_r is not available there. 2018-01-17 09:55:25 -08:00
Stephen Atherton 93b34a945f Major usability and performance improvements to backup management. Backup descriptions now calculate and display timestamps using TimeKeeper data (if given a cluster) and restorability of snapshots. Expire now requires a --force option to leave a backup unrestorable or unrestorable after a given point in time, specified by version or timestamp. BackupContainerFilesystem now maintains metadata on key version boundaries in order to avoid large list operations for describe and expire operations. Blob parallel recursive list operations can now take a path (aka prefix) filter function. New describe and expire options are available in fdbbackup. 2018-01-17 04:09:43 -08:00
Stephen Atherton 96cb06cbc7 Bug fixes. Fdbbackup delete was broken. Blobstore backup container deletion would do too much listing before deletions began due to list operations queueing up ahead of and starving the delete operations. Created new knob and blob endpoint limit for concurrent list operations to fix this. Increased blob request timeout default because some requests were taking longer. Crash fixes in blobstore doRequest() which wasn't checking that response object is valid before using it in error conditions. Filesystem-like backup container class (covering blobstore and local dirs) now ignores unrecognized filenames for describe() and expire() operations. 2018-01-05 23:06:39 -08:00
Stephen Atherton 96c479dc71 Rare bug fix. It turns out that backup log files must be written with unique names, otherwise a re-written >1 block log file overwritten after a restore has begun could read some blocks from before the rewrite and some blocks after, but due to random content ordering this would be incorrect and produce a corrupt restore. This bug is very rare because restore would detect an error unless the rewritten log file has exactly the same size as the original file, but this is unlikely because the random content order affects block padding and therefore usable content bytes per block. 2018-01-02 23:38:01 -08:00
Stephen Atherton 371dee70e6 Improved backup folder structure to be shallower but spread files more uniformly and make each folder's entries lexically sort into version order regardless of numeric length. Improved backup container test to use a random version multiplier on the file versions created in order to test a wide range of versioned folder paths. 2018-01-02 23:22:35 -08:00
Stephen Atherton f2524ffd33 AsyncFileBlobStoreWrite was prohibiting the writing of 0-byte files. Improved HTTP verbose logging to stdout. Added writing a 0-byte file to BackupContainer unit test. Added backup log and snapshot sizes to backup description. 2017-12-21 21:15:26 -08:00
Evan Tschannen 95b502e1d7 fix: we did not restore to the target version in all cases 2017-12-21 14:11:44 -08:00
Stephen Atherton e3aee45a74 Backup tools and agent now accept blob account credentials via files containing JSON which are specified using command line arguments and/or an environment variable. Improved fdbbackup help, clarifying which options are for which operations. Fdbbackup operations which do not need to use a database no longer require a cluster file parameter. Added eat() commands to StringRef for incrementally tokenizing strings using separator strings. 2017-12-21 01:58:15 -08:00
Evan Tschannen c51de3bb88 fixed windows compile issues 2017-12-20 13:48:31 -08:00
Stephen Atherton c1958b335a Compile fix on windows, can't access protected parent class member from static function, apparently. 2017-12-20 12:13:25 -08:00
Stephen Atherton 47a9a7ab0e Finished backup container discovery / listing via base URL. 2017-12-12 17:44:03 -08:00
Stephen Atherton d3b4a81ed0 Blobstore connection details in unit tests now come from environment variables. 2017-12-06 14:38:45 -08:00
Balachandar Namasivayam 1f949240f5 Make fdbbackup s3 compatible.
S3 sends responses in XML, but FDB backup expects JSON responses. Added a new library, xml2json, to convert XML to JSON.
2017-12-05 17:13:15 -08:00
Stephen Atherton 6695c9e6a2 Bug fixes and improvements to error handling and trace events. The most serious bug was that restore would start at the wrong version, possibly skipping early log and range files. 2017-11-25 00:46:16 -08:00
Stephen Atherton 9354a8cbb4 Added new backup container method to list everything in a backup. 2017-11-19 04:28:22 -08:00
Stephen Atherton 07c19098fe Improved backup container unit test, added file reading / verification, more data, and a series of expirations and validating the expected result. Then fixed the bugs that this new testing discovered. 2017-11-16 16:19:56 -08:00
Stephen Atherton f105204aca Shifted version distribution over folders. 2017-11-15 23:13:04 -08:00
Stephen Atherton ab0017f023 TaskBucket’s TaskFunc interface now has an optional handleError() which is called on any task that throws an error from execute() or finish(). Restore and Backup tasks use this to ensure that any errors that occur are placed in the backup or restore config’s lastError property. Bug fixes in log and range file encodings. 2017-11-15 13:33:09 -08:00
Stephen Atherton 3dfaf13b67 IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored are now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius as the previous backup container was just a thin wrapper that presented a single level list of files and offered no methods for managing or interpreting the file structure so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse.
Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup.  The addition of IBackupFile and its finish() method simplified the log and range writer tasks.  Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes.  Added KeyBackedSet<T> type.  Moved JSONDoc to its own header.  Added platform::findFilesRecursively().

Still to do:  update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.
2017-11-14 23:33:17 -08:00
FDB Dev Team a674cb4ef4 Initial repository commit 2017-05-25 13:48:44 -07:00