Stephen Atherton
69425a303b
Improved error handling for cases where blob account credentials are not found in the provided credentials sources and/or some of the provided credentials sources are not readable or parseable.
2018-02-07 21:50:43 -08:00
Stephen Atherton
f8522248cb
Blob credentials files were being opened in read-write mode despite the read-only option being specified because the underlying caching layer always opens files for read/write access. For now, disabled caching for this file.
2018-02-07 16:25:16 -08:00
Stephen Atherton
2f291d8955
Bug fix in blob backup container deletion. The list/delete loop could end before deleting all of the files, but the index entry would still be deleted. Also preemptively made the same code change in listBucket(). Although it is technically correct as written, it is a dangerous style because it is not obvious that the addition of a wait() call in the second 'when' block would create a bug. Consolidated deleteContainer() and deleteBucket() as they differ by only 1 line.
2018-01-29 00:32:41 -08:00
Stephen Atherton
9fd2a8df3d
Tweaked a trace event suppression time.
2018-01-24 19:08:24 -08:00
A.J. Beamon
19ed388c0e
Merge branch 'release-5.0' into release-5.1
# Conflicts:
# documentation/sphinx/source/downloads.rst
# documentation/sphinx/source/release-notes.rst
# versions.target
2018-01-24 14:43:41 -08:00
Stephen Atherton
7f18d59dfe
Bug fix, the blob request attempt count is now incremented for all errors except response code 429.
2018-01-24 01:15:01 -08:00
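
The retry behavior described in the entry above, together with the other entries in this log that name 429, 502, and 503 as retryable, can be illustrated with a minimal sketch. This is not the BlobStoreEndpoint code: `run_request`, `send`, `max_attempts`, and `RETRYABLE_STATUS` are hypothetical names, backoff delays are omitted, and the set of retryable codes is only what the log messages mention.

```python
# Hypothetical sketch of a request loop; not the actual blob client code.
RETRYABLE_STATUS = {429, 502, 503}  # codes this log describes as retryable


def run_request(send, max_attempts=3):
    """Call send() until it succeeds, hits a non-retryable error,
    or the counted attempts reach max_attempts.

    send is an assumed callable that returns an HTTP status code.
    Returns (final_status, counted_attempts).
    """
    attempts = 0
    while True:
        status = send()
        if status < 400:
            return status, attempts  # success, nothing more to count
        if status != 429:
            attempts += 1  # every error except 429 consumes an attempt
        if status not in RETRYABLE_STATUS or attempts >= max_attempts:
            return status, attempts


# Two throttling responses do not use up the attempt budget.
responses = iter([429, 429, 503, 200])
final_status, counted = run_request(lambda: next(responses))
```

In this sketch a server that answers 429 forever would be retried forever; a real loop would also back off between requests.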
Stephen Atherton
a2481343ec
Bug fix, HTTP error code 429 was not being considered retryable in blob client (this was previously fixed but apparently reintroduced).
2018-01-24 00:22:11 -08:00
Stephen Atherton
66de9d392b
New error code, http_auth_failed, which is used when blob authentication fails instead of the previous generic http_request_failed.
2018-01-22 14:58:56 -08:00
Stephen Atherton
93b34a945f
Major usability and performance improvements to backup management. Backup descriptions now calculate and display timestamps using TimeKeeper data (if given a cluster) and restorability of snapshots. Expire now requires a --force option if it would leave a backup unrestorable, or unrestorable after a given point in time specified by version or timestamp. BackupContainerFilesystem now maintains metadata on key version boundaries in order to avoid large list operations for describe and expire operations. Blob parallel recursive list operations can now take a path (aka prefix) filter function. New describe and expire options are available in fdbbackup.
2018-01-17 04:09:43 -08:00
Stephen Atherton
0e7d538c94
Bug fix, in recursive blob folder listings the recent removal of common prefixes from the result stream caused the list marker to not be set correctly when a folder level requires multiple requests due to folder size.
2018-01-06 20:58:48 -08:00
Stephen Atherton
96cb06cbc7
Bug fixes. Fdbbackup delete was broken. Blobstore backup container deletion would do too much listing before deletions began due to list operations queueing up ahead of and starving the delete operations. Created new knob and blob endpoint limit for concurrent list operations to fix this. Increased blob request timeout default because some requests were taking longer. Crash fixes in blobstore doRequest() which wasn't checking that response object is valid before using it in error conditions. Filesystem-like backup container class (covering blobstore and local dirs) now ignores unrecognized filenames for describe() and expire() operations.
2018-01-05 23:06:39 -08:00
Stephen Atherton
78430425e8
Blob bucket listings will now use parallel recursive requests on CommonPrefixes, up to a max depth, if a delimiter is provided.
2018-01-02 23:17:52 -08:00
Stephen Atherton
07fde9dfb4
Bug fix, error code 429 was not being treated as retryable in the recent refactor.
2018-01-02 23:15:25 -08:00
Stephen Atherton
f324afc13f
Bug fix in blob store listing when it requires multiple serial requests. Added more trace events to FileBackup and BlobStoreEndpoint with suppression and added suppression to existing trace events.
2017-12-22 17:08:25 -08:00
Stephen Atherton
f2524ffd33
AsyncFileBlobStoreWrite was prohibiting the writing of 0-byte files. Improved HTTP verbose logging to stdout. Added writing a 0-byte file to BackupContainer unit test. Added backup log and snapshot sizes to backup description.
2017-12-21 21:15:26 -08:00
Stephen Atherton
e0ef5a9a20
Whitespace normalization.
2017-12-21 12:07:29 -08:00
Stephen Atherton
e3aee45a74
Backup tools and agent now accept blob account credentials via files containing JSON which are specified using command line arguments and/or an environment variable. Improved fdbbackup help, clarifying which options are for which operations. Fdbbackup operations which do not need to use a database no longer require a cluster file parameter. Added eat() commands to StringRef for incrementally tokenizing strings using separator strings.
2017-12-21 01:58:15 -08:00
Stephen Atherton
b6cfe010a1
Bug fix in URL encoding of delimiter.
2017-12-12 17:31:19 -08:00
Stephen Atherton
41f80bf7ed
Renamed an error, changed blob request failure to Warn severity.
2017-12-06 15:58:54 -08:00
Stephen Atherton
4bc7d0b86a
Updated error names and severities.
2017-12-06 15:42:44 -08:00
Balachandar Namasivayam
1f949240f5
Make fdbbackup s3 compatible.
s3 sends response in XML. FDB backup expects json response. Added a new library xml2json to convert xml to json.
2017-12-05 17:13:15 -08:00
Stephen Atherton
86ae6c09c7
Bug fixes, take(1) is incorrect usage of FlowLock.
2017-12-04 10:20:50 -08:00
Stephen Atherton
1e643239f9
Improvement in blob connection reuse, oldest connections in pool are now used first.
2017-11-30 12:57:29 -08:00
Stephen Atherton
a77162b53d
Merge branch 'master' into backup-container-refactor
# Conflicts:
# fdbclient/BackupAgent.h
# fdbclient/FileBackupAgent.actor.cpp
# fdbclient/KeyBackedTypes.h
2017-11-15 08:14:47 -08:00
Stephen Atherton
3dfaf13b67
IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored are now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi-level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius as the previous backup container was just a thin wrapper that presented a single-level list of files and offered no methods for managing or interpreting the file structure, so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse.
Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup. The addition of IBackupFile and its finish() method simplified the log and range writer tasks. Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes. Added KeyBackedSet<T> type. Moved JSONDoc to its own header. Added platform::findFilesRecursively().
Still to do: update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.
2017-11-14 23:33:17 -08:00
John Brownlee
d46e240de2
Merge branch 'release-5.0'
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
# versions.target
2017-11-02 10:42:30 -07:00
Stephen Atherton
f050105243
Added HTTP 502 to the list of retryable errors.
2017-11-01 11:41:32 -07:00
Stephen Atherton
45fa3680fa
Restore logging of remote address (if connected) or host (if connection fails) for blob errors.
2017-10-20 21:47:23 -07:00
Stephen Atherton
3afc85881e
Merge branch 'master' into backup-container-refactor
# Conflicts:
# fdbrpc/BlobStore.actor.cpp
2017-10-20 21:38:28 -07:00
Stephen Atherton
42955012e9
Merge branch 'release-5.0'
# Conflicts:
# fdbrpc/BlobStore.actor.cpp
# flow/error_definitions.h
2017-10-20 21:16:55 -07:00
Stephen Atherton
9f151314b3
Changed some trace event severities. Also fixed a weird casing of “retryable”.
2017-10-19 17:47:42 -07:00
Stephen Atherton
caad691ae2
Added comments for how to handle HTTP 400 errors gracefully in certain instances should the need arise.
2017-10-18 23:47:59 -07:00
Stephen Atherton
ef84e52127
Improved error handling and memory usage in AsyncFileBlobStoreWrite. Writes will now fail if any upload has already failed, rather than buffering unboundedly until sync() is called to complete the file. There is also a configurable limit on how many uploads can be pending before writes will stall waiting for one to finish.
2017-10-18 05:51:30 -07:00
Stephen Atherton
ebd0234514
Rewrote most error handling in BlobStoreEndpoint to fix several shortcomings in error handling and logging. The request loop now logs but rate limits all errors, and the exceptions thrown are more appropriate. HTTP 503 is now treated as retryable. Callers of BlobStoreEndpoint::doRequest() now specify which codes they consider to be successful so that more error handling can take place in the main request loop.
2017-10-18 02:52:09 -07:00
Stephen Atherton
e934604f67
Added DNS resolution. Interface is INetworkConnections::resolveTCPEndpoint() to resolve, or for convenience INetworkConnections::connect(host, service) will resolve host and service (port number or service name like http) and connect to one of the addresses at random.
BlobStoreEndpoint now only accepts hostnames and an optional service, so this update is not compatible with the previous URL formats having many IP addresses.
2017-10-15 21:51:11 -07:00
Stephen Atherton
fd5fe3a000
Add slightly better handling of HTTP 503 in blob client. Previously it would end the blob request loop and the task doing the blob action would see a failure, but now the blob request attempt loop will continue to back off and retry. This is better because previously the task that saw the failure would be re-run quickly.
2017-10-03 15:25:49 -07:00
Stephen Atherton
03c4cea511
Added rate-controlled TraceEvents for blob http connection attempts and failures.
2017-10-03 15:21:40 -07:00
Stephen Atherton
058300be16
Each blobstore request will again select a random remote address. This used to happen before recent load balancing improvements related to focusing too much load on consistently up endpoints after others have recovered from being down.
2017-10-01 16:17:38 -07:00
Stephen Atherton
a95107417f
Improved behavior of slow writes during backup. KeyRange and Log backup tasks now use TaskBucket::saveAndExtend() to keep the task alive until flushing the file finishes or fails with an error (blob uploads fail after a limited number of retries). This prevents blob uploads from being retried too often if the destination is slow since a task abort and retry would start the backoff counters back at zero. Also removed a debugging behavior that was accidentally checked in.
2017-10-01 16:01:24 -07:00
Stephen Atherton
a098919b20
Bug fix, releaser declared in wrong place, and lots of whitespace cleanup from try blocks that were no longer needed.
2017-10-01 11:25:50 -07:00
Stephen Atherton
af87ac301d
Removed a wait on Never, used for debugging, which was accidentally included in the bug fix.
2017-10-01 11:19:38 -07:00
Stephen Atherton
6000cafde1
Bug fix, locks were being taken inside try/catch so release would be done even if the take threw an error. Changed to using a Releaser.
2017-10-01 10:46:55 -07:00
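
The bug class described in the entry above (releasing a lock in an error path even when the take itself failed) can be shown with a minimal sketch. It is not the Flow FlowLock or its Releaser: `PermitPool` and this `Releaser` class are hypothetical stand-ins, and the take here raises immediately instead of waiting.

```python
# Hypothetical stand-ins; not the Flow FlowLock API.
class PermitPool:
    """A trivial counting lock whose take() raises when it cannot be satisfied."""

    def __init__(self, permits):
        self.available = permits

    def take(self, n):
        if n > self.available:
            raise RuntimeError("not enough permits")
        self.available -= n

    def release(self, n):
        self.available += n


class Releaser:
    """Gives back exactly what was taken, and only if the take succeeded."""

    def __init__(self, pool, n):
        pool.take(n)  # if this raises, no Releaser exists and nothing is released
        self.pool = pool
        self.n = n

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.pool.release(self.n)
        return False  # do not swallow exceptions from the guarded block


pool = PermitPool(2)
with Releaser(pool, 1):
    during = pool.available  # one permit is held inside the block
after = pool.available       # permit returned on exit

try:
    with Releaser(pool, 5):  # the take fails, so no release may happen
        pass
except RuntimeError:
    pass
after_failed_take = pool.available
```

Had the take been placed inside a try block whose handler always released, the failed take of 5 permits would have pushed the count up to 7.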
A.J. Beamon
38616424f6
Report a couple of error cases in blobstore URL parsing when dealing with numbers.
2017-09-29 17:58:49 -07:00
Evan Tschannen
a1f8b546e6
fix: ensure connections to blob store are evenly distributed across network addresses
added a per address limit to the number of open connections
lowered a variety of knobs to prevent us from using too much memory
2017-09-29 14:59:24 -07:00
FDB Dev Team
a674cb4ef4
Initial repository commit
2017-05-25 13:48:44 -07:00