Commit Graph

43 Commits

Author SHA1 Message Date
Stephen Atherton 7f18d59dfe Bug fix, the blob request attempt count is now incremented for all errors except response code 429. 2018-01-24 01:15:01 -08:00
Stephen Atherton a2481343ec Bug fix, HTTP error code 429 was not being considered retryable in blob client (this was previously fixed but apparently reintroduced). 2018-01-24 00:22:11 -08:00
Stephen Atherton 41f80bf7ed Renamed an error, changed blob request failure to Warn severity. 2017-12-06 15:58:54 -08:00
Stephen Atherton f050105243 Added HTTP 502 to the list of retryable errors. 2017-11-01 11:41:32 -07:00
Stephen Atherton 9f151314b3 Changed some trace event severities. Also fixed a weird casing of “retryable”. 2017-10-19 17:47:42 -07:00
Stephen Atherton caad691ae2 Added comments for how to handle HTTP 400 errors gracefully in certain instances should the need arise. 2017-10-18 23:47:59 -07:00
Stephen Atherton ef84e52127 Improved error handling and memory usage in AsyncFileBlobStoreWrite. Writes will now fail if any upload has already failed, rather than buffering unboundedly until sync() is called to complete the file. There is also a configurable limit on how many uploads can be pending before writes will stall waiting for one to finish. 2017-10-18 05:51:30 -07:00
Stephen Atherton ebd0234514 Rewrote most error handling in BlobStoreEndpoint to fix several shortcomings in error handling and logging. The request loop now logs but rate limits all errors, and the exceptions thrown are more appropriate. HTTP 503 is now treated as retryable. Callers of BlobStoreEndpoint::doRequest() now specify which codes they consider to be successful so that more error handling can take place in the main request loop. 2017-10-18 02:52:09 -07:00
Alvin Moore 25513d8e2c Added tests for DataCenter kills 2017-10-02 12:04:28 -07:00
Stephen Atherton 058300be16 Each blobstore request will again select a random remote address. This used to happen before recent load balancing improvements related to focusing too much load on consistently up endpoints after others have recovered from being down. 2017-10-01 16:17:38 -07:00
Stephen Atherton a95107417f Improved behavior of slow writes during backup. KeyRange and Log backup tasks now use TaskBucket::saveAndExtend() to keep the task alive until flushing the file finishes or fails with an error (blob uploads fail after a limited number of retries). This prevents blob uploads from being retried too often if the destination is slow since a task abort and retry would start the backoff counters back at zero. Also removed a debugging behavior that was accidentally checked in. 2017-10-01 16:01:24 -07:00
Stephen Atherton a098919b20 Bug fix, releaser declared in wrong place, and lots of whitespace cleanup from try blocks that were no longer needed. 2017-10-01 11:25:50 -07:00
Stephen Atherton af87ac301d Removed wait never used for debugging which was accidentally included in bug fix. 2017-10-01 11:19:38 -07:00
Stephen Atherton 6000cafde1 Bug fix, locks were being taken inside try/catch so release would be done even if the take threw an error. Changed to using a Releaser. 2017-10-01 10:46:55 -07:00
Evan Tschannen f84e7252e8 fix: there was a reference counting cycle in asyncFileBlobStore and asyncFileReadAhead 2017-09-29 19:13:08 -07:00
A.J. Beamon 38616424f6 Report a couple error cases in blobstore URL parsing when dealing with numbers. 2017-09-29 17:58:49 -07:00
Evan Tschannen a1f8b546e6 fix: ensure connections to blob store are evenly distributed across network addresses
added a per address limit to the number of open connections
lowered a variety of knobs to prevent us from using too much memory
2017-09-29 14:59:24 -07:00
Alvin Moore 02525d7b14 Added TESTs to ensure that all of the different kills are performed during simulation 2017-09-26 11:15:39 -07:00
Alvin Moore 4a6fb10a42 Added TraceEvents for remaining and killed workers when killing DataCenter
Fixed consideration of excluded workers when checking cluster availability
2017-09-12 13:33:13 -07:00
Alvin Moore 0994587573 Fixed OS X compilation build warnings due to printf field specifiers 2017-09-01 09:35:56 -07:00
Alvin Moore 44e0df78c5 Added support for tracking roles for simulation workers
Fixed the exclusion and inclusion address simulation API and integration within workloads
Added more information within trace events for simulation
2017-08-28 11:25:37 -07:00
Evan Tschannen 272b4b984c fix: fixed a rare bug where we do not wait for a file in the process of being deleted to shutdown before rebooting a machine 2017-08-25 10:12:58 -07:00
Alvin Moore 0c1be7537c Fixed OSX compilation warning about printf field value specification 2017-08-24 12:30:38 -07:00
Alvin Moore 17c6392295 Added support for printing out information on the current simulation workers 2017-08-22 16:56:33 -07:00
A.J. Beamon 311d0e3815 Remove outdated comment from incrementalDelete function. 2017-07-26 15:27:37 -07:00
A.J. Beamon d8acb11200 Remove the change that waits only for unlinking; call delete on the file even if it doesn't exist. 2017-07-26 15:25:49 -07:00
A.J. Beamon d8e308c18f Enable use of incremental delete when deleting disk queue and sqlite KVS sqlite files. 2017-07-26 14:11:11 -07:00
A.J. Beamon a7fbc56a8e Checksums computed on pages with partially undefined contents are still valid, so mark them as such for valgrind purposes. 2017-07-17 09:34:04 -07:00
Stephen Atherton 39ff1b3c52 Bug fix, when io_timeouts are enabled in warn only mode they weren’t being logged at all. 2017-07-05 14:43:10 -07:00
Stephen Atherton 028fb75f88 Added last write timestamp to lost write detector class. Renamed TraceEvent for lost writes detected since it is no longer part of the KAIO class specifically. 2017-06-29 15:11:11 -07:00
Alvin Moore 6d19580789 Merge branch 'release-5.0' of github.com:apple/foundationdb into release-5.0
# Conflicts:
#	fdbrpc/simulator.h
2017-06-19 17:39:37 -07:00
Alvin Moore 9553458b78 Updated simulation to support managing exclusion and inclusion address
Added method for identifying acceptable availability process classes
Extended cluster availability function to ensure coordinators can be auto configured
Fixed availability function to allow protected processes to be considered as dead if not available
Added debug trace events for providing machine state when considering availability
Added trace event for protected coordinators
2017-06-19 16:48:15 -07:00
Stephen Atherton 0e638e7ea2 Merge branch 'release-4.6' into release-5.0 2017-06-18 23:25:17 -07:00
Stephen Atherton 430bb6224e Merge branch 'release-4.6' into release-5.0
# Conflicts:
#	fdbrpc/AsyncFileKAIO.actor.h
#	fdbrpc/Net2FileSystem.cpp
#	fdbrpc/sim2.actor.cpp
2017-06-16 02:14:19 -07:00
Stephen Atherton f405c8d88e Merge branch 'release-4.6' into release-5.0
# Conflicts:
#	fdbrpc/AsyncFileKAIO.actor.h
#	fdbrpc/sim2.actor.cpp
#	fdbserver/optimisttest.actor.cpp
#	versions.target
2017-06-15 17:40:19 -07:00
Evan Tschannen cdd64ebc15 fix: asyncFileNonDurable could never complete deleting a file in rare situations 2017-06-15 13:30:15 -07:00
Evan Tschannen 4bdcd8fc12 Merge branch 'release-4.6' into release-5.0
# Conflicts:
#	bindings/bindingtester/run_binding_tester.sh
#	fdbrpc/AsyncFileKAIO.actor.h
2017-06-14 16:43:53 -07:00
Stephen Atherton fa4fdb1f1d Merge branch 'fix-io-timeout-handling' into release-5.0
# Conflicts:
#	fdbserver/optimisttest.actor.cpp
2017-05-31 17:03:15 -07:00
Stephen Atherton 98604d33a0 Merge branch 'fix-io-timeout-handling'
# Conflicts:
#	fdbrpc/AsyncFileKAIO.actor.h
#	fdbrpc/sim2.actor.cpp
#	fdbserver/KeyValueStoreSQLite.actor.cpp
#	fdbserver/optimisttest.actor.cpp
#	fdbserver/worker.actor.cpp
#	fdbserver/workloads/MachineAttrition.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2017-05-26 18:43:08 -07:00
Alvin Moore b28ed397a2 Fixed printf field width specifier to reduce compilation warnings within OS X 2017-05-26 14:51:34 -07:00
Alvin Moore 0b9ed67e12 Fixed support for RemoveServers Workload
Added availability functions to simulation
2017-05-26 14:20:11 -07:00
Alvin Moore 16cc0821b1 Removed dead machine option from simulation 2017-05-25 16:29:02 -07:00
FDB Dev Team a674cb4ef4 Initial repository commit 2017-05-25 13:48:44 -07:00
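
Note on the blob client retry commits (ebd0234514, f050105243, a2481343ec, 7f18d59dfe): taken together, these messages describe a request loop in which HTTP 429, 502 and 503 are retryable, callers pass in the set of codes they treat as success, and the attempt count is incremented for every error except 429. The following is a minimal sketch of that policy only, not the BlobStoreEndpoint code. The names `isRetryable`, `RequestOutcome` and `runRequest` are hypothetical, the scripted list of response codes stands in for real network calls, and the assumption that the counter tracks failed attempts (rather than all tries) is mine.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: the retryable codes named in the commit messages.
bool isRetryable(int httpCode) {
    return httpCode == 429 || httpCode == 502 || httpCode == 503;
}

// Hypothetical result type for the sketch.
struct RequestOutcome {
    bool success;         // true if a caller-approved code was received
    int attemptsCounted;  // failed attempts that counted toward the limit
    int lastCode;         // last response code seen (0 if none)
};

// Replays a scripted sequence of response codes instead of doing real I/O.
// successCodes mirrors the idea that callers say which codes are successful.
RequestOutcome runRequest(const std::vector<int>& responses,
                          const std::set<int>& successCodes,
                          int maxAttempts) {
    assert(maxAttempts > 0);
    int attempts = 0;
    int last = 0;
    for (int code : responses) {
        last = code;
        if (successCodes.count(code) > 0) {
            return {true, attempts, code};
        }
        // Assumption from commit 7f18d59dfe: every error except 429
        // increments the attempt count.
        if (code != 429) {
            ++attempts;
        }
        // Give up on non-retryable errors or when the limit is reached.
        if (!isRetryable(code) || attempts >= maxAttempts) {
            return {false, attempts, code};
        }
    }
    // Script exhausted without a successful response.
    return {false, attempts, last};
}
```

Under these assumptions, `runRequest({429, 429, 200}, {200}, 3)` succeeds with no attempts counted, while `runRequest({503, 503, 503}, {200}, 2)` stops after the second 503 because the limit is reached.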