Stephen Atherton
7f18d59dfe
Bug fix, the blob request attempt count is now incremented for all errors except response code 429.
2018-01-24 01:15:01 -08:00
Stephen Atherton
a2481343ec
Bug fix, HTTP error code 429 was not being considered retryable in blob client (this was previously fixed but apparently reintroduced).
2018-01-24 00:22:11 -08:00
Stephen Atherton
41f80bf7ed
Renamed an error, changed blob request failure to Warn severity.
2017-12-06 15:58:54 -08:00
Stephen Atherton
f050105243
Added HTTP 502 to the list of retryable errors.
2017-11-01 11:41:32 -07:00
Stephen Atherton
9f151314b3
Changed some trace event severities. Also fixed a weird casing of “retryable”.
2017-10-19 17:47:42 -07:00
Stephen Atherton
caad691ae2
Added comments for how to handle HTTP 400 errors gracefully in certain instances should the need arise.
2017-10-18 23:47:59 -07:00
Stephen Atherton
ef84e52127
Improved error handling and memory usage in AsyncFileBlobStoreWrite. Writes will now fail if any upload has already failed, rather than buffering unboundedly until sync() is called to complete the file. There is also a configurable limit on how many uploads can be pending before writes will stall waiting for one to finish.
2017-10-18 05:51:30 -07:00
Stephen Atherton
ebd0234514
Rewrote most error handling in BlobStoreEndpoint to fix several shortcomings in error handling and logging. The request loop now logs all errors, with rate limiting, and the exceptions thrown are more appropriate. HTTP 503 is now treated as retryable. Callers of BlobStoreEndpoint::doRequest() now specify which codes they consider to be successful so that more error handling can take place in the main request loop.
2017-10-18 02:52:09 -07:00
Alvin Moore
25513d8e2c
Added tests for DataCenter kills
2017-10-02 12:04:28 -07:00
Stephen Atherton
058300be16
Each blobstore request will again select a random remote address. This was the behavior before the recent load balancing improvements, which were related to too much load being focused on consistently up endpoints after others had recovered from being down.
2017-10-01 16:17:38 -07:00
Stephen Atherton
a95107417f
Improved behavior of slow writes during backup. KeyRange and Log backup tasks now use TaskBucket::saveAndExtend() to keep the task alive until flushing the file finishes or fails with an error (blob uploads fail after a limited number of retries). This prevents blob uploads from being retried too often if the destination is slow since a task abort and retry would start the backoff counters back at zero. Also removed a debugging behavior that was accidentally checked in.
2017-10-01 16:01:24 -07:00
Stephen Atherton
a098919b20
Bug fix, releaser declared in wrong place, and lots of whitespace cleanup from try blocks that were no longer needed.
2017-10-01 11:25:50 -07:00
Stephen Atherton
af87ac301d
Removed a wait(Never()) used for debugging which was accidentally included in the bug fix.
2017-10-01 11:19:38 -07:00
Stephen Atherton
6000cafde1
Bug fix, locks were being taken inside try/catch so release would be done even if the take threw an error. Changed to using a Releaser.
2017-10-01 10:46:55 -07:00
Evan Tschannen
f84e7252e8
fix: there was a reference counting cycle in asyncFileBlobStore and asyncFileReadAhead
2017-09-29 19:13:08 -07:00
A.J. Beamon
38616424f6
Report a couple of error cases in blobstore URL parsing when dealing with numbers.
2017-09-29 17:58:49 -07:00
Evan Tschannen
a1f8b546e6
fix: ensure connections to blob store are evenly distributed across network addresses
...
added a per address limit to the number of open connections
lowered a variety of knobs to prevent us from using too much memory
2017-09-29 14:59:24 -07:00
Alvin Moore
02525d7b14
Added TESTs to ensure that all of the different kills are performed during simulation
2017-09-26 11:15:39 -07:00
Alvin Moore
4a6fb10a42
Added TraceEvents for remaining and killed workers when killing DataCenter
...
Fixed consideration of excluded workers when checking cluster availability
2017-09-12 13:33:13 -07:00
Alvin Moore
0994587573
Fixed OS X build warnings due to printf field specifiers
2017-09-01 09:35:56 -07:00
Alvin Moore
44e0df78c5
Added support for tracking roles for simulation workers
...
Fixed the exclusion and inclusion address simulation API and integration within workloads
Added more information within trace events for simulation
2017-08-28 11:25:37 -07:00
Evan Tschannen
272b4b984c
fix: a rare bug where we did not wait for a file in the process of being deleted to shut down before rebooting a machine
2017-08-25 10:12:58 -07:00
Alvin Moore
0c1be7537c
Fixed OS X compilation warning about printf field value specification
2017-08-24 12:30:38 -07:00
Alvin Moore
17c6392295
Added support for printing out information on the current simulation workers
2017-08-22 16:56:33 -07:00
A.J. Beamon
311d0e3815
Remove outdated comment from incrementalDelete function.
2017-07-26 15:27:37 -07:00
A.J. Beamon
d8acb11200
Remove the change that waits only for unlinking; call delete on the file even if it doesn't exist.
2017-07-26 15:25:49 -07:00
A.J. Beamon
d8e308c18f
Enable use of incremental delete when deleting disk queue and sqlite KVS files.
2017-07-26 14:11:11 -07:00
A.J. Beamon
a7fbc56a8e
Checksums computed on pages with partially undefined contents are still valid, so mark them as such for valgrind purposes.
2017-07-17 09:34:04 -07:00
Stephen Atherton
39ff1b3c52
Bug fix, when io_timeouts were enabled in warn-only mode they weren’t being logged at all.
2017-07-05 14:43:10 -07:00
Stephen Atherton
028fb75f88
Added last write timestamp to lost write detector class. Renamed TraceEvent for lost writes detected since it is no longer part of the KAIO class specifically.
2017-06-29 15:11:11 -07:00
Alvin Moore
6d19580789
Merge branch 'release-5.0' of github.com:apple/foundationdb into release-5.0
...
# Conflicts:
# fdbrpc/simulator.h
2017-06-19 17:39:37 -07:00
Alvin Moore
9553458b78
Updated simulation to support managing exclusion and inclusion addresses
...
Added method for identifying acceptable availability process classes
Extended cluster availability function to ensure coordinators can be auto configured
Fixed availability function to allow protected processes to be considered as dead if not available
Added debug trace events for providing machine state when considering availability
Added trace event for protected coordinators
2017-06-19 16:48:15 -07:00
Stephen Atherton
0e638e7ea2
Merge branch 'release-4.6' into release-5.0
2017-06-18 23:25:17 -07:00
Stephen Atherton
430bb6224e
Merge branch 'release-4.6' into release-5.0
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/Net2FileSystem.cpp
# fdbrpc/sim2.actor.cpp
2017-06-16 02:14:19 -07:00
Stephen Atherton
f405c8d88e
Merge branch 'release-4.6' into release-5.0
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/sim2.actor.cpp
# fdbserver/optimisttest.actor.cpp
# versions.target
2017-06-15 17:40:19 -07:00
Evan Tschannen
cdd64ebc15
fix: asyncFileNonDurable could never complete deleting a file in rare situations
2017-06-15 13:30:15 -07:00
Evan Tschannen
4bdcd8fc12
Merge branch 'release-4.6' into release-5.0
...
# Conflicts:
# bindings/bindingtester/run_binding_tester.sh
# fdbrpc/AsyncFileKAIO.actor.h
2017-06-14 16:43:53 -07:00
Stephen Atherton
fa4fdb1f1d
Merge branch 'fix-io-timeout-handling' into release-5.0
...
# Conflicts:
# fdbserver/optimisttest.actor.cpp
2017-05-31 17:03:15 -07:00
Stephen Atherton
98604d33a0
Merge branch 'fix-io-timeout-handling'
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/sim2.actor.cpp
# fdbserver/KeyValueStoreSQLite.actor.cpp
# fdbserver/optimisttest.actor.cpp
# fdbserver/worker.actor.cpp
# fdbserver/workloads/MachineAttrition.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2017-05-26 18:43:08 -07:00
Alvin Moore
b28ed397a2
Fixed printf field width specifier to reduce compilation warnings on OS X
2017-05-26 14:51:34 -07:00
Alvin Moore
0b9ed67e12
Fixed support for RemoveServers Workload
...
Added availability functions to simulation
2017-05-26 14:20:11 -07:00
Alvin Moore
16cc0821b1
Removed dead machine option from simulation
2017-05-25 16:29:02 -07:00
FDB Dev Team
a674cb4ef4
Initial repository commit
2017-05-25 13:48:44 -07:00