1575 Commits

Author SHA1 Message Date
Alvin Moore 5257b99d3f Fixed problem with machines RebootedAndCleared not being considered dead in availability consideration 2017-10-03 10:48:16 -07:00
Alvin Moore d099656557 Merge branch 'release-5.0' 2017-10-02 12:05:24 -07:00
Alvin Moore 25513d8e2c Added tests for DataCenter kills 2017-10-02 12:04:28 -07:00
Evan Tschannen 6ea9903c82 Merge branch 'release-5.0'
# Conflicts:
#	fdbbackup/backup.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	versions.target
2017-10-01 18:46:44 -07:00
Stephen Atherton 058300be16 Each blobstore request will again select a random remote address. This used to happen before recent load balancing improvements related to focusing too much load on consistently up endpoints after others have recovered from being down. 2017-10-01 16:17:38 -07:00
Stephen Atherton a95107417f Improved behavior of slow writes during backup. KeyRange and Log backup tasks now use TaskBucket::saveAndExtend() to keep the task alive until flushing the file finishes or fails with an error (blob uploads fail after a limited number of retries). This prevents blob uploads from being retried too often if the destination is slow since a task abort and retry would start the backoff counters back at zero. Also removed a debugging behavior that was accidentally checked in. 2017-10-01 16:01:24 -07:00
Stephen Atherton a098919b20 Bug fix, releaser declared in wrong place, and lots of whitespace cleanup from try blocks that were no longer needed. 2017-10-01 11:25:50 -07:00
Stephen Atherton af87ac301d Removed wait never used for debugging which was accidentally included in bug fix. 2017-10-01 11:19:38 -07:00
Stephen Atherton 6000cafde1 Bug fix, locks were being taken inside try/catch so release would be done even if the take threw an error. Changed to using a Releaser. 2017-10-01 10:46:55 -07:00
Evan Tschannen f84e7252e8 fix: there was a reference counting cycle in asyncFileBlobStore and asyncFileReadAhead 2017-09-29 19:13:08 -07:00
A.J. Beamon 38616424f6 Report a couple error cases in blobstore URL parsing when dealing with numbers. 2017-09-29 17:58:49 -07:00
Alex Miller c40c1bb5fe Add a new workload: BackupToDBAbort, which does an ACI switchover.
This is to allow easier testing of non-durable switchovers without having to
wiggle into BackupToDBCorrectness's view of the world.
2017-09-29 15:58:36 -07:00
Evan Tschannen a1f8b546e6 fix: ensure connections to blob store are evenly distributed across network addresses
added a per address limit to the number of open connections
lowered a variety of knobs to prevent us from using too much memory
2017-09-29 14:59:24 -07:00
A.J. Beamon d30c730f75 Add the ability to access name and description in Error. Update error descriptions. 2017-09-28 12:35:03 -07:00
Alvin Moore 298b54104e Merge branch 'release-5.0' 2017-09-26 11:16:14 -07:00
Alvin Moore 02525d7b14 Added TESTs to ensure that all of the different kills are performed during simulation 2017-09-26 11:15:39 -07:00
Stephen Atherton 1ca9814879 Bug (arguable, perhaps) fix in AsyncFileCached. Order was not being enforced between writes and truncates such that calling and waiting on a truncate to X and then writing to X + 1 could end up writing first and then truncating the written page off of the file. 2017-09-20 17:58:56 -07:00
Evan Tschannen e8b895c878 added the ability to disable connection failures for a period of time after one happens 2017-09-18 12:46:29 -07:00
Evan Tschannen 8cb53fd608 Merge pull request #149 from cie/choose-leader-on-stateless-processes
choose leader on the preferred process class
2017-09-13 13:58:49 -07:00
Alvin Moore b1dd2ac6fe Merge branch 'release-5.0' 2017-09-12 13:34:28 -07:00
Alvin Moore 4a6fb10a42 Added TraceEvents for remaining and killed workers when killing DataCenter
Fixed consideration of excluded workers when checking cluster availability
2017-09-12 13:33:13 -07:00
Evan Tschannen 76e7988663 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/OldTLogServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/WorkerInterface.h
#	flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00
Evan Tschannen ea26bc1c43 passed first tests which kill entire datacenters
added configuration options for the remote data center and satellite data centers
updated cluster controller recruitment logic
refactors how master writes core state
updated log recovery, and log system peeking
2017-09-07 15:32:08 -07:00
Evan Tschannen 6f6dbe4b33 fix: load balance will still use second requests when client locality is present 2017-09-01 11:14:18 -07:00
Alvin Moore 0994587573 Fixed OS X compilation build warnings due to printf field specifiers 2017-09-01 09:35:56 -07:00
Alvin Moore fd439e9d1c Fixed OS X compilation build warnings due to printf field type specifiers 2017-09-01 09:34:53 -07:00
Stephen Atherton 6e9de8f35a Bug fix. eraseDirectoryRecursive() on MacOS used to do nothing at all, but now it erases directories recursively. The Linux version was modified to be simpler and use a version of the FTW API that also works on MacOS. 2017-08-31 00:11:18 -07:00
A.J. Beamon 9a0a3b6329 Merge commit '66528becb82d826e81fa644bb378212584ab580e' 2017-08-28 16:47:59 -07:00
Yichi Chiang 9fe927127f choose leader on the preferred process class 2017-08-28 14:41:04 -07:00
Alvin Moore 44e0df78c5 Added support for tracking roles for simulation workers
Fixed the exclusion and inclusion address simulation API and integration within workloads
Added more information within trace events for simulation
2017-08-28 11:25:37 -07:00
Alec Grieser 300b5a17ed Merge branch 'release-5.0' 2017-08-25 18:55:33 -07:00
Evan Tschannen 272b4b984c fix: fixed a rare bug where we do not wait for a file in the process of being deleted to shutdown before rebooting a machine 2017-08-25 10:12:58 -07:00
A.J. Beamon 45c0585891 Merge branch 'release-5.0' 2017-08-24 14:48:47 -07:00
Alvin Moore 0c1be7537c Fixed OSX compilation warning about printf field value specification 2017-08-24 12:30:38 -07:00
Alec Grieser 2b678f6e91 Merge remote-tracking branch 'origin/release-5.0' 2017-08-23 10:24:23 -07:00
Alvin Moore 17c6392295 Added support for printing out information on the current simulation workers 2017-08-22 16:56:33 -07:00
A.J. Beamon 41c90bcdea Merge commit '89ac94853c70d08289e7fb58055bc5d0cd4e494d' 2017-07-26 15:35:36 -07:00
A.J. Beamon 311d0e3815 Remove outdated comment from incrementalDelete function. 2017-07-26 15:27:37 -07:00
A.J. Beamon d8acb11200 Remove the change that waits only for unlinking; call delete on the file even if it doesn't exist. 2017-07-26 15:25:49 -07:00
A.J. Beamon d8e308c18f Enable use of incremental delete when deleting disk queue and sqlite KVS sqlite files. 2017-07-26 14:11:11 -07:00
Evan Tschannen 64e9560599 Merge pull request #128 from cie/maintain-incompatible-connections
Maintain incompatible connections
2017-07-17 16:28:22 -07:00
A.J. Beamon 2113d47db6 Update protocol version for incompatible connection change 2017-07-17 16:16:05 -07:00
A.J. Beamon 23c2946fa3 Rename some trace events surrounding connections 2017-07-17 16:15:18 -07:00
A.J. Beamon 591d98f711 Update the incompatible version behavior change protocol version check and add a note that we'll need to appropriately set the version at merge time. 2017-07-17 11:00:45 -07:00
A.J. Beamon 650c6ff399 Merge branch 'release-5.0' into maintain-incompatible-connections 2017-07-17 10:40:36 -07:00
A.J. Beamon 9493f8f78c Merge branch 'release-5.0' 2017-07-17 09:34:37 -07:00
A.J. Beamon a7fbc56a8e Checksums computed on pages with partially undefined contents are still valid, so mark them as such for valgrind purposes. 2017-07-17 09:34:04 -07:00
Alec Grieser f75b6f333b Merge branch 'release-5.0' 2017-07-13 11:21:18 -07:00
Stephen Atherton 39ff1b3c52 Bug fix, when io_timeouts are enabled in warn only mode they weren’t being logged at all. 2017-07-05 14:43:10 -07:00
Stephen Atherton 1b1a0d27e2 Merge branch 'release-5.0'
# Conflicts:
#	versions.target
2017-06-29 15:58:04 -07:00
Stephen Atherton 028fb75f88 Added last write timestamp to lost write detector class. Renamed TraceEvent for lost writes detected since it is no longer part of the KAIO class specifically. 2017-06-29 15:11:11 -07:00
Alec Grieser 9bcdfe4ddb removed undefined behavior surrounding TLS logging 2017-06-28 14:23:53 -07:00
Alec Grieser 94bce335e7 Merge branch 'release-5.0' 2017-06-19 17:51:10 -07:00
Alvin Moore 6d19580789 Merge branch 'release-5.0' of github.com:apple/foundationdb into release-5.0
# Conflicts:
#	fdbrpc/simulator.h
2017-06-19 17:39:37 -07:00
Alvin Moore 9553458b78 Updated simulation to support managing exclusion and inclusion address
Added method for identifying acceptable availability process classes
Extended cluster availability function to ensure coordinators can be auto configured
Fixed availability function to allow protected processes to be considered as dead if not available
Added debug trace events for providing machine state when considering availability
Added trace event for protected coordinators
2017-06-19 16:48:15 -07:00
Stephen Atherton 5d13d845a7 Merge branch 'release-5.0' 2017-06-18 23:25:29 -07:00
Stephen Atherton 0e638e7ea2 Merge branch 'release-4.6' into release-5.0 2017-06-18 23:25:17 -07:00
Stephen Atherton 6d9e302487 Merge branch 'release-5.0' 2017-06-16 02:14:34 -07:00
Stephen Atherton 430bb6224e Merge branch 'release-4.6' into release-5.0
# Conflicts:
#	fdbrpc/AsyncFileKAIO.actor.h
#	fdbrpc/Net2FileSystem.cpp
#	fdbrpc/sim2.actor.cpp
2017-06-16 02:14:19 -07:00
Stephen Atherton 1c94e30e64 Merge branch 'release-5.0' 2017-06-15 17:40:40 -07:00
Stephen Atherton f405c8d88e Merge branch 'release-4.6' into release-5.0
# Conflicts:
#	fdbrpc/AsyncFileKAIO.actor.h
#	fdbrpc/sim2.actor.cpp
#	fdbserver/optimisttest.actor.cpp
#	versions.target
2017-06-15 17:40:19 -07:00
Evan Tschannen cdd64ebc15 fix: asyncFileNonDurable could never complete deleting a file in rare situations 2017-06-15 13:30:15 -07:00
Evan Tschannen afdc125db9 Merge branch 'release-5.0' 2017-06-14 16:44:23 -07:00
Evan Tschannen 4bdcd8fc12 Merge branch 'release-4.6' into release-5.0
# Conflicts:
#	bindings/bindingtester/run_binding_tester.sh
#	fdbrpc/AsyncFileKAIO.actor.h
2017-06-14 16:43:53 -07:00
Yichi Chiang 02ee6d8cd1 Change checksum enabled condition 2017-06-13 11:03:25 -07:00
Stephen Atherton e318aabe55 Merge branch 'release-5.0' 2017-05-31 17:10:48 -07:00
Stephen Atherton fa4fdb1f1d Merge branch 'fix-io-timeout-handling' into release-5.0
# Conflicts:
#	fdbserver/optimisttest.actor.cpp
2017-05-31 17:03:15 -07:00
Yichi Chiang 41d9bce2d7 Merge pull request #115 from cie/checksum-off-with-tls
Disable checksum when TLS is enabled
2017-05-30 11:43:53 -07:00
Stephen Atherton 98604d33a0 Merge branch 'fix-io-timeout-handling'
# Conflicts:
#	fdbrpc/AsyncFileKAIO.actor.h
#	fdbrpc/sim2.actor.cpp
#	fdbserver/KeyValueStoreSQLite.actor.cpp
#	fdbserver/optimisttest.actor.cpp
#	fdbserver/worker.actor.cpp
#	fdbserver/workloads/MachineAttrition.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2017-05-26 18:43:08 -07:00
Stephen Atherton 7260e38545 Merge branch 'fix-io-timeout-handling'
# Conflicts:
#	fdbrpc/AsyncFileKAIO.actor.h
#	fdbrpc/sim2.actor.cpp
#	fdbserver/KeyValueStoreSQLite.actor.cpp
#	fdbserver/optimisttest.actor.cpp
#	fdbserver/worker.actor.cpp
#	fdbserver/workloads/MachineAttrition.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2017-05-26 17:43:28 -07:00
Yichi Chiang d2ad46680c Disable checksum when TLS is enabled 2017-05-26 15:34:40 -07:00
Alvin Moore b28ed397a2 Fixed printf field width specifier to reduce compilation warnings within OS X 2017-05-26 14:51:34 -07:00
Alvin Moore 0b9ed67e12 Fixed support for RemoveServers Workload
Added availability functions to simulation
2017-05-26 14:20:11 -07:00
Alvin Moore 16cc0821b1 Removed dead machine option from simulation 2017-05-25 16:29:02 -07:00
FDB Dev Team a674cb4ef4 Initial repository commit 2017-05-25 13:48:44 -07:00