Alvin Moore
5257b99d3f
Fixed problem with machines RebootedAndCleared not being considered dead in availability consideration
2017-10-03 10:48:16 -07:00
Alvin Moore
d099656557
Merge branch 'release-5.0'
2017-10-02 12:05:24 -07:00
Alvin Moore
25513d8e2c
Added tests for DataCenter kills
2017-10-02 12:04:28 -07:00
Evan Tschannen
6ea9903c82
Merge branch 'release-5.0'
...
# Conflicts:
# fdbbackup/backup.actor.cpp
# fdbserver/ClusterController.actor.cpp
# versions.target
2017-10-01 18:46:44 -07:00
Stephen Atherton
058300be16
Each blobstore request will again select a random remote address. This used to happen before recent load balancing improvements related to focusing too much load on consistently up endpoints after others have recovered from being down.
2017-10-01 16:17:38 -07:00
Stephen Atherton
a95107417f
Improved behavior of slow writes during backup. KeyRange and Log backup tasks now use TaskBucket::saveAndExtend() to keep the task alive until flushing the file finishes or fails with an error (blob uploads fail after a limited number of retries). This prevents blob uploads from being retried too often if the destination is slow since a task abort and retry would start the backoff counters back at zero. Also removed a debugging behavior that was accidentally checked in.
2017-10-01 16:01:24 -07:00
Stephen Atherton
a098919b20
Bug fix, releaser declared in wrong place, and lots of whitespace cleanup from try blocks that were no longer needed.
2017-10-01 11:25:50 -07:00
Stephen Atherton
af87ac301d
Removed a "wait never" used for debugging, which was accidentally included in the bug fix.
2017-10-01 11:19:38 -07:00
Stephen Atherton
6000cafde1
Bug fix, locks were being taken inside try/catch so release would be done even if the take threw an error. Changed to using a Releaser.
2017-10-01 10:46:55 -07:00
Evan Tschannen
f84e7252e8
fix: there was a reference counting cycle in asyncFileBlobStore and asyncFileReadAhead
2017-09-29 19:13:08 -07:00
A.J. Beamon
38616424f6
Report a couple error cases in blobstore URL parsing when dealing with numbers.
2017-09-29 17:58:49 -07:00
Alex Miller
c40c1bb5fe
Add a new workload: BackupToDBAbort, which does an ACI switchover.
...
This is to allow easier testing of non-durable switchovers without having to
wiggle into BackupToDBCorrectness's view of the world.
2017-09-29 15:58:36 -07:00
Evan Tschannen
a1f8b546e6
fix: ensure connections to blob store are evenly distributed across network addresses
...
added a per address limit to the number of open connections
lowered a variety of knobs to prevent us from using too much memory
2017-09-29 14:59:24 -07:00
A.J. Beamon
d30c730f75
Add the ability to access name and description in Error. Update error descriptions.
2017-09-28 12:35:03 -07:00
Alvin Moore
298b54104e
Merge branch 'release-5.0'
2017-09-26 11:16:14 -07:00
Alvin Moore
02525d7b14
Added TESTs to ensure that all of the different kills are performed during simulation
2017-09-26 11:15:39 -07:00
Stephen Atherton
1ca9814879
Bug (arguable, perhaps) fix in AsyncFileCached. Order was not being enforced between writes and truncates such that calling and waiting on a truncate to X and then writing to X + 1 could end up writing first and then truncating the written page off of the file.
2017-09-20 17:58:56 -07:00
Evan Tschannen
e8b895c878
added the ability to disable connection failures for a period of time after one happens
2017-09-18 12:46:29 -07:00
Evan Tschannen
8cb53fd608
Merge pull request #149 from cie/choose-leader-on-stateless-processes
...
choose leader on the preferred process class
2017-09-13 13:58:49 -07:00
Alvin Moore
b1dd2ac6fe
Merge branch 'release-5.0'
2017-09-12 13:34:28 -07:00
Alvin Moore
4a6fb10a42
Added TraceEvents for remaining and killed workers when killing DataCenter
...
Fixed consideration of excluded workers when checking cluster availability
2017-09-12 13:33:13 -07:00
Evan Tschannen
76e7988663
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/OldTLogServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/WorkerInterface.h
# flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00
Evan Tschannen
ea26bc1c43
passed first tests which kill entire datacenters
...
added configuration options for the remote data center and satellite data centers
updated cluster controller recruitment logic
refactors how master writes core state
updated log recovery, and log system peeking
2017-09-07 15:32:08 -07:00
Evan Tschannen
6f6dbe4b33
fix: load balance will still use second requests when client locality is present
2017-09-01 11:14:18 -07:00
Alvin Moore
0994587573
Fixed OS X compilation build warnings due to printf field specifiers
2017-09-01 09:35:56 -07:00
Alvin Moore
fd439e9d1c
Fixed OS X compilation build warnings due to printf field type specifiers
2017-09-01 09:34:53 -07:00
Stephen Atherton
6e9de8f35a
Bug fix. eraseDirectoryRecursive() on MacOS used to do nothing at all, but now it erases directories recursively. The Linux version was modified to be simpler and use a version of the FTW API that also works on MacOS.
2017-08-31 00:11:18 -07:00
A.J. Beamon
9a0a3b6329
Merge commit '66528becb82d826e81fa644bb378212584ab580e'
2017-08-28 16:47:59 -07:00
Yichi Chiang
9fe927127f
choose leader on the preferred process class
2017-08-28 14:41:04 -07:00
Alvin Moore
44e0df78c5
Added support for tracking roles for simulation workers
...
Fixed the exclusion and inclusion address simulation API and integration within workloads
Added more information within trace events for simulation
2017-08-28 11:25:37 -07:00
Alec Grieser
300b5a17ed
Merge branch 'release-5.0'
2017-08-25 18:55:33 -07:00
Evan Tschannen
272b4b984c
fix: fixed a rare bug where we do not wait for a file in the process of being deleted to shut down before rebooting a machine
2017-08-25 10:12:58 -07:00
A.J. Beamon
45c0585891
Merge branch 'release-5.0'
2017-08-24 14:48:47 -07:00
Alvin Moore
0c1be7537c
Fixed OSX compilation warning about printf field value specification
2017-08-24 12:30:38 -07:00
Alec Grieser
2b678f6e91
Merge remote-tracking branch 'origin/release-5.0'
2017-08-23 10:24:23 -07:00
Alvin Moore
17c6392295
Added support for printing out information on the current simulation workers
2017-08-22 16:56:33 -07:00
A.J. Beamon
41c90bcdea
Merge commit '89ac94853c70d08289e7fb58055bc5d0cd4e494d'
2017-07-26 15:35:36 -07:00
A.J. Beamon
311d0e3815
Remove outdated comment from incrementalDelete function.
2017-07-26 15:27:37 -07:00
A.J. Beamon
d8acb11200
Remove the change that waits only for unlinking; call delete on the file even if it doesn't exist.
2017-07-26 15:25:49 -07:00
A.J. Beamon
d8e308c18f
Enable use of incremental delete when deleting disk queue and sqlite KVS files.
2017-07-26 14:11:11 -07:00
Evan Tschannen
64e9560599
Merge pull request #128 from cie/maintain-incompatible-connections
...
Maintain incompatible connections
2017-07-17 16:28:22 -07:00
A.J. Beamon
2113d47db6
Update protocol version for incompatible connection change
2017-07-17 16:16:05 -07:00
A.J. Beamon
23c2946fa3
Rename some trace events surrounding connections
2017-07-17 16:15:18 -07:00
A.J. Beamon
591d98f711
Update the incompatible version behavior change protocol version check and add a note that we'll need to appropriately set the version at merge time.
2017-07-17 11:00:45 -07:00
A.J. Beamon
650c6ff399
Merge branch 'release-5.0' into maintain-incompatible-connections
2017-07-17 10:40:36 -07:00
A.J. Beamon
9493f8f78c
Merge branch 'release-5.0'
2017-07-17 09:34:37 -07:00
A.J. Beamon
a7fbc56a8e
Checksums computed on pages with partially undefined contents are still valid, so mark them as such for valgrind purposes.
2017-07-17 09:34:04 -07:00
Alec Grieser
f75b6f333b
Merge branch 'release-5.0'
2017-07-13 11:21:18 -07:00
Stephen Atherton
39ff1b3c52
Bug fix, when io_timeouts are enabled in warn only mode they weren’t being logged at all.
2017-07-05 14:43:10 -07:00
Stephen Atherton
1b1a0d27e2
Merge branch 'release-5.0'
...
# Conflicts:
# versions.target
2017-06-29 15:58:04 -07:00
Stephen Atherton
028fb75f88
Added last write timestamp to lost write detector class. Renamed TraceEvent for lost writes detected since it is no longer part of the KAIO class specifically.
2017-06-29 15:11:11 -07:00
Alec Grieser
9bcdfe4ddb
removed undefined behavior surrounding TLS logging
2017-06-28 14:23:53 -07:00
Alec Grieser
94bce335e7
Merge branch 'release-5.0'
2017-06-19 17:51:10 -07:00
Alvin Moore
6d19580789
Merge branch 'release-5.0' of github.com:apple/foundationdb into release-5.0
...
# Conflicts:
# fdbrpc/simulator.h
2017-06-19 17:39:37 -07:00
Alvin Moore
9553458b78
Updated simulation to support managing exclusion and inclusion address
...
Added method for identifying acceptable availability process classes
Extended cluster availability function to ensure coordinators can be auto configured
Fixed availability function to allow protected processes to be considered as dead if not available
Added debug trace events for providing machine state when considering availability
Added trace event for protected coordinators
2017-06-19 16:48:15 -07:00
Stephen Atherton
5d13d845a7
Merge branch 'release-5.0'
2017-06-18 23:25:29 -07:00
Stephen Atherton
0e638e7ea2
Merge branch 'release-4.6' into release-5.0
2017-06-18 23:25:17 -07:00
Stephen Atherton
6d9e302487
Merge branch 'release-5.0'
2017-06-16 02:14:34 -07:00
Stephen Atherton
430bb6224e
Merge branch 'release-4.6' into release-5.0
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/Net2FileSystem.cpp
# fdbrpc/sim2.actor.cpp
2017-06-16 02:14:19 -07:00
Stephen Atherton
1c94e30e64
Merge branch 'release-5.0'
2017-06-15 17:40:40 -07:00
Stephen Atherton
f405c8d88e
Merge branch 'release-4.6' into release-5.0
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/sim2.actor.cpp
# fdbserver/optimisttest.actor.cpp
# versions.target
2017-06-15 17:40:19 -07:00
Evan Tschannen
cdd64ebc15
fix: asyncFileNonDurable could never complete deleting a file in rare situations
2017-06-15 13:30:15 -07:00
Evan Tschannen
afdc125db9
Merge branch 'release-5.0'
2017-06-14 16:44:23 -07:00
Evan Tschannen
4bdcd8fc12
Merge branch 'release-4.6' into release-5.0
...
# Conflicts:
# bindings/bindingtester/run_binding_tester.sh
# fdbrpc/AsyncFileKAIO.actor.h
2017-06-14 16:43:53 -07:00
Yichi Chiang
02ee6d8cd1
Change checksum enabled condition
2017-06-13 11:03:25 -07:00
Stephen Atherton
e318aabe55
Merge branch 'release-5.0'
2017-05-31 17:10:48 -07:00
Stephen Atherton
fa4fdb1f1d
Merge branch 'fix-io-timeout-handling' into release-5.0
...
# Conflicts:
# fdbserver/optimisttest.actor.cpp
2017-05-31 17:03:15 -07:00
Yichi Chiang
41d9bce2d7
Merge pull request #115 from cie/checksum-off-with-tls
...
Disable checksum when TLS is enabled
2017-05-30 11:43:53 -07:00
Stephen Atherton
98604d33a0
Merge branch 'fix-io-timeout-handling'
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/sim2.actor.cpp
# fdbserver/KeyValueStoreSQLite.actor.cpp
# fdbserver/optimisttest.actor.cpp
# fdbserver/worker.actor.cpp
# fdbserver/workloads/MachineAttrition.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2017-05-26 18:43:08 -07:00
Stephen Atherton
7260e38545
Merge branch 'fix-io-timeout-handling'
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/sim2.actor.cpp
# fdbserver/KeyValueStoreSQLite.actor.cpp
# fdbserver/optimisttest.actor.cpp
# fdbserver/worker.actor.cpp
# fdbserver/workloads/MachineAttrition.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2017-05-26 17:43:28 -07:00
Yichi Chiang
d2ad46680c
Disable checksum when TLS is enabled
2017-05-26 15:34:40 -07:00
Alvin Moore
b28ed397a2
Fixed printf field width specifier to reduce compilation warnings within OS X
2017-05-26 14:51:34 -07:00
Alvin Moore
0b9ed67e12
Fixed support for RemoveServers Workload
...
Added availability functions to simulation
2017-05-26 14:20:11 -07:00
Alvin Moore
16cc0821b1
Removed dead machine option from simulation
2017-05-25 16:29:02 -07:00
FDB Dev Team
a674cb4ef4
Initial repository commit
2017-05-25 13:48:44 -07:00