A DASD device that is not ready or online has no defined disk layout,
so all requests that arrive in such a state need to be returned as
failed.
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We used the init_name to set the console ccw_device's name early
at the boot stage. This patch moves the name setting (for all ccw
devices) to the point where we actually register the device. At this
time we can do dynamic allocations and therefore use dev_set_name.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We use a test_and_clear_bit to prevent a device from being
unregistered twice. Unfortunately in this cases the "final"
put_device (from device_initialize) was issued more than once,
resulting in an use after free error. Fix this by moving this
put_device to ccw_device_unregister.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We used the init_name to set the console subchannels name early
at the boot stage. With the patch cio: fix memleak in subchannel validation
we moved the name setting to the point where we actually register the
console subchannel. At this time we can do dynamic allocations and therefore
use dev_set_name.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When scanning for new subchannels we have a code path where we allocate
memory for a struct subchannel, set the device name (which is dynamically
allocated now) and do a check if the underlying device is blacklisted - if
so we free the subchannel structure.
Since we have not set up refcounting at this stage, the device name's memory
is lost. Fix this by moving the dev_set_name after the blacklist test.
Note: With this patch the init_name for the console subchannel becomes
virtually obsolete.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When using s390dbf with "%s" in sprintf format strings the string itself
is not copied to the dbf buffer.
Since in this case only pointers are stored in the s390dbf, we should
not use dev_name - which is bound to the lifetime of the device.
Reading this entry from s390dbf after the device was released will cause
an use after free error.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The number of qdio debugfs entries was limited. Remove this limit
and group the queue files in a per device directory.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When unit checks trigger sensing the device state is set to W4SENSE
until sense completion; then the device state is set back to
ONLINE. If a unit check occurs while set online or set offline
requests are processed then it might happen that the device's
temporary W4SENSE state causes these functions to terminate,
leaving the device in an inconsistent state when the state is set
back to ONLINE later on so that the device cannot be set online or
offline any longer.
To solve this, set online/offline and related rollback or error
routines are processed only if the device is in a final or
DISCONNECTED state.
Signed-off-by: Michael Ernst <mernst@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Ensure to always hold an extra device reference for scheduling a
subchannel deregistration, by moving the get_device to
ccw_device_schedule_sch_unregister. This fixes an use after free
error in ccw_device_call_sch_unregister where put_device was called
on an already freed device structure.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With commit c38f960809 polling was
stopped for the queue even if new data is available.
Return immediately after scheduling the queue tasklet if the queue
is not done.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move debug traces for start I/O and interrupt events to exclusive
trace levels. Also change tracing in hot-path from sprintf (costly)
to hex.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If online/offline processing of a ccw device fails, resulting in not
operational state, notify the driver and unregister the device in case
the driver dosn't want to keep it.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Ensure that the hardware interruption parameter for a subchannel is
reset when the associated subchannel data structure is freed.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
All scsw helper functions are very short and usage of them shouldn't
result in function calls. Therefore we move them to a separate header
file.
Also saves a lot of EXPORT_SYMBOLs.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Path verification events occurring for offline devices are currently
ignored. As a result, offline devices are not removed, even though
they might no longer be accessible (for example because the last path
to the device was varied offline). Fix this by scheduling a status
evaluation for the affected subchannel when a path verification event
occurs.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If the NULL test on block is needed, it should be before the dereference of
the base field.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r@
expression E1,E2;
identifier fld;
statement S1,S2;
@@
E1 = E2->fld;
(
if (E1 == NULL) S1 else S2
|
*if (E2 == NULL) S1 else S2
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If io_subchannel_initialize_dev fails it will release the only
reference to the ccw device therefore the caller should not
kfree this device since this is done in the release function.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: (23 commits)
[SCSI] sd: Avoid sending extended inquiry to legacy devices
[SCSI] libsas: fix wide port hotplug issues
[SCSI] libfc: fix a circular locking warning during sending RRQ
[SCSI] qla4xxx: Remove hiwat code so scsi eh does not get escalated when we can make progress
[SCSI] qla4xxx: Fix srb lookup in qla4xxx_eh_device_reset
[SCSI] qla4xxx: Fix Driver Fault Recovery Completion
[SCSI] qla4xxx: add timeout handler
[SCSI] qla4xxx: Correct Extended Sense Data Errors
[SCSI] libiscsi: disable bh in and abort handler.
[SCSI] zfcp: Fix tracing of request id for abort requests
[SCSI] zfcp: Fix wka port processing
[SCSI] zfcp: avoid double notify in lowmem scenario
[SCSI] zfcp: Add port only once to FC transport class
[SCSI] zfcp: Recover from stalled outbound queue
[SCSI] zfcp: Fix erp escalation procedure
[SCSI] zfcp: Fix logic for physical port close
[SCSI] zfcp: Use -EIO for SBAL allocation failures
[SCSI] zfcp: Use unchained mode for small ct and els requests
[SCSI] zfcp: Use correct flags for zfcp_erp_notify
[SCSI] zfcp: Return -ENOMEM for allocation failures in zfcp_fsf
...
The trace record for SCSI abort requests has a field for the request
id of the request to be aborted. Put the real request id instead of
zero.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Under certain conditions it is possible that a WKA port ist not opened
within the expected timeframe of half a second. In this situation
the WKA port remains in the state OPENING preventing any succeding
request to open the port. This led to unrecoverable remote ports.
Fixing this by always setting an appropriate WKA port status before
leaving the function and removing the timeout value here since it's
not needed here because the general timeout processing would deal
with it if required.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
In a LOWMEM condition an ERP notification would have been sent twice
causing an unpredictable behaviour of the ERP.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When calling fc_remote_port_add make sure to not call it again before
fc_remote_port_delete has been called. In other words, ensure to
create a new fc_rport, then delete it, then create a new one again.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Depending on interruptions on some storage systems, the complete
channel can stall which looks like an outbound queue stall to Linux.
When trying to acquire a free SBAL for a non-SCSI command, zfcp waits
for 5 seconds for a free slot to appear. This is the right place to
detect a queue stall: If the wait times out, we assume a stalled queue
and try to recover this.
The overall strategy should be to trigger the erp from specific
events, and not try an overall escalation from one failed port to a
full-blown queue recovery. If we manage to send a command, the status
codes for this command or a timeout will trigger the right follow-on
actions.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
If an action fails, retry it until the erp count exceeds the
threshold. If there is something fundamentally wrong, the FSF layer
will trigger a more appropriate action depending on the FSF status
codes.
The followup for successful actions is a different followup than
retrying failed actions, so split the code two functions to make this
clear.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
After closing the port, we want it to be "not open" to consider the
action to be successful.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The ELS ADISC and the GID_PN requests sent from zfcp fit into
unchained FSF requests. Change the FSF allocation logic to use
unchained requests whenever possible where everything fits in one
SBAL. This avoids acquiring more SBALs than necessary, especially
during zfcp recovery when things might be stalled.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
zfcp_erp_notify uses the ZFCP_ERP_STATUS_* flags, so it is
ZFCP_STATUS_ERP_LOWMEM instead of ZFCP_ERP_NOMEM. Signalling
ZFCP_ERP_FAILED is not necessary, the missing d_id will show that the
nameserver did not return the d_id.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When a fsf_req or a qtcb cannot be allocated return -ENOMEM instead of
-EIO.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
req_q_util is not atomic, so the qdio_stat_lock must be held when
reading this variable.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
We should not modify the port status after triggering an ERP action
for the port. It is not guaranteed which status is finally active
when the ERP action is performed. This can lead to situations which
are unwanted and hard to debug in case of a failure.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Every time a request is enqueued or there is some work outstanding
from the ap_tasklet, the ap_poll_timer is scheduled again.
Unfortunately it was permanently called. It looked as if it was
started in the past and thus imediately expired.
This has been changed. First it is checked if the hrtimer is already
expired. Then the expiring time is forwarded and the timer restarted.
Signed-off-by: Felix Beck <felix.beck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT
This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
remove loop, add some debug data and use get_sense function
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix this:
drivers/s390/char/monreader.c: In function 'mon_open':
drivers/s390/char/monreader.c:323: warning: passing argument 1 of 'dev_set_drvdata' from incompatible pointer type
include/linux/device.h:457: note: expected 'struct device *' but argument is of type 'struct device **'
drivers/s390/char/monreader.c: In function 'monreader_freeze':
drivers/s390/char/monreader.c:466: warning: passing argument 1 of 'dev_get_drvdata' from incompatible pointer type
include/linux/device.h:452: note: expected 'const struct device *' but argument is of type 'struct device **'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Define an empty static inline version of sclp_console_pm_event()
to fix the build error below for !SCLP_CONSOLE.
drivers/s390/built-in.o: In function `sclp_rw_pm_event':
sclp_rw.c:(.text+0x12f68): undefined reference to `sclp_console_pm_event'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To set a dasd online dasd_change_state is called twice. The first
cycle will schedule initial analysis of the device, set the rc to
-EAGAIN and will not touch the device state any more.
The initial analysis will in turn call dasd_change_state to increase
the state to the final DASD_STATE_ONLINE.
If the dasd_change_state on the second thread outruns the other one
both finish with the state set to DASD_STATE_ONLINE and the device
refcount will be decreased by 2.
Fix this by leaving dasd_change_state on rc == -EAGAIN so that the
refcount will always be decreased by 1.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Replace the remaining direct accesses to the driver_data pointer
with calls to the dev_get_drvdata() and dev_set_drvdata() functions.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The stop flags are handled in the generic restore function so the
stop flag is removed also for FBA and DIAG devices.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add Suspend/Resume support to ap bus and zcrypt. All enhancements are
done in the ap bus. No changes in the crypto card specific part are
necessary.
Signed-off-by: Felix Beck <felix.beck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove unneeded sanity checks from do_QDIO since this is the hot path.
Change the type of bufnr and count to unsigned int so the check for the
maximum value works.
Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It is not required to change the state of primed SBALs. Leaving them
primed saves a SQBS instruction under z/VM.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since the adapter interrupt tasklet only schedules the queue tasklets
and contains no code that requires serialization in can be merged
with the adapter interrupt handler. That possibly safes some CPU
cycles.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For devices without QIOASSIST primed SBALS were extracted in a loop.
Remove the loop since get_buf_states can already return more than
one primed SBAL.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The check whether qdio runs under z/VM was incorrect since SIGA-Sync is not
set if the device runs with QIOASSIST. Use MACHINE_IS_VM instead to prevent
polling under z/VM.
Merge qdio_inbound_q_done and tiqdio_is_inbound_q_done.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move the adapter interrupt tasklet function to the qdio main code
since all the functions used by the tasklet are located there.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When syncing the sclp console queue, we call del_timer_sync() while holding
the "sclp_con_lock" spinlock. This lock is also taken in the timer function
"sclp_console_timeout". Therefore the sync version of del_timer() cannot be
used here. Because the synchronous deletion of the timer is only needed
in the suspend callback and in that case only one CPU is remaining and
therefore it is not possible that the timer function is running in parallel,
we can safely use del_timer() instead of del_timer_sync().
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The slab allocator is earlier available so convert the
bootmem allocations to slab/gfp allocations.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>