[Resend patch as per Bernd Schubert comment ]
Issue:
Device goes offline while doing aggressive HBA reset
along with IO using some utility.
Root cause:
FW goes into bad state due to aggressive reset. Softreset does not
help to recover FW. And also aggressive reset open up the window for
Error handling thread to kicked off at the same time HBA will be in
constant RESET loop as part of aggressive reset test case can lead
Device to goes offline.
Changes:
1. Added extra check as below inside eh_timed_out call back as below.
if(ioc->ioc_reset_in_progress) Rc = EH_TIMER_RESET
2. Removed " DOORBELL_ACTIVE" check for SAS controller from task
management context. Since SAS controller uses high priority queue
for task management. This check is not required for SAS controller.
3. Moved SoftReset call to HardReset from Task Mgmt context.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Find Non-Operation IOC and remove it from OS: Detecting
dead(non-functional) ioc will be done reading doorbell register value
from fault reset thread, which has been called from work thread
context after each specific interval. If doorbell value is 0xFFFFFFFF,
it will be considered as IOC is non-operational and marked as dead
ioc.
Once Dead IOC has been detected, it will be removed at pci layer using
"pci_remove_bus_device" API.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Convert everything except ->proc_info() stuff, it is done within separate
->proc_info path series.
Problem with ->read_proc et al is described here commit
786d7e1612 "Fix rmmod/read/write races in
/proc entries"
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adding function name in original debug prints and few more debug prints are
added.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Issue description:
In multipath topology, when device deletion is in transient state,
multipath driver can call blk_flush_queue() as part of path failure.
Before device get deleted from OS, Device may go OFFLINE as part of error
handling kicked off triggered from multipathing driver. Above condition hits
more frequently if device missing delay timer (which is LSI specific firmware
parameter) is non zero value.
root cause of this issue is Error handling thread is getting kicked off for
device which is not really present(in transient state of deleting).
This patch has solution for this issue. driver is now using eh_timed_out
callback. See below.
mptsas_transport_template->eh_timed_out = mptsas_eh_timed_out
Using mptsas_eh_timed_out function, driver can decide weather vdevice is
under Device missing delay or deleting state.
for either of those cases, there is BLK_EH_RESET_TIMER return to scsi mid
and error handling thread will not be kicked off for that particular scsi
command.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Issue:
target reset will be queued to driver's internal queue to get schedule
later. When driver add target into internal target_reset queue we will block IOs
on those target using scsi midlayer API. Now due to some cause driver is not
executing those target_reset list and it is always in block state.
Changes:
now we are clearing target_reset queue from all other Callback context
instead of only DeviceReset context.Now wherever driver is clearing
taskmgmt_in_progress flag it is considering target_reset queue cleanup
also.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
device missing delay is 8 bit value in io unit pg1. Making correct variable
declaration for device_missing_delay.
The driver is storing the calculated device missing delay in IOC structure
as a u8 instead of a u16. It needs to be a u16 if the delay is > 255.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Message Unit Reset - instructs the IOC to reset the Reply Post and
Free FIFO's. All the Message Frames on Reply Free FIFO are
discarded. All posted buffers are freed, and event notification is
turned off. IOC doesnt reply to any outstanding request. This will
transfer IOC to READY state. Message unit ready is less expensive
operations than Hard Reset. soft reset will not force Firmware to
reload again, it only do clean up of Message units.
mpt_Soft_Hard_ResetHandler will first try for Soft Reset,if
it fails then go for big hammer reset which is Hard Reset.
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Firmware is able to handle Broadcast primitives, but upstream driver does not
have support for broadcast primitive handling. Now this patch is mainly to
support broadcast primitives.
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
1. Handle integrated Raid device(Add/Delete) and error condition and check
related to Raid device. is_logical_volume will represent logical volume
device.
2. Raid device dual port support is added. Main functions to support this
feature are mpt_raid_phys_disk_get_num_paths and mpt_raid_phys_disk_pg1.
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Resending patch considering Grants G's code review.
Main goal to submit this patch is code cleaup.
1. Better driver debug prints and code indentation.
2. fault_reset_work_lock is not used anywhere. driver is using taskmgmt_lock
instead of fault_reset_work_lock.
3. setting pci_set_drvdata properly.
4. Ingore config request when IOC is in reset state.( ioc_reset_in_progress
is set).
5. Init/clear managment frame proprely.(INITIALIZE_MGMT_STATUS and
CLEAR_MGMT_STATUS)
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
1.) SAS topology Rescan is added. If Firmware is doing Reset and we get
Device add interrupt from Firmware, we will not receive it as part of Reset
is going ON. After Reset we will do special Rescan of SAS topology.
2.) Driver version changed from 3.04.08 to 3.04.09.
Added proper lock/unlock in mptsas_not_responding_devices() as per James'
comment.
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
SAS topology scan is restructured. HBA firmware is generating more
events. Expander Events are added, Link status events are also added with
respect to SAS topology scan optimization.
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Now Firmware events are handled by firmware event queue.
Previously it was handled in interrupt context/WorkQueue of Linux.
Firmware Event handling is restructured and optimized.
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
1) rewrite of ioctl_cmds internal generated function that issue commands to
firmware, porting them to be single threaded using the generic MPT_MGMT
struct. All wait Queues are replace by completion Queue.
2) added seperate callback handler for ioctl task managment
(mptctl_taskmgmt_reply), to handle command that timeout
3) rewrite mptctl_bus_reset
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
1.) Added taskmgmt_quiesce_io flag in IOC and removed resetPending from
_MPT_SCSI_HOST struct.
2.) Reset from Scsi mid layer and internal Reset are seperate context.
Adding DeviceResetCtx for internal Device reset frame.
mptsas_taskmgmt_complete is optimized as part of implementation.
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
1.) rewrite taskmanagement request and completion routines, making them
single threaded and using the generic MPT_MGMT struct, deleting
mptscsih_TMHandler, replacing with single request TM handler
mptscsih_IssueTaskMgmt, and killing the watchdog timer functions.
2.) cleanup ioc_reset callback handlers, introducing wrappers for
synchronizing error recovery (mpt_set_taskmgmt_in_progress_flag,
mpt_clear_taskmgmt_in_progress_flag), as the fusion firmware only handles
one task management request at a time
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Rewrite of all internal generated functions that issue commands to firmware,
porting them to be single threaded using the generic MPT_MGMT
struct. Implemented using completion Queue.
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
1) Previously we had mutliple #defines to use same values.
Now those #defines are optimized.
MPT_IOCTL_STATUS_* is removed and MPT_MGMT_STATUS_* are new
#defines.
2.) config path is optimized.
Instead of wait Queue and timer, using completion Q.
3.) mpt_timer_expired is not used.
[jejb: elide patch to eliminate mpt_timer_expired]
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
sas_discovery_quiesce_io flag is used to control IO start/resume functionality.
IO will be stoped while doing discovery of topology. Once discovery is completed
It will resume IO. Resending patch including James review.
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The reason for this change is there is a data corruption when four different
physical memory regions in the 36GB to 37GB region are
accessed. This is only affecting 1078.
The solution is we need to use different addressing when filling in
the scatter gather table for the effected memory regions. So instead
of snooping on all four different memory holes, we treat any physical
addresses in the 36GB address with the same algorithm.
The fix is explained below
1) Ensure that the message frames are NOT located in the trouble
region. There is no remapping available for message frames, they must
be allocated outside the problem region.
2) Ensure that Sense buffers are NOT in the trouble region. There is
no remapping available.
3) Walk through the SGE entries and if any are inside the trouble region
then they need to be remapped as discussed below.
1) Set the Local Address bit in the SGE Flags field.
MPI_SGE_FLAGS_LOCAL_ADDRESS
2) Ensure we are using 64-bit SGEs
3) Set MSb (Bit 63) of the 64-bit address, this will indicate buffer
location is Host Memory.
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Kobjects do not have a limit in name size since a while, so stop
pretending that they do.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the firmware is in Fault state it will be identifed only when the next time
the driver access the IOC state.
This patch includes a polling function in the driver which will be executed in
regular interval to check the status of the firmware and if it is in Fault
state, then the firmware will be reset by the driver.
Signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Updating copyright statement to include the year 2008
Signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Updating driver version to 3.04.07 from 3.04.06
Signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch makes the needlessly global struct mpt_proc_root_dir static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: "Prakash, Sathya" <Sathya.Prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
While performing hardware raid reset testing via the raid's client, I
noticed that sometimes, following the reset, that there would be more
raid targets in the lsscsi output than there actually were raid
targets. I tracked this down to the following issue.
Fusion cannot always find the mptsas_portinfo structure for the hba
because it uses the handle stored in ioc->handle to locate it. The
problem is that the firmware can change the handle associated with the
hba when h/w raid is reset (via the raid client). When this happens,
the driver will allocate another mptsas_portinfo structure and link it
into the chain of said structures. This ultimately causes confusion
within the driver resulting in targets not being removed when they
should be.
Eric Moore pointed out that the hba's portinfo structure is always the
first structure on the sas_topology list. This patch modifies
mptsas.c to access the hba's portinfo structure by taking the first
structure on said list.
Signed-off-by: Michael Reed <mdr@sgi.com>
Acked-by: "Moore, Eric" <Eric.Moore@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
the semaphore inactive_list_mutex is used as a mutex, convert it to
the mutex API
Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Acked-by: "Moore, Eric" <Eric.Moore@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch modifies the driver to enable MSI by default for all SAS chips.
Signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Verified all the arches necessary select the CONFIG_64BIT symbol. This
also kills the warning (since it was using the 32-bit case) on parisc64
and mips64.
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch modifies the I/O resource allocation behavior of FUSION
driver. The current version of driver allocates the I/O resources
even if they are not required and this creates trouble in low resource
environments. This driver now uses
pci_enable_device_mem/pci_enable_device functions to differentiate the
resource allocations.
Signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch fixes the module unload problem in flash less 1030
controller environment where firmware download boot functionality is
invoked. The problem is due to the firmware download is being done in
the reverse order, which this patch solves by insureing the download
occurs to the last controller being reset.
signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cleaning up prints that use the xxx_printk API, in that the fusion
preamble "mptbase: iocX" follows the info provided by the print API.
The way its currently coded, the [H:C:T] print in sdev_printk will be
inbetween "mptbase" and "iocX", instead of before.
Signed-off-by: Eric Moore <Eric.Moore@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
ScsiLookup is an array of pending scmd pointers that the scsi lld
maintains. This array is touched from queuecommand, eh threads, and
interrupt context. This array should put under locks, hence this patch
to synchronize its access. I've added some nice little function
wrappers for this, and moved the ScsiLookup array over to MPT_ADAPTER
struct.
Signed-off-by: Eric Moore <Eric.Moore@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Recently LSI Logic Corp was renamed as LSI Corp, so whereever there is
a reference of LSI Logic, it is changed to LSI in mpt fusion driver
code.
signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
Acked-by: Eric Moore <Eric.Moore@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When there is state change in FC links, a message is displayed with
old and new link speed.
signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
Acked-by: Eric Moore <Eric.Moore@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>