Many times in libsas, and in LLDDs which use libsas, the check for an
expander device is re-implemented or open coded.
Use dev_is_expander() instead. We rename this from
sas_dev_type_is_expander() to not spill so many lines in referencing.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where
we are expecting to fall through.
Notice that, in this particular case, a dash is added as a token in order
to separate the "fall through" annotations from the rest of the comment on
the same line, which is what GCC is expecting to find.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
These enums have been separate since the dawn of SAS, mainly because the
latter is a procotol only enum and the former includes additional state
for libsas. The dichotomy causes endless confusion about which one you
should use where and leads to pointless warnings like this:
drivers/scsi/mvsas/mv_sas.c: In function 'mvs_update_phyinfo':
drivers/scsi/mvsas/mv_sas.c:1162:34: warning: comparison between 'enum sas_device_type' and 'enum sas_dev_type' [-Wenum-compare]
Fix by eliminating one of them. The one kept is effectively the sas.h
one, but call it sas_device_type and make sure the enums are all
properly namespaced with the SAS_ prefix.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
While the RNC is suspended for I/O cleanup, the remote device can be
stopped and the RNC setup for destruction. These changes accomodate that
case in the abort path.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Since there is a possibilty of a timeout waiting for the RNC suspension,
handle the exit case from the task termination under scic_lock, and leave
the tag allocated if the termination timed-out.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The ATAPI specific and STP general RNC suspension code had been
incorrectly removed from the remote device code.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Make sure that the wait for suspend can handle the RNC destruction case.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This change adds timeouts to the RNC suspension wait. It makes the
suspend and resume timeouts the same.
The previous resume timeout of 5 ms was too short, and timeouts were
seen in resumptions of devices in the abort task/LUN reset path - which
would receive an RNC resumed message within a tenth of a second later.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In the case of TMF execution, or device resets, wait for the RNC to fully
resume before returning to the caller. This ensures that the remote
device will not fail I/O requests while waiting for the RNC resumption to
complete.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When an individual request is being terminated, the request's tag
is managed in the terminate function.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch changes the callback mechanism to libsas to only occur while
the scic_lock is held; the abort path cleanup of I/Os also checks to make
sure IREQ_ABORT_PATH_ACTIVE is clear before proceding.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Completion of I/Os during the one of the abort path interface calls
from libsas can drive remote device state changes and the resumption
of the device RNC. This is a problem when the abort path is
attempting to cleanup outstanding I/O at the same time - the resumption
can prevent the termination from occuring correctly.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In order to prevent a device from receiving an I/O request while still
in an RNC suspending or resuming state (and therefore failing that
I/O back to libsas with a reset required status) wait for the RNC state
change before proceding in the abort path.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The LLHANG timer should be enabled once per device. This patch corrects
both the timer enable and the timer disable for the remote device.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In the case of a suspend call while in SCI_RNC_POSTING or INVALIDATING
states, the LLHANG detect needed to be saved so the upcoming suspension
would enable it correctly. The unused suspend callback parameters were
removed.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
For NCQ error conditions among others, there is no need to enable
the link layer hang detect timer.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The RNC can be any of the states in the loop from suspended to
ready when the API "suspend" or "resume" are called. This change
adds destination states parameters that control the suspension /
resumption action of the RNC statemachine for those transition states.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This commit changes the means by which outstanding I/Os are handled
for cleanup.
The likelihood is that this commit will be broken into smaller pieces,
however that will be a later revision. Among the changes:
- All completion structures have been removed from the tmf and
abort paths.
- Now using one completed I/O list, with the I/O completed in host bit being
used to select error or normal callback paths.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fixing the remote device state machine to suspend and terminate
all outstanding I/O before the device stopped state is reached.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When the remote device enters the NCQ error state, the device must
be suspended so that the I/O terminations can take place.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
TCs must be terminated only while the RNC is suspended. This commit
adds remote device suspensions and resumptions in the abort, reset and
termination paths.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
TCs must only be terminated when RNCs are suspended.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add comprehensive decode for all TC completions that generate RNC
suspensions.
Note that this commit also removes unconditional resumptions of ATAPI
devices when in the SCI_STP_DEV_ATAPI_ERROR state, and STP devices
when in the SCI_STP_DEV_IDLE state. This is because the SCI_STP_DEV_IDLE
and SCI_STP_DEV_ATAPI state entry functions manage the RNC resumption.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
domain_device ->parent conveys the same information.
Occurrences of ->is_direct_attached appear next to incomplete open-coded
versions of dev_is_sata(), clean those up as well.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Debugging the driver requires tracing the state transtions and tracing
state names is less work than decoding numbers.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Prior to commit 61aaff49 "isci: filter broadcast change notifications
during SMP phy resets" we borrowed the MVS_DEV_EH approach from the
mvsas driver for preventing ->lldd_I_T_nexus_reset() events during ata
discovery. This hack was protecting against the old ->phy_reset() in
ata_bus_probe(), but since the conversion to the new error handling this
hack is preventing resets from reaching ata devices.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
It only tracks whether the port is stopping in order to gate new devices
being discovered while the port is stopping. However, since the check
and subsequent handling is unlocked there is nothing to stop the port
from going down immediately after the check.
Driver is already prepared to handle devices arriving on stale ports,
and those will be cleaned up by an eventual ->lldd_dev_gone()
notification.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This field is a holdover from the OS abstraction conversion. The stable
phy to port lookups are done via iphy->ownining_port under scic_lock.
After this conversion to use port->lldd_port the only volatile lookup is
the initial lookup in isci_port_formed(). After that point any lookup
via a successfully notified domain_device is guaranteed to be valid
until the domain_device is destroyed.
Delete ->start_complete as it is only set once and is set as a
consequence of the port going link up, by definition of getting a port
formed event the port is "ready".
While we are correcting port lookups also move the asd_sas_port table
out from under the isci_port. This is to preclude any temptation to use
container_of() to convert an asd_sas_port to an isci_port, the
association is dynamic and under libsas control.
Tested-by: Maciej Trela <maciej.trela@intel.com>
[dmilburn@redhat.com: fix i686 compile error]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Arrange for task_contexts prepared for the wide targets to account for
all the attached phys in the port.
Signed-off-by: Bartek Nowakowski <bartek.nowakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The lldd does not need to look at or manage the pending device
reset bit in pending sas_tasks.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Use the existing IREQ_TMF flag as a request type indicator.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
libsas uses the LLDD abort task interface to handle I/O timeouts
in the SATA/STP and SMP discovery paths, so this change will terminate
STP/SMP requests. Also, if the device is gone, the lldd will prevent
libsas from further escalations in the error handler.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Based on original implementation from Jiangbi Liu and Maciej Trela.
ATAPI transfers happen in two-to-three stages. The two stage atapi
commands are those that include a dma data transfer. The data transfer
portion of these operations is handled by the hardware packet-dma
acceleration. The three-stage commands do not have a data transfer and
are handled without hardware assistance in raw frame mode.
stage1: transmit host-to-device fis to notify the device of an incoming
atapi cdb. Upon reception of the pio-setup-fis repost the task_context
to perform the dma transfer of the cdb+data (go to stage3), or repost
the task_context to transmit the cdb as a raw frame (go to stage 2).
stage2: wait for hardware notification of the cdb transmission and then
go to stage 3.
stage3: wait for the arrival of the terminating device-to-host fis and
terminate the command.
To keep the implementation simple we only support ATAPI packet-dma
protocol (for commands with data) to avoid needing to handle the data
transfer manually (like we do for SATA-PIO). This may affect
compatibility for a small number of devices (see
ATA_HORKAGE_ATAPI_MOD16_DMA).
If the data-transfer underruns, or encounters an error the
device-to-host fis is expected to arrive in the unsolicited frame queue
to pass to libata for disposition. However, in the DONE_UNEXP_FIS (data
underrun) case it appears we need to craft a response. In the
DONE_REG_ERR case we do receive the UF and propagate it to libsas.
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Most of these simple dereference macros are longer than their open coded
equivalent. Deleting enum sci_controller_mode is thrown in for good
measure.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The distinction between scic_sds_ scic_ and sci_ are no longer relevant
so just unify the prefixes on sci_. The distinction between isci_ and
sci_ is historically significant, and useful for comparing the old
'core' to the current Linux driver. 'sci_' represents the former core as
well as the routines that are closer to the hardware and protocol than
their 'isci_' brethren. sci == sas controller interface.
Also unwind the 'sds1' out of the parameter structs.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Remove the distinction between these two implementations and unify on
isci_host (local instances named ihost). Hmmm, we had two
'oem_parameters' instances, one was unused... nice.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Remove the distinction between these two implementations and unify on
isci_remote_device (local instances named idev).
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Remove the distinction between these two implementations and unify on
isci_port (local instances named iport). The duplicate '->owning_port' and
'->isci_port' in both isci_phy and isci_remote_device will be fixed in a later
patch... this is just the straightforward rename/unification.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Commit 0815632 "isci: unify remote_device stop_handlers" introduced the
possibility that not all requests get terminated if we reach the
request_count. Now that we properly reference count devices we don't
need this self-defense and can do the straightforward scan of all active
requests.
Reported-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Acked-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
They are one in the same object so remove the distinction. The near
duplicate fields (owning_port, and isci_port) will be cleaned up
after the scic_sds_port isci_port unification.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
They are one in the same object so remove the distinction. The near
duplicate fields (owning_controller, and isci_host) will be cleaned up
after the scic_sds_contoller isci_host unification.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
the dma_pool interface is optimized for object_size << page_size which
is not the case with isci_request objects and the dma_pool routines show
up in the top of the profile.
The old io_request_table which tracked whether tci slots were in-flight
or not is replaced with an IREQ_ACTIVE flag per request.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When the remote device transitions to a not-ready state because of
an NCQ error condition, all outstanding requests to that device
are terminated and completed to libsas on the normal path. The
device then waits for a READ LOG EXT command to issue on the task
management path.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that we have upleveled device reassignment protection to the
isci_remote_device reference count we no longer need this level of
self-defense.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that "stopping/stopped" are one in the same and signalled by a NULL device
pointer the rest of the device status infrastructure can be removed (->status
and ->state_lock). The "not ready for i/o state" is replaced with a state
flag, and is evaluated under scic_lock so that we don't see transients from
taking the device reference to submitting the i/o.
This also fixes a potential leakage of can_queue slots in the rare case that
SAS_TASK_ABORTED is set at submission.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We have unsafe references to remote devices that are notified to
disappear at lldd_dev_gone. In order to clean this up we need a single
canonical source for device lookups and stable references once a lookup
succeeds. Towards that end guarantee that domain_device.lldd_dev is
NULL as soon as we start the process of stopping a device. Any code
path that wants to safely lookup a remote device must do so through
task->dev->lldd_dev (isci_lookup_device()).
For in-flight references outside of scic_lock we need reference counting
to ensure that the device is not recycled before we are done with it.
Simplify device back references to just scic_sds_request.target_device
which is now the only permissible internal reference that is maintained
relative to the reference count.
There were two occasions where we wanted new i/o's to be treated as
SAS_TASK_UNDELIVERED but where the domain_dev->lldd_dev link is still
intact. Introduce a 'gone' flag to prevent i/o while waiting for libsas
to take action on the port down event.
One 'core' leftover is that we currently call
scic_remote_device_destruct() from isci_remote_device_deconstruct()
which is called when the 'core' says the device is stopped. It would be
more natural for the final put to trigger
isci_remote_device_deconstruct() but this implementation is deferred as
it requires other changes.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>