Currently, when a trunk link goes down due to some fault, the driver
snapshots the fault code. If the link then comes back up, meaning there is
no fault, the driver is not clearing the fault code so the sysfs link_state
entry reports old/stale data.
Revise the logic so that on successful link up the fault code is cleared.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The existing MDS loopback diagnostics support processing received frames in
the slowpath work thread. It caps the number of frames it will process at
64, before waiting for another event to indicate additional frame
reception. The net-net is this results in very slow frame processing during
loopback tests and sometimes orphans an io, causing the loopback test to
report failure by the switch.
Move MDS loopback frame processing out of the slow path worker thread and
into the normal RQ processing routines.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the adapter is taken offline, the trunk link port attributes continue to
report trunk links as up even though all links are down as the adapter is
offline.
Clear the trunk links state as part of taking the adapter offline.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Certain older adapters such as the OneConnect OCe10100 may not have a valid
wqpcnt value. In this case, do not set queue->page_count to 0 in
lpfc_sli4_queue_alloc() as this will prevent the driver from initializing.
Fixes: 895427bd01 ("scsi: lpfc: NVME Initiator: Base modifications")
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Most SCSI drivers want to enable "clustering", that is merging of
segments so that they might span more than a single page. Remove the
ENABLE_CLUSTERING define, and require drivers to explicitly set
DISABLE_CLUSTERING to disable this feature.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Addition of support for if_type=6 missed several checks for interface type,
resulting in the failure of several key management features such as
firmware dump and loopback testing.
Correct the checks on the if_type so that both SLI4 IF_TYPE's 2 and 6 are
supported.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This reverts commit 287aba2592.
We killed the bad firmware and this mod is no longer necessary.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlwNpb0eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGwGwH/00UHnXfxww3ixxz
zwTVDzptA6SPm6s84yJOWatM5fXhPiAltZaHSYF9lzRzNU71NCq7Frhq3fQUIXKM
OxqDn9nfSTWcjWTk2q5n2keyRV/KIn67YX7UgqFc1bO/mqtVjEgNWaMyblhI+e9E
giu1ZXayHr43jK1cDOmGExZubXUq7Vsc9TOlrd+d2SwIqeEP7TCMrPhnHDwCNvX2
UU5dtANpVzGtHaBcr37wJj+L8kODCc0f+PQ3g2ar5jTHst5SLlHp2u0AMRnUmgdi
VkGx+mu/uk8mtwUqMIMqhplklVoqK6LTeLqsY5Xt32SKruw9UqyJGdphLjW2QP/g
MkmA1lI=
=7kaD
-----END PGP SIGNATURE-----
Merge tag 'v4.20-rc6' into for-4.21/block
Pull in v4.20-rc6 to resolve the conflict in NVMe, but also to get the
two corruption fixes. We're going to be overhauling the direct dispatch
path, and we need to do that on top of the changes we made for that
in mainline.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Update the driver version to 12.0.0.9
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When dif and first burst is used in a write command wqe, the driver was not
properly setting fields in the io command request. This resulted in no dif
bytes being sent and invalid xfer_rdy's, resulting in the io being aborted
by the hardware.
Correct the wqe initializaton when both dif and first burst are used.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On driver termination, after the driver stops fw logging by writing a
register on the chip, the driver immediately unmaps and frees the logging
buffer, without confirming in any way that the chip has received the write
and terminated the logging. As termination on the chip is not immediate,
the chip may issue a dma request to the now unmapped dma buffer, resulting
in a iommu fault.
Change the driver to receive a confirmation that logging ahs been
terminated. As the driver always issues an SLI reset with the device as
part of shutdown, and as part of that is receiving confirmation that the
reset is complete - the driver was modified to perform the write to disable
fw logging prior to the SLI reset and only free the fw log buffer after the
SLI reset is complete. That guarantees use of the fw log buffer is fully
terminated when it is unmapped.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver missed classifying the chip type for G7 when reporting supported
topologies. This resulted in loop being shown as supported on FC links that
are not supported per the standard.
Add the chip classifications to the topology checks in the driver.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver is setting bits in word 10 of the SLI4 ABORT WQE (the wqid). The
field was a carry over from a prior SLI revision. The field does not exist
in SLI4, and the action may result in an overlap with future definition of
the WQE.
Remove the setting of WQID in the ABORT WQE.
Also cleaned up WQE field settings - initialize to zero, don't bother to
set fields to zero.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current discovery state machine the driver treated FLOGI oddly. When
point to point, an FLOGI is to be exchanged by the two ports, with the port
with the most significant WWN then proceeding with PLOGI. The
implementation in the driver was keyed to closely with "what have I sent",
not with what has happened between the two endpoints. Thus, it blatantly
would ACC an FLOGI, but reject PLOGI's until it had its FLOGI ACC'd. The
problem is - the sending of FLOGI may be delayed for some reason, or the
response to FLOGI held off by the other side. In the failing situation the
other side sent an FLOGI, which was ACC'd, then sent PLOGIs which were then
rjt'd until the retry count for the PLOGIs were exceeded and the port gave
up. The FLOGI may have been very late in transmit, or the response held off
until the PLOGIs failed. Given the other port had the higher WWN, no PLOGIs
would occur and communication stopped.
Correct the situation by changing the FLOGI handling. Defer any response to
an FLOGI until the driver has sent its FLOGI as well. Then, upon either
completion of the sent FLOGI, or upon sending an ACC to a received FLOGI
(which may be received before or just after FLOGI was sent). the driver
will act on who has the higher WWN. if the other port does, the driver will
noop any handling of an FLOGI response (if outstanding) and wait for PLOGI.
If the local port does, the driver will transition to sending PLOGI and
will noop any action on responding to an FLOGI (if not yet received).
Fortunately, to implement this, it only took another state flag and
deferring any FLOGI response if the FLOGI has yet to be transmit. All
subsequent actions were already in place.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In some link initialization sequences, the fw generates an erroneous FLOGI
payload to the driver without an intervening link bounce. The driver, when
it sees a 2nd FLOGI without an intervening link bounce, automatically
performs a link bounce. In this, the link bounce causes the situate to
repeat and in a nasty loop of link bounces.
Resolve the issue by validating the FLOGI payload. The erroneous FLOGI will
contain VVL signatures that are not normal. When the driver sees these, it
will simply reject the flogi rather than bouncing the link. The reject is
consumed within the firmware.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Two initiator ports were cable swapped and after swap both went down. The
driver internally swaps the nlp nodes based on matching node wwn's but not
the same nport id as before. After detecting a change in the nodes RPI, the
driver sends an UNREG_RPI command and clears the NLP_RPI_REGISTERED flag,
then swaps the node information with the other node. But the other node's
NLP_RPI_REGISTERED flag is also cleared, but it is done so without an
UNREG_RPI being sent, which causes the later REG_RPI for that other node to
fail as the hardware believes its still registered.
Additionally, if the node swap occurred while the two nodes had PLOGI's in
flight, the fc4_types weren't properly getting swapped such that when the
PLOGIs commpleted and PRLI's were then sent, the PRLI's acted on bad
protocol types so the PRLI was for the wrong protocol. NVME devices saw
SCSI FCP PRLIs and vice versa.
Clean up the node swap so that the NLP_RPI_REGISTERED flag is handled
properly.
Fix the handling of the fc4_types when the nodes are swapped as well
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Depending on the chipset, the number of NPIV vports may vary and be in
excess of what most switches support (256). To avoid confusion with the
users, limit the reported NPIV vports to 256.
Additionally correct the 16G adapter which is reporting a bogus NPIV vport
number if the link is down.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver is hitting null pring pointers in lpfc_do_work().
Pointer assignment occurs based on SLI-revision. If recovering after an
error, its possible the sli revision for the port was cleared, making the
lpfc_phba_elsring() not return a ring pointer, thus the null pointer.
Add SLI revision checking to lpfc_phba_elsring() and status checking to all
callers.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Renumber one of the 0711 log messages so there isn't a duplication.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is getting hit with 100s of RSCNs during remote port address
changes. Each of those RSCN's ends up generating UNREG_RPI and REG_PRI
mailbox commands. The discovery engine within the driver doesn't wait for
the mailbox command completions. Instead it sets state flags and moves
forward. At some point, there's a massive backlog of mailbox commands which
take time for the adapter to process. Additionally, it appears there were
duplicate events from the switch so the driver generated duplicate mailbox
commands for the same remote port. During this window, failures on PLOGI
and PRLI ELS's are see as the adapter is rejecting them as they are for
remote ports that still have pending mailbox commands.
Streamline the discovery engine so that PLOGI log checks for outstanding
UNREG_RPIs and defer the processing until the commands complete. This
better synchronizes the ELS transmission vs the RPI registrations.
Filter out multiple UNREG_RPIs being queued up for the same remote port.
Beef up log messages in this area.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver data structure for managing a mailbox command contained two
context fields. Unfortunately, the context were considered "generic" to be
used at the whim of the command code. Of course, one section of code used
fields this way, while another did it that way, and eventually there were
mixups.
Refactored the structure so that the generic contexts become a node context
and a buffer context and all code standardizes on their use.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update manufacturer attribute to reflect Broadcom Inc, not Emulex
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While trying to get adapter fw-log for a function whose buffsize was set to
0, kernel panic occurred.
When buffsize is 0, the kernel buffer for the log won't be allocated. When
fw log usage was enabled, it failed to check the buffer size, and log usage
was started. Eventually the driver referenced the unallocated log buffer.
Added checks of the buffer size before allowing fw logging to be enabled
and added check for valid buffer if enabling fw log.
Performed a couple other minor cleanups while fixing this:
- clarified log messages
- re-evaluated log message severity
- treat any error as an error, not only a couple codes
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since f44ac12f1d, BG enablement is tracked with the LPFC_SLI3_BG_ENABLED
bit, which is set in lpfc_get_cfgparam before lpfc_sli_config_sli_port() is
called. The bit shouldn't be cleared before checking the feature. Based on
problem analysis by David Bond.
Fixes: f44ac12f1d "scsi: lpfc: Memory allocation error during driver start-up on power8"
Tested-by: David Bond <dbond@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Cc: stable@vger.kernel.org # 4.17.x
Cc: stable@vger.kernel.org # 4.18.x
Cc: stable@vger.kernel.org # 4.19.x
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Replaced dma_alloc_coherent + memset with dma_zalloc_coherent.
Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlvx2sAeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGycgIAIuxobwt0RRKa0zO
ROS+34JGoC2yU2P9VdEGWdtxS6ANMVQgKPBhWL6s+xR89Kd+V4xSdJLD1pNTxxqP
0DCva0np1/Q4juH+JbU50v/lykoLgteZ0P0LBRGf1y8p3WiLPv45IbnNsMDNYhB2
7a8rOmZYakRY9CPznRDw3X8cJt3sddKgFJHIOGz1OQJVWtCD0KPGcJmQNsbDSagY
Zx6Z5BKSIdjRqaAdN5gDa1Pft3WQo7TpaQGl80lSsgr5LcjmscXA3sClOCy+25Mo
FZLx0PcwP+Efq8RTGzNK51WSOMa6d37hvjDqUAdQBOR0KbyjRyXQwyQVw/MGbPJs
7J3Pzm0=
=56Mt
-----END PGP SIGNATURE-----
Merge tag 'v4.20-rc3' into for-4.21/block
Merge in -rc3 to resolve a few conflicts, but also to get a few
important fixes that have gone into mainline since the block
4.21 branch was forked off (most notably the SCSI queue issue,
which is both a conflict AND needed fix).
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The driver currently uses pci_set_dma_mask despite otherwise using the
generic DMA API. Switch it over to the better generic DMA API.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This removes the legacy (non-mq) IO path for SCSI.
Cc: linux-scsi@vger.kernel.org
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Update the driver version to 12.0.0.8
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add trunking support to the driver. Trunking is found on more recent
asics. In general, trunking appears as a single "port" to the driver
and overall behavior doesn't differ. Link speed is reported as an
aggregate value, while link speed control is done on a per-physical
link basis with all links in the trunk symmetrical. Some commands
returning port information are updated to additionally provide
trunking information. And new ACQEs are generated to report physical
link events relative to the trunk.
This patch contains the following modifications:
- Added link speed settings of 128GB and 256GB.
- Added handling of trunk-related ACQEs, mainly logging and trapping
of physical link statuses.
- Added additional bsg interface to query trunk state by applications.
- Augment link_state sysfs attribtute to display trunk link status
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The switches seem to respond faster to GID_PT vs GID_FT NameServer
queries. Add support for GID_PT to be used over GID_FT to enable
faster storage failover detection. Includes addition of new module
parameter to select between GID_PT and GID_FT (GID_FT is default).
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
An address change for a remote port cause PRLI for the wrong protocol
to be sent. The node copy done in the discovery code skipped copying
the fc4 protocols supported as well.
Fix the copy logic for the address change. Beefed up log messages in
this area as well.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Testing a point-to-point topology and a case of re-FLOGI without
intervening link bouncing, showed an odd interaction with firmware and
a resulting scenario where the driver no longer probed after accepting
the new FLOGI.
Work around the firmware issue by issuing a link bounce if a FLOGI is
received after the link is already up and FLOGI's accepted.
While debugging the issue, realized that some debug traces should be
clarified to help in the future.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When LCB's are rejected, if beaconing was already in progress, the
Reason Code Explanation was not being set. Should have been set to
command in progress.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On FCoE adapters, when running link bounce test in a loop, initiator
failed to login with switch switch and required driver reload to
recover. Switch reached a point where all subsequent FLOGIs would be
LS_RJT'd. Further testing showed the condition to be related to not
performing FCF discovery between FLOGI's.
Fix by monitoring FLOGI failures and once a repeated error is seen
repeat FCF discovery.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch corrects two issues:
- An oops would occur if reading based on a non-zero offset. Offset
calculation was incorrect.
- Updates to ras config (logging level) were ignored if change was
made while fw logging was enabled. Revise to dynamically update.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, PLOGI failures are infinitely delayed/retried. There have
been some fabric situations where the PLOGI's were to the nameserver
and it stopped responding. The retries would never clear up. A better
resolution in this situation is to retry a couple of times, then drop
the link and reinit. This brings back connectivity to the nameserver.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After a LOGO in response to an ABTS timeout, a PLOGI wasn't issued to
re-establish the login. An nlp_type check in the LOGO completion
handler failed to restart discovery for NVME targets. Revised the
nlp_type check for NVME as well as SCSI.
While reviewing the LOGO handling a few other issues were seen and
were addressed:
- Better lock synchronization around ndlp data types
- When the ABTS times out, unregister the RPI before sending the LOGO
so that all local exchange contexts are cleared and nothing received
while awaiting LOGO/PLOGI handling will be accepted.
- LOGO handling optimized to:
Wait only R_A_TOV for a response.
It doesn't need to be retried on timeout. If there wasn't a
response, a PLOGI will be sent, thus an implicit logout
applies as well when the other port sees it.
If there is a response, any kind of response is considered "good"
and the XRI quarantined for a exchange qualifier window.
- PLOGI is issued as soon a LOGO state is resolved.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
An error is an error - but not to the existing return value check.
Revise check to handle any failure, not just EIO.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Supported speeds is not updated when SFP is removed or replaced
Supported speed is obtained from lmt field in READ_CONFIG mailbox
response. Driver updates supported speeds only once from PCI probe
path. After that it is never updated. So, supported speeds remains the
same till reboot or driver reload.
When SFP is removed or inserted, driver gets SLI-Port Event ACQE. If
SFP is removed, lmt wil have value 0. If a different SFP is inserted,
lmt will have value according to its supported speeds. So, afterr
SLI-Port Event ACQE handling path, send READ_CONFIG mailbox and update
supported speeds. If READ_CONFIG fails, set supported speeds to
unknown and log.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The addition of a spinlock in lpfc_debugfs_nodelist_data() introduced
a bug that lets us not skip NULL pointers correctly, as noticed by
gcc-8:
drivers/scsi/lpfc/lpfc_debugfs.c: In function 'lpfc_debugfs_nodelist_data.constprop':
drivers/scsi/lpfc/lpfc_debugfs.c:728:13: error: 'nrport' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (nrport->port_role & FC_PORT_ROLE_NVME_INITIATOR)
This changes the logic back to what it was, while keeping the added
spinlock.
Fixes: 9e21017826 ("scsi: lpfc: Synchronize access to remoteport via rport")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly updates of the usual drivers: UFS, esp_scsi, NCR5380,
qla2xxx, lpfc, libsas, hisi_sas. In addition there's a set of mostly
small updates to the target subsystem a set of conversions to the
generic DMA API, which do have some potential for issues in the older
drivers but we'll handle those as case by case fixes. A new myrs for
the DAC960/mylex raid controllers to replace the block based DAC960
which is also being removed by Jens in this merge window. Plus the
usual slew of trivial changes.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCW9BQJSYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishU3MAP41T8yW
UJQDCprj65pCR+9mOUWzgMvgAW/15ouK89x/7AD/XAEQZqoAgpFUbgnoZWGddZkS
LykIzSiLHP4qeDOh1TQ=
=2JMU
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual drivers: UFS, esp_scsi, NCR5380,
qla2xxx, lpfc, libsas, hisi_sas.
In addition there's a set of mostly small updates to the target
subsystem a set of conversions to the generic DMA API, which do have
some potential for issues in the older drivers but we'll handle those
as case by case fixes.
A new myrs driver for the DAC960/mylex raid controllers to replace the
block based DAC960 which is also being removed by Jens in this merge
window.
Plus the usual slew of trivial changes"
[ "myrs" stands for "MYlex Raid Scsi". Obviously. Silly of me to even
wonder. There's also a "myrb" driver, where the 'b' stands for
'block'. Truly, somebody has got mad naming skillz. - Linus ]
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (237 commits)
scsi: myrs: Fix the processor absent message in processor_show()
scsi: myrs: Fix a logical vs bitwise bug
scsi: hisi_sas: Fix NULL pointer dereference
scsi: myrs: fix build failure on 32 bit
scsi: fnic: replace gross legacy tag hack with blk-mq hack
scsi: mesh: switch to generic DMA API
scsi: ips: switch to generic DMA API
scsi: smartpqi: fully convert to the generic DMA API
scsi: vmw_pscsi: switch to generic DMA API
scsi: snic: switch to generic DMA API
scsi: qla4xxx: fully convert to the generic DMA API
scsi: qla2xxx: fully convert to the generic DMA API
scsi: qla1280: switch to generic DMA API
scsi: qedi: fully convert to the generic DMA API
scsi: qedf: fully convert to the generic DMA API
scsi: pm8001: switch to generic DMA API
scsi: nsp32: switch to generic DMA API
scsi: mvsas: fully convert to the generic DMA API
scsi: mvumi: switch to generic DMA API
scsi: mpt3sas: switch to generic DMA API
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlvPV7IUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vyaUg//WnCaRIu2oKOp8c/bplZJDW5eT10d
oYAN9qeyptU9RYrg4KBNbZL9UKGFTk3AoN5AUjrk8njxc/dY2ra/79esOvZyyYQy
qLXBvrXKg3yZnlNlnyBneGSnUVwv/kl2hZS+kmYby2YOa8AH/mhU0FIFvsnfRK2I
XvwABFm2ZYvXCqh3e5HXaHhOsR88NQ9In0AXVC7zHGqv1r/bMVn2YzPZHL/zzMrF
mS79tdBTH+shSvchH9zvfgIs+UEKvvjEJsG2liwMkcQaV41i5dZjSKTdJ3EaD/Y2
BreLxXRnRYGUkBqfcon16Yx+P6VCefDRLa+RhwYO3dxFF2N4ZpblbkIdBATwKLjL
npiGc6R8yFjTmZU0/7olMyMCm7igIBmDvWPcsKEE8R4PezwoQv6YKHBMwEaflIbl
Rv4IUqjJzmQPaA0KkRoAVgAKHxldaNqno/6G1FR2gwz+fr68p5WSYFlQ3axhvTjc
bBMJpB/fbp9WmpGJieTt6iMOI6V1pnCVjibM5ZON59WCFfytHGGpbYW05gtZEod4
d/3yRuU53JRSj3jQAQuF1B6qYhyxvv5YEtAQqIFeHaPZ67nL6agw09hE+TlXjWbE
rTQRShflQ+ydnzIfKicFgy6/53D5hq7iH2l7HwJVXbXRQ104T5DB/XHUUTr+UWQn
/Nkhov32/n6GjxQ=
=58I4
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- Fix ASPM link_state teardown on removal (Lukas Wunner)
- Fix misleading _OSC ASPM message (Sinan Kaya)
- Make _OSC optional for PCI (Sinan Kaya)
- Don't initialize ASPM link state when ACPI_FADT_NO_ASPM is set
(Patrick Talbert)
- Remove x86 and arm64 node-local allocation for host bridge structures
(Punit Agrawal)
- Pay attention to device-specific _PXM node values (Jonathan Cameron)
- Support new Immediate Readiness bit (Felipe Balbi)
- Differentiate between pciehp surprise and safe removal (Lukas Wunner)
- Remove unnecessary pciehp includes (Lukas Wunner)
- Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner)
- Tolerate PCIe Slot Presence Detect being hardwired to zero to
workaround broken hardware, e.g., the Wilocity switch/wireless device
(Lukas Wunner)
- Unify pciehp controller & slot structs (Lukas Wunner)
- Constify hotplug_slot_ops (Lukas Wunner)
- Drop hotplug_slot_info (Lukas Wunner)
- Embed hotplug_slot struct into users instead of allocating it
separately (Lukas Wunner)
- Initialize PCIe port service drivers directly instead of relying on
initcall ordering (Keith Busch)
- Restore PCI config state after a slot reset (Keith Busch)
- Save/restore DPC config state along with other PCI config state
(Keith Busch)
- Reference count devices during AER handling to avoid race issue with
concurrent hot removal (Keith Busch)
- If an Upstream Port reports ERR_FATAL, don't try to read the Port's
config space because it is probably unreachable (Keith Busch)
- During error handling, use slot-specific reset instead of secondary
bus reset to avoid link up/down issues on hotplug ports (Keith Busch)
- Restore previous AER/DPC handling that does not remove and
re-enumerate devices on ERR_FATAL (Keith Busch)
- Notify all drivers that may be affected by error recovery resets
(Keith Busch)
- Always generate error recovery uevents, even if a driver doesn't have
error callbacks (Keith Busch)
- Make PCIe link active reporting detection generic (Keith Busch)
- Support D3cold in PCIe hierarchies during system sleep and runtime,
including hotplug and Thunderbolt ports (Mika Westerberg)
- Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots
are empty or occupied (Jon Derrick)
- Remove duplicated include from pci/pcie/err.c and unused variable
from cpqphp (YueHaibing)
- Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza
Pawandeep)
- Uninline PCI bus accessors for better ftracing (Keith Busch)
- Remove unused AER Root Port .error_resume method (Keith Busch)
- Use kfifo in AER instead of a local version (Keith Busch)
- Use threaded IRQ in AER bottom half (Keith Busch)
- Use managed resources in AER core (Keith Busch)
- Reuse pcie_port_find_device() for AER injection (Keith Busch)
- Abstract AER interrupt handling to disconnect error injection (Keith
Busch)
- Refactor AER injection callbacks to simplify future improvments
(Keith Busch)
- Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski)
- Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko)
- Add switch fall-through annotations (Gustavo A. R. Silva)
- Remove unused Switchtec quirk variable (Joshua Abraham)
- Fix pci.c kernel-doc warning (Randy Dunlap)
- Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig)
- Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng)
- Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid
useless dmesg errors (Logan Gunthorpe)
- Update Switchtec NTB documentation (Wesley Yung)
- Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz)
- Avoid panic when drivers enable MSI/MSI-X twice (Tonghao Zhang)
- Add PCI support for peer-to-peer DMA (Logan Gunthorpe)
- Add sysfs group for PCI peer-to-peer memory statistics (Logan
Gunthorpe)
- Add PCI peer-to-peer DMA scatterlist mapping interface (Logan
Gunthorpe)
- Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan
Gunthorpe)
- Add PCI peer-to-peer DMA driver writer's documentation (Logan
Gunthorpe)
- Add block layer flag to indicate driver support for PCI peer-to-peer
DMA (Logan Gunthorpe)
- Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P
memory (Logan Gunthorpe)
- Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan
Gunthorpe)
- Add nvme-pci support for PCI peer-to-peer memory in requests (Logan
Gunthorpe)
- Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise,
Christoph Hellwig, Logan Gunthorpe)
- Cache VF config space size to optimize enumeration of many VFs
(KarimAllah Ahmed)
- Remove unnecessary <linux/pci-ats.h> include (Bjorn Helgaas)
- Fix VMD AERSID quirk Device ID matching (Jon Derrick)
- Fix Cadence PHY handling during probe (Alan Douglas)
- Signal Cadence Endpoint interrupts via AXI region 0 instead of last
region (Alan Douglas)
- Write Cadence Endpoint MSI interrupts with 32 bits of data (Alan
Douglas)
- Remove redundant controller tests for "device_type == pci" (Rob
Herring)
- Document R-Car E3 (R8A77990) bindings (Tho Vu)
- Add device tree support for R-Car r8a7744 (Biju Das)
- Drop unused mvebu PCIe capability code (Thomas Petazzoni)
- Add shared PCI bridge emulation code (Thomas Petazzoni)
- Convert mvebu to use shared PCI bridge emulation (Thomas Petazzoni)
- Add aardvark Root Port emulation (Thomas Petazzoni)
- Support 100MHz/200MHz refclocks for i.MX6 (Lucas Stach)
- Add initial power management for i.MX7 (Leonard Crestez)
- Add PME_Turn_Off support for i.MX7 (Leonard Crestez)
- Fix qcom runtime power management error handling (Bjorn Andersson)
- Update TI dra7xx unaligned access errata workaround for host mode as
well as endpoint mode (Vignesh R)
- Fix kirin section mismatch warning (Nathan Chancellor)
- Remove iproc PAXC slot check to allow VF support (Jitendra Bhivare)
- Quirk Keystone K2G to limit MRRS to 256 (Kishon Vijay Abraham I)
- Update Keystone to use MRRS quirk for host bridge instead of open
coding (Kishon Vijay Abraham I)
- Refactor Keystone link establishment (Kishon Vijay Abraham I)
- Simplify and speed up Keystone link training (Kishon Vijay Abraham I)
- Remove unused Keystone host_init argument (Kishon Vijay Abraham I)
- Merge Keystone driver files into one (Kishon Vijay Abraham I)
- Remove redundant Keystone platform_set_drvdata() (Kishon Vijay
Abraham I)
- Rename Keystone functions for uniformity (Kishon Vijay Abraham I)
- Add Keystone device control module DT binding (Kishon Vijay Abraham
I)
- Use SYSCON API to get Keystone control module device IDs (Kishon
Vijay Abraham I)
- Clean up Keystone PHY handling (Kishon Vijay Abraham I)
- Use runtime PM APIs to enable Keystone clock (Kishon Vijay Abraham I)
- Clean up Keystone config space access checks (Kishon Vijay Abraham I)
- Get Keystone outbound window count from DT (Kishon Vijay Abraham I)
- Clean up Keystone outbound window configuration (Kishon Vijay Abraham
I)
- Clean up Keystone DBI setup (Kishon Vijay Abraham I)
- Clean up Keystone ks_pcie_link_up() (Kishon Vijay Abraham I)
- Fix Keystone IRQ status checking (Kishon Vijay Abraham I)
- Add debug messages for all Keystone errors (Kishon Vijay Abraham I)
- Clean up Keystone includes and macros (Kishon Vijay Abraham I)
- Fix Mediatek unchecked return value from devm_pci_remap_iospace()
(Gustavo A. R. Silva)
- Fix Mediatek endpoint/port matching logic (Honghui Zhang)
- Change Mediatek Root Port Class Code to PCI_CLASS_BRIDGE_PCI (Honghui
Zhang)
- Remove redundant Mediatek PM domain check (Honghui Zhang)
- Convert Mediatek to pci_host_probe() (Honghui Zhang)
- Fix Mediatek MSI enablement (Honghui Zhang)
- Add Mediatek system PM support for MT2712 and MT7622 (Honghui Zhang)
- Add Mediatek loadable module support (Honghui Zhang)
- Detach VMD resources after stopping root bus to prevent orphan
resources (Jon Derrick)
- Convert pcitest build process to that used by other tools (iio, perf,
etc) (Gustavo Pimentel)
* tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
PCI/AER: Refactor error injection fallbacks
PCI/AER: Abstract AER interrupt handling
PCI/AER: Reuse existing pcie_port_find_device() interface
PCI/AER: Use managed resource allocations
PCI: pcie: Remove redundant 'default n' from Kconfig
PCI: aardvark: Implement emulated root PCI bridge config space
PCI: mvebu: Convert to PCI emulated bridge config space
PCI: mvebu: Drop unused PCI express capability code
PCI: Introduce PCI bridge emulated config space common logic
PCI: vmd: Detach resources after stopping root bus
nvmet: Optionally use PCI P2P memory
nvmet: Introduce helper functions to allocate and free request SGLs
nvme-pci: Add support for P2P memory in requests
nvme-pci: Use PCI p2pmem subsystem to manage the CMB
IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]()
block: Add PCI P2P flag for request queue
PCI/P2PDMA: Add P2P DMA driver writer's documentation
docs-rst: Add a new directory for PCI documentation
PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers
PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
...
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/lpfc/lpfc_debugfs.c: In function 'lpfc_debugfs_nodelist_data':
drivers/scsi/lpfc/lpfc_debugfs.c:553:29: warning:
variable 'tgtp' set but not used [-Wunused-but-set-variable]
It never used since 2b65e18202 ("scsi: lpfc: NVME Target: Add debugfs support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/lpfc/lpfc_hbadisc.c: In function 'lpfc_free_tx':
drivers/scsi/lpfc/lpfc_hbadisc.c:5431:19: warning:
variable 'psli' set but not used [-Wunused-but-set-variable]
Since commit 895427bd01 ("scsi: lpfc: NVME Initiator: Base modifications")
'psli' is not used any more.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_sli4_sp_handle_rcqe':
drivers/scsi/lpfc/lpfc_sli.c:13430:26: warning:
variable 'fc_hdr' set but not used [-Wunused-but-set-variable]
drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_cq_create':
drivers/scsi/lpfc/lpfc_sli.c:14852:11: warning:
variable 'hw_page_size' set but not used [-Wunused-but-set-variable]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Trivial fix to spelling mistake in lpfc_printf_log message text.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
dma_alloc_coherent allocates memory that can be used by the cpu and the
device at the same time, calls to pci_dma_sync_* are not required, and in
fact actively harmful on some architectures like arm.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After bfcb79fca1 ("PCI/ERR: Run error recovery callbacks for all affected
devices"), AER errors are always cleared by the PCI core and drivers don't
need to do it themselves.
Remove calls to pci_cleanup_aer_uncorrect_error_status() from device
driver error recovery functions.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog, remove PCI core changes, remove unused variables]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The driver currently uses the ndlp to get the local rport which is then used
to get the nvme transport remoteport pointer. There can be cases where a stale
remoteport pointer is obtained as synchronization isn't done through the
different dereferences.
Correct by using locks to synchronize the dereferences.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/lpfc/lpfc_nvme.c: In function 'lpfc_new_nvme_buf':
drivers/scsi/lpfc/lpfc_nvme.c:2238:24: warning:
variable 'sgl_size' set but not used [-Wunused-but-set-variable]
int bcnt, num_posted, sgl_size;
^
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 12.0.0.7
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch adds the ability to read firmware logs from the adapter. The driver
registers a buffer with the adapter that is then written to by the adapter.
The adapter posts CQEs to indicate content updates in the buffer. While the
adapter is writing to the buffer in a circular fashion, an application will
poll the driver to read the next amount of log data from the buffer.
Driver log buffer size is configurable via the ras_fwlog_buffsize sysfs
attribute. Verbosity to be used by firmware when logging to host memory is
controlled through the ras_fwlog_level attribute. The ras_fwlog_func
attribute enables or disables loggy by firmware.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, on each io completion, the stats update routine indiscriminately
holds a lock. While holding the adapter-wide lock, checks are made to check
whether status are being tracked. When disabled (the default), the locking
wasted a lot of cycles.
Check for stats enablement before taking the lock.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Message 6408 is displayed for each entry in an array, but the cpu and queue
numbers were incorrect for the entry. Message 6001 includes an extraneous
character.
Resolve both issues
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During attachment, the driver writes the EQ doorbell to disable potential
interrupts from an EQ. The current EQ doorbell format used for clearing the
interrupt is incorrect and uses an if_type=2 format, making the operation act
on the wrong EQ.
Correct the code to use the proper if_type=6 EQ doorbell format.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When taking the board offline while performing i/o, unsafe locking errors
occurred and irq level isn't properly managed.
In lpfc_sli_hba_down, spin_lock_irqsave(&phba->hbalock, flags) does not
disable softirqs raised from timer expiry. It is possible that a softirq is
raised from the lpfc_els_retry_delay routine and recursively requests the same
phba->hbalock spinlock causing deadlock.
Address the deadlocks by creating a new port_list lock. The softirq behavior
can then be managed a level deeper into the calling sequences.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When running an mds diagnostic that passes frames with the switch, soft
lockups are detected. The driver is in a CQE processing loop and has
sufficient amount of traffic that it never exits the ring processing routine,
thus the "lockup".
Cap the number of elements in the work processing routine to 64 elements. This
ensures that the cpu will be given up and the handler reschedule to process
additional items.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On io completion, the driver is taking an adapter wide lock and nulling the
scsi command back pointer. The nulling of the back pointer is to signify the
io was completed and the scsi_done() routine was called. However, the routine
makes no check to see if the abort routine had done the same thing and
possibly nulled the pointer. Thus it may doubly-complete the io.
Make the following mods:
- Check to make sure forward progress (call scsi_done()) only happens if the
command pointer was non-null.
- As the taking of the lock, which is adapter wide, is very costly on a system
under load, null the pointer using an xchg operation rather than under lock.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When nvme is enabled, change the default for two parameters:
sg_seg_cnt - raise the per-io sg list size so that 1MB ios are
supported (based on a 4k buffer per element).
iocb_cnt - raise the number of buffers used for things like
NVME LS request/responses to allow more concurrent requests
to for larger nvme configs.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver allocates a sg list per io struture based on a fixed maximum
size. When it registers with the protocol transports and indicates the max sg
list size it supports, the driver manipulates the fixed value to report a
lesser amount so that it has reserved space for sg elements that are used for
DIF.
The driver initialization path sets the cfg_sg_seg_cnt field to the
manipulated value for scsi. NVME initialization ran afterward and capped it's
maximum by the manipulated value for SCSI. This erroneously made NVME report
the SCSI-reduce-for-DIF value that reduced the max io size for nvme and wasted
sg elements.
Rework the driver so that cfg_sg_seg_cnt becomes the overall maximum size and
allow the max size to be tunable. A separate (new) scsi sg count is then
setup with the scsi-modified reduced value. NVME then initializes based off
the overall maximum.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver only sends NVME PRLI to a device that also supports FCP. This resuls
in remote ports that don't have fc_remote_ports created for them. The driver
is clearing the nlp_fc4_type for a ndlp at the wrong time.
Fix by moving the nlp_fc4_type clearing to the discovery engine in the
DEVICE_RECOVERY state. Also ensure that rport registration is done for all
nlp_fc4_types.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Smatch complains about this code:
drivers/scsi/lpfc/lpfc_scsi.c:1053 lpfc_get_scsi_buf_s4()
warn: variable dereferenced before check 'lpfc_cmd' (see line 1039)
Fortunately the NULL check isn't required so I have removed it.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A recent change added some MDS processing in the lpfc_drain_txq routine
that relies on the fcp_wq being allocated. For nvmet operation the fcp_wq
is not allocated because it can only be an nvme-target. When the original
MDS support was added LS_MDS_LOOPBACK was defined wrong, (0x16) it should
have been 0x10 (decimal value used for hex setting). This incorrect value
allowed MDS_LOOPBACK to be set simultaneously with LS_NPIV_FAB_SUPPORTED,
causing the driver to crash when it accesses the non-existent fcp_wq.
Correct the bad value setting for LS_MDS_LOOPBACK.
Fixes: ae9e28f36a ("lpfc: Add MDS Diagnostic support.")
Cc: <stable@vger.kernel.org> # v4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Change default behavior for fdmi registration to on.
[mkp: patch was mangled]
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 12.0.0.6
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Enablement of the PBDE optimization brought out some incompatible behaviors
under error scenarios.
Best to disable and remove the PBDE optimization.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After memory allocation for the LCB response frame, the memory wasn't zero
initialized, and not all fields are set. Thus garbage shows up in the
payload.
Fix by zeroing the memory at allocation. Also properly set the Capability
field based on duration support.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Performance is affected when target queue depth is tracked. An atomic
counter is incremented on the submission path which competes with it being
decremented on the completion path. In addition, multiple CPUs can
simultaniously be manipulating this counter for the same ndlp.
Reduce the overhead by only performing the target increment/decrement when
the target queue depth is less than the overall adapter depth, thus is
actually meaningful.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During remote port loss fault testing, the driver crashed with the
following trace:
general protection fault: 0000 [#1] SMP
RIP: ... lpfc_nvme_register_port+0x250/0x480 [lpfc]
Call Trace:
lpfc_nlp_state_cleanup+0x1b3/0x7a0 [lpfc]
lpfc_nlp_set_state+0xa6/0x1d0 [lpfc]
lpfc_cmpl_prli_prli_issue+0x213/0x440
lpfc_disc_state_machine+0x7e/0x1e0 [lpfc]
lpfc_cmpl_els_prli+0x18a/0x200 [lpfc]
lpfc_sli_sp_handle_rspiocb+0x3b5/0x6f0 [lpfc]
lpfc_sli_handle_slow_ring_event_s4+0x161/0x240 [lpfc]
lpfc_work_done+0x948/0x14c0 [lpfc]
lpfc_do_work+0x16f/0x180 [lpfc]
kthread+0xc9/0xe0
ret_from_fork+0x55/0x80
After registering a new remoteport, the driver is pulling an ndlp pointer
from the lpfc rport associated with the private area of a newly registered
remoteport. The private area is uninitialized, so it's garbage.
Correct by pulling the the lpfc rport pointer from the entering ndlp point,
then ndlp value from at rport. Note the entering ndlp may be replacing by
the rport->ndlp due to an address change swap.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Enabling list_debug showed the drivers txcmplq was suffering list
corruption. The systems will eventually crash because the iocb free list
gets crossed linked with the prings txcmplq. Most systems will run for a
while after the corruption, but will eventually crash when a scsi eh reset
occurs and the txcmplq is attempted to be flushed. The flush gets stuck in
an endless loop.
The problem is the abort handler does not hold the sli4 ring lock while
validating the IO so the IO could complete while the driver is still
preping the abort. The erroneously generated abort, when it completes, has
pointers to the original IO that has already completed, and the IO
manipulation (for the second time) corrupts the list.
Correct by taking the ring lock early in the abort handler so the erroneous
abort won't be sent if the io has/is completing.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
CNA ports were showing speed as "unknown" even if the link is up.
Add speed decoding for FCOE-based adapters.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For ABORT_XRI_CN command, firmware identifies XRI to abort by IOTAG and RPI
combination. For ELS aborts, driver specifies IOTAG correctly but RPI is
not specified.
Fix by setting RPI in WQE.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The null checks on nvmebuf are redundant as nvmebuf is always obtained from
a container_of() and hence can never be null. Remove all the redundant null
checks. This also cleans up a static analysis warning.
Detected by CoverityScan, CID#1471753 ("Dereference before null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove the ScsiResult macro and open code it on all call sites.
This will make subsequent refactoring in this area easier.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Change references from "Broadcom Limited" to "Broadcom Inc." in the
copyright message. Update copyright duration if not yet updated for 2018.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 12.0.0.5
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A race condition between the context of devloss timeout handler and I/O
completion caused devloss timeout handler de-referencing pointer that had
been released.
Added the check in lpfc_sli_validate_fcp_iocb() on LPFC_IO_ON_TXCMPLQ to
capture the race condition of I/O completion and devloss timeout handler
attemption for aborting the I/O. Also, added check on lpfc_cmd->rdata
pointer before de-referenceing lpfc_cmd->rdata->pnode.
Also, added protection in lpfc_sli_abort_iocb() routine on driver performed
FCP I/O FLUSHING already under way before proceeding to aborting I/Os.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kernel occasionally crashed with the following
ops on NVME Target:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffffa042ee50>] lpfc_nvmet_defer_rcv+0x50/0x70 [lpfc]
Callback routine was called for deferred rcv when it should be treated as a
normal rcv.
Added code in callback routine to detect this condition and log a message,
then bail.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Current implementation missed setting the duration field. Correct the code
to set the field.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The PBDE optimizations aren't supported in all firmware revs.
Make optimizations configurable in case there's a side effect on old
firmware.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
rmmod of driver hangs
As driver instances were being unloaded, the NVME target port was unloaded
first. During the unload, the NVME initiator port sent a heartbeat
IO. Because of the target port state, that IO was scheduled for an Abort;
however, that abort subsequently failed. The failure was not cleaned up
properly and lpfc_sli4_xri_exchange_busy_wait silently hung forever.
Clean failed abort properly and make lpfc_sli4_xri_exchange_busy_wait not
hangs silently while waiting for aborts to complete.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
System crashes when the lpfc module is unloaded after making the port
offline
The nvme queue pointers were freed during port offline, but were later
accessed in pci remove path.
Validate the pointers in pci remove path before accessing them.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver is incorrectly formatting a register on new hardware, using a format
for an older chip. This can result in non-deterministic behavior.
Ensure driver is not setting "workqueue index" in the WQ doorbell when
making a non-dpp doorbell write. The field must be zero when non-dpp.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Kernel crashes during fill_read_buffer when nvme_info sysfs file read.
With multiple NVME targets, approx 40, nvme_info may grow larger than
PAGE_SIZE bytes. snprintf(buf + len, PAGE_SIZE - len, ...) logic is flawed
as PAGE_SIZE - len can be < 0 and is accepted by snprintf. This results in
buffer overflow, and is detected with check from dev_attr_show and
fill_read_buffer.
Change to use scnprintf to a tmp array, before calling strlcat to ensure no
buffer overflow over PAGE_SIZE bytes.
Message "6314" created as a new message indicating when there is more nvme
info, but is truncated to fit within PAGE_SIZE bytes.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The get_seconds() function suffers from a possible overflow in 2038 or
2106, as well as jitter due to settimeofday or leap second updates, and is
deprecated.
As we are interested in elapsed time only, using ktime_get_seconds() to
read the CLOCK_MONOTONIC timebase is ideal here. This also lets us remove
the hack that tries to deal with get_seconds() going slightly backwards,
which cannot happen with montonic timestamps.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 12.0.0.4
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver exits port setup after failing the lpfc_sli4_get_parameters
command (messages 0356, 2541, & 1412).
The older CNA adapters do not support the MBX command. In the past
the code was allowed to fail and continue on with initialization.
However a nvme change moved a closing bracket and now makes all
failures terminal.
Revise the logic so that terminal failure only occurs if the command
failed on the newer adapters. Additionally, if parameters are set
that require information from the command and the command failed,
the parameters are erroneous and port set up should fail even on
the older adapters.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The lancer G5 chip family fails the CQ create with 16k page size. The
hardware incorrectly reports it supports large page sizes when it is
actually limited to 4k pages.
A prior patch resolved this for the A0 chip revision only. This patch
excludes all revisions of the G5 asic from using large page sizes. As
knowing the actual chip revision is unnecessary, the now unused definitions
are removed
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
modprobe -r lpfc produces the following:
Call Trace:
__blk_mq_run_hw_queue+0xa2/0xb0
__blk_mq_delay_run_hw_queue+0x9d/0xb0
? blk_mq_hctx_has_pending+0x32/0x80
blk_mq_run_hw_queue+0x50/0xd0
blk_mq_sched_insert_request+0x110/0x1b0
blk_execute_rq_nowait+0x76/0x180
nvme_keep_alive_work+0x8a/0xd0 [nvme_core]
process_one_work+0x17f/0x440
worker_thread+0x126/0x3c0
? manage_workers.isra.24+0x2a0/0x2a0
kthread+0xd1/0xe0
? insert_kthread_work+0x40/0x40
ret_from_fork_nospec_begin+0x21/0x21
? insert_kthread_work+0x40/0x40
However, rmmod lpfc would run correctly.
When an nvme remoteport is unregistered with the host nvme transport, it
needs to set the remoteport->dev_loss_tmo value 0 to indicate an immediate
termination of device loss and prevent any further keep alives to that
rport. The driver was never setting dev_loss_tmo causing the nvme
transport to continue to send the keep alive.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Under large configurations, the driver would start to log message 6065 -
NVME out of buffers (exchanges).
The driver is using the ndlp cmd_qdepth value when determining the max
outstanding ios for an adapter. This value, by default, is set to 65536,
which exceeds the maximum exchange counts supported on an adapter. The ndlp
cmd_qdepth has no relevance and outstanding io count should be capped at
the max exchange count with IO requests beyond that level getting bounced
back with an EBUSY status so that they are retried by the block layer.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
MDS diagnostics fail because of frame count mismatch.
Unavailability of SGL is the trigger for this issue. If ELS SGL is not
available to process MDS frame, IOCB is put in FCP txq but not attempted to
post afterwards. So, driver stops processing incoming frames as it runs out
of IOCB. lpfc_drain_txq attempts to submit IOCBS that are queued in ELS
txq but MDS frames are posted to FCP WQ.
Attempt to submit IOCBs that are present in FCP txq when MDS loopback is
running.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Trivial fix to spelling mistakes in lpfc_printf_log log message
"mabilbox" -> "mailbox"
"maibox" -> "mailbox"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix small formatting and wording nits in Broadcom copyright header
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 12.0.0.3
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Enhance log messages for CQEs as they were not reporting certain fields.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix up log messages and add an fcp error stat counter in the IO submit
code path to make diagnosing problems easier
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the cpu count is larger than the number of WQ resources available,
adapter attachment eventually failes due to a WQ_CREATE failure.
Calculate the number of WQs desired (which initializes to cpu count)
after accounting for the number of queues the adapter supports and the
number allocated to SCSI and the control/ELS path, and scale down if
necessary.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>