There were a number of erroneous comments and incorrect older lockdep
checks that were causing a number of warnings.
Resolve the following:
- Inconsistent lock state warnings in lpfc_nvme_info_show().
- Fixed comments and code on sequences where ring lock is now held instead
of hbalock.
- Reworked calling sequences around lpfc_sli_iocbq_lookup(). Rather than
locking prior to the routine and have routine guess on what lock, take
the lock within the routine. The lockdep check becomes unnecessary.
- Fixed comments and removed erroneous hbalock checks.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
CC: Bart Van Assche <bvanassche@acm.org>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch avoids that a kernel warning appears when smp_processor_id() is
called with preempt debugging enabled.
Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove those functions that are not called from outside the removed
functions.
Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch avoid that smatch complains about misleading indentation.
Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch avoids that the compiler complains about missing declarations
when building with W=1.
Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Declaring interrupt clear routines as inline is bogus as they are used as
an indirect pointer.
Remove the inline references.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
You can't declare a function inline in a header if it doesn't have a body
available to the compiler. So realistically you either don't declare it
inline or you make it a static inline in the header. I think the latter
applies in this case, so this should be the fix
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Change the SLI4 queue creation code to use NUMA node based memory
allocation based on the cpu the queues will be related to.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the driver maintains a sideband structure which has a pointer for
each queue element. However, at 8 bytes per pointer, and up to 4k elements
per queue, and 100s of queues, this can take up a lot of memory.
Convert the driver to using an access routine that calculates the element
address based on its index rather than using the pointer table.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is currently reporting the firmware revision not the actual boot
bios version in FDMI data.
Modify the driver to obtain the boot bios version from the adapter and use
that data in the FMDI data sent to the switch.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The adapter initialization sequence enables interrupts, initializes the
adapter link_state to LINK_DOWN, then issues commands to initialize the
adapter. The interrupt handler on the adapter validates the link_state (has
to be at least LINK_DOWN) and if invalid, will discard the interrupting
event.
In most cases, there is not a command completion, thus an interrupt until
the initialization commands have been sent which is post the setting of
state to LINK_DOWN. However, in cases of firmware reset, the reset will
modify the link_state to an invalid value (indicating a reset of the
adapter) and there occasionally are cases where the adapter will generate
an asynchronous event which shares the eq/cq used for mailbox commands. In
the failure case, an interrupt is generated immediately after enabling them
due to the async event. As link_state is invalid, the eq is list and the
CQ not serviced. At this point link_state is initialized and the mailbox
command sent. As the CQ has not been serviced, it is not armed, so no
interrupt event is generated when the mailbox command completes.
Modify the initialization sequence so that interrupts are enabled after
link_state is properly initialized, which avoids the race condition with
the async event.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Current code is using msleep when polling for hw ready. Unfortunately the
msleep routine isn't very accurate on rescheduling. In fact, on a busy
systems which reset the adapter, it became 10s of seconds before it was
rescheduled.
Fix by busy waiting using udelay. As we're now busy waiting, significantly
reduce the wait time so that we can exit the pool loop as soon as possible.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver periodically checks for adapter error in a background thread. If
the thread detects an error, the adapter will be reset including the
deletion and reallocation of workqueues on the adapter. Simultaneously,
there may be a user-space request to offline the adapter which may try to
do many of the same steps, in parallel, on a different thread. As memory
was deallocated while unexpected, the parallel offline request hit a bad
pointer.
Add coordination between the two threads. The error recovery thread has
precedence. So, when an error is detected, a flag is set on the adapter to
indicate the error thread is terminating the adapter. But, before doing
that work, it will look for a flag that is set by the offline flow, and if
set, will wait for it to complete before then processing the error handling
path. Similarly, in the offline thread, it first checks for whether the
error thread is resetting the adapter, and if so, will then wait for the
error thread to finish. Only after it has finished, will it set its flag
and offline the adapter.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In a couple of cases, the driver detected a pci error (via pci device state
or via failed register reads) but didn't take any action to disable the
device. Additionally, the driver is ignoring the status of pci
configuration space reads.
Having the driver take the adapter offline whenever the pci error is
detected. Pay attention to pci_config_space_read status and return failure
if an error is seen.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
With negative test injection, the driver is receiving a command with first
burst enabled, meaning Sequence initiative is not passed with the command
frame. The driver notes the condition and discards the frame. However the
driver calls the incorrect buffer free routine, resulting in a NULL pointer
reference.
For hbq buffer free, convert to using lpfc_rq_buf_free().
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When unloading the driver, mailbox commands may be sent without holding a
reference on the ndlp. By the time the mailbox command completes, the ndlp
may have reduced its ref counts and been freed. The problem was reported
by KASAN.
While unregistering due to driver unload, have the completion noop'd by
setting the ndlp context NULL'd. Due to the unload, no further action was
necessary. Also, while reviewing this path, the generic nulling of the
context after handling should be slightly moved.
Reported by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is the final round of mostly small fixes and performance
improvements to our initial submit. The main regression fix is the
ia64 simscsi build failure which was missed in the serial number
elimination conversion.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXIxBayYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pisherpAP4rxLpX
bcUnQnEsvoxys/JyoK08Qfv1JebZo1B2MAZ62wD/VZ7LpOuzVLhsM2KhLFGRrs1/
7D2K4tgtO2dQsFix7H0=
=pcHl
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is the final round of mostly small fixes and performance
improvements to our initial submit.
The main regression fix is the ia64 simscsi build failure which was
missed in the serial number elimination conversion"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (24 commits)
scsi: ia64: simscsi: use request tag instead of serial_number
scsi: aacraid: Fix performance issue on logical drives
scsi: lpfc: Fix error codes in lpfc_sli4_pci_mem_setup()
scsi: libiscsi: Hold back_lock when calling iscsi_complete_task
scsi: hisi_sas: Change SERDES_CFG init value to increase reliability of HiLink
scsi: hisi_sas: Send HARD RESET to clear the previous affiliation of STP target port
scsi: hisi_sas: Set PHY linkrate when disconnected
scsi: hisi_sas: print PHY RX errors count for later revision of v3 hw
scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO
scsi: hisi_sas: Change return variable type in phy_up_v3_hw()
scsi: qla2xxx: check for kstrtol() failure
scsi: lpfc: fix 32-bit format string warning
scsi: lpfc: fix unused variable warning
scsi: target: tcmu: Switch to bitmap_zalloc()
scsi: libiscsi: fall back to sendmsg for slab pages
scsi: qla2xxx: avoid printf format warning
scsi: lpfc: resolve static checker warning in lpfc_sli4_hba_unset
scsi: lpfc: Correct __lpfc_sli_issue_iocb_s4 lockdep check
scsi: ufs: hisi: fix ufs_hba_variant_ops passing
scsi: qla2xxx: Fix panic in qla_dfs_tgt_counters_show
...
This is mostly update of the usual drivers: arcmsr, qla2xxx, lpfc,
hisi_sas, target/iscsi and target/core. Additionally Christoph
refactored gdth as part of the dma changes. The major mid-layer
change this time is the removal of bidi commands and with them the
whole of the osd/exofs driver and filesystem.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXIC54SYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishT1GAPwJEV23
ExPiPsnuVgKj49nLTagZ3rILRQcYNbL+MNYqxQEA0cT8FHzSDBfWY5OKPNE+RQ8z
f69LpXGmMpuagKGvvd4=
=Fhy1
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: arcmsr, qla2xxx, lpfc,
hisi_sas, target/iscsi and target/core.
Additionally Christoph refactored gdth as part of the dma changes. The
major mid-layer change this time is the removal of bidi commands and
with them the whole of the osd/exofs driver and filesystem. This is a
major simplification for block and mq in particular"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (240 commits)
scsi: cxgb4i: validate tcp sequence number only if chip version <= T5
scsi: cxgb4i: get pf number from lldi->pf
scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
scsi: mpt3sas: Add missing breaks in switch statements
scsi: aacraid: Fix missing break in switch statement
scsi: kill command serial number
scsi: csiostor: drop serial_number usage
scsi: mvumi: use request tag instead of serial_number
scsi: dpt_i2o: remove serial number usage
scsi: st: osst: Remove negative constant left-shifts
scsi: ufs-bsg: Allow reading descriptors
scsi: ufs: Allow reading descriptor via raw upiu
scsi: ufs-bsg: Change the calling convention for write descriptor
scsi: ufs: Remove unused device quirks
Revert "scsi: ufs: disable vccq if it's not needed by UFS device"
scsi: megaraid_sas: Remove a bunch of set but not used variables
scsi: clean obsolete return values of eh_timed_out
scsi: sd: Optimal I/O size should be a multiple of physical block size
scsi: MAINTAINERS: SCSI initiator and target tweaks
scsi: fcoe: make use of fip_mode enum complete
...
The outer routine lpfc_sli_issue_iocb(), which decomposes into the
SLI3 (s3) or SLI4 (s4) subroutines takes out the locks. For s3, it takes
out the hbalock. For s4, it takes out the ring_lock. The lockdep check in
the s3 and s4 subroutines both check hbalock, which is incorrect for s4.
Revise the s4 subroutine to lockdep check the ring_lock.
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are a handful of statements that are indented incorrectly. Fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For files modified as part of 12.2.0.0 patches, update copyright to 2019
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A scsi host lock is taken on every io completion to check whether the abort
handler is waiting on the io completion. This is an expensive lock to take
on all completion when rarely in an abort condition.
Replace scsi host lock with command-specific lock. Synchronize completion
and abort paths by new cmd lock. Ensure all flag changing and nulling of
context pointers taken under lock. When adding lock to task management
abort, realized it was missing other synchronization locks. Added that
synchronization to match normal paths.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When driving high iop counts, auto_imax coalescing kicks in and drives the
performance to extremely small iops levels.
There are two issues:
1) auto_imax is enabled by default. The auto algorithm, when iops gets
high, divides the iops by the hdwq count and uses that value to
calculate EQ_Delay. The EQ_Delay is set uniformly on all EQs whether
they have load or not. The EQ_delay is only manipulated every 5s (a
long time). Thus there were large 5s swings of no interrupt delay
followed by large/maximum delay, before repeating.
2) When processing a CQ, the driver got mixed up on the rate of when
to ring the doorbell to keep the chip appraised of the eqe or cqe
consumption as well as how how long to sit in the thread and
process queue entries. Currently, the driver capped its work at
64 entries (very small) and exited/rearmed the CQ. Thus, on heavy
loads, additional overheads were taken to exit and re-enter the
interrupt handler. Worse, if in the large/maximum coalescing
windows,k it could be a while before getting back to servicing.
The issues are corrected by the following:
- A change in defaults. Auto_imax is turned OFF and fcp_imax is set
to 0. Thus all interrupts are immediate.
- Cleanup of field names and their meanings. Existing names were
non-intuitive or used for duplicate things.
- Added max_proc_limit field, to control the length of time the
handlers would service completions.
- Reworked EQ handling:
Added common routine that walks eq, applying notify interval and max
processing limits. Use queue_claimed to claim ownership of the queue
while processing. Always rearm the queue whenever the common routine
is called.
Rework queue element processing, namely to eliminate hba_index vs
host_index. Only one index is necessary. The queue entry can be
marked invalid and the host_index updated immediately after eqe
processing.
After rework, xx_release routines are now DB write functions. Renamed
the routines as such.
Moved lpfc_sli4_eq_flush(), which does similar action, to same area.
Replaced the 2 individual loops that walk an eq with a call to the
common routine.
Slightly revised lpfc_sli4_hba_handle_eqe() calling syntax.
Added per-cpu counters to detect interrupt rates and scale
interrupt coalescing values.
- Reworked CQ handling:
Added common routine that walks cq, applying notify interval and max
processing limits. Use queue_claimed to claim ownership of the queue
while processing. Always rearm the queue whenever the common routine
is called.
Rework queue element processing, namely to eliminate hba_index vs
host_index. Only one index is necessary. The queue entry can be
marked invalid and the host_index updated immediately after cqe
processing.
After rework, xx_release routines are now DB write functions. Renamed
the routines as such.
Replaced the 3 individual loops that walk a cq with a call to the
common routine.
Redefined lpfc_sli4_sp_handle_mcqe() to commong handler definition with
queue reference. Add increment for mbox completion to handler.
- Added a new module/sysfs attribute: lpfc_cq_max_proc_limit To allow
dynamic changing of the CQ max_proc_limit value being used.
Although this leaves an EQ as an immediate interrupt, that interrupt will
only occur if a CQ bound to it is in an armed state and has cqe's to
process. By staying in the cq processing routine longer, high loads will
avoid generating more interrupts as they will only rearm as the processing
thread exits. The immediately interrupt is also beneficial to idle or
lower-processing CQ's as they get serviced immediately without being
penalized by sharing an EQ with a more loaded CQ.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Review of the eq coalescing logic showed the code was a bit fragmented.
Sometimes it would save/set via an interrupt max value, while in others it
would do so via a usdelay. There were also two places changing eq delay,
one place that issued mailbox commands, and another that changed via
register writes if supported.
Clean this up by:
- Standardizing the operation of lpfc_modify_hba_eq_delay() routine so
that it is always told of a us delay to impose. The routine then chooses
the best way to set that - via register or via mbx.
- Rather than two value types stored in eq->q_mode (usdelay if change via
register, imax if change via mbox) - q_mode always contains usdelay.
Before any value change, old vs new value is compared and only if
different is a change done.
- Revised the dmult calculation. dmult is not set based on overall imax
divided by hardware queues - instead imax applies to a single cpu and
the value will be replicated to all cpus.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
So far MSIX vector allocation assumed it would be 1:1 with hardware
queues. However, there are several reasons why fewer MSIX vectors may be
allocated than hardware queues such as the platform being out of vectors or
adapter limits being less than cpu count.
This patch reworks the MSIX/EQ relationships with the per-cpu hardware
queues so they can function independently. MSIX vectors will be equitably
split been cpu sockets/cores and then the per-cpu hardware queues will be
mapped to the vectors most efficient for them.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Default behavior is to use the information from the upper IO stacks to
select the hardware queue to use for IO submission. Which typically has
good cpu affinity.
However, the driver, when used on some variants of the upstream kernel, has
found queuing information to be suboptimal for FCP or IO completion locked
on particular cpus.
For command submission situations, the lpfc_fcp_io_sched module parameter
can be set to specify a hardware queue selection policy that overrides the
os stack information.
For IO completion situations, rather than queing cq processing based on the
cpu servicing the interrupting event, schedule the cq processing on the cpu
associated with the hardware queue's cq.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The XRI get/put lists were partitioned per hardware queue. However, the
adapter rarely had sufficient resources to give a large number of resources
per queue. As such, it became common for a cpu to encounter a lack of XRI
resource and request the upper io stack to retry after returning a BUSY
condition. This occurred even though other cpus were idle and not using
their resources.
Create as efficient a scheme as possible to move resources to the cpus that
need them. Each cpu maintains a small private pool which it allocates from
for io. There is a watermark that the cpu attempts to keep in the private
pool. The private pool, when empty, pulls from a global pool from the
cpu. When the cpu's global pool is empty it will pull from other cpu's
global pool. As there many cpu global pools (1 per cpu or hardware queue
count) and as each cpu selects what cpu to pull from at different rates and
at different times, it creates a radomizing effect that minimizes the
number of cpu's that will contend with each other when the steal XRI's from
another cpu's global pool.
On io completion, a cpu will push the XRI back on to its private pool. A
watermark level is maintained for the private pool such that when it is
exceeded it will move XRI's to the CPU global pool so that other cpu's may
allocate them.
On NVME, as heartbeat commands are critical to get placed on the wire, a
single expedite pool is maintained. When a heartbeat is to be sent, it will
allocate an XRI from the expedite pool rather than the normal cpu
private/global pools. On any io completion, if a reduction in the expedite
pools is seen, it will be replenished before the XRI is placed on the cpu
private pool.
Statistics are added to aid understanding the XRI levels on each cpu and
their behaviors.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
SLI4 nvme functions are passing the SLI3 ring number when posting wqe to
hardware. This should be indicating the hardware queue to use, not the ring
number.
Replace ring number with the hardware queue that should be used.
Note: SCSI avoided this issue as it utilized an older lfpc_issue_iocb
routine that properly adapts.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Both NVME and SCSI aborts are now processed off the CQ workqueue and do not
generate events for the slowpath any more.
Remove the unused event code.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Once the IO buff allocations were made shared, there was a single XRI
buffer list shared by all hardware queues. A single list isn't great for
performance when shared across the per-cpu hardware queues.
Create a separate XRI IO buffer get/put list for each Hardware Queue. As
SGLs and associated IO buffers get allocated/posted to the firmware; round
robin their assignment across all available hardware Queues so that there
is an equitable assignment.
Modify SCSI and NVME IO submit code paths to use the Hardware Queue logic
for XRI allocation.
Add a debugfs interface to display hardware queue statistics
Added new empty_io_bufs counter to track if a cpu runs out of XRIs.
Replace common_ variables/names with io_ to make meanings clearer.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, both nvme and fcp each have their own concept of an io_channel,
which is a combination wq/cq and associated msix. Different cpus would
share an io_channel.
The driver is now moving to per-cpu wq/cq pairs and msix vectors. The
driver will still use separate wq/cq pairs per protocol on each cpu, but
the protocols will share the msix vector.
Given the elimination of the nvme and fcp io channels, the module
parameters will be removed. A new parameter, lpfc_hdw_queue is added which
allows the wq/cq pair allocation per cpu to be overridden and allocated to
lesser value. If lpfc_hdw_queue is zero, the number of pairs allocated will
be based on the number of cpus. If non-zero, the parameter specifies the
number of queues to allocate. At this time, the maximum non-zero value is
64.
To manage this new paradigm, a new hardware queue structure is created to
track queue activity and relationships.
As MSIX vector allocation must be known before setting up the
relationships, msix allocation now occurs before queue datastructures are
allocated. If the number of vectors allocated is less than the desired
hardware queues, the hardware queue counts will be reduced to the number of
vectors
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a extra queue and msix vector for expresslane. Now that the driver
will be doing queues per cpu, this oddball queue is no longer needed.
Expresslane will utilize the normal per-cpu queues.
Updated debugfs sli4 queue output to go along with the change
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, both NVME and SCSI get their IO buffers from separate
pools. XRI's are associated 1:1 with IO buffers, so XRI's are also split
between protocols.
Eliminate the independent pools and use a single pool. Each buffer
structure now has a common section and a protocol section. Per protocol
routines for SGL initialization are removed and replaced by common
routines. Initialization of the buffers is only done on the common area.
All other fields, which are protocol specific, are initialized when the
buffer is allocated for use in the per-protocol allocation routine.
In the past, the SCSI side allocated IO buffers as part of slave_alloc
calls until the maximum XRIs for SCSI was reached. As all XRIs are now
common and may be used for either protocol, allocation for everything is
done as part of adapter initialization and the scsi side has no action in
slave alloc.
As XRI's are no longer split, the lpfc_xri_split module parameter is
removed.
Adapters based on SLI3 will continue to use the older scsi_buf_list_get/put
routines. All SLI4 adapters utilize the new IO buffer scheme
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A set of 17 fixes. Most of these are minor or trivial. The one fix
that may be serious is the isci one: the bug can cause hba parameters
to be set from uninitialized memory. I don't think it's exploitable,
but you never know.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXEKL0SYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishVZpAQCwuPTk
fqOt4v4hJ0oUHtEBsQK3VMXSdUvWdb5Lbn3WeQD/RFYTyNxcIF7ADSWw71b+IigT
ejUrMzI8ig+nZ1jbFZ4=
=BdS/
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A set of 17 fixes. Most of these are minor or trivial.
The one fix that may be serious is the isci one: the bug can cause hba
parameters to be set from uninitialized memory. I don't think it's
exploitable, but you never know"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: cxgb4i: add wait_for_completion()
scsi: qla1280: set 64bit coherent mask
scsi: ufs: Fix geometry descriptor size
scsi: megaraid_sas: Retry reads of outbound_intr_status reg
scsi: qedi: Add ep_state for login completion on un-reachable targets
scsi: ufs: Fix system suspend status
scsi: qla2xxx: Use correct number of vectors for online CPUs
scsi: hisi_sas: Set protection parameters prior to adding SCSI host
scsi: tcmu: avoid cmd/qfull timers updated whenever a new cmd comes
scsi: isci: initialize shost fully before calling scsi_add_host()
scsi: lpfc: lpfc_sli: Mark expected switch fall-throughs
scsi: smartpqi_init: fix boolean expression in pqi_device_remove_start
scsi: core: Synchronize request queue PM status only on successful resume
scsi: pm80xx: reduce indentation
scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param
scsi: megaraid_sas: correct an info message
scsi: target/iscsi: fix error msg typo when create lio_qr_cache failed
scsi: sd: Fix cache_type_store()
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where
we are expecting to fall through.
Notice that, in this particular case, I replaced "Drop thru" and "Fall
Thru" with "fall through" annotations, which is what GCC is expecting to
find.
Also, in some cases a dash is added as a token in order to separate the
"fall through" annotation from the rest of the comment on the same line,
which is what GCC is expecting to find.
Addresses-Coverity-ID: 114979 ("Missing break in switch")
Addresses-Coverity-ID: 114980 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We already need to zero out memory for dma_alloc_coherent(), as such
using dma_zalloc_coherent() is superflous. Phase it out.
This change was generated with the following Coccinelle SmPL patch:
@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@
-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
This is mostly update of the usual drivers: smarpqi, lpfc, qedi,
megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas. Additionally, we have
a pile of annotation, unused variable and minor updates. The big API
change is the updates for Christoph's DMA rework which include
removing the DISABLE_CLUSTERING flag. And finally there are a couple
of target tree updates.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXCEUNiYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdjKAP9vrTTv
qFaYmAoRSbPq9ZiixaXLMy0K/6o76Uay0gnBqgD/fgn3jg/KQ6alNaCjmfeV3wAj
u1j3H7tha9j1it+4pUw=
=GDa+
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: smarpqi, lpfc, qedi,
megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas.
Additionally, we have a pile of annotation, unused variable and minor
updates.
The big API change is the updates for Christoph's DMA rework which
include removing the DISABLE_CLUSTERING flag.
And finally there are a couple of target tree updates"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (259 commits)
scsi: isci: request: mark expected switch fall-through
scsi: isci: remote_node_context: mark expected switch fall-throughs
scsi: isci: remote_device: Mark expected switch fall-throughs
scsi: isci: phy: Mark expected switch fall-through
scsi: iscsi: Capture iscsi debug messages using tracepoints
scsi: myrb: Mark expected switch fall-throughs
scsi: megaraid: fix out-of-bound array accesses
scsi: mpt3sas: mpt3sas_scsih: Mark expected switch fall-through
scsi: fcoe: remove set but not used variable 'port'
scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
scsi: smartpqi: fix build warnings
scsi: smartpqi: update driver version
scsi: smartpqi: add ofa support
scsi: smartpqi: increase fw status register read timeout
scsi: smartpqi: bump driver version
scsi: smartpqi: add smp_utils support
scsi: smartpqi: correct lun reset issues
scsi: smartpqi: correct volume status
scsi: smartpqi: do not offline disks for transient did no connect conditions
scsi: smartpqi: allow for larger raid maps
...
This patch adds a "pci_bus_reset" option to the board_mode sysfs attribute.
This option uses the pci_reset_bus() api to reset the PCIe link the adapter
is on, which will reset the chip/adapter. Prior to issuing this option,
all functions on the same chip must be placed in the offline state by the
admin. After the reset, all of the instances may be brought online again.
The primary purpose of this functionality is to support cases where
firmware update required a chip reset but the admin did not want to reboot
the machine in order to instantiate the firmware update.
Sanity checks take place prior to the reset to ensure the adapter is the
sole entity on the PCIe bus and that all functions are in the offline
state.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a target's link dropped, an RSCN was received to communicate the
change. The driver detected the loss of the target and issued and UNREG_RPI
mailbox command. While that was being processed, another RSCN was received
to communicate the port coming back. The driver deferred the PLOGI to the
port until the mailbox command finishes. When the mailbox command completed
it saw the pending port and called the routines to issue the
PLOGI. However, it forgot to clear the UNREG_INP state flag, so the PLOGI
xmt routine nooped the PLOGI request assuming it needed to wait for the
mailbox command. At this point, login would never be re-attempted.
Clear UNREG_INP before issuing the deferred PLOGI.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The existing MDS loopback diagnostics support processing received frames in
the slowpath work thread. It caps the number of frames it will process at
64, before waiting for another event to indicate additional frame
reception. The net-net is this results in very slow frame processing during
loopback tests and sometimes orphans an io, causing the loopback test to
report failure by the switch.
Move MDS loopback frame processing out of the slow path worker thread and
into the normal RQ processing routines.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Certain older adapters such as the OneConnect OCe10100 may not have a valid
wqpcnt value. In this case, do not set queue->page_count to 0 in
lpfc_sli4_queue_alloc() as this will prevent the driver from initializing.
Fixes: 895427bd01 ("scsi: lpfc: NVME Initiator: Base modifications")
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On driver termination, after the driver stops fw logging by writing a
register on the chip, the driver immediately unmaps and frees the logging
buffer, without confirming in any way that the chip has received the write
and terminated the logging. As termination on the chip is not immediate,
the chip may issue a dma request to the now unmapped dma buffer, resulting
in a iommu fault.
Change the driver to receive a confirmation that logging ahs been
terminated. As the driver always issues an SLI reset with the device as
part of shutdown, and as part of that is receiving confirmation that the
reset is complete - the driver was modified to perform the write to disable
fw logging prior to the SLI reset and only free the fw log buffer after the
SLI reset is complete. That guarantees use of the fw log buffer is fully
terminated when it is unmapped.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver is setting bits in word 10 of the SLI4 ABORT WQE (the wqid). The
field was a carry over from a prior SLI revision. The field does not exist
in SLI4, and the action may result in an overlap with future definition of
the WQE.
Remove the setting of WQID in the ABORT WQE.
Also cleaned up WQE field settings - initialize to zero, don't bother to
set fields to zero.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver is hitting null pring pointers in lpfc_do_work().
Pointer assignment occurs based on SLI-revision. If recovering after an
error, its possible the sli revision for the port was cleared, making the
lpfc_phba_elsring() not return a ring pointer, thus the null pointer.
Add SLI revision checking to lpfc_phba_elsring() and status checking to all
callers.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is getting hit with 100s of RSCNs during remote port address
changes. Each of those RSCN's ends up generating UNREG_RPI and REG_PRI
mailbox commands. The discovery engine within the driver doesn't wait for
the mailbox command completions. Instead it sets state flags and moves
forward. At some point, there's a massive backlog of mailbox commands which
take time for the adapter to process. Additionally, it appears there were
duplicate events from the switch so the driver generated duplicate mailbox
commands for the same remote port. During this window, failures on PLOGI
and PRLI ELS's are see as the adapter is rejecting them as they are for
remote ports that still have pending mailbox commands.
Streamline the discovery engine so that PLOGI log checks for outstanding
UNREG_RPIs and defer the processing until the commands complete. This
better synchronizes the ELS transmission vs the RPI registrations.
Filter out multiple UNREG_RPIs being queued up for the same remote port.
Beef up log messages in this area.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver data structure for managing a mailbox command contained two
context fields. Unfortunately, the context were considered "generic" to be
used at the whim of the command code. Of course, one section of code used
fields this way, while another did it that way, and eventually there were
mixups.
Refactored the structure so that the generic contexts become a node context
and a buffer context and all code standardizes on their use.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While trying to get adapter fw-log for a function whose buffsize was set to
0, kernel panic occurred.
When buffsize is 0, the kernel buffer for the log won't be allocated. When
fw log usage was enabled, it failed to check the buffer size, and log usage
was started. Eventually the driver referenced the unallocated log buffer.
Added checks of the buffer size before allowing fw logging to be enabled
and added check for valid buffer if enabling fw log.
Performed a couple other minor cleanups while fixing this:
- clarified log messages
- re-evaluated log message severity
- treat any error as an error, not only a couple codes
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since f44ac12f1d, BG enablement is tracked with the LPFC_SLI3_BG_ENABLED
bit, which is set in lpfc_get_cfgparam before lpfc_sli_config_sli_port() is
called. The bit shouldn't be cleared before checking the feature. Based on
problem analysis by David Bond.
Fixes: f44ac12f1d "scsi: lpfc: Memory allocation error during driver start-up on power8"
Tested-by: David Bond <dbond@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Cc: stable@vger.kernel.org # 4.17.x
Cc: stable@vger.kernel.org # 4.18.x
Cc: stable@vger.kernel.org # 4.19.x
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Replaced dma_alloc_coherent + memset with dma_zalloc_coherent.
Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add trunking support to the driver. Trunking is found on more recent
asics. In general, trunking appears as a single "port" to the driver
and overall behavior doesn't differ. Link speed is reported as an
aggregate value, while link speed control is done on a per-physical
link basis with all links in the trunk symmetrical. Some commands
returning port information are updated to additionally provide
trunking information. And new ACQEs are generated to report physical
link events relative to the trunk.
This patch contains the following modifications:
- Added link speed settings of 128GB and 256GB.
- Added handling of trunk-related ACQEs, mainly logging and trapping
of physical link statuses.
- Added additional bsg interface to query trunk state by applications.
- Augment link_state sysfs attribtute to display trunk link status
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>