Commit af061a644a ("IB/qib: Use RCU for qpn lookup") introduced sparse
warnings.
This patch corrects those issues.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Profiling indicates that MR validation locking is expensive. The MR
table is largely read-only and is a suitable candidate for RCU locking.
The patch uses RCU locking during validation to eliminate one
lock/unlock during that validation.
Reviewed-by: Mike Heinz <michael.william.heinz@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
A timing issue can occur where qib_mr_dereg can return -EBUSY if the
MR use count is not zero.
This can occur if the MR is de-registered while RDMA read response
packets are being progressed from the SDMA ring. The suspicion is
that the peer sent an RDMA read request, which has already been copied
across to the peer. The peer sees the completion of his request and
then communicates to the responder that the MR is not needed any
longer. The responder tries to de-register the MR, catching some
responses remaining in the SDMA ring holding the MR use count.
The code now uses a get/put paradigm to track MR use counts and
coordinates with the MR de-registration process using a completion
when the count has reached zero. A timeout on the delay is in place
to catch other EBUSY issues.
The reference count protocol is as follows:
- The return to the user counts as 1
- A reference from the lk_table or the qib_ibdev counts as 1.
- Transient I/O operations increase/decrease as necessary
A lot of code duplication has been folded into the new routines
init_qib_mregion() and deinit_qib_mregion(). Additionally, explicit
initialization of fields to zero is now handled by kzalloc().
Also, duplicated code 'while.*num_sge' that decrements reference
counts have been consolidated in qib_put_ss().
Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
An MR reference leak exists when handling UC RDMA writes with
immediate data because we manipulate the reference counts as if the
operation had been a send.
This patch moves the last_imm label so that the RDMA write operations
with immediate data converge at the cq building code. The copy/mr
deref code is now done correctly prior to the branch to last_imm.
Reviewed-by: Edward Mascarenhas <edward.mascarenhas@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The dev->sgid_tbl[] array is allocated in ocrdma_alloc_resources().
It has OCRDMA_MAX_SGID elements so the test here is off by one.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix RQ/SRQ error CQE polling. Return error CQE to consumer for error
case which was not returned previously.
Signed-off-by: Parav Pandit <parav.pandit@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix max sge calculation for sq, rq, srq for all hardware types.
Signed-off-by: Mahesh Vardhamanaiah <mahesh.vardhamanaiah@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix code to read the max wqe and max rqe values from mailbox response.
Signed-off-by: Mahesh Vardhamanaiah <mahesh.vardhamanaiah@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
1. Fix reporting GID table addition events.
2. Enable vlan based GID entries only when VLAN is enabled at compile
time (test CONFIG_VLAN_8021Q / CONFIG_VLAN_8021Q_MODULE).
Signed-off-by: Parav Pandit <parav.pandit@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
1. Limit the max number of WQEs per QP reported when querying the
device, so that ib_create_qp() will not fail for a QP size that the
device claimed to support due to additional headroom WQEs being
allocated.
2. Limit qp resources accepted for ib_create_qp() to the limits
reported in ib_query_device(). In kernel space, make sure that the
limits returned to the caller following qp creation also lie within
the reported device limits. For userspace, report as before, and do
adjustment in libmlx4 (so as not to break ABI).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Sagi Grimberg <sagig@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit e605b743f3 ("IB/mlx4: Increase the number of vectors (EQs)
available for ULPs") didn't handle correctly the case where there
aren't enough MSI-X vectors to increase the number of EQs, so only the
legacy EQs are allocated. This results in an attempt to memset() to
zero the EQ table which was never allocated and a kernel crash.
Fix this by checking in the teardown flow if the table of EQs was ever
allocated. Also remove some unneeded setting to zero of the EQ
related fields in struct mlx4_ib_dev.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When using rping -c -a 0.0.0.0 with iw_cxgb4, the system crashes when
rdma_connect() is called. ip_dev_find() will return NULL, but pdev is
accessed anyway.
Checking that pdev is NULL and returning -ENODEV prevents the system
from crashing.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Correct queue free count math for SQ, RQ for all hardware type.
Update user-kernel ABI interface.
Signed-off-by: Parav Pandit <parav.pandit@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We need to use a different loop index for mlx4_counter_alloc() and for
device_create_file() iterations: the mlx4_counter_alloc() loop index
is used in the error flow to free counters.
If the same loop index is used for device_create_file() and, say, the
device_create_file() loop fails on the first iteration, the allocated
counters will not be freed.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Enable IB ULPs to use a larger portion of the device EQs (which map to
IRQs). The mlx4_ib driver follows the mlx4_core framework of the EQs
to be divided among the device ports. In this scheme, for each IB
port, the number of allocated EQs follows the number of cores, subject
to other system constraints, such as number available MSI-X vectors.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This allows querying the QP state before flushing.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Using kfifos for ID management was limiting the number of QPs and
preventing NP384 MPI jobs. So replace it with a simple bitmap
allocator.
Remove IDs from the IDR tables before deallocating them. This bug was
causing the BUG_ON() in insert_handle() to fire because the ID was
getting reused before being removed from the IDR table.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This allows dumping thousands of QPs. Log active open failures of
interest.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add module option db_fc_threshold which is the count of active QPs
that trigger automatic db flow control mode. Automatically transition
to/from flow control mode when the active qp count crosses
db_fc_theshold.
Add more db debugfs stats
On DB DROP event from the LLD, recover all the iwarp queues.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use GFP_ATOMIC in _insert_handle() if ints are disabled.
Don't panic if we get an abort with no endpoint found. Just log a
warning.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Get FULL/EMPTY/DROP events from LLD. On FULL event, disable normal
user mode DB rings.
Add modify_qp semantics to allow user processes to call into the
kernel to ring doobells without overflowing.
Add DB Full/Empty/Drop stats.
Mark queues when created indicating the doorbell state.
If we're in the middle of db overflow avoidance, then newly created
queues should start out in this mode.
Bump the C4IW_UVERBS_ABI_VERSION to 2 so the user mode library can
know if the driver supports the kernel mode db ringing.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Log a warning and drop the abort message. Otherwise we will do a
bogus wake_up() and crash.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This fixes a race where an ingress abort fails to wake up the thread
blocked in rdma_init() causing the app to hang.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Don't call the ibqp event_handler pointer in the case it wasn't initialized.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Donald Wood <Donald.E.Wood@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Set ORD value of the connecting peer to be at least one in order to
accommodate an RDMA READ Request message.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Donald Wood <Donald.E.Wood@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch reorganizes the QP and devdata files to be more cache line aware.
qib_qp fields in particular are split into read-mostly, send, and receive fields.
qib_devdata fields are split into read-mostly and read/write fields
Testing has show that bidirectional tests improve by as much as 100%
with this patch.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If a MAD is sent directly to the local HCA rather than placed on a QP
and the MAD fails M_Key checks, there is no means to generate a
timeout for the client, which may hang. Instead we report
IB_MAD_RESULT_FAILURE, which operates the same for on-the-wire
packets, but will generate a send failure back to the client.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If a port has an M_Key lease set, the M_Key protect bits set to 1, and
a SubnSet arrives with an invalid M_Key, an M_Key mismatch trap is
generated, the lease timer begins as expected, and eventually the
M_Key protect bits will be set back to 0 as per the spec. However, if
any other SMP with an invalid M_Key arrives, the lease timer is expired
and the M_Key protect bits remain in force.
This is not according to to spec. In particular, C14-17 says that
a lease timer that is underway is not affected by protection level
checks (ie, at protection level 1, a SubnGet with a bad M_Key may be
successful, but does not stop the timer), and C14-19 says that the timer
shall stop when a valid M_Key has been received. C14-19 is the only
compliance statement that specifies a stopping condition for the timer.
This behavior is magnified if the port's Master SM LID attribute
points at itself. In that case, the M_Key mismatch trap is sufficient to
expire the timer, and the mkey lease attribute is rendered useless.
Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The SERDES was using the incorrect Frequency Loop Bandwidth setting
causing the link to cycle through the Physical link negotiation state
machine. Fixing the Frequency Loop Bandwidth setting in the SERDES
helps the link come up faster and more reliably.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
A "fix" for a bug with the number of contexts on a single-port board
caused the calculation to be off by one, which causes problems with
the upper layers. The same problem exists for number of free
contexts, which is also fixed here.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When a port first goes active with SMA Set(PortInfo) and reregister
bit set, the driver sends up the reregister event followed by a port
active event.
The problem is that in response to reregister event most apps try to
issue a SA query of some sort, but that fails because port is not
active.
The qib driver needs to a trivial change to correct this behavior.
This issue has been there for a while; however the recent serdes work
has probably made the delay between the reregister event and the
active event larger and hence opened the race far enough so that its
being seen more often.
The patch also changes the clientrereg local to a u8 and saves off the
rereg bit into it. The code following the nested subn_get_portinfo()
now restores that bit per o14-12.2.1 with a logical OR from that copy.
Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch optimizes pio buffer allocation in the kernel.
For qib, kernel pio buffers are used for sending acks. The code to
allocate the buffer would always start at 0 until it found a buffer.
This means that an average of 64 comparisions were done on each
allocate, since the busy bit won't be cleared until the bits are
refreshed when buffers are exhausted.
This patch adds two new fields in the devdata struct, last_pio and
min_kernel_pio. last_pio is the last buffer that was allocated.
min_kernel_pio is the lowest potential available buffer.
min_kernel_pio is modifed as contexts are allocated and deallocted.
Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add a prefetch call when a packet has been stored. The nature of the
prefetch is correctly determined by the alternatives mechanism.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
[ Replace one more printk_once() with pr_info_once(). - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Otherwise CM packets going over MLX QP1 get fixed scheduling priority 0.
We want CM packets to get the same scheduling priority, and therefore
map to the same SQ (Schedule Queue) and eventually TC (Traffic Class),
as the application requested for the actual QP used for the connection.
Signed-off-by: Oren Duer <oren@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Implement raw packet QPs for Ethernet ports using the MLX transport (as
done by the mlx4_en Ethernet netdevice driver).
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When IPV6 is not enabled:
ERROR: "register_inet6addr_notifier" [drivers/infiniband/hw/ocrdma/ocrdma.ko] undefined!
ERROR: "unregister_inet6addr_notifier" [drivers/infiniband/hw/ocrdma/ocrdma.ko] undefined!
Fix this by wrapping the inet6 calls in #ifdef IPV6. Also make the
ocrdma module depend on (IPV6 || IPV6=n) to forbid the case of modular
ipv6 but built-in ocrdma (which can't work, because ocrdma calls ipv6
functions).
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We only need to disable the IRQs one time.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[ Rename "wq_flags" to more conventional "flags." - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
The ocrdma_alloc_lkey() function never returns NULL pointers -- it
returns ERR_PTRs.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Events sent to ocrdma_inet6addr_event() are sent from an atomic context,
therefore we can't try to lock a mutex within the notifier callback.
We could just switch the mutex to a spinlock since all it does it
protect a list, but I've gone ahead and switched the list to use RCU
instead. I couldn't fully test it since I don't have IB hardware, so
if it doesn't fully work for some reason let me know and I'll switch
it back to using a spinlock.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
[ Fixed locking in ocrdma_add(). - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
We need to set ib_evt.device, or else ib_dispatch_event() will crash
when we call it for unaffiliated events (and consumers may get
confused in their QP/CQ/SRQ event handler for affiliated events).
Also fix sparse warning:
drivers/infiniband/hw/ocrdma/ocrdma_hw.c:678:36: warning: Using plain integer as NULL pointer
There's no need to clear ib_evt, since every member is initialized.
Signed-off-by: Roland Dreier <roland@purestorage.com>