6278 Commits

Author SHA1 Message Date
Doug Ledford 884fa4f304 Merge branches 'chelsio', 'debug-cleanup', 'hns' and 'i40iw' into merge-test 2016-12-14 14:43:14 -05:00
Pan Bian 46d0703fac IB/mlx4: fix improper return value
If uhw->inlen is non-zero, the value of variable err is 0 if the copy
succeeds. Then, if kzalloc() or kmalloc() returns a NULL pointer, the
function returns 0 to the callers. As a result, the callers cannot
detect the errors. This patch fixes the bug by assigning "-ENOMEM" to
err before the NULL pointer checks and removing the initialization of
err at the beginning.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189031
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 14:35:23 -05:00
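
The error-path pattern described in 46d0703fac can be sketched in user-space C. This is only an illustration under stated assumptions, not the mlx4 code: struct demo_props, demo_copy_in(), demo_query() and the fail_alloc knob are hypothetical names, and calloc() stands in for kzalloc().

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's output structure. */
struct demo_props {
    char *buf;
};

/* Hypothetical copy step: 0 on success, -EINVAL for an oversized request. */
static int demo_copy_in(size_t inlen)
{
    return inlen > 64 ? -EINVAL : 0;
}

/*
 * Returns 0 on success or a negative errno value.
 * fail_alloc is an illustrative knob that simulates an allocation failure.
 */
int demo_query(struct demo_props *props, size_t inlen, int fail_alloc)
{
    int err; /* deliberately not initialized to 0 at declaration */

    if (inlen) {
        err = demo_copy_in(inlen); /* err is 0 here when the copy succeeds */
        if (err)
            return err;
    }

    /* Set the error code before the NULL check, so a failed
     * allocation can never be reported as success. */
    err = -ENOMEM;
    props->buf = fail_alloc ? NULL : calloc(1, 64);
    if (!props->buf)
        goto out;

    err = 0;
out:
    return err;
}
```

With the assignment placed before the check, a caller sees -ENOMEM whether or not the copy step ran first.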
Pan Bian 5b4c9cd7e4 IB/ocrdma: fix bad initialization
The function ocrdma_mbx_create_ah_tbl() returns the value of status on
errors. However, because status is initialized to 0, 0 is returned
even on error paths. This patch initializes status to
"-ENOMEM".

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188831

Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 14:33:48 -05:00
Zhouyi Zhou 6a3a1056d6 infiniband: nes: return value of skb_linearize should be handled
Return value of skb_linearize should be handled in function
nes_netdev_start_xmit.

Compiled in x86_64
Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 14:26:49 -05:00
Sebastian Ott 17069d32a3 IB/core: fix unmap_sg argument
__ib_umem_release calls dma_unmap_sg with a different number of
sg_entries than ib_umem_get uses for dma_map_sg. This might cause
trouble for implementations that merge sglist entries and results
in the following dma debug complaint:

DMA-API: device driver frees DMA sg list with different entry
         count [map count=2] [unmap count=1]

Fix it by using the correct value.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 14:21:26 -05:00
Souptick Joarder 7ceb740c54 IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc
In mthca_create_ah(), pci_pool_alloc() followed by memset() is
replaced by pci_pool_zalloc().

Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:58:39 -05:00
Bart Van Assche 1974ab9d9d mlx5, calc_sq_size(): Make a debug message more informative
Make it clear that qp->sq.wqe_cnt is not the number of WQEs.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Eli Cohen <eli@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:45:38 -05:00
Bart Van Assche 3d6bdf1625 mlx5: Remove a set-but-not-used variable
This has been detected by building the mlx5 driver with W=1.

Fixes: 1a412fb1ca ('IB/mlx5: Modify QP commands via mlx5 ifc')
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Eli Cohen <eli@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:45:10 -05:00
Bart Van Assche 626bc02d4d mlx5: Use { } instead of { 0 } to init struct
Detected by sparse.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:42:32 -05:00
Bart Van Assche 4fa354c9db IB/srp: Make writing the add_target sysfs attr interruptible
Avoid that shutdown of srp_daemon is delayed if add_target_mutex is
held by another process.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:31:47 -05:00
Bart Van Assche 290081b453 IB/srp: Make mapping failures easier to debug
Make it easier to figure out what is going on if memory mapping
fails because more memory regions than mr_per_cmd are needed.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:31:37 -05:00
Bart Van Assche 3787d9908c IB/srp: Make login failures easier to debug
If login fails because memory region allocation failed it can be
hard to figure out what happened. Make it easier to figure out
why login failed by logging a message if ib_alloc_mr() fails.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:31:37 -05:00
Bart Van Assche 042dd765bd IB/srp: Introduce a local variable in srp_add_one()
This patch makes the srp_add_one() code more compact and does not
change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:31:37 -05:00
Bart Van Assche 1a1faf7a8a IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build
Avoid that the kernel build fails as follows if dynamic debug support
is disabled:

drivers/infiniband/ulp/srp/ib_srp.c:2272:3: error: implicit declaration of function 'DEFINE_DYNAMIC_DEBUG_METADATA'
drivers/infiniband/ulp/srp/ib_srp.c:2272:33: error: 'ddm' undeclared (first use in this function)
drivers/infiniband/ulp/srp/ib_srp.c:2275:39: error: '_DPRINTK_FLAGS_PRINT' undeclared (first use in this function)

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:31:37 -05:00
Bart Van Assche d3a2418ee3 IB/multicast: Check ib_find_pkey() return value
This patch avoids that Coverity complains about not checking the
ib_find_pkey() return value.

Fixes: commit 547af76521 ("IB/multicast: Report errors on multicast groups if P_key changes")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:27:34 -05:00
Bart Van Assche 11b642b84e IPoIB: Avoid reading an uninitialized member variable
This patch avoids that Coverity reports the following:

    Using uninitialized value port_attr.state when calling printk

Fixes: commit 94232d9ce8 ("IPoIB: Start multicast join process only on active ports")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:27:34 -05:00
Bart Van Assche 2fe2f378dd IB/mad: Fix an array index check
The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
(80) elements. Hence compare the array index with that value instead
of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity
reports the following:

Overrunning array class->method_table of 80 8-byte elements at element index 127 (byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class) (which evaluates to 127).

Fixes: commit b7ab0b19a8 ("IB/mad: Verify mgmt class in received MADs")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:27:34 -05:00
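
The bounds check described in 2fe2f378dd can be sketched as below. The two sizes (80 and 128) come from the commit message; struct demo_class_table and demo_lookup() are hypothetical names used only for illustration, and this is not the ib_mad code.

```c
#include <stddef.h>

#define DEMO_MAX_MGMT_CLASS 80   /* number of elements in method_table */
#define DEMO_MAX_METHODS    128  /* unrelated bound that was used by mistake */

/* Hypothetical stand-in for the class table. */
struct demo_class_table {
    void *method_table[DEMO_MAX_MGMT_CLASS];
};

/*
 * Returns the address of the slot for class_index, or NULL when the
 * index does not fit the array. The index is compared against the
 * size of the array being indexed, not against DEMO_MAX_METHODS.
 */
void **demo_lookup(struct demo_class_table *table, unsigned int class_index)
{
    if (class_index >= DEMO_MAX_MGMT_CLASS)
        return NULL;
    return &table->method_table[class_index];
}
```

An index of 127 passes a check against 128 but is rejected here, which is the overrun Coverity reported.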
Bart Van Assche b42dde478b IB/mlx4: Rework special QP creation error path
The special QP creation error path relies on offset_of(struct mlx4_ib_sqp,
qp) == 0. Remove this assumption because that makes the QP creation
code easier to understand.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 13:01:11 -05:00
Bart Van Assche 0d38c240f9 IB/srpt: Report login failures only once
Report the following message only once if no ACL has been configured
yet for an initiator port:

"Rejected login because no ACL has been configured yet for initiator %s.\n"

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagig@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 12:58:30 -05:00
Julia Lawall 5f4c7e4eb5 IB/usnic: simplify IS_ERR_OR_NULL to IS_ERR
The function usnic_ib_qp_grp_get_chunk only returns an ERR_PTR value or a
valid pointer, never NULL.  The same is true of get_qp_res_chunk, which
just returns the result of calling usnic_ib_qp_grp_get_chunk.  Simplify
IS_ERR_OR_NULL to IS_ERR in both cases.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression t,e;
@@

t = \(usnic_ib_qp_grp_get_chunk(...)\|get_qp_res_chunk(...)\)
... when != t=e
- IS_ERR_OR_NULL(t)
+ IS_ERR(t)

@@
expression t,e,e1;
@@

t = \(usnic_ib_qp_grp_get_chunk(...)\|get_qp_res_chunk(...)\)
... when != t=e
?- t ? PTR_ERR(t) : e1
+ PTR_ERR(t)
... when any
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 12:57:54 -05:00
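
Why IS_ERR is sufficient in 5f4c7e4eb5 can be illustrated with a user-space imitation of the kernel's error-pointer helpers. Everything prefixed demo_ is an assumption made for this sketch; the real macros live in the kernel's err.h and are not reproduced here.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *demo_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Decode the errno value from an error pointer. */
static inline long demo_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True when ptr lies in the top DEMO_MAX_ERRNO addresses. */
static inline int demo_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static int demo_chunk_storage = 42;

/* Hypothetical getter: a valid pointer or an encoded error, never NULL. */
void *demo_get_chunk(int available)
{
    if (!available)
        return demo_err_ptr(-ENOENT);
    return &demo_chunk_storage;
}

/* Returns the chunk value, or a negative errno value on failure. */
long demo_use_chunk(int available)
{
    void *chunk = demo_get_chunk(available);

    /* The getter never returns NULL, so no separate NULL test is needed. */
    if (demo_is_err(chunk))
        return demo_ptr_err(chunk);
    return *(int *)chunk;
}
```

Because the getter has no NULL return path, a combined error-or-NULL check would only add a branch that can never be taken.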
Hans Westgaard Ry 9315bc9a13 IB/core: Issue DREQ when receiving REQ/REP for stale QP
From "InfiniBand Architecture Specifications Volume 1":

  A QP is said to have a stale connection when only one side has
  connection information. A stale connection may result if the remote CM
  had dropped the connection and sent a DREQ but the DREQ was never
  received by the local CM. Alternatively the remote CM may have lost
  all record of past connections because its node crashed and rebooted,
  while the local CM did not become aware of the remote node's reboot
  and therefore did not clean up stale connections.

and:

   A local CM may receive a REQ/REP for a stale connection. It shall
   abort the connection issuing REJ to the REQ/REP. It shall then issue
   DREQ with "DREQ:remote QPN" set to the remote QPN from the REQ/REP.

This patch solves a problem with reuse of QPN. The current codebase, that
is IPoIB, relies on a REAP-mechanism to do cleanup of the structures
in CM. A problem with this is the time constants governing this
mechanism; they are up to 768 seconds and the interface may look
unresponsive in that period.  Issuing a DREQ (and receiving a DREP)
does the necessary cleanup and the interface comes up.

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 12:56:24 -05:00
Philippe Reynes 24dc08c3c9 IB/nes: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 12:52:25 -05:00
Alexey Khoroshilov def4a6ffc9 IB/isert: do not ignore errors in dma_map_single()
There are several places, where errors in dma_map_single() are
ignored. The patch fixes them.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 12:51:31 -05:00
Jim Foraker 22dccc5454 IB/rdmavt: Only put mmap_info ref if it exists
rvt_create_qp() creates qp->ip only when a qp creation request comes from
userspace (udata is not NULL).  If we exceed the number of available
queue pairs however, the error path always attempts to put a kref to this
structure.  If the requestor is inside the kernel, this leads to a crash.

We fix this by checking that qp->ip is not NULL before calling kref_put().

Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 12:16:11 -05:00
Petr Mladek f5eabf5e51 IB/rdmavt: Handle the kthread worker using the new API
Use the new API to create and destroy the cq kthread worker.
The API hides some implementation details.

In particular, kthread_create_worker() allocates and initializes
struct kthread_worker. It runs the kthread the right way and stores
task_struct into the worker structure. In addition, the *on_cpu()
variant binds the kthread to the given cpu and the related memory
node.

kthread_destroy_worker() flushes all pending works, stops
the kthread and frees the structure.

This patch does not change the existing behavior. Note that we must
use the on_cpu() variant because the function starts the kthread
and it must bind it to the right CPU before waking. The numa node
is associated for given CPU as well.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 12:16:11 -05:00
Petr Mladek 6efaf10f16 IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker
The memory barrier is not enough to protect queuing works into
a destroyed cq kthread. Just imagine the following situation:

CPU1				CPU2

rvt_cq_enter()
  worker =  cq->rdi->worker;

				rvt_cq_exit()
				  rdi->worker = NULL;
				  smp_wmb();
				  kthread_flush_worker(worker);
				  kthread_stop(worker->task);
				  kfree(worker);

				  // nothing queued yet =>
				  // nothing flushed and
				  // happily stopped and freed

  if (likely(worker)) {
     // true => read before CPU2 acted
     cq->notify = RVT_CQ_NONE;
     cq->triggered++;
     kthread_queue_work(worker, &cq->comptask);

  BANG: worker has been flushed/stopped/freed in the meantime.

This patch solves this by protecting the critical sections by
rdi->n_cqs_lock. It seems that this lock is not much contended
and looks reasonable for this purpose.

One catch is that rvt_cq_enter() might be called from IRQ context.
Therefore we must always take the lock with IRQs disabled to avoid
a possible deadlock.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 12:16:11 -05:00
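
The locking idea in 6efaf10f16 can be sketched in user-space C. This is a rough analogy under stated assumptions: a pthread mutex plays the role of rdi->n_cqs_lock, there is no equivalent of disabling IRQs, and struct demo_worker, struct demo_rdi, demo_cq_enter() and demo_cq_exit() are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: counts how many work items were queued. */
struct demo_worker {
    int queued;
};

struct demo_rdi {
    pthread_mutex_t lock;       /* plays the role of rdi->n_cqs_lock */
    struct demo_worker *worker; /* NULL once the worker is torn down */
};

/*
 * Queue work only if the worker still exists. The pointer is read and
 * used under the same lock, so it cannot be detached in between.
 * Returns 1 if work was queued, 0 otherwise.
 */
int demo_cq_enter(struct demo_rdi *rdi)
{
    int queued = 0;

    pthread_mutex_lock(&rdi->lock);
    if (rdi->worker) {
        rdi->worker->queued++;
        queued = 1;
    }
    pthread_mutex_unlock(&rdi->lock);
    return queued;
}

/*
 * Detach the worker under the lock. The caller may flush and free the
 * returned worker afterwards, since no new work can reach it.
 */
struct demo_worker *demo_cq_exit(struct demo_rdi *rdi)
{
    struct demo_worker *worker;

    pthread_mutex_lock(&rdi->lock);
    worker = rdi->worker;
    rdi->worker = NULL;
    pthread_mutex_unlock(&rdi->lock);
    return worker;
}
```

Once demo_cq_exit() has returned, every later demo_cq_enter() sees NULL and queues nothing, which closes the window shown in the CPU1/CPU2 diagram.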
Arnd Bergmann 14ab8896f5 IB/mlx5: avoid bogus -Wmaybe-uninitialized warning
We get a false-positive warning in linux-next for the mlx5 driver:

infiniband/hw/mlx5/mr.c: In function ‘mlx5_ib_reg_user_mr’:
infiniband/hw/mlx5/mr.c:1172:5: error: ‘order’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
infiniband/hw/mlx5/mr.c:1161:6: note: ‘order’ was declared here
infiniband/hw/mlx5/mr.c:1173:6: error: ‘ncont’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
infiniband/hw/mlx5/mr.c:1160:6: note: ‘ncont’ was declared here
infiniband/hw/mlx5/mr.c:1173:6: error: ‘page_shift’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
infiniband/hw/mlx5/mr.c:1158:6: note: ‘page_shift’ was declared here
infiniband/hw/mlx5/mr.c:1143:13: error: ‘npages’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
infiniband/hw/mlx5/mr.c:1159:6: note: ‘npages’ was declared here

I had a trivial workaround for gcc-5 or higher, but that didn't work
on gcc-4.9 unfortunately.

The only way I found to avoid the warnings for gcc-4.9, short of
initializing each of the arguments first, was to change the calling
conventions to separate the error code from the umem pointer. This
avoids casting the error codes from one pointer to another incompatible
pointer, and lets gcc figure out that the data is actually valid
whenever we return successfully.

Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 12:12:53 -05:00
Steve Wise 1e38a366ee ib_isert: log the connection reject message
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:38:28 -05:00
Steve Wise 97540bb90a ib_iser: log the connection reject message
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:38:28 -05:00
Steve Wise 5f24410408 rdma_cm: add rdma_consumer_reject_data helper function
rdma_consumer_reject_data() will return the private data pointer
and length if any is available.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:38:28 -05:00
Steve Wise 5042a73d3e rdma_cm: add rdma_is_consumer_reject() helper function
Return true if the peer consumer application rejected the
connection attempt.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:38:28 -05:00
Steve Wise 77a5db1315 rdma_cm: add rdma_reject_msg() helper function
rdma_reject_msg() returns a pointer to a string message associated with
the transport reject reason codes.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:38:28 -05:00
Wei Yongjun aecb66b2b0 qedr: remove pointless NULL check in qedr_post_send()
Remove pointless NULL check for 'wr' in qedr_post_send().

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:18:17 -05:00
Wei Yongjun aafec388a1 qedr: Use list_move_tail instead of list_del/list_add_tail
Using list_move_tail() instead of list_del() + list_add_tail().

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:18:17 -05:00
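
The equivalence that aafec388a1 relies on can be shown with a minimal circular doubly linked list. This is a user-space sketch written for illustration; the demo_list names are assumptions and the code is not the kernel's list.h.

```c
/* Minimal circular doubly linked list node. */
struct demo_list {
    struct demo_list *next, *prev;
};

/* Initialize an empty list: the head points at itself. */
void demo_list_init(struct demo_list *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the list. */
void demo_list_add_tail(struct demo_list *node, struct demo_list *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Unlink entry from whatever list it is on. */
void demo_list_del(struct demo_list *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* One call that does the delete and the tail insert together. */
void demo_list_move_tail(struct demo_list *entry, struct demo_list *head)
{
    demo_list_del(entry);
    demo_list_add_tail(entry, head);
}
```

The single helper leaves the lists in the same state as the two separate calls, so the change is a readability cleanup rather than a behavior change.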
Wei Yongjun 181d80151f qedr: Fix possible memory leak in qedr_create_qp()
'qp' is allocated in qedr_create_qp() and should be freed before returning
from the error handling paths; otherwise it will cause a memory leak.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:18:17 -05:00
Colin Ian King ea7ef2accd qedr: return -EINVAL if pd is null and avoid null ptr dereference
Currently, if pd is null then we hit a null pointer dereference
on accessing pd->pd_id.  Instead of just printing an error message
we should also return -EINVAL immediately.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:18:17 -05:00
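
The early return described in ea7ef2accd can be sketched as follows. struct demo_pd and demo_check_pd() are hypothetical names for this illustration and do not reproduce the qedr code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the protection domain structure. */
struct demo_pd {
    int pd_id;
};

/*
 * Copies pd->pd_id to *pd_id_out and returns 0, or returns -EINVAL
 * without touching pd when it is NULL.
 */
int demo_check_pd(const struct demo_pd *pd, int *pd_id_out)
{
    if (!pd) {
        /* Log the problem and leave immediately; falling through
         * would dereference the NULL pointer below. */
        fprintf(stderr, "demo: pd is NULL\n");
        return -EINVAL;
    }

    *pd_id_out = pd->pd_id;
    return 0;
}
```

Printing the message alone is not enough, because execution would still reach the dereference; the return is what prevents it.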
Hal Rosenstock 9fa240bbfc IB/mad: Eliminate redundant SM class version defines for OPA
and rename class version define to indicate SM rather than SMP or SMI

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:01:58 -05:00
Bodong Wang 7d29f349a4 IB/mlx5: Properly adjust rate limit on QP state transitions
- Add MODIFY_QP_EX CMD to extend modify_qp.
- Rate limit will be updated in the following state transitions: RTR2RTS,
  RTS2RTS. The limit will be removed when SQ is in RST and ERR state.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:39:51 -05:00
Bodong Wang 189aba99e7 IB/uverbs: Extend modify_qp and support packet pacing
A new uverbs command ib_uverbs_ex_modify_qp is added to support more QP
attributes. User driver should choose to call the legacy/extended API
based on input mask.

IB_USER_LAST_QP_ATTR_MASK is added to indicate the maximum bit position
which supports legacy ib_uverbs_modify_qp.
IB_USER_LEGACY_LAST_QP_ATTR_MASK indicates the maximum bit position
which supports ib_uverbs_ex_modify_qp, the value of this mask should be
updated if new mask is added later.

Along with this change, rate_limit is supported by the extended command;
user driver could use it to control packet pacing.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:39:51 -05:00
Bodong Wang 528e5a1bd3 IB/core: Support rate limit for packet pacing
Add new member rate_limit to ib_qp_attr which holds the packet pacing rate
in kbps, 0 means unlimited.

IB_QP_RATE_LIMIT is added to ib_attr_mask and could be used by RAW
QPs when changing QP state from RTR to RTS, RTS to RTS.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:39:50 -05:00
Bodong Wang d949167d68 IB/mlx5: Report mlx5 packet pacing capabilities when querying device
Enable mlx5 based hardware to report packet pacing capabilities
from kernel to user space. Packet pacing allows limiting the rate to any
number between the maximum and minimum, based on user settings.

The capabilities are exposed to user space through query_device by uhw.
The following capabilities are reported:

1. The maximum and minimum rate limit in kbps supported by packet pacing.
2. Bitmap showing which QP types are supported by packet pacing operation.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:39:50 -05:00
Or Gerlitz ca5b91d631 IB/mlx5: Support RAW Ethernet when RoCE is disabled
On some environments, such as certain SRIOV VF configurations, RoCE is
not supported for mlx5 Ethernet ports. Currently, the driver will not
open IB device on that port.

This is problematic, since we do want user-space RAW Ethernet (RAW_PACKET
QPs) functionality to remain in place. For that end, enhance the relevant
driver flows such that we do create a device instance in that case.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:39:49 -05:00
Or Gerlitz 45f95acd63 IB/mlx5: Rename RoCE related helpers to reflect being Eth ones
This is a pre-step towards having mlx5 IB device also over Eth ports where
RoCE is not supported. We change the roce enable/disable and roce_lag
init/fini function names to have _eth instead of _roce.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:39:48 -05:00
Or Gerlitz d012f5d6f8 IB/mlx5: Refactor registration to netdev notifier
Refactor the netdev notifier registration into a small helper function.

This is a pre-step towards having mlx5 IB device over an Ethernet port
which doesn't support RoCE. Also, renamed the de-registration helper
and the new helper as netdev notifier and not roce, to make it clear
this is not only used with roce.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:39:48 -05:00
Maor Gottlieb b216af408c IB/mlx5: Use u64 for UMR length
The fast_registration length is used to convey length for memory
registrations through UMR which can be of any size up to 2^64.

Change the length type to be u64.

Fixes: 968e78dd96 ('IB/mlx5: Enhance UMR support to allow partial page table update')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:39:47 -05:00
Eli Cohen afd02cd3a9 IB/mlx5: Avoid system crash when enabling many VFs
When enabling many VFs, the total amount of DMA mappings increases
significantly. This causes DMA allocations to take a lot of time
since they are serialized in the kernel.

As a result the driver enters a fatal condition due to
timeout and the system hangs. To recover from this we disable
MR cache for VFs.

PFs will still have a full cache and VFs cache can be manipulated
as usual after driver load.

Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:39:47 -05:00
Maor Gottlieb c73b7911de IB/mlx5: Assign SRQ type earlier
Move the SRQ type assignment to be before actually using it
in create_srq_user() and in create_srq_kernel() functions.

Fixes: af1ba291c5 ('{net, IB}/mlx5: Refactor internal SRQ API')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:39:46 -05:00
Jack Morgenstein c482af646d IB/mlx4: Fix out-of-range array index in destroy qp flow
For non-special QPs, the port value becomes non-zero only at the
RESET-to-INIT transition. If the QP has not undergone that transition,
its port number value is still zero.

If such a QP is destroyed before being moved out of the RESET state,
subtracting one from the qp port number results in a negative value.
Using that negative value as an index into the qp1_proxy array
results in an out-of-bounds array reference.

Fix this by testing that the QP type is one that uses qp1_proxy before
using the port number. For special QPs of all types, the port number is
specified at QP creation time.

Fixes: 9433c18891 ("IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:39:46 -05:00
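
The ordering of checks described in c482af646d can be sketched as below. The two-port limit, the enum values and the names demo_qp, demo_ibdev and demo_destroy_qp() are assumptions for this illustration, not the mlx4 definitions.

```c
#include <stddef.h>

#define DEMO_NUM_PORTS 2 /* illustrative port count */

enum demo_qp_type {
    DEMO_QP_REGULAR,   /* port stays 0 until the RESET-to-INIT transition */
    DEMO_QP_PROXY_GSI  /* special QP: port is set at creation time */
};

struct demo_qp {
    enum demo_qp_type type;
    int port; /* 1-based once assigned, 0 when not yet assigned */
};

struct demo_ibdev {
    struct demo_qp *qp1_proxy[DEMO_NUM_PORTS];
};

/*
 * Clears the proxy slot owned by qp, if any. Returns 1 when a slot was
 * cleared and 0 otherwise. The type is tested first, so port - 1 is
 * only computed for QPs whose port is known to be at least 1.
 */
int demo_destroy_qp(struct demo_ibdev *dev, struct demo_qp *qp)
{
    if (qp->type == DEMO_QP_PROXY_GSI &&
        dev->qp1_proxy[qp->port - 1] == qp) {
        dev->qp1_proxy[qp->port - 1] = NULL;
        return 1;
    }
    return 0;
}
```

For a regular QP destroyed while still in RESET, the short-circuit on the type test means the index -1 is never formed.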
Moni Shoua 41c450fd8d IB/mlx5: Make create/destroy_ah available to userspace
Advertise that create_ah and destroy_ah verbs are accessible from
uverbs interface.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:39:19 -05:00
Moni Shoua 5097e71f3e IB/mlx5: Use kernel driver to help userspace create ah
Resolving a MAC address for a given IP address in userspace is inefficient.
This patch lets the mlx5 user driver use the kernel driver to resolve the MAC
and get the answer in the private section of the response.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:38:49 -05:00
Moni Shoua 477864c8fc IB/core: Let create_ah return extended response to user
Add struct ib_udata to the signature of create_ah callback that is
implemented by IB device drivers. This allows HW drivers to return extra
data to the userspace library.
This patch prepares the ground for mlx5 driver to resolve destination
mac address for a given GID and return it to userspace.
This patch was previously submitted by Knut Omang as a part of the
patch set to support Oracle's Infiniband HCA (SIF).

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:38:27 -05:00
Moni Shoua 6ad279c5a2 IB/mlx5: Report that device has udata response in create_ah
To make mlx5 user driver aware of whether kernel driver returns dmac
in user data response add a new flag that will be returned back to
user-space through alloc_ucontext.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:37:19 -05:00
Moni Shoua c90ea9d8e5 IB/core: Change ib_resolve_eth_dmac to use it in create AH
The function ib_resolve_eth_dmac() requires struct qp_attr * and
qp_attr_mask as parameters while the function might be useful to resolve
dmac for address handles. This patch changes the signature of the
function so it can be used in the flow of creating an address handle.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:34:25 -05:00
Moses Reuben 2d1e697e9b IB/mlx5: Add support to match inner packet fields
Add support for matching packet fields which are tunneled,
i.e. support matching the header of the inner packet, whose spec type is the
bitwise OR of the original header type and the IB_FLOW_SPEC_INNER type.

The combination of IB_FLOW_SPEC_INNER | IB_FLOW_SPEC_VXLAN_TUNNEL does not
need to be checked, because the IB core has this check already.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:34:24 -05:00
Moses Reuben fbf46860b1 IB/core: Introduce inner flow steering
For a tunneled packet which contains external and internal headers,
we refer to the external headers as "outer fields" and the internal
headers as "inner fields".

Example of a tunneled packet:

{ L2 | L3 | L4 | tunnel header | L2 | L3 | L4 | data }
  |     |    |         |         |    |    |
{       outer fields           }{ inner fields }

This patch introduces a new flag for flow steering rules
- IB_FLOW_SPEC_INNER - which specifies that the rule applies
to the inner fields, rather than to the outer fields of the packet.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:34:23 -05:00
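
How a single flag can mark a flow spec as applying to the inner headers (fbf46860b1) can be sketched with plain bit operations. The numeric values and the demo_ names below are illustrative assumptions; the log does not give the real enum values.

```c
#include <stdint.h>

/* Illustrative spec type values; assumptions for this sketch only. */
#define DEMO_FLOW_SPEC_ETH   0x20u
#define DEMO_FLOW_SPEC_IPV4  0x30u
#define DEMO_FLOW_SPEC_INNER 0x100u /* flag bit, disjoint from the types */

/* Mark a spec type as matching the inner (tunneled) headers. */
uint32_t demo_make_inner(uint32_t spec_type)
{
    return spec_type | DEMO_FLOW_SPEC_INNER;
}

/* Non-zero when the spec applies to the inner headers. */
int demo_is_inner(uint32_t spec_type)
{
    return (spec_type & DEMO_FLOW_SPEC_INNER) != 0;
}

/* Strip the flag to recover the underlying header type. */
uint32_t demo_base_type(uint32_t spec_type)
{
    return spec_type & ~DEMO_FLOW_SPEC_INNER;
}
```

Keeping the flag bit outside the range of the header types lets one field carry both the header kind and the inner/outer choice.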
Moses Reuben ffb30d8f10 IB/mlx5: Support Vxlan tunneling specification
Add support to receive specific Vxlan packet in ConnectX-4.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:34:23 -05:00
Moses Reuben 0dbf3332b7 IB/core: Add flow spec tunneling support
In order to support tunneling, that can be used by the QP,
both struct ib_flow_spec_tunnel and struct ib_flow_tunnel_filter can be
used for more IP or UDP based tunneling protocols (e.g. NVGRE, GRE, etc.).

IB_FLOW_SPEC_VXLAN_TUNNEL type flow specification is added to use this
functionality and match specific Vxlan packets.

Similar to IPv6, we check overflow of the vni value by
comparing with the maximum size.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:34:21 -05:00
Bodong Wang 1cbe6fc86c IB/mlx5: Add support for CQE compressing
CQE compressing reduces PCI overhead by coalescing and compressing
multiple CQEs into a single merged CQE. Successful compressing
improves message rate especially for small packet traffic.

CQE compressing is supported for all 64B CQE formats (with certain
limitations) generated by RQ/Responder or by SQ/Requestor.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:34:20 -05:00
Bodong Wang 7e43a2a5ba IB/mlx5: Report mlx5 CQE compression caps during query
The capabilities include:
- Max number of compressed and aggregated CQEs in a single session,
  while zero means unsupported.
- For Responder, there are two formats of mini CQE: mini CQE with Rx
  hash and mini CQE with checksum. They're mutually exclusive.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:34:03 -05:00
Bodong Wang 191ded4a4d IB/mlx5: Report mlx5 multi packet WQE caps during query
Whether the hardware supports multi packet WQE is exposed to user space
through query_device by uhw.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:33:25 -05:00
Yonatan Cohen d680ebed91 IB/rxe: Increase max number of completions to 32k
Increase limit of max CQE from 8K to 32K to allow demanding
applications to work over SoftRoCE with same configuration
as most RoCEv2 HW vendors have.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:33:24 -05:00
Eran Ben Elisha bf08e884bf IB/mlx4: Check if GRH is available before using it
Before reading GRH attributes, make sure the AH contains a GRH.
In addition, initialize the GID type.

Fixes: dbf727de74 ('IB/core: Use GID table in AH creation and dmac resolution')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:32:51 -05:00
Eran Ben Elisha 1f22e454df IB/mlx4: When no DMFS for IPoIB, don't allow NET_IF QPs
According to the firmware spec, FLOW_STEERING_IB_UC_QP_RANGE command is
supported only if dmfs_ipoib bit is set.

If it isn't set, we want to ensure that allocating NET_IF QPs fails. We do
so by filling out the allocation bitmap. As a result, the NET_IF QP
allocation function won't find any free QP and will fail.

Fixes: c1c9850112 ('IB/mlx4: Add support for steerable IB UD QPs')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13 13:29:46 -05:00
Henry Orosco d6f7bbcc2e i40iw: Reorganize structures to align with HW capabilities
Some resources are incorrectly organized and at odds with
HW capabilities. Specifically, ILQ, IEQ, QPs, MSS, QOS
and statistics belong in a VSI.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 17:20:29 -05:00
Mustafa Ismail 0cc0d851cc i40iw: Fix incorrect check for error
In i40iw_ieq_handle_partial() the check for !status is incorrect.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 17:20:29 -05:00
Mustafa Ismail 6b0805c256 i40iw: Assign MSS only when it is a new MTU
Currently we are changing the MSS regardless of whether
there is a change or not in MTU. Fix to make the
assignment of MSS dependent on an MTU change.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 17:20:28 -05:00
Shiraz Saleem d627b50631 i40iw: Fix race condition in terminate timer's handler
Add a QP reference when terminate timer is started to ensure
the destroy QP doesn't race ahead to free the QP while it is being
referenced in the terminate timer's handler.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 17:20:28 -05:00
Mustafa Ismail fd90d4d4c2 i40iw: Fix memory leak in CQP destroy when in reset
On a device close, the control QP (CQP) is destroyed by calling
cqp_destroy which destroys the CQP and frees its SD buffer memory.
However, if the reset flag is true, cqp_destroy is never called and
leads to a memory leak on SD buffer memory. Fix this by always calling
cqp_destroy, on device close, regardless of reset. The exception to this is
when CQP create fails. In this case, the SD buffer memory is already
freed on an error check and there is no need to call cqp_destroy.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 17:20:27 -05:00
Shiraz Saleem 1cda28bb5b i40iw: Fix QP flush to not hang on empty queues or failure
When flushing a QP that has no pending work requests, signal completion
to unblock i40iw_drain_sq and i40iw_drain_rq, which are waiting on
completion for iwqp->sq_drained and iwqp->rq_drained respectively.
Also, signal completion if flush QP fails to prevent the drain SQ or RQ
from being blocked indefinitely.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 17:20:27 -05:00
Mustafa Ismail f4a87ca12a i40iw: Fix double free of QP
A QP can be double freed if i40iw_cm_disconn() is
called while it is currently being freed by
i40iw_rem_ref(). The fix in i40iw_cm_disconn() will
first check if the QP is already freed before
making another request for the QP to be freed.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 17:20:26 -05:00
Shiraz Saleem 91c42b72f8 i40iw: Use correct src address in memcpy to rdma stats counters
hw_stats is a pointer to i40_iw_dev_stats struct in i40iw_get_hw_stats().
Use hw_stats and not &hw_stats in the memcpy to copy the i40iw device stats
data into rdma_hw_stats counters.
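
The difference between the two memcpy sources can be sketched as follows.
The struct and function names are made up for the example and do not mirror
the driver:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a block of device counters. */
struct ex_dev_stats {
	uint64_t counters[4];
};

/* Copy the counters that hw_stats points at into dst. */
static inline void ex_copy_stats(uint64_t *dst, const struct ex_dev_stats *hw_stats)
{
	/* Correct: hw_stats is already the address of the counters. */
	memcpy(dst, hw_stats, sizeof(*hw_stats));

	/*
	 * The buggy form, memcpy(dst, &hw_stats, sizeof(*hw_stats)), would
	 * start reading at the pointer variable itself, so dst would receive
	 * the pointer value followed by unrelated stack contents.
	 */
}
```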

Fixes: b40f4757da ("IB/core: Make device counter infrastructure dynamic")

Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 17:19:02 -05:00
Thomas Huth 5e58917122 i40iw: Remove macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAG
The macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAG are
apparently bad - they are using the logical "&&" operation which
does not make sense here. It should have been a bitwise "&" instead.
Since the macros seem to be completely unused, let's simply remove
them so that nobody accidentally uses them in the future. And while
we're at it, also remove the unused macro I40IW_CREATE_STAG.
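
To illustrate why "&&" is wrong for extracting a bit field, here is a small
sketch. The EX_* macros and the 8-bit mask are assumptions for the example
and are not the removed i40iw definitions:

```c
/* Hypothetical 8-bit key field in the low bits of a value. */
#define EX_KEY_MASK 0x000000FFu

/* Logical AND: the result is only ever 0 or 1. */
#define EX_KEY_LOGICAL(stag) ((stag) && EX_KEY_MASK)

/* Bitwise AND: the result is the low 8 bits of stag. */
#define EX_KEY_BITWISE(stag) ((stag) & EX_KEY_MASK)
```

For stag = 0x12345678, the logical form yields 1 while the bitwise form
yields 0x78.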

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 17:13:02 -05:00
Andrew Boyer 37f69f43fb IB/rxe: Hold refs when running tasklets
It might be possible for all of a QP's references to be dropped
while one of that QP's tasklets is running.

For example, the completer might run during QP destroy.
If qp->valid is false, it will drop all of the packets on
the resp_pkts list, potentially removing the last reference.
Then it tries to advance the SQ consumer pointer. If the
SQ's buffer has already been destroyed, the system will
panic.

To be safe, hold a reference on the QP for the duration
of each tasklet.

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:34:22 -05:00
Andrew Boyer 07bf9627d5 IB/rxe: Wait for tasklets to finish before tearing down QP
The system may crash when a malformed request is received and
the error is detected by the responder.

NodeA: $ ibv_rc_pingpong -g 0 -d rxe0 -i 1 -n 1 -s 50000
NodeB: $ ibv_rc_pingpong -g 0 -d rxe0 -i 1 -n 1 -s 1024 <NodeA_ip>

The responder generates a receive error on node B since the incoming
SEND is oversized. If the client tears down the QP before the responder
or the completer finish running, a page fault may occur.

The fix makes the destroy operation spin until the tasks complete, which
appears to be the original intent of the design.

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Andrew Boyer 5407f53012 IB/rxe: Fix ref leak in duplicate_request()
A ref was added after the call to skb_clone().

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Andrew Boyer 5b9ea16c54 IB/rxe: Fix ref leak in rxe_create_qp()
The udata->inlen error path needs to clean up the ref
added by rxe_alloc().

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Andrew Boyer accacb8f51 IB/rxe: Add support for IB_CQ_REPORT_MISSED_EVENTS
Peek at the CQ after arming it so that we can return a hint.
This avoids missed completions due to a race between posting
CQEs and arming the CQ.
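
The arm-then-peek idea can be sketched with a toy queue. The struct, the
field names and ex_req_notify() are hypothetical and only model the
behaviour described above, not the rxe implementation:

```c
#include <stdbool.h>

/* Toy completion queue: entries between consumer and producer are pending. */
struct ex_cq {
	unsigned int producer;
	unsigned int consumer;
	bool armed;
};

/*
 * Arm the CQ, then peek at it. Returns 1 as a hint that completions were
 * already queued when the CQ was armed, so the caller should poll again
 * instead of sleeping. Returns 0 otherwise.
 */
static inline int ex_req_notify(struct ex_cq *cq, bool report_missed)
{
	cq->armed = true;
	if (report_missed && cq->producer != cq->consumer)
		return 1;
	return 0;
}
```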

For example, CM teardown waits on MAD requests to complete with
ib_cq_poll_work(). Without this fix, the last completion might be
left on the CQ, hanging the kthread doing the teardown.

The console backtraces look like this:

[ 4199.911284] Call Trace:
[ 4199.911401]  [<ffffffff9657fe95>] schedule+0x35/0x80
[ 4199.911556]  [<ffffffff965830df>] schedule_timeout+0x22f/0x2c0
[ 4199.911727]  [<ffffffff9657f7a8>] ? __schedule+0x368/0xa20
[ 4199.911891]  [<ffffffff96580903>] wait_for_completion+0xb3/0x130
[ 4199.912067]  [<ffffffff960a17e0>] ? wake_up_q+0x70/0x70
[ 4199.912243]  [<ffffffffc074a06d>] cm_destroy_id+0x13d/0x450 [ib_cm]
[ 4199.912422]  [<ffffffff961615d5>] ? printk+0x57/0x73
[ 4199.912578]  [<ffffffffc074a390>] ib_destroy_cm_id+0x10/0x20 [ib_cm]
[ 4199.912759]  [<ffffffffc076098c>] rdma_destroy_id+0xac/0x340 [rdma_cm]
[ 4199.912941]  [<ffffffffc076f2cc>] 0xffffffffc076f2cc

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Andrew Boyer d4fb59256a IB/rxe: Add support for zero-byte operations
The last_psn algorithm fails in the zero-byte case: it calculates
first_psn = N, last_psn = N-1. This makes the operation unretryable since
the res structure will fail the (first_psn <= psn <= last_psn) test in
find_resource().

While here, use BTH_PSN_MASK to mask the calculated last_psn.
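
A sketch of the arithmetic, assuming 24-bit PSNs and a last_psn derived from
the request length and the MTU. EX_PSN_MASK and ex_last_psn() are
illustrative names rather than the driver's code:

```c
#include <stdint.h>

/* PSNs are 24 bits wide, so results are masked to wrap correctly. */
#define EX_PSN_MASK 0x00FFFFFFu

/* PSN of the last packet of an operation that starts at first_psn. */
static inline uint32_t ex_last_psn(uint32_t first_psn, uint32_t len, uint32_t mtu)
{
	/*
	 * A zero-byte operation still occupies one PSN. Without this special
	 * case the formula below would give first_psn - 1.
	 */
	if (len == 0)
		return first_psn;

	return (first_psn + (len + mtu - 1) / mtu - 1) & EX_PSN_MASK;
}
```

With first_psn = 100 and len = 0 this returns 100, so a test of the form
first_psn <= psn <= last_psn can succeed for psn = 100.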

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Andrew Boyer d38eb801aa IB/rxe: Unblock loopback by moving skb_out increment
skb_out is decremented in rxe_skb_tx_dtor(), which is not called in the
loopback() path. Move the increment to the send() path rather than
rxe_xmit_packet().

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Andrew Boyer 2a7a85487e IB/rxe: Don't update the response PSN unless it's going forwards
A client might post a read followed by a send. The partner receives
and acknowledges both transactions, posting an RCQ entry for the
send, but something goes wrong with the read ACK. When the client
retries the read, the partner's responder processes the duplicate
read but incorrectly resets the PSN to the value preceding the
original send. When the duplicate send arrives, the responder cannot
tell that it is a duplicate, so the responder generates a duplicate
RCQ entry, confusing the client.
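
One way to express "only move the PSN forwards" is a wraparound-aware
comparison of 24-bit PSNs. This is a sketch under that assumption, with
hypothetical helper names, and is not the code from the patch:

```c
#include <stdint.h>

/*
 * Compare two 24-bit PSNs modulo 2^24. The difference is shifted into the
 * top of a 32-bit word so that its sign tells which PSN is ahead.
 * Result > 0: a is ahead of b; < 0: a is behind b; 0: equal.
 */
static inline int32_t ex_psn_compare(uint32_t a, uint32_t b)
{
	return (int32_t)((a - b) << 8);
}

/* Keep the current PSN unless the candidate is strictly ahead of it. */
static inline uint32_t ex_advance_psn(uint32_t current, uint32_t candidate)
{
	return ex_psn_compare(candidate, current) > 0 ? candidate : current;
}
```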

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Andrew Boyer dd753d8743 IB/rxe: Advance the consumer pointer before posting the CQE
A simple userspace application might poll the CQ, find a completion,
and then attempt to post a new WQE to the SQ. A spurious error can
occur if the userspace application detects a full SQ in the instant
before the kernel is able to advance the SQ consumer pointer.

This is noticeable when using single-entry SQs with ibv_rc_pingpong
if lots of kernel and userspace library debugging is enabled.

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Andrew Boyer 6e9bb530ff IB/rxe: Remove buffer used for printing IP address
Avoid smashing the stack when an ICRC error occurs on an IPv6 network.

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Dan Carpenter 95db9d05b7 IB/rxe: Remove unneeded cast in rxe_srq_from_attr()
It makes me nervous when we cast pointer parameters.  I would estimate
that around 50% of the time, it indicates a bug.  Here the cast is not
needed because u32 and unsigned int are the same thing.  Removing the
cast makes the code more robust and future proof in case any of the
types change.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Wei Yongjun 4ac4707102 IB/rxe: Use DEFINE_SPINLOCK() for spinlock
spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Arnd Bergmann a0fa72683e IB/rxe: avoid putting a large struct rxe_qp on stack
A race condition fix added an rxe_qp structure to the stack in order
to be able to perform rollback in rxe_requester(), but the structure
is large enough to trigger the warning for possible stack overflow:

drivers/infiniband/sw/rxe/rxe_req.c: In function 'rxe_requester':
drivers/infiniband/sw/rxe/rxe_req.c:757:1: error: the frame size of 2064 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

This changes the rollback function to only save the psn inside
the qp, which is the only field we access in the rollback_qp
anyway.

Fixes: 3050b99850 ("IB/rxe: Fix race condition between requester and completer")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12 16:31:45 -05:00
Bart Van Assche 66431b0e86 IB/hfi1: Define platform_config_table_limits once
Defining static data structures in a header file is wrong because
this causes the data structure to be instantiated once in every .c
file it is included in. Hence move the definition of a static
array from a header file into the only .c file in which it is used.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Bhumika Goyal 0fc859a657 IB/hfi1: constify mmu_notifier_ops structure
Declare the structure mmu_notifier_ops as const as it is only stored in
the ops field of a mmu_notifier structure. The ops field is of type
const struct mmu_notifier_ops *, so mmu_notifier_ops structures having
this property can be declared as const.
Done using coccinelle:
@r1 disable optional_qualifier @
identifier i;
position p;
@@
static struct mmu_notifier_ops i@p = {...};

@ok1@
identifier r1.i;
position p;
struct mmu_rb_handler handler;
@@
handler.mn.ops=&i@p

@bad@
position p!={r1.p,ok1.p};
identifier r1.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
static
+const
struct mmu_notifier_ops i={...};

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
+const
struct mmu_notifier_ops i;

File size before:
   text	   data	    bss	    dec	    hex	filename
   3566	     72	     16	   3654	    e46	drivers/infiniband/hw/hfi1/mmu_rb.o

File size after:
   text	   data	    bss	    dec	    hex	filename
   3658	      0	     16	   3674	    e5a	drivers/infiniband/hw/hfi1/mmu_rb.o

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Mike Marciniszyn 5dc806052a IB/rdmavt, IB/hfi1, IB/qib: Add inlines for mtu division
Add rvt_div_round_up_mtu() and rvt_div_mtu() routines to
do the computation based on the pmtu and the log_pmtu.
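
Because the path MTU is a power of two, both divisions can be done with a
shift. A minimal sketch, assuming a structure that caches pmtu together with
its base-2 logarithm (struct ex_qp and the ex_* helpers are illustrative
stand-ins):

```c
#include <stdint.h>

/* Hypothetical QP state: pmtu is a power of two, log_pmtu its log2. */
struct ex_qp {
	uint32_t pmtu;
	uint8_t log_pmtu;
};

/* Number of pmtu-sized packets needed for len bytes, rounded up. */
static inline uint32_t ex_div_round_up_mtu(const struct ex_qp *qp, uint32_t len)
{
	return (len + qp->pmtu - 1) >> qp->log_pmtu;
}

/* Number of whole pmtu-sized packets contained in len bytes. */
static inline uint32_t ex_div_mtu(const struct ex_qp *qp, uint32_t len)
{
	return len >> qp->log_pmtu;
}
```

For pmtu = 4096 (log_pmtu = 12), a length of 8193 bytes gives 3 from the
rounding-up helper and 2 from the truncating one.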

Change divides in qib, hfi1 to use the new inlines.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Mike Marciniszyn c64607aa8a IB/hfi1,IB/qib: use rvt swqe mr deref helper
Convert to use new swqe put routine.

Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Harish Chegondi 9d8145a604 IB/hfi1: Avoid credit return allocation for cpu-less NUMA nodes
Do not allocate credit return base and DMA memory for
NUMA nodes without CPUs.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Mike Marciniszyn 0771da5a6e IB/hfi1,IB/qib: Use new send completion helper
Convert cq completion returns in both rdmavt drivers
to use the new helper.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Mike Marciniszyn f2dc9cdce8 IB/rdmavt: Add a send completion helper
This is for use by client drivers to drive
send completions into a CQ.

A new exported table allows for the mapping
of ib_wr_opcode into a ib_wc_opcode.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Sebastian Sanchez 238b1862b4 IB/qib: Use standard refcount wrapper for QPs
Use the standard driver wrapper for QP reference counters.
This makes the code more maintainable.

Fixes: Commit 4d6f85c3fa ("IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routines")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Sebastian Sanchez f84dfa26e6 IB/hfi1: Use reference count wrapper for MRs
Some parts of the code don't use the standard driver
wrapper for memory region reference counters. Use the
standard driver wrapper throughout the code.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Sebastian Sanchez b44980f879 IB/hfi1: Replace qp->refcount release code with standard driver wrapper
Some parts of the code don't use the standard release
wrapper rvt_put_qp() for decrementing and testing
the refcount to then try to use a resource.
Replace this code with the standard driver wrapper.

Fixes: Commit 4d6f85c3fa ("IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routines")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Dean Luick 0080167467 IB/hfi1: Preserve external device completed bit
The driver should not change the external device request
completed bit when not actually doing an external device
request.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Sebastian Sanchez 9b86071c5e IB/hfi1: Remove critical section gap in sc_buffer_alloc()
In sc_buffer_alloc(), the sc->alloc_lock is released
before calling sc_release_update(), and it is reacquired
after the function call. This causes CPU lock trading.
Fix it by not dropping the lock before calling
sc_release_update().

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Mitko Haralanov b777f154a0 IB/hfi1: Remove usage of qp->s_cur_sge
The s_cur_sge field in the qp structure holds a pointer to the
SGE of the currently processed WQE. It assumes the protection
of the RVT_S_BUSY flag to prevent the changing of this field
while the send engine is using it. This scheme works as long
as there is only one instance of the send engine running at a
time.

Scaling of the send engine to multiple cores would break this
assumption as there could be multiple instances of the send engine
running on different CPUs. This opens a window where the QP's
RVT_S_BUSY flag is not set but the send engine is still running.

To prevent accidental changing of the s_cur_sge pointer, the QP's
dependence on it is removed. The SGE pointer is now stored in the
verbs_txreq, which is a per-packet data structure. This ensures
that each individual packet has its own pointer, which is set up
while the RVT_S_BUSY flag is set.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Mike Marciniszyn fcb29a6668 IB/rdmavt: Add trace of MR segs
Add tracing of MR segment information.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00
Dean Luick 5213006ade IB/hfi1: Add special setting for low power AOC
Low power QSFP AOC cables require a different SerDes
Tx PLL bandwidth setting than the default.  The
8051 firmware does not know the details, so the driver
needs to tell the firmware through a special setting.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11 15:29:42 -05:00