Commit Graph

5221 Commits

Author SHA1 Message Date
Yixian Liu df7e404258 RDMA/hns: Fix init resp when alloc ucontext
The data in resp will be copied from kernel to userspace, thus it needs to
be initialized to zeros to avoid copying uninited stack memory.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e088a685ea ("RDMA/hns: Support rq record doorbell for the user space")
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:41:41 -06:00
Honggang Li 7672ed33c4 IB/mlx5: Set the default active rate and width to QDR and 4X
Before commit f1b65df5a2 ("IB/mlx5: Add support for active_width and
active_speed in RoCE"), the mlx5_ib driver set the default active_width
and active_speed to IB_WIDTH_4X and IB_SPEED_QDR.

When the RoCE port is down, the RoCE port does not negotiate the active
width with the remote side, causing the active width to be zero. When
running userspace ibstat to view the port status, ibstat will panic as it
reads an invalid width from sys file.

This patch restores the original behavior.

Fixes: f1b65df5a2 ("IB/mlx5: Add support for active_width and active_speed in RoCE").
Signed-off-by: Honggang Li <honli@redhat.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:39:47 -06:00
Ganesh Goudar 8b7372c101 cxgb4: notify fatal error to uld drivers
notify uld drivers if the adapter encounters fatal
error.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 14:46:57 -04:00
Guy Levi 6d06c9aa38 IB/mlx4: Add Scatter FCS support over WQ creation
As a default, for Ethernet packets, the device scatters only the payload
of ingress packets. The scatter FCS feature lets the user to get the FCS
(Ethernet's frame check sequence) in the received WR's buffer as a 4
Bytes trailer following the packet's payload.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:58:05 -06:00
Yishai Hadas 9c71172c4a IB/mlx4: Report TSO capabilities
Report to the user area the TSO device capabilities, it includes the
max_tso size and the QP types that support it.

The TSO is applicable only when when of the ports is ETH and the device
supports it.

uresp logic around rss_caps is updated to fix a till-now harmless bug
computing the length of the structure to copy. The code did not handle the
implicit padding before rss_caps correctly. This is necessay to copy
tss_caps successfully.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:58:05 -06:00
Henry Orosco 546b1452fd i40iw: Tear-down connection after CQP Modify QP failure
There is no explicit tear-down sequence initiated on
connections if the Control QP OP, Modify QP to close,
fails. Fix this by triggering a driver generated
Asynchronous Event (AE) on Modify QP failures and
tear-down the connection on receipt of the AE.

This fix can be generalized to other Modify QP failures
(i.e. RTS->TERM, IDLE->RTS, etc) as any modify failure
will require a connection tear-down.

Fixes: d374984179 ("i40iw: add files for iwarp interface")
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:58:05 -06:00
Henry Orosco a8b9234b12 i40iw: Refactor of driver generated AEs
The flush CQP OP can be used to optionally generate
Asynchronous Events (AEs) in addition to QP flush.
Consolidate all HW AE generation code under a new
function i40iw_gen_ae which use the flush CQP OP
to only generate AEs.

Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:58:04 -06:00
Jason Gunthorpe 7f86260b5f RDMA/cxgb4: Use structs to describe the uABI instead of opencoding
Open coding a loose value is not acceptable for describing the uABI in
RDMA. Provide the missing struct.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:58:04 -06:00
Jason Gunthorpe 633fb4d9fd RDMA/hns: Use structs to describe the uABI instead of opencoding
Open coding a loose value is not acceptable for describing the uABI in
RDMA. Provide the missing struct.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:58:04 -06:00
Jason Gunthorpe 9a657b4c4a RDMA/i40iw: Move uapi header to include/uapi
All of these defines are part of the uABI for the driver, this
header duplicates providers/i40iw/i40iw-abi.h in rdma-core.

Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:58:03 -06:00
Jason Gunthorpe 48962f5c6f RDMA/mlx4: Move flag constants to uapi header
MLX4_USER_DEV_CAP_LARGE_CQE (via mlx4_ib_alloc_ucontext_resp.dev_caps)
and MLX4_IB_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET (via
mlx4_uverbs_ex_query_device_resp.comp_mask) are copied directly to
userspace and form part of the uAPI.

Move them to the uapi header where they belong.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:58:03 -06:00
Sinan Kaya 561e5d4896 RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs
Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:35:44 -06:00
Yixian Liu 7b48221cf4 RDMA/hns: Fix cqn type and init resp
This patch changes the type of cqn from u32 to u64 to keep
userspace and kernel consistent, initializes resp both for
cq and qp to zeros, and also changes the condition judgment
of outlen considering future caps extension.

Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Fixes: e088a685ea (hns: Support rq record doorbell for the user space)
Fixes: 9b44703d0a (hns: Support cq record doorbell for the user space)
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:34:26 -06:00
Parav Pandit 115b68aa6e IB/ocrdma: Removed GID add/del null routines
add_gid() and del_gid() are optional callback routines.
ib_core ignores invoking them while updating GID table entries if
they are not implemented by provider drivers. Therefore remove them.

Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:17:48 -06:00
Leon Romanovsky eeea6953c4 RDMA/mlx5: Simplify clean and destroy MR calls
The failure to destroy the MRs is printed on mlx5_core layer
as error and it makes warning prints useless.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15 10:59:58 -04:00
Leon Romanovsky c985bd0ed7 RDMA/mlx5: Guard ODP specific assignments with specific CONFIG
"live" is needed for ODP only and is better to be guarded
by appropriate CONFIG.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15 10:59:58 -04:00
Leon Romanovsky 4638a3b242 RDMA/mlx5: Unify error flows in rereg MR failure paths
According to the IBTA spec 1.3, the driver failure in
MR reregister shall release old and new MRs.

 C11-20: If the CI returns any other error, the CI shall
 invalidate both "old" and "new" registrations, and release
 any associated resources.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15 10:59:58 -04:00
Leon Romanovsky ea30f01376 RDMA/mlx5: Return proper value for not-supported command
Return -EOPNOTSUPP value to the user for unsupported reg_user_mr.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15 10:59:58 -04:00
Leon Romanovsky 4289861d88 RDMA/mlx5: Protect from NULL pointer derefence
The mlx5_ib_alloc_implicit_mr() can fail to acquire pages
and the returned mr pointer won't be valid. Ensure that it
is not error prior to access.

Cc: <stable@vger.kernel.org> # 4.10
Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15 10:59:58 -04:00
Doug Ledford 2d873449a2 Merge branch 'k.o/wip/dl-for-rc' into k.o/wip/dl-for-next
Due to bug fixes found by the syzkaller bot and taken into the for-rc
branch after development for the 4.17 merge window had already started
being taken into the for-next branch, there were fairly non-trivial
merge issues that would need to be resolved between the for-rc branch
and the for-next branch.  This merge resolves those conflicts and
provides a unified base upon which ongoing development for 4.17 can
be based.

Conflicts:
	drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
	(IB/mlx5: Fix cleanup order on unload) added to for-rc and
	commit b5ca15ad7e (IB/mlx5: Add proper representors support)
	add as part of the devel cycle both needed to modify the
	init/de-init functions used by mlx5.  To support the new
	representors, the new functions added by the cleanup patch
	needed to be made non-static, and the init/de-init list
	added by the representors patch needed to be modified to
	match the init/de-init list changes made by the cleanup
	patch.
Updates:
	drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
	prototypes added by representors patch to reflect new function
	names as changed by cleanup patch
	drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
	stage list to match new order from cleanup patch

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 19:28:58 -04:00
Arnd Bergmann bd8602ca42 infiniband: bnxt_re: use BIT_ULL() for 64-bit bit masks
On 32-bit targets, we otherwise get a warning about an impossible constant
integer expression:

In file included from include/linux/kernel.h:11,
                 from include/linux/interrupt.h:6,
                 from drivers/infiniband/hw/bnxt_re/ib_verbs.c:39:
drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_query_device':
include/linux/bitops.h:7:24: error: left shift count >= width of type [-Werror=shift-count-overflow]
 #define BIT(nr)   (1UL << (nr))
                        ^~
drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 'BIT'
 #define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39)
                                  ^~~
drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE_HIGH'
 #define BNXT_RE_MAX_MR_SIZE  BNXT_RE_MAX_MR_SIZE_HIGH
                              ^~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/bnxt_re/ib_verbs.c:149:25: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE'
  ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE;
                         ^~~~~~~~~~~~~~~~~~~

Fixes: 872f357824 ("RDMA/bnxt_re: Add support for MRs with Huge pages")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-14 18:24:13 -04:00
Arnd Bergmann 5388a50847 infiniband: qplib_fp: fix pointer cast
Building for a 32-bit target results in a couple of warnings from casting
between a 32-bit pointer and a 64-bit integer:

drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_service_nq':
drivers/infiniband/hw/bnxt_re/qplib_fp.c:333:23: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
    bnxt_qplib_arm_srq((struct bnxt_qplib_srq *)q_handle,
                       ^
drivers/infiniband/hw/bnxt_re/qplib_fp.c:336:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
            (struct bnxt_qplib_srq *)q_handle,
            ^
In file included from include/linux/byteorder/little_endian.h:5,
                 from arch/arm/include/uapi/asm/byteorder.h:22,
                 from include/asm-generic/bitops/le.h:6,
                 from arch/arm/include/asm/bitops.h:342,
                 from include/linux/bitops.h:38,
                 from include/linux/kernel.h:11,
                 from include/linux/interrupt.h:6,
                 from drivers/infiniband/hw/bnxt_re/qplib_fp.c:39:
drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_create_srq':
include/uapi/linux/byteorder/little_endian.h:31:43: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
 #define __cpu_to_le64(x) ((__force __le64)(__u64)(x))
                                           ^
include/linux/byteorder/generic.h:86:21: note: in expansion of macro '__cpu_to_le64'
 #define cpu_to_le64 __cpu_to_le64
                     ^~~~~~~~~~~~~
drivers/infiniband/hw/bnxt_re/qplib_fp.c:569:19: note: in expansion of macro 'cpu_to_le64'
  req.srq_handle = cpu_to_le64(srq);

Using a uintptr_t as an intermediate works on all architectures.

Fixes: 37cb11acf1 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-14 18:24:12 -04:00
Mark Bloch 42cea83f95 IB/mlx5: Fix cleanup order on unload
On load we create private CQ/QP/PD in order to be used by UMR, we create
those resources after we register ourself as an IB device, and we destroy
them after we unregister as an IB device. This was changed by commit
16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove
stages") which moved the destruction before we unregistration. This
allowed to trigger an invalid memory access when unloading mlx5_ib while
there are open resources:

BUG: unable to handle kernel paging request at 00000001002c012c
...
Call Trace:
 mlx5_ib_post_send_wait+0x75/0x110 [mlx5_ib]
 __slab_free+0x9a/0x2d0
 delay_time_func+0x10/0x10 [mlx5_ib]
 unreg_umr.isra.15+0x4b/0x50 [mlx5_ib]
 mlx5_mr_cache_free+0x46/0x150 [mlx5_ib]
 clean_mr+0xc9/0x190 [mlx5_ib]
 dereg_mr+0xba/0xf0 [mlx5_ib]
 ib_dereg_mr+0x13/0x20 [ib_core]
 remove_commit_idr_uobject+0x16/0x70 [ib_uverbs]
 uverbs_cleanup_ucontext+0xe8/0x1a0 [ib_uverbs]
 ib_uverbs_cleanup_ucontext.isra.9+0x19/0x40 [ib_uverbs]
 ib_uverbs_remove_one+0x162/0x2e0 [ib_uverbs]
 ib_unregister_device+0xd4/0x190 [ib_core]
 __mlx5_ib_remove+0x2e/0x40 [mlx5_ib]
 mlx5_remove_device+0xf5/0x120 [mlx5_core]
 mlx5_unregister_interface+0x37/0x90 [mlx5_core]
 mlx5_ib_cleanup+0xc/0x225 [mlx5_ib]
 SyS_delete_module+0x153/0x230
 do_syscall_64+0x62/0x110
 entry_SYSCALL_64_after_hwframe+0x21/0x86
...

We restore the original behavior by breaking the UMR stage into two parts,
pre and post IB registration stages, this way we can restore the original
functionality and maintain clean separation of logic between stages.

Fixes: 16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove stages")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:44:02 -04:00
Arnd Bergmann baa00fcde4 RDMA/i40iw: include linux/irq.h
We get a build failure on ARM unless the header is included explicitly:

drivers/infiniband/hw/i40iw/i40iw_verbs.c: In function 'i40iw_get_vector_affinity':
drivers/infiniband/hw/i40iw/i40iw_verbs.c:2747:9: error: implicit declaration of function 'irq_get_affinity_mask'; did you mean 'irq_create_affinity_masks'? [-Werror=implicit-function-declaration]
  return irq_get_affinity_mask(msix_vec->irq);
         ^~~~~~~~~~~~~~~~~~~~~
         irq_create_affinity_masks
drivers/infiniband/hw/i40iw/i40iw_verbs.c:2747:9: error: returning 'int' from a function with return type 'const struct cpumask *' makes pointer from integer without a cast [-Werror=int-conversion]
  return irq_get_affinity_mask(msix_vec->irq);

Fixes: 7e952b19eb ("i40iw: Implement get_vector_affinity API")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:09:56 -04:00
Ilya Lesokhin c44ef998f2 IB/mlx5: Maintain a single emergency page
The mlx5 driver needs to be able to issue invalidation to ODP MRs
even if it cannot allocate memory. To this end it preallocates
emergency pages to use when the situation arises.

This flow should be extremely rare enough, that we don't need
to worry about contention and therefore a single emergency page
is good enough.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:05:16 -04:00
Daniel Jurgens 65edd0e758 IB/mlx5: Only synchronize RCU once when removing mkeys
Instead synchronizing RCU in a loop when removing mkeys in a batch do it
once at the end before freeing them. The result is only waiting for one
RCU grace period instead of many serially.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:05:16 -04:00
Leon Romanovsky f3f134f526 RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory
The failure in rereg_mr flow caused to set garbage value (error value)
into mr->umem pointer. This pointer is accessed at the release stage
and it causes to the following crash.

There is not enough to simply change umem to point to NULL, because the
MR struct is needed to be accessed during MR deregistration phase, so
delay kfree too.

[    6.237617] BUG: unable to handle kernel NULL pointer dereference a 0000000000000228
[    6.238756] IP: ib_dereg_mr+0xd/0x30
[    6.239264] PGD 80000000167eb067 P4D 80000000167eb067 PUD 167f9067 PMD 0
[    6.240320] Oops: 0000 [#1] SMP PTI
[    6.240782] CPU: 0 PID: 367 Comm: dereg Not tainted 4.16.0-rc1-00029-gc198fafe0453 #183
[    6.242120] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    6.244504] RIP: 0010:ib_dereg_mr+0xd/0x30
[    6.245253] RSP: 0018:ffffaf5d001d7d68 EFLAGS: 00010246
[    6.246100] RAX: 0000000000000000 RBX: ffff95d4172daf00 RCX: 0000000000000000
[    6.247414] RDX: 00000000ffffffff RSI: 0000000000000001 RDI: ffff95d41a317600
[    6.248591] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[    6.249810] R10: ffff95d417033c10 R11: 0000000000000000 R12: ffff95d4172c3a80
[    6.251121] R13: ffff95d4172c3720 R14: ffff95d4172c3a98 R15: 00000000ffffffff
[    6.252437] FS:  0000000000000000(0000) GS:ffff95d41fc00000(0000) knlGS:0000000000000000
[    6.253887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.254814] CR2: 0000000000000228 CR3: 00000000172b4000 CR4: 00000000000006b0
[    6.255943] Call Trace:
[    6.256368]  remove_commit_idr_uobject+0x1b/0x80
[    6.257118]  uverbs_cleanup_ucontext+0xe4/0x190
[    6.257855]  ib_uverbs_cleanup_ucontext.constprop.14+0x19/0x40
[    6.258857]  ib_uverbs_close+0x2a/0x100
[    6.259494]  __fput+0xca/0x1c0
[    6.259938]  task_work_run+0x84/0xa0
[    6.260519]  do_exit+0x312/0xb40
[    6.261023]  ? __do_page_fault+0x24d/0x490
[    6.261707]  do_group_exit+0x3a/0xa0
[    6.262267]  SyS_exit_group+0x10/0x10
[    6.262802]  do_syscall_64+0x75/0x180
[    6.263391]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[    6.264253] RIP: 0033:0x7f1b39c49488
[    6.264827] RSP: 002b:00007ffe2de05b68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[    6.266049] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1b39c49488
[    6.267187] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[    6.268377] RBP: 00007f1b39f258e0 R08: 00000000000000e7 R09: ffffffffffffff98
[    6.269640] R10: 00007f1b3a147260 R11: 0000000000000246 R12: 00007f1b39f258e0
[    6.270783] R13: 00007f1b39f2ac20 R14: 0000000000000000 R15: 0000000000000000
[    6.271943] Code: 74 07 31 d2 e9 25 d8 6c 00 b8 da ff ff ff c3 0f 1f
44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 07 53 48 8b
5f 08 <48> 8b 80 28 02 00 00 e8 f7 d7 6c 00 85 c0 75 04 3e ff 4b 18 5b
[    6.274927] RIP: ib_dereg_mr+0xd/0x30 RSP: ffffaf5d001d7d68
[    6.275760] CR2: 0000000000000228
[    6.276200] ---[ end trace a35641f1c474bd20 ]---

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org>
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 15:37:53 -04:00
Leon Romanovsky fbf1795c96 RDMA/pvrdma: Properly annotate QP states
QP states provided by core layer are converted to enum ib_qp_state
and better to use internal variable in that type instead of int.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 15:34:25 -04:00
Leon Romanovsky 75a4598209 RDMA/mlx5: Fix NULL dereference while accessing XRC_TGT QPs
mlx5 modify_qp() relies on FW that the error will be thrown if wrong
state is supplied. The missing check in FW causes the following crash
while using XRC_TGT QPs.

[   14.769632] BUG: unable to handle kernel NULL pointer dereference at (null)
[   14.771085] IP: mlx5_ib_modify_qp+0xf60/0x13f0
[   14.771894] PGD 800000001472e067 P4D 800000001472e067 PUD 14529067 PMD 0
[   14.773126] Oops: 0002 [#1] SMP PTI
[   14.773763] CPU: 0 PID: 365 Comm: ubsan Not tainted 4.16.0-rc1-00038-g8151138c0793 #119
[   14.775192] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[   14.777522] RIP: 0010:mlx5_ib_modify_qp+0xf60/0x13f0
[   14.778417] RSP: 0018:ffffbf48001c7bd8 EFLAGS: 00010246
[   14.779346] RAX: 0000000000000000 RBX: ffff9a8f9447d400 RCX: 0000000000000000
[   14.780643] RDX: 0000000000000000 RSI: 000000000000000a RDI: 0000000000000000
[   14.781930] RBP: 0000000000000000 R08: 00000000000217b0 R09: ffffffffbc9c1504
[   14.783214] R10: fffff4a180519480 R11: ffff9a8f94523600 R12: ffff9a8f9493e240
[   14.784507] R13: ffff9a8f9447d738 R14: 000000000000050a R15: 0000000000000000
[   14.785800] FS:  00007f545b466700(0000) GS:ffff9a8f9fc00000(0000) knlGS:0000000000000000
[   14.787073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.787792] CR2: 0000000000000000 CR3: 00000000144be000 CR4: 00000000000006b0
[   14.788689] Call Trace:
[   14.789007]  _ib_modify_qp+0x71/0x120
[   14.789475]  modify_qp.isra.20+0x207/0x2f0
[   14.790010]  ib_uverbs_modify_qp+0x90/0xe0
[   14.790532]  ib_uverbs_write+0x1d2/0x3c0
[   14.791049]  ? __handle_mm_fault+0x93c/0xe40
[   14.791644]  __vfs_write+0x36/0x180
[   14.792096]  ? handle_mm_fault+0xc1/0x210
[   14.792601]  vfs_write+0xad/0x1e0
[   14.793018]  SyS_write+0x52/0xc0
[   14.793422]  do_syscall_64+0x75/0x180
[   14.793888]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[   14.794527] RIP: 0033:0x7f545ad76099
[   14.794975] RSP: 002b:00007ffd78787468 EFLAGS: 00000287 ORIG_RAX: 0000000000000001
[   14.795958] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f545ad76099
[   14.797075] RDX: 0000000000000078 RSI: 0000000020009000 RDI: 0000000000000003
[   14.798140] RBP: 00007ffd78787470 R08: 00007ffd78787480 R09: 00007ffd78787480
[   14.799207] R10: 00007ffd78787480 R11: 0000000000000287 R12: 00005599ada98760
[   14.800277] R13: 00007ffd78787560 R14: 0000000000000000 R15: 0000000000000000
[   14.801341] Code: 4c 8b 1c 24 48 8b 83 70 02 00 00 48 c7 83 cc 02 00
00 00 00 00 00 48 c7 83 24 03 00 00 00 00 00 00 c7 83 2c 03 00 00 00 00
00 00 <c7> 00 00 00 00 00 48 8b 83 70 02 00 00 c7 40 04 00 00 00 00 4c
[   14.804012] RIP: mlx5_ib_modify_qp+0xf60/0x13f0 RSP: ffffbf48001c7bd8
[   14.804838] CR2: 0000000000000000
[   14.805288] ---[ end trace 3f1da0df5c8b7c37 ]---

Cc: syzkaller <syzkaller@googlegroups.com>
Reported-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 15:34:25 -04:00
Zhu Yanjun 8a18e911d0 IB: remove duplicate header files
In hfi.h, the header file opa_addr.h is included twice.
In vt.h, the header file mmap.h is included twice.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:46:03 -04:00
Yixian Liu 86188a8810 RDMA/hns: Support cq record doorbell for kernel space
This patch updates to support cq record doorbell for
the kernel space.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:40:15 -04:00
Yixian Liu 472bc0fbd4 RDMA/hns: Support rq record doorbell for kernel space
This patch updates to support rq record doorbell for
the kernel space.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:40:15 -04:00
Yixian Liu 9b44703d0a RDMA/hns: Support cq record doorbell for the user space
This patch updates to support cq record doorbell for
the user space.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:40:15 -04:00
Yixian Liu e088a685ea RDMA/hns: Support rq record doorbell for the user space
This patch adds interfaces and definitions to support the rq record
doorbell for the user space.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:40:15 -04:00
Boris Pismenny c2b37f7648 IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
This patch validates user provided input to prevent integer overflow due
to integer manipulation in the mlx5_ib_create_srq function.

Cc: syzkaller <syzkaller@googlegroups.com>
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:31:21 -04:00
Boris Pismenny 2c292dbb39 IB/mlx5: Fix out-of-bounds read in create_raw_packet_qp_rq
Add a check for the length of the qpin structure to prevent out-of-bounds reads

BUG: KASAN: slab-out-of-bounds in create_raw_packet_qp+0x114c/0x15e2
Read of size 8192 at addr ffff880066b99290 by task syz-executor3/549

CPU: 3 PID: 549 Comm: syz-executor3 Not tainted 4.15.0-rc2+ #27 Hardware
name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0x8d/0xd4
 print_address_description+0x73/0x290
 kasan_report+0x25c/0x370
 ? create_raw_packet_qp+0x114c/0x15e2
 memcpy+0x1f/0x50
 create_raw_packet_qp+0x114c/0x15e2
 ? create_raw_packet_qp_tis.isra.28+0x13d/0x13d
 ? lock_acquire+0x370/0x370
 create_qp_common+0x2245/0x3b50
 ? destroy_qp_user.isra.47+0x100/0x100
 ? kasan_kmalloc+0x13d/0x170
 ? sched_clock_cpu+0x18/0x180
 ? fs_reclaim_acquire.part.15+0x5/0x30
 ? __lock_acquire+0xa11/0x1da0
 ? sched_clock_cpu+0x18/0x180
 ? kmem_cache_alloc_trace+0x17e/0x310
 ? mlx5_ib_create_qp+0x30e/0x17b0
 mlx5_ib_create_qp+0x33d/0x17b0
 ? sched_clock_cpu+0x18/0x180
 ? create_qp_common+0x3b50/0x3b50
 ? lock_acquire+0x370/0x370
 ? __radix_tree_lookup+0x180/0x220
 ? uverbs_try_lock_object+0x68/0xc0
 ? rdma_lookup_get_uobject+0x114/0x240
 create_qp.isra.5+0xce4/0x1e20
 ? ib_uverbs_ex_create_cq_cb+0xa0/0xa0
 ? copy_ah_attr_from_uverbs.isra.2+0xa00/0xa00
 ? ib_uverbs_cq_event_handler+0x160/0x160
 ? __might_fault+0x17c/0x1c0
 ib_uverbs_create_qp+0x21b/0x2a0
 ? ib_uverbs_destroy_cq+0x2e0/0x2e0
 ib_uverbs_write+0x55a/0xad0
 ? ib_uverbs_destroy_cq+0x2e0/0x2e0
 ? ib_uverbs_destroy_cq+0x2e0/0x2e0
 ? ib_uverbs_open+0x760/0x760
 ? futex_wake+0x147/0x410
 ? check_prev_add+0x1680/0x1680
 ? do_futex+0x3d3/0xa60
 ? sched_clock_cpu+0x18/0x180
 __vfs_write+0xf7/0x5c0
 ? ib_uverbs_open+0x760/0x760
 ? kernel_read+0x110/0x110
 ? lock_acquire+0x370/0x370
 ? __fget+0x264/0x3b0
 vfs_write+0x18a/0x460
 SyS_write+0xc7/0x1a0
 ? SyS_read+0x1a0/0x1a0
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x4477b9
RSP: 002b:00007f1822cadc18 EFLAGS: 00000292 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000004477b9
RDX: 0000000000000070 RSI: 000000002000a000 RDI: 0000000000000005
RBP: 0000000000708000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 00000000ffffffff
R13: 0000000000005d70 R14: 00000000006e6e30 R15: 0000000020010ff0

Allocated by task 549:
 __kmalloc+0x15e/0x340
 kvmalloc_node+0xa1/0xd0
 create_user_qp.isra.46+0xd42/0x1610
 create_qp_common+0x2e63/0x3b50
 mlx5_ib_create_qp+0x33d/0x17b0
 create_qp.isra.5+0xce4/0x1e20
 ib_uverbs_create_qp+0x21b/0x2a0
 ib_uverbs_write+0x55a/0xad0
 __vfs_write+0xf7/0x5c0
 vfs_write+0x18a/0x460
 SyS_write+0xc7/0x1a0
 entry_SYSCALL_64_fastpath+0x18/0x85

Freed by task 368:
 kfree+0xeb/0x2f0
 kernfs_fop_release+0x140/0x180
 __fput+0x266/0x700
 task_work_run+0x104/0x180
 exit_to_usermode_loop+0xf7/0x110
 syscall_return_slowpath+0x298/0x370
 entry_SYSCALL_64_fastpath+0x83/0x85

The buggy address belongs to the object at ffff880066b99180  which
belongs to the cache kmalloc-512 of size 512 The buggy address is
located 272 bytes inside of  512-byte region [ffff880066b99180,
ffff880066b99380) The buggy address belongs to the page:
page:000000006040eedd count:1 mapcount:0 mapping:          (null)
index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 0000000180190019
raw: ffffea00019a7500 0000000b0000000b ffff88006c403080 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880066b99180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880066b99200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880066b99280: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
 ffff880066b99300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff880066b99380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Cc: syzkaller <syzkaller@googlegroups.com>
Fixes: 0fb2ed66a1 ("IB/mlx5: Add create and destroy functionality for Raw Packet QP")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:30:21 -04:00
Bart Van Assche 036ef0a1a8 RDMA/bnxt_re: Remove an unused variable
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Selvin Xavier <selvin.xavier@broadcom.com>
Cc: Devesh Sharma <devesh.sharma@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:21:42 -04:00
Bart Van Assche 8932ff803d IB/hfi1: Fix a kernel-doc warning
Avoid that building with W=1 causes the following warning to appear:

drivers/infiniband/hw/hfi1/qp.c:484: warning: Cannot understand * on line 484 - I thought it was a doc line

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-13 16:21:14 -04:00
Leon Romanovsky 28e9091e31 RDMA/mlx5: Fix integer overflow while resizing CQ
The user can provide very large cqe_size which will cause to integer
overflow as it can be seen in the following UBSAN warning:

=======================================================================
UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/cq.c:1192:53
signed integer overflow:
64870 * 65536 cannot be represented in type 'int'
CPU: 0 PID: 267 Comm: syzkaller605279 Not tainted 4.15.0+ #90 Hardware
name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xde/0x164
 ? dma_virt_map_sg+0x22c/0x22c
 ubsan_epilogue+0xe/0x81
 handle_overflow+0x1f3/0x251
 ? __ubsan_handle_negate_overflow+0x19b/0x19b
 ? lock_acquire+0x440/0x440
 mlx5_ib_resize_cq+0x17e7/0x1e40
 ? cyc2ns_read_end+0x10/0x10
 ? native_read_msr_safe+0x6c/0x9b
 ? cyc2ns_read_end+0x10/0x10
 ? mlx5_ib_modify_cq+0x220/0x220
 ? sched_clock_cpu+0x18/0x200
 ? lookup_get_idr_uobject+0x200/0x200
 ? rdma_lookup_get_uobject+0x145/0x2f0
 ib_uverbs_resize_cq+0x207/0x3e0
 ? ib_uverbs_ex_create_cq+0x250/0x250
 ib_uverbs_write+0x7f9/0xef0
 ? cyc2ns_read_end+0x10/0x10
 ? print_irqtrace_events+0x280/0x280
 ? ib_uverbs_ex_create_cq+0x250/0x250
 ? uverbs_devnode+0x110/0x110
 ? sched_clock_cpu+0x18/0x200
 ? do_raw_spin_trylock+0x100/0x100
 ? __lru_cache_add+0x16e/0x290
 __vfs_write+0x10d/0x700
 ? uverbs_devnode+0x110/0x110
 ? kernel_read+0x170/0x170
 ? sched_clock_cpu+0x18/0x200
 ? security_file_permission+0x93/0x260
 vfs_write+0x1b0/0x550
 SyS_write+0xc7/0x1a0
 ? SyS_read+0x1a0/0x1a0
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL_64_fastpath+0x1e/0x8b
RIP: 0033:0x433549
RSP: 002b:00007ffe63bd1ea8 EFLAGS: 00000217
=======================================================================

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 3.13
Fixes: bde51583f4 ("IB/mlx5: Add support for resize CQ")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-09 18:10:48 -05:00
Doug Ledford 212a0cbc56 Revert "RDMA/mlx5: Fix integer overflow while resizing CQ"
The original commit of this patch has a munged log message that is
missing several of the tags the original author intended to be on the
patch.  This was due to patchworks misinterpreting a cut-n-paste
separator line as an end of message line and munging the mbox that was
used to import the patch:

https://patchwork.kernel.org/patch/10264089/

The original patch will be reapplied with a fixed commit message so the
proper tags are applied.

This reverts commit aa0de36a40.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-09 18:07:46 -05:00
Steve Wise 5292443431 mlx4_ib: zero out struct ib_pd when allocating
Zero out the fields of the struct ib_pd for user mode pds so that
users querying pds via nldev will not get garbage.  For simplicity,
use kzalloc() to allocate the mlx4_ib_pd struct.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise e6f0330106 mlx4_ib: set user mr attributes in struct ib_mr
Setting iova, length, and page_size allows this information to be
seen via NLDEV netlink queries, which can aid in user rdma debugging.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Steve Wise 750fb1656a iw_cxgb4: initialize ib_mr fields for user mrs
Some of the struct ib_mr fields weren't getting initialized.  This was
benign, but will cause problems when dumping the mr resource via
nldev/restrack.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08 15:03:03 -05:00
Yishai Hadas d50a8a96ee IB/mlx4: Move mlx4_uverbs_ex_query_device_resp to include/uapi/
This struct is involved in the user API for mlx4 and should not be hidden
inside a driver header file.

Fixes: 09d208b258 ("IB/mlx4: Add report for RSS capabilities by vendor channel")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-07 16:10:07 -07:00
Doug Ledford 1abb791fcd Merge tag 'mlx5-updates-2018-02-28-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into k.o/wip/dl-for-next
mlx5-updates-2018-02-28-1 (IPSec-1)

This series consists of some fixes and refactors for the mlx5 drivers,
especially around the FPGA and flow steering. Most of them are trivial
fixes and are the foundation of allowing IPSec acceleration from user-space.

We use flow steering abstraction in order to accelerate IPSec packets.
When a user creates a steering rule, [s]he states that we'll carry an
encrypt/decrypt flow action (using a specific configuration) for every
packet which conforms to a certain match. Since currently offloading these
packets is done via FPGA, we'll add another set of flow steering ops.
These ops will execute the required FPGA commands and then call the
standard steering ops.

In order to achieve this, we need that the commands will get all the
required information. Therefore, we pass the fte object and embed the
flow_action struct inside the fte. In addition, we add the shim layer
that will later be used for alternating between the standard and the
FPGA steering commands.

Some fixes, like " net/mlx5e: Wait for FPGA command responses with a timeout"
are very relevant for user-space applications, as these applications could
be killed, but we still want to wait for the FPGA and update the kernel's
database.

Regards,
Aviad and Matan

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:56:39 -07:00
David S. Miller bcde6b725f mlx5-updates-2018-02-28-1 (IPSec-1)
This series consists of some fixes and refactors for the mlx5 drivers,
 especially around the FPGA and flow steering. Most of them are trivial
 fixes and are the foundation of allowing IPSec acceleration from user-space.
 
 We use flow steering abstraction in order to accelerate IPSec packets.
 When a user creates a steering rule, [s]he states that we'll carry an
 encrypt/decrypt flow action (using a specific configuration) for every
 packet which conforms to a certain match. Since currently offloading these
 packets is done via FPGA, we'll add another set of flow steering ops.
 These ops will execute the required FPGA commands and then call the
 standard steering ops.
 
 In order to achieve this, we need that the commands will get all the
 required information. Therefore, we pass the fte object and embed the
 flow_action struct inside the fte. In addition, we add the shim layer
 that will later be used for alternating between the standard and the
 FPGA steering commands.
 
 Some fixes, like " net/mlx5e: Wait for FPGA command responses with a timeout"
 are very relevant for user-space applications, as these applications could
 be killed, but we still want to wait for the FPGA and update the kernel's
 database.
 
 Regards,
 Aviad and Matan
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJan4UmAAoJEEg/ir3gV/o+cZwH/1xBpdLsmeqEimwQ41bAc9Rj
 UmPZXXMyQVUYfGOiE1aLTH7YNi38XWSnTFMN7HklMeX/9YKxUZNG8YuiO9iQhE1B
 rUqRKfYFz9oFrUh95SzeclaunTpKrhYKHjQ9/u9nBMfI3H2Fy+y2NUBjIqJ6nysz
 op3EwRcX5kD4+MjRum24XLUnMSYbg05mHCZKTDj5/2T4x+/j0XQqvvmWWinIt8BO
 R4d7XGGywGjbhtcG1j+XBcFeLsEZnS/w70hoN38TdcmNWvokl1pGk8DVDii9i7GX
 c5jQj2h5WG/bdsS26y8MFfWpoAn3Qzzm4W4OYwp/vmL7n/Llvq0GRCEKCi8AlA0=
 =LeYV
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-02-28-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-02-28-1 (IPSec-1)

This series consists of some fixes and refactors for the mlx5 drivers,
especially around the FPGA and flow steering. Most of them are trivial
fixes and are the foundation of allowing IPSec acceleration from user-space.

We use flow steering abstraction in order to accelerate IPSec packets.
When a user creates a steering rule, [s]he states that we'll carry an
encrypt/decrypt flow action (using a specific configuration) for every
packet which conforms to a certain match. Since currently offloading these
packets is done via FPGA, we'll add another set of flow steering ops.
These ops will execute the required FPGA commands and then call the
standard steering ops.

In order to achieve this, we need that the commands will get all the
required information. Therefore, we pass the fte object and embed the
flow_action struct inside the fte. In addition, we add the shim layer
that will later be used for alternating between the standard and the
FPGA steering commands.

Some fixes, like " net/mlx5e: Wait for FPGA command responses with a timeout"
are very relevant for user-space applications, as these applications could
be killed, but we still want to wait for the FPGA and update the kernel's
database.

Regards,
Aviad and Matan
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07 15:28:13 -05:00
Leon Romanovsky aa0de36a40 RDMA/mlx5: Fix integer overflow while resizing CQ
The user can provide very large cqe_size which will cause to integer
overflow as it can be seen in the following UBSAN warning:

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:23:43 -05:00
Boris Pismenny 3346c48737 {net,IB}/mlx5: Add flow steering helpers
Add helper functions that check if a protocol is
part of a flow steering match criteria.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:20:14 -08:00
Matan Barak a9db0ecf15 {net,IB}/mlx5: Add has_tag to mlx5_flow_act
The has_tag member will indicate whether a tag action was specified
in flow specification.

A flow tag 0 = MLX5_FS_DEFAULT_FLOW_TAG is assumed a valid flow tag
that is currently used by mlx5 RDMA driver, whereas in HW flow_tag = 0
means that the user doesn't care about flow_tag.  HW always provide
a flow_tag = 0 if all flow tags requested on a specific flow are 0.

So we need a way (in the driver) to differentiate between a user really
requesting flow_tag = 0 and a user who does not care, in order to be
able to report conflicting flow tags on a specific flow.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:33 -08:00
Boris Pismenny 075572d4b7 IB/mlx5: Pass mlx5_flow_act struct instead of multiple arguments
Group and pass all function arguments of parse_flow_attr call in one
common struct mlx5_flow_act.

This patch passes all the action arguments of parse_flow_attr in one common
struct mlx5_flow_act. It allows us to scale the number of actions without adding
new arguments to the function.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 22:06:11 -08:00
Aviad Yehezkel c33251a3c6 IB/mlx5: Removed not used parameters
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 22:05:36 -08:00
Selvin Xavier 942c9b6ca8 RDMA/bnxt_re: Avoid Hard lockup during error CQE processing
Hitting the following hardlockup due to a race condition in
error CQE processing.

[26146.879798] bnxt_en 0000:04:00.0: QPLIB: FP: CQ Processed Req
[26146.886346] bnxt_en 0000:04:00.0: QPLIB: wr_id[1251] = 0x0 with status 0xa
[26156.350935] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
[26156.357470] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace
[26156.447957] CPU: 4 PID: 3413 Comm: kworker/4:1H Kdump: loaded
[26156.457994] Hardware name: Dell Inc. PowerEdge R430/0CN7X8,
[26156.466390] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[26156.472639] Call Trace:
[26156.475379]  <NMI>  [<ffffffff98d0d722>] dump_stack+0x19/0x1b
[26156.481833]  [<ffffffff9873f775>] watchdog_overflow_callback+0x135/0x140
[26156.489341]  [<ffffffff9877f237>] __perf_event_overflow+0x57/0x100
[26156.496256]  [<ffffffff98787c24>] perf_event_overflow+0x14/0x20
[26156.502887]  [<ffffffff9860a580>] intel_pmu_handle_irq+0x220/0x510
[26156.509813]  [<ffffffff98d16031>] perf_event_nmi_handler+0x31/0x50
[26156.516738]  [<ffffffff98d1790c>] nmi_handle.isra.0+0x8c/0x150
[26156.523273]  [<ffffffff98d17be8>] do_nmi+0x218/0x460
[26156.528834]  [<ffffffff98d16d79>] end_repeat_nmi+0x1e/0x7e
[26156.534980]  [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
[26156.543268]  [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
[26156.551556]  [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200
[26156.559842]  <EOE>  [<ffffffff98d083e4>] queued_spin_lock_slowpath+0xb/0xf
[26156.567555]  [<ffffffff98d15690>] _raw_spin_lock+0x20/0x30
[26156.573696]  [<ffffffffc08381a1>] bnxt_qplib_lock_buddy_cq+0x31/0x40 [bnxt_re]
[26156.581789]  [<ffffffffc083bbaa>] bnxt_qplib_poll_cq+0x43a/0xf10 [bnxt_re]
[26156.589493]  [<ffffffffc083239b>] bnxt_re_poll_cq+0x9b/0x760 [bnxt_re]

The issue happens if RQ poll_cq or SQ poll_cq or Async error event tries to
put the error QP in flush list. Since SQ and RQ of each error qp are added
to two different flush list, we need to protect it using locks of
corresponding CQs. Difference in order of acquiring the lock in
SQ poll_cq and RQ poll_cq can cause a hard lockup.

Revisits the locking strategy and removes the usage of qplib_cq.hwq.lock.
Instead of this lock, introduces qplib_cq.flush_lock to handle
addition/deletion of QPs in flush list. Also, always invoke the flush_lock
in order (SQ CQ lock first and then RQ CQ lock) to avoid any potential
deadlock.

Other than the poll_cq context, the movement of QP to/from flush list can
be done in modify_qp context or from an async error event from HW.
Synchronize these operations using the bnxt_re verbs layer CQ locks.
To achieve this, adds a call back to the HW abstraction layer(qplib) to
bnxt_re ib_verbs layer in case of async error event. Also, removes the
buddy cq functions as it is no longer required.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:39 -07:00
Dan Carpenter 5d414b178e IB/mlx5: Fix an error code in __mlx5_ib_modify_qp()
"err" is either zero or possibly uninitialized here.  It should be
-EINVAL.

Fixes: 427c1e7bcd ("{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:39 -07:00
Mark Bloch 210b1f7807 IB/mlx5: When not in dual port RoCE mode, use provided port as native
The series that introduced dual port RoCE mode assumed that we don't have
a dual port HCA that use the mlx5 driver, this is not the case for
Connect-IB HCAs. This reasoning led to assigning 1 as the native port
index which causes issue when the second port is used.

For example query_pkey() when called on the second port will return values
of the first port. Make sure that we assign the right port index as the
native port index.

Fixes: 32f69e4be2 ("{net, IB}/mlx5: Manage port association for multiport RoCE")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:38 -07:00
Jack M a18177925c IB/mlx4: Include GID type when deleting GIDs from HW table under RoCE
The commit cited below added a gid_type field (RoCEv1 or RoCEv2)
to GID properties.

When adding GIDs, this gid_type field was copied over to the
hardware gid table. However, when deleting GIDs, the gid_type field
was not copied over to the hardware gid table.

As a result, when running RoCEv2, all RoCEv2 gids in the
hardware gid table were set to type RoCEv1 when any gid was deleted.

This problem would persist until the next gid was added (which would again
restore the gid_type field for all the gids in the hardware gid table).

Fix this by copying over the gid_type field to the hardware gid table
when deleting gids, so that the gid_type of all remaining gids is
preserved when a gid is deleted.

Fixes: b699a859d1 ("IB/mlx4: Add gid_type to GID properties")
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:38 -07:00
Jack Morgenstein 0077416a3d IB/mlx4: Fix corruption of RoCEv2 IPv4 GIDs
When using IPv4 addresses in RoCEv2, the GID format for the mapped
IPv4 address should be: ::ffff:<4-byte IPv4 address>.

In the cited commit, IPv4 mapped IPV6 addresses had the 3 upper dwords
zeroed out by memset, which resulted in deleting the ffff field.

However, since procedure ipv6_addr_v4mapped() already verifies that the
gid has format ::ffff:<ipv4 address>, no change is needed for the gid,
and the memset can simply be removed.

Fixes: 7e57b85c44 ("IB/mlx4: Add support for setting RoCEv2 gids in hardware")
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 20:08:38 -07:00
Kalderon, Michal 551e1c67b4 RDMA/qedr: Fix iWARP write and send with immediate
iWARP does not support RDMA WRITE or SEND with immediate data.
Driver should check this before submitting to FW and return an
immediate error

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 19:57:37 -07:00
Kalderon, Michal e3fd112cbf RDMA/qedr: Fix kernel panic when running fio over NFSoRDMA
Race in qedr_poll_cq, lastest_cqe wasn't protected by lock,
leading to a case where two context's accessing poll_cq at
the same time lead to one of them having a pointer to an old
latest_cqe and reading an invalid cqe element

Signed-off-by: Amit Radzi <Amit.Radzi@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 19:57:37 -07:00
Kalderon, Michal ea0ed47803 RDMA/qedr: Fix iWARP connect with port mapper
Fix iWARP connect and listen to use the mapped port for
ipv4 and ipv6. Without this fixed, running on a server
that has iwpmd enabled will not use the correct port

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 19:57:37 -07:00
Kalderon, Michal 11052696fd RDMA/qedr: Fix ipv6 destination address resolution
The wrong parameter was passed to dst_neigh_lookup

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 19:57:37 -07:00
Gustavo A. R. Silva 63231585a6 RDMA/bnxt_re/qplib_sp: Use true and false for boolean values
Assign true or false to boolean variables instead of an integer value.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Bart Van Assche 4190443947 IB/hfi1: Add a missing rcu_read_unlock()
This patch avoids that sparse reports the following:

drivers/infiniband/hw/hfi1/driver.c:251:13: warning: context imbalance in 'rcv_hdrerr' - different lock contexts for basic block

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Arushi 666fe24bbe infiniband: hw: Drop unnecessary continue
Continue at the bottom of a loop are removed.
Issue found using drop_continue.cocci Coccinelle script.

Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Shiraz Saleem 7e952b19eb i40iw: Implement get_vector_affinity API
Storage ULPs (like NVMEoF) benefit from exposing affinity mapping
per completion vector to find the optimal multi-queue affinity
assignments. The ULPs call the verbs API ib_get_vector_affinity
introduced in commit c66cd353bb ("RDMA/core: expose affinity mappings per
completion vector") to get the underlying devices affinity mappings.

Add support in driver to expose the affinity masks per MSI-X
completion vector.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Shiraz Saleem 7de8b3576a i40iw: Improve CM node lookup time on connection setup
Currently all CM nodes involved in a connection are
maintained in a connected_node list per dev. During
connection setup, we need to search this every time
we receive a packet on the iWARP LAN Queue (ILQ) and
this can be pretty inefficient for large number of
connections.

Fix this by organizing the CM nodes in two lists -
accelerated list and non-accelerated list. The search
on ILQ receive would be limited to only non accelerated
nodes. When a node moves to RTS, it is added to the
accelerated list.

Benchmarking ucmatose 16k connections shows a 20%
improvement in test completion time.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Mustafa Ismail 6b0c549fc6 i40iw: Refactor handling of txpend list
Currently the TX pending lists for IEQ and ILQ are
handled separately. The handling of both can be
consolidated in i40iw_poll_completion.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Hernán Gonzalez 7f566a91b1 IB/qib: Move char *qib_sdma_state_names[] and constify while there.
Note: This is compile only tested as I have no access to the hw.
This variable was not used in qib_sdma.c but in qib_iba7322.c. Declaring it
there, as static, saves 56 bytes.

add/remove: 0/2 grow/shrink: 0/0 up/down: 0/-144 (-144)
Function                                     old     new   delta
qib_sdma_state_names                          56       -     -56
qib_sdma_event_names                          88       -     -88
Total: Before=2874565, After=2874421, chg -0.01%

Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 13:57:40 -07:00
Hernán Gonzalez 4f1d583432 IB/qib: Remove unused variable (char *qib_sdma_event_names[])
Note: This is compile only tested as I have no access to the hw.

This variable was not used anywhere in the code. Removing it saves 88
bytes.

add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-88 (-88)
Function                                     old     new   delta
qib_sdma_event_names                          88       -     -88
Total: Before=2874565, After=2874477, chg -0.00%

Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 13:57:39 -07:00
Arnd Bergmann a8ed748708 infiniband: bnxt_re: use BIT_ULL() for 64-bit bit masks
On 32-bit targets, we otherwise get a warning about an impossible constant
integer expression:

In file included from include/linux/kernel.h:11,
                 from include/linux/interrupt.h:6,
                 from drivers/infiniband/hw/bnxt_re/ib_verbs.c:39:
drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_query_device':
include/linux/bitops.h:7:24: error: left shift count >= width of type [-Werror=shift-count-overflow]
 #define BIT(nr)   (1UL << (nr))
                        ^~
drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 'BIT'
 #define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39)
                                  ^~~
drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE_HIGH'
 #define BNXT_RE_MAX_MR_SIZE  BNXT_RE_MAX_MR_SIZE_HIGH
                              ^~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/bnxt_re/ib_verbs.c:149:25: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE'
  ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE;
                         ^~~~~~~~~~~~~~~~~~~

Fixes: 872f357824 ("RDMA/bnxt_re: Add support for MRs with Huge pages")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 13:57:39 -07:00
Arnd Bergmann e5d6574ded infiniband: qplib_fp: fix pointer cast
Building for a 32-bit target results in a couple of warnings from casting
between a 32-bit pointer and a 64-bit integer:

drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_service_nq':
drivers/infiniband/hw/bnxt_re/qplib_fp.c:333:23: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
    bnxt_qplib_arm_srq((struct bnxt_qplib_srq *)q_handle,
                       ^
drivers/infiniband/hw/bnxt_re/qplib_fp.c:336:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
            (struct bnxt_qplib_srq *)q_handle,
            ^
In file included from include/linux/byteorder/little_endian.h:5,
                 from arch/arm/include/uapi/asm/byteorder.h:22,
                 from include/asm-generic/bitops/le.h:6,
                 from arch/arm/include/asm/bitops.h:342,
                 from include/linux/bitops.h:38,
                 from include/linux/kernel.h:11,
                 from include/linux/interrupt.h:6,
                 from drivers/infiniband/hw/bnxt_re/qplib_fp.c:39:
drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_create_srq':
include/uapi/linux/byteorder/little_endian.h:31:43: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
 #define __cpu_to_le64(x) ((__force __le64)(__u64)(x))
                                           ^
include/linux/byteorder/generic.h:86:21: note: in expansion of macro '__cpu_to_le64'
 #define cpu_to_le64 __cpu_to_le64
                     ^~~~~~~~~~~~~
drivers/infiniband/hw/bnxt_re/qplib_fp.c:569:19: note: in expansion of macro 'cpu_to_le64'
  req.srq_handle = cpu_to_le64(srq);

Using a uintptr_t as an intermediate works on all architectures.

Fixes: 37cb11acf1 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 13:57:39 -07:00
Markus Elfring b7c5bc7368 IB/usnic: Delete an error message for a failed memory allocation in usnic_transport_init()
Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 13:57:39 -07:00
Leon Romanovsky 55de9a77da RDMA/mlx5: Refactor QP type check to be as early as possible
Perform QP type check in one place and fail as early as possible.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 13:16:36 -07:00
Selvin Xavier 497158aa5f RDMA/bnxt_re: Fix the ib_reg failure cleanup
Release the netdev references in the cleanup path.  Invokes the cleanup
routines if bnxt_re_ib_reg fails.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:33 -07:00
Devesh Sharma c354dff00d RDMA/bnxt_re: Fix incorrect DB offset calculation
To support host systems with non 4K page size, l2_db_size shall be
calculated with 4096 instead of PAGE_SIZE. Also, supply the host page size
to FW during initialization.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Devesh Sharma a45bc17b36 RDMA/bnxt_re: Unconditionly fence non wire memory operations
HW requires an unconditonal fence for all non-wire memory operations
through SQ. This guarantees the completions of these memory operations.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Moni Shoua 65389322b2 IB/mlx: Set slid to zero in Ethernet completion struct
IB spec says that a lid should be ignored when link layer is Ethernet,
for example when building or parsing a CM request message (CA17-34).
However, since ib_lid_be16() and ib_lid_cpu16()  validates the slid,
not only when link layer is IB, we set the slid to zero to prevent
false warnings in the kernel log.

Fixes: 62ede77799 ("Add OPA extended LID support")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Daniel Jurgens aba4621346 {net, IB}/mlx5: Raise fatal IB event when sys error occurs
All other mlx5_events report the port number as 1 based, which is how FW
reports it in the port event EQE. Reporting 0 for this event causes
mlx5_ib to not raise a fatal event notification to registered clients
due to a seemingly invalid port.

All switch cases in mlx5_ib_event that go through the port check are
supposed to set the port now, so just do it once at variable
declaration.

Fixes: 89d44f0a6c73("net/mlx5_core: Add pci error handlers to mlx5_core driver")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Noa Osherovich e7b169f344 IB/mlx5: Avoid passing an invalid QP type to firmware
During QP creation, the mlx5 driver translates the QP type to an
internal value which is passed on to FW. There was no check to make
sure that the translated value is valid, and -EINVAL was coerced into
the mailbox command.

Current firmware refuses this as an invalid QP type, but future/past
firmware may do something else.

Fixes: 09a7d9eca1 ('{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc')
Reviewed-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Sergey Gorenko da343b6d90 IB/mlx5: Fix incorrect size of klms in the memory region
The value of mr->ndescs greater than mr->max_descs is set in the
function mlx5_ib_sg_to_klms() if sg_nents is greater than
mr->max_descs. This is an invalid value and it causes the
following error when registering mr:

mlx5_0:dump_cqe:276:(pid 193): dump error cqe
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000030: 00 00 00 00 0f 00 78 06 25 00 00 8b 08 1e 8f d3

Cc: <stable@vger.kernel.org> # 4.5
Fixes: b005d31647 ("mlx5: Add arbitrary sg list support")
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Doug Ledford 1d1ab1ae69 mlx5-update-2018-02-23 (IB representors)
From: Mark Bloch <markb@mellanox.com>
 =========
 Add IB representor when in switchdev mode
 
 The following series adds support for an IB (RAW Ethernet only) device
 representor which is created when the user switches to switchdev mode.
 
 Today when switching to switchdev mode the only representors which are
 created are net devices. Each netdev is a representor of a virtual
 function and any data sent via the representor is received on the virtual
 function, and any data sent via the virtual function is received by the
 representor.
 
 For the mlx5 driver the main use of this functionality is to be able to
 use Open vSwitch on the hypervisor in order to manage/control traffic
 from/to the virtual functions. Open vSwitch can also work with  DPDK
 devices and not just net devices, this series exposes an IB device, which
 Mellanox PMD driver uses, which then can be used by Open vSwitch DPDK.
 
 An IB device representor exposes only RAW Ethernet QP capabilities and
 the ability to create flow rules to direct traffic to its RX queues. The
 state of the IB device (ACTIVE/DOWN etc..) is based on the state of the
 corresponding net device representor. No other RDMA/RoCE functionality is
 currently supported and no GID table is exposed.
 =========
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJakH7zAAoJEEg/ir3gV/o+c/MIAMGGgNajr49+JP3t9wnrs011
 +cTfAfM88HBzTlfb/COEBz+jurH2oB7ZF4RZC29S+6pR3loKKBuvbiPndE0XKjSg
 Ue4sOkawybmDvfo9ZiMsusOiMfTp5wsLmqJP1HRUvGMAlSBeriMTZfbiKzx5c3Ok
 X8cMnRIvUOtCoQaJTfKarDUn4OF8aFam4tQW8k/RAo77kTPyihb1NlGiblrcCA2E
 PWYAOWW3D8gvE0cr19JVgEqpKIaJ/VRyjwQ7m8XSvfBJtw1ZTO6YMXiXbWMOsRzD
 fx33H+n/qwJT0cnxDmSpZrR7mEk+Wr2HL92O85KDupOSgLOIlywmtIIkEAnCeaw=
 =Fq6m
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into k.o/wip/dl-for-next

mlx5-update-2018-02-23 (IB representors)

From: Mark Bloch <markb@mellanox.com>
=========
Add IB representor when in switchdev mode

The following series adds support for an IB (RAW Ethernet only) device
representor which is created when the user switches to switchdev mode.

Today when switching to switchdev mode the only representors which are
created are net devices. Each netdev is a representor of a virtual
function and any data sent via the representor is received on the virtual
function, and any data sent via the virtual function is received by the
representor.

For the mlx5 driver the main use of this functionality is to be able to
use Open vSwitch on the hypervisor in order to manage/control traffic
from/to the virtual functions. Open vSwitch can also work with  DPDK
devices and not just net devices, this series exposes an IB device, which
Mellanox PMD driver uses, which then can be used by Open vSwitch DPDK.

An IB device representor exposes only RAW Ethernet QP capabilities and
the ability to create flow rules to direct traffic to its RX queues. The
state of the IB device (ACTIVE/DOWN etc..) is based on the state of the
corresponding net device representor. No other RDMA/RoCE functionality is
currently supported and no GID table is exposed.
=========

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-28 13:37:39 -05:00
David S. Miller fb66cb0775 mlx5-update-2018-02-23 (IB representors)
From: Mark Bloch <markb@mellanox.com>
 =========
 Add IB representor when in switchdev mode
 
 The following series adds support for an IB (RAW Ethernet only) device
 representor which is created when the user switches to switchdev mode.
 
 Today when switching to switchdev mode the only representors which are
 created are net devices. Each netdev is a representor of a virtual
 function and any data sent via the representor is received on the virtual
 function, and any data sent via the virtual function is received by the
 representor.
 
 For the mlx5 driver the main use of this functionality is to be able to
 use Open vSwitch on the hypervisor in order to manage/control traffic
 from/to the virtual functions. Open vSwitch can also work with  DPDK
 devices and not just net devices, this series exposes an IB device, which
 Mellanox PMD driver uses, which then can be used by Open vSwitch DPDK.
 
 An IB device representor exposes only RAW Ethernet QP capabilities and
 the ability to create flow rules to direct traffic to its RX queues. The
 state of the IB device (ACTIVE/DOWN etc..) is based on the state of the
 corresponding net device representor. No other RDMA/RoCE functionality is
 currently supported and no GID table is exposed.
 =========
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJakH7zAAoJEEg/ir3gV/o+c/MIAMGGgNajr49+JP3t9wnrs011
 +cTfAfM88HBzTlfb/COEBz+jurH2oB7ZF4RZC29S+6pR3loKKBuvbiPndE0XKjSg
 Ue4sOkawybmDvfo9ZiMsusOiMfTp5wsLmqJP1HRUvGMAlSBeriMTZfbiKzx5c3Ok
 X8cMnRIvUOtCoQaJTfKarDUn4OF8aFam4tQW8k/RAo77kTPyihb1NlGiblrcCA2E
 PWYAOWW3D8gvE0cr19JVgEqpKIaJ/VRyjwQ7m8XSvfBJtw1ZTO6YMXiXbWMOsRzD
 fx33H+n/qwJT0cnxDmSpZrR7mEk+Wr2HL92O85KDupOSgLOIlywmtIIkEAnCeaw=
 =Fq6m
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

mlx5-update-2018-02-23 (IB representors)

From: Mark Bloch <markb@mellanox.com>
=========
Add IB representor when in switchdev mode

The following series adds support for an IB (RAW Ethernet only) device
representor which is created when the user switches to switchdev mode.

Today when switching to switchdev mode the only representors which are
created are net devices. Each netdev is a representor of a virtual
function and any data sent via the representor is received on the virtual
function, and any data sent via the virtual function is received by the
representor.

For the mlx5 driver the main use of this functionality is to be able to
use Open vSwitch on the hypervisor in order to manage/control traffic
from/to the virtual functions. Open vSwitch can also work with  DPDK
devices and not just net devices, this series exposes an IB device, which
Mellanox PMD driver uses, which then can be used by Open vSwitch DPDK.

An IB device representor exposes only RAW Ethernet QP capabilities and
the ability to create flow rules to direct traffic to its RX queues. The
state of the IB device (ACTIVE/DOWN etc..) is based on the state of the
corresponding net device representor. No other RDMA/RoCE functionality is
currently supported and no GID table is exposed.
=========

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 09:54:54 -05:00
David S. Miller f74290fdb3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-02-24 00:04:20 -05:00
Mark Bloch ec9c2fb8ce IB/mlx5: Disable self loopback check when in switchdev mode
When in switchdev mode, there is no need to do self loopback checks
as we can't receive those packets, we insert steering rules to the
eswitch that make sure packets can't be looped back.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch b5ca15ad7e IB/mlx5: Add proper representors support
This commit adds full support for IB representor:

1) Representors profile, We add two new profiles:
   nic_rep_profile - This profile will be used to create an IB device that
   represents the PF/UPLINK.
   rep_profile - This profile will be used to create an IB device that
   represents VFs. Each VF will be its own representor.
2) Proper load/unload callbacks, Those are called by the E-Switch when
   moving to/from switchdev mode.
3) Different flow DB handling for when we in switchdev mode.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch b96c9dde17 IB/mlx5: E-Switch, Add rule to forward traffic to vport
In order to forward traffic from representor's SQ to the right virtual
function, every time an SQ is created also add the corresponding flow rule
to the FDB.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch 72afcf8247 IB/mlx5: Don't expose MR cache in switchdev mode
When enabling many VFs and switching to switchdev mode, the total amount
of mkeys we try to allocate when loading representors is very large and
may cause timeouts on allocations, the same issues was observed on VFs
and we employ the same fix that was done for them. We avoid allocating
the full MR cache on load but still allow it to be manipulated once the
IB device is loaded.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch 8e6efa3a31 IB/mlx5: When in switchdev mode, expose only raw packet capabilities
Currently in switchdev mode we allow only for raw packet QPs.
Expose the right capabilities and set the gid table length to 0, also
make sure we don't try to enable RoCE, so split the function
to enable RoCE so representors can enable only the notifier needed for
net device events.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch bcf87f1dbb IB/mlx5: Listen to netdev register/unresiter events in switchdev mode
Currently we listen to netdev register/unregister event based on PCI
device. When in switchdev mode PF and representors share the same PCI
device, so in order to pair ib device and netdev in switchdev mode
compare the netdev that triggered the event to that of the representor.

Expose a function that lets you receive the netdev associated what
a given representor.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch 018a94eec0 IB/mlx5: Add match on vport when in switchdev mode
When we point to a representor, it means we are in switchdev mode.
The flow db is shared between PF and virtual function representors
so each rule created needs to have a match on its specific source port.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch 9a4ca38d77 IB/mlx5: Allocate flow DB only on PF IB device
A flow DB is a shared resource between PF and representors,
need to allocate it only when creating the PF IB device.
Once we add IB representors, they will use the flow db which was
created by the PF.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch fc385b7ac4 IB/mlx5: Add basic regiser/unregister representors code
Create the basic infrastructure of registering and unregistering
IB representors. The load/unload callbacks are left empty and
proper implementation will be introduced in following patches.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Doug Ledford 4bb46608ed Merge branch 'k.o/for-rc' into k.o/wip/dl-for-next
There is a 14 patch series waiting to come into for-next that has a
dependecy on code submitted into this kernel's for-rc series.  So, merge
the for-rc branch into the current for-next in order to make the patch
series apply cleanly.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 22:27:20 -05:00
Doug Ledford f76a5c75d9 mlx5-updates-2018-02-21
This series includes shared code updates for mlx5 core driver for both
 netdev and rdma subsystems.
 
 By Saeed,
 First six patches of the series are meant to address a performance issue
 and should provide a performance boost for multi core IRQ interrupt hungry
 workloads.  The issue is fixed in the first patch, all other patches are
 meant to refactor the code in light of this fix.
 
 The problem it comes to fix, is a shared spinlock accessed across all HCA
 IRQs which protects the CQ database.  To solve this we simply move the CQ
 database and its spinlock to be per EQ (IRQ), thus per core.
 
 By Yonatan,
 Fragmented completion queue (CQ) for RDMA,
 core driver implementation to create fragmented CQ buffers rather than
 one large contiguous memory buffer, the implementation scheme already
 exist and used by the netdev CQs, the patch shares that code with the
 rdma CQ creation flow and makes use of the new API in mlx5_ib driver.
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJajcwxAAoJEEg/ir3gV/o+uJAIAICP2LBu5ANBgX7Z7/B0IZX4
 Abkfk+IKR44v+3t0BBgEAZI2A9LcTQ8pwIOeLCDpcKd78665Y7rYQL1p31WuPKc5
 1WwwBesjQLAmdiNLIjUnjW1fd3YyOmqLBcaMMYIeF0m4m4pCQue1kwtxesQrhdZV
 I7FBxxnQt/8o6hoh/5nmU4DzGyZaY+uR/4FxUCEreklfhNeI5M2DZKe1xQCmhsJ4
 Tosh/cJ8kSBJfclH6hk1lh5eWGDYDltWCpY86KBWsYktl10VAE3P7LPmekn487ta
 e6cL1NPoEHbp+/UBThkNZMzUnA7wpoNeDbTTtVZrdcL1KivwColt1r3pmS1bqEQ=
 =DlfW
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into k.o/wip/dl-for-next

mlx5-updates-2018-02-21

This series includes shared code updates for mlx5 core driver for both
netdev and rdma subsystems.

By Saeed,
First six patches of the series are meant to address a performance issue
and should provide a performance boost for multi core IRQ interrupt hungry
workloads.  The issue is fixed in the first patch, all other patches are
meant to refactor the code in light of this fix.

The problem it comes to fix, is a shared spinlock accessed across all HCA
IRQs which protects the CQ database.  To solve this we simply move the CQ
database and its spinlock to be per EQ (IRQ), thus per core.

By Yonatan,
Fragmented completion queue (CQ) for RDMA,
core driver implementation to create fragmented CQ buffers rather than
one large contiguous memory buffer, the implementation scheme already
exist and used by the netdev CQs, the patch shares that code with the
rdma CQ creation flow and makes use of the new API in mlx5_ib driver.

Thanks,
Saeed.

Merged into rdma-next tree as well as net-next tree to prevent conflicts
in future patches between the two trees.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22 20:52:28 -05:00
David S. Miller f4af1db48b mlx5-updates-2018-02-21
This series includes shared code updates for mlx5 core driver for both
 netdev and rdma subsystems.
 
 By Saeed,
 First six patches of the series are meant to address a performance issue
 and should provide a performance boost for multi core IRQ interrupt hungry
 workloads.  The issue is fixed in the first patch, all other patches are
 meant to refactor the code in light of this fix.
 
 The problem it comes to fix, is a shared spinlock accessed across all HCA
 IRQs which protects the CQ database.  To solve this we simply move the CQ
 database and its spinlock to be per EQ (IRQ), thus per core.
 
 By Yonatan,
 Fragmented completion queue (CQ) for RDMA,
 core driver implementation to create fragmented CQ buffers rather than
 one large contiguous memory buffer, the implementation scheme already
 exist and used by the netdev CQs, the patch shares that code with the
 rdma CQ creation flow and makes use of the new API in mlx5_ib driver.
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJajcwxAAoJEEg/ir3gV/o+uJAIAICP2LBu5ANBgX7Z7/B0IZX4
 Abkfk+IKR44v+3t0BBgEAZI2A9LcTQ8pwIOeLCDpcKd78665Y7rYQL1p31WuPKc5
 1WwwBesjQLAmdiNLIjUnjW1fd3YyOmqLBcaMMYIeF0m4m4pCQue1kwtxesQrhdZV
 I7FBxxnQt/8o6hoh/5nmU4DzGyZaY+uR/4FxUCEreklfhNeI5M2DZKe1xQCmhsJ4
 Tosh/cJ8kSBJfclH6hk1lh5eWGDYDltWCpY86KBWsYktl10VAE3P7LPmekn487ta
 e6cL1NPoEHbp+/UBThkNZMzUnA7wpoNeDbTTtVZrdcL1KivwColt1r3pmS1bqEQ=
 =DlfW
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-02-21

This series includes shared code updates for mlx5 core driver for both
netdev and rdma subsystems.

By Saeed,
First six patches of the series are meant to address a performance issue
and should provide a performance boost for multi core IRQ interrupt hungry
workloads.  The issue is fixed in the first patch, all other patches are
meant to refactor the code in light of this fix.

The problem it comes to fix, is a shared spinlock accessed across all HCA
IRQs which protects the CQ database.  To solve this we simply move the CQ
database and its spinlock to be per EQ (IRQ), thus per core.

By Yonatan,
Fragmented completion queue (CQ) for RDMA,
core driver implementation to create fragmented CQ buffers rather than
one large contiguous memory buffer, the implementation scheme already
exist and used by the netdev CQs, the patch shares that code with the
rdma CQ creation flow and makes use of the new API in mlx5_ib driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-22 14:38:38 -05:00
Selvin Xavier 7374fbd9e1 RDMA/bnxt_re: Avoid system hang during device un-reg
BNXT_RE_FLAG_TASK_IN_PROG doesn't handle multiple work
requests posted together. Track schedule of multiple
workqueue items by maintaining a per device counter
and proceed with IB dereg only if this counter is zero.
flush_workqueue is no longer required from
NETDEV_UNREGISTER path.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20 11:59:47 -05:00
Selvin Xavier dcdaba0806 RDMA/bnxt_re: Fix system crash during load/unload
During driver unload, the driver proceeds with cleanup
without waiting for the scheduled events. So the device
pointers get freed up and driver crashes when the events
are scheduled later.

Flush the bnxt_re_task work queue before starting
device removal.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20 11:57:21 -05:00
Selvin Xavier 3b921e3bc4 RDMA/bnxt_re: Synchronize destroy_qp with poll_cq
Avoid system crash when destroy_qp is invoked while
the driver is processing the poll_cq. Synchronize these
functions using the cq_lock.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20 11:57:21 -05:00
Devesh Sharma 6b4521f517 RDMA/bnxt_re: Unpin SQ and RQ memory if QP create fails
Driver leaves the QP memory pinned if QP create command
fails from the FW. Avoids this scenario by adding a proper
exit path if the FW command fails.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20 11:57:21 -05:00
Devesh Sharma 7ff662b761 RDMA/bnxt_re: Disable atomic capability on bnxt_re adapters
More testing needs to be done before enabling this feature.
Disabling the feature temporarily

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20 11:57:21 -05:00
Adit Ranadive 1f5a6c47aa RDMA/vmw_pvrdma: Fix usage of user response structures in ABI file
This ensures that we return the right structures back to userspace.
Otherwise, it looks like the reserved fields in the response structures
in userspace might have uninitialized data in them.

Fixes: 8b10ba783c ("RDMA/vmw_pvrdma: Add shared receive queue support")
Fixes: 29c8d9eba5 ("IB: Add vmw_pvrdma driver")
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15 15:31:28 -07:00
Yonatan Cohen 388ca8be00 IB/mlx5: Implement fragmented completion queue (CQ)
The current implementation of create CQ requires contiguous
memory, such requirement is problematic once the memory is
fragmented or the system is low in memory, it causes for
failures in dma_zalloc_coherent().

This patch implements new scheme of fragmented CQ to overcome
this issue by introducing new type: 'struct mlx5_frag_buf_ctrl'
to allocate fragmented buffers, rather than contiguous ones.

Base the Completion Queues (CQs) on this new fragmented buffer.

It fixes following crashes:
kworker/29:0: page allocation failure: order:6, mode:0x80d0
CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<>] dump_stack+0x19/0x1b
[<>] warn_alloc_failed+0x110/0x180
[<>] __alloc_pages_slowpath+0x6b7/0x725
[<>] __alloc_pages_nodemask+0x405/0x420
[<>] dma_generic_alloc_coherent+0x8f/0x140
[<>] x86_swiotlb_alloc_coherent+0x21/0x50
[<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
[<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
[<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
[<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
[<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
[<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-15 00:30:03 -08:00
Andy Shevchenko 71591d1280 RDMA/hns: Replace __raw_write*(cpu_to_le*()) with LE write*()
There is no need to repeat the semantics of writel() and similar.
Moreover sparse complains about this:

drivers/infiniband/hw/hns/hns_roce_hw_v1.c:1690:22: expected unsigned long long val
drivers/infiniband/hw/hns/hns_roce_hw_v1.c:1690:22: got restricted __le64 <noident>

Fixing this by replacing __raw_write*(cpu_to_le*()) calls by plain
write*() ones.

Note, write*() accessors are little endian by definition.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-14 16:32:54 -07:00
oulijun d480bb50d2 RDMA/hns: Use free_pages function instead of free_page
It need to use free_pages function for free the memory allocated
by __get_free_pages function.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-14 16:30:12 -07:00
Yixian Liu ced07769dc RDMA/hns: Fix QP state judgement before receiving work requests
The QP can accept receive work requests only when the QP is
in the states that allow them to be submitted.

This patch updates the QP state judgement based on the
specification.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-14 16:30:12 -07:00
oulijun 173bc6be96 RDMA/hns: Fix a bug with modifying mac address
When modifying mac address, it will trigger hns_roce_del_gid
function and can't delete the default gid matched the index
because the attribute of gid is null.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-14 16:30:12 -07:00
Corentin Labbe fc968aee5e IB/cxgb3: remove cxio_dbg.c
cxio_dbg.c is uncompiled since commit 2b540355cd ("RDMA/cxgb3: cleanups")
10 years after, we could remove it.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-14 16:21:15 -07:00
Denys Vlasenko 9b2c45d479 net: make getname() functions return length rather than use int* parameter
Changes since v1:
Added changes in these files:
    drivers/infiniband/hw/usnic/usnic_transport.c
    drivers/staging/lustre/lnet/lnet/lib-socket.c
    drivers/target/iscsi/iscsi_target_login.c
    drivers/vhost/net.c
    fs/dlm/lowcomms.c
    fs/ocfs2/cluster/tcp.c
    security/tomoyo/network.c

Before:
All these functions either return a negative error indicator,
or store length of sockaddr into "int *socklen" parameter
and return zero on success.

"int *socklen" parameter is awkward. For example, if caller does not
care, it still needs to provide on-stack storage for the value
it does not need.

None of the many FOO_getname() functions of various protocols
ever used old value of *socklen. They always just overwrite it.

This change drops this parameter, and makes all these functions, on success,
return length of sockaddr. It's always >= 0 and can be differentiated
from an error.

Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

rpc_sockname() lost "int buflen" parameter, since its only use was
to be passed to kernel_getsockname() as &buflen and subsequently
not used in any way.

Userspace API is not changed.

    text    data     bss      dec     hex filename
30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
30108109 2633612  873672 33615393 200ee21 vmlinux.o

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-bluetooth@vger.kernel.org
CC: linux-decnet-user@lists.sourceforge.net
CC: linux-wireless@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: linux-sctp@vger.kernel.org
CC: linux-nfs@vger.kernel.org
CC: linux-x25@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-12 14:15:04 -05:00
Linus Torvalds a9a08845e9 vfs: do bulk POLL* -> EPOLL* replacement
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do.  But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-11 14:34:03 -08:00
Linus Torvalds 2246edfaf8 Second pull request for 4.16 merge window
- Clean up some function signatures in rxe for clarity
 - Tidy the RDMA netlink header to remove unimplemented constants
 - bnxt_re driver fixes, one is a regression this window.
 - Minor hns driver fixes
 - Various fixes from Dan Carpenter and his tool
 - Fix IRQ cleanup race in HFI1
 - HF1 performance optimizations and a fix to report counters in the right units
 - Fix for an IPoIB startup sequence race with the external manager
 - Oops fix for the new kabi path
 - Endian cleanups for hns
 - Fix for mlx5 related to the new automatic affinity support
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaePL/AAoJELgmozMOVy/dqhsQALUhzDuuJ+/F6supjmyqZG53
 Ak/PoFjTmHToGQfDq/1TRzyKwMx12aB2l6WGZc31FzhvCw4daPWkoEVKReNWUUJ+
 fmESxjLgo8ZRGSqpNxn9Q8agE/I/5JZQoA8bCFCYgdZPKTPNKdtAVBphpdhmrOX4
 ygjABikWf/wBsNF1A8lnX9xkfPO21cPHrFQLTnuOzOT/hc6U+PPklHSQCnS91svh
 1+Pqjtssg54rxYkJqiFq3giSnfwvmAXO8WyVGmRRPFGLpB0nIjq0Sl6ZgLLClz7w
 YJdiBGr7rlnNMgGCjlPU2ZO3lO6J0ytXQzFNqRqvKryXQOv+uVeJgep7WqHTcdQU
 UN30FCKQMgLL/F6NF8wKaKcK4X0VgXQa7gpuH2fVSXF0c3LO3/mmWNjixbGSzT2c
 Wj+EW3eOKlTddhRLhgbMOdwc32tIGhaD85z2F4+FZO+XI9ZQtJaDewWVDjYoumP/
 RlDIFw+KCgSq7+UZL8CoXuh0BuS1nu9TGfkx1HW0DLMF1+yigNiswpUfksV4cISP
 JqE2I3yH0A4UobD/a+f9IhIfk2MjxO0tJWNjU8IA9LXgUFlskQ6MpH/AcE9G8JNv
 tlfLGR3s4PJa/7j/Iy2F84og/b/KH8v7vyj4Eknq/hLq63/BiM5wj0AUBRrGulN6
 HhAMOegxGZ7IKP/y0L7I
 =xwZz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull more rdma updates from Doug Ledford:
 "Items of note:

   - two patches fix a regression in the 4.15 kernel. The 4.14 kernel
     worked fine with NVMe over Fabrics and mlx5 adapters. That broke in
     4.15. The fix is here.

   - one of the patches (the endian notation patch from Lijun) looks
     like a lot of lines of change, but it's mostly mechanical in
     nature. It amounts to the biggest chunk of change in it (it's about
     2/3rds of the overall pull request).

  Summary:

   - Clean up some function signatures in rxe for clarity

   - Tidy the RDMA netlink header to remove unimplemented constants

   - bnxt_re driver fixes, one is a regression this window.

   - Minor hns driver fixes

   - Various fixes from Dan Carpenter and his tool

   - Fix IRQ cleanup race in HFI1

   - HF1 performance optimizations and a fix to report counters in the right units

   - Fix for an IPoIB startup sequence race with the external manager

   - Oops fix for the new kabi path

   - Endian cleanups for hns

   - Fix for mlx5 related to the new automatic affinity support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (38 commits)
  net/mlx5: increase async EQ to avoid EQ overrun
  mlx5: fix mlx5_get_vector_affinity to start from completion vector 0
  RDMA/hns: Fix the endian problem for hns
  IB/uverbs: Use the standard kConfig format for experimental
  IB: Update references to libibverbs
  IB/hfi1: Add 16B rcvhdr trace support
  IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node
  IB/core: Avoid a potential OOPs for an unused optional parameter
  IB/core: Map iWarp AH type to undefined in rdma_ah_find_type
  IB/ipoib: Fix for potential no-carrier state
  IB/hfi1: Show fault stats in both TX and RX directions
  IB/hfi1: Remove blind constants from 16B update
  IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times
  IB/hfi1: Do not override given pcie_pset value
  IB/hfi1: Optimize process_receive_ib()
  IB/hfi1: Remove unnecessary fecn and becn fields
  IB/hfi1: Look up ibport using a pointer in receive path
  IB/hfi1: Optimize packet type comparison using 9B and bypass code paths
  IB/hfi1: Compute BTH only for RDMA_WRITE_LAST/SEND_LAST packet
  IB/hfi1: Remove dependence on qp->s_hdrwords
  ...
2018-02-06 11:09:45 -08:00
Linus Torvalds 105cf3c8c6 pci-v4.16-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJad5lgAAoJEFmIoMA60/r8s2kQAI3PztawDpaCP9Z12pkbBHSt
 Ho0xTyk9rCZi9kQJbNjc+a+QrlA3QmTHXIXerB3LSWoh7M+XhsECjem92eHpgLNS
 JvYPhTfOrCr0vdiAmOz6hD0AqN/psrbfzgiJhSwomsGEFS77k7kERSJckRv81sxb
 Aj5F/WjucAgLorwm4auveAJEQ7atE7/6pkXzoqYm4G6NLOb46jUcRGndrnvXZBlz
 fws8fBM4BHyi7i25CYQl24tFq1CGax1rIPgLg+4KnH76bQk/N6Ju0sGVSzfh+hG8
 SIerK9bJbzGRAuNKoxB3aO1dyzsK3x9WztE2mG98w5trOISPIR1FqnvC/225FWAU
 d6eIXiC7wKnEx+DElNTzCjzfHc7SAJoupO32H7CoiTe5zPUlWlxJ1zLYkK1gt50q
 m8PRBiYTglxyznzrO0drtcdjEzvbdZNRrsYnul4wi1vSHzjk6F6XLtzT10XWM1M1
 1pXLB8384FTj0Hu4bq6Y3Aivkmz0Sf+eQM2NaOwe+Zj7/1VV0d3lvi4LUXkqzLCA
 FoXPJSMxG2Qu+iflCeYRQBJjExaZH3eNLZ3dT6QpcJrjaFVedd9u5DeeFqNL27zV
 bhr8TdqrR4p4rc8EBAGoCapw96IxLZROKB3gxbrZVOpfIZpzthwHbElHX6aqUgF4
 w/EV1JWs36WXWaxFk8wd
 =ttq9
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - skip AER driver error recovery callbacks for correctable errors
   reported via ACPI APEI, as we already do for errors reported via the
   native path (Tyler Baicar)

 - fix DPC shared interrupt handling (Alex Williamson)

 - print full DPC interrupt number (Keith Busch)

 - enable DPC only if AER is available (Keith Busch)

 - simplify DPC code (Bjorn Helgaas)

 - calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn
   Helgaas)

 - enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn
   Helgaas)

 - move ASPM internal interfaces out of public header (Bjorn Helgaas)

 - allow hot-removal of VGA devices (Mika Westerberg)

 - speed up unplug and shutdown by assuming Thunderbolt controllers
   don't support Command Completed events (Lukas Wunner)

 - add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling,
   Jay Cornwall)

 - expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes)

 - clean up PCI DMA interface usage (Christoph Hellwig)

 - remove PCI pool API (replaced with DMA pool) (Romain Perier)

 - deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan
   Kaya)

 - move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring)

 - add PCI-specific wrappers for dev_info(), etc (Frederick Lawler)

 - remove warnings on sysfs mmap failure (Bjorn Helgaas)

 - quiet ROM validation messages (Alex Deucher)

 - remove redundant memory alloc failure messages (Markus Elfring)

 - fill in types for compile-time VGA and other I/O port resources
   (Bjorn Helgaas)

 - make "pci=pcie_scan_all" work for Root Ports as well as Downstream
   Ports to help AmigaOne X1000 (Bjorn Helgaas)

 - add SPDX tags to all PCI files (Bjorn Helgaas)

 - quirk Marvell 9128 DMA aliases (Alex Williamson)

 - quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas)

 - fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas
   Cassel)

 - use DMA API to get MSI address for DesignWare IP (Niklas Cassel)

 - fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I)

 - fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun)

 - add support for ARTPEC-7 SoC (Niklas Cassel)

 - add endpoint-mode support for ARTPEC (Niklas Cassel)

 - add Cadence PCIe host and endpoint controller driver (Cyrille
   Pitchen)

 - handle multiple INTx status bits being set in dra7xx (Vignesh R)

 - translate dra7xx hwirq range to fix INTD handling (Vignesh R)

 - remove deprecated Exynos PHY initialization code (Jaehoon Chung)

 - fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu)

 - fix NULL pointer dereference in iProc BCMA driver (Ray Jui)

 - fix Keystone interrupt-controller-node lookup (Johan Hovold)

 - constify qcom driver structures (Julia Lawall)

 - rework Tegra config space mapping to increase space available for
   endpoints (Vidya Sagar)

 - simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy)

 - remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy)

 - add support for Global Fabric Manager Server (GFMS) event to
   Microsemi Switchtec switch driver (Logan Gunthorpe)

 - add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao)

* tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
  PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller
  dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller
  PCI: endpoint: Fix EPF device name to support multi-function devices
  PCI: endpoint: Add the function number as argument to EPC ops
  PCI: cadence: Add host driver for Cadence PCIe controller
  dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller
  PCI: Add vendor ID for Cadence
  PCI: Add generic function to probe PCI host controllers
  PCI: generic: fix missing call of pci_free_resource_list()
  PCI: OF: Add generic function to parse and allocate PCI resources
  PCI: Regroup all PCI related entries into drivers/pci/Makefile
  PCI/DPC: Reformat DPC register definitions
  PCI/DPC: Add and use DPC Status register field definitions
  PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error()
  PCI/DPC: Remove unnecessary RP PIO register structs
  PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info()
  PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info()
  PCI/DPC: Make RP PIO log size check more generic
  PCI/DPC: Rename local "status" to "dpc_status"
  PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error()
  ...
2018-02-06 09:59:40 -08:00
oulijun 8b9b8d143b RDMA/hns: Fix the endian problem for hns
The hip06 and hip08 run on a little endian ARM, it needs to
revise the annotations to indicate that the HW uses little
endian data in the various DMA buffers, and flow the necessary
swaps throughout.

The imm_data use big endian mode. The cpu_to_le32/le32_to_cpu
swaps are no-op for this, which makes the only substantive
change the handling of imm_data which is now mandatory swapped.

This also keep match with the userspace hns driver and resolve
the warning by sparse.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-05 10:48:48 -05:00
Don Hiatt 6197a815fe IB/hfi1: Add 16B rcvhdr trace support
Add trace_hfi1_rcvhdr support for bypass packets.
While here, remove the etype argument as it is available
in struct hfi1_packet.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:32 -07:00
Kamenee Arumugam 953a9cebea IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node
Kzalloc_node API doesn't check for overflows in size multiplication.
While kcalloc API check for overflows in size multiplication
but these implementations are not NUMA-aware.

This conversion allowed for correcting an allocation used in the hot
path to be on the local NUMA and ensure us overflow free multiplication
for the size of a memory allocation.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:32 -07:00
Mitko Haralanov b5de809ef6 IB/hfi1: Show fault stats in both TX and RX directions
The routine which shows the fault stats checks the counters
to determine whether to show any stats based on the number of
transmitted pkts/bytes for a particular opcode.

Unfortunately, it only checked the receive counters. As a result,
if any packet faults have happened for packets egressing the HFI,
those stats would not be shown.

In order to fix this, the routine is amended to also check the
TX counters. With this change the pkt/byte counts are the sum of
both TX and RX counts for the opcode.

Fixes: 1b311f8931 ("IB/hfi1: Add tx_opcode_stats like the opcode_stats")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:31 -07:00
Mike Marciniszyn 78d3633ba9 IB/hfi1: Remove blind constants from 16B update
These values were introduced as part of the 16B code to
account for the varying size of the LRH between the differing
packet formats.

Replace the blind constants with defines based on FIELD_SIZEOF()
calls.

Fixes: 5b6cabb0db ("IB/hfi1: Add 16B RC/UC support")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:30 -07:00
Kamenee Arumugam 0719007663 IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times
HFI's counters SendWaitCnt and SendWaitVlCnt are in units
of TXE cycle time (at 805MHz). OPA counters PortXmitWait and
PortVLXmtWait are in units of flit times.
Convert the counter values to flit units using following
conversion formula:

PortXmitWait =
	SendWaitCnt * 2 * (4 /link_width) * (25 Gbps /link_speed)
PortVLXmitWait =
	SendWaitVLCnt * 2 * (4 /link_width) * (25 Gbps /link_speed)

At link up or downgrade events, the link width can change. To ensure
accurate counter calculations, sample the counters after the events,
during counter requests, and then aggregate the OPA counters.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:30 -07:00
Bartlomiej Dudek 6391214f4d IB/hfi1: Do not override given pcie_pset value
During PCIe Gen 3 transistion, pcie_pset is read and might be overridden
to a default value(i.e. 255) in do_pcie_gen3_transition() routine.

If the pcie_pset value is overridden then this new value will be used
during initialization of next adapter on a different card.

Introducing a new local variable to avoid modification of pcie_pset

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Patel Jay P <jay.p.patel@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:30 -07:00
Sebastian Sanchez aca7f4fc32 IB/hfi1: Optimize process_receive_ib()
The arguments for trace_hfi1_rcvhdr() get computed every
time in the hot path regardless of the whether the trace
is on or off. This is seen to be costly with a profile.
The handling of fault inject isolates the verbs device for
all packets regardless of the presence of a RHF_DC_ERR error.

Fix the first by computing trace_hfi1_rcvhdr() arguments within
the trace itself, so that when the trace is off, the argument
data isn't computed. Fix the second by moving the error check to
handle_eflags() when an RHF error occurs and by testing for
RHF_DC_ERR before executing the reset of handle_eflags().

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:30 -07:00
Sebastian Sanchez ca85bb1ca9 IB/hfi1: Remove unnecessary fecn and becn fields
packet->fecn and packet->becn are calculated in the hot path
and are never used. Remove these fields as they show to be
costly in a profile. Also, remove initialization for
becn and fecn in process_ecn() as they're unconditionally
assigned in the function and ensure fecn and becn variables
use a boolean type.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:29 -07:00
Sebastian Sanchez bdaf96f650 IB/hfi1: Look up ibport using a pointer in receive path
In the receive path, hfi1_ibport is looked up by indexing into an
array. A profile shows this to be expensive. The receive context
data has a pointer to the ibport data, use that pointer instead.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:29 -07:00
Sebastian Sanchez 6d6b8848c8 IB/hfi1: Optimize packet type comparison using 9B and bypass code paths
The packet type comparison used to find out if a packet is a bypass
packet in the hot path is an expensive operation as seen in a profile.

Determine packet's pkey and migration bit through the bypass and 9B
code paths instead.

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:28 -07:00
Sebastian Sanchez f150e2736f IB/hfi1: Compute BTH only for RDMA_WRITE_LAST/SEND_LAST packet
In hfi1_rc_rcv(), BTH is computed for all packets received.
However, it's only used for packets received with opcodes
RDMA_WRITE_LAST and SEND_LAST, and it is a costly operation.

Compute BTH only in the RDMA_WRITE_LAST/SEND_LAST code path
and let the compiler handle endianness conversion for bitwise
operations.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:43:28 -07:00
Mitko Haralanov 9636258f10 IB/hfi1: Remove dependence on qp->s_hdrwords
The s_hdrwords variable was used to indicate whether a
packet was already built on a previous iteration of the
send engine. This variable assumed the protection of the
QP's RVT_S_BUSY flag, which was required since the the
QP's s_lock was dropped just prior to the packet being
queued on the one of the egress mechanisms.

Support for multiple send engine instantiations require
that the field not be used due to concurency issues.
The ps.txreq signals the "already built" without the
potential concurency issues.

Fix by getting rid of all s_hdrword usage.   A wrapper
is added to test for the already built case that used to
use s_hdrwords.

What used to be stored in s_hdrwords is now in the txreq.
The PBC is not counted, but is added in the pio/sdma code
paths prior to posting the packet.

Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:24:32 -07:00
Alex Estrin 2b1e7fe161 IB/hfi1: Fix for potential refcount leak in hfi1_open_file()
The dd refcount is speculatively incremented prior to allocating
the fd memory with kzalloc(). If that kzalloc() failed the dd
refcount leaks.
Increment refcount on kzalloc success.

Fixes: e11ffbd575 ("IB/hfi1: Do not free hfi1 cdev parent structure early")
Reviewed-by: Michael J Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:24:32 -07:00
Alex Estrin 473291b3ea IB/hfi1: Fix for early release of sdma context
With IRQF_SHARED flag set and CONFIG_DEBUG_SHIRQ enabled
module removal may result in panic in sdma_interrupt() routine
if associated sdma context was released before pci_free_irq();

[ 9198.939885] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 9198.940514] IP: sdma_make_progress+0xa5/0x450 [hfi1]
[ 9198.941114] PGD 170bdc0067 P4D 170bdc0067 PUD 172063e067 PMD 0
[ 9198.941783] Oops: 0000 [#1] SMP
.....
[ 9198.958877] CPU: 132 PID: 64173 Comm: rmmod Tainted: G           OE   4.14.0-rc4+ #1
[ 9198.961032] Hardware name: Intel Corporation S7200AP/S7200AP, BIOS S72C610.86B.01.02.0118.080620171935 08/06/2017
[ 9198.963323] task: ffff9681397f0000 task.stack: ffffae1647c40000
[ 9198.965695] RIP: 0010:sdma_make_progress+0xa5/0x450 [hfi1]
[ 9198.968082] RSP: 0018:ffffae1647c43be8 EFLAGS: 00010046
[ 9198.970503] RAX: 0000000000000000 RBX: ffff9680ce8b5ca8 RCX: 0000000000000000
[ 9198.973006] RDX: 0000000000000000 RSI: 0000000001a00d28 RDI: ffff9680ce8b5ca0
[ 9198.975546] RBP: ffffae1647c43c40 R08: ffff96814325ec00 R09: 00000000ffffffff
[ 9198.978142] R10: 000000004325e501 R11: ffff96814325ec00 R12: ffff9680ce8b5c44
[ 9198.980779] R13: ffff9680ce8b5ca0 R14: 0000000000000000 R15: ffff9680ce8b5b00
[ 9198.983462] FS:  00007f31196ba740(0000) GS:ffff96819df00000(0000) knlGS:0000000000000000
[ 9198.986231] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9198.989036] CR2: 0000000000000000 CR3: 000000170833f000 CR4: 00000000001406e0
[ 9198.991911] Call Trace:
[ 9198.994847]  sdma_engine_interrupt+0x82/0x100 [hfi1]
[ 9198.997852]  sdma_interrupt+0x61/0xc0 [hfi1]
[ 9199.000852]  __free_irq+0x1b3/0x2d0
[ 9199.003873]  free_irq+0x35/0x70
[ 9199.006909]  pci_free_irq+0x1c/0x30
[ 9199.009999]  clean_up_interrupts+0x53/0xf0 [hfi1]
[ 9199.013137]  hfi1_start_cleanup+0x117/0x190 [hfi1]
[ 9199.016315]  postinit_cleanup+0x1d/0x270 [hfi1]
[ 9199.019529]  remove_one+0x1f3/0x210 [hfi1]
[ 9199.022738]  pci_device_remove+0x39/0xc0
[ 9199.025974]  device_release_driver_internal+0x141/0x210
[ 9199.029268]  driver_detach+0x3f/0x80
[ 9199.032580]  bus_remove_driver+0x55/0xd0
[ 9199.035931]  driver_unregister+0x2c/0x50
[ 9199.039321]  pci_unregister_driver+0x2a/0xa0
[ 9199.042755]  hfi1_mod_cleanup+0x10/0xb50 [hfi1]
[ 9199.046196]  SyS_delete_module+0x171/0x250
...

Fix by exporting sdma_clean() and removing from sdma_exit().
sdma_exit() now just manipulates the engine state,
leaving the memory free to sdma_clean() which is now called
just before the dd is freed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:24:32 -07:00
Michael J. Ruhl 82a9792656 IB/hfi1: Re-order IRQ cleanup to address driver cleanup race
The pci_request_irq() interfaces always adds the IRQF_SHARED bit to
all IRQ requests.

When the kernel is built with CONFIG_DEBUG_SHIRQ config flag, if the
IRQF_SHARED bit is set, a call to the IRQ handler is made from the
__free_irq() function. This is testing a race condition between the
IRQ cleanup and an IRQ racing the cleanup.  The HFI driver should be
able to handle this race, but does not.

This race can cause traces that start with this footprint:

BUG: unable to handle kernel NULL pointer dereference at   (null)
Call Trace:
 <hfi1 irq handler>
 ...
 __free_irq+0x1b3/0x2d0
 free_irq+0x35/0x70
 pci_free_irq+0x1c/0x30
 clean_up_interrupts+0x53/0xf0 [hfi1]
 hfi1_start_cleanup+0x122/0x190 [hfi1]
 postinit_cleanup+0x1d/0x280 [hfi1]
 remove_one+0x233/0x250 [hfi1]
 pci_device_remove+0x39/0xc0

Export IRQ cleanup function so it can be called from other modules.

Using the exported cleanup function:

  Re-order the driver cleanup code to clean up IRQ resources before
  other resources, eliminating the race.

  Re-order error path for init so that the race does not occur.

Reduce severity on spurious error message for SDMA IRQs to info.

Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Reviewed-by: Patel Jay P <jay.p.patel@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:24:32 -07:00
oulijun 0da6550366 RDMA/hns: Fix misplaced call to hns_roce_cleanup_hem_table
The mtt_table is cleaned up during the err_unmap_cqe label, it is a
mistake to duplicate the cleanup during the later unwind labels.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:24:32 -07:00
oulijun fd012f1c4f RDMA/hns: Add names to function arguments in function pointers
This patch mainly fix some style warings matched with the new checkpatch
requirement. The warning as follows:

WARNING: function definition argument 'struct hns_roce_cq *' should also have
an identifier name

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:24:32 -07:00
oulijun c27991198c RDMA/hns: Remove unnecessary operator
The double not-operator is unncessary when used in a boolean context. This
patch removes them.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:24:32 -07:00
Markus Elfring e5b8984386 RDMA/bnxt_re: Use common error handling code in bnxt_qplib_alloc_dpi_tbl()
Add a jump target so that a bit of exception handling can be better reused
at the end of this function.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Devesh Sharma <devesh.sharma@broadcom.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:24:31 -07:00
Markus Elfring f390b71b65 RDMA/bnxt_re: Delete two error messages for a failed memory allocation in bnxt_qplib_alloc_dpi_tbl()
Omit extra messages for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Devesh Sharma <devesh.sharma@broadcom.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01 15:24:31 -07:00
Linus Torvalds 73da9e1a9f Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - misc fixes

 - ocfs2 updates

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
  mm: remove PG_highmem description
  tools, vm: new option to specify kpageflags file
  mm/swap.c: make functions and their kernel-doc agree
  mm, memory_hotplug: fix memmap initialization
  mm: correct comments regarding do_fault_around()
  mm: numa: do not trap faults on shared data section pages.
  hugetlb, mbind: fall back to default policy if vma is NULL
  hugetlb, mempolicy: fix the mbind hugetlb migration
  mm, hugetlb: further simplify hugetlb allocation API
  mm, hugetlb: get rid of surplus page accounting tricks
  mm, hugetlb: do not rely on overcommit limit during migration
  mm, hugetlb: integrate giga hugetlb more naturally to the allocation path
  mm, hugetlb: unify core page allocation accounting and initialization
  mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes
  mm/memcontrol.c: make local symbol static
  mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd()
  include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer
  mm/compaction.c: fix comment for try_to_compact_pages()
  mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it
  zsmalloc: use U suffix for negative literals being shifted
  ...
2018-01-31 18:46:22 -08:00
David Rientjes 5ff7091f5a mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks
Commit 4d4bbd8526 ("mm, oom_reaper: skip mm structs with mmu
notifiers") prevented the oom reaper from unmapping private anonymous
memory with the oom reaper when the oom victim mm had mmu notifiers
registered.

The rationale is that doing mmu_notifier_invalidate_range_{start,end}()
around the unmap_page_range(), which is needed, can block and the oom
killer will stall forever waiting for the victim to exit, which may not
be possible without reaping.

That concern is real, but only true for mmu notifiers that have
blockable invalidate_range_{start,end}() callbacks.  This patch adds a
"flags" field to mmu notifier ops that can set a bit to indicate that
these callbacks do not block.

The implementation is steered toward an expensive slowpath, such as
after the oom reaper has grabbed mm->mmap_sem of a still alive oom
victim.

[rientjes@google.com: mmu_notifier_invalidate_range_end() can also call the invalidate_range() must not block, fix comment]
  Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801091339570.240101@chino.kir.corp.google.com
[akpm@linux-foundation.org: make mm_has_blockable_invalidate_notifiers() return bool, use rwsem_is_locked()]
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1712141329500.74052@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Dimitri Sivanich <sivanich@hpe.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:38 -08:00
Linus Torvalds b2fe5fa686 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Significantly shrink the core networking routing structures. Result
    of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf

 2) Add netdevsim driver for testing various offloads, from Jakub
    Kicinski.

 3) Support cross-chip FDB operations in DSA, from Vivien Didelot.

 4) Add a 2nd listener hash table for TCP, similar to what was done for
    UDP. From Martin KaFai Lau.

 5) Add eBPF based queue selection to tun, from Jason Wang.

 6) Lockless qdisc support, from John Fastabend.

 7) SCTP stream interleave support, from Xin Long.

 8) Smoother TCP receive autotuning, from Eric Dumazet.

 9) Lots of erspan tunneling enhancements, from William Tu.

10) Add true function call support to BPF, from Alexei Starovoitov.

11) Add explicit support for GRO HW offloading, from Michael Chan.

12) Support extack generation in more netlink subsystems. From Alexander
    Aring, Quentin Monnet, and Jakub Kicinski.

13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From
    Russell King.

14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso.

15) Many improvements and simplifications to the NFP driver bpf JIT,
    from Jakub Kicinski.

16) Support for ipv6 non-equal cost multipath routing, from Ido
    Schimmel.

17) Add resource abstration to devlink, from Arkadi Sharshevsky.

18) Packet scheduler classifier shared filter block support, from Jiri
    Pirko.

19) Avoid locking in act_csum, from Davide Caratti.

20) devinet_ioctl() simplifications from Al viro.

21) More TCP bpf improvements from Lawrence Brakmo.

22) Add support for onlink ipv6 route flag, similar to ipv4, from David
    Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits)
  tls: Add support for encryption using async offload accelerator
  ip6mr: fix stale iterator
  net/sched: kconfig: Remove blank help texts
  openvswitch: meter: Use 64-bit arithmetic instead of 32-bit
  tcp_nv: fix potential integer overflow in tcpnv_acked
  r8169: fix RTL8168EP take too long to complete driver initialization.
  qmi_wwan: Add support for Quectel EP06
  rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK
  ipmr: Fix ptrdiff_t print formatting
  ibmvnic: Wait for device response when changing MAC
  qlcnic: fix deadlock bug
  tcp: release sk_frag.page in tcp_disconnect
  ipv4: Get the address of interface correctly.
  net_sched: gen_estimator: fix lockdep splat
  net: macb: Handle HRESP error
  net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring
  ipv6: addrconf: break critical section in addrconf_verify_rtnl()
  ipv6: change route cache aging logic
  i40e/i40evf: Update DESC_NEEDED value to reflect larger value
  bnxt_en: cleanup DIM work on device shutdown
  ...
2018-01-31 14:31:10 -08:00
Dan Carpenter 8de9399a24 RDMA/bnxt_re: Fix an error code in bnxt_qplib_create_srq()
We should return -ENOMEM if the allocation fails.  (The current code
returns succees).

Fixes: 37cb11acf1 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-By: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31 16:19:50 -05:00
Doug Ledford 589ccd8b04 RDMA/bnxt_re: Fix static checker warning
If there is ever any error while creating srq->umem, we return that
error, we don't store it in srq->umem, so any check of srq->umem for
IS_ERR is pointless.  Further, checking udata is unnecessary as
srq->umem is always either NULL or valid, without respect to udata.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31 16:19:50 -05:00
Linus Torvalds 7b1cd95d65 First merge window pull request for 4.16
- Misc small driver fixups to
   bnxt_re/hfi1/qib/hns/ocrdma/rdmavt/vmw_pvrdma/nes
 - Several major feature adds to bnxt_re driver: SRIOV VF RoCE support,
   HugePages support, extended hardware stats support, and SRQ support
 - A notable number of fixes to the i40iw driver from debugging scale up
   testing
 - More work to enable the new hip08 chip in the hns driver
 - Misc small ULP fixups to srp/srpt//ipoib
 - Preparation for srp initiator and target to support the RDMA-CM
   protocol for connections
 - Add RDMA-CM support to srp initiator, srp target is still a WIP
 - Fixes for a couple of places where ipoib could spam the dmesg log
 - Fix encode/decode of FDR/EDR data rates in the core
 - Many patches from Parav with ongoing work to clean up inconsistencies
   and bugs in RoCE support around the rdma_cm
 - mlx5 driver support for the userspace features 'thread domain', 'wallclock
   timestamps' and 'DV Direct Connected transport'. Support for the firmware
   dual port rocee capability
 - Core support for more than 32 rdma devices in the char dev allocation
 - kernel doc updates from Randy Dunlap
 - New netlink uAPI for inspecting RDMA objects similar in spirit to 'ss'
 - One minor change to the kobject code acked by GKH
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJacfljAAoJEDht9xV+IJsaUnwP+QFJvfIDEfRlfU2rTmcfymPs
 Rz9bW1KLgETcJx/XOE2ba2DOaqdFr56TLflsDfEfOSIL8AtzBQqH3vTqEj49bBP7
 4JZAkzWllUS/qoYD2XmvOM0IrIfFXzZtLM/lzLi+5dwK26x3GAB9hHXpKzUrJ1vj
 I1Naq14qOFXoNBndEtZJqtIKOhR/Pnd6YtxAiNCmViZGdqm3DIU3D4VJhU5B7pO9
 j6ovJs16wfJl/gV1iiz9xO49ViVFpwzSIzYE/Q2ZCegcrsF3EEVN2J4vZHkKgDuN
 0/Ar/WOvkPzKBFR8hJ7M4kwp0Fy/69/U49s7kpGNxdhML9sU3+Qfse6JYGj0M9L8
 01gTM0SShyAZMNAvjVFbIKLQPg806OAit4cooMwlObbwJ6b7B8K0uN17/uVIkIqp
 gXqertyl1BLhUtTOby/8Fox/f/oEvaZksKiwcTKSb7D1Y5jGZZUPRknJ5SwAFWQB
 RiTPJ6mY7BUsM9zuYQtRE8x2mpgIezYXFcrAz7iT76WuoZQgo1QLIyYRM1+MlhnC
 wNrp5BtqoVfW2Ps0CbSdxJ9vDtDf3cwLg0RzcCB8+NJJccsRD9IVMDev/TDY5k9U
 M9LxxtW3WuulRWgliU0Q9VaswUQoIao16vBMVL7GwUm+ClLvbRVoPe8jxgtfk+W3
 GAANAI7Kv/vUoV/6CFfP
 =sMXV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull RDMA subsystem updates from Jason Gunthorpe:
 "Overall this cycle did not have any major excitement, and did not
  require any shared branch with netdev.

  Lots of driver updates, particularly of the scale-up and performance
  variety. The largest body of core work was Parav's patches fixing and
  restructing some of the core code to make way for future RDMA
  containerization.

  Summary:

   - misc small driver fixups to
     bnxt_re/hfi1/qib/hns/ocrdma/rdmavt/vmw_pvrdma/nes

   - several major feature adds to bnxt_re driver: SRIOV VF RoCE
     support, HugePages support, extended hardware stats support, and
     SRQ support

   - a notable number of fixes to the i40iw driver from debugging scale
     up testing

   - more work to enable the new hip08 chip in the hns driver

   - misc small ULP fixups to srp/srpt//ipoib

   - preparation for srp initiator and target to support the RDMA-CM
     protocol for connections

   - add RDMA-CM support to srp initiator, srp target is still a WIP

   - fixes for a couple of places where ipoib could spam the dmesg log

   - fix encode/decode of FDR/EDR data rates in the core

   - many patches from Parav with ongoing work to clean up
     inconsistencies and bugs in RoCE support around the rdma_cm

   - mlx5 driver support for the userspace features 'thread domain',
     'wallclock timestamps' and 'DV Direct Connected transport'. Support
     for the firmware dual port rocee capability

   - core support for more than 32 rdma devices in the char dev
     allocation

   - kernel doc updates from Randy Dunlap

   - new netlink uAPI for inspecting RDMA objects similar in spirit to 'ss'

   - one minor change to the kobject code acked by Greg KH"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (259 commits)
  RDMA/nldev: Provide detailed QP information
  RDMA/nldev: Provide global resource utilization
  RDMA/core: Add resource tracking for create and destroy PDs
  RDMA/core: Add resource tracking for create and destroy CQs
  RDMA/core: Add resource tracking for create and destroy QPs
  RDMA/restrack: Add general infrastructure to track RDMA resources
  RDMA/core: Save kernel caller name when creating PD and CQ objects
  RDMA/core: Use the MODNAME instead of the function name for pd callers
  RDMA: Move enum ib_cq_creation_flags to uapi headers
  IB/rxe: Change RDMA_RXE kconfig to use select
  IB/qib: remove qib_keys.c
  IB/mthca: remove mthca_user.h
  RDMA/cm: Fix access to uninitialized variable
  RDMA/cma: Use existing netif_is_bond_master function
  IB/core: Avoid SGID attributes query while converting GID from OPA to IB
  RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
  IB/umad: Fix use of unprotected device pointer
  IB/iser: Combine substrings for three messages
  IB/iser: Delete an unnecessary variable initialisation in iser_send_data_out()
  IB/iser: Delete an error message for a failed memory allocation in iser_send_data_out()
  ...
2018-01-31 12:05:10 -08:00
Linus Torvalds 168fe32a07 Merge branch 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull poll annotations from Al Viro:
 "This introduces a __bitwise type for POLL### bitmap, and propagates
  the annotations through the tree. Most of that stuff is as simple as
  'make ->poll() instances return __poll_t and do the same to local
  variables used to hold the future return value'.

  Some of the obvious brainos found in process are fixed (e.g. POLLIN
  misspelled as POLL_IN). At that point the amount of sparse warnings is
  low and most of them are for genuine bugs - e.g. ->poll() instance
  deciding to return -EINVAL instead of a bitmap. I hadn't touched those
  in this series - it's large enough as it is.

  Another problem it has caught was eventpoll() ABI mess; select.c and
  eventpoll.c assumed that corresponding POLL### and EPOLL### were
  equal. That's true for some, but not all of them - EPOLL### are
  arch-independent, but POLL### are not.

  The last commit in this series separates userland POLL### values from
  the (now arch-independent) kernel-side ones, converting between them
  in the few places where they are copied to/from userland. AFAICS, this
  is the least disruptive fix preserving poll(2) ABI and making epoll()
  work on all architectures.

  As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
  it will trigger only on what would've triggered EPOLLWRBAND on other
  architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
  at all on sparc. With this patch they should work consistently on all
  architectures"

* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
  make kernel-side POLL... arch-independent
  eventpoll: no need to mask the result of epi_item_poll() again
  eventpoll: constify struct epoll_event pointers
  debugging printk in sg_poll() uses %x to print POLL... bitmap
  annotate poll(2) guts
  9p: untangle ->poll() mess
  ->si_band gets POLL... bitmap stored into a user-visible long field
  ring_buffer_poll_wait() return value used as return value of ->poll()
  the rest of drivers/*: annotate ->poll() instances
  media: annotate ->poll() instances
  fs: annotate ->poll() instances
  ipc, kernel, mm: annotate ->poll() instances
  net: annotate ->poll() instances
  apparmor: annotate ->poll() instances
  tomoyo: annotate ->poll() instances
  sound: annotate ->poll() instances
  acpi: annotate ->poll() instances
  crypto: annotate ->poll() instances
  block: annotate ->poll() instances
  x86: annotate ->poll() instances
  ...
2018-01-30 17:58:07 -08:00
Linus Torvalds d772794637 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main RCU changes in this cycle were:

   - Updates to use cond_resched() instead of cond_resched_rcu_qs()
     where feasible (currently everywhere except in kernel/rcu and in
     kernel/torture.c). Also a couple of fixes to avoid sending IPIs to
     offline CPUs.

   - Updates to simplify RCU's dyntick-idle handling.

   - Updates to remove almost all uses of smp_read_barrier_depends() and
     read_barrier_depends().

   - Torture-test updates.

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  torture: Save a line in stutter_wait(): while -> for
  torture: Eliminate torture_runnable and perf_runnable
  torture: Make stutter less vulnerable to compilers and races
  locking/locktorture: Fix num reader/writer corner cases
  locking/locktorture: Fix rwsem reader_delay
  torture: Place all torture-test modules in one MAINTAINERS group
  rcutorture/kvm-build.sh: Skip build directory check
  rcutorture: Simplify functions.sh include path
  rcutorture: Simplify logging
  rcutorture/kvm-recheck-*: Improve result directory readability check
  rcutorture/kvm.sh: Support execution from any directory
  rcutorture/kvm.sh: Use consistent help text for --qemu-args
  rcutorture/kvm.sh: Remove unused variable, `alldone`
  rcutorture: Remove unused script, config2frag.sh
  rcutorture/configinit: Fix build directory error message
  rcutorture: Preempt RCU-preempt readers more vigorously
  torture: Reduce #ifdefs for preempt_schedule()
  rcu: Remove have_rcu_nocb_mask from tree_plugin.h
  rcu: Add comment giving debug strategy for double call_rcu()
  tracing, rcu: Hide trace event rcu_nocb_wake when not used
  ...
2018-01-30 10:15:30 -08:00
Jason Gunthorpe e7996a9a77 Linux 4.15
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJabj6pAAoJEHm+PkMAQRiGs8cIAJQFkCWnbz86e3vG4DuWhyA8
 CMGHCQdUOxxFGa/ixhIiuetbC0x+JVHAjV2FwVYbAQfaZB3pfw2iR1ncQxpAP1AI
 oLU9vBEqTmwKMPc9CM5rRfnLFWpGcGwUNzgPdxD5yYqGDtcM8K840mF6NdkYe5AN
 xU8rv1wlcFPF4A5pvHCH0pvVmK4VxlVFk/2H67TFdxBs4PyJOnSBnf+bcGWgsKO6
 hC8XIVtcKCH2GfFxt5d0Vgc5QXJEpX1zn2mtCa1MwYRjN2plgYfD84ha0xE7J0B0
 oqV/wnjKXDsmrgVpncr3txd4+zKJFNkdNRE4eLAIupHo2XHTG4HvDJ5dBY2NhGU=
 =sOml
 -----END PGP SIGNATURE-----

Merge tag v4.15 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

To resolve conflicts in:
 drivers/infiniband/hw/mlx5/main.c
 drivers/infiniband/hw/mlx5/qp.c

From patches merged into the -rc cycle. The conflict resolution matches
what linux-next has been carrying.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-30 09:30:00 -07:00
Jason Gunthorpe beb801ac51 RDMA: Move enum ib_cq_creation_flags to uapi headers
The flags field the enum is used with comes directly from the uapi
so it belongs in the uapi headers for clarity and so userspace can
use it.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29 12:58:34 -07:00
Corentin Labbe 3356c8c0b8 IB/qib: remove qib_keys.c
qib_keys.c was left uncompilable in commit 7c2e11fe2d ("IB/qib: Remove qp and mr functionality from qib")
Since nothing need it, remove it from tree.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-28 14:07:16 -07:00
Corentin Labbe 57ac30c0ef IB/mthca: remove mthca_user.h
mthca_user.h is unused since commit 486f60954c ("IB/mthca: Move user vendor structures")
Remove it from tree.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-28 14:07:16 -07:00
Leon Romanovsky b081808a66 RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
Failure in XRCD FW deallocation command leaves memory leaked and
returns error to the user which he can't do anything about it.

This patch changes behavior to always free memory and always return
success to the user.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-28 14:07:16 -07:00
Davidlohr Bueso 487f6683f1 IB/mthca: Fix gup usage in mthca_map_user_db()
get_user_pages() must be called with mmap_sem held, currently
it is not. In fact it is called under the user db_table->mutex.
To fix this we can convert gup to use the fast alternative,
and safely avoid taking mmap_sem, if possible. Furthermore
this is safe wrt to the mutex as other callers that take the
lock (unmap and alloc_db) are not called under mmap_sem
(hence possible deadlock).

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-26 10:43:46 -05:00
Kalderon, Michal dc728f779a RDMA/qedr: lower print level of flushed CQEs
There are races where can still get flush on CQEs before the QP enters
error state. This is not an error and should be treated as
debug information.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-25 10:58:36 -05:00
Felix Kuehling 20c3ff6114 RDMA/qedr: Use pci_enable_atomic_ops_to_root()
Use PCI core interface pci_enable_atomic_ops_to_root() to enable atomic
capability.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
CC: Ram Amrani <Ram.Amrani@cavium.com>
CC: Doug Ledford <dledford@redhat.com>
2018-01-24 15:28:11 -06:00
Leon Romanovsky 10bea9c873 RDMA/mlx5: Remove redundant allocation warning print
The kmalloc() failure to allocate memory generates enough information
and doesn't need to be accompanied by another driver print.

Fixes: d69a24e036 ("IB/mlx5: Move IB event processing onto a workqueue")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-19 13:05:39 -07:00
weiyongjun (A) 0b5fe5c43a RDMA/hns: Remove unnecessary platform_get_resource() error check
devm_ioremap_resource() already checks if the resource is NULL, so
remove the unnecessary platform_get_resource() error check.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18 14:49:21 -05:00
Feras Daoud 5c99eaecb1 IB/mlx5: Mmap the HCA's clock info to user-space
This patch maps the new page to user space applications to
allow converting a user space completion timestamp to system wall
time at the lowest possible latency cost.
By using a versioning scheme we allow compatibility between current
and future userspace libraries.
The change moves mlx5_ib_mmap_cmd enum from mlx5_ib.h to the
abi header file mlx5-abi.h.

Reviewed-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18 14:49:21 -05:00
Devesh Sharma 37cb11acf1 RDMA/bnxt_re: Add SRQ support for Broadcom adapters
Shared receive queue (SRQ) is defined as a pool of
receive buffers shared among multiple QPs which belong
to same protection domain in a given process context.
Use of SRQ reduces the memory foot print of IB applications.

Broadcom adapters support SRQ, adding code-changes to enable
shared receive queue.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18 14:49:19 -05:00
Selvin Xavier 89f81008ba RDMA/bnxt_re: expose detailed stats retrieved from HW
Broadcom's adapter supports more granular statistics
to allow better understanding about the state of the
chip when data traffic is flowing.

Exposing the detailed stats to the consumer through
the standard hook available in the kverbs interface.
In order to retrieve all the information, driver
implements a firmware command.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18 14:49:18 -05:00
Somnath Kotur 872f357824 RDMA/bnxt_re: Add support for MRs with Huge pages
Depending on the OS page-table configurations, applications
may request MRs which has page size alignment other than 4K

Underlying provider driver needs to adjust its PBL boundaries
according to the incoming page boundaries in the PA list.

Adding a capability to register MRs having pages-sizes other
than 4K (Hugepages).

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18 14:49:13 -05:00
Selvin Xavier 2fc68543f2 RDMA/bnxt_re: Add support for query firmware version
The device now reports firmware version thus, removing
the hard coded values of the FW version string and
redundant fw_rev hook from sysfs. Adding code to query
firmware version from underlying device and report it
through the kernel verb to get firmware version string.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-17 09:56:17 -05:00
Selvin Xavier ccd9d0d3df RDMA/bnxt_re: Enable RoCE on virtual functions
RoCE can be used by virtual functions (VFs) as well. Adding
code changes to allow resource reservation, initialization
and avail the resources to the RDMA applications running on
those VFs.

Currently, fifty percent of the total available resources
are reserved for PF and remaining are equally divided among
active VFs.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-17 09:56:17 -05:00
David S. Miller c02b3741eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Overlapping changes all over.

The mini-qdisc bits were a little bit tricky, however.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17 00:10:42 -05:00
Mustafa Ismail f20d429511 i40iw: Free IEQ resources
The iWARP Exception Queue (IEQ) resources are not freed when a QP is
destroyed. Fix this by freeing IEQ resources when freeing QP resources.

Fixes: d374984179 ("i40iw: add files for iwarp interface")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16 20:38:18 -07:00
Mustafa Ismail ebb6c0c015 i40iw: Remove setting of rem_addr.len
Remove setting of rem_addr.len before calling iw_rdma_write,
iw_inline_rdma_write and rdma_read. rem_addr.len is not used in those
functions.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16 20:38:18 -07:00
Sindhu Devale 72b30e986d i40iw: Remove limit on re-posting AEQ entries to HW
Currently, if the number of processed Asynchronous Event Queue (AEQ)
entries exceeds 255, they are not returned to HW for re-use. During
scale-up, the unreturned AEQ entries can grow to the max AEQ size and
cause the HW to report an AEQ overflow.

Remove the check which limits the number of processed AEQ entries returned
to HW.

Fixes: 86dbcd0f12 ("RDMA/i40iw: add file to handle cqp calls")
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16 20:38:18 -07:00
Shiraz Saleem 6376e926af i40iw: Zero-out consumer key on allocate stag for FMR
If the application invalidates the MR before the FMR WR, HW parses the
consumer key portion of the stag and returns an invalid stag key
Asynchronous Event (AE) that tears down the QP.

Fix this by zeroing-out the consumer key portion of the allocated stag
returned to application for FMR.

Fixes: ee855d3b93f3 ("RDMA/i40iw: Add base memory management extensions")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16 20:38:18 -07:00
Shiraz Saleem 23541b28e5 i40iw: Remove extra call to i40iw_est_sd()
Remove redundant estimate SD function call.  sd_needed should already be
updated at the end of the do while resource reduction loop.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16 20:38:18 -07:00
oulijun d4994d2f1f RDMA/hns: Set the guid for hip08 RoCE device
This patch assign a guid(Global Unique identifer) value to the hip08
device.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16 20:38:18 -07:00
oulijun 2eade67535 RDMA/hns: Update the verbs of polling for completion
If the port is a RoCEv2 port, the remote port address and QP information
which returned for UD will be modified.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16 20:38:18 -07:00
oulijun 6c1f08b347 RDMA/hns: Assign zero for pkey_index of wc in hip08
Because pkey is fixed for hip08 RoCE, it needs to assign zero for
pkey_index of wc. otherwise, it will happen an error when establishing
connection by communication management mechanism.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16 20:38:18 -07:00
oulijun 7bdee4158b RDMA/hns: Fill sq wqe context of ud type in hip08
This patch mainly configure the fields of sq wqe of ud type when posting
wr of gsi qp type.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16 20:38:18 -07:00
oulijun 0fa95a9a71 RDMA/hns: Add gsi qp support for modifying qp in hip08
It needs to Assign the values for some fields in qp context when qp type
is gsi qp type in hip08.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16 20:38:18 -07:00
oulijun b66efc9320 RDMA/hns: Create gsi qp in hip08
The gsi qp and rc qp use the same qp context structure and the created
flow, only differentiate them by qpn and qp type.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16 20:38:18 -07:00
oulijun 6d13b869ea RDMA/hns: Assign the correct value for tx_cqn
When modifying qp from init to init, it need to assign the cqn of send cq
for tx cqn field of qp context. Otherwise, it will cause a mistake when
the send and recv cq sizes are different.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-16 20:38:18 -07:00
Linus Torvalds 8cbab92dff Fifth pull request for 4.15-rc
- Oops fix in hfi1 driver
 - Use-after-free issue in iser-target
 - Use of user supplied array index without proper checking
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaXnN2AAoJELgmozMOVy/dyYAP/3YBi6ofVHrIjKovrsA6fh+y
 O1+YuG++2glgKF1QuZkZnrKqrJUIueHhGuKce2rRgWWMioAzOfd1mbXbwha3xKWh
 4mH/BK/fv1k/TZy+eu6w4WMnjtExVinPp4qeSwvvfHh3BpJfnBOtPbYNdPVt94MV
 QYPZuLarvSIJe8MFDbv6yitAIQWnPQLfGMWPW8OZb0ut7bsMNPyZ8LvcAXhfXA4a
 tVuxAfCqUb90BHWOnCBcaWGNpp9Ov2w6yyNCDkiIRXvgJxhjvJwxpHze6TMa4fCo
 mv8e4QDdQ8KK2vQaINQAJTl68Y1/fPxtI8um5tpIhJ5YjQE3rrVzTaQJRstBcXXs
 ViiDmxm/4KTUTeFtWHRj7oVJTZNSOBu+FfBVmjqtcUl4k+t9tDfj2Lw/ZVXT/3KQ
 +dsqUV+1PrQspCeFlKwg+e1mgXa3miyOYvMPacAHzRYBLv5oU6MPpb+WHbZMlTMt
 duT50JfjseIB0s6F2aGLNRRXPWb9jVUwHwzOXaDGKwX1+5u6O2IKeUP7+15TgZsS
 IFeLPodNN8W1N1z0TGcGL8zRXtY9GYhlYwrdMUe/4EWfxrrRtawLLCGGVhWKk/Rv
 iPl/Qpxj/8f2DS0A5+N5iHTAZPoOXjNx6JtZf9AZhvI9blvwZVuGdr63mKZMUnpb
 a0xeypv1bnYbv7k70zS8
 =MHNK
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:
 "We had a few more items creep up over the last week. Given we are in
  -rc8, these are obviously limited to bugs that have a big downside and
  for which we are certain of the fix.

  The first is a straight up oops bug that all you have to do is read
  the code to see it's a guaranteed 100% oops bug.

  The second is a use-after-free issue. We get away lucky if the queue
  we are shutting down is empty, but if it isn't, we can end up oopsing.
  We really need to drain the queue before destroying it.

  The final one is an issue with bad user input causing us to access our
  port array out of bounds. While fixing the array out of bounds issue,
  it was noticed that the original code did the same thing twice (the
  call to rdma_ah_set_port_num()), so its removal is not balanced by a
  readd elsewhere, it was already where it needed to be in addition to
  where it didn't need to be.

  Summary:

   - Oops fix in hfi1 driver

   - use-after-free issue in iser-target

   - use of user supplied array index without proper checking"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/mlx5: Fix out-of-bound access while querying AH
  IB/hfi1: Prevent a NULL dereference
  iser-target: Fix possible use-after-free in connection establishment error
2018-01-16 16:47:40 -08:00
Jack Morgenstein 852f692759 IB/mlx4: Fix incorrectly releasing steerable UD QPs when have only ETH ports
Allocating steerable UD QPs depends on having at least one IB port,
while releasing those QPs does not.

As a result, when there are only ETH ports, the IB (RoCE) driver
requests releasing a qp range whose base qp is zero, with
qp count zero.

When SR-IOV is enabled, and the VF driver is running on a VM over
a hypervisor which treats such qp release calls as errors
(rather than NOPs), we see lines in the VM message log like:

 mlx4_core 0002:00:02.0: Failed to release qp range base:0 cnt:0

Fix this by adding a check for a zero count in mlx4_release_qp_range()
(which thus treats releasing 0 qps as a nop), and eliminating the
check for device managed flow steering when releasing steerable UD QPs.
(Freeing ib_uc_qpns_bitmap unconditionally is also OK, since it
remains NULL when steerable UD QPs are not allocated).

Cc: <stable@vger.kernel.org>
Fixes: 4196670be7 ("IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Jason Gunthorpe 7bed7ebcb7 RDMA/qedr: Fix endian problems around imm_data
The double swap matches what user space rdma-core does to imm_data.

wc->imm_data is not used in the kernel so this change has no practical
impact.

Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Jason Gunthorpe ccb8a29e7d RDMA/hns: Fix endian problems around imm_data and rkey
This matches the changes made recently to the userspace hns
driver when it was made sparse clean.

See rdma-core commit bffd380cfe56 ("libhns: Make the provider sparse
clean")

wc->imm_data is not used in the kernel so this change has no practical
impact.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Leon Romanovsky ae59c3f0b6 RDMA/mlx5: Fix out-of-bound access while querying AH
The rdma_ah_find_type() accesses the port array based on an index
controlled by userspace. The existing bounds check is after the first use
of the index, so userspace can generate an out of bounds access, as shown
by the KASN report below.

==================================================================
BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0
Read of size 4 at addr ffff880019ae2268 by task ibv_rc_pingpong/409

CPU: 0 PID: 409 Comm: ibv_rc_pingpong Not tainted 4.15.0-rc2-00031-gb60a3faf5b83-dirty #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xe9/0x18f
 print_address_description+0xa2/0x350
 kasan_report+0x3a5/0x400
 to_rdma_ah_attr+0xa8/0x3b0
 mlx5_ib_query_qp+0xd35/0x1330
 ib_query_qp+0x8a/0xb0
 ib_uverbs_query_qp+0x237/0x7f0
 ib_uverbs_write+0x617/0xd80
 __vfs_write+0xf7/0x500
 vfs_write+0x149/0x310
 SyS_write+0xca/0x190
 entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x7fe9c7a275a0
RSP: 002b:00007ffee5498738 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fe9c7ce4b00 RCX: 00007fe9c7a275a0
RDX: 0000000000000018 RSI: 00007ffee5498800 RDI: 0000000000000003
RBP: 000055d0c8d3f010 R08: 00007ffee5498800 R09: 0000000000000018
R10: 00000000000000ba R11: 0000000000000246 R12: 0000000000008000
R13: 0000000000004fb0 R14: 000055d0c8d3f050 R15: 00007ffee5498560

Allocated by task 1:
 __kmalloc+0x3f9/0x430
 alloc_mad_private+0x25/0x50
 ib_mad_post_receive_mads+0x204/0xa60
 ib_mad_init_device+0xa59/0x1020
 ib_register_device+0x83a/0xbc0
 mlx5_ib_add+0x50e/0x5c0
 mlx5_add_device+0x142/0x410
 mlx5_register_interface+0x18f/0x210
 mlx5_ib_init+0x56/0x63
 do_one_initcall+0x15b/0x270
 kernel_init_freeable+0x2d8/0x3d0
 kernel_init+0x14/0x190
 ret_from_fork+0x24/0x30

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff880019ae2000
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 104 bytes to the right of
 512-byte region [ffff880019ae2000, ffff880019ae2200)
The buggy address belongs to the page:
page:000000005d674e18 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 00000001000c000c
raw: dead000000000100 dead000000000200 ffff88001a402000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                          ^
 ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint

Cc: <stable@vger.kernel.org>
Fixes: 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 14:19:55 -07:00
Eran Ben Elisha 72f36be061 net/mlx5: Fix mlx5_get_uars_page to return error code
Change mlx5_get_uars_page to return ERR_PTR in case of
allocation failure. Change all callers accordingly to
check the IS_ERR(ptr) instead of NULL.

Fixes: 59211bd3b6 ("net/mlx5: Split the load/unload flow into hardware and software flows")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:47 +02:00
Eran Ben Elisha 8978cc921f {net,ib}/mlx5: Don't disable local loopback multicast traffic when needed
There are systems platform information management interfaces (such as
HOST2BMC) for which we cannot disable local loopback multicast traffic.

Separate disable_local_lb_mc and disable_local_lb_uc capability bits so
driver will not disable multicast loopback traffic if not supported.
(It is expected that Firmware will not set disable_local_lb_mc if
HOST2BMC is running for example.)

Function mlx5_nic_vport_update_local_lb will do best effort to
disable/enable UC/MC loopback traffic and return success only in case it
succeeded to changed all allowed by Firmware.

Adapt mlx5_ib and mlx5e to support the new cap bits.

Fixes: 2c43c5a036 ("net/mlx5e: Enable local loopback in loopback selftest")
Fixes: c85023e153 ("IB/mlx5: Add raw ethernet local loopback support")
Fixes: bded747bb4 ("net/mlx5: Add raw ethernet local loopback firmware command")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 00:52:42 +02:00
Colin Ian King da005f9fab IB/mlx5: remove redundant assignment of mdev
The initial assignment to mdev is redundant as mdev is re-assigned
later and the first assigned value is never read. Remove this
redundant assignment.

Cleans up clang warning:
drivers/infiniband/hw/mlx5/main.c:359:24: warning: Value stored
to 'mdev' during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-10 16:52:52 -05:00
Dan Carpenter 57194fa763 IB/hfi1: Prevent a NULL dereference
In the original code, we set "fd->uctxt" to NULL and then dereference it
which will cause an Oops.

Fixes: f2a3bc00a0 ("IB/hfi1: Protect context array set/clear with spinlock")
Cc: <stable@vger.kernel.org> # 4.14.x
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-10 16:48:09 -05:00
David S. Miller a0ce093180 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-09 10:37:00 -05:00
Doug Ledford f8457d5832 Merge branch 'bart-srpt-for-next' into k.o/wip/dl-for-next
Merging in 12 patch series from Bart that required changes in the
current for-rc branch in order to apply cleanly.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-08 16:06:20 -05:00
Daniel Jurgens 85c7c01499 IB/mlx5: Don't advertise RAW QP support in dual port mode
When operating in dual port RoCE mode FW doesn't support steering for
raw QPs on the slave port. They still work on the master port, but
the user has no way of knowing which port is the master. The
capability is reported per device, not per port.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:24 -07:00
Daniel Jurgens 212f2a87b7 IB/mlx5: Route MADs for dual port RoCE
Route performance query MADs to the correct mlx5_core_dev when using
dual port RoCE mode.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:23 -07:00
Daniel Jurgens cfe4e37fdc {net, IB}/mlx5: Change set_roce_gid to take a port number
When in dual port mode setting a RoCE GID for any port flows through the
master ports mlx5_core_dev. Provide an interface to set the port when
sending this command.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:23 -07:00
Daniel Jurgens aac4492ef2 IB/mlx5: Update counter implementation for dual port RoCE
Update the counter interface for multiple ports. Some counter sets
always comes from the primary device.

Port specific counters should be accessed per mlx5_core_dev not always
through the IB master mdev.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:23 -07:00
Parav Pandit a9e546e73a IB/mlx5: Change debugfs to have per port contents
When there are multiple ports for single IB(RoCE) device, support
debugfs entries to be available for each port.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:22 -07:00
Daniel Jurgens b3cbd6f080 IB/mlx5: Implement dual port functionality in query routines
Port operations must be routed to their native mlx5_core_dev. A
multiport RoCE device registers itself as having 2 ports even before a
2nd port is affiliated. If an unaffilated port is queried use capability
information from the master port, these values are the same.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:22 -07:00
Daniel Jurgens d69a24e036 IB/mlx5: Move IB event processing onto a workqueue
Because mlx5_ib_event can be called from atomic context move event
handling onto a workqueue. A mutex lock is required to get the IB device
for slave ports, so move event processing onto a work queue. When an IB
event is received, check if the mlx5_core_dev  is a slave port, if so
attempt to get the IB device it's affiliated with. If found process the
event for that device, otherwise return.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:22 -07:00
Daniel Jurgens 32f69e4be2 {net, IB}/mlx5: Manage port association for multiport RoCE
When mlx5_ib_add is called determine if the mlx5 core device being
added is capable of dual port RoCE operation. If it is, determine
whether it is a master device or a slave device using the
num_vhca_ports and affiliate_nic_vport_criteria capabilities.

If the device is a slave, attempt to find a master device to affiliate it
with. Devices that can be affiliated will share a system image guid. If
none are found place it on a list of unaffiliated ports. If a master is
found bind the port to it by configuring the port affiliation in the NIC
vport context.

Similarly when mlx5_ib_remove is called determine the port type. If it's
a slave port, unaffiliate it from the master device, otherwise just
remove it from the unaffiliated port list.

The IB device is registered as a multiport device, even if a 2nd port is
not available for affiliation. When the 2nd port is affiliated later the
GID cache must be refreshed in order to get the default GIDs for the 2nd
port in the cache. Export roce_rescan_device to provide a mechanism to
refresh the cache after a new port is bound.

In a multiport configuration all IB object (QP, MR, PD, etc) related
commands should flow through the master mlx5_core_dev, other commands
must be sent to the slave port mlx5_core_mdev, an interface is provide
to get the correct mdev for non IB object commands.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:22 -07:00
Daniel Jurgens 7fd8aefb7c IB/mlx5: Make netdev notifications multiport capable
When multiple RoCE ports are supported registration for events on
multiple netdevs is required. Refactor the event registration and
handling to support multiple ports.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:21 -07:00
Daniel Jurgens 508562d6f7 IB/mlx5: Reduce the use of num_port capability
Remove use of the num_ports general capability throughout. The number of
ports will be variable in the future, and reported in a different way.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:21 -07:00
Moni Shoua 776a3906b6 IB/mlx5: Add support for DC target QP
A DC Target (DCT) QP is represented in the hardware as a unique object.
This object is created by CREATE_DCT command and destroyed by DESTROY_DCT
command. However, in the driver we describe it as a QP.

The hardware command that creates a DCT needs parameters that the verb
create_qp() does not provide. Those remaining parameters are provided
with the call to the verb modify_qp(). Therefore we delay the actual
creation of a DCT in the hardware until the stage of modify_qp() to RTR.

A support for query_qp() was added as well. It uses QUERY_DCT command to
retrieve the applicable fields.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:38:51 -07:00
Moni Shoua c32a4f296e IB/mlx5: Add support for DC Initiator QP
DC Initiator (DCI) QP is represented like any other QP in the hardware.
However, like any other transport QP there are attributes and settings
that are special to DCI QP and needs specific attention and care.
Make necessary changes to configure DCI QP.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:38:50 -07:00
Moni Shoua b4aaa1f0b4 IB/mlx5: Handle type IB_QPT_DRIVER when creating a QP
The QP type IB_QPT_DRIVER doesn't describe the transport or the service
that the QP provides but those are known only to the hardware driver.
The actual type of the QP is stored in the hardware driver context (i.e.
mlx5_qp) under the field qp_sub_type.

Take the real QP type and any extra data that is required to create the QP
from the driver channel and modify the QP initial attributes before continuing
with create_qp().

Downstream patches from this series will add support for both DCI and
DCT driver QPs.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:38:50 -07:00
Jonathan Toppins 8fc12d94ee bnxt_re: report RoCE device support at info level
Reporting that a device doesn't support RoCE seems like a valuable piece
of information to have when trying to determine why a driver is not binding
to a device. Better to report this at info log level instead of requiring
a user to enable all debug messages in the driver.

Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-By: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05 14:35:31 -05:00
Jia-Ju Bai 106b886306 i40iw: Replace mdelay with msleep in i40iw_wait_pe_ready
i40iw_wait_pe_ready is not called in an interrupt handler
nor holding a spinlock.
The function mdelay in it can be replaced with msleep,
to reduce busy wait.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05 13:47:29 -05:00
Michael J. Ruhl 11f0e89710 IB/{hfi1, qib}: Fix a concurrency issue with device name in logging
The get_unit_name() function crafts a string based on the device name
and the device unit number.  It then stores this in a static variable.

This has concurrency issues as can be seen with this log:

hfi1 0000:02:00.0: hfi1_1: read_idle_message: read idle message 0x203
hfi1 0000:01:00.0: hfi1_1: read_idle_message: read idle message 0x203

The PCI device ID (0000:02:00.0 vs. 0000:01:00.0) is correct for the
message, but the device string hfi1_1 is incorrect (it should be
hfi1_0 for the second log message).

Remove get_unit_name() function.

Instead, use the rvt accessor rvt_get_ibdev_name() to get the IB name
string.

Clean up any hfi1_early_xx calls that can now use the new path.

QIB has the same (qib_get_unit_name()) issue.  Updating as necessary.

Remove qib_get_unit_name() function.

Update log message that has redundant device name.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05 13:34:55 -05:00
Sebastian Sanchez 9996b049f6 IB/hfi1: Fix infinite loop in 8051 command error path
When an 8051 command times out, the entire DC block is restarted. During
the restart, the host interface version bit is set, which calls
do_8051_command() recursively. The host version bit needs to be set
before the link moves into polling, so the host version bit can be set
in set_local_link_attributes() instead. Thus, the 8051 command functions
can be simplied as a non-locking version (dd->dc8051_lock) of those
functions are no longer needed.

Fixes: 9be6a5d788 ("IB/hfi1: Prevent LNI out of sync by resetting host interface version")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05 13:34:55 -05:00
Michael J. Ruhl 06f2597f75 IB/{rdmavt, hfi1, qib}: Remove get_card_name() downcall
rdmavt has a down call to client drivers to retrieve a crafted card
name.

This name should be the IB defined name.

Rather than craft the name each time it is needed, simply retrieve
the IB allocated name from the IB device.

Update the function name to reflect its application.

Clean up driver code to match this change.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05 13:34:55 -05:00
Michael J. Ruhl 5084c8ff21 IB/{rdmavt, hfi1, qib}: Self determine driver name
Currently the HFI and QIB drivers allow the IB core to assign a unit
number to the driver name string.

If multiple devices exist in a system, there is a possibility that the
device unit number and the IB core number will be mismatched.

Fix by using the driver defined unit number to generate the device
name.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05 13:34:55 -05:00
Mark Bloch 3cc297db97 IB/mlx5: Move locks initialization to the corresponding stage
Unconditional locks/list and ODP srcu initialization should be done in
the INIT stage. Remove those from the CAPS stage and move them to the
proper stage.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:59 -07:00
Mark Bloch c8b8992446 IB/mlx5: Move loopback initialization to the corresponding stage
The loopback stage only initializes a lock, move it to be in
the CAPS initialization phase and get rid loopback step completely.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:59 -07:00
Mark Bloch 5e1e761251 IB/mlx5: Move hardware counters initialization to the corresponding stage
Now that we have a stage just for hardware counters, move all relevant
initialization logic into one place.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:59 -07:00
Mark Bloch 07321b3c67 IB/mlx5: Move ODP initialization to the corresponding stage
Now that we have a stage just for ODP, move all relevant
initialization logic into one place.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:59 -07:00
Mark Bloch c11a226a1e IB/mlx5: Move RoCE/ETH initialization to the corresponding stage
Now that we have a stage just for RoCE/ETH, move all relevant
initialization logic into one place.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:59 -07:00
Mark Bloch 16c1975f10 IB/mlx5: Create profile infrastructure to add and remove stages
Today we have single function which is used when we add an IB interface,
break this function into multiple functions.

Create stages and a generic mechanism to execute each stage.
This is in preparation for RDMA/IB representors which might not need
all stages or will do things differently in some of the stages.

This patch doesn't change any functionality.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 17:26:58 -07:00
Michael J. Ruhl d67d6114ca IB/hfi1: Add RQ/SRQ information to QP stats
When debugging issues with RC QPs, it is useful to know if a QP
has an associated RQ or SRQ, the size of the RQ, and any RNR timeout
values.

Add the necessary information to the QP stats output.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 14:21:31 -07:00
oulijun 107013ce7b RDMA/hns: Assign dest_qp when deregistering mr
It needs to create eight reserve QPs for resolving
a bug of hip06. When deregistering mr, it will issue
a rdma write for every reserve QPs.

When modify qp from init to rtr, it needs to set
the value of dest_qp_num. Otherwise, it will lead
an error of freeing mr.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 13:58:59 -07:00
Yixian Liu 10bd2ade4b RDMA/hns: Fix QP state judgement before sending work requests
The QP can accept send work requests only when the QP is
in the states that allow them to be submitted.

This patch updates the QP state judgement based on the
specification.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 13:58:59 -07:00
oulijun 52e3b42a2f RDMA/hns: Filter for zero length of sge in hip08 kernel mode
When the length of sge is zero, the driver need to filter it

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 13:58:59 -07:00
oulijun ace1c5416b RDMA/hns: Set access flags of hip08 RoCE
This patch refactors the code of setting access flags
for RDMA operation as well as adds the scene when
attr->max_dest_rd_atomic is zero.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 13:58:58 -07:00
oulijun 4f3f7a704b RDMA/hns: Update the usage of sr_max and rr_max field
This patch fixes the usage with sr_max filed and rr_max of qp
context when modify qp. Its modifications include:
1. Adjust location of filling sr_max filed of qpc
2. Only assign the number of responder resource if
   IB_QP_MAX_DEST_RD_ATOMIC bit is set
3. Only assign the number of outstanding resource if
   IB_QP_MAX_QP_RD_ATOMIC
4. Fix the assgin algorithms for the field of sr_max
   and rr_max of qp context

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 13:58:58 -07:00
oulijun 0009c2dbe8 RDMA/hns: Add rq inline data support for hip08 RoCE
This patch mainly implement rq inline data feature for hip08
RoCE in kernel mode.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-03 13:58:58 -07:00
Ingo Molnar 475c5ee193 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

- Updates to use cond_resched() instead of cond_resched_rcu_qs()
  where feasible (currently everywhere except in kernel/rcu and
  in kernel/torture.c).  Also a couple of fixes to avoid sending
  IPIs to offline CPUs.

- Updates to simplify RCU's dyntick-idle handling.

- Updates to remove almost all uses of smp_read_barrier_depends()
  and read_barrier_depends().

- Miscellaneous fixes.

- Torture-test updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-03 14:14:18 +01:00
Tomer Tayar da09091732 qed*: Utilize FW 8.33.1.0
Advance the qed* drivers to use firmware 8.33.1.0:
Modify core driver (qed) to utilize the new FW and initialize the device
with it. This is the lion's share of the patch, and includes changes to FW
interface files, device initialization flows, FW interaction flows, and
debug collection flows.
Modify Ethernet driver (qede) to make use of new FW in fastpath.
Modify RoCE/iWARP driver (qedr) to make use of new FW in fastpath.
Modify FCoE driver (qedf) to make use of new FW in fastpath.
Modify iSCSI driver (qedi) to make use of new FW in fastpath.

Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Bason <Yuval.Bason@cavium.com>
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Manish Chopra <Manish.Chopra@cavium.com>
Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 13:59:16 -05:00
Tomer Tayar 21dd79e82f qed*: HSI renaming for different types of HW
This patch renames defines and structures in the FW HSI files to allow a
distinction between different types of HW.

Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 13:59:15 -05:00
Tomer Tayar a2e7699eb5 qed*: Refactoring and rearranging FW API with no functional impact
This patch refactors and reorders the FW API files in preparation of
upgrading the code to support new FW.

- Make use of the BIT macro in appropriate places.
- Whitespace changes to align values and code blocks.
- Comments are updated (spelling mistakes, removed if not clear).
- Group together code blocks which are related or deal with similar
 matters.

Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 13:59:15 -05:00
Himanshu Jha 27f382da31 IB/mthca: Use zeroing memory allocator than allocator/memset
Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.

Done using Coccinelle.
Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
0-day tested with no failures.

Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02 11:20:13 -07:00
Himanshu Jha 7d37ebbb92 RDMA/bnxt_re: Use zeroing memory allocator than allocator/memset
Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.

Done using Coccinelle.
Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
0-day tested with no failures.

Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02 11:20:13 -07:00
Himanshu Jha 7bced914e8 RDMA/qedr: Use zeroing memory allocator than allocator/memset
Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.

Done using Coccinelle.
Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
0-day tested with no failures.

Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02 11:20:13 -07:00
Himanshu Jha 583556568c RDMA/vmw_pvrdma: Use zeroing memory allocator than allocator/memset
Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.

Done using Coccinelle.
Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
0-day tested with no failures.

Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02 11:20:13 -07:00
Himanshu Jha d78756d842 IB/ocrdma: Use zeroing memory allocator than allocator/memset
Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.

Done using Coccinelle.
Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
0-day tested with no failures.

Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02 11:20:13 -07:00
Leon Romanovsky 5a371cf87e IB/mlx4: Fix mlx4_ib_alloc_mr error flow
ibmr.device is being set only after ib_alloc_mr() is successfully complete.
Therefore, in case imlx4_mr_enable() returns with error, the error flow
unwinder calls to mlx4_free_priv_pages(), which uses ibmr.device.

Such usage causes to NULL dereference oops and to fix it, the IB device
should be set in the mr struct earlier stage (e.g. prior to calling
mlx4_free_priv_pages()).

Fixes: 1b2cd0fc67 ("IB/mlx4: Support the new memory registration API")
Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02 11:09:05 -07:00
Bharat Potnuri f48fca4d81 iw_cxgb4: Change error/warn prints to pr_debug
These prints not neccesarily mean error/warning, so changing them to
debug.

Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-29 11:09:23 -07:00
Yixian Liu 4044a3f482 RDMA/hns: Add detailed comments for mb() call
This patch adds more detailed comments when we call the
memory barrier function, such as rmb, wmb and mb. Three
mb() callers are deleted since they are unnecessary.

Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-29 11:07:04 -07:00
Yishai Hadas 1ee47ab3e8 IB/mlx5: Enable QP creation with a given blue flame index
This patch enables QP creation with a given BF index, this allows the
user space driver to share same BF between few QPs or alternatively have
a dedicated BF per QP.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28 11:37:46 -07:00
Yishai Hadas 4ed131d0bb IB/mlx5: Expose dynamic mmap allocation
This patch exposes the option to dynamic allocates a UAR, this
functionality will be used in downstream patch in this series as
part of QP creation.

Specifically, the user space driver asks for a UAR allocation in a given
page index, upon success this UAR and its bfregs can be used as part of
QP creation by the user space driver.

To enable allocating more than 256 UARs the page index is encoded in an
extra one byte just after the command byte.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28 11:37:46 -07:00
Yishai Hadas 31a78a5a79 IB/mlx5: Extend UAR stuff to support dynamic allocation
This patch extends the alloc context flow to be prepared for working
with dynamic UAR allocations.

Currently upon alloc context there is some fix size of UARs that are
allocated (named 'static allocation') and there is no option to user
application to ask for more or control which UAR will be used by which
QP.

In this patch the driver prepares its data structures to manage both the
static and the dynamic allocations and let the user driver knows about
the max value of dynamic blue-flame registers that are allowed.

Downstream patches from this series will enable the dynamic allocation
and the association as part of QP creation.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28 11:33:14 -07:00
Maor Gottlieb 4e2b53a5cb IB/mlx5: Report inner RSS capability
Add missing inner RSS support capability as part of
the RSS supported fields.

In addition change MLX5_RX_HASH_INNER to 1UL << 31 in
order to define it as unsigned.

Fixes: 309fa3470f ("IB/mlx5: Add support for RSS on the inner packet")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28 11:33:14 -07:00
Guy Levi 07d84f7b6a IB/mlx4: Add support to RSS hash for inner headers
Support RSS hash for inner headers according to a new flag,
MLX4_IB_RX_HASH_INNER provided by the vendor channel.

In case the flag is set, RSS hash will be done on the inner headers of
VXLAN packets (which are encapsulated).
Non-encapsulated packets will be hashed according to the outer headers.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28 11:33:14 -07:00
Jason Gunthorpe 76a895d9e1 Merge branch 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
Patches for 4.16 that are dependent on patches sent to 4.15-rc.

These are small clean ups for the vmw_pvrdma and i40iw drivers.

* 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git:
  RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header
  RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
  RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
  RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
  RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file
  i40iw: Change accelerated flag to bool
2017-12-27 21:50:46 -07:00
Bryan Tan a61eb61368 RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
refcount_t is the preferred type for refcounts. Change the
QP and CQ refcnt fields to use refcount_t.

Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27 21:37:26 -07:00
Bryan Tan 1a9ecf8d1a RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
Convert the sizeof(void *) in two kcalloc calls to be more
specific for the arrays that are being allocated.

Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27 21:37:26 -07:00
Bryan Tan 5aef7cf254 RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
Be more consistent in setting and checking is_kernel
flag for QPs and CQs.

Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27 21:37:26 -07:00
Henry Orosco 66131e005e i40iw: Change accelerated flag to bool
The accelerated flag only utilizes two values: 0 and 1.
Modify accelerated flag in struct i40iw_cm_node to bool.

Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27 21:37:26 -07:00
Nitzan Carmi 45e6ae7ef2 IB/mlx5: Fix mlx5_ib_alloc_mr error flow
ibmr.device is being set only after ib_alloc_mr() is
(successfully) complete. Therefore, in case mlx5_core_create_mkey()
return with error, the error flow calls mlx5_free_priv_descs()
which uses ibmr.device (which doesn't exist yet), causing
a NULL dereference oops.

To fix this, the IB device should be set in the mr struct earlier
stage (e.g. prior to calling mlx5_core_create_mkey()).

Fixes: 8a187ee52b ("IB/mlx5: Support the new memory registration API")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27 15:24:41 -07:00
Majd Dibbiny ad9a3668a4 IB/mlx5: Serialize access to the VMA list
User-space applications can do mmap and munmap directly at
any time.

Since the VMA list is not protected with a mutex, concurrent
accesses to the VMA list from the mmap and munmap can cause
data corruption. Add a mutex around the list.

Cc: <stable@vger.kernel.org> # v4.7
Fixes: 7c2344c3bb ("IB/mlx5: Implements disassociate_ucontext API")
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27 15:24:40 -07:00
Don Hiatt 2e903b611b IB/hfi1: Change slid arg in ingress_pkey_table_fail to 32bit
Change the slid arg to ingress_pkey_table_fail() to a full
32Bits and do not convert to 16Bits in caller. This is so we
can keep everything 32bit in the kernel and only change to
16bit at the uapi boundary.

Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:46:11 -07:00
Tatyana Nikolova fefa06811c i40iw: Fix the connection ORD value for loopback
The accepting QP ORD value should be adjusted not to
exceed the peer QP IRD value (RFC 6581). This is
skipped for loopback. After the ORD is validated
by i40iw_record_ird_ord(), adjust the ORD value of
the loopback accepting QP to prevent overrunning the
IRD space of the peer QP. Also move the ORD accounting
for 0-byte RDMA read to i40iw_record_ird_ord().

Fixes: f27b4746f3 ("i40iw: add connection management code")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:46:11 -07:00
Tatyana Nikolova ce9ce74145 i40iw: Validate correct IRD/ORD connection parameters
Casting to u16 before validating IRD/ORD connection
parameters could cause recording wrong IRD/ORD values
in the cm_node. Validate the IRD/ORD parameters as
they are passed by the application before recording
them.

Fixes: f27b4746f3 ("i40iw: add connection management code")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:39:48 -07:00
Shiraz Saleem 756f77d216 i40iw: Ignore LLP_DOUBT_REACHABILITY AE
The LLP_DOUBT_REACHABILITY Asynchronous Event (AE) is an early
warning of a connection issue. It is followed by LLP_TOO_MANY_RETRIES
AE, if the retransmit threshold is reached and recovery is not possible
for the connection.

Currently we terminate the connection on receiving the
LLP_DOUBT_REACHABILITY AE. Ignore this AE and
terminate the connection only on LLP_TOO_MANY_RETRIES AE.

This improves the user experience on cable disconnect/reconnect
scenario while running iWARP traffic. On cable disconnect,
the QP traffic is paused and the user has a larger and more
reasonable timeout within which if the cable is reconnected,
traffic can continue.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:39:21 -07:00
Shiraz Saleem df8b13a1b2 i40iw: Fix sequence number for the first partial FPDU
Partial FPDU processing is broken as the sequence number
for the first partial FPDU is wrong due to incorrect
Q2 buffer offset. The offset should be 64 rather than 16.

Fixes: 786c6adb3a ("i40iw: add puda code")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:39:20 -07:00
Shiraz Saleem 3020f252c3 i40iw: Selectively teardown QPs on IP addr change event
On IP address change event, all connected QPs are torn down
irrespective of whether IP address is involved in a connection.

Only teardown connections those source or destination address
matches the netdev interface IP address being changed, and if
they are on the same VLAN as the netdev.

Fixes: e5e74b61b1 ("i40iw: Add IP addr handling on netdev events")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:39:07 -07:00
Shiraz Saleem 0c5d515546 i40iw: Add notifier for network device events
Register a netdevice notifier for netdev UP/DOWN
notification events and report the appropriate ib event.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:38:05 -07:00
Shiraz Saleem fe99afd1fe i40iw: Correct Q1/XF object count equation
Lower Inbound RDMA Read Queue (Q1) object count by a factor of 2
as it is incorrectly doubled. Also, round up Q1 and Transmit FIFO (XF)
object count to power of 2 to satisfy hardware requirement.

Fixes: 86dbcd0f12 ("i40iw: add file to handle cqp calls")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:38:05 -07:00
Shiraz Saleem 8758768ad8 i40iw: Use utility function roundup_pow_of_two()
Consolidate all power of 2 round calculations to
use kernel utility function roundup_pow_of_two().

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:37:51 -07:00
Shiraz Saleem f32b766cf7 i40iw: Set MAX_IRD_SIZE to 64
Increase I40IW_MAX_IRD_SIZE to 64 which is the device limit.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:33:30 -07:00
Shiraz Saleem adb1c0d15e nes: Change accelerated flag to bool
The accelerated flag only utilizes two values: 0 and 1.
Modify accelerated flag in struct nes_cm_node to bool.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 13:33:30 -07:00
Michael J. Ruhl 4c009af473 IB/hfi: Only read capability registers if the capability exists
During driver init, various registers are saved to allow restoration
after an FLR or gen3 bump.  Some of these registers are not available
in some circumstances (i.e. Virtual machines).

This bug makes the driver unusable when the PCI device is passed into
a VM, it fails during probe.

Delete unnecessary register read/write, and only access register if
the capability exists.

Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: a618b7e40a ("IB/hfi1: Move saving PCI values to a separate function")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 10:42:08 -07:00
Pravin Shedge 72c7fe90ee drivers: infiniband: remove duplicate includes
These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.

Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 09:39:35 -07:00
Yixian Liu a5073d6054 RDMA/hns: Add eq support of hip08
This patch adds eq support for hip08. The eq table can
be multi-hop addressed.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Reviewed-by: Lijun Ou <oulijun@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 09:21:45 -07:00
Yixian Liu b16f818847 RDMA/hns: Refactor eq code for hip06
Considering the compatibility of supporting hip08's eq
process and possible changes of data structure, this patch
refactors the eq code structure of hip06.

We move all the eq process code for hip06 from hns_roce_eq.c
into hns_roce_hw_v1.c, and also for hns_roce_eq.h. With
these changes, it will be convenient to add the eq support
for later hardware version.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Reviewed-by: Lijun Ou <oulijun@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-22 09:21:38 -07:00