Commit Graph

226 Commits

Author SHA1 Message Date
Parav Pandit 82f82ceb8e IB/rxe: Use rdma GID API
rxe_netdev_from_av can now be done by the core code directly from the
gid_attrs, no need for a helper in the driver.

ib_find_cached_gid_by_port can be switched to use the rdma version here as
well.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18 11:09:05 -06:00
Zhu Yanjun 828d810550 IB/rxe: avoid double kfree skb
In rxe_send, when network_type is not RDMA_NETWORK_IPV4 or
RDMA_NETWORK_IPV6, skb is freed and -EINVAL is returned.
Then rxe_xmit_packet will return -EINVAL, too. In rxe_requester,
this skb is double freed.
In rxe_requester, kfree_skb is needed only after fill_packet fails.
So kfree_skb is moved from label err to test fill_packet.

Fixes: 5793b46521 ("IB/rxe: remove unnecessary skb_clone in xmit")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-11 11:02:27 -06:00
Jason Gunthorpe 0394808d9e Merge branch 'mr_fix' into git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma for-next
Update mlx4 to support user MR creation against read-only memory, previously
it required the memory to be writable.

Based on rdma for-rc due to dependencies.

* mr_fix: (2 commits)
  IB/mlx4: Mark user MR as writable if actual virtual memory is writable
  IB/core: Make testing MR flags for writability a static inline function
2018-05-28 11:44:35 -06:00
Zhu Yanjun bb42f87e29 IB/rxe: avoid unnecessary export
The function rxe_remove_all is only used in this modules.
There is no other modules that call this function. So it
is not necessary to export it.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-28 11:25:55 -06:00
Zhu Yanjun da2f3b286a IB/rxe: avoid calling WARN_ON_ONCE twice
In the exit branch, WARN_ON_ONCE is called to show stack. So it is
not necessary to call WARN_ON_ONCE before going to exit.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-17 09:16:14 -06:00
Ben Hutchings e02637e97d IB: Fix RDMA_RXE and INFINIBAND_RDMAVT dependencies for DMA_VIRT_OPS
DMA_VIRT_OPS requires that dma_addr_t is at least as wide as a
pointer, which is expressed as a dependency on !64BIT ||
ARCH_DMA_ADDR_T_64BIT.

For parisc64 this is not true, and if these IB modules are enabled,
kconfig warns:

WARNING: unmet direct dependencies detected for DMA_VIRT_OPS
  Depends on [n]: HAS_DMA [=y] && (!64BIT [=y] || ARCH_DMA_ADDR_T_64BIT)
  Selected by [m]:
  - INFINIBAND_RDMAVT [=m] && INFINIBAND [=m] && 64BIT [=y] && PCI [=y]
  - RDMA_RXE [=m] && INET [=y] && PCI [=y] && INFINIBAND [=m]

Add dependencies to fix this.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-15 10:48:16 -04:00
Doug Ledford f5e27a203f Merge branch 'k.o/for-rc' into k.o/wip/dl-for-next
Several items of conflict have arisen between the RDMA stack's for-rc
branch and upcoming for-next work:

9fd4350ba8 ("IB/rxe: avoid double kfree_skb") directly conflicts with
2e47350789 ("IB/rxe: optimize the function duplicate_request")

Patches already submitted by Intel for the hfi1 driver will fail to
apply cleanly without this merge

Other people on the mailing list have notified that their upcoming
patches also fail to apply cleanly without this merge

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 15:48:48 -04:00
Alexandru Moise 1661d3b0e2 nvmet,rxe: defer ip datagram sending to tasklet
This addresses 3 separate problems:

1. When using NVME over Fabrics we may end up sending IP
packets in interrupt context, we should defer this work
to a tasklet.

[   50.939957] WARNING: CPU: 3 PID: 0 at kernel/softirq.c:161 __local_bh_enable_ip+0x1f/0xa0
[   50.942602] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G        W         4.17.0-rc3-ARCH+ #104
[   50.945466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[   50.948163] RIP: 0010:__local_bh_enable_ip+0x1f/0xa0
[   50.949631] RSP: 0018:ffff88009c183900 EFLAGS: 00010006
[   50.951029] RAX: 0000000080010403 RBX: 0000000000000200 RCX: 0000000000000001
[   50.952636] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffff817e04ec
[   50.954278] RBP: ffff88009c183910 R08: 0000000000000001 R09: 0000000000000614
[   50.956000] R10: ffffea00021d5500 R11: 0000000000000001 R12: ffffffff817e04ec
[   50.957779] R13: 0000000000000000 R14: ffff88009566f400 R15: ffff8800956c7000
[   50.959402] FS:  0000000000000000(0000) GS:ffff88009c180000(0000) knlGS:0000000000000000
[   50.961552] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   50.963798] CR2: 000055c4ec0ccac0 CR3: 0000000002209001 CR4: 00000000000606e0
[   50.966121] Call Trace:
[   50.966845]  <IRQ>
[   50.967497]  __dev_queue_xmit+0x62d/0x690
[   50.968722]  dev_queue_xmit+0x10/0x20
[   50.969894]  neigh_resolve_output+0x173/0x190
[   50.971244]  ip_finish_output2+0x2b8/0x370
[   50.972527]  ip_finish_output+0x1d2/0x220
[   50.973785]  ? ip_finish_output+0x1d2/0x220
[   50.975010]  ip_output+0xd4/0x100
[   50.975903]  ip_local_out+0x3b/0x50
[   50.976823]  rxe_send+0x74/0x120
[   50.977702]  rxe_requester+0xe3b/0x10b0
[   50.978881]  ? ip_local_deliver_finish+0xd1/0xe0
[   50.980260]  rxe_do_task+0x85/0x100
[   50.981386]  rxe_run_task+0x2f/0x40
[   50.982470]  rxe_post_send+0x51a/0x550
[   50.983591]  nvmet_rdma_queue_response+0x10a/0x170
[   50.985024]  __nvmet_req_complete+0x95/0xa0
[   50.986287]  nvmet_req_complete+0x15/0x60
[   50.987469]  nvmet_bio_done+0x2d/0x40
[   50.988564]  bio_endio+0x12c/0x140
[   50.989654]  blk_update_request+0x185/0x2a0
[   50.990947]  blk_mq_end_request+0x1e/0x80
[   50.991997]  nvme_complete_rq+0x1cc/0x1e0
[   50.993171]  nvme_pci_complete_rq+0x117/0x120
[   50.994355]  __blk_mq_complete_request+0x15e/0x180
[   50.995988]  blk_mq_complete_request+0x6f/0xa0
[   50.997304]  nvme_process_cq+0xe0/0x1b0
[   50.998494]  nvme_irq+0x28/0x50
[   50.999572]  __handle_irq_event_percpu+0xa2/0x1c0
[   51.000986]  handle_irq_event_percpu+0x32/0x80
[   51.002356]  handle_irq_event+0x3c/0x60
[   51.003463]  handle_edge_irq+0x1c9/0x200
[   51.004473]  handle_irq+0x23/0x30
[   51.005363]  do_IRQ+0x46/0xd0
[   51.006182]  common_interrupt+0xf/0xf
[   51.007129]  </IRQ>

2. Work must always be offloaded to tasklet for rxe_post_send_kernel()
when using NVMEoF in order to solve lock ordering between neigh->ha_lock
seqlock and the nvme queue lock:

[   77.833783]  Possible interrupt unsafe locking scenario:
[   77.833783]
[   77.835831]        CPU0                    CPU1
[   77.837129]        ----                    ----
[   77.838313]   lock(&(&n->ha_lock)->seqcount);
[   77.839550]                                local_irq_disable();
[   77.841377]                                lock(&(&nvmeq->q_lock)->rlock);
[   77.843222]                                lock(&(&n->ha_lock)->seqcount);
[   77.845178]   <Interrupt>
[   77.846298]     lock(&(&nvmeq->q_lock)->rlock);
[   77.847986]
[   77.847986]  *** DEADLOCK ***

3. Same goes for the lock ordering between sch->q.lock and nvme queue lock:

[   47.634271]  Possible interrupt unsafe locking scenario:
[   47.634271]
[   47.636452]        CPU0                    CPU1
[   47.637861]        ----                    ----
[   47.639285]   lock(&(&sch->q.lock)->rlock);
[   47.640654]                                local_irq_disable();
[   47.642451]                                lock(&(&nvmeq->q_lock)->rlock);
[   47.644521]                                lock(&(&sch->q.lock)->rlock);
[   47.646480]   <Interrupt>
[   47.647263]     lock(&(&nvmeq->q_lock)->rlock);
[   47.648492]
[   47.648492]  *** DEADLOCK ***

Using NVMEoF after this patch seems to finally be stable, without it,
rxe eventually deadlocks the whole system and causes RCU stalls.

Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:51 -04:00
YueHaibing ecb238f6a7 IB/cxgb4: use skb_put_zero()/__skb_put_zero
Use the recently introduced helper to replace the pattern of
skb_put_zero/__skb_put() && memset().

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-01 11:27:37 -04:00
Zhu Yanjun 9fd4350ba8 IB/rxe: avoid double kfree_skb
When skb is sent, it will pass the following functions in soft roce.

rxe_send [rdma_rxe]
    ip_local_out
        __ip_local_out
        ip_output
            ip_finish_output
                ip_finish_output2
                    dev_queue_xmit
                        __dev_queue_xmit
                            dev_hard_start_xmit

In the above functions, if error occurs in the above functions or
iptables rules drop skb after ip_local_out, kfree_skb will be called.
So it is not necessary to call kfree_skb in soft roce module again.
Or else crash will occur.

The steps to reproduce:

     server                       client
    ---------                    ---------
    |1.1.1.1|<----rxe-channel--->|1.1.1.2|
    ---------                    ---------

On server: rping -s -a 1.1.1.1 -v -C 10000 -S 512
On client: rping -c -a 1.1.1.1 -v -C 10000 -S 512

The kernel configs CONFIG_DEBUG_KMEMLEAK and
CONFIG_DEBUG_OBJECTS are enabled on both server and client.

When rping runs, run the following command in server:

iptables -I OUTPUT -p udp  --dport 4791 -j DROP

Without this patch, crash will occur.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
Jianchao Wang 2da36d44a9 IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV
w/o RXE_START_MASK, the last_psn of IB_OPCODE_RC_SEND_ONLY_INV
will not be updated in update_wqe_psn, and the corresponding
wqe will not be acked in rxe_completer due to its last_psn is
zero. Finally, the other wqe will also not be able to be acked,
because the wqe of IB_OPCODE_RC_SEND_ONLY_INV with last_psn 0
is still there. This causes large amount of io timeout when
nvmeof is over rxe.

Add RXE_START_MASK for IB_OPCODE_RC_SEND_ONLY_INV to fix this.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
Zhu Yanjun e12ee8ce51 IB/rxe: remove unused function variable
In the functions rxe_mem_init_dma, rxe_mem_init_user, rxe_mem_init_fast
and copy_data, the function variable rxe is not used. So this function
variable rxe is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 12:18:29 -04:00
Zhu Yanjun 0dff463a6a IB/rxe: change rxe_set_mtu function type to void
The function rxe_set_mtu always returns zero. So this function type
is changed to void.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 12:18:29 -04:00
Yuval Shaia 10c47d5606 IB/rxe: Change rxe_rcv to return void
It always returns 0. Change return type to void.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 12:17:56 -04:00
Zhu Yanjun fe896ceb57 IB/rxe: replace refcount_inc with skb_get
Follow the advice from Bart, the function refcount_inc is replaced
with skb_get in commit 99dae69025 ("IB/rxe: optimize mcast recv process")
and commit 86af617641 ("IB/rxe: remove unnecessary skb_clone").

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Suggested-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-19 13:58:16 -04:00
Zhu Yanjun 2e47350789 IB/rxe: optimize the function duplicate_request
In the function duplicate_request, the reference of skb can be increased
to replace the function skb_clone.

This will make rxe performace better and save memory.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-19 13:58:16 -04:00
Zhu Yanjun 8f1a72c815 IB/rxe: make rxe_release_udp_tunnel static
The function rxe_release_udp_tunnel is only used in rxe_net.c.
So it is necessary to make this function as static.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-19 13:58:04 -04:00
Zhu Yanjun 76be04500b IB/rxe: avoid export symbols
The functions rxe_set_mtu, rxe_add and rxe_remove are only used in their
own module. So it is not necessary to export them.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-17 19:54:38 -06:00
Zhu Yanjun 0f02ba7ed1 IB/rxe: make the variable static
The variable rxe_net_notifier is only used in the file rxe_net.c. So
remove it from rxe_net.h file and make it static in the file rxe_net.c.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-17 19:54:38 -06:00
Mikhail Malygin efc365e729 IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
On ppc64le arch rxe_add command causes oops in kernel log:

[   92.495140] Oops: Kernel access of bad area, sig: 11 [#1]
[   92.499710] SMP NR_CPUS=2048 NUMA pSeries
[   92.499792] Modules linked in: ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) nf_conntrack_netlink(E) nfnetlink(E) xfrm_user(E) iptable
_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) xt_conntrack(E) x_tables(E)
 nf_nat(E) nf_conntrack(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) af_packet(E) rpcrdma(E) ib_isert(E) iscsi_target_mod(E) i
b_iser(E) libiscsi(E) ib_srpt(E) target_core_mod(E) ib_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) bochs_drm(E) tt
m(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) drm(E) agpgart(E) virtio_rng(E) virtio_console(E) rtc_
generic(E) dm_ec(OEN) ttln_rdma(OEN) rdma_cm(E) configfs(E) iw_cm(E) ib_cm(E) rdma_rxe(E) ip6_udp_tunnel(E) udp_tunnel(E) ib_core(E) ql
a2xxx(E)
[   92.499832]  scsi_transport_fc(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E) ipmi_watchdog(E) ipmi_ssif(E) ipmi_poweroff(E) ipmi_powernv(EX) ipmi_devintf(E) ipmi_msghandler(E) dummy(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_service_time(E) scsi_transport_iscsi(E) sd_mod(E) sr_mod(E) cdrom(E) hid_generic(E) usbhid(E) virtio_blk(E) virtio_scsi(E) virtio_net(E) ibmvscsi(EX) scsi_transport_srp(E) xhci_pci(E) xhci_hcd(E) usbcore(E) usb_common(E) virtio_pci(E) virtio_ring(E) virtio(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
[   92.499834] Supported: No, Unsupported modules are loaded
[   92.499839] CPU: 3 PID: 5576 Comm: sh Tainted: G           OE   NX 4.4.120-ttln.17-default #1
[   92.499841] task: c0000000afe8a490 ti: c0000000beba8000 task.ti: c0000000beba8000
[   92.499842] NIP: c00000000008ba3c LR: c000000000027644 CTR: c00000000008ba10
[   92.499844] REGS: c0000000bebab750 TRAP: 0300   Tainted: G           OE   NX  (4.4.120-ttln.17-default)
[   92.499850] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28424428  XER: 20000000
[   92.499871] CFAR: 0000000000002424 DAR: 0000000000000208 DSISR: 40000000 SOFTE: 1
               GPR00: c000000000027644 c0000000bebab9d0 c000000000f09700 0000000000000000
               GPR04: d0000000043d7192 0000000000000002 000000000000001a fffffffffffffffe
               GPR08: 000000000000009c c00000000008ba10 d0000000043e5848 d0000000043d3828
               GPR12: c00000000008ba10 c000000007a02400 0000000010062e38 0000010020388860
               GPR16: 0000000000000000 0000000000000000 00000100203885f0 00000000100f6c98
               GPR20: c0000000b3f1fcc0 c0000000b3f1fc48 c0000000b3f1fbd0 c0000000b3f1fb58
               GPR24: c0000000b3f1fae0 c0000000b3f1fa68 00000000000005dc c0000000b3f1f9f0
               GPR28: d0000000043e5848 c0000000b3f1f900 c0000000b3f1f320 c0000000b3f1f000
[   92.499881] NIP [c00000000008ba3c] dma_get_required_mask_pSeriesLP+0x2c/0x1a0
[   92.499885] LR [c000000000027644] dma_get_required_mask+0x44/0xac
[   92.499886] Call Trace:
[   92.499891] [c0000000bebab9d0] [c0000000bebaba30] 0xc0000000bebaba30 (unreliable)
[   92.499894] [c0000000bebaba10] [c000000000027644] dma_get_required_mask+0x44/0xac
[   92.499904] [c0000000bebaba30] [d0000000043cb4b4] rxe_register_device+0xc4/0x430 [rdma_rxe]
[   92.499910] [c0000000bebabab0] [d0000000043c06c8] rxe_add+0x448/0x4e0 [rdma_rxe]
[   92.499915] [c0000000bebabb30] [d0000000043d28dc] rxe_net_add+0x4c/0xf0 [rdma_rxe]
[   92.499921] [c0000000bebabb60] [d0000000043d305c] rxe_param_set_add+0x6c/0x1ac [rdma_rxe]
[   92.499924] [c0000000bebabbf0] [c0000000000e78c0] param_attr_store+0xa0/0x180
[   92.499927] [c0000000bebabc70] [c0000000000e6448] module_attr_store+0x48/0x70
[   92.499932] [c0000000bebabc90] [c000000000391f60] sysfs_kf_write+0x70/0xb0
[   92.499935] [c0000000bebabcb0] [c000000000390f1c] kernfs_fop_write+0x18c/0x1e0
[   92.499939] [c0000000bebabd00] [c0000000002e22ac] __vfs_write+0x4c/0x1d0
[   92.499942] [c0000000bebabd90] [c0000000002e2f94] vfs_write+0xc4/0x200
[   92.499945] [c0000000bebabde0] [c0000000002e488c] SyS_write+0x6c/0x110
[   92.499948] [c0000000bebabe30] [c000000000009384] system_call+0x38/0xe4
[   92.499949] Instruction dump:
[   92.499954] 4e800020 3c4c00e8 3842dcf0 7c0802a6 f8010010 60000000 7c0802a6 fba1ffe8
[   92.499958] fbc1fff0 fbe1fff8 f8010010 f821ffc1 <e9230208> 7c7e1b78 2fa90000 419e0078
[   92.499962] ---[ end trace bed077e15eb420cf ]---

It fails in dma_get_required_mask, that has ppc-specific implementation,
and fail if provided device argument is NULL

Signed-off-by: Mikhail Malygin <mikhail@malygin.me>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:50 -06:00
Parav Pandit 39e00b6cf6 IB/rxe: Removed GID add/del dummy routines
rxe driver's add_gid() and del_gid() callbacks are doing simple
checks which are already done by the ib core before invoking these
callback routines.
Therefore, code is simplified to skip implementing add_gid() and
del_gid() callback functions.
They are only invoked by ib_core if they are implemented.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 10:15:33 -06:00
Parav Pandit 414448d249 RDMA: Use ib_gid_attr during GID modification
Now that ib_gid_attr contains device, port and index, simplify the
provider APIs add_gid() and del_gid() to use device, port and index
fields from the ib_gid_attr attributes structure.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:34:16 -06:00
Parav Pandit 3e44e0ee08 IB/providers: Avoid null netdev check for RoCE
Now that IB core GID cache ensures that all RoCE entries have an
associated netdev remove null checks from the provider drivers for
clarity.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:51 -06:00
Parav Pandit 0e1f9b9244 RDMA/providers: Simplify query_gid callback of RoCE providers
ib_query_gid() fetches the GID from the software cache maintained in
ib_core for RoCE ports.

Therefore, simplify the provider drivers for RoCE to treat query_gid()
callback as never called for RoCE, and only require non-RoCE devices to
implement it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:47 -06:00
Zhu Yanjun 99dae69025 IB/rxe: optimize mcast recv process
In mcast recv process, the function skb_clone is used. In fact,
the refcount can be increased to replace cloning a new skb since
the original skb will not be modified before it is freed.

This can make the performance better and save the memory.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:25:22 -06:00
Jason Gunthorpe f2e9bfac13 RDMA/rxe: Fix uABI structure layouts for 32/64 compat
With 32 bit compilation several of the fields become misaligned here.
Fixing this is an ABI break for 32 bit rxe and it is in well used
portions of the rxe ABI.

To handle this we bump the ABI version, as expected. However the user
space driver doesn't handle it properly today, so all existing user
space continues to work.

Updated userspace will start to require the necessary kernel version.

We don't expect there to be any 32 bit users of rxe. Most likely cases,
such as ARM 32 already generally don't work because rxe does not handle
the CPU cache properly on its shared with userspace pages.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:25:09 -06:00
Matan Barak 0ede73bc01 IB/uverbs: Extend uverbs_ioctl header with driver_id
Extending uverbs_ioctl header with driver_id and another reserved
field. driver_id should be used in order to identify the driver.
Since every driver could have its own parsing tree, this is necessary
for strace support.
Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB
support and thus we add some reserved fields for future usage.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Jason Gunthorpe 0c43ab371b RDMA/rxe: Use structs to describe the uABI instead of opencoding
Open coding pointer math is not acceptable for describing the uABI in
RDMA. Provide structs for all the cases.

The udata is casted to the struct as close to the verbs entry point
as possible for maximum clarity. Function signatures and so forth
are revised to allow for this.

Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:58:02 -06:00
Jason Gunthorpe b92ec0fe32 RDMA/rxe: Get rid of confusing udata parameter to rxe_cq_chk_attr
It isn't used and it couldn't possibly ever be used correctly.

Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15 15:58:02 -06:00
Martin Wilck 43c9fc509f rdma_rxe: make rxe work over 802.1q VLAN devices
This patch fixes RDMA/rxe over 802.1q VLAN devices.

Without it, I observed the following behavior:

a) adding a VLAN device to RXE via rxe_net_add() creates a non-functional
   RDMA device. This is caused by the logic in enum_all_gids_of_dev_cb() /
   is_eth_port_of_netdev(), which only considers networks connected to
   "upper devices" of the configured network device, resulting in an empty
   set of gids for a VLAN interface that is an "upper device" itself.
   Later attempts to connect via this rdma device fail in cma_acuire_dev()
   because no gids can be resolved.

b) adding the master device of the VLAN device instead seems to work
   initially, target addresses via VLAN devices are resolved successfully.
   But the connection times out because no 802.1q VLAN headers are
   inserted in the ethernet packets, which are therefore never received.
   This happens because the RXE layer sends the packets via the master
   device rather than the VLAN device.

The problem could be solved by changing either a) or b). My thinking was
that the logic in a) was created deliberately, thus I decided to work on
b). It turns out that the information about the VLAN interface for the gid
at hand is available in the AV information. My patch converts the RXE code
to use this netdev instead of rxe->ndev. With this change, RXE over vlan
works on my test system.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14 16:33:25 -04:00
Zhu Yanjun befd8d98f2 IB/rxe: change the function rxe_init_device_param type
The function rxe_init_device_param always return 0. So the function
type is changed to void.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:56:15 -07:00
Zhu Yanjun 31f1bd14cb IB/rxe: remove unnecessary rxe in rxe_send
In the function rxe_send, the variable rxe is not used in it.
So it should be removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:56:14 -07:00
Zhu Yanjun 86af617641 IB/rxe: remove unnecessary skb_clone
In send_atomic_ack function, it is not necessary to make a
skb_clone. To gain better performance (high throughput and
low latency), this skb_clone is removed.

The following tests are made.

 server                       client
---------                    ---------
|1.1.1.1|<----rxe-channel--->|1.1.1.2|
---------                    ---------

On server: rping -s -a 1.1.1.1 -v -C 1000 -S 512
On client: rping -c -a 1.1.1.1 -v -C 1000 -S 512

The kernel config CONFIG_DEBUG_KMEMLEAK is enabled on both server
and client.

This test runs for several hours. There is no memory leak and the whole
system can work well.

Based on the above network, the following tests are made.

Server: ibv_rc_pingpong -d rxe0 -g 1
Client: ibv_rc_pingpong -d rxe0 -g 1 1.1.1.1

The test results on Server(10 tests are made).
Before:
Throughput is 137.07 Mbit/sec
Latency is 517.76 usec/iter

After:
Throughput is 148.85 Mbit/sec
Latency is 476.64 usec/iter

The throughput is enhanced and the latency is reduced.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07 15:56:14 -07:00
Bart Van Assche a6544a624c RDMA/rxe: Fix an out-of-bounds read
This patch avoids that KASAN reports the following when the SRP initiator
calls srp_post_send():

==================================================================
BUG: KASAN: stack-out-of-bounds in rxe_post_send+0x5c4/0x980 [rdma_rxe]
Read of size 8 at addr ffff880066606e30 by task 02-mq/1074

CPU: 2 PID: 1074 Comm: 02-mq Not tainted 4.16.0-rc3-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x85/0xc7
print_address_description+0x65/0x270
kasan_report+0x231/0x350
rxe_post_send+0x5c4/0x980 [rdma_rxe]
srp_post_send.isra.16+0x149/0x190 [ib_srp]
srp_queuecommand+0x94d/0x1670 [ib_srp]
scsi_dispatch_cmd+0x1c2/0x550 [scsi_mod]
scsi_queue_rq+0x843/0xa70 [scsi_mod]
blk_mq_dispatch_rq_list+0x143/0xac0
blk_mq_do_dispatch_ctx+0x1c5/0x260
blk_mq_sched_dispatch_requests+0x2bf/0x2f0
__blk_mq_run_hw_queue+0xdb/0x160
__blk_mq_delay_run_hw_queue+0xba/0x100
blk_mq_run_hw_queue+0xf2/0x190
blk_mq_sched_insert_request+0x163/0x2f0
blk_execute_rq+0xb0/0x130
scsi_execute+0x14e/0x260 [scsi_mod]
scsi_probe_and_add_lun+0x366/0x13d0 [scsi_mod]
__scsi_scan_target+0x18a/0x810 [scsi_mod]
scsi_scan_target+0x11e/0x130 [scsi_mod]
srp_create_target+0x1522/0x19e0 [ib_srp]
kernfs_fop_write+0x180/0x210
__vfs_write+0xb1/0x2e0
vfs_write+0xf6/0x250
SyS_write+0x99/0x110
do_syscall_64+0xee/0x2b0
entry_SYSCALL_64_after_hwframe+0x42/0xb7

The buggy address belongs to the page:
page:ffffea0001998180 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x4000000000000000()
raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff880066606d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1
ffff880066606d80: f1 00 f2 f2 f2 f2 f2 f2 f2 00 00 f2 f2 f2 f2 f2
>ffff880066606e00: f2 00 00 00 00 00 f2 f2 f2 f3 f3 f3 f3 00 00 00
                                    ^
ffff880066606e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880066606f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06 16:00:51 -07:00
Hernán Gonzalez c33bab622d IB/rxe: Remove unused variable (char *rxe_qp_state_name[])
Note: This is compile only tested as I have no access to the hw.  This
variable was not used anywhere in the code. Removing it saves 24 bytes.

add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-24 (-24)
Function                                     old     new   delta
rxe_qp_state_name                             24       -     -24
Total: Before=3348732, After=3348708, chg -0.00%

Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 13:57:40 -07:00
Jason Gunthorpe 7061f28d8a rxe: Do not use 'struct sockaddr' in a uapi header
Linux has two 'linux/socket.h' files - and only the one in the kernel
defines struct sockaddr - the user space one does not.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-14 16:31:35 -07:00
Zhu Yanjun 316663c6bf IB/rxe: remove redudant parameter in rxe_av_fill_ip_info
In the function rxe_av_fill_ip_info, the parameter rxe is not used.
So it is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31 16:32:10 -05:00
Zhu Yanjun 45a290f8f6 IB/rxe: change the function rxe_av_fill_ip_info to void
The function rxe_av_fill_ip_info always returns 0. So the function
type is changed to void.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31 16:32:10 -05:00
Zhu Yanjun a402dc445d IB/rxe: change the function to void from int
Since the function rxe_av_to_attr always return 0, the function
type is changed to void.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31 16:32:10 -05:00
Zhu Yanjun 9c96f3d4dd IB/rxe: remove unnecessary parameter in rxe_av_to_attr
In the function rxe_av_to_attr, the parameter rxe is
not used. So it is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31 16:32:10 -05:00
Zhu Yanjun ca3d9feee6 IB/rxe: change the function to void from int
The function rxe_av_from_attr always return 0. So change the
function to void.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31 16:32:10 -05:00
Zhu Yanjun 1a241db19c IB/rxe: remove redudant parameter in function
In the function rxe_av_from_attr, the parameter rxe
is not used. So it is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31 16:32:10 -05:00
Jason Gunthorpe 0812ed1321 IB/rxe: Change RDMA_RXE kconfig to use select
NET_UDP_TUNNEL is not user selectable, so it should be used as a select
in kconfig.

CRYPTO_CRC32 is a required library for RDMA_RXE so it should active
automatically, as most other CRYPTO_ users do.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29 12:58:00 -07:00
Bart Van Assche bb3ffb7ad4 RDMA/rxe: Fix rxe_qp_cleanup()
rxe_qp_cleanup() can sleep so it must be run in thread context and
not in atomic context. This patch avoids that the following bug is
triggered:

Kernel BUG at 00000000560033f3 [verbose debug info unavailable]
BUG: sleeping function called from invalid context at net/core/sock.c:2761
in_atomic(): 1, irqs_disabled(): 0, pid: 7, name: ksoftirqd/0
INFO: lockdep is turned off.
Preemption disabled at:
[<00000000b6e69628>] __do_softirq+0x4e/0x540
CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 4.15.0-rc7-dbg+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0x85/0xbf
 ___might_sleep+0x177/0x260
 lock_sock_nested+0x1d/0x90
 inet_shutdown+0x2e/0xd0
 rxe_qp_cleanup+0x107/0x140 [rdma_rxe]
 rxe_elem_release+0x18/0x80 [rdma_rxe]
 rxe_requester+0x1cf/0x11b0 [rdma_rxe]
 rxe_do_task+0x78/0xf0 [rdma_rxe]
 tasklet_action+0x99/0x270
 __do_softirq+0xc0/0x540
 run_ksoftirqd+0x1c/0x70
 smpboot_thread_fn+0x1be/0x270
 kthread+0x117/0x130
 ret_from_fork+0x24/0x30

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18 14:49:20 -05:00
Bart Van Assche 65567e4121 RDMA/rxe: Fix a race condition in rxe_requester()
The rxe driver works as follows:
* The send queue, receive queue and completion queues are implemented as
  circular buffers.
* ib_post_send() and ib_post_recv() calls are serialized through a spinlock.
* Removing elements from various queues happens from tasklet
  context. Tasklets are guaranteed to run on at most one CPU. This serializes
  access to these queues. See also rxe_completer(), rxe_requester() and
  rxe_responder().
* rxe_completer() processes the skbs queued onto qp->resp_pkts.
* rxe_requester() handles the send queue (qp->sq.queue).
* rxe_responder() processes the skbs queued onto qp->req_pkts.

Since rxe_drain_req_pkts() processes qp->req_pkts, calling
rxe_drain_req_pkts() from rxe_requester() is racy. Hence this patch.

Reported-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18 14:49:19 -05:00
Jason Gunthorpe c966ea12c0 RDMA: Mark imm_data as be32 in the verbs uapi header
This matches what the userspace copy of this header has been doing
for a while. imm_data is an opaque 4 byte array carried over the network,
and invalidate_rkey is in CPU byte order.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Bart Van Assche 6f301e06de RDMA/rxe: Fix a race condition related to the QP error state
The following sequence:
* Change queue pair state into IB_QPS_ERR.
* Post a work request on the queue pair.

Triggers the following race condition in the rdma_rxe driver:
* rxe_qp_error() triggers an asynchronous call of rxe_completer(), the function
  that examines the QP send queue.
* rxe_post_send() posts a work request on the QP send queue.

If rxe_completer() runs prior to rxe_post_send(), it will drain the send
queue and the driver will assume no further action is necessary.
However, once we post the send to the send queue, because the queue is
in error, no send completion will ever happen and the send will get
stuck.  In order to process the send, we need to make sure that
rxe_completer() gets run after a send is posted to a queue pair in an
error state.  This patch ensures that happens.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: <stable@vger.kernel.org> # v4.8
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-10 16:57:27 -05:00
Zhu Yanjun 5793b46521 IB/rxe: remove unnecessary skb_clone in xmit
In xmit, there is a skb_clone. This function copies the struct sk_buff.
And some parameters are changed to the new skb. Then the new skb is sent
while the old skb is freed.

While the function skb_clone is removed, the parameter changes are made on
the old skb, then the old skb is sent. It can also work well.

The following tests are made.

 server                       client
---------                    ---------
|1.1.1.1|<----rxe-channel--->|1.1.1.2|
---------                    ---------

On server: rping -s -a 1.1.1.1 -v -C 1000 -S 512
On client: rping -c -a 1.1.1.1 -v -C 1000 -S 512

The kernel config CONFIG_DEBUG_KMEMLEAK is enabled on both server
and client.

This test runs for several hours. There is no memory leak and the whole
system can work well.

As the above network, the following tests are made.

Server: ibv_rc_pingpong -d rxe0 -g 1
Client: ibv_rc_pingpong -d rxe0 -g 1 1.1.1.1

The result on Server.
Before:
8192000 bytes in 0.88 seconds = 74.36 Mbit/sec
1000 iters in 0.88 seconds = 881.30 usec/iter

After:
8192000 bytes in 0.81 seconds = 81.15 Mbit/sec
1000 iters in 0.81 seconds = 807.62 usec/iter

The throughput is enhanced and the latency is reduced.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-08 17:43:47 -05:00
Zhu Yanjun 91eab79653 IB/rxe: add the static type to the variable
The variable recv_sockets is only used in the file rxe_net.c. So
it is better to add static type to it.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-08 17:43:06 -05:00
Leon Romanovsky 9ef77bd760 RDMA/rxe: Remove useless EXPORT_SYMBOL
The RXE driver is standalone module and hence doesn't need to export
symbols, nor does this one line function deserve to be not inlined.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02 13:36:57 -07:00
Parav Pandit fe78acf95f IB/rxe: Avoid passing unused index pointer which is optional
While searching for GID, returned index is not used, so avoid passing
pointer during invocation.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:08 -07:00
Linus Torvalds ad0835a930 Updates for 4.15 kernel merge window
- Add iWARP support to qedr driver
 - Lots of misc fixes across subsystem
 - Multiple update series to hns roce driver
 - Multiple update series to hfi1 driver
 - Updates to vnic driver
 - Add kref to wait struct in cxgb4 driver
 - Updates to i40iw driver
 - Mellanox shared pull request
 - timer_setup changes
 - massive cleanup series from Bart Van Assche
 - Two series of SRP/SRPT changes from Bart Van Assche
 - Core updates from Mellanox
 - i40iw updates
 - IPoIB updates
 - mlx5 updates
 - mlx4 updates
 - hns updates
 - bnxt_re fixes
 - PCI write padding support
 - Sparse/Smatch/warning cleanups/fixes
 - CQ moderation support
 - SRQ support in vmw_pvrdma
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaDF9JAAoJELgmozMOVy/dDXUP/i92g+G4OJ+4hHMh4KCjQMHT
 eMr/w9l1C033HrtsU1afPhqHOsKSxwCuJSiTgN4uXIm67/2kPK5Vlx+ir7mbOLwB
 3ukVK6Q/aFdigWCUhIaJSlDpjbd2sEj7JwKtM3rucvMWJlBJ4mAbcVQVfU96CCsv
 V9mO7dpR3QtYWDId9DukfnAfPUPFa3SMZnD7tdl6mKNRg/MjWGYLAL4nJoBfex5f
 b4o+MTrbuFWXYsfDru1m9BpHgyul20ldfcnbe8C/sVOQmOgkX7ngD5Sdi1FLeRJP
 GF/DnAqInC9N7cAxZHx4kH9x6mLMmEdfnwQ9VTVqGUHBsj3H4hQTVIAFfHUhWUbG
 TP5ZHgZG2CewZ0rf092cWlDZwp6n0BalnbQJr+QN4MzPmYbofs3AccSKUwrle+e+
 E6yYf4XxJdt7wRr4F1QKygtUEXSnNkNYUDQ4ZFbpJS/D4Sq80R1ZV/WZ7PJxm1D/
 EIKoi7NU9cbPMIlbCzn8kzgfjS7Pe4p0WW/Xxc/IYmACzpwNPkZuFGSND79ksIpF
 jhHqwZsOWFuXISjvcR4loc8wW6a5w5vjOiX0lLVz0NSdXSzVqav/2at7ZLDx/PT+
 Lh9YVL51akA3hiD+3X6iOhfOUu6kskjT9HijE5T8rJnf0V+C6AtIRpwrQ7ONmjJm
 3JMrjjLxtCIvpUyzCvDW
 =A1oL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is a fairly plain pull request. Lots of driver updates across the
  stack, a huge number of static analysis cleanups including a close to
  50 patch series from Bart Van Assche, and a number of new features
  inside the stack such as general CQ moderation support.

  Nothing really stands out, but there might be a few conflicts as you
  take things in. In particular, the cleanups touched some of the same
  lines as the new timer_setup changes.

  Everything in this pull request has been through 0day and at least two
  days of linux-next (since Stephen doesn't necessarily flag new
  errors/warnings until day2). A few more items (about 30 patches) from
  Intel and Mellanox showed up on the list on Tuesday. I've excluded
  those from this pull request, and I'm sure some of them qualify as
  fixes suitable to send any time, but I still have to review them
  fully. If they contain mostly fixes and little or no new development,
  then I will probably send them through by the end of the week just to
  get them out of the way.

  There was a break in my acceptance of patches which coincides with the
  computer problems I had, and then when I got things mostly back under
  control I had a backlog of patches to process, which I did mostly last
  Friday and Monday. So there is a larger number of patches processed in
  that timeframe than I was striving for.

  Summary:
   - Add iWARP support to qedr driver
   - Lots of misc fixes across subsystem
   - Multiple update series to hns roce driver
   - Multiple update series to hfi1 driver
   - Updates to vnic driver
   - Add kref to wait struct in cxgb4 driver
   - Updates to i40iw driver
   - Mellanox shared pull request
   - timer_setup changes
   - massive cleanup series from Bart Van Assche
   - Two series of SRP/SRPT changes from Bart Van Assche
   - Core updates from Mellanox
   - i40iw updates
   - IPoIB updates
   - mlx5 updates
   - mlx4 updates
   - hns updates
   - bnxt_re fixes
   - PCI write padding support
   - Sparse/Smatch/warning cleanups/fixes
   - CQ moderation support
   - SRQ support in vmw_pvrdma"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (296 commits)
  RDMA/core: Rename kernel modify_cq to better describe its usage
  IB/mlx5: Add CQ moderation capability to query_device
  IB/mlx4: Add CQ moderation capability to query_device
  IB/uverbs: Add CQ moderation capability to query_device
  IB/mlx5: Exposing modify CQ callback to uverbs layer
  IB/mlx4: Exposing modify CQ callback to uverbs layer
  IB/uverbs: Allow CQ moderation with modify CQ
  iw_cxgb4: atomically flush the qp
  iw_cxgb4: only call the cq comp_handler when the cq is armed
  iw_cxgb4: Fix possible circular dependency locking warning
  RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion
  IB/core: Only maintain real QPs in the security lists
  IB/ocrdma_hw: remove unnecessary code in ocrdma_mbx_dealloc_lkey
  RDMA/core: Make function rdma_copy_addr return void
  RDMA/vmw_pvrdma: Add shared receive queue support
  RDMA/core: avoid uninitialized variable warning in create_udata
  RDMA/bnxt_re: synchronize poll_cq and req_notify_cq verbs
  RDMA/bnxt_re: Flush CQ notification Work Queue before destroying QP
  RDMA/bnxt_re: Set QP state in case of response completion errors
  RDMA/bnxt_re: Add memory barriers when processing CQ/EQ entries
  ...
2017-11-15 14:54:53 -08:00
Thomas Bogendoerfer 3192c53e5a IB/rxe: don't crash, if allocation of crc algorithm failed
Following crash happens, if crc algorithm couldn't be allocated:

[ 1087.989072] rdma_rxe: loaded
[ 1097.855397] PCLMULQDQ-NI instructions are not detected.
[ 1097.901220] rdma_rxe: failed to allocate crc algorithmi err:-2
[ 1097.901248] BUG: unable to handle kernel
[ 1097.901249] NULL pointer dereference
[ 1097.901250]  at 0000000000000046
[...]

Reason is that rxe->tfm is assigned the error return, which will then
be used for crypto_free_shash() in rxe_cleanup. Fix by using a
temporary variable and assigning it rxe->tfm after allocation succeeded.

Fixes: cee2688e3c ("IB/rxe: Offload CRC calculation when possible")
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:43:50 -05:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Kees Cook 3bfbea7473 IB/rxe: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Moni Shoua <monis@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 15:24:49 -04:00
Bart Van Assche ea6ee93b40 RDMA/rxe: Suppress gcc 7 fall-through complaints
Avoid that gcc 7 reports the following warning when building with W=1:

warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-14 20:47:07 -04:00
Doug Ledford 6b9f8970cd IB/rxe: put the pool on allocation failure
If the allocation of elem fails, it is not sufficient to simply check
for NULL and return.  We need to also put our reference on the pool or
else we will leave the pool with a permanent ref count and we will never
be able to free it.

Fixes: 4831ca9e4a ("IB/rxe: check for allocation failure on elem")
Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-09 12:10:41 -04:00
Colin Ian King 4831ca9e4a IB/rxe: check for allocation failure on elem
The allocation for elem may fail (especially because we're using
GFP_ATOMIC) so best to check for a null return.  This fixes a potential
null pointer dereference when assigning elem->pool.

Detected by CoverityScan CID#1357507 ("Dereference null return value")

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-09 09:03:16 -04:00
Andrew Boyer 5c50f1d18f IB/rxe: Handle NETDEV_CHANGE events
Without this fix, ports configured on top of ixgbe miss link up
notifications. ibv_query_port() will continue to return IBV_PORT_DOWN even
though the port is up and working.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:36 -04:00
Andrew Boyer 13eb1e21d6 IB/rxe: Avoid ICRC errors by copying into the skb first
The current process is to first calculate the CRC and then copy the client
data into the packet. This leaves a window in which the packet contents and
CRC can get out of sync, if the client changes the data after the CRC is
calculated but before the data is copied.

By copying the data into the packet and then calculating the CRC directly
from the packet contents we eliminate the window.

This can be seen with qperf's ud_bi_bw test. This seems like very
strange/reckless client behavior, but whether the client has mangled its
data or not RXE should be able to transfer it reliably.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:36 -04:00
Andrew Boyer 1223a1af75 IB/rxe: Another fix for broken receive queue draining
This fixes another path in rxe_requester() that might overlook stale SKBs,
preventing cleanup.

Fixes: 1217197142 ("rxe: fix broken receive queue draining")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:35 -04:00
Andrew Boyer 2418adaed1 IB/rxe: Remove unneeded initialization in prepare6()
Fixes: 4ed6ad1eb3 ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:35 -04:00
Andrew Boyer 825a51a4af IB/rxe: Fix up rxe_qp_cleanup()
Replace sk_dst_get()/dst_release() in rxe_qp_cleanup() with sk_dst_reset().
sk_dst_get() takes a new reference on dst, so the dst_release() doesn't
actually release the original reference, which was the design intent.

Fixes: 4ed6ad1eb3 ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:34 -04:00
Andrew Boyer 48c22be4ab IB/rxe: Add dst_clone() in prepare_ipv6_hdr()
Otherwise the reference count goes negative as IPv6 packets complete.

Fixes: 4ed6ad1eb3 ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:34 -04:00
Andrew Boyer b9109b7ddb IB/rxe: Fix destination cache for IPv6
To successfully match an IPv6 path, the path cookie must match. Store it
in the QP so that the IPv6 path can be reused.

Replace open-coded version of dst_check() with the actual call, fixing the
logic. The open-coded version skips the check call if dst->obsolete is 0
(DST_OBSOLETE_NONE), proceeding to replace the route. DST_OBSOLETE_NONE
means that the route may continue to be used, though.

Fixes: 4ed6ad1eb3 ("IB/rxe: Cache dst in QP instead of getting it...")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:33 -04:00
Andrew Boyer d45d29567f IB/rxe: Fix up the responder's find_resources() function
The resource array is sized by max_dest_rd_atomic, not max_rd_atomic.
Iterating over max_rd_atomic entries of qp->resp.resources[] will cause
incorrect behavior when the two attributes are different (or even
crash if max_rd_atomic is larger).

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:33 -04:00
Andrew Boyer cffec53daf IB/rxe: Remove dangling prototype
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:32 -04:00
Andrew Boyer bfc3ae0566 IB/rxe: Disable completion upcalls when a CQ is destroyed
This prevents the stack from accessing userspace objects while they
are being torn down.

One possible sequence of events:
 - Userspace program exits
 - ib_uverbs_cleanup_ucontext() runs, calling ib_destroy_qp(),
   ib_destroy_cq(), etc. and releasing/freeing the UCQ
   - The QP still has tasklets running, so it isn't destroyed yet
   - The CQ is referenced by the QP, so the CQ isn't destroyed yet
   - The UCQ is kfree()'d anyway
 - A send work request completes
 - rxe_send_complete() calls cq->ibcq.comp_handler()
 - ib_uverbs_comp_handler() runs and crashes; the event queue is checked
   for is_closed, but it has no way to check the ib_ucq_object before
   accessing it

The reference counting on the CQ doesn't protect against this since the CQ
hasn't been destroyed yet.
There's no available interface to deregister the UCQ from the CQ, and it
didn't appear that attempting to add reference counting to the UCQ was
going to be a good way to go since this solution is much simpler.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:32 -04:00
Andrew Boyer 9eb7f8e44d IB/rxe: Move refcounting earlier in rxe_send()
The network stack will call nskb's destructor, rxe_skb_tx_dtor(), if the
packet gets dropped by ip_local_out()/ip6_local_out(). Thus we need to add
the QP ref before output to avoid extra dereferences during network
congestion. This could lead to unwanted destruction of the QP.

Fix up the skb_out accounting, too.

Fixes: fda85ce912 ("IB/rxe: Fix kernel panic from skb destructor")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:31 -04:00
Kamal Heib fab773cb51 IB/rxe: Make rxe_counter_name static
rxe_counter_name is used in rxe_hw_counters.c only. Make it static.

Fixes: 0b1e5b99a4 ('IB/rxe: Add port protocol stats')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:44:48 -04:00
Doug Ledford d3cf4d9915 Merge branch 'misc' into k.o/for-next
Conflicts:
	drivers/infiniband/core/iwcm.c - The rdma_netlink patches in
	HEAD and the iwarp cm workqueue fix (don't use WQ_MEM_RECLAIM,
	we aren't safe for that context) touched the same code.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:10:23 -04:00
Yuval Shaia 660b1de13c IB/rxe: Remove unneeded check
Port validation is performed in ib_core, no need to duplicate it here.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:01:10 -04:00
Yuval Shaia 8b62cbd13a IB/rxe: Convert pr_info to pr_warn
This message is warning so let's print it accordingly.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:01:09 -04:00
Doug Ledford a5f66725c7 Merge branch 'misc' into k.o/for-next 2017-07-27 09:00:38 -04:00
Leon Romanovsky e1267b0124 RDMA: Remove useless MODULE_VERSION
All modules in drivers/infiniband defined and used MODULE_VERSION, which
was pointless because the kernel version describes their state more accurate
then those arbitrary numbers.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Sagi Grimbrg <sagi@grimberg.me>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:45:11 -04:00
Yuval Shaia d41861942f IB/core: Add generic function to extract IB speed from netdev
Logic of retrieving netdev speed from net_device and translating it to
IB speed is implemented in rxe, in usnic and in bnxt drivers.

Define new function which merges all.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Christian Benvenuti <benve@cisco.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:45:11 -04:00
Kamal Heib b4fbec9673 IB/rxe: Constify static rxe_vm_ops
Constify static rxe_vm_ops that is never modified.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:43:12 -04:00
Kamal Heib 61013828f6 IB/rxe: Use __func__ to print function's name
Its better to use __func__ to print functions name instead of writing
the name in the print statement.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:43:12 -04:00
Kamal Heib c05d26647b IB/rxe: Use DEVICE_ATTR_RO macro to show parent field
Use DEVICE_ATTR RO() macro and rename the show function accordingly.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:43:12 -04:00
Kamal Heib c498e82e3c IB/rxe: Prefer 'unsigned int' to bare use of 'unsigned'
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:43:12 -04:00
Kamal Heib 3363828758 IB/rxe: Use "foo *bar" instead of "foo * bar"
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:43:12 -04:00
Vijay Immanuel 1217197142 rxe: fix broken receive queue draining
If we modified the qp to ERROR state, and
drained the recieve queue, post_recv must
trigger the responder task to complete
the drain work request.

Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>--
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
yonatanc 56012e1cad IB/rxe: Set dma_mask and coherent_dma_mask
The RXE coupled with dummy device causes to the kernel panic attached
below.  The panic happens when ib_register_device tries to set dma_mask
by accessing a NULLed parent device.

The RXE does not actually use DMA, so we can set the dma_mask
to architecture value.

[16240.199689] RIP: 0010:ib_register_device+0x468/0x5a0 [ib_core]
[16240.205289] RSP: 0018:ffffc9000220fc10 EFLAGS: 00010246
[16240.209909] RAX: 0000000000000024 RBX: ffff880220d1a2a8 RCX: 0000000000000000
[16240.212244] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
[16240.214385] RBP: ffffc9000220fcb0 R08: 0000000000000000 R09: 000000000000023f
[16240.254465] R10: 0000000000000007 R11: 0000000000000000 R12: 0000000000000000
[16240.259467] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880220d1a2a8
[16240.263314] FS:  00007fd8ecca0740(0000) GS:ffff8802364c0000(0000) knlGS:0000000000000000
[16240.267292] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16240.273503] CR2: 0000000000000218 CR3: 00000002253ba000 CR4: 00000000000006e0
[16240.277066] Call Trace:
[16240.281836]  ? __kmalloc+0x26f/0x280
[16240.286596]  rxe_register_device+0x297/0x300 [rdma_rxe]
[16240.291377]  rxe_add+0x535/0x5b0 [rdma_rxe]
[16240.297586]  rxe_net_add+0x3e/0xc0 [rdma_rxe]
[16240.302375]  rxe_param_set_add+0x65/0x144 [rdma_rxe]
[16240.307769]  param_attr_store+0x68/0xd0
[16240.311640]  module_attr_store+0x1d/0x30
[16240.316421]  sysfs_kf_write+0x3a/0x50
[16240.317802]  kernfs_fop_write+0xff/0x180
[16240.322989]  __vfs_write+0x37/0x140
[16240.328164]  ? handle_mm_fault+0xce/0x240
[16240.333340]  vfs_write+0xb2/0x1b0
[16240.335013]  SyS_write+0x55/0xc0
[16240.340632]  entry_SYSCALL_64_fastpath+0x1a/0xa9

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 21:21:27 -04:00
Yonatan Cohen fda85ce912 IB/rxe: Fix kernel panic from skb destructor
In the time between rxe_send has finished and skb destructor
called, the QP's ref count might be 0, leading to a possible
QP destruction. This will lead to a kernel panic when the destructor
dereferences the QP.

The operation of incrementing QP ref count at rxe_send and decrementing
from skb destructor will prevent this crash.

BUG: unable to handle kernel NULL pointer dereference at 000000000000072c
IP: [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
PGD 0 [16240.211178]
Oops: 0002 [#1] SMP
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G           OE   4.9.0-mlnx #1
Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
task: ffff88042d6b1480 task.stack: ffffc90001904000
RIP: 0010:[<ffffffffa05df765>]  [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
RSP: 0018:ffff88043fcc3df0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880429684700 RCX: ffff88042d248200
RDX: 00000000ffffffff RSI: 00000000fffffe01 RDI: ffff880429684700
RBP: ffff88043fcc3e00 R08: ffff88043fcda240 R09: 00000000ff2d1de6
R10: 0000000000000000 R11: 00000000f49cf6fe R12: ffff880429684700
R13: ffffffff81893f96 R14: ffffffff817d66f0 R15: ffff880427f74200
FS:  0000000000000000(0000) GS:ffff88043fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000072c CR3: 000000041d3df000 CR4: 00000000000006e0
Stack:
 ffffffff817b29cf ffff880429684700 ffff88043fcc3e18 ffffffff817b42c2
 ffff880429684700 ffff88043fcc3e40 ffffffff817b4332 ffff880429684700
 ffff880427f74238 ffff880427f74228 ffff88043fcc3e58 ffffffff81893f96
Call Trace:
 <IRQ> [16240.336345]  [<ffffffff817b29cf>] ? skb_release_head_state+0x4f/0xb0
 [<ffffffff817b42c2>] skb_release_all+0x12/0x30
 [<ffffffff817b4332>] kfree_skb+0x32/0x90
 [<ffffffff81893f96>] ndisc_error_report+0x36/0x40
 [<ffffffff817d4de1>] neigh_invalidate+0x81/0xf0
 [<ffffffff817d68f7>] neigh_timer_handler+0x207/0x2b0
 [<ffffffff81109295>] call_timer_fn+0x35/0x120
 [<ffffffff81109db7>] run_timer_softirq+0x1d7/0x460
 [<ffffffff8106155e>] ? kvm_sched_clock_read+0x1e/0x30
 [<ffffffff810366b9>] ? sched_clock+0x9/0x10
 [<ffffffff810cfed2>] ? sched_clock_cpu+0x72/0xa0
 [<ffffffff818dd537>] __do_softirq+0xd7/0x289
 [<ffffffff810a6c95>] irq_exit+0xb5/0xc0
 [<ffffffff818dd372>] smp_apic_timer_interrupt+0x42/0x50
 [<ffffffff818dc682>] apic_timer_interrupt+0x82/0x90
 <EOI> [16240.395776]  [<ffffffff818da156>] ? native_safe_halt+0x6/0x10
 [<ffffffff818d9e6e>] default_idle+0x1e/0xd0
 [<ffffffff8103797f>] arch_cpu_idle+0xf/0x20
 [<ffffffff818da2c5>] default_idle_call+0x35/0x40
 [<ffffffff810e3eb5>] cpu_startup_entry+0x185/0x210
 [<ffffffff81050433>] start_secondary+0x103/0x130
RIP  [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17 21:21:26 -04:00
Kees Cook 4c93496f18 IB/rxe: do not copy extra stack memory to skb
This fixes a over-read condition detected by FORTIFY_SOURCE for this
line:

	memcpy(SKB_TO_PKT(skb), &ack_pkt, sizeof(skb->cb));

The error was:

  In file included from ./include/linux/bitmap.h:8:0,
                   from ./include/linux/cpumask.h:11,
                   from ./include/linux/mm_types_task.h:13,
                   from ./include/linux/mm_types.h:4,
                   from ./include/linux/kmemcheck.h:4,
                   from ./include/linux/skbuff.h:18,
                   from drivers/infiniband/sw/rxe/rxe_resp.c:34:
  In function 'memcpy',
      inlined from 'send_atomic_ack.constprop' at drivers/infiniband/sw/rxe/rxe_resp.c:998:2,
      inlined from 'acknowledge' at drivers/infiniband/sw/rxe/rxe_resp.c:1026:3,
      inlined from 'rxe_responder' at drivers/infiniband/sw/rxe/rxe_resp.c:1286:10:
  ./include/linux/string.h:309:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
      __read_overflow2();

Daniel Micay noted that struct rxe_pkt_info is 32 bytes on 32-bit
architectures, but skb->cb is still 64.  The memcpy() over-reads 32
bytes.  This fixes it by zeroing the unused bytes in skb->cb.

Link: http://lkml.kernel.org/r/1497903987-21002-5-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Moni Shoua <monis@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:03 -07:00
Linus Torvalds 51ce5f3329 Fixes #2 for 4.12-rc
- A fix for fix eea40b8f62 ("infiniband: call ipv6 route lookup via the
   stub interface")
 - 6 patches against bnxt_re...the first two are considerably larger than
   I would like, but as they address real issues I went ahead and
   submitted them (it also helped that a good deal of the churn was
   removing code repeated in multiple places and consolidating it to one
   common function)
 - 2 fixes against qedr that just came in
 - 1 fix against rxe that took a few revisions to get right plus time to
   get the proper reviews
 - 5 late breaking IPoIB fixes
 - 1 late cxgb4 fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZQY2UAAoJELgmozMOVy/d/goP/1uNXbrSG7CfdOykvGdm4lbN
 rmal4D0ChqivDsEll+eEaCBPBFZnQXbK/n5Fu/R4+qvFCWPqxAV0kMckKYsEMYNK
 Q9zh6h3jxeBA8ms5D6MeZt31tzMG4VQjjs0LvbIz1S0wHbHuNd2dPCMwxKNOmm0H
 eTJQ7SAssYnHJmWnQI0R6QZPfAcOOT2HgLsBdvq8apIk7tRRzflQ5YQWZE1X4cv0
 iYshMgmkSR9vgr83YZwYQYd6Uc/yUDj87FrBR7ELUyE9Dr0tMoprtM3fekw78uLE
 YBiIMb6NwUH+2q5bxPuaq0DulEsdEdMPdtNOY/VRSduqCkdlGPS0KMvLzHcm701b
 Ks6gQQMvpb5lNUPasmGkowz7gwTrIniKu9nZUcJ/4BZRH4ax6hrZh3yj/Kx4GuxF
 hp+KZPokytLfNEhuJKgrIQx6r2kfMJKr6eTDEAVp8dZ54bymWQgZhoQ8lLFF2isQ
 MuCM2p7G/gauu5WrMZBc4vjvWaHAfV60KrCwxJQwQIBZwTRwA619nuDi3McED1R8
 YS3pUxu3QJRaZWqI4vS10CUTxAz/4NVHLjE2rkDJAMMKRfBu94CTU/DklQ6jejKr
 njvK5uryjTaJWN8F/EKRs85NhA/yqZpb8rDhoWn3ri7eyHdj9SsEw0RpVIorYiqT
 zsk6SIT9biA5bs46kQ6P
 =blNV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:

 "I had thought at the time of the last pull request that there wouldn't
  be much more to go, but several things just kept trickling in over the
  last week.

  Instead of just the six patches to bnxt_re that I had anticipated,
  there are another five IPoIB patches, two qedr patches, and a few
  other miscellaneous patches.

  The bnxt_re patches are more lines of diff than I like to submit this
  late in the game. That's mostly because of the first two patches in
  the series of six. I almost dropped them just because of the lines of
  churn, but on a close review, a lot of the churn came from removing
  duplicated code sections and consolidating them into callable
  routines. I felt like this made the number of lines of change more
  acceptable, and they address problems, so I left them. The remainder
  of the patches are all small, well contained, and well understood.

  These have passed 0day testing, but have not been submitted to
  linux-next (but a local merge test with your current master was
  without any conflicts).

  Summary:

   - A fix for fix eea40b8f62 ("infiniband: call ipv6 route lookup via
     the stub interface")

   - Six patches against bnxt_re...the first two are considerably larger
     than I would like, but as they address real issues I went ahead and
     submitted them (it also helped that a good deal of the churn was
     removing code repeated in multiple places and consolidating it to
     one common function)

   - Two fixes against qedr that just came in

   - One fix against rxe that took a few revisions to get right plus
     time to get the proper reviews

   - Five late breaking IPoIB fixes

   - One late cxgb4 fix"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  rdma/cxgb4: Fix memory leaks during module exit
  IB/ipoib: Fix memory leak in create child syscall
  IB/ipoib: Fix access to un-initialized napi struct
  IB/ipoib: Delete napi in device uninit default
  IB/ipoib: Limit call to free rdma_netdev for capable devices
  IB/ipoib: Fix memory leaks for child interfaces priv
  rxe: Fix a sleep-in-atomic bug in post_one_send
  RDMA/qedr: Add 64KB PAGE_SIZE support to user-space queues
  RDMA/qedr: Initialize byte_len in WC of READ and SEND commands
  RDMA/bnxt_re: Remove FMR support
  RDMA/bnxt_re: Fix RQE posting logic
  RDMA/bnxt_re: Add HW workaround for avoiding stall for UD QPs
  RDMA/bnxt_re: Dereg MR in FW before freeing the fast_reg_page_list
  RDMA/bnxt_re: HW workarounds for handling specific conditions
  RDMA/bnxt_re: Fixing the Control path command and response handling
  IB/addr: Fix setting source address in addr6_resolve()
2017-06-16 17:38:23 +09:00
Jia-Ju Bai 07d432bb97 rxe: Fix a sleep-in-atomic bug in post_one_send
The driver may sleep under a spin lock, and the function call path is:
post_one_send (acquire the lock by spin_lock_irqsave)
  init_send_wqe
    copy_from_user --> may sleep

There is no flow that makes "qp->is_user" true, and copy_from_user may
cause bug when a non-user pointer is used. So the lines of copy_from_user
and check of "qp->is_user" are removed.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-14 13:02:01 -04:00
David Miller d41519a69b crypto: Work around deallocated stack frame reference gcc bug on sparc.
On sparc, if we have an alloca() like situation, as is the case with
SHASH_DESC_ON_STACK(), we can end up referencing deallocated stack
memory.  The result can be that the value is clobbered if a trap
or interrupt arrives at just the right instruction.

It only occurs if the function ends returning a value from that
alloca() area and that value can be placed into the return value
register using a single instruction.

For example, in lib/libcrc32c.c:crc32c() we end up with a return
sequence like:

        return  %i7+8
         lduw   [%o5+16], %o0   ! MEM[(u32 *)__shash_desc.1_10 + 16B],

%o5 holds the base of the on-stack area allocated for the shash
descriptor.  But the return released the stack frame and the
register window.

So if an intererupt arrives between 'return' and 'lduw', then
the value read at %o5+16 can be corrupted.

Add a data compiler barrier to work around this problem.  This is
exactly what the gcc fix will end up doing as well, and it absolutely
should not change the code generated for other cpus (unless gcc
on them has the same bug :-)

With crucial insight from Eric Sandeen.

Cc: <stable@vger.kernel.org>
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-06-08 17:36:03 +08:00
Sagi Grimberg 67cf3623e0 rxe: expose num_possible_cpus() cnum_comp_vectors
They're completely logical, so don't impose an artificial limitation.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04 19:33:02 -04:00
Leon Romanovsky af5df5fb59 IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type
Callers of rxe_mem_copy() provide pointer to store updated CRC
value. That pointer was supposed to be updated, but the
commit cee2688e3c ("IB/rxe: Offload CRC calculation when possible")
mistakenly removed that assignment for RXE_MEM_TYPE_DMA memory type.

The code worked because there are no actual callers with
RXE_MEM_TYPE_DMA, who are interested in returned value of crcp.
The one caller in read_reply(), who uses the returned crcp didn't
set RXE_MEM_TYPE_DMA as mem->type.

Fixes: cee2688e3c ("IB/rxe: Offload CRC calculation when possible")
Reported-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04 19:31:46 -04:00
Johannes Thumshirn d52418502e IB/rxe: Don't clamp residual length to mtu
When reading a RDMA WRITE FIRST packet we copy the DMA length from the RDMA
header into the qp->resp.resid variable for later use. Later in check_rkey()
we clamp it to the MTU if the packet is an  RDMA WRITE packet and has a
residual length bigger than the MTU. Later in write_data_in() we subtract the
payload of the packet from the residual length. If the packet happens to have a
payload of exactly the MTU size we end up with a residual length of 0 despite
the packet not being the last in the conversation. When the next packet in the
conversation arrives, we don't have any residual length left and thus set the QP
into an error state.

This broke NVMe over Fabrics functionality over rdma_rxe.ko

The patch was verified using the following test.

 # echo eth0 > /sys/module/rdma_rxe/parameters/add
 # nvme connect -t rdma -a 192.168.155.101 -s 1023 -n nvmf-test
 # mkfs.xfs -fK /dev/nvme0n1
 meta-data=/dev/nvme0n1           isize=256    agcount=4, agsize=65536 blks
          =                       sectsz=4096  attr=2, projid32bit=1
          =                       crc=0        finobt=0, sparse=0
 data     =                       bsize=4096   blocks=262144, imaxpct=25
          =                       sunit=0      swidth=0 blks
 naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
 log      =internal log           bsize=4096   blocks=2560, version=2
          =                       sectsz=4096  sunit=1 blks, lazy-count=1
 realtime =none                   extsz=4096   blocks=0, rtextents=0
 # mount /dev/nvme0n1 /tmp/
 [  148.923263] XFS (nvme0n1): Mounting V4 Filesystem
 [  148.961196] XFS (nvme0n1): Ending clean mount
 # dd if=/dev/urandom of=test.bin bs=1M count=128
 128+0 records in
 128+0 records out
 134217728 bytes (134 MB, 128 MiB) copied, 0.437991 s, 306 MB/s
 # sha256sum test.bin
 cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  test.bin
 # cp test.bin /tmp/
 sha256sum /tmp/test.bin
 cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  /tmp/test.bin

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:42:58 -04:00
Dasaratharaman Chandramouli 44c58487d5 IB/core: Define 'ib' and 'roce' rdma_ah_attr types
rdma_ah_attr can now be either ib or roce allowing
core components to use one type or the other and also
to define attributes unique to a specific type. struct
ib_ah is also initialized with the type when its first
created. This ensures that calls such as modify_ah
dont modify the type of the address handle attribute.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli d8966fcd4c IB/core: Use rdma_ah_attr accessor functions
Modify core and driver components to use accessor functions
introduced to access individual fields of rdma_ah_attr

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli 90898850ec IB/core: Rename struct ib_ah_attr to rdma_ah_attr
This patch simply renames struct ib_ah_attr to
rdma_ah_attr as these fields specify attributes that are
not necessarily specific to IB.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Dasaratharaman Chandramouli eca7ddf965 IB/rxe: Initialize ib_ah_attr during query_ah
Zero out ib_ah_attr before calling query_ah. Set ah_flags
appropriately.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01 14:32:43 -04:00
Colin Ian King 27b0b83233 IB/rxe: fix typo: "algorithmi" -> "algorithm"
trivial fix to typo in pr_err message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28 13:07:58 -04:00
Artemy Kovalyov 3e7e1193e2 IB: Replace ib_umem page_size by page_shift
Size of pages are held by struct ib_umem in page_size field.

It is better to store it as an exponent, because page size by nature
is always power-of-two and used as a factor, divisor or ilog2's argument.

The conversion of page_size to be page_shift allows to have portable
code and avoid following error while compiling on ARM:

  ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!

CC: Selvin Xavier <selvin.xavier@broadcom.com>
CC: Steve Wise <swise@chelsio.com>
CC: Lijun Ou <oulijun@huawei.com>
CC: Shiraz Saleem <shiraz.saleem@intel.com>
CC: Adit Ranadive <aditr@vmware.com>
CC: Dennis Dalessandro <dennis.dalessandro@intel.com>
CC: Ram Amrani <Ram.Amrani@Cavium.com>
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-25 15:40:28 -04:00
Yuval Shaia 4d6f28591f {net,IB}/{rxe,usnic}: Utilize generic mac to eui32 function
This logic seems to be duplicated in (at least) three separate files.
Move it to one place so code can be re-use.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2017-04-25 14:21:34 -04:00
yonatanc 4ed6ad1eb3 IB/rxe: Cache dst in QP instead of getting it for each send
In RC QP there is no need to resolve the outgoing interface
for each packet, as this does not change during QP life cycle.

Instead cache the interface on the socket and use that one.
This improves performance by 12% by sparing redundant
calls to rxe_find_route.

ib_send_bw -d rxe0  -x 1 -n 9000 -e  -s $((1024 * 1024 )) -l 100

----------------------------------------------------------------------------------------
|        | bytes   | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] |
----------------------------------------------------------------------------------------
| before | 1048576 | 9000       | inf             | 551.21             | 0.000551      |
| after  | 1048576 | 9000       | inf             | 615.54             | 0.000616      |
----------------------------------------------------------------------------------------

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 10:45:02 -04:00
yonatanc cee2688e3c IB/rxe: Offload CRC calculation when possible
Use CPU ability to perform CRC calculations, by
replacing direct calls to crc32_le() with crypto_shash_updata().

The overall performance gain measured with ib_send_bw tool is 10% and it
was tested on "Intel CPU ES-2660 v2 @ 2.20Ghz" CPU.

ib_send_bw -d rxe0  -x 1 -n 9000 -e  -s $((1024 * 1024 )) -l 100

---------------------------------------------------------------------------------------------
|             | bytes   | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] |
---------------------------------------------------------------------------------------------
| crc32_le    | 1048576 | 9000       | inf             | 497.60             | 0.000498      |
| CRC offload | 1048576 | 9000       | inf             | 546.70             | 0.000547      |
---------------------------------------------------------------------------------------------

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21 10:45:02 -04:00