OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Shamir Rabinovitch	e00b64f7c5	RDMA: Cleanup undesired pd->uobject usage Drivers should be using udata to determine if a method is invoked from user space or kernel space. A pd does not necessarily say a different objects is kernel or user. Transforming the tests to use udata eliminates a large number of uobject references from the drivers. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-18 19:15:48 -07:00
Kamal Heib	b81a327dbc	RDMA/i40iw: Make sure to initialize ib_device_ops The initialization of the ib_device_ops was dropped by mistake when rebasing the ib_device_ops series, this patch fixes that. Fixes: `15644f57cb` ("RDMA/i40iw: Initialize ib_device_ops struct") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-18 14:12:26 -07:00
Kamal Heib	3023a1e936	RDMA: Start use ib_device_ops Make all the required change to start use the ib_device_ops structure. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-12 07:40:16 -07:00
Kamal Heib	15644f57cb	RDMA/i40iw: Initialize ib_device_ops struct Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-12-11 15:15:08 -07:00
Sagi Grimberg	759ace7832	i40iw: remove support for ib_get_vector_affinity Devices that does not use managed affinity can not export a vector affinity as the consumer relies on having a static mapping it can map to upper layer affinity (e.g. sw queues). If the driver allows the user to set the device irq affinity, then the affinitization of a long term existing entites is not relevant. For example, nvme-rdma controllers queue-irq affinitization is determined at init time so if the irq affinity changes over time, we are no longer aligned. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-11-08 14:22:53 -07:00
Parav Pandit	508a523f6b	RDMA/drivers: Use core provided API for registering device attributes Use rdma_set_device_sysfs_group() to register device attributes and simplify the driver. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-10-17 03:45:01 -06:00
Jason Gunthorpe	e349f858d2	RDMA: Fully setup the device name in ib_register_device The current code has two copies of the device name, ibdev->dev and dev_name(&ibdev->dev), and they are setup at different times, which is very confusing. Set them both up at the same time and make dev_name() the lead name, which is the proper use of the driver core APIs. To make it very clear that the name is not valid until registration pass it in to the ib_register_device() call rather than messing with ibdev->name directly. Also the reorganization now checks that dev_name is unique even if it does not contain a %. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>	2018-09-26 13:51:36 -06:00
Håkon Bugge	802fa45cd3	RDMA/i40iw: Fix incorrect iterator type Commit `f27b4746f3` ("i40iw: add connection management code") uses an incorrect rcu iterator, whilst holding the rtnl_lock. Since the critical region invokes i40iw_manage_qhash(), which is a sleeping function, the rcu locking and traversal cannot be used. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-09-19 10:08:20 -06:00
Kamal Heib	1ffba62642	RDMA/providers: Remove pointless functions The rdma core is taking care of return the right error code when the rdma device callbacks aren't supported. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:31:54 -06:00
Bart Van Assche	d34ac5cd3a	RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const Since neither ib_post_send() nor ib_post_recv() modify the data structure their second argument points at, declare that argument const. This change makes it necessary to declare the 'bad_wr' argument const too and also to modify all ULPs that call ib_post_send(), ib_post_recv() or ib_post_srq_recv(). This patch does not change any functionality but makes it possible for the compiler to verify whether the ib_post_(send\|recv\|srq_recv) really do not modify the posted work request. To make this possible, only one cast had to be introduce that casts away constness, namely in rpcrdma_post_recvs(). The only way I can think of to avoid that cast is to introduce an additional loop in that function or to change the data type of bad_wr from struct ib_recv_wr ** into int (an index that refers to an element in the work request list). However, both approaches would require even more extensive changes than this patch. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:09:34 -06:00
Arnd Bergmann	07f3355df7	infiniband: i40iw, nes: don't use wall time for TCP sequence numbers The nes infiniband driver uses current_kernel_time() to get a nanosecond granunarity timestamp to initialize its tcp sequence counters. This is one of only a few remaining users of that deprecated function, so we should try to get rid of it. Aside from using a deprecated API, there are several problems I see here: - Using a CLOCK_REALTIME based time source makes it predictable in case the time base is synchronized. - Using a coarse timestamp means it only gets updated once per jiffie, making it even more predictable in order to avoid having to access the hardware clock source - The upper 2 bits are always zero because the nanoseconds are at most 999999999. For the Linux TCP implementation, we use secure_tcp_seq(), which appears to be appropriate here as well, and solves all the above problems. i40iw uses a variant of the same code, so I do that same thing there for ipv4. Unlike nes, i40e also supports ipv6, which needs to call secure_tcpv6_seq instead. Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-11 12:10:19 -06:00
Leon Romanovsky	5d9a2b0e28	RDMA/i40w: Hold read semaphore while looking after VMA VMA lookup is supposed to be performed while mmap_sem is held. Fixes: `f26c7c8339` ("i40iw: Add 2MB page support") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-04 11:51:06 -06:00
Steve Wise	33023fb85a	IB/core: add max_send_sge and max_recv_sge attributes This patch replaces the ib_device_attr.max_sge with max_send_sge and max_recv_sge. It allows ulps to take advantage of devices that have very different send and recv sge depths. For example cxgb4 has a max_recv_sge of 4, yet a max_send_sge of 16. Splitting out these attributes allows much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW API. Consider a large RDMA WRITE that has 16 scattergather entries. With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of 16, it can be done with 1 WRITE WR. Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-06-18 13:17:28 -06:00
Shiraz Saleem	aaf5e003b1	i40iw: Reorganize acquire/release of locks in i40iw_manage_apbvt Commit `f43c00c04b` ("i40iw: Extend port reuse support for listeners") introduces a sparse warning: include/linux/spinlock.h:365:9: sparse: context imbalance in 'i40iw_manage_apbvt' - unexpected unlock Fix this by reorganizing the acquire/release of locks in i40iw_manage_apbvt and add a new function i40iw_cqp_manage_abvpt_cmd to perform the CQP command. Also, use __clear_bit and __test_and_set_bit as we do not need atomic versions. Fixes: `f43c00c04b` ("i40iw: Extend port reuse support for listeners") Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-06-18 11:09:05 -06:00
Jason Gunthorpe	0394808d9e	Merge branch 'mr_fix' into git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma for-next Update mlx4 to support user MR creation against read-only memory, previously it required the memory to be writable. Based on rdma for-rc due to dependencies. * mr_fix: (2 commits) IB/mlx4: Mark user MR as writable if actual virtual memory is writable IB/core: Make testing MR flags for writability a static inline function	2018-05-28 11:44:35 -06:00
Shiraz Saleem	f43c00c04b	i40iw: Extend port reuse support for listeners If two listeners are created with different IP's but same port, the second rdma_listen fails due to a duplicate port entry being added from the CQP add APBVT OP. commit `f16dc0aa5e` ("i40iw: Add support for port reuse on active side connections") does not account for listener side port reuse. Check for duplicate port before invoking the CQP command to add APBVT entry and delete the entry only if the port is not in use. Additionally, consolidate all port-reuse logic into i40iw_manage_apbvt. Fixes: `f16dc0aa5e` ("i40iw: Add support for port reuse on active side connections") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-05-16 13:13:20 -06:00
Andrew Boyer	43731753c4	RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint The current code sets an affinity hint with a cpumask_t stored on the stack. This value can then be accessed through /proc/irq/*/affinity_hint/, causing a segfault or returning corrupt data. Move the cpumask_t into struct i40iw_msix_vector so it is available later. Backtrace: BUG: unable to handle kernel paging request at ffffb16e600e7c90 IP: irq_affinity_hint_proc_show+0x60/0xf0 PGD 17c0c6d067 PUD 17c0c6e067 PMD 15d4a0e067 PTE 0 Oops: 0000 [#1] SMP Modules linked in: ... CPU: 3 PID: 172543 Comm: grep Tainted: G OE ... #1 Hardware name: ... task: ffff9a5caee08000 task.stack: ffffb16e659d8000 RIP: 0010:irq_affinity_hint_proc_show+0x60/0xf0 RSP: 0018:ffffb16e659dbd20 EFLAGS: 00010086 RAX: 0000000000000246 RBX: ffffb16e659dbd20 RCX: 0000000000000000 RDX: ffffb16e600e7c90 RSI: 0000000000000003 RDI: 0000000000000046 RBP: ffffb16e659dbd88 R08: 0000000000000038 R09: 0000000000000001 R10: 0000000070803079 R11: 0000000000000000 R12: ffff9a59d1d97a00 R13: ffff9a5da47a6cd8 R14: ffff9a5da47a6c00 R15: ffff9a59d1d97a00 FS: 00007f946c31d740(0000) GS:ffff9a5dc1800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffb16e600e7c90 CR3: 00000016a4339000 CR4: 00000000007406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: seq_read+0x12d/0x430 ? sched_clock_cpu+0x11/0xb0 proc_reg_read+0x48/0x70 __vfs_read+0x37/0x140 ? security_file_permission+0xa0/0xc0 vfs_read+0x96/0x140 SyS_read+0x58/0xc0 do_syscall_64+0x5a/0x190 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f946bbc97e0 RSP: 002b:00007ffdd0c4ae08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 000000000096b000 RCX: 00007f946bbc97e0 RDX: 000000000096b000 RSI: 00007f946a2f0000 RDI: 0000000000000004 RBP: 0000000000001000 R08: 00007f946a2ef011 R09: 000000000000000a R10: 0000000000001000 R11: 0000000000000246 R12: 00007f946a2f0000 R13: 0000000000000004 R14: 0000000000000000 R15: 00007f946a2f0000 Code: b9 08 00 00 00 49 89 c6 48 89 df 31 c0 4d 8d ae d8 00 00 00 f3 48 ab 4c 89 ef e8 6c 9a 56 00 49 8b 96 30 01 00 00 48 85 d2 74 3f <48> 8b 0a 48 89 4d 98 48 8b 4a 08 48 89 4d a0 48 8b 4a 10 48 89 RIP: irq_affinity_hint_proc_show+0x60/0xf0 RSP: ffffb16e659dbd20 CR2: ffffb16e600e7c90 Fixes: `8e06af711b` ("i40iw: add main, hdr, status") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2018-05-09 10:45:19 -04:00
Andrew Boyer	9f7b16afab	RDMA/i40iw: Avoid reference leaks when processing the AEQ In this switch there is a reference held on the QP. 'continue' will grab the next event without releasing the reference, causing a leak. Change it to 'break' to drop the reference before grabbing the next event. Fixes: `4e9042e647` ("i40iw: add hw and utils files") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2018-05-09 10:45:18 -04:00
Andrew Boyer	a75895b1eb	RDMA/i40iw: Avoid panic when objects are being created and destroyed A panic occurs when there is a newly-registered element on the QP/CQ MR list waiting to be attached, but a different MR is deregistered. The current code only checks for whether the list is empty, not whether the element being deregistered is actually on the list. Fix the panic by adding a boolean to track if the object is on the list. Fixes: `d374984179` ("i40iw: add files for iwarp interface") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2018-05-09 10:45:18 -04:00
Mustafa Ismail	eeb1af4f53	i40iw: Use correct address in dst_neigh_lookup for IPv6 Use of incorrect structure address for IPv6 neighbor lookup causes connections to IPv6 addresses to fail. Fix this by using correct address in call to dst_neigh_lookup. Fixes: `f27b4746f3` ("i40iw: add connection management code") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2018-05-09 10:39:51 -04:00
Mustafa Ismail	5a7189d529	i40iw: Fix memory leak in error path of create QP If i40iw_allocate_dma_mem fails when creating a QP, the memory allocated for the QP structure using kzalloc is not freed because iwqp->allocated_buffer is used to free the memory and it is not setup until later. Fix this by setting iwqp->allocated_buffer before allocating the dma memory. Fixes: `d374984179` ("i40iw: add files for iwarp interface") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2018-05-09 10:39:50 -04:00
Jia-Ju Bai	4e56569cee	infiniband: i40iw: Replace GFP_ATOMIC with GFP_KERNEL in i40iw_l2param_change i40iw_l2param_change() is never called in atomic context. i40iw_make_listen_node() is only set as ".l2_param_change" in struct i40e_client_ops, and this function pointer is not called in atomic context. Despite never getting called from atomic context, i40iw_l2param_change() calls kzalloc() with GFP_ATOMIC, which does not sleep for allocation. GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL, which can sleep and improve the possibility of sucessful allocation. This is found by a static analysis tool named DCNS written by myself. And I also manually check it. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-04-17 19:57:12 -06:00
Jia-Ju Bai	f9af873014	infiniband: i40iw: Replace GFP_ATOMIC with GFP_KERNEL in i40iw_make_listen_node i40iw_make_listen_node() is never called in atomic context. i40iw_make_listen_node() is only called by i40iw_create_listen, which is set as ".create_listen" in struct iw_cm_verbs. Despite never getting called from atomic context, i40iw_make_listen_node() calls kzalloc() with GFP_ATOMIC, which does not sleep for allocation. GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL, which can sleep and improve the possibility of sucessful allocation. This is found by a static analysis tool named DCNS written by myself. And I also manually check it. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-04-17 19:57:11 -06:00
Jia-Ju Bai	39e487faaf	infiniband: i40iw: Replace GFP_ATOMIC with GFP_KERNEL in i40iw_add_mqh_4 i40iw_add_mqh_4() is never called in atomic context, because it calls rtnl_lock() that can sleep. Despite never getting called from atomic context, i40iw_add_mqh_4() calls kzalloc() with GFP_ATOMIC, which does not sleep for allocation. GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL, which can sleep and improve the possibility of sucessful allocation. This is found by a static analysis tool named DCNS written by myself. And I also manually check it. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-04-17 19:57:11 -06:00
Shiraz Saleem	3e64f8d6f5	i40iw: Remove pre-production workaround for resource profile 1 Support for resource profile 1 is currenlty deprecated due to a pre-production errata. Remove this workaround as its no longer needed. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-04-03 13:40:39 -06:00
Matan Barak	0ede73bc01	IB/uverbs: Extend uverbs_ioctl header with driver_id Extending uverbs_ioctl header with driver_id and another reserved field. driver_id should be used in order to identify the driver. Since every driver could have its own parsing tree, this is necessary for strace support. Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB support and thus we add some reserved fields for future usage. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-03-19 14:45:17 -06:00
Henry Orosco	546b1452fd	i40iw: Tear-down connection after CQP Modify QP failure There is no explicit tear-down sequence initiated on connections if the Control QP OP, Modify QP to close, fails. Fix this by triggering a driver generated Asynchronous Event (AE) on Modify QP failures and tear-down the connection on receipt of the AE. This fix can be generalized to other Modify QP failures (i.e. RTS->TERM, IDLE->RTS, etc) as any modify failure will require a connection tear-down. Fixes: `d374984179` ("i40iw: add files for iwarp interface") Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-03-15 15:58:05 -06:00
Henry Orosco	a8b9234b12	i40iw: Refactor of driver generated AEs The flush CQP OP can be used to optionally generate Asynchronous Events (AEs) in addition to QP flush. Consolidate all HW AE generation code under a new function i40iw_gen_ae which use the flush CQP OP to only generate AEs. Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-03-15 15:58:04 -06:00
Jason Gunthorpe	9a657b4c4a	RDMA/i40iw: Move uapi header to include/uapi All of these defines are part of the uABI for the driver, this header duplicates providers/i40iw/i40iw-abi.h in rdma-core. Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-03-15 15:58:03 -06:00
Arnd Bergmann	baa00fcde4	RDMA/i40iw: include linux/irq.h We get a build failure on ARM unless the header is included explicitly: drivers/infiniband/hw/i40iw/i40iw_verbs.c: In function 'i40iw_get_vector_affinity': drivers/infiniband/hw/i40iw/i40iw_verbs.c:2747:9: error: implicit declaration of function 'irq_get_affinity_mask'; did you mean 'irq_create_affinity_masks'? [-Werror=implicit-function-declaration] return irq_get_affinity_mask(msix_vec->irq); ^~~~~~~~~~~~~~~~~~~~~ irq_create_affinity_masks drivers/infiniband/hw/i40iw/i40iw_verbs.c:2747:9: error: returning 'int' from a function with return type 'const struct cpumask *' makes pointer from integer without a cast [-Werror=int-conversion] return irq_get_affinity_mask(msix_vec->irq); Fixes: `7e952b19eb` ("i40iw: Implement get_vector_affinity API") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2018-03-14 16:09:56 -04:00
Shiraz Saleem	7e952b19eb	i40iw: Implement get_vector_affinity API Storage ULPs (like NVMEoF) benefit from exposing affinity mapping per completion vector to find the optimal multi-queue affinity assignments. The ULPs call the verbs API ib_get_vector_affinity introduced in commit `c66cd353bb` ("RDMA/core: expose affinity mappings per completion vector") to get the underlying devices affinity mappings. Add support in driver to expose the affinity masks per MSI-X completion vector. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-03-06 16:00:51 -07:00
Shiraz Saleem	7de8b3576a	i40iw: Improve CM node lookup time on connection setup Currently all CM nodes involved in a connection are maintained in a connected_node list per dev. During connection setup, we need to search this every time we receive a packet on the iWARP LAN Queue (ILQ) and this can be pretty inefficient for large number of connections. Fix this by organizing the CM nodes in two lists - accelerated list and non-accelerated list. The search on ILQ receive would be limited to only non accelerated nodes. When a node moves to RTS, it is added to the accelerated list. Benchmarking ucmatose 16k connections shows a 20% improvement in test completion time. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-03-06 16:00:51 -07:00
Mustafa Ismail	6b0c549fc6	i40iw: Refactor handling of txpend list Currently the TX pending lists for IEQ and ILQ are handled separately. The handling of both can be consolidated in i40iw_poll_completion. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-03-06 16:00:51 -07:00
Mustafa Ismail	f20d429511	i40iw: Free IEQ resources The iWARP Exception Queue (IEQ) resources are not freed when a QP is destroyed. Fix this by freeing IEQ resources when freeing QP resources. Fixes: `d374984179` ("i40iw: add files for iwarp interface") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-01-16 20:38:18 -07:00
Mustafa Ismail	ebb6c0c015	i40iw: Remove setting of rem_addr.len Remove setting of rem_addr.len before calling iw_rdma_write, iw_inline_rdma_write and rdma_read. rem_addr.len is not used in those functions. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-01-16 20:38:18 -07:00
Sindhu Devale	72b30e986d	i40iw: Remove limit on re-posting AEQ entries to HW Currently, if the number of processed Asynchronous Event Queue (AEQ) entries exceeds 255, they are not returned to HW for re-use. During scale-up, the unreturned AEQ entries can grow to the max AEQ size and cause the HW to report an AEQ overflow. Remove the check which limits the number of processed AEQ entries returned to HW. Fixes: `86dbcd0f12` ("RDMA/i40iw: add file to handle cqp calls") Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-01-16 20:38:18 -07:00
Shiraz Saleem	6376e926af	i40iw: Zero-out consumer key on allocate stag for FMR If the application invalidates the MR before the FMR WR, HW parses the consumer key portion of the stag and returns an invalid stag key Asynchronous Event (AE) that tears down the QP. Fix this by zeroing-out the consumer key portion of the allocated stag returned to application for FMR. Fixes: ee855d3b93f3 ("RDMA/i40iw: Add base memory management extensions") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-01-16 20:38:18 -07:00
Shiraz Saleem	23541b28e5	i40iw: Remove extra call to i40iw_est_sd() Remove redundant estimate SD function call. sd_needed should already be updated at the end of the do while resource reduction loop. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-01-16 20:38:18 -07:00
Jia-Ju Bai	106b886306	i40iw: Replace mdelay with msleep in i40iw_wait_pe_ready i40iw_wait_pe_ready is not called in an interrupt handler nor holding a spinlock. The function mdelay in it can be replaced with msleep, to reduce busy wait. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2018-01-05 13:47:29 -05:00
Jason Gunthorpe	76a895d9e1	Merge branch 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git Patches for 4.16 that are dependent on patches sent to 4.15-rc. These are small clean ups for the vmw_pvrdma and i40iw drivers. * 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git: RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file i40iw: Change accelerated flag to bool	2017-12-27 21:50:46 -07:00
Henry Orosco	66131e005e	i40iw: Change accelerated flag to bool The accelerated flag only utilizes two values: 0 and 1. Modify accelerated flag in struct i40iw_cm_node to bool. Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2017-12-27 21:37:26 -07:00
Tatyana Nikolova	fefa06811c	i40iw: Fix the connection ORD value for loopback The accepting QP ORD value should be adjusted not to exceed the peer QP IRD value (RFC 6581). This is skipped for loopback. After the ORD is validated by i40iw_record_ird_ord(), adjust the ORD value of the loopback accepting QP to prevent overrunning the IRD space of the peer QP. Also move the ORD accounting for 0-byte RDMA read to i40iw_record_ird_ord(). Fixes: `f27b4746f3` ("i40iw: add connection management code") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2017-12-22 13:46:11 -07:00
Tatyana Nikolova	ce9ce74145	i40iw: Validate correct IRD/ORD connection parameters Casting to u16 before validating IRD/ORD connection parameters could cause recording wrong IRD/ORD values in the cm_node. Validate the IRD/ORD parameters as they are passed by the application before recording them. Fixes: `f27b4746f3` ("i40iw: add connection management code") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2017-12-22 13:39:48 -07:00
Shiraz Saleem	756f77d216	i40iw: Ignore LLP_DOUBT_REACHABILITY AE The LLP_DOUBT_REACHABILITY Asynchronous Event (AE) is an early warning of a connection issue. It is followed by LLP_TOO_MANY_RETRIES AE, if the retransmit threshold is reached and recovery is not possible for the connection. Currently we terminate the connection on receiving the LLP_DOUBT_REACHABILITY AE. Ignore this AE and terminate the connection only on LLP_TOO_MANY_RETRIES AE. This improves the user experience on cable disconnect/reconnect scenario while running iWARP traffic. On cable disconnect, the QP traffic is paused and the user has a larger and more reasonable timeout within which if the cable is reconnected, traffic can continue. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2017-12-22 13:39:21 -07:00
Shiraz Saleem	df8b13a1b2	i40iw: Fix sequence number for the first partial FPDU Partial FPDU processing is broken as the sequence number for the first partial FPDU is wrong due to incorrect Q2 buffer offset. The offset should be 64 rather than 16. Fixes: `786c6adb3a` ("i40iw: add puda code") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2017-12-22 13:39:20 -07:00
Shiraz Saleem	3020f252c3	i40iw: Selectively teardown QPs on IP addr change event On IP address change event, all connected QPs are torn down irrespective of whether IP address is involved in a connection. Only teardown connections those source or destination address matches the netdev interface IP address being changed, and if they are on the same VLAN as the netdev. Fixes: `e5e74b61b1` ("i40iw: Add IP addr handling on netdev events") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2017-12-22 13:39:07 -07:00
Shiraz Saleem	0c5d515546	i40iw: Add notifier for network device events Register a netdevice notifier for netdev UP/DOWN notification events and report the appropriate ib event. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2017-12-22 13:38:05 -07:00
Shiraz Saleem	fe99afd1fe	i40iw: Correct Q1/XF object count equation Lower Inbound RDMA Read Queue (Q1) object count by a factor of 2 as it is incorrectly doubled. Also, round up Q1 and Transmit FIFO (XF) object count to power of 2 to satisfy hardware requirement. Fixes: `86dbcd0f12` ("i40iw: add file to handle cqp calls") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2017-12-22 13:38:05 -07:00
Shiraz Saleem	8758768ad8	i40iw: Use utility function roundup_pow_of_two() Consolidate all power of 2 round calculations to use kernel utility function roundup_pow_of_two(). Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2017-12-22 13:37:51 -07:00
Shiraz Saleem	f32b766cf7	i40iw: Set MAX_IRD_SIZE to 64 Increase I40IW_MAX_IRD_SIZE to 64 which is the device limit. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2017-12-22 13:33:30 -07:00

253 Commits