OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Kashyap Desai	830f93f470	RDMA/bnxt_re: optimize the parameters passed to helper functions Avoid passing arguments like Opcode which can be retrieved from bnxt_qplib_crsqe structure. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-18-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:54 +03:00
Kashyap Desai	bcfee4ce3e	RDMA/bnxt_re: remove redundant cmdq_bitmap cmdq_bitmap is used to derive the next available index in the CMDQ. This is not required as the we can get the next index using the existing bnxt_qplib_crsqe array. Driver will use bnxt_qplib_crsqe array and flag is_in_used to derive valid entries. is_in_used is replacement of cmdq_bitmap. There is no change in the existing mechanism of the circular buffer used to get index. Added opcode field in bnxt_qplib_crsqe array so that it is easy to map opcode associated with pending rcfw command. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-17-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:54 +03:00
Kashyap Desai	f0c875ff62	RDMA/bnxt_re: use firmware provided max request timeout Firmware provides max request timeout value as part of hwrm_ver_get API. Driver gets the timeout from firmware and if that interface is not available then fall back to hardcoded timeout value. Also, Add a helper function to check the FW status. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-16-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:54 +03:00
Kashyap Desai	a00278521c	RDMA/bnxt_re: cancel all control path command waiters upon error When an error is detected in FW, wake up all the waiters as the all of them need to be completed with timeout. Add the device error state also as a wait condition. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-15-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:51 +03:00
Kashyap Desai	bb8c93618f	RDMA/bnxt_re: consider timeout of destroy ah as success. If destroy_ah is timed out, it is likely to be destroyed by firmware but it is taking longer time due to temporary slowness in processing the rcfw command. In worst case, there might be AH resource leak in firmware. Sending timeout return value can dump warning message from ib_core which can be avoided if we map timeout of destroy_ah as success. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-14-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:17 +03:00
Kashyap Desai	84911cf3b2	RDMA/bnxt_re: post destroy_ah for delayed completion of AH creation AH create may be called from interrpt context and driver has a special timeout (8 sec) for this command. This is to avoid soft lockups when the FW command takes more time. Driver returns -ETIMEOUT and fail create AH, without waiting for actual completion from firmware. When FW completion is received, use is_waiter_alive flag to avoid a regular completion path. If create_ah opcode is detected in completion path which does not have waiter alive, driver will fetch ah_id from successful firmware completion in the interrupt context and sends destroy_ah command for same ah_id. This special post is done in quick manner using helper function __send_message_no_waiter. timeout_send is only used for debugging purposes. If timeout_send value keeps incrementing, it indicates out of sync active ah counter between driver and firmware. This is a limitation but graceful handling is possible in future. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-13-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:17 +03:00
Kashyap Desai	b6c7256688	RDMA/bnxt_re: Add firmware stall check detection Every completion will update last_seen value in the unit of jiffies. last_seen field will be used to know if firmware is alive and is useful to detect firmware stall. Non blocking interface __wait_for_resp will have logic to detect firmware stall. After every 10 second interval if __wait_for_resp has not received completion for a given command it will check for firmware stall condition. If current jiffies is greater than last_seen jiffies + RCFW_FW_STALL_TIMEOUT_SEC * HZ, it is a firmware stall. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-12-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:17 +03:00
Kashyap Desai	691eb7c611	RDMA/bnxt_re: handle command completions after driver detect a timedout If calling context detect command timeout, associated memory stored on stack will not be valid. If firmware complete the same command later, this causes incorrect memory access by driver. Added is_waiter_alive to handle delayed completion by firmware. is_waiter_alive is set and reset under command queue lock. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-11-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:17 +03:00
Kashyap Desai	354f5bd985	RDMA/bnxt_re: add helper function __poll_for_resp This interface will be used if the driver has not enabled interrupt and/or interrupt is disabled for a short period of time. Completion is not possible from interrupt so this interface does self-polling. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-10-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:17 +03:00
Kashyap Desai	159cf95e42	RDMA/bnxt_re: Simplify the function that sends the FW commands - Use __send_message_basic_sanity helper function. - Do not retry posting same command if there is a queue full detection. - ENXIO is used to indicate controller recovery. - In the case of ERR_DEVICE_DETACHED state, the driver should not post commands to the firmware, but also return fabricated written code. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-9-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:17 +03:00
Kashyap Desai	65288a22dd	RDMA/bnxt_re: use shadow qd while posting non blocking rcfw command Whenever there is a fast path IO and create/destroy resources from the slow path is happening in parallel, we may notice high latency of slow path command completion. Introduces a shadow queue depth to prevent the outstanding requests to the FW. Driver will not allow more than #RCFW_CMD_NON_BLOCKING_SHADOW_QD non-blocking commands to the Firmware. Shadow queue depth is a soft limit only for non-blocking commands. Blocking commands will be posted to the firmware as long as there is a free slot. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-8-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:17 +03:00
Kashyap Desai	3022cc1511	RDMA/bnxt_re: Avoid the command wait if firmware is inactive Add a check to avoid waiting if driver already detects a FW timeout. Return success for resource destroy in case the device is detached. Add helper function to map timeout error code to success. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-7-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:16 +03:00
Kashyap Desai	8cf1d12ad5	RDMA/bnxt_re: Enhance the existing functions that wait for FW responses Use jiffies based timewait instead of counting iteration for commands that block for FW response. Also add a poll routine for control path commands. This is for polling completion if the waiting commands timeout. This avoids cases where the driver misses completion interrupts. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:16 +03:00
Kashyap Desai	258ee04317	RDMA/bnxt_re: set fixed command queue depth There is no need of setting max command queue entries based on firmware version check. Removing deperecated code. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:16 +03:00
Kashyap Desai	b021186bca	RDMA/bnxt_re: remove virt_func check while creating RoCE FW channel There is a common FW communication offset for both PF and VF. Removed code around virt_fn check while creating FW channel. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:16 +03:00
Kashyap Desai	3099bcdc19	RDMA/bnxt_re: Avoid calling wake_up threads from spin_lock context bnxt_qplib_service_creq can be called from interrupt or tasklet or process context. So the function take irq variant of spin_lock. But when wake_up is invoked with the lock held, it is putting the calling context to sleep. [exception RIP: __wake_up_common+190] RIP: ffffffffb7539d7e RSP: ffffa73300207ad8 RFLAGS: 00000083 RAX: 0000000000000001 RBX: ffff91fa295f69b8 RCX: dead000000000200 RDX: ffffa733344af940 RSI: ffffa73336527940 RDI: ffffa73336527940 RBP: 000000000000001c R8: 0000000000000002 R9: 00000000000299c0 R10: 0000017230de82c5 R11: 0000000000000002 R12: ffffa73300207b28 R13: 0000000000000000 R14: ffffa733341bf928 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Call the wakeup after releasing the lock. Fixes: `1ac5a40479` ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:16 +03:00
Kashyap Desai	0af91306e1	RDMA/bnxt_re: wraparound mbox producer index Driver is not handling the wraparound of the mbox producer index correctly. Currently the wraparound happens once u32 max is reached. Bit 31 of the producer index register is special and should be set only once for the first command. Because the producer index overflow setting bit31 after a long time, FW goes to initialization sequence and this causes FW hang. Fix is to wraparound the mbox producer index once it reaches u16 max. Fixes: `cee0c7bba4` ("RDMA/bnxt_re: Refactor command queue management code") Fixes: `1ac5a40479` ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1686308514-11996-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-12 10:10:16 +03:00
Cheng Xu	3b3dfd58ba	RDMA/erdma: Refactor the original doorbell allocation mechanism The original doorbell allocation mechanism is complex and does not meet the isolation requirement. So we introduce a new doorbell mechanism and the original mechanism (only be used with CAP_SYS_RAWIO if hardware does not support the new mechanism) needs to be kept as simple as possible for compatibility. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230606055005.80729-5-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-11 11:57:01 +03:00
Cheng Xu	6534de1fe3	RDMA/erdma: Associate QPs/CQs with doorbells for authorization For the isolation requirement, each QP/CQ can only issue doorbells from the allocated mmio space. Configure the relationship between QPs/CQs and mmio doorbell spaces to hardware in create_qp/create_cq interfaces. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230606055005.80729-4-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-11 11:57:00 +03:00
Cheng Xu	7e9a1dada2	RDMA/erdma: Allocate doorbell resources from hardware Each ucontext will try to allocate doorbell resources in the extended bar space from hardware. For compatibility, we change nothing for the original bar space, and it will be used only for applications with CAP_SYS_RAWIO authority in the older HW/FW environments. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230606055005.80729-3-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-11 11:57:00 +03:00
Cheng Xu	128f840430	RDMA/erdma: Configure PAGE_SIZE to hardware Add a new CMDQ message to configure hardware. Initially the page size (in the format of shift) will be passed to hardware, so that hardware can organize the mmio space properly. It's called only if hardware supports it. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20230606055005.80729-2-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-11 11:57:00 +03:00
Patrisious Haddad	22664c06e9	RDMA/mlx5: Return the firmware result upon destroying QP/RQ Previously when destroying a QP/RQ, the result of the firmware destruction function was ignored and upper layers weren't informed about the failure. Which in turn could lead to various problems since when upper layer isn't aware of the failure it continues its operation thinking that the related QP/RQ was successfully destroyed while it actually wasn't, which could lead to the below kernel WARN. Currently, we return the correct firmware destruction status to upper layers which in case of the RQ would be mlx5_ib_destroy_wq() which was already capable of handling RQ destruction failure or in case of a QP to destroy_qp_common(), which now would actually warn upon qp destruction failure. WARNING: CPU: 3 PID: 995 at drivers/infiniband/core/rdma_core.c:940 uverbs_destroy_ufile_hw+0xcb/0xe0 [ib_uverbs] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core fuse CPU: 3 PID: 995 Comm: python3 Not tainted 5.16.0-rc5+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:uverbs_destroy_ufile_hw+0xcb/0xe0 [ib_uverbs] Code: 41 5c 41 5d 41 5e e9 44 34 f0 e0 48 89 df e8 4c 77 ff ff 49 8b 86 10 01 00 00 48 85 c0 74 a1 4c 89 e7 ff d0 eb 9a 0f 0b eb c1 <0f> 0b be 04 00 00 00 48 89 df e8 b6 f6 ff ff e9 75 ff ff ff 90 0f RSP: 0018:ffff8881533e3e78 EFLAGS: 00010287 RAX: ffff88811b2cf3e0 RBX: ffff888106209700 RCX: 0000000000000000 RDX: ffff888106209780 RSI: ffff8881533e3d30 RDI: ffff888109b101a0 RBP: 0000000000000001 R08: ffff888127cb381c R09: 0de9890000000009 R10: ffff888127cb3800 R11: 0000000000000000 R12: ffff888106209780 R13: ffff888106209750 R14: ffff888100f20660 R15: 0000000000000000 FS: 00007f8be353b740(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8bd5b117c0 CR3: 000000012cd8a004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ib_uverbs_close+0x1a/0x90 [ib_uverbs] __fput+0x82/0x230 task_work_run+0x59/0x90 exit_to_user_mode_prepare+0x138/0x140 syscall_exit_to_user_mode+0x1d/0x50 ? __x64_sys_close+0xe/0x40 do_syscall_64+0x4a/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f8be3ae0abb Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 83 43 f9 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 c1 43 f9 ff 8b 44 RSP: 002b:00007ffdb51909c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000557bb7f7c020 RCX: 00007f8be3ae0abb RDX: 0000557bb7c74010 RSI: 0000557bb7f14ca0 RDI: 0000000000000005 RBP: 0000557bb7fbd598 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000293 R12: 0000557bb7fbd5b8 R13: 0000557bb7fbd5a8 R14: 0000000000001000 R15: 0000557bb7f7c020 </TASK> Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/c6df677f931d18090bafbe7f7dbb9524047b7d9b.1685953497.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-11 11:21:46 +03:00
Patrisious Haddad	afff248998	RDMA/mlx5: Handle DCT QP logic separately from low level QP interface Previously when destroying a DCT, if the firmware function for the destruction failed, the common resource would have been destroyed either way, since it was destroyed before the firmware object. Which leads to kernel warning "refcount_t: underflow" which indicates possible use-after-free. Which is triggered when we try to destroy the common resource for the second time and execute refcount_dec_and_test(&common->refcount). So, let's fix the destruction order by factoring out the DCT QP logic to be in separate XArray database. refcount_t: underflow; use-after-free. WARNING: CPU: 8 PID: 1002 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0 Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core fuse CPU: 8 PID: 1002 Comm: python3 Not tainted 5.16.0-rc5+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:refcount_warn_saturate+0xd8/0xe0 Code: ff 48 c7 c7 18 f5 23 82 c6 05 60 70 ff 00 01 e8 d0 0a 45 00 0f 0b c3 48 c7 c7 c0 f4 23 82 c6 05 4c 70 ff 00 01 e8 ba 0a 45 00 <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13 RSP: 0018:ffff8881221d3aa8 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff8881313e8d40 RCX: ffff88852cc1b5c8 RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff88852cc1b5c0 RBP: ffff888100f70000 R08: ffff88853ffd1ba8 R09: 0000000000000003 R10: 00000000fffff000 R11: 3fffffffffffffff R12: 0000000000000246 R13: ffff888100f71fa0 R14: ffff8881221d3c68 R15: 0000000000000020 FS: 00007efebbb13740(0000) GS:ffff88852cc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005611aac29f80 CR3: 00000001313de004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> destroy_resource_common+0x6e/0x95 [mlx5_ib] mlx5_core_destroy_rq_tracked+0x38/0xbe [mlx5_ib] mlx5_ib_destroy_wq+0x22/0x80 [mlx5_ib] ib_destroy_wq_user+0x1f/0x40 [ib_core] uverbs_free_wq+0x19/0x40 [ib_uverbs] destroy_hw_idr_uobject+0x18/0x50 [ib_uverbs] uverbs_destroy_uobject+0x2f/0x190 [ib_uverbs] uobj_destroy+0x3c/0x80 [ib_uverbs] ib_uverbs_cmd_verbs+0x3e4/0xb80 [ib_uverbs] ? uverbs_free_wq+0x40/0x40 [ib_uverbs] ? ip_list_rcv+0xf7/0x120 ? netif_receive_skb_list_internal+0x1b6/0x2d0 ? task_tick_fair+0xbf/0x450 ? __handle_mm_fault+0x11fc/0x1450 ib_uverbs_ioctl+0xa4/0x110 [ib_uverbs] __x64_sys_ioctl+0x3e4/0x8e0 ? handle_mm_fault+0xb9/0x210 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7efebc0be17b Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe71813e78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffe71813fb8 RCX: 00007efebc0be17b RDX: 00007ffe71813fa0 RSI: 00000000c0181b01 RDI: 0000000000000005 RBP: 00007ffe71813f80 R08: 00005611aae96020 R09: 000000000000004f R10: 00007efebbf9ffa0 R11: 0000000000000246 R12: 00007ffe71813f80 R13: 00007ffe71813f4c R14: 00005611aae2eca0 R15: 00007efeae6c89d0 </TASK> Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4470888466c8a898edc9833286967529cc5f3c0d.1685953497.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-11 11:21:40 +03:00
Leon Romanovsky	2ecfd94616	RDMA/mlx5: Reduce QP table exposure driver.h is common header to whole mlx5 code base, but struct mlx5_qp_table is used in mlx5_ib driver only. So move that struct to be under sole responsibility of mlx5_ib. Link: https://lore.kernel.org/r/bec0dc1158e795813b135d1143147977f26bf668.1685953497.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2023-06-11 11:21:28 +03:00
Patrisious Haddad	c023b61ac8	net/mlx5: Nullify qp->dbg pointer post destruction Nullifying qp->dbg is a preparation for the next patches from the series in which mlx5_core_destroy_qp() could actually fail, and then it can be called again which causes a kernel crash, since qp->dbg was not nullified in previous call. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/1677e52bb642fd8d6062d73a5aa69083c0283dc9.1685953497.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2023-06-11 11:21:23 +03:00
Bryan Tan	7ad697cdd3	RDMA/vmw_pvrdma: Remove unnecessary check on wr->opcode wr->opcode is unsigned; checking if it is negative is unnecessary. Fix this issue by removing the check. Fixes: `29c8d9eba5` ("IB: Add vmw_pvrdma driver") Link: https://lore.kernel.org/r/20230605183728.47021-1-bryantan@vmware.com Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/07c48eee-0aca-48ee-897a-38588c341c41@kili.mountain Signed-off-by: Bryan Tan <bryantan@vmware.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-09 14:11:53 -03:00
Bob Pearson	c3e1bf626e	RDMA/rxe: Send last wqe reached event on qp cleanup The IBA requires: o11-5.2.5: If the HCA supports SRQ, for RC and UD service, the CI shall generate a Last WQE Reached Affiliated Asynchronous Event on a QP that is in the Error State and is associated with an SRQ when either: • a CQE is generated for the last WQE, or • the QP gets in the Error State and there are no more WQEs on the RQ. This patch implements this behavior in flush_recv_queue() which is called as a result of rxe_qp_error() being called whenever the qp is put into the error state. The rxe responder executes SRQ WQEs directly from the SRQ so there are never more WQES on the RQ. Link: https://lore.kernel.org/r/20230602164229.9277-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-09 14:06:23 -03:00
Bob Pearson	544c7f62cf	RDMA/rxe: Implement rereg_user_mr Implement the two easy cases of ib_rereg_user_mr. Link: https://lore.kernel.org/r/20230530221334.89432-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-09 13:18:53 -03:00
Bob Pearson	86a3fb55bc	RDMA/rxe: Let rkey == lkey for local access In order to conform to other drivers stop using rkey == 0 as an indication that there are no remote access flags set. Set rkey == lkey by default for all MRs. Link: https://lore.kernel.org/r/20230530221334.89432-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-09 13:18:53 -03:00
Bob Pearson	02ed253770	RDMA/rxe: Introduce rxe access supported flags Introduce supported bit masks for setting the access attributes of MWs, MRs, and QPs. Check these when attributes are set. Link: https://lore.kernel.org/r/20230530221334.89432-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-09 13:18:53 -03:00
Bob Pearson	425e1c9018	RDMA/rxe: Fix access checks in rxe_check_bind_mw The subroutine rxe_check_bind_mw() in rxe_mw.c performs checks on the mw access flags before they are set so they always succeed. This patch instead checks the access flags passed in the send wqe. Fixes: `32a577b4c3` ("RDMA/rxe: Add support for bind MW work requests") Link: https://lore.kernel.org/r/20230530221334.89432-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-09 13:18:52 -03:00
Bob Pearson	2a129958bd	RDMA//rxe: Optimize send path in rxe_resp.c Bypass calling check_rkey() in rxe_resp.c for non-rdma messages. Link: https://lore.kernel.org/r/20230530221334.89432-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-09 13:18:52 -03:00
Bob Pearson	d11442c6bd	RDMA/rxe: Rename IB_ACCESS_REMOTE Rename IB_ACCESS_REMOTE to RXE_ACCESS_REMOTE and move to rxe_verbs.h as an enum instead of a #define. Shouldn't use IB_xxx for rxe symbols. Link: https://lore.kernel.org/r/20230530221334.89432-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-09 13:18:52 -03:00
Chengchang Tang	a519a612a7	RDMA/hns: Add clear_hem return value to log Log return value of clear_hem() to help diagnose. Link: https://lore.kernel.org/r/20230523121641.3132102-4-huangjunxian6@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-01 19:58:47 -03:00
Chengchang Tang	cf5b608fb0	RDMA/hns: Fix hns_roce_table_get return value The return value of set_hem has been fixed to ENODEV, which will lead a diagnostic information missing. Fixes: `9a4435375c` ("IB/hns: Add driver files for hns RoCE driver") Link: https://lore.kernel.org/r/20230523121641.3132102-3-huangjunxian6@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-01 19:58:47 -03:00
Junxian Huang	b9989ab3f6	RDMA/hns: Remove unnecessary QP type checks It is not necessary to check the type of the queue on IO path because unsupported QP type cannot be created. Link: https://lore.kernel.org/r/20230523121641.3132102-2-huangjunxian6@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-01 19:58:47 -03:00
Brendan Cunningham	95ea2efbd6	IB/hfi1: Remove unused struct mmu_rb_ops fields .insert, .invalidate The struct mmu_rb_ops function pointers .insert, .invalidate were only used to increment and decrement struct sdma_mmu_node.refcount. With the deletion of struct sdma_mmu_node.refcount and the addition of struct mmu_rb_node.refcount these function pointers are not called and there are no implementations of them. So it is safe to delete these from struct mmu_rb_ops. Link: https://lore.kernel.org/r/168451526508.3702129.8677714753157495310.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-01 14:45:58 -03:00
Brendan Cunningham	e236e2eae5	IB/hfi1: Add mmu_rb_node refcount to hfi1_mmu_rb_template tracepoints Add kref_read() of mmu_rb_node.refcount in hfi1_mmu_rb_template-type tracepoint output. Change hfi1_mmu_rb_template tracepoint to take a struct mmu_rb_node* and record the values it needs from that. This makes the trace_hfi1_mmu() calls shorter and easier to read. Add hfi1_mmu_release_node() tracepoint before all mmu_rb_node->handler->ops->remove() calls. Make hfi1_mmu_rb_search() tracepoint its own tracepoint type separate from hfi1_mmu_rb_template since hfi1_mmu_rb_search() does not take a struct mmu_rb_node. Link: https://lore.kernel.org/r/168451525987.3702129.12824880387615916700.stgit@awfm-02.cornelisnetworks.com Reviewed-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-01 14:45:57 -03:00
Brendan Cunningham	c9358de193	IB/hfi1: Fix wrong mmu_node used for user SDMA packet after invalidate The hfi1 user SDMA pinned-page cache will leave a stale cache entry when the cache-entry's virtual address range is invalidated but that cache entry is in-use by an outstanding SDMA request. Subsequent user SDMA requests with buffers in or spanning the virtual address range of the stale cache entry will result in packets constructed from the wrong memory, the physical pages pointed to by the stale cache entry. To fix this, remove mmu_rb_node cache entries from the mmu_rb_handler cache independent of the cache entry's refcount. Add 'struct kref refcount' to struct mmu_rb_node and manage mmu_rb_node lifetime with kref_get() and kref_put(). mmu_rb_node.refcount makes sdma_mmu_node.refcount redundant. Remove 'atomic_t refcount' from struct sdma_mmu_node and change sdma_mmu_node code to use mmu_rb_node.refcount. Move the mmu_rb_handler destructor call after a wait-for-SDMA-request-completion call so mmu_rb_nodes that need mmu_rb_handler's workqueue to queue themselves up for destruction from an interrupt context may do so. Fixes: `f48ad614c1` ("IB/hfi1: Move driver out of staging") Fixes: `00cbce5cbf` ("IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests") Link: https://lore.kernel.org/r/168451393605.3700681.13493776139032178861.stgit@awfm-02.cornelisnetworks.com Reviewed-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-01 14:38:00 -03:00
Li Zhijian	1cc625cecc	RDMA/rtrs: Remove duplicate cq_num assignment line 1701 and 1713 are duplicate: > 1701 cq_num = max_send_wr + max_recv_wr; 1702 /* alloc iu to recv new rkey reply when server reports flags set / 1703 if (clt_path->flags & RTRS_MSG_NEW_RKEY_F \|\| con->c.cid == 0) { 1704 con->rsp_ius = rtrs_iu_alloc(cq_num, sizeof(rsp), 1705 GFP_KERNEL, 1706 clt_path->s.dev->ib_dev, 1707 DMA_FROM_DEVICE, 1708 rtrs_clt_rdma_done); 1709 if (!con->rsp_ius) 1710 return -ENOMEM; 1711 con->queue_num = cq_num; 1712 } > 1713 cq_num = max_send_wr + max_recv_wr; Remove the duplicate. Link: https://lore.kernel.org/r/1682384563-2-2-git-send-email-lizhijian@fujitsu.com Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-01 13:11:41 -03:00
Nicolas Morey	84510a61ef	RDMA/rxe: Remove dangling declaration of rxe_cq_disable() rxe_cq_disable() has been removed but not its declaration. Fixes: `78b26a3353` ("RDMA/rxe: Remove tasklet call from rxe_cq.c") Link: https://lore.kernel.org/r/4f20ffc5-b2c4-0c11-2883-a835caf01a94@suse.com Signed-off-by: Nicolas Morey <nmorey@suse.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-01 12:59:32 -03:00
Arnd Bergmann	b002760f87	RDMA/irdma: avoid fortify-string warning in irdma_clr_wqes Commit `df8fc4e934` ("kbuild: Enable -fstrict-flex-arrays=3") triggers a warning for fortified memset(): In function 'fortify_memset_chk', inlined from 'irdma_clr_wqes' at drivers/infiniband/hw/irdma/uk.c:103:4: include/linux/fortify-string.h:493:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 493 \| __write_overflow_field(p_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The problem here isthat the inner array only has four 8-byte elements, so clearing 4096 bytes overflows that. As this structure is part of an outer array, change the code to pass a pointer to the irdma_qp_quanta instead, and change the size argument for readability, matching the comment above it. Fixes: `551c46edc7` ("RDMA/irdma: Add user/kernel shared libraries") Link: https://lore.kernel.org/r/20230523111859.2197825-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-01 12:59:12 -03:00
Long Li	2145328515	RDMA/mana_ib: Use v2 version of cfg_rx_steer_req to enable RX coalescing With RX coalescing, one CQE entry can be used to indicate multiple packets on the receive queue. This saves processing time and PCI bandwidth over the CQ. The MANA Ethernet driver also uses the v2 version of the protocol. It doesn't use RX coalescing and its behavior is not changed. Link: https://lore.kernel.org/r/1684045095-31228-1-git-send-email-longli@linuxonhyperv.com Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-06-01 12:52:01 -03:00
Kalesh AP	8c1ee346da	RDMA/bnxt_re: Remove unnecessary checks The NULL check inside bnxt_qplib_del_sgid() and bnxt_qplib_add_sgid() always return false as the "sgid_tbl" inside "rdev->qplib_res" is a static memory. Fixes: `1ac5a40479` ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://lore.kernel.org/r/1684478897-12247-8-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-05-19 13:26:19 -03:00
Kalesh AP	07d5ce14b2	RDMA/bnxt_re: Return directly without goto jumps When there is no cleanup to be done, return directly. This will help eliminating unnecessary local variables and goto labels. This patch fixes such occurrences in qplib_fp.c file. Fixes: `37cb11acf1` ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Fixes: `159fb4ceac` ("RDMA/bnxt_re: introduce a function to allocate swq") Link: https://lore.kernel.org/r/1684478897-12247-7-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-05-19 13:26:19 -03:00
Kalesh AP	43774bc156	RDMA/bnxt_re: Fix to remove an unnecessary log During destroy_qp, driver sets the qp handle in the existing CQEs belonging to the QP being destroyed to NULL. As a result, a poll_cq after destroy_qp can report unnecessary messages. Remove this noise from system logs. Fixes: `1ac5a40479` ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://lore.kernel.org/r/1684478897-12247-6-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-05-19 13:26:18 -03:00
Kalesh AP	b989f90cef	RDMA/bnxt_re: Remove a redundant check inside bnxt_re_update_gid The NULL check inside bnxt_re_update_gid() always return false. If sgid_tbl->tbl is not allocated, then dev_init would have failed. Fixes: `5fac5b1b29` ("RDMA/bnxt_re: Add vlan tag for untagged RoCE traffic when PFC is configured") Link: https://lore.kernel.org/r/1684478897-12247-5-git-send-email-selvin.xavier@broadcom.com Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-05-19 13:26:18 -03:00
Kalesh AP	ff2e4bfd16	RDMA/bnxt_re: Use unique names while registering interrupts bnxt_re currently uses the names "bnxt_qplib_creq" and "bnxt_qplib_nq-0" while registering IRQs. There is no way to distinguish the IRQs of different device ports when there are multiple IB devices registered. This could make the scenarios worse where one want to pin IRQs of a device port to certain CPUs. Fixed the code to use unique names which has PCI BDF information while registering interrupts like: "bnxt_re-nq-0@pci:0000:65:00.0" and "bnxt_re-creq@pci:0000:65:00.1". Fixes: `1ac5a40479` ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://lore.kernel.org/r/1684478897-12247-4-git-send-email-selvin.xavier@broadcom.com Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-05-19 13:26:18 -03:00
Kalesh AP	9b3ee47796	RDMA/bnxt_re: Fix to remove unnecessary return labels If there is no cleanup needed then just return directly. This cleans up the code and improve readability. Fixes: `1ac5a40479` ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://lore.kernel.org/r/1684478897-12247-3-git-send-email-selvin.xavier@broadcom.com Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-05-19 13:26:18 -03:00
Selvin Xavier	ab112ee789	RDMA/bnxt_re: Disable/kill tasklet only if it is enabled When the ulp hook to start the IRQ fails because the rings are not available, tasklets are not enabled. In this case when the driver is unloaded, driver calls CREQ tasklet_kill. This causes an indefinite hang as the tasklet is not enabled. Driver shouldn't call tasklet_kill if it is not enabled. So using the creq->requested and nq->requested flags to identify if both tasklets/irqs are registered. Checking this flag while scheduling the tasklet from ISR. Also, added a cleanup for disabling tasklet, in case request_irq fails during start_irq. Check for return value for bnxt_qplib_rcfw_start_irq and in case the bnxt_qplib_rcfw_start_irq fails, return bnxt_re_start_irq without attempting to start NQ IRQs. Fixes: `1ac5a40479` ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Link: https://lore.kernel.org/r/1684478897-12247-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2023-05-19 13:26:18 -03:00

1 2 3 4 5 ...

1185398 Commits All Branches Search

1185398 Commits

All Branches