OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Artemy Kovalyov	403cd12e2c	IB/umem: Add contiguous ODP support Currenlty ODP supports only regular MMU pages. Add ODP support for regions consisting of physically contiguous chunks of arbitrary order (huge pages for instance) to improve performance. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 15:40:28 -04:00
Artemy Kovalyov	4df4a5bac3	IB/mlx5: Decrease verbosity level of ODP errors Decrease verbosity level of ODP error flows messages to debug level. Remove one redundant print since debug level message already exists in this flow. Fixes: `d9aaed8387` ('{net,IB}/mlx5: Refactor page fault handling') Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 15:40:28 -04:00
Artemy Kovalyov	523791d7c5	IB/mlx5: Fix implicit MR GC When implicit MR's leaf MKey becomes unused, i.e. when it's last page being released my MMU invalidation it is marked as "dying" and scheduled for release by garbage collector. Currentle consequent page fault may remove "dying" flag. Treat leaf MKey as non-existent once it was scheduled to removal by GC. Fixes: `81713d3788` ('IB/mlx5: Add implicit MR support') Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 15:40:28 -04:00
Artemy Kovalyov	438b228e03	IB/mlx5: Fix UMR size calculation Translation table updates of large UMR may require multiple post send operations. The last operations can be in various lengths, but current code set them to be the same length. Fixes: `7d0cc6edcc` ('IB/mlx5: Add MR cache for large UMR regions') Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 15:40:28 -04:00
Artemy Kovalyov	bd174fc2ca	IB/mlx5: Fix function updating xlt emergency path In memory shortage path we fall back to use spare buffer. mlx5_ib_update_xlt() called from ib_uverbs_reg_mr when ibmr.ucontext not initialized yet. Scenario how to test it: 1. trigger memory exhaustion so __get_free_pages(GFP_KERNEL, 4) will fail 2. register MR 3. there should be no kernel oops Fixes: `7d0cc6edcc` ('IB/mlx5: Add MR cache for large UMR regions') Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 15:40:28 -04:00
Artemy Kovalyov	3e7e1193e2	IB: Replace ib_umem page_size by page_shift Size of pages are held by struct ib_umem in page_size field. It is better to store it as an exponent, because page size by nature is always power-of-two and used as a factor, divisor or ilog2's argument. The conversion of page_size to be page_shift allows to have portable code and avoid following error while compiling on ARM: ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined! CC: Selvin Xavier <selvin.xavier@broadcom.com> CC: Steve Wise <swise@chelsio.com> CC: Lijun Ou <oulijun@huawei.com> CC: Shiraz Saleem <shiraz.saleem@intel.com> CC: Adit Ranadive <aditr@vmware.com> CC: Dennis Dalessandro <dennis.dalessandro@intel.com> CC: Ram Amrani <Ram.Amrani@Cavium.com> Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Ram Amrani <Ram.Amrani@cavium.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 15:40:28 -04:00
Zhu Yanjun	8d2216be28	IB/core: change the return type to void The function ib_unregister_mad_agent always returns zero. And this returned value is not checked. As such, chane the return type to void. CC: Joe Jin <joe.jin@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 15:30:26 -04:00
Selvin Xavier	2c15b73ab4	MAINTAINERS: Update ocrdma module status Since ocrdma driver is not going to be updated with any new development activity, except for critical bug fixes reported by partners or customers, changing the module status to "Odd Fixes". Also, updating the web page info and the maintainers email addresses. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 15:28:00 -04:00
Ira Weiny	e8ea95af87	IB/hfi: Fix up comments in engine mapping Fix off by 1 error in comments documenting the sdma and send context mappings. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 15:24:51 -04:00
Geert Uytterhoeven	3d35d32d1f	MAINTAINERS: Add file patterns for infiniband device tree bindings Submitters of device tree binding documentation may forget to CC the subsystem maintainer if this is missing. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 15:20:28 -04:00
Vlad Tsyrklevich	4f7f4dcfff	infiniband/uverbs: Fix integer overflows The 'num_sge' variable is verfied to be smaller than the 'sge_count' variable; however, since both are user-controlled it's possible to cause an integer overflow for the kmalloc multiply on 32-bit platforms (num_sge and sge_count are both defined u32). By crafting an input that causes a smaller-than-expected allocation it's possible to write controlled data out-of-bounds. Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 15:18:02 -04:00
Arnd Bergmann	5b0ff9a007	infiniband: hns: avoid gcc-7.0.1 warning for uninitialized data hns_roce_v1_cq_set_ci() calls roce_set_bit() on an uninitialized field, which will then change only a few of its bits, causing a warning with the latest gcc: infiniband/hw/hns/hns_roce_hw_v1.c: In function 'hns_roce_v1_cq_set_ci': infiniband/hw/hns/hns_roce_hw_v1.c:1854:23: error: 'doorbell[1]' is used uninitialized in this function [-Werror=uninitialized] roce_set_bit(doorbell[1], ROCEE_DB_OTHERS_H_ROCEE_DB_OTH_HW_SYNS_S, 1); The code is actually correct since we always set all bits of the port_vlan field, but gcc correctly points out that the first access does contain uninitialized data. This initializes the field to zero first before setting the individual bits. Fixes: `9a4435375c` ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 15:16:38 -04:00
Petr Mladek	50b6778c44	IB/fmr_pool: Convert the cleanup thread into kthread worker API Kthreads are currently implemented as an infinite loop. Each has its own variant of checks for terminating, freezing, awakening. In many cases it is unclear to say in which state it is and sometimes it is done a wrong way. The plan is to convert kthreads into kthread_worker or workqueues API. It allows to split the functionality into separate operations. It helps to make a better structure. Also it defines a clean state where no locks are taken, IRQs blocked, the kthread might sleep or even be safely migrated. The kthread worker API is useful when we want to have a dedicated single thread for the work. It helps to make sure that it is available when needed. Also it allows a better control, e.g. define a scheduling priority. This patch converts the frm_pool kthread into the kthread worker API because I am not sure how busy the thread is. It is well possible that it does not need a dedicated kthread and workqueues would be perfectly fine. Well, the conversion between kthread worker API and workqueues is pretty trivial. The patch moves one iteration from the kthread into the work function. It is queued only when there is a pending work. Therefore we do not need to compare flush_ser and req_ser at the beginning. On the contrary, the same work could be queued only once at a time. Therefore it has to re-queue itself if some requests are pending. Otherwise, wake_up_process() is replaced by queuing the work. Important: The change is only compile tested. I did not find an easy way how to check it in a real life. Signed-off-by: Petr Mladek <pmladek@suse.com> TO: Doug Ledford <dledford@redhat.com> CC: Sean Hefty <sean.hefty@intel.com> CC: Hal Rosenstock <hal.rosenstock@gmail.com> CC: linux-rdma@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 14:24:17 -04:00
Yuval Shaia	4d6f28591f	{net,IB}/{rxe,usnic}: Utilize generic mac to eui32 function This logic seems to be duplicated in (at least) three separate files. Move it to one place so code can be re-use. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>	2017-04-25 14:21:34 -04:00
Yuval Shaia	a7c81326ca	IB/usnic: Remove unused functions Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 14:12:14 -04:00
Colin Ian King	05a24b9b7f	IB/iser: fix spelling mistake: "unexepected" -> "unexpected" trivial fix to spelling mistake in iser_err error message Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 14:06:49 -04:00
Ganesh Goudar	e821303c42	iw_cxgb4: Use dsgl by default Enable the use of dsgl by default and determine whether dsgl is supported from lld info. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Bharat Potnuri <bharat@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 14:04:41 -04:00
Doug Ledford	374cb8610a	RDMA/bnxt_re: Use IS_ERR_OR_NULL where appropriate Constructs such as if (ptr && !IS_ERR(ptr)) can be shorted to just !IS_ERR_OR_NULL(ptr) instead. Make substitutions in the bnxt_re driver where appropriate. Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 14:00:59 -04:00
Colin Ian King	ebbd1dfb26	RDMA/bnxt_re: remove redundant initialization of rc to zero rc is initialized to zero but is then updated by calls to bnxt_qplib_free_fast_reg_page_list and/or bnxt_qpliob_free_mrw so the initialization is redundant and can be removed. Detected with CoverityScan, CID#1408448 ("Unused Value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-25 13:36:59 -04:00
Noa Osherovich	f1b65df5a2	IB/mlx5: Add support for active_width and active_speed in RoCE Add missing calculation and translation of active_width and active_speed for RoCE. Fixes: `3f89a643eb` ('IB/mlx5: Extend query_device/port to ...') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:58:41 -04:00
Noa Osherovich	50f22fd8ec	IB/mlx5: Set mlx5_query_roce_port's return value to void In case of an error, the properties reported to user are zeroed out, so no need for a return value. Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:54:51 -04:00
Noa Osherovich	12113a35ad	IB/core: Add HDR speed enum Add high data rate speed to the ib_port_speed enumeration. Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:29:31 -04:00
Moni Shoua	12f8fedef2	IB/mlx5: Set correct SL in completion for RoCE There is a difference when parsing a completion entry between Ethernet and IB ports. When link layer is Ethernet the bits describe the type of L3 header in the packet. In the case when link layer is Ethernet and VLAN header is present the value of SL is equal to the 3 UP bits in the VLAN header. If VLAN header is not present then the SL is undefined and consumer of the completion should check if IB_WC_WITH_VLAN is set. While that, this patch also fills the vlan_id field in the completion if present. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:29:31 -04:00
Moni Shoua	61c0ddbe97	IB/cma: Send MRA for reply messages Current implementation of RDMA_CM sends MRA (Message Receipt Acknowledgment) only for request messages but not for response messages. As a result, a slow active side of the connection may send a ready-to-use message to the passive side in a delay that is too long for the passive side to wait for. This patch adds a call to ib_send_cm_mra() upon receiving a response message and by this tells the other side to modify the service timeout to a bigger value, 16 times than before. As in the request case, MRA for reply will be sent only if a duplicate response has arrived. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Matan Barak <matan@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:29:31 -04:00
Parav Pandit	e1f24a79f4	IB/mlx5: Support congestion related counters This patch adds support to query the congestion related hardware counters through new command and links them with other hw counters being available in hw_counters sysfs location. In order to reuse existing infrastructure it renames related q_counter data structures to more generic counters to reflect q_counters and congestion counters and maybe some other counters in the future. New hardware counters: * rp_cnp_handled - CNP packets handled by the reaction point * rp_cnp_ignored - CNP packets ignored by the reaction point * np_cnp_sent - CNP packets sent by notification point to respond to CE marked RoCE packets * np_ecn_marked_roce_packets - CE marked RoCE packets received by notification point It also avoids returning ENOSYS which is specific for invalid system call and produces the following checkpatch.pl warning. WARNING: ENOSYS means 'invalid syscall nr' and nothing else + return -ENOSYS; Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:29:31 -04:00
Leon Romanovsky	a43402af1e	IB/mthca: Check validity of output parameter pointer The mthca driver didn't check supplied pointer to functions mthca_cmd_poll() and mthca_cmd_wait(). This caused to the following smatch errors: drivers/infiniband/hw/mthca/mthca_cmd.c:371 mthca_cmd_poll() error: we previously assumed 'out_param' could be null (see line 353) drivers/infiniband/hw/mthca/mthca_cmd.c:454 mthca_cmd_wait() error: we previously assumed 'out_param' could be null (see line 432) In reality all callers of these functions are setting out_is_imm flag are providing pointer too. However it is better to check again to remove smatch errors to achieve warning free subsystem. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:29:31 -04:00
Slava Shwartsman	a22ed86cff	IB/mlx5: Add drop flow steering rule support A drop rule is described by an action drop and no destination. If a user specified IB_FLOW_SPEC_ACTION_DROP then set the action to MLX5_FLOW_CONTEXT_ACTION_DROP and clear the destination. Signed-off-by: Slava Shwartsman <slavash@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:29:31 -04:00
Slava Shwartsman	483a3966b5	IB/core: Introduce drop flow specification This flow steering specification identifies flow for drop by the HW. If user create a flow only with the drop specification, then all the packets that hit this flow will be dropped, otherwise the HW will drop only the packets that match the other L2/L3/L4 specifications. Signed-off-by: Slava Shwartsman <slavash@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Ariel Levkovich	19cc75249a	IB/mlx5: Use IP version matching to classify IP traffic This change adds the ability for flow steering to classify IPv4/6 packets with MPLS tag (Ethertype 0x8847 and 0x8848) as standard IP packets and hit IPv4/6 classifed steering rules. When user added a flow rule with IP classification, driver was implicitly adding ethertype matching to the created rule in order to distinguish between IPv4 and IPv6 protocols. Since IP packets with MPLS tag header have MPLS ethertype, they missed the rule and ended up hitting the default filters. Such behavior prevented from MPLS packets to undergo inbound traffic load balancing flows (if such were defined by configuring RSS) to achieve higher throughput - the way that non-MPLS IP packets performed. Since our device is able to look past the MPLS tag and identify the next protocol we introduce this solution which replaces Ethertype matching by the device's capability to perform IP version parsing and matching in order to distinguish between IPv4 and IPv6. Therefore, whenever a flow with IP spec is added and device support IP version matching, driver will implicitly add IP version matching to the rule (Based on the IP spec type) without Ethertype matching which will cause relevant MPLS tagged packets to hit this rule as well. Otherwise (device doesn't support IP version matching), we fall back to setting Ethertype matching. If the user's filters specify an L2 ethertype and an IP spec the rule will then match both the ethertype and the IP version. The device's support for IP version matching is reported by the device via dedicated capability bit in query_device_cap and named outer/inner_ip_version. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Ariel Levkovich	0f750966dc	IB/mlx5: Add inner spec and IPv6 validation in user's flow attribute list This change fixes an incomplete validation of the user's flow attributes list. Previous implementation validated only matching of IPv4 Ethertype to IPv4 spec of outer headers (in case both Ethernet with specified Ethertype and IP specs were present) and lacked the validation of: 1. Matching of IPv6 Ethertype in Ethernet spec (if such exists) to an IPv6 protocol spec (if such exists). 2. Validation of Ethertype to IP protocol matching on inner headers specs. Which could cause some combinations of unmatching Ethernet and IP protocols to pass validation and apply on the device. The fix adds validation of IPv6 Ethertype and IP spec as well as performing the scan on both outer and inner attributes. Fixes: `038d2ef875` ("Add flow steering support") Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Bodong Wang	44f2e99ecd	IB/mlx5: Fix wrong use of kfree at bad flow in create_cq_user The kfree was called to free cqb, while it should free *cqb. Fixes: `1cbe6fc86c` ("IB/mlx5: Add support for CQE compressing") Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Maor Gottlieb	00b7c2abb6	IB/mlx5: Enlarge autogroup flow table In order to enlarge the flow group size to 8k, we decrease the number of flow group types to 6 and increase the flow table size to 64k. Flow group size is calculated as follow: group_size = table_size / (#group_types + 1) Fixes: `038d2ef875` ('IB/mlx5: Add flow steering support') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Maor Gottlieb	dac388ef4c	IB/mlx5: Check supported flow table size Check that the required flow table size is supported by device. Return ENOMEM error if no space left. In addition change the create flow table routine to return ENOMEM instead of ENOSPC. Fixes: `038d2ef875` ('IB/mlx5: Add flow steering support') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Maor Gottlieb	1377661298	IB/mlx5: Change vma from shared to private Anonymous VMA (->vm_ops == NULL) cannot be shared, otherwise it would lead to SIGBUS. Remove the shared flags from the vma after we change it to be anonymous. This is easily reproduced by doing modprobe -r while running a user-space application such as raw_ethernet_bw. Fixes: `7c2344c3bb` ('IB/mlx5: Implements disassociate_ucontext API') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Maor Gottlieb	ecc7d83be3	IB/mlx5: Take write semaphore when changing the vma struct When the driver disassociate user context, it changes the vma to anonymous by setting the vm_ops to null and zap the vma ptes. In order to avoid race in the kernel, we need to take write lock before we change the vma entries. Fixes: `7c2344c3bb` ('IB/mlx5: Implements disassociate_ucontext API') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Maor Gottlieb	ca37a664a8	IB/mlx4: Change vma from shared to private Anonymous VMA (->vm_ops == NULL) cannot be shared, otherwise it would lead to SIGBUS. Remove the shared flags from the vma after we change it to be anonymous. This is easily reproduced by doing modprobe -r while running a user-space application such as raw_ethernet_bw. Fixes: `ae184ddeca` ('IB/mlx4_ib: Disassociate support') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Maor Gottlieb	22c3653d04	IB/mlx4: Take write semaphore when changing the vma struct When the driver disassociate user context, it changes the vma to anonymous by setting the vm_ops to null and zap the vma ptes. In order to avoid race in the kernel, we need to take write lock before we change the vma entries. Fixes: `ae184ddeca` ('IB/mlx4_ib: Disassociate support') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Jack Morgenstein	fb7a91746a	IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level A warning message during SRIOV multicast cleanup should have actually been a debug level message. The condition generating the warning does no harm and can fill the message log. In some cases, during testing, some tests were so intense as to swamp the message log with these warning messages, causing a stall in the console message log output task. This stall caused an NMI to be sent to all CPUs (so that they all dumped their stacks into the message log). Aside from the message flood causing an NMI, the tests all passed. Once the message flood which caused the NMI is removed (by reducing the warning message to debug level), the NMI no longer occurs. Sample message log (console log) output illustrating the flood and resultant NMI (snippets with comments and modified with ... instead of hex digits, to satisfy checkpatch.pl): <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!... * About 4000 almost identical lines in less than one second * <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!... INFO: rcu_sched detected stalls on CPUs/tasks: { 17} (...) * { 17} above indicates that CPU 17 was the one that stalled * sending NMI to all CPUs: ... NMI backtrace for cpu 17 CPU: 17 PID: 45909 Comm: kworker/17:2 Hardware name: HP ProLiant DL360p Gen8, BIOS P71 09/08/2013 Workqueue: events fb_flashcursor task: ffff880478...... ti: ffff88064e...... task.ti: ffff88064e...... RIP: 0010:[ffffffff81......] [ffffffff81......] io_serial_in+0x15/0x20 RSP: 0018:ffff88064e257cb0 EFLAGS: 00000002 RAX: 0000000000...... RBX: ffffffff81...... RCX: 0000000000...... RDX: 0000000000...... RSI: 0000000000...... RDI: ffffffff81...... RBP: ffff88064e...... R08: ffffffff81...... R09: 0000000000...... R10: 0000000000...... R11: ffff88064e...... R12: 0000000000...... R13: 0000000000...... R14: ffffffff81...... R15: 0000000000...... FS: 0000000000......(0000) GS:ffff8804af......(0000) knlGS:000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080...... CR2: 00007f2a2f...... CR3: 0000000001...... CR4: 0000000000...... DR0: 0000000000...... DR1: 0000000000...... DR2: 0000000000...... DR3: 0000000000...... DR6: 00000000ff...... DR7: 0000000000...... Stack: ffff88064e...... ffffffff81...... ffffffff81...... 0000000000...... ffffffff81...... ffff88064e...... ffffffff81...... ffffffff81...... ffffffff81...... ffff88064e...... ffffffff81...... 0000000000...... Call Trace: [<ffffffff813d099b>] wait_for_xmitr+0x3b/0xa0 [<ffffffff813d0b5c>] serial8250_console_putchar+0x1c/0x30 [<ffffffff813d0b40>] ? serial8250_console_write+0x140/0x140 [<ffffffff813cb5fa>] uart_console_write+0x3a/0x80 [<ffffffff813d0aae>] serial8250_console_write+0xae/0x140 [<ffffffff8107c4d1>] call_console_drivers.constprop.15+0x91/0xf0 [<ffffffff8107d6cf>] console_unlock+0x3bf/0x400 [<ffffffff813503cd>] fb_flashcursor+0x5d/0x140 [<ffffffff81355c30>] ? bit_clear+0x120/0x120 [<ffffffff8109d5fb>] process_one_work+0x17b/0x470 [<ffffffff8109e3cb>] worker_thread+0x11b/0x400 [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400 [<ffffffff810a5aef>] kthread+0xcf/0xe0 [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 [<ffffffff81645858>] ret_from_fork+0x58/0x90 [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 Code: 48 89 e5 d3 e6 48 63 f6 48 03 77 10 8b 06 5d c3 66 0f 1f 44 00 00 66 66 66 6 As indicated in the stack trace above, the console output task got swamped. Fixes: `b9c5d6a643` ("IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV") Cc: <stable@vger.kernel.org> # v3.6+ Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Jack Morgenstein	99e68909d5	IB/mlx4: Fix ib device initialization error flow In mlx4_ib_add, procedure mlx4_ib_alloc_eqs is called to allocate EQs. However, in the mlx4_ib_add error flow, procedure mlx4_ib_free_eqs is not called to free the allocated EQs. Fixes: `e605b743f3` ("IB/mlx4: Increase the number of vectors (EQs) available for ULPs") Cc: <stable@vger.kernel.org> # v3.4+ Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Majd Dibbiny	dd77abf8a0	IB/mlx4: Support RAW Ethernet when RoCE is disabled On some environments, such as certain SR-IOV VF configurations, RoCE isn't supported for mlx4 Ethernet ports. Currently the driver will not open IB device on that port. This is problematic since we do want user-space RAW Ethernet QPs functionality to remain in place. For that end, enhance the relevant driver flows such that we do create a device instance in that case. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Jack Morgenstein	b312be3d87	IB/core: Fix sysfs registration error flow The kernel commit cited below restructured ib device management so that the device kobject is initialized in ib_alloc_device. As part of the restructuring, the kobject is now initialized in procedure ib_alloc_device, and is later added to the device hierarchy in the ib_register_device call stack, in procedure ib_device_register_sysfs (which calls device_add). However, in the ib_device_register_sysfs error flow, if an error occurs following the call to device_add, the cleanup procedure device_unregister is called. This call results in the device object being deleted -- which results in various use-after-free crashes. The correct cleanup call is device_del -- which undoes device_add without deleting the device object. The device object will then (correctly) be deleted in the ib_register_device caller's error cleanup flow, when the caller invokes ib_dealloc_device. Fixes: `55aeed0654` ("IB/core: Make ib_alloc_device init the kobject") Cc: <stable@vger.kernel.org> # v4.2+ Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Parav Pandit	4be3a4fa51	IB/core: Fix kernel crash during fail to initialize device This patch fixes the kernel crash that occurs during ib_dealloc_device() called due to provider driver fails with an error after ib_alloc_device() and before it can register using ib_register_device(). This crashed seen in tha lab as below which can occur with any IB device which fails to perform its device initialization before invoking ib_register_device(). This patch avoids touching cache and port immutable structures if device is not yet initialized. It also releases related memory when cache and port immutable data structure initialization fails during register_device() state. [81416.561946] BUG: unable to handle kernel NULL pointer dereference at (null) [81416.570340] IP: ib_cache_release_one+0x29/0x80 [ib_core] [81416.576222] PGD 78da66067 [81416.576223] PUD 7f2d7c067 [81416.579484] PMD 0 [81416.582720] [81416.587242] Oops: 0000 [#1] SMP [81416.722395] task: ffff8807887515c0 task.stack: ffffc900062c0000 [81416.729148] RIP: 0010:ib_cache_release_one+0x29/0x80 [ib_core] [81416.735793] RSP: 0018:ffffc900062c3a90 EFLAGS: 00010202 [81416.741823] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 [81416.749785] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff880859fec000 [81416.757757] RBP: ffffc900062c3aa0 R08: ffff8808536e5ac0 R09: ffff880859fec5b0 [81416.765708] R10: 00000000536e5c01 R11: ffff8808536e5ac0 R12: ffff880859fec000 [81416.773672] R13: 0000000000000000 R14: ffff8808536e5ac0 R15: ffff88084ebc0060 [81416.781621] FS: 00007fd879fab740(0000) GS:ffff88085fac0000(0000) knlGS:0000000000000000 [81416.790522] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [81416.797094] CR2: 0000000000000000 CR3: 00000007eb215000 CR4: 00000000003406e0 [81416.805051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [81416.812997] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [81416.820950] Call Trace: [81416.824226] ib_device_release+0x1e/0x40 [ib_core] [81416.829858] device_release+0x32/0xa0 [81416.834370] kobject_cleanup+0x63/0x170 [81416.839058] kobject_put+0x25/0x50 [81416.843319] ib_dealloc_device+0x25/0x40 [ib_core] [81416.848986] mlx5_ib_add+0x163/0x1990 [mlx5_ib] [81416.854414] mlx5_add_device+0x5a/0x160 [mlx5_core] [81416.860191] mlx5_register_interface+0x8d/0xc0 [mlx5_core] [81416.866587] ? 0xffffffffa09e9000 [81416.870816] mlx5_ib_init+0x15/0x17 [mlx5_ib] [81416.876094] do_one_initcall+0x51/0x1b0 [81416.880861] ? __vunmap+0x85/0xd0 [81416.885113] ? kmem_cache_alloc_trace+0x14b/0x1b0 [81416.890768] ? vfree+0x2e/0x70 [81416.894762] do_init_module+0x60/0x1fa [81416.899441] load_module+0x15f6/0x1af0 [81416.904114] ? __symbol_put+0x60/0x60 [81416.908709] ? ima_post_read_file+0x3d/0x80 [81416.913828] ? security_kernel_post_read_file+0x6b/0x80 [81416.920006] SYSC_finit_module+0xa6/0xf0 [81416.924888] SyS_finit_module+0xe/0x10 [81416.929568] entry_SYSCALL_64_fastpath+0x1a/0xa9 [81416.935089] RIP: 0033:0x7fd879494949 [81416.939543] RSP: 002b:00007ffdbc1b4e58 EFLAGS: 00000202 ORIG_RAX: 0000000000000139 [81416.947982] RAX: ffffffffffffffda RBX: 0000000001b66f00 RCX: 00007fd879494949 [81416.955965] RDX: 0000000000000000 RSI: 000000000041a13c RDI: 0000000000000003 [81416.963926] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000001b652a0 [81416.971861] R10: 0000000000000003 R11: 0000000000000202 R12: 00007ffdbc1b3e70 [81416.979763] R13: 00007ffdbc1b3e50 R14: 0000000000000005 R15: 0000000000000000 [81417.008005] RIP: ib_cache_release_one+0x29/0x80 [ib_core] RSP: ffffc900062c3a90 [81417.016045] CR2: 0000000000000000 Fixes: `55aeed0654` ("IB/core: Make ib_alloc_device init the kobject") Fixes: `7738613e7c` ("IB/core: Add per port immutable struct to ib_device") Cc: <stable@vger.kernel.org> # v4.2+ Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 12:26:05 -04:00
Feras Daoud	3e31a490e0	IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow Before calling ipoib_stop, rtnl_lock should be taken, then the flow clears the IPOIB_FLAG_ADMIN_UP and IPOIB_FLAG_OPER_UP flags, and waits for mcast completion if IPOIB_MCAST_FLAG_BUSY is set. On the other hand, the flow of multicast join task initializes a mcast completion, sets the IPOIB_MCAST_FLAG_BUSY and calls ipoib_mcast_join. If IPOIB_FLAG_OPER_UP flag is not set, this call returns EINVAL without setting the mcast completion and leads to a deadlock. ipoib_stop \| \| \| clear_bit(IPOIB_FLAG_ADMIN_UP) \| \| \| Context Switch \| \| ipoib_mcast_join_task \| \| \| spin_lock_irq(lock) \| \| \| init_completion(mcast) \| \| \| set_bit(IPOIB_MCAST_FLAG_BUSY) \| \| \| Context Switch \| \| clear_bit(IPOIB_FLAG_OPER_UP) \| \| \| spin_lock_irqsave(lock) \| \| \| Context Switch \| \| ipoib_mcast_join \| return (-EINVAL) \| \| \| spin_unlock_irq(lock) \| \| \| Context Switch \| \| ipoib_mcast_dev_flush \| wait_for_completion(mcast) \| ipoib_stop will wait for mcast completion for ever, and will not release the rtnl_lock. As a result panic occurs with the following trace: [13441.639268] Call Trace: [13441.640150] [<ffffffff8168b579>] schedule+0x29/0x70 [13441.641038] [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0 [13441.641914] [<ffffffff810bc017>] ? complete+0x47/0x50 [13441.642765] [<ffffffff810a690d>] ? flush_workqueue_prep_pwqs+0x16d/0x200 [13441.643580] [<ffffffff8168b956>] wait_for_completion+0x116/0x170 [13441.644434] [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20 [13441.645293] [<ffffffffa05af170>] ipoib_mcast_dev_flush+0x150/0x190 [ib_ipoib] [13441.646159] [<ffffffffa05ac967>] ipoib_ib_dev_down+0x37/0x60 [ib_ipoib] [13441.647013] [<ffffffffa05a4805>] ipoib_stop+0x75/0x150 [ib_ipoib] Fixes: `08bc327629` ("IB/ipoib: fix for rare multicast join race condition") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 11:45:55 -04:00
Feras Daoud	9a9b811269	IB/ipoib: Update broadcast object if PKey value was changed in index 0 Update the broadcast address in the priv->broadcast object when the Pkey value changes in index 0, otherwise the multicast GID value will keep the previous value of the PKey, and will not be updated. This leads to interface state down because the interface will keep the old PKey value. For example, in SR-IOV environment, if the PF changes the value of PKey index 0 for one of the VFs, then the VF receives PKey change event that triggers heavy flush. This flush calls update_parent_pkey that update the broadcast object and its relevant members. If in this case the multicast GID will not be updated, the interface state will be down. Fixes: `c290414169` ("IPoIB: Fix pkey change flow for virtualization environments") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 11:45:55 -04:00
yonatanc	4ed6ad1eb3	IB/rxe: Cache dst in QP instead of getting it for each send In RC QP there is no need to resolve the outgoing interface for each packet, as this does not change during QP life cycle. Instead cache the interface on the socket and use that one. This improves performance by 12% by sparing redundant calls to rxe_find_route. ib_send_bw -d rxe0 -x 1 -n 9000 -e -s $((1024 * 1024 )) -l 100 ---------------------------------------------------------------------------------------- \| \| bytes \| iterations \| BW peak[MB/sec] \| BW average[MB/sec] \| MsgRate[Mpps] \| ---------------------------------------------------------------------------------------- \| before \| 1048576 \| 9000 \| inf \| 551.21 \| 0.000551 \| \| after \| 1048576 \| 9000 \| inf \| 615.54 \| 0.000616 \| ---------------------------------------------------------------------------------------- Fixes: `8700e3e7c4` ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 10:45:02 -04:00
yonatanc	cee2688e3c	IB/rxe: Offload CRC calculation when possible Use CPU ability to perform CRC calculations, by replacing direct calls to crc32_le() with crypto_shash_updata(). The overall performance gain measured with ib_send_bw tool is 10% and it was tested on "Intel CPU ES-2660 v2 @ 2.20Ghz" CPU. ib_send_bw -d rxe0 -x 1 -n 9000 -e -s $((1024 * 1024 )) -l 100 --------------------------------------------------------------------------------------------- \| \| bytes \| iterations \| BW peak[MB/sec] \| BW average[MB/sec] \| MsgRate[Mpps] \| --------------------------------------------------------------------------------------------- \| crc32_le \| 1048576 \| 9000 \| inf \| 497.60 \| 0.000498 \| \| CRC offload \| 1048576 \| 9000 \| inf \| 546.70 \| 0.000547 \| --------------------------------------------------------------------------------------------- Fixes: `8700e3e7c4` ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 10:45:02 -04:00
Parav Pandit	0d38ac8a8b	IB/rxe: Do not export module's private function Function rxe_rcv is used internally in RXE and don't need to be exported. This patch removes such export declaration. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 10:43:28 -04:00
Parav Pandit	99fc12f60e	IB/rxe: Avoid accessing timers for non RC QPs This patch avoids RNR NAK timer and retransmit timer initialization and cleanup for non RC QPs (such as UD QP, GSI QP). Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 10:43:28 -04:00
Yonatan Cohen	0b1e5b99a4	IB/rxe: Add port protocol stats Expose new counters using the get_hw_stats callback. We expose the following counters: +---------------------+----------------------------------------+ \| Name \| Description \| \|---------------------+----------------------------------------\| \|sent_pkts \| number of sent pkts \| \|---------------------+----------------------------------------\| \|rcvd_pkts \| number of received packets \| \|---------------------+----------------------------------------\| \|out_of_sequence \| number of errors due to packet \| \| \| transport sequence number \| \|---------------------+----------------------------------------\| \|duplicate_request \| number of received duplicated packets. \| \| \| A request that previously executed is \| \| \| named duplicated. \| \|---------------------+----------------------------------------\| \|rcvd_rnr_err \| number of received RNR by completer \| \|---------------------+----------------------------------------\| \|send_rnr_err \| number of sent RNR by responder \| \|---------------------+----------------------------------------\| \|rcvd_seq_err \| number of out of sequence packets \| \| \| received \| \|---------------------+----------------------------------------\| \|ack_deffered \| number of deferred handling of ack \| \| \| packets. \| \|---------------------+----------------------------------------\| \|retry_exceeded_err \| number of times retry exceeded \| \|---------------------+----------------------------------------\| \|completer_retry_err \| number of times completer decided to \| \| \| retry \| \|---------------------+----------------------------------------\| \|send_err \| number of failed send packet \| +---------------------+----------------------------------------+ Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-21 10:43:28 -04:00
Doug Ledford	339e7575ad	cxgb4: Convert PDBG to pr_debug the second A couple spots were missed in the original patch to implement this change. Add those spots. Fixes: `a9a42886d0` (cxgb4: Convert PDBG to pr_debug) Signed-off-by: Doug Ledford <dledford@redhat.com>	2017-04-20 22:18:54 -04:00

1 2 3 4 5 ...

664780 Commits All Branches Search

664780 Commits

All Branches