OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jason Gunthorpe	25405d98a2	IB/ipoib: Do not remove child devices from within the ndo_uninit Switching to priv_destructor and needs_free_netdev created a subtle ordering problem in ipoib_remove_one. Now that unregister_netdev frees the netdev and priv we must ensure that the children are unregistered before trying to unregister the parent, or child unregister will use after free. The solution is to unregister the children, then parent, in the same batch all while holding the rtnl_lock. This closes all the races where a new child could have been added and ensures proper ordering. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-08-02 20:27:43 -06:00
Jason Gunthorpe	ee190ab734	IB/ipoib: Get rid of the sysfs_mutex This mutex was introduced to deal with the deadlock formed by calling unregister_netdev from within the sysfs callback of a netdev. Now that we have priv_destructor and needs_free_netdev we can switch to the more targeted solution of running the unregister from a work queue. This avoids the deadlock and gets rid of the mutex. The next patch in the series needs this mutex eliminated to create atomicity of unregisteration. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-08-02 20:27:43 -06:00
Jason Gunthorpe	9f49a5b5c2	RDMA/netdev: Use priv_destructor for netdev cleanup Now that the unregister_netdev flow for IPoIB no longer relies on external code we can now introduce the use of priv_destructor and needs_free_netdev. The rdma_netdev flow is switched to use the netdev common priv_destructor instead of the special free_rdma_netdev and the IPOIB ULP adjusted: - priv_destructor needs to switch to point to the ULP's destructor which will then call the rdma_ndev's in the right order - We need to be careful around the error unwind of register_netdev as it sometimes calls priv_destructor on failure - ULPs need to use ndo_init/uninit to ensure proper ordering of failures around register_netdev Switching to priv_destructor is a necessary pre-requisite to using the rtnl new_link mechanism. The VNIC user for rdma_netdev should also be revised, but that is left for another patch. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-08-02 20:27:43 -06:00
Jason Gunthorpe	eaeb398425	IB/ipoib: Move init code to ndo_init Now that we have a proper ndo_uninit, move code that naturally pairs with the ndo_uninit into ndo_init. This allows the netdev core to natually handle ordering. This fixes the situation where register_netdev can fail before calling ndo_init, in which case it wouldn't call ndo_uninit either. Also move a bunch of duplicated init code that is shared between child and parent for clarity. Now the child and parent register functions look very similar. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-08-02 20:27:43 -06:00
Jason Gunthorpe	7cbee87c17	IB/ipoib: Move all uninit code into ndo_uninit Currently uninit is sometimes done twice in error flows, and is sprinkled a bit all over the place. Improve the clarity of the design by moving all uninit only into ndo_uinit. Some duplication is removed: - Sometimes IPOIB_STOP_NEIGH_GC was done before unregister, but this duplicates the process in ipoib_neigh_hash_init - Flushing priv->wq was sometimes done before unregister, but that duplicates what has been done in ndo_uninit Uniniting the IB event queue must remain before unregister_netdev as it requires the RTNL lock to be dropped, this is moved to a helper to make that flow really clear and remove some duplication in error flows. If register_netdev fails (and ndo_init is NULL) then it almost always calls ndo_uninit, which lets us remove all the extra code from the error unwinds. The next patch in the series will close the 'almost always' hole by pairing a proper ndo_init with ndo_uninit. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-08-02 20:27:43 -06:00
Erez Shitrit	cda8daf179	IB/ipoib: Use cancel_delayed_work_sync for neigh-clean task The neigh_reap_task is self restarting, but so long as we call cancel_delayed_work_sync() it will be guaranteed to not be running and never start again. Thus we don't need to have the racy IPOIB_STOP_NEIGH_GC bit, or the confusing mismatch of places sometimes calling flush_workqueue after the cancel. This fixes a situation where the GC work could have been left running in some rare situations. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-02 20:27:19 -06:00
Jason Gunthorpe	577e07ffba	IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN This essentially duplicates the netdev's reg_state, so just use that directly. The reg_state is updated under the rntl_lock, and all places using GOING_DOWN already acquire the rtnl_lock so checking is safe. Since the only place we use GOING_DOWN is for the parent device this does not fix any bugs, but it is a step to tidy up the unregister flow so that after later patches the flow is uniform and sane. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2018-08-02 20:19:41 -06:00
Potnuri Bharat Teja	94245f4ad9	iw_cxgb4: Support FW write completion WR To optimize NVME-oF READ IOPs, use a specialized WQE that combines the RDMA WRITE and SEND_INV WR chain submitted by the NVME-oF target driver. This reduces uP overhead per NVME-oF IO, and results in over 10% improvement in NVME-oF 4K READ IOPs. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-02 20:16:02 -06:00
Potnuri Bharat Teja	b9855f4ca0	iw_cxgb4: RDMA write with immediate support Adds iw_cxgb4 functionality to support RDMA_WRITE_WITH_IMMEDATE opcode. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-02 20:16:02 -06:00
Dan Carpenter	8001b717f0	rdma/cxgb4: fix some info leaks In c4iw_create_qp() there are several struct members which potentially aren't inintialized like uresp.rq_key. I've fixed this code before in in commit `ae1fe07f3f` ("RDMA/cxgb4: Fix stack info leak in c4iw_create_qp()") so this time I'm just going to take a big hammer approach and memset the whole struct to zero. Hopefully, it will stay fixed this time. In c4iw_create_srq() we don't clear uresp.reserved. Fixes: `6a0b6174d3` ("rdma/cxgb4: Add support for kernel mode SRQ's") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-02 20:10:54 -06:00
Yixian Liu	0425e3e6e0	RDMA/hns: Support flush cqe for hip08 in kernel space According to IB protocol, there are some cases that work requests must return the flush error completion status through the completion queue. Due to hardware limitation, the driver needs to assist the flush process. This patch adds the support of flush cqe for hip08 in the cases that needed, such as poll cqe, post send, post recv and aeqe handle. The patch also considered the compatibility between kernel and user space. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-02 20:03:25 -06:00
Denis Drozdov	75da96067a	IB/IPoIB: Set ah valid flag in multicast send flow The change of ipoib_ah data structure with adding "valid" flag and checks of ah->valid in ipoib_start_xmit affected multicast packet flow. Since the multicast flow doesn't invoke path_rec_start, "ah->valid" flag remains unset, so that ipoib_start_xmit end up with neigh_refresh_path instead of sending the packet using neigh. "ah->valid" has to be set in multicast send flow. As a result IPoIB starts sending packets via neigh immediately and eliminates 60sec delay of neigh keep alive interval. The typical example of this issue are two sequential arpings: arping 11.134.208.9 -> got response (mcast_send) arping 11.134.208.9 -> no response (ah->valid = 0) Fixes: `fa9391dbad` ("RDMA/ipoib: Update paths on CLIENT_REREG/SM_CHANGE events") Signed-off-by: Denis Drozdov <denisd@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-01 15:23:03 -06:00
Jason Gunthorpe	0f50d88a6e	IB/uverbs: Allow all DESTROY commands to succeed after disassociate The disassociate function was broken by design because it failed all commands. This prevents userspace from calling destroy on a uobject after it has detected a device fatal error and thus reclaiming the resources in userspace is prevented. This fix is now straightforward, when anything destroys a uobject that is not the user the object remains on the IDR with a NULL context and object pointer. All lookup locking modes other than DESTROY will fail. When the user ultimately calls the destroy function it is simply dropped from the IDR while any related information is returned. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-01 14:55:48 -06:00
Jason Gunthorpe	a9b66d6453	IB/uverbs: Do not block disassociate during write() Now that all the callbacks are safe to run concurrently with disassociation this test can be eliminated. The ufile core infrastructure becomes entirely self contained and is not sensitive to disassociation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-01 14:55:48 -06:00
Jason Gunthorpe	e83f0ecdc4	IB/uverbs: Do not pass struct ib_device to the ioctl methods This does the same as the patch before, except for ioctl. The rules are the same, but for the ioctl methods the core code handles setting up the uobject. - Retrieve the ib_dev from the uobject->context->device. This is safe under ioctl as the core has already done rdma_alloc_begin_uobject and so CREATE calls are entirely protected by the rwsem. - Retrieve the ib_dev from uobject->object - Call ib_uverbs_get_ucontext() Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-01 14:55:48 -06:00
Jason Gunthorpe	bbd51e881f	IB/uverbs: Do not pass struct ib_device to the write based methods This is a step to get rid of the global check for disassociation. In this model, the ib_dev is not proven to be valid by the core code and cannot be provided to the method. Instead, every method decides if it is able to run after disassociation and obtains the ib_dev using one of three different approaches: - Call srcu_dereference on the udevice's ib_dev. As before, this means the method cannot be called after disassociation begins. (eg alloc ucontext) - Retrieve the ib_dev from the ucontext, via ib_uverbs_get_ucontext() - Retrieve the ib_dev from the uobject->object after checking under SRCU if disassociation has started (eg uobj_get) Largely, the code is all ready for this, the main work is to provide a ib_dev after calling uobj_alloc(). The few other places simply use ib_uverbs_get_ucontext() to get the ib_dev. This flexibility will let the next patches allow destroy to operate after disassociation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-01 14:55:48 -06:00
Jason Gunthorpe	cc2e14e680	IB/uverbs: Lower the test for ongoing disassociation Commands that are reading/writing to objects can test for an ongoing disassociation during their initial call to rdma_lookup_get_uobject. This directly prevents all of these commands from conflicting with an ongoing disassociation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-01 14:55:48 -06:00
Jason Gunthorpe	1e857e65d4	IB/uverbs: Allow uobject allocation to work concurrently with disassociate After all the recent structural changes this is now straightforward, hold the hw_destroy_rwsem across the entire uobject creation. We already take this semaphore on the success path, so holding it a bit longer is not going to change the performance. After this change none of the create callbacks require the disassociate_srcu lock to be correct. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-01 14:55:48 -06:00
Jason Gunthorpe	7452a3c745	IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate After all the recent structural changes this is now straightfoward, hoist the hw_destroy_rwsem up out of rdma_destroy_explicit and wrap it around the uobject write lock as well as the destroy. This is necessary as obtaining a write lock concurrently with uverbs_destroy_ufile_hw() will cause malfunction. After this change none of the destroy callbacks require the disassociate_srcu lock to be correct. This requires introducing a new lookup mode, UVERBS_LOOKUP_DESTROY as the IOCTL interface needs to hold an unlocked kref until all command verification is completed. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-01 14:55:48 -06:00
Jason Gunthorpe	9867f5c669	IB/uverbs: Convert 'bool exclusive' into an enum This is more readable, and future patches will need a 3rd lookup type. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-01 14:55:48 -06:00
Jason Gunthorpe	87ad80abc7	IB/uverbs: Consolidate uobject destruction There are several flows that can destroy a uobject and each one is minimized and sprinkled throughout the code base, making it difficult to understand and very hard to modify the destroy path. Consolidate all of these into uverbs_destroy_uobject() and call it in all cases where a uobject has to be destroyed. This makes one change to the lifecycle, during any abort (eg when alloc_commit is not called) we always call out to alloc_abort, even if remove_commit needs to be called to delete a HW object. This also renames RDMA_REMOVE_DURING_CLEANUP to RDMA_REMOVE_ABORT to clarify its actual usage and revises some of the comments to reflect what the life cycle is for the type implementation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-01 14:55:48 -06:00
Jason Gunthorpe	32ed5c00ac	IB/uverbs: Make the write path destroy methods use the same flow as ioctl The ridiculous dance with uobj_remove_commit() is not needed, the write path can follow the same flow as ioctl - lock and destroy the HW object then use the data left over in the uobject to form the response to userspace. Two helpers are introduced to make this flow straightforward for the caller. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-01 14:55:48 -06:00
Jason Gunthorpe	aa72c9a5f9	IB/uverbs: Remove rdma_explicit_destroy() from the ioctl methods The core code will destroy the HW object on behalf of the method, if the method provides an implementation it must simply copy data from the stub uobj into the response. Destroy methods cannot touch the HW object. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-08-01 14:55:37 -06:00
Kamal Heib	26e551c5ae	RDMA: Fix return code check in rdma_set_cq_moderation The proper return code is "-EOPNOTSUPP" when the modify_cq() callback is not supported, all drivers should generate this and all users should check for it when detecting not supported functionality. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> (for mlx5) Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-31 17:03:46 -06:00
Bart Van Assche	dd708e7b45	rdma/cxgb4: Simplify a structure initialization This patch avoids that sparse reports the following warning: drivers/infiniband/hw/cxgb4/qp.c:2269:34: warning: Using plain integer as NULL pointer Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-31 16:57:23 -06:00
Bart Van Assche	eb2463bab4	rdma/cxgb4: Fix SRQ endianness annotations This patch avoids that sparse complains about casts to restricted __be32. Fixes: `a3cdaa69e4` ("cxgb4: Adds CPL support for Shared Receive Queues") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-31 16:57:23 -06:00
Bart Van Assche	7810e09bfb	rdma/cxgb4: Remove a set-but-not-used variable This patch avoids that the following warning is reported when building with W=1: drivers/infiniband/hw/cxgb4/cm.c:1860:5: warning: variable 'status' set but not used [-Wunused-but-set-variable] u8 status; ^~~~~~ Fixes: `6a0b6174d3` ("rdma/cxgb4: Add support for kernel mode SRQ's") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-31 16:57:22 -06:00
Parav Pandit	8546331651	RDMA/core: Prefix _ib to IB/RoCE specific functions In rdma cm module, functions which are common between IB and iWarp are named with cma_. iWarp specific functions are prefixed with cma_iw. IB specific functions are perfixed with cma_ib. However some functions in request processing path didn't follow cma_ib notion. Prefix them with _ib for better code clarity. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:49:04 -06:00
Parav Pandit	79d684f026	RDMA/core: Simplify gid type check in cma_acquire_dev() cma_add_one() initializes the default GID regardless of device type. listen_id is bound to a device and an IP address, its GID type is initialized by cma_acquire_dev(). Therefore a valid default GID type is always available, it is not needed to check port type during cma_acquire_dev(). Initialize gid type of a cm id when the cm_id is created instead of doing conditional checks during cma_acquire_dev() and trying to initialize to 0 during _cma_attach_to_dev(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:49:04 -06:00
Parav Pandit	7582df8267	RDMA/core: Avoid holding lock while initializing fields on stack In various functions rdma_cm_event is zero initialized on stack using memset() while holding lock which is not necessary. Therefore, don't hold the lock while initializing on stack. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:49:04 -06:00
Parav Pandit	ca3a8ace2b	RDMA/core: Return bool instead of int Return bool for following internal and inline functions as their underlying APIs return bool too. 1. cma_zero_addr() 2. cma_loopback_addr() 3. cma_any_addr() 4. ib_addr_any() 5. ib_addr_loopback() While we are touching cma_loopback_addr(), remove extra white spaces in it. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:49:04 -06:00
Parav Pandit	05e0b86c41	RDMA/cma: Get rid of 1 bit boolean Arrange fields of cma_req_info structure for efficiency on stack and get rid of one bit boolean field. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:49:04 -06:00
Parav Pandit	e7ff98aefc	RDMA/cma: Constify path record, ib_cm_event, listen_id pointers Constify several pointers such as path_rec, ib_cm_event and listen_id pointers in several functions. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:49:04 -06:00
Parav Pandit	2df7dba855	RDMA/core: Constify dst_addr argument Following APIs are not supposed to modify addr or dest_addr contents. Therefore make those function argument const for better code readability. 1. rdma_resolve_ip() 2. rdma_addr_size() 3. rdma_resolve_addr() Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:49:04 -06:00
Parav Pandit	219d2e9dfd	RDMA/cma: Simplify rdma_resolve_addr() error flow Currently dst address is first set and later on cleared on either of the 3 error conditions are met. However none of the APIs or checks are supposed to refer to the destination address of the cm_id. Therefore, set the destination address after necessary checks pass which simplifies the error flow. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:49:04 -06:00
Parav Pandit	e11fef9f8d	RDMA/cma: Initialize resource type in __rdma_create_id() Currently rdma_cm_id's resource tracking fields such as owner task and kern_name and other non resource tracking fields are initialized in in single function __rdma_create_id(). Therefore, initialize rdma_cm_id's resource type also in same init function. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:49:04 -06:00
Lijun Ou	cdfa4ad5d6	RDMA/hns: Program the tclass and flow label into the hardware This was missed in a few places, and was just using 0. Also correct the spelling of HNS_ROCE_FLOW_LABEL_MASK Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:42:44 -06:00
Lijun Ou	426c414619	RDMA/hns: Use macro instead of magic number This patch mainly uses CMD_CSQ_DESC_NUM instead of magic number in order to improve readability. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:42:44 -06:00
Lijun Ou	ac7cbf96c2	RDMA/hns: Modify qp will return errno when qp type is illegal Set for ret was missing in the error path here, resulting in incorrect error code for modify_qp. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:42:44 -06:00
Lijun Ou	c8e46f8d63	RDMA/hns: Assign the value for vlan field of qp context This patch mainly fills the correct value into the vlan id field of qp context as well as update the vlan field name according to the latest hardware user manual. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:42:44 -06:00
Lijun Ou	610b89677f	RDMA/hns: Only assgin the fields of the av if IB_QP_AV bit is set Only when the IB_QP_AV flag of attr_mask is set is it valid to assign the related fields of the av into the qp context. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:42:44 -06:00
Kamal Heib	1ffba62642	RDMA/providers: Remove pointless functions The rdma core is taking care of return the right error code when the rdma device callbacks aren't supported. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:31:54 -06:00
Kamal Heib	0584c47bbc	RDMA/core: Check for verbs callbacks before using them Make sure the providers implement the verbs callbacks before calling them, otherwise return -EOPNOTSUPP. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:31:09 -06:00
Kamal Heib	7150c3d554	RDMA/core: Remove {create,destroy}_ah from mandatory verbs {create,destroy}_ah aren't mandatory verbs, because not all providers are implementing them. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:31:09 -06:00
Kamal Heib	e586e1e1b7	RDMA/ipoib: Fix check for return code from ib_create_srq Make sure to check for "-EOPNOTSUPP" instead of "-ENOSYS" which is the return code from ib_create_srq() in case that it not supported. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:29:45 -06:00
Kamal Heib	8380b74e7d	RDMA/providers: Fix return value from create_srq callbacks The proper return code is "-EOPNOTSUPP" when the create_srq() callback is not supported. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:29:45 -06:00
Jack Morgenstein	f95ccffc71	IB/mlx4: Use 4K pages for kernel QP's WQE buffer In the current implementation, the driver tries to allocate contiguous memory, and if it fails, it falls back to 4K fragmented allocation. Once the memory is fragmented, the first allocation might take a lot of time, and even fail, which can cause connection failures. This patch changes the logic to always allocate with 4K granularity, since it's more robust and more likely to succeed. This patch was tested with Lustre and no performance degradation was observed. Note: This commit eliminates the "shrinking WQE" feature. This feature depended on using vmap to create a virtually contiguous send WQ. vmap use was abandoned due to problems with several processors (see the commit cited below). As a result, shrinking WQE was available only with physically contiguous send WQs. Allocating such send WQs caused the problems described above. Therefore, as a side effect of eliminating the use of large physically contiguous send WQs, the shrinking WQE feature became unavailable. Warning example: worker/20:1: page allocation failure: order:8, mode:0x80d0 CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G OE ------------ Workqueue: ib_cm cm_work_handler [ib_cm] Call Trace: [<ffffffff81686d81>] dump_stack+0x19/0x1b [<ffffffff81186160>] warn_alloc_failed+0x110/0x180 [<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0 [<ffffffff811ce868>] alloc_pages_current+0x98/0x110 [<ffffffff81184fae>] __get_free_pages+0xe/0x50 [<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150 [<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50 [<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core] [<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core] [<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib] [<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0 [<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib] [<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib] [<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib] [<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core] [<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm] [<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd] [<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs] [<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd] Fixes: `73898db043` ("net/mlx4: Avoid wrong virtual mappings") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:25:46 -06:00
Jason Gunthorpe	bccd06223f	IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language This clearly indicates that the input is a bitwise combination of values in an enum, and identifies which enum contains the definition of the bits. Special accessors are provided that handle the mandatory validation of the allowed bits and enforce the correct type for bitwise flags. If we had introduced this at the start then the kabi would have uniformly used u64 data to pass flags, however today there is a mixture of u64 and u32 flags. All places are converted to accept both sizes and the accessor fixes it. This allows all existing flags to grow to u64 in future without any hassle. Finally all flags are, by definition, optional. If flags are not passed the accessor does not fail, but provides a value of zero. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>	2018-07-30 20:23:29 -06:00
Bart Van Assche	d34ac5cd3a	RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const Since neither ib_post_send() nor ib_post_recv() modify the data structure their second argument points at, declare that argument const. This change makes it necessary to declare the 'bad_wr' argument const too and also to modify all ULPs that call ib_post_send(), ib_post_recv() or ib_post_srq_recv(). This patch does not change any functionality but makes it possible for the compiler to verify whether the ib_post_(send\|recv\|srq_recv) really do not modify the posted work request. To make this possible, only one cast had to be introduce that casts away constness, namely in rpcrdma_post_recvs(). The only way I can think of to avoid that cast is to introduce an additional loop in that function or to change the data type of bad_wr from struct ib_recv_wr ** into int (an index that refers to an element in the work request list). However, both approaches would require even more extensive changes than this patch. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:09:34 -06:00
Bart Van Assche	7bb1fafc2f	IB/mlx5, ib_post_send(), IB_WR_REG_SIG_MR: Do not modify the 'wr' argument Since the next patch will constify the wr pointer, do not modify the data that pointer points at. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Cc: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2018-07-30 20:00:20 -06:00

1 2 3 4 5 ...

766762 Commits All Branches Search

766762 Commits

All Branches