Splits hns_roce_v2_post_send() into three sub-functions: set_rc_wqe(),
set_ud_wqe() and update_sq_db() to simplify the code.
Link: https://lore.kernel.org/r/1583839084-31579-6-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently, before the qp is created, a page size needs to be calculated
for the base address table to store all base addresses in the mtr. As a
result, the parameter configuration of the mtr is complex. So integrate
the process of calculating the base table page size into the hem related
interface to simplify the process of using mtr.
Link: https://lore.kernel.org/r/1583839084-31579-5-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Simplify the wr opcode conversion from ib to hns by using a map table
instead of the switch-case statement.
Link: https://lore.kernel.org/r/1583839084-31579-4-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Encapsulates the wqe buffer process details for datagram seg, fast mr seg
and atomic seg.
Link: https://lore.kernel.org/r/1583839084-31579-3-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There are serval global functions related to wqe buffer in the hns driver
and are called in different files. These symbols cannot directly represent
the namespace they belong to. So add prefix 'hns_roce_' to 3 wqe buffer
related global functions: get_recv_wqe(), get_send_wqe(), and
get_send_extend_sge().
Link: https://lore.kernel.org/r/1583839084-31579-2-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
A work queue cannot just rely on the ib_device not being freed, it must
hold a kref on the memory so that the BNXT_RE_FLAG_IBDEV_REGISTERED check
works.
Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://lore.kernel.org/r/1584117207-2664-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There are a couple places in this driver running from a work queue that
need the ib_device to be registered. Instead of using a broken internal
bit rely on the new core code to guarantee device registration.
Link: https://lore.kernel.org/r/1584117207-2664-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The first switch statement in cm_destroy_id() tries to move the ID to
either IB_CM_IDLE or IB_CM_TIMEWAIT. Both states will block concurrent
MAD handlers from progressing.
Previous patches removed the unreliably lock/unlock sequences in this
flow, this patch removes the extra locking steps and adds the missing
parts to guarantee that destroy reaches IB_CM_IDLE. There is no point in
leaving the ID in the IB_CM_TIMEWAIT state the memory about to be kfreed.
Rework things to hold the lock across all the state transitions and
directly assert when done that it ended up in IB_CM_IDLE as expected.
This was accompanied by a careful audit of all the state transitions here,
which generally did end up in IDLE on their success and non-racy paths.
Link: https://lore.kernel.org/r/20200310092545.251365-16-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The first thing ib_send_cm_sidr_rep() does is obtain the lock, so use the
usual unlocked wrapper, locked actor pattern here.
Get rid of the cm_reject_sidr_req() wrapper so each call site can call the
locked or unlocked version as required.
This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().
Link: https://lore.kernel.org/r/20200310092545.251365-15-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The first thing ib_send_cm_rej() does is obtain the lock, so use the usual
unlocked wrapper, locked actor pattern here.
This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().
While here simplify some of the logic in the implementation.
Link: https://lore.kernel.org/r/20200310092545.251365-14-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The first thing ib_send_cm_drep() does is obtain the lock, so use the
usual unlocked wrapper, locked actor pattern here.
This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().
Link: https://lore.kernel.org/r/20200310092545.251365-13-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The first thing ib_send_cm_dreq() does is obtain the lock, so use the
usual unlocked wrapper, locked actor pattern here.
This avoids a sketchy lock/unlock sequence (which could allow state to
change) during cm_destroy_id().
Link: https://lore.kernel.org/r/20200310092545.251365-12-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
These functions all touch state, so must be called under the lock.
Inspection shows this is currently true.
Link: https://lore.kernel.org/r/20200310092545.251365-11-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
ib_crate_cm_id() immediately places the id in the xarray, and publishes it
into the remote_id and remote_qpn rbtrees. This makes it visible to other
threads before it is fully set up.
It appears the thinking here was that the states IB_CM_IDLE and
IB_CM_REQ_RCVD do not allow any MAD handler or lookup in the remote_id and
remote_qpn rbtrees to advance.
However, cm_rej_handler() does take an action on IB_CM_REQ_RCVD, which is
not really expected by the design.
Make the whole thing clearer:
- Keep the new cm_id out of the xarray until it is completely set up.
This directly prevents MAD handlers and all rbtree lookups from seeing
the pointer.
- Move all the trivial setup right to the top so it is obviously done
before any concurrency begins
- Move the mutation of the cm_id_priv out of cm_match_id() and into the
caller so the state transition is obvious
- Place the manipulation of the work_list at the end, under lock, after
the cm_id is placed in the xarray. The work_count cannot change on an
ID outside the xarray.
- Add some comments
Link: https://lore.kernel.org/r/20200310092545.251365-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
ib_create_cm_id() immediately places the id in the xarray, so it is visible
to network traffic.
The state is initially set to IB_CM_IDLE and all the MAD handlers will
test this state under lock and refuse to advance from IDLE, so adding to
the xarray is harmless.
Further, the set to IB_CM_SIDR_REQ_RCVD also excludes all MAD handlers.
However, the local_id isn't even used for SIDR mode, and there will be no
input MADs related to the newly created ID.
So, make the whole flow simpler so it can be understood:
- Do not put the SIDR cm_id in the xarray. This directly shows that there
is no concurrency
- Delete the confusing work_count and pending_list manipulations. This
mechanism is only used by MAD handlers and timewait, neither of which
apply to SIDR.
- Add a few comments and rename 'cur_cm_id_priv' to 'listen_cm_id_priv'
- Move other loose sets up to immediately after cm_id creation so that
the cm_id is fully configured right away. This fixes an oversight where
the service_id will not be returned back on a IB_SIDR_UNSUPPORTED
reject.
Link: https://lore.kernel.org/r/20200310092545.251365-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The lock should not be dropped before doing the pr_debug() print as it is
accessing data protected by the lock, such as id.state.
Fixes: 119bf81793 ("IB/cm: Add debug prints to ib_cm")
Link: https://lore.kernel.org/r/20200310092545.251365-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Any manipulation of cm_id->state must be done under the cm_id_priv->lock,
the two routines that added listens did not follow this rule, because they
never participate in any concurrent access around the state.
However, since this exception makes the code hard to understand, simplify
the flow so that it can be fully locked:
- Move manipulation of listen_sharecount into cm_insert_listen() so it is
trivially under the cm.lock without having to expose the cm.lock to the
caller.
- Push the cm.lock down into cm_insert_listen() and have the function
increment the reference count before returning an existing pointer.
- Split ib_cm_listen() into an cm_init_listen() and do not call
ib_cm_listen() from ib_cm_insert_listen()
- Make both ib_cm_listen() and ib_cm_insert_listen() directly call
cm_insert_listen() under their cm_id_priv->lock which does both a
collision detect and, if needed, the insert (atomically)
- Enclose all state manipulation within the cm_id_priv->lock, notice this
set can be done safely after cm_insert_listen() as no reader is allowed
to read the state without holding the lock.
- Do not set the listen cm_id in the xarray, as it is never correct to
look it up. This makes the concurrency simpler to understand.
Many needless error unwinds are removed in the process.
Link: https://lore.kernel.org/r/20200310092545.251365-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Too much of the destruction is very carefully sensitive to the state
and various other things. Move more code to the unconditional path and
add several WARN_ONs to check consistency.
Link: https://lore.kernel.org/r/20200310092545.251365-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When creating a cm_id during REQ the id immediately becomes visible to the
other MAD handlers, and shortly after the state is moved to IB_CM_REQ_RCVD
This allows cm_rej_handler() to run concurrently and free the work:
CPU 0 CPU1
cm_req_handler()
ib_create_cm_id()
cm_match_req()
id_priv->state = IB_CM_REQ_RCVD
cm_rej_handler()
cm_acquire_id()
spin_lock(&id_priv->lock)
switch (id_priv->state)
case IB_CM_REQ_RCVD:
cm_reset_to_idle()
kfree(id_priv->timewait_info);
goto destroy
destroy:
kfree(id_priv->timewait_info);
id_priv->timewait_info = NULL
Causing a double free or worse.
Do not free the timewait_info without also holding the
id_priv->lock. Simplify this entire flow by making the free unconditional
during cm_destroy_id() and removing the confusing special case error
unwind during creation of the timewait_info.
This also fixes a leak of the timewait if cm_destroy_id() is called in
IB_CM_ESTABLISHED with an XRC TGT QP. The state machine will be left in
ESTABLISHED while it needed to transition through IB_CM_TIMEWAIT to
release the timewait pointer.
Also fix a leak of the timewait_info if the caller mis-uses the API and
does ib_send_cm_reqs().
Fixes: a977049dac ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20200310092545.251365-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The test here typod the cm_id_priv to use, it used the one that was
freshly allocated. By definition the allocated one has the matching
cm_handler and zero context, so the condition was always true.
Instead check that the existing listening ID is compatible with the
proposed handler so that it can be shared, as was originally intended.
Fixes: 067b171b86 ("IB/cm: Share listening CM IDs")
Link: https://lore.kernel.org/r/20200310092545.251365-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
xa_alloc_cyclic() is a SMP release to be paired with some later acquire
during xa_load() as part of cm_acquire_id().
As such, xa_alloc_cyclic() must be done after the cm_id is fully
initialized, in particular, it absolutely must be after the
refcount_set(), otherwise the refcount_inc() in cm_acquire_id() may not
see the set.
As there are several cases where a reader will be able to use the
id.local_id after cm_acquire_id in the IB_CM_IDLE state there needs to be
an unfortunate split into a NULL allocate and a finalizing xa_store.
Fixes: a977049dac ("[PATCH] IB: Add the kernel CM implementation")
Link: https://lore.kernel.org/r/20200310092545.251365-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
These judgments were used to keep the compatibility with older versions of
userspace that don't have the field named "cap_flags" in structure
hns_roce_ib_create_cq_resp. But it will be wrong to compare outlen with
the size of resp if another new field were added in resp. oulen should be
compared with the end offset of cap_flags in resp.
Fixes: 4f8f0d5e33 ("RDMA/hns: Package the flow of creating cq")
Link: https://lore.kernel.org/r/1583845569-47257-1-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky says:
====================
This series fixes various corner cases in the mlx5_ib MR cache
implementation, see specific commit messages for more information.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies
* branch 'mlx5_mr-cache':
RDMA/mlx5: Allow MRs to be created in the cache synchronously
RDMA/mlx5: Revise how the hysteresis scheme works for cache filling
RDMA/mlx5: Fix locking in MR cache work queue
RDMA/mlx5: Lock access to ent->available_mrs/limit when doing queue_work
RDMA/mlx5: Fix MR cache size and limit debugfs
RDMA/mlx5: Always remove MRs from the cache before destroying them
RDMA/mlx5: Simplify how the MR cache bucket is located
RDMA/mlx5: Rename the tracking variables for the MR cache
RDMA/mlx5: Replace spinlock protected write with atomic var
{IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
{IB,net}/mlx5: Assign mkey variant in mlx5_ib only
{IB,net}/mlx5: Setup mkey variant before mr create command invocation
If the cache is completely out of MRs, and we are running in cache mode,
then directly, and synchronously, create an MR that is compatible with the
cache bucket using a sleeping mailbox command. This ensures that the
thread that is waiting for the MR absolutely will get one.
When a MR allocated in this way becomes freed then it is compatible with
the cache bucket and will be recycled back into it.
Deletes the very buggy ent->compl scheme to create a synchronous MR
allocation.
Link: https://lore.kernel.org/r/20200310082238.239865-13-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently if the work queue is running then it is in 'hysteresis' mode and
will fill until the cache reaches the high water mark. This implicit state
is very tricky and doesn't interact with pending very well.
Instead of self re-scheduling the work queue after the add_keys() has
started to create the new MR, have the queue scheduled from
reg_mr_callback() only after the requested MR has been added.
This avoids the bad design of an in-rush of queue'd work doing back to
back add_keys() until EAGAIN then sleeping. The add_keys() will be paced
one at a time as they complete, slowly filling up the cache.
Also, fix pending to be only manipulated under lock.
Link: https://lore.kernel.org/r/20200310082238.239865-12-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
All of the members of mlx5_cache_ent must be accessed while holding the
spinlock, add the missing spinlock in the __cache_work_func().
Using cache->stopped and flush_workqueue() is an inherently racy way to
shutdown self-scheduling work on a queue. Replace it with ent->disabled
under lock, and always check disabled before queuing any new work. Use
cancel_work_sync() to shutdown the queue.
Use READ_ONCE/WRITE_ONCE for dev->last_add to manage concurrency as
coherency is less important here.
Split fill_delay from the bitfield. C bitfield updates are not atomic and
this is just a mess. Use READ_ONCE/WRITE_ONCE, but this could also use
test_bit()/set_bit().
Link: https://lore.kernel.org/r/20200310082238.239865-11-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Accesses to these members needs to be locked. There is no reason not to
hold a spinlock while calling queue_work(), so move the tests into a
helper and always call it under lock.
The helper should be called when available_mrs is adjusted.
Link: https://lore.kernel.org/r/20200310082238.239865-10-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The size_write function is supposed to adjust the total_mr's to match the
user's request, but lacks locking and safety checking.
total_mrs can only be adjusted by at most available_mrs. mrs already
assigned to users cannot be revoked. Ensure that the user provides a
target value within the range of available_mrs and within the high/low
water mark.
limit_write has confusing and wrong sanity checking, and doesn't have the
ability to deallocate on limit reduction.
Since both functions use the same algorithm to adjust the available_mrs,
consolidate it into one function and write it correctly. Fix the locking
and by holding the spinlock for all accesses to ent->X.
Always fail if the user provides a malformed string.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Link: https://lore.kernel.org/r/20200310082238.239865-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The cache bucket tracks the total number of MRs that exists, both inside
and outside of the cache. Removing a MR from the cache (by setting
cache_ent to NULL) without updating total_mrs will cause the tracking to
leak and be inflated.
Further fix the rereg_mr path to always destroy the MR. reg_create will
always overwrite all the MR data in mlx5_ib_mr, so the MR must be
completely destroyed, in all cases, before this function can be
called. Detach the MR from the cache and unconditionally destroy it to
avoid leaking HW mkeys.
Fixes: afd1417404 ("IB/mlx5: Use direct mkey destroy command upon UMR unreg failure")
Fixes: 56e11d628c ("IB/mlx5: Added support for re-registration of MRs")
Link: https://lore.kernel.org/r/20200310082238.239865-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There are many bad APIs here that are accepting a cache bucket index
instead of a bucket pointer. Many of the callers already have a bucket
pointer, so this results in a lot of confusing uses of order2idx().
Pass the struct mlx5_cache_ent into add_keys(), remove_keys(), and
alloc_cached_mr().
Once the MR is in the cache, store the cache bucket pointer directly in
the MR, replacing the 'bool allocated_from cache'.
In the end there is only one place that needs to form index from order,
alloc_mr_from_cache(). Increase the safety of this function by disallowing
it from accessing cache entries in the ODP special area.
Link: https://lore.kernel.org/r/20200310082238.239865-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
mkey variant calculation was spinlock protected to make it atomic, replace
that with one atomic variable.
Link: https://lore.kernel.org/r/20200310082238.239865-4-leon@kernel.org
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
As mlx5_ib is the only user of the mlx5_core_create_mkey_cb, move the
logic inside mlx5_ib and cleanup the code in mlx5_core.
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
mkey variant is not required for mlx5_core use, move the mkey variant
counter to mlx5_ib.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
On reg_mr_callback() mlx5_ib is recalculating the mkey variant which is
wrong and will lead to using a different key variant than the one
submitted to firmware on create mkey command invocation.
To fix this, we store the mkey variant before invoking the firmware
command and use it later on completion (reg_mr_callback).
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Peer to peer support was never implemented, so delete it to make code less
clutter.
Link: https://lore.kernel.org/r/20200310091438.248429-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Convert mlx4 to use in-kernel offsetofend() instead
of its duplicated implementation.
Link: https://lore.kernel.org/r/20200310091438.248429-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Until now the flex parser capability was used in ib_query_device() to
indicate tunnel_offloads_caps support for mpls_over_gre/mpls_over_udp.
Newer devices and firmware will have configurations with the flexparser
but without mpls support.
Testing for the flex parser capability was a mistake, the tunnel_stateless
capability was intended for detecting mpls and was introduced at the same
time as the flex parser capability.
Otherwise userspace will be incorrectly informed that a future device
supports MPLS when it does not.
Link: https://lore.kernel.org/r/20200305123841.196086-1-leon@kernel.org
Cc: <stable@vger.kernel.org> # 4.17
Fixes: e818e255a5 ("IB/mlx5: Expose MPLS related tunneling offloads")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Those macros are already defined in include/linux/mlx5/driver.h, so delete
their duplicate variants.
Link: https://lore.kernel.org/r/20200310075706.238592-1-leon@kernel.org
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The header file rdma_core.h is duplicate, so let's remove it.
Fixes: 622db5b643 ("RDMA/core: Add trace points to follow MR allocation")
Link: https://lore.kernel.org/r/20200310091656.249696-1-leon@kernel.org
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
'wqe' is already zeroed at the top of the 'while' loop, just a few lines
below, and is not used outside of the loop.
So there is no need to zero it again, or for the variable to be declared
outside the loop.
Link: https://lore.kernel.org/r/20200308065442.5415-1-christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The tgid used to be part of ib_umem_free_notifier(), when it was reworked
it got moved to release, but it should have been unconditional as all umem
alloc paths get the tgid.
As is, creating an implicit ODP will leak the tgid reference.
Link: https://lore.kernel.org/r/20200304181607.GA22412@ziepe.ca
Cc: stable@kernel.org
Fixes: f25a546e65 ("RDMA/odp: Use mmu_interval_notifier_insert()")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since RDMA rw API support operations with P2P memory sg list, make sure to
map/unmap the scatter list for signature operation correctly.
Link: https://lore.kernel.org/r/20200220100819.41860-2-maxg@mellanox.com
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl5lkYceHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGpHQH/RJrzcaZHo4lw88m
Jf7vBZ9DYUlRgqE0pxTHWmodNObKRqpwOUGflUcWbb/7GD2LQUfeqhSECVQyTID9
N9y7FcPvx321Qhc3EkZ24DBYk0+DQ0K2FVUrSa/PxO0n7czxxXWaLRDmlSULEd3R
D4pVs3zEWOBXJHUAvUQ5R+lKfkeWKNeeepeh+rezuhpdWFBRNz4Jjr5QUJ8od5xI
sIwobYmESJqTRVBHqW8g2T2/yIsFJ78GCXs8DZLe1wxh40UbxdYDTA0NDDTHKzK6
lxzBgcmKzuge+1OVmzxLouNWMnPcjFlVgXWVerpSy3/SIFFkzzUWeMbqm6hKuhOn
wAlcIgI=
=VQUc
-----END PGP SIGNATURE-----
Merge tag 'v5.6-rc5' into rdma.git for-next
Required due to dependencies in following patches.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yishai Hadas Says:
====================
Expose raw packet pacing APIs to be used by DEVX based applications. The
existing code was refactored to have a single flow with the new raw APIs.
====================
Based on the mlx5-next branch at
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies
* branch 'mlx5_packet_pacing':
IB/mlx5: Introduce UAPIs to manage packet pacing
net/mlx5: Expose raw packet pacing APIs
Introduce packet pacing uobject and its alloc and destroy
methods.
This uobject holds mlx5 packet pacing context according to the device
specification and enables managing packet pacing device entries that are
needed by DEVX applications.
Link: https://lore.kernel.org/r/20200219190518.200912-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
- Fix busted syzkaller fix in 'get_new_pps' - this turned out to crash on
certain HW configurations
- Bug fixes for various missed things in error unwinds
- Add a missing rcu_read_lock annotation in hfi/qib
- Fix two ODP related regressions from the recent mmu notifier changes
- Several more syzkaller bugs in siw, RDMA netlink, verbs and iwcm
- Revert an old patch in CMA as it is now shown to not be allocating port
numbers properly
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl5iWSQACgkQOG33FX4g
mxoadw//ZkIcG25OMhgc4iqOXT+brCCYosdi1MB8ptcW/lx+t2jH8VD9cd8kOW4M
VfFIpiuqVc6U06BpoRJkSV3Ix5Hiw0nQVD9q1mNiqSs0fyAuJG0NGtVeqWWXSFFC
ptHzn1z5Aw9GV2necS+nJcZ3NceMW/rP255LHioqVfj7xSFJiymXfncH7YwQZOop
S88Dr3m+DibW+ueVwvtLPvSPaWL40NGZo4sNuITrfiJuHYvstWedUMtYkGCGjrmT
bUI7lpYgsakVTlM2LTtlAFrAoL/adkfrNbiCVLqGLpoy3DIdXVscQzt9CRnCP1iF
t1l0jY+2YNAMMfjktLDnhUU7wfAwgw/XTNoqzlRCAAiTp7D8+eo560Txj9xyjGw+
spxGOWuDEVWlBOFHHltRbQ13QZ06vA7yg0YqoIuEg86c+X38NoVEA3sRf59v05qM
XqPcdIBusjRfd8kZsk07uYbp5VQsNHSfL2ZtxAFwiWFr4stjBcwqrx3sFw5610uZ
Pt6uWN6JlGRb7A35I0ZuRwWhN1HTFkd7rIKK3d5hTWcqefH6JAkZldMsG0qt/YW2
nRnoZhUNwtP2YI6eOTpskQCyK41tqP5tC84k1GMBuAxMYw40FFqN9/M7v0h9NWq7
Eq8BMjbLB6DDR8cBJk7uoYfpYM6slnGLlDGfrLRR9j1oWv6iuCY=
=SFSu
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Nothing particularly exciting, some small ODP regressions from the mmu
notifier rework, another bunch of syzkaller fixes, and a bug fix for a
botched syzkaller fix in the first rc pull request.
- Fix busted syzkaller fix in 'get_new_pps' - this turned out to
crash on certain HW configurations
- Bug fixes for various missed things in error unwinds
- Add a missing rcu_read_lock annotation in hfi/qib
- Fix two ODP related regressions from the recent mmu notifier
changes
- Several more syzkaller bugs in siw, RDMA netlink, verbs and iwcm
- Revert an old patch in CMA as it is now shown to not be allocating
port numbers properly"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/iwcm: Fix iwcm work deallocation
RDMA/siw: Fix failure handling during device creation
RDMA/nldev: Fix crash when set a QP to a new counter but QPN is missing
RDMA/odp: Ensure the mm is still alive before creating an implicit child
RDMA/core: Fix protection fault in ib_mr_pool_destroy
IB/mlx5: Fix implicit ODP race
IB/hfi1, qib: Ensure RCU is locked when accessing list
RDMA/core: Fix pkey and port assignment in get_new_pps
RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()
RDMA/rw: Fix error flow during RDMA context initialization
RDMA/core: Fix use of logical OR in get_new_pps
Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.
This driver did not previously reject unsupported parameters.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dealloc_work_entries() function must update the work_free_list pointer
while freeing its entries, since potentially called again on same list. A
second iteration of the work list caused system crash. This happens, if
work allocation fails during cma_iw_listen() and free_cm_id() tries to
free the list again during cleanup.
Fixes: 922a8e9fb2 ("RDMA: iWARP Connection Manager.")
Link: https://lore.kernel.org/r/20200302181614.17042-1-bmt@zurich.ibm.com
Reported-by: syzbot+cb0c054eabfba4342146@syzkaller.appspotmail.com
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
A failing call to ib_device_set_netdev() during device creation caused
system crash due to xa_destroy of uninitialized xarray hit by device
deallocation. Fixed by moving xarray initialization before potential
device deallocation.
Fixes: bdcf26bf9b ("rdma/siw: network and RDMA core interface")
Link: https://lore.kernel.org/r/20200302155814.9896-1-bmt@zurich.ibm.com
Reported-by: syzbot+2e80962bedd9559fe0b3@syzkaller.appspotmail.com
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Holding the rtnl_lock while iterating a devices interface address list
potentially causes deadlocks with the cma_netdev_callback. While this was
implemented to limit the scope of a wildcard listen to addresses of the
current device only, a better solution limits the scope of the socket to
the device. This completely avoiding locking, and also results in
significant code simplification.
Fixes: c421651fa2 ("RDMA/siw: Add missing rtnl_lock around access to ifa")
Link: https://lore.kernel.org/r/20200228173534.26815-1-bmt@zurich.ibm.com
Reported-by: syzbot+55de90ab5f44172b0c90@syzkaller.appspotmail.com
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Cited commit missed to include low level congestion control related
debugfs stage initialization. This resulted in missing debugfs entries
for cc_params of a RDMA device.
Add them back.
Fixes: b5ca15ad7e ("IB/mlx5: Add proper representors support")
Link: https://lore.kernel.org/r/20200227125407.99803-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add two debugfs parameters described below.
np_min_time_between_cnps - Minimum time between sending CNPs from the
port.
Unit = microseconds.
Default = 0 (no min wait time; generated
based on incoming ECN marked packets).
rp_max_rate - Maximum rate at which reaction point node can transmit.
Once this limit is reached, RP is no longer rate limited.
Unit = Mbits/sec
Default = 0 (full speed)
Link: https://lore.kernel.org/r/20200227125246.99472-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Registration of a mmu_notifier requires the caller to hold a mmget() on
the mm as registration is not permitted to race with exit_mmap(). There is
a BUG_ON inside the mmu_notifier to guard against this.
Normally creating a umem is done against current which implicitly holds
the mmget(), however an implicit ODP child is created from a pagefault
work queue and is not guaranteed to have a mmget().
Call mmget() around this registration and abort faulting if the MM has
gone to exit_mmap().
Before the patch below the notifier was registered when the implicit ODP
parent was created, so there was no chance to register a notifier outside
of current.
Fixes: c571feca2d ("RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'")
Link: https://lore.kernel.org/r/20200227114118.94736-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Following race may occur because of the call_srcu and the placement of
the synchronize_srcu vs the xa_erase.
CPU0 CPU1
mlx5_ib_free_implicit_mr: destroy_unused_implicit_child_mr:
xa_erase(odp_mkeys)
synchronize_srcu()
xa_lock(implicit_children)
if (still in xarray)
atomic_inc()
call_srcu()
xa_unlock(implicit_children)
xa_erase(implicit_children):
xa_lock(implicit_children)
__xa_erase()
xa_unlock(implicit_children)
flush_workqueue()
[..]
free_implicit_child_mr_rcu:
(via call_srcu)
queue_work()
WARN_ON(atomic_read())
[..]
free_implicit_child_mr_work:
(via wq)
free_implicit_child_mr()
mlx5_mr_cache_invalidate()
mlx5_ib_update_xlt() <-- UMR QP fail
atomic_dec()
The wait_event() solves the race because it blocks until
free_implicit_child_mr_work() completes.
Fixes: 5256edcb98 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200227113918.94432-1-leon@kernel.org
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Commit f164be8c03 ("IB/mlx5: Extend caps stage to handle VAR
capabilities") introduced a straight "/" division of the u64 variable
"bar_size".
This was fixed with commit 685eff5131 ("IB/mlx5: Use div64_u64 for
num_var_hw_entries calculation"). However, div64_u64() is redundant here
as mlx5_var_table::stride_size is of type u32. Make the actual code way
more optimized on 32-bit kernels using div_u64() and fix 80 chars
break-through by the way.
Fixes: 685eff5131 ("IB/mlx5: Use div64_u64 for num_var_hw_entries calculation")
Link: https://lore.kernel.org/r/20200217073629.8051-1-alobakin@dlink.ru
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl5cOX8eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGw0AH/0nex1uEpzUTm+Gw
D8QPFr3y61sYLu7sIMVt39+Zl6OSxvOOX14QIM/mrNrzjjRI8EXGvYgES5gSO4on
6NLS6/64c1oQThDHCsxusKoSWLZ9KqP2vRPt7tZjn7DZMzEsuLhlINKBeupcqALX
FnOBr768P+if/j0WcDR2pBaMg3ch+XC5sfYav7kapjgWUqCx9BvrHKLXXdlEGUC0
7Ku7PH+nF7CIHiTay+i89odvOd8aLGsa/SUf5XGauKkH65VgQkmksgPeZUPqTnyC
MEsyLJLfn4AP3ySwqzfSLac8jqZG8FGBt4DgM2MQBHibctzfeMIznfcfh/A8+Edx
jqLKLAs=
=4075
-----END PGP SIGNATURE-----
Merge tag 'v5.6-rc4' into rdma.git for-next
Required due to dependencies in following patches.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The proper return code is "-EOPNOTSUPP" when the requested QP type is
not supported by the provider.
Link: https://lore.kernel.org/r/20200130082049.463-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Relaxed ordering is not supported in UMR so we are disabling UMR usage
when user passes relaxed ordering access flag.
Enable using UMR when user requested relaxed ordering but there are no
relaxed ordering capabilities.
This will prevent user from unnecessarily registering a new mkey.
Fixes: d6de0bb185 ("RDMA/mlx5: Set relaxed ordering when requested")
Link: https://lore.kernel.org/r/20200227113834.94233-1-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/infiniband/hw/bnxt_re/qplib_rcfw.c: In function '__send_message':
drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:101:10: warning:
variable 'idx' set but not used [-Wunused-but-set-variable]
drivers/infiniband/hw/bnxt_re/qplib_rcfw.c:101:6: warning:
variable 'pg' set but not used [-Wunused-but-set-variable]
commit cee0c7bba4 ("RDMA/bnxt_re: Refactor command queue management
code") involved this, but not used.
Link: https://lore.kernel.org/r/20200227064900.92255-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_create_gsi_qp':
drivers/infiniband/hw/bnxt_re/ib_verbs.c:1283:30: warning:
variable 'dev_attr' set but not used [-Wunused-but-set-variable]
commit 8dae419f9e ("RDMA/bnxt_re: Refactor queue pair creation code")
involved this, but not used, so remove it.
Link: https://lore.kernel.org/r/20200227064542.91205-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/infiniband/hw/bnxt_re/qplib_res.c: In function '__alloc_pbl':
drivers/infiniband/hw/bnxt_re/qplib_res.c:109:13: warning:
variable 'pg_size' set but not used [-Wunused-but-set-variable]
commit 0c4dcd6028 ("RDMA/bnxt_re: Refactor hardware queue memory
allocation") involved this, but not used, so remove it.
Link: https://lore.kernel.org/r/20200227064209.87893-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Using the new unregister APIs provided by the core. Provide the
dealloc_driver hook for the core to callback at the time of device
un-registration.
bnxt_re VF resources are created by the corresponding PF driver. During
ib_unregister_driver, PF might get removed before VF and this could cause
failure when VFs are removed. Driver is explicitly queuing the removal of
VF devices before calling ib_unregister_driver.
Link: https://lore.kernel.org/r/1582731932-26574-3-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
- bnxt_re_ib_reg() handles two main functionalities - initializing the
device and registering with the IB stack. Split it into 2 functions
i.e. bnxt_re_dev_init() and bnxt_re_ib_init() to account for the same
thereby improve modularity. Do the same for
bnxt_re_ib_unreg()i.e. split into two functions - bnxt_re_dev_uninit()
and bnxt_re_ib_uninit().
- Simplify the code by combining the different steps to add and remove
the device into two functions.
- Report correct netdev link state during device register
Link: https://lore.kernel.org/r/1582731932-26574-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The packet handling function, specifically the iteration of the qp list
for mad packet processing misses locking RCU before running through the
list. Not only is this incorrect, but the list_for_each_entry_rcu() call
can not be called with a conditional check for lock dependency. Remedy
this by invoking the rcu lock and unlock around the critical section.
This brings MAD packet processing in line with what is done for non-MAD
packets.
Fixes: 7724105686 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20200225195445.140896.41873.stgit@awfm-01.aw.intel.com
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When destroying a DMA mmapped object, there is no need to artificially
delay the freeing of the pages to the mmap entry removal. Since the vma
keeps a reference count on these pages, free_pages_exact can be called on
the destroy verb as it won't really free the pages until the reference
count is cleared (in case the user hasn't called munmap yet).
Remove the special handling of DMA pages and call free_pages_exact on
destroy_qp/cq. The mmap entry removal is moved to the beginning of the
destroy flows, so the driver can safely free the pages.
Link: https://lore.kernel.org/r/20200225114010.21790-4-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The fact that the LSB in the register is the enable bit should not be an
implicit assumption between the driver and the device, properly document
that in the register definition.
Link: https://lore.kernel.org/r/20200225114010.21790-3-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use unified macros for device structs access instead of open coding the
shifts and masks over and over again.
Link: https://lore.kernel.org/r/20200225114010.21790-2-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Encapsulate the kernel qp doorbell allocation related code into 2
functions: alloc_qp_db() and free_qp_db().
Link: https://lore.kernel.org/r/1582526258-13825-8-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Encapsulate the kernel qp wrid allocation related code into 2 functions:
alloc_kernel_wrid() and free_kernel_wrid().
Link: https://lore.kernel.org/r/1582526258-13825-7-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Encapsulate qp buffer allocation related code into 3 functions:
alloc_qp_buf(), map_wqe_buf() and free_qp_buf().
Link: https://lore.kernel.org/r/1582526258-13825-5-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Encapsulate the code associated with the qp number assignment into
alloc_qpn() and free_qpn().
Link: https://lore.kernel.org/r/1582526258-13825-4-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Rename the qp context related functions and adjusts the code location to
distinguish between the qp context and the entire qp.
Link: https://lore.kernel.org/r/1582526258-13825-3-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Wrap the duplicate code in hip08 and hip06 qp destruction process as
hns_roce_qp_destroy() to simply the qp destroy flow.
Link: https://lore.kernel.org/r/1582526258-13825-2-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There are two paths to update qp producer index into hardware now, one
path is doorbell in post verbs (send and recv), the another is mailbox in
modify qp verb which is called by flush process. This will lead the
hardware to be broken to correctly generate flush cqe. With stopping
doorbell update and holding qp spinlock in modify qp during flush process,
the problem can be solved.
Fixes: 0425e3e6e0 ("RDMA/hns: Support flush cqe for hip08 in kernel space")
Link: https://lore.kernel.org/r/1582367158-27030-3-git-send-email-liuyixian@huawei.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Set revisions that equal to or higher than HIP08_B as default to maintain
backward compatibility.
Link: https://lore.kernel.org/r/1582363039-10714-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The algorithm pre-allocates a cm_id since allocation cannot be done while
holding the cm.lock spinlock, however it doesn't free it on one error
path, leading to a memory leak.
Fixes: 067b171b86 ("IB/cm: Share listening CM IDs")
Link: https://lore.kernel.org/r/20200221152023.GA8680@ziepe.ca
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The default version provided by "ethtool -i" it the correct way
to identify Driver version. There is no need to overwrite it.
Link: https://lore.kernel.org/r/20200220071239.231800-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is no need to set driver version in in-tree kernel code.
Link: https://lore.kernel.org/r/20200220071239.231800-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Make sure to set the active_{speed, width} attributes to avoid reporting
the same values regardless of the underlying device.
Fixes: 303ae1cdfd ("rdma/siw: application interface")
Link: https://lore.kernel.org/r/20200218095911.26614-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Tested-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>