Commit Graph

782047 Commits

Author SHA1 Message Date
Yishai Hadas 991d219829 IB/mlx5: Set uid as part of QP creation
Set uid as part of QP creation so that the firmware can manage the
QP object in a secured way.

The uid for the destroy and the modify commands is set by mlx5_core.

This will enable using a QP that was created by verbs application to
be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Yishai Hadas a1069c1c75 IB/mlx5: Use uid as part of PD commands
Use uid as part of PD commands so that the firmware can manage the
PD object in a secured way.

For example when a QP is created its uid must match the CQ uid which it
uses.

Next patches in this series will use the uid from the PD, then will come
a patch to set the uid on the PD so that all objects will be properly
work in one change.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:06:04 -06:00
Jason Gunthorpe 1d6fba92d7 Merge branch 'mellanox/mlx5-next' into rdma.git for-next
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

This is required to resolve dependencies of the next series of RDMA
patches.

* branch 'mellanox/mlx5-next':
  net/mlx5: Update mlx5_ifc with DEVX UID bits
  net/mlx5: Set uid as part of DCT commands
  net/mlx5: Set uid as part of SRQ commands
  net/mlx5: Set uid as part of SQ commands
  net/mlx5: Set uid as part of RQ commands
  net/mlx5: Set uid as part of QP commands
  net/mlx5: Set uid as part of CQ commands

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-25 14:02:48 -06:00
Leon Romanovsky bd37197554 net/mlx5: Update mlx5_ifc with DEVX UID bits
Add DEVX information to WQ, SRQ, CQ, TIR, TIS, QP,
RQ, XRCD, PD, MKEY and MCG.

Each object that is created/destroyed/modified via verbs will
be stamped with a UID based on its user context. This is already
done for DEVX objects commands.

This will enable the firmware to enforce the usage of kernel objects
from the DEVX flow by validating that the same UID is used and the
resources are really related to the same user.

The addition of *_valid fields are needed to distinguish
how various addresses are passed.

For non-DEVX callers, all those fields will be zero.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-25 10:10:58 +03:00
Yishai Hadas 774ea6eea2 net/mlx5: Set uid as part of DCT commands
Set uid as part of DCT commands so that the firmware can manage the
DCT object in a secured way.

That will enable using a DCT that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-25 09:52:55 +03:00
Yishai Hadas a0d8c05431 net/mlx5: Set uid as part of SRQ commands
Set uid as part of SRQ commands so that the firmware can manage the
SRQ object in a secured way.

That will enable using an SRQ that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-25 09:52:52 +03:00
Yishai Hadas 430ae0d5a3 net/mlx5: Set uid as part of SQ commands
Set uid as part of SQ commands so that the firmware can manage the
SQ object in a secured way.

That will enable using an SQ that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-25 09:52:48 +03:00
Yishai Hadas d269b3afff net/mlx5: Set uid as part of RQ commands
Set uid as part of RQ commands so that the firmware can manage the
RQ object in a secured way.

That will enable using an RQ that was created by verbs application
to be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-25 09:52:44 +03:00
Yishai Hadas 4ac63ec725 net/mlx5: Set uid as part of QP commands
Set uid as part of QP commands so that the firmware can manage the
QP object in a secured way.

That will enable using a QP that was created by verbs application to
be used by the DEVX flow in case the uid is equal.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-25 09:52:39 +03:00
Yishai Hadas 9ba481e2eb net/mlx5: Set uid as part of CQ commands
Set uid as part of CQ commands so that the firmware can manage the CQ
object in a secured way.

The firmware should mark this CQ with the given uid so that it can
be used later on only by objects with the same uid.

Upon DEVX flows that use this CQ (e.g. create QP command), the
pointed CQ must have the same uid as of the issuer uid command.

When a command is issued with uid=0 it means that the issuer of the
command is trusted (i.e. kernel), in that case any pointed object
can be used regardless of its uid.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-25 09:52:35 +03:00
Doug Ledford f9882bb506 Merge branch 'mlx5-vport-loopback' into rdma.get
For dependencies, branch based on 'mlx5-next' of
    git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

mlx5 mcast/ucast loopback control enhancements from Leon Romanovsky:

====================
This is short series from Mark which extends handling of loopback
traffic. Originally mlx5 IB dynamically enabled/disabled both unicast
and multicast based on number of users. However RAW ethernet QPs need
more granular access.
====================

Fixed failed automerge in mlx5_ib.h (minor context conflict issue)

mlx5-vport-loopback branch:
    RDMA/mlx5: Enable vport loopback when user context or QP mandate
    RDMA/mlx5: Allow creating RAW ethernet QP with loopback support
    RDMA/mlx5: Refactor transport domain bookkeeping logic
    net/mlx5: Rename incorrect naming in IFC file

Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 20:41:58 -04:00
Mark Bloch 0042f9e458 RDMA/mlx5: Enable vport loopback when user context or QP mandate
A user can create a QP which can accept loopback traffic, but that's not
enough. We need to enable loopback on the vport as well. Currently vport
loopback is enabled only when more than 1 users are using the IB device,
update the logic to consider whatever a QP which supports loopback was
created, if so enable vport loopback even if there is only a single user.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 20:20:59 -04:00
Mark Bloch 175edba856 RDMA/mlx5: Allow creating RAW ethernet QP with loopback support
Expose two new flags:
MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_UC
MLX5_QP_FLAG_TIR_ALLOW_SELF_LB_MC

Those flags can be used at creation time in order to allow a QP
to be able to receive loopback traffic (unicast and multicast).
We store the state in the QP to be used on the destroy path
to indicate with which flags the QP was created with.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 20:20:59 -04:00
Mark Bloch a560f1d9af RDMA/mlx5: Refactor transport domain bookkeeping logic
In preparation to enable loopback on a single user context move the logic
that enables/disables loopback to separate functions and group variables
under a single struct.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 20:20:59 -04:00
Mark Bloch 5d773ff41a net/mlx5: Rename incorrect naming in IFC file
Remove a trailing underscore from the multicast/unicast names.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-22 00:38:39 +03:00
zhong jiang 26f91da296 RDMA/cxgb4: remove redundant null pointer check before kfree_skb
kfree_skb has taken the null pointer into account. hence it is safe
to remove the redundant null pointer check before kfree_skb.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 12:00:50 -04:00
Nathan Chancellor fa8f11586a IB/mlx4: Remove unnecessary parentheses
Clang warns when more than one set of parentheses are used in single
conditional statements.

drivers/infiniband/hw/mlx4/mcg.c:676:16: warning: equality comparison
with extraneous parentheses [-Wparentheses-equality]
                        if ((method == IB_MGMT_METHOD_GET_RESP)) {
                             ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/mlx4/mcg.c:676:16: note: remove extraneous
parentheses around the comparison to silence this warning
                        if ((method == IB_MGMT_METHOD_GET_RESP)) {
                            ~       ^                         ~
drivers/infiniband/hw/mlx4/mcg.c:676:16: note: use '=' to turn this
equality comparison into an assignment
                        if ((method == IB_MGMT_METHOD_GET_RESP)) {
                                    ^~
                                    =

Remove the unnecessary parentheses to silence this warning.

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 12:00:50 -04:00
Nathan Chancellor b9f86e6e7b IB/nes: Remove unnecessary parentheses
Clang warns when more than one set of parentheses are used in single
conditional statements.

drivers/infiniband/hw/nes/nes_hw.c:1446:27: warning: equality comparison
with extraneous parentheses [-Wparentheses-equality]
        } while ((temp_phy_data2 == temp_phy_data));
                  ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
drivers/infiniband/hw/nes/nes_hw.c:1446:27: note: remove extraneous
parentheses around the comparison to silence this warning
        } while ((temp_phy_data2 == temp_phy_data));
                 ~               ^               ~
drivers/infiniband/hw/nes/nes_hw.c:1446:27: note: use '=' to turn this
equality comparison into an assignment
        } while ((temp_phy_data2 == temp_phy_data));
                                 ^~
                                 =

Remove the unnecessary parentheses to silence this warning.

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 12:00:50 -04:00
Jason Gunthorpe 2a3ccfdbeb RDMA/uverbs: Get rid of ucontext->tgid
Nothing uses this now, just delete it.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:58:36 -04:00
Jason Gunthorpe 56ac9dd917 RDMA/umem: Avoid synchronize_srcu in the ODP MR destruction path
synchronize_rcu is slow enough that it should be avoided on the syscall
path when user space is destroying MRs. After all the rework we can now
trivially do this by having call_srcu kfree the per_mm.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:58:36 -04:00
Jason Gunthorpe be7a57b41a RDMA/umem: Handle a half-complete start/end sequence
mmu_notifier_unregister() can race between a invalidate_start/end and
cause the invalidate_end to be skipped. This causes an imbalance in the
locking, which lockdep complains about.

This is not actually a bug, as we immediately kfree the memory holding the
lock, but it simple enough to fix.

Mark when the notifier is being destroyed and abort the start callback.
This can be done under the lock we already obtained, and can re-purpose
the invalidate_range test we already have.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:58:36 -04:00
Jason Gunthorpe ca748c39ea RDMA/umem: Get rid of per_mm->notifier_count
This is intrinsically racy and the scheme is simply unnecessary. New MR
registration can wait for any on going invalidation to fully complete.

      CPU0                              CPU1
                                  if (atomic_read())
 if (atomic_dec_and_test() &&
     !list_empty())
  { /* not taken */ }
                                       list_add()

Putting the new UMEM into some kind of purgatory until another invalidate
rolls through..

Instead hold the read side of the umem_rwsem across the pair'd start/end
and get rid of the racy 'deferred add' approach.

Since all umem's in the rbt are always ready to go, also get rid of the
mn_counters_active stuff.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:58:36 -04:00
Jason Gunthorpe f27a0d50a4 RDMA/umem: Use umem->owning_mm inside ODP
Since ODP had a single struct mmu_notifier located in the ucontext it
could only handle a single MM at a time, and this prevented it from using
the new owning_mm system.

With the prior rework it is now simple to let ODP track multiple MMs per
ucontext, finish the job so that the per_mm is allocated on a mm by mm
basis, and freed when the last umem is dropped from the ucontext.

As a side effect the new saner locking removes the lockdep splat about
nesting the umem_rwsem between mmu_notifier_unregister and
ib_umem_odp_release.

It also makes ODP work with multiple processes, across, fork, etc.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:58:36 -04:00
Jason Gunthorpe c9990ab39b RDMA/umem: Move all the ODP related stuff out of ucontext and into per_mm
This is the first step to make ODP use the owning_mm that is now part of
struct ib_umem.

Each ODP umem is linked to a single per_mm structure, which in turn, is
linked to a single mm, via the embedded mmu_notifier. This first patch
introduces the structure and reworks eveything to use it.

This also needs to introduce tgid into the ib_ucontext_per_mm, as
get_user_pages_remote() requires the originating task for statistics
tracking.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:54:46 -04:00
Jason Gunthorpe 597ecc5a09 RDMA/umem: Get rid of struct ib_umem.odp_data
This no longer has any use, we can use container_of to get to the
umem_odp, and a simple flag to indicate if this is an odp MR. Remove the
few remaining references to it.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:54:46 -04:00
Jason Gunthorpe 41b4deeaa1 RDMA/umem: Make ib_umem_odp into a sub structure of ib_umem
These two structures are linked together, use the container_of pattern
instead of a double allocation to make the code simpler and easier to
follow.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:54:46 -04:00
Jason Gunthorpe b5231b019d RDMA/umem: Use ib_umem_odp in all function signatures connected to ODP
All of these functions already require the ODP version of the umem struct,
make this very clear by having the signature require it. This paves the
way to using the container_of() pattern to link umem_odp and umem
together.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-21 11:54:46 -04:00
Jason Gunthorpe ece8ea7bfa RDMA/usnic: Do not use ucontext->tgid
Update this driver to match the code it copies from umem.c which no longer
uses tgid.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20 16:19:30 -04:00
Jason Gunthorpe d4b4dd1b97 RDMA/umem: Do not use current->tgid to track the mm_struct
This is just wrong, the process that calls into the reg_mr is the process
associated with the umem, and that does not have to be the same process
that created the context.

When this code was first written mmgrab() didn't exist, however these days
we can just directly hold the mm_struct pointer in the umem and have no
ambiguity when it comes to releasing the umem as to which mm it was
associated with.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20 16:19:30 -04:00
Jason Gunthorpe ce92db1ca8 RDMA/ucontext: Get rid of the old disassociate flow
The disassociate_ucontext function in every driver is now empty, so we
don't need this ugly and wrong code that was messing with tgids.

rdma_user_mmap_io does this same work in a better way.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20 16:19:30 -04:00
Jason Gunthorpe 6745d356ab RDMA/hns: Use rdma_user_mmap_io
Rely on the new core code helper to map BAR memory from the driver.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20 16:19:30 -04:00
Jason Gunthorpe e2cd1d1ad2 RDMA/mlx5: Use rdma_user_mmap_io
Rely on the new core code helper to map BAR memory from the driver.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20 16:19:30 -04:00
Jason Gunthorpe c282da4109 RDMA/mlx4: Use rdma_user_mmap_io
Rely on the new core code helper to map BAR memory from the driver.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20 16:19:30 -04:00
Jason Gunthorpe 5f9794dc94 RDMA/ucontext: Add a core API for mmaping driver IO memory
To support disassociation and PCI hot unplug, we have to track all the
VMAs that refer to the device IO memory. When disassociation occurs the
VMAs have to be revised to point to the zero page, not the IO memory, to
allow the physical HW to be unplugged.

The three drivers supporting this implemented three different versions
of this algorithm, all leaving something to be desired. This new common
implementation has a few differences from the driver versions:

- Track all VMAs, including splitting/truncating/etc. Tie the lifetime of
  the private data allocation to the lifetime of the vma. This avoids any
  tricks with setting vm_ops which Linus didn't like. (see link)
- Support multiple mms, and support properly tracking mmaps triggered by
  processes other than the one first opening the uverbs fd. This makes
  fork behavior of disassociation enabled drivers the same as fork support
  in normal drivers.
- Don't use crazy get_task stuff.
- Simplify the approach for to racing between vm_ops close and
  disassociation, fixing the related bugs most of the driver
  implementations had. Since we are in core code the tracking list can be
  placed in struct ib_uverbs_ufile, which has a lifetime strictly longer
  than any VMAs created by mmap on the uverbs FD.

Link: https://www.spinics.net/lists/stable/msg248747.html
Link: https://lkml.kernel.org/r/CA+55aFxJTV_g46AQPoPXen-UPiqR1HGMZictt7VpC-SMFbm3Cw@mail.gmail.com
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20 16:19:30 -04:00
liuyixian b00a92c8f2 RDMA/hns: Move all prints out of irq handle
It will trigger unnecessary interrupts caused by time out if prints inside
aeq handle under some configurations.  Thus, move all prints out of aeq
handle to work queue.

Signed-off-by: liuyixian <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-19 15:33:57 -06:00
Jason Gunthorpe 0099103926 RDMA/uverbs: Fix error unwind in ib_uverbs_add_one
The error path has several mistakes

- cdev_del should not be called if cdev_device_add fails
- We must call put_device on all the goto exit paths as that is what frees
  the uapi, SRCU and the struct itself.

While we are here consolidate all the uvdev_dev init that cannot fail at
the top.

Fixes: c5c4d92e70 ("RDMA/uverbs: Use cdev_device_add() instead of cdev_add()")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
2018-09-19 10:49:22 -06:00
YueHaibing 0965cc953a RDMA/core: Properly return the error code of rdma_set_src_addr_rcu
rdma_set_src_addr_rcu should check copy_src_l2_addr fails, rather than
always return 0. Also copy_src_l2_addr should return 'ret' as its return
value when rdma_translate_ip fails.

Fixes: c31d4b2ddf ("RDMA/core: Protect against changing dst->dev during destination resolve")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-19 10:12:44 -06:00
Håkon Bugge 802fa45cd3 RDMA/i40iw: Fix incorrect iterator type
Commit f27b4746f3 ("i40iw: add connection management code") uses an
incorrect rcu iterator, whilst holding the rtnl_lock. Since the
critical region invokes i40iw_manage_qhash(), which is a sleeping
function, the rcu locking and traversal cannot be used.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-19 10:08:20 -06:00
Jason Gunthorpe 6ebce44746 RDMA/uverbs: Remove is_closed from ib_uverbs_file
This does nothing but indicate if the uverbs_file is in the device's list,
use list_del_init instead.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-19 10:07:05 -06:00
Jason Gunthorpe 9a59739bd0 IB/rxe: Revise the ib_wr_opcode enum
This enum has become part of the uABI, as both RXE and the
ib_uverbs_post_send() command expect userspace to supply values from this
enum. So it should be properly placed in include/uapi/rdma.

In userspace this enum is called 'enum ibv_wr_opcode' as part of
libibverbs.h. That enum defines different values for IB_WR_LOCAL_INV,
IB_WR_SEND_WITH_INV, and IB_WR_LSO. These were introduced (incorrectly, it
turns out) into libiberbs in 2015.

The kernel has changed its mind on the numbering for several of the IB_WC
values over the years, but has remained stable on IB_WR_LOCAL_INV and
below.

Based on this we can conclude that there is no real user space user of the
values beyond IB_WR_ATOMIC_FETCH_AND_ADD, as they have never worked via
rdma-core. This is confirmed by inspection, only rxe uses the kernel enum
and implements the latter operations. rxe has clearly never worked with
these attributes from userspace. Other drivers that support these opcodes
implement the functionality without calling out to the kernel.

To make IB_WR_SEND_WITH_INV and related work for RXE in userspace we
choose to renumber the IB_WR enum in the kernel to match the uABI that
userspace has bee using since before Soft RoCE was merged. This is an
overall simpler configuration for the whole software stack, and obviously
can't break anything existing.

Reported-by: Seth Howell <seth.howell@intel.com>
Tested-by: Seth Howell <seth.howell@intel.com>
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-17 17:06:02 -06:00
YueHaibing cb816cd226 RDMA: Remove duplicated include from ib_addr.h
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-13 12:59:56 -04:00
Arseny Maslennikov f6350da41d IB/ipoib: Log sysfs 'dev_id' accesses from userspace
Some tools may currently be using only the deprecated attribute;
let's print an elaborate and clear deprecation notice to kmsg.

To do that, we have to replace the whole sysfs file, since we inherit
the original one from netdev.

Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-13 11:47:52 -04:00
Arseny Maslennikov 9b8b2a3230 IB/ipoib: Use dev_port to expose network interface port numbers
Some InfiniBand network devices have multiple ports on the same PCI
function. This initializes the `dev_port' sysfs field of those
network interfaces with their port number.

Prior to this the kernel erroneously used the `dev_id' sysfs
field of those network interfaces to convey the port number to userspace.

The use of `dev_id' was considered correct until Linux 3.15,
when another field, `dev_port', was defined for this particular
purpose and `dev_id' was reserved for distinguishing stacked ifaces
(e.g: VLANs) with the same hardware address as their parent device.

Similar fixes to net/mlx4_en and many other drivers, which started
exporting this information through `dev_id' before 3.15, were accepted
into the kernel 4 years ago.
See 76a066f2a2 (`net/mlx4_en: Expose port number through sysfs').

Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-13 11:47:52 -04:00
Arseny Maslennikov 4c0b6534c9 Documentation/ABI: document /sys/class/net/*/dev_port
The sysfs field was introduced 4 years ago along with fixes to various
drivers that erroneously used `dev_id' for that purpose, but it was not
properly documented anywhere.
See commit v3.14-rc3-739-g3f85944fe207.

Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-13 11:47:52 -04:00
Parav Pandit 0e9d2c19bf RDMA/core: Consider net ns of gid attribute for RoCE
When resolving destination address or route, when net namespace is
unavailable, refer to the net namespace of the netdevice of the SGID
attribute. This is typically the case for requests arriving from the
network for RoCE ports.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-12 16:32:17 -06:00
Parav Pandit d6b1764a8c RDMA/core: Introduce rdma_read_gid_attr_ndev_rcu() to check GID attribute
Introduce an API rdma_read_gid_attr_ndev_rcu() to return GID attribute
netdevice which is in UP state for accessing netdevice's fields such as
net namespace and ifindex.

This is useful for users who intent to access netdevice fields under rcu
lock.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-12 16:32:17 -06:00
Parav Pandit 6aaecd3856 RDMA/core: Simplify roce_resolve_route_from_path()
Currently RoCE route resolve functionality is split between two
functions. (a) roce_resolve_route_from_path() and its helper function
rdma_resolve_ip_route().

Due to this multiple sockaddr src structures are created in both functions
with rdma_dev_addr is an interface between the two for checks.

Since there is only one user of rdma_resolve_ip_route() as RoCE, combine
the functionality of both functions to roce_resolve_route_from_path() and
further reduce the scope of rdma_dev_addr to core/addr.c

This also allow to extend addr_resolve() in subsequent patch to consider
netdev properties of GID in safer way under rcu lock.

Additionally src and dst addresses were always provided, so skip the src
addr NULL pointer check as they are present on the stack now.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-12 16:32:17 -06:00
Parav Pandit c31d4b2ddf RDMA/core: Protect against changing dst->dev during destination resolve
During resolving address process, during route lookup and while performing
src address translation in case of loopback mode, hold the rcu lock so
that if netdevice is moving to different net namespace, or being
unregistered, it can be synchronized with net/core/dev.c, ie

change_net_namespace()
->dev_close_many()
  ->rt6_uncached_list_flush_dev() who would change dst->dev

to loopback device of the given net namespace.

Therefore, hold the rcu lock and sync with synchronize_net() of
change_net_namespace() to ensure that netdevice cannot get freed while
dst->dev is being used.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-12 16:32:16 -06:00
Parav Pandit 307edde8ef RDMA/core: Refer to network type instead of device type
Set and refer to rdma_dev_addr network type instead of dst->ndev to reduce
dependency on accessing dst netdevice.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-12 15:48:08 -06:00
Parav Pandit 783793b554 RDMA/core: Use common code flow for IPv4/6 for addr resolve
Use common code flow for resolving neighbour and for finding source
addresses.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-12 15:48:08 -06:00