When in switchdev mode, there is no need to do self loopback checks
as we can't receive those packets, we insert steering rules to the
eswitch that make sure packets can't be looped back.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This commit adds full support for IB representor:
1) Representors profile, We add two new profiles:
nic_rep_profile - This profile will be used to create an IB device that
represents the PF/UPLINK.
rep_profile - This profile will be used to create an IB device that
represents VFs. Each VF will be its own representor.
2) Proper load/unload callbacks, Those are called by the E-Switch when
moving to/from switchdev mode.
3) Different flow DB handling for when we in switchdev mode.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In order to forward traffic from representor's SQ to the right virtual
function, every time an SQ is created also add the corresponding flow rule
to the FDB.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When enabling many VFs and switching to switchdev mode, the total amount
of mkeys we try to allocate when loading representors is very large and
may cause timeouts on allocations, the same issues was observed on VFs
and we employ the same fix that was done for them. We avoid allocating
the full MR cache on load but still allow it to be manipulated once the
IB device is loaded.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently in switchdev mode we allow only for raw packet QPs.
Expose the right capabilities and set the gid table length to 0, also
make sure we don't try to enable RoCE, so split the function
to enable RoCE so representors can enable only the notifier needed for
net device events.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently we listen to netdev register/unregister event based on PCI
device. When in switchdev mode PF and representors share the same PCI
device, so in order to pair ib device and netdev in switchdev mode
compare the netdev that triggered the event to that of the representor.
Expose a function that lets you receive the netdev associated what
a given representor.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When we point to a representor, it means we are in switchdev mode.
The flow db is shared between PF and virtual function representors
so each rule created needs to have a match on its specific source port.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
A flow DB is a shared resource between PF and representors,
need to allocate it only when creating the PF IB device.
Once we add IB representors, they will use the flow db which was
created by the PF.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Create the basic infrastructure of registering and unregistering
IB representors. The load/unload callbacks are left empty and
proper implementation will be introduced in following patches.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current implementation of create CQ requires contiguous
memory, such requirement is problematic once the memory is
fragmented or the system is low in memory, it causes for
failures in dma_zalloc_coherent().
This patch implements new scheme of fragmented CQ to overcome
this issue by introducing new type: 'struct mlx5_frag_buf_ctrl'
to allocate fragmented buffers, rather than contiguous ones.
Base the Completion Queues (CQs) on this new fragmented buffer.
It fixes following crashes:
kworker/29:0: page allocation failure: order:6, mode:0x80d0
CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<>] dump_stack+0x19/0x1b
[<>] warn_alloc_failed+0x110/0x180
[<>] __alloc_pages_slowpath+0x6b7/0x725
[<>] __alloc_pages_nodemask+0x405/0x420
[<>] dma_generic_alloc_coherent+0x8f/0x140
[<>] x86_swiotlb_alloc_coherent+0x21/0x50
[<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
[<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
[<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
[<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
[<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
[<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJabj6pAAoJEHm+PkMAQRiGs8cIAJQFkCWnbz86e3vG4DuWhyA8
CMGHCQdUOxxFGa/ixhIiuetbC0x+JVHAjV2FwVYbAQfaZB3pfw2iR1ncQxpAP1AI
oLU9vBEqTmwKMPc9CM5rRfnLFWpGcGwUNzgPdxD5yYqGDtcM8K840mF6NdkYe5AN
xU8rv1wlcFPF4A5pvHCH0pvVmK4VxlVFk/2H67TFdxBs4PyJOnSBnf+bcGWgsKO6
hC8XIVtcKCH2GfFxt5d0Vgc5QXJEpX1zn2mtCa1MwYRjN2plgYfD84ha0xE7J0B0
oqV/wnjKXDsmrgVpncr3txd4+zKJFNkdNRE4eLAIupHo2XHTG4HvDJ5dBY2NhGU=
=sOml
-----END PGP SIGNATURE-----
Merge tag v4.15 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
To resolve conflicts in:
drivers/infiniband/hw/mlx5/main.c
drivers/infiniband/hw/mlx5/qp.c
From patches merged into the -rc cycle. The conflict resolution matches
what linux-next has been carrying.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The flags field the enum is used with comes directly from the uapi
so it belongs in the uapi headers for clarity and so userspace can
use it.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Failure in XRCD FW deallocation command leaves memory leaked and
returns error to the user which he can't do anything about it.
This patch changes behavior to always free memory and always return
success to the user.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The kmalloc() failure to allocate memory generates enough information
and doesn't need to be accompanied by another driver print.
Fixes: d69a24e036 ("IB/mlx5: Move IB event processing onto a workqueue")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch maps the new page to user space applications to
allow converting a user space completion timestamp to system wall
time at the lowest possible latency cost.
By using a versioning scheme we allow compatibility between current
and future userspace libraries.
The change moves mlx5_ib_mmap_cmd enum from mlx5_ib.h to the
abi header file mlx5-abi.h.
Reviewed-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
- Oops fix in hfi1 driver
- Use-after-free issue in iser-target
- Use of user supplied array index without proper checking
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaXnN2AAoJELgmozMOVy/dyYAP/3YBi6ofVHrIjKovrsA6fh+y
O1+YuG++2glgKF1QuZkZnrKqrJUIueHhGuKce2rRgWWMioAzOfd1mbXbwha3xKWh
4mH/BK/fv1k/TZy+eu6w4WMnjtExVinPp4qeSwvvfHh3BpJfnBOtPbYNdPVt94MV
QYPZuLarvSIJe8MFDbv6yitAIQWnPQLfGMWPW8OZb0ut7bsMNPyZ8LvcAXhfXA4a
tVuxAfCqUb90BHWOnCBcaWGNpp9Ov2w6yyNCDkiIRXvgJxhjvJwxpHze6TMa4fCo
mv8e4QDdQ8KK2vQaINQAJTl68Y1/fPxtI8um5tpIhJ5YjQE3rrVzTaQJRstBcXXs
ViiDmxm/4KTUTeFtWHRj7oVJTZNSOBu+FfBVmjqtcUl4k+t9tDfj2Lw/ZVXT/3KQ
+dsqUV+1PrQspCeFlKwg+e1mgXa3miyOYvMPacAHzRYBLv5oU6MPpb+WHbZMlTMt
duT50JfjseIB0s6F2aGLNRRXPWb9jVUwHwzOXaDGKwX1+5u6O2IKeUP7+15TgZsS
IFeLPodNN8W1N1z0TGcGL8zRXtY9GYhlYwrdMUe/4EWfxrrRtawLLCGGVhWKk/Rv
iPl/Qpxj/8f2DS0A5+N5iHTAZPoOXjNx6JtZf9AZhvI9blvwZVuGdr63mKZMUnpb
a0xeypv1bnYbv7k70zS8
=MHNK
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
"We had a few more items creep up over the last week. Given we are in
-rc8, these are obviously limited to bugs that have a big downside and
for which we are certain of the fix.
The first is a straight up oops bug that all you have to do is read
the code to see it's a guaranteed 100% oops bug.
The second is a use-after-free issue. We get away lucky if the queue
we are shutting down is empty, but if it isn't, we can end up oopsing.
We really need to drain the queue before destroying it.
The final one is an issue with bad user input causing us to access our
port array out of bounds. While fixing the array out of bounds issue,
it was noticed that the original code did the same thing twice (the
call to rdma_ah_set_port_num()), so its removal is not balanced by a
readd elsewhere, it was already where it needed to be in addition to
where it didn't need to be.
Summary:
- Oops fix in hfi1 driver
- use-after-free issue in iser-target
- use of user supplied array index without proper checking"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mlx5: Fix out-of-bound access while querying AH
IB/hfi1: Prevent a NULL dereference
iser-target: Fix possible use-after-free in connection establishment error
The rdma_ah_find_type() accesses the port array based on an index
controlled by userspace. The existing bounds check is after the first use
of the index, so userspace can generate an out of bounds access, as shown
by the KASN report below.
==================================================================
BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0
Read of size 4 at addr ffff880019ae2268 by task ibv_rc_pingpong/409
CPU: 0 PID: 409 Comm: ibv_rc_pingpong Not tainted 4.15.0-rc2-00031-gb60a3faf5b83-dirty #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
dump_stack+0xe9/0x18f
print_address_description+0xa2/0x350
kasan_report+0x3a5/0x400
to_rdma_ah_attr+0xa8/0x3b0
mlx5_ib_query_qp+0xd35/0x1330
ib_query_qp+0x8a/0xb0
ib_uverbs_query_qp+0x237/0x7f0
ib_uverbs_write+0x617/0xd80
__vfs_write+0xf7/0x500
vfs_write+0x149/0x310
SyS_write+0xca/0x190
entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x7fe9c7a275a0
RSP: 002b:00007ffee5498738 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fe9c7ce4b00 RCX: 00007fe9c7a275a0
RDX: 0000000000000018 RSI: 00007ffee5498800 RDI: 0000000000000003
RBP: 000055d0c8d3f010 R08: 00007ffee5498800 R09: 0000000000000018
R10: 00000000000000ba R11: 0000000000000246 R12: 0000000000008000
R13: 0000000000004fb0 R14: 000055d0c8d3f050 R15: 00007ffee5498560
Allocated by task 1:
__kmalloc+0x3f9/0x430
alloc_mad_private+0x25/0x50
ib_mad_post_receive_mads+0x204/0xa60
ib_mad_init_device+0xa59/0x1020
ib_register_device+0x83a/0xbc0
mlx5_ib_add+0x50e/0x5c0
mlx5_add_device+0x142/0x410
mlx5_register_interface+0x18f/0x210
mlx5_ib_init+0x56/0x63
do_one_initcall+0x15b/0x270
kernel_init_freeable+0x2d8/0x3d0
kernel_init+0x14/0x190
ret_from_fork+0x24/0x30
Freed by task 0:
(stack is not available)
The buggy address belongs to the object at ffff880019ae2000
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 104 bytes to the right of
512-byte region [ffff880019ae2000, ffff880019ae2200)
The buggy address belongs to the page:
page:000000005d674e18 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 00000001000c000c
raw: dead000000000100 dead000000000200 ffff88001a402000 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint
Cc: <stable@vger.kernel.org>
Fixes: 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Change mlx5_get_uars_page to return ERR_PTR in case of
allocation failure. Change all callers accordingly to
check the IS_ERR(ptr) instead of NULL.
Fixes: 59211bd3b6 ("net/mlx5: Split the load/unload flow into hardware and software flows")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
There are systems platform information management interfaces (such as
HOST2BMC) for which we cannot disable local loopback multicast traffic.
Separate disable_local_lb_mc and disable_local_lb_uc capability bits so
driver will not disable multicast loopback traffic if not supported.
(It is expected that Firmware will not set disable_local_lb_mc if
HOST2BMC is running for example.)
Function mlx5_nic_vport_update_local_lb will do best effort to
disable/enable UC/MC loopback traffic and return success only in case it
succeeded to changed all allowed by Firmware.
Adapt mlx5_ib and mlx5e to support the new cap bits.
Fixes: 2c43c5a036 ("net/mlx5e: Enable local loopback in loopback selftest")
Fixes: c85023e153 ("IB/mlx5: Add raw ethernet local loopback support")
Fixes: bded747bb4 ("net/mlx5: Add raw ethernet local loopback firmware command")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The initial assignment to mdev is redundant as mdev is re-assigned
later and the first assigned value is never read. Remove this
redundant assignment.
Cleans up clang warning:
drivers/infiniband/hw/mlx5/main.c:359:24: warning: Value stored
to 'mdev' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When operating in dual port RoCE mode FW doesn't support steering for
raw QPs on the slave port. They still work on the master port, but
the user has no way of knowing which port is the master. The
capability is reported per device, not per port.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Route performance query MADs to the correct mlx5_core_dev when using
dual port RoCE mode.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When in dual port mode setting a RoCE GID for any port flows through the
master ports mlx5_core_dev. Provide an interface to set the port when
sending this command.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Update the counter interface for multiple ports. Some counter sets
always comes from the primary device.
Port specific counters should be accessed per mlx5_core_dev not always
through the IB master mdev.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When there are multiple ports for single IB(RoCE) device, support
debugfs entries to be available for each port.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Port operations must be routed to their native mlx5_core_dev. A
multiport RoCE device registers itself as having 2 ports even before a
2nd port is affiliated. If an unaffilated port is queried use capability
information from the master port, these values are the same.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Because mlx5_ib_event can be called from atomic context move event
handling onto a workqueue. A mutex lock is required to get the IB device
for slave ports, so move event processing onto a work queue. When an IB
event is received, check if the mlx5_core_dev is a slave port, if so
attempt to get the IB device it's affiliated with. If found process the
event for that device, otherwise return.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When mlx5_ib_add is called determine if the mlx5 core device being
added is capable of dual port RoCE operation. If it is, determine
whether it is a master device or a slave device using the
num_vhca_ports and affiliate_nic_vport_criteria capabilities.
If the device is a slave, attempt to find a master device to affiliate it
with. Devices that can be affiliated will share a system image guid. If
none are found place it on a list of unaffiliated ports. If a master is
found bind the port to it by configuring the port affiliation in the NIC
vport context.
Similarly when mlx5_ib_remove is called determine the port type. If it's
a slave port, unaffiliate it from the master device, otherwise just
remove it from the unaffiliated port list.
The IB device is registered as a multiport device, even if a 2nd port is
not available for affiliation. When the 2nd port is affiliated later the
GID cache must be refreshed in order to get the default GIDs for the 2nd
port in the cache. Export roce_rescan_device to provide a mechanism to
refresh the cache after a new port is bound.
In a multiport configuration all IB object (QP, MR, PD, etc) related
commands should flow through the master mlx5_core_dev, other commands
must be sent to the slave port mlx5_core_mdev, an interface is provide
to get the correct mdev for non IB object commands.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When multiple RoCE ports are supported registration for events on
multiple netdevs is required. Refactor the event registration and
handling to support multiple ports.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Remove use of the num_ports general capability throughout. The number of
ports will be variable in the future, and reported in a different way.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
A DC Target (DCT) QP is represented in the hardware as a unique object.
This object is created by CREATE_DCT command and destroyed by DESTROY_DCT
command. However, in the driver we describe it as a QP.
The hardware command that creates a DCT needs parameters that the verb
create_qp() does not provide. Those remaining parameters are provided
with the call to the verb modify_qp(). Therefore we delay the actual
creation of a DCT in the hardware until the stage of modify_qp() to RTR.
A support for query_qp() was added as well. It uses QUERY_DCT command to
retrieve the applicable fields.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
DC Initiator (DCI) QP is represented like any other QP in the hardware.
However, like any other transport QP there are attributes and settings
that are special to DCI QP and needs specific attention and care.
Make necessary changes to configure DCI QP.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The QP type IB_QPT_DRIVER doesn't describe the transport or the service
that the QP provides but those are known only to the hardware driver.
The actual type of the QP is stored in the hardware driver context (i.e.
mlx5_qp) under the field qp_sub_type.
Take the real QP type and any extra data that is required to create the QP
from the driver channel and modify the QP initial attributes before continuing
with create_qp().
Downstream patches from this series will add support for both DCI and
DCT driver QPs.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Unconditional locks/list and ODP srcu initialization should be done in
the INIT stage. Remove those from the CAPS stage and move them to the
proper stage.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The loopback stage only initializes a lock, move it to be in
the CAPS initialization phase and get rid loopback step completely.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that we have a stage just for hardware counters, move all relevant
initialization logic into one place.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that we have a stage just for ODP, move all relevant
initialization logic into one place.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that we have a stage just for RoCE/ETH, move all relevant
initialization logic into one place.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Today we have single function which is used when we add an IB interface,
break this function into multiple functions.
Create stages and a generic mechanism to execute each stage.
This is in preparation for RDMA/IB representors which might not need
all stages or will do things differently in some of the stages.
This patch doesn't change any functionality.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch enables QP creation with a given BF index, this allows the
user space driver to share same BF between few QPs or alternatively have
a dedicated BF per QP.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch exposes the option to dynamic allocates a UAR, this
functionality will be used in downstream patch in this series as
part of QP creation.
Specifically, the user space driver asks for a UAR allocation in a given
page index, upon success this UAR and its bfregs can be used as part of
QP creation by the user space driver.
To enable allocating more than 256 UARs the page index is encoded in an
extra one byte just after the command byte.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch extends the alloc context flow to be prepared for working
with dynamic UAR allocations.
Currently upon alloc context there is some fix size of UARs that are
allocated (named 'static allocation') and there is no option to user
application to ask for more or control which UAR will be used by which
QP.
In this patch the driver prepares its data structures to manage both the
static and the dynamic allocations and let the user driver knows about
the max value of dynamic blue-flame registers that are allowed.
Downstream patches from this series will enable the dynamic allocation
and the association as part of QP creation.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add missing inner RSS support capability as part of
the RSS supported fields.
In addition change MLX5_RX_HASH_INNER to 1UL << 31 in
order to define it as unsigned.
Fixes: 309fa3470f ("IB/mlx5: Add support for RSS on the inner packet")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Patches for 4.16 that are dependent on patches sent to 4.15-rc.
These are small clean ups for the vmw_pvrdma and i40iw drivers.
* 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git:
RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header
RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file
i40iw: Change accelerated flag to bool
ibmr.device is being set only after ib_alloc_mr() is
(successfully) complete. Therefore, in case mlx5_core_create_mkey()
return with error, the error flow calls mlx5_free_priv_descs()
which uses ibmr.device (which doesn't exist yet), causing
a NULL dereference oops.
To fix this, the IB device should be set in the mr struct earlier
stage (e.g. prior to calling mlx5_core_create_mkey()).
Fixes: 8a187ee52b ("IB/mlx5: Support the new memory registration API")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
User-space applications can do mmap and munmap directly at
any time.
Since the VMA list is not protected with a mutex, concurrent
accesses to the VMA list from the mmap and munmap can cause
data corruption. Add a mutex around the list.
Cc: <stable@vger.kernel.org> # v4.7
Fixes: 7c2344c3bb ("IB/mlx5: Implements disassociate_ucontext API")
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.
Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Congestion counters are counted and queried per physical function.
When working in LAG mode, CNP packets can be sent or received on both
of the functions, thus congestion counters should be aggregated from
the two physical functions.
Fixes: e1f24a79f4 ("IB/mlx5: Support congestion related counters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
A warning that I thought I had fixed before occasionally comes
back in rare randconfig builds (I found 7 instances in the last
100000 builds, originally it was much more frequent):
drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_ib_reg_user_mr':
drivers/infiniband/hw/mlx5/mr.c:1229:5: error: 'order' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (order <= mr_cache_max_order(dev)) {
^
drivers/infiniband/hw/mlx5/mr.c:1247:8: error: 'ncont' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/infiniband/hw/mlx5/mr.c:1247:8: error: 'page_shift' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/infiniband/hw/mlx5/mr.c:1260:2: error: 'npages' may be used uninitialized in this function [-Werror=maybe-uninitialized]
I've looked at all those findings again and noticed that they are all
with CONFIG_INFINIBAND_USER_MEM=n, which means ib_umem_get() returns
an error unconditionally and we never initialize or use those variables.
This triggers a condition in gcc iff mr_umem_get() is partially but not
entirely inlined, which in turn depends on the exact combination of
optimization settings. This is a known problem with gcc, with no
easy solution in the compiler, so this adds another workaround that
should be more reliable than my previous attempt.
Returning an error from mlx5_ib_reg_user_mr() earlier means that we
can completely bypass the logic that caused the warning, the compiler
can now see that the variable is never accessed.
Fixes: 14ab8896f5 ("IB/mlx5: avoid bogus -Wmaybe-uninitialized warning")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
query_device can now obtain the maximum values for
cq_max_count and cq_period, needed for cq moderation.
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Exposed mlx5_ib_modify_cq to be called from ib device
verb list.
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When working over a RoCE network, the UDP source port should be set only
for statically connected QPs (RC, UC and XRC).
Fixes: 2811ba51b0 ("IB/mlx5: Add RoCE fields to Address Vector")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The UMR's QP is created by calling mlx5_ib_create_qp directly, and
therefore the send CQ and the recv CQ on the ibqp weren't assigned.
Assign them right after calling the mlx5_ib_create_qp to assure
that any access to those pointers will work as expected and won't
crash the system as might happen as part of reset flow.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add the PCI write end padding flag to device_cap_flags enum and set it
during mlx5_ib_query_device so it will be reported to user-space.
During WQ/QP creation, set that capability for WQ/QP if user requested
it and HW supports it.
PCI write end padding modification is not supported for now. There's no
such flag for a QP but for a WQ, create and modify use the same flag.
Return an error if PCI write end padding flag is set during modify_wq.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Some user space application would like to do RSS on the inner
packet fields instead on the outer.
When MLX5_RX_HASH_INNER is set with one or more of the other
hash fields, then the RSS will be done using the inner packet.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The device can support receive Stateless Offloads for the inner
packet's fields only when the packet is processed by TIR which is
enabled to support tunneling. Otherwise, the device treats the
packet as an ordinary non-tunneling packet and receive offloads
can be done only for the outer packet's field.
In order to enable receive Stateless Offloading support for incoming
tunneling traffic the TIR should be created with tunneled_offload_en.
Tunneling offloads is supported only be raw ethernet QP.
This patch includes:
* New QP creation flag for tunneling offloads.
* Reports device capabilities.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In some benchmarks and some CPU architectures, writing the CQE on a full
cache line size improves performance by saving memory access operations
(read-modify-write) relative to partial cache line change. This patch
lets the user to configure the device to pad the CQE up to 128B in case
its content is less than 128B. Currently the driver supports only padding
for a CQE size of 128B.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In commit 1cbe6fc86c ("IB/mlx5: Add support for CQE compressing") the
concept of CQE compression was introduced and added a support for 64B
CQE size. This change update the code to support 128B CQE size as well.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Allow creation of a multi-packet receive queue.
In order to create a multi-packet RQ, the following fields in
the mlx5_ib_rwq should be set:
- log_num_strides: Log of number of strides per WQE
- single_stride_log_num_of_bytes: Log of a single stride size
- two_byte_shift_en: When enabled, hardware pads 2 bytes of zeros
before writing the message to memory (e.g. for the IP alignment).
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch reports the device's striding RQ capabilities to
the user-space:
- min/max_single_stride_log_num_of_bytes: Log of min/max number of
bytes in a single stride.
- min/max_single_wqe_log_num_of_strides: Log of min/max number of
strides in a single WQE.
- supported_qpts: A bit mask to know which QP types support multi-
packet RQ, for now only Raw Packet QPs.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The early for-next branch was based on v4.14-rc2, while the shared pull
request I got from Mellanox used a v4.14-rc4 base. I'm making the
branch that was the shared Mellanox pull request the new for-next branch
and merging the early for-next branch into it.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Using the ARRAY_SIZE macro improves the readability of the code.
Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
(sizeof(E)@p /sizeof(*E))
|
(sizeof(E)@p /sizeof(E[...]))
|
(sizeof(E)@p /sizeof(T))
)
Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Since IB/core resolves the destination mac address for user and kernel
consumers, avoid resolving in multiple provider drivers.
Only ib_core resolves DMAC now, therefore resolve_eth_dmac is removed as
exported symbol.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
References: commit 5fe9dec0d0 ("IB/mlx5: Use blue flame register allocator in mlx5_ib")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Avoid that gcc 7 reports the following warning when building with W=1:
warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When UAR get_page fails, it needs to continue to cleanup debugfs for
congestion control parameters. Labels for error path were incorrectly
ordered.
This patch fixes to do correct cleanup on debugfs init failure and uar
get page failure.
Fixes: 4a2da0b8c0 ("IB/mlx5: Add debug control parameters for congestion control")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
pr_err() and mlx5_ib_dbg( messages should terminated with a new-line to
avoid other messages being concatenated.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
mlx5_ib_reg_user_mr called mlx5_ib_dereg_mr in case of MR population
failure. This resulted in a NULL dereference as ibmr->device wasn't
initialized yet.
We address this by adding an internal dereg_mr function that can handle
partially initialized MRs, and fixing clean_mr to work on partially
initialized MRs.
Fixes: ff740aefec ("IB/mlx5: Decouple MR allocation and population flows")
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The patch simplifies mlx5_ib_cont_pages and fixes the following
issues in the original implementation:
First issues is related to alignment of the PFNs. After the check
base + p != PFN, the alignment of the PFN wasn't checked. So the PFN
sequence 0, 1, 1, 2 would result in a page_shift of 13 even though
the 3rd PFN is not 8KB aligned.
This wasn't actually a bug because it was supported by all the
existing mlx5 compatible device, but we don't want to require
this support in all future devices.
Another issue is because the inner loop didn't advance PFN so
the test "if (base + p != pfn)" always failed for SGE with
len > (1<<page_shift).
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The tag matching functionality is implemented by mlx5 driver
by extending XRQ, however this internal kernel information was
exposed to user space applications with *xrq* name instead of *tm*.
This patch renames *xrq* to *tm* to handle that.
Fixes: 8d50505ada ("IB/uverbs: Expose XRQ capabilities")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If delay_drop_debugfs_init() fails in any of the operations to create
debugfs, it is calling delay_drop_debugfs_cleanup() as part of its
cleanup. But delay_drop_debugfs_cleanup() checks for 'dbg' and since
we have not yet pointed 'dbg' to the debugfs we need to cleanup, the
cleanup fails and we are left with stray debugfs elements and also a
memory leak.
Fixes: 4a5fd5d296 ("IB/mlx5: Add necessary delay drop assignment")
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pass to mlx5_core flag to enable rendezvous offload, list_size and CQ
when SRQ created with IB_SRQT_TM.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Before this change CQ attached to SRQ was part of XRC specific extension.
Moving CQ handle out makes it available to other types extending SRQ
functionality.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Expose enhanced multi packet WQE capability to user space through
query_device by uhw.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Set the field to allow posting multi packet send WQEs if hardware
supports this feature. This doesn't mean the send WQEs will be for
multi packet unless the send WQE was prepared according to multi
packet send WQE format.
User space shall use flag MLX5_IB_ALLOW_MPW to check if hardware
supports MPW and allows MPW in SQ context.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Set underlay QPN as part of flow rule when it's applicable.
There is one root flow table in the NIC RX namespace and all the
underlay QPs steer the traffic to this flow table.
In order to prevent QP to get traffic which is not target to its
underlay QP, we need to set the underlay QP number as part of
the steering matching.
Note:
When multicast traffic is sent the QPN filtering is done by the firmware
as some early step. Adding the QPN match on the flow table entry is
wrong as by that time the target QPN holds the multicast address (e.g.
FF(s)) and it won't match.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Fix a bug where MR registration fails when mlx5_ib_cont_pages
indicates that the MR can be mapped using 2GB pages (page_shift == 31).
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In clean_mr error path the 'mr' should be freed.
Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
mlx5 compatible devices have two ways of populating the MTT
table of an MKEY: using a FW command and using a UMR WQE.
A UMR is much faster, so it should be used whenever possible.
Unfortunately the code today uses UMR only if the MKEY was allocated
from the MR cache.
Fix the code to use UMR even for MKEYs that were allocated using
a FW command.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch is the first step in decoupling UMR usage and
allocation from the MR cache. The only functional change
in this patch is to enables UMR for MRs created with
reg_create.
This change fixes a bug where ODP memory regions that
were not allocated from the MR cache did not have UMR
enabled.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Software parsing (SWP) is a feature that can be used to instruct the
device to stop using its internal parser and to parse packets on the
transmit path according to offsets set for each packets.
Through this feature, the device allows the handling of checksum and
LSO by the hardware according to the location of IP and TCP/UDP
headers.
Enable SW parsing on Raw Ethernet send queue by default if firmware
supports it and report these capabilities to user space.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The mlx5_ib_get_vector_affinity() call is local to main.c file and there
is no need to be declared globally visible.
Fixes: 40b24403f3 ("mlx5: support ->get_vector_affinity")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Assign the statistics and configuration structure pointer on success.
Fixes: fe248c3a58 ('IB/mlx5: Add delay drop configuration and statistics')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
CM layer calls ib_modify_port() regardless of the link layer.
For the Ethernet ports, qkey violation and Port capabilities
are meaningless. Therefore, always return success for ib_modify_port
calls on the Ethernet ports.
Cc: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We observed multiple times on our Lustre OSS servers that when
the system memory is fragmented, kmalloc() in create_kernel_qp()
could fail order 4/5 allocations while we still have many free pages.
Switch to kvmalloc_array() to allow the operation to contine.
Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Most user verbs pass user data to the kernel with the inclusion of the
ib_uverbs_cmd_hdr structure. This is problematic because the vendor has
no ideas if the verb was called by a legacy verb or an extended verb.
Also, the incosistency between the verbs is confusing.
Fixes: 565197dd8f ("IB/core: Extend ib_uverbs_create_cq")
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch series primarily increases sizes of variables that hold
lid values from 16 to 32 bits. Additionally, it adds a check in
the IB mad stack to verify a properly formatted MAD when OPA
extended LIDs are used.
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Conflicts:
include/rdma/ib_verbs.h - Modified a function signature adjacent
to a newly added function signature from a previous merge
Signed-off-by: Doug Ledford <dledford@redhat.com>
Conflicts:
drivers/infiniband/hw/mlx5/main.c - Both add new code
include/rdma/ib_verbs.h - Both add new code
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is a need to forward FW version to user space
application through RDMA netlink. In order to make it safe, there
is need to declare nla_policy and limit the size of FW string.
The new define IB_FW_VERSION_NAME_MAX will limit the size of
FW version string. That define was chosen to be equal to
ETHTOOL_FWVERS_LEN, because many drivers anyway are limited
by that value indirectly.
The introduction of this define allows us to remove the string size
from get_fw_str function signature.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
slid field in struct ib_wc is increased to 32 bits.
This enables core components to use larger LIDs if needed.
The user ABI is unchanged and return 16 bit values when queried.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The extended address vector is the highest bit in be32 variable,
but it was compared with the lowest. This patch fixes the endianness
of that check and removes already declared define.
Fixes: 17d2f88f92 ("IB/mlx5: Add ODP atomics support")
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>