Add needed structure layouts and defines for pci sync for fw update
event. The downstream patches will include event handlers for this event
type.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The imm_inval_pkey field can hold four different types of data,
depends on the usage, the data could be one of the below:
- Immediate field of the received message
- Invalidate rkey
- Pkey of the packet
- Flow table metadata
Current implementation doesn't reflect the intended usage of the
field at usage time.
Reflect the different types by replace this field with a union,
modify code where this field is used to reflect its intended
usage.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add new RDMA TX flow steering namespace. Flow steering rules in
this namespace are used to filter transmitted RDMA traffic.
Link: https://lore.kernel.org/r/20200324061425.1570190-2-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
On load, Driver caches MCAM (Management Capabilities Mask Register)
registers. in addition to the only MCAM register group (0) the driver
already reads, here we add support for reading groups 1 and 2.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Expose vDPA emulation device capabilities from the core layer.
It includes reading the capabilities from the firmware and exposing
helper functions to access the data.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
This cycle mainly saw lots of bug fixes and clean up code across the core
code and several drivers, few new functional changes were made.
- Many cleanup and bug fixes for hns
- Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
bnxt_re, efa
- Share the query_port code between all the iWarp drivers
- General rework and cleanup of the ODP MR umem code to fit better with
the mmu notifier get/put scheme
- Support rdma netlink in non init_net name spaces
- mlx5 support for XRC devx and DC ODP
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl2A1ugACgkQOG33FX4g
mxp+EQ//Ug8CyyDs40SGZztItoGghuQ4TVA2pXtuZ9LkFVJRWhYPJGOadHIYXqGO
KQJJpZPQ02HTZUPWBZNKmD5bwHfErm4cS73/mVmUpximnUqu/UsLRJp8SIGmBg1D
W1Lz1BJX24MdV8aUnChYvdL5Hbl52q+BaE99Z0haOvW7I3YnKJC34mR8m/A5MiRf
rsNIZNPHdq2U6pKLgCbOyXib8yBcJQqBb8x4WNAoB1h4MOx+ir5QLfr3ozrZs1an
xXgqyiOBmtkUgCMIpXC4juPN/6gw3Y5nkk2VIWY+MAY1a7jZPbI+6LJZZ1Uk8R44
Lf2KSzabFMMYGKJYE1Znxk+JWV8iE+m+n6wWEfRM9l0b4gXXbrtKgaouFbsLcsQA
CvBEQuiFSO9Kq01JPaAN1XDmhqyTarG6lHjXnW7ifNlLXnPbR1RJlprycExOhp0c
axum5K2fRNW2+uZJt+zynMjk2kyjT1lnlsr1Rbgc4Pyionaiydg7zwpiac7y/bdS
F7/WqdmPiff78KMi187EF5pjFqMWhthvBtTbWDuuxaxc2nrXSdiCUN+96j1amQUH
yU/7AZzlnKeKEQQCR4xddsSs2eTrXiLLFRLk9GCK2eh4cUN212eHTrPLKkQf1cN+
ydYbR2pxw3B38LCCNBy+bL+u7e/Tyehs4ynATMpBuEgc5iocTwE=
=zHXW
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull RDMA subsystem updates from Jason Gunthorpe:
"This cycle mainly saw lots of bug fixes and clean up code across the
core code and several drivers, few new functional changes were made.
- Many cleanup and bug fixes for hns
- Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed,
bnxt_re, efa
- Share the query_port code between all the iWarp drivers
- General rework and cleanup of the ODP MR umem code to fit better
with the mmu notifier get/put scheme
- Support rdma netlink in non init_net name spaces
- mlx5 support for XRC devx and DC ODP"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits)
RDMA: Fix double-free in srq creation error flow
RDMA/efa: Fix incorrect error print
IB/mlx5: Free mpi in mp_slave mode
IB/mlx5: Use the original address for the page during free_pages
RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp"
RDMA/hns: Package operations of rq inline buffer into separate functions
RDMA/hns: Optimize cmd init and mode selection for hip08
IB/hfi1: Define variables as unsigned long to fix KASAN warning
IB/{rdmavt, hfi1, qib}: Add a counter for credit waits
IB/hfi1: Add traces for TID RDMA READ
RDMA/siw: Relax from kmap_atomic() use in TX path
IB/iser: Support up to 16MB data transfer in a single command
RDMA/siw: Fix page address mapping in TX path
RDMA: Fix goto target to release the allocated memory
RDMA/usnic: Avoid overly large buffers on stack
RDMA/odp: Add missing cast for 32 bit
RDMA/hns: Use devm_platform_ioremap_resource() to simplify code
Documentation/infiniband: update name of some functions
RDMA/cma: Fix false error message
RDMA/hns: Fix wrong assignment of qp_access_flags
...
To resolve dependencies in following patches
mlx5_ib.h conflict resolved by keeing both hunks
Linux 5.3-rc8
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Map capability bit indicating that HCA supports port buffer's congestion
counters. Also map registers with the corresponding counters.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Merge mlx5-next patches needed for upcoming mlx5 software steering.
1) Alex adds HW bits and definitions required for SW steering
2) Ariel moves device memory management to mlx5_core (From mlx5_ib)
3) Maor, Cleanups and fixups for eswitch mode and RoCE
4) Mark, Set only stag for match untagged packets
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add two events that were defined in the device specification but were
not exposed in the driver list.
Post this patch those events can be read over the DEVX events interface
once be reported by the firmware.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Edward Srouji <edwards@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20190808084358.29517-4-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Misc updates from mlx5-next branch:
1) Add the required HW definitions and structures for upcoming TLS
support.
2) Add support for MCQI and MCQS hardware registers for fw version query.
3) Added hardware bits and structures definitions for sub-functions
4) Small code cleanup and improvement for PF pci driver.
5) Bluefield (ECPF) updates and refactoring for better E-Switch
management on ECPF embedded CPU NIC:
5.1) Consolidate querying eswitch number of VFs
5.2) Register event handler at the correct E-Switch init stage
5.3) Setup PF's inline mode and vlan pop when the ECPF is the
E-Swtich manager ( the host PF is basically a VF ).
5.4) Handle Vport UC address changes in switchdev mode.
6) Cleanup the rep and netdev reference when unloading IB rep.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
i# All conflicts fixed but you are still merging.
Use the reported device capabilities for the supported user events (i.e.
affiliated and un-affiliated) to set the EQ mask.
As the event mask can be up to 256 defined by 4 entries of u64 change
the applicable code to work accordingly.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
If a FW assert is considered fatal, indicated by a new bit in the health
buffer, reset the FW. After the reset go through the normal recovery
flow. Only one PF needs to issue the reset, so an attempt is made to
prevent the 2nd function from also issuing the reset.
It's not an error if that happens, it just slows recovery.
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To support sriov on a E-Switch manager, num_vfs are queried
to the firmware whenever E-Switch manager is notified by
esw_functions_changed event.
Replace host_params event with esw_functions_changed event that reflects
more appropriate naming.
While at it, also correct num_vfs type from int to u16 as expected by
the function mlx5_esw_query_functions().
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This merge commit includes some misc shared code updates from mlx5-next branch needed
for net-next.
1) From Aya: Enable general events on all physical link types and
restrict general event handling of subtype DELAY_DROP_TIMEOUT in mlx5 rdma
driver to ethernet links only as it was intended.
2) From Eli: Introduce low level bits for prio tag mode
3) From Maor: Low level steering updates to support RDMA RX flow
steering and enables RoCE loopback traffic when switchdev is enabled.
4) From Vu and Parav: Two small mlx5 core cleanups
5) From Yevgeny add HW definitions of geneve offloads
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Introduce specification for Geneve decap flow with encapsulation options
and allow creation of rules that are matching on Geneve TLV options.
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add new flow steering namespace - MLX5_FLOW_NAMESPACE_RDMA_RX.
Flow steering rules in this namespace are used to filter
RDMA traffic.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Handle event of power state change in the PCIE slot. When the event
occurs, check if query power state and PCI power fields is supported. If
so, read these fields from MPEIN (management PCIE info) register and
issue a corresponding message.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In Embedded CPU (EC) configurations, the EC driver needs to know when
the number of virtual functions change on the corresponding PF at the
host side. This is required so the EC driver can create or destroy
representor net devices that represent the VFs ports.
Whenever a change in the number of VFs occurs, firmware will generate an
event towards the EC which will trigger a work to complete the rest of
the handling. The specifics of the handling will be introduced in a
downstream patch.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Mellanox's SmartNIC combines embedded CPU(e.g, ARM) processing power
with advanced network offloads to accelerate a multitude of security,
networking and storage applications.
With the introduction of the SmartNIC, there is a new PCI function
called Embedded CPU Physical Function(ECPF). And it's possible for a
PF to get its ICM pages from the ECPF PCI function. Driver shall
identify if it is running on such a function by reading a bit in
the initialization segment.
When firmware asks for pages, it would issue a page request event
specifying how many pages it requests and for which function. That
driver responds with a manage_pages command providing the requested
pages along with an indication for which function it is providing these
pages.
The encoding before this patch was as follows:
function_id == 0: pages are requested for the function receiving
the EQE.
function_id != 0: pages are requested for VF identified by the
function_id value
A new one bit field in the EQE identifies that pages are requested for
the ECPF.
The notion of page_supplier can be introduced here and to support that,
manage pages and query pages were modified so firmware can distinguish
the following cases:
1. Function provides pages for itself
2. PF provides pages for its VF
3. ECPF provides pages to itself
4. ECPF provides pages for another function
This distinction is possible through the introduction of the bit
"embedded_cpu_function" in query_pages, manage_pages and page request
EQE.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Better to use void * and avoid unnecessary casts.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To avoid compatibility issue with older kernels the firmware doesn't
allow SRQ to work with ODP unless kernel asks for it.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Add support for the HW feature of multi-packet WQE in XDP
xmit flow.
The conventional TX descriptor (WQE, Work Queue Element) serves
a single packet. Our HW has support for multi-packet WQE (MPWQE)
in which a single descriptor serves multiple TX packets.
This reduces both the PCI overhead and the CPU cycles wasted on
writing them.
In this patch we add support for the HW feature, which is supported
starting from ConnectX-5.
Performance:
Tested packet rate for UDP 64Byte multi-stream over ConnectX-5 NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
XDP_TX:
We see a huge gain on single port ConnectX-5, and reach the 100 Mpps
milestone.
* Single-port HCA:
Before: 70 Mpps
After: 100 Mpps (+42.8%)
* Dual-port HCA:
Before: 51.7 Mpps
After: 57.3 Mpps (+10.8%)
* In both cases we tested traffic on one port and for now On Dual-port HCAs
we see only small gain, we are working to overcome this bottleneck, but
for the moment only with experimental firmware on dual port HCAs we can
reach the wanted numbers as seen on Single-port HCAs.
XDP_REDIRECT:
Redirect from (A) ConnectX-5 to (B) ConnectX-5.
Due to a setup limitation, (A) and (B) are on different NUMA nodes,
so absolute performance numbers are not optimal.
Note:
Below is the transmit rate of (B), not the redirect rate of (A)
which is in some cases higher.
* (B) is single-port:
Before: 77 Mpps
After: 90 Mpps (+16.8%)
* (B) is dual-port:
Before: 61 Mpps
After: 72 Mpps (+18%)
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Will be used in downstream patch to monitor counter changes
by the HCA and report it to the driver by an event.
The driver will update its counters cached data accordingly.
Signed-off-by: Eyal Davidovich <eyald@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Introduce and use a helper that extracts the opcode
from a CQE (completion queue entry) structure.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use atomic_notifier_chain to fire firmware events at internal mlx5 core
components such as eswitch/fpga/clock/FW tracer/etc.., this is to
avoid explicit calls from low level mlx5_core to upper components and to
simplify the mlx5_core API for future developments.
Simply provide register/unregister notifiers API and call the notifier
chain on firmware async events.
Example: to subscribe to a FW event:
struct mlx5_nb port_event;
MLX5_NB_INIT(&port_event, port_event_handler, PORT_CHANGE);
mlx5_eq_notifier_register(mdev, &port_event);
where:
- port_event_handler is the notifier block callback.
- PORT_EVENT is the suffix of MLX5_EVENT_TYPE_PORT_CHANGE.
The above will guarantee that port_event_handler will receive all FW
events of the type MLX5_EVENT_TYPE_PORT_CHANGE.
To receive all FW/HW events one can subscribe to
MLX5_EVENT_TYPE_NOTIFY_ANY.
The next few patches will start moving all mlx5 core components to use
this new API and cleanup mlx5_eq_async_int misx handler from component
explicit calls and specific logic.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5 updates for both net-next and rdma-next
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: (21 commits)
net/mlx5: Expose DC scatter to CQE capability bit
net/mlx5: Update mlx5_ifc with DEVX UID bits
net/mlx5: Set uid as part of DCT commands
net/mlx5: Set uid as part of SRQ commands
net/mlx5: Set uid as part of SQ commands
net/mlx5: Set uid as part of RQ commands
net/mlx5: Set uid as part of QP commands
net/mlx5: Set uid as part of CQ commands
net/mlx5: Rename incorrect naming in IFC file
net/mlx5: Export packet reformat alloc/dealloc functions
net/mlx5: Pass a namespace for packet reformat ID allocation
net/mlx5: Expose new packet reformat capabilities
{net, RDMA}/mlx5: Rename encap to reformat packet
net/mlx5: Move header encap type to IFC header file
net/mlx5: Break encap/decap into two separated flow table creation flags
net/mlx5: Add support for more namespaces when allocating modify header
net/mlx5: Export modify header alloc/dealloc functions
net/mlx5: Add proper NIC TX steering flow tables support
net/mlx5: Cleanup flow namespace getter switch logic
net/mlx5: Add memic command opcode to command checker
...
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Today mlx5 devices support two teardown modes:
1- Regular teardown
2- Force teardown
This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the pages.
Fast teardown provides the following advantages:
1- Fix a FW race condition that could cause command timeout
2- Avoid moving to polling mode
3- Close the vport to prevent PCI ACK to be sent without been scatter
to memory
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Extend the ability to add steering rules to NIC TX flow tables.
For now, we are only adding TX bypass (egress) which is used by the RDMA
side. This will allow to shape outgoing traffic and tweak it if needed, for
example performing encapsulation or rewriting headers.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Modify and query vport state commands share the same admin_state and
op_mod values, rename the enums to fit them both.
In addition, remove the esw prefix from the admin state enum as this
also applied for vnic.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tracer has one event, event 0x26, with two subtypes:
- Subtype 0: Ownership change
- Subtype 1: Traces available
An ownership change occurs in the following cases:
1- Owner releases his ownership, in this case, an event will be
sent to inform others to reattempt acquire ownership.
2- Ownership was taken by a higher priority tool, in this case
the owner should understand that it lost ownership, and go through
tear down flow.
The second subtype indicates that there are traces in the trace buffer,
in this case, the driver polls the tracer buffer for new traces, parse
them and prepares the messages for printing.
The HW starts tracing from the first address in the tracer buffer.
Driver receives an event notifying that new trace block exists.
HW posts a timestamp event at the last 8B of every 256B block.
Comparing the timestamp to the last handled timestamp would indicate
that this is a new trace block. Once the new timestamp is detected,
the entire block is considered valid.
Block validation and parsing, should be done after copying the current
block to a different location, in order to avoid block overwritten
during processing.
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch updates the mlx5_ifc structures and
command interface to support DEVX.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
This has been a quiet cycle for RDMA, the big bulk is the usual smallish
driver updates and bug fixes. About four new uAPI related things. Not as much
Szykaller patches this time, the bugs it finds are getting harder to fix.
- More work cleaning up the RDMA CM code
- Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns, i40iw, iw_cxgb4, mlx5, rxe
- Driver specific resource tracking and reporting via netlink
- Continued work for name space support from Parav
- MPLS support for the verbs flow steering uAPI
- A few tricky IPoIB fixes improving robustness
- HFI1 driver support for the '16B' management packet format
- Some auditing to not print kernel pointers via %llx or similar
- Mark the entire 'UCM' user-space interface as BROKEN with the intent to remove it
entirely. The user space side of this was long ago replaced with RDMA-CM and
syzkaller is finding bugs in the residual UCM interface nobody wishes to fix because
nobody uses it.
- Purge more bogus BUG_ON's from Leon
- 'flow counters' verbs uAPI
- T10 fixups for iser/isert, these are Acked by Martin but going through the RDMA
tree due to dependencies
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJbGEcPAAoJEDht9xV+IJsarBMQAIsAFOizycF0kQfDtvz1yHyV
YjkT3NA71379DsDsCOezVKqZ6RtXdQncJoqqEG1FuNKiXh/rShR3rk9XmdBwUCTq
mIY0ySiQggdeSIJclROiBuzLE3F/KIIkY3jwM80DzT9GUEbnVuvAMt4M56X48Xo8
RpFc13/1tY09ZLBVjInlfmCpRWyNgNccDBDywB/5hF5KCFR/BG/vkp4W0yzksKiU
7M/rZYyxQbtwSfe/ZXp7NrtwOpkpn7vmhED59YgKRZWhqnHF9KKmV+K1FN+BKdXJ
V1KKJ2RQINg9bbLJ7H2JPdQ9EipvgAjUJKKBoD+XWnoVJahp6X2PjX351R/h4Lo5
TH+0XwuCZ2EdjRxhnm3YE+rU10mDY9/UUi1xkJf9vf0r25h6Fgt6sMnN0QBpqkTh
euRZnPyiFeo1b+hCXJfKqkQ6An+F3zes5zvVf59l0yfVNLVmHdlz0lzKLf/RPk+t
U+YZKxfmHA+mwNhMXtKx7rKVDrko+uRHjaX2rPTEvZ0PXE7lMzFMdBWYgzP6sx/b
4c55NiJMDAGTyLCxSc7ziGgdL9Lpo/pRZJtFOHqzkDg8jd7fb07ID7bMPbSa05y0
BU5VpC8yEOYRpOEFbkJSPtHc0Q8cMCv/q1VcMuuhKXYnfSho3TWvtOSQIjUoU/q0
8T6TXYi2yF+f+vZBTFlV
=Mb8m
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"This has been a quiet cycle for RDMA, the big bulk is the usual
smallish driver updates and bug fixes. About four new uAPI related
things. Not as much Szykaller patches this time, the bugs it finds are
getting harder to fix.
Summary:
- More work cleaning up the RDMA CM code
- Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns,
i40iw, iw_cxgb4, mlx5, rxe
- Driver specific resource tracking and reporting via netlink
- Continued work for name space support from Parav
- MPLS support for the verbs flow steering uAPI
- A few tricky IPoIB fixes improving robustness
- HFI1 driver support for the '16B' management packet format
- Some auditing to not print kernel pointers via %llx or similar
- Mark the entire 'UCM' user-space interface as BROKEN with the
intent to remove it entirely. The user space side of this was long
ago replaced with RDMA-CM and syzkaller is finding bugs in the
residual UCM interface nobody wishes to fix because nobody uses it.
- Purge more bogus BUG_ON's from Leon
- 'flow counters' verbs uAPI
- T10 fixups for iser/isert, these are Acked by Martin but going
through the RDMA tree due to dependencies"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (138 commits)
RDMA/mlx5: Update SPDX tags to show proper license
RDMA/restrack: Change SPDX tag to properly reflect license
IB/hfi1: Fix comment on default hdr entry size
IB/hfi1: Rename exp_lock to exp_mutex
IB/hfi1: Add bypass register defines and replace blind constants
IB/hfi1: Remove unused variable
IB/hfi1: Ensure VL index is within bounds
IB/hfi1: Fix user context tail allocation for DMA_RTAIL
IB/hns: Use zeroing memory allocator instead of allocator/memset
infiniband: fix a possible use-after-free bug
iw_cxgb4: add INFINIBAND_ADDR_TRANS dependency
IB/isert: use T10-PI check mask definitions from core layer
IB/iser: use T10-PI check mask definitions from core layer
RDMA/core: introduce check masks for T10-PI offload
IB/isert: fix T10-pi check mask setting
IB/mlx5: Add counters read support
IB/mlx5: Add flow counters read support
IB/mlx5: Add flow counters binding support
IB/mlx5: Add counters create and destroy support
IB/uverbs: Add support for flow counters
...
The FPGA queue pair (QP) event fires whenever a QP on the FPGA
transitions to the error state.
At this stage, this event is unrecoverable, it may become recoverable
in the future.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Temperature warning event is sent by FW to indicate high temperature
as detected by one of the sensors on the board.
Add handling of this event by writing the numbers of the alert sensors
to the kernel log.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add pbmc and pptb in the port_access_reg_cap_mask. These two
bits determine if device supports receive buffer configuration.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch reports the device's capbilities to offload
encapsulated MPLS tunnel protocols to user-space:
- Capability to offload MPLS over GRE.
- Capability to offload MPLS over UDP.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch introduces support for the MPLS flow spec and
allows the creation of rules that are matching on the
MPLS label.
Applying the rule matching depends on the flow specs order and
the location of the MPLS in the spec list as there are different
configurations to be made in the device in the cases of MPLSoGRE
and MPLSoUDP vs. non-encapsulated MPLS.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
- Fix RDMA uapi headers to actually compile in userspace and be more
complete
- Three shared with netdev pull requests from Mellanox:
* 7 patches, mostly to net with 1 IB related one at the back). This
series addresses an IRQ performance issue (patch 1), cleanups related to
the fix for the IRQ performance problem (patches 2-6), and then extends
the fragmented completion queue support that already exists in the net
side of the driver to the ib side of the driver (patch 7).
* Mostly IB, with 5 patches to net that are needed to support the remaining
10 patches to the IB subsystem. This series extends the current
'representor' framework when the mlx5 driver is in switchdev mode from
being a netdev only construct to being a netdev/IB dev construct. The IB
dev is limited to raw Eth queue pairs only, but by having an IB dev of
this type attached to the representor for a switchdev port, it enables
DPDK to work on the switchdev device.
* All net related, but needed as infrastructure for the rdma driver
- Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers
- SRP performance updates
- IB uverbs write path cleanup patch series from Leon
- Add RDMA_CM support to ib_srpt. This is disabled by default. Users need to
set the port for ib_srpt to listen on in configfs in order for it to be
enabled (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)
- TSO and Scatter FCS support in mlx4
- Refactor of modify_qp routine to resolve problems seen while working on new
code that is forthcoming
- More refactoring and updates of RDMA CM for containers support from Parav
- mlx5 'fine grained packet pacing', 'ipsec offload' and 'device memory'
user API features
- Infrastructure updates for the new IOCTL interface, based on increased usage
- ABI compatibility bug fixes to fully support 32 bit userspace on 64 bit
kernel as was originally intended. See the commit messages for
extensive details
- Syzkaller bugs and code cleanups motivated by them
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJax5Z0AAoJEDht9xV+IJsacCwQAJBIgmLCvVp5fBu2kJcXMMVI
y3l2YNzAUJvDDKv1r5yTC9ugBXEkDtgzi/W/C2/5es2yUG/QeT/zzQ3YPrtsnN68
5FkiXQ35Tt7+PBHMr0cacGRmF4M3Td3MeW0X5aJaBKhqlNKwA+aF18pjGWBmpVYx
URYCwLb5BZBKVh4+1Leebsk4i0/7jSauAqE5M+9notuAUfBCoY1/Eve3DipEIBBp
EyrEnMDIdujYRsg4KHlxFKKJ1EFGItknLQbNL1+SEa0Oe0SnEl5Bd53Yxfz7ekNP
oOWQe5csTcs3Yr4Ob0TC+69CzI71zKbz6qPDILTwXmsPFZJ9ipJs4S8D6F7ra8tb
D5aT1EdRzh/vAORPC9T3DQ3VsHdvhwpUMG7knnKrVT9X/g7E+gSji1BqaQaTr/xs
i40GepHT7lM/TWEuee/6LRpqdhuOhud7vfaRFwn2JGRX9suqTcvwhkBkPUDGV5XX
5RkHcWOb/7KvmpG7S1gaRGK5kO208LgmAZi7REaJFoZB74FqSneMR6NHIH07ha41
Zou7rnxV68CT2bgu27m+72EsprgmBkVDeEzXgKxVI/+PZ1oadUFpgcZ3pRLOPWVx
rEqjHu65rlA/YPog4iXQaMfSwt/oRD3cVJS/n8EdJKXi4Qt2RDDGdyOmt74w4prM
QuLEdvJIFmwrND1KDoqn
=Ku8g
-----END PGP SIGNATURE-----
Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Doug and I are at a conference next week so if another PR is sent I
expect it to only be bug fixes. Parav noted yesterday that there are
some fringe case behavior changes in his work that he would like to
fix, and I see that Intel has a number of rc looking patches for HFI1
they posted yesterday.
Parav is again the biggest contributor by patch count with his ongoing
work to enable container support in the RDMA stack, followed by Leon
doing syzkaller inspired cleanups, though most of the actual fixing
went to RC.
There is one uncomfortable series here fixing the user ABI to actually
work as intended in 32 bit mode. There are lots of notes in the commit
messages, but the basic summary is we don't think there is an actual
32 bit kernel user of drivers/infiniband for several good reasons.
However we are seeing people want to use a 32 bit user space with 64
bit kernel, which didn't completely work today. So in fixing it we
required a 32 bit rxe user to upgrade their userspace. rxe users are
still already quite rare and we think a 32 bit one is non-existing.
- Fix RDMA uapi headers to actually compile in userspace and be more
complete
- Three shared with netdev pull requests from Mellanox:
* 7 patches, mostly to net with 1 IB related one at the back).
This series addresses an IRQ performance issue (patch 1),
cleanups related to the fix for the IRQ performance problem
(patches 2-6), and then extends the fragmented completion queue
support that already exists in the net side of the driver to the
ib side of the driver (patch 7).
* Mostly IB, with 5 patches to net that are needed to support the
remaining 10 patches to the IB subsystem. This series extends
the current 'representor' framework when the mlx5 driver is in
switchdev mode from being a netdev only construct to being a
netdev/IB dev construct. The IB dev is limited to raw Eth queue
pairs only, but by having an IB dev of this type attached to the
representor for a switchdev port, it enables DPDK to work on the
switchdev device.
* All net related, but needed as infrastructure for the rdma
driver
- Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers
- SRP performance updates
- IB uverbs write path cleanup patch series from Leon
- Add RDMA_CM support to ib_srpt. This is disabled by default. Users
need to set the port for ib_srpt to listen on in configfs in order
for it to be enabled
(/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)
- TSO and Scatter FCS support in mlx4
- Refactor of modify_qp routine to resolve problems seen while
working on new code that is forthcoming
- More refactoring and updates of RDMA CM for containers support from
Parav
- mlx5 'fine grained packet pacing', 'ipsec offload' and 'device
memory' user API features
- Infrastructure updates for the new IOCTL interface, based on
increased usage
- ABI compatibility bug fixes to fully support 32 bit userspace on 64
bit kernel as was originally intended. See the commit messages for
extensive details
- Syzkaller bugs and code cleanups motivated by them"
* tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits)
IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
IB/mlx5: Device memory mr registration support
net/mlx5: Mkey creation command adjustments
IB/mlx5: Device memory support in mlx5_ib
net/mlx5: Query device memory capabilities
IB/uverbs: Add device memory registration ioctl support
IB/uverbs: Add alloc/free dm uverbs ioctl support
IB/uverbs: Add device memory capabilities reporting
IB/uverbs: Expose device memory capabilities to user
RDMA/qedr: Fix wmb usage in qedr
IB/rxe: Removed GID add/del dummy routines
RDMA/qedr: Zero stack memory before copying to user space
IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
IB/mlx5: Add information for querying IPsec capabilities
IB/mlx5: Add IPsec support for egress and ingress
{net,IB}/mlx5: Add ipsec helper
IB/mlx5: Add modify_flow_action_esp verb
IB/mlx5: Add implementation for create and destroy action_xfrm
IB/uverbs: Introduce ESP steering match filter
IB/uverbs: Add modify ESP flow_action
...
This patch adds querying of device memory capabilities by the mlx5_core
driver during initialization.
Device memory capabilities is a new capability type and structure
which contains the necessary data that is needed for future device
memory allocation.
The presence of this new capabilities struct is indicated in the
general capabilities struct which is queried first by the driver.
If the presence bit is set, the driver will also query the new
capabilities struct and save it in the device context.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Current Striding RQ HW feature utilizes the RX buffers so that
there is no wasted room between the strides. This maximises
the memory utilization.
This prevents the use of build_skb() (which requires headroom
and tailroom), and demands to memcpy the packets headers into
the skb linear part.
In this patch, whenever a set of conditions holds, we apply
an RQ configuration that allows combining the use of linear SKB
on top of a Striding RQ.
To use build_skb() with Striding RQ, the following must hold:
1. packet does not cross a page boundary.
2. there is enough headroom and tailroom surrounding the packet.
We can satisfy 1 and 2 by configuring:
stride size = MTU + headroom + tailoom.
This is possible only when:
a. (MTU - headroom - tailoom) does not exceed PAGE_SIZE.
b. HW LRO is turned off.
Using linear SKB has many advantages:
- Saves a memcpy of the headers.
- No page-boundary checks in datapath.
- No filler CQEs.
- Significantly smaller CQ.
- SKB data continuously resides in linear part, and not split to
small amount (linear part) and large amount (fragment).
This saves datapath cycles in driver and improves utilization
of SKB fragments in GRO.
- The fragments of a resulting GRO SKB follow the IP forwarding
assumption of equal-size fragments.
Some implementation details:
HW writes the packets to the beginning of a stride,
i.e. does not keep headroom. To overcome this we make sure we can
extend backwards and use the last bytes of stride i-1.
Extra care is needed for stride 0 as it has no preceding stride.
We make sure headroom bytes are available by shifting the buffer
pointer passed to HW by headroom bytes.
This configuration now becomes default, whenever capable.
Of course, this implies turning LRO off.
Performance testing:
ConnectX-5, single core, single RX ring, default MTU.
UDP packet rate, early drop in TC layer:
--------------------------------------------
| pkt size | before | after | ratio |
--------------------------------------------
| 1500byte | 4.65 Mpps | 5.96 Mpps | 1.28x |
| 500byte | 5.23 Mpps | 5.97 Mpps | 1.14x |
| 64byte | 5.94 Mpps | 5.96 Mpps | 1.00x |
--------------------------------------------
TCP streams: ~20% gain
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add the needed capability bit and counters to device spec description.
Expose the following two counters in ethtool:
tx_pause_storm_warning_events: when the device is stalled for a period
longer than a pre-configured watermark, the counter increase, allowing
the debug utility an insight into current device status.
tx_pause_storm_error_events: when the device is stalled for a period
longer than a pre-configured timeout, the pause transmission is disabled,
and the counter increase.
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
BYPASS namespace is used by the RDMA side to insert flow rules into
the vport RX flow tables. Currently only 8 priorities are exposed,
increase this to 16 to allow more flexibility. This change will also
cause the BYPASS namespace to use 32 levels (as apposed to 16 today) of
flow tables, 16 levels for regular rules and 16 for don't trap rules.
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>