Up until now, the action of a trap was never changed during its lifetime.
This is going to change with subsequent patches that will allow devlink to
control the action of certain traps.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Gunthorpe says:
====================
This is a collection of general cleanups for ODP to clarify some of the
flows around umem creation and use of the interval tree.
====================
The branch is based on v5.3-rc5 due to dependencies
* odp_fixes:
RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
RDMA/mlx5: Use ib_umem_start instead of umem.address
RDMA/core: Make invalidate_range a device operation
RDMA/odp: Use kvcalloc for the dma_list and page_list
RDMA/odp: Check for overflow when computing the umem_odp end
RDMA/odp: Provide ib_umem_odp_release() to undo the allocs
RDMA/odp: Split creating a umem_odp from ib_umem_get
RDMA/odp: Make the three ways to create a umem_odp clear
RMDA/odp: Consolidate umem_odp initialization
RDMA/odp: Make it clearer when a umem is an implicit ODP umem
RDMA/odp: Iterate over the whole rbtree directly
RDMA/odp: Use the common interval tree library instead of generic
RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use different namespaces for bypass and switchdev loopback because they
have different priorities and default table miss action requirements:
1. bypass: with multiple priorities support, and
MLX5_FLOW_TABLE_MISS_ACTION_DEF as the default table miss action;
2. switchdev loopback: with single priority support, and
MLX5_FLOW_TABLE_MISS_ACTION_SWITCH_DOMAIN as the default table miss
action.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Currently all the namespaces under the same steering domain share the same
default table miss action, however in some situations (e.g., RDMA RX)
different actions are required. This patch adds a per-namespace default
table miss action instead of using the miss action of the steering domain.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Now that the single target build descends into sub-directories in the
same way as the normal build, these dummy Makefiles are not needed
any more.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Fix the documentation of mlx5_eq_enable/disable to clean up compiler warnings.
drivers/net/ethernet/mellanox/mlx5/core//eq.c:334:
warning: Function parameter or member 'dev' not described in 'mlx5_eq_enable'
warning: Function parameter or member 'eq' not described in 'mlx5_eq_enable'
warning: Function parameter or member 'nb' not described in 'mlx5_eq_enable'
drivers/net/ethernet/mellanox/mlx5/core//eq.c:355:
warning: Function parameter or member 'dev' not described in 'mlx5_eq_disable'
warning: Function parameter or member 'eq' not described in 'mlx5_eq_disable'
warning: Function parameter or member 'nb' not described in 'mlx5_eq_disable'
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Previously, mlx5_cleanup_fc_stats() would clean up the flow counter
pool before releasing all the counters to it, which would result in
flow counter bulks not getting freed. Resolve this by changing the
order in which elements of fc_stats are cleaned up, so that the flow
counter pool is cleaned up after all the counters are released.
Also move cleanup actions for freeing the bulk query memory and
destroying the idr to the end of mlx5_cleanup_fc_stats().
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support for report and recovery from error on completion on RQ by
setting the queue back to ready state. Handle only errors with a
syndrome indicating the RQ might enter error state and could be
recovered.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Just to be aligned with the MPWQE handlers, handle an RX WQE with error
for legacy RQs in the top RX handlers, just before calling skb_from_cqe().
CQE error handling will now be called at the same stage regardless of
the RQ type or netdev mode (NIC, Representor, IPoIB, etc.).
This will be useful for a downstream patch that improves error CQE
handling.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support for report and recovery from an rx timeout. On driver open
we post a NOP work request on the rx channels to trigger NAPI in order
to fill up the rx rings. In case NAPI wasn't scheduled due to a lost
interrupt, perform EQ recovery.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support for report and recovery from error on completion on ICOSQ.
Deactivate RQ and flush, then deactivate ICOSQ. Set the queue back to
ready state (firmware) and reset the ICOSQ and the RQ (software
resources). Finally, activate the ICOSQ and the RQ.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Align ICOSQ open/close behaviour with RQ and SQ. Split open flow into
open and activate where open handles creation and activate enables the
queue. Do a symmetric thing in close flow: split into close and
deactivate.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Introduce helper functions for creating and destroying reporters and for
updating channels. In the following patch, an rx reporter is added and it will use
these helpers too.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The following patches in the set enhance the diagnostics info of tx
reporter. Therefore, it is better to pass a pointer to the SQ for
further data extraction.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Prepare for code sharing with the rx reporter, which is added in the
following patches in the set. Introduce a generic error_ctx for
agnostic recovery dispatch.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Change from mlx5e_tx_reporter_* to mlx5e_reporter_tx_*. In the following
patches in the set an rx reporter is added; the new naming convention is
more uniform.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Rename reporter.h -> health.h so patches in the set can use it for
health related functionality.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
tc transparently maps the software priority number to hardware. Update
it to pass the major priority which is what most drivers expect. Update
drivers too so they do not need to lshift the priority field of the
flow_cls_common_offload object. The stmmac driver is an exception, since
this code assumes the tc software priority is fine, therefore, lshift it
just to be conservative.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds support for the new need_wakeup feature of AF_XDP. The
applications can opt-in by using the XDP_USE_NEED_WAKEUP bind() flag.
When this feature is enabled, some behavior changes:
RX side: If the Fill Ring is empty, instead of busy-polling, set the
flag to tell the application to kick the driver when it refills the Fill
Ring.
TX side: If there are pending completions or packets queued for
transmission, set the flag to tell the application that it can skip the
sendto() syscall and save time.
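For reference, the driver-side pattern looks roughly as follows (a
minimal sketch; the xsk_*_need_wakeup() helpers come from the generic
need_wakeup series, while the umem and fill_ring_empty variables here
are illustrative):

	/* RX: after trying to refill from the Fill Ring */
	if (xsk_umem_uses_need_wakeup(umem)) {
		if (fill_ring_empty)
			xsk_set_rx_need_wakeup(umem);
		else
			xsk_clear_rx_need_wakeup(umem);
	}

	/* TX: let the application skip sendto() while work is queued */
	if (xsk_umem_uses_need_wakeup(umem))
		xsk_set_tx_need_wakeup(umem);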
The performance testing was performed on a machine with the following
configuration:
- 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz
- Mellanox ConnectX-5 Ex with 100 Gbit/s link
The results with retpoline disabled:
       | without need_wakeup  | with need_wakeup     |
       |----------------------|----------------------|
       | one core | two cores | one core | two cores |
-------|----------|-----------|----------|-----------|
txonly | 20.1     | 33.5      | 29.0     | 34.2      |
rxdrop | 0.065    | 14.1      | 12.0     | 14.1      |
l2fwd  | 0.032    | 7.3       | 6.6      | 7.2       |
"One core" means the application and NAPI run on the same core. "Two
cores" means they are pinned to different cores.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Two XSK tasks are performed during NAPI polling, that are not bound to
hardware interrupts: TXing packets and polling for frames in the Fill
Ring. They are special in a way that the hardware doesn't know about
these tasks, so it doesn't trigger interrupts if there is still some
work to be done; it's our driver's responsibility to ensure NAPI will be
rescheduled if needed.
Create a new function to handle these tasks and move the corresponding
code from mlx5e_napi_poll to the new function to improve modularity and
prepare for the changes in the following patch.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new
ndo provides the same functionality as before but with the addition of
a new flags field that is used to specify if Rx, Tx or both should be
woken up. The previous ndo only woke up Tx, as implied by the
name. The i40e and ixgbe drivers (which are all the supported ones)
are updated with this new interface.
This new ndo will be used by the new need_wakeup functionality of XDP
sockets that need to be able to wake up both Rx and Tx driver
processing.
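For reference, the new ndo and its flags look roughly like this
(sketch based on the description above):

	int (*ndo_xsk_wakeup)(struct net_device *dev, u32 queue_id,
			      u32 flags);

	/* flags telling the driver what needs to be processed */
	#define XDP_WAKEUP_RX (1 << 0)
	#define XDP_WAKEUP_TX (1 << 1)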
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cited patch deleted ethtool flash device support, as the ethtool core can
fall back to the devlink flash callback. However, this is supported only if
there is a devlink port registered over the corresponding netdevice.
As mlx5e does not have devlink port support over the native netdevice, it
broke the ability to flash the device via ethtool.
This patch re-adds the ethtool callback to avoid breaking user functionality
when flashing the device via ethtool.
Fixes: 9c8bca2637 ("mlx5: Move firmware flash implementation to devlink")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a missing spinlock around XSKICOSQ usage at the activation stage,
because there is a race between a configuration change and the
application calling sendto().
Fixes: db05815b36 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In mlx4_en_config_rss_steer(), 'rss_map->indir_qp' is allocated through
kzalloc(). After that, mlx4_qp_alloc() is invoked to configure RSS
indirection. However, if mlx4_qp_alloc() fails, the allocated
'rss_map->indir_qp' is not deallocated, leading to a memory leak bug.
To fix the above issue, add the 'qp_alloc_err' label to free
'rss_map->indir_qp'.
Fixes: 4931c6ef04 ("net/mlx4_en: Optimized single ring steering")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Merging tip of mlx5-next in order to get changes related to adding
XRQ support to the DEVX interface needed prior to the following two
patches.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add XRQ legacy command opcodes, which will be used via the DEVX interface.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
As a QP may be created by DEVX, it may be valid to not find the rsn in
the mlx5 core tree; change the log level to debug.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
To identify timestamps for matching with their packets, Spectrum-1 uses a
five-tuple of (port, direction, domain number, message type, sequence ID).
If there are several clients from the same domain behind a single port
sending Delay_Req's, the only thing differentiating these packets, as far
as Spectrum-1 is concerned, is the sequence ID. Should sequence IDs between
individual clients be similar, conflicts may arise. That is not a problem
to hardware, which will simply deliver timestamps on a first come, first
served basis.
However the driver uses a simple hash table to store the unmatched pieces.
When a new conflicting piece arrives, it pushes out the previously stored
one, which if it is a packet, is delivered without timestamp. Later on as
the corresponding timestamps arrive, the first one is mismatched to the
second packet, and the second one is never matched and eventually is GCd.
To correct this issue, instead of using a simple rhashtable, use rhltable
to keep the unmatched entries.
Previously, a found unmatched entry would always be removed from the hash
table. That is not the case anymore--an incompatible entry is left in the
hash table. Therefore removal from the hash table cannot be used to confirm
the validity of the looked-up pointer, instead the lookup would simply need
to be redone. Therefore move it inside the critical section. This
simplifies a lot of the code.
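A minimal sketch of the rhltable pattern described above (the structure
and key layout are illustrative, the rhltable calls are the standard
kernel API):

	struct ptp_unmatched {
		struct rhlist_head ht_node;	/* rhltable linkage */
		struct ptp_key key;		/* port, dir, domain, type, seq id */
	};

	/* insert: entries with conflicting keys may now coexist */
	err = rhltable_insert(&unmatched_ht, &unmatched->ht_node, ht_params);

	/* lookup: walk all entries sharing the key, pick a compatible one */
	struct rhlist_head *list, *tmp;

	list = rhltable_lookup(&unmatched_ht, &key, ht_params);
	rhl_for_each_entry_rcu(unmatched, tmp, list, ht_node)
		if (is_compatible(unmatched))	/* illustrative predicate */
			break;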
Fixes: 8748642751 ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
Reported-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
This cleans up a lot of unneeded code and logic around the debugfs
files, making all of this much simpler and easier to understand as we
don't need to keep the dentries saved anymore.
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.
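The conversion follows the usual pattern (illustrative diff; the object
and field names are hypothetical):

	-	atomic_t refcount;
	+	refcount_t refcount;

	-	atomic_set(&obj->refcount, 1);
	+	refcount_set(&obj->refcount, 1);

	-	atomic_inc(&obj->refcount);
	+	refcount_inc(&obj->refcount);

	-	if (atomic_dec_and_test(&obj->refcount))
	+	if (refcount_dec_and_test(&obj->refcount))
			kfree(obj);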
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
It is desired to use unique port indices when multiple pci devices'
devlink instances have the same switch-id.
Make use of vhca-id to generate such unique devlink port indices.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
System image GUID doesn't depend on eswitch switchdev mode.
Hence, remove the check which simplifies the code.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently mlx5_eswitch_rep stores the same hw ID for all representors.
However, it is never used from this structure.
It is always used from mlx5_vport.
Hence, remove unused field.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Encap entries creation is fully synchronized by encap_tbl_lock. In order to
allow concurrent allocation of hardware resources used to offload
encapsulation, extend mlx5e_encap_entry with 'res_ready' completion. Move
call to mlx5e_tc_tun_create_header_ipv{4|6}() out of encap_tbl_lock
critical section. Modify code that attaches new flows to existing encap to
wait for 'res_ready' completion before using the entry. Insert encap entry
to table before provisioning it to hardware and modify all users of the
encap table to verify that encap was fully initialized by checking
completion result for non-zero value (and to wait for 'res_ready'
completion, if necessary).
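As a hedged sketch, the synchronization pattern looks roughly like this
(the completion API is the standard kernel one; the 'compl_result' field
name and error check are illustrative):

	/* creator, outside the encap_tbl_lock critical section */
	init_completion(&e->res_ready);
	/* ... insert e into the encap table under encap_tbl_lock ... */
	err = mlx5e_tc_tun_create_header_ipv4(...);	/* hardware resources */
	e->compl_result = err;
	complete_all(&e->res_ready);

	/* concurrent user that found e in the table */
	wait_for_completion(&e->res_ready);
	if (e->compl_result)		/* entry failed to initialize */
		goto err_out;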
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock, protect encap hash table from concurrent
modifications with new "encap_tbl_lock" mutex. Use the mutex to protect
internal encap entry state from concurrent modification. This is necessary
because a flow can be attached to multiple encap entries simultaneously,
which significantly complicates using finer grained per-entry lock.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
List of flows attached to encap entry is used as implicit reference
counter (encap entry is deallocated when the list becomes empty) and as a
mechanism to obtain encap entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to encap entry is possible. Proper atomic reference counter is
required to support concurrent access.
As a preparation for extending encap with reference counting, extract the
code that looks up and deletes an encap entry into standalone put/get helpers. In
order to remove this dependency on external locking, extend encap entry
with reference counter to manage its lifetime and extend flow structure
with direct pointer to encap entry that flow is attached to.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Mod_hdr entries creation is fully synchronized by mod_hdr_tbl->lock. In
order to allow concurrent allocation of hardware resources used to offload
header rewrite, extend mlx5e_mod_hdr_entry with 'res_ready' completion.
Move call to mlx5_modify_header_alloc() out of mod_hdr_tbl->lock critical
section. Modify code that attaches new flows to existing mh to wait for
'res_ready' completion before using the entry. Insert mh to mod_hdr table
before provisioning it to hardware and modify all users of mod_hdr table to
verify that mh was fully initialized by checking completion result for
negative value (and to wait for 'res_ready' completion, if necessary).
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock, protect mod_hdr hash table from
concurrent modifications with new mutex.
Implement helper function to get flow namespace to prevent code
duplication.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock, extend mod header entry with spinlock
and use it to protect list of flows attached to mod header entry from
concurrent modifications.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
List of flows attached to mod header entry is used as implicit reference
counter (mod header entry is deallocated when the list becomes empty) and as a
mechanism to obtain mod header entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to mod header entry is possible. Proper atomic reference counter
is required to support concurrent access.
As a preparation for extending mod header with reference counting, extract
the code that looks up and deletes a mod header entry into standalone put/get
helpers. In order to remove this dependency on external locking, extend mod
header entry with reference counter to manage its lifetime and extend flow
structure with direct pointer to mod header entry that flow is attached to.
To remove code duplication between legacy and switchdev mode
implementations that both support mod_hdr functionality, store mod_hdr
table in dedicated structure used by both fdb and kernel namespaces. New
table structure is extended with table lock by one of the following patches
in this series. Implement helper function to get correct mod_hdr table
depending on flow namespace.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Hairpin entries creation is fully synchronized by hairpin_tbl_lock. In
order to allow concurrent initialization of mlx5e_hairpin structure
instances and provisioning of hairpin entries to hardware, extend
mlx5e_hairpin_entry with 'res_ready' completion. Move call to
mlx5e_hairpin_create() out of hairpin_tbl_lock critical section. Modify
code that attaches new flows to existing hpe to wait for 'res_ready'
completion before using the hpe. Insert hpe to hairpin table before
provisioning it to hardware and modify all users of hairpin table to verify
that hpe was fully initialized by checking hpe->hp pointer (and to wait for
'res_ready' completion, if necessary).
Modify dead peer update event handling function to save hpe's to temporary
list with their reference counter incremented. Wait for completion of hpe's
in temporary list and update their 'peer_gone' flag outside of
hairpin_tbl_lock critical section.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock, protect hairpin hash table from
concurrent modifications with new "hairpin_tbl_lock" mutex.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock, extend hairpin entry with spinlock and
use it to protect list of flows attached to hairpin entry from concurrent
modifications.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
List of flows attached to hairpin entry is used as implicit reference
counter (hairpin entry is deallocated when the list becomes empty) and as a
mechanism to obtain hairpin entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to hairpin entry is possible. Proper atomic reference counter is
required to support concurrent access.
As a preparation for extending hairpin with reference counting, extract
code that deletes hairpin entry into standalone function. In order to
remove this dependency on external locking, extend hairpin entry with
reference counter to manage its lifetime and extend flow structure with
direct pointer to hairpin entry that flow is attached to.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The size of the snapshot has to be the same as the size of the region,
therefore no need to pass it again during snapshot creation. Remove the
arg and use region->size instead.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend existing driver for Spectrum and Spectrum-2 ASICs
to support Spectrum-3 ASIC as well.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the tc indirect block infrastructure to flow_offload and rename
it to flow indirect block, so that nf_tables can use the
indirect block architecture.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the check of the recovery bit at the beginning of the CQE recovery
function. This test is already performed right before the reporter
is invoked, when a CQE error is detected.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
CQE recovery function begins with test and set of recovery bit. Add an
error flow which ensures clearing of this bit when leaving the recovery
function, to allow further recoveries to take place. This allows removal
of clearing recovery bit on sq activate.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Remove the wrong error return value when the SQ is not in error state.
CQE recovery on the TX reporter queries the SQ state. If the SQ is not in
error state, the SQ is either in ready or reset state. Ready state is a
good state which doesn't require recovery, and reset state is a temporary
state which ends in ready state. With this patch, CQE recovery in this
scenario is successful.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Shift the tisn field in the WQE control segment, per the
HW specification.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use the proper tisn field name from the union in struct mlx5_wqe_ctrl_seg.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The TLS progress params context WQE should not include an
Eth segment, drop it.
In addition, align the tls_progress_params layout with the
HW specification document:
- fix the tisn field name.
- remove the valid bit.
Fixes: a12ff35e0f ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Setting the speed to 56GBASE is allowed only with auto-negotiation enabled.
This patch prevents setting the speed to 56GBASE when auto-negotiation is
disabled.
Fixes: f62b8bb8f2 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Mohamad Heib <mohamadh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Only support changing tx/rx pause frame setting if the net device
is the vport group manager.
Fixes: 3c2d18ef22 ("net/mlx5e: Support ethtool get/set_pauseparam")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We have an issue where the OVS application creates an offloaded drop rule
that drops VXLAN traffic with both inner and outer header match
criteria. The mlx5_core driver correctly detects the inner and outer
header match criteria but does not enable the inner header match criteria,
due to an incorrect assumption in mlx5_eswitch_add_offloaded_rule that
only decap rules need inner header criteria.
Solution:
Remove mlx5_esw_flow_attr's match_level and tunnel_match_level and add
two new members: inner_match_level and outer_match_level.
inner/outer_match_level is set to NONE if the inner/outer match criteria
is not specified in the tc rule creation request. The decap assumption is
removed and the code just needs to check for inner/outer_match_level to
enable the corresponding bit in firmware's match_criteria_enable value.
Fixes: 6363651d6d ("net/mlx5e: Properly set steering match levels for offloaded TC decap rules")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current ARFS code relies on certain fields to be set in the SKB
(e.g. transport_header) and extracts IP addresses and ports by custom
code that parses the packet. The necessary SKB fields, however, are not
always set at that point, which leads to an out-of-bounds access. Use
skb_flow_dissect_flow_keys() to get the necessary information reliably,
fix the out-of-bounds access and reuse the code.
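A sketch of the extraction using the flow dissector (surrounding ARFS
code and error handling are illustrative):

	struct flow_keys fk;

	if (!skb_flow_dissect_flow_keys(skb, &fk, 0))
		return NULL;	/* could not dissect, skip ARFS */

	if (fk.basic.n_proto == htons(ETH_P_IP)) {
		src_ipv4 = fk.addrs.v4addrs.src;
		dst_ipv4 = fk.addrs.v4addrs.dst;
	}
	ip_proto = fk.basic.ip_proto;
	src_port = fk.ports.src;
	dst_port = fk.ports.dst;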
Fixes: 18c908e477 ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
For reference counters, refcount_t is preferred over atomic_t.
This is because the implementation of refcount_t can prevent
overflows and detect possible use-after-free.
So convert the atomic_t ref counters to refcount_t.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
There is a self assignment of esw->dev to itself, clean this up by
removing it. Also make dev a const pointer.
Addresses-Coverity: ("Self assignment")
Fixes: 6cedde4513 ("net/mlx5: E-Switch, Verify support QoS element type")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The commit 069d11465a ("net/mlx5e: RX, Enhance legacy Receive Queue
memory scheme") introduced the undefined behaviour below, because
"frag->last_in_page" is only initialized in mlx5e_init_frags_partition()
when,
if (next_frag.offset + frag_info[f].frag_stride > PAGE_SIZE)
or after bailing out of the loop,
for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++)
As a result, some "frag"s could have an uninitialized value of
"last_in_page". Later, get_frag() obtains those "frag"s and checks
"frag->last_in_page" in mlx5e_put_rx_frag(), which triggers the error
during boot. Fix it by always initializing "frag->last_in_page" to
"false" in mlx5e_init_frags_partition().
UBSAN: Undefined behaviour in
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:325:12
load of value 170 is not a valid value for type 'bool' (aka '_Bool')
Call trace:
dump_backtrace+0x0/0x264
show_stack+0x20/0x2c
dump_stack+0xb0/0x104
__ubsan_handle_load_invalid_value+0x104/0x128
mlx5e_handle_rx_cqe+0x8e8/0x12cc [mlx5_core]
mlx5e_poll_rx_cq+0xca8/0x1a94 [mlx5_core]
mlx5e_napi_poll+0x17c/0xa30 [mlx5_core]
net_rx_action+0x248/0x940
__do_softirq+0x350/0x7b8
irq_exit+0x200/0x26c
__handle_domain_irq+0xc8/0x128
gic_handle_irq+0x138/0x228
el1_irq+0xb8/0x140
arch_cpu_idle+0x1a4/0x348
do_idle+0x114/0x1b0
cpu_startup_entry+0x24/0x28
rest_init+0x1ac/0x1dc
arch_call_rest_init+0x10/0x18
start_kernel+0x4d4/0x57c
Fixes: 069d11465a ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The line continuations unintentionally add whitespace so
instead use coalesced formats to remove the whitespace.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/f14db3287b23ed8af9bdbf8001e2e2fe7ae9e43a.camel@perches.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
In some cases, we don't want to allow specific tunnel packets to reach
the host, in order to avoid high CPU usage (e.g. during network attacks),
while other tunnel packets which are not matched in hardware will still
be sent to the host.
$ tc filter add dev vxlan_sys_4789 \
protocol ip chain 0 parent ffff: prio 1 handle 1 \
flower dst_ip 1.1.1.100 ip_proto tcp dst_port 80 \
enc_dst_ip 2.2.2.100 enc_key_id 100 enc_dst_port 4789 \
action tunnel_key unset pipe action drop
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When failing to create the tx reporter, don't set the reporter's pointer.
Creating a reporter is not mandatory for driver load; avoid keeping a
garbage/error pointer.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Return an error when failing to create a reporter in devlink. Since
NET_DEVLINK is mandatory for MLX5_CORE in Kconfig, the returned pointer
can't be NULL and can only hold an error on the failure path.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move vlan checksum fixup flow into mlx5e_skb_padding_csum(), which is
supposed to fixup SKB checksum if needed. And rename
mlx5e_skb_padding_csum() to mlx5e_skb_csum_fixup().
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If capable, use zero inline mode in TX WQE for non-VLAN packets.
For VLAN ones, keep the enforcement of at least L2 inline mode,
unless the WQE VLAN insertion offload cap is on.
Performance:
Tested single core packet rate of 64Bytes.
NIC: ConnectX-5
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
pktgen:
Before: 12.46 Mpps
After: 14.65 Mpps (+17.5%)
XDP_TX:
The MPWQE flow is not affected, as it already has this optimization.
So we test with priv-flag xdp_tx_mpwqe: off.
Before: 9.90 Mpps
After: 10.20 Mpps (+3%)
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Tested-by: Noam Stolero <noams@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of passing an output param, let function return the
WQE pointer.
In addition, pass &pi so it gets its value in the function,
and save the redundant assignment that comes after it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In MPWQE mode, when transmitting packets with XDP, a packet that is smaller
than a certain size (set to 256 bytes) would be sent inline within its WQE
TX descriptor (mem-copied), in case the hardware tx queue is congested
beyond a pre-defined water-mark.
If a MPWQE cannot contain an additional inline packet, we close this
MPWQE session, and send the packet inlined within the next MPWQE.
To save some MPWQE session close+open operations, we don't open MPWQE
sessions when the contiguous room left is smaller than a certain size (set to the
HW MPWQE maximum size). If there isn't enough contiguous room in the
send queue, we fill it with NOPs and wrap the send queue index around.
This way, qualified packets are always sent inline.
Perf tests:
Tested packet rate for UDP 64Byte multi-stream
over two dual port ConnectX-5 100Gbps NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
XDP_TX:
With 24 channels:
| ------ | bounced packets | inlined packets | inline ratio |
| before | 113.6 Mpps      | 96.3 Mpps       | 84%          |
| after  | 115 Mpps        | 99.5 Mpps       | 86%          |
With one channel:
| ------ | bounced packets | inlined packets | inline ratio |
| before | 6.7 Mpps        | 0 pps           | 0%           |
| after  | 6.8 Mpps        | 0 pps           | 0%           |
As we can see, there is improvement in both inline ratio and overall
packet rate for 24 channels. Also, we see no degradation for the
one-channel case.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We use NOPs to populate the WQ fragment edge if the WQE does not fit
in frag, to avoid WQEs crossing a page boundary (or wrap-around the WQ).
The upper bound on the needed number of NOPs is one WQEBB less than
the largest possible WQE, for otherwise the WQE would certainly fit.
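As a hedged sketch, the edge handling then looks roughly like this
(assuming the existing mlx5_wq_cyc_get_contig_wqebbs() and
mlx5e_post_nop() helpers; variable names are illustrative):

	contig = mlx5_wq_cyc_get_contig_wqebbs(wq, pi);
	if (unlikely(contig < num_wqebbs)) {
		/* at most (max WQE size in WQEBBs - 1) iterations */
		while (contig--)
			mlx5e_post_nop(wq, sq->sqn, &sq->pc);
	}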
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a pool of flow counters, based on flow counter bulks, removing the
need to allocate a new counter via a costly FW command during the flow
creation process. The time it takes to acquire/release a flow counter
is cut from ~50 [us] to ~50 [ns].
The pool is part of the mlx5 driver instance, and provides flow
counters for aging flows. mlx5_fc_create() was modified to provide
counters for aging flows from the pool by default, and
mlx5_destroy_fc() was modified to release counters back to the pool
for later reuse. If bulk allocation is not supported or fails, and for
non-aging flows, the fallback behavior is to allocate and free
individual counters.
The pool is comprised of three lists of flow counter bulks, one of
fully used bulks, one of partially used bulks, and one of unused
bulks. Counters are provided from the partially used bulks first, to
help limit bulk fragmentation.
The pool maintains a threshold, and strives to maintain the amount of
available counters below it. The pool is increased in size when a
counter acquisition request is made and there are no available
counters, and it is decreased in size when the last counter in a bulk
is released and there are more available counters than the threshold.
All pool size changes are done in the context of the
acquiring/releasing process.
The value of the threshold is directly correlated to the amount of
used counters the pool is providing, while constrained by a hard
maximum, and is recalculated every time a bulk is allocated/freed.
This ensures that the pool only consumes large amounts of memory for
available counters if the pool is being used heavily. When fully
populated and at the hard maximum, the buffer of available counters
consumes ~40 [MB].
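The pool described above can be pictured roughly as follows (a sketch;
the field names are illustrative, not necessarily the exact ones added
by the patch):

	struct mlx5_fc_pool {
		struct mlx5_core_dev *dev;
		struct mutex pool_lock;		/* protects the bulk lists */
		struct list_head fully_used_bulks;
		struct list_head partially_used_bulks;
		struct list_head unused_bulks;
		int available_fcs;	/* counters ready to be handed out */
		int used_fcs;
		int threshold;		/* recalculated per bulk alloc/free */
	};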
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add infrastructure to track bulks of flow counters, providing
the means to allocate and deallocate bulks, and to acquire and
release individual counters from the bulks.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use the scheduling elements to implement an ingress rate limiter on an
eswitch port's ingress traffic. Since the ingress of an eswitch port is the
egress of the VF port, we control eswitch ingress by controlling VF egress.
Configuration is done using the ports' representor net devices.
Please note that burst size configuration is not supported by devices
ConnectX-5 and earlier generations.
Configuration examples:
tc:
tc filter add dev enp59s0f0_0 root protocol ip matchall action police rate 1mbit burst 20k
ovs:
ovs-vsctl set interface eth0 ingress_policing_rate=1000
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Misc updates from mlx5-next branch.
1) Eli improves the handling of the support for QoS element type
2) Gavi refactors and prepares mlx5 flow counters for bulk allocation
support
3) Parav, refactors and improves E-Switch load/unload flows
4) Saeed, two misc cleanups
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently for PF and ECPF vports, representors are created before
their eswitch hardware ports are initialized, in the below flow.
mlx5_eswitch_enable()
esw_offloads_init()
esw_offloads_load_all_reps()
[..]
esw_enable_vport()
However for VFs, vports are initialized before creating their
respective netdev representors in the event handling context.
Similarly, while disabling the eswitch, hardware vports are disabled
first, followed by destroying their representors.
Here the underlying vport gets destroyed, but its respective user-facing
netdevice can still exist, and the user can continue to perform more
offload operations on it.
Instead, it is more accurate to do:
enable_eswitch switchdev mode:
1. perform FDB tables initialization
2. initialize hw vport
3. create and publish representor for this vport
disable_eswitch switchdev mode:
1. destroy user facing representor for the vport
2. disable hw vport
3. perform FDB tables cleanup
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The mc_promisc pointer points to an instance of struct esw_mc_addr
allocated as part of the esw structure.
Hence it cannot be NULL.
Remove the redundant check and do the assignment where it is actually used.
While at it, add a comment around the legacy mode fields and move mc_promisc
close to the other legacy mode structures to improve code readability.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We don't need to handle error flow of esw_create_legacy_table() in the
same branch, it is already being handled directly after the if statement,
for both legacy and switchdev modes in one place.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
vports need to be enabled in both switchdev and legacy modes.
In switchdev mode, vports should be enabled after initializing
the FDB tables and before creating their representors, so that a
representor works on an initialized vport object.
Prepare a helper function which can be called when enabling either of
the eswitch modes.
Similarly, add a helper function to disable vports.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
First enable the TSAR QoS hardware block in the device before enabling its
user vports.
This refactor is needed so that vports can be enabled before their
representor netdevices are created.
While at it: esw_create_tsar() returns an error code which was used only to
print an error. However esw_create_tsar() already prints a warning if it
hits an error.
Hence, remove the redundant warning.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Except for the bit toggling code, the rest of the code to enable/disable
the metadata passing functionality is the same.
Hence, combine them into a single function controlled by an enable flag.
Also, instead of checking whether metadata is supported in multiple places,
fold the check into the helper function.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Check if the firmware supports the requested element type before
attempting to create the element type.
In addition, explicitly specify the requested element type and tsar type.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently mlx5_load_one() performs device registration using
mlx5_register_device(), but mlx5_unload_one() doesn't unregister.
Make them symmetric by doing device unregistration in
mlx5_unload_one().
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a handle to invoke the new FW capability of allocating a bulk of
flow counters.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Towards introducing the ability to allocate bulks of flow counters,
refactor the flow counter bulk query process, removing functions and
structs whose names indicated being used for flow counter bulk
allocation FW commands, despite them actually only being used to
support bulk querying, and migrate their functionality to correctly
named functions in their natural location, fs_counters.c.
Additionally, optimize the bulk query process by:
* Extracting the memory used for the query to mlx5_fc_stats so
that it is only allocated once, and not for each bulk query.
* Querying all the counters in one function call.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In commit e891ce1dd2 ("mlxsw: spectrum_buffers: Reduce pool size on
Spectrum-2"), pool size was reduced to mitigate a problem in port buffer
usage of ports split four ways. It turns out that this workaround does not
solve the issue, and a further reduction is required.
Thus reduce the size of pool 0 by another 2.7 MiB, and round down to a
whole number of cells.
Fixes: e891ce1dd2 ("mlxsw: spectrum_buffers: Reduce pool size on Spectrum-2")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case sp2 pci driver registration fails, fix the error path to
start with sp1 pci driver unregistration.
Fixes: c3ab435466 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there are duplicated checks on orig_egr_types which are
redundant. I believe this is a typo and the expression should actually be
orig_ing_types || orig_egr_types instead of
orig_egr_types || orig_egr_types. Fix these.
Addresses-Coverity: ("Same on both sides")
Fixes: c6b36bdd04 ("mlxsw: spectrum_ptp: Increase parsing depth when PTP is enabled")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TC flow table is created when first flow is added, and destroyed when last
flow is removed. This assumes that all accesses to the table are externally
synchronized with rtnl lock. To remove dependency on rtnl lock, add new
mutex mlx5e_tc_table->t_lock and use it to protect the flow table.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Function netdev_master_upper_dev_get() generates warning if caller doesn't
hold rtnl lock. Modify rules update path to use rcu version of that
function.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
esw->state_lock is already used to protect vlan vport configuration change.
However, all preparation and correctness checks, and code that sets vport
data are not protected by this lock and assume external synchronization by
rtnl lock. In order to remove dependency on rtnl lock, extend
esw->state_lock protection to whole eswitch vlan add/del functions.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eswitch implements its own locking by means of the state_lock mutex and
multiple fine-grained locks in contained data structures, and is supposed
to not rely on the rtnl lock. However, the eswitch offloads num_flows field
is a regular long long integer and cannot be modified concurrently. This is
an implicit assumption that mlx5 tc is serialized (by the rtnl lock or any
other means). In order to remove the implicit dependency on the rtnl lock,
change the num_flows type to atomic64 to allow concurrent modifications.
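The type change boils down to the following (sketch; the exact field
path is illustrative):

	-	long long num_flows;
	+	atomic64_t num_flows;

	-	esw->offloads.num_flows++;
	+	atomic64_inc(&esw->offloads.num_flows);

	-	return esw->offloads.num_flows;
	+	return atomic64_read(&esw->offloads.num_flows);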
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In order to remove dependency on rtnl lock for protecting unready_flows
list when reoffloading unready flows on workqueue, extend representor
uplink private structure with dedicated 'unready_flows_lock' mutex. Take
the lock in all users of unready_flows list before accessing it. Implement
helper functions to add and delete unready flow.
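The helpers follow the usual list-plus-mutex pattern, roughly (sketch;
structure and helper names are illustrative):

	static void add_unready_flow(struct mlx5_rep_uplink_priv *uplink_priv,
				     struct mlx5e_tc_flow *flow)
	{
		mutex_lock(&uplink_priv->unready_flows_lock);
		list_add_tail(&flow->unready, &uplink_priv->unready_flows);
		mutex_unlock(&uplink_priv->unready_flows_lock);
	}

	static void remove_unready_flow(struct mlx5_rep_uplink_priv *uplink_priv,
					struct mlx5e_tc_flow *flow)
	{
		mutex_lock(&uplink_priv->unready_flows_lock);
		list_del(&flow->unready);
		mutex_unlock(&uplink_priv->unready_flows_lock);
	}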
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In order to remove dependency on rtnl lock, access to tc flows hashtable
must be explicitly protected from concurrent flows removal.
Extend tc flow structure with rcu to allow concurrent parallel access. Use
rcu read lock to safely lookup flow in tc flows hash table, and take
reference to it. Use rcu free for flow deletion to accommodate concurrent
stats requests.
Add a new DELETED flow flag. Implement a new flow_flag_test_and_set() helper
that is used to set a flag and return its previous value. Use it to
atomically set the flag in mlx5e_delete_flower() to guarantee that flow can
only be deleted once, even when same flow is deleted concurrently by
multiple tasks.
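Lookup then takes the rcu read lock and a reference, roughly (sketch
using the standard rhashtable and refcount primitives; the hash table
and params names are illustrative):

	rcu_read_lock();
	flow = rhashtable_lookup(tc_ht, &cookie, tc_ht_params);
	if (flow && !refcount_inc_not_zero(&flow->refcnt))
		flow = NULL;	/* lost the race against concurrent delete */
	rcu_read_unlock();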
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock and allow concurrent modification of
'flags' field of tc flow structure, change flow flag type to unsigned long
and use atomic bit ops for reading and changing the flags. Implement
auxiliary functions for setting, resetting and getting specific flag, and
for checking most often used flag values.
Always set flags with smp_mb__before_atomic() to ensure that all
mlx5e_tc_flow are updated before concurrent readers can read new flags
value. Rearrange all code paths to actually set flow->rule[] pointers
before setting the OFFLOADED flag. On read side, use smp_mb__after_atomic()
when accessing flags to ensure that offload-related flow fields are only
read after the flags.
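As a sketch, the set-side helper pairs the barrier with the bit
operation like this (flag and structure names are illustrative):

	/* make prior writes to the flow (e.g. flow->rule[]) visible
	 * before the flag itself */
	#define __flow_flag_set(flow, flag) do {		\
		smp_mb__before_atomic();			\
		set_bit(flag, &(flow)->flags);			\
	} while (0)

	#define __flow_flag_test(flow, flag) \
		test_bit(flag, &(flow)->flags)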
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
With new classifier type that doesn't require rtnl lock, following
invariant holds:
- Filter with specified cookie created only once.
- Filter with specified cookie deleted only once.
- Stats updates can be performed in parallel to each other.
Extend tc flow with rcu and reference counter. To protect from concurrent
delete, get reference to tc flow when:
- Reading flow stats.
- Accessing flow in neigh update handler.
- Accessing flow in neigh update used value handler.
Only free the flow when the reference counter reaches zero. Modify flow
cleanup to account for flows that might not be fully initialized, by
checking if the flow is actually in the list of the corresponding mod_hdr,
hairpin and encap entries. Don't clean up the flow directly in case of
error, to allow concurrent
neigh update (neigh update will be modified to always take reference to
flow when using it).
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The helper function has "if" branches that do the same thing. Merge them to
simplify the code.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When flow_block_cb_is_busy() is called, indr_priv is guaranteed to be a
NULL pointer, so there is no need to call flow_block_cb_is_busy().
Fixes: 0d4fd02e71 ("net: flow_offload: add flow_block_cb_is_busy() and use it")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Don't choose who implements the rxnfc "get/set" callbacks according to
CONFIG_MLX5_EN_RXNFC. Instead, have the callbacks always available and
delegate to a function of a different driver module when needed
(en_fs_ethtool.c), and have stubs in en/fs.h to fall back to when
en_fs_ethtool.c is compiled out, to avoid complications and ifdefs in
en_main.c.
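The en/fs.h fallback follows the standard Kconfig stub pattern, roughly
(sketch; the exact prototypes may differ):

	#ifdef CONFIG_MLX5_EN_RXNFC
	int mlx5e_ethtool_get_rxnfc(struct net_device *dev,
				    struct ethtool_rxnfc *info, u32 *rule_locs);
	int mlx5e_ethtool_set_rxnfc(struct net_device *dev,
				    struct ethtool_rxnfc *cmd);
	#else
	static inline int mlx5e_ethtool_get_rxnfc(struct net_device *dev,
						  struct ethtool_rxnfc *info,
						  u32 *rule_locs)
	{ return -EOPNOTSUPP; }
	static inline int mlx5e_ethtool_set_rxnfc(struct net_device *dev,
						  struct ethtool_rxnfc *cmd)
	{ return -EOPNOTSUPP; }
	#endif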
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When disabling CQE compression in favor of time-stamping, don't show a
warning when CQE compression is already disabled.
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When a user enables LRO via ethtool and the RQ mode is legacy,
mlx5e_fix_features drops the request without any explanation.
Add a netdev_warn to cover this case.
Fixes: 6c3a823e1e ("net/mlx5e: RX, Remove HW LRO support in legacy RQ")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The commit b9a7ba5562 ("net/mlx5: Use event mask based on device
capabilities") introduced a few compilation warnings because it bumps
MLX5_EVENT_TYPE_MAX from 0x27 to 0x100, which is always greater than
"struct {mlx5_eqe|mlx5_nb}.type", a "u8".
drivers/net/ethernet/mellanox/mlx5/core/eq.c: In function
'mlx5_eq_notifier_register':
drivers/net/ethernet/mellanox/mlx5/core/eq.c:948:21: warning: comparison
is always false due to limited range of data type [-Wtype-limits]
if (nb->event_type >= MLX5_EVENT_TYPE_MAX)
^~
drivers/net/ethernet/mellanox/mlx5/core/eq.c: In function
'mlx5_eq_notifier_unregister':
drivers/net/ethernet/mellanox/mlx5/core/eq.c:959:21: warning: comparison
is always false due to limited range of data type [-Wtype-limits]
if (nb->event_type >= MLX5_EVENT_TYPE_MAX)
Fix them by removing the unnecessary checks.
Fixes: b9a7ba5562 ("net/mlx5: Use event mask based on device capabilities")
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Spectrum systems have a configurable limit on how far into the packet they
parse. By default, the limit is 96 bytes.
An IPv6 PTP packet is layered as Ethernet/IPv6/UDP (14+40+8 bytes), and
sequence ID of a PTP event is only available 32 bytes into payload, for a
total of 94 bytes. When an additional 802.1q header is present as
well (such as when ptp4l is running on a VLAN port), the parsing limit is
exceeded. Such packets are not recognized as PTP, and are not timestamped.
Therefore generalize the current VXLAN-specific parsing depth setting to
allow reference-counted requests from other modules as well. Keep it in the
VXLAN module, because the MPRS register also configures UDP destination
port number used for VXLAN, and is thus closely tied to the VXLAN code
anyway.
Then invoke the new interfaces from both VXLAN (in obvious places), as well
as from PTP code, when the (global) timestamping configuration changes from
disabled to enabled or vice versa.
Fixes: 8748642751 ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matching on reserved TCP flags bits is only supported using the custom
parser. Since the use case for that is not known now, just forbid
offloading rules that match on these bits.
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some matches and actions are not supported on egress. Track such rules
and forbid a bind of block which contains them to egress.
With this patch, the kernel tells the user he cannot do that:
$ tc qdisc add dev ens16np1 ingress_block 22 clsact
$ tc filter add block 22 protocol 802.1q pref 2 handle 101 flower vlan_id 100 skip_sw action pass
$ tc qdisc add dev ens16np2 egress_block 22 clsact
Error: mlxsw_spectrum: Block cannot be bound to egress because it contains unsupported rules.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum ASIC does not support redirection on egress, so refuse to
insert such flows:
$ tc qdisc add dev ens16np1 clsact
$ tc filter add dev ens16np1 egress protocol all pref 1 handle 101 flower skip_sw action mirred egress redirect dev ens16np2
Error: mlxsw_spectrum: Redirect action is not supported on egress.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ConnectX-3 Pro can offload transmission of VLAN packets with VXLAN inside:
enable tunnel offloads in dev->vlan_features, like it's done in other
NIC drivers (e.g. be2net and ixgbe).
It is no longer necessary to change dev->hw_enc_features when VXLAN ports
are added or removed, since .ndo_features_check() already checks for VXLAN
packets whose UDP destination port matches the configured value. Just set
dev->hw_enc_features when the NIC is initialized, so that overlying VLAN
devices can correctly inherit the tunnel offload capabilities.
Changes since v1:
- avoid flipping hw_enc_features, instead of calling netdev notifiers,
thanks to Saeed Mahameed
- squash two patches into a single one
CC: Paolo Abeni <pabeni@redhat.com>
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-fixes-2019-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-07-25
This series introduces some fixes to mlx5 driver.
1) Ariel is addressing an issue with an encap flow counter race condition
2) Aya fixes ethtool speed handling
3) Edward fixes modify_cq hw bits alignment
4) Maor fixes RDMA_RX capabilities handling
5) Mark reverses unregister devices order to address an issue with LAG
6) From Tariq,
- wrong max num channels indication regression
- TLS counters naming and documentation as suggested by Jakub
- kTLS, Call WARN_ONCE on netdev mismatch
There is one patch in this series that touches the nfp driver to align
TLS statistics names with the latest documentation; Jakub is CC'ed.
Please pull and let me know if there is any problem.
For -stable v4.9:
('net/mlx5: Use reversed order when unregister devices')
For -stable v4.20
('net/mlx5e: Prevent encap flow counter update async to user query')
('net/mlx5: Fix modify_cq_in alignment')
For -stable v5.1
('net/mlx5e: Fix matching of speed to PRM link modes')
For -stable v5.2
('net/mlx5: Add missing RDMA_RX capabilities')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A netdev mismatch in the processed TLS SKB should not occur,
and indicates a kernel bug.
Add WARN_ONCE to spot such cases.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch prevents a race between a user-invoked cached counters
query and the neighbor last usage updater.
The cached flow counter stats can be queried by calling
"mlx5_fc_query_cached" which provides the number of bytes and
packets that passed via this flow since the last time this counter
was queried.
It does so by subtracting the last saved stats from the current cached
stats and then updating the last saved stats with the cached stats.
It also provides the lastuse value for that flow.
Since "mlx5e_tc_update_neigh_used_value" needs to retrieve the
last usage time of encapsulation flows, it calls the flow counter
query method periodically and async to user queries of the flow counter
using cls_flower.
This call causes the driver to update the last reported bytes and
packets from the cache, and therefore future user queries of the flow
stats will return lower than expected numbers for bytes and packets,
since the last saved stats in the driver were updated asynchronously to
the last saved stats in cls_flower.
This causes wrong stats presentation of encapsulation flows to the user.
Since the neighbor usage updater only needs the lastuse stats from the
cached counter, the fix is to use a dedicated lastuse query call that
returns the lastuse value without syncing the cached stats with the
last saved stats.
Fixes: f6dfb4c3f2 ("net/mlx5e: Update neighbour 'used' state using HW flow rules counters")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
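A rough sketch of the idea, with a hypothetical cache structure standing in
for the real flow counter (only the field roles are taken from the
description above): the full cached query has the side effect of advancing
the last-saved snapshot, while a lastuse-only query does not.

typedef unsigned long long u64;	/* stand-in for the kernel type */

struct fc_cache_sketch {
	u64 packets, bytes;		/* latest cached HW stats */
	u64 lastpackets, lastbytes;	/* snapshot taken at the last user query */
	u64 lastuse;			/* time of last HW use */
};

/* User-facing query: report the delta and advance the snapshot. */
static void fc_query_cached_sketch(struct fc_cache_sketch *c, u64 *bytes,
				   u64 *packets, u64 *lastuse)
{
	*bytes = c->bytes - c->lastbytes;
	*packets = c->packets - c->lastpackets;
	*lastuse = c->lastuse;

	/* The side effect the neighbor updater must not trigger: */
	c->lastbytes = c->bytes;
	c->lastpackets = c->packets;
}

/* Neighbor usage updater: needs lastuse only, so the snapshot is untouched. */
static u64 fc_query_lastuse_sketch(const struct fc_cache_sketch *c)
{
	return c->lastuse;
}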
Speed translation is performed based on legacy or extended PTYS
register. Translate speed with respect to:
1) Capability bit of extended PTYS table.
2) User request:
a) When auto-negotiation is turned on, inspect advertisement whether it
contains extended link modes.
b) When auto-negotiation is turned off, speed > 100Gbps (maximal
speed supported in legacy mode).
When both conditions are fulfilled, translation is done with the extended
PTYS table; otherwise the legacy PTYS table is used.
Without this patch, 25/50/100 Gbps speeds cannot be set, since the driver
tries to configure in extended mode but reads from legacy mode.
Fixes: dd1b9e09c1 ("net/mlx5: ethtool, Allow legacy link-modes configuration via non-extended ptys")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
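A sketch of the selection rule as stated above (all names are illustrative;
ext_cap stands for the extended-PTYS capability bit, and 100 Gbps is taken
as the top of the legacy table):

#include <stdbool.h>

#define MAX_LEGACY_SPEED_MBPS 100000	/* 100 Gbps */

struct link_req_sketch {
	bool autoneg;
	bool adver_has_ext_modes;	/* advertisement contains extended link modes */
	unsigned int speed_mbps;	/* requested speed when autoneg is off */
};

static bool use_extended_ptys(bool ext_cap, const struct link_req_sketch *req)
{
	if (!ext_cap)			/* condition 1: device capability */
		return false;
	if (req->autoneg)		/* condition 2a: advertisement */
		return req->adver_has_ext_modes;
	return req->speed_mbps > MAX_LEGACY_SPEED_MBPS;	/* condition 2b */
}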
No XSK support in the enhanced IPoIB driver and representors.
Add a profile property to specify this, and enhance the logic
that calculates the max number of channels to take it into
account.
Fixes: db05815b36 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
New flow table type RDMA_RX was added but the MLX5_CAP_FLOW_TABLE_TYPE
didn't handle this new flow table type.
This means that MLX5_CAP_FLOW_TABLE_TYPE returns an empty capability to
this flow table type.
Update both the macro and the maximum supported flow table type to
RDMA_RX.
Fixes: d83eb50e29 ("net/mlx5: Add support in RDMA RX steering")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When LAG is active, which is controlled by the bonded mlx5e netdev, mlx5
interface unregistering must happen in reverse order, where RDMA is
unregistered (unloaded) first, to guarantee that all references to the LAG
context in hardware are removed, and only then the mlx5e netdev interface
is removed, which cleans up the LAG context from hardware.
Without this fix, during destroy of the LAG interface we observed the
following errors:
* mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0xe4ac33)
* mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0xa5aee8).
Fixes: a31208b1e1 ("net/mlx5_core: New init and exit flow for mlx5_core")
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Unlike IPv4, the kernel does not consolidate IPv6 nexthop groups. To
avoid exhausting the device's adjacency table - where nexthops are
stored - the driver does this consolidation instead.
Each nexthop group is hashed by XOR-ing the interface indexes of all the
member nexthop devices. However, the ifindex itself is not hashed, which
can result in identical keys being used for different groups and eventually
an -EBUSY error from rhashtable due to an overly long object list.
Improve the situation by hashing the ifindex itself.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
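A stand-alone sketch of the collision (the mixing function below is a toy
stand-in for a proper hash such as the kernel's jhash, and the group members
are made up): XOR-ing raw ifindexes lets different groups produce the same
key, while hashing each ifindex first keeps them apart.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Toy 32-bit integer mixer, illustration only. */
static uint32_t mix32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x7feb352dU;
	x ^= x >> 15;
	x *= 0x846ca68bU;
	x ^= x >> 16;
	return x;
}

static uint32_t key_xor_raw(const int *ifindex, size_t n)
{
	uint32_t key = 0;

	while (n--)
		key ^= (uint32_t)ifindex[n];	/* raw XOR: collides easily */
	return key;
}

static uint32_t key_xor_hashed(const int *ifindex, size_t n)
{
	uint32_t key = 0;

	while (n--)
		key ^= mix32((uint32_t)ifindex[n]);	/* hash each ifindex first */
	return key;
}

int main(void)
{
	int grp_a[] = { 1, 2, 3 }, grp_b[] = { 1, 4, 5 };

	printf("raw XOR:    a=%u b=%u (identical keys)\n",
	       (unsigned)key_xor_raw(grp_a, 3), (unsigned)key_xor_raw(grp_b, 3));
	printf("hashed XOR: a=%u b=%u\n",
	       (unsigned)key_xor_hashed(grp_a, 3), (unsigned)key_xor_hashed(grp_b, 3));
	return 0;
}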
Unlike Spectrum-1, the KVD (Key-value database) of Spectrum-2 is not
partitioned, so only expose the entire KVD size. This enables users to
query the total size of the KVD.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While using net_dim, a dim_sample was used without ever initializing the
comps value. Add use of DIV_ROUND_DOWN_ULL() to prevent a potential
overflow; it is not a problem to save the final result in an int because
after the division by epms the value should not be larger than a few
thousand.
[ 1040.127124] UBSAN: Undefined behaviour in lib/dim/dim.c:78:23
[ 1040.130118] signed integer overflow:
[ 1040.131643] 134718714 * 100 cannot be represented in type 'int'
Fixes: 398c2b05bb ("linux/dim: Add completions count to dim_sample")
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
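A small stand-alone sketch of the fix (comps is the value from the UBSAN
report; the divisor and variable names are made up): widen to 64 bits before
multiplying, as the kernel's DIV_ROUND_DOWN_ULL() does, and only then store
the small result in an int.

#include <stdio.h>

/* Mirrors the effect of the kernel's DIV_ROUND_DOWN_ULL(): 64-bit divide. */
#define DIV_ROUND_DOWN_ULL(ll, d) ((unsigned long long)(ll) / (d))

int main(void)
{
	int comps = 134718714;	/* value from the UBSAN report */
	int epms = 10;		/* made-up divisor for the demo */
	int ratio;

	/* Buggy form: "comps * 100 / epms" multiplies in 32 bits and
	 * overflows, which is exactly what UBSAN flagged. Widen first: */
	ratio = (int)DIV_ROUND_DOWN_ULL((unsigned long long)comps * 100, epms);

	printf("ratio = %d\n", ratio);
	return 0;
}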
The mlx4_dev_cap and mlx4_init_hca_param are really too large
to be put on the kernel stack, as shown by this clang warning:
drivers/net/ethernet/mellanox/mlx4/main.c:3304:12: error: stack frame size of 1088 bytes in function 'mlx4_load_one' [-Werror,-Wframe-larger-than=]
With gcc, the problem is the same, but it does not warn because
it does not inline this function, and therefore stays just below
the warning limit, while clang is just above it.
Use kzalloc for dynamic allocation instead of putting them on the
stack. This gets the combined stack frame down to 424 bytes.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
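A kernel-style sketch of the pattern only (not the actual mlx4 code; the
surrounding flow is invented and error handling is trimmed): allocate the
two large structures with kzalloc() instead of declaring them on the stack,
and free them on every exit path.

static int load_one_sketch(struct mlx4_dev *dev)
{
	struct mlx4_dev_cap *dev_cap;
	struct mlx4_init_hca_param *init_hca;
	int err = -ENOMEM;

	/* Too large for the stack, so allocate (zeroed) from the heap. */
	dev_cap = kzalloc(sizeof(*dev_cap), GFP_KERNEL);
	init_hca = kzalloc(sizeof(*init_hca), GFP_KERNEL);
	if (!dev_cap || !init_hca)
		goto out;

	err = mlx4_dev_cap(dev, dev_cap);	/* fill the capabilities */
	if (err)
		goto out;
	/* ... use dev_cap and init_hca ... */
	err = 0;
out:
	kfree(dev_cap);		/* kfree(NULL) is a no-op */
	kfree(init_hca);
	return err;
}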
The structure is too large to put on the stack, resulting in a
warning on 32-bit ARM:
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c:59:5: error: stack frame size of 1344 bytes in function
'mlx5e_open_xsk' [-Werror,-Wframe-larger-than=]
Use kvzalloc() instead.
Fixes: a038e9794541 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unifying the skb_frag and bio_vec, use the fine
accessors which already exist and use skb_frag_t instead of
struct skb_frag_struct.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This object stores the flow block callbacks that are attached to this
block. Update flow_block_cb_lookup() to take this new object.
This patch restores the block sharing feature.
Fixes: da3eeb904f ("net: flow_offload: add list handling functions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename this type definition and adapt users.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to annotate the netns on the flow block callback object,
flow_block_cb_is_busy() already checks for used blocks.
Fixes: d63db30c85 ("net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A variable allocated by kvmalloc should not be freed by kfree,
because it may have been allocated by vmalloc.
So replace kfree with kvfree here.
Fixes: 9b1f298236 ("net/mlx5: Add support for FW fatal reporter dump")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
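A short sketch of the rule (function names are illustrative): whatever is
obtained from kvmalloc()/kvzalloc() must be released with kvfree(), since it
may be vmalloc-backed.

static void *dump_copy_sketch(const void *src, size_t len)
{
	void *buf = kvzalloc(len, GFP_KERNEL);	/* may fall back to vmalloc */

	if (!buf)
		return NULL;
	memcpy(buf, src, len);
	return buf;
}

static void dump_free_sketch(void *buf)
{
	kvfree(buf);	/* correct for both kmalloc- and vmalloc-backed memory */
}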
The switch periodically sends notifications about learned FDB entries.
Among other things, the notification includes the FID (Filtering
Identifier) and the port on which the MAC was learned.
In case the driver does not have the FID defined on the relevant port,
the following error will be periodically generated:
mlxsw_spectrum2 0000:06:00.0 swp32: Failed to find a matching {Port, VID} following FDB notification
This is not supposed to happen under normal conditions, but can happen
if an ingress tc filter with a redirect action is installed on a bridged
port. The redirect action will cause the packet's FID to be changed to
the dummy FID and a learning notification will be emitted with this FID
- which is not defined on the bridged port.
Fix this by having the driver ignore learning notifications generated
with the dummy FID and delete them from the device.
Another option is to chain an ignore action after the redirect action
which will cause the device to disable learning, but this means that we
need to consume another action whenever a redirect action is used. In
addition, the scenario described above is merely a corner case.
Fixes: cedbb8b259 ("mlxsw: spectrum_flower: Set dummy FID before forward action")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum systems use DSCP rewrite map to update DSCP field in egressing
packets to correspond to priority that the packet has. Whether rewriting
will take place is determined at the point when the packet ingresses the
switch: if the port is in Trust L3 mode, packet priority is determined from
the DSCP map at the port, and DSCP rewrite will happen. If the port is in
Trust L2 mode, 802.1p is used for packet prioritization, and no DSCP
rewrite will happen.
The driver determines the port trust mode based on whether any DSCP
prioritization rules are in effect at given port. If there are any, trust
level is L3, otherwise it's L2. When the last DSCP rule is removed, the
port is switched to trust L2. Under that scenario, if DSCP of a packet
should be rewritten, it should be rewritten to 0.
However, when switching to Trust L2, the driver neglects to also update the
DSCP rewrite map. The last DSCP rule thus remains in effect, and packets
egressing through this port, if they have the right priority, will have
their DSCP set according to this rule.
Fix by first configuring the rewrite map, and only then switching to trust
L2 and bailing out.
Fixes: b2b1dab688 ("mlxsw: spectrum: Support ieee_setapp, ieee_delapp")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently, fl_flow_key->indev_ifindex int field was refactored into
flow_dissector_key_meta field. With this, flower classifier also sets
FLOW_DISSECTOR_KEY_META flow dissector key. However, mlx5 flower dissector
validation code rejects filters that use flow dissector keys that are not
supported. Add FLOW_DISSECTOR_KEY_META to the list of allowed dissector
keys in __parse_cls_flower() to prevent the following error when offloading
the flower classifier to mlx5:
Error: mlx5_core: Unsupported key.
Fixes: 8212ed777f ("net: sched: cls_flower: use flow_dissector for ingress ifindex")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, tunnel attributes are parsed and inner header matching is used
only when flow dissector specifies match on some of the supported
encapsulation fields. When a user tries to offload a tc filter that doesn't
match any encapsulation fields on a tunnel device, the mlx5 tc layer
incorrectly sets matching of packet header keys on the encap header (outer
header) and the firmware rejects the rule with syndrome 0x7e1579 when
creating a new flow group.
Change __parse_cls_flower() to determine whether a tunnel is used based on
the filter_dev tunnel info, instead of determining it indirectly by checking
flow dissector enc keys.
Fixes: bbd00f7e23 ("net/mlx5e: Add TC tunnel release action for SRIOV offloads")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When mlx5e_attach_encap() calls mlx5e_get_tc_tun() to get the tunnel
info data struct, check that the returned value is not NULL, as would be
the case for an unsupported encapsulation.
Fixes: d386939a32 ("net/mlx5e: Rearrange tc tunnel code in a modular way")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
kvzalloc already zeroes the memory during the allocation.
pci_alloc_consistent calls dma_alloc_coherent directly.
In commit 518a2f1925
("dma-mapping: zero memory returned from dma_alloc_*"),
dma_alloc_coherent has already zeroed the memory.
So the memset after these functions is not needed.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
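An illustrative sketch of the cleanup (the function names are made up): both
allocators below already return zeroed memory, so the memset() calls that
used to follow them can simply be dropped.

static void *alloc_table_sketch(size_t len)
{
	void *tbl = kvzalloc(len, GFP_KERNEL);	/* zeroed by the allocator */

	/* memset(tbl, 0, len);  -- redundant, removed */
	return tbl;
}

static void *alloc_dma_buf_sketch(struct device *dev, size_t len,
				  dma_addr_t *dma)
{
	void *buf = dma_alloc_coherent(dev, len, dma, GFP_KERNEL);

	/* memset(buf, 0, len);  -- redundant since commit 518a2f1925 */
	return buf;
}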
During the review of commit 1ff2f0fa45 ("net/mlx5e: Return in default
case statement in tx_post_resync_params"), Leon and Nick pointed out
that the switch statements can be converted to single if statements
that return early so that the code is easier to follow.
Suggested-by: Leon Romanovsky <leon@kernel.org>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
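A generic before/after sketch of the transformation (do_setup() and the
surrounding flow are illustrative, not the real mlx5e code):

/* Before: a switch with a single interesting case. */
static int setup_old_sketch(int cipher_type)
{
	switch (cipher_type) {
	case TLS_CIPHER_AES_GCM_128:
		break;
	default:
		WARN_ONCE(1, "unsupported cipher %d\n", cipher_type);
		return -EINVAL;
	}
	return do_setup();	/* hypothetical helper */
}

/* After: a single if that returns early, which reads more directly. */
static int setup_new_sketch(int cipher_type)
{
	if (cipher_type != TLS_CIPHER_AES_GCM_128) {
		WARN_ONCE(1, "unsupported cipher %d\n", cipher_type);
		return -EINVAL;
	}
	return do_setup();
}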
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0ng7AACgkQSD+KveBX
+j41FQgAjcUgjCiq6gmwI62EsMpZjgjwKhGfYz56jwSmNDgG7Ltj9o/K/+b+oL3u
ZVPO9YgNstWuoPyApE12MzsA+kx/ydRz8GW6IrI5v71JK3yy8Mw/HrYVyUKEnzH4
BVnSRnvHgvCnjZ3SjmUb6Xs74vmoxti9cYr0w0XtBogzuXlfpSYILYFiqrkks6sH
VEmD9P3dTBNQUviYwkRcHhJoI7NqdQBApn9/30F4dQsmnUsikoaom5GovrbBw817
L3Q+U/9+2Xp5k1i7J2MLFSL6Vyaxs/48eRKxs3YubBmrtwBjPj+aOYdRdDOscguQ
WcDRiYPOW72cGIlFtYyuzApqo+J+xw==
=mq0x
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2019-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-07-11
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v4.15
('net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn')
For -stable v5.1
('net/mlx5e: Fix port tunnel GRE entropy control')
('net/mlx5e: Rx, Fix checksum calculation for new hardware')
('net/mlx5e: Fix return value from timeout recover function')
('net/mlx5e: Fix error flow in tx reporter diagnose')
For -stable v5.2
('net/mlx5: E-Switch, Fix default encap mode')
Conflict note: This pull request will produce a small conflict when
merged with net-next.
In drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
Take the hunk from net and replace:
esw_offloads_steering_init(esw, vf_nvports, total_nvports);
with:
esw_offloads_steering_init(esw);
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following compiler warning:
In function ‘esw_vport_add_ingress_acl_modify_metadata’:
the frame size of 1084 bytes is larger than 1024 bytes [-Wframe-larger-than=]
Since the structure is never written to, we can statically allocate
it to avoid the stack usage.
Fixes: 7445cfb116 ("net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ingress ACLs")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In mlx5e_setup_tc, the "priv" variable is not used if
CONFIG_MLX5_ESWITCH is off; one way to fix this is to actually use it.
mlx5e_setup_tc_mqprio also needs the "priv" variable and it extracts it
on its own. We can simply pass priv to mlx5e_setup_tc_mqprio instead of
the netdev and avoid extracting the priv var, which will also resolve
the compiler warning.
Fixes: 4e95bc268b ("net: flow_offload: add flow_block_cb_setup_simple()")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
CC: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the cited patch below, the Kconfig flags combination of:
CONFIG_MLX5_FPGA is not set
CONFIG_MLX5_TLS=y
CONFIG_MLX5_EN_TLS=y
leads to the compilation error:
./include/linux/mlx5/device.h:61:39: error: invalid application of
sizeof to incomplete type struct mlx5_ifc_tls_flow_bits.
Fix it.
Fixes: 90687e1a9a50 ("net/mlx5: Kconfig, Better organize compilation flags")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
CC: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the timeout recover function to return a meaningful value.
When an interrupt was not sent by the FW, return an IO error instead of
'true'.
Fixes: c7981bea48 ("net/mlx5e: Fix return status of TX reporter timeout recover")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
CQE checksum full mode in new HW provides a full checksum of the RX frame,
covering bytes starting from the eth protocol up to the last byte in the
received frame (frame_size - ETH_HLEN), as expected by the stack.
Fixing up skb->csum by the driver is not required in such a case. This fix
avoids wrong checksum calculation in drivers which already support
the new hardware with the new checksum mode.
Fixes: 85327a9c41 ("net/mlx5: Update the list of the PCI supported devices")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
GRE entropy calculation is a single bit per card, and not per port.
Force disable GRE entropy calculation upon the first GRE encap rule,
and release the force at the last GRE encap rule removal. This is done
per port.
Fixes: 97417f6182 ("net/mlx5e: Fix GRE key by controlling port tunnel entropy calculation")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Encap mode is related to switchdev mode only. Move the init of
the encap mode to eswitch_offloads. Before this change, we reported
that the eswitch supports encap even though the device was in
non-SRIOV mode.
Fixes: 7768d1971d ('net/mlx5: E-Switch, Add control for encapsulation')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
clang warns:
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:251:2:
warning: variable 'rec_seq_sz' is used uninitialized whenever switch
default is taken [-Wsometimes-uninitialized]
default:
^~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:255:46: note:
uninitialized use occurs here
skip_static_post = !memcmp(rec_seq, &rn_be, rec_seq_sz);
^~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:239:16: note:
initialize the variable 'rec_seq_sz' to silence this warning
u16 rec_seq_sz;
^
= 0
1 warning generated.
This case statement was clearly designed to be one that should not be
hit during runtime because of the WARN_ON statement, so just return early
to prevent copying uninitialized memory up into rn_be.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Link: https://github.com/ClangBuiltLinux/linux/issues/590
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
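A simplified sketch of the shape of the fix (the function is heavily
trimmed; only the names from the warning above are reused): the default
branch now returns instead of falling through with rec_seq_sz
uninitialized.

static void tx_post_resync_sketch(const char *rec_seq, u64 rcd_sn,
				  int cipher_type)
{
	__be64 rn_be = cpu_to_be64(rcd_sn);
	bool skip_static_post;
	u16 rec_seq_sz;

	switch (cipher_type) {
	case TLS_CIPHER_AES_GCM_128:
		rec_seq_sz = TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE;
		break;
	default:
		WARN_ONCE(1, "unsupported cipher type %d\n", cipher_type);
		return;		/* don't reach the memcmp() with a garbage size */
	}

	skip_static_post = !memcmp(rec_seq, &rn_be, rec_seq_sz);
	/* ... post static/progress params based on skip_static_post ... */
}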
The return value was changed to 'int' from void but this return statement
was not updated, or it slipped in via a merge.
Fixes: b5d9a834f4 ("net/tls: don't clear TX resync flag on error")
Signed-off-by: David S. Miller <davem@davemloft.net>
And any other existing fields in this structure that refer to tc.
Specifically:
* tc_cls_flower_offload_flow_rule() to flow_cls_offload_flow_rule().
* TC_CLSFLOWER_* to FLOW_CLS_*.
* tc_cls_common_offload to flow_cls_common_offload.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a function to check if flow block callback is already in
use. Call this new function from flow_block_cb_setup_simple() and from
drivers.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates flow_block_cb_setup_simple() to use the flow block API.
Several drivers are also adjusted to use it.
This patch introduces the per-driver list of flow blocks to account for
blocks that are already in use.
Remove tc_block_offload alias.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename from TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_* and
remove temporary tcf_block_binder_type alias.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename from TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND and remove
temporary tc_block_command alias.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most drivers do the same thing to set up the flow block callbacks, this
patch adds a helper function to do this.
This preparation patch reduces the number of changes to adapt the
existing drivers to use the flow block callback API.
This new helper function takes a flow block list per-driver, which is
set to NULL until this driver list is used.
This patch also introduces the flow_block_command and
flow_block_binder_type enumerations, which are renamed to use
FLOW_BLOCK_* in follow up patches.
There are three definitions (aliases) in order to reduce the number of
updates in this patch, which go away once drivers are fully adapted to
use this flow block API.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register devlink port of physical port, PCI PF and PCI VF flavour
for each PF, VF when a given devlink instance is in switchdev mode.
Implement ndo_get_devlink_port callback API to make use of registered
devlink ports.
This eliminates ndo_get_phys_port_name() and ndo_get_port_parent_id()
callbacks. Hence, remove them.
An example output with 2 VFs, without a PF and single uplink port is
below.
$devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0 flavour physical
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a return code for the tls_dev_resync callback.
When the driver TX resync fails, the kernel can retry the resync again
until it succeeds. This prevents drivers from attempting to offload
TLS packets if the connection is known to be out of sync.
We don't worry about the RX resync since it will be retried naturally
as more encrypted records get received.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper recently removed page_pool_destroy() (from driver invocation)
and moved shutdown and free of the page_pool into xdp_rxq_info_unreg(),
in order to handle in-flight packets/pages. This created an asymmetry
in drivers' create/destroy pairs.
This patch reintroduces page_pool_destroy and adds a page_pool user
refcnt. This serves to simplify drivers' error handling, as drivers now
always call page_pool_destroy() and don't need to track whether
xdp_rxq_info_reg_mem_model() was unsuccessful.
This could be used for special cases where a single RX-queue (with a
single page_pool) provides packets for two net_devices, and thus
needs to register the same page_pool twice with two xdp_rxq_info
structures.
This patch is primarily to ease API usage for drivers. The recently
merged netsec driver actually has a bug in this area, which is
solved by this API change.
This patch is a modified version of Ivan Khoronzhuk's original patch.
Link: https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/
Fixes: 5c67bf0ec4 ("net: netsec: Use page_pool API")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
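A kernel-style sketch of the driver-side pairing after this change (struct
my_rxq and the surrounding flow are made up, and rq->xdp_rxq is assumed to
have been registered with xdp_rxq_info_reg() earlier): page_pool_create()
gives the driver its own reference, xdp_rxq_info_reg_mem_model() takes an
extra one on success, and teardown simply unregisters and then calls
page_pool_destroy() unconditionally.

static int rxq_create_sketch(struct my_rxq *rq)
{
	struct page_pool_params pp_params = {
		.order		= 0,
		.pool_size	= 1024,
		.nid		= NUMA_NO_NODE,
		.dev		= rq->dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	rq->page_pool = page_pool_create(&pp_params);
	if (IS_ERR(rq->page_pool))
		return PTR_ERR(rq->page_pool);

	/* Takes an extra user reference on the pool when it succeeds. */
	return xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, MEM_TYPE_PAGE_POOL,
					  rq->page_pool);
}

/* Used for both the error path and regular teardown: the driver no longer
 * needs to remember whether the mem model registration succeeded. */
static void rxq_destroy_sketch(struct my_rxq *rq)
{
	xdp_rxq_info_unreg(&rq->xdp_rxq);	/* drops the mem-model ref, if taken */
	page_pool_destroy(rq->page_pool);	/* drops the driver's own ref */
}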
Add support for transmit side kernel-TLS acceleration.
Offload the crypto encryption to HW.
Per TLS connection:
- Use a separate TIS to maintain the HW context.
- Use a separate encryption key.
- Maintain static and progress HW contexts by posting the proper
WQEs at creation time, or upon resync.
- Use a special DUMP opcode to replay the previous frags and sync
the HW context.
To make sure the SQ is able to serve an xmit request, increase
SQ stop room to cover:
- static params WQE,
- progress params WQE, and
- resync DUMP per frag.
Currently supporting TLS 1.2, and key size 128bit.
Tested over SimX simulator.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to the existing mlx5e_post_nop(), but marks a fence
in the WQE control segment.
Added as a separate new function to not hurt the performance
of the common case.
To be used in a downstream patch of the series.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let the EN TIS creation function (mlx5e_create_tis) be responsible
for applying common mdev related fields.
Other specific fields must be set by the caller and passed within
the inbox.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use an SQ field for stop_room, and use the larger value only if TLS
is supported.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When polling a CQE of an SKB-less WQE, don't assume it consumed only
one WQEBB. Use wi->num_wqebbs directly instead.
In the downstream patch, SKB-less WQEs might have more than one WQEBB,
thus this change is needed.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change mlx5e_sq_fetch_wqe to be agnostic to the Work Queue
Element (WQE) type.
Before this patch, it was specific for struct mlx5e_tx_wqe.
In order to allow the change, the function now returns the
generic void pointer, and gets the WQE size to do the zero
memset.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ctrl->tisn field exists, this indicates an operation (HW offload)
on the TCP payload.
For such WQEs, inline the headers up to L4.
This is in preparation for kTLS HW offload support, added in
a downstream patch.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take datapath helper functions to a new header file en/txrx.h.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the new TLS implementation of the Connect-X family.
Introduce a new compilation flag MLX5_TLS for it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Encryption key create / destroy is done via
CREATE_GENERAL_OBJECT / DESTROY_GENERAL_OBJECT commands.
To be used in downstream patches by TLS API wrappers, to configure
the TIS context with the encryption key.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Always contain all acceleration function declarations in
'accel' files, independent of the flags setting.
For this, introduce new flags CONFIG_FPGA_{IPSEC/TLS} and use stubs
where needed.
This obsoletes the need for stubs in 'fpga' files. Remove them.
Also use the new flags in Makefile, to decide whether to compile
TLS-specific or IPSEC-specific objects, or not.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not directly call the FPGA version of the IPsec function from main.c.
Wrap it with an accel version, and call the wrapper.
This will allow deprecating the FPGA IPsec stubs in downstream
patch.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This series adds mlx5 support for devlink fw versions query.
1) Implement the required low level firmware commands
2) Implement the devlink knobs and callbacks for fw versions query.
Merge tag 'mlx5-updates-2019-07-04-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-update-2019-07-04
This series adds mlx5 support for devlink fw versions query.
1) Implement the required low level firmware commands
2) Implement the devlink knobs and callbacks for fw versions query.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Apply by filling the PTP shaper parameters array.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When getting port up down event (PUDE), change the PTP shaper
configuration based on hardware time stamping on/off and the port's
speed.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to get more accurate hardware time stamping, the driver needs to
enable PTP shaper on the port, for speeds lower than 40 Gbps.
Enable the PTP shaper on the port when the user turns on the hardware
time stamping, and disable it when the user turns off the hardware time
stamping.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New operation for getting the port's speed as part of port-type-speed
operations.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the PTP shaper parameters during the ptp_init(). For different
speeds, there are different parameters.
When the port's speed changes and PTP shaper is enabled, the firmware
changes the ETS shaper values according to the PTP shaper parameters for
this new speed.
The PTP shaper parameters array is left empty for now, will be filled in
a follow-up patch.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The QPSC allows advanced configuration of the PTP shapers.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a note about disabling the PTP shaper when calling
mlxsw_sp_port_ets_maxrate_set().
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PTP Shaper field is used for enabling and disabling a port-rate-based
shaper which is slightly lower than the port rate.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The callback is invoked using the 'devlink dev info <pci>' command and
returns the running and pending firmware versions of the HCA and the
name of the kernel driver.
If there is a pending firmware version (a new version is burned but the
HCA still runs with the previous) it is returned as the stored
firmware version. Otherwise, the running version is returned for this
field.
Output example:
$ devlink dev info pci/0000:00:06.0
pci/0000:00:06.0:
driver mlx5_core
versions:
fixed:
fw.psid MT_0000000009
running:
fw.version 16.26.0100
stored:
fw.version 16.26.0100
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Using the MCQI and MCQS registers, we query the running and pending
fw version of the HCA.
The MCQS is queried with sequentially increasing component index, until
a component of type BOOT_IMG is found. Querying this component's version
using the MCQI register yields the running and pending fw version of the
HCA.
Querying MCQI for the pending fw version should be done only after
validating that such a fw version exists. This is done by checking the
'component update state' field in the MCQS output.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
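A rough sketch of the query loop described above; query_mcqs() and
query_mcqi() are hypothetical helpers standing in for the real MCQS/MCQI
register accessors, and the constants follow the description rather than
the PRM field names.

static int read_fw_versions_sketch(struct mlx5_core_dev *dev,
				   u32 *running_ver, u32 *pending_ver,
				   bool *pending_valid)
{
	u16 comp_index = 0;
	u8 comp_type, update_state;
	bool last;
	int err;

	/* Walk component indexes until the boot image component is found. */
	do {
		err = query_mcqs(dev, comp_index, &comp_type, &update_state,
				 &last);
		if (err)
			return err;
		if (comp_type == COMP_TYPE_BOOT_IMG)
			break;
		if (last)
			return -ENOENT;
		comp_index++;
	} while (true);

	/* Only report a pending version if the update state says one is staged. */
	*pending_valid = (update_state == COMP_STATE_UPDATE_PENDING);
	return query_mcqi(dev, comp_index, running_ver,
			  *pending_valid ? pending_ver : NULL);
}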
Misc updates from mlx5-next branch:
1) Add the required HW definitions and structures for upcoming TLS
support.
2) Add support for MCQI and MCQS hardware registers for fw version query.
3) Added hardware bits and structures definitions for sub-functions
4) Small code cleanup and improvement for PF pci driver.
5) Bluefield (ECPF) updates and refactoring for better E-Switch
management on ECPF embedded CPU NIC:
5.1) Consolidate querying eswitch number of VFs
5.2) Register event handler at the correct E-Switch init stage
5.3) Setup the PF's inline mode and vlan pop when the ECPF is the
E-Switch manager (the host PF is basically a VF).
5.4) Handle Vport UC address changes in switchdev mode.
6) Cleanup the rep and netdev reference when unloading IB rep.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-07-03
The following pull-request contains BPF updates for your *net-next* tree.
There is a minor merge conflict in mlx5 due to 8960b38932 ("linux/dim:
Rename externally used net_dim members") which has been pulled into your
tree in the meantime, but resolution seems not that bad ... getting current
bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed
just so he's aware of the resolution below:
** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:
<<<<<<< HEAD
static int mlx5e_open_cq(struct mlx5e_channel *c,
struct dim_cq_moder moder,
struct mlx5e_cq_param *param,
struct mlx5e_cq *cq)
=======
int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder,
struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
>>>>>>> e5a3e259ef
Resolution is to take the second chunk and rename net_dim_cq_moder into
dim_cq_moder. Also the signature for mlx5e_open_cq() in ...
drivers/net/ethernet/mellanox/mlx5/core/en.h +977
... and in mlx5e_open_xsk() ...
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64
... needs the same rename from net_dim_cq_moder into dim_cq_moder.
** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:
<<<<<<< HEAD
int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
struct dim_cq_moder icocq_moder = {0, 0};
struct net_device *netdev = priv->netdev;
struct mlx5e_channel *c;
unsigned int irq;
=======
struct net_dim_cq_moder icocq_moder = {0, 0};
>>>>>>> e5a3e259ef
Take the second chunk and rename net_dim_cq_moder into dim_cq_moder
as well.
Let me know if you run into any issues. Anyway, the main changes are:
1) Long-awaited AF_XDP support for mlx5e driver, from Maxim.
2) Addition of two new per-cgroup BPF hooks for getsockopt and
setsockopt along with a new sockopt program type which allows more
fine-grained pass/reject settings for containers. Also add a sock_ops
callback that can be selectively enabled on a per-socket basis and is
executed for every RTT to help tracking TCP statistics, both features
from Stanislav.
3) Follow-up fix from loops in precision tracking which was not propagating
precision marks and as a result verifier assumed that some branches were
not taken and therefore wrongly removed as dead code, from Alexei.
4) Fix BPF cgroup release synchronization race which could lead to a
double-free if a leaf's cgroup_bpf object is released and a new BPF
program is attached to the one of ancestor cgroups in parallel, from Roman.
5) Support for bulking XDP_TX on veth devices which improves performance
in some cases by around 9%, from Toshiaki.
6) Allow for lookups into BPF devmap and improve feedback when calling into
bpf_redirect_map() as lookup is now performed right away in the helper
itself, from Toke.
7) Add support for fq's Earliest Departure Time to the Host Bandwidth
Manager (HBM) sample BPF program, from Lawrence.
8) Various cleanups and minor fixes all over the place from many others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions change event output data size changes when functions other
than VFs are enabled in HCA CAP.
With the current API, multiple callers need to align and calculate the
accurate size of the output data depending on the number of non-VF
functions enabled in the device.
Instead of duplicating such math in multiple places, refactor
mlx5_esw_query_functions() to return the raw output allocated by itself.
The caller must free the allocated memory using kvfree() as described in
the function comment section.
This hides the calculation within mlx5_esw_query_functions() and provides
a simpler API.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
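A sketch of the resulting API shape only (run_query_cmd() is a hypothetical
stand-in for the real QUERY_ESW_FUNCTIONS command, and the buffer size would
come from device capabilities): the query allocates its own output buffer
and the caller releases it with kvfree().

static void *esw_query_functions_sketch(struct mlx5_core_dev *dev,
					size_t out_sz)
{
	void *out = kvzalloc(out_sz, GFP_KERNEL);
	int err;

	if (!out)
		return ERR_PTR(-ENOMEM);

	err = run_query_cmd(dev, out, out_sz);	/* hypothetical helper */
	if (err) {
		kvfree(out);
		return ERR_PTR(err);
	}
	return out;	/* caller must kvfree() this */
}

static void caller_sketch(struct mlx5_core_dev *dev)
{
	void *out = esw_query_functions_sketch(dev, 256);

	if (IS_ERR(out))
		return;
	/* ... parse the output ... */
	kvfree(out);
}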
The eswitch function change handler will service multiple types of events
for VF and non-VF function updates.
Hence, introduce and use the helper function
esw_vfs_changed_event_handler() for handling changes in the number of VFs,
to improve code readability.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports().
mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF
vports as well.
Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to
more generic vport.h header file. Such exposure is not desired.
Hence a mlx5_eswitch_get_total_vports() is introduced.
Given that mlx5_eswitch_get_total_vports() API wants to work on const
mlx5_core_dev*, change its helper functions also to accept const *dev.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Report EQE data upon CQ completion to let upper layers use this data.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Report a CQ error event only when a handler was set.
This enables mlx5_ib to not set a handler upon CQ creation and use some
other mechanism to get this event as of other events by the
mlx5_eq_notifier_register API.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Expose the API to register for ANY event, mlx5_ib will be able to use
this functionality for its needs.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Use the reported device capabilities for the supported user events (i.e.
affiliated and un-affiliated) to set the EQ mask.
As the event mask can be up to 256 bits, defined by 4 u64 entries, change
the applicable code to work accordingly.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
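A stand-alone sketch of the representation (only the sizes are taken from
the text above): a 256-bit mask stored as four u64 words, with set/test
helpers indexing the right word and bit.

#include <stdio.h>
#include <stdint.h>

#define EVENT_TYPE_MAX 256

struct event_mask_sketch {
	uint64_t w[EVENT_TYPE_MAX / 64];	/* 4 x u64 = 256 bits */
};

static void mask_set(struct event_mask_sketch *m, unsigned int event)
{
	m->w[event / 64] |= 1ULL << (event % 64);
}

static int mask_test(const struct event_mask_sketch *m, unsigned int event)
{
	return (m->w[event / 64] >> (event % 64)) & 1;
}

int main(void)
{
	struct event_mask_sketch m = { { 0 } };

	mask_set(&m, 0x27);	/* lands in w[0] */
	mask_set(&m, 0x9d);	/* lands in w[2] */
	printf("0x27:%d 0x9d:%d 0x10:%d\n",
	       mask_test(&m, 0x27), mask_test(&m, 0x9d), mask_test(&m, 0x10));
	return 0;
}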
The firmware command to destroy a CQ might fail when the object is
referenced by another object and the ref count is managed by the firmware.
To enable a second successful destruction after the first failure,
mlx5_eq_del_cq() needs to be changed into a void function.
As an error in mlx5_eq_del_cq() is quite fatal with little chance to
recover, a debug message inside it should be good enough, and it was
changed to be void.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Before mlxsw_sp1_ptp_packet_finish() sends the packet back, it validates
whether the corresponding port is still valid. However the condition is
incorrect: when mlxsw_sp_port == NULL, the code dereferences the port to
compare it to skb->dev.
The condition needs to check whether the port is present and skb->dev still
refers to that port (or else is NULL). If that does not hold, bail out.
Add a pair of parentheses to fix the condition.
Fixes: d92e4e6e33 ("mlxsw: spectrum: PTP: Support timestamping on Spectrum-1")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
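A minimal sketch of the shape of the bug and of the fix (the types are
simplified stand-ins): without the extra parentheses the '!' negates only
the port pointer, so a NULL port makes the left operand true and the
right operand may then dereference it.

#include <stdbool.h>
#include <stddef.h>

struct net_device;
struct skb_sketch { struct net_device *dev; };
struct port_sketch { struct net_device *dev; };

/* Buggy: when port == NULL, '!port' is true and the right-hand side is
 * still evaluated, dereferencing the NULL port if skb->dev is set. */
static bool bail_out_buggy(struct port_sketch *port, struct skb_sketch *skb)
{
	return !port && (!skb->dev || skb->dev == port->dev);
}

/* Fixed: negate the whole "port is present and skb->dev still refers to it
 * (or is NULL)" expression. */
static bool bail_out_fixed(struct port_sketch *port, struct skb_sketch *skb)
{
	return !(port && (!skb->dev || skb->dev == port->dev));
}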
Similarly, other callers of idr_get_next_ul() suffer the same
overflow bug as they don't handle it properly either.
Introduce idr_for_each_entry_continue_ul() to help these callers
iterate from a given ID.
cls_flower needs more care here because it still has an overflow when
it does arg->cookie++; we have to fold its nested loops into one
and remove the arg->cookie++.
Fixes: 01683a1469 ("net: sched: refactor flower walk to iterate over idr")
Fixes: 12d6066c3b ("net/mlx5: Add flow counters idr")
Reported-by: Li Shuang <shuali@redhat.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Chris Mi <chrism@mellanox.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The get_ts_info callback is used for obtaining information about
timestamping capabilities of a network device. On Spectrum-1, implement
it to advertise the PHC and the capability to do HW timestamping, and
the supported RX and TX filters.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SIOCSHWTSTAMP ioctl configures HW timestamping on a given port.
Dispatch the ioctls to a per-chip handler (added to ptp_ops). Find out
which PTP messages need to be timestamped and configure MTPPPC
accordingly.
The SIOCGHWTSTAMP ioctl is getter for the current configuration.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Configure MTPTPT to set which message types should arrive under which
PTP trap, and MOGCR to clear the timestamp queue after its contents are
reported through PTP_ING_FIFO or PTP_EGR_FIFO.
With this configuration, PTP packets start arriving through the PTP
traps. However since timestamping is disabled by default and there is
currently no way to enable it, they will not be timestamped.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>