Commit Graph

37506 Commits

Author SHA1 Message Date
Claudiu Beznea a714e27ea8 net: macb: fix the restore of cmp registers
Commit a14d273ba1 ("net: macb: restore cmp registers on resume path")
introduces the restore of CMP registers on resume path. In case the IP
doesn't support type 2 screeners (zero on DCFG8 register) the
struct macb::rx_fs_list::list is not initialized and thus the
list_for_each_entry(item, &bp->rx_fs_list.list, list) loop introduced in
commit a14d273ba1 ("net: macb: restore cmp registers on resume path")
will access an uninitialized list leading to crash. Thus, initialize
the struct macb::rx_fs_list::list without taking into account if the
IP supports type 2 screeners or not.

Fixes: a14d273ba1 ("net: macb: restore cmp registers on resume path")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14 13:57:17 -07:00
Wan Jiabing ace8d281aa sfc: Remove duplicate argument
Fix the following coccicheck warning:

./drivers/net/ethernet/sfc/enum.h:80:7-28: duplicated argument to |

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14 13:55:07 -07:00
Lijun Pan 7c451f3ef6 ibmvnic: remove duplicate napi_schedule call in open function
Remove the unnecessary napi_schedule() call in __ibmvnic_open() since
interrupt_rx() calls napi_schedule_prep/__napi_schedule during every
receive interrupt.

Fixes: ed651a1087 ("ibmvnic: Updated reset handling")
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14 13:10:58 -07:00
Lijun Pan d3a6abccbd ibmvnic: remove duplicate napi_schedule call in do_reset function
During adapter reset, do_reset/do_hard_reset calls ibmvnic_open(),
which will calls napi_schedule if previous state is VNIC_CLOSED
(i.e, the reset case, and "ifconfig down" case). So there is no need
for do_reset to call napi_schedule again at the end of the function
though napi_schedule will neglect the request if napi is already
scheduled.

Fixes: ed651a1087 ("ibmvnic: Updated reset handling")
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14 13:10:58 -07:00
Lijun Pan 0775ebc4cf ibmvnic: avoid calling napi_disable() twice
__ibmvnic_open calls napi_disable without checking whether NAPI polling
has already been disabled or not. This could cause napi_disable
being called twice, which could generate deadlock. For example,
the first napi_disable will spin until NAPI_STATE_SCHED is cleared
by napi_complete_done, then set it again.
When napi_disable is called the second time, it will loop infinitely
because no dev->poll will be running to clear NAPI_STATE_SCHED.

To prevent above scenario from happening, call ibmvnic_napi_disable()
which checks if napi is disabled or not before calling napi_disable.

Fixes: bfc32f2973 ("ibmvnic: Move resource initialization to its own routine")
Suggested-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14 13:10:58 -07:00
Heiner Kallweit 453a77894e r8169: don't advertise pause in jumbo mode
It has been reported [0] that using pause frames in jumbo mode impacts
performance. There's no available chip documentation, but vendor
drivers r8168 and r8125 don't advertise pause in jumbo mode. So let's
do the same, according to Roman it fixes the issue.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=212617

Fixes: 9cf9b84cc7 ("r8169: make use of phy_set_asym_pause")
Reported-by: Roman Mamedov <rm+bko@romanrm.net>
Tested-by: Roman Mamedov <rm+bko@romanrm.net>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14 13:05:40 -07:00
Heiner Kallweit 216f78ea8c r8169: add support for pause ethtool ops
This adds support for the [g|s]et_pauseparam ethtool ops. It considers
that the chip doesn't support pause frame use in jumbo mode.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14 13:04:28 -07:00
David S. Miller 1141bfef9c Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
10GbE Intel Wired LAN Driver Updates 2021-04-13

This series contains updates to ixgbe and ixgbevf driver.

Jostar Yang adds support for BCM54616s PHY for ixgbe.

Chen Lin removes an unused function pointer for ixgbe and ixgbevf.

Bhaskar Chowdhury fixes a typo in ixgbe.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14 12:59:41 -07:00
Tan Tee Min f4da56529d net: stmmac: Add support for external trigger timestamping
The Synopsis MAC controller supports auxiliary snapshot feature that
allows user to store a snapshot of the system time based on an external
event.

This patch add supports to the above mentioned feature. Users will be
able to triggered capturing the time snapshot from user-space using
application such as testptp or any other applications that uses the
PTP_EXTTS_REQUEST ioctl request.

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Tan Tee Min <tee.min.tan@intel.com>
Co-developed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14 12:57:45 -07:00
Aya Levin 5b232ea94c net/mlx5e: Fix RQ creation flow for queues which doesn't support XDP
Allow to create an RQ which is not registered as an XDP RQ. For example:
the trap-RQ doesn't register as an XDP RQ.

Fixes: 869c5f9262 ("net/mlx5e: Generalize open RQ")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:03:10 -07:00
Wenpeng Liang 31450b435f net/mlx5: Replace spaces with tab at the start of a line
There should be no spaces at the start of the line.

Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:03:07 -07:00
Wenpeng Liang 9dee115bc1 net/mlx5: Remove return statement exist at the end of void function
void function return statements are not generally useful.

Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:03:04 -07:00
Wenpeng Liang 02f47c04c3 net/mlx5: Add a blank line after declarations
There should be a blank lines after declarations.

Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:03:01 -07:00
Colin Ian King 82c3ba31c3 net/mlx5: Fix bit-wise and with zero
The bit-wise and of the action field with MLX5_ACCEL_ESP_ACTION_DECRYPT
is incorrect as MLX5_ACCEL_ESP_ACTION_DECRYPT is zero and not intended
to be a bit-flag. Fix this by using the == operator as was originally
intended.

Addresses-Coverity: ("Logically dead code")
Fixes: 7dfee4b1d7 ("net/mlx5: IPsec, Refactor SA handle creation and destruction")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:58 -07:00
Roi Dayan b7f86258a2 net/mlx5: DR, Alloc cmd buffer with kvzalloc() instead of kzalloc()
The cmd size is 8K so use kvzalloc().

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:55 -07:00
Jianbo Liu 9dac2966c5 net/mlx5: DR, Use variably sized data structures for different actions
mlx5dr_action is a generally used data structure, and there is an
union for different types of actions in it. The size of mlx5dr_action
is about 72 bytes, but for those actions with fewer fields, most of
the allocated memory is wasted.
Remove this union, and mlx5dr_action becomes a generic action header.
Then actions are dynamically allocated with needed memory, the data
for each action is stored right after the header.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:52 -07:00
Parav Pandit a74ed24c43 net/mlx5: SF, Reuse stored hardware function id
SF's hardware function id is already stored in mlx5_sf. Reuse it,
instead of querying the hw table.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:49 -07:00
Parav Pandit 6e74e6ea1b net/mlx5: SF, Use device pointer directly
At many places in the code, device pointer is directly available. Make
use of it, instead of accessing it from the table.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:46 -07:00
Parav Pandit 57b92bdd9e net/mlx5: E-Switch, Initialize eswitch acls ns when eswitch is enabled
Currently eswitch flow steering (FS) namespace of vport's ingress and
egress ACL are enabled when FS layer is initialized. This is done even
when eswitch is diabled. This demands that total eswitch ports to be
known to FS layer without eswitch in use.

Given the FS core is not dependent on eswitch, make namespace init and
cleanup routines as helper routines to be invoked only when eswitch is
needed.

With this change, ingress and egress ACL namespaces are created only
when eswitch legacy/offloads mode is enabled.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:43 -07:00
Parav Pandit b55b35382e net/mlx5: E-Switch, Move legacy code to a individual file
Currently eswitch offers two modes. Legacy and offloads.
Offloads code is already in its own file eswitch_offloads.c

However eswitch.c contains the eswitch legacy code and common
infrastructure  code.

To enable future extensions and to better manage generic common eswitch
infrastructure code, move the legacy code to its own legacy.c file.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:40 -07:00
Parav Pandit b16f2bb6b6 net/mlx5: E-Switch, Convert a macro to a helper routine
Convert ESW_ALLOWED macro to a helper routine so that it can be used in
other eswitch files.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:38 -07:00
Parav Pandit 13795553a8 net/mlx5: E-Switch Make cleanup sequence mirror of init
Make cleanup sequence mirror of init sequence for cleaning up reps
and freeing vports.

Also when reps initialization fails, there is no need to perform reps
cleanup.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:35 -07:00
Parav Pandit 6308a5f06b net/mlx5: E-Switch, Make vport number u16
Vport number is 16-bit field in hardware. Make it u16.

Move location of vport in the structure so that it reduces a hole
in the structure.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:32 -07:00
Parav Pandit 7d5ae47891 net/mlx5: E-Switch, Skip querying SF enabled bits
With vhca events, SF state is queried through the VHCA events. Device no
longer expects SF bitmap in the query eswitch functions command.

Hence, remove it to simplify the code.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:29 -07:00
Parav Pandit 7bf481d7e7 net/mlx5: E-Switch, let user to enable disable metadata
Currently each packet inserted in eswitch is tagged with a internal
metadata to indicate source vport. Metadata tagging is not always
needed. Metadata insertion is needed for multi-port RoCE, failover
between representors and stacked devices. In many other cases,
metadata enablement is not needed.

Metadata insertion slows down the packet processing rate of the E-switch
when it is in switchdev mode.

Below table show performance gain with metadata disabled for VXLAN
offload rules in both SMFS and DMFS steering mode on ConnectX-5 device.

----------------------------------------------
| steering | metadata | pkt size | rx pps    |
| mode     |          |          | (million) |
----------------------------------------------
| smfs     | disabled | 128Bytes | 42        |
----------------------------------------------
| smfs     | enabled  | 128Bytes | 36        |
----------------------------------------------
| dmfs     | disabled | 128Bytes | 42        |
----------------------------------------------
| dmfs     | enabled  | 128Bytes | 36        |
----------------------------------------------

Hence, allow user to disable metadata using driver specific devlink
parameter. Metadata setting of the eswitch is applicable only for the
switchdev mode.

Example to show and disable metadata before changing eswitch mode:
$ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
pci/0000:06:00.0:
  name esw_port_metadata type driver-specific
    values:
      cmode runtime value true

$ devlink dev param set pci/0000:06:00.0 \
	  name esw_port_metadata value false cmode runtime

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
changelog:
v1->v2:
 - added performance numbers in commit log
 - updated commit log and documentation for switchdev mode
 - added explicit note on when user can disable metadata in
   documentation
2021-04-14 11:02:26 -07:00
Bhaskar Chowdhury ce2cb12dcc net: ethernet: intel: Fix a typo in the file ixgbe_dcb_nl.c
s/Reprogam/Reprogram/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-13 19:08:48 -07:00
Chen Lin 7eceea90c5 net: intel: Remove unused function pointer typedef ixgbe_mc_addr_itr
Remove the 'ixgbe_mc_addr_itr' typedef as it is not used.

Signed-off-by: Chen Lin <chen.lin5@zte.com.cn>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-13 19:08:48 -07:00
Jostar Yang 47222864c1 ixgbe: Support external GBE SerDes PHY BCM54616s
The Broadcom PHY is used in switches, so add the ID, and hook it up.

This upstreams the Linux kernel patch from the network operating system
SONiC from February 2020 [1].

[1]: https://github.com/Azure/sonic-linux-kernel/pull/122

Signed-off-by: Jostar Yang <jostar_yang@accton.com>
Signed-off-by: Guohan Lu <lguohan@gmail.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-13 19:08:48 -07:00
Ioana Ciornei 166179542e dpaa2-switch: reuse dpaa2_switch_acl_entry_add() for STP frames trap
Since we added the dpaa2_switch_acl_entry_add() function in the previous
patches to hide all the details of actually adding the ACL entry by
issuing a firmware command, let's use it also for adding a CPU trap for
the STP frames.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:12:18 -07:00
Ioana Ciornei 4ba28c1a1a dpaa2-switch: add tc matchall filter support
Add support TC_SETUP_CLSMATCHALL by using the same ACL table entries
framework as for tc flower. Adding a matchall rule is done by installing
an entry which has a mask of all zeroes, thus matching on any packet.

This can be used as a catch-all type of rule if used correctly, ie the
priority of the matchall filter should be kept as the lowest one in the
entire filter block.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:12:18 -07:00
Ioana Ciornei 1110318d83 dpaa2-switch: add tc flower hardware offload on ingress traffic
This patch adds support for tc flower hardware offload on the ingress
path. Shared filter blocks are supported by sharing a single ACL table
between multiple ports.

The following flow keys are supported:
 - Ethernet: dst_mac/src_mac
 - IPv4: dst_ip/src_ip/ip_proto/tos
 - VLAN: vlan_id/vlan_prio/vlan_tpid/vlan_dei
 - L4: dst_port/src_port

As per flow actions, the following are supported:
 - drop
 - mirred egress redirect
 - trap
Each ACL entry (filter) can be setup with only one of the listed
actions.

A sorted single linked list is used to keep the ACL entries by their
order of priority. When adding a new filter, this enables us to quickly
ascertain if the new entry has the highest priority of the entire block
or if we should make some space in the ACL table by increasing the
priority of the filters already in the table.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:12:18 -07:00
Ioana Ciornei 2bf90ba510 dpaa2-switch: install default STP trap rule with the highest priority
Change the default ACL trap rule for STP frames to have the highest
priority.

In the same ACL table will reside both default rules added by the driver
for its internal use as well as rules added with tc flower.  In this
case, the default rules such as the STP one that we already have should
have the highest priority.

Also, remove the check for a full ACL table since we already know that
it's sized so that we don't hit this case.  The last thing changes is
that default trap filters will not be counted in the acl_tbl's num_rules
variable since their number doesn't change.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:12:18 -07:00
Ioana Ciornei 1b0f14b6c2 dpaa2-switch: create a central dpaa2_switch_acl_tbl structure
Introduce a new structure - dpaa2_switch_acl_tbl - to hold all data
related to an ACL table: number of rules added, ACL table id, etc.
This will be used more in the next patches when adding support for
sharing an ACL table between ports.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:12:18 -07:00
Dan Carpenter 5871d0c6b8 ionic: return -EFAULT if copy_to_user() fails
The copy_to_user() function returns the number of bytes that it wasn't
able to copy.  We want to return -EFAULT to the user.

Fixes: fee6efce56 ("ionic: add hw timestamp support files")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:08:18 -07:00
Ong Boon Leong 132c32ee5b net: stmmac: Add TX via XDP zero-copy socket
We add the support of XDP ZC TX submission and cleaning into
stmmac_tx_clean(). The function is made to clean as many TX complete
frames as possible, i.e. limit by priv->dma_tx_size instead of NAPI
budget. For TX ring that is associated with XSK pool, the function
stmmac_xdp_xmit_zc() is introduced to TX frame buffers from XSK pool by
using xsk_tx_peek_desc(). To make stmmac_tx_clean() support the cleaning
of XSK TX frames, STMMAC_TXBUF_T_XSK_TX TX buffer type is introduced.

As stmmac_tx_clean() uses the return value to cue whether NAPI function
should continue to poll, we augment the caller of stmmac_tx_clean() to
pass NAPI budget instead of priv->dma_tx_size through 'budget' input and
made stmmac_tx_clean() to always clean up-to the TX ring size instead.
This allows us to use the return boolean status of stmmac_xdp_xmit_zc()
to decide if XSK TX work is done or not: If true, set 'xmits' to return
'budget - 1' so that NAPI poll may exit. Else, set 'xmits' to return
'budget' to make NAPI poll continue to poll since XSK TX work is not
done. Finally, at the end of stmmac_tx_clean(), the function now take
a maximum value between 'count' and 'xmits' so that status from both
TX cleaning and XSK TX (only for XDP ZC) is considered.

This patch adds a new NAPI poll called stmmac_napi_poll_rxtx() that is
meant to be enabled/disabled for RX and TX ring that are bound to XSK
pool. This NAPI poll function starts with cleaning TX ring, then submits
XSK TX frames to TX ring before proceed to perform RX operations, i.e.
, receiving RX frames and replenishing RX ring with RX free buffers
obtained from XSK pool. Therefore, during XSK RX and TX setup, the driver
enables stmmac_napi_poll_rxtx() for RX and TX operations, then during
XSK RX and TX pool tear-down, the driver reenables the exisiting
independent NAPI poll functions accordingly: stmmac_napi_poll_rx() and
stmmac_napi_poll_tx().

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:06:51 -07:00
Ong Boon Leong bba2556efa net: stmmac: Enable RX via AF_XDP zero-copy
This patch adds the support for receiving packet via AF_XDP zero-copy
mechanism.

XDP ZC uses 1:1 mapping of XDP buffer to receive packet, therefore the
use of split header is not used currently. The 'xdp_buff' is declared as
union together with a struct that contains 'page', 'addr' and
'page_offset' that are associated with primary buffer.

RX buffers are now allocated either via page_pool or xsk pool. For RX
buffers from xsk_pool they are allocated and deallocated using below
functions:

 * stmmac_alloc_rx_buffers_zc(struct stmmac_priv *priv, u32 queue)
 * dma_free_rx_xskbufs(struct stmmac_priv *priv, u32 queue)

With above functions now available, we then extend the following driver
functions to support XDP ZC:
 * stmmac_reinit_rx_buffers()
 * __init_dma_rx_desc_rings()
 * init_dma_rx_desc_rings()
 * __free_dma_rx_desc_resources()

Note: stmmac_alloc_rx_buffers_zc() may return -ENOMEM due to RX XDP
buffer pool is not allocated (e.g. samples/bpf/xdpsock TX-only). But,
it is still ok to let TX XDP ZC to continue, therefore, the -ENOMEM
is silently ignored to let the driver succcessfully transition to XDP
ZC mode for the said RX and TX queue.

As XDP ZC buffer size is different, the DMA buffer size is required
to be reprogrammed accordingly for RX DMA/Queue that is populated with
XDP buffer from XSK pool.

Next, to add or remove per-queue XSK pool, stmmac_xdp_setup_pool()
will call stmmac_xdp_enable_pool() or stmmac_xdp_disable_pool()
that in-turn coordinates the tearing down and setting up RX ring via
RX buffers and descriptors removal and reallocation through
stmmac_disable_rx_queue() and stmmac_enable_rx_queue(). In addition,
stmmac_xsk_wakeup() is added to initiate XDP RX buffer replenishing
by signalling user application to add available XDP frames back to
FILL queue.

For RX processing using XDP zero-copy buffer, stmmac_rx_zc() is
introduced which is implemented with the assumption that RX split
header is disabled. For XDP verdict is XDP_PASS, the XDP buffer is
copied into a sk_buff allocated through stmmac_construct_skb_zc()
and sent to Linux network GRO inside stmmac_dispatch_skb_zc(). Free RX
buffers are then replenished using stmmac_rx_refill_zc()

v2: introduce __stmmac_disable_all_queues() to contain the original code
    that does napi_disable() and then make stmmac_setup_tc_block_cb()
    to use it. Move synchronize_rcu() into stmmac_disable_all_queues()
    that eventually calls __stmmac_disable_all_queues(). Then,
    make both stmmac_release() and stmmac_suspend() to use
    stmmac_disable_all_queues(). Thanks David Miller for spotting the
    synchronize_rcu() issue in v1 patch.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:06:51 -07:00
Ong Boon Leong bba71cac68 net: stmmac: Refactor __stmmac_xdp_run_prog for XDP ZC
Prepare stmmac_xdp_run_prog() for AF_XDP zero-copy support which will be
added by upcoming patches by splitting out the XDP verdict processing
into __stmmac_xdp_run_prog() and it callable for XDP ZC path which does
not need to verify bpf_prog is not NULL.

The stmmac_xdp_run_prog() is used for regular XDP Rx path which requires
bpf_prog to be verified.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:06:50 -07:00
Ong Boon Leong de0b90e52a net: stmmac: rearrange RX and TX desc init into per-queue basis
Below functions are made to be per-queue in preparation of XDP ZC:

 __init_dma_rx_desc_rings(struct stmmac_priv *priv, u32 queue, gfp_t flags)
 __init_dma_tx_desc_rings(struct stmmac_priv *priv, u32 queue)

The original functions below are stay maintained for all queue usage:

 init_dma_rx_desc_rings(struct net_device *dev, gfp_t flags)
 init_dma_tx_desc_rings(struct net_device *dev)

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:06:50 -07:00
Ong Boon Leong da5ec7f22a net: stmmac: refactor stmmac_init_rx_buffers for stmmac_reinit_rx_buffers
The per-queue RX buffer allocation in stmmac_reinit_rx_buffers() can be
made to use stmmac_alloc_rx_buffers() by merging the page_pool alloc
checks for "buf->page" and "buf->sec_page" in stmmac_init_rx_buffers().

This is in preparation for XSK pool allocation later.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:06:50 -07:00
Ong Boon Leong 80f573c995 net: stmmac: introduce dma_recycle_rx_skbufs for stmmac_reinit_rx_buffers
Rearrange RX buffer page_pool recycling logics into dma_recycle_rx_skbufs,
so that we prepare stmmac_reinit_rx_buffers() for XSK pool expansion.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:06:50 -07:00
Ong Boon Leong 4298255f26 net: stmmac: rearrange RX buffer allocation and free functions
This patch restructures the per RX queue buffer allocation from page_pool
to stmmac_alloc_rx_buffers().

We also rearrange dma_free_rx_skbufs() so that it can be used in
init_dma_rx_desc_rings() during freeing of RX buffer in the event of
page_pool allocation failure to replace the more efficient method earlier.
The replacement is needed to make the RX buffer alloc and free method
scalable to XDP ZC xsk_pool alloc and free later.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:06:50 -07:00
Shannon Nelson 1da41aa110 ionic: git_ts_info bit shifters
All the uses of HWTSTAMP_FILTER_* values need to be
bit shifters, not straight values.

v2: fixed subject and added Cc Dan and SoB Allen

Fixes: f8ba81da73 ("ionic: add ethtool support for PTP")
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 15:00:33 -07:00
Lijun Pan 870e04ae45 ibmvnic: queue reset work in system_long_wq
The reset process for ibmvnic commonly takes multiple seconds, clearly
making it inappropriate for schedule_work/system_wq. The reason to make
this change is that ibmvnic's use of the default system-wide workqueue
for a relatively long-running work item can negatively affect other
workqueue users. So, queue the relatively slow reset job to the
system_long_wq.

Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 14:56:23 -07:00
Lijun Pan ca09bf7bb1 ibmvnic: correctly use dev_consume/free_skb_irq
It is more correct to use dev_kfree_skb_irq when packets are dropped,
and to use dev_consume_skb_irq when packets are consumed.

Fixes: 0d97338818 ("ibmvnic: Introduce xmit_more support using batched subCRQ hcalls")
Suggested-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 14:49:30 -07:00
Lijun Pan 334c424147 ibmvnic: improve failover sysfs entry
The current implementation relies on H_IOCTL call to issue a
H_SESSION_ERR_DETECTED command to let the hypervisor to send a failover
signal. However, it may not work if there is no backup device or if
the vnic is already in error state,
e.g., "ibmvnic 30000003 env3: rx buffer returned with rc 6".
Add a last resort, that is to schedule a failover reset via CRQ command.

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 14:48:44 -07:00
Michael Walle 83216e3988 of: net: pass the dst buffer to of_get_mac_address()
of_get_mac_address() returns a "const void*" pointer to a MAC address.
Lately, support to fetch the MAC address by an NVMEM provider was added.
But this will only work with platform devices. It will not work with
PCI devices (e.g. of an integrated root complex) and esp. not with DSA
ports.

There is an of_* variant of the nvmem binding which works without
devices. The returned data of a nvmem_cell_read() has to be freed after
use. On the other hand the return of_get_mac_address() points to some
static data without a lifetime. The trick for now, was to allocate a
device resource managed buffer which is then returned. This will only
work if we have an actual device.

Change it, so that the caller of of_get_mac_address() has to supply a
buffer where the MAC address is written to. Unfortunately, this will
touch all drivers which use the of_get_mac_address().

Usually the code looks like:

  const char *addr;
  addr = of_get_mac_address(np);
  if (!IS_ERR(addr))
    ether_addr_copy(ndev->dev_addr, addr);

This can then be simply rewritten as:

  of_get_mac_address(np, ndev->dev_addr);

Sometimes is_valid_ether_addr() is used to test the MAC address.
of_get_mac_address() already makes sure, it just returns a valid MAC
address. Thus we can just test its return code. But we have to be
careful if there are still other sources for the MAC address before the
of_get_mac_address(). In this case we have to keep the
is_valid_ether_addr() call.

The following coccinelle patch was used to convert common cases to the
new style. Afterwards, I've manually gone over the drivers and fixed the
return code variable: either used a new one or if one was already
available use that. Mansour Moufid, thanks for that coccinelle patch!

<spml>
@a@
identifier x;
expression y, z;
@@
- x = of_get_mac_address(y);
+ x = of_get_mac_address(y, z);
  <...
- ether_addr_copy(z, x);
  ...>

@@
identifier a.x;
@@
- if (<+... x ...+>) {}

@@
identifier a.x;
@@
  if (<+... x ...+>) {
      ...
  }
- else {}

@@
identifier a.x;
expression e;
@@
- if (<+... x ...+>@e)
-     {}
- else
+ if (!(e))
      {...}

@@
expression x, y, z;
@@
- x = of_get_mac_address(y, z);
+ of_get_mac_address(y, z);
  ... when != x
</spml>

All drivers, except drivers/net/ethernet/aeroflex/greth.c, were
compile-time tested.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-13 14:35:02 -07:00
Colin Ian King ef963ae427 ice: Fix potential infinite loop when using u8 loop counter
A for-loop is using a u8 loop counter that is being compared to
a u32 cmp_dcbcfg->numapp to check for the end of the loop. If
cmp_dcbcfg->numapp is larger than 255 then the counter j will wrap
around to zero and hence an infinite loop occurs. Fix this by making
counter j the same type as cmp_dcbcfg->numapp.

Addresses-Coverity: ("Infinite loop")
Fixes: aeac8ce864 ("ice: Recognize 860 as iSCSI port in CEE mode")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-13 11:50:38 -07:00
Yongxin Liu debb9df311 ixgbe: fix unbalanced device enable/disable in suspend/resume
pci_disable_device() called in __ixgbe_shutdown() decreases
dev->enable_cnt by 1. pci_enable_device_mem() which increases
dev->enable_cnt by 1, was removed from ixgbe_resume() in commit
6f82b25587 ("ixgbe: use generic power management"). This caused
unbalanced increase/decrease. So add pci_enable_device_mem() back.

Fix the following call trace.

  ixgbe 0000:17:00.1: disabling already-disabled device
  Call Trace:
   __ixgbe_shutdown+0x10a/0x1e0 [ixgbe]
   ixgbe_suspend+0x32/0x70 [ixgbe]
   pci_pm_suspend+0x87/0x160
   ? pci_pm_freeze+0xd0/0xd0
   dpm_run_callback+0x42/0x170
   __device_suspend+0x114/0x460
   async_suspend+0x1f/0xa0
   async_run_entry_fn+0x3c/0xf0
   process_one_work+0x1dd/0x410
   worker_thread+0x34/0x3f0
   ? cancel_delayed_work+0x90/0x90
   kthread+0x14c/0x170
   ? kthread_park+0x90/0x90
   ret_from_fork+0x1f/0x30

Fixes: 6f82b25587 ("ixgbe: use generic power management")
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-13 11:49:11 -07:00
Alexander Duyck 31166efb1c ixgbe: Fix NULL pointer dereference in ethtool loopback test
The ixgbe driver currently generates a NULL pointer dereference when
performing the ethtool loopback test. This is due to the fact that there
isn't a q_vector associated with the test ring when it is setup as
interrupts are not normally added to the test rings.

To address this I have added code that will check for a q_vector before
returning a napi_id value. If a q_vector is not present it will return a
value of 0.

Fixes: b02e5a0ebb ("xsk: Propagate napi_id to XDP socket Rx path")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-13 11:03:36 -07:00
Adam Ford 8ef7adc6be net: ethernet: ravb: Enable optional refclk
For devices that use a programmable clock for the AVB reference clock,
the driver may need to enable them.  Add code to find the optional clock
and enable it when available.

Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 14:09:59 -07:00
Yangbo Lu 7294380c52 enetc: support PTP Sync packet one-step timestamping
This patch is to add support for PTP Sync packet one-step timestamping.
Since ENETC single-step register has to be configured dynamically per
packet for correctionField offeset and UDP checksum update, current
one-step timestamping packet has to be sent only when the last one
completes transmitting on hardware. So, on the TX, this patch handles
one-step timestamping packet as below:

- Trasmit packet immediately if no other one in transfer, or queue to
  skb queue if there is already one in transfer.
  The test_and_set_bit_lock() is used here to lock and check state.
- Start a work when complete transfer on hardware, to release the bit
  lock and to send one skb in skb queue if has.

And the configuration for one-step timestamping on ENETC before
transmitting is,

- Set one-step timestamping flag in extension BD.
- Write 30 bits current timestamp in tstamp field of extension BD.
- Update PTP Sync packet originTimestamp field with current timestamp.
- Configure single-step register for correctionField offeset and UDP
  checksum update.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:34:21 -07:00
Yangbo Lu f768e75130 enetc: mark TX timestamp type per skb
Mark TX timestamp type per skb on skb->cb[0], instead of
global variable for all skbs. This is a preparation for
one step timestamp support.

For one-step timestamping enablement, there will be both
one-step and two-step PTP messages to transfer. And a skb
queue is needed for one-step PTP messages making sure
start to send current message only after the last one
completed on hardware. (ENETC single-step register has to
be dynamically configured per message.) So, marking TX
timestamp type per skb is required.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:34:21 -07:00
Lijun Pan 0666ef7f61 ibmvnic: print adapter state as a string
The adapter state can be added or deleted over different versions
of the source code. Print a string instead of a number.

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:31:26 -07:00
Lijun Pan caee7bf5b0 ibmvnic: print reset reason as a string
The reset reason can be added or deleted over different versions
of the source code. Print a string instead of a number.

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:31:26 -07:00
Lijun Pan c82eaa4064 ibmvnic: clean up the remaining debugfs data structures
Commit e704f0434e ("ibmvnic: Remove debugfs support") did not
clean up everything. Remove the remaining code.

Signed-off-by: Lijun Pan <lijunp213@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:29:10 -07:00
Sriharsha Basavapatna ac797ced1f bnxt_en: Free and allocate VF-Reps during error recovery.
During firmware recovery, VF-Rep configuration in the firmware is lost.
Fix it by freeing and (re)allocating VF-Reps in FW at relevant points
during the error recovery process.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:20:38 -07:00
Michael Chan 90f4fd0296 bnxt_en: Refactor __bnxt_vf_reps_destroy().
Add a new helper function __bnxt_free_one_vf_rep() to free one VF rep.
We also reintialize the VF rep fields to proper initial values so that
the function can be used without freeing the VF rep data structure.  This
will be used in subsequent patches to free and recreate VF reps after
error recovery.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:20:38 -07:00
Sriharsha Basavapatna ea2d37b2b3 bnxt_en: Refactor bnxt_vf_reps_create().
Add a new function bnxt_alloc_vf_rep() to allocate a VF representor.
This function will be needed in subsequent patches to recreate the
VF reps after error recovery.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:20:38 -07:00
Vasundhara Volam 190eda1a9d bnxt_en: Invalidate health register mapping at the end of probe.
After probe is successful, interface may not be bought up in all
the cases and health register mapping could be invalid if firmware
undergoes reset. Fix it by invalidating the health register at the
end of probe. It will be remapped during ifup.

Fixes: 43a440c400 ("bnxt_en: Improve the status_reliable flag in bp->fw_health.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:20:38 -07:00
Michael Chan 17e1be342d bnxt_en: Treat health register value 0 as valid in bnxt_try_reover_fw().
The retry loop in bnxt_try_recover_fw() should not abort when the
health register value is 0.  It is a valid value that indicates the
firmware is booting up.

Fixes: 861aae786f ("bnxt_en: Enhance retry of the first message to the firmware.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:20:38 -07:00
Colin Ian King d0494135f9 net: hns3: Fix potential null pointer defererence of null ae_dev
The reset_prepare and reset_done calls have a null pointer check
on ae_dev however ae_dev is being dereferenced via the call to
ns3_is_phys_func with the ae->pdev argument. Fix this by performing
a null pointer check on ae_dev and hence short-circuiting the
dereference to ae_dev on the call to ns3_is_phys_func.

Addresses-Coverity: ("Dereference before null check")
Fixes: 715c58e94f ("net: hns3: add suspend and resume pm_ops")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:15:21 -07:00
Colin Ian King e701a25840 net: thunderx: Fix unintentional sign extension issue
The shifting of the u8 integers rq->caching by 26 bits to
the left will be promoted to a 32 bit signed int and then
sign-extended to a u64. In the event that rq->caching is
greater than 0x1f then all then all the upper 32 bits of
the u64 end up as also being set because of the int
sign-extension. Fix this by casting the u8 values to a
u64 before the 26 bit left shift.

Addresses-Coverity: ("Unintended sign extension")
Fixes: 4863dea3fa ("net: Adding support for Cavium ThunderX network controller")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:13:57 -07:00
Colin Ian King dd2c796773 cxgb4: Fix unintentional sign extension issues
The shifting of the u8 integers f->fs.nat_lip[] by 24 bits to
the left will be promoted to a 32 bit signed int and then
sign-extended to a u64. In the event that the top bit of the u8
is set then all then all the upper 32 bits of the u64 end up as
also being set because of the sign-extension. Fix this by
casting the u8 values to a u64 before the 24 bit left shift.

Addresses-Coverity: ("Unintended sign extension")
Fixes: 12b276fbf6 ("cxgb4: add support to create hash filters")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12 13:13:17 -07:00
Christophe JAILLET 31457db375 net: davicom: Fix regulator not turned off on failed probe
When the probe fails, we must disable the regulator that was previously
enabled.

This patch is a follow-up to commit ac88c531a5
("net: davicom: Fix regulator not turned off on failed probe") which missed
one case.

Fixes: 7994fe55a4 ("dm9000: Add regulator and reset support to dm9000")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:53:58 -07:00
Qiheng Lin 95291ced81 ehea: add missing MODULE_DEVICE_TABLE
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this driver when it is built
as an external module.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:42:38 -07:00
Vladyslav Tarasiuk 4c88fa412a net/mlx5: Add support for DSFP module EEPROM dumps
Allow the driver to recognise DSFP transceiver module ID and therefore
allow its EEPROM dumps using ethtool.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:34:56 -07:00
Vladyslav Tarasiuk e109d2b204 net/mlx5: Implement get_module_eeprom_by_page()
Implement ethtool_ops::get_module_eeprom_by_page() to enable
support of new SFP standards.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:34:56 -07:00
Vladyslav Tarasiuk e19b0a3474 net/mlx5: Refactor module EEPROM query
Prepare for ethtool_ops::get_module_eeprom_data() implementation by
extracting common part of mlx5_query_module_eeprom() into a separate
function.

Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11 16:34:56 -07:00
Jakub Kicinski 8859a44ea0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

MAINTAINERS
 - keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
 - simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
 - trivial
include/linux/ethtool.h
 - trivial, fix kdoc while at it
include/linux/skmsg.h
 - move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
 - add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
 - trivial

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 20:48:35 -07:00
Claudiu Manoil 6c5e6b4ccc enetc: Use generic rule to map Tx rings to interrupt vectors
Even if the current mapping is correct for the 1 CPU and 2 CPU cases
(currently enetc is included in SoCs with up to 2 CPUs only), better
use a generic rule for the mapping to cover all possible cases.
The number of CPUs is the same as the number of interrupt vectors:

Per device Tx rings -
device_tx_ring[idx], where idx = 0..n_rings_total-1

Per interrupt vector Tx rings -
int_vector[i].ring[j], where i = 0..n_int_vects-1
			     j = 0..n_rings_per_v-1

Mapping rule -
n_rings_per_v = n_rings_total / n_int_vects
for i = 0..n_int_vects - 1:
	for j = 0..n_rings_per_v - 1:
		idx = n_int_vects * j + i
		int_vector[i].ring[j] <- device_tx_ring[idx]

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210409071613.28912-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 18:22:09 -07:00
Vladimir Oltean a93580a02d net: enetc: fix TX ring interrupt storm
The blamed commit introduced a bit in the TX software buffer descriptor
structure for determining whether a BD is final or not; we rearm the TX
interrupt vector for every frame (hence final BD) transmitted.

But there is a problem with the patch: it replaced a condition whose
expression is a bool which was evaluated at the beginning of the "while"
loop with a bool expression that is evaluated on the spot: tx_swbd->is_eof.

The problem with the latter expression is that the tx_swbd has already
been incremented at that stage, so the tx_swbd->is_eof check is in fact
with the _next_ software BD. Which is _not_ final.

The effect is that the CPU is in 100% load with ksoftirqd because it
does not acknowledge the TX interrupt, so the handler keeps getting
called again and again.

The fix is to restore the code structure, and keep the local bool is_eof
variable, just to assign it the tx_swbd->is_eof value instead of
!!tx_swbd->skb.

Fixes: d504498d2e ("net: enetc: add a dedicated is_eof bit in the TX software BD")
Reported-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20210409192759.3895104-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 18:17:12 -07:00
Jakub Kicinski 95b5c29132 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
mlx5-next 2021-04-09

This pr contains changes from  mlx5-next branch,
already reviewed on netdev and rdma mailing lists, links below.

1) From Leon, Dynamically assign MSI-X vectors count
Already Acked by Bjorn Helgaas.
https://patchwork.kernel.org/project/netdevbpf/cover/20210314124256.70253-1-leon@kernel.org/

2) Cleanup series:
https://patchwork.kernel.org/project/netdevbpf/cover/20210311070915.321814-1-saeed@kernel.org/

From Mark, E-Switch cleanups and refactoring, and the addition
of single FDB mode needed HW bits.

From Mikhael, Remove unused struct field

From Saeed, Cleanup W=1 prototype warning

From Zheng, Esw related cleanup

From Tariq, User order-0 page allocation for EQs

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Implement sriov_get_vf_total_msix/count() callbacks
  net/mlx5: Dynamically assign MSI-X vectors count
  net/mlx5: Add dynamic MSI-X capabilities bits
  PCI/IOV: Add sysfs MSI-X vector assignment interface
  net/mlx5: Use order-0 allocations for EQs
  net/mlx5: Add IFC bits needed for single FDB mode
  net/mlx5: E-Switch, Refactor send to vport to be more generic
  RDMA/mlx5: Use representor E-Switch when getting netdev and metadata
  net/mlx5: E-Switch, Add eswitch pointer to each representor
  net/mlx5: E-Switch, Add match on vhca id to default send rules
  net/mlx5: Remove unused mlx5_core_health member recover_work
  net/mlx5: simplify the return expression of mlx5_esw_offloads_pair()
  net/mlx5: Cleanup prototype warning
====================

Link: https://lore.kernel.org/r/20210409200704.10886-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 18:07:21 -07:00
Dan Carpenter 626b598aa8 net: enetc: fix array underflow in error handling code
This loop will try to unmap enetc_unmap_tx_buff[-1] and crash.

Fixes: 9d2b68cc10 ("net: enetc: add support for XDP_REDIRECT")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/YHBHfCY/yv3EnM9z@mwanda
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 16:48:29 -07:00
Qiheng Lin 524e001b7d cxgb4: remove unneeded if-null-free check
Eliminate the following coccicheck warning:

drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c:529:3-9: WARNING:
 NULL check before some freeing functions is not needed.
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c:533:2-8: WARNING:
 NULL check before some freeing functions is not needed.
drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c:161:2-7: WARNING:
 NULL check before some freeing functions is not needed.
drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c:327:3-9: WARNING:
 NULL check before some freeing functions is not needed.

Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Link: https://lore.kernel.org/r/20210409115339.4598-1-linqiheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 16:47:41 -07:00
Heiner Kallweit 5c2280fc2e r8169: use mac-managed PHY PM
Use the new mac_managed_pm flag to indicate that the driver takes care
of PHY power management.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 16:37:05 -07:00
Heiner Kallweit 557d5dc83f net: fec: use mac-managed PHY PM
Use the new mac_managed_pm flag to work around an issue with KSZ8081 PHY
that becomes unstable when a soft reset is triggered during aneg.

Reported-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Tested-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 16:37:04 -07:00
Salil Mehta cd7e963d2f net: hns3: Trivial spell fix in hns3 driver
Some trivial spelling mistakes which caught my eye during the
review of the code.

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Link: https://lore.kernel.org/r/20210409074223.32480-1-salil.mehta@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 12:50:31 -07:00
Sven Van Asbroeck 3bc41d6d27 lan743x: fix ethernet frame cutoff issue
The ethernet frame length is calculated incorrectly. Depending on
the value of RX_HEAD_PADDING, this may result in ethernet frames
that are too short (cut off at the end), or too long (garbage added
to the end).

Fix by calculating the ethernet frame length correctly. For added
clarity, use the ETH_FCS_LEN constant in the calculation.

Many thanks to Heiner Kallweit for suggesting this solution.

Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Fixes: 3e21a10fde ("lan743x: trim all 4 bytes of the FCS; not just 2")
Link: https://lore.kernel.org/lkml/20210408172353.21143-1-TheSven73@gmail.com/
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Reviewed-by: George McCollister <george.mccollister@gmail.com>
Tested-by: George McCollister <george.mccollister@gmail.com>
Link: https://lore.kernel.org/r/20210409003904.8957-1-TheSven73@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-09 12:49:38 -07:00
David S. Miller 4914a4f6a7 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-04-08

This series contains updates to ice driver only.

Chinh adds retrying of sending some AQ commands when receiving EBUSY
error.

Victor modifies how nodes are added to reduce stack usage.

Ani renames some variables to either follow spec naming or to be inline
with naming in the rest of the driver. Ignores EMODE error as there are
cases where this error is expected. Performs some cleanup such as
removing unnecessary checks, doing variable assignments over copies, and
removing unneeded variables. Revises some error codes returned in link
settings to be more appropriate. He also implements support for new
firmware option to get default link configuration which accounts for
any needed NVM based overrides for PHY configuration. He also removes
the rx_gro_dropped stat as the value no longer changes.

Jeb removes setting specific link modes on firmwares that no longer
require it.

Brett removes unnecessary checks when adding and removing VLANs.

Tony fixes a checkpatch warning for unnecessary blank line.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 14:18:25 -07:00
Jiaran Zhang 715c58e94f net: hns3: add suspend and resume pm_ops
To implement the system suspend/resume functions, the NIC driver needs
to support:
1. When the system enters the suspend mode, the driver needs to
implement the suspend callback function of the NIC device. The driver
needs to mute the device, stop all RX/TX activities of the device, and
unmap the interrupt.
2. When the system enters the resume mode, the driver needs to
implement the resume callback function of the NIC device and restore
the device to the state before suspension.

When the system enters the suspend and resume mode, the NIC driver
actually executes the PF function reset process.

When the PFs are suspending/resuming, VFs also enter the suspend/resume
state because the PFs trigger the VFs to reset, therefore no operation
is required when the VF pci_driver is suspending or resuming.

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:23:01 -07:00
Jiaran Zhang bb1890d5f9 net: hns3: change flr_prepare/flr_done function names
The flr_prepare/flr_done functions are not only used in the FLR scenario,
but also used in the suspend/resume.

Change the function names to prepare_for_reset/rebuild_for_reset, change
the flr_prepare/flr_done to reset_prepare/reset_done in hnae3_ae_ops.

Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:23:01 -07:00
Shannon Nelson f331809965 ionic: extend ts_config set locking
Make sure the configuration is locked before
operating on it for the replay.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:18:49 -07:00
Shannon Nelson 829600ce5e ionic: add ts_config replay
Split the call into ionic_lif_hwstamp_set() to have two
separate interfaces, one from the ioctl() for changing the
configuration and one for replaying the current configuration
after a FW RESET.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:18:49 -07:00
Shannon Nelson 99b5bea04f ionic: ignore EBUSY on queue start
When starting the queues in the link-check, don't go into
the BROKEN state if the return was EBUSY.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:18:49 -07:00
Shannon Nelson 5111787455 ionic: re-start ptp after queues up
When returning after a firmware reset, re-start the
PTP after we've restarted the general queues.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:18:49 -07:00
Shannon Nelson bd7856bcd4 ionic: add SKBTX_IN_PROGRESS
Set the SKBTX_IN_PROGRESS when offloading the Tx timestamp.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:18:49 -07:00
Shannon Nelson e2ce148e94 ionic: check for valid tx_mode on SKBTX_HW_TSTAMP xmit
Make sure the device is in a Tx offload mode before calling the
hwstamp offload xmit.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:18:49 -07:00
Shannon Nelson e1edcc966a ionic: remove unnecessary compat ifdef
We don't need to look for HAVE_HWSTAMP_TX_ONESTEP_P2P in the
upstream kernel.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:18:49 -07:00
Shannon Nelson 33c252e1ba ionic: fix up a couple of code style nits
Clean up variable declarations.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-08 13:18:49 -07:00
Yongxin Liu 1831da7ea5 ice: fix memory leak of aRFS after resuming from suspend
In ice_suspend(), ice_clear_interrupt_scheme() is called, and then
irq_free_descs() will be eventually called to free irq and its descriptor.

In ice_resume(), ice_init_interrupt_scheme() is called to allocate new
irqs. However, in ice_rebuild_arfs(), struct irq_glue and struct cpu_rmap
maybe cannot be freed, if the irqs that released in ice_suspend() were
reassigned to other devices, which makes irq descriptor's affinity_notify
lost.

So call ice_free_cpu_rx_rmap() before ice_clear_interrupt_scheme(), which
can make sure all irq_glue and cpu_rmap can be correctly released before
corresponding irq and descriptor are released.

Fix the following memory leak.

unreferenced object 0xffff95bd951afc00 (size 512):
  comm "kworker/0:1", pid 134, jiffies 4294684283 (age 13051.958s)
  hex dump (first 32 bytes):
    18 00 00 00 18 00 18 00 70 fc 1a 95 bd 95 ff ff  ........p.......
    00 00 ff ff 01 00 ff ff 02 00 ff ff 03 00 ff ff  ................
  backtrace:
    [<0000000072e4b914>] __kmalloc+0x336/0x540
    [<0000000054642a87>] alloc_cpu_rmap+0x3b/0xb0
    [<00000000f220deec>] ice_set_cpu_rx_rmap+0x6a/0x110 [ice]
    [<000000002370a632>] ice_probe+0x941/0x1180 [ice]
    [<00000000d692edba>] local_pci_probe+0x47/0xa0
    [<00000000503934f0>] work_for_cpu_fn+0x1a/0x30
    [<00000000555a9e4a>] process_one_work+0x1dd/0x410
    [<000000002c4b414a>] worker_thread+0x221/0x3f0
    [<00000000bb2b556b>] kthread+0x14c/0x170
    [<00000000ad2cf1cd>] ret_from_fork+0x1f/0x30
unreferenced object 0xffff95bd81b0a2a0 (size 96):
  comm "kworker/0:1", pid 134, jiffies 4294684283 (age 13051.958s)
  hex dump (first 32 bytes):
    38 00 00 00 01 00 00 00 e0 ff ff ff 0f 00 00 00  8...............
    b0 a2 b0 81 bd 95 ff ff b0 a2 b0 81 bd 95 ff ff  ................
  backtrace:
    [<00000000582dd5c5>] kmem_cache_alloc_trace+0x31f/0x4c0
    [<000000002659850d>] irq_cpu_rmap_add+0x25/0xe0
    [<00000000495a3055>] ice_set_cpu_rx_rmap+0xb4/0x110 [ice]
    [<000000002370a632>] ice_probe+0x941/0x1180 [ice]
    [<00000000d692edba>] local_pci_probe+0x47/0xa0
    [<00000000503934f0>] work_for_cpu_fn+0x1a/0x30
    [<00000000555a9e4a>] process_one_work+0x1dd/0x410
    [<000000002c4b414a>] worker_thread+0x221/0x3f0
    [<00000000bb2b556b>] kthread+0x14c/0x170
    [<00000000ad2cf1cd>] ret_from_fork+0x1f/0x30

Fixes: 769c500dcc ("ice: Add advanced power mgmt for WoL")
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-08 10:21:37 -07:00
Arkadiusz Kubalewski 8a1e918d83 i40e: Fix sparse warning: missing error code 'err'
Set proper return values inside error checking if-statements.

Previously following warning was produced when compiling against sparse.
i40e_main.c:15162 i40e_init_recovery_mode() warn: missing error code 'err'

Fixes: 4ff0ee1af0 ("i40e: Introduce recovery mode support")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-08 10:21:37 -07:00
Arkadiusz Kubalewski 6b5674fe6b i40e: Fix sparse error: 'vsi->netdev' could be null
Remove vsi->netdev->name from the trace.
This is redundant information. With the devinfo trace, the adapter
is already identifiable.

Previously following error was produced when compiling against sparse.
i40e_main.c:2571 i40e_sync_vsi_filters() error:
	we previously assumed 'vsi->netdev' could be null (see line 2323)

Fixes: b603f9dc20 ("i40e: Log info when PF is entering and leaving Allmulti mode.")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-08 10:21:37 -07:00
Arkadiusz Kubalewski d6d04ee6d2 i40e: Fix sparse error: uninitialized symbol 'ring'
Init pointer with NULL in default switch case statement.

Previously the error was produced when compiling against sparse.
i40e_debugfs.c:582 i40e_dbg_dump_desc() error: uninitialized symbol 'ring'.

Fixes: 44ea803e2f ("i40e: introduce new dump desc XDP command")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-08 10:21:37 -07:00
Arkadiusz Kubalewski 12738ac475 i40e: Fix sparse errors in i40e_txrx.c
Remove error handling through pointers. Instead use plain int
to return value from i40e_run_xdp(...).

Previously:
- sparse errors were produced during compilation:
i40e_txrx.c:2338 i40e_run_xdp() error: (-2147483647) too low for ERR_PTR
i40e_txrx.c:2558 i40e_clean_rx_irq() error: 'skb' dereferencing possible ERR_PTR()

- sk_buff* was used to return value, but it has never had valid
pointer to sk_buff. Returned value was always int handled as
a pointer.

Fixes: 0c8493d90b ("i40e: add XDP support for pass and drop actions")
Fixes: 2e68931238 ("i40e: split XDP_TX tail and XDP_REDIRECT map flushing")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-08 10:21:37 -07:00
Grzegorz Siwik b2d0efc4be i40e: Fix parameters in aq_get_phy_register()
Change parameters order in aq_get_phy_register() due to wrong
statistics in PHY reported by ethtool. Previously all PHY statistics were
exactly the same for all interfaces
Now statistics are reported correctly - different for different interfaces

Fixes: 0514db37dd ("i40e: Extend PHY access with page change flag")
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-08 10:21:37 -07:00
Tony Nguyen 2e20521b80 ice: Remove unnecessary blank line
Checkpatch reports the following, fix it.

-----------------------------------------
drivers/net/ethernet/intel/ice/ice_main.c
-----------------------------------------
CHECK:BRACES: Blank lines aren't necessary before a close brace '}'
FILE: drivers/net/ethernet/intel/ice/ice_main.c:455:
+
+}

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
2021-04-07 17:09:16 -07:00
Brett Creeley 771015b90b ice: Remove unnecessary checks in add/kill_vid ndo ops
Currently the driver is doing two unnecessary checks. First both ops are
checking if the VLAN ID passed in is less than VLAN_N_VID and second
both ops are checking to see if a port VLAN is configured on the VSI.

The first check is already handled by the 8021q driver so this is an
unnecessary check. The second check is unnecessary because the PF VSI is
never put into a port VLAN.

Remove these checks.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:16 -07:00
Anirudh Venkataramanan 51fe27e179 ice: Remove rx_gro_dropped stat
Tracking of the rx_gro_dropped statistic was removed in
commit f73fc40327 ("ice: drop dead code in ice_receive_skb()").
Remove the associated variables and its reporting to ethtool stats.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:16 -07:00
Anirudh Venkataramanan efc1eddb28 ice: Use local variable instead of pointer derefs
Replace multiple instances of vsi->back and pi->phy with equivalent
local variables

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:16 -07:00
Anirudh Venkataramanan dc6aaa139f ice: Remove unnecessary variable
In ice_init_phy_user_cfg, vsi is used only to get to hw. Remove this
and just use pi->hw

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:16 -07:00
Jeb Cramer 75751c80d6 ice: Limit forced overrides based on FW version
Beyond a specific version of firmware, there is no need to provide
override values to the firmware when setting PHY capabilities.  In this
case, we do not need to indicate whether we're in Strict or Lenient Link
Mode.

In the case of translating capabilities to the configuration structure,
the module compliance enforcement is already correctly set by firmware,
so the extra code block is redundant.

Signed-off-by: Jeb Cramer <jeb.j.cramer@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:16 -07:00
Anirudh Venkataramanan 0a02944fea ice: Use default configuration mode for PHY configuration
Recent firmware supports a new "get PHY capabilities" mode
ICE_AQC_REPORT_DFLT_CFG which makes it unnecessary for the driver
to track and apply NVM based default link overrides.

If FW AQ API version supports it, use Report Default Configuration.
Add check function for Report Default Configuration support and update
accordingly.

Also change adv_phy_type_[lo|hi] to advert_phy_type[lo|hi] for
clarity.

Co-developed-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com>
Signed-off-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:15 -07:00
Anirudh Venkataramanan 178a666daa ice: Replace some memsets and memcpys with assignment
In ice_set_link_ksettings, use assignment instead of memset/memcpy
where possible

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:15 -07:00
Anirudh Venkataramanan 450f10e794 ice: Fix error return codes in ice_set_link_ksettings
Return more appropriate error codes so that the right error
message is communicated to the user by ethtool.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:15 -07:00
Anirudh Venkataramanan 0be39bb4c7 ice: Rename a couple of variables
In ice_set_link_ksettings, change 'abilities' to 'phy_caps' and 'p' to
'pi'. This is more consistent with similar usages elsewhere in the
driver.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:15 -07:00
Anirudh Venkataramanan fd3dc1655e ice: Remove unnecessary checker loop
The loop checking for PF VSI doesn't make any sense. The VSI type
backing the netdev passed to ice_set_link_ksettings will always be
of type ICE_PF_VSI. Remove it.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:15 -07:00
Anirudh Venkataramanan d348d51771 ice: Ignore EMODE return for opcode 0x0605
When link is owned by manageability, the driver is not allowed to fiddle
with link. FW returns ICE_AQ_RC_EMODE if the driver attempts to do so.
This patch adds a new function ice_set_link which abstracts the call to
ice_aq_set_link_restart_an and provides a clean way to turn on/off link.

While making this change, I also spotted that an int variable was being
used to hold both an ice_status return code and the Linux errno return
code. This pattern more often than not results in the driver inadvertently
returning ice_status back to kernel which is a major boo-boo. Clean it up.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:15 -07:00
Anirudh Venkataramanan d6730a871e ice: Align macro names to the specification
For get PHY abilities AQ, the specification defines "report modes"
as "with media", "without media" and "active configuration". For
clarity, rename macros to align with the specification.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:15 -07:00
Victor Raj 7fb09a7375 ice: Modify recursive way of adding nodes
Remove the recursive way of adding the nodes to the layer in order
to reduce the stack usage. Instead the algorithm is modified to use
a while loop.

The previous code was scanning recursively the nodes horizontally.
The total stack consumption will be based on number of nodes present
on that layer. In some cases it can consume more stack.

Signed-off-by: Victor Raj <victor.raj@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:15 -07:00
Chinh T Cao 3056df93f7 ice: Re-send some AQ commands, as result of EBUSY AQ error
Retry sending some AQ commands, as result of EBUSY AQ error.
ice_aqc_opc_get_link_topo
ice_aqc_opc_lldp_stop
ice_aqc_opc_lldp_start
ice_aqc_opc_lldp_filter_ctrl

This change follows the latest guidelines from HW team. It is
better to retry the same AQ command several times, as the result
of EBUSY, instead of returning error to the caller right away.

Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-07 17:09:15 -07:00
Wei Yongjun 3cd52c1e32 net: fealnx: use module_pci_driver to simplify the code
Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 15:15:02 -07:00
Wei Yongjun 6381c45b28 net: atheros: atl2: use module_pci_driver to simplify the code
Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 15:15:01 -07:00
Wei Yongjun f670149a4f net: sundance: use module_pci_driver to simplify the code
Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 15:15:01 -07:00
Wei Yongjun 02f2743ecd tulip: de2104x: use module_pci_driver to simplify the code
Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 15:15:01 -07:00
Wei Yongjun 95b2fbdb93 tulip: windbond-840: use module_pci_driver to simplify the code
Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 15:15:01 -07:00
Wei Yongjun 1ffa660443 enic: use module_pci_driver to simplify the code
Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 15:15:01 -07:00
Wei Yongjun 4e92cac843 net: encx24j600: use module_spi_driver to simplify the code
module_spi_driver() makes the code simpler by eliminating
boilerplate code.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 15:15:01 -07:00
Colin Ian King 298b58f00c liquidio: Fix unintented sign extension of a left shift of a u16
The macro CN23XX_PEM_BAR1_INDEX_REG is being used to shift oct->pcie_port
(a u16) left 24 places. There are two subtle issues here, first the
shift gets promoted to an signed int and then sign extended to a u64.
If oct->pcie_port is 0x80 or more then the upper bits get sign extended
to 1. Secondly shfiting a u16 24 bits will lead to an overflow so it
needs to be cast to a u64 for all the bits to not overflow.

It is entirely possible that the u16 port value is never large enough
for this to fail, but it is useful to fix unintended overflows such
as this.

Fix this by casting the port parameter to the macro to a u64 before
the shift.

Addresses-Coverity: ("Unintended sign extension")
Fixes: 5bc67f587b ("liquidio: CN23XX register definitions")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 14:54:49 -07:00
Danielle Ratson a975d7d8a3 ethtool: Remove link_mode param and derive link params from driver
Some drivers clear the 'ethtool_link_ksettings' struct in their
get_link_ksettings() callback, before populating it with actual values.
Such drivers will set the new 'link_mode' field to zero, resulting in
user space receiving wrong link mode information given that zero is a
valid value for the field.

Another problem is that some drivers (notably tun) can report random
values in the 'link_mode' field. This can result in a general protection
fault when the field is used as an index to the 'link_mode_params' array
[1].

This happens because such drivers implement their set_link_ksettings()
callback by simply overwriting their private copy of
'ethtool_link_ksettings' struct with the one they get from the stack,
which is not always properly initialized.

Fix these problems by removing 'link_mode' from 'ethtool_link_ksettings'
and instead have drivers call ethtool_params_from_link_mode() with the
current link mode. The function will derive the link parameters (e.g.,
speed) from the link mode and fill them in the 'ethtool_link_ksettings'
struct.

v3:
	* Remove link_mode parameter and derive the link parameters in
	  the driver instead of passing link_mode parameter to ethtool
	  and derive it there.

v2:
	* Introduce 'cap_link_mode_supported' instead of adding a
	  validity field to 'ethtool_link_ksettings' struct.

[1]
general protection fault, probably for non-canonical address 0xdffffc00f14cc32c: 0000 [#1] PREEMPT SMP KASAN
KASAN: probably user-memory-access in range [0x000000078a661960-0x000000078a661967]
CPU: 0 PID: 8452 Comm: syz-executor360 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__ethtool_get_link_ksettings+0x1a3/0x3a0 net/ethtool/ioctl.c:446
Code: b7 3e fa 83 fd ff 0f 84 30 01 00 00 e8 16 b0 3e fa 48 8d 3c ed 60 d5 69 8a 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03
+38 d0 7c 08 84 d2 0f 85 b9
RSP: 0018:ffffc900019df7a0 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff888026136008 RCX: 0000000000000000
RDX: 00000000f14cc32c RSI: ffffffff873439ca RDI: 000000078a661960
RBP: 00000000ffff8880 R08: 00000000ffffffff R09: ffff88802613606f
R10: ffffffff873439bc R11: 0000000000000000 R12: 0000000000000000
R13: ffff88802613606c R14: ffff888011d0c210 R15: ffff888011d0c210
FS:  0000000000749300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004b60f0 CR3: 00000000185c2000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 linkinfo_prepare_data+0xfd/0x280 net/ethtool/linkinfo.c:37
 ethnl_default_notify+0x1dc/0x630 net/ethtool/netlink.c:586
 ethtool_notify+0xbd/0x1f0 net/ethtool/netlink.c:656
 ethtool_set_link_ksettings+0x277/0x330 net/ethtool/ioctl.c:620
 dev_ethtool+0x2b35/0x45d0 net/ethtool/ioctl.c:2842
 dev_ioctl+0x463/0xb70 net/core/dev_ioctl.c:440
 sock_do_ioctl+0x148/0x2d0 net/socket.c:1060
 sock_ioctl+0x477/0x6a0 net/socket.c:1177
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: c8907043c6 ("ethtool: Get link mode in use instead of speed and duplex parameters")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 14:53:04 -07:00
Colin Ian King 7b3ae17f0f xircom: remove redundant error check on variable err
The error check on err is always false as err is always 0 at the
port_found label. The code is redundant and can be removed.

Addresses-Coverity: ("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 14:48:32 -07:00
David S. Miller f86c70ed04 mlx5-updates-2021-04-06
Introduce TC sample offload
 
 Background
 ----------
 
 The tc sample action allows user to sample traffic matched by tc
 classifier. The sampling consists of choosing packets randomly and
 sampling them using psample module.
 
 The tc sample parameters include group id, sampling rate and packet's
 truncation (to save kernel-user traffic).
 
 Sample in TC SW
 ---------------
 
 User must specify rate and group id for sample action, truncate is
 optional.
 
 tc filter add dev enp4s0f0_0 ingress protocol ip prio 1 flower	\
 	src_mac 02:25:d0:14:01:02 dst_mac 02:25:d0:14:01:03	\
 	action sample rate 10 group 5 trunc 60			\
 	action mirred egress redirect dev enp4s0f0_1
 
 The tc sample action kernel module 'act_sample' will call another
 kernel module 'psample' to send sampled packets to userspace.
 
 MLX5 sample HW offload - MLX5 driver patches
 --------------------------------------------
 
 The sample action is translated to a goto flow table object
 destination which samples packets according to the provided
 sample ratio. Sampled packets are duplicated. One copy is
 processed by a termination table, named the sample table,
 which sends the packet to the eswitch manager port (that will
 be processed by software).
 
 The second copy is processed by the default table which executes
 the subsequent actions. The default table is created per <vport,
 chain, prio> tuple as rules with different prios and chains may
 overlap.
 
 For example, for the following typical flow table:
 
 +-------------------------------+
 +       original flow table     +
 +-------------------------------+
 +         original match        +
 +-------------------------------+
 + sample action + other actions +
 +-------------------------------+
 
 We translate the tc filter with sample action to the following HW model:
 
         +---------------------+
         + original flow table +
         +---------------------+
         +   original match    +
         +---------------------+
                    |
                    v
 +------------------------------------------------+
 +                Flow Sampler Object             +
 +------------------------------------------------+
 +                    sample ratio                +
 +------------------------------------------------+
 +    sample table id    |    default table id    +
 +------------------------------------------------+
            |                            |
            v                            v
 +-----------------------------+  +----------------------------------------+
 +        sample table         +  + default table per <vport, chain, prio> +
 +-----------------------------+  +----------------------------------------+
 + forward to management vport +  +            original match              +
 +-----------------------------+  +----------------------------------------+
                                  +            other actions               +
                                  +----------------------------------------+
 
 Flow sampler object
 -------------------
 
 Hardware introduces flow sampler object to do sample. It is a new
 destination type. Driver needs to specify two flow table ids in it.
 One is sample table id. The other one is the default table id.
 Sample table samples the packets according to the sample rate and
 forward the sampled packets to eswitch manager port. Default table
 finishes the subsequent actions.
 
 Group id and reg_c0
 -------------------
 
 Userspace program will take different actions for sampled packets
 according to tc sample action group id. So hardware must pass group
 id to software for each sampled packets. In Paul Blakey's "Introduce
 connection tracking offload" patch set, reg_c0 lower 16 bits are used
 for miss packet chain id restore. We convert reg_c0 lower 16 bits to
 a common object pool, so other features can also use it.
 
 Since sample group id is 32 bits, create a 16 bits object id to map
 the group id and write the object id to reg_c0 lower 16 bits. reg_c0
 can only be used for matching. Write reg_c0 to flow_tag, so software
 can get the object id via flow_tag and find group id via the common
 object pool.
 
 Sampler restore handle
 ----------------------
 
 Use common object pool to create an object id to map sample parameters.
 Allocate a modify header action to write the object id to reg_c0 lower
 16 bits. Create a restore rule to pass the object id to software. So
 software can identify sampled packets via the object id and send it to
 userspace.
 
 Aggregate the modify header action, restore rule and object id to a
 sample restore handle. Re-use identical sample restore handle for
 the same object id.
 
 Send sampled packets to userspace
 ---------------------------------
 
 The destination for sampled packets is eswitch manager port, so
 representors can receive sampled packets together with the group id.
 Driver will send sampled packets and group id to userspace via psample.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmBtNrUACgkQSD+KveBX
 +j6cRQf/ZARhVPDEgCvFd+wD+n2VCM11FJCpIumGecfqpA9DB/7i0iQrBWG2cGy6
 Go3XZ7HCPy0bAeDnVMBulF5RshfQkB/CNJfCTrw0QkNvenO/eYPZrl0XAGwL7w8W
 9vkeK51VG70bj7VEMeWVovL0X2VoGea0MD0ASLgOG3qZmCjFX0Aw3yY4WNZAA1fn
 i9rSP0AgTXqbR+nUezqP9xDHCyEf4etqpdPO/gosFvasZxTa9Xm6tXxT8YrcjAEH
 MjIYJVS5SERem/gxqrRi5p0u1RNrbZ3vPMmZQIr6x2eBXLwMhvjvcxKqZ2l9PvD5
 +O+Hf43GAmhAoqZukvU8H8oMWArciA==
 =MkzD
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2021-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-04-06

Introduce TC sample offload

Background
----------

The tc sample action allows user to sample traffic matched by tc
classifier. The sampling consists of choosing packets randomly and
sampling them using psample module.

The tc sample parameters include group id, sampling rate and packet's
truncation (to save kernel-user traffic).

Sample in TC SW
---------------

User must specify rate and group id for sample action, truncate is
optional.

tc filter add dev enp4s0f0_0 ingress protocol ip prio 1 flower	\
	src_mac 02:25:d0:14:01:02 dst_mac 02:25:d0:14:01:03	\
	action sample rate 10 group 5 trunc 60			\
	action mirred egress redirect dev enp4s0f0_1

The tc sample action kernel module 'act_sample' will call another
kernel module 'psample' to send sampled packets to userspace.

MLX5 sample HW offload - MLX5 driver patches
--------------------------------------------

The sample action is translated to a goto flow table object
destination which samples packets according to the provided
sample ratio. Sampled packets are duplicated. One copy is
processed by a termination table, named the sample table,
which sends the packet to the eswitch manager port (that will
be processed by software).

The second copy is processed by the default table which executes
the subsequent actions. The default table is created per <vport,
chain, prio> tuple as rules with different prios and chains may
overlap.

For example, for the following typical flow table:

+-------------------------------+
+       original flow table     +
+-------------------------------+
+         original match        +
+-------------------------------+
+ sample action + other actions +
+-------------------------------+

We translate the tc filter with sample action to the following HW model:

        +---------------------+
        + original flow table +
        +---------------------+
        +   original match    +
        +---------------------+
                   |
                   v
+------------------------------------------------+
+                Flow Sampler Object             +
+------------------------------------------------+
+                    sample ratio                +
+------------------------------------------------+
+    sample table id    |    default table id    +
+------------------------------------------------+
           |                            |
           v                            v
+-----------------------------+  +----------------------------------------+
+        sample table         +  + default table per <vport, chain, prio> +
+-----------------------------+  +----------------------------------------+
+ forward to management vport +  +            original match              +
+-----------------------------+  +----------------------------------------+
                                 +            other actions               +
                                 +----------------------------------------+

Flow sampler object
-------------------

Hardware introduces flow sampler object to do sample. It is a new
destination type. Driver needs to specify two flow table ids in it.
One is sample table id. The other one is the default table id.
Sample table samples the packets according to the sample rate and
forward the sampled packets to eswitch manager port. Default table
finishes the subsequent actions.

Group id and reg_c0
-------------------

Userspace program will take different actions for sampled packets
according to tc sample action group id. So hardware must pass group
id to software for each sampled packets. In Paul Blakey's "Introduce
connection tracking offload" patch set, reg_c0 lower 16 bits are used
for miss packet chain id restore. We convert reg_c0 lower 16 bits to
a common object pool, so other features can also use it.

Since sample group id is 32 bits, create a 16 bits object id to map
the group id and write the object id to reg_c0 lower 16 bits. reg_c0
can only be used for matching. Write reg_c0 to flow_tag, so software
can get the object id via flow_tag and find group id via the common
object pool.

Sampler restore handle
----------------------

Use common object pool to create an object id to map sample parameters.
Allocate a modify header action to write the object id to reg_c0 lower
16 bits. Create a restore rule to pass the object id to software. So
software can identify sampled packets via the object id and send it to
userspace.

Aggregate the modify header action, restore rule and object id to a
sample restore handle. Re-use identical sample restore handle for
the same object id.

Send sampled packets to userspace
---------------------------------

The destination for sampled packets is eswitch manager port, so
representors can receive sampled packets together with the group id.
Driver will send sampled packets and group id to userspace via psample.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 14:38:24 -07:00
Vadim Pasternak d567fd6e82 mlxsw: core: Remove critical trip points from thermal zones
Disable software thermal protection by removing critical trip points
from all thermal zones.

The software thermal protection is redundant given there are two layers
of protection below it in firmware and hardware. The first layer is
performed by firmware, the second, in case firmware was not able to
perform protection, by hardware.
The temperature threshold set for hardware protection is always higher
than for firmware.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 14:26:18 -07:00
Voon Weifeng 017d6250ad stmmac: intel: Enable SERDES PHY rx clk for PSE
EHL PSE SGMII mode requires to ungate the SERDES PHY rx clk for power up
sequence and vice versa.

Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-07 14:24:23 -07:00
Chris Mi f94d6389f6 net/mlx5e: TC, Add support to offload sample action
The following diagram illustrates the hardware model for tc sample action:

        +---------------------+
        + original flow table +
        +---------------------+
        +   original match    +
        +---------------------+
                   |
                   v
+------------------------------------------------+
+                Flow Sampler Object             +
+------------------------------------------------+
+                    sample ratio                +
+------------------------------------------------+
+    sample table id    |    default table id    +
+------------------------------------------------+
           |                            |
           v                            v
+-----------------------------+  +----------------------------------------+
+        sample table         +  + default table per <vport, chain, prio> +
+-----------------------------+  +----------------------------------------+
+ forward to management vport +  +            original match              +
+-----------------------------+  +----------------------------------------+
                                 +            other actions               +
                                 +----------------------------------------+

The sample action is translated to a goto flow table object
destination which samples packets according to the provided
sample ratio. Sampled packets are duplicated. One copy is
processed by a termination table, named the sample table,
which sends the packet to the eswitch manager port (that will
be processed by software).

The second copy is processed by the default table which executes
the subsequent actions. The default table is created per <vport,
chain, prio> tuple as rules with different prios and chains may
overlap.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:05 -07:00
Chris Mi be9dc00474 net/mlx5e: TC, Handle sampled packets
Mark the sampled packets with a sample restore object. Send sampled
packets using the psample api.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:04 -07:00
Chris Mi 7319a1cc3c net/mlx5e: TC, Refactor tc update skb function
As a pre-step to process sampled packet in this function.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:04 -07:00
Chris Mi 36a3196256 net/mlx5e: TC, Add sampler restore handle API
Use common object pool to create an object ID to map sample parameters.
Allocate a modify header action to write the object ID to reg_c0 lower
16 bits. Create a restore rule to pass the object ID to software. So
software can identify sampled packets via the object ID and send it to
userspace.

Aggregate the modify header action, restore rule and object ID to a
sample restore handle. Re-use identical sample restore handle for
the same object ID.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:04 -07:00
Chris Mi 11ecd6c60b net/mlx5e: TC, Add sampler object API
In order to offload sample action, HW introduces sampler object. The
sampler object samples packets according to the provided sample ratio.
Sampled packets are duplicated. One copy is processed by a termination
table, named the sample table, which sends the packet up to software.
The second copy is processed by the default table.

Instantiate sampler object. Re-use identical sampler object for
the same sample ratio, sample table and default table as a prestep for
offloading tc sample actions.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:03 -07:00
Chris Mi 2a9ab10a56 net/mlx5e: TC, Add sampler termination table API
Sampled packets are sent to software using termination tables. There
is only one rule in that table that is to forward sampled packets to
the e-switch management vport.

Create a sampler termination table and rule for each eswitch.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:03 -07:00
Chris Mi 41c2fd9498 net/mlx5e: TC, Parse sample action
Parse TC sample action and save sample parameters in flow attribute
data structure.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:03 -07:00
Chris Mi c935568271 net/mlx5: Instantiate separate mapping objects for FDB and NIC tables
Currently, the u32 chain id is mapped to u16 value which is stored on
the lower 16 bits of reg_c0 for FDB and reg_b for NIC tables. The
mapping is internally maintained by the chains object. However, with
the introduction of reg_c0 objects the fdb may store more than just
the chain id on reg_c0. This is not relevant for NIC tables.

Separate the chains mapping instantiation for FDB and NIC tables.
Remove the mapping from the chains object. For FDB tables, create
the mapping per eswitch. For NIC tables, create the mapping per tc
table. Pass the corresponding mapping pointer when creating the
chains object.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:02 -07:00
Chris Mi a91d98a0a2 net/mlx5: Map register values to restore objects
Currently reg_c0 lower 16 bits and reg_b are used to store the chain
id that missed in FDB and NIC tables accordingly. However, the
registers' values may index a restore object, rather than a single u32
value. Different object types can be used to restore mutually exclusive
contexts such as chain id and sample group id.

Use the mapping object to associate an index with a restore object
as a prestep for supporting additional restore types.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:02 -07:00
Chris Mi c1904360dd net/mlx5: E-switch, Set per vport table default group number
Different per voprt table is created using a different per vport table
namespace. Because we can't use variable to set the namespace member
value.  If max group number is 0 in the namespace, use the eswitch
default max group number.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:02 -07:00
Chris Mi c796bb7cd2 net/mlx5: E-switch, Generalize per vport table API
Currently, per vport table was used only for port mirroring actions.
However, sample action will also require a per vport table instance.

Generalize the vport table API to work with multiple namespaces where
each namespace manages its own vport table instance.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:01 -07:00
Chris Mi 0a9e230787 net/mlx5: E-switch, Rename functions to follow naming convention.
Public api starts with mlx5 and remove mlx5 for non-public api.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:01 -07:00
Chris Mi 4c7f40287a net/mlx5: E-switch, Move vport table functions to a new file
Currently, the vport table functions are in common eswitch offload
file. This file is too big. Move the vport table create, delete and
lookup functions to a separate file. Put the file in esw directory.

Pre-step for generalizing its functionality for serving both the
mirroring and the sample features.

Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:36:01 -07:00
Xiaoming Ni d5f9b005c3 net/mlx5: fix kfree mismatch in indir_table.c
Memory allocated by kvzalloc() should be freed by kvfree().

Fixes: 34ca65352d ("net/mlx5: E-Switch, Indirect table infrastructur")
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:04:36 -07:00
Eli Cohen 1a73704c82 net/mlx5: Fix HW spec violation configuring uplink
Make sure to modify uplink port to follow only if the uplink_follow
capability is set as required by the HW spec. Failure to do so causes
traffic to the uplink representor net device to cease after switching to
switchdev mode.

Fixes: 7d0314b11c ("net/mlx5e: Modify uplink state on interface up/down")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-06 21:04:35 -07:00
Peng Zhang 631a44ed25 nfp: flower: add support for packet-per-second policing
Allow hardware offload of a policer action attached to a matchall filter
which enforces a packets-per-second rate-limit.

e.g.
tc filter add dev tap1 parent ffff: u32 match \
        u32 0 0 police pkts_rate 3000 pkts_burst 1000

Signed-off-by: Peng Zhang <peng.zhang@corigine.com>
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:47:46 -07:00
Guangbin Huang ed7bedd2c3 net: hns3: clear VF down state bit before request link status
Currently, the VF down state bit is cleared after VF sending
link status request command. There is problem that when VF gets
link status replied from PF, the down state bit may still set
as 1. In this case, the link status replied from PF will be
ignored and always set VF link status to down.

To fix this problem, clear VF down state bit before VF requests
link status.

Fixes: e2cb1dec97 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:40:06 -07:00
Andy Shevchenko a460513ed4 time64.h: Consolidated PSEC_PER_SEC definition
We have currently three users of the PSEC_PER_SEC each of them defining it
individually. Instead, move it to time64.h to be available for everyone.

There is a new user coming with the same constant in use. It will also
make its life easier.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:32:17 -07:00
Andy Shevchenko 3036ec035c stmmac: intel: Drop duplicate ID in the list of PCI device IDs
The PCI device IDs are defined with a prefix PCI_DEVICE_ID.
There is no need to repeat the ID part at the end of each definition.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:31:18 -07:00
Guenter Roeck 66c3f05ddc pcnet32: Use pci_resource_len to validate PCI resource
pci_resource_start() is not a good indicator to determine if a PCI
resource exists or not, since the resource may start at address 0.
This is seen when trying to instantiate the driver in qemu for riscv32
or riscv64.

pci 0000:00:01.0: reg 0x10: [io  0x0000-0x001f]
pci 0000:00:01.0: reg 0x14: [mem 0x00000000-0x0000001f]
...
pcnet32: card has no PCI IO resources, aborting

Use pci_resouce_len() instead.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:30:17 -07:00
Qiheng Lin 3b2c32f96e net: ethernet: mtk_eth_soc: remove unneeded semicolon
Eliminate the following coccicheck warning:
 drivers/net/ethernet/mediatek/mtk_ppe.c:270:2-3: Unneeded semicolon

Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:27:00 -07:00
Lv Yunlong b25b343db0 net: broadcom: bcm4908enet: Fix a double free in bcm4908_enet_dma_alloc
In bcm4908_enet_dma_alloc, if callee bcm4908_dma_alloc_buf_descs() failed,
it will free the ring->cpu_addr by dma_free_coherent() and return error.
Then bcm4908_enet_dma_free() will be called, and free the same cpu_addr
by dma_free_coherent() again.

My patch set ring->cpu_addr to NULL after it is freed in
bcm4908_dma_alloc_buf_descs() to avoid the double free.

Fixes: 4feffeadbc ("net: broadcom: bcm4908enet: add BCM4908 controller driver")
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:15:21 -07:00
Krzysztof Kozlowski cc0626c2aa net: smsc911x: skip acpi_device_id table when !CONFIG_ACPI
The driver can match via multiple methods.  Its acpi_device_id table is
referenced via ACPI_PTR() so it will be unused for !CONFIG_ACPI builds:

  drivers/net/ethernet/smsc/smsc911x.c:2652:36: warning:
    ‘smsc911x_acpi_match’ defined but not used [-Wunused-const-variable=]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 15:05:02 -07:00
Salil Mehta d392ecd1bc net: hns3: Limiting the scope of vector_ring_chain variable
Limiting the scope of the variable vector_ring_chain to the block where it
is used.

Fixes: 424eb834a9 ("net: hns3: Unified HNS3 {VF|PF} Ethernet Driver for hip08 SoC")
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 15:04:09 -07:00
Salil Mehta 0600771fa6 net: hns3: Remove un-necessary 'else-if' in the hclge_reset_event()
Code to defer the reset(which caps the frequency of the reset) schedules the
timer and returns. Hence, following 'else-if' looks un-necessary.

Fixes: 9de0b86f64 ("net: hns3: Prevent to request reset frequently")
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 15:02:41 -07:00
Salil Mehta 9a6aaf6148 net: hns3: Remove the left over redundant check & assignment
This removes the left over check and assignment which is no longer used
anywhere in the function and should have been removed as part of the
below mentioned patch.

Fixes: 012fcb52f6 ("net: hns3: activate reset timer when calling reset_event")
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 15:02:41 -07:00
Christophe JAILLET 7190e9d8e1 qede: Use 'skb_add_rx_frag()' instead of hand coding it
Some lines of code can be merged into an equivalent 'skb_add_rx_frag()'
call which is less verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-05 11:59:33 -07:00