Commit Graph

36870 Commits

Author SHA1 Message Date
Amit Cohen d8f4da73ce mlxsw: reg: Add Switch Port Egress VLAN EtherType Register
SPEVET configures which EtherType to push at egress for packets incoming
through a local port for which 'SPVID.egr_et_set' is set.

The next patches will use SPEVET to configure EtherType 0x88A8 and
0x8100 for local ports that are members of 802.1ad and 802.1q bridges,
respectively. This allows using dual VxLAN bridges (802.1d and 802.1ad at
the same time).

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17 12:26:28 -07:00
Amit Cohen 1b35293b7a mlxsw: reg: Add egr_et_set field to SPVID
SPVID.egr_et_set=1 means that when VLAN is pushed at ingress (for untagged
packets or for QinQ push mode) then the EtherType is decided at the egress
port.

The next patches will use this field for VxLAN devices (tunnel port) in
order to allow using dual VxLAN bridges (802.1d and 802.1ad at the same
time).

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17 12:26:28 -07:00
Voon Weifeng 3600be5f58 net: stmmac: add timestamp correction to rid CDC sync error
According to the Synopsys DesignWare EQoS Databook, the Clock Domain
Crossing (CDC) synchronization error is introduced because the clock used
at capture (the GMII Tx/Rx clock) differs from the PTP clock
(clk_ptp_ref_i) that is used to generate the time.

The CDC synchronization error is almost equal to 2 times the clock
period of the PTP clock(clk_ptp_ref_i).
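
As a rough illustration of the "2 times the clock period" figure above, the
error can be expressed in nanoseconds from the PTP clock rate. This is a
minimal user-space sketch: the helper name and the NSEC_PER_SEC definition
are assumptions made for the example, and how the driver applies the value
to a timestamp is not stated in this message.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: approximate the CDC error as two PTP clock periods (ns). */
static uint32_t cdc_error_adj_ns(uint64_t clk_ptp_rate_hz)
{
	assert(clk_ptp_rate_hz != 0);	/* a zero rate has no defined period */
	return (uint32_t)((2 * NSEC_PER_SEC) / clk_ptp_rate_hz);
}
```

For example, a 200 MHz PTP reference clock has a 5 ns period, so the sketch
yields a 10 ns adjustment.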

On an Intel Tigerlake platform (with Marvell 88E2110 external PHY):

Before applying this patch (with CDC synchronization error):
ptp4l[64.044]: rms    8 max   13 freq +30877 +/-  11 delay   216 +/-   0
ptp4l[65.047]: rms   13 max   20 freq +30869 +/-  17 delay   213 +/-   0
ptp4l[66.050]: rms   12 max   20 freq +30857 +/-  11 delay   213 +/-   0
ptp4l[67.052]: rms   11 max   22 freq +30849 +/-  10 delay   215 +/-   0
ptp4l[68.055]: rms   10 max   16 freq +30853 +/-  13 delay   215 +/-   0
ptp4l[69.057]: rms    7 max   13 freq +30848 +/-   9 delay   216 +/-   0
ptp4l[70.060]: rms    8 max   13 freq +30846 +/-  10 delay   216 +/-   0
ptp4l[71.063]: rms    9 max   15 freq +30836 +/-   8 delay   218 +/-   0

After applying this patch (CDC synchronization error is taken care of):
ptp4l[61.516]: rms  773 max  824 freq +31526 +/- 158 delay   200 +/-   0
ptp4l[62.519]: rms  427 max  596 freq +31668 +/-  39 delay   198 +/-   0
ptp4l[63.522]: rms  113 max  206 freq +31482 +/-  57 delay   198 +/-   0
ptp4l[64.525]: rms   40 max   56 freq +31316 +/-  29 delay   200 +/-   0
ptp4l[65.528]: rms   47 max   56 freq +31255 +/-  17 delay   200 +/-   0
ptp4l[66.531]: rms   26 max   36 freq +31246 +/-   9 delay   200 +/-   0
ptp4l[67.534]: rms   12 max   18 freq +31254 +/-  12 delay   202 +/-   0
ptp4l[68.537]: rms    7 max   12 freq +31263 +/-  10 delay   202 +/-   0

Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17 12:04:33 -07:00
Alexander Duyck acebe5b610 ionic: Update driver to use ethtool_sprintf
Update the ionic driver to make use of ethtool_sprintf. In addition add
separate functions for Tx/Rx stats strings in order to reduce the total
amount of indenting needed in the driver code.

Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17 11:42:31 -07:00
Alexander Duyck b82e8118c5 bna: Update driver to use ethtool_sprintf
Update the bnad_get_strings to make use of ethtool_sprintf and avoid
unnecessary line wrapping. To do this we invert the logic for the string
set test and instead exit immediately if we are not working with the stats
strings. In addition the function is broken up into subfunctions for each
area so that we can simply call ethtool_sprintf once for each string in a
given subsection.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17 11:42:31 -07:00
Alexander Duyck efbbe4fb59 ena: Update driver to use ethtool_sprintf
Replace instances of snprintf or memcpy followed by a pointer update with
ethtool_sprintf.

Acked-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17 11:42:31 -07:00
Alexander Duyck 83cd23974a hisilicon: Update drivers to use ethtool_sprintf
Update the hisilicon drivers to make use of ethtool_sprintf. The general
idea is to reduce code size and overhead by replacing the repeated pattern
of string printf statements and ETH_STRING_LEN counter increments.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17 11:42:30 -07:00
Alexander Duyck 6a143a7cf9 nfp: Replace nfp_pr_et with ethtool_sprintf
The nfp_pr_et function is nearly identical to ethtool_sprintf except for
the fact that it takes the pointer by value and returns the updated
pointer, whereas ethtool_sprintf takes a pointer to the pointer.

Since they are so close, just update nfp to make use of ethtool_sprintf.
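
The two calling conventions contrasted above can be sketched in user space
as follows. Both function names and SKETCH_GSTRING_LEN are hypothetical
stand-ins chosen for this example; they only mirror the behaviour the
message describes and are not the kernel implementations.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define SKETCH_GSTRING_LEN 32	/* assumed fixed slot size per string */

/* Style A: pointer passed by value, advanced pointer returned. */
static char *sketch_pr_et(char *data, const char *fmt, ...)
{
	va_list args;

	assert(data && fmt);
	va_start(args, fmt);
	vsnprintf(data, SKETCH_GSTRING_LEN, fmt, args);
	va_end(args);
	return data + SKETCH_GSTRING_LEN;
}

/* Style B: pointer to the cursor passed in, cursor advanced in place. */
static void sketch_sprintf(char **data, const char *fmt, ...)
{
	va_list args;

	assert(data && *data && fmt);
	va_start(args, fmt);
	vsnprintf(*data, SKETCH_GSTRING_LEN, fmt, args);
	va_end(args);
	*data += SKETCH_GSTRING_LEN;
}
```

With style B the caller no longer has to assign the return value on every
call, which is what makes the two helpers interchangeable after a small
mechanical conversion.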

Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17 11:42:30 -07:00
Alexander Duyck c8d4725e98 intel: Update drivers to use ethtool_sprintf
Update the Intel drivers to make use of ethtool_sprintf. The general idea
is to reduce code size and overhead by replacing the repeated pattern of
string printf statements and ETH_STRING_LEN counter increments.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17 11:42:30 -07:00
David S. Miller 0c88eda9f5 mlx5-updates-2021-03-16

Merge tag 'mlx5-updates-2021-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-03-16

mlx5 uplink representor netdev persistence.

Before this patchset we used to have separate netdevs for Native NIC mode
and Switchdev mode (uplink representor netdev), meaning that if the user
switches between Native and Switchdev modes, the driver would clean up
the current netdev and create a new one for the new mode. Such behavior
created an administrative nightmare for users, who need to be aware of
the loss of both data path and control path configurations, e.g. netdev
attributes and arp/route tables, where the latter is more painful.

A simple solution for this is not to replace the netdev in the first place
and use a single netdev to serve the uplink/physical port whether it is
in switchdev mode or native mode.

We already have different HW profiles for each netdev mode, in this series
we just replace the HW profile on the fly and we keep the same netdev
attached.

Refactoring: Some refactoring has been made to overcome some technical
difficulties:
1) The netdev is created with the maximum amount of tx/rx queues to serve
   the two profiles.

2) Some ndos are not supported in some modes, so we added a mode check for
   such cases, e.g. legacy sriov ndos must be blocked in switchdev mode.

3) Some mlx5 netdev private attributes need to be moved out of profiles
   and kept in a persistent place, where the netdev is created,
   e.g. devlink port and other global HW resources.

4) The netdev devlink port is now always registered with the switch id.

Implementation: the last three patches implement the mechanism, now that
the netdev can be shared.

5) Don't recreate the netdev on switchdev mode changes
6) Prevent changing switchdev mode when some netdev operations
are active, mostly when TC rules are being processed.
This is required since the netdev is kept registered while switchdev mode
can be changed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17 11:22:39 -07:00
Roi Dayan 7dc84de98b net/mlx5: E-Switch, Protect changing mode while adding rules
We re-use the native NIC port net device instance for the Uplink
representor. A driver currently cannot actively unbind the TC setup
callback, hence protect against changing E-Switch mode while adding rules.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-16 16:48:43 -07:00
Roi Dayan c55479d0cb net/mlx5: E-Switch, Change mode lock from mutex to rw semaphore
The E-Switch mode change routine will take the write lock to prevent any
consumer from accessing the E-Switch resources while the E-Switch is going
through a mode change.

In the next patch, E-Switch consumers (e.g. vport representors) will take
the read lock prior to accessing E-Switch resources to prevent the
E-Switch mode from changing in the middle of the operation.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-16 16:48:42 -07:00
Roi Dayan 7a9fb35e8c net/mlx5e: Do not reload ethernet ports when changing eswitch mode
When switching modes between legacy and switchdev and back, do not
reload ethernet interfaces. Just change the profile from nic profile
to uplink rep profile in switchdev mode.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-16 16:48:42 -07:00
Roi Dayan fec2b4bb39 net/mlx5e: Unregister eth-reps devices first
When we clean all the interfaces, i.e. rescan or reload module,
we need to clean eth-reps devices first, before eth devices.

We will re-use the native NIC port net device instance for the Uplink
representor. Changing eswitch mode will skip destroying the eth device,
so the net device won't be destroyed and only the profile will change.

Creating uplink eth-rep will initialize the representor related resources.
In that sense, when we destroy all devices we first need to destroy
eth-rep devices so uplink eth-rep will clean all representor related
resources, and only then destroy the eth device, which will destroy the
rest of the resources and the net device.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-16 16:48:42 -07:00
Roi Dayan c27971d08a net/mlx5: Move devlink port from mlx5e priv to mlx5e resources
We re-use the native NIC port net device instance for the Uplink
representor, and the devlink port.
When changing profiles we reset the mlx5e priv but we should still
use the devlink port so move it to mlx5e resources.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-16 16:48:41 -07:00
Roi Dayan c276aae8c1 net/mlx5: Move mlx5e hw resources into a sub object
This is to separate between resources attributes and other
attributes we will want to use.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-16 16:48:41 -07:00
Roi Dayan 5a65d85dc7 net/mlx5e: Register nic devlink port with switch id
We will re-use the native NIC port net device instance for the Uplink
representor. Since the netdev will be kept registered while we engage
switchdev mode, the devlink port will also be kept registered.
Register the nic devlink port with switch id so it will be available
when changing profiles.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-16 16:48:40 -07:00
Roi Dayan 865d6d1c2d net/mlx5e: Move devlink port register and unregister calls
We will re-use the native NIC port net device instance for the Uplink
representor. As such we also don't want to unregister/register the
devlink port as part of the profile.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-16 16:48:40 -07:00
Roi Dayan 2ff349c5ed net/mlx5e: Verify dev is present in some ndos
We will re-use the native NIC port net device instance for the Uplink
representor. While changing profiles private resources are not
available but some ndos are not checking if the netdev is present.
So for those ndos check the netdev is present in the driver before
accessing the private resources.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-16 16:48:40 -07:00
Roi Dayan c97a2c0691 net/mlx5e: Use nic mode netdev ndos and ethtool ops for uplink representor
Remove dedicated uplink rep netdev ndos and ethtool ops.
We will re-use the native NIC port net device instance and ethtool ops for
the Uplink representor.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-16 16:48:39 -07:00
Roi Dayan ee5260307c net/mlx5e: Add offload stats ndos to nic netdev ops
We will re-use the native NIC port net device instance for the Uplink
representor, hence same ndos must be used.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-16 16:48:39 -07:00
Roi Dayan ec9457a6f6 net/mlx5e: Distinguish nic and esw offload in tc setup block cb
We will re-use the native NIC port net device instance for the Uplink
representor, hence same ndos will be used.
Now we need to distinguish in the TC callback if the mode is legacy or
switchdev and set the proper flag.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-16 16:48:38 -07:00
Roi Dayan 1aa48ca6aa net/mlx5e: Allow legacy vf ndos only if in legacy mode
We will re-use the native NIC port net device instance for the Uplink
representor. Several VF ndo ops are not relevant in switchdev mode.
Disallow them when eswitch mode is not legacy as a preparation.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-16 16:48:38 -07:00
Saeed Mahameed f031dbd530 net/mlx5e: Same max num channels for both nic and uplink profiles
In downstream patches NIC netdev can change profile dynamically from
NIC mode to uplink mode and vice versa. Both profiles are required to
advertise the same max amount of tx/rx queues.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
2021-03-16 16:48:37 -07:00
Horatiu Vultur 2ed2c5f039 net: ocelot: Remove ocelot_xfh_get_cpuq
Now when extracting frames from CPU the cpuq is not used anymore so
remove it.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:49:52 -07:00
Horatiu Vultur 7c588c3e96 net: ocelot: Extend MRP
This patch extends MRP support for Ocelot. It allows multiple rings,
and when the node has the MRC role it forwards MRP Test frames in
HW. For MRM there is no change.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:49:52 -07:00
Horatiu Vultur ebb1bb4013 net: ocelot: Add PGID_BLACKHOLE
Add a new PGID that is used not to forward frames anywhere. It is used
by MRP to make sure that MRP Test frames will not reach CPU port.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:49:52 -07:00
David S. Miller 0d40597082 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
40GbE Intel Wired LAN Driver Updates 2021-03-16

This series contains updates to i40e, ixgbe, and ice drivers.

Magnus Karlsson says:

Optimize run_xdp_zc() for the XDP program verdict being XDP_REDIRECT
in the xsk zero-copy path. This path is only used when having AF_XDP
zero-copy on and in that case most packets will be directed to user
space. This provides around 100k extra packets in throughput on my
server when running l2fwd in xdpsock.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:34:15 -07:00
Ido Schimmel 45aad0b704 mlxsw: spectrum_acl: Offload FLOW_ACTION_SAMPLE
Implement support for action sample when used with a flower classifier
by implementing the required sampler_add() / sampler_del() callbacks and
registering an Rx listener for the sampled packets.

The sampler_add() callback returns an error for Spectrum-1 as the
functionality is not supported. In Spectrum-{2,3} the callback creates a
mirroring agent towards the CPU. The agent's identifier is used by the
policy engine code to mirror towards the CPU with probability.

The Rx listener for the sampled packet is registered with the 'policy
engine' mirroring reason and passes trapped packets to the psample
module after looking up their parameters (e.g., sampling group).

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:32:22 -07:00
Ido Schimmel ca19ea63f7 mlxsw: core_acl_flex_actions: Add mirror sampler action
Add core functionality required to support mirror sampler action in the
policy engine. The switch driver (e.g., 'mlxsw_spectrum') is required to
implement the sampler_add() / sampler_del() callbacks that perform the
necessary configuration before the sampler action can be installed. The
next patch will implement it for Spectrum-{2,3}, while Spectrum-1 will
return an error, given it is not supported.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:32:22 -07:00
Ido Schimmel 54d0e963f6 mlxsw: spectrum_matchall: Add support for egress sampling
Allow user space to install a matchall classifier with sample action on
egress. This is only supported on Spectrum-2 onwards, so Spectrum-1 will
continue to return an error.

Programming the hardware to sample on egress is identical to ingress
sampling with the sole change of using a different sampling trigger.

Upon receiving a sampled packet, the sampling trigger (ingress vs.
egress) will be encoded in the mirroring reason in the Completion Queue
Element (CQE). The mirroring reason is used to lookup the sampling
parameters (e.g., psample group) which are passed to the psample module.

Note that locally generated packets that are sampled are simply
consumed. This is done for several reasons.

First, such packets do not have an ingress netdev given that their Rx
local port is the CPU port. This breaks several basic assumptions.

Second, sampling using the same interface (tc), but with flower
classifier will not result in locally generated packets being sampled
given that such packets are not subject to the policy engine.

Third, realistically, this is not a big deal given that the vast
majority of the packets being transmitted through the port are not
locally generated packets.

Fourth, if such packets do need to be sampled, they can be sampled with
a 'skip_hw' filter and reported to the same sampling group as the data
path packets. The software sampling rate can also be adjusted to fit the
rate of the locally generated packets which is much lower than the rate
of the data path traffic.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:32:22 -07:00
Ido Schimmel 90f53c53ec mlxsw: spectrum: Start using sampling triggers hash table
Start using the previously introduced sampling triggers hash table to
store sampling parameters instead of storing them as attributes of the
sampled port.

This makes it easier to introduce new sampling triggers.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:32:22 -07:00
Ido Schimmel 1b9fc42e46 mlxsw: spectrum: Track sampling triggers in a hash table
Currently, mlxsw supports a single sampling trigger type (i.e., received
packet). When sampling is configured on an ingress port, the sampling
parameters (e.g., pointer to the psample group) are stored as an
attribute of the port, so that they could be passed to
psample_sample_packet() when a sampled packet is trapped to the CPU.

Subsequent patches are going to add more types of sampling triggers,
making it difficult to maintain the current scheme.

Instead, store all the active sampling triggers with their associated
parameters in a hash table. That way, more trigger types can be easily
added.

The next patch will flip mlxsw to use the hash table instead of the
current scheme.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:32:22 -07:00
Ido Schimmel e09a59555a mlxsw: spectrum_matchall: Pass matchall entry to sampling operations
The entry will be required by the next patches, so pass it. No
functional changes intended.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:32:22 -07:00
Ido Schimmel 559313b2cb mlxsw: spectrum_matchall: Push sampling checks to per-ASIC operations
Push some sampling checks to the per-ASIC operations, as they are no
longer relevant for all ASICs.

The sampling rate validation against the MPSC maximum rate is only
relevant for Spectrum-1, as Spectrum-2 and later ASICs no longer use
MPSC register for sampling.

The ingress / egress validation is pushed down to the per-ASIC
operations since subsequent patches are going to remove it for
Spectrum-2 and later ASICs.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:32:22 -07:00
Ido Schimmel 6561df5608 mlxsw: spectrum_matchall: Propagate extack further
Due to the differences between Spectrum-1 and later ASICs, some of the
checks currently performed at the common code (where extack is
available) will need to be pushed to the per-ASIC operations.

As a preparation, propagate extack further to maintain proper error
reporting.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:32:22 -07:00
Ioana Ciornei 4fe72de61e dpaa2-eth: fixup kdoc warnings
Running kernel-doc over the dpaa2-eth driver generates a bunch of
warnings. Fix them up by removing code comments for macros which are
self-explanatory, respecting the kdoc format for macro documentation and
other small changes like describing the expected return values of
functions.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:29:49 -07:00
Ioana Ciornei 5ac2d25438 dpaa2-switch: fit the function declaration on the same line
Multiple ABI function declarations are unnecessarily split across multiple
lines. Fix this so that we have a consistent coding style.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:29:49 -07:00
Ioana Ciornei 2b7e3f7d1b dpaa2-switch: reduce the size of the if_id bitmap to 64 bits
The maximum number of DPAA2 switch interfaces, including the control
interface, is 64. Even though this restriction existed from the start,
the command structures which use an interface id bitmap were poorly
described, and even though a single uint64_t is enough, all of them used
an array of 4 uint64_t's.
Fix this by reducing the size of the interface id field to a single
uint64_t.
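
To illustrate why one uint64_t suffices for 64 interfaces, here is a
minimal sketch of an interface id bitmap. The helper names and
SKETCH_MAX_IF are hypothetical and are not taken from the driver or the
MC firmware ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_IF 64	/* interface limit stated in the message */

/* Mark interface 'id' as present in the bitmap. */
static void sketch_if_id_set(uint64_t *if_id, unsigned int id)
{
	assert(if_id && id < SKETCH_MAX_IF);
	*if_id |= UINT64_C(1) << id;
}

/* Report whether interface 'id' is present in the bitmap. */
static bool sketch_if_id_test(uint64_t if_id, unsigned int id)
{
	assert(id < SKETCH_MAX_IF);
	return (if_id >> id) & 1;
}
```

Ids 0 through 63 each map to exactly one bit, so no second word is ever
addressed.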

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:29:49 -07:00
Ioana Ciornei 05b363608b dpaa2-switch: fix kdoc warnings
Running kernel-doc over the dpaa2-switch driver generates a bunch of
warnings. Fix them up by removing code comments for macros which are
self-explanatory and adding a bit more context for the
dpsw_if_get_port_mac_addr() function and the fields of the
dpsw_vlan_if_cfg structure.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:29:48 -07:00
Ioana Ciornei cba0445633 dpaa2-switch: remove unused ABI functions
Cleanup the dpaa2-switch driver a bit by removing any unused MC firmware
ABI definitions.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:29:48 -07:00
David S. Miller 52280f60c9 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2021-03-15

This series contains updates to e1000e only.

Chen Yu says:

The NIC is put in runtime suspend status when there is no cable connected.
As a result, it is safe to keep non-wakeup NIC in runtime suspended during
s2ram because the system does not rely on the NIC plug event nor WoL to
wake up the system. Besides that, unlike the s2idle, s2ram does not need to
manipulate S0ix settings during suspend.
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16 15:04:30 -07:00
Shannon Nelson 633eddf120 ionic: aggregate Tx byte counting calls
Gather the Tx packet and byte counts and call
netdev_tx_completed_queue() only once per clean cycle.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 21:27:06 -07:00
Shannon Nelson 19fef72cb4 ionic: simplify tx clean
The descriptor mappings are set up the same way whether
or not it is a TSO, so we don't need separate logic for
the two cases.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 21:27:06 -07:00
Shannon Nelson 2da479ca08 ionic: generic tx skb mapping
Make the new ionic_tx_map_tso() usable by the non-TSO paths,
and pull the call up a level into ionic_tx() before calling
the csum or no-csum routines.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 21:27:06 -07:00
Shannon Nelson 5b039241fe ionic: simplify TSO descriptor mapping
One issue with the original TSO code was that it was working too
hard to deal with skb layouts that were never going to show up,
such as an skb->data that was longer than a single descriptor's
length.  The other issue was trying to arrange the fragment dma
mapping at the same time as figuring out the descriptors needed.
There was just too much going on at the same time.

Now we do the dma mapping first, which sets up the buffers with
skb->data in buf[0] and the remaining frags in buf[1..n-1].
Next we spread the bufs across the descriptors needed, where
each descriptor gets up to mss number of bytes.
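
The second step above, spreading already-mapped buffers across descriptors
of up to mss bytes, can be sketched as a simple walk. This is an
illustration under simplifying assumptions: the function name is
hypothetical, header bytes are not treated specially, and only the
descriptor count is produced rather than real descriptor contents.

```c
#include <assert.h>
#include <stddef.h>

/*
 * buf_len[0] stands for the mapped skb->data, the rest for the frags.
 * Count the descriptors needed when each one carries at most mss bytes;
 * a buffer may be split across descriptors.
 */
static size_t sketch_count_tso_descs(const size_t *buf_len, size_t nbufs,
				     size_t mss)
{
	size_t descs = 0;
	size_t room = 0;	/* bytes still free in the current descriptor */

	assert(buf_len && mss > 0);
	for (size_t i = 0; i < nbufs; i++) {
		size_t left = buf_len[i];

		while (left > 0) {
			size_t chunk;

			if (room == 0) {	/* start a new descriptor */
				descs++;
				room = mss;
			}
			chunk = left < room ? left : room;
			left -= chunk;
			room -= chunk;
		}
	}
	return descs;
}
```

Because mapping is finished before this walk starts, the loop only deals
with lengths, which is the separation of concerns the message argues for.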

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 21:27:06 -07:00
Alex Elder 86ca860e12 net: qualcomm: rmnet: don't use C bit-fields in rmnet checksum header
Replace the use of C bit-fields in the rmnet_map_ul_csum_header
structure with a single two-byte (big endian) structure member,
and use masks to encode or get values within it.  The content of
these fields can be accessed using simple bitwise AND and OR
operations on the (host byte order) value of the new structure
member.

Previously rmnet_map_ipv4_ul_csum_header() would update C bit-field
values in host byte order, then forcibly fix their byte order using
a combination of byte swap operations and types.

Instead, just compute the value that needs to go into the new
structure member and save it with a simple byte-order conversion.

Make similar simplifications in rmnet_map_ipv6_ul_csum_header().

Finally, in rmnet_map_checksum_uplink_packet() a set of assignments
zeroes every field in the upload checksum header.  Replace that with
a single memset() operation.
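
The approach of computing the value in host byte order and converting it
once can be sketched as below. The mask values and names are hypothetical
(the message does not give the bit layout), and htons() from
<arpa/inet.h> stands in for the kernel's byte-order helper.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the two-byte member, in host byte order. */
#define SKETCH_CSUM_OFFSET_MASK   0x3fffu	/* bits 0-13: checksum offset */
#define SKETCH_CSUM_UDP_FLAG      0x4000u	/* bit 14 */
#define SKETCH_CSUM_ENABLED_FLAG  0x8000u	/* bit 15 */

/* Build the value in host order, then convert once to big endian. */
static uint16_t sketch_csum_info(uint16_t offset, bool udp, bool enabled)
{
	uint16_t val;

	assert((offset & ~SKETCH_CSUM_OFFSET_MASK) == 0);
	val = offset & SKETCH_CSUM_OFFSET_MASK;
	if (udp)
		val |= SKETCH_CSUM_UDP_FLAG;
	if (enabled)
		val |= SKETCH_CSUM_ENABLED_FLAG;
	return htons(val);
}
```

The result has the same byte layout on little- and big-endian hosts, with
no bit-field declarations involved.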

Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 20:41:58 -07:00
Alex Elder cc1b21ba62 net: qualcomm: rmnet: don't use C bit-fields in rmnet checksum trailer
Replace the use of C bit-fields in the rmnet_map_dl_csum_trailer
structure with a single one-byte field, using constant field masks
to encode or get at embedded values.

Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 20:41:58 -07:00
Alex Elder 16653c16d2 net: qualcomm: rmnet: use masks instead of C bit-fields
The actual layout of bits defined in C bit-fields (e.g. int foo : 3)
is implementation-defined.  Structures defined in <linux/if_rmnet.h>
address this by specifying all bit-fields twice, to cover two
possible layouts.

I think this pattern is repetitive and noisy, and I find the whole
notion of compiler "bitfield endianness" to be non-intuitive.

Stop using C bit-fields for the command/data flag and the pad length
fields in the rmnet_map structure, and define a single-byte flags
field instead.  Define a mask for the single-bit "command" flag,
and another mask for the encoded pad length.  The content of both
fields can be accessed using a simple bitwise AND operation.
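
A minimal sketch of the mask-based access described above follows. The
bit positions and all names are hypothetical, since the message does not
state where the command flag and pad length sit within the flags byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions within the single-byte flags field. */
#define SKETCH_MAP_CMD_FLAG      0x80u	/* single-bit "command" flag */
#define SKETCH_MAP_PAD_LEN_MASK  0x3fu	/* encoded pad length */

/* Compose a flags byte from its two logical fields. */
static uint8_t sketch_map_flags(bool command, uint8_t pad_len)
{
	assert(pad_len <= SKETCH_MAP_PAD_LEN_MASK);
	return (uint8_t)((command ? SKETCH_MAP_CMD_FLAG : 0u) | pad_len);
}

/* Each field is read back with one bitwise AND. */
static bool sketch_map_is_command(uint8_t flags)
{
	return (flags & SKETCH_MAP_CMD_FLAG) != 0;
}

static uint8_t sketch_map_pad_len(uint8_t flags)
{
	return flags & SKETCH_MAP_PAD_LEN_MASK;
}
```

A single definition of each mask replaces the duplicated bit-field
declarations that were needed to cover both compiler layouts.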

Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 20:41:58 -07:00
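A minimal sketch of the mask-based access described in commit 16653c16d2 above, written as stand-alone C. The names and the bit positions chosen here are assumptions made for illustration and may not match the real rmnet_map definitions; the point is only that one byte plus two masks replaces two duplicated bit-field layouts.

```c
#include <stdint.h>

/* Hypothetical layout: bit 7 is the command flag, bits 0-5 the pad length. */
#define EX_MAP_CMD_FLAG      0x80u
#define EX_MAP_PAD_LEN_MASK  0x3fu

/* Hypothetical MAP header with a single-byte flags field. */
struct ex_map_header {
	uint8_t  flags;    /* command flag and pad length, see masks above */
	uint8_t  mux_id;
	uint16_t pkt_len;  /* big endian on the wire */
};

/* Returns 1 for a command packet, 0 for a data packet. */
static int ex_map_is_command(const struct ex_map_header *hdr)
{
	return (hdr->flags & EX_MAP_CMD_FLAG) != 0;
}

/* Extracts the encoded pad length with a simple bitwise AND. */
static unsigned int ex_map_pad_len(const struct ex_map_header *hdr)
{
	return hdr->flags & EX_MAP_PAD_LEN_MASK;
}
```

Because the masks operate on a plain byte, the result does not depend on how a particular compiler orders bit-fields within a storage unit.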
Alex Elder 9d131d044f net: qualcomm: rmnet: kill RMNET_MAP_GET_*() accessor macros
The following macros, defined in "rmnet_map.h", assume a socket
buffer is provided as an argument without any real indication this
is the case.
    RMNET_MAP_GET_MUX_ID()
    RMNET_MAP_GET_CD_BIT()
    RMNET_MAP_GET_PAD()
    RMNET_MAP_GET_CMD_START()
    RMNET_MAP_GET_LENGTH()
What they hide is pretty trivial accessing of fields in a structure,
and it's much clearer to see this if we do these accesses directly.

So rather than using these accessor macros, assign a local
variable of the map header pointer type to the socket buffer data
pointer, and dereference that pointer variable.

In "rmnet_map_data.c", use sizeof(object) rather than sizeof(type)
in one spot.  Also, there's no need to byte swap 0; it's all zeros
irrespective of endianness.

Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 20:41:58 -07:00
Alex Elder 50c62a111c net: qualcomm: rmnet: simplify some byte order logic
In rmnet_map_ipv4_ul_csum_header() and rmnet_map_ipv6_ul_csum_header()
the offset within a packet at which checksumming should commence is
calculated.  This calculation involves byte swapping and a forced type
conversion that makes it hard to understand.

Simplify this by computing the offset in host byte order, then
converting the result when assigning it into the header field.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 20:41:58 -07:00
Chen Yu 3335369bad e1000e: Remove the runtime suspend restriction on CNP+
Although there is a platform issue with runtime suspend support
on CNP, it would be more flexible to let the user decide whether
to disable runtime suspend or not because:
1. This can be done in userspace via
   echo on > /sys/devices/pci0000\:00/0000\:00\:1f.d/power/control
2. More and more NICs would support runtime suspend, disabling the
   runtime suspend on them by default would impact the validation.

Only disable runtime suspend on CNP in case of any user space regression.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-15 20:10:51 -07:00
Chen Yu ccf8b940e5 e1000e: Leverage direct_complete to speed up s2ram
The NIC is put in runtime suspend status when there is no cable connected.
As a result, it is safe to keep non-wakeup NIC in runtime suspended during
s2ram because the system does not rely on the NIC plug event nor WoL to wake
up the system. Besides that, unlike the s2idle, s2ram does not need to
manipulate S0ix settings during suspend.

This patch introduces the .prepare() for e1000e so that if the NIC is runtime
suspended the subsequent suspend/resume hooks will be skipped so as to speed
up the s2ram. The pm core will check whether the NIC is a wake up device so
there's no need to check it again in .prepare(). DPM_FLAG_SMART_PREPARE flag
should be set during probe to ask the pci subsystem to honor the driver's
prepare() result. Besides, the NIC remains runtime suspended after resumed
from s2ram as there is no need to resume it.

Tested on i7-2600K with 82579V NIC
Before the patch:
e1000e 0000:00:19.0: pci_pm_suspend+0x0/0x160 returned 0 after 225146 usecs
e1000e 0000:00:19.0: pci_pm_resume+0x0/0x90 returned 0 after 140588 usecs

After the patch:
echo disabled > //sys/devices/pci0000\:00/0000\:00\:19.0/power/wakeup
becomes 0 usecs because the hooks will be skipped.

Suggested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-15 20:10:47 -07:00
Joakim Zhang 8f2f83765e net: stmmac: dwmac-imx: add platform level clocks management for i.MX
Split clocks settings from init callback into clks_config callback,
which could support platform level clocks management.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 14:46:21 -07:00
Joakim Zhang b4d45aee66 net: stmmac: add platform level clocks management
This patch intends to add platform level clocks management. Some
platforms may have their own special clocks, they also need to be
managed dynamically. If you want to manage such clocks, please implement
clks_config callback.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 14:46:21 -07:00
Joakim Zhang 5ec5582343 net: stmmac: add clocks management for gmac driver
This patch intends to add clocks management for stmmac driver:

If CONFIG_PM enabled:
1. Keep clocks disabled after driver probed.
2. Enable clocks when up the net device, and disable clocks when down
the net device.

If CONFIG_PM disabled:
Keep clocks always enabled after driver probed.

Note:
1. It is fine for ethtool, since stmmac implements ethtool_ops::begin so
that the ops can only be accessed when the interface is enabled, so the
clocks are ticking.
2. The MDIO bus has a different life cycle from the MAC, so we need to
ensure clocks are enabled when _mdio_read/write() need them, because these
functions can be called while the interface is not opened.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 14:46:21 -07:00
Ong Boon Leong 7310fe538e stmmac: intel: add pcs-xpcs for Intel mGbE controller
Intel mGbE controllers such as those in EHL & TGL use the pcs-xpcs driver
for the SGMII interface. To ensure MDIO bus scanning does not assign a
phy_device to MDIO-addressable entities like the Intel SerDes and pcs-xpcs,
we set up phy_mask to skip them.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 12:53:12 -07:00
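The phy_mask idea in commit 7310fe538e above can be illustrated with a small stand-alone C sketch. It assumes the usual convention that a set bit in the mask marks an MDIO address the bus scan should ignore; the helper names and the 32-address limit constant are hypothetical and are not taken from the stmmac driver.

```c
#include <stdint.h>

#define EX_MDIO_MAX_ADDR 32  /* an MDIO bus has addresses 0..31 */

/* Build a mask with one bit set per MDIO address the scan must skip. */
static uint32_t ex_build_phy_mask(const unsigned int *skip_addrs,
				  unsigned int n)
{
	uint32_t mask = 0;
	unsigned int i;

	for (i = 0; i < n; i++)
		if (skip_addrs[i] < EX_MDIO_MAX_ADDR)
			mask |= UINT32_C(1) << skip_addrs[i];

	return mask;
}

/* Count the addresses a scan would still probe with the given mask. */
static unsigned int ex_count_scanned(uint32_t phy_mask)
{
	unsigned int addr, scanned = 0;

	for (addr = 0; addr < EX_MDIO_MAX_ADDR; addr++)
		if (!(phy_mask & (UINT32_C(1) << addr)))
			scanned++;

	return scanned;
}
```

Masking two addresses, for example, leaves 30 addresses to be probed, so the non-PHY entities at the masked addresses never get a phy_device.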
Ong Boon Leong c62808e810 net: stmmac: ensure phydev is attached to phylink for C37 AN
As the support for MAC-side SGMII C37 AN is added to pcs-xpcs, phydev
should be attached to phylink during driver's open(). So, we change the
condition to "Not C73 AN" instead.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 12:53:12 -07:00
Ong Boon Leong e5e5b771f6 net: stmmac: make in-band AN mode parsing is supported for non-DT
Not all platforms use DT; on those that do not, phylink_parse_mode() will
skip in-band setup of pl->supported and pl->link_config.advertising
entirely. So, we add the setting of ovr_an_inband flag to make it work for
non-DT platforms.

Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 12:53:12 -07:00
Bhaskar Chowdhury 29c35da103 net: ethernet: neterion: Fix a typo in the file s2io.c
s/structue/structure/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 12:48:20 -07:00
Bhaskar Chowdhury 6f05a12241 net: ethernet: intel: igb: Typo fix in the file igb_main.c
s/structue/structure/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 12:48:20 -07:00
Bhaskar Chowdhury a7dde236b3 ethernet: amazon: ena: A typo fix in the file ena_com.h
Mundane typo fix.

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-15 12:48:20 -07:00
Magnus Karlsson bb52073645 ice: optimize for XDP_REDIRECT in xsk path
Optimize ice_run_xdp_zc() for the XDP program verdict being
XDP_REDIRECT in the xsk zero-copy path. This path is only used when
having AF_XDP zero-copy on and in that case most packets will be
directed to user space. This provides a little over 100k extra packets
in throughput on my server when running l2fwd in xdpsock.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-15 12:04:51 -07:00
Magnus Karlsson 7d52fe2ead ixgbe: optimize for XDP_REDIRECT in xsk path
Optimize ixgbe_run_xdp_zc() for the XDP program verdict being
XDP_REDIRECT in the xsk zero-copy path. This path is only used when
having AF_XDP zero-copy on and in that case most packets will be
directed to user space. This provides a little under 100k extra
packets in throughput on my server when running l2fwd in xdpsock.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-15 12:04:51 -07:00
Magnus Karlsson 346497c78d i40e: optimize for XDP_REDIRECT in xsk path
Optimize i40e_run_xdp_zc() for the XDP program verdict being
XDP_REDIRECT in the xsk zero-copy path. This path is only used when
having AF_XDP zero-copy on and in that case most packets will be
directed to user space. This provides a little over 100k extra packets
in throughput on my server when running l2fwd in xdpsock.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-15 12:04:51 -07:00
Ido Schimmel 2073c60044 mlxsw: spectrum: Report extra metadata to psample module
Make use of the previously added metadata and report it to the psample
module. The metadata is read from the skb's control block, which was
initialized by the bus driver (i.e., 'mlxsw_pci') after decoding the
packet's Completion Queue Element (CQE).

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14 15:00:44 -07:00
Ido Schimmel 48990bef1e mlxsw: spectrum: Remove mlxsw_sp_sample_receive()
The function resolves the psample sampling group from the Rx port
because this is the only form of sampling the driver currently supports.
Subsequent patches are going to add support for Tx-based and
policy-based sampling, in which case the sampling group would not be
resolved from the Rx port.

Therefore, move this code to the Rx-specific sampling listener.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14 15:00:44 -07:00
Ido Schimmel e1f78ecdfd mlxsw: spectrum: Remove unnecessary RCU read-side critical section
Since commit 7d8e8f3433 ("mlxsw: core: Increase scope of RCU read-side
critical section"), all Rx handlers are called from an RCU read-side
critical section.

Remove the unnecessary rcu_read_lock() / rcu_read_unlock().

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14 15:00:44 -07:00
Ido Schimmel 5ab6dc9fa2 mlxsw: pci: Set extra metadata in skb control block
Packets that are mirrored / sampled to the CPU have extra metadata
encoded in their corresponding Completion Queue Element (CQE). Retrieve
this metadata from the CQE and set it in the skb control block so that
it could be accessed by the switch driver (i.e., 'mlxsw_spectrum').

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14 15:00:43 -07:00
Ido Schimmel d4cabaadea mlxsw: Create dedicated field for Rx metadata in skb control block
Next patch will need to encode more Rx metadata in the skb control
block, so create a dedicated field for it and move the cookie index
there.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14 15:00:43 -07:00
Ido Schimmel e0eeede3d2 mlxsw: pci: Add more metadata fields to CQEv2
The Completion Queue Element version 2 (CQEv2) includes various metadata
fields for packets that are mirrored / sampled to the CPU.

Add these fields so that they could be used by a later patch.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14 15:00:43 -07:00
Ido Schimmel a03e99d39f psample: Encapsulate packet metadata in a struct
Currently, callers of psample_sample_packet() pass three metadata
attributes: Ingress port, egress port and truncated size. Subsequent
patches are going to add more attributes (e.g., egress queue occupancy),
which also need an indication whether they are valid or not.

Encapsulate packet metadata in a struct in order to keep the number of
arguments reasonable.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14 15:00:43 -07:00
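The refactoring in commit a03e99d39f above follows a common C pattern: replace a growing argument list with one struct, and pair each optional attribute with a validity flag. The sketch below is a stand-alone illustration under that assumption; the structure and field names are hypothetical and are not the real psample definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical metadata container; names are illustrative only. */
struct ex_sample_metadata {
	uint32_t trunc_size;            /* always present */
	int      in_ifindex;            /* always present */
	int      out_ifindex;           /* always present */
	uint64_t out_queue_occupancy;   /* optional attribute */
	bool     out_queue_occupancy_valid; /* true only if the value is set */
};

/*
 * Hypothetical consumer: counts the attributes that would be reported.
 * Optional attributes are reported only when their valid flag is set.
 */
static unsigned int ex_count_reported_attrs(const struct ex_sample_metadata *md)
{
	unsigned int n = 3; /* trunc_size, in_ifindex, out_ifindex */

	if (md->out_queue_occupancy_valid)
		n++;

	return n;
}
```

With this shape, adding another optional attribute means adding two struct members rather than changing the signature of every caller.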
Jonathan McDowell e127906b68 net: stmmac: Set FIFO sizes for ipq806x
Commit eaf4fac478 ("net: stmmac: Do not accept invalid MTU values")
started using the TX FIFO size to verify what counts as a valid MTU
request for the stmmac driver.  This is unset for the ipq806x variant.
Looking at older patches for this it seems the RX + TX buffers can be
up to 8k, so set appropriately.

(I sent this as an RFC patch in June last year, but received no replies.
I've been running with this on my hardware (a MikroTik RB3011) since
then with larger MTUs to support both the internal qca8k switch and
VLANs with no problems. Without the patch it's impossible to set the
larger MTU required to support this.)

Signed-off-by: Jonathan McDowell <noodles@earth.li>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 18:08:05 -08:00
Bhaskar Chowdhury 65c7bc1b7a net: ethernet: marvell: Fixed typo in the file sky2.c
s/calclation/calculation/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 18:06:11 -08:00
Baowen Zheng 6a56e19902 flow_offload: reject configuration of packet-per-second policing in offload drivers
A follow-up patch will allow users to configure packet-per-second policing
in the software datapath. In preparation for this, teach all drivers that
support offload of the policer action to reject such configuration as
currently none of them support it.

Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 14:18:09 -08:00
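The kind of check that commit 6a56e19902 above describes can be sketched as follows in stand-alone C. Everything here is a hypothetical model: the structure, the function, and the error string are invented for illustration, and the real drivers report the error through their own netlink extack mechanism.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a police action requested by the user. */
struct ex_police_action {
	uint64_t rate_bytes_ps; /* byte-per-second rate, 0 if unused */
	uint64_t rate_pkt_ps;   /* packet-per-second rate, 0 if unused */
};

/*
 * Hypothetical offload entry point: refuse any request that asks for
 * packet-per-second policing, since the hardware path cannot honor it.
 */
static int ex_offload_police(const struct ex_police_action *act,
			     const char **errmsg)
{
	if (act->rate_pkt_ps) {
		*errmsg = "offload of packet-per-second policing is not supported";
		return -EOPNOTSUPP;
	}

	*errmsg = NULL;
	return 0; /* byte-rate policing would be programmed here */
}
```

Rejecting the configuration up front keeps the hardware from silently applying a byte-rate policer where a packet-rate one was requested.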
Guangbin Huang b47cfe1f40 net: hns3: add phy loopback support for imp-controlled PHYs
If the imp-controlled PHYs feature is enabled, the driver can no longer
call the PHY driver interface to set loopback and needs to send a
command to firmware to start PHY loopback.

Driver reuses the existing firmware command 0x0315 to start
phy loopback, just add a setting bit in this command. As this
command is not only for serdes loopback anymore, rename this
command to "xxx_COMMON_LOOPBACK", and modify function name,
macro name and logs related to it.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 14:11:29 -08:00
Guangbin Huang 024712f51e net: hns3: add ioctl support for imp-controlled PHYs
When the imp-controlled PHYs feature is enabled, the driver will not
register an MDIO bus. In order to support ioctl ops for a PHY tool to
read or write PHY registers in this case, the firmware implements
a new command, and the driver implements ioctl using this new
command.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 14:11:29 -08:00
Guangbin Huang 57a8f46b1b net: hns3: add get/set pause parameters support for imp-controlled PHYs
When the imp-controlled PHYs feature is enabled, phydev is NULL.
In this case, autoneg is always reported as off when the user runs
ethtool -a to get pause parameters, because hclge_get_pauseparam()
uses phydev to check whether the device is a TP port. To fit this new
feature, use the media type to check whether the device is a TP port.

And when the user sets pause parameters, these parameters always need
to be set in the MAC, no matter whether autoneg is off.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 14:11:28 -08:00
Guangbin Huang f5f2b3e4dc net: hns3: add support for imp-controlled PHYs
IMP (Intelligent Management Processor) firmware adds a new feature
to take control of PHYs for some new devices. The PF driver adds
support for this feature.

The driver queries the device's capability to check whether IMP supports
this feature and, if it is supported, tells IMP to enable it using the
firmware compatible command.

Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-13 14:11:28 -08:00
Sergey Shtylyov 0deaeabf27 sh_eth: place RX/TX descriptor *enum*s after their *struct*s
Place the RX/TX descriptor bit *enum*s where they belong -- after the
corresponding RX/TX descriptor *struct*s and, while at it, switch to
declaring one *enum* entry per line...

Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:50:42 -08:00
Sergey Shtylyov e2dccaf194 sh_eth: rename *enum*s still not matching register names
Finally, rename the rest of the *enum* tags still not (exactly) matching
the abbreviated register names from the manuals...

Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:50:42 -08:00
Sergey Shtylyov 4585b72d97 sh_eth: rename PSR bits
In all the SoC manuals (except R-Car gen2) the PHY status register's name
is abbreviated to PSR with the only valid bit 0 named LMON.  Follow
suit and rename the corresponding *enum* tag/entry.

Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:50:42 -08:00
Sergey Shtylyov bc9d992ca4 sh_eth: rename TRSCER bits
In all the SoC manuals the TRSCER register bits match the corresponding
EESR register's bits, but only on the R-Car gen2 SoC those are named
RINT<n> and TINT<n>.  Follow suit and rename the *enum* tag/entries
from DESC_I_* to TRSCER_*.

Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:50:42 -08:00
Lee Jones f90fc37f28 ptp_pch: Move 'pch_*()' prototypes to shared header
Fixes the following W=1 kernel build warning(s):

 drivers/ptp/ptp_pch.c:193:6: warning: no previous prototype for ‘pch_ch_control_write’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:201:5: warning: no previous prototype for ‘pch_ch_event_read’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:212:6: warning: no previous prototype for ‘pch_ch_event_write’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:220:5: warning: no previous prototype for ‘pch_src_uuid_lo_read’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:231:5: warning: no previous prototype for ‘pch_src_uuid_hi_read’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:242:5: warning: no previous prototype for ‘pch_rx_snap_read’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:259:5: warning: no previous prototype for ‘pch_tx_snap_read’ [-Wmissing-prototypes]
 drivers/ptp/ptp_pch.c:300:5: warning: no previous prototype for ‘pch_set_station_address’ [-Wmissing-prototypes]

Cc: Richard Cochran <richardcochran@gmail.com> (maintainer:PTP HARDWARE CLOCK SUPPORT)
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Flavio Suligoi <f.suligoi@asem.it>
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:09:34 -08:00
Lee Jones 257382c54e ptp_pch: Remove unused function 'pch_ch_control_read()'
Fixes the following W=1 kernel build warning(s):

 drivers/ptp/ptp_pch.c:182:5: warning: no previous prototype for ‘pch_ch_control_read’ [-Wmissing-prototypes]

Cc: Richard Cochran <richardcochran@gmail.com> (maintainer:PTP HARDWARE CLOCK SUPPORT)
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Flavio Suligoi <f.suligoi@asem.it>
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:09:34 -08:00
Rafał Miłecki 12bb508bfe net: broadcom: bcm4908_enet: support TX interrupt
It appears that each DMA channel has its own interrupt and both rings
can be configured (the same way) to handle interrupts.

1. Make ring interrupts code generic (make it operate on given ring)
2. Move napi to ring (so each has its own)
3. Make IRQ handler generic (match ring against received IRQ number)
4. Add (optional) support for TX interrupt

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 16:48:38 -08:00
Robert Hancock e276e5e40e net: macb: Disable PCS auto-negotiation for SGMII fixed-link mode
When using a fixed-link configuration in SGMII mode, it's not really
sensible to have auto-negotiation enabled since the link settings are
fixed by definition. In other configurations, such as an SGMII
connection to a PHY, it should generally be enabled.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 16:44:45 -08:00
Robert Hancock 8fab174b78 net: macb: poll for fixed link state in SGMII mode
When using a fixed-link configuration with GEM in SGMII mode, such as
for a chip-to-chip interconnect, the link state was always showing as
established regardless of the actual connectivity state. We can monitor
the pcs_link_state bit in the Network Status register to determine
whether the PCS link state is actually up.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 16:44:45 -08:00
Maor Dickman a3222a2da0 net/mlx5e: Allow to match on ICMP parameters
Support matching on ICMPv4/6 type and code parameters using misc3
section of match parameters.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:34 -08:00
Paul Blakey 69e2916ebc net/mlx5: CT: Add support for mirroring
Add support for mirroring before the CT action by splitting the pre ct rule.
Mirror outputs are done first on the tc chain, prio table rule (the fwd
rule), which will then forward to a per port fwd table.
On this fwd table, we insert the original pre ct rule that forwards to
ct/ct nat table.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:33 -08:00
Alaa Hleihel 287e0df021 net/mlx5: Display the command index in command mailbox dump
Multiple commands can be printed at the same time which can
lead to wrong order of their lines in dmesg output.
As a result, it's hard to match data dumps to the correct command
or which command was fully dumped at some point.

Fix this by displaying the corresponding command index, and also
indicate when a command was fully dumped.

Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:33 -08:00
Arnd Bergmann 2119bda642 net/mlx5e: allocate 'indirection_rqt' buffer dynamically
Increasing the size of the indirection_rqt array from 128 to 256 bytes
pushed the stack usage of the mlx5e_hairpin_fill_rqt_rqns() function
over the warning limit when building with clang and CONFIG_KASAN:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:970:1: error: stack frame size of 1180 bytes in function 'mlx5e_tc_add_nic_flow' [-Werror,-Wframe-larger-than=]

Using dynamic allocation here is safe because the caller does the
same, and it reduces the stack usage of the function to just a few
bytes.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:32 -08:00
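The trade-off in commit 2119bda642 above (a large on-stack array versus a heap allocation) can be illustrated with a user-space sketch. This is not the mlx5e code: the function, the table size constant, and the round-robin fill are hypothetical, and calloc()/free() stand in for whatever kernel allocator the driver uses.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define EX_INDIR_RQT_SIZE 256  /* assumed number of table entries */

/*
 * Hypothetical helper: fill an indirection table round-robin across
 * num_channels queues and return the sum of all entries via out_sum.
 * The table lives on the heap, so the stack frame holds only a pointer.
 */
static int ex_fill_indirection_table(uint32_t *out_sum,
				     unsigned int num_channels)
{
	uint32_t *rqt;
	uint32_t sum = 0;
	unsigned int i;

	if (num_channels == 0)
		return -EINVAL;

	rqt = calloc(EX_INDIR_RQT_SIZE, sizeof(*rqt));
	if (!rqt)
		return -ENOMEM; /* allocation can fail; a stack array cannot */

	for (i = 0; i < EX_INDIR_RQT_SIZE; i++)
		rqt[i] = i % num_channels;

	for (i = 0; i < EX_INDIR_RQT_SIZE; i++)
		sum += rqt[i];

	*out_sum = sum;
	free(rqt);
	return 0;
}
```

The cost of this choice is a new failure path, which is acceptable when, as the commit notes, the caller already allocates dynamically and can propagate an error.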
Tariq Toukan e16cf9d754 net/mlx5e: Dump ICOSQ WQE descriptor on CQE with error events
Dump the ICOSQ's WQE descriptor when a completion with error is received.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:32 -08:00
Maxim Mikityanskiy 991b265460 net/mlx5e: Use net_prefetchw instead of prefetchw in MPWQE TX datapath
Commit e20f0dbf20 ("net/mlx5e: RX, Add a prefetch command for small
L1_CACHE_BYTES") switched to using net_prefetchw at all places in mlx5e.
In the same time frame, commit 5af75c747e ("net/mlx5e: Enhanced TX
MPWQE for SKBs") added one more usage of prefetchw. When these two
changes were merged, this new occurrence of prefetchw wasn't replaced
with net_prefetchw.

This commit fixes this last occurrence of prefetchw in
mlx5e_tx_mpwqe_session_start, making the same change that was done in
mlx5e_xdp_mpwqe_session_start.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:31 -08:00
Roi Dayan bca08a9145 net/mlx5e: Remove redundant newline in NL_SET_ERR_MSG_MOD
Fix the following coccicheck warnings:

drivers/net/ethernet/mellanox/mlx5/core/devlink.c:145:29-66: WARNING
avoid newline at end of message in NL_SET_ERR_MSG_MOD
drivers/net/ethernet/mellanox/mlx5/core/devlink.c:140:29-77: WARNING
avoid newline at end of message in NL_SET_ERR_MSG_MOD

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:31 -08:00
Mark Zhang 093bd76469 net/mlx5: Read congestion counters from all ports when lag is active
Read congestion counters from all ports in any lag mode rather than
only in RoCE lag mode (e.g., VF lag).

Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:31 -08:00
Jiapeng Chong 7976092241 net/mlx5: remove unneeded semicolon
Fix the following coccicheck warnings:

./drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c:495:2-3: Unneeded
semicolon.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:30 -08:00
Junlin Yang ad2c99ca75 net/mlx5: use kvfree() for memory allocated with kvzalloc()
The memory is allocated with kvzalloc(), so the corresponding release
function should not be kfree(); use kvfree() instead.

Generated by: scripts/coccinelle/api/kfree_mismatch.cocci

Signed-off-by: Junlin Yang <yangjunlin@yulong.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:30 -08:00
Yevgeny Kliteynik cc82a2e6c8 net/mlx5: DR, Add missing vhca_id consume from STEv1
The field source_eswitch_owner_vhca_id was not consumed
in the same way as in STEv0. Added the missing set.

Fixes: 10b6941864 ("net/mlx5: DR, Add HW STEv1 match logic")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:30 -08:00
Yevgeny Kliteynik 1412477882 net/mlx5: DR, Remove unneeded rx_decap_l3 function for STEv1
Remove the dr_ste_v1_set_rx_decap_l3 function that was
replaced by another function - fixing a rebase error.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-03-12 15:29:29 -08:00