When compiling YNL generated code compiler complains about
array-initializer-out-of-bounds. Turns out the MAX value
for STATS_GRP uses the value for STATS.
This may lead to random corruptions in user space (kernel
itself doesn't use this value as it never parses stats).
Fixes: f09ea6fb12 ("ethtool: add a new command for reading standard stats")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This attribute, which is part of ethtool's ring param configuration
allows the user to specify the maximum number of the packet's payload
that can be written directly to the device.
Example usage:
# ethtool -G [interface] tx-push-buf-len [number of bytes]
Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similar to what was done for TX_PUSH, add an RX_PUSH concept
to the ethtool interfaces.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IEEE 802.3-2018 clause 99 defines a MAC Merge sublayer which contains an
Express MAC and a Preemptible MAC. Both MACs are hidden to higher and
lower layers and visible as a single MAC (packet classification to eMAC
or pMAC on TX is done based on priority; classification on RX is done
based on SFD).
For devices which support a MAC Merge sublayer, it is desirable to
retrieve individual packet counters from the eMAC and the pMAC, as well
as aggregate statistics (their sum).
Introduce a new ETHTOOL_A_STATS_SRC attribute which is part of the
policy of ETHTOOL_MSG_STATS_GET and, and an ETHTOOL_A_PAUSE_STATS_SRC
which is part of the policy of ETHTOOL_MSG_PAUSE_GET (accepted when
ETHTOOL_FLAG_STATS is set in the common ethtool header). Both of these
take values from enum ethtool_mac_stats_src, defaulting to "aggregate"
in the absence of the attribute.
Existing drivers do not need to pay attention to this enum which was
added to all driver-facing structures, just the ones which report the
MAC merge layer as supported.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MAC merge sublayer (IEEE 802.3-2018 clause 99) is one of 2
specifications (the other being Frame Preemption; IEEE 802.1Q-2018
clause 6.7.2), which work together to minimize latency caused by frame
interference at TX. The overall goal of TSN is for normal traffic and
traffic with a bounded deadline to be able to cohabitate on the same L2
network and not bother each other too much.
The standards achieve this (partly) by introducing the concept of
preemptible traffic, i.e. Ethernet frames that have a custom value for
the Start-of-Frame-Delimiter (SFD), and these frames can be fragmented
and reassembled at L2 on a link-local basis. The non-preemptible frames
are called express traffic, they are transmitted using a normal SFD, and
they can preempt preemptible frames, therefore having lower latency,
which can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo
frames around 9K). Preemption is not recursive, i.e. a P frame cannot
preempt another P frame. Preemption also does not depend upon priority,
or otherwise said, an E frame with prio 0 will still preempt a P frame
with prio 7.
In terms of implementation, the standards talk about the presence of an
express MAC (eMAC) which handles express traffic, and a preemptible MAC
(pMAC) which handles preemptible traffic, and these MACs are multiplexed
on the same MII by a MAC merge layer.
To support frame preemption, the definition of the SFD was generalized
to SMD (Start-of-mPacket-Delimiter), where an mPacket is essentially an
Ethernet frame fragment, or a complete frame. Stations unaware of an SMD
value different from the standard SFD will treat P frames as error
frames. To prevent that from happening, a negotiation process is
defined.
On RX, packets are dispatched to the eMAC or pMAC after being filtered
by their SMD. On TX, the eMAC/pMAC classification decision is taken by
the 802.1Q spec, based on packet priority (each of the 8 user priority
values may have an admin-status of preemptible or express).
The MAC Merge layer and the Frame Preemption parameters have some degree
of independence in terms of how software stacks are supposed to deal
with them. The activation of the MM layer is supposed to be controlled
by an LLDP daemon (after it has been communicated that the link partner
also supports it), after which a (hardware-based or not) verification
handshake takes place, before actually enabling the feature. So the
process is intended to be relatively plug-and-play. Whereas FP settings
are supposed to be coordinated across a network using something
approximating NETCONF.
The support contained here is exclusively for the 802.3 (MAC Merge)
portions and not for the 802.1Q (Frame Preemption) parts. This API is
sufficient for an LLDP daemon to do its job. The FP adminStatus variable
from 802.1Q is outside the scope of an LLDP daemon.
I have taken a few creative licenses and augmented the Linux kernel UAPI
compared to the standard managed objects recommended by IEEE 802.3.
These are:
- ETHTOOL_A_MM_PMAC_ENABLED: According to Figure 99-6: Receive
Processing state diagram, a MAC Merge layer is always supposed to be
able to receive P frames. However, this implies keeping the pMAC
powered on, which will consume needless power in applications where FP
will never be used. If LLDP is used, the reception of an Additional
Ethernet Capabilities TLV from the link partner is sufficient
indication that the pMAC should be enabled. So my proposal is that in
Linux, we keep the pMAC turned off by default and that user space
turns it on when needed.
- ETHTOOL_A_MM_VERIFY_ENABLED: The IEEE managed object is called
aMACMergeVerifyDisableTx. I opted for consistency (positive logic) in
the boolean netlink attributes offered, so this is also positive here.
Other than the meaning being reversed, they correspond to the same
thing.
- ETHTOOL_A_MM_MAX_VERIFY_TIME: I found it most reasonable for a LLDP
daemon to maximize the verifyTime variable (delay between SMD-V
transmissions), to maximize its chances that the LP replies. IEEE says
that the verifyTime can range between 1 and 128 ms, but the NXP ENETC
stupidly keeps this variable in a 7 bit register, so the maximum
supported value is 127 ms. I could have chosen to hardcode this in the
LLDP daemon to a lower value, but why not let the kernel expose its
supported range directly.
- ETHTOOL_A_MM_TX_MIN_FRAG_SIZE: the standard managed object is called
aMACMergeAddFragSize, and expresses the "additional" fragment size
(on top of ETH_ZLEN), whereas this expresses the absolute value of the
fragment size.
- ETHTOOL_A_MM_RX_MIN_FRAG_SIZE: there doesn't appear to exist a managed
object mandated by the standard, but user space clearly needs to know
what is the minimum supported fragment size of our local receiver,
since LLDP must advertise a value no lower than that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the following ethtool tx aggregation parameters:
ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES
Maximum size in bytes of a tx aggregated block of frames.
ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES
Maximum number of frames that can be aggregated into a block.
ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS
Time in usecs after the first packet arrival in an aggregated
block for the block to be sent.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for configuring the PLCA Reconciliation Sublayer on
multi-drop PHYs that support IEEE802.3cg-2019 Clause 148 (e.g.,
10BASE-T1S). This patch adds the appropriate netlink interface
to ethtool.
Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add netlink based support for "ethtool -x <dev> [context x]"
command by implementing ETHTOOL_MSG_RSS_GET netlink message.
This is equivalent to functionality provided via ETHTOOL_GRSSH
in ioctl path. It sends RSS table, hash key and hash function
of an interface to user space.
This patch implements existing functionality available
in ioctl path and enables addition of new RSS context
based parameters in future.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Link: https://lore.kernel.org/r/20221202002555.241580-1-sudheer.mogilappagari@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The previous attempt to augment carrier_down (see Link)
was not met with much enthusiasm so let's do the simple
thing of exposing what some devices already maintain.
Add a common ethtool statistic for link going down.
Currently users have to maintain per-driver mapping
to extract the right stat from the vendor-specific ethtool -S
stats. carrier_down does not fit the bill because it counts
a lot of software related false positives.
Add the statistic to the extended link state API to steer
vendors towards implementing all of it.
Implement for bnxt and all Linux-controlled PHYs. mlx5 and (possibly)
enic also have a counter for this but I leave the implementation
to their maintainers.
Link: https://lore.kernel.org/r/20220520004500.2250674-1-kuba@kernel.org
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20221104190125.684910-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add interface to support Power Sourcing Equipment. At current step it
provides generic way to address all variants of PSE devices as defined
in IEEE 802.3-2018 but support only objects specified for IEEE 802.3-2018 104.4
PoDL Power Sourcing Equipment (PSE).
Currently supported and mandatory objects are:
IEEE 802.3-2018 30.15.1.1.3 aPoDLPSEPowerDetectionStatus
IEEE 802.3-2018 30.15.1.1.2 aPoDLPSEAdminState
IEEE 802.3-2018 30.15.1.2.1 acPoDLPSEAdminControl
This is minimal interface needed to control PSE on each separate
ethernet port but it provides not all mandatory objects specified in
IEEE 802.3-2018.
Since "PoDL PSE" and "PSE" have similar names, but some different values
I decide to not merge them and keep separate naming schema. This should
allow as to be as close to IEEE 802.3 spec as possible and avoid name
conflicts in the future.
This implementation is connected to PHYs instead of MACs because PSE
auto classification can potentially interfere with PHY auto negotiation.
So, may be some extra PHY related initialization will be needed.
With WIP version of ethtools interaction with PSE capable link looks
as following:
$ ip l
...
5: t1l1@eth0: <BROADCAST,MULTICAST> ..
...
$ ethtool --show-pse t1l1
PSE attributs for t1l1:
PoDL PSE Admin State: disabled
PoDL PSE Power Detection Status: disabled
$ ethtool --set-pse t1l1 podl-pse-admin-control enable
$ ethtool --show-pse t1l1
PSE attributs for t1l1:
PoDL PSE Admin State: enabled
PoDL PSE Power Detection Status: delivering power
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This adds support for rate matching (also known as rate adaptation) to
the phy subsystem. The general idea is that the phy interface runs at
one speed, and the MAC throttles the rate at which it sends packets to
the link speed. There's a good overview of several techniques for
achieving this at [1]. This patch adds support for three: pause-frame
based (such as in Aquantia phys), CRS-based (such as in 10PASS-TS and
2BASE-TL), and open-loop-based (such as in 10GBASE-W).
This patch makes a few assumptions and a few non assumptions about the
types of rate matching available. First, it assumes that different phys
may use different forms of rate matching. Second, it assumes that phys
can use rate matching for any of their supported link speeds (e.g. if a
phy supports 10BASE-T and XGMII, then it can adapt XGMII to 10BASE-T).
Third, it does not assume that all interface modes will use the same
form of rate matching. Fourth, it does not assume that all phy devices
will support rate matching (even if some do). Relaxing or strengthening
these (non-)assumptions could result in a different API. For example, if
all interface modes were assumed to use the same form of rate matching,
then a bitmask of interface modes supportting rate matching would
suffice.
For some better visibility into the process, the current rate matching
mode is exposed as part of the ethtool ksettings. For the moment, only
read access is supported. I'm not sure what userspace might want to
configure yet (disable it altogether, disable just one mode, specify the
mode to use, etc.). For the moment, since only pause-based rate
adaptation support is added in the next few commits, rate matching can
be disabled altogether by adjusting the advertisement.
802.3 calls this feature "rate adaptation" in clause 49 (10GBASE-R) and
"rate matching" in clause 61 (10PASS-TL and 2BASE-TS). Aquantia also calls
this feature "rate adaptation". I chose "rate matching" because it is
shorter, and because Russell doesn't think "adaptation" is correct in this
context.
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently tx push is a standard driver feature which controls use of a fast
path descriptor push. So this patch extends the ringparam APIs and data
structures to support set/get tx push by ethtool -G/g.
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For applications running on a mix of platforms it's useful
to have a clear indication whether host's NIC supports the
geometry requirements of TCP zero-copy. TCP zero-copy Rx
requires data to be neatly placed into memory pages.
Most NICs can't do that.
This patch is adding GET support only, since the NICs
I work with either always have the feature enabled or
enable it whenever MTU is set to jumbo. In other words
I don't need SET. But adding set should be trivial.
(The only note on SET is that we will likely want
the setting to be "sticky" and use 0 / `unknown`
to reset it back to driver default.)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to set rx buf len via ethtool -G parameter and get
rx buf len via ethtool -g parameter.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ETHTOOL_A_PAUSE_STAT_MAX is the MAX attribute id,
so we need to subtract non-stats and add one to
get a count (IOW -2+1 == -1).
Otherwise we'll see:
ethnl cmd 21: calculated reply length 40, but consumed 52
Fixes: 9a27a33027 ("ethtool: add standard pause stats")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a pair of new ethtool messages, 'ETHTOOL_MSG_MODULE_SET' and
'ETHTOOL_MSG_MODULE_GET', that can be used to control transceiver
modules parameters and retrieve their status.
The first parameter to control is the power mode of the module. It is
only relevant for paged memory modules, as flat memory modules always
operate in low power mode.
When a paged memory module is in low power mode, its power consumption
is reduced to the minimum, the management interface towards the host is
available and the data path is deactivated.
User space can choose to put modules that are not currently in use in
low power mode and transition them to high power mode before putting the
associated ports administratively up. This is useful for user space that
favors reduced power consumption and lower temperatures over reduced
link up times. In QSFP-DD modules the transition from low power mode to
high power mode can take a few seconds and this transition is only
expected to get longer with future / more complex modules.
User space can control the power mode of the module via the power mode
policy attribute ('ETHTOOL_A_MODULE_POWER_MODE_POLICY'). Possible
values:
* high: Module is always in high power mode.
* auto: Module is transitioned by the host to high power mode when the
first port using it is put administratively up and to low power mode
when the last port using it is put administratively down.
The operational power mode of the module is available to user space via
the 'ETHTOOL_A_MODULE_POWER_MODE' attribute. The attribute is not
reported to user space when a module is not plugged-in.
The user API is designed to be generic enough so that it could be used
for modules with different memory maps (e.g., SFF-8636, CMIS).
The only implementation of the device driver API in this series is for a
MAC driver (mlxsw) where the module is controlled by the device's
firmware, but it is designed to be generic enough so that it could also
be used by implementations where the module is controlled by the CPU.
CMIS testing
============
# ethtool -m swp11
Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
...
Module State : 0x03 (ModuleReady)
LowPwrAllowRequestHW : Off
LowPwrRequestSW : Off
The module is not in low power mode, as it is not forced by hardware
(LowPwrAllowRequestHW is off) or by software (LowPwrRequestSW is off).
The power mode can be queried from the kernel. In case
LowPwrAllowRequestHW was on, the kernel would need to take into account
the state of the LowPwrRequestHW signal, which is not visible to user
space.
$ ethtool --show-module swp11
Module parameters for swp11:
power-mode-policy high
power-mode high
Change the power mode policy to 'auto':
# ethtool --set-module swp11 power-mode-policy auto
Query the power mode again:
$ ethtool --show-module swp11
Module parameters for swp11:
power-mode-policy auto
power-mode low
Verify with the data read from the EEPROM:
# ethtool -m swp11
Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
...
Module State : 0x01 (ModuleLowPwr)
LowPwrAllowRequestHW : Off
LowPwrRequestSW : On
Put the associated port administratively up which will instruct the host
to transition the module to high power mode:
# ip link set dev swp11 up
Query the power mode again:
$ ethtool --show-module swp11
Module parameters for swp11:
power-mode-policy auto
power-mode high
Verify with the data read from the EEPROM:
# ethtool -m swp11
Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
...
Module State : 0x03 (ModuleReady)
LowPwrAllowRequestHW : Off
LowPwrRequestSW : Off
Put the associated port administratively down which will instruct the
host to transition the module to low power mode:
# ip link set dev swp11 down
Query the power mode again:
$ ethtool --show-module swp11
Module parameters for swp11:
power-mode-policy auto
power-mode low
Verify with the data read from the EEPROM:
# ethtool -m swp11
Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
...
Module State : 0x01 (ModuleLowPwr)
LowPwrAllowRequestHW : Off
LowPwrRequestSW : On
SFF-8636 testing
================
# ethtool -m swp13
Identifier : 0x11 (QSFP28)
...
Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled
Power set : Off
Power override : On
...
Transmit avg optical power (Channel 1) : 0.7733 mW / -1.12 dBm
Transmit avg optical power (Channel 2) : 0.7649 mW / -1.16 dBm
Transmit avg optical power (Channel 3) : 0.7790 mW / -1.08 dBm
Transmit avg optical power (Channel 4) : 0.7837 mW / -1.06 dBm
Rcvr signal avg optical power(Channel 1) : 0.9302 mW / -0.31 dBm
Rcvr signal avg optical power(Channel 2) : 0.9079 mW / -0.42 dBm
Rcvr signal avg optical power(Channel 3) : 0.8993 mW / -0.46 dBm
Rcvr signal avg optical power(Channel 4) : 0.8778 mW / -0.57 dBm
The module is not in low power mode, as it is not forced by hardware
(Power override is on) or by software (Power set is off).
The power mode can be queried from the kernel. In case Power override
was off, the kernel would need to take into account the state of the
LPMode signal, which is not visible to user space.
$ ethtool --show-module swp13
Module parameters for swp13:
power-mode-policy high
power-mode high
Change the power mode policy to 'auto':
# ethtool --set-module swp13 power-mode-policy auto
Query the power mode again:
$ ethtool --show-module swp13
Module parameters for swp13:
power-mode-policy auto
power-mode low
Verify with the data read from the EEPROM:
# ethtool -m swp13
Identifier : 0x11 (QSFP28)
Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled
Power set : On
Power override : On
...
Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm
Put the associated port administratively up which will instruct the host
to transition the module to high power mode:
# ip link set dev swp13 up
Query the power mode again:
$ ethtool --show-module swp13
Module parameters for swp13:
power-mode-policy auto
power-mode high
Verify with the data read from the EEPROM:
# ethtool -m swp13
Identifier : 0x11 (QSFP28)
...
Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled
Power set : Off
Power override : On
...
Transmit avg optical power (Channel 1) : 0.7934 mW / -1.01 dBm
Transmit avg optical power (Channel 2) : 0.7859 mW / -1.05 dBm
Transmit avg optical power (Channel 3) : 0.7885 mW / -1.03 dBm
Transmit avg optical power (Channel 4) : 0.7985 mW / -0.98 dBm
Rcvr signal avg optical power(Channel 1) : 0.9325 mW / -0.30 dBm
Rcvr signal avg optical power(Channel 2) : 0.9034 mW / -0.44 dBm
Rcvr signal avg optical power(Channel 3) : 0.9086 mW / -0.42 dBm
Rcvr signal avg optical power(Channel 4) : 0.8885 mW / -0.51 dBm
Put the associated port administratively down which will instruct the
host to transition the module to low power mode:
# ip link set dev swp13 down
Query the power mode again:
$ ethtool --show-module swp13
Module parameters for swp13:
power-mode-policy auto
power-mode low
Verify with the data read from the EEPROM:
# ethtool -m swp13
Identifier : 0x11 (QSFP28)
...
Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled
Power set : On
Power override : On
...
Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm
Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm
Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, there are many drivers who support CQE mode configuration,
some configure it as a fixed when initialized, some provide an
interface to change it by ethtool private flags. In order to make it
more generic, add two new 'ETHTOOL_A_COALESCE_USE_CQE_TX' and
'ETHTOOL_A_COALESCE_USE_CQE_RX' coalesce attributes, then these
parameters can be accessed by ethtool netlink coalesce uAPI.
Also add an new structure kernel_ethtool_coalesce, then the
new parameter can be added into this struct.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add an interface for getting PHC (PTP Hardware Clock)
virtual clocks, which are based on PHC physical clock
providing hardware timestamp to network packets.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'ETHTOOL_A_MODULE_EEPROM_DATA' is a binary attribute, not a nested one.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most devices maintain RMON (RFC 2819) stats - particularly
the "histogram" of packets received by size. Unlike other
RFCs which duplicate IEEE stats, the short/oversized frame
counters in RMON don't seem to match IEEE stats 1-to-1 either,
so expose those, too. Do not expose basic packet, CRC errors
etc - those are already otherwise covered.
Because standard defines packet ranges only up to 1518, and
everything above that should theoretically be "oversized"
- devices often create their own ranges.
Going beyond what the RFC defines - expose the "histogram"
in the Tx direction (assume for now that the ranges will
be the same).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Number of devices maintains the standard-based MAC control
counters for control frames. Add a API for those.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the MAC statistics are included in
struct rtnl_link_stats64, but some fields
are aggregated. Besides it's good to expose
these clearly hardware stats separately.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an interface for reading standard stats, including
stats which don't have a corresponding control interface.
Start with IEEE 802.3 PHY stats. There seems to be only
one stat to expose there.
Define API to not require user space changes when new
stats or groups are added. Groups are based on bitset,
stats have a string set associated.
v1: wrap stats in a nest
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to pause statistics add stats for FEC.
The IEEE standard mandates two sets of counters:
- 30.5.1.1.17 aFECCorrectedBlocks
- 30.5.1.1.18 aFECUncorrectableBlocks
where block is a block of bits FEC operates on.
Each of these counters is defined per lane (PCS instance).
Multiple vendors provide number of corrected _bits_ rather
than/as well as blocks.
This set adds the 2 standard-based block counters and a extra
one for corrected bits.
Counters are exposed to user space via netlink in new attributes.
Each attribute carries an array of u64s, first element is
the total count, and the following ones are a per-lane break down.
Much like with pause stats the operation will not fail when driver
does not implement the get_fec_stats callback (nor can the driver
fail the operation by returning an error). If stats can't be
reported the relevant attributes will be empty.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define get_module_eeprom_by_page() ethtool callback and implement
netlink infrastructure.
get_module_eeprom_by_page() allows network drivers to dump a part of
module's EEPROM specified by page and bank numbers along with offset and
length. It is effectively a netlink replacement for get_module_info()
and get_module_eeprom() pair, which is needed due to emergence of
complex non-linear EEPROM layouts.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add FEC API to netlink.
This is not a 1-to-1 conversion.
FEC settings already depend on link modes to tell user which
modes are supported. Take this further an use link modes for
manual configuration. Old struct ethtool_fecparam is still
used to talk to the drivers, so we need to translate back
and forth. We can revisit the internal API if number of FEC
encodings starts to grow.
Enforce only one active FEC bit (by using a bit position
rather than another mask).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when auto negotiation is on, the user can advertise all the
linkmodes which correspond to a specific speed, but does not have a
similar selector for the number of lanes. This is significant when a
specific speed can be achieved using different number of lanes. For
example, 2x50 or 4x25.
Add 'ETHTOOL_A_LINKMODES_LANES' attribute and expand 'struct
ethtool_link_settings' with lanes field in order to implement a new
lanes-selector that will enable the user to advertise a specific number
of lanes as well.
When auto negotiation is off, lanes parameter can be forced only if the
driver supports it. Add a capability bit in 'struct ethtool_ops' that
allows ethtool know if the driver can handle the lanes parameter when
auto negotiation is off, so if it does not, an error message will be
returned when trying to set lanes.
Example:
$ ethtool -s swp1 lanes 4
$ ethtool swp1
Settings for swp1:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseKX/Full
10000baseKR/Full
40000baseCR4/Full
40000baseSR4/Full
40000baseLR4/Full
25000baseCR/Full
25000baseSR/Full
50000baseCR2/Full
100000baseSR4/Full
100000baseCR4/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 40000baseCR4/Full
40000baseSR4/Full
40000baseLR4/Full
100000baseSR4/Full
100000baseCR4/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: Unknown!
Duplex: Unknown! (255)
Auto-negotiation: on
Port: Direct Attach Copper
PHYAD: 0
Transceiver: internal
Link detected: no
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Two minor conflicts:
1) net/ipv4/route.c, adding a new local variable while
moving another local variable and removing it's
initial assignment.
2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
One pretty prints the port mode differently, whilst another
changes the driver to try and obtain the port mode from
the port node rather than the switch node.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tunnel offload info code uses ETHTOOL_MSG_TUNNEL_INFO_GET message type (cmd
field in genetlink header) for replies to tunnel info netlink request, i.e.
the same value as the request have. This is a problem because we are using
two separate enums for userspace to kernel and kernel to userspace message
types so that this ETHTOOL_MSG_TUNNEL_INFO_GET (28) collides with
ETHTOOL_MSG_CABLE_TEST_TDR_NTF which is what message type 28 means for
kernel to userspace messages.
As the tunnel info request reached mainline in 5.9 merge window, we should
still be able to fix the reply message type without breaking backward
compatibility.
Fixes: c7d759eb7b ("ethtool: add tunnel info interface")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently drivers have to report their pause frames statistics
via ethtool -S, and there is a wide variety of names used for
these statistics.
Add the two statistics defined in IEEE 802.3x to the standard
API. Create a new ethtool request header flag for including
statistics in the response to GET commands.
Always create the ETHTOOL_A_PAUSE_STATS nest in replies when
flag is set. Testing if driver declares the op is not a reliable
way of checking if any stats will actually be included and therefore
we don't want to give the impression that presence of
ETHTOOL_A_PAUSE_STATS indicates driver support.
Note that this patch does not include PFC counters, which may fit
better in dcbnl? But mostly I don't need them/have a setup to test
them so I haven't looked deeply into exposing them :)
v3:
- add a helper for "uninitializing" stats, rather than a cryptic
memset() (Andrew)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an interface to report offloaded UDP ports via ethtool netlink.
Now that core takes care of tracking which UDP tunnel ports the NICs
are aware of we can quite easily export this information out to
user space.
The responsibility of writing the netlink dumps is split between
ethtool code and udp_tunnel_nic.c - since udp_tunnel module may
not always be loaded, yet we should always report the capabilities
of the NIC.
$ ethtool --show-tunnels eth0
Tunnel information for eth0:
UDP port table 0:
Size: 4
Types: vxlan
No entries
UDP port table 1:
Size: 4
Types: geneve, vxlan-gpe
Entries (1):
port 1230, vxlan-gpe
v4:
- back to v2, build fix is now directly in udp_tunnel.h
v3:
- don't compile ETHTOOL_MSG_TUNNEL_INFO_GET in if CONFIG_INET
not set.
v2:
- fix string set count,
- reorder enums in the uAPI,
- fix type of ETHTOOL_A_TUNNEL_UDP_TABLE_TYPES to bitset
in docs and comments.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, drivers can only tell whether the link is up/down using
LINKSTATE_GET, but no additional information is given.
Add attributes to LINKSTATE_GET command in order to allow drivers
to expose the user more information in addition to link state to ease
the debug process, for example, reason for link down state.
Extended state consists of two attributes - link_ext_state and
link_ext_substate. The idea is to avoid 'vendor specific' states in order
to prevent drivers to use specific link_ext_state that can be in the future
common link_ext_state.
The substates allows drivers to add more information to the common
link_ext_state. For example, vendor can expose 'Autoneg' as link_ext_state
and add 'No partner detected during force mode' as link_ext_substate.
If a driver cannot pinpoint the extended state with the substate
accuracy, it is free to expose only the extended state and omit the
substate attribute.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
Augusto von Dentz.
2) Add GSO partial support to igc, from Sasha Neftin.
3) Several cleanups and improvements to r8169 from Heiner Kallweit.
4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
device self-test. From Andrew Lunn.
5) Start moving away from custom driver versions, use the globally
defined kernel version instead, from Leon Romanovsky.
6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin.
7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.
8) Add sriov and vf support to hinic, from Luo bin.
9) Support Media Redundancy Protocol (MRP) in the bridging code, from
Horatiu Vultur.
10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.
11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
Dubroca. Also add ipv6 support for espintcp.
12) Lots of ReST conversions of the networking documentation, from Mauro
Carvalho Chehab.
13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
from Doug Berger.
14) Allow to dump cgroup id and filter by it in inet_diag code, from
Dmitry Yakunin.
15) Add infrastructure to export netlink attribute policies to
userspace, from Johannes Berg.
16) Several optimizations to sch_fq scheduler, from Eric Dumazet.
17) Fallback to the default qdisc if qdisc init fails because otherwise
a packet scheduler init failure will make a device inoperative. From
Jesper Dangaard Brouer.
18) Several RISCV bpf jit optimizations, from Luke Nelson.
19) Correct the return type of the ->ndo_start_xmit() method in several
drivers, it's netdev_tx_t but many drivers were using
'int'. From Yunjian Wang.
20) Add an ethtool interface for PHY master/slave config, from Oleksij
Rempel.
21) Add BPF iterators, from Yonghang Song.
22) Add cable test infrastructure, including ethool interfaces, from
Andrew Lunn. Marvell PHY driver is the first to support this
facility.
23) Remove zero-length arrays all over, from Gustavo A. R. Silva.
24) Calculate and maintain an explicit frame size in XDP, from Jesper
Dangaard Brouer.
25) Add CAP_BPF, from Alexei Starovoitov.
26) Support terse dumps in the packet scheduler, from Vlad Buslov.
27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.
28) Add devm_register_netdev(), from Bartosz Golaszewski.
29) Minimize qdisc resets, from Cong Wang.
30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
eliminate set_fs/get_fs calls. From Christoph Hellwig.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
selftests: net: ip_defrag: ignore EPERM
net_failover: fixed rollback in net_failover_open()
Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
vmxnet3: allow rx flow hash ops only when rss is enabled
hinic: add set_channels ethtool_ops support
selftests/bpf: Add a default $(CXX) value
tools/bpf: Don't use $(COMPILE.c)
bpf, selftests: Use bpf_probe_read_kernel
s390/bpf: Use bcr 0,%0 as tail call nop filler
s390/bpf: Maintain 8-byte stack alignment
selftests/bpf: Fix verifier test
selftests/bpf: Fix sample_cnt shared between two threads
bpf, selftests: Adapt cls_redirect to call csum_level helper
bpf: Add csum_level helper for fixing up csum levels
bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
crypto/chtls: IPv6 support for inline TLS
Crypto/chcr: Fixes a coccinile check error
Crypto/chcr: Fixes compilations warnings
...
Allow the user to configure where on the cable the TDR data should be
retrieved, in terms of first and last sample, and the step between
samples. Also add the ability to ask for TDR data for just one pair.
If this configuration is not provided, it defaults to 1-150m at 1m
intervals for all pairs.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v3:
Move the TDR configuration into a structure
Add a range check on step
Use NL_SET_ERR_MSG_ATTR() when appropriate
Move TDR configuration into a nest
Document attributes in the request
Signed-off-by: David S. Miller <davem@davemloft.net>
Some Ethernet PHYs can return the raw time domain reflectromatry data.
Add the attributes to allow this data to be requested and returned via
netlink ethtool.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2:
m -> cm
Report what the PHY actually used for start/stop/step.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the attributes needed to report cable test results to userspace.
The reports are expected to be per twisted pair. A nested property per
pair can report the result of the cable test. A nested property can
also report the length of the cable to any fault.
v2:
Grammar fixes
Change length from u16 to u32
s/DEV/HEADER/g
Add status attributes
Rename pairs from numbers to letters.
v3:
Fixed example in document
Add ETHTOOL_A_CABLE_NEST_* enum
Add ETHTOOL_MSG_CABLE_TEST_NTF to documentation
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add new ethtool netlink calls to trigger the starting of a PHY cable
test.
Add Kconfig'ury to ETHTOOL_NETLINK so that PHYLIB is not a module when
ETHTOOL_NETLINK is builtin, which would result in kernel linking errors.
v2:
Remove unwanted white space change
Remove ethnl_cable_test_act_ops and use doit handler
Rename cable_test_set_policy cable_test_act_policy
Remove ETHTOOL_MSG_CABLE_TEST_ACT_REPLY
v3:
Remove ETHTOOL_MSG_CABLE_TEST_ACT_REPLY from documentation
Remove unused cable_test_get_policy
Add Reviewed-by tags
v4:
Remove unwanted blank line
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This UAPI is needed for BroadR-Reach 100BASE-T1 devices. Due to lack of
auto-negotiation support, we needed to be able to configure the
MASTER-SLAVE role of the port manually or from an application in user
space.
The same UAPI can be used for 1000BASE-T or MultiGBASE-T devices to
force MASTER or SLAVE role. See IEEE 802.3-2018:
22.2.4.3.7 MASTER-SLAVE control register (Register 9)
22.2.4.3.8 MASTER-SLAVE status register (Register 10)
40.5.2 MASTER-SLAVE configuration resolution
45.2.1.185.1 MASTER-SLAVE config value (1.2100.14)
45.2.7.10 MultiGBASE-T AN control 1 register (Register 7.32)
The MASTER-SLAVE role affects the clock configuration:
-------------------------------------------------------------------------------
When the PHY is configured as MASTER, the PMA Transmit function shall
source TX_TCLK from a local clock source. When configured as SLAVE, the
PMA Transmit function shall source TX_TCLK from the clock recovered from
data stream provided by MASTER.
iMX6Q KSZ9031 XXX
------\ /-----------\ /------------\
| | | | |
MAC |<----RGMII----->| PHY Slave |<------>| PHY Master |
|<--- 125 MHz ---+-<------/ | | \ |
------/ \-----------/ \------------/
^
\-TX_TCLK
-------------------------------------------------------------------------------
Since some clock or link related issues are only reproducible in a
specific MASTER-SLAVE-role, MAC and PHY configuration, it is beneficial
to provide generic (not 100BASE-T1 specific) interface to the user space
for configuration flexibility and trouble shooting.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several references got broken due to txt to ReST conversion.
Several of them can be automatically fixed with:
scripts/documentation-file-ref-check --fix
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> # hwtracing/coresight/Kconfig
Reviewed-by: Paul E. McKenney <paulmck@kernel.org> # memory-barrier.txt
Acked-by: Alex Shi <alex.shi@linux.alibaba.com> # translations/zh_CN
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it> # translations/it_IT
Acked-by: Marc Zyngier <maz@kernel.org> # kvm/arm64
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6f919ddb83a33b5f2a63b6b5f0575737bb2b36aa.1586881715.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Implement TSINFO_GET request to get timestamping information for a network
device. This is traditionally available via ETHTOOL_GET_TS_INFO ioctl
request.
Move part of ethtool_get_ts_info() into common.c so that ioctl and netlink
code use the same logic to get timestamping information from the device.
v3: use "TSINFO" rather than "TIMESTAMP", suggested by Richard Cochran
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_EEE_NTF notification whenever EEE settings of a network
device are modified using ETHTOOL_MSG_EEE_SET netlink message or
ETHTOOL_SEEE ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement EEE_SET netlink request to set EEE settings of a network device.
These are traditionally set with ETHTOOL_SEEE ioctl request.
The netlink interface allows setting the EEE status for all link modes
supported by kernel but only first 32 link modes can be set at the moment
as only those are supported by the ethtool_ops callback.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement EEE_GET request to get EEE settings of a network device. These
are traditionally available via ETHTOOL_GEEE ioctl request.
The netlink interface allows reporting EEE status for all link modes
supported by kernel but only first 32 link modes are provided at the moment
as only those are reported by the ethtool_ops callback and drivers.
v2: fix alignment (whitespace only)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_PAUSE_NTF notification whenever pause parameters of
a network device are modified using ETHTOOL_MSG_PAUSE_SET netlink message
or ETHTOOL_SPAUSEPARAM ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement PAUSE_SET netlink request to set pause parameters of a network
device. Thease are traditionally set with ETHTOOL_SPAUSEPARAM ioctl
request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement PAUSE_GET request to get pause parameters of a network device.
These are traditionally available via ETHTOOL_GPAUSEPARAM ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_COALESCE_NTF notification whenever coalescing parameters
of a network device are modified using ETHTOOL_MSG_COALESCE_SET netlink
message or ETHTOOL_SCOALESCE ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement COALESCE_SET netlink request to set coalescing parameters of
a network device. These are traditionally set with ETHTOOL_SCOALESCE ioctl
request. This commit adds only support for device coalescing parameters,
not per queue coalescing parameters.
Like the ioctl implementation, the generic ethtool code checks if only
supported parameters are modified; if not, first offending attribute is
reported using extack.
v2: fix alignment (whitespace only)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>