"Unsupported key used:" appears in kernel log when flows with
unsupported key are used, arp fields for example.
OpenVSwitch was changed to match on arp fields by default that
caused this warning to appear in kernel log for every arp rule, which
can be a lot.
Fix by lowering print level from warning to debug.
Fixes: e3a2b7ed01 ("net/mlx5e: Support offload cls_flower with drop action")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Instead of directly return, goto the error handling label to free
allocated page.
Fixes: 5f29458b77 ("net/mlx5e: Support dump callback in TX reporter")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
rate_bytes_ps is a 64-bit field. It passed as 32-bit field to
apply_police_params(). Due to this when police rate is higher
than 4Gbps, 32-bit calculation ignores the carry. This results
in incorrect rate configurationn the device.
Fix it by performing 64-bit calculation.
Fixes: fcb64c0f56 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When we create the ft object we also init rhltable in ft->fgs_hash.
So in error flow before kfree of ft we need to destroy that rhltable.
Fixes: 693c6883bb ("net/mlx5: Add hash table for flow groups in flow table")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link speed advertising in igc has two problems:
- When setting the advertisement via ethtool, the link speed is converted
to the legacy 32 bit representation for the intel PHY code.
This inadvertently drops ETHTOOL_LINK_MODE_2500baseT_Full_BIT (being
beyond bit 31). As a result, any call to `ethtool -s ...' drops the
2500Mbit/s link speed from the PHY settings. Only reloading the driver
alleviates that problem.
Fix this by converting the ETHTOOL_LINK_MODE_2500baseT_Full_BIT to the
Intel PHY ADVERTISE_2500_FULL bit explicitly.
- Rather than checking the actual PHY setting, the .get_link_ksettings
function always fills link_modes.advertising with all link speeds
the device is capable of.
Fix this by checking the PHY autoneg_advertised settings and report
only the actually advertised speeds up to ethtool.
Fixes: 8c5ad0dae9 ("igc: Add ethtool support")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This change simplifies the VF initialization check and also minimizes
the delay between acquiring the VSI pointer and using it. As known by
the commit being fixed, there is a risk of the VSI pointer getting
changed. Therefore minimize the delay between getting and using the
pointer.
Fixes: 9889707b06 ("i40e: Fix crash caused by stress setting of VF MAC addresses")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The current MSI-X enablement logic tries to enable best-case MSI-X
vectors and if that fails we only support a bare-minimum set. This
includes a single MSI-X for 1 Tx and 1 Rx queue and a single MSI-X
for the OICR interrupt. Unfortunately, the driver fails to load when we
don't get as many MSI-X as requested for a couple reasons.
First, the code to allocate MSI-X in the driver tries to allocate
num_online_cpus() MSI-X for LAN traffic without caring about the number
of MSI-X actually enabled/requested from the kernel for LAN traffic.
So, when calling ice_get_res() for the PF VSI, it returns failure
because the number of available vectors is less than requested. Fix
this by not allowing the PF VSI to allocation more than
pf->num_lan_msix MSI-X vectors and pf->num_lan_msix Rx/Tx queues.
Limiting the number of queues is done because we don't want more than
1 Tx/Rx queue per interrupt due to performance conerns.
Second, the driver assigns pf->num_lan_msix = 2, to account for LAN
traffic and the OICR. However, pf->num_lan_msix is only meant for LAN
MSI-X. This is causing a failure when the PF VSI tries to
allocate/reserve the minimum pf->num_lan_msix because the OICR MSI-X has
already been reserved, so there may not be enough MSI-X vectors left.
Fix this by setting pf->num_lan_msix = 1 for the failure case. Then the
ICE_MIN_MSIX accounts for the LAN MSI-X and the OICR MSI-X needed for
the failure case.
Update the related defines used in ice_ena_msix_range() to align with
the above behavior and remove the unused RDMA defines because RDMA is
currently not supported. Also, remove the now incorrect comment.
Fixes: 152b978a1f ("ice: Rework ice_ena_msix_range")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently users could create more channels than LAN MSI-X available.
This is happening because there is no check against pf->num_lan_msix
when checking the max allowed channels and will cause performance issues
if multiple Tx and Rx queues are tied to a single MSI-X. Fix this by not
allowing more channels than LAN MSI-X available in pf->num_lan_msix.
Fixes: 87324e747f ("ice: Implement ethtool ops for channels")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix the driver to copy the MAC address configured in ndo_set_mac_address
into dev_addr, even if the MAC filter already exists in HW. In some
situations (e.g. bonding) the netdev's dev_addr could have been modified
outside of the driver, with no change to the HW filter, so the driver
cannot assume that they match.
Fixes: 757976ab16 ("ice: Fix check for removing/adding mac filters")
Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This patch is based on a similar change to i40e by Slawomir Laba:
"i40e: Implement flow for IPv6 next header (extension header)".
When a packet contains an IPv6 header with next header which is
an extension header and not a protocol one, the kernel function
skb_transport_header called with such sk_buff will return a
pointer to the extension header and not to the TCP one.
The above explained call caused a problem with packet processing
for skb with encapsulation for tunnel with ICE_TX_CTX_EIPT_IPV6.
The extension header was not skipped at all.
The ipv6_skip_exthdr function does check if next header of the IPV6
header is an extension header and doesn't modify the l4_proto pointer
if it points to a protocol header value so its safe to omit the
comparison of exthdr and l4.hdr pointers. The ipv6_skip_exthdr can
return value -1. This means that the skipping process failed
and there is something wrong with the packet so it will be dropped.
Fixes: a4e82a81f5 ("ice: Add support for tunnel offloads")
Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The packet classifier would occasionally misrecognize an IPv6 training
packet when the next protocol field was 0. The correct value for
unspecified protocol is IPPROTO_NONE.
Fixes: 165d80d6ad ("ice: Support IPv6 Flow Director filters")
Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Once the firmware fatal condition is detected, we should cease
comminication with the firmware and hardware quickly even if there
are many completion entries in the completion rings. This will
speed up the recovery process and prevent further I/Os that may
cause further exceptions.
Do not proceed in the NAPI poll function if fatal condition is
detected. Call napi_complete() and return without arming interrupts.
Cleanup of all rings and reset are imminent.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Combine the three netdev_warn() calls into a single call, printed at
the NETIF_MSG_HW log level.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the event of a fatal firmware error, firmware will notify the host
and then it will proceed to do core reset when it sees that all functions
have disabled Bus Master. To prevent Master Aborts and other hard
errors, we need to quiesce all activities in addition to disabling Bus
Master before the chip goes into core reset.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the event of a fatal firmware error, we want to disable IRQ early
in the recovery sequence. This change will allow it to be called
safely again as part of the normal shutdown sequence.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Up until now, we don't need to keep track of this state because NAPI
is always enabled once and disabled once during bring up and shutdown.
For better error recovery in subsequent patches, we want to quiesce
the device earlier during fatal error conditions. The normal shutdown
sequence will disable NAPI again and the flag will prevent disabling
NAPI twice.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This code to check if we have reached the maximum wait time after
firmware reset is used multiple times. Add a helper function to
do this.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Firmware may be in the middle of reset when the driver tries to do ifup.
In that case, firmware will return a special error code and the driver
will retry 10 times with 50 msecs delay after each retry.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Drawing a hard line on aborted resets prevents a NIC open in
some scenarios that may otherwise be recoverable. For example,
if a firmware recovery happened while a PF was down and an
attempt was made to bring up an associated VF in this state,
then it was impossible to ever bring up this VF without a
rebind or reload of its driver.
Attempt to reinitialize the firmware when an aborted reset (or
failed init after a reset) is discovered during open - it may
succeed. Also take care to allow the user to retry opening the
NIC even after an aborted reset.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Firmware is capable of generating asynchronous debug notifications.
The event data is opaque to the driver and is simply logged. Debug
notifications can be enabled by turning on hardware status messages
using the ethtool msglvl interface.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The timeout period for firmware messages is passed to the driver
from the firmware in the response of the first command. This
timeout period is multiplied by a factor for certain long
running commands such as NVRAM commands. In some cases, the
timeout period can become really long and it can cause hung task
warnings if firmware has crashed or is not responding. To avoid
such long delays, cap all firmware commands to a max timeout value
of 40 seconds.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If firmware is in reset or in bad state, it won't be able to return
VPD data. Move bnxt_vpd_read_info() until after bnxt_fw_init_one_p1()
successfully returns. By then we would have established proper
communications with the firmware.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The first HWRM_VER_GET message to firmware during probe may timeout if
firmware is under reset. This can happen during hot-plug for example.
On P5 and newer chips, we can check if firmware is in the boot stage by
reading a status register. Retry 5 times if the status register shows
that firmware is not ready and not in error state.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add missing support for handling NO_MASTER crashes while ports are
administratively down (ifdown). On some SoC platforms, the driver
needs to assist the firmware to recover from a crash via OP-TEE.
This is performed in a similar fashion to what is done during driver
probe.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Define macros to check for the various states in the lower 16 bits of
the health register. Replace the C code that checks for these values
with the newly defined macros.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Updates to backing store APIs, QoS profiles, and push buffer initial
index support.
Since the new HWRM_FUNC_BACKING_STORE_CFG message size has increased,
we need to add some compat. logic to fall back to the smaller legacy
size if firmware cannot accept the larger message size. The new fields
added to the structure are not used yet.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MT762x HW, except for MT7628, supports frame length up to 2048
(maximum length on GDM), so allow setting MTU up to 2030.
Also set the default frame length to the hardware default 1518.
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210125042046.5599-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Support SPI and sequence number fields of
ESP/AH header to be hashed for RSS. By default
ESP/AH fields are not considered for RSS and
needs to be set explicitly as below:
ethtool -U eth0 rx-flow-hash esp4 sdfn
or
ethtool -U eth0 rx-flow-hash ah4 sdfn
or
ethtool -U eth0 rx-flow-hash esp6 sdfn
or
ethtool -U eth0 rx-flow-hash ah6 sdfn
To disable hashing of ESP fields:
ethtool -U eth0 rx-flow-hash esp4 sd
or
ethtool -U eth0 rx-flow-hash ah4 sd
or
ethtool -U eth0 rx-flow-hash esp6 sd
or
ethtool -U eth0 rx-flow-hash ah6 sd
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/1611378552-13288-1-git-send-email-sundeep.lkml@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When working on the PCI VPD code I also tested with a Broadcom BCM95719
card. tg3 uses internal NVRAM access with this card, so I forced it to
PCI VPD mode for testing. PCI VPD access fails
(i + PCI_VPD_LRDT_TAG_SIZE + j > len) because only TG3_NVM_VPD_LEN (256)
bytes are read, but PCI VPD has 400 bytes on this card.
So add a constant TG3_NVM_PCI_VPD_MAX_LEN that defines the maximum
PCI VPD size. The actual VPD size is returned by pci_read_vpd().
In addition it's not worth looping over pci_read_vpd(). If we miss the
125ms timeout per VPD dword read then definitely something is wrong,
and if the tg3 module loading is killed then there's also not much
benefit in retrying the VPD read.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/cb9e9113-0861-3904-87e0-d4c4ab3c8860@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The mvpp2 is an Ethernet driver, and it implements MAC style time
stamping of PTP frames. It has no need of the expensive option to
enable PHY time stamping. Remove the incorrect dependency.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The dst entry should be released if no neighbour is found. Goto label
free_dst to fix the issue. Besides, the check of ndev against NULL is
redundant.
Signed-off-by: Pan Bian <bianpan2016@163.com>
Link: https://lore.kernel.org/r/20210121145738.51091-1-bianpan2016@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The switch ASIC has a limited capacity of physical ('flavour physical'
in devlink terminology) ports that it can support. While each system is
brought up with a different number of ports, this number can be
increased via splitting up to the ASIC's limit.
Expose physical ports as a devlink resource so that user space will have
visibility to the maximum number of ports that can be supported and the
current occupancy.
In addition, add a "Generic Resources" section in devlink-resource
documentation so the different drivers will be aligned by the same resource
name when exposing to user space.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit adds support for HTB offload in the mlx5e driver.
Performance:
NIC: Mellanox ConnectX-6 Dx
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (24 cores with HT)
100 Gbit/s line rate, 500 UDP streams @ ~200 Mbit/s each
48 traffic classes, flower used for steering
No shaping (rate limits set to 4 Gbit/s per TC) - checking for max
throughput.
Baseline: 98.7 Gbps, 8.25 Mpps
HTB: 6.7 Gbps, 0.56 Mpps
HTB offload: 95.6 Gbps, 8.00 Mpps
Limitations:
1. 256 leaf nodes, 3 levels of depth.
2. Granularity for ceil is 1 Mbit/s. Rates are converted to weights, and
the bandwidth is split among the siblings according to these weights.
Other parameters for classes are not supported.
Ethtool statistics support for QoS SQs are also added. The counters are
called qos_txN_*, where N is the QoS queue number (starting from 0, the
numeration is separate from the normal SQs), and * is the counter name
(the counters are the same as for the normal SQs).
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Without this change the driver tries to allocate too many queues,
breaching the number of available msi-x interrupts on machines
with many logical cpus and default adapter settings:
Insufficient resources for 12 XDP event queues (24 other channels, max 32)
Which in turn triggers EINVAL on XDP processing:
sfc 0000:86:00.0 ext0: XDP TX failed (-22)
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20210120212759.81548-1-ivan@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Also decrement the reference count of child device on error path.
Fixes: 3e782985cb ("net: ethernet: fec: Allow configuration of MDIO bus speed")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Link: https://lore.kernel.org/r/20210120122037.83897-1-bianpan2016@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The octeontx2 hardware needs the buffer to be 128 byte aligned.
But in the current implementation of napi_alloc_frag(), it can't
guarantee the return address is 128 byte aligned even the request size
is a multiple of 128 bytes, so we have to request an extra 128 bytes and
use the PTR_ALIGN() to make sure that the buffer is aligned correctly.
Fixes: 7a36e4918e ("octeontx2-pf: Use the napi_alloc_frag() to alloc the pool buffers")
Reported-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Tested-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://lore.kernel.org/r/20210121070906.25380-1-haokexin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Support changing the state of the SF port's function through devlink.
When activating the SF port's function, enable the hca in the device
followed by adding its auxiliary device.
When deactivating the SF port's function, delete its auxiliary device
followed by disabling the vHCA.
Port function attributes get/set callbacks are invoked with devlink
instance lock held. Such callbacks need to synchronize with sf port
table getting disabled either via sriov sysfs callback. Such callbacks
synchronize with table disable context holding table refcount.
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:88:88 state inactive opstate detached
$ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 state active
$ devlink port show ens2f0npf0sf88 -jp
{
"port": {
"pci/0000:06:00.0/32768": {
"type": "eth",
"netdev": "ens2f0npf0sf88",
"flavour": "pcisf",
"controller": 0,
"pfnum": 0,
"sfnum": 88,
"external": false,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:88:88",
"state": "active",
"opstate": "attached"
}
}
}
}
On port function activation, an auxiliary device is created in below
example.
$ devlink dev show
devlink dev show auxiliary/mlx5_core.sf.4
$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
To handle SF port management outside of the eswitch as independent
software layer, introduce eswitch notifier APIs so that mlx5 upper
layer who wish to support sf port management in switchdev mode can
perform its task whenever eswitch mode is set to switchdev or before
eswitch is disabled.
Initialize sf port table on such eswitch event.
Add SF port add and delete functionality in switchdev mode.
Destroy all SF ports when eswitch is disabled.
Expose SF port add and delete to user via devlink commands.
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
or by its unique port index:
$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
$ devlink port show ens2f0npf0sf88 -jp
{
"port": {
"pci/0000:06:00.0/32768": {
"type": "eth",
"netdev": "ens2f0npf0sf88",
"flavour": "pcisf",
"controller": 0,
"pfnum": 0,
"sfnum": 88,
"external": false,
"splittable": false,
"function": {
"hw_addr": "00:00:00:00:00:00",
"state": "inactive",
"opstate": "detached"
}
}
}
}
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add helpers to enable/disable eswitch port, register its devlink port and
load its representor.
Signed-off-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Prepare eswitch to handle SF vport during
(a) querying eswitch functions
(b) egress ACL creation
(c) account for SF vports in total vports calculation
Assign a dedicated placeholder for SFs vports and their representors.
They are placed after VFs vports and before ECPF vports as below:
[PF,VF0,...,VFn,SF0,...SFm,ECPF,UPLINK].
Change functions to map SF's vport numbers to indices when
accessing the vports or representors arrays, and vice versa.
Signed-off-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add auxiliary device driver for mlx5 subfunction auxiliary device.
A mlx5 subfunction is similar to PCI PF and VF. For a subfunction
an auxiliary device is created.
As a result, when mlx5 SF auxiliary device binds to the driver,
its netdev and rdma device are created, they appear as
$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4
$ ls -l /sys/class/net/eth1/device
/sys/class/net/eth1/device -> ../../../mlx5_core.sf.4
$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88
$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4
$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false
$ rdma link show mlx5_0/1
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88
$ rdma dev show
8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112
In future, devlink device instance name will adapt to have sfnum
annotation using either an alias or as devlink instance name described
in RFC [1].
[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Introduce API to add and delete an auxiliary device for an SF.
Each SF has its own dedicated window in the PCI BAR 2.
SF device is similar to PCI PF and VF that supports multiple class of
devices such as net, rdma and vdpa.
SF device will be added or removed in subsequent patch during SF
devlink port function state change command.
A subfunction device exposes user supplied subfunction number which will
be further used by systemd/udev to have deterministic name for its
netdevice and rdma device.
An mlx5 subfunction auxiliary device example:
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:00:00 state inactive opstate detached
$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
function:
hw_addr 00:00:00:00:88:88 state inactive opstate detached
$ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88 state active
On activation,
$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4
$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
vhca state events indicates change in the state of the vhca that may
occur due to a SF allocation, deallocation or enabling/disabling the
SF HCA.
Introduce vhca state event handler which will be used by SF devlink
port manager and SF hardware id allocator in subsequent patches
to act on the event.
This enables single entity to subscribe, query and rearm the event
for a function.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If the MII interface is used, the PHY is the clock master, thus don't
set the clock rate. On Zynq-7000, this will prevent the following
warning:
macb e000b000.ethernet eth0: unable to generate target frequency: 25000000 Hz
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210120194303.28268-1-michael@walle.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The tango4 platform is getting removed, and this driver has no
other known users, so it can be removed.
Cc: Marc Gonzalez <marc.w.gonzalez@free.fr>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Mans Rullgard <mans@mansr.com>
Link: https://lore.kernel.org/r/20210120150703.1629527-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The comment is quite weird, there is no such thing as a vendor-specific
VPD id. 0x82 is the value of PCI_VPD_LRDT_ID_STRING. So what we are
doing here is simply checking whether the byte at VPD address VPD_BASE
is a valid string LRDT, same as what is done a few lines later in
the code.
LRDT = Large Resource Data Tag, see PCI 2.2 spec, VPD chapter
v2:
- don't set VPD_BASE / VPD_BASE_OLD separately
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/644ef22f-e86a-5cc1-0f27-f873ab165696@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since kmalloc() is nowadays [1] guaranteed to return naturally
aligned (i.e., aligned to the size itself) memory for power-of-2
sizes, we don't need to over-allocate the align amount, compute an
aligned address within the allocation, and (for later freeing) also
storing the original pointer [2].
Instead, just round up the length we want to allocate to the alignment
requirements, then round that up to the next power of 2. In theory,
this could allocate up to about twice as much memory as we needed. In
practice, (a) kmalloc() would in most cases anyway return a
power-of-2-sized allocation and (b) with the default values of the
bdRingLen[RT]x fields, the length is already itself a power of 2
greater than the alignment.
So we actually end up saving memory compared to the current
situtation (e.g. for tx, we currently allocate 128+32 bytes, which
kmalloc() likely rounds up to 192 or 256; with this patch, we just
allocate 128 bytes.) Also struct ucc_geth_private becomes a little
smaller.
[1] 59bb47985c ("mm, sl[aou]b: guarantee natural alignment for
kmalloc(power-of-two)")
[2] That storing was anyway done in a u32, which works on 32 bit
machines, but is not very elegant and certainly makes a reader of the
code pause for a while.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The numQueuesTx and numQueuesRx members of struct ucc_geth_info are
never set to anything but 1, and never have been. It's unclear how
well the code supporting multiple queues would work. Until somebody
wants to play with enabling that, help the compiler eliminate a lot of
dead code and loops that are not really loops by creating static
inline helpers. If and when the numQueuesTx/numQueuesRx fields are
re-introduced, it suffices to update those helper to return the
appropriate field.
This cuts the .text segment of ucc_geth.o by 8%.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The translation from the ucc_geth_num_of_threads enum value to the
actual count can be written somewhat more compactly with a small
lookup table, allowing us to replace the four switch statements.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bd_mem_part member of ucc_geth_info always has the value
MEM_PART_SYSTEM, and AFAICT, there has never been any code setting it
to any other value. Moreover, muram is a somewhat precious resource,
so there's no point using that when normal memory serves just as well.
Apart from removing a lot of dead code, this is also motivated by
wanting to clean up the "store result from kmalloc() in a u32" mess.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These macros both have the value 32, there's no point first
initializing align to a lower value.
If anything, one could throw in a
BUILD_BUG_ON(UCC_GETH_TX_BD_RING_ALIGNMENT < 4), but it's not worth it
- lots of code depends on named constants having sensible values.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
struct ucc_geth_info is somewhat large, and on systems with only one
or two UCC instances, that just wastes a few KB of memory. So
allocate and populate a chunk of memory at probe time instead of
initializing them all during driver init.
Note that the existing "ug_info == NULL" check was dead code, as the
address of some static array element can obviously never be NULL.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reduce the code duplication a bit by moving the parsing of
rx-clock-name and the fallback handling to a helper function.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These fields are only used within ucc_geth_startup(), so they might as
well be local variables in that function rather than being stashed in
struct ucc_geth_private.
Aside from making that struct a tiny bit smaller, it also shortens
some lines (getting rid of pointless casts while here), and fixes the
problems with using IS_ERR_VALUE() on a u32 as explained in commit
800cd6fb76 ("soc: fsl: qe: change return type of cpm_muram_alloc()
to s32").
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These buffers have all just been handed out from qe_muram_alloc(), aka
cpm_muram_alloc(), and the helper cpm_muram_alloc_common() already
does
memset_io(cpm_muram_addr(start), 0, size);
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This removes the explicit NULL checks, and allows us to stop storing
at least some of the _offset values separately.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In theory, such a read-after-write might be required by the hardware,
but nothing in the data sheet suggests that to be the case. The name
test also suggests that it's some debug leftover.
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When has_prg_eth1_rgmii_rx_delay is true then we support RX delays
between 0ps and 3000ps in 200ps steps. Swap the validation of the RX
delay based on the has_prg_eth1_rgmii_rx_delay flag so the 200ps check
is now applied correctly on G12A SoCs (instead of only allow 0ps or
2000ps on G12A, but 0..3000ps in 200ps steps on older SoCs which don't
support that).
Fixes: de94fc104d ("net: stmmac: dwmac-meson8b: add support for the RGMII RX delay on G12A")
Reported-by: Martijn van Deventer <martijn@martijnvandeventer.nl>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://lore.kernel.org/r/20210119202424.591349-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Conflicts:
drivers/net/can/dev.c
commit 03f16c5075 ("can: dev: can_restart: fix use after free bug")
commit 3e77f70e73 ("can: dev: move driver related infrastructure into separate subdir")
Code move.
drivers/net/dsa/b53/b53_common.c
commit 8e4052c32d ("net: dsa: b53: fix an off by one in checking "vlan->vid"")
commit b7a9e0da2d ("net: switchdev: remove vid_begin -> vid_end range from VLAN objects")
Field rename.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On the error path, it should goto the error handling label to free
allocated memory rather than directly return.
Fixes: 31bc72d976 ("net: systemport: fetch and use clock resources")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210120044423.1704-1-bianpan2016@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Multicast entries in the MAC table use the high bits of the MAC
address to encode the ports that should get the packets. But this port
mask does not work for the CPU port, to receive these packets on the
CPU port the MAC_CPU_COPY flag must be set.
Because of this IPv6 was effectively not working because neighbor
solicitations were never received. This was not apparent before commit
9403c158 (net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb
entries) as the IPv6 entries were broken so all incoming IPv6
multicast was then treated as unknown and flooded on all ports.
To fix this problem rework the ocelot_mact_learn() to set the
MAC_CPU_COPY flag when a multicast entry that target the CPU port is
added. For this we have to read back the ports endcoded in the pseudo
MAC address by the caller. It is not a very nice design but that avoid
changing the callers and should make backporting easier.
Signed-off-by: Alban Bedel <alban.bedel@aerq.com>
Fixes: 9403c158b8 ("net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb entries")
Link: https://lore.kernel.org/r/20210119140638.203374-1-alban.bedel@aerq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the smsc911x driver has mininal power management: during
driver probe, the device is powered up, and during driver remove, it is
powered down.
Improve power management by making it more fine-grained:
1. Power the device down when driver probe is finished,
2. Power the device (down) when it is opened (closed),
3. Make sure the device is powered during PHY access.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20210118150857.796943-1-geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The TI AM64x SoCs Gigabit Ethernet Switch subsystem (CPSW3g NUSS) has three
ports (2 ext. ports) and provides Ethernet packet communication for the
device and can be configured in multi port mode or as an Ethernet switch.
This patch adds support for the corresponding CPSW3g version.
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The AM642x CPSW3g is similar to j721e-cpswxg except its ALE table size is
512 entries. Add entry for the same.
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use the glue layer's functions to convert the dma_addr_t to and from CPPI5
address (with the ASEL bits), which should be used within the descriptors
and data buffers.
- Per channel coherency support
The DMAs use the 'ASEL' bits to select data and configuration fetch path.
The ASEL bits are placed at the unused parts of any address field used by
the DMAs (pointers to descriptors, addresses in descriptors, ring base
addresses). The ASEL is not part of the address (the DMAs can address
48bits). Individual channels can be configured to be coherent (via ACP
port) or non coherent individually by configuring the ASEL to appropriate
value.
[1] https://lore.kernel.org/patchwork/cover/1350756/
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Co-developed-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For DMA API the DMA device should be used as cpsw does not accesses to
descriptors or data buffers in any ways. The DMA does.
Also, drop dma_coerce_mask_and_coherent() setting on CPSW device, as it
should be done by DMA driver which does data movement.
This is required for adding AM64x CPSW3g support where DMA coherency
supported per DMA channel.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Co-developed-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
sh_eth_close() does a synchronous power down of the device before
marking it closed. Revert the order, to make sure the device is never
marked opened while suspended.
While at it, use pm_runtime_put() instead of pm_runtime_put_sync(), as
there is no reason to do a synchronous power down.
Fixes: 7fa2955ff7 ("sh_eth: Fix sleeping function called from invalid context")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://lore.kernel.org/r/20210118150812.796791-1-geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This reverts commit fbdd0049d9.
Due to commit in fixes tag, netdevice events were received only in one net
namespace of mlx5_core_dev. Due to this when netdevice events arrive in
net namespace other than net namespace of mlx5_core_dev, they are missed.
This results in empty GID table due to RDMA device being detached from its
net device.
Hence, revert back to receive netdevice events in all net namespaces to
restore back RDMA functionality in non init_net net namespace. The
deadlock will have to be addressed in another patch.
Fixes: fbdd0049d9 ("RDMA/mlx5: Fix devlink deadlock on net namespace deletion")
Link: https://lore.kernel.org/r/20210117092633.10690-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC
checksum offload packet, and yet it also makes ixgbevf support SCTP
CRC checksum offload for UDP and GRE encapped packets, just as it
does in igb driver.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC
checksum offload packet, and yet it also makes ixgbe support SCTP
CRC checksum offload for UDP and GRE encapped packets, just as it
does in igb driver.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC
checksum offload packet, and yet it also makes igc support SCTP
CRC checksum offload for UDP and GRE encapped packets, just as it
does in igb driver.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC
checksum offload packet, and yet it also makes igbvf support SCTP
CRC checksum offload for UDP and GRE encapped packets, just as it
does in igb driver.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using skb_csum_is_sctp is a easier way to validate it's a SCTP
CRC checksum offload packet, and there is no need to parse the
packet to check its proto field, especially when it's a UDP or
GRE encapped packet.
So this patch also makes igb support SCTP CRC checksum offload
for UDP and GRE encapped packets.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch is to define a inline function skb_csum_is_sctp(), and
also replace all places where it checks if it's a SCTP CSUM skb.
This function would be used later in many networking drivers in
the following patches.
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wolfram reports that his R-Car H2-based Lager board can no longer be
rebooted in v5.11-rc1, as it crashes with an imprecise external abort.
The issue can be reproduced on other boards (e.g. Koelsch with R-Car
M2-W) too, if CONFIG_IP_PNP is disabled, and the Ethernet interface is
down at reboot time:
Unhandled fault: imprecise external abort (0x1406) at 0x00000000
pgd = (ptrval)
[00000000] *pgd=422b6835, *pte=00000000, *ppte=00000000
Internal error: : 1406 [#1] ARM
Modules linked in:
CPU: 0 PID: 1105 Comm: init Tainted: G W 5.10.0-rc1-00402-ge2f016cf7751 #1048
Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
PC is at sh_mdio_ctrl+0x44/0x60
LR is at sh_mmd_ctrl+0x20/0x24
...
Backtrace:
[<c0451f30>] (sh_mdio_ctrl) from [<c0451fd4>] (sh_mmd_ctrl+0x20/0x24)
r7:0000001f r6:00000020 r5:00000002 r4:c22a1dc4
[<c0451fb4>] (sh_mmd_ctrl) from [<c044fc18>] (mdiobb_cmd+0x38/0xa8)
[<c044fbe0>] (mdiobb_cmd) from [<c044feb8>] (mdiobb_read+0x58/0xdc)
r9:c229f844 r8:c0c329dc r7:c221e000 r6:00000001 r5:c22a1dc4 r4:00000001
[<c044fe60>] (mdiobb_read) from [<c044c854>] (__mdiobus_read+0x74/0xe0)
r7:0000001f r6:00000001 r5:c221e000 r4:c221e000
[<c044c7e0>] (__mdiobus_read) from [<c044c9d8>] (mdiobus_read+0x40/0x54)
r7:0000001f r6:00000001 r5:c221e000 r4:c221e458
[<c044c998>] (mdiobus_read) from [<c044d678>] (phy_read+0x1c/0x20)
r7:ffffe000 r6:c221e470 r5:00000200 r4:c229f800
[<c044d65c>] (phy_read) from [<c044d94c>] (kszphy_config_intr+0x44/0x80)
[<c044d908>] (kszphy_config_intr) from [<c044694c>] (phy_disable_interrupts+0x44/0x50)
r5:c229f800 r4:c229f800
[<c0446908>] (phy_disable_interrupts) from [<c0449370>] (phy_shutdown+0x18/0x1c)
r5:c229f800 r4:c229f804
[<c0449358>] (phy_shutdown) from [<c040066c>] (device_shutdown+0x168/0x1f8)
[<c0400504>] (device_shutdown) from [<c013de44>] (kernel_restart_prepare+0x3c/0x48)
r9:c22d2000 r8:c0100264 r7:c0b0d034 r6:00000000 r5:4321fedc r4:00000000
[<c013de08>] (kernel_restart_prepare) from [<c013dee0>] (kernel_restart+0x1c/0x60)
[<c013dec4>] (kernel_restart) from [<c013e1d8>] (__do_sys_reboot+0x168/0x208)
r5:4321fedc r4:01234567
[<c013e070>] (__do_sys_reboot) from [<c013e2e8>] (sys_reboot+0x18/0x1c)
r7:00000058 r6:00000000 r5:00000000 r4:00000000
[<c013e2d0>] (sys_reboot) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
As of commit e2f016cf77 ("net: phy: add a shutdown procedure"),
system reboot calls phy_disable_interrupts() during shutdown. As this
happens unconditionally, the PHY registers may be accessed while the
device is suspended, causing undefined behavior, which may crash the
system.
Fix this by wrapping the PHY bitbang accessors in the sh_eth driver by
wrappers that take care of Runtime PM, to resume the device when needed.
Reported-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When DEBUG is defined this error occurs
drivers/net/ethernet/hisilicon/hns/hns_enet.c:1505:36: error:
‘struct net_device’ has no member named ‘ae_handle’;
did you mean ‘rx_handler’?
assert(skb->queue_mapping < ndev->ae_handle->q_num);
^~~~~~~~~
ae_handle is an element of struct hns_nic_priv, so change
ndev to priv.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20210117191044.533725-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The blamed commit was too aggressive, and it made ocelot_netdevice_event
react only to network interface events emitted for the ocelot switch
ports.
In fact, only the PRECHANGEUPPER should have had that check.
When we ignore all events that are not for us, we miss the fact that the
upper of the LAG changes, and the bonding interface gets enslaved to a
bridge. This is an operation we could offload under certain conditions.
Fixes: 7afb3e575e ("net: mscc: ocelot: don't handle netdev events for other netdevs")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20210118135210.2666246-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Enable interrupt based Tx completions to improve latency for T5.
The consumer index (CIDX) will now come via interrupts so that Tx
SKBs can be freed up sooner in Rx path. Also, enforce CIDX flush
threshold override (CIDXFTHRESHO) to improve latency for slow
traffic. This ensures that the interrupt is generated immediately
whenever hardware catches up with driver (i.e. CIDX == PIDX is
reached), which is often the case for slow traffic.
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Link: https://lore.kernel.org/r/20210115102059.6846-1-rajur@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/toshiba/spider_net.c:263: warning: Function parameter or member 'hwdescr' not described in 'spider_net_get_descr_status'
drivers/net/ethernet/toshiba/spider_net.c:263: warning: Excess function parameter 'descr' description in 'spider_net_get_descr_status'
drivers/net/ethernet/toshiba/spider_net.c:554: warning: Function parameter or member 'netdev' not described in 'spider_net_get_multicast_hash'
drivers/net/ethernet/toshiba/spider_net.c:902: warning: Function parameter or member 't' not described in 'spider_net_cleanup_tx_ring'
drivers/net/ethernet/toshiba/spider_net.c:902: warning: Excess function parameter 'card' description in 'spider_net_cleanup_tx_ring'
drivers/net/ethernet/toshiba/spider_net.c:1074: warning: Function parameter or member 'card' not described in 'spider_net_resync_head_ptr'
drivers/net/ethernet/toshiba/spider_net.c🔢 warning: Function parameter or member 'napi' not described in 'spider_net_poll'
drivers/net/ethernet/toshiba/spider_net.c🔢 warning: Excess function parameter 'netdev' description in 'spider_net_poll'
drivers/net/ethernet/toshiba/spider_net.c:1278: warning: Function parameter or member 'p' not described in 'spider_net_set_mac'
drivers/net/ethernet/toshiba/spider_net.c:1278: warning: Excess function parameter 'ptr' description in 'spider_net_set_mac'
drivers/net/ethernet/toshiba/spider_net.c:1350: warning: Function parameter or member 'error_reg1' not described in 'spider_net_handle_error_irq'
drivers/net/ethernet/toshiba/spider_net.c:1350: warning: Function parameter or member 'error_reg2' not described in 'spider_net_handle_error_irq'
drivers/net/ethernet/toshiba/spider_net.c:1968: warning: Function parameter or member 't' not described in 'spider_net_link_phy'
drivers/net/ethernet/toshiba/spider_net.c:1968: warning: Excess function parameter 'data' description in 'spider_net_link_phy'
drivers/net/ethernet/toshiba/spider_net.c:2149: warning: Function parameter or member 'work' not described in 'spider_net_tx_timeout_task'
drivers/net/ethernet/toshiba/spider_net.c:2149: warning: Excess function parameter 'data' description in 'spider_net_tx_timeout_task'
drivers/net/ethernet/toshiba/spider_net.c:2182: warning: Function parameter or member 'txqueue' not described in 'spider_net_tx_timeout'
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/toshiba/ps3_gelic_net.c:1107: warning: Function parameter or member 'irq' not described in 'gelic_card_interrupt'
drivers/net/ethernet/toshiba/ps3_gelic_net.c:1107: warning: Function parameter or member 'ptr' not described in 'gelic_card_interrupt'
drivers/net/ethernet/toshiba/ps3_gelic_net.c:1407: warning: Function parameter or member 'txqueue' not described in 'gelic_net_tx_timeout'
drivers/net/ethernet/toshiba/ps3_gelic_net.c:1439: warning: Function parameter or member 'napi' not described in 'gelic_ether_setup_netdev_ops'
drivers/net/ethernet/toshiba/ps3_gelic_net.c:1639: warning: Function parameter or member 'dev' not described in 'ps3_gelic_driver_probe'
drivers/net/ethernet/toshiba/ps3_gelic_net.c:1795: warning: Function parameter or member 'dev' not described in 'ps3_gelic_driver_remove'
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fixes the following W=1 kernel build warning(s):
from drivers/net/ethernet/ibm/ibmvnic.c:35:
inlined from ‘handle_vpd_rsp’ at drivers/net/ethernet/ibm/ibmvnic.c:4124:3:
drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_field' not described in 'build_hdr_data'
drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'skb' not described in 'build_hdr_data'
drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_len' not described in 'build_hdr_data'
drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_data' not described in 'build_hdr_data'
drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_field' not described in 'create_hdr_descs'
drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_data' not described in 'create_hdr_descs'
drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'len' not described in 'create_hdr_descs'
drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_len' not described in 'create_hdr_descs'
drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'scrq_arr' not described in 'create_hdr_descs'
drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'txbuff' not described in 'build_hdr_descs_arr'
drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'num_entries' not described in 'build_hdr_descs_arr'
drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'hdr_field' not described in 'build_hdr_descs_arr'
drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'adapter' not described in 'do_change_param_reset'
drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'rwi' not described in 'do_change_param_reset'
drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'reset_state' not described in 'do_change_param_reset'
drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'adapter' not described in 'do_reset'
drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'rwi' not described in 'do_reset'
drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'reset_state' not described in 'do_reset'
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/ti/am65-cpts.c:736: warning: Function parameter or member 'en' not described in 'am65_cpts_rx_enable'
drivers/net/ethernet/ti/am65-cpts.c:736: warning: Excess function parameter 'skb' description in 'am65_cpts_rx_enable'
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/ti/am65-cpsw-qos.c:364: warning: Function parameter or member 'ndev' not described in 'am65_cpsw_timer_set'
drivers/net/ethernet/ti/am65-cpsw-qos.c:364: warning: Function parameter or member 'est_new' not described in 'am65_cpsw_timer_set'
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fixes the following W=1 kernel build warning(s):
drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'dev' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'desc' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'name' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'index' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'value' not described in 'try_toggle_control_gpio'
drivers/net/ethernet/smsc/smc91x.c:2200: warning: Function parameter or member 'nsdelay' not described in 'try_toggle_control_gpio'
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Using devlink-sb, we can configure 12/16 (the important 75%) of the
switch's controlling watermarks for congestion drops, and we can monitor
50% of the watermark occupancies (we can monitor the reservation
watermarks, but not the sharing watermarks, which are exposed as pool
sizes).
The following definitions can be made:
SB_BUF=0 # The devlink-sb for frame buffers
SB_REF=1 # The devlink-sb for frame references
POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances
# have one of these.
POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances
# have one of these.
Editing the hardware watermarks is done in the following way:
BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING
REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING
BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR
REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR
Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly
by modifying the corresponding pool size. By default, the pool size has
maximum size, so this can be skipped.
devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \
size 129840 thtype static
Since by default there is no buffer reservation, the above command has
maxed out BUF_COL_SHR_I(dp=0).
Configuring the per-port reservation watermark (P_RSRV) is done in the
following way:
devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \
pool $POOL_ING th 1000
The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this
command, the sharing watermarks are internally reconfigured with 1000
bytes less, i.e. from 129840 bytes to 128840 bytes.
Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in
the following way:
for tc in {0..7}; do
devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \
type ingress pool $POOL_ING \
th 3000
done
The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes.
The sharing watermarks are again reconfigured with 24000 bytes less.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is meant to be a gentle introduction into the world of watermarks
on ocelot. The code is placed in ocelot_devlink.c because it will be
integrated with devlink, even if it isn't right now.
My first step was intended to be to replicate the default configuration
of the congestion watermarks programatically, since they are now going
to be tuned by the user.
But after studying and understanding through trial and error how they
work, I now believe that the configuration used out of reset does not do
justice to the word "reservation", since the sum of all reservations
exceeds the total amount of resources (otherwise said, all reservations
cannot be fulfilled at the same time, which means that, contrary to the
reference manual, they don't guarantee anything).
As an example, here's a dump of the reservation watermarks for frame
buffers, for port 0 (for brevity, the ports 1-6 were omitted, but they
have the same configuration):
BUF_Q_RSRV_I(port 0, prio 0) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 1) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 2) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 3) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 4) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 5) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 6) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 7) = max 3000 bytes
Otherwise said, every port-tc has an ingress reservation of 3000 bytes,
and there are 7 ports in VSC9959 Felix (6 user ports and 1 CPU port).
Concentrating only on the ingress reservations, there are, in total,
8 [traffic classes] x 7 [ports] x 3000 [bytes] = 168,000 bytes of memory
reserved on ingress.
But, surprise, Felix only has 128 KB of packet buffer in total...
A similar thing happens with Seville, which has a larger packet buffer,
but also more ports, and the default configuration is also overcommitted.
This patch disables the (apparently) bogus reservations and moves all
resources to the shared area. This way, real reservations can be set up
by the user, using devlink-sb.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add devlink integration into the mscc_ocelot switchdev driver. All
physical ports (i.e. the unused ones as well) except the CPU port module
at ocelot->num_phys_ports are registered with devlink, and that requires
keeping the devlink_port structure outside struct ocelot_port_private,
since the latter has a 1:1 mapping with a struct net_device (which does
not exist for unused ports).
Since we use devlink_port_type_eth_set to link the devlink port to the
net_device, we can as well remove the .ndo_get_phys_port_name and
.ndo_get_port_parent_id implementations, since devlink takes care of
retrieving the port name and number automatically, once
.ndo_get_devlink_port is implemented.
Note that the felix DSA driver is already integrated with devlink by
default, since that is a thing that the DSA core takes care of. This is
the reason why these devlink stubs were put in ocelot_net.c and not in
the common library. It is also the reason why ocelot::devlink is a
pointer and not a full structure embedded inside struct ocelot: because
the mscc_ocelot driver allocates that by itself (as the container of
struct ocelot, in fact), but in the case of felix, it is DSA who
allocates the devlink, and felix just propagates the pointer towards
struct ocelot.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This is a leftover of commit 69df578c5f ("net: mscc: ocelot: eliminate
confusion between CPU and NPI port") which renamed that function.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We'll need to read back the watermark thresholds and occupancy from
hardware (for devlink-sb integration), not only to write them as we did
so far in ocelot_port_set_maxlen. So introduce 2 new functions in struct
ocelot_ops, similar to wm_enc, and implement them for the 3 supported
mscc_ocelot switches.
Remove the INUSE and MAXUSE unpacking helpers for the QSYS_RES_STAT
register, because that doesn't scale with the number of switches that
mscc_ocelot supports now. They have different bit widths for the
watermarks, and we need function pointers to abstract that difference
away.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of reading these values from the reference manual and writing
them down into the driver, it appears that the hardware gives us the
option of detecting them dynamically.
The number of frame references corresponds to what the reference manual
notes, however it seems that the frame buffers are reported as slightly
less than the books would indicate. On VSC9959 (Felix), the books say it
should have 128KB of packet buffer, but the registers indicate only
129840 bytes (126.79 KB). Also, the unit of measurement for FREECNT from
the documentation of all these devices is incorrect (taken from an older
generation). This was confirmed by Younes Leroul from Microchip support.
Not having anything better to do with these values at the moment* (this
will change soon), let's just print them.
*The frame buffer size is, in fact, used to calculate the tail dropping
watermarks.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-01-16
1) Extend atomic operations to the BPF instruction set along with x86-64 JIT support,
that is, atomic{,64}_{xchg,cmpxchg,fetch_{add,and,or,xor}}, from Brendan Jackman.
2) Add support for using kernel module global variables (__ksym externs in BPF
programs) retrieved via module's BTF, from Andrii Nakryiko.
3) Generalize BPF stackmap's buildid retrieval and add support to have buildid
stored in mmap2 event for perf, from Jiri Olsa.
4) Various fixes for cross-building BPF sefltests out-of-tree which then will
unblock wider automated testing on ARM hardware, from Jean-Philippe Brucker.
5) Allow to retrieve SOL_SOCKET opts from sock_addr progs, from Daniel Borkmann.
6) Clean up driver's XDP buffer init and split into two helpers to init per-
descriptor and non-changing fields during processing, from Lorenzo Bianconi.
7) Minor misc improvements to libbpf & bpftool, from Ian Rogers.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (41 commits)
perf: Add build id data in mmap2 event
bpf: Add size arg to build_id_parse function
bpf: Move stack_map_get_build_id into lib
bpf: Document new atomic instructions
bpf: Add tests for new BPF atomic operations
bpf: Add bitwise atomic instructions
bpf: Pull out a macro for interpreting atomic ALU operations
bpf: Add instructions for atomic_[cmp]xchg
bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
bpf: Move BPF_STX reserved field check into BPF_STX verifier code
bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
bpf: x86: Factor out a lookup table for some ALU opcodes
bpf: x86: Factor out emission of REX byte
bpf: x86: Factor out emission of ModR/M for *(reg + off)
tools/bpftool: Add -Wall when building BPF programs
bpf, libbpf: Avoid unused function warning on bpf_tail_call_static
selftests/bpf: Install btf_dump test cases
selftests/bpf: Fix installation of urandom_read
selftests/bpf: Move generated test files to $(TEST_GEN_FILES)
selftests/bpf: Fix out-of-tree build
...
====================
Link: https://lore.kernel.org/r/20210116012922.17823-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Defining DEBUG should only be done in development.
So remove DEBUG.
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20210115153128.131026-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In rvu_mbox_handler_cgx_mac_addr_get()
and rvu_mbox_handler_cgx_mac_addr_set(),
the msg is expected only from PFs that are mapped to CGX LMACs.
It should be checked before mapping,
so we add the is_cgx_config_permitted() in the functions.
Fixes: 96be2e0da8 ("octeontx2-af: Support for MAC address filters in CGX")
Signed-off-by: Yingjie Wang <wangyingjie55@126.com>
Reviewed-by: Geetha sowjanya<gakula@marvell.com>
Link: https://lore.kernel.org/r/1610719804-35230-1-git-send-email-wangyingjie55@126.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A subsequent patch will add additional atomic operations. These new
operations will use the same opcode field as the existing XADD, with
the immediate discriminating different operations.
In preparation, rename the instruction mode BPF_ATOMIC and start
calling the zero immediate BPF_ADD.
This is possible (doesn't break existing valid BPF progs) because the
immediate field is currently reserved MBZ and BPF_ADD is zero.
All uses are removed from the tree but the BPF_XADD definition is
kept around to avoid breaking builds for people including kernel
headers.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Björn Töpel <bjorn.topel@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-5-jackmanb@google.com
Removing initialization of nrxq and rxq_size in uld_info. As
ipsec uses nic queues only, there is no need to create uld
rx queues for ipsec.
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Link: https://lore.kernel.org/r/20210113044302.25522-1-ayush.sawal@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
prestera_bridge_port_vlan_add should have been called with vlan->vid,
however this was masked by the presence of the local vid variable and I
did not notice the build warning.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: b7a9e0da2d ("net: switchdev: remove vid_begin -> vid_end range from VLAN objects")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Taras Chornyi <tchornyi@marvell.com>
Link: https://lore.kernel.org/r/20210114083556.2274440-1-olteanv@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Synopsys TSN MAC supports Qbv base times in the past, but only up to a
certain limit. As a result, a taprio qdisc configuration with a small
base time (for example when treating the base time as a simple phase
offset) is not applied by the hardware and silently ignored.
This was observed on an NXP i.MX8MPlus device, but likely affects all
TSN-variants of the MAC.
Fix the issue by making sure the base time is in the future, pushing it by
an integer amount of cycle times if needed. (a similar check is already
done in several other taprio implementations, see for example
drivers/net/ethernet/intel/igc/igc_tsn.c#L116 or
drivers/net/dsa/sja1105/sja1105_ptp.h#L39).
Fixes: b60189e039 ("net: stmmac: Integrate EST with TAPRIO scheduler API")
Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
Link: https://lore.kernel.org/r/20210113131557.24651-2-yannick.vignon@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When configuring a 802.1Qbv schedule through the tc taprio qdisc on an NXP
i.MX8MPlus device, the effective cycle time differed from the requested one
by N*96ns, with N number of entries in the Qbv Gate Control List. This is
because the driver was adding a 96ns margin to each interval of the GCL,
apparently to account for the IPG. The problem was observed on NXP
i.MX8MPlus devices but likely affected all devices relying on the same
configuration callback (dwmac 4.00, 4.10, 5.10 variants).
Fix the issue by removing the margins, and simply setup the MAC with the
provided cycle time value. This is the behavior expected by the user-space
API, as altering the Qbv schedule timings would break standards conformance.
This is also the behavior of several other Ethernet MAC implementations
supporting taprio, including the dwxgmac variant of stmmac.
Fixes: 504723af0d ("net: stmmac: Add basic EST support for GMAC5+")
Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
Link: https://lore.kernel.org/r/20210113131557.24651-1-yannick.vignon@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the original mtu is not used when the mtu is updated,
the mtu is aligned with cache, this will get an incorrect.
For example, if you want to configure the mtu to be 1500,
but mtu 1536 is configured in fact.
Fixed: eaf4fac478 ("net: stmmac: Do not accept invalid MTU values")
Signed-off-by: David Wu <david.wu@rock-chips.com>
Link: https://lore.kernel.org/r/20210113034109.27865-1-david.wu@rock-chips.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
TID stuck is seen when there is a race in
CPL_PASS_ACCEPT_RPL/CPL_ABORT_REQ and abort is arriving
before the accept reply, which sets the queue number.
In this case HW ends up sending CPL_ABORT_RPL_RSS to an
incorrect ingress queue.
V1->V2:
- Removed the unused variable len in chtls_set_quiesce_ctrl().
V2->V3:
- As kfree_skb() has a check for null skb, so removed this
check before calling kfree_skb() in func chtls_send_reset().
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Link: https://lore.kernel.org/r/20210112053600.24590-1-ayush.sawal@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, the function i40e_construct_skb_zc only frees the input xdp
buffer when the output skb is successfully built. On error, the
function i40e_clean_rx_irq_zc does not commit anything for the current
packet descriptor and simply exits the packet descriptor processing
loop, with the plan to restart the processing of this descriptor on
the next invocation. Therefore, on error the ring next-to-clean
pointer should not advance, the xdp i.e. *bi buffer should not be
freed and the current buffer info should not be invalidated by setting
*bi to NULL. Therefore, the *bi should only be set to NULL when the
function i40e_construct_skb_zc is successful, otherwise a NULL *bi
will be dereferenced when the work for the current descriptor is
eventually restarted.
Fixes: 3b4f0b66c2 ("i40e, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL")
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/r/20210111181138.49757-1-cristian.dumitrescu@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Check if the interface is indeed connected to a MAC before trying to
close the DPMAC object representing it. Without this check we end up
working with a NULL pointer.
Fixes: d87e606373 ("dpaa2-mac: export MAC counters even when in TYPE_FIXED")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20210111171802.1826324-1-ciorneiioana@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Declare Rx VLAN filtering as supported and user-changeable only when
there are VLAN filtering entries available on the DPNI object. Even
then, rx-vlan-filtering is by default disabled.
Also, populate the .ndo_vlan_rx_add_vid() and .ndo_vlan_rx_kill_vid()
callbacks for adding and removing a specific VLAN from the VLAN table.
Signed-off-by: Ionut-robert Aron <ionut-robert.aron@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20210111170725.1818218-1-ciorneiioana@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds support to install flow rules using ipv4 proto or
ipv6 next header field to distinguish between tcp/udp/sctp/esp/ah
flows when user doesn't specify the other match creteria related to
the flow such as tcp/udp/sctp source port and destination port, ah/esp
spi value. This is achieved by matching the layer type extracted by
NPC HW into the search key. Modified the driver to use Ethertype as
match criteria when user doesn't specify source IP address and dest
IP address. The esp/ah flow rule matching using security parameter
index (spi) is not supported as of now since the field is not extracted
into MCAM search key. Modified npc_check_field function to return bool.
Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Link: https://lore.kernel.org/r/20210111112537.3277-1-naveenm@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use of __napi_schedule_irqoff() is not safe with PREEMPT_RT in which
hard interrupts are not disabled while running the threaded interrupt.
Using __napi_schedule() works for both PREEMPT_RT and mainline Linux,
just at the cost of an additional check if interrupts are disabled for
mainline (since they are already disabled).
Similar to the fix done for enetc commit 215602a8d2 ("enetc: use
napi_schedule to be compatible with PREEMPT_RT")
Signed-off-by: Seb Laveze <sebastien.laveze@nxp.com>
Link: https://lore.kernel.org/r/20210112140121.1487619-1-sebastien.laveze@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MLX5_IPSEC_DEV() is always defined, no need to protect it under config
flag CONFIG_MLX5_EN_IPSEC, especially in slow path.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Feature check functions are in the TX fast-path of all SKBs, not only
IPsec traffic.
Move the IPsec feature check function into a header and turn it inline.
Use a stub and clean the config flag condition in Eth main driver file.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All IPsec logic should be wrapped under the compile flag,
including its checksum logic.
Introduce an inline function in ipsec datapath header,
with a corresponding stub.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The zone member is of type u16 so there is no reason to apply
the zone mask on it. This is also matching the call to set a
match in other places which don't need and don't apply the mask.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
miss_rule and prio_s args are not being referenced before assigned
so there is no need to init them.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
No need to pass zero spec to mlx5_add_flow_rules() as the
function can handle null spec.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Miss path handling of tc multi chain filters (i.e. filters that are
defined on chain > 0) requires the hardware to communicate to the
driver the last chain that was processed. This is possible only when
the hardware is capable of performing the combination of modify header
and forward to table actions. Currently, if the hardware is missing
this capability then the driver only offloads rules that are defined
on tc chain 0 prio 1. However, this restriction can be relaxed because
packets that miss from chain 0 are processed through all the
priorities by tc software.
Allow the offload of all the supported priorities for chain 0 even
when the hardware is not capable to perform modify header and goto
table actions.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use tunnel_stateless_mpls_over_udp instead of
MLX5_FLEX_PROTO_CW_MPLS_UDP since new devices have native support for
mpls over udp and do not rely on flex parser.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
esw->qos.enabled will only be true if both MLX5_CAP_GEN(dev, qos) and
MLX5_CAP_QOS(dev, esw_scheduling) are true. Therefore, remove them from
the condition in and rely only on esw->qos.enabled.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the FW tells the driver to retry the INSTALL_UPDATE command after
it has cleared the NVM area, the driver is not clearing the previously
used ALLOWED_TO_DEFRAG flag. As a result the FW tries to defrag the NVM
area a second time in a loop and can fail the request.
Fixes: 1432c3f6a6 ("bnxt_en: Retry installing FW package under NO_SPACE error condition.")
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The function bnxt_get_ulp_stat_ctxs() does not count the stats contexts
used by the RDMA driver correctly when the RDMA driver is freeing the
MSIX vectors. It assumes that if the RDMA driver is registered, the
additional stats contexts will be needed. This is not true when the
RDMA driver is about to unregister and frees the MSIX vectors.
This slight error leads to over accouting of the stats contexts needed
after the RDMA driver has unloaded. This will cause some firmware
warning and error messages in dmesg during subsequent config. changes
or ifdown/ifup.
Fix it by properly accouting for extra stats contexts only if the
RDMA driver is registered and MSIX vectors have been successfully
requested.
Fixes: c027c6b4e9 ("bnxt_en: get rid of num_stat_ctxs variable")
Reviewed-by: Yongping Zhang <yongping.zhang@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Instead of doing the full DASH check each time r8168_check_dash() is
called, let's do it once in probe and store DASH capabilities in a
new rtl8169_private member.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extend the mask to include the checksum failure bits. This allows to
simplify the condition in rtl8169_rx_csum().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At least from chip version 25 the vendor driver sets these rx flags
for all chip versions if WOL is enabled. Therefore I wouldn't consider
it a quirk, so let's rename the function.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The PHYLIB must be selected to provide mdiobus_*() functions, and the
MICREL_PHY is necessary too, as that is the only possible PHY attached
to the KS8851 (it is the internal PHY).
Fixes: ef3631220d ("net: ks8851: Register MDIO bus and the internal PHY")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210111125046.36326-1-marex@denx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Patch didn't fix any issue, just improve parse flow
and align ipv4 parse flow with ipv6 parse flow.
Currently ipv4 kenguru parser first check IP protocol(TCP/UDP)
and then destination IP address.
Patch introduce reverse ipv4 parse, first destination IP address parsed
and only then IP protocol.
This would allow extend capability for packet L4 parsing and align ipv4
parsing flow with ipv6.
Suggested-by: Liron Himi <liron@marvell.com>
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Link: https://lore.kernel.org/r/1610289059-14962-1-git-send-email-stefanc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Clean up the remainings of rtl_pll_power_down/up and rename
rtl_pll_power_down() to rtl_prepare_power_down().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Realtek provided a description of bits 6 and 7 in register PMCH.
They configure whether the chip powers down certain PLL in D3hot and
D3cold respectively. They do not actually power down the PLL.
Reflect this in the code and configure D3 PLL powerdown based on
whether WOL is enabled.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There's no known reason why PLL powerdown on D3 shouldn't be enabled
on chip versions 34, 35, 36, and 42. At least the vendor driver doesn't
exclude any of these chip versions.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Packet Processor hardware not connected to MAC flow control unit and
cannot support TX flow control.
This patch disable flow control support.
Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/r/1610306582-16641-1-git-send-email-stefanc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
So far we don't increase the max read request size if we switch to
jumbo mode before bringing up the interface for the first time.
Let's change this.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Align behavior with r8168 vendor driver and don't reduce max read
request size for RTL8168e in jumbo mode.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As of commit 457e20d659 ("mlxsw: spectrum_switchdev: Avoid returning
errors in commit phase"), the mlxsw driver performs the VLAN object
offloading during the prepare phase. So conversion just seems to be a
matter of removing the code that was running in the commit phase.
Ido Schimmel explains that the reason why mlxsw_sp_span_respin is called
unconditionally is because the bridge driver will ignore -EOPNOTSUPP and
actually add the VLAN on the bridge device - see commit 9c86ce2c1a
("net: bridge: Notify about bridge VLANs") and commit ea47217519
("mlxsw: spectrum_switchdev: Ignore bridge VLAN events"). Since the VLAN
was successfully added on the bridge device, mlxsw_sp_span_respin_work()
should be able to resolve the egress port for a packet that is mirrored
to a gre tap and passes through the bridge device. Therefore keep the
logic as it is.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the introduction of the switchdev API, port attributes were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.
Some classes of failures can never be avoided, like hardware access, or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, since commit 91cf8eceff ("switchdev: Remove unused
transaction item queue") which has removed the unused mechanism of
passing on the allocated memory between one phase and another.
It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.
This patch removes the struct switchdev_trans member from switchdev port
attribute notifier structures, and converts drivers to not look at this
member.
In part, this patch contains a revert of my previous commit 2e554a7a5d
("net: dsa: propagate switchdev vlan_filtering prepare phase to
drivers").
For the most part, the conversion was trivial except for:
- Rocker's world implementation based on Broadcom OF-DPA had an odd
implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
done mechanically, by pasting the implementation twice, then only
keeping the code that would get executed during prepare phase on top,
then only keeping the code that gets executed during the commit phase
on bottom, then simplifying the resulting code until this was obtained.
- DSA's offloading of STP state, bridge flags, VLAN filtering and
multicast router could be converted right away. But the ageing time
could not, so a shim was introduced and this was left for a further
commit.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since the introduction of the switchdev API, port objects were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.
Some classes of failures can never be avoided, like hardware access, or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, since commit 91cf8eceff ("switchdev: Remove unused
transaction item queue") which has removed the unused mechanism of
passing on the allocated memory between one phase and another.
It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.
This patch removes the struct switchdev_trans member from switchdev port
object notifier structures, and converts drivers to not look at this
member.
Where driver conversion is trivial (like in the case of the Marvell
Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
done in this patch.
Where driver conversion needs more attention (DSA, Mellanox Spectrum),
the conversion is left for subsequent patches and here we only fake the
prepare/commit phases at a lower level, just not in the switchdev
notifier itself.
Where the code has a natural structure that is best left alone as a
preparation and a commit phase (as in the case of the Ocelot switch),
that structure is left in place, just made to not depend upon the
switchdev transactional model.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The call path of a switchdev VLAN addition to the bridge looks something
like this today:
nbp_vlan_init
| __br_vlan_set_default_pvid
| | |
| | br_afspec |
| | | |
| | v |
| | br_process_vlan_info |
| | | |
| | v |
| | br_vlan_info |
| | / \ /
| | / \ /
| | / \ /
| | / \ /
v v v v v
nbp_vlan_add br_vlan_add ------+
| ^ ^ | |
| / | | |
| / / / |
\ br_vlan_get_master/ / v
\ ^ / / br_vlan_add_existing
\ | / / |
\ | / / /
\ | / / /
\ | / / /
\ | / / /
v | | v /
__vlan_add /
/ | /
/ | /
v | /
__vlan_vid_add | /
\ | /
v v v
br_switchdev_port_vlan_add
The ranges UAPI was introduced to the bridge in commit bdced7ef78
("bridge: support for multiple vlans and vlan ranges in setlink and
dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec)
have always been passed one by one, through struct bridge_vlan_info
tmp_vinfo, to br_vlan_info. So the range never went too far in depth.
Then Scott Feldman introduced the switchdev_port_bridge_setlink function
in commit 47f8328bb1 ("switchdev: add new switchdev bridge setlink").
That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made
full use of the range. But switchdev_port_bridge_setlink was called like
this:
br_setlink
-> br_afspec
-> switchdev_port_bridge_setlink
Basically, the switchdev and the bridge code were not tightly integrated.
Then commit 41c498b935 ("bridge: restore br_setlink back to original")
came, and switchdev drivers were required to implement
.ndo_bridge_setlink = switchdev_port_bridge_setlink for a while.
In the meantime, commits such as 0944d6b5a2 ("bridge: try switchdev op
first in __vlan_vid_add/del") finally made switchdev penetrate the
br_vlan_info() barrier and start to develop the call path we have today.
But remember, br_vlan_info() still receives VLANs one by one.
Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit
29ab586c3d ("net: switchdev: Remove bridge bypass support from
switchdev") so that drivers would not implement .ndo_bridge_setlink any
longer. The switchdev_port_bridge_setlink also got deleted.
This refactoring removed the parallel bridge_setlink implementation from
switchdev, and left the only switchdev VLAN objects to be the ones
offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add
(the latter coming from commit 9c86ce2c1a ("net: bridge: Notify about
bridge VLANs")).
That is to say, today the switchdev VLAN object ranges are not used in
the kernel. Refactoring the above call path is a bit complicated, when
the bridge VLAN call path is already a bit complicated.
Let's go off and finish the job of commit 29ab586c3d by deleting the
bogus iteration through the VLAN ranges from the drivers. Some aspects
of this feature never made too much sense in the first place. For
example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID
flag supposed to mean, when a port can obviously have a single pvid?
This particular configuration _is_ denied as of commit 6623c60dc2
("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API
perspective, the driver still has to play pretend, and only offload the
vlan->vid_end as pvid. And the addition of a switchdev VLAN object can
modify the flags of another, completely unrelated, switchdev VLAN
object! (a VLAN that is PVID will invalidate the PVID flag from whatever
other VLAN had previously been offloaded with switchdev and had that
flag. Yet switchdev never notifies about that change, drivers are
supposed to guess).
Nonetheless, having a VLAN range in the API makes error handling look
scarier than it really is - unwinding on errors and all of that.
When in reality, no one really calls this API with more than one VLAN.
It is all unnecessary complexity.
And despite appearing pretentious (two-phase transactional model and
all), the switchdev API is really sloppy because the VLAN addition and
removal operations are not paired with one another (you can add a VLAN
100 times and delete it just once). The bridge notifies through
switchdev of a VLAN addition not only when the flags of an existing VLAN
change, but also when nothing changes. There are switchdev drivers out
there who don't like adding a VLAN that has already been added, and
those checks don't really belong at driver level. But the fact that the
API contains ranges is yet another factor that prevents this from being
addressed in the future.
Of the existing switchdev pieces of hardware, it appears that only
Mellanox Spectrum supports offloading more than one VLAN at a time,
through mlxsw_sp_port_vlan_set. I have kept that code internal to the
driver, because there is some more bookkeeping that makes use of it, but
I deleted it from the switchdev API. But since the switchdev support for
ranges has already been de facto deleted by a Mellanox employee and
nobody noticed for 4 years, I'm going to assume it's not a biggie.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RTL8168dp is ancient anyway, and I haven't seen any trace of its early
version 27 yet. This chip versions needs quite some special handling,
therefore it would facilitate driver maintenance if support for it
could be dropped. For now just disable detection of this chip version.
If nobody complains we can remove support for it in the near future.
v2:
- extend unknown chip version error message
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/ca98f018-a0e1-8762-e95c-f0ad773a0271@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If WOL isn't enabled, then there's no need to enable wakeup from D3
on system shutdown.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use WARN_ONCE here to get a call trace in case of a problem.
This facilitates finding the offending code part.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use WARN here to avoid stopping the system. In addition print the addr
and mask values that triggered the warning.
v2:
- return on WARN to avoid an invalid register write
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Increase critical threshold for ASIC thermal zone from 110C to 140C
according to the system hardware requirements. All the supported ASICs
(Spectrum-1, Spectrum-2, Spectrum-3) could be still operational with ASIC
temperature below 140C. With the old critical threshold value system
can perform unjustified shutdown.
All the systems equipped with the above ASICs implement thermal
protection mechanism at firmware level and firmware could decide to
perform system thermal shutdown in case the temperature is below 140C.
So with the new threshold system will not meltdown, while thermal
operating range will be aligned with hardware abilities.
Fixes: 41e760841d ("mlxsw: core: Replace thermal temperature trips with defines")
Fixes: a50c1e3565 ("mlxsw: core: Implement thermal zone")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Validate thresholds to avoid a single failure due to some transceiver
unreliability. Ignore the last readouts in case warning temperature is
above alarm temperature, since it can cause unexpected thermal
shutdown. Stay with the previous values and refresh threshold within
the next iteration.
This is a rare scenario, but it was observed at a customer site.
Fixes: 6a79507cfe ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The MC firmware takes these PAUSE/ASYM_PAUSE flags provided by the
driver, transforms them back into rx/tx pause enablement status and
applies them to hardware. We are not losing information by this
transformation, thus remove the comment.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The dpaa2-eth driver has phylink integration only if the connected dpmac
object is in TYPE_PHY (aka the PCS/PHY etc link status is managed by
Linux instead of the firmware). The check is thus unnecessary because
the code path that reaches the .mac_link_up() callback is only with
TYPE_PHY dpmac objects.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The fsl_mc_get_endpoint() function now returns -EPROBE_DEFER when the
dpmac device was not yet discovered by the fsl-mc bus. When this
happens, pass the error code up so that we can retry the probe at a
later time.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If the network interface object is connected to a MAC of TYPE_FIXED, the
link status management is handled exclusively by the firmware. This does
not mean that the driver cannot access the MAC counters and export them
in ethtool.
For this to happen, we open the attached dpmac device and keep a pointer
to it in priv->mac. Because of this, all the checks in the driver of the
following form 'if (priv->mac)' have to be updated to actually check
the dpmac attribute and not rely on the presence of a non-NULL value.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Split up the initialization phase of the dpmac object from actually
configuring the phylink instance, connecting to it and configuring the
MAC. This is done so that even though the dpni object is connected to a
dpmac which has link management handled by the firmware we are still
able to export the MAC counters.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
napi_gro_receive() can never return GRO_DROP
GRO_DROP can only be returned from napi_gro_frags()
which is the other NAPI GRO entry point.
Followup patch will remove GRO_DROP, because drivers
are not supposed to call napi_gro_frags() if prior
napi_get_frags() has failed.
Note that I have left the gro_dropped variable. I leave to ice
maintainers the decision to further remove it from ethtool -S results.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For all PCI functions on the netxen_nic adapter, interrupt
mode (INTx or MSI) configuration is dependent on what has
been configured by the PCI function zero in the shared
interrupt register, as these adapters do not support mixed
mode interrupts among the functions of a given adapter.
Logic for setting MSI/MSI-x interrupt mode in the shared interrupt
register based on PCI function id zero check is not appropriate for
all family of netxen adapters, as for some of the netxen family
adapters PCI function zero is not really meant to be probed/loaded
in the host but rather just act as a management function on the device,
which caused all the other PCI functions on the adapter to always use
legacy interrupt (INTx) mode instead of choosing MSI/MSI-x interrupt mode.
This patch replaces that check with port number so that for all
type of adapters driver attempts for MSI/MSI-x interrupt modes.
Fixes: b37eb210c0 ("netxen_nic: Avoid mixed mode interrupts")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20210107101520.6735-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit b27507bb59 ("net/ibmvnic: unlock rtnl_lock in reset so
linkwatch_event can run") introduced do_change_param_reset function to
solve the rtnl lock issue. Majority of the code in do_change_param_reset
duplicates do_reset. Also, we can handle the rtnl lock issue in do_reset
itself. Hence merge do_change_param_reset back into do_reset to clean up
the code.
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Link: https://lore.kernel.org/r/20210106213514.76027-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
UniMAC is integrated into multiple Broadcom's Ethernet controllers so
use a shared header file for it and avoid some code duplication.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Doug Berger <opendmb@gmail.com>
Link: https://lore.kernel.org/r/20210107180051.1542-2-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
UniMAC is a hardware block commonly used in Broadcom Ethernet controllers
that should get its own header file. Not every controller has it mapped at
the 0x800 offset so add bgmac access helpers. They will allow using
shared register defines.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210107180051.1542-1-zajec5@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "undocumented" annotations in the EtherAVB driver were done against
the R-Car gen2 manuals; most of these registers/bits were then described
in the R-Car gen3 manuals -- reflect this fact in the annotations (note
that ECSIPR.LCHNGIP was documented in the recent R-Car gen2 manual)...
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
According to the R-Car Series, 3rd Generation User's Manual: Hardware,
Rev. 1.50, there's no APSR.DM field, instead there are 2 independent
RX/TX clock internal delay bits. Follow the suit: remove #define APSR_DM
and rename #define's APSR_DM_{R|T}DM to APSR_{R|T}DM.
While at it, do several more things to the declaration of *enum* APSR_BIT:
- remove superfluous indentation;
- annotate APSR_MEMS as undocumented;
- annotate APSR as R-Car Gen3 only.
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl/3bZsACgkQSD+KveBX
+j4powgAnzKVXUxN/R5IJ8z9kONZEvI+9SgHQ/qvzsHY66zy6xHlCdrmRWYI8V1c
KuZltBQPVCd1fAAWAzzmkEOpClM1IbS6zd0/OoJFHJfZYdFEZStvw4NrazpRNd6E
rPUgJpxRFjiLbklZhpfZ6yaoAWez/fXPQbXB+ObOs2m/guesCyZy/T7mBTvvovwA
gmrWoxb6vApJZtgk3bIQKaaevx5/0UFCWqJIoAgKyCzVKf8QvQ0OLRkt27Cn3B08
lTjtUjiGe4boxVY8F+bEb79BcnezIb/NlafjoJ1gyCgZ5dvHSrfaopiAxIZhxRSX
bP/W8cCVZvUngclDs/L+cx3Y3btpNA==
=0qLp
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2021-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-01-07
* tag 'mlx5-fixes-2021-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups
net/mlx5e: Fix two double free cases
net/mlx5: Release devlink object if adev fails
net/mlx5e: ethtool, Fix restriction of autoneg with 56G
net/mlx5e: In skb build skip setting mark in switchdev mode
net/mlx5: E-Switch, fix changing vf VLANID
net/mlx5e: Fix SWP offsets when vlan inserted by driver
net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled
net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address
net/mlx5e: Add missing capability check for uplink follow
net/mlx5: Check if lag is supported before creating one
====================
Link: https://lore.kernel.org/r/20210107202845.470205-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently the error return paths don't kfree lmac and lmac->name
leading to some memory leaks. Fix this by adding two error return
paths that kfree these objects
Addresses-Coverity: ("Resource leak")
Fixes: 1463f382f5 ("octeontx2-af: Add support for CGX link management")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210107123916.189748-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
CPL_ABORT_RPL is sent after releasing the resources by calling
chtls_release_resources(sk); and chtls_conn_done(sk);
eventually causing kernel panic. Fixing it by calling release
in appropriate order.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case of server removal lookup_stid() may return NULL pointer, which
is used as listen_ctx. So added a check before accessing this pointer.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The skb is unlinked twice, one in __skb_dequeue in function
chtls_reset_synq() and another in cleanup_syn_rcv_conn().
So in this patch using skb_peek() instead of __skb_dequeue(),
so that unlink will be handled only in cleanup_syn_rcv_conn().
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In chtls_pass_accept_request(), removing the chtls_reqsk_free()
call to avoid oreq freeing twice. Here oreq is the pointer to
struct request_sock.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If route to peer is not configured, we might get non tls
devices from dst_neigh_lookup() which is invalid, adding a
check to avoid it.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At the time of SYN_RECV, connection information is not
initialized at FW, updating tcb flag over uninitialized
connection causes adapter crash. We don't need to
update the flag during SYN_RECV state, so avoid this.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
send_abort_rpl() is not calculating cpl_abort_req_rss offset and
ends up sending wrong TID with abort_rpl WR causng tid leaks.
Replaced send_abort_rpl() with chtls_send_abort_rpl() as it is
redundant.
Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Amlogic Meson G12A (and newer: G12B, SM1) SoCs have a more advanced RX
delay logic. Instead of fine-tuning the delay in the nanoseconds range
it now allows tuning in 200 picosecond steps. This support comes with
new bits in the PRG_ETH1[19:16] register.
Add support for validating the RGMII RX delay as well as configuring the
register accordingly on these platforms.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Newer SoCs starting with the Amlogic Meson G12A have more a precise
RGMII RX delay configuration register. This means more complexity in the
code. Extract the existing RGMII delay configuration code into a
separate function to make it easier to read/understand even when adding
more logic in the future.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Amlogic Meson G12A, G12B and SM1 SoCs have a more advanced RGMII RX
delay register which allows picoseconds precision. Parse the new
"rx-internal-delay-ps" property or fall back to the value from the old
"amlogic,rx-delay-ns" property.
No upstream DTB uses the old "amlogic,rx-delay-ns" property (yet).
Only include minimalistic logic to fall back to the old property,
without any special validation (for example if the old and new
property are given at the same time).
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The timing-adjustment clock only has to be enabled when a) there is a
2ns RX delay configured using device-tree and b) the phy-mode indicates
that the RX delay should be enabled.
Only enable the RX delay if both are true, instead of (by accident) also
enabling it when there's the 2ns RX delay configured but the phy-mode
incicates that the RX delay is not used.
Fixes: 9308c47640 ("net: stmmac: dwmac-meson8b: add support for the RX delay configuration")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The SYSTEMPORT driver maps each port of the embedded Broadcom DSA switch
port to a certain queue of the master Ethernet controller. For that it
currently uses a dedicated notifier infrastructure which was added in
commit 60724d4bae ("net: dsa: Add support for DSA specific notifiers").
However, since commit 2f1e8ea726 ("net: dsa: link interfaces with the
DSA master to get rid of lockdep warnings"), DSA is actually an upper of
the Broadcom SYSTEMPORT as far as the netdevice adjacency lists are
concerned. So naturally, the plain NETDEV_CHANGEUPPER net device notifiers
are emitted. It looks like there is enough API exposed by DSA to the
outside world already to make the call_dsa_notifiers API redundant. So
let's convert its only user to plain netdev notifiers.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It is a bit strange to see something as specific as Broadcom SYSTEMPORT
bits in the main DSA include file. Move these away into a separate
header, and have the tagger and the SYSTEMPORT driver include them.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
According to Realtek the ERI register 0x1a8 quirk is needed to work
around a hw issue with the PHY on RTL8168g. The register needs to be
changed before powering down the PHY. Currently we don't meet this
requirement, however I'm not aware of any problems caused by this.
Therefore I see the change as an improvement.
The PHY driver has no means to access the chip ERI registers,
therefore we have to intercept MDIO writes to BMCR register.
If the BMCR_PDOWN bit is going to be set, then let's apply the
quirk before actually powering down the PHY.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
No functional change here. We just move a code block to avoid a
function forward declaration in a subsequent change.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All UDP tunnel port management is now routed via udp_tunnel_nic
infra directly. Remove the old callbacks.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All of the OF code that is used has stubbed and will compile and link
just fine, keeping COMPILE_TEST is enough.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20210106191546.1358324-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use existing rx processed count to track against budget, thereby making
budget decrement operation redundant.
rx_desc_count can be calculated outside the rx loop, making the loop a
bit smaller.
Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We can increase the efficiency of rx path by using buffers to receive
packets then build SKBs around them just before passing into the network
stack. In contrast, preallocating SKBs too early reduces CPU cache
efficiency.
Check if we're in NAPI context when refilling RX. Normally we're almost
always running in NAPI context. Dispatch to napi_alloc_frag directly
instead of relying on netdev_alloc_frag which does the same but
with the overhead of local_bh_disable/enable.
Tested on BCM6328 320 MHz and iperf3 -M 512 to measure packet/sec
performance. Included netif_receive_skb_list and NET_IP_ALIGN
optimizations.
Before:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 49.9 MBytes 41.9 Mbits/sec 197 sender
[ 4] 0.00-10.00 sec 49.3 MBytes 41.3 Mbits/sec receiver
After:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-30.00 sec 171 MBytes 47.8 Mbits/sec 272 sender
[ 4] 0.00-30.00 sec 170 MBytes 47.6 Mbits/sec receiver
Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The rx SKB ring use the same code for cleanup at various points.
Combine them into a function to reduce lines of code.
Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When mlx5_create_flow_group() fails, ft->g should be
freed just like when kvzalloc() fails. The caller of
mlx5e_create_l2_table_groups() does not catch this
issue on failure, which leads to memleak.
Fixes: 33cfaaa8f3 ("net/mlx5e: Split the main flow steering table")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5e_create_ttc_table_groups() frees ft->g on failure of
kvzalloc(), but such failure will be caught by its caller
in mlx5e_create_ttc_table() and ft->g will be freed again
in mlx5e_destroy_flow_table(). The same issue also occurs
in mlx5e_create_ttc_table_groups(). Set ft->g to NULL after
kfree() to avoid double free.
Fixes: 7b3722fa9e ("net/mlx5e: Support RSS for GRE tunneled packets")
Fixes: 33cfaaa8f3 ("net/mlx5e: Split the main flow steering table")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>