The minimum size of admin send/receive queue is 1 and 2 respectively.
The admin send queue can't be set to 1 because in that case, the
firmware would fail to init.
Signed-off-by: Coiby Xu <coxu@redhat.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use the minimum of the number of descriptors thus we will allocate the
minimal ring buffers for kdump.
Signed-off-by: Coiby Xu <coxu@redhat.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Set the number of the MSI-X vectors to 1. When MSI-X is enabled,
it's not allowed to use more TC queue pairs than MSI-X vectors
(pf->num_lan_msix) exist. Thus the number of Tx and Rx pairs
(vsi->num_queue_pairs) will be equal to the number of MSI-X vectors,
i.e., 1.
Signed-off-by: Coiby Xu <coxu@redhat.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Refactor repeated link state reporting code into a separate helper
functions: i40e_set_vf_link_state() i40e_vc_link_speed2mbps().
Add support of VIRTCHNL_VF_CAP_ADV_LINK_SPEED;
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Provide the ability to enable SCTP RSS hashing by ethtool.
It gives users option of generating RSS hash based on the SCTP source
and destination ports numbers, IPv4 or IPv6 source and destination
addresses.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Provides the ability to enable UDP RSS hashing by ethtool.
It gives users option of generating RSS hash based on the UDP source
and destination ports numbers, IPv4 or IPv6 source and destination
addresses.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Provides the ability to enable TCP RSS hashing by ethtool.
It gives users option of generating RSS hash based on the TCP source
and destination ports numbers, IPv4 or IPv6 source and destination
addresses.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add the virtchnl message interface to VF, so that VF can request RSS
input set(s) based on PF's capability.
This framework allows ethtool RSS config support on the VF driver.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, RSS hash input is not available to AVF by ethtool, it is set
by the PF directly.
Add the RSS configure support for AVF through new virtchnl message, and
define the capability flag VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF to query this
new RSS offload support.
Signed-off-by: Jia Guo <jia.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Bo Chen <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, the driver gets the VF's VSI by using a long string of
dereferences (i.e. vf->pf->vsi[vf->lan_vsi_idx]). If the method to get
the VF's VSI were to change the driver would have to change it in every
location. Fix this by adding the helper ice_get_vf_vsi().
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Pointer vsi is being re-assigned a value that is never read,
the assignment is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add code to support UDP segmentation offload (USO) for
hardware that supports it.
Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
As the hardware is capable of supporting UDP segmentation offload, add a
capability bit to virtchnl.h to communicate this and have the driver
advertise its support.
Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Declare bitmap of allowed commands on VF. Initialize default
opcodes list that should be always supported. Declare array of
supported opcodes for each caps used in virtchnl code.
Change allowed bitmap by setting or clearing corresponding
bit to allowlist (bit set) or denylist (bit clear).
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Attempt to detect malicious VFs and, if suspected, log the information but
keep going to allow the user to take any desired actions.
Potentially malicious VFs are identified by checking if the VFs are
transmitting too many messages via the PF-VF mailbox which could cause an
overflow of this channel resulting in denial of service. This is done by
creating a snapshot or static capture of the mailbox buffer which can be
traversed and in which the messages sent by VFs are tracked.
Co-developed-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com>
Signed-off-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com>
Co-developed-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Co-developed-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Vignesh Sridhar <vignesh.sridhar@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
- keep the ZC code, drop the code related to reinit
net/bridge/netfilter/ebtables.c
- fix build after move to net_generic
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Expose EEE Tx and Rx low power idle counters via ethtool
A EEE TX or RX LPI event occurs when the transmitter or the receiver
enters EEE (IEEE802.3az) LPI state.
ethtool --statistics <iface>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The i225 device offers a number of special PTP Hardware Clock features on
the Software Defined Pins (SDPs) - much like i210, which is used as
inspiration for this patch. It enables two possible functions, namely
time stamping external events and periodic output signals.
The assignment of PHC functions to the four SDP can be freely chosen by
the user.
For the external events time stamping, when the SDP (configured as input
by user) level changes, an interrupt is generated and the kernel
Precision Time Protocol (PTP) is informed.
For the periodic output signals, the i225 is configured to generate them
(so the SDP level will change periodically) and the driver also has to
keep updating the time of the next level change. However, this work is
not necessary for some frequencies as the i225 takes care of them
(namely, anything with a half-cycle of 500ms, 250ms, 125ms or < 70ms).
While i225 allows up to four timers to be used to source the time used
on the external events or output signals, this patch uses only one of
those timers. Main reason is to keep it simple, as it's not clear how
these extra timers would be exposed to users. Note that currently a NIC
can expose a single PTP device.
Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The i225 device can produce one interrupt on the full second, much
like i210 - from where this patch is inspired.
This patch sets up the full second interruption on the i225 and when
receiving it, it sends a PPS event to PTP (Precision Time Protocol)
kernel subsystem.
The PTP subsystem exposes the PPS events via ioctl and sysfs, and one
can use the `testptp` tool (tools/testing/selftests/ptp) to check that
the events are being generated.
Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add new function which checks MTA_REGISTER if its filled correctly.
If not then writes again to same register.
There is possibility that i210 and i211 could not accept
MTA_REGISTER settings, specially when you add and remove
many of multicast addresses in short time.
Without this patch there is possibility that multicast settings will be
not always set correctly in hardware.
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i210 has a total of 24KB of transmit packet buffer. When in Qav mode,
this buffer is divided into four pieces, one for each Tx queue.
Currently, 8KB are given to each of the two SR queues and 4KB are given
to each of the two SP queues.
However, it was noticed that such distribution can make best effort
traffic (which would usually go to the SP queues when Qav is enabled, as
the SR queues would be used by ETF or CBS qdiscs for TSN-aware traffic)
perform poorly. Using iperf3 to measure, one could see the performance
of best effort traffic drop by nearly a third (from 935Mbps to 578Mbps),
with no TSN traffic competing.
This patch redistributes the 24KB to each queue equally: 6KB each. On
tests, there was no notable performance reduction of best effort traffic
performance when there was no TSN traffic competing.
Below, more details about the data collected:
All experiments were run using the following qdisc setup:
qdisc taprio 100: root refcnt 9 tc 4 map 3 3 3 2 3 0 0 3 3 3 3 3 3 3 3 3
queues offset 0 count 1 offset 1 count 1 offset 2 count 1 offset 3 count 1
clockid TAI base-time 0 cycle-time 10000000 cycle-time-extension 0
index 0 cmd S gatemask 0xf interval 10000000
qdisc etf 8045: parent 100:1 clockid TAI delta 1000000 offload on
deadline_mode off skip_sock_check off
TSN traffic, when enabled, had this characteristics:
Packet size: 1500 bytes
Transmission interval: 125us
----------------------------------
Without this patch:
----------------------------------
- TCP data:
- No TSN traffic:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 1.35 GBytes 578 Mbits/sec 0
- With TSN traffic:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 1.07 GBytes 460 Mbits/sec 1
- TCP data limiting iperf3 buffer size to 4K:
- No TSN traffic:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 1.35 GBytes 579 Mbits/sec 0
- With TSN traffic:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 1.08 GBytes 462 Mbits/sec 0
- TCP data limiting iperf3 buffer size to 192 bytes (smallest size without
serious performance degradation):
- No TSN traffic:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 1.34 GBytes 577 Mbits/sec 0
- With TSN traffic:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 1.07 GBytes 461 Mbits/sec 1
- UDP data at 1000Mbit/sec:
- No TSN traffic:
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-20.00 sec 1.36 GBytes 586 Mbits/sec 0.000 ms 0/1011407 (0%)
- With TSN traffic:
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-20.00 sec 1.05 GBytes 451 Mbits/sec 0.000 ms 0/778672 (0%)
----------------------------------
With this patch:
----------------------------------
- TCP data:
- No TSN traffic:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 2.17 GBytes 932 Mbits/sec 0
- With TSN traffic:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 1.50 GBytes 646 Mbits/sec 1
- TCP data limiting iperf3 buffer size to 4K:
- No TSN traffic:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 2.17 GBytes 931 Mbits/sec 0
- With TSN traffic:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 1.50 GBytes 645 Mbits/sec 0
- TCP data limiting iperf3 buffer size to 192 bytes (smallest size without
serious performance degradation):
- No TSN traffic:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 2.17 GBytes 932 Mbits/sec 1
- With TSN traffic:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 1.50 GBytes 645 Mbits/sec 0
- UDP data at 1000Mbit/sec:
- No TSN traffic:
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-20.00 sec 2.23 GBytes 956 Mbits/sec 0.000 ms 0/1650226 (0%)
- With TSN traffic:
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-20.00 sec 1.51 GBytes 649 Mbits/sec 0.000 ms 0/1120264 (0%)
Signed-off-by: Ederson de Souza <ederson.desouza@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix this panic by adding more rules to calculate the value of @rss_size_max
which could be used in allocating the queues when bpf is loaded, which,
however, could cause the failure and then trigger the NULL pointer of
vsi->rx_rings. Prio to this fix, the machine doesn't care about how many
cpus are online and then allocates 256 queues on the machine with 32 cpus
online actually.
Once the load of bpf begins, the log will go like this "failed to get
tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI
failed".
Thus, I attach the key information of the crash-log here.
BUG: unable to handle kernel NULL pointer dereference at
0000000000000000
RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e]
Call Trace:
[2160294.717292] ? i40e_reconfig_rss_queues+0x170/0x170 [i40e]
[2160294.717666] dev_xdp_install+0x4f/0x70
[2160294.718036] dev_change_xdp_fd+0x11f/0x230
[2160294.718380] ? dev_disable_lro+0xe0/0xe0
[2160294.718705] do_setlink+0xac7/0xe70
[2160294.719035] ? __nla_parse+0xed/0x120
[2160294.719365] rtnl_newlink+0x73b/0x860
Fixes: 41c445ff0f ("i40e: main driver core")
Co-developed-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The scope of this variable can be reduced so do that.
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
We were saving the return value from ice_vsi_manage_rss_lut(), but
the errors from that function are not critical so change it to
return void and remove the code that saved the value.
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Silence false errors, warnings and style issues reported by cppcheck.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently the vsi->vf_id is set only for ICE_VSI_VF and it's left as 0
for all other VSI types. This is confusing and could be problematic
since 0 is a valid vf_id. Fix this by always setting non VF VSI types to
ICE_INVAL_VFID.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The only time you can ever have a rq_last_status is if
a firmware event was somehow reporting a status on the receive
queue, which are generally firmware initiated events or
mailbox messages from a VF. Mostly this struct member was unused.
Fix this problem by still printing the value of the field in a debug
print, but don't store the value forever in a struct, potentially
creating opportunities for callers to use the wrong struct member.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Do a minor refactor on ice_vsi_rebuild to use a local
variable to store vsi->type.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The driver previously printed it's PCI address in
the name field for the pci resource, which when displayed
via /proc/iomem, would print the same thing twice.
It's more useful for debugging to see the driver name, as
most other modules do.
Here's a diff of before and after this change:
99100000-991fffff : 0000:3b:00.1
9a000000-a04fffff : PCI Bus 0000:3b
9a000000-9bffffff : 0000:3b:00.1
- 9a000000-9bffffff : 0000:3b:00.1
+ 9a000000-9bffffff : ice
9c000000-9dffffff : 0000:3b:00.0
- 9c000000-9dffffff : 0000:3b:00.0
+ 9c000000-9dffffff : ice
9e000000-9effffff : 0000:3b:00.1
9f000000-9fffffff : 0000:3b:00.0
a0000000-a000ffff : 0000:3b:00.1
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There was an excessive increment of the QSFP page, which is
now fixed. Additionally, this new update now reads 8 bytes
at a time and will retry each request if the module/bus is
busy.
Also, prevent reading from upper pages if module does not
support those pages.
Signed-off-by: Scott W Taylor <scott.w.taylor@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use a dedicated bitfield in order to both increase
the amount of checking around the length of ITR writes
as well as simplify the checks of dynamic mode.
Basically unpack the "high bit means dynamic" logic
into bitfields.
Also, remove some unused ITR defines.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The driver would occasionally miss that there were outstanding
descriptors to clean when exiting busy/napi poll. This issue has
been in the code since the introduction of the ice driver.
Attempt to "catch" any remaining work by triggering a software
interrupt when exiting napi poll or busy-poll. This will not
cause extra interrupts in the case of normal execution.
This issue was found when running sfnt-pingpong, with busy
poll enabled, and typically with larger I/O sizes like > 8192,
the program would occasionally report > 1 second maximums
to complete a ping pong.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice driver has support for adaptive interrupt moderation, an
algorithm for tuning the interrupt rate dynamically. This algorithm
is based on various assumptions about ring size, socket buffer size,
link speed, SKB overhead, ethernet frame overhead and more.
The Linux kernel has support for a dynamic interrupt moderation
algorithm known as "dimlib". Replace the custom driver-specific
implementation of dynamic interrupt moderation with the kernel's
algorithm.
The Intel hardware has a different hardware implementation than the
originators of the dimlib code had to work with, which requires the
driver to use a slightly different set of inputs for the actual
moderation values, while getting all the advice from dimlib of
better/worse, shift left or right.
The change made for this implementation is to use a pair of values
for each of the 5 "slots" that the dimlib moderation expects, and
the driver will program those pairs when dimlib recommends a slot to
use. The currently implementation uses two tables, one for receive
and one for transmit, and the pairs of values in each slot set the
maximum delay of an interrupt and a maximum number of interrupts per
second (both expressed in microseconds).
There are two separate kinds of bugs fixed by using DIMLIB, one is
UDP single stream send was too slow, and the other is that 8K
ping-pong was going to the most aggressive moderation and has much
too high latency.
The overall result of using DIMLIB is that we meet or exceed our
performance expectations set based on the old algorithm.
Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Introduce several new helpers for writing ITR and GLINT_RATE
registers, and refactor the code calling them. This resulted
in removal of several duplicate functions and rolled a bunch
of simple code back into the calling routines.
In particular this removes some code that was doing both
a store and a set in a helper function, which seems better
done as separate tasks in the caller (and generally takes
less lines of code even with a tiny bit of repetition).
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add two new VSI states, one to track if a netdev for the VSI has been
allocated and the other to track if the netdev has been registered.
Call unregister_netdev/free_netdev only when the corresponding state
bits are set.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Remove the leading underscores in enum ice_pf_state. This is not really
communicating anything and is unnecessary. No functional change.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The well-known IANA protocol port 3260 (iscsi-target 0x0cbc) and the
ether-types 0x8906 (ETH_P_FCOE) and 0x8914 (ETH_P_FIP) are already defined
in kernel header files. Use those definitions instead of open-coding the
same.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Remove the 'ixgbe_mc_addr_itr' typedef as it is not used.
Signed-off-by: Chen Lin <chen.lin5@zte.com.cn>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The Broadcom PHY is used in switches, so add the ID, and hook it up.
This upstreams the Linux kernel patch from the network operating system
SONiC from February 2020 [1].
[1]: https://github.com/Azure/sonic-linux-kernel/pull/122
Signed-off-by: Jostar Yang <jostar_yang@accton.com>
Signed-off-by: Guohan Lu <lguohan@gmail.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
A for-loop is using a u8 loop counter that is being compared to
a u32 cmp_dcbcfg->numapp to check for the end of the loop. If
cmp_dcbcfg->numapp is larger than 255 then the counter j will wrap
around to zero and hence an infinite loop occurs. Fix this by making
counter j the same type as cmp_dcbcfg->numapp.
Addresses-Coverity: ("Infinite loop")
Fixes: aeac8ce864 ("ice: Recognize 860 as iSCSI port in CEE mode")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
pci_disable_device() called in __ixgbe_shutdown() decreases
dev->enable_cnt by 1. pci_enable_device_mem() which increases
dev->enable_cnt by 1, was removed from ixgbe_resume() in commit
6f82b25587 ("ixgbe: use generic power management"). This caused
unbalanced increase/decrease. So add pci_enable_device_mem() back.
Fix the following call trace.
ixgbe 0000:17:00.1: disabling already-disabled device
Call Trace:
__ixgbe_shutdown+0x10a/0x1e0 [ixgbe]
ixgbe_suspend+0x32/0x70 [ixgbe]
pci_pm_suspend+0x87/0x160
? pci_pm_freeze+0xd0/0xd0
dpm_run_callback+0x42/0x170
__device_suspend+0x114/0x460
async_suspend+0x1f/0xa0
async_run_entry_fn+0x3c/0xf0
process_one_work+0x1dd/0x410
worker_thread+0x34/0x3f0
? cancel_delayed_work+0x90/0x90
kthread+0x14c/0x170
? kthread_park+0x90/0x90
ret_from_fork+0x1f/0x30
Fixes: 6f82b25587 ("ixgbe: use generic power management")
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ixgbe driver currently generates a NULL pointer dereference when
performing the ethtool loopback test. This is due to the fact that there
isn't a q_vector associated with the test ring when it is setup as
interrupts are not normally added to the test rings.
To address this I have added code that will check for a q_vector before
returning a napi_id value. If a q_vector is not present it will return a
value of 0.
Fixes: b02e5a0ebb ("xsk: Propagate napi_id to XDP socket Rx path")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In ice_suspend(), ice_clear_interrupt_scheme() is called, and then
irq_free_descs() will be eventually called to free irq and its descriptor.
In ice_resume(), ice_init_interrupt_scheme() is called to allocate new
irqs. However, in ice_rebuild_arfs(), struct irq_glue and struct cpu_rmap
maybe cannot be freed, if the irqs that released in ice_suspend() were
reassigned to other devices, which makes irq descriptor's affinity_notify
lost.
So call ice_free_cpu_rx_rmap() before ice_clear_interrupt_scheme(), which
can make sure all irq_glue and cpu_rmap can be correctly released before
corresponding irq and descriptor are released.
Fix the following memory leak.
unreferenced object 0xffff95bd951afc00 (size 512):
comm "kworker/0:1", pid 134, jiffies 4294684283 (age 13051.958s)
hex dump (first 32 bytes):
18 00 00 00 18 00 18 00 70 fc 1a 95 bd 95 ff ff ........p.......
00 00 ff ff 01 00 ff ff 02 00 ff ff 03 00 ff ff ................
backtrace:
[<0000000072e4b914>] __kmalloc+0x336/0x540
[<0000000054642a87>] alloc_cpu_rmap+0x3b/0xb0
[<00000000f220deec>] ice_set_cpu_rx_rmap+0x6a/0x110 [ice]
[<000000002370a632>] ice_probe+0x941/0x1180 [ice]
[<00000000d692edba>] local_pci_probe+0x47/0xa0
[<00000000503934f0>] work_for_cpu_fn+0x1a/0x30
[<00000000555a9e4a>] process_one_work+0x1dd/0x410
[<000000002c4b414a>] worker_thread+0x221/0x3f0
[<00000000bb2b556b>] kthread+0x14c/0x170
[<00000000ad2cf1cd>] ret_from_fork+0x1f/0x30
unreferenced object 0xffff95bd81b0a2a0 (size 96):
comm "kworker/0:1", pid 134, jiffies 4294684283 (age 13051.958s)
hex dump (first 32 bytes):
38 00 00 00 01 00 00 00 e0 ff ff ff 0f 00 00 00 8...............
b0 a2 b0 81 bd 95 ff ff b0 a2 b0 81 bd 95 ff ff ................
backtrace:
[<00000000582dd5c5>] kmem_cache_alloc_trace+0x31f/0x4c0
[<000000002659850d>] irq_cpu_rmap_add+0x25/0xe0
[<00000000495a3055>] ice_set_cpu_rx_rmap+0xb4/0x110 [ice]
[<000000002370a632>] ice_probe+0x941/0x1180 [ice]
[<00000000d692edba>] local_pci_probe+0x47/0xa0
[<00000000503934f0>] work_for_cpu_fn+0x1a/0x30
[<00000000555a9e4a>] process_one_work+0x1dd/0x410
[<000000002c4b414a>] worker_thread+0x221/0x3f0
[<00000000bb2b556b>] kthread+0x14c/0x170
[<00000000ad2cf1cd>] ret_from_fork+0x1f/0x30
Fixes: 769c500dcc ("ice: Add advanced power mgmt for WoL")
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Remove vsi->netdev->name from the trace.
This is redundant information. With the devinfo trace, the adapter
is already identifiable.
Previously following error was produced when compiling against sparse.
i40e_main.c:2571 i40e_sync_vsi_filters() error:
we previously assumed 'vsi->netdev' could be null (see line 2323)
Fixes: b603f9dc20 ("i40e: Log info when PF is entering and leaving Allmulti mode.")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Init pointer with NULL in default switch case statement.
Previously the error was produced when compiling against sparse.
i40e_debugfs.c:582 i40e_dbg_dump_desc() error: uninitialized symbol 'ring'.
Fixes: 44ea803e2f ("i40e: introduce new dump desc XDP command")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Remove error handling through pointers. Instead use plain int
to return value from i40e_run_xdp(...).
Previously:
- sparse errors were produced during compilation:
i40e_txrx.c:2338 i40e_run_xdp() error: (-2147483647) too low for ERR_PTR
i40e_txrx.c:2558 i40e_clean_rx_irq() error: 'skb' dereferencing possible ERR_PTR()
- sk_buff* was used to return value, but it has never had valid
pointer to sk_buff. Returned value was always int handled as
a pointer.
Fixes: 0c8493d90b ("i40e: add XDP support for pass and drop actions")
Fixes: 2e68931238 ("i40e: split XDP_TX tail and XDP_REDIRECT map flushing")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Change parameters order in aq_get_phy_register() due to wrong
statistics in PHY reported by ethtool. Previously all PHY statistics were
exactly the same for all interfaces
Now statistics are reported correctly - different for different interfaces
Fixes: 0514db37dd ("i40e: Extend PHY access with page change flag")
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Checkpatch reports the following, fix it.
-----------------------------------------
drivers/net/ethernet/intel/ice/ice_main.c
-----------------------------------------
CHECK:BRACES: Blank lines aren't necessary before a close brace '}'
FILE: drivers/net/ethernet/intel/ice/ice_main.c:455:
+
+}
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Currently the driver is doing two unnecessary checks. First both ops are
checking if the VLAN ID passed in is less than VLAN_N_VID and second
both ops are checking to see if a port VLAN is configured on the VSI.
The first check is already handled by the 8021q driver so this is an
unnecessary check. The second check is unnecessary because the PF VSI is
never put into a port VLAN.
Remove these checks.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tracking of the rx_gro_dropped statistic was removed in
commit f73fc40327 ("ice: drop dead code in ice_receive_skb()").
Remove the associated variables and its reporting to ethtool stats.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Replace multiple instances of vsi->back and pi->phy with equivalent
local variables
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In ice_init_phy_user_cfg, vsi is used only to get to hw. Remove this
and just use pi->hw
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Beyond a specific version of firmware, there is no need to provide
override values to the firmware when setting PHY capabilities. In this
case, we do not need to indicate whether we're in Strict or Lenient Link
Mode.
In the case of translating capabilities to the configuration structure,
the module compliance enforcement is already correctly set by firmware,
so the extra code block is redundant.
Signed-off-by: Jeb Cramer <jeb.j.cramer@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Recent firmware supports a new "get PHY capabilities" mode
ICE_AQC_REPORT_DFLT_CFG which makes it unnecessary for the driver
to track and apply NVM based default link overrides.
If FW AQ API version supports it, use Report Default Configuration.
Add check function for Report Default Configuration support and update
accordingly.
Also change adv_phy_type_[lo|hi] to advert_phy_type[lo|hi] for
clarity.
Co-developed-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com>
Signed-off-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In ice_set_link_ksettings, use assignment instead of memset/memcpy
where possible
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Return more appropriate error codes so that the right error
message is communicated to the user by ethtool.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In ice_set_link_ksettings, change 'abilities' to 'phy_caps' and 'p' to
'pi'. This is more consistent with similar usages elsewhere in the
driver.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The loop checking for PF VSI doesn't make any sense. The VSI type
backing the netdev passed to ice_set_link_ksettings will always be
of type ICE_PF_VSI. Remove it.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When link is owned by manageability, the driver is not allowed to fiddle
with link. FW returns ICE_AQ_RC_EMODE if the driver attempts to do so.
This patch adds a new function ice_set_link which abstracts the call to
ice_aq_set_link_restart_an and provides a clean way to turn on/off link.
While making this change, I also spotted that an int variable was being
used to hold both an ice_status return code and the Linux errno return
code. This pattern more often than not results in the driver inadvertently
returning ice_status back to kernel which is a major boo-boo. Clean it up.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
For get PHY abilities AQ, the specification defines "report modes"
as "with media", "without media" and "active configuration". For
clarity, rename macros to align with the specification.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Remove the recursive way of adding the nodes to the layer in order
to reduce the stack usage. Instead the algorithm is modified to use
a while loop.
The previous code was scanning recursively the nodes horizontally.
The total stack consumption will be based on number of nodes present
on that layer. In some cases it can consume more stack.
Signed-off-by: Victor Raj <victor.raj@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Retry sending some AQ commands, as result of EBUSY AQ error.
ice_aqc_opc_get_link_topo
ice_aqc_opc_lldp_stop
ice_aqc_opc_lldp_start
ice_aqc_opc_lldp_filter_ctrl
This change follows the latest guidelines from HW team. It is
better to retry the same AQ command several times, as the result
of EBUSY, instead of returning error to the caller right away.
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
If veb-stats was enabled, the ethtool stats triggered a warning
due to invalid size: 'unexpected stat size for veb.tc_%u_tx_packets'.
This was due to an incorrect structure definition for the statistics.
Structures and functions have been improved in line with requirements
for the presentation of statistics, in particular for the functions:
'i40e_add_ethtool_stats' and 'i40e_add_stat_strings'.
Fixes: 1510ae0be2 ("i40e: convert VEB TC stats to use an i40e_stats array")
Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com>
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix so that single packets are received immediately instead of in
batches of 8. If you sent 1 pps to a system, you received 8 packets
every 8 seconds instead of 1 packet every second. The problem behind
this was that the work_done reporting from the Tx part of the driver
was broken. The work_done reporting in i40e controls not only the
reporting back to the napi logic but also the setting of the interrupt
throttling logic. When Tx or Rx reports that it has more to do,
interrupts are throttled or coalesced and when they both report that
they are done, interrupts are armed right away. If the wrong work_done
value is returned, the logic will start to throttle interrupts in a
situation where it should have just enabled them. This leads to the
undesired batching behavior seen in user-space.
Fix this by returning the correct boolean value from the Tx xsk
zero-copy path. Return true if there is nothing to do or if we got
fewer packets to process than we asked for. Return false if we got as
many packets as the budget since there might be more packets we can
process.
Fixes: 3106c580fb ("i40e: Use batched xsk Tx interfaces to increase performance")
Reported-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fixed new static analysis findings:
"warn: inconsistent indenting" - introduced lately,
reported with lkp and smatch build.
Fixes: 4b208eaa80 ("i40e: Add init and default config of software based DCB")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The following is reported by checkpatch, correct it.
-----------------------------------------------
drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
-----------------------------------------------
WARNING:NETWORKING_BLOCK_COMMENT_STYLE: networking block comments don't use an empty /* line, use /* Comment...
FILE: drivers/net/ethernet/intel/ice/ice_adminq_cmd.h:1428:
+/*
+ * Send to PF command (indirect 0x0801) ID is only used by PF
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
A few style issues reported by checkpatch have snuck into the code; resolve
the style issues.
COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
struct ice_vsi has two fields, state and flags which seem to
be serving the same purpose. Consolidate them into one field
'state'.
enum ice_state is used to represent state information of the PF.
While some of these enum values can be use to represent VSI state,
it makes more sense to represent VSI state with its own enum. So
derive a new enum ice_vsi_state from ice_vsi_flags and ice_state
and use it. Also rename enum ice_state to ice_pf_state for clarity.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently ice_set/get_rss are used to set/get the RSS LUT and/or RSS
key. However nearly everywhere these functions are called only the LUT
or key are set/get. Also, making this change reduces how many things
ice_set/get_rss are doing. Fix this by adding ice_set/get_rss_lut and
ice_set/get_rss_key functions.
Also, consolidate all calls for setting/getting the RSS LUT and RSS Key
to use ice_set/get_rss_lut() and ice_set/get_rss_key().
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Update ice_aq_get_rss_lut() and ice_aq_set_rss_lut() to take a new
structure ice_aq_get_set_rss_params instead of passing individual
parameters. This is done for 2 reasons:
1. Reduce the number of parameters passed to the functions.
2. Reduce the amount of change required if the arguments ever need to be
updated in the future.
Also, reduce duplicate code that was checking for an invalid vsi_handle
and lut parameter by moving the checks to the lower level
__ice_aq_get_set_rss_lut().
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently, ice_vsi_setup_q_map() depends on the VSI's rss_size. However,
the Rx Queue Mapping section of the VSI context has no dependency on RSS.
Instead, limit the maximum number of Rx queues per TC based on the Rx
Queue mapping section of the VSI context, which currently allows for up
to 256 Rx queues per TC.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Align all ptype bitmap to follow ice_ptypes_xxx prefix.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use *malloc() instead of *calloc() when allocating only a single object as
opposed to an array of objects.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Check for bail out condition before calling ice_aq_sff_eeprom
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit a012dca9f7 ("ice: add ethtool -m support for reading i2c eeprom
modules") unnecessarily added the ICE_AQ_FLAG_BUF flag to the descriptor
when sending the indirect Read/Write SFF EEPROM AQ command. The flag is
already added later in the code flow for all indirect AQ commands, i.e.
commands that provide an additional data buffer.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Change link misconfiguration message since the configuration
could be intended by the user.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There is an issue when the Tx or Rx ring size increases using
'ethtool -L ...' where the new rings don't get the correct ITR
values because when we rebuild the VSI we don't know that some
of the rings may be new.
Fix this by looking at the original number of rings and
determining if the rings in ice_vsi_rebuild_set_coalesce()
were not present in the original rings received in
ice_vsi_rebuild_get_coalesce().
Also change the code to return an error if we can't allocate
memory for the coalesce data in ice_vsi_rebuild().
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There are two package versions in the package binary. Today, these two
version numbers are the same. However, in the future that may change.
Update code to use the package info from the ice segment metadata
section, which is the package information that is actually downloaded to
the firmware during the download package process.
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Once a netdev is registered, the corresponding network interface can
be immediately used by userspace utilities (like say NetworkManager).
This can be problematic if the driver technically isn't fully up yet.
Move netdev registration to the end of probe, as by this time the
driver data structures and device will be initialized as expected.
However, delaying netdev registration causes a failure in the aRFS flow
where netdev->reg_state == NETREG_REGISTERED condition is checked. It's
not clear why this check was added to begin with, so remove it.
Local testing didn't indicate any issues with this change.
The state bit check in ice_open was put in as a stop-gap measure to
prevent a premature interface up operation. This is no longer needed,
so remove it.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Enable and configure XPS. The driver code implemented sets up the Transmit
Packet Steering Map, which in turn will be used by the kernel in queue
selection during Tx.
Signed-off-by: Benita Bose <benita.bose@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Remove repeated words "to" and "try".
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ice_remove_vsi_lkup_fltr is called, by calling
ice_add_to_vsi_fltr_list local copy of vsi filter list
is created. If any issues during creation of vsi filter
list occurs it up for the caller to free already
allocated memory. This patch ensures proper memory
deallocation in these cases.
Fixes: 80d144c9ac ("ice: Refactor switch rule management structures and functions")
Signed-off-by: Robert Malz <robertx.malz@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
As per the spec, the WoL control word read from the NVM should be
interpreted as port numbers, and not PF numbers. So when checking
if WoL supported, use the port number instead of the PF ID.
Also, ice_is_wol_supported doesn't really need a pointer to the pf
struct, but just needs a pointer to the hw instance.
Fixes: 769c500dcc ("ice: Add advanced power mgmt for WoL")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add handling of allocation fault for ice_vsi_list_map_info.
Also *fi should not be NULL pointer, it is a reference to raw
data field, so remove this variable and use the reference
directly.
Fixes: 9daf8208dd ("ice: Add support for switch filter programming")
Signed-off-by: Jacek Bułatek <jacekx.bulatek@intel.com>
Co-developed-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The original purpose of the ICE_DCBNL_DEVRESET was to protect
the driver during DCBNL device resets. But, the flow for
DCBNL device resets now consists of only calls up the stack
such as dev_close() and dev_open() that will result in NDO calls
to the driver. These will be handled with state changes from the
stack. Also, there is a problem of the dev_close and dev_open
being blocked by checks for reset in progress also using the
ICE_DCBNL_DEVRESET bit.
Since the ICE_DCBNL_DEVRESET bit is not necessary for protecting
the driver from DCBNL device resets and it is actually blocking
changes coming from the DCBNL interface, remove the bit from the
PF state and don't block driver function based on DCBNL reset in
progress.
Fixes: b94b013eb6 ("ice: Implement DCBNL support")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix the order of number of array members and member size parameters in a
*calloc() call.
Fixes: b3c3890489 ("ice: avoid unnecessary single-member variable-length structs")
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There is a possibility of race between ice_open or ice_stop calls
performed by OS and reset handling routine both trying to modify VSI
resources. Observed scenarios:
- reset handler deallocates memory in ice_vsi_free_arrays and ice_open
tries to access it in ice_vsi_cfg_txq leading to driver crash
- reset handler deallocates memory in ice_vsi_free_arrays and ice_close
tries to access it in ice_down leading to driver crash
- reset handler clears port scheduler topology and sets port state to
ICE_SCHED_PORT_STATE_INIT leading to ice_ena_vsi_txq fail in ice_open
To prevent this additional checks in ice_open and ice_stop are
introduced to make sure that OS is not allowed to alter VSI config while
reset is in progress.
Fixes: cdedef59de ("ice: Configure VSIs for Tx/Rx")
Signed-off-by: Krzysztof Goreczny <krzysztof.goreczny@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
iSCSI can use both TCP ports 860 and 3260. However, in our current
implementation, the ice_aqc_opc_get_cee_dcb_cfg (0x0A07) AQ command
doesn't provide a way to communicate the protocol port number to the
AQ's caller. Thus, we assume that 3260 is the iSCSI port number at the
AQ's caller layer.
Rely on the dcbx-willing mode, desired QoS and remote QoS configuration to
determine which port number that iSCSI will use.
Fixes: 0ebd3ff13c ("ice: Add code for DCB initialization part 2/4")
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
250 msec timeout is insufficient for some AQ commands. Advice from FW
team was to increase the timeout. Increase to 1 second.
Fixes: 7ec59eeac8 ("ice: Add support for control queues")
Signed-off-by: Fabio Pricoco <fabio.pricoco@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
An incorrect NVM update procedure can result in the driver failing probe.
In this case, the recommended resolution method is to update the NVM
using the right procedure. However, if the driver fails probe, the user
will not be able to update the NVM. So do not fail probe on link/PHY
errors.
Fixes: 1a3571b593 ("ice: restore PHY settings on media insertion")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add support for the XDP_REDIRECT action which enables XDP programs to
redirect packets arriving at I225 NIC. It also implements the
ndo_xdp_xmit ops, enabling the igc driver to transmit packets forwarded
to it by xdp programs running on other interfaces.
The patch tweaks the driver's page counting and recycling scheme as
described in the following two commits and implemented by other Intel
drivers in order to properly support XDP_REDIRECT action:
commit 8ce29c679a ("i40e: tweak page counting for XDP_REDIRECT")
commit 75aab4e10a ("i40e: avoid premature Rx buffer reuse")
This patch has been tested with the sample apps "xdp_redirect_cpu" and
"xdp_redirect_map" located in samples/bpf/.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add support for XDP_TX action which enables XDP programs to transmit
back receiving frames.
I225 controller has only 4 Tx hardware queues. Since XDP programs may
not even issue an XDP_TX action, this patch doesn't reserve dedicated
queues just for XDP like other Intel drivers do. Instead, the queues
are shared between the network stack and XDP. The netdev queue lock is
used to ensure mutual exclusion.
Since frames can now be transmitted via XDP_TX, the igc_tx_buffer
structure is modified so we are able to save a reference to the xdp
frame for later clean up once the packet is transmitted. The tx_buffer
is mapped to either a skb or a xdpf so we use a union to save the skb
or xdpf pointer and have a bit in tx_flags to indicate which field to
use.
This patch has been tested with the sample app "xdp2" located in
samples/bpf/ dir.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add the initial XDP support to the igc driver. For now, only XDP_PASS,
XDP_DROP, XDP_ABORTED actions are supported. Upcoming patches will add
support for the remaining XDP actions.
XDP configuration helpers are defined in a new file, igc_xdp.c. These
helpers are utilized in igc_main.c to implement the ndo_bpf callback.
XDP-related code that belongs to the driver's hot path is landed in
igc_main.c.
By default, the driver uses Rx buffers with 2 KB size. When XDP is
enabled, it uses larger buffers so we have enough space to accommodate
the headroom and tailroom required by XDP infrastructure. Also, the
driver doesn't support XDP functionality with frames that span over
multiple buffers so jumbo frames are not allowed for now.
The approach implemented follows the approach implemented in other Intel
drivers as much as possible for the sake of consistency across the
drivers.
Quick comment regarding igc_build_skb(): this patch doesn't touch it
because the function is never called. It seems its support is
incomplete/in progress. The function was added by commit 0507ef8a03
("igc: Add transmit and receive fastpath and interrupt handlers") but
ring_uses_build_skb() always return False since the IGC_RING_FLAG_RX_
BUILD_SKB_ENABLED isn't set anywhere in the driver code.
This patch has been tested with the sample app "xdp1" located in
samples/bpf/ dir.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
While commit 13b5b7fd6a ("igc: Add support for Tx/Rx rings")
introduced code to handle larger packet buffers, it missed the
set/clear helpers which enable/disable that feature. Introduce
the missing helpers which will be use in the next patch.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Refactor the Rx timestamp handling in preparation to land
XDP support.
Rx timestamps are put in the Rx buffer by hardware, before the packet
data. When creating the xdp buffer, we will need to check the Rx
descriptor to determine if the buffer contains timestamp information
and consider the offset when setting xdp.data.
The Rx descriptor check is already done in igc_construct_skb(). To
avoid code duplication, this patch moves the timestamp handling to
igc_clean_rx_irq() so both skb and xdp paths can reuse it.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The RX frame truesize calculation is scattered throughout the RX code.
This patch creates the helper function igc_get_rx_frame_truesize() and
uses it where applicable.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The igc driver implements the same page recycling scheme from other
Intel drivers which reuses the page by flipping the buffer. The code
to handle buffer flips is duplicated in many locations so introduce
the igc_rx_buffer_flip() helper and use it where applicable.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The 'skb' argument from igc_tx_cmd_type() is not used so remove it.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-03-24
The following pull-request contains BPF updates for your *net-next* tree.
We've added 37 non-merge commits during the last 15 day(s) which contain
a total of 65 files changed, 3200 insertions(+), 738 deletions(-).
The main changes are:
1) Static linking of multiple BPF ELF files, from Andrii.
2) Move drop error path to devmap for XDP_REDIRECT, from Lorenzo.
3) Spelling fixes from various folks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Setup TC before the i40e_setup_pf_switch() call.
Memory must be initialized for all the queues
before using its resources.
Previously it could be possible that a call:
xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
rx_ring->queue_index, rx_ring->q_vector->napi.napi_id);
was made with q_vector being null.
Oops could show up with the following sequence:
- no driver loaded
- FW LLDP agent is on (flag disable-fw-lldp:off)
- link is up
- DCB configured with number of Traffic Classes that will not divide
completely the default number of queues (usually cpu cores)
- driver load
- set private flag: disable-fw-lldp:on
Fixes: 4b208eaa80 ("i40e: Add init and default config of software based DCB")
Fixes: b02e5a0ebb ("xsk: Propagate napi_id to XDP socket Rx path")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix the reason of kernel oops when i40e driver removed VFs.
Added new __I40E_VFS_RELEASING state to signalize releasing
process by PF, that it makes possible to exit of reset VF procedure.
Without this patch, it is possible to suspend the VFs reset by
releasing VFs resources procedure. Retrying the reset after the
timeout works on the freed VF memory causing a kernel oops.
Fixes: d43d60e5eb ("i40e: ensure reset occurs when disabling VF")
Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com>
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add Asym_Pause to supported link modes (it is supported by HW).
Lack of Asym_Pause in supported modes can cause several problems,
i.e. it won't be possible to turn the autonegotiation on
with asymmetric pause settings (i.e. Tx on, Rx off).
Fixes: 4e91bcd5d4 ("i40e: Finish implementation of ethtool get settings")
Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of just letting the code
fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of just letting the code
fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of just
letting the code fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of just
letting the code fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding a couple of break statements instead of
just letting the code fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of just letting the code
fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
A bunch of header comments were showing warnings when compiling
with W=1. Fix them all at once. This changes only comments.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Correct reported warnings for "warning: expecting prototype for ...
Prototype was for ... instead"
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Flex-bytes allows for packet matching based on an offset and value. This
is supported via the ethtool user-def option.
The user-def 0xAAAABBBBCCCCDDDD: BBBB is the 2 byte pattern while AAAA
corresponds to its offset in the packet. Similarly DDDD is the 2 byte
pattern with CCCC being the corresponding offset. The offset ranges from
0x0 to 0x1F7 (up to 504 bytes into the packet). The offset starts from
the beginning of the packet.
This feature can be used to allow customers to set flow director rules
for protocols headers that are beyond standard ones supported by ethtool
(e.g. PFCP or GTP-U).
Like for matching GTP-U's TEID value 0x10203040:
ethtool -N ens787f0v0 flow-type udp4 dst-port 2152 \
user-def 0x002e102000303040 action 13
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Support the addition and deletion of Ethernet filters.
Supported fields are: proto
Supported flow-types are: ether
Example usage:
ethtool -N ens787f0v0 flow-type ether proto 0x8863 action 6
ethtool -N ens787f0v0 flow-type ether proto 0x8864 action 7
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Enable ethtool ntuple filter support on the VF driver using the virtchnl
interface to the PF driver and the Flow director functionality in the
hardware.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Enable returning FDIR completion status by checking the
ctrl_vsi Rx queue descriptor value.
To enable returning FDIR completion status from ctrl_vsi Rx queue,
COMP_Queue and COMP_Report of FDIR filter programming descriptor
needs to be properly configured. After program request sent to ctrl_vsi
Tx queue, ctrl_vsi Rx queue interrupt will be triggered and
completion status will be returned.
Driver will first issue request in ice_vc_fdir_add_fltr(), then
pass FDIR context to the background task in interrupt service routine
ice_vc_fdir_irq_handler() and finally deal with them in
ice_flush_fdir_ctx(). ice_flush_fdir_ctx() will check the descriptor's
value, fdir context, and then send back virtual channel message to VF
by calling ice_vc_add_fdir_fltr_post(). An additional timer will be
setup in case of hardware interrupt timeout.
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add new FDIR filter type to forward GTPU packets by matching TEID or QFI.
The filter is only enabled when COMMS DDP package is downloaded.
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add new filter type that allow forward non-IP Ethernet packets base on its
ethertype. The filter is only enabled when COMMS DDP package is loaded.
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add two new actions support for VF FDIR:
A passthrough action does not specify the destination queue, but
just allow the packet go to next pipeline stage, a typical use
cases is combined with a software mark (FDID) action.
Allow specify a 2^n continuous queues as the destination of a FDIR rule.
Packet distribution is based on current RSS configure.
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add basic FDIR flow list and pattern / action parse functions for VF.
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The virtual channel is going to be extended to support FDIR and
RSS configure from AVF. New data structures and OP codes will be
added, the patch enable the FDIR part.
To support above advanced AVF feature, we need to figure out
what kind of data structure should be passed from VF to PF to describe
an FDIR rule or RSS config rule. The common part of the requirement is
we need a data structure to represent the input set selection of a rule's
hash key.
An input set selection is a group of fields be selected from one or more
network protocol layers that could be identified as a specific flow.
For example, select dst IP address from an IPv4 header combined with
dst port from the TCP header as the input set for an IPv4/TCP flow.
The patch adds a new data structure virtchnl_proto_hdrs to abstract
a network protocol headers group which is composed of layers of network
protocol header(virtchnl_proto_hdr).
A protocol header contains a 32 bits mask (field_selector) to describe
which fields are selected as input sets, as well as a header type
(enum virtchnl_proto_hdr_type). Each bit is mapped to a field in
enum virtchnl_proto_hdr_field guided by its header type.
+------------+-----------+------------------------------+
| | Proto Hdr | Header Type A |
| | +------------------------------+
| | | BIT 31 | ... | BIT 1 | BIT 0 |
| |-----------+------------------------------+
|Proto Hdrs | Proto Hdr | Header Type B |
| | +------------------------------+
| | | BIT 31 | ... | BIT 1 | BIT 0 |
| |-----------+------------------------------+
| | Proto Hdr | Header Type C |
| | +------------------------------+
| | | BIT 31 | ... | BIT 1 | BIT 0 |
| |-----------+------------------------------+
| | .... |
+-------------------------------------------------------+
All fields in enum virtchnl_proto_hdr_fields are grouped with header type
and the value of the first field of a header type is always 32 aligned.
enum proto_hdr_type {
header_type_A = 0;
header_type_B = 1;
....
}
enum proto_hdr_field {
/* header type A */
header_A_field_0 = 0,
header_A_field_1 = 1,
header_A_field_2 = 2,
header_A_field_3 = 3,
/* header type B */
header_B_field_0 = 32, // = header_type_B << 5
header_B_field_0 = 33,
header_B_field_0 = 34
header_B_field_0 = 35,
....
};
So we have:
proto_hdr_type = proto_hdr_field / 32
bit offset = proto_hdr_field % 32
To simply the protocol header's operations, couple help macros are added.
For example, to select src IP and dst port as input set for an IPv4/UDP
flow.
we have:
struct virtchnl_proto_hdr hdr[2];
VIRTCHNL_SET_PROTO_HDR_TYPE(&hdr[0], IPV4)
VIRTCHNL_ADD_PROTO_HDR_FIELD(&hdr[0], IPV4, SRC)
VIRTCHNL_SET_PROTO_HDR_TYPE(&hdr[1], UDP)
VIRTCHNL_ADD_PROTO_HDR_FIELD(&hdr[1], UDP, DST)
The byte array is used to store the protocol header of a training package.
The byte array must be network order.
The patch added virtual channel support for iAVF FDIR add/validate/delete
filter. iAVF FDIR is Flow Director for Intel Adaptive Virtual Function
which can direct Ethernet packets to the queues of the Network Interface
Card. Add/delete command is adding or deleting one rule for each virtual
channel message, while validate command is just verifying if this rule
is valid without any other operations.
To add or delete one rule, driver needs to config TCAM and Profile,
build training packets which contains the input set value, and send
the training packets through FDIR Tx queue. In addition, driver needs to
manage the software context to avoid adding duplicated rules, deleting
non-existent rule, input set conflicts and other invalid cases.
NOTE:
Supported pattern/actions and their parse functions are not be included in
this patch, they will be added in a separate one.
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Simei Su <simei.su@intel.com>
Signed-off-by: Beilei Xing <beilei.xing@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
We are going to enable FDIR configure for AVF through virtual channel.
The first step is to add helper functions to support control VSI setup.
A control VSI will be allocated for a VF when AVF creates its
first FDIR rule through ice_vf_ctrl_vsi_setup().
The patch will also allocate FDIR rule space for VF's control VSI.
If a VF asks for flow director rules, then those should come entirely
from the best effort pool and not from the guaranteed pool. The patch
allow a VF VSI to have only space in the best effort rules.
Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Separate IPv4 and IPv6 ptype bit mask table into 2 tables:
with or without L4 protocols.
When a flow filter without any l4 type is specified, the
ICE_FLOW_SEG_HDR_IPV_OTHER flag can be used to describe if user
want to create a IP rule target for all IP packet or just IP
packet without l4 header.
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
To apply different input set for GTP-U packet with or without extend
header as well as GTP-U uplink and downlink, we need to add TCAM mask
matching capability. This allows comprehending different PTYPE
attributes by examining flags from the parser. Using this method,
different profiles can be used by examining flag values from the parser.
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add more protocol support in flow filter, these
include PPPoE, L2TPv3, GTP, PFCP, ESP and AH.
Signed-off-by: Ting Xu <ting.xu@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
To support FDIR input set with protocol field like DSCP, TTL,
PROT, etc. which is not word aligned, we need to enable field
vector masking.
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add more protocol and field support for flow filter include:
ETH, VLAN, ICMP, ARP and TCP flag.
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The function e1000e_pm_prepare() may have no callers depending
on configuration, so it must be marked __maybe_unused to avoid
harmless warning:
drivers/net/ethernet/intel/e1000e/netdev.c:6926:12:
warning: 'e1000e_pm_prepare' defined but not used [-Wunused-function]
6926 | static int e1000e_pm_prepare(struct device *dev)
| ^~~~~~~~~~~~~~~~~
Fixes: ccf8b940e5 ("e1000e: Leverage direct_complete to speed up s2ram")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix the following coccicheck warnings:
./drivers/net/ethernet/intel/igc/igc_main.c:4961:2-14: WARNING:
Assignment of 0/1 to bool variable.
./drivers/net/ethernet/intel/igc/igc_main.c:4955:2-14: WARNING:
Assignment of 0/1 to bool variable.
./drivers/net/ethernet/intel/igc/igc_main.c:4933:1-13: WARNING:
Assignment of 0/1 to bool variable.
./drivers/net/ethernet/intel/igc/igc_main.c:4592:1-24: WARNING:
Assignment of 0/1 to bool variable.
./drivers/net/ethernet/intel/igc/igc_main.c:4438:2-25: WARNING:
Assignment of 0/1 to bool variable.
./drivers/net/ethernet/intel/igc/igc_main.c:4396:2-25: WARNING:
Assignment of 0/1 to bool variable.
./drivers/net/ethernet/intel/igc/igc_main.c:4018:2-25: WARNING:
Assignment of 0/1 to bool variable.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
MII_CR_LOOPBACK masks not in use in i225 device and can be removed.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Force PHY speed not supported for i225 devices.
MII_CR_SPEED masks not in use in i225 device and can be removed.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
MII_CR_RESET mask not in use in i225 device and can be removed
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a couple of checks to make sure timestamping is on and that the
timestamp value from DMA is valid. This avoids any functional issues
that could come from a misinterpreted time stamp.
One of the functions changed doesn't need a return value added because
there was no value in checking from the calling locations.
While here, fix a couple of reverse christmas tree issues next to
the code being changed.
Fixes: f56e7bba22 ("igb: Pull timestamp from fragment before adding it to skb")
Fixes: 9cbc948b5a ("igb: add XDP support")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The include guard "_E1000_HW_H_" is used by two separate header files in
two different drivers (e1000/e1000_hw.h and igb/e1000_hw.h). Using the
same include guard macro in more than one header file may cause
unexpected behavior from the compiler. Fix this by renaming the
duplicate guard in the igb driver.
Fixes: 9d5c824399 ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Signed-off-by: Tom Seewald <tseewald@gmail.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The include guard "_E1000_HW_H_" is used by header files in three
different drivers (e1000/e1000_hw.h, e1000e/hw.h, and igb/e1000_hw.h).
Using the same include guard macro in more than one header file may
cause unexpected behavior from the compiler. Fix the duplicate include
guard in the e1000e driver by renaming it.
Fixes: bc7f75fa97 ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)")
Signed-off-by: Tom Seewald <tseewald@gmail.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
We want to change the current ndo_xdp_xmit drop semantics because it will
allow us to implement better queue overflow handling. This is working
towards the larger goal of a XDP TX queue-hook. Move XDP_REDIRECT error
path handling from each XDP ethernet driver to devmap code. According to
the new APIs, the driver running the ndo_xdp_xmit pointer, will break tx
loop whenever the hw reports a tx error and it will just return to devmap
caller the number of successfully transmitted frames. It will be devmap
responsibility to free dropped frames.
Move each XDP ndo_xdp_xmit capable driver to the new APIs:
- veth
- virtio-net
- mvneta
- mvpp2
- socionext
- amazon ena
- bnxt
- freescale (dpaa2, dpaa)
- xen-frontend
- qede
- ice
- igb
- ixgbe
- i40e
- mlx5
- ti (cpsw, cpsw-new)
- tun
- sfc
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/bpf/ed670de24f951cfd77590decf0229a0ad7fd12f6.1615201152.git.lorenzo@kernel.org
Update the Intel drivers to make use of ethtool_sprintf. The general idea
is to reduce code size and overhead by replacing the repeated pattern of
string printf statements and ETH_STRING_LEN counter increments.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2021-03-16
This series contains updates to i40e, ixgbe, and ice drivers.
Magnus Karlsson says:
Optimize run_xdp_zc() for the XDP program verdict being XDP_REDIRECT
in the xsk zero-copy path. This path is only used when having AF_XDP
zero-copy on and in that case most packets will be directed to user
space. This provides around 100k extra packets in throughput on my
server when running l2fwd in xdpsock.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2021-03-15
This series contains updates to e1000e only.
Chen Yu says:
The NIC is put in runtime suspend status when there is no cable connected.
As a result, it is safe to keep non-wakeup NIC in runtime suspended during
s2ram because the system does not rely on the NIC plug event nor WoL to
wake up the system. Besides that, unlike the s2idle, s2ram does not need to
manipulate S0ix settings during suspend.
====================
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although there is platform issue of runtime suspend support
on CNP, it would be more flexible to let the user decide whether
to disable runtime or not because:
1. This can be done in userspace via
echo on > /sys/devices/pci0000\:00/0000\:00\:1f.d/power/control
2. More and more NICs would support runtime suspend, disabling the
runtime suspend on them by default would impact the validation.
Only disable runtime suspend on CNP in case of any user space regression.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The NIC is put in runtime suspend status when there is no cable connected.
As a result, it is safe to keep non-wakeup NIC in runtime suspended during
s2ram because the system does not rely on the NIC plug event nor WoL to wake
up the system. Besides that, unlike the s2idle, s2ram does not need to
manipulate S0ix settings during suspend.
This patch introduces the .prepare() for e1000e so that if the NIC is runtime
suspended the subsequent suspend/resume hooks will be skipped so as to speed
up the s2ram. The pm core will check whether the NIC is a wake up device so
there's no need to check it again in .prepare(). DPM_FLAG_SMART_PREPARE flag
should be set during probe to ask the pci subsystem to honor the driver's
prepare() result. Besides, the NIC remains runtime suspended after resumed
from s2ram as there is no need to resume it.
Tested on i7-2600K with 82579V NIC
Before the patch:
e1000e 0000:00:19.0: pci_pm_suspend+0x0/0x160 returned 0 after 225146 usecs
e1000e 0000:00:19.0: pci_pm_resume+0x0/0x90 returned 0 after 140588 usecs
After the patch:
echo disabled > //sys/devices/pci0000\:00/0000\:00\:19.0/power/wakeup
becomes 0 usecs because the hooks will be skipped.
Suggested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
s/structue/structure/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Optimize ice_run_xdp_zc() for the XDP program verdict being
XDP_REDIRECT in the xsk zero-copy path. This path is only used when
having AF_XDP zero-copy on and in that case most packets will be
directed to user space. This provides a little over 100k extra packets
in throughput on my server when running l2fwd in xdpsock.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Optimize ixgbe_run_xdp_zc() for the XDP program verdict being
XDP_REDIRECT in the xsk zero-copy path. This path is only used when
having AF_XDP zero-copy on and in that case most packets will be
directed to user space. This provides a little under 100k extra
packets in throughput on my server when running l2fwd in xdpsock.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Optimize i40e_run_xdp_zc() for the XDP program verdict being
XDP_REDIRECT in the xsk zero-copy path. This path is only used when
having AF_XDP zero-copy on and in that case most packets will be
directed to user space. This provides a little over 100k extra packets
in throughput on my server when running l2fwd in xdpsock.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Igb needs a similar fix as commit 75aab4e10a ("i40e: avoid
premature Rx buffer reuse")
The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.
To avoid this, store the page count prior invoking xdp_do_redirect().
Longer explanation:
Intel NICs have a recycle mechanism. The main idea is that a page is
split into two parts. One part is owned by the driver, one part might
be owned by someone else, such as the stack.
t0: Page is allocated, and put on the Rx ring
+---------------
used by NIC ->| upper buffer
(rx_buffer) +---------------
| lower buffer
+---------------
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX
t1: Buffer is received, and passed to the stack (e.g.)
+---------------
| upper buff (skb)
+---------------
used by NIC ->| lower buffer
(rx_buffer) +---------------
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX - 1
t2: Buffer is received, and redirected
+---------------
| upper buff (skb)
+---------------
used by NIC ->| lower buffer
(rx_buffer) +---------------
Now, prior calling xdp_do_redirect():
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX - 2
This means that buffer *cannot* be flipped/reused, because the skb is
still using it.
The problem arises when xdp_do_redirect() actually frees the
segment. Then we get:
page count == USHRT_MAX - 1
rx_buffer->pagecnt_bias == USHRT_MAX - 2
From a recycle perspective, the buffer can be flipped and reused,
which means that the skb data area is passed to the Rx HW ring!
To work around this, the page count is stored prior calling
xdp_do_redirect().
Fixes: 9cbc948b5a ("igb: add XDP support")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ixgbe_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on __IXGBE_RX_BUILD_SKB_ENABLED flag.
Currently, the callsite of mentioned function is placed incorrectly
within ixgbe_setup_rx_resources() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.
Fix this by moving ixgbe_rx_offset() to ixgbe_configure_rx_ring() after
the flag setting, which happens to be set in ixgbe_set_rx_buffer_len.
Fixes: c0d4e9d223 ("ixgbe: store the result of ixgbe_rx_offset() onto ixgbe_ring")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
ice_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on ICE_RX_FLAGS_RING_BUILD_SKB flag as well as XDP prog presence.
Currently, the callsite of mentioned function is placed incorrectly
within ice_setup_rx_ring() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.
Fix this by moving ice_rx_offset() to ice_setup_rx_ctx() after the flag
setting.
Fixes: f1b1f409bf ("ice: store the result of ice_rx_offset() onto ice_ring")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i40e_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on I40E_RXR_FLAGS_BUILD_SKB_ENABLED flag.
Currently, the callsite of mentioned function is placed incorrectly
within i40e_setup_rx_descriptors() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.
For the record, below is the call graph:
i40e_vsi_open
i40e_vsi_setup_rx_resources
i40e_setup_rx_descriptors
i40e_rx_offset() <-- sets offset to 0 as build_skb flag is set below
i40e_vsi_configure_rx
i40e_configure_rx_ring
set_ring_build_skb_enabled(ring) <-- set build_skb flag
Fix this by moving i40e_rx_offset() to i40e_configure_rx_ring() after
the flag setting.
Fixes: f7bb0d71d6 ("i40e: store the result of i40e_rx_offset() onto i40e_ring")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix the wrong napi work done reporting in the xsk path of the ice
driver. The code in the main Rx processing loop was written to assume
that the buffer allocation code returns true if all allocations where
successful and false if not. In contrast with all other Intel NIC xsk
drivers, the ice_alloc_rx_bufs_zc() has the inverted logic messing up
the work done reporting in the napi loop.
This can be fixed either by inverting the return value from
ice_alloc_rx_bufs_zc() in the function that uses this in an incorrect
way, or by changing the return value of ice_alloc_rx_bufs_zc(). We
chose the latter as it makes all the xsk allocation functions for
Intel NICs behave in the same way. My guess is that it was this
unexpected discrepancy that gave rise to this bug in the first place.
Fixes: 5bb0c4b5eb ("ice, xsk: Move Rx allocation out of while-loop")
Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There is one e1e_wphy() call in e1000_set_d0_lplu_state_82571
that we have caught its return value but lack further handling.
Check and terminate the execution flow just like other e1e_wphy()
in this function.
Fixes: bc7f75fa97 ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
A possible race condition was found in e1000_reset_task,
after discovering a similar issue in igb driver via
commit 024a8168b7 ("igb: reinit_locked() should be called
with rtnl_lock").
Added rtnl_lock() and rtnl_unlock() to avoid this.
Fixes: bc7f75fa97 ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The comment describing the timestamps layout in the packet buffer is
wrong and the code is actually retrieving the timestamp in Timer 1
reference instead of Timer 0. This hasn't been a big issue so far
because hardware is configured to report both timestamps using Timer 0
(see IGC_SRRCTL register configuration in igc_ptp_enable_rx_timestamp()
helper). This patch fixes the comment and the code so we retrieve the
timestamp in Timer 0 reference as expected.
This patch also takes the opportunity to get rid of the hw.mac.type check
since it is not required.
Fixes: 81b055205e ("igc: Add support for RX timestamping")
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Signed-off-by: Vedang Patel <vedang.patel@intel.com>
Signed-off-by: Jithu Joseph <jithu.joseph@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The Supported Pause Frame always display "No" even though the Advertised
pause frame showing the correct setting based on the pause parameters via
ethtool. Set bit in link_ksettings to "Supported" for Pause Frame.
Before output:
Supported pause frame use: No
Expected output:
Supported pause frame use: Symmetric
Fixes: 8c5ad0dae9 ("igc: Add ethtool support")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Reviewed-by: Malli C <mallikarjuna.chilakala@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix Pause Frame Advertising when getting the advertisement via ethtool.
Remove setting the "advertising" bit in link_ksettings during default
case when Tx and Rx are in off state with Auto Negotiate off.
Below is the original output of advertisement link during Tx and Rx off:
Advertised pause frame use: Symmetric Receive-only
Expected output:
Advertised pause frame use: No
Fixes: 8c5ad0dae9 ("igc: Add ethtool support")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Reviewed-by: Malli C <mallikarjuna.chilakala@intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This commit applies to the igc_reset_task the same changes that
were applied to the igb driver in commit 024a8168b7 ("igb:
reinit_locked() should be called with rtnl_lock")
and fix possible race in reset subtask.
Fixes: 0507ef8a03 ("igc: Add transmit and receive fastpath and interrupt handlers")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
As explained in commit 29d98f54a4 ("net: enetc: allow hardware
timestamping on TX queues with tc-etf enabled"), hardware TX
timestamping requires an skb with skb->tstamp = 0. When a packet is sent
with SO_TXTIME, the skb->skb_mstamp_ns corrupts the value of skb->tstamp,
so the drivers need to explicitly reset skb->tstamp to zero after
consuming the TX time.
Create a helper named skb_txtime_consumed() which does just that. All
drivers which offload TC_SETUP_QDISC_ETF should implement it, and it
would make it easier to assess during review whether they do the right
thing in order to be compatible with hardware timestamping or not.
Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When iavf_process_config() fails, no error return code of
iavf_init_get_resources() is assigned.
To fix this bug, err is assigned with the return value of
iavf_process_config(), and then err is checked.
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ixgbe_fdir_write_perfect_filter_82599() fails,
input allocated by kzalloc() has not been freed,
which leads to memleak.
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Based on talks and indirect references ixgbe IPsec offlod do not
support IPsec tunnel mode offload. It can only support IPsec transport
mode offload. Now explicitly fail when creating non transport mode SA
with offload to avoid false performance expectations.
Fixes: 63a67fe229 ("ixgbe: add ipsec offload add and remove SA")
Signed-off-by: Antony Antony <antony@phenome.org>
Acked-by: Shannon Nelson <snelson@pensando.io>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-02-22
Dave corrects reporting of max TCs to use the value from hardware
capabilities and setting of DCBx capability bits when changing between
SW and FW LLDP.
Brett fixes trusted VF multicast promiscuous not receiving expected
packets and corrects VF max packet size when a port VLAN is configured.
Henry updates available RSS queues following a change in channel count
with a user defined LUT.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: update the number of available RSS queues
ice: Fix state bits on LLDP mode switch
ice: Account for port VLAN in VF max packet size calculation
ice: Set trusted VF as default VSI when setting allmulti on
ice: report correct max number of TCs
====================
Link: https://lore.kernel.org/r/20210222235814.834282-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-02-19
This series contains updates to i40e driver only.
Slawomir resolves an issue with the IPv6 extension headers being
processed incorrectly.
Keita Suzuki fixes a memory leak on probe failure.
Mateusz initializes AQ command structures to zero to comply with
spec, fixes FW flow control settings being overwritten and resolves an
issue with adding VLAN filters after enabling FW LLDP. He also adds
an additional check when adding TC filter as the current check doesn't
properly distinguish between IPv4 and IPv6.
Sylwester removes setting disabled bit when syncing filters as this
prevents VFs from completing setup.
Norbert cleans up sparse warnings.
v2:
- Fix fixes tag on patch 7
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: Fix endianness conversions
i40e: Fix add TC filter for IPv6
i40e: Fix VFs not created
i40e: Fix addition of RX filters after enabling FW LLDP agent
i40e: Fix overwriting flow control settings during driver loading
i40e: Add zero-initialization of AQ command structures
i40e: Fix memory leak in i40e_probe
i40e: Fix flow for IPv6 next header (extension header)
====================
Link: https://lore.kernel.org/r/20210219213606.2567536-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It was possible to have Rx queues that were not available for use
by RSS. This would happen when increasing the number of Rx queues
while there was a user defined RSS LUT.
Always update the number of available RSS queues when changing the
number of Rx queues.
Fixes: 87324e747f ("ice: Implement ethtool ops for channels")
Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
DCBX_CAP bits were not being adjusted when switching
between SW and FW controlled LLDP.
Adjust bits to correctly indicate which mode the
LLDP logic is in.
Fixes: b94b013eb6 ("ice: Implement DCBNL support")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently if an AVF driver doesn't account for the possibility of a
port VLAN when determining its max packet size then packets at MTU will
be dropped. It is not the VF driver's responsibility to account for a
port VLAN so fix this. To fix this, do the following:
1. Add a function that determines the max packet size a VF is allowed by
using the port's max packet size and whether the VF is in a port
VLAN. If a port VLAN is configured then a VF's max packet size will
always be the port's max packet size minus VLAN_HLEN. Otherwise it
will be the port's max packet size.
2. Use this function to verify the max packet size from the VF.
3. If there is a port VLAN configured then add 4 bytes (VLAN_HLEN) to
the VF's max packet size configuration.
Also, the VIRTCHNL_OP_GET_VF_RESOURCES message provides the capability
to communicate a VF's max packet size. Use the new function for this
purpose.
Fixes: 1071a8358a ("ice: Implement virtchnl commands for AVF support")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently the PF will only set a trusted VF as the default VSI when it
requests FLAG_VF_UNICAST_PROMISC over VIRTCHNL. However, when
FLAG_VF_MULTICAST_PROMISC is set it's expected that the trusted VF will
see multicast packets that don't have a matching destination MAC in the
devices internal switch. Fix this by setting the trusted VF as the
default VSI if either FLAG_VF_UNICAST_PROMISC or
FLAG_VF_MULTICAST_PROMISC is set.
Fixes: 01b5e89aab ("ice: Add VF promiscuous support")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In the driver currently, we are reporting max number of TCs
to the DCBNL callback as a kernel define set to 8. This is
preventing userspace applications performing DCBx to correctly
down map the TCs from requested to actual values.
Report the actual max TC value to userspace from the capability
struct.
Fixes: b94b013eb6 ("ice: Implement DCBNL support")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fixes the following sparse warnings:
i40e_main.c:5953:32: warning: cast from restricted __le16
i40e_main.c:8008:29: warning: incorrect type in assignment (different base types)
i40e_main.c:8008:29: expected unsigned int [assigned] [usertype] ipa
i40e_main.c:8008:29: got restricted __le32 [usertype]
i40e_main.c:8008:29: warning: incorrect type in assignment (different base types)
i40e_main.c:8008:29: expected unsigned int [assigned] [usertype] ipa
i40e_main.c:8008:29: got restricted __le32 [usertype]
i40e_txrx.c:1950:59: warning: incorrect type in initializer (different base types)
i40e_txrx.c:1950:59: expected unsigned short [usertype] vlan_tag
i40e_txrx.c:1950:59: got restricted __le16 [usertype] l2tag1
i40e_txrx.c:1953:40: warning: cast to restricted __le16
i40e_xsk.c:448:38: warning: invalid assignment: |=
i40e_xsk.c:448:38: left side has type restricted __le64
i40e_xsk.c:448:38: right side has type int
Fixes: 2f4b411a3d ("i40e: Enable cloud filters via tc-flower")
Fixes: 2a508c64ad ("i40e: fix VLAN.TCI == 0 RX HW offload")
Fixes: 3106c580fb ("i40e: Use batched xsk Tx interfaces to increase performance")
Fixes: 8f88b3034d ("i40e: Add infrastructure for queue channel support")
Signed-off-by: Norbert Ciosek <norbertx.ciosek@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix insufficient distinction between IPv4 and IPv6 addresses
when creating a filter.
IPv4 and IPv6 are kept in the same memory area. If IPv6 is added,
then it's caught by IPv4 check, which leads to err -95.
Fixes: 2f4b411a3d ("i40e: Enable cloud filters via tc-flower")
Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Reviewed-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When creating VFs they were sometimes not getting resources.
It was caused by not executing i40e_reset_all_vfs due to
flag __I40E_VF_DISABLE being set on PF. Because of this
IAVF was never able to finish setup sequence never
getting reset indication from PF.
Changed test_and_set_bit __I40E_VF_DISABLE in
i40e_sync_filters_subtask to test_bit and removed clear_bit.
This function should not set this bit it should only check
if it hasn't been already set.
Fixes: a7542b8760 ("i40e: check __I40E_VF_DISABLE bit in i40e_sync_filters_subtask")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fix addition of VLAN filter for PF after enabling FW LLDP agent.
Changing LLDP Agent causes FW to re-initialize per NVM settings.
Remove default PF filter and move "Enable/Disable" to currently used
reset flag.
Without this patch PF would try to add MAC VLAN filter with default
switch filter present. This causes AQ error and sets promiscuous mode
on.
Fixes: c65e78f87f ("i40e: Further implementation of LLDP")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Reviewed-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
During driver loading flow control settings were written to FW
using a variable which was always zero, since it was being set
only by ethtool. This behavior has been corrected and driver
no longer overwrites the default FW/NVM settings.
Fixes: 373149fc99 ("i40e: Decrease the scope of rtnl lock")
Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Zero-initialize AQ command data structures to comply with
API specifications.
Fixes: 2f4b411a3d ("i40e: Enable cloud filters via tc-flower")
Fixes: f4492db16d ("i40e: Add NPAR BW get and set functions")
Signed-off-by: Andrzej Sawuła <andrzej.sawula@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Struct i40e_veb is allocated in function i40e_setup_pf_switch, and
stored to an array field veb inside struct i40e_pf. However when
i40e_setup_misc_vector fails, this memory leaks.
Fix this by calling exit and teardown functions.
Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When a packet contains an IPv6 header with next header which is
an extension header and not a protocol one, the kernel function
skb_transport_header called with such sk_buff will return a
pointer to the extension header and not to the TCP one.
The above explained call caused a problem with packet processing
for skb with encapsulation for tunnel with I40E_TX_CTX_EXT_IP_IPV6.
The extension header was not skipped at all.
The ipv6_skip_exthdr function does check if next header of the IPV6
header is an extension header and doesn't modify the l4_proto pointer
if it points to a protocol header value so its safe to omit the
comparison of exthdr and l4.hdr pointers. The ipv6_skip_exthdr can
return value -1. This means that the skipping process failed
and there is something wrong with the packet so it will be dropped.
Fixes: a3fd9d8876 ("i40e/i40evf: Handle IPv6 extension headers in checksum offload")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The variable mfs_max is not initialized and is being compared to find
the maximum value. Fix this by initializing it to 0.
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: 90bc8e003b ("i40e: Add hardware configuration for software based DCB")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems that the right argument to be passed is &tcp_ip6_spec->ip6dst,
not &tcp_ip6_spec->ip6src, when calling function ipv6_addr_any().
Addresses-Coverity-ID: 1501734 ("Copy-paste error")
Fixes: efca91e89b ("i40e: Add flow director support for IPv6")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Output of ixgbe_rx_offset() is based on ethtool's priv flag setting, which
when changed, causes PF reset (disables napi, frees irqs, loads
different Rx mem model, etc.). This means that within napi its result is
constant and there is no reason to call it per each processed frame.
Add new 'rx_offset' field to ixgbe_ring that is meant to hold the
ixgbe_rx_offset() result and use it within ixgbe_clean_rx_irq().
Furthermore, use it within ixgbe_alloc_mapped_page().
Last but not least, un-inline the function of interest as it lives in .c
file so let compiler do the decision about the inlining.
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Output of ice_rx_offset() is based on ethtool's priv flag setting, which
when changed, causes PF reset (disables napi, frees irqs, loads
different Rx mem model, etc.). This means that within napi its result is
constant and there is no reason to call it per each processed frame.
Add new 'rx_offset' field to ice_ring that is meant to hold the
ice_rx_offset() result and use it within ice_clean_rx_irq().
Furthermore, use it within ice_alloc_mapped_page().
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Output of i40e_rx_offset() is based on ethtool's priv flag setting,
which when changed, causes PF reset (disables napi, frees irqs, loads
different Rx mem model, etc.). This means that within napi its result is
constant and there is no reason to call it per each processed frame.
Add new 'rx_offset' field to i40e_ring that is meant to hold the
i40e_rx_offset() result and use it within i40e_clean_rx_irq().
Furthermore, use it within i40e_alloc_mapped_page().
Last but not least, un-inline the function of interest so that compiler
makes the decision about inlining as it lives in .c file.
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fold the count decrement into the while-statement.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Whole zero-copy variant of clean Rx IRQ is executed when xsk_pool is
attached to rx_ring and it can happen only when XDP program is present
on interface. Therefore it is safe to assume that program is always
!NULL and there is no need for checking it in ice_run_xdp_zc.
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
dev_validate_mtu checks that mtu value specified by user is not less
than min mtu and not greater than max allowed mtu. It is being done
before calling the ndo_change_mtu exposed by driver, so remove these
redundant checks in ice_change_mtu.
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Similar thing has been done in i40e, as there is no real need for having
the sk_buff pointer in each rx_buf. Non-eop frames can be simply handled
on that pointer moved upwards to rx_ring.
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There's no need for 'result' variable, we can directly return the
internal status based on action returned by xdp prog.
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i40e_is_non_eop had a leftover comment and unused skb argument which was
used for placing the skb onto rx_buf in case when current buffer was
non-eop one. This is not relevant anymore as commit e72e56597b
("i40e/i40evf: Moves skb from i40e_rx_buffer to i40e_ring") pulled the
non-complete skb handling out of rx_bufs up to rx_ring. Therefore,
let's adjust the function arguments that i40e_is_non_eop takes.
Furthermore, since there is already a function responsible for bumping
the ntc, make use of that and drop that logic from i40e_is_non_eop so
that the scope of this function is limited to what the name actually
states.
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i40e_cleanup_headers has a statement about check against skb being
linear or not which is not relevant anymore, so let's remove it.
Same case for i40e_can_reuse_rx_page, it references things that are not
present there anymore.
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Net core handles the case where netdev has no xdp prog attached and
current prog is NULL. Therefore, remove such check within
i40e_xdp_setup.
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The variable ret is overwritten by the following call
i40e_clean_arq_element() and the assignment is useless, so remove it.
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Allow user to specify VLAN field and add it to flow director. Show VLAN
field in "ethtool -n ethx" command.
Handle VLAN type and tag field provided by ethtool command. Refactored
filter addition, by replacing static arrays with runtime dummy packet
creation, which allows specifying VLAN field.
Previously, VLAN field was omitted.
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Flow director for IPv6 is not supported.
1) Implementation of support for IPv6 flow director.
2) Added handlers for addition of TCP6, UDP6, SCTP6, IPv6.
3) Refactored legacy code to make it more generic.
4) Added packet templates for TCP6, UDP6, SCTP6, IPv6.
5) Added handling of IPv6 source and destination address for flow director.
6) Improved argument passing for source and destination portin TCP6, UDP6
and SCTP6.
7) Added handling of ethtool -n for IPv6, TCP6,UDP6, SCTP6.
8) Used correct bit flag regarding FLEXOFF field of flow director data
descriptor.
Without this patch, there would be no support for flow director on IPv6,
TCP6, UDP6, SCTP6.
Tested based on x710 datasheet by using:
ethtool -N enp133s0f0 flow-type tcp4 src-port 13 dst-port 37 user-def 0x44142 action 1
ethtool -N enp133s0f0 flow-type tcp6 src-port 13 dst-port 40 user-def 0x44142 action 2
ethtool -N enp133s0f0 flow-type udp4 src-port 20 dst-port 40 user-def 0x44142 action 3
ethtool -N enp133s0f0 flow-type udp6 src-port 25 dst-port 40 user-def 0x44142 action 4
ethtool -N enp133s0f0 flow-type sctp4 src-port 55 dst-port 65 user-def 0x44142 action 5
ethtool -N enp133s0f0 flow-type sctp6 src-port 60 dst-port 40 user-def 0x44142 action 6
ethtool -N enp133s0f0 flow-type ip4 src-ip 1.1.1.1 dst-ip 1.1.1.4 user-def 0x44142 action 7
ethtool -N enp133s0f0 flow-type ip6 src-ip fe80::3efd:feff:fe6f:bbbb dst-ip fe80::3efd:feff:fe6f:aaaa user-def 0x44142 action 8
Then send traffic from client which matches the criteria provided to ethtool.
Observe that packets are redirected to user set queues with ethtool -S <interface>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Implement Energy Efficient Ethernet (EEE) status getting & setting.
The i40e_get_eee() requesting PHY EEE capabilities from firmware.
The i40e_set_eee() function requests PHY EEE capabilities
from firmware and sets PHY EEE advertising to full abilities or 0
depending whether EEE is to be enabled or disabled.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add callbacks used by software based LLDP agent, which allows to
configure DCB feature from userspace.
Update copyright dates as appropriate.
If LLDP agent is turned off in BIOS, or after setting private flag
("disable-fw-lldp on"). The driver initialized DCB functionality with
default values, one traffic class with 100% bandwidth allocated.
The new netlink callbacks are required for software LLDP agent, it
must be able to acquire current DCB configuration of a network port
and apply DCB configuration changes, if required.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add extra handling on changing the "disable-fw-lldp" private
flag to properly initialize software based DCB feature.
Add default configuration of DCB functionality when Firmware
LLDP agent is turned off, in case of driver probe and device
reset on reconfiguration.
Update copyright dates as appropriate.
Software based DCB is a brand-new feature in i40e driver.
Before, DCB was implemented by Firmware LLDP agent only. The agent was
responsible for handling incoming DCB-related LLDP frames and
applying received DCB configuration to hardware.
Default configuration and new initialization flow for software based
DCB is required. If LLDP agent is turned off in BIOS, or after
setting private flag ("disable-fw-lldp on"). The driver initializes
DCB functionality with default values, one traffic class with 100%
bandwidth allocated.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add registers and definitions required for applying
DCB related hardware configuration.
Add functions responsible for calculating and setting proper
hardware configuration values for software based DCB functionality.
Add function responsible for invoking Admin Queue command, which
results in applying new DCB configuration to the hardware.
Update copyright dates as appropriate.
Software based DCB is a brand-new feature in i40e driver.
Before, DCB was implemented by Firmware LLDP agent only. The agent was
responsible for handling incoming DCB-related LLDP frames and
applying received DCB configuration to hardware.
New communication channel between software and hardware is required
for software driver. It must be able to calculate and configure all
the registers related for DCB feature.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2021-02-08
This series contains updates to i40e driver only.
Cristian makes improvements to driver XDP path. Avoids writing
next-to-clean pointer on every update, removes redundant updates of
cleaned_count and buffer info, creates a helper function to consolidate
XDP actions and simplifies some of the behavior.
Eryk adds messages to inform the user when MTU is larger than supported
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When attempting to link XDP prog with MTU larger than supported,
user is not informed why XDP linking fails. Adding proper
error message: "MTU too large to enable XDP".
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Eryk Rybak <eryk.roch.rybak@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Consolidate the actions performed on the packet based on the XDP
program result into a separate function that is easier to read and
maintain. Simplify the i40e_construct_skb_zc function, so that the
input xdp buffer is always freed, regardless of whether the output
skb is successfully created or not. Simplify the behavior of the
i40e_clean_rx_irq_zc function, so that the current packet descriptor
is dropped when function i40_construct_skb_zc returns an error as
opposed to re-processing the same description on the next invocation.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
For performance reasons, remove the redundant buffer info updates
(*bi = NULL). The buffers ready to be cleaned can easily be tracked
based on the ring next-to-clean variable, which is consistently
updated.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
For performance reasons, remove the redundant updates of the cleaned_count
variable, as its value can be computed based on the ring next-to-clean
variable, which is consistently updated.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
For performance reasons, avoid writing the ring next-to-clean pointer
value back to memory on every update, as it is not really necessary.
Instead, simply read it at initialization into a local copy, update
the local copy as necessary and write the local copy back to memory
after the last update.
Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently if the driver is unable to get all the MSI-X vectors it wants, it
falls back to the minimum configuration which equates to a single Tx/Rx
traffic queue pair. Instead of using the minimum configuration, if given
more vectors than the minimum, utilize those vectors for additional traffic
queues after accounting for other interrupts.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
This message indicates an error on close, not open.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Casting a void * rvalue in an assignment is unnecessary in C; remove the
casts.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Refactor the DCB related variables out of the ice_port_info_struct. The
goal is to make the ice_port_info struct cleaner.
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Co-developed-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The writeback enable logic was incorrectly implemented (due to
misunderstanding what the side effects of the implementation would be
during polling).
Fix this logic issue, while implementing a new feature allowing the user
to control the writeback frequency using the knobs for controlling
interrupt throttling that we already have. Basically if you leave
adaptive interrupts enabled, the writeback frequency will be varied even
if busy_polling or if napi-poll is in use. If the interrupt rates are
set to a fixed value by ethtool -C and adaptive is off, the driver will
allow the user-set interrupt rate to guide how frequently the hardware
will complete descriptors to the driver.
Effectively the user will get a control over the hardware efficiency,
allowing the choice between immediate interrupts or delayed up to a
maximum of the interrupt rate, even when interrupts are disabled
during polling.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Co-developed-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The core clock frequency is currently hardcoded at 446 MHz for the RL
profile calculations. This causes issues since not all devices use that
clock frequency. Read the GLGEN_CLKSTAT_SRC register to determine which PSM
clock frequency is selected. This ensures that the rate limiter profile
calculations will be correct.
Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Create set scheduler aggregator node and move for VSIs into respective
scheduler node. Max children per aggregator node is 64.
There are two types of aggregator node(s) created.
1. dedicated node for PF and _CTRL VSIs
2. dedicated node(s) for VFs.
As part of reset and rebuild, aggregator nodes are recreated and VSIs
are moved to respective aggregator node.
Having related VSIs in respective tree avoid starvation between PF and VF
w.r.t Tx bandwidth.
Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Co-developed-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Victor Raj <victor.raj@intel.com>
Co-developed-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add the framework and initial implementation for receiving and processing
netdev bonding events. This is only the software support and the
implementation of the HW offload for bonding support will be coming at a
later time. There are some architectural gaps that need to be closed
before that happens.
Because this is a software only solution that supports in kernel bonding,
SR-IOV is not supported with this implementation.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Current implementation of netdev already contains xsk_buff_pools.
We no longer have to contain these structures in ice_vsi.
Refactor the code to operate on netdev-provided xsk_buff_pools.
Move scheduling napi on each queue to a separate function to
simplify setup function.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There is an issue with some NVMs where an already existent LLDP
filter is blocking the creation of a filter to allow LLDP packets
to be redirected to the default VSI for the interface. This is
blocking all LLDP functionality based in the kernel when the FW
LLDP agent is disabled (e.g. software based DCBx).
Implement the new AQ command to allow adding VSI destinations to
existent filters on NVM versions that support the new command.
The new lldp_fltr_ctrl AQ command supports Rx filters only, so the
code flow for adding filters to disable Tx of control frames will
remain intact.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently there is no message printed on the host when a VF goes in and
out of promiscuous mode. This is causing confusion because this is the
expected behavior based on i40e. Fix this.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The check for a NULL pf pointer is moot since the earlier declaration and
assignment of struct device *dev already de-referenced the pointer. Also,
the only caller of ice_set_dflt_mib() already ensures pf is not NULL.
Cc: Dave Ertman <david.m.ertman@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use the flex_array_size() helper with the recently added flexible array
members in structures.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].
Refactor the code according to the use of a flexible-array member in
struct ice_res_tracker, instead of a one-element array and use the
struct_size() helper to calculate the size for the allocations.
Also, notice that the code below suggests that, currently, two too many
bytes are being allocated with devm_kzalloc(), as the total number of
entries (pf->irq_tracker->num_entries) for pf->irq_tracker->list[] is
_vectors_ and sizeof(*pf->irq_tracker) also includes the size of the
one-element array _list_ in struct ice_res_tracker.
drivers/net/ethernet/intel/ice/ice_main.c:3511:
3511 /* populate SW interrupts pool with number of OS granted IRQs. */
3512 pf->num_avail_sw_msix = (u16)vectors;
3513 pf->irq_tracker->num_entries = (u16)vectors;
3514 pf->irq_tracker->end = pf->irq_tracker->num_entries;
With this change, the right amount of dynamic memory is now allocated
because, contrary to one-element arrays which occupy at least as much
space as a single object of the type, flexible-array members don't
occupy such space in the containing structure.
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays
Built-tested-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Just as we recently added support for other stored firmware flash
versions, support display of the stored UNDI Option ROM version via
devlink info.
To do this, we need to introduce a new ice_get_inactive_orom_ver
function. This is a little trickier than with other flash versions. The
Option ROM version data was being read from a special "Boot
Configuration" block of the NVM Preserved Field Area. This block only
contains the *active* Option ROM version data. It is populated when the
device firmware finishes updating the Option ROM.
This method is ineffective at reading the stored Option ROM version
data. Instead of reading from this section of the flash, replace this
version extraction with one which locates the Combo Version information
from within the Option ROM binary.
This data is stored within the Option ROM at a 512 byte offset, in
a simple structured format. The structure uses a simple modulo 256
checksum for integrity verification. Scan through the Option ROM to
locate the CIVD data section, and extract the Combo Version.
Refactor ice_get_orom_ver_info so that it takes the bank select
enumeration parameter. Use this to implement ice_get_inactive_orom_ver.
Although all ice devices have a Boot Configuration block in the NVM PFA,
not all devices have a valid Option ROM. In this case, the old
ice_get_orom_ver_info would "succeed" but report a version of all
zeros. The new implementation would fail to locate the $CIV section in
the Option ROM and report an error. Thus, we must ensure that
ice_init_nvm does not fail if ice_get_orom_ver_info fails.
Use the new ice_get_inactive_orom_ver to allow reporting the Option ROM
versions for a pending update via devlink info.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a function to read the inactive netlist bank for version
information. To support this, refactor how we read the netlist version
data. Instead of using the firmware AQ interface with a module ID, read
from the flash as a flat NVM, using ice_read_flash_module.
This change requires a slight adjustment to the offset values used, as
reading from the flat NVM includes the type field (which was stripped by
firmware previously). Cleanup the macro names and move them to
ice_type.h. For clarity in how we calculate the offsets and so that
programmers can easily map the offset value to the data sheet, use
a wrapper macro to account for the offset adjustments.
Use the newly added ice_get_inactive_netlist_ver function to extract the
version data from the pending netlist module update. Add the stored
variants of "fw.netlist", and "fw.netlist.build" to the info version map
array.
With this change, we now report the "fw.netlist" and "fw.netlist.build"
versions into the stored section of the devlink info report. As with the
main NVM module versions, if there is no pending update, we report the
currently active values as stored.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The devlink info interface supports drivers reporting "stored" versions.
These versions indicate the version of an update that has been
downloaded to the device, but is not yet active.
The code for extracting the NVM version recently changed to enable
support for reading from either the active or the inactive bank. Use
this to implement ice_get_inactive_nvm_ver, which will read the NVM
version data from the inactive section of flash.
When reporting the versions via devlink info, first read the device
capabilities. Determine if there is a pending flash update, and if so,
extract relevant version information from the inactive flash. Store
these within the info context structure.
When reporting "stored" firmware versions, devlink documentation
indicates that we ought to always report a stored value, even if there
is no pending update. In this common case, the stored version should
match the running version. This means that each stored version should by
default fallback to the same value as reported by the running handler.
To support this, modify the version structure to have both a "getter"
and a "fallback". Modify the control loop so that it will use the
"fallback" function if the "getter" function does not report a version.
To report versions for which we can read the stored value, use a new
"stored()" macro. This macro will insert two entries into the version
list. The first entry is the traditional running version. The second is
the stored version, implemented with a fallback to the active version.
This is a little tricky, but reduces the overall duplication of elements
in the entry list, and ensures that running and stored values remain
consistent.
To avoid some duplication, add a combined() macro that will insert both
the running and stored versions into the version entry list.
Using this new support, add pending version reporter functions for
"fw.psid.api" and "fw.bundle_id". This enables reporting the stored
values for some of versions in the NVM module of the flash.
Reporting management versions is not implemented by this patch. The
active management version is reported to the driver via the AdminQ
mailbox during load. Although the version must be in the firmware binary
somewhere, accessing this from the inactive firmware is not trivial and
has not been implemented in this change.
Future changes will introduce support for reading the UNDI Option ROM
version and the version associated with the Netlist module.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When reading from the flash memory of the device, the ice driver has two
interfaces available to it. First, it can use a mediated interface via
firmware that allows specifying a module ID. This allows reading from
specific modules of the active flash bank.
The second interface available is to perform flat reads. This allows
complete access to the entire flash. However, using it requires the
software to handle calculating module location and interpret pointer
addresses.
While most data required is accessible through the convenient first
interface, certain flash contents are not. This includes the CSS header
information associated with the Option ROM and NVM banks, as well as any
access to the "inactive" banks used as scratch space for performing
flash updates.
In order to access all of the relevant flash contents, software must use
the flat reads. Rather than forcing all flows to perform flat read
calculations, introduce a new abstraction for reading from the flash:
ice_read_flash_module. This function provides an abstraction for reading
from either the active or inactive flash bank at the requested module.
This interface is very similar to the abstraction provided via firmware,
but allows access to additional modules, as well as providing
a mechanism to request access to both flash banks.
At first glance, it might make sense for this abstraction to allow
specifying precisely which bank (1st or 2nd) the caller wishes to read.
This is simpler to implement but more difficult to use. In practice,
most callers only know whether they want the active bank, or the
inactive bank. Rather than force callers to determine for themselves
which bank to read from, implement ice_read_flash_module in terms of
"active" vs "inactive". This significantly simplifies the implementation
at the caller level and is a more useful abstraction over the flash
contents.
Make use of this new interface to refactor reading of the main NVM
version information. Instead of using the firmware's mediated ShadowRAM
function, use the ice_read_flash_module abstraction.
To do this, notice that most reads of the NVM are going to be in 2-byte
word chunks. To simplify using ice_read_flash_module for this case,
ice_read_nvm_module is introduced. This is a simple wrapper around
ice_read_flash_module which takes the correct pointer address for the
NVM bank, and forces the 2-byte word format onto the caller.
When reading the NVM versions, some fields are read from the Shadow RAM.
The Shadow RAM is the first 64KB of flash memory, and is populated
during device load. Most fields are copied from a section within the
active NVM bank. In order to read this data from both the active and
inactive NVM banks, we need to read not from the first 64KB of flash,
but instead from the correct offset into the NVM bank. Introduce
ice_read_nvm_sr_copy for this purpose. This function wraps around
ice_read_nvm_module and has the same interface as the ice_read_sr_word,
with the exception of allowing the caller to specify whether to read the
active or inactive flash bank.
With this change, it is now trivial to refactor ice_get_nvm_ver_info to
read using the software mediated ice_read_flash_module interface instead
of relying on the firmware mediated interface. This will be used in the
following change to implement support for stored versions in the devlink
info report.
Additionally, the overall ice_read_flash_module interface will be used
and extended to support all three major flash banks, and additionally to
support reading the flash image security revision information.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice flash contains two copies of each of the NVM, Option ROM, and
Netlist modules. Each bank has a pointer word and a size word. In order
to correctly read from the active flash bank, the driver must calculate
the offset manually.
During NVM initialization, read the Shadow RAM control word and
determine which bank is active for each NVM module. Additionally, cache
the size and pointer values for use in calculating the correct offset.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice driver uses an array of structures which link an info name with
a function that formats the associated version data into a string.
All existing format functions simply format already captured static data
from the driver hw structure. Future changes will introduce format
functions for reporting the versions of flash sections stored but not
yet applied. This type of version data is not stored as a member of the
hw structure. This is because (a) it might not yet exist in the case
there is no pending flash update, and (b) even if it does, it might
change such as if an update is canceled or replaced by a new update
before finalizing.
We could simply have each format function gather its own data upon being
called. However, in some cases the raw binary version data is
a combination of multiple different reported fields. Additionally, the
current interface doesn't have a way for the function to indicate that
the version doesn't exist.
Refactor this function interface to take a new ice_info_ctx structure
instead of the buffer pointer and length. This context structure allows
for future extensions to pre-gather version data that is stored within
the context struct instead of the hw struct.
Allocate this context structure initially at the start of
ice_devlink_info_get. We use dynamic allocation instead of a local stack
variable in order to avoid using too much kernel stack once we extend it
with additional data structures.
Modify the main loop that drives the info reporting so that the version
buffer string is always cleared between each format. Explicitly check
that the format function actually filled in a version string of non-zero
length. If the string is not provided, simply skip this version without
reporting an error. This allows for introducing format functions of
versions which may or may not be present, such as the version of
a pending update that has not yet been activated.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice_nvm_info structure has become somewhat of a dumping ground for
all of the fields related to flash version. It holds the NVM version and
EETRACK id, the OptionROM info structure, the flash size, the ShadowRAM
size, and more.
A future change is going to add the ability to read the NVM version and
EETRACK ID from the inactive NVM bank. To make this simpler, it is
useful to have these NVM version info fields extracted to their own
structure.
Rename ice_nvm_info into ice_flash_info, and create a separate
ice_nvm_info structure that will contain the eetrack and NVM map
version. Move the netlist_ver structure into ice_flash_info and rename it
ice_netlist_info for consistency.
Modify the static ice_get_orom_ver_info to take the option rom structure
as a pointer. This makes it more obvious what portion of the hw struct
is being modified. Do the same for ice_get_netlist_ver_info.
Introduce a new ice_get_nvm_ver_info function, which will be similar to
ice_get_orom_ver_info and ice_get_netlist_ver_info, used to keep the NVM
version extraction code co-located.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When erasing, notify userspace of how long we will potentially take to
erase a module. Doing so allows userspace to report the timeout, giving
a clear indication of the upper time bound of the operation.
Since we're re-using the erase timeout value, make it a macro rather
than a magic number.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Shannon Nelson <snelson@pensando.io>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2021-02-03
This series contains updates to igc, igb, e1000e, and e1000 drivers.
Sasha adds counting of good transmit packets and reporting of NVM version
and gPHY version in ethtool firmware version. Replaces the use of strlcpy
to the preferred strscpy. Fixes a typo that caused the wrong register to be
output. He also removes an unused function pointer, some unneeded defines,
and a non-applicable comment. All changes for igc.
Gal Hammer fixes a typo which caused the RDBAL register values to be
shown instead of TDBAL for igb.
Nick Lowe enables RSS support for i211 devices for igb.
Tom Rix fixes checkpatch warning by removing h from printk format
specifier for igb.
Kaixu Xia removes setting of a variable that is overwritten before next
use for e1000e.
Sudip Mukherjee removes an unneeded assignment for e1000.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
e1000: drop unneeded assignment in e1000_set_itr()
e1000e: remove the redundant value assignment in e1000_update_nvm_checksum_spt
igb: remove h from printk format specifier
igb: Enable RSS for Intel I211 Ethernet Controller
igb: fix TDBAL register show incorrect value
igc: Fix TDBAL register show incorrect value
igc: Remove unused FUNC_1 mask
igc: Remove unused local receiver mask
igc: Prefer strscpy over strlcpy
igc: Expose the gPHY firmware version
igc: Expose the NVM version
igc: Add Host Good Packets Transmitted Count
igc: Remove MULR mask define
igc: Remove igc_set_fw_version comment
igc: Clean up nvm_operations structure
====================
Link: https://lore.kernel.org/r/20210204004259.3662059-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now we can remove a bunch of identical functions from the drivers and
make them use common dev_page_is_reusable(). All {,un}likely() checks
are omitted since it's already present in this helper.
Also update some comments near the call sites.
Suggested-by: David Rientjes <rientjes@google.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The variable 'current_itr' is assigned to 0 before jumping to
'set_itr_now' but it has not been used after the jump. So, remove the
unneeded assignment.
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Both of the statements are value assignment of the variable act_offset.
The first value assignment is overwritten by the second and is useless.
Remove it.
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This change fixes the checkpatch warning described in this
commit cbacb5ab0a ("docs: printk-formats: Stop encouraging use of
unnecessary %h[xudi] and %hh[xudi]")
Standard integer promotion is already done and %hx and %hhx is useless
so do not encourage the use of %hh[xudi] or %h[xudi].
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The Intel I211 Ethernet Controller supports 2 Receive Side Scaling (RSS)
queues. It should not be excluded from having this feature enabled.
Via commit c883de9fd7 ("igb: rename igb define to be more generic")
E1000_MRQC_ENABLE_RSS_4Q was renamed to E1000_MRQC_ENABLE_RSS_MQ to
indicate that this is a generic bit flag to enable queues and not
a flag that is specific to devices that support 4 queues
The bit flag enables 2, 4 or 8 queues appropriately depending on the part.
Tested with a multicore CPU and frames were then distributed as expected.
This issue appears to have been introduced because of confusion caused
by the prior name.
Signed-off-by: Nick Lowe <nick.lowe@gmail.com>
Tested-by: David Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fixed a typo which caused the registers dump function to read the
RDBAL register when printing TDBAL register values.
Signed-off-by: Gal Hammer <ghammer@redhat.com>
Tested-by: David Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Fixed a typo which caused the registers dump function to read the
RDBAL register when printing TDBAL register values.
_reg_dump method has been partially derived from i210 and have
same typo.
Suggested-by: Gal Hammer <ghammer@redhat.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
FUNC_1 mask not in use in i225 device and could be removed
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Local receiver mask SR_1000T_LOCAL_RX_STATUS not in use in i225 device
and could be removed
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use the strscpy method instead of strlcpy method.
See: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr
_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Extend reporting of NVM image version to include the gPHY (i225 PHY)
firmware version.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Expose the NVM map version via drvinfo in ethtool
NVM image version is reported as firmware version for i225 device
Minor typo fix - remove space
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This counter counts the number of good (non-erred) packets
transmitted sent by the host.
A good transmit packet is considered one that is 64 or more bytes
in length (from <Destination Address> through <CRC>,
inclusively) in length
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Multiple Tx Data Read Requests is hardware pipeline feature and
is not controlled by software
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
i225 device not supported and do not plan to support
configuration of fw version string for ethtool
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
valid_led_default function pointer not in use and can
be removed from nvm_operations structure.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>