Don't offload IP header checksum to NIC.
This fixes a previous patch which enabled checksum offloading
for both IPv4 and IPv6 packets. So L3 checksum offload was
getting enabled for IPv6 pkts. And HW is dropping these pkts
as it assumes the pkt is IPv4 when IP csum offload is set
in the SQ descriptor.
Fixes: 3a9024f52c ("net: thunderx: Enable TSO and checksum offloads for ipv6")
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@auriga.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using netdev_<level>(netdev, "%s: ...", netdev->name) duplicates the
name in the output. Remove those uses.
Miscellanea:
o Use the netif_<level> convenience macros at the same time
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver follows a method of taking one extra reference on the
page for recycling which is fine in usual packet path where
each 64KB page is segmented into multiple receive buffers.
But in XDP mode since there is just one receive buffer per
page taking extra page reference itself becomes big bottleneck
consuming ~50% of CPU cycles due to atomic operations.
This patch adds a internal ref count in pgcache for each
page and additional page references are taken in a batch
instead of just one at a time. Internal i.e 'pgcache->ref_count'
and page's i.e 'page->_refcount' counters are compared to check
page's recyclability.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When in XDP mode reserve XDP_PACKET_HEADROOM bytes at the start
of receive buffer for XDP program to modify headers and adjust
packet start. Additional code changes done to handle such packets.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for XDP_TX i.e transmits packet out of
the XDP TX queue mapped to the corresponding Rx queue
on which packet is received.
Since SQ for XDP TX will be used only on a single cpu i.e
SQ description creation and freeing, using atomic free count
is not necessary and will become a bottleneck. Hence added
a separate 'xdp_free_cnt' used for SQs designated for XDP
to track descriptor free count.
Changes also include
- A new entry 'xdp_page' is added to save transmitted packet's
page pointer for later cleanup.
- XDP Tx SQ's doorbell is ringed once per NAPI instance.
- Retrieving designated SQ for packets being sent out by stack
via 'nicvf_xmit'.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for XDP_DROP.
Also since in XDP mode there is just a single buffer per page,
made changes to recycle DMA mapping info as well along with pages.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds basic XDP support i.e attaching a BPF program to an
interface. Also takes care of allocating separate Tx queues
for XDP path and for network stack packet transmission.
This patch doesn't support handling of any of the XDP actions,
all are treated as XDP_PASS i.e packets will be handed over to
the network stack.
Changes also involve allocating one receive buffer per page in XDP
mode and multiple in normal mode i.e when no BPF program is attached.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of unnecessary double pointer references and type casting
in receive buffer allocation code.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Optimized CQE handling with below changes
- Feeing descriptors back to SQ in bulk i.e once per NAPI
instance instead for every CQE_TX, this will reduce number
of atomic updates to 'sq->free_cnt'.
- Checking errors in CQE_TX and CQE_RX before calling appropriate
fn()s to update error stats i.e reduce branching.
Also removed debug messages in packet handling path which otherwise
causes issues if DEBUG is enabled.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Receive buffer's physical address or iova will anyway not
go beyond 49bits, since it is the max supported HW address.
As per perf, updating bitfields i.e buf_addr:42 in RBDR
descriptor entry consumes lots of cpu cycles, hence changed
it to a 64bit field with alignment requirements taken care of.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for page recycling for allocating receive buffers
to reduce cost of refilling RBDR ring. Also got rid of using
compound pages when pagesize is 4K, only order-0 pages now.
Only page is recycled, DMA mappings still needs to be done for
every receive buffer allocated due to following constraints
- Cannot have just one receive buffer per 64KB page.
- There is just one buffer ring shared across 8 Rx queues, so
buffers of same page can go to any Rx queue.
- HW gives buffer address where packet has been DMA'ed and not
the index into buffer ring.
This makes it not possible to resue DMA mapping info. So unfortunately
have to go through costly mapping route for every buffer.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding support for TSO and checksum hardware offloads for ipv6.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@cavium.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not consider IPv6 frames with zero UDP checksum as frames
with bad checksum and drop them.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@cavium.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ACPI support has been added to ARM IOMMU driver in 4.10 kernel
and that has resulted in VNIC interfaces throwing translation
faults when kernel is booted with ACPI as driver was not using
DMA API. This patch fixes the issue by using DMA API which inturn
will create translation tables when IOMMU is enabled.
Also VNIC doesn't have a seperate receive buffer ring per receive
queue, so there is no 1:1 descriptor index matching between CQE_RX
and the index in buffer ring from where a buffer has been used for
DMA'ing. Unlike other NICs, here it's not possible to maintain dma
address to virt address mappings within the driver. This leaves us
no other choice but to use IOMMU's IOVA address conversion API to
get buffer's virtual address which can be given to network stack
for processing.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support to set Rx/Tx queue sizes from ethtool. Fixes
an issue with retrieving queue size. Also sets SQ's CQ_LIMIT
based on configured Tx queue size such that HW doesn't process
SQEs when there is no sufficient space in CQ.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Transmit queue timeout issue is seen in two cases
- Due to a race condition btw setting stop_queue at xmit()
and checking for stopped_queue in NAPI poll routine, at times
transmission from a SQ comes to a halt. This is fixed
by using barriers and also added a check for SQ free descriptors,
incase SQ is stopped and there are only CQE_RX i.e no CQE_TX.
- Contrary to an assumption, a HW errata where HW doesn't stop transmission
even though there are not enough CQEs available for a CQE_TX is
not fixed in T88 pass 2.x. This results in a Qset error with
'CQ_WR_FULL' stalling transmission. This is fixed by adjusting
RXQ's RED levels for CQ level such that there is always enough
space left for CQE_TXs.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables moving average calculation of Rx pkt's resources
and configures RED and backpressure levels for both CQ and RBDR.
Also initialize SQ's CQ_LIMIT properly.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the following
1. When interface is being teardown and queues are being cleaned up,
free pending SKBs that are in SQ which are either not transmitted
or freed as NAPI is disabled by that time.
2. While interface initialization, delay CFG_DONE notification till
the end to avoid corner cases where TXQs are enabled but CQ
interrupts are not which results blocking transmission and kicking
off watchdog.
3. Check for IFF_UP while re-enabling RBDR interrupts from tasklet.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes multiple issues
1. Convert all driver statistics to percpu counters for accuracy.
2. To avoid multiple CQEs posted by a TSO packet appended to HW,
TSO pkt's SQE has 'post_cqe' not set but a dummy SQE is added
for getting HW transmit completion notification. This dummy
SQE has 'dont_send' set and HW drops the pkt pointed to in this
thus Tx drop counter increases. This patch fixes this by subtracting
SW tx tso counter from HW Tx drop counter for actual packet drop counter.
3. Reset all individual queue's and VNIC HW stats when interface is going down.
4. Getrid off unnecessary counters in hot path.
5. Bringout all CQE error stats i.e both Rx and Tx.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes enabling of HW verification of L3/L4 length and
TCP/UDP checksum which is currently being cleared. Also fixed VLAN
stripping config which is being cleared when multiqset is enabled.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Programming LMAC credits taking 9K frame size by default is incorrect
as for an interface which is one of the many on the same BGX/QLM
no of credits available will be less as Tx FIFO will be divided
across all interfaces. So let's say a BGX with 40G interface and another
BGX with multiple 10G, bandwidth of 10G interfaces will be effected when
traffic is running on both 40G and 10G interfaces simultaneously.
This patch fixes this issue by programming credits based on netdev's MTU.
Also fixed configuring MTU to HW and added CQE counter for pkts which
exceed this value.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/mediatek/mtk_eth_soc.c
drivers/net/ethernet/qlogic/qed/qed_dcbx.c
drivers/net/phy/Kconfig
All conflicts were cases of overlapping commits.
Signed-off-by: David S. Miller <davem@davemloft.net>
On ThunderX 88xx pass 2.x chips when TSO is offloaded to HW,
HW posts a CQE for every TSO segment transmitted. Current code
does handles this, but is prone to issues when segment sizes are
small resulting in SW processing too many CQEs and also at times
frees a SKB which is not yet transmitted.
This patch handles the errata in a different way and eliminates issues
with earlier approach, TSO packet is submitted to HW with post_cqe=0,
so that no CQE is posted upon completion of transmission of TSO packet
but a additional HDR + IMMEDIATE descriptors are added to SQ due to
which a CQE is posted and will have required info to be used while
cleanup in napi. This way only one CQE is posted for a TSO packet.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When SQ/TXQ is reclaimed i.e reset it's stats also automatically reset
by HW. This is not the case with RQ. Also VF doesn't have write access
to statistics counter registers. Hence a new Mbox msg is introduced which
supports resetting RQ, SQ and full Qset stats. Currently only RQ stats
are being reset using this mbox message.
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of a round about way of converting buffers to SKBs and
combining them into a frag list, use standard skb_add_rx_frag()
API to merge page fragments. This code is useful when incoming
packets are of size more than RCV_FRAG_LEN which is currently
set to 2048bytes.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike 88xx, CQE_RX descriptor's tunnelling extension i.e CQE_RX2_S
is always enabled on 81xx/83xx and HW does insert these fields into
CQE_RX. As a result receive buffer addresses will now be present at
7th word of CQE_RX instead of 6th.
Enable CQE_RX2_S on 88xx pass 2.x as well.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
81xx has only 4 CPUs, so it doesn't make sense to initialize
entire Qset i.e 8 queues by default. Made changes to queue
initialization to init queues equal to number of CPUs or
8 queues whichever is lesser. Also this will be applicable to
VMs with VNIC VF attached and having less VCPUs
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
page_reference manipulation functions are introduced to track down
reference count change of the page. Use it instead of direct
modification of _count.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Sunil Goutham <sgoutham@cavium.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reserved fields should be set to zero to avoid exposing
bits from the kernel stack.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of calling get_page() for every receive buffer carved out
of page, set page's usage count at the end, to reduce no of atomic
calls.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/phy/bcm7xxx.c
drivers/net/phy/marvell.c
drivers/net/vxlan.c
All three conflicts were cases of simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Counting rx packets for every CQE_RX in CQ irq handler is incorrect.
Synchronization is missing when multiple queues are receiving packets
simultaneously. Like transmit packet stats use HW stats here.
Also removed unused 'cqe_type' parameter in nicvf_rcv_pkt_handler().
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate higher order pages when pagesize is small, this will
reduce number of calls to page allocator and wastage of memory.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When system is low on atomic memory, too many error messages are logged.
Since this is not a total failure but a simple switch to non-atomic allocation
better to have a stat.
Also add a stat for reset, kicked due to transmit watchdog timeout.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This feature is introduced in pass-2 chip and with this CQ interrupt
coalescing will work based on both timer and count.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for offloading TCP segmentation to HW in pass-2
revision of hardware. Both driver level SW TSO for pass1.x chips
and HW TSO for pass-2 chip will co-exist. Modified SQ descriptor
structures to reflect pass-2 hw implementation.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we have moved on to using allocated pages to carve receive
buffers instead of netdev_alloc_skb() there is no need to store
any pointers for later retrieval. Earlier we had to store
skb and skb->data pointers which later are used to handover
received packet to network stack.
This will avoid an unnecessary cache miss as well.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The same switch-case repeates for nivc_*_intr functions.
In this patch it is moved to a helper nicvf_int_type_to_mask().
By the way:
- Unneeded write to NICVF register dropped if int_type is unknown.
- netdev_dbg() is used instead of netdev_err().
Signed-off-by: Yury Norov <yury.norov@auriga.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Acked-by: Vadim Lomovtsev <Vadim.Lomovtsev@caiumnetworks.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Properly set CQ timer threshold and also set it to 2us.
With previous incorrect settings it was set to 0.5us which is too less.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for handling multiple qsets assigned to a
single VF. There by increasing no of queues from earlier 8 to max
no of CPUs in the system i.e 48 queues on a single node and 96 on
dual node system. User doesn't have option to assign which Qsets/VFs
to be merged. Upon request from VF, PF assigns next free Qsets as
secondary qsets. To maintain current behavior no of queues is kept
to 8 by default which can be increased via ethtool.
If user wants to unbind NICVF driver from a secondary Qset then it
should be done after tearing down primary VF's interface.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch configures HW to strip 802.1Q header if found in a
receiving packet. The stripped VLAN ID and TCI information is
passed on to software via CQE_RX. Also sets netdev's 'vlan_features'
so that other HW offload features can be used for tagged packets.
This offload feature can be enabled or disabled via ethtool.
Network stack normally ignores RPS for 802.1Q packets and hence low
throughput. With this offload enabled throughput for tagged packets
will be almost same as normal packets.
Note: This patch doesn't enable HW VLAN insertion for transmit packets.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added ethtool support to dump receive packet error statistics reported
in CQE. Also made some small fixes
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suppressing standard alloc_pages() warnings. Some kernel configs limit
alloc size and the network driver may fail. Do not drop a kernel
warning in this case, instead just drop a oneliner that the network
driver could not be loaded since the buffer could not be allocated.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixing TSO packages not being counted.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed 'tso_hdrs' memory not being freed properly.
Also fixed SQ skbuff maintenance issues.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switching back to LDD transactions from LDWB.
While transmitting packets out with LDWB transactions
data integrity issues are seen very frequently.
hence switching back to LDD.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GFP_KERNEL should be used in the thread context
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to cast void* to u8*: pointer arithmetics
works same way for both.
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>