In order to use standard 'xdp' prefix, rename convert_to_xdp_frame
utility routine in xdp_convert_buff_to_frame and replace all the
occurrences
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/6344f739be0d1a08ab2b9607584c4d5478c8c083.1590698295.git.lorenzo@kernel.org
This driver uses RX page-split when possible. It was recently fixed
in commit 86e85bf698 ("sfc: fix XDP-redirect in this driver") to
add needed tailroom for XDP-redirect.
After the fix efx->rx_page_buf_step is the frame size, with enough
head and tail-room for XDP-redirect.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158945335278.97035.14611425333184621652.stgit@firesoul
XDP-redirect is broken in this driver sfc. XDP_REDIRECT requires
tailroom for skb_shared_info when creating an SKB based on the
redirected xdp_frame (both in cpumap and veth).
The fix requires some initial explaining. The driver uses RX page-split
when possible. It reserves the top 64 bytes in the RX-page for storing
dma_addr (struct efx_rx_page_state). It also have the XDP recommended
headroom of XDP_PACKET_HEADROOM (256 bytes). As it doesn't reserve any
tailroom, it can still fit two standard MTU (1500) frames into one page.
The sizeof struct skb_shared_info in 320 bytes. Thus drivers like ixgbe
and i40e, reduce their XDP headroom to 192 bytes, which allows them to
fit two frames with max 1536 bytes into a 4K page (192+1536+320=2048).
The fix is to reduce this drivers headroom to 128 bytes and add the 320
bytes tailroom. This account for reserved top 64 bytes in the page, and
still fit two frame in a page for normal MTUs.
We must never go below 128 bytes of headroom for XDP, as one cacheline
is for xdp_frame area and next cacheline is reserved for metadata area.
Fixes: eb9a36be7f ("sfc: perform XDP processing on received packets")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Includes a couple of filtering functions and also renames a constant.
Style fixes included.
Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Page recycling code and GRO packet receipt code were moved.
One function contains code extracted from another.
Code style fixes included.
Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The moved code deals with managing rx buffers and queues.
A tiny bit of refactoring was required in other files to stitch the
code together.
Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New headers contain prototypes of functions that will be common between
ef10 and upcoming driver.
Removed static modifier from the affected functions.
Some function prototypes were removed from existing headers.
Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct a mismatch between rx_page_buf_step and the actual step size
used when filling buffer pages.
This patch fixes the page overrun that occured when the MTU was set to
anything bigger than 1692.
Fixes: 3990a8fffb ("sfc: allocate channels for XDP tx queues")
Signed-off-by: Charles McLachlan <cmclachlan@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the number of successful and failed insertions, and also the
current count of filters, to aid in tuning e.g. rps_flow_cnt.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
In high connection count usage, the NIC's filter table may be filled with
sufficiently many ARFS filters that further insertions fail. As this
does not represent a correctness issue, do not log the resulting MCDI
errors. Add a debug-level message under the (by default disabled)
rx_status category instead; and take the opportunity to do a little extra
expiry work.
Since there are now multiple workitems able to call __efx_filter_rfs_expire
on a given channel, it is possible for them to race and thus pass quotas
which, combined, exceed rfs_filter_count. Thus, don't WARN_ON if we loop
all the way around the table with quota left over.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Tested-by: David Ahern <dahern@digitalocean.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The old rfs_filters_added method for determining the quota could potentially
allow the NIC to become filled with old filters, which never get tested for
expiry. Instead, explicitly make expiry check work depend on the number of
filters installed, and don't count checking slots without filters in as
doing work. This guarantees that each filter will be checked for expiry at
least once every thirty seconds (assuming the channel to which it belongs is
NAPI polling actively) regardless of fill level.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Tested-by: David Ahern <dahern@digitalocean.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The sfc driver can drop packets processed with XDP, notably when running
out of buffer space on XDP_TX, or returning an unknown XDP action.
This increments the rx_xdp_bad_drops ethtool counter.
Call trace_xdp_exception everywhere rx_xdp_bad_drops is incremented,
except for fragmented RX packets as the XDP program hasn't run yet.
This allows it to easily be monitored from userspace.
This mirrors the behavior of other drivers.
Signed-off-by: Arthur Fabre <afabre@cloudflare.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Count XDP packet drops, error drops, transmissions and redirects and
expose these counters via the ethtool stats command.
Signed-off-by: Charles McLachlan <cmclachlan@solarflare.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide an ndo_xdp_xmit function that uses the XDP tx queue for this
CPU to send the packet.
Signed-off-by: Charles McLachlan <cmclachlan@solarflare.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a field to hold an attached xdp_prog, but never populates it (see
following patch). Also, XDP_TX support is deferred to a later patch
in the series.
Track failures of xdp_rxq_info_reg() via per-queue xdp_rxq_info_valid
flags and a per-nic xdp_rxq_info_failed flag. The per-queue flags are
needed to prevent attempts to xdp_rxq_info_unreg() structs that failed
to register. Possibly the API could be changed in the future to avoid
the need for these flags.
Signed-off-by: Charles McLachlan <cmclachlan@solarflare.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We already scored points when handling the RX event, no-one else does this,
and looking at the history it appears this was originally meant to only
score on merges, not on GRO_NORMAL. Moreover, it gets in the way of
changing GRO to not immediately pass GRO_NORMAL skbs to the stack.
Performance testing with four TCP streams received on a single CPU (where
throughput was line rate of 9.4Gbps in all tests) showed a 13.7% reduction
in RX CPU usage (n=6, p=0.03).
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After failing to allocate a receive buffer the driver may fail to ever
request additional allocations. EF10 NICs require new receive buffers to
be pushed in batches of eight or more. The test for whether a slow fill
should be scheduled failed to take account of this. There is little
downside to *always* requesting a slow fill if we failed to allocate a
buffer, so the condition has been removed completely. The timer that
triggers the request for a refill has also been shortened.
Signed-off-by: Robert Stonehouse <rstonehouse@solarflare.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improves packet rate of 1-byte UDP receives by up to 10%.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
efx->type->filter_insert() returns an ID rather than the index that
efx->type->filter_async_insert() used to, which causes it to exceed
efx->type->max_rx_ip_filters on some EF10 configurations, leading to out-
of-bounds array writes.
So, in efx_filter_rfs_work(), convert this back into an index (which is
what the remove call in the expiry path expects, anyway).
Fixes: 3af0f34290 ("sfc: replace asynchronous filter operations")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Associate an arbitrary ID with each ARFS filter, allowing to properly query
for expiry. The association is maintained in a hash table, which is
protected by a spinlock.
v3: fix build warnings when CONFIG_RFS_ACCEL is disabled (thanks lkp-robot).
v2: fixed uninitialised variable (thanks davem and lkp-robot).
Fixes: 3af0f34290 ("sfc: replace asynchronous filter operations")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A misconfigured system (e.g. with all interrupts affinitised to all CPUs)
may produce a storm of ARFS steering events. With the existing sfc ARFS
implementation, that could create a backlog of workitems that grinds the
system to a halt. To prevent this, limit the number of workitems that
may be in flight for a given SFC device to 8 (EFX_RPS_MAX_IN_FLIGHT), and
return EBUSY from our ndo_rx_flow_steer method if the limit is reached.
Given this limit, also store the workitems in an array of slots within the
struct efx_nic, rather than dynamically allocating for each request.
The limit should not negatively impact performance, because it is only
likely to be hit in cases where ARFS will be ineffective anyway.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Necessary to allow redirecting a flow when the application moves.
Fixes: 3af0f34290 ("sfc: replace asynchronous filter operations")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having an efx->type->filter_rfs_insert() method, just use
workitems with a worker function that calls efx->type->filter_insert().
The only user of this is efx_filter_rfs(), which now queues a call to
efx_filter_rfs_work().
Similarly, efx_filter_rfs_expire() is now a worker function called on a
new channel->filter_work work_struct, so the method
efx->type->filter_rfs_expire_one() is no longer called in atomic context.
We also add a new mutex efx->rps_mutex to protect the RPS state (efx->
rps_expire_channel, efx->rps_expire_index, and channel->rps_flow_id) so
that the taking of efx->filter_lock can be moved to
efx->type->filter_rfs_expire_one().
Thus, all filter table functions are now called in a sleepable context,
allowing them to use sleeping locks in a future patch.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge updates from Andrew Morton:
- a few misc bits
- ocfs2 updates
- almost all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
memory hotplug: fix comments when adding section
mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
mm: simplify nodemask printing
mm,oom_reaper: remove pointless kthread_run() error check
mm/page_ext.c: check if page_ext is not prepared
writeback: remove unused function parameter
mm: do not rely on preempt_count in print_vma_addr
mm, sparse: do not swamp log with huge vmemmap allocation failures
mm/hmm: remove redundant variable align_end
mm/list_lru.c: mark expected switch fall-through
mm/shmem.c: mark expected switch fall-through
mm/page_alloc.c: broken deferred calculation
mm: don't warn about allocations which stall for too long
fs: fuse: account fuse_inode slab memory as reclaimable
mm, page_alloc: fix potential false positive in __zone_watermark_ok
mm: mlock: remove lru_add_drain_all()
mm, sysctl: make NUMA stats configurable
shmem: convert shmem_init_inodecache() to void
Unify migrate_pages and move_pages access checks
mm, pagevec: rename pagevec drained field
...
As the page free path makes no distinction between cache hot and cold
pages, there is no real useful ordering of pages in the free list that
allocation requests can take advantage of. Juding from the users of
__GFP_COLD, it is likely that a number of them are the result of copying
other sites instead of actually measuring the impact. Remove the
__GFP_COLD parameter which simplifies a number of paths in the page
allocator.
This is potentially controversial but bear in mind that the size of the
per-cpu pagelists versus modern cache sizes means that the whole per-cpu
list can often fit in the L3 cache. Hence, there is only a potential
benefit for microbenchmarks that alloc/free pages in a tight loop. It's
even worse when THP is taken into account which has little or no chance
of getting a cache-hot page as the per-cpu list is bypassed and the
zeroing of multiple pages will thrash the cache anyway.
The truncate microbenchmarks are not shown as this patch affects the
allocation path and not the free path. A page fault microbenchmark was
tested but it showed no sigificant difference which is not surprising
given that the __GFP_COLD branches are a miniscule percentage of the
fault path.
Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the csum_level for encapsulated packets where the encapsulation
type, l3 class and l4 class are sets that need it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In linux-4.5, busy polling was implemented in core
NAPI stack, meaning that all custom implementation can
be removed from drivers.
Not only we remove lot's of tricky code, we also remove
one lock operation in fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Logically, EFX_BUG_ON_PARANOID can never be correct. For, BUG_ON should
only be used if it is not possible to continue without potential harm;
and since the non-DEBUG driver will continue regardless (as the BUG_ON is
compiled out), clearly the BUG_ON cannot be needed in the DEBUG driver.
So, replace every EFX_BUG_ON_PARANOID with either an EFX_WARN_ON_PARANOID
or the newly defined EFX_WARN_ON_ONCE_PARANOID.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rationale: The differences between Falcon and Siena are in many ways larger
than those between Siena and EF10 (despite Siena being nominally "Falcon-
architecture"); for instance, Falcon has no MCPU, so there is no MCDI.
Removing Falcon support from the sfc driver should simplify the latter,
and avoid the possibility of Falcon support being broken by changes to sfc
(which are rarely if ever tested on Falcon, it being end-of-lifed hardware).
The sfc-falcon driver created in this changeset is essentially a copy of the
sfc driver, but with Siena- and EF10-specific code, including MCDI, removed
and with the "efx_" identifier prefix changed to "ef4_" (for "EFX 4000-
series") to avoid collisions when both drivers are built-in.
This changeset removes Falcon from the sfc driver's PCI ID table; then in
sfc I've removed obvious Falcon-related code: I removed the Falcon NIC
functions, Falcon PHY code, and EFX_REV_FALCON_*, then fixed up everything
that referenced them.
Also, increment minor version of both drivers (to 4.1).
For now, CONFIG_SFC selects CONFIG_SFC_FALCON, so that updating old configs
doesn't cause Falcon support to disappear; but that should be undone at
some point in the future.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise we get confused when two flows on different channels get the
same flow ID.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We would like to automatically provide busy polling support
to all NAPI drivers, without them having to implement anything.
skb_mark_napi_id() can be called from napi_gro_receive() and
napi_get_frags().
Few drivers are still calling skb_mark_napi_id() because
they use netif_receive_skb(). They should eventually call
napi_gro_receive() instead. I will leave this to drivers
maintainers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When Rx packet data must be dropped, all the buffers
associated with that Rx packet must be freed. Extend
and rename efx_free_rx_buffer() to efx_free_rx_buffers()
and loop through all the fragments.
By doing so this patch fixes a possible memory leak.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the sfc driver code for implementing busy polling.
It adds ndo_busy_poll method and locking between it and napi poll.
It also adds each napi to the napi_hash right after netif_napi_add().
Uses efx_start_eventq and efx_stop_eventq in the self tests.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement per channel software TX and RX packet counters
accessed as ethtool statistics.
This allows confirmation with MAC statistics.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added a counter rx_noskb_drop for failure to allocate an skb.
Summed the per-channel rx_nodesc_trunc counters earlier so that they can
be included in rx_dropped.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers should call skb_set_hash to set the hash and its type
in an skbuff.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The EF10 firmware can optionally insert RX timestamps in the packet
prefix. These only include the clock minor value. We must also
enable periodic time sync events on each event queue which provide
the high bits of the clock value.
[bwh: Combined and rebased several changes.
Added the above description and some sanity checks for inline vs
separate timestamps.
Changed efx_rx_skb_attach_timestamp() to read the packet prefix
from the skb head area.]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
We can potentially pull the entire packet contents into the head area
and then free the page it was in. In order to read an inline
timestamp safely, we need to copy the prefix into the head area as
well.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
rx_prefix_size is 4-bytes aligned on Falcon/Siena (16 bytes), but it is equal
to 14 on EF10. So, it should be taken into account if arch requires IP header
to be 4-bytes aligned (via NET_IP_ALIGN).
Fixes: 8127d661e7 ('sfc: Add support for Solarflare SFC9100 family')
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Extend efx_filter_rfs() to map TCP/IPv6 and UDP/IPv6 flows into
efx_filter_spec. These are only supported on EF10; on Falcon and
Siena they will be rejected by efx_farch_filter_from_gen_spec().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Update the dates for files that have been added to in 2012-2013.
Drop the 'Solarstorm' brand name that's still lingering here.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
RX DMA scatter is always enabled on EF10. Adjust the common RX
completion handling to allow for this.
RX completion events on EF10 include the length used from a single
descriptor, not the cumulative length used. Add a field to struct
efx_rx_queue to hold the cumulative length.
[bwh: Also fix a related comment]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Add the efx_filter_is_mc_recip() function to decide whether a filter
is for a multicast recipient and can coexist with other filters with
the same match values. Update efx_filter_insert_filter() kernel-doc
to explain the conditions for this.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Define a flag for struct efx_rx_buffer and efx_rx_packet() that
indicates packet length must be read from the prefix. If this
is set, read the length in __efx_rx_packet() (when the prefix
should have arrived in cache).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
EF10 uses an entirely different RX prefix format from Falcon-arch.
Extend struct efx_nic_type to describe this.
[bwh: Also replace the magic numbers used for the Falcon-arch RX prefix]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Aside from accelerated RFS, there is almost nothing that can be shared
between the filter table implementations for the Falcon architecture
and EF10.
Move the few shared functions into efx.c and rx.c and the rest into
farch.c. Introduce efx_nic_type operations for the implementation and
inline wrapper functions that call these.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>