Commit Graph

690505 Commits

Author SHA1 Message Date
David S. Miller 7f81ff0402 Merge branch 'xfrm-remove-flow-cache'
Florian Westphal says:

====================
xfrm: remove flow cache

After RCU-ification of ipsec packet path there are no major scalability
issues anymore without flow cache.

We still incur a performance hit, which comes mostly from the extra xfrm
dst allocation/freeing.
The last patch in the series adds a simple percpu cache to avoid the
extra allocation if a packet matched the same policies as last one.

The main concern with this is that we will see performance drops,
especially with large numbers of policies/SAs.

However, during hallway discussions at nfws 2017 it seemed the issues
with flow caching outweight the removal downsides, and that it
might be best to just 'remove it' and see where the practical issues
(if any) will appear.

It should now be possible to also remove the genid member in the policies
as we don't hold bundles for prolonged time anymore, but I think
this change is controversial (and intrusive) enough as-is, so defer
that to a later point in time.

Changes since last rfc:

- fix build failures due to implicit interrupt.h includes
- rework last patch (pcpu cache):
 * avoid xchg()
 * check policies for walk.dead = 1 instead of more costly bundle_ok().
 * flush pcpu bundles when sa/policies get removed, to allow module
   references to go away (suggested by Ilan Tayari)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:42 -07:00
Florian Westphal ec30d78c14 xfrm: add xdst pcpu cache
retain last used xfrm_dst in a pcpu cache.
On next request, reuse this dst if the policies are the same.

The cache will not help with strict RR workloads as there is no hit.

The cache packet-path part is reasonably small, the notifier part is
needed so we do not add long hangs when a device is dismantled but some
pcpu xdst still holds a reference, there are also calls to the flush
operation when userspace deletes SAs so modules can be removed
(there is no hit.

We need to run the dst_release on the correct cpu to avoid races with
packet path.  This is done by adding a work_struct for each cpu and then
doing the actual test/release on each affected cpu via schedule_work_on().

Test results using 4 network namespaces and null encryption:

ns1           ns2          -> ns3           -> ns4
netperf -> xfrm/null enc   -> xfrm/null dec -> netserver

what                    TCP_STREAM      UDP_STREAM      UDP_RR
Flow cache:             14644.61        294.35          327231.64
No flow cache:		14349.81	242.64		202301.72
Pcpu cache:		14629.70	292.21		205595.22

UDP tests used 64byte packets, tests ran for one minute each,
value is average over ten iterations.

'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
series but without this patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Florian Westphal 09c7570480 xfrm: remove flow cache
After rcu conversions performance degradation in forward tests isn't that
noticeable anymore.

See next patch for some numbers.

A followup patcg could then also remove genid from the policies
as we do not cache bundles anymore.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Florian Westphal bd45c539bf xfrm_policy: make xfrm_bundle_lookup return xfrm dst object
This allows to remove flow cache object embedded in struct xfrm_dst.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Florian Westphal 86dc8ee0b2 xfrm_policy: remove xfrm_policy_lookup
This removes the wrapper and renames the __xfrm_policy_lookup variant
to get rid of another place that used flow cache objects.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Florian Westphal aff669bc28 xfrm_policy: kill flow to policy dir conversion
XFRM_POLICY_IN/OUT/FWD are identical to FLOW_DIR_*, so gcc already
removed this function as its just returns the argument.  Again, no
code change.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Florian Westphal 855dad99c0 xfrm_policy: remove always true/false branches
after previous change oldflo and xdst are always NULL.
These branches were already removed by gcc, this doesn't change code.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Florian Westphal 3ca28286ea xfrm_policy: bypass flow_cache_lookup
Instead of consulting flow cache, call the xfrm bundle/policy lookup
functions directly.  This pretends the flow cache had no entry.

This helps to gradually remove flow cache integration,
followup commit will remove the dead code that this change adds.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Florian Westphal 3c2a89ddc1 net: xfrm: revert to lower xfrm dst gc limit
revert c386578f1c ("xfrm: Let the flowcache handle its size by default.").

Once we remove flow cache, we don't have a flow cache limit anymore.
We must not allow (virtually) unlimited allocations of xfrm dst entries.
Revert back to the old xfrm dst gc limits.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Florian Westphal 6b1c42e972 vti: revert flush x-netns xfrm cache when vti interface is removed
flow cache is removed in next commit.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
Florian Westphal 0ab1031474 drivers: net: add missing interrupt.h include
these drivers use tasklets or irq apis, but don't include interrupt.h.
Once flow cache is removed the implicit interrupt.h inclusion goes away
which will break the build.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:13:41 -07:00
David S. Miller 6ddb4fdfac Merge branch 'dsa-mv88e6xxx-cleanup-capabilities'
Vivien Didelot says:

====================
net: dsa: mv88e6xxx: cleanup capabilities

This patch series removes the remaining capabilities as well as the
flags bitmap in the info structures. Most of them are turned into ops,
or new info members.

There is no mv88e6xxx_cap enum or bitmap flags anymore, only
mv88e6xxx_info and mv88e6xxx_ops structures.

While reviewing and documenting the related G2 registers, fix a few
inconsistencies: 88E6185 has no interrupt in G2 and 88E6390 has a POT.

Except these two adjustments, there is no functional changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:10:58 -07:00
Vivien Didelot b3e05aa123 net: dsa: mv88e6xxx: add a multi_chip info flag
Instead of relying on a bitmap flag, add a new multi_chip info flag to
describe the presence of the indirect SMI access though the two device
registers 0x0 and 0x1.

All remaining capabilities and flags are now unused. Remove the
mv88e6xxx_cap enum and the info flags bitmaps.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:10:58 -07:00
Vivien Didelot 68b8f60cf7 net: dsa: mv88e6xxx: add Energy Detect ops
The 88E6352 family supports Energy Detect and has one bit for Sense and
one bit for periodically transmit NLP (Energy Detect+TM). The 88E6390
family adds another bit to distinguish Auto or SW wake-up. Chips
supporting EEE all have an EEE Enabled bit in the Port Status Register.

This patch adds new ops for the PHY Energy Detect accesses.

This also allows us to get rid of the MV88E6XXX_FLAG_EEE flag.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:10:58 -07:00
Vivien Didelot 9069c13a48 net: dsa: mv88e6xxx: add a global2_addr info flag
Similarly to global1_addr, add a global2_addr member in the info
structure to describe the presence of the Global 2 Registers.

This allows us to get rid of the MV88E6XXX_FLAG_GLOBAL2 flag.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:10:58 -07:00
Vivien Didelot 9e907d739c net: dsa: mv88e6xxx: add POT operation
Add a pot_clear operation to clear the Priority Override Table and wrap
its call into a mv88e6xxx_pot_setup helper.

This allows us to get rid of the MV88E6XXX_FLAG_G2_POT flag.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:10:58 -07:00
Vivien Didelot a2a05db8a5 net: dsa: mv88e6xxx: add POT flag to 88E6390
The 88E6390 family clear the Priority Override Table the same way as
88E6352, thus add MV88E6XXX_FLAG_G2_POT to MV88E6XXX_FLAGS_FAMILY_6390.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:10:57 -07:00
Vivien Didelot 51c901a775 net: dsa: mv88e6xxx: distinguish Global 2 Rsvd2CPU
The 88E6185 family only has one 16-bit register to mark the 16 802.1D
reserved multicast addresses in the range of 01:80:C2:00:00:0x as MGMT.

The 88E6352 family also has one 16-bit register to mark the 16 GARP
reserved multicast addresses in the range of 01:80:C2:00:00:2x as MGMT.

Split the existing mv88e6095 prefixed mgmt_rsvd2cpu operation into two
distinct mv88e6185 and mv88e6352 prefixed operations, and wrap its call
into a mv88e6xxx_rsvd2cpu_setup helper.

This allows us to also get rid of the MV88E6XXX_CAP_G2_MGMT_EN_* flags.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:10:57 -07:00
Vivien Didelot d6c5e6aff5 net: dsa: mv88e6xxx: add number of Global 2 IRQs
Similarly to g1_irqs, add a g2_irqs member to the info structure to
indicates the presence of the Global 2 Interrupt Source and Mask
registers.

At the same time, provide helpers and document the registers since they
differ a bit between 88E6352 and 88E6390 families.

This allows us to get rid of the MV88E6XXX_FLAG_G2_INT flag.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:10:57 -07:00
Vivien Didelot 74e60241ce net: dsa: mv88e6xxx: remove 88E6185 G2 interrupt
The 88E6185 family has no Global 2 Interrupt Source or Mask registers.
Remove the MV88E6XXX_FLAG_G2_INT from MV88E6XXX_FLAGS_FAMILY_6185.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:10:57 -07:00
Vivien Didelot 2466f64ae4 net: dsa: mv88e6xxx: remove unused capabilities
Remove the forgotten capabilities and related flags from previous
cleanups.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:10:57 -07:00
Vivien Didelot bd80720468 net: dsa: mv88e6xxx: fix 88E6321 family comment
MV88E6XXX_FAMILY_6321 is undefined, 88E6321's family is 88E6320,
fix this.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:10:57 -07:00
Vivien Didelot c56a71a921 net: dsa: mv88e6xxx: remove LED control register
We don't support LED control yet, remove its register definition.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:10:57 -07:00
Vivien Didelot 7b23268c9d net: dsa: mv88e6xxx: remove unneeded dsa header
phy.c does not need to include the DSA public header. Remove it.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18 11:10:57 -07:00
John Fastabend 46f55cffa4 net: fix build error in devmap helper calls
Initial patches missed case with CONFIG_BPF_SYSCALL not set.

Fixes: 11393cc9b9 ("xdp: Add batching support to redirect map")
Fixes: 97f91a7cf0 ("bpf: add bpf_redirect_map helper routine")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 21:58:32 -07:00
Fabio Estevam 95b80bf3db mdio_bus: Remove unneeded gpiod NULL check
The gpiod API checks for NULL descriptors, so there is no need to
duplicate the check in the driver.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 16:57:04 -07:00
Andy Gospodarek 24251c2647 samples/bpf: add option for native and skb mode for redirect apps
When testing with a driver that has both native and generic redirect support:

$ sudo ./samples/bpf/xdp_redirect -N 5 6
input: 5 output: 6
ifindex 6:    4961879 pkt/s
ifindex 6:    6391319 pkt/s
ifindex 6:    6419468 pkt/s

$ sudo ./samples/bpf/xdp_redirect -S 5 6
input: 5 output: 6
ifindex 6:    1845435 pkt/s
ifindex 6:    3882850 pkt/s
ifindex 6:    3893974 pkt/s

$ sudo ./samples/bpf/xdp_redirect_map -N 5 6
input: 5 output: 6
map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
ifindex 6:    2207374 pkt/s
ifindex 6:    6212869 pkt/s
ifindex 6:    6286515 pkt/s

$ sudo ./samples/bpf/xdp_redirect_map -S 5 6
input: 5 output: 6
map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
ifindex 6:    5052528 pkt/s
ifindex 6:    5736631 pkt/s
ifindex 6:    5739962 pkt/s

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 13:46:27 -07:00
Arvind Yadav 7924a42133 net: ec_bhf: constify pci_device_id.
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text	   data	    bss	    dec	    hex	filename
   5113	    384	      0	   5497	   1579	drivers/net/ethernet/ec_bhf.o

File size After adding 'const':
   text	   data	    bss	    dec	    hex	filename
   5177	    320	      0	   5497	   1579	drivers/net/ethernet/ec_bhf.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 13:37:15 -07:00
Arvind Yadav c744cf5b9d net: cadence: macb: constify pci_device_id.
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.

File size before:
   text	   data	    bss	    dec	    hex	filename
    791	    336	      0	   1127	    467	net/ethernet/cadence/macb_pci.o

File size After adding 'const':
   text	   data	    bss	    dec	    hex	filename
    855	    272	      0	   1127	    467	net/ethernet/cadence/macb_pci.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 13:37:15 -07:00
Florian Westphal 06dc75ab06 net: Revert "net: add function to allocate sk_buff head without data area"
It was added for netlink mmap tx, there are no callers in the tree.
The commit also added a check for skb->head != NULL in kfree_skb path,
remove that too -- all skbs ought to have skb->head set.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 10:34:21 -07:00
David S. Miller 5c3c6081b2 Merge branch 'net-ufo-remove'
David S. Miller says:

====================
net: Remove UDP Fragmentation Offload support

This is a patch series, based upon some discussions with various
developers, that removes UFO offloading.

Very few devices support this operation, it's usefullness is
quesitonable at best, and it adds a non-trivial amount of
complexity to our data paths.

v2: Delete more code thanks to feedback from Willem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:53:05 -07:00
David S. Miller d9d30adf56 net: Kill NETIF_F_UFO and SKB_GSO_UDP.
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:58 -07:00
David S. Miller 6800b2e040 inet: Remove software UFO fragmenting code.
Rename udp{4,6}_ufo_fragment() to udp{4,6}_tunnel_segment() and only
handle tunnel segmentation.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:58 -07:00
David S. Miller 880388aa3c net: Remove all references to SKB_GSO_UDP.
Such packets are no longer possible.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:58 -07:00
David S. Miller 988cf74deb inet: Stop generating UFO packets.
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:58 -07:00
David S. Miller 08a00fea6d net: Remove references to NETIF_F_UFO from ethtool.
It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:58 -07:00
David S. Miller d4c023f4f3 net: Remove references to NETIF_F_UFO in netdev_fix_features().
It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:58 -07:00
David S. Miller e078de0378 virtio_net: Remove references to NETIF_F_UFO.
It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:58 -07:00
David S. Miller 2082499a95 dummy: Remove references to NETIF_F_UFO.
It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:57 -07:00
David S. Miller d591a1f3aa tun/tap: Remove references to NETIF_F_UFO.
It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:57 -07:00
David S. Miller fb652fdfe8 macvlan/macvtap: Remove NETIF_F_UFO advertisement.
It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:57 -07:00
David S. Miller 182e0b6b58 ipvlan: Stop advertising NETIF_F_UFO support.
It is going away.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:57 -07:00
David S. Miller f9c45ae020 macb: Remove bogus reference to NETIF_F_UFO.
This driver doesn't actually support UFO explicitly yet
it advertises this in netdev->features.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:57 -07:00
David S. Miller ed22e2f6b7 s2io: Remove UFO support.
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:52:57 -07:00
David S. Miller 6093ec2dc3 Merge branch 'xdp-redirect'
John Fastabend says:

====================
Implement XDP bpf_redirect

This series adds two new XDP helper routines bpf_redirect() and
bpf_redirect_map(). The first variant bpf_redirect() is meant
to be used the same way it is currently being used by the cls_bpf
classifier. An xdp packet will be redirected immediately when this
is called.

The other variant bpf_redirect_map(map, key, flags) uses a new
map type called devmap. A devmap uses integers as keys and
net_devices as values. The user provies key/ifindex pairs to
update the map with new net_devices. This provides two benefits
over the normal variant 'bpf_redirect()'. First the datapath
bpf program is abstracted away from using hard-coded ifindex
values. Allowing a single bpf program to be run any many different
environments. Second, and perhaps more important, the map enables
batching packet transmits. The map plus small driver changes
allows for batching all send requests across a NAPI poll loop.
This allows driver writers to optimize the driver xmit path
and only call expensive operations once for a batch of xdp_buffs.

The devmap was designed to support possible future work for
multicast and broadcast as follow-up patches.

To see, in more detail, how to leverage the new helpers and
map from the userspace side please review these two patches,

  xdp: sample program for new bpf_redirect helper
  xdp: bpf redirect with map sample program

Performance numbers provided by Jesper are the following, tested
using the ixgbe driver with CPU E5-1650 v4 @ 3.60GHz:

13,939,674 pkt/s = XDP_DROP without touching memory
14,290,650 pkt/s = xdp1: XDP_DROP with reading packet data
13,221,812 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
 7,596,576 pkt/s = xdp_redirect:    XDP_REDIRECT with swap mac (like XDP_TX)
13,058,435 pkt/s = xdp_redirect_map:XDP_REDIRECT with swap mac + devmap

A big thanks to everyone who helped with this series. Jesper
provided fixes, debugging, code review, performance benchmarks!
Daniel provided lots of useful feedback and code review. And last
but not least Andy provided useful feedback related to supporting
additional drivers, generic xdp implementation, testing, etc. Any
other feedback is welcome but I believe at this point these are
ready to be merged!

Whats left... get the rest of the drivers developers to implement
this in all the drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:07 -07:00
John Fastabend 9d6e005287 xdp: bpf redirect with map sample program
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:06 -07:00
John Fastabend 2ddf71e23c net: add notifier hooks for devmap bpf map
The BPF map devmap holds a refcnt on the net_device structure when
it is in the map. We need to do this to ensure on driver unload we
don't lose a dev reference.

However, its not very convenient to have to manually unload the map
when destroying a net device so add notifier handlers to do the cleanup
automatically. But this creates a race between update/destroy BPF
syscall and programs and the unregister netdev hook.

Unfortunately, the best I could come up with is either to live with
requiring manual removal of net devices from the map before removing
the net device OR to add a mutex in devmap to ensure the map is not
modified while we are removing a device. The fallout also requires
that BPF programs no longer update/delete the map from the BPF program
side because the mutex may sleep and this can not be done from inside
an rcu critical section.  This is not a real problem though because I
have not come up with any use cases where this is actually useful in
practice. If/when we come up with a compelling user for this we may
need to revisit this.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:06 -07:00
John Fastabend 11393cc9b9 xdp: Add batching support to redirect map
For performance reasons we want to avoid updating the tail pointer in
the driver tx ring as much as possible. To accomplish this we add
batching support to the redirect path in XDP.

This adds another ndo op "xdp_flush" that is used to inform the driver
that it should bump the tail pointer on the TX ring.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:06 -07:00
John Fastabend 97f91a7cf0 bpf: add bpf_redirect_map helper routine
BPF programs can use the devmap with a bpf_redirect_map() helper
routine to forward packets to netdevice in map.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:06 -07:00
John Fastabend 546ac1ffb7 bpf: add devmap, a map for storing net device references
Device map (devmap) is a BPF map, primarily useful for networking
applications, that uses a key to lookup a reference to a netdevice.

The map provides a clean way for BPF programs to build virtual port
to physical port maps. Additionally, it provides a scoping function
for the redirect action itself allowing multiple optimizations. Future
patches will leverage the map to provide batching at the XDP layer.

Another optimization/feature, that is not yet implemented, would be
to support multiple netdevices per key to support efficient multicast
and broadcast support.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:06 -07:00