This implement XDP CPU redirection load-balancing across available
CPUs, based on the hashing IP-pairs + L4-protocol. This equivalent to
xdp-cpu-redirect feature in Suricata, which is inspired by the
Suricata 'ippair' hashing code.
An important property is that the hashing is flow symmetric, meaning
that if the source and destination gets swapped then the selected CPU
will remain the same. This is helps locality by placing both directions
of a flows on the same CPU, in a forwarding/routing scenario.
The hashing INITVAL (15485863 the 10^6th prime number) was fairly
arbitrary choosen, but experiments with kernel tree pktgen scripts
(pktgen_sample04_many_flows.sh +pktgen_sample05_flow_per_thread.sh)
showed this improved the distribution.
This patch also change the default loaded XDP program to be this
load-balancer. As based on different user feedback, this seems to be
the expected behavior of the sample xdp_redirect_cpu.
Link: https://github.com/OISF/suricata/commit/796ec08dd7a63
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Adjusted function call API to take an initval. This allow the API
user to set the initial value, as a seed. This could also be used for
inputting the previous hash.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The teardown race in cpumap is really hard to reproduce. These changes
makes it easier to reproduce, for QA.
The --stress-mode now have a case of a very small queue size of 8, that helps
to trigger teardown flush to encounter a full queue, which results in calling
xdp_return_frame API, in a non-NAPI protect context.
Also increase MAX_CPUS, as my QA department have larger machines than me.
Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The test_cgrp2_attach test covers bpf cgroup attachment code well,
so let's re-use it for testing allocation/releasing of cgroup storage.
The extension is pretty straightforward: the bpf program will use
the cgroup storage to save the number of transmitted bytes.
Expected output:
$ ./test_cgrp2_attach2
Attached DROP prog. This ping in cgroup /foo should fail...
ping: sendmsg: Operation not permitted
Attached DROP prog. This ping in cgroup /foo/bar should fail...
ping: sendmsg: Operation not permitted
Attached PASS prog. This ping in cgroup /foo/bar should pass...
Detached PASS from /foo/bar while DROP is attached to /foo.
This ping in cgroup /foo/bar should fail...
ping: sendmsg: Operation not permitted
Attached PASS from /foo/bar and detached DROP from /foo.
This ping in cgroup /foo/bar should pass...
### override:PASS
### multi:PASS
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Convert xdpsock_user.c to use libbpf instead of bpf_load.o.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Convert xdp_fwd_user.c to use libbpf instead of bpf_load.o.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To smoothly test BTF supported binary on samples/bpf,
let samples/bpf/Makefile probe llc, pahole and
llvm-objcopy for BPF support and use them
like tools/testing/selftests/bpf/Makefile
changed from the commit c0fa1b6c3e ("bpf: btf:
Add BTF tests").
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Define u_smp_rmb() and u_smp_wmb() to respective barrier instructions.
This ensures the processor will order accesses to queue indices against
accesses to queue ring entries.
Signed-off-by: Brian Brooks <brian.brooks@linaro.org>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-07-20
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add sharing of BPF objects within one ASIC: this allows for reuse of
the same program on multiple ports of a device, and therefore gains
better code store utilization. On top of that, this now also enables
sharing of maps between programs attached to different ports of a
device, from Jakub.
2) Cleanup in libbpf and bpftool's Makefile to reduce unneeded feature
detections and unused variable exports, also from Jakub.
3) First batch of RCU annotation fixes in prog array handling, i.e.
there are several __rcu markers which are not correct as well as
some of the RCU handling, from Roman.
4) Two fixes in BPF sample files related to checking of the prog_cnt
upper limit from sample loader, from Dan.
5) Minor cleanup in sockmap to remove a set but not used variable,
from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Lots of fixes, here goes:
1) NULL deref in qtnfmac, from Gustavo A. R. Silva.
2) Kernel oops when fw download fails in rtlwifi, from Ping-Ke Shih.
3) Lost completion messages in AF_XDP, from Magnus Karlsson.
4) Correct bogus self-assignment in rhashtable, from Rishabh
Bhatnagar.
5) Fix regression in ipv6 route append handling, from David Ahern.
6) Fix masking in __set_phy_supported(), from Heiner Kallweit.
7) Missing module owner set in x_tables icmp, from Florian Westphal.
8) liquidio's timeouts are HZ dependent, fix from Nicholas Mc Guire.
9) Link setting fixes for sh_eth and ravb, from Vladimir Zapolskiy.
10) Fix NULL deref when using chains in act_csum, from Davide Caratti.
11) XDP_REDIRECT needs to check if the interface is up and whether the
MTU is sufficient. From Toshiaki Makita.
12) Net diag can do a double free when killing TCP_NEW_SYN_RECV
connections, from Lorenzo Colitti.
13) nf_defrag in ipv6 can unnecessarily hold onto dst entries for a
full minute, delaying device unregister. From Eric Dumazet.
14) Update MAC entries in the correct order in ixgbe, from Alexander
Duyck.
15) Don't leave partial mangles bpf program in jit_subprogs, from
Daniel Borkmann.
16) Fix pfmemalloc SKB state propagation, from Stefano Brivio.
17) Fix ACK handling in DCTCP congestion control, from Yuchung Cheng.
18) Use after free in tun XDP_TX, from Toshiaki Makita.
19) Stale ipv6 header pointer in ipv6 gre code, from Prashant Bhole.
20) Don't reuse remainder of RX page when XDP is set in mlx4, from
Saeed Mahameed.
21) Fix window probe handling of TCP rapair sockets, from Stefan
Baranoff.
22) Missing socket locking in smc_ioctl(), from Ursula Braun.
23) IPV6_ILA needs DST_CACHE, from Arnd Bergmann.
24) Spectre v1 fix in cxgb3, from Gustavo A. R. Silva.
25) Two spots in ipv6 do a rol32() on a hash value but ignore the
result. Fixes from Colin Ian King"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (176 commits)
tcp: identify cryptic messages as TCP seq # bugs
ptp: fix missing break in switch
hv_netvsc: Fix napi reschedule while receive completion is busy
MAINTAINERS: Drop inactive Vitaly Bordug's email
net: cavium: Add fine-granular dependencies on PCI
net: qca_spi: Fix log level if probe fails
net: qca_spi: Make sure the QCA7000 reset is triggered
net: qca_spi: Avoid packet drop during initial sync
ipv6: fix useless rol32 call on hash
ipv6: sr: fix useless rol32 call on hash
net: sched: Using NULL instead of plain integer
net: usb: asix: replace mii_nway_restart in resume path
net: cxgb3_main: fix potential Spectre v1
lib/rhashtable: consider param->min_size when setting initial table size
net/smc: reset recv timeout after clc handshake
net/smc: add error handling for get_user()
net/smc: optimize consumer cursor updates
net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
ipv6: ila: select CONFIG_DST_CACHE
net: usb: rtl8150: demote allmulti message to dev_dbg()
...
In preparation for enabling command line LDLIBS, re-name HOST_LOADLIBES
to KBUILD_HOSTLDLIBS as the internal use only flags. Also rename
existing usage to HOSTLDLIBS for consistency. This should not have any
visible effects.
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
In preparation for enabling command line CFLAGS, re-name HOSTCFLAGS to
KBUILD_HOSTCFLAGS as the internal use only flags. This should not have
any visible effects.
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
"prog_cnt" is the number of elements which are filled out in prog_fd[]
so the test should be >= instead of >.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
I can't see that we check prog_cnt to ensure it doesn't go over
MAX_PROGS.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
People noticed that the code match on IEEE 802.1ad (ETH_P_8021AD) ethertype,
and this implies Q-in-Q or double tagged VLANs. Thus, we better parse
the next VLAN header too. It is even marked as a TODO.
This is relevant for real world use-cases, as XDP cpumap redirect can be
used when the NIC RSS hashing is broken. E.g. the ixgbe driver HW cannot
handle double tagged VLAN packets, and places everything into a single
RX queue. Using cpumap redirect, users can redistribute traffic across
CPUs to solve this, which is faster than the network stacks RPS solution.
It is left as an exerise how to distribute the packets across CPUs. It
would be convenient to use the RX hash, but that is not _yet_ exposed
to XDP programs. For now, users can code their own hash, as I've demonstrated
in the Suricata code (where Q-in-Q is handled correctly).
Reported-by: Florian Maury <florian.maury-cv@x-cli.eu>
Reported-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
mdev_access() calls mbochs_get_page() with mdev_state->ops_lock held,
while mbochs_get_page() locks the mutex by itself.
It leads to unavoidable deadlock.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The below path error can occur:
# ./xdp2skb_meta.sh --dev eth0 --list
./xdp2skb_meta.sh: line 61: /usr/sbin/tc: No such file or directory
So just use command names instead of absolute paths of tc and ip.
In addition, it allow callers to redefine $TC and $IP paths
Fixes: 36e04a2d78 ("samples/bpf: xdp2skb_meta shows transferring info from XDP to SKB")
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For untracked executables of samples/bpf, add this.
Untracked files:
(use "git add <file>..." to include in what will be committed)
samples/bpf/cpustat
samples/bpf/fds_example
samples/bpf/lathist
samples/bpf/load_sock_ops
...
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
test_task_rename() and test_urandom_read()
can be failed during write() and read(),
So check the result of them.
Reviewed-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To avoid the below build warning message,
use new generate_load() checking the return value.
ignoring return value of ‘system’, declared with attribute warn_unused_result
And it also refactors the duplicate code of both
test_perf_event_all_cpu() and test_perf_event_task()
Cc: Teng Qin <qinteng@fb.com>
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This fixes build error regarding redefinition:
CLANG-bpf samples/bpf/parse_varlen.o
samples/bpf/parse_varlen.c:111:8: error: redefinition of 'vlan_hdr'
struct vlan_hdr {
^
./include/linux/if_vlan.h:38:8: note: previous definition is here
So remove duplicate 'struct vlan_hdr' in sample code and include if_vlan.h
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-07-03
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Various improvements to bpftool and libbpf, that is, bpftool build
speed improvements, missing BPF program types added for detection
by section name, ability to load programs from '.text' section is
made to work again, and better bash completion handling, from Jakub.
2) Improvements to nfp JIT's map read handling which allows for optimizing
memcpy from map to packet, from Jiong.
3) New BPF sample is added which demonstrates XDP in combination with
bpf_perf_event_output() helper to sample packets on all CPUs, from Toke.
4) Add a new BPF kselftest case for tracking connect(2) BPF hooks
infrastructure in combination with TFO, from Andrey.
5) Extend the XDP/BPF xdp_rxq_info sample code with a cmdline option to
read payload from packet data in order to use it for benchmarking.
Also for '--action XDP_TX' option implement swapping of MAC addresses
to avoid drops on some hardware seen during testing, from Jesper.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sendmsg in the SKB path of AF_XDP can now return EBUSY when a packet
was discarded and completed by the driver. Just ignore this message
in the sample application.
Fixes: b4b8faa1de ("samples/bpf: sample application and documentation for AF_XDP sockets")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reported-by: Pavel Odintsov <pavel@fastnetmon.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For ACLs implemented using either FIB rules or FIB entries, the BPF
program needs the FIB lookup status to be able to drop the packet.
Since the bpf_fib_lookup API has not reached a released kernel yet,
change the return code to contain an encoding of the FIB lookup
result and return the nexthop device index in the params struct.
In addition, inform the BPF program of any post FIB lookup reason as
to why the packet needs to go up the stack.
The fib result for unicast routes must have an egress device, so remove
the check that it is non-NULL.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
XDP_TX requires also changing the MAC-addrs, else some hardware
may drop the TX packet before reaching the wire. This was
observed with driver mlx5.
If xdp_rxq_info select --action XDP_TX the swapmac functionality
is activated. It is also possible to manually enable via cmdline
option --swapmac. This is practical if wanting to measure the
overhead of writing/updating payload for other action types.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
There is a cost associated with reading the packet data payload
that this test ignored. Add option --read to allow enabling
reading part of the payload.
This sample/tool helps us analyse an issue observed with a NIC
mlx5 (ConnectX-5 Ex) and an Intel(R) Xeon(R) CPU E5-1650 v4.
With no_touch of data:
Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:no_touch
XDP stats CPU pps issue-pps
XDP-RX CPU 0 14,465,157 0
XDP-RX CPU 1 14,464,728 0
XDP-RX CPU 2 14,465,283 0
XDP-RX CPU 3 14,465,282 0
XDP-RX CPU 4 14,464,159 0
XDP-RX CPU 5 14,465,379 0
XDP-RX CPU total 86,789,992
When not touching data, we observe that the CPUs have idle cycles.
When reading data the CPUs are 100% busy in softirq.
With reading data:
Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:read
XDP stats CPU pps issue-pps
XDP-RX CPU 0 9,620,639 0
XDP-RX CPU 1 9,489,843 0
XDP-RX CPU 2 9,407,854 0
XDP-RX CPU 3 9,422,289 0
XDP-RX CPU 4 9,321,959 0
XDP-RX CPU 5 9,395,242 0
XDP-RX CPU total 56,657,828
The effect seen above is a result of cache-misses occuring when
more RXQs are being used. Based on perf-event observations, our
conclusion is that the CPUs DDIO (Direct Data I/O) choose to
deliver packet into main memory, instead of L3-cache. We also
found, that this can be mitigated by either using less RXQs or by
reducing NICs the RX-ring size.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add an example program showing how to sample packets from XDP using the
perf event buffer. The example userspace program just prints the ethernet
header for every packet sampled.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
There is no default implementation for dma_buf_ops->unmap.
So add a function unmapping the page, otherwise we'll leak them.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Atomic mapping interface for dmabufs will be removed.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The new bochs vbe sample fails to link when DMA_SHARED_BUFFER is
disabled:
ERROR: "dma_buf_export" [samples/vfio-mdev/mbochs.ko] undefined!
ERROR: "dma_buf_fd" [samples/vfio-mdev/mbochs.ko] undefined!
This uses a 'select' statement to enable that framework, like all
other users do.
Fixes: a5e6e6505f ("sample: vfio bochs vbe display (host device for bochs-drm)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Display device, demo-ing the vfio dmabuf display interface
(VFIO_GFX_PLANE_TYPE_DMABUF). Compatible enough to qemu stdvga
that bochs-drm.ko can be used as guest driver.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
- charlcd: fixes and cleanups
From Robert Abel and Sean Young
- Kconfig fixes
From Randy Dunlap, Corentin Labbe and Ulf Magnusson
- cfag12864bfb: const cleanup
From Gustavo A. R. Silva
- Docs/licenses/warnings cleanups
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJbGTx5AAoJEBl8i3NobSFta7EP/0BPFiZQoYcj4Cnao9CX5VKT
IMJQNfVcMIuhMGKfsEnT/mT4X6LWZEMUoMpbf6hrMs8l9gkvDx41fDVmfOqBSiU7
7u010Lil/TksFzkETVQdstcE+DvQtD63tmzjAeVOorROmm21WrfD9+AUecpK3P3+
qWZGw3ErIxtIDFk6zwupZu3KuNqqeRW23i7Dg7Mw+wUsxD1CAfDq0l14dS4q6IsN
xVYBUm5ku+7uAZXj8eg1/nhrC90qcJ/fcWNi1e993NtaPQiWnZjcC5gilEACa1NM
cxqhxm1/AFpqMQ0Uco4pj+VBr5B+4bSOcpYcfCS02mIkVHrFYaNu079tKVpRaqKT
nZkcAJGfe3jxQH1Pjv66ddLROlQsPs7ghwUgGzrNUpy6NZIBRmD+9Vnl2L1bgMkq
XC1ry1jQqE9pS2uoEKaL0JvqEdNdbIALY/4xbndA9ZYUnd9Noak5tpem4S+9DqBJ
YSp9dgszQq8sGWMKzqLR4kWZNI4acAV+PgMzoQi04Q7EVP28x9vWFHgkhE3GVzvt
feZlkkzcVJnPePQxxHfVXHnvdQM2QO13cD2ssJMS/q4DmSrzVsG/aLK7EAc8cN8m
RSkJc0cZAPbLORfQPuSAxGf7uBjwNxji3NLCRVaHy8hZf/oWSv4s9xzNMXwmc/AD
c8lnadh/THeWRxxU+6A3
=SYoI
-----END PGP SIGNATURE-----
Merge tag 'auxdisplay-for-linus-v4.18-rc1' of git://github.com/ojeda/linux
Pull auxdisplay updates from Miguel Ojeda:
"Mostly small fixes and cleanups, plus a non-trivial fix for charlcd
- charlcd: fixes and cleanups (Robert Abel and Sean Young)
- Kconfig fixes (Randy Dunlap, Corentin Labbe and Ulf Magnusson)
- cfag12864bfb: const cleanup (Gustavo A. R. Silva)
- Docs/licenses/warnings cleanups"
* tag 'auxdisplay-for-linus-v4.18-rc1' of git://github.com/ojeda/linux:
auxdisplay: Replace licenses with SPDX identifiers
auxdisplay: make PANEL a menuconfig
auxdisplay: fix broken menu
auxdisplay: charlcd: Fix and clean up handling of x/y commands
auxdisplay: charlcd: fix hex literal ranges for graphics command
auxdisplay: charlcd: fix two-line command ^[[LN not marked as processed
auxdisplay: charlcd: replace octal literal with form-feed escape sequence
auxdisplay: charlcd: use null character instead of zero literal to terminate strings
auxdisplay: charlcd: no need to call charlcd_gotoxy() if nothing changes
auxdisplay: cfag12864bfb: constify fb_fix_screeninfo and fb_var_screeninfo structures
auxdisplay: img-ascii-lcd: fix typo on select SYSCON/MFD_SYSCON
auxdisplay: img-ascii-lcd: kconfig: Remove MIPS_SEAD3 reference
auxdisplay: arm-charlcd: Fix struct charlcd doc line
MAINTAINERS: auxdisplay: remove obsolete webpages
Doc: misc-devices: move lcd-panel-cgram.txt to auxdisplay/
Make sure that XDP_SKB also uses the skb Tx path.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Here, the xdpsock sample application is adjusted to the new descriptor
format.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
As Michal noted the flow struct takes both the flow label and priority.
Update the bpf_fib_lookup API to note that it is flowinfo and not just
the flow label.
Cc: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-05-24
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Björn Töpel cleans up AF_XDP (removes rebind, explicit cache alignment from uapi, etc).
2) David Ahern adds mtu checks to bpf_ipv{4,6}_fib_lookup() helpers.
3) Jesper Dangaard Brouer adds bulking support to ndo_xdp_xmit.
4) Jiong Wang adds support for indirect and arithmetic shifts to NFP
5) Martin KaFai Lau cleans up BTF uapi and makes the btf_header extensible.
6) Mathieu Xhonneux adds an End.BPF action to seg6local with BPF helpers allowing
to edit/grow/shrink a SRH and apply on a packet generic SRv6 actions.
7) Sandipan Das adds support for bpf2bpf function calls in ppc64 JIT.
8) Yonghong Song adds BPF_TASK_FD_QUERY command for introspection of tracing events.
9) other misc fixes from Gustavo A. R. Silva, Sirio Balmelli, John Fastabend, and Magnus Karlsson
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Update xdp_monitor to use the recently added err code introduced
in tracepoint xdp:xdp_devmap_xmit, to show if the drop count is
caused by some driver general delivery problem. Other kind of drops
will likely just be more normal TX space issues.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The xdp_monitor sample/tool is updated to use the new tracepoint
xdp:xdp_devmap_xmit the previous patch just introduced.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This is mostly to test kprobe/uprobe which needs kernel headers.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adapt xdpsock to use the new getsockopt introduced in the previous
commit.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
S390 bpf_jit.S is removed in net-next and had changes in 'net',
since that code isn't used any more take the removal.
TLS data structures split the TX and RX components in 'net-next',
put the new struct members from the bug fix in 'net' into the RX
part.
The 'net-next' tree had some reworking of how the ERSPAN code works in
the GRE tunneling code, overlapping with a one-line headroom
calculation fix in 'net'.
Overlapping changes in __sock_map_ctx_update_elem(), keep the bits
that read the prog members via READ_ONCE() into local variables
before using them.
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up SPDX-License-Identifier and removing licensing leftovers.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Only consider forwarding packets if ttl in received packet is > 1 and
decrement ttl before handing off to bpf_redirect_map.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Building samples with clang ignores the $(Q) setting, always
printing full command to the output. Make it less verbose.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make complains that it doesn't know how to make libbpf.a:
scripts/Makefile.host:106: target 'samples/bpf/../../tools/lib/bpf/libbpf.a' doesn't match the target pattern
Now that we have it as a dependency of the sources simply add libbpf.a
to libraries not objects.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There are many ways users may compile samples, some of them got
broken by commit 5f9380572b ("samples: bpf: compile and link
against full libbpf"). Improve path resolution and make libbpf
building a dependency of source files to force its build.
Samples should now again build with any of:
cd samples/bpf; make
make samples/bpf/
make -C samples/bpf
cd samples/bpf; make O=builddir
make samples/bpf/ O=builddir
make -C samples/bpf O=builddir
export KBUILD_OUTPUT=builddir
make samples/bpf/
make -C samples/bpf
Fixes: 5f9380572b ("samples: bpf: compile and link against full libbpf")
Reported-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The libbpf.h file in samples is clashing with libbpf's header.
Since it only includes a subset of filter.h instruction helpers
rename it to bpf_insn.h. Drop the unnecessary include of bpf/bpf.h.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There are two files in the tree called libbpf.h which is becoming
problematic. Most samples don't actually need the local libbpf.h
they simply include it to get to bpf/bpf.h. Include bpf/bpf.h
directly instead.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that we can use full powers of libbpf in BPF samples, we
should perhaps make the simplest XDP programs not depend on
bpf_load helpers. This way newcomers will be exposed to the
recommended library from the start.
Use of bpf_prog_load_xattr() will also make it trivial to later
on request offload of the programs by simply adding ifindex to
the xattr.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
There are two copies of event reading loop - in bpftool and
trace_helpers "library". Consolidate them and move the code
to libbpf. Return codes from trace_helpers are kept, but
renamed to include LIBBPF prefix.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
samples/bpf currently cherry-picks object files from tools/lib/bpf
to link against. Just compile the full library and link statically
against it.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Both tools/lib/bpf/libbpf.h and samples/bpf/bpf_load.h define their
own version of struct bpf_map_def. The version in bpf_load.h has
more fields. libbpf does not support inner maps and its definition
of struct bpf_map_def lacks the related fields. Rename the definition
in bpf_load.h (samples/bpf) to avoid conflicts.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Simple example of fast-path forwarding. It has a serious flaw
in not verifying the egress device index supports XDP forwarding.
If the egress device does not packets are dropped.
Take this only as a simple example of fast-path forwarding.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This is a sample application for AF_XDP sockets. The application
supports three different modes of operation: rxdrop, txonly and l2fwd.
To show-case a simple round-robin load-balancing between a set of
sockets in an xskmap, set the RR_LB compile time define option to 1 in
"xdpsock.h".
v2: The entries variable was calculated twice in {umem,xq}_nb_avail.
Co-authored-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit d5a00528b5 ("syscalls/core, syscalls/x86: Rename
struct pt_regs-based sys_*() to __x64_sys_*()") renamed a lot
of syscall function sys_*() to __x64_sys_*().
This caused several kprobe based samples/bpf tests failing.
This patch fixed the problem in bpf_load.c.
For x86_64 architecture, function name __x64_sys_*() will be
first used for kprobe event creation. If the creation is successful,
it will be used. Otherwise, function name sys_*() will be used
for kprobe event creation.
Fixes: d5a00528b5 ("syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There is no functionality change in this patch. The common-purpose
trace functions, including perf_event polling and ksym lookup,
are moved from trace_output_user.c and bpf_load.c to
selftests/bpf/trace_helpers.c so that these function can
be reused later in selftests.
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 redundant ret assignments removed:
* 'ret = 1' before the logic 'if (data_maps)', and if any errors jump to
label 'done'. No 'ret = 1' needed before the error jump.
* After the '/* load programs */' part, if everything goes well, then
the BPF code will be loaded and 'ret' set to 0 by load_and_attach().
If something goes wrong, 'ret' set to none-O, the redundant 'ret = 0'
after the for clause will make the error skipped.
For example, if some BPF code cannot provide supported program types
in ELF SEC("unknown"), the for clause will not call load_and_attach()
to load the BPF code. 1 should be returned to callees instead of 0.
Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Move the testsuite to
selftests/bpf/{test_tunnel_kern.c, test_tunnel.sh}
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The BPF sample sockmap is redundant now that equivelant tests exist
in the BPF selftests. Lets remove this sample and only keep the
selftest version that will be run as part of the selftest suite.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
If no options are passed to sockmap after this patch we run a set of
tests using various options and sendmsg/sendpage sizes. This replaces
the sockmap_test.sh script.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
By moving sockmap_test from shell script into C we can run it directly
from selftests, but we can also push the input/output around in proper
structures.
However, keep the CLI options around because they are useful for
debugging when a paticular pattern of msghdr or sockmap options
trips up the sockmap code path.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a test for fetching xfrm state parameters from a tc program running
on ingress.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Per Documentation/bpf/bpf_devel_QA.txt add the -target flag to the
sockmap Makefile. Relevant text quoted here,
Otherwise, you can use bpf target. Additionally, you _must_ use
bpf target when:
- Your program uses data structures with pointer or long / unsigned
long types that interface with BPF helpers or context data
structures. Access into these structures is verified by the BPF
verifier and may result in verification failures if the native
architecture is not aligned with the BPF architecture, e.g. 64-bit.
An example of this is BPF_PROG_TYPE_SK_MSG require '-target bpf'
Fixes: 69e8cc134b ("bpf: sockmap sample program")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull livepatching fix from Jiri Kosina:
"Shadow variable API list_head initialization fix from Petr Mladek"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Allow to call a custom callback when freeing shadow variables
livepatch: Initialize shadow variables safely by a custom callback
adding bpf's sample program which is using bpf_xdp_adjust_tail helper
by generating ICMPv4 "packet to big" message if ingress packet's size is
bigger then 600 bytes
Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The variable rec_i contains an XDP action code not an error.
Thus, using err2str() was wrong, it should have been action2str().
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The program run against loopback interace "lo", not "eth0".
Correct the comment.
Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
We might need to do some actions before the shadow variable is freed.
For example, we might need to remove it from a list or free some data
that it points to.
This is already possible now. The user can get the shadow variable
by klp_shadow_get(), do the necessary actions, and then call
klp_shadow_free().
This patch allows to do it a more elegant way. The user could implement
the needed actions in a callback that is passed to klp_shadow_free()
as a parameter. The callback usually does reverse operations to
the constructor callback that can be called by klp_shadow_*alloc().
It is especially useful for klp_shadow_free_all(). There we need to do
these extra actions for each found shadow variable with the given ID.
Note that the memory used by the shadow variable itself is still released
later by rcu callback. It is needed to protect internal structures that
keep all shadow variables. But the destructor is called immediately.
The shadow variable must not be access anyway after klp_shadow_free()
is called. The user is responsible to protect this any suitable way.
Be aware that the destructor is called under klp_shadow_lock. It is
the same as for the contructor in klp_shadow_alloc().
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The existing API allows to pass a sample data to initialize the shadow
data. It works well when the data are position independent. But it fails
miserably when we need to set a pointer to the shadow structure itself.
Unfortunately, we might need to initialize the pointer surprisingly
often because of struct list_head. It is even worse because the list
might be hidden in other common structures, for example, struct mutex,
struct wait_queue_head.
For example, this was needed to fix races in ALSA sequencer. It required
to add mutex into struct snd_seq_client. See commit b3defb791b
("ALSA: seq: Make ioctls race-free") and commit d15d662e89
("ALSA: seq: Fix racy pool initializations")
This patch makes the API more safe. A custom constructor function and data
are passed to klp_shadow_*alloc() functions instead of the sample data.
Note that ctor_data are no longer a template for shadow->data. It might
point to any data that might be necessary when the constructor is called.
Also note that the constructor is called under klp_shadow_lock. It is
an internal spin_lock that synchronizes alloc() vs. get() operations,
see klp_shadow_get_or_alloc(). On one hand, this adds a risk of ABBA
deadlocks. On the other hand, it allows to do some operations safely.
For example, we could add the new structure into an existing list.
This must be done only once when the structure is allocated.
Reported-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This adds support for generating coredumps for remoteprocs using
devcoredump, adds the Qualcomm sysmon driver for intra-remoteproc crash
handling and a number of fixes in Qualcomm and IMX drivers.
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAlrL5dgbHGJqb3JuLmFu
ZGVyc3NvbkBsaW5hcm8ub3JnAAoJEAsfOT8Nma3FufUP/ifQ+QbDzYzBI4CHfbef
hls8bplN9GlW4acJuhVRe8ikGz6vHZdyH88KJJX+IHekDXFqk10rWEgcyfV0h5mU
qOrhiW7hjfm2Mhs0MvNg9rKxbFrD5wAeJBJn8ukTB4z4jc9iFPazVu37V3ul/wVR
yMM4zI6kA54UwsLunqLEapb1thgLO4P99tvA+vdR8ekz/j0AVHB7HiHVLY1JX5Gu
D679icQSWOSsSQaRR7G0E97ZKZF6XaM6+aJV3eZol1HQw8k3I/Vbudi2wIKj0Ia8
jLO5JyCpgeSAAjPCZLrB5XyUXl/vU7PSPKTPyOnv2lK1wQgKsmHX80HfeAt1ISNn
4nY/Oy651/rjBN3FR9HQvQoqCDF08GG5YRDmUZCg0kS/OStrbJT4eCm+hklvanmK
O98LEWm5+FLqk+9Wau1DXGa0raXon2REIJxQSDnBUODZSqTQBS1iQmdtdYtrZxRp
17LtCABCSBMbAI8VE6eb/5jj/z07aI0pBwKEYEWOOG2xsF6q1GGZAXRxnmMUOWAv
bwSgiNqNbbYUdYMjVYDxDtWIp3/Dy8VQ3N3Fz61SieS0GjJxDy/SyNByzVL4eEjk
yYj+N74Dxgms1V/xknQmJn4ktwqaVxapnj52WF1WaEtIcm5WvPFZUVNSFvWMlbcD
rAiTKkpFC0CkINlM+dk3Ti/a
=cxV7
-----END PGP SIGNATURE-----
Merge tag 'rproc-v4.17' of git://github.com/andersson/remoteproc
Pull remoteproc updates from Bjorn Andersson:
- add support for generating coredumps for remoteprocs using
devcoredump
- add the Qualcomm sysmon driver for intra-remoteproc crash handling
- a number of fixes in Qualcomm and IMX drivers
* tag 'rproc-v4.17' of git://github.com/andersson/remoteproc:
remoteproc: fix null pointer dereference on glink only platforms
soc: qcom: qmi: add CONFIG_NET dependency
remoteproc: imx_rproc: Slightly simplify code in 'imx_rproc_probe()'
remoteproc: imx_rproc: Re-use existing error handling path in 'imx_rproc_probe()'
remoteproc: imx_rproc: Fix an error handling path in 'imx_rproc_probe()'
samples: Introduce Qualcomm QMI sample client
remoteproc: qcom: Introduce sysmon
remoteproc: Pass type of shutdown to subdev remove
remoteproc: qcom: Register segments for core dump
soc: qcom: mdt-loader: Return relocation base
remoteproc: Rename "load_rsc_table" to "parse_fw"
remoteproc: Add remote processor coredump support
remoteproc: Remove null character write of shared mem
Pull networking updates from David Miller:
1) Support offloading wireless authentication to userspace via
NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari.
2) A lot of work on network namespace setup/teardown from Kirill Tkhai.
Setup and cleanup of namespaces now all run asynchronously and thus
performance is significantly increased.
3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon
Streiff.
4) Support zerocopy on RDS sockets, from Sowmini Varadhan.
5) Use denser instruction encoding in x86 eBPF JIT, from Daniel
Borkmann.
6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime
Chevallier.
7) Support grafting of child qdiscs in mlxsw driver, from Nogah
Frankel.
8) Add packet forwarding tests to selftests, from Ido Schimmel.
9) Deal with sub-optimal GSO packets better in BBR congestion control,
from Eric Dumazet.
10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern.
11) Add path MTU tests to selftests, from Stefano Brivio.
12) Various bits of IPSEC offloading support for mlx5, from Aviad
Yehezkel, Yossi Kuperman, and Saeed Mahameed.
13) Support RSS spreading on ntuple filters in SFC driver, from Edward
Cree.
14) Lots of sockmap work from John Fastabend. Applications can use eBPF
to filter sendmsg and sendpage operations.
15) In-kernel receive TLS support, from Dave Watson.
16) Add XDP support to ixgbevf, this is significant because it should
allow optimized XDP usage in various cloud environments. From Tony
Nguyen.
17) Add new Intel E800 series "ice" ethernet driver, from Anirudh
Venkataramanan et al.
18) IP fragmentation match offload support in nfp driver, from Pieter
Jansen van Vuuren.
19) Support XDP redirect in i40e driver, from Björn Töpel.
20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of
tracepoints in their raw form, from Alexei Starovoitov.
21) Lots of striding RQ improvements to mlx5 driver with many
performance improvements, from Tariq Toukan.
22) Use rhashtable for inet frag reassembly, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits)
net: mvneta: improve suspend/resume
net: mvneta: split rxq/txq init and txq deinit into SW and HW parts
ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh
net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
net: bgmac: Correctly annotate register space
route: check sysctl_fib_multipath_use_neigh earlier than hash
fix typo in command value in drivers/net/phy/mdio-bitbang.
sky2: Increase D3 delay to sky2 stops working after suspend
net/mlx5e: Set EQE based as default TX interrupt moderation mode
ibmvnic: Disable irqs before exiting reset from closed state
net: sched: do not emit messages while holding spinlock
vlan: also check phy_driver ts_info for vlan's real device
Bluetooth: Mark expected switch fall-throughs
Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME
Bluetooth: btrsi: remove unused including <linux/version.h>
Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4
sh_eth: kill useless check in __sh_eth_get_regs()
sh_eth: add sh_eth_cpu_data::no_xdfar flag
ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()
ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()
...
Add BPF_SK_SKB_STREAM_VERDICT tests for ingress hook. While
we do this also bring stream tests in-line with MSG based
testing.
A map for skb options is added for userland to push options
at BPF programs.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a set of tests to verify ingress flag in redirect helpers
works correctly with various msg sizes.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
add empty raw_tracepoint bpf program to test overhead similar
to kprobe and traditional tracepoint tests
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Trivial fix to spelling mistake in error message text
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Access to the socket API and the root network namespace is only available
when networking is enabled:
ERROR: "kernel_sendmsg" [drivers/soc/qcom/qmi_helpers.ko] undefined!
ERROR: "sock_release" [drivers/soc/qcom/qmi_helpers.ko] undefined!
ERROR: "sock_create_kern" [drivers/soc/qcom/qmi_helpers.ko] undefined!
ERROR: "kernel_getsockname" [drivers/soc/qcom/qmi_helpers.ko] undefined!
ERROR: "init_net" [drivers/soc/qcom/qmi_helpers.ko] undefined!
ERROR: "kernel_recvmsg" [drivers/soc/qcom/qmi_helpers.ko] undefined!
Adding a dependency on CONFIG_NET lets us build it in all randconfig
builds.
Fixes: 9b8a11e826 ("soc: qcom: Introduce QMI encoder/decoder")
Acked-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
When FIFO mode is enabled, the receive data available interrupt
(UART_IIR_RDI in code) should be triggered when the number of data
in FIFO is equal or larger than interrupt trigger level.
This patch changes the trigger level check to ensure multiple bytes
received from upper layer can trigger RDI interrupt correctly.
Cc: Joey Zheng <yu.zheng@hxt-semitech.com>
Signed-off-by: Shunyong Yang <shunyong.yang@hxt-semitech.com>
Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This adds the test script I am currently using to validate
the latest sockmap changes. Shortly sockmap will be ported
to selftests and these will be run from the infrastructure
there. Until then add the script here so we have a coverage
checklist when porting into selftests.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This adds an option to test the msg_pull_data helper. This
uses two options txmsg_start and txmsg_end to let the user
specify start and end bytes to pull.
The options can be used with txmsg_apply, txmsg_cork options
as well as with any of the basic tests, txmsg, txmsg_redir and
txmsg_drop (plus noisy variants) to run pull_data inline with
those tests. By giving user direct control over the variables
we can easily do negative testing as well as positive tests.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add tests for SK_DROP.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add sample application support for the bpf_msg_cork_bytes helper. This
lets the user specify how many bytes each verdict should apply to.
Similar to apply_bytes() tests these can be run as a stand-alone test
when used without other options or inline with other tests by using
the txmsg_cork option along with any of the basic tests txmsg,
txmsg_redir, txmsg_drop.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This adds an option to test the apply_bytes helper. This option lets
the user specify an int on the command line specifying how much data
each verdict should apply to.
When this is set a map entry is set with the bytes input by the user
and then the specified program --txmsg or --txmsg_redir will use the
value and set the applied data. If no other option is set then a
default --txmsg_apply program is run. This program will drop pkts
if an error is detected on the bytes map lookup. Useful to verify
the map lookup and apply helper are working and causing a hard
error if it is not.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To verify data is not being dropped or corrupted this adds an option
to verify test-patterns on recv.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To exercise TX ULP sendpage implementation we need a test that does
a sendfile. Add sendfile test option here.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add sockmap option to use SK_MSG program types.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The Tile architecture port was added by Chris Metcalf in 2010, and
maintained until early 2018 when he orphaned it due to his departure
from Mellanox, and nobody else stepped up to maintain it. The product
line is still around in the form of the BlueField SoC, but no longer
uses the Tile architecture.
There are also still products for sale with Tile-GX SoCs, notably the
Mikrotik CCR router family. The products all use old (linux-3.3) kernels
with lots of patches and won't be upgraded by their manufacturers. There
have been efforts to port both OpenWRT and Debian to these, but both
projects have stalled and are very unlikely to be continued in the future.
Given that we are reasonably sure that nobody is still using the port
with an upstream kernel any more, it seems better to remove it now while
the port is in a good shape than to let it bitrot for a few years first.
Cc: Chris Metcalf <chris.d.metcalf@gmail.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: http://www.mellanox.com/page/npu_multicore_overview
Link: https://jenkins.debian.net/view/rebootstrap/job/rebootstrap_tilegx_gcc7/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The Analog Devices Blackfin port was added in 2007 and was rather
active for a while, but all work on it has come to a standstill
over time, as Analog have changed their product line-up.
Aaron Wu confirmed that the architecture port is no longer relevant,
and multiple people suggested removing blackfin independently because
of some of its oddities like a non-working SMP port, and the amount of
duplication between the chip variants, which cause extra work when
doing cross-architecture changes.
Link: https://docs.blackfin.uclinux.org/
Acked-by: Aaron Wu <Aaron.Wu@analog.com>
Acked-by: Bryan Wu <cooloney@gmail.com>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mike Frysinger <vapier@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This commit adds additional test in the trace_event example, by
attaching the bpf program to MEM_UOPS_RETIRED.LOCK_LOADS event with
PERF_SAMPLE_ADDR requested, and print the lock address value read from
the bpf program to trace_pipe.
Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
All of the conflicts were cases of overlapping changes.
In net/core/devlink.c, we have to make care that the
resouce size_params have become a struct member rather
than a pointer to such an object.
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch adds tests for GRE sequence number
support for metadata mode tunnel.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
test_cgrp2_sock.sh and test_cgrp2_sock2.sh tests keep the program
attached to cgroup even after completion.
Using detach functionality of test_cgrp2_sock in both scripts.
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJalLBaAAoJEIly9N/cbcAmVt4P/R3s27mxdqYTXywVfhF10Av6
u+nyme53vTURvhTASqbD/7SCj8yV9EM7tI6qeQh/7aJf/I4Rc5/YkTGtL4Hitkcb
CVMwMeoEAl63ZCsy3X3osSI33jGAWNWU5/4+UJZtFo4TM/3RyWnVS7jIj5nI2KaA
y/t37klFVn7j0lKwiz+EP7B74h++CN+ReAc1Cxqd5HE1NLz15zsy+Ajqs15I5dtv
InQg33uBk71gHifFvCxqXWp6w8IngQt6JeJ/LN6GgB/mQ5AIwVGL33bt+vTjYorT
SincCvE2SoGGEjgefjWWwADQC4luYudzPZTnZRypi7NbqaITxn0VBT8Vskdr2OiQ
Ud1on+DUX//JfRkLxFC3sxoA9LCbt3zFzsYd33B9JqvqmnCy+LqgiHTvl+1Bijh6
fQMGzPu7lH0Q/wpvVJYcsq0rA3S3yUOaXpEsLFBHO1uLRZGAkFDF5fgY2DNJck5V
IeLSyGtphKbGUTRd37sqdoEaGQiAvczh4wO/y156sldELagaTkh4cvEGHIynzLZA
jIhTsCD4U5Ht+e6Tvm9ZEHBHz9OmxzbXLfdiGhNEDtbxRFzfnbhy7ZDDCcXnxsm7
xhMg2CCY77vQHbJ44g1DIx2oc06kstt/CpdZ0OwI6i0QegqVJzN7T5u8Hsf+HXzK
MWh/yJ8ZOooQeUWVRUYh
=oIgK
-----END PGP SIGNATURE-----
Merge tag 'seccomp-v4.16-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux into fixes-v4.16-rc4
- do not build samples when cross compiling (Michal Hocko)
From Kees: "This disables the seccomp samples when cross compiling. We're seen too many build issues here, so
it's best to just disable it, especially since they're just the samples."
CPU is active when have running tasks on it and CPUFreq governor can
select different operating points (OPP) according to different workload;
we use 'pstate' to present CPU state which have running tasks with one
specific OPP. On the other hand, CPU is idle which only idle task on
it, CPUIdle governor can select one specific idle state to power off
hardware logics; we use 'cstate' to present CPU idle state.
Based on trace events 'cpu_idle' and 'cpu_frequency' we can accomplish
the duration statistics for every state. Every time when CPU enters
into or exits from idle states, the trace event 'cpu_idle' is recorded;
trace event 'cpu_frequency' records the event for CPU OPP changing, so
it's easily to know how long time the CPU stays in the specified OPP,
and the CPU must be not in any idle state.
This patch is to utilize the mentioned trace events for pstate and
cstate statistics. To achieve more accurate profiling data, the program
uses below sequence to insure CPU running/idle time aren't missed:
- Before profiling the user space program wakes up all CPUs for once, so
can avoid to missing account time for CPU staying in idle state for
long time; the program forces to set 'scaling_max_freq' to lowest
frequency and then restore 'scaling_max_freq' to highest frequency,
this can ensure the frequency to be set to lowest frequency and later
after start to run workload the frequency can be easily to be changed
to higher frequency;
- User space program reads map data and update statistics for every 5s,
so this is same with other sample bpf programs for avoiding big
overload introduced by bpf program self;
- When send signal to terminate program, the signal handler wakes up
all CPUs, set lowest frequency and restore highest frequency to
'scaling_max_freq'; this is exactly same with the first step so
avoid to missing account CPU pstate and cstate time during last
stage. Finally it reports the latest statistics.
The program has been tested on Hikey board with octa CA53 CPUs, below
is one example for statistics result, the format mainly follows up
Jesper Dangaard Brouer suggestion.
Jesper reminds to 'get printf to pretty print with thousands separators
use %' and setlocale(LC_NUMERIC, "en_US")', tried three different arm64
GCC toolchains (5.4.0 20160609, 6.2.1 20161016, 6.3.0 20170516) but all
of them cannot support printf flag character %' on arm64 platform, so go
back print number without grouping mode.
CPU states statistics:
state(ms) cstate-0 cstate-1 cstate-2 pstate-0 pstate-1 pstate-2 pstate-3 pstate-4
CPU-0 767 6111 111863 561 31 756 853 190
CPU-1 241 10606 107956 484 125 646 990 85
CPU-2 413 19721 98735 636 84 696 757 89
CPU-3 84 11711 79989 17516 909 4811 5773 341
CPU-4 152 19610 98229 444 53 649 708 1283
CPU-5 185 8781 108697 666 91 671 677 1365
CPU-6 157 21964 95825 581 67 566 684 1284
CPU-7 125 15238 102704 398 20 665 786 1197
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
samples/seccomp relies on the host setting which is not suitable for
crosscompilation and it actually fails when crosscompiling s390 and
powerpc all{yes,mod}config on x86_64 with
samples/seccomp/bpf-helper.h:135:2: error: #error __BITS_PER_LONG value unusable.
#error __BITS_PER_LONG value unusable.
^
In file included from samples/seccomp/bpf-fancy.c:13:0:
samples/seccomp/bpf-fancy.c: In function ‘main’:
samples/seccomp/bpf-fancy.c:38:11: error: ‘__NR_exit’ undeclared (first use in this function)
SYSCALL(__NR_exit, ALLOW),
and many others. I am doing these for compile testing and it's been
quite useful to catch issues. Crosscompiling sample code on the other
hand doesn't seem all that important so it seems like the easiest way to
simply disable samples/seccomp when crosscompiling.
Fixing this properly is not that easy as Kees explains:
: IIRC, one of the problems is with build ordering problems: the kernel
: headers used by the samples aren't available when cross compiling.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
samples/sockops program keeps the sock_ops program attached to cgroup.
Fixed this by detaching program before exit.
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
While building samples/sockmap, undefined reference error is thrown
for `nla_dump_errormsg'.
Linking tools/lib/bpf/nlattr.o as a fix
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Default rlimit RLIMIT_MEMLOCK is 64KB, causes bpf map failure.
e.g.
[root@labbpf]# ./xdp_redirect $(</sys/class/net/eth2/ifindex) \
> $(</sys/class/net/eth3/ifindex)
failed to create a map: 1 Operation not permitted
The failure is seen when executing xdp_redirect while xdp_monitor
is already runnig.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Introduce a sample driver that register for server notifications and
spawn clients for each available test service (service 15). The spawned
clients implements the interface for encoding "ping" and "data" requests
and decode the responses from the remote.
Acked-By: Chris Lew <clew@codeaurora.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
The commit c69de58ba8 ("net: erspan: use bitfield instead of
mask and offset") changes the erspan header to use bitfield, and
commit d350a82302 ("net: erspan: create erspan metadata uapi header")
creates a uapi header file. The above two commit breaks the current
erspan test. This patch fixes it by adapting the above two changes.
Fixes: ac80c2a165 ("samples/bpf: add erspan v2 sample code")
Fixes: ef88f89c83 ("samples/bpf: extend test_tunnel_bpf.sh with ERSPAN")
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf 2018-02-02
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) support XDP attach in libbpf, from Eric.
2) minor fixes, from Daniel, Jakub, Yonghong, Alexei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use bpf_set_link_xdp_fd instead of set_link_xdp_fd to remove some
code duplication and benefit of netlink ext ack errors message.
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Parse netlink ext attribute to get the error message returned by
the card. Code is partially take from libnl.
We add netlink.h to the uapi include of tools. And we need to
avoid include of userspace netlink header to have a successful
build of sample so nlattr.h has a define to avoid
the inclusion. Using a direct define could have been an issue
as NLMSGERR_ATTR_MAX can change in the future.
We also define SOL_NETLINK if not defined to avoid to have to
copy socket.h for a fixed value.
Signed-off-by: Eric Leblond <eric@regit.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Here is the set of "big" driver core patches for 4.16-rc1.
The majority of the work here is in the firmware subsystem, with reworks
to try to attempt to make the code easier to handle in the long run, but
no functional change. There's also some tree-wide sysfs attribute
fixups with lots of acks from the various subsystem maintainers, as well
as a handful of other normal fixes and changes.
And finally, some license cleanups for the driver core and sysfs code.
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWnLvPw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynNzACgkzjPoBytJWbpWFt6SR6L33/u4kEAnRFvVCGL
s6ygQPQhZIjKk2Lxa2hC
=Zihy
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of "big" driver core patches for 4.16-rc1.
The majority of the work here is in the firmware subsystem, with
reworks to try to attempt to make the code easier to handle in the
long run, but no functional change. There's also some tree-wide sysfs
attribute fixups with lots of acks from the various subsystem
maintainers, as well as a handful of other normal fixes and changes.
And finally, some license cleanups for the driver core and sysfs code.
All have been in linux-next for a while with no reported issues"
* tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits)
device property: Define type of PROPERTY_ENRTY_*() macros
device property: Reuse property_entry_free_data()
device property: Move property_entry_free_data() upper
firmware: Fix up docs referring to FIRMWARE_IN_KERNEL
firmware: Drop FIRMWARE_IN_KERNEL Kconfig option
USB: serial: keyspan: Drop firmware Kconfig options
sysfs: remove DEBUG defines
sysfs: use SPDX identifiers
drivers: base: add coredump driver ops
sysfs: add attribute specification for /sysfs/devices/.../coredump
test_firmware: fix missing unlock on error in config_num_requests_store()
test_firmware: make local symbol test_fw_config static
sysfs: turn WARN() into pr_warn()
firmware: Fix a typo in fallback-mechanisms.rst
treewide: Use DEVICE_ATTR_WO
treewide: Use DEVICE_ATTR_RO
treewide: Use DEVICE_ATTR_RW
sysfs.h: Use octal permissions
component: add debugfs support
bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate
...
Pull networking updates from David Miller:
1) Significantly shrink the core networking routing structures. Result
of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf
2) Add netdevsim driver for testing various offloads, from Jakub
Kicinski.
3) Support cross-chip FDB operations in DSA, from Vivien Didelot.
4) Add a 2nd listener hash table for TCP, similar to what was done for
UDP. From Martin KaFai Lau.
5) Add eBPF based queue selection to tun, from Jason Wang.
6) Lockless qdisc support, from John Fastabend.
7) SCTP stream interleave support, from Xin Long.
8) Smoother TCP receive autotuning, from Eric Dumazet.
9) Lots of erspan tunneling enhancements, from William Tu.
10) Add true function call support to BPF, from Alexei Starovoitov.
11) Add explicit support for GRO HW offloading, from Michael Chan.
12) Support extack generation in more netlink subsystems. From Alexander
Aring, Quentin Monnet, and Jakub Kicinski.
13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From
Russell King.
14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso.
15) Many improvements and simplifications to the NFP driver bpf JIT,
from Jakub Kicinski.
16) Support for ipv6 non-equal cost multipath routing, from Ido
Schimmel.
17) Add resource abstration to devlink, from Arkadi Sharshevsky.
18) Packet scheduler classifier shared filter block support, from Jiri
Pirko.
19) Avoid locking in act_csum, from Davide Caratti.
20) devinet_ioctl() simplifications from Al viro.
21) More TCP bpf improvements from Lawrence Brakmo.
22) Add support for onlink ipv6 route flag, similar to ipv4, from David
Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits)
tls: Add support for encryption using async offload accelerator
ip6mr: fix stale iterator
net/sched: kconfig: Remove blank help texts
openvswitch: meter: Use 64-bit arithmetic instead of 32-bit
tcp_nv: fix potential integer overflow in tcpnv_acked
r8169: fix RTL8168EP take too long to complete driver initialization.
qmi_wwan: Add support for Quectel EP06
rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK
ipmr: Fix ptrdiff_t print formatting
ibmvnic: Wait for device response when changing MAC
qlcnic: fix deadlock bug
tcp: release sk_frag.page in tcp_disconnect
ipv4: Get the address of interface correctly.
net_sched: gen_estimator: fix lockdep splat
net: macb: Handle HRESP error
net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring
ipv6: addrconf: break critical section in addrconf_verify_rtnl()
ipv6: change route cache aging logic
i40e/i40evf: Update DESC_NEEDED value to reflect larger value
bnxt_en: cleanup DIM work on device shutdown
...
Pull livepatching updates from Jiri Kosina:
- handle 'infinitely'-long sleeping tasks, from Miroslav Benes
- remove 'immediate' feature, as it turns out it doesn't provide the
originally expected semantics, and brings more issues than value
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: add locking to force and signal functions
livepatch: Remove immediate feature
livepatch: force transition to finish
livepatch: send a fake signal to all blocking tasks
Do not build lib/bpf/bpf.o with this Makefile but use the one from the
library directory. This avoid making a buggy bpf.o file (e.g. missing
symbols).
This patch is useful if some code (e.g. Landlock tests) needs both the
bpf.o (from tools/lib/bpf) and the bpf_load.o (from samples/bpf).
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Avoid extra step of setting limit from cmdline and do it directly in
the program.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Put client sockets in blocking mode otherwise with sendmsg tests
its easy to overrun the socket buffers which results in the test
being aborted.
The original non-blocking was added to handle listen/accept with
a single thread the client/accepted sockets do not need to be
non-blocking.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a base test that does not use BPF hooks to test baseline case.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Report bytes/sec sent as well as total bytes. Useful to get rough
idea how different configurations and usage patterns perform with
sockmap.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently for SENDMSG tests first send completes then recv runs. This
does not work well for large data sizes and/or many iterations. So
fork the recv and send handler so that we run both send and recv. In
the future we can add a parameter to do more than a single fork of
tx/rx.
With this we can get many GBps of data which helps exercise the
sockmap code.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When testing BPF programs using sockmap I often want to have more
control over how sendmsg is exercised. This becomes even more useful
as new sockmap program types are added.
This adds a test type option to select type of test to run. Currently,
only "ping" and "sendmsg" are supported, but more can be added as
needed.
The new help argument gives the following,
Usage: ./sockmap --cgroup <cgroup_path>
options:
--help -h
--cgroup -c
--rate -r
--verbose -v
--iov_count -i
--length -l
--test -t
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
sockmap sample program takes arguments from cmd line but it reads them
in using offsets into the array. Because we want to add more arguments
in the future lets do proper argument handling.
Also refactor code to pull apart sock init and ping/pong test. This
allows us to add new tests in the future.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The xdp_redirect_cpu sample have some "builtin" monitoring of the
tracepoints for xdp_cpumap_*, but it is practical to have an external
tool that can monitor these transpoint as an easy way to troubleshoot
an application using XDP + cpumap.
Specifically I need such external tool when working on Suricata and
XDP cpumap redirect. Extend the xdp_monitor tool sample with
monitoring of these xdp_cpumap_* tracepoints. Model the output format
like xdp_redirect_cpu.
Given I needed to handle per CPU decoding for cpumap, this patch also
add per CPU info on the existing monitor events. This resembles part
of the builtin monitoring output from sample xdp_rxq_info. Thus, also
covering part of that sample in an external monitoring tool.
Performance wise, the cpumap tracepoints uses bulking, which cause
them to have very little overhead. Thus, they are enabled by default.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Improve the 'unknown reason' comment, with an actual explaination of why
the ctx pkt-data pointers need to be loaded after the helper function
bpf_xdp_adjust_meta(). Based on the explaination Daniel gave.
Fixes: 36e04a2d78 ("samples/bpf: xdp2skb_meta shows transferring info from XDP to SKB")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The trailing semicolon is an empty statement that does no operation.
Removing it since it doesn't do anything.
Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Immediate flag has been used to disable per-task consistency and patch
all tasks immediately. It could be useful if the patch doesn't change any
function or data semantics.
However, it causes problems on its own. The consistency problem is
currently broken with respect to immediate patches.
func a
patches 1i
2i
3
When the patch 3 is applied, only 2i function is checked (by stack
checking facility). There might be a task sleeping in 1i though. Such
task is migrated to 3, because we do not check 1i in
klp_check_stack_func() at all.
Coming atomic replace feature would be easier to implement and more
reliable without immediate.
Thus, remove immediate feature completely and save us from the problems.
Note that force feature has the similar problem. However it is
considered as a last resort. If used, administrator should not apply any
new live patches and should plan for reboot into an updated kernel.
The architectures would now need to provide HAVE_RELIABLE_STACKTRACE to
fully support livepatch.
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Creating a bpf sample that shows howto use the XDP 'data_meta'
infrastructure, created by Daniel Borkmann. Very few drivers support
this feature, but I wanted a functional sample to begin with, when
working on adding driver support.
XDP data_meta is about creating a communication channel between BPF
programs. This can be XDP tail-progs, but also other SKB based BPF
hooks, like in this case the TC clsact hook. In this sample I show
that XDP can store info named "mark", and TC/clsact chooses to use
this info and store it into the skb->mark.
It is a bit annoying that XDP and TC samples uses different tools/libs
when attaching their BPF hooks. As the XDP and TC programs need to
cooperate and agree on a struct-layout, it is best/easiest if the two
programs can be contained within the same BPF restricted-C file.
As the bpf-loader, I choose to not use bpf_load.c (or libbpf), but
instead wrote a bash shell scripted named xdp2skb_meta.sh, which
demonstrate howto use the iproute cmdline tools 'tc' and 'ip' for
loading BPF programs. To make it easy for first time users, the shell
script have command line parsing, and support --verbose and --dry-run
mode, if you just want to see/learn the tc+ip command syntax:
# ./xdp2skb_meta.sh --dev ixgbe2 --dry-run
# Dry-run mode: enable VERBOSE and don't call TC+IP
tc qdisc del dev ixgbe2 clsact
tc qdisc add dev ixgbe2 clsact
tc filter add dev ixgbe2 ingress prio 1 handle 1 bpf da obj ./xdp2skb_meta_kern.o sec tc_mark
# Flush XDP on device: ixgbe2
ip link set dev ixgbe2 xdp off
ip link set dev ixgbe2 xdp obj ./xdp2skb_meta_kern.o sec xdp_mark
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This sample program can be used for monitoring and reporting how many
packets per sec (pps) are received per NIC RX queue index and which
CPU processed the packet. In itself it is a useful tool for quickly
identifying RSS imbalance issues, see below.
The default XDP action is XDP_PASS in-order to provide a monitor
mode. For benchmarking purposes it is possible to specify other XDP
actions on the cmdline --action.
Output below shows an imbalance RSS case where most RXQ's deliver to
CPU-0 while CPU-2 only get packets from a single RXQ. Looking at
things from a CPU level the two CPUs are processing approx the same
amount, BUT looking at the rx_queue_index levels it is clear that
RXQ-2 receive much better service, than other RXQs which all share CPU-0.
Running XDP on dev:i40e1 (ifindex:3) action:XDP_PASS
XDP stats CPU pps issue-pps
XDP-RX CPU 0 900,473 0
XDP-RX CPU 2 906,921 0
XDP-RX CPU total 1,807,395
RXQ stats RXQ:CPU pps issue-pps
rx_queue_index 0:0 180,098 0
rx_queue_index 0:sum 180,098
rx_queue_index 1:0 180,098 0
rx_queue_index 1:sum 180,098
rx_queue_index 2:2 906,921 0
rx_queue_index 2:sum 906,921
rx_queue_index 3:0 180,098 0
rx_queue_index 3:sum 180,098
rx_queue_index 4:0 180,082 0
rx_queue_index 4:sum 180,082
rx_queue_index 5:0 180,093 0
rx_queue_index 5:sum 180,093
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2017-12-18
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Allow arbitrary function calls from one BPF function to another BPF function.
As of today when writing BPF programs, __always_inline had to be used in
the BPF C programs for all functions, unnecessarily causing LLVM to inflate
code size. Handle this more naturally with support for BPF to BPF calls
such that this __always_inline restriction can be overcome. As a result,
it allows for better optimized code and finally enables to introduce core
BPF libraries in the future that can be reused out of different projects.
x86 and arm64 JIT support was added as well, from Alexei.
2) Add infrastructure for tagging functions as error injectable and allow for
BPF to return arbitrary error values when BPF is attached via kprobes on
those. This way of injecting errors generically eases testing and debugging
without having to recompile or restart the kernel. Tags for opting-in for
this facility are added with BPF_ALLOW_ERROR_INJECTION(), from Josef.
3) For BPF offload via nfp JIT, add support for bpf_xdp_adjust_head() helper
call for XDP programs. First part of this work adds handling of BPF
capabilities included in the firmware, and the later patches add support
to the nfp verifier part and JIT as well as some small optimizations,
from Jakub.
4) The bpftool now also gets support for basic cgroup BPF operations such
as attaching, detaching and listing current BPF programs. As a requirement
for the attach part, bpftool can now also load object files through
'bpftool prog load'. This reuses libbpf which we have in the kernel tree
as well. bpftool-cgroup man page is added along with it, from Roman.
5) Back then commit e87c6bc385 ("bpf: permit multiple bpf attachments for
a single perf event") added support for attaching multiple BPF programs
to a single perf event. Given they are configured through perf's ioctl()
interface, the interface has been extended with a PERF_EVENT_IOC_QUERY_BPF
command in this work in order to return an array of one or multiple BPF
prog ids that are currently attached, from Yonghong.
6) Various minor fixes and cleanups to the bpftool's Makefile as well
as a new 'uninstall' and 'doc-uninstall' target for removing bpftool
itself or prior installed documentation related to it, from Quentin.
7) Add CONFIG_CGROUP_BPF=y to the BPF kernel selftest config file which is
required for the test_dev_cgroup test case to run, from Naresh.
8) Fix reporting of XDP prog_flags for nfp driver, from Jakub.
9) Fix libbpf's exit code from the Makefile when libelf was not found in
the system, also from Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the existing tests for ipv4 ipv6 erspan version 2.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a basic test for bpf_override_return to verify it works. We
override the main function for mounting a btrfs fs so it'll return
-ENOMEM and then make sure that trying to mount a btrfs fs will fail.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that the SPDX tag is in all kobject files, that identifies the
license in a specific and legally-defined manner. So the extra GPL text
wording can be removed as it is no longer needed at all.
This is done on a quest to remove the 700+ different ways that files in
the kernel describe the GPL license text. And there's unneeded stuff
like the address (sometimes incorrect) for the FSF which is never
needed.
No copyright headers or other non-license-description text was removed.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Update the kobject files files with the correct SPDX license identifier
based on the license text in the file itself. The SPDX identifier is a
legally binding shorthand, which can be used instead of the full boiler
plate text.
This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Small overlapping change conflict ('net' changed a line,
'net-next' added a line right afterwards) in flexcan.c
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2017-12-03
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Addition of a software model for BPF offloads in order to ease
testing code changes in that area and make semantics more clear.
This is implemented in a new driver called netdevsim, which can
later also be extended for other offloads. SR-IOV support is added
as well to netdevsim. BPF kernel selftests for offloading are
added so we can track basic functionality as well as exercising
all corner cases around BPF offloading, from Jakub.
2) Today drivers have to drop the reference on BPF progs they hold
due to XDP on device teardown themselves. Change this in order
to make XDP handling inside the drivers less error prone, and
move disabling XDP to the core instead, also from Jakub.
3) Misc set of BPF verifier improvements and cleanups as preparatory
work for upcoming BPF-to-BPF calls. Among others, this set also
improves liveness marking such that pruning can be slightly more
effective. Register and stack liveness information is now included
in the verifier log as well, from Alexei.
4) nfp JIT improvements in order to identify load/store sequences in
the BPF prog e.g. coming from memcpy lowering and optimizing them
through the NPU's command push pull (CPP) instruction, from Jiong.
5) Cleanups to test_cgrp2_attach2.c BPF sample code in oder to remove
bpf_prog_attach() magic values and replacing them with actual proper
attach flag instead, from David.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend existing tests for vxlan, gre, geneve, ipip, erspan,
to include ip6 gre and gretap tunnel.
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2017-12-02
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a compilation warning in xdp redirect tracepoint due to
missing bpf.h include that pulls in struct bpf_map, from Xie.
2) Limit the maximum number of attachable BPF progs for a given
perf event as long as uabi is not frozen yet. The hard upper
limit is now 64 and therefore the same as with BPF multi-prog
for cgroups. Also add related error checking for the sample
BPF loader when enabling and attaching to the perf event, from
Yonghong.
3) Specifically set the RLIMIT_MEMLOCK for the test_verifier_log
case, so that the test case can always pass and not fail in
some environments due to too low default limit, also from
Yonghong.
4) Fix up a missing license header comment for kernel/bpf/offload.c,
from Jakub.
5) Several fixes for bpftool, among others a crash on incorrect
arguments when json output is used, error message handling
fixes on unknown options and proper destruction of json writer
for some exit cases, all from Quentin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
load_bpf_file() should fail if ioctl with command
PERF_EVENT_IOC_ENABLE and PERF_EVENT_IOC_SET_BPF fails.
When they do fail, proper error messages are printed.
With this change, the below "syscall_tp" run shows that
the maximum number of bpf progs attaching to the same
perf tracepoint is indeed enforced.
$ ./syscall_tp -i 64
prog #0: map ids 4 5
...
prog #63: map ids 382 383
$ ./syscall_tp -i 65
prog #0: map ids 4 5
...
prog #64: map ids 388 389
ioctl PERF_EVENT_IOC_SET_BPF failed err Argument list too long
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Attach flag 1 == BPF_F_ALLOW_OVERRIDE; attach flag 2 == BPF_F_ALLOW_MULTI.
Update the calls to bpf_prog_attach() in test_cgrp2_attach2.c to use the
names over the magic numbers.
Fixes: 39323e788c ("samples/bpf: add multi-prog cgroup test case")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaC/4EAAoJEAhfPr2O5OEVQnsP/2JNpdLuzgwNp0p2gXrvK5pl
KOsA6Fld6RNmpuel8eHARbbDTPKF1Y1bvYXVo7lPhXb7KuM2IzG56VxoNech/5pX
eflKwnpV/Ns/ZMLYue7Rqdw0iZnjWESBWf5lzg9MvzwhZBaPlRwqu/aOJy360AZr
FnjKHtU/6WUIOCB8r0qLBDR/epoh7y2lKfjDTcEBrURrFEsTajdyd59npdMSIQqO
iUeeBVEIUKyytYDQNM/VOsBnh0G+2inLuykF8Nd6pYs8O0iUEUpZYwdGuwGUG1HB
VmCcRGU62efl5nu8zQMPnwAvjXwZAh8vbS0ha+B1vBJh1RwNVUz0kKIKEgAaOMZ3
zZa3NLfDP4cHgYtr2Xw2vSvJvDwQecmiItJKeZ/Id4cPy03TKEV1KEaHCQJHwbDn
RP/o9C+5gagMO/oIvZPQ+esVZXQ4prAzOdX53N7HPn4Wn+k4clkI0+hMvMGf67mo
EYOguCqbN2D0e11BLiPP1bRbGZRSI8I9xcKuhcw4ajJHbRRkrjl8EW7V6c8CuMkd
0Wj5oidFleJ0Vy+qQOPqXN1FwR7AbHNtI38JfWNz324AIrFCQERpfXVmKwRPZfl4
YXgGIA9fil3a01YJCtxc0PsXlRkveKJ8hKCLpjXbjNTh1oSbgrDxx5sMx9PO6WqJ
VOb6fL17rwTXlKV/GeU/
=d9nT
-----END PGP SIGNATURE-----
Merge tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- Documentation for digital TV (both kAPI and uAPI) are now in sync
with the implementation (except for legacy/deprecated ioctls). This
is a major step, as there were always a gap there
- New sensor driver: imx274
- New cec driver: cec-gpio
- New platform driver for rockship rga and tegra CEC
- New RC driver: tango-ir
- Several cleanups at atomisp driver
- Core improvements for RC, CEC, V4L2 async probing support and DVB
- Lots of drivers cleanup, fixes and improvements.
* tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (332 commits)
dvb_frontend: don't use-after-free the frontend struct
media: dib0700: fix invalid dvb_detach argument
media: v4l2-ctrls: Don't validate BITMASK twice
media: s5p-mfc: fix lockdep warning
media: dvb-core: always call invoke_release() in fe_free()
media: usb: dvb-usb-v2: dvb_usb_core: remove redundant code in dvb_usb_fe_sleep
media: au0828: make const array addr_list static
media: cx88: make const arrays default_addr_list and pvr2000_addr_list static
media: drxd: make const array fastIncrDecLUT static
media: usb: fix spelling mistake: "synchronuously" -> "synchronously"
media: ddbridge: fix build warnings
media: av7110: avoid 2038 overflow in debug print
media: Don't do DMA on stack for firmware upload in the AS102 driver
media: v4l: async: fix unregister for implicitly registered sub-device notifiers
media: v4l: async: fix return of unitialized variable ret
media: imx274: fix missing return assignment from call to imx274_mode_regs
media: camss-vfe: always initialize reg at vfe_set_xbar_cfg()
media: atomisp: make function calls cleaner
media: atomisp: get rid of storage_class.h
media: atomisp: get rid of wrong stddef.h include
...
Pull networking updates from David Miller:
"Highlights:
1) Maintain the TCP retransmit queue using an rbtree, with 1GB
windows at 100Gb this really has become necessary. From Eric
Dumazet.
2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.
3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
Lunn.
4) Add meter action support to openvswitch, from Andy Zhou.
5) Add a data meta pointer for BPF accessible packets, from Daniel
Borkmann.
6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.
7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.
8) More work to move the RTNL mutex down, from Florian Westphal.
9) Add 'bpftool' utility, to help with bpf program introspection.
From Jakub Kicinski.
10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
Dangaard Brouer.
11) Support 'blocks' of transformations in the packet scheduler which
can span multiple network devices, from Jiri Pirko.
12) TC flower offload support in cxgb4, from Kumar Sanghvi.
13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
Leitner.
14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.
15) Add RED qdisc offloadability, and use it in mlxsw driver. From
Nogah Frankel.
16) eBPF based device controller for cgroup v2, from Roman Gushchin.
17) Add some fundamental tracepoints for TCP, from Song Liu.
18) Remove garbage collection from ipv6 route layer, this is a
significant accomplishment. From Wei Wang.
19) Add multicast route offload support to mlxsw, from Yotam Gigi"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
tcp: highest_sack fix
geneve: fix fill_info when link down
bpf: fix lockdep splat
net: cdc_ncm: GetNtbFormat endian fix
openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
netem: remove unnecessary 64 bit modulus
netem: use 64 bit divide by rate
tcp: Namespace-ify sysctl_tcp_default_congestion_control
net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
ipv6: set all.accept_dad to 0 by default
uapi: fix linux/tls.h userspace compilation error
usbnet: ipheth: prevent TX queue timeouts when device not ready
vhost_net: conditionally enable tx polling
uapi: fix linux/rxrpc.h userspace compilation errors
net: stmmac: fix LPI transitioning for dwmac4
atm: horizon: Fix irq release error
net-sysfs: trigger netlink notification on ifalias change via sysfs
openvswitch: Using kfree_rcu() to simplify the code
openvswitch: Make local function ovs_nsh_key_attr_size() static
openvswitch: Fix return value check in ovs_meter_cmd_features()
...
Pull livepatching updates from Jiri Kosina:
- shadow variables support, allowing livepatches to associate new
"shadow" fields to existing data structures, from Joe Lawrence
- pre/post patch callbacks API, allowing livepatch writers to register
callbacks to be called before and after patch application, from Joe
Lawrence
* 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: __klp_disable_patch() should never be called for disabled patches
livepatch: Correctly call klp_post_unpatch_callback() in error paths
livepatch: add transition notices
livepatch: move transition "complete" notice into klp_complete_transition()
livepatch: add (un)patch callbacks
livepatch: Small shadow variable documentation fixes
livepatch: __klp_shadow_get_or_alloc() is local to shadow.c
livepatch: introduce shadow variable API