Daniel Borkmann says:
====================
pull-request: bpf-next 2017-12-18
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Allow arbitrary function calls from one BPF function to another BPF function.
As of today when writing BPF programs, __always_inline had to be used in
the BPF C programs for all functions, unnecessarily causing LLVM to inflate
code size. Handle this more naturally with support for BPF to BPF calls
such that this __always_inline restriction can be overcome. As a result,
it allows for better optimized code and finally enables to introduce core
BPF libraries in the future that can be reused out of different projects.
x86 and arm64 JIT support was added as well, from Alexei.
2) Add infrastructure for tagging functions as error injectable and allow for
BPF to return arbitrary error values when BPF is attached via kprobes on
those. This way of injecting errors generically eases testing and debugging
without having to recompile or restart the kernel. Tags for opting-in for
this facility are added with BPF_ALLOW_ERROR_INJECTION(), from Josef.
3) For BPF offload via nfp JIT, add support for bpf_xdp_adjust_head() helper
call for XDP programs. First part of this work adds handling of BPF
capabilities included in the firmware, and the later patches add support
to the nfp verifier part and JIT as well as some small optimizations,
from Jakub.
4) The bpftool now also gets support for basic cgroup BPF operations such
as attaching, detaching and listing current BPF programs. As a requirement
for the attach part, bpftool can now also load object files through
'bpftool prog load'. This reuses libbpf which we have in the kernel tree
as well. bpftool-cgroup man page is added along with it, from Roman.
5) Back then commit e87c6bc385 ("bpf: permit multiple bpf attachments for
a single perf event") added support for attaching multiple BPF programs
to a single perf event. Given they are configured through perf's ioctl()
interface, the interface has been extended with a PERF_EVENT_IOC_QUERY_BPF
command in this work in order to return an array of one or multiple BPF
prog ids that are currently attached, from Yonghong.
6) Various minor fixes and cleanups to the bpftool's Makefile as well
as a new 'uninstall' and 'doc-uninstall' target for removing bpftool
itself or prior installed documentation related to it, from Quentin.
7) Add CONFIG_CGROUP_BPF=y to the BPF kernel selftest config file which is
required for the test_dev_cgroup test case to run, from Naresh.
8) Fix reporting of XDP prog_flags for nfp driver, from Jakub.
9) Fix libbpf's exit code from the Makefile when libelf was not found in
the system, also from Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the existing tests for ipv4 ipv6 erspan version 2.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a basic test for bpf_override_return to verify it works. We
override the main function for mounting a btrfs fs so it'll return
-ENOMEM and then make sure that trying to mount a btrfs fs will fail.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Small overlapping change conflict ('net' changed a line,
'net-next' added a line right afterwards) in flexcan.c
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2017-12-03
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Addition of a software model for BPF offloads in order to ease
testing code changes in that area and make semantics more clear.
This is implemented in a new driver called netdevsim, which can
later also be extended for other offloads. SR-IOV support is added
as well to netdevsim. BPF kernel selftests for offloading are
added so we can track basic functionality as well as exercising
all corner cases around BPF offloading, from Jakub.
2) Today drivers have to drop the reference on BPF progs they hold
due to XDP on device teardown themselves. Change this in order
to make XDP handling inside the drivers less error prone, and
move disabling XDP to the core instead, also from Jakub.
3) Misc set of BPF verifier improvements and cleanups as preparatory
work for upcoming BPF-to-BPF calls. Among others, this set also
improves liveness marking such that pruning can be slightly more
effective. Register and stack liveness information is now included
in the verifier log as well, from Alexei.
4) nfp JIT improvements in order to identify load/store sequences in
the BPF prog e.g. coming from memcpy lowering and optimizing them
through the NPU's command push pull (CPP) instruction, from Jiong.
5) Cleanups to test_cgrp2_attach2.c BPF sample code in oder to remove
bpf_prog_attach() magic values and replacing them with actual proper
attach flag instead, from David.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend existing tests for vxlan, gre, geneve, ipip, erspan,
to include ip6 gre and gretap tunnel.
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2017-12-02
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a compilation warning in xdp redirect tracepoint due to
missing bpf.h include that pulls in struct bpf_map, from Xie.
2) Limit the maximum number of attachable BPF progs for a given
perf event as long as uabi is not frozen yet. The hard upper
limit is now 64 and therefore the same as with BPF multi-prog
for cgroups. Also add related error checking for the sample
BPF loader when enabling and attaching to the perf event, from
Yonghong.
3) Specifically set the RLIMIT_MEMLOCK for the test_verifier_log
case, so that the test case can always pass and not fail in
some environments due to too low default limit, also from
Yonghong.
4) Fix up a missing license header comment for kernel/bpf/offload.c,
from Jakub.
5) Several fixes for bpftool, among others a crash on incorrect
arguments when json output is used, error message handling
fixes on unknown options and proper destruction of json writer
for some exit cases, all from Quentin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
load_bpf_file() should fail if ioctl with command
PERF_EVENT_IOC_ENABLE and PERF_EVENT_IOC_SET_BPF fails.
When they do fail, proper error messages are printed.
With this change, the below "syscall_tp" run shows that
the maximum number of bpf progs attaching to the same
perf tracepoint is indeed enforced.
$ ./syscall_tp -i 64
prog #0: map ids 4 5
...
prog #63: map ids 382 383
$ ./syscall_tp -i 65
prog #0: map ids 4 5
...
prog #64: map ids 388 389
ioctl PERF_EVENT_IOC_SET_BPF failed err Argument list too long
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Attach flag 1 == BPF_F_ALLOW_OVERRIDE; attach flag 2 == BPF_F_ALLOW_MULTI.
Update the calls to bpf_prog_attach() in test_cgrp2_attach2.c to use the
names over the magic numbers.
Fixes: 39323e788c ("samples/bpf: add multi-prog cgroup test case")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The assert statement is supposed to be part of the else branch but the
curly braces were accidentally left off.
Fixes: 3e29cd0e65 ("xdp: Sample xdp program implementing ip forward")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The program was returning -1 in some cases which is not allowed
by the verifier any longer.
Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The program was returning -1 in some cases which is not allowed
by the verifier any longer.
Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The program was returning -1 in some cases which is not allowed
by the verifier any longer.
Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The program was returning -1 in some cases which is not allowed
by the verifier any longer.
Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The program was returning -1 in some cases which is not allowed
by the verifier any longer.
Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The program was returning -1 in some cases which is not allowed
by the verifier any longer.
Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a basic test for bpf_override_return to verify it works. We
override the main function for mounting a btrfs fs so it'll return
-ENOMEM and then make sure that trying to mount a btrfs fs will fail.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original patch had the wrong filename.
Fixes: bfdf756938 ("bpf: create samples/bpf/tcp_bpf.readme")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implements port to port forwarding with route table and arp table
lookup for ipv4 packets using bpf_redirect helper function and
lpm_trie map.
Signed-off-by: Christina Jacob <Christina.Jacob@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The purpose of this move is to use these files in bpf tests.
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Files removed in 'net-next' had their license header updated
in 'net'. We take the remove from 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Default rlimit RLIMIT_MEMLOCK is 64KB, causes bpf map failure.
e.g.
[root@labbpf]# ./xdp_redirect_map $(</sys/class/net/eth2/ifindex) \
> $(</sys/class/net/eth3/ifindex)
failed to create a map: 1 Operation not permitted
The failure is 100% when multiple xdp programs are running. Fix it.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Default rlimit RLIMIT_MEMLOCK is 64KB, causes bpf map failure.
e.g.
[root@lab bpf]#./xdp1 -N $(</sys/class/net/eth2/ifindex)
failed to create a map: 1 Operation not permitted
Fix it.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bpf sample program syscall_tp is modified to
show attachment of more than bpf programs
for a particular kernel tracepoint.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Readme file explaining how to create a cgroupv2 and attach one
of the tcp_*_kern.o socket_ops BPF program.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked_by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sample socket_ops BPF program to test the BPF helper function
bpf_getsocketops and the new socket_ops op BPF_SOCKET_OPS_BASE_RTT.
The program provides a base RTT of 80us when the calling flow is
within a DC (as determined by the IPV6 prefix) and the congestion
algorithm is "nv".
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked_by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This sample program show how to use cpumap and the associated
tracepoints.
It provides command line stats, which shows how the XDP-RX process,
cpumap-enqueue and cpumap kthread dequeue is cooperating on a per CPU
basis. It also utilize the xdp_exception and xdp_redirect_err
transpoints to allow users quickly to identify setup issues.
One issue with ixgbe driver is that the driver reset the link when
loading XDP. This reset the procfs smp_affinity settings. Thus,
after loading the program, these must be reconfigured. The easiest
workaround it to reduce the RX-queue to e.g. two via:
# ethtool --set-channels ixgbe1 combined 2
And then add CPUs above 0 and 1, like:
# xdp_redirect_cpu --dev ixgbe1 --prog 2 --cpu 2 --cpu 3 --cpu 4
Another issue with ixgbe is that the page recycle mechanism is tied to
the RX-ring size. And the default setting of 512 elements is too
small. This is the same issue with regular devmap XDP_REDIRECT.
To overcome this I've been using 1024 rx-ring size:
# ethtool -G ixgbe1 rx 1024 tx 1024
V3:
- whitespace cleanups
- bpf tracepoint cannot access top part of struct
V4:
- report on kthread sched events, according to tracepoint change
- report average bulk enqueue size
V5:
- bpf_map_lookup_elem on cpumap not allowed from bpf_prog
use separate map to mark CPUs not available
V6:
- correct kthread sched summary output
V7:
- Added a --stress-mode for concurrently changing underlying cpumap
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update to llvm excludes assembly instructions.
llvm git revision is below
commit 65fad7c26569 ("bpf: add inline-asm support")
This change will be part of llvm release 6.0
__ASM_SYSREG_H define is not required for native compile.
-target switch includes appropriate target specific files
while cross compiling
Tested on x86 and arm64.
Signed-off-by: Abhijit Ayarekar <abhijit.ayarekar@caviumnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bpf sample program trace_event is enhanced to use the new
helper to print out enabled/running time.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bpf sample program tracex6 is enhanced to use the new
helper to read enabled/running time as well.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Other concurrent running programs, like perf or the XDP program what
needed to be monitored, might take up part of the max locked memory
limit. Thus, the xdp_monitor tool have to set the RLIMIT_MEMLOCK to
RLIM_INFINITY, as it cannot determine a more sane limit.
Using the man exit(3) specified EXIT_FAILURE return exit code, and
correct other users too.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also monitor the tracepoint xdp_exception. This tracepoint is usually
invoked by the drivers. Programs themselves can activate this by
returning XDP_ABORTED, which will drop the packet but also trigger the
tracepoint. This is useful for distinguishing intentional (XDP_DROP)
vs. ebpf-program error cases that cased a drop (XDP_ABORTED).
Drivers also use this tracepoint for reporting on XDP actions that are
unknown to the specific driver. This can help the user to detect if a
driver e.g. doesn't implement XDP_REDIRECT yet.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first 8 bytes of the tracepoint context struct are not accessible
by the bpf code. This is a choice that dates back to the original
inclusion of this code.
See explaination in:
commit 98b5c2c65c ("perf, bpf: allow bpf programs attach to tracepoints")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
use BPF_PROG_QUERY command to strengthen test coverage
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
create 5 cgroups, attach 6 progs and check that progs are executed as:
cgrp1 (MULTI progs A, B) ->
cgrp2 (OVERRIDE prog C) ->
cgrp3 (MULTI prog D) ->
cgrp4 (OVERRIDE prog E) ->
cgrp5 (NONE prog F)
the event in cgrp5 triggers execution of F,D,A,B in that order.
if prog F is detached, the execution is E,D,A,B
if prog F and D are detached, the execution is E,A,B
if prog F, E and D are detached, the execution is C,A,B
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make local functions static to fix
HOSTCC samples/bpf/xdp_monitor_user.o
samples/bpf/xdp_monitor_user.c:64:7: warning: no previous prototype for ‘gettime’ [-Wmissing-prototypes]
__u64 gettime(void)
^~~~~~~
samples/bpf/xdp_monitor_user.c:209:6: warning: no previous prototype for ‘print_bpf_prog_info’ [-Wmissing-prototypes]
void print_bpf_prog_info(void)
^~~~~~~~~~~~~~~~~~~
Fixes: 3ffab54602 ("samples/bpf: xdp_monitor tool based on tracepoints")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch extends the libbpf to provide API support to
allow specifying BPF object name.
In tools/lib/bpf/libbpf, the C symbol of the function
and the map is used. Regarding section name, all maps are
under the same section named "maps". Hence, section name
is not a good choice for map's name. To be consistent with
map, bpf_prog also follows and uses its function symbol as
the prog's name.
This patch adds logic to collect function's symbols in libbpf.
There is existing codes to collect the map's symbols and no change
is needed.
The bpf_load_program_name() and bpf_map_create_name() are
added to take the name argument. For the other bpf_map_create_xxx()
variants, a name argument is directly added to them.
In samples/bpf, bpf_load.c in particular, the symbol is also
used as the map's name and the map symbols has already been
collected in the existing code. For bpf_prog, bpf_load.c does
not collect the function symbol name. We can consider to collect
them later if there is a need to continue supporting the bpf_load.c.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF samples fail to build when cross-compiling for ARM64 because of incorrect
pt_regs param selection. This is because clang defines __x86_64__ and
bpf_headers thinks we're building for x86. Since clang is building for the BPF
target, it shouldn't make assumptions about what target the BPF program is
going to run on. To fix this, lets pass ARCH so the header knows which target
the BPF program is being compiled for and can use the correct pt_regs code.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When cross compiling, bpf samples use HOSTCC for compiling the non-BPF part of
the sample, however what we really want is to use the cross compiler to build
for the cross target since that is what will load and run the BPF sample.
Detect this and compile samples correctly.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When cross-compiling the bpf sample map_perf_test for aarch64, I find that
__NR_getpgrp is undefined. This causes build errors. This syscall is deprecated
and requires defining __ARCH_WANT_SYSCALL_DEPRECATED. To avoid having to define
that, just use a different syscall (getppid) for the array map stress test.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a new case to test the LRU lookup performance.
At the beginning, the LRU map is fully loaded (i.e. the number of keys
is equal to map->max_entries). The lookup is done through key 0
to num_map_entries and then repeats from 0 again.
This patch also creates an anonymous struct to properly
name the test params in stress_lru_hmap_alloc() in map_perf_test_kern.c.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update cgrp2 bpf sock tests to check that device, mark and priority
can all be set on a socket via bpf programs attached to a cgroup.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add option to dump socket settings. Will be used in the next patch
to verify bpf programs are correctly setting mark, priority and
device based on the cgroup attachment for the program run.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add option to detach programs from a cgroup.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update sock test to set mark and priority on socket create.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This tool xdp_monitor demonstrate how to use the different xdp_redirect
tracepoints xdp_redirect{,_map}{,_err} from a BPF program.
The default mode is to only monitor the error counters, to avoid
affecting the per packet performance. Tracepoints comes with a base
overhead of 25 nanosec for an attached bpf_prog, and 48 nanosec for
using a full perf record (with non-matching filter). Thus, default
loading the --stats mode could affect the maximum performance.
This version of the tool is very simple and count all types of errors
as one. It will be natural to extend this later with the different
types of errors that can occur, which should help users quickly
identify common mistakes.
Because the TP_STRUCT was kept in sync all the tracepoints loads the
same BPF code. It would also be natural to extend the map version to
demonstrate how the map information could be used.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For supporting XDP_REDIRECT, a device driver must (obviously)
implement the "TX" function ndo_xdp_xmit(). An additional requirement
is you cannot TX out a device, unless it also have a xdp bpf program
attached. This dependency is caused by the driver code need to setup
XDP resources before it can ndo_xdp_xmit.
Update bpf samples xdp_redirect and xdp_redirect_map to automatically
attach a dummy XDP program to the configured ifindex_out device. Use
the XDP flag XDP_FLAGS_UPDATE_IF_NOEXIST on the dummy load, to avoid
overriding an existing XDP prog on the device.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend existing tests for vxlan, gre, geneve, ipip to
include ERSPAN tunnel.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the needed changes to allow each process of
the INNER_LRU_HASH_PREALLOC test to provide its numa node id
when creating the lru map.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This program binds a program to a cgroup and then matches hard
coded IP addresses and adds these to a sockmap.
This will receive messages from the backend and send them to
the client.
client:X <---> frontend:10000 client:X <---> backend:10001
To keep things simple this is only designed for 1:1 connections
using hard coded values. A more complete example would allow many
backends and clients.
To run,
# sockmap <cgroup2_dir>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two minor conflicts in virtio_net driver (bug fix overlapping addition
of a helper) and MAINTAINERS (new driver edit overlapping revamp of
PHY entry).
Signed-off-by: David S. Miller <davem@davemloft.net>
test_tunnel_bpf.sh fails to remove the vxlan11 tunnel device, causing the
next geneve tunnelling test case fails. In addition, the geneve reserved bit
in tcbpf2_kern.c should be zero, according to the RFC.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements a sample program for testing bpf_redirect. It reports
the number of packets redirected per second and as input takes the
ifindex of the device to run the xdp program on and the ifindex of the
interface to redirect packets to.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With latest net-next:
====
clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -Isamples/bpf \
-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
-Wno-compare-distinct-pointer-types \
-Wno-gnu-variable-sized-type-not-at-end \
-Wno-address-of-packed-member -Wno-tautological-compare \
-Wno-unknown-warning-option \
-O2 -emit-llvm -c samples/bpf/tcp_synrto_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/tcp_synrto_kern.o
samples/bpf/tcp_synrto_kern.c:20:10: fatal error: 'bpf_endian.h' file not found
^~~~~~~~~~~~~~
1 error generated.
====
net has the same issue.
Add support for ntohl and htonl in tools/testing/selftests/bpf/bpf_endian.h.
Also move bpf_helpers.h from samples/bpf to selftests/bpf and change
compiler include logic so that programs in samples/bpf can access the headers
in selftests/bpf, but not the other way around.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function load_bpf_file ignores the return value of
load_and_attach(), so even if load_and_attach() returns an error,
load_bpf_file() will return 0.
Now, load_bpf_file() can call load_and_attach() multiple times and some
can succeed and some could fail. I think the correct behavor is to
return error on the first failed load_and_attach().
v2: Added missing SOB
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sample BPF program, tcp_clamp_kern.c, to demostrate the use
of setting the sndcwnd clamp. This program assumes that if the
first 5.5 bytes of the host's IPv6 addresses are the same, then
the hosts are in the same datacenter and sets sndcwnd clamp to
100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer
sizes to 150KB.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sample BPF program that assumes hosts are far away (i.e. large RTTs)
and sets initial cwnd and initial receive window to 40 packets,
send and receive buffers to 1.5MB.
In practice there would be a test to insure the hosts are actually
far enough away.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sample BPF program that sets congestion control to dctcp when both hosts
are within the same datacenter. In this example that is assumed to be
when they have the first 5.5 bytes of their IPv6 address are the same.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains a BPF program to set initial receive window to
40 packets and send and receive buffers to 1.5MB. This would usually
be done after doing appropriate checks that indicate the hosts are
far enough away (i.e. large RTT).
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added support for calling a subset of socket setsockopts from
BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather
than making the changes to call the socket setsockopt function because
the changes required would have been larger.
The ops supported are:
SO_RCVBUF
SO_SNDBUF
SO_MAX_PACING_RATE
SO_PRIORITY
SO_RCVLOWAT
SO_MARK
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sample bpf program, tcp_rwnd_kern.c, sets the initial
advertized window to 40 packets in an environment where
distinct IPv6 prefixes indicate that both hosts are not
in the same data center.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sample BPF program, tcp_synrto_kern.c, sets the SYN and SYN-ACK
RTOs to 10ms when both hosts are within the same datacenter (i.e.
small RTTs) in an environment where common IPv6 prefixes indicate
both hosts are in the same data center.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The program load_sock_ops can be used to load sock_ops bpf programs and
to attach it to an existing (v2) cgroup. It can also be used to detach
sock_ops programs.
Examples:
load_sock_ops [-l] <cg-path> <prog filename>
Load and attaches a sock_ops program at the specified cgroup.
If "-l" is used, the program will continue to run to output the
BPF log buffer.
If the specified filename does not end in ".o", it appends
"_kern.o" to the name.
load_sock_ops -r <cg-path>
Detaches the currently attached sock_ops program from the
specified cgroup.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.). It uses the
existing bpf cgroups infrastructure so the programs can be attached per
cgroup with full inheritance support. The program will be called at
appropriate times to set relevant connections parameters such as buffer
sizes, SYN and SYN-ACK RTOs, etc., based on connection information such
as IP addresses, port numbers, etc.
Alghough there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it oes not require
application changes and it can be updated easily at any time.
Although the bpf cgroup framework already contains a sock related
program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type
(BPF_PROG_TYPE_SOCK_OPS) beccause the existing type expects to be called
only once during the connections's lifetime. In contrast, the new
program type will be called multiple times from different places in the
network stack code. For example, before sending SYN and SYN-ACKs to set
an appropriate timeout, when the connection is established to set
congestion control, etc. As a result it has "op" field to specify the
type of operation requested.
The purpose of this new program type is to simplify setting connection
parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
easy to use facebook's internal IPv6 addresses to determine if both hosts
of a connection are in the same datacenter. Therefore, it is easy to
write a BPF program to choose a small SYN RTO value when both hosts are
in the same datacenter.
This patch only contains the framework to support the new BPF program
type, following patches add the functionality to set various connection
parameters.
This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS
and a new bpf syscall command to load a new program of this type:
BPF_PROG_LOAD_SOCKET_OPS.
Two new corresponding structs (one for the kernel one for the user/BPF
program):
/* kernel version */
struct bpf_sock_ops_kern {
struct sock *sk;
__u32 op;
union {
__u32 reply;
__u32 replylong[4];
};
};
/* user version
* Some fields are in network byte order reflecting the sock struct
* Use the bpf_ntohl helper macro in samples/bpf/bpf_endian.h to
* convert them to host byte order.
*/
struct bpf_sock_ops {
__u32 op;
union {
__u32 reply;
__u32 replylong[4];
};
__u32 family;
__u32 remote_ip4; /* In network byte order */
__u32 local_ip4; /* In network byte order */
__u32 remote_ip6[4]; /* In network byte order */
__u32 local_ip6[4]; /* In network byte order */
__u32 remote_port; /* In network byte order */
__u32 local_port; /* In host byte horder */
};
Currently there are two types of ops. The first type expects the BPF
program to return a value which is then used by the caller (or a
negative value to indicate the operation is not supported). The second
type expects state changes to be done by the BPF program, for example
through a setsockopt BPF helper function, and they ignore the return
value.
The reply fields of the bpf_sockt_ops struct are there in case a bpf
program needs to return a value larger than an integer.
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Checks are added to the existing sockex3 and test_map_in_map test.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
tracex5_kern.c build failed with the following error message:
../samples/bpf/tracex5_kern.c:12:10: fatal error: 'syscall_nrs.h' file not found
#include "syscall_nrs.h"
The generated file syscall_nrs.h is put in build/samples/bpf directory,
but this directory is not in include path, hence build failed.
The fix is to add $(obj) into the clang compilation path.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two problems:
1) In MIPS the __NR_* macros expand to an expression, this causes the
sections of the object file to be named like:
.
.
.
[ 5] kprobe/(5000 + 1) PROGBITS 0000000000000000 000160 ...
[ 6] kprobe/(5000 + 0) PROGBITS 0000000000000000 000258 ...
[ 7] kprobe/(5000 + 9) PROGBITS 0000000000000000 000348 ...
.
.
.
The fix here is to use the "asm_offsets" trick to evaluate the macros
in the C compiler and generate a header file with a usable form of the
macros.
2) MIPS syscall numbers start at 5000, so we need a bigger map to hold
the sub-programs.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
$ trace_event
tests attaching BPF program to HW_CPU_CYCLES, SW_CPU_CLOCK, HW_CACHE_L1D and other events.
It runs 'dd' in the background while bpf program collects user and kernel
stack trace on counter overflow.
User space expects to see sys_read and sys_write in the kernel stack.
$ tracex6
tests reading of various perf counters from BPF program.
Both tests were refactored to increase coverage and be more accurate.
Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
An eBPF ELF file generated with LLVM can contain several program
section, which can be used for bpf tail calls. The bpf prog file
descriptors are accessible via array prog_fd[].
At-least XDP samples assume ordering, and uses prog_fd[0] is the main
XDP program to attach. The actual order of array prog_fd[] depend on
whether or not a bpf program section is referencing any maps or not.
Not using a map result in being loaded/processed after all other
prog section. Thus, this can lead to some very strange and hard to
debug situation, as the user can only see a FD and cannot correlated
that with the ELF section name.
The fix is rather simple, and even removes duplicate memcmp code.
Simply load program sections as the last step, instead of
load_and_attach while processing the relocation section.
When working with tail calls, it become even more essential that the
order of prog_fd[] is consistant, like the current dependency of the
map_fd[] order.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahid Habib noticed that when xdp1 was killed from a different console the xdp
program was not cleaned-up properly in the kernel and it continued to forward
traffic.
Most of the applications in samples/bpf cleanup properly, but only when getting
SIGINT. Since kill defaults to using SIGTERM, add support to cleanup when the
application receives either SIGINT or SIGTERM.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reported-by: Shahid Habib <shahid.habib@broadcom.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit b5cdae3291 ("net: Generic XDP") we automatically fall
back to a generic XDP variant if the driver does not support native
XDP. Allow for an option where the user can specify that always the
native XDP variant should be selected and in case it's not supported
by a driver, just bail out.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giving *_user.c side tools access to map_data[] provides easier
access to information on the maps being loaded. Still provide
the guarantee that the order maps are being defined in inside the
_kern.c file corresponds with the order in the array. Now user
tools are not blind, but can inspect and verify the maps that got
loaded from the ELF binary.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do this change before others start to use this callback.
Change map_perf_test_user.c which seems to be the only user.
This patch extends capabilities of commit 9fd63d05f3 ("bpf:
Allow bpf sample programs (*_user.c) to change bpf_map_def").
Give fixup callback access to struct bpf_map_data, instead of
only stuct bpf_map_def. This add flexibility to allow userspace
to reassign the map file descriptor. This is very useful when
wanting to share maps between several bpf programs.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch does proper parsing of the ELF "maps" section, in-order to
be both backwards and forwards compatible with changes to the map
definition struct bpf_map_def, which gets compiled into the ELF file.
The assumption is that new features with value zero, means that they
are not in-use. For backward compatibility where loading an ELF file
with a smaller struct bpf_map_def, only copy objects ELF size, leaving
rest of loaders struct zero. For forward compatibility where ELF file
have a larger struct bpf_map_def, only copy loaders own struct size
and verify that rest of the larger struct is zero, assuming this means
the newer feature was not activated, thus it should be safe for this
older loader to load this newer ELF file.
Fixes: fb30d4b712 ("bpf: Add tests for map-in-map")
Fixes: 409526bea3c3 ("samples/bpf: bpf_load.c detect and abort if ELF maps section size is wrong")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Needed to adjust max locked memory RLIMIT_MEMLOCK for testing these bpf samples
as these are using more and larger maps than can fit in distro default 64Kbytes limit.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following warnings triggered by 51570a5ab2 ("A Sample of
using socket cookie and uid for traffic monitoring"):
In file included from /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:54:0:
/home/foo/net-next/samples/bpf/cookie_uid_helper_example.c: In function 'prog_load':
/home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:119:27: warning: overflow in implicit constant conversion [-Woverflow]
-32 + offsetof(struct stats, uid)),
^
/home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
.off = OFF, \
^
/home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:121:27: warning: overflow in implicit constant conversion [-Woverflow]
-32 + offsetof(struct stats, packets), 1),
^
/home/foo/net-next/samples/bpf/libbpf.h:155:12: note: in definition of macro 'BPF_ST_MEM'
.off = OFF, \
^
/home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:129:27: warning: overflow in implicit constant conversion [-Woverflow]
-32 + offsetof(struct stats, bytes)),
^
/home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
.off = OFF, \
^
HOSTLD /home/foo/net-next/samples/bpf/per_socket_stats_example
Fixes: 51570a5ab2 ("A Sample of using socket cookie and uid for traffic monitoring")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The xdp_tx_iptunnel program can be terminated in two ways, after
N-seconds or via Ctrl-C SIGINT. The SIGINT code path does not
handle detatching the correct XDP program, in-case the program
was attached with XDP_FLAGS_SKB_MODE.
Fix this by storing the XDP flags as a global variable, which is
available for the SIGINT handler function.
Fixes: 3993f2cb98 ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel side of XDP_FLAGS_SKB_MODE is unsigned, and the rtnetlink
IFLA_XDP_FLAGS is defined as NLA_U32. Thus, userspace programs under
samples/bpf/ should use the correct type.
Fixes: 3993f2cb98 ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The struct bpf_map_def was extended in commit fb30d4b712 ("bpf: Add tests
for map-in-map") with member unsigned int inner_map_idx. This changed the size
of the maps section in the generated ELF _kern.o files.
Unfortunately the loader in bpf_load.c does not detect or handle this. Thus,
older _kern.o files became incompatible, and caused hard-to-debug errors
where the syscall validation rejected BPF_MAP_CREATE request.
This patch only detect the situation and aborts load_bpf_file(). It also
add code comments warning people that read this loader for inspiration
for these pitfalls.
Fixes: fb30d4b712 ("bpf: Add tests for map-in-map")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add option to xdp1 and xdp_tx_iptunnel to insert xdp program in
SKB_MODE:
- update set_link_xdp_fd to take a flags argument that is added to the
RTM_SETLINK message
- Add -S option to xdp1 and xdp_tx_iptunnel user code. When passed in
XDP_FLAGS_SKB_MODE is set in the flags arg passed to set_link_xdp_fd
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following warning
samples/bpf/test_lru_dist.c:28:0: warning: "offsetof" redefined
#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
In file included from ./tools/lib/bpf/bpf.h:25:0,
from samples/bpf/libbpf.h:5,
from samples/bpf/test_lru_dist.c:24:
/usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/stddef.h:417:0: note: this is the location of the previous definition
#define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following warning
samples/bpf/cookie_uid_helper_example.c: At top level:
samples/bpf/cookie_uid_helper_example.c:276:6: warning: no previous prototype for ‘finish’ [-Wmissing-prototypes]
void finish(int ret)
^~~~~~
HOSTLD samples/bpf/per_socket_stats_example
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I was initially going to remove '-Wno-address-of-packed-member' because I
thought it was not supposed to be there but Daniel suggested using
'-Wno-unknown-warning-option'.
This silences several warnings similiar to the one below
warning: unknown warning option '-Wno-address-of-packed-member' [-Wunknown-warning-option]
1 warning generated.
clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I./include
-I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h \
-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
-Wno-compare-distinct-pointer-types \
-Wno-gnu-variable-sized-type-not-at-end \
-Wno-address-of-packed-member -Wno-tautological-compare \
-O2 -emit-llvm -c samples/bpf/xdp_tx_iptunnel_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/xdp_tx_iptunnel_kern.o
$ clang --version
clang version 3.9.1 (tags/RELEASE_391/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a map-in-map LRU example.
If we know only a subset of cores will use the
LRU, we can allocate a common LRU list per targeting core
and store it into an array-of-hashs.
It allows using the common LRU map with map-update performance
comparable to the BPF_F_NO_COMMON_LRU map but without wasting memory
on the unused cores that we know they will never access the LRU map.
BPF_F_NO_COMMON_LRU:
> map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
9234314 (9.23M/s)
map-in-map LRU:
> map_perf_test 512 8 1260000 80000000 | awk '{sum += $3}END{print sum}'
9962743 (9.96M/s)
Notes that the max_entries for the map-in-map LRU test is 1260000 which
is the max_entries for each inner LRU map. 8 processes have been
started, so 8 * 1260000 = 10080000 (~10M) which is close to what is
used in the BPF_F_NO_COMMON_LRU test.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current bpf_map_def is statically defined during compile
time. This patch allows the *_user.c program to change it during
runtime. It is done by adding load_bpf_file_fixup_map() which
takes a callback. The callback will be called before creating
each map so that it has a chance to modify the bpf_map_def.
The current usecase is to change max_entries in map_perf_test.
It is interesting to test with a much bigger map size in
some cases (e.g. the following patch on bpf_lru_map.c).
However, it is hard to find one size to fit all testing
environment. Hence, it is handy to take the max_entries
as a cmdline arg and then configure the bpf_map_def during
runtime.
This patch adds two cmdline args. One is to configure
the map's max_entries. Another is to configure the max_cnt
which controls how many times a syscall is called.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
One more LRU test will be added later in this patch series.
In this patch, we first move all existing LRU map tests into
a single syscall (connect) first so that the future new
LRU test can be added without hunting another syscall.
One of the map name is also changed from percpu_lru_hash_map
to nocommon_lru_hash_map to avoid the confusion with percpu_hash_map.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added a per socket traffic monitoring option to illustrate the usage
of new getsockopt SO_COOKIE. The program is based on the socket traffic
monitoring program using xt_eBPF and in the new option the data entry
can be directly accessed using socket cookie. The cookie retrieved
allow us to lookup an element in the eBPF for a specific socket.
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a sample program to demostrate the possible usage of
get_socket_cookie and get_socket_uid helper function. The program will
store bytes and packets counting of in/out traffic monitored by iptables
and store the stats in a bpf map in per socket base. The owner uid of
the socket will be stored as part of the data entry. A shell script for
running the program is also included.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>