Commit Graph

424 Commits

Author SHA1 Message Date
Linus Torvalds a0cb2b5c39 Fix tracing sample code warning.
Commit 6575257c60 ("tracing/samples: Fix creation and deletion of
simple_thread_fn creation") introduced a new warning due to using a
boolean as a counter.

Just make it "int".

Fixes: 6575257c60 ("tracing/samples: Fix creation and deletion of simple_thread_fn creation")
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-27 20:35:31 -07:00
Linus Torvalds b5ac3beb5a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A little more than usual this time around. Been travelling, so that is
  part of it.

  Anyways, here are the highlights:

   1) Deal with memcontrol races wrt. listener dismantle, from Eric
      Dumazet.

   2) Handle page allocation failures properly in nfp driver, from Jaku
      Kicinski.

   3) Fix memory leaks in macsec, from Sabrina Dubroca.

   4) Fix crashes in pppol2tp_session_ioctl(), from Guillaume Nault.

   5) Several fixes in bnxt_en driver, including preventing potential
      NVRAM parameter corruption from Michael Chan.

   6) Fix for KRACK attacks in wireless, from Johannes Berg.

   7) rtnetlink event generation fixes from Xin Long.

   8) Deadlock in mlxsw driver, from Ido Schimmel.

   9) Disallow arithmetic operations on context pointers in bpf, from
      Jakub Kicinski.

  10) Missing sock_owned_by_user() check in sctp_icmp_redirect(), from
      Xin Long.

  11) Only TCP is supported for sockmap, make that explicit with a
      check, from John Fastabend.

  12) Fix IP options state races in DCCP and TCP, from Eric Dumazet.

  13) Fix panic in packet_getsockopt(), also from Eric Dumazet.

  14) Add missing locked in hv_sock layer, from Dexuan Cui.

  15) Various aquantia bug fixes, including several statistics handling
      cures. From Igor Russkikh et al.

  16) Fix arithmetic overflow in devmap code, from John Fastabend.

  17) Fix busted socket memory accounting when we get a fault in the tcp
      zero copy paths. From Willem de Bruijn.

  18) Don't leave opt->tot_len uninitialized in ipv6, from Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  stmmac: Don't access tx_q->dirty_tx before netif_tx_lock
  ipv6: flowlabel: do not leave opt->tot_len with garbage
  of_mdio: Fix broken PHY IRQ in case of probe deferral
  textsearch: fix typos in library helpers
  rxrpc: Don't release call mutex on error pointer
  net: stmmac: Prevent infinite loop in get_rx_timestamp_status()
  net: stmmac: Fix stmmac_get_rx_hwtstamp()
  net: stmmac: Add missing call to dev_kfree_skb()
  mlxsw: spectrum_router: Configure TIGCR on init
  mlxsw: reg: Add Tunneling IPinIP General Configuration Register
  net: ethtool: remove error check for legacy setting transceiver type
  soreuseport: fix initialization race
  net: bridge: fix returning of vlan range op errors
  sock: correct sk_wmem_queued accounting on efault in tcp zerocopy
  bpf: add test cases to bpf selftests to cover all access tests
  bpf: fix pattern matches for direct packet access
  bpf: fix off by one for range markings with L{T, E} patterns
  bpf: devmap fix arithmetic overflow in bitmap_size calculation
  net: aquantia: Bad udp rate on default interrupt coalescing
  net: aquantia: Enable coalescing management via ethtool interface
  ...
2017-10-21 22:44:48 -04:00
John Fastabend 34f79502bb bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb region
SK_SKB BPF programs are run from the socket/tcp context but early in
the stack before much of the TCP metadata is needed in tcp_skb_cb. So
we can use some unused fields to place BPF metadata needed for SK_SKB
programs when implementing the redirect function.

This allows us to drop the preempt disable logic. It does however
require an API change so sk_redirect_map() has been updated to
additionally provide ctx_ptr to skb. Note, we do however continue to
disable/enable preemption around actual BPF program running to account
for map updates.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20 13:01:29 +01:00
Linus Torvalds 503f7e297d Testing a new trace event format, I triggered a bug by doing:
# modprobe trace-events-sample
   # echo 1 > /sys/kernel/debug/tracing/events/sample-trace/enable
   # rmmod trace-events-sample
 
 This would cause an oops. The issue is that I added another trace
 event sample that reused a reg function of another trace event to
 create a thread to call the tracepoints. The problem was that the
 reg function couldn't handle nested calls (reg; reg; unreg; unreg;)
 and created two threads (instead of one) and only removed one
 on exit.
 
 This isn't a critical bug as the bug is only in sample code. But sample
 code should be free of known bugs to prevent others from copying
 it. This is why this is also marked for stable.
 -----BEGIN PGP SIGNATURE-----
 
 iQHIBAABCgAyFiEEPm6V/WuN2kyArTUe1a05Y9njSUkFAlnmcA0UHHJvc3RlZHRA
 Z29vZG1pcy5vcmcACgkQ1a05Y9njSUkvqAwAhY/W7OF2JG/TV2cHNmHZqTEgQOFz
 59EXWI7EsnQzcKTm14rWuR477iK+Q6r2YEzpGajHhBcOy8KjpzYM2+Oj3qzn6ovc
 dyMEwr2wsaVb52B0h2X9J7fsfzZtL0KIIb6Y/wSz/H28BTHMi0xJUJLDkH4W9jrB
 g/3vbKHLpbr4hg8msMPoLSExe4seZeHeB+6VQ+G3VHuIIPlCZOSCnXH05pd8AqC6
 Y9cJzKqlivNPJFWUDnref0yE1aK/KuRsC+DpceJmP/K1+uiYhFMKCwlpWz/kI2eQ
 z02pYugUqck007NWCSdr1xTYWJQBEx4Ke19XKFhtXs2o5a/fgnVZoLYXUagV/QiT
 VoNDHnuqqnTESySMK38dQvekdj5lPU80ycy+Dsgp9RSCW804MBvaXswoMT1095OV
 zxyMAIsbSof2zgUqjUQKEFU75usjxpd1ifl6CoXlfH8hmKEvvdZmqvEypUKakyxh
 0D9+DcGTyOAg9MLEpMdyaW7+F0CVLqwVToBM
 =xSts
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Testing a new trace event format, I triggered a bug by doing:

    # modprobe trace-events-sample
    # echo 1 > /sys/kernel/debug/tracing/events/sample-trace/enable
    # rmmod trace-events-sample

  This would cause an oops. The issue is that I added another trace
  event sample that reused a reg function of another trace event to
  create a thread to call the tracepoints. The problem was that the reg
  function couldn't handle nested calls (reg; reg; unreg; unreg;) and
  created two threads (instead of one) and only removed one on exit.

  This isn't a critical bug as the bug is only in sample code. But
  sample code should be free of known bugs to prevent others from
  copying it. This is why this is also marked for stable"

* tag 'trace-v4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/samples: Fix creation and deletion of simple_thread_fn creation
2017-10-18 06:43:30 -04:00
Steven Rostedt (VMware) 6575257c60 tracing/samples: Fix creation and deletion of simple_thread_fn creation
Commit 7496946a8 ("tracing: Add samples of DECLARE_EVENT_CLASS() and
DEFINE_EVENT()") added template examples for all the events. It created a
DEFINE_EVENT_FN() example which reused the foo_bar_reg and foo_bar_unreg
functions.

Enabling both the TRACE_EVENT_FN() and DEFINE_EVENT_FN() example trace
events caused the foo_bar_reg to be called twice, creating the test thread
twice. The foo_bar_unreg would remove it only once, even if it was called
multiple times, leaving a thread existing when the module is unloaded,
causing an oops.

Add a ref count and allow foo_bar_reg() and foo_bar_unreg() be called by
multiple trace events.

Cc: stable@vger.kernel.org
Fixes: 7496946a8 ("tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT()")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-10-17 14:55:24 -04:00
Linus Torvalds c0da4fa0d1 media updates for v4.14-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZsSBoAAoJEAhfPr2O5OEVDc4QAJZSuVYmyLgvtmPxhyqgCvkz
 I0DmWM4ZtK2VT/xJ/AA23z8IiLKi2+pDC0Xx6/aIiA665cyl3oPUdkKIaHW9Z6+A
 fV8gSFkmGkluQb9mP/KdHYI2oSeEv2ivCa1kfaApYcoBa904z8uU++z15Iu5p/+m
 fjpc2vnc9rax0Vuwmgv7p1CL4j4e/ja0siCSCGbu2ad50KqP4ytnBooNPQOQt89D
 L+Av5MeGml/CTUUnAFjWfSmQ72Ht8GhoBBKc6wGoq9x3GTckDDTqy8BAqGt4UQnu
 fR0mb71zuSVmTjxRe7tc/74m3ReaeSHzQeHJhjdQslvNmV3RVQgk/6CCsmqNEegr
 rbC3glQCM+gp5YywCjRL6DCPsoqvjexLtPQjMZIGYxgSYQUyXGOxilgmj9+73761
 6aOl0nqdgN+vlWzaSeDF9EQxRsc+cCq/Po8/xuPE/Pzs6zTQwU+6b+ADLf9jCyDP
 LTC49wOj24SoWiTlG1FTct2ogZ3h5wNPWlurBtmyiFJn+43RpsH5IW9wLilCjeiE
 6JeCWEIBglCCq/TVCzETKNSaixDL6/lMQ9uRdCpIO4VLyoS6S9pZASNPBmQ1h7h/
 oTjYDeWirIthNOccstbBoJQYSX62CqAIW3wq5ME6PAgM+ioiLXLYk0fV3yBKoBNW
 Z0SBeTcuPxWmfzuxMtik
 =fNM2
 -----END PGP SIGNATURE-----

Merge tag 'media/v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:
 "Brazil's Independence Day pull request :-)

  This is one of the biggest media pull requests, with 625 patches
  affecting almost all parts of media (RC, DVB, V4L2, CEC, docs).

  This contains:

   - A lot of new drivers:
     * DVB frontends: mxl5xx, stv0910, stv6111;
     * camera flash: as3645a led driver;
     * HDMI receiver: adv748X;
     * camera sensor: Omnivision 6650 5M driver (ov6650);
     * HDMI CEC: ao-cec meson driver;
     * V4L2: Qualcom camss driver;
     * Remote controller: gpio-ir-tx, pwm-ir-tx and zx-irdec drivers.

   - The DDbridge DVB driver got a massive update, with makes it in sync
     with modern hardware from that vendor;

   - There's an important milestone on this series: the DVB
     documentation was written in 2003, but only started to be updated
     in 2007. It also used to contain several gaps from the time it was
     kept out of tree, mentioning error codes and device nodes that
     never existed upstream. On this series, it received a massive
     update: all non-deprecated digital TV APIs are now in sync with the
     current implementation;

   - Some DVB APIs that aren't used by any upstream driver got removed;

   - Other parts of the media documentation algo got updated, fixing
     some bugs on its PDF output and making it compatible with Sphinx
     version 1.6.

     As the number of hacks required to build PDF output reduced, I hope
     we'll have less troubles as newer versions of our documentation
     toolchain are released (famous last words);

   - As usual, lots of driver cleanups and improvements"

* tag 'media/v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (624 commits)
  media: leds: as3645a: add V4L2_FLASH_LED_CLASS dependency
  media: get rid of removed DMX_GET_CAPS and DMX_SET_SOURCE leftovers
  media: Revert "[media] v4l: async: make v4l2 coexist with devicetree nodes in a dt overlay"
  media: staging: atomisp: sh_css_calloc shall return a pointer to the allocated space
  media: Revert "[media] lirc_dev: remove superfluous get/put_device() calls"
  media: add qcom_camss.rst to v4l-drivers rst file
  media: dvb headers: make checkpatch happier
  media: dvb uapi: move frontend legacy API to another part of the book
  media: pixfmt-srggb12p.rst: better format the table for PDF output
  media: docs-rst: media: Don't use \small for V4L2_PIX_FMT_SRGGB10 documentation
  media: index.rst: don't write "Contents:" on PDF output
  media: pixfmt*.rst: replace a two dots by a comma
  media: vidioc-g-fmt.rst: adjust table format
  media: vivid.rst: add a blank line to correct ReST format
  media: v4l2 uapi book: get rid of driver programming's chapter
  media: format.rst: use the right markup for important notes
  media: docs-rst: cardlists: change their format to flat-tables
  media: em28xx-cardlist.rst: update to reflect last changes
  media: v4l2-event.rst: adjust table to fit on PDF output
  media: docs: don't show ToC for each part on PDF output
  ...
2017-09-07 12:53:14 -07:00
Martin KaFai Lau 637cd8c312 bpf: Add lru_hash_lookup performance test
Create a new case to test the LRU lookup performance.

At the beginning, the LRU map is fully loaded (i.e. the number of keys
is equal to map->max_entries).   The lookup is done through key 0
to num_map_entries and then repeats from 0 again.

This patch also creates an anonymous struct to properly
name the test params in stress_lru_hmap_alloc() in map_perf_test_kern.c.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 09:57:38 -07:00
David Ahern 0adc3dd900 samples/bpf: Update cgroup socket examples to use uid gid helper
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 06:05:15 +01:00
David Ahern 33aeb5e30a samples/bpf: Update cgrp2 socket tests
Update cgrp2 bpf sock tests to check that device, mark and priority
can all be set on a socket via bpf programs attached to a cgroup.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 06:05:15 +01:00
David Ahern f776d460b8 samples/bpf: Add option to dump socket settings
Add option to dump socket settings. Will be used in the next patch
to verify bpf programs are correctly setting mark, priority and
device based on the cgroup attachment for the program run.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 06:05:15 +01:00
David Ahern 609b1c3275 samples/bpf: Add detach option to test_cgrp2_sock
Add option to detach programs from a cgroup.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 06:05:15 +01:00
David Ahern fa38aa17bc samples/bpf: Update sock test to allow setting mark and priority
Update sock test to set mark and priority on socket create.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 06:05:15 +01:00
Tariq Toukan 3edcf18ec2 samples/bpf: Fix compilation issue in redirect dummy program
Fix compilation error below:

$ make samples/bpf/

LLVM ERROR: 'xdp_redirect_dummy' label emitted multiple times to
assembly file
make[1]: *** [samples/bpf/xdp_redirect_kern.o] Error 1
make: *** [samples/bpf/] Error 2

Fixes: 306da4e685 ("samples/bpf: xdp_redirect load XDP dummy prog on TX device")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-31 11:56:57 -07:00
Jesper Dangaard Brouer 3ffab54602 samples/bpf: xdp_monitor tool based on tracepoints
This tool xdp_monitor demonstrate how to use the different xdp_redirect
tracepoints xdp_redirect{,_map}{,_err} from a BPF program.

The default mode is to only monitor the error counters, to avoid
affecting the per packet performance. Tracepoints comes with a base
overhead of 25 nanosec for an attached bpf_prog, and 48 nanosec for
using a full perf record (with non-matching filter).  Thus, default
loading the --stats mode could affect the maximum performance.

This version of the tool is very simple and count all types of errors
as one.  It will be natural to extend this later with the different
types of errors that can occur, which should help users quickly
identify common mistakes.

Because the TP_STRUCT was kept in sync all the tracepoints loads the
same BPF code.  It would also be natural to extend the map version to
demonstrate how the map information could be used.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29 10:51:29 -07:00
Jesper Dangaard Brouer 306da4e685 samples/bpf: xdp_redirect load XDP dummy prog on TX device
For supporting XDP_REDIRECT, a device driver must (obviously)
implement the "TX" function ndo_xdp_xmit().  An additional requirement
is you cannot TX out a device, unless it also have a xdp bpf program
attached. This dependency is caused by the driver code need to setup
XDP resources before it can ndo_xdp_xmit.

Update bpf samples xdp_redirect and xdp_redirect_map to automatically
attach a dummy XDP program to the configured ifindex_out device.  Use
the XDP flag XDP_FLAGS_UPDATE_IF_NOEXIST on the dummy load, to avoid
overriding an existing XDP prog on the device.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29 10:51:29 -07:00
William Tu ef88f89c83 samples/bpf: extend test_tunnel_bpf.sh with ERSPAN
Extend existing tests for vxlan, gre, geneve, ipip to
include ERSPAN tunnel.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:04:52 -07:00
John Fastabend 464bc0fd62 bpf: convert sockmap field attach_bpf_fd2 to type
In the initial sockmap API we provided strparser and verdict programs
using a single attach command by extending the attach API with a the
attach_bpf_fd2 field.

However, if we add other programs in the future we will be adding a
field for every new possible type, attach_bpf_fd(3,4,..). This
seems a bit clumsy for an API. So lets push the programs using two
new type fields.

   BPF_SK_SKB_STREAM_PARSER
   BPF_SK_SKB_STREAM_VERDICT

This has the advantage of having a readable name and can easily be
extended in the future.

Updates to samples and sockmap included here also generalize tests
slightly to support upcoming patch for multiple map support.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:13:21 -07:00
Julia Lawall aa7d01fd7d media: v4l2-pci-skeleton: constify vb2_ops structures
These vb2_ops structures are only stored in the ops field of a
vb2_queue structure, which is declared as const.  Thus the vb2_ops
structures themselves can be const.

Done with the help of Coccinelle.

// <smpl>
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct vb2_ops i@p = { ... };

@ok@
identifier r.i;
struct vb2_queue e;
position p;
@@
e.ops = &i@p;

@bad@
position p != {r.p,ok.p};
identifier r.i;
struct vb2_ops e;
@@
e@i@p

@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
 struct vb2_ops i = { ... };
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-08-20 08:03:09 -04:00
Martin KaFai Lau ad17d0e6c7 bpf: Allow numa selection in INNER_LRU_HASH_PREALLOC test of map_perf_test
This patch makes the needed changes to allow each process of
the INNER_LRU_HASH_PREALLOC test to provide its numa node id
when creating the lru map.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-19 21:35:43 -07:00
John Fastabend 69e8cc134b bpf: sockmap sample program
This program binds a program to a cgroup and then matches hard
coded IP addresses and adds these to a sockmap.

This will receive messages from the backend and send them to
the client.

     client:X <---> frontend:10000 client:X <---> backend:10001

To keep things simple this is only designed for 1:1 connections
using hard coded values. A more complete example would allow many
backends and clients.

To run,

 # sockmap <cgroup2_dir>

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:53 -07:00
Yonghong Song 1da236b6be bpf: add a test case for syscalls/sys_{enter|exit}_* tracepoints
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07 14:09:48 -07:00
David S. Miller 29fda25a2d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two minor conflicts in virtio_net driver (bug fix overlapping addition
of a helper) and MAINTAINERS (new driver edit overlapping revamp of
PHY entry).

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 10:07:50 -07:00
William Tu cc75f8514d samples/bpf: fix bpf tunnel cleanup
test_tunnel_bpf.sh fails to remove the vxlan11 tunnel device, causing the
next geneve tunnelling test case fails.  In addition, the geneve reserved bit
in tcbpf2_kern.c should be zero, according to the RFC.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 22:02:47 -07:00
David S. Miller 7a68ada6ec Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-07-21 03:38:43 +01:00
Andy Gospodarek 24251c2647 samples/bpf: add option for native and skb mode for redirect apps
When testing with a driver that has both native and generic redirect support:

$ sudo ./samples/bpf/xdp_redirect -N 5 6
input: 5 output: 6
ifindex 6:    4961879 pkt/s
ifindex 6:    6391319 pkt/s
ifindex 6:    6419468 pkt/s

$ sudo ./samples/bpf/xdp_redirect -S 5 6
input: 5 output: 6
ifindex 6:    1845435 pkt/s
ifindex 6:    3882850 pkt/s
ifindex 6:    3893974 pkt/s

$ sudo ./samples/bpf/xdp_redirect_map -N 5 6
input: 5 output: 6
map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
ifindex 6:    2207374 pkt/s
ifindex 6:    6212869 pkt/s
ifindex 6:    6286515 pkt/s

$ sudo ./samples/bpf/xdp_redirect_map -S 5 6
input: 5 output: 6
map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
ifindex 6:    5052528 pkt/s
ifindex 6:    5736631 pkt/s
ifindex 6:    5739962 pkt/s

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 13:46:27 -07:00
John Fastabend 9d6e005287 xdp: bpf redirect with map sample program
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:06 -07:00
John Fastabend 832622e6bd xdp: sample program for new bpf_redirect helper
This implements a sample program for testing bpf_redirect. It reports
the number of packets redirected per second and as input takes the
ifindex of the device to run the xdp program on and the ifindex of the
interface to redirect packets to.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:06 -07:00
Linus Torvalds ad51271afc Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

- various misc things

- kexec updates

- sysctl core updates

- scripts/gdb udpates

- checkpoint-restart updates

- ipc updates

- kernel/watchdog updates

- Kees's "rough equivalent to the glibc _FORTIFY_SOURCE=1 feature"

- "stackprotector: ascii armor the stack canary"

- more MM bits

- checkpatch updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits)
  writeback: rework wb_[dec|inc]_stat family of functions
  ARM: samsung: usb-ohci: move inline before return type
  video: fbdev: omap: move inline before return type
  video: fbdev: intelfb: move inline before return type
  USB: serial: safe_serial: move __inline__ before return type
  drivers: tty: serial: move inline before return type
  drivers: s390: move static and inline before return type
  x86/efi: move asmlinkage before return type
  sh: move inline before return type
  MIPS: SMP: move asmlinkage before return type
  m68k: coldfire: move inline before return type
  ia64: sn: pci: move inline before type
  ia64: move inline before return type
  FRV: tlbflush: move asmlinkage before return type
  CRIS: gpio: move inline before return type
  ARM: HP Jornada 7XX: move inline before return type
  ARM: KVM: move asmlinkage before type
  checkpatch: improve the STORAGE_CLASS test
  mm, migration: do not trigger OOM killer when migrating memory
  drm/i915: use __GFP_RETRY_MAYFAIL
  ...
2017-07-13 12:38:49 -07:00
Logan Gunthorpe 9263969a46 kfifo: clean up example to not use page_link
This is a layering violation so we replace the uses with calls to
sg_page().  This is a prep patch for replacing page_link and this is one
of the very few uses outside of scatterlist.h.

Link: http://lkml.kernel.org/r/1495663199-22234-1-git-send-email-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:01 -07:00
Yonghong Song 533350227d samples/bpf: fix a build issue
With latest net-next:

====
clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h  -Isamples/bpf \
    -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
    -Wno-compare-distinct-pointer-types \
    -Wno-gnu-variable-sized-type-not-at-end \
    -Wno-address-of-packed-member -Wno-tautological-compare \
    -Wno-unknown-warning-option \
    -O2 -emit-llvm -c samples/bpf/tcp_synrto_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/tcp_synrto_kern.o
samples/bpf/tcp_synrto_kern.c:20:10: fatal error: 'bpf_endian.h' file not found
          ^~~~~~~~~~~~~~
1 error generated.
====

net has the same issue.

Add support for ntohl and htonl in tools/testing/selftests/bpf/bpf_endian.h.
Also move bpf_helpers.h from samples/bpf to selftests/bpf and change
compiler include logic so that programs in samples/bpf can access the headers
in selftests/bpf, but not the other way around.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-11 20:51:29 -07:00
Lawrence Brakmo f856e46978 bpf: fix return in load_bpf_file
The function load_bpf_file ignores the return value of
load_and_attach(), so even if load_and_attach() returns an error,
load_bpf_file() will return 0.

Now, load_bpf_file() can call load_and_attach() multiple times and some
can succeed and some could fail. I think the correct behavor is to
return error on the first failed load_and_attach().

v2: Added missing SOB

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:05:28 +01:00
Lawrence Brakmo 6c4a01b278 bpf: Sample bpf program to set sndcwnd clamp
Sample BPF program, tcp_clamp_kern.c, to demostrate the use
of setting the sndcwnd clamp. This program assumes that if the
first 5.5 bytes of the host's IPv6 addresses are the same, then
the hosts are in the same datacenter and sets sndcwnd clamp to
100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer
sizes to 150KB.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:14 -07:00
Lawrence Brakmo 7bc62e2854 bpf: Sample BPF program to set initial cwnd
Sample BPF program that assumes hosts are far away (i.e. large RTTs)
and sets initial cwnd and initial receive window to 40 packets,
send and receive buffers to 1.5MB.

In practice there would be a test to insure the hosts are actually
far enough away.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:14 -07:00
Lawrence Brakmo bb56d4449d bpf: Sample BPF program to set congestion control
Sample BPF program that sets congestion control to dctcp when both hosts
are within the same datacenter. In this example that is assumed to be
when they have the first 5.5 bytes of their IPv6 address are the same.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:14 -07:00
Lawrence Brakmo d9925368a6 bpf: Sample BPF program to set buffer sizes
This patch contains a BPF program to set initial receive window to
40 packets and send and receive buffers to 1.5MB. This would usually
be done after doing appropriate checks that indicate the hosts are
far enough away (i.e. large RTT).

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:14 -07:00
Lawrence Brakmo 8c4b4c7e9f bpf: Add setsockopt helper function to bpf
Added support for calling a subset of socket setsockopts from
BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather
than making the changes to call the socket setsockopt function because
the changes required would have been larger.

The ops supported are:
  SO_RCVBUF
  SO_SNDBUF
  SO_MAX_PACING_RATE
  SO_PRIORITY
  SO_RCVLOWAT
  SO_MARK

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:13 -07:00
Lawrence Brakmo c400296bf6 bpf: Sample bpf program to set initial window
The sample bpf program, tcp_rwnd_kern.c, sets the initial
advertized window to 40 packets in an environment where
distinct IPv6 prefixes indicate that both hosts are not
in the same data center.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:13 -07:00
Lawrence Brakmo 61bc4d8daa bpf: Sample bpf program to set SYN/SYN-ACK RTOs
The sample BPF program, tcp_synrto_kern.c, sets the SYN and SYN-ACK
RTOs to 10ms when both hosts are within the same datacenter (i.e.
small RTTs) in an environment where common IPv6 prefixes indicate
both hosts are in the same data center.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:13 -07:00
Lawrence Brakmo ae16189efb bpf: program to load and attach sock_ops BPF progs
The program load_sock_ops can be used to load sock_ops bpf programs and
to attach it to an existing (v2) cgroup. It can also be used to detach
sock_ops programs.

Examples:
    load_sock_ops [-l] <cg-path> <prog filename>
	Load and attaches a sock_ops program at the specified cgroup.
	If "-l" is used, the program will continue to run to output the
	BPF log buffer.
	If the specified filename does not end in ".o", it appends
	"_kern.o" to the name.

    load_sock_ops -r <cg-path>
	Detaches the currently attached sock_ops program from the
	specified cgroup.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:13 -07:00
Lawrence Brakmo 40304b2a15 bpf: BPF support for sock_ops
Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.). It uses the
existing bpf cgroups infrastructure so the programs can be attached per
cgroup with full inheritance support. The program will be called at
appropriate times to set relevant connections parameters such as buffer
sizes, SYN and SYN-ACK RTOs, etc., based on connection information such
as IP addresses, port numbers, etc.

Alghough there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it oes not require
application changes and it can be updated easily at any time.

Although the bpf cgroup framework already contains a sock related
program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type
(BPF_PROG_TYPE_SOCK_OPS) beccause the existing type expects to be called
only once during the connections's lifetime. In contrast, the new
program type will be called multiple times from different places in the
network stack code.  For example, before sending SYN and SYN-ACKs to set
an appropriate timeout, when the connection is established to set
congestion control, etc. As a result it has "op" field to specify the
type of operation requested.

The purpose of this new program type is to simplify setting connection
parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
easy to use facebook's internal IPv6 addresses to determine if both hosts
of a connection are in the same datacenter. Therefore, it is easy to
write a BPF program to choose a small SYN RTO value when both hosts are
in the same datacenter.

This patch only contains the framework to support the new BPF program
type, following patches add the functionality to set various connection
parameters.

This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS
and a new bpf syscall command to load a new program of this type:
BPF_PROG_LOAD_SOCKET_OPS.

Two new corresponding structs (one for the kernel one for the user/BPF
program):

/* kernel version */
struct bpf_sock_ops_kern {
        struct sock *sk;
        __u32  op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
};

/* user version
 * Some fields are in network byte order reflecting the sock struct
 * Use the bpf_ntohl helper macro in samples/bpf/bpf_endian.h to
 * convert them to host byte order.
 */
struct bpf_sock_ops {
        __u32 op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
        __u32 family;
        __u32 remote_ip4;     /* In network byte order */
        __u32 local_ip4;      /* In network byte order */
        __u32 remote_ip6[4];  /* In network byte order */
        __u32 local_ip6[4];   /* In network byte order */
        __u32 remote_port;    /* In network byte order */
        __u32 local_port;     /* In host byte horder */
};

Currently there are two types of ops. The first type expects the BPF
program to return a value which is then used by the caller (or a
negative value to indicate the operation is not supported). The second
type expects state changes to be done by the BPF program, for example
through a setsockopt BPF helper function, and they ignore the return
value.

The reply fields of the bpf_sockt_ops struct are there in case a bpf
program needs to return a value larger than an integer.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:13 -07:00
Martin KaFai Lau a8744f2528 bpf: Add test for syscall on fd array/htab lookup
Checks are added to the existing sockex3 and test_map_in_map test.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29 13:13:26 -04:00
Yonghong Song 00a3855d68 samples/bpf: fix a build problem
tracex5_kern.c build failed with the following error message:
  ../samples/bpf/tracex5_kern.c:12:10: fatal error: 'syscall_nrs.h' file not found
  #include "syscall_nrs.h"
The generated file syscall_nrs.h is put in build/samples/bpf directory,
but this directory is not in include path, hence build failed.

The fix is to add $(obj) into the clang compilation path.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22 11:35:19 -04:00
Tariq Toukan e0e16672ee pktgen: Specify the index of first thread
Use "-f <num>", to specify the index of the first
sender thread.
In default first thread is #0.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 12:32:34 -04:00
Tariq Toukan 69137ea60c pktgen: Specify num packets per thread
Use -n <num>, to specify the number of packets every
thread sends.
Zero means indefinitely.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 12:32:34 -04:00
David Daney 4b7190e841 samples/bpf: Fix tracex5 to work with MIPS syscalls.
There are two problems:

1) In MIPS the __NR_* macros expand to an expression, this causes the
   sections of the object file to be named like:

  .
  .
  .
  [ 5] kprobe/(5000 + 1) PROGBITS        0000000000000000 000160 ...
  [ 6] kprobe/(5000 + 0) PROGBITS        0000000000000000 000258 ...
  [ 7] kprobe/(5000 + 9) PROGBITS        0000000000000000 000348 ...
  .
  .
  .

The fix here is to use the "asm_offsets" trick to evaluate the macros
in the C compiler and generate a header file with a usable form of the
macros.

2) MIPS syscall numbers start at 5000, so we need a bigger map to hold
the sub-programs.

Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14 15:03:23 -04:00
David Daney c1932cdb27 bpf: Add MIPS support to samples/bpf.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14 15:03:22 -04:00
Teng Qin 41e9a8046c samples/bpf: add tests for more perf event types
$ trace_event

tests attaching BPF program to HW_CPU_CYCLES, SW_CPU_CLOCK, HW_CACHE_L1D and other events.
It runs 'dd' in the background while bpf program collects user and kernel
stack trace on counter overflow.
User space expects to see sys_read and sys_write in the kernel stack.

$ tracex6

tests reading of various perf counters from BPF program.

Both tests were refactored to increase coverage and be more accurate.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 21:58:15 -04:00
Jesper Dangaard Brouer 7bc57950bd samples/bpf: bpf_load.c order of prog_fd[] should correspond with ELF order
An eBPF ELF file generated with LLVM can contain several program
section, which can be used for bpf tail calls.  The bpf prog file
descriptors are accessible via array prog_fd[].

At-least XDP samples assume ordering, and uses prog_fd[0] is the main
XDP program to attach.  The actual order of array prog_fd[] depend on
whether or not a bpf program section is referencing any maps or not.
Not using a map result in being loaded/processed after all other
prog section.  Thus, this can lead to some very strange and hard to
debug situation, as the user can only see a FD and cannot correlated
that with the ELF section name.

The fix is rather simple, and even removes duplicate memcmp code.
Simply load program sections as the last step, instead of
load_and_attach while processing the relocation section.

When working with tail calls, it become even more essential that the
order of prog_fd[] is consistant, like the current dependency of the
map_fd[] order.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31 12:59:20 -04:00
Andy Gospodarek ad990dbe6d samples/bpf: run cleanup routines when receiving SIGTERM
Shahid Habib noticed that when xdp1 was killed from a different console the xdp
program was not cleaned-up properly in the kernel and it continued to forward
traffic.

Most of the applications in samples/bpf cleanup properly, but only when getting
SIGINT.  Since kill defaults to using SIGTERM, add support to cleanup when the
application receives either SIGINT or SIGTERM.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reported-by: Shahid Habib <shahid.habib@broadcom.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11 21:43:30 -04:00
Daniel Borkmann 0489df9a43 xdp: add flag to enforce driver mode
After commit b5cdae3291 ("net: Generic XDP") we automatically fall
back to a generic XDP variant if the driver does not support native
XDP. Allow for an option where the user can specify that always the
native XDP variant should be selected and in case it's not supported
by a driver, just bail out.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11 21:30:57 -04:00