linux-sg2042/net
Alexei Starovoitov d83525ca62 bpf: introduce bpf_spin_lock
Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
bpf program serialize access to other variables.

Example:
struct hash_elem {
    int cnt;
    struct bpf_spin_lock lock;
};
struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
if (val) {
    bpf_spin_lock(&val->lock);
    val->cnt++;
    bpf_spin_unlock(&val->lock);
}

Restrictions and safety checks:
- bpf_spin_lock is only allowed inside HASH and ARRAY maps.
- BTF description of the map is mandatory for safety analysis.
- bpf program can take one bpf_spin_lock at a time, since two or more can
  cause dead locks.
- only one 'struct bpf_spin_lock' is allowed per map element.
  It drastically simplifies implementation yet allows bpf program to use
  any number of bpf_spin_locks.
- when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
- bpf program must bpf_spin_unlock() before return.
- bpf program can access 'struct bpf_spin_lock' only via
  bpf_spin_lock()/bpf_spin_unlock() helpers.
- load/store into 'struct bpf_spin_lock lock;' field is not allowed.
- to use bpf_spin_lock() helper the BTF description of map value must be
  a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
  Nested lock inside another struct is not allowed.
- syscall map_lookup doesn't copy bpf_spin_lock field to user space.
- syscall map_update and program map_update do not update bpf_spin_lock field.
- bpf_spin_lock cannot be on the stack or inside networking packet.
  bpf_spin_lock can only be inside HASH or ARRAY map value.
- bpf_spin_lock is available to root only and to all program types.
- bpf_spin_lock is not allowed in inner maps of map-in-map.
- ld_abs is not allowed inside spin_lock-ed region.
- tracing progs and socket filter progs cannot use bpf_spin_lock due to
  insufficient preemption checks

Implementation details:
- cgroup-bpf class of programs can nest with xdp/tc programs.
  Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
  Other solutions to avoid nested bpf_spin_lock are possible.
  Like making sure that all networking progs run with softirq disabled.
  spin_lock_irqsave is the simplest and doesn't add overhead to the
  programs that don't use it.
- arch_spinlock_t is used when its implemented as queued_spin_lock
- archs can force their own arch_spinlock_t
- on architectures where queued_spin_lock is not available and
  sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
- presence of bpf_spin_lock inside map value could have been indicated via
  extra flag during map_create, but specifying it via BTF is cleaner.
  It provides introspection for map key/value and reduces user mistakes.

Next steps:
- allow bpf_spin_lock in other map types (like cgroup local storage)
- introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
  to request kernel to grab bpf_spin_lock before rewriting the value.
  That will serialize access to map elements.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 20:55:38 +01:00
..
6lowpan 6lowpan: convert to DEFINE_SHOW_ATTRIBUTE 2018-12-19 00:28:05 +01:00
9p 9p/net: put a lower bound on msize 2018-12-25 17:07:49 +09:00
802
8021q net: core: dev: Add extack argument to dev_change_flags() 2018-12-06 13:26:07 -08:00
appletalk
atm Revert "net: simplify sock_poll_wait" 2018-10-23 10:57:06 -07:00
ax25 ax25: fix possible use-after-free 2019-01-23 11:18:00 -08:00
batman-adv bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() calls 2019-01-22 17:18:08 -08:00
bluetooth Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-12-27 13:53:32 -08:00
bpf bpf: add BPF_PROG_TEST_RUN support for flow dissector 2019-01-29 01:08:29 +01:00
bpfilter net: bpfilter: change section name of bpfilter UMH blob. 2019-01-16 15:46:46 -08:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-29 21:18:54 -08:00
caif Revert "net: simplify sock_poll_wait" 2018-10-23 10:57:06 -07:00
can can: bcm: check timer values before ktime conversion 2019-01-22 11:33:46 +01:00
ceph libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive() 2019-01-21 14:53:12 +01:00
core bpf: introduce bpf_spin_lock 2019-02-01 20:55:38 +01:00
dcb net: dcb: Add priority-to-DSCP map getters 2018-07-27 13:17:50 -07:00
dccp tcp: Refactor pingpong code 2019-01-27 13:29:43 -08:00
decnet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-29 21:18:54 -08:00
dns_resolver dns: Allow the dns resolver to retrieve a server set 2018-10-04 09:40:52 -07:00
dsa switchdev: Add extack argument to call_switchdev_notifiers() 2019-01-17 15:18:47 -08:00
ethernet net: ethernet: provide nvmem_get_mac_address() 2018-12-03 15:40:30 -08:00
hsr
ieee802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-24 16:19:56 -08:00
ife
ipv4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-29 21:18:54 -08:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-29 21:18:54 -08:00
iucv iucv: Remove SKB list assumptions. 2018-11-10 16:55:11 -08:00
kcm Revert "kcm: remove any offset before parsing messages" 2018-09-17 18:43:42 -07:00
key af_key: fix indentation on declaration statement 2018-11-15 18:09:32 +01:00
l2tp ppp: Move PFC decompression to PPP generic layer 2018-12-20 16:49:30 -08:00
l3mdev l3mdev: add function to retreive upper master 2018-12-03 14:15:26 -08:00
lapb
llc llc: do not use sk_eat_skb() 2018-10-22 19:59:20 -07:00
mac80211 mac80211: Add attribute aligned(2) to struct 'action' 2019-01-25 10:17:25 +01:00
mac802154 mac802154: Remove VLA usage of skcipher 2018-09-28 12:46:07 +08:00
mpls net: mpls: netconf: perform strict checks also for doit handlers 2019-01-19 10:09:59 -08:00
ncsi net/ncsi: Add NCSI Mellanox OEM command 2018-11-27 16:37:20 -08:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-29 21:18:54 -08:00
netlabel netlabel: check for IPV4MASK in addrinfo_get 2018-09-21 18:58:34 -07:00
netlink net: netlink: add helper to retrieve NETLINK_F_STRICT_CHK 2019-01-19 10:09:58 -08:00
netrom netrom: switch to sock timer API 2019-01-27 10:38:04 -08:00
nfc net: Revert recent Spectre-v1 patches. 2018-12-23 16:01:35 -08:00
nsh
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2019-01-28 17:34:38 -08:00
packet af_packet: fix raw sockets over 6in4 tunnel 2019-01-17 15:54:45 -08:00
phonet net: Revert recent Spectre-v1 patches. 2018-12-23 16:01:35 -08:00
psample
qrtr
rds rds: use DIV_ROUND_UP instead of ceil 2019-01-07 07:22:36 -08:00
rfkill rfkill: gpio: Remove unused include 2018-12-18 13:13:56 +01:00
rose net/rose: fix NULL ax25_cb kernel panic 2019-01-27 10:40:01 -08:00
rxrpc Revert "rxrpc: Allow failed client calls to be retried" 2019-01-15 21:33:36 -08:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-21 14:41:32 -08:00
sctp sctp: add SCTP_FUTURE_ASOC and SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER sockopt 2019-01-30 00:44:08 -08:00
smc smc: move unhash as early as possible in smc_release() 2019-01-07 14:40:27 -05:00
strparser bpf, sockmap: convert to generic sk_msg interface 2018-10-15 12:23:19 -07:00
sunrpc SUNRPC: Address Kerberos performance/behavior regression 2019-01-15 15:36:41 -05:00
switchdev switchdev: Add extack argument to call_switchdev_notifiers() 2019-01-17 15:18:47 -08:00
tipc tipc: remove dead code in struct tipc_topsrv 2019-01-24 22:31:23 -08:00
tls Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-29 21:18:54 -08:00
unix Revert "net: simplify sock_poll_wait" 2018-10-23 10:57:06 -07:00
vmw_vsock Fix ERROR:do not initialise statics to 0 in af_vsock.c 2019-01-15 20:38:29 -08:00
wimax wimax: remove blank lines at EOF 2018-07-24 14:10:42 -07:00
wireless cfg80211: extend range deviation for DMG 2019-01-25 10:18:51 +01:00
x25 net/x25: handle call collisions 2018-11-29 14:25:36 -08:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2019-01-28 19:38:33 -08:00
xfrm xfrm: Make set-mark default behavior backward compatible 2019-01-16 13:10:55 +01:00
Kconfig net: convert bridge_nf to use skb extension infrastructure 2018-12-19 11:21:37 -08:00
Makefile
compat.c Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
socket.c y2038: more syscalls and cleanups 2018-12-28 12:45:04 -08:00
sysctl_net.c