OpenCloudOS-Kernel/include
Daniel Borkmann b2197755b2 bpf: add support for persistent maps/progs
This work adds support for "persistent" eBPF maps/programs. The term
"persistent" is to be understood that maps/programs have a facility
that lets them survive process termination. This is desired by various
eBPF subsystem users.

Just to name one example: tc classifier/action. Whenever tc parses
the ELF object, extracts and loads maps/progs into the kernel, these
file descriptors will be out of reach after the tc instance exits.
So a subsequent tc invocation won't be able to access/relocate on this
resource, and therefore maps cannot easily be shared, f.e. between the
ingress and egress networking data path.

The current workaround is that Unix domain sockets (UDS) need to be
instrumented in order to pass the created eBPF map/program file
descriptors to a third party management daemon through UDS' socket
passing facility. This makes it a bit complicated to deploy shared
eBPF maps or programs (programs f.e. for tail calls) among various
processes.

We've been brainstorming on how we could tackle this issue and various
approches have been tried out so far, which can be read up further in
the below reference.

The architecture we eventually ended up with is a minimal file system
that can hold map/prog objects. The file system is a per mount namespace
singleton, and the default mount point is /sys/fs/bpf/. Any subsequent
mounts within a given namespace will point to the same instance. The
file system allows for creating a user-defined directory structure.
The objects for maps/progs are created/fetched through bpf(2) with
two new commands (BPF_OBJ_PIN/BPF_OBJ_GET). I.e. a bpf file descriptor
along with a pathname is being passed to bpf(2) that in turn creates
(we call it eBPF object pinning) the file system nodes. Only the pathname
is being passed to bpf(2) for getting a new BPF file descriptor to an
existing node. The user can use that to access maps and progs later on,
through bpf(2). Removal of file system nodes is being managed through
normal VFS functions such as unlink(2), etc. The file system code is
kept to a very minimum and can be further extended later on.

The next step I'm working on is to add dump eBPF map/prog commands
to bpf(2), so that a specification from a given file descriptor can
be retrieved. This can be used by things like CRIU but also applications
can inspect the meta data after calling BPF_OBJ_GET.

Big thanks also to Alexei and Hannes who significantly contributed
in the design discussion that eventually let us end up with this
architecture here.

Reference: https://lkml.org/lkml/2015/10/15/925
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 22:48:39 -05:00
..
acpi ACPI: Eliminate CONFIG_.*{, _MODULE} #ifdef in favor of IS_ENABLED() 2015-09-15 03:05:45 +02:00
asm-generic Merge branch 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile 2015-10-04 16:31:13 +01:00
clocksource
crypto Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2015-09-08 12:41:25 -07:00
drm drm/dp/mst: make mst i2c transfer code more robust. 2015-10-15 09:06:20 +10:00
dt-bindings Merge branch 'drivers/reset' into next/late 2015-09-09 15:42:45 -07:00
keys
kvm arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS' 2015-09-17 13:13:27 +01:00
linux bpf: add support for persistent maps/progs 2015-11-02 22:48:39 -05:00
math-emu
media media updates for v4.3-rc1 2015-09-11 16:42:39 -07:00
memory
misc
net net: fix percpu memory leaks 2015-11-02 22:47:14 -05:00
pcmcia
ras
rdma Changes for 4.3-rc1 2015-09-19 20:04:11 -07:00
rxrpc
scsi Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2015-09-11 19:00:42 -07:00
soc IOMMU Updates for Linux v4.3 2015-09-08 17:22:35 -07:00
sound Merge remote-tracking branches 'asoc/fix/rt298', 'asoc/fix/sx', 'asoc/fix/wm8904' and 'asoc/fix/wm8962' into asoc-linus 2015-10-23 08:44:14 +09:00
target target: Propigate backend read-only to core_tpg_add_lun 2015-09-24 23:17:21 -07:00
trace Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2015-09-11 16:13:47 -07:00
uapi bpf: add support for persistent maps/progs 2015-11-02 22:48:39 -05:00
video libnvdimm for 4.3: 2015-09-08 14:35:59 -07:00
xen x86/xen: Support kexec/kdump in HVM guests by doing a soft reset 2015-09-28 14:48:52 +01:00
Kbuild