flow_dissector: document BPF flow dissector environment
Short doc on what BPF flow dissector should expect in the input __sk_buff and flow_keys. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This commit is contained in:
parent
2ee7fba0d6
commit
ae82899bbe
|
@ -0,0 +1,115 @@
|
|||
==================
|
||||
BPF Flow Dissector
|
||||
==================
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
Flow dissector is a routine that parses metadata out of the packets. It's
|
||||
used in the various places in the networking subsystem (RFS, flow hash, etc).
|
||||
|
||||
BPF flow dissector is an attempt to reimplement C-based flow dissector logic
|
||||
in BPF to gain all the benefits of BPF verifier (namely, limits on the
|
||||
number of instructions and tail calls).
|
||||
|
||||
API
|
||||
===
|
||||
|
||||
BPF flow dissector programs operate on an __sk_buff. However, only the
|
||||
limited set of fields is allowed: data, data_end and flow_keys. flow_keys
|
||||
is 'struct bpf_flow_keys' and contains flow dissector input and
|
||||
output arguments.
|
||||
|
||||
The inputs are:
|
||||
* nhoff - initial offset of the networking header
|
||||
* thoff - initial offset of the transport header, initialized to nhoff
|
||||
* n_proto - L3 protocol type, parsed out of L2 header
|
||||
|
||||
Flow dissector BPF program should fill out the rest of the 'struct
|
||||
bpf_flow_keys' fields. Input arguments nhoff/thoff/n_proto should be also
|
||||
adjusted accordingly.
|
||||
|
||||
The return code of the BPF program is either BPF_OK to indicate successful
|
||||
dissection, or BPF_DROP to indicate parsing error.
|
||||
|
||||
__sk_buff->data
|
||||
===============
|
||||
|
||||
In the VLAN-less case, this is what the initial state of the BPF flow
|
||||
dissector looks like:
|
||||
+------+------+------------+-----------+
|
||||
| DMAC | SMAC | ETHER_TYPE | L3_HEADER |
|
||||
+------+------+------------+-----------+
|
||||
^
|
||||
|
|
||||
+-- flow dissector starts here
|
||||
|
||||
skb->data + flow_keys->nhoff point to the first byte of L3_HEADER.
|
||||
flow_keys->thoff = nhoff
|
||||
flow_keys->n_proto = ETHER_TYPE
|
||||
|
||||
|
||||
In case of VLAN, flow dissector can be called with the two different states.
|
||||
|
||||
Pre-VLAN parsing:
|
||||
+------+------+------+-----+-----------+-----------+
|
||||
| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
|
||||
+------+------+------+-----+-----------+-----------+
|
||||
^
|
||||
|
|
||||
+-- flow dissector starts here
|
||||
|
||||
skb->data + flow_keys->nhoff point the to first byte of TCI.
|
||||
flow_keys->thoff = nhoff
|
||||
flow_keys->n_proto = TPID
|
||||
|
||||
Please note that TPID can be 802.1AD and, hence, BPF program would
|
||||
have to parse VLAN information twice for double tagged packets.
|
||||
|
||||
|
||||
Post-VLAN parsing:
|
||||
+------+------+------+-----+-----------+-----------+
|
||||
| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
|
||||
+------+------+------+-----+-----------+-----------+
|
||||
^
|
||||
|
|
||||
+-- flow dissector starts here
|
||||
|
||||
skb->data + flow_keys->nhoff point the to first byte of L3_HEADER.
|
||||
flow_keys->thoff = nhoff
|
||||
flow_keys->n_proto = ETHER_TYPE
|
||||
|
||||
In this case VLAN information has been processed before the flow dissector
|
||||
and BPF flow dissector is not required to handle it.
|
||||
|
||||
|
||||
The takeaway here is as follows: BPF flow dissector program can be called with
|
||||
the optional VLAN header and should gracefully handle both cases: when single
|
||||
or double VLAN is present and when it is not present. The same program
|
||||
can be called for both cases and would have to be written carefully to
|
||||
handle both cases.
|
||||
|
||||
|
||||
Reference Implementation
|
||||
========================
|
||||
|
||||
See tools/testing/selftests/bpf/progs/bpf_flow.c for the reference
|
||||
implementation and tools/testing/selftests/bpf/flow_dissector_load.[hc] for
|
||||
the loader. bpftool can be used to load BPF flow dissector program as well.
|
||||
|
||||
The reference implementation is organized as follows:
|
||||
* jmp_table map that contains sub-programs for each supported L3 protocol
|
||||
* _dissect routine - entry point; it does input n_proto parsing and does
|
||||
bpf_tail_call to the appropriate L3 handler
|
||||
|
||||
Since BPF at this point doesn't support looping (or any jumping back),
|
||||
jmp_table is used instead to handle multiple levels of encapsulation (and
|
||||
IPv6 options).
|
||||
|
||||
|
||||
Current Limitations
|
||||
===================
|
||||
BPF flow dissector doesn't support exporting all the metadata that in-kernel
|
||||
C-based implementation can export. Notable example is single VLAN (802.1Q)
|
||||
and double VLAN (802.1AD) tags. Please refer to the 'struct bpf_flow_keys'
|
||||
for a set of information that's currently can be exported from the BPF context.
|
Loading…
Reference in New Issue