OpenCloudOS-Kernel/net/ipv6
Daniel Borkmann 1b66d25361 bpf: Add get{peer, sock}name attach types for sock_addr
As stated in 983695fa67 ("bpf: fix unconnected udp hooks"), the objective
for the existing cgroup connect/sendmsg/recvmsg/bind BPF hooks is to be
transparent to applications. In Cilium we make use of these hooks [0] in
order to enable E-W load balancing for existing Kubernetes service types
for all Cilium managed nodes in the cluster. Those backends can be local
or remote. The main advantage of this approach is that it operates as close
as possible to the socket, and therefore allows to avoid packet-based NAT
given in connect/sendmsg/recvmsg hooks we only need to xlate sock addresses.

This also allows to expose NodePort services on loopback addresses in the
host namespace, for example. As another advantage, this also efficiently
blocks bind requests for applications in the host namespace for exposed
ports. However, one missing item is that we also need to perform reverse
xlation for inet{,6}_getname() hooks such that we can return the service
IP/port tuple back to the application instead of the remote peer address.

The vast majority of applications does not bother about getpeername(), but
in a few occasions we've seen breakage when validating the peer's address
since it returns unexpectedly the backend tuple instead of the service one.
Therefore, this trivial patch allows to customise and adds a getpeername()
as well as getsockname() BPF cgroup hook for both IPv4 and IPv6 in order
to address this situation.

Simple example:

  # ./cilium/cilium service list
  ID   Frontend     Service Type   Backend
  1    1.2.3.4:80   ClusterIP      1 => 10.0.0.10:80

Before; curl's verbose output example, no getpeername() reverse xlation:

  # curl --verbose 1.2.3.4
  * Rebuilt URL to: 1.2.3.4/
  *   Trying 1.2.3.4...
  * TCP_NODELAY set
  * Connected to 1.2.3.4 (10.0.0.10) port 80 (#0)
  > GET / HTTP/1.1
  > Host: 1.2.3.4
  > User-Agent: curl/7.58.0
  > Accept: */*
  [...]

After; with getpeername() reverse xlation:

  # curl --verbose 1.2.3.4
  * Rebuilt URL to: 1.2.3.4/
  *   Trying 1.2.3.4...
  * TCP_NODELAY set
  * Connected to 1.2.3.4 (1.2.3.4) port 80 (#0)
  > GET / HTTP/1.1
  >  Host: 1.2.3.4
  > User-Agent: curl/7.58.0
  > Accept: */*
  [...]

Originally, I had both under a BPF_CGROUP_INET{4,6}_GETNAME type and exposed
peer to the context similar as in inet{,6}_getname() fashion, but API-wise
this is suboptimal as it always enforces programs having to test for ctx->peer
which can easily be missed, hence BPF_CGROUP_INET{4,6}_GET{PEER,SOCK}NAME split.
Similarly, the checked return code is on tnum_range(1, 1), but if a use case
comes up in future, it can easily be changed to return an error code instead.
Helper and ctx member access is the same as with connect/sendmsg/etc hooks.

  [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Link: https://lore.kernel.org/bpf/61a479d759b2482ae3efb45546490bacd796a220.1589841594.git.daniel@iogearbox.net
2020-05-19 11:32:04 -07:00
..
ila ila: remove unused inline function ila_addr_is_ila 2020-04-29 12:01:31 -07:00
netfilter netfilter: Replace zero-length array with flexible-array member 2020-03-15 15:20:16 +01:00
Kconfig docs: networking: convert ipv6.txt to ReST 2020-04-28 14:40:18 -07:00
Makefile net: ipv6: add rpl sr tunnel 2020-03-29 22:30:57 -07:00
addrconf.c ipv6: Implement draft-ietf-6man-rfc4941bis 2020-05-06 17:00:02 -07:00
addrconf_core.c net: ipv6: new arg skip_notify to ip6_rt_del 2020-04-28 12:50:37 -07:00
addrlabel.c netlink: make validation more configurable for future strictness 2019-04-27 17:07:21 -04:00
af_inet6.c bpf: Add get{peer, sock}name attach types for sock_addr 2020-05-19 11:32:04 -07:00
ah6.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
anycast.c net: ipv6: new arg skip_notify to ip6_rt_del 2020-04-28 12:50:37 -07:00
calipso.c netlabel: cope with NULL catmap 2020-05-12 18:12:40 -07:00
datagram.c net: ipv6: add net argument to ip6_dst_lookup_flow 2019-12-04 12:27:12 -08:00
esp6.c ESP: Export esp_output_fill_trailer function 2020-02-19 13:52:32 +01:00
esp6_offload.c esp6: add gso_segment for esp6 beet mode 2020-03-26 14:51:07 +01:00
exthdrs.c net: ipv6: add support for rpl sr exthdr 2020-03-29 22:30:57 -07:00
exthdrs_core.c ipv6: remove printk 2019-07-27 14:23:48 -07:00
exthdrs_offload.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
fib6_notifier.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib6_rules.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fou6.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
icmp.c net: icmp6: do not select saddr from iif when route has prefsrc set 2020-04-07 18:25:10 -07:00
inet6_connection_sock.c net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2019-12-24 22:28:54 -08:00
inet6_hashtables.c net: annotate accesses to sk->sk_incoming_cpu 2019-10-30 13:24:25 -07:00
ip6_checksum.c net: udp: fix handling of CHECKSUM_COMPLETE packets 2018-10-24 14:18:16 -07:00
ip6_fib.c bpf: Enable bpf_iter targets registering ctx argument types 2020-05-13 12:30:50 -07:00
ip6_flowlabel.c ipv6: fix static key imbalance in fl_create() 2019-07-11 14:43:25 -07:00
ip6_gre.c net: ip6_gre: Distribute switch variables for initialization 2020-02-20 10:00:19 -08:00
ip6_icmp.c icmp: introduce helper for nat'd source address in network device context 2020-02-13 14:19:00 -08:00
ip6_input.c bpf: Add socket assign support 2020-03-30 13:45:04 -07:00
ip6_offload.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip6_offload.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip6_output.c net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc. 2020-02-24 13:31:42 -08:00
ip6_tunnel.c net: ip6_gre: Distribute switch variables for initialization 2020-02-20 10:00:19 -08:00
ip6_udp_tunnel.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
ip6_vti.c vti6: Fix memory leak of skb if input policy check fails 2020-03-16 11:13:48 +01:00
ip6mr.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
ipcomp6.c xfrm: remove type and offload_type map from xfrm_state_afinfo 2019-06-06 08:34:50 +02:00
ipv6_sockglue.c ipv6: set msg_control_is_user in do_ipv6_getsockopt 2020-05-13 13:11:45 -07:00
mcast.c mld: fix memory leak in mld_del_delrec() 2019-08-28 14:47:35 -07:00
mcast_snoop.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 343 2019-06-05 17:37:07 +02:00
mip6.c xfrm: remove type and offload_type map from xfrm_state_afinfo 2019-06-06 08:34:50 +02:00
ndisc.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-05-01 17:02:27 -07:00
netfilter.c net: ensure correct skb->tstamp in various fragmenters 2019-10-18 10:02:37 -07:00
output_core.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
ping.c ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()' 2019-09-12 11:20:33 +01:00
proc.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-06-07 11:00:14 -07:00
protocol.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
raw.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
reassembly.c inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
rpl.c ipv6: rpl: fix full address compression 2020-04-18 15:04:27 -07:00
rpl_iptunnel.c net: ipv6: rpl_iptunnel: remove redundant assignments to variable err 2020-04-02 06:57:34 -07:00
seg6.c seg6: fix SRH processing to comply with RFC8754 2020-05-06 17:21:35 -07:00
seg6_hmac.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
seg6_iptunnel.c net: add net available in build_state 2020-03-29 22:30:57 -07:00
seg6_local.c net: add net available in build_state 2020-03-29 22:30:57 -07:00
sit.c sit: do not confirm neighbor when do pmtu update 2019-12-24 22:28:55 -08:00
syncookies.c mptcp: handle tcp fallback when using syn cookies 2020-01-29 17:45:20 +01:00
sysctl_net_ipv6.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
tcp_ipv6.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
tcpv6_offload.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tunnel6.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 2019-05-21 11:28:45 +02:00
udp.c net: Track socket refcounts in skb_steal_sock() 2020-03-30 13:45:04 -07:00
udp_impl.h udp6: add missing rehash callback to udplite 2019-01-17 15:01:08 -08:00
udp_offload.c udp: Support UDP fraglist GRO/GSO. 2020-01-27 11:00:21 +01:00
udplite.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
xfrm6_input.c net: use skb_sec_path helper in more places 2018-12-19 11:21:37 -08:00
xfrm6_output.c xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish 2020-04-22 12:32:11 -07:00
xfrm6_policy.c net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2019-12-24 22:28:54 -08:00
xfrm6_protocol.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
xfrm6_state.c xfrm: remove eth_proto value from xfrm_state_afinfo 2019-06-06 08:34:50 +02:00
xfrm6_tunnel.c ipv6: xfrm6_tunnel.c: Use built-in RCU list checking 2020-02-27 10:17:41 +01:00