OpenCloudOS-Kernel/net/ipv4
Daniel Borkmann 1b66d25361 bpf: Add get{peer, sock}name attach types for sock_addr
As stated in 983695fa67 ("bpf: fix unconnected udp hooks"), the objective
for the existing cgroup connect/sendmsg/recvmsg/bind BPF hooks is to be
transparent to applications. In Cilium we make use of these hooks [0] in
order to enable E-W load balancing for existing Kubernetes service types
for all Cilium managed nodes in the cluster. Those backends can be local
or remote. The main advantage of this approach is that it operates as close
as possible to the socket, and therefore allows to avoid packet-based NAT
given in connect/sendmsg/recvmsg hooks we only need to xlate sock addresses.

This also allows to expose NodePort services on loopback addresses in the
host namespace, for example. As another advantage, this also efficiently
blocks bind requests for applications in the host namespace for exposed
ports. However, one missing item is that we also need to perform reverse
xlation for inet{,6}_getname() hooks such that we can return the service
IP/port tuple back to the application instead of the remote peer address.

The vast majority of applications does not bother about getpeername(), but
in a few occasions we've seen breakage when validating the peer's address
since it returns unexpectedly the backend tuple instead of the service one.
Therefore, this trivial patch allows to customise and adds a getpeername()
as well as getsockname() BPF cgroup hook for both IPv4 and IPv6 in order
to address this situation.

Simple example:

  # ./cilium/cilium service list
  ID   Frontend     Service Type   Backend
  1    1.2.3.4:80   ClusterIP      1 => 10.0.0.10:80

Before; curl's verbose output example, no getpeername() reverse xlation:

  # curl --verbose 1.2.3.4
  * Rebuilt URL to: 1.2.3.4/
  *   Trying 1.2.3.4...
  * TCP_NODELAY set
  * Connected to 1.2.3.4 (10.0.0.10) port 80 (#0)
  > GET / HTTP/1.1
  > Host: 1.2.3.4
  > User-Agent: curl/7.58.0
  > Accept: */*
  [...]

After; with getpeername() reverse xlation:

  # curl --verbose 1.2.3.4
  * Rebuilt URL to: 1.2.3.4/
  *   Trying 1.2.3.4...
  * TCP_NODELAY set
  * Connected to 1.2.3.4 (1.2.3.4) port 80 (#0)
  > GET / HTTP/1.1
  >  Host: 1.2.3.4
  > User-Agent: curl/7.58.0
  > Accept: */*
  [...]

Originally, I had both under a BPF_CGROUP_INET{4,6}_GETNAME type and exposed
peer to the context similar as in inet{,6}_getname() fashion, but API-wise
this is suboptimal as it always enforces programs having to test for ctx->peer
which can easily be missed, hence BPF_CGROUP_INET{4,6}_GET{PEER,SOCK}NAME split.
Similarly, the checked return code is on tnum_range(1, 1), but if a use case
comes up in future, it can easily be changed to return an error code instead.
Helper and ctx member access is the same as with connect/sendmsg/etc hooks.

  [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Link: https://lore.kernel.org/bpf/61a479d759b2482ae3efb45546490bacd796a220.1589841594.git.daniel@iogearbox.net
2020-05-19 11:32:04 -07:00
..
bpfilter SPDX update for 5.2-rc2, round 1 2019-05-21 12:33:38 -07:00
netfilter netfilter: Replace zero-length array with flexible-array member 2020-03-15 15:20:16 +01:00
Kconfig docs: networking: convert ip-sysctl.txt to ReST 2020-04-28 14:40:18 -07:00
Makefile bpf: Add sockmap hooks for UDP sockets 2020-03-09 22:34:58 +01:00
af_inet.c bpf: Add get{peer, sock}name attach types for sock_addr 2020-05-19 11:32:04 -07:00
ah4.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
arp.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
bpf_tcp_ca.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-03-30 19:52:37 -07:00
cipso_ipv4.c netlabel: cope with NULL catmap 2020-05-12 18:12:40 -07:00
datagram.c inet: stop leaking jiffies on the wire 2019-11-01 14:57:52 -07:00
devinet.c Merge branch 'work.sysctl' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-04-28 21:23:38 +02:00
esp4.c ESP: Export esp_output_fill_trailer function 2020-02-19 13:52:32 +01:00
esp4_offload.c esp4: add gso_segment for esp4 beet mode 2020-03-26 14:51:07 +01:00
fib_frontend.c ipv4: fix a RCU-list lock in inet_dump_fib() 2020-03-23 20:59:40 -07:00
fib_lookup.h net: add net available in build_state 2020-03-29 22:30:57 -07:00
fib_notifier.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib_rules.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
fib_semantics.c net: ipv4: add sysctl for nexthop api compatibility mode 2020-04-28 12:50:37 -07:00
fib_trie.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-30 20:48:43 -07:00
fou.c fou: Fix IPv6 netlink policy 2020-01-23 14:32:52 +01:00
gre_demux.c gre: fix uninit-value in __iptunnel_pull_header 2020-03-08 21:25:37 -07:00
gre_offload.c net: remove the check argument from __skb_gro_checksum_convert 2020-01-03 12:24:34 -08:00
icmp.c docs: networking: convert ip-sysctl.txt to ReST 2020-04-28 14:40:18 -07:00
igmp.c igmp: remove unused macro IGMP_Vx_UNSOLICITED_REPORT_INTERVAL 2020-02-23 21:15:36 -08:00
inet_connection_sock.c inet_connection_sock: factor out destroy helper. 2020-05-15 12:30:13 -07:00
inet_diag.c inet_diag: bc: read cgroup id only for full sockets 2020-05-02 16:42:51 -07:00
inet_fragment.c inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
inet_hashtables.c tcp/dccp: fix possible race __inet_lookup_established() 2019-12-13 21:40:49 -08:00
inet_timewait_sock.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
inetpeer.c inetpeer: fix data-race in inet_putpeer / inet_putpeer 2019-11-07 16:15:56 -08:00
ip_forward.c ipv4: Revert removal of rt_uses_gateway 2019-09-20 18:23:33 -07:00
ip_fragment.c inet: frags: re-introduce skb coalescing for local delivery 2019-08-08 15:55:10 -07:00
ip_gre.c erspan: Check IFLA_GRE_ERSPAN_VER is set. 2020-05-12 13:11:41 -07:00
ip_input.c bpf: Add socket assign support 2020-03-30 13:45:04 -07:00
ip_options.c netfilter: nf_tables: add support for matching IPv4 options 2019-06-21 18:35:51 +02:00
ip_output.c net: Fix typo of SKB_SGO_CB_OFFSET 2020-03-29 21:53:18 -07:00
ip_sockglue.c net: cleanly handle kernel vs user buffers for ->msg_control 2020-05-11 16:59:16 -07:00
ip_tunnel.c net, ip_tunnel: fix interface lookup with no key 2020-03-29 22:16:27 -07:00
ip_tunnel_core.c net: add net available in build_state 2020-03-29 22:30:57 -07:00
ip_vti.c vti[6]: fix packet tx through bpf_redirect() in XinY cases 2020-02-06 13:27:30 +01:00
ipcomp.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2019-07-05 15:01:15 -07:00
ipconfig.c Documentation: nfsroot.rst: Fix references to nfsroot.rst 2020-03-02 13:11:46 -07:00
ipip.c ipip: validate header length in ipip_tunnel_xmit 2019-07-25 17:23:40 -07:00
ipmr.c ipmr: Add lockdep expression to ipmr_for_each_table macro 2020-05-14 18:01:07 -07:00
ipmr_base.c net: fib_notifier: propagate extack down to the notifier block callback 2019-10-04 11:10:56 -07:00
metrics.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
netfilter.c netfilter: ipv4: remove useless export_symbol 2019-01-28 11:32:58 +01:00
netlink.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
nexthop.c net: ipv4: add sysctl for nexthop api compatibility mode 2020-04-28 12:50:37 -07:00
ping.c ip: support SO_MARK cmsg 2019-09-13 21:44:19 +02:00
proc.c mptcp: add and use MIB counter infrastructure 2020-03-29 22:14:49 -07:00
protocol.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
raw.c raw: Add missing annotations to raw_seq_start() and raw_seq_stop() 2020-03-11 23:19:40 -07:00
raw_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
syncookies.c mptcp: handle tcp fallback when using syn cookies 2020-01-29 17:45:20 +01:00
sysctl_net_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-05-01 17:02:27 -07:00
tcp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
tcp_bbr.c tcp_bbr: improve arithmetic division in bbr_update_bw() 2020-01-21 10:45:49 +01:00
tcp_bic.c tcp: fix stretch ACK bugs in BIC 2020-03-16 18:26:54 -07:00
tcp_bpf.c bpf, sockmap: bpf_tcp_ingress needs to subtract bytes from sg.size 2020-05-06 00:22:22 +02:00
tcp_cdg.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_cong.c bpf: tcp: Support tcp_congestion_ops in bpf 2020-01-09 08:46:18 -08:00
tcp_cubic.c tcp_cubic: refactor code to perform a divide only when needed 2019-12-30 14:44:27 -08:00
tcp_dctcp.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tcp_dctcp.h tcp: refactor DCTCP ECN ACK handling 2018-10-10 22:26:00 -07:00
tcp_diag.c inet_diag: Move the INET_DIAG_REQ_BYTECODE nlattr to cb->data 2020-02-27 18:50:19 -08:00
tcp_fastopen.c tcp: add TCP_INFO status for failed client TFO 2019-10-25 19:25:37 -07:00
tcp_highspeed.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_htcp.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_hybla.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_illinois.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-15 13:48:59 -07:00
tcp_ipv4.c tcp: add hrtimer slack to sack compression 2020-04-30 13:24:01 -07:00
tcp_lp.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_metrics.c net-tcp: Disable TCP ssthresh metrics cache by default 2019-12-09 20:17:48 -08:00
tcp_minisocks.c mptcp: add new sock flag to deal with join subflows 2020-05-15 12:30:13 -07:00
tcp_nv.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_offload.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tcp_output.c tcp: defer xmit timer reset in tcp_xmit_retransmit_queue() 2020-05-06 17:29:38 -07:00
tcp_rate.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
tcp_recovery.c tcp: introduce tcp_skb_timestamp_us() helper 2018-09-21 19:37:59 -07:00
tcp_scalable.c tcp: fix stretch ACK bugs in Scalable 2020-03-16 18:26:54 -07:00
tcp_timer.c tcp: add tp->dup_ack_counter 2020-04-30 13:24:01 -07:00
tcp_ulp.c bpf: sockmap: Only check ULP for TCP sockets 2020-03-09 22:34:58 +01:00
tcp_vegas.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_vegas.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tcp_veno.c tcp: fix stretch ACK bugs in Veno 2020-03-16 18:26:55 -07:00
tcp_westwood.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
tcp_yeah.c tcp: fix stretch ACK bugs in Yeah 2020-03-16 18:26:55 -07:00
tunnel4.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
udp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-03-30 19:52:37 -07:00
udp_bpf.c bpf: Add sockmap hooks for UDP sockets 2020-03-09 22:34:58 +01:00
udp_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
udp_impl.h udp: add missing rehash callback to udplite 2019-01-17 15:01:08 -08:00
udp_offload.c udp: initialize is_flist with 0 in udp_gro_receive 2020-03-30 10:35:03 -07:00
udp_tunnel.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
udplite.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
xfrm4_input.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-09-04 10:26:30 +02:00
xfrm4_output.c xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish 2020-04-22 12:32:11 -07:00
xfrm4_policy.c net: add bool confirm_neigh parameter for dst_ops.update_pmtu 2019-12-24 22:28:54 -08:00
xfrm4_protocol.c xfrm: add route lookup to xfrm4_rcv_encap 2019-12-09 09:59:07 +01:00
xfrm4_state.c xfrm: remove eth_proto value from xfrm_state_afinfo 2019-06-06 08:34:50 +02:00
xfrm4_tunnel.c xfrm: remove type and offload_type map from xfrm_state_afinfo 2019-06-06 08:34:50 +02:00