OpenCloudOS-Kernel/net/ipv6
Tom Herbert 47d3d7ac65 ipv6: Implement limits on Hop-by-Hop and Destination options
RFC 8200 (IPv6) defines Hop-by-Hop options and Destination options
extension headers. Both of these carry a list of TLVs which is
only limited by the maximum length of the extension header (2048
bytes). By the spec a host must process all the TLVs in these
options, however these could be used as a fairly obvious
denial of service attack. I think this could in fact be
a significant DOS vector on the Internet, one mitigating
factor might be that many FWs drop all packets with EH (and
obviously this is only IPv6) so an Internet wide attack might not
be so effective (yet!).

By my calculation, the worse case packet with TLVs in a standard
1500 byte MTU packet that would be processed by the stack contains
1282 invidual TLVs (including pad TLVS) or 724 two byte TLVs. I
wrote a quick test program that floods a whole bunch of these
packets to a host and sure enough there is substantial time spent
in ip6_parse_tlv. These packets contain nothing but unknown TLVS
(that are ignored), TLV padding, and bogus UDP header with zero
payload length.

  25.38%  [kernel]                    [k] __fib6_clean_all
  21.63%  [kernel]                    [k] ip6_parse_tlv
   4.21%  [kernel]                    [k] __local_bh_enable_ip
   2.18%  [kernel]                    [k] ip6_pol_route.isra.39
   1.98%  [kernel]                    [k] fib6_walk_continue
   1.88%  [kernel]                    [k] _raw_write_lock_bh
   1.65%  [kernel]                    [k] dst_release

This patch adds configurable limits to Destination and Hop-by-Hop
options. There are three limits that may be set:
  - Limit the number of options in a Hop-by-Hop or Destination options
    extension header.
  - Limit the byte length of a Hop-by-Hop or Destination options
    extension header.
  - Disallow unrecognized options in a Hop-by-Hop or Destination
    options extension header.

The limits are set in corresponding sysctls:

  ipv6.sysctl.max_dst_opts_cnt
  ipv6.sysctl.max_hbh_opts_cnt
  ipv6.sysctl.max_dst_opts_len
  ipv6.sysctl.max_hbh_opts_len

If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed.
The number of known TLVs that are allowed is the absolute value of
this number.

If a limit is exceeded when processing an extension header the packet is
dropped.

Default values are set to 8 for options counts, and set to INT_MAX
for maximum length. Note the choice to limit options to 8 is an
arbitrary guess (roughly based on the fact that the stack supports
three HBH options and just one destination option).

These limits have being proposed in draft-ietf-6man-rfc6434-bis.

Tested (by Martin Lau)

I tested out 1 thread (i.e. one raw_udp process).

I changed the net.ipv6.max_dst_(opts|hbh)_number between 8 to 2048.
With sysctls setting to 2048, the softirq% is packed to 100%.
With 8, the softirq% is almost unnoticable from mpstat.

v2;
  - Code and documention cleanup.
  - Change references of RFC2460 to be RFC8200.
  - Add reference to RFC6434-bis where the limits will be in standard.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03 09:50:22 +09:00
..
ila netfilter: nf_hook_ops structs can be const 2017-07-31 19:10:44 +02:00
netfilter ipv6: mark expected switch fall-throughs 2017-10-18 14:13:08 +01:00
Kconfig ipv6: sr: add helper functions for seg6local 2017-08-25 17:10:24 -07:00
Makefile ipv6: sr: define core operations for seg6local lightweight tunnel 2017-08-07 14:16:22 -07:00
addrconf.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-11-02 15:23:39 +09:00
addrconf_core.c net: ipv6: Make inet6addr_validator a blocking notifier 2017-10-20 13:15:07 +01:00
addrlabel.c ipv6: addrlabel: remove refcounting 2017-10-09 10:47:30 -07:00
af_inet6.c ipv6: Implement limits on Hop-by-Hop and Destination options 2017-11-03 09:50:22 +09:00
ah6.c ipv6: mark expected switch fall-throughs 2017-10-18 14:13:08 +01:00
anycast.c net, ipv6: convert ifacaddr6.aca_refcnt from atomic_t to refcount_t 2017-07-04 01:29:04 -07:00
calipso.c net, calipso: convert calipso_doi.refcount from atomic_t to refcount_t 2017-07-04 22:35:16 +01:00
datagram.c net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
esp6.c ipv6: esp6: use BUG_ON instead of if condition followed by BUG 2017-10-27 08:02:00 +02:00
esp6_offload.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-09-01 17:42:05 -07:00
exthdrs.c ipv6: Implement limits on Hop-by-Hop and Destination options 2017-11-03 09:50:22 +09:00
exthdrs_core.c net: ipv6: remove unused code in ipv6_find_hdr() 2017-10-05 21:53:02 -07:00
exthdrs_offload.c ipv6: fix exthdrs offload registration in out_rt path 2015-09-02 15:31:00 -07:00
fib6_notifier.c net: Add module reference to FIB notifiers 2017-09-01 20:33:42 -07:00
fib6_rules.c net: ipv6: avoid overhead when no custom FIB rules are installed 2017-08-08 21:40:08 -07:00
fou6.c fou: make local function static 2017-05-21 13:42:36 -04:00
icmp.c ipv6: mark expected switch fall-throughs 2017-10-18 14:13:08 +01:00
inet6_connection_sock.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-28 10:33:06 -05:00
inet6_hashtables.c net: ipv6: add second dif to inet6 socket lookups 2017-08-07 11:39:22 -07:00
ip6_checksum.c ipv6: fix checksum annotation in udp6_csum_init 2016-06-14 15:26:42 -04:00
ip6_fib.c net: Add extack to fib_notifier_info 2017-11-01 11:50:43 +09:00
ip6_flowlabel.c ipv6: flowlabel: do not leave opt->tot_len with garbage 2017-10-22 03:22:24 +01:00
ip6_gre.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-30 21:09:24 +09:00
ip6_icmp.c ipv6: icmp: add a force_saddr param to icmp6_send() 2016-06-18 22:11:38 -07:00
ip6_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-04-20 10:35:33 -04:00
ip6_offload.c gso: fix payload length when gso_size is zero 2017-10-08 10:12:15 -07:00
ip6_offload.h udp: Add GRO functions to UDP socket 2016-04-07 16:53:29 -04:00
ip6_output.c ipv6: flowlabel: do not leave opt->tot_len with garbage 2017-10-22 03:22:24 +01:00
ip6_tunnel.c ip6_tunnel: Allow rcv/xmit even if remote address is a local address 2017-10-25 10:33:27 +09:00
ip6_udp_tunnel.c ip6_udp_tunnel: remove unused IPCB related codes 2016-11-02 15:18:36 -04:00
ip6_vti.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-05 18:19:22 -07:00
ip6mr.c ipv6: mark expected switch fall-throughs 2017-10-18 14:13:08 +01:00
ipcomp6.c net: inet: Support UID-based routing in IP protocols. 2016-11-04 14:45:23 -04:00
ipv6_sockglue.c net-ipv6: add support for sockopt(SOL_IPV6, IPV6_FREEBIND) 2017-09-30 05:30:52 +01:00
mcast.c net, ipv6: convert ifmcaddr6.mca_refcnt from atomic_t to refcount_t 2017-07-04 01:29:04 -07:00
mcast_snoop.c
mip6.c ktime: Get rid of ktime_equal() 2016-12-25 17:21:23 +01:00
ndisc.c net: display hw address of source machine during ipv6 DAD failure 2017-11-01 20:53:49 +09:00
netfilter.c net: inet: Support UID-based routing in IP protocols. 2016-11-04 14:45:23 -04:00
output_core.c ipv6: accept 64k - 1 packet length in ip6_find_1stfragopt() 2017-08-22 10:23:26 -07:00
ping.c net/ipv6: Convert icmpv6_push_pending_frames to void 2017-10-06 09:52:31 -07:00
proc.c proc: snmp6: Use correct type in memset 2017-06-12 09:53:14 -04:00
protocol.c net: Add sysctl to toggle early demux for tcp and udp 2017-03-24 13:17:07 -07:00
raw.c ipv6: mark expected switch fall-throughs 2017-10-18 14:13:08 +01:00
reassembly.c inet: frags: Convert timers to use timer_setup() 2017-10-18 12:39:55 +01:00
route.c ipv6: prevent user from adding cached routes 2017-10-29 12:18:58 +09:00
seg6.c ipv6: sr: define core operations for seg6local lightweight tunnel 2017-08-07 14:16:22 -07:00
seg6_hmac.c ipv6: sr: Use ARRAY_SIZE macro 2017-09-01 18:35:23 -07:00
seg6_iptunnel.c ipv6: sr: add support for encapsulation of L2 frames 2017-08-25 17:10:23 -07:00
seg6_local.c ipv6: sr: remove duplicate routing header type check 2017-09-11 14:34:10 -07:00
sit.c net: sit: Update lookup to handle links set to L3 slave 2017-11-01 12:35:17 +09:00
syncookies.c tcp: Namespace-ify sysctl_tcp_workaround_signed_windows 2017-10-28 19:24:38 +09:00
sysctl_net_ipv6.c ipv6: Implement limits on Hop-by-Hop and Destination options 2017-11-03 09:50:22 +09:00
tcp_ipv6.c tcp: add tracepoint trace_tcp_send_reset 2017-10-24 01:21:25 +01:00
tcpv6_offload.c
tunnel6.c ipv6: fix tunnel error handling 2015-11-03 10:52:13 -05:00
udp.c udpv6: Fix the checksum computation when HW checksum does not apply 2017-09-18 11:43:03 -07:00
udp_impl.h udp: make *udp*_queue_rcv_skb() functions static 2017-05-18 10:23:33 -04:00
udp_offload.c net: avoid skb_warn_bad_offload false positives on UFO 2017-08-08 21:39:01 -07:00
udplite.c udplite: call proper backlog handlers 2016-11-24 15:32:14 -05:00
xfrm6_input.c xfrm6: Fix CHECKSUM_COMPLETE after IPv6 header push 2017-08-02 11:00:15 +02:00
xfrm6_mode_beet.c networking: make skb_pull & friends return void pointers 2017-06-16 11:48:39 -04:00
xfrm6_mode_ro.c ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt() 2017-06-02 13:57:27 -04:00
xfrm6_mode_transport.c ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt() 2017-06-02 13:57:27 -04:00
xfrm6_mode_tunnel.c xfrm: Add encapsulation header offsets while SKB is not encrypted 2017-04-14 10:07:39 +02:00
xfrm6_output.c xfrm: Add an IPsec hardware offloading API 2017-04-14 10:06:10 +02:00
xfrm6_policy.c ipv6: mark expected switch fall-throughs 2017-10-18 14:13:08 +01:00
xfrm6_protocol.c xfrm: input: constify xfrm_input_afinfo 2017-02-09 10:22:17 +01:00
xfrm6_state.c
xfrm6_tunnel.c net, ipv6: convert xfrm6_tunnel_spi.refcnt from atomic_t to refcount_t 2017-07-04 01:29:04 -07:00