OpenCloudOS-Kernel/net/ipv4
Christophe Gouault 7263a5187f vti: get rid of nf mark rule in prerouting
This patch fixes and improves the use of vti interfaces (while
lightly changing the way of configuring them).

Currently:

- it is necessary to identify and mark inbound IPsec
  packets destined to each vti interface, via netfilter rules in
  the mangle table at prerouting hook.

- the vti module cannot retrieve the right tunnel in input since
  commit b9959fd3: vti tunnels all have an i_key, but the tunnel lookup
  is done with flag TUNNEL_NO_KEY, so there no chance to retrieve them.

- the i_key is used by the outbound processing as a mark to lookup
  for the right SP and SA bundle.

This patch uses the o_key to store the vti mark (instead of i_key) and
enables:

- to avoid the need for previously marking the inbound skbuffs via a
  netfilter rule.
- to properly retrieve the right tunnel in input, only based on the IPsec
  packet outer addresses.
- to properly perform an inbound policy check (using the tunnel o_key
  as a mark).
- to properly perform an outbound SPD and SAD lookup (using the tunnel
  o_key as a mark).
- to keep the current mark of the skbuff. The skbuff mark is neither
  used nor changed by the vti interface. Only the vti interface o_key
  is used.

SAs have a wildcard mark.
SPs have a mark equal to the vti interface o_key.

The vti interface must be created as follows (i_key = 0, o_key = mark):

   ip link add vti1 mode vti local 1.1.1.1 remote 2.2.2.2 okey 1

The SPs attached to vti1 must be created as follows (mark = vti1 o_key):

   ip xfrm policy add dir out mark 1 tmpl src 1.1.1.1 dst 2.2.2.2 \
      proto esp mode tunnel
   ip xfrm policy add dir in  mark 1 tmpl src 2.2.2.2 dst 1.1.1.1 \
      proto esp mode tunnel

The SAs are created with the default wildcard mark. There is no
distinction between global vs. vti SAs. Just their addresses will
possibly link them to a vti interface:

   ip xfrm state add src 1.1.1.1 dst 2.2.2.2 proto esp spi 1000 mode tunnel \
                 enc "cbc(aes)" "azertyuiopqsdfgh"

   ip xfrm state add src 2.2.2.2 dst 1.1.1.1 proto esp spi 2000 mode tunnel \
                 enc "cbc(aes)" "sqbdhgqsdjqjsdfh"

To avoid matching "global" (not vti) SPs in vti interfaces, global SPs
should no use the default wildcard mark, but explicitly match mark 0.

To avoid a double SPD lookup in input and output (in global and vti SPDs),
the NOPOLICY and NOXFRM options should be set on the vti interfaces:

   echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_policy
   echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_xfrm

The outgoing traffic is steered to vti1 by a route via the vti interface:

   ip route add 192.168.0.0/16 dev vti1

The incoming IPsec traffic is steered to vti1 because its outer addresses
match the vti1 tunnel configuration.

Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-11 14:52:17 -04:00
..
netfilter netfilter: synproxy: fix BUG_ON triggered by corrupt TCP packets 2013-09-30 12:44:38 +02:00
Kconfig net: neighbour: Remove CONFIG_ARPD 2013-09-03 21:41:43 -04:00
Makefile net: gre: move GSO functions to gre_offload 2013-07-03 14:37:39 -07:00
af_inet.c net: net_secret should not depend on TCP 2013-09-28 15:19:40 -07:00
ah4.c ipv4: properly refresh rtable entries on pmtu/redirect events 2013-06-03 00:07:42 -07:00
arp.c net: neighbour: Remove CONFIG_ARPD 2013-09-03 21:41:43 -04:00
cipso_ipv4.c cipso: don't follow a NULL pointer when setsockopt() is called 2012-07-18 09:01:12 -07:00
datagram.c ipv4: Add a socket release callback for datagram sockets 2013-01-21 14:17:05 -05:00
devinet.c net: igmp: Allow user-space configuration of igmp unsolicited report interval 2013-08-09 11:27:46 -07:00
esp4.c net: esp{4,6}: fix potential MTU calculation overflows 2013-08-05 12:26:50 -07:00
fib_frontend.c netlink: fix splat in skb_clone with large messages 2013-06-27 22:44:16 -07:00
fib_lookup.h ipv4: Fix nexthop caching wrt. scoping. 2011-03-24 18:06:47 -07:00
fib_rules.c fib_rules: fix suppressor names and default values 2013-08-03 10:40:23 -07:00
fib_semantics.c ipv4: use next hop exceptions also for input routes 2013-06-28 21:27:47 -07:00
fib_trie.c fib_trie: remove potential out of bound access 2013-08-05 15:26:11 -07:00
gre_demux.c net: gre: move GSO functions to gre_offload 2013-07-03 14:37:39 -07:00
gre_offload.c gso: Update tunnel segmentation to support Tx checksum offload 2013-07-11 12:18:49 -07:00
icmp.c icmp: avoid allocating large struct on stack 2013-06-03 00:28:44 -07:00
igmp.c ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put 2013-09-30 22:28:56 -07:00
inet_connection_sock.c tcp: Remove TCPCT 2013-03-17 14:35:13 -04:00
inet_diag.c netlink: rename ssk to sk in struct netlink_skb_params 2013-04-19 14:57:56 -04:00
inet_fragment.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-07-09 18:24:39 -07:00
inet_hashtables.c net: do not call sock_put() on TIMEWAIT sockets 2013-10-02 17:05:54 -04:00
inet_lro.c ipv4: replace ip_fast_csum with csum_replace2 2013-03-15 09:12:25 -04:00
inet_timewait_sock.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
inetpeer.c ip: generate unique IP identificator if local fragmentation is allowed 2013-09-19 14:11:15 -04:00
ip_forward.c ipv4: introduce rt_uses_gateway 2012-10-08 17:42:36 -04:00
ip_fragment.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-04-22 20:32:51 -04:00
ip_gre.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-08-16 15:37:26 -07:00
ip_input.c net: add SNMP counters tracking incoming ECN bits 2013-08-08 22:24:59 -07:00
ip_options.c net/ipv4: Ensure that location of timestamp option is stored 2013-03-12 05:35:39 -04:00
ip_output.c ip: generate unique IP identificator if local fragmentation is allowed 2013-09-19 14:11:15 -04:00
ip_sockglue.c net: prevent setting ttl=0 via IP_TTL 2013-01-08 17:57:10 -08:00
ip_tunnel.c ip_tunnel: Remove double unregister of the fallback device 2013-10-01 12:42:16 -04:00
ip_tunnel_core.c ip_tunnel_core: Change __skb_push back to skb_push 2013-10-01 12:42:16 -04:00
ip_vti.c vti: get rid of nf mark rule in prerouting 2013-10-11 14:52:17 -04:00
ipcomp.c ipv4: properly refresh rtable entries on pmtu/redirect events 2013-06-03 00:07:42 -07:00
ipconfig.c ipconfig: add informative timeout messages while waiting for carrier 2013-04-02 14:35:33 -04:00
ipip.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-09-05 14:58:52 -04:00
ipmr.c ip: generate unique IP identificator if local fragmentation is allowed 2013-09-19 14:11:15 -04:00
netfilter.c netfilter: add my copyright statements 2013-04-18 20:27:55 +02:00
ping.c net: proc_fs: trivial: print UIDs as unsigned int 2013-08-15 14:37:46 -07:00
proc.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-08-16 15:37:26 -07:00
protocol.c ipv4: Disallow non-namespace aware protocols to register. 2013-02-05 14:42:23 -05:00
raw.c net: raw: do not report ICMP redirects to user space 2013-09-24 10:15:49 -04:00
route.c ipv4: fix ineffective source address selection 2013-10-07 15:26:46 -04:00
syncookies.c net: syncookies: export cookie_v4_init_sequence/cookie_v4_check 2013-08-28 00:27:44 +02:00
sysctl_net_ipv4.c tcp: TSO packets automatic sizing 2013-08-29 15:50:06 -04:00
tcp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-09-05 14:58:52 -04:00
tcp_bic.c tcp: fix undo after RTO for BIC 2012-01-20 14:17:26 -05:00
tcp_cong.c tcp: remove Appropriate Byte Count support 2013-02-05 14:51:16 -05:00
tcp_cubic.c tcp: cubic: fix bug in bictcp_acked() 2013-08-07 10:35:08 -07:00
tcp_diag.c inet_diag: Rename inet_diag_req into inet_diag_req_v2 2012-01-11 12:56:06 -08:00
tcp_fastopen.c tcp: add server ip to encrypt cookie in fast open 2013-08-10 00:35:33 -07:00
tcp_highspeed.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_htcp.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_hybla.c tcp: bool conversions 2012-05-17 14:59:59 -04:00
tcp_illinois.c net: fix divide by zero in tcp algorithm illinois 2012-11-01 11:55:59 -04:00
tcp_input.c tcp: do not forget FIN in tcp_shifted_skb() 2013-10-04 14:16:36 -04:00
tcp_ipv4.c tcp: Change return value of tcp_rcv_established() 2013-09-04 00:27:28 -04:00
tcp_lp.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tcp_memcontrol.c memcg: rename RESOURCE_MAX to RES_COUNTER_MAX 2013-09-12 15:38:02 -07:00
tcp_metrics.c tcp: fix RTO calculated from cached RTT 2013-09-17 19:08:08 -04:00
tcp_minisocks.c tcp: consolidate SYNACK RTT sampling 2013-07-22 17:53:42 -07:00
tcp_offload.c net: tcp: move GRO/GSO functions to tcp_offload 2013-06-07 14:39:05 -07:00
tcp_output.c tcp: Always set options to 0 before calling tcp_established_options 2013-10-02 16:32:43 -04:00
tcp_probe.c tcp: Change return value of tcp_rcv_established() 2013-09-04 00:27:28 -04:00
tcp_scalable.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_timer.c tcp: refactor F-RTO 2013-03-21 11:47:50 -04:00
tcp_vegas.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_vegas.h
tcp_veno.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_westwood.c tcp: refactor F-RTO 2013-03-21 11:47:50 -04:00
tcp_yeah.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tunnel4.c net: Convert printks to pr_<level> 2012-03-11 23:42:51 -07:00
udp.c net: udp: do not report ICMP redirects to user space 2013-09-24 10:15:49 -04:00
udp_diag.c netlink: rename ssk to sk in struct netlink_skb_params 2013-04-19 14:57:56 -04:00
udp_impl.h ipv4: fix checkpatch errors 2012-04-15 12:37:19 -04:00
udp_offload.c net: udp4: move GSO functions to udp_offload 2013-06-12 00:47:25 -07:00
udplite.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
xfrm4_input.c net: Add skb_unclone() helper function. 2013-02-15 15:10:37 -05:00
xfrm4_mode_beet.c ipsec: be careful of non existing mac headers 2012-02-23 16:50:45 -05:00
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c ip: generate unique IP identificator if local fragmentation is allowed 2013-09-19 14:11:15 -04:00
xfrm4_output.c xfrm: revert ipv4 mtu determination to dst_mtu 2013-08-26 12:40:53 +02:00
xfrm4_policy.c xfrm: Decode sessions with output interface. 2013-09-16 09:39:43 +02:00
xfrm4_state.c xfrm: make local error reporting more robust 2013-08-14 13:07:12 +02:00
xfrm4_tunnel.c sit: add IPv4 over IPv4 support 2013-05-31 17:19:05 -07:00