OpenCloudOS-Kernel/net/ipv4
Eric Dumazet 93ab6cc691 tcp: implement mmap() for zero copy receive
Some networks can make sure TCP payload can exactly fit 4KB pages,
with well chosen MSS/MTU and architectures.

Implement mmap() system call so that applications can avoid
copying data without complex splice() games.

Note that a successful mmap( X bytes) on TCP socket is consuming
bytes, as if recvmsg() has been done. (tp->copied += X)

Only PROT_READ mappings are accepted, as skb page frags
are fundamentally shared and read only.

If tcp_mmap() finds data that is not a full page, or a patch of
urgent data, -EINVAL is returned, no bytes are consumed.

Application must fallback to recvmsg() to read the problematic sequence.

mmap() wont block,  regardless of socket being in blocking or
non-blocking mode. If not enough bytes are in receive queue,
mmap() would return -EAGAIN, or -EIO if socket is in a state
where no other bytes can be added into receive queue.

An application might use SO_RCVLOWAT, poll() and/or ioctl( FIONREAD)
to efficiently use mmap()

On the sender side, MSG_EOR might help to clearly separate unaligned
headers and 4K-aligned chunks if necessary.

Tested:

mlx4 (cx-3) 40Gbit NIC, with tcp_mmap program provided in following patch.
MTU set to 4168  (4096 TCP payload, 40 bytes IPv6 header, 32 bytes TCP header)

Without mmap() (tcp_mmap -s)

received 32768 MB (0 % mmap'ed) in 8.13342 s, 33.7961 Gbit,
  cpu usage user:0.034 sys:3.778, 116.333 usec per MB, 63062 c-switches
received 32768 MB (0 % mmap'ed) in 8.14501 s, 33.748 Gbit,
  cpu usage user:0.029 sys:3.997, 122.864 usec per MB, 61903 c-switches
received 32768 MB (0 % mmap'ed) in 8.11723 s, 33.8635 Gbit,
  cpu usage user:0.048 sys:3.964, 122.437 usec per MB, 62983 c-switches
received 32768 MB (0 % mmap'ed) in 8.39189 s, 32.7552 Gbit,
  cpu usage user:0.038 sys:4.181, 128.754 usec per MB, 55834 c-switches

With mmap() on receiver (tcp_mmap -s -z)

received 32768 MB (100 % mmap'ed) in 8.03083 s, 34.2278 Gbit,
  cpu usage user:0.024 sys:1.466, 45.4712 usec per MB, 65479 c-switches
received 32768 MB (100 % mmap'ed) in 7.98805 s, 34.4111 Gbit,
  cpu usage user:0.026 sys:1.401, 43.5486 usec per MB, 65447 c-switches
received 32768 MB (100 % mmap'ed) in 7.98377 s, 34.4296 Gbit,
  cpu usage user:0.028 sys:1.452, 45.166 usec per MB, 65496 c-switches
received 32768 MB (99.9969 % mmap'ed) in 8.01838 s, 34.281 Gbit,
  cpu usage user:0.02 sys:1.446, 44.7388 usec per MB, 65505 c-switches

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-16 18:26:37 -04:00
..
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-01 19:49:34 -04:00
Kconfig ipmr,ipmr6: Define a uniform vif_device 2018-03-01 13:13:23 -05:00
Makefile ipmr,ipmr6: Define a uniform vif_device 2018-03-01 13:13:23 -05:00
af_inet.c tcp: implement mmap() for zero copy receive 2018-04-16 18:26:37 -04:00
ah4.c net: use -ENOSPC for transient busy indication 2017-11-03 22:11:17 +08:00
arp.c arp: fix arp_filter on l3slave devices 2018-04-05 22:05:03 -04:00
cipso_ipv4.c tcp/dccp: fix ireq->opt races 2017-10-21 01:33:19 +01:00
datagram.c
devinet.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
esp4.c esp4: remove redundant initialization of pointer esph 2018-02-13 13:59:03 +01:00
esp4_offload.c esp: check the NETIF_F_HW_ESP_TX_CSUM bit before segmenting 2018-02-27 10:46:01 +01:00
fib_frontend.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
fib_lookup.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fib_notifier.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
fib_rules.c ipv6: route: dissect flow in input path if fib rules need it 2018-02-28 22:44:44 -05:00
fib_semantics.c route: check sysctl_fib_multipath_use_neigh earlier than hash 2018-04-01 20:57:39 -04:00
fib_trie.c net/ipv4: Allow notifier to fail route replace 2018-03-29 14:10:30 -04:00
fou.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
gre_demux.c
gre_offload.c gso: fix payload length when gso_size is zero 2017-10-08 10:12:15 -07:00
icmp.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
igmp.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
inet_connection_sock.c Revert "defer call to mem_cgroup_sk_alloc()" 2018-02-02 19:49:31 -05:00
inet_diag.c sock_diag: request _diag module only when the family or proto has been registered 2018-03-12 11:03:42 -04:00
inet_fragment.c inet: frags: remove inet_frag_maybe_warn_overflow() 2018-03-31 23:25:39 -04:00
inet_hashtables.c inet: Avoid unitialized variable warning in inet_unhash() 2018-02-01 09:48:42 -05:00
inet_timewait_sock.c soreuseport: initialise timewait reuseport field 2018-04-07 22:32:32 -04:00
inetpeer.c inetpeer: fix uninit-value in inet_getpeer 2018-04-09 10:57:35 -04:00
ip_forward.c net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len 2018-03-04 17:49:17 -05:00
ip_fragment.c inet: frags: fix ip6frag_low_thresh boundary 2018-04-04 12:04:59 -04:00
ip_gre.c ip_gre: clear feature flags when incompatible o_flags are set 2018-04-10 11:03:32 -04:00
ip_input.c net: Make ip_ra_chain per struct net 2018-03-22 15:12:56 -04:00
ip_options.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ip_output.c net: avoid unneeded atomic operation in ip*_append_data() 2018-04-04 11:53:08 -04:00
ip_sockglue.c net: Replace ip_ra_lock with per-net mutex 2018-03-22 15:12:56 -04:00
ip_tunnel.c ip_tunnel: better validate user provided tunnel names 2018-04-05 15:16:15 -04:00
ip_tunnel_core.c net: store port/representator id in metadata_dst 2017-06-25 11:42:01 -04:00
ip_vti.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ipcomp.c
ipconfig.c net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
ipip.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ipmr.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ipmr_base.c ipmr: Make ipmr_dump() common 2018-03-26 13:14:43 -04:00
netfilter.c netfilter: remove struct nf_afinfo and its helper functions 2018-01-08 18:11:02 +01:00
ping.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
proc.c inet: frags: break the 2GB limit for frags storage 2018-03-31 23:25:39 -04:00
protocol.c net: Add sysctl to toggle early demux for tcp and udp 2017-03-24 13:17:07 -07:00
raw.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
raw_diag.c net: ipv6: add second dif to raw socket lookups 2017-08-07 11:39:22 -07:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-09 17:04:10 -07:00
syncookies.c net/ipv4: disable SMC TCP option with SYN Cookies 2018-03-25 20:53:54 -04:00
sysctl_net_ipv4.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
tcp.c tcp: implement mmap() for zero copy receive 2018-04-16 18:26:37 -04:00
tcp_bbr.c net-tcp_bbr: set tp->snd_ssthresh to BDP upon STARTUP exit 2018-03-16 15:07:48 -04:00
tcp_bic.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_cdg.c tcp: cdg: make struct tcp_cdg static 2017-10-16 21:24:25 +01:00
tcp_cong.c tcp: Namespace-ify sysctl_tcp_default_congestion_control 2017-11-15 14:09:52 +09:00
tcp_cubic.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_dctcp.c Revert "dctcp: update cwnd on congestion event" 2016-12-06 11:34:24 -05:00
tcp_diag.c net: sock: replace sk_state_load with inet_sk_state_load and remove sk_state_store 2017-12-20 14:00:25 -05:00
tcp_fastopen.c tcp: pause Fast Open globally after third consecutive timeout 2017-12-13 15:51:12 -05:00
tcp_highspeed.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_htcp.c tcp: fix cwnd undo in Reno and HTCP congestion controls 2017-08-06 21:25:10 -07:00
tcp_hybla.c tcp: make undo_cwnd mandatory for congestion modules 2016-11-21 13:20:17 -05:00
tcp_illinois.c net/tcp/illinois: replace broken algorithm reference link 2018-02-28 12:03:47 -05:00
tcp_input.c tcp: avoid extra wakeups for SO_RCVLOWAT users 2018-04-16 18:26:37 -04:00
tcp_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-03-31 23:33:04 -04:00
tcp_lp.c tcp: switch TCP TS option (RFC 7323) to 1ms clock 2017-05-17 16:06:01 -04:00
tcp_metrics.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
tcp_minisocks.c crypto : chtls - CPL handler definition 2018-03-31 23:37:32 -04:00
tcp_nv.c tcp_nv: fix potential integer overflow in tcpnv_acked 2018-01-31 10:26:30 -05:00
tcp_offload.c gso: validate gso_type in GSO handlers 2018-01-22 16:01:30 -05:00
tcp_output.c tcp_bbr: better deal with suboptimal GSO (II) 2018-03-01 21:44:28 -05:00
tcp_rate.c tcp: invalidate rate samples during SACK reneging 2017-12-08 10:07:02 -05:00
tcp_recovery.c tcp: evaluate packet losses upon RTT change 2017-12-08 14:14:11 -05:00
tcp_scalable.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_timer.c tcp: purge write queue upon aborting the connection 2018-03-07 15:01:03 -05:00
tcp_ulp.c net: add a UID to use for ULP socket assignment 2018-02-06 11:39:31 +01:00
tcp_vegas.c tcp: fix under-evaluated ssthresh in TCP Vegas 2017-09-29 06:07:00 +01:00
tcp_vegas.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tcp_veno.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tcp_westwood.c tcp: Revert "tcp: remove CA_ACK_SLOWPATH" 2017-08-30 11:20:08 -07:00
tcp_yeah.c tcp: consolidate congestion control undo functions 2017-08-06 21:25:10 -07:00
tunnel4.c inet: whitespace cleanup 2018-02-28 11:43:28 -05:00
udp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-03-31 23:33:04 -04:00
udp_diag.c net: ipv6: add second dif to udp socket lookups 2017-08-07 11:39:22 -07:00
udp_impl.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
udp_offload.c gso: validate gso_type in GSO handlers 2018-01-22 16:01:30 -05:00
udp_tunnel.c net: add infrastructure to un-offload UDP tunnel port 2017-07-24 13:52:59 -07:00
udplite.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
xfrm4_input.c xfrm: Reinject transport-mode packets through tasklet 2017-12-19 08:23:21 +01:00
xfrm4_mode_beet.c networking: make skb_pull & friends return void pointers 2017-06-16 11:48:39 -04:00
xfrm4_mode_transport.c xfrm: Add encapsulation header offsets while SKB is not encrypted 2017-04-14 10:07:39 +02:00
xfrm4_mode_tunnel.c xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto 2018-03-07 10:54:29 +01:00
xfrm4_output.c net: xfrm: use skb_gso_validate_network_len() to check gso sizes 2018-03-04 17:49:17 -05:00
xfrm4_policy.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
xfrm4_protocol.c xfrm: input: constify xfrm_input_afinfo 2017-02-09 10:22:17 +01:00
xfrm4_state.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xfrm4_tunnel.c