OpenCloudOS-Kernel/net/ipv4
Eric Dumazet 2f53384424 tcp: allow splice() to build full TSO packets
vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb (assuming PAGE_SIZE=4096).

The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when the pipe is not already filled, and tso_fragment()
tries to split these skbs into MSS multiples.

4096 bytes are usually split into an skb carrying 2 MSS and a remaining
sub-MSS skb (assuming MTU=1500).
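
The arithmetic behind that split can be sketched as below. This is an
illustration only, not kernel code: the helper split_by_mss() is
hypothetical, and the MSS of 1448 bytes is an assumption (MTU 1500 minus
20 bytes of IPv4 header, 20 bytes of TCP header and 12 bytes of
timestamp option); the commit message itself only states MTU=1500.

```c
#include <stddef.h>

/* Result of cutting a payload into MSS-sized segments. */
struct mss_split {
	size_t full_segments;	/* number of full-MSS segments */
	size_t remainder;	/* bytes left for a final sub-MSS segment */
};

/*
 * Hypothetical helper: how many full-MSS segments fit in `bytes`,
 * and how many bytes are left over. `mss` must be non-zero.
 */
static struct mss_split split_by_mss(size_t bytes, size_t mss)
{
	struct mss_split s;

	s.full_segments = bytes / mss;
	s.remainder = bytes % mss;
	return s;
}
```

Under these assumptions, split_by_mss(4096, 1448) gives two full
segments and a 1200-byte remainder, so each page pushed on its own ends
with a sub-MSS frame. Sixteen pages coalesced into one 65536-byte skb
would instead leave a single 376-byte remainder after 45 full segments.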

This makes slow start suboptimal because many small frames are sent to
the qdisc/driver layers instead of big ones (constrained by cwnd and
packets in flight, of course).

In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.

Call tcp_push() only if MSG_MORE is not set in the flags parameter.

This bit is set automatically by splice() internals on every page except
the last one, or on all pages if the user specified the SPLICE_F_MORE
splice() flag.
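
The effect of that rule can be illustrated with a small user-space
model. It is a sketch, not the kernel implementation: MODEL_MSG_MORE,
page_flags() and count_pushes() are hypothetical names introduced here,
and the model only counts how often a push would be requested; it
ignores cwnd, Nagle and pipe fill level.

```c
#include <stddef.h>

/* Stand-in for the kernel's MSG_MORE bit (assumption for this model). */
#define MODEL_MSG_MORE 0x8000

/*
 * Flags the splice() path would hand over for page `idx` of `npages`:
 * the "more" bit is set on every page except the last one, or on all
 * pages when the caller asked for SPLICE_F_MORE (`user_more` != 0).
 */
static int page_flags(size_t idx, size_t npages, int user_more)
{
	int last = (idx + 1 == npages);

	return (!last || user_more) ? MODEL_MSG_MORE : 0;
}

/*
 * Count how many pushes are requested while sending `npages` pages.
 * patched == 0: push after every page (old behaviour).
 * patched != 0: push only when the "more" bit is clear (new behaviour).
 */
static size_t count_pushes(size_t npages, int user_more, int patched)
{
	size_t pushes = 0;
	size_t i;

	for (i = 0; i < npages; i++) {
		int flags = page_flags(i, npages, user_more);

		if (!patched || !(flags & MODEL_MSG_MORE))
			pushes++;
	}
	return pushes;
}
```

In this model, 16 pages trigger 16 pushes with the old behaviour and a
single push with the new one; with SPLICE_F_MORE set by the user, no
push is requested at all until a later call clears the bit.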

In some workloads, this can reduce the number of logical packets sent by
an order of magnitude, making zero-copy TCP actually faster than
one-copy :)

Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 17:35:43 -04:00
netfilter netfilter: remove forward module param confusion. 2012-03-22 22:36:17 -04:00
Kconfig net: Fix build regression when INET_UDP_DIAG=y and IPV6=m 2012-02-07 13:35:28 -05:00
Makefile tcp memory pressure controls 2011-12-12 19:04:10 -05:00
af_inet.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
ah4.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
arp.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
cipso_ipv4.c ipv4: Convert call_rcu() to kfree_rcu(), drop opt_kfree_rcu() 2012-02-21 09:03:31 -08:00
datagram.c ipv4: Lock socket and use cork flow in ip4_datagram_connect(). 2011-05-08 13:48:57 -07:00
devinet.c Disintegrate and delete asm/system.h 2012-03-28 15:58:21 -07:00
esp4.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
fib_frontend.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
fib_lookup.h ipv4: Fix nexthop caching wrt. scoping. 2011-03-24 18:06:47 -07:00
fib_rules.c net: ipv4: export fib_lookup and fib_table_lookup 2011-12-04 22:43:33 +01:00
fib_semantics.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
fib_trie.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
gre.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
icmp.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
igmp.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
inet_connection_sock.c tcp: bind() optimize port allocation 2012-01-25 21:50:43 -05:00
inet_diag.c netlink: add netlink_dump_control structure for netlink_dump_start() 2012-02-26 14:10:06 -05:00
inet_fragment.c net/ipv4: EXPORT_SYMBOL cleanups 2010-07-12 12:57:54 -07:00
inet_hashtables.c net: Compute protocol sequence numbers and fragment IDs using MD5. 2011-08-06 18:33:19 -07:00
inet_lro.c net: add skb frag size accessors 2011-10-19 03:10:46 -04:00
inet_timewait_sock.c net: Fix files explicitly needing to include module.h 2011-10-31 19:30:28 -04:00
inetpeer.c route: Remove redirect_genid 2012-03-08 00:30:32 -08:00
ip_forward.c ipv4: Save nexthop address of LSRR/SSRR option to IPCB. 2011-11-23 19:19:32 -05:00
ip_fragment.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
ip_gre.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
ip_input.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
ip_options.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
ip_output.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
ip_sockglue.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2012-03-20 21:04:47 -07:00
ipcomp.c net: Convert printks to pr_<level> 2012-03-11 23:42:51 -07:00
ipconfig.c net: Convert printks to pr_<level> 2012-03-11 23:42:51 -07:00
ipip.c net: Convert printks to pr_<level> 2012-03-11 23:42:51 -07:00
ipmr.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
netfilter.c netfilter: possible unaligned packet header in ip_route_me_harder 2011-11-21 18:46:18 +01:00
ping.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
proc.c tcp: reduce out_of_order memory use 2012-03-19 16:53:08 -04:00
protocol.c net: add __rcu annotations to protocol 2010-10-27 11:37:31 -07:00
raw.c net: Convert printks to pr_<level> 2012-03-11 23:42:51 -07:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-04-02 17:53:39 -07:00
syncookies.c tcp: fix syncookie regression 2012-03-11 15:52:12 -07:00
sysctl_net_ipv4.c tcp: properly initialize tcp memory limits 2012-02-02 14:34:41 -05:00
tcp.c tcp: allow splice() to build full TSO packets 2012-04-03 17:35:43 -04:00
tcp_bic.c tcp: fix undo after RTO for BIC 2012-01-20 14:17:26 -05:00
tcp_cong.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
tcp_cubic.c tcp: fix undo after RTO for CUBIC 2012-01-20 14:17:26 -05:00
tcp_diag.c inet_diag: Rename inet_diag_req into inet_diag_req_v2 2012-01-11 12:56:06 -08:00
tcp_highspeed.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_htcp.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_hybla.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_illinois.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_input.c tcp: reduce out_of_order memory use 2012-03-19 16:53:08 -04:00
tcp_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-03-18 23:29:41 -04:00
tcp_lp.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tcp_memcontrol.c Merge branch 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-03-20 18:11:21 -07:00
tcp_minisocks.c tcp: md5: rcu conversion 2012-01-31 12:14:00 -05:00
tcp_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-02-04 16:39:32 -05:00
tcp_probe.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
tcp_scalable.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_timer.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
tcp_vegas.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_vegas.h
tcp_veno.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_westwood.c tcp: mark tcp_congestion_ops read_mostly 2011-03-10 00:40:17 -08:00
tcp_yeah.c Fix common misspellings 2011-03-31 11:26:23 -03:00
tunnel4.c net: Convert printks to pr_<level> 2012-03-11 23:42:51 -07:00
udp.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
udp_diag.c net: kill duplicate included header 2012-01-17 10:31:12 -05:00
udp_impl.h net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
udplite.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00
xfrm4_input.c net/ipv4: EXPORT_SYMBOL cleanups 2010-07-12 12:57:54 -07:00
xfrm4_mode_beet.c ipsec: be careful of non existing mac headers 2012-02-23 16:50:45 -05:00
xfrm4_mode_transport.c [IPSEC]: Use IPv6 calling convention as the convention for x->mode->output 2007-10-10 16:55:54 -07:00
xfrm4_mode_tunnel.c ipsec: be careful of non existing mac headers 2012-02-23 16:50:45 -05:00
xfrm4_output.c xfrm4: Don't call icmp_send on local error 2011-07-01 17:33:19 -07:00
xfrm4_policy.c ipv4: fix ipsec forward performance regression 2011-10-24 03:01:22 -04:00
xfrm4_state.c net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules 2011-10-31 19:30:30 -04:00
xfrm4_tunnel.c net: ipv4: Standardize prefixes for message logging 2012-03-12 17:05:21 -07:00