OpenCloudOS-Kernel/net/ipv6
Eric Dumazet 46d3ceabd8 tcp: TCP Small Queues
This introduces TSQ (TCP Small Queues).

The goal of TSQ is to reduce the number of TCP packets in xmit queues
(qdisc & device queues), in order to reduce RTT and cwnd bias, which are
part of the bufferbloat problem.

sk->sk_wmem_alloc is not allowed to grow above a given limit,
allowing no more than ~128KB [1] per TCP socket in the qdisc/dev layers
at any given time.
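A minimal userspace sketch of this per-socket limit follows. It assumes the check is made before each segment is handed to the qdisc. The struct and function names (`tsq_sock`, `tsq_may_send`) are hypothetical and are not kernel APIs. `limit` stands in for the tunable in footnote [1], and plain payload bytes replace the kernel's truesize accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a TCP socket (not struct sock).
 * The kernel tracks sk->sk_wmem_alloc atomically and charges skb truesize;
 * this sketch simply counts bytes. */
struct tsq_sock {
    size_t wmem_alloc; /* bytes currently sitting in qdisc/device queues */
    size_t limit;      /* models the tcp_limit_output_bytes tunable */
    bool throttled;    /* set when the limit stopped transmission */
};

/* Returns true and charges 'len' bytes if the socket is still below its
 * limit. Otherwise it marks the socket throttled and returns false, so the
 * segment stays in the socket write queue until a TX completion frees space. */
static bool tsq_may_send(struct tsq_sock *sk, size_t len)
{
    assert(sk != NULL);
    if (sk->wmem_alloc >= sk->limit) {
        sk->throttled = true;
        return false;
    }
    sk->wmem_alloc += len;
    return true;
}
```

With a 131072-byte limit, two 65536-byte segments are accepted and a third call returns false and sets `throttled`. Because this sketch checks before charging, the queued total can overshoot the limit by up to one segment.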

TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.

As a side effect, setting the limit to 40000 automatically reduces the
standard GSO max limit (65536) to 40000/2 = 20000. This can help reduce
latencies of high-priority packets, since TSO packets are smaller.
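The sizing rule can be expressed as a simple cap. The helper below is hypothetical (`tsq_tso_cap` is not a kernel function) and assumes that both the GSO maximum and the limit are given in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: cap a TSO/GSO packet at half the TSQ limit, so that
 * two such packets can be queued below the socket at the same time. */
static size_t tsq_tso_cap(size_t gso_max_size, size_t limit)
{
    assert(limit > 0);
    size_t half = limit / 2;
    return half < gso_max_size ? half : gso_max_size;
}
```

`tsq_tso_cap(65536, 40000)` yields 20000, which matches the 40000/2 example. With a 131072-byte limit, half the limit equals 65536, so the usual GSO maximum is unchanged.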

This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.
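The following is a simplified userspace model of the destructor switch. Every name in it (`model_sock`, `model_skb`, `model_sock_wfree`, `model_tcp_wfree`, `model_skb_orphan`) is hypothetical. In the kernel, tcp_wfree() does not transmit directly; it defers to a tasklet. This sketch represents that deferred work only as a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_sock {
    size_t wmem_alloc; /* bytes still charged to the socket */
    bool throttled;    /* the limit stopped transmission earlier */
    int restarts;      /* how many completions asked for more xmit */
};

struct model_skb {
    struct model_sock *sk;
    size_t truesize;
    void (*destructor)(struct model_skb *skb);
};

/* Generic release: only return the bytes to the socket. */
static void model_sock_wfree(struct model_skb *skb)
{
    skb->sk->wmem_alloc -= skb->truesize;
}

/* TCP-specific release: also request more transmission if the socket had
 * been throttled (deferred work in the real code, a counter here). */
static void model_tcp_wfree(struct model_skb *skb)
{
    struct model_sock *sk = skb->sk;
    assert(sk != NULL);
    sk->wmem_alloc -= skb->truesize;
    if (sk->throttled) {
        sk->throttled = false;
        sk->restarts++;
    }
}

/* Model of skb_orphan(): run the destructor once, then detach the skb. */
static void model_skb_orphan(struct model_skb *skb)
{
    if (skb->destructor) {
        skb->destructor(skb);
        skb->destructor = NULL;
        skb->sk = NULL;
    }
}
```

Only skbs created by TCP would receive `model_tcp_wfree`. The other senders keep the plain release path, so no new work is added for them.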

Results on my dev machines (tg3/ixgbe NICs) are really impressive,
using standard pfifo_fast, with or without TSO/GSO.

Without any reduction of nominal bandwidth, buffering per bulk sender
drops to:
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132ms)

I no longer have 4 MBytes backlogged in the qdisc by a single netperf
session, and socket autotuning on both sides no longer uses 4 MBytes.

As the skb destructor cannot restart xmit itself (the qdisc lock might be
taken at this point), we delegate the work to a tasklet. We use one
tasklet per CPU for performance reasons.

If the tasklet finds a socket owned by the user, it sets the TSQ_OWNED
flag. This flag is tested in a new protocol method called from
release_sock(), which then sends new segments if needed.
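The deferral path can be modelled with one pending list per CPU. This is a single-threaded sketch with several assumptions. `MODEL_NR_CPUS`, the list layout, and all function names are hypothetical. No locking or interrupt masking is modelled. `model_write_xmit()` only counts the calls at the points where the kernel would push new segments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

struct tsq_model_sock {
    bool owned_by_user;          /* models sock_owned_by_user() */
    bool queued;                 /* already on a per-CPU pending list? */
    bool tsq_owned;              /* models the TSQ_OWNED flag */
    int xmit_calls;              /* times we tried to send more segments */
    struct tsq_model_sock *next; /* per-CPU list linkage */
};

/* One pending list per CPU, so completions on different CPUs never
 * contend on a shared list. */
static struct tsq_model_sock *tsq_list[MODEL_NR_CPUS];

/* Called from the (modelled) destructor. It must not transmit itself,
 * so it only queues the socket for this CPU's tasklet, at most once. */
static void tsq_defer(int cpu, struct tsq_model_sock *sk)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    if (sk->queued)
        return;
    sk->queued = true;
    sk->next = tsq_list[cpu];
    tsq_list[cpu] = sk;
}

static void model_write_xmit(struct tsq_model_sock *sk) { sk->xmit_calls++; }

/* Tasklet body: drain this CPU's list. Sockets locked by the user are
 * flagged instead of touched, leaving the work to release time. */
static void tsq_tasklet(int cpu)
{
    struct tsq_model_sock *sk = tsq_list[cpu];
    tsq_list[cpu] = NULL;
    while (sk) {
        struct tsq_model_sock *next = sk->next;
        sk->queued = false;
        sk->next = NULL;
        if (sk->owned_by_user)
            sk->tsq_owned = true;
        else
            model_write_xmit(sk);
        sk = next;
    }
}

/* Hook run when the user releases the socket lock. */
static void model_release_sock(struct tsq_model_sock *sk)
{
    sk->owned_by_user = false;
    if (sk->tsq_owned) {
        sk->tsq_owned = false;
        model_write_xmit(sk);
    }
}
```

In this model, two completions for the same socket before the tasklet runs result in a single transmit attempt, because the `queued` flag suppresses the duplicate.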

[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
  but some drivers call it in their start_xmit() handler.
  These drivers should at least use BQL, or else a single TCP
  session can still fill the whole NIC TX ring, since TSQ will
  have no effect.
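The tunable from footnote [1] is an ordinary procfs file, so it can be read like any other sysctl. The small helper below is hypothetical (`read_sysctl_long` is not an existing library function). It returns a negative errno value if the file is missing, for example on kernels without TSQ.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Read a single integer from a sysctl-style file such as
 * /proc/sys/net/ipv4/tcp_limit_output_bytes.
 * Returns 0 on success, or a negative errno value on failure. */
static int read_sysctl_long(const char *path, long *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "r");
    if (!f)
        return -errno;
    int ok = fscanf(f, "%ld", out);
    fclose(f);
    return ok == 1 ? 0 : -EINVAL;
}
```

For example, call `read_sysctl_long("/proc/sys/net/ipv4/tcp_limit_output_bytes", &v)`. The same value can be inspected with `sysctl net.ipv4.tcp_limit_output_bytes`.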

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11 18:12:59 -07:00
netfilter netfilter: nf_conntrack: generalize nf_ct_l4proto_net 2012-07-04 19:37:22 +02:00
Kconfig xfrm: make xfrm_algo.c a module 2012-05-15 13:13:34 -04:00
Makefile [IPV6] MROUTE: Support multicast forwarding. 2008-04-05 22:33:38 +09:00
addrconf.c Merge branch 'delete-tokenring' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2012-05-16 01:02:40 -04:00
addrconf_core.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
addrlabel.c ipv6: bool/const conversions phase2 2012-05-19 01:08:16 -04:00
af_inet6.c ipv6: bool conversions phase1 2012-05-18 02:24:13 -04:00
ah6.c ipv6: Handle PMTU in ICMP error handlers. 2012-06-15 14:54:11 -07:00
anycast.c ipv6: bool/const conversions phase2 2012-05-19 01:08:16 -04:00
datagram.c ipv6: bool/const conversions phase2 2012-05-19 01:08:16 -04:00
esp6.c ipv6: Handle PMTU in ICMP error handlers. 2012-06-15 14:54:11 -07:00
exthdrs.c net: Remove casts to same type 2012-06-04 11:45:11 -04:00
exthdrs_core.c ipv6: bool/const conversions phase2 2012-05-19 01:08:16 -04:00
fib6_rules.c net/ipv6/fib6_rules.c: Checkpatch cleanup 2012-04-02 04:33:46 -04:00
icmp.c inet: Minimize use of cached route inetpeer. 2012-07-10 22:40:11 -07:00
inet6_connection_sock.c tcp: pass fl6 to inet6_csk_route_req() 2012-06-28 17:53:50 -07:00
inet6_hashtables.c net: Compute protocol sequence numbers and fragment IDs using MD5. 2011-08-06 18:33:19 -07:00
ip6_fib.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-06-25 15:50:32 -07:00
ip6_flowlabel.c ipv6: bool/const conversions phase2 2012-05-19 01:08:16 -04:00
ip6_input.c inet: Sanitize inet{,6} protocol demux. 2012-06-19 18:56:21 -07:00
ip6_output.c inet: Minimize use of cached route inetpeer. 2012-07-10 22:40:11 -07:00
ip6_tunnel.c net: Fix (nearly-)kernel-doc comments for various functions 2012-07-10 23:13:45 -07:00
ip6mr.c ip6mr: Do not use RTA_PUT() macros 2012-06-27 15:36:44 -07:00
ipcomp6.c ipv6: Handle PMTU in ICMP error handlers. 2012-06-15 14:54:11 -07:00
ipv6_sockglue.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
mcast.c ipv6: bool/const conversions phase2 2012-05-19 01:08:16 -04:00
mip6.c ipv6: correct the ipv6 option name - Pad0 to Pad1 2012-05-17 15:49:51 -04:00
ndisc.c inet: Minimize use of cached route inetpeer. 2012-07-10 22:40:11 -07:00
netfilter.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
proc.c ipv6: fix per device IP snmp counters 2012-01-17 23:56:18 -05:00
protocol.c inet: Sanitize inet{,6} protocol demux. 2012-06-19 18:56:21 -07:00
raw.c inet: Sanitize inet{,6} protocol demux. 2012-06-19 18:56:21 -07:00
reassembly.c ipv6: use skb coalescing in reassembly 2012-05-19 18:34:57 -04:00
route.c rtnetlink: Remove ts/tsage args to rtnl_put_cacheinfo(). 2012-07-10 22:40:13 -07:00
sit.c ipv4: Handle PMTU in all ICMP error handlers. 2012-06-14 22:22:07 -07:00
syncookies.c ipv6: remove redundant declarations 2012-07-05 23:40:28 -07:00
sysctl_net_ipv6.c net: Delete all remaining instances of ctl_path 2012-04-20 21:22:30 -04:00
tcp_ipv6.c tcp: TCP Small Queues 2012-07-11 18:12:59 -07:00
tunnel6.c net: ipv6: Standardize prefixes for message logging 2012-05-16 01:01:03 -04:00
udp.c net: skb_free_datagram_locked() doesnt drop all packets 2012-06-27 15:40:57 -07:00
udp_impl.h net: Make setsockopt() optlen be unsigned. 2009-09-30 16:12:20 -07:00
udplite.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
xfrm6_input.c netfilter: ipv6: use NFPROTO values for NF_HOOK invocation 2010-03-25 16:00:49 +01:00
xfrm6_mode_beet.c ipsec: be careful of non existing mac headers 2012-02-23 16:50:45 -05:00
xfrm6_mode_ro.c [IPSEC]: Make x->lastused an unsigned long 2008-01-28 14:53:52 -08:00
xfrm6_mode_transport.c
xfrm6_mode_tunnel.c ipsec: be careful of non existing mac headers 2012-02-23 16:50:45 -05:00
xfrm6_output.c xfrm6: remove unneeded NULL check in __xfrm6_output() 2012-02-01 02:52:48 -05:00
xfrm6_policy.c ipv6: Store route neighbour in rt6_info struct. 2012-07-05 02:41:58 -07:00
xfrm6_state.c net: remove ipv6_addr_copy() 2011-11-22 16:43:32 -05:00
xfrm6_tunnel.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00