OpenCloudOS-Kernel/net/ipv4
Martin KaFai Lau 479f85c366 tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks
Assuming SOF_TIMESTAMPING_TX_ACK is on. When dup acks are received,
it could incorrectly think that a skb has already
been acked and queue a SCM_TSTAMP_ACK cmsg to the
sk->sk_error_queue.

In tcp_ack_tstamp(), it checks
'between(shinfo->tskey, prior_snd_una, tcp_sk(sk)->snd_una - 1)'.
If prior_snd_una == tcp_sk(sk)->snd_una like the following packetdrill
script, between() returns true but the tskey is actually not acked.
e.g. try between(3, 2, 1).

The fix is to replace between() with one before() and one !before().
By doing this, the -1 offset on the tcp_sk(sk)->snd_una can also be
removed.

A packetdrill script is used to reproduce the dup ack scenario.
Due to the lacking cmsg support in packetdrill (may be I
cannot find it),  a BPF prog is used to kprobe to
sock_queue_err_skb() and print out the value of
serr->ee.ee_data.

Both the packetdrill and the bcc BPF script is attached at the end of
this commit message.

BPF Output Before Fix:
~~~~~~
      <...>-2056  [001] d.s.   433.927987: : ee_data:1459  #incorrect
packetdrill-2056  [001] d.s.   433.929563: : ee_data:1459  #incorrect
packetdrill-2056  [001] d.s.   433.930765: : ee_data:1459  #incorrect
packetdrill-2056  [001] d.s.   434.028177: : ee_data:1459
packetdrill-2056  [001] d.s.   434.029686: : ee_data:14599

BPF Output After Fix:
~~~~~~
      <...>-2049  [000] d.s.   113.517039: : ee_data:1459
      <...>-2049  [000] d.s.   113.517253: : ee_data:14599

BCC BPF Script:
~~~~~~
#!/usr/bin/env python

from __future__ import print_function
from bcc import BPF

bpf_text = """
#include <uapi/linux/ptrace.h>
#include <net/sock.h>
#include <bcc/proto.h>
#include <linux/errqueue.h>

#ifdef memset
#undef memset
#endif

int trace_err_skb(struct pt_regs *ctx)
{
	struct sk_buff *skb = (struct sk_buff *)ctx->si;
	struct sock *sk = (struct sock *)ctx->di;
	struct sock_exterr_skb *serr;
	u32 ee_data = 0;

	if (!sk || !skb)
		return 0;

	serr = SKB_EXT_ERR(skb);
	bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data);
	bpf_trace_printk("ee_data:%u\\n", ee_data);

	return 0;
};
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
print("Attached to kprobe")
b.trace_print()

Packetdrill Script:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 1460) = 1460
0.200 write(4, ..., 13140) = 13140

0.200 > P. 1:1461(1460) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:14601(5840) ack 1

0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 14601 win 257

0.400 close(4) = 0
0.400 > F. 14601:14601(0) ack 1
0.500 < F. 1:1(0) ack 14602 win 257
0.500 > . 14602:14602(0) ack 2

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 13:45:43 -04:00
..
netfilter netfilter: arp_tables: register table in initns 2016-04-07 11:58:49 +02:00
Kconfig ipv4: Remove inet_lro library 2016-02-17 16:15:46 -05:00
Makefile ipv4: Remove inet_lro library 2016-02-17 16:15:46 -05:00
af_inet.c net: ipv4: Fix truncated timestamp returned by inet_current_timestamp() 2016-03-21 22:56:38 -04:00
ah4.c ah4: Fix error return in ah_input(). 2015-08-25 13:38:50 -07:00
arp.c arp: correct return value of arp_rcv 2016-03-07 14:55:04 -05:00
cipso_ipv4.c ipv4: coding style: comparison for inequality with NULL 2015-04-03 12:11:15 -04:00
datagram.c net: Set sk_txhash from a random number 2015-07-29 22:44:04 -07:00
devinet.c ipv4: Don't do expensive useless work during inetdev destroy. 2016-03-13 23:28:35 -04:00
esp4.c esp4: Switch to new AEAD interface 2015-05-28 11:23:20 +08:00
fib_frontend.c ipv4: initialize flowi4_flags before calling fib_lookup() 2016-03-22 15:59:23 -04:00
fib_lookup.h ipv4: consider TOS in fib_select_default 2015-07-24 22:46:11 -07:00
fib_rules.c net: ipv6: use common fib_default_rule_pref 2015-09-09 14:19:50 -07:00
fib_semantics.c net: Fix prefsrc lookups 2015-11-04 21:34:37 -05:00
fib_trie.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-02-01 15:56:08 -08:00
fou.c GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU 2016-04-07 16:56:33 -04:00
gre_demux.c gre: Remove support for sharing GRE protocol hook. 2015-08-10 14:03:54 -07:00
gre_offload.c GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU 2016-04-07 16:56:33 -04:00
icmp.c net: ipv4: Convert IP network timestamps to be y2038 safe 2016-03-01 17:18:44 -05:00
igmp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-03-08 12:34:12 -05:00
inet_connection_sock.c soreuseport: fix merge conflict in tcp bind 2016-02-24 13:38:18 -05:00
inet_diag.c net: diag: add a scheduling point in inet_diag_dump_icsk() 2016-03-14 19:38:09 -04:00
inet_fragment.c inet: kill unused skb_free op 2016-01-05 22:25:57 -05:00
inet_hashtables.c tcp/dccp: better use of ephemeral ports in connect() 2016-02-12 05:28:32 -05:00
inet_timewait_sock.c tcp/dccp: fix timewait races in timer handling 2015-09-21 16:32:29 -07:00
inetpeer.c net: Add helper function to compare inetpeer addresses 2015-08-28 13:32:36 -07:00
ip_forward.c net: remove skb_sender_cpu_clear() 2016-03-01 17:36:47 -05:00
ip_fragment.c net: Export ip fragment sysctl to unprivileged users 2016-02-16 20:42:54 -05:00
ip_gre.c GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU 2016-04-07 16:56:33 -04:00
ip_input.c ipv4: namespacify ip_early_demux sysctl knob 2016-02-16 20:42:54 -05:00
ip_options.c net: ipv4: Convert IP network timestamps to be y2038 safe 2016-03-01 17:18:44 -05:00
ip_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-03-08 12:34:12 -05:00
ip_sockglue.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-02-23 00:09:14 -05:00
ip_tunnel.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-03-08 12:34:12 -05:00
ip_tunnel_core.c Fix returned tc and hoplimit values for route with IPv6 encapsulation 2016-03-27 22:35:02 -04:00
ip_vti.c ip_tunnel: Move stats update to iptunnel_xmit() 2015-12-25 23:32:23 -05:00
ipcomp.c ipv4: coding style: comparison for equality with NULL 2015-04-03 12:11:15 -04:00
ipconfig.c ipv4: ipconfig: avoid unused ic_proto_used symbol 2016-01-29 19:39:09 -08:00
ipip.c iptunnel: scrub packet in iptunnel_pull_header 2016-02-18 14:34:54 -05:00
ipmr.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
netfilter.c ipv4: Pass struct net into ip_route_me_harder 2015-09-29 20:21:32 +02:00
ping.c net/ipv4: remove left over dead code 2016-03-02 15:00:55 -05:00
proc.c ipv4: Namespaceify ip_default_ttl sysctl knob 2016-02-16 20:42:54 -05:00
protocol.c net: Export inet_offloads and inet6_offloads 2014-09-19 17:15:31 -04:00
raw.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-02-23 00:09:14 -05:00
route.c route: do not cache fib route info on local routes with oif 2016-04-13 23:33:01 -04:00
syncookies.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
sysctl_net_ipv4.c ipv4: namespacify ip_early_demux sysctl knob 2016-02-16 20:42:54 -05:00
tcp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
tcp_bic.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_cdg.c tcp: do not slow start when cwnd equals ssthresh 2015-07-09 14:22:52 -07:00
tcp_cong.c tcp: remove tcp_ecn_make_synack() socket argument 2015-09-25 13:00:38 -07:00
tcp_cubic.c tcp_cubic: do not set epoch_start in the future 2015-09-17 22:35:07 -07:00
tcp_dctcp.c tcp: allow dctcp alpha to drop to zero 2015-10-23 02:46:52 -07:00
tcp_diag.c net: diag: Support destroying TCP sockets. 2015-12-15 23:26:52 -05:00
tcp_fastopen.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
tcp_highspeed.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_htcp.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_hybla.c tcp: do not slow start when cwnd equals ssthresh 2015-07-09 14:22:52 -07:00
tcp_illinois.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_input.c tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks 2016-04-21 13:45:43 -04:00
tcp_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
tcp_lp.c tcp: remove in_flight parameter from cong_avoid() methods 2014-05-03 19:23:07 -04:00
tcp_metrics.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-03-08 12:34:12 -05:00
tcp_minisocks.c tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In 2016-03-14 14:55:26 -04:00
tcp_offload.c net: Store checksum result for offloaded GSO checksums 2016-02-11 08:55:33 -05:00
tcp_output.c tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In 2016-03-14 14:55:26 -04:00
tcp_probe.c net: ipv4: tcp_probe: Replace timespec with timespec64 2016-03-01 17:18:44 -05:00
tcp_recovery.c tcp: use RACK to detect losses 2015-10-21 07:00:53 -07:00
tcp_scalable.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_timer.c ipv4: Namespaceify tcp_orphan_retries sysctl knob 2016-02-07 14:35:11 -05:00
tcp_vegas.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_vegas.h tcp: prepare CC get_info() access from getsockopt() 2015-04-29 17:10:38 -04:00
tcp_veno.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_westwood.c tcp_westwood: fix tcp_westwood_info() 2015-05-05 19:50:09 -04:00
tcp_yeah.c tcp_yeah: don't set ssthresh below 2 2016-01-11 17:25:16 -05:00
tunnel4.c
udp.c soreuseport: fix ordering for mixed v4/v6 sockets 2016-04-14 21:14:03 -04:00
udp_diag.c soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF 2016-01-04 22:49:59 -05:00
udp_impl.h net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
udp_offload.c net: Reset encap_level to avoid resetting features on inner IP headers 2016-03-23 14:19:08 -04:00
udp_tunnel.c tunnel: Clear IPCB(skb)->opt before dst_link_failure called 2016-02-23 19:11:56 -05:00
udplite.c net: Eliminate no_check from protosw 2014-05-23 16:28:53 -04:00
xfrm4_input.c netfilter: Pass net into okfn 2015-09-17 17:18:37 -07:00
xfrm4_mode_beet.c ipv4: ERROR: code indent should use tabs where possible 2013-12-26 13:43:21 -05:00
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c ipv4: hash net ptr into fragmentation bucket selection 2015-03-25 14:07:04 -04:00
xfrm4_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-24 06:54:12 -07:00
xfrm4_policy.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2015-12-22 16:26:31 -05:00
xfrm4_protocol.c xfrm4: Remove duplicate semicolon 2014-06-30 07:49:47 +02:00
xfrm4_state.c inet: make no_pmtu_disc per namespace and kill ipv4_config 2013-12-18 16:58:20 -05:00
xfrm4_tunnel.c