linux-sg2042

History

Jon Maxwell 45caeaa5ac dccp/tcp: fix routing redirect race

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/IPv6 dccp patches as well.

We have seen a few incidents lately where a dst_entry has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on.

A common crashing back trace is:

    #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
    .
    .
    #9 [] tcp_rcv_established at ffffffff81580b64
    #10 [] tcp_v4_do_rcv at ffffffff8158b54a
    #11 [] tcp_v4_rcv at ffffffff8158cd02
    #12 [] ip_local_deliver_finish at ffffffff815668f4
    #13 [] ip_local_deliver at ffffffff81566bd9
    #14 [] ip_rcv_finish at ffffffff8156656d
    #15 [] ip_rcv at ffffffff81566f06
    #16 [] __netif_receive_skb_core at ffffffff8152b3a2
    #17 [] __netif_receive_skb at ffffffff8152b608
    #18 [] netif_receive_skb at ffffffff8152b690
    #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
    #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
    #21 [] net_rx_action at ffffffff8152bac2
    #22 [] __do_softirq at ffffffff81084b4f
    #23 [] call_softirq at ffffffff8164845c
    #24 [] do_softirq at ffffffff81016fc5
    #25 [] irq_exit at ffffffff81084ee5
    #26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry was found here:

    static bool tcp_in_quickack_mode(struct sock *sk)
    {
            const struct inet_connection_sock *icsk = inet_csk(sk);
            const struct dst_entry *dst = __sk_dst_get(sk);

            return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
                    (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
    }

But there are other backtraces attributed to the same freed dst_entry in netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host, creating more dst_entrys with lower reference counts and making this more probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

    LockDroppedIcmps 267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_cache can be decremented twice for the same socket via:

    do_redirect()->__sk_dst_check()->dst_release().

This leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if user space has the socket locked. Instead let the redirect take place later when user space does not have the socket locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes, which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release().

Fixes: `ceb3320610` ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2017-03-13 21:55:47 -07:00
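The double release described in the commit message can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions, not the kernel patch: `toy_dst`, `toy_sock`, `toy_release_if_obsolete()` and `toy_simulate()` are hypothetical names invented for illustration, the "race" is replayed as one fixed interleaving rather than with real concurrency, and the `owned_by_user` flag merely stands in for the kind of socket ownership check the message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; these are NOT the real definitions. */
struct toy_dst {
    int refcnt;    /* models the dst_entry reference count */
    bool obsolete; /* models a route invalidated by a redirect */
};

struct toy_sock {
    struct toy_dst *dst_cache; /* models sk->sk_dst_cache */
    bool owned_by_user;        /* models "user space has the socket locked" */
};

/* Second half of a dst check: drop the reference for a pointer read earlier. */
static void toy_release_if_obsolete(struct toy_sock *sk, struct toy_dst *seen)
{
    if (seen && seen->obsolete) {
        sk->dst_cache = NULL;
        seen->refcnt--;
    }
}

/*
 * Replays one fixed interleaving of user context and the ICMP redirect
 * handler. Returns the final reference count: 0 means the route was
 * released exactly once, -1 means it was released twice.
 */
static int toy_simulate(bool skip_when_owned)
{
    struct toy_dst dst = { .refcnt = 1, .obsolete = true };
    struct toy_sock sk = { .dst_cache = &dst, .owned_by_user = true };

    /* The socket cache holds the only reference at the start. */
    assert(sk.dst_cache->refcnt == 1);

    /* User context (socket locked) reads the cached route. */
    struct toy_dst *seen_by_user = sk.dst_cache;

    /* The redirect arrives before user context finishes its check. */
    if (!(skip_when_owned && sk.owned_by_user)) {
        struct toy_dst *seen_by_softirq = sk.dst_cache;
        toy_release_if_obsolete(&sk, seen_by_softirq);
    }

    /* User context finishes, using the pointer it read earlier. */
    toy_release_if_obsolete(&sk, seen_by_user);

    return dst.refcnt;
}
```

Calling `toy_simulate(false)` models the behaviour before the fix and ends with the count below zero, while `toy_simulate(true)` models skipping the redirect while the socket is owned and ends with a single release. The sketch leaves out the later, deferred handling of the redirect that the commit message mentions.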
..
netfilter	lib/vsprintf.c: remove %Z support	2017-02-27 18:43:47 -08:00
Kconfig	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next	2017-02-16 21:25:49 -05:00
Makefile	esp: Add a software GRO codepath	2017-02-15 11:04:11 +01:00
af_inet.c	net: Work around lockdep limitation in sockets that use sockets	2017-03-09 18:23:27 -08:00
ah4.c	IPsec: do not ignore crypto err in ah4 input	2017-01-16 12:57:48 +01:00
arp.c	NET: Fix /proc/net/arp for AX.25	2017-02-13 22:15:03 -05:00
cipso_ipv4.c	netlabel: out of bound access in cipso_v4_validate()	2017-02-04 19:44:22 -05:00
datagram.c	net: Set sk_txhash from a random number	2015-07-29 22:44:04 -07:00
devinet.c	sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>	2017-03-02 08:42:32 +01:00
esp4.c	esp: Introduce a helper to setup the trailer	2017-01-17 10:23:08 +01:00
esp4_offload.c	esp: Add a software GRO codepath	2017-02-15 11:04:11 +01:00
fib_frontend.c	net: route: add missing nla_policy entry for RTA_MARK attribute	2017-03-01 10:25:56 -08:00
fib_lookup.h	ipv4: consider TOS in fib_select_default	2015-07-24 22:46:11 -07:00
fib_rules.c	switchdev: remove FIB offload infrastructure	2016-09-28 04:48:00 -04:00
fib_semantics.c	ipv4: fib: Notify about nexthop status changes	2017-02-08 15:25:18 -05:00
fib_trie.c	lib/vsprintf.c: remove %Z support	2017-02-27 18:43:47 -08:00
fou.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-10-30 12:42:58 -04:00
gre_demux.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-06-30 05:03:36 -04:00
gre_offload.c	net: add recursion limit to GRO	2016-10-20 14:32:22 -04:00
icmp.c	net: for rate-limited ICMP replies save one atomic operation	2017-01-09 15:49:12 -05:00
igmp.c	igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()	2017-02-09 16:43:45 -05:00
inet_connection_sock.c	net: Work around lockdep limitation in sockets that use sockets	2017-03-09 18:23:27 -08:00
inet_diag.c	tcp: remove early retransmit	2017-01-13 22:37:16 -05:00
inet_fragment.c	net: disable fragment reassembly if high_thresh is zero	2016-06-05 22:56:42 -04:00
inet_hashtables.c	inet: kill smallest_size and smallest_port	2017-01-18 13:04:29 -05:00
inet_timewait_sock.c	ipv4: Namespaceify tcp_tw_recycle and tcp_max_tw_buckets knob	2016-12-29 11:38:31 -05:00
inetpeer.c	net: Add helper function to compare inetpeer addresses	2015-08-28 13:32:36 -07:00
ip_forward.c	ipv4: allow local fragmentation in ip_finish_output_gso()	2016-11-03 16:10:26 -04:00
ip_fragment.c	net: rename IP_INC_STATS_BH()	2016-04-27 22:48:23 -04:00
ip_gre.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
ip_input.c	net: VRF: Pass original iif to ip_route_input()	2016-09-16 04:24:07 -04:00
ip_options.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
ip_output.c	udp: avoid ufo handling on IP payload compression packets	2017-03-09 18:28:42 -08:00
ip_sockglue.c	ip: fix IP_CHECKSUM handling	2017-02-21 12:23:53 -05:00
ip_tunnel.c	netns: make struct pernet_operations::id unsigned int	2016-11-18 10:59:15 -05:00
ip_tunnel_core.c	lwtunnel: remove device arg to lwtunnel_build_state	2017-01-30 15:14:22 -05:00
ip_vti.c	netns: make struct pernet_operations::id unsigned int	2016-11-18 10:59:15 -05:00
ipcomp.c	ipv4: coding style: comparison for equality with NULL	2015-04-03 12:11:15 -04:00
ipconfig.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
ipip.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
ipmr.c	lib/vsprintf.c: remove %Z support	2017-02-27 18:43:47 -08:00
netfilter.c	netfilter: use skb_to_full_sk in ip_route_me_harder	2017-02-28 12:49:36 +01:00
ping.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2017-02-11 02:31:11 -05:00
proc.c	net: add LINUX_MIB_PFMEMALLOCDROP counter	2017-02-02 23:34:19 -05:00
protocol.c	net: Export inet_offloads and inet6_offloads	2014-09-19 17:15:31 -04:00
raw.c	net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP	2017-02-07 13:07:47 -05:00
raw_diag.c	net: ip, raw_diag -- Use jump for exiting from nested loop	2016-11-03 15:25:26 -04:00
route.c	ipv4: mask tos for input route	2017-02-26 11:03:38 -05:00
syncookies.c	syncookies: use SipHash in place of SHA1	2017-01-09 13:58:57 -05:00
sysctl_net_ipv4.c	net: Avoid receiving packets with an l3mdev on unbound UDP sockets	2017-01-30 15:00:58 -05:00
tcp.c	tcp: fix potential double free issue for fastopen_req	2017-03-02 14:05:41 -08:00
tcp_bbr.c	tcp_bbr: add a state transition diagram and accompanying comment	2016-10-29 17:12:43 -04:00
tcp_bic.c	tcp: replace cnt & rtt with struct in pkts_acked()	2016-05-11 14:43:19 -04:00
tcp_cdg.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h>	2017-03-02 08:42:27 +01:00
tcp_cong.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-11-22 13:27:16 -05:00
tcp_cubic.c	tcp: replace cnt & rtt with struct in pkts_acked()	2016-05-11 14:43:19 -04:00
tcp_dctcp.c	Revert "dctcp: update cwnd on congestion event"	2016-12-06 11:34:24 -05:00
tcp_diag.c	net: diag: Fix refcnt leak in error path destroying socket	2016-08-23 23:11:36 -07:00
tcp_fastopen.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2017-01-28 10:33:06 -05:00
tcp_highspeed.c	tcp: add cwnd_undo functions to various tcp cc algorithms	2016-11-21 13:20:17 -05:00
tcp_htcp.c	tcp: replace cnt & rtt with struct in pkts_acked()	2016-05-11 14:43:19 -04:00
tcp_hybla.c	tcp: make undo_cwnd mandatory for congestion modules	2016-11-21 13:20:17 -05:00
tcp_illinois.c	tcp: add cwnd_undo functions to various tcp cc algorithms	2016-11-21 13:20:17 -05:00
tcp_input.c	tcp/dccp: block BH for SYN processing	2017-03-01 15:03:31 -08:00
tcp_ipv4.c	dccp/tcp: fix routing redirect race	2017-03-13 21:55:47 -07:00
tcp_lp.c	tcp: make undo_cwnd mandatory for congestion modules	2016-11-21 13:20:17 -05:00
tcp_metrics.c	tcp: replace dst_confirm with sk_dst_confirm	2017-02-07 13:07:46 -05:00
tcp_minisocks.c	tcp: account for ts offset only if tsecr not zero	2017-02-22 16:35:58 -05:00
tcp_nv.c	tcp: add NV congestion control	2016-06-10 23:07:49 -07:00
tcp_offload.c	gso: Support partial splitting at the frag_list pointer	2016-09-19 20:59:34 -04:00
tcp_output.c	tcp: accommodate sequence number to a peer's shrunk receive window caused by precision loss in window scaling	2017-02-17 15:30:33 -05:00
tcp_probe.c	tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"	2017-02-21 13:26:03 -05:00
tcp_rate.c	tcp: export data delivery rate	2016-09-21 00:23:00 -04:00
tcp_recovery.c	tcp: enable RACK loss detection to trigger recovery	2017-01-13 22:37:16 -05:00
tcp_scalable.c	tcp: add cwnd_undo functions to various tcp cc algorithms	2016-11-21 13:20:17 -05:00
tcp_timer.c	tcp: fix various issues for sockets morphing to listen state	2017-03-07 13:58:33 -08:00
tcp_vegas.c	tcp: make undo_cwnd mandatory for congestion modules	2016-11-21 13:20:17 -05:00
tcp_vegas.h	tcp: replace cnt & rtt with struct in pkts_acked()	2016-05-11 14:43:19 -04:00
tcp_veno.c	tcp: add cwnd_undo functions to various tcp cc algorithms	2016-11-21 13:20:17 -05:00
tcp_westwood.c	tcp: make undo_cwnd mandatory for congestion modules	2016-11-21 13:20:17 -05:00
tcp_yeah.c	tcp: add cwnd_undo functions to various tcp cc algorithms	2016-11-21 13:20:17 -05:00
tunnel4.c	tunnels: correct conditional build of MPLS and IPv6	2016-07-11 13:27:06 -07:00
udp.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2017-02-07 16:29:30 -05:00
udp_diag.c	net: inet: diag: expose the socket mark to privileged processes.	2016-09-08 16:13:09 -07:00
udp_impl.h	udplite: call proper backlog handlers	2016-11-24 15:32:14 -05:00
udp_offload.c	net: add recursion limit to GRO	2016-10-20 14:32:22 -04:00
udp_tunnel.c	net: Remove deprecated tunnel specific UDP offload functions	2016-06-17 20:23:32 -07:00
udplite.c	udplite: call proper backlog handlers	2016-11-24 15:32:14 -05:00
xfrm4_input.c	esp: Add a software GRO codepath	2017-02-15 11:04:11 +01:00
xfrm4_mode_beet.c	…
xfrm4_mode_transport.c	esp: Add a software GRO codepath	2017-02-15 11:04:11 +01:00
xfrm4_mode_tunnel.c	ipv4: hash net ptr into fragmentation bucket selection	2015-03-25 14:07:04 -04:00
xfrm4_output.c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2015-10-24 06:54:12 -07:00
xfrm4_policy.c	xfrm: policy: make policy backend const	2017-02-09 10:22:19 +01:00
xfrm4_protocol.c	xfrm: input: constify xfrm_input_afinfo	2017-02-09 10:22:17 +01:00
xfrm4_state.c	xfrm: remove unused function	2017-01-10 10:57:12 +01:00
xfrm4_tunnel.c	…