OpenCloudOS-Kernel/net
Jiri Olsa a57de0b433 net: adding memory barrier to the poll and receive callbacks
Adding a memory barrier after the poll_wait function, paired with the
receive callbacks. Adding the functions sock_poll_wait and sk_has_sleeper
to wrap the memory barrier.

Without the memory barrier, the following race can happen.
The race fires when the following code paths meet and the tp->rcv_nxt
and __add_wait_queue updates stay in the CPU caches.

CPU1                         CPU2

sys_select                   receive packet
  ...                        ...
  __add_wait_queue           update tp->rcv_nxt
  ...                        ...
  tp->rcv_nxt check          sock_def_readable
  ...                        {
  schedule                      ...
                                if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                        wake_up_interruptible(sk->sk_sleep)
                                ...
                             }

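If there were no caches the code would work correctly, since the two CPUs
access the wait queue and rcv_nxt in opposite order: CPU1 writes the wait
queue and then reads rcv_nxt, while CPU2 writes rcv_nxt and then reads the
wait queue.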

This means that once tp->rcv_nxt is updated by CPU2, CPU1 has either already
passed the tp->rcv_nxt check and sleeps, or will get the new value of
tp->rcv_nxt and return with the new data mask.
In both cases the process (CPU1) has been added to the wait queue, so the
waitqueue_active (CPU2) call cannot miss it and will wake up CPU1.

The bad case is when the __add_wait_queue changes made by CPU1 stay in its
cache, and so does the tp->rcv_nxt update on the CPU2 side.  CPU1 will then
end up calling schedule and sleep forever if no more data arrives on the
socket.
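
The race above can be modelled in user space. The following is only a
minimal sketch using C11 atomics and pthreads, not the kernel code from
this commit: on_wait_queue, rcv_nxt, waiter, receiver and
count_lost_wakeups are hypothetical names standing in for the
__add_wait_queue update, tp->rcv_nxt, the sys_select path, the receive
path and a test harness. Each side stores its own flag, issues a full
fence (playing the role of the memory barrier described above), then
loads the other side's flag. With both fences in place, the outcome where
both sides read the old value cannot occur.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel state named in the commit message. */
static atomic_int on_wait_queue;   /* models the __add_wait_queue update */
static atomic_int rcv_nxt;         /* models the tp->rcv_nxt update      */
static int waiter_saw_data;        /* result of the "tp->rcv_nxt check"  */
static int receiver_saw_waiter;    /* result of "waitqueue_active"       */

/* CPU1 side: enqueue, full barrier, then check for data. */
static void *waiter(void *arg)
{
	(void)arg;
	atomic_store_explicit(&on_wait_queue, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	waiter_saw_data = atomic_load_explicit(&rcv_nxt, memory_order_relaxed);
	return NULL;
}

/* CPU2 side: publish data, full barrier, then check for a sleeper. */
static void *receiver(void *arg)
{
	(void)arg;
	atomic_store_explicit(&rcv_nxt, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	receiver_saw_waiter =
		atomic_load_explicit(&on_wait_queue, memory_order_relaxed);
	return NULL;
}

/*
 * Run the two paths concurrently `rounds` times and count the rounds in
 * which the waiter saw no data AND the receiver saw no waiter, i.e. the
 * lost wakeup described in the commit message.
 */
int count_lost_wakeups(int rounds)
{
	int lost = 0;

	for (int i = 0; i < rounds; i++) {
		pthread_t a, b;
		int rc;

		atomic_store(&on_wait_queue, 0);
		atomic_store(&rcv_nxt, 0);

		rc = pthread_create(&a, NULL, waiter, NULL);
		assert(rc == 0);
		rc = pthread_create(&b, NULL, receiver, NULL);
		assert(rc == 0);
		(void)rc;

		/* Joining makes the plain result variables safe to read. */
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		if (!waiter_saw_data && !receiver_saw_waiter)
			lost++;
	}
	return lost;
}
```

Calling count_lost_wakeups(1000) from a test program should report no
lost wakeups, because the C11 memory model forbids both relaxed loads
from missing the other side's store when each store is followed by a
sequentially consistent fence. Removing either fence makes the lost
wakeup permitted again, although whether it is actually observed depends
on the hardware and timing.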

Calls to poll_wait in the following modules were omitted:
	net/bluetooth/af_bluetooth.c
	net/irda/af_irda.c
	net/irda/irnet/irnet_ppp.c
	net/mac80211/rc80211_pid_debugfs.c
	net/phonet/socket.c
	net/rds/af_rds.c
	net/rfkill/core.c
	net/sunrpc/cache.c
	net/sunrpc/rpc_pipe.c
	net/tipc/socket.c

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-09 17:06:57 -07:00
..
9p net/9p: Fix crash due to bad mount parameters. 2009-07-02 13:17:01 -07:00
802 net: remove COMPAT_NET_DEV_OPS 2009-05-25 01:53:53 -07:00
8021q 8021q: Vlan driver should use rcu_barrier() on unload instead of syncronize_net() 2009-06-10 01:11:22 -07:00
appletalk net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
atm net: adding memory barrier to the poll and receive callbacks 2009-07-09 17:06:57 -07:00
ax25 net: Move rx skb_orphan call to where needed 2009-06-23 16:36:25 -07:00
bluetooth net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
bridge bridge: Use rcu_barrier() instead of syncronize_net() on unload. 2009-06-26 13:51:32 -07:00
can can: af_can.c use rcu_barrier() on module unload. 2009-06-10 01:11:24 -07:00
core net: adding memory barrier to the poll and receive callbacks 2009-07-09 17:06:57 -07:00
dcb DCB: fix kfree(skb) 2009-01-04 17:29:21 -08:00
dccp net: adding memory barrier to the poll and receive callbacks 2009-07-09 17:06:57 -07:00
decnet decnet: Use rcu_barrier() on module unload. 2009-06-26 13:51:27 -07:00
dsa dsa: fix 88e6xxx statistics counter snapshotting 2009-07-05 18:03:35 -07:00
econet net: sk_wmem_alloc has initial value of one, not zero 2009-06-17 04:31:25 -07:00
ethernet net: remove COMPAT_NET_DEV_OPS 2009-05-25 01:53:53 -07:00
ieee802154 nl802154: add module license and description 2009-06-29 18:20:28 +04:00
ipv4 net: adding memory barrier to the poll and receive callbacks 2009-07-09 17:06:57 -07:00
ipv6 IPv6: preferred lifetime of address not getting updated 2009-07-03 19:10:13 -07:00
ipx net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
irda net: Move rx skb_orphan call to where needed 2009-06-23 16:36:25 -07:00
iucv net: adding memory barrier to the poll and receive callbacks 2009-07-09 17:06:57 -07:00
key net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
lapb
llc net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
mac80211 mac80211: minstrel: avoid accessing negative indices in rix_to_ndx() 2009-07-07 12:55:28 -04:00
netfilter netfilter: xtables: conntrack match revision 2 2009-06-29 14:31:46 +02:00
netlabel netlabel: Use genl_register_family_with_ops() 2009-05-21 16:50:24 -07:00
netlink net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
netrom net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
packet net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
phonet Phonet: generate Netlink RTM_DELADDR when destroying a device 2009-06-25 02:58:16 -07:00
rds Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-05-18 21:08:20 -07:00
rfkill rfkill: export persistent attribute in sysfs 2009-06-19 11:50:18 -04:00
rose net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
rxrpc net: adding memory barrier to the poll and receive callbacks 2009-07-09 17:06:57 -07:00
sched net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
sctp sctp: fix warning at inet_sock_destruct() while release sctp socket 2009-07-06 12:47:08 -07:00
sunrpc sunrpc: Use rcu_barrier() on unload. 2009-06-26 13:51:34 -07:00
tipc tipc: Use genl_register_family_with_ops() 2009-05-21 16:50:23 -07:00
unix net: adding memory barrier to the poll and receive callbacks 2009-07-09 17:06:57 -07:00
wanrouter wanrouter: fix sparse warnings: context imbalance 2009-02-26 23:13:36 -08:00
wimax wimax: fix warning caused by not checking retval of rfkill_set_hw_state() 2009-06-11 11:12:48 -07:00
wireless cfg80211: fix refcount leak 2009-07-07 12:55:28 -04:00
x25 net: correct off-by-one write allocations reports 2009-06-18 00:29:12 -07:00
xfrm xfrm: use xfrm_addr_cmp() instead of compare addresses directly 2009-06-29 19:41:46 -07:00
Kconfig net: add IEEE 802.15.4 socket family implementation 2009-06-09 05:25:32 -07:00
Makefile net: add IEEE 802.15.4 socket family implementation 2009-06-09 05:25:32 -07:00
TUNABLE
compat.c net: socket infrastructure for SO_TIMESTAMPING 2009-02-15 22:43:35 -08:00
nonet.c
socket.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2009-04-06 18:05:43 -07:00
sysctl_net.c net: sysctl_net - use net_eq to compare nets 2009-03-16 16:23:30 +01:00