Commit Graph

27917 Commits

Author SHA1 Message Date
Pravin B Shelar 8e4e1713e4 openvswitch: Simplify datapath locking.
Currently OVS uses combination of genl and rtnl lock to protect
datapath state.  This was done due to networking stack locking.
But this has complicated locking and there are few lock ordering
issues with new tunneling protocols.
Following patch simplifies locking by introducing new ovs mutex
and now this lock is used to protect entire ovs state.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-04-15 14:38:40 -07:00
Vlad Yasevich 4cd729b042 net: add dev_uc_sync_multiple() and dev_mc_sync_multiple() api
The current implementation of dev_uc_sync/unsync() assumes that there is
a strict 1-to-1 relationship between the source and destination of the sync.
In other words, once an address has been synced to a destination device, it
will not be synced to any other device through the sync API.
However, there are some virtual devices that aggreate a number of lower
devices and need to sync addresses to all of them.  The current
API falls short there.

This patch introduces a new dev_uc_sync_multiple() api that can be called
in the above circumstances and allows sync to work for every invocation.

CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-15 16:10:47 -04:00
Daniel Borkmann 0022d2dd4d net: sctp: minor: make sctp_ep_common's member 'dead' a bool
Since dead only holds two states (0,1), make it a bool instead
of a 'char', which is more appropriate for its purpose.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-15 14:11:37 -04:00
Daniel Borkmann ff2266cddd net: sctp: remove sctp_ep_common struct member 'malloced'
There is actually no need to keep this member in the structure, because
after init it's always 1 anyway, thus always kfree called. This seems to
be an ancient leftover from the very initial implementation from 2.5
times. Only in case the initialization of an association fails, we leave
base.malloced as 0, but we nevertheless kfree it in the error path in
sctp_association_new().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-15 14:11:37 -04:00
Wei Yongjun 06848c10f7 esp4: fix error return code in esp_output()
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-15 14:05:34 -04:00
stephen hemminger 8f3359bdc8 bridge: make user modified path cost sticky
Keep a STP port path cost value if it was set by a user.
Don't replace it with the link-speed based path cost
whenever the link goes down and comes back up.

Reported-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-15 14:03:44 -04:00
Daniel Borkmann 4731d011d6 net: tcp_memcontrol: minor: remove unused variable
Commit 10b96f7306 (``tcp_memcontrol: remove a redundant statement
in tcp_destroy_cgroup()'') says ``We read the value but make no use
of it.'', but forgot to remove the variable declaration as well. This
was a follow-up commit of 3f1346193 (``memcg: decrement static keys
at real destroy time'') that removed the read of variable 'val'.

This fixes therefore:

  CC      net/ipv4/tcp_memcontrol.o
net/ipv4/tcp_memcontrol.c: In function ‘tcp_destroy_cgroup’:
net/ipv4/tcp_memcontrol.c:67:6: warning: unused variable ‘val’ [-Wunused-variable]

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-14 15:41:49 -04:00
Daniel Borkmann bf84a01063 net: sock: make sock_tx_timestamp void
Currently, sock_tx_timestamp() always returns 0. The comment that
describes the sock_tx_timestamp() function wrongly says that it
returns an error when an invalid argument is passed (from commit
20d4947353, ``net: socket infrastructure for SO_TIMESTAMPING'').
Make the function void, so that we can also remove all the unneeded
if conditions that check for such a _non-existant_ error case in the
output path.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-14 15:41:49 -04:00
Cong Wang f88c91ddba ipv6: statically link register_inet6addr_notifier()
Tomas reported the following build error:

net/built-in.o: In function `ieee80211_unregister_hw':
(.text+0x10f0e1): undefined reference to `unregister_inet6addr_notifier'
net/built-in.o: In function `ieee80211_register_hw':
(.text+0x10f610): undefined reference to `register_inet6addr_notifier'
make: *** [vmlinux] Error 1

when built IPv6 as a module.

So we have to statically link these symbols.

Reported-by: Tomas Melin <tomas.melin@iki.fi>
Cc: Tomas Melin <tomas.melin@iki.fi>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: YOSHIFUJI Hidaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-14 15:24:17 -04:00
Eric Dumazet bece1b9708 tcp: tcp_tso_segment() small optimization
We can move th->check computation out of the loop, as compiler
doesn't know each skb initially share same tcp headers after
skb_segment()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-13 16:54:14 -04:00
Eric Dumazet d6a4a10411 tcp: GSO should be TSQ friendly
I noticed that TSQ (TCP Small queues) was less effective when TSO is
turned off, and GSO is on. If BQL is not enabled, TSQ has then no
effect.

It turns out the GSO engine frees the original gso_skb at the time the
fragments are generated and queued to the NIC.

We should instead call the tcp_wfree() destructor for the last fragment,
to keep the flow control as intended in TSQ. This effectively limits
the number of queued packets on qdisc + NIC layers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-12 18:17:06 -04:00
Eric Dumazet d14a489a41 act_csum: fix possible use after free
tcf_csum_skb_nextlayer() / pskb_may_pull() can change skb->head, so we
must be careful not keeping pointers to previous headers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Grégoire Baron <baronchon@n7mm.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-12 15:25:41 -04:00
David Ward fb745e9a03 net/802/mrp: fix possible race condition when calling mrp_pdu_queue()
(Adapted from a very similar change to net/802/garp.c by Cong Wang.)

mrp_pdu_queue() should ways be called with the applicant spin lock.
mrp_uninit_applicant() only holds the rtnl lock which is not enough;
a race is possible because mrp_rcv() is called in BH context:

	mrp_rcv()
	  |->mrp_pdu_parse_msg()
	    |->mrp_pdu_parse_vecattr()
	      |->mrp_pdu_parse_vecattr_event()
	        |-> mrp_attr_event()
	          |-> mrp_pdu_append_vecattr_event()
	            |-> mrp_pdu_queue()

Cc: Cong Wang <amwang@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-12 15:10:48 -04:00
David S. Miller 8b5b8c2990 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf into netfilter
Pablo Neira Ayuso says:

====================
The following patchset contains late netfilter fixes for your net
tree, they are:

* Don't drop segmented TCP packets in the SIP helper, we've got reports
  from users that this was breaking communications when the SIP phone
  messages are larger than the MTU, from Patrick McHardy.

* Fix refcount leak in the ipset list set, from Jozsef Kadlecsik.

* On hash set resizing, the nomatch flag was lost, thus entirely inverting
  the logic of the set matching, from Jozsef Kadlecsik.

* Fix crash on NAT modules removal. Timer expiration may race with the
  module cleanup exit path while deleting conntracks, from Florian
  Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-12 14:26:39 -04:00
John W. Linville bef086e08e Merge branch 'for-john' of git://x-git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-04-12 13:20:51 -04:00
Samuel Ortiz be055b2f89 NFC: RFKILL support
All NFC devices will now get proper RFKILL support as long as they provide
some dev_up and dev_down hooks. Rfkilling an NFC device will bring it down
while it is left to userspace to bring it back up when being rfkill unblocked.
This is very similar to what Bluetooth does.

Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-12 16:54:45 +02:00
Samuel Ortiz 44b3decb41 rfkill: Add NFC to the list of supported radios
And return the proper string for it.

Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-12 16:54:38 +02:00
Florian Westphal c2d421e171 netfilter: nf_nat: fix race when unloading protocol modules
following oops was reported:
RIP: 0010:[<ffffffffa03227f2>]  [<ffffffffa03227f2>] nf_nat_cleanup_conntrack+0x42/0x70 [nf_nat]
RSP: 0018:ffff880202c63d40  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8801ac7bec28 RCX: ffff8801d0eedbe0
RDX: dead000000200200 RSI: 0000000000000011 RDI: ffffffffa03265b8
[..]
Call Trace:
 [..]
 [<ffffffffa02febed>] destroy_conntrack+0xbd/0x110 [nf_conntrack]

Happens when a conntrack timeout expires right after first part
of the nat cleanup has completed (bysrc hash removal), but before
part 2 has completed (re-initialization of nat area).

[ destroy callback tries to delete bysrc again ]

Patrick suggested to just remove the affected conntracks -- the
connections won't work properly anyway without nat transformation.

So, lets do that.

Reported-by: CAI Qian <caiqian@redhat.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-12 11:46:31 +02:00
David S. Miller 6c6779856a Revert "netprio_cgroup: make local table static"
This reverts commit 763eff57de.

It causes build regressions, as per Stephen Rothwell:

====================
After merging the final tree, today's linux-next build (powerpc
allyesconfig) failed like this:

net/core/netprio_cgroup.c:250:29: error: static declaration of 'net_prio_subsys' follows non-static declaration
include/linux/cgroup_subsys.h:71:1: note: previous declaration of 'net_prio_subsys' was here
====================

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-12 03:06:44 -04:00
Thomas Graf 50bceae9bd tcp: Reallocate headroom if it would overflow csum_start
If a TCP retransmission gets partially ACKed and collapsed multiple
times it is possible for the headroom to grow beyond 64K which will
overflow the 16bit skb->csum_start which is based on the start of
the headroom. It has been observed rarely in the wild with IPoIB due
to the 64K MTU.

Verify if the acking and collapsing resulted in a headroom exceeding
what csum_start can cover and reallocate the headroom if so.

A big thank you to Jim Foraker <foraker1@llnl.gov> and the team at
LLNL for helping out with the investigation and testing.

Reported-by: Jim Foraker <foraker1@llnl.gov>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-11 18:12:41 -04:00
David S. Miller 16e3d9648a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
1)  Allow to avoid copying DSCP during encapsulation
    by setting a SA flag. From Nicolas Dichtel.

2) Constify the netlink dispatch table, no need to modify it
   at runtime. From Mathias Krause.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-11 16:14:37 -04:00
Dmitry Popov d66954a066 tcp: incoming connections might use wrong route under synflood
There is a bug in cookie_v4_check (net/ipv4/syncookies.c):
	flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
			   RT_SCOPE_UNIVERSE, IPPROTO_TCP,
			   inet_sk_flowi_flags(sk),
			   (opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
			   ireq->loc_addr, th->source, th->dest);

Here we do not respect sk->sk_bound_dev_if, therefore wrong dst_entry may be
taken. This dst_entry is used by new socket (get_cookie_sock ->
tcp_v4_syn_recv_sock), so its packets may take the wrong path.

Signed-off-by: Dmitry Popov <dp@highloadlab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-11 16:01:46 -04:00
Claudio Takahasi 93796fa6f2 Bluetooth: Reject SCO when hci connection timeouts
This patch sends Reject Synchronous Connection Request Command when
hci_conn_timeout is triggered, and the SCO connection is in BT_CONNECT2
state. It prevents inconsistency if the remote host doesn't implement
properly the timeout for the connection request, and it removes the
connection reference left when the socket is closed for incoming SCO
connections.

[ 2650.129080] sco_sock_release: sock ffff8801ca417400, sk ffff88020c408800
[ 2650.129092] sco_sock_clear_timer: sock ffff88020c408800 state 6
[ 2650.129101] __sco_sock_close: sk ffff88020c408800 state 6 socket
	ffff8801ca417400
[ 2650.129108] sco_chan_del: sk ffff88020c408800, conn ffff8801c650ea20,
	err 104
[ 2650.129114] hci_conn_put: hcon ffff88020c40a800 orig refcnt 1
[ 2650.129128] sco_sock_kill: sk ffff88020c408800 state 9
[ 2650.129135] sco_sock_destruct: sk ffff88020c408800
[ 2650.138468] hci_conn_timeout: hcon ffff88020c40a800 state BT_CONNECT2

Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-11 16:34:18 -03:00
Claudio Takahasi baf4325197 Bluetooth: Remove unneeded parameter
This patch removes the status parameter of the l2cap_conn_add function.
The parameter 'status' is always 0.

Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-11 16:34:18 -03:00
Claudio Takahasi 92f185c89f Bluetooth: Minor coding style fix
This patch removes unneeded initialization and empty line.

Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-11 16:34:17 -03:00
Claudio Takahasi c10cc5a9d4 Bluetooth: Use GFP_KERNEL in sco_conn_add
This patch changes the memory allocation flags in the sco_conn_add
function, replacing the type to GFP_KERNEL. This function is executed
in process context and it is not called inside an atomic section.

Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-11 16:34:17 -03:00
Claudio Takahasi ea323c1198 Bluetooth: Fix SCO connection reference
This patch fixes decrementing SCO connection reference right after
stablishing the SCO connection with defer setup enabled. The dump below
shows a disconnection command with handle 0, the connection is still in
BT_CONNECT2 state and there isn't a handle associated with it.

< HCI Command: Accept Synchronous Connection (0x01|0x0029) plen 21
  bdaddr 78:47:1D:B3:72:6C
> HCI Event: Command Status (0x0f) plen 4
  Accept Synchronous Connection (0x01|0x0029) status 0x00 ncmd 1
< HCI Command: Disconnect (0x01|0x0006) plen 3
  handle 0 reason 0x13
  Reason: Remote User Terminated Connection
> HCI Event: Command Status (0x0f) plen 4
  Disconnect (0x01|0x0006) status 0x00 ncmd 1
> HCI Event: Synchronous Connect Complete (0x2c) plen 17
  status 0x00 handle 46 bdaddr 78:47:1D:B3:72:6C
  type eSCO
  Air mode: CVSD
< SCO data: handle 46 flags 0x00 dlen 48

Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-11 16:34:16 -03:00
David Herrmann 76a68ba0ae Bluetooth: rename hci_conn_put to hci_conn_drop
We use _get() and _put() for device ref-counting in the kernel. However,
hci_conn_put() is _not_ used for ref-counting, hence, rename it to
hci_conn_drop() so we can later fix ref-counting and introduce
hci_conn_put().

hci_conn_hold() and hci_conn_put() are currently used to manage how long a
connection should be held alive. When the last user drops the connection,
we spawn a delayed work that performs the disconnect. Obviously, this has
nothing to do with ref-counting for the _object_ but rather for the
keep-alive of the connection.

But we really _need_ proper ref-counting for the _object_ to allow
connection-users like rfcomm-tty, HIDP or others.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-11 16:34:15 -03:00
Samuel Ortiz 7757dc8a3e NFC: Prevent polling when device is down
Some devices turn radio on whenever they're asked to start a poll.
To prevent that from happening, we just don't call into the driver
start_poll hook when the NFC device is down.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-11 16:29:10 +02:00
Samuel Ortiz 6d2cd978e5 NFC: llcp: Terminate connection when receiving a DISC on (0,0)
According to the LLCP specs, we must terminate the LLCP link when receiving
a DISC with both ssap and dsap set to 0.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-11 16:29:09 +02:00
Samuel Ortiz c470e319b4 NFC: llcp: Remove local_cleanup last argument
local_cleanup is always called with device set to false as it means the
local LLCP is going away. So no need to pass this switch as an argument.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-11 16:29:09 +02:00
Samuel Ortiz b436a13deb NFC: llcp: Only keep raw sockets alive when the LLCP local leaves
When the MAC goes down, connected and connection less sockets should be
notified, but raw sockets should be kept alive.
They will get notified only when the physical devices goes away.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-11 16:29:09 +02:00
Thierry Escande 064f370c5f NFC: llcp: Add support in getsockopt for RW, LTO, and MIU remote parameters
Useful for LLCP validation tests.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-11 16:28:59 +02:00
Thierry Escande abd18d4330 NFC: llcp: Reset RW, LTO, and MIU remote parameters when link goes down
This resets remote parameters in both local and socket llcp structures when the
link goes down. That way, nfc_llcp_getsockopt won't return values corresponding
to the previous link parameters.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-11 16:28:59 +02:00
Thierry Escande 66cbfa10f3 NFC: llcp: Use localy stored remote_miu value if not set at socket level
If remote_miu value is not set in the socket (i.e. connection-less socket) the
value stored in the local is used.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-11 16:28:58 +02:00
Thierry Escande 098dafcfb4 NFC: llcp: Aggregated frames support
This adds support for AGF PDUs. For each PDU contained in the AGF, a new sk_buff
is allocated and dispatched to its corresponding handler.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-11 16:28:58 +02:00
Olivier Guiter 0b23d666a8 NFC: llcp: Fix zero octets length SDU handling
LLCP Validation test #2 (Connection-less information transfer) send a
service data unit of zero octets length. This is now handled correctly.

Signed-off-by: Olivier Guiter <olivier.guiter@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-11 16:28:57 +02:00
Samuel Ortiz 00e856db49 NFC: llcp: Fall back to local values when getting socket options
If a socket option has not been set by the user, fall back to the LLCP
local ones.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-11 16:28:57 +02:00
Samuel Ortiz 5eef666975 NFC: llcp: Socket miux is a big endian field
The MIUX must be transmitted in big endian and as such we have to convert
it properly.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-04-11 16:28:56 +02:00
Karl Beldan 5253ffb8c9 mac80211: always pick a basic rate to tx RTS/CTS for pre-HT rates
When the 1st rate control entry is a pre-HT rate we want to set
rts_cts_rate_idx "as the fastest basic rate that is not faster than the
data rate"(code comments).
But in case some bss allowed rate indexes are lower than the lowest bss
basic rate, if the rate control selects a rate among the formers for its
1st rate control entry, rts_cts_rate_idx remains 0 and is not a basic
rate index.
This commit sets rts_cts_rate_idx to the lowest bss basic rate index in
this situation.

Note that the code assumes that lowest indexes == lowest bitrates.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-11 12:08:20 +02:00
Thomas Pedersen 2419ea14bb mac80211: fix ieee80211_queue_stopped()
Johannes Berg notes mac80211 drivers which use
ieee80211_queue_stopped() really only want to know if they
previously requested a queue stop.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-11 12:08:06 +02:00
stephen hemminger 763eff57de netprio_cgroup: make local table static
Minor sparse warning

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-10 23:23:43 -04:00
Andy Zhou b4f9e8cdc8 openvswitch: datapath.h: Fix a stale comment.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-04-10 14:57:48 -07:00
Linus Torvalds fe2971a017 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) cfg80211_conn_scan() must be called with the sched_scan_mutex, fix
    from Artem Savkov.

 2) Fix regression in TCP ICMPv6 processing, we do not want to treat
    redirects as socket errors, from Christoph Paasch.

 3) Fix several recvmsg() msg_name kernel memory leaks into userspace,
    in ATM, AX25, Bluetooth, CAIF, IRDA, s390 IUCV, L2TP, LLC, Netrom,
    NFC, Rose, TIPC, and VSOCK.  From Mathias Krause and Wei Yongjun.

 4) Fix AF_IUCV handling of segmented SKBs in recvmsg(), from Ursula
    Braun and Eric Dumazet.

 5) CAN gw.c code does kfree() on SLAB cache memory, use
    kmem_cache_free() instead.  Fix from Wei Yongjun.

 6) Fix LSM regression on TCP SYN/ACKs, some LSMs such as SELINUX want
    an skb->sk socket context available for these packets, but nothing
    else requires it.  From Eric Dumazet and Paul Moore.

 7) Fix ipv4 address lifetime processing so that we don't perform
    sleepable acts inside of rcu_read_lock() sections, do them in an
    rtnl_lock() section instead.  From Jiri Pirko.

 8) mvneta driver accidently sets HW features after device registry, it
    should do so beforehand.  Fix from Willy Tarreau.

 9) Fix bonding unload races more correctly, from Nikolay Aleksandrov
    and Veaceslav Falico.

10) rtnl_dump_ifinfo() and rtnl_calcit() invoke nlmsg_parse() with wrong
    header size argument.  Fix from Michael Riesch.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  lsm: add the missing documentation for the security_skb_owned_by() hook
  bnx2x: Prevent null pointer dereference in AFEX mode
  e100: Add dma mapping error check
  selinux: add a skb_owned_by() hook
  can: gw: use kmem_cache_free() instead of kfree()
  netrom: fix invalid use of sizeof in nr_recvmsg()
  qeth: fix qeth_wait_for_threads() deadlock for OSN devices
  af_iucv: fix recvmsg by replacing skb_pull() function
  rtnetlink: Call nlmsg_parse() with correct header length
  bonding: fix bonding_masters race condition in bond unloading
  Revert "bonding: remove sysfs before removing devices"
  net: mvneta: enable features before registering the driver
  hyperv: Fix RNDIS send_completion code path
  hyperv: Fix a kernel warning from netvsc_linkstatus_callback()
  net: ipv4: fix schedule while atomic bug in check_lifetime()
  net: ipv4: reset check_lifetime_work after changing lifetime
  bnx2x: Fix KR2 rapid link flap
  sctp: remove 'sridhar' from maintainers list
  VSOCK: Fix missing msg_namelen update in vsock_stream_recvmsg()
  VSOCK: vmci - fix possible info leak in vmci_transport_dgram_dequeue()
  ...
2013-04-10 14:15:27 -07:00
Johannes Berg 7b119dc06d mac80211: fix cfg80211 interaction on auth/assoc request
If authentication (or association with FT) is requested by
userspace, mac80211 currently doesn't tell cfg80211 that it
disconnected from the AP. That leaves inconsistent state:
cfg80211 thinks it's connected while mac80211 thinks it's
not. Typically this won't last long, as soon as mac80211
reports the new association to cfg80211 the old one goes
away. If, however, the new authentication or association
doesn't succeed, then cfg80211 will forever think the old
one still exists and will refuse attempts to authenticate
or associate with the AP it thinks it's connected to.

Anders reported that this leads to it taking a very long
time to reconnect to a network, or never even succeeding.
I tested this with an AP hacked to never respond to auth
frames, and one that works, and with just those two the
system never recovers because one won't work and cfg80211
thinks it's connected to the other so refuses connections
to it.

To fix this, simply make mac80211 tell cfg80211 when it is
no longer connected to the old AP, while authenticating or
associating to a new one.

Cc: stable@vger.kernel.org
Reported-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-10 21:38:36 +02:00
Marek Puzyniak 0ca54f6c5f mac80211: provide SSID in IBSS mode
Some drivers need SSID in AP and IBSS mode. AP SSID is provided
through BSS_CHANGED_SSID notification. There was no easy way to
do the same for IBSS. In IBSS mode SSID is known but was not
stored in BSS configuration. Extend the AP-mode functionality
to also work in IBSS mode.

Signed-off-by: Marek Puzyniak <marek.puzyniak@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-10 20:24:18 +02:00
Johannes Berg a21a4d3e8a mac80211: always advertise STBC/MCSes even if no AP support
Advertise STBC capabilities and MCS rates even if the AP
doesn't support them. This has always been the right thing
to do, but used to be problematic with some APs. Now WFA
testing requires this so re-enable it, problematic APs
would then presumably not pass the test and be fixed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-10 20:24:17 +02:00
Marek Puzyniak 0eabccd940 mac80211: clear SSID when stopping AP
When AP interface is stopped ssid_len in the BSS configuration
isn't cleared which can confuse drivers when switching modes.
Set the length to zero when stopping the AP interface.

Signed-off-by: Marek Puzyniak <marek.puzyniak@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-10 20:24:16 +02:00
Simon Wunderlich e47468518b mac80211: fix recalc_radar hwconf sync problem
local->hw.conf maybe not be synced when recalcing whether radar is
enabled, sometimes leaving radar enabled even if it's not neccesary
anymore.

Fix this by:
 * setting radar_enabled when creating the chanctx
 * turning radar_enabled off before destroying the last channel context

Reported-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-10 20:24:16 +02:00
Thomas Pedersen 3088f7d2db mac80211: stringify another plink state
The patch "mac80211: stringify mesh peering events" missed
an opportunity to print the peering state as a string.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-10 20:24:15 +02:00
Thomas Pedersen 9d6d6f4924 mac80211: unset FC retry bit in mesh fwding path
Otherwise forwarded frames would keep the retry bit set
from the previous link transmission.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-10 20:24:15 +02:00
John W. Linville 655d8e2328 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	drivers/net/wireless/ath/carl9170/debug.c
	drivers/net/wireless/ath/carl9170/main.c
	net/mac80211/ieee80211_i.h
2013-04-10 14:09:54 -04:00
Linus Torvalds f94eeb423b NFS client bugfixes for Linux 3.9
- Stable fix for memory corruption issues in nfs4[01]_walk_client_list
 - Stable fix for an Oopsable bug in rpc_clone_client
 - Another state manager deadlock in the NFSv4 open code
 - Memory leaks in nfs4_discover_server_trunking and rpc_new_client
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRZYu9AAoJEGcL54qWCgDySfwP/R2IdO2nfRzmDCPtvD6pPg8T
 l8Gf97Z/8A3g6WwfvmKNt48D1fKnhAcOaKTZQIZuZePAjI/Yy74DFMof6paiDmsO
 8hMcZgvunZotPwmBmhIwmLOxDYgbpdizDBlITsimnUQLrv78bMw2F/cNCcThYgTI
 Q4sNpZsl4kk1nmOYK/tGBCCkq6mIQhc95QeQPgnl2B/NozpZiIqgzrpWpSWMofn2
 cuSLiuEdmpCdJbgQaPEjSWf+doo/nBn720+Xj2RjmLhTTnWUtAsouElAdMs96Jjz
 cEhSll3nLIygr1xdFF7CD8qFjpbtg/YNhKw3HBCFAgHjrAjr+a3N+eHQOz9QQ6W4
 5OL3Mj0VEkvMrK1Sy76smynQJMJhrsn852Zo2wK2mCp+mHNZlBlML529Y4PJy2Ba
 Up4MteIaOTpKGSnBdzWmqPqro9glqlhrUk/o3XipCzIziWC8yDYjl2J9Ez8B7Ren
 uzvBeevYRX9AmQlmZUAPvx8+xVqA6cr0X2q8/6PqPnrNXP6Ff8+rm6gvH4VozyzJ
 qd/r7Bf1ozFXxoKQOztSiGjI5YiBp4DRXycR5td6eF3nZJipmbxY+WKllhaAakn6
 UY2NsGX2zfxkJMltqd2/xRmHtN+Eif1Uoo35pvzNxzBtPsRxBMIiPhGLglQu98Yj
 2NuwfT4//UNfS6JlBe6E
 =kBf2
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.9-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - fix for memory corruption issues in nfs4[01]_walk_client_list (stable)
 - fix for an Oopsable bug in rpc_clone_client (stable)
 - another state manager deadlock in the NFSv4 open code
 - memory leaks in nfs4_discover_server_trunking and rpc_new_client

* tag 'nfs-for-3.9-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix another potential state manager deadlock
  SUNRPC: Fix a potential memory leak in rpc_new_client
  NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
  NFSv4: Fix a memory leak in nfs4_discover_server_trunking
  SUNRPC: Remove extra xprt_put()
2013-04-10 09:00:51 -07:00
John W. Linville d3641409a0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/rt2x00/rt2x00pci.c
	net/mac80211/sta_info.c
	net/wireless/core.h
2013-04-10 10:39:27 -04:00
John W. Linville 6fe5468f45 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
Conflicts:
	drivers/net/wireless/rt2x00/rt2x00pci.c
2013-04-10 09:31:39 -04:00
Jozsef Kadlecsik 6eb4c7e96e netfilter: ipset: hash:*net*: nomatch flag not excluded on set resize
If a resize is triggered the nomatch flag is not excluded at hashing,
which leads to the element missed at lookup in the resized set.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-09 21:04:16 +02:00
Jozsef Kadlecsik 02f815cb6d netfilter: ipset: list:set: fix reference counter update
The last element can be replaced or pushed off and in both
cases the reference counter must be updated.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-09 21:02:11 +02:00
David S. Miller 7b9a035f0c Merge branch 'fixes-for-3.9' of git://gitorious.org/linux-can/linux-can
Marc Kleine-Budde says:

====================
here's a fix for the v3.9 release cycle, if not too late:

Wei Yongjun contributes a patch for the can-gw protocoll. The patch fixes the
memory allocated with kmem_cache_alloc(), is now freed using kmem_cache_free(),
not kfree().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:24:35 -04:00
Eric Dumazet ca10b9e9a8 selinux: add a skb_owned_by() hook
Commit 90ba9b1986 (tcp: tcp_make_synack() can use alloc_skb())
broke certain SELinux/NetLabel configurations by no longer correctly
assigning the sock to the outgoing SYNACK packet.

Cost of atomic operations on the LISTEN socket is quite big,
and we would like it to happen only if really needed.

This patch introduces a new security_ops->skb_owned_by() method,
that is a void operation unless selinux is active.

Reported-by: Miroslav Vadkerti <mvadkert@redhat.com>
Diagnosed-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-security-module@vger.kernel.org
Acked-by: James Morris <james.l.morris@oracle.com>
Tested-by: Paul Moore <pmoore@redhat.com>
Acked-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:23:11 -04:00
Zefan Li 6ffd464102 netprio_cgroup: remove task_struct parameter from sock_update_netprio()
The callers always pass current to sock_update_netprio().

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:19:37 -04:00
Zefan Li 211d2f97e9 cls_cgroup: remove task_struct parameter from sock_update_classid()
The callers always pass current to sock_update_classid().

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:19:35 -04:00
Zefan Li 10b96f7306 tcp_memcontrol: remove a redundant statement in tcp_destroy_cgroup()
We read the value but make no use of it.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:19:34 -04:00
Daniel Borkmann 617fe29d45 net: ipv6: only invalidate previously tokenized addresses
Instead of invalidating all IPv6 addresses with global scope
when one decides to use IPv6 tokens, we should only invalidate
previous tokens and leave the rest intact until they expire
eventually (or are intact forever). For doing this less greedy
approach, we're adding a bool at the end of inet6_ifaddr structure
instead, for two reasons: i) per-inet6_ifaddr flag space is
already used up, making it wider might not be a good idea,
since ii) also we do not necessarily need to export this
information into user space.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:12:23 -04:00
Daniel Borkmann fc403832f7 net: ipv6: also allow token to be set when device not ready
When we set the iftoken in inet6_set_iftoken(), we return -EINVAL
when the device does not have flag IF_READY. This is however not
necessary and rather an artificial usability barrier, since we
simply can set the token despite that, and in case the device is
ready, we just send out our rs, otherwise ifup et al. will do
this for us anyway.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:12:23 -04:00
Daniel Borkmann 914faa147b net: ipv6: minor: use in6addr_any in token init
Since we check for !ipv6_addr_any(&in6_dev->token) in
addrconf_prefix_rcv(), make the token initialization on
device setup more intuitive by using in6addr_any as an
initializer.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-09 13:12:23 -04:00
Wei Yongjun 3480a21259 can: gw: use kmem_cache_free() instead of kfree()
Memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().

Cc: linux-stable <stable@vger.kernel.org> # >= v3.2
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2013-04-09 09:58:44 +02:00
Wei Yongjun c802d75962 netrom: fix invalid use of sizeof in nr_recvmsg()
sizeof() when applied to a pointer typed expression gives the size of the
pointer, not that of the pointed data.
Introduced by commit 3ce5ef(netrom: fix info leak via msg_name in nr_recvmsg)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 22:49:23 -04:00
Ursula Braun f9c41a62bb af_iucv: fix recvmsg by replacing skb_pull() function
When receiving data messages, the "BUG_ON(skb->len < skb->data_len)" in
the skb_pull() function triggers a kernel panic.

Replace the skb_pull logic by a per skb offset as advised by
Eric Dumazet.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 17:16:57 -04:00
Michael Riesch 88c5b5ce5c rtnetlink: Call nlmsg_parse() with correct header length
Signed-off-by: Michael Riesch <michael.riesch@omicron.at>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Benc <jbenc@redhat.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-kernel@vger.kernel.org
Acked-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 17:12:26 -04:00
Daniel Borkmann f53adae4ea net: ipv6: add tokenized interface identifier support
This patch adds support for IPv6 tokenized IIDs, that allow
for administrators to assign well-known host-part addresses
to nodes whilst still obtaining global network prefix from
Router Advertisements. It is currently in draft status.

  The primary target for such support is server platforms
  where addresses are usually manually configured, rather
  than using DHCPv6 or SLAAC. By using tokenised identifiers,
  hosts can still determine their network prefix by use of
  SLAAC, but more readily be automatically renumbered should
  their network prefix change. [...]

  The disadvantage with static addresses is that they are
  likely to require manual editing should the network prefix
  in use change.  If instead there were a method to only
  manually configure the static identifier part of the IPv6
  address, then the address could be automatically updated
  when a new prefix was introduced, as described in [RFC4192]
  for example.  In such cases a DNS server might be
  configured with such a tokenised interface identifier of
  ::53, and SLAAC would use the token in constructing the
  interface address, using the advertised prefix. [...]

  http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02

The implementation is partially based on top of Mark K.
Thompson's proof of concept. However, it uses the Netlink
interface for configuration resp. data retrival, so that
it can be easily extended in future. Successfully tested
by myself.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 16:55:28 -04:00
John W. Linville 8cab24f0b1 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-04-08 14:30:26 -04:00
John W. Linville 64e5751918 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-04-08 14:26:57 -04:00
Alan Ott 9f7f78b479 mac802154: Keep track of the channel when changed
Two sections checked whether the current channel != the new channel
without ever setting the current channel variables.

1. net/mac802154/tx.c: Prevent set_channel() from getting called every
time a packet is sent.

2. net/mac802154/mib.c: Lock (pib_lock) accesses to current_channel and
current_page and make sure they are updated when the channel has been
changed.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 12:09:18 -04:00
Mathias Krause c17277f7d7 TTY: ircomm, use GFP_KERNEL in ircomm_open()
Hi Greg,

I'm unsure if you or Dave should take that one as it's for one a TTY
patch but also living under net/. So I'm uncertain and let you decide!

Thanks,
Mathias

-- >8 --
Subject: [PATCH] TTY: ircomm, use GFP_KERNEL in ircomm_open()

We're clearly running in non-atomic context as our only call site is
able to call wait_event_interruptible(). So we're safe to use GFP_KERNEL
here instead of GFP_ATOMIC.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 12:09:18 -04:00
Mathias Krause e1e3c806da irda: use GFP_KERNEL in irda_connect_response()
The only call site of irda_connect_response() is irda_accept() -- a
function called from user context only. Therefore it has no need for
GFP_ATOMIC.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 12:09:18 -04:00
Mathias Krause 84e2306e94 irda: use GFP_KERNEL in irda_create()
irda_create() is called from user context only, therefore has no need
for GFP_ATOMIC.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 12:09:17 -04:00
Jiri Pirko c988d1e8cb net: ipv4: fix schedule while atomic bug in check_lifetime()
move might_sleep operations out of the rcu_read_lock() section.
Also fix iterating over ifa_dev->ifa_list

Introduced by: commit 5c766d642b "ipv4: introduce address lifetime"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 12:04:51 -04:00
Jiri Pirko 05a324b9c5 net: ipv4: reset check_lifetime_work after changing lifetime
This will result in calling check_lifetime in nearest opportunity and
that function will adjust next time to call check_lifetime correctly.
Without this, check_lifetime is called in time computed by previous run,
not affecting modified lifetime.

Introduced by: commit 5c766d642b "ipv4: introduce address lifetime"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 12:04:51 -04:00
Eric Dumazet 22251c73ca ip_gre: fix a possible crash in parse_gre_header()
pskb_may_pull() can change skb->head, so we must init iph/greh after
calling it.

Bug added in commit c544193214 (GRE: Refactor GRE tunneling code.)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 12:00:17 -04:00
Werner Almesberger 56aa091d60 ieee802154/nl-mac.c: make some MLME operations optional
Check for NULL before calling the following operations from "struct
ieee802154_mlme_ops": assoc_req, assoc_resp, disassoc_req, start_req,
and scan_req.

This fixes a current oops where those functions are called but not
implemented. It also updates the documentation to clarify that they
are now optional by design. If a call to an unimplemented function
is attempted, the kernel returns EOPNOTSUPP via netlink.

The following operations are still required: get_phy, get_pan_id,
get_short_addr, and get_dsn.

Note that the places where this patch changes the initialization
of "ret" should not affect the rest of the code since "ret" was
always set (again) before returning its value.

Signed-off-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 12:00:16 -04:00
Johannes Berg 62a40a1555 mac80211: fix LED in idle handling
feng xiangjun reports that my

commit 382a103b2b
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Fri Mar 22 22:30:09 2013 +0100

    mac80211: fix idle handling sequence

broke the wireless status LED. The reason is that
we now call ieee80211_idle_off() when the channel
context is assigned, and that doesn't recalculate
the LED state. Fix this by making that function a
wrapper around most of idle recalculation while
forcing active.

Reported-by: feng xiangjun <fengxj325@gmail.com>
Tested-by: feng xiangjun <fengxj325@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 13:12:05 +02:00
Patrick McHardy aaa795ad25 netfilter: nat: propagate errors from xfrm_me_harder()
Propagate errors from ip_xfrm_me_harder() instead of returning EPERM in
all cases.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-08 12:34:01 +02:00
Patrick McHardy 58e35d1471 netfilter: ipv6: propagate routing errors from ip6_route_me_harder()
Propagate routing errors from ip_route_me_harder() when dropping a packet
using NF_DROP_ERR(). This makes userspace get the proper error instead of
EPERM for everything.

# ip -6 r a unreachable default table 100
# ip -6 ru add fwmark 0x1 lookup 100
# ip6tables -t mangle -A OUTPUT -d 2001:4860:4860::8888 -j MARK --set-mark 0x1

Old behaviour:

PING 2001:4860:4860::8888(2001:4860:4860::8888) 56 data bytes
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted

New behaviour:

PING 2001:4860:4860::8888(2001:4860:4860::8888) 56 data bytes
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-08 12:34:01 +02:00
Patrick McHardy c9e1673a0a netfilter: ipv4: propagate routing errors from ip_route_me_harder()
Propagate routing errors from ip_route_me_harder() when dropping a packet
using NF_DROP_ERR(). This makes userspace get the proper error instead of
EPERM for everything.

Example:

# ip r a unreachable default table 100
# ip ru add fwmark 0x1 lookup 100
# iptables -t mangle -A OUTPUT -d 8.8.8.8 -j MARK --set-mark 0x1

Current behaviour:

PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted

New behaviour:

PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-08 12:34:00 +02:00
Johannes Berg ddc4db2e3d mac80211: make ieee802_11_parse_elems an inline
This (slightly) reduces the code size.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 11:08:32 +02:00
Johannes Berg 2b730daace mac80211: don't start new netdev queues if driver stopped
If a new netdev (e.g. an AP VLAN) is created while the driver
has queues stopped, the new netdev queues will be started even
though they shouldn't. This will lead to frames accumulating
on the internal mac80211 pending queues instead of properly
being held on the netdev queues.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 11:06:28 +02:00
Johannes Berg a23108248a mac80211: replace some dead code by a warning
Given the (nested) switch statements, this code can't
be reached, so make it warn instead of manipulating
the carrier state which seems purposeful.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 11:06:24 +02:00
Johannes Berg a159838324 mac80211: don't fiddle with netdev queues in MLME code
The netdev queues should always represent the state that
the driver gave them, so fiddling with them isn't really
appropriate in the mlme code. Also, since we stop queues
for flushing now, this really isn't necessary any more.

As the scan/offchannel code has also been modified to no
longer do this a while ago, remove the outdated smp_mb()
and comments about it.

While at it, also add a pair of braces that was missing.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 11:06:19 +02:00
Alexander Bondar 24aa11ab8a mac80211: disable uAPSD if all ACs are under ACM
It's unlikely that an AP requires WMM mandatory admission control
for all access categories, and if it does then we still transmit
on the background AC without requesting admission. However, avoid
using uAPSD in this case since the implementation could run into
issues and might use other ACs etc.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 11:05:45 +02:00
Johannes Berg b2c0958b20 mac80211: fix do_stop handling while suspended
When a device is unplugged while suspended, mac80211 is
de-initialized and all interfaces are removed while no
state is actually present in the driver. This can cause
warnings and driver confusion.

Fix this by reordering the do_stop code to not call the
driver when it is suspended, i.e. when there's no state
in the driver anyway.

The previous patches removed a few corner cases in ROC
and virtual monitor interfaces so that now this is safe
to do and no state should be left over.

Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:17:01 +02:00
Johannes Berg 3c3e21e744 mac80211: destroy virtual monitor interface across suspend
It has to be removed from the driver, but completely
destroying it helps handle unplug of a device during
suspend since then the channel context handling etc.
doesn't have to happen later when it's removed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:17:00 +02:00
Johannes Berg c8f994eec2 mac80211: purge remain-on-channel items when suspending
They can't really be executed while suspended and could
trigger work warnings, so abort all ROC items. When the
system resumes the notifications about this will be
delivered to userspace which can then act accordingly
(though it will assume they were canceled/finished.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:17:00 +02:00
Johannes Berg afdc7c18e9 mac80211: remove outdated comment referring to master interface
The code now explicitly calls ieee80211_configure_filter()
anyway, so nothing needs to be explained.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:16:59 +02:00
Bob Copeland ae76eef027 mac80211: return new mpath from mesh_path_add()
Most times that mesh_path_add() is called, it is followed by
a lookup to get the just-added mpath.  We can instead just
return the new mpath in the case that we allocated one (or the
existing one if already there), so do that.  Also, reorder the
code in mesh_path_add a bit so that we don't need to allocate
in the pre-existing case.

Signed-off-by: Bob Copeland <bob@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:16:59 +02:00
Chun-Yeow Yeoh 0f71651f93 mac80211: fix the PREP mesh hwmp debug message
The mesh hwmp debug message is a bit confusing. The "sending PREP
to %p" should be the MAC address of mesh STA that has originated
the PREQ message and the "received PREP from %pM" should be the MAC
address of the mesh STA that has originated the PREP message.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:16:58 +02:00
Johannes Berg 79ba1d8910 mac80211: parse Timeout Interval Element using a struct
Instead of open-coding the accesses and length check do
the length check in the IE parser and assign a struct
pointer for use in the remaining code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:16:58 +02:00
Johannes Berg 1946bed957 mac80211: check ERP info IE length in parser
It's always just one byte, so check for that and
remove the length field from the parser struct.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:16:57 +02:00
Johannes Berg 1cd8e88e17 mac80211: check DSSS params IE length in parser
It's always just one byte, so check for that and
remove the length field from the parser struct.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:16:56 +02:00
Johannes Berg a6dfba841c mac80211: remove unused IE pointers from parser
There's no need to parse IEs that aren't used
so just remove them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:16:56 +02:00
Johannes Berg c5d54fbf0e mac80211: remove ancient reference to master interface
The master interface no longer exists ... and hasn't for
a few years now, so remove this reference :-)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:16:55 +02:00
Ben Greear a13fbe549f mac80211: be more careful about sending beacon-loss-events
I don't think we should send the events unless it was actually
a beacon that was lost...not just any probe of an AP.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:16:55 +02:00
Ben Greear 78e443e4c6 mac80211: add beacon stats to debugfs
Beacon-timeout and number of beacon loss events.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:16:54 +02:00
Karl Beldan d0e6c21acd mac80211: let drivers not supporting channel contexts use VHT
It is possible since the global hw config and local switched to
cfg80211_chan_def.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-04-08 09:16:44 +02:00
Eric W. Biederman 6b0ee8c036 scm: Stop passing struct cred
Now that uids and gids are completely encapsulated in kuid_t
and kgid_t we no longer need to pass struct cred which allowed
us to test both the uid and the user namespace for equality.

Passing struct cred potentially allows us to pass the entire group
list as BSD does but I don't believe the cost of cache line misses
justifies retaining code for a future potential application.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 18:58:55 -04:00
David S. Miller d978a6361a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/nfc/microread/mei.c
	net/netfilter/nfnetlink_queue_core.c

Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which
some cleanups are going to go on-top.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 18:37:01 -04:00
Wei Yongjun 8303e699f7 decnet: remove duplicated include from dn_table.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:12:01 -04:00
Wei Yongjun 2f13e9f741 net_cls: remove duplicated include from cls_api.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:12:01 -04:00
Alan Ott fc52eea4c5 6lowpan: handle dev_queue_xmit() error code properly
dev_queue_xmit() will return a positive value if the packet could not be
queued, often because the real network device (in our case the mac802154
wpan device) has its queue stopped.  lowpan_xmit() should handle the
positive return code (for the debug statement) and return that value to
the higher layer so the higher layer will retry sending the packet.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:06:44 -04:00
Alan Ott e937f583ec mac802154: Increase tx_buffer_len
Increase the buffer length from 10 to 300 packets. Consider that traffic on
mac802154 devices will often be 6LoWPAN, and a full-length (1280 octet)
IPv6 packet will fragment into 15 6LoWPAN fragments (because the MTU of
IEEE 802.15.4 is 127).  A 300-packet queue is really 20 full-length IPv6
packets.

With a queue length of 10, an entire IPv6 packet was unable to get queued
at one time, causing fragments to be dropped, and making reassembly
impossible.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:06:43 -04:00
Alan Ott b5992fe962 mac802154: Use netif flow control
Use netif_stop_queue() and netif_wake_queue() to control the flow of
packets to mac802154 devices.  Since many IEEE 802.15.4 devices have no
output buffer, and since the mac802154 xmit() function is designed to
block, netif_stop_queue() is called after each packet.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:06:43 -04:00
Alan Ott 7dd43d356e mac802154: Do not try to resend failed packets
When ops->xmit() fails, drop the packet. Devices which support hardware
ack and retry (which include all devices currently supported by mainline),
will automatically retry sending the packet (in the hardware) up to 3
times, per the 802.15.4 spec.  There is no need, and it is incorrect to
try to do it in mac802154.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:06:43 -04:00
Wei Yongjun 524fba6c14 sctp: fix error return code in __sctp_connect()
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:04:17 -04:00
Cong Wang cfbe800b8b 802: fix a possible race condition
(Resend with a better changelog)

garp_pdu_queue() should ways be called with this spin lock.
garp_uninit_applicant() only holds rtnl lock which is not
enough here.  A possible race can happen as garp_pdu_rcv()
is called in BH context:

	garp_pdu_rcv()
	  |->garp_pdu_parse_msg()
	    |->garp_pdu_parse_attr()
	      |-> garp_gid_event()

Found by code inspection.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Ward <david.ward@ll.mit.edu>
Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 17:04:17 -04:00
Mathias Krause d5e0d0f607 VSOCK: Fix missing msg_namelen update in vsock_stream_recvmsg()
The code misses to update the msg_namelen member to 0 and therefore
makes net/socket.c leak the local, uninitialized sockaddr_storage
variable to userland -- 128 bytes of kernel stack memory.

Cc: Andy King <acking@vmware.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: George Zhang <georgezhang@vmware.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:02 -04:00
Mathias Krause 680d04e0ba VSOCK: vmci - fix possible info leak in vmci_transport_dgram_dequeue()
In case we received no data on the call to skb_recv_datagram(), i.e.
skb->data is NULL, vmci_transport_dgram_dequeue() will return with 0
without updating msg_namelen leading to net/socket.c leaking the local,
uninitialized sockaddr_storage variable to userland -- 128 bytes of
kernel stack memory.

Fix this by moving the already existing msg_namelen assignment a few
lines above.

Cc: Andy King <acking@vmware.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: George Zhang <georgezhang@vmware.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:02 -04:00
Mathias Krause 60085c3d00 tipc: fix info leaks via msg_name in recv_msg/recv_stream
The code in set_orig_addr() does not initialize all of the members of
struct sockaddr_tipc when filling the sockaddr info -- namely the union
is only partly filled. This will make recv_msg() and recv_stream() --
the only users of this function -- leak kernel stack memory as the
msg_name member is a local variable in net/socket.c.

Additionally to that both recv_msg() and recv_stream() fail to update
the msg_namelen member to 0 while otherwise returning with 0, i.e.
"success". This is the case for, e.g., non-blocking sockets. This will
lead to a 128 byte kernel stack leak in net/socket.c.

Fix the first issue by initializing the memory of the union with
memset(0). Fix the second one by setting msg_namelen to 0 early as it
will be updated later if we're going to fill the msg_name member.

Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:02 -04:00
Mathias Krause 4a184233f2 rose: fix info leak via msg_name in rose_recvmsg()
The code in rose_recvmsg() does not initialize all of the members of
struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
Nor does it initialize the padding bytes of the structure inserted by
the compiler for alignment. This will lead to leaking uninitialized
kernel stack bytes in net/socket.c.

Fix the issue by initializing the memory used for sockaddr info with
memset(0).

Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:02 -04:00
Mathias Krause d26d6504f2 NFC: llcp: fix info leaks via msg_name in llcp_sock_recvmsg()
The code in llcp_sock_recvmsg() does not initialize all the members of
struct sockaddr_nfc_llcp when filling the sockaddr info. Nor does it
initialize the padding bytes of the structure inserted by the compiler
for alignment.

Also, if the socket is in state LLCP_CLOSED or is shutting down during
receive the msg_namelen member is not updated to 0 while otherwise
returning with 0, i.e. "success". The msg_namelen update is also
missing for stream and seqpacket sockets which don't fill the sockaddr
info.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix the first issue by initializing the memory used for sockaddr info
with memset(0). Fix the second one by setting msg_namelen to 0 early.
It will be updated later if we're going to fill the msg_name member.

Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:02 -04:00
Mathias Krause 3ce5efad47 netrom: fix info leak via msg_name in nr_recvmsg()
In case msg_name is set the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of
struct sockaddr_ax25 inserted by the compiler for alignment. Also
the sax25_ndigis member does not get assigned, leaking four more
bytes.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix both issues by initializing the memory with memset(0).

Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:02 -04:00
Mathias Krause c77a4b9cff llc: Fix missing msg_namelen update in llc_ui_recvmsg()
For stream sockets the code misses to update the msg_namelen member
to 0 and therefore makes net/socket.c leak the local, uninitialized
sockaddr_storage variable to userland -- 128 bytes of kernel stack
memory. The msg_namelen update is also missing for datagram sockets
in case the socket is shutting down during receive.

Fix both issues by setting msg_namelen to 0 early. It will be
updated later if we're going to fill the msg_name member.

Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:01 -04:00
Mathias Krause b860d3cc62 l2tp: fix info leak in l2tp_ip6_recvmsg()
The L2TP code for IPv6 fails to initialize the l2tp_conn_id member of
struct sockaddr_l2tpip6 and therefore leaks four bytes kernel stack
in l2tp_ip6_recvmsg() in case msg_name is set.

Initialize l2tp_conn_id with 0 to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:01 -04:00
Mathias Krause a5598bd9c0 iucv: Fix missing msg_namelen update in iucv_sock_recvmsg()
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about iucv_sock_recvmsg() not filling the msg_name in case it was set.

Cc: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:01 -04:00
Mathias Krause 5ae94c0d2f irda: Fix missing msg_namelen update in irda_recvmsg_dgram()
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about irda_recvmsg_dgram() not filling the msg_name in case it was
set.

Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:01 -04:00
Mathias Krause 2d6fbfe733 caif: Fix missing msg_namelen update in caif_seqpkt_recvmsg()
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about caif_seqpkt_recvmsg() not filling the msg_name in case it was
set.

Cc: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:01 -04:00
Mathias Krause c8c499175f Bluetooth: SCO - Fix missing msg_namelen update in sco_sock_recvmsg()
If the socket is in state BT_CONNECT2 and BT_SK_DEFER_SETUP is set in
the flags, sco_sock_recvmsg() returns early with 0 without updating the
possibly set msg_namelen member. This, in turn, leads to a 128 byte
kernel stack leak in net/socket.c.

Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_recvmsg().

Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:01 -04:00
Mathias Krause e11e0455c0 Bluetooth: RFCOMM - Fix missing msg_namelen update in rfcomm_sock_recvmsg()
If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns
early with 0 without updating the possibly set msg_namelen member. This,
in turn, leads to a 128 byte kernel stack leak in net/socket.c.

Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_stream_recvmsg().

Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:00 -04:00
Mathias Krause 4683f42fde Bluetooth: fix possible info leak in bt_sock_recvmsg()
In case the socket is already shutting down, bt_sock_recvmsg() returns
with 0 without updating msg_namelen leading to net/socket.c leaking the
local, uninitialized sockaddr_storage variable to userland -- 128 bytes
of kernel stack memory.

Fix this by moving the msg_namelen assignment in front of the shutdown
test.

Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:00 -04:00
Mathias Krause ef3313e84a ax25: fix info leak via msg_name in ax25_recvmsg()
When msg_namelen is non-zero the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of struct
sockaddr_ax25 inserted by the compiler for alignment. Additionally the
msg_namelen value is updated to sizeof(struct full_sockaddr_ax25) but is
not always filled up to this size.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix both issues by initializing the memory with memset(0).

Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:00 -04:00
Mathias Krause 9b3e617f3d atm: update msg_namelen in vcc_recvmsg()
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about vcc_recvmsg() not filling the msg_name in case it was set.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 16:28:00 -04:00
Christoph Paasch 50a75a8914 ipv6/tcp: Stop processing ICMPv6 redirect messages
Tetja Rediske found that if the host receives an ICMPv6 redirect message
after sending a SYN+ACK, the connection will be reset.

He bisected it down to 093d04d (ipv6: Change skb->data before using
icmpv6_notify() to propagate redirect), but the origin of the bug comes
from ec18d9a26 (ipv6: Add redirect support to all protocol icmp error
handlers.). The bug simply did not trigger prior to 093d04d, because
skb->data did not point to the inner IP header and thus icmpv6_notify
did not call the correct err_handler.

This patch adds the missing "goto out;" in tcp_v6_err. After receiving
an ICMPv6 Redirect, we should not continue processing the ICMP in
tcp_v6_err, as this may trigger the removal of request-socks or setting
sk_err(_soft).

Reported-by: Tetja Rediske <tetja@tetja.de>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 12:36:08 -04:00
David S. Miller d16658206a Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter and IPVS updates for
your net-next tree, most relevantly they are:

* Add net namespace support to NFLOG, ULOG and ebt_ulog and NFQUEUE.
  The LOG and ebt_log target has been also adapted, but they still
  depend on the syslog netnamespace that seems to be missing, from
  Gao Feng.

* Don't lose indications of congestion in IPv6 fragmentation handling,
  from Hannes Frederic Sowa.i

* IPVS conversion to use RCU, including some code consolidation patches
  and optimizations, also some from Julian Anastasov.

* cpu fanout support for NFQUEUE, from Holger Eitzenberger.

* Better error reporting to userspace when dropping packets from
  all our _*_[xfrm|route]_me_harder functions, from Patrick McHardy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07 12:22:06 -04:00
Patrick McHardy 3a7b21eaf4 netfilter: nf_ct_sip: don't drop packets with offsets pointing outside the packet
Some Cisco phones create huge messages that are spread over multiple packets.
After calculating the offset of the SIP body, it is validated to be within
the packet and the packet is dropped otherwise. This breaks operation of
these phones. Since connection tracking is supposed to be passive, just let
those packets pass unmodified and untracked.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-06 14:03:18 +02:00
Hannes Frederic Sowa b8dd6a223e netfilter: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling
This change brings netfilter reassembly logic on par with
reassembly.c. The corresponding change in net-next is
(eec2e61 ipv6: implement RFC3168 5.3 (ecn protection) for
ipv6 fragmentation handling)

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-06 13:06:37 +02:00
David Herrmann b3916db32c Bluetooth: hidp: verify l2cap sockets
We need to verify that the given sockets actually are l2cap sockets. If
they aren't, we are not supposed to access bt_sk(sock) and we shouldn't
start the session if the offsets turn out to be valid local BT addresses.

That is, if someone passes a TCP socket to HIDCONNADD, then we access some
random offset in the TCP socket (which isn't even guaranteed to be valid).

Fix this by checking that the socket is an l2cap socket.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-05 23:44:14 -03:00
David Herrmann c849edbdc2 Bluetooth: hidp: remove redundant error message
We print this error twice in the first error-path so remove it. One error
message is enough.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-04-05 23:44:07 -03:00
Trond Myklebust f05c124a70 SUNRPC: Fix a potential memory leak in rpc_new_client
If the call to rpciod_up() fails, we currently leak a reference to the
struct rpc_xprt.
As part of the fix, we also remove the redundant check for xprt!=NULL.
This is already taken care of by the callers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-05 17:02:14 -04:00
Chuck Lever a58e0be6f6 SUNRPC: Remove extra xprt_put()
While testing error cases where rpc_new_client() fails, I saw
some oopses.

If rpc_new_client() fails, it already invokes xprt_put().  Thus
__rpc_clone_client() does not need to invoke it again.

Introduced by commit 1b63a751 "SUNRPC: Refactor rpc_clone_client()"
Fri Sep 14, 2012.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org [>=3.7]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-05 16:58:14 -04:00
Patrick McHardy 124dff01af netfilter: don't reset nf_trace in nf_reset()
Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.

nf_reset() is used in the following cases:

- when passing packets up the the socket layer, at which point we want to
  release all netfilter references that might keep modules pinned while
  the packet is queued. nf_trace doesn't matter anymore at this point.

- when encapsulating or decapsulating IPsec packets. We want to continue
  tracing these packets after IPsec processing.

- when passing packets through virtual network devices. Only devices on
  that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
  used anymore. Its not entirely clear whether those packets should
  be traced after that, however we've always done that.

- when passing packets through virtual network devices that make the
  packet cross network namespace boundaries. This is the only cases
  where we clearly want to reset nf_trace and is also what the
  original patch intended to fix.

Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 15:38:10 -04:00
Pablo Neira Ayuso 12202fa757 netfilter: remove unneeded variable proc_net_netfilter
Now that this supports net namespace for nflog and nfqueue,
we can remove the global proc_net_netfilter which has no
clients anymore.

Based on patch from Gao feng.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:08:11 +02:00
Gao feng e817961048 netfilter: nfnetlink_queue: add net namespace support for nfnetlink_queue
This patch makes /proc/net/netfilter/nfnetlink_queue pernet.
Moreover, there's a pernet instance table and lock.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:08:11 +02:00
Gao feng 5b023fc8d8 netfilter: enable per netns support for nf_loggers
After this patch, all nf_loggers support net namespace. Still
xt_LOG and ebt_log require syslog netns support.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:08:10 +02:00
Gao feng 9368a53c47 netfilter: nfnetlink_log: add net namespace support for nfnetlink_log
This patch makes /proc/net/netfilter/nfnetlink_log pernet.
Moreover, there's a pernet instance table and lock.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:08:01 +02:00
Gao feng 355430671a netfilter: ipt_ULOG: add net namespace support for ipt_ULOG
Add pernet support to ipt_ULOG by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

This patch also make ulog_buffers and netlink socket
nflognl per netns.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 21:03:43 +02:00
Gao feng 5b175b7ebd netfilter: ebt_ulog: add net namespace support for ebt_ulog
Add pernet support to ebt_ulog by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

This patch also make ulog_buffers and netlink socket
ebtulognl per netns.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:59:50 +02:00
Gao feng 69b34fb996 netfilter: xt_LOG: add net namespace support for xt_LOG
Add pernet support to xt_LOG by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

Since syslog ns has yet not been implemented, we don't want
the containers to DDOS host's syslogd. So only enable ebt_log
only from init_net and wait for syslog ns support

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:58:45 +02:00
Gao feng 7d2789246c netfilter: ebt_log: add net namespace support for ebt_log
Add pernet support to ebt_log by means of the new nf_log_set
function added in (30e0c6a netfilter: nf_log: prepare net
namespace support for loggers).

Since syslog ns has yet not been implemented, we don't want
the containers to DDOS host's syslogd. So only enable ebt_log
only from init_net and wait for syslog ns support.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:57:27 +02:00
Gao feng 30e0c6a6be netfilter: nf_log: prepare net namespace support for loggers
This patch adds netns support to nf_log and it prepares netns
support for existing loggers. It is composed of four major
changes.

1) nf_log_register has been split to two functions: nf_log_register
   and nf_log_set. The new nf_log_register is used to globally
   register the nf_logger and nf_log_set is used for enabling
   pernet support from nf_loggers.

   Per netns is not yet complete after this patch, it comes in
   separate follow up patches.

2) Add net as a parameter of nf_log_bind_pf. Per netns is not
   yet complete after this patch, it only allows to bind the
   nf_logger to the protocol family from init_net and it skips
   other cases.

3) Adapt all nf_log_packet callers to pass netns as parameter.
   After this patch, this function only works for init_net.

4) Make the sysctl net/netfilter/nf_log pernet.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 20:12:54 +02:00
Gao feng f3c1a44a22 netfilter: make /proc/net/netfilter pernet
This patch makes this proc dentry pernet. So far only init_net
had a /proc/net/netfilter directory.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05 19:35:02 +02:00
Jiri Pirko 34e2ed34a0 net: ipv4: notify when address lifetime changes
if userspace changes lifetime of address, send netlink notification and
call notifier.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 00:51:12 -04:00
Eric W. Biederman 0e82e7f6df af_unix: If we don't care about credentials coallesce all messages
It was reported that the following LSB test case failed
https://lsbbugs.linuxfoundation.org/attachment.cgi?id=2144 because we
were not coallescing unix stream messages when the application was
expecting us to.

The problem was that the first send was before the socket was accepted
and thus sock->sk_socket was NULL in maybe_add_creds, and the second
send after the socket was accepted had a non-NULL value for sk->socket
and thus we could tell the credentials were not needed so we did not
bother.

The unnecessary credentials on the first message cause
unix_stream_recvmsg to start verifying that all messages had the same
credentials before coallescing and then the coallescing failed because
the second message had no credentials.

Ignoring credentials when we don't care in unix_stream_recvmsg fixes a
long standing pessimization which would fail to coallesce messages when
reading from a unix stream socket if the senders were different even if
we did not care about their credentials.

I have tested this and verified that the in the LSB test case mentioned
above that the messages do coallesce now, while the were failing to
coallesce without this change.

Reported-by: Karel Srot <ksrot@redhat.com>
Reported-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 00:49:13 -04:00
Eric W. Biederman 25da0e3e9d Revert "af_unix: dont send SCM_CREDENTIAL when dest socket is NULL"
This reverts commit 14134f6584.

The problem that the above patch was meant to address is that af_unix
messages are not being coallesced because we are sending unnecesarry
credentials.  Not sending credentials in maybe_add_creds totally
breaks unconnected unix domain sockets that wish to send credentails
to other sockets.

In practice this break some versions of udev because they receive a
message and the sending uid is bogus so they drop the message.

Reported-by: Sven Joachim <svenjoac@gmx.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 00:49:03 -04:00
Vlad Yasevich 4543fbefe6 net: count hw_addr syncs so that unsync works properly.
A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices.  In
some cases (bond/team) a single address list is synched down
to multiple devices.  At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.

Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-05 00:18:46 -04:00
David S. Miller 4f4ecd5f2a Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains netfilter updates for your net tree,
they are:

* Fix missing the skb->trace reset in nf_reset, noticed by Gao Feng
  while using the TRACE target with several net namespaces.

* Fix prefix translation in IPv6 NPT if non-multiple of 32 prefixes
  are used, from Matthias Schiffer.

* Fix invalid nfacct objects with empty name, they are now rejected
  with -EINVAL, spotted by Michael Zintakis, patch from myself.

* A couple of fixes for wrong return values in the error path of
  nfnetlink_queue and nf_conntrack, from Wei Yongjun.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-04 17:41:53 -04:00
David S. Miller 518314ffe4 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into wireless
John W. Linville says:

====================
Here are some more fixes intended for the 3.9 stream...

Regarding the mac80211 bits, Johannes says:

"I had changed the idle handling to simplify it, but broken the
sequencing of commands, at least for ath9k-htc, one patch restores the
sequence. The other patch fixes a crash Jouni found while stress-testing
the remain-on-channel code, when an item is deleted the work struct can
run twice and crash the second time."

As for the iwlwifi bits, Johannes says:

"The only fix here is to the passive-no-RX firmware regulatory
enforcement driver support code to not drop auth frames in quick
succession, leading to not being able to connect to APs on passive
channels in certain circumstances."

Don't forget the NFC bits, about which Samuel says:

"This time we have:

- A crash fix for when a DGRAM LLCP socket is listening while the NFC adapter
  is physically removed.
- A potential double skb free when the LLCP socket receive queue is full.
- A fix for properly handling multiple and consecutive LLCP connections, and
  not trash the socket ack log.
- A build failure for the MEI microread physical layer, now that the MEI bus
  APIs have been merged into char-misc-next."

On top of that, Stone Piao provides an mwifiex fix to avoid accessing
beyond the end of a buffer.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-04 17:39:06 -04:00
Jesper Dangaard Brouer 19952cc4f8 net: frag queue per hash bucket locking
This patch implements per hash bucket locking for the frag queue
hash.  This removes two write locks, and the only remaining write
lock is for protecting hash rebuild.  This essentially reduce the
readers-writer lock to a rebuild lock.

This patch is part of "net: frag performance followup"
 http://thread.gmane.org/gmane.linux.network/263644
of which two patches have already been accepted:

Same test setup as previous:
 (http://thread.gmane.org/gmane.linux.network/257155)
 Two 10G interfaces, on seperate NUMA nodes, are under-test, and uses
 Ethernet flow-control.  A third interface is used for generating the
 DoS attack (with trafgen).

Notice, I have changed the frag DoS generator script to be more
efficient/deadly.  Before it would only hit one RX queue, now its
sending packets causing multi-queue RX, due to "better" RX hashing.

Test types summary (netperf UDP_STREAM):
 Test-20G64K     == 2x10G with 65K fragments
 Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
 Test-20G64K+DoS == Same as 20G64K with frag DoS
 Test-20G3F+DoS  == Same as 20G3F  with frag DoS
 Test-20G64K+MQ  == Same as 20G64K with Multi-Queue frag DoS
 Test-20G3F+MQ   == Same as 20G3F  with Multi-Queue frag DoS

When I rebased this-patch(03) (on top of net-next commit a210576c) and
removed the _bh spinlock, I saw a performance regression.  BUT this
was caused by some unrelated change in-between.  See tests below.

Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
Test (B) verifying-retest of commit 1b5ab0de corrospond to patch-02.
Test (C) is what I reported before for this-patch

Test (D) is net-next master HEAD (commit a210576c), which reveals some
(unknown) performance regression (compared against test (B)).
Test (D) function as a new base-test.

Performance table summary (in Mbit/s):

(#) Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
    ----------  -------   -------  ----------  ---------  --------  -------
(A) Patch-02  : 18848.7   13230.1   4103.04     5310.36     130.0    440.2
(B) 1b5ab0de  : 18841.5   13156.8   4101.08     5314.57     129.0    424.2
(C) Patch-03v1: 18838.0   13490.5   4405.11     6814.72     196.6    461.6

(D) a210576c  : 18321.5   11250.4   3635.34     5160.13     119.1    405.2
(E) with _bh  : 17247.3   11492.6   3994.74     6405.29     166.7    413.6
(F) without bh: 17471.3   11298.7   3818.05     6102.11     165.7    406.3

Test (E) and (F) is this-patch(03), with(V1) and without(V2) the _bh spinlocks.

I cannot explain the slow down for 20G64K (but its an artificial
"lab-test" so I'm not worried).  But the other results does show
improvements.  And test (E) "with _bh" version is slightly better.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>

----
V2:
- By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
  need the spinlock _bh versions, as Netfilter currently does a
  local_bh_disable() before entering inet_fragment.
- Fold-in desc from cover-mail
V3:
- Drop the chain_len counter per hash bucket.
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-04 17:37:05 -04:00
Marcel Holtmann 5afff03815 Bluetooth: Remove driver init queue from core
The driver init queue is no longer needed. This can be all handled
inside the drivers now. So remove it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-04-04 19:28:25 +03:00
Marcel Holtmann f41c70c4d5 Bluetooth: Add driver setup stage for early init
Some drivers require a special stage for their early init. This is
always specific to the driver or transport. So call back into driver to
allow bringing up the device.

The advantage with this stage is that the Bluetooth core is actually
handling the HCI layer now. This means that command and event processing
is available.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-04-04 19:16:12 +03:00
Johan Hedberg 7b1abbbed0 Bluetooth: Add __hci_cmd_sync_ev function
This patch adds a __hci_cmd_sync_ev function, analogous to
__hci_cmd_sync except that it also takes an event parameter to indicate
that the command completes with a special event instead of command
complete. Internally this new function takes advantage of the
hci_req_add_ev function introduced in the previous patch.

The primary expected user of this new function are the setup routines of
HCI drivers which may want to send custom commands and return only when
they have completed.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
2013-04-04 19:16:10 +03:00
Johan Hedberg 02350a725f Bluetooth: Add support for custom event terminated commands
This patch adds support for having commands within HCI requests that do
not result in a command complete but some other event. This is at least
needed for some vendor specific commands to be issued in the
hdev->setup() procecure, but might also be useful for other commands.

The way that the support is implemented is by extending the skb control
buffer to have a field to indicate that the command is expected to
terminate with a special event. After sending the command each received
event can then be compared against this field through hdev->sent_cmd.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
2013-04-04 19:16:08 +03:00
Johan Hedberg 75e84b7c52 Bluetooth: Add __hci_cmd_sync() helper function
This patch adds a helper function for sending a single HCI command
waiting for its completion and then returning back the parameters in the
resulting command complete event (if there was one).

The implementation is very similar to that of hci_req_sync() except that
instead of invocing a callback for sending HCI commands the function
constructs and sends one itself and after being woken up picks the last
received event from hdev->recv_evt (if it matches the right criteria)
and returns it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
2013-04-04 19:16:06 +03:00
Johan Hedberg b6ddb63823 Bluetooth: Track received events in hdev
This patch adds tracking of received HCI events to the hci_dev struct.
This is necessary so that a subsequent patch can implement a function
for sending a single command synchronously and returning the resulting
command complete parameters in the function return value.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
2013-04-04 19:16:04 +03:00
Andre Guedes d4299ce6b3 Bluetooth: Remove unneeded hci_req_cmd_status function
This patch removes the hci_req_cmd_status function since it is not
used anymore. The HCI request framework now considers the HCI command
has complete once the Command Status or Command Complete Event is
received.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-04-04 11:12:34 +03:00
Andre Guedes 3e13fa1e1f Bluetooth: Fix hci_inquiry ioctl usage
Since the HCI request framework was properly fixed, the hci_req_sync
call, in hci_inquiry, will return as soon as the HCI command completes
(not the Inquiry procedure). However, in inquiry ioctl implementation,
we want to sleep the user process until the inquiry procedure finishes.

This patch changes hci_inquiry so, in case the HCI Inquiry command
was executed successfully, it waits the HCI_INQUIRY flag to be cleared.
This way, the user process will sleep until the inquiry procedure
finishes.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-04-04 11:12:33 +03:00
Andre Guedes 33720450bb Bluetooth: Fix HCI request framework
Some HCI commands don't send a Command Complete Event once the HCI
command has completed so they require some special handling from the
HCI request framework. These HCI commands, however, send a Command
Status Event to indicate that the command has been received, and
that the controller is currently performing the task for the command.

So, in order to properly handle those HCI commands, the HCI request
framework should consider the HCI command has completed once the
Command Status Event is received.

This way, we fix some issues regarding the Inquiry command support,
as well as add support for all those HCI commands which would require
some special handling from the HCI request framework.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2013-04-04 11:12:33 +03:00
John W. Linville 9306a398e7 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-04-03 14:19:48 -04:00
John W. Linville 407ad2b7ef Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-04-03 13:50:34 -04:00
Matthias Schiffer 906b1c394d netfilter: ip6t_NPT: Fix translation for non-multiple of 32 prefix lengths
The bitmask used for the prefix mangling was being calculated
incorrectly, leading to the wrong part of the address being replaced
when the prefix length wasn't a multiple of 32.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-03 12:24:56 +02:00
David S. Miller d662483264 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull net into net-next to get the synchronize_net() bug fix in
bonding.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-03 01:31:54 -04:00
Thomas Graf 5d9633523f openvswitch: Don't insert empty OVS_VPORT_ATTR_OPTIONS attribute
The port specific options are currently unused resulting in an
empty OVS_VPORT_ATTR_OPTIONS nested attribute being inserted
into every OVS_VPORT_CMD_GET message.

Don't insert OVS_VPORT_ATTR_OPTIONS if no options are present.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-04-02 16:31:58 -07:00
Jacob Keller 8facd5fb73 net: fix smatch warnings inside datagram_poll
Commit 7d4c04fc17 ("net: add option to enable
error queue packets waking select") has an issue due to operator precedence
causing the bit-wise OR to bind to the sock_flags call instead of the result of
the terniary conditional. This fixes the *_poll functions to work properly. The
old code results in "mask |= POLLPRI" instead of what was intended, which is to
only include POLLPRI when the socket option is enabled.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 16:59:16 -04:00
Reilly Grant 990454b5a4 VSOCK: Handle changes to the VMCI context ID.
The VMCI context ID of a virtual machine may change at any time. There
is a VMCI event which signals this but datagrams may be processed before
this is handled. It is therefore necessary to be flexible about the
destination context ID of any datagrams received. (It can be assumed to
be correct because it is provided by the hypervisor.) The context ID on
existing sockets should be updated to reflect how the hypervisor is
currently referring to the system.

Signed-off-by: Reilly Grant <grantr@vmware.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 14:39:17 -04:00
Balakumaran Kannan 25fb6ca4ed net IPv6 : Fix broken IPv6 routing table after loopback down-up
IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo)
interface. After down-up, routes of other interface's IPv6 addresses through
'lo' are lost.

IPv6 addresses assigned to all interfaces are routed through 'lo' for internal
communication. Once 'lo' is down, those routing entries are removed from routing
table. But those removed entries are not being re-created properly when 'lo' is
brought up. So IPv6 addresses of other interfaces becomes unreachable from the
same machine. Also this breaks communication with other machines because of
NDISC packet processing failure.

This patch fixes this issue by reading all interface's IPv6 addresses and adding
them to IPv6 routing table while bringing up 'lo'.

==Testing==
Before applying the patch:
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$

After applying the patch:
$ route -A inet6
Kernel IPv6 routing
table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination                    Next Hop                   Flag Met Ref Use If
2000::20/128                   ::                         U    256 0     0 eth0
fe80::/64                      ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
::1/128                        ::                         Un   0   1     0 lo
2000::20/128                   ::                         Un   0   1     0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128  ::                         Un   0   1     0 lo
ff00::/8                       ::                         U    256 0     0 eth0
::/0                           ::                         !n   -1  1     1 lo
$

Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 14:37:19 -04:00
Paul Gortmaker 5e404cd658 ipconfig: add informative timeout messages while waiting for carrier
Commit 3fb72f1e6e ("ipconfig wait
for carrier") added a "wait for carrier on at least one interface"
policy, with a worst case maximum wait of two minutes.

However, if you encounter this, you won't get any feedback from
the console as to the nature of what is going on.  You just see
the booting process hang for two minutes and then continue.

Here we add a message so the user knows what is going on, and
hence can take action to rectify the situation (e.g. fix network
cable or whatever.)  After the 1st 10s pause, output now begins
that looks like this:

	Waiting up to 110 more seconds for network.
	Waiting up to 100 more seconds for network.
	Waiting up to 90 more seconds for network.
	Waiting up to 80 more seconds for network.
	...

Since most systems will have no problem getting link/carrier in the
1st 10s, the only people who will see these messages are people with
genuine issues that need to be resolved.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 14:35:33 -04:00
Vasily Averin f0f6ee1f70 cbq: incorrect processing of high limits
currently cbq works incorrectly for limits > 10% real link bandwidth,
and practically does not work for limits > 50% real link bandwidth.
Below are results of experiments taken on 1 Gbit link

 In shaper | Actual Result
-----------+---------------
  100M     | 108 Mbps
  200M     | 244 Mbps
  300M     | 412 Mbps
  500M     | 893 Mbps

This happen because of q->now changes incorrectly in cbq_dequeue():
when it is called before real end of packet transmitting,
L2T is greater than real time delay, q_now gets an extra boost
but never compensate it.

To fix this problem we prevent change of q->now until its synchronization
with real time.

Signed-off-by: Vasily Averin <vvs@openvz.org>
Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 14:29:20 -04:00
Eric Dumazet 4a0b5ec12f bridge: remove a redundant synchronize_net()
commit 00cfec3748 (net: add a synchronize_net() in
netdev_rx_handler_unregister())
allows us to remove the synchronized_net() call from del_nbp()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02 12:10:58 -04:00
holger@eitzenberger.org 5c33448c40 netfilter: xt_NFQUEUE: coalesce IPv4 and IPv6 hashing
Because rev1 and rev3 of the target share the same hashing
generalize it by introduing nfqueue_hash().

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-02 01:26:10 +02:00
holger@eitzenberger.org 8746ddcf12 netfilter: xt_NFQUEUE: introduce CPU fanout
Current NFQUEUE target uses a hash, computed over source and
destination address (and other parameters), for steering the packet
to the actual NFQUEUE. This, however forgets about the fact that the
packet eventually is handled by a particular CPU on user request.

If E. g.

  1) IRQ affinity is used to handle packets on a particular CPU already
     (both single-queue or multi-queue case)

and/or

  2) RPS is used to steer packets to a specific softirq

the target easily chooses an NFQUEUE which is not handled by a process
pinned to the same CPU.

The idea is therefore to use the CPU index for determining the
NFQUEUE handling the packet.

E. g. when having a system with 4 CPUs, 4 MQ queues and 4 NFQUEUEs it
looks like this:

 +-----+  +-----+  +-----+  +-----+
 |NFQ#0|  |NFQ#1|  |NFQ#2|  |NFQ#3|
 +-----+  +-----+  +-----+  +-----+
    ^        ^        ^        ^
    |        |NFQUEUE |        |
    +        +        +        +
 +-----+  +-----+  +-----+  +-----+
 |rx-0 |  |rx-1 |  |rx-2 |  |rx-3 |
 +-----+  +-----+  +-----+  +-----+

The NFQUEUEs not necessarily have to start with number 0, setups with
less NFQUEUEs than packet-handling CPUs are not a problem as well.

This patch extends the NFQUEUE target to accept a new
NFQ_FLAG_CPU_FANOUT flag. If this is specified the target uses the
CPU index for determining the NFQUEUE being used. I have to introduce
rev3 for this. The 'flags' are folded into _v2 'bypass'.

By changing the way which queue is assigned, I'm able to improve the
performance if the processes reading on the NFQUEUs are pinned
correctly.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-02 01:25:44 +02:00
Gao feng f016588861 netfilter: use IS_ENABLE to replace if defined in TRACE target
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-02 01:03:45 +02:00
Julian Anastasov ac69269a45 ipvs: do not disable bh for long time
We used a global BH disable in LOCAL_OUT hook.
Add _bh suffix to all places that need it and remove
the disabling from LOCAL_OUT and sync code.

Functions like ip_defrag need protection from
BH, so add it. As for nf_nat_mangle_tcp_packet, it needs
RCU lock.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:58 +02:00
Julian Anastasov ceec4c3816 ipvs: convert services to rcu
This is the final step in RCU conversion.

Things that are removed:

- svc->usecnt: now svc is accessed under RCU read lock
- svc->inc: and some unused code
- ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE
- __ip_vs_svc_lock: replaced with RCU
- IP_VS_WAIT_WHILE: now readers lookup svcs and dests under
	RCU and work in parallel with configuration

Other changes:

- before now, a RCU read-side critical section included the
calling of the schedule method, now it is extended to include
service lookup
- ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist
- svc->pe and svc->scheduler remain to the end (of grace period),
	the schedulers are prepared for such RCU readers
	even after done_service is called but they need
	to use synchronize_rcu because last ip_vs_scheduler_put
	can happen while RCU read-side critical sections
	use an outdated svc->scheduler pointer
- as planned, update_service is removed
- empty services can be freed immediately after grace period.
	If dests were present, the services are freed from
	the dest trash code

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:58 +02:00
Julian Anastasov 413c2d04e9 ipvs: convert dests to rcu
In previous commits the schedulers started to access
svc->destinations with _rcu list traversal primitives
because the IP_VS_WAIT_WHILE macro still plays the role of
grace period. Now it is time to finish the updating part,
i.e. adding and deleting of dests with _rcu suffix before
removing the IP_VS_WAIT_WHILE in next commit.

We use the same rule for conns as for the
schedulers: dests can be searched in RCU read-side critical
section where ip_vs_dest_hold can be called by ip_vs_bind_dest.

Some things are not perfect, for example, calling
functions like ip_vs_lookup_dest from updating code under
RCU, just because we use some function both from reader
and from updater.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:57 +02:00
Julian Anastasov ba3a3ce14e ipvs: convert sched_lock to spin lock
As all read_locks are gone spin lock is preferred.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:56 +02:00
Julian Anastasov ed3ffc4e48 ipvs: do not expect result from done_service
This method releases the scheduler state,
it can not fail. Such change will help to properly
replace the scheduler in following patch.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:56 +02:00
Julian Anastasov 578bc3ef1e ipvs: reorganize dest trash
All dests will go to trash, no exceptions.
But we have to use new list node t_list for this, due
to RCU changes in following patches. Dests will wait there
initial grace period and later all conns and schedulers to
put their reference. The dests don't get reference for
staying in dest trash as before.

	As result, we do not load ip_vs_dest_put with
extra checks for last refcnt and the schedulers do not
need to play games with atomic_inc_not_zero while
selecting best destination.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:55 +02:00
Julian Anastasov 08cb2d032f ipvs: convert wrr scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. As the weight for some
dest can be reduced during dest selection, change the
algorithm to check weights by using minimum weights in the
1 .. max_weight-(di-1) range, with the same step (di). By this
way we ensure that there will be always a weight >= 1 check
before claiming that all destinations are overloaded.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:54 +02:00
Julian Anastasov b310faad3e ipvs: convert wlc scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:54 +02:00
Julian Anastasov 1acb7f6761 ipvs: convert sh scheduler to rcu
Use the 3 new methods to reassign dests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:53 +02:00
Julian Anastasov 9be52aba7a ipvs: convert sed scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:52 +02:00
Julian Anastasov c0d0c0a1c0 ipvs: convert rr scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. As the previous entry
could be unlinked, limit the list traversals to 2 when
lookup started from previous entry.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:52 +02:00
Julian Anastasov f92ea8f096 ipvs: convert nq scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:51 +02:00
Julian Anastasov 4ebd288b69 ipvs: convert lc scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:50 +02:00
Julian Anastasov c5549571f9 ipvs: convert lblcr scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. The read_lock for sched_lock is
removed. The set.lock is removed because now it is used in
rare cases, mostly under sched_lock.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:50 +02:00
Julian Anastasov c2a4ffb70e ipvs: convert lblc scheduler to rcu
The schedule method now needs _rcu list-traversal
primitive for svc->destinations. The read_lock for sched_lock is
removed. Use a dead flag to prevent new entries to be created
while scheduler is reclaimed. Use hlist for the hash table.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:49 +02:00
Julian Anastasov 8f3d0023b9 ipvs: convert dh scheduler to rcu
Use the new add_dest and del_dest methods
to reassign dests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:49 +02:00
Julian Anastasov fca9c20ae1 ipvs: add ip_vs_dest_hold and ip_vs_dest_put
ip_vs_dest_hold will be used under RCU lock
while ip_vs_dest_put can be called even after dest
is removed from service, as it happens for conns and
some schedulers.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:48 +02:00
Julian Anastasov 6b6df46663 ipvs: preparations for using rcu in schedulers
Allow schedulers to use rcu_dereference when
returning destination on lookup. The RCU read-side critical
section will allow ip_vs_bind_dest to get dest refcnt as
preparation for the step where destinations will be
deleted without an IP_VS_WAIT_WHILE guard that holds the
packet processing during update.

	Add new optional scheduler methods add_dest,
del_dest and upd_dest. For now the methods are called
together with update_service but update_service will be
removed in a following change.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:47 +02:00
Julian Anastasov 71dfa982f1 ipvs: change ip_vs_sched_lock to mutex
The global list with schedulers ip_vs_schedulers
is accessed only from user context - configuration and
scheduler module [un]registration. Use ip_vs_sched_mutex
instead.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:47 +02:00
Julian Anastasov 9a05475ceb ipvs: avoid kmem_cache_zalloc in ip_vs_conn_new
We have many fields to set and few to reset,
use kmem_cache_alloc instead to save some cycles.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:46 +02:00
Julian Anastasov 1845ed0bb2 ipvs: reorder keys in connection structure
__ip_vs_conn_in_get and ip_vs_conn_out_get are
hot places. Optimize them, so that ports are matched first.
By moving net and fwmark below, on 32-bit arch we can fit
caddr in 32-byte cache line and all addresses in 64-byte
cache line.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:45 +02:00
Julian Anastasov 088339a57d ipvs: convert connection locking
Convert __ip_vs_conntbl_lock_array as follows:

- readers that do not modify conn lists will use RCU lock
- updaters that modify lists will use spinlock_t

Now for conn lookups we will use RCU read-side
critical section. Without using __ip_vs_conn_get such
places have access to connection fields and can
dereference some pointers like pe and pe_data plus
the ability to update timer expiration. If full access
is required we contend for reference.

We add barrier in __ip_vs_conn_put, so that
other CPUs see the refcnt operation after other writes.

With the introduction of ip_vs_conn_unlink()
we try to reorganize ip_vs_conn_expire(), so that
unhashing of connections that should stay more time is
avoided, even if it is for very short time.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02 00:23:45 +02:00