Commit Graph

25911 Commits

Author SHA1 Message Date
Paul Marks a5a81f0b90 ipv6: Fix default route failover when CONFIG_IPV6_ROUTER_PREF=n
I believe this commit from 2008 was incorrect:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=398bcbebb6f721ac308df1e3d658c0029bb74503

When CONFIG_IPV6_ROUTER_PREF is disabled, the kernel should follow
RFC4861 section 6.3.6: if no route is NUD_VALID, then traffic should be
sprayed across all routers (indirectly triggering NUD) until one of them
becomes NUD_VALID.

However, the following experiment demonstrates that this does not work:

1) Connect to an IPv6 network.
2) Change the router's MAC (and link-local) address.

The kernel will lock onto the first router and never try the new one, even
if the first becomes unreachable.  This patch fixes the problem by
allowing rt6_check_neigh() to return 0; if all routers return 0, then
rt6_select() will fall back to round-robin behavior.

This patch should have no effect when CONFIG_IPV6_ROUTER_PREF=y.

Note that rt6_check_neigh() is only used in a boolean context, so I've
changed its return type accordingly.

Signed-off-by: Paul Marks <pmarks@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 15:34:47 -05:00
Shmulik Ladkani 9ba2add3cf ipv6: Make 'addrconf_rs_timer' send Router Solicitations (and re-arm itself) if Router Advertisements are accepted
As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted],
Router Solicitations are sent whenever kernel accepts Router
Advertisements on the interface.

However, this logic isn't reflected in 'addrconf_rs_timer'.

The timer fails to issue subsequent RS messages (and fails to re-arm
itself) if forwarding is enabled and the special hybrid mode is
enabled (accept_ra=2).

Fix the condition determining whether next RS should be sent, by using
'ipv6_accept_ra()'.

Reported-by: Ami Koren <amikoren@yahoo.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 13:59:57 -05:00
Michele Baldessari 196d675934 sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call
The current SCTP stack is lacking a mechanism to have per association
statistics. This is an implementation modeled after OpenSolaris'
SCTP_GET_ASSOC_STATS.

Userspace part will follow on lksctp if/when there is a general ACK on
this.
V4:
- Move ipackets++ before q->immediate.func() for consistency reasons
- Move sctp_max_rto() at the end of sctp_transport_update_rto() to avoid
  returning bogus RTO values
- return asoc->rto_min when max_obs_rto value has not changed

V3:
- Increase ictrlchunks in sctp_assoc_bh_rcv() as well
- Move ipackets++ to sctp_inq_push()
- return 0 when no rto updates took place since the last call

V2:
- Implement partial retrieval of stat struct to cope for future expansion
- Kill the rtxpackets counter as it cannot be precise anyway
- Rename outseqtsns to outofseqtsns to make it clearer that these are out
  of sequence unexpected TSNs
- Move asoc->ipackets++ under a lock to avoid potential miscounts
- Fold asoc->opackets++ into the already existing asoc check
- Kill unneeded (q->asoc) test when increasing rtxchunks
- Do not count octrlchunks if sending failed (SCTP_XMIT_OK != 0)
- Don't count SHUTDOWNs as SACKs
- Move SCTP_GET_ASSOC_STATS to the private space API
- Adjust the len check in sctp_getsockopt_assoc_stats() to allow for
  future struct growth
- Move association statistics in their own struct
- Update idupchunks when we send a SACK with dup TSNs
- return min_rto in max_rto when RTO has not changed. Also return the
  transport when max_rto last changed.

Signed-off: Michele Baldessari <michele@acksyn.org>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 13:32:15 -05:00
Willy Tarreau 02275a2ee7 tcp: don't abort splice() after small transfers
TCP coalescing added a regression in splice(socket->pipe) performance,
for some workloads because of the way tcp_read_sock() is implemented.

The reason for this is the break when (offset + 1 != skb->len).

As we released the socket lock, this condition is possible if TCP stack
added a fragment to the skb, which can happen with TCP coalescing.

So let's go back to the beginning of the loop when this happens,
to give a chance to splice more frags per system call.

Doing so fixes the issue and makes GRO 10% faster than LRO
on CPU-bound splice() workloads instead of the opposite.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-02 20:23:01 -05:00
David S. Miller ddb303301b Merge git://git.infradead.org/users/dwmw2/atm
David Woodhouse says:

====================
This is the result of pulling on the thread started by Krzysztof Mazur's
original patch 'pppoatm: don't send frames to destroyed vcc'.

Various problems in the pppoatm and br2684 code are solved, some of which
were easily triggered and would panic the kernel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-01 20:45:24 -05:00
David Woodhouse 5b4d72080f pppoatm: optimise PPP channel wakeups after sock_owned_by_user()
We don't need to schedule the wakeup tasklet on *every* unlock; only if we
actually blocked the channel in the first place.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
2012-12-02 00:05:20 +00:00
Krzysztof Mazur 9eba25268e br2684: allow assign only on a connected socket
The br2684 does not check if used vcc is in connected state,
causing potential Oops in pppoatm_send() when vcc->send() is called
on not fully connected socket.

Now br2684 can be assigned only on connected sockets; otherwise
-EINVAL error is returned.

Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-12-02 00:05:19 +00:00
David Woodhouse d71ffeb123 br2684: fix module_put() race
The br2684 code used module_put() during unassignment from vcc with
hope that we have BKL. This assumption is no longer true.

Now owner field in atmvcc is used to move this module_put()
to vcc_destroy_socket().

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
2012-12-02 00:05:16 +00:00
David Woodhouse 0e56d99a5b pppoatm: fix missing wakeup in pppoatm_send()
Now that we can return zero from pppoatm_send() for reasons *other* than
the queue being full, that means we can't depend on a subsequent call to
pppoatm_pop() waking the queue, and we might leave it stalled
indefinitely.

Use the ->release_cb() callback to wake the queue after the sock is
unlocked.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
2012-12-02 00:05:15 +00:00
David Woodhouse b89588531f br2684: don't send frames on not-ready vcc
Avoid submitting packets to a vcc which is being closed. Things go badly
wrong when the ->pop method gets later called after everything's been
torn down.

Use the ATM socket lock for synchronisation with vcc_destroy_socket(),
which clears the ATM_VF_READY bit under the same lock. Otherwise, we
could end up submitting a packet to the device driver even after its
->ops->close method has been called. And it could call the vcc's ->pop
method after the protocol has been shut down. Which leads to a panic.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
2012-12-02 00:05:14 +00:00
David Woodhouse c971f08cba atm: add release_cb() callback to vcc
The immediate use case for this is that it will allow us to ensure that a
pppoatm queue is woken after it has to drop a packet due to the sock being
locked.

Note that 'release_cb' is called when the socket is *unlocked*. This is
not to be confused with vcc_release() — which probably ought to be called
vcc_close().

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
2012-12-02 00:05:12 +00:00
Shmulik Ladkani aeaf6e9d2f ipv6: unify logic evaluating inet6_dev's accept_ra property
As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted], the
logic determining whether to send Router Solicitations is identical
to the logic determining whether kernel accepts Router Advertisements.

However the condition itself is repeated in several code locations.

Unify it by introducing 'ipv6_accept_ra()' accessor.

Also, simplify the condition expression, making it more readable.
No semantic change.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-01 11:36:37 -05:00
Eric Dumazet fd90b29d75 tcp: change default tcp hash size
As time passed, available memory increased faster than number of
concurrent tcp sockets.

As a result, a machine with 4GB of ram gets a hash table
with 524288 slots, using 8388608 bytes of memory.

Lets change that by a 16x factor (one slot for 128 KB of ram)

Even if a small machine needs a _lot_ of sockets, tcp lookups are now
very efficient, using one cache line per socket.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-01 11:36:37 -05:00
Eric Dumazet ce43b03e88 net: move inet_dport/inet_num in sock_common
commit 68835aba4d (net: optimize INET input path further)
moved some fields used for tcp/udp sockets lookup in the first cache
line of struct sock_common.

This patch moves inet_dport/inet_num as well, filling a 32bit hole
on 64 bit arches and reducing number of cache line misses in lookups.

Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
before addresses match, as this check is more discriminant.

Remove the hash check from MATCH() macros because we dont need to
re validate the hash value after taking a refcount on socket, and
use likely/unlikely compiler hints, as the sk_hash/hash check
makes the following conditional tests 100% predicted by cpu.

Introduce skc_addrpair/skc_portpair pair values to better
document the alignment requirements of the port/addr pairs
used in the various MATCH() macros, and remove some casts.

The namespace check can also be done at last.

This slightly improves TCP/UDP lookup times.

IP/TCP early demux needs inet->rx_dst_ifindex and
TCP needs inet->min_ttl, lets group them together in same cache line.

With help from Ben Hutchings & Joe Perches.

Idea of this patch came after Ling Ma proposal to move skc_hash
to the beginning of struct sock_common, and should allow him
to submit a final version of his patch. My tests show an improvement
doing so.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ling Ma <ling.ma.program@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 15:02:56 -05:00
Thomas Graf 06a31e2b91 sctp: verify length provided in heartbeat information parameter
If the variable parameter length provided in the mandatory
heartbeat information parameter exceeds the calculated payload
length the packet has been corrupted. Reply with a parameter
length protocol violation message.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 12:25:52 -05:00
Rami Rosen c07135633b rtnelink: remove unused parameter from rtnl_create_link().
This patch removes an unused parameter (src_net) from rtnl_create_link()
method and from the method single invocation, in veth.
This parameter was used in the past when calling
ops->get_tx_queues(src_net, tb) in rtnl_create_link().
The get_tx_queues() member of rtnl_link_ops was replaced by two methods,
get_num_tx_queues() and get_num_rx_queues(), which do not get any
parameter. This was done in commit d40156aa5e by
Jiri Pirko ("rtnl: allow to specify different num for rx and tx queue count").

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 12:24:40 -05:00
David S. Miller dad52fd964 Included changes:
- Use the new ETH_P_BATMAN define instead of the private BATADV_ETH_P_BATMAN
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJQuIa1AAoJEADl0hg6qKeOqN8QAMZ/pEhhIjxQ/8Icde1dccbh
 IuGBtm1Nhx6dvfWxzAKx1JQ5GIJMrKFNR+er4bMEJHgsUtBW08pjMVFAzqzPV3Bb
 YMVQSJtnbl39obzWMjnsvbAhB2cOC04BAnFlCapKOAeQGWmNrYwZHmq1yrjQhq+F
 xOPUqMQ94CrecGGtXOrqCTEJL7Y6VJngu15A8ZGXQBeOKPlmfLUpA4wVG2f7n0cT
 aTv7sD46wmJ4YzFCmqd25ugKWCvtmslMg+ryY+NrZYVXlptMTsuF8JTcfqopne+5
 3KaIsQzdXPciM4PGuNmJC+C83kiQulmoeQav+LNvdq+r/suJLcELsxwsEH58amfc
 Qs80mXfd0Pnop1eORHvaTtNKMd/lkTJ6ydOVIW2dwN32VSPUaCxC0VopeERInKP+
 +1flCMT8e/CB7dFHH4lvjYbg9R30VZU2oQo8G3EFMRWzybbvphrM5ITax5aHdy6e
 sDzgwoBAdg9E6UvqLOcwkCeSVwyG9GYhk/JwpYVm9sKsVeqkCLLR9+mJjHv9iwqn
 h76jqJpwfpDHoDXBEvIQCNcN63B2XqzJGtyhWL3wrS+kAKrxnTONYJbqbugM3sJd
 lJjAFlCHR1xgeP1vZ8D+N2zM4CNR73AOHdKAUPUfyfJhJxcjpQpNpK26CDuKpNC9
 gNJ31BjiZPkjGXRd9NXf
 =hrO7
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- Use the new ETH_P_BATMAN define instead of the private BATADV_ETH_P_BATMAN

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 12:22:04 -05:00
Tommi Rantala ee3f34e857 sctp: fix CONFIG_SCTP_DBG_MSG=y null pointer dereference in sctp_v6_get_dst()
Trinity (the syscall fuzzer) triggered the following BUG, reproducible
only when the kernel is configured with CONFIG_SCTP_DBG_MSG=y.

When CONFIG_SCTP_DBG_MSG is not set, the null pointer is never
dereferenced.

---[ end trace a4de0bfcb38a3642 ]---
BUG: unable to handle kernel NULL pointer dereference at 0000000000000100
IP: [<ffffffff8136796e>] ip6_string+0x1e/0xa0
PGD 4eead067 PUD 4e472067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in:
CPU 3
Pid: 21324, comm: trinity-child11 Tainted: G        W    3.7.0-rc7+ #61 ASUSTeK Computer INC. EB1012/EB1012
RIP: 0010:[<ffffffff8136796e>]  [<ffffffff8136796e>] ip6_string+0x1e/0xa0
RSP: 0018:ffff88004e4637a0  EFLAGS: 00010046
RAX: ffff88004e4637da RBX: ffff88004e4637da RCX: 0000000000000000
RDX: ffffffff8246e92a RSI: 0000000000000100 RDI: ffff88004e4637da
RBP: ffff88004e4637a8 R08: 000000000000ffff R09: 000000000000ffff
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8289d600
R13: ffffffff8289d230 R14: ffffffff8246e928 R15: ffffffff8289d600
FS:  00007fed95153700(0000) GS:ffff88005fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000100 CR3: 000000004eeac000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process trinity-child11 (pid: 21324, threadinfo ffff88004e462000, task ffff8800524b0000)
Stack:
 ffff88004e4637da ffff88004e463828 ffffffff81368eee 000000004e4637d8
 ffffffff0000ffff ffff88000000ffff 0000000000000000 000000004e4637f8
 ffffffff826285d8 ffff88004e4637f8 0000000000000000 ffff8800524b06b0
Call Trace:
 [<ffffffff81368eee>] ip6_addr_string.isra.11+0x3e/0xa0
 [<ffffffff81369183>] pointer.isra.12+0x233/0x2d0
 [<ffffffff810a413a>] ? vprintk_emit+0x1ba/0x450
 [<ffffffff8110953d>] ? trace_hardirqs_on_caller+0x10d/0x1a0
 [<ffffffff81369757>] vsnprintf+0x187/0x5d0
 [<ffffffff81369c62>] vscnprintf+0x12/0x30
 [<ffffffff810a4028>] vprintk_emit+0xa8/0x450
 [<ffffffff81e5cb00>] printk+0x49/0x4b
 [<ffffffff81d17221>] sctp_v6_get_dst+0x731/0x780
 [<ffffffff81d16e15>] ? sctp_v6_get_dst+0x325/0x780
 [<ffffffff81d00a96>] sctp_transport_route+0x46/0x120
 [<ffffffff81cff0f1>] sctp_assoc_add_peer+0x161/0x350
 [<ffffffff81d0fd8d>] sctp_sendmsg+0x6cd/0xcb0
 [<ffffffff81b55bf0>] ? inet_create+0x670/0x670
 [<ffffffff81b55cfb>] inet_sendmsg+0x10b/0x220
 [<ffffffff81b55bf0>] ? inet_create+0x670/0x670
 [<ffffffff81a72a64>] ? sock_update_classid+0xa4/0x2b0
 [<ffffffff81a72ab0>] ? sock_update_classid+0xf0/0x2b0
 [<ffffffff81a6ac1c>] sock_sendmsg+0xdc/0xf0
 [<ffffffff8118e9e5>] ? might_fault+0x85/0x90
 [<ffffffff8118e99c>] ? might_fault+0x3c/0x90
 [<ffffffff81a6e12a>] sys_sendto+0xfa/0x130
 [<ffffffff810a9887>] ? do_setitimer+0x197/0x380
 [<ffffffff81e960d5>] ? sysret_check+0x22/0x5d
 [<ffffffff81e960a9>] system_call_fastpath+0x16/0x1b
Code: 01 eb 89 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 f8 31 c9 48 89 e5 53 eb 12 0f 1f 40 00 48 83 c1 01 48 83 c0 04 48 83 f9 08 74 70 <0f> b6 3c 4e 89 fb 83 e7 0f c0 eb 04 41 89 d8 41 83 e0 0f 0f b6
RIP  [<ffffffff8136796e>] ip6_string+0x1e/0xa0
 RSP <ffff88004e4637a0>
CR2: 0000000000000100
---[ end trace a4de0bfcb38a3643 ]---

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 12:21:27 -05:00
Alan Ott 92a2ec72a7 mac802154: use kfree_skb() instead of dev_kfree_skb()
kfree_skb() indicates failure, which is where this is being used.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 12:19:24 -05:00
Alan Ott fcefbe9fcb mac802154: fix memory leaks
kfree_skb() was not getting called in the case of some failures.
This was pointed out by Eric Dumazet.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 12:19:24 -05:00
Alan Ott b333b7e6ec 6lowpan: consider checksum bytes in fragmentation threshold
Change the threshold for framentation of a lowpan packet from
using the MTU size to now use the MTU size minus the checksum length,
which is added by the hardware. For IEEE 802.15.4, this effectively
changes it from 127 bytes to 125 bytes.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 12:19:24 -05:00
Yi Zou 6e22ce2c6e 8021q: fix vlan device to inherit the unicast filtering capability flag
This bug is observed on running FCoE over a VLAN device associated w/
a real device that has IFF_UNICAST_FLT set since FCoE would add unicast
address such as FLOGI MAC to the VLAN interface that FCoE is on. Since
currently, VLAN device is not inheriting the IFF_UNICAST_FLT flag from the
parent real device even though the real device is capable of doing unicast
filtering. This forces the VLAN device and its real device go to promiscuous
mode unnecessarily even the added address is actually being added to the
available unicast filter table in real device.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Cc: devel@open-fcoe.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 12:07:27 -05:00
David S. Miller e7165030db Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Conflicts:
	net/ipv6/exthdrs_core.c

Jesse Gross says:

====================
This series of improvements for 3.8/net-next contains four components:
 * Support for modifying IPv6 headers
 * Support for matching and setting skb->mark for better integration with
   things like iptables
 * Ability to recognize the EtherType for RARP packets
 * Two small performance enhancements

The movement of ipv6_find_hdr() into exthdrs_core.c causes two small merge
conflicts.  I left it as is but can do the merge if you want.  The conflicts
are:
 * ipv6_find_hdr() and ipv6_find_tlv() were both moved to the bottom of
   exthdrs_core.c.  Both should stay.
 * A new use of ipv6_find_hdr() was added to net/netfilter/ipvs/ip_vs_core.c
   after this patch.  The IPVS user has two instances of the old constant
   name IP6T_FH_F_FRAG which has been renamed to IP6_FH_F_FRAG.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 12:01:30 -05:00
Krzysztof Mazur 397ff16dce pppoatm: do not inline pppoatm_may_send()
The pppoatm_may_send() is quite heavy and it's called three times
in pppoatm_send() and inlining costs more than 200 bytes of code
(more than 10% of total pppoatm driver code size).

add/remove: 1/0 grow/shrink: 0/1 up/down: 132/-367 (-235)
function                                     old     new   delta
pppoatm_may_send                               -     132    +132
pppoatm_send                                 900     533    -367

Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-11-30 12:23:19 +00:00
Krzysztof Mazur 071d93931a pppoatm: drop frames to not-ready vcc
The vcc_destroy_socket() closes vcc before the protocol is detached
from vcc by calling vcc->push() with NULL skb. This leaves some time
window, where the protocol may call vcc->send() on closed vcc
and crash.

Now pppoatm_send(), like vcc_sendmsg(), checks for vcc flags that
indicate that vcc is not ready. If the vcc is not ready we just
drop frame. Queueing frames is much more complicated because we
don't have callbacks that inform us about vcc flags changes.

Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-11-30 12:21:42 +00:00
Antonio Quartulli af5d4f7737 batman-adv: use ETH_P_BATMAN
The ETH_P_BATMAN ethertype is now defined kernel-wide. Use it instead
of the private BATADV_ETH_P_BATMAN define.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2012-11-30 10:50:22 +01:00
Rami Rosen bb728820fe core: make GRO methods static.
This patch changes three methods to be static and removes their
EXPORT_SYMBOLs in core/dev.c and their external declaration in
netdevice.h. The methods, dev_gro_receive(), napi_frags_finish() and
napi_skb_finish(), which are in the GRO rx path, are not used
outside core/dev.c.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29 13:18:32 -05:00
David S. Miller 8a2cf062b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29 12:51:17 -05:00
David S. Miller a45085f6a7 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Two small openswitch fixes from Jesse Gross.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 18:00:47 -05:00
David S. Miller 83a9d197c7 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
This pull request is intended for the 3.8 stream.  It is a bit large
-- I guess Thanksgiving got me off track!  At least the code got to
spend some time in linux-next... :-)

This includes the usual batch of pulls for Bluetooth, NFC, and mac80211
as well as iwlwifi.  Also here is an ath6kl pull, and a new driver
in the rtlwifi family.  The brcmfmac, brcmsmac, ath9k, and mwl8k get
their usual levels of attention, and a handful of other updates tag
along as well.

For more detail on the pulls, please see below...

On Bluetooth, Gustavo says:

"Another set of patches for integration in wireless-next. There are two big set
of changes in it: Andrei Emeltchenko and Mat Martineau added more patches
towards a full Bluetooth High Speed support and Johan Hedberg improve the
single mode support for Bluetooth dongles. Apart from that we have small fixes
and improvements."

...and:

"A few patches to 3.8. The majority of the work here is from Andrei on the High
Speed support. Other than that Johan added support for setting LE advertising
data. The rest are fixes and clean ups and small improvements like support for
a new broadcom hardware."

On mac80211, Johannes says:

"This is for mac80211, for -next (3.8). Plenty of changes, as you can see
below. Some fixes for previous changes like the export.h include, the
beacon listener fix from Ben Greear, etc. Overall, no exciting new
features, though hwsim does gain channel context support for people to
try it out and look at."

...and...:

"This one contains the mac80211-next material. Apart from a few small new
features and cleanups I have two fixes for the channel context code. The
RX_END timestamp support will probably be reworked again as Simon Barber
noted the calculations weren't really valid, but the discussions there
are still going on and it's better than what we had before."

...and:

"Please pull (see below) to get the following changes:
 * a fix & a debug aid in IBSS from Antonio,
 * mesh cleanups from Marco,
 * a few bugfixes for some of my previous patches from Arend and myself,
 * and the big initial VHT support patchset"

And on iwlwifi, Johannes says:

"In addition to the previous four patches that I'm not resending,
we have a number of cleanups, message reduction, firmware error
handling improvements (yes yes... we need to fix them instead)
and various other small things all over."

...and:

"In his quest to try to understand the current iwlwifi problems (like
stuck queues etc.) Emmanuel has first cleaned up the PCIe code, I'm
including his changes in this pull request. Other than that I only have
a small cleanup from Sachin Kamat to remove a duplicate include and a
bugfix to turn off MFP if software crypto is enabled, but this isn't
really interesting as MFP isn't supported right now anyway."

On NFC, Samuel says:

"With this one we have:

- A few HCI improvements in preparation for an upcoming HCI chipset support.
- A pn544 code cleanup after the old driver was removed.
- An LLCP improvement for notifying user space when one peer stops ACKing I
  frames."

On ath6kl, Kalle says:

"Major changes this time are firmware recover support to gracefully
handle if firmware crashes, support for changing regulatory domain and
support for new ar6004 hardware revision 1.4. Otherwise there are just
smaller fixes or cleanups from different people."

Thats about it... :-)  Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 17:49:16 -05:00
Jesse Gross 92eb1d4771 openvswitch: Use RCU callback when detaching netdevices.
Currently, each time a device is detached from an OVS datapath
we call synchronize RCU before freeing associated data structures.
However, if a bridge is deleted (which detaches all ports) when
many devices are connected then there can be a long delay.  This
switches to use call_rcu() to group the cost together.

Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-11-28 14:04:34 -08:00
Nicolas Dichtel f4e0b4c5e1 ip6tnl/sit: drop packet if ECN present with not-ECT
This patch reports the change made by Stephen Hemminger in ipip and gre[6] in
commit eccc1bb8d4 (tunnel: drop packet if ECN present with not-ECT).

Goal is to handle RFC6040, Section 4.2:

Default Tunnel Egress Behaviour.
 o If the inner ECN field is Not-ECT, the decapsulator MUST NOT
      propagate any other ECN codepoint onwards.  This is because the
      inner Not-ECT marking is set by transports that rely on dropped
      packets as an indication of congestion and would not understand or
      respond to any other ECN codepoint [RFC4774].  Specifically:

      *  If the inner ECN field is Not-ECT and the outer ECN field is
         CE, the decapsulator MUST drop the packet.

      *  If the inner ECN field is Not-ECT and the outer ECN field is
         Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the
         outgoing packet with the ECN field cleared to Not-ECT.

The patch takes benefits from common function added in net/inet_ecn.h.

Like it was done for Xin4 tunnels, it adds logging to allow detecting broken
systems that set ECN bits incorrectly when tunneling (or an intermediate
router might be changing the header). Errors are also tracked via
rx_frame_error.

CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 11:37:11 -05:00
David S. Miller 52f2ede1ce Merge branch 'master' of git://1984.lsi.us.es/nf
An interface name overflow fix in netfilter via Pablo Neira Ayuso.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 11:29:43 -05:00
Tommi Rantala c3b2c25819 irda: irttp: fix memory leak in irttp_open_tsap() error path
Cleanup the memory we allocated earlier in irttp_open_tsap() when we hit
this error path. The leak goes back to at least 1da177e4
("Linux-2.6.12-rc2").

Discovered with Trinity (the syscall fuzzer).

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 11:25:29 -05:00
Paolo Valente 462dbc9101 pkt_sched: QFQ Plus: fair-queueing service at DRR cost
This patch turns QFQ into QFQ+, a variant of QFQ that provides the
following two benefits: 1) QFQ+ is faster than QFQ, 2) differently
from QFQ, QFQ+ correctly schedules also non-leaves classes in a
hierarchical setting. A detailed description of QFQ+, plus a
performance comparison with DRR and QFQ, can be found in [1].

[1] P. Valente, "Reducing the Execution Time of Fair-Queueing Schedulers"
http://algo.ing.unimo.it/people/paolo/agg-sched/agg-sched.pdf

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 11:19:35 -05:00
Schoch Christian 92d64c261e sctp: Error in calculation of RTTvar
The calculation of RTTVAR involves the subtraction of two unsigned
numbers which
may causes rollover and results in very high values of RTTVAR when RTT > SRTT.
With this patch it is possible to set RTOmin = 1 to get the minimum of RTO at
4 times the clock granularity.

Change Notes:

v2)
        *Replaced abs() by abs64() and long by __s64, changed patch
description.

Signed-off-by: Christian Schoch <e0326715@student.tuwien.ac.at>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: Neil Horman <nhorman@tuxdriver.com>
CC: linux-sctp@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 11:13:40 -05:00
Tommi Rantala 6e51fe7572 sctp: fix -ENOMEM result with invalid user space pointer in sendto() syscall
Consider the following program, that sets the second argument to the
sendto() syscall incorrectly:

 #include <string.h>
 #include <arpa/inet.h>
 #include <sys/socket.h>

 int main(void)
 {
         int fd;
         struct sockaddr_in sa;

         fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
         if (fd < 0)
                 return 1;

         memset(&sa, 0, sizeof(sa));
         sa.sin_family = AF_INET;
         sa.sin_addr.s_addr = inet_addr("127.0.0.1");
         sa.sin_port = htons(11111);

         sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));

         return 0;
 }

We get -ENOMEM:

 $ strace -e sendto ./demo
 sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENOMEM (Cannot allocate memory)

Propagate the error code from sctp_user_addto_chunk(), so that we will
tell user space what actually went wrong:

 $ strace -e sendto ./demo
 sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address)

Noticed while running Trinity (the syscall fuzzer).

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 11:11:17 -05:00
Tommi Rantala be364c8c0f sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails
Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
reproducible e.g. with the sendto() syscall by passing invalid
user space pointer in the second argument:

 #include <string.h>
 #include <arpa/inet.h>
 #include <sys/socket.h>

 int main(void)
 {
         int fd;
         struct sockaddr_in sa;

         fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
         if (fd < 0)
                 return 1;

         memset(&sa, 0, sizeof(sa));
         sa.sin_family = AF_INET;
         sa.sin_addr.s_addr = inet_addr("127.0.0.1");
         sa.sin_port = htons(11111);

         sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));

         return 0;
 }

As far as I can tell, the leak has been around since ~2003.

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 11:10:09 -05:00
John W. Linville 79d38f7d6c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/pcie/tx.c
2012-11-28 10:56:03 -05:00
Krzysztof Mazur 3ac108006f pppoatm: take ATM socket lock in pppoatm_send()
The pppoatm_send() does not take any lock that will prevent concurrent
vcc_sendmsg(). This causes two problems:

	- there is no locking between checking the send queue size
	  with atm_may_send() and incrementing sk_wmem_alloc,
	  and the real queue size can be a little higher than sk_sndbuf

	- the vcc->sendmsg() can be called concurrently. I'm not sure
	  if it's allowed. Some drivers (eni, nicstar, ...) seem
	  to assume it will never happen.

Now pppoatm_send() takes ATM socket lock, the same that is used
in vcc_sendmsg() and other ATM socket functions. The pppoatm_send()
is called with BH disabled, so bh_lock_sock() is used instead
of lock_sock().

Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-11-28 00:37:05 +00:00
Krzysztof Mazur e41faed9cd pppoatm: fix module_put() race
The pppoatm used module_put() during unassignment from vcc with
hope that we have BKL. This assumption is no longer true.

Now owner field in atmvcc is used to move this module_put()
to vcc_destroy_socket().

Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-11-28 00:37:04 +00:00
Krzysztof Mazur 3b1a914595 pppoatm: allow assign only on a connected socket
The pppoatm does not check if used vcc is in connected state,
causing an Oops in pppoatm_send() when vcc->send() is called
on not fully connected socket.

Now pppoatm can be assigned only on connected sockets; otherwise
-EINVAL error is returned.

Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-11-28 00:37:02 +00:00
Krzysztof Mazur ec809bd817 atm: add owner of push() callback to atmvcc
The atm is using atmvcc->push(vcc, NULL) callback to notify protocol
that vcc will be closed and protocol must detach from it. This callback
is usually used by protocol to decrement module usage count by module_put(),
but it leaves small window then module is still used after module_put().

Now the owner of push() callback is kept in atmvcc and
module_put(atmvcc->owner) is called after the protocol is detached from vcc.

Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
2012-11-28 00:36:48 +00:00
Eric Dumazet b49d3c1e1c net: ipmr: limit MRT_TABLE identifiers
Name of pimreg devices are built from following format :

char name[IFNAMSIZ]; // IFNAMSIZ == 16

sprintf(name, "pimreg%u", mrt->id);

We must therefore limit mrt->id to 9 decimal digits
or risk a buffer overflow and a crash.

Restrict table identifiers in [0 ... 999999999] interval.

Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-26 17:36:59 -05:00
Joe Perches 03f52a0a55 ip6mr: Add sizeof verification to MRT6_ASSERT and MT6_PIM
Verify the length of the user-space arguments.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-26 17:35:58 -05:00
Neal Cardwell e1a676424c ipv4: avoid passing NULL to inet_putpeer() in icmpv4_xrlim_allow()
inet_getpeer_v4() can return NULL under OOM conditions, and while
inet_peer_xrlim_allow() is OK with a NULL peer, inet_putpeer() will
crash.

This code path now uses the same idiom as the others from:
1d861aa4b3 ("inet: Minimize use of
cached route inetpeer.").

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-26 17:24:41 -05:00
Brian Haley c91f6df2db sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name
Instead of having the getsockopt() of SO_BINDTODEVICE return an index, which
will then require another call like if_indextoname() to get the actual interface
name, have it return the name directly.

This also matches the existing man page description on socket(7) which mentions
the argument being an interface name.

If the value has not been set, zero is returned and optlen will be set to zero
to indicate there is no interface name present.

Added a seqlock to protect this code path, and dev_ifname(), from someone
changing the device name via dev_change_name().

v2: Added seqlock protection while copying device name.

v3: Fixed word wrap in patch.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-26 17:22:14 -05:00
David Woodhouse ae088d663b atm: br2684: Fix excessive queue bloat
There's really no excuse for an additional wmem_default of buffering
between the netdev queue and the ATM device. Two packets (one in-flight,
and one ready to send) ought to be fine. It's not as if it should take
long to get another from the netdev queue when we need it.

If necessary we can make the queue space configurable later, but I don't
think it's likely to be necessary.

cf. commit 9d02daf754 (pppoatm: Fix
excessive queue bloat) which did something very similar for PPPoATM.

Note that there is a tremendously unlikely race condition which may
result in qspace temporarily going negative. If a CPU running the
br2684_pop() function goes off into the weeds for a long period of time
after incrementing qspace to 1, but before calling netdev_wake_queue()...
and another CPU ends up calling br2684_start_xmit() and *stopping* the
queue again before the first CPU comes back, the netdev queue could
end up being woken when qspace has already reached zero.

An alternative approach to coping with this race would be to check in
br2684_start_xmit() for qspace==0 and return NETDEV_TX_BUSY, but just
using '> 0' and '< 1' for comparison instead of '== 0' and '!= 0' is
simpler. It just warranted a mention of *why* we do it that way...

Move the call to atmvcc->send() to happen *after* the accounting and
potentially stopping the netdev queue, in br2684_xmit_vcc(). This matters
if the ->send() call suffers an immediate failure, because it'll call
br2684_pop() with the offending skb before returning. We want that to
happen *after* we've done the initial accounting for the packet in
question. Also make it return an appropriate success/failure indication
while we're at it.

Tested by running 'ping -l 1000 bottomless.aaisp.net.uk' from within my
network, with only a single PPPoE-over-BR2684 link running. And after
setting txqueuelen on the nas0 interface to something low (5, in fact).
Before the patch, we'd see about 15 packets being queued and a resulting
latency of ~56ms being reached. After the patch, we see only about 8,
which is fairly much what we expect. And a max latency of ~36ms. On this
OpenWRT box, wmem_default is 163840.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Reviewed-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-26 17:13:56 -05:00
Ben Hutchings b3422a314c dsa: Hide core config options; make drivers select what they need
Commit 82167cb8c6 ('net: dsa/slave: Fix
compilation warnings') fixed one possible invalid configuration
(NET_DSA enabled with no trailer formats) but added others: drivers
can select NET_DSA without its dependencies being met.

It's not very useful to make either the DSA core or the tagging
formats manually selectable without a driver to use them, so:

1. Define a hidden HAVE_NET_DSA option and move the dependencies of
   NET_DSA to that.  While we're at it, drop the deprecated
   EXPERIMENTAL dependency.
2. Make NET_DSA and the drivers dependent on HAVE_NET_DSA.
3. Hide the tagging format options again.
4. Make drivers select both NET_DSA and the appropriate tagging format
   option.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-26 17:10:44 -05:00
Oliver Hartkopp 81b401100c can: bcm: initialize ifindex for timeouts without previous frame reception
Set in the rx_ifindex to pass the correct interface index in the case of a
message timeout detection. Usually the rx_ifindex value is set at receive
time. But when no CAN frame has been received the RX_TIMEOUT notification
did not contain a valid value.

Cc: linux-stable <stable@vger.kernel.org>
Reported-by: Andre Naujoks <nautsch2@googlemail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-11-26 22:33:59 +01:00