Commit Graph

34671 Commits

Author SHA1 Message Date
WANG Cong 2c1a4311b6 neigh: check error pointer instead of NULL for ipv4_neigh_lookup()
Fixes: commit f187bc6efb ("ipv4: No need to set generic neighbour pointer")
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:16:04 -04:00
Florian Fainelli 7905288f09 net: dsa: allow switches driver to implement get/set EEE
Allow switches driver to query and enable/disable EEE on a per-port
basis by implementing the ethtool_{get,set}_eee settings and delegating
these operations to the switch driver.

set_eee() will need to coordinate with the PHY driver to make sure that
EEE is enabled, the link-partner supports it and the auto-negotiation
result is satisfactory.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:14:09 -04:00
Florian Fainelli b2f2af21e3 net: dsa: allow enabling and disable switch ports
Whenever a per-port network device is used/unused, invoke the switch
driver port_enable/port_disable callbacks to allow saving as much power
as possible by disabling unused parts of the switch (RX/TX logic, memory
arrays, PHYs...). We supply a PHY device argument to make sure the
switch driver can act on the PHY device if needed (like putting/taking
the PHY out of deep low power mode).

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:14:08 -04:00
Florian Fainelli f7f1de51ed net: dsa: start and stop the PHY state machine
dsa_slave_open() should start the PHY library state machine for its PHY
interface, and dsa_slave_close() should stop the PHY library state
machine accordingly.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:14:08 -04:00
Peter Pan(潘卫平) 155c6e1ad4 tcp: use tcp_flags in tcp_data_queue()
This patch is a cleanup which follows the idea in commit e11ecddf51 (tcp: use
TCP_SKB_CB(skb)->tcp_flags in input path),
and it may reduce register pressure since skb->cb[] access is fast,
bacause skb is probably in a register.

v2: remove variable th
v3: reword the changelog

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:37:57 -04:00
Eric Dumazet cd7d8498c9 tcp: change tcp_skb_pcount() location
Our goal is to access no more than one cache line access per skb in
a write or receive queue when doing the various walks.

After recent TCP_SKB_CB() reorganizations, it is almost done.

Last part is tcp_skb_pcount() which currently uses
skb_shinfo(skb)->gso_segs, which is a terrible choice, because it needs
3 cache lines in current kernel (skb->head, skb->end, and
shinfo->gso_segs are all in 3 different cache lines, far from skb->cb)

This very simple patch reuses space currently taken by tcp_tw_isn
only in input path, as tcp_skb_pcount is only needed for skb stored in
write queue.

This considerably speeds up tcp_ack(), granted we avoid shinfo->tx_flags
to get SKBTX_ACK_TSTAMP, which seems possible.

This also speeds up all sack processing in general.

This speeds up tcp_sendmsg() because it no longer has to access/dirty
shinfo.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:36:48 -04:00
Eric Dumazet 971f10eca1 tcp: better TCP_SKB_CB layout to reduce cache line misses
TCP maintains lists of skb in write queue, and in receive queues
(in order and out of order queues)

Scanning these lists both in input and output path usually requires
access to skb->next, TCP_SKB_CB(skb)->seq, and TCP_SKB_CB(skb)->end_seq

These fields are currently in two different cache lines, meaning we
waste lot of memory bandwidth when these queues are big and flows
have either packet drops or packet reorders.

We can move TCP_SKB_CB(skb)->header at the end of TCP_SKB_CB, because
this header is not used in fast path. This allows TCP to search much faster
in the skb lists.

Even with regular flows, we save one cache line miss in fast path.

Thanks to Christoph Paasch for noticing we need to cleanup
skb->cb[] (IPCB/IP6CB) before entering IP stack in tx path,
and that I forgot IPCB use in tcp_v4_hnd_req() and tcp_v4_save_options().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:35:43 -04:00
Eric Dumazet a224772db8 ipv6: add a struct inet6_skb_parm param to ipv6_opt_accepted()
ipv6_opt_accepted() assumes IP6CB(skb) holds the struct inet6_skb_parm
that it needs. Lets not assume this, as TCP stack might use a different
place.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:35:43 -04:00
Eric Dumazet 24a2d43d88 ipv4: rename ip_options_echo to __ip_options_echo()
ip_options_echo() assumes struct ip_options is provided in &IPCB(skb)->opt
Lets break this assumption, but provide a helper to not change all call points.

ip_send_unicast_reply() gets a new struct ip_options pointer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:35:42 -04:00
Steffen Klassert cd0a0bd9b8 ip6_gre: Return an error when adding an existing tunnel.
ip6gre_tunnel_locate() should not return an existing tunnel if
create is true. Otherwise it is possible to add the same
tunnel multiple times without getting an error.

So return NULL if the tunnel that should be created already
exists.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:19:46 -04:00
Steffen Klassert d814b847be ip6_vti: Return an error when adding an existing tunnel.
vti6_locate() should not return an existing tunnel if
create is true. Otherwise it is possible to add the same
tunnel multiple times without getting an error.

So return NULL if the tunnel that should be created already
exists.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:19:46 -04:00
Steffen Klassert 2b0bb01b6e ip6_tunnel: Return an error when adding an existing tunnel.
ip6_tnl_locate() should not return an existing tunnel if
create is true. Otherwise it is possible to add the same
tunnel multiple times without getting an error.

So return NULL if the tunnel that should be created already
exists.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:19:46 -04:00
Dan Williams 3f33407856 net: make tcp_cleanup_rbuf private
net_dma was the only external user so this can become local to tcp.c
again.

Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2014-09-28 07:22:21 -07:00
Dan Williams d27f9bc104 net_dma: revert 'copied_early'
Now that tcp_dma_try_early_copy() is gone nothing ever sets
copied_early.

Also reverts "53240c208776 tcp: Fix possible double-ack w/ user dma"
since it is no longer necessary.

Cc: Ali Saidi <saidi@engin.umich.edu>
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Neal Cardwell <ncardwell@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2014-09-28 07:22:21 -07:00
Dan Williams 7bced39751 net_dma: simple removal
Per commit "77873803363c net_dma: mark broken" net_dma is no longer used
and there is no plan to fix it.

This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards.
Reverting the remainder of the net_dma induced changes is deferred to
subsequent patches.

Marked for stable due to Roman's report of a memory leak in
dma_pin_iovec_pages():

    https://lkml.org/lkml/2014/9/3/177

Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: David Whipple <whipple@securedatainnovations.ch>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: <stable@vger.kernel.org>
Reported-by: Roman Gushchin <klamm@yandex-team.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2014-09-28 07:05:16 -07:00
Nicolas Dichtel 5a4ee9a9a0 ip6gre: add a rtnl link alias for ip6gretap
With this alias, we don't need to load manually the module before adding an
ip6gretap interface with iproute2.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 17:15:57 -04:00
Eric Dumazet ff04a771ad net : optimize skb_release_data()
Cache skb_shinfo(skb) in a variable to avoid computing it multiple
times.

Reorganize the tests to remove one indentation level.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 16:53:49 -04:00
Wang Sheng-Hui 8280bf00fd net/openvswitch: remove dup comment in vport.h
Remove the duplicated comment
"/* The following definitions are for users of the vport subsytem: */"
in vport.h

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 16:42:33 -04:00
David S. Miller e7af85db54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
nf pull request for net

This series contains netfilter fixes for net, they are:

1) Fix lockdep splat in nft_hash when releasing sets from the
   rcu_callback context. We don't the mutex there anymore.

2) Remove unnecessary spinlock_bh in the destroy path of the nf_tables
   rbtree set type from rcu_callback context.

3) Fix another lockdep splat in rhashtable. None of the callers hold
   a mutex when calling rhashtable_destroy.

4) Fix duplicated error reporting from nfnetlink when aborting and
   replaying a batch.

5) Fix a Kconfig issue reported by kbuild robot.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 16:21:29 -04:00
LEROY Christophe 58e3cac561 net: optimise inet_proto_csum_replace4()
csum_partial() is a generic function which is not optimised for small fixed
length calculations, and its use requires to store "from" and "to" values in
memory while we already have them available in registers. This also has impact,
especially on RISC processors. In the same spirit as the change done by
Eric Dumazet on csum_replace2(), this patch rewrites inet_proto_csum_replace4()
taking into account RFC1624.

I spotted during a NATted tcp transfert that csum_partial() is one of top 5
consuming functions (around 8%), and the second user of csum_partial() is
inet_proto_csum_replace4().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 16:14:17 -04:00
Eric Dumazet f4a775d144 net: introduce __skb_header_release()
While profiling TCP stack, I noticed one useless atomic operation
in tcp_sendmsg(), caused by skb_header_release().

It turns out all current skb_header_release() users have a fresh skb,
that no other user can see, so we can avoid one atomic operation.

Introduce __skb_header_release() to clearly document this.

This gave me a 1.5 % improvement on TCP_RR workload.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 15:40:06 -04:00
David S. Miller 57219dc7bf Merge tag 'master-2014-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-09-22

Please pull this batch of updates intended for the 3.18 stream...

For the mac80211 bits, Johannes says:

"This time, I have some rate minstrel improvements, support for a very
small feature from CCX that Steinar reverse-engineered, dynamic ACK
timeout support, a number of changes for TDLS, early support for radio
resource measurement and many fixes. Also, I'm changing a number of
places to clear key memory when it's freed and Intel claims copyright
for code they developed."

For the bluetooth bits, Johan says:

"Here are some more patches intended for 3.18. Most of them are cleanups
or fixes for SMP. The only exception is a fix for BR/EDR L2CAP fixed
channels which should now work better together with the L2CAP
information request procedure."

For the iwlwifi bits, Emmanuel says:

"I fix here dvm which was broken by my last pull request. Arik
continues to work on TDLS and Luca solved a few issues in CT-Kill. Eyal
keeps digging into rate scaling code, more to come soon. Besides this,
nothing really special here."

Beyond that, there are the usual big batches of updates to ath9k, b43,
mwifiex, and wil6210 as well as a handful of other bits here and there.
Also, rtlwifi gets some btcoexist attention from Larry.

Please let me know if there are problems!
====================

Had to adjust the wil6210 code to comply with Joe Perches's recent
change in net-next to make the netdev_*() routines return void instead
of 'int'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 15:39:24 -04:00
Joe Perches 6ea754eb76 net: Change netdev_<level> logging functions to return void
No caller or macro uses the return value so make all
the functions return void.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 15:17:17 -04:00
John W. Linville 30d3c071a6 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-09-26 13:38:51 -04:00
John W. Linville 330bd4ec9d NFC: 3.18 pull request
This is the NFC pull request for 3.18.
 
 We've had major updates for TI and ST Microelectronics drivers:
 
 For TI's trf7970a driver:
 
 - Target mode support for trf7970a
 - Suspend/resume support for trf7970a
 - DT properties additions to handle different quirks
 - A bunch of fixes for smartphone IOP related issues
 
 For ST Microelectronics' ST21NFCA and ST21NFCB drivers:
 
 - ISO15693 support for st21nfcb
 - checkpatch and sparse related warning fixes
 - Code cleanups and a few minor fixes
 
 Finally, Marvell add ISO15693 support to the NCI stack, together with a
 couple of NCI fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUIg4oAAoJEIqAPN1PVmxKg+kP/RPrgH6LA1tFubIwKR2+sGQ7
 g/W2J3AE8QASZkErkpXRNCt2D/SPWEKBY/qscwz+BtcWg76taIIaGTvVUNtxSaW3
 gS4hG6V1UlANWv3KFfaOKmzjEOO/SPNtkFAyI0cTOaGyUqG4o9BgBZpn1rYO16MD
 ZkSC39MpjMoXB9BbsfQngoUEoWc3tZNMmRzk4IVTwE/wXuQvZxmFQXEAiZ+pnYle
 NQfugaGMz0526rLG3QnrpkUakFb81iQwtONpbx6i8KW/Klkc6TN/ek6J9ecU8t5z
 tdHOViZWRmA1VwMGBHwpq8F2o/ATH6GeivTgqrQjcjGNhCUUT1Ulzve2UxGEMWi6
 ncjKY/GxUrYaMMtRvLv+/knrfbWtd+EnWOav07jgNrrA0tvgBNQvEKKHPoWykDVN
 QKpxu3YoNxrsR/LJMS+Zjj0IIM1Y+9DTOkLXzxJ5Hvht8rOl5heYGh2DICOpWsbQ
 ejrQicJOJvN5vqu+Sgcqq4msyTEdbs2LfRDrW1VC9A6ILI+KzYg2laTFGMnhZ5qn
 TgsYIDdONS2iGUulFHylGHI7ANtUg/mhklLUccY1HQYyiAM1NQUtzq1tAz6yLoIH
 l8iIiyzJSBWW57nWhyrULEbzHgPE+bHIjO4T+UUOxMgquYa4V11S1uP0OfWfZogR
 xS24GlobS2oXHMQqh0fA
 =d/C+
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz <sameo@linux.intel.com> says:

"NFC: 3.18 pull request

This is the NFC pull request for 3.18.

We've had major updates for TI and ST Microelectronics drivers:

For TI's trf7970a driver:

- Target mode support for trf7970a
- Suspend/resume support for trf7970a
- DT properties additions to handle different quirks
- A bunch of fixes for smartphone IOP related issues

For ST Microelectronics' ST21NFCA and ST21NFCB drivers:

- ISO15693 support for st21nfcb
- checkpatch and sparse related warning fixes
- Code cleanups and a few minor fixes

Finally, Marvell add ISO15693 support to the NCI stack, together with a
couple of NCI fixes."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-09-26 13:37:02 -04:00
Pablo Neira Ayuso 34666d467c netfilter: bridge: move br_netfilter out of the core
Jesper reported that br_netfilter always registers the hooks since
this is part of the bridge core. This harms performance for people that
don't need this.

This patch modularizes br_netfilter so it can be rmmod'ed, thus,
the hooks can be unregistered. I think the bridge netfilter should have
been a separated module since the beginning, Patrick agreed on that.

Note that this is breaking compatibility for users that expect that
bridge netfilter is going to be available after explicitly 'modprobe
bridge' or via automatic load through brctl.

However, the damage can be easily undone by modprobing br_netfilter.
The bridge core also spots a message to provide a clue to people that
didn't notice that this has been deprecated.

On top of that, the plan is that nftables will not rely on this software
layer, but integrate the connection tracking into the bridge layer to
enable stateful filtering and NAT, which is was bridge netfilter users
seem to require.

This patch still keeps the fake_dst_ops in the bridge core, since this
is required by when the bridge port is initialized. So we can safely
modprobe/rmmod br_netfilter anytime.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Florian Westphal <fw@strlen.de>
2014-09-26 18:42:31 +02:00
Pablo Neira Ayuso 7276ca3fa2 netfilter: bridge: nf_bridge_copy_header as static inline in header
Move nf_bridge_copy_header() as static inline in netfilter_bridge.h
header file. This patch prepares the modularization of the br_netfilter
code.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-26 18:42:30 +02:00
Rob Jones 772476df70 net/netfilter/x_tables.c: use __seq_open_private()
Reduce boilerplate code by using __seq_open_private() instead of seq_open()
in xt_match_open() and xt_target_open().

Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-26 18:42:29 +02:00
Steffen Klassert d61746b2e7 ip_tunnel: Don't allow to add the same tunnel multiple times.
When we try to add an already existing tunnel, we don't return
an error. Instead we continue and call ip_tunnel_update().
This means that we can change existing tunnels by adding
the same tunnel multiple times. It is even possible to change
the tunnel endpoints of the fallback device.

We fix this by returning an error if we try to add an existing
tunnel.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 00:41:30 -04:00
Eric Dumazet 4a8e320c92 net: sched: use pinned timers
While using a MQ + NETEM setup, I had confirmation that the default
timer migration ( /proc/sys/kernel/timer_migration ) is killing us.

Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX
queues) :

EST="est 1sec 4sec"
for ETH in eth1
do
 tc qd del dev $ETH root 2>/dev/null
 tc qd add dev $ETH root handle 1: mq
 tc qd add dev $ETH parent 1:1 $EST netem limit 70000 delay 6ms
 tc qd add dev $ETH parent 1:2 $EST netem limit 70000 delay 8ms
 tc qd add dev $ETH parent 1:3 $EST netem limit 70000 delay 10ms
 tc qd add dev $ETH parent 1:4 $EST netem limit 70000 delay 12ms
 tc qd add dev $ETH parent 1:5 $EST netem limit 70000 delay 14ms
 tc qd add dev $ETH parent 1:6 $EST netem limit 70000 delay 16ms
 tc qd add dev $ETH parent 1:7 $EST netem limit 80000 delay 18ms
 tc qd add dev $ETH parent 1:8 $EST netem limit 90000 delay 20ms
done

We can see that timers get migrated into a single cpu, presumably idle
at the time timers are set up.
Then all qdisc dequeues run from this cpu and huge lock contention
happens. This single cpu is stuck in softirq mode and cannot dequeue
fast enough.

    39.24%  [kernel]          [k] _raw_spin_lock
     2.65%  [kernel]          [k] netem_enqueue
     1.80%  [kernel]          [k] netem_dequeue
     1.63%  [kernel]          [k] copy_user_enhanced_fast_string
     1.45%  [kernel]          [k] _raw_spin_lock_bh

By pinning qdisc timers on the cpu running the qdisc, we respect proper
XPS setting and remove this lock contention.

     5.84%  [kernel]          [k] netem_enqueue
     4.83%  [kernel]          [k] _raw_spin_lock
     2.92%  [kernel]          [k] copy_user_enhanced_fast_string

Current Qdiscs that benefit from this change are :

	netem, cbq, fq, hfsc, tbf, htb.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 00:26:48 -04:00
Tom Herbert 53e5039896 net: Remove gso_send_check as an offload callback
The send_check logic was only interesting in cases of TCP offload and
UDP UFO where the checksum needed to be initialized to the pseudo
header checksum. Now we've moved that logic into the related
gso_segment functions so gso_send_check is no longer needed.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 00:22:47 -04:00
Tom Herbert f71470b37e udp: move logic out of udp[46]_ufo_send_check
In udp[46]_ufo_send_check the UDP checksum initialized to the pseudo
header checksum. We can move this logic into udp[46]_ufo_fragment.
After this change udp[64]_ufo_send_check is a no-op.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 00:22:46 -04:00
Tom Herbert d020f8f733 tcp: move logic out of tcp_v[64]_gso_send_check
In tcp_v[46]_gso_send_check the TCP checksum is initialized to the
pseudo header checksum using __tcp_v[46]_send_check. We can move this
logic into new tcp[46]_gso_segment functions to be done when
ip_summed != CHECKSUM_PARTIAL (ip_summed == CHECKSUM_PARTIAL should be
the common case, possibly always true when taking GSO path). After this
change tcp_v[46]_gso_send_check is no-op.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 00:22:46 -04:00
Trond Myklebust 2aca5b869a SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
The flag RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT was intended introduced in
order to allow NFSv4 clients to disable resend timeouts. Since those
cause the RPC layer to break the connection, they mess up the duplicate
reply caches that remain indexed on the port number in NFSv4..

This patch includes the code that was missing in the original to
set the appropriate flag in struct rpc_clnt, when the caller of
rpc_create() sets RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT.

Fixes: 8a19a0b6cb (SUNRPC: Add RPC task and client level options to...)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-25 21:25:17 -04:00
NeilBrown 1aff525629 NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page()
Now that nfs_release_page() doesn't block indefinitely, other deadlock
avoidance mechanisms aren't needed.
 - it doesn't hurt for kswapd to block occasionally.  If it doesn't
   want to block it would clear __GFP_WAIT.  The current_is_kswapd()
   was only added to avoid deadlocks and we have a new approach for
   that.
 - memory allocation in the SUNRPC layer can very rarely try to
   ->releasepage() a page it is trying to handle.  The deadlock
   is removed as nfs_release_page() doesn't block indefinitely.

So we don't need to set PF_FSTRANS for sunrpc network operations any
more.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-25 08:25:47 -04:00
Johan Hedberg 565766b087 Bluetooth: Rename sco_param_wideband table to esco_param_msbc
The sco_param_wideband table represents the eSCO parameters for
specifically mSBC encoding. This patch renames the table to the more
descriptive esco_param_msbc name.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-25 10:35:08 +02:00
Jason Baron 3dedbb5ca1 rpc: Add -EPERM processing for xs_udp_send_request()
If an iptables drop rule is added for an nfs server, the client can end up in
a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM
is ignored since the prior bits of the packet may have been successfully queued
and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request()
thinks that because some bits were queued it should return -EAGAIN. We then try
the request again and again, resulting in cpu spinning. Reproducer:

1) open a file on the nfs server '/nfs/foo' (mounted using udp)
2) iptables -A OUTPUT -d <nfs server ip> -j DROP
3) write to /nfs/foo
4) close /nfs/foo
5) iptables -D OUTPUT -d <nfs server ip> -j DROP

The softlockup occurs in step 4 above.

The previous patch, allows xs_sendpages() to return both a sent count and
any error values that may have occurred. Thus, if we get an -EPERM, return
that to the higher level code.

With this patch in place we can successfully abort the above sequence and
avoid the softlockup.

I also tried the above test case on an nfs mount on tcp and although the system
does not softlockup, I still ended up with the 'hung_task' firing after 120
seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix,
since -EPERM appears to get ignored much lower down in the stack and does not
propogate up to xs_sendpages(). This case is not quite as insidious as the
softlockup and it is not addressed here.

Reported-by: Yigong Lou <ylou@akamai.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-24 23:13:46 -04:00
Jason Baron f279cd008f rpc: return sent and err from xs_sendpages()
If an error is returned after the first bits of a packet have already been
successfully queued, xs_sendpages() will return a positive 'int' value
indicating success. Callers seem to treat this as -EAGAIN.

However, there are cases where its not a question of waiting for the write
queue to drain. For example, when there is an iptables rule dropping packets
to the destination, the lower level code can return -EPERM only after parts
of the packet have been successfully queued. In this case, we can end up
continuously retrying resulting in a kernel softlockup.

This patch is intended to make no changes in behavior but is in preparation for
subsequent patches that can make decisions based on both on the number of bytes
sent by xs_sendpages() and any errors that may have be returned.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-24 23:13:37 -04:00
Benjamin Coddington a743419f42 SUNRPC: Don't wake tasks during connection abort
When aborting a connection to preserve source ports, don't wake the task in
xs_error_report.  This allows tasks with RPC_TASK_SOFTCONN to succeed if the
connection needs to be re-established since it preserves the task's status
instead of setting it to the status of the aborting kernel_connect().

This may also avoid a potential conflict on the socket's lock.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-24 23:06:56 -04:00
David S. Miller 4daaab4f0c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-24 16:48:32 -04:00
Johan Hedberg c7da579763 Bluetooth: Add retransmission effort into SCO parameter table
It is expected that new parameter combinations will have the
retransmission effort value different between some entries (mainly
because of the new S4 configuration added by HFP 1.7), so it makes sense
to move it into the table instead of having it hard coded based on the
selected SCO_AIRMODE_*.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-24 22:15:29 +02:00
David S. Miller 543a2dff5e Merge tag 'master-2014-09-23' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-09-23

Please consider pulling this one last batch of fixes intended for the 3.17 stream!

For the NFC bits, Samuel says:

"Hopefully not too late for a handful of NFC fixes:

- 2 potential build failures for ST21NFCA and ST21NFCB, triggered by a
  depmod dependenyc cycle.
- One potential buffer overflow in the microread driver."

On top of that...

Emil Goode provides a fix for a brcmfmac off-by-one regression which
was introduced in the 3.17 cycle.

Loic Poulain fixes a polarity mismatch for a variable assignment
inside of rfkill-gpio.

Wojciech Dubowik prevents a NULL pointer dereference in ath9k.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-24 15:00:12 -04:00
Tejun Heo d06efebf0c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18
This is to receive 0a30288da1 ("blk-mq, percpu_ref: implement a
kludge for SCSI blk-mq stall during probe") which implements
__percpu_ref_kill_expedited() to work around SCSI blk-mq stall.  The
commit reverted and patches to implement proper fix will be added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
2014-09-24 13:00:21 -04:00
Simon Vincent f19f4f9525 ieee802154: 6lowpan: ensure header compression does not corrupt ipv6 header
The 6lowpan ipv6 header compression was causing problems for other interfaces
that expected a ipv6 header to still be in place, as we were replacing the
ipv6 header with a compressed version. This happened if you sent a packet to a
multicast address as the packet would be output on 802.15.4, ethernet, and also
be sent to the loopback interface. The skb data was shared between these
interfaces so all interfaces ended up with a compressed ipv6 header.

The solution is to ensure that before we do any header compression we are not
sharing the skb or skb data with any other interface. If we are then we must
take a copy of the skb and skb data before modifying the ipv6 header.
The only place we can copy the skb is inside the xmit function so we don't
leave dangling references to skb.

This patch moves all the header compression to inside the xmit function. Very
little code has been changed it has mostly been moved from lowpan_header_create
to lowpan_xmit. At the top of the xmit function we now check if the skb is
shared and if so copy it. In lowpan_header_create all we do now is store the
source and destination addresses for use later when we compress the header.

Signed-off-by: Simon Vincent <simon.vincent@xsilon.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-24 14:15:08 +02:00
Johan Hedberg d41c15cf95 Bluetooth: Fix reason code used for rejecting SCO connections
The core specification defines valid values for the
HCI_Reject_Synchronous_Connection_Request command to be 0x0D-0x0F. So
far the code has been using HCI_ERROR_REMOTE_USER_TERM (0x13) which is
not a valid value and is therefore being rejected by some controllers:

 > HCI Event: Connect Request (0x04) plen 10
	bdaddr 40:6F:2A:6A:E5:E0 class 0x000000 type eSCO
 < HCI Command: Reject Synchronous Connection (0x01|0x002a) plen 7
	bdaddr 40:6F:2A:6A:E5:E0 reason 0x13
	Reason: Remote User Terminated Connection
 > HCI Event: Command Status (0x0f) plen 4
	Reject Synchronous Connection (0x01|0x002a) status 0x12 ncmd 1
	Error: Invalid HCI Command Parameters

This patch introduces a new define for a value from the valid range
(0x0d == Connection Rejected Due To Limited Resources) and uses it
instead for rejecting incoming connections.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-24 14:03:32 +02:00
Joe Perches 2b0bf6c85a Bluetooth: Convert bt_<level> logging functions to return void
No caller or macro uses the return value so make all
the functions return void.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-24 09:40:08 +02:00
Christophe Ricard 9e87f9a9c4 NFC: nci: Add support for proprietary RF Protocols
In NFC Forum NCI specification, some RF Protocol values are
reserved for proprietary use (from 0x80 to 0xfe).
Some CLF vendor may need to use one value within this range
for specific technology.
Furthermore, some CLF may not becompliant with NFC Froum NCI
specification 2.0 and therefore will not support RF Protocol
value 0x06 for PROTOCOL_T5T as mention in a draft specification
and in a recent push.

Adding get_rf_protocol handle to the nci_ops structure will
help to set the correct technology to target.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-09-24 02:02:24 +02:00
Eric Dumazet bd1e75abf4 tcp: add coalescing attempt in tcp_ofo_queue()
In order to make TCP more resilient in presence of reorders, we need
to allow coalescing to happen when skbs from out of order queue are
transferred into receive queue. LRO/GRO can be completely canceled
in some pathological cases, like per packet load balancing on aggregated
links.

I had to move tcp_try_coalesce() up in the file above tcp_ofo_queue()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-23 12:47:38 -04:00
Eric Dumazet 4cdf507d54 icmp: add a global rate limitation
Current ICMP rate limiting uses inetpeer cache, which is an RBL tree
protected by a lock, meaning that hosts can be stuck hard if all cpus
want to check ICMP limits.

When say a DNS or NTP server process is restarted, inetpeer tree grows
quick and machine comes to its knees.

iptables can not help because the bottleneck happens before ICMP
messages are even cooked and sent.

This patch adds a new global limitation, using a token bucket filter,
controlled by two new sysctl :

icmp_msgs_per_sec - INTEGER
    Limit maximal number of ICMP packets sent per second from this host.
    Only messages whose type matches icmp_ratemask are
    controlled by this limit.
    Default: 1000

icmp_msgs_burst - INTEGER
    icmp_msgs_per_sec controls number of ICMP packets sent per second,
    while icmp_msgs_burst controls the burst size of these packets.
    Default: 50

Note that if we really want to send millions of ICMP messages per
second, we might extend idea and infra added in commit 04ca6973f7
("ip: make IP identifiers less predictable") :
add a token bucket in the ip_idents hash and no longer rely on inetpeer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-23 12:47:38 -04:00
David S. Miller 1f6d80358d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	arch/mips/net/bpf_jit.c
	drivers/net/can/flexcan.c

Both the flexcan and MIPS bpf_jit conflicts were cases of simple
overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-23 12:09:27 -04:00
Bernhard Thaler 48e68ff5e5 Bluetooth: Check for SCO type before setting retransmission effort
SCO connection cannot be setup to devices that do not support retransmission.
Patch based on http://permalink.gmane.org/gmane.linux.bluez.kernel/7779 and
adapted for this kernel version.

Code changed to check SCO/eSCO type before setting retransmission effort
and max. latency. The purpose of the patch is to support older devices not
capable of eSCO.

Tested on Blackberry 655+ headset which does not support retransmission.
Credits go to Alexander Sommerhuber.

Signed-off-by: Bernhard Thaler <bernhard.thaler@r-it.at>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-23 11:30:04 +02:00
Linus Torvalds 98f75b8291 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) If the user gives us a msg_namelen of 0, don't try to interpret
    anything pointed to by msg_name.  From Ani Sinha.

 2) Fix some bnx2i/bnx2fc randconfig compilation errors.

    The gist of the issue is that we firstly have drivers that span both
    SCSI and networking.  And at the top of that chain of dependencies
    we have things like SCSI_FC_ATTRS and SCSI_NETLINK which are
    selected.

    But since select is a sledgehammer and ignores dependencies,
    everything to select's SCSI_FC_ATTRS and/or SCSI_NETLINK has to also
    explicitly select their dependencies and so on and so forth.

    Generally speaking 'select' is supposed to only be used for child
    nodes, those which have no dependencies of their own.  And this
    whole chain of dependencies in the scsi layer violates that rather
    strongly.

    So just make SCSI_NETLINK depend upon it's dependencies, and so on
    and so forth for the things selecting it (either directly or
    indirectly).

    From Anish Bhatt and Randy Dunlap.

 3) Fix generation of blackhole routes in IPSEC, from Steffen Klassert.

 4) Actually notice netdev feature changes in rtl_open() code, from
    Hayes Wang.

 5) Fix divide by zero in bond enslaving, from Nikolay Aleksandrov.

 6) Missing memory barrier in sunvnet driver, from David Stevens.

 7) Don't leave anycast addresses around when ipv6 interface is
    destroyed, from Sabrina Dubroca.

 8) Don't call efx_{arch}_filter_sync_rx_mode before addr_list_lock is
    initialized in SFC driver, from Edward Cree.

 9) Fix missing DMA error checking in 3c59x, from Neal Horman.

10) Openvswitch doesn't emit OVS_FLOW_CMD_NEW notifications accidently,
    fix from Samuel Gauthier.

11) pch_gbe needs to select NET_PTP_CLASSIFY otherwise we can get a
    build error.

12) Fix macvlan regression wherein we stopped emitting
    broadcast/multicast frames over software devices.  From Nicolas
    Dichtel.

13) Fix infiniband bug due to unintended overflow of skb->cb[], from
    Eric Dumazet.  And add an assertion so this doesn't happen again.

14) dm9000_parse_dt() should return error pointers, not NULL.  From
    Tobias Klauser.

15) IP tunneling code uses this_cpu_ptr() in preemptible contexts, fix
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  net: bcmgenet: call bcmgenet_dma_teardown in bcmgenet_fini_dma
  net: bcmgenet: fix TX reclaim accounting for fragments
  ipv4: do not use this_cpu_ptr() in preemptible context
  dm9000: Return an ERR_PTR() in all error conditions of dm9000_parse_dt()
  r8169: fix an if condition
  r8152: disable ALDPS
  ipoib: validate struct ipoib_cb size
  net: sched: shrink struct qdisc_skb_cb to 28 bytes
  tg3: Work around HW/FW limitations with vlan encapsulated frames
  macvlan: allow to enqueue broadcast pkt on virtual device
  pch_gbe: 'select' NET_PTP_CLASSIFY.
  scsi: Use 'depends' with LIBFC instead of 'select'.
  openvswitch: restore OVS_FLOW_CMD_NEW notifications
  genetlink: add function genl_has_listeners()
  lib: rhashtable: remove second linux/log2.h inclusion
  net: allow macvlans to move to net namespace
  3c59x: Fix bad offset spec in skb_frag_dma_map
  3c59x: Add dma error checking and recovery
  sparc: bpf_jit: fix support for ldx/stx mem and SKF_AD_VLAN_TAG
  can: at91_can: add missing prepare and unprepare of the clock
  ...
2014-09-22 18:23:33 -07:00
Eric Dumazet a35165ca10 ipv4: do not use this_cpu_ptr() in preemptible context
this_cpu_ptr() in preemptible context is generally bad

Sep 22 05:05:55 br kernel: [   94.608310] BUG: using smp_processor_id()
in
preemptible [00000000] code: ip/2261
Sep 22 05:05:55 br kernel: [   94.608316] caller is
tunnel_dst_set.isra.28+0x20/0x60 [ip_tunnel]
Sep 22 05:05:55 br kernel: [   94.608319] CPU: 3 PID: 2261 Comm: ip Not
tainted
3.17.0-rc5 #82

We can simply use raw_cpu_ptr(), as preemption is safe in these
contexts.

Should fix https://bugzilla.kernel.org/show_bug.cgi?id=84991

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Joe <joe9mail@gmail.com>
Fixes: 9a4aa9af44 ("ipv4: Use percpu Cache route in IP tunnels")
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 18:31:18 -04:00
Eric Dumazet a2aeb02a8e net: sched: fix compile warning in cls_u32
$ grep CONFIG_CLS_U32_MARK .config
# CONFIG_CLS_U32_MARK is not set

net/sched/cls_u32.c: In function 'u32_change':
net/sched/cls_u32.c:852:1: warning: label 'errout' defined but not used
[-Wunused-label]

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 16:47:19 -04:00
David S. Miller 84de67b298 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2014-09-22

We generate a blackhole or queueing route if a packet
matches an IPsec policy but a state can't be resolved.
Here we assume that dst_output() is called to kill
these packets. Unfortunately this assumption is not
true in all cases, so it is possible that these packets
leave the system without the necessary transformations.

This pull request contains two patches to fix this issue:

1) Fix for blackhole routed packets.

2) Fix for queue routed packets.

Both patches are serious stable candidates.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 16:41:41 -04:00
Eric Dumazet fcdd1cf4dd tcp: avoid possible arithmetic overflows
icsk_rto is a 32bit field, and icsk_backoff can reach 15 by default,
or more if some sysctl (eg tcp_retries2) are changed.

Better use 64bit to perform icsk_rto << icsk_backoff operations

As Joe Perches suggested, add a helper for this.

Yuchung spotted the tcp_v4_err() case.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 16:27:10 -04:00
Daniel Borkmann 35f7aa5309 ipv6: mld: answer mldv2 queries with mldv1 reports in mldv1 fallback
RFC2710 (MLDv1), section 3.7. says:

  The length of a received MLD message is computed by taking the
  IPv6 Payload Length value and subtracting the length of any IPv6
  extension headers present between the IPv6 header and the MLD
  message. If that length is greater than 24 octets, that indicates
  that there are other fields present *beyond* the fields described
  above, perhaps belonging to a *future backwards-compatible* version
  of MLD. An implementation of the version of MLD specified in this
  document *MUST NOT* send an MLD message longer than 24 octets and
  MUST ignore anything past the first 24 octets of a received MLD
  message.

RFC3810 (MLDv2), section 8.2.1. states for *listeners* regarding
presence of MLDv1 routers:

  In order to be compatible with MLDv1 routers, MLDv2 hosts MUST
  operate in version 1 compatibility mode. [...] When Host
  Compatibility Mode is MLDv2, a host acts using the MLDv2 protocol
  on that interface. When Host Compatibility Mode is MLDv1, a host
  acts in MLDv1 compatibility mode, using *only* the MLDv1 protocol,
  on that interface. [...]

While section 8.3.1. specifies *router* behaviour regarding presence
of MLDv1 routers:

  MLDv2 routers may be placed on a network where there is at least
  one MLDv1 router. The following requirements apply:

  If an MLDv1 router is present on the link, the Querier MUST use
  the *lowest* version of MLD present on the network. This must be
  administratively assured. Routers that desire to be compatible
  with MLDv1 MUST have a configuration option to act in MLDv1 mode;
  if an MLDv1 router is present on the link, the system administrator
  must explicitly configure all MLDv2 routers to act in MLDv1 mode.
  When in MLDv1 mode, the Querier MUST send periodic General Queries
  truncated at the Multicast Address field (i.e., 24 bytes long),
  and SHOULD also warn about receiving an MLDv2 Query (such warnings
  must be rate-limited). The Querier MUST also fill in the Maximum
  Response Delay in the Maximum Response Code field, i.e., the
  exponential algorithm described in section 5.1.3. is not used. [...]

That means that we should not get queries from different versions of
MLD. When there's a MLDv1 router present, MLDv2 enforces truncation
and MRC == MRD (both fields are overlapping within the 24 octet range).

Section 8.3.2. specifies behaviour in the presence of MLDv1 multicast
address *listeners*:

  MLDv2 routers may be placed on a network where there are hosts
  that have not yet been upgraded to MLDv2. In order to be compatible
  with MLDv1 hosts, MLDv2 routers MUST operate in version 1 compatibility
  mode. MLDv2 routers keep a compatibility mode per multicast address
  record. The compatibility mode of a multicast address is determined
  from the Multicast Address Compatibility Mode variable, which can be
  in one of the two following states: MLDv1 or MLDv2.

  The Multicast Address Compatibility Mode of a multicast address
  record is set to MLDv1 whenever an MLDv1 Multicast Listener Report is
  *received* for that multicast address. At the same time, the Older
  Version Host Present timer for the multicast address is set to Older
  Version Host Present Timeout seconds. The timer is re-set whenever a
  new MLDv1 Report is received for that multicast address. If the Older
  Version Host Present timer expires, the router switches back to
  Multicast Address Compatibility Mode of MLDv2 for that multicast
  address. [...]

That means, what can happen is the following scenario, that hosts can
act in MLDv1 compatibility mode when they previously have received an
MLDv1 query (or, simply operate in MLDv1 mode-only); and at the same
time, an MLDv2 router could start up and transmits MLDv2 startup query
messages while being unaware of the current operational mode.

Given RFC2710, section 3.7 we would need to answer to that with an MLDv1
listener report, so that the router according to RFC3810, section 8.3.2.
would receive that and internally switch to MLDv1 compatibility as well.

Right now, I believe since the initial implementation of MLDv2, Linux
hosts would just silently drop such MLDv2 queries instead of replying
with an MLDv1 listener report, which would prevent a MLDv2 router going
into fallback mode (until it receives other MLDv1 queries).

Since the mapping of MRC to MRD in exactly such cases can make use of
the exponential algorithm from 5.1.3, we cannot [strictly speaking] be
aware in MLDv1 of the encoding in MRC, it seems also not mentioned by
the RFC. Since encodings are the same up to 32767, assume in such a
situation this value as a hard upper limit we would clamp. We have asked
one of the RFC authors on that regard, and he mentioned that there seem
not to be any implementations that make use of that exponential algorithm
on startup messages. In any case, this patch fixes this MLD
interoperability issue.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 16:23:15 -04:00
Loic Poulain fa5c107cc8 net: rfkill: gpio: Fix clock status
Clock is disabled when the device is blocked.
So, clock_enabled is the logical negation of "blocked".

Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-09-22 16:02:15 -04:00
John Fastabend de5df63228 net: sched: cls_u32 changes to knode must appear atomic to readers
Changes to the cls_u32 classifier must appear atomic to the
readers. Before this patch if a change is requested for both
the exts and ifindex, first the ifindex is updated then the
exts with tcf_exts_change(). This opens a small window where
a reader can have a exts chain with an incorrect ifindex. This
violates the the RCU semantics.

Here we resolve this by always passing u32_set_parms() a copy
of the tc_u_knode to work on and then inserting it into the hash
table after the updates have been successfully applied.

Tested with the following short script:

#tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 handle 1: \
	       u32 divisor 256

#tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 \
	       u32 link 1: hashkey mask ffffff00 at 12    \
	       match ip src 192.168.8.0/2

#tc filter add dev p3p2 parent 8001:0 protocol ip prio 102    \
	       handle 1::10 u32 classid 1:2 ht 1: 	      \
	       match ip src 192.168.8.0/8 match ip tos 0x0a 1e

#tc filter change dev p3p2 parent 8001:0 protocol ip prio 102 \
		 handle 1::10 u32 classid 1:2 ht 1:        \
		 match ip src 1.1.0.0/8 match ip tos 0x0b 1e

CC: Eric Dumazet <edumazet@google.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 15:59:21 -04:00
John Fastabend a1ddcfee2d net: cls_u32: fix missed pcpu_success free_percpu
This fixes a missed free_percpu in the unwind code path and when
keys are destroyed.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 15:59:21 -04:00
Tom Herbert 3fcb95a84f udp: Need to make ip6_udp_tunnel.c have GPL license
Unable to load various tunneling modules without this:

[   80.679049] fou: Unknown symbol udp_sock_create6 (err 0)
[   91.439939] ip6_udp_tunnel: Unknown symbol ip6_local_out (err 0)
[   91.439954] ip6_udp_tunnel: Unknown symbol __put_net (err 0)
[   91.457792] vxlan: Unknown symbol udp_sock_create6 (err 0)
[   91.457831] vxlan: Unknown symbol udp_tunnel6_xmit_skb (err 0)

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 15:08:25 -04:00
Jason Wang cecda693a9 net: keep original skb which only needs header checking during software GSO
Commit ce93718fb7 ("net: Don't keep
around original SKB when we software segment GSO frames") frees the
original skb after software GSO even for dodgy gso skbs. This breaks
the stream throughput from untrusted sources, since only header
checking was done during software GSO instead of a true
segmentation. This patch fixes this by freeing the original gso skb
only when it was really segmented by software.

Fixes ce93718fb7 ("net: Don't keep
around original SKB when we software segment GSO frames.")

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 14:57:08 -04:00
Florian Fainelli 19e57c4e6d net: dsa: add {get, set}_wol callbacks to slave devices
Allow switch drivers to implement per-port Wake-on-LAN getter and
setters.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 14:41:23 -04:00
Florian Fainelli 2446254915 net: dsa: allow switch drivers to implement suspend/resume hooks
Add an abstraction layer to suspend/resume switch devices, doing the
following split:

- suspend/resume the slave network devices and their corresponding PHY
  devices
- suspend/resume the switch hardware using switch driver callbacks

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 14:41:23 -04:00
Eric Dumazet 2571178626 net: sched: shrink struct qdisc_skb_cb to 28 bytes
We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
or increasing skb->cb[] size.

Commit e0f31d8498 ("flow_keys: Record IP layer protocol in
skb_flow_dissect()") broke IPoIB.

Only current offender is sch_choke, and this one do not need an
absolutely precise flow key.

If we store 17 bytes of flow key, its more than enough. (Its the actual
size of flow_keys if it was a packed structure, but we might add new
fields at the end of it later)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: e0f31d8498 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 14:21:47 -04:00
Andy Zhou 6d967f8789 udp_tunnel: Only build ip6_udp_tunnel.c when IPV6 is selected
Functions supplied in ip6_udp_tunnel.c are only needed when IPV6 is
selected. When IPV6 is not selected, those functions are stubbed out
in udp_tunnel.h.

==================================================================
 net/ipv6/ip6_udp_tunnel.c:15:5: error: redefinition of 'udp_sock_create6'
     int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
 In file included from net/ipv6/ip6_udp_tunnel.c:9:0:
      include/net/udp_tunnel.h:36:19: note: previous definition of 'udp_sock_create6' was here
       static inline int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
==================================================================

Fixes:  fd384412e udp_tunnel: Seperate ipv6 functions into its own file
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 22:05:28 -04:00
Samuel Gauthier 9b67aa4a82 openvswitch: restore OVS_FLOW_CMD_NEW notifications
Since commit fb5d1e9e12 ("openvswitch: Build flow cmd netlink reply only if needed."),
the new flows are not notified to the listeners of OVS_FLOW_MCGROUP.

This commit fixes the problem by using the genl function, ie
genl_has_listerners() instead of netlink_has_listeners().

Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:28:26 -04:00
Tom Herbert 4565e9919c gre: Setup and TX path for gre/UDP foo-over-udp encapsulation
Added netlink attrs to configure FOU encapsulation for GRE, netlink
handling of these flags, and properly adjust MTU for encapsulation.
ip_tunnel_encap is called from ip_tunnel_xmit to actually perform FOU
encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:32 -04:00
Tom Herbert 473ab820dd ipip: Setup and TX path for ipip/UDP foo-over-udp encapsulation
Add netlink handling for IP tunnel encapsulation parameters and
and adjustment of MTU for encapsulation.  ip_tunnel_encap is called
from ip_tunnel_xmit to actually perform FOU encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:32 -04:00
Tom Herbert 14909664e4 sit: Setup and TX path for sit/UDP foo-over-udp encapsulation
Added netlink handling of IP tunnel encapulation paramters, properly
adjust MTU for encapsulation. Added ip_tunnel_encap call to
ipip6_tunnel_xmit to actually perform FOU encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:32 -04:00
Tom Herbert 5632848653 net: Changes to ip_tunnel to support foo-over-udp encapsulation
This patch changes IP tunnel to support (secondary) encapsulation,
Foo-over-UDP. Changes include:

1) Adding tun_hlen as the tunnel header length, encap_hlen as the
   encapsulation header length, and hlen becomes the grand total
   of these.
2) Added common netlink define to support FOU encapsulation.
3) Routines to perform FOU encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:32 -04:00
Tom Herbert afe93325bc fou: Add GRO support
Implement fou_gro_receive and fou_gro_complete, and populate these
in the correponsing udp_offloads for the socket. Added ipproto to
udp_offloads and pass this from UDP to the fou GRO routine in proto
field of napi_gro_cb structure.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:31 -04:00
Tom Herbert 23461551c0 fou: Support for foo-over-udp RX path
This patch provides a receive path for foo-over-udp. This allows
direct encapsulation of IP protocols over UDP. The bound destination
port is used to map to an IP protocol, and the XFRM framework
(udp_encap_rcv) is used to receive encapsulated packets. Upon
reception, the encapsulation header is logically removed (pointer
to transport header is advanced) and the packet is reinjected into
the receive path with the IP protocol indicated by the mapping.

Netlink is used to configure FOU ports. The configuration information
includes the port number to bind to and the IP protocol corresponding
to that port.

This should support GRE/UDP
(http://tools.ietf.org/html/draft-yong-tsvwg-gre-in-udp-encap-02),
as will as the other IP tunneling protocols (IPIP, SIT).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:31 -04:00
Tom Herbert ce3e02867e net: Export inet_offloads and inet6_offloads
Want to be able to use these in foo-over-udp offloads, etc.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:31 -04:00
John Fastabend 4e2840eee6 net: sched: cls_u32: rcu can not be last node
tc_u32_sel 'sel' in tc_u_knode expects to be the last element in the
structure and pads the structure with tc_u32_key fields for each key.

 kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL)

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:05:45 -04:00
David S. Miller 8f665f6cb7 Merge tag 'master-2014-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-09-17

Please pull this batch of fixes intended for the 3.17 stream...

Arend van Spriel sends a trio of minor brcmfmac fixes, including a
fix for a Kconfig/build issue, a fix for a crash (null reference),
and a regression fix related to event handling on a P2P interface.

Hante Meuleman follows-up with a brcmfmac fix for a memory leak.

Johannes Stezenbach brings an ath9k_htc fix for a regression related
to hardware decryption offload.

Marcel Holtmann delivers a one-liner to properly mark a device ID
table in rfkill-gpio.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:33:15 -04:00
Eric Dumazet ab34f64808 net: sched: use __skb_queue_head_init() where applicable
pfifo_fast and htb use skb lists, without needing their spinlocks.
(They instead use the standard qdisc lock)

We can use __skb_queue_head_init() instead of skb_queue_head_init()
to be consistent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:32:10 -04:00
Florian Fainelli 6819563e64 net: dsa: allow switch drivers to specify phy_device::dev_flags
Some switch drivers (e.g: bcm_sf2) may have to communicate specific
workarounds or flags towards the PHY device driver. Allow switches
driver to be delegated that task by introducing a get_phy_flags()
callback which will do just that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:27:07 -04:00
Eric Dumazet 2e4e441071 net: add alloc_skb_with_frags() helper
Extract from sock_alloc_send_pskb() code building skb with frags,
so that we can reuse this in other contexts.

Intent is to use it from tcp_send_rcvq(), tcp_collapse(), ...

We also want to replace some skb_linearize() calls to a more reliable
strategy in pathological cases where we need to reduce number of frags.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:25:23 -04:00
Eric Dumazet cb93471acc tcp: do not fake tcp headers in tcp_send_rcvq()
Now we no longer rely on having tcp headers for skbs in receive queue,
tcp repair do not need to build fake ones.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:04:13 -04:00
Andy Zhou c8fffcea0a l2tp: Refactor l2tp core driver to make use of the common UDP tunnel functions
Simplify l2tp implementation using common UDP tunnel APIs.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
Andy Zhou 6a93cc9052 udp-tunnel: Add a few more UDP tunnel APIs
Added a few more UDP tunnel APIs that can be shared by UDP based
tunnel protocol implementation. The main ones are highlighted below.

setup_udp_tunnel_sock() configures UDP listener socket for
receiving UDP encapsulated packets.

udp_tunnel_xmit_skb() and upd_tunnel6_xmit_skb() transmit skb
using UDP encapsulation.

udp_tunnel_sock_release() closes the UDP tunnel listener socket.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
Andy Zhou fd384412e1 udp_tunnel: Seperate ipv6 functions into its own file.
Add ip6_udp_tunnel.c for ipv6 UDP tunnel functions to avoid ifdefs
in udp_tunnel.c

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
Pablo Neira Ayuso 84d7fce693 netfilter: nf_tables: export rule-set generation ID
This patch exposes the ruleset generation ID in three ways:

1) The new command NFT_MSG_GETGEN that exposes the 32-bits ruleset
   generation ID. This ID is incremented in every commit and it
   should be large enough to avoid wraparound problems.

2) The less significant 16-bits of the generation ID are exposed through
   the nfgenmsg->res_id header field. This allows us to quickly catch
   if the ruleset has change between two consecutive list dumps from
   different object lists (in this specific case I think the risk of
   wraparound is unlikely).

3) Userspace subscribers may receive notifications of new rule-set
   generation after every commit. This also provides an alternative
   way to monitor the generation ID. If the events are lost, the
   userspace process hits a overrun error, so it knows that it is
   working with a stale ruleset anyway.

Patrick spotted that rule-set transformations in userspace may take
quite some time. In that case, it annotates the 32-bits generation ID
before fetching the rule-set, then:

1) it compares it to what we obtain after the transformation to
   make sure it is not working with a stale rule-set and no wraparound
   has ocurred.

2) it subscribes to ruleset notifications, so it can watch for new
   generation ID.

This is complementary to the NLM_F_DUMP_INTR approach, which allows
us to detect an interference in the middle one single list dumping.
There is no way to explicitly check that an interference has occurred
between two list dumps from the kernel, since it doesn't know how
many lists the userspace client is actually going to dump.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-19 11:14:43 +02:00
Pablo Neira Ayuso fc04733a1a netfilter: nfnetlink: use original skbuff when committing/aborting
This allows us to access the original content of the batch from
the commit and the abort paths.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-19 11:14:42 +02:00
Johan Hedberg 5eb596f55c Bluetooth: Fix setting correct security level when initiating SMP
We can only determine the final security level when both pairing request
and response have been exchanged. When initiating pairing the starting
target security level is set to MEDIUM unless explicitly specified to be
HIGH, so that we can still perform pairing even if the remote doesn't
have MITM capabilities. However, once we've received the pairing
response we should re-consult the remote and local IO capabilities and
upgrade the target security level if necessary.

Without this patch the resulting Long Term Key will occasionally be
reported to be unauthenticated when it in reality is an authenticated
one.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
2014-09-18 17:39:37 +02:00
Pablo Neira Ayuso fcfa8f493f Merge branch 'ipvs-next'
Simon Horman says:

====================
This pull requests makes the following changes:

* Add simple weighted fail-over scheduler.
  - Unlike other IPVS schedulers this offers fail-over rather than load
    balancing. Connections are directed to the appropriate server based
    solely on highest weight value and server availability.
  - Thanks to Kenny Mathis

* Support IPv6 real servers in IPv4 virtual-services and vice versa
  - This feature is supported in conjunction with the tunnel (IPIP)
    forwarding mechanism. That is, IPv4 may be forwarded in IPv6 and
    vice versa.
  - The motivation for this is to allow more flexibility in the
    choice of IP version offered by both virtual-servers and
    real-servers as they no longer need to match: An IPv4 connection from an
    end-user may be forwarded to a real-server using IPv6 and vice versa.
  - Further work need to be done to support this feature in conjunction
    with connection synchronisation. For now such configurations are
    not allowed.
  - This change includes update to netlink protocol, adding a new
    destination address family attribute. And the necessary changes
    to plumb this information throughout IPVS.
  - Thanks to Alex Gartrell and Julian Anastasov
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-18 10:59:33 +02:00
Herbert Xu 689f1c9de2 ipsec: Remove obsolete MAX_AH_AUTH_LEN
While tracking down the MAX_AH_AUTH_LEN crash in an old kernel
I thought that this limit was rather arbitrary and we should
just get rid of it.

In fact it seems that we've already done all the work needed
to remove it apart from actually removing it.  This limit was
there in order to limit stack usage.  Since we've already
switched over to allocating scratch space using kmalloc, there
is no longer any need to limit the authentication length.

This patch kills all references to it, including the BUG_ONs
that led me here.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-18 10:54:36 +02:00
Alex Gartrell bc18d37f67 ipvs: Allow heterogeneous pools now that we support them
Remove the temporary consistency check and add a case statement to only
allow ipip mixed dests.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-18 08:59:29 +09:00
Julian Anastasov f18ae7206e ipvs: use the new dest addr family field
Use the new address family field cp->daf when printing
cp->daddr in logs or connection listing.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-18 08:59:28 +09:00
Julian Anastasov 4d316f3f9a ipvs: use correct address family in scheduler logs
Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-18 08:59:23 +09:00
Marcel Holtmann 0097db06f5 Bluetooth: Remove exported hci_recv_fragment function
The hci_recv_fragment function is no longer used by any driver and thus
do not export it. In fact it is not even needed by the core and it can
be removed altogether.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-09-17 10:23:03 +03:00
John Fastabend 9f6c38e70b net: sched: cls_cgroup need tcf_exts_init in all cases
This ensures the tcf_exts_init() is called for all cases.

Fixes: 952313bd62 ("net: sched: cls_cgroup use RCU")
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 16:26:39 -04:00
David S. Miller 2d9d65fa44 Merge branch 'net_next_ovs' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch
Pravin B Shelar says:

====================
Open vSwitch

Following patches adds recirculation and hash action to OVS.
First patch removes pointer to stack object. Next three patches
does code restructuring which is required for last patch.
Recirculation implementation is changed, according to comments from
David Miller, to avoid using recursive calls in OVS. It is using
queue to record recirc action and deferred recirc is executed at
the end of current actions execution.

v1-v2:
Changed subsystem name in subject to openvswitch
v2-v3:
Added patch to remove pkt_key pointer from skb->cb.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 16:21:48 -04:00
Marcel Holtmann dda3b191eb net: rfkill: gpio: Enable module auto-loading for ACPI based switches
For the ACPI based switches the MODULE_DEVICE_TABLE is missing to
export the entries for module auto-loading.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-09-16 16:09:01 -04:00
John Fastabend e1f93eb06c net: sched: cls_fw: add missing tcf_exts_init call in fw_change()
When allocating a new structure we also need to call tcf_exts_init
to initialize exts.

A follow up patch might be in order to remove some of this code
and do tcf_exts_assign(). With this we could remove the
tcf_exts_init/tcf_exts_change pattern for some of the classifiers.
As part of the future tcf_actions RCU series this will need to be
done. For now fix the call here.

Fixes e35a8ee599 ("net: sched: fw use RCU")
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:59:36 -04:00
John Fastabend d14cbfc88f net: sched: cls_cgroup fix possible memory leak of 'new'
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   54996b529a
commit: c7953ef230 [625/646] net: sched: cls_cgroup use RCU

net/sched/cls_cgroup.c:130 cls_cgroup_change() warn: possible memory leak of 'new'
net/sched/cls_cgroup.c:135 cls_cgroup_change() warn: possible memory leak of 'new'
net/sched/cls_cgroup.c:139 cls_cgroup_change() warn: possible memory leak of 'new'

Fixes: c7953ef230 ("net: sched: cls_cgroup use RCU")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:59:36 -04:00
John Fastabend a96366bf26 net: sched: cls_u32 add missing rcu_assign_pointer and annotation
Add missing rcu_assign_pointer and missing  annotation for ht_up
in cls_u32.c

Caught by kbuild bot,

>> net/sched/cls_u32.c:378:36: sparse: incorrect type in initializer (different address spaces)
   net/sched/cls_u32.c:378:36:    expected struct tc_u_hnode *ht
   net/sched/cls_u32.c:378:36:    got struct tc_u_hnode [noderef] <asn:4>*ht_up
>> net/sched/cls_u32.c:610:54: sparse: incorrect type in argument 4 (different address spaces)
   net/sched/cls_u32.c:610:54:    expected struct tc_u_hnode *ht
   net/sched/cls_u32.c:610:54:    got struct tc_u_hnode [noderef] <asn:4>*ht_up
>> net/sched/cls_u32.c:684:18: sparse: incorrect type in assignment (different address spaces)
   net/sched/cls_u32.c:684:18:    expected struct tc_u_hnode [noderef] <asn:4>*ht_up
   net/sched/cls_u32.c:684:18:    got struct tc_u_hnode *[assigned] ht
>> net/sched/cls_u32.c:359:18: sparse: dereference of noderef expression

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:59:36 -04:00
John Fastabend 80aab73de4 net: sched: fix unsued cpu variable
kbuild test robot reported an unused variable cpu in cls_u32.c
after the patch below. This happens when PERF and MARK config
variables are disabled

Fix this is to use separate variables for perf and mark
and define the cpu variable inside the ifdef logic.

Fixes: 459d5f626d ("net: sched: make cls_u32 per cpu")'
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:59:36 -04:00
WANG Cong 69301eaa7f net_sched: fix a null pointer dereference in tcindex_set_parms()
This patch fixes the following crash:

[   42.199159] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[   42.200027] IP: [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526
[   42.200027] PGD d2319067 PUD d4ffe067 PMD 0
[   42.200027] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[   42.200027] CPU: 0 PID: 541 Comm: tc Not tainted 3.17.0-rc4+ #603
[   42.200027] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   42.200027] task: ffff8800d22d2670 ti: ffff8800ce790000 task.ti: ffff8800ce790000
[   42.200027] RIP: 0010:[<ffffffff817e3fc4>]  [<ffffffff817e3fc4>] tcindex_set_parms+0x45c/0x526
[   42.200027] RSP: 0018:ffff8800ce793898  EFLAGS: 00010202
[   42.200027] RAX: 0000000000000001 RBX: ffff8800d1786498 RCX: 0000000000000000
[   42.200027] RDX: ffffffff82114ec8 RSI: ffffffff82114ec8 RDI: ffffffff82114ec8
[   42.200027] RBP: ffff8800ce793958 R08: 00000000000080d0 R09: 0000000000000001
[   42.200027] R10: ffff8800ce7939a0 R11: 0000000000000246 R12: ffff8800d017d238
[   42.200027] R13: 0000000000000018 R14: ffff8800d017c6a0 R15: ffff8800d1786620
[   42.200027] FS:  00007f4e24539740(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
[   42.200027] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   42.200027] CR2: 0000000000000018 CR3: 00000000cff38000 CR4: 00000000000006f0
[   42.200027] Stack:
[   42.200027]  ffff8800ce0949f0 0000000000000000 0000000200000003 ffff880000000000
[   42.200027]  ffff8800ce7938b8 ffff8800ce7938b8 0000000600000007 0000000000000000
[   42.200027]  ffff8800ce7938d8 ffff8800ce7938d8 0000000600000007 ffff8800ce0949f0
[   42.200027] Call Trace:
[   42.200027]  [<ffffffff817e4169>] tcindex_change+0xdb/0xee
[   42.200027]  [<ffffffff817c16ca>] tc_ctl_tfilter+0x44d/0x63f
[   42.200027]  [<ffffffff8179d161>] rtnetlink_rcv_msg+0x181/0x194
[   42.200027]  [<ffffffff8179cf9d>] ? rtnl_lock+0x17/0x19
[   42.200027]  [<ffffffff8179cfe0>] ? __rtnl_unlock+0x17/0x17
[   42.200027]  [<ffffffff817ee296>] netlink_rcv_skb+0x49/0x8b
[   43.462494]  [<ffffffff8179cfc2>] rtnetlink_rcv+0x23/0x2a
[   43.462494]  [<ffffffff817ec8df>] netlink_unicast+0xc7/0x148
[   43.462494]  [<ffffffff817ed413>] netlink_sendmsg+0x5cb/0x63d
[   43.462494]  [<ffffffff810ad781>] ? mark_lock+0x2e/0x224
[   43.462494]  [<ffffffff817757b8>] __sock_sendmsg_nosec+0x25/0x27
[   43.462494]  [<ffffffff81778165>] sock_sendmsg+0x57/0x71
[   43.462494]  [<ffffffff81152bbd>] ? might_fault+0x57/0xa4
[   43.462494]  [<ffffffff81152c06>] ? might_fault+0xa0/0xa4
[   43.462494]  [<ffffffff81152bbd>] ? might_fault+0x57/0xa4
[   43.462494]  [<ffffffff817838fd>] ? verify_iovec+0x69/0xb7
[   43.462494]  [<ffffffff817784f8>] ___sys_sendmsg+0x21d/0x2bb
[   43.462494]  [<ffffffff81009db3>] ? native_sched_clock+0x35/0x37
[   43.462494]  [<ffffffff8109ab53>] ? sched_clock_local+0x12/0x72
[   43.462494]  [<ffffffff810ad781>] ? mark_lock+0x2e/0x224
[   43.462494]  [<ffffffff8109ada4>] ? sched_clock_cpu+0xa0/0xb9
[   43.462494]  [<ffffffff810aee37>] ? __lock_acquire+0x5fe/0xde4
[   43.462494]  [<ffffffff8119f570>] ? rcu_read_lock_held+0x36/0x38
[   43.462494]  [<ffffffff8119f75a>] ? __fcheck_files.isra.7+0x4b/0x57
[   43.462494]  [<ffffffff8119fbf2>] ? __fget_light+0x30/0x54
[   43.462494]  [<ffffffff81779012>] __sys_sendmsg+0x42/0x60
[   43.462494]  [<ffffffff81779042>] SyS_sendmsg+0x12/0x1c
[   43.462494]  [<ffffffff819d24d2>] system_call_fastpath+0x16/0x1b

'p->h' could be NULL while 'cp->h' is always update to date.

Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-By: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:20:09 -04:00
WANG Cong 44b75e4317 net_sched: fix memory leak in cls_tcindex
Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-By: John Fastabend <john.r.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 15:19:23 -04:00
David Howells 0c903ab64f KEYS: Make the key matching functions return bool
Make the key matching functions pointed to by key_match_data::cmp return bool
rather than int.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
2014-09-16 17:36:08 +01:00
David Howells c06cfb08b8 KEYS: Remove key_type::match in favour of overriding default by match_preparse
A previous patch added a ->match_preparse() method to the key type.  This is
allowed to override the function called by the iteration algorithm.
Therefore, we can just set a default that simply checks for an exact match of
the key description with the original criterion data and allow match_preparse
to override it as needed.

The key_type::match op is then redundant and can be removed, as can the
user_match() function.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
2014-09-16 17:36:06 +01:00
David Howells 462919591a KEYS: Preparse match data
Preparse the match data.  This provides several advantages:

 (1) The preparser can reject invalid criteria up front.

 (2) The preparser can convert the criteria to binary data if necessary (the
     asymmetric key type really wants to do binary comparison of the key IDs).

 (3) The preparser can set the type of search to be performed.  This means
     that it's not then a one-off setting in the key type.

 (4) The preparser can set an appropriate comparator function.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
2014-09-16 17:36:02 +01:00
Steffen Klassert b8c203b2d2 xfrm: Generate queueing routes only from route lookup functions
Currently we genarate a queueing route if we have matching policies
but can not resolve the states and the sysctl xfrm_larval_drop is
disabled. Here we assume that dst_output() is called to kill the
queued packets. Unfortunately this assumption is not true in all
cases, so it is possible that these packets leave the system unwanted.

We fix this by generating queueing routes only from the
route lookup functions, here we can guarantee a call to
dst_output() afterwards.

Fixes: a0073fe18e ("xfrm: Add a state resolution packet queue")
Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-16 10:08:49 +02:00
Steffen Klassert f92ee61982 xfrm: Generate blackhole routes only from route lookup functions
Currently we genarate a blackhole route route whenever we have
matching policies but can not resolve the states. Here we assume
that dst_output() is called to kill the balckholed packets.
Unfortunately this assumption is not true in all cases, so
it is possible that these packets leave the system unwanted.

We fix this by generating blackhole routes only from the
route lookup functions, here we can guarantee a call to
dst_output() afterwards.

Fixes: 2774c131b1 ("xfrm: Handle blackhole route creation via afinfo.")
Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-16 10:08:40 +02:00
Andy Zhou 971427f353 openvswitch: Add recirc and hash action.
Recirc action allows a packet to reenter openvswitch processing.
currently openvswitch lookup flow for packet received and execute
set of actions on that packet, with help of recirc action we can
process/modify the packet and recirculate it back in openvswitch
for another pass.

OVS hash action calculates 5-tupple hash and set hash in flow-key
hash. This can be used along with recirculation for distributing
packets among different ports for bond devices.
For example:
OVS bonding can use following actions:
Match on: bond flow; Action: hash, recirc(id)
Match on: recirc-id == id and hash lower bits == a;
          Action: output port_bond_a

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 23:28:14 -07:00
Andy Zhou 32ae87ff79 openvswitch: simplify sample action implementation
The current sample() function implementation is more complicated
than necessary in handling single user space action optimization
and skb reference counting. There is no functional changes.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 23:28:14 -07:00
Pravin B Shelar 8c8b1b83fc openvswitch: Use tun_key only for egress tunnel path.
Currently tun_key is used for passing tunnel information
on ingress and egress path, this cause confusion.  Following
patch removes its use on ingress path make it egress only parameter.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-09-15 23:28:13 -07:00
Pravin B Shelar 83c8df26a3 openvswitch: refactor ovs flow extract API.
OVS flow extract is called on packet receive or packet
execute code path.  Following patch defines separate API
for extracting flow-key in packet execute code path.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2014-09-15 23:28:13 -07:00
Pravin B Shelar 2ff3e4e486 openvswitch: Remove pkt_key from OVS_CB
OVS keeps pointer to packet key in skb->cb, but the packet key is
store on stack. This could make code bit tricky. So it is better to
get rid of the pointer.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-09-15 23:28:13 -07:00
Julian Anastasov cf34e646da ipvs: address family of LBLCR entry depends on svc family
The LBLCR entries should use svc->af, not dest->af.
Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:38 +09:00
Julian Anastasov f7fa380069 ipvs: address family of LBLC entry depends on svc family
The LBLC entries should use svc->af, not dest->af.
Needed to support svc->af != dest->af.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:38 +09:00
Alex Gartrell 8052ba2925 ipvs: support ipv4 in ipv6 and ipv6 in ipv4 tunnel forwarding
Pull the common logic for preparing an skb to prepend the header into a
single function and then set fields such that they can be used in either
case (generalize tos and tclass to dscp, hop_limit and ttl to ttl, etc)

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:37 +09:00
Alex Gartrell c63e4de2be ipvs: Add generic ensure_mtu_is_adequate to handle mixed pools
The out_rt functions check to see if the mtu is large enough for the packet
and, if not, send icmp messages (TOOBIG or DEST_UNREACH) to the source and
bail out.  We needed the ability to send ICMP from the out_rt_v6 function
and DEST_UNREACH from the out_rt function, so we just pulled it out into a
common function.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:37 +09:00
Alex Gartrell 919aa0b2bb ipvs: Pull out update_pmtu code
Another step toward heterogeneous pools, this removes another piece of
functionality currently specific to each address family type.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:36 +09:00
Alex Gartrell 4a4739d56b ipvs: Pull out crosses_local_route_boundary logic
This logic is repeated in both out_rt functions so it was redundant.
Additionally, we'll need to be able to do checks to route v4 to v6 and vice
versa in order to deal with heterogeneous pools.

This patch also updates the callsites to add an additional parameter to the
out route functions.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:36 +09:00
Alex Gartrell 391f503d69 ipvs: prevent mixing heterogeneous pools and synchronization
The synchronization protocol is not compatible with heterogeneous pools, so
we need to verify that we're not turning both on at the same time.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:35 +09:00
Alex Gartrell ba38528aae ipvs: Supply destination address family to ip_vs_conn_new
The assumption that dest af is equal to service af is now unreliable, so we
must specify it manually so as not to copy just the first 4 bytes of a v6
address or doing an illegal read of 16 butes on a v6 address.

We "lie" in two places: for synchronization (which we will explicitly
disallow from happening when we have heterogeneous pools) and for black
hole addresses where there's no real dest.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:34 +09:00
Alex Gartrell ad147aa4dd ipvs: Pass destination address family to ip_vs_trash_get_dest
Part of a series of diffs to tease out destination family from virtual
family.  This diff just adds a parameter to ip_vs_trash_get and then uses
it for comparison rather than svc->af.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:34 +09:00
Alex Gartrell 655eef103d ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest}
We need to remove the assumption that virtual address family is the same as
real address family in order to support heterogeneous services (that is,
services with v4 vips and v6 backends or the opposite).

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:33 +09:00
Alex Gartrell 6cff339bbd ipvs: Add destination address family to netlink interface
This is necessary to support heterogeneous pools.  For example, if you have
an ipv6 addressed network, you'll want to be able to forward ipv4 traffic
into it.

This patch enforces that destination address family is the same as service
family, as none of the forwarding mechanisms support anything else.

For the old setsockopt mechanism, we simply set the dest address family to
AF_INET as we do with the service.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:33 +09:00
Kenny Mathis 616a9be25c ipvs: Add simple weighted failover scheduler
Add simple weighted IPVS failover support to the Linux kernel. All
other scheduling modules implement some form of load balancing, while
this offers a simple failover solution. Connections are directed to
the appropriate server based solely on highest weight value and server
availability. Tested functionality with keepalived.

Signed-off-by: Kenny Mathis <kmathis@chokepoint.net>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:32 +09:00
Florian Fainelli c1f570a6ab net: dsa: fix mii_bus to host_dev replacement
dsa_of_probe() still used cd->mii_bus instead of cd->host_dev when
building with CONFIG_OF=y. Fix this by making the replacement here as
well.

Fixes: b4d2394d01 ("dsa: Replace mii_bus with a generic host device")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:52:48 -04:00
WANG Cong 10ee1c34be net_sched: use tcindex_filter_result_init()
Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:51:18 -04:00
WANG Cong 2f9a220eff net_sched: fix suspicious RCU usage in tcindex_classify()
This patch fixes the following kernel warning:

[   44.805900] [ INFO: suspicious RCU usage. ]
[   44.808946] 3.17.0-rc4+ #610 Not tainted
[   44.811831] -------------------------------
[   44.814873] net/sched/cls_tcindex.c:84 suspicious rcu_dereference_check() usage!

Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:49:42 -04:00
WANG Cong a57a65ba47 net_sched: fix an allocation bug in tcindex_set_parms()
Fixes: commit 331b72922c ("net: sched: RCU cls_tcindex")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:48:23 -04:00
WANG Cong 80dcbd12fb net_sched: fix suspicious RCU usage in cls_bpf_classify()
Fixes: commit 1f947bf151 ("net: sched: rcu'ify cls_bpf")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:42:08 -04:00
Vlad Yasevich c095f248e6 bridge: Fix br_should_learn to check vlan_enabled
As Toshiaki Makita pointed out, the BRIDGE_INPUT_SKB_CB will
not be initialized in br_should_learn() as that function
is called only from br_handle_local_finish().  That is
an input handler for link-local ethernet traffic so it perfectly
correct to check br->vlan_enabled here.

Reported-by: Toshiaki Makita<toshiaki.makita1@gmail.com>
Fixes: 20adfa1 bridge: Check if vlan filtering is enabled only once.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:38:30 -04:00
Alexander Duyck b4d2394d01 dsa: Replace mii_bus with a generic host device
This change makes it so that instead of passing and storing a mii_bus we
instead pass and store a host_dev.  From there we can test to determine the
exact type of device, and can verify it is the correct device for our switch.

So for example it would be possible to pass a device pointer from a pci_dev
and instead of checking for a PHY ID we could check for a vendor and/or device
ID.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:24:20 -04:00
Alexander Duyck 5075314e4e dsa: Split ops up, and avoid assigning tag_protocol and receive separately
This change addresses several issues.

First, it was possible to set tag_protocol without setting the ops pointer.
To correct that I have reordered things so that rcv is now populated before
we set tag_protocol.

Second, it didn't make much sense to keep setting the device ops each time a
new slave was registered.  So by moving the receive portion out into root
switch initialization that issue should be addressed.

Third, I wanted to avoid sending tags if the rcv pointer was not registered
so I changed the tag check to verify if the rcv function pointer is set on
the root tree.  If it is then we start sending DSA tagged frames.

Finally I split the device ops pointer in the structures into two spots.  I
placed the rcv function pointer in the root switch since this makes it
easiest to access from there, and I placed the xmit function pointer in the
slave for the same reason.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:24:20 -04:00
Jozsef Kadlecsik 07034aeae1 netfilter: ipset: hash:mac type added to ipset
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-09-15 22:20:21 +02:00
Anton Danilov 76cea4109c netfilter: ipset: Add skbinfo extension support to SET target.
Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-09-15 22:20:21 +02:00
Anton Danilov cbee93d7b7 netfilter: ipset: Add skbinfo extension kernel support for the list set type.
Add skbinfo extension kernel support for the list set type.
Introduce the new revision of the list set type.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-09-15 22:20:20 +02:00
Anton Danilov af331419d3 netfilter: ipset: Add skbinfo extension kernel support for the hash set types.
Add skbinfo extension kernel support for the hash set types.
Inroduce the new revisions of all hash set types.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-09-15 22:20:20 +02:00
Anton Danilov 39d1ecf1ad netfilter: ipset: Add skbinfo extension kernel support for the bitmap set types.
Add skbinfo extension kernel support for the bitmap set types.
Inroduce the new revisions of bitmap_ip, bitmap_ipmac and bitmap_port set types.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-09-15 22:20:20 +02:00
Anton Danilov 0e9871e3f7 netfilter: ipset: Add skbinfo extension kernel support in the ipset core.
Skbinfo extension provides mapping of metainformation with lookup in the ipset tables.
This patch defines the flags, the constants, the functions and the structures
for the data type independent support of the extension.
Note the firewall mark stores in the kernel structures as two 32bit values,
but transfered through netlink as one 64bit value.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-09-15 22:20:20 +02:00
Jozsef Kadlecsik 73e64e1813 netfilter: ipset: Fix static checker warning in ip_set_core.c
Dan Carpenter reported the following static checker warning:

        net/netfilter/ipset/ip_set_core.c:1414 call_ad()
        error: 'nlh->nlmsg_len' from user is not capped properly

The payload size is limited now by the max size of size_t.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2014-09-15 22:20:20 +02:00
John W. Linville 1186b623c2 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-09-15 14:55:45 -04:00
John W. Linville 6bd2bd27ba This time, I have some rate minstrel improvements, support for a very
small feature from CCX that Steinar reverse-engineered, dynamic ACK
 timeout support, a number of changes for TDLS, early support for radio
 resource measurement and many fixes. Also, I'm changing a number of
 places to clear key memory when it's freed and Intel claims copyright
 for code they developed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUEpv0AAoJEDBSmw7B7bqr6CMP/2CXvWr/98AY2Flt74KDNyaE
 vmJBVCsu+eT0G9FL6YxbVU5+rvInGDHd9qTHkU4ljd+uXwnG8XAT+WHFlhBjzm+V
 juXPWblbSdMzwpWDfq7Kbk134b9ALTEUqekhqSFvhPA5h0Dq0/8lDK9CFyfwKWbN
 07PwUv0VUUEHKVqQoVSNJu9Szi5NvZvDcN7Jwg1Cpnv0sUOeH7J2Kz1OUT4RaEhI
 c/UJjCQV4ssXaEkTDIxciQ62HrglZanMqyx4a9LGbrxLdw1KJ19CNmSkwB5mQuZg
 LhV05Y0Gv4tkRC8sCo7HF7cqgjBfjTNiEjZYfbExW0QFOMKIgKmmjYIEezVdbrk7
 gFIyhTRE595UtztUJV0dcitoOlybbRf3OdEwAIJD6fc0vhoe/rSjUIyS7/CZisMT
 9zg33JvtK3eYPSJS1jy4lk2yZ5alhLoPMQTNmsEuyOGcU3sH9vTGMjONPffOlcH9
 nzj7aUS2Qvwn3H+4CIaZbZhySpa0B9zkGL3oxeaEBmLJbFMTo5ua2FNGhubC2O+O
 BwNULDBEMwsGHKMUCWCLmQwACWdVdNxYYWtXbWfxdmC/CJoXgdLCJIUfoa1aOf2A
 DyCqUFvG/n8ObHVy+P3RU6poQFj0M/yclJAMHRW6x2qzNvAkDb0G6TVeIlgN5dG8
 jLoZPL5OH0wb0BPVNEH8
 =OPIp
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-john-2014-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg <johannes@sipsolutions.net> says:

"This time, I have some rate minstrel improvements, support for a very
small feature from CCX that Steinar reverse-engineered, dynamic ACK
timeout support, a number of changes for TDLS, early support for radio
resource measurement and many fixes. Also, I'm changing a number of
places to clear key memory when it's freed and Intel claims copyright
for code they developed."

Conflicts:
	net/mac80211/iface.c

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-09-15 14:51:23 -04:00
Eric Dumazet b3d6cb92fd tcp: do not copy headers in tcp_collapse()
tcp_collapse() wants to shrink skb so that the overhead is minimal.

Now we store tcp flags into TCP_SKB_CB(skb)->tcp_flags, we no longer
need to keep around full headers.
Whole available space is dedicated to the payload.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 14:41:08 -04:00
Eric Dumazet e93a0435f8 tcp: allow segment with FIN in tcp_try_coalesce()
We can allow a segment with FIN to be aggregated,
if we take care to add tcp flags,
and if skb_try_coalesce() takes care of zero sized skbs.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 14:41:07 -04:00
Eric Dumazet e11ecddf51 tcp: use TCP_SKB_CB(skb)->tcp_flags in input path
Input path of TCP do not currently uses TCP_SKB_CB(skb)->tcp_flags,
which is only used in output path.

tcp_recvmsg(), looks at tcp_hdr(skb)->syn for every skb found in receive queue,
and its unfortunate because this bit is located in a cache line right before
the payload.

We can simplify TCP by copying tcp flags into TCP_SKB_CB(skb)->tcp_flags.

This patch does so, and avoids the cache line miss in tcp_recvmsg()

Following patches will
- allow a segment with FIN being coalesced in tcp_try_coalesce()
- simplify tcp_collapse() by not copying the headers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 14:41:07 -04:00
Alexander Y. Fomichev 7ce64c79c4 net: fix creation adjacent device symlinks
__netdev_adjacent_dev_insert may add adjust device of different net
namespace, without proper check it leads to emergence of broken
sysfs links from/to devices in another namespace.
Fix: rewrite netdev_adjacent_is_neigh_list macro as a function,
     move net_eq check into netdev_adjacent_is_neigh_list.
     (thanks David)
     related to: 4c75431ac3

Signed-off-by: Alexander Fomichev <git.user@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 14:24:53 -04:00
Marcel Holtmann 43e73e4e2a Bluetooth: Provide HCI command opcode information to driver
The Bluetooth core already does processing of the HCI command header
and puts it together before sending it to the driver. It is not really
efficient for the driver to look at the HCI command header again in
case it has to make certain decisions about certain commands. To make
this easier, just provide the opcode as part of the SKB control buffer
information. The extra information about the opcode is optional and
only provided for HCI commands.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-09-15 07:15:45 +03:00
Marcel Holtmann 7cb9d20fd9 Bluetooth: Add BUILD_BUG_ON check for SKB control buffer size
The struct bt_skb_cb size needs to stay within the limits of skb->cb
at all times and to ensure that add a BUILD_BUG_ON to check for it at
compile time.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-09-15 07:15:41 +03:00
Sasha Levin c0d1379a19 net: bpf: correctly handle errors in sk_attach_filter()
Commit "net: bpf: make eBPF interpreter images read-only" has changed bpf_prog
to be vmalloc()ed but never handled some of the errors paths of the old code.

On error within sk_attach_filter (which userspace can easily trigger), we'd
kfree() the vmalloc()ed memory, and leak the internal bpf_work_struct.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:37:49 -04:00
Vlad Yasevich 635126b7ca bridge: Allow clearing of pvid and untagged bitmap
Currently, it is possible to modify the vlan filter
configuration to add pvid or untagged support.
For example:
  bridge vlan add vid 10 dev eth0
  bridge vlan add vid 10 dev eth0 untagged pvid

The second statement will modify vlan 10 to
include untagged and pvid configuration.
However, it is currently impossible to go backwards
  bridge vlan add vid 10 dev eth0 untagged pvid
  bridge vlan add vid 10 dev eth0

Here nothing happens.  This patch correct this so
that any modifiers not supplied are removed from
the configuration.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:21:56 -04:00
Vlad Yasevich 20adfa1a81 bridge: Check if vlan filtering is enabled only once.
The bridge code checks if vlan filtering is enabled on both
ingress and egress.   When the state flip happens, it
is possible for the bridge to currently be forwarding packets
and forwarding behavior becomes non-deterministic.  Bridge
may drop packets on some interfaces, but not others.

This patch solves this by caching the filtered state of the
packet into skb_cb on ingress.  The skb_cb is guaranteed to
not be over-written between the time packet entres bridge
forwarding path and the time it leaves it.  On egress, we
can then check the cached state to see if we need to
apply filtering information.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:21:56 -04:00
Hannes Frederic Sowa 233577a220 net: filter: constify detection of pkt_type_offset
Currently we have 2 pkt_type_offset functions doing the same thing and
spread across the architecture files. Remove those and replace them
with a PKT_TYPE_OFFSET macro helper which gets the constant value from a
zero sized sk_buff member right in front of the bitfield with offsetof.
This new offset marker does not change size of struct sk_buff.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:07:21 -04:00
Florian Fainelli ac7a04c33d net: dsa: change tag_protocol to an enum
Now that we introduced an additional multiplexing/demultiplexing layer
with commit 3e8a72d1da ("net: dsa: reduce number of protocol hooks")
that lives within the DSA code, we no longer need to have a given switch
driver tag_protocol be an actual ethertype value, instead, we can
replace it with an enum: dsa_tag_protocol.

Do this replacement in the drivers, which allows us to get rid of the
cpu_to_be16()/htons() dance, and remove ETH_P_BRCMTAG since we do not
need it anymore.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:04:35 -04:00
WANG Cong 3ce62a84d5 ipv6: exit early in addrconf_notify() if IPv6 is disabled
If IPv6 is explicitly disabled before the interface comes up,
it makes no sense to continue when it comes up, even just
print a message.

(I am not sure about other cases though, so I prefer not to touch)

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:39:40 -04:00
WANG Cong 1691c63ea4 ipv6: refactor ipv6_dev_mc_inc()
Refactor out allocation and initialization and make
the refcount code more readable.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong f7ed925c1b ipv6: update the comment in mcast.c
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong 414b6c943f ipv6: drop some rcu_read_lock in mcast
Similarly the code is already protected by rtnl lock.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong b5350916bf ipv6: drop ipv6_sk_mc_lock in mcast
Similarly the code is already protected by rtnl lock.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong 83aa29eefd ipv6: refactor __ipv6_dev_ac_inc()
Refactor out allocation and initialization and make
the refcount code more readable.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong 013b4d9038 ipv6: clean up ipv6_dev_ac_inc()
Make it accept inet6_dev, and rename it to __ipv6_dev_ac_inc()
to reflect this change.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong b03a9c04a3 ipv6: remove ipv6_sk_ac_lock
Just move rtnl lock up, so that the anycast list can be protected
by rtnl lock now.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
WANG Cong 6c555490e0 ipv6: drop useless rcu_read_lock() in anycast
These code is now protected by rtnl lock, rcu read lock
is useless now.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
John Fastabend 1f947bf151 net: sched: rcu'ify cls_bpf
This patch makes the cls_bpf classifier RCU safe. The tcf_lock
was being used to protect a list of cls_bpf_prog now this list
is RCU safe and updates occur with rcu_replace.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend b929d86d25 net: sched: rcu'ify cls_rsvp
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend 1ce87720d4 net: sched: make cls_u32 lockless
Make cls_u32 classifier safe to run without holding lock. This patch
converts statistics that are kept in read section u32_classify into
per cpu counters.

This patch was tested with a tight u32 filter add/delete loop while
generating traffic with pktgen. By running pktgen on vlan devices
created on top of a physical device we can hit the qdisc layer
correctly. For ingress qdisc's a loopback cable was used.

for i in {1..100}; do
        q=`echo $i%8|bc`;
        echo -n "u32 tos: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
                  action skbedit queue_mapping $q;
        sleep 1;
        tc filter del dev p3p2 prio $i;

        echo -n "u32 tos hash table: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
        sleep 2;
        tc filter del dev p3p2 prio $i
        sleep 1;
done

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend 459d5f626d net: sched: make cls_u32 per cpu
This uses per cpu counters in cls_u32 in preparation
to convert over to rcu.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend 331b72922c net: sched: RCU cls_tcindex
Make cls_tcindex RCU safe.

This patch addds a new RCU routine rcu_dereference_bh_rtnl() to check
caller either holds the rcu read lock or RTNL. This is needed to
handle the case where tcindex_lookup() is being called in both cases.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend 1109c00547 net: sched: RCU cls_route
RCUify the route classifier. For now however spinlock's are used to
protect fastmap cache.

The issue here is the fastmap may be read by one CPU while the
cache is being updated by another. An array of pointers could be
one possible solution.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend e35a8ee599 net: sched: fw use RCU
RCU'ify fw classifier.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend 70da9f0bf9 net: sched: cls_flow use RCU
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend 952313bd62 net: sched: cls_cgroup use RCU
Make cgroup classifier safe for RCU.

Also drops the calls in the classify routine that were doing a
rcu_read_lock()/rcu_read_unlock(). If the rcu_read_lock() isn't held
entering this routine we have issues with deleting the classifier
chain so remove the unnecessary rcu_read_lock()/rcu_read_unlock()
pair noting all paths AFAIK hold rcu_read_lock.

If there is a case where classify is called without the rcu read lock
then an rcu splat will occur and we can correct it.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:26 -04:00
John Fastabend 9888faefe1 net: sched: cls_basic use RCU
Enable basic classifier for RCU.

Dereferencing tp->root may look a bit strange here but it is needed
by my accounting because it is allocated at init time and needs to
be kfree'd at destroy time. However because it may be referenced in
the classify() path we must wait an RCU grace period before free'ing
it. We use kfree_rcu() and rcu_ APIs to enforce this. This pattern
is used in all the classifiers.

Also the hgenerator can be incremented without concern because it
is always incremented under RTNL.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:25 -04:00
John Fastabend 25d8c0d55f net: rcu-ify tcf_proto
rcu'ify tcf_proto this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.

This patch prepares the core net_sched infrastracture for running
the classifier/action chains without holding the qdisc lock however
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fall out.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:25 -04:00
John Fastabend 46e5da40ae net: qdisc: use rcu prefix and silence sparse warnings
Add __rcu notation to qdisc handling by doing this we can make
smatch output more legible. And anyways some of the cases should
be using rcu_dereference() see qdisc_all_tx_empty(),
qdisc_tx_chainging(), and so on.

Also *wake_queue() API is commonly called from driver timer routines
without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
around netif_wake_subqueue and netif_tx_wake_queue.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:25 -04:00
David S. Miller cffc6c4c94 Merge tag 'master-2014-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-09-11

Please pull this batch of fixes intended for the 3.17 stream:

For the mac80211 bits, Johannes says:

"Two more fixes for mac80211 - one of them addresses a long-standing
issue that we only found when using vendor events more frequently;
the other addresses some bad information being reported in userspace
that people were starting to actually look at."

For the iwlwifi bits, Emmanuel says:

"I re-enable scheduled scan on firmware that contain the fix for
the bug that Linus reported.  A few trivial fixes: endianity issues,
the same DTIM period fix that I did in mac80211.  Eyal fixes a few
issues we identified with EAPOL, we now send them just as if they were
management frames, this solves interrop issues.  Johannes has another
set of trivial fixes, while Luca fixes the way we configure the filters
in the firmware. Last but not least, a new device is added by Oren."

Emmanuel was traveling, resulting in his pull to be a bit larger than
I would have liked to see at this point.  FWIW, I have asked Emmanuel
to be much more strict for any more pull requests in this cycle.

In addition to the above, Sujith Manoharan reverts an earlier ath9k
patch.  The earlier change was found to allow for the device to sleep
too long and miss beacons.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 18:21:47 -04:00
Scott Wood 2d8f7e2c8a udp: Fix inverted NAPI_GRO_CB(skb)->flush test
Commit 2abb7cdc0d ("udp: Add support for doing checksum unnecessary
conversion") caused napi_gro_cb structs with the "flush" field zero to
take the "udp_gro_receive" path rather than the "set flush to 1" path
that they would previously take.  As a result I saw booting from an NFS
root hang shortly after starting userspace, with "server not
responding" messages.

This change to the handling of "flush == 0" packets appears to be
incidental to the goal of adding new code in the case where
skb_gro_checksum_validate_zero_check() returns zero.  Based on that and
the fact that it breaks things, I'm assuming that it is unintentional.

Fixes: 2abb7cdc0d ("udp: Add support for doing checksum unnecessary conversion")
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 17:55:41 -04:00
Alexander Duyck bf7fa551e0 mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path
There is a possible issue with the use, or lack thereof of sk_refcnt and
sk_wmem_alloc in the wifi ack status functionality.

Specifically if a socket were to request acknowledgements, and the socket
were to have sk_refcnt drop to 0 resulting in it waiting on sk_wmem_alloc
to reach 0 it would be possible to have sock_queue_err_skb orphan the last
buffer, resulting in __sk_free being called on the socket.  After this the
buffer is enqueued on sk_error_queue, however the queue has already been
flushed resulting in at least a memory leak, if not a data corruption.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 17:51:25 -04:00
Alexander Duyck cab41c47d9 skb: Add documentation for skb_clone_sk
This change adds some documentation to the call skb_clone_sk.  This is
meant to help clarify the purpose of the function for other developers.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 17:51:24 -04:00
Sabrina Dubroca 381f4dca48 ipv6: clean up anycast when an interface is destroyed
If we try to rmmod the driver for an interface while sockets with
setsockopt(JOIN_ANYCAST) are alive, some refcounts aren't cleaned up
and we get stuck on:

  unregister_netdevice: waiting for ens3 to become free. Usage count = 1

If we LEAVE_ANYCAST/close everything before rmmod'ing, there is no
problem.

We need to perform a cleanup similar to the one for multicast in
addrconf_ifdown(how == 1).

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 17:33:06 -04:00
Johan Hedberg 9a783a139c Bluetooth: Fix re-setting RPA as expired when deferring update
The hci_update_random_address will clear the RPA_EXPIRED flag and
proceed with setting a new one if the flag was set. However, the
set_random_addr() function that is called may choose to defer the update
to a later moment. In such a case the flag would incorrectly remain
unset unless set_random_addr() re-sets it. This patch fixes the issue.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-12 18:34:25 +02:00
Pablo Neira Ayuso 0bbe80e571 netfilter: masquerading needs to be independent of x_tables in Kconfig
Users are starting to test nf_tables with no x_tables support. Therefore,
masquerading needs to be indenpendent of it from Kconfig.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-12 09:40:18 +02:00
Pablo Neira Ayuso 3e8dc212a0 netfilter: NFT_CHAIN_NAT_IPV* is independent of NFT_NAT
Now that we have masquerading support in nf_tables, the NAT chain can
be use with it, not only for SNAT/DNAT. So make this chain type
independent of it.

While at it, move it inside the scope of 'if NF_NAT_IPV*' to simplify
dependencies.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-12 09:40:17 +02:00
Linus Torvalds c73f6fdf2f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
 "The main thing here is a set of three patches that fix a buffer
  overrun for large authentication tickets (sigh).

  There is also a trivial warning fix and an error path fix that are
  both regressions"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: do not hard code max auth ticket len
  libceph: add process_one_ticket() helper
  libceph: gracefully handle large reply messages from the mon
  rbd: fix error return code in rbd_dev_device_setup()
  rbd: avoid format-security warning inside alloc_workqueue()
2014-09-11 18:03:21 -07:00
Eliad Peller 0d8614b4b9 mac80211: replace SMPS hw flags with wiphy feature bits
Use the new static_smps / dynamic_smps feature bits
instead of mac80211-internal hw flags.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 13:37:02 +02:00
Eliad Peller f699317487 mac80211: set smps_mode according to ap params
Take the requested smps mode from the ap params
(instead of always starting with SMPS_OFF)

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 13:37:02 +02:00
Eliad Peller 18998c381b cfg80211: allow requesting SMPS mode on ap start
Add feature bits to indicate device support for
static-smps and dynamic-smps modes.

Add a new NL80211_ATTR_SMPS_MODE attribue to allow
configuring the smps mode to be used by the ap
(e.g. configuring to ap to dynamic smps mode will
reduce power consumption while having minor effect
on throughput)

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 13:37:02 +02:00
Arik Nemtsov 59cd85cbcf mac80211: set network header in TDLS frames
Correctly mark the network header location in mac80211-generated TDLS
frames. These may be used by lower-level drivers.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:25:22 +02:00
Eliad Peller b0b6aa2c8e cfg80211/mac80211: add wmm info to assoc event
Userspace might need to know what queues are configured
for uapsd (e.g. for setting proper default values in tspecs).

Add this bitmap to the association event (inside wmm
nested attribute)

Add additional parameter to cfg80211_rx_assoc_resp,
and update its callers.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:24:39 +02:00
Johannes Berg 960d01acf6 cfg80211: add WMM traffic stream API
Add nl80211 and driver API to validate, add and delete traffic
streams with appropriate settings.

The API calls for userspace doing the action frame handshake
with the peer, and then allows only to set up the parameters
in the driver. To avoid setting up a session only to tear it
down again, the validate API is provided, but the real usage
later can still fail so userspace must be prepared for that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:21:18 +02:00
Liad Kaufman 9d58f25b12 mac80211: add TDLS connection timeout
Adding a timeout for tearing down a TDLS connection that
hasn't had ACKed traffic sent through it for a certain
amount of time.

Since we have no other monitoring facility to indicate the
existance (or non-existance) of a peer, this patch will
cause a peer to be considered as unavailable if for some X
time at least some Y packets have all not been ACKed.

Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:18:47 +02:00
Thomas Huehn 5935839ad7 mac80211: improve minstrel_ht rate sorting by throughput & probability
This patch improves the way minstrel_ht sorts rates according to throughput
and success probability. 3 FOR-loops across the entire rate and mcs group set
in function minstrel_ht_update_stats() which where used to determine the
fastest, second fastest and most robust rate are reduced to 2 FOR-loop.

The sorted list of rates according throughput is extended to the best four
rates as we need them in upcoming joint rate and power control. The sorting
is done via the new function minstrel_ht_sort_best_tp_rates(). The annotation
of those 4 best throughput rates in the debugfs file rc-stats is changes to:
"A,B,C,D", where A is the fastest rate and C the 4th fastest.

Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Tested-by: Stefan Venz <ikstream86@gmail.com>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:10:14 +02:00
Thomas Huehn ca12c0c833 mac80211: Unify rate statistic variables between Minstrel & Minstrel_HT
Minstrel and Mintrel_HT used there own structs to keep track of rate
statistics. Unify those variables in struct minstrel_rate_states and
move it to rc80211_minstrel.h for common usage. This is a clean-up
patch to prepare Minstrel and Minstrel_HT codebase for upcoming TPC.

Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:08:31 +02:00
Johannes Berg 5393b917bc cfg80211: clear nl80211 messages carrying keys after processing
Clear any nl80211 messages that might contain keys after
processing them to avoid leaving their data in memory
"forever" after they've been freed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:07:39 +02:00
Johannes Berg 78f686cae0 cfg80211: don't put kek/kck/replay counter on the stack
There's no need to put the values on the stack, just pass a
pointer to the data in the nl80211 message. This reduces stack
usage and avoids potential issues with putting sensitive data
on the stack.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:07:34 +02:00
Johannes Berg 538c9eb8b3 cfg80211: clear wext keys when freeing and removing them
When freeing the keys stored for wireless extensions, clear the memory
to avoid having the key material stick around in memory "forever".
Similarly, when userspace overwrites a key, actually clear it instead
of just setting the key length to zero.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:07:28 +02:00
Johannes Berg 29c3f9c399 mac80211: clear key material when freeing keys
When freeing the key, clear the memory to avoid having the
key material stick around in memory "forever".

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:07:23 +02:00
Johannes Berg b47f610bd6 cfg80211: clear connect keys when freeing them
When freeing the connect keys, clear the memory to avoid
having the key material stick around in memory "forever".

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-09-11 12:07:18 +02:00
Johan Hedberg 7ed3fa2078 Bluetooth: Expire RPA if encryption fails
If encryption fails and we're using an RPA it may be because of a
conflict with another device. To avoid repeated failures the safest
action is to simply mark the RPA as expired so that a new one gets
generated as soon as the connection drops.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-11 07:32:14 +02:00
Johan Hedberg 5be5e275ad Bluetooth: Avoid hard-coded IO capability values in SMP
This is a trivial change to use a proper define for the NoInputNoOutput
IO capability instead of hard-coded values.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-11 03:02:22 +02:00
Johan Hedberg aeaeb4bbca Bluetooth: Fix L2CAP information request handling for fixed channels
Even if we have no connection-oriented channels we should perform the
L2CAP Information Request procedures before notifying L2CAP channels of
the connection. This is so that the L2CAP channel implementations can
perform checks on what the remote side supports (e.g. does it support
the fixed channel in question).

So far the code has relied on the l2cap_do_start() function to initiate
the Information Request, however l2cap_do_start() is used on a
per-channel basis and only for connection-oriented channels. This means
that if there are no connection-oriented channels on the system we would
never start the Information Request procedure.

This patch creates a new l2cap_request_info() helper function to
initiate the Information Request procedure, and ensures that it is
called whenever a BR/EDR connection has been established. The patch also
updates fixed channels to be notified of connection readiness only once
the Information Request procedure has completed.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-11 02:45:24 +02:00
Johan Hedberg a6f7833ca3 Bluetooth: Add smp_ltk_sec_level() helper function
There are several places that need to determine the security level that
an LTK can provide. This patch adds a convenience function for this to
help make the code more readable.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-11 02:45:24 +02:00
Johan Hedberg 1afc2a1ab6 Bluetooth: Fix SMP security level when we have no IO capabilities
When the local IO capability is NoInputNoOutput any attempt to convert
the remote authentication requirement to a target security level is
futile. This patch makes sure that we set the target security level at
most to MEDIUM if the local IO capability is NoInputNoOutput.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-11 02:45:24 +02:00
Johan Hedberg 24bd0bd94e Bluetooth: Centralize disallowing SMP commands to a single place
All the cases where we mark SMP commands as dissalowed are their
respective command handlers. We can therefore simplify the code by
always clearing the bit immediately after testing it. This patch
converts the corresponding test_bit() call to a test_and_clear_bit()
call and also removes the now unused SMP_DISALLOW_CMD macro.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-11 02:45:24 +02:00
Johan Hedberg c05b9339c8 Bluetooth: Fix ignoring unknown SMP authentication requirement bits
The SMP specification states that we should ignore any unknown bits from
the authentication requirement. We already have a define for masking out
unknown bits but we haven't used it in all places so far. This patch
adds usage of the AUTH_REQ_MASK to all places that need it and ensures
that we don't pass unknown bits onward to other functions.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-11 02:45:24 +02:00
Johan Hedberg 3a7dbfb8ff Bluetooth: Remove unnecessary early initialization of variable
We do nothing else with the auth variable in smp_cmd_pairing_rsp()
besides passing it to tk_request() which in turn only cares about
whether one of the sides had the MITM bit set. It is therefore
unnecessary to assign a value to it until just before calling
tk_request(), and this value can simply be the bit-wise or of the local
and remote requirements.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-11 02:45:24 +02:00
Erik Hugne 0fc4dffad1 tipc: fix sparse warnings
This fixes the following sparse warnings:
sparse: symbol 'tipc_update_nametbl' was not declared. Should it be static?
Also, the function is changed to return bool upon success, rather than a
potentially freed pointer.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 14:00:58 -07:00
Chris Perl 0f7a622ca6 rpc: xs_bind - do not bind when requesting a random ephemeral port
When attempting to establish a local ephemeral endpoint for a TCP or UDP
socket, do not explicitly call bind, instead let it happen implicilty when the
socket is first used.

The main motivating factor for this change is when TCP runs out of unique
ephemeral ports (i.e.  cannot find any ephemeral ports which are not a part of
*any* TCP connection).  In this situation if you explicitly call bind, then the
call will fail with EADDRINUSE.  However, if you allow the allocation of an
ephemeral port to happen implicitly as part of connect (or other functions),
then ephemeral ports can be reused, so long as the combination of (local_ip,
local_port, remote_ip, remote_port) is unique for TCP sockets on the system.

This doesn't matter for UDP sockets, but it seemed easiest to treat TCP and UDP
sockets the same.

This can allow mount.nfs(8) to continue to function successfully, even in the
face of misbehaving applications which are creating a large number of TCP
connections.

Signed-off-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-09-10 12:47:00 -07:00
David S. Miller 0aac383353 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
nf-next pull request

The following patchset contains Netfilter/IPVS updates for your
net-next tree. Regarding nf_tables, most updates focus on consolidating
the NAT infrastructure and adding support for masquerading. More
specifically, they are:

1) use __u8 instead of u_int8_t in arptables header, from
   Mike Frysinger.

2) Add support to match by skb->pkttype to the meta expression, from
   Ana Rey.

3) Add support to match by cpu to the meta expression, also from
   Ana Rey.

4) A smatch warning about IPSET_ATTR_MARKMASK validation, patch from
   Vytas Dauksa.

5) Fix netnet and netportnet hash types the range support for IPv4,
   from Sergey Popovich.

6) Fix missing-field-initializer warnings resolved, from Mark Rustad.

7) Dan Carperter reported possible integer overflows in ipset, from
   Jozsef Kadlecsick.

8) Filter out accounting objects in nfacct by type, so you can
   selectively reset quotas, from Alexey Perevalov.

9) Move specific NAT IPv4 functions to the core so x_tables and
   nf_tables can share the same NAT IPv4 engine.

10) Use the new NAT IPv4 functions from nft_chain_nat_ipv4.

11) Move specific NAT IPv6 functions to the core so x_tables and
    nf_tables can share the same NAT IPv4 engine.

12) Use the new NAT IPv6 functions from nft_chain_nat_ipv6.

13) Refactor code to add nft_delrule(), which can be reused in the
    enhancement of the NFT_MSG_DELTABLE to remove a table and its
    content, from Arturo Borrero.

14) Add a helper function to unregister chain hooks, from
    Arturo Borrero.

15) A cleanup to rename to nft_delrule_by_chain for consistency with
    the new nft_*() functions, also from Arturo.

16) Add support to match devgroup to the meta expression, from Ana Rey.

17) Reduce stack usage for IPVS socket option, from Julian Anastasov.

18) Remove unnecessary textsearch state initialization in xt_string,
    from Bojan Prtvar.

19) Add several helper functions to nf_tables, more work to prepare
    the enhancement of NFT_MSG_DELTABLE, again from Arturo Borrero.

20) Enhance NFT_MSG_DELTABLE to delete a table and its content, from
    Arturo Borrero.

21) Support NAT flags in the nat expression to indicate the flavour,
    eg. random fully, from Arturo.

22) Add missing audit code to ebtables when replacing tables, from
    Nicolas Dichtel.

23) Generalize the IPv4 masquerading code to allow its re-use from
    nf_tables, from Arturo.

24) Generalize the IPv6 masquerading code, also from Arturo.

25) Add the new masq expression to support IPv4/IPv6 masquerading
    from nf_tables, also from Arturo.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 12:46:32 -07:00
Joe Perches b167a37c7b netfilter: Convert pr_warning to pr_warn
Use the more common pr_warn.

Other miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 12:40:10 -07:00
Joe Perches 47c4cfc37f iucv: Convert pr_warning to pr_warn
Use the more common pr_warn.
Coalesce formats.
Realign arguments.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 12:40:10 -07:00
Joe Perches 294a0b7f31 pktgen: Convert pr_warning to pr_warn
Use the more common pr_warn.
Realign arguments.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 12:40:10 -07:00
Joe Perches ef423a4109 atm: Convert pr_warning to pr_warn
Use the more common pr_warn.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10 12:40:10 -07:00
Ilya Dryomov c27a3e4d66 libceph: do not hard code max auth ticket len
We hard code cephx auth ticket buffer size to 256 bytes.  This isn't
enough for any moderate setups and, in case tickets themselves are not
encrypted, leads to buffer overflows (ceph_x_decrypt() errors out, but
ceph_decode_copy() doesn't - it's just a memcpy() wrapper).  Since the
buffer is allocated dynamically anyway, allocated it a bit later, at
the point where we know how much is going to be needed.

Fixes: http://tracker.ceph.com/issues/8979

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-10 20:08:36 +04:00
Ilya Dryomov 597cda3577 libceph: add process_one_ticket() helper
Add a helper for processing individual cephx auth tickets.  Needed for
the next commit, which deals with allocating ticket buffers.  (Most of
the diff here is whitespace - view with git diff -b).

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2014-09-10 20:08:35 +04:00
Sage Weil 73c3d4812b libceph: gracefully handle large reply messages from the mon
We preallocate a few of the message types we get back from the mon.  If we
get a larger message than we are expecting, fall back to trying to allocate
a new one instead of blindly using the one we have.

CC: stable@vger.kernel.org
Signed-off-by: Sage Weil <sage@redhat.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
2014-09-10 20:08:32 +04:00
Tom Herbert 19424e052f sit: Add gro callbacks to sit_offload
Add ipv6_gro_receive and ipv6_gro_complete to sit_offload to
support GRO.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 21:29:33 -07:00
Tom Herbert 9667e9bb3f ipip: Add gro callbacks to ipip offload
Add inet_gro_receive and inet_gro_complete to ipip_offload to
support GRO.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 21:29:33 -07:00
Tom Herbert 03d56daafe ipv6: Clear flush_id to make GRO work
In TCP gro we check flush_id which is derived from the IP identifier.
In IPv4 gro path the flush_id is set with the expectation that every
matched packet increments IP identifier. In IPv6, the flush_id is
never set and thus is uinitialized. What's worse is that in IPv6
over IPv4 encapsulation, the IP identifier is taken from the outer
header which is currently not incremented on every packet for Linux
stack, so GRO in this case never matches packets (identifier is
not increasing).

This patch clears flush_id for every time for a matched packet in
IPv6 gro_receive. We need to do this each time to overwrite the
setting that would be done in IPv4 gro_receive per the outer
header in IPv6 over Ipv4 encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 21:29:33 -07:00
David Howells ed3bfdfdce RxRPC: Fix missing __user annotation
Fix a missing __user annotation in a cast of a user space pointer (found by
checker).

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 20:39:40 -07:00
Florian Westphal 46cfd725c3 net: use kfree_skb_list() helper in more places
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 20:10:45 -07:00
Eric Dumazet 72bb17b37b ipv4: udp4_gro_complete() is static
net/ipv4/udp_offload.c:339:5: warning: symbol 'udp4_gro_complete' was
not declared. Should it be static?

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Fixes: 57c67ff4bd ("udp: additional GRO support")
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 20:10:45 -07:00
Eric Dumazet 416c51e17b netns: remove one sparse warning
net/core/net_namespace.c:227:18: warning: incorrect type in argument 1
(different address spaces)
net/core/net_namespace.c:227:18:    expected void const *<noident>
net/core/net_namespace.c:227:18:    got struct net_generic [noderef]
<asn:4>*gen

We can use rcu_access_pointer() here as read-side access to the pointer
was removed at least one grace period ago.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 20:10:45 -07:00
Eric Dumazet cc9c668a08 ipv6: udp6_gro_complete() is static
net/ipv6/udp_offload.c:159:5: warning: symbol 'udp6_gro_complete' was
not declared. Should it be static?

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 57c67ff4bd ("udp: additional GRO support")
Cc: Tom Herbert <therbert@google.com>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 20:10:44 -07:00
Eric Dumazet 8e380f004e ipv4: rcu cleanup in ip_ra_control()
Remove one sparse warning :
net/ipv4/ip_sockglue.c:328:22: warning: incorrect type in assignment (different address spaces)
net/ipv4/ip_sockglue.c:328:22:    expected struct ip_ra_chain [noderef] <asn:4>*next
net/ipv4/ip_sockglue.c:328:22:    got struct ip_ra_chain *[assigned] ra

And replace one rcu_assign_ptr() by RCU_INIT_POINTER() where applicable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 20:10:44 -07:00
Daniel Borkmann cbeddd5d16 ipv6: mcast: remove dead debugging defines
It's not used anywhere, so just remove these.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 20:10:44 -07:00
Ani Sinha 6a2a2b3ae0 net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland.
Linux manpage for recvmsg and sendmsg calls does not explicitly mention setting msg_namelen to 0 when
msg_name passed set as NULL. When developers don't set msg_namelen member in msghdr, it might contain garbage
value which will fail the validation check and sendmsg and recvmsg calls from kernel will return EINVAL. This will
break old binaries and any code for which there is no access to source code.
To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland.

Signed-off-by: Ani Sinha <ani@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 17:35:46 -07:00
Willem de Bruijn 67cc0d4077 net-timestamp: optimize sock_tx_timestamp default path
Few packets have timestamping enabled. Exit sock_tx_timestamp quickly
in this common case.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 17:34:41 -07:00
Florian Westphal 17448e5f63 net_sched: sfq: remove unused macro
not used anymore since ddecf0f
(net_sched: sfq: add optional RED on top of SFQ).

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 17:34:41 -07:00
Daniel Borkmann 286aad3c40 net: bpf: be friendly to kmemcheck
Reported by Mikulas Patocka, kmemcheck currently barks out a
false positive since we don't have special kmemcheck annotation
for bitfields used in bpf_prog structure.

We currently have jited:1, len:31 and thus when accessing len
while CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that
we're reading uninitialized memory.

As we don't need the whole bit universe for pages member, we
can just split it to u16 and use a bool flag for jited instead
of a bitfield.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 16:58:56 -07:00
Eric Dumazet ca777eff51 tcp: remove dst refcount false sharing for prequeue mode
Alexander Duyck reported high false sharing on dst refcount in tcp stack
when prequeue is used. prequeue is the mechanism used when a thread is
blocked in recvmsg()/read() on a TCP socket, using a blocking model
rather than select()/poll()/epoll() non blocking one.

We already try to use RCU in input path as much as possible, but we were
forced to take a refcount on the dst when skb escaped RCU protected
region. When/if the user thread runs on different cpu, dst_release()
will then touch dst refcount again.

Commit 093162553c (tcp: force a dst refcount when prequeue packet)
was an example of a race fix.

It turns out the only remaining usage of skb->dst for a packet stored
in a TCP socket prequeue is IP early demux.

We can add a logic to detect when IP early demux is probably going
to use skb->dst. Because we do an optimistic check rather than duplicate
existing logic, we need to guard inet_sk_rx_dst_set() and
inet6_sk_rx_dst_set() from using a NULL dst.

Many thanks to Alexander for providing a nice bug report, git bisection,
and reproducer.

Tested using Alexander script on a 40Gb NIC, 8 RX queues.
Hosts have 24 cores, 48 hyper threads.

echo 0 >/proc/sys/net/ipv4/tcp_autocorking

for i in `seq 0 47`
do
  for j in `seq 0 2`
  do
     netperf -H $DEST -t TCP_STREAM -l 1000 \
             -c -C -T $i,$i -P 0 -- \
             -m 64 -s 64K -D &
  done
done

Before patch : ~6Mpps and ~95% cpu usage on receiver
After patch : ~9Mpps and ~35% cpu usage on receiver.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 16:54:41 -07:00
Johan Hedberg 196332f5a1 Bluetooth: Fix allowing SMP Signing info PDU
If the remote side is not distributing its IRK but is distributing the
CSRK the next PDU after master identification is the Signing
Information. This patch fixes a missing SMP_ALLOW_CMD() for this in the
smp_cmd_master_ident() function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-10 01:45:01 +02:00
Jeff Layton e0b93eddfe security: make security_file_set_fowner, f_setown and __f_setown void return
security_file_set_fowner always returns 0, so make it f_setown and
__f_setown void return functions and fix up the error handling in the
callers.

Cc: linux-security-module@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-09-09 16:01:36 -04:00
Li RongQing e403aded79 openvswitch: change the data type of error status to atomic_long_t
Change the date type of error status from u64 to atomic_long_t, and use atomic
operation, then remove the lock which is used to protect the error status.

The operation of atomic maybe faster than spin lock.

Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 11:48:07 -07:00
Rami Rosen 5aaa62d608 bridge: Cleanup of unncessary check.
This patch removes an unncessary check in the br_afspec() method of
br_netlink.c.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 11:32:11 -07:00
Jiri Pirko 1332351617 bridge: implement rtnl_link_ops->changelink
Allow rtnetlink users to set bridge master info via IFLA_INFO_DATA attr
This initial part implements forward_delay, hello_time, max_age options.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 11:29:55 -07:00
Jiri Pirko e5c3ea5c66 bridge: implement rtnl_link_ops->get_size and rtnl_link_ops->fill_info
Allow rtnetlink users to get bridge master info in IFLA_INFO_DATA attr
This initial part implements forward_delay, hello_time, max_age options.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 11:29:55 -07:00
Jiri Pirko 3ac636b859 bridge: implement rtnl_link_ops->slave_changelink
Allow rtnetlink users to set port info via IFLA_INFO_SLAVE_DATA attr

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 11:29:55 -07:00
Jiri Pirko ced8283f90 bridge: implement rtnl_link_ops->get_slave_size and rtnl_link_ops->fill_slave_info
Allow rtnetlink users to get port info in IFLA_INFO_SLAVE_DATA attr

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 11:29:55 -07:00
Jiri Pirko 0f49579a39 bridge: switch order of rx_handler reg and upper dev link
The thing is that netdev_master_upper_dev_link calls
call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev). That generates rtnl
link message and during that, rtnl_link_ops->fill_slave_info is called.
But with current ordering, rx_handler and IFF_BRIDGE_PORT are not set
yet so there would have to be check for that in fill_slave_info callback.

Resolve this by reordering to similar what bonding and team does to
avoid the check.

Also add removal of IFF_BRIDGE_PORT flag into error path.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 11:29:54 -07:00
John W. Linville ab09b95cbf Two more fixes for mac80211 - one of them addresses a long-standing
issue that we only found when using vendor events more frequently;
 the other addresses some bad information being reported in userspace
 that people were starting to actually look at.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUDcHCAAoJEDBSmw7B7bqr4wAQALogLihe4uESol038EXLA6yi
 NqoDQLUKtpcbWzhZKQZlOf7W53ggIXHUeIYSS2SXpn9dOX/9VniA5MkzbJK70uCf
 oVaXkSHujFE7JkLZ5Pusto4WtCLSfmlWJTRt3SN2BEpGcW0boYM8FxVuKBMtIj7s
 H56eVcTSCWCQrMbCfQcS2pDkkRLQL2MYmA0CAT/Hbxy7KjBYuxMJvXz9HsoTKotj
 Zj86UzisZ2QJSyxRtV5v7Z95LTcQtRKCKQp1kIV+64Q7c/ZOeTZ6l//52MqUhLpH
 vIwfvAcW02iCWrN8d/lulkAKPfw4RCPvoEWt9sIsp9WQjwWhrsBXLocw6XiuAFHZ
 j2EGoZvOBJV43FQOSw9Hli232QkwHh2QTiLEObNaVbG0wTMuEfRcjIjKsSpJ/WRq
 HfSGzg32X4XLPVh2EJ5n7qbChPbTzgBv+ydU4ApESHFUJHmLPWrbKNgTEs14llpr
 oIM+QVFA4o3vukaZIL/wrPk9dfbTSeEOHvpLeJZ2BCqOko8WyitZqoD54NeaPthf
 u/S4QzeRifdHB3gPWI1NNIC5jAgkgA9Zy0a+4xZP75bSBPfPB4PeM9z5yXdv+y6G
 mT9KWoTUltTBoHRwDceb4BHOYXX4xjwZSfcWwjTs1q4A83a9PQidkG2RelJJEiGQ
 8WIVmGSlf6MpD2mZBnJc
 =gW3Y
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-john-2014-09-08' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg <johannes@sipsolutions.net> says:

"Two more fixes for mac80211 - one of them addresses a long-standing
issue that we only found when using vendor events more frequently;
the other addresses some bad information being reported in userspace
that people were starting to actually look at."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-09-09 14:29:36 -04:00
Vincent Bernat 49a601589c net/ipv4: bind ip_nonlocal_bind to current netns
net.ipv4.ip_nonlocal_bind sysctl was global to all network
namespaces. This patch allows to set a different value for each
network namespace.

Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 11:27:09 -07:00
Arturo Borrero 9ba1f726be netfilter: nf_tables: add new nft_masq expression
The nft_masq expression is intended to perform NAT in the masquerade flavour.

We decided to have the masquerade functionality in a separated expression other
than nft_nat.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:30 +02:00
Arturo Borrero be6b635cd6 netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables
Let's refactor the code so we can reach the masquerade functionality
from outside the xt context (ie. nftables).

The patch includes the addition of an atomic counter to the masquerade
notifier: the stuff to be done by the notifier is the same for xt and
nftables. Therefore, only one notification handler is needed.

This factorization only involves IPv6; a similar patch exists to
handle IPv4.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:29 +02:00
Arturo Borrero 8dd33cc93e netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables
Let's refactor the code so we can reach the masquerade functionality
from outside the xt context (ie. nftables).

The patch includes the addition of an atomic counter to the masquerade
notifier: the stuff to be done by the notifier is the same for xt and
nftables. Therefore, only one notification handler is needed.

This factorization only involves IPv4; a similar patch follows to
handle IPv6.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:29 +02:00
Nicolas Dichtel c55fbbb4a7 netfilter: ebtables: create audit records for replaces
This is already done for x_tables (family AF_INET and AF_INET6), let's
do it for AF_BRIDGE also.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:28 +02:00
Arturo Borrero e42eff8a32 netfilter: nft_nat: include a flag attribute
Both SNAT and DNAT (and the upcoming masquerade) can have additional
configuration parameters, such as port randomization and NAT addressing
persistence. We can cover these scenarios by simply adding a flag
attribute for userspace to fill when needed.

The flags to use are defined in include/uapi/linux/netfilter/nf_nat.h:

 NF_NAT_RANGE_MAP_IPS
 NF_NAT_RANGE_PROTO_SPECIFIED
 NF_NAT_RANGE_PROTO_RANDOM
 NF_NAT_RANGE_PERSISTENT
 NF_NAT_RANGE_PROTO_RANDOM_FULLY
 NF_NAT_RANGE_PROTO_RANDOM_ALL

The caller must take care of not messing up with the flags, as they are
added unconditionally to the final resulting nf_nat_range.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:27 +02:00
Arturo Borrero b9ac12ef09 netfilter: nf_tables: extend NFT_MSG_DELTABLE to support flushing the ruleset
This patch extend the NFT_MSG_DELTABLE call to support flushing the entire
ruleset.

The options now are:
 * No family speficied, no table specified: flush all the ruleset.
 * Family specified, no table specified: flush all tables in the AF.
 * Family specified, table specified: flush the given table.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:26 +02:00
Arturo Borrero ee01d54256 netfilter: nf_tables: add helpers to schedule objects deletion
This patch refactor the code to schedule objects deletion.
They are useful in follow-up patches.

In order to be able to use these new helper functions in all the code,
they are placed in the top of the file, with all the dependant functions
and symbols.

nft_rule_disactivate_next has been renamed to nft_rule_deactivate.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:25 +02:00
Bojan Prtvar c435201bed netfilter: xt_string: Remove unnecessary initialization of struct ts_state
The skb_find_text() accepts uninitialized textsearch state variable.

Signed-off-by: Bojan Prtvar <prtvar.b@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:25 +02:00
Julian Anastasov 5fcf0cf607 ipvs: reduce stack usage for sockopt data
Use union to reserve the required stack space for sockopt data
which is less than the currently hardcoded value of 128.
Now the tables for commands should be more readable.
The checks added for readability are optimized by compiler,
others warn at compile time if command uses too much
stack or exceeds the storage of set_arglen and get_arglen.

As Dan Carpenter points out, we can run for unprivileged user,
so we can silent some error messages.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
CC: Dan Carpenter <dan.carpenter@oracle.com>
CC: Andrey Utkin <andrey.krieger.utkin@gmail.com>
CC: David Binderman <dcb314@hotmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:24 +02:00
Ana Rey 3045d76070 netfilter: nf_tables: add devgroup support in meta expresion
Add devgroup support to let us match device group of a packets incoming
or outgoing interface.

Signed-off-by: Ana Rey <anarey@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:23 +02:00
Arturo Borrero ce24b7217b netfilter: nf_tables: rename nf_table_delrule_by_chain()
For the sake of homogenize the function naming scheme, let's rename
nf_table_delrule_by_chain() to nft_delrule_by_chain().

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-09 16:31:22 +02:00