Commit Graph

7970 Commits

Author SHA1 Message Date
Simon Horman 25cd9ba0ab openvswitch: Add basic MPLS support to kernel
Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.

Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.

Cc: Ravi K <rkerur@gmail.com>
Cc: Leo Alterman <lalterman@nicira.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-11-05 23:52:33 -08:00
WANG Cong 25de4668d0 ipv6: move INET6_MATCH() to include/net/inet6_hashtables.h
It is only used in net/ipv6/inet6_hashtables.c.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:59:04 -05:00
Tom Herbert b17f709a24 gue: TX support for using remote checksum offload option
Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure
remote checksum offload on an IP tunnel. Add logic in gue_build_header
to insert remote checksum offload option.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert c1aa8347e7 gue: Protocol constants for remote checksum offload
Define a private flag for remote checksun offload as well as a length
for the option.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert 5024c33ac3 gue: Add infrastructure for flags and options
Add functions and basic definitions for processing standard flags,
private flags, and control messages. This includes definitions
to compute length of optional fields corresponding to a set of flags.
Flag validation is in validate_gue_flags function. This checks for
unknown flags, and that length of optional fields is <= length
in guehdr hlen.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:03 -05:00
Tom Herbert 63487babf0 net: Move fou_build_header into fou.c and refactor
Move fou_build_header out of ip_tunnel.c and into fou.c splitting
it up into fou_build_header, gue_build_header, and fou_build_udp.
This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap
to call fou_build_header or gue_build_header based on the tunnel
encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen
functions which are called by ip_encap_hlen. New net/fou.h has
prototypes and defines for this.

Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels
can use FOU/GUE and fou module is also selected.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05 16:30:02 -05:00
Alexander Aring dee56d1477 mac802154: add support for perm_extended_addr
This patch adding support for a perm extended address. This is useful
when a device supports an eeprom with a programmed static extended address.
If a device doesn't support such eeprom or serial registers then the
driver should generate a random extended address.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:05 +01:00
Alexander Aring 705cbbbe9c mac802154: cleanup ieee802154_netdev_to_extended_addr
This patch cleanups the ieee802154_be64_to_le64 to have a similar
function like ieee802154_le64_to_be64 only with switched source and
destionation types.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:05 +01:00
Alexander Aring 239a84a9a0 mac802154: add ieee802154_le64_to_be64
This patch adds a new function to convert a le64 to a be64. This is
useful to translate an extended address to a netdev dev_addr.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:05 +01:00
Alexander Aring 7c118c1a86 mac802154: add ieee802154_vif struct
This patch adds an ieee802154_vif similar like the ieee80211_vif which
holds the interface type and maybe further more attributes like the
ieee80211_vif structure.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:04 +01:00
Alexander Aring bd28a11f25 ieee802154: remove mlme get_phy callback
This patch removes the get_phy callback from mlme ops structure. Instead
we doing a dereference via ieee802154_ptr dev pointer. For backwards
compatibility we need to run get_device after dereference wpan_phy via
ieee802154_ptr.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:04 +01:00
Alexander Aring d5ae67bacd ieee802154: rework interface registration
This patch meld mac802154_netdev_register into ieee802154_if_add
function. Also we have now only one alloc_netdev call with one interface
setup routine "ieee802154_if_setup" instead two different one for each
interface type. This patch checks via runtime the interface type and do
different handling now. Additional we add the wpan_dev struct in
ieee802154_sub_if_data and set the new ieee802154_ptr while netdev
registration. This behaviour is very similar the mac80211 netdev
registration functionality.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:04 +01:00
Alexander Aring 9f3295b9ea ieee802154: remove nl802154 unused functions
The include/net/nl802154.h file contains a lot of prototypes which are
not used inside of ieee802154 subsystem. This patch removes this file
and make the only one used prototype "ieee802154_nl_start_confirm" as
static declaration in ieee802154/nl-mac.c

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:03 +01:00
Alexander Aring 53f9ee61b4 ieee802154: rework wpan_phy index assignment
This patch reworks the wpan_phy index incrementation. It's now similar
like wireless wiphy index incrementation. We move the wpan_phy index
attribute inside of cfg802154_registered_device and use atomic
operations instead locking mechanism via wpan_phy_mutex.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-05 21:53:03 +01:00
Joe Perches 824f1fbee7 netfilter: Convert print_tuple functions to return void
Since adding a new function to seq_file (seq_has_overflowed())
there isn't any value for functions called from seq_show to
return anything.   Remove the int returns of the various
print_tuple/<foo>_print_tuple functions.

Link: http://lkml.kernel.org/p/f2e8cf8df433a197daa62cbaf124c900c708edc7.1412031505.git.joe@perches.com

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05 14:10:33 -05:00
Steven Rostedt (Red Hat) 37246a5837 netfilter: Remove return values for print_conntrack callbacks
The seq_printf() and friends are having their return values removed.
The print_conntrack() returns the result of seq_printf(), which is
meaningless when seq_printf() returns void. Might as well remove the
return values of print_conntrack() as well.

Link: http://lkml.kernel.org/r/20141029220107.465008329@goodmis.org
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05 14:09:47 -05:00
John W. Linville bf515fb11a This relatively large batch of changes is comprised of the
following:
  * large mac80211-hwsim changes from Ben, Jukka and a bit myself
  * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
    University in Prague and Volkswagen Group Research
  * minstrel VHT work from Karl
  * more CSA work from Luca
  * WMM admission control support in mac80211 (myself)
  * various smaller fixes, spelling corrections, and minor API additions
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUWMs/AAoJEDBSmw7B7bqrmBQQAIbfAe7wH1WifRtOnhw3zWQQ
 K36+Edf3HlQ+EIkSs63QousRj2e7pGDOyhzMWLaqsmeTLteUtlGbr7qwiJO1QZdf
 Ml2V5O2s+b8hUIClDBVQF2L6+GGUmRUdQqvDDhkN1guoxD/Nk8cNtsRkSdiXWJWy
 R48NzvYDflBhc8uqPtR8jDb10eM3c00YP9HB+w9hYAfizD+FRue7UNp4MQIqwp9V
 HdKRT6L2n/6QA+Mzse0rMDes5qI7nIUNgj+hjqgJSnhITPMgGR5j/pitnVHrr81M
 ngOipBFG3svsQrwZh8nM4Llp0cM4Gs+GlgCieu9+TJpr2sY00Z3kYcp0pxtDoSxz
 Wblqz9n/bnW9mrkEfl12XqwwT5vguchwHoZ9cXhejDxSawWXoTRx20uW4ahO8ArA
 kWwwjTBVsQ5WMCtOBiqggzNKghwCc2ILmcZnjGdg9aNXcWsmQ4vyeCfG2QxBz/UB
 Grv/f9NSy6mzKQ34yv+lyR7rFZ8XcT03EVAnZSYz8X0ZZGxwtFupRp1RrBh1KPtD
 TJoe6Q71FfHKYRJ2xgygYkQFo+r9d0BKBeerq+Vu2hBeaqyi4aUwSj7d1sUaaq6N
 tL8fmAUqFjVOOUFeH1g07Xke5QD+yrEC7sJKkeRMfcRGB+dEa+2m3I5p4WDz9bWM
 AEvFSsYr/I9KI4d1huXD
 =6GIj
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-john-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg <johannes@sipsolutions.net> says:

"This relatively large batch of changes is comprised of the
following:
 * large mac80211-hwsim changes from Ben, Jukka and a bit myself
 * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
   University in Prague and Volkswagen Group Research
 * minstrel VHT work from Karl
 * more CSA work from Luca
 * WMM admission control support in mac80211 (myself)
 * various smaller fixes, spelling corrections, and minor API additions"

Conflicts:
	drivers/net/wireless/ath/wil6210/cfg80211.c

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-11-04 16:18:12 -05:00
Florian Westphal f7b3bec6f5 net: allow setting ecn via routing table
This patch allows to set ECN on a per-route basis in case the sysctl
tcp_ecn is not set to 1. In other words, when ECN is set for specific
routes, it provides a tcp_ecn=1 behaviour for that route while the rest
of the stack acts according to the global settings.

One can use 'ip route change dev $dev $net features ecn' to toggle this.

Having a more fine-grained per-route setting can be beneficial for various
reasons, for example, 1) within data centers, or 2) local ISPs may deploy
ECN support for their own video/streaming services [1], etc.

There was a recent measurement study/paper [2] which scanned the Alexa's
publicly available top million websites list from a vantage point in US,
Europe and Asia:

Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely
blamed to commit 255cac91c3 ("tcp: extend ECN sysctl to allow server-side
only ECN") ;)); the break in connectivity on-path was found is about
1 in 10,000 cases. Timeouts rather than receiving back RSTs were much
more common in the negotiation phase (and mostly seen in the Alexa
middle band, ranks around 50k-150k): from 12-thousand hosts on which
there _may_ be ECN-linked connection failures, only 79 failed with RST
when _not_ failing with RST when ECN is not requested.

It's unclear though, how much equipment in the wild actually marks CE
when buffers start to fill up.

We thought about a fallback to non-ECN for retransmitted SYNs as another
global option (which could perhaps one day be made default), but as Eric
points out, there's much more work needed to detect broken middleboxes.

Two examples Eric mentioned are buggy firewalls that accept only a single
SYN per flow, and middleboxes that successfully let an ECN flow establish,
but later mark CE for all packets (so cwnd converges to 1).

 [1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
 [2] http://ecn.ethz.ch/

Joint work with Daniel Borkmann.

Reference: http://thread.gmane.org/gmane.linux.network/335797
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 16:06:09 -05:00
Florian Westphal f1673381b1 syncookies: split cookie_check_timestamp() into two functions
The function cookie_check_timestamp(), both called from IPv4/6 context,
is being used to decode the echoed timestamp from the SYN/ACK into TCP
options used for follow-up communication with the peer.

We can remove ECN handling from that function, split it into a separate
one, and simply rename the original function into cookie_decode_options().
cookie_decode_options() just fills in tcp_option struct based on the
echoed timestamp received from the peer. Anything that fails in this
function will actually discard the request socket.

While this is the natural place for decoding options such as ECN which
commit 172d69e63c ("syncookies: add support for ECN") added, we argue
that in particular for ECN handling, it can be checked at a later point
in time as the request sock would actually not need to be dropped from
this, but just ECN support turned off.

Therefore, we split this functionality into cookie_ecn_ok(), which tells
us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled.

This prepares for per-route ECN support: just looking at the tcp_ecn sysctl
won't be enough anymore at that point; if the timestamp indicates ECN
and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric.

This would mean adding a route lookup to cookie_check_timestamp(), which
we definitely want to avoid. As we already do a route lookup at a later
point in cookie_{v4,v6}_check(), we can simply make use of that as well
for the new cookie_ecn_ok() function w/o any additional cost.

Joint work with Daniel Borkmann.

Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-04 16:06:09 -05:00
Eliad Peller cf2c92d840 mac80211: replace restart_complete() with reconfig_complete()
Drivers might want to know also when mac80211 has
completed reconfiguring after resume (e.g. in order
to know when frames can be passed to mac80211).

Rename restart_complete() to a more-generic reconfig_complete(),
and add a new enum to indicate the reconfiguration type.

Update the current users with the new prototype.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 13:49:00 +01:00
Rostislav Lisovy 239281f803 mac80211: 802.11p OCB mode support
This patch adds 802.11p OCB (Outside the Context of a BSS) mode
support.

When communicating in OCB mode a mandatory wildcard BSSID
(48 '1' bits) is used.

The EDCA parameters handling function was changed to support
802.11p specific values.

The insertion of a newly discovered STAs is done in the similar way
as in the IBSS mode -- through the deferred insertion.

The OCB mode uses a periodic 'housekeeping task' for expiration of
disconnected STAs (in the similar manner as in the MESH mode).

New Kconfig option for verbose OCB debugging outputs is added.

Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 13:18:21 +01:00
Rostislav Lisovy 6e0bd6c35b cfg80211: 802.11p OCB mode handling
This patch adds new iface type (NL80211_IFTYPE_OCB) representing
the OCB (Outside the Context of a BSS) mode.
When establishing a connection to the network a cfg80211_join_ocb
function is called (particular nl80211_command is added as well).
A mandatory parameters during the ocb_join operation are 'center
frequency' and 'channel width (5/10 MHz)'.

Changes done in mac80211 are minimal possible required to avoid
many warnings (warning: enumeration value 'NL80211_IFTYPE_OCB'
not handled in switch) during compilation. Full functionality
(where needed) is added in the following patch.

Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 13:18:17 +01:00
Felix Fietkau 5b3dc42b1b mac80211: add support for driver tx power reporting
The configured tx power is often limited by hardware capabilities,
channel settings, antenna configuration, etc.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[fix tracing compilation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-11-04 10:15:09 +01:00
Marcel Holtmann 845472e8d5 Bluetooth: Add hci_conn_lookup_type() helper function
Some drivers require knowledge of what connection handle is assigned
to what connection link type (ACL or SCO/eSCO). Instead of having each
driver implement connection tracking, provide a simple helper function
for lookup of the link type.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-03 10:13:43 +02:00
Marcel Holtmann 43d6bc46e2 Bluetooth: Introduce HCI_QUIRK_STRICT_DUPLICATE_FILTER
Some vendors decide to use a strict duplicate filter policy that only
filters on Bluetooth device addresses. This means that when the RSSI
changes, these devices are not reported again. During discovery it is
useful to actually get the RSSI updates.

Since this is specific to each controller, add a new quirk setting
that allows drivers to tell the core what kind of filtering policy
the controller uses.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-03 10:13:36 +02:00
Alexander Aring f753f7eeb4 mac802154: fix byteorder issues
Running make C=2 occurs these warnings:

cast from restricted __be64
incorrect type in argument 1 (different base types)
expected unsigned long long[unsigned] [usertype] val
got restricted __be64 [usertype]<noident>
cast from restricted __be64
cast to restricted __le64

This patch fix these warnings by forcing to __le64 type and using swabp64
instead swab64.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 21:52:02 +01:00
Marcel Holtmann 75e0569f7f Bluetooth: Add hci_reset_dev() for driver triggerd stack reset
Some Bluetooth drivers require to reset the upper stack. To avoid having
all drivers send HCI Hardware Error events, provide a generic function
to wrap the reset functionality.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-02 10:03:45 +02:00
Marcel Holtmann 24dfa34371 Bluetooth: Print error message for HCI_Hardware_Error event
When the HCI_Hardware_Error event is send by the controller or
injected by the driver, then at least print an error message.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-11-02 09:59:42 +02:00
Alexander Aring ab24f50f2a mac802154: add helper for converting dev_addr
This patch adds a helper for converting the dev_addr attribute in
netdevice to __le64 type. The dev_addr attribute is a char pointer
and contains the extended address in big endian byte order.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:07 +01:00
Alexander Aring 4a9a816a4f cfg802154: convert deprecated iface add and del
This patch removes the wpan_phy callbacks for add and del an interface
on a phy. Instead we introduce deprecated cfg802154 callbacks for this.
Furthermore we introduce a new netlink interface nl802154 which use
different callbacks. The deprecated function is to have a backwards
compatibility with the current netlink interface.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:06 +01:00
Alexander Aring a5dd1d72d8 cfg802154: introduce cfg802154_registered_device
This patch introduce the cfg802154_registered_device struct. Like
cfg80211_registered_device in wireless this should contain similar
functionality for cfg802154. This patch should not change any behaviour.
We just adds cfg802154_registered_device as container for wpan_phy struct.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-11-02 04:51:06 +01:00
David S. Miller 55b42b5ca2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/marvell.c

Simple overlapping changes in drivers/net/phy/marvell.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-01 14:53:27 -04:00
David S. Miller e3a88f9c4f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter/ipvs fixes for net

The following patchset contains fixes for netfilter/ipvs. This round of
fixes is larger than usual at this stage, specifically because of the
nf_tables bridge reject fixes that I would like to see in 3.18. The
patches are:

1) Fix a null-pointer dereference that may occur when logging
   errors. This problem was introduced by 4a4739d56b ("ipvs: Pull
   out crosses_local_route_boundary logic") in v3.17-rc5.

2) Update hook mask in nft_reject_bridge so we can also filter out
   packets from there. This fixes 36d2af5 ("netfilter: nf_tables: allow
   to filter from prerouting and postrouting"), which needs this chunk
   to work.

3) Two patches to refactor common code to forge the IPv4 and IPv6
   reject packets from the bridge. These are required by the nf_tables
   reject bridge fix.

4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject
   packets from the bridge. The idea is to forge the reject packets and
   inject them to the original port via br_deliver() which is now
   exported for that purpose.

5) Restrict nft_reject_bridge to bridge prerouting and input hooks.
   the original skbuff may cloned after prerouting when the bridge stack
   needs to flood it to several bridge ports, it is too late to reject
   the traffic.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31 12:29:42 -04:00
Pablo Neira Ayuso 8bfcdf6671 netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:

* nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_ip6hdr_put(): to build the IPv6 header.
* nf_reject_ip6_tcphdr_put(): to build the TCP header.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:49:57 +01:00
Pablo Neira Ayuso 052b9498ee netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions
That can be reused by the reject bridge expression to build the reject
packet. The new functions are:

* nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header.
* nf_reject_iphdr_put(): to build the IPv4 header.
* nf_reject_ip_tcphdr_put(): to build the TCP header.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-31 12:49:05 +01:00
Ben Hutchings 5188cd44c5 drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets
UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway.  Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 20:01:18 -04:00
Guenter Roeck 3d762a0f0a net: dsa: Add support for reading switch registers with ethtool
Add support for reading switch registers with 'ethtool -d'.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 14:54:11 -04:00
Guenter Roeck 6793abb4e8 net: dsa: Add support for switch EEPROM access
On some chips it is possible to access the switch eeprom.
Add infrastructure support for it.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 14:54:11 -04:00
Guenter Roeck 51579c3f1a net: dsa: Add support for reporting switch chip temperatures
Some switches provide chip temperature data.
Add support for reporting it through the hwmon subsystem.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 14:54:11 -04:00
Alexander Aring ec718f3db9 mac802154: rx: add software checksum filtering check
This patch adds a new hardware flag which indicate that the transceiver
doesn't support check for bad checksum via hardware. Also add a handling of
this while receive.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:45 +01:00
Alexander Aring 90386a7e3b mac802154: separate omit tx/rx flags
This patch splits the IEEE802154_HW_OMIT_CKSUM hardware flag into
IEEE802154_HW_TX_OMIT_CKSUM and IEEE802154_HW_RX_OMIT_CKSUM. This is
useful to deliver the received crc from the driver layer to the monitor
interface. At the moment we can't do that without change the xmit
handling.

The received checksum should be visible in monitor mode only.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:45 +01:00
Alexander Aring 94b792220c mac802154: add support for promiscuous mode
This patch adds a new driver operation to bring the transceiver into
promiscuous mode.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:45 +01:00
Alexander Aring c8fc84ed60 mac802154: add hardware address filter flag
Overdue introduction for address filtering hardware flag. Furthermore we
will check and set address filtering on interface up. This patch
prepares that we can check if an transceiver supports address filtering
option. Currently all mainline driver supports hardware address filtering.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Cc: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:44 +01:00
Alexander Aring ab79be3eeb mac802154: add IEEE802154_HW_ARET hw flag
This patch adds a new IEEE802154_HW_ARET hardware flag for indicating
that the transceiver supports ARET handling. Also remove the
IEEE802154_HW_FRAME_RETRIES from IEEE802154_HW_CSMA flag. Frame retries
handling is part of ARET.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:44 +01:00
Alexander Aring 90a6161df5 mac802154: remove tab after define
This patch removes tabs after define in hardware flags declarations.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-29 23:07:44 +01:00
Nicolas Dichtel 75fbfd3323 neigh: optimize neigh_parms_release()
In neigh_parms_release() we loop over all entries to find the entry given in
argument and being able to remove it from the list. By using a double linked
list, we can avoid this loop.

Here are some numbers with 30 000 dummy interfaces configured:

Before the patch:
$ time rmmod dummy
real	2m0.118s
user	0m0.000s
sys	1m50.048s

After the patch:
$ time rmmod dummy
real	1m9.970s
user	0m0.000s
sys	0m47.976s

Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 16:11:50 -04:00
Eric Dumazet dca145ffaa tcp: allow for bigger reordering level
While testing upcoming Yaogong patch (converting out of order queue
into an RB tree), I hit the max reordering level of linux TCP stack.

Reordering level was limited to 127 for no good reason, and some
network setups [1] can easily reach this limit and get limited
throughput.

Allow a new max limit of 300, and add a sysctl to allow admins to even
allow bigger (or lower) values if needed.

[1] Aggregation of links, per packet load balancing, fabrics not doing
 deep packet inspections, alternative TCP congestion modules...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29 15:05:15 -04:00
Alexander Aring a543c5989d mac802154: remove driver ops in wpan-phy
This patch removes the driver ops callbacks inside of wpan_phy struct.
It was used to check if a phy supports this driver ops call. We do this
now via hardware flags.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:07 +01:00
Alexander Aring e37d2ec82a mac802154: ops: declare channel and page as u8
The range of channel and page fits into an unsigned byte range. This
patch changes the set_channel parameter definitions for channel and
page to u8.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:07 +01:00
Alexander Aring 1630186100 mac802154: declare struct ieee802154_ops as const
The ieee802154_ops structure should be never changed during runtime.
This patch declare this structure as const to avoid a runtime change.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-28 23:19:07 +01:00
Simon Horman 941d8ebcf7 datapath: Rename last_action() as nla_is_last() and move to netlink.h
The original motivation for this change was to allow the helper to be used
in files other than actions.c as part of work on an odp select group
action.

It was as pointed out by Thomas Graf that this helper would be best off
living in netlink.h. Furthermore, I think that the generic nature of this
helper means it is best off in netlink.h regardless of if it is used more
than one .c file or not. Thus, I would like it considered independent of
the work on an odp select group action.

Cc: Thomas Graf <tgraf@suug.ch>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:07:29 -04:00
Peter Zijlstra 26cabd3125 sched, net: Clean up sk_wait_event() vs. might_sleep()
WARNING: CPU: 1 PID: 1744 at kernel/sched/core.c:7104 __might_sleep+0x58/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff81070e10>] prepare_to_wait+0x50 /0xa0

 [<ffffffff8105bc38>] __might_sleep+0x58/0x90
 [<ffffffff8148c671>] lock_sock_nested+0x31/0xb0
 [<ffffffff81498aaa>] sk_stream_wait_memory+0x18a/0x2d0

Which is a false positive because sk_wait_event() will already have
TASK_RUNNING at that point if it would've gone through
schedule_timeout().

So annotate with sched_annotate_sleep(); which goes away on !DEBUG builds.

Reported-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140924082242.524407432@infradead.org
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: netdev@vger.kernel.org
Cc: tglx@linutronix.de
Cc: ilya.dryomov@inktank.com
Cc: umgwanakikbuti@gmail.com
Cc: oleg@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:56:37 +01:00
Arturo Borrero e9105f1bea netfilter: nf_tables: add new expression nft_redir
This new expression provides NAT in the redirect flavour, which is to
redirect packets to local machine.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:49:39 +01:00
Arturo Borrero 9de920eddb netfilter: refactor NAT redirect IPv6 code to use it from nf_tables
This patch refactors the IPv6 code so it can be usable both from xt and
nf_tables.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:48:10 +01:00
Arturo Borrero 8b13eddfdf netfilter: refactor NAT redirect IPv4 to use it from nf_tables
This patch refactors the IPv4 code so it can be usable both from xt and
nf_tables.

A similar patch follows-up to handle IPv6.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-27 22:47:06 +01:00
Fabian Frederick 5e96d788d9 ipx: move extern sysctl_ipx_pprop_broadcasting to header file
include ipx.h from sysctl_net_ipx.c

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27 16:03:53 -04:00
Alexander Aring c5c47e67bc mac802154: rx: use tasklet instead workqueue
Tasklets have much less overhead than workqueues. This patch also
removes the heap allocation for the worker on receiving path.
Like mac80211 we should prefer use a tasklet here instead a workqueue to
getting fast out of interrupt context when ieee802154_rx_irqsafe is
called by driver. Like wireless inside the tasklet context we should
call netif_receive_skb instead netif_rx_ni anymore.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 18:07:40 +01:00
Martin Townsend 01141234f2 ieee802154: 6lowpan: rename process_data and lowpan_process_data
As we have decouple decompression from data delivery we can now rename all
occurences of process_data in receive path.

Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 15:51:16 +01:00
Martin Townsend f8b361768e 6lowpan: remove skb_deliver from IPHC
Separating skb delivery from decompression ensures that we can support further
decompression schemes and removes the mixed return value of error codes with
NET_RX_FOO.

Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-27 15:51:15 +01:00
Ben Greear e8f479b112 cfg80211: support configuring vif mac addr on create
This is useful when creating virtual interfaces.
Keeps udev from mucking with things it shouldn't, since
the default MAC is never seen by udev when specified on
the cmd-line during creation.

Signed-off-by: Ben Greear <greearb@candelatech.com>
[check for feature flag in nl80211 to force drivers to set it]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:33 +01:00
Ben Greear e27513fbd0 mac80211: support creating wiphy w/out creating wlanX
This will be helpful when using the mac80211_hwsim
wiphys and automated testing.  Let user create the
vifs as needed, and named as expected.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:32 +01:00
Ben Greear ad28757eef mac80211: allow creating wiphy devices with suggested name
Support creating wiphy devices with an optional name.
This will be used by hwsim to have better automated control
over virtual radio creation/deletion.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:31 +01:00
Ben Greear 1998d90ad4 cfg80211: support creating wiphy with suggested name
Kernel will attempt to use the name if it is supplied,
but if name cannot be used for some reason, the default
phyX name will be used instead.

Signed-off-by: Ben Greear <greearb@candelatech.com>
[while at it, use wiphy_name() instead of dev_name(),
 fix format string issue reported by Kees Cook]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-27 08:48:18 +01:00
Alexander Aring 1e7283a271 mac802154: tx: add comment at sync xmit callback
This patch adds a warning that xmit_sync callback is deprecated and
should be removed soon. The 802.15.4 subsystem will not accept synced
drivers anymore.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:05 +01:00
Alexander Aring ed0a5dce0c mac802154: tx: add support for xmit_async callback
This patch renames the existsing xmit callback to xmit_sync and
introduces an asynchronous xmit_async function. If ieee802154_ops
doesn't provide the xmit_async callback, then we have a fallback to
the xmit_sync callback.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:04 +01:00
Alexander Aring c208510351 mac802154: add netdev qeue helpers
This patch adds a new file net/mac802154/util.c which contains utility
functions for drivers, etc. This file contains functions to start and
stop queues for all virtual interfaces, this is useful for asynchronous
handling by driver level.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:03 +01:00
Alexander Aring dc67c6b30f mac802154: tx: remove xmit channel context switch
This patch removes the channel hopping feature before xmit. There are
several issues to provide a real channel hopping (timing requirements,
etc...).

We don't have any known kernelspace protocol which really use this
feature. And I don't know an real user of this feature.
We simply drop this feature now.

This patch removes also the hold of pib lock which isn't needed by any
real driver xmit callback implementation.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-26 17:24:03 +01:00
Alexander Aring c6f635faf3 mac802154: remove ieee802154_addr from driver_ops
This driver_ops callback function is never used by any driver.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:55:39 +02:00
Alexander Aring 5a50439775 ieee802154: rename ieee802154_dev to ieee802154_hw
The identical struct of the wireless stack implementation is named
ieee80211_hw. This is useful to name the variable hw instead of get
confusing with netdev dev variable.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:55:37 +02:00
Alexander Aring 4ca24aca55 ieee802154: move ieee802154 header
This patch moves the ieee802154 header into include/linux instead
include/net. Similar like wireless which have the ieee80211 header
inside of include/linux.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:39:57 +02:00
Alexander Aring 5ad60d3699 ieee802154: move wpan-phy.h to cfg802154.h
The wpan-phy header contains the wpan_phy struct information. Later this
header will be have similar function like cfg80211 header. The cfg80211
header contains the wiphy struct which is identically the wpan_phy
struct inside 802.15.4 subsystem.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 21:39:56 +02:00
Alexander Aring e72740d057 ieee802154: wpan-phy: use blank line after function
This patch adds a blank line after function declaration.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 08:07:30 +02:00
Alexander Aring 4896167d97 ieee802154: wpan-phy: change to __aligned(size)
This patch fix a checkpatch warning that __aligned(size) is preferred
over __attribute__((aligned(size))).

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 08:07:30 +02:00
Alexander Aring 57205c14ca mac802154: fix typo IEEE802515 to IEEE802154
This patch fixs a typo in address filter defines from IEEE802515 to
IEEE802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 08:07:30 +02:00
Alexander Aring b3020f0a35 ieee802154: mac802154: remove FSF address
This patch removes the FSF address in files which belongs to ieee802154
and mac802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Alan Ott <alan@signal11.us>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-10-25 08:07:30 +02:00
Alfonso Acosta 89cbb0638e Bluetooth: Defer connection-parameter removal when unpairing
Systematically removing the LE connection parameters and autoconnect
action is inconvenient for rebonding without disconnecting from
userland (i.e. unpairing followed by repairing without
disconnecting). The parameters will be lost after unparing and
userland needs to take care of book-keeping them and re-adding them.

This patch allows userland to forget about parameter management when
rebonding without disconnecting. It defers clearing the connection
parameters when unparing without disconnecting, giving a chance of
keeping the parameters if a repairing happens before the connection is
closed.

Signed-off-by: Alfonso Acosta <fons@spotify.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-25 07:56:24 +02:00
Alfonso Acosta fd45ada910 Bluetooth: Include ADV_IND report in Device Connected event
There are scenarios when autoconnecting to a device after the
reception of an ADV_IND report (action 0x02), in which userland
might want to examine the report's contents.

For instance, the Service Data might have changed and it would be
useful to know ahead of time before starting any GATT procedures.
Also, the ADV_IND may contain Manufacturer Specific data which would
be lost if not propagated to userland. In fact, this patch results
from the need to rebond with a device lacking persistent storage which
notifies about losing its LTK in ADV_IND reports.

This patch appends the ADV_IND report which triggered the
autoconnection to the EIR Data in the Device Connected event.

Signed-off-by: Alfonso Acosta <fons@spotify.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-25 07:56:24 +02:00
Alfonso Acosta 48ec92fa4f Bluetooth: Refactor arguments of mgmt_device_connected
The values of a lot of the mgmt_device_connected() parameters come
straight from a hci_conn object. We can simplify the function by passing
the full hci_conn pointer to it.

Signed-off-by: Alfonso Acosta <fons@spotify.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-10-25 07:56:23 +02:00
Sébastien Barré 16704b129b Removed unused function sctp_addr_is_valid()
sctp_addr_is_valid() only appeared in its definition.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24 00:37:21 -04:00
Kenjiro Nakayama 105970f608 net: Remove trailing whitespace in tcp.h icmp.c syncookies.c
Remove trailing whitespace in tcp.h icmp.c syncookies.c

Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-24 00:13:10 -04:00
Arik Nemtsov 0fc1e0495f mac80211: expose API allowing station iteration
Allow drivers to iterate all stations currently uploaded to them.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-23 20:40:03 +02:00
Arik Nemtsov 8b94148cfe mac80211: expose TDLS-initiator value to low level driver
Some drivers need to know which station is the TDLS link initiator.
Expose this value via the mac80211 ieee80211_sta structure.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-23 20:40:02 +02:00
Andrei Otcheretianski a7f3a76828 mac80211: export IE splitting function
Export ieee80211_ie_split function, so it can be reused by
drivers which need to insert additional elements.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-23 20:39:43 +02:00
Johannes Berg 02219b3abc mac80211: add WMM admission control support
Use the currently existing APIs between mac80211 and the low
level driver to implement WMM admission control.

The low level driver needs to report the media time used by
each transmitted packet in ieee80211_tx_status. Based on that
information, mac80211 will modify the QoS parameters of the
admission controlled Access Category when the limit is
reached. Once the original QoS parameters can be restored,
mac80211 will do so.

One issue with this approach is that management frames will
also erroneously be downgraded, but the upside is that the
implementation is simple. In the future, it can be extended
to driver- or device-based implementations that are better.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-22 10:42:09 +02:00
Johannes Berg 723e73acd1 cfg80211: make WMM TSPEC support flag an nl80211 feature flag
During the review of the corresponding wpa_supplicant patches we
noticed that the only way for it to detect that this functionality
is supported currently is to check for the command support. This
can be misleading though, as the command was also designed to, in
the future, support pure 802.11 TSPECs.

Expose the WMM-TSPEC feature flag to nl80211 so later we can also
expose an 802.11-TSPEC feature flag (if needed) to differentiate
the two cases.

Note: this change isn't needed in 3.18 as there's no driver there
yet that supports the functionality at all.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-22 10:41:49 +02:00
David S. Miller ce8ec48967 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter fixes for net

The following patchset contains netfilter fixes for your net tree,
they are:

1) Fix missing MODULE_LICENSE() in the new nf_reject_ipv{4,6} modules.

2) Restrict nat and masq expressions to the nat chain type. Otherwise,
   users may crash their kernel if they attach a nat/masq rule to a non
   nat chain.

3) Fix hook validation in nft_compat when non-base chains are used.
   Basically, initialize hook_mask to zero.

4) Make sure you use match/targets in nft_compat from the right chain
   type. The existing validation relies on the table name which can be
   avoided by

5) Better netlink attribute validation in nft_nat. This expression has
   to reject the configuration when no address and proto configurations
   are specified.

6) Interpret NFTA_NAT_REG_*_MAX if only if NFTA_NAT_REG_*_MIN is set.
   Yet another sanity check to reject incorrect configurations from
   userspace.

7) Conditional NAT attribute dumping depending on the existing
   configuration.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-20 11:57:47 -04:00
Jouni Malinen 988568669d cfg80211: Specify frame and reason code for NL80211_CMD_DEL_STATION
The optional NL80211_ATTR_MGMT_SUBTYPE and NL80211_ATTR_REASON_CODE
attributes can now be included in NL80211_CMD_DEL_STATION to indicate to
the driver which frame (Deauthentication/Disassociation) and reason code
in that frame should be used to indicate removal to the specific
station. This is used by drivers that implement AP SME and generate
those frames internally.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-20 16:39:23 +02:00
Jouni Malinen 89c771e5a6 cfg80211: Convert del_station() callback to use a param struct
This makes it easier to add new parameters for the del_station calls
without having to modify all drivers that use this.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-20 16:24:21 +02:00
Linus Torvalds e25b492741 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A quick batch of bug fixes:

  1) Fix build with IPV6 disabled, from Eric Dumazet.

  2) Several more cases of caching SKB data pointers across calls to
     pskb_may_pull(), thus referencing potentially free'd memory.  From
     Li RongQing.

  3) DSA phy code tests operation presence improperly, instead of going:

        if (x->ops->foo)
                r = x->ops->foo(args);

     it was going:

        if (x->ops->foo(args))
                r = x->ops->foo(args);

   Fix from Andew Lunn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  Net: DSA: Fix checking for get_phy_flags function
  ipv6: fix a potential use after free in sit.c
  ipv6: fix a potential use after free in ip6_offload.c
  ipv4: fix a potential use after free in gre_offload.c
  tcp: fix build error if IPv6 is not enabled
2014-10-19 11:41:57 -07:00
Eric Dumazet 815afe1785 tcp: fix build error if IPv6 is not enabled
$ make M=net/ipv4
  CC      net/ipv4/route.o
In file included from net/ipv4/route.c:102:0:
include/net/tcp.h: In function ‘tcp_v6_iif’:
include/net/tcp.h:738:32: error: ‘union <anonymous>’ has no member named ‘h6’
  return TCP_SKB_CB(skb)->header.h6.iif;

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 870c315138 ("ipv6: introduce tcp_v6_iif()")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-18 13:01:36 -04:00
Linus Torvalds 2e923b0251 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Include fixes for netrom and dsa (Fabian Frederick and Florian
    Fainelli)

 2) Fix FIXED_PHY support in stmmac, from Giuseppe CAVALLARO.

 3) Several SKB use after free fixes (vxlan, openvswitch, vxlan,
    ip_tunnel, fou), from Li ROngQing.

 4) fec driver PTP support fixes from Luwei Zhou and Nimrod Andy.

 5) Use after free in virtio_net, from Michael S Tsirkin.

 6) Fix flow mask handling for megaflows in openvswitch, from Pravin B
    Shelar.

 7) ISDN gigaset and capi bug fixes from Tilman Schmidt.

 8) Fix route leak in ip_send_unicast_reply(), from Vasily Averin.

 9) Fix two eBPF JIT bugs on x86, from Alexei Starovoitov.

10) TCP_SKB_CB() reorganization caused a few regressions, fixed by Cong
    Wang and Eric Dumazet.

11) Don't overwrite end of SKB when parsing malformed sctp ASCONF
    chunks, from Daniel Borkmann.

12) Don't call sock_kfree_s() with NULL pointers, this function also has
    the side effect of adjusting the socket memory usage.  From Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits)
  bna: fix skb->truesize underestimation
  net: dsa: add includes for ethtool and phy_fixed definitions
  openvswitch: Set flow-key members.
  netrom: use linux/uaccess.h
  dsa: Fix conversion from host device to mii bus
  tipc: fix bug in bundled buffer reception
  ipv6: introduce tcp_v6_iif()
  sfc: add support for skb->xmit_more
  r8152: return -EBUSY for runtime suspend
  ipv4: fix a potential use after free in fou.c
  ipv4: fix a potential use after free in ip_tunnel_core.c
  hyperv: Add handling of IP header with option field in netvsc_set_hash()
  openvswitch: Create right mask with disabled megaflows
  vxlan: fix a free after use
  openvswitch: fix a use after free
  ipv4: dst_entry leak in ip_send_unicast_reply()
  ipv4: clean up cookie_v4_check()
  ipv4: share tcp_v4_save_options() with cookie_v4_check()
  ipv4: call __ip_options_echo() in cookie_v4_check()
  atm: simplify lanai.c by using module_pci_driver
  ...
2014-10-18 09:31:37 -07:00
Florian Fainelli a28205437b net: dsa: add includes for ethtool and phy_fixed definitions
net/dsa/slave.c uses functions and structures declared in phy_fixed.h
but does not explicitely include it, while dsa.h needs structure
declarations for 'struct ethtool_wolinfo' and 'struct ethtool_eee', fix
those by including the correct header files.

Fixes: ec9436baed ("net: dsa: allow drivers to do link adjustment")
Fixes: ce31b31c68 ("net: dsa: allow updating fixed PHY link information")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 23:54:46 -04:00
Eric Dumazet 870c315138 ipv6: introduce tcp_v6_iif()
Commit 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line
misses") added a regression for SO_BINDTODEVICE on IPv6.

This is because we still use inet6_iif() which expects that IP6 control
block is still at the beginning of skb->cb[]

This patch adds tcp_v6_iif() helper and uses it where necessary.

Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif
parameter to it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 23:48:07 -04:00
Cong Wang 461b74c391 ipv4: clean up cookie_v4_check()
We can retrieve opt from skb, no need to pass it as a parameter.
And opt should always be non-NULL, no need to check.

Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Tested-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 12:02:57 -04:00
Cong Wang e25f866fbc ipv4: share tcp_v4_save_options() with cookie_v4_check()
cookie_v4_check() allocates ip_options_rcu in the same way
with tcp_v4_save_options(), we can just make it a helper function.

Cc: Krzysztof Kolasa <kkolasa@winsoft.pl>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-17 12:02:57 -04:00
Nicolas Dichtel 2c6ba4b15b netlink: fix description of portid
Avoid confusion between pid and portid.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-16 14:52:35 -04:00
Linus Torvalds 0429fbc0bd Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
 "Way back, before the current percpu allocator was implemented, static
  and dynamic percpu memory areas were allocated and handled separately
  and had their own accessors.  The distinction has been gone for many
  years now; however, the now duplicate two sets of accessors remained
  with the pointer based ones - this_cpu_*() - evolving various other
  operations over time.  During the process, we also accumulated other
  inconsistent operations.

  This pull request contains Christoph's patches to clean up the
  duplicate accessor situation.  __get_cpu_var() uses are replaced with
  with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

  Unfortunately, the former sometimes is tricky thanks to C being a bit
  messy with the distinction between lvalues and pointers, which led to
  a rather ugly solution for cpumask_var_t involving the introduction of
  this_cpu_cpumask_var_ptr().

  This converts most of the uses but not all.  Christoph will follow up
  with the remaining conversions in this merge window and hopefully
  remove the obsolete accessors"

* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
  irqchip: Properly fetch the per cpu offset
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
  ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
  Revert "powerpc: Replace __get_cpu_var uses"
  percpu: Remove __this_cpu_ptr
  clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
  sparc: Replace __get_cpu_var uses
  avr32: Replace __get_cpu_var with __this_cpu_write
  blackfin: Replace __get_cpu_var uses
  tile: Use this_cpu_ptr() for hardware counters
  tile: Replace __get_cpu_var uses
  powerpc: Replace __get_cpu_var uses
  alpha: Replace __get_cpu_var
  ia64: Replace __get_cpu_var uses
  s390: cio driver &__get_cpu_var replacements
  s390: Replace __get_cpu_var uses
  mips: Replace __get_cpu_var uses
  MIPS: Replace __get_cpu_var uses in FPU emulator.
  arm: Replace __this_cpu_ptr with raw_cpu_ptr
  ...
2014-10-15 07:48:18 +02:00
Li RongQing 02ea80741a ipv6: remove aca_lock spinlock from struct ifacaddr6
no user uses this lock.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 13:15:15 -04:00
Daniel Borkmann b69040d8e3 net: sctp: fix panic on duplicate ASCONF chunks
When receiving a e.g. semi-good formed connection scan in the
form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---------------- ASCONF_a; ASCONF_b ----------------->

... where ASCONF_a equals ASCONF_b chunk (at least both serials
need to be equal), we panic an SCTP server!

The problem is that good-formed ASCONF chunks that we reply with
ASCONF_ACK chunks are cached per serial. Thus, when we receive a
same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
not need to process them again on the server side (that was the
idea, also proposed in the RFC). Instead, we know it was cached
and we just resend the cached chunk instead. So far, so good.

Where things get nasty is in SCTP's side effect interpreter, that
is, sctp_cmd_interpreter():

While incoming ASCONF_a (chunk = event_arg) is being marked
!end_of_packet and !singleton, and we have an association context,
we do not flush the outqueue the first time after processing the
ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
queued up, although we set local_cork to 1. Commit 2e3216cd54
changed the precedence, so that as long as we get bundled, incoming
chunks we try possible bundling on outgoing queue as well. Before
this commit, we would just flush the output queue.

Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
continue to process the same ASCONF_b chunk from the packet. As
we have cached the previous ASCONF_ACK, we find it, grab it and
do another SCTP_CMD_REPLY command on it. So, effectively, we rip
the chunk->list pointers and requeue the same ASCONF_ACK chunk
another time. Since we process ASCONF_b, it's correctly marked
with end_of_packet and we enforce an uncork, and thus flush, thus
crashing the kernel.

Fix it by testing if the ASCONF_ACK is currently pending and if
that is the case, do not requeue it. When flushing the output
queue we may relink the chunk for preparing an outgoing packet,
but eventually unlink it when it's copied into the skb right
before transmission.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 12:46:22 -04:00
Daniel Borkmann 9de7922bc7 net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
Commit 6f4c618ddb ("SCTP : Add paramters validity check for
ASCONF chunk") added basic verification of ASCONF chunks, however,
it is still possible to remotely crash a server by sending a
special crafted ASCONF chunk, even up to pre 2.6.12 kernels:

skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
 end:0x440 dev:<NULL>
 ------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
 <IRQ>
 [<ffffffff8144fb1c>] skb_put+0x5c/0x70
 [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
 [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
 [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
 [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
 [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
 [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
 [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
 [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
 [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
 [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
 [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
 [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
 [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
 [<ffffffff81497078>] ip_local_deliver+0x98/0xa0
 [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
 [<ffffffff81496ac5>] ip_rcv+0x275/0x350
 [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
 [<ffffffff81460588>] netif_receive_skb+0x58/0x60

This can be triggered e.g., through a simple scripted nmap
connection scan injecting the chunk after the handshake, for
example, ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ------------------ ASCONF; UNKNOWN ------------------>

... where ASCONF chunk of length 280 contains 2 parameters ...

  1) Add IP address parameter (param length: 16)
  2) Add/del IP address parameter (param length: 255)

... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
Address Parameter in the ASCONF chunk is even missing, too.
This is just an example and similarly-crafted ASCONF chunks
could be used just as well.

The ASCONF chunk passes through sctp_verify_asconf() as all
parameters passed sanity checks, and after walking, we ended
up successfully at the chunk end boundary, and thus may invoke
sctp_process_asconf(). Parameter walking is done with
WORD_ROUND() to take padding into account.

In sctp_process_asconf()'s TLV processing, we may fail in
sctp_process_asconf_param() e.g., due to removal of the IP
address that is also the source address of the packet containing
the ASCONF chunk, and thus we need to add all TLVs after the
failure to our ASCONF response to remote via helper function
sctp_add_asconf_response(), which basically invokes a
sctp_addto_chunk() adding the error parameters to the given
skb.

When walking to the next parameter this time, we proceed
with ...

  length = ntohs(asconf_param->param_hdr.length);
  asconf_param = (void *)asconf_param + length;

... instead of the WORD_ROUND()'ed length, thus resulting here
in an off-by-one that leads to reading the follow-up garbage
parameter length of 12336, and thus throwing an skb_over_panic
for the reply when trying to sctp_addto_chunk() next time,
which implicitly calls the skb_put() with that length.

Fix it by using sctp_walk_params() [ which is also used in
INIT parameter processing ] macro in the verification *and*
in ASCONF processing: it will make sure we don't spill over,
that we walk parameters WORD_ROUND()'ed. Moreover, we're being
more defensive and guard against unknown parameter types and
missized addresses.

Joint work with Vlad Yasevich.

Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14 12:46:22 -04:00
Andy Shevchenko 5df1415aee lib80211: remove unused print_ssid()
In kernel we have %*pE specifier to print an escaped buffer.  All users
now switched to that approach.

This fixes a bug as well.  The current implementation wrongly prints
octal numbers: only two first digits are used in case when 3 are
required and the rest of the string ends up cut off.

Additionally by default the \f, \v, \a, and \e are escaped to their
alphabetic representation.  It's safe to do since it is currently used
for messaging only.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "John W . Linville" <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:27 +02:00
Pablo Neira Ayuso 7210e4e38f netfilter: nf_tables: restrict nat/masq expressions to nat chain type
This adds the missing validation code to avoid the use of nat/masq from
non-nat chains. The validation assumes two possible configuration
scenarios:

1) Use of nat from base chain that is not of nat type. Reject this
   configuration from the nft_*_init() path of the expression.

2) Use of nat from non-base chain. In this case, we have to wait until
   the non-base chain is referenced by at least one base chain via
   jump/goto. This is resolved from the nft_*_validate() path which is
   called from nf_tables_check_loops().

The user gets an -EOPNOTSUPP in both cases.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-13 20:42:00 +02:00
Linus Torvalds ca321885b0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "This set fixes a bunch of fallout from the changes that went in during
  this merge window, particularly:

   - Fix fsl_pq_mdio (Claudiu Manoil) and fm10k (Pranith Kumar) build
     failures.

   - Several networking drivers do atomic_set() on page counts where
     that's not exactly legal.  From Eric Dumazet.

   - Make __skb_flow_get_ports() work cleanly with unaligned data, from
     Alexander Duyck.

   - Fix some kernel-doc buglets in rfkill and netlabel, from Fabian
     Frederick.

   - Unbalanced enable_irq_wake usage in bcmgenet and systemport
     drivers, from Florian Fainelli.

   - pxa168_eth needs to depend on HAS_DMA, from Geert Uytterhoeven.

   - Multi-dequeue in the qdisc layer severely bypasses the fairness
     limits the previous code used to enforce, reintroduce in a way that
     at the same time doesn't compromise bulk dequeue opportunities.
     From Jesper Dangaard Brouer.

   - macvlan receive path unnecessarily hops through a softirq by using
     netif_rx() instead of netif_receive_skb().  From Jason Baron"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (51 commits)
  net: systemport: avoid unbalanced enable_irq_wake calls
  net: bcmgenet: avoid unbalanced enable_irq_wake calls
  net: bcmgenet: fix off-by-one in incrementing read pointer
  net: fix races in page->_count manipulation
  mlx4: fix race accessing page->_count
  ixgbe: fix race accessing page->_count
  igb: fix race accessing page->_count
  fm10k: fix race accessing page->_count
  net/phy: micrel: Add clock support for KSZ8021/KSZ8031
  flow-dissector: Fix alignment issue in __skb_flow_get_ports
  net: filter: fix the comments
  Documentation: replace __sk_run_filter with __bpf_prog_run
  macvlan: optimize the receive path
  macvlan: pass 'bool' type to macvlan_count_rx()
  drivers: net: xgene: Add 10GbE ethtool support
  drivers: net: xgene: Add 10GbE support
  drivers: net: xgene: Preparing for adding 10GbE support
  dtb: Add 10GbE node to APM X-Gene SoC device tree
  Documentation: dts: Update section header for APM X-Gene
  MAINTAINERS: Update APM X-Gene section
  ...
2014-10-11 21:19:00 -04:00
David S. Miller 7b6fa1eef6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter fixes for net-next

This batch contains two fixes for what you have in your net-next,
they are:

1) Remove nf_send_reset6() from header file. This function now resides
   in the nf_reject_ipv6 module. Reported by Eric Dumazet.

2) Fix wrong NFT_REJECT_ICMPX_MAX definition and adjust code to fix
   errors reported by Dan Carpenter's static analysis tools.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10 15:01:09 -04:00
Linus Torvalds c798360cd1 Merge branch 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu updates from Tejun Heo:
 "A lot of activities on percpu front.  Notable changes are...

   - percpu allocator now can take @gfp.  If @gfp doesn't contain
     GFP_KERNEL, it tries to allocate from what's already available to
     the allocator and a work item tries to keep the reserve around
     certain level so that these atomic allocations usually succeed.

     This will replace the ad-hoc percpu memory pool used by
     blk-throttle and also be used by the planned blkcg support for
     writeback IOs.

     Please note that I noticed a bug in how @gfp is interpreted while
     preparing this pull request and applied the fix 6ae833c7fe
     ("percpu: fix how @gfp is interpreted by the percpu allocator")
     just now.

   - percpu_ref now uses longs for percpu and global counters instead of
     ints.  It leads to more sparse packing of the percpu counters on
     64bit machines but the overhead should be negligible and this
     allows using percpu_ref for refcnting pages and in-memory objects
     directly.

   - The switching between percpu and single counter modes of a
     percpu_ref is made independent of putting the base ref and a
     percpu_ref can now optionally be initialized in single or killed
     mode.  This allows avoiding percpu shutdown latency for cases where
     the refcounted objects may be synchronously created and destroyed
     in rapid succession with only a fraction of them reaching fully
     operational status (SCSI probing does this when combined with
     blk-mq support).  It's also planned to be used to implement forced
     single mode to detect underflow more timely for debugging.

  There's a separate branch percpu/for-3.18-consistent-ops which cleans
  up the duplicate percpu accessors.  That branch causes a number of
  conflicts with s390 and other trees.  I'll send a separate pull
  request w/ resolutions once other branches are merged"

* 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
  percpu: fix how @gfp is interpreted by the percpu allocator
  blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
  percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
  percpu_ref: add PERCPU_REF_INIT_* flags
  percpu_ref: decouple switching to percpu mode and reinit
  percpu_ref: decouple switching to atomic mode and killing
  percpu_ref: add PCPU_REF_DEAD
  percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
  percpu_ref: replace pcpu_ prefix with percpu_
  percpu_ref: minor code and comment updates
  percpu_ref: relocate percpu_ref_reinit()
  Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
  Revert "percpu: free percpu allocation info for uniprocessor system"
  percpu-refcount: make percpu_ref based on longs instead of ints
  percpu-refcount: improve WARN messages
  percpu: fix locking regression in the failure path of pcpu_alloc()
  percpu-refcount: add @gfp to percpu_ref_init()
  proportions: add @gfp to init functions
  percpu_counter: add @gfp to percpu_counter_init()
  percpu_counter: make percpu_counters_lock irq-safe
  ...
2014-10-10 07:26:02 -04:00
Luciano Coelho 0f791eb47f mac80211: allow channel switch with multiple channel contexts
Channel switch with multiple channel contexts should now work fine.
Remove check that disallows switches when multiple contexts are in
use.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:09 +02:00
Luciano Coelho f1d65583bc mac80211: add post_channel_switch driver operation
As a counterpart to the pre_channel_switch operation, add a
post_channel_switch operation.  This allows the drivers to go back to
a normal configuration after the channel switch is completed.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:09 +02:00
Luciano Coelho 6d027bcc8a mac80211: add pre_channel_switch driver operation
Some drivers may need to prepare for a channel switch also when it is
initiated from the remote side (eg. station, P2P client).  To make
this possible, add a generic callback that can be called for all
interface types.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:08 +02:00
Luciano Coelho 2ba45384e5 mac80211: add device_timestamp to the ieee80211_channel_switch struct
Some devices may need the device timestamp in order to synchronize the
channel switch.  To pass this value back to the driver, add it to the
channel switch structure and copy the device_timestamp value received
in the rx info structure into it.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:30:08 +02:00
Henning Rogge 66be7d2bcd cfg80211: add ops to query mesh proxy path table
Add two new cfg80211 operations for querying a table with proxied mesh
paths.

Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-10-09 11:19:07 +02:00
Linus Torvalds 35a9ad8af0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Most notable changes in here:

   1) By far the biggest accomplishment, thanks to a large range of
      contributors, is the addition of multi-send for transmit.  This is
      the result of discussions back in Chicago, and the hard work of
      several individuals.

      Now, when the ->ndo_start_xmit() method of a driver sees
      skb->xmit_more as true, it can choose to defer the doorbell
      telling the driver to start processing the new TX queue entires.

      skb->xmit_more means that the generic networking is guaranteed to
      call the driver immediately with another SKB to send.

      There is logic added to the qdisc layer to dequeue multiple
      packets at a time, and the handling mis-predicted offloads in
      software is now done with no locks held.

      Finally, pktgen is extended to have a "burst" parameter that can
      be used to test a multi-send implementation.

      Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
      virtio_net

      Adding support is almost trivial, so export more drivers to
      support this optimization soon.

      I want to thank, in no particular or implied order, Jesper
      Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
      Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
      David Tat, Hannes Frederic Sowa, and Rusty Russell.

   2) PTP and timestamping support in bnx2x, from Michal Kalderon.

   3) Allow adjusting the rx_copybreak threshold for a driver via
      ethtool, and add rx_copybreak support to enic driver.  From
      Govindarajulu Varadarajan.

   4) Significant enhancements to the generic PHY layer and the bcm7xxx
      driver in particular (EEE support, auto power down, etc.) from
      Florian Fainelli.

   5) Allow raw buffers to be used for flow dissection, allowing drivers
      to determine the optimal "linear pull" size for devices that DMA
      into pools of pages.  The objective is to get exactly the
      necessary amount of headers into the linear SKB area pre-pulled,
      but no more.  The new interface drivers use is eth_get_headlen().
      From WANG Cong, with driver conversions (several had their own
      by-hand duplicated implementations) by Alexander Duyck and Eric
      Dumazet.

   6) Support checksumming more smoothly and efficiently for
      encapsulations, and add "foo over UDP" facility.  From Tom
      Herbert.

   7) Add Broadcom SF2 switch driver to DSA layer, from Florian
      Fainelli.

   8) eBPF now can load programs via a system call and has an extensive
      testsuite.  Alexei Starovoitov and Daniel Borkmann.

   9) Major overhaul of the packet scheduler to use RCU in several major
      areas such as the classifiers and rate estimators.  From John
      Fastabend.

  10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
      Duyck.

  11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
      Dumazet.

  12) Add Datacenter TCP congestion control algorithm support, From
      Florian Westphal.

  13) Reorganize sk_buff so that __copy_skb_header() is significantly
      faster.  From Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
  netlabel: directly return netlbl_unlabel_genl_init()
  net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
  net: description of dma_cookie cause make xmldocs warning
  cxgb4: clean up a type issue
  cxgb4: potential shift wrapping bug
  i40e: skb->xmit_more support
  net: fs_enet: Add NAPI TX
  net: fs_enet: Remove non NAPI RX
  r8169:add support for RTL8168EP
  net_sched: copy exts->type in tcf_exts_change()
  wimax: convert printk to pr_foo()
  af_unix: remove 0 assignment on static
  ipv6: Do not warn for informational ICMP messages, regardless of type.
  Update Intel Ethernet Driver maintainers list
  bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
  tipc: fix bug in multicast congestion handling
  net: better IFF_XMIT_DST_RELEASE support
  net/mlx4_en: remove NETDEV_TX_BUSY
  3c59x: fix bad split of cpu_to_le32(pci_map_single())
  net: bcmgenet: fix Tx ring priority programming
  ...
2014-10-08 21:40:54 -04:00
David S. Miller 64b1f00a08 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-10-08 16:22:22 -04:00
Linus Torvalds d0cd84817c dmaengine-3.17
1/ Step down as dmaengine maintainer see commit 08223d80df "dmaengine
    maintainer update"
 
 2/ Removal of net_dma, as it has been marked 'broken' since 3.13 (commit
    7787380336 "net_dma: mark broken"), without reports of performance
    regression.
 
 3/ Miscellaneous fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUKDLKAAoJEB7SkWpmfYgC7wwP/iNHqRjf1suMUTBIF3P6Hgbe
 VCUwh0IkuujMPDG46WRn6cYzarRxVPLoGaLHLPszgjI6pmGPVv19wqeDOlUxtcmr
 0iQWEWv/zqseaAIW+4gj/WYCyMgKil49EUBJKCZCfNmIaad+e0pr8f0uE5yOkHPM
 tqWoZERu9A4dlXGr1TjeOZVzdnPrCt92MrLDN6ZZ6tMuJaEc5PauaLxKTeGy5fYj
 UB+k1xJQzECbsYfpB+uCVYl5/qPO1rNyuBYS8THCsW+JYmrbbfH2kkF2lo2FaUpO
 8Yd50FtzXHKWwAt7BzfIwU2M7x0wRmryrC/xsQi6M+WmVeHYvvHUIpzaA66xRZ5x
 fCy3Fu8sEnmnmboAbh2v2c5uTycqRl2xPzbpLAuxglloXIxzi3ckp6ESF/Z4SldH
 oxIoEievN7lah3vKgvlHZYcWDzrYr8EKf/EzFe9RqDBQDKtzDzre1H9Uivr387Vm
 uFUcGHYG/GXuX47C7EUsMtaSW2UEoR2ytw/HR6CKFPTVXwAzEO6kA9vg0EqL0iIq
 2wVLgavlZuwegmaUBgnr+bgVZMvVN7OU7fAIRVe5xNO6itrPKvheSlQthmRiiq9C
 uzOu4PS6PexqzHUNPCcJpCsj+lawmCSrE0bxtPzTA/CQInVgWs219V9+W5Gn/0YA
 EARN9k6ueX9PZPQrPQLm
 =BBBv
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine

Pull dmaengine updates from Dan Williams:
 "Even though this has fixes marked for -stable, given the size and the
  needed conflict resolutions this is 3.18-rc1/merge-window material.

  These patches have been languishing in my tree for a long while.  The
  fact that I do not have the time to do proper/prompt maintenance of
  this tree is a primary factor in the decision to step down as
  dmaengine maintainer.  That and the fact that the bulk of drivers/dma/
  activity is going through Vinod these days.

  The net_dma removal has not been in -next.  It has developed simple
  conflicts against mainline and net-next (for-3.18).

  Continuing thanks to Vinod for staying on top of drivers/dma/.

  Summary:

   1/ Step down as dmaengine maintainer see commit 08223d80df
      "dmaengine maintainer update"

   2/ Removal of net_dma, as it has been marked 'broken' since 3.13
      (commit 7787380336 "net_dma: mark broken"), without reports of
      performance regression.

   3/ Miscellaneous fixes"

* tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
  net: make tcp_cleanup_rbuf private
  net_dma: revert 'copied_early'
  net_dma: simple removal
  dmaengine maintainer update
  dmatest: prevent memory leakage on error path in thread
  ioat: Use time_before_jiffies()
  dmaengine: fix xor sources continuation
  dma: mv_xor: Rename __mv_xor_slot_cleanup() to mv_xor_slot_cleanup()
  dma: mv_xor: Remove all callers of mv_xor_slot_cleanup()
  dma: mv_xor: Remove unneeded mv_xor_clean_completed_slots() call
  ioat: Use pci_enable_msix_exact() instead of pci_enable_msix()
  drivers: dma: Include appropriate header file in dca.c
  drivers: dma: Mark functions as static in dma_v3.c
  dma: mv_xor: Add DMA API error checks
  ioat/dca: Use dev_is_pci() to check whether it is pci device
2014-10-07 20:39:25 -04:00
Pablo Neira Ayuso 91c1a09b33 netfilter: kill nf_send_reset6() from include/net/netfilter/ipv6/nf_reject.h
nf_send_reset6() now resides in net/ipv6/netfilter/nf_reject_ipv6.c

Fixes: c8d7b98 ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Eric Dumazet <edumazet@google.com>
2014-10-07 19:58:07 +02:00
Andy Zhou 7c5df8fa19 openvswitch: fix a compilation error when CONFIG_INET is not setW!
Fix a openvswitch compilation error when CONFIG_INET is not set:

=====================================================
   In file included from include/net/geneve.h:4:0,
                       from net/openvswitch/flow_netlink.c:45:
		          include/net/udp_tunnel.h: In function 'udp_tunnel_handle_offloads':
			  >> include/net/udp_tunnel.h💯2: error: implicit declaration of function 'iptunnel_handle_offloads' [-Werror=implicit-function-declaration]
			  >>      return iptunnel_handle_offloads(skb, udp_csum, type);
			  >>           ^
			  >>           >> include/net/udp_tunnel.h💯2: warning: return makes pointer from integer without a cast
			  >>           >>    cc1: some warnings being treated as errors

=====================================================

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:10:49 -04:00
Hannes Frederic Sowa 812918c464 ipv6: make fib6 serial number per namespace
Try to reduce number of possible fn_sernum mutation by constraining them
to their namespace.

Also remove rt_genid which I forgot to remove in 705f1c869d ("ipv6:
remove rt6i_genid").

Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:02:30 -04:00
Hannes Frederic Sowa 42b1870646 ipv6: make rt_sernum atomic and serial number fields ordinary ints
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:02:30 -04:00
Hannes Frederic Sowa 94b2cfe02b ipv6: minor fib6 cleanups like type safety, bool conversion, inline removal
Also renamed struct fib6_walker_t to fib6_walker and enum fib_walk_state_t
to fib6_walk_state as recommended by Cong Wang.

Cc: Cong Wang <cwang@twopensource.com>
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 00:02:30 -04:00
John Fastabend 82a470f111 net: sched: remove tcf_proto from ematch calls
This removes the tcf_proto argument from the ematch code paths that
only need it to reference the net namespace. This allows simplifying
qdisc code paths especially when we need to tear down the ematch
from an RCU callback. In this case we can not guarentee that the
tcf_proto structure is still valid.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 18:02:32 -04:00
Eric Dumazet f2600cf02b net: sched: avoid costly atomic operation in fq_dequeue()
Standard qdisc API to setup a timer implies an atomic operation on every
packet dequeue : qdisc_unthrottled()

It turns out this is not really needed for FQ, as FQ has no concept of
global qdisc throttling, being a qdisc handling many different flows,
some of them can be throttled, while others are not.

Fix is straightforward : add a 'bool throttle' to
qdisc_watchdog_schedule_ns(), and remove calls to qdisc_unthrottled()
in sch_fq.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:55:10 -04:00
Jesse Gross f579668406 openvswitch: Add support for Geneve tunneling.
The Openvswitch implementation is completely agnostic to the options
that are in use and can handle newly defined options without
further work. It does this by simply matching on a byte array
of options and allowing userspace to setup flows on this array.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Singed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:32:21 -04:00
Andy Zhou 0b5e8b8eea net: Add Geneve tunneling protocol driver
This adds a device level support for Geneve -- Generic Network
Virtualization Encapsulation. The protocol is documented at
http://tools.ietf.org/html/draft-gross-geneve-01

Only protocol layer Geneve support is provided by this driver.
Openvswitch can be used for configuring, set up and tear down
functional Geneve tunnels.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:32:20 -04:00
Vlad Yasevich bdf6fa52f0 sctp: handle association restarts when the socket is closed.
Currently association restarts do not take into consideration the
state of the socket.  When a restart happens, the current assocation
simply transitions into established state.  This creates a condition
where a remote system, through a the restart procedure, may create a
local association that is no way reachable by user.  The conditions
to trigger this are as follows:
  1) Remote does not acknoledge some data causing data to remain
     outstanding.
  2) Local application calls close() on the socket.  Since data
     is still outstanding, the association is placed in SHUTDOWN_PENDING
     state.  However, the socket is closed.
  3) The remote tries to create a new association, triggering a restart
     on the local system.  The association moves from SHUTDOWN_PENDING
     to ESTABLISHED.  At this point, it is no longer reachable by
     any socket on the local system.

This patch addresses the above situation by moving the newly ESTABLISHED
association into SHUTDOWN-SENT state and bundling a SHUTDOWN after
the COOKIE-ACK chunk.  This way, the restarted associate immidiately
enters the shutdown procedure and forces the termination of the
unreachable association.

Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-06 00:21:45 -04:00
David S. Miller a4b4a2b7f9 Merge tag 'master-2014-10-02' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-10-03

Please pull tihs batch of updates intended for the 3.18 stream!

For the iwlwifi bits, Emmanuel says:

"I have here a few things that depend on the latest mac80211's changes:
RRM, TPC, Quiet Period etc...  Eyal keeps improving our rate control
and we have a new device ID. This last patch should probably have
gone to wireless.git, but at that stage, I preferred to send it to
-next and CC stable."

For (most of) the Atheros bits, Kalle says:

"The only new feature is testmode support from me. Ben added a new method
to crash the firmware with an assert for debug purposes. As usual, we
have lots of smaller fixes from Michal. Matteo fixed a Kconfig
dependency with debugfs. I fixed some warnings recently added to
checkpatch."

For the NFC bits, Samuel says:

"We've had major updates for TI and ST Microelectronics drivers, and a
few NCI related changes.

For TI's trf7970a driver:

- Target mode support for trf7970a
- Suspend/resume support for trf7970a
- DT properties additions to handle different quirks
- A bunch of fixes for smartphone IOP related issues

For ST Microelectronics' ST21NFCA and ST21NFCB drivers:

- ISO15693 support for st21nfcb
- checkpatch and sparse related warning fixes
- Code cleanups and a few minor fixes

Finally, Marvell added ISO15693 support to the NCI stack, together with a
couple of NCI fixes."

For the Bluetooth bits, Johan says:

"This 3.18 pull request replaces the one I did on Monday ("bluetooth-next
2014-09-22", which hasn't been pulled yet). The additions since the last
request are:

 - SCO connection fix for devices not supporting eSCO
 - Cleanups regarding the SCO establishment logic
 - Remove unnecessary return value from logging functions
 - Header compression fix for 6lowpan
 - Cleanups to the ieee802154/mrf24j40 driver

Here's a copy from previous request that this one replaces:

'
Here are some more patches for 3.18. They include various fixes to the
btusb HCI driver, a fix for LE SMP, as well as adding Jukka to the
MAINTAINERS file for generic 6LoWPAN (as requested by Alexander Aring).

I've held on to this pull request a bit since we were waiting for a SCO
related fix to get sorted out first. However, since the merge window is
getting closer I decided not to wait for it. If we do get the fix sorted
out there'll probably be a second small pull request later this week.
'"

And,

"Unless 3.17 gets delayed this will probably be our last -next pull request for
3.18. We've got:

  - New Marvell hardware supportr
  - Multicast support for 6lowpan
  - Several of 6lowpan fixes & cleanups
  - Fix for a (false-positive) lockdep warning in L2CAP
  - Minor btusb cleanup"

On top of all that comes the usual sort of updates to ath5k, ath9k,
ath10k, brcmfmac, mwifiex, and wil6210.  This time around there are
also a number of rtlwifi updates to enable some new hardware and
to reconcile the in-kernel drivers with some newer releases of the
Realtek vendor drivers.  Also of note is some device tree work for
the bcma bus.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05 21:34:39 -04:00
David S. Miller 61b37d2f54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains another batch with Netfilter/IPVS updates
for net-next, they are:

1) Add abstracted ICMP codes to the nf_tables reject expression. We
   introduce four reasons to reject using ICMP that overlap in IPv4
   and IPv6 from the semantic point of view. This should simplify the
   maintainance of dual stack rule-sets through the inet table.

2) Move nf_send_reset() functions from header files to per-family
   nf_reject modules, suggested by Patrick McHardy.

3) We have to use IS_ENABLED(CONFIG_BRIDGE_NETFILTER) everywhere in the
   code now that br_netfilter can be modularized. Convert remaining spots
   in the network stack code.

4) Use rcu_barrier() in the nf_tables module removal path to ensure that
   we don't leave object that are still pending to be released via
   call_rcu (that may likely result in a crash).

5) Remove incomplete arch 32/64 compat from nft_compat. The original (bad)
   idea was to probe the word size based on the xtables match/target info
   size, but this assumption is wrong when you have to dump the information
   back to userspace.

6) Allow to filter from prerouting and postrouting in the nf_tables bridge.
   In order to emulate the ebtables NAT chains (which are actually simple
   filter chains with no special semantics), we have support filtering from
   this hooks too.

7) Add explicit module dependency between xt_physdev and br_netfilter.
   This provides a way to detect if the user needs br_netfilter from
   the configuration path. This should reduce the breakage of the
   br_netfilter modularization.

8) Cleanup coding style in ip_vs.h, from Simon Horman.

9) Fix crash in the recently added nf_tables masq expression. We have
   to register/unregister the notifiers to clean up the conntrack table
   entries from the module init/exit path, not from the rule addition /
   deletion path. From Arturo Borrero.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05 21:32:37 -04:00
Sébastien Barré dd3619f2ed Removed unused inet6 address state
the inet6 state INET6_IFADDR_STATE_UP only appeared in its definition.

Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-04 20:37:17 -04:00
Tom Herbert 37dd024779 gue: Receive side for Generic UDP Encapsulation
This patch adds support receiving for GUE packets in the fou module. The
fou module now supports direct foo-over-udp (no encapsulation header)
and GUE. To support this a type parameter is added to the fou netlink
parameters.

For a GUE socket we define gue_udp_recv, gue_gro_receive, and
gue_gro_complete to handle the specifics of the GUE protocol. Most
of the code to manage and configure sockets is common with the fou.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 16:53:33 -07:00
Eric Dumazet 55a93b3ea7 qdisc: validate skb without holding lock
Validation of skb can be pretty expensive :

GSO segmentation and/or checksum computations.

We can do this without holding qdisc lock, so that other cpus
can queue additional packets.

Trick is that requeued packets were already validated, so we carry
a boolean so that sch_direct_xmit() can validate a fresh skb list,
or directly use an old one.

Tested on 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads
host.

Turning TSO on or off had no effect on throughput, only few more cpu
cycles. Lock contention on qdisc lock disappeared.

Same if disabling TX checksum offload.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:36:11 -07:00
Jesper Dangaard Brouer 5772e9a346 qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE
Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
sending/processing an entire skb list.

This patch implements qdisc bulk dequeue, by allowing multiple packets
to be dequeued in dequeue_skb().

The optimization principle for this is two fold, (1) to amortize
locking cost and (2) avoid expensive tailptr update for notifying HW.
 (1) Several packets are dequeued while holding the qdisc root_lock,
amortizing locking cost over several packet.  The dequeued SKB list is
processed under the TXQ lock in dev_hard_start_xmit(), thus also
amortizing the cost of the TXQ lock.
 (2) Further more, dev_hard_start_xmit() will utilize the skb->xmit_more
API to delay HW tailptr update, which also reduces the cost per
packet.

One restriction of the new API is that every SKB must belong to the
same TXQ.  This patch takes the easy way out, by restricting bulk
dequeue to qdisc's with the TCQ_F_ONETXQUEUE flag, that specifies the
qdisc only have attached a single TXQ.

Some detail about the flow; dev_hard_start_xmit() will process the skb
list, and transmit packets individually towards the driver (see
xmit_one()).  In case the driver stops midway in the list, the
remaining skb list is returned by dev_hard_start_xmit().  In
sch_direct_xmit() this returned list is requeued by dev_requeue_skb().

To avoid overshooting the HW limits, which results in requeuing, the
patch limits the amount of bytes dequeued, based on the drivers BQL
limits.  In-effect bulking will only happen for BQL enabled drivers.

Small amounts for extra HoL blocking (2x MTU/0.24ms) were
measured at 100Mbit/s, with bulking 8 packets, but the
oscillating nature of the measurement indicate something, like
sched latency might be causing this effect. More comparisons
show, that this oscillation goes away occationally. Thus, we
disregard this artifact completely and remove any "magic" bulking
limit.

For now, as a conservative approach, stop bulking when seeing TSO and
segmented GSO packets.  They already benefit from bulking on their own.
A followup patch add this, to allow easier bisect-ability for finding
regressions.

Jointed work with Hannes, Daniel and Florian.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:37:06 -07:00
David S. Miller 739e4a758e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	net/netfilter/nfnetlink.c

Both r8152 and nfnetlink conflicts were simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-02 11:25:43 -07:00
Simon Horman 07dcc686fa ipvs: Clean up comment style in ip_vs.h
* Consistently use the multi-line comment style for networking code:

  /* This
   * That
   * The other thing
   */

* Use single-line comment style for comments with only one line of text.

* In general follow the leading '*' of each line of a comment with a
  single space and then text.

* Add missing line break between functions, remove double line break,
  align comments to previous lines whenever possible.

Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:58 +02:00
Pablo Neira Ayuso 4b7fd5d97e netfilter: explicit module dependency between br_netfilter and physdev
You can use physdev to match the physical interface enslaved to the
bridge device. This information is stored in skb->nf_bridge and it is
set up by br_netfilter. So, this is only available when iptables is
used from the bridge netfilter path.

Since 34666d4 ("netfilter: bridge: move br_netfilter out of the core"),
the br_netfilter code is modular. To reduce the impact of this change,
we can autoload the br_netfilter if the physdev match is used since
we assume that the users need br_netfilter in place.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:57 +02:00
Pablo Neira Ayuso c8d7b98bec netfilter: move nf_send_resetX() code to nf_reject_ipvX modules
Move nf_send_reset() and nf_send_reset6() to nf_reject_ipv4 and
nf_reject_ipv6 respectively. This code is shared by x_tables and
nf_tables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:49 +02:00
Pablo Neira Ayuso 51b0a5d8c2 netfilter: nft_reject: introduce icmp code abstraction for inet and bridge
This patch introduces the NFT_REJECT_ICMPX_UNREACH type which provides
an abstraction to the ICMP and ICMPv6 codes that you can use from the
inet and bridge tables, they are:

* NFT_REJECT_ICMPX_NO_ROUTE: no route to host - network unreachable
* NFT_REJECT_ICMPX_PORT_UNREACH: port unreachable
* NFT_REJECT_ICMPX_HOST_UNREACH: host unreachable
* NFT_REJECT_ICMPX_ADMIN_PROHIBITED: administratevely prohibited

You can still use the specific codes when restricting the rule to match
the corresponding layer 3 protocol.

I decided to not overload the existing NFT_REJECT_ICMP_UNREACH to have
different semantics depending on the table family and to allow the user
to specify ICMP family specific codes if they restrict it to the
corresponding family.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:29:57 +02:00
WANG Cong a0efb80ce3 net_sched: avoid calling tcf_unbind_filter() in call_rcu callback
This fixes the following crash:

[   63.976822] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   63.980094] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 3.17.0-rc6+ #648
[   63.980094] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   63.980094] task: ffff880117dea690 ti: ffff880117dfc000 task.ti: ffff880117dfc000
[   63.980094] RIP: 0010:[<ffffffff817e6d07>]  [<ffffffff817e6d07>] u32_destroy_key+0x27/0x6d
[   63.980094] RSP: 0018:ffff880117dffcc0  EFLAGS: 00010202
[   63.980094] RAX: ffff880117dea690 RBX: ffff8800d02e0820 RCX: 0000000000000000
[   63.980094] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b
[   63.980094] RBP: ffff880117dffcd0 R08: 0000000000000000 R09: 0000000000000000
[   63.980094] R10: 00006c0900006ba8 R11: 00006ba100006b9d R12: 0000000000000001
[   63.980094] R13: ffff8800d02e0898 R14: ffffffff817e6d4d R15: ffff880117387a30
[   63.980094] FS:  0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
[   63.980094] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   63.980094] CR2: 00007f07e6732fed CR3: 000000011665b000 CR4: 00000000000006e0
[   63.980094] Stack:
[   63.980094]  ffff88011a9cd300 ffffffff82051ac0 ffff880117dffce0 ffffffff817e6d68
[   63.980094]  ffff880117dffd70 ffffffff810cb4c7 ffffffff810cb3cd ffff880117dfffd8
[   63.980094]  ffff880117dea690 ffff880117dea690 ffff880117dfffd8 000000000000000a
[   63.980094] Call Trace:
[   63.980094]  [<ffffffff817e6d68>] u32_delete_key_freepf_rcu+0x1b/0x1d
[   63.980094]  [<ffffffff810cb4c7>] rcu_process_callbacks+0x3bb/0x691
[   63.980094]  [<ffffffff810cb3cd>] ? rcu_process_callbacks+0x2c1/0x691
[   63.980094]  [<ffffffff817e6d4d>] ? u32_destroy_key+0x6d/0x6d
[   63.980094]  [<ffffffff810780a4>] __do_softirq+0x142/0x323
[   63.980094]  [<ffffffff810782a8>] run_ksoftirqd+0x23/0x53
[   63.980094]  [<ffffffff81092126>] smpboot_thread_fn+0x203/0x221
[   63.980094]  [<ffffffff81091f23>] ? smpboot_unpark_thread+0x33/0x33
[   63.980094]  [<ffffffff8108e44d>] kthread+0xc9/0xd1
[   63.980094]  [<ffffffff819e00ea>] ? do_wait_for_common+0xf8/0x125
[   63.980094]  [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61
[   63.980094]  [<ffffffff819e43ec>] ret_from_fork+0x7c/0xb0
[   63.980094]  [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61

tp could be freed in call_rcu callback too, the order is not guaranteed.

John Fastabend says:

====================
Its worth noting why this is safe. Any running schedulers will either
read the valid class field or it will be zeroed.

All schedulers today when the class is 0 do a lookup using the
same call used by the tcf_exts_bind(). So even if we have a running
classifier hit the null class pointer it will do a lookup and get
to the same result. This is particularly fragile at the moment because
the only way to verify this is to audit the schedulers call sites.
====================

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 22:00:42 -04:00
Tom Herbert 8bce6d7d0d udp: Generalize skb_udp_segment
skb_udp_segment is the function called from udp4_ufo_fragment to
segment a UDP tunnel packet. This function currently assumes
segmentation is transparent Ethernet bridging (i.e. VXLAN
encapsulation). This patch generalizes the function to
operate on either Ethertype or IP protocol.

The inner_protocol field must be set to the protocol of the inner
header. This can now be either an Ethertype or an IP protocol
(in a union). A new flag in the skbuff indicates which type is
effective. skb_set_inner_protocol and skb_set_inner_ipproto
helper functions were added to set the inner_protocol. These
functions are called from the point where the tunnel encapsulation
is occuring.

When skb_udp_tunnel_segment is called, the function to segment the
inner packet is selected based on the inner IP or Ethertype. In the
case of an IP protocol encapsulation, the function is derived from
inet[6]_offloads. In the case of Ethertype, skb->protocol is
set to the inner_protocol and skb_mac_gso_segment is called. (GRE
currently does this, but it might be possible to lookup the protocol
in offload_base and call the appropriate segmenation function
directly).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01 21:35:51 -04:00
Li RongQing a12a601ed1 tcp: Change tcp_slow_start function to return void
No caller uses the return value, so make this function return void.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 17:09:16 -04:00
Hannes Frederic Sowa 705f1c869d ipv6: remove rt6i_genid
Eric Dumazet noticed that all no-nonexthop or no-gateway routes which
are already marked DST_HOST (e.g. input routes routes) will always be
invalidated during sk_dst_check. Thus per-socket dst caching absolutely
had no effect and early demuxing had no effect.

Thus this patch removes rt6i_genid: fn_sernum already gets modified during
add operations, so we only must ensure we mutate fn_sernum during ipv6
address remove operations. This is a fairly cost extensive operations,
but address removal should not happen that often. Also our mtu update
functions do the same and we heard no complains so far. xfrm policy
changes also cause a call into fib6_flush_trees. Also plug a hole in
rt6_info (no cacheline changes).

I verified via tracing that this change has effect.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 14:00:48 -04:00
John Fastabend b0ab6f9275 net: sched: enable per cpu qstats
After previous patches to simplify qstats the qstats can be
made per cpu with a packed union in Qdisc struct.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 01:02:26 -04:00
John Fastabend 6401585366 net: sched: restrict use of qstats qlen
This removes the use of qstats->qlen variable from the classifiers
and makes it an explicit argument to gnet_stats_copy_queue().

The qlen represents the qdisc queue length and is packed into
the qstats at the last moment before passnig to user space. By
handling it explicitely we avoid, in the percpu stats case, having
to figure out which per_cpu variable to put it in.

It would probably be best to remove it from qstats completely
but qstats is a user space ABI and can't be broken. A future
patch could make an internal only qstats structure that would
avoid having to allocate an additional u32 variable on the
Qdisc struct. This would make the qstats struct 128bits instead
of 128+32.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 01:02:26 -04:00
John Fastabend 25331d6ce4 net: sched: implement qstat helper routines
This adds helpers to manipulate qstats logic and replaces locations
that touch the counters directly. This simplifies future patches
to push qstats onto per cpu counters.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 01:02:26 -04:00
John Fastabend 22e0f8b932 net: sched: make bstats per cpu and estimator RCU safe
In order to run qdisc's without locking statistics and estimators
need to be handled correctly.

To resolve bstats make the statistics per cpu. And because this is
only needed for qdiscs that are running without locks which is not
the case for most qdiscs in the near future only create percpu
stats when qdiscs set the TCQ_F_CPUSTATS flag.

Next because estimators use the bstats to calculate packets per
second and bytes per second the estimator code paths are updated
to use the per cpu statistics.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-30 01:02:26 -04:00
David S. Miller 852248449c Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
pull request: netfilter/ipvs updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next,
most relevantly they are:

1) Four patches to make the new nf_tables masquerading support
   independent of the x_tables infrastructure. This also resolves a
   compilation breakage if the masquerade target is disabled but the
   nf_tables masq expression is enabled.

2) ipset updates via Jozsef Kadlecsik. This includes the addition of the
   skbinfo extension that allows you to store packet metainformation in the
   elements. This can be used to fetch and restore this to the packets through
   the iptables SET target, patches from Anton Danilov.

3) Add the hash:mac set type to ipset, from Jozsef Kadlecsick.

4) Add simple weighted fail-over scheduler via Simon Horman. This provides
   a fail-over IPVS scheduler (unlike existing load balancing schedulers).
   Connections are directed to the appropriate server based solely on
   highest weight value and server availability, patch from Kenny Mathis.

5) Support IPv6 real servers in IPv4 virtual-services and vice versa.
   Simon Horman informs that the motivation for this is to allow more
   flexibility in the choice of IP version offered by both virtual-servers
   and real-servers as they no longer need to match: An IPv4 connection
   from an end-user may be forwarded to a real-server using IPv6 and
   vice versa. No ip_vs_sync support yet though. Patches from Alex Gartrell
   and Julian Anastasov.

6) Add global generation ID to the nf_tables ruleset. When dumping from
   several different object lists, we need a way to identify that an update
   has ocurred so userspace knows that it needs to refresh its lists. This
   also includes a new command to obtain the 32-bits generation ID. The
   less significant 16-bits of this ID is also exposed through res_id field
   in the nfnetlink header to quickly detect the interference and retry when
   there is no risk of ID wraparound.

7) Move br_netfilter out of the bridge core. The br_netfilter code is
   built in the bridge core by default. This causes problems of different
   kind to people that don't want this: Jesper reported performance drop due
   to the inconditional hook registration and I remember to have read complains
   on netdev from people regarding the unexpected behaviour of our bridging
   stack when br_netfilter is enabled (fragmentation handling, layer 3 and
   upper inspection). People that still need this should easily undo the
   damage by modprobing the new br_netfilter module.

8) Dump the set policy nf_tables that allows set parameterization. So
   userspace can keep user-defined preferences when saving the ruleset.
   From Arturo Borrero.

9) Use __seq_open_private() helper function to reduce boiler plate code
   in x_tables, From Rob Jones.

10) Safer default behaviour in case that you forget to load the protocol
   tracker. Daniel Borkmann and Florian Westphal detected that if your
   ruleset is stateful, you allow traffic to at least one single SCTP port
   and the SCTP protocol tracker is not loaded, then any SCTP traffic may
   be pass through unfiltered. After this patch, the connection tracking
   classifies SCTP/DCCP/UDPlite/GRE packets as invalid if your kernel has
   been compiled with support for these modules.
====================

Trivially resolved conflict in include/linux/skbuff.h, Eric moved some
netfilter skbuff members around, and the netfilter tree adjusted the
ifdef guards for the bridging info pointer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 14:46:53 -04:00
Florian Westphal d82bd12298 tcp: move TCP_ECN_create_request out of header
After Octavian Purdilas tcp ipv4/ipv6 unification work this helper only
has a single callsite.

While at it, convert name to lowercase, suggested by Stephen.

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 14:41:22 -04:00
Arturo Borrero 9363dc4b59 netfilter: nf_tables: store and dump set policy
We want to know in which cases the user explicitly sets the policy
options. In that case, we also want to dump back the info.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-29 11:28:03 +02:00
Florian Westphal 9890092e46 net: tcp: more detailed ACK events and events for CE marked packets
DataCenter TCP (DCTCP) determines cwnd growth based on ECN information
and ACK properties, e.g. ACK that updates window is treated differently
than DUPACK.

Also DCTCP needs information whether ACK was delayed ACK. Furthermore,
DCTCP also implements a CE state machine that keeps track of CE markings
of incoming packets.

Therefore, extend the congestion control framework to provide these
event types, so that DCTCP can be properly implemented as a normal
congestion algorithm module outside of the core stack.

Joint work with Daniel Borkmann and Glenn Judd.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Florian Westphal 7354c8c389 net: tcp: split ack slow/fast events from cwnd_event
The congestion control ops "cwnd_event" currently supports
CA_EVENT_FAST_ACK and CA_EVENT_SLOW_ACK events (among others).
Both FAST and SLOW_ACK are only used by Westwood congestion
control algorithm.

This removes both flags from cwnd_event and adds a new
in_ack_event callback for this. The goal is to be able to
provide more detailed information about ACKs, such as whether
ECE flag was set, or whether the ACK resulted in a window
update.

It is required for DataCenter TCP (DCTCP) congestion control
algorithm as it makes a different choice depending on ECE being
set or not.

Joint work with Daniel Borkmann and Glenn Judd.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Daniel Borkmann 30e502a34b net: tcp: add flag for ca to indicate that ECN is required
This patch adds a flag to TCP congestion algorithms that allows
for requesting to mark IPv4/IPv6 sockets with transport as ECN
capable, that is, ECT(0), when required by a congestion algorithm.

It is currently used and needed in DataCenter TCP (DCTCP), as it
requires both peers to assert ECT on all IP packets sent - it
uses ECN feedback (i.e. CE, Congestion Encountered information)
from switches inside the data center to derive feedback to the
end hosts.

Therefore, simply add a new flag to icsk_ca_ops. Note that DCTCP's
algorithm/behaviour slightly diverges from RFC3168, therefore this
is only (!) enabled iff the assigned congestion control ops module
has requested this. By that, we can tightly couple this logic really
only to the provided congestion control ops.

Joint work with Florian Westphal and Glenn Judd.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Florian Westphal 55d8694fa8 net: tcp: assign tcp cong_ops when tcp sk is created
Split assignment and initialization from one into two functions.

This is required by followup patches that add Datacenter TCP
(DCTCP) congestion control algorithm - we need to be able to
determine if the connection is moderated by DCTCP before the
3WHS has finished.

As we walk the available congestion control list during the
assignment, we are always guaranteed to have Reno present as
it's fixed compiled-in. Therefore, since we're doing the
early assignment, we don't have a real use for the Reno alias
tcp_init_congestion_ops anymore and can thus remove it.

Actual usage of the congestion control operations are being
made after the 3WHS has finished, in some cases however we
can access get_info() via diag if implemented, therefore we
need to zero out the private area for those modules.

Joint work with Daniel Borkmann and Glenn Judd.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
WANG Cong 18d0264f63 net_sched: remove the first parameter from tcf_exts_destroy()
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:29:01 -04:00
David S. Miller f5c7e1a47a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2014-09-25

1) Remove useless hash_resize_mutex in xfrm_hash_resize().
   This mutex is used only there, but xfrm_hash_resize()
   can't be called concurrently at all. From Ying Xue.

2) Extend policy hashing to prefixed policies based on
   prefix lenght thresholds. From Christophe Gouault.

3) Make the policy hash table thresholds configurable
   via netlink. From Christophe Gouault.

4) Remove the maximum authentication length for AH.
   This was needed to limit stack usage. We switched
   already to allocate space, so no need to keep the
   limit. From Herbert Xu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:19:15 -04:00
Florian Fainelli 7905288f09 net: dsa: allow switches driver to implement get/set EEE
Allow switches driver to query and enable/disable EEE on a per-port
basis by implementing the ethtool_{get,set}_eee settings and delegating
these operations to the switch driver.

set_eee() will need to coordinate with the PHY driver to make sure that
EEE is enabled, the link-partner supports it and the auto-negotiation
result is satisfactory.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:14:09 -04:00
Florian Fainelli b2f2af21e3 net: dsa: allow enabling and disable switch ports
Whenever a per-port network device is used/unused, invoke the switch
driver port_enable/port_disable callbacks to allow saving as much power
as possible by disabling unused parts of the switch (RX/TX logic, memory
arrays, PHYs...). We supply a PHY device argument to make sure the
switch driver can act on the PHY device if needed (like putting/taking
the PHY out of deep low power mode).

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 17:14:08 -04:00
Eric Dumazet cd7d8498c9 tcp: change tcp_skb_pcount() location
Our goal is to access no more than one cache line access per skb in
a write or receive queue when doing the various walks.

After recent TCP_SKB_CB() reorganizations, it is almost done.

Last part is tcp_skb_pcount() which currently uses
skb_shinfo(skb)->gso_segs, which is a terrible choice, because it needs
3 cache lines in current kernel (skb->head, skb->end, and
shinfo->gso_segs are all in 3 different cache lines, far from skb->cb)

This very simple patch reuses space currently taken by tcp_tw_isn
only in input path, as tcp_skb_pcount is only needed for skb stored in
write queue.

This considerably speeds up tcp_ack(), granted we avoid shinfo->tx_flags
to get SKBTX_ACK_TSTAMP, which seems possible.

This also speeds up all sack processing in general.

This speeds up tcp_sendmsg() because it no longer has to access/dirty
shinfo.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:36:48 -04:00
Eric Dumazet 971f10eca1 tcp: better TCP_SKB_CB layout to reduce cache line misses
TCP maintains lists of skb in write queue, and in receive queues
(in order and out of order queues)

Scanning these lists both in input and output path usually requires
access to skb->next, TCP_SKB_CB(skb)->seq, and TCP_SKB_CB(skb)->end_seq

These fields are currently in two different cache lines, meaning we
waste lot of memory bandwidth when these queues are big and flows
have either packet drops or packet reorders.

We can move TCP_SKB_CB(skb)->header at the end of TCP_SKB_CB, because
this header is not used in fast path. This allows TCP to search much faster
in the skb lists.

Even with regular flows, we save one cache line miss in fast path.

Thanks to Christoph Paasch for noticing we need to cleanup
skb->cb[] (IPCB/IP6CB) before entering IP stack in tx path,
and that I forgot IPCB use in tcp_v4_hnd_req() and tcp_v4_save_options().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:35:43 -04:00
Eric Dumazet a224772db8 ipv6: add a struct inet6_skb_parm param to ipv6_opt_accepted()
ipv6_opt_accepted() assumes IP6CB(skb) holds the struct inet6_skb_parm
that it needs. Lets not assume this, as TCP stack might use a different
place.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:35:43 -04:00
Eric Dumazet 24a2d43d88 ipv4: rename ip_options_echo to __ip_options_echo()
ip_options_echo() assumes struct ip_options is provided in &IPCB(skb)->opt
Lets break this assumption, but provide a helper to not change all call points.

ip_send_unicast_reply() gets a new struct ip_options pointer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:35:42 -04:00
Dan Williams 3f33407856 net: make tcp_cleanup_rbuf private
net_dma was the only external user so this can become local to tcp.c
again.

Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2014-09-28 07:22:21 -07:00
Dan Williams 7bced39751 net_dma: simple removal
Per commit "77873803363c net_dma: mark broken" net_dma is no longer used
and there is no plan to fix it.

This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards.
Reverting the remainder of the net_dma induced changes is deferred to
subsequent patches.

Marked for stable due to Roman's report of a memory leak in
dma_pin_iovec_pages():

    https://lkml.org/lkml/2014/9/3/177

Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: David Whipple <whipple@securedatainnovations.ch>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: <stable@vger.kernel.org>
Reported-by: Roman Gushchin <klamm@yandex-team.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2014-09-28 07:05:16 -07:00
LEROY Christophe 4565af0d40 net: optimise csum_replace4()
csum_partial() is a generic function which is not optimised for small fixed
length calculations, and its use requires to store "from" and "to" values in
memory while we already have them available in registers. This also has impact,
especially on RISC processors. In the same spirit as the change done by
Eric Dumazet on csum_replace2(), this patch rewrites inet_proto_csum_replace4()
taking into account RFC1624.

I spotted during a NATted tcp transfert that csum_partial() is one of top 5
consuming functions (around 8%), and the second user of csum_partial() is
inet_proto_csum_replace4().

I have proposed the same modification to inet_proto_csum_replace4() in another
patch.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 16:14:16 -04:00
David S. Miller 57219dc7bf Merge tag 'master-2014-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-09-22

Please pull this batch of updates intended for the 3.18 stream...

For the mac80211 bits, Johannes says:

"This time, I have some rate minstrel improvements, support for a very
small feature from CCX that Steinar reverse-engineered, dynamic ACK
timeout support, a number of changes for TDLS, early support for radio
resource measurement and many fixes. Also, I'm changing a number of
places to clear key memory when it's freed and Intel claims copyright
for code they developed."

For the bluetooth bits, Johan says:

"Here are some more patches intended for 3.18. Most of them are cleanups
or fixes for SMP. The only exception is a fix for BR/EDR L2CAP fixed
channels which should now work better together with the L2CAP
information request procedure."

For the iwlwifi bits, Emmanuel says:

"I fix here dvm which was broken by my last pull request. Arik
continues to work on TDLS and Luca solved a few issues in CT-Kill. Eyal
keeps digging into rate scaling code, more to come soon. Besides this,
nothing really special here."

Beyond that, there are the usual big batches of updates to ath9k, b43,
mwifiex, and wil6210 as well as a handful of other bits here and there.
Also, rtlwifi gets some btcoexist attention from Larry.

Please let me know if there are problems!
====================

Had to adjust the wil6210 code to comply with Joe Perches's recent
change in net-next to make the netdev_*() routines return void instead
of 'int'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 15:39:24 -04:00
John W. Linville 30d3c071a6 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-09-26 13:38:51 -04:00
John W. Linville 330bd4ec9d NFC: 3.18 pull request
This is the NFC pull request for 3.18.
 
 We've had major updates for TI and ST Microelectronics drivers:
 
 For TI's trf7970a driver:
 
 - Target mode support for trf7970a
 - Suspend/resume support for trf7970a
 - DT properties additions to handle different quirks
 - A bunch of fixes for smartphone IOP related issues
 
 For ST Microelectronics' ST21NFCA and ST21NFCB drivers:
 
 - ISO15693 support for st21nfcb
 - checkpatch and sparse related warning fixes
 - Code cleanups and a few minor fixes
 
 Finally, Marvell add ISO15693 support to the NCI stack, together with a
 couple of NCI fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUIg4oAAoJEIqAPN1PVmxKg+kP/RPrgH6LA1tFubIwKR2+sGQ7
 g/W2J3AE8QASZkErkpXRNCt2D/SPWEKBY/qscwz+BtcWg76taIIaGTvVUNtxSaW3
 gS4hG6V1UlANWv3KFfaOKmzjEOO/SPNtkFAyI0cTOaGyUqG4o9BgBZpn1rYO16MD
 ZkSC39MpjMoXB9BbsfQngoUEoWc3tZNMmRzk4IVTwE/wXuQvZxmFQXEAiZ+pnYle
 NQfugaGMz0526rLG3QnrpkUakFb81iQwtONpbx6i8KW/Klkc6TN/ek6J9ecU8t5z
 tdHOViZWRmA1VwMGBHwpq8F2o/ATH6GeivTgqrQjcjGNhCUUT1Ulzve2UxGEMWi6
 ncjKY/GxUrYaMMtRvLv+/knrfbWtd+EnWOav07jgNrrA0tvgBNQvEKKHPoWykDVN
 QKpxu3YoNxrsR/LJMS+Zjj0IIM1Y+9DTOkLXzxJ5Hvht8rOl5heYGh2DICOpWsbQ
 ejrQicJOJvN5vqu+Sgcqq4msyTEdbs2LfRDrW1VC9A6ILI+KzYg2laTFGMnhZ5qn
 TgsYIDdONS2iGUulFHylGHI7ANtUg/mhklLUccY1HQYyiAM1NQUtzq1tAz6yLoIH
 l8iIiyzJSBWW57nWhyrULEbzHgPE+bHIjO4T+UUOxMgquYa4V11S1uP0OfWfZogR
 xS24GlobS2oXHMQqh0fA
 =d/C+
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz <sameo@linux.intel.com> says:

"NFC: 3.18 pull request

This is the NFC pull request for 3.18.

We've had major updates for TI and ST Microelectronics drivers:

For TI's trf7970a driver:

- Target mode support for trf7970a
- Suspend/resume support for trf7970a
- DT properties additions to handle different quirks
- A bunch of fixes for smartphone IOP related issues

For ST Microelectronics' ST21NFCA and ST21NFCB drivers:

- ISO15693 support for st21nfcb
- checkpatch and sparse related warning fixes
- Code cleanups and a few minor fixes

Finally, Marvell add ISO15693 support to the NCI stack, together with a
couple of NCI fixes."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-09-26 13:37:02 -04:00
Pablo Neira Ayuso 34666d467c netfilter: bridge: move br_netfilter out of the core
Jesper reported that br_netfilter always registers the hooks since
this is part of the bridge core. This harms performance for people that
don't need this.

This patch modularizes br_netfilter so it can be rmmod'ed, thus,
the hooks can be unregistered. I think the bridge netfilter should have
been a separated module since the beginning, Patrick agreed on that.

Note that this is breaking compatibility for users that expect that
bridge netfilter is going to be available after explicitly 'modprobe
bridge' or via automatic load through brctl.

However, the damage can be easily undone by modprobing br_netfilter.
The bridge core also spots a message to provide a clue to people that
didn't notice that this has been deprecated.

On top of that, the plan is that nftables will not rely on this software
layer, but integrate the connection tracking into the bridge layer to
enable stateful filtering and NAT, which is was bridge netfilter users
seem to require.

This patch still keeps the fake_dst_ops in the bridge core, since this
is required by when the bridge port is initialized. So we can safely
modprobe/rmmod br_netfilter anytime.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Florian Westphal <fw@strlen.de>
2014-09-26 18:42:31 +02:00
Tejun Heo d06efebf0c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18
This is to receive 0a30288da1 ("blk-mq, percpu_ref: implement a
kludge for SCSI blk-mq stall during probe") which implements
__percpu_ref_kill_expedited() to work around SCSI blk-mq stall.  The
commit reverted and patches to implement proper fix will be added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
2014-09-24 13:00:21 -04:00
Johan Hedberg d41c15cf95 Bluetooth: Fix reason code used for rejecting SCO connections
The core specification defines valid values for the
HCI_Reject_Synchronous_Connection_Request command to be 0x0D-0x0F. So
far the code has been using HCI_ERROR_REMOTE_USER_TERM (0x13) which is
not a valid value and is therefore being rejected by some controllers:

 > HCI Event: Connect Request (0x04) plen 10
	bdaddr 40:6F:2A:6A:E5:E0 class 0x000000 type eSCO
 < HCI Command: Reject Synchronous Connection (0x01|0x002a) plen 7
	bdaddr 40:6F:2A:6A:E5:E0 reason 0x13
	Reason: Remote User Terminated Connection
 > HCI Event: Command Status (0x0f) plen 4
	Reject Synchronous Connection (0x01|0x002a) status 0x12 ncmd 1
	Error: Invalid HCI Command Parameters

This patch introduces a new define for a value from the valid range
(0x0d == Connection Rejected Due To Limited Resources) and uses it
instead for rejecting incoming connections.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-24 14:03:32 +02:00
Joe Perches 2b0bf6c85a Bluetooth: Convert bt_<level> logging functions to return void
No caller or macro uses the return value so make all
the functions return void.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-24 09:40:08 +02:00
Christophe Ricard 9e87f9a9c4 NFC: nci: Add support for proprietary RF Protocols
In NFC Forum NCI specification, some RF Protocol values are
reserved for proprietary use (from 0x80 to 0xfe).
Some CLF vendor may need to use one value within this range
for specific technology.
Furthermore, some CLF may not becompliant with NFC Froum NCI
specification 2.0 and therefore will not support RF Protocol
value 0x06 for PROTOCOL_T5T as mention in a draft specification
and in a recent push.

Adding get_rf_protocol handle to the nci_ops structure will
help to set the correct technology to target.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-09-24 02:02:24 +02:00
Eric Dumazet 4cdf507d54 icmp: add a global rate limitation
Current ICMP rate limiting uses inetpeer cache, which is an RBL tree
protected by a lock, meaning that hosts can be stuck hard if all cpus
want to check ICMP limits.

When say a DNS or NTP server process is restarted, inetpeer tree grows
quick and machine comes to its knees.

iptables can not help because the bottleneck happens before ICMP
messages are even cooked and sent.

This patch adds a new global limitation, using a token bucket filter,
controlled by two new sysctl :

icmp_msgs_per_sec - INTEGER
    Limit maximal number of ICMP packets sent per second from this host.
    Only messages whose type matches icmp_ratemask are
    controlled by this limit.
    Default: 1000

icmp_msgs_burst - INTEGER
    icmp_msgs_per_sec controls number of ICMP packets sent per second,
    while icmp_msgs_burst controls the burst size of these packets.
    Default: 50

Note that if we really want to send millions of ICMP messages per
second, we might extend idea and infra added in commit 04ca6973f7
("ip: make IP identifiers less predictable") :
add a token bucket in the ip_idents hash and no longer rely on inetpeer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-23 12:47:38 -04:00
David S. Miller 1f6d80358d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	arch/mips/net/bpf_jit.c
	drivers/net/can/flexcan.c

Both the flexcan and MIPS bpf_jit conflicts were cases of simple
overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-23 12:09:27 -04:00
David S. Miller 84de67b298 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2014-09-22

We generate a blackhole or queueing route if a packet
matches an IPsec policy but a state can't be resolved.
Here we assume that dst_output() is called to kill
these packets. Unfortunately this assumption is not
true in all cases, so it is possible that these packets
leave the system without the necessary transformations.

This pull request contains two patches to fix this issue:

1) Fix for blackhole routed packets.

2) Fix for queue routed packets.

Both patches are serious stable candidates.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 16:41:41 -04:00
Eric Dumazet fcdd1cf4dd tcp: avoid possible arithmetic overflows
icsk_rto is a 32bit field, and icsk_backoff can reach 15 by default,
or more if some sysctl (eg tcp_retries2) are changed.

Better use 64bit to perform icsk_rto << icsk_backoff operations

As Joe Perches suggested, add a helper for this.

Yuchung spotted the tcp_v4_err() case.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 16:27:10 -04:00
Daniel Borkmann 35f7aa5309 ipv6: mld: answer mldv2 queries with mldv1 reports in mldv1 fallback
RFC2710 (MLDv1), section 3.7. says:

  The length of a received MLD message is computed by taking the
  IPv6 Payload Length value and subtracting the length of any IPv6
  extension headers present between the IPv6 header and the MLD
  message. If that length is greater than 24 octets, that indicates
  that there are other fields present *beyond* the fields described
  above, perhaps belonging to a *future backwards-compatible* version
  of MLD. An implementation of the version of MLD specified in this
  document *MUST NOT* send an MLD message longer than 24 octets and
  MUST ignore anything past the first 24 octets of a received MLD
  message.

RFC3810 (MLDv2), section 8.2.1. states for *listeners* regarding
presence of MLDv1 routers:

  In order to be compatible with MLDv1 routers, MLDv2 hosts MUST
  operate in version 1 compatibility mode. [...] When Host
  Compatibility Mode is MLDv2, a host acts using the MLDv2 protocol
  on that interface. When Host Compatibility Mode is MLDv1, a host
  acts in MLDv1 compatibility mode, using *only* the MLDv1 protocol,
  on that interface. [...]

While section 8.3.1. specifies *router* behaviour regarding presence
of MLDv1 routers:

  MLDv2 routers may be placed on a network where there is at least
  one MLDv1 router. The following requirements apply:

  If an MLDv1 router is present on the link, the Querier MUST use
  the *lowest* version of MLD present on the network. This must be
  administratively assured. Routers that desire to be compatible
  with MLDv1 MUST have a configuration option to act in MLDv1 mode;
  if an MLDv1 router is present on the link, the system administrator
  must explicitly configure all MLDv2 routers to act in MLDv1 mode.
  When in MLDv1 mode, the Querier MUST send periodic General Queries
  truncated at the Multicast Address field (i.e., 24 bytes long),
  and SHOULD also warn about receiving an MLDv2 Query (such warnings
  must be rate-limited). The Querier MUST also fill in the Maximum
  Response Delay in the Maximum Response Code field, i.e., the
  exponential algorithm described in section 5.1.3. is not used. [...]

That means that we should not get queries from different versions of
MLD. When there's a MLDv1 router present, MLDv2 enforces truncation
and MRC == MRD (both fields are overlapping within the 24 octet range).

Section 8.3.2. specifies behaviour in the presence of MLDv1 multicast
address *listeners*:

  MLDv2 routers may be placed on a network where there are hosts
  that have not yet been upgraded to MLDv2. In order to be compatible
  with MLDv1 hosts, MLDv2 routers MUST operate in version 1 compatibility
  mode. MLDv2 routers keep a compatibility mode per multicast address
  record. The compatibility mode of a multicast address is determined
  from the Multicast Address Compatibility Mode variable, which can be
  in one of the two following states: MLDv1 or MLDv2.

  The Multicast Address Compatibility Mode of a multicast address
  record is set to MLDv1 whenever an MLDv1 Multicast Listener Report is
  *received* for that multicast address. At the same time, the Older
  Version Host Present timer for the multicast address is set to Older
  Version Host Present Timeout seconds. The timer is re-set whenever a
  new MLDv1 Report is received for that multicast address. If the Older
  Version Host Present timer expires, the router switches back to
  Multicast Address Compatibility Mode of MLDv2 for that multicast
  address. [...]

That means, what can happen is the following scenario, that hosts can
act in MLDv1 compatibility mode when they previously have received an
MLDv1 query (or, simply operate in MLDv1 mode-only); and at the same
time, an MLDv2 router could start up and transmits MLDv2 startup query
messages while being unaware of the current operational mode.

Given RFC2710, section 3.7 we would need to answer to that with an MLDv1
listener report, so that the router according to RFC3810, section 8.3.2.
would receive that and internally switch to MLDv1 compatibility as well.

Right now, I believe since the initial implementation of MLDv2, Linux
hosts would just silently drop such MLDv2 queries instead of replying
with an MLDv1 listener report, which would prevent a MLDv2 router going
into fallback mode (until it receives other MLDv1 queries).

Since the mapping of MRC to MRD in exactly such cases can make use of
the exponential algorithm from 5.1.3, we cannot [strictly speaking] be
aware in MLDv1 of the encoding in MRC, it seems also not mentioned by
the RFC. Since encodings are the same up to 32767, assume in such a
situation this value as a hard upper limit we would clamp. We have asked
one of the RFC authors on that regard, and he mentioned that there seem
not to be any implementations that make use of that exponential algorithm
on startup messages. In any case, this patch fixes this MLD
interoperability issue.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 16:23:15 -04:00
Florian Fainelli 19e57c4e6d net: dsa: add {get, set}_wol callbacks to slave devices
Allow switch drivers to implement per-port Wake-on-LAN getter and
setters.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 14:41:23 -04:00
Florian Fainelli 2446254915 net: dsa: allow switch drivers to implement suspend/resume hooks
Add an abstraction layer to suspend/resume switch devices, doing the
following split:

- suspend/resume the slave network devices and their corresponding PHY
  devices
- suspend/resume the switch hardware using switch driver callbacks

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 14:41:23 -04:00
Eric Dumazet 2571178626 net: sched: shrink struct qdisc_skb_cb to 28 bytes
We cannot make struct qdisc_skb_cb bigger without impacting IPoIB,
or increasing skb->cb[] size.

Commit e0f31d8498 ("flow_keys: Record IP layer protocol in
skb_flow_dissect()") broke IPoIB.

Only current offender is sch_choke, and this one do not need an
absolutely precise flow key.

If we store 17 bytes of flow key, its more than enough. (Its the actual
size of flow_keys if it was a packed structure, but we might add new
fields at the end of it later)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: e0f31d8498 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 14:21:47 -04:00
Nicolas Dichtel 0d566379c5 genetlink: add function genl_has_listeners()
This function is the counterpart of the function netlink_has_listeners().

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:28:26 -04:00
Sabrina Dubroca 54003f119c net: fix sparse warnings in SNMP_UPD_PO_STATS(_BH)
ptr used to be a non __percpu pointer (result of a this_cpu_ptr
assignment, 7d720c3e4f ("percpu: add __percpu sparse annotations to
net")). Since d25398df59 ("net: avoid reloads in SNMP_UPD_PO_STATS"),
that's no longer the case, SNMP_UPD_PO_STATS uses this_cpu_add and ptr
is now __percpu.

Silence sparse warnings by preserving the original type and
annotation, and remove the out-of-date comment.

warning: incorrect type in initializer (different address spaces)
   expected unsigned long long *ptr
   got unsigned long long [noderef] <asn:3>*<noident>
warning: incorrect type in initializer (different address spaces)
   expected void const [noderef] <asn:3>*__vpp_verify
   got unsigned long long *<noident>
warning: incorrect type in initializer (different address spaces)
   expected void const [noderef] <asn:3>*__vpp_verify
   got unsigned long long *<noident>

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:22:31 -04:00
Tom Herbert 5632848653 net: Changes to ip_tunnel to support foo-over-udp encapsulation
This patch changes IP tunnel to support (secondary) encapsulation,
Foo-over-UDP. Changes include:

1) Adding tun_hlen as the tunnel header length, encap_hlen as the
   encapsulation header length, and hlen becomes the grand total
   of these.
2) Added common netlink define to support FOU encapsulation.
3) Routines to perform FOU encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:32 -04:00
Florian Fainelli 6819563e64 net: dsa: allow switch drivers to specify phy_device::dev_flags
Some switch drivers (e.g: bcm_sf2) may have to communicate specific
workarounds or flags towards the PHY device driver. Allow switches
driver to be delegated that task by introducing a get_phy_flags()
callback which will do just that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:27:07 -04:00
Andy Zhou 6a93cc9052 udp-tunnel: Add a few more UDP tunnel APIs
Added a few more UDP tunnel APIs that can be shared by UDP based
tunnel protocol implementation. The main ones are highlighted below.

setup_udp_tunnel_sock() configures UDP listener socket for
receiving UDP encapsulated packets.

udp_tunnel_xmit_skb() and upd_tunnel6_xmit_skb() transmit skb
using UDP encapsulation.

udp_tunnel_sock_release() closes the UDP tunnel listener socket.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
Andy Zhou fd384412e1 udp_tunnel: Seperate ipv6 functions into its own file.
Add ip6_udp_tunnel.c for ipv6 UDP tunnel functions to avoid ifdefs
in udp_tunnel.c

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
Herbert Xu 689f1c9de2 ipsec: Remove obsolete MAX_AH_AUTH_LEN
While tracking down the MAX_AH_AUTH_LEN crash in an old kernel
I thought that this limit was rather arbitrary and we should
just get rid of it.

In fact it seems that we've already done all the work needed
to remove it apart from actually removing it.  This limit was
there in order to limit stack usage.  Since we've already
switched over to allocating scratch space using kmalloc, there
is no longer any need to limit the authentication length.

This patch kills all references to it, including the BUG_ONs
that led me here.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-18 10:54:36 +02:00
Marcel Holtmann 0097db06f5 Bluetooth: Remove exported hci_recv_fragment function
The hci_recv_fragment function is no longer used by any driver and thus
do not export it. In fact it is not even needed by the core and it can
be removed altogether.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-09-17 10:23:03 +03:00
Steffen Klassert b8c203b2d2 xfrm: Generate queueing routes only from route lookup functions
Currently we genarate a queueing route if we have matching policies
but can not resolve the states and the sysctl xfrm_larval_drop is
disabled. Here we assume that dst_output() is called to kill the
queued packets. Unfortunately this assumption is not true in all
cases, so it is possible that these packets leave the system unwanted.

We fix this by generating queueing routes only from the
route lookup functions, here we can guarantee a call to
dst_output() afterwards.

Fixes: a0073fe18e ("xfrm: Add a state resolution packet queue")
Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-16 10:08:49 +02:00
Steffen Klassert f92ee61982 xfrm: Generate blackhole routes only from route lookup functions
Currently we genarate a blackhole route route whenever we have
matching policies but can not resolve the states. Here we assume
that dst_output() is called to kill the balckholed packets.
Unfortunately this assumption is not true in all cases, so
it is possible that these packets leave the system unwanted.

We fix this by generating blackhole routes only from the
route lookup functions, here we can guarantee a call to
dst_output() afterwards.

Fixes: 2774c131b1 ("xfrm: Handle blackhole route creation via afinfo.")
Reported-by: Konstantinos Kolelis <k.kolelis@sirrix.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-09-16 10:08:40 +02:00
Alex Gartrell 391f503d69 ipvs: prevent mixing heterogeneous pools and synchronization
The synchronization protocol is not compatible with heterogeneous pools, so
we need to verify that we're not turning both on at the same time.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:35 +09:00
Alex Gartrell ba38528aae ipvs: Supply destination address family to ip_vs_conn_new
The assumption that dest af is equal to service af is now unreliable, so we
must specify it manually so as not to copy just the first 4 bytes of a v6
address or doing an illegal read of 16 butes on a v6 address.

We "lie" in two places: for synchronization (which we will explicitly
disallow from happening when we have heterogeneous pools) and for black
hole addresses where there's no real dest.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:34 +09:00
Alex Gartrell 655eef103d ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest}
We need to remove the assumption that virtual address family is the same as
real address family in order to support heterogeneous services (that is,
services with v4 vips and v6 backends or the opposite).

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:33 +09:00
Alex Gartrell 6cff339bbd ipvs: Add destination address family to netlink interface
This is necessary to support heterogeneous pools.  For example, if you have
an ipv6 addressed network, you'll want to be able to forward ipv4 traffic
into it.

This patch enforces that destination address family is the same as service
family, as none of the forwarding mechanisms support anything else.

For the old setsockopt mechanism, we simply set the dest address family to
AF_INET as we do with the service.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16 09:03:33 +09:00
Alexander Duyck b4d2394d01 dsa: Replace mii_bus with a generic host device
This change makes it so that instead of passing and storing a mii_bus we
instead pass and store a host_dev.  From there we can test to determine the
exact type of device, and can verify it is the correct device for our switch.

So for example it would be possible to pass a device pointer from a pci_dev
and instead of checking for a PHY ID we could check for a vendor and/or device
ID.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:24:20 -04:00
Alexander Duyck 5075314e4e dsa: Split ops up, and avoid assigning tag_protocol and receive separately
This change addresses several issues.

First, it was possible to set tag_protocol without setting the ops pointer.
To correct that I have reordered things so that rcv is now populated before
we set tag_protocol.

Second, it didn't make much sense to keep setting the device ops each time a
new slave was registered.  So by moving the receive portion out into root
switch initialization that issue should be addressed.

Third, I wanted to avoid sending tags if the rcv pointer was not registered
so I changed the tag check to verify if the rcv function pointer is set on
the root tree.  If it is then we start sending DSA tagged frames.

Finally I split the device ops pointer in the structures into two spots.  I
placed the rcv function pointer in the root switch since this makes it
easiest to access from there, and I placed the xmit function pointer in the
slave for the same reason.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-15 17:24:20 -04:00
John W. Linville 6bd2bd27ba This time, I have some rate minstrel improvements, support for a very
small feature from CCX that Steinar reverse-engineered, dynamic ACK
 timeout support, a number of changes for TDLS, early support for radio
 resource measurement and many fixes. Also, I'm changing a number of
 places to clear key memory when it's freed and Intel claims copyright
 for code they developed.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJUEpv0AAoJEDBSmw7B7bqr6CMP/2CXvWr/98AY2Flt74KDNyaE
 vmJBVCsu+eT0G9FL6YxbVU5+rvInGDHd9qTHkU4ljd+uXwnG8XAT+WHFlhBjzm+V
 juXPWblbSdMzwpWDfq7Kbk134b9ALTEUqekhqSFvhPA5h0Dq0/8lDK9CFyfwKWbN
 07PwUv0VUUEHKVqQoVSNJu9Szi5NvZvDcN7Jwg1Cpnv0sUOeH7J2Kz1OUT4RaEhI
 c/UJjCQV4ssXaEkTDIxciQ62HrglZanMqyx4a9LGbrxLdw1KJ19CNmSkwB5mQuZg
 LhV05Y0Gv4tkRC8sCo7HF7cqgjBfjTNiEjZYfbExW0QFOMKIgKmmjYIEezVdbrk7
 gFIyhTRE595UtztUJV0dcitoOlybbRf3OdEwAIJD6fc0vhoe/rSjUIyS7/CZisMT
 9zg33JvtK3eYPSJS1jy4lk2yZ5alhLoPMQTNmsEuyOGcU3sH9vTGMjONPffOlcH9
 nzj7aUS2Qvwn3H+4CIaZbZhySpa0B9zkGL3oxeaEBmLJbFMTo5ua2FNGhubC2O+O
 BwNULDBEMwsGHKMUCWCLmQwACWdVdNxYYWtXbWfxdmC/CJoXgdLCJIUfoa1aOf2A
 DyCqUFvG/n8ObHVy+P3RU6poQFj0M/yclJAMHRW6x2qzNvAkDb0G6TVeIlgN5dG8
 jLoZPL5OH0wb0BPVNEH8
 =OPIp
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-john-2014-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg <johannes@sipsolutions.net> says:

"This time, I have some rate minstrel improvements, support for a very
small feature from CCX that Steinar reverse-engineered, dynamic ACK
timeout support, a number of changes for TDLS, early support for radio
resource measurement and many fixes. Also, I'm changing a number of
places to clear key memory when it's freed and Intel claims copyright
for code they developed."

Conflicts:
	net/mac80211/iface.c

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-09-15 14:51:23 -04:00
Marcel Holtmann 43e73e4e2a Bluetooth: Provide HCI command opcode information to driver
The Bluetooth core already does processing of the HCI command header
and puts it together before sending it to the driver. It is not really
efficient for the driver to look at the HCI command header again in
case it has to make certain decisions about certain commands. To make
this easier, just provide the opcode as part of the SKB control buffer
information. The extra information about the opcode is optional and
only provided for HCI commands.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-09-15 07:15:45 +03:00
Florian Fainelli ac7a04c33d net: dsa: change tag_protocol to an enum
Now that we introduced an additional multiplexing/demultiplexing layer
with commit 3e8a72d1da ("net: dsa: reduce number of protocol hooks")
that lives within the DSA code, we no longer need to have a given switch
driver tag_protocol be an actual ethertype value, instead, we can
replace it with an enum: dsa_tag_protocol.

Do this replacement in the drivers, which allows us to get rid of the
cpu_to_be16()/htons() dance, and remove ETH_P_BRCMTAG since we do not
need it anymore.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:04:35 -04:00
WANG Cong 013b4d9038 ipv6: clean up ipv6_dev_ac_inc()
Make it accept inet6_dev, and rename it to __ipv6_dev_ac_inc()
to reflect this change.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 16:38:42 -04:00
John Fastabend 25d8c0d55f net: rcu-ify tcf_proto
rcu'ify tcf_proto this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.

This patch prepares the core net_sched infrastracture for running
the classifier/action chains without holding the qdisc lock however
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fall out.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:25 -04:00
John Fastabend 46e5da40ae net: qdisc: use rcu prefix and silence sparse warnings
Add __rcu notation to qdisc handling by doing this we can make
smatch output more legible. And anyways some of the cases should
be using rcu_dereference() see qdisc_all_tx_empty(),
qdisc_tx_chainging(), and so on.

Also *wake_queue() API is commonly called from driver timer routines
without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
around netif_wake_subqueue and netif_tx_wake_queue.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 12:30:25 -04:00
Sabrina Dubroca 381f4dca48 ipv6: clean up anycast when an interface is destroyed
If we try to rmmod the driver for an interface while sockets with
setsockopt(JOIN_ANYCAST) are alive, some refcounts aren't cleaned up
and we get stuck on:

  unregister_netdevice: waiting for ens3 to become free. Usage count = 1

If we LEAVE_ANYCAST/close everything before rmmod'ing, there is no
problem.

We need to perform a cleanup similar to the one for multicast in
addrconf_ifdown(how == 1).

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-12 17:33:06 -04:00
Pablo Neira Ayuso 67981fefb2 netfilter: fix compilation of masquerading without IP_NF_TARGET_MASQUERADE
CONFIG_NF_NAT_MASQUERADE_IPV6=m
 # CONFIG_IP6_NF_TARGET_MASQUERADE is not set

results in:

net/ipv6/netfilter/nf_nat_masquerade_ipv6.c: In function ‘nf_nat_masquerade_ipv6’:
net/ipv6/netfilter/nf_nat_masquerade_ipv6.c:41:14: error: ‘struct nf_conn_nat’ has no member named ‘masq_index’
  nfct_nat(ct)->masq_index = out->ifindex;
              ^
net/ipv6/netfilter/nf_nat_masquerade_ipv6.c: In function ‘device_cmp’:
net/ipv6/netfilter/nf_nat_masquerade_ipv6.c:61:12: error: ‘const struct nf_conn_nat’ has no member named ‘masq_index’
  return nat->masq_index == (int)(long)ifindex;
            ^
net/ipv6/netfilter/nf_nat_masquerade_ipv6.c:62:1: warning: control
reaches end of non-void function [-Wreturn-type]
 }
 ^
make[3]: *** [net/ipv6/netfilter/nf_nat_masquerade_ipv6.o] Error 1

Fix this by using the new NF_NAT_MASQUERADE_IPV4 and _IPV6 symbols
in include/net/netfilter/nf_nat.h.

Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-11 17:02:45 +02:00