Commit Graph

28980 Commits

Author SHA1 Message Date
Pablo Neira Ayuso 71ffe9c77d netfilter: xt_TCPMSS: fix handling of malformed TCP header and options
Make sure the packet has enough room for the TCP header and
that it is not malformed.

While at it, store tcph->doff*4 in a variable, as it is used
several times.

This patch also fixes a possible off by one in case of malformed
TCP options.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-01 11:42:53 +02:00
Linus Lüssing b00589af3b bridge: disable snooping if there is no querier
If there is no querier on a link then we won't get periodic reports and
therefore won't be able to learn about multicast listeners behind ports,
potentially leading to lost multicast packets, especially for multicast
listeners that joined before the creation of the bridge.

These lost multicast packets can appear since c5c2326059
("bridge: Add multicast_querier toggle and disable queries by default")
in particular.

With this patch we are flooding multicast packets if our querier is
disabled and if we didn't detect any other querier.

A grace period of the Maximum Response Delay of the querier is added to
give multicast responses enough time to arrive and to be learned from
before disabling the flooding behaviour again.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 17:40:21 -07:00
Dan Carpenter 8cb3b9c364 net_sched: info leak in atm_tc_dump_class()
The "pvc" struct has a hole after pvc.sap_family which is not cleared.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 15:04:19 -07:00
Linus Torvalds 06693f305e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix association failures not triggering a connect-failure event in
    cfg80211, from Johannes Berg.

 2) Eliminate a potential NULL deref with older iptables tools when
    configuring xt_socket rules, from Eric Dumazet.

 3) Missing RTNL locking in wireless regulatory code, from Johannes
    Berg.

 4) Fix OOPS caused by firmware loading races in ath9k_htc, from Alexey
    Khoroshilov.

 5) Fix usb URB leak in usb_8dev CAN driver, also from Alexey
    Khoroshilov.

 6) VXLAN namespace teardown fails to unregister devices, from Stephen
    Hemminger.

 7) Fix multicast settings getting dropped by firmware in qlcnic driver,
    from Sucheta Chakraborty.

 8) Add sysctl range enforcement for tcp_syn_retries, from Michal Tesar.

 9) Fix a nasty bug in bridging where an active timer would get
    reinitialized with a setup_timer() call.  From Eric Dumazet.

10) Fix use after free in new mlx5 driver, from Dan Carpenter.

11) Fix freed pointer reference in ipv6 multicast routing on namespace
    cleanup, from Hannes Frederic Sowa.

12) Some usbnet drivers report TSO and SG in their feature set, but the
    usbnet layer doesn't really support them.  From Eric Dumazet.

13) Fix crash on EEH errors in tg3 driver, from Gavin Shan.

14) Drop cb_lock when requesting modules in genetlink, from Stanislaw
    Gruszka.

15) Kernel stack leaks in cbq scheduler and af_key pfkey messages, from
    Dan Carpenter.

16) FEC driver erroneously signals NETDEV_TX_BUSY on transmit leading to
    endless loops, from Uwe Kleine-König.

17) Fix hangs from loading mvneta driver, from Arnaud Patard.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits)
  mlx5: fix error return code in mlx5_alloc_uuars()
  mvneta: Try to fix mvneta when compiled as module
  mvneta: Fix hang when loading the mvneta driver
  atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring
  genetlink: fix usage of NLM_F_EXCL or NLM_F_REPLACE
  af_key: more info leaks in pfkey messages
  net/fec: Don't let ndo_start_xmit return NETDEV_TX_BUSY without link
  net_sched: Fix stack info leak in cbq_dump_wrr().
  igb: fix vlan filtering in promisc mode when not in VT mode
  ixgbe: Fix Tx Hang issue with lldpad on 82598EB
  genetlink: release cb_lock before requesting additional module
  net: fec: workaround stop tx during errata ERR006358
  qlcnic: Fix diagnostic interrupt test for 83xx adapters.
  qlcnic: Fix setting Guest VLAN
  qlcnic: Fix operation type and command type.
  qlcnic: Fix initialization of work function.
  Revert "atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring"
  atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring
  net/tg3: Fix warning from pci_disable_device()
  net/tg3: Fix kernel crash
  ...
2013-07-31 12:56:18 -07:00
Johannes Berg ddfe49b42d mac80211: continue using disabled channels while connected
In case the AP has different regulatory information than we do,
it can happen that we connect to an AP based on e.g. the world
roaming regulatory data, and then update our database with the
AP's country information disables the channel the AP is using.
If this happens on an HT AP, the bandwidth tracking code will
hit the WARN_ON() and disconnect. Since that's not very useful,
ignore the channel-disable flag in bandwidth tracking.

Cc: stable@vger.kernel.org
Reported-by: Chris Wright <chrisw@sous-sol.org>
Tested-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-31 21:18:17 +02:00
Johannes Berg 74418edec9 cfg80211: fix P2P GO interface teardown
When a P2P GO interface goes down, cfg80211 doesn't properly
tear it down, leading to warnings later. Add the GO interface
type to the enumeration to tear it down like AP interfaces.
Otherwise, we leave it pending and mac80211's state can get
very confused, leading to warnings later.

Cc: stable@vger.kernel.org
Reported-by: Ilan Peer <ilan.peer@intel.com>
Tested-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-31 21:18:17 +02:00
Johannes Berg 5cdaed1e87 mac80211: ignore HT primary channel while connected
While we're connected, the AP shouldn't change the primary channel
in the HT information. We checked this, and dropped the connection
if it did change it.

Unfortunately, this is causing problems on some APs, e.g. on the
Netgear WRT610NL: the beacons seem to always contain a bad channel
and if we made a connection using a probe response (correct data)
we drop the connection immediately and can basically not connect
properly at all.

Work around this by ignoring the HT primary channel information in
beacons if we're already connected.

Also print out more verbose messages in the other situations to
help diagnose similar bugs quicker in the future.

Cc: stable@vger.kernel.org [3.10]
Acked-by: Andy Isaacson <adi@hexapodia.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-31 21:18:10 +02:00
Johannes Berg cb236d2d71 mac80211: don't wait for TX status forever
TX status notification can get lost, or the frames could
get stuck on the queue, so don't wait for the callback
from the driver forever and instead time out after half
a second.

Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-31 21:16:17 +02:00
John W. Linville 11a45820d0 This is the second NFC fixes pull request for 3.11.
We have:
 
 - A build failure fix for the NCI SPI transport layer due to a
   missing CRC_CCITT Kconfig dependency.
 
 - A netlink command rename: CMD_FW_UPLOAD was merged during the 3.11
   merge window but the typical terminology for loading a firmware to a
   target is firmware download rather than upload. In order to avoid any
   confusion in a file exported to userspace, we rename this command into
   CMD_FW_DOWNLOAD.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR+E9jAAoJEIqAPN1PVmxKzkwP/3dy9wpwQG7f8FLv61IhbhhQ
 8gVqY3BX1RJdez1vH5MvqkTK6U3SmlQvtJM8pIPyfPXyR1+Af5AxqQh3vjTP3xUG
 PuMtmQOlz5OJ6ErttxZtYERVtrhFkasMVmVqKrN9ptItPADfOmeC0/hyoEnoYsWQ
 HrhZn1lYsf98zmEbNS2KoRcZUVLClbg4xosTktTVaz56jIGVuM8MAch+FS+tJhhl
 av0MX/VZvAUllSnlWWDmt0Lh9isJOLOMtqIRj6PLBAp2ra9sPNO5TlZ4lz2og2gx
 zesVhBBLyiF9oluuQj/FJft+s5Khcm0R9W969raL5SvehWY77wHoY76ZqHMUE2Qv
 7RPUvFRfOA5LvKJM8MduJ8fMf830mZWD7cByhIfUxtWQZumwPfn2Mbl3xNkPLFZB
 L2x13SwGjU+PdCo70+ybgr8zUvYIxiVULwq5xFynvXJNSpOujIe3nPdQb7QtK8C0
 4d9OudAHmfHsW93PMBE+Zki8i8GDLTR3DOQoXIRi7oPR+EVL2JDsBQvnauXhdSap
 mp9iyuoqAYjgc6e2o8coVqViXWbKmBEa9n7NKrX3dPrI9e5F67WChAyehBCu9KV3
 zZxruhEJBw6PLmIGDETk1XIVd9G6rfMBswnDfSJBjjG5PrUh6Xbfwa1y+KiRKqCh
 FG+IvbfHWZRmdeFX3U4P
 =p4r5
 -----END PGP SIGNATURE-----

Merge tag 'nfc-fixes-3.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-fixes

Samuel Ortiz <sameo@linux.intel.com> says:

'This is the second NFC fixes pull request for 3.11.

We have:

- A build failure fix for the NCI SPI transport layer due to a
  missing CRC_CCITT Kconfig dependency.

- A netlink command rename: CMD_FW_UPLOAD was merged during the 3.11
  merge window but the typical terminology for loading a firmware to a
  target is firmware download rather than upload. In order to avoid any
  confusion in a file exported to userspace, we rename this command into
  CMD_FW_DOWNLOAD."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-07-31 15:15:50 -04:00
Chris Wright b56e4b857c mac80211: fix infinite loop in ieee80211_determine_chantype
Commit "3d9646d mac80211: fix channel selection bug" introduced a possible
infinite loop by moving the out target above the chandef_downgrade
while loop.  When we downgrade to NL80211_CHAN_WIDTH_20_NOHT, we jump
back up to re-run the while loop...indefinitely.  Replace goto with
break and carry on.  This may not be sufficient to connect to the AP,
but will at least keep the cpu from livelocking.  Thanks to Derek Atkins
as an extra pair of debugging eyes.

Cc: stable@kernel.org
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-31 21:15:36 +02:00
John W. Linville 704278ccb5 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Conflicts:
	net/bluetooth/hci_core.c
2013-07-31 15:11:50 -04:00
Pablo Neira e1ee3673a8 genetlink: fix usage of NLM_F_EXCL or NLM_F_REPLACE
Currently, it is not possible to use neither NLM_F_EXCL nor
NLM_F_REPLACE from genetlink. This is due to this checking in
genl_family_rcv_msg:

	if (nlh->nlmsg_flags & NLM_F_DUMP)

NLM_F_DUMP is NLM_F_MATCH|NLM_F_ROOT. Thus, if NLM_F_EXCL or
NLM_F_REPLACE flag is set, genetlink believes that you're
requesting a dump and it calls the .dumpit callback.

The solution that I propose is to refine this checking to
make it stricter:

	if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP)

And given the combination NLM_F_REPLACE and NLM_F_EXCL does
not make sense to me, it removes the ambiguity.

There was a patch that tried to fix this some time ago (0ab03c2
netlink: test for all flags of the NLM_F_DUMP composite) but it
tried to resolve this ambiguity in *all* existing netlink subsystems,
not only genetlink. That patch was reverted since it broke iproute2,
which is using NLM_F_ROOT to request the dump of the routing cache.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-30 16:43:19 -07:00
Dan Carpenter ff862a4668 af_key: more info leaks in pfkey messages
This is inspired by a5cc68f3d6 "af_key: fix info leaks in notify
messages".  There are some struct members which don't get initialized
and could disclose small amounts of private information.

Acked-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-30 16:26:16 -07:00
Samuel Ortiz 9ea7187c53 NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD
Loading a firmware into a target is typically called firmware
download, not firmware upload. So we rename the netlink API to
NFC_CMD_FW_DOWNLOAD in order to avoid any terminology confusion from
userspace.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-07-31 01:19:43 +02:00
Johannes Berg c319d50bfc nl80211: fix another nl80211_fam.attrbuf race
This is similar to the race Linus had reported, but in this case
it's an older bug: nl80211_prepare_wdev_dump() uses the wiphy
index in cb->args[0] as it is and thus parses the message over
and over again instead of just once because 0 is the first valid
wiphy index. Similar code in nl80211_testmode_dump() correctly
offsets the wiphy_index by 1, do that here as well.

Cc: stable@vger.kernel.org
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-30 22:40:34 +02:00
David S. Miller a0db856a95 net_sched: Fix stack info leak in cbq_dump_wrr().
Make sure the reserved fields, and padding (if any), are
fully initialized.

Based upon a patch by Dan Carpenter and feedback from
Joe Perches.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-30 00:16:21 -07:00
Johan Hedberg 53e21fbc28 Bluetooth: Fix calling request callback more than once
In certain circumstances, such as an HCI driver using __hci_cmd_sync_ev
with HCI_EV_CMD_COMPLETE as the expected completion event there is the
chance that hci_event_packet will call hci_req_cmd_complete twice (once
for the explicitly looked after event and another time in the actual
handler of cmd_complete).

In the case of __hci_cmd_sync_ev this introduces a race where the first
call wakes up the blocking __hci_cmd_sync_ev and lets it complete.
However, by the time that a second __hci_cmd_sync_ev call is already in
progress the second hci_req_cmd_complete call (from the previous
operation) will wake up the blocking function prematurely and cause it
to fail, as witnessed by the following log:

[  639.232195] hci_rx_work: hci0 Event packet
[  639.232201] hci_req_cmd_complete: opcode 0xfc8e status 0x00
[  639.232205] hci_sent_cmd_data: hci0 opcode 0xfc8e
[  639.232210] hci_req_sync_complete: hci0 result 0x00
[  639.232220] hci_cmd_complete_evt: hci0 opcode 0xfc8e
[  639.232225] hci_req_cmd_complete: opcode 0xfc8e status 0x00
[  639.232228] __hci_cmd_sync_ev: hci0 end: err 0
[  639.232234] __hci_cmd_sync_ev: hci0
[  639.232238] hci_req_add_ev: hci0 opcode 0xfc8e plen 250
[  639.232242] hci_prepare_cmd: skb len 253
[  639.232246] hci_req_run: length 1
[  639.232250] hci_sent_cmd_data: hci0 opcode 0xfc8e
[  639.232255] hci_req_sync_complete: hci0 result 0x00
[  639.232266] hci_cmd_work: hci0 cmd_cnt 1 cmd queued 1
[  639.232271] __hci_cmd_sync_ev: hci0 end: err 0
[  639.232276] Bluetooth: hci0 sending Intel patch command (0xfc8e) failed (-61)

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-07-29 12:28:04 +01:00
Johan Hedberg 3f8e2d75c1 Bluetooth: Fix HCI init for BlueFRITZ! devices
None of the BlueFRITZ! devices with manufacurer ID 31 (AVM Berlin)
support HCI_Read_Local_Supported_Commands. It is safe to use the
manufacturer ID (instead of e.g. a USB ID specific quirk) because the
company never created any newer controllers.

< HCI Command: Read Local Supported Comm.. (0x04|0x0002) plen 0 [hci0] 0.210014
> HCI Event: Command Status (0x0f) plen 4 [hci0] 0.217361
      Read Local Supported Commands (0x04|0x0002) ncmd 1
        Status: Unknown HCI Command (0x01)

Reported-by: Jörg Esser <jackfritt@boh.de>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Tested-by: Jörg Esser <jackfritt@boh.de>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-07-29 12:12:27 +01:00
Stanislaw Gruszka c74f2b2678 genetlink: release cb_lock before requesting additional module
Requesting external module with cb_lock taken can result in
the deadlock like showed below:

[ 2458.111347] Showing all locks held in the system:
[ 2458.111347] 1 lock held by NetworkManager/582:
[ 2458.111347]  #0:  (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40
[ 2458.111347] 1 lock held by modprobe/603:
[ 2458.111347]  #0:  (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30

[ 2461.579457] SysRq : Show Blocked State
[ 2461.580103]   task                        PC stack   pid father
[ 2461.580103] NetworkManager  D ffff880034b84500  4040   582      1 0x00000080
[ 2461.580103]  ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8
[ 2461.580103]  ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff
[ 2461.580103]  ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700
[ 2461.580103] Call Trace:
[ 2461.580103]  [<ffffffff817355f9>] schedule+0x29/0x70
[ 2461.580103]  [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360
[ 2461.580103]  [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140
[ 2461.580103]  [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50
[ 2461.580103]  [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2461.580103]  [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170
[ 2461.580103]  [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20
[ 2461.580103]  [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210
[ 2461.580103]  [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170
[ 2461.580103]  [<ffffffff81095cc3>] __request_module+0x1b3/0x370
[ 2461.580103]  [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 2461.580103]  [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190
[ 2461.580103]  [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0
[ 2461.580103]  [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0
[ 2461.580103]  [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0
[ 2461.580103]  [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0
[ 2461.580103]  [<ffffffff8162bc88>] genl_rcv+0x28/0x40
[ 2461.580103]  [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190
[ 2461.580103]  [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750
[ 2461.580103]  [<ffffffff815db849>] sock_sendmsg+0x99/0xd0
[ 2461.580103]  [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
[ 2461.580103]  [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350
[ 2461.580103]  [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0
[ 2461.580103]  [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50
[ 2461.580103]  [<ffffffff810218b9>] ? sched_clock+0x9/0x10
[ 2461.580103]  [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80
[ 2461.580103]  [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100
[ 2461.580103]  [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10
[ 2461.580103]  [<ffffffff810bb58f>] ? local_clock+0x5f/0x70
[ 2461.580103]  [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0
[ 2461.580103]  [<ffffffff8120fec9>] ? fget_light+0xf9/0x510
[ 2461.580103]  [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510
[ 2461.580103]  [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80
[ 2461.580103]  [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20
[ 2461.580103]  [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
[ 2461.580103] modprobe        D ffff88000f2c8000  4632   603    602 0x00000080
[ 2461.580103]  ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8
[ 2461.580103]  ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500
[ 2461.580103]  ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0
[ 2461.580103] Call Trace:
[ 2461.580103]  [<ffffffff817355f9>] schedule+0x29/0x70
[ 2461.580103]  [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0
[ 2461.580103]  [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0
[ 2461.580103]  [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20
[ 2461.580103]  [<ffffffff8173492d>] ? down_write+0x9d/0xb2
[ 2461.580103]  [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30
[ 2461.580103]  [<ffffffff8162baa5>] genl_lock_all+0x15/0x30
[ 2461.580103]  [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0
[ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
[ 2461.580103]  [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80
[ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
[ 2461.580103]  [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211]
[ 2461.580103]  [<ffffffffa01dc000>] ? 0xffffffffa01dbfff
[ 2461.580103]  [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211]
[ 2461.580103]  [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0
[ 2461.580103]  [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50
[ 2461.580103]  [<ffffffff810f75af>] load_module+0x1c6f/0x27f0
[ 2461.580103]  [<ffffffff810f2c90>] ? store_uevent+0x40/0x40
[ 2461.580103]  [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0
[ 2461.580103]  [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b
[ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1

Problem start to happen after adding net-pf-16-proto-16-family-nl80211
alias name to cfg80211 module by below commit (though that commit
itself is perfectly fine):

commit fb4e156886
Author: Marcel Holtmann <marcel@holtmann.org>
Date:   Sun Apr 28 16:22:06 2013 -0700

    nl80211: Add generic netlink module alias for cfg80211/nl80211

Reported-and-tested-by: Jeff Layton <jlayton@redhat.com>
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Reviewed-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-27 22:19:20 -07:00
Francesco Fusco 555445cd11 neigh: prevent overflowing params in /proc/sys/net/ipv4/neigh/
Without this patch, the fields app_solicit, gc_thresh1, gc_thresh2,
gc_thresh3, proxy_qlen, ucast_solicit, mcast_solicit could have
assumed negative values when setting large numbers.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-26 14:22:10 -07:00
Gustavo Padovan fcee337704 Bluetooth: Fix race between hci_register_dev() and hci_dev_open()
If hci_dev_open() is called after hci_register_dev() added the device to
the hci_dev_list but before the workqueue are created we could run into a
NULL pointer dereference (see below).

This bug is very unlikely to happen, systems using bluetoothd to
manage their bluetooth devices will never see this happen.

BUG: unable to handle kernel NULL pointer dereference
0100
IP: [<ffffffff81077502>] __queue_work+0x32/0x3d0
(...)
Call Trace:
 [<ffffffff81077be5>] queue_work_on+0x45/0x50
 [<ffffffffa016e8ff>] hci_req_run+0xbf/0xf0 [bluetooth]
 [<ffffffffa01709b0>] ? hci_init2_req+0x720/0x720 [bluetooth]
 [<ffffffffa016ea06>] __hci_req_sync+0xd6/0x1c0 [bluetooth]
 [<ffffffff8108ee10>] ? try_to_wake_up+0x2b0/0x2b0
 [<ffffffff8150e3f0>] ? usb_autopm_put_interface+0x30/0x40
 [<ffffffffa016fad5>] hci_dev_open+0x275/0x2e0 [bluetooth]
 [<ffffffffa0182752>] hci_sock_ioctl+0x1f2/0x3f0 [bluetooth]
 [<ffffffff815c6050>] sock_do_ioctl+0x30/0x70
 [<ffffffff815c75f9>] sock_ioctl+0x79/0x2f0
 [<ffffffff811a8046>] do_vfs_ioctl+0x96/0x560
 [<ffffffff811a85a1>] SyS_ioctl+0x91/0xb0
 [<ffffffff816d989d>] system_call_fastpath+0x1a/0x1f

Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-07-25 19:52:36 +01:00
Jaganath Kanakkassery da9910ac4a Bluetooth: Fix invalid length check in l2cap_information_rsp()
The length check is invalid since the length varies with type of
info response.

This was introduced by the commit cb3b3152b2

Because of this, l2cap info rsp is not handled and command reject is sent.

> ACL data: handle 11 flags 0x02 dlen 16
        L2CAP(s): Info rsp: type 2 result 0
          Extended feature mask 0x00b8
            Enhanced Retransmission mode
            Streaming mode
            FCS Option
            Fixed Channels
< ACL data: handle 11 flags 0x00 dlen 10
        L2CAP(s): Command rej: reason 0
          Command not understood

Cc: stable@vger.kernel.org
Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: Chan-Yeol Park <chanyeol.park@samsung.com>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-07-25 19:52:30 +01:00
Arik Nemtsov 23df0b7319 regulatory: use correct regulatory initiator on wiphy register
The current regdomain was not always set by the core. This causes
cards with a custom regulatory domain to ignore user initiated changes
if done before the card was registered.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-25 09:52:46 +02:00
David S. Miller 1df86b4cee Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
This is another batch of fixes intended for the 3.11 stream.  FWIW,
this is the first request with fixes from the mac80211 and iwlwifi
trees as well.

Regarding the mac80211 bits, Johannes says:

"Here I have a fix for RSSI thresholds in mesh, two minstrel fixes from
Felix, an nl80211 fix from Michal and four various fixes I did myself."

As for the iwlwifi bits, Johannes says:

"Here I have a fix for debugfs directory creation (causing a spurious
error message), two scanning fixes from David Spinadel, an LED fix and
two patches related to a BA session problem that eventually caused
firmware crashes from Emmanuel and a small BT fix for older devices as
well as a workaround for a firmware problem with APs with very small
beacon intervals from myself."

Along with those:

Arend van Spriel addresses a lock-up and a NULL pointer dereference
in brcmfmac.

Daniel Drake fixes an unhandled interrupt during device tear down
in mwifiex.

Larry Finger corrects a wil6210 build error.

Oleksij Rempel fixes two ath9k_htc problems related to keeping the
driver and firmware in sync.

Solomon Peachy gives us a cw1200 fix to avoid an oops in monitor mode.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-25 00:06:59 -07:00
Florian Fainelli deceb4c062 net: fix comment above build_skb()
build_skb() specifies that the data parameter must come from a kmalloc'd
area, this is only true if frag_size equals 0, because then build_skb()
will use kzsize(data) to figure out the actual data size. Update the
comment to reflect that special condition.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-24 17:59:07 -07:00
Hannes Frederic Sowa 905a6f96a1 ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup
Otherwise we end up dereferencing the already freed net->ipv6.mrt pointer
which leads to a panic (from Srivatsa S. Bhat):

BUG: unable to handle kernel paging request at ffff882018552020
IP: [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
PGD 290a067 PUD 207ffe0067 PMD 207ff1d067 PTE 8000002018552060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
+ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp ioatdma dca mlx4_core be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 7 Comm: kworker/u33:0 Not tainted 3.11.0-rc1-ea45e-a #4
Hardware name: IBM  -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012
Workqueue: netns cleanup_net
task: ffff8810393641c0 ti: ffff881039366000 task.ti: ffff881039366000
RIP: 0010:[<ffffffffa0366b02>]  [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
RSP: 0018:ffff881039367bd8  EFLAGS: 00010286
RAX: ffff881039367fd8 RBX: ffff882018552000 RCX: dead000000200200
RDX: 0000000000000000 RSI: ffff881039367b68 RDI: ffff881039367b68
RBP: ffff881039367bf8 R08: ffff881039367b68 R09: 2222222222222222
R10: 2222222222222222 R11: 2222222222222222 R12: ffff882015a7a040
R13: ffff882014eb89c0 R14: ffff8820289e2800 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88103fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff882018552020 CR3: 0000000001c0b000 CR4: 00000000000407f0
Stack:
 ffff881039367c18 ffff882014eb89c0 ffff882015e28c00 0000000000000000
 ffff881039367c18 ffffffffa034d9d1 ffff8820289e2800 ffff882014eb89c0
 ffff881039367c58 ffffffff815bdecb ffffffff815bddf2 ffff882014eb89c0
Call Trace:
 [<ffffffffa034d9d1>] rawv6_close+0x21/0x40 [ipv6]
 [<ffffffff815bdecb>] inet_release+0xfb/0x220
 [<ffffffff815bddf2>] ? inet_release+0x22/0x220
 [<ffffffffa032686f>] inet6_release+0x3f/0x50 [ipv6]
 [<ffffffff8151c1d9>] sock_release+0x29/0xa0
 [<ffffffff81525520>] sk_release_kernel+0x30/0x70
 [<ffffffffa034f14b>] icmpv6_sk_exit+0x3b/0x80 [ipv6]
 [<ffffffff8152fff9>] ops_exit_list+0x39/0x60
 [<ffffffff815306fb>] cleanup_net+0xfb/0x1a0
 [<ffffffff81075e3a>] process_one_work+0x1da/0x610
 [<ffffffff81075dc9>] ? process_one_work+0x169/0x610
 [<ffffffff81076390>] worker_thread+0x120/0x3a0
 [<ffffffff81076270>] ? process_one_work+0x610/0x610
 [<ffffffff8107da2e>] kthread+0xee/0x100
 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8162a99c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70
Code: 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 4c 8b 67 30 49 89 fd e8 db 3c 1e e1 49 8b 9c 24 90 08 00 00 48 85 db 74 06 <4c> 39 6b 20 74 20 bb f3 ff ff ff e8 8e 3c 1e e1 89 d8 4c 8b 65
RIP  [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6]
 RSP <ffff881039367bd8>
CR2: ffff882018552020

Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-24 17:02:13 -07:00
Jerry Snitselaar f585a991e1 fib_trie: potential out of bounds access in trie_show_stats()
With the <= max condition in the for loop, it will be always go 1
element further than needed. If the condition for the while loop is
never met, then max is MAX_STAT_DEPTH, and for loop will walk off the
end of nodesizes[].

Signed-off-by: Jerry Snitselaar <jerry.snitselaar@oracle.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-24 16:05:14 -07:00
John W. Linville 18e1ccb6ca Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-07-24 11:50:38 -04:00
Stanislaw Gruszka cd34f647a7 mac80211: fix monitor interface suspend crash regression
My commit:

commit 12e7f51702
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date:   Thu Feb 28 10:55:26 2013 +0100

    mac80211: cleanup generic suspend/resume procedures

removed check for deleting MONITOR and AP_VLAN when suspend. That can
cause a crash (i.e. in iwlagn_mac_remove_interface()) since we remove
interface in the driver that we did not add before.

Reference:
http://marc.info/?l=linux-kernel&m=137391815113860&w=2

Bisected-by: Ortwin Glück <odi@odi.ch>
Reported-and-tested-by: Ortwin Glück <odi@odi.ch>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-23 14:02:08 +02:00
David S. Miller 7bd04bcf91 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes for your net tree,
they are:

* Fix potential NULL dereference in the socket match if revision 0
  is used, from Eric Dumazet.

* Fix missing expectation NAT initialization that results in dumping
  the NAT part via ctnetlink, thus leading to problems in expectation
  synchronization through conntrackd, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-22 14:32:39 -07:00
Chun-Yeow Yeoh 40d18ff959 mac80211: prevent the buffering or frame transmission to non-assoc mesh STA
This patch is intended to avoid the buffering to non-assoc mesh STA
and also to avoid the triggering of frame to non-assoc mesh STA which
could cause kernel panic in specific hw.

One of the examples, is kernel panic happens to ath9k if user space
inserts the mesh STA and not proceed with the SAE and AMPE, and later
the same mesh STA is detected again. The sta_state of the mesh STA remains
at IEEE80211_STA_NONE and if the ieee80211_sta_ps_deliver_wakeup is called
and subsequently the ath_tx_aggr_wakeup, the kernel panic due to
ath_tx_node_init is not called before to initialize the require data
structures.

This issue is reported by Cedric Voncken before.
http://www.spinics.net/lists/linux-wireless/msg106342.html

[<831ea6b4>] ath_tx_aggr_wakeup+0x44/0xcc [ath9k]
[<83084214>] ieee80211_sta_ps_deliver_wakeup+0xb8/0x208 [mac80211]
[<830b9824>] ieee80211_mps_sta_status_update+0x94/0x108 [mac80211]
[<83099398>] ieee80211_sta_ps_transition+0xc94/0x34d8 [mac80211]
[<8022399c>] nf_iterate+0x98/0x104
[<8309bb60>] ieee80211_sta_ps_transition+0x345c/0x34d8 [mac80211]

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-22 15:32:47 +02:00
Linus Torvalds 3be542d464 NFS client bugfixes for 3.11
- Fix a regression against NFSv4 FreeBSD servers when creating a new file
 - Fix another regression in rpc_client_register()
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJR6ZKMAAoJEGcL54qWCgDyQX8P/19LKLNKcL+y2zVGjLbXMTq0
 TpyWdBO0ux7QcqnPEDg+Jpvu62IowYiKTtaSOXtHb5BNjQMBo2RKw3B0eMBoCp/z
 6gHmQRD2hMgqwBxBwHceV+dNwueCUiZW7GqaaNh6/3bpGQefegdONnLEifuPogEu
 oZmEuiVrGDfITEF7D4k5+shXCQN4eNH0LFuIQo4XXdCqmK6PwvOsidZ7YwHVC3Mg
 /Jzda2YsCxHj8kPi1xb9skPPAn6g4kdfYfyr/xSY7IviPixrkg/nEEK1b8xHU81e
 a0dd0Yx5kq6fR8LsBvQCHdj2m7doHM15jf5Np5G7VnnaWEjB2y+QftkxWc9lCNU3
 t2fr9YVD7ZG/GGNSFePHAHmBY0OqDB1Htp4vcwEQfzX6CAR3Hel82WVvut62Z6m4
 G5qHjwdqUFhmRN//SWlDpEqSn+pbeCvPhQS60ayN0TLivRsscm/I4yA75odAnn9b
 4su1IcUpqeJGeV6yDyMUqbx4kYZFyCZg/DNkThXiTKOs47A7ogSS9ev2fTB/V+jd
 rroNHNd/U508ze9D6D4ai9vR78uUp4wKNSSBZMCkBtNh0uSApOTgyGVhertB1EKS
 vgAr4T1tc+9t+0qg1Sb+hbKyBM/KaS5zUrPn+APHPoBXPh5PSVBzeNJkpxHRw/V0
 ZxkEgSQKLZSXYb5ab770
 =XE+7
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix a regression against NFSv4 FreeBSD servers when creating a new
   file
 - Fix another regression in rpc_client_register()

* tag 'nfs-for-3.11-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix a regression against the FreeBSD server
  SUNRPC: Fix another issue with rpc_client_register()
2013-07-20 10:48:24 -07:00
Eric Dumazet 1faabf2aab bridge: do not call setup_timer() multiple times
commit 9f00b2e7cf ("bridge: only expire the mdb entry when query is
received") added a nasty bug as an active timer can be reinitialized.

setup_timer() must be done once, no matter how many time mod_timer()
is called. br_multicast_new_group() is the right place to do this.

Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Diagnosed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-19 22:12:42 -07:00
Michal Tesar 651e92716a sysctl net: Keep tcp_syn_retries inside the boundary
Limit the min/max value passed to the
/proc/sys/net/ipv4/tcp_syn_retries.

Signed-off-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-19 17:36:03 -07:00
Frederic Danis 7427b370e0 NFC: Fix NCI over SPI build
kbuild test robot found following error:

     net/built-in.o: In function `nci_spi_send':
  >> spi.c:(.text+0x19a76f): undefined reference to `crc_ccitt'

Add CRC_CCITT module to Kconfig to fix it

Reported-by: kbuild test robot.
Signed-off-by: Frederic Danis <frederic.danis@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-07-19 16:55:26 +02:00
Linus Torvalds ecb2cf1a6b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A couple interesting SKB fragment handling fixes, plus the usual small
  bits here and there:

   1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from
      Tim Gardner.

   2) Get rid of a stupid reimplementation on "%*phC" in our sysfs MAC
      address printing helper.

   3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the
      device can't do checksumming offloads then it shouldn't say it can
      do SG either.  From Haiyang Zhang.

   4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens.

   5) Don't leak DMA mappings on mapping failures, from Neil Horman.

   6) We need to reset the transport header of SKBs in ipv4 before we
      attempt to perform early socket demux, just like ipv6 does.  From
      Eric Dumazet.

   7) Add missing locking on vxlan device removal, from Stephen
      Hemminger.

   8) xen-netfront has to make two passes over an SKB to prepare it for
      transfer.  One pass calculates the number of slots needed, the
      second massages the SKB and fills the slots.  Unfortunately, the
      first pass doesn't calculate the number of slots properly so we
      can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't
      work out so well.  Fix from Jan Beulich with help and discussion
      with several others.

   9) Fix a similar problem in tun and macvtap, which have to split up
      scatter-gather elements at PAGE_SIZE boundaries.  Don't do
      zerocopy if it would result in a > MAX_SKB_FRAGS skb.  Fixes from
      Jason Wang.

  10) On receive, once we've decoded the VLAN state completely, clear
      skb->vlan_tci.  Otherwise demuxed tunnels underneath can trigger
      the VLAN code again, corrupting the packet.  Fix from Eric
      Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  vlan: fix a race in egress prio management
  vlan: mask vlan prio bits
  macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
  tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
  pkt_sched: sch_qfq: remove a source of high packet delay/jitter
  xen-netfront: pull on receive skb may need to happen earlier
  vxlan: add necessary locking on device removal
  hyperv: Fix the NETIF_F_SG flag setting in netvsc
  net: Fix sysfs_format_mac() code duplication.
  be2net: Fix to avoid hardware workaround when not needed
  macvtap: do not assume 802.1Q when send vlan packets
  macvtap: fix the missing ret value of TUNSETQUEUE
  ipv4: set transport header earlier
  mlx5 core: Fix __udivdi3 when compiling for 32 bit arches
  bgmac: add dependency to phylib
  net/irda: fixed style issues in irlan_eth
  ethtool: fixed trailing statements in ethtool
  ndisc: bool initializations should use true and false
  atl1e: unmap partially mapped skb on dma error and free skb
2013-07-18 20:08:47 -07:00
Eric Dumazet 3e3aac4975 vlan: fix a race in egress prio management
egress_priority_map[] hash table updates are protected by rtnl,
and we never remove elements until device is dismantled.

We have to make sure that before inserting an new element in hash table,
all its fields are committed to memory or else another cpu could
find corrupt values and crash.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-18 13:07:13 -07:00
Eric Dumazet d4b812dea4 vlan: mask vlan prio bits
In commit 48cc32d38a
("vlan: don't deliver frames for unknown vlans to protocols")
Florian made sure we set pkt_type to PACKET_OTHERHOST
if the vlan id is set and we could find a vlan device for this
particular id.

But we also have a problem if prio bits are set.

Steinar reported an issue on a router receiving IPv6 frames with a
vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
because skb->vlan_tci is set.

Forwarded frame is completely corrupted : We can see (8100:4000)
being inserted in the middle of IPv6 source address :

16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
       0x0000:  0000 0029 8000 c7c3 7103 0001 a0ae e651
       0x0010:  0000 0000 ccce 0b00 0000 0000 1011 1213
       0x0020:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
       0x0030:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233

It seems we are not really ready to properly cope with this right now.

We can probably do better in future kernels :
vlan_get_ingress_priority() should be a netdev property instead of
a per vlan_dev one.

For stable kernels, lets clear vlan_tci to fix the bugs.

Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-18 13:05:23 -07:00
Paolo Valente 87f40dd6ce pkt_sched: sch_qfq: remove a source of high packet delay/jitter
QFQ+ inherits from QFQ a design choice that may cause a high packet
delay/jitter and a severe short-term unfairness. As QFQ, QFQ+ uses a
special quantity, the system virtual time, to track the service
provided by the ideal system it approximates. When a packet is
dequeued, this quantity must be incremented by the size of the packet,
divided by the sum of the weights of the aggregates waiting to be
served. Tracking this sum correctly is a non-trivial task, because, to
preserve tight service guarantees, the decrement of this sum must be
delayed in a special way [1]: this sum can be decremented only after
that its value would decrease also in the ideal system approximated by
QFQ+. For efficiency, QFQ+ keeps track only of the 'instantaneous'
weight sum, increased and decreased immediately as the weight of an
aggregate changes, and as an aggregate is created or destroyed (which,
in its turn, happens as a consequence of some class being
created/destroyed/changed). However, to avoid the problems caused to
service guarantees by these immediate decreases, QFQ+ increments the
system virtual time using the maximum value allowed for the weight
sum, 2^10, in place of the dynamic, instantaneous value. The
instantaneous value of the weight sum is used only to check whether a
request of weight increase or a class creation can be satisfied.

Unfortunately, the problems caused by this choice are worse than the
temporary degradation of the service guarantees that may occur, when a
class is changed or destroyed, if the instantaneous value of the
weight sum was used to update the system virtual time. In fact, the
fraction of the link bandwidth guaranteed by QFQ+ to each aggregate is
equal to the ratio between the weight of the aggregate and the sum of
the weights of the competing aggregates. The packet delay guaranteed
to the aggregate is instead inversely proportional to the guaranteed
bandwidth. By using the maximum possible value, and not the actual
value of the weight sum, QFQ+ provides each aggregate with the worst
possible service guarantees, and not with service guarantees related
to the actual set of competing aggregates. To see the consequences of
this fact, consider the following simple example.

Suppose that only the following aggregates are backlogged, i.e., that
only the classes in the following aggregates have packets to transmit:
one aggregate with weight 10, say A, and ten aggregates with weight 1,
say B1, B2, ..., B10. In particular, suppose that these aggregates are
always backlogged. Given the weight distribution, the smoothest and
fairest service order would be:
A B1 A B2 A B3 A B4 A B5 A B6 A B7 A B8 A B9 A B10 A B1 A B2 ...

QFQ+ would provide exactly this optimal service if it used the actual
value for the weight sum instead of the maximum possible value, i.e.,
11 instead of 2^10. In contrast, since QFQ+ uses the latter value, it
serves aggregates as follows (easy to prove and to reproduce
experimentally):
A B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 A A A A A A A A A A B1 B2 ... B10 A A ...

By replacing 10 with N in the above example, and by increasing N, one
can increase at will the maximum packet delay and the jitter
experienced by the classes in aggregate A.

This patch addresses this issue by just using the above
'instantaneous' value of the weight sum, instead of the maximum
possible value, when updating the system virtual time.  After the
instantaneous weight sum is decreased, QFQ+ may deviate from the ideal
service for a time interval in the order of the time to serve one
maximum-size packet for each backlogged class. The worst-case extent
of the deviation exhibited by QFQ+ during this time interval [1] is
basically the same as of the deviation described above (but, without
this patch, QFQ+ suffers from such a deviation all the time). Finally,
this patch modifies the comment to the function qfq_slot_insert, to
make it coherent with the fact that the weight sum used by QFQ+ can
now be lower than the maximum possible value.

[1] P. Valente, "Extending WF2Q+ to support a dynamic traffic mix",
Proceedings of AAA-IDEA'05, June 2005.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-18 13:02:00 -07:00
Linus Torvalds 3f334c2081 Merge branch 'cpuinit_phase2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull phase two of __cpuinit removal from Paul Gortmaker:
 "With the __cpuinit infrastructure removed earlier, this group of
  commits only removes the function/data tagging that was done with the
  various (now no-op) __cpuinit related prefixes.

  Now that the dust has settled with yesterday's v3.11-rc1, there
  hopefully shouldn't be any new users leaking back in tree, but I think
  we can leave the harmless no-op stubs there for a release as a
  courtesy to those who still have out of tree stuff and weren't paying
  attention.

  Although the commits are against the recent tag to allow for minor
  context refreshes for things like yesterday's v3.11-rc1~ slab content,
  the patches have been largely unchanged for weeks, aside from such
  trivial updates.

  For detail junkies, the largely boring and mostly irrelevant history
  of the patches can be viewed at:

    http://git.kernel.org/cgit/linux/kernel/git/paulg/cpuinit-delete.git

  If nothing else, I guess it does at least demonstrate the level of
  involvement required to shepherd such a treewide change to completion.

  This is the same repository of patches that has been applied to the
  end of the daily linux-next branches for the past several weeks"

* 'cpuinit_phase2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (28 commits)
  block: delete __cpuinit usage from all block files
  drivers: delete __cpuinit usage from all remaining drivers files
  kernel: delete __cpuinit usage from all core kernel files
  rcu: delete __cpuinit usage from all rcu files
  net: delete __cpuinit usage from all net files
  acpi: delete __cpuinit usage from all acpi files
  hwmon: delete __cpuinit usage from all hwmon files
  cpufreq: delete __cpuinit usage from all cpufreq files
  clocksource+irqchip: delete __cpuinit usage from all related files
  x86: delete __cpuinit usage from all x86 files
  score: delete __cpuinit usage from all score files
  xtensa: delete __cpuinit usage from all xtensa files
  openrisc: delete __cpuinit usage from all openrisc files
  m32r: delete __cpuinit usage from all m32r files
  hexagon: delete __cpuinit usage from all hexagon files
  frv: delete __cpuinit usage from all frv files
  cris: delete __cpuinit usage from all cris files
  metag: delete __cpuinit usage from all metag files
  tile: delete __cpuinit usage from all tile files
  sh: delete __cpuinit usage from all sh files
  ...
2013-07-18 10:50:26 -07:00
Linus Torvalds 61f98b0fca Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from Bruce Fields:
 "Just three minor bugfixes"

* 'for-3.11' of git://linux-nfs.org/~bfields/linux:
  svcrdma: underflow issue in decode_write_list()
  nfsd4: fix minorversion support interface
  lockd: protect nlm_blocked access in nlmsvc_retry_blocked
2013-07-17 13:43:55 -07:00
David S. Miller ae8e9c5a1a net: Fix sysfs_format_mac() code duplication.
It's just a duplicate implementation of "%*phC".  Thanks to Joe
Perches for showing that we had exactly this support in the
lib/vsprintf.c code already.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16 17:09:22 -07:00
Eric Dumazet 21d1196a35 ipv4: set transport header earlier
commit 45f00f99d6 ("ipv4: tcp: clean up tcp_v4_early_demux()") added a
performance regression for non GRO traffic, basically disabling
IP early demux.

IPv6 stack resets transport header in ip6_rcv() before calling
IP early demux in ip6_rcv_finish(), while IPv4 does this only in
ip_local_deliver_finish(), _after_ IP early demux.

GRO traffic happened to enable IP early demux because transport header
is also set in inet_gro_receive()

Instead of reverting the faulty commit, we can make IPv4/IPv6 behave the
same : transport_header should be set in ip_rcv() instead of
ip_local_deliver_finish()

ip_local_deliver_finish() can also use skb_network_header_len() which is
faster than ip_hdrlen()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16 12:59:28 -07:00
Dragos Foianu b6a82dd233 net/irda: fixed style issues in irlan_eth
Applied fixes suggested by checkpatch.pl

Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16 12:16:03 -07:00
Dragos Foianu 8a73125c36 ethtool: fixed trailing statements in ethtool
Applied fixes suggested by checkpatch.pl.

Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16 12:14:51 -07:00
Daniel Baluta f2f79cca13 ndisc: bool initializations should use true and false
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16 12:13:31 -07:00
Felix Fietkau 5c9fc93bc9 mac80211/minstrel: fix NULL pointer dereference issue
When priv_sta == NULL, mi->prev_sample is dereferenced too early. Move
the assignment further down, after the rate_control_send_low call.

Reported-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-16 17:48:14 +03:00
Johannes Berg 6b0f32745d mac80211: fix duplicate retransmission detection
The duplicate retransmission detection code in mac80211
erroneously attempts to do the check for every frame,
even frames that don't have a sequence control field or
that don't use it (QoS-Null frames.)

This is problematic because it causes the code to access
data beyond the end of the SKB and depending on the data
there will drop packets erroneously.

Correct the code to not do duplicate detection for such
frames.

I found this error while testing AP powersave, it lead
to retransmitted PS-Poll frames being dropped entirely
as the data beyond the end of the SKB was always zero.

Cc: stable@vger.kernel.org [all versions]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-16 09:55:59 +03:00
Chun-Yeow Yeoh 83374fe9de nl80211: fix the setting of RSSI threshold value for mesh
RSSI threshold value used for mesh peering should be in
negative value. After range checks to mesh parameters is
introduced, this is not allowed. Fix this.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2013-07-16 09:55:58 +03:00
Johannes Berg e13bae4f80 mac80211: fix ethtool stats for non-station interfaces
As reported in https://bugzilla.kernel.org/show_bug.cgi?id=60514,
the station loop never initialises 'sinfo' and therefore adds up
a stack values, leaking stack information (the number of times it
adds values is easily obtained another way.)

Fix this by initialising the sinfo for each station to add.

Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2013-07-16 09:55:57 +03:00
Felix Fietkau 1cd1585739 mac80211/minstrel_ht: fix cck rate sampling
The CCK group needs special treatment to set the right flags and rate
index. Add this missing check to prevent setting broken rates for tx
packets.

Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2013-07-16 09:55:56 +03:00
Johannes Berg f77b86d7d3 regulatory: add missing rtnl locking
restore_regulatory_settings() requires the RTNL to be held,
add the missing locking in reg_timeout_work().

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-16 09:55:55 +03:00
Johannes Berg 923a0e7dee cfg80211: fix bugs in new SME implementation
When splitting the SME implementation from the MLME code,
I introduced a few bugs:
 * association failures no longer sent a connect-failure event
 * getting disassociated from the AP caused deauth to be sent
   but state wasn't cleaned up, leading to warnings
 * authentication failures weren't cleaned up properly, causing
   new connection attempts to warn and fail

Fix these bugs.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2013-07-16 09:55:54 +03:00
Michal Kazior a0ec570f4f nl80211: fix mgmt tx status and testmode reporting for netns
These two events were sent to the default network
namespace.

This caused AP mode in a non-default netns to not
work correctly. Mgmt tx status was multicasted to
a different (default) netns instead of the one the
AP was in.

Cc: stable@vger.kernel.org
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-07-16 09:55:37 +03:00
Dan Carpenter b2781e1021 svcrdma: underflow issue in decode_write_list()
My static checker marks everything from ntohl() as untrusted and it
complains we could have an underflow problem doing:

	return (u32 *)&ary->wc_array[nchunks];

Also on 32 bit systems the upper bound check could overflow.

Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-15 11:46:23 -04:00
Trond Myklebust 1540c5d3cb SUNRPC: Fix another issue with rpc_client_register()
Fix the error pathway if rpcauth_create() fails.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-15 10:15:54 -04:00
Eric Dumazet baf60efa58 netfilter: xt_socket: fix broken v0 support
commit 681f130f39 ("netfilter: xt_socket: add XT_SOCKET_NOWILDCARD
flag") added a potential NULL dereference if an old iptables package
uses v0 of the match.

Fix this by removing the test on @info in fast path.

IPv6 can remove the test as well, as it uses v1 or v2.

Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-15 11:15:21 +02:00
Pablo Neira Ayuso f09eca8db0 netfilter: ctnetlink: fix incorrect NAT expectation dumping
nf_ct_expect_alloc leaves unset the expectation NAT fields. However,
ctnetlink_exp_dump_expect expects them to be zeroed in case they are
not used, which may not be the case. This results in dumping the NAT
tuple of the expectation when it should not.

Fix it by zeroing the NAT fields of the expectation.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-15 11:14:51 +02:00
Paul Gortmaker 013dbb325b net: delete __cpuinit usage from all net files
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

This removes all the net/* uses of the __cpuinit macros
from all C files.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-14 19:36:58 -04:00
Linus Torvalds 41d9884c44 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs stuff from Al Viro:
 "O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including
  making simple_lookup() usable for filesystems with non-NULL s_d_op,
  which allows us to get rid of quite a bit of ugliness"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  sunrpc: now we can just set ->s_d_op
  cgroup: we can use simple_lookup() now
  efivarfs: we can use simple_lookup() now
  make simple_lookup() usable for filesystems that set ->s_d_op
  configfs: don't open-code d_alloc_name()
  __rpc_lookup_create_exclusive: pass string instead of qstr
  rpc_create_*_dir: don't bother with qstr
  llist: llist_add() can use llist_add_batch()
  llist: fix/simplify llist_add() and llist_add_batch()
  fput: turn "list_head delayed_fput_list" into llist_head
  fs/file_table.c:fput(): add comment
  Safer ABI for O_TMPFILE
2013-07-14 11:42:26 -07:00
Al Viro dae3794fd6 sunrpc: now we can just set ->s_d_op
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-14 17:55:39 +04:00
Al Viro d3db90b0a4 __rpc_lookup_create_exclusive: pass string instead of qstr
... and use d_hash_and_lookup() instead of open-coding it, for fsck sake...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-14 17:09:57 +04:00
Al Viro a95e691f9c rpc_create_*_dir: don't bother with qstr
just pass the name

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-14 17:02:28 +04:00
Linus Torvalds be9c6d9169 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Just a bunch of small fixes and tidy ups:

   1) Finish the "busy_poll" renames, from Eliezer Tamir.

   2) Fix RCU stalls in IFB driver, from Ding Tianhong.

   3) Linearize buffers properly in tun/macvtap zerocopy code.

   4) Don't crash on rmmod in vxlan, from Pravin B Shelar.

   5) Spinlock used before init in alx driver, from Maarten Lankhorst.

   6) A sparse warning fix in bnx2x broke TSO checksums, fix from Dmitry
      Kravkov.

   7) Dummy and ifb driver load failure paths can oops, fixes from Tan
      Xiaojun and Ding Tianhong.

   8) Correct MTU calculations in IP tunnels, from Alexander Duyck.

   9) Account all TCP retransmits in SNMP stats properly, from Yuchung
      Cheng.

  10) atl1e and via-rhine do not handle DMA mapping failures properly,
      from Neil Horman.

  11) Various equal-cost multipath route fixes in ipv6 from Hannes
      Frederic Sowa"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
  ipv6: only static routes qualify for equal cost multipathing
  via-rhine: fix dma mapping errors
  atl1e: fix dma mapping warnings
  tcp: account all retransmit failures
  usb/net/r815x: fix cast to restricted __le32
  usb/net/r8152: fix integer overflow in expression
  net: access page->private by using page_private
  net: strict_strtoul is obsolete, use kstrtoul instead
  drivers/net/ieee802154: don't use devm_pinctrl_get_select_default() in probe
  drivers/net/ethernet/cadence: don't use devm_pinctrl_get_select_default() in probe
  drivers/net/can/c_can: don't use devm_pinctrl_get_select_default() in probe
  net/usb: add relative mii functions for r815x
  net/tipc: use %*phC to dump small buffers in hex form
  qlcnic: Adding Maintainers.
  gre: Fix MTU sizing check for gretap tunnels
  pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts
  pkt_sched: sch_qfq: improve efficiency of make_eligible
  gso: Update tunnel segmentation to support Tx checksum offload
  inet: fix spacing in assignment
  ifb: fix oops when loading the ifb failed
  ...
2013-07-13 17:42:22 -07:00
Hannes Frederic Sowa 307f2fb95e ipv6: only static routes qualify for equal cost multipathing
Static routes in this case are non-expiring routes which did not get
configured by autoconf or by icmpv6 redirects.

To make sure we actually get an ecmp route while searching for the first
one in this fib6_node's leafs, also make sure it matches the ecmp route
assumptions.

v2:
a) Removed RTF_EXPIRE check in dst.from chain. The check of RTF_ADDRCONF
   already ensures that this route, even if added again without
   RTF_EXPIRES (in case of a RA announcement with infinite timeout),
   does not cause the rt6i_nsiblings logic to go wrong if a later RA
   updates the expiration time later.

v3:
a) Allow RTF_EXPIRES routes to enter the ecmp route set. We have to do so,
   because an pmtu event could update the RTF_EXPIRES flag and we would
   not count this route, if another route joins this set. We now filter
   only for RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC, which are flags that
   don't get changed after rt6_info construction.

Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-12 16:29:54 -07:00
Yuchung Cheng 24ab6bec80 tcp: account all retransmit failures
Change snmp RETRANSFAILS stat to include timeout retransmit failures
in addition to other loss recoveries.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-12 16:15:56 -07:00
Sunghan Suh 40dadff265 net: access page->private by using page_private
Signed-off-by: Sunghan Suh <sunghan.suh@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-12 16:10:34 -07:00
“Cosmin 92338dc2fb net: strict_strtoul is obsolete, use kstrtoul instead
patch found using checkpatch.pl

Signed-off-by: Cosmin Stanescu <cosmin90stanescu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-12 16:09:14 -07:00
Andy Shevchenko d77e41e127 net/tipc: use %*phC to dump small buffers in hex form
Instead of passing each byte by stack let's use nice specifier for that.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11 17:03:36 -07:00
Alexander Duyck 8c91e162e0 gre: Fix MTU sizing check for gretap tunnels
This change fixes an MTU sizing issue seen with gretap tunnels when non-gso
packets are sent from the interface.

In my case I was able to reproduce the issue by simply sending a ping of
1421 bytes with the gretap interface created on a device with a standard
1500 mtu.

This fix is based on the fact that the tunnel mtu is already adjusted by
dev->hard_header_len so it would make sense that any packets being compared
against that mtu should also be adjusted by hard_header_len and the tunnel
header instead of just the tunnel header.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reported-by: Cong Wang <amwang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11 16:12:03 -07:00
Paolo Valente 88d4f419a4 pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts
This patch removes the forward declaration of qfq_update_agg_ts, by moving
the definition of the function above its first call. This patch also
removes a useless forward declaration of qfq_schedule_agg.

Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11 13:01:07 -07:00
Paolo Valente 87f1369d6e pkt_sched: sch_qfq: improve efficiency of make_eligible
In make_eligible, a mask is used to decide which groups must become eligible:
the i-th group becomes eligible only if the i-th bit of the mask (from the
right) is set. The mask is computed by left-shifting a 1 by a given number of
places, and decrementing the result.  The shift is performed on a ULL to avoid
problems in case the number of places to shift is higher than 31.  On a 32-bit
machine, this is more costly than working on an UL. This patch replaces such a
costly operation with two cheaper branches.

The trick is based on the following fact: in case of a shift of at least 32
places, the resulting mask has at least the 32 less significant bits set,
whereas the total number of groups is lower than 32.  As a consequence, in this
case it is enough to just set the 32 less significant bits of the mask with a
cheaper ~0UL. In the other case, the shift can be safely performed on a UL.

Reported-by: David S. Miller <davem@davemloft.net>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11 13:01:07 -07:00
Alexander Duyck cdbaa0bb26 gso: Update tunnel segmentation to support Tx checksum offload
This change makes it so that the GRE and VXLAN tunnels can make use of Tx
checksum offload support provided by some drivers via the hw_enc_features.
Without this fix enabling GSO means sacrificing Tx checksum offload and
this actually leads to a performance regression as shown below:

            Utilization
            Send
Throughput  local         GSO
10^6bits/s  % S           state
  6276.51   8.39          enabled
  7123.52   8.42          disabled

To resolve this it was necessary to address two items.  First
netif_skb_features needed to be updated so that it would correctly handle
the Trans Ether Bridging protocol without impacting the need to check for
Q-in-Q tagging.  To do this it was necessary to update harmonize_features
so that it used skb_network_protocol instead of just using the outer
protocol.

Second it was necessary to update the GRE and UDP tunnel segmentation
offloads so that they would reset the encapsulation bit and inner header
offsets after the offload was complete.

As a result of this change I have seen the following results on a interface
with Tx checksum enabled for encapsulated frames:

            Utilization
            Send
Throughput  local         GSO
10^6bits/s  % S           state
  7123.52   8.42          disabled
  8321.75   5.43          enabled

v2: Instead of replacing refrence to skb->protocol with
    skb_network_protocol just replace the protocol reference in
    harmonize_features to allow for double VLAN tag checks.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11 12:18:49 -07:00
Linus Torvalds 1466b77a7b NFS client updates for Linux 3.11 (part 2)
Highlights include:
 - Fix an_rpc pipefs regression that causes a deadlock on mount
 - Readdir optimisations by Scott Mayhew and Jeff Layton
 - clean up the rpc_pipefs dentry operation setup
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJR3vVIAAoJEGcL54qWCgDyBWEP/0blqSlJId4zZj4xDviRFqJ4
 93C7b/Vn7LrAcNCgDQsPkkzTwAX5yTB1H5eNtMuyggAdGj89d4n0jXgBniIMHmqI
 Pjrr/XMQ65NddehrO491N01iJSfP9wE3CizJodnAv4VxMRO3xqiJG85lcnoLOFea
 V1FnEFUu9oi8e93cQt2fe6KdmTu/SuRqlqR7WPGyTFgS26x1l8nkp2OQgulit5Up
 lWuaxg4xbKOdj1jfUDXZhWUnDtkFjxyGxnKR63aA2X1DEGCUTJ6gB3tAl9pvnUb2
 RTQF3GVj+Bm/E3gE6ULJvqOjhsgWYjLAZn6hDA3yNAIiFyV7aA6gwK4oKy/B47a6
 tFEN2O1EupWzCqGyHhTArk+oEBLfUv/EgFyo7+Y0YIFV4sQTu5RbaZ0nQ2geY6LA
 50q2GH57tkXTs859gtBPQgKzgRF1ulkF1FDY9EYQHyGiUbNxBfx+6/2OI04ubQt3
 1gKUmm9w1WVzYGmHcHbxsXPT53NtAnHXW4ExcMgpaZ1YOPuIILm78ZuAw78XB/dd
 mvXRtbhVt/gs7qZAQQPp1iHIv+vnJ0KgjO62gbuTIRftw5jwWrpWcfYMUUZrMnyM
 kn326z3f4gn/vSDZI7J4tOfG1Uc7eNy+cJxStjtiNWTs3UzuWJKzJH0rZnoNZdei
 xAkLhjIUEybAqIpXJuGH
 =NqQf
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull second set of NFS client updates from Trond Myklebust:
 "This mainly contains some small readdir optimisations that had
  dependencies on Al Viro's readdir rewrite.  There is also a fix for a
  nasty deadlock which surfaced earlier in this merge window.

  Highlights include:
   - Fix an_rpc pipefs regression that causes a deadlock on mount
   - Readdir optimisations by Scott Mayhew and Jeff Layton
   - clean up the rpc_pipefs dentry operation setup"

* tag 'nfs-for-3.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Fix a deadlock in rpc_client_register()
  rpc_pipe: rpc_dir_inode_operations can be static
  NFS: Allow nfs_updatepage to extend a write under additional circumstances
  NFS: Make nfs_readdir revalidate less often
  NFS: Make nfs_attribute_cache_expired() non-static
  rpc_pipe: set dentry operations at d_alloc time
  nfs: set verifier on existing dentries in nfs_prime_dcache
2013-07-11 12:11:35 -07:00
Camelia Groza 3b8ccd4473 inet: fix spacing in assignment
Found using checkpatch.pl

Signed-off-by: Camelia Groza <camelia.groza@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11 12:02:39 -07:00
Hannes Frederic Sowa afc154e978 ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREF
This is a follow-up patch to 3630d40067
("ipv6: rt6_check_neigh should successfully verify neigh if no NUD
information are available").

Since the removal of rt->n in rt6_info we can end up with a dst ==
NULL in rt6_check_neigh. In case the kernel is not compiled with
CONFIG_IPV6_ROUTER_PREF we should also select a route with unkown
NUD state but we must not avoid doing round robin selection on routes
with the same target. So introduce and pass down a boolean ``do_rr'' to
indicate when we should update rt->rr_ptr. As soon as no route is valid
we do backtracking and do a lookup on a higher level in the fib trie.

v2:
a) Improved rt6_check_neigh logic (no need to create neighbour there)
   and documented return values.

v3:
a) Introduce enum rt6_nud_state to get rid of the magic numbers
   (thanks to David Miller).
b) Update and shorten commit message a bit to actualy reflect
   the source.

Reported-by: Pierre Emeriaud <petrus.lt@gmail.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11 11:51:10 -07:00
Sasha Levin 110ecd69a9 9p: fix off by one causing access violations and memory corruption
p9_release_pages() would attempt to dereference one value past the end of
pages[]. This would cause the following crashes:

[ 6293.171817] BUG: unable to handle kernel paging request at ffff8807c96f3000
[ 6293.174146] IP: [<ffffffff8412793b>] p9_release_pages+0x3b/0x60
[ 6293.176447] PGD 79c5067 PUD 82c1e3067 PMD 82c197067 PTE 80000007c96f3060
[ 6293.180060] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 6293.180060] Modules linked in:
[ 6293.180060] CPU: 62 PID: 174043 Comm: modprobe Tainted: G        W    3.10.0-next-20130710-sasha #3954
[ 6293.180060] task: ffff8807b803b000 ti: ffff880787dde000 task.ti: ffff880787dde000
[ 6293.180060] RIP: 0010:[<ffffffff8412793b>]  [<ffffffff8412793b>] p9_release_pages+0x3b/0x60
[ 6293.214316] RSP: 0000:ffff880787ddfc28  EFLAGS: 00010202
[ 6293.214316] RAX: 0000000000000001 RBX: ffff8807c96f2ff8 RCX: 0000000000000000
[ 6293.222017] RDX: ffff8807b803b000 RSI: 0000000000000001 RDI: ffffea001c7e3d40
[ 6293.222017] RBP: ffff880787ddfc48 R08: 0000000000000000 R09: 0000000000000000
[ 6293.222017] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
[ 6293.222017] R13: 0000000000000001 R14: ffff8807cc50c070 R15: ffff8807cc50c070
[ 6293.222017] FS:  00007f572641d700(0000) GS:ffff8807f3600000(0000) knlGS:0000000000000000
[ 6293.256784] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6293.256784] CR2: ffff8807c96f3000 CR3: 00000007c8e81000 CR4: 00000000000006e0
[ 6293.256784] Stack:
[ 6293.256784]  ffff880787ddfcc8 ffff880787ddfcc8 0000000000000000 ffff880787ddfcc8
[ 6293.256784]  ffff880787ddfd48 ffffffff84128be8 ffff880700000002 0000000000000001
[ 6293.256784]  ffff8807b803b000 ffff880787ddfce0 0000100000000000 0000000000000000
[ 6293.256784] Call Trace:
[ 6293.256784]  [<ffffffff84128be8>] p9_virtio_zc_request+0x598/0x630
[ 6293.256784]  [<ffffffff8115c610>] ? wake_up_bit+0x40/0x40
[ 6293.256784]  [<ffffffff841209b1>] p9_client_zc_rpc+0x111/0x3a0
[ 6293.256784]  [<ffffffff81174b78>] ? sched_clock_cpu+0x108/0x120
[ 6293.256784]  [<ffffffff84122a21>] p9_client_read+0xe1/0x2c0
[ 6293.256784]  [<ffffffff81708a90>] v9fs_file_read+0x90/0xc0
[ 6293.256784]  [<ffffffff812bd073>] vfs_read+0xc3/0x130
[ 6293.256784]  [<ffffffff811a78bd>] ? trace_hardirqs_on+0xd/0x10
[ 6293.256784]  [<ffffffff812bd5a2>] SyS_read+0x62/0xa0
[ 6293.256784]  [<ffffffff841a1a00>] tracesys+0xdd/0xe2
[ 6293.256784] Code: 66 90 48 89 fb 41 89 f5 48 8b 3f 48 85 ff 74 29 85 f6 74 25 45 31 e4 66 0f 1f 84 00 00 00 00 00 e8 eb 14 12 fd 41 ff c4 49 63 c4 <48> 8b 3c c3 48 85 ff 74 05 45 39 e5 75 e7 48 83 c4 08 5b 41 5c
[ 6293.256784] RIP  [<ffffffff8412793b>] p9_release_pages+0x3b/0x60
[ 6293.256784]  RSP <ffff880787ddfc28>
[ 6293.256784] CR2: ffff8807c96f3000
[ 6293.256784] ---[ end trace 50822ee72cd360fc ]---

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11 11:36:02 -07:00
Linus Torvalds 19d2f8e0fb Second round of 9p patches for the 3.11 merge window.
Several of these patches were rebased in order to correct style issues.
 Only stylistic changes were made versus the patches which were in linux-next
 for two weeks.  The rebases have been in linux-next for 3 days and have
 passed my regressions.
 
 The bulk of these are RDMA fixes and improvements.  There's also some
 additions on the extended attributes front to support some additional
 namespaces and a new option for TCP to force allocation of mount requests
 from a priviledged port.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJR3rWXAAoJEDZk62b0Tg6xabIP/12I+SkQ57wRN03EQy5fqUdX
 gK/YMHKQ9QuDnZPBvrZ2lypesQNqVU0KINay6VEA86JG1gwzPyUd2MnpQ7F0vV3N
 XwVD54IoflV/M74xUnrgGWB8YxaPcdacQQ8yazX+mEgOgYGdWmDAl7FHmAkdKAFB
 gSl25f3PNJX1Rjay0dssNVXrVPXuJY/fZXKnNQZKtRwXffRWKsWHd8FU0Eq7F30A
 kNQB8tmMSfHBBjP+tzR0My6/kQ09jzHdtZOkH9IgVpNzqrd8tfy0l6tEvFypxqGT
 5oQFoxHHL/tUW05V0P3gYany2A7lEhSUifPKS6omqHO+vPlw+pDJw+xWlNq9fnDt
 8S8znqVuEHhvqRQW7zFdb9ac2MZi8CHHhC2wGIZ7GYjNG2q5XwE8b/QhdXQeFin7
 ibugvoW7+ZdcDewpQW27oO0g7B/8hRt8KC+1lc/8rITKIfGxbNJkGzTDl0F4Co7v
 IH7Ew5PHPe6ZiuU0QSdU+NBuvk8g8sWGxx04Xvzl3WicwOg7XvN3ivrKB9oN2U1x
 50KZRnYpwQQv/9AxyhroYU+Ufje8SF4v++zsq1eMzUcHsC/C73eatw2m764t+X4S
 8yMLrgqY1Nzif4nAMi/SDMnB/R1bXeuc8kXD9xT6XD9d2tf6e+zCHhQklVeC0tuK
 RiVRJqGrfanbKMnWIG0Y
 =n9rI
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull second round of 9p patches from Eric Van Hensbergen:
 "Several of these patches were rebased in order to correct style
  issues.  Only stylistic changes were made versus the patches which
  were in linux-next for two weeks.  The rebases have been in linux-next
  for 3 days and have passed my regressions.

  The bulk of these are RDMA fixes and improvements.  There's also some
  additions on the extended attributes front to support some additional
  namespaces and a new option for TCP to force allocation of mount
  requests from a priviledged port"

* tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: Remove the unused variable "err" in v9fs_vfs_getattr()
  9P: Add cancelled() to the transport functions.
  9P/RDMA: count posted buffers without a pending request
  9P/RDMA: Improve error handling in rdma_request
  9P/RDMA: Do not free req->rc in error handling in rdma_request()
  9P/RDMA: Use a semaphore to protect the RQ
  9P/RDMA: Protect against duplicate replies
  9P/RDMA: increase P9_RDMA_MAXSIZE to 1MB
  9pnet: refactor struct p9_fcall alloc code
  9P/RDMA: rdma_request() needs not allocate req->rc
  9P: Fix fcall allocation for rdma
  fs/9p: xattr: add trusted and security namespaces
  net/9p: add privport option to 9p tcp transport
2013-07-11 10:21:23 -07:00
Linus Torvalds 0ff08ba5d0 Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from Bruce Fields:
 "Changes this time include:

   - 4.1 enabled on the server by default: the last 4.1-specific issues
     I know of are fixed, so we're not going to find the rest of the
     bugs without more exposure.
   - Experimental support for NFSv4.2 MAC Labeling (to allow running
     selinux over NFS), from Dave Quigley.
   - Fixes for some delicate cache/upcall races that could cause rare
     server hangs; thanks to Neil Brown and Bodo Stroesser for extreme
     debugging persistence.
   - Fixes for some bugs found at the recent NFS bakeathon, mostly v4
     and v4.1-specific, but also a generic bug handling fragmented rpc
     calls"

* 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
  nfsd4: support minorversion 1 by default
  nfsd4: allow destroy_session over destroyed session
  svcrpc: fix failures to handle -1 uid's
  sunrpc: Don't schedule an upcall on a replaced cache entry.
  net/sunrpc: xpt_auth_cache should be ignored when expired.
  sunrpc/cache: ensure items removed from cache do not have pending upcalls.
  sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
  sunrpc/cache: remove races with queuing an upcall.
  nfsd4: return delegation immediately if lease fails
  nfsd4: do not throw away 4.1 lock state on last unlock
  nfsd4: delegation-based open reclaims should bypass permissions
  svcrpc: don't error out on small tcp fragment
  svcrpc: fix handling of too-short rpc's
  nfsd4: minor read_buf cleanup
  nfsd4: fix decoding of compounds across page boundaries
  nfsd4: clean up nfs4_open_delegation
  NFSD: Don't give out read delegations on creates
  nfsd4: allow client to send no cb_sec flavors
  nfsd4: fail attempts to request gss on the backchannel
  nfsd4: implement minimal SP4_MACH_CRED
  ...
2013-07-11 10:17:13 -07:00
Hannes Frederic Sowa 1eb4f75828 ipv6: in case of link failure remove route directly instead of letting it expire
We could end up expiring a route which is part of an ecmp route set. Doing
so would invalidate the rt->rt6i_nsiblings calculations and could provoke
the following panic:

[   80.144667] ------------[ cut here ]------------
[   80.145172] kernel BUG at net/ipv6/ip6_fib.c:733!
[   80.145172] invalid opcode: 0000 [#1] SMP
[   80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables
+snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk
[   80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118
[   80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000
[   80.145172] RIP: 0010:[<ffffffff815f3b5d>]  [<ffffffff815f3b5d>] fib6_add+0x75d/0x830
[   80.145172] RSP: 0018:ffff880118771798  EFLAGS: 00010202
[   80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480
[   80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738
[   80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001
[   80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680
[   80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90
[   80.145172] FS:  00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000
[   80.145172] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0
[   80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   80.145172] Stack:
[   80.145172]  0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280
[   80.145172]  0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa
[   80.145172]  0000000000000000 0000000000000000 0000000000000000 ffff88011350f680
[   80.145172] Call Trace:
[   80.145172]  [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90
[   80.145172]  [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70
[   80.145172]  [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40
[   80.145172]  [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0
[   80.145172]  [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30
[   80.145172]  [<ffffffff81616077>] fib6_rule_action+0xd7/0x210
[   80.145172]  [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30
[   80.145172]  [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140
[   80.145172]  [<ffffffff81616374>] fib6_rule_lookup+0x44/0x80
[   80.145172]  [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30
[   80.145172]  [<ffffffff815edea3>] ip6_route_output+0x73/0xb0
[   80.145172]  [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0
[   80.145172]  [<ffffffff813007b1>] ? list_del+0x11/0x40
[   80.145172]  [<ffffffff81082a4c>] ? remove_wait_queue+0x3c/0x50
[   80.145172]  [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0
[   80.145172]  [<ffffffff815fda77>] rawv6_sendmsg+0x267/0xc20
[   80.145172]  [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0
[   80.145172]  [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30
[   80.145172]  [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0
[   80.145172]  [<ffffffff81524a68>] SYSC_sendto+0x128/0x180
[   80.145172]  [<ffffffff8109825c>] ? update_curr+0xec/0x170
[   80.145172]  [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10
[   80.145172]  [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0
[   80.145172]  [<ffffffff8152509e>] SyS_sendto+0xe/0x10
[   80.145172]  [<ffffffff8164efd9>] system_call_fastpath+0x16/0x1b
[   80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff <0f> 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53
[   80.145172] RIP  [<ffffffff815f3b5d>] fib6_add+0x75d/0x830
[   80.145172]  RSP <ffff880118771798>
[   80.387413] ---[ end trace 02f20b7a8b81ed95 ]---
[   80.390154] Kernel panic - not syncing: Fatal exception in interrupt

Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10 19:45:39 -07:00
Eliezer Tamir 64b0dc517e net: rename busy poll socket op and globals
Rename LL_SO to BUSY_POLL_SO
Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll}
Fix up users of these variables.
Fix documentation for sysctl.

a patch for the socket.7  man page will follow separately,
because of limitations of my mail setup.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10 17:08:27 -07:00
Eliezer Tamir 8b80cda536 net: rename ll methods to busy-poll
Rename ndo_ll_poll to ndo_busy_poll.
Rename sk_mark_ll to sk_mark_napi_id.
Rename skb_mark_ll to skb_mark_napi_id.
Correct all useres of these functions.
Update comments and defines  in include/net/busy_poll.h

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10 17:08:27 -07:00
Eliezer Tamir 076bb0c82a net: rename include/net/ll_poll.h to include/net/busy_poll.h
Rename the file and correct all the places where it is included.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10 17:08:27 -07:00
Trond Myklebust eeee245268 SUNRPC: Fix a deadlock in rpc_client_register()
Commit 384816051c (SUNRPC: fix races on
PipeFS MOUNT notifications) introduces a regression when we call
rpc_setup_pipedir() with RPCSEC_GSS as the auth flavour.

By calling rpcauth_create() while holding the sn->pipefs_sb_lock, we
end up deadlocking in gss_pipes_dentries_create_net().
Fix is to register the client and release the mutex before calling
rpcauth_create().

Reported-by: Weston Andros Adamson <dros@netapp.com>
Tested-by: Weston Andros Adamson <dros@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: <stable@vger.kernel.org> # : 3848160: SUNRPC: fix races on PipeFS MOUNT
Cc: <stable@vger.kernel.org> # : e73f4cc: SUNRPC: split client creation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-10 15:58:55 -04:00
Fengguang Wu 4f8568cb52 rpc_pipe: rpc_dir_inode_operations can be static
Hi Jeff,

FYI, there are new sparse warnings show up in

tree:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git nfs-for-next
head:   296afe1f58d55fd56ed85daaafafcfee39f59ece
commit: 76fa666579 [2/5] rpc_pipe: set dentry operations at d_alloc time

>> net/sunrpc/rpc_pipe.c:496:31: sparse: symbol 'rpc_dir_inode_operations' was not declared. Should it be static?

Please consider folding the attached diff :-)

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-09 21:35:27 -04:00
Linus Torvalds 496322bc91 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "This is a re-do of the net-next pull request for the current merge
  window.  The only difference from the one I made the other day is that
  this has Eliezer's interface renames and the timeout handling changes
  made based upon your feedback, as well as a few bug fixes that have
  trickeled in.

  Highlights:

   1) Low latency device polling, eliminating the cost of interrupt
      handling and context switches.  Allows direct polling of a network
      device from socket operations, such as recvmsg() and poll().

      Currently ixgbe, mlx4, and bnx2x support this feature.

      Full high level description, performance numbers, and design in
      commit 0a4db187a9 ("Merge branch 'll_poll'")

      From Eliezer Tamir.

   2) With the routing cache removed, ip_check_mc_rcu() gets exercised
      more than ever before in the case where we have lots of multicast
      addresses.  Use a hash table instead of a simple linked list, from
      Eric Dumazet.

   3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from
      Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
      Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

   4) Support reporting the TUN device persist flag to userspace, from
      Pavel Emelyanov.

   5) Allow controlling network device VF link state using netlink, from
      Rony Efraim.

   6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

   7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
      Daniel Borkmann and Eric Dumazet.

   8) Allow controlling of TCP quickack behavior on a per-route basis,
      from Cong Wang.

   9) Several bug fixes and improvements to vxlan from Stephen
      Hemminger, Pravin B Shelar, and Mike Rapoport.  In particular,
      support receiving on multiple UDP ports.

  10) Major cleanups, particular in the area of debugging and cookie
      lifetime handline, to the SCTP protocol code.  From Daniel
      Borkmann.

  11) Allow packets to cross network namespaces when traversing tunnel
      devices.  From Nicolas Dichtel.

  12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
      manner akin to how we monitor real network traffic via ptype_all.
      From Daniel Borkmann.

  13) Several bug fixes and improvements for the new alx device driver,
      from Johannes Berg.

  14) Fix scalability issues in the netem packet scheduler's time queue,
      by using an rbtree.  From Eric Dumazet.

  15) Several bug fixes in TCP loss recovery handling, from Yuchung
      Cheng.

  16) Add support for GSO segmentation of MPLS packets, from Simon
      Horman.

  17) Make network notifiers have a real data type for the opaque
      pointer that's passed into them.  Use this to properly handle
      network device flag changes in arp_netdev_event().  From Jiri
      Pirko and Timo Teräs.

  18) Convert several drivers over to module_pci_driver(), from Peter
      Huewe.

  19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
      O(1) calculation instead.  From Eric Dumazet.

  20) Support setting of explicit tunnel peer addresses in ipv6, just
      like ipv4.  From Nicolas Dichtel.

  21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

  22) Prevent a single high rate flow from overruning an individual cpu
      during RX packet processing via selective flow shedding.  From
      Willem de Bruijn.

  23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
      Dumazet.

  24) Don't just drop GSO packets which are above the TBF scheduler's
      burst limit, chop them up so they are in-bounds instead.  Also
      from Eric Dumazet.

  25) VLAN offloads are missed when configured on top of a bridge, fix
      from Vlad Yasevich.

  26) Support IPV6 in ping sockets.  From Lorenzo Colitti.

  27) Receive flow steering targets should be updated at poll() time
      too, from David Majnemer.

  28) Fix several corner case regressions in PMTU/redirect handling due
      to the routing cache removal, from Timo Teräs.

  29) We have to be mindful of ipv4 mapped ipv6 sockets in
      upd_v6_push_pending_frames().  From Hannes Frederic Sowa.

  30) Fix L2TP sequence number handling bugs, from James Chapman."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
  drivers/net: caif: fix wrong rtnl_is_locked() usage
  drivers/net: enic: release rtnl_lock on error-path
  vhost-net: fix use-after-free in vhost_net_flush
  net: mv643xx_eth: do not use port number as platform device id
  net: sctp: confirm route during forward progress
  virtio_net: fix race in RX VQ processing
  virtio: support unlocked queue poll
  net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
  Documentation: Fix references to defunct linux-net@vger.kernel.org
  net/fs: change busy poll time accounting
  net: rename low latency sockets functions to busy poll
  bridge: fix some kernel warning in multicast timer
  sfc: Fix memory leak when discarding scattered packets
  sit: fix tunnel update via netlink
  dt:net:stmmac: Add dt specific phy reset callback support.
  dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
  dt:net:stmmac: Allocate platform data only if its NULL.
  net:stmmac: fix memleak in the open method
  ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
  net: ipv6: fix wrong ping_v6_sendmsg return value
  ...
2013-07-09 18:24:39 -07:00
Jeff Layton 76fa666579 rpc_pipe: set dentry operations at d_alloc time
Currently the way these get set is a little convoluted. If the dentry is
allocated via lookup from userland, then it gets set by simple_lookup.
If it gets allocated when the kernel is populating the directory, then
it gets set via __rpc_lookup_create_exclusive, which has to check
whether they might already be set. Between both of these, this ensures
that all dentries have their d_op pointer set.

Instead of doing that, just have them set at d_alloc time by pointing
sb->s_d_op at them. With that change, we no longer want the lookup op
to set them, so we must move to using our own lookup routine.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-09 17:16:39 -04:00
Linus Torvalds 899dd38885 Grab bag of little fixes and enhancements:
* optional security enhancements
   * fix path coverage in MAINTAINERS
   * switch to using most used protocol and transport as default
   * clean up buffer dumps in trace code
 
 Held off on RDMA patches as they need to be cleaned up a bit, but
 will try to get the cleaned, checked, and pushed by mid-week.
 
 (attempt 2, hopefully this one won't screw up the history)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJR2iZUAAoJEDZk62b0Tg6xsfQP/i3cYmkpf58lb++WoWDohQdh
 iH34P6Tv+5AKcF5SViBFDyXsdkE0D/Ixzl/E6jTsx+6OTSCA0eIw4OYyvPQpzFyp
 1+RqnTyEq6v2SQaGZKW7k7NyXDiRhVypXBupuNq8eZpYKS8B3cKdnQ/WFSAXcxQ1
 sbKWKUWnnqIZYnRNqNK4LTxz9cbLovXIQOYBhn0F+NoAFinC1ZQrWzuUVbct880i
 cSoukTivmJHb37Pt9AKluPc6GGa6XHXkomQewh0WOnBJ/9FR3YUHeRXR04cnAWAL
 zpGKagnIhYWtdaTJQXCzO2OMCQakhf9FiBWYGjfM9ysyzS4LDp1cknlyUPox97xF
 o9o6MfFF161c8+uC/RpK8Lp3vG6CFPEcMVxp73BydNNI4/1hzbfCs3WcGdpkvAg/
 rRik/zyN7l3jEwtvU03Y1WEV79Ep/Q8cvPqi4XZB2L1XYi43fT4yze6zMM/cmQ5K
 DLTbFxtN5ILWg2LjQergORyn66WqQjproPqcgd9tVrvJ30Z5KPjIh+CBVcYPWp4V
 hxD0Pd0yTySpxUqV4Qx/BMZdWiD1wuBgidKgl+jNldTaCSFtPqQ52LYmTWNpneI1
 lcc3SMFRNRhqWMOFhzpcX1xGuXKD5eRiOrQ+L1ecFxGFYVndY5nwa6Pn8gUrfGHW
 LEBmADtMsv2YQW2Kahk2
 =ktVU
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-3.11-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9p update from Eric Van Hensbergen:
 "Grab bag of little fixes and enhancements:
  - optional security enhancements
  - fix path coverage in MAINTAINERS
  - switch to using most used protocol and transport as default
  - clean up buffer dumps in trace code

  Held off on RDMA patches as they need to be cleaned up a bit, but will
  try to get the cleaned, checked, and pushed by mid-week"

* tag 'for-linus-3.11-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: Add rest of 9p files to MAINTAINERS entry
  9p: trace: use %*ph to dump buffer
  net/9p: Handle error in zero copy request correctly for 9p2000.u
  net/9p: Use virtio transpart as the default transport
  net/9p: Make 9P2000.L the default protocol for 9p file system
2013-07-09 12:55:13 -07:00
Daniel Borkmann 8c2f414ad1 net: sctp: confirm route during forward progress
This fix has been proposed originally by Vlad Yasevich. He says:

  When SCTP makes forward progress (receives a SACK that acks new chunks,
  renegs, or answeres 0-window probes) or when HB-ACK arrives, mark
  the route as confirmed so we don't unnecessarily send NUD probes.

Having a simple SCTP client/server that exchange data chunks every 1sec,
without this patch ARP requests are sent periodically every 40-60sec.
With this fix applied, an ARP request is only done once right at the
"session" beginning. Also, when clearing the related ARP cache entry
manually during the session, a new request is correctly done. I have
only "backported" this to net-next and tested that it works, so full
credit goes to Vlad.

Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-09 12:49:56 -07:00
Linus Torvalds 9a5889ae1c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "There is some follow-on RBD cleanup after the last window's code drop,
  a series from Yan fixing multi-mds behavior in cephfs, and then a
  sprinkling of bug fixes all around.  Some warnings, sleeping while
  atomic, a null dereference, and cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (36 commits)
  libceph: fix invalid unsigned->signed conversion for timespec encoding
  libceph: call r_unsafe_callback when unsafe reply is received
  ceph: fix race between cap issue and revoke
  ceph: fix cap revoke race
  ceph: fix pending vmtruncate race
  ceph: avoid accessing invalid memory
  libceph: Fix NULL pointer dereference in auth client code
  ceph: Reconstruct the func ceph_reserve_caps.
  ceph: Free mdsc if alloc mdsc->mdsmap failed.
  ceph: remove sb_start/end_write in ceph_aio_write.
  ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.
  ceph: fix sleeping function called from invalid context.
  ceph: move inode to proper flushing list when auth MDS changes
  rbd: fix a couple warnings
  ceph: clear migrate seq when MDS restarts
  ceph: check migrate seq before changing auth cap
  ceph: fix race between page writeback and truncate
  ceph: reset iov_len when discarding cap release messages
  ceph: fix cap release race
  libceph: fix truncate size calculation
  ...
2013-07-09 12:39:10 -07:00
Linus Torvalds be0c5d8c0b NFS client updates for Linux 3.11
Feature highlights include:
 - Add basic client support for NFSv4.2
 - Add basic client support for Labeled NFS (selinux for NFSv4.2)
 - Fix the use of credentials in NFSv4.1 stateful operations, and
   add support for NFSv4.1 state protection.
 
 Bugfix highlights:
 - Fix another NFSv4 open state recovery race
 - Fix an NFSv4.1 back channel session regression
 - Various rpc_pipefs races
 - Fix another issue with NFSv3 auth negotiation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJR2vsSAAoJEGcL54qWCgDyWBIP/AqlpBBAblxbNQ1Bl/0m1Pdb
 iKH961qgM4U1BzK0svGtHTZqkovpm4o/VbkbKBT5mQ4g6SbbsJ/AsS1plCyfnIZi
 bdnKNJyj6zg0NsAkJ3vKWqd4BTaP+icdSfEIlRKQxAPESewN7b5B3OWgY4KdYmnk
 q5BP25anC1ryxVycSY67ux8S2IKXVSRZeCZv+RO21rvZ2G0bV5y7t8Om28ztxEnU
 RKrHgQHgaaktR7i8QVO0sbiWq3iqLa3GPkUvFLwWGr8PQJtTkYY0QwYSrsV3N4rY
 hYpMRUZFHpZ8UG5YvBT6xyOy/XaGwMGKSfZjB9/YG4QVju+tTy50U1JbTil5PEWY
 GHWYF68aurIeUkXrhSv8AVnOnhir0mISx5ou/SV7p0QoAZ92V6kq+LkPrW520qlc
 z8ILh3j28pN3ZUCIEArcaZhYCt48uO2hwBi5TqevQyyGRsXFGbN1moD5jvHkllft
 Fi0XGuCBdvhrzFRZcsEl+PDq7fT8lXUK2BHe8oR5jz9PhUp+jpEl9m/eg3RsjJjN
 DuxsHye2U4chScdnRtLBQvpFtdINvWX/Gy8Bi7kdE5tsQySvOa+rdwuBc7h88PHC
 +4xI2iX3z4O1+GpsAe/T9+pjW689jEilS+eVDRVEGl6yHGn9q8PYOayjPjwbJHxS
 R2mLTRhKu1DKguTzO13f
 =wGjn
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Feature highlights include:
   - Add basic client support for NFSv4.2
   - Add basic client support for Labeled NFS (selinux for NFSv4.2)
   - Fix the use of credentials in NFSv4.1 stateful operations, and add
     support for NFSv4.1 state protection.

  Bugfix highlights:
   - Fix another NFSv4 open state recovery race
   - Fix an NFSv4.1 back channel session regression
   - Various rpc_pipefs races
   - Fix another issue with NFSv3 auth negotiation

  Please note that Labeled NFS does require some additional support from
  the security subsystem.  The relevant changesets have all been
  reviewed and acked by James Morris."

* tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits)
  NFS: Set NFS_CS_MIGRATION for NFSv4 mounts
  NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs
  nfs: have NFSv3 try server-specified auth flavors in turn
  nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
  nfs: move server_authlist into nfs_try_mount_request
  nfs: refactor "need_mount" code out of nfs_try_mount
  SUNRPC: PipeFS MOUNT notification optimization for dying clients
  SUNRPC: split client creation routine into setup and registration
  SUNRPC: fix races on PipeFS UMOUNT notifications
  SUNRPC: fix races on PipeFS MOUNT notifications
  NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
  NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
  NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
  NFS: Improve legacy idmapping fallback
  NFSv4.1 end back channel session draining
  NFS: Apply v4.1 capabilities to v4.2
  NFSv4.1: Clean up layout segment comparison helper names
  NFSv4.1: layout segment comparison helpers should take 'const' parameters
  NFSv4: Move the DNS resolver into the NFSv4 module
  rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
  ...
2013-07-09 12:09:43 -07:00
Eliezer Tamir cbf55001b2 net: rename low latency sockets functions to busy poll
Rename functions in include/net/ll_poll.h to busy wait.
Clarify documentation about expected power use increase.
Rename POLL_LL to POLL_BUSY_LOOP.
Add need_resched() testing to poll/select busy loops.

Note, that in select and poll can_busy_poll is dynamic and is
updated continuously to reflect the existence of supported
sockets with valid queue information.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-08 19:25:45 -07:00
J. Bruce Fields 0979292bfa svcrpc: fix failures to handle -1 uid's
As of f025adf191 "sunrpc: Properly decode
kuids and kgids in RPC_AUTH_UNIX credentials" any rpc containing a -1
(0xffff) uid or gid would fail with a badcred error.

Commit afe3c3fd53 "svcrpc: fix failures to
handle -1 uid's and gid's" fixed part of the problem, but overlooked the
gid upcall--the kernel can request supplementary gid's for the -1 uid,
but mountd's attempt write a response will get -EINVAL.

Symptoms were nfsd failing to reply to the first attempt to use a newly
negotiated krb5 context.

Reported-by: Sven Geggus <lists@fuchsschwanzdomain.de>
Tested-by: Sven Geggus <lists@fuchsschwanzdomain.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-08 17:27:23 -04:00
Simon Derr 80b45261a0 9P: Add cancelled() to the transport functions.
RDMA needs to post a buffer for each incoming reply.
Hence it needs to keep count of these and needs to be
aware of whether a flushed request has received a reply
or not.

This patch adds the cancelled() callback to the transport modules.
It is called when RFLUSH has been received and that the corresponding
request will never receive a reply.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:18:18 -05:00
Simon Derr 1cff33069a 9P/RDMA: count posted buffers without a pending request
In rdma_request():

If an error occurs between posting the recv and the send,
there will be a reply context posted without a pending
request.
Since there is no way to "un-post" it, we remember it and
skip post_recv() for the next request.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:04:36 -05:00
Simon Derr 2f52d07cb7 9P/RDMA: Improve error handling in rdma_request
Most importantly:
- do not free the recv context (rpl_context) after a successful post_recv()
- but do free the send context (c) after a failed send.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:02:29 -05:00
Simon Derr b530e252e2 9P/RDMA: Do not free req->rc in error handling in rdma_request()
rdma_request() should never be in charge of freeing rc.

When an error occurs:
* Either the rc buffer has been recv_post()'ed.
  then kfree()'ing it certainly is a bad idea.
* Or is has not, and in that case req->rc still points to it,
  hence it needs not be freed.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:02:29 -05:00
Simon Derr fd453d0ed6 9P/RDMA: Use a semaphore to protect the RQ
The current code keeps track of the number of buffers posted in the RQ,
and will prevent it from overflowing. But it does so by simply dropping
post requests (And leaking memory in the process).
When this happens there will actually be too few buffers posted, and
soon the 9P server will complain about 'RNR retry counter exceeded'
errors.

Instead, use a semaphore, and block until the RQ is ready for another
buffer to be posted.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:02:29 -05:00
Simon Derr 47229ff85e 9P/RDMA: Protect against duplicate replies
A well-behaved server would not send twice the reply to a request.
But if it ever happens...
This additional check prevents the kernel from leaking memory
and possibly more nasty consequences in that unlikely event.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:02:28 -05:00
Simon Derr 3fcc62f4e8 9P/RDMA: increase P9_RDMA_MAXSIZE to 1MB
The current value is too low to get good performance.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:02:28 -05:00
Simon Derr 5387320d48 9pnet: refactor struct p9_fcall alloc code
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:02:27 -05:00
Simon Derr 17b6fd9d6d 9P/RDMA: rdma_request() needs not allocate req->rc
p9_tag_alloc() takes care of that.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:02:27 -05:00
Simon Derr ea071aa136 9P: Fix fcall allocation for rdma
The current code assumes that when a request in the request array
does have a tc, it also has a rc.

This is normally true, but not always : when using RDMA, req->rc
will temporarily be set to NULL after the request has been sent.
That is usually OK though, as when the reply arrives, req->rc will be
reassigned to a sane value before the request is recycled.

But there is a catch : if the request is flushed, the reply will never
arrive, and req->rc will be NULL, but not req->tc.

This patch fixes p9_tag_alloc to take this into account.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:02:26 -05:00
Jim Garlick 2f28c8b31d net/9p: add privport option to 9p tcp transport
If the privport option is specified, the tcp transport binds local
address to a reserved port before connecting to the 9p server.

In some cases when 9P AUTH cannot be implemented, this is better than
nothing.

Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 21:59:54 -05:00
Cong Wang c7e8e8a8f7 bridge: fix some kernel warning in multicast timer
Several people reported the warning: "kernel BUG at kernel/timer.c:729!"
and the stack trace is:

	#7 [ffff880214d25c10] mod_timer+501 at ffffffff8106d905
	#8 [ffff880214d25c50] br_multicast_del_pg.isra.20+261 at ffffffffa0731d25 [bridge]
	#9 [ffff880214d25c80] br_multicast_disable_port+88 at ffffffffa0732948 [bridge]
	#10 [ffff880214d25cb0] br_stp_disable_port+154 at ffffffffa072bcca [bridge]
	#11 [ffff880214d25ce8] br_device_event+520 at ffffffffa072a4e8 [bridge]
	#12 [ffff880214d25d18] notifier_call_chain+76 at ffffffff8164aafc
	#13 [ffff880214d25d50] raw_notifier_call_chain+22 at ffffffff810858f6
	#14 [ffff880214d25d60] call_netdevice_notifiers+45 at ffffffff81536aad
	#15 [ffff880214d25d80] dev_close_many+183 at ffffffff81536d17
	#16 [ffff880214d25dc0] rollback_registered_many+168 at ffffffff81537f68
	#17 [ffff880214d25de8] rollback_registered+49 at ffffffff81538101
	#18 [ffff880214d25e10] unregister_netdevice_queue+72 at ffffffff815390d8
	#19 [ffff880214d25e30] __tun_detach+272 at ffffffffa074c2f0 [tun]
	#20 [ffff880214d25e88] tun_chr_close+45 at ffffffffa074c4bd [tun]
	#21 [ffff880214d25ea8] __fput+225 at ffffffff8119b1f1
	#22 [ffff880214d25ef0] ____fput+14 at ffffffff8119b3fe
	#23 [ffff880214d25f00] task_work_run+159 at ffffffff8107cf7f
	#24 [ffff880214d25f30] do_notify_resume+97 at ffffffff810139e1
	#25 [ffff880214d25f50] int_signal+18 at ffffffff8164f292

this is due to I forgot to check if mp->timer is armed in
br_multicast_del_pg(). This bug is introduced by
commit 9f00b2e7cf (bridge: only expire the mdb entry
when query is received).

Same for __br_mdb_del().

Tested-by: poma <pomidorabelisima@gmail.com>
Reported-by: LiYonghua <809674045@qq.com>
Reported-by: Robert Hancock <hancockrwd@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-06 18:12:47 -07:00
Nicolas Dichtel 86bd68bfd7 sit: fix tunnel update via netlink
The device can stand in another netns, hence we need to do the lookup in netns
tunnel->net.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-04 14:55:47 -07:00
Linus Torvalds 3366dd9fa8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:
 - HID battery handling cleanup by David Herrmann
 - ELO 4000/4500 driver, which has been finally ported to be proper HID
   driver by Jiri Slaby
 - ps3remote driver functionality is now provided by generic sony
   driver, by Jiri Kosina
 - PS2/3 Buzz controllers support, by Colin Leitner
 - rework of wiimote driver including full extensions hotpluggin
   support, sub-device modularization and speaker support by David
   Herrmann

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (55 commits)
  HID: wacom: Intuos4 battery charging changes
  HID: i2c-hid: support sending HID output reports using the output register
  HID: kye: Add report fixup for Genius Gila Gaming mouse
  HID: wiimote: support Nintendo Wii U Pro Controller
  Input: make gamepad API keycodes more clear
  input: document gamepad API and add extra keycodes
  HID: explain out-of-range check better
  HID: fix false positive out of range values
  HID: wiimote: fix coccinelle warnings
  HID: roccat: check cdev_add return value
  HID: fold ps3remote driver into generic Sony driver
  HID: hyperv: convert alloc+memcpy to memdup
  HID: core: fix reporting of raw events
  HID: wiimote: discard invalid EXT data reports
  HID: wiimote: fix classic controller parsing
  HID: wiimote: init EXT/MP during device detection
  HID: wiimote: fix DRM debug-attr to correctly parse input
  HID: wiimote: add MP quirks
  HID: wiimote: remove old static extension support
  HID: wiimote: add "bboard_calib" attribute
  ...
2013-07-04 11:39:00 -07:00
Jiri Kosina db58316892 Merge branches 'for-3.11/battery', 'for-3.11/elo', 'for-3.11/holtek' and 'for-3.11/i2c-hid-fixed' into for-linus 2013-07-04 15:01:01 +02:00
Hannes Frederic Sowa 3630d40067 ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
After the removal of rt->n we do not create a neighbour entry at route
insertion time (rt6_bind_neighbour is gone). As long as no neighbour is
created because of "useful traffic" we skip this routing entry because
rt6_check_neigh cannot pick up a valid neighbour (neigh == NULL) and
thus returns false.

This change was introduced by commit
887c95cc1d ("ipv6: Complete neighbour
entry removal from dst_entry.")

To quote RFC4191:
"If the host has no information about the router's reachability, then
the host assumes the router is reachable."

and also:
"A host MUST NOT probe a router's reachability in the absence of useful
traffic that the host would have sent to the router if it were reachable."

So, just assume the router is reachable and let's rt6_probe do the
rest. We don't need to create a neighbour on route insertion time.

If we don't compile with CONFIG_IPV6_ROUTER_PREF (RFC4191 support)
a neighbour is only valid if its nud_state is NUD_VALID. I did not find
any references that we should probe the router on route insertion time
via the other RFCs. So skip this route in that case.

v2:
a) use IS_ENABLED instead of #ifdefs (thanks to Sergei Shtylyov)

Reported-by: Pierre Emeriaud <petrus.lt@gmail.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 17:50:51 -07:00
Lorenzo Colitti fbfe80c890 net: ipv6: fix wrong ping_v6_sendmsg return value
ping_v6_sendmsg currently returns 0 on success. It should return
the number of bytes written instead.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 17:42:05 -07:00
Lorenzo Colitti a1bdc45580 net: ipv6: add missing lock in ping_v6_sendmsg
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 17:40:58 -07:00
Linus Torvalds 7f0ef0267e Merge branch 'akpm' (updates from Andrew Morton)
Merge first patch-bomb from Andrew Morton:
 - various misc bits
 - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
   distracted.  There has been quite a bit of activity.
 - About half the MM queue
 - Some backlight bits
 - Various lib/ updates
 - checkpatch updates
 - zillions more little rtc patches
 - ptrace
 - signals
 - exec
 - procfs
 - rapidio
 - nbd
 - aoe
 - pps
 - memstick
 - tools/testing/selftests updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
  tools/testing/selftests: don't assume the x bit is set on scripts
  selftests: add .gitignore for kcmp
  selftests: fix clean target in kcmp Makefile
  selftests: add .gitignore for vm
  selftests: add hugetlbfstest
  self-test: fix make clean
  selftests: exit 1 on failure
  kernel/resource.c: remove the unneeded assignment in function __find_resource
  aio: fix wrong comment in aio_complete()
  drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
  drivers/memstick/host/r592.c: convert to module_pci_driver
  drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
  pps-gpio: add device-tree binding and support
  drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
  drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
  drivers/parport/share.c: use kzalloc
  Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
  aoe: update internal version number to v83
  aoe: update copyright date
  aoe: perform I/O completions in parallel
  ...
2013-07-03 17:12:13 -07:00
Eric Dumazet 36b7bfe09b netem: fix possible NULL deref in netem_dequeue()
commit aec0a40a6f ("netem: use rb tree to implement the time queue")
added a regression if a child qdisc is attached to netem, as we perform
a NULL dereference.

Fix this by adding a temporary variable to cache
netem_skb_cb(skb)->time_to_send.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 16:52:10 -07:00
Joe Stringer 4bc41b84e9 core: Copy inner_protocol in copy_skb_header()
inner_protocol was added to struct sk_buff in
0d89d2035f ("MPLS: Add limited GSO support"),
which is scheduled to be included in v3.11.

That patch did not update __copy_skb_header to copy the inner_protocol.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 16:52:10 -07:00
Kees Cook f170168b9a drivers: avoid parsing names as kthread_run() format strings
Calling kthread_run with a single name parameter causes it to be handled
as a format string. Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:41 -07:00
Kees Cook d8537548c9 drivers: avoid format strings in names passed to alloc_workqueue()
For the workqueue creation interfaces that do not expect format strings,
make sure they cannot accidently be parsed that way.  Additionally, clean
up calls made with a single parameter that would be handled as a format
string.  Many callers are passing potentially dynamic string content, so
use "%s" in those cases to avoid any potential accidents.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:41 -07:00
Jiang Liu 0ed5fd1385 mm: use totalram_pages instead of num_physpages at runtime
The global variable num_physpages is scheduled to be removed, so use
totalram_pages instead of num_physpages at runtime.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:35 -07:00
Yan, Zheng 61c5d6bf70 libceph: call r_unsafe_callback when unsafe reply is received
We can't use !req->r_sent to check if OSD request is sent for the
first time, this is because __cancel_request() zeros req->r_sent
when OSD map changes. Rather than adding a new variable to struct
ceph_osd_request to indicate if it's sent for the first time, We
can call the unsafe callback only when unsafe OSD reply is received.
If OSD's first reply is safe, just skip calling the unsafe callback.

The purpose of unsafe callback is adding unsafe request to a list,
so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
to wait for a write(2) that hasn't returned yet. So it's OK to add
request to the unsafe list when the first OSD reply is received.
(ceph_sync_write() returns after receiving the first OSD reply)

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-03 15:32:58 -07:00
Tyler Hicks 2cb33cac62 libceph: Fix NULL pointer dereference in auth client code
A malicious monitor can craft an auth reply message that could cause a
NULL function pointer dereference in the client's kernel.

To prevent this, the auth_none protocol handler needs an empty
ceph_auth_client_ops->build_request() function.

CVE-2013-1059

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Chanam Park <chanam.park@hkpco.kr>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Cc: stable@vger.kernel.org
2013-07-03 15:32:55 -07:00
Yan, Zheng ccca4e37b1 libceph: fix truncate size calculation
check the "not truncated yet" case

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-03 15:32:45 -07:00
Yan, Zheng eb845ff13a libceph: fix safe completion
handle_reply() calls complete_request() only if the first OSD reply
has ONDISK flag.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-03 15:32:44 -07:00
Alex Elder 4974341eb9 libceph: print more info for short message header
If an osd client response message arrives that has a front section
that's too big for the buffer set aside to receive it, a warning
gets reported and a new buffer is allocated.

The warning says nothing about which connection had the problem.
Add the peer type and number to what gets reported, to be a bit more
informative.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03 15:32:40 -07:00
Alex Elder 96e4dac66f libceph: add lingering request reference when registered
When an osd request is set to linger, the osd client holds onto the
request so it can be re-submitted following certain osd map changes.
The osd client holds a reference to the request until it is
unregistered.  This is used by rbd for watch requests.

Currently, the reference is taken when the request is marked with
the linger flag.  This means that if an error occurs after that
time but before the the request completes successfully, that
reference is leaked.

There's really no reason to take the reference until the request is
registered in the the osd client's list of lingering requests, and
that only happens when the lingering (watch) request completes
successfully.

So take that reference only when it gets registered following
succesful completion, and drop it (as before) when the request
gets unregistered.  This avoids the reference problem on error
in rbd.

Rearrange ceph_osdc_unregister_linger_request() to avoid using
the request pointer after it may have been freed.

And hold an extra reference in kick_requests() while handling
a linger request that has not yet been registered, to ensure
it doesn't go away.

This resolves:
    http://tracker.ceph.com/issues/3859

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03 15:32:37 -07:00
David S. Miller 0c1072ae02 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/freescale/fec_main.c
	drivers/net/ethernet/renesas/sh_eth.c
	net/ipv4/gre.c

The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list)
and the splitting of the gre.c code into seperate files.

The FEC conflict was two sets of changes adding ethtool support code
in an "!CONFIG_M5272" CPP protected block.

Finally the sh_eth.c conflict was between one commit add bits set
in the .eesr_err_check mask whilst another commit removed the
.tx_error_check member and assignments.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 14:55:13 -07:00
Daniel Borkmann c50cd35788 net: gre: move GSO functions to gre_offload
Similarly to TCP/UDP offloading, move all related GRE functions to
gre_offload.c to make things more explicit and similar to the rest
of the code.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-03 14:37:39 -07:00
Linus Torvalds f991fae5c6 Power management and ACPI updates for 3.11-rc1
- Hotplug changes allowing device hot-removal operations to fail
   gracefully (instead of crashing the kernel) if they cannot be
   carried out completely.  From Rafael J Wysocki and Toshi Kani.
 
 - Freezer update from Colin Cross and Mandeep Singh Baines targeted
   at making the freezing of tasks a bit less heavy weight operation.
 
 - cpufreq resume fix from Srivatsa S Bhat for a regression introduced
   during the 3.10 cycle causing some cpufreq sysfs attributes to
   return wrong values to user space after resume.
 
 - New freqdomain_cpus sysfs attribute for the acpi-cpufreq driver to
   provide information previously available via related_cpus from
   Lan Tianyu.
 
 - cpufreq fixes and cleanups from Viresh Kumar, Jacob Shin,
   Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia, Arnd Bergmann, and
   Tang Yuantian.
 
 - Fix for an ACPICA regression causing suspend/resume issues to
   appear on some systems introduced during the 3.4 development cycle
   from Lv Zheng.
 
 - ACPICA fixes and cleanups from Bob Moore, Tomasz Nowicki, Lv Zheng,
   Chao Guan, and Zhang Rui.
 
 - New cupidle driver for Xilinx Zynq processors from Michal Simek.
 
 - cpuidle fixes and cleanups from Daniel Lezcano.
 
 - Changes to make suspend/resume work correctly in Xen guests from
   Konrad Rzeszutek Wilk.
 
 - ACPI device power management fixes and cleanups from Fengguang Wu
   and Rafael J Wysocki.
 
 - ACPI documentation updates from Lv Zheng, Aaron Lu and Hanjun Guo.
 
 - Fix for the IA-64 issue that was the reason for reverting commit
   9f29ab1 and updates of the ACPI scan code from Rafael J Wysocki.
 
 - Mechanism for adding CMOS RTC address space handlers from Lan Tianyu
   (to allow some EC-related breakage to be fixed on some systems).
 
 - Spec-compliant implementation of acpi_os_get_timer() from
   Mika Westerberg.
 
 - Modification of do_acpi_find_child() to execute _STA in order to
   to avoid situations in which a pointer to a disabled device object
   is returned instead of an enabled one with the same _ADR value.
   From Jeff Wu.
 
 - Intel BayTrail PCH (Platform Controller Hub) support for the ACPI
   Intel Low-Power Subsystems (LPSS) driver and modificaions of that
   driver to work around a couple of known BIOS issues from
   Mika Westerberg and Heikki Krogerus.
 
 - EC driver fix from Vasiliy Kulikov to make it use get_user() and
   put_user() instead of dereferencing user space pointers blindly.
 
 - Assorted ACPI code cleanups from Bjorn Helgaas, Nicholas Mazzuca and
   Toshi Kani.
 
 - Modification of the "runtime idle" helper routine to take the return
   values of the callbacks executed by it into account and to call
   rpm_suspend() if they return 0, which allows some code bloat
   reduction to be done, from Rafael J Wysocki and Alan Stern.
 
 - New trace points for PM QoS from Sahara <keun-o.park@windriver.com>.
 
 - PM QoS documentation update from Lan Tianyu.
 
 - Assorted core PM code cleanups and changes from Bernie Thompson,
   Bjorn Helgaas, Julius Werner, and Shuah Khan.
 
 - New devfreq driver for the Exynos5-bus device from Abhilash Kesavan.
 
 - Minor devfreq cleanups, fixes and MAINTAINERS update from
   MyungJoo Ham, Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and
   Wei Yongjun.
 
 - OMAP Adaptive Voltage Scaling (AVS) SmartReflex voltage control
   driver updates from Andrii Tseglytskyi and Nishanth Menon.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0ZNOAAoJEKhOf7ml8uNsDLYP/0EU4rmvw0TWTITfp6RS1KDE
 9GwBn96ZR4Q5bJd9gBCTPSqhHOYMqxWEUp99sn/M2wehG1pk/jw5LO56+2IhM3UZ
 g1HDcJ7te2nVT/iXsKiAGTVhU9Rk0aYwoVSknwk27qpIBGxW9w/s5tLX8pY3Q3Zq
 wL/7aTPjyL+PFFFEaxgH7qLqsl3DhbtYW5AriUBTkXout/tJ4eO1b7MNBncLDh8X
 VQ/0DNCKE95VEJfkO4rk9RKUyVp9GDn0i+HXCD/FS4IA5oYzePdVdNDmXf7g+swe
 CGlTZq8pB+oBpDiHl4lxzbNrKQjRNbGnDUkoRcWqn0nAw56xK+vmYnWJhW99gQ/I
 fKnvxeLca5po1aiqmC4VSJxZIatFZqLrZAI4dzoCLWY+bGeTnCKmj0/F8ytFnZA2
 8IuLLs7/dFOaHXV/pKmpg6FAlFa9CPxoqRFoyqb4M0GjEarADyalXUWsPtG+6xCp
 R/p0CISpwk+guKZR/qPhL7M654S7SHrPwd2DPF0KgGsvk+G2GhoB8EzvD8BVp98Z
 9siCGCdgKQfJQVI6R0k9aFmn/4gRQIAgyPhkhv9tqULUUkiaXki+/t8kPfnb8O/d
 zep+CA57E2G8MYLkDJfpFeKS7GpPD6TIdgFdGmOUC0Y6sl9iTdiw4yTx8O2JM37z
 rHBZfYGkJBrbGRu+Q1gs
 =VBBq
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "This time the total number of ACPI commits is slightly greater than
  the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
  remains the most active patch submitter.

  To me, the most significant change is the addition of offline/online
  device operations to the driver core (with the Greg's blessing) and
  the related modifications of the ACPI core hotplug code.  Next are the
  freezer updates from Colin Cross that should make the freezing of
  tasks a bit less heavy weight.

  We also have a couple of regression fixes, a number of fixes for
  issues that have not been identified as regressions, two new drivers
  and a bunch of cleanups all over.

  Highlights:

   - Hotplug changes to support graceful hot-removal failures.

     It sometimes is necessary to fail device hot-removal operations
     gracefully if they cannot be carried out completely.  For example,
     if memory from a memory module being hot-removed has been allocated
     for the kernel's own use and cannot be moved elsewhere, it's
     desirable to fail the hot-removal operation in a graceful way
     rather than to crash the kernel, but currenty a success or a kernel
     crash are the only possible outcomes of an attempted memory
     hot-removal.  Needless to say, that is not a very attractive
     alternative and it had to be addressed.

     However, in order to make it work for memory, I first had to make
     it work for CPUs and for this purpose I needed to modify the ACPI
     processor driver.  It's been split into two parts, a resident one
     handling the low-level initialization/cleanup and a modular one
     playing the actual driver's role (but it binds to the CPU system
     device objects rather than to the ACPI device objects representing
     processors).  That's been sort of like a live brain surgery on a
     patient who's riding a bike.

     So this is a little scary, but since we found and fixed a couple of
     regressions it caused to happen during the early linux-next testing
     (a month ago), nobody has complained.

     As a bonus we remove some duplicated ACPI hotplug code, because the
     ACPI-based CPU hotplug is now going to use the common ACPI hotplug
     code.

   - Lighter weight freezing of tasks.

     These changes from Colin Cross and Mandeep Singh Baines are
     targeted at making the freezing of tasks a bit less heavy weight
     operation.  They reduce the number of tasks woken up every time
     during the freezing, by using the observation that the freezer
     simply doesn't need to wake up some of them and wait for them all
     to call refrigerator().  The time needed for the freezer to decide
     to report a failure is reduced too.

     Also reintroduced is the check causing a lockdep warining to
     trigger when try_to_freeze() is called with locks held (which is
     generally unsafe and shouldn't happen).

   - cpufreq updates

     First off, a commit from Srivatsa S Bhat fixes a resume regression
     introduced during the 3.10 cycle causing some cpufreq sysfs
     attributes to return wrong values to user space after resume.  The
     fix is kind of fresh, but also it's pretty obvious once Srivatsa
     has identified the root cause.

     Second, we have a new freqdomain_cpus sysfs attribute for the
     acpi-cpufreq driver to provide information previously available via
     related_cpus.  From Lan Tianyu.

     Finally, we fix a number of issues, mostly related to the
     CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
     up some code.  The majority of changes from Viresh Kumar with bits
     from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
     Arnd Bergmann, and Tang Yuantian.

   - ACPICA update

     A usual bunch of updates from the ACPICA upstream.

     During the 3.4 cycle we introduced support for ACPI 5 extended
     sleep registers, but they are only supposed to be used if the
     HW-reduced mode bit is set in the FADT flags and the code attempted
     to use them without checking that bit.  That caused suspend/resume
     regressions to happen on some systems.  Fix from Lv Zheng causes
     those registers to be used only if the HW-reduced mode bit is set.

     Apart from this some other ACPICA bugs are fixed and code cleanups
     are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
     Zhang Rui.

   - cpuidle updates

     New driver for Xilinx Zynq processors is added by Michal Simek.

     Multidriver support simplification, addition of some missing
     kerneldoc comments and Kconfig-related fixes come from Daniel
     Lezcano.

   - ACPI power management updates

     Changes to make suspend/resume work correctly in Xen guests from
     Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
     cleanups and fixes of the ACPI device power state selection
     routine.

   - ACPI documentation updates

     Some previously missing pieces of ACPI documentation are added by
     Lv Zheng and Aaron Lu (hopefully, that will help people to
     uderstand how the ACPI subsystem works) and one outdated doc is
     updated by Hanjun Guo.

   - Assorted ACPI updates

     We finally nailed down the IA-64 issue that was the reason for
     reverting commit 9f29ab11dd ("ACPI / scan: do not match drivers
     against objects having scan handlers"), so we can fix it and move
     the ACPI scan handler check added to the ACPI video driver back to
     the core.

     A mechanism for adding CMOS RTC address space handlers is
     introduced by Lan Tianyu to allow some EC-related breakage to be
     fixed on some systems.

     A spec-compliant implementation of acpi_os_get_timer() is added by
     Mika Westerberg.

     The evaluation of _STA is added to do_acpi_find_child() to avoid
     situations in which a pointer to a disabled device object is
     returned instead of an enabled one with the same _ADR value.  From
     Jeff Wu.

     Intel BayTrail PCH (Platform Controller Hub) support is added to
     the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
     driver is modified to work around a couple of known BIOS issues.
     Changes from Mika Westerberg and Heikki Krogerus.

     The EC driver is fixed by Vasiliy Kulikov to use get_user() and
     put_user() instead of dereferencing user space pointers blindly.

     Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
     Kani.

   - Assorted power management updates

     The "runtime idle" helper routine is changed to take the return
     values of the callbacks executed by it into account and to call
     rpm_suspend() if they return 0, which allows us to reduce the
     overall code bloat a bit (by dropping some code that's not
     necessary any more after that modification).

     The runtime PM documentation is updated by Alan Stern (to reflect
     the "runtime idle" behavior change).

     New trace points for PM QoS are added by Sahara
     (<keun-o.park@windriver.com>).

     PM QoS documentation is updated by Lan Tianyu.

     Code cleanups are made and minor issues are addressed by Bernie
     Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.

   - devfreq updates

     New driver for the Exynos5-bus device from Abhilash Kesavan.

     Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
     Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.

   - OMAP power management updates

     Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
     updates from Andrii Tseglytskyi and Nishanth Menon."

* tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
  cpufreq: Fix cpufreq regression after suspend/resume
  ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
  PM / Sleep: Warn about system time after resume with pm_trace
  cpufreq: don't leave stale policy pointer in cdbs->cur_policy
  acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
  cpufreq: make sure frequency transitions are serialized
  ACPI: implement acpi_os_get_timer() according the spec
  ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
  ACPI: Add CMOS RTC Operation Region handler support
  ACPI / processor: Drop unused variable from processor_perflib.c
  cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
  ...
2013-07-03 14:35:40 -07:00
Linus Torvalds 790eac5640 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second set of VFS changes from Al Viro:
 "Assorted f_pos race fixes, making do_splice_direct() safe to call with
  i_mutex on parent, O_TMPFILE support, Jeff's locks.c series,
  ->d_hash/->d_compare calling conventions changes from Linus, misc
  stuff all over the place."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  Document ->tmpfile()
  ext4: ->tmpfile() support
  vfs: export lseek_execute() to modules
  lseek_execute() doesn't need an inode passed to it
  block_dev: switch to fixed_size_llseek()
  cpqphp_sysfs: switch to fixed_size_llseek()
  tile-srom: switch to fixed_size_llseek()
  proc_powerpc: switch to fixed_size_llseek()
  ubi/cdev: switch to fixed_size_llseek()
  pci/proc: switch to fixed_size_llseek()
  isapnp: switch to fixed_size_llseek()
  lpfc: switch to fixed_size_llseek()
  locks: give the blocked_hash its own spinlock
  locks: add a new "lm_owner_key" lock operation
  locks: turn the blocked_list into a hashtable
  locks: convert fl_link to a hlist_node
  locks: avoid taking global lock if possible when waking up blocked waiters
  locks: protect most of the file_lock handling with i_lock
  locks: encapsulate the fl_link list handling
  locks: make "added" in __posix_lock_file a bool
  ...
2013-07-03 09:10:19 -07:00
Pravin B Shelar 23a3647bc4 ip_tunnels: Use skb-len to PMTU check.
In path mtu check, ip header total length works for gre device
but not for gre-tap device.  Use skb len which is consistent
for all tunneling types.  This is old bug in gre.
This also fixes mtu calculation bug introduced by
commit c544193214 (GRE: Refactor GRE tunneling code).

Reported-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 16:43:35 -07:00
James Chapman a0dbd82227 l2tp: make datapath resilient to packet loss when sequence numbers enabled
If L2TP data sequence numbers are enabled and reordering is not
enabled, data reception stops if a packet is lost since the kernel
waits for a sequence number that is never resent. (When reordering is
enabled, data reception restarts when the reorder timeout expires.) If
no reorder timeout is set, we should count the number of in-sequence
packets after the out-of-sequence (OOS) condition is detected, and reset
sequence number state after a number of such packets are received.

For now, the number of in-sequence packets while in OOS state which
cause the sequence number state to be reset is hard-coded to 5. This
could be configurable later.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 16:33:25 -07:00
James Chapman 8a1631d588 l2tp: make datapath sequence number support RFC-compliant
The L2TP datapath is not currently RFC-compliant when sequence numbers
are used in L2TP data packets. According to the L2TP RFC, any received
sequence number NR greater than or equal to the next expected NR is
acceptable, where the "greater than or equal to" test is determined by
the NR wrap point. This differs for L2TPv2 and L2TPv3, so add state in
the session context to hold the max NR value and the NR window size in
order to do the acceptable sequence number value check. These might be
configurable later, but for now we derive it from the tunnel L2TP
version, which determines the sequence number field size.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 16:33:24 -07:00
James Chapman b6dc01a43a l2tp: do data sequence number handling in a separate func
This change moves some code handling data sequence numbers into a
separate function to avoid too much indentation. This is to prepare
for some changes to data sequence number handling in subsequent
patches.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 16:33:24 -07:00
Yann Droneaud 8a59bd3e9b sctp: use get_unused_fd_flags(0) instead of get_unused_fd()
Macro get_unused_fd() is used to allocate a file descriptor with
default flags. Those default flags (0) can be "unsafe":
O_CLOEXEC must be used by default to not leak file descriptor
across exec().

Instead of macro get_unused_fd(), functions anon_inode_getfd()
or get_unused_fd_flags() should be used with flags given by userspace.
If not possible, flags should be set to O_CLOEXEC to provide userspace
with a default safe behavor.

In a further patch, get_unused_fd() will be removed so that
new code start using anon_inode_getfd() or get_unused_fd_flags()
with correct flags.

This patch replaces calls to get_unused_fd() with equivalent call to
get_unused_fd_flags(0) to preserve current behavor for existing code.

The hard coded flag value (0) should be reviewed on a per-subsystem basis,
and, if possible, set to O_CLOEXEC.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 16:14:11 -07:00
Isaku Yamahata 06a23fe31c core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()
The dev_forward_skb() assignment of pkt_type should be done
after the call to eth_type_trans().

ip-encapsulated packets can be handled by localhost. But skb->pkt_type
can be PACKET_OTHERHOST when packet comes via veth into ip tunnel device.
In that case, the packet is dropped by ip_rcv().
Although this example uses gretap. l2tp-eth also has same issue.
For l2tp-eth case, add dummy device for ip address and ip l2tp command.

netns A |                     root netns                      | netns B
   veth<->veth=bridge=gretap <-loop back-> gretap=bridge=veth<->veth

arp packet ->
pkt_type
         BROADCAST------------>ip_rcv()------------------------>

                                                             <- arp reply
                                                                pkt_type
                               ip_rcv()<-----------------OTHERHOST
                               drop

sample operations
  ip link add tapa type gretap remote 172.17.107.4 local 172.17.107.3
  ip link add tapb type gretap remote 172.17.107.3 local 172.17.107.4
  ip link set tapa up
  ip link set tapb up
  ip address add 172.17.107.3 dev tapa
  ip address add 172.17.107.4 dev tapb
  ip route get 172.17.107.3
  > local 172.17.107.3 dev lo  src 172.17.107.3
  >    cache <local>
  ip route get 172.17.107.4
  > local 172.17.107.4 dev lo  src 172.17.107.4
  >    cache <local>
  ip link add vetha type veth peer name vetha-peer
  ip link add vethb type veth peer name vethb-peer
  brctl addbr bra
  brctl addbr brb
  brctl addif bra tapa
  brctl addif bra vetha-peer
  brctl addif brb tapb
  brctl addif brb vethb-peer
  brctl show
  > bridge name     bridge id               STP enabled     interfaces
  > bra             8000.6ea21e758ff1       no              tapa
  >                                                         vetha-peer
  > brb             8000.420020eb92d5       no              tapb
  >                                                         vethb-peer
  ip link set vetha-peer up
  ip link set vethb-peer up
  ip link set bra up
  ip link set brb up
  ip netns add a
  ip netns add b
  ip link set vetha netns a
  ip link set vethb netns b
  ip netns exec a ip address add 10.0.0.3/24 dev vetha
  ip netns exec b ip address add 10.0.0.4/24 dev vethb
  ip netns exec a ip link set vetha up
  ip netns exec b ip link set vethb up
  ip netns exec a arping -I vetha 10.0.0.4
  ARPING 10.0.0.4 from 10.0.0.3 vetha
  ^CSent 2 probes (2 broadcast(s))
  Received 0 response(s)

Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Hong Zhiguo <honkiko@gmail.com>
Cc: Rami Rosen <ramirose@gmail.com>
Cc: Tom Parkin <tparkin@katalix.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Jesse Gross <jesse@nicira.com>
Cc: dev@openvswitch.org
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 15:59:18 -07:00
Hannes Frederic Sowa 75a493e60a ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size
If the socket had an IPV6_MTU value set, ip6_append_data_mtu lost track
of this when appending the second frame on a corked socket. This results
in the following splat:

[37598.993962] ------------[ cut here ]------------
[37598.994008] kernel BUG at net/core/skbuff.c:2064!
[37598.994008] invalid opcode: 0000 [#1] SMP
[37598.994008] Modules linked in: tcp_lp uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media vfat fat usb_storage fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat
+nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi
+scsi_transport_iscsi rfcomm bnep iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant arc4 iwldvm mac80211 snd_hda_intel acpi_cpufreq mperf coretemp snd_hda_codec microcode cdc_wdm cdc_acm
[37598.994008]  snd_hwdep cdc_ether snd_seq snd_seq_device usbnet mii joydev btusb snd_pcm bluetooth i2c_i801 e1000e lpc_ich mfd_core ptp iwlwifi pps_core snd_page_alloc mei cfg80211 snd_timer thinkpad_acpi snd tpm_tis soundcore rfkill tpm tpm_bios vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc
+dm_crypt i915 i2c_algo_bit drm_kms_helper drm i2c_core wmi video
[37598.994008] CPU 0
[37598.994008] Pid: 27320, comm: t2 Not tainted 3.9.6-200.fc18.x86_64 #1 LENOVO 27744PG/27744PG
[37598.994008] RIP: 0010:[<ffffffff815443a5>]  [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330
[37598.994008] RSP: 0018:ffff88003670da18  EFLAGS: 00010202
[37598.994008] RAX: ffff88018105c018 RBX: 0000000000000004 RCX: 00000000000006c0
[37598.994008] RDX: ffff88018105a6c0 RSI: ffff88018105a000 RDI: ffff8801e1b0aa00
[37598.994008] RBP: ffff88003670da78 R08: 0000000000000000 R09: ffff88018105c040
[37598.994008] R10: ffff8801e1b0aa00 R11: 0000000000000000 R12: 000000000000fff8
[37598.994008] R13: 00000000000004fc R14: 00000000ffff0504 R15: 0000000000000000
[37598.994008] FS:  00007f28eea59740(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
[37598.994008] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[37598.994008] CR2: 0000003d935789e0 CR3: 00000000365cb000 CR4: 00000000000407f0
[37598.994008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[37598.994008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[37598.994008] Process t2 (pid: 27320, threadinfo ffff88003670c000, task ffff88022c162ee0)
[37598.994008] Stack:
[37598.994008]  ffff88022e098a00 ffff88020f973fc0 0000000000000008 00000000000004c8
[37598.994008]  ffff88020f973fc0 00000000000004c4 ffff88003670da78 ffff8801e1b0a200
[37598.994008]  0000000000000018 00000000000004c8 ffff88020f973fc0 00000000000004c4
[37598.994008] Call Trace:
[37598.994008]  [<ffffffff815fc21f>] ip6_append_data+0xccf/0xfe0
[37598.994008]  [<ffffffff8158d9f0>] ? ip_copy_metadata+0x1a0/0x1a0
[37598.994008]  [<ffffffff81661f66>] ? _raw_spin_lock_bh+0x16/0x40
[37598.994008]  [<ffffffff8161548d>] udpv6_sendmsg+0x1ed/0xc10
[37598.994008]  [<ffffffff812a2845>] ? sock_has_perm+0x75/0x90
[37598.994008]  [<ffffffff815c3693>] inet_sendmsg+0x63/0xb0
[37598.994008]  [<ffffffff812a2973>] ? selinux_socket_sendmsg+0x23/0x30
[37598.994008]  [<ffffffff8153a450>] sock_sendmsg+0xb0/0xe0
[37598.994008]  [<ffffffff810135d1>] ? __switch_to+0x181/0x4a0
[37598.994008]  [<ffffffff8153d97d>] sys_sendto+0x12d/0x180
[37598.994008]  [<ffffffff810dfb64>] ? __audit_syscall_entry+0x94/0xf0
[37598.994008]  [<ffffffff81020ed1>] ? syscall_trace_enter+0x231/0x240
[37598.994008]  [<ffffffff8166a7e7>] tracesys+0xdd/0xe2
[37598.994008] Code: fe 07 00 00 48 c7 c7 04 28 a6 81 89 45 a0 4c 89 4d b8 44 89 5d a8 e8 1b ac b1 ff 44 8b 5d a8 4c 8b 4d b8 8b 45 a0 e9 cf fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 48
[37598.994008] RIP  [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330
[37598.994008]  RSP <ffff88003670da18>
[37599.007323] ---[ end trace d69f6a17f8ac8eee ]---

While there, also check if path mtu discovery is activated for this
socket. The logic was adapted from ip6_append_data when first writing
on the corked socket.

This bug was introduced with commit
0c1833797a ("ipv6: fix incorrect ipsec
fragment").

v2:
a) Replace IPV6_PMTU_DISC_DO with IPV6_PMTUDISC_PROBE.
b) Don't pass ipv6_pinfo to ip6_append_data_mtu (suggestion by Gao
   feng, thanks!).
c) Change mtu to unsigned int, else we get a warning about
   non-matching types because of the min()-macro type-check.

Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 12:44:18 -07:00
Hannes Frederic Sowa 8822b64a0f ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET pending data
We accidentally call down to ip6_push_pending_frames when uncorking
pending AF_INET data on a ipv6 socket. This results in the following
splat (from Dave Jones):

skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:126!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
+netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
Call Trace:
 [<ffffffff8159a9aa>] skb_push+0x3a/0x40
 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
 [<ffffffff816f5d54>] tracesys+0xdd/0xe2
Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
 RSP <ffff8801e6431de8>

This patch adds a check if the pending data is of address family AF_INET
and directly calls udp_push_ending_frames from udp_v6_push_pending_frames
if that is the case.

This bug was found by Dave Jones with trinity.

(Also move the initialization of fl6 below the AF_INET check, even if
not strictly necessary.)

Cc: Dave Jones <davej@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 12:44:18 -07:00
Linus Torvalds fe3c22bd5c Char/Misc merge for 3.11-rc1
Here's the big char/misc driver tree merge for 3.11-rc1
 
 A variety of different driver patches here.  All of these have been in
 linux-next for a while, and the networking patches were acked-by David
 Miller, as it made sense for those patches to come through this tree.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlHRqsQACgkQMUfUDdst+ykNlACgwnDHLav/u2NrAxoqxmw7Bcd8
 qY0An3h0ZGI5PpDe6U0IyBDQIipHuOjG
 =vaRG
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc updates from Greg KH:
 "Here's the big char/misc driver tree merge for 3.11-rc1

  A variety of different driver patches here.  All of these have been in
  linux-next for a while, and the networking patches were acked-by David
  Miller, as it made sense for those patches to come through this tree"

* tag 'char-misc-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (102 commits)
  Revert "char: misc: assign file->private_data in all cases"
  drivers: uio_pdrv_genirq: Use of_match_ptr() macro
  mei: check whether hw start has succeeded
  mei: check if the hardware reset succeeded
  mei: mei_cl_connect: don't multiply the timeout twice
  mei: do not override a client writing state when buffering
  mei: move mei_cl_irq_write_complete to client.c
  UIO: Fix concurrency issue
  drivers: uio_dmem_genirq: Use of_match_ptr() macro
  char: misc: assign file->private_data in all cases
  drivers: hv: allocate synic structures before hv_synic_init()
  drivers: hv: check interrupt mask before read_index
  vme: vme_tsi148.c: fix error return code in tsi148_probe()
  FMC: fix error handling in probe() function
  fmc: avoid readl/writel namespace conflict
  FMC: NULL dereference on allocation failure
  UIO: fix uio_pdrv_genirq with device tree but no interrupt
  UIO: allow binding uio_pdrv_genirq.c to devices using command line option
  FMC: add a char-device mezzanine driver
  FMC: add a driver to write mezzanine EEPROM
  ...
2013-07-02 11:43:33 -07:00
Cong Wang 3b7b514f44 ipip: fix a regression in ioctl
This is a regression introduced by
commit fd58156e45 (IPIP: Use ip-tunneling code.)

Similar to GRE tunnel, previously we only check the parameters
for SIOCADDTUNNEL and SIOCCHGTUNNEL, after that commit, the
check is moved for all commands.

So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.

Also, the check for i_key, o_key etc. is suspicious too,
which did not exist before, reset them before passing
to ip_tunnel_ioctl().

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 01:13:09 -07:00
Wei Yongjun e1558a93b6 l2tp: add missing .owner to struct pppox_proto
Add missing .owner of struct pppox_proto. This prevents the
module from being removed from underneath its users.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 01:11:56 -07:00
Michal Schmidt c590b5e2f0 ethtool: make .get_dump_data() harder to misuse by drivers
As the patch "bnx2x: remove zeroing of dump data buffer" showed,
it is too easy implement .get_dump_data incorrectly in a driver.

Let's make sure drivers cannot get confused by userspace requesting
a too big dump.

Also WARN if the driver sets dump->len to something weird and make
sure the length reported to userspace is the actual length of data
copied to userspace.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 00:15:56 -07:00
Daniel Borkmann e02010adee net: sctp: get rid of SCTP_DBG_TSNS entirely
After having reworked the debugging framework, Neil and Vlad agreed to
get rid of the leftover SCTP_DBG_TSNS code for a couple of reasons:

We can use systemtap scripts to investigate these things, we now have
pr_debug() helpers that make life easier, and if we really need anything
else besides those tools, we will be forced to come up with something
better than we have there. Therefore, get rid of this ifdef debugging
code entirely for now.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-02 00:08:03 -07:00
Amerigo Wang 8965779d2c ipv6,mcast: always hold idev->lock before mca_lock
dingtianhong reported the following deadlock detected by lockdep:

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.4.24.05-0.1-default #1 Not tainted
 -------------------------------------------------------
 ksoftirqd/0/3 is trying to acquire lock:
  (&ndev->lock){+.+...}, at: [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120

 but task is already holding lock:
  (&mc->mca_lock){+.+...}, at: [<ffffffff8149d130>] mld_send_report+0x40/0x150

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&mc->mca_lock){+.+...}:
        [<ffffffff810a8027>] validate_chain+0x637/0x730
        [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500
        [<ffffffff810a8734>] lock_acquire+0x114/0x150
        [<ffffffff814f691a>] rt_spin_lock+0x4a/0x60
        [<ffffffff8149e4bb>] igmp6_group_added+0x3b/0x120
        [<ffffffff8149e5d8>] ipv6_mc_up+0x38/0x60
        [<ffffffff81480a4d>] ipv6_find_idev+0x3d/0x80
        [<ffffffff81483175>] addrconf_notify+0x3d5/0x4b0
        [<ffffffff814fae3f>] notifier_call_chain+0x3f/0x80
        [<ffffffff81073471>] raw_notifier_call_chain+0x11/0x20
        [<ffffffff813d8722>] call_netdevice_notifiers+0x32/0x60
        [<ffffffff813d92d4>] __dev_notify_flags+0x34/0x80
        [<ffffffff813d9360>] dev_change_flags+0x40/0x70
        [<ffffffff813ea627>] do_setlink+0x237/0x8a0
        [<ffffffff813ebb6c>] rtnl_newlink+0x3ec/0x600
        [<ffffffff813eb4d0>] rtnetlink_rcv_msg+0x160/0x310
        [<ffffffff814040b9>] netlink_rcv_skb+0x89/0xb0
        [<ffffffff813eb357>] rtnetlink_rcv+0x27/0x40
        [<ffffffff81403e20>] netlink_unicast+0x140/0x180
        [<ffffffff81404a9e>] netlink_sendmsg+0x33e/0x380
        [<ffffffff813c4252>] sock_sendmsg+0x112/0x130
        [<ffffffff813c537e>] __sys_sendmsg+0x44e/0x460
        [<ffffffff813c5544>] sys_sendmsg+0x44/0x70
        [<ffffffff814feab9>] system_call_fastpath+0x16/0x1b

 -> #0 (&ndev->lock){+.+...}:
        [<ffffffff810a798e>] check_prev_add+0x3de/0x440
        [<ffffffff810a8027>] validate_chain+0x637/0x730
        [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500
        [<ffffffff810a8734>] lock_acquire+0x114/0x150
        [<ffffffff814f6c82>] rt_read_lock+0x42/0x60
        [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120
        [<ffffffff8149b036>] mld_newpack+0xb6/0x160
        [<ffffffff8149b18b>] add_grhead+0xab/0xc0
        [<ffffffff8149d03b>] add_grec+0x3ab/0x460
        [<ffffffff8149d14a>] mld_send_report+0x5a/0x150
        [<ffffffff8149f99e>] igmp6_timer_handler+0x4e/0xb0
        [<ffffffff8105705a>] call_timer_fn+0xca/0x1d0
        [<ffffffff81057b9f>] run_timer_softirq+0x1df/0x2e0
        [<ffffffff8104e8c7>] handle_pending_softirqs+0xf7/0x1f0
        [<ffffffff8104ea3b>] __do_softirq_common+0x7b/0xf0
        [<ffffffff8104f07f>] __thread_do_softirq+0x1af/0x210
        [<ffffffff8104f1c1>] run_ksoftirqd+0xe1/0x1f0
        [<ffffffff8106c7de>] kthread+0xae/0xc0
        [<ffffffff814fff74>] kernel_thread_helper+0x4/0x10

actually we can just hold idev->lock before taking pmc->mca_lock,
and avoid taking idev->lock again when iterating idev->addr_list,
since the upper callers of mld_newpack() already take
read_lock_bh(&idev->lock).

Reported-by: dingtianhong <dingtianhong@huawei.com>
Cc: dingtianhong <dingtianhong@huawei.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Chen Weilong <chenweilong@huawei.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01 23:39:21 -07:00
Cong Wang ab6c7a0a43 vti: remove duplicated code to fix a memory leak
vti module allocates dev->tstats twice: in vti_fb_tunnel_init()
and in vti_tunnel_init(), this lead to a memory leak of
dev->tstats.

Just remove the duplicated operations in vti_fb_tunnel_init().

(candidate for -stable)

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Saurabh Mohan <saurabh.mohan@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01 23:37:14 -07:00
Cong Wang 6c734fb859 gre: fix a regression in ioctl
When testing GRE tunnel, I got:

 # ip tunnel show
 get tunnel gre0 failed: Invalid argument
 get tunnel gre1 failed: Invalid argument

This is a regression introduced by commit c544193214
("GRE: Refactor GRE tunneling code.") because previously we
only check the parameters for SIOCADDTUNNEL and SIOCCHGTUNNEL,
after that commit, the check is moved for all commands.

So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.

After this patch I got:

 # ip tunnel show
 gre0: gre/ip  remote any  local any  ttl inherit  nopmtudisc
 gre1: gre/ip  remote 192.168.122.101  local 192.168.122.45  ttl inherit

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01 23:35:22 -07:00
Daniel Borkmann bb33381d0c net: sctp: rework debugging framework to use pr_debug and friends
We should get rid of all own SCTP debug printk macros and use the ones
that the kernel offers anyway instead. This makes the code more readable
and conform to the kernel code, and offers all the features of dynamic
debbuging that pr_debug() et al has, such as only turning on/off portions
of debug messages at runtime through debugfs. The runtime cost of having
CONFIG_DYNAMIC_DEBUG enabled, but none of the debug statements printing,
is negligible [1]. If kernel debugging is completly turned off, then these
statements will also compile into "empty" functions.

While we're at it, we also need to change the Kconfig option as it /now/
only refers to the ifdef'ed code portions in outqueue.c that enable further
debugging/tracing of SCTP transaction fields. Also, since SCTP_ASSERT code
was enabled with this Kconfig option and has now been removed, we
transform those code parts into WARNs resp. where appropriate BUG_ONs so
that those bugs can be more easily detected as probably not many people
have SCTP debugging permanently turned on.

To turn on all SCTP debugging, the following steps are needed:

 # mount -t debugfs none /sys/kernel/debug
 # echo -n 'module sctp +p' > /sys/kernel/debug/dynamic_debug/control

This can be done more fine-grained on a per file, per line basis and others
as described in [2].

 [1] https://www.kernel.org/doc/ols/2009/ols2009-pages-39-46.pdf
 [2] Documentation/dynamic-debug-howto.txt

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01 23:22:13 -07:00
David S. Miller 936faf6c49 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/vxlan-next
Stephen Hemminger says:

====================
Here is current updates for vxlan in net-next.  It includes Mike's changes
to handle multiple destinations and lots of little cosmetic stuff.

This is a fresh vxlan-next repository which was forked from net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01 23:17:19 -07:00
Dave Jones 4ccb93ce74 x25: Fix broken locking in ioctl error paths.
Two of the x25 ioctl cases have error paths that break out of the function without
unlocking the socket, leading to this warning:

================================================
[ BUG: lock held when returning to user space! ]
3.10.0-rc7+ #36 Not tainted
------------------------------------------------
trinity-child2/31407 is leaving the kernel with locks still held!
1 lock held by trinity-child2/31407:
 #0:  (sk_lock-AF_X25){+.+.+.}, at: [<ffffffffa024b6da>] x25_ioctl+0x8a/0x740 [x25]

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01 18:15:25 -07:00
Eric Dumazet aec0a40a6f netem: use rb tree to implement the time queue
Following typical setup to implement a ~100 ms RTT and big
amount of reorders has very poor performance because netem
implements the time queue using a linked list.
-----------------------------------------------------------
ETH=eth0
IFB=ifb0
modprobe ifb
ip link set dev $IFB up
tc qdisc add dev $ETH ingress 2>/dev/null
tc filter add dev $ETH parent ffff: \
   protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress \
   redirect dev $IFB
ethtool -K $ETH gro off tso off gso off
tc qdisc add dev $IFB root netem delay 50ms 10ms limit 100000
tc qd add dev $ETH root netem delay 50ms limit 100000
---------------------------------------------------------

Switch netem time queue to a rb tree, so this kind of setup can work at
high speed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-01 18:07:15 -07:00
NeilBrown 0bebc633f1 sunrpc: Don't schedule an upcall on a replaced cache entry.
When a cache entry is replaced, the "expiry_time" get set to
zero by a call to "cache_fresh_locked(..., 0)" at the end of
"sunrpc_cache_update".

This low expiry time makes cache_check() think that the 'refresh_age'
is negative, so the 'age' is comparatively large and a refresh is
triggered.
However refreshing a replaced entry it pointless, it cannot achieve
anything useful.

So teach cache_check to ignore a low refresh_age when expiry_time
is zero.

Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:53:28 -04:00
NeilBrown 7715cde868 net/sunrpc: xpt_auth_cache should be ignored when expired.
commit d202cce896
    sunrpc: never return expired entries in sunrpc_cache_lookup

moved the 'entry is expired' test from cache_check to
sunrpc_cache_lookup, so that it happened early and some races could
safely be ignored.

However the ip_map (in svcauth_unix.c) has a separate single-item
cache which allows quick lookup without locking.  An entry in this
case would not be subject to the expiry test and so could be used
well after it has expired.

This is not normally a big problem because the first time it is used
after it is expired an up-call will be scheduled to refresh the entry
(if it hasn't been scheduled already) and the old entry will then
be invalidated.  So on the second attempt to use it after it has
expired, ip_map_cached_get will discard it.

However that is subtle and not ideal, so replace the "!cache_valid"
test with "cache_is_expired".
In doing this we drop the test on the "CACHE_VALID" bit.  This is
unnecessary as the bit is never cleared, and an entry will only
be cached if the bit is set.

Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:53:28 -04:00
NeilBrown 013920eb5d sunrpc/cache: ensure items removed from cache do not have pending upcalls.
It is possible for a race to set CACHE_PENDING after cache_clean()
has removed a cache entry from the cache.
If CACHE_PENDING is still set when the entry is finally 'put',
the cache_dequeue() will never happen and we can leak memory.

So set a new flag 'CACHE_CLEANED' when we remove something from
the cache, and don't queue any upcall if it is set.

If CACHE_PENDING is set before CACHE_CLEANED, the call that
cache_clean() makes to cache_fresh_unlocked() will free memory
as needed.  If CACHE_PENDING is set after CACHE_CLEANED, the
test in sunrpc_cache_pipe_upcall will ensure that the memory
is not allocated.

Reported-by: <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-07-01 17:53:28 -04:00