Commit Graph

33135 Commits

Author SHA1 Message Date
Arik Nemtsov 626911cc60 mac80211: track TDLS initiator internally
Infer the TDLS initiator and track it in mac80211 via a STA flag. This
avoids breaking old userspace that doesn't pass it via nl80211 APIs.

The only case where userspace will need to pass the initiator is when the
STA is removed due to unreachability before a teardown packet is sent.
Support for unreachability was only recently added to wpa_supplicant, so
it won't be a problem in practice.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-07-21 12:14:03 +02:00
Johannes Berg 1d4cc30c86 mac80211: suppress unused variable warning without lockdep
When lockdep isn't compiled, a local variable isn't used
(it's only in a macro argument), annotate it to suppress
the compiler warning.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-07-18 09:47:26 +02:00
Michal Kazior 97dc94f1d9 cfg80211: remove channel_switch combination check
Driver is now responsible for veryfing if the
switch is possible.

Since this is inherently tricky driver may decide
to disconnect an interface later with
cfg80211_stop_iface().

This doesn't mean driver can accept everything. It
should do it's best to verify requests and reject
them as soon as possible.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-25 18:06:20 +02:00
Michal Kazior 4c3ebc56d7 mac80211: use chanctx reservation for STA CSA
Channel switch finalization is now 2-step. First
step is when driver calls chswitch_done(), the
other is when reservation is actually finalized
(which be defered for in-place reservation).

It is now safe to call ieee80211_chswitch_done()
more than once.

Also remove the ieee80211_vif_change_channel()
because it is no longer used.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-25 18:06:20 +02:00
Michal Kazior 03078de4f9 mac80211: use chanctx reservation for AP CSA
Channel switch finalization is now 2-step. First
step is when driver calls csa_finish(), the other
is when reservation is actually finalized (which
can be deferred for in-place reservation).

It is now safe to call ieee80211_csa_finish() more
than once.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-25 18:06:20 +02:00
Michal Kazior 71e6195ed2 mac80211: make check_combinations() aware of chanctx reservation
The ieee80211_check_combinations() computes
radar_detect accordingly depending on chanctx
reservation status.

This makes it possible to use the function for
channel_switch validation.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-25 18:06:20 +02:00
Michal Kazior 5bcae31d9c mac80211: implement multi-vif in-place reservations
Multi-vif in-place reservations happen when
it is impossible to allocate more channel contexts
as indicated by interface combinations.

Such reservations are not finalized until all
assigned interfaces are ready.

This still doesn't handle all possible cases
(i.e. degradation of number of channels) properly.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-25 18:06:20 +02:00
David Spinadel 633e271326 mac80211: split sched scan IEs
Split sched scan IEs to band specific and not band specific
blocks. Common IEs blocks may be sent to the FW once per command,
instead of per band.

This allows optimization of size of the command, which may be
required by some drivers (eg. iwlmvm with newer firmware version).

As this changes the mac80211 API, update all drivers to use the
new version correctly, even if they don't (yet) make use of the
split data.

Signed-off-by: David Spinadel <david.spinadel@intel.com>
Reviewed-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-25 09:10:43 +02:00
David Spinadel c56ef67250 mac80211: support more than one band in scan request
Some drivers (such as iwlmvm) can handle multiple bands in a single
HW scan request. Add a HW flag to indicate that the driver support
this. To hold the required data, create a separate structure for
HW scan request that holds cfg scan request and data about
different parts of the scan IEs.

As this changes the mac80211 API, update all drivers using it to
use the correct new function type/argument.

Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-25 09:10:42 +02:00
Arik Nemtsov ee10f2c779 mac80211: protect TDLS discovery session
After sending a TDLS discovery-request, we expect a reply to arrive on
the AP's channel. We must stay on the channel (no PSM, scan, etc.), since
a TDLS setup-response is a direct packet not buffered by the AP.
Add a new mac80211 driver callback to allow discovery session protection.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:28:19 +02:00
Arik Nemtsov 7adc3e4664 mac80211: make sure TDLS peer STA exists during setup
Make sure userspace added a TDLS peer station before invoking the
transmission of the first setup frame. This ensures packets to the peer
won't go through the AP path.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:28:18 +02:00
Arik Nemtsov c887f0d3a0 mac80211: add API to request TDLS operation from userspace
Write a mac80211 to the cfg80211 API for requesting a userspace TDLS
operation. Define TDLS specific reason codes that can be used here.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:28:17 +02:00
Arik Nemtsov db67d661e8 mac80211: implement proper Tx path flushing for TDLS
As the spec mandates, flush data in the AP path before transmitting the
first setup frame. Data packets transmitted during setup are already
dropped in the Tx path.

For the teardown flow, flush all packets in the direct path before
transmitting the teardown frame. Un-authorize the peer sta after teardown
is sent, forcing all subsequent Tx to the peer through the AP.

Make sure to flush the queues when disabling the link to get the
teardown packet out.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
[adjust to Luca's new quuee API and stop only vif queues]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:27:58 +02:00
Arik Nemtsov 191dd46905 mac80211: split tdls_mgmt function
There are setup/teardown specific actions to be done that accompany
the sending of a TDLS management packet. Split the main function to
simplify future additions.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:24:55 +02:00
Arik Nemtsov 2fb6b9b8e5 mac80211: use TDLS initiator in tdls_mgmt operations
The TDLS initiator is set once during link setup. If determines the
address ordering in the link identifier IE.
Use the value from userspace in order to have a correct teardown packet.
With the current code, a teardown from the responder side fails the TDLS
MIC check because of a bad link identifier IE.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:24:55 +02:00
Arik Nemtsov 31fa97c5de cfg80211: pass TDLS initiator in tdls_mgmt operations
The TDLS initiator is set once during link setup. If determines the
address ordering in the link identifier IE.

Fix dependent drivers - mwifiex and mac80211.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:24:55 +02:00
Arik Nemtsov 17e6a59a36 mac80211: cleanup TDLS state during failed setup
When setting up a TDLS session, register a delayed work to remove
the peer if setup times out. Prevent concurrent setups to support this
capacity.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:24:55 +02:00
Arik Nemtsov 68885a54cd mac80211: set auth flags after other station info
For TDLS, the AUTHORIZED flag arrives with all other important station
info (supported rates, HT/VHT caps, ...). Make sure to set the station
state in the low-level driver after transferring this information to
the mac80211 STA entry.
This aligns the STA information during sta_state callbacks with the
non-TDLS case.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:24:55 +02:00
Arik Nemtsov 9deba04d0f mac80211: clarify TDLS Tx handling
Rename the flags used in the Tx path and add an explanation for the
reasons to drop, send directly or through the AP.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:24:54 +02:00
Luciano Coelho a46992b441 mac80211: stop only the queues assigned to the vif during channel switch
Instead of stopping all the hardware queues during channel switch,
which is especially bad when we have large CSA counts, stop only the
queues that are assigned to the vif that is performing the channel
switch.

Additionally, check for (sdata->csa_block_tx) instead of calling
ieee80211_csa_needs_block_tx(), which can now be removed.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:22:29 +02:00
Luciano Coelho 26da23b695 mac80211: add functions to stop and wake all queues assigned to a vif
In some cases we may want to stop the queues of a single vif (for
instance during a channel-switch).  Add a function that stops all the
queues that are assigned to a vif.  If a queue is assigned to more
than one vif, the corresponding netdev subqueue of the other vif(s)
will also be stopped.  If the HW doesn't set the
IEEE80211_HW_QUEUE_CONTROL flag, then all queues are stopped.

Also add a corresponding function to wake the queues of a vif back.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:22:27 +02:00
Luciano Coelho cca07b00a5 mac80211: introduce refcount for queue_stop_reasons
Sometimes different vifs may be stopping the queues for the same
reason (e.g. when several interfaces are performing a channel switch).
Instead of using a bitmask for the reasons, use an integer that holds
a refcount instead.  In order to keep it backwards compatible,
introduce a boolean in some functions that tell us whether the queue
stopping should be refcounted or not.  For now, use not refcounted for
all calls to keep it functionally the same as before.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:22:25 +02:00
Luciano Coelho 59f48fe22f mac80211: don't stop all queues when flushing
There is no need to stop all queues when we want to flush specific
queues, so stop only the queues that will be flushed.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:22:24 +02:00
Thomas Gleixner 8d7b70fb7b net: Mac80211: Remove silly timespec dance
Converting time from one format to another seems to give coders a warm
and fuzzy feeling.

Use the proper interfaces.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John W. Linville <linville@tuxdriver.com>
[fix compile error]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:22:21 +02:00
Thomas Gleixner 181715203b mac80211: Use ktime_get_ts()
do_posix_clock_monotonic_gettime() is a leftover from the initial
posix timer implementation which maps to ktime_get_ts().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:22:18 +02:00
Michal Kazior 10d78f2782 mac80211: use csa counter offsets instead of csa_active
vif->csa_active is protected by mutexes only. This
means it is unreliable to depend on it on codeflow
in non-sleepable beacon and CSA code. There was no
guarantee to have vif->csa_active update be
visible before beacons are updated on SMP systems.

Using csa counter offsets which are embedded in
beacon struct (and thus are protected with single
RCU assignment) is much safer.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:22:16 +02:00
Michal Kazior af296bdb8d mac80211: move csa counters from sdata to beacon/presp
Having csa counters part of beacon and probe_resp
structures makes it easier to get rid of possible
races between setting a beacon and updating
counters on SMP systems by guaranteeing counters
are always consistent against given beacon struct.

While at it relax WARN_ON into WARN_ON_ONCE to
prevent spamming logs and racing.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
[remove pointless array check]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 14:22:06 +02:00
Janusz Dziedzic b49328361b mac80211: allow tx via monitor iface when DFS
Allow send frames using monitor interface
when DFS chandef and we pass CAC (beaconing
allowed).

This fix problem when old kernel and new backports used,
in such case hostapd create/use also monitor interface.
Before this patch all frames hostapd send using monitor
iface were dropped when AP was configured on DFS channel.

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 11:05:35 +02:00
Johannes Berg b7ffbd7ef6 cfg80211: make ethtool the driver's responsibility
Currently, cfg80211 tries to implement ethtool, but that doesn't
really scale well, with all the different operations. Make the
lower-level driver responsible for it, which currently only has
an effect on mac80211. It will similarly not scale well at that
level though, since mac80211 also has many drivers.

To cleanly implement this in mac80211, introduce a new file and
move some code to appropriate places.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 11:05:33 +02:00
Johannes Berg ba9030c20a mac80211: remove weak WEP IV accounting
Since WEP is practically dead, there seems very little
point in keeping WEP weak IV accounting.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 11:05:31 +02:00
Antonio Ospite b314c66990 trivial: net/mac80211/mesh.c: fix typo s/Substract/Subtract/
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Luis Carlos Cobo <luisca@cozybit.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: linux-wireless@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 11:05:29 +02:00
Bob Copeland 2b470c39e8 mac80211: remove ignore_plink_timer flag
The mesh_plink code is doing some interesting things with the
ignore_plink_timer flag.  It seems the original intent was to
handle this race:

cpu 0                           cpu 1
-----                           -----
                                start timer handler for state X
acquire sta_lock
change state from X to Y
mod_timer() / del_timer()
release sta_lock
                                acquire sta_lock
                                execute state Y timer too soon

However, using the mod_timer()/del_timer() return values to
detect these cases is broken.  As a result, timers get ignored
unnecessarily, and stations can get stuck in the peering state
machine.

Instead, we can detect the case by looking at the timer expiration.
In the case of del_timer, just ignore the timers in the following
(LISTEN/ESTAB) states since they won't have timers anyway.

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 11:05:27 +02:00
Johannes Berg 5ac2e35030 mac80211: fix station/driver powersave race
It is currently possible to have a race due to the station PS
unblock work like this:
 * station goes to sleep with frames buffered in the driver
 * driver blocks wakeup
 * station wakes up again
 * driver flushes/returns frames, and unblocks, which schedules
   the unblock work
 * unblock work starts to run, and checks that the station is
   awake (i.e. that the WLAN_STA_PS_STA flag isn't set)
 * we process a received frame with PM=1, setting the flag again
 * ieee80211_sta_ps_deliver_wakeup() runs, delivering all frames
   to the driver, and then clearing the WLAN_STA_PS_DRIVER and
   WLAN_STA_PS_STA flags

In this scenario, mac80211 will think that the station is awake,
while it really is asleep, and any TX'ed frames should be filtered
by the device (it will know that the station is sleeping) but then
passed to mac80211 again, which will not buffer it either as it
thinks the station is awake, and eventually the packets will be
dropped.

Fix this by moving the clearing of the flags to exactly where we
learn about the situation. This creates a problem of reordering,
so introduce another flag indicating that delivery is being done,
this new flag also queues frames and is cleared only while the
spinlock is held (which the queuing code also holds) so that any
concurrent delivery/TX is handled correctly.

Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 11:05:25 +02:00
John W. Linville 20edb50e59 mac80211: remove PID rate control
Minstrel has long since proven its worth.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-06-23 11:05:23 +02:00
Linus Torvalds a9be22425e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix checksumming regressions, from Tom Herbert.

 2) Undo unintentional permissions changes for SCTP rto_alpha and
    rto_beta sysfs knobs, from Denial Borkmann.

 3) VXLAN, like other IP tunnels, should advertize it's encapsulation
    size using dev->needed_headroom instead of dev->hard_header_len.
    From Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: sctp: fix permissions for rto_alpha and rto_beta knobs
  vxlan: Checksum fixes
  net: add skb_pop_rcv_encapsulation
  udp: call __skb_checksum_complete when doing full checksum
  net: Fix save software checksum complete
  net: Fix GSO constants to match NETIF flags
  udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup
  vxlan: use dev->needed_headroom instead of dev->hard_header_len
  MAINTAINERS: update cxgb4 maintainer
2014-06-15 16:37:03 -10:00
Daniel Borkmann b58537a1f5 net: sctp: fix permissions for rto_alpha and rto_beta knobs
Commit 3fd091e73b ("[SCTP]: Remove multiple levels of msecs
to jiffies conversions.") has silently changed permissions for
rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of
this was to discourage users from tweaking rto_alpha and
rto_beta knobs in production environments since they are key
to correctly compute rtt/srtt.

RFC4960 under section 6.3.1. RTO Calculation says regarding
rto_alpha and rto_beta under rule C3 and C4:

  [...]
  C3)  When a new RTT measurement R' is made, set

       RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|

       and

       SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'

       Note: The value of SRTT used in the update to RTTVAR
       is its value before updating SRTT itself using the
       second assignment. After the computation, update
       RTO <- SRTT + 4 * RTTVAR.

  C4)  When data is in flight and when allowed by rule C5
       below, a new RTT measurement MUST be made each round
       trip. Furthermore, new RTT measurements SHOULD be
       made no more than once per round trip for a given
       destination transport address. There are two reasons
       for this recommendation: First, it appears that
       measuring more frequently often does not in practice
       yield any significant benefit [ALLMAN99]; second,
       if measurements are made more often, then the values
       of RTO.Alpha and RTO.Beta in rule C3 above should be
       adjusted so that SRTT and RTTVAR still adjust to
       changes at roughly the same rate (in terms of how many
       round trips it takes them to reflect new values) as
       they would if making only one measurement per
       round-trip and using RTO.Alpha and RTO.Beta as given
       in rule C3. However, the exact nature of these
       adjustments remains a research issue.
  [...]

While it is discouraged to adjust rto_alpha and rto_beta
and not further specified how to adjust them, the RFC also
doesn't explicitly forbid it, but rather gives a RECOMMENDED
default value (rto_alpha=3, rto_beta=2). We have a couple
of users relying on the old permissions before they got
changed. That said, if someone really has the urge to adjust
them, we could allow it with a warning in the log.

Fixes: 3fd091e73b ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 01:17:32 -07:00
Tom Herbert 46fb51eb96 net: Fix save software checksum complete
Geert reported issues regarding checksum complete and UDP.
The logic introduced in commit 7e3cead517
("net: Save software checksum complete") is not correct.

This patch:
1) Restores code in __skb_checksum_complete_header except for setting
   CHECKSUM_UNNECESSARY. This function may be calculating checksum on
   something less than skb->len.
2) Adds saving checksum to __skb_checksum_complete. The full packet
   checksum 0..skb->len is calculated without adding in pseudo header.
   This value is saved in skb->csum and then the pseudo header is added
   to that to derive the checksum for validation.
3) In both __skb_checksum_complete_header and __skb_checksum_complete,
   set skb->csum_valid to whether checksum of zero was computed. This
   allows skb_csum_unnecessary to return true without changing to
   CHECKSUM_UNNECESSARY which was done previously.
4) Copy new csum related bits in __copy_skb_header.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 01:00:49 -07:00
Eric Dumazet 63c6f81cdd udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup
Its too easy to add thousand of UDP sockets on a particular bucket,
and slow down an innocent multicast receiver.

Early demux is supposed to be an optimization, we should avoid spending
too much time in it.

It is interesting to note __udp4_lib_demux_lookup() only tries to
match first socket in the chain.

10 is the threshold we already have in __udp4_lib_lookup() to switch
to secondary hash.

Fixes: 421b3885bf ("udp: ipv4: Add udp early demux")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Held <drheld@google.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-13 15:39:24 -07:00
Linus Torvalds 6d87c225f5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "This has a mix of bug fixes and cleanups.

  Alex's patch fixes a rare race in RBD.  Ilya's patches fix an ENOENT
  check when a second rbd image is mapped and a couple memory leaks.
  Zheng fixes several issues with fragmented directories and multiple
  MDSs.  Josh fixes a spin/sleep issue, and Josh and Guangliang's
  patches fix setting and unsetting RBD images read-only.

  Naturally there are several other cleanups mixed in for good measure"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
  rbd: only set disk to read-only once
  rbd: move calls that may sleep out of spin lock range
  rbd: add ioctl for rbd
  ceph: use truncate_pagecache() instead of truncate_inode_pages()
  ceph: include time stamp in every MDS request
  rbd: fix ida/idr memory leak
  rbd: use reference counts for image requests
  rbd: fix osd_request memory leak in __rbd_dev_header_watch_sync()
  rbd: make sure we have latest osdmap on 'rbd map'
  libceph: add ceph_monc_wait_osdmap()
  libceph: mon_get_version request infrastructure
  libceph: recognize poolop requests in debugfs
  ceph: refactor readpage_nounlock() to make the logic clearer
  mds: check cap ID when handling cap export message
  ceph: remember subtree root dirfrag's auth MDS
  ceph: introduce ceph_fill_fragtree()
  ceph: handle cap import atomically
  ceph: pre-allocate ceph_cap struct for ceph_add_cap()
  ceph: update inode fields according to issued caps
  rbd: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
  ...
2014-06-12 23:06:23 -07:00
Linus Torvalds f9da455b93 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

 4) BPF now has a "random" opcode, from Chema Gonzalez.

 5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

 6) Support TCP fastopen over ipv6, from Daniel Lee.

 7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers.  From Ezequiel Garcia.

 8) Support software TSO in fec driver too, from Nimrod Andy.

 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
  rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
  tcp: fixing TLP's FIN recovery
  net: fec: Add software TSO support
  net: fec: Add Scatter/gather support
  net: fec: Increase buffer descriptor entry number
  net: fec: Factorize feature setting
  net: fec: Enable IP header hardware checksum
  net: fec: Factorize the .xmit transmit function
  bridge: fix compile error when compiling without IPv6 support
  bridge: fix smatch warning / potential null pointer dereference
  via-rhine: fix full-duplex with autoneg disable
  bnx2x: Enlarge the dorq threshold for VFs
  bnx2x: Check for UNDI in uncommon branch
  bnx2x: Fix 1G-baseT link
  bnx2x: Fix link for KR with swapped polarity lane
  sctp: Fix sk_ack_backlog wrap-around problem
  net/core: Add VF link state control policy
  net/fsl: xgmac_mdio is dependent on OF_MDIO
  net/fsl: Make xgmac_mdio read error message useful
  net_sched: drr: warn when qdisc is not work conserving
  ...
2014-06-12 14:27:40 -07:00
Michal Schmidt e5eca6d41f rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
When running RHEL6 userspace on a current upstream kernel, "ip link"
fails to show VF information.

The reason is a kernel<->userspace API change introduced by commit
88c5b5ce5c ("rtnetlink: Call nlmsg_parse() with correct header length"),
after which the kernel does not see iproute2's IFLA_EXT_MASK attribute
in the netlink request.

iproute2 adjusted for the API change in its commit 63338dca4513
("libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter").

The problem has been noticed before:
http://marc.info/?l=linux-netdev&m=136692296022182&w=2
(Subject: Re: getting VF link info seems to be broken in 3.9-rc8)

We can do better than tell those with old userspace to upgrade. We can
recognize the old iproute2 in the kernel by checking the netlink message
length. Even when including the IFLA_EXT_MASK attribute, its netlink
message is shorter than struct ifinfomsg.

With this patch "ip link" shows VF information in both old and new
iproute2 versions.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 11:07:42 -07:00
Per Hurtig bef1909ee3 tcp: fixing TLP's FIN recovery
Fix to a problem observed when losing a FIN segment that does not
contain data.  In such situations, TLP is unable to recover from
*any* tail loss and instead adds at least PTO ms to the
retransmission process, i.e., RTO = RTO + PTO.

Signed-off-by: Per Hurtig <per.hurtig@kau.se>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 11:05:51 -07:00
Linus Lüssing 3993c4e159 bridge: fix compile error when compiling without IPv6 support
Some fields in "struct net_bridge" aren't available when compiling the
kernel without IPv6 support. Therefore adding a check/macro to skip the
complaining code sections in that case.

Introduced by 2cd4143192
("bridge: memorize and export selected IGMP/MLD querier port")

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 11:00:24 -07:00
Linus Lüssing 6c03ee8bda bridge: fix smatch warning / potential null pointer dereference
"New smatch warnings:
  net/bridge/br_multicast.c:1368 br_ip6_multicast_query() error:
    we previously assumed 'group' could be null (see line 1349)"

In the rare (sort of broken) case of a query having a Maximum
Response Delay of zero, we could create a potential null pointer
dereference.

Fixing this by skipping the multicast specific MLD Query parsing again
if no multicast group address is available.

Introduced by dc4eb53a99
("bridge: adhere to querier election mechanism specified by RFCs")

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 11:00:24 -07:00
Linus Torvalds 16b9057804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "This the bunch that sat in -next + lock_parent() fix.  This is the
  minimal set; there's more pending stuff.

  In particular, I really hope to get acct.c fixes merged this cycle -
  we need that to deal sanely with delayed-mntput stuff.  In the next
  pile, hopefully - that series is fairly short and localized
  (kernel/acct.c, fs/super.c and fs/namespace.c).  In this pile: more
  iov_iter work.  Most of prereqs for ->splice_write with sane locking
  order are there and Kent's dio rewrite would also fit nicely on top of
  this pile"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
  lock_parent: don't step on stale ->d_parent of all-but-freed one
  kill generic_file_splice_write()
  ceph: switch to iter_file_splice_write()
  shmem: switch to iter_file_splice_write()
  nfs: switch to iter_splice_write_file()
  fs/splice.c: remove unneeded exports
  ocfs2: switch to iter_file_splice_write()
  ->splice_write() via ->write_iter()
  bio_vec-backed iov_iter
  optimize copy_page_{to,from}_iter()
  bury generic_file_aio_{read,write}
  lustre: get rid of messing with iovecs
  ceph: switch to ->write_iter()
  ceph_sync_direct_write: stop poking into iov_iter guts
  ceph_sync_read: stop poking into iov_iter guts
  new helper: copy_page_from_iter()
  fuse: switch to ->write_iter()
  btrfs: switch to ->write_iter()
  ocfs2: switch to ->write_iter()
  xfs: switch to ->write_iter()
  ...
2014-06-12 10:30:18 -07:00
Xufeng Zhang d3217b15a1 sctp: Fix sk_ack_backlog wrap-around problem
Consider the scenario:
For a TCP-style socket, while processing the COOKIE_ECHO chunk in
sctp_sf_do_5_1D_ce(), after it has passed a series of sanity check,
a new association would be created in sctp_unpack_cookie(), but afterwards,
some processing maybe failed, and sctp_association_free() will be called to
free the previously allocated association, in sctp_association_free(),
sk_ack_backlog value is decremented for this socket, since the initial
value for sk_ack_backlog is 0, after the decrement, it will be 65535,
a wrap-around problem happens, and if we want to establish new associations
afterward in the same socket, ABORT would be triggered since sctp deem the
accept queue as full.
Fix this issue by only decrementing sk_ack_backlog for associations in
the endpoint's list.

Fix-suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 10:27:14 -07:00
Al Viro 9c1d5284c7 Merge commit '9f12600fe425bc28f0ccba034a77783c09c15af4' into for-linus
Backmerge of dcache.c changes from mainline.  It's that, or complete
rebase...

Conflicts:
	fs/splice.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-06-12 00:28:09 -04:00
David S. Miller 902455e007 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/core/rtnetlink.c
	net/core/skbuff.c

Both conflicts were very simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 16:02:55 -07:00
Doug Ledford c5b4616087 net/core: Add VF link state control policy
Commit 1d8faf48c7 (net/core: Add VF link state control) added VF link state
control to the netlink VF nested structure, but failed to add a proper entry
for the new structure into the VF policy table.  Add the missing entry so
the table and the actual data copied into the netlink nested struct are in
sync.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:51:37 -07:00
Florian Westphal 6e765a009a net_sched: drr: warn when qdisc is not work conserving
The DRR scheduler requires that items on the active list are work
conserving, i.e. do not hold on to skbs for throttling purposes, etc.
Attaching e.g. tbf renders DRR useless because all other classes on the
active list are delayed as well.

So, warn users that this configuration won't work as expected; we
already do this in couple of other qdiscs, see e.g.

commit b00355db3f
('pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler')

The 'const' change is needed to avoid compiler warning ("discards 'const'
qualifier from pointer target type").

tested with:
drr_hier() {
        parent=$1
        classes=$2
        for i in  $(seq 1 $classes); do
                classid=$parent$(printf %x $i)
                tc class add dev eth0 parent $parent classid $classid drr
		tc qdisc add dev eth0 parent $classid tbf rate 64kbit burst 256kbit limit 64kbit
        done
}
tc qdisc add dev eth0 root handle 1: drr
drr_hier 1: 32
tc filter add dev eth0 protocol all pref 1 parent 1: handle 1 flow hash keys dst perturb 1 divisor 32

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:50:59 -07:00