Commit Graph

442819 Commits

Author SHA1 Message Date
Linus Torvalds da579dd6a1 Staging driver fixes for 3.15-rc8
Here are some staging driver fixes for 3.15.  3 are for the speakup
 drivers (one fix a regression caused in 3.15-rc, and the other 2 resolve
 a tty issue found by Ben Hutchings)  The comedi and r8192e_pci driver
 fixes also resolve reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlOM9eUACgkQMUfUDdst+ykETACbBTSOZVE1pAIKGjGY2bcOtQET
 yKMAnil04hAbmz5L/5TNvdkUWu+7SfGi
 =K8cr
 -----END PGP SIGNATURE-----

Merge tag 'staging-3.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Here are some staging driver fixes for 3.15.

  Three are for the speakup drivers (one fixes a regression caused in
  3.15-rc, and the other two resolve a tty issue found by Ben Hutchings)
  The comedi and r8192e_pci driver fixes also resolve reported issues"

* tag 'staging-3.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: r8192e_pci: fix htons error
  Staging: speakup: Update __speakup_paste_selection() tty (ab)usage to match vt
  Staging: speakup: Move pasting into a work item
  staging: comedi: ni_daq_700: add mux settling delay
  speakup: fix incorrect perms on speakup_acntsa.c
2014-06-02 16:55:18 -07:00
Yuchung Cheng 0cfa5c07d6 tcp: fix cwnd undo on DSACK in F-RTO
This bug is discovered by an recent F-RTO issue on tcpm list
https://www.ietf.org/mail-archive/web/tcpm/current/msg08794.html

The bug is that currently F-RTO does not use DSACK to undo cwnd in
certain cases: upon receiving an ACK after the RTO retransmission in
F-RTO, and the ACK has DSACK indicating the retransmission is spurious,
the sender only calls tcp_try_undo_loss() if some never retransmisted
data is sacked (FLAG_ORIG_DATA_SACKED).

The correct behavior is to unconditionally call tcp_try_undo_loss so
the DSACK information is used properly to undo the cwnd reduction.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 16:50:49 -07:00
Eric W. Biederman 2d7a85f4b0 netlink: Only check file credentials for implicit destinations
It was possible to get a setuid root or setcap executable to write to
it's stdout or stderr (which has been set made a netlink socket) and
inadvertently reconfigure the networking stack.

To prevent this we check that both the creator of the socket and
the currentl applications has permission to reconfigure the network
stack.

Unfortunately this breaks Zebra which always uses sendto/sendmsg
and creates it's socket without any privileges.

To keep Zebra working don't bother checking if the creator of the
socket has privilege when a destination address is specified.  Instead
rely exclusively on the privileges of the sender of the socket.

Note from Andy: This is exactly Eric's code except for some comment
clarifications and formatting fixes.  Neither I nor, I think, anyone
else is thrilled with this approach, but I'm hesitant to wait on a
better fix since 3.15 is almost here.

Note to stable maintainers: This is a mess.  An earlier series of
patches in 3.15 fix a rather serious security issue (CVE-2014-0181),
but they did so in a way that breaks Zebra.  The offending series
includes:

    commit aa4cf9452f
    Author: Eric W. Biederman <ebiederm@xmission.com>
    Date:   Wed Apr 23 14:28:03 2014 -0700

        net: Add variants of capable for use on netlink messages

If a given kernel version is missing that series of fixes, it's
probably worth backporting it and this patch.  if that series is
present, then this fix is critical if you care about Zebra.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 16:34:09 -07:00
Kristian Evensen 22fd2a52f7 ipheth: Add support for iPad 2 and iPad 3
Each iPad model has a different product id, this patch adds support for iPad 2
(pid 0x12a2) and iPad 3 (pid 0x12a6). Note that iPad 2 must be jailbroken and a
third-party app must be used for tethering to work. On iPad 3, tethering works
out of the box (assuming your ISP is nice).

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 16:11:55 -07:00
Jiri Pirko 9d0d68faea team: fix mtu setting
Now it is not possible to set mtu to team device which has a port
enslaved to it. The reason is that when team_change_mtu() calls
dev_set_mtu() for port device, notificator for NETDEV_PRECHANGEMTU
event is called and team_device_event() returns NOTIFY_BAD forbidding
the change. So fix this by returning NOTIFY_DONE here in case team is
changing mtu in team_change_mtu().

Introduced-by: 3d249d4c "net: introduce ethernet teaming device"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 14:56:01 -07:00
Eric Dumazet 39c36094d7 net: fix inet_getid() and ipv6_select_ident() bugs
I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
is disabled.
Note how GSO/TSO packets do not have monotonically incrementing ID.

06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)

It appears I introduced this bug in linux-3.1.

inet_getid() must return the old value of peer->ip_id_count,
not the new one.

Lets revert this part, and remove the prevention of
a null identification field in IPv6 Fragment Extension Header,
which is dubious and not even done properly.

Fixes: 87c48fa3b4 ("ipv6: make fragment identifications less predictable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 14:09:28 -07:00
Aleksander Morgado fc0d6e9cd0 net: qmi_wwan: interface #11 in Sierra Wireless MC73xx is not QMI
This interface is unusable, as the cdc-wdm character device doesn't reply to
any QMI command. Also, the out-of-tree Sierra Wireless GobiNet driver fully
skips it.

Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 14:00:28 -07:00
Aleksander Morgado 9a793e71eb net: qmi_wwan: add additional Sierra Wireless QMI devices
A set of new VID/PIDs retrieved from the out-of-tree GobiNet/GobiSerial
Sierra Wireless drivers.

Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 14:00:28 -07:00
Toshiaki Makita e0d7968ab6 bridge: Prevent insertion of FDB entry with disallowed vlan
br_handle_local_finish() is allowing us to insert an FDB entry with
disallowed vlan. For example, when port 1 and 2 are communicating in
vlan 10, and even if vlan 10 is disallowed on port 3, port 3 can
interfere with their communication by spoofed src mac address with
vlan id 10.

Note: Even if it is judged that a frame should not be learned, it should
not be dropped because it is destined for not forwarding layer but higher
layer. See IEEE 802.1Q-2011 8.13.10.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 13:38:23 -07:00
Michal Schmidt bfc5184b69 netlink: rate-limit leftover bytes warning and print process name
Any process is able to send netlink messages with leftover bytes.
Make the warning rate-limited to prevent too much log spam.

The warning is supposed to help find userspace bugs, so print the
triggering command name to implicate the buggy program.

[v2: Use pr_warn_ratelimited instead of printk_ratelimited.]

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 11:16:11 -07:00
Takashi Iwai 192a98e280 ALSA: hda/realtek - Fix COEF widget NID for ALC260 replacer fixup
The conversion to a fixup table for Replacer model with ALC260 in
commit 20f7d928 took the wrong widget NID for COEF setups.  Namely,
NID 0x1a should have been used instead of NID 0x20, which is the
common node for all Realtek codecs but ALC260.

Fixes: 20f7d928fa ('ALSA: hda/realtek - Replace ALC260 model=replacer with the auto-parser')
Cc: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2014-06-02 16:48:28 +02:00
Ronan Marquet e30cf2d2be ALSA: hda/realtek - Correction of fixup codes for PB V7900 laptop
Correcion of wrong fixup entries add in commit ca8f0424 to replace
static model quirk for PB V7900 laptop (will model).

[note: the removal of ALC260_FIXUP_HP_PIN_0F chain is also needed as a
 part of the fix; otherwise the pin is set up wrongly as a headphone,
 and user-space (PulseAudio) may be wrongly trying to detect the jack
 state -- tiwai]

Fixes: ca8f04247e ('ALSA: hda/realtek - Add the fixup codes for ALC260 model=will')
Signed-off-by: Ronan Marquet <ronan.marquet@orange.fr>
Cc: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2014-06-02 16:46:31 +02:00
Dave Young a3530e8fe9 x86/efi: Do not export efi runtime map in case old map
For ioremapped efi memory aka old_map the virt addresses are not persistant
across kexec reboot. kexec-tools will read the runtime maps from sysfs then
pass them to 2nd kernel and assuming kexec efi boot is ok. This will cause
kexec boot failure.

To address this issue do not export runtime maps in case efi old_map so
userspace can use no efi boot instead.

Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-06-02 12:21:59 +01:00
Doug Smythies bf8102228a intel_pstate: Improve initial busy calculation
This change makes the busy calculation using 64 bit math which prevents
overflow for large values of aperf/mperf.

Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02 12:46:01 +02:00
Dirk Brandewie c4ee841f60 intel_pstate: add sample time scaling
The PID assumes that samples are of equal time, which for a deferable
timers this is not true when the system goes idle.  This causes the
PID to take a long time to converge to the min P state and depending
on the pattern of the idle load can make the P state appear stuck.

The hold-off value of three sample times before using the scaling is
to give a grace period for applications that have high performance
requirements and spend a lot of time idle,  The poster child for this
behavior is the ffmpeg benchmark in the Phoronix test suite.

Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02 12:45:05 +02:00
Dirk Brandewie f0fe3cd7e1 intel_pstate: Correct rounding in busy calculation
Changing to fixed point math throughout the busy calculation in
commit e66c1768 (Change busy calculation to use fixed point
math.) Introduced some inaccuracies by rounding the busy value at two
points in the calculation.  This change removes roundings and moves
the rounding to the output of the PID where the calculations are
complete and the value returned as an integer.

Fixes: e66c176837 (intel_pstate: Change busy calculation to use fixed point math.)
Reported-by: Doug Smythies <dsmythies@telus.net>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02 12:44:59 +02:00
Dirk Brandewie adacdf3f2b intel_pstate: Remove C0 tracking
Commit fcb6a15c (intel_pstate: Take core C0 time into account for core
busy calculation) introduced a regression referenced below.  The issue
with "lockup" after suspend that this commit was addressing is now dealt
with in the suspend path.

Fixes: fcb6a15c2e (intel_pstate: Take core C0 time into account for core busy calculation)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=66581
Link: https://bugzilla.kernel.org/show_bug.cgi?id=75121
Reported-by: Doug Smythies <dsmythies@telus.net>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02 12:44:48 +02:00
Christian König 91b0275c0e drm/radeon: use the CP DMA on CIK
The SDMA sometimes doesn't seem to work reliable.

Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
2014-06-02 10:57:04 +02:00
Christian König 37903b5e08 drm/radeon: sync page table updates
Only necessary if we don't use the same engine for buffer moves and table updates.

Signed-off-by: Christian König <christian.koenig@amd.com>
2014-06-02 10:56:21 +02:00
Christian König 2f93dc32b0 drm/radeon: fix vm buffer size estimation
Only relevant if we got VM_BLOCK_SIZE>9, but better save than sorry.

Signed-off-by: Christian König <christian.koenig@amd.com>
2014-06-02 10:54:02 +02:00
Jon Maxwell c65c7a3066 bridge: notify user space after fdb update
There has been a number incidents recently where customers running KVM have
reported that VM hosts on different Hypervisors are unreachable. Based on
pcap traces we found that the bridge was broadcasting the ARP request out
onto the network. However some NICs have an inbuilt switch which on occasions
were broadcasting the VMs ARP request back through the physical NIC on the
Hypervisor. This resulted in the bridge changing ports and incorrectly learning
that the VMs mac address was external. As a result the ARP reply was directed
back onto the external network and VM never updated it's ARP cache. This patch
will notify the bridge command, after a fdb has been updated to identify such
port toggling.

Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-01 22:14:50 -07:00
Sergei Antonov ba6f582606 drm/crtc-helper: skip locking checks in panicking path
Skip locking checks in drm_helper_*_in_use() if they are called in panicking
path. See similar code in drm_warn_on_modeset_not_all_locked().

After panic information has been output, these WARN_ONs go off outputing a lot
of lines and scrolling the panic information out of the screen. Here is a
partial call trace showing how execution reaches them:

? drm_helper_crtc_in_use()
? __drm_helper_disable_unused_functions()
? several *_set_config functions
? drm_fb_helper_restore_fbdev_mode()

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-06-02 13:36:52 +10:00
Alex Deucher 3640da2faa drm/radeon/dpm: resume fixes for some systems
Setting the power state prior to restoring the display
hardware leads to blank screens on some systems.  Drop
the power state set from dpm resume.  The power state
will get set as part of the mode set sequence.  Also
add an explicit power state set after mode set resume
to cover PX and headless systems.

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=76761

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-06-02 13:33:03 +10:00
Aleksander Morgado 4324be1e0b net: qmi_wwan: add Netgear AirCard 341U
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-01 19:43:57 -07:00
Nikolay Aleksandrov 4b9b1cdf83 net: fix wrong mac_len calculation for vlans
After 1e785f48d2 ("net: Start with correct mac_len in
skb_network_protocol") skb->mac_len is used as a start of the
calculation in skb_network_protocol() but that is not always correct. If
skb->protocol == 8021Q/AD, usually the vlan header is already inserted
in the skb (i.e. vlan reorder hdr == 0). Usually when the packet enters
dev_hard_xmit it has mac_len == 0 so we take 2 bytes from the
destination mac address (skb->data + VLAN_HLEN) as a type in
skb_network_protocol() and return vlan_depth == 4. In the case where TSO is
off, then the mac_len is set but it's == 18 (ETH_HLEN + VLAN_HLEN), so
skb_network_protocol() returns a type from inside the packet and
offset == 22. Also make vlan_depth unsigned as suggested before.
As suggested by Eric Dumazet, move the while() loop in the if() so we
can avoid additional testing in fast path.

Here are few netperf tests + debug printk's to illustrate:
cat netperf.tso-on.reorder-on.bugged
- Vlan -> device (reorder on, default, this case is okay)
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.3.1 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    7111.54
[   81.605435] skb->len 65226 skb->gso_size 1448 skb->proto 0x800
skb->mac_len 0 vlan_depth 0 type 0x800

- Vlan -> device (reorder off, bad)
cat netperf.tso-on.reorder-off.bugged
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.3.1 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00     241.35
[  204.578332] skb->len 1518 skb->gso_size 0 skb->proto 0x8100
skb->mac_len 0 vlan_depth 4 type 0x5301
0x5301 are the last two bytes of the destination mac.

And if we stop TSO, we may get even the following:
[   83.343156] skb->len 2966 skb->gso_size 1448 skb->proto 0x8100
skb->mac_len 18 vlan_depth 22 type 0xb84
Because mac_len already accounts for VLAN_HLEN.

After the fix:
cat netperf.tso-on.reorder-off.fixed
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.3.1 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.01    5001.46
[   81.888489] skb->len 65230 skb->gso_size 1448 skb->proto 0x8100
skb->mac_len 0 vlan_depth 18 type 0x800

CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Daniel Borkman <dborkman@redhat.com>
CC: David S. Miller <davem@davemloft.net>

Fixes:1e785f48d29a ("net: Start with correct mac_len in
skb_network_protocol")
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-01 19:39:13 -07:00
Linus Torvalds fad01e866a Linux 3.15-rc8 2014-06-01 19:12:24 -07:00
Linus Torvalds 204fe0380b Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fix from Ben Herrenschmidt:
 "Here's just one trivial patch to wire up sys_renameat2 which I seem to
  have completely missed so far.

  (My test build scripts fwd me warnings but miss the ones generated for
  missing syscalls)"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Wire renameat2() syscall
2014-06-01 18:30:07 -07:00
Linus Torvalds 568180a517 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
 "A fair number of fixes across the field.  Nothing terribly
  complicated; the one liners in below changelog should be fairly
  descriptive.

  Noteworthy is the SB1 change which the result of changes to binutils
  resulting in one big gas warning for most files being assembled as
  well as the asid_cache and branch emulation fixes which fix corruption
  or possible uninteded behaviour of kernel or application code.  The
  remainder of fixes are more platforms or subsystem specific"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: R46000: Fix Micro-assembler field overflow for R4600 V2
  MIPS: ptrace: Avoid smp_processor_id() in preemptible code
  MIPS: Lemote 2F: cs5536: mfgpt: use raw locks
  MIPS: SB1: Fix excessive kernel warnings.
  MIPS: RC32434: fix broken PCI resource initialization
  MIPS: malta: memory.c: Initialize the 'memsize' variable
  MIPS: Fix typo when reporting cache and ftlb errors for ImgTec cores
  MIPS: Fix inconsistancy of __NR_Linux_syscalls value
  MIPS: Fix branch emulation of branch likely instructions.
  MIPS: Fix a typo error in AUDIT_ARCH definition
  MIPS: Change type of asid_cache to unsigned long
2014-06-01 18:28:58 -07:00
Linus Torvalds 32439700fe Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Various fixlets, mostly related to the (root-only) SCHED_DEADLINE
  policy, but also a hotplug bug fix and a fix for a NR_CPUS related
  overallocation bug causing a suspend/resume regression"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix hotplug vs. set_cpus_allowed_ptr()
  sched/cpupri: Replace NR_CPUS arrays
  sched/deadline: Replace NR_CPUS arrays
  sched/deadline: Restrict user params max value to 2^63 ns
  sched/deadline: Change sched_getparam() behaviour vs SCHED_DEADLINE
  sched: Disallow sched_attr::sched_policy < 0
  sched: Make sched_setattr() correctly return -EFBIG
2014-06-01 18:26:59 -07:00
Benjamin Herrenschmidt 8212f58a9b powerpc: Wire renameat2() syscall
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-02 09:24:27 +10:00
David S. Miller 6ce995c6f4 Included changes:
- prevent NULL dereference in multicast code
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTiZ58AAoJEJgn97Bh2u9e2x4QAIjWLDIo6feo4jH8l6q6R5iO
 cH/EXCtqk9GHNwvfZDNt+pF19ejzVk/TPnmmXTZ4QcElS9GuXe5WdWxiGcS5KEwa
 0UNDRp8fgcBSV1Kqc/vbyKiQ4j69QtC1PPfLWUtxj/GYE0qHX/A1OzB9zROvoHJ7
 sa3l8O5XRWiaxBYDkT0RfhHH0jeDdvm3I9yt8B+4B6c71094VIsfGXBVPp4tPrdg
 nkuzBdwF0HFPiFrlsfboJDLcXPLpRR93H1GsmfELYd5jQ4rtUhlcuEESq6573tvB
 TV93tkm/zmbwtInMoPI29qKL8t2478cJH7SvKvM4NiqMsB1zOhknhUXzElh9TPGA
 xyNivxJraYJzL53XguBFO8A8fP1k/E8Z6UQXJbgry4lu+6qZ60e0/J8zGxGpSamP
 i1JX0MAVPX6T4MAlZ70LMxfmzJ5sSNkkYyXobG+aBa/AgzRsXVvG4So1qi364COx
 btCxgBXK1Z20ZuNclY8/J06D8EbTXI5y5MCSDvMCOHQlb5mjBl34RtFVw+5/QXkg
 v2suc7T/YLOPNtZktZC2506caPHoOlwEVvkyA55p+qdkcD/Dd5Iv4Hndi+g+C5gv
 O2ja7gUQco1R8ElormKW9rE7OvjiUlowNJmguXWAdzc9FC0yISpP66BAGjBqwhF9
 6YibEebXMQICjxAVTEAM
 =fvfu
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- prevent NULL dereference in multicast code

Antonion Quartulli says:

====================
pull request net: batman-adv 20140527

here you have another very small fix intended for net/linux-3.15.
It prevents some multicast functions from dereferencing a NULL pointer.
(Actually it was nothing more than a typo)
I hope it is not too late for such a small patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-31 20:01:47 -07:00
Linus Torvalds a4bf79eb6a Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core futex/rtmutex fixes from Thomas Gleixner:
 "Three fixlets for long standing issues in the futex/rtmutex code
  unearthed by Dave Jones syscall fuzzer:

   - Add missing early deadlock detection checks in the futex code
   - Prevent user space from attaching a futex to kernel threads
   - Make the deadlock detector of rtmutex work again

  Looks large, but is more comments than code change"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rtmutex: Fix deadlock detector for real
  futex: Prevent attaching to kernel threads
  futex: Add another early deadlock detection check
2014-05-31 09:47:55 -07:00
Linus Torvalds 80e0679469 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Mostly quiet now:

  i915:
    fixing userspace visiblie issues, all stable marked

  radeon:
    one more pll fix, two crashers, one suspend/resume regression"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon: Resume fbcon last
  drm/radeon: only allocate necessary size for vm bo list
  drm/radeon: don't allow RADEON_GEM_DOMAIN_CPU for command submission
  drm/radeon: avoid crash if VM command submission isn't available
  drm/radeon: lower the ref * post PLL maximum once more
  drm/i915: Prevent negative relocation deltas from wrapping
  drm/i915: Only copy back the modified fields to userspace from execbuffer
  drm/i915: Fix dynamic allocation of physical handles
2014-05-31 09:19:02 -07:00
Linus Torvalds 9f12600fe4 dcache: add missing lockdep annotation
lock_parent() very much on purpose does nested locking of dentries, and
is careful to maintain the right order (lock parent first).  But because
it didn't annotate the nested locking order, lockdep thought it might be
a deadlock on d_lock, and complained.

Add the proper annotation for the inner locking of the child dentry to
make lockdep happy.

Introduced by commit 046b961b45 ("shrink_dentry_list(): take parent's
->d_lock earlier").

Reported-and-tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-31 09:13:21 -07:00
Marek Lindner af0a171c07 batman-adv: fix NULL pointer dereferences
Was introduced with 4c8755d69c
("batman-adv: Send multicast packets to nodes with a WANT_ALL flag")

Reported-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-05-31 10:07:14 +02:00
David S. Miller dbfc4b698a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains a late fix for IPVS:

* Fix crash when trying to remove the transport header with non-linear
  skbuffs, this was introduced in 3.6-rc. Patch from Peter Christensen
  via the IPVS folks.

I'll pass this to -stable once this hits mainstream.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:56:09 -07:00
David S. Miller 554215c5ca linux-can-fixes-for-3.15-20140528
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iEYEABECAAYFAlOFkZ4ACgkQjTAFq1RaXHOYCQCeJwwa8XQMw2HkGemGMBSpcUHb
 MLwAoISCoqgrXWq/lBBzDqlVXik8fR4t
 =gHVg
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-3.15-20140528' of git://gitorious.org/linux-can/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2014-05-28

here's a pull request for v3.15, hope it's not too late.

Oliver Hartkopp fixed a bug in the CAN led trigger device renaming code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:32:53 -07:00
Jack Morgenstein 111c6094bd net/mlx4_core: Reset RoCE VF gids when guest driver goes down
Reset the GIDs assigned to a VF in the port RoCE GID table when
that guest goes down (either crashes or goes down cleanly).

As part of this fix, we refactor the RoCE gid table driver copy,
moving it to the mlx4_port_info structure (together with the MAC
and VLAN tables).

As with the MAC and VLAN tables, we now use a mutex per port
for the GID table so that modifying the driver copy and
modifying the firmware copy of a port GID table becomes an
atomic operation (thus avoiding driver-copy/FW-copy mismatches).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 16:57:52 -07:00
Ivan Mikhaylov 09271db6e0 emac: aggregation of v1-2 PLB errors for IER register
Aggreagation of version 1-2 because of version 1 can hit
PLB errors too. If it's not set so we missing events for PLB bits
and driver can't process those interrupts.

Signed-off-by: Ivan Mikhaylov <ivan@ru.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 16:29:57 -07:00
Ivan Mikhaylov faacd3af0c emac: add missing support of 10mbit in emac/rgmii
In chips of emac/rgmii b'000' for 0/1 channel isn't suitable which
resulted in non working network interface in this mode.

Signed-off-by: Ivan Mikhaylov <ivan@ru.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 16:29:57 -07:00
Daniel Vetter 18ee37a485 drm/radeon: Resume fbcon last
So a few people complained that

commit 177cf92de4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Apr 1 22:14:59 2014 +0200

    drm/crtc-helpers: fix dpms on logic

which was merged into 3.15-rc1, broke resume on radeons. Strangely git
bisect lead everyone to

commit 25f397a429
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Jul 19 18:57:11 2013 +0200

    drm/crtc-helper: explicit DPMS on after modeset

which was merged long ago and actually part of 3.14.

Digging deeper I've noticed (again) that the call to
drm_helper_resume_force_mode in the radeon resume handlers was a no-op
previously because everything gets shut down on suspend. radeon does
this with explicit calls to drm_helper_connector_dpms with DPMS_OFF.
But with 177c we now force the dpms state to ON, so suddenly
resume_force_mode actually forced the crtcs back on.

This is the intention of the change after all, the problem is that
radeon resumes the fbdev console layer _before_ restoring the display,
through calling fb_set_suspend. And fbcon does an immediate ->set_par,
which in turn causes the same forced mode restore to happen.

Two concurrent modeset operations didn't lead to happiness. Fix this
by delaying the fbcon resume until the end of the readeon resum
functions.

v2: Fix up a bit of the spelling fail.

References: https://lkml.org/lkml/2014/5/29/1043
References: https://lkml.org/lkml/2014/5/2/388
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=74751
Tested-by: Ken Moffat <zarniwhoop@ntlworld.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Ken Moffat <zarniwhoop@ntlworld.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@gmail.com>
2014-05-31 09:19:51 +10:00
Dave Airlie 1446e04c9b Merge branch 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux into drm-fixes
this is the next pull request for stashed up radeon fixes for 3.15. This is finally calming down with only four patches in this pull request.

* 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux:
  drm/radeon: only allocate necessary size for vm bo list
  drm/radeon: don't allow RADEON_GEM_DOMAIN_CPU for command submission
  drm/radeon: avoid crash if VM command submission isn't available
  drm/radeon: lower the ref * post PLL maximum once more
2014-05-31 09:19:05 +10:00
Linus Torvalds 1487385edb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov:
 "A couple of driver/build fixups and also redone quirk for Synaptics
  touchpads on Lenovo boxes (now using PNP IDs instead of DMI data to
  limit number of quirks)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: synaptics - change min/max quirk table to pnp-id matching
  Input: synaptics - add a matches_pnp_id helper function
  Input: synaptics - T540p - unify with other LEN0034 models
  Input: synaptics - add min/max quirk for the ThinkPad W540
  Input: ambakmi - request a shared interrupt for AMBA KMI devices
  Input: pxa27x-keypad - fix generating scancode
  Input: atmel-wm97xx - only build for AVR32
  Input: fix ps2/serio module dependency
2014-05-30 12:07:48 -07:00
Linus Torvalds 1326af2464 Regression fix for the IEEE 1394 subsystem:
Re-enable IRQ-based asynchronous request reception at addresses below 128 TB.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTiIlwAAoJEHnzb7JUXXnQ+4oQAJ3ZHtqDJNP99qokiyVorDSw
 hk2b/nCF2A7PYGgYdyImDx+xiOdI7FE3T6vpDBrel/Vfc/eny3g1PSmklF0pmrgR
 LKyEAKmqRrEFKNevJPIAOyvf8jMV4E7mkKKp2iGN5s77CdW/KeoNfeP8yzRzMW3n
 ChG9yZIPWMVuoiGBL74iOZujETP3Y62CVRP9Anz3uG2I7oGQN+09uK+Qc58SR917
 fbvJh8f2Yrf+st7oogwpFvKRlDqyskmoonWc39OCPcGeumQ9LycTs5KTMx4htVjW
 jf8bJDQ/xmHCvPpZpQtZiMbj5pmMkD9cSt0KvMHv4TuV/1NBFECITjoQ03//+byb
 RmxTH8GRryfpafTLL1Qlf+boecjEK9yuJ/k1YbDEt9YTc6x29oTqJjeXbli5r3FX
 f0MYLpDEdDH4GuKMgdvDt+CFzGI7NHWupjyYbm/uT0LCEvLhmKnh3z+3HhmUO2wD
 7PoDp63t779YhiScnfA+P5CXmjnMjZNtqeCyJYiCDrzcCS3VVQVocFf3iEKF8xrf
 gOju+BZHuNzw5S3Tn0kf3LaexUW9eMD1LAYDooF1eqpl/hrmsYpLhw3mI7601CDa
 UNHTY2Wz6Y/2PwkpzKgsHrrKnyaQWak/kZl0jFy7lZhvJqFyNPe8nVWYRdPdVeHk
 /b/TKJZDiRQXZCrcfEXc
 =GDAT
 -----END PGP SIGNATURE-----

Merge tag 'firewire-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Stefan Richter:
 "A regression fix for the IEEE 1394 subsystem: re-enable IRQ-based
  asynchronous request reception at addresses below 128 TB"

* tag 'firewire-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: revert to 4 GB RDMA, fix protocols using Memory Space
2014-05-30 12:06:15 -07:00
Linus Torvalds 24e19d279f A dm-cache stable fix to split discards on cache block boundaries
because dm-cache cannot yet handle discards that span cache blocks.
 
 Really fix a dm-mpath LOCKDEP warning that was introduced in -rc1.
 
 Add a 'no_space_timeout' control to dm-thinp to restore the ability to
 queue IO indefinitely when no data space is available.  This fixes a
 change in behavior that was introduced in -rc6 where the timeout
 couldn't be disabled.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTh9QAAAoJEMUj8QotnQNaNpYH/j07FeH8YlxXRcFzDi7xRVtx
 luK5b9fLLlmPwW2eKSrvpI8Le4jwDvLwBmpEvN9/wyPiRDSUnYIyYdoV7RJXX2LT
 wqXatObb84fwQBJ6/q8o2YMzU5ODa5XT6KGEZyD4cHdAZ9FZSwfgqhslyrBJDkSN
 JBFfkXu066qw8cuYA6KFv4DwBf5eHAt5AjV/QPGd5zGXwETHLZ4ypgpwYHAGbdXa
 MgfHetwtEnJYvVQex/e+9xC5IDc4/BEAhZq4n3YmEJjNq8EbX15udHmCX7S2M5pT
 +9tNjUMz4j9BhoC9F8ntRz0pxWZtJK9hGojO4xoXqOCOHgp1xLQd/tHrFZS0v8E=
 =u5Xd
 -----END PGP SIGNATURE-----

Merge tag 'dm-3.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device-mapper fixes from Mike Snitzer:
 "A dm-cache stable fix to split discards on cache block boundaries
  because dm-cache cannot yet handle discards that span cache blocks.

  Really fix a dm-mpath LOCKDEP warning that was introduced in -rc1.

  Add a 'no_space_timeout' control to dm-thinp to restore the ability to
  queue IO indefinitely when no data space is available.  This fixes a
  change in behavior that was introduced in -rc6 where the timeout
  couldn't be disabled"

* tag 'dm-3.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm mpath: really fix lockdep warning
  dm cache: always split discards on cache block boundaries
  dm thin: add 'no_space_timeout' dm-thin-pool module param
2014-05-30 12:04:56 -07:00
Minchan Kim 6538b8ea88 x86_64: expand kernel stack to 16K
While I play inhouse patches with much memory pressure on qemu-kvm,
3.14 kernel was randomly crashed. The reason was kernel stack overflow.

When I investigated the problem, the callstack was a little bit deeper
by involve with reclaim functions but not direct reclaim path.

I tried to diet stack size of some functions related with alloc/reclaim
so did a hundred of byte but overflow was't disappeard so that I encounter
overflow by another deeper callstack on reclaim/allocator path.

Of course, we might sweep every sites we have found for reducing
stack usage but I'm not sure how long it saves the world(surely,
lots of developer start to add nice features which will use stack
agains) and if we consider another more complex feature in I/O layer
and/or reclaim path, it might be better to increase stack size(
meanwhile, stack usage on 64bit machine was doubled compared to 32bit
while it have sticked to 8K. Hmm, it's not a fair to me and arm64
already expaned to 16K. )

So, my stupid idea is just let's expand stack size and keep an eye
toward stack consumption on each kernel functions via stacktrace of ftrace.
For example, we can have a bar like that each funcion shouldn't exceed 200K
and emit the warning when some function consumes more in runtime.
Of course, it could make false positive but at least, it could make a
chance to think over it.

I guess this topic was discussed several time so there might be
strong reason not to increase kernel stack size on x86_64, for me not
knowing so Ccing x86_64 maintainers, other MM guys and virtio
maintainers.

Here's an example call trace using up the kernel stack:

         Depth    Size   Location    (51 entries)
         -----    ----   --------
   0)     7696      16   lookup_address
   1)     7680      16   _lookup_address_cpa.isra.3
   2)     7664      24   __change_page_attr_set_clr
   3)     7640     392   kernel_map_pages
   4)     7248     256   get_page_from_freelist
   5)     6992     352   __alloc_pages_nodemask
   6)     6640       8   alloc_pages_current
   7)     6632     168   new_slab
   8)     6464       8   __slab_alloc
   9)     6456      80   __kmalloc
  10)     6376     376   vring_add_indirect
  11)     6000     144   virtqueue_add_sgs
  12)     5856     288   __virtblk_add_req
  13)     5568      96   virtio_queue_rq
  14)     5472     128   __blk_mq_run_hw_queue
  15)     5344      16   blk_mq_run_hw_queue
  16)     5328      96   blk_mq_insert_requests
  17)     5232     112   blk_mq_flush_plug_list
  18)     5120     112   blk_flush_plug_list
  19)     5008      64   io_schedule_timeout
  20)     4944     128   mempool_alloc
  21)     4816      96   bio_alloc_bioset
  22)     4720      48   get_swap_bio
  23)     4672     160   __swap_writepage
  24)     4512      32   swap_writepage
  25)     4480     320   shrink_page_list
  26)     4160     208   shrink_inactive_list
  27)     3952     304   shrink_lruvec
  28)     3648      80   shrink_zone
  29)     3568     128   do_try_to_free_pages
  30)     3440     208   try_to_free_pages
  31)     3232     352   __alloc_pages_nodemask
  32)     2880       8   alloc_pages_current
  33)     2872     200   __page_cache_alloc
  34)     2672      80   find_or_create_page
  35)     2592      80   ext4_mb_load_buddy
  36)     2512     176   ext4_mb_regular_allocator
  37)     2336     128   ext4_mb_new_blocks
  38)     2208     256   ext4_ext_map_blocks
  39)     1952     160   ext4_map_blocks
  40)     1792     384   ext4_writepages
  41)     1408      16   do_writepages
  42)     1392      96   __writeback_single_inode
  43)     1296     176   writeback_sb_inodes
  44)     1120      80   __writeback_inodes_wb
  45)     1040     160   wb_writeback
  46)      880     208   bdi_writeback_workfn
  47)      672     144   process_one_work
  48)      528     112   worker_thread
  49)      416     240   kthread
  50)      176     176   ret_from_fork

[ Note: the problem is exacerbated by certain gcc versions that seem to
  generate much bigger stack frames due to apparently bad coalescing of
  temporaries and generating too many spills.  Rusty saw gcc-4.6.4 using
  35% more stack on the virtio path than 4.8.2 does, for example.

  Minchan not only uses such a bad gcc version (4.6.3 in his case), but
  some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
  what causes that kernel_map_pages() frame, for example). But we're
  clearly getting too close.

  The VM code also seems to have excessive stack frames partly for the
  same compiler reason, triggered by excessive inlining and lots of
  function arguments.

  We need to improve on our stack use, but in the meantime let's do this
  simple stack increase too.  Unlike most earlier reports, there is
  nothing simple that stands out as being really horribly wrong here,
  apart from the fact that the stack frames are just bigger than they
  should need to be.        - Linus ]

Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S Tsirkin <mst@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-30 11:52:51 -07:00
Linus Torvalds 6f6111e4a7 Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs dcache livelock fix from Al Viro:
 "Fixes for livelocks in shrink_dentry_list() introduced by fixes to
  shrink list corruption; the root cause was that trylock of parent's
  ->d_lock could be disrupted by d_walk() happening on other CPUs,
  resulting in shrink_dentry_list() making no progress *and* the same
  d_walk() being called again and again for as long as
  shrink_dentry_list() doesn't get past that mess.

  The solution is to have shrink_dentry_list() treat that trylock
  failure not as 'try to do the same thing again', but 'lock them in the
  right order'"

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  dentry_kill() doesn't need the second argument now
  dealing with the rest of shrink_dentry_list() livelock
  shrink_dentry_list(): take parent's ->d_lock earlier
  expand dentry_kill(dentry, 0) in shrink_dentry_list()
  split dentry_kill()
  lift the "already marked killed" case into shrink_dentry_list()
2014-05-30 09:52:55 -07:00
Al Viro 8cbf74da43 dentry_kill() doesn't need the second argument now
it's 1 in the only remaining caller.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-30 11:10:33 -04:00
Al Viro b2b80195d8 dealing with the rest of shrink_dentry_list() livelock
We have the same problem with ->d_lock order in the inner loop, where
we are dropping references to ancestors.  Same solution, basically -
instead of using dentry_kill() we use lock_parent() (introduced in the
previous commit) to get that lock in a safe way, recheck ->d_count
(in case if lock_parent() has ended up dropping and retaking ->d_lock
and somebody managed to grab a reference during that window), trylock
the inode->i_lock and use __dentry_kill() to do the rest.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-30 11:10:33 -04:00
Al Viro 046b961b45 shrink_dentry_list(): take parent's ->d_lock earlier
The cause of livelocks there is that we are taking ->d_lock on
dentry and its parent in the wrong order, forcing us to use
trylock on the parent's one.  d_walk() takes them in the right
order, and unfortunately it's not hard to create a situation
when shrink_dentry_list() can't make progress since trylock
keeps failing, and shrink_dcache_parent() or check_submounts_and_drop()
keeps calling d_walk() disrupting the very shrink_dentry_list() it's
waiting for.

Solution is straightforward - if that trylock fails, let's unlock
the dentry itself and take locks in the right order.  We need to
stabilize ->d_parent without holding ->d_lock, but that's doable
using RCU.  And we'd better do that in the very beginning of the
loop in shrink_dentry_list(), since the checks on refcount, etc.
would need to be redone anyway.

That deals with a half of the problem - killing dentries on the
shrink list itself.  Another one (dropping their parents) is
in the next commit.

locking parent is interesting - it would be easy to do rcu_read_lock(),
lock whatever we think is a parent, lock dentry itself and check
if the parent is still the right one.  Except that we need to check
that *before* locking the dentry, or we are risking taking ->d_lock
out of order.  Fortunately, once the D1 is locked, we can check if
D2->d_parent is equal to D1 without the need to lock D2; D2->d_parent
can start or stop pointing to D1 only under D1->d_lock, so taking
D1->d_lock is enough.  In other words, the right solution is
rcu_read_lock/lock what looks like parent right now/check if it's
still our parent/rcu_read_unlock/lock the child.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-30 11:03:21 -04:00