Commit Graph

442806 Commits

Author SHA1 Message Date
Linus Torvalds 813895f8dc Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "A significantly larger than I'd like set of patches for just below the
  wire.  All of these, however, fix real problems.

  The one thing that is genuinely scary in here is the change of SMP
  initialization, but that *does* fix a confirmed hang when booting
  virtual machines.

  There is also a patch to actually do the right thing about not
  offlining a CPU when there are not enough interrupt vectors available
  in the system; the accounting was done incorrectly.  The worst case
  for that patch is that we fail to offline CPUs when we should (the new
  code is strictly more conservative than the old), so is not
  particularly risky.

  Most of the rest is minor stuff; the EFI patches are all about
  exporting correct information to boot loaders and kexec"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: EFI_MIXED should not prohibit loading above 4G
  x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
  x86/smpboot: Log error on secondary CPU wakeup failure at ERR level
  x86: Fix list/memory corruption on CPU hotplug
  x86: irq: Get correct available vectors for cpu disable
  x86/efi: Do not export efi runtime map in case old map
  x86/efi: earlyprintk=efi,keep fix
2014-06-07 14:50:38 -07:00
Matt Fleming 745c51673e x86/boot: EFI_MIXED should not prohibit loading above 4G
commit 7d453eee36 ("x86/efi: Wire up CONFIG_EFI_MIXED") introduced a
regression for the functionality to load kernels above 4G. The relevant
(incorrect) reasoning behind this change can be seen in the commit
message,

  "The xloadflags field in the bzImage header is also updated to reflect
  that the kernel supports both entry points by setting both of
  XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
  XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
  guaranteed to be addressable with 32-bits."

This is obviously bogus since 32-bit EFI loaders will never place the
kernel above the 4G mark. So this restriction is entirely unnecessary.

But things are worse than that - since we want to encourage people to
always compile with CONFIG_EFI_MIXED=y so that their kernels work out of
the box for both 32-bit and 64-bit firmware, commit 7d453eee36
effectively disables XLF_CAN_BE_LOADED_ABOVE_4G completely.

Remove the overzealous and superfluous restriction and restore the
XLF_CAN_BE_LOADED_ABOVE_4G functionality.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1402140380-15377-1-git-send-email-matt@console-pimps.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-06-07 09:31:00 -07:00
Naoya Horiguchi d4c54919ed mm: add !pte_present() check on existing hugetlb_entry callbacks
The age table walker doesn't check non-present hugetlb entry in common
path, so hugetlb_entry() callbacks must check it.  The reason for this
behavior is that some callers want to handle it in its own way.

[ I think that reason is bogus, btw - it should just do what the regular
  code does, which is to call the "pte_hole()" function for such hugetlb
  entries  - Linus]

However, some callers don't check it now, which causes unpredictable
result, for example when we have a race between migrating hugepage and
reading /proc/pid/numa_maps.  This patch fixes it by adding !pte_present
checks on buggy callbacks.

This bug exists for years and got visible by introducing hugepage
migration.

ChangeLog v2:
- fix if condition (check !pte_present() instead of pte_present())

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Backported to 3.15.  Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org> ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 13:21:16 -07:00
Linus Torvalds d54d14bfb4 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Four misc fixes: each was deemed serious enough to warrant v3.15
  inclusion"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix tg_set_cfs_bandwidth() deadlock on rq->lock
  sched/dl: Fix race in dl_task_timer()
  sched: Fix sched_policy < 0 comparison
  sched/numa: Fix use of spin_{un}lock_irq() when interrupts are disabled
2014-06-06 09:53:32 -07:00
Andrey Ryabinin 624483f3ea mm: rmap: fix use-after-free in __put_anon_vma
While working address sanitizer for kernel I've discovered
use-after-free bug in __put_anon_vma.

For the last anon_vma, anon_vma->root freed before child anon_vma.
Later in anon_vma_free(anon_vma) we are referencing to already freed
anon_vma->root to check rwsem.

This fixes it by freeing the child anon_vma before freeing
anon_vma->root.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 08:53:41 -07:00
H. Peter Anvin 177875423e * Fix earlyprintk=efi,keep support by switching to an ioremap() mapping
of the framebuffer when early_ioremap() is no longer available and
    dropping __init from functions that may be invoked after
    free_initmem() - Dave Young
 
  * We shouldn't be exporting the EFI runtime map in sysfs if not using
    the new 1:1 EFI mapping code since in that case the mappings are not
    static across a kexec reboot - Dave Young
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTj0xzAAoJEC84WcCNIz1VgBgP/3rzGcVB6JBT5qAy2SCcKkL1
 zKAvMNiFWpfyOwICE1gbfZzDky9DIVRXpzriLIe27aQzgn/GYBLH9hAIsYO/5W1m
 uw/KRMKbUkvfnUKPxZUMe95OMsXqwhYth7oGJXq1yFa6SsQdCU4jnmMflUQ6IWfy
 sXtQzlV7frFO0W55Gxv4jRB/nxrExtTHm9Fc01H9xvguKO2l0KEh0C9PbyfZLQp0
 FffxGhcVSZXG23fB5eU0yKLV69buDk83O1y4lQ79zETvCTjuzbpGe2agFDxrlJ/u
 wxZROMStvthkHqapheGwwBu4SySoKsuNRQHKxeJ2t3XEVSBuW+c9jiYYMmpDS0t6
 LhIdLdDtBk8ItsTY60r6w86oLvQVTnkwLxwX/XdzUyA/bSzf/EzHENw2iPbJhpjn
 qfq2ZonXD8j/EVPyYXk65wUea4tMmG1QYpi9oPqMp3b4hCgoJr7qosOqvT9v+E3/
 bbGTjr6bgzOaXdctOocmEg0qfhMw/Ol63V9P4wsiN5mUq1nRJnJcKQtqA3Do08IP
 RgZJg8yQAe9FAfZ0RDaRKs5OSPgnC42t/d6pr9JdiFzgR5Ktw7BymfmOu52o7Koq
 EJFf1TlD+Kipdyo/1Q4s9vtZ1ijO5If6FSL9rYJoduWMWCPSZ9by43uCIbEobEQh
 aXD9ZvSyeu5IwD4BoBiO
 =7zSB
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * Fix earlyprintk=efi,keep support by switching to an ioremap() mapping
   of the framebuffer when early_ioremap() is no longer available and
   dropping __init from functions that may be invoked after
   free_initmem() - Dave Young

 * We shouldn't be exporting the EFI runtime map in sysfs if not using
   the new 1:1 EFI mapping code since in that case the mappings are not
   static across a kexec reboot - Dave Young

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-06-05 13:09:44 -07:00
Linus Torvalds 951e273060 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two last minute tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf probe: Fix perf probe to find correct variable DIE
  perf probe: Fix a segfault if asked for variable it doesn't find
2014-06-05 12:51:05 -07:00
Linus Torvalds 1c5aefb5b1 Merge branch 'futex-fixes' (futex fixes from Thomas Gleixner)
Merge futex fixes from Thomas Gleixner:
 "So with more awake and less futex wreckaged brain, I went through my
  list of points again and came up with the following 4 patches.

  1) Prevent pi requeueing on the same futex

     I kept Kees check for uaddr1 == uaddr2 as a early check for private
     futexes and added a key comparison to both futex_requeue and
     futex_wait_requeue_pi.

     Sebastian, sorry for the confusion yesterday night.  I really
     misunderstood your question.

     You are right the check is pointless for shared futexes where the
     same physical address is mapped to two different virtual addresses.

  2) Sanity check atomic acquisiton in futex_lock_pi_atomic

     That's basically what Darren suggested.

     I just simplified it to use futex_top_waiter() to find kernel
     internal state.  If state is found return -EINVAL and do not bother
     to fix up the user space variable.  It's corrupted already.

  3) Ensure state consistency in futex_unlock_pi

     The code is silly versus the owner died bit.  There is no point to
     preserve it on unlock when the user space thread owns the futex.

     What's worse is that it does not update the user space value when
     the owner died bit is set.  So the kernel itself creates observable
     inconsistency.

     Another "optimization" is to retry an atomic unlock.  That's
     pointless as in a sane environment user space would not call into
     that code if it could have unlocked it atomically.  So we always
     check whether there is kernel state around and only if there is
     none, we do the unlock by setting the user space value to 0.

  4) Sanitize lookup_pi_state

     lookup_pi_state is ambigous about TID == 0 in the user space value.

     This can be a valid state even if there is kernel state on this
     uaddr, but we miss a few corner case checks.

     I tried to come up with a smaller solution hacking the checks into
     the current cruft, but it turned out to be ugly as hell and I got
     more confused than I was before.  So I rewrote the sanity checks
     along the state documentation with awful lots of commentry"

* emailed patches from Thomas Gleixner <tglx@linutronix.de>:
  futex: Make lookup_pi_state more robust
  futex: Always cleanup owner tid in unlock_pi
  futex: Validate atomic acquisition in futex_lock_pi_atomic()
  futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)
2014-06-05 12:31:32 -07:00
Thomas Gleixner 54a217887a futex: Make lookup_pi_state more robust
The current implementation of lookup_pi_state has ambigous handling of
the TID value 0 in the user space futex.  We can get into the kernel
even if the TID value is 0, because either there is a stale waiters bit
or the owner died bit is set or we are called from the requeue_pi path
or from user space just for fun.

The current code avoids an explicit sanity check for pid = 0 in case
that kernel internal state (waiters) are found for the user space
address.  This can lead to state leakage and worse under some
circumstances.

Handle the cases explicit:

       Waiter | pi_state | pi->owner | uTID      | uODIED | ?

  [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
  [2]  NULL   | ---      | ---       | >0        | 0/1    | Valid

  [3]  Found  | NULL     | --        | Any       | 0/1    | Invalid

  [4]  Found  | Found    | NULL      | 0         | 1      | Valid
  [5]  Found  | Found    | NULL      | >0        | 1      | Invalid

  [6]  Found  | Found    | task      | 0         | 1      | Valid

  [7]  Found  | Found    | NULL      | Any       | 0      | Invalid

  [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
  [9]  Found  | Found    | task      | 0         | 0      | Invalid
  [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid

 [1] Indicates that the kernel can acquire the futex atomically. We
     came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.

 [2] Valid, if TID does not belong to a kernel thread. If no matching
     thread is found then it indicates that the owner TID has died.

 [3] Invalid. The waiter is queued on a non PI futex

 [4] Valid state after exit_robust_list(), which sets the user space
     value to FUTEX_WAITERS | FUTEX_OWNER_DIED.

 [5] The user space value got manipulated between exit_robust_list()
     and exit_pi_state_list()

 [6] Valid state after exit_pi_state_list() which sets the new owner in
     the pi_state but cannot access the user space value.

 [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.

 [8] Owner and user space value match

 [9] There is no transient state which sets the user space TID to 0
     except exit_robust_list(), but this is indicated by the
     FUTEX_OWNER_DIED bit. See [4]

[10] There is no transient state which leaves owner and user space
     TID out of sync.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner 13fbca4c6e futex: Always cleanup owner tid in unlock_pi
If the owner died bit is set at futex_unlock_pi, we currently do not
cleanup the user space futex.  So the owner TID of the current owner
(the unlocker) persists.  That's observable inconsistant state,
especially when the ownership of the pi state got transferred.

Clean it up unconditionally.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner b3eaa9fc5c futex: Validate atomic acquisition in futex_lock_pi_atomic()
We need to protect the atomic acquisition in the kernel against rogue
user space which sets the user space futex to 0, so the kernel side
acquisition succeeds while there is existing state in the kernel
associated to the real owner.

Verify whether the futex has waiters associated with kernel state.  If
it has, return -EINVAL.  The state is corrupted already, so no point in
cleaning it up.  Subsequent calls will fail as well.  Not our problem.

[ tglx: Use futex_top_waiter() and explain why we do not need to try
  	restoring the already corrupted user space state. ]

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner e9c243a5a6 futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)
If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call.  If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.

This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")

[ tglx: Compare the resulting keys as well, as uaddrs might be
  	different depending on the mapping ]

Fixes CVE-2014-3153.

Reported-by: Pinkie Pie
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Igor Mammedov 3e1a878b7c x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
Hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It reproducible
more often if host is over-committed).

It happens because master CPU gives up waiting on
secondary CPU and allows it to run wild. As result
AP causes locking or crashing system. For example
as described here:

   https://lkml.org/lkml/2014/3/6/257

If master CPU have sent STARTUP IPI successfully,
and AP signalled to master CPU that it's ready
to start initialization, make master CPU wait
indefinitely till AP is onlined.
To ensure that AP won't ever run wild, make it
wait at early startup till master CPU confirms its
intention to wait for AP. If AP doesn't respond in 10
seconds, the master CPU will timeout and cancel
AP onlining.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-4-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 16:33:08 +02:00
Igor Mammedov feef1e8ecb x86/smpboot: Log error on secondary CPU wakeup failure at ERR level
If system is running without debug level logging,
it will not log error if do_boot_cpu() failed to
wakeup AP. It may lead to silent AP bringup
failures at boot time.
Change message level to KERN_ERR to make error
visible to user as it's done on other architectures.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-3-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 16:33:07 +02:00
Igor Mammedov 89f898c1e1 x86: Fix list/memory corruption on CPU hotplug
currently if AP wake up is failed, master CPU marks AP as not
present in do_boot_cpu() by calling set_cpu_present(cpu, false).
That leads to following list corruption on the next physical CPU
hotplug:

[  418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
[  418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
[  418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
[  418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
[  418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[  418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[  418.166433]  0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
[  418.176460]  ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
[  418.177453]  ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
[  418.178445] Call Trace:
[  418.185811]  [<ffffffff8159b22d>] dump_stack+0x49/0x5c
[  418.186440]  [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0
[  418.187192]  [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50
[  418.191231]  [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7
[  418.193889]  [<ffffffff812f796e>] __list_add+0xbe/0xd0
[  418.196649]  [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200
[  418.208610]  [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60
[  418.213831]  [<ffffffff812e2ef4>] kobject_add+0x44/0x70
[  418.229961]  [<ffffffff813e2c60>] device_add+0xd0/0x550
[  418.234991]  [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0
[  418.250226]  [<ffffffff813e32be>] device_register+0x1e/0x30
[  418.255296]  [<ffffffff813e82a3>] register_cpu+0xe3/0x130
[  418.266539]  [<ffffffff81592be5>] arch_register_cpu+0x65/0x150
[  418.285845]  [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b
...
Which is caused by the fact that generic_processor_info() allocates
logical CPU id by calling:

 cpu = cpumask_next_zero(-1, cpu_present_mask);

which returns id of previously failed to wake up CPU, since its
bit is cleared by do_boot_cpu() and as result register_cpu()
tries to register another CPU with the same id as already
present but failed to be onlined CPU.

Taking in account that AP will not do anything if master CPU
failed to wake it up, there is no reason to mark that AP as not
present and break next cpu hotplug attempts. As a side effect of
not marking AP as not present, user would be allowed to online
it again later.

Also fix memory corruption in acpi_unmap_lsapic()

if during CPU hotplug master CPU failed to wake up AP
it set percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for AP.

However following attempt to unplug that CPU will lead to
out of bound write access to __apicid_to_node[] which is
32768 items long on x86_64 kernel.

So with above fix of cpu_present_mask make sure that a present
CPU has a valid APIC ID by not setting x86_cpu_to_apicid
to BAD_APICID in do_boot_cpu() on failure and allow
acpi_processor_remove()->acpi_unmap_lsapic() cleanly remove CPU.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 16:33:07 +02:00
Roman Gushchin 09dc4ab039 sched/fair: Fix tg_set_cfs_bandwidth() deadlock on rq->lock
tg_set_cfs_bandwidth() sets cfs_b->timer_active to 0 to
force the period timer restart. It's not safe, because
can lead to deadlock, described in commit 927b54fccbf0:
"__start_cfs_bandwidth calls hrtimer_cancel while holding rq->lock,
waiting for the hrtimer to finish. However, if sched_cfs_period_timer
runs for another loop iteration, the hrtimer can attempt to take
rq->lock, resulting in deadlock."

Three CPUs must be involved:

  CPU0               CPU1                         CPU2
  take rq->lock      period timer fired
  ...                take cfs_b lock
  ...                ...                          tg_set_cfs_bandwidth()
  throttle_cfs_rq()  release cfs_b lock           take cfs_b lock
  ...                distribute_cfs_runtime()     timer_active = 0
  take cfs_b->lock   wait for rq->lock            ...
  __start_cfs_bandwidth()
  {wait for timer callback
   break if timer_active == 1}

So, CPU0 and CPU1 are deadlocked.

Instead of resetting cfs_b->timer_active, tg_set_cfs_bandwidth can
wait for period timer callbacks (ignoring cfs_b->timer_active) and
restart the timer explicitly.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/87wqdi9g8e.wl\%klamm@yandex-team.ru
Cc: pjt@google.com
Cc: chris.j.arges@canonical.com
Cc: gregkh@linuxfoundation.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:51:34 +02:00
Kirill Tkhai 0f397f2c90 sched/dl: Fix race in dl_task_timer()
Throttled task is still on rq, and it may be moved to other cpu
if user is playing with sched_setaffinity(). Therefore, unlocked
task_rq() access makes the race.

Juri Lelli reports he got this race when dl_bandwidth_enabled()
was not set.

Other thing, pointed by Peter Zijlstra:

   "Now I suppose the problem can still actually happen when
    you change the root domain and trigger a effective affinity
    change that way".

To fix that we do the same as made in __task_rq_lock(). We do not
use __task_rq_lock() itself, because it has a useful lockdep check,
which is not correct in case of dl_task_timer(). We do not need
pi_lock locked here. This case is an exception (PeterZ):

   "The only reason we don't strictly need ->pi_lock now is because
    we're guaranteed to have p->state == TASK_RUNNING here and are
    thus free of ttwu races".

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.14+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/3056991400578422@web14g.yandex.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:51:12 +02:00
Richard Weinberger b14ed2c273 sched: Fix sched_policy < 0 comparison
attr.sched_policy is u32, therefore a comparison against < 0 is never true.
Fix this by casting sched_policy to int.

This issue was reported by coverity CID 1219934.

Fixes: dbdb22754f ("sched: Disallow sched_attr::sched_policy < 0")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1401741514-7045-1-git-send-email-richard@nod.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:07:43 +02:00
Steven Rostedt e9dd685ce8 sched/numa: Fix use of spin_{un}lock_irq() when interrupts are disabled
As Peter Zijlstra told me, we have the following path:

do_exit()
  exit_itimers()
    itimer_delete()
      spin_lock_irqsave(&timer->it_lock, &flags);
      timer_delete_hook(timer);
        kc->timer_del(timer) := posix_cpu_timer_del()
          put_task_struct()
            __put_task_struct()
              task_numa_free()
                spin_lock(&grp->lock);

Which means that task_numa_free() can be called with interrupts
disabled, which means that we should not be using spin_lock_irq() but
spin_lock_irqsave() instead. Otherwise we are enabling interrupts while
holding an interrupt unsafe lock!

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner<tglx@linutronix.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140527182541.GH11096@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 11:07:41 +02:00
Ingo Molnar 22c91aa235 perf/urgent fixes:
. Fix perf probe to find correct variable DIE (Masami Hiramatsu)
 
 . Fix a segfault in perf probe if asked for variable it doesn't find (Masami Hiramatsu)
 
 Signed-off-by: Jiri Olsa <jolsa@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTjxatAAoJEPZqUSBWB3s9TwMQAJGA0B+NYny3LyCZyQm2hfbf
 eI/bgzZ757JT2/ughf7ccXRBXMlcfhYFh8tIkgR0/Ky9qSlwNt5yA+d7BfxI/hyW
 +TjW9JpxM3pudOzLK25C1Z4g4s2C+E5qPR+IgK3xoHhaEwSNc47SZpK1A9PqdxXo
 hseR7JFFTcaO9xZdFmwjMGbDeNlQ7Juq4EzwKlstuGxL5XkLRHkBXZyUgsOwNP2D
 tiUGbEHFJtVmrCqRpZ0yALAxjTWRPxMhXTTGePvS58sS6bWukG0BkL/0rlFBM0r0
 ro0bsXxZw6JgbPTT7W0iHHCjiMTOXOXo4Eit8hwHFWn9oLubU2DqhknTucX8G0PX
 7dM0sNEgC3VmY3bueqYUEAuqrDN9c+XZYg5nOuqHx2x8lqQyXqfLX67Qf05I5ZMs
 SlAKTcA70ueVvAZh0XoK5QvtbjmSJpWnznHsRbe6qIWlYrTMp9UiGEmnROIMpZ1W
 IZf2rRPVT1Z3Wkhp7LGcoDHOiJkNRZw/8v0Xn7Cl2DvZfFgxG2qemfW966Jzep02
 OOmZyklu6MIsqo0ZbCmA0gDDNXHWWmxdKFqvTfkn8LlFfeQjKPj2/JiEWGsuWZVQ
 2ZVqu5OYGoSj9dU32fKT/o1Kt5JLxHctaQUb1jrhfnB75f6jNbJhgV1eQFZ/+YF6
 JudOUvZ9rlabrV4yY9x8
 =xQAY
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf into perf/urgent

Pull perf/urgent fixes from Jiri Olsa:

 * Fix perf probe to find correct variable DIE (Masami Hiramatsu)

 * Fix a segfault in perf probe if asked for variable it doesn't find (Masami Hiramatsu)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-05 09:54:01 +02:00
Linus Torvalds 54539cd217 Merge branch 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu fix from Tejun Heo:
 "It is very late but this is an important percpu-refcount fix from
  Sebastian Ott.

  The problem is that percpu_ref_*() used __this_cpu_*() instead of
  this_cpu_*().  The difference between the two is that the latter is
  atomic on the local cpu while the former is not.  this_cpu_inc() is
  guaranteed to increment the percpu counter on the cpu that the
  operation is executed on without any synchronization; however,
  __this_cpu_inc() doesn't and if the local cpu invokes the function
  from different contexts (e.g.  process and irq) of the same CPU, it's
  not guaranteed to actually increment as it may be implemented as rmw.

  This bug existed from the get-go but it hasn't been noticed earlier
  probably because on x86 __this_cpu_inc() is equivalent to
  this_cpu_inc() as both get translated into single instruction;
  however, s390 uses the generic rmw implementation and gets affected by
  the bug.  Kudos to Sebastian and Heiko for diagnosing it.

  The change is very low risk and fixes a critical issue on the affected
  architectures, so I think it's a good candidate for inclusion although
  it's very late in the devel cycle.  On the other hand, this has been
  broken since v3.11, so backporting it through -stable post -rc1 won't
  be the end of the world.

  I'll ping Christoph whether __this_cpu_*() ops can be better annotated
  so that it can trigger lockdep warning when used from multiple
  contexts"

* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu-refcount: fix usage of this_cpu_ops
2014-06-04 09:56:03 -07:00
Sebastian Ott 0c36b390a5 percpu-refcount: fix usage of this_cpu_ops
The percpu-refcount infrastructure uses the underscore variants of
this_cpu_ops in order to modify percpu reference counters.
(e.g. __this_cpu_inc()).

However the underscore variants do not atomically update the percpu
variable, instead they may be implemented using read-modify-write
semantics (more than one instruction).  Therefore it is only safe to
use the underscore variant if the context is always the same (process,
softirq, or hardirq). Otherwise it is possible to lose updates.

This problem is something that Sebastian has seen within the aio
subsystem which uses percpu refcounters both in process and softirq
context leading to reference counts that never dropped to zeroes; even
though the number of "get" and "put" calls matched.

Fix this by using the non-underscore this_cpu_ops variant which
provides correct per cpu atomic semantics and fixes the corrupted
reference counts.

Cc: Kent Overstreet <kmo@daterainc.com>
Cc: <stable@vger.kernel.org> # v3.11+
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
References: http://lkml.kernel.org/g/alpine.LFD.2.11.1406041540520.21183@denkbrett
2014-06-04 12:12:29 -04:00
Linus Torvalds c717d15614 Final power management fixes for 3.15
- Taking non-idle time into account when calculating core busy
    time was a mistake and led to a performance regression.  Since
    the problem it was supposed to address is now taken care of in
    a different way, we don't need to do it any more, so drop the
    non-idle time tracking from intel_pstate.  Dirk Brandewie.
 
  - Changing to fixed point math throughout the busy calculation
    introduced rounding errors that adversely affect the accuracy
    of intel_pstate's computations.  Fix from Dirk Brandewie.
 
  - The PID controller algorithm used by intel_pstate assumes that
    the time interval between two adjacent samples will always be the
    same which is not the case for deferable timers (used by
    intel_pstate) when the system is idle. This leads to inaccurate
    predictions and artificially increases convergence times for
    the minimum P-state.  Fix from Dirk Brandewie.
 
  - intel_pstate carries out computations using 32-bit variables
    that may overflow for large enough values of APERF/MPERF.  Switch
    to using 64-bit variables for computations, from Doug Smythies.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJTjjxqAAoJEILEb/54YlRxyxYP/RbWoU3ueLJnPuWWfRmdyW++
 ebQGku6nVjRheDJxKK/bE5XIvZVx1rk8XPrzhmAI4iWZ8KVwRwezKL4rwaLS4TNo
 Q2AuG7nHWjsTdvZH7NhYvBNIxRCPkdxI4GyHeJvuYu+SrphgwgcQ3xW8I9re+c8Q
 afy3PK6bfFyPmx/IGL41AD0Tmh7edWpkGIGizI9QYsATn6IzbjNj17IBjLgpUf9s
 yyj5OgU0T9J7B/sHHyDgmto0cniQdKgs8mvFLNzfHoytG/H1MCIII4+v1DZJvL4Y
 L6cx71jrS+OrbBhJi9Z3n2m09LuA9/cxAGp1ojVDQ3TFZF7NQ+ruGvjDtLDgnqJK
 crckpNQP1umL+maWnKbP2//IxvUo8bJi0g0GgOeIO8Ju9hf2oqCRDHR2L6cPJ5c5
 DDbN+MmcRTdynXaTE0nMqwsR+ZsKyIbe9vx02roQUbvGlBNH35zbHsh7rsT4O0Cr
 XpZET80G8WtggqZKTBj08A1o31rTaGXIu4uGsN4cFO4dNrmTDWsguJg5tB7fMpCH
 8rMDo8h+Q2V+h+TWGkhqDxZnChik5jNWJY2lBnhyh88o1Nx5zLhnEAgSddQVnzTN
 as4QDSuj2D7wU7UBDqZO9GV9MRtyYSMk/lsAx/lbIvryY6wZYZSiDeWIu82jcdeb
 iO1WGBlQJHIkng6OZz7e
 =YT7e
 -----END PGP SIGNATURE-----

Merge tag 'pm-3.15-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull intel pstate fixes from Rafael Wysocki:
 "Final power management fixes for 3.15

   - Taking non-idle time into account when calculating core busy time
     was a mistake and led to a performance regression.  Since the
     problem it was supposed to address is now taken care of in a
     different way, we don't need to do it any more, so drop the
     non-idle time tracking from intel_pstate.  Dirk Brandewie.

   - Changing to fixed point math throughout the busy calculation
     introduced rounding errors that adversely affect the accuracy of
     intel_pstate's computations.  Fix from Dirk Brandewie.

   - The PID controller algorithm used by intel_pstate assumes that the
     time interval between two adjacent samples will always be the same
     which is not the case for deferable timers (used by intel_pstate)
     when the system is idle.  This leads to inaccurate predictions and
     artificially increases convergence times for the minimum P-state.
     Fix from Dirk Brandewie.

   - intel_pstate carries out computations using 32-bit variables that
     may overflow for large enough values of APERF/MPERF.  Switch to
     using 64-bit variables for computations, from Doug Smythies"

* tag 'pm-3.15-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  intel_pstate: Improve initial busy calculation
  intel_pstate: add sample time scaling
  intel_pstate: Correct rounding in busy calculation
  intel_pstate: Remove C0 tracking
2014-06-04 07:48:54 -07:00
Linus Torvalds 9e9a928eed Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "All fairly small: radeon stability and a panic path fix.

  Mostly radeon fixes, suspend/resume fix, stability on the CIK
  chipsets, along with a locking check avoidance patch for panic times
  regression"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon: use the CP DMA on CIK
  drm/radeon: sync page table updates
  drm/radeon: fix vm buffer size estimation
  drm/crtc-helper: skip locking checks in panicking path
  drm/radeon/dpm: resume fixes for some systems
2014-06-04 07:48:01 -07:00
Masami Hiramatsu 082f96a93e perf probe: Fix perf probe to find correct variable DIE
Fix perf probe to find correct variable DIE which has location or
external instance by tracking down the lexical blocks.

Current die_find_variable() expects that the all variable DIEs
which has DW_TAG_variable have a location. However, since recent
dwarf information may have declaration variable DIEs at the
entry of function (subprogram), die_find_variable() returns it.

To solve this problem, it must track down the DIE tree to find
a DIE which has an actual location or a reference for external
instance.

e.g. finding a DIE which origin is <0xdc73>;

 <1><11496>: Abbrev Number: 95 (DW_TAG_subprogram)
    <11497>   DW_AT_abstract_origin: <0xdc42>
    <1149b>   DW_AT_low_pc      : 0x1850
[...]
 <2><114cc>: Abbrev Number: 119 (DW_TAG_variable) <- this is a declaration
    <114cd>   DW_AT_abstract_origin: <0xdc73>
 <2><114d1>: Abbrev Number: 119 (DW_TAG_variable)
[...]
 <3><115a7>: Abbrev Number: 105 (DW_TAG_lexical_block)
    <115a8>   DW_AT_ranges      : 0xaa0
 <4><115ac>: Abbrev Number: 96 (DW_TAG_variable) <- this has a location
    <115ad>   DW_AT_abstract_origin: <0xdc73>
    <115b1>   DW_AT_location    : 0x486c        (location list)

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20140529121930.30879.87092.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
2014-06-04 14:49:20 +02:00
Masami Hiramatsu 0c188a07b6 perf probe: Fix a segfault if asked for variable it doesn't find
Fix a segfault bug by asking for variable it doesn't find.
Since the convert_variable() didn't handle error code returned
from convert_variable_location(), it just passed an incomplete
variable field and then a segfault was occurred when formatting
the field.

This fixes that bug by handling success code correctly in
convert_variable(). Other callers of convert_variable_location()
are correctly checking the return code.

This bug was introduced by following commit. But another hidden
erroneous error handling has been there previously (-ENOMEM case).

 commit 3d918a12a1

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20140529105232.28251.30447.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
2014-06-04 14:48:03 +02:00
Yinghai Lu ac2a55395e x86: irq: Get correct available vectors for cpu disable
check_irq_vectors_for_cpu_disable() can overestimate the number of
available interrupt vectors, so the check for cpu down succeeds, but
the actual cpu removal fails.

It iterates from FIRST_EXTERNAL_VECTOR to NR_VECTORS, which is wrong
because the systems vectors are not taken into account.

Limit the search to first_system_vector instead of NR_VECTORS.

The second indicator for vector availability the used_vectors bitmap
is not taken into account at all. So system vectors,
e.g. IA32_SYSCALL_VECTOR (0x80) and IRQ_MOVE_CLEANUP_VECTOR (0x20),
are accounted as available.

Add a check for the used_vectors bitmap and do not account vectors
which are marked there.

[ tglx: Simplified code. Rewrote changelog and code comments. ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Elliott, Robert (Server Storage)" <Elliott@hp.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1400160305-17774-2-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-04 14:18:34 +02:00
Dave Airlie 0a4ae727d6 Merge branch 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux into drm-fixes
The first one is a one liner fixing a stupid typo in the VM handling code and is only relevant if play with one of the VM defines.

The other two switches CIK to use the CPDMA instead of the SDMA for buffer moves, as it turned out the SDMA is still sometimes not 100% reliable.

* 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux:
  drm/radeon: use the CP DMA on CIK
  drm/radeon: sync page table updates
  drm/radeon: fix vm buffer size estimation
2014-06-04 13:29:13 +10:00
Linus Torvalds d2cfd31050 sound fixes for 3.15-final
A few addition of HD-audio fixups for ALC260 and AD1986A codecs.
 All marked as stable fixes.
 
 The fixes are pretty local and they are old machines, so quite safe
 to apply.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTjZODAAoJEGwxgFQ9KSmk6VIQAJSilHMZ2aHoY7ecRG00SU/n
 aM4j0vbPc8po/u1OUiTy6Uz0wB8PioLvRdy0B8wd6ydxn0csyqymI7XNlwewg393
 lTyZCzsDbrb82cxWdsSs3BfS9sNtIt7A2OUxfGvRhObRp4oaUZu9blGazhafELO6
 wSQyItdv87MT+d17htndmfv0VyDpxpkxYT37CUNwTbE5xRvoz5u2L90OORh5Ki4D
 65pJ/kNqhjhhmftKfacp52pUxeYhLqqAklcsSHP3VzAChKQ2qcI9SPzqX9J9ieE4
 mBsTry/5/DKaps9DqOztK1XYT16jrhMQSKB0eN6Y0P2zaMW9nlQ4AykT7ZcUQsYx
 4Po/GeGIOkT0IhZP4ApjDeitjfDlhlpdN5RY1p3y7gUl+Mkr/Wx1v6NkW9zT2EpF
 GQZao2eeUZCF3S+rqLotlXyxDw9vFHc8KGjZDWYBDMVCN13TBoxNpDXRyyIz+78M
 3j7xCpXrJxc/ybZWJTOYUwcYeQpEG8fIgtAltDMZVYQTbHnevBRxmuFeFiTkQjND
 cSHFb1dUW92OQAYjZ30YIluxq1T3iLf6yaE/C2UKDIEoqike57Zplp5534Q6E7bL
 bh68ZOZOpkGZStYdKWrOJbvmXehanCjiwoIpReFjoz6f7S9ISF58m3bXev0w3qzK
 lCohAX+VDEfaNq+iKb8a
 =BSlJ
 -----END PGP SIGNATURE-----

Merge tag 'sound-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A few addition of HD-audio fixups for ALC260 and AD1986A codecs.  All
  marked as stable fixes.

  The fixes are pretty local and they are old machines, so quite safe to
  apply"

* tag 'sound-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek - Fix COEF widget NID for ALC260 replacer fixup
  ALSA: hda/realtek - Correction of fixup codes for PB V7900 laptop
  ALSA: hda/analog - Fix silent output on ASUS A8JN
2014-06-03 12:07:30 -07:00
Jianyu Zhan c9482a5bdc kernfs: move the last knowledge of sysfs out from kernfs
There is still one residue of sysfs remaining: the sb_magic
SYSFS_MAGIC. However this should be kernfs user specific,
so this patch moves it out. Kerrnfs user should specify their
magic number while mouting.

Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-03 08:11:18 -07:00
Linus Torvalds cae61ba37b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Unbreak zebra and other netlink apps, from Eric W Biederman.

 2) Some new qmi_wwan device IDs, from Aleksander Morgado.

 3) Fix info leak in DCB netlink handler of qlcnic driver, from Dan
    Carpenter.

 4) inet_getid() and ipv6_select_ident() do not generate monotonically
    increasing ID numbers, fix from Eric Dumazet.

 5) Fix memory leak in __sk_prepare_filter(), from Leon Yu.

 6) Netlink leftover bytes warning message is user triggerable, rate
    limit it.  From Michal Schmidt.

 7) Fix non-linear SKB panic in ipvs, from Peter Christensen.

 8) Congestion window undo needs to be performed even if only never
    retransmitted data is SACK'd, fix from Yuching Cheng.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
  net: filter: fix possible memory leak in __sk_prepare_filter()
  net: ec_bhf: Add runtime dependencies
  tcp: fix cwnd undo on DSACK in F-RTO
  netlink: Only check file credentials for implicit destinations
  ipheth: Add support for iPad 2 and iPad 3
  team: fix mtu setting
  net: fix inet_getid() and ipv6_select_ident() bugs
  net: qmi_wwan: interface #11 in Sierra Wireless MC73xx is not QMI
  net: qmi_wwan: add additional Sierra Wireless QMI devices
  bridge: Prevent insertion of FDB entry with disallowed vlan
  netlink: rate-limit leftover bytes warning and print process name
  bridge: notify user space after fdb update
  net: qmi_wwan: add Netgear AirCard 341U
  net: fix wrong mac_len calculation for vlans
  batman-adv: fix NULL pointer dereferences
  net/mlx4_core: Reset RoCE VF gids when guest driver goes down
  emac: aggregation of v1-2 PLB errors for IER register
  emac: add missing support of 10mbit in emac/rgmii
  can: only rename enabled led triggers when changing the netdev name
  ipvs: Fix panic due to non-linear skb
  ...
2014-06-02 18:16:41 -07:00
Leon Yu 418c96ac15 net: filter: fix possible memory leak in __sk_prepare_filter()
__sk_prepare_filter() was reworked in commit bd4cf0ed3 (net: filter:
rework/optimize internal BPF interpreter's instruction set) so that it should
have uncharged memory once things went wrong. However that work isn't complete.
Error is handled only in __sk_migrate_filter() while memory can still leak in
the error path right after sk_chk_filter().

Fixes: bd4cf0ed33 ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 17:49:45 -07:00
Linus Torvalds ca755175f2 Two md bugfixes for possible corruption when restarting reshape
If a raid5/6 reshape is restarted (After stopping and re-assembling
 the array) and the array is marked read-only (or read-auto), then
 the reshape will appear to complete immediately, without actually
 moving anything around.  This can result in corruption.
 
 There are two patches which do much the same thing in different places.
 They are separate because one is an older bug and so can be applied to
 more -stable kernels.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIUAwUAU40KTDnsnt1WYoG5AQJcaA/1GkoZit6LqLjiIQmsK9Ci/4TI+sNqYaSB
 9SleSjWt+bcNCRY4sS3Wv0H580LmkoRR24wdei+mukoFa+bpBBs6PodPMABAVsnL
 VxlnUX+P4Ef77s2zJ8B5wCY3ftmecaQL3TdZf10+hIITacXSp7JmsLJXm3DW+Jvq
 DZsxJRBEQfsz5obZAZXnvPAcTSkqMT4QQ13nIEmaYEz+AYVn6Tcf8xwDBOcZM4u9
 Gdi6BHNaY6RjSU1gsVblPYmWQyqqdgCJ6UEV/KYyY9rtFyozkvJ0SDWcu/kRA74A
 uydN5U6iVqJatY9l9eK2tV7GQkN+o+MWIA0JocTRZe67ihE4tWxiLRn/7fZdLVsX
 TV6zYar0M/ZSn3XioGi4hQ0tWDPpq/aCzCAk5JQpywgBmoaMqqh8rttwdCkWvK6P
 TNnaVfo3r9AMJY8MVm8in/efEhY6jUa3q2oDqCEKjuL916v9ODsxXloqTlbEy2KC
 NrKNLCZA2subbzPa3T8u4aKRBzl0xSBSig8ecrufSpDC1I0G+Mbuc8wrDzjAnI3N
 +fbQCxxRR0akcleZrFZD67avOa5/DsQqWJbcW1D5VCekJoZcgdz5CGJz/bNl+0i6
 bwrvNWi6q1X2P4Nt2BBhk771xzNiUlufsI0x7SFIJxpDiGlxINkluXvnEQKFSzhr
 uYSrvTCQwg==
 =cTEe
 -----END PGP SIGNATURE-----

Merge tag 'md/3.15-fixes' of git://neil.brown.name/md

Pull two md bugfixes from Neil Brown:
 "Two md bugfixes for possible corruption when restarting reshape

  If a raid5/6 reshape is restarted (After stopping and re-assembling
  the array) and the array is marked read-only (or read-auto), then the
  reshape will appear to complete immediately, without actually moving
  anything around.  This can result in corruption.

  There are two patches which do much the same thing in different
  places.  They are separate because one is an older bug and so can be
  applied to more -stable kernels"

* tag 'md/3.15-fixes' of git://neil.brown.name/md:
  md: always set MD_RECOVERY_INTR when interrupting a reshape thread.
  md: always set MD_RECOVERY_INTR when aborting a reshape or other "resync".
2014-06-02 17:04:37 -07:00
Jean Delvare 3aab01d800 net: ec_bhf: Add runtime dependencies
The ec_bhf driver is specific to the Beckhoff CX embedded PC series.
These are based on Intel x86 CPU. So we can add a dependency on
X86, with COMPILE_TEST as an alternative to still allow for broader
build-testing.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Darek Marcinkiewicz <reksio@newterm.pl>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 17:02:28 -07:00
Martin K. Petersen 3b8d2676d1 libata: Blacklist queued trim for Crucial M500
Queued trim only works for some users with MU05 firmware.  Revert to
blacklisting all firmware versions.

Introduced by commit d121f7d0cb ("libata: Update queued trim blacklist
for M5x0 drives") which this effectively reverts, while retaining the
blacklisting of M550.

See

    https://bugzilla.kernel.org/show_bug.cgi?id=71371

for reports of trouble with MU05 firmware.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-02 16:59:25 -07:00
Linus Torvalds 92b4e11315 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Peter Anvin:
 "A single quite small patch that managed to get overlooked earlier, to
  prevent a user space triggerable oops on systems without HPET"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
2014-06-02 16:57:23 -07:00
Linus Torvalds 8ee7a330fb USB fixes for 3.15-rc8
Here are some fixes for 3.15-rc8 that resolve a number of tiny USB
 issues that have been reported, and there are some new device ids as
 well.
 
 All have been tested in linux-next.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlOM9RkACgkQMUfUDdst+ykhbACeKn9kmESO0QMC2ST0kQkQxHJX
 olYAoKbYZxvlZbdSJBmtYDbm1c5wrCfO
 =H5ae
 -----END PGP SIGNATURE-----

Merge tag 'usb-3.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some fixes for 3.15-rc8 that resolve a number of tiny USB
  issues that have been reported, and there are some new device ids as
  well.

  All have been tested in linux-next"

* tag 'usb-3.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  xhci: delete endpoints from bandwidth list before freeing whole device
  usb: pci-quirks: Prevent Sony VAIO t-series from switching usb ports
  USB: cdc-wdm: properly include types.h
  usb: cdc-wdm: export cdc-wdm uapi header
  USB: serial: option: add support for Novatel E371 PCIe card
  USB: ftdi_sio: add NovaTech OrionLXm product ID
  USB: io_ti: fix firmware download on big-endian machines (part 2)
  USB: Avoid runtime suspend loops for HCDs that can't handle suspend/resume
2014-06-02 16:56:42 -07:00
Linus Torvalds da579dd6a1 Staging driver fixes for 3.15-rc8
Here are some staging driver fixes for 3.15.  3 are for the speakup
 drivers (one fix a regression caused in 3.15-rc, and the other 2 resolve
 a tty issue found by Ben Hutchings)  The comedi and r8192e_pci driver
 fixes also resolve reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlOM9eUACgkQMUfUDdst+ykETACbBTSOZVE1pAIKGjGY2bcOtQET
 yKMAnil04hAbmz5L/5TNvdkUWu+7SfGi
 =K8cr
 -----END PGP SIGNATURE-----

Merge tag 'staging-3.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Here are some staging driver fixes for 3.15.

  Three are for the speakup drivers (one fixes a regression caused in
  3.15-rc, and the other two resolve a tty issue found by Ben Hutchings)
  The comedi and r8192e_pci driver fixes also resolve reported issues"

* tag 'staging-3.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: r8192e_pci: fix htons error
  Staging: speakup: Update __speakup_paste_selection() tty (ab)usage to match vt
  Staging: speakup: Move pasting into a work item
  staging: comedi: ni_daq_700: add mux settling delay
  speakup: fix incorrect perms on speakup_acntsa.c
2014-06-02 16:55:18 -07:00
Yuchung Cheng 0cfa5c07d6 tcp: fix cwnd undo on DSACK in F-RTO
This bug is discovered by an recent F-RTO issue on tcpm list
https://www.ietf.org/mail-archive/web/tcpm/current/msg08794.html

The bug is that currently F-RTO does not use DSACK to undo cwnd in
certain cases: upon receiving an ACK after the RTO retransmission in
F-RTO, and the ACK has DSACK indicating the retransmission is spurious,
the sender only calls tcp_try_undo_loss() if some never retransmisted
data is sacked (FLAG_ORIG_DATA_SACKED).

The correct behavior is to unconditionally call tcp_try_undo_loss so
the DSACK information is used properly to undo the cwnd reduction.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 16:50:49 -07:00
Eric W. Biederman 2d7a85f4b0 netlink: Only check file credentials for implicit destinations
It was possible to get a setuid root or setcap executable to write to
it's stdout or stderr (which has been set made a netlink socket) and
inadvertently reconfigure the networking stack.

To prevent this we check that both the creator of the socket and
the currentl applications has permission to reconfigure the network
stack.

Unfortunately this breaks Zebra which always uses sendto/sendmsg
and creates it's socket without any privileges.

To keep Zebra working don't bother checking if the creator of the
socket has privilege when a destination address is specified.  Instead
rely exclusively on the privileges of the sender of the socket.

Note from Andy: This is exactly Eric's code except for some comment
clarifications and formatting fixes.  Neither I nor, I think, anyone
else is thrilled with this approach, but I'm hesitant to wait on a
better fix since 3.15 is almost here.

Note to stable maintainers: This is a mess.  An earlier series of
patches in 3.15 fix a rather serious security issue (CVE-2014-0181),
but they did so in a way that breaks Zebra.  The offending series
includes:

    commit aa4cf9452f
    Author: Eric W. Biederman <ebiederm@xmission.com>
    Date:   Wed Apr 23 14:28:03 2014 -0700

        net: Add variants of capable for use on netlink messages

If a given kernel version is missing that series of fixes, it's
probably worth backporting it and this patch.  if that series is
present, then this fix is critical if you care about Zebra.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 16:34:09 -07:00
Kristian Evensen 22fd2a52f7 ipheth: Add support for iPad 2 and iPad 3
Each iPad model has a different product id, this patch adds support for iPad 2
(pid 0x12a2) and iPad 3 (pid 0x12a6). Note that iPad 2 must be jailbroken and a
third-party app must be used for tethering to work. On iPad 3, tethering works
out of the box (assuming your ISP is nice).

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 16:11:55 -07:00
Jiri Pirko 9d0d68faea team: fix mtu setting
Now it is not possible to set mtu to team device which has a port
enslaved to it. The reason is that when team_change_mtu() calls
dev_set_mtu() for port device, notificator for NETDEV_PRECHANGEMTU
event is called and team_device_event() returns NOTIFY_BAD forbidding
the change. So fix this by returning NOTIFY_DONE here in case team is
changing mtu in team_change_mtu().

Introduced-by: 3d249d4c "net: introduce ethernet teaming device"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 14:56:01 -07:00
Eric Dumazet 39c36094d7 net: fix inet_getid() and ipv6_select_ident() bugs
I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
is disabled.
Note how GSO/TSO packets do not have monotonically incrementing ID.

06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)

It appears I introduced this bug in linux-3.1.

inet_getid() must return the old value of peer->ip_id_count,
not the new one.

Lets revert this part, and remove the prevention of
a null identification field in IPv6 Fragment Extension Header,
which is dubious and not even done properly.

Fixes: 87c48fa3b4 ("ipv6: make fragment identifications less predictable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 14:09:28 -07:00
Aleksander Morgado fc0d6e9cd0 net: qmi_wwan: interface #11 in Sierra Wireless MC73xx is not QMI
This interface is unusable, as the cdc-wdm character device doesn't reply to
any QMI command. Also, the out-of-tree Sierra Wireless GobiNet driver fully
skips it.

Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 14:00:28 -07:00
Aleksander Morgado 9a793e71eb net: qmi_wwan: add additional Sierra Wireless QMI devices
A set of new VID/PIDs retrieved from the out-of-tree GobiNet/GobiSerial
Sierra Wireless drivers.

Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 14:00:28 -07:00
Toshiaki Makita e0d7968ab6 bridge: Prevent insertion of FDB entry with disallowed vlan
br_handle_local_finish() is allowing us to insert an FDB entry with
disallowed vlan. For example, when port 1 and 2 are communicating in
vlan 10, and even if vlan 10 is disallowed on port 3, port 3 can
interfere with their communication by spoofed src mac address with
vlan id 10.

Note: Even if it is judged that a frame should not be learned, it should
not be dropped because it is destined for not forwarding layer but higher
layer. See IEEE 802.1Q-2011 8.13.10.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 13:38:23 -07:00
Michal Schmidt bfc5184b69 netlink: rate-limit leftover bytes warning and print process name
Any process is able to send netlink messages with leftover bytes.
Make the warning rate-limited to prevent too much log spam.

The warning is supposed to help find userspace bugs, so print the
triggering command name to implicate the buggy program.

[v2: Use pr_warn_ratelimited instead of printk_ratelimited.]

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 11:16:11 -07:00
Takashi Iwai 192a98e280 ALSA: hda/realtek - Fix COEF widget NID for ALC260 replacer fixup
The conversion to a fixup table for Replacer model with ALC260 in
commit 20f7d928 took the wrong widget NID for COEF setups.  Namely,
NID 0x1a should have been used instead of NID 0x20, which is the
common node for all Realtek codecs but ALC260.

Fixes: 20f7d928fa ('ALSA: hda/realtek - Replace ALC260 model=replacer with the auto-parser')
Cc: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2014-06-02 16:48:28 +02:00
Ronan Marquet e30cf2d2be ALSA: hda/realtek - Correction of fixup codes for PB V7900 laptop
Correcion of wrong fixup entries add in commit ca8f0424 to replace
static model quirk for PB V7900 laptop (will model).

[note: the removal of ALC260_FIXUP_HP_PIN_0F chain is also needed as a
 part of the fix; otherwise the pin is set up wrongly as a headphone,
 and user-space (PulseAudio) may be wrongly trying to detect the jack
 state -- tiwai]

Fixes: ca8f04247e ('ALSA: hda/realtek - Add the fixup codes for ALC260 model=will')
Signed-off-by: Ronan Marquet <ronan.marquet@orange.fr>
Cc: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2014-06-02 16:46:31 +02:00
Dave Young a3530e8fe9 x86/efi: Do not export efi runtime map in case old map
For ioremapped efi memory aka old_map the virt addresses are not persistant
across kexec reboot. kexec-tools will read the runtime maps from sysfs then
pass them to 2nd kernel and assuming kexec efi boot is ok. This will cause
kexec boot failure.

To address this issue do not export runtime maps in case efi old_map so
userspace can use no efi boot instead.

Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-06-02 12:21:59 +01:00