Commit Graph

11789 Commits

Author SHA1 Message Date
Hidehiro Kawai b279d67df8 x86/nmi: Save regs in crash dump on external NMI
Now, multiple CPUs can receive an external NMI simultaneously by
specifying the "apic_extnmi=all" command line parameter. When we take
a crash dump by using external NMI with this option, we fail to save
registers into the crash dump. This happens as follows:

  CPU 0                              CPU 1
  ================================   =============================
  receive an external NMI
  default_do_nmi()                   receive an external NMI
    spin_lock(&nmi_reason_lock)      default_do_nmi()
    io_check_error()                   spin_lock(&nmi_reason_lock)
      panic()                            busy loop
      ...
        kdump_nmi_shootdown_cpus()
          issue NMI IPI -----------> blocked until IRET
                                         busy loop...

Here, since CPU 1 is in NMI context, an additional NMI from CPU 0
remains unhandled until CPU 1 IRETs. However, CPU 1 will never execute
IRET so the NMI is not handled and the callback function to save
registers is never called.

To solve this issue, we check if the IPI for crash dumping was issued
while waiting for nmi_reason_lock to be released, and if so, call its
callback function directly. If the IPI is not issued (e.g. kdump is
disabled), the actual behavior doesn't change.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20151210065245.4587.39316.stgit@softrs
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:07:01 +01:00
Hidehiro Kawai b7c4948e98 x86/apic: Introduce apic_extnmi command line parameter
This patch introduces a command line parameter apic_extnmi:

 apic_extnmi=( bsp|all|none )

The default value is "bsp" and this is the current behavior: only the
Boot-Strapping Processor receives an external NMI.

"all" allows external NMIs to be broadcast to all CPUs. This would
raise the success rate of panic on NMI when BSP hangs in NMI context
or the external NMI is swallowed by other NMI handlers on the BSP.

If you specify "none", no CPUs receive external NMIs. This is useful for
the dump capture kernel so that it cannot be shot down by accidentally
pressing the external NMI button (on platforms which have it) while
saving a crash dump.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bandan Das <bsd@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20151210014632.25437.43778.stgit@softrs
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:07:01 +01:00
Hidehiro Kawai 58c5661f21 panic, x86: Allow CPUs to save registers even if looping in NMI context
Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
sends an NMI IPI to CPUs which haven't called panic() to stop them,
save their register information and do some cleanups for crash dumping.
However, if such a CPU is infinitely looping in NMI context, we fail to
save its register information into the crash dump.

For example, this can happen when unknown NMIs are broadcast to all
CPUs as follows:

  CPU 0                             CPU 1
  ===========================       ==========================
  receive an unknown NMI
  unknown_nmi_error()
    panic()                         receive an unknown NMI
      spin_trylock(&panic_lock)     unknown_nmi_error()
      crash_kexec()                   panic()
                                        spin_trylock(&panic_lock)
                                        panic_smp_self_stop()
                                          infinite loop
        kdump_nmi_shootdown_cpus()
          issue NMI IPI -----------> blocked until IRET
                                          infinite loop...

Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is
blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET,
so the NMI is not handled and the callback function to save registers is
never called.

In practice, this can happen on some servers which broadcast NMIs to all
CPUs when the NMI button is pushed.

To save registers in this case, we need to:

  a) Return from NMI handler instead of looping infinitely
  or
  b) Call the callback function directly from the infinite loop

Inherently, a) is risky because NMI is also used to prevent corrupted
data from being propagated to devices.  So, we chose b).

This patch does the following:

1. Move the infinite looping of CPUs which haven't called panic() in NMI
   context (actually done by panic_smp_self_stop()) outside of panic() to
   enable us to refer pt_regs. Please note that panic_smp_self_stop() is
   still used for normal context.

2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
   registers and do some cleanups after setting waiting_for_crash_ipi which
   is used for counting down the number of CPUs which handled the callback

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:07:01 +01:00
Hidehiro Kawai 1717f2096b panic, x86: Fix re-entrance problem due to panic on NMI
If panic on NMI happens just after panic() on the same CPU, panic() is
recursively called. Kernel stalls, as a result, after failing to acquire
panic_lock.

To avoid this problem, don't call panic() in NMI context if we've
already entered panic().

For that, introduce nmi_panic() macro to reduce code duplication. In
the case of panic on NMI, don't return from NMI handlers if another CPU
already panicked.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:07:00 +01:00
Thomas Gleixner d267b8d6c6 Merge branch 'linus' into x86/apic
Pull in update changes so we can apply conflicting patches
2015-12-19 11:03:18 +01:00
Ashok Raj d90167a941 x86/mce: Ensure offline CPUs don't participate in rendezvous process
Intel's MCA implementation broadcasts MCEs to all CPUs on the
node. This poses a problem for offlined CPUs which cannot
participate in the rendezvous process:

  Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
  Kernel Offset: disabled
  Rebooting in 100 seconds..

More specifically, Linux does a soft offline of a CPU when
writing a 0 to /sys/devices/system/cpu/cpuX/online, which
doesn't prevent the #MC exception from being broadcasted to that
CPU.

Ensure that offline CPUs don't participate in the MCE rendezvous
and clear the RIP valid status bit so that a second MCE won't
cause a shutdown.

Without the patch, mce_start() will increment mce_callin and
wait for all CPUs. Offlined CPUs should avoid participating in
the rendezvous process altogether.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1449742346-21470-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 09:55:31 +01:00
Ingo Molnar 057032e457 Linux 4.4-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWbh6rAAoJEHm+PkMAQRiG2L4H/AmgfAPX8vYQTKXAnpxt7cLx
 2W9C+fyKRcHzqBCsUB4wxPwYwDGPmEZHo4Y/evaSdUbPhngRRRfEVZpzbTUqheEm
 A7h/1mCl5gcaGRzDNTAST3vfmNmwOHAWC+Ch3ZuuzxH+brtY0Ynb32CNa1XnmW9K
 7qknzpgyE3ZNQgwKzZ7F/+TscGcslalKRoAxPa7fumb1srW/Z04aGXYZdEQxOhow
 6Oc2op0IijTse5TdfW/MsbpvbH2uBLnQcYHvKXJ0wRmnQGeowguLSgyW356EAOQ9
 7L7xCvXyX5dFakZ9EApT3wsSP+cS7jUInDLDAxf14gyODrnl65KO3LU8vmZbJmI=
 =l5Rq
 -----END PGP SIGNATURE-----

Merge tag 'v4.4-rc5' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-14 09:31:23 +01:00
Andy Lutomirski cc1e24fdb0 x86/vdso: Remove pvclock fixmap machinery
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/4933029991103ae44672c82b97a20035f5c1fe4f.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-11 08:56:03 +01:00
Andy Lutomirski dac16fba6f x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/9d37826fdc7e2d2809efe31d5345f97186859284.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-11 08:56:03 +01:00
Linus Torvalds 51825c8a86 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This tree includes four core perf fixes for misc bugs, three fixes to
  x86 PMU drivers, and two updates to old email addresses"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Do not send exit event twice
  perf/x86/intel: Fix INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA macro
  perf/x86/intel: Make L1D_PEND_MISS.FB_FULL not constrained on Haswell
  perf: Fix PERF_EVENT_IOC_PERIOD deadlock
  treewide: Remove old email address
  perf/x86: Fix LBR call stack save/restore
  perf: Update email address in MAINTAINERS
  perf/core: Robustify the perf_cgroup_from_task() RCU checks
  perf/core: Fix RCU problem with cgroup context switching code
2015-12-08 13:01:23 -08:00
Linus Torvalds 69d2ca6002 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thoma Gleixner:
 "Another round of fixes for x86:

   - Move the initialization of the microcode driver to late_initcall to
     make sure everything that init function needs is available.

   - Make sure that lockdep knows about interrupts being off in the
     entry code before calling into c-code.

   - Undo the cpu hotplug init delay regression.

   - Use the proper conditionals in the mpx instruction decoder.

   - Fixup restart_syscall for x32 tasks.

   - Fix the hugepage regression on PAE kernels which was introduced
     with the latest PAT changes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/signal: Fix restart_syscall number for x32 tasks
  x86/mpx: Fix instruction decoder condition
  x86/mm: Fix regression with huge pages on PAE
  x86 smpboot: Re-enable init_udelay=0 by default on modern CPUs
  x86/entry/64: Fix irqflag tracing wrt context tracking
  x86/microcode: Initialize the driver late when facilities are up
2015-12-06 08:08:56 -08:00
Andi Kleen f1ad44884a perf/x86: Remove old MSR perf tracing code
Now that we have generic MSR trace points we can remove the old
hackish perf MSR read tracing code.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1449018060-1742-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-06 12:56:14 +01:00
Andi Kleen da008ee72c perf/x86/intel: Fix __initconst declaration in the RAPL perf driver
Fix a definition in the perf rapl driver. __initconst must
be applied to a const object, but to declare a const pointer
you need to use * const ..., not const ... *

This fixes a section attribute conflict with LTO builds.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1448905722-2767-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-06 12:55:53 +01:00
Ingo Molnar 42a0789bf5 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-06 12:55:37 +01:00
Jiri Olsa 169b932a15 perf/x86/intel: Fix INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA macro
We need to add rest of the flags to the constraint mask
instead of another INTEL_ARCH_EVENT_MASK, fixing a typo.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1447061071-28085-1-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-06 12:54:48 +01:00
Yuanfang Chen e0fbac1cd4 perf/x86/intel: Make L1D_PEND_MISS.FB_FULL not constrained on Haswell
There was a mistake in the Haswell constraints table.

Signed-off-by: Yuanfang Chen <cheny@udel.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1448384701-9110-1-git-send-email-cheny@udel.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-06 12:54:48 +01:00
Igor Mammedov ec941c5ffe x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN
when memory hotplug enabled system is booted with less
than 4GB of RAM and then later more RAM is hotplugged
32-bit devices stop functioning with following error:

 nommu_map_single: overflow 327b4f8c0+1522 of device mask ffffffff

the reason for this is that if x86_64 system were booted
with RAM less than 4GB, it doesn't enable SWIOTLB and
when memory is hotplugged beyond MAX_DMA32_PFN, devices
that expect 32-bit addresses can't handle 64-bit addresses.

Fix it by tracking max possible PFN when parsing
memory affinity structures from SRAT ACPI table and
enable SWIOTLB if there is hotpluggable memory
regions beyond MAX_DMA32_PFN.

It fixes KVM guests when they use emulated devices
(reproduces with ata_piix, e1000 and usb devices,
 RHBZ: 1275941, 1275977, 1271527)

It also fixes the HyperV, VMWare with emulated devices
which are affected by this issue as well.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: fujita.tomonori@lab.ntt.co.jp
Cc: konrad.wilk@oracle.com
Cc: pbonzini@redhat.com
Cc: revers@redhat.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1449234426-273049-3-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-06 12:46:31 +01:00
Igor Mammedov 8dd3303001 x86/mm: Introduce max_possible_pfn
max_possible_pfn will be used for tracking max possible
PFN for memory that isn't present in E820 table and
could be hotplugged later.

By default max_possible_pfn is initialized with max_pfn,
but later it could be updated with highest PFN of
hotpluggable memory ranges declared in ACPI SRAT table
if any present.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: fujita.tomonori@lab.ntt.co.jp
Cc: konrad.wilk@oracle.com
Cc: pbonzini@redhat.com
Cc: revers@redhat.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1449234426-273049-2-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-06 12:46:31 +01:00
Dmitry V. Levin 22eab11087 x86/signal: Fix restart_syscall number for x32 tasks
When restarting a syscall with regs->ax == -ERESTART_RESTARTBLOCK,
regs->ax is assigned to a restart_syscall number.  For x32 tasks, this
syscall number must have __X32_SYSCALL_BIT set, otherwise it will be
an x86_64 syscall number instead of a valid x32 syscall number. This
issue has been there since the introduction of x32.

Reported-by: strace/tests/restart_syscall.test
Reported-and-tested-by: Elvira Khabirova <lineprinter0@gmail.com>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: Elvira Khabirova <lineprinter0@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20151130215436.GA25996@altlinux.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-05 18:52:14 +01:00
Rasmus Villemoes c332813b51 x86/mm/mtrr: Mark the 'range_new' static variable in mtrr_calc_range_state() as __initdata
'range_new' doesn't seem to be used after init. It is only passed
to memset(), sum_ranges(), memcmp() and x86_get_mtrr_mem_range(), the
latter of which also only passes it on to various *range*
library functions.

So mark it __initdata to free up an extra page after init.

Its contents are wiped at every call to mtrr_calc_range_state(),
so it being static is not about preserving state between calls,
but simply to avoid a 4k+ stack frame. While there, add a
comment explaining this and why it's safe.

We could also mark nr_range_new as __initdata, but since it's
just a single int and also doesn't carry state between calls (it
is unconditionally assigned to before it is read), we might as
well make it an ordinary automatic variable.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/1449002691-20783-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-04 09:11:28 +01:00
Dan Williams bc0d0d093b libnvdimm, e820: skip module loading when no type-12
If there are no persistent memory ranges present then don't bother
creating the platform device.  Otherwise, it loads the full libnvdimm
sub-system only to discover no resources present.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-11-30 09:10:33 -08:00
Julia Lawall d6b56b0bc6 x86/platform/calgary: Constify cal_chipset_ops structures
The cal_chipset_ops structures are never modified, so declare
them as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jon D. Mason <jdmason@kudzu.us>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448726295-10959-1-git-send-email-Julia.Lawall@lip6.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-29 08:50:58 +01:00
Rasmus Villemoes e49a449b86 x86/fpu: Put a few variables in .init.data
These are clearly just used during init.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447424312-26400-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-27 10:23:17 +01:00
Len Brown 656279a1f3 x86 smpboot: Re-enable init_udelay=0 by default on modern CPUs
commit f1ccd24931 allowed the cmdline "cpu_init_udelay=" to work
with all values, including the default of 10000.

But in setting the default of 10000, it over-rode the code that sets
the delay 0 on modern processors.

Also, tidy up use of INT/UINT.

Fixes: f1ccd24931 "x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior"
Reported-by: Shane <shrybman@teksavvy.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: dparsons@brightdsl.net
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/9082eb809ef40dad02db714759c7aaf618c518d4.1448232494.git.len.brown@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-25 23:17:48 +01:00
Juergen Gross d6ccc3ec95 x86/paravirt: Remove paravirt ops pmd_update[_defer] and pte_update_defer
pte_update_defer can be removed as it is always set to the same
function as pte_update. So any usage of pte_update_defer() can be
replaced by pte_update().

pmd_update and pmd_update_defer are always set to paravirt_nop, so they
can just be nuked.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: jeremy@goop.org
Cc: chrisw@sous-sol.org
Cc: akataria@vmware.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xen.org
Cc: konrad.wilk@oracle.com
Cc: david.vrabel@citrix.com
Cc: boris.ostrovsky@oracle.com
Link: http://lkml.kernel.org/r/1447771879-1806-1-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-25 23:08:37 +01:00
Len Brown 0007bccc3c x86: Replace RDRAND forced-reseed with simple sanity check
x86_init_rdrand() was added with 2 goals:

1. Sanity check that the built-in-self-test circuit on the Digital
   Random Number Generator (DRNG) is not complaining.  As RDRAND
   HW self-checks on every invocation, this goal is achieved
   by simply invoking RDRAND and checking its return code.

2. Force a full re-seed of the random number generator.
   This was done out of paranoia to benefit the most un-sophisticated
   DRNG implementation conceivable in the architecture,
   an implementation that does not exist, and unlikely ever will.
   This worst-case full-re-seed is achieved by invoking
   a 64-bit RDRAND 8192 times.

Unfortunately, this worst-case re-seed costs O(1,000us).
Magnifying this cost, it is done from identify_cpu(), which is the
synchronous critical path to bring a processor on-line -- repeated
for every logical processor in the system at boot and resume from S3.

As it is very expensive, and of highly dubious value, we delete the
worst-case re-seed from the kernel.

We keep the 1st goal -- sanity check the hardware, and mark it absent
if it complains.

This change reduces the cost of x86_init_rdrand() by a factor of 1,000x,
to O(1us) from O(1,000us).

Signed-off-by: Len Brown <len.brown@intel.com>
Link: http://lkml.kernel.org/r/058618cc56ec6611171427ad7205e37e377aa8d4.1439738240.git.len.brown@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-25 22:46:43 +01:00
Steven Rostedt (Red Hat) b05086c77a ftrace: Add variable ftrace_expected for archs to show expected code
When an anomaly is found while modifying function code, ftrace_bug() is
called which disables the function tracing infrastructure and reports
information about what failed. If the code that is to be replaced does not
match what is expected, then actual code is shown. Currently there is no
arch generic way to show what was expected.

Add a new variable pointer calld ftrace_expected that the arch code can set
to point to what it expected so that ftrace_bug() can report the actual text
as well as the text that was expected to be there.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-25 15:24:16 -05:00
Ingo Molnar 49b2410631 Merge branch 'x86/urgent' into x86/asm, to pick up dependent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:55:11 +01:00
Juergen Gross 42baa2581c x86/apic: Fix the saving and restoring of lapic vectors during suspend/resume
Saving and restoring lapic vectors in lapic_suspend() and
lapic_resume() is not consistent: the thmr vector saving is
guarded by a different config option than the restore part. The
cmci vector isn't handled at all.

Those inconsistencies are not very critical, as the missing cmci
vector will be set via mce resume handling, the wrong config
option used for restoring the thmr vector can't be configured
differently than the one which should be used.

Nevertheless correct the thmr vector restore and add cmci vector
handling.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448276364-31334-1-git-send-email-jgross@suse.com
[ Minor code edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:18:33 +01:00
Borislav Petkov 31ac34ca56 x86/cpu: Fix MSR value truncation issue
So sparse rightfully complains that the u64 MSR value we're
writing into the STAR MSR, i.e. 0xc0000081, is being truncated:

./arch/x86/include/asm/msr.h:193:36: warning: cast truncates
bits from constant value (23001000000000 becomes 0)

because the actual value doesn't fit into the unsigned 32-bit
quantity which are the @low and @high wrmsrl() parameters.

This is not a problem, practically, because gcc is actually
being smart enough here and does the right thing:

  .loc 3 87 0
  xorl    %esi, %esi		# we needz a 32-bit zero
  movl    $2293776, %edx	# 0x00230010 == (__USER32_CS << 16) | __KERNEL_CS go into the high bits
  movl    $-1073741695, %ecx	# MSR_STAR, i.e., 0xc0000081
  movl    %esi, %eax		# low order 32 bits in the MSR which are 0
  #APP
  # 87 "./arch/x86/include/asm/msr.h" 1
          wrmsr

More specifically, MSR_STAR[31:0] is being set to 0. That field
is reserved on Intel and on AMD it is 32-bit SYSCALL Target EIP.

I'd strongly guess because Intel doesn't have SYSCALL in
compat/legacy mode and we're using SYSENTER and INT80 there. And
for compat syscalls in long mode we use CSTAR.

So let's fix the sparse warning by writing SYSRET and SYSCALL CS
and SS into the high 32-bit half of STAR and 0 in the low half
explicitly.

 [ Actually, if we had to be precise, we would have to read what's in
   STAR[31:0] and write it back unchanged on Intel and write 0 on AMD. I
   guess the current writing to 0 is still ok since Intel can apparently
   stomach it. ]

The resulting code is identical to what we have above:

  .loc 3 87 0
  xorl    %esi, %esi      # tmp104
  movl    $2293776, %eax  #, tmp103
  movl    $-1073741695, %ecx      #, tmp102
  movl    %esi, %edx      # tmp104, tmp104

  ...

        wrmsr

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:15:55 +01:00
Borislav Petkov ae8b787543 x86/cpu/amd, kvm: Satisfy guest kernel reads of IC_CFG MSR
The kernel accesses IC_CFG MSR (0xc0011021) on AMD because it
checks whether the way access filter is enabled on some F15h
models, and, if so, disables it.

kvm doesn't handle that MSR access and complains about it, which
can get really noisy in dmesg when one starts kvm guests all the
time for testing. And it is useless anyway - guest kernel
shouldn't be doing such changes anyway so tell it that that
filter is disabled.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:15:54 +01:00
Borislav Petkov 99f925ce92 x86/cpu: Unify CPU family, model, stepping calculation
Add generic functions which calc family, model and stepping from
the CPUID_1.EAX leaf and stick them into the library we have.

Rename those which do call CPUID with the prefix "x86_cpuid" as
suggested by Paolo Bonzini.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:15:54 +01:00
Borislav Petkov feab21f835 x86/mce: Make usable address checks Intel-only
The MCi_MISC bitfield definitions mce_usable_address() checks
are Intel-only. Make them so.

While at it, move mce_usable_address() up, before all its
callers and get rid of the forward declaration.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448350880-5573-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:12:35 +01:00
Borislav Petkov db548a28fc x86/mce: Add the missing memory error check on AMD
We simply need to look at the extended error code when detecting
whether the error is of type memory.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448350880-5573-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:12:35 +01:00
Borislav Petkov c0ec382e19 x86/RAS: Remove mce.usable_addr
It is useless and we can use the function instead. Besides,
mcelog(8) hasn't managed to make use of it yet. So kill it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448350880-5573-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:12:35 +01:00
Tony Luck 8b38937b7a x86/mce: Do not enter deferred errors into the generic pool twice
We used to have a special ring buffer for deferred errors that
was used to mark problem pages. We replaced that with a generic
pool. Then later converted mce_log() to also use the same pool.
As a result, we end up adding all deferred errors to the pool
twice.

Rearrange this code. Make sure to set the m.severity and
m.usable_addr fields for deferred errors. Then if flags and
mca_cfg.dont_log_ce mean we call mce_log() we are done, because
that will add this entry to the generic pool.

If we skipped mce_log(), then we still want to take action for
the deferred error, so add to the pool.

Change the name of the boolean "error_logged" to "error_seen",
we should set it whether of not we logged an error because the
return value from machine_check_poll() is used to decide whether
storms have subsided or not.

Reported-by: Gong Chen <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1448350880-5573-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:12:35 +01:00
Boris Ostrovsky 75ef82190d x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op
As result of commit "x86/xen: Avoid fast syscall path for Xen PV
guests", usergs_sysret32 pv op is not called by Xen PV guests
anymore and since they were the only ones who used it we can
safely remove it.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1447970147-1733-4-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:48:16 +01:00
Boris Ostrovsky 88c15ec90f x86/paravirt: Remove the unused irq_enable_sysexit pv op
As result of commit "x86/xen: Avoid fast syscall path for Xen PV
guests", the irq_enable_sysexit pv op is not called by Xen PV guests
anymore and since they were the only ones who used it we can
safely remove it.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1447970147-1733-3-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:48:16 +01:00
Borislav Petkov 2d5be37d68 x86/microcode: Initialize the driver late when facilities are up
Running microcode_init() from setup_arch() is a bad idea because
not even kmalloc() is ready at that point and the loader does
all kinds of allocations and init/registration with various
subsystems.

Make it a late initcall when required facilities are initialized
so that the microcode driver initialization can succeed too.

Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151120112400.GC4028@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:39:49 +01:00
Andi Kleen b7883a1c4f perf/x86: Handle multiple umask bits for BDW CYCLE_ACTIVITY.*
The earlier constraint fix for Broadwell CYCLE_ACTIVITY.*
forced umask 8 to counter 2. For this it used UEVENT,
to match the complete umask.

The event list for Broadwell has an additional
STALLS_L1D_PENDIND event that uses umask 8, but also
sets other bits in the umask.  The earlier strict umask match
didn't handle this case.

Add a new UBIT_EVENT constraint macro that only matches
the specified bits in the umask. Then use that macro
to handle CYCLE_ACTIVITY.* on Broadwell.

The documented event also uses cmask, but there's no
need to let the event scheduler know about the cmask,
as the scheduling restriction is only tied to the umask.

Reported-by: Grant Ayers <ayers@cs.stanford.edu>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1447719667-9998-1-git-send-email-andi@firstfloor.org
[ Filled in the missing email address of Grant Ayers - hopefully I got the right one. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:58:27 +01:00
Takao Indoh da06a43d3f perf, x86: Stop Intel PT before kdump starts
This patch stops Intel PT logging and saves its registers in memory
before kdump is started. This feature is needed to prevent Intel PT from
overwriting its log buffer after panic, and saved registers are needed to
find the last position where Intel PT wrote data.

After the crash dump is captured by kdump, users can retrieve the log buffer
from the vmcore and use it to investigate bad kernel behavior.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin<alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H.Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1446614553-6072-3-git-send-email-indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:58:26 +01:00
Takao Indoh 24cc12b176 perf/x86/intel/pt: Add interface to stop Intel PT logging
This patch add a function for external components to stop Intel PT.
Basically this function is used when kernel panic occurs. When it is
called, the intel_pt driver disables Intel PT and saves its registers
using pt_event_stop(), which is also used by pmu.stop handler.

This function stops Intel PT on the CPU where it is working, therefore
users of it need to call it for each CPU to stop all logging.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin<alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H.Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1446614553-6072-2-git-send-email-indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:58:26 +01:00
Andi Kleen b16a5b52eb perf/x86: Add option to disable reading branch flags/cycles
With LBRv5 reading the extra LBR flags like mispredict, TSX, cycles is
not free anymore, as it has moved to a separate MSR.

For callstack mode we don't need any of this information; so we can
avoid the unnecessary MSR read. Add flags to the perf interface where
perf record can request not collecting this information.

Add branch_sample_type flags for CYCLES and FLAGS. It's a bit unusual
for branch_sample_types to be negative (disable), not positive (enable),
but since the legacy ABI reported the flags we need some form of
explicit disabling to avoid breaking the ABI.

After we have the flags the x86 perf code can keep track if any users
need the flags. If noone needs it the information is not collected.

This cuts down the cost of LBR callstack on Skylake significantly.
Profiling a kernel build with LBR call stack the average run time of
the PMI handler drops by 43%.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1445366797-30894-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:58:25 +01:00
Andi Kleen 75925e1ad7 perf/x86: Optimize stack walk user accesses
Change the perf user stack walking to use the new
__copy_from_user_nmi(), and split each access into word sized transfer
sizes. This allows to inline the complete access and optimize it all
into a single load.

The main advantage is that this avoids the overhead of double page
faults.  When normal copy_from_user() fails it reexecutes the copy to
compute an accurate number of non copied bytes. This leads to
executing the expensive page fault twice.

While walking stacks having a fault at some point is relatively common
(typically when some part of the program isn't compiled with frame
pointers), so this is a large overhead.

With the optimized copies we avoid this problem because they only do
all accesses once. And of course they're much faster too when the
access does not fault because they're just single instructions instead
of complex function calls.

While profiling a kernel build with -g, the patch brings down the
average time of the PMI handler from 966ns to 552ns (-43%).

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1445551641-13379-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:58:25 +01:00
Peter Zijlstra 90eec103b9 treewide: Remove old email address
There were still a number of references to my old Red Hat email
address in the kernel source. Remove these while keeping the
Red Hat copyright notices intact.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:44:58 +01:00
Andi Kleen b28ae9560b perf/x86: Fix LBR call stack save/restore
This fixes a bug I added in the following commit:

  90405aa022 ("perf/x86/intel/lbr: Limit LBR accesses to TOS in callstack mode")

The bug could lead to lost LBR call stacks. When restoring the LBR state
we need to use the TOS of the previous context, not the current context.
To do that we need to save/restore the TOS.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1445366797-30894-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:44:57 +01:00
Stephane Eranian 614e4c4ebc perf/core: Robustify the perf_cgroup_from_task() RCU checks
This patch reinforces the lockdep checks performed by
perf_cgroup_from_tsk() by passing the perf_event_context
whenever possible. It is okay to not hold the RCU read lock
when we know we hold the ctx->lock. This patch makes sure this
property holds.

In some functions, such as perf_cgroup_sched_in(), we do not
pass the context because we are sure we are holding the RCU
read lock.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: edumazet@google.com
Link: http://lkml.kernel.org/r/1447322404-10920-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:21:03 +01:00
Len Brown 2fde46b79e x86/smpboot: Re-enable init_udelay=0 by default on modern CPUs
Fix a Linux-4.3 corner case performance regression, introduced by commit:

  f1ccd24931 ("x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior")

which allowed the cmdline  "cpu_init_udelay=" to work with all values,
including the default of 10000.

But in setting the default of 10000, it over-rode the code stat sets
the delay 0 on modern processors.

Also, tidy up use of INT/UINT.

Reported-by: Shane <shrybman@teksavvy.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dparsons@brightdsl.net
Link: http://lkml.kernel.org/r/9082eb809ef40dad02db714759c7aaf618c518d4.1448232494.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:08:33 +01:00
Linus Torvalds 069ec22915 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "This update contains:

   - MPX updates for handling 32bit processes

   - A fix for a long standing bug in 32bit signal frame handling
     related to FPU/XSAVE state

   - Handle get_xsave_addr() correctly in KVM

   - Fix SMAP check under paravirtualization

   - Add a comment to the static function trace entry to avoid further
     confusion about the difference to dynamic tracing"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Fix SMAP check in PVOPS environments
  x86/ftrace: Add comment on static function tracing
  x86/fpu: Fix get_xsave_addr() behavior under virtualization
  x86/fpu: Fix 32-bit signal frame handling
  x86/mpx: Fix 32-bit address space calculation
  x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernels
2015-11-22 12:00:12 -08:00
Andrew Cooper 581b7f158f x86/cpu: Fix SMAP check in PVOPS environments
There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely.  Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at.  This may
have been true when initially implemented, but no longer is.

To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
build, but not consistent for all builds.  It has also been a sitting timebomb
since SMAP support was introduced.

Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <lguest@lists.ozlabs.org>
Cc: Xen-devel <xen-devel@lists.xen.org>
CC: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.cooper3@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-19 11:07:49 +01:00
Namhyung Kim 112677d683 x86/ftrace: Add comment on static function tracing
There was a confusion between update_ftrace_function() and static
function tracing trampoline regarding 3rd parameter (ftrace_ops).
Add a comment for clarification.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1447721004-2551-1-git-send-email-namhyung@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-19 11:07:49 +01:00
Juergen Gross 4609586592 x86/paravirt: Remove unused pv_apic_ops structure
The only member of that structure is startup_ipi_hook which is always
set to paravirt_nop.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: jeremy@goop.org
Cc: chrisw@sous-sol.org
Cc: akataria@vmware.com
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xen.org
Cc: konrad.wilk@oracle.com
Cc: boris.ostrovsky@oracle.com
Link: http://lkml.kernel.org/r/1447767872-16730-1-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-19 11:03:58 +01:00
Thomas Gleixner 2f7a3f8e87 x86/tsc: Remove unused tsc_pre_init() hook
No more users. Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
2015-11-19 11:03:13 +01:00
Linus Torvalds 0ca9b67606 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
 "Mostly updates to the perf tool plus two fixes to the kernel core code:

   - Handle tracepoint filters correctly for inherited events (Peter
     Zijlstra)

   - Prevent a deadlock in perf_lock_task_context (Paul McKenney)

   - Add missing newlines to some pr_err() calls (Arnaldo Carvalho de
     Melo)

   - Print full source file paths when using 'perf annotate --print-line
     --full-paths' (Michael Petlan)

   - Fix 'perf probe -d' when just one out of uprobes and kprobes is
     enabled (Wang Nan)

   - Add compiler.h to list.h to fix 'make perf-tar-src-pkg' generated
     tarballs, i.e. out of tree building (Arnaldo Carvalho de Melo)

   - Add the llvm-src-base.c and llvm-src-kbuild.c files, generated by
     the 'perf test' LLVM entries, when running it in-tree, to
     .gitignore (Yunlong Song)

   - libbpf error reporting improvements, using a strerror interface to
     more precisely tell the user about problems with the provided
     scriptlet, be it in C or as a ready made object file (Wang Nan)

   - Do not be case sensitive when searching for matching 'perf test'
     entries (Arnaldo Carvalho de Melo)

   - Inform the user about objdump failures in 'perf annotate' (Andi
     Kleen)

   - Improve the LLVM 'perf test' entry, introduce a new ones for BPF
     and kbuild tests to check the environment used by clang to compile
     .c scriptlets (Wang Nan)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  perf/x86/intel/rapl: Remove the unused RAPL_EVENT_DESC() macro
  tools include: Add compiler.h to list.h
  perf probe: Verify parameters in two functions
  perf session: Add missing newlines to some pr_err() calls
  perf annotate: Support full source file paths for srcline fix
  perf test: Add llvm-src-base.c and llvm-src-kbuild.c to .gitignore
  perf: Fix inherited events vs. tracepoint filters
  perf: Disable IRQs across RCU RS CS that acquires scheduler lock
  perf test: Do not be case sensitive when searching for matching tests
  perf test: Add 'perf test BPF'
  perf test: Enhance the LLVM tests: add kbuild test
  perf test: Enhance the LLVM test: update basic BPF test program
  perf bpf: Improve BPF related error messages
  perf tools: Make fetch_kernel_version() publicly available
  bpf tools: Add new API bpf_object__get_kversion()
  bpf tools: Improve libbpf error reporting
  perf probe: Cleanup find_perf_probe_point_from_map to reduce redundancy
  perf annotate: Inform the user about objdump failures in --stdio
  perf stat: Make stat options global
  perf sched latency: Fix thread pid reuse issue
  ...
2015-11-15 09:36:24 -08:00
Linus Torvalds bba072dfd7 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A couple of fixes and updates related to x86:

   - Fix the W+X check regression on XEN

   - The real fix for the low identity map trainwreck

   - Probe legacy PIC early instead of unconditionally allocating legacy
     irqs

   - Add cpu verification to long mode entry

   - Adjust the cache topology to AMD Fam17H systems

   - Let Merrifield use the TSC across S3"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Call verify_cpu() after having entered long mode too
  x86/setup: Fix low identity map for >= 2GB kernel range
  x86/mm: Skip the hypervisor range when walking PGD
  x86/AMD: Fix last level cache topology for AMD Fam17h systems
  x86/irq: Probe for PIC presence before allocating descs for legacy IRQs
  x86/cpu/intel: Enable X86_FEATURE_NONSTOP_TSC_S3 for Merrifield
2015-11-15 09:32:59 -08:00
Huang Rui 41ac18ebfc perf/x86/intel/rapl: Remove the unused RAPL_EVENT_DESC() macro
Signed-off-by: Huang Rui <ray.huang@amd.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Link: http://lkml.kernel.org/r/1446630233-3166-1-git-send-email-ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:44:25 +01:00
Huaitong Han a05917b6ba x86/fpu: Fix get_xsave_addr() behavior under virtualization
KVM uses the get_xsave_addr() function in a different fashion from
the native kernel, in that the 'xsave' parameter belongs to guest vcpu,
not the currently running task.

But 'xsave' is replaced with current task's (host) xsave structure, so
get_xsave_addr() will incorrectly return the bad xsave address to KVM.

Fix it so that the passed in 'xsave' address is used - as intended
originally.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/1446800423-21622-1-git-send-email-huaitong.han@intel.com
[ Tidied up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:34:58 +01:00
Dave Hansen ab6b529475 x86/fpu: Fix 32-bit signal frame handling
(This should have gone to LKML originally. Sorry for the extra
 noise, folks on the cc.)

Background:

Signal frames on x86 have two formats:

  1. For 32-bit executables (whether on a real 32-bit kernel or
     under 32-bit emulation on a 64-bit kernel) we have a
    'fpregset_t' that includes the "FSAVE" registers.

  2. For 64-bit executables (on 64-bit kernels obviously), the
     'fpregset_t' is smaller and does not contain the "FSAVE"
     state.

When creating the signal frame, we have to be aware of whether
we are running a 32 or 64-bit executable so we create the
correct format signal frame.

Problem:

save_xstate_epilog() uses 'fx_sw_reserved_ia32' whenever it is
called for a 32-bit executable.  This is for real 32-bit and
ia32 emulation.

But, fpu__init_prepare_fx_sw_frame() only initializes
'fx_sw_reserved_ia32' when emulation is enabled, *NOT* for real
32-bit kernels.

This leads to really wierd situations where 32-bit programs
lose their extended state when returning from a signal handler.
The kernel copies the uninitialized (zero) 'fx_sw_reserved_ia32'
out to userspace in save_xstate_epilog().  But when returning
from the signal, the kernel errors out in check_for_xstate()
when it does not see FP_XSTATE_MAGIC1 present (because it was
zeroed).  This leads to the FPU/XSAVE state being initialized.

For MPX, this leads to the most permissive state and means we
silently lose bounds violations.  I think this would also mean
that we could lose *ANY* FPU/SSE/AVX state.  I'm not sure why
no one has spotted this bug.

I believe this was broken by:

	72a671ced6 ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")

way back in 2012.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@sr71.net
Cc: fenghua.yu@intel.com
Cc: yu-cheng.yu@intel.com
Link: http://lkml.kernel.org/r/20151111002354.A0799571@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12 09:23:45 +01:00
Linus Torvalds ad804a0b2a Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - most of the rest of MM

 - procfs

 - lib/ updates

 - printk updates

 - bitops infrastructure tweaks

 - checkpatch updates

 - nilfs2 update

 - signals

 - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
   dma-debug, dma-mapping, ...

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
  ipc,msg: drop dst nil validation in copy_msg
  include/linux/zutil.h: fix usage example of zlib_adler32()
  panic: release stale console lock to always get the logbuf printed out
  dma-debug: check nents in dma_sync_sg*
  dma-mapping: tidy up dma_parms default handling
  pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
  kexec: use file name as the output message prefix
  fs, seqfile: always allow oom killer
  seq_file: reuse string_escape_str()
  fs/seq_file: use seq_* helpers in seq_hex_dump()
  coredump: change zap_threads() and zap_process() to use for_each_thread()
  coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
  signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
  signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
  signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
  signals: kill block_all_signals() and unblock_all_signals()
  nilfs2: fix gcc uninitialized-variable warnings in powerpc build
  nilfs2: fix gcc unused-but-set-variable warnings
  MAINTAINERS: nilfs2: add header file for tracing
  nilfs2: add tracepoints for analyzing reading and writing metadata files
  ...
2015-11-07 14:32:45 -08:00
Linus Torvalds 99aaa9c64b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fix from Jiri Kosina:
 "A fix for a kernel oops in case CONFIG_DEBUG_SET_MODULE_RONX is unset
  (as in such case it's possible for module struct to share a page with
  executable text, which is currently not being handled with grace) from
  Josh Poimboeuf"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX
2015-11-07 12:15:17 -08:00
Borislav Petkov 79f1d83692 x86/paravirt: Kill some unused patching functions
paravirt_patch_ignore() is completely unused and paravirt_patch_nop()
doesn't do a whole lot. Remove them both.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1446542329-32037-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 18:09:35 +01:00
Borislav Petkov 04633df0c4 x86/cpu: Call verify_cpu() after having entered long mode too
When we get loaded by a 64-bit bootloader, kernel entry point is
startup_64 in head_64.S. We don't trust any and all bootloaders because
some will fiddle with CPU configuration so we go ahead and massage each
CPU into sanity again.

For example, some dell BIOSes have this XD disable feature which set
IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround
for other OSes but Linux sure doesn't need it.

A similar thing is present in the Surface 3 firmware - see
https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit
only on the BSP:

  # rdmsr -a 0x1a0
  400850089
  850089
  850089
  850089

I know, right?!

There's not even an off switch in there.

So fix all those cases by sanitizing the 64-bit entry point too. For
that, make verify_cpu() callable in 64-bit mode also.

Requested-and-debugged-by: "H. Peter Anvin" <hpa@zytor.com>
Reported-and-tested-by: Bastien Nocera <bugzilla@hadess.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:45:02 +01:00
Krzysztof Mazur 68accac392 x86/setup: Fix low identity map for >= 2GB kernel range
The commit f5f3497cad extended the low identity mapping. However, if
the kernel uses more than 2 GB (VMSPLIT_2G_OPT or VMSPLIT_1G memory
split), the normal memory mapping is overwritten by the low identity
mapping causing a crash. To avoid overwritting, limit the low identity
map to cover only memory before kernel range (PAGE_OFFSET).

Fixes: f5f3497cad "x86/setup: Extend low identity map to cover whole kernel range
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446815916-22105-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:39:40 +01:00
Aravind Gopalakrishnan 3849e91f57 x86/AMD: Fix last level cache topology for AMD Fam17h systems
On AMD Fam17h systems, the last level cache is not resident in the
northbridge. Therefore, we cannot assign cpu_llc_id to the same value as
Node ID as we have been doing until now.

We should rather look at the ApicID bits of the core to provide us the
last level cache ID info.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Link: http://lkml.kernel.org/r/1446582899-9378-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:37:51 +01:00
Vitaly Kuznetsov 8c058b0b9c x86/irq: Probe for PIC presence before allocating descs for legacy IRQs
Commit d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain
interfaces") brought a regression for Hyper-V Gen2 instances. These
instances don't have i8259 legacy PIC but they use legacy IRQs for serial
port, rtc, and acpi. With this commit included we end up with these IRQs
not initialized. Earlier, there was a special workaround for legacy IRQs
in mp_map_pin_to_irq() doing mp_irqdomain_map() without looking at
nr_legacy_irqs() and now we fail in __irq_domain_alloc_irqs() when
irq_domain_alloc_descs() returns -EEXIST.

The essence of the issue seems to be that early_irq_init() calls
arch_probe_nr_irqs() to figure out the number of legacy IRQs before
we probe for i8259 and gets 16. Later when init_8259A() is called we switch
to NULL legacy PIC and nr_legacy_irqs() starts to return 0 but we already
have 16 descs allocated.

Solve the issue by separating i8259 probe from init and calling it in
arch_probe_nr_irqs() before we actually use nr_legacy_irqs() information.

Fixes: d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446543614-3621-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:37:37 +01:00
Andy Shevchenko 354dbaa7ff x86/cpu/intel: Enable X86_FEATURE_NONSTOP_TSC_S3 for Merrifield
The Intel Merrifield SoC is a successor of the Intel MID line of
SoCs. Let's set the neccessary capability for that chip. See commit
c54fdbb282 (x86: Add cpu capability flag X86_FEATURE_NONSTOP_TSC_S3)
for the details.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: http://lkml.kernel.org/r/1444319786-36125-1-git-send-email-andriy.shevchenko@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:37:30 +01:00
Martin Kepplinger 78e3c79510 arch/x86/kernel/cpu/perf_event_msr.c: use sign_extend64() for sign extension
Signed-off-by: Martin Kepplinger <martin.kepplinger@theobroma-systems.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Maxime Coquelin <maxime.coquelin@st.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Mel Gorman d0164adc89 mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Linus Torvalds 22402cd0af Most of the changes are clean ups and small fixes. Some of them have
stable tags to them. I searched through my INBOX just as the merge window
 opened and found lots of patches to pull. I ran them through all my tests
 and they were in linux-next for a few days.
 
 Features added this release:
 ----------------------------
 
  o Module globbing. You can now filter function tracing to several
    modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov)
 
  o Tracer specific options are now visible even when the tracer is not
    active. It was rather annoying that you can only see and modify tracer
    options after enabling the tracer. Now they are in the options/ directory
    even when the tracer is not active. Although they are still only visible
    when the tracer is active in the trace_options file.
 
  o Trace options are now per instance (although some of the tracer specific
    options are global)
 
  o New tracefs file: set_event_pid. If any pid is added to this file, then
    all events in the instance will filter out events that are not part of
    this pid. sched_switch and sched_wakeup events handle next and the wakee
    pids.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWPLQ5AAoJEKKk/i67LK/8CTYIAI1u8DE5QCzv3J0p54jVpNVR
 J5FqEU3eXIzd6FS4JXD4nxCeMpUZAy21YnhlZpsnrbJJM5bc9bUsBCwiKKM+MuSZ
 ztmy2sgYKkO0h/KUdhNgYJrzis3/Ojquyx9iAqK5ST/Fr+nKYx81akFKjNK53iur
 RJRut45sSa8rv11LaL8sgJ6hAWQTc+YkybUdZ5xaMdJmZ6A61T7Y6VzTjbUexuvL
 hntCfTjYLtVd8dbfknAnf3B7n/VOO3IFF85wr7ciYR5oEVfPrF8tHmJBlhHExPpX
 kaXAiDDRY/UTg/5DQqnp4zmxJoR5BQ2l4pT5PwiLcnwhcphIDNYS8EYUmOYAWjU=
 =TjOE
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracking updates from Steven Rostedt:
 "Most of the changes are clean ups and small fixes.  Some of them have
  stable tags to them.  I searched through my INBOX just as the merge
  window opened and found lots of patches to pull.  I ran them through
  all my tests and they were in linux-next for a few days.

  Features added this release:
  ----------------------------

   - Module globbing.  You can now filter function tracing to several
     modules.  # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov)

   - Tracer specific options are now visible even when the tracer is not
     active.  It was rather annoying that you can only see and modify
     tracer options after enabling the tracer.  Now they are in the
     options/ directory even when the tracer is not active.  Although
     they are still only visible when the tracer is active in the
     trace_options file.

   - Trace options are now per instance (although some of the tracer
     specific options are global)

   - New tracefs file: set_event_pid.  If any pid is added to this file,
     then all events in the instance will filter out events that are not
     part of this pid.  sched_switch and sched_wakeup events handle next
     and the wakee pids"

* tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits)
  tracefs: Fix refcount imbalance in start_creating()
  tracing: Put back comma for empty fields in boot string parsing
  tracing: Apply tracer specific options from kernel command line.
  tracing: Add some documentation about set_event_pid
  ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark
  tracing: Allow dumping traces without tracking trace started cpus
  ring_buffer: Fix more races when terminating the producer in the benchmark
  ring_buffer: Do no not complete benchmark reader too early
  tracing: Remove redundant TP_ARGS redefining
  tracing: Rename max_stack_lock to stack_trace_max_lock
  tracing: Allow arch-specific stack tracer
  recordmcount: arm64: Replace the ignored mcount call into nop
  recordmcount: Fix endianness handling bug for nop_mcount
  tracepoints: Fix documentation of RCU lockdep checks
  tracing: ftrace_event_is_function() can return boolean
  tracing: is_legal_op() can return boolean
  ring-buffer: rb_event_is_commit() can return boolean
  ring-buffer: rb_per_cpu_empty() can return boolean
  ring_buffer: ring_buffer_empty{cpu}() can return boolean
  ring-buffer: rb_is_reader_page() can return boolean
  ...
2015-11-06 13:30:20 -08:00
Josh Poimboeuf e2391a2dca livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX
When loading a patch module on a kernel with
!CONFIG_DEBUG_SET_MODULE_RONX, the following crash occurs:

  [  205.988776] livepatch: enabling patch 'kpatch_meminfo_string'
  [  205.989829] BUG: unable to handle kernel paging request at ffffffffa08d2fc0
  [  205.989863] IP: [<ffffffff8154fecb>] do_init_module+0x8c/0x1ba
  [  205.989888] PGD 1a10067 PUD 1a11063 PMD 7bcde067 PTE 3740e161
  [  205.989915] Oops: 0003 [#1] SMP
  [  205.990187] CPU: 2 PID: 14570 Comm: insmod Tainted: G           O  K 4.1.12
  [  205.990214] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
  [  205.990249] task: ffff8800374aaa90 ti: ffff8800794b8000 task.ti: ffff8800794b8000
  [  205.990276] RIP: 0010:[<ffffffff8154fecb>]  [<ffffffff8154fecb>] do_init_module+0x8c/0x1ba
  [  205.990307] RSP: 0018:ffff8800794bbd58  EFLAGS: 00010246
  [  205.990327] RAX: 0000000000000000 RBX: ffffffffa08d2fc0 RCX: 0000000000000000
  [  205.990356] RDX: 01ffff8000000080 RSI: 0000000000000000 RDI: ffffffff81a54b40
  [  205.990382] RBP: ffff88007b4c4d80 R08: 0000000000000007 R09: 0000000000000000
  [  205.990408] R10: 0000000000000008 R11: ffffea0001f18840 R12: 0000000000000000
  [  205.990433] R13: 0000000000000001 R14: ffffffffa08d2fc0 R15: ffff88007bd0bc40
  [  205.990459] FS:  00007f1128fbc700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
  [  205.990488] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  205.990509] CR2: ffffffffa08d2fc0 CR3: 000000002606e000 CR4: 00000000001406e0
  [  205.990536] Stack:
  [  205.990545]  ffff8800794bbec8 0000000000000001 ffffffffa08d3010 ffffffff810ecea9
  [  205.990576]  ffffffff810e8e40 000000000005f360 ffff88007bd0bc50 ffffffffa08d3240
  [  205.990608]  ffffffffa08d52c0 ffffffffa08d3210 ffff8800794bbed8 ffff8800794bbf1c
  [  205.990639] Call Trace:
  [  205.990651]  [<ffffffff810ecea9>] ? load_module+0x1e59/0x23a0
  [  205.990672]  [<ffffffff810e8e40>] ? store_uevent+0x40/0x40
  [  205.990693]  [<ffffffff810e99b5>] ? copy_module_from_fd.isra.49+0xb5/0x140
  [  205.990718]  [<ffffffff810ed5bd>] ? SyS_finit_module+0x7d/0xa0
  [  205.990741]  [<ffffffff81556832>] ? system_call_fastpath+0x16/0x75
  [  205.990763] Code: f9 00 00 00 74 23 49 c7 c0 92 e1 60 81 48 8d 53 18 89 c1 4c 89 c6 48 c7 c7 f0 85 7d 81 31 c0 e8 71 fa ff ff e8 58 0e 00 00 31 f6 <c7> 03 00 00 00 00 48 89 da 48 c7 c7 20 c7 a5 81 e8 d0 ec b3 ff
  [  205.990916] RIP  [<ffffffff8154fecb>] do_init_module+0x8c/0x1ba
  [  205.990940]  RSP <ffff8800794bbd58>
  [  205.990953] CR2: ffffffffa08d2fc0

With !CONFIG_DEBUG_SET_MODULE_RONX, module text and rodata pages are
writable, and the debug_align() macro allows the module struct to share
a page with executable text.  When klp_write_module_reloc() calls
set_memory_ro() on the page, it effectively turns the module struct into
a read-only structure, resulting in a page fault when load_module() does
"mod->state = MODULE_STATE_LIVE".

Reported-by: Cyril B. <cbay@alwaysdata.com>
Tested-by: Cyril B. <cbay@alwaysdata.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-11-06 11:10:03 +01:00
Linus Torvalds 933425fb00 s390: A bunch of fixes and optimizations for interrupt and time
handling.
 
 PPC: Mostly bug fixes.
 
 ARM: No big features, but many small fixes and prerequisites including:
 - a number of fixes for the arch-timer
 - introducing proper level-triggered semantics for the arch-timers
 - a series of patches to synchronously halt a guest (prerequisite for
   IRQ forwarding)
 - some tracepoint improvements
 - a tweak for the EL2 panic handlers
 - some more VGIC cleanups getting rid of redundant state
 
 x86: quite a few changes:
 
 - support for VT-d posted interrupts (i.e. PCI devices can inject
 interrupts directly into vCPUs).  This introduces a new component (in
 virt/lib/) that connects VFIO and KVM together.  The same infrastructure
 will be used for ARM interrupt forwarding as well.
 
 - more Hyper-V features, though the main one Hyper-V synthetic interrupt
 controller will have to wait for 4.5.  These will let KVM expose Hyper-V
 devices.
 
 - nested virtualization now supports VPID (same as PCID but for vCPUs)
 which makes it quite a bit faster
 
 - for future hardware that supports NVDIMM, there is support for clflushopt,
 clwb, pcommit
 
 - support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
 userspace, which reduces the attack surface of the hypervisor
 
 - obligatory smattering of SMM fixes
 
 - on the guest side, stable scheduler clock support was rewritten to not
 require help from the hypervisor.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a
 f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw
 VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u
 p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ
 PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG
 iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc=
 =iv2z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.4.

  s390:
     A bunch of fixes and optimizations for interrupt and time handling.

  PPC:
     Mostly bug fixes.

  ARM:
     No big features, but many small fixes and prerequisites including:

      - a number of fixes for the arch-timer

      - introducing proper level-triggered semantics for the arch-timers

      - a series of patches to synchronously halt a guest (prerequisite
        for IRQ forwarding)

      - some tracepoint improvements

      - a tweak for the EL2 panic handlers

      - some more VGIC cleanups getting rid of redundant state

  x86:
     Quite a few changes:

      - support for VT-d posted interrupts (i.e. PCI devices can inject
        interrupts directly into vCPUs).  This introduces a new
        component (in virt/lib/) that connects VFIO and KVM together.
        The same infrastructure will be used for ARM interrupt
        forwarding as well.

      - more Hyper-V features, though the main one Hyper-V synthetic
        interrupt controller will have to wait for 4.5.  These will let
        KVM expose Hyper-V devices.

      - nested virtualization now supports VPID (same as PCID but for
        vCPUs) which makes it quite a bit faster

      - for future hardware that supports NVDIMM, there is support for
        clflushopt, clwb, pcommit

      - support for "split irqchip", i.e.  LAPIC in kernel +
        IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
        the hypervisor

      - obligatory smattering of SMM fixes

      - on the guest side, stable scheduler clock support was rewritten
        to not require help from the hypervisor"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
  KVM: VMX: Fix commit which broke PML
  KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
  KVM: x86: allow RSM from 64-bit mode
  KVM: VMX: fix SMEP and SMAP without EPT
  KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
  KVM: device assignment: remove pointless #ifdefs
  KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
  KVM: x86: zero apic_arb_prio on reset
  drivers/hv: share Hyper-V SynIC constants with userspace
  KVM: x86: handle SMBASE as physical address in RSM
  KVM: x86: add read_phys to x86_emulate_ops
  KVM: x86: removing unused variable
  KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
  KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
  KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
  KVM: arm/arm64: Optimize away redundant LR tracking
  KVM: s390: use simple switch statement as multiplexer
  KVM: s390: drop useless newline in debugging data
  KVM: s390: SCA must not cross page boundaries
  KVM: arm: Do not indent the arguments of DECLARE_BITMAP
  ...
2015-11-05 16:26:26 -08:00
Thomas Gleixner 72613184a1 x86/smp: Remove single IPI wrapper
All APIC implementation have send_IPI now. Remove the conditional in
the calling code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.807817097@linutronix.de
2015-11-05 13:07:54 +01:00
Thomas Gleixner 6153058a03 x86/apic: Use default send single IPI wrapper
Wire up the default_send_IPI_single() wrapper to the last holdouts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.711224890@linutronix.de
2015-11-05 13:07:53 +01:00
Thomas Gleixner 7e29393b20 x86/apic: Provide default send single IPI wrapper
Instead of doing the wrapping in the smp code we can provide a default
wrapper for those APICs which insist on cpumasks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.631111846@linutronix.de
2015-11-05 13:07:53 +01:00
Thomas Gleixner 4727da2eb1 x86/apic: Implement single IPI for apic_noop
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.455429817@linutronix.de
2015-11-05 13:07:53 +01:00
Thomas Gleixner c61a0d31ba x86/apic: Wire up single IPI for apic_numachip
The function already exists.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.551445489@linutronix.de
2015-11-05 13:07:53 +01:00
Thomas Gleixner 8642ea953d x86/apic: Wire up single IPI for x2apic_uv
The function already exists.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.376775625@linutronix.de
2015-11-05 13:07:53 +01:00
Thomas Gleixner f2bffe8a3e x86/apic: Implement single IPI for x2apic_phys
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.296438009@linutronix.de
2015-11-05 13:07:53 +01:00
Thomas Gleixner 5789a12e28 x86/apic: Wire up single IPI for bigsmp_apic
Use the default implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.213292642@linutronix.de
2015-11-05 13:07:52 +01:00
Thomas Gleixner 500bd02fb1 x86/apic: Remove pointless indirections from bigsmp_apic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.133086575@linutronix.de
2015-11-05 13:07:52 +01:00
Thomas Gleixner 68cd88ff8d x86/apic: Wire up single IPI for apic_physflat
Use the default implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.055046864@linutronix.de
2015-11-05 13:07:52 +01:00
Thomas Gleixner 449112f4f3 x86/apic: Remove pointless indirections from apic_physflat
No value in having 32 byte extra text.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220848.975653382@linutronix.de
2015-11-05 13:07:52 +01:00
Thomas Gleixner 53be0fac8b x86/apic: Implement default single target IPI function
apic_physflat and bigsmp_apic can share that implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220848.898543767@linutronix.de
2015-11-05 13:07:52 +01:00
Linus Torvalds 7b6ce46cb3 x86/apic: Implement single target IPI function for x2apic_cluster
[ tglx: Split it out from the patch which provides the new callback ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220848.817975597@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-05 13:07:52 +01:00
Linus Torvalds 539da78772 x86/apic: Add a single-target IPI function to the apic
We still fall back on the "send mask" versions if an apic definition
doesn't have the single-target version, but at least this allows the
(trivial) case for the common clustered x2apic case.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220848.737120838@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-05 13:07:51 +01:00
Linus Torvalds 0d51ce9ca1 Power management and ACPI updates for v4.4-rc1
- ACPICA update to upstream revision 20150930 (Bob Moore, Lv Zheng).
 
    The most significant change is to allow the AML debugger to be
    built into the kernel.  On top of that there is an update related
    to the NFIT table (the ACPI persistent memory interface)
    and a few fixes and cleanups.
 
  - ACPI CPPC2 (Collaborative Processor Performance Control v2)
    support along with a cpufreq frontend (Ashwin Chaugule).
 
    This can only be enabled on ARM64 at this point.
 
  - New ACPI infrastructure for the early probing of IRQ chips and
    clock sources (Marc Zyngier).
 
  - Support for a new hierarchical properties extension of the ACPI
    _DSD (Device Specific Data) device configuration object allowing
    the kernel to handle hierarchical properties (provided by the
    platform firmware this way) automatically and make them available
    to device drivers via the generic device properties interface
    (Rafael Wysocki).
 
  - Generic device properties API extension to obtain an index of
    certain string value in an array of strings, along the lines of
    of_property_match_string(), but working for all of the supported
    firmware node types, and support for the "dma-names" device
    property based on it (Mika Westerberg).
 
  - ACPI core fix to parse the MADT (Multiple APIC Description Table)
    entries in the order expected by platform firmware (and mandated
    by the specification) to avoid confusion on systems with more than
    255 logical CPUs (Lukasz Anaczkowski).
 
  - Consolidation of the ACPI-based handling of PCI host bridges
    on x86 and ia64 (Jiang Liu).
 
  - ACPI core fixes to ensure that the correct IRQ number is used to
    represent the SCI (System Control Interrupt) in the cases when
    it has been re-mapped (Chen Yu).
 
  - New ACPI backlight quirk for Lenovo IdeaPad S405 (Hans de Goede).
 
  - ACPI EC driver fixes (Lv Zheng).
 
  - Assorted ACPI fixes and cleanups (Dan Carpenter, Insu Yun, Jiri
    Kosina, Rami Rosen, Rasmus Villemoes).
 
  - New mechanism in the PM core allowing drivers to check if the
    platform firmware is going to be involved in the upcoming system
    suspend or if it has been involved in the suspend the system is
    resuming from at the moment (Rafael Wysocki).
 
    This should allow drivers to optimize their suspend/resume
    handling in some cases and the changes include a couple of users
    of it (the i8042 input driver, PCI PM).
 
  - PCI PM fix to prevent runtime-suspended devices with PME enabled
    from being resumed during system suspend even if they aren't
    configured to wake up the system from sleep (Rafael Wysocki).
 
  - New mechanism to report the number of a wakeup IRQ that woke up
    the system from sleep last time (Alexandra Yates).
 
  - Removal of unused interfaces from the generic power domains
    framework and fixes related to latency measurements in that
    code (Ulf Hansson, Daniel Lezcano).
 
  - cpufreq core sysfs interface rework to make it handle CPUs that
    share performance scaling settings (represented by a common
    cpufreq policy object) more symmetrically (Viresh Kumar).
 
    This should help to simplify the CPU offline/online handling among
    other things.
 
  - cpufreq core fixes and cleanups (Viresh Kumar).
 
  - intel_pstate fixes related to the Turbo Activation Ratio (TAR)
    mechanism on client platforms which causes the turbo P-states
    range to vary depending on platform firmware settings (Srinivas
    Pandruvada).
 
  - intel_pstate sysfs interface fix (Prarit Bhargava).
 
  - Assorted cpufreq driver (imx, tegra20, powernv, integrator) fixes
    and cleanups (Bai Ping, Bartlomiej Zolnierkiewicz, Shilpasri G
    Bhat, Luis de Bethencourt).
 
  - cpuidle mvebu driver cleanups (Russell King).
 
  - OPP (Operating Performance Points) framework code reorganization
    to make it more maintainable (Viresh Kumar).
 
  - Intel Broxton support for the RAPL (Running Average Power Limits)
    power capping driver (Amy Wiles).
 
  - Assorted power management code fixes and cleanups (Dan Carpenter,
    Geert Uytterhoeven, Geliang Tang, Luis de Bethencourt, Rasmus
    Villemoes).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJWOC9oAAoJEILEb/54YlRx/c8P/joflwoFsISwJccG62YTQMuc
 bMQKM4Kw0vl5La8+pkLpe5t6+mW7l81UFtYF6Dzd8LOKlD9sszD34z1lHmCeT/oR
 wn0uZpHagRyLMUfoyiEtlU/VRU6WQNNtS3EgjwUi7xgFz9Q0pjcCZ9OQ6vKov1j5
 +6j40ODif5sgo+2vl+rztJiV0SIMkYdkgNqgfN1FE9bdLA2Zkk+PxxJbtGQORuDu
 O/K+XhQT2xWquVWi/1p+VtQxs5glBS1oKm0kogV5bElCvNTRNIVABUNcjogITQwo
 QSAKgoCKIoaIl5jtDT6u5dc0y67q/dMtqOY9fOCcOz1Z7jbWQzR8D7mpFWIsJUPK
 K2LClI3t85ynpN6Jref246A6+C9nwB8JMAiAR11oBw7WbBlkd6tbRgcT5B+iz8UE
 FuCCif7pha/Fs+Jt1YRazscIqteQ2bAhhxikuIPMfw2M6M67MNfVNeKA1bAoSM34
 dH7JsilblitvV7shrwJHwXPXCOF2jEPoK8I4/q2+TR5qUxEpRJjelQxXGSaQScMZ
 iNnjeTgv8H8q+rY5Yjzsl4pxP0Fvf7IuqkptWOJbgepg4cQc9pS87wOpY3uEeQzr
 H7ruaQJFCnLO4aXbPNClsiJARhrBk+qMlsh4vBEyCJ2T0ucb+nIUcN4BTi8t85yl
 X97BfHHUiDoUrnIsNids
 =1gaH
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.4-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "Quite a new features are included this time.

  First off, the Collaborative Processor Performance Control interface
  (version 2) defined by ACPI will now be supported on ARM64 along with
  a cpufreq frontend for CPU performance scaling.

  Second, ACPI gets a new infrastructure for the early probing of IRQ
  chips and clock sources (along the lines of the existing similar
  mechanism for DT).

  Next, the ACPI core and the generic device properties API will now
  support a recently introduced hierarchical properties extension of the
  _DSD (Device Specific Data) ACPI device configuration object.  If the
  ACPI platform firmware uses that extension to organize device
  properties in a hierarchical way, the kernel will automatically handle
  it and make those properties available to device drivers via the
  generic device properties API.

  It also will be possible to build the ACPICA's AML interpreter
  debugger into the kernel now and use that to diagnose AML-related
  problems more efficiently.  In the future, this should make it
  possible to single-step AML execution and do similar things.
  Interesting stuff, although somewhat experimental at this point.

  Finally, the PM core gets a new mechanism that can be used by device
  drivers to distinguish between suspend-to-RAM (based on platform
  firmware support) and suspend-to-idle (or other variants of system
  suspend the platform firmware is not involved in) and possibly
  optimize their device suspend/resume handling accordingly.

  In addition to that, some existing features are re-organized quite
  substantially.

  First, the ACPI-based handling of PCI host bridges on x86 and ia64 is
  unified and the common code goes into the ACPI core (so as to reduce
  code duplication and eliminate non-essential differences between the
  two architectures in that area).

  Second, the Operating Performance Points (OPP) framework is
  reorganized to make the code easier to find and follow.

  Next, the cpufreq core's sysfs interface is reorganized to get rid of
  the "primary CPU" concept for configurations in which the same
  performance scaling settings are shared between multiple CPUs.

  Finally, some interfaces that aren't necessary any more are dropped
  from the generic power domains framework.

  On top of the above we have some minor extensions, cleanups and bug
  fixes in multiple places, as usual.

  Specifics:

   - ACPICA update to upstream revision 20150930 (Bob Moore, Lv Zheng).

     The most significant change is to allow the AML debugger to be
     built into the kernel.  On top of that there is an update related
     to the NFIT table (the ACPI persistent memory interface) and a few
     fixes and cleanups.

   - ACPI CPPC2 (Collaborative Processor Performance Control v2) support
     along with a cpufreq frontend (Ashwin Chaugule).

     This can only be enabled on ARM64 at this point.

   - New ACPI infrastructure for the early probing of IRQ chips and
     clock sources (Marc Zyngier).

   - Support for a new hierarchical properties extension of the ACPI
     _DSD (Device Specific Data) device configuration object allowing
     the kernel to handle hierarchical properties (provided by the
     platform firmware this way) automatically and make them available
     to device drivers via the generic device properties interface
     (Rafael Wysocki).

   - Generic device properties API extension to obtain an index of
     certain string value in an array of strings, along the lines of
     of_property_match_string(), but working for all of the supported
     firmware node types, and support for the "dma-names" device
     property based on it (Mika Westerberg).

   - ACPI core fix to parse the MADT (Multiple APIC Description Table)
     entries in the order expected by platform firmware (and mandated by
     the specification) to avoid confusion on systems with more than 255
     logical CPUs (Lukasz Anaczkowski).

   - Consolidation of the ACPI-based handling of PCI host bridges on x86
     and ia64 (Jiang Liu).

   - ACPI core fixes to ensure that the correct IRQ number is used to
     represent the SCI (System Control Interrupt) in the cases when it
     has been re-mapped (Chen Yu).

   - New ACPI backlight quirk for Lenovo IdeaPad S405 (Hans de Goede).

   - ACPI EC driver fixes (Lv Zheng).

   - Assorted ACPI fixes and cleanups (Dan Carpenter, Insu Yun, Jiri
     Kosina, Rami Rosen, Rasmus Villemoes).

   - New mechanism in the PM core allowing drivers to check if the
     platform firmware is going to be involved in the upcoming system
     suspend or if it has been involved in the suspend the system is
     resuming from at the moment (Rafael Wysocki).

     This should allow drivers to optimize their suspend/resume handling
     in some cases and the changes include a couple of users of it (the
     i8042 input driver, PCI PM).

   - PCI PM fix to prevent runtime-suspended devices with PME enabled
     from being resumed during system suspend even if they aren't
     configured to wake up the system from sleep (Rafael Wysocki).

   - New mechanism to report the number of a wakeup IRQ that woke up the
     system from sleep last time (Alexandra Yates).

   - Removal of unused interfaces from the generic power domains
     framework and fixes related to latency measurements in that code
     (Ulf Hansson, Daniel Lezcano).

   - cpufreq core sysfs interface rework to make it handle CPUs that
     share performance scaling settings (represented by a common cpufreq
     policy object) more symmetrically (Viresh Kumar).

     This should help to simplify the CPU offline/online handling among
     other things.

   - cpufreq core fixes and cleanups (Viresh Kumar).

   - intel_pstate fixes related to the Turbo Activation Ratio (TAR)
     mechanism on client platforms which causes the turbo P-states range
     to vary depending on platform firmware settings (Srinivas
     Pandruvada).

   - intel_pstate sysfs interface fix (Prarit Bhargava).

   - Assorted cpufreq driver (imx, tegra20, powernv, integrator) fixes
     and cleanups (Bai Ping, Bartlomiej Zolnierkiewicz, Shilpasri G
     Bhat, Luis de Bethencourt).

   - cpuidle mvebu driver cleanups (Russell King).

   - OPP (Operating Performance Points) framework code reorganization to
     make it more maintainable (Viresh Kumar).

   - Intel Broxton support for the RAPL (Running Average Power Limits)
     power capping driver (Amy Wiles).

   - Assorted power management code fixes and cleanups (Dan Carpenter,
     Geert Uytterhoeven, Geliang Tang, Luis de Bethencourt, Rasmus
     Villemoes)"

* tag 'pm+acpi-4.4-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (108 commits)
  cpufreq: postfix policy directory with the first CPU in related_cpus
  cpufreq: create cpu/cpufreq/policyX directories
  cpufreq: remove cpufreq_sysfs_{create|remove}_file()
  cpufreq: create cpu/cpufreq at boot time
  cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask
  cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  PM / Domains: Merge measurements for PM QoS device latencies
  PM / Domains: Don't measure ->start|stop() latency in system PM callbacks
  PM / clk: Fix broken build due to non-matching code and header #ifdefs
  ACPI / Documentation: add copy_dsdt to ACPI format options
  ACPI / sysfs: correctly check failing memory allocation
  ACPI / video: Add a quirk to force native backlight on Lenovo IdeaPad S405
  ACPI / CPPC: Fix potential memory leak
  ACPI / CPPC: signedness bug in register_pcc_channel()
  ACPI / PAD: power_saving_thread() is not freezable
  ACPI / PM: Fix incorrect wakeup IRQ setting during suspend-to-idle
  ACPI: Using correct irq when waiting for events
  ACPI: Use correct IRQ when uninstalling ACPI interrupt handler
  cpuidle: mvebu: disable the bind/unbind attributes and use builtin_platform_driver
  cpuidle: mvebu: clean up multiple platform drivers
  ...
2015-11-04 18:10:13 -08:00
Linus Torvalds 4302d506d5 Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 sigcontext header cleanups from Ingo Molnar:
 "This series reorganizes and cleans up various aspects of the main
  sigcontext UAPI headers, such as unifying the data structures and
  updating/adding lots of comments to explain all the ABI details and
  quirks.  The headers can now also be built in user-space standalone"

* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/headers: Clean up too long lines
  x86/headers: Remove <asm/sigcontext.h> references on the kernel side
  x86/headers: Remove direct sigcontext32.h uses
  x86/headers: Convert sigcontext_ia32 uses to sigcontext_32
  x86/headers: Unify 'struct sigcontext_ia32' and 'struct sigcontext_32'
  x86/headers: Make sigcontext pointers bit independent
  x86/headers: Move the 'struct sigcontext' definitions into the UAPI header
  x86/headers: Clean up the kernel's struct sigcontext types to be ABI-clean
  x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32
  x86/headers: Unify 'struct _fpstate_ia32' and i386 struct _fpstate
  x86/headers: Unify register type definitions between 32-bit compat and i386
  x86/headers: Use ABI types consistently in sigcontext*.h
  x86/headers: Separate out legacy user-space structure definitions
  x86/headers: Clean up and better document uapi/asm/sigcontext.h
  x86/headers: Clean up uapi/asm/sigcontext32.h
  x86/headers: Fix (old) header file dependency bug in uapi/asm/sigcontext32.h
2015-11-03 21:05:40 -08:00
Linus Torvalds ce4d72fac1 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu changes from Ingo Molnar:
 "There are two main areas of changes:

   - Rework of the extended FPU state code to robustify the kernel's
     usage of cpuid provided xstate sizes - and related changes (Dave
     Hansen)"

   - math emulation enhancements: new modern FPU instructions support,
     with testcases, plus cleanups (Denys Vlasnko)"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/fpu: Fixup uninitialized feature_name warning
  x86/fpu/math-emu: Add support for FISTTP instructions
  x86/fpu/math-emu, selftests: Add test for FISTTP instructions
  x86/fpu/math-emu: Add support for FCMOVcc insns
  x86/fpu/math-emu: Add support for F[U]COMI[P] insns
  x86/fpu/math-emu: Remove define layer for undocumented opcodes
  x86/fpu/math-emu, selftests: Add tests for FCMOV and FCOMI insns
  x86/fpu/math-emu: Remove !NO_UNDOC_CODE
  x86/fpu: Check CPU-provided sizes against struct declarations
  x86/fpu: Check to ensure increasing-offset xstate offsets
  x86/fpu: Correct and check XSAVE xstate size calculations
  x86/fpu: Add C structures for AVX-512 state components
  x86/fpu: Rework YMM definition
  x86/fpu/mpx: Rework MPX 'xstate' types
  x86/fpu: Add xfeature_enabled() helper instead of test_bit()
  x86/fpu: Remove 'xfeature_nr'
  x86/fpu: Rework XSTATE_* macros to remove magic '2'
  x86/fpu: Rename XFEATURES_NR_MAX
  x86/fpu: Rename XSAVE macros
  x86/fpu: Remove partial LWP support definitions
  ...
2015-11-03 20:50:26 -08:00
Linus Torvalds 0f25f2c1b1 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kgdb fixlet from Ingo Molnar:
 "A single debugging related commit: compress the memory usage of a kgdb
  data structure"

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kgdb: Replace bool_int_array[NR_CPUS] with bitmap
2015-11-03 20:12:10 -08:00
Linus Torvalds f323c49b30 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu changes from Ingo Molnar:
 "Two changes in this cycle: a Kconfig help text enhancement, and an AMD
  CLZERO instruction capability detection and enumeration"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add CLZERO detection
  x86/Kconfig/cpus: Fix/complete CPU type help texts
2015-11-03 19:39:42 -08:00
Linus Torvalds 33d46f9765 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "An early_printk cleanup plus deinlining enhancements"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/early_printk: Set __iomem address space for IO
  x86/signal: Deinline get_sigframe, save 240 bytes
  x86: Deinline early_console_register, save 403 bytes
  x86/e820: Deinline e820_type_to_string, save 126 bytes
2015-11-03 19:34:22 -08:00
Linus Torvalds 378e4e9825 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot cleanup from Ingo Molnar:
 "A single commit: remove an obsolete kcrash boot flag"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kexec: Remove obsolete 'in_crash_kexec' flag
2015-11-03 19:28:37 -08:00
Linus Torvalds a75a3f6fc9 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
 "The main change in this cycle is another step in the big x86 system
  call interface rework by Andy Lutomirski, which moves most of the low
  level x86 entry code from assembly to C, for all syscall entries
  except native 64-bit system calls:

    arch/x86/entry/entry_32.S        | 182 ++++------
    arch/x86/entry/entry_64_compat.S | 547 ++++++++-----------------------
    194 insertions(+), 535 deletions(-)

  ... our hope is that the final remaining step (converting native
  64-bit system calls) will be less painful as all the previous steps,
  given that most of the legacies and quirks are concentrated around
  native 32-bit and compat environments"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
  x86/entry/32: Fix FS and GS restore in opportunistic SYSEXIT
  x86/entry/32: Fix entry_INT80_32() to expect interrupts to be on
  um/x86: Fix build after x86 syscall changes
  x86/asm: Remove the xyz_cfi macros from dwarf2.h
  selftests/x86: Style fixes for the 'unwind_vdso' test
  x86/entry/64/compat: Document sysenter_fix_flags's reason for existence
  x86/entry: Split and inline syscall_return_slowpath()
  x86/entry: Split and inline prepare_exit_to_usermode()
  x86/entry: Use pt_regs_to_thread_info() in syscall entry tracing
  x86/entry: Hide two syscall entry assertions behind CONFIG_DEBUG_ENTRY
  x86/entry: Micro-optimize compat fast syscall arg fetch
  x86/entry: Force inlining of 32-bit syscall code
  x86/entry: Make irqs_disabled checks in exit code depend on lockdep
  x86/entry: Remove unnecessary IRQ twiddling in fast 32-bit syscalls
  x86/asm: Remove thread_info.sysenter_return
  x86/entry/32: Re-implement SYSENTER using the new C path
  x86/entry/32: Switch INT80 to the new C syscall path
  x86/entry/32: Open-code return tracking from fork and kthreads
  x86/entry/compat: Implement opportunistic SYSRETL for compat syscalls
  x86/vdso/compat: Wire up SYSENTER and SYSCSALL for compat userspace
  ...
2015-11-03 18:59:10 -08:00
Linus Torvalds d2bea739f8 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic changes from Ingo Molnar:
 "The main changes in this cycle were:

   - Numachip updates: new hardware support, fixes and cleanups.
     (Daniel J Blueman)

   - misc smaller cleanups and fixlets"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/io_apic: Make eoi_ioapic_pin() static
  x86/irq: Drop unlikely before IS_ERR_OR_NULL
  x86/x2apic: Make stub functions available even if !CONFIG_X86_LOCAL_APIC
  x86/apic: Deinline various functions
  x86/numachip: Fix timer build conflict
  x86/numachip: Introduce Numachip2 timer mechanisms
  x86/numachip: Add Numachip IPI optimisations
  x86/numachip: Add Numachip2 APIC support
  x86/numachip: Cleanup Numachip support
2015-11-03 18:33:15 -08:00
Linus Torvalds 53528695ff Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The main changes in this cycle were:

   - sched/fair load tracking fixes and cleanups (Byungchul Park)

   - Make load tracking frequency scale invariant (Dietmar Eggemann)

   - sched/deadline updates (Juri Lelli)

   - stop machine fixes, cleanups and enhancements for bugs triggered by
     CPU hotplug stress testing (Oleg Nesterov)

   - scheduler preemption code rework: remove PREEMPT_ACTIVE and related
     cleanups (Peter Zijlstra)

   - Rework the sched_info::run_delay code to fix races (Peter Zijlstra)

   - Optimize per entity utilization tracking (Peter Zijlstra)

   - ... misc other fixes, cleanups and smaller updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  sched: Don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS
  sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop()
  sched: Start stopper early
  stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark()
  stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark()
  stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled
  stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works()
  stop_machine: Ensure that a queued callback will be called before cpu_stop_park()
  sched/x86: Fix typo in __switch_to() comments
  sched/core: Remove a parameter in the migrate_task_rq() function
  sched/core: Drop unlikely behind BUG_ON()
  sched/core: Fix task and run queue sched_info::run_delay inconsistencies
  sched/numa: Fix task_tick_fair() from disabling numa_balancing
  sched/core: Add preempt_count invariant check
  sched/core: More notrace annotations
  sched/core: Kill PREEMPT_ACTIVE
  sched/core, sched/x86: Kill thread_info::saved_preempt_count
  sched/core: Simplify preempt_count tests
  sched/core: Robustify preemption leak checks
  sched/core: Stop setting PREEMPT_ACTIVE
  ...
2015-11-03 18:03:50 -08:00
Linus Torvalds b831ef2cad Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS changes from Ingo Molnar:
 "The main system reliability related changes were from x86, but also
  some generic RAS changes:

   - AMD MCE error injection subsystem enhancements.  (Aravind
     Gopalakrishnan)

   - Fix MCE and CPU hotplug interaction bug.  (Ashok Raj)

   - kcrash bootup robustness fix.  (Baoquan He)

   - kcrash cleanups.  (Borislav Petkov)

   - x86 microcode driver rework: simplify it by unmodularizing it and
     other cleanups.  (Borislav Petkov)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init()
  x86/mce: Add a Scalable MCA vendor flags bit
  MAINTAINERS: Unify the microcode driver section
  x86/microcode/intel: Move #ifdef DEBUG inside the function
  x86/microcode/amd: Remove maintainers from comments
  x86/microcode: Remove modularization leftovers
  x86/microcode: Merge the early microcode loader
  x86/microcode: Unmodularize the microcode driver
  x86/mce: Fix thermal throttling reporting after kexec
  kexec/crash: Say which char is the unrecognized
  x86/setup/crash: Check memblock_reserve() retval
  x86/setup/crash: Cleanup some more
  x86/setup/crash: Remove alignment variable
  x86/setup: Cleanup crashkernel reservation functions
  x86/amd_nb, EDAC: Rename amd_get_node_id()
  x86/setup: Do not reserve crashkernel high memory if low reservation failed
  x86/microcode/amd: Do not overwrite final patch levels
  x86/microcode/amd: Extract current patch level read to a function
  x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC
  x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts
  ...
2015-11-03 17:51:33 -08:00
Linus Torvalds b02ac6b18c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Improve accuracy of perf/sched clock on x86.  (Adrian Hunter)

   - Intel DS and BTS updates.  (Alexander Shishkin)

   - Intel cstate PMU support.  (Kan Liang)

   - Add group read support to perf_event_read().  (Peter Zijlstra)

   - Branch call hardware sampling support, implemented on x86 and
     PowerPC.  (Stephane Eranian)

   - Event groups transactional interface enhancements.  (Sukadev
     Bhattiprolu)

   - Enable proper x86/intel/uncore PMU support on multi-segment PCI
     systems.  (Taku Izumi)

   - ... misc fixes and cleanups.

  The perf tooling team was very busy again with 200+ commits, the full
  diff doesn't fit into lkml size limits.  Here's an (incomplete) list
  of the tooling highlights:

  New features:

   - Change the default event used in all tools (record/top): use the
     most precise "cycles" hw counter available, i.e. when the user
     doesn't specify any event, it will try using cycles:ppp, cycles:pp,
     etc and fall back transparently until it finds a working counter.
     (Arnaldo Carvalho de Melo)

   - Integration of perf with eBPF that, given an eBPF .c source file
     (or .o file built for the 'bpf' target with clang), will get it
     automatically built, validated and loaded into the kernel via the
     sys_bpf syscall, which can then be used and seen using 'perf trace'
     and other tools.

     (Wang Nan)

  Various user interface improvements:

   - Automatic pager invocation on long help output.  (Namhyung Kim)

   - Search for more options when passing args to -h, e.g.: (Arnaldo
     Carvalho de Melo)

        $ perf report -h interface

        Usage: perf report [<options>]

         --gtk    Use the GTK2 interface
         --stdio  Use the stdio interface
         --tui    Use the TUI interface

   - Show ordered command line options when -h is used or when an
     unknown option is specified.  (Arnaldo Carvalho de Melo)

   - If options are passed after -h, show just its descriptions, not all
     options.  (Arnaldo Carvalho de Melo)

   - Implement column based horizontal scrolling in the hists browser
     (top, report), making it possible to use the TUI for things like
     'perf mem report' where there are many more columns than can fit in
     a terminal.  (Arnaldo Carvalho de Melo)

   - Enhance the error reporting of tracepoint event parsing, e.g.:

       $ oldperf record -e sched:sched_switc usleep 1
       event syntax error: 'sched:sched_switc'
                            \___ unknown tracepoint
       Run 'perf list' for a list of valid events

     Now we get the much nicer:

       $ perf record -e sched:sched_switc ls
       event syntax error: 'sched:sched_switc'
                            \___ can't access trace events

       Error: No permissions to read /sys/kernel/debug/tracing/events/sched/sched_switc
       Hint:  Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'

     And after we have those mount point permissions fixed:

       $ perf record -e sched:sched_switc ls
       event syntax error: 'sched:sched_switc'
                            \___ unknown tracepoint

       Error: File /sys/kernel/debug/tracing/events/sched/sched_switc not found.
       Hint:  Perhaps this kernel misses some CONFIG_ setting to enable this feature?.

     I.e.  basically now the event parsing routing uses the strerror_open()
     routines introduced by and used in 'perf trace' work.  (Jiri Olsa)

   - Fail properly when pattern matching fails to find a tracepoint,
     i.e. '-e non:existent' was being correctly handled, with a proper
     error message about that not being a valid event, but '-e
     non:existent*' wasn't, fix it.  (Jiri Olsa)

   - Do event name substring search as last resort in 'perf list'.
     (Arnaldo Carvalho de Melo)

     E.g.:

       # perf list clock

       List of pre-defined events (to be used in -e):

        cpu-clock                                          [Software event]
        task-clock                                         [Software event]

        uncore_cbox_0/clockticks/                          [Kernel PMU event]
        uncore_cbox_1/clockticks/                          [Kernel PMU event]

        kvm:kvm_pvclock_update                             [Tracepoint event]
        kvm:kvm_update_master_clock                        [Tracepoint event]
        power:clock_disable                                [Tracepoint event]
        power:clock_enable                                 [Tracepoint event]
        power:clock_set_rate                               [Tracepoint event]
        syscalls:sys_enter_clock_adjtime                   [Tracepoint event]
        syscalls:sys_enter_clock_getres                    [Tracepoint event]
        syscalls:sys_enter_clock_gettime                   [Tracepoint event]
        syscalls:sys_enter_clock_nanosleep                 [Tracepoint event]
        syscalls:sys_enter_clock_settime                   [Tracepoint event]
        syscalls:sys_exit_clock_adjtime                    [Tracepoint event]
        syscalls:sys_exit_clock_getres                     [Tracepoint event]
        syscalls:sys_exit_clock_gettime                    [Tracepoint event]
        syscalls:sys_exit_clock_nanosleep                  [Tracepoint event]
        syscalls:sys_exit_clock_settime                    [Tracepoint event]

  Intel PT hardware tracing enhancements:

   - Accept a zero --itrace period, meaning "as often as possible".  In
     the case of Intel PT that is the same as a period of 1 and a unit
     of 'instructions' (i.e.  --itrace=i1i).  (Adrian Hunter)

   - Harmonize itrace's synthesized callchains with the existing
     --max-stack tool option.  (Adrian Hunter)

   - Allow time to be displayed in nanoseconds in 'perf script'.
     (Adrian Hunter)

   - Fix potential infinite loop when handling Intel PT timestamps.
     (Adrian Hunter)

   - Slighly improve Intel PT debug logging.  (Adrian Hunter)

   - Warn when AUX data has been lost, just like when processing
     PERF_RECORD_LOST.  (Adrian Hunter)

   - Further document export-to-postgresql.py script.  (Adrian Hunter)

   - Add option to synthesize branch stack from auxtrace data.  (Adrian
     Hunter)

  Misc notable changes:

   - Switch the default callchain output mode to 'graph,0.5,caller', to
     make it look like the default for other tools, reducing the
     learning curve for people used to 'caller' based viewing.  (Arnaldo
     Carvalho de Melo)

   - various call chain usability enhancements.  (Namhyung Kim)

   - Introduce the 'P' event modifier, meaning 'max precision level,
     please', i.e.:

        $ perf record -e cycles:P usleep 1

     Is now similar to:

        $ perf record usleep 1

     Useful, for instance, when specifying multiple events.  (Jiri Olsa)

   - Add 'socket' sort entry, to sort by the processor socket in 'perf
     top' and 'perf report'.  (Kan Liang)

   - Introduce --socket-filter to 'perf report', for filtering by
     processor socket.  (Kan Liang)

   - Add new "Zoom into Processor Socket" operation in the perf hists
     browser, used in 'perf top' and 'perf report'.  (Kan Liang)

   - Allow probing on kmodules without DWARF.  (Masami Hiramatsu)

   - Fix 'perf probe -l' for probes added to kernel module functions.
     (Masami Hiramatsu)

   - Preparatory work for the 'perf stat record' feature that will allow
     generating perf.data files with counting data in addition to the
     sampling mode we have now (Jiri Olsa)

   - Update libtraceevent KVM plugin.  (Paolo Bonzini)

   - ... plus lots of other enhancements that I failed to list properly,
     by: Adrian Hunter, Alexander Shishkin, Andi Kleen, Andrzej Hajda,
     Arnaldo Carvalho de Melo, Dima Kogan, Don Zickus, Geliang Tang, He
     Kuang, Huaitong Han, Ingo Molnar, Jan Stancek, Jiri Olsa, Kan
     Liang, Kirill Tkhai, Masami Hiramatsu, Matt Fleming, Namhyung Kim,
     Paolo Bonzini, Peter Zijlstra, Rabin Vincent, Scott Wood, Stephane
     Eranian, Sukadev Bhattiprolu, Taku Izumi, Vaishali Thakkar, Wang
     Nan, Yang Shi and Yunlong Song"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (260 commits)
  perf unwind: Pass symbol source to libunwind
  tools build: Fix libiberty feature detection
  perf tools: Compile scriptlets to BPF objects when passing '.c' to --event
  perf record: Add clang options for compiling BPF scripts
  perf bpf: Attach eBPF filter to perf event
  perf tools: Make sure fixdep is built before libbpf
  perf script: Enable printing of branch stack
  perf trace: Add cmd string table to decode sys_bpf first arg
  perf bpf: Collect perf_evsel in BPF object files
  perf tools: Load eBPF object into kernel
  perf tools: Create probe points for BPF programs
  perf tools: Enable passing bpf object file to --event
  perf ebpf: Add the libbpf glue
  perf tools: Make perf depend on libbpf
  perf symbols: Fix endless loop in dso__split_kallsyms_for_kcore
  perf tools: Enable pre-event inherit setting by config terms
  perf symbols: we can now read separate debug-info files based on a build ID
  perf symbols: Fix type error when reading a build-id
  perf tools: Search for more options when passing args to -h
  perf stat: Cache aggregated map entries in extra cpumap
  ...
2015-11-03 17:38:09 -08:00
Linus Torvalds f5a8160c1e Merge branch 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI changes from Ingo Molnar:
 "The main changes in this cycle were:

   - further EFI code generalization to make it more workable for ARM64
   - various extensions, such as 64-bit framebuffer address support,
     UEFI v2.5 EFI_PROPERTIES_TABLE support
   - code modularization simplifications and cleanups
   - new debugging parameters
   - various fixes and smaller additions"

* 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  efi: Fix warning of int-to-pointer-cast on x86 32-bit builds
  efi: Use correct type for struct efi_memory_map::phys_map
  x86/efi: Fix kernel panic when CONFIG_DEBUG_VIRTUAL is enabled
  efi: Add "efi_fake_mem" boot option
  x86/efi: Rename print_efi_memmap() to efi_print_memmap()
  efi: Auto-load the efi-pstore module
  efi: Introduce EFI_NX_PE_DATA bit and set it from properties table
  efi: Add support for UEFIv2.5 Properties table
  efi: Add EFI_MEMORY_MORE_RELIABLE support to efi_md_typeattr_format()
  efifb: Add support for 64-bit frame buffer addresses
  efi/arm64: Clean up efi_get_fdt_params() interface
  arm64: Use core efi=debug instead of uefi_debug command line parameter
  efi/x86: Move efi=debug option parsing to core
  drivers/firmware: Make efi/esrt.c driver explicitly non-modular
  efi: Use the generic efi.memmap instead of 'memmap'
  acpi/apei: Use appropriate pgprot_t to map GHES memory
  arm64, acpi/apei: Implement arch_apei_get_mem_attributes()
  arm64/mm: Add PROT_DEVICE_nGnRnE and PROT_NORMAL_WT
  acpi, x86: Implement arch_apei_get_mem_attributes()
  efi, x86: Rearrange efi_mem_attributes()
  ...
2015-11-03 15:05:52 -08:00
Linus Torvalds 7b2a4306f9 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The timer departement provides:

   - More y2038 work in the area of ntp and pps.

   - Optimization of posix cpu timers

   - New time related selftests

   - Some new clocksource drivers

   - The usual pile of fixes, cleanups and improvements"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  timeconst: Update path in comment
  timers/x86/hpet: Type adjustments
  clocksource/drivers/armada-370-xp: Implement ARM delay timer
  clocksource/drivers/tango_xtal: Add new timer for Tango SoCs
  clocksource/drivers/imx: Allow timer irq affinity change
  clocksource/drivers/exynos_mct: Use container_of() instead of this_cpu_ptr()
  clocksource/drivers/h8300_*: Remove unneeded memset()s
  clocksource/drivers/sh_cmt: Remove unneeded memset() in sh_cmt_setup()
  clocksource/drivers/em_sti: Remove unneeded memset()s
  clocksource/drivers/mediatek: Use GPT as sched clock source
  clockevents/drivers/mtk: Fix spurious interrupt leading to crash
  posix_cpu_timer: Reduce unnecessary sighand lock contention
  posix_cpu_timer: Convert cputimer->running to bool
  posix_cpu_timer: Check thread timers only when there are active thread timers
  posix_cpu_timer: Optimize fastpath_timer_check()
  timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments
  timers: Use __fls in apply_slack()
  clocksource: Remove return statement from void functions
  net: sfc: avoid using timespec
  ntp/pps: use y2038 safe types in pps_event_time
  ...
2015-11-03 14:13:41 -08:00
Wan Zongshun 2167ceabf3 x86/cpu: Add CLZERO detection
AMD Fam17h processors introduce support for the CLZERO
instruction. It zeroes out the 64 byte cache line specified in
RAX.

Add the bit here to allow /proc/cpuinfo to list the feature.

Boris: we're adding this as a separate ->x86_capability leaf
because CPUID_80000008_EBX is going to contain more feature bits
and it will fill out with time.

Signed-off-by: Wan Zongshun <Vincent.Wan@amd.com>
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
[ Wrap code in patch form, fix comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1446207099-24948-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-01 11:26:23 +01:00