Commit Graph

7545 Commits

Author SHA1 Message Date
Eric Biggers e63e5d5c15 x86/fpu: Introduce validate_xstate_header()
Move validation of user-supplied xstate_header into a helper function,
in preparation of calling it from both the ptrace and sigreturn syscall
paths.

The new function also considers it to be an error if *any* reserved bits
are set, whereas before we were just clearing most of them silently.

This should reduce the chance of bugs that fail to correctly validate
user-supplied XSAVE areas.  It also will expose any broken userspace
programs that set the other reserved bits; this is desirable because
such programs will lose compatibility with future CPUs and kernels if
those bits are ever used for anything.  (There shouldn't be any such
programs, and in fact in the case where the compacted format is in use
we were already validating xfeatures.  But you never know...)

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170924105913.9157-2-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-26 09:43:45 +02:00
Ingo Molnar 369a036de2 x86/fpu: Rename fpu__activate_fpstate_read/write() to fpu__prepare_[read|write]()
As per the new nomenclature we don't 'activate' the FPU state
anymore, we initialize it. So drop the _activate_fpstate name
from these functions, which were a bit of a mouthful anyway,
and name them:

	fpu__prepare_read()
	fpu__prepare_write()

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-26 09:43:44 +02:00
Ingo Molnar 2ce03d850b x86/fpu: Rename fpu__activate_curr() to fpu__initialize()
Rename this function to better express that it's all about
initializing the FPU state of a task which goes hand in hand
with the fpu::initialized field.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-33-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-26 09:43:44 +02:00
Ingo Molnar e4a81bfcaa x86/fpu: Rename fpu::fpstate_active to fpu::initialized
The x86 FPU code used to have a complex state machine where both the FPU
registers and the FPU state context could be 'active' (or inactive)
independently of each other - which enabled features like lazy FPU restore.

Much of this complexity is gone in the current code: now we basically can
have FPU-less tasks (kernel threads) that don't use (and save/restore) FPU
state at all, plus full FPU users that save/restore directly with no laziness
whatsoever.

But the fpu::fpstate_active still carries bits of the old complexity - meanwhile
this flag has become a simple flag that shows whether the FPU context saving
area in the thread struct is initialized and used, or not.

Rename it to fpu::initialized to express this simplicity in the name as well.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-30-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-26 09:43:36 +02:00
Ingo Molnar 685c930d6e x86/fpu: Remove fpu__current_fpstate_write_begin/end()
These functions are not used anymore, so remove them.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-29-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-26 09:42:20 +02:00
Thomas Gleixner 2cffad7bad x86/irq: Simplify hotplug vector accounting
Before a CPU is taken offline the number of active interrupt vectors on the
outgoing CPU and the number of vectors which are available on the other
online CPUs are counted and compared. If the active vectors are more than
the available vectors on the other CPUs then the CPU hot-unplug operation
is aborted. This again uses loop based search and is inaccurate.

The bitmap matrix allocator has accurate accounting information and can
tell exactly whether the vector space is sufficient or not.

Emit a message when the number of globaly reserved (unallocated) vectors is
larger than the number of available vectors after offlining a CPU because
after that point request_irq() might fail.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213156.351193962@linutronix.de
2017-09-25 20:52:02 +02:00
Thomas Gleixner 2db1f959d9 x86/vector: Handle managed interrupts proper
Managed interrupts need to reserve interrupt vectors permanently, but as
long as the interrupt is deactivated, the vector should not be active.

Reserve a new system vector, which can be used to initially initialize
MSI/DMAR/IOAPIC entries. In that situation the interrupts are disabled in
the corresponding MSI/DMAR/IOAPIC devices. So the vector should never be
sent to any CPU.

When the managed interrupt is started up, a real vector is assigned from
the managed vector space and configured in MSI/DMAR/IOAPIC.

This allows a clear separation of inactive and active modes and simplifies
the final decisions whether the global vector space is sufficient for CPU
offline operations.

The vector space can be reserved even on offline CPUs and will survive CPU
offline/online operations.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213156.104616625@linutronix.de
2017-09-25 20:52:01 +02:00
Thomas Gleixner ba224feac8 x86/vector: Untangle internal state from irq_cfg
The vector management state is not required to live in irq_cfg. irq_cfg is
only relevant for the depending irq domains (IOAPIC, DMAR, MSI ...).

The seperation of the vector management status allows to direct a shut down
interrupt to a special shutdown vector w/o confusing the internal state of
the vector management.

Preparatory change for the rework of managed interrupts and the global
vector reservation scheme.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213155.683712356@linutronix.de
2017-09-25 20:51:59 +02:00
Thomas Gleixner 8d1e3dca7d x86/vector: Add tracepoints for vector management
Add tracepoints for analysing the new vector management

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213155.357986795@linutronix.de
2017-09-25 20:51:58 +02:00
Thomas Gleixner 0fa115da40 x86/irq/vector: Initialize matrix allocator
Initialize the matrix allocator and add the proper accounting points to the
code.

No functional change, just preparation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213155.108410660@linutronix.de
2017-09-25 20:51:56 +02:00
Thomas Gleixner 9f9e3bb1cf x86/apic: Add replacement for cpu_mask_to_apicid()
As preparation for replacing the vector allocator, provide a new function
which takes a cpu number instead of a cpu mask to calculate/lookup the
resulting APIC destination id.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
2017-09-25 20:51:56 +02:00
Thomas Gleixner 3534be05e4 x86/ioapic: Mark legacy vectors at reallocation time
When the legacy PIC vectors are taken over by the IO APIC the current
vector assignement code is tricked to reuse the vector by allocating the
apic data in the early boot process. This can be avoided by marking the
allocation as legacy PIC take over. Preparatory patch for further cleanups.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213154.700501979@linutronix.de
2017-09-25 20:51:54 +02:00
Thomas Gleixner ef9e56d894 x86/ioapic: Remove obsolete post hotplug update
With single CPU affinities the post SMP boot vector update is pointless as
it will just leave the affinities on the same vectors and the same CPUs.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213154.308697243@linutronix.de
2017-09-25 20:51:52 +02:00
Thomas Gleixner 7854f82293 x86/vector: Rename used_vectors to system_vectors
used_vectors is a nisnomer as it only has the system vectors which are
excluded from the regular vector allocation marked. It's not what the name
suggests storage for the actually used vectors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213154.150209009@linutronix.de
2017-09-25 20:51:52 +02:00
Thomas Gleixner c1d1ee9ac1 x86/apic: Get rid of apic->target_cpus
The target_cpus() callback of the apic struct is not really useful. Some
APICs return cpu_online_mask and others cpus_all_mask. The latter is bogus
as it does not take holes in the cpus_possible_mask into account.

Replace it with cpus_online_mask which makes the most sense and remove the
callback.

The usage sites will be removed in a later step anyway, so get rid of it
now to have incremental changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213154.070850916@linutronix.de
2017-09-25 20:51:51 +02:00
Thomas Gleixner 72f48a3850 x86/apic: Reorganize struct apic
struct apic has just grown over time by adding function pointers in random
places. Reorganize it so it becomes more cache line friendly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213153.913642524@linutronix.de
2017-09-25 20:51:51 +02:00
Thomas Gleixner 83a105229c x86/apic: Move common APIC callbacks
Move more apic struct specific functions out of the header and the apic
management code into the common source file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213153.834421893@linutronix.de
2017-09-25 20:51:50 +02:00
Thomas Gleixner 6406350583 x86/apic: Sanitize 32/64bit APIC callbacks
The 32bit and the 64bit implementation of default_cpu_present_to_apicid()
and default_check_phys_apicid_present() are exactly the same, but
implemented and located differently.

Move them to common apic code and get rid of the pointless difference.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213153.757329991@linutronix.de
2017-09-25 20:51:50 +02:00
Thomas Gleixner 1da91779e1 x86/apic: Move APIC noop specific functions
Move more inlines to the place where they belong.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213153.677743545@linutronix.de
2017-09-25 20:51:49 +02:00
Thomas Gleixner 0801bbaac0 x86/apic: Move probe32 specific APIC functions
The apic functions which are used in probe_32.c are implemented as inlines
or in apic.c. There is no reason to have them at random places.

Move them to the actual usage site and make them static.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213153.596768194@linutronix.de
2017-09-25 20:51:49 +02:00
Thomas Gleixner 57e0aa4461 x86/apic: Sanitize return value of check_apicid_used()
The check is boolean, but the function returns unsigned long for no value.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213153.516730518@linutronix.de
2017-09-25 20:51:49 +02:00
Thomas Gleixner 727657e620 x86/apic: Sanitize return value of apic.set_apic_id()
The set_apic_id() callback returns an unsigned long value which is handed
in to apic_write() as the value argument u32.

Adjust the return value so it returns u32 right away.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213153.437208268@linutronix.de
2017-09-25 20:51:48 +02:00
Thomas Gleixner 981c2eac1c x86/apic: Deinline x2apic functions
These inline functions are used in both the cluster and the physical x2apic
code to fill in the function pointers of the apic structure. That means the
code is generated twice for no reason.

Move it to a C code and reuse it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213153.358954066@linutronix.de
2017-09-25 20:51:48 +02:00
Thomas Gleixner e4ae4c8ea7 Merge branch 'irq/core' into x86/apic
Pick up the dependencies for the vector management rework series.
2017-09-25 20:39:01 +02:00
Thomas Gleixner 7249164346 genirq/irqdomain: Update irq_domain_ops.activate() signature
The irq_domain_ops.activate() callback has no return value and no way to
tell the function that the activation is early.

The upcoming changes to support a reservation scheme which allows to assign
interrupt vectors on x86 only when the interrupt is actually requested
requires:

  - A return value, so activation can fail at request_irq() time
  
  - Information that the activate invocation is early, i.e. before
    request_irq().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Yu Chen <yu.c.chen@intel.com>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: https://lkml.kernel.org/r/20170913213152.848490816@linutronix.de
2017-09-25 20:38:24 +02:00
Dou Liyang af5768507c x86/timers: Make recalibrate_cpu_khz() void
recalibrate_cpu_khz() is called from powernow K7 and Pentium 4/Xeon
CPU freq driver. It recalibrates cpu frequency in case of SMP = n
and doesn't need to return anything.

Mark it void, also remove the #else branch.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1500003247-17368-2-git-send-email-douly.fnst@cn.fujitsu.com
2017-09-25 15:22:44 +02:00
Dou Liyang eb496063c9 x86/timers: Move the simple udelay calibration to tsc.h
Commit dd759d93f4 ("x86/timers: Add simple udelay calibration") adds
an static function in x86 boot-time initializations.

But, this function is actually related to TSC, so it should be maintained
in tsc.c, not in setup.c.

Move simple_udelay_calibration() from setup.c to tsc.c and rename it to
tsc_early_delay_calibrate for more readability.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1500003247-17368-1-git-send-email-douly.fnst@cn.fujitsu.com
2017-09-25 15:22:44 +02:00
Dou Liyang b371ae0d4a x86/apic: Remove init_bsp_APIC()
init_bsp_APIC() which works for the virtual wire mode is used in ISA irq
initialization at boot time.

With the new APIC interrupt delivery mode scheme, which initializes the
APIC before the first interrupt is expected, init_bsp_APIC() is not longer
required and can be removed.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yinghai@kernel.org
Cc: bhe@redhat.com
Link: https://lkml.kernel.org/r/1505293975-26005-13-git-send-email-douly.fnst@cn.fujitsu.com
2017-09-25 15:12:37 +02:00
Dou Liyang 34fba3e6b1 x86/init: Add intr_mode_init to x86_init_ops
X86 and XEN initialize interrupt delivery mode in different way.

To avoid conditionals, add a new x86_init_ops function which defaults to
the standard function and can be overridden by the early XEN platform code.

[ tglx: Folded the XEN part which was a separate patch to preserve
  	bisectability ]

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yinghai@kernel.org
Cc: bhe@redhat.com
Link: https://lkml.kernel.org/r/1505293975-26005-10-git-send-email-douly.fnst@cn.fujitsu.com
2017-09-25 15:03:17 +02:00
Dou Liyang 0c759131ae x86/apic: Unify interrupt mode setup for UP system
In UniProcessor kernel with UP_LATE_INIT=y, the interrupt delivery mode is
initialized in up_late_init().

Use the new unified apic_intr_mode_init() function and remove
APIC_init_uniprocessor().

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yinghai@kernel.org
Cc: bhe@redhat.com
Link: https://lkml.kernel.org/r/1505293975-26005-8-git-send-email-douly.fnst@cn.fujitsu.com
2017-09-25 15:03:16 +02:00
Dou Liyang 4f45ed9f84 x86/apic: Mark the apic_intr_mode extern for sanity check cleanup
Calling native_smp_prepare_cpus() to prepare for SMP bootup, does some
sanity checking, enables APIC mode and disables SMP feature.

Now, APIC mode setup has been unified to apic_intr_mode_init(), some sanity
checks are redundant and need to be cleanup.

Mark the apic_intr_mode extern to refine the switch and remove the
redundant sanity check.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yinghai@kernel.org
Cc: bhe@redhat.com
Link: https://lkml.kernel.org/r/1505293975-26005-7-git-send-email-douly.fnst@cn.fujitsu.com
2017-09-25 15:03:16 +02:00
Dou Liyang 4b1244b45c x86/apic: Move logical APIC ID away from apic_bsp_setup()
apic_bsp_setup() sets and returns logical APIC ID for initializing
cpu0_logical_apicid in a SMP-capable system.

The id has nothing to do with the initialization of local APIC and I/O
APIC. And apic_bsp_setup() should be called for interrupt mode setup only.

Move the id setup into a separate helper function for cleanup and mark
apic_bsp_setup() void.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yinghai@kernel.org
Cc: bhe@redhat.com
Link: https://lkml.kernel.org/r/1505293975-26005-5-git-send-email-douly.fnst@cn.fujitsu.com
2017-09-25 15:03:15 +02:00
Dou Liyang 4b1669e8d1 x86/apic: Prepare for unifying the interrupt delivery modes setup
There are three places which initialize the interrupt delivery modes:

1) init_bsp_APIC() which is called early might setup the through-local-APIC
   virtual wire mode on non SMP systems.

2) In an SMP-capable system, native_smp_prepare_cpus() tries to switch to
   symmetric I/O model.

3) In UP system with UP_LATE_INIT=y, the local APIC and I/O APIC are set up
   in smp_init().

There is no technical reason to make these initializations at random places
and run the kernel with the potentially wrong mode through the early boot
stage, but it has a problematic side effect: The late switch to symmetric
I/O mode causes dump-capture kernel to hang when the kernel command line
option 'notsc' is active.

Provide a new function to unify that three positions. Preparatory patch to
initialize an interrupt mode directly.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yinghai@kernel.org
Cc: bhe@redhat.com
Link: https://lkml.kernel.org/r/1505293975-26005-3-git-send-email-douly.fnst@cn.fujitsu.com
2017-09-25 15:03:14 +02:00
Ville Syrjälä 5ac751d9e6 x86: Don't cast away the __user in __get_user_asm_u64()
Don't cast away the __user in __get_user_asm_u64() on x86-32.
Prevents sparse getting upset.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20170912164000.13745-1-ville.syrjala@linux.intel.com
2017-09-25 09:36:16 +02:00
Eric Biggers d5c8028b47 x86/fpu: Reinitialize FPU registers if restoring FPU state fails
Userspace can change the FPU state of a task using the ptrace() or
rt_sigreturn() system calls.  Because reserved bits in the FPU state can
cause the XRSTOR instruction to fail, the kernel has to carefully
validate that no reserved bits or other invalid values are being set.

Unfortunately, there have been bugs in this validation code.  For
example, we were not checking that the 'xcomp_bv' field in the
xstate_header was 0.  As-is, such bugs are exploitable to read the FPU
registers of other processes on the system.  To do so, an attacker can
create a task, assign to it an invalid FPU state, then spin in a loop
and monitor the values of the FPU registers.  Because the task's FPU
registers are not being restored, sometimes the FPU registers will have
the values from another process.

This is likely to continue to be a problem in the future because the
validation done by the CPU instructions like XRSTOR is not immediately
visible to kernel developers.  Nor will invalid FPU states ever be
encountered during ordinary use --- they will only be seen during
fuzzing or exploits.  There can even be reserved bits outside the
xstate_header which are easy to forget about.  For example, the MXCSR
register contains reserved bits, which were not validated by the
KVM_SET_XSAVE ioctl until commit a575813bfe ("KVM: x86: Fix load
damaged SSEx MXCSR register").

Therefore, mitigate this class of vulnerability by restoring the FPU
registers from init_fpstate if restoring from the task's state fails.

We actually used to do this, but it was (perhaps unwisely) removed by
commit 9ccc27a5d2 ("x86/fpu: Remove error return values from
copy_kernel_to_*regs() functions").  This new patch is also a bit
different.  First, it only clears the registers, not also the bad
in-memory state; this is simpler and makes it easier to make the
mitigation cover all callers of __copy_kernel_to_fpregs().  Second, it
does the register clearing in an exception handler so that no extra
instructions are added to context switches.  In fact, we *remove*
instructions, since previously we were always zeroing the register
containing 'err' even if CONFIG_X86_DEBUG_FPU was disabled.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170922174156.16780-4-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20170923130016.21448-27-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-25 09:26:38 +02:00
Andi Kleen 03eaec81ac x86/fpu: Turn WARN_ON() in context switch into WARN_ON_FPU()
copy_xregs_to_kernel checks if the alternatives have been already
patched.

This WARN_ON() is always executed in every context switch.

All the other checks in fpu internal.h are WARN_ON_FPU(), but
this one is plain WARN_ON(). I assume it was forgotten to switch it.

So switch it to WARN_ON_FPU() too to avoid some unnecessary code
in the context switch, and a potentially expensive cache line miss for the
global variable.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170329062605.4970-1-andi@firstfloor.org
Link: http://lkml.kernel.org/r/20170923130016.21448-24-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:35 +02:00
Rik van Riel 0852b37417 x86/fpu: Add FPU state copying quirk to handle XRSTOR failure on Intel Skylake CPUs
On Skylake CPUs I noticed that XRSTOR is unable to deal with states
created by copyout_from_xsaves() if the xstate has only SSE/YMM state, and
no FP state. That is, xfeatures had XFEATURE_MASK_SSE set, but not
XFEATURE_MASK_FP.

The reason is that part of the SSE/YMM state lives in the MXCSR and
MXCSR_FLAGS fields of the FP state.

Ensure that whenever we copy SSE or YMM state around, the MXCSR and
MXCSR_FLAGS fields are also copied around.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170210085445.0f1cc708@annuminas.surriel.com
Link: http://lkml.kernel.org/r/20170923130016.21448-22-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:34 +02:00
Ingo Molnar 99dc26bda2 x86/fpu: Remove struct fpu::fpregs_active
The previous changes paved the way for the removal of the
fpu::fpregs_active state flag - we now only have the
fpu::fpstate_active and fpu::last_cpu fields left.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-21-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:34 +02:00
Ingo Molnar 6cf4edbe05 x86/fpu: Decouple fpregs_activate()/fpregs_deactivate() from fpu->fpregs_active
The fpregs_activate()/fpregs_deactivate() are currently called in such a pattern:

	if (!fpu->fpregs_active)
		fpregs_activate(fpu);

	...

	if (fpu->fpregs_active)
		fpregs_deactivate(fpu);

But note that it's actually safe to call them without checking the flag first.

This further decouples the fpu->fpregs_active flag from actual FPU logic.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-20-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:34 +02:00
Ingo Molnar f1c8cd0176 x86/fpu: Change fpu->fpregs_active users to fpu->fpstate_active
We want to simplify the FPU state machine by eliminating fpu->fpregs_active,
and we can do that because the two state flags (::fpregs_active and
::fpstate_active) are set essentially together.

The old lazy FPU switching code used to make a distinction - but there's
no lazy switching code anymore, we always switch in an 'eager' fashion.

Do this by first changing all substantial uses of fpu->fpregs_active
to fpu->fpstate_active and adding a few debug checks to double check
our assumption is correct.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-19-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:34 +02:00
Ingo Molnar b3a163081c x86/fpu: Simplify fpu->fpregs_active use
The fpregs_active() inline function is pretty pointless - in almost
all the callsites it can be replaced with a direct fpu->fpregs_active
access.

Do so and eliminate the extra layer of obfuscation.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-16-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:33 +02:00
Ingo Molnar 6d7f7da553 x86/fpu: Flip the parameter order in copy_*_to_xstate()
Make it more consistent with regular memcpy() semantics, where the destination
argument comes first.

No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-15-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:33 +02:00
Ingo Molnar 7b9094c688 x86/fpu: Remove 'kbuf' parameter from the copy_user_to_xstate() API
No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-14-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:33 +02:00
Ingo Molnar 59dffa4edb x86/fpu: Remove 'ubuf' parameter from the copy_kernel_to_xstate() API
No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-13-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:33 +02:00
Ingo Molnar 79fecc2b75 x86/fpu: Split copy_user_to_xstate() into copy_kernel_to_xstate() & copy_user_to_xstate()
Similar to:

  x86/fpu: Split copy_xstate_to_user() into copy_xstate_to_kernel() & copy_xstate_to_user()

No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-12-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:32 +02:00
Ingo Molnar 56583c9a14 x86/fpu: Clarify parameter names in the copy_xstate_to_*() methods
Right now there's a confusing mixture of 'offset' and 'size' parameters:

 - __copy_xstate_to_*() input parameter 'end_pos' not not really an offset,
   but the full size of the copy to be performed.

 - input parameter 'count' to copy_xstate_to_*() shadows that of
   __copy_xstate_to_*()'s 'count' parameter name - but the roles
   are different: the first one is the total number of bytes to
   be copied, while the second one is a partial copy size.

To unconfuse all this, use a consistent set of parameter names:

 - 'size' is the partial copy size within a single xstate component
 - 'size_total' is the total copy requested
 - 'offset_start' is the requested starting offset.
 - 'offset' is the offset within an xstate component.

No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-9-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:32 +02:00
Ingo Molnar d7eda6c99c x86/fpu: Clean up parameter order in the copy_xstate_to_*() APIs
Parameter ordering is weird:

  int copy_xstate_to_kernel(unsigned int pos, unsigned int count, void *kbuf, struct xregs_state *xsave);
  int copy_xstate_to_user(unsigned int pos, unsigned int count, void __user *ubuf, struct xregs_state *xsave);

'pos' and 'count', which are attributes of the destination buffer, are listed before the destination
buffer itself ...

List them after the primary arguments instead.

This makes the code more similar to regular memcpy() variant APIs.

No change in functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-6-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:31 +02:00
Ingo Molnar a69c158fb3 x86/fpu: Remove 'kbuf' parameter from the copy_xstate_to_user() APIs
The 'kbuf' parameter is unused in the _user() side of the API, remove it.

This simplifies the code and makes it easier to think about.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-5-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:31 +02:00
Ingo Molnar 4d981cf2d9 x86/fpu: Remove 'ubuf' parameter from the copy_xstate_to_kernel() APIs
The 'ubuf' parameter is unused in the _kernel() side of the API, remove it.

This simplifies the code and makes it easier to think about.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-4-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:31 +02:00
Ingo Molnar f0d4f30a7f x86/fpu: Split copy_xstate_to_user() into copy_xstate_to_kernel() & copy_xstate_to_user()
copy_xstate_to_user() is a weird API - in part due to a bad API inherited
from the regset APIs.

But don't propagate that bad API choice into the FPU code - so as a first
step split the API into kernel and user buffer handling routines.

(Also split the xstate_copyout() internal helper.)

The split API is a dumb duplication that should be obviously correct, the
real splitting will be done in the next patch.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-3-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:30 +02:00
Ingo Molnar 656f083116 x86/fpu: Rename copyin_to_xsaves()/copyout_from_xsaves() to copy_user_to_xstate()/copy_xstate_to_user()
The 'copyin/copyout' nomenclature needlessly departs from what the modern FPU code
uses, which is:

 copy_fpregs_to_fpstate()
 copy_fpstate_to_sigframe()
 copy_fregs_to_user()
 copy_fxregs_to_kernel()
 copy_fxregs_to_user()
 copy_kernel_to_fpregs()
 copy_kernel_to_fregs()
 copy_kernel_to_fxregs()
 copy_kernel_to_xregs()
 copy_user_to_fregs()
 copy_user_to_fxregs()
 copy_user_to_xregs()
 copy_xregs_to_kernel()
 copy_xregs_to_user()

I.e. according to this pattern, the following rename should be done:

  copyin_to_xsaves()    -> copy_user_to_xstate()
  copyout_from_xsaves() -> copy_xstate_to_user()

or, if we want to be pedantic, denote that that the user-space format is ptrace:

  copyin_to_xsaves()    -> copy_user_ptrace_to_xstate()
  copyout_from_xsaves() -> copy_xstate_to_user_ptrace()

But I'd suggest the shorter, non-pedantic name.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-2-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 13:04:30 +02:00
Uros Bizjak 3c52b5c643 x86/asm: Remove unnecessary \n\t in front of CC_SET() from asm templates
There is no need for \n\t in front of CC_SET(), as the macro already includes these two.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170906151808.5634-1-ubizjak@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24 11:19:01 +02:00
Josh Poimboeuf f5caf621ee x86/asm: Fix inline asm call constraints for Clang
For inline asm statements which have a CALL instruction, we list the
stack pointer as a constraint to convince GCC to ensure the frame
pointer is set up first:

  static inline void foo()
  {
	register void *__sp asm(_ASM_SP);
	asm("call bar" : "+r" (__sp))
  }

Unfortunately, that pattern causes Clang to corrupt the stack pointer.

The fix is easy: convert the stack pointer register variable to a global
variable.

It should be noted that the end result is different based on the GCC
version.  With GCC 6.4, this patch has exactly the same result as
before:

	defconfig	defconfig-nofp	distro		distro-nofp
 before	9820389		9491555		8816046		8516940
 after	9820389		9491555		8816046		8516940

With GCC 7.2, however, GCC's behavior has changed.  It now changes its
behavior based on the conversion of the register variable to a global.
That somehow convinces it to *always* set up the frame pointer before
inserting *any* inline asm.  (Therefore, listing the variable as an
output constraint is a no-op and is no longer necessary.)  It's a bit
overkill, but the performance impact should be negligible.  And in fact,
there's a nice improvement with frame pointers disabled:

	defconfig	defconfig-nofp	distro		distro-nofp
 before	9796316		9468236		9076191		8790305
 after	9796957		9464267		9076381		8785949

So in summary, while listing the stack pointer as an output constraint
is no longer necessary for newer versions of GCC, it's still needed for
older versions.

Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-23 15:06:20 +02:00
Colin Ian King 51a9a8284e x86/xen: clean up clang build warning
In the case where sizeof(maddr) != sizeof(long) p is initialized and
never read and clang throws a warning on this.  Move declaration of
p to clean up the clang build warning:

warning: Value stored to 'p' during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-09-21 12:34:03 -04:00
Andy Lutomirski 52a2af400c x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code
Putting the logical ASID into CR3's PCID bits directly means that we
have two cases to consider separately: ASID == 0 and ASID != 0.
This means that bugs that only hit in one of these cases trigger
nondeterministically.

There were some bugs like this in the past, and I think there's
still one in current kernels.  In particular, we have a number of
ASID-unware code paths that save CR3, write some special value, and
then restore CR3.  This includes suspend/resume, hibernate, kexec,
EFI, and maybe other things I've missed.  This is currently
dangerous: if ASID != 0, then this code sequence will leave garbage
in the TLB tagged for ASID 0.  We could potentially see corruption
when switching back to ASID 0.  In principle, an
initialize_tlbstate_and_flush() call after these sequences would
solve the problem, but EFI, at least, does not call this.  (And it
probably shouldn't -- initialize_tlbstate_and_flush() is rather
expensive.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/cdc14bbe5d3c3ef2a562be09a6368ffe9bd947a6.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-17 18:59:08 +02:00
Andy Lutomirski 47061a24e2 x86/mm: Factor out CR3-building code
Current, the code that assembles a value to load into CR3 is
open-coded everywhere.  Factor it out into helpers build_cr3() and
build_cr3_noflush().

This makes one semantic change: __get_current_cr3_fast() was wrong
on SME systems.  No one noticed because the only caller is in the
VMX code, and there are no CPUs with both SME and VMX.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Link: http://lkml.kernel.org/r/ce350cf11e93e2842d14d0b95b0199c7d881f527.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-17 18:59:08 +02:00
Linus Torvalds 9db59599ae * PPC bugfixes
* RCU splat fix
 * swait races fix
 * pointless userspace-triggerable BUG() fix
 * misc fixes for KVM_RUN corner cases
 * nested virt correctness fixes + one host DoS
 * some cleanups
 * clang build fix
 * fix AMD AVIC with default QEMU command line options
 * x86 bugfixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZvANtAAoJEL/70l94x66DtcIH/0i4fenYamxdq2xiWtsZdbcy
 yfk7mWKEzWGZhP+2X8SSOeetd5mqnIcf2cc4m68UCXpt0zoPEjY0i0D4xrYJHZ03
 R3ifqvtpHByodfT7dOKQPEisO8PdJ5tvecaCMnK3u6SNaNLjAZfhobuLppQHOwQO
 eBvpm0jROpA7ENlDgXtsti8MEdsoWtnmGGrRBY77EGW+t24OpNuGB1EMC0nvcs65
 eChwZ3u8xeU5Ws3Y/DiC8tK8t628znknd8ay02LTZjA303Ftoe192jPpS33V4v15
 kqS6vUFy2lpr9L6wicZtcnnSLtKv+LqecK6o8cxNjzlkOeaZuo9D8UMYsWQfj6w=
 =Ma23
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 - PPC bugfixes
 - RCU splat fix
 - swait races fix
 - pointless userspace-triggerable BUG() fix
 - misc fixes for KVM_RUN corner cases
 - nested virt correctness fixes + one host DoS
 - some cleanups
 - clang build fix
 - fix AMD AVIC with default QEMU command line options
 - x86 bugfixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly
  kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly
  kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry
  kvm,mips: Fix potential swait_active() races
  kvm,powerpc: Serialize wq active checks in ops->vcpu_kick
  kvm: Serialize wq active checks in kvm_vcpu_wake_up()
  kvm,x86: Fix apf_task_wake_one() wq serialization
  kvm,lapic: Justify use of swait_active()
  kvm,async_pf: Use swq_has_sleeper()
  sched/wait: Add swq_has_sleeper()
  KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
  KVM: Don't accept obviously wrong gsi values via KVM_IRQFD
  kvm: nVMX: Don't allow L2 to access the hardware CR8
  KVM: trace events: update list of exit reasons
  KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously
  KVM: X86: Don't block vCPU if there is pending exception
  KVM: SVM: Add irqchip_split() checks before enabling AVIC
  KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()
  KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()
  KVM: x86: fix clang build
  ...
2017-09-15 15:43:55 -07:00
Suravee Suthikulpanit b2a05feff2 KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()
Modify struct kvm_x86_ops.arch.apicv_active() to take struct kvm_vcpu
pointer as parameter in preparation to subsequent changes.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-13 18:29:06 +02:00
Joerg Roedel 98152b83e0 KVM: x86: Remove .get_pkru() from kvm_x86_ops
The commit

	9dd21e104bc ('KVM: x86: simplify handling of PKRU')

removed all users and providers of that call-back, but
didn't remove it. Remove it now.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-13 15:33:04 +02:00
Vitaly Kuznetsov 1278f58cde x86/hyper-v: Remove duplicated HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED definition
Commits:

  7dcf90e9e0 ("PCI: hv: Use vPCI protocol version 1.2")
  628f54cc64 ("x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls")

added the same definition and they came in through different trees.
Fix the duplication.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170911150620.3998-1-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-13 11:29:46 +02:00
Juergen Gross 87930019c7 x86/paravirt: Remove no longer used paravirt functions
With removal of lguest some of the paravirt functions are no longer
needed:

	->read_cr4()
	->store_idt()
	->set_pmd_at()
	->set_pud_at()
	->pte_update()

Remove them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-13 10:55:15 +02:00
Linus Torvalds 680352bda5 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two fixes: dead code removal, plus a SME memory encryption fix on
  32-bit kernels that crashed Xen guests"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Remove unused and undefined __generic_processor_info() declaration
  x86/mm: Make the SME mask a u64
2017-09-12 11:34:39 -07:00
Linus Torvalds 89fd915c40 libnvdimm for 4.14
* Media error handling support in the Block Translation Table (BTT)
   driver is reworked to address sleeping-while-atomic locking and
   memory-allocation-context conflicts.
 
 * The dax_device lookup overhead for xfs and ext4 is moved out of the
   iomap hot-path to a mount-time lookup.
 
 * A new 'ecc_unit_size' sysfs attribute is added to advertise the
   read-modify-write boundary property of a persistent memory range.
 
 * Preparatory fix-ups for arm and powerpc pmem support are included
   along with other miscellaneous fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZtsAGAAoJEB7SkWpmfYgCrzMP/2vPvZvrFjZn5pAoZjlmTmHM
 ySceoOC7vwvVXIsSs52FhSjcxEoXo9cklXPwhXOPVtVUFdSDJBUOIUxwIziE6Y+5
 sFJ2xT9K+5zKBUiXJwqFQDg52dn//eBNnnnDz+HQrBSzGrbWQhIZY2m19omPzv1I
 BeN0OCGOdW3cjSo3BCFl1d+KrSl704e7paeKq/TO3GIiAilIXleTVxcefEEodV2K
 ZvWHpFIhHeyN8dsF8teI952KcCT92CT/IaabxQIwCxX0/8/GFeDc5aqf77qiYWKi
 uxCeQXdgnaE8EZNWZWGWIWul6eYEkoCNbLeUQ7eJnECq61VxVajJS0NyGa5T9OiM
 P046Bo2b1b3R0IHxVIyVG0ZCm3YUMAHSn/3uRxPgESJ4bS/VQ3YP5M6MLxDOlc90
 IisLilagitkK6h8/fVuVrwciRNQ71XEC34t6k7GCl/1ZnLlLT+i4/jc5NRZnGEZh
 aXAAGHdteQ+/mSz6p2UISFUekbd6LerwzKRw8ibDvH6pTud8orYR7g2+JoGhgb6Y
 pyFVE8DhIcqNKAMxBsjiRZ46OQ7qrT+AemdAG3aVv6FaNoe4o5jPLdw2cEtLqtpk
 +DNm0/lSWxxxozjrvu6EUZj6hk8R5E19XpRzV5QJkcKUXMu7oSrFLdMcC4FeIjl9
 K4hXLV3fVBVRMiS0RA6z
 =5iGY
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm from Dan Williams:
 "A rework of media error handling in the BTT driver and other updates.
  It has appeared in a few -next releases and collected some late-
  breaking build-error and warning fixups as a result.

  Summary:

   - Media error handling support in the Block Translation Table (BTT)
     driver is reworked to address sleeping-while-atomic locking and
     memory-allocation-context conflicts.

   - The dax_device lookup overhead for xfs and ext4 is moved out of the
     iomap hot-path to a mount-time lookup.

   - A new 'ecc_unit_size' sysfs attribute is added to advertise the
     read-modify-write boundary property of a persistent memory range.

   - Preparatory fix-ups for arm and powerpc pmem support are included
     along with other miscellaneous fixes"

* tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits)
  libnvdimm, btt: fix format string warnings
  libnvdimm, btt: clean up warning and error messages
  ext4: fix null pointer dereference on sbi
  libnvdimm, nfit: move the check on nd_reserved2 to the endpoint
  dax: fix FS_DAX=n BLOCK=y compilation
  libnvdimm: fix integer overflow static analysis warning
  libnvdimm, nd_blk: remove mmio_flush_range()
  libnvdimm, btt: rework error clearing
  libnvdimm: fix potential deadlock while clearing errors
  libnvdimm, btt: cache sector_size in arena_info
  libnvdimm, btt: ensure that flags were also unchanged during a map_read
  libnvdimm, btt: refactor map entry operations with macros
  libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
  libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute
  ext4: perform dax_device lookup at mount
  ext2: perform dax_device lookup at mount
  xfs: perform dax_device lookup at mount
  dax: introduce a fs_dax_get_by_bdev() helper
  libnvdimm, btt: check memory allocation failure
  libnvdimm, label: fix index block size calculation
  ...
2017-09-11 13:10:57 -07:00
Dou Liyang e2329b4252 x86/cpu: Remove unused and undefined __generic_processor_info() declaration
The following revert:

  2b85b3d229 ("x86/acpi: Restore the order of CPU IDs")

... got rid of __generic_processor_info(), but forgot to remove its
declaration in mpspec.h.

Remove the declaration and update the comments as well.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1505101403-29100-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-11 08:16:37 +02:00
Linus Torvalds fbf4432ff7 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - most of the rest of MM

 - a small number of misc things

 - lib/ updates

 - checkpatch

 - autofs updates

 - ipc/ updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (126 commits)
  ipc: optimize semget/shmget/msgget for lots of keys
  ipc/sem: play nicer with large nsops allocations
  ipc/sem: drop sem_checkid helper
  ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t
  ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t
  ipc: convert ipc_namespace.count from atomic_t to refcount_t
  kcov: support compat processes
  sh: defconfig: cleanup from old Kconfig options
  mn10300: defconfig: cleanup from old Kconfig options
  m32r: defconfig: cleanup from old Kconfig options
  drivers/pps: use surrounding "if PPS" to remove numerous dependency checks
  drivers/pps: aesthetic tweaks to PPS-related content
  cpumask: make cpumask_next() out-of-line
  kmod: move #ifdef CONFIG_MODULES wrapper to Makefile
  kmod: split off umh headers into its own file
  MAINTAINERS: clarify kmod is just a kernel module loader
  kmod: split out umh code into its own file
  test_kmod: flip INT checks to be consistent
  test_kmod: remove paranoid UINT_MAX check on uint range processing
  vfat: deduplicate hex2bin()
  ...
2017-09-09 10:30:07 -07:00
Matthew Wilcox 4c51248533 x86: implement memset16, memset32 & memset64
These are single instructions on x86.  There's no 64-bit instruction for
x86-32, but we don't yet have any user for memset64() on 32-bit
architectures, so don't bother to implement it.

Link: http://lkml.kernel.org/r/20170720184539.31609-4-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:48 -07:00
Naoya Horiguchi ab6e3d0939 mm: soft-dirty: keep soft-dirty bits over thp migration
Soft dirty bit is designed to keep tracked over page migration.  This
patch makes it work in the same manner for thp migration too.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Zi Yan 616b837153 mm: thp: enable thp migration in generic path
Add thp migration's core code, including conversions between a PMD entry
and a swap entry, setting PMD migration entry, removing PMD migration
entry, and waiting on PMD migration entries.

This patch makes it possible to support thp migration.  If you fail to
allocate a destination page as a thp, you just split the source thp as
we do now, and then enter the normal page migration.  If you succeed to
allocate destination thp, you enter thp migration.  Subsequent patches
actually enable thp migration for each caller of page migration by
allowing its get_new_page() callback to allocate thps.

[zi.yan@cs.rutgers.edu: fix gcc-4.9.0 -Wmissing-braces warning]
  Link: http://lkml.kernel.org/r/A0ABA698-7486-46C3-B209-E95A9048B22C@cs.rutgers.edu
[akpm@linux-foundation.org: fix x86_64 allnoconfig warning]
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Naoya Horiguchi eee4818baa mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1
_PAGE_PSE is used to distinguish between a truly non-present
(_PAGE_PRESENT=0) PMD, and a PMD which is undergoing a THP split and
should be treated as present.

But _PAGE_SWP_SOFT_DIRTY currently uses the _PAGE_PSE bit, which would
cause confusion between one of those PMDs undergoing a THP split, and a
soft-dirty PMD.  Dropping _PAGE_PSE check in pmd_present() does not work
well, because it can hurt optimization of tlb handling in thp split.

Thus, we need to move the bit.

In the current kernel, bits 1-4 are not used in non-present format since
commit 00839ee3b2 ("x86/mm: Move swap offset/type up in PTE to work
around erratum").  So let's move _PAGE_SWP_SOFT_DIRTY to bit 1.  Bit 7
is used as reserved (always clear), so please don't use it for other
purpose.

Link: http://lkml.kernel.org/r/20170717193955.20207-3-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:45 -07:00
Linus Torvalds 0756b7fbb6 First batch of KVM changes for 4.14
Common:
  - improve heuristic for boosting preempted spinlocks by ignoring VCPUs
    in user mode
 
 ARM:
  - fix for decoding external abort types from guests
 
  - added support for migrating the active priority of interrupts when
    running a GICv2 guest on a GICv3 host
 
  - minor cleanup
 
 PPC:
  - expose storage keys to userspace
 
  - merge powerpc/topic/ppc-kvm branch that contains
    find_linux_pte_or_hugepte and POWER9 thread management cleanup
 
  - merge kvm-ppc-fixes with a fix that missed 4.13 because of vacations
 
  - fixes
 
 s390:
  - merge of topic branch tlb-flushing from the s390 tree to get the
    no-dat base features
 
  - merge of kvm/master to avoid conflicts with additional sthyi fixes
 
  - wire up the no-dat enhancements in KVM
 
  - multiple epoch facility (z14 feature)
 
  - Configuration z/Architecture Mode
 
  - more sthyi fixes
 
  - gdb server range checking fix
 
  - small code cleanups
 
 x86:
  - emulate Hyper-V TSC frequency MSRs
 
  - add nested INVPCID
 
  - emulate EPTP switching VMFUNC
 
  - support Virtual GIF
 
  - support 5 level page tables
 
  - speedup nested VM exits by packing byte operations
 
  - speedup MMIO by using hardware provided physical address
 
  - a lot of fixes and cleanups, especially nested
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJZspE1AAoJEED/6hsPKofoDcMIALT11n+LKV50QGwQdg2W1GOt
 aChbgnj/Kegit3hQlDhVNb8kmdZEOZzSL81Lh0VPEr7zXU8QiWn2snbizDPv8sde
 MpHhcZYZZ0YrpoiZKjl8yiwcu88OWGn2qtJ7OpuTS5hvEGAfxMncp0AMZho6fnz/
 ySTwJ9GK2MTgBw39OAzCeDOeoYn4NKYMwjJGqBXRhNX8PG/1wmfqv0vPrd6wfg31
 KJ58BumavwJjr8YbQ1xELm9rpQrAmaayIsG0R1dEUqCbt5a1+t2gt4h2uY7tWcIv
 ACt2bIze7eF3xA+OpRs+eT+yemiH3t9btIVmhCfzUpnQ+V5Z55VMSwASLtTuJRQ=
 =R8Ry
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Radim Krčmář:
 "First batch of KVM changes for 4.14

  Common:
   - improve heuristic for boosting preempted spinlocks by ignoring
     VCPUs in user mode

  ARM:
   - fix for decoding external abort types from guests

   - added support for migrating the active priority of interrupts when
     running a GICv2 guest on a GICv3 host

   - minor cleanup

  PPC:
   - expose storage keys to userspace

   - merge kvm-ppc-fixes with a fix that missed 4.13 because of
     vacations

   - fixes

  s390:
   - merge of kvm/master to avoid conflicts with additional sthyi fixes

   - wire up the no-dat enhancements in KVM

   - multiple epoch facility (z14 feature)

   - Configuration z/Architecture Mode

   - more sthyi fixes

   - gdb server range checking fix

   - small code cleanups

  x86:
   - emulate Hyper-V TSC frequency MSRs

   - add nested INVPCID

   - emulate EPTP switching VMFUNC

   - support Virtual GIF

   - support 5 level page tables

   - speedup nested VM exits by packing byte operations

   - speedup MMIO by using hardware provided physical address

   - a lot of fixes and cleanups, especially nested"

* tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits)
  KVM: arm/arm64: Support uaccess of GICC_APRn
  KVM: arm/arm64: Extract GICv3 max APRn index calculation
  KVM: arm/arm64: vITS: Drop its_ite->lpi field
  KVM: arm/arm64: vgic: constify seq_operations and file_operations
  KVM: arm/arm64: Fix guest external abort matching
  KVM: PPC: Book3S HV: Fix memory leak in kvm_vm_ioctl_get_htab_fd
  KVM: s390: vsie: cleanup mcck reinjection
  KVM: s390: use WARN_ON_ONCE only for checking
  KVM: s390: guestdbg: fix range check
  KVM: PPC: Book3S HV: Report storage key support to userspace
  KVM: PPC: Book3S HV: Fix case where HDEC is treated as 32-bit on POWER9
  KVM: PPC: Book3S HV: Fix invalid use of register expression
  KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation
  KVM: PPC: Book3S HV: Fix setting of storage key in H_ENTER
  KVM: PPC: e500mc: Fix a NULL dereference
  KVM: PPC: e500: Fix some NULL dereferences on error
  KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables list
  KVM: s390: we are always in czam mode
  KVM: s390: expose no-DAT to guest and migration support
  KVM: s390: sthyi: remove invalid guest write access
  ...
2017-09-08 15:18:36 -07:00
Radim Krčmář 5f54c8b2d4 Merge branch 'kvm-ppc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
This fix was intended for 4.13, but didn't get in because both
maintainers were on vacation.

Paul Mackerras:
 "It adds mutual exclusion between list_add_rcu and list_del_rcu calls
  on the kvm->arch.spapr_tce_tables list.  Without this, userspace could
  potentially trigger corruption of the list and cause a host crash or
  worse."
2017-09-08 14:40:43 +02:00
Linus Torvalds 3ee31b89d9 xen: fixes and features for 4.14
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJZsBbYAAoJELDendYovxMv4hoH/39psrSeHw2hPX78KJ6orq4v
 mTVEP2gLA/qxaM03EnFljXfd88J8NcJsxv7vVjh/U4xRwntvAMovCkygkkO1aw93
 nZEhUq6IGupr8KzmqQi5U7WtiWAXFwDbGSasnOKEj/lLa7E0/9MsYYQ01FS6oFkc
 c9CHONaCWepdz0Xpt7s6BKyzo74ZbJeCc5rUZU81oH40XphaZEoy8E9NOgDdfz3l
 VvPSaxZvebynT8JKDe4KxrMPpBjhr7mwgLcXk/Zy2EzOzxFSxXLsDAnwjtCW1gTh
 lPLD4TkgtziDfPfZXxFH3J34IUe1tZ2M+7Cz157FBu6BKX/g9ETQT24DXWDzFuI=
 =cgfV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.14b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - the new pvcalls backend for routing socket calls from a guest to dom0

 - some cleanups of Xen code

 - a fix for wrong usage of {get,put}_cpu()

* tag 'for-linus-4.14b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (27 commits)
  xen/mmu: set MMU_NORMAL_PT_UPDATE in remap_area_mfn_pte_fn
  xen: Don't try to call xen_alloc_p2m_entry() on autotranslating guests
  xen/events: events_fifo: Don't use {get,put}_cpu() in xen_evtchn_fifo_init()
  xen/pvcalls: use WARN_ON(1) instead of __WARN()
  xen: remove not used trace functions
  xen: remove unused function xen_set_domain_pte()
  xen: remove tests for pvh mode in pure pv paths
  xen-platform: constify pci_device_id.
  xen: cleanup xen.h
  xen: introduce a Kconfig option to enable the pvcalls backend
  xen/pvcalls: implement write
  xen/pvcalls: implement read
  xen/pvcalls: implement the ioworker functions
  xen/pvcalls: disconnect and module_exit
  xen/pvcalls: implement release command
  xen/pvcalls: implement poll command
  xen/pvcalls: implement accept command
  xen/pvcalls: implement listen command
  xen/pvcalls: implement bind command
  xen/pvcalls: implement connect command
  ...
2017-09-07 10:24:21 -07:00
Linus Torvalds 57e88b43b8 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "The main changes include various Hyper-V optimizations such as faster
  hypercalls and faster/better TLB flushes - and there's also some
  Intel-MID cleanups"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/hyper-v: Trace hyperv_mmu_flush_tlb_others()
  x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls
  x86/platform/intel-mid: Make several arrays static, to make code smaller
  MAINTAINERS: Add missed file for Hyper-V
  x86/hyper-v: Use hypercall for remote TLB flush
  hyper-v: Globalize vp_index
  x86/hyper-v: Implement rep hypercalls
  hyper-v: Use fast hypercall for HVCALL_SIGNAL_EVENT
  x86/hyper-v: Introduce fast hypercall implementation
  x86/hyper-v: Make hv_do_hypercall() inline
  x86/hyper-v: Include hyperv/ only when CONFIG_HYPERV is set
  x86/platform/intel-mid: Make 'bt_sfi_data' const
  x86/platform/intel-mid: Make IRQ allocation a bit more flexible
  x86/platform/intel-mid: Group timers callbacks together
2017-09-07 09:25:15 -07:00
Borislav Petkov 21d9bb4a05 x86/mm: Make the SME mask a u64
The SME encryption mask is for masking 64-bit pagetable entries. It
being an unsigned long works fine on X86_64 but on 32-bit builds in
truncates bits leading to Xen guests crashing very early.

And regardless, the whole SME mask handling shouldnt've leaked into
32-bit because SME is X86_64-only feature. So, first make the mask u64.
And then, add trivial 32-bit versions of the __sme_* macros so that
nothing happens there.

Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tom Lendacky <Thomas.Lendacky@amd.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas <Thomas.Lendacky@amd.com>
Fixes: 21729f81ce ("x86/mm: Provide general kernel support for memory encryption")
Link: http://lkml.kernel.org/r/20170907093837.76zojtkgebwtqc74@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-07 11:53:11 +02:00
Linus Torvalds d34fc1adf0 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - various misc bits

 - DAX updates

 - OCFS2

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (119 commits)
  mm,fork: introduce MADV_WIPEONFORK
  x86,mpx: make mpx depend on x86-64 to free up VMA flag
  mm: add /proc/pid/smaps_rollup
  mm: hugetlb: clear target sub-page last when clearing huge page
  mm: oom: let oom_reap_task and exit_mmap run concurrently
  swap: choose swap device according to numa node
  mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
  mm, oom: do not rely on TIF_MEMDIE for memory reserves access
  z3fold: use per-cpu unbuddied lists
  mm, swap: don't use VMA based swap readahead if HDD is used as swap
  mm, swap: add sysfs interface for VMA based swap readahead
  mm, swap: VMA based swap readahead
  mm, swap: fix swap readahead marking
  mm, swap: add swap readahead hit statistics
  mm/vmalloc.c: don't reinvent the wheel but use existing llist API
  mm/vmstat.c: fix wrong comment
  selftests/memfd: add memfd_create hugetlbfs selftest
  mm/shmem: add hugetlbfs support to memfd_create()
  mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
  mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
  ...
2017-09-06 20:49:49 -07:00
Andy Lutomirski 72c0098d92 x86/mm: Reinitialize TLB state on hotplug and resume
When Linux brings a CPU down and back up, it switches to init_mm and then
loads swapper_pg_dir into CR3.  With PCID enabled, this has the side effect
of masking off the ASID bits in CR3.

This can result in some confusion in the TLB handling code.  If we
bring a CPU down and back up with any ASID other than 0, we end up
with the wrong ASID active on the CPU after resume.  This could
cause our internal state to become corrupt, although major
corruption is unlikely because init_mm doesn't have any user pages.
More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion
in the next context switch.  The result of *that* is a failure to
resume from suspend with probability 1 - 1/6^(cpus-1).

Fix it by reinitializing cpu_tlbstate on resume and CPU bringup.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Jiri Kosina <jikos@kernel.org>
Fixes: 10af6235e0 ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 20:12:57 -07:00
Mike Kravetz aafd4562df mm: arch: consolidate mmap hugetlb size encodings
A non-default huge page size can be encoded in the flags argument of the
mmap system call.  The definitions for these encodings are in arch
specific header files.  However, all architectures use the same values.

Consolidate all the definitions in the primary user header file
(uapi/linux/mman.h).  Include definitions for all known huge page sizes.
Use the generic encoding definitions in hugetlb_encode.h as the basis
for these definitions.

Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Linus Torvalds 53ac64aac9 ACPI updates for v4.14-rc1
- Update the ACPICA code in the kernel to upstream revision 20170728
    including:
    * Alias operator handling update (Bob Moore).
    * Deferred resolution of reference package elements (Bob Moore).
    * Support for the _DMA method in walk resources (Bob Moore).
    * Tables handling update and support for deferred table
      verification (Lv Zheng).
    * Update of SMMU models for IORT (Robin Murphy).
    * Compiler and disassembler updates (Alex James, Erik Schmauss,
      Ganapatrao Kulkarni, James Morse).
    * Tools updates (Erik Schmauss, Lv Zheng).
    * Assorted minor fixes and cleanups (Bob Moore, Kees Cook,
      Lv Zheng, Shao Ming).
 
  - Rework the initialization of non-wakeup GPEs with method handlers
    in order to address a boot crash on some systems with Thunderbolt
    devices connected at boot time where we miss an early hotplug
    event due to a delay in GPE enabling (Rafael Wysocki).
 
  - Rework the handling of PCI bridges when setting up ACPI-based
    device wakeup in order to avoid disabling wakeup for bridges
    prematurely (Rafael Wysocki).
 
  - Consolidate Apple DMI checks throughout the tree, add support for
    Apple device properties to the device properties framework and
    use these properties for the handling of I2C and SPI devices on
    Apple systems (Lukas Wunner).
 
  - Add support for _DMA to the ACPI-based device properties lookup
    code and make it possible to use the information from there to
    configure DMA regions on ARM64 systems (Lorenzo Pieralisi).
 
  - Fix several issues in the APEI code, add support for exporting
    the BERT error region over sysfs and update APEI MAINTAINERS
    entry with reviewers information (Borislav Petkov, Dongjiu Geng,
    Loc Ho, Punit Agrawal, Tony Luck, Yazen Ghannam).
 
  - Fix a potential initialization ordering issue in the ACPI EC
    driver and clean it up somewhat (Lv Zheng).
 
  - Update the ACPI SPCR driver to extend the existing XGENE 8250
    workaround in it to a new platform (m400) and to work around
    an Xgene UART clock issue (Graeme Gregory).
 
  - Add a new utility function to the ACPI core to support using
    ACPI OEM ID / OEM Table ID / Revision for system identification
    in blacklisting or similar and switch over the existing code
    already using this information to this new interface (Toshi Kani).
 
  - Fix an xpower PMIC issue related to GPADC reads that always return
    0 without extra pin manipulations (Hans de Goede).
 
  - Add statements to print debug messages in a couple of places in
    the ACPI core for easier diagnostics (Rafael Wysocki).
 
  - Clean up the ACPI processor driver slightly (Colin Ian King,
    Hanjun Guo).
 
  - Clean up the ACPI x86 boot code somewhat (Andy Shevchenko).
 
  - Add a quirk for Dell OptiPlex 9020M to the ACPI backlight
    driver (Alex Hung).
 
  - Assorted fixes, cleanups and updates related to ACPI (Amitoj Kaur
    Chawla, Bhumika Goyal, Frank Rowand, Jean Delvare, Punit Agrawal,
    Ronald Tschalär, Sumeet Pawnikar).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJZrcE+AAoJEILEb/54YlRxVGAP/RKzkJlYlOIXtMjf4XWg5ZfJ
 RKZA68E9DW179KoBoTCVPD6/eD5UoEJ7fsWXFU2Hgp2xL3N1mZMAJHgAE4GoAwCx
 uImoYvQgdPna7DawzRIFkvkfceYxNyh+KaV9s7xne4hAwsB7JzP9yf5Ywll53+oF
 Le27/r6lDOaWhG7uYcxSabnQsWZQkBF5mj2GPzEpKDIHcLA1Vii0URzm7mAHdZsz
 vGjYhxrshKYEVdkLSRn536m1rEfp2fqsRJ5wqNAazZJr6Cs1WIfNVuv/RfduRJpG
 /zHIRAmgKV+3jp39cBpjdnexLczb1rGiCV1yZOvwCNM7jy4evL8vbL7VgcUCopaj
 fHbF34chNG/hKJd3Zn3RRCTNzCs6bv+txslOMARxji5eyr2Q4KuVnvg5LM4hxOUP
 23FvcYkBYWu4QCNLOTnC7y2OqK6WzOvDpfi7hf13Z42iNzeAUbwt1sVF0/OCwL51
 Og6blSy2x8FidKp8oaBBboBzHEiKWnXBj/Hw8KEHVcsqZv1ZC6igNRAL3tjxamU8
 98/Z2NSZHYPrrrn13tT9ywISYXReXzUF85787+0ofugvDe8/QyBH6UhzzZc/xKVA
 t329JEjEFZZSLgxMIIa9bXoQANxkeZEGsxN6FfwvQhyIVdagLF3UvCjZl/q2NScC
 9n++s32qfUBRHetGODWc
 =6Ke9
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
 "These include a usual ACPICA code update (this time to upstream
  revision 20170728), a fix for a boot crash on some systems with
  Thunderbolt devices connected at boot time, a rework of the handling
  of PCI bridges when setting up device wakeup, new support for Apple
  device properties, support for DMA configurations reported via ACPI on
  ARM64, APEI-related updates, ACPI EC driver updates and assorted minor
  modifications in several places.

  Specifics:

   - Update the ACPICA code in the kernel to upstream revision 20170728
     including:
      * Alias operator handling update (Bob Moore).
      * Deferred resolution of reference package elements (Bob Moore).
      * Support for the _DMA method in walk resources (Bob Moore).
      * Tables handling update and support for deferred table
        verification (Lv Zheng).
      * Update of SMMU models for IORT (Robin Murphy).
      * Compiler and disassembler updates (Alex James, Erik Schmauss,
        Ganapatrao Kulkarni, James Morse).
      * Tools updates (Erik Schmauss, Lv Zheng).
      * Assorted minor fixes and cleanups (Bob Moore, Kees Cook, Lv
        Zheng, Shao Ming).

   - Rework the initialization of non-wakeup GPEs with method handlers
     in order to address a boot crash on some systems with Thunderbolt
     devices connected at boot time where we miss an early hotplug event
     due to a delay in GPE enabling (Rafael Wysocki).

   - Rework the handling of PCI bridges when setting up ACPI-based
     device wakeup in order to avoid disabling wakeup for bridges
     prematurely (Rafael Wysocki).

   - Consolidate Apple DMI checks throughout the tree, add support for
     Apple device properties to the device properties framework and use
     these properties for the handling of I2C and SPI devices on Apple
     systems (Lukas Wunner).

   - Add support for _DMA to the ACPI-based device properties lookup
     code and make it possible to use the information from there to
     configure DMA regions on ARM64 systems (Lorenzo Pieralisi).

   - Fix several issues in the APEI code, add support for exporting the
     BERT error region over sysfs and update APEI MAINTAINERS entry with
     reviewers information (Borislav Petkov, Dongjiu Geng, Loc Ho, Punit
     Agrawal, Tony Luck, Yazen Ghannam).

   - Fix a potential initialization ordering issue in the ACPI EC driver
     and clean it up somewhat (Lv Zheng).

   - Update the ACPI SPCR driver to extend the existing XGENE 8250
     workaround in it to a new platform (m400) and to work around an
     Xgene UART clock issue (Graeme Gregory).

   - Add a new utility function to the ACPI core to support using ACPI
     OEM ID / OEM Table ID / Revision for system identification in
     blacklisting or similar and switch over the existing code already
     using this information to this new interface (Toshi Kani).

   - Fix an xpower PMIC issue related to GPADC reads that always return
     0 without extra pin manipulations (Hans de Goede).

   - Add statements to print debug messages in a couple of places in the
     ACPI core for easier diagnostics (Rafael Wysocki).

   - Clean up the ACPI processor driver slightly (Colin Ian King, Hanjun
     Guo).

   - Clean up the ACPI x86 boot code somewhat (Andy Shevchenko).

   - Add a quirk for Dell OptiPlex 9020M to the ACPI backlight driver
     (Alex Hung).

   - Assorted fixes, cleanups and updates related to ACPI (Amitoj Kaur
     Chawla, Bhumika Goyal, Frank Rowand, Jean Delvare, Punit Agrawal,
     Ronald Tschalär, Sumeet Pawnikar)"

* tag 'acpi-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
  ACPI / APEI: Suppress message if HEST not present
  intel_pstate: convert to use acpi_match_platform_list()
  ACPI / blacklist: add acpi_match_platform_list()
  ACPI, APEI, EINJ: Subtract any matching Register Region from Trigger resources
  ACPI: make device_attribute const
  ACPI / sysfs: Extend ACPI sysfs to provide access to boot error region
  ACPI: APEI: fix the wrong iteration of generic error status block
  ACPI / processor: make function acpi_processor_check_duplicates() static
  ACPI / EC: Clean up EC GPE mask flag
  ACPI: EC: Fix possible issues related to EC initialization order
  ACPI / PM: Add debug statements to acpi_pm_notify_handler()
  ACPI: Add debug statements to acpi_global_event_handler()
  ACPI / scan: Enable GPEs before scanning the namespace
  ACPICA: Make it possible to enable runtime GPEs earlier
  ACPICA: Dispatch active GPEs at init time
  ACPI: SPCR: work around clock issue on xgene UART
  ACPI: SPCR: extend XGENE 8250 workaround to m400
  ACPI / LPSS: Don't abort ACPI scan on missing mem resource
  mailbox: pcc: Drop uninformative output during boot
  ACPI/IORT: Add IORT named component memory address limits
  ...
2017-09-05 12:45:03 -07:00
Linus Torvalds bafb0762cb Char/Misc drivers for 4.14-rc1
Here is the big char/misc driver update for 4.14-rc1.
 
 Lots of different stuff in here, it's been an active development cycle
 for some reason.  Highlights are:
   - updated binder driver, this brings binder up to date with what
     shipped in the Android O release, plus some more changes that
     happened since then that are in the Android development trees.
   - coresight updates and fixes
   - mux driver file renames to be a bit "nicer"
   - intel_th driver updates
   - normal set of hyper-v updates and changes
   - small fpga subsystem and driver updates
   - lots of const code changes all over the driver trees
   - extcon driver updates
   - fmc driver subsystem upadates
   - w1 subsystem minor reworks and new features and drivers added
   - spmi driver updates
 
 Plus a smattering of other minor driver updates and fixes.
 
 All of these have been in linux-next with no reported issues for a
 while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWa1+Ew8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yl26wCgquufNylfhxr65NbJrovduJYzRnUAniCivXg8
 bePIh/JI5WxWoHK+wEbY
 =hYWx
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here is the big char/misc driver update for 4.14-rc1.

  Lots of different stuff in here, it's been an active development cycle
  for some reason. Highlights are:

   - updated binder driver, this brings binder up to date with what
     shipped in the Android O release, plus some more changes that
     happened since then that are in the Android development trees.

   - coresight updates and fixes

   - mux driver file renames to be a bit "nicer"

   - intel_th driver updates

   - normal set of hyper-v updates and changes

   - small fpga subsystem and driver updates

   - lots of const code changes all over the driver trees

   - extcon driver updates

   - fmc driver subsystem upadates

   - w1 subsystem minor reworks and new features and drivers added

   - spmi driver updates

  Plus a smattering of other minor driver updates and fixes.

  All of these have been in linux-next with no reported issues for a
  while"

* tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (244 commits)
  ANDROID: binder: don't queue async transactions to thread.
  ANDROID: binder: don't enqueue death notifications to thread todo.
  ANDROID: binder: Don't BUG_ON(!spin_is_locked()).
  ANDROID: binder: Add BINDER_GET_NODE_DEBUG_INFO ioctl
  ANDROID: binder: push new transactions to waiting threads.
  ANDROID: binder: remove proc waitqueue
  android: binder: Add page usage in binder stats
  android: binder: fixup crash introduced by moving buffer hdr
  drivers: w1: add hwmon temp support for w1_therm
  drivers: w1: refactor w1_slave_show to make the temp reading functionality separate
  drivers: w1: add hwmon support structures
  eeprom: idt_89hpesx: Support both ACPI and OF probing
  mcb: Fix an error handling path in 'chameleon_parse_cells()'
  MCB: add support for SC31 to mcb-lpc
  mux: make device_type const
  char: virtio: constify attribute_group structures.
  Documentation/ABI: document the nvmem sysfs files
  lkdtm: fix spelling mistake: "incremeted" -> "incremented"
  perf: cs-etm: Fix ETMv4 CONFIGR entry in perf.data file
  nvmem: include linux/err.h from header
  ...
2017-09-05 11:08:17 -07:00
Linus Torvalds 24e700e291 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Thomas Gleixner:
 "This update provides:

   - Cleanup of the IDT management including the removal of the extra
     tracing IDT. A first step to cleanup the vector management code.

   - The removal of the paravirt op adjust_exception_frame. This is a
     XEN specific issue, but merged through this branch to avoid nasty
     merge collisions

   - Prevent dmesg spam about the TSC DEADLINE bug, when the CPU has
     disabled the TSC DEADLINE timer in CPUID.

   - Adjust a debug message in the ioapic code to print out the
     information correctly"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  x86/idt: Fix the X86_TRAP_BP gate
  x86/xen: Get rid of paravirt op adjust_exception_frame
  x86/eisa: Add missing include
  x86/idt: Remove superfluous ALIGNment
  x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature
  x86/idt: Remove the tracing IDT leftovers
  x86/idt: Hide set_intr_gate()
  x86/idt: Simplify alloc_intr_gate()
  x86/idt: Deinline setup functions
  x86/idt: Remove unused functions/inlines
  x86/idt: Move interrupt gate initialization to IDT code
  x86/idt: Move APIC gate initialization to tables
  x86/idt: Move regular trap init to tables
  x86/idt: Move IST stack based traps to table init
  x86/idt: Move debug stack init to table based
  x86/idt: Switch early trap init to IDT tables
  x86/idt: Prepare for table based init
  x86/idt: Move early IDT setup out of 32-bit asm
  x86/idt: Move early IDT handler setup to IDT code
  x86/idt: Consolidate IDT invalidation
  ...
2017-09-04 17:43:56 -07:00
Linus Torvalds f57091767a Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache quality monitoring update from Thomas Gleixner:
 "This update provides a complete rewrite of the Cache Quality
  Monitoring (CQM) facility.

  The existing CQM support was duct taped into perf with a lot of issues
  and the attempts to fix those turned out to be incomplete and
  horrible.

  After lengthy discussions it was decided to integrate the CQM support
  into the Resource Director Technology (RDT) facility, which is the
  obvious choise as in hardware CQM is part of RDT. This allowed to add
  Memory Bandwidth Monitoring support on top.

  As a result the mechanisms for allocating cache/memory bandwidth and
  the corresponding monitoring mechanisms are integrated into a single
  management facility with a consistent user interface"

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  x86/intel_rdt: Turn off most RDT features on Skylake
  x86/intel_rdt: Add command line options for resource director technology
  x86/intel_rdt: Move special case code for Haswell to a quirk function
  x86/intel_rdt: Remove redundant ternary operator on return
  x86/intel_rdt/cqm: Improve limbo list processing
  x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug
  x86/intel_rdt: Modify the intel_pqr_state for better performance
  x86/intel_rdt/cqm: Clear the default RMID during hotcpu
  x86/intel_rdt: Show bitmask of shareable resource with other executing units
  x86/intel_rdt/mbm: Handle counter overflow
  x86/intel_rdt/mbm: Add mbm counter initialization
  x86/intel_rdt/mbm: Basic counting of MBM events (total and local)
  x86/intel_rdt/cqm: Add CPU hotplug support
  x86/intel_rdt/cqm: Add sched_in support
  x86/intel_rdt: Introduce rdt_enable_key for scheduling
  x86/intel_rdt/cqm: Add mount,umount support
  x86/intel_rdt/cqm: Add rmdir support
  x86/intel_rdt: Separate the ctrl bits from rmdir
  x86/intel_rdt/cqm: Add mon_data
  x86/intel_rdt: Prepare for RDT monitor data support
  ...
2017-09-04 13:56:37 -07:00
Linus Torvalds b1b6f83ac9 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "PCID support, 5-level paging support, Secure Memory Encryption support

  The main changes in this cycle are support for three new, complex
  hardware features of x86 CPUs:

   - Add 5-level paging support, which is a new hardware feature on
     upcoming Intel CPUs allowing up to 128 PB of virtual address space
     and 4 PB of physical RAM space - a 512-fold increase over the old
     limits. (Supercomputers of the future forecasting hurricanes on an
     ever warming planet can certainly make good use of more RAM.)

     Many of the necessary changes went upstream in previous cycles,
     v4.14 is the first kernel that can enable 5-level paging.

     This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
     default.

     (By Kirill A. Shutemov)

   - Add 'encrypted memory' support, which is a new hardware feature on
     upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
     RAM to be encrypted and decrypted (mostly) transparently by the
     CPU, with a little help from the kernel to transition to/from
     encrypted RAM. Such RAM should be more secure against various
     attacks like RAM access via the memory bus and should make the
     radio signature of memory bus traffic harder to intercept (and
     decrypt) as well.

     This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
     by default.

     (By Tom Lendacky)

   - Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
     hardware feature that attaches an address space tag to TLB entries
     and thus allows to skip TLB flushing in many cases, even if we
     switch mm's.

     (By Andy Lutomirski)

  All three of these features were in the works for a long time, and
  it's coincidence of the three independent development paths that they
  are all enabled in v4.14 at once"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
  x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
  x86/mm: Use pr_cont() in dump_pagetable()
  x86/mm: Fix SME encryption stack ptr handling
  kvm/x86: Avoid clearing the C-bit in rsvd_bits()
  x86/CPU: Align CR3 defines
  x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
  acpi, x86/mm: Remove encryption mask from ACPI page protection type
  x86/mm, kexec: Fix memory corruption with SME on successive kexecs
  x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
  x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
  x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
  x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
  x86/mm: Allow userspace have mappings above 47-bit
  x86/mm: Prepare to expose larger address space to userspace
  x86/mpx: Do not allow MPX if we have mappings above 47-bit
  x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
  x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
  x86/mm/dump_pagetables: Fix printout of p4d level
  x86/mm/dump_pagetables: Generalize address normalization
  x86/boot: Fix memremap() related build failure
  ...
2017-09-04 12:21:28 -07:00
Linus Torvalds 5f82e71a00 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:

 - Add 'cross-release' support to lockdep, which allows APIs like
   completions, where it's not the 'owner' who releases the lock, to be
   tracked. It's all activated automatically under
   CONFIG_PROVE_LOCKING=y.

 - Clean up (restructure) the x86 atomics op implementation to be more
   readable, in preparation of KASAN annotations. (Dmitry Vyukov)

 - Fix static keys (Paolo Bonzini)

 - Add killable versions of down_read() et al (Kirill Tkhai)

 - Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)

 - Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)

 - Remove smp_mb__before_spinlock() and convert its usages, introduce
   smp_mb__after_spinlock() (Peter Zijlstra)

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
  locking/lockdep/selftests: Fix mixed read-write ABBA tests
  sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
  acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
  locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
  smp: Avoid using two cache lines for struct call_single_data
  locking/lockdep: Untangle xhlock history save/restore from task independence
  locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
  futex: Remove duplicated code and fix undefined behaviour
  Documentation/locking/atomic: Finish the document...
  locking/lockdep: Fix workqueue crossrelease annotation
  workqueue/lockdep: 'Fix' flush_work() annotation
  locking/lockdep/selftests: Add mixed read-write ABBA tests
  mm, locking/barriers: Clarify tlb_flush_pending() barriers
  locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
  locking/lockdep: Explicitly initialize wq_barrier::done::map
  locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
  locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
  locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
  locking/refcounts, x86/asm: Implement fast refcount overflow protection
  locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
  ...
2017-09-04 11:52:29 -07:00
Linus Torvalds 6c51e67b64 Merge branch 'x86-syscall-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull syscall updates from Ingo Molnar:
 "Improve the security of set_fs(): we now check the address limit on a
  number of key platforms (x86, arm, arm64) before returning to
  user-space - without adding overhead to the typical system call fast
  path"

* 'x86-syscall-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arm64/syscalls: Check address limit on user-mode return
  arm/syscalls: Check address limit on user-mode return
  x86/syscalls: Check address limit on user-mode return
2017-09-04 11:18:17 -07:00
Linus Torvalds b0c79f49c3 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:

 - Introduce the ORC unwinder, which can be enabled via
   CONFIG_ORC_UNWINDER=y.

   The ORC unwinder is a lightweight, Linux kernel specific debuginfo
   implementation, which aims to be DWARF done right for unwinding.
   Objtool is used to generate the ORC unwinder tables during build, so
   the data format is flexible and kernel internal: there's no
   dependency on debuginfo created by an external toolchain.

   The ORC unwinder is almost two orders of magnitude faster than the
   (out of tree) DWARF unwinder - which is important for perf call graph
   profiling. It is also significantly simpler and is coded defensively:
   there has not been a single ORC related kernel crash so far, even
   with early versions. (knock on wood!)

   But the main advantage is that enabling the ORC unwinder allows
   CONFIG_FRAME_POINTERS to be turned off - which speeds up the kernel
   measurably:

   With frame pointers disabled, GCC does not have to add frame pointer
   instrumentation code to every function in the kernel. The kernel's
   .text size decreases by about 3.2%, resulting in better cache
   utilization and fewer instructions executed, resulting in a broad
   kernel-wide speedup. Average speedup of system calls should be
   roughly in the 1-3% range - measurements by Mel Gorman [1] have shown
   a speedup of 5-10% for some function execution intense workloads.

   The main cost of the unwinder is that the unwinder data has to be
   stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel
   config - which is a modest cost on modern x86 systems.

   Given how young the ORC unwinder code is it's not enabled by default
   - but given the performance advantages the plan is to eventually make
   it the default unwinder on x86.

   See Documentation/x86/orc-unwinder.txt for more details.

 - Remove lguest support: its intended role was that of a temporary
   proof of concept for virtualization, plus its removal will enable the
   reduction (removal) of the paravirt API as well, so Rusty agreed to
   its removal. (Juergen Gross)

 - Clean up and fix FSGS related functionality (Andy Lutomirski)

 - Clean up IO access APIs (Andy Shevchenko)

 - Enhance the symbol namespace (Jiri Slaby)

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
  objtool: Handle GCC stack pointer adjustment bug
  x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()
  x86/fpu/math-emu: Add ENDPROC to functions
  x86/boot/64: Extract efi_pe_entry() from startup_64()
  x86/boot/32: Extract efi_pe_entry() from startup_32()
  x86/lguest: Remove lguest support
  x86/paravirt/xen: Remove xen_patch()
  objtool: Fix objtool fallthrough detection with function padding
  x86/xen/64: Fix the reported SS and CS in SYSCALL
  objtool: Track DRAP separately from callee-saved registers
  objtool: Fix validate_branch() return codes
  x86: Clarify/fix no-op barriers for text_poke_bp()
  x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
  selftests/x86/fsgsbase: Test selectors 1, 2, and 3
  x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
  x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
  x86/asm: Fix UNWIND_HINT_REGS macro for older binutils
  x86/asm/32: Fix regs_get_register() on segment registers
  x86/xen/64: Rearrange the SYSCALL entries
  x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads
  ...
2017-09-04 09:52:57 -07:00
Linus Torvalds f213a6c84c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - fix affine wakeups (Peter Zijlstra)

   - improve CPU onlining (and general bootup) scalability on systems
     with ridiculous number (thousands) of CPUs (Peter Zijlstra)

   - sched/numa updates (Rik van Riel)

   - sched/deadline updates (Byungchul Park)

   - sched/cpufreq enhancements and related cleanups (Viresh Kumar)

   - sched/debug enhancements (Xie XiuQi)

   - various fixes"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  sched/debug: Optimize sched_domain sysctl generation
  sched/topology: Avoid pointless rebuild
  sched/topology, cpuset: Avoid spurious/wrong domain rebuilds
  sched/topology: Improve comments
  sched/topology: Fix memory leak in __sdt_alloc()
  sched/completion: Document that reinit_completion() must be called after complete_all()
  sched/autogroup: Fix error reporting printk text in autogroup_create()
  sched/fair: Fix wake_affine() for !NUMA_BALANCING
  sched/debug: Intruduce task_state_to_char() helper function
  sched/debug: Show task state in /proc/sched_debug
  sched/debug: Use task_pid_nr_ns in /proc/$pid/sched
  sched/core: Remove unnecessary initialization init_idle_bootup_task()
  sched/deadline: Change return value of cpudl_find()
  sched/deadline: Make find_later_rq() choose a closer CPU in topology
  sched/numa: Scale scan period with tasks in group and shared/private
  sched/numa: Slow down scan rate if shared faults dominate
  sched/pelt: Fix false running accounting
  sched: Mark pick_next_task_dl() and build_sched_domain() as static
  sched/cpupri: Don't re-initialize 'struct cpupri'
  sched/deadline: Don't re-initialize 'struct cpudl'
  ...
2017-09-04 09:10:24 -07:00
Linus Torvalds 9657752cb5 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Add branch type profiling/tracing support. (Jin Yao)

   - Add the PERF_SAMPLE_PHYS_ADDR ABI to allow the tracing/profiling of
     physical memory addresses, where the PMU supports it. (Kan Liang)

   - Export some PMU capability details in the new
     /sys/bus/event_source/devices/cpu/caps/ sysfs directory. (Andi
     Kleen)

   - Aux data fixes and updates (Will Deacon)

   - kprobes fixes and updates (Masami Hiramatsu)

   - AMD uncore PMU driver fixes and updates (Janakarajan Natarajan)

  On the tooling side, here's a (limited!) list of highlights - there
  were many other changes that I could not list, see the shortlog and
  git history for details:

  UI improvements:

   - Implement a visual marker for fused x86 instructions in the
     annotate TUI browser, available now in 'perf report', more work
     needed to have it available as well in 'perf top' (Jin Yao)

     Further explanation from one of Jin's patches:

             │   ┌──cmpl   $0x0,argp_program_version_hook
       81.93 │   ├──je     20
             │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
             │   │↓ jne    29
             │   │↓ jmp    43
       11.47 │20:└─→cmpxch %esi,0x38a999(%rip)

     That means the cmpl+je is a fused instruction pair and they should
     be considered together.

   - Record the branch type and then show statistics and info about in
     callchain entries (Jin Yao)

     Example from one of Jin's patches:

        # perf record -g -j any,save_type
        # perf report --branch-history --stdio --no-children

        38.50%  div.c:45                [.] main                    div
                |
                ---main div.c:42 (RET CROSS_2M cycles:2)
                   compute_flag div.c:28 (cycles:2)
                   compute_flag div.c:27 (RET CROSS_2M cycles:1)
                   rand rand.c:28 (cycles:1)
                   rand rand.c:28 (RET CROSS_2M cycles:1)
                   __random random.c:298 (cycles:1)
                   __random random.c:297 (COND_BWD CROSS_2M cycles:1)
                   __random random.c:295 (cycles:1)
                   __random random.c:295 (COND_BWD CROSS_2M cycles:1)
                   __random random.c:295 (cycles:1)
                   __random random.c:295 (RET CROSS_2M cycles:9)

  namespaces support:

   - Add initial support for namespaces, using setns to access files in
     namespaces, grabbing their build-ids, etc. (Krister Johansen)

  perf trace enhancements:

   - Beautify pkey_{alloc,free,mprotect} arguments in 'perf trace'
     (Arnaldo Carvalho de Melo)

   - Add initial 'clone' syscall args beautifier in 'perf trace'
     (Arnaldo Carvalho de Melo)

   - Ignore 'fd' and 'offset' args for MAP_ANONYMOUS in 'perf trace'
     (Arnaldo Carvalho de Melo)

   - Beautifiers for the 'cmd' arg of several ioctl types, including:
     sound, DRM, KVM, vhost virtio and perf_events. (Arnaldo Carvalho de
     Melo)

   - Add PERF_SAMPLE_CALLCHAIN and PERF_RECORD_MMAP[2] to 'perf data'
     CTF conversion, allowing CTF trace visualization tools to show
     callchains and to resolve symbols (Geneviève Bastien)

   - Beautify the fcntl syscall, which is an interesting one in the
     sense that infrastructure had to be put in place to change the
     formatters of some arguments according to the value in a previous
     one, i.e. cmd dictates how arg and the syscall return will be
     formatted. (Arnaldo Carvalho de Melo

  perf stat enhancements:

   - Use group read for event groups in 'perf stat', reducing overhead
     when groups are defined in the event specification, i.e. when using
     {} to enclose a list of events, asking them to be read at the same
     time, e.g.: "perf stat -e '{cycles,instructions}'" (Jiri Olsa)

  pipe mode improvements:

   - Process tracing data in 'perf annotate' pipe mode (David
     Carrillo-Cisneros)

   - Add header record types to pipe-mode, now this command:

        $ perf record -o - -e cycles sleep 1 | perf report --stdio --header

     Will show the same as in non-pipe mode, i.e. involving a perf.data
     file (David Carrillo-Cisneros)

  Vendor specific hardware event support updates/enhancements:

   - Update POWER9 vendor events tables (Sukadev Bhattiprolu)

   - Add POWER9 PMU events Sukadev (Bhattiprolu)

   - Support additional POWER8+ PVR in PMU mapfile (Shriya)

   - Add Skylake server uncore JSON vendor events (Andi Kleen)

   - Support exporting Intel PT data to sqlite3 with python perf
     scripts, this is in addition to the postgresql support that was
     already there (Adrian Hunter)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (253 commits)
  perf symbols: Fix plt entry calculation for ARM and AARCH64
  perf probe: Fix kprobe blacklist checking condition
  perf/x86: Fix caps/ for !Intel
  perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR
  perf/core, pt, bts: Get rid of itrace_started
  perf trace beauty: Beautify pkey_{alloc,free,mprotect} arguments
  tools headers: Sync cpu features kernel ABI headers with tooling headers
  perf tools: Pass full path of FEATURES_DUMP
  perf tools: Robustify detection of clang binary
  tools lib: Allow external definition of CC, AR and LD
  perf tools: Allow external definition of flex and bison binary names
  tools build tests: Don't hardcode gcc name
  perf report: Group stat values on global event id
  perf values: Zero value buffers
  perf values: Fix allocation check
  perf values: Fix thread index bug
  perf report: Add dump_read function
  perf record: Set read_format for inherit_stat
  perf c2c: Fix remote HITM detection for Skylake
  perf tools: Fix static build with newer toolchains
  ...
2017-09-04 08:39:02 -07:00
Ingo Molnar edc2988c54 Merge branch 'linus' into locking/core, to fix up conflicts
Conflicts:
	mm/page_alloc.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-04 11:01:18 +02:00
Rafael J. Wysocki 01d2f105a4 Merge branches 'acpi-x86', 'acpi-soc', 'acpi-pmic' and 'acpi-apple'
* acpi-x86:
  ACPI / boot: Add number of legacy IRQs to debug output
  ACPI / boot: Correct address space of __acpi_map_table()
  ACPI / boot: Don't define unused variables

* acpi-soc:
  ACPI / LPSS: Don't abort ACPI scan on missing mem resource

* acpi-pmic:
  ACPI / PMIC: xpower: Do pinswitch magic when reading GPADC

* acpi-apple:
  spi: Use Apple device properties in absence of ACPI resources
  ACPI / scan: Recognize Apple SPI and I2C slaves
  ACPI / property: Support Apple _DSM properties
  ACPI / property: Don't evaluate objects for devices w/o handle
  treewide: Consolidate Apple DMI checks
2017-09-03 23:54:03 +02:00
Dan Williams 8f98ae0c9b Merge branch 'for-4.14/fs' into libnvdimm-for-next 2017-08-31 16:25:59 -07:00
Jérôme Glisse fb1522e099 KVM: update to new mmu_notifier semantic v2
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()

Remove now useless invalidate_page callback.

Changed since v1 (Linus Torvalds)
    - remove now useless kvm_arch_mmu_notifier_invalidate_page()

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Tested-by: Mike Galbraith <efault@gmx.de>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31 16:13:00 -07:00
Robin Murphy 5deb67f77a libnvdimm, nd_blk: remove mmio_flush_range()
mmio_flush_range() suffers from a lack of clearly-defined semantics,
and is somewhat ambiguous to port to other architectures where the
scope of the writeback implied by "flush" and ordering might matter,
but MMIO would tend to imply non-cacheable anyway. Per the rationale
in 67a3e8fe90 ("nd_blk: change aperture mapping from WC to WB"), the
only existing use is actually to invalidate clean cache lines for
ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
cleanup of the pmem API, that also now happens to be the exact purpose
of arch_invalidate_pmem(), which would be a far more well-defined tool
for the job.

Rather than risk potentially inconsistent implementations of
mmio_flush_range() for the sake of one callsite, streamline things by
removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
definitions up to the libnvdimm level, so they can be shared by NFIT
as well. This allows NFIT to be enabled for arm64.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-08-31 15:05:10 -07:00
Juergen Gross 5878d5d6fd x86/xen: Get rid of paravirt op adjust_exception_frame
When running as Xen pv-guest the exception frame on the stack contains
%r11 and %rcx additional to the other data pushed by the processor.

Instead of having a paravirt op being called for each exception type
prepend the Xen specific code to each exception entry. When running as
Xen pv-guest just use the exception entry with prepended instructions,
otherwise use the entry without the Xen specific code.

[ tglx: Merged through tip to avoid ugly merge conflict ]

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: boris.ostrovsky@oracle.com
Cc: luto@amacapital.net
Link: http://lkml.kernel.org/r/20170831174249.26853-1-jg@pfupf.net
2017-08-31 21:35:10 +02:00
Juergen Gross 882bbe56ae xen: remove unused function xen_set_domain_pte()
The function xen_set_domain_pte() is used nowhere in the kernel.
Remove it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-08-31 09:45:55 -04:00
Juergen Gross 82616f9599 xen: remove tests for pvh mode in pure pv paths
Remove the last tests for XENFEAT_auto_translated_physmap in pure
PV-domain specific paths. PVH V1 is gone and the feature will always
be "false" in PV guests.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-08-31 09:45:55 -04:00
Vitaly Kuznetsov 773b79f7a7 tracing/hyper-v: Trace hyperv_mmu_flush_tlb_others()
Add Hyper-V tracing subsystem and trace hyperv_mmu_flush_tlb_others().
Tracing is done the same way we do xen_mmu_flush_tlb_others().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170802160921.21791-10-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-31 14:20:37 +02:00
Vitaly Kuznetsov 628f54cc64 x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls
Hyper-V hosts may support more than 64 vCPUs, we need to use
HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX/LIST_EX hypercalls in this
case.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Xiao <sixiao@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170802160921.21791-9-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-31 14:20:36 +02:00
Ingo Molnar 3e83dfd5d8 Merge branch 'x86/mm' into x86/platform, to pick up TLB flush dependency
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-31 14:20:06 +02:00
Vitaly Kuznetsov 9e52fc2b50 x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
There's a subtle bug in how some of the paravirt guest code handles
page table freeing on x86:

On x86 software page table walkers depend on the fact that remote TLB flush
does an IPI: walk is performed lockless but with interrupts disabled and in
case the page table is freed the freeing CPU will get blocked as remote TLB
flush is required. On other architectures which don't require an IPI to do
remote TLB flush we have an RCU-based mechanism (see
include/asm-generic/tlb.h for more details).

In virtualized environments we may want to override the ->flush_tlb_others
callback in pv_mmu_ops and use a hypercall asking the hypervisor to do a
remote TLB flush for us. This breaks the assumption about IPIs. Xen PV has
been doing this for years and the upcoming remote TLB flush for Hyper-V will
do it too.

This is not safe, as software page table walkers may step on an already
freed page.

Fix the bug by enabling the RCU-based page table freeing mechanism,
CONFIG_HAVE_RCU_TABLE_FREE=y.

Testing with kernbench and mmap/munmap microbenchmarks, and neither showed
any noticeable performance impact.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: KY Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170828082251.5562-1-vkuznets@redhat.com
[ Rewrote/fixed/clarified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-31 11:07:07 +02:00
Thomas Gleixner 1d792a678c x86/idt: Remove the tracing IDT leftovers
Stephen reported a merge conflict with the XEN tree. That also shows that the
IDT cleanup forgot to remove the now unused trace_{trap} defines.

Remove them.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
2017-08-31 10:56:57 +02:00
Ingo Molnar e0563e0495 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 15:09:03 +02:00
Thomas Gleixner facaa3e3c8 x86/idt: Hide set_intr_gate()
set_intr_gate() is an internal function of the IDT code. The only user left
is the KVM code which replaces the pagefault handler eventually.

Provide an explicit update_intr_gate() function and make set_intr_gate()
static. While at it replace the magic number 14 in the KVM code with the
proper trap define.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064959.663008004@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:29 +02:00
Thomas Gleixner db18da78f9 x86/idt: Deinline setup functions
None of this is performance sensitive in any way - so debloat the kernel.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064959.502052875@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:28 +02:00
Thomas Gleixner 485fa57bd7 x86/idt: Remove unused functions/inlines
The IDT related inlines are not longer used. Remove them.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064959.422083717@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:28 +02:00
Thomas Gleixner 636a7598f6 x86/idt: Move APIC gate initialization to tables
Replace the APIC/SMP vector gate initialization with the table based
mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064959.260177013@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:28 +02:00
Thomas Gleixner b70543a0b2 x86/idt: Move regular trap init to tables
Initialize the regular traps with a table.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064959.182128165@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:27 +02:00
Thomas Gleixner 90f6225fba x86/idt: Move IST stack based traps to table init
Initialize the IST based traps via a table.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064959.091328949@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:27 +02:00
Thomas Gleixner 0a30908b91 x86/idt: Move debug stack init to table based
Add the debug_idt init table and make use of it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064959.006502252@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:27 +02:00
Thomas Gleixner 87e81786b1 x86/idt: Move early IDT setup out of 32-bit asm
The early IDT setup can be done in C code like it's done on 64-bit kernels.
Reuse the 64-bit version.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064958.757980775@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:26 +02:00
Thomas Gleixner 588787fde7 x86/idt: Move early IDT handler setup to IDT code
The early IDT handler setup is done in C entry code on 64-bit kernels and in
ASM entry code on 32-bit kernels.

Move the 64-bit variant to the IDT code so it can be shared with 32-bit
in the next step.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064958.679561404@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:26 +02:00
Thomas Gleixner e802a51ede x86/idt: Consolidate IDT invalidation
kexec and reboot have both code to invalidate IDT. Create a common function
and use it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064958.600953282@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:26 +02:00
Thomas Gleixner 8f55868f9e x86/idt: Remove unused set_trap_gate()
This inline is not used at all.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064958.522053134@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:26 +02:00
Thomas Gleixner 87cc037674 x86/ldttss: Clean up 32-bit descriptors
Like the IDT descriptors, the LDT/TSS descriptors are pointlessly different
on 32 and 64 bit kernels.

Unify them and get rid of the duplicated code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064958.289634692@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:25 +02:00
Thomas Gleixner 38e9e81f4c x86/gdt: Use bitfields for initialization
The GDT entry related code uses two ways to access entries via
union fields:

 - bitfields

 - macros which initialize the two 16-bit parts of the entry
   by magic shift and mask operations.

Clean it up and only use the bitfields to initialize and access entries.

( The old access patterns were partly done due to GCC optimizing bitfield
  accesses in a horrible way - that's mostly fixed these days and clarity
  of code in such low level accessors is very important. )

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064958.197673367@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:25 +02:00
Thomas Gleixner 9a98e77800 x86/asm: Replace access to desc_struct:a/b fields
The union inside of desc_struct which allows access to the raw u32 parts of
the descriptors. This raw access part is about to go away.

Replace the few code parts which access those fields.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064958.120214366@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:25 +02:00
Thomas Gleixner 64b163fab6 x86/idt: Unify gate_struct handling for 32/64-bit kernels
The first 32 bits of gate struct are the same for 32 and 64 bit kernels.

The 32-bit version uses desc_struct and no designated data structure,
so we need different accessors for 32 and 64 bit kernels.

Aside of that the macros which are necessary to build the 32-bit
gate descriptor are horrible to read.

Unify the gate structs and switch all code fiddling with it over.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.861974317@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:24 +02:00
Thomas Gleixner 7328552780 x86/tracing: Build tracepoints only when they are used
The tracepoint macro magic emits code for all tracepoints in a event header
file. That code stays around even if the tracepoint is not used at all. The
linker does not discard it.

Build the various irq_vector tracepoints dependent on the appropriate CONFIG
switches.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.770651777@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 12:07:24 +02:00
Thomas Gleixner a45525b5b4 x86/irq_work: Make it depend on APIC
The irq work interrupt vector is only installed when CONFIG_X86_LOCAL_APIC is
enabled, but the interrupt handler is compiled in unconditionally.

Compile the cruft out when the APIC is disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.691909010@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:30 +02:00
Thomas Gleixner 0428e01a2f x86/ipi: Make platform IPI depend on APIC
The platform IPI vector is only installed when the local APIC is enabled. All
users of it depend on the local APIC anyway.

Make the related code conditional on CONFIG_X86_LOCAL_APIC=y.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.615286163@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:29 +02:00
Thomas Gleixner 809547472e x86/tracing: Disentangle pagefault and resched IPI tracing key
The pagefault and the resched IPI handler are the only ones where it is
worth to optimize the code further in case tracepoints are disabled. But it
makes no sense to have a single static key for both.

Seperate the static keys so the facilities are handled seperately.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.536699116@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:29 +02:00
Thomas Gleixner 6f54f3ec6c x86/idt: Clean up the i386 low level entry macros
Some of the entry function defines for i386 were explictely using the
BUILD_INTERRUPT3() macro to prevent that the extra trace entry got added
via BUILD_INTERRUPT(). No that the trace cruft is gone, the file can be
cleaned up and converted to use BUILD_INTERRUPT() which avoids the ugly
line breaks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.456815006@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:28 +02:00
Thomas Gleixner 4b9a8dca0e x86/idt: Remove the tracing IDT completely
No more users of the tracing IDT. All exception tracepoints have been moved
into the regular handlers. Get rid of the mess which shouldn't have been
created in the first place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.378851687@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:28 +02:00
Thomas Gleixner 3cd788c1ee x86/smp: Use static key for reschedule interrupt tracing
It's worth to avoid the extra irq_enter()/irq_exit() pair in the case that
the reschedule interrupt tracepoints are disabled.

Use the static key which indicates that exception tracing is enabled. For
now this key is global. It will be optimized in a later step.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170828064957.299808677@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:27 +02:00
Thomas Gleixner 85b77cdd8f x86/smp: Remove pointless duplicated interrupt code
Two NOP5s are really a good tradeoff vs. the unholy IDT switching mess,
which duplicates code all over the place. The rescheduling interrupt gets
optimized in a later step.

Make the ordering of function call and statistics increment the same as in
other places. Calculate stats first, then do the function call.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.222101344@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:27 +02:00
Thomas Gleixner 0f42ae283c x86/mce: Remove duplicated tracing interrupt code
Machine checks are not really high frequency events. The extra two NOP5s for
the disabled tracepoints are noise vs. the heavy lifting which needs to be
done in the MCE handler.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.144301907@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:26 +02:00
Thomas Gleixner daabb8eb9a x86/irqwork: Get rid of duplicated tracing interrupt code
Two NOP5s are a reasonable tradeoff to avoid duplicated code and the
requirement to switch the IDT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.064746737@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:26 +02:00
Thomas Gleixner 61069de7a3 x86/apic: Remove the duplicated tracing versions of interrupts
The error and the spurious interrupt are really rare events and not at all
performance sensitive: two NOP5s can be tolerated when tracing is disabled.

Remove the complication.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170828064956.986009402@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:25 +02:00
Thomas Gleixner 8a17116b1f x86/irq: Get rid of duplicated trace_x86_platform_ipi() code
Two NOP5s are really a good tradeoff vs. the unholy IDT switching mess,
which duplicates code all over the place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170828064956.907209383@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:25 +02:00
Thomas Gleixner 302a98f896 x86/apic: Remove the duplicated tracing version of local_timer_interrupt()
The two NOP5s are noise in the rest of the work which is done by the timer
interrupt and modern CPUs are pretty good in optimizing NOPs anyway.

Get rid of the interrupt handler duplication and move the tracepoints into
the regular handler.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170828064956.751247330@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:24 +02:00
Thomas Gleixner 11a7ffb017 x86/traps: Simplify pagefault tracing logic
Make use of the new irqvector tracing static key and remove the duplicated
trace_do_pagefault() implementation.

If irq vector tracing is disabled, then the overhead of this is a single
NOP5, which is a reasonable tradeoff to avoid duplicated code and the
unholy macro mess.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064956.672965407@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:23 +02:00
Thomas Gleixner 2feb1b316d x86/tracing: Introduce a static key for exception tracing
Switching the IDT just for avoiding tracepoints creates a completely
impenetrable macro/inline/ifdef mess.

There is no point in avoiding tracepoints for most of the traps/exceptions.
For the more expensive tracepoints, like pagefaults, this can be handled with
an explicit static key.

Preparatory patch to remove the tracing IDT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064956.593094539@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:23 +02:00
Thomas Gleixner 9aec458ff0 x86/irq: Remove duplicated used_vectors definition
Also remove the unparseable comment in the other place while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064956.436711634@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:21 +02:00
Thomas Gleixner 05161b9cbe x86/irq: Get rid of the 'first_system_vector' indirection bogosity
This variable is beyond pointless. Nothing allocates a vector via
alloc_gate() below FIRST_SYSTEM_VECTOR. So nothing can change
first_system_vector.

If there is a need for a gate below FIRST_SYSTEM_VECTOR then it can be
added to the vector defines and FIRST_SYSTEM_VECTOR can be adjusted
accordingly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064956.357109735@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:21 +02:00
Thomas Gleixner 69de72ec6d x86/irq: Remove vector_used_by_percpu_irq()
Last user (lguest) is gone. Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064956.201432430@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:20 +02:00
Ingo Molnar 416b0c0faf Merge branch 'linus' into x86/apic, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:42:07 +02:00
Greg Kroah-Hartman 9749c37275 Merge 4.13-rc7 into char-misc-next
We want the binder fix in here as well for testing and merge issues.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 10:19:01 +02:00
Linus Torvalds c153e62105 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two fixes: one for an ldt_struct handling bug and a cherry-picked
  objtool fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix use-after-free of ldt_struct
  objtool: Fix '-mtune=atom' decoding support in objtool 2.0
2017-08-26 09:06:28 -07:00
Ingo Molnar 413d63d71b Merge branch 'linus' into x86/mm to pick up fixes and to fix conflicts
Conflicts:
	arch/x86/kernel/head64.c
	arch/x86/mm/mmap.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-26 09:19:13 +02:00
Jiri Slaby 30d6e0a419 futex: Remove duplicated code and fix undefined behaviour
There is code duplicated over all architecture's headers for
futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
and comparison of the result.

Remove this duplication and leave up to the arches only the needed
assembly which is now in arch_futex_atomic_op_inuser.

This effectively distributes the Will Deacon's arm64 fix for undefined
behaviour reported by UBSAN to all architectures. The fix was done in
commit 5f16a046f8 (arm64: futex: Fix undefined behaviour with
FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.

And as suggested by Thomas, check for negative oparg too, because it was
also reported to cause undefined behaviour report.

Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
remove pointless access_ok() checks") as access_ok there returns true.
We introduce it back to the helper for the sake of simplicity (it gets
optimized away anyway).

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390]
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org>
Reviewed-by: Will Deacon <will.deacon@arm.com> [core/arm64]
Cc: linux-mips@linux-mips.org
Cc: Rich Felker <dalias@libc.org>
Cc: linux-ia64@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: peterz@infradead.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: sparclinux@vger.kernel.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-hexagon@vger.kernel.org
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-xtensa@linux-xtensa.org
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: openrisc@lists.librecores.org
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Stafford Horne <shorne@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Richard Henderson <rth@twiddle.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-parisc@vger.kernel.org
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-alpha@vger.kernel.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz
2017-08-25 22:49:59 +02:00
Ingo Molnar 3a9ff4fd04 Merge branch 'linus' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25 11:07:13 +02:00
Ingo Molnar 10c9850cb2 Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25 11:04:51 +02:00
Eric Biggers ccd5b32351 x86/mm: Fix use-after-free of ldt_struct
The following commit:

  39a0526fb3 ("x86/mm: Factor out LDT init from context init")

renamed init_new_context() to init_new_context_ldt() and added a new
init_new_context() which calls init_new_context_ldt().  However, the
error code of init_new_context_ldt() was ignored.  Consequently, if a
memory allocation in alloc_ldt_struct() failed during a fork(), the
->context.ldt of the new task remained the same as that of the old task
(due to the memcpy() in dup_mm()).  ldt_struct's are not intended to be
shared, so a use-after-free occurred after one task exited.

Fix the bug by making init_new_context() pass through the error code of
init_new_context_ldt().

This bug was found by syzkaller, which encountered the following splat:

    BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
    Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710

    CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
     __dump_stack lib/dump_stack.c:16 [inline]
     dump_stack+0x194/0x257 lib/dump_stack.c:52
     print_address_description+0x73/0x250 mm/kasan/report.c:252
     kasan_report_error mm/kasan/report.c:351 [inline]
     kasan_report+0x24e/0x340 mm/kasan/report.c:409
     __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
     free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
     free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
     destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
     destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
     __mmdrop+0xe9/0x530 kernel/fork.c:889
     mmdrop include/linux/sched/mm.h:42 [inline]
     exec_mmap fs/exec.c:1061 [inline]
     flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291
     load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855
     search_binary_handler+0x142/0x6b0 fs/exec.c:1652
     exec_binprm fs/exec.c:1694 [inline]
     do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816
     do_execve+0x31/0x40 fs/exec.c:1860
     call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
     ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

    Allocated by task 3700:
     save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
     save_stack+0x43/0xd0 mm/kasan/kasan.c:447
     set_track mm/kasan/kasan.c:459 [inline]
     kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
     kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627
     kmalloc include/linux/slab.h:493 [inline]
     alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67
     write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277
     sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307
     entry_SYSCALL_64_fastpath+0x1f/0xbe

    Freed by task 3700:
     save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
     save_stack+0x43/0xd0 mm/kasan/kasan.c:447
     set_track mm/kasan/kasan.c:459 [inline]
     kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
     __cache_free mm/slab.c:3503 [inline]
     kfree+0xca/0x250 mm/slab.c:3820
     free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121
     free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
     destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
     destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
     __mmdrop+0xe9/0x530 kernel/fork.c:889
     mmdrop include/linux/sched/mm.h:42 [inline]
     __mmput kernel/fork.c:916 [inline]
     mmput+0x541/0x6e0 kernel/fork.c:927
     copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931
     copy_process kernel/fork.c:1546 [inline]
     _do_fork+0x1ef/0xfb0 kernel/fork.c:2025
     SYSC_clone kernel/fork.c:2135 [inline]
     SyS_clone+0x37/0x50 kernel/fork.c:2129
     do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287
     return_from_SYSCALL_64+0x0/0x7a

Here is a C reproducer:

    #include <asm/ldt.h>
    #include <pthread.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <sys/syscall.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void *fork_thread(void *_arg)
    {
        fork();
    }

    int main(void)
    {
        struct user_desc desc = { .entry_number = 8191 };

        syscall(__NR_modify_ldt, 1, &desc, sizeof(desc));

        for (;;) {
            if (fork() == 0) {
                pthread_t t;

                srand(getpid());
                pthread_create(&t, NULL, fork_thread, NULL);
                usleep(rand() % 10000);
                syscall(__NR_exit_group, 0);
            }
            wait(NULL);
        }
    }

Note: the reproducer takes advantage of the fact that alloc_ldt_struct()
may use vmalloc() to allocate a large ->entries array, and after
commit:

  5d17a73a2e ("vmalloc: back off when the current task is killed")

it is possible for userspace to fail a task's vmalloc() by
sending a fatal signal, e.g. via exit_group().  It would be more
difficult to reproduce this bug on kernels without that commit.

This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org> [v4.6+]
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Fixes: 39a0526fb3 ("x86/mm: Factor out LDT init from context init")
Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25 09:55:52 +02:00
Paolo Bonzini 38cfd5e3df KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state
The host pkru is restored right after vcpu exit (commit 1be0e61), so
KVM_GET_XSAVE will return the host PKRU value instead.  Fix this by
using the guest PKRU explicitly in fill_xsave and load_xsave.  This
part is based on a patch by Junkang Fu.

The host PKRU data may also not match the value in vcpu->arch.guest_fpu.state,
because it could have been changed by userspace since the last time
it was saved, so skip loading it in kvm_load_guest_fpu.

Reported-by: Junkang Fu <junkang.fjk@alibaba-inc.com>
Cc: Yang Zhang <zy107165@alibaba-inc.com>
Fixes: 1be0e61c1f
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-25 09:28:37 +02:00
Paolo Bonzini b9dd21e104 KVM: x86: simplify handling of PKRU
Move it to struct kvm_arch_vcpu, replacing guest_pkru_valid with a
simple comparison against the host value of the register.  The write of
PKRU in addition can be skipped if the guest has not enabled the feature.
Once we do this, we need not test OSPKE in the host anymore, because
guest_CR4.PKE=1 implies host_CR4.PKE=1.

The static PKU test is kept to elide the code on older CPUs.

Suggested-by: Yang Zhang <zy107165@alibaba-inc.com>
Fixes: 1be0e61c1f
Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-25 09:28:28 +02:00
Thomas Gleixner c0bb80cfa3 Merge branch 'x86/asm' into x86/apic
Pick up dependent changes to avoid merge conflicts
2017-08-25 08:56:22 +02:00
Wanpeng Li 664f8e26b0 KVM: X86: Fix loss of exception which has not yet been injected
vmx_complete_interrupts() assumes that the exception is always injected,
so it can be dropped by kvm_clear_exception_queue().  However,
an exception cannot be injected immediately if it is: 1) originally
destined to a nested guest; 2) trapped to cause a vmexit; 3) happening
right after VMLAUNCH/VMRESUME, i.e. when nested_run_pending is true.

This patch applies to exceptions the same algorithm that is used for
NMIs, replacing exception.reinject with "exception.injected" (equivalent
to nmi_injected).

exception.pending now represents an exception that is queued and whose
side effects (e.g., update RFLAGS.RF or DR7) have not been applied yet.
If exception.pending is true, the exception might result in a nested
vmexit instead, too (in which case the side effects must not be applied).

exception.injected instead represents an exception that is going to be
injected into the guest at the next vmentry.

Reported-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-24 18:09:19 +02:00
Yu Zhang fd8cb43373 KVM: MMU: Expose the LA57 feature to VM.
This patch exposes 5 level page table feature to the VM.
At the same time, the canonical virtual address checking is
extended to support both 48-bits and 57-bits address width.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-24 18:09:17 +02:00
Yu Zhang 855feb6736 KVM: MMU: Add 5 level EPT & Shadow page table support.
Extends the shadow paging code, so that 5 level shadow page
table can be constructed if VM is running in 5 level paging
mode.

Also extends the ept code, so that 5 level ept table can be
constructed if maxphysaddr of VM exceeds 48 bits. Unlike the
shadow logic, KVM should still use 4 level ept table for a VM
whose physical address width is less than 48 bits, even when
the VM is running in 5 level paging mode.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
[Unconditionally reset the MMU context in kvm_cpuid_update.
 Changing MAXPHYADDR invalidates the reserved bit bitmasks.
 - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-24 18:09:17 +02:00
Yu Zhang 2a7266a8f9 KVM: MMU: Rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL.
Now we have 4 level page table and 5 level page table in 64 bits
long mode, let's rename the PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL,
then we can use PT64_ROOT_5LEVEL for 5 level page table, it's
helpful to make the code more clear.

Also PT64_ROOT_MAX_LEVEL is defined as 4, so that we can just
redefine it to 5 whenever a replacement is needed for 5 level
paging.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-24 18:09:16 +02:00
Yu Zhang d1cd3ce900 KVM: MMU: check guest CR3 reserved bits based on its physical address width.
Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the
reserved bits in CR3. Yet the length of reserved bits in
guest CR3 should be based on the physical address width
exposed to the VM. This patch changes CR3 check logic to
calculate the reserved bits at runtime.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-24 18:09:16 +02:00