Commit Graph

35823 Commits

Author SHA1 Message Date
Peter Zijlstra bf2b300844 x86/entry: Rename trace_hardirqs_off_prepare()
The typical pattern for trace_hardirqs_off_prepare() is:

  ENTRY
    lockdep_hardirqs_off(); // because hardware
    ... do entry magic
    instrumentation_begin();
    trace_hardirqs_off_prepare();
    ... do actual work
    trace_hardirqs_on_prepare();
    lockdep_hardirqs_on_prepare();
    instrumentation_end();
    ... do exit magic
    lockdep_hardirqs_on();

which shows that it's named wrong, rename it to
trace_hardirqs_off_finish(), as it concludes the hardirq_off transition.

Also, given that the above is the only correct order, make the traditional
all-in-one trace_hardirqs_off() follow suit.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.415774872@infradead.org
2020-06-11 15:15:24 +02:00
Peter Zijlstra 59bc300b71 x86/entry: Clarify irq_{enter,exit}_rcu()
Because:

  irq_enter_rcu() includes lockdep_hardirq_enter()
  irq_exit_rcu() does *NOT* include lockdep_hardirq_exit()

Which resulted in two 'stray' lockdep_hardirq_exit() calls in
idtentry.h, and me spending a long time trying to find the matching
enter calls.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.359433429@infradead.org
2020-06-11 15:15:24 +02:00
Peter Zijlstra fd501d4f03 x86/entry: Remove DBn stacks
Both #DB itself, as all other IST users (NMI, #MC) now clear DR7 on
entry. Combined with not allowing breakpoints on entry/noinstr/NOKPROBE
text and no single step (EFLAGS.TF) inside the #DB handler should guarantee
no nested #DB.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.303027161@infradead.org
2020-06-11 15:15:23 +02:00
Peter Zijlstra f9912ada82 x86/entry: Remove debug IDT frobbing
This is all unused now.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.245019500@infradead.org
2020-06-11 15:15:23 +02:00
Peter Zijlstra 84b6a34915 x86/entry: Optimize local_db_save() for virt
Because DRn access is 'difficult' with virt; but the DR7 read is cheaper
than a cacheline miss on native, add a virt specific fast path to
local_db_save(), such that when breakpoints are not in use to avoid
touching DRn entirely.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.187833200@infradead.org
2020-06-11 15:15:22 +02:00
Peter Zijlstra cd840e424f x86/entry, mce: Disallow #DB during #MC
#MC is fragile as heck, don't tempt fate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.131187767@infradead.org
2020-06-11 15:15:22 +02:00
Peter Zijlstra fd338e3564 x86/entry, nmi: Disable #DB
Instead of playing stupid games with IST stacks, fully disallow #DB
during NMIs. There is absolutely no reason to allow them, and killing
this saves a heap of trouble.

#DB is already forbidden on noinstr and CEA, so there can't be a #DB before
this. Disabling it right after nmi_enter() ensures that the full NMI code
is protected.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.069223695@infradead.org
2020-06-11 15:15:22 +02:00
Peter Zijlstra e1de11d4d1 x86/entry: Introduce local_db_{save,restore}()
In order to allow other exceptions than #DB to disable breakpoints,
provide common helpers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.012060983@infradead.org
2020-06-11 15:15:21 +02:00
Lai Jiangshan fdef24dfcc x86/hw_breakpoint: Prevent data breakpoints on user_pcid_flush_mask
The per-CPU user_pcid_flush_mask is used in the low level entry code. A
data breakpoint can cause #DB recursion. 

Protect the full cpu_tlbstate structure for simplicity.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200526014221.2119-5-laijs@linux.alibaba.com
Link: https://lkml.kernel.org/r/20200529213320.955117574@infradead.org
2020-06-11 15:15:21 +02:00
Lai Jiangshan f9fe0b89f0 x86/hw_breakpoint: Prevent data breakpoints on per_cpu cpu_tss_rw
cpu_tss_rw is not directly referenced by hardware, but cpu_tss_rw is
accessed in CPU entry code, especially when #DB shifts its stacks.

If a data breakpoint would be set on cpu_tss_rw.x86_tss.ist[IST_INDEX_DB],
it would cause recursive #DB ending up in a double fault.

Add it to the list of protected items.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200526014221.2119-4-laijs@linux.alibaba.com
Link: https://lkml.kernel.org/r/20200529213320.897976479@infradead.org
2020-06-11 15:15:21 +02:00
Lai Jiangshan 97417cb9ad x86/hw_breakpoint: Prevent data breakpoints on direct GDT
A data breakpoint on the GDT can be fatal and must be avoided.  The GDT in
the CPU entry area is already protected, but not the direct GDT.

Add the necessary protection.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200526014221.2119-3-laijs@linux.alibaba.com
Link: https://lkml.kernel.org/r/20200529213320.840953950@infradead.org
2020-06-11 15:15:20 +02:00
Lai Jiangshan d390e6de89 x86/hw_breakpoint: Add within_area() to check data breakpoints
Add a within_area() helper to checking whether the data breakpoints overlap
with cpu_entry_area.

It will be used to completely prevent data breakpoints on GDT, IDT, or TSS.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200526014221.2119-2-laijs@linux.alibaba.com
Link: https://lkml.kernel.org/r/20200529213320.784524504@infradead.org
2020-06-11 15:15:20 +02:00
Vitaly Kuznetsov 998c2034c6 xen: Move xen_setup_callback_vector() definition to include/xen/hvm.h
Kbuild test robot reports the following problem on ARM:

  for 'xen_setup_callback_vector' [-Wmissing-prototypes]
1664 | void xen_setup_callback_vector(void) {}
|      ^~~~~~~~~~~~~~~~~~~~~~~~~

The problem is that xen_setup_callback_vector is a x86 only thing, its
definition is present in arch/x86/xen/xen-ops.h but not on ARM. In
events_base.c there is a stub for !CONFIG_XEN_PVHVM but it is not declared
as 'static'.

On x86 the situation is hardly better: drivers/xen/events/events_base.c
doesn't include 'xen-ops.h' from arch/x86/xen/, it includes its namesake
from include/xen/ which also results in a 'no previous prototype' warning.

Currently, xen_setup_callback_vector() has two call sites: one in
drivers/xen/events_base.c and another in arch/x86/xen/suspend_hvm.c. The
former is placed under #ifdef CONFIG_X86 and the later is only compiled
in when CONFIG_XEN_PVHVM.

Resolve the issue by moving xen_setup_callback_vector() declaration to
arch neutral 'include/xen/hvm.h' as the implementation lives in arch
neutral drivers/xen/events/events_base.c.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20200520161600.361895-1-vkuznets@redhat.com
2020-06-11 15:15:19 +02:00
Thomas Gleixner 320100a5ff x86/entry: Remove the TRACE_IRQS cruft
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202120.523289762@linutronix.de
2020-06-11 15:15:19 +02:00
Thomas Gleixner 3ffdfdcec1 x86/entry: Move paranoid irq tracing out of ASM code
The last step to remove the irq tracing cruft from ASM. Ignore #DF as the
maschine is going to die anyway.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202120.414043330@linutronix.de
2020-06-11 15:15:19 +02:00
Thomas Gleixner 9628f26bae x86/entry/64: Remove TRACE_IRQS_*_DEBUG
Since INT3/#BP no longer runs on an IST, this workaround is no longer
required.

Tested by running lockdep+ftrace as described in the initial commit:

  5963e317b1 ("ftrace/x86: Do not change stacks in DEBUG when calling lockdep")

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202120.319418546@linutronix.de
2020-06-11 15:15:18 +02:00
Thomas Gleixner fa95a0cb04 x86/entry/32: Remove redundant irq disable code
All exceptions/interrupts return with interrupts disabled now. No point in
doing this in ASM again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202120.221223450@linutronix.de
2020-06-11 15:15:18 +02:00
Thomas Gleixner 3b6c9bf69e x86/entry: Make enter_from_user_mode() static
The ASM users are gone. All callers are local.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202120.129232680@linutronix.de
2020-06-11 15:15:17 +02:00
Thomas Gleixner e3e5c64ea1 x86/entry/64: Remove IRQ stack switching ASM
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202120.021462159@linutronix.de
2020-06-11 15:15:17 +02:00
Thomas Gleixner 75da04f7f3 x86/entry: Remove the apic/BUILD interrupt leftovers
Remove all the code which was there to emit the system vector stubs. All
users are gone.

Move the now unused GET_CR2_INTO macro muck to head_64.S where the last
user is. Fixup the eye hurting comment there while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.927433002@linutronix.de
2020-06-11 15:15:16 +02:00
Thomas Gleixner 13cad9851e x86/entry: Convert reschedule interrupt to IDTENTRY_SYSVEC_SIMPLE
The scheduler IPI does not need the full interrupt entry handling logic
when the entry is from kernel mode. Use IDTENTRY_SYSVEC_SIMPLE and spare
all the overhead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.835425642@linutronix.de
2020-06-11 15:15:16 +02:00
Thomas Gleixner cb09ea2924 x86/entry: Convert XEN hypercall vector to IDTENTRY_SYSVEC
Convert the last oldstyle defined vector to IDTENTRY_SYSVEC:

  - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
  - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
  - Remove the ASM idtentries in 64-bit
  - Remove the BUILD_INTERRUPT entries in 32-bit
  - Remove the old prototypes

Fixup the related XEN code by providing the primary C entry point in x86 to
avoid cluttering the generic code with X86'isms.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.741950104@linutronix.de
2020-06-11 15:15:15 +02:00
Thomas Gleixner a16be368dd x86/entry: Convert various hypervisor vectors to IDTENTRY_SYSVEC
Convert various hypervisor vectors to IDTENTRY_SYSVEC:

  - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
  - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
  - Remove the ASM idtentries in 64-bit
  - Remove the BUILD_INTERRUPT entries in 32-bit
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.647997594@linutronix.de
2020-06-11 15:15:15 +02:00
Thomas Gleixner 9c3b1f4975 x86/entry: Convert KVM vectors to IDTENTRY_SYSVEC*
Convert KVM specific system vectors to IDTENTRY_SYSVEC*:

The two empty stub handlers which only increment the stats counter do no
need to run on the interrupt stack. Use IDTENTRY_SYSVEC_SIMPLE for them.

The wakeup handler does more work and runs on the interrupt stack.

None of these handlers need to save and restore the irq_regs pointer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.555715519@linutronix.de
2020-06-11 15:15:15 +02:00
Thomas Gleixner 720909a7ab x86/entry: Convert various system vectors
Convert various system vectors to IDTENTRY_SYSVEC:

  - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
  - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
  - Remove the ASM idtentries in 64-bit
  - Remove the BUILD_INTERRUPT entries in 32-bit
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.464812973@linutronix.de
2020-06-11 15:15:14 +02:00
Thomas Gleixner 582f919123 x86/entry: Convert SMP system vectors to IDTENTRY_SYSVEC
Convert SMP system vectors to IDTENTRY_SYSVEC:

  - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
  - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
  - Remove the ASM idtentries in 64-bit
  - Remove the BUILD_INTERRUPT entries in 32-bit
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.372234635@linutronix.de
2020-06-11 15:15:14 +02:00
Thomas Gleixner db0338eec5 x86/entry: Convert APIC interrupts to IDTENTRY_SYSVEC
Convert APIC interrupts to IDTENTRY_SYSVEC:

  - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
  - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
  - Remove the ASM idtentries in 64-bit
  - Remove the BUILD_INTERRUPT entries in 32-bit
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.280728850@linutronix.de
2020-06-11 15:15:13 +02:00
Thomas Gleixner 6368558c37 x86/entry: Provide IDTENTRY_SYSVEC
Provide IDTENTRY variants for system vectors to consolidate the different
mechanisms to emit the ASM stubs for 32- and 64-bit.

On 64-bit this also moves the stack switching from ASM to C code. 32-bit will
excute the system vectors w/o stack switching as before.

The simple variant is meant for "empty" system vectors like scheduler IPI
and KVM posted interrupt vectors. These do not need the full glory of irq
enter/exit handling with softirq processing and more.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.185317067@linutronix.de
2020-06-11 15:15:13 +02:00
Thomas Gleixner fa5e5c4092 x86/entry: Use idtentry for interrupts
Replace the extra interrupt handling code and reuse the existing idtentry
machinery. This moves the irq stack switching on 64-bit from ASM to C code;
32-bit already does the stack switching in C.

This requires to remove HAVE_IRQ_EXIT_ON_IRQ_STACK as the stack switch is
not longer in the low level entry code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.078690991@linutronix.de
2020-06-11 15:15:12 +02:00
Thomas Gleixner 0bf7c314ff x86/entry: Add IRQENTRY_IRQ macro
Provide a seperate IDTENTRY macro for device interrupts. Similar to
IDTENTRY_ERRORCODE with the addition of invoking irq_enter/exit_rcu() and
providing the errorcode as a 'u8' argument to the C function, which
truncates the sign extended vector number.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.984573165@linutronix.de
2020-06-11 15:15:12 +02:00
Thomas Gleixner 7c2a57364c x86/irq: Rework handle_irq() for 64-bit
To consolidate the interrupt entry/exit code vs. the other exceptions
make handle_irq() an inline and handle both 64-bit and 32-bit mode.

Preparatory change to move irq stack switching for 64-bit to C which allows
to consolidate the entry exit handling by reusing the idtentry machinery
both in ASM and C.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.889972748@linutronix.de
2020-06-11 15:15:12 +02:00
Thomas Gleixner 633260fa14 x86/irq: Convey vector as argument and not in ptregs
Device interrupts which go through do_IRQ() or the spurious interrupt
handler have their separate entry code on 64 bit for no good reason.

Both 32 and 64 bit transport the vector number through ORIG_[RE]AX in
pt_regs. Further the vector number is forced to fit into an u8 and is
complemented and offset by 0x80 so it's in the signed character
range. Otherwise GAS would expand the pushq to a 5 byte instruction for any
vector > 0x7F.

Treat the vector number like an error code and hand it to the C function as
argument. This allows to get rid of the extra entry code in a later step.

Simplify the error code push magic by implementing the pushq imm8 via a
'.byte 0x6a, vector' sequence so GAS is not able to screw it up. As the
pushq imm8 is sign extending the resulting error code needs to be truncated
to 8 bits in C code.

Originally-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.796915981@linutronix.de
2020-06-11 15:15:11 +02:00
Thomas Gleixner 79b9c18302 x86/irq: Use generic irq_regs implementation
The only difference is the name of the per-CPU variable: irq_regs
vs. __irq_regs, but the accessor functions are identical.

Remove the pointless copy and use the generic variant.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.704169051@linutronix.de
2020-06-11 15:15:11 +02:00
Thomas Gleixner 74ebed3193 x86/entry/32: Remove common_exception()
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.611906966@linutronix.de
2020-06-11 15:15:11 +02:00
Thomas Gleixner 23d73f2ad4 x86/entry/64: Remove error_exit()
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.516757524@linutronix.de
2020-06-11 15:15:10 +02:00
Thomas Gleixner e88d974136 x86/entry: Change exit path of xen_failsafe_callback
xen_failsafe_callback() is invoked from XEN for two cases:

  1. Fault while reloading DS, ES, FS or GS
  2. Fault while executing IRET

 #1 retries the IRET after XEN has fixed up the segments.
 #2 injects a #GP which kills the task

For #1 there is no reason to go through the full exception return path
because the tasks TIF state is still the same. So just going straight to
the IRET path is good enough.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.423224507@linutronix.de
2020-06-11 15:15:10 +02:00
Thomas Gleixner e2dcb5f139 x86/entry: Remove the transition leftovers
Now that all exceptions are converted over the sane flag is not longer
needed. Also the vector argument of idtentry_body on 64-bit is pointless
now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.331115895@linutronix.de
2020-06-11 15:15:09 +02:00
Thomas Gleixner 91eeafea1e x86/entry: Switch page fault exception to IDTENTRY_RAW
Convert page fault exceptions to IDTENTRY_RAW:

  - Implement the C entry point with DEFINE_IDTENTRY_RAW
  - Add the CR2 read into the exception handler
  - Add the idtentry_enter/exit_cond_rcu() invocations in
    in the regular page fault handler and in the async PF
    part.
  - Emit the ASM stub with DECLARE_IDTENTRY_RAW
  - Remove the ASM idtentry in 64-bit
  - Remove the CR2 read from 64-bit
  - Remove the open coded ASM entry code in 32-bit
  - Fix up the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.238455120@linutronix.de
2020-06-11 15:15:09 +02:00
Thomas Gleixner 00cf8baf9c x86/entry/64: Simplify idtentry_body
All C functions which do not have an error code have been converted to the
new IDTENTRY interface which does not expect an error code in the
arguments. Spare the XORL.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202118.145811853@linutronix.de
2020-06-11 15:15:08 +02:00
Thomas Gleixner 2f6474e463 x86/entry: Switch XEN/PV hypercall entry to IDTENTRY
Convert the XEN/PV hypercall to IDTENTRY:

  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64-bit
  - Remove the open coded ASM entry code in 32-bit
  - Remove the old prototypes

The handler stubs need to stay in ASM code as they need corner case handling
and adjustment of the stack pointer.

Provide a new C function which invokes the entry/exit handling and calls
into the XEN handler on the interrupt stack if required.

The exit code is slightly different from the regular idtentry_exit() on
non-preemptible kernels. If the hypercall is preemptible and need_resched()
is set then XEN provides a preempt hypercall scheduling function.

Move this functionality into the entry code so it can use the existing
idtentry functionality.

[ mingo: Build fixes. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20200521202118.055270078@linutronix.de
2020-06-11 15:15:08 +02:00
Thomas Gleixner 1de16e0c17 x86/entry: Split out idtentry_exit_cond_resched()
The XEN PV hypercall requires the ability of conditional rescheduling when
preemption is disabled because some hypercalls take ages.

Split out the rescheduling code from idtentry_exit_cond_rcu() so it can
be reused for that.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.962199649@linutronix.de
2020-06-11 15:15:08 +02:00
Thomas Gleixner eb6555c839 x86/entry/64: Move do_softirq_own_stack() to C
The first step to get rid of the ENTER/LEAVE_IRQ_STACK ASM macro maze.  Use
the new C code helpers to move do_softirq_own_stack() out of ASM code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.870911120@linutronix.de
2020-06-11 15:15:07 +02:00
Thomas Gleixner 931b941459 x86/entry: Provide helpers for executing on the irqstack
Device interrupt handlers and system vector handlers are executed on the
interrupt stack. The stack switch happens in the low level assembly entry
code. This conflicts with the efforts to consolidate the exit code in C to
ensure correctness vs. RCU and tracing.

As there is no way to move #DB away from IST due to the MOV SS issue, the
requirements vs. #DB and NMI for switching to the interrupt stack do not
exist anymore. The only requirement is that interrupts are disabled.

That allows the moving of the stack switching to C code, which simplifies the
entry/exit handling further, because it allows the switching of stacks after
handling the entry and on exit before handling RCU, returning to usermode and
kernel preemption in the same way as for regular exceptions.

The initial attempt of having the stack switching in inline ASM caused too
much headache vs. objtool and the unwinder. After analysing the use cases
it was agreed on that having the stack switch in ASM for the price of an
indirect call is acceptable, as the main users are indirect call heavy
anyway and the few system vectors which are empty shells (scheduler IPI and
KVM posted interrupt vectors) can run from the regular stack.

Provide helper functions to check whether the interrupt stack is already
active and whether stack switching is required.

64-bit only for now, as 32-bit has a variant of that already. Once this is
cleaned up, the two implementations might be consolidated as an additional
cleanup on top.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.763775313@linutronix.de
2020-06-11 15:15:07 +02:00
Thomas Gleixner 9ee01e0f69 x86/entry: Clean up idtentry_enter/exit() leftovers
Now that everything is converted to conditional RCU handling remove
idtentry_enter/exit() and tidy up the conditional functions.

This does not remove rcu_irq_exit_preempt(), to avoid conflicts with the RCU
tree. Will be removed once all of this hits Linus's tree.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.473597954@linutronix.de
2020-06-11 15:15:06 +02:00
Thomas Gleixner fa95d7dc1a x86/idtentry: Switch to conditional RCU handling
Switch all idtentry_enter/exit() users over to the new conditional RCU
handling scheme and make the user mode entries in #DB, #INT3 and #MCE use
the user mode idtentry functions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.382387286@linutronix.de
2020-06-11 15:15:05 +02:00
Thomas Gleixner 9f9781b60d x86/entry: Provide idtentry_enter/exit_user()
As there are exceptions which already handle entry from user mode and from
kernel mode separately, providing explicit user entry/exit handling callbacks
makes sense and makes the code easier to understand.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.289548561@linutronix.de
2020-06-11 15:15:05 +02:00
Thomas Gleixner 3eeec38584 x86/entry: Provide idtentry_entry/exit_cond_rcu()
After a lengthy discussion [1] it turned out that RCU does not need a full
rcu_irq_enter/exit() when RCU is already watching. All it needs if
NOHZ_FULL is active is to check whether the tick needs to be restarted.

This allows to avoid a separate variant for the pagefault handler which
cannot invoke rcu_irq_enter() on a kernel pagefault which might sleep.

The cond_rcu argument is only temporary and will be removed once the
existing users of idtentry_enter/exit() have been cleaned up. After that
the code can be significantly simplified.

[ mingo: Simplified the control flow ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Paul E. McKenney" <paulmck@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: [1] https://lkml.kernel.org/r/20200515235125.628629605@linutronix.de
Link: https://lore.kernel.org/r/20200521202117.181397835@linutronix.de
2020-06-11 15:15:04 +02:00
Ingo Molnar 7102cb0713 x86/entry: Fix allnoconfig build warning
The following commit:

  095b7a3e7745 ("x86/entry: Convert double fault exception to IDTENTRY_DF")

introduced a new build warning on 64-bit allnoconfig kernels, that have CONFIG_VMAP_STACK disabled:

  arch/x86/kernel/traps.c:332:16: warning: unused variable ‘address’ [-Wunused-variable]

This variable is only used if CONFIG_VMAP_STACK is defined, so make it
dependent on that, not CONFIG_X86_64.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
2020-06-11 15:15:03 +02:00
Thomas Gleixner c29c775a55 x86/entry: Convert double fault exception to IDTENTRY_DF
Convert #DF to IDTENTRY_DF
  - Implement the C entry point with DEFINE_IDTENTRY_DF
  - Emit the ASM stub with DECLARE_IDTENTRY_DF on 64bit
  - Remove the ASM idtentry in 64bit
  - Adjust the 32bit shim code
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135315.583415264@linutronix.de
2020-06-11 15:15:03 +02:00
Thomas Gleixner 6a8dfa8e40 x86/idtentry: Provide IDTENTRY_DF
Provide a separate macro for #DF as this needs to emit paranoid only code
and has also a special ASM stub in 32bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135315.583415264@linutronix.de
2020-06-11 15:15:02 +02:00
Thomas Gleixner 865d3a9afe x86/mce: Address objtools noinstr complaints
Mark the relevant functions noinstr, use the plain non-instrumented MSR
accessors. The only odd part is the instrumentation_begin()/end() pair around the
indirect machine_check_vector() call as objtool can't figure that out. The
possible invoked functions are annotated correctly.

Also use notrace variant of nmi_enter/exit(). If MCEs happen then hardware
latency tracing is the least of the worries.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135315.476734898@linutronix.de
2020-06-11 15:15:02 +02:00
Thomas Gleixner 75347bb253 x86/traps: Address objtool noinstr complaints in #DB
The functions invoked from handle_debug() can be instrumented. Tell objtool
about it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135315.380927730@linutronix.de
2020-06-11 15:15:01 +02:00
Thomas Gleixner 9347f41352 x86/traps: Restructure #DB handling
Now that there are separate entry points, move the kernel/user_mode specifc
checks into the entry functions so the common handling code does not need
the extra mode checks. Make the code more readable while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135315.283276272@linutronix.de
2020-06-11 15:15:01 +02:00
Thomas Gleixner 4c0dcd8350 x86/entry: Implement user mode C entry points for #DB and #MCE
The MCE entry point uses the same mechanism as the IST entry point for
now. For #DB split the inner workings and just keep the nmi_enter/exit()
magic in the IST variant. Fixup the ASM code to emit the proper
noist_##cfunc call.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135315.177564104@linutronix.de
2020-06-11 15:15:00 +02:00
Thomas Gleixner f08e32ec3c x86/idtentry: Provide IDTRENTRY_NOIST variants for #DB and #MC
Provide NOIST entry point macros which allows to implement NOIST variants
of the C entry points. These are invoked when #DB or #MC enter from user
space. This allows explicit handling of the difference between user mode
and kernel mode entry later.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135315.084882104@linutronix.de
2020-06-11 15:15:00 +02:00
Thomas Gleixner df7ccaffd2 x86/entry/64: Remove error code clearing from #DB and #MCE ASM stub
The C entry points do not expect an error code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.992621707@linutronix.de
2020-06-11 15:15:00 +02:00
Thomas Gleixner 2bbc68f837 x86/entry: Convert Debug exception to IDTENTRY_DB
Convert #DB to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY_DB
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.900297476@linutronix.de
2020-06-11 15:14:59 +02:00
Peter Zijlstra 9f58fdde95 x86/db: Split out dr6/7 handling
DR6/7 should be handled before nmi_enter() is invoked and restore after
nmi_exit() to minimize the exposure.

Split it out into helper inlines and bring it into the correct order.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.808628211@linutronix.de
2020-06-11 15:14:59 +02:00
Thomas Gleixner f051f69795 x86/nmi: Protect NMI entry against instrumentation
Mark all functions in the fragile code parts noinstr or force inlining so
they can't be instrumented.

Also make the hardware latency tracer invocation explicit outside of
non-instrumentable section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.716186134@linutronix.de
2020-06-11 15:14:58 +02:00
Thomas Gleixner 6271fef00b x86/entry: Convert NMI to IDTENTRY_NMI
Convert #NMI to IDTENTRY_NMI:
  - Implement the C entry point with DEFINE_IDTENTRY_NMI
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.609932306@linutronix.de
2020-06-11 15:14:58 +02:00
Thomas Gleixner 9cce81cff7 x86/idtentry: Provide IDTENTRY_XEN for XEN/PV
XEN/PV has special wrappers for NMI and DB exceptions. They redirect these
exceptions through regular IDTENTRY points. Provide the necessary IDTENTRY
macros to make this work

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.518622698@linutronix.de
2020-06-11 15:14:57 +02:00
Thomas Gleixner aedbdeab00 x86/mce: Use untraced rd/wrmsr in the MCE offline/crash check
mce_check_crashing_cpu() is called right at the entry of the MCE
handler. It uses mce_rdmsr() and mce_wrmsr() which are wrappers around
rdmsr() and wrmsr() to handle the MCE error injection mechanism, which is
pointless in this context, i.e. when the MCE hits an offline CPU or the
system is already marked crashing.

The MSR access can also be traced, so use the untraceable variants. This
is also safe vs. XEN paravirt as these MSRs are not affected by XEN PV
modifications.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.426347351@linutronix.de
2020-06-11 15:14:57 +02:00
Thomas Gleixner 8cd501c1fa x86/entry: Convert Machine Check to IDTENTRY_IST
Convert #MC to IDTENTRY_MCE:
  - Implement the C entry points with DEFINE_IDTENTRY_MCE
  - Emit the ASM stub with DECLARE_IDTENTRY_MCE
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the error code from *machine_check_vector() as
    it is always 0 and not used by any of the functions
    it can point to. Fixup all the functions as well.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.334980426@linutronix.de
2020-06-11 15:14:57 +02:00
Thomas Gleixner 94a46d316f x86/mce: Move nmi_enter/exit() into the entry point
There is no reason to have nmi_enter/exit() in the actual MCE
handlers. Move it to the entry point. This also covers the until now
uncovered initial handler which only prints.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.243936614@linutronix.de
2020-06-11 15:14:56 +02:00
Thomas Gleixner 2c058b03cc x86/idtentry: Provide IDTENTRY_IST
Same as IDTENTRY but for exceptions which run on Interrupt Stacks (IST) on
64bit. For 32bit this maps to IDTENTRY.

There are 3 variants which will be used:
      IDTENTRY_MCE
      IDTENTRY_DB
      IDTENTRY_NMI

These map to IDTENTRY_IST, but only the MCE and DB variants are emitting
ASM code as the NMI entry needs hand crafted ASM still.

The function defines do not contain any idtenter/exit calls as these
exceptions need special treatment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.137125609@linutronix.de
2020-06-11 15:14:55 +02:00
Peter Zijlstra 21e28290b3 x86/traps: Split int3 handler up
For code simplicity split up the int3 handler into a kernel and user part
which makes the code flow simpler to understand.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505135314.045220765@linutronix.de
2020-06-11 15:14:55 +02:00
Thomas Gleixner 8edd7e37ae x86/entry: Convert INT3 exception to IDTENTRY_RAW
Convert #BP to IDTENTRY_RAW:
  - Implement the C entry point with DEFINE_IDTENTRY_RAW
  - Invoke idtentry_enter/exit() from the function body
  - Emit the ASM stub with DECLARE_IDTENTRY_RAW
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

This could be a plain IDTENTRY, but as Peter pointed out INT3 is broken
vs. the static key in the context tracking code as this static key might be
in the state of being patched and has an int3 which would recurse forever.
IDTENTRY_RAW is therefore chosen to allow addressing this issue without
lots of code churn.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135313.938474960@linutronix.de
2020-06-11 15:14:55 +02:00
Thomas Gleixner 0dc6cdc21b x86/idtentry: Provide IDTENTRY_RAW
Some exception handlers need to do extra work before any of the entry
helpers are invoked. Provide IDTENTRY_RAW for this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135313.830540017@linutronix.de
2020-06-11 15:14:54 +02:00
Peter Zijlstra f64366efd8 x86/int3: Inline bsearch()
Avoid calling out to bsearch() by inlining it, for normal kernel configs
this was the last external call and poke_int3_handler() is now fully self
sufficient -- no calls to external code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135313.731774429@linutronix.de
2020-06-11 15:14:54 +02:00
Peter Zijlstra ef882bfef9 x86/int3: Avoid atomic instrumentation
Use arch_atomic_*() and __READ_ONCE() to ensure nothing untoward
creeps in and ruins things.

That is; this is the INT3 text poke handler, strictly limit the code
that runs in it, lest it inadvertenly hits yet another INT3.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135313.517429268@linutronix.de
2020-06-11 15:14:53 +02:00
Thomas Gleixner 4979fb53ab x86/int3: Ensure that poke_int3_handler() is not traced
In order to ensure poke_int3_handler() is completely self contained -- this
is called while modifying other text, imagine the fun of hitting another
INT3 -- ensure that everything it uses is not traced.

The primary means here is to force inlining; bsearch() is notrace because
all of lib/ is.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135313.410702173@linutronix.de
2020-06-11 15:14:52 +02:00
Thomas Gleixner d77290507a x86/entry/32: Convert IRET exception to IDTENTRY_SW
Convert the IRET exception handler to IDTENTRY_SW. This is slightly
different than the conversions of hardware exceptions as the IRET exception
is invoked via an exception table when IRET faults. So it just uses the
IDTENTRY_SW mechanism for consistency. It does not emit ASM code as it does
not fit the other idtentry exceptions.

  - Implement the C entry point with DEFINE_IDTENTRY_SW() which maps to
    DEFINE_IDTENTRY()
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134906.128769226@linutronix.de
2020-06-11 15:14:52 +02:00
Thomas Gleixner 48227e21f7 x86/entry: Convert SIMD coprocessor error exception to IDTENTRY
Convert #XF to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Handle INVD_BUG in C
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134906.021552202@linutronix.de
2020-06-11 15:14:52 +02:00
Thomas Gleixner 436608bb00 x86/entry: Convert Alignment check exception to IDTENTRY
Convert #AC to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134905.928967113@linutronix.de
2020-06-11 15:14:51 +02:00
Thomas Gleixner 14a8bd2aa7 x86/entry: Convert Coprocessor error exception to IDTENTRY
Convert #MF to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134905.838823510@linutronix.de
2020-06-11 15:14:51 +02:00
Thomas Gleixner dad7106f81 x86/entry: Convert Spurious interrupt bug exception to IDTENTRY
Convert #SPURIOUS to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134905.728077036@linutronix.de
2020-06-11 15:14:50 +02:00
Thomas Gleixner be4c11afbb x86/entry: Convert General protection exception to IDTENTRY
Convert #GP to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134905.637269946@linutronix.de
2020-06-11 15:14:50 +02:00
Thomas Gleixner fd9689bf91 x86/entry: Convert Stack segment exception to IDTENTRY
Convert #SS to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134905.539867572@linutronix.de
2020-06-11 15:14:49 +02:00
Thomas Gleixner 99a3fb8d01 x86/entry: Convert Segment not present exception to IDTENTRY
Convert #NP to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134905.443591450@linutronix.de
2020-06-11 15:14:49 +02:00
Thomas Gleixner 97b3d290b8 x86/entry: Convert Invalid TSS exception to IDTENTRY
Convert #TS to IDTENTRY_ERRORCODE:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134905.350676449@linutronix.de
2020-06-11 15:14:49 +02:00
Thomas Gleixner aabfe5383e x86/idtentry: Provide IDTENTRY_ERRORCODE
Same as IDTENTRY but the C entry point has an error code argument.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134905.258989060@linutronix.de
2020-06-11 15:14:48 +02:00
Thomas Gleixner f95658fdb5 x86/entry: Convert Coprocessor segment overrun exception to IDTENTRY
Convert #OLD_MF to IDTENTRY:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134905.838823510@linutronix.de
2020-06-11 15:14:48 +02:00
Thomas Gleixner 866ae2ccee x86/entry: Convert Device not available exception to IDTENTRY
Convert #NM to IDTENTRY:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134905.056243863@linutronix.de
2020-06-11 15:14:47 +02:00
Thomas Gleixner 49893c5cb2 x86/entry: Convert Invalid Opcode exception to IDTENTRY
Convert #UD to IDTENTRY:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Fixup the FOOF bug call in fault.c
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134904.955511913@linutronix.de
2020-06-11 15:14:47 +02:00
Thomas Gleixner 58d9c81fac x86/entry: Convert Bounds exception to IDTENTRY
Convert #BR to IDTENTRY:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the RCU warning as the new entry macro ensures correctness

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134904.863001309@linutronix.de
2020-06-11 15:14:46 +02:00
Thomas Gleixner 4b6b9111c0 x86/entry: Convert Overflow exception to IDTENTRY
Convert #OF to IDTENTRY:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134904.771457898@linutronix.de
2020-06-11 15:14:46 +02:00
Thomas Gleixner 9d06c4027f x86/entry: Convert Divide Error to IDTENTRY
Convert #DE to IDTENTRY:
  - Implement the C entry point with DEFINE_IDTENTRY
  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134904.663914713@linutronix.de
2020-06-11 15:14:46 +02:00
Thomas Gleixner 218e31b6e7 x86/traps: Prepare for using DEFINE_IDTENTRY
Prepare for using IDTENTRY to define the C exception/trap entry points. It
would be possible to glue this into the existing macro maze, but it's
simpler and better to read at the end to just make them distinct.

Provide a trivial inline helper to read the trap address and add a comment
explaining the logic behind it.

The existing macros will be removed once all instances are converted.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134904.556327833@linutronix.de
2020-06-11 15:14:45 +02:00
Thomas Gleixner 0ba50e861a x86/entry/common: Provide idtentry_enter/exit()
Provide functions which handle the low level entry and exit similar to
enter/exit from user mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134904.457578656@linutronix.de
2020-06-11 15:14:45 +02:00
Thomas Gleixner 53aaf262c6 x86/idtentry: Provide macros to define/declare IDT entry points
Provide DECLARE/DEFINE_IDTENTRY() macros.

DEFINE_IDTENTRY() provides a wrapper which acts as the function
definition. The exception handler body is just appended to it with curly
brackets. The entry point is marked noinstr so that irq tracing and the
enter_from_user_mode() can be moved into the C-entry point. As all
C-entries use the same macro (or a later variant) the necessary entry
handling can be implemented at one central place.

DECLARE_IDTENTRY() provides the function prototypes:
  - The C entry point 	    	cfunc
  - The ASM entry point		asm_cfunc
  - The XEN/PV entry point	xen_asm_cfunc

They all follow the same naming convention.

When included from ASM code DECLARE_IDTENTRY() is a macro which emits the
low level entry point in assembly by instantiating idtentry.

IDTENTRY is the simplest variant which just has a pt_regs argument. It's
going to be used for all exceptions which have no error code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134904.273363275@linutronix.de
2020-06-11 15:14:44 +02:00
Thomas Gleixner 60400677e1 x86/entry/32: Provide macro to emit IDT entry stubs
32 and 64 bit have unnecessary different ways to populate the exception
entry code. Provide a idtentry macro which allows to consolidate all of
that.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134904.166735365@linutronix.de
2020-06-11 15:14:44 +02:00
Thomas Gleixner 424c7d0a9a x86/entry/64: Provide sane error entry/exit
For gradual conversion provide a macro parameter and the required code
which allows to handle instrumentation and interrupt flags tracking in C.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134904.058904490@linutronix.de
2020-06-11 15:14:44 +02:00
Thomas Gleixner cfa82a0053 x86/entry: Distangle idtentry
idtentry is a completely unreadable maze. Split it into distinct idtentry
variants which only contain the minimal code:

  - idtentry for regular exceptions
  - idtentry_mce_debug for #MCE and #DB
  - idtentry_df for #DF

The generated binary code is equivalent.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134903.949227617@linutronix.de
2020-06-11 15:14:43 +02:00
Thomas Gleixner 67f1386616 x86/entry/64: Reorder idtentries
Move them all together so verifying the cleanup patches for binary
equivalence will be easier.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134903.841853522@linutronix.de
2020-06-11 15:14:43 +02:00
Thomas Gleixner 877f183f83 x86/traps: Split trap numbers out in a separate header
So they can be used in ASM code. For this it is also necessary to convert
them to defines. Will be used for the rework of the entry code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134903.731004084@linutronix.de
2020-06-11 15:14:42 +02:00
Thomas Gleixner ca4c6a9858 x86/traps: Make interrupt enable/disable symmetric in C code
Traps enable interrupts conditionally but rely on the ASM return code to
disable them again. That results in redundant interrupt disable and trace
calls.

Make the trap handlers disable interrupts before returning to avoid that,
which allows simplification of the ASM entry code in follow up changes.

Originally-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134903.622702796@linutronix.de
2020-06-11 15:14:42 +02:00
Thomas Gleixner c9317202af x86/entry/64: Use native swapgs in asm_load_gs_index()
When PARAVIRT_XXL is in use, then load_gs_index() uses xen_load_gs_index()
and asm_load_gs_index() is unused.

It's therefore pointless to use the paravirtualized SWAPGS implementation
in asm_load_gs_index(). Switch it to a plain swapgs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200512213809.583980272@linutronix.de
2020-06-11 15:14:42 +02:00
Thomas Gleixner 410367e321 x86/entry: Disable interrupts for native_load_gs_index() in C code
There is absolutely no point in doing this in ASM code. Move it to C.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134903.531534675@linutronix.de
2020-06-11 15:14:41 +02:00
Thomas Gleixner daf7a69787 x86/traps: Mark sync_regs() noinstr
Replace the notrace and NOKPROBE annotations with noinstr.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134903.439765290@linutronix.de
2020-06-11 15:14:41 +02:00
Thomas Gleixner d73a332936 x86/traps: Mark fixup_bad_iret() noinstr
This is called from deep entry ASM in a situation where instrumentation
will cause more harm than providing useful information.

Switch from memmove() to memcpy() because memmove() can't be called
from noinstr code. 

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134903.346741553@linutronix.de
2020-06-11 15:14:41 +02:00
Peter Zijlstra 1c3e5d3f60 x86/entry: Make entry_64_compat.S objtool clean
Currently entry_64_compat is exempt from objtool, but with vmlinux
mode there is no hiding it.

Make the following changes to make it pass:

 - change entry_SYSENTER_compat to STT_NOTYPE; it's not a function
   and doesn't have function type stack setup.

 - mark all STT_NOTYPE symbols with UNWIND_HINT_EMPTY; so we do
   validate them and don't treat them as unreachable.

 - don't abuse RSP as a temp register, this confuses objtool
   mightily as it (rightfully) thinks we're doing unspeakable
   things to the stack.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134341.272248024@linutronix.de
2020-06-11 15:14:40 +02:00
Thomas Gleixner a7ef9ba986 x86/speculation/mds: Mark mds_user_clear_cpu_buffers() __always_inline
Prevent the compiler from uninlining and creating traceable/probable
functions as this is invoked _after_ context tracking switched to
CONTEXT_USER and rcu idle.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134340.902709267@linutronix.de
2020-06-11 15:14:40 +02:00
Thomas Gleixner 4983e5d74c x86/entry: Move irq flags tracing to prepare_exit_to_usermode()
This is another step towards more C-code and less convoluted ASM.

Similar to the entry path, invoke the tracer before context tracking which
might turn off RCU and invoke lockdep as the last step before going back to
user space. Annotate the code sections in exit_to_user_mode() accordingly
so objtool won't complain about the tracer invocation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134340.703783926@linutronix.de
2020-06-11 15:14:39 +02:00
Thomas Gleixner dd8e2d9ae6 x86/entry: Move irq tracing on syscall entry to C-code
Now that the C entry points are safe, move the irq flags tracing code into
the entry helper:

    - Invoke lockdep before calling into context tracking

    - Use the safe trace_hardirqs_on_prepare() trace function after context
      tracking established state and RCU is watching.

enter_from_user_mode() is also still invoked from the exception/interrupt
entry code which still contains the ASM irq flags tracing. So this is just
a redundant and harmless invocation of tracing / lockdep until these are
removed as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134340.611961721@linutronix.de
2020-06-11 15:14:39 +02:00
Thomas Gleixner 8f159f1dfa x86/entry/common: Protect against instrumentation
Mark the various syscall entries with noinstr to protect them against
instrumentation and add the noinstrumentation_begin()/end() annotations to mark the
parts of the functions which are safe to call out into instrumentable code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134340.520277507@linutronix.de
2020-06-11 15:14:39 +02:00
Thomas Gleixner 1723be30e4 x86/entry: Mark enter_from_user_mode() noinstr
Both the callers in the low level ASM code and __context_tracking_exit()
which is invoked from enter_from_user_mode() via user_exit_irqoff() are
marked NOKPROBE. Allowing enter_from_user_mode() to be probed is
inconsistent at best.

Aside of that while function tracing per se is safe the function trace
entry/exit points can be used via BPF as well which is not safe to use
before context tracking has reached CONTEXT_KERNEL and adjusted RCU.

Mark it noinstr which moves it into the instrumentation protected text
section and includes notrace.

Note, this needs further fixups in context tracking to ensure that the
full call chain is protected. Will be addressed in follow up changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134340.429059405@linutronix.de
2020-06-11 15:14:38 +02:00
Thomas Gleixner 8c0fa8a036 x86/entry/32: Move non entry code into .text section
All ASM code which is not part of the entry functionality can move out into
the .text section. No reason to keep it in the non-instrumentable entry
section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134340.320164650@linutronix.de
2020-06-11 15:14:38 +02:00
Thomas Gleixner b9f6976bfb x86/entry/64: Move non entry code into .text section
All ASM code which is not part of the entry functionality can move out into
the .text section. No reason to keep it in the non-instrumentable entry
section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134340.227579223@linutronix.de
2020-06-11 15:14:37 +02:00
Thomas Gleixner fba8dbeaf3 x86/idt: Remove update_intr_gate()
No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-06-11 15:14:37 +02:00
Thomas Gleixner 5916d5f9b3 bug: Annotate WARN/BUG/stackfail as noinstr safe
Warnings, bugs and stack protection fails from noinstr sections, e.g. low
level and early entry code, are likely to be fatal.

Mark them as "safe" to be invoked from noinstr protected code to avoid
annotating all usage sites. Getting the information out is important.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134100.376598577@linutronix.de
2020-06-11 15:14:36 +02:00
Peter Zijlstra 20355e5f73 x86/entry: Exclude low level entry code from sanitizing
The sanitizers are not really applicable to the fragile low level entry
code. Entry code needs to carefully setup a normal 'runtime' environment.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505134059.970057117@linutronix.de
2020-06-11 15:14:35 +02:00
Thomas Gleixner 44d7e4fbc0 x86/entry: Remove the unused LOCKDEP_SYSEXIT cruft
No users left since two years due to commit 21d375b6b3 ("x86/entry/64:
Remove the SYSCALL64 fast path")

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134059.061301403@linutronix.de
2020-06-11 15:14:35 +02:00
Thomas Gleixner 725005897e x86/entry/64: Avoid pointless code when CONTEXT_TRACKING=n
GAS cannot optimize out the test and conditional jump when context tracking
is disabled and CALL_enter_from_user_mode is an empty macro.

Wrap it in #ifdeffery. Will go away once all this is moved to C.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134058.955968069@linutronix.de
2020-06-11 15:14:34 +02:00
Peter Zijlstra e9660391d0 x86/doublefault: Remove memmove() call
Use of memmove() in #DF is problematic considered tracing and other
instrumentation.

Remove the memmove() call and simply write out what needs doing; this
even clarifies the code, win-win! The code copies from the espfix64
stack to the normal task stack, there is no possible way for that to
overlap.

Survives selftests/x86, specifically sigreturn_64.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134058.863038566@linutronix.de
2020-06-11 15:14:34 +02:00
Andy Lutomirski 24ae0c91cb x86/hw_breakpoint: Prevent data breakpoints on cpu_entry_area
A data breakpoint near the top of an IST stack will cause unrecoverable
recursion.  A data breakpoint on the GDT, IDT, or TSS is terrifying.
Prevent either of these from happening.

Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505134058.272448010@linutronix.de
2020-06-11 15:14:33 +02:00
Vitaly Kuznetsov 1f1fbc70c1 x86/idt: Keep spurious entries unset in system_vectors
With commit dc20b2d526 ("x86/idt: Move interrupt gate initialization to
IDT code") non assigned system vectors are also marked as used in
'used_vectors' (now 'system_vectors') bitmap. This makes checks in
arch_show_interrupts() whether a particular system vector is allocated to
always pass and e.g. 'Hyper-V reenlightenment interrupts' entry always
shows up in /proc/interrupts.

Another side effect of having all unassigned system vectors marked as used
is that irq_matrix_debug_show() will wrongly count them among 'System'
vectors.

As it is now ensured that alloc_intr_gate() is not called after init, it is
possible to leave unused entries in 'system_vectors' unset to fix these
issues.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200428093824.1451532-4-vkuznets@redhat.com
2020-06-11 15:14:33 +02:00
Vitaly Kuznetsov 06184325a1 x86/idt: Annotate alloc_intr_gate() with __init
There seems to be no reason to allocate interrupt gates after init. Mark
alloc_intr_gate() as __init and add WARN_ON() checks making sure it is
only used before idt_setup_apic_and_irq_gates() finalizes IDT setup and
maps all un-allocated entries to spurious entries.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200428093824.1451532-3-vkuznets@redhat.com
2020-06-11 15:14:33 +02:00
Vitaly Kuznetsov a0bb51f263 x86/xen: Split HVM vector callback setup and interrupt gate allocation
As a preparatory change for making alloc_intr_gate() __init split
xen_callback_vector() into callback vector setup via hypercall
(xen_setup_callback_vector()) and interrupt gate allocation
(xen_alloc_callback_vector()).

xen_setup_callback_vector() is being called twice: on init and upon
system resume from xen_hvm_post_suspend(). alloc_intr_gate() only
needs to be called once.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200428093824.1451532-2-vkuznets@redhat.com
2020-06-11 15:14:32 +02:00
Lai Jiangshan fbaed278a3 x86/idt: Remove address operator on function machine_check()
machine_check is function address, the address operator on it is nop for
compiler.

Make it consistent with the other function addresses in the same file.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200419144049.1906-3-laijs@linux.alibaba.com
2020-06-11 15:14:32 +02:00
Lai Jiangshan c758907004 x86/entry/64: Remove unneeded kernel CR3 switching
When native_load_gs_index() fails on .Lgs_change, CR3 must be kernel
CR3. No need to switch it.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200419144049.1906-2-laijs@linux.alibaba.com
2020-06-11 15:14:31 +02:00
Lai Jiangshan 26fa1263b0 x86/entry/64: Remove an unused label
The label .Lcommon_\sym was introduced by 39e9543344.
(x86-64: Reduce amount of redundant code generated for invalidate_interruptNN)
And all the other relevant information was removed by 52aec3308d
(x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR)

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200419144049.1906-4-laijs@linux.alibaba.com
2020-06-11 15:14:31 +02:00
Peter Zijlstra 37f8173dd8 locking/atomics: Flip fallbacks and instrumentation
Currently instrumentation of atomic primitives is done at the architecture
level, while composites or fallbacks are provided at the generic level.

The result is that there are no uninstrumented variants of the
fallbacks. Since there is now need of such variants to isolate text poke
from any form of instrumentation invert this ordering.

Doing this means moving the instrumentation into the generic code as
well as having (for now) two variants of the fallbacks.

Notes:

 - the various *cond_read* primitives are not proper fallbacks
   and got moved into linux/atomic.c. No arch_ variants are
   generated because the base primitives smp_cond_load*()
   are instrumented.

 - once all architectures are moved over to arch_atomic_ one of the
   fallback variants can be removed and some 2300 lines reclaimed.

 - atomic_{read,set}*() are no longer double-instrumented

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/20200505134058.769149955@linutronix.de
2020-06-11 08:03:24 +02:00
Linus Torvalds 4382a79b27 Merge branch 'uaccess.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc uaccess updates from Al Viro:
 "Assorted uaccess patches for this cycle - the stuff that didn't fit
  into thematic series"

* 'uaccess.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  bpf: make bpf_check_uarg_tail_zero() use check_zeroed_user()
  x86: kvm_hv_set_msr(): use __put_user() instead of 32bit __clear_user()
  user_regset_copyout_zero(): use clear_user()
  TEST_ACCESS_OK _never_ had been checked anywhere
  x86: switch cp_stat64() to unsafe_put_user()
  binfmt_flat: don't use __put_user()
  binfmt_elf_fdpic: don't use __... uaccess primitives
  binfmt_elf: don't bother with __{put,copy_to}_user()
  pselect6() and friends: take handling the combined 6th/7th args into helper
2020-06-10 16:02:54 -07:00
Linus Torvalds 3a2a875174 power supply and reset changes for the v5.8 series
kobject:
  * Increase number of allowed uevent variables
 
 power-supply core:
  * Add power-supply type in uevent
  * Cleanup property handling in core
  * Make property and usb_type pointers const
  * Convert core power-supply DT binding to YAML
  * Cleanup HWMON code
  * Add new health status "calibration required"
  * Add new properties for manufacture date and
    capacity error margin
 
 battery drivers:
  * new cw2015 battery driver used by pine64 Pinebook Pro laptop
  * axp22: blacklist on Meegopad T02
  * sc27xx: support current/voltage reading
  * max17042: support time-to-empty reading
  * simple-battery: add more battery parameters
  * bq27xxx: convert DT binding document to YAML
  * sbs-battery: add TI BQ20Z65 support, fix technology property, convert
                 DT binding to YAML, add option to disable charger
 		broadcasts, add new properties: manufacture date,
 		capacity error margin, average current, charge current
 		and voltage and support calibration required health
 		status
  * misc. fixes
 
 charger drivers:
  * bq25890: cleanup, implement charge type, precharge current and input
             current limiting properties
  * bd70528: use new linear range helper library
  * bd99954: new charger driver
  * mp2629: new charger driver
  * misc. fixes
 
 reboot drivers:
  * oxnas-restart: introduce new driver
  * syscon-reboot: convert DT binding to YAML, add parent syscon device support
  * misc. fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE72YNB0Y/i3JqeVQT2O7X88g7+poFAl7a2L0ACgkQ2O7X88g7
 +poAIg//caN2sHqOFEsukXWe1oft/X0IQeGNmlCnj55l1zGJ/x03Yn0GeKb0FZgM
 go+sfMaM/a6NNdmKhseLIsYFlhVBa2E7qW1nvaxgt66JKS+GXAHfE8LWVwazDJex
 rq0esXuegsqFdbCUth81YLk2H+0qwnhVVv/Urvv1RaE/woeFHAHL1cYfcFa+YDXm
 XRDT0W73YozslFkMnZMLBLyQzad3yVnNcnRYF3Dx0CMnUsjGfPjBlyk4RLPTcgUk
 8ChTvCcHRG7IhEtF0a1HUr3UjCy9rjwiqWIobQltnbEYImxY0LWkCVVr1EsNjeyr
 ikl3c4JfwmdlVCBCSPn294mPlGKu8DNBMLN1IgAuJHKW1GuQxd0Tcbd9OwF6VlVj
 WTFWp2GcoIjKQtOWKDeCqby+hoWhgclxTHUudo7FGTT0xBP9OWTKWDpDL18cZWd8
 dNCFMEI6MDMRtwL8+3ilLcnvSoMzyZ94TZmwii9toD3xSI7TMn7oVRuOlq1TkOEw
 ZuPS3QP6jBHm8NN9JEhLbrwradP+R8qpE/jpsnWiBDR6Jt7xx4W7W0xupq4GdHAl
 u0Lbh38/0bkTijeo9xoPe03KsSBV8HOr9Lf/QVW3zLcYcGKW17j8QTkes3a0bOs0
 DZHVCFeXuWNFa/tJxe6c+J+NsroLucTMwA3lBmlZhI/TPsG2vb8=
 =He9L
 -----END PGP SIGNATURE-----

Merge tag 'for-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

Pull power supply and reset updates from Sebastian Reichel:
 "This time there are lots of changes. Quite a few changes to the core,
  lots of driver changes and one change to kobject core (with Ack from
  Greg).

  Summary:

  kobject:
   - Increase number of allowed uevent variables

  power-supply core:
   - Add power-supply type in uevent
   - Cleanup property handling in core
   - Make property and usb_type pointers const
   - Convert core power-supply DT binding to YAML
   - Cleanup HWMON code
   - Add new health status "calibration required"
   - Add new properties for manufacture date and capacity error margin

  battery drivers:
   - new cw2015 battery driver used by pine64 Pinebook Pro laptop
   - axp22: blacklist on Meegopad T02
   - sc27xx: support current/voltage reading
   - max17042: support time-to-empty reading
   - simple-battery: add more battery parameters
   - bq27xxx: convert DT binding document to YAML
   - sbs-battery: add TI BQ20Z65 support, fix technology property,
         convert DT binding to YAML, add option to disable charger
         broadcasts, add new properties: manufacture date, capacity
         error margin, average current, charge current and voltage and
         support calibration required health status
   - misc fixes

  charger drivers:
   - bq25890: cleanup, implement charge type, precharge current and
         input current limiting properties
   - bd70528: use new linear range helper library
   - bd99954: new charger driver
   - mp2629: new charger driver
   - misc fixes

  reboot drivers:
   - oxnas-restart: introduce new driver
   - syscon-reboot: convert DT binding to YAML, add parent syscon device
         support
   - misc fixes"

* tag 'for-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (85 commits)
  power: supply: cw2015: Attach OF ID table to the driver
  power: reset: gpio-poweroff: add missing '\n' in dev_err()
  Revert "power: supply: sbs-battery: simplify read_read_string_data"
  Revert "power: supply: sbs-battery: add PEC support"
  dt-bindings: power: sbs-battery: Convert to yaml
  power: supply: sbs-battery: constify power-supply property array
  power: supply: sbs-battery: switch to i2c's probe_new
  power: supply: sbs-battery: switch from of_property_* to device_property_*
  power: supply: sbs-battery: add ability to disable charger broadcasts
  power: supply: sbs-battery: fix idle battery status
  power: supply: sbs-battery: add POWER_SUPPLY_HEALTH_CALIBRATION_REQUIRED support
  power: supply: sbs-battery: add MANUFACTURE_DATE support
  power: supply: sbs-battery: add POWER_SUPPLY_PROP_CONSTANT_CHARGE_CURRENT/VOLTAGE_MAX support
  power: supply: sbs-battery: Improve POWER_SUPPLY_PROP_TECHNOLOGY support
  power: supply: sbs-battery: add POWER_SUPPLY_PROP_CURRENT_AVG support
  power: supply: sbs-battery: add PEC support
  power: supply: sbs-battery: simplify read_read_string_data
  power: supply: sbs-battery: add POWER_SUPPLY_PROP_CAPACITY_ERROR_MARGIN support
  power: supply: sbs-battery: Add TI BQ20Z65 support
  power: supply: core: add POWER_SUPPLY_HEALTH_CALIBRATION_REQUIRED
  ...
2020-06-10 11:28:35 -07:00
Linus Torvalds 3beff76b54 x86: use proper parentheses around new uaccess macro argument uses
__get_kernel_nofault() didn't have the parentheses around the use of
'src' and 'dst' macro arguments, making the casts potentially do the
wrong thing.

The parentheses aren't necessary with the current very limited use in
mm/access.c, but it's bad form, and future use-cases might have very
unexpected errors as a result.

Do the same for unsafe_copy_loop() while at it, although in that case it
is an entirely internal x86 uaccess helper macro that isn't used
anywhere else and any other use would be invalid anyway.

Fixes: fa94111d94 ("x86: use non-set_fs based maccess routines")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 10:39:33 -07:00
Linus Torvalds a5ad5742f6 Merge branch 'akpm' (patches from Andrew)
Merge even more updates from Andrew Morton:

 - a kernel-wide sweep of show_stack()

 - pagetable cleanups

 - abstract out accesses to mmap_sem - prep for mmap_sem scalability work

 - hch's user acess work

Subsystems affected by this patch series: debug, mm/pagemap, mm/maccess,
mm/documentation.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (93 commits)
  include/linux/cache.h: expand documentation over __read_mostly
  maccess: return -ERANGE when probe_kernel_read() fails
  x86: use non-set_fs based maccess routines
  maccess: allow architectures to provide kernel probing directly
  maccess: move user access routines together
  maccess: always use strict semantics for probe_kernel_read
  maccess: remove strncpy_from_unsafe
  tracing/kprobes: handle mixed kernel/userspace probes better
  bpf: rework the compat kernel probe handling
  bpf:bpf_seq_printf(): handle potentially unsafe format string better
  bpf: handle the compat string in bpf_trace_copy_string better
  bpf: factor out a bpf_trace_copy_string helper
  maccess: unify the probe kernel arch hooks
  maccess: remove probe_read_common and probe_write_common
  maccess: rename strnlen_unsafe_user to strnlen_user_nofault
  maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault
  maccess: rename strncpy_from_unsafe_user to strncpy_from_user_nofault
  maccess: update the top of file comment
  maccess: clarify kerneldoc comments
  maccess: remove duplicate kerneldoc comments
  ...
2020-06-09 09:54:46 -07:00
Christoph Hellwig fa94111d94 x86: use non-set_fs based maccess routines
Provide arch_kernel_read and arch_kernel_write routines to implement the
maccess routines without messing with set_fs and without stac/clac that
opens up access to user space.

[akpm@linux-foundation.org: coding style fixes]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-20-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:16 -07:00
Christoph Hellwig 98a23609b1 maccess: always use strict semantics for probe_kernel_read
Except for historical confusion in the kprobes/uprobes and bpf tracers,
which has been fixed now, there is no good reason to ever allow user
memory accesses from probe_kernel_read.  Switch probe_kernel_read to only
read from kernel memory.

[akpm@linux-foundation.org: update it for "mm, dump_page(): do not crash with invalid mapping pointer"]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-17-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig eab0c6089b maccess: unify the probe kernel arch hooks
Currently architectures have to override every routine that probes
kernel memory, which includes a pure read and strcpy, both in strict
and not strict variants.  Just provide a single arch hooks instead to
make sure all architectures cover all the cases.

[akpm@linux-foundation.org: fix !CONFIG_X86_64 build]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-11-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Christoph Hellwig c4cb164426 maccess: rename strncpy_from_unsafe_strict to strncpy_from_kernel_nofault
This matches the naming of strncpy_from_user_nofault, and also makes it
more clear what the function is supposed to do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200521152301.2587579-8-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:15 -07:00
Michel Lespinasse c1e8d7c6a7 mmap locking API: convert mmap_sem comments
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse da1c55f1b2 mmap locking API: rename mmap_sem to mmap_lock
Rename the mmap_sem field to mmap_lock.  Any new uses of this lock should
now go through the new mmap locking api.  The mmap_lock is still
implemented as a rwsem, though this could change in the future.

[akpm@linux-foundation.org: fix it for mm-gup-might_lock_readmmap_sem-in-get_user_pages_fast.patch]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-11-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse 42fc541404 mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()
Add new APIs to assert that mmap_sem is held.

Using this instead of rwsem_is_locked and lockdep_assert_held[_write]
makes the assertions more tolerant of future changes to the lock type.

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-10-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse 14c3656b72 mmap locking API: add MMAP_LOCK_INITIALIZER
Define a new initializer for the mmap locking api.  Initially this just
evaluates to __RWSEM_INITIALIZER as the API is defined as wrappers around
rwsem.

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-9-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse 89154dd531 mmap locking API: convert mmap_sem call sites missed by coccinelle
Convert the last few remaining mmap_sem rwsem calls to use the new mmap
locking API.  These were missed by coccinelle for some reason (I think
coccinelle does not support some of the preprocessor constructs in these
files ?)

[akpm@linux-foundation.org: convert linux-next leftovers]
[akpm@linux-foundation.org: more linux-next leftovers]
[akpm@linux-foundation.org: more linux-next leftovers]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-6-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Michel Lespinasse d8ed45c5dc mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Mike Rapoport 974b9b2c68 mm: consolidate pte_index() and pte_offset_*() definitions
All architectures define pte_index() as

	(address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)

and all architectures define pte_offset_kernel() as an entry in the array
of PTEs indexed by the pte_index().

For the most architectures the pte_offset_kernel() implementation relies
on the availability of pmd_page_vaddr() that converts a PMD entry value to
the virtual address of the page containing PTEs array.

Let's move x86 definitions of the PTE accessors to the generic place in
<linux/pgtable.h> and then simply drop the respective definitions from the
other architectures.

The architectures that didn't provide pmd_page_vaddr() are updated to have
that defined.

The generic implementation of pte_offset_kernel() can be overridden by an
architecture and alpha makes use of this because it has special ordering
requirements for its version of pte_offset_kernel().

[rppt@linux.ibm.com: v2]
  Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org
[rppt@linux.ibm.com: update]
  Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org
[rppt@linux.ibm.com: update]
  Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org
[akpm@linux-foundation.org: fix x86 warning]
[sfr@canb.auug.org.au: fix powerpc build]
  Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Mike Rapoport e05c7b1f2b mm: pgtable: add shortcuts for accessing kernel PMD and PTE
The powerpc 32-bit implementation of pgtable has nice shortcuts for
accessing kernel PMD and PTE for a given virtual address.  Make these
helpers available for all architectures.

[rppt@linux.ibm.com: microblaze: fix page table traversal in setup_rt_frame()]
  Link: http://lkml.kernel.org/r/20200518191511.GD1118872@kernel.org
[akpm@linux-foundation.org: s/pmd_ptr_k/pmd_off_k/ in various powerpc places]

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-9-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Mike Rapoport 88107d330d x86/mm: simplify init_trampoline() and surrounding logic
There are three cases for the trampoline initialization:
* 32-bit does nothing
* 64-bit with kaslr disabled simply copies a PGD entry from the direct map
  to the trampoline PGD
* 64-bit with kaslr enabled maps the real mode trampoline at PUD level

These cases are currently differentiated by a bunch of ifdefs inside
asm/include/pgtable.h and the case of 64-bits with kaslr on uses
pgd_index() helper.

Replacing the ifdefs with a static function in arch/x86/mm/init.c gives
clearer code and allows moving pgd_index() to the generic implementation
in include/linux/pgtable.h

[rppt@linux.ibm.com: take CONFIG_RANDOMIZE_MEMORY into account in kaslr_enabled()]
  Link: http://lkml.kernel.org/r/20200525104045.GB13212@linux.ibm.com

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-8-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Mike Rapoport 65fddcfca8 mm: reorder includes after introduction of linux/pgtable.h
The replacement of <asm/pgrable.h> with <linux/pgtable.h> made the include
of the latter in the middle of asm includes.  Fix this up with the aid of
the below script and manual adjustments here and there.

	import sys
	import re

	if len(sys.argv) is not 3:
	    print "USAGE: %s <file> <header>" % (sys.argv[0])
	    sys.exit(1)

	hdr_to_move="#include <linux/%s>" % sys.argv[2]
	moved = False
	in_hdrs = False

	with open(sys.argv[1], "r") as f:
	    lines = f.readlines()
	    for _line in lines:
		line = _line.rstrip('
')
		if line == hdr_to_move:
		    continue
		if line.startswith("#include <linux/"):
		    in_hdrs = True
		elif not moved and in_hdrs:
		    moved = True
		    print hdr_to_move
		print line

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Mike Rapoport ca5999fde0 mm: introduce include/linux/pgtable.h
The include/linux/pgtable.h is going to be the home of generic page table
manipulation functions.

Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
make the latter include asm/pgtable.h.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Mike Rapoport e31cf2f4ca mm: don't include asm/pgtable.h if linux/mm.h is already included
Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once.  For
instance, we have 31 definition of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

static inline unsigned long pmd_index(unsigned long address)
{
        return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

For architectures that really need a custom version there is always
possibility to override the generic version with the usual ifdefs magic.

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g.  pte_alloc() and
pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are remove with a simple loop:

	for f in $(git grep -l "include <linux/mm.h>") ; do
		sed -i -e '/include <asm\/pgtable.h>/ d' $f
	done

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Dmitry Safonov 9cb8f069de kernel: rename show_stack_loglvl() => show_stack()
Now the last users of show_stack() got converted to use an explicit log
level, show_stack_loglvl() can drop it's redundant suffix and become once
again well known show_stack().

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200418201944.482088-51-dima@arista.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:13 -07:00
Dmitry Safonov 9ed5b01a36 x86/amd_gart: print stacktrace for a leak with KERN_ERR
It's under CONFIG_IOMMU_LEAK option which is enabled by debug config.
Likely the backtrace is worth to be seen - so aligning with log level of
error message in iommu_full().

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200418201944.482088-46-dima@arista.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:12 -07:00
Dmitry Safonov a832ff0224 x86: add show_stack_loglvl()
Currently, the log-level of show_stack() depends on a platform
realization.  It creates situations where the headers are printed with
lower log level or higher than the stacktrace (depending on a platform or
user).

Furthermore, it forces the logic decision from user to an architecture
side.  In result, some users as sysrq/kdb/etc are doing tricks with
temporary rising console_loglevel while printing their messages.  And in
result it not only may print unwanted messages from other CPUs, but also
omit printing at all in the unlucky case where the printk() was deferred.

Introducing log-level parameter and KERN_UNSUPPRESSED [1] seems an easier
approach than introducing more printk buffers.  Also, it will consolidate
printings with headers.

Introduce show_stack_loglvl(), that eventually will substitute
show_stack().

[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-dima@arista.com/T/#u

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200418201944.482088-42-dima@arista.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:12 -07:00
Dmitry Safonov d46b3df78a x86: add missing const qualifiers for log_lvl
Currently, the log-level of show_stack() depends on a platform
realization.  It creates situations where the headers are printed with
lower log level or higher than the stacktrace (depending on a platform or
user).

Furthermore, it forces the logic decision from user to an architecture
side.  In result, some users as sysrq/kdb/etc are doing tricks with
temporary rising console_loglevel while printing their messages.  And in
result it not only may print unwanted messages from other CPUs, but also
omit printing at all in the unlucky case where the printk() was deferred.

Introducing log-level parameter and KERN_UNSUPPRESSED [1] seems an easier
approach than introducing more printk buffers.  Also, it will consolidate
printings with headers.

Keep log_lvl const show_trace_log_lvl() and printk_stack_address() as the
new generic show_stack_loglvl() wants to have a proper const qualifier.

And gcc rightfully produces warnings in case it's not keept:
arch/x86/kernel/dumpstack.c: In function `show_stack':
arch/x86/kernel/dumpstack.c:294:37: warning: passing argument 4 of `show_trace_log_lv ' discards `const' qualifier from pointer target type [-Wdiscarded-qualifiers]
  294 |  show_trace_log_lvl(task, NULL, sp, loglvl);
      |                                     ^~~~~~
arch/x86/kernel/dumpstack.c:163:32: note: expected `char *' but argument is of type `const char *'
  163 |    unsigned long *stack, char *log_lvl)
      |                          ~~~~~~^~~~~~~

[1]: https://lore.kernel.org/lkml/20190528002412.1625-1-dima@arista.com/T/#u

Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200418201944.482088-41-dima@arista.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:12 -07:00
Linus Torvalds 8b4d37db9a Merge branch 'x86/srbds' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 srbds fixes from Thomas Gleixner:
 "The 9th episode of the dime novel "The performance killer" with the
  subtitle "Slow Randomizing Boosts Denial of Service".

  SRBDS is an MDS-like speculative side channel that can leak bits from
  the random number generator (RNG) across cores and threads. New
  microcode serializes the processor access during the execution of
  RDRAND and RDSEED. This ensures that the shared buffer is overwritten
  before it is released for reuse. This is equivalent to a full bus
  lock, which means that many threads running the RNG instructions in
  parallel have the same effect as the same amount of threads issuing a
  locked instruction targeting an address which requires locking of two
  cachelines at once.

  The mitigation support comes with the usual pile of unpleasant
  ingredients:

   - command line options

   - sysfs file

   - microcode checks

   - a list of vulnerable CPUs identified by model and stepping this
     time which requires stepping match support for the cpu match logic.

   - the inevitable slowdown of affected CPUs"

* branch 'x86/srbds' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation: Add Ivy Bridge to affected list
  x86/speculation: Add SRBDS vulnerability and mitigation documentation
  x86/speculation: Add Special Register Buffer Data Sampling (SRBDS) mitigation
  x86/cpu: Add 'table' argument to cpu_matches()
2020-06-09 09:30:21 -07:00
Thomas Gleixner 7778d8417b x86/vdso: Unbreak paravirt VDSO clocks
The conversion of x86 VDSO to the generic clock mode storage broke the
paravirt and hyperv clocksource logic. These clock sources have their own
internal sequence counter to validate the clocksource at the point of
reading it. This is necessary because the hypervisor can invalidate the
clocksource asynchronously so a check during the VDSO data update is not
sufficient. If the internal check during read invalidates the clocksource
the read return U64_MAX. The original code checked this efficiently by
testing whether the result (casted to signed) is negative, i.e. bit 63 is
set. This was done that way because an extra indicator for the validity had
more overhead.

The conversion broke this check because the check was replaced by a check
for a valid VDSO clock mode.

The wreckage manifests itself when the paravirt clock is installed as a
valid VDSO clock and during runtime invalidated by the hypervisor,
e.g. after a host suspend/resume cycle. After the invalidation the read
function returns U64_MAX which is used as cycles and makes the clock jump
by ~2200 seconds, and become stale until the 2200 seconds have elapsed
where it starts to jump again. The period of this effect depends on the
shift/mult pair of the clocksource and the jumps and staleness are an
artifact of undefined but reproducible behaviour of math overflow.

Implement an x86 version of the new vdso_cycles_ok() inline which adds this
check back and a variant of vdso_clocksource_ok() which lets the compiler
optimize it out to avoid the extra conditional. That's suboptimal when the
system does not have a VDSO capable clocksource, but that's not the case
which is optimized for.

Fixes: 5d51bee725 ("clocksource: Add common vdso clock mode storage")
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200606221532.080560273@linutronix.de
2020-06-09 16:36:49 +02:00
Sean Christopherson 80fbd280be KVM: x86: Unexport x86_fpu_cache and make it static
Make x86_fpu_cache static now that FPU allocation and destruction is
handled entirely by common x86 code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200608180218.20946-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-09 05:57:27 -04:00
Bob Haarman d8ad6d39c3 x86_64: Fix jiffies ODR violation
'jiffies' and 'jiffies_64' are meant to alias (two different symbols that
share the same address).  Most architectures make the symbols alias to the
same address via a linker script assignment in their
arch/<arch>/kernel/vmlinux.lds.S:

jiffies = jiffies_64;

which is effectively a definition of jiffies.

jiffies and jiffies_64 are both forward declared for all architectures in
include/linux/jiffies.h. jiffies_64 is defined in kernel/time/timer.c.

x86_64 was peculiar in that it wasn't doing the above linker script
assignment, but rather was:
1. defining jiffies in arch/x86/kernel/time.c instead via the linker script.
2. overriding the symbol jiffies_64 from kernel/time/timer.c in
arch/x86/kernel/vmlinux.lds.s via 'jiffies_64 = jiffies;'.

As Fangrui notes:

  In LLD, symbol assignments in linker scripts override definitions in
  object files. GNU ld appears to have the same behavior. It would
  probably make sense for LLD to error "duplicate symbol" but GNU ld
  is unlikely to adopt for compatibility reasons.

This results in an ODR violation (UB), which seems to have survived
thus far. Where it becomes harmful is when;

1. -fno-semantic-interposition is used:

As Fangrui notes:

  Clang after LLVM commit 5b22bcc2b70d
  ("[X86][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local")
  defaults to -fno-semantic-interposition similar semantics which help
  -fpic/-fPIC code avoid GOT/PLT when the referenced symbol is defined
  within the same translation unit. Unlike GCC
  -fno-semantic-interposition, Clang emits such relocations referencing
  local symbols for non-pic code as well.

This causes references to jiffies to refer to '.Ljiffies$local' when
jiffies is defined in the same translation unit. Likewise, references to
jiffies_64 become references to '.Ljiffies_64$local' in translation units
that define jiffies_64.  Because these differ from the names used in the
linker script, they will not be rewritten to alias one another.

2. Full LTO

Full LTO effectively treats all source files as one translation
unit, causing these local references to be produced everywhere.  When
the linker processes the linker script, there are no longer any
references to jiffies_64' anywhere to replace with 'jiffies'.  And
thus '.Ljiffies$local' and '.Ljiffies_64$local' no longer alias
at all.

In the process of porting patches enabling Full LTO from arm64 to x86_64,
spooky bugs have been observed where the kernel appeared to boot, but init
doesn't get scheduled.

Avoid the ODR violation by matching other architectures and define jiffies
only by linker script.  For -fno-semantic-interposition + Full LTO, there
is no longer a global definition of jiffies for the compiler to produce a
local symbol which the linker script won't ensure aliases to jiffies_64.

Fixes: 40747ffa5a ("asmlinkage: Make jiffies visible")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Alistair Delva <adelva@google.com>
Debugged-by: Nick Desaulniers <ndesaulniers@google.com>
Debugged-by: Sami Tolvanen <samitolvanen@google.com>
Suggested-by: Fangrui Song <maskray@google.com>
Signed-off-by: Bob Haarman <inglorion@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # build+boot on
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/852
Link: https://lkml.kernel.org/r/20200602193100.229287-1-inglorion@google.com
2020-06-09 10:50:56 +02:00
Anthony Steinhauser 4d8df8cbb9 x86/speculation: PR_SPEC_FORCE_DISABLE enforcement for indirect branches.
Currently, it is possible to enable indirect branch speculation even after
it was force-disabled using the PR_SPEC_FORCE_DISABLE option. Moreover, the
PR_GET_SPECULATION_CTRL command gives afterwards an incorrect result
(force-disabled when it is in fact enabled). This also is inconsistent
vs. STIBP and the documention which cleary states that
PR_SPEC_FORCE_DISABLE cannot be undone.

Fix this by actually enforcing force-disabled indirect branch
speculation. PR_SPEC_ENABLE called after PR_SPEC_FORCE_DISABLE now fails
with -EPERM as described in the documentation.

Fixes: 9137bb27e6 ("x86/speculation: Add prctl() control for indirect branch speculation")
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
2020-06-09 10:50:55 +02:00
Anthony Steinhauser dbbe2ad02e x86/speculation: Prevent rogue cross-process SSBD shutdown
On context switch the change of TIF_SSBD and TIF_SPEC_IB are evaluated
to adjust the mitigations accordingly. This is optimized to avoid the
expensive MSR write if not needed.

This optimization is buggy and allows an attacker to shutdown the SSBD
protection of a victim process.

The update logic reads the cached base value for the speculation control
MSR which has neither the SSBD nor the STIBP bit set. It then OR's the
SSBD bit only when TIF_SSBD is different and requests the MSR update.

That means if TIF_SSBD of the previous and next task are the same, then
the base value is not updated, even if TIF_SSBD is set. The MSR write is
not requested.

Subsequently if the TIF_STIBP bit differs then the STIBP bit is updated
in the base value and the MSR is written with a wrong SSBD value.

This was introduced when the per task/process conditional STIPB
switching was added on top of the existing SSBD switching.

It is exploitable if the attacker creates a process which enforces SSBD
and has the contrary value of STIBP than the victim process (i.e. if the
victim process enforces STIBP, the attacker process must not enforce it;
if the victim process does not enforce STIBP, the attacker process must
enforce it) and schedule it on the same core as the victim process. If
the victim runs after the attacker the victim becomes vulnerable to
Spectre V4.

To fix this, update the MSR value independent of the TIF_SSBD difference
and dependent on the SSBD mitigation method available. This ensures that
a subsequent STIPB initiated MSR write has the correct state of SSBD.

[ tglx: Handle X86_FEATURE_VIRT_SSBD & X86_FEATURE_VIRT_SSBD correctly
        and massaged changelog ]

Fixes: 5bfbe3ad58 ("x86/speculation: Prepare for per task indirect branch speculation control")
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
2020-06-09 10:50:55 +02:00
Anthony Steinhauser 21998a3515 x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.
When STIBP is unavailable or enhanced IBRS is available, Linux
force-disables the IBPB mitigation of Spectre-BTB even when simultaneous
multithreading is disabled. While attempts to enable IBPB using
prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, ...) fail with
EPERM, the seccomp syscall (or its prctl(PR_SET_SECCOMP, ...) equivalent)
which are used e.g. by Chromium or OpenSSH succeed with no errors but the
application remains silently vulnerable to cross-process Spectre v2 attacks
(classical BTB poisoning). At the same time the SYSFS reporting
(/sys/devices/system/cpu/vulnerabilities/spectre_v2) displays that IBPB is
conditionally enabled when in fact it is unconditionally disabled.

STIBP is useful only when SMT is enabled. When SMT is disabled and STIBP is
unavailable, it makes no sense to force-disable also IBPB, because IBPB
protects against cross-process Spectre-BTB attacks regardless of the SMT
state. At the same time since missing STIBP was only observed on AMD CPUs,
AMD does not recommend using STIBP, but recommends using IBPB, so disabling
IBPB because of missing STIBP goes directly against AMD's advice:
https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf

Similarly, enhanced IBRS is designed to protect cross-core BTB poisoning
and BTB-poisoning attacks from user space against kernel (and
BTB-poisoning attacks from guest against hypervisor), it is not designed
to prevent cross-process (or cross-VM) BTB poisoning between processes (or
VMs) running on the same core. Therefore, even with enhanced IBRS it is
necessary to flush the BTB during context-switches, so there is no reason
to force disable IBPB when enhanced IBRS is available.

Enable the prctl control of IBPB even when STIBP is unavailable or enhanced
IBRS is available.

Fixes: 7cc765a67d ("x86/speculation: Enable prctl mode for spectre_v2_user")
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
2020-06-09 10:50:54 +02:00
Christoph Hellwig e0cf615d72 asm-generic: don't include <linux/mm.h> in cacheflush.h
This seems to lead to some crazy include loops when using
asm-generic/cacheflush.h on more architectures, so leave it to the arch
header for now.

[hch@lst.de: fix warning]
  Link: http://lkml.kernel.org/r/20200520173520.GA11199@lst.de

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Will Deacon <will@kernel.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200515143646.3857579-7-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08 11:05:57 -07:00
Eiichi Tsukata e649b3f018 KVM: x86: Fix APIC page invalidation race
Commit b1394e745b ("KVM: x86: fix APIC page invalidation") tried
to fix inappropriate APIC page invalidation by re-introducing arch
specific kvm_arch_mmu_notifier_invalidate_range() and calling it from
kvm_mmu_notifier_invalidate_range_start. However, the patch left a
possible race where the VMCS APIC address cache is updated *before*
it is unmapped:

  (Invalidator) kvm_mmu_notifier_invalidate_range_start()
  (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD)
  (KVM VCPU) vcpu_enter_guest()
  (KVM VCPU) kvm_vcpu_reload_apic_access_page()
  (Invalidator) actually unmap page

Because of the above race, there can be a mismatch between the
host physical address stored in the APIC_ACCESS_PAGE VMCS field and
the host physical address stored in the EPT entry for the APIC GPA
(0xfee0000).  When this happens, the processor will not trap APIC
accesses, and will instead show the raw contents of the APIC-access page.
Because Windows OS periodically checks for unexpected modifications to
the LAPIC register, this will show up as a BSOD crash with BugCheck
CRITICAL_STRUCTURE_CORRUPTION (109) we are currently seeing in
https://bugzilla.redhat.com/show_bug.cgi?id=1751017.

The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range()
cannot guarantee that no additional references are taken to the pages in
the range before kvm_mmu_notifier_invalidate_range_end().  Fortunately,
this case is supported by the MMU notifier API, as documented in
include/linux/mmu_notifier.h:

	 * If the subsystem
         * can't guarantee that no additional references are taken to
         * the pages in the range, it has to implement the
         * invalidate_range() notifier to remove any references taken
         * after invalidate_range_start().

The fix therefore is to reload the APIC-access page field in the VMCS
from kvm_mmu_notifier_invalidate_range() instead of ..._range_start().

Cc: stable@vger.kernel.org
Fixes: b1394e745b ("KVM: x86: fix APIC page invalidation")
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=197951
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Message-Id: <20200606042627.61070-1-eiichi.tsukata@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-08 09:05:38 -04:00
Paolo Bonzini fb7333dfd8 KVM: SVM: fix calls to is_intercept
is_intercept takes an INTERCEPT_* constant, not SVM_EXIT_*; because
of this, the compiler was removing the body of the conditionals,
as if is_intercept returned 0.

This unveils a latent bug: when clearing the VINTR intercept,
int_ctl must also be changed in the L1 VMCB (svm->nested.hsave),
just like the intercept itself is also changed in the L1 VMCB.
Otherwise V_IRQ remains set and, due to the VINTR intercept being clear,
we get a spurious injection of a vector 0 interrupt on the next
L2->L1 vmexit.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-08 08:00:57 -04:00
Vitaly Kuznetsov 25597f64c2 Revert "KVM: x86: work around leak of uninitialized stack contents"
handle_vmptrst()/handle_vmread() stopped injecting #PF unconditionally
and switched to nested_vmx_handle_memory_failure() which just kills the
guest with KVM_EXIT_INTERNAL_ERROR in case of MMIO access, zeroing
'exception' in kvm_write_guest_virt_system() is not needed anymore.

This reverts commit 541ab2aeb2.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200605115906.532682-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-08 07:59:42 -04:00
Vitaly Kuznetsov 7a35e515a7 KVM: VMX: Properly handle kvm_read/write_guest_virt*() result
Syzbot reports the following issue:

WARNING: CPU: 0 PID: 6819 at arch/x86/kvm/x86.c:618
 kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618
...
Call Trace:
...
RIP: 0010:kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618
...
 nested_vmx_get_vmptr+0x1f9/0x2a0 arch/x86/kvm/vmx/nested.c:4638
 handle_vmon arch/x86/kvm/vmx/nested.c:4767 [inline]
 handle_vmon+0x168/0x3a0 arch/x86/kvm/vmx/nested.c:4728
 vmx_handle_exit+0x29c/0x1260 arch/x86/kvm/vmx/vmx.c:6067

'exception' we're trying to inject with kvm_inject_emulated_page_fault()
comes from:

  nested_vmx_get_vmptr()
   kvm_read_guest_virt()
     kvm_read_guest_virt_helper()
       vcpu->arch.walk_mmu->gva_to_gpa()

but it is only set when GVA to GPA conversion fails. In case it doesn't but
we still fail kvm_vcpu_read_guest_page(), X86EMUL_IO_NEEDED is returned and
nested_vmx_get_vmptr() calls kvm_inject_emulated_page_fault() with zeroed
'exception'. This happen when the argument is MMIO.

Paolo also noticed that nested_vmx_get_vmptr() is not the only place in
KVM code where kvm_read/write_guest_virt*() return result is mishandled.
VMX instructions along with INVPCID have the same issue. This was already
noticed before, e.g. see commit 541ab2aeb2 ("KVM: x86: work around
leak of uninitialized stack contents") but was never fully fixed.

KVM could've handled the request correctly by going to userspace and
performing I/O but there doesn't seem to be a good need for such requests
in the first place.

Introduce vmx_handle_memory_failure() as an interim solution.

Note, nested_vmx_get_vmptr() now has three possible outcomes: OK, PF,
KVM_EXIT_INTERNAL_ERROR and callers need to know if userspace exit is
needed (for KVM_EXIT_INTERNAL_ERROR) in case of failure. We don't seem
to have a good enum describing this tristate, just add "int *ret" to
nested_vmx_get_vmptr() interface to pass the information.

Reported-by: syzbot+2a7156e11dc199bdbd8a@syzkaller.appspotmail.com
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200605115906.532682-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-08 07:59:08 -04:00
Linus Torvalds 1ee18de929 dma-mapping updates for 5.8, part 1
- enhance the dma pool to allow atomic allocation on x86 with AMD SEV
    (David Rientjes)
  - two small cleanups (Jason Yan and Peter Collingbourne)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl7bvTULHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYMJVhAAgTiWNzxPJhM6RTeRooM6W0NvcZGTJT6ExyJghaau
 aJvHUjXPrRmeBM8Zjwbbu5dioncd8c7npfRjBvATaEL74pa1u9gH3jnUTxh6L4WQ
 /FTNYryZVbprXJsdFuDZvCsO/CChqfZL8PWz+NFgIpICOyyXdorQELMhCaeOhnfU
 /goq6SvKmPlmXdb4eM2fXRD7udt1qlp+Oq2EZUdT3Xb4CBFsWUYbOMde22VY390Z
 2E9mEztOaKjNgAM/TfCoXo7iRUSwxcpO5aSliDhJJ/7uWaxyWTzFlaoIlwIkkNKb
 TcguNJbIZtjIXwBMv9gS6CqVEgFymmWqX5Tr23+vbb7S/235HqKtN1dPmV2h4R0H
 QOpvYXfm6kc4tpH4J32NMp+IqfQmwgMbNtUsiXWk5Lxl27cb8K2Q5eqEwxRWMbG+
 HObO7Kzb8oCygWwozZ+3QcWSr+9QAgzsb4Jl4jg6adjd8LDcbmKo4B9TKptGpVnL
 xjDleKdb/P4Vq55q9KHFLjqFUesuQIv2mKl2s+zr2BqROxjZ562kM9QHwsoCqc4Q
 tFuVed+XOoT7yhdKdtwEK7lwcQBtZgP5l/HgsoosmuJ975holsQ4pbKSf4A2Y4yo
 XwHYonSwOAEbi4nPxnvKIm4aUNq+PC44TH0VJcXud3tmQ/DGipdlLW8/nyw9ecfa
 qaQ=
 =GT3J
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.8' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - enhance the dma pool to allow atomic allocation on x86 with AMD SEV
   (David Rientjes)

 - two small cleanups (Jason Yan and Peter Collingbourne)

* tag 'dma-mapping-5.8' of git://git.infradead.org/users/hch/dma-mapping:
  dma-contiguous: fix comment for dma_release_from_contiguous
  dma-pool: scale the default DMA coherent pool size with memory capacity
  x86/mm: unencrypted non-blocking DMA allocations use coherent pools
  dma-pool: add pool sizes to debugfs
  dma-direct: atomic allocations must come from atomic coherent pools
  dma-pool: dynamically expanding atomic pools
  dma-pool: add additional coherent pools to map to gfp mask
  dma-remap: separate DMA atomic pools from direct remap code
  dma-debug: make __dma_entry_alloc_check_leak() static
2020-06-06 11:43:23 -07:00
Linus Torvalds 3925c3bbdf pci-v5.8-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl7ZUn8UHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwuyxAAnjyHTbjpLtFcSh1on0Tsehi6hAiV
 o8w40hj3geQv/zVPgRsui3OSE9PTihPacKUQBxs/7M/iSF3cKzQEreYqDNbOMlun
 957X6CD0yTPChjcHy2brFFhQqs6O8Nl/acvkHKKP369nZkost5A2TMHPSKzKajdB
 fYw6imeunIvSpUYH9GlP9CU+suj25UHQqxUUyXAHqmF4R89j3mTpSMYJzq3VN426
 R77xeRC26lUEHepRgEiaFmdEK0zGTp/uXsJQNwaIFduNfX6x6Oo8u9EC4Mrb48g1
 duoxLUV6a8bqpjkwpPfQIv7lw1Mm1mx/vMA040zGVQR98QAMZacCPJO7eVHM/N9/
 m62bymyKHuNxXhWQwbeAyF0ZH+usaCFh7uidd4L0x1Uvs/GNCGD/icuKxHTnRWLv
 1P4bZg0n23YFzE5/eXY2voos+rfrycIoC5gspuJdP/H2Q5UM/EnI1RhizIItuqJY
 ORAdczo7a8oamt90WWZ636HeL1UQIGZjq1iWz2++vRTJGzfLeuOJj76cn2RuDQsZ
 +rBtQ1zH3saEZSARPdS5HQH+SaDxtjfu/AaAz6NDNSvasDLqmjRdOkM6UIU2pqMG
 cE9um1ijcSy2UjUInVPr3Hk7bbbd9YJTQQiWr04uL6ap0JpwRhSPqQpDHtLwLh+E
 R13t08tEMd09730=
 =9ymj
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:

   - Program MPS for RCiEP devices (Ashok Raj)

   - Fix pci_register_host_bridge() device_register() error handling
     (Rob Herring)

   - Fix pci_host_bridge struct device release/free handling (Rob
     Herring)

  Resource management:

   - Allow resizing BARs for devices on root bus (Ard Biesheuvel)

  Power management:

   - Reduce Thunderbolt resume time by working around devices that don't
     support DLL Link Active reporting (Mika Westerberg)

   - Work around a Pericom USB controller OHCI/EHCI PME# defect
     (Kai-Heng Feng)

  Virtualization:

   - Add ACS quirk for Intel Root Complex Integrated Endpoints (Ashok
     Raj)

   - Avoid FLR for AMD Starship USB 3.0 (Kevin Buettner)

   - Avoid FLR for AMD Matisse HD Audio & USB 3.0 (Marcos Scriven)

  Error handling:

   - Use only _OSC (not HEST FIRMWARE_FIRST) to determine AER ownership
     (Alexandru Gagniuc, Kuppuswamy Sathyanarayanan)

   - Reduce verbosity by logging only ACPI_NOTIFY_DISCONNECT_RECOVER
     events (Kuppuswamy Sathyanarayanan)

   - Don't enable AER by default in Kconfig (Bjorn Helgaas)

  Peer-to-peer DMA:

   - Add AMD Zen Raven and Renoir Root Ports to whitelist (Alex Deucher)

  ASPM:

   - Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges (Kai-Heng Feng)

  Endpoint framework:

   - Fix DMA channel release in test (Kunihiko Hayashi)

   - Add page size as argument to pci_epc_mem_init() (Lad Prabhakar)

   - Add support to handle multiple base for mapping outbound memory
     (Lad Prabhakar)

  Generic host bridge driver:

   - Support building as module (Rob Herring)

   - Eliminate pci_host_common_probe wrappers (Rob Herring)

  Amlogic Meson PCIe controller driver:

   - Don't use FAST_LINK_MODE to set up link (Marc Zyngier)

  Broadcom STB PCIe controller driver:

   - Disable ASPM L0s if 'aspm-no-l0s' in DT (Jim Quinlan)

   - Fix clk_put() error (Jim Quinlan)

   - Fix window register offset (Jim Quinlan)

   - Assert fundamental reset on initialization (Nicolas Saenz Julienne)

   - Add notify xHCI reset property (Nicolas Saenz Julienne)

   - Add init routine for Raspberry Pi 4 VL805 USB controller (Nicolas
     Saenz Julienne)

   - Sync with Raspberry Pi 4 firmware for VL805 initialization (Nicolas
     Saenz Julienne)

  Cadence PCIe controller driver:

   - Remove "cdns,max-outbound-regions" DT property (replaced by
     "ranges") (Kishon Vijay Abraham I)

   - Read 32-bit (not 16-bit) Vendor ID/Device ID property from DT
     (Kishon Vijay Abraham I)

  Marvell Aardvark PCIe controller driver:

   - Improve link training (Marek Behún)

   - Add PHY support (Marek Behún)

   - Add "phys", "max-link-speed", "reset-gpios" to dt-binding (Marek
     Behún)

   - Train link immediately after enabling training to work around
     detection issues with some cards (Pali Rohár)

   - Issue PERST via GPIO to work around detection issues (Pali Rohár)

   - Don't blindly enable ASPM L0s (Pali Rohár)

   - Replace custom macros by standard linux/pci_regs.h macros (Pali
     Rohár)

  Microsoft Hyper-V host bridge driver:

   - Fix probe failure path to release resource (Wei Hu)

   - Retry PCI bus D0 entry on invalid device state for kdump (Wei Hu)

  Renesas R-Car PCIe controller driver:

   - Fix incorrect programming of OB windows (Andrew Murray)

   - Add suspend/resume (Kazufumi Ikeda)

   - Rename pcie-rcar.c to pcie-rcar-host.c (Lad Prabhakar)

   - Add endpoint controller driver (Lad Prabhakar)

   - Fix PCIEPAMR mask calculation (Lad Prabhakar)

   - Add r8a77961 to DT binding (Yoshihiro Shimoda)

  Socionext UniPhier Pro5 controller driver:

   - Add endpoint controller driver (Kunihiko Hayashi)

  Synopsys DesignWare PCIe controller driver:

   - Program outbound ATU upper limit register (Alan Mikhak)

   - Fix inner MSI IRQ domain registration (Marc Zyngier)

  Miscellaneous:

   - Check for platform_get_irq() failure consistently (negative return
     means failure) (Aman Sharma)

   - Fix several runtime PM get/put imbalances (Dinghao Liu)

   - Use flexible-array and struct_size() helpers for code cleanup
     (Gustavo A. R. Silva)

   - Update & fix issues in bridge emulation of PCIe registers (Jon
     Derrick)

   - Add macros for bridge window names (PCI_BRIDGE_IO_WINDOW, etc)
     (Krzysztof Wilczyński)

   - Work around Intel PCH MROMs that have invalid BARs (Xiaochun Lee)"

* tag 'pci-v5.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (100 commits)
  PCI: uniphier: Add Socionext UniPhier Pro5 PCIe endpoint controller driver
  PCI: Add ACS quirk for Intel Root Complex Integrated Endpoints
  PCI/DPC: Print IRQ number used by port
  PCI/AER: Use "aer" variable for capability offset
  PCI/AER: Remove redundant dev->aer_cap checks
  PCI/AER: Remove redundant pci_is_pcie() checks
  PCI/AER: Remove HEST/FIRMWARE_FIRST parsing for AER ownership
  PCI: tegra: Fix runtime PM imbalance on error
  PCI: vmd: Filter resource type bits from shadow register
  PCI: tegra194: Fix runtime PM imbalance on error
  dt-bindings: PCI: Add UniPhier PCIe endpoint controller description
  PCI: hv: Use struct_size() helper
  PCI: Rename _DSM constants to align with spec
  PCI: Avoid FLR for AMD Starship USB 3.0
  PCI: Avoid FLR for AMD Matisse HD Audio & USB 3.0
  x86/PCI: Drop unused xen_register_pirq() gsi_override parameter
  PCI: dwc: Use private data pointer of "struct irq_domain" to get pcie_port
  PCI: amlogic: meson: Don't use FAST_LINK_MODE to set up link
  PCI: dwc: Fix inner MSI IRQ domain registration
  PCI: dwc: pci-dra7xx: Use devm_platform_ioremap_resource_byname()
  ...
2020-06-06 11:01:58 -07:00
Linus Torvalds ac7b34218a Split the old READ_IMPLIES_EXEC workaround from executable PT_GNU_STACK
now that toolchains long support PT_GNU_STACK marking and there's no
 need anymore to force modern programs into having all its user mappings
 executable instead of only the stack and the PROT_EXEC ones. Disable
 that automatic READ_IMPLIES_EXEC forcing on x86-64 and arm64. Add tables
 documenting how READ_IMPLIES_EXEC is handled on x86-64, arm and arm64.
 By Kees Cook.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl7YFDIACgkQEsHwGGHe
 VUpnzxAAmXdODNOb1gGQvt+KJthkfkWh2A2R+tWxCRmFtjFTcS/eRxFfvGu2KmFY
 2b2AcJzuJeGjs7WIvQU0pkR2p6STyzuSBBLj5J/OJR9FonQ4pPah38df4A0fOgI6
 GJyJV9Ie7O2Ph1w2iLOeWBdmR90CnYuabxsfipgOL+sjHlEI0RqLSDgARRQsxTEj
 KM+JVAFD472KcUJnQKBVBOD1I1DOVBGu12r3y6chgsOtwshLNW/cO15cDgYrgnJZ
 OlR3EIUukCEEc1KQzUCihsypLuGfrmdq1MyPN8CME8gLfmOBsJyGRDhvmdbS+Wxh
 kAMYQ9BuNP/jMVtN950qV0qUtnZCeIPlj1sDb9STWz5fInLsXDSCS0eYi32yBFi+
 7yviVU95ml6Mda1Qd5axItTHFAjKIn0qfMZszkLOtUszIzNinCgH7t3ThoXeV223
 BqrpntRwiGZVpXDdcp0QFYBsWSMchR47yuhL8pB4SWxQzgNzXqAEg2KFQU0XMDKp
 pdia9IzUozg/BrjG5cnRfZhq2lBra7fy3Dn6fw5+NR5vqhka0Wr8L6dyM1Rj74EU
 HPk5bRXgt0OIiIFPi4139ApY7k+8j2nbf12qUchue1ZVVKzbvK996FDXbrGgW3zD
 Wis1wglxB9urSUTmC1bMOeyOd+gebo3i/ACAjgSo+EbDN7qW0Qw=
 =2L7y
 -----END PGP SIGNATURE-----

Merge tag 'core_core_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull READ_IMPLIES_EXEC changes from Borislav Petkov:
 "Split the old READ_IMPLIES_EXEC workaround from executable
  PT_GNU_STACK now that toolchains long support PT_GNU_STACK marking and
  there's no need anymore to force modern programs into having all its
  user mappings executable instead of only the stack and the PROT_EXEC
  ones.

  Disable that automatic READ_IMPLIES_EXEC forcing on x86-64 and
  arm64.

  Add tables documenting how READ_IMPLIES_EXEC is handled on x86-64, arm
  and arm64.

  By Kees Cook"

* tag 'core_core_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arm64/elf: Disable automatic READ_IMPLIES_EXEC for 64-bit address spaces
  arm32/64/elf: Split READ_IMPLIES_EXEC from executable PT_GNU_STACK
  arm32/64/elf: Add tables to document READ_IMPLIES_EXEC
  x86/elf: Disable automatic READ_IMPLIES_EXEC on 64-bit
  x86/elf: Split READ_IMPLIES_EXEC from executable PT_GNU_STACK
  x86/elf: Add table to document READ_IMPLIES_EXEC
2020-06-05 13:45:21 -07:00
Linus Torvalds f4dd60a3d4 Misc changes:
- Unexport various PAT primitives
 
  - Unexport per-CPU tlbstate
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7Z+3cRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jgyxAAjPoXEzi9rqGHY6Eus37DNbzHtdQj4fqN
 68h8T2tSnOMzETe3L/c4puxI50YFpMA0sFbzm8BfjCtucs0K7Tj4Sv8Aoap2b99A
 /bP+ySgHh2BMoI/tu9TiD8et+vttAGGwkXQhIOgeakZcYzpAY7oUNwc+CogkytbQ
 DaC8s9FL7RjCXCL91fvZ33C0ksg5J9ynFbRozEHOacHPrE3CbrqUwu+75PmS7nJC
 13vatOxjdqNPQhVMg7waN1nHv7K06kph1wxWxYHoD0QwAPy1ecE84wLvg9gv5AqK
 BfUBmB34qRW21qbB5tQrMlGDS9tuV0vUB1fxUV7/iOKXQUH6viEG/7J7jm+YwXji
 U9S54UPj/TOp8fvYdS18sp6vI1gS3HKjd3LO3pPHWsyZVMJBoGuMConZRs3C31Cp
 WuwBU1gY+mFB5l4prt8WU8ocPvEnZkP00cCYNyzPk21tblfUwFbrmu3wcZxOkx3s
 ZhRO4KrhxtL7l/wDLuNtWShBL2c6Rz2tts58tr/fj/M+UscJK2MPKxPLCAb20QYZ
 qSkMa36+r8LkuMCyjpegEEmo4sw9yC6aLXFKfYu2ABki5o9AR4tavk+lwO+dad6T
 k0DJjGXLsG9sReR6hrfaNTk5h7ImiRFDVntnWAhgKhARRoloJJS4/RkzW+ylPbac
 mTuNNJDChUQ=
 =RXKK
 -----END PGP SIGNATURE-----

Merge tag 'x86-mm-2020-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Ingo Molnar:
 "Misc changes:

   - Unexport various PAT primitives

   - Unexport per-CPU tlbstate and uninline TLB helpers"

* tag 'x86-mm-2020-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/tlb/uv: Add a forward declaration for struct flush_tlb_info
  x86/cpu: Export native_write_cr4() only when CONFIG_LKTDM=m
  x86/tlb: Restrict access to tlbstate
  xen/privcmd: Remove unneeded asm/tlb.h include
  x86/tlb: Move PCID helpers where they are used
  x86/tlb: Uninline nmi_uaccess_okay()
  x86/tlb: Move cr4_set_bits_and_update_boot() to the usage site
  x86/tlb: Move paravirt_tlb_remove_table() to the usage site
  x86/tlb: Move __flush_tlb_all() out of line
  x86/tlb: Move flush_tlb_others() out of line
  x86/tlb: Move __flush_tlb_one_kernel() out of line
  x86/tlb: Move __flush_tlb_one_user() out of line
  x86/tlb: Move __flush_tlb_global() out of line
  x86/tlb: Move __flush_tlb() out of line
  x86/alternatives: Move temporary_mm helpers into C
  x86/cr4: Sanitize CR4.PCE update
  x86/cpu: Uninline CR4 accessors
  x86/tlb: Uninline __get_current_cr3_fast()
  x86/mm: Use pgprotval_t in protval_4k_2_large() and protval_large_2_4k()
  x86/mm: Unexport __cachemode2pte_tbl
  ...
2020-06-05 11:18:53 -07:00
Paolo Bonzini 34d2618d33 KVM: x86: emulate reserved nops from 0f/18 to 0f/1f
Instructions starting with 0f18 up to 0f1f are reserved nops, except those
that were assigned to MPX.  These include the endbr markers used by CET.
List them correctly in the opcode table.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-05 11:16:15 -04:00
Linus Torvalds 886d7de631 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - More MM work. 100ish more to go. Mike Rapoport's "mm: remove
   __ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue

 - Various other little subsystems

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits)
  lib/ubsan.c: fix gcc-10 warnings
  tools/testing/selftests/vm: remove duplicate headers
  selftests: vm: pkeys: fix multilib builds for x86
  selftests: vm: pkeys: use the correct page size on powerpc
  selftests/vm/pkeys: override access right definitions on powerpc
  selftests/vm/pkeys: test correct behaviour of pkey-0
  selftests/vm/pkeys: introduce a sub-page allocator
  selftests/vm/pkeys: detect write violation on a mapped access-denied-key page
  selftests/vm/pkeys: associate key on a mapped page and detect write violation
  selftests/vm/pkeys: associate key on a mapped page and detect access violation
  selftests/vm/pkeys: improve checks to determine pkey support
  selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust()
  selftests/vm/pkeys: fix number of reserved powerpc pkeys
  selftests/vm/pkeys: introduce powerpc support
  selftests/vm/pkeys: introduce generic pkey abstractions
  selftests: vm: pkeys: use the correct huge page size
  selftests/vm/pkeys: fix alloc_random_pkey() to make it really random
  selftests/vm/pkeys: fix assertion in pkey_disable_set/clear()
  selftests/vm/pkeys: fix pkey_disable_clear()
  selftests: vm: pkeys: add helpers for pkey bits
  ...
2020-06-04 19:18:29 -07:00
Ira Weiny 090e77e166 kmap: consolidate kmap_prot definitions
Most architectures define kmap_prot to be PAGE_KERNEL.

Let sparc and xtensa define there own and define PAGE_KERNEL as the
default if not overridden.

[akpm@linux-foundation.org: coding style fixes]
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-16-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Ira Weiny 20b271dfe9 arch/kmap: define kmap_atomic_prot() for all arch's
To support kmap_atomic_prot(), all architectures need to support
protections passed to their kmap_atomic_high() function.  Pass protections
into kmap_atomic_high() and change the name to kmap_atomic_high_prot() to
match.

Then define kmap_atomic_prot() as a core function which calls
kmap_atomic_high_prot() when needed.

Finally, redefine kmap_atomic() as a wrapper of kmap_atomic_prot() with
the default kmap_prot exported by the architectures.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-11-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Ira Weiny abca2500c0 arch/kunmap_atomic: consolidate duplicate code
Every single architecture (including !CONFIG_HIGHMEM) calls...

	pagefault_enable();
	preempt_enable();

... before returning from __kunmap_atomic().  Lift this code into the
kunmap_atomic() macro.

While we are at it rename __kunmap_atomic() to kunmap_atomic_high() to
be consistent.

[ira.weiny@intel.com: don't enable pagefault/preempt twice]
  Link: http://lkml.kernel.org/r/20200518184843.3029640-1-ira.weiny@intel.com
[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/20200507150004.1423069-8-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Ira Weiny 78b6d91ec7 arch/kmap_atomic: consolidate duplicate code
Every arch has the same code to ensure atomic operations and a check for
!HIGHMEM page.

Remove the duplicate code by defining a core kmap_atomic() which only
calls the arch specific kmap_atomic_high() when the page is high memory.

[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-7-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Ira Weiny ee9bc5fdf5 {x86,powerpc,microblaze}/kmap: move preempt disable
During this kmap() conversion series we must maintain bisect-ability.  To
do this, kmap_atomic_prot() in x86, powerpc, and microblaze need to remain
functional.

Create a temporary inline version of kmap_atomic_prot within these
architectures so we can rework their kmap_atomic() calls and then lift
kmap_atomic_prot() to the core.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-6-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Ira Weiny e23c45976f arch/kunmap: remove duplicate kunmap implementations
All architectures do exactly the same thing for kunmap(); remove all the
duplicate definitions and lift the call to the core.

This also has the benefit of changing kmap_unmap() on a number of
architectures to be an inline call rather than an actual function.

[akpm@linux-foundation.org: fix CONFIG_HIGHMEM=n build on various architectures]
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-5-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Ira Weiny 525aaf9bad arch/kmap: remove redundant arch specific kmaps
The kmap code for all the architectures is almost 100% identical.

Lift the common code to the core.  Use ARCH_HAS_KMAP_FLUSH_TLB to indicate
if an arch defines kmap_flush_tlb() and call if if needed.

This also has the benefit of changing kmap() on a number of architectures
to be an inline call rather than an actual function.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200507150004.1423069-4-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Ira Weiny 01c4b788e0 arch/kmap: remove BUG_ON()
Patch series "Remove duplicated kmap code", v3.

The kmap infrastructure has been copied almost verbatim to every
architecture.  This series consolidates obvious duplicated code by
defining core functions which call into the architectures only when
needed.

Some of the k[un]map_atomic() implementations have some similarities but
the similarities were not sufficient to warrant further changes.

In addition we remove a duplicate implementation of kmap() in DRM.

This patch (of 15):

Replace the use of BUG_ON(in_interrupt()) in the kmap() and kunmap() in
favor of might_sleep().

Besides the benefits of might_sleep(), this normalizes the implementations
such that they can be made generic in subsequent patches.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Link: http://lkml.kernel.org/r/20200507150004.1423069-1-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20200507150004.1423069-2-ira.weiny@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:22 -07:00
Anshuman Khandual 399145f9eb mm/debug: add tests validating architecture page table helpers
This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.

This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc at various
level along with populating intermediate entries with next page table page
and validating them.

Test page table pages are allocated from system memory with required size
and alignments.  The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol.  This test gets called
via late_initcall().

This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected.
Any architecture, which is willing to subscribe this test will need to
select ARCH_HAS_DEBUG_VM_PGTABLE.  For now this is limited to arc, arm64,
x86, s390 and powerpc platforms where the test is known to build and run
successfully Going forward, other architectures too can subscribe the test
after fixing any build or runtime problems with their page table helpers.

Folks interested in making sure that a given platform's page table helpers
conform to expected generic MM semantics should enable the above config
which will just trigger this test during boot.  Any non conformity here
will be reported as an warning which would need to be fixed.  This test
will help catch any changes to the agreed upon semantics expected from
generic MM and enable platforms to accommodate it thereafter.

[anshuman.khandual@arm.com: v17]
  Link: http://lkml.kernel.org/r/1587436495-22033-3-git-send-email-anshuman.khandual@arm.com
[anshuman.khandual@arm.com: v18]
  Link: http://lkml.kernel.org/r/1588564865-31160-3-git-send-email-anshuman.khandual@arm.com
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[s390]
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>	[ppc32]
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Link: http://lkml.kernel.org/r/1583919272-24178-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:21 -07:00
Anshuman Khandual 8898ad58a0 x86/mm: define mm_p4d_folded()
Patch series "mm/debug: Add tests validating architecture page table
helpers", v18.

This adds a test validation for architecture exported page table helpers.
Patch adds basic transformation tests at various levels of the page table.

This test was originally suggested by Catalin during arm64 THP migration
RFC discussion earlier.  Going forward it can include more specific tests
with respect to various generic MM functions like THP, HugeTLB etc and
platform specific tests.

https://lore.kernel.org/linux-mm/20190628102003.GA56463@arrakis.emea.arm.com/

This patch (of 2):

This just defines mm_p4d_folded() to check whether P4D page table level is
folded at runtime.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1587436495-22033-2-git-send-email-anshuman.khandual@arm.com
Link: http://lkml.kernel.org/r/1588564865-31160-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:06:21 -07:00
Fan Yang 5bfea2d9b1 mm: Fix mremap not considering huge pmd devmap
The original code in mm/mremap.c checks huge pmd by:

		if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd)) {

However, a DAX mapped nvdimm is mapped as huge page (by default) but it
is not transparent huge page (_PAGE_PSE | PAGE_DEVMAP).  This commit
changes the condition to include the case.

This addresses CVE-2020-10757.

Fixes: 5c7fb56e5e ("mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd")
Cc: <stable@vger.kernel.org>
Reported-by: Fan Yang <Fan_Yang@sjtu.edu.cn>
Signed-off-by: Fan Yang <Fan_Yang@sjtu.edu.cn>
Tested-by: Fan Yang <Fan_Yang@sjtu.edu.cn>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 19:05:24 -07:00
Linus Torvalds 15a2bc4dbb Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull execve updates from Eric Biederman:
 "Last cycle for the Nth time I ran into bugs and quality of
  implementation issues related to exec that could not be easily be
  fixed because of the way exec is implemented. So I have been digging
  into exec and cleanup up what I can.

  I don't think I have exec sorted out enough to fix the issues I
  started with but I have made some headway this cycle with 4 sets of
  changes.

   - promised cleanups after introducing exec_update_mutex

   - trivial cleanups for exec

   - control flow simplifications

   - remove the recomputation of bprm->cred

  The net result is code that is a bit easier to understand and work
  with and a decrease in the number of lines of code (if you don't count
  the added tests)"

* 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (24 commits)
  exec: Compute file based creds only once
  exec: Add a per bprm->file version of per_clear
  binfmt_elf_fdpic: fix execfd build regression
  selftests/exec: Add binfmt_script regression test
  exec: Remove recursion from search_binary_handler
  exec: Generic execfd support
  exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
  exec: Move the call of prepare_binprm into search_binary_handler
  exec: Allow load_misc_binary to call prepare_binprm unconditionally
  exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
  exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
  exec: Teach prepare_exec_creds how exec treats uids & gids
  exec: Set the point of no return sooner
  exec: Move handling of the point of no return to the top level
  exec: Run sync_mm_rss before taking exec_update_mutex
  exec: Fix spelling of search_binary_handler in a comment
  exec: Move the comment from above de_thread to above unshare_sighand
  exec: Rename flush_old_exec begin_new_exec
  exec: Move most of setup_new_exec into flush_old_exec
  exec: In setup_new_exec cache current in the local variable me
  ...
2020-06-04 14:07:08 -07:00
Anthony Yznaga 3741679ba4 KVM: x86: minor code refactor and comments fixup around dirty logging
Consolidate the code and correct the comments to show that the actions
taken to update existing mappings to disable or enable dirty logging
are not necessary when creating, moving, or deleting a memslot.

Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Message-Id: <1591128450-11977-4-git-send-email-anthony.yznaga@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04 14:42:40 -04:00
Anthony Yznaga 4b44295538 KVM: x86: avoid unnecessary rmap walks when creating/moving slots
On large memory guests it has been observed that creating a memslot
for a very large range can take noticeable amount of time.
Investigation showed that the time is spent walking the rmaps to update
existing sptes to remove write access or set/clear dirty bits to support
dirty logging.  These rmap walks are unnecessary when creating or moving
a memslot.  A newly created memslot will not have any existing mappings,
and the existing mappings of a moved memslot will have been invalidated
and flushed.  Any mappings established once the new/moved memslot becomes
visible will be set using the properties of the new slot.

Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Message-Id: <1591128450-11977-3-git-send-email-anthony.yznaga@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04 14:42:35 -04:00
Anthony Yznaga 5688fed649 KVM: x86: remove unnecessary rmap walk of read-only memslots
There's no write access to remove.  An existing memslot cannot be updated
to set or clear KVM_MEM_READONLY, and any mappings established in a newly
created or moved read-only memslot will already be read-only.

Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Message-Id: <1591128450-11977-2-git-send-email-anthony.yznaga@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04 14:42:25 -04:00
Denis Efremov 7ec28e264f KVM: Use vmemdup_user()
Replace opencoded alloc and copy with vmemdup_user().

Signed-off-by: Denis Efremov <efremov@linux.com>
Message-Id: <20200603101131.2107303-1-efremov@linux.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04 14:41:05 -04:00
Linus Torvalds 9fb4c5250f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Jiri Kosina:

 - simplifications and improvements for issues Peter Ziljstra found
   during his previous work on W^X cleanups.

   This allows us to remove livepatch arch-specific .klp.arch sections
   and add proper support for jump labels in patched code.

   Also, this patchset removes the last module_disable_ro() usage in the
   tree.

   Patches from Josh Poimboeuf and Peter Zijlstra

 - a few other minor cleanups

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  MAINTAINERS: add lib/livepatch to LIVE PATCHING
  livepatch: add arch-specific headers to MAINTAINERS
  livepatch: Make klp_apply_object_relocs static
  MAINTAINERS: adjust to livepatch .klp.arch removal
  module: Make module_enable_ro() static again
  x86/module: Use text_mutex in apply_relocate_add()
  module: Remove module_disable_ro()
  livepatch: Remove module_disable_ro() usage
  x86/module: Use text_poke() for late relocations
  s390/module: Use s390_kernel_write() for late relocations
  s390: Change s390_kernel_write() return type to match memcpy()
  livepatch: Prevent module-specific KLP rela sections from referencing vmlinux symbols
  livepatch: Remove .klp.arch
  livepatch: Apply vmlinux-specific KLP relocations early
  livepatch: Disallow vmlinux.ko
2020-06-04 11:13:03 -07:00
Sean Christopherson 0e96edd9a9 x86/kvm: Remove defunct KVM_DEBUG_FS Kconfig
Remove KVM_DEBUG_FS, which can easily be misconstrued as controlling
KVM-as-a-host.  The sole user of CONFIG_KVM_DEBUG_FS was removed by
commit cfd8983f03 ("x86, locking/spinlocks: Remove ticket (spin)lock
implementation").

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200528031121.28904-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04 14:12:36 -04:00
Bjorn Helgaas 39a1af7619 Merge branch 'pci/virtualization'
- Remove unused xen_register_pirq() parameter (Wei Liu)

  - Quirk AMD Matisse HD Audio & USB 3.0 devices where FLR hangs the device
    (Marcos Scriven)

  - Quirk AMD Starship USB 3.0 device where FLR doesn't seem to work (Kevin
    Buettner)

  - Add ACS quirk for Intel RCiEPs (Ashok Raj)

* pci/virtualization:
  PCI: Add ACS quirk for Intel Root Complex Integrated Endpoints
  PCI: Avoid FLR for AMD Starship USB 3.0
  PCI: Avoid FLR for AMD Matisse HD Audio & USB 3.0
  x86/PCI: Drop unused xen_register_pirq() gsi_override parameter
2020-06-04 12:59:13 -05:00
Babu Moger fa44b82eb8 KVM: x86: Move MPK feature detection to common code
Both Intel and AMD support (MPK) Memory Protection Key feature.
Move the feature detection from VMX to the common code. It should
work for both the platforms now.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <158932795627.44260.15144185478040178638.stgit@naples-babu.amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04 12:35:06 -04:00
Xiaoyao Li 65b1891499 KVM: x86: Assign correct value to array.maxnent
Delay the assignment of array.maxnent to use correct value for the case
cpuid->nent > KVM_MAX_CPUID_ENTRIES.

Fixes: e53c95e8d4 ("KVM: x86: Encapsulate CPUID entries and metadata in struct")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200604041636.1187-1-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04 12:21:10 -04:00
Sean Christopherson f4f6bd93fd KVM: VMX: Always treat MSR_IA32_PERF_CAPABILITIES as a valid PMU MSR
Unconditionally return true when querying the validity of
MSR_IA32_PERF_CAPABILITIES so as to defer the validity check to
intel_pmu_{get,set}_msr(), which can properly give the MSR a pass when
the access is initiated from host userspace.  The MSR is emulated so
there is no underlying hardware dependency to worry about.

Fixes: 27461da310 ("KVM: x86/pmu: Support full width counting")
Cc: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200603203303.28545-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04 12:20:44 -04:00
Paolo Bonzini d56f5136b0 KVM: let kvm_destroy_vm_debugfs clean up vCPU debugfs directories
After commit 63d0434 ("KVM: x86: move kvm_create_vcpu_debugfs after
last failure point") we are creating the pre-vCPU debugfs files
after the creation of the vCPU file descriptor.  This makes it
possible for userspace to reach kvm_vcpu_release before
kvm_create_vcpu_debugfs has finished.  The vcpu->debugfs_dentry
then does not have any associated inode anymore, and this causes
a NULL-pointer dereference in debugfs_create_file.

The solution is simply to avoid removing the files; they are
cleaned up when the VM file descriptor is closed (and that must be
after KVM_CREATE_VCPU returns).  We can stop storing the dentry
in struct kvm_vcpu too, because it is not needed anywhere after
kvm_create_vcpu_debugfs returns.

Reported-by: syzbot+705f4401d5a93a59b87d@syzkaller.appspotmail.com
Fixes: 63d0434837 ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-04 11:00:54 -04:00
Linus Torvalds ee01c4d72a Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "More mm/ work, plenty more to come

  Subsystems affected by this patch series: slub, memcg, gup, kasan,
  pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
  thp, mmap, kconfig"

* akpm: (131 commits)
  arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  riscv: support DEBUG_WX
  mm: add DEBUG_WX support
  drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
  mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
  powerpc/mm: drop platform defined pmd_mknotpresent()
  mm: thp: don't need to drain lru cache when splitting and mlocking THP
  hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
  sparc32: register memory occupied by kernel as memblock.memory
  include/linux/memblock.h: fix minor typo and unclear comment
  mm, mempolicy: fix up gup usage in lookup_node
  tools/vm/page_owner_sort.c: filter out unneeded line
  mm: swap: memcg: fix memcg stats for huge pages
  mm: swap: fix vmstats for huge pages
  mm: vmscan: limit the range of LRU type balancing
  mm: vmscan: reclaim writepage is IO cost
  mm: vmscan: determine anon/file pressure balance at the reclaim root
  mm: balance LRU lists based on relative thrashing
  mm: only count actual rotations as LRU reclaim cost
  ...
2020-06-03 20:24:15 -07:00
Zong Li 7e01ccb43d x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
Extract DEBUG_WX to mm/Kconfig.debug for shared use.  Change to use
ARCH_HAS_DEBUG_WX instead of DEBUG_WX defined by arch port.

Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/430736828d149df3f5b462d291e845ec690e0141.1587455584.git.zong.li@sifive.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:50 -07:00
Anshuman Khandual 86ec2da037 mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
pmd_present() is expected to test positive after pmdp_mknotpresent() as
the PMD entry still points to a valid huge page in memory.
pmdp_mknotpresent() implies that given PMD entry is just invalidated from
MMU perspective while still holding on to pmd_page() referred valid huge
page thus also clearing pmd_present() test.  This creates the following
situation which is counter intuitive.

[pmd_present(pmd_mknotpresent(pmd)) = true]

This renames pmd_mknotpresent() as pmd_mkinvalid() reflecting the helper's
functionality more accurately while changing the above mentioned situation
as follows.  This does not create any functional change.

[pmd_present(pmd_mkinvalid(pmd)) = true]

This is not applicable for platforms that define own pmdp_invalidate() via
__HAVE_ARCH_PMDP_INVALIDATE.  Suggestion for renaming came during a
previous discussion here.

https://patchwork.kernel.org/patch/11019637/

[anshuman.khandual@arm.com: change pmd_mknotvalid() to pmd_mkinvalid() per Will]
  Link: http://lkml.kernel.org/r/1587520326-10099-3-git-send-email-anshuman.khandual@arm.com
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Link: http://lkml.kernel.org/r/1584680057-13753-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:49 -07:00
Anshuman Khandual 5be9934328 mm/hugetlb: define a generic fallback for arch_clear_hugepage_flags()
There are multiple similar definitions for arch_clear_hugepage_flags() on
various platforms.  Lets just add it's generic fallback definition for
platforms that do not override.  This help reduce code duplication.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1588907271-11920-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:46 -07:00
Anshuman Khandual b0eae98c66 mm/hugetlb: define a generic fallback for is_hugepage_only_range()
There are multiple similar definitions for is_hugepage_only_range() on
various platforms.  Lets just add it's generic fallback definition for
platforms that do not override.  This help reduce code duplication.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1588907271-11920-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:46 -07:00
Mike Kravetz 3823783088 hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate
hugetlb_add_hstate() prints a warning if the hstate already exists.  This
was originally done as part of kernel command line parsing.  If
'hugepagesz=' was specified more than once, the warning

	pr_warn("hugepagesz= specified twice, ignoring\n");

would be printed.

Some architectures want to enable all huge page sizes.  They would call
hugetlb_add_hstate for all supported sizes.  However, this was done after
command line processing and as a result hstates could have already been
created for some sizes.  To make sure no warning were printed, there would
often be code like:

	if (!size_to_hstate(size)
		hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT)

The only time we want to print the warning is as the result of command
line processing.  So, remove the warning from hugetlb_add_hstate and add
it to the single arch independent routine processing "hugepagesz=".  After
this, calls to size_to_hstate() in arch specific code can be removed and
hugetlb_add_hstate can be called without worrying about warning messages.

[mike.kravetz@oracle.com: fix hugetlb initialization]
  Link: http://lkml.kernel.org/r/4c36c6ce-3774-78fa-abc4-b7346bf24348@oracle.com
  Link: http://lkml.kernel.org/r/20200428205614.246260-5-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Mina Almasry <almasrymina@google.com>
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[s390]
Acked-by: Will Deacon <will@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Longpeng <longpeng2@huawei.com>
Cc: Nitesh Narayan Lal <nitesh@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20200417185049.275845-4-mike.kravetz@oracle.com
Link: http://lkml.kernel.org/r/20200428205614.246260-4-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:46 -07:00
Mike Kravetz 359f25443a hugetlbfs: move hugepagesz= parsing to arch independent code
Now that architectures provide arch_hugetlb_valid_size(), parsing of
"hugepagesz=" can be done in architecture independent code.  Create a
single routine to handle hugepagesz= parsing and remove all arch specific
routines.  We can also remove the interface hugetlb_bad_size() as this is
no longer used outside arch independent code.

This also provides consistent behavior of hugetlbfs command line options.
The hugepagesz= option should only be specified once for a specific size,
but some architectures allow multiple instances.  This appears to be more
of an oversight when code was added by some architectures to set up ALL
huge pages sizes.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sandipan Das <sandipan@linux.ibm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Mina Almasry <almasrymina@google.com>
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[s390]
Acked-by: Will Deacon <will@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Longpeng <longpeng2@huawei.com>
Cc: Nitesh Narayan Lal <nitesh@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20200417185049.275845-3-mike.kravetz@oracle.com
Link: http://lkml.kernel.org/r/20200428205614.246260-3-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:46 -07:00
Mike Kravetz ae94da8981 hugetlbfs: add arch_hugetlb_valid_size
Patch series "Clean up hugetlb boot command line processing", v4.

Longpeng(Mike) reported a weird message from hugetlb command line
processing and proposed a solution [1].  While the proposed patch does
address the specific issue, there are other related issues in command line
processing.  As hugetlbfs evolved, updates to command line processing have
been made to meet immediate needs and not necessarily in a coordinated
manner.  The result is that some processing is done in arch specific code,
some is done in arch independent code and coordination is problematic.
Semantics can vary between architectures.

The patch series does the following:
- Define arch specific arch_hugetlb_valid_size routine used to validate
  passed huge page sizes.
- Move hugepagesz= command line parsing out of arch specific code and into
  an arch independent routine.
- Clean up command line processing to follow desired semantics and
  document those semantics.

[1] https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpeng2@huawei.com

This patch (of 3):

The architecture independent routine hugetlb_default_setup sets up the
default huge pages size.  It has no way to verify if the passed value is
valid, so it accepts it and attempts to validate at a later time.  This
requires undocumented cooperation between the arch specific and arch
independent code.

For architectures that support more than one huge page size, provide a
routine arch_hugetlb_valid_size to validate a huge page size.
hugetlb_default_setup can use this to validate passed values.

arch_hugetlb_valid_size will also be used in a subsequent patch to move
processing of the "hugepagesz=" in arch specific code to a common routine
in arch independent code.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[s390]
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Longpeng <longpeng2@huawei.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Nitesh Narayan Lal <nitesh@redhat.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20200428205614.246260-1-mike.kravetz@oracle.com
Link: http://lkml.kernel.org/r/20200428205614.246260-2-mike.kravetz@oracle.com
Link: http://lkml.kernel.org/r/20200417185049.275845-1-mike.kravetz@oracle.com
Link: http://lkml.kernel.org/r/20200417185049.275845-2-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:46 -07:00
Daniel Jordan ecd0965069 mm: make deferred init's max threads arch-specific
Using padata during deferred init has only been tested on x86, so for now
limit it to this architecture.

If another arch wants this, it can find the max thread limit that's best
for it and override deferred_page_init_max_threads().

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Josh Triplett <josh@joshtriplett.org>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Link: http://lkml.kernel.org/r/20200527173608.2885243-8-daniel.m.jordan@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:45 -07:00
Mike Rapoport bc9331a19d mm: rename free_area_init_node() to free_area_init_memoryless_node()
free_area_init_node() is only used by x86 to initialize a memory-less
nodes.  Make its name reflect this and drop all the function parameters
except node ID as they are anyway zero.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
Cc: Baoquan He <bhe@redhat.com>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200412194859.12663-19-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:44 -07:00
Mike Rapoport acd3f5c441 mm: remove early_pfn_in_nid() and CONFIG_NODES_SPAN_OTHER_NODES
The memmap_init() function was made to iterate over memblock regions and
as the result the early_pfn_in_nid() function became obsolete.  Since
CONFIG_NODES_SPAN_OTHER_NODES is only used to pick a stub or a real
implementation of early_pfn_in_nid(), it is also not needed anymore.

Remove both early_pfn_in_nid() and the CONFIG_NODES_SPAN_OTHER_NODES.

Co-developed-by: Hoan Tran <Hoan@os.amperecomputing.com>
Signed-off-by: Hoan Tran <Hoan@os.amperecomputing.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
Cc: Baoquan He <bhe@redhat.com>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200412194859.12663-17-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:44 -07:00
Mike Rapoport 9691a071aa mm: use free_area_init() instead of free_area_init_nodes()
free_area_init() has effectively became a wrapper for
free_area_init_nodes() and there is no point of keeping it.  Still
free_area_init() name is shorter and more general as it does not imply
necessity to initialize multiple nodes.

Rename free_area_init_nodes() to free_area_init(), update the callers and
drop old version of free_area_init().

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200412194859.12663-6-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:43 -07:00
Mike Rapoport 3f08a302f5 mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP option
CONFIG_HAVE_MEMBLOCK_NODE_MAP is used to differentiate initialization of
nodes and zones structures between the systems that have region to node
mapping in memblock and those that don't.

Currently all the NUMA architectures enable this option and for the
non-NUMA systems we can presume that all the memory belongs to node 0 and
therefore the compile time configuration option is not required.

The remaining few architectures that use DISCONTIGMEM without NUMA are
easily updated to use memblock_add_node() instead of memblock_add() and
thus have proper correspondence of memblock regions to NUMA nodes.

Still, free_area_init_node() must have a backward compatible version
because its semantics with and without CONFIG_HAVE_MEMBLOCK_NODE_MAP is
different.  Once all the architectures will use the new semantics, the
entire compatibility layer can be dropped.

To avoid addition of extra run time memory to store node id for
architectures that keep memblock but have only a single node, the node id
field of the memblock_region is guarded by CONFIG_NEED_MULTIPLE_NODES and
the corresponding accessors presume that in those cases it is always 0.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Cc: Baoquan He <bhe@redhat.com>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200412194859.12663-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:43 -07:00