Commit Graph

6847 Commits

Author SHA1 Message Date
Christophe Leroy 13dac4e31e powerpc/8xx: Activate KUEP at all time
On the 8xx, there is absolutely no runtime impact with KUEP. Protection
against execution of user code in kernel mode is set up at boot time
by configuring the groups with contain all user pages as having swapped
protection rights, in extenso EX for user and NA for supervisor.

Configure KUEP at startup and force selection of CONFIG_PPC_KUEP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2129e86944323ffe9ed07fffbeafdfd2e363690a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy 6c1fa60d36 Revert "powerpc: Inline setup_kup()"
This reverts commit 1791ebd131.

setup_kup() was inlined to manage conflict between PPC32 marking
setup_{kuap/kuep}() __init and PPC64 not marking them __init.

But in fact PPC32 has removed the __init mark for all but 8xx
in order to properly handle SMP.

In order to make setup_kup() grow a bit, revert the commit
mentioned above but remove __init for 8xx as well so that
we don't have to mark setup_kup() as __ref.

Also switch the order so that KUAP is initialised before KUEP
because on the 40x, KUEP will depend on the activation of KUAP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7691088fd0994ee3c8db6298dc8c00259e3f6a7f.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:16 +11:00
Nicholas Piggin 387e220a2e powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU
Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves
128kB kernel image size (90kB text) on powernv_defconfig minus KVM,
350kB on pseries_defconfig minus KVM, 40kB on a tiny config.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG.
      Fix radix_enabled() use in setup_initial_memory_limit(). Add some
      stubs to reduce number of ifdefs.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
2021-12-09 22:41:13 +11:00
Nicholas Piggin c28573744b powerpc/64s: Make hash MMU support configurable
This adds Kconfig selection which allows 64s hash MMU support to be
disabled. It can be disabled if radix support is enabled, the minimum
supported CPU type is POWER9 (or higher), and KVM is not selected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-17-npiggin@gmail.com
2021-12-09 22:40:24 +11:00
Nicholas Piggin debeda0171 powerpc/64s: Always define arch unmapped area calls
To avoid any functional changes to radix paths when building with hash
MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the
arch get_unmapped_area calls on 64s platforms.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-16-npiggin@gmail.com
2021-12-09 22:40:24 +11:00
Sean Christopherson 005467e06b KVM: Drop obsolete kvm_arch_vcpu_block_finish()
Drop kvm_arch_vcpu_block_finish() now that all arch implementations are
nops.

No functional change intended.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:50 -05:00
Sean Christopherson 510958e997 KVM: Force PPC to define its own rcuwait object
Do not define/reference kvm_vcpu.wait if __KVM_HAVE_ARCH_WQP is true, and
instead force the architecture (PPC) to define its own rcuwait object.
Allowing common KVM to directly access vcpu->wait without a guard makes
it all too easy to introduce potential bugs, e.g. kvm_vcpu_block(),
kvm_vcpu_on_spin(), and async_pf_execute() all operate on vcpu->wait, not
the result of kvm_arch_vcpu_get_wait(), and so may do the wrong thing for
PPC.

Due to PPC's shenanigans with respect to callbacks and waits (it switches
to the virtual core's wait object at KVM_RUN!?!?), it's not clear whether
or not this fixes any bugs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009021236.4122790-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08 04:24:46 -05:00
Sean Christopherson eaaaed137e KVM: PPC: Avoid referencing userspace memory region in memslot updates
For PPC HV, get the number of pages directly from the new memslot instead
of computing the same from the userspace memory region, and explicitly
check for !DELETE instead of inferring the same when toggling mmio_update.
The motivation for these changes is to avoid referencing the @mem param
so that it can be dropped in a future commit.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <1e97fb5198be25f98ef82e63a8d770c682264cc9.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:22 -05:00
Sean Christopherson 537a17b314 KVM: Let/force architectures to deal with arch specific memslot data
Pass the "old" slot to kvm_arch_prepare_memory_region() and force arch
code to handle propagating arch specific data from "new" to "old" when
necessary.  This is a baby step towards dynamically allocating "new" from
the get go, and is a (very) minor performance boost on x86 due to not
unnecessarily copying arch data.

For PPC HV, copy the rmap in the !CREATE and !DELETE paths, i.e. for MOVE
and FLAGS_ONLY.  This is functionally a nop as the previous behavior
would overwrite the pointer for CREATE, and eventually discard/ignore it
for DELETE.

For x86, copy the arch data only for FLAGS_ONLY changes.  Unlike PPC HV,
x86 needs to reallocate arch data in the MOVE case as the size of x86's
allocations depend on the alignment of the memslot's gfn.

Opportunistically tweak kvm_arch_prepare_memory_region()'s param order to
match the "commit" prototype.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
[mss: add missing RISCV kvm_arch_prepare_memory_region() change]
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <67dea5f11bbcfd71e3da5986f11e87f5dd4013f9.1638817639.git.maciej.szmigiero@oracle.com>
2021-12-08 04:24:20 -05:00
Sebastian Andrzej Siewior 77993b595a locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.h
The printk header file includes ratelimit_types.h for its __ratelimit()
based usage. It is required for the static initializer used in
printk_ratelimited(). It uses a raw_spinlock_t and includes the
spinlock_types.h.

PREEMPT_RT substitutes spinlock_t with a rtmutex based implementation and so
its spinlock_t implmentation (provided by spinlock_rt.h) includes rtmutex.h and
atomic.h which leads to recursive includes where defines are missing.

By including only the raw_spinlock_t defines it avoids the atomic.h
related includes at this stage.

An example on powerpc:

|  CALL    scripts/atomic/check-atomics.sh
|In file included from include/linux/bug.h:5,
|                 from include/linux/page-flags.h:10,
|                 from kernel/bounds.c:10:
|arch/powerpc/include/asm/page_32.h: In function ‘clear_page’:
|arch/powerpc/include/asm/bug.h:87:4: error: implicit declaration of function â=80=98__WARNâ=80=99 [-Werror=3Dimplicit-function-declaration]
|   87 |    __WARN();    \
|      |    ^~~~~~
|arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ONâ€=99
|   48 |  WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1));
|      |  ^~~~~~~
|arch/powerpc/include/asm/bug.h:58:17: error: invalid application of ‘sizeofâ€=99 to incomplete type ‘struct bug_entryâ€=99
|   58 |     "i" (sizeof(struct bug_entry)), \
|      |                 ^~~~~~
|arch/powerpc/include/asm/bug.h:89:3: note: in expansion of macro ‘BUG_ENTRYâ€=99
|   89 |   BUG_ENTRY(PPC_TLNEI " %4, 0",   \
|      |   ^~~~~~~~~
|arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ONâ€=99
|   48 |  WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1));
|      |  ^~~~~~~
|In file included from arch/powerpc/include/asm/ptrace.h:298,
|                 from arch/powerpc/include/asm/hw_irq.h:12,
|                 from arch/powerpc/include/asm/irqflags.h:12,
|                 from include/linux/irqflags.h:16,
|                 from include/asm-generic/cmpxchg-local.h:6,
|                 from arch/powerpc/include/asm/cmpxchg.h:526,
|                 from arch/powerpc/include/asm/atomic.h:11,
|                 from include/linux/atomic.h:7,
|                 from include/linux/rwbase_rt.h:6,
|                 from include/linux/rwlock_types.h:55,
|                 from include/linux/spinlock_types.h:74,
|                 from include/linux/ratelimit_types.h:7,
|                 from include/linux/printk.h:10,
|                 from include/asm-generic/bug.h:22,
|                 from arch/powerpc/include/asm/bug.h:109,
|                 from include/linux/bug.h:5,
|                 from include/linux/page-flags.h:10,
|                 from kernel/bounds.c:10:
|include/linux/thread_info.h: In function â=80=98copy_overflowâ=80=99:
|include/linux/thread_info.h:210:2: error: implicit declaration of function â=80=98WARNâ=80=99 [-Werror=3Dimplicit-function-declaration]
|  210 |  WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
|      |  ^~~~

The WARN / BUG include pulls in printk.h and then ptrace.h expects WARN
(from bug.h) which is not yet complete. Even hw_irq.h has WARN_ON()
statements.

On POWERPC64 there are missing atomic64 defines while building 32bit
VDSO:
|  VDSO32C arch/powerpc/kernel/vdso32/vgettimeofday.o
|In file included from include/linux/atomic.h:80,
|                 from include/linux/rwbase_rt.h:6,
|                 from include/linux/rwlock_types.h:55,
|                 from include/linux/spinlock_types.h:74,
|                 from include/linux/ratelimit_types.h:7,
|                 from include/linux/printk.h:10,
|                 from include/linux/kernel.h:19,
|                 from arch/powerpc/include/asm/page.h:11,
|                 from arch/powerpc/include/asm/vdso/gettimeofday.h:5,
|                 from include/vdso/datapage.h:137,
|                 from lib/vdso/gettimeofday.c:5,
|                 from <command-line>:
|include/linux/atomic-arch-fallback.h: In function ‘arch_atomic64_incâ€=99:
|include/linux/atomic-arch-fallback.h:1447:2: error: implicit declaration of function ‘arch_atomic64_add’; did you mean ‘arch_atomic_add’? [-Werror=3Dimpl
|icit-function-declaration]
| 1447 |  arch_atomic64_add(1, v);
|      |  ^~~~~~~~~~~~~~~~~
|      |  arch_atomic_add

The generic fallback is not included, atomics itself are not used. If
kernel.h does not include printk.h then it comes later from the bug.h
include.

Allow asm/spinlock_types.h to be included from
linux/spinlock_types_raw.h.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211129174654.668506-12-bigeasy@linutronix.de
2021-12-07 15:14:12 +01:00
Nicholas Piggin 935b534c24 powerpc/64s: Move and rename do_bad_slb_fault as it is not hash specific
slb.c is hash-specific SLB management, but do_bad_slb_fault deals with
segment interrupts that occur with radix MMU as well.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-5-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin a4135cbebd powerpc/pseries: Stop selecting PPC_HASH_MMU_NATIVE
The pseries platform does not use the native hash code but the PAPR
virtualised hash interfaces, so remove PPC_HASH_MMU_NATIVE.

This requires moving tlbiel code from hash_native.c to hash_utils.c.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-4-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin 79b74a6848 powerpc: Remove unused FW_FEATURE_NATIVE references
FW_FEATURE_NATIVE_ALWAYS and FW_FEATURE_NATIVE_POSSIBLE are always
zero and never do anything. Remove them.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-2-npiggin@gmail.com
2021-12-02 22:57:22 +11:00
Kees Cook 62ea67e319 powerpc/signal32: Use struct_group() to zero spe regs
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Add a struct_group() for the spe registers so that memset() can correctly reason
about the size:

   In function 'fortify_memset_chk',
       inlined from 'restore_user_regs.part.0' at arch/powerpc/kernel/signal_32.c:539:3:
   >> include/linux/fortify-string.h:195:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
     195 |    __write_overflow_field();
         |    ^~~~~~~~~~~~~~~~~~~~~~~~

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211118203604.1288379-1-keescook@chromium.org
2021-12-02 10:39:00 +11:00
Nicholas Piggin 5402e239d0 powerpc/64s: Get LPID bit width from device tree
Allow the LPID bit width and partition table size to be set at runtime
from the device tree.

Move the PID bit width detection into the same place.

KVM does not support using the extra bits yet, this is mainly required
to get the PTCR register values correct (so KVM will run but it will
not allocate > 4096 LPIDs).

OPAL firmware provides this property for POWER10 CPUs since skiboot
commit 9b85f7d961f2 ("hdata: add mmu-pid-bits and mmu-lpid-bits for
POWER10 CPUs").

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211129030915.1888332-1-npiggin@gmail.com
2021-11-30 22:27:07 +11:00
Athira Rajeev 2c9ac51b85 powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC
Running perf fuzzer showed below in dmesg logs:
  "Can't find PMC that caused IRQ"

This means a PMU exception happened, but none of the PMC's (Performance
Monitor Counter) were found to be overflown. There are some corner cases
that clears the PMCs after PMI gets masked. In such cases, the perf
interrupt handler will not find the active PMC values that had caused
the overflow and thus leads to this message while replaying.

Case 1: PMU Interrupt happens during replay of other interrupts and
counter values gets cleared by PMU callbacks before replay:

During replay of interrupts like timer, __do_irq() and doorbell
exception, we conditionally enable interrupts via may_hard_irq_enable().
This could potentially create a window to generate a PMI. Since irq soft
mask is set to ALL_DISABLED, the PMI will get masked here. We could get
IPIs run before perf interrupt is replayed and the PMU events could
be deleted or stopped. This will change the PMU SPR values and resets
the counters. Snippet of ftrace log showing PMU callbacks invoked in
__do_irq():

  <idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq
  <idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq
  <idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
  <idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
  <<>>
  <idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
  <idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed
  <idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
  <idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
  <idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
  <idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
  <idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
  <idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage
  <idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage
  <idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
  <<>>
  <idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq
  <idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts

Case 2: PMI's masked during local_* operations, example local_add(). If
the local_add() operation happens within a local_irq_save(), replay of
PMI will be during local_irq_restore(). Similar to case 1, this could
also create a window before replay where PMU events gets deleted or
stopped.

Fix it by updating the PMU callback function power_pmu_disable() to
check for pending perf interrupt. If there is an overflown PMC and
pending perf interrupt indicated in paca, clear the PMI bit in paca to
drop that sample. Clearing of PMI bit is done in power_pmu_disable()
since disable is invoked before any event gets deleted/stopped. With
this fix, if there are more than one event running in the PMU, there is
a chance that we clear the PMI bit for the event which is not getting
deleted/stopped. The other events may still remain active. Hence to make
sure we don't drop valid sample in such cases, another check is added in
power_pmu_enable. This checks if there is an overflown PMC found among
the active events and if so enable back the PMI bit. Two new helper
functions are introduced to clear/set the PMI, ie
clear_pmi_irq_pending() and set_pmi_irq_pending(). Helper function
pmi_irq_pending() is introduced to give a warning if there is pending
PMI bit in paca, but no PMC is overflown.

Also there are corner cases which result in performance monitor
interrupts being triggered during power_pmu_disable(). This happens
since PMXE bit is not cleared along with disabling of other MMCR0 bits
in the pmu_disable. Such PMI's could leave the PMU running and could
trigger PMI again which will set MMCR0 PMAO bit. This could lead to
spurious interrupts in some corner cases. Example, a timer after
power_pmu_del() which will re-enable interrupts and triggers a PMI again
since PMAO bit is still set. But fails to find valid overflow since PMC
was cleared in power_pmu_del(). Fix that by disabling PMXE along with
disabling of other MMCR0 bits in power_pmu_disable().

We can't just replay PMI any time. Hence this approach is preferred
rather than replaying PMI before resetting overflown PMC. Patch also
documents core-book3s on a race condition which can trigger these PMC
messages during idle path in PowerNV.

Fixes: f442d00480 ("powerpc/64s: Add support to mask perf interrupts and replay them")
Reported-by: Nageswara R Sastry <nasastry@in.ibm.com>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Make pmi_irq_pending() return bool, reflow/reword some comments]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1626846509-1350-2-git-send-email-atrajeev@linux.vnet.ibm.com
2021-11-30 17:15:49 +11:00
Christophe Leroy f05cab0034 powerpc/atomics: Remove atomic_inc()/atomic_dec() and friends
Now that atomic_add() and atomic_sub() handle immediate operands,
atomic_inc() and atomic_dec() have no added value compared to the
generic fallback which calls atomic_add(1) and atomic_sub(1).

Also remove atomic_inc_not_zero() which fallsback to
atomic_add_unless() which itself fallsback to
atomic_fetch_add_unless() which now handles immediate operands.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0bc64a2f18726055093dbb2e479cefc60a409cfd.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30 11:45:57 +11:00
Christophe Leroy 41d65207de powerpc/atomics: Use immediate operand when possible
Today we get the following code generation for atomic operations:

	c001bb2c:	39 20 00 01 	li      r9,1
	c001bb30:	7d 40 18 28 	lwarx   r10,0,r3
	c001bb34:	7d 09 50 50 	subf    r8,r9,r10
	c001bb38:	7d 00 19 2d 	stwcx.  r8,0,r3

	c001c7a8:	39 40 00 01 	li      r10,1
	c001c7ac:	7d 00 18 28 	lwarx   r8,0,r3
	c001c7b0:	7c ea 42 14 	add     r7,r10,r8
	c001c7b4:	7c e0 19 2d 	stwcx.  r7,0,r3

By allowing GCC to choose between immediate or regular operation,
we get:

	c001bb2c:	7d 20 18 28 	lwarx   r9,0,r3
	c001bb30:	39 49 ff ff 	addi    r10,r9,-1
	c001bb34:	7d 40 19 2d 	stwcx.  r10,0,r3
	--
	c001c7a4:	7d 40 18 28 	lwarx   r10,0,r3
	c001c7a8:	39 0a 00 01 	addi    r8,r10,1
	c001c7ac:	7d 00 19 2d 	stwcx.  r8,0,r3

For "and", the dot form has to be used because "andi" doesn't exist.

For logical operations we use unsigned 16 bits immediate.
For arithmetic operations we use signed 16 bits immediate.

On pmac32_defconfig, it reduces the text by approx another 8 kbytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2ec558d44db8045752fe9dbd29c9ba84bab6030b.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30 11:45:57 +11:00
Christophe Leroy fb350784d8 powerpc/bitops: Use immediate operand when possible
Today we get the following code generation for bitops like
set or clear bit:

	c0009fe0:	39 40 08 00 	li      r10,2048
	c0009fe4:	7c e0 40 28 	lwarx   r7,0,r8
	c0009fe8:	7c e7 53 78 	or      r7,r7,r10
	c0009fec:	7c e0 41 2d 	stwcx.  r7,0,r8

	c000d568:	39 00 18 00 	li      r8,6144
	c000d56c:	7c c0 38 28 	lwarx   r6,0,r7
	c000d570:	7c c6 40 78 	andc    r6,r6,r8
	c000d574:	7c c0 39 2d 	stwcx.  r6,0,r7

Most set bits are constant on lower 16 bits, so it can easily
be replaced by the "immediate" version of the operation. Allow
GCC to choose between the normal or immediate form.

For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' for
when all bits to be cleared are consecutive.

On 64 bits we don't have any equivalent single operation for clearing,
single bits or a few bits, we'd need two 'rldicl' so it is not
worth it, the li/andc sequence is doing the same.

With this patch we get:

	c0009fe0:	7d 00 50 28 	lwarx   r8,0,r10
	c0009fe4:	61 08 08 00 	ori     r8,r8,2048
	c0009fe8:	7d 00 51 2d 	stwcx.  r8,0,r10

	c000d558:	7c e0 40 28 	lwarx   r7,0,r8
	c000d55c:	54 e7 05 64 	rlwinm  r7,r7,0,21,18
	c000d560:	7c e0 41 2d 	stwcx.  r7,0,r8

On pmac32_defconfig, it reduces the text by approx 10 kbytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e6f815d9181bab09df3b350af51149437863e9f9.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30 11:45:50 +11:00
Nicholas Piggin aebd1fb45c powerpc: flexible GPR range save/restore macros
Introduce macros that operate on a (start, end) range of GPRs, which
reduces lines of code and need to do mental arithmetic while reading the
code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211022061322.2671178-1-npiggin@gmail.com
2021-11-29 23:15:20 +11:00
Michael Ellerman ab85a27395 powerpc: Mark probe_machine() __init and static
Prior to commit b1923caa6e ("powerpc: Merge 32-bit and 64-bit
setup_arch()") probe_machine() was called from setup_32/64.c and lived
in setup-common.c. But now it's only called from setup-common.c so it
can be static and __init, and we don't need the declaration in
machdep.h either.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-6-mpe@ellerman.id.au
2021-11-29 22:49:26 +11:00
Nicholas Piggin b350111bf7 powerpc: remove cpu_online_cores_map function
This function builds the cores online map with on-stack cpumasks which
can cause high stack usage with large NR_CPUS.

It is not used in any performance sensitive paths, so instead just check
for first thread sibling.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105035042.1398309-1-npiggin@gmail.com
2021-11-29 22:48:32 +11:00
Michael Ellerman af3fdce4ab Revert "powerpc/code-patching: Improve verification of patchability"
This reverts commit 8b8a8f0ab3.

As reported[1] by Sachin this causes problems with ftrace, and it also
causes the code patching selftests to fail as reported[2] by Stephen.

So revert it for now.

1: https://lore.kernel.org/linuxppc-dev/3668743C-09DF-4673-B15C-2FFE2A57F7D7@linux.vnet.ibm.com/
2: https://lore.kernel.org/linuxppc-dev/20211126161747.1f7795b0@canb.auug.org.au/

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-11-29 17:41:52 +11:00
Nathan Lynch 38f7b7067d powerpc/rtas: rtas_busy_delay() improvements
Generally RTAS cannot block, and in PAPR it is required to return control
to the OS within a few tens of microseconds. In order to support operations
which may take longer to complete, many RTAS primitives can return
intermediate -2 ("busy") or 990x ("extended delay") values, which indicate
that the OS should reattempt the same call with the same arguments at some
point in the future.

Current versions of PAPR are less than clear about this, but the intended
meanings of these values in more detail are:

RTAS_BUSY (-2): RTAS has suspended a potentially long-running operation in
order to meet its latency obligation and give the OS the opportunity to
perform other work. RTAS can resume making progress as soon as the OS
reattempts the call.

RTAS_EXTENDED_DELAY_{MIN...MAX} (9900-9905): RTAS must wait for an external
event to occur or for internal contention to resolve before it can complete
the requested operation. The value encodes a non-binding hint as to roughly
how long the OS should wait before calling again, but the OS is allowed to
reattempt the call sooner or even immediately.

Linux of course must take its own CPU scheduling obligations into account
when handling these statuses; e.g. a task which receives an RTAS_BUSY
status should check whether to reschedule before it attempts the RTAS call
again to avoid starving other tasks.

rtas_busy_delay() is a helper function that "consumes" a busy or extended
delay status. Common usage:

    int rc;

    do {
        rc = rtas_call(rtas_token("some-function"), ...);
    } while (rtas_busy_delay(rc));

    /* convert rc to Linux error value, etc */

If rc is a busy or extended delay status, the caller can rely on
rtas_busy_delay() to perform an appropriate sleep or reschedule and return
nonzero. Other statuses are handled normally by the caller.

The current implementation of rtas_busy_delay() both oversleeps and
overuses the CPU:

*  It performs msleep() for all 990x and even when no delay is
   suggested (-2), but this is understood to actually sleep for two jiffies
   minimum in practice (20ms with HZ=100). 9900 (1ms) and 9901 (10ms)
   appear to be the most common extended delay statuses, and the
   oversleeping measurably lengthens DLPAR operations, which perform
   many RTAS calls.

*  It does not sleep on 990x unless need_resched() is true, causing code
   like the loop above to needlessly retry, wasting CPU time.

Alter the logic to align better with the intended meanings:

*  When passed RTAS_BUSY, perform cond_resched() and return without
   sleeping. The caller should reattempt immediately

*  Always sleep when passed an extended delay status, using usleep_range()
   for precise shorter sleeps. Limit the sleep time to one second even
   though there are higher architected values.

Change rtas_busy_delay()'s return type to bool to better reflect its usage,
and add kernel-doc.

rtas_busy_delay_time() is unchanged, even though it "incorrectly" returns 1
for RTAS_BUSY. There are users of that API with open-coded delay loops in
sensitive contexts that will have to be taken on an individual basis.

Brief results for addition and removal of 5GB memory on a small P9 PowerVM
partition follow. Load was generated with stress-ng --cpu N. For add,
elapsed time is greatly reduced without significant change in the number of
RTAS calls or time spent on CPU. For remove, elapsed time is modestly
reduced, with significant reductions in RTAS calls and time spent on CPU.

With no competing workload (- before, + after):

  Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs):

-             1,935      probe:rtas_call           #    0.003 M/sec                    ( +-  0.22% )
-            609.99 msec task-clock                #    0.183 CPUs utilized            ( +-  0.19% )
+             1,956      probe:rtas_call           #    0.003 M/sec                    ( +-  0.17% )
+            618.56 msec task-clock                #    0.278 CPUs utilized            ( +-  0.11% )

-            3.3322 +- 0.0670 seconds time elapsed  ( +-  2.01% )
+            2.2222 +- 0.0416 seconds time elapsed  ( +-  1.87% )

  Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs):

-             6,224      probe:rtas_call           #    0.008 M/sec                    ( +-  2.57% )
-            750.36 msec task-clock                #    0.190 CPUs utilized            ( +-  2.01% )
+               843      probe:rtas_call           #    0.003 M/sec                    ( +-  0.12% )
+            250.66 msec task-clock                #    0.068 CPUs utilized            ( +-  0.17% )

-            3.9394 +- 0.0890 seconds time elapsed  ( +-  2.26% )
+             3.678 +- 0.113 seconds time elapsed  ( +-  3.07% )

With all CPUs 100% busy (- before, + after):

  Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs):

-             2,979      probe:rtas_call           #    0.003 M/sec                    ( +-  0.12% )
-          1,096.62 msec task-clock                #    0.105 CPUs utilized            ( +-  0.10% )
+             2,981      probe:rtas_call           #    0.003 M/sec                    ( +-  0.22% )
+          1,095.26 msec task-clock                #    0.154 CPUs utilized            ( +-  0.21% )

-            10.476 +- 0.104 seconds time elapsed  ( +-  1.00% )
+            7.1124 +- 0.0865 seconds time elapsed  ( +-  1.22% )

  Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs):

-             2,702      probe:rtas_call           #    0.004 M/sec                    ( +-  4.00% )
-            722.71 msec task-clock                #    0.067 CPUs utilized            ( +-  2.41% )
+             1,246      probe:rtas_call           #    0.003 M/sec                    ( +-  0.25% )
+            487.73 msec task-clock                #    0.049 CPUs utilized            ( +-  0.20% )

-            10.829 +- 0.163 seconds time elapsed  ( +-  1.51% )
+            9.9887 +- 0.0866 seconds time elapsed  ( +-  0.87% )

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211117060259.957178-2-nathanl@linux.ibm.com
2021-11-25 11:25:33 +11:00
Christophe Leroy 8b8a8f0ab3 powerpc/code-patching: Improve verification of patchability
Today, patch_instruction() assumes that it is called exclusively on
valid addresses, and only checks that it is not called on an init
address after init section has been freed.

Improve verification by calling kernel_text_address() instead.

kernel_text_address() already includes a verification of
initmem release.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bc683d499a411730504b132a924de0ccc2ef1f79.1636971137.git.christophe.leroy@csgroup.eu
2021-11-25 11:25:32 +11:00
Hari Bathini f15a71b388 powerpc/ppc-opcode: introduce PPC_RAW_BRANCH() macro
Define and use PPC_RAW_BRANCH() macro instead of open coding it. This
macro is used while adding BPF_PROBE_MEM support.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211012123056.485795-5-hbathini@linux.ibm.com
2021-11-25 11:25:31 +11:00
Cédric Le Goater 756c52c632 powerpc/xive: Activate StoreEOI on P10
StoreEOI (the capability to EOI with a store) requires load-after-store
ordering in some cases to be reliable. P10 introduced a new offset for
load operations to enforce correct ordering and the XIVE driver has
the required support since kernel 5.8, commit b1f9be9392
("powerpc/xive: Enforce load-after-store ordering when StoreEOI is active")

Since skiboot v7, StoreEOI support is advertised on P10 with a new flag
on the PowerNV platform. See skiboot commit 4bd7d84afe46 ("xive/p10:
Introduce a new OPAL_XIVE_IRQ_STORE_EOI2 flag"). When detected,
activate the feature.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211105102636.1016378-4-clg@kaod.org
2021-11-25 11:25:30 +11:00
Nicholas Piggin 9c5a432a55 KVM: PPC: Book3S HV P9: Remove subcore HMI handling
On POWER9 and newer, rather than the complex HMI synchronisation and
subcore state, have each thread un-apply the guest TB offset before
calling into the early HMI handler.

This allows the subcore state to be avoided, including subcore enter
/ exit guest, which includes an expensive divide that shows up
slightly in profiles.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-54-npiggin@gmail.com
2021-11-24 21:09:03 +11:00
Nicholas Piggin 434398ab5e KVM: PPC: Book3S HV P9: Avoid cpu_in_guest atomics on entry and exit
cpu_in_guest is set to determine if a CPU needs to be IPI'ed to exit
the guest and notice the need_tlb_flush bit.

This can be implemented as a global per-CPU pointer to the currently
running guest instead of per-guest cpumasks, saving 2 atomics per
entry/exit. P7/8 doesn't require cpu_in_guest, nor does a nested HV
(only the L0 does), so move it to the P9 HV path.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-50-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin b49c65c5f9 KVM: PPC: Book3S HV P9: Improve mfmsr performance on entry
Rearrange the MSR saving on entry so it does not follow the mtmsrd to
disable interrupts, avoiding a possible RAW scoreboard stall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-46-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin 46dea77f79 KVM: PPC: Book3S HV Nested: Avoid extra mftb() in nested entry
mftb() is expensive and one can be avoided on nested guest dispatch.

If the time checking code distinguishes between the L0 timer and the
nested HV timer, then both can be tested in the same place with the
same mftb() value.

This also nicely illustrates the relationship between the L0 and nested
HV timers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-45-npiggin@gmail.com
2021-11-24 21:09:02 +11:00
Nicholas Piggin 0ba0e5d5a6 KVM: PPC: Book3S HV: Split P8 from P9 path guest vCPU TLB flushing
This creates separate functions for old and new paths for vCPU TLB
flushing, which will reduce complexity of the next change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-43-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin 5236756d04 KVM: PPC: Book3S HV P9: Use Linux SPR save/restore to manage some host SPRs
Linux implements SPR save/restore including storage space for registers
in the task struct for process context switching. Make use of this
similarly to the way we make use of the context switching fp/vec save
restore.

This improves code reuse, allows some stack space to be saved, and helps
with avoiding VRSAVE updates if they are not required.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-39-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin 022ecb960c KVM: PPC: Book3S HV P9: Demand fault TM facility registers
Use HFSCR facility disabling to implement demand faulting for TM, with
a hysteresis counter similar to the load_fp etc counters in context
switching that implement the equivalent demand faulting for userspace
facilities.

This speeds up guest entry/exit by avoiding the register save/restore
when a guest is not frequently using them. When a guest does use them
often, there will be some additional demand fault overhead, but these
are not commonly used facilities.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-38-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin a3e18ca8ab KVM: PPC: Book3S HV P9: Demand fault EBB facility registers
Use HFSCR facility disabling to implement demand faulting for EBB, with
a hysteresis counter similar to the load_fp etc counters in context
switching that implement the equivalent demand faulting for userspace
facilities.

This speeds up guest entry/exit by avoiding the register save/restore
when a guest is not frequently using them. When a guest does use them
often, there will be some additional demand fault overhead, but these
are not commonly used facilities.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-37-npiggin@gmail.com
2021-11-24 21:09:01 +11:00
Nicholas Piggin cb2553a093 KVM: PPC: Book3S HV P9: Optimise timebase reads
Reduce the number of mfTB executed by passing the current timebase
around entry and exit code rather than read it multiple times.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-25-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin 3c1a4322bb KVM: PPC: Book3S HV: Change dec_expires to be relative to guest timebase
Change dec_expires to be relative to the guest timebase, and allow
it to be moved into low level P9 guest entry functions, to improve
SPR access scheduling.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-23-npiggin@gmail.com
2021-11-24 21:08:59 +11:00
Nicholas Piggin 34e119c96b KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRs
This reduces the number of mtmsrd required to enable facility bits when
saving/restoring registers, by having the KVM code set all bits up front
rather than using individual facility functions that set their particular
MSR bits.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-20-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin 57dc0eed73 KVM: PPC: Book3S HV P9: Implement PMU save/restore in C
Implement the P9 path PMU save/restore code in C, and remove the
POWER9/10 code from the P7/8 path assembly.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-14-npiggin@gmail.com
2021-11-24 21:08:58 +11:00
Nicholas Piggin 25aa145856 powerpc/time: add API for KVM to re-arm the host timer/decrementer
Rather than have KVM look up the host timer and fiddle with the
irq-work internal details, have the powerpc/time.c code provide a
function for KVM to re-arm the Linux timer code when exiting a
guest.

This is implementation has an improvement over existing code of
marking a decrementer interrupt as soft-pending if a timer has
expired, rather than setting DEC to a -ve value, which tended to
cause host timers to take two interrupts (first hdec to exit the
guest, then the immediate dec).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-8-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin 9581991a60 KVM: PPC: Book3S HV P9: Use large decrementer for HDEC
On processors that don't suppress the HDEC exceptions when LPCR[HDICE]=0,
this could help reduce needless guest exits due to leftover exceptions on
entering the guest.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-6-npiggin@gmail.com
2021-11-24 21:08:57 +11:00
Nicholas Piggin 4ebbd075bc KVM: PPC: Book3S HV P9: Use host timer accounting to avoid decrementer read
There is no need to save away the host DEC value, as it is derived
from the host timer subsystem which maintains the next timer time,
so it can be restored from there.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211123095231.1036501-5-npiggin@gmail.com
2021-11-24 21:08:56 +11:00
Linus Torvalds e8f023caee asm-generic: asm/syscall.h cleanup
This is a single cleanup from Peter Collingbourne, removing
 some dead code.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmGKm+kACgkQmmx57+YA
 GNkksg/7BUxJrWFrQmLA3fhzh4wG3KswdrKTGQMf0jRVI1n77vmdfig3hEJmMekH
 0SZoIYmPztOSj34+6p4NxuqY/Sk62oYLr8Awo8ZLDhIBWNUJE8UWC07Qb/3DpGbp
 JfD7/8mXu10htgM85aGlhaVLnHvvqUBR8PlkJUGNxuY5Gy2L+eCkpwCAYlZpSOqr
 vs0BJWFY1LzO1POcjIq4/IM0PvcU2ncLB5XoxDjjIIWnWKyHzWY21ZvoeaNumvou
 xqQ/Hj8Sc+ufS0yNlSgIC+bJP0bp1bSw/dALKr8oYxLt7X9LELVY3WXyRH+It4nS
 b6HPYmga26NVq/u7RrylBA+2fRCDB8E6z73gHt4SeHrRDEvFjhNzIyV+aXPae/dY
 XI/pjiwpG6k6FpSnF69YZJ/Y+GmUA90V/Jq8aLFZhGz8SgpjRl+2foEUSDhRVXCA
 jGB1Y388m0e6jPlVJROB7ORzXMd8K5iciyUGqtAI87QCOtPozn10ruh5RdziguKm
 kDW5IKy9E4l1ch8WRprVbgV5Ew+QWKS1JIbyjDaX3jN0lUPCqgwkwvRxxtgdFnVA
 Lq5BiUdraSWFUr84rLU3gCRU0+VoEdyZYI+bQGGNlQ4ovmLYU1nLmCU/azMvSEgz
 ZsoM5YdffShxUtfwMg+W67RpYOjSnV55ZzwN0+d1C2oqz9xECXc=
 =8EFP
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanup from Arnd Bergmann:
 "This is a single cleanup from Peter Collingbourne, removing some dead
  code"

* tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  arch: remove unused function syscall_set_arguments()
2021-11-10 11:22:03 -08:00
Linus Torvalds 0c5c62ddf8 pci-v5.16-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmGFXBkUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vx6Tg/7BsGWm8f+uw/mr9lLm47q2mc4XyoO
 7bR9KDp5NM84W/8ZOU7dqqqsnY0ddrSOLBRyhJJYMW3SwJd1y1ajTBsL1Ujqv+eN
 z+JUFmhq4Laqm4k6Spc9CEJE+Ol5P6gGUtxLYo6PM2R0VxnSs/rDxctT5i7YOpCi
 COJ+NVT/mc/by2loz1kLTSR9GgtBBgd+Y8UA33GFbHKssROw02L0OI3wffp81Oba
 EhMGPoD+0FndAniDw+vaOSoO+YaBuTfbM92T/O00mND69Fj1PWgmNWZz7gAVgsXb
 3RrNENUFxgw6CDt7LZWB8OyT04iXe0R2kJs+PA9gigFCGbypwbd/Nbz5M7e9HUTR
 ray+1EpZib6+nIksQBL2mX8nmtyHMcLiM57TOEhq0+ECDO640MiRm8t0FIG/1E8v
 3ZYd9w20o/NxlFNXHxxpZ3D/osGH5ocyF5c5m1rfB4RGRwztZGL172LWCB0Ezz9r
 eHB8sWxylxuhrH+hp2BzQjyddg7rbF+RA4AVfcQSxUpyV01hoRocKqknoDATVeLH
 664nJIINFxKJFwfuL3E6OhrInNe1LnAhCZsHHqbS+NNQFgvPRznbixBeLkI9dMf5
 Yf6vpsWO7ur8lHHbRndZubVu8nxklXTU7B/w+C11sq6k9LLRJSHzanr3Fn9WA80x
 sznCxwUvbTCu1r0=
 =nsMh
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
 "Enumeration:
   - Conserve IRQs by setting up portdrv IRQs only when there are users
     (Jan Kiszka)
   - Rework and simplify _OSC negotiation for control of PCIe features
     (Joerg Roedel)
   - Remove struct pci_dev.driver pointer since it's redundant with the
     struct device.driver pointer (Uwe Kleine-König)

  Resource management:
   - Coalesce contiguous host bridge apertures from _CRS to accommodate
     BARs that cover more than one aperture (Kai-Heng Feng)

  Sysfs:
   - Check CAP_SYS_ADMIN before parsing user input (Krzysztof
     Wilczyński)
   - Return -EINVAL consistently from "store" functions (Krzysztof
     Wilczyński)
   - Use sysfs_emit() in endpoint "show" functions to avoid buffer
     overruns (Kunihiko Hayashi)

  PCIe native device hotplug:
   - Ignore Link Down/Up caused by resets during error recovery so
     endpoint drivers can remain bound to the device (Lukas Wunner)

  Virtualization:
   - Avoid bus resets on Atheros QCA6174, where they hang the device
     (Ingmar Klein)
   - Work around Pericom PI7C9X2G switch packet drop erratum by using
     store and forward mode instead of cut-through (Nathan Rossi)
   - Avoid trying to enable AtomicOps on VFs; the PF setting applies to
     all VFs (Selvin Xavier)

  MSI:
   - Document that /sys/bus/pci/devices/.../irq contains the legacy INTx
     interrupt or the IRQ of the first MSI (not MSI-X) vector (Barry
     Song)

  VPD:
   - Add pci_read_vpd_any() and pci_write_vpd_any() to access anywhere
     in the possible VPD space; use these to simplify the cxgb3 driver
     (Heiner Kallweit)

  Peer-to-peer DMA:
   - Add (not subtract) the bus offset when calculating DMA address
     (Wang Lu)

  ASPM:
   - Re-enable LTR at Downstream Ports so they don't report Unsupported
     Requests when reset or hot-added devices send LTR messages
     (Mingchuang Qiao)

  Apple PCIe controller driver:
   - Add driver for Apple M1 PCIe controller (Alyssa Rosenzweig, Marc
     Zyngier)

  Cadence PCIe controller driver:
   - Return success when probe succeeds instead of falling into error
     path (Li Chen)

  HiSilicon Kirin PCIe controller driver:
   - Reorganize PHY logic and add support for external PHY drivers
     (Mauro Carvalho Chehab)
   - Support PERST# GPIOs for HiKey970 external PEX 8606 bridge (Mauro
     Carvalho Chehab)
   - Add Kirin 970 support (Mauro Carvalho Chehab)
   - Make driver removable (Mauro Carvalho Chehab)

  Intel VMD host bridge driver:
   - If IOMMU supports interrupt remapping, leave VMD MSI-X remapping
     enabled (Adrian Huang)
   - Number each controller so we can tell them apart in
     /proc/interrupts (Chunguang Xu)
   - Avoid building on UML because VMD depends on x86 bare metal APIs
     (Johannes Berg)

  Marvell Aardvark PCIe controller driver:
   - Define macros for PCI_EXP_DEVCTL_PAYLOAD_* (Pali Rohár)
   - Set Max Payload Size to 512 bytes per Marvell spec (Pali Rohár)
   - Downgrade PIO Response Status messages to debug level (Marek Behún)
   - Preserve CRS SV (Config Request Retry Software Visibility) bit in
     emulated Root Control register (Pali Rohár)
   - Fix issue in configuring reference clock (Pali Rohár)
   - Don't clear status bits for masked interrupts (Pali Rohár)
   - Don't mask unused interrupts (Pali Rohár)
   - Avoid code repetition in advk_pcie_rd_conf() (Marek Behún)
   - Retry config accesses on CRS response (Pali Rohár)
   - Simplify emulated Root Capabilities initialization (Pali Rohár)
   - Fix several link training issues (Pali Rohár)
   - Fix link-up checking via LTSSM (Pali Rohár)
   - Fix reporting of Data Link Layer Link Active (Pali Rohár)
   - Fix emulation of W1C bits (Marek Behún)
   - Fix MSI domain .alloc() method to return zero on success (Marek
     Behún)
   - Read entire 16-bit MSI vector in MSI handler, not just low 8 bits
     (Marek Behún)
   - Clear Root Port I/O Space, Memory Space, and Bus Master Enable bits
     at startup; PCI core will set those as necessary (Pali Rohár)
   - When operating as a Root Port, set class code to "PCI Bridge"
     instead of the default "Mass Storage Controller" (Pali Rohár)
   - Add emulation for PCI_BRIDGE_CTL_BUS_RESET since aardvark doesn't
     implement this per spec (Pali Rohár)
   - Add emulation of option ROM BAR since aardvark doesn't implement
     this per spec (Pali Rohár)

  MediaTek MT7621 PCIe controller driver:
   - Add MediaTek MT7621 PCIe host controller driver and DT binding
     (Sergio Paracuellos)

  Qualcomm PCIe controller driver:
   - Add SC8180x compatible string (Bjorn Andersson)
   - Add endpoint controller driver and DT binding (Manivannan
     Sadhasivam)
   - Restructure to use of_device_get_match_data() (Prasad Malisetty)
   - Add SC7280-specific pcie_1_pipe_clk_src handling (Prasad Malisetty)

  Renesas R-Car PCIe controller driver:
   - Remove unnecessary includes (Geert Uytterhoeven)

  Rockchip DesignWare PCIe controller driver:
   - Add DT binding (Simon Xue)

  Socionext UniPhier Pro5 controller driver:
   - Serialize INTx masking/unmasking (Kunihiko Hayashi)

  Synopsys DesignWare PCIe controller driver:
   - Run dwc .host_init() method before registering MSI interrupt
     handler so we can deal with pending interrupts left by bootloader
     (Bjorn Andersson)
   - Clean up Kconfig dependencies (Andy Shevchenko)
   - Export symbols to allow more modular drivers (Luca Ceresoli)

  TI DRA7xx PCIe controller driver:
   - Allow host and endpoint drivers to be modules (Luca Ceresoli)
   - Enable external clock if present (Luca Ceresoli)

  TI J721E PCIe driver:
   - Disable PHY when probe fails after initializing it (Christophe
     JAILLET)

  MicroSemi Switchtec management driver:
   - Return error to application when command execution fails because an
     out-of-band reset has cleared the device BARs, Memory Space Enable,
     etc (Kelvin Cao)
   - Fix MRPC error status handling issue (Kelvin Cao)
   - Mask out other bits when reading of management VEP instance ID
     (Kelvin Cao)
   - Return EOPNOTSUPP instead of ENOTSUPP from sysfs show functions
     (Kelvin Cao)
   - Add check of event support (Logan Gunthorpe)

  Miscellaneous:
   - Remove unused pci_pool wrappers, which have been replaced by
     dma_pool (Cai Huoqing)
   - Use 'unsigned int' instead of bare 'unsigned' (Krzysztof
     Wilczyński)
   - Use kstrtobool() directly, sans strtobool() wrapper (Krzysztof
     Wilczyński)
   - Fix some sscanf(), sprintf() format mismatches (Krzysztof
     Wilczyński)
   - Update PCI subsystem information in MAINTAINERS (Krzysztof
     Wilczyński)
   - Correct some misspellings (Krzysztof Wilczyński)"

* tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (137 commits)
  PCI: Add ACS quirk for Pericom PI7C9X2G switches
  PCI: apple: Configure RID to SID mapper on device addition
  iommu/dart: Exclude MSI doorbell from PCIe device IOVA range
  PCI: apple: Implement MSI support
  PCI: apple: Add INTx and per-port interrupt support
  PCI: kirin: Allow removing the driver
  PCI: kirin: De-init the dwc driver
  PCI: kirin: Disable clkreq during poweroff sequence
  PCI: kirin: Move the power-off code to a common routine
  PCI: kirin: Add power_off support for Kirin 960 PHY
  PCI: kirin: Allow building it as a module
  PCI: kirin: Add MODULE_* macros
  PCI: kirin: Add Kirin 970 compatible
  PCI: kirin: Support PERST# GPIOs for HiKey970 external PEX 8606 bridge
  PCI: apple: Set up reference clocks when probing
  PCI: apple: Add initial hardware bring-up
  PCI: of: Allow matching of an interrupt-map local to a PCI device
  of/irq: Allow matching of an interrupt-map local to an interrupt controller
  irqdomain: Make of_phandle_args_to_fwspec() generally available
  PCI: Do not enable AtomicOps on VFs
  ...
2021-11-06 14:36:12 -07:00
Linus Torvalds 512b7931ad Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "257 patches.

  Subsystems affected by this patch series: scripts, ocfs2, vfs, and
  mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
  gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
  pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
  memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
  vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
  cleanups, kfence, and damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
  mm/damon: remove return value from before_terminate callback
  mm/damon: fix a few spelling mistakes in comments and a pr_debug message
  mm/damon: simplify stop mechanism
  Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
  Docs/admin-guide/mm/damon/start: simplify the content
  Docs/admin-guide/mm/damon/start: fix a wrong link
  Docs/admin-guide/mm/damon/start: fix wrong example commands
  mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
  mm/damon: remove unnecessary variable initialization
  Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
  mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
  selftests/damon: support watermarks
  mm/damon/dbgfs: support watermarks
  mm/damon/schemes: activate schemes based on a watermarks mechanism
  tools/selftests/damon: update for regions prioritization of schemes
  mm/damon/dbgfs: support prioritization weights
  mm/damon/vaddr,paddr: support pageout prioritization
  mm/damon/schemes: prioritize regions within the quotas
  mm/damon/selftests: support schemes quotas
  mm/damon/dbgfs: support quotas of schemes
  ...
2021-11-06 14:08:17 -07:00
David Hildenbrand 50f9481ed9 mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.

Link: https://lkml.kernel.org/r/20210929143600.49379-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>	[kselftest]
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:42 -07:00
Christophe Leroy e012a25d81 powerpc: use generic version of arch_is_kernel_initmem_freed()
The generic version of arch_is_kernel_initmem_freed() now does the same
as powerpc version.

Remove the powerpc version.

Link: https://lkml.kernel.org/r/c53764eb45d41491e2b21da2e7812239897dbebb.1633001016.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:38 -07:00
Linus Torvalds 5c0b0c676a powerpc updates for 5.16
- Enable STRICT_KERNEL_RWX for Freescale 85xx platforms.
 
  - Activate CONFIG_STRICT_KERNEL_RWX by default, while still allowing it to be disabled.
 
  - Add support for out-of-line static calls on 32-bit.
 
  - Fix oopses doing bpf-to-bpf calls when STRICT_KERNEL_RWX is enabled.
 
  - Fix boot hangs on e5500 due to stale value in ESR passed to do_page_fault().
 
  - Fix several bugs on pseries in handling of device tree cache information for hotplugged
    CPUs, and/or during partition migration.
 
  - Various other small features and fixes.
 
 Thanks to: Alexey Kardashevskiy, Alistair Popple, Anatolij Gustschin, Andrew Donnellan,
 Athira Rajeev, Bixuan Cui, Bjorn Helgaas, Cédric Le Goater, Christophe Leroy, Daniel
 Axtens, Daniel Henrique Barboza, Denis Kirjanov, Fabiano Rosas, Frederic Barrat, Gustavo
 A. R. Silva, Hari Bathini, Jacques de Laval, Joel Stanley, Kai Song, Kajol Jain, Laurent
 Vivier, Leonardo Bras, Madhavan Srinivasan, Nathan Chancellor, Nathan Lynch, Naveen N.
 Rao, Nicholas Piggin, Nick Desaulniers, Niklas Schnelle, Oliver O'Halloran, Rob Herring,
 Russell Currey, Srikar Dronamraju, Stan Johnson, Tyrel Datwyler, Uwe Kleine-König, Vasant
 Hegde, Wan Jiabing, Xiaoming Ni,
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmGFDPoTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgNbsEACVczMVwMBEny5a7W1tqq1bnY9RFVw3
 K+/rE7/FpSLX+RrwgoMkmqfPvfyc9WISVLlIQDKz4XhkBjaXv0+Y4OMsymuDCbxL
 Qk7F1ff22UfLmPjjJk39gHJ8QZQqZk3wmFu2QzTO4yBZbz2SqqXFLxwyoLpZ0LJ8
 pdGl51+bIsTkDJzrdkhX9X4AKS/fYyjbQxq5u7FS89ZCCs+KvzjLcDRo0GZYaOK/
 hgDBa60DCCszL/9yjbh0ANZxmM2Z3+6AXkvAAXrtXzIGk4JzenZfiV+VEzmq8Tt0
 UpWSsUEe7VgykMR3MTrL9G8op70PpKX6OMUPegJq4iRQ24h4mpFCK4oV9OMKJqpF
 ifN9NO2ZZKOz1ke4l7Xe8SEHLX7rq5U/P7INh3AsKYNYwo6HkJhSPxiCBWUTlnZ3
 OYoZ7czyO4gMPHWP92z4CoSiTYVBFuyhYexRcnQskg60TIwbr+lMXziRyPRGI8b6
 taf2rD8eAiyUJnvkFUsyAHtYHpkSkuMeiVqY2CDQdh2SdtIFgwKzB2RjFL0gzaBZ
 XP9RWD+HernGQAJSlIk7cVthont3JHklcKk+ohhDbsWzPeUJcz6t4ChtgRq0x4q4
 Hpes1lsVfXpyxj5ouBK/E/t+diwPvBLM0dCcarQJE6ExgMzBC/C7Br7jCSgyVJA2
 VhtcZaCYh+vRlQ==
 =f7HE
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Enable STRICT_KERNEL_RWX for Freescale 85xx platforms.

 - Activate CONFIG_STRICT_KERNEL_RWX by default, while still allowing it
   to be disabled.

 - Add support for out-of-line static calls on 32-bit.

 - Fix oopses doing bpf-to-bpf calls when STRICT_KERNEL_RWX is enabled.

 - Fix boot hangs on e5500 due to stale value in ESR passed to
   do_page_fault().

 - Fix several bugs on pseries in handling of device tree cache
   information for hotplugged CPUs, and/or during partition migration.

 - Various other small features and fixes.

Thanks to Alexey Kardashevskiy, Alistair Popple, Anatolij Gustschin,
Andrew Donnellan, Athira Rajeev, Bixuan Cui, Bjorn Helgaas, Cédric Le
Goater, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Denis
Kirjanov, Fabiano Rosas, Frederic Barrat, Gustavo A.  R.  Silva, Hari
Bathini, Jacques de Laval, Joel Stanley, Kai Song, Kajol Jain, Laurent
Vivier, Leonardo Bras, Madhavan Srinivasan, Nathan Chancellor, Nathan
Lynch, Naveen N.  Rao, Nicholas Piggin, Nick Desaulniers, Niklas
Schnelle, Oliver O'Halloran, Rob Herring, Russell Currey, Srikar
Dronamraju, Stan Johnson, Tyrel Datwyler, Uwe Kleine-König, Vasant
Hegde, Wan Jiabing, and Xiaoming Ni,

* tag 'powerpc-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (73 commits)
  powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST
  powerpc/32e: Ignore ESR in instruction storage interrupt handler
  powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload
  powerpc: Don't provide __kernel_map_pages() without ARCH_SUPPORTS_DEBUG_PAGEALLOC
  MAINTAINERS: Update powerpc KVM entry
  powerpc/xmon: fix task state output
  powerpc/44x/fsp2: add missing of_node_put
  powerpc/dcr: Use cmplwi instead of 3-argument cmpli
  KVM: PPC: Tick accounting should defer vtime accounting 'til after IRQ handling
  powerpc/security: Use a mutex for interrupt exit code patching
  powerpc/83xx/mpc8349emitx: Make mcu_gpiochip_remove() return void
  powerpc/fsl_booke: Fix setting of exec flag when setting TLBCAMs
  powerpc/book3e: Fix set_memory_x() and set_memory_nx()
  powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect()
  powerpc/bpf: Fix write protecting JIT code
  selftests/powerpc: Use date instead of EPOCHSECONDS in mitigation-patching.sh
  powerpc/64s/interrupt: Fix check_return_regs_valid() false positive
  powerpc/boot: Set LC_ALL=C in wrapper script
  powerpc/64s: Default to 64K pages for 64 bit book3s
  Revert "powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC"
  ...
2021-11-05 08:15:46 -07:00
Linus Torvalds d7e0a795bf ARM:
* More progress on the protected VM front, now with the full
   fixed feature set as well as the limitation of some hypercalls
   after initialisation.
 
 * Cleanup of the RAZ/WI sysreg handling, which was pointlessly
   complicated
 
 * Fixes for the vgic placement in the IPA space, together with a
   bunch of selftests
 
 * More memcg accounting of the memory allocated on behalf of a guest
 
 * Timer and vgic selftests
 
 * Workarounds for the Apple M1 broken vgic implementation
 
 * KConfig cleanups
 
 * New kvmarm.mode=none option, for those who really dislike us
 
 RISC-V:
 * New KVM port.
 
 x86:
 * New API to control TSC offset from userspace
 
 * TSC scaling for nested hypervisors on SVM
 
 * Switch masterclock protection from raw_spin_lock to seqcount
 
 * Clean up function prototypes in the page fault code and avoid
 repeated memslot lookups
 
 * Convey the exit reason to userspace on emulation failure
 
 * Configure time between NX page recovery iterations
 
 * Expose Predictive Store Forwarding Disable CPUID leaf
 
 * Allocate page tracking data structures lazily (if the i915
 KVM-GT functionality is not compiled in)
 
 * Cleanups, fixes and optimizations for the shadow MMU code
 
 s390:
 * SIGP Fixes
 
 * initial preparations for lazy destroy of secure VMs
 
 * storage key improvements/fixes
 
 * Log the guest CPNC
 
 Starting from this release, KVM-PPC patches will come from
 Michael Ellerman's PPC tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGBOiEUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNowwf/axlx3g9sgCwQHr12/6UF/7hL/RwP
 9z+pGiUzjl2YQE+RjSvLqyd6zXh+h4dOdOKbZDLSkSTbcral/8U70ojKnQsXM0XM
 1LoymxBTJqkgQBLm9LjYreEbzrPV4irk4ygEmuk3CPOHZu8xX1ei6c5LdandtM/n
 XVUkXsQY+STkmnGv4P3GcPoDththCr0tBTWrFWtxa0w9hYOxx0ay1AZFlgM4FFX0
 QFuRc8VBLoDJpIUjbkhsIRIbrlHc/YDGjuYnAU7lV/CIME8vf2BW6uBwIZJdYcDj
 0ejozLjodEnuKXQGnc8sXFioLX2gbMyQJEvwCgRvUu/EU7ncFm1lfs7THQ==
 =UxKM
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:

   - More progress on the protected VM front, now with the full fixed
     feature set as well as the limitation of some hypercalls after
     initialisation.

   - Cleanup of the RAZ/WI sysreg handling, which was pointlessly
     complicated

   - Fixes for the vgic placement in the IPA space, together with a
     bunch of selftests

   - More memcg accounting of the memory allocated on behalf of a guest

   - Timer and vgic selftests

   - Workarounds for the Apple M1 broken vgic implementation

   - KConfig cleanups

   - New kvmarm.mode=none option, for those who really dislike us

  RISC-V:

   - New KVM port.

  x86:

   - New API to control TSC offset from userspace

   - TSC scaling for nested hypervisors on SVM

   - Switch masterclock protection from raw_spin_lock to seqcount

   - Clean up function prototypes in the page fault code and avoid
     repeated memslot lookups

   - Convey the exit reason to userspace on emulation failure

   - Configure time between NX page recovery iterations

   - Expose Predictive Store Forwarding Disable CPUID leaf

   - Allocate page tracking data structures lazily (if the i915 KVM-GT
     functionality is not compiled in)

   - Cleanups, fixes and optimizations for the shadow MMU code

  s390:

   - SIGP Fixes

   - initial preparations for lazy destroy of secure VMs

   - storage key improvements/fixes

   - Log the guest CPNC

  Starting from this release, KVM-PPC patches will come from Michael
  Ellerman's PPC tree"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  RISC-V: KVM: fix boolreturn.cocci warnings
  RISC-V: KVM: remove unneeded semicolon
  RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions
  RISC-V: KVM: Factor-out FP virtualization into separate sources
  KVM: s390: add debug statement for diag 318 CPNC data
  KVM: s390: pv: properly handle page flags for protected guests
  KVM: s390: Fix handle_sske page fault handling
  KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol
  KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
  KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info
  KVM: x86: Clarify the kvm_run.emulation_failure structure layout
  KVM: s390: Add a routine for setting userspace CPU state
  KVM: s390: Simplify SIGP Set Arch handling
  KVM: s390: pv: avoid stalls when making pages secure
  KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm
  KVM: s390: pv: avoid double free of sida page
  KVM: s390: pv: add macros for UVC CC values
  s390/mm: optimize reset_guest_reference_bit()
  s390/mm: optimize set_guest_storage_key()
  s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present
  ...
2021-11-02 11:24:14 -07:00
Linus Torvalds 79ef0c0014 Tracing updates for 5.16:
- kprobes: Restructured stack unwinder to show properly on x86 when a stack
   dump happens from a kretprobe callback.
 
 - Fix to bootconfig parsing
 
 - Have tracefs allow owner and group permissions by default (only denying
   others). There's been pressure to allow non root to tracefs in a
   controlled fashion, and using groups is probably the safest.
 
 - Bootconfig memory managament updates.
 
 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.
 
 - Allow perf to be traced by function tracer.
 
 - Rewrite of function graph tracer to be a callback from the function tracer
   instead of having its own trampoline (this change will happen on an arch
   by arch basis, and currently only x86_64 implements it).
 
 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.
 
 - Allow histogram triggers to add variables that can perform calculations
   against the event's fields.
 
 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent warnings
   from the compiler.
 
 - Extend histogram triggers to key off of variables.
 
 - Have trace recursion use bit magic to determine preempt context over if
   branches.
 
 - Have trace recursion disable preemption as all use cases do anyway.
 
 - Added testing for verification of tracing utilities.
 
 - Various small clean ups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYYBdxhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qp1sAQD2oYFwaG3sx872gj/myBcHIBSKdiki
 Hry5csd8zYDBpgD+Poylopt5JIbeDuoYw/BedgEXmscZ8Qr7VzjAXdnv/Q4=
 =Loz8
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - kprobes: Restructured stack unwinder to show properly on x86 when a
   stack dump happens from a kretprobe callback.

 - Fix to bootconfig parsing

 - Have tracefs allow owner and group permissions by default (only
   denying others). There's been pressure to allow non root to tracefs
   in a controlled fashion, and using groups is probably the safest.

 - Bootconfig memory managament updates.

 - Bootconfig clean up to have the tools directory be less dependent on
   changes in the kernel tree.

 - Allow perf to be traced by function tracer.

 - Rewrite of function graph tracer to be a callback from the function
   tracer instead of having its own trampoline (this change will happen
   on an arch by arch basis, and currently only x86_64 implements it).

 - Allow multiple direct trampolines (bpf hooks to functions) be batched
   together in one synchronization.

 - Allow histogram triggers to add variables that can perform
   calculations against the event's fields.

 - Use the linker to determine architecture callbacks from the ftrace
   trampoline to allow for proper parameter prototypes and prevent
   warnings from the compiler.

 - Extend histogram triggers to key off of variables.

 - Have trace recursion use bit magic to determine preempt context over
   if branches.

 - Have trace recursion disable preemption as all use cases do anyway.

 - Added testing for verification of tracing utilities.

 - Various small clean ups and fixes.

* tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits)
  tracing/histogram: Fix semicolon.cocci warnings
  tracing/histogram: Fix documentation inline emphasis warning
  tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together
  tracing: Show size of requested perf buffer
  bootconfig: Initialize ret in xbc_parse_tree()
  ftrace: do CPU checking after preemption disabled
  ftrace: disable preemption when recursion locked
  tracing/histogram: Document expression arithmetic and constants
  tracing/histogram: Optimize division by a power of 2
  tracing/histogram: Covert expr to const if both operands are constants
  tracing/histogram: Simplify handling of .sym-offset in expressions
  tracing: Fix operator precedence for hist triggers expression
  tracing: Add division and multiplication support for hist triggers
  tracing: Add support for creating hist trigger variables from literal
  selftests/ftrace: Stop tracing while reading the trace file by default
  MAINTAINERS: Update KPROBES and TRACING entries
  test_kprobes: Move it from kernel/ to lib/
  docs, kprobes: Remove invalid URL and add new reference
  samples/kretprobes: Fix return value if register_kretprobe() failed
  lib/bootconfig: Fix the xbc_get_info kerneldoc
  ...
2021-11-01 20:05:19 -07:00
Linus Torvalds bf953917be Various hardening fixes and cleanups for 5.16-rc1
Hi Linus,
 
 Please, pull the following hardening fixes and cleanups that I've
 been collecting during the last development cycle. All of them have
 been baking in linux-next.
 
 Fix -Wcast-function-type error:
 
 - firewire: Remove function callback casts (Oscar Carter)
 
 Fix application of sizeof operator:
 
 - firmware/psci: fix application of sizeof to pointer (jing yangyang)
 
 Replace open coded instances with size_t saturating arithmetic helpers:
 
 - assoc_array: Avoid open coded arithmetic in allocator arguments (Len Baker)
 - writeback: prefer struct_size over open coded arithmetic (Len Baker)
 - aio: Prefer struct_size over open coded arithmetic (Len Baker)
 - dmaengine: pxa_dma: Prefer struct_size over open coded arithmetic (Len Baker)
 
 Flexible array transformation:
 
 - KVM: PPC: Replace zero-length array with flexible array member (Len Baker)
 
 Use 2-factor argument multiplication form:
 
 - nouveau/svm: Use kvcalloc() instead of kvzalloc() (Gustavo A. R. Silva)
 - xfs: Use kvcalloc() instead of kvzalloc() (Gustavo A. R. Silva)
 
 Thanks
 --
 Gustavo
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmGAWzsACgkQRwW0y0cG
 2zF55A/+PTBZKg0XLQkPZ7HFipobeZpfvM0dU4JutwN6Kts1RmMRftPn6ootY18v
 4tWR4jXcnblvEr7UTgYAl6QQdytFXZKOK+JKMWV8LXLqyNGF6sS2PmA6zk/iQoa5
 1q0IKUaaqLIXwmm3xoz+/uNHsb+kfjYOHZpHA6HhYZQFDyShW7+hhIeS1NauJo2X
 op3IWasMumrawPkCJZ0ZJJQLELtZNGt4gHnOjB1MAYhOTAokowgeeDNtyfoJ9j1L
 iL8kimphVLI35H/GERBozmqdqRGIIZLlQF4P66VfNNEXSDoKOemAKDSFrfmYoVwE
 kdh6fqeKPV/aRImrCtNthfpiEjqEpm8afQGMC5H5uPnZontUX9tcU1Qagg0vwYx0
 fLZ8mMuNQK5AZfugK+1+2ShfBYUlhvWRhQdtjC9nIAoO80NqouWB7QD0zIHC2WV7
 durdlhzxik70ISnXqKmTR6bQNcXB6kFLPR30RpcA3E6+AgwlkP0FmaD3e+sDttJ0
 vtxDMHqMMNNzOWlLW2eqEdKMEfoU0gLyRt5iM7EN6R8HUXwup5f9bu7V4LuCnR6y
 FAX4tEa8b5wg01zNfyWClCccU6tetSeXjdrhdIk7szQVsOsYXc4zxDrp6xvqsAh2
 B7GbGk5qeUzM/O7QWNIl+5s/NhUjEzQ3QiQebRDdjVyINU2OKsI=
 =Jk0U
 -----END PGP SIGNATURE-----

Merge tag 'kspp-misc-fixes-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull hardening fixes and cleanups from Gustavo A. R. Silva:
 "Various hardening fixes and cleanups that I've been collecting during
  the last development cycle:

  Fix -Wcast-function-type error:

   - firewire: Remove function callback casts (Oscar Carter)

  Fix application of sizeof operator:

   - firmware/psci: fix application of sizeof to pointer (jing yangyang)

  Replace open coded instances with size_t saturating arithmetic
  helpers:

   - assoc_array: Avoid open coded arithmetic in allocator arguments
     (Len Baker)

   - writeback: prefer struct_size over open coded arithmetic (Len
     Baker)

   - aio: Prefer struct_size over open coded arithmetic (Len Baker)

   - dmaengine: pxa_dma: Prefer struct_size over open coded arithmetic
     (Len Baker)

  Flexible array transformation:

   - KVM: PPC: Replace zero-length array with flexible array member (Len
     Baker)

  Use 2-factor argument multiplication form:

   - nouveau/svm: Use kvcalloc() instead of kvzalloc() (Gustavo A. R.
     Silva)

   - xfs: Use kvcalloc() instead of kvzalloc() (Gustavo A. R. Silva)"

* tag 'kspp-misc-fixes-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  firewire: Remove function callback casts
  nouveau/svm: Use kvcalloc() instead of kvzalloc()
  firmware/psci: fix application of sizeof to pointer
  dmaengine: pxa_dma: Prefer struct_size over open coded arithmetic
  KVM: PPC: Replace zero-length array with flexible array member
  aio: Prefer struct_size over open coded arithmetic
  writeback: prefer struct_size over open coded arithmetic
  xfs: Use kvcalloc() instead of kvzalloc()
  assoc_array: Avoid open coded arithmetic in allocator arguments
2021-11-01 17:29:10 -07:00
Linus Torvalds 01463374c5 cpu-to-thread_info update for v5.16-rc1
Cross-architecture update to move task_struct::cpu back into thread_info
 on arm64, x86, s390, powerpc, and riscv. All Acked by arch maintainers.
 
 Quoting Ard Biesheuvel:
 
 "Move task_struct::cpu back into thread_info
 
  Keeping CPU in task_struct is problematic for architectures that define
  raw_smp_processor_id() in terms of this field, as it requires
  linux/sched.h to be included, which causes a lot of pain in terms of
  circular dependencies (aka 'header soup')
 
  This series moves it back into thread_info (where it came from) for all
  architectures that enable THREAD_INFO_IN_TASK, addressing the header
  soup issue as well as some pointless differences in the implementations
  of task_cpu() and set_task_cpu()."
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmGAEPYWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJq4wEACItgLuyzPgB2eSLVMc3sHPIWcn
 EUWbAWsuzJH79wmJtn2AKxW/C5OLBNGeoNjkXQvFN3ULkQDPrfCpB4x/tB6CjIQI
 WRDf8kO7oaAD85ZrbSwyFl/MFfrD67f6H1HZoB9FKWAzuv/Bp2xQ0Kf06Dv4HEZp
 CzprzZuWtjHB+qgyy+EpGOge3zbFmCuYPE2QpMYLWgs1rcVW9OYvoCI6AYtNefrC
 6Kl6CbmBb1k6lFxkhM7wvRcIJthBl6Bajpc3Z2uL1aLb27dVpQZs3YpY859Knb6U
 ZpOQCRJOMui3HOxyF3bDUI37y0XVLm6xaNM6C/7i0XS1GiFlSxkGVamg+Mp7anpI
 +hdK5kqtSagaBC9CaJvRHnWIex1npQAfiyDNdyiEbrsUJ1dp6/zZcQSe4/m/XRbi
 vywQPGxU9f1ASshzHsGU2TJf7Ps7qHulUsS5fKwmHU2ZjQnbYCoPN10JGO9gKjOX
 yioN5xsKnbPY9j0ys3l9XBqaMJ8KAr1XspplTGIMZIVbjNMlqrfgbg8Qn8T8WGM7
 oUqudMIxczilj0/iEGfGRxBeFaYAfhGQCDnxNlNX9g7Xe/gHTJgNYlHVxL55jHNu
 AoPE3Gd0X8K9fbov0BCB6a21XwGJ6Wj+FSrnvuyWrRuy8JWiDFJaVKUBEcalKr7a
 MhoUNQPu5M83OdC42A==
 =PzvV
 -----END PGP SIGNATURE-----

Merge tag 'cpu-to-thread_info-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull thread_info update to move 'cpu' back from task_struct from Kees Cook:
 "Cross-architecture update to move task_struct::cpu back into
  thread_info on arm64, x86, s390, powerpc, and riscv. All Acked by arch
  maintainers.

  Quoting Ard Biesheuvel:

     'Move task_struct::cpu back into thread_info

      Keeping CPU in task_struct is problematic for architectures that
      define raw_smp_processor_id() in terms of this field, as it
      requires linux/sched.h to be included, which causes a lot of pain
      in terms of circular dependencies (aka 'header soup')

      This series moves it back into thread_info (where it came from)
      for all architectures that enable THREAD_INFO_IN_TASK, addressing
      the header soup issue as well as some pointless differences in the
      implementations of task_cpu() and set_task_cpu()'"

* tag 'cpu-to-thread_info-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  riscv: rely on core code to keep thread_info::cpu updated
  powerpc: smp: remove hack to obtain offset of task_struct::cpu
  sched: move CPU field back into thread_info if THREAD_INFO_IN_TASK=y
  powerpc: add CPU field to struct thread_info
  s390: add CPU field to struct thread_info
  x86: add CPU field to struct thread_info
  arm64: add CPU field to struct thread_info
2021-11-01 17:00:05 -07:00
Linus Torvalds 6e5772c8d9 Add an interface called cc_platform_has() which is supposed to be used
by confidential computing solutions to query different aspects of the
 system. The intent behind it is to unify testing of such aspects instead
 of having each confidential computing solution add its own set of tests
 to code paths in the kernel, leading to an unwieldy mess.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/uLUACgkQEsHwGGHe
 VUqGbQ/+LOmz8hmL5vtbXw/lVonCSBRKI2KVefnN2VtQ3rjtCq8HlNoq/hAdi15O
 WntABFV8u4daNAcssp+H/p+c8Mt/NzQa60TRooC5ZIynSOCj4oZQxTWjcnR4Qxrf
 oABy4sp09zNW31qExtTVTwPC/Ejzv4hA0Vqt9TLQOSxp7oYVYKeDJNp79VJK64Yz
 Ky7epgg8Pauk0tAT76ATR4kyy9PLGe4/Ry0bOtAptO4NShL1RyRgI0ywUmptJHSw
 FV/MnoexdAs4V8+4zPwyOkf8YMDnhbJcvFcr7Yd9AEz2q9Z1wKCgi1M3aZIoW8lV
 YMXECMGe9DfxmEJbnP5zbnL6eF32x+tbq+fK8Ye4V2fBucpWd27zkcTXjoP+Y+zH
 NLg+9QykR9QCH75YCOXcAg1Q5hSmc4DaWuJymKjT+W7MKs89ywjq+ybIBpLBHbQe
 uN9FM/CEKXx8nQwpNQc7mdUE5sZeCQ875028RaLbLx3/b6uwT6rBlNJfxl/uxmcZ
 iF1kG7Cx4uO+7G1a9EWgxtWiJQ8GiZO7PMCqEdwIymLIrlNksAk7nX2SXTuH5jIZ
 YDuBj/Xz2UUVWYFm88fV5c4ogiFlm9Jeo140Zua/BPdDJd2VOP013rYxzFE/rVSF
 SM2riJxCxkva8Fb+8TNiH42AMhPMSpUt1Nmd1H2rcEABRiT83Ow=
 =Na0U
 -----END PGP SIGNATURE-----

Merge tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull generic confidential computing updates from Borislav Petkov:
 "Add an interface called cc_platform_has() which is supposed to be used
  by confidential computing solutions to query different aspects of the
  system.

  The intent behind it is to unify testing of such aspects instead of
  having each confidential computing solution add its own set of tests
  to code paths in the kernel, leading to an unwieldy mess"

* tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  treewide: Replace the use of mem_encrypt_active() with cc_platform_has()
  x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
  x86/sev: Replace occurrences of sev_active() with cc_platform_has()
  x86/sme: Replace occurrences of sme_active() with cc_platform_has()
  powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
  x86/sev: Add an x86 version of cc_platform_has()
  arch/cc: Introduce a function to check for confidential computing features
  x86/ioremap: Selectively build arch override encryption functions
2021-11-01 15:16:52 -07:00
Linus Torvalds 9a7e0a90a4 Scheduler updates:
- Revert the printk format based wchan() symbol resolution as it can leak
    the raw value in case that the symbol is not resolvable.
 
  - Make wchan() more robust and work with all kind of unwinders by
    enforcing that the task stays blocked while unwinding is in progress.
 
  - Prevent sched_fork() from accessing an invalid sched_task_group
 
  - Improve asymmetric packing logic
 
  - Extend scheduler statistics to RT and DL scheduling classes and add
    statistics for bandwith burst to the SCHED_FAIR class.
 
  - Properly account SCHED_IDLE entities
 
  - Prevent a potential deadlock when initial priority is assigned to a
    newly created kthread. A recent change to plug a race between cpuset and
    __sched_setscheduler() introduced a new lock dependency which is now
    triggered. Break the lock dependency chain by moving the priority
    assignment to the thread function.
 
  - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
 
  - Improve idle balancing in general and especially for NOHZ enabled
    systems.
 
  - Provide proper interfaces for live patching so it does not have to
    fiddle with scheduler internals.
 
  - Add cluster aware scheduling support.
 
  - A small set of tweaks for RT (irqwork, wait_task_inactive(), various
    scheduler options and delaying mmdrop)
 
  - The usual small tweaks and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/OUkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoR/5D/9ikdGNpKg9osNqJ3GjAmxsK6kVkB29
 iFe2k8pIpWDToWQf/wQRGih4Yj3Cl49QSnZcPIibh2/12EB1qrrW6iSPJkInz8Ec
 /1LS5/Vewn2OyoxyXZjdvGC5gTXEodSbIazASvX7nvdMeI4gsAsL5etzrMJirT/t
 aymqvr7zovvywrwMTQJrGjUMo9l4ewE8tafMNNhRu1BHU1U4ojM9yvThyRAAcmp7
 3Xy49A+Yq3IgrvYI4u8FMK5Zh08KaxSFjiLhePGm/bF+wSfYmWop2TP1jY05W2Uo
 ti8hfbJMUoFRYuMxAiEldkItnc0wV4M9PtWZZ/x+B71bs65Y4Zjt9cW+rxJv2+m1
 vzV31EsQwGnOti072dzWN4c/cZqngVXAjaNtErvDwJUr+Tw1ayv9KUvuodMQqZY6
 mu68bFUO2kV9EMe1CBOv51Uy1RGHyLj3rlNqrkw+Xp5ISE9Ad2vhUEiRp5bQx5Ci
 V/XFhGZkGUluh0vccrdFlNYZwhj8cZEzkOPCnPSeZ+bq8SyZE6xuHH/lTP1CJCOy
 s800rW1huM+kgV+zRN8adDkGXibAk9N3RtVGnQXmuEy8gB9LZmQg+JeM2wsc9B+6
 i0gdqZnsjNAfoK+BBAG4holxptSL8/eOJsFH8ZNIoxQ+iqooyPx9tFX7yXnRTBQj
 d2qWG7UvoseT+g==
 =fgtS
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Thomas Gleixner:

 - Revert the printk format based wchan() symbol resolution as it can
   leak the raw value in case that the symbol is not resolvable.

 - Make wchan() more robust and work with all kind of unwinders by
   enforcing that the task stays blocked while unwinding is in progress.

 - Prevent sched_fork() from accessing an invalid sched_task_group

 - Improve asymmetric packing logic

 - Extend scheduler statistics to RT and DL scheduling classes and add
   statistics for bandwith burst to the SCHED_FAIR class.

 - Properly account SCHED_IDLE entities

 - Prevent a potential deadlock when initial priority is assigned to a
   newly created kthread. A recent change to plug a race between cpuset
   and __sched_setscheduler() introduced a new lock dependency which is
   now triggered. Break the lock dependency chain by moving the priority
   assignment to the thread function.

 - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.

 - Improve idle balancing in general and especially for NOHZ enabled
   systems.

 - Provide proper interfaces for live patching so it does not have to
   fiddle with scheduler internals.

 - Add cluster aware scheduling support.

 - A small set of tweaks for RT (irqwork, wait_task_inactive(), various
   scheduler options and delaying mmdrop)

 - The usual small tweaks and improvements all over the place

* tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
  sched/fair: Cleanup newidle_balance
  sched/fair: Remove sysctl_sched_migration_cost condition
  sched/fair: Wait before decaying max_newidle_lb_cost
  sched/fair: Skip update_blocked_averages if we are defering load balance
  sched/fair: Account update_blocked_averages in newidle_balance cost
  x86: Fix __get_wchan() for !STACKTRACE
  sched,x86: Fix L2 cache mask
  sched/core: Remove rq_relock()
  sched: Improve wake_up_all_idle_cpus() take #2
  irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
  irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
  irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
  sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ
  sched: Add cluster scheduler level for x86
  sched: Add cluster scheduler level in core and related Kconfig for ARM64
  topology: Represent clusters of CPUs within a die
  sched: Disable -Wunused-but-set-variable
  sched: Add wrapper for get_wchan() to keep task blocked
  x86: Fix get_wchan() to support the ORC unwinder
  proc: Use task_is_running() for wchan in /proc/$pid/stat
  ...
2021-11-01 13:48:52 -07:00
Linus Torvalds 595b28fb0c Locking updates:
- Move futex code into kernel/futex/ and split up the kitchen sink into
    seperate files to make integration of sys_futex_waitv() simpler.
 
  - Add a new sys_futex_waitv() syscall which allows to wait on multiple
    futexes. The main use case is emulating Windows' WaitForMultipleObjects
    which allows Wine to improve the performance of Windows Games. Also
    native Linux games can benefit from this interface as this is a common
    wait pattern for this kind of applications.
 
  - Add context to ww_mutex_trylock() to provide a path for i915 to rework
    their eviction code step by step without making lockdep upset until the
    final steps of rework are completed. It's also useful for regulator and
    TTM to avoid dropping locks in the non contended path.
 
  - Lockdep and might_sleep() cleanups and improvements
 
  - A few improvements for the RT substitutions.
 
  - The usual small improvements and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/FTITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoVNZD/9vIm3Bu1Coz8tbNXz58AiCYq9Y/vp5
 mzFgSzz+VJTkW5Vh8jo5Uel4rCKZyt+rL276EoaRPzYl8KFtWDbpK3qd3PrXKqTX
 At49JO4ttAMJUHIBQ6vblEkykmfEd9YPU1uSWk5roJ+s7Jmr5VWnu0FEWHP00As5
 tWOca/TM0ei9kof26V2fl5aecTGII4i4Zsvy+LPsXtI+TnmP0gSBcGAS/5UnZTtJ
 vQRWTR3ojoYvh5iTmNqbaURYoQLe2j8yscn1DSW1CABWVmP12eDWs+N7jRP4b5S9
 73xOv5P7vpva41wxrK2ir5iNkpsLE97VL2JOHTW8nm7orblfiuxHLTCkTjEdd2pO
 h8blI2IBizEB3JYn2BMkOAaZQOSjN8hd6Ye/b2B4AMEGWeXEoEv6eVy/orYKCluQ
 XDqGn47Vce/SYmo5vfTB8VMt6nANx8PKvOP3IvjHInYEQBgiT6QrlUw3RRkXBp5s
 clQkjYYwjAMVIXowcCrdhoKjMROzi6STShVwHwGL8MaZXqr8Vl6BUO9ckU0pY+4C
 F000Hzwxi8lGEQ9k+P+BnYOEzH5osCty8lloKiQ/7ciX6T+CZHGJPGK/iY4YL8P5
 C3CJWMsHCqST7DodNFJmdfZt99UfIMmEhshMDduU9AAH0tHCn8vOu0U6WvCtpyBp
 BvHj68zteAtlYg==
 =RZ4x
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Thomas Gleixner:

 - Move futex code into kernel/futex/ and split up the kitchen sink into
   seperate files to make integration of sys_futex_waitv() simpler.

 - Add a new sys_futex_waitv() syscall which allows to wait on multiple
   futexes.

   The main use case is emulating Windows' WaitForMultipleObjects which
   allows Wine to improve the performance of Windows Games. Also native
   Linux games can benefit from this interface as this is a common wait
   pattern for this kind of applications.

 - Add context to ww_mutex_trylock() to provide a path for i915 to
   rework their eviction code step by step without making lockdep upset
   until the final steps of rework are completed. It's also useful for
   regulator and TTM to avoid dropping locks in the non contended path.

 - Lockdep and might_sleep() cleanups and improvements

 - A few improvements for the RT substitutions.

 - The usual small improvements and cleanups.

* tag 'locking-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  locking: Remove spin_lock_flags() etc
  locking/rwsem: Fix comments about reader optimistic lock stealing conditions
  locking: Remove rcu_read_{,un}lock() for preempt_{dis,en}able()
  locking/rwsem: Disable preemption for spinning region
  docs: futex: Fix kernel-doc references
  futex: Fix PREEMPT_RT build
  futex2: Documentation: Document sys_futex_waitv() uAPI
  selftests: futex: Test sys_futex_waitv() wouldblock
  selftests: futex: Test sys_futex_waitv() timeout
  selftests: futex: Add sys_futex_waitv() test
  futex,arm: Wire up sys_futex_waitv()
  futex,x86: Wire up sys_futex_waitv()
  futex: Implement sys_futex_waitv()
  futex: Simplify double_lock_hb()
  futex: Split out wait/wake
  futex: Split out requeue
  futex: Rename mark_wake_futex()
  futex: Rename: match_futex()
  futex: Rename: hb_waiter_{inc,dec,pending}()
  futex: Split out PI futex
  ...
2021-11-01 13:15:36 -07:00
Arnd Bergmann f98a3dccfc locking: Remove spin_lock_flags() etc
parisc, ia64 and powerpc32 are the only remaining architectures that
provide custom arch_{spin,read,write}_lock_flags() functions, which are
meant to re-enable interrupts while waiting for a spinlock.

However, none of these can actually run into this codepath, because
it is only called on architectures without CONFIG_GENERIC_LOCKBREAK,
or when CONFIG_DEBUG_LOCK_ALLOC is set without CONFIG_LOCKDEP, and none
of those combinations are possible on the three architectures.

Going back in the git history, it appears that arch/mn10300 may have
been able to run into this code path, but there is a good chance that
it never worked. On the architectures that still exist, it was
already impossible to hit back in 2008 after the introduction of
CONFIG_GENERIC_LOCKBREAK, and possibly earlier.

As this is all dead code, just remove it and the helper functions built
around it. For arch/ia64, the inline asm could be cleaned up, but
it seems safer to leave it untouched.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Helge Deller <deller@gmx.de>  # parisc
Link: https://lore.kernel.org/r/20211022120058.1031690-1-arnd@kernel.org
2021-10-30 16:37:28 +02:00
Michael Ellerman 5b9f69ffa6 Merge branch 'topic/ppc-kvm' into next
Merge a couple of KVM ppc patches we are keeping in a topic branch.
2021-10-29 00:00:57 +11:00
Christophe Leroy b6cb20fdc2 powerpc/book3e: Fix set_memory_x() and set_memory_nx()
set_memory_x() calls pte_mkexec() which sets _PAGE_EXEC.
set_memory_nx() calls pte_exprotec() which clears _PAGE_EXEC.

Book3e has 2 bits, UX and SX, which defines the exec rights
resp. for user (PR=1) and for kernel (PR=0).

_PAGE_EXEC is defined as UX only.

An executable kernel page is set with either _PAGE_KERNEL_RWX
or _PAGE_KERNEL_ROX, which both have SX set and UX cleared.

So set_memory_nx() call for an executable kernel page does
nothing because UX is already cleared.

And set_memory_x() on a non-executable kernel page makes it
executable for the user and keeps it non-executable for kernel.

Also, pte_exec() always returns 'false' on kernel pages, because
it checks _PAGE_EXEC which doesn't include SX, so for instance
the W+X check doesn't work.

To fix this:
  - change tlb_low_64e.S to use _PAGE_BAP_UX instead of _PAGE_USER
  - sets both UX and SX in _PAGE_EXEC so that pte_exec() returns
    true whenever one of the two bits is set and pte_exprotect()
    clears both bits.
  - Define a book3e specific version of pte_mkexec() which sets
    either SX or UX based on UR.

Fixes: 1f9ad21c3b ("powerpc/mm: Implement set_memory() routines")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c41100f9c144dc5b62e5a751b810190c6b5d42fd.1635226743.git.christophe.leroy@csgroup.eu
2021-10-28 00:41:29 +11:00
Christophe Leroy b1b93cb7e7 powerpc/nohash: Fix __ptep_set_access_flags() and ptep_set_wrprotect()
Commit 26973fa5ac ("powerpc/mm: use pte helpers in generic code")
changed those two functions to use pte helpers to determine which
bits to clear and which bits to set.

This change was based on the assumption that bits to be set/cleared
are always the same and can be determined by applying the pte
manipulation helpers on __pte(0).

But on platforms like book3e, the bits depend on whether the page
is a user page or not.

For the time being it more or less works because of _PAGE_EXEC being
used for user pages only and exec right being set at all time on
kernel page. But following patch will clean that and output of
pte_mkexec() will depend on the page being a user or kernel page.

Instead of trying to make an even more complicated helper where bits
would become dependent on the final pte value, come back to a more
static situation like before commit 26973fa5ac ("powerpc/mm: use
pte helpers in generic code"), by introducing an 8xx specific
version of __ptep_set_access_flags() and ptep_set_wrprotect().

Fixes: 26973fa5ac ("powerpc/mm: use pte helpers in generic code")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/922bdab3a220781bae2360ff3dd5adb7fe4d34f1.1635226743.git.christophe.leroy@csgroup.eu
2021-10-28 00:41:28 +11:00
Michael Ellerman b7472e1764 Revert "powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC"
This reverts commit 566af8cda3.

This caused some conflicts vs the audit tree, and the audit maintainers
would prefer we postpone this to the next merge window so we have more
time for testing.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-10-27 22:30:32 +11:00
Christophe Leroy c7d19189d7 powerpc/32: Don't use a struct based type for pte_t
Long time ago we had a config item called STRICT_MM_TYPECHECKS
to build the kernel with pte_t defined as a structure in order
to perform additional build checks or build it with pte_t
defined as a simple type in order to get simpler generated code.

Commit 670eea9241 ("powerpc/mm: Always use STRICT_MM_TYPECHECKS")
made the struct based definition the only one, considering that the
generated code was similar in both cases.

That's right on ppc64 because the ABI is such that the content of a
struct having a single simple type element is passed as register,
but on ppc32 such a structure is passed via the stack like any
structure.

Simple test function:

	pte_t test(pte_t pte)
	{
		return pte;
	}

Before this patch we get

	c00108ec <test>:
	c00108ec:	81 24 00 00 	lwz     r9,0(r4)
	c00108f0:	91 23 00 00 	stw     r9,0(r3)
	c00108f4:	4e 80 00 20 	blr

So, for PPC32, restore the simple type behaviour we got before
commit 670eea9241, but instead of adding a config option to
activate type check, do it when __CHECKER__ is set so that type
checking is performed by 'sparse' and provides feedback like:

	arch/powerpc/mm/pgtable.c:466:16: warning: incorrect type in return expression (different base types)
	arch/powerpc/mm/pgtable.c:466:16:    expected unsigned long
	arch/powerpc/mm/pgtable.c:466:16:    got struct pte_t [usertype] x

With this patch we now get

	c0010890 <test>:
	c0010890:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Define STRICT_MM_TYPECHECKS rather than repeating the condition]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c904599f33aaf6bb7ee2836a9ff8368509e0d78d.1631887042.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:06 +11:00
Christophe Leroy 63f501e07a powerpc/8xx: Simplify TLB handling
In the old days, TLB handling for 8xx was using tlbie and tlbia
instructions directly as much as possible.

But commit f048aace29 ("powerpc/mm: Add SMP support to no-hash
TLB handling") broke that by introducing out-of-line unnecessary
complex functions for booke/smp which don't have tlbie/tlbia
instructions and require more complex handling.

Restore direct use of tlbie and tlbia for 8xx which is never SMP.

With this patch we now get

	c00ecc68 <ptep_clear_flush>:
	c00ecc68:	39 00 00 00 	li      r8,0
	c00ecc6c:	81 46 00 00 	lwz     r10,0(r6)
	c00ecc70:	91 06 00 00 	stw     r8,0(r6)
	c00ecc74:	7c 00 2a 64 	tlbie   r5,r0
	c00ecc78:	7c 00 04 ac 	hwsync
	c00ecc7c:	91 43 00 00 	stw     r10,0(r3)
	c00ecc80:	4e 80 00 20 	blr

Before it was

	c0012880 <local_flush_tlb_page>:
	c0012880:	2c 03 00 00 	cmpwi   r3,0
	c0012884:	41 82 00 54 	beq     c00128d8 <local_flush_tlb_page+0x58>
	c0012888:	81 22 00 00 	lwz     r9,0(r2)
	c001288c:	81 43 00 20 	lwz     r10,32(r3)
	c0012890:	39 29 00 01 	addi    r9,r9,1
	c0012894:	91 22 00 00 	stw     r9,0(r2)
	c0012898:	2c 0a 00 00 	cmpwi   r10,0
	c001289c:	41 82 00 10 	beq     c00128ac <local_flush_tlb_page+0x2c>
	c00128a0:	81 2a 01 dc 	lwz     r9,476(r10)
	c00128a4:	2c 09 ff ff 	cmpwi   r9,-1
	c00128a8:	41 82 00 0c 	beq     c00128b4 <local_flush_tlb_page+0x34>
	c00128ac:	7c 00 22 64 	tlbie   r4,r0
	c00128b0:	7c 00 04 ac 	hwsync
	c00128b4:	81 22 00 00 	lwz     r9,0(r2)
	c00128b8:	39 29 ff ff 	addi    r9,r9,-1
	c00128bc:	2c 09 00 00 	cmpwi   r9,0
	c00128c0:	91 22 00 00 	stw     r9,0(r2)
	c00128c4:	4c a2 00 20 	bclr+   4,eq
	c00128c8:	81 22 00 70 	lwz     r9,112(r2)
	c00128cc:	71 29 00 04 	andi.   r9,r9,4
	c00128d0:	4d 82 00 20 	beqlr
	c00128d4:	48 65 76 74 	b       c0669f48 <preempt_schedule>
	c00128d8:	81 22 00 00 	lwz     r9,0(r2)
	c00128dc:	39 29 00 01 	addi    r9,r9,1
	c00128e0:	91 22 00 00 	stw     r9,0(r2)
	c00128e4:	4b ff ff c8 	b       c00128ac <local_flush_tlb_page+0x2c>
...
	c00ecdc8 <ptep_clear_flush>:
	c00ecdc8:	94 21 ff f0 	stwu    r1,-16(r1)
	c00ecdcc:	39 20 00 00 	li      r9,0
	c00ecdd0:	93 c1 00 08 	stw     r30,8(r1)
	c00ecdd4:	83 c6 00 00 	lwz     r30,0(r6)
	c00ecdd8:	91 26 00 00 	stw     r9,0(r6)
	c00ecddc:	93 e1 00 0c 	stw     r31,12(r1)
	c00ecde0:	7c 08 02 a6 	mflr    r0
	c00ecde4:	7c 7f 1b 78 	mr      r31,r3
	c00ecde8:	7c 83 23 78 	mr      r3,r4
	c00ecdec:	7c a4 2b 78 	mr      r4,r5
	c00ecdf0:	90 01 00 14 	stw     r0,20(r1)
	c00ecdf4:	4b f2 5a 8d 	bl      c0012880 <local_flush_tlb_page>
	c00ecdf8:	93 df 00 00 	stw     r30,0(r31)
	c00ecdfc:	7f e3 fb 78 	mr      r3,r31
	c00ece00:	80 01 00 14 	lwz     r0,20(r1)
	c00ece04:	83 c1 00 08 	lwz     r30,8(r1)
	c00ece08:	83 e1 00 0c 	lwz     r31,12(r1)
	c00ece0c:	7c 08 03 a6 	mtlr    r0
	c00ece10:	38 21 00 10 	addi    r1,r1,16
	c00ece14:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fb324f1c8f2ddb57cf6aad1cea26329558f1c1c0.1631887021.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:05 +11:00
Christophe Leroy 5c810ced36 powerpc/32: Add support for out-of-line static calls
Add support for out-of-line static calls on PPC32. This change
improve performance of calls to global function pointers by
using direct calls instead of indirect calls.

The trampoline is initialy populated with a 'blr' or branch to target,
followed by an unreachable long jump sequence.

In order to cater with parallele execution, the trampoline needs to
be updated in a way that ensures it remains consistent at all time.
This means we can't use the traditional lis/addi to load r12 with
the target address, otherwise there would be a window during which
the first instruction contains the upper part of the new target
address while the second instruction still contains the lower part of
the old target address. To avoid that the target address is stored
just after the 'bctr' and loaded from there with a single instruction.

Then, depending on the target distance, arch_static_call_transform()
will either replace the first instruction by a direct 'bl <target>' or
'nop' in order to have the trampoline fall through the long jump
sequence.

For the special case of __static_call_return0(), to avoid the risk of
a far branch, a version of it is inlined at the end of the trampoline.

Performancewise the long jump sequence is probably not better than
the indirect calls set by GCC when we don't use static calls, but
such calls are unlikely to be required on powerpc32: With most
configurations the kernel size is far below 32 Mbytes so only
modules may happen to be too far. And even modules are likely to
be close enough as they are allocated below the kernel core and
as close as possible of the kernel text.

static_call selftest is running successfully with this change.

With this patch, __do_irq() has the following sequence to trace
irq entries:

	c0004a00 <__SCT__tp_func_irq_entry>:
	c0004a00:	48 00 00 e0 	b       c0004ae0 <__traceiter_irq_entry>
	c0004a04:	3d 80 c0 00 	lis     r12,-16384
	c0004a08:	81 8c 4a 1c 	lwz     r12,18972(r12)
	c0004a0c:	7d 89 03 a6 	mtctr   r12
	c0004a10:	4e 80 04 20 	bctr
	c0004a14:	38 60 00 00 	li      r3,0
	c0004a18:	4e 80 00 20 	blr
	c0004a1c:	00 00 00 00 	.long 0x0
...
	c0005654 <__do_irq>:
...
	c0005664:	7c 7f 1b 78 	mr      r31,r3
...
	c00056a0:	81 22 00 00 	lwz     r9,0(r2)
	c00056a4:	39 29 00 01 	addi    r9,r9,1
	c00056a8:	91 22 00 00 	stw     r9,0(r2)
	c00056ac:	3d 20 c0 af 	lis     r9,-16209
	c00056b0:	81 29 74 cc 	lwz     r9,29900(r9)
	c00056b4:	2c 09 00 00 	cmpwi   r9,0
	c00056b8:	41 82 00 10 	beq     c00056c8 <__do_irq+0x74>
	c00056bc:	80 69 00 04 	lwz     r3,4(r9)
	c00056c0:	7f e4 fb 78 	mr      r4,r31
	c00056c4:	4b ff f3 3d 	bl      c0004a00 <__SCT__tp_func_irq_entry>

Before this patch, __do_irq() was doing the following to trace irq
entries:

	c0005700 <__do_irq>:
...
	c0005710:	7c 7e 1b 78 	mr      r30,r3
...
	c000574c:	93 e1 00 0c 	stw     r31,12(r1)
	c0005750:	81 22 00 00 	lwz     r9,0(r2)
	c0005754:	39 29 00 01 	addi    r9,r9,1
	c0005758:	91 22 00 00 	stw     r9,0(r2)
	c000575c:	3d 20 c0 af 	lis     r9,-16209
	c0005760:	83 e9 f4 cc 	lwz     r31,-2868(r9)
	c0005764:	2c 1f 00 00 	cmpwi   r31,0
	c0005768:	41 82 00 24 	beq     c000578c <__do_irq+0x8c>
	c000576c:	81 3f 00 00 	lwz     r9,0(r31)
	c0005770:	80 7f 00 04 	lwz     r3,4(r31)
	c0005774:	7d 29 03 a6 	mtctr   r9
	c0005778:	7f c4 f3 78 	mr      r4,r30
	c000577c:	4e 80 04 21 	bctrl
	c0005780:	85 3f 00 0c 	lwzu    r9,12(r31)
	c0005784:	2c 09 00 00 	cmpwi   r9,0
	c0005788:	40 82 ff e4 	bne     c000576c <__do_irq+0x6c>

Behind the fact of now using a direct 'bl' instead of a
'load/mtctr/bctr' sequence, we can also see that we get one less
register on the stack.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6ec2a7865ed6a5ec54ab46d026785bafe1d837ea.1630484892.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:05 +11:00
Christophe Leroy 8f7fadb4ba powerpc/machdep: Remove stale functions from ppc_md structure
ppc_md.iommu_save() is not set anymore by any platform after
commit c40785ad30 ("powerpc/dart: Use a cachable DART").
So iommu_save() has become a nop and can be removed.

ppc_md.show_percpuinfo() is not set anymore by any platform after
commit 4350147a81 ("[PATCH] ppc64: SMU based macs cpufreq support").

Last users of ppc_md.rtc_read_val() and ppc_md.rtc_write_val() were
removed by commit 0f03a43b8f ("[POWERPC] Remove todc code from
ARCH=powerpc")

Last user of kgdb_map_scc() was removed by commit 17ce452f7e ("kgdb,
powerpc: arch specific powerpc kgdb support").

ppc.machine_kexec_prepare() has not been used since
commit 8ee3e0d696 ("powerpc: Remove the main legacy iSerie platform
code"). This allows the removal of machine_kexec_prepare() and the
rename of default_machine_kexec_prepare() into machine_kexec_prepare()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Daniel Axtens <dja@axtens.net>
[mpe: Drop prototype for default_machine_kexec_prepare() as noted by dja]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24d4ca0ada683c9436a5f812a7aeb0a1362afa2b.1630398606.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:05 +11:00
Christophe Leroy 566af8cda3 powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC
Commit e65e1fc2d2 ("[PATCH] syscall class hookup for all normal
targets") added generic support for AUDIT but that didn't include
support for bi-arch like powerpc.

Commit 4b58841149 ("audit: Add generic compat syscall support")
added generic support for bi-arch.

Convert powerpc to that bi-arch generic audit support.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a4b3951d1191d4183d92a07a6097566bde60d00a.1629812058.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:04 +11:00
Christophe Leroy a85c728cb5 powerpc/32: Don't use lmw/stmw for saving/restoring non volatile regs
Instructions lmw/stmw are interesting for functions that are rarely
used and not in the cache, because only one instruction is to be
copied into the instruction cache instead of 19. However those
instruction are less performant than 19x raw lwz/stw as they require
synchronisation plus one additional cycle.

SAVE_NVGPRS / REST_NVGPRS are used in only a few places which are
mostly in interrupts entries/exits and in task switch so they are
likely already in the cache.

Using standard lwz improves null_syscall selftest by:
- 10 cycles on mpc832x.
- 2 cycles on mpc8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/316c543b8906712c108985c8463eec09c8db577b.1629732542.git.christophe.leroy@csgroup.eu
2021-10-22 15:22:04 +11:00
Joel Stanley 4f703e7faa powerpc/s64: Clarify that radix lacks DEBUG_PAGEALLOC
The page_alloc.c code will call into __kernel_map_pages() when
DEBUG_PAGEALLOC is configured and enabled.

As the implementation assumes hash, this should crash spectacularly if
not for a bit of luck in __kernel_map_pages(). In this function
linear_map_hash_count is always zero, the for loop exits without doing
any damage.

There are no other platforms that determine if they support
debug_pagealloc at runtime. Instead of adding code to mm/page_alloc.c to
do that, this change turns the map/unmap into a noop when in radix
mode and prints a warning once.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Reformat if per Christophe's suggestion]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211013213438.675095-1-joel@jms.id.au
2021-10-22 15:22:02 +11:00
Len Baker 5dfbbb668a KVM: PPC: Replace zero-length array with flexible array member
There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use "flexible array members" [1] for these cases. The
older style of one-element or zero-length arrays should no longer be
used[2].

Also, make use of the struct_size() helper in kzalloc().

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Signed-off-by: Len Baker <len.baker@gmx.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-10-20 18:30:42 -05:00
Kees Cook 42a20f86dc sched: Add wrapper for get_wchan() to keep task blocked
Having a stable wchan means the process must be blocked and for it to
stay that way while performing stack unwinding.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm]
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org
2021-10-15 11:25:14 +02:00
Nicholas Piggin 322fda0405 KVM: PPC: Book3S HV: H_ENTER filter out reserved HPTE[B] value
The HPTE B field is a 2-bit field with values 0b10 and 0b11 reserved.
This field is also taken from the HPTE and used when KVM executes
TLBIEs to set the B field of those instructions.

Disallow the guest setting B to a reserved value with H_ENTER by
rejecting it. This is the same approach already taken for rejecting
reserved (unsupported) LLP values. This prevents the guest from being
able to induce the host to execute TLBIE with reserved values, which
is not known to be a problem with current processors but in theory it
could prevent the TLBIE from working correctly in a future processor.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211004145749.1331331-1-npiggin@gmail.com
2021-10-13 13:08:16 +11:00
Uwe Kleine-König 5a72431ec3 powerpc/eeh: Use dev_driver_string() instead of struct pci_dev->driver->name
Replace pdev->driver->name by dev_driver_string() for the corresponding
struct device.  This is a step toward removing pci_dev->driver.

Move the function nearer its only user and instead of the ?: operator use a
normal "if" which is more readable.

Link: https://lore.kernel.org/r/20211004125935.2300113-6-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-10-12 17:50:12 -05:00
Athira Rajeev 29908bbf7b powerpc/perf: Expose instruction and data address registers as part of extended regs
Patch adds support to include Sampled Instruction Address Register
(SIAR) and Sampled Data Address Register (SDAR) SPRs as part of extended
registers. Update the definition of PERF_REG_PMU_MASK_300/31 and
PERF_REG_EXTENDED_MAX to include these SPR's.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Kajol Jain<kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211007065505.27809-4-atrajeev@linux.vnet.ibm.com
2021-10-12 18:39:02 +11:00
Athira Rajeev 02b182e674 powerpc/perf: Refactor the code definition of perf reg extended mask
PERF_REG_PMU_MASK_300 and PERF_REG_PMU_MASK_31 defines the mask
value for extended registers. Current definition of these mask values
uses hex constant and does not use registers by name, making it less
readable. Patch refactor the macro values by or'ing together the actual
register value constants. Also include PERF_REG_EXTENDED_MAX as
part of enum definition.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Kajol Jain<kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211007065505.27809-2-atrajeev@linux.vnet.ibm.com
2021-10-12 18:38:50 +11:00
Nathan Lynch fda0eb2200 powerpc/paravirt: correct preempt debug splat in vcpu_is_preempted()
vcpu_is_preempted() can be used outside of preempt-disabled critical
sections, yielding warnings such as:

BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185
caller is rwsem_spin_on_owner+0x1cc/0x2d0
CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33
Call Trace:
[c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable)
[c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160
[c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0
[c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0
[c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0
[c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0
[c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70
[c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0
[c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250

The result of vcpu_is_preempted() is always used speculatively, and the
function does not access per-cpu resources in a (Linux) preempt-unsafe way.
Use raw_smp_processor_id() to avoid such warnings, adding explanatory
comments.

Fixes: ca3f969dcb ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210928214147.312412-3-nathanl@linux.ibm.com
2021-10-09 00:15:59 +11:00
Nathan Lynch 799f9b51db powerpc/paravirt: vcpu_is_preempted() commentary
Add comments more clearly documenting that this function determines whether
hypervisor-level preemption of the VM has occurred.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210928214147.312412-2-nathanl@linux.ibm.com
2021-10-09 00:15:59 +11:00
Nick Desaulniers 2a24d80fc8 powerpc/asm: Remove UPD_CONSTR after GCC 4.9 removal
UPD_CONSTR was previously a preprocessor define for an old GCC 4.9
inline asm bug with m<> constraints.

Fixes: 6563139d90 ("powerpc: remove GCC version check for UPD_CONSTR")
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210914161712.2463458-1-ndesaulniers@google.com
2021-10-09 00:15:59 +11:00
Christophe Leroy d93f9e2374 powerpc/32s: Fix kuap_kernel_restore()
At interrupt exit, kuap_kernel_restore() calls kuap_unlock() with the
value contained in regs->kuap. However, when regs->kuap contains
0xffffffff it means that KUAP was not unlocked so calling kuap_unlock()
is unrelevant and results in jeopardising the contents of kernel space
segment registers.

So check that regs->kuap doesn't contain KUAP_NONE before calling
kuap_unlock(). In the meantime it also means that if KUAP has not
been correcly locked back at interrupt exit, it must be locked
before continuing. This is done by checking the content of
current->thread.kuap which was returned by kuap_get_and_assert_locked()

Fixes: 16132529ce ("powerpc/32s: Rework Kernel Userspace Access Protection")
Reported-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0d0c4d0f050a637052287c09ba521bad960a2790.1631715131.git.christophe.leroy@csgroup.eu
2021-10-07 23:34:00 +11:00
Nicholas Piggin f08fb25bc6 powerpc/64s: Fix unrecoverable MCE calling async handler from NMI
The machine check handler is not considered NMI on 64s. The early
handler is the true NMI handler, and then it schedules the
machine_check_exception handler to run when interrupts are enabled.

This works fine except the case of an unrecoverable MCE, where the true
NMI is taken when MSR[RI] is clear, it can not recover, so it calls
machine_check_exception directly so something might be done about it.

Calling an async handler from NMI context can result in irq state and
other things getting corrupted. This can also trigger the BUG at
  arch/powerpc/include/asm/interrupt.h:168
  BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE));

Fix this by making an _async version of the handler which is called
in the normal case, and a NMI version that is called for unrecoverable
interrupts.

Fixes: 2b43dd7653 ("powerpc/64: enable MSR[EE] in irq replay pt_regs")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211004145642.1331214-6-npiggin@gmail.com
2021-10-07 19:54:55 +11:00
Nicholas Piggin 768c470103 powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG
If a NMI hits early in an interrupt handler before the irq soft-mask
state is reconciled, that can cause a false-positive BUG with a
CONFIG_PPC_IRQ_SOFT_MASK_DEBUG assertion.

Remove that assertion and instead check the case that if regs->msr has
EE clear, then regs->softe should be marked as disabled so the irq state
looks correct to NMI handlers, the same as how it's fixed up in the
case it was implicit soft-masked.

This doesn't fix a known problem -- the change that was fixed by commit
4ec5feec1a ("powerpc/64s: Make NMI record implicitly soft-masked code
as irqs disabled") was the addition of a warning in the soft-nmi
watchdog interrupt which can never actually fire when MSR[EE]=0. However
it may be important if NMI handlers grow more code, and it's less
surprising to anything using 'regs' - (I tripped over this when working
in the area).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211004145642.1331214-5-npiggin@gmail.com
2021-10-07 19:54:55 +11:00
Naveen N. Rao 030905920f powerpc/security: Add a helper to query stf_barrier type
Add a helper to return the stf_barrier type for the current processor.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3bd5d7f96ea1547991ac2ce3137dc2b220bae285.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07 19:52:58 +11:00
Naveen N. Rao 4549c3ea31 powerpc/lib: Add helper to check if offset is within conditional branch range
Add a helper to check if a given offset is within the branch range for a
powerpc conditional branch instruction, and update some sites to use the
new helper.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/442b69a34ced32ca346a0d9a855f3f6cfdbbbd41.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07 18:52:33 +11:00
Tom Lendacky e9d1d2bb75 treewide: Replace the use of mem_encrypt_active() with cc_platform_has()
Replace uses of mem_encrypt_active() with calls to cc_platform_has() with
the CC_ATTR_MEM_ENCRYPT attribute.

Remove the implementation of mem_encrypt_active() across all arches.

For s390, since the default implementation of the cc_platform_has()
matches the s390 implementation of mem_encrypt_active(), cc_platform_has()
does not need to be implemented in s390 (the config option
ARCH_HAS_CC_PLATFORM is not set).

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210928191009.32551-9-bp@alien8.de
2021-10-04 11:47:24 +02:00
Masami Hiramatsu adf8a61a94 kprobes: treewide: Make it harder to refer kretprobe_trampoline directly
Since now there is kretprobe_trampoline_addr() for referring the
address of kretprobe trampoline code, we don't need to access
kretprobe_trampoline directly.

Make it harder to refer by renaming it to __kretprobe_trampoline().

Link: https://lkml.kernel.org/r/163163045446.489837.14510577516938803097.stgit@devnote2

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30 21:24:06 -04:00
Ard Biesheuvel 336868afba powerpc: smp: remove hack to obtain offset of task_struct::cpu
Instead of relying on awful hacks to obtain the offset of the cpu field
in struct task_struct, move it back into struct thread_info, which does
not create the same level of circular dependency hell when trying to
include the header file that defines it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2021-09-30 16:13:11 +02:00
Ard Biesheuvel 227d735d88 powerpc: add CPU field to struct thread_info
The CPU field will be moved back into thread_info even when
THREAD_INFO_IN_TASK is enabled, so add it back to powerpc's definition
of struct thread_info.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2021-09-30 16:13:10 +02:00
Juergen Gross a1c42ddedf kvm: rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS
KVM_MAX_VCPU_ID is not specifying the highest allowed vcpu-id, but the
number of allowed vcpu-ids. This has already led to confusion, so
rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS to make its semantics more
clear

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210913135745.13944-3-jgross@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30 04:27:05 -04:00
Peter Collingbourne 7962c2eddb
arch: remove unused function syscall_set_arguments()
This function appears to have been unused since it was first introduced in
commit 828c365cc8 ("tracehook: asm/syscall.h").

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/I8ce04f002903a37c0b6c1d16e9b2a3afa716c097
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-09-14 16:06:20 +02:00
Linus Torvalds 316346243b Merge branch 'gcc-min-version-5.1' (make gcc-5.1 the minimum version)
Merge patch series from Nick Desaulniers to update the minimum gcc
version to 5.1.

This is some of the left-overs from the merge window that I didn't want
to deal with yesterday, so it comes in after -rc1 but was sent before.

Gcc-4.9 support has been an annoyance for some time, and with -Werror I
had the choice of applying a fairly big patch from Kees Cook to remove a
fair number of initializer warnings (still leaving some), or this patch
series from Nick that just removes the source of the problem.

The initializer cleanups might still be worth it regardless, but
honestly, I preferred just tackling the problem with gcc-4.9 head-on.
We've been more aggressiuve about no longer having to care about
compilers that were released a long time ago, and I think it's been a
good thing.

I added a couple of patches on top to sort out a few left-overs now that
we no longer support gcc-4.x.

As noted by Arnd, as a result of this minimum compiler version upgrade
we can probably change our use of '--std=gnu89' to '--std=gnu11', and
finally start using local loop declarations etc.  But this series does
_not_ yet do that.

Link: https://lore.kernel.org/all/20210909182525.372ee687@canb.auug.org.au/
Link: https://lore.kernel.org/lkml/CAK7LNASs6dvU6D3jL2GG3jW58fXfaj6VNOe55NJnTB8UPuk2pA@mail.gmail.com/
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

* emailed patches from Nick Desaulniers <ndesaulniers@google.com>:
  Drop some straggling mentions of gcc-4.9 as being stale
  compiler_attributes.h: drop __has_attribute() support for gcc4
  vmlinux.lds.h: remove old check for GCC 4.9
  compiler-gcc.h: drop checks for older GCC versions
  Makefile: drop GCC < 5 -fno-var-tracking-assignments workaround
  arm64: remove GCC version check for ARCH_SUPPORTS_INT128
  powerpc: remove GCC version check for UPD_CONSTR
  riscv: remove Kconfig check for GCC version for ARCH_RV64I
  Kconfig.debug: drop GCC 5+ version check for DWARF5
  mm/ksm: remove old GCC 4.9+ check
  compiler.h: drop fallback overflow checkers
  Documentation: raise minimum supported version of GCC to 5.1
2021-09-13 10:43:04 -07:00
Nick Desaulniers 6563139d90 powerpc: remove GCC version check for UPD_CONSTR
Now that GCC 5.1 is the minimum supported version, we can drop this
workaround for older versions of GCC. This adversely affected clang,
too.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-13 10:18:28 -07:00
Arnd Bergmann a7a08b275a arch: remove compat_alloc_user_space
All users of compat_alloc_user_space() and copy_in_user() have been
removed from the kernel, only a few functions in sparc remain that can be
changed to calling arch_copy_in_user() instead.

Link: https://lkml.kernel.org/r/20210727144859.4150043-7-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 15:32:35 -07:00
Linus Torvalds 192ad3c27a ARM:
- Page ownership tracking between host EL1 and EL2
 
 - Rely on userspace page tables to create large stage-2 mappings
 
 - Fix incompatibility between pKVM and kmemleak
 
 - Fix the PMU reset state, and improve the performance of the virtual PMU
 
 - Move over to the generic KVM entry code
 
 - Address PSCI reset issues w.r.t. save/restore
 
 - Preliminary rework for the upcoming pKVM fixed feature
 
 - A bunch of MM cleanups
 
 - a vGIC fix for timer spurious interrupts
 
 - Various cleanups
 
 s390:
 
 - enable interpretation of specification exceptions
 
 - fix a vcpu_idx vs vcpu_id mixup
 
 x86:
 
 - fast (lockless) page fault support for the new MMU
 
 - new MMU now the default
 
 - increased maximum allowed VCPU count
 
 - allow inhibit IRQs on KVM_RUN while debugging guests
 
 - let Hyper-V-enabled guests run with virtualized LAPIC as long as they
   do not enable the Hyper-V "AutoEOI" feature
 
 - fixes and optimizations for the toggling of AMD AVIC (virtualized LAPIC)
 
 - tuning for the case when two-dimensional paging (EPT/NPT) is disabled
 
 - bugfixes and cleanups, especially with respect to 1) vCPU reset and
   2) choosing a paging mode based on CR0/CR4/EFER
 
 - support for 5-level page table on AMD processors
 
 Generic:
 
 - MMU notifier invalidation callbacks do not take mmu_lock unless necessary
 
 - improved caching of LRU kvm_memory_slot
 
 - support for histogram statistics
 
 - add statistics for halt polling and remote TLB flush requests
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmE2CIAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMyqwf+Ky2WoThuQ9Ra0r/m8pUTAx5+gsAf
 MmG24rNLE+26X0xuBT9Q5+etYYRLrRTWJvo5cgHooz7muAYW6scR+ho5xzvLTAxi
 DAuoijkXsSdGoFCp0OMUHiwG3cgY5N7feTEwLPAb2i6xr/l6SZyCP4zcwiiQbJ2s
 UUD0i3rEoNQ02/hOEveud/ENxzUli9cmmgHKXR3kNgsJClSf1fcuLnhg+7EGMhK9
 +c2V+hde5y0gmEairQWm22MLMRolNZ5NL4kjykiNh2M5q9YvbHe5+f/JmENlNZMT
 bsUQT6Ry1ukuJ0V59rZvUw71KknPFzZ3d6HgW4pwytMq6EJKiISHzRbVnQ==
 =FCAB
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - Page ownership tracking between host EL1 and EL2
   - Rely on userspace page tables to create large stage-2 mappings
   - Fix incompatibility between pKVM and kmemleak
   - Fix the PMU reset state, and improve the performance of the virtual
     PMU
   - Move over to the generic KVM entry code
   - Address PSCI reset issues w.r.t. save/restore
   - Preliminary rework for the upcoming pKVM fixed feature
   - A bunch of MM cleanups
   - a vGIC fix for timer spurious interrupts
   - Various cleanups

  s390:
   - enable interpretation of specification exceptions
   - fix a vcpu_idx vs vcpu_id mixup

  x86:
   - fast (lockless) page fault support for the new MMU
   - new MMU now the default
   - increased maximum allowed VCPU count
   - allow inhibit IRQs on KVM_RUN while debugging guests
   - let Hyper-V-enabled guests run with virtualized LAPIC as long as
     they do not enable the Hyper-V "AutoEOI" feature
   - fixes and optimizations for the toggling of AMD AVIC (virtualized
     LAPIC)
   - tuning for the case when two-dimensional paging (EPT/NPT) is
     disabled
   - bugfixes and cleanups, especially with respect to vCPU reset and
     choosing a paging mode based on CR0/CR4/EFER
   - support for 5-level page table on AMD processors

  Generic:
   - MMU notifier invalidation callbacks do not take mmu_lock unless
     necessary
   - improved caching of LRU kvm_memory_slot
   - support for histogram statistics
   - add statistics for halt polling and remote TLB flush requests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits)
  KVM: Drop unused kvm_dirty_gfn_invalid()
  KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
  KVM: MMU: mark role_regs and role accessors as maybe unused
  KVM: MIPS: Remove a "set but not used" variable
  x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait
  KVM: stats: Add VM stat for remote tlb flush requests
  KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()
  KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page
  KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality
  Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
  KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page
  kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
  kvm: x86: Increase MAX_VCPUS to 1024
  kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
  KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host
  KVM: s390: index kvm->arch.idle_mask by vcpu_idx
  KVM: s390: Enable specification exception interpretation
  KVM: arm64: Trim guest debug exception handling
  KVM: SVM: Add 5-level page table support for SVM
  ...
2021-09-07 13:40:51 -07:00
Linus Torvalds 7cca308cfd powerpc updates for 5.15
- Convert pseries & powernv to use MSI IRQ domains.
 
  - Rework the pseries CPU numbering so that CPUs that are removed, and later re-added, are
    given a CPU number on the same node as previously, when possible.
 
  - Add support for a new more flexible device-tree format for specifying NUMA distances.
 
  - Convert powerpc to GENERIC_PTDUMP.
 
  - Retire sbc8548 and sbc8641d board support.
 
  - Various other small features and fixes.
 
 Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Anton Blanchard, Cédric Le Goater,
 Christophe Leroy, Emmanuel Gil Peyrot, Fabiano Rosas, Fangrui Song, Finn Thain, Gautham R.
 Shenoy, Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Laurent Dufour, Leonardo
 Bras, Lukas Bulwahn, Marc Zyngier, Masahiro Yamada, Michal Suchanek, Nathan Chancellor,
 Nicholas Piggin, Parth Shah, Paul Gortmaker, Pratik R. Sampat, Randy Dunlap, Sebastian
 Andrzej Siewior, Srikar Dronamraju, Wan Jiabing, Xiongwei Song, Zheng Yongjun.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmEyHTYTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDo3D/9aXMVP2wsEMNB0XhTiJ1UUdi311Uq9
 PvkAaGZH14ZqZLVigeiD3gt6YzTH0cEuGj6qgwsJrPDjF8FESnMbBsprMLr5/qE1
 itWRGMAMCFaeTcB9ogYVJkzwg6RN2ZgIqoq4NVswNSXoAQGWb+1bvXq3RnXXNuGR
 TQmLL02poNC6nX0YbRaQoT1Xx4nfUTiKHhU+Aok9uOCMJIyYZVATR6Qafb7/j7tO
 UvjwOHztbu84lcJOGmSnw4LcmwNORLuP9IwR0r+O1M3ijEZqDo9TPkvtSz8HZwjU
 mxdJwhrUmN0euMcghuiFxW+1XG2eM49ugsdJugiezG2RaIijbIp0nAIvdeaKAgT1
 OSSwvWCQ0fkTPyLXE+O6tVqMhlUMdqQlRcyNwmN9svIip9VnwGNq3vA4ePlJm6Fi
 i0i/tLqVNlJwFokZ7blW5g8SRgGRuFfXd5XUYLFvy5Teez+/7b1mW95gPQZSJ8kV
 Tbx2e0nHAPX4hCAxJ1AB3/zTlnjY+4+WJ9bD5XdgXkeVE8PPh1BEkulhMi1R1OMj
 57D1W6OgsBu/Pze78wjAvwO8+NAb1T/2mv2Bd/LY6Q+7hNDqOOhuajyBTxbH41FG
 sqx5bKjKOwgTybfV9A0Eo0e4FQBX07yXltBFHaPlyA4sOsIhM59+PxNrEwN1eZrQ
 LVVsdBXg8pHxrw==
 =EbN0
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Convert pseries & powernv to use MSI IRQ domains.

 - Rework the pseries CPU numbering so that CPUs that are removed, and
   later re-added, are given a CPU number on the same node as
   previously, when possible.

 - Add support for a new more flexible device-tree format for specifying
   NUMA distances.

 - Convert powerpc to GENERIC_PTDUMP.

 - Retire sbc8548 and sbc8641d board support.

 - Various other small features and fixes.

Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Anton Blanchard,
Cédric Le Goater, Christophe Leroy, Emmanuel Gil Peyrot, Fabiano Rosas,
Fangrui Song, Finn Thain, Gautham R.  Shenoy, Hari Bathini, Joel
Stanley, Jordan Niethe, Kajol Jain, Laurent Dufour, Leonardo Bras, Lukas
Bulwahn, Marc Zyngier, Masahiro Yamada, Michal Suchanek, Nathan
Chancellor, Nicholas Piggin, Parth Shah, Paul Gortmaker, Pratik R.
Sampat, Randy Dunlap, Sebastian Andrzej Siewior, Srikar Dronamraju, Wan
Jiabing, Xiongwei Song, and Zheng Yongjun.

* tag 'powerpc-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (154 commits)
  powerpc/bug: Cast to unsigned long before passing to inline asm
  powerpc/ptdump: Fix generic ptdump for 64-bit
  KVM: PPC: Fix clearing never mapped TCEs in realmode
  powerpc/pseries/iommu: Rename "direct window" to "dma window"
  powerpc/pseries/iommu: Make use of DDW for indirect mapping
  powerpc/pseries/iommu: Find existing DDW with given property name
  powerpc/pseries/iommu: Update remove_dma_window() to accept property name
  powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
  powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
  powerpc/pseries/iommu: Allow DDW windows starting at 0x00
  powerpc/pseries/iommu: Add ddw_list_new_entry() helper
  powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
  powerpc/kernel/iommu: Add new iommu_table_in_use() helper
  powerpc/pseries/iommu: Replace hard-coded page shift
  powerpc/numa: Update cpu_cpu_map on CPU online/offline
  powerpc/numa: Print debug statements only when required
  powerpc/numa: convert printk to pr_xxx
  powerpc/numa: Drop dbg in favour of pr_debug
  powerpc/smp: Enable CACHE domain for shared processor
  powerpc/smp: Update cpu_core_map on all PowerPc systems
  ...
2021-09-03 11:22:50 -07:00
Michael Ellerman a3314262ee Merge branch 'fixes' into next
Merge our fixes branch into next.

That lets us resolve a conflict in arch/powerpc/sysdev/xive/common.c.

Between cbc06f051c ("powerpc/xive: Do not skip CPU-less nodes when
creating the IPIs"), which moved request_irq() out of xive_init_ipis(),
and 17df41fec5 ("powerpc: use IRQF_NO_DEBUG for IPIs") which added
IRQF_NO_DEBUG to that request_irq() call, which has now moved.
2021-09-03 22:54:12 +10:00
Michael Ellerman e432fe97f3 powerpc/bug: Cast to unsigned long before passing to inline asm
In commit 1e688dd2a3 ("powerpc/bug: Provide better flexibility to
WARN_ON/__WARN_FLAGS() with asm goto") we changed WARN_ON(). Previously
it would take the warning condition, x, and double negate it before
converting the result to int, and passing that int to the underlying
inline asm. ie:

  #define WARN_ON(x) ({
  	int __ret_warn_on = !!(x);
  	if (__builtin_constant_p(__ret_warn_on)) {
  	...
  	} else {
  		BUG_ENTRY(PPC_TLNEI " %4, 0",
  			  BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),
  			  "r" (__ret_warn_on));

The asm then does a full register width comparison with zero and traps
if it is non-zero (PPC_TLNEI).

The new code instead passes the full expression, x, with some arbitrary
type, to the inline asm:

  #define WARN_ON(x) ({
	...
	do {
		if (__builtin_constant_p((x))) {
		...
		} else {
			...
			WARN_ENTRY(PPC_TLNEI " %4, 0",
				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),
				   __label_warn_on, "r" (x));

As reported[1] by Nathan, when building with clang this can cause
spurious warnings to fire repeatedly at boot:

  WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
  NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
  REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
  MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
  CFAR: c00000000090a034 IRQMASK: 0
  GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
  GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
  GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
  GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
  GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
  NIP .klist_add_tail+0x3c/0x110
  LR  .bus_add_driver+0x148/0x290
  Call Trace:
    0xc0000000073c35d0 (unreliable)
    .bus_add_driver+0x148/0x290
    .driver_register+0xb8/0x190
    .__hid_register_driver+0x70/0xd0
    .redragon_driver_init+0x34/0x58
    .do_one_initcall+0x130/0x3b0
    .do_initcall_level+0xd8/0x188
    .do_initcalls+0x7c/0xdc
    .kernel_init_freeable+0x178/0x21c
    .kernel_init+0x34/0x220
    .ret_from_kernel_thread+0x58/0x60
  Instruction dump:
  fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
  fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024

The instruction dump shows that we are trapping because r30 is not zero:
  tdnei   r30,0

Where r30 = c000000007de72c8

The WARN_ON() comes from:

  static void knode_set_klist(struct klist_node *knode, struct klist *klist)
  {
  	knode->n_klist = klist;
  	/* no knode deserves to start its life dead */
  	WARN_ON(knode_dead(knode));
  		      ^^^^^^^^^^^^^^^^^

Where:

  #define KNODE_DEAD		1LU

  static bool knode_dead(struct klist_node *knode)
  {
  	return (unsigned long)knode->n_klist & KNODE_DEAD;
  }

The full disassembly shows that clang has not generated any code to
apply the "& KNODE_DEAD" to the n_klist pointer, which is surprising.

Nathan filed an LLVM bug [2], in which Eli Friedman explained that clang
believes it is only passing a single bit to the asm (ie. a bool) and so
the mask of bit 0 with 1 can be omitted, and suggested that if we want
the full 64-bit value passed to the inline asm we should cast to a
64-bit type (or 32-bit on 32-bits).

In fact we already do that for BUG_ENTRY(), which was added to fix a
possibly similar bug in 2005 in commit 32818c2eb6 ("[PATCH] ppc64: Fix
issue with gcc 4.0 compiled kernels").

So cast the value we pass to the inline asm to long.

For GCC this appears to have no effect on code generation, other than
causing sign extension in some cases.

[1]: http://lore.kernel.org/r/YSa1O4fcX1nNKqN/@Ryzen-9-3900X.localdomain
[2]: https://bugs.llvm.org/show_bug.cgi?id=51634

Fixes: 1e688dd2a3 ("powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210901112522.1085134-1-mpe@ellerman.id.au
2021-09-01 21:25:43 +10:00
Jakub Kicinski 97c78d0af5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 17:57:57 -07:00
Leonardo Bras 3c33066a21 powerpc/kernel/iommu: Add new iommu_table_in_use() helper
Having a function to check if the iommu table has any allocation helps
deciding if a tbl can be reset for using a new DMA window.

It should be enough to replace all instances of !bitmap_empty(tbl...).

iommu_table_in_use() skips reserved memory, so we don't need to worry about
releasing it before testing. This causes iommu_table_release_pages() to
become unnecessary, given it is only used to remove reserved memory for
testing.

Also, only allow storing reserved memory values in tbl if they are valid
in the table, so there is no need to check it in the new helper.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817063929.38701-3-leobras.c@gmail.com
2021-08-27 00:56:54 +10:00
Leonardo Bras 0c634bafe3 powerpc/pseries/iommu: Replace hard-coded page shift
Some functions assume IOMMU page size can only be 4K (pageshift == 12).
Update them to accept any page size passed, so we can use 64K pages.

In the process, some defines like TCE_SHIFT were made obsolete, and then
removed.

IODA3 Revision 3.0_prd1 (OpenPowerFoundation), Figures 3.4 and 3.5 show
a RPN of 52-bit, and considers a 12-bit pageshift, so there should be
no need of using TCE_RPN_MASK, which masks out any bit after 40 in rpn.
It's usage removed from tce_build_pSeries(), tce_build_pSeriesLP(), and
tce_buildmulti_pSeriesLP().

Most places had a tbl struct, so using tbl->it_page_shift was simple.
tce_free_pSeriesLP() was a special case, since callers not always have a
tbl struct, so adding a tceshift parameter seems the right thing to do.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210817063929.38701-2-leobras.c@gmail.com
2021-08-27 00:56:54 +10:00
Srikar Dronamraju 9a245d0e1f powerpc/numa: Update cpu_cpu_map on CPU online/offline
cpu_cpu_map holds all the CPUs in the DIE. However in PowerPC, when
onlining/offlining of CPUs, this mask doesn't get updated.  This mask
is however updated when CPUs are added/removed. So when both
operations like online/offline of CPUs and adding/removing of CPUs are
done simultaneously, then cpumaps end up broken.

WARNING: CPU: 13 PID: 1142 at kernel/sched/topology.c:898
build_sched_domains+0xd48/0x1720
Modules linked in: rpadlpar_io rpaphp mptcp_diag xsk_diag tcp_diag
udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag
bonding tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set
rfkill nf_tables nfnetlink pseries_rng xts vmx_crypto uio_pdrv_genirq
uio binfmt_misc ip_tables xfs libcrc32c dm_service_time sd_mod t10_pi sg
ibmvfc scsi_transport_fc ibmveth dm_multipath dm_mirror dm_region_hash
dm_log dm_mod fuse
CPU: 13 PID: 1142 Comm: kworker/13:2 Not tainted 5.13.0-rc6+ #28
Workqueue: events cpuset_hotplug_workfn
NIP:  c0000000001caac8 LR: c0000000001caac4 CTR: 00000000007088ec
REGS: c00000005596f220 TRAP: 0700   Not tainted  (5.13.0-rc6+)
MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48828222  XER:
00000009
CFAR: c0000000001ea698 IRQMASK: 0
GPR00: c0000000001caac4 c00000005596f4c0 c000000001c4a400 0000000000000036
GPR04: 00000000fffdffff c00000005596f1d0 0000000000000027 c0000018cfd07f90
GPR08: 0000000000000023 0000000000000001 0000000000000027 c0000018fe68ffe8
GPR12: 0000000000008000 c00000001e9d1880 c00000013a047200 0000000000000800
GPR16: c000000001d3c7d0 0000000000000240 0000000000000048 c000000010aacd18
GPR20: 0000000000000001 c000000010aacc18 c00000013a047c00 c000000139ec2400
GPR24: 0000000000000280 c000000139ec2520 c000000136c1b400 c000000001c93060
GPR28: c00000013a047c20 c000000001d3c6c0 c000000001c978a0 000000000000000d
NIP [c0000000001caac8] build_sched_domains+0xd48/0x1720
LR [c0000000001caac4] build_sched_domains+0xd44/0x1720
Call Trace:
[c00000005596f4c0] [c0000000001caac4] build_sched_domains+0xd44/0x1720 (unreliable)
[c00000005596f670] [c0000000001cc5ec] partition_sched_domains_locked+0x3ac/0x4b0
[c00000005596f710] [c0000000002804e4] rebuild_sched_domains_locked+0x404/0x9e0
[c00000005596f810] [c000000000283e60] rebuild_sched_domains+0x40/0x70
[c00000005596f840] [c000000000284124] cpuset_hotplug_workfn+0x294/0xf10
[c00000005596fc60] [c000000000175040] process_one_work+0x290/0x590
[c00000005596fd00] [c0000000001753c8] worker_thread+0x88/0x620
[c00000005596fda0] [c000000000181704] kthread+0x194/0x1a0
[c00000005596fe10] [c00000000000ccec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
485af049 60000000 2fa30800 409e0028 80fe0000 e89a00f8 e86100e8 38da0120
7f88e378 7ce53b78 4801fb91 60000000 <0fe00000> 39000000 38e00000 38c00000

Fix this by updating cpu_cpu_map aka cpumask_of_node() on every CPU
online/offline.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210826100521.412639-5-srikar@linux.vnet.ibm.com
2021-08-27 00:56:54 +10:00
Christophe Leroy 8149238ffd powerpc: Redefine HMT_xxx macros as empty on PPC32
HMT_xxx macros are macros for adjusting thread priority
(hardware multi-threading) are macros inherited from PPC64
via commit 5f7c690728 ("[PATCH] powerpc: Merged ppc_asm.h")

Those instructions are pointless on PPC32, but some common
fonctions like arch_cpu_idle() use them.

So make them empty on PPC32 to avoid those instructions.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c5a07fadea33d640ad10cecf0ac8faaec1c524e0.1629898474.git.christophe.leroy@csgroup.eu
2021-08-27 00:56:52 +10:00
Michael Ellerman 465e333e77 Merge branch 'topic/ppc-kvm' into next
Merge some KVM patches we are keeping in a topic branch in case there
are any merge conflicts that need resolving.
2021-08-26 21:21:11 +10:00
Christophe Leroy 806c0e6e7e powerpc: Refactor verification of MSR_RI
40x and BOOKE don't have MSR_RI therefore all tests involving
MSR_RI may be problematic on those plateforms.

Create helpers to check or set MSR_RI in regs, and use them
in common code.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c2fb93708196734f4176dda334aaa3055f213b89.1629707037.git.christophe.leroy@csgroup.eu
2021-08-26 21:21:07 +10:00
Xiongwei Song 4872cbd0ca powerpc: Add dear as a synonym for pt_regs.dar register
Create an anonymous union for dar and dear regsiters, we can reference
dear to get the effective address when CONFIG_4xx=y or CONFIG_BOOKE=y.
Otherwise, reference dar. This makes code more clear.

Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
[mpe: Reword commit title]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210807010239.416055-4-sxwjean@me.com
2021-08-26 21:21:07 +10:00
Xiongwei Song 4f8e78c075 powerpc: Add esr as a synonym for pt_regs.dsisr
Create an anonymous union for dsisr and esr regsiters, we can reference
esr to get the exception detail when CONFIG_4xx=y or CONFIG_BOOKE=y.
Otherwise, reference dsisr. This makes code more clear.

Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
[mpe: Reword commit title]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210807010239.416055-2-sxwjean@me.com
2021-08-26 21:21:06 +10:00
Nicholas Piggin 1782663897 KVM: PPC: Book3S HV Nested: Reflect guest PMU in-use to L0 when guest SPRs are live
After the L1 saves its PMU SPRs but before loading the L2's PMU SPRs,
switch the pmcregs_in_use field in the L1 lppaca to the value advertised
by the L2 in its VPA. On the way out of the L2, set it back after saving
the L2 PMU registers (if they were in-use).

This transfers the PMU liveness indication between the L1 and L2 at the
points where the registers are not live.

This fixes the nested HV bug for which a workaround was added to the L0
HV by commit 63279eeb7f ("KVM: PPC: Book3S HV: Always save guest pmu
for guest capable of nesting"), which explains the problem in detail.
That workaround is no longer required for guests that include this bug
fix.

Fixes: 360cae3137 ("KVM: PPC: Book3S HV: Nested guest entry via hypercall")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Link: https://lore.kernel.org/r/20210811160134.904987-10-npiggin@gmail.com
2021-08-25 16:37:18 +10:00
Nicholas Piggin 8b210a880b KVM: PPC: Book3S HV Nested: Make nested HFSCR state accessible
When the L0 runs a nested L2, there are several permutations of HFSCR
that can be relevant. The HFSCR that the L1 vcpu L1 requested, the
HFSCR that the L1 vcpu may use, and the HFSCR that is actually being
used to run the L2.

The L1 requested HFSCR is not accessible outside the nested hcall
handler, so copy that into a new kvm_nested_guest.hfscr field.

The permitted HFSCR is taken from the HFSCR that the L1 runs with,
which is also not accessible while the hcall is being made. Move
this into a new kvm_vcpu_arch.hfscr_permitted field.

These will be used by the next patch to improve facility handling
for nested guests, and later by facility demand faulting patches.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210811160134.904987-7-npiggin@gmail.com
2021-08-25 16:37:18 +10:00
Nicholas Piggin d82b392d9b KVM: PPC: Book3S HV Nested: Fix TM softpatch HFAC interrupt emulation
Have the TM softpatch emulation code set up the HFAC interrupt and
return -1 in case an instruction was executed with HFSCR bits clear,
and have the interrupt exit handler fall through to the HFAC handler.
When the L0 is running a nested guest, this ensures the HFAC interrupt
is correctly passed up to the L1.

The "direct guest" exit handler will turn these into PROGILL program
interrupts so functionality in practice will be unchanged. But it's
possible an L1 would want to handle these in a different way.

Also rearrange the FAC interrupt emulation code to match the HFAC format
while here (mainly, adding the FSCR_INTR_CAUSE mask).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210811160134.904987-5-npiggin@gmail.com
2021-08-25 16:37:17 +10:00
Christophe Leroy 9401f4e46c powerpc: Use lwarx/ldarx directly instead of PPC_LWARX/LDARX macros
Force the eh flag at 0 on PPC32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1fc81f07cabebb875b963e295408cc3dd38c8d85.1614674882.git.christophe.leroy@csgroup.eu
2021-08-25 13:35:49 +10:00
Christophe Leroy 19e932eb6e powerpc/ptrace: Make user_mode() common to PPC32 and PPC64
Today we have:

	#ifdef __powerpc64__
	#define user_mode(regs) ((((regs)->msr) >> MSR_PR_LG) & 0x1)
	#else
	#define user_mode(regs) (((regs)->msr & MSR_PR) != 0)
	#endif

With ppc64_defconfig, we get:

	if (!user_mode(regs))
    14b4:	e9 3e 01 08 	ld      r9,264(r30)
    14b8:	71 29 40 00 	andi.   r9,r9,16384
    14bc:	41 82 07 a4 	beq     1c60 <.emulate_instruction+0x7d0>

If taking the ppc32 definition of user_mode(), the exact same code
is generated for ppc64_defconfig.

So, only keep one version of user_mode(), preferably the one not
using MSR_PR_LG which should be kept internal to reg.h.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/000a28c51808bbd802b505af42d2cb316c2be7d3.1629216000.git.christophe.leroy@csgroup.eu
2021-08-25 13:35:49 +10:00
Christophe Leroy f5007dbf4d powerpc/booke: Avoid link stack corruption in several places
Use bcl 20,31,+4 instead of bl in order to preserve link stack.

See commit c974809a26 ("powerpc/vdso: Avoid link stack corruption
in __get_datapage()") for details.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e9fbc285eceb720e6c0e032ef47fe8b05f669b48.1629791751.git.christophe.leroy@csgroup.eu
2021-08-25 13:35:47 +10:00
Cédric Le Goater cc47ad409b powerpc/compat_sys: Declare syscalls
This fixes a compile error with W=1.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210823090039.166120-3-clg@kaod.org
2021-08-23 19:28:21 +10:00
Christophe Leroy a00ea5b6f2 powerpc/syscalls: Remove __NR__exit
__NR__exit is nowhere used. On most architectures it was removed by
commit 135ab6ec8f ("[PATCH] remove remaining errno and
__KERNEL_SYSCALLS__ references") but not on powerpc.

powerpc removed __KERNEL_SYSCALLS__ in commit 3db03b4afb ("[PATCH]
rename the provided execve functions to kernel_execve"), but __NR__exit
was left over.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6457eb4f327313323ed1f70e540bbb4ddc9178fa.1629701106.git.christophe.leroy@csgroup.eu
2021-08-23 19:28:20 +10:00
Jing Zhang 87bcc5fa09 KVM: stats: Add halt_wait_ns stats for all architectures
Add simple stats halt_wait_ns to record the time a VCPU has spent on
waiting for all architectures (not just powerpc).

Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210802165633.1866976-5-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-20 16:06:33 -04:00
Christophe Leroy 770cec16cd powerpc/audit: Simplify syscall_get_arch()
Make use of is_32bit_task() and CONFIG_CPU_LITTLE_ENDIAN
to simplify syscall_get_arch().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4be53b9187a4d8c163968f4d224267e41a7fcc33.1629451479.git.christophe.leroy@csgroup.eu
2021-08-20 22:38:13 +10:00
Christophe Leroy 898a1ef06a powerpc/audit: Avoid unneccessary #ifdef in syscall_get_arguments()
Use is_32bit_task() which already handles CONFIG_COMPAT.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ba49cdd574558a0363300c3f6b5b062b397cb071.1629451483.git.christophe.leroy@csgroup.eu
2021-08-20 22:38:08 +10:00
Jakub Kicinski f444fea789 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/ptp/Kconfig:
  55c8fca1da ("ptp_pch: Restore dependency on PCI")
  e5f3155267 ("ethernet: fix PTP_1588_CLOCK dependencies")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19 18:09:18 -07:00
Christophe Leroy ef486bf448 powerpc/32s: Fix random crashes by adding isync() after locking/unlocking KUEP
Commit b5efec00b6 ("powerpc/32s: Move KUEP locking/unlocking in C")
removed the 'isync' instruction after adding/removing NX bit in user
segments. The reasoning behind this change was that when setting the
NX bit we don't mind it taking effect with delay as the kernel never
executes text from userspace, and when clearing the NX bit this is
to return to userspace and then the 'rfi' should synchronise the
context.

However, it looks like on book3s/32 having a hash page table, at least
on the G3 processor, we get an unexpected fault from userspace, then
this is followed by something wrong in the verification of MSR_PR
at end of another interrupt.

This is fixed by adding back the removed isync() following update
of NX bit in user segment registers. Only do it for cores with an
hash table, as 603 cores don't exhibit that problem and the two isync
increase ./null_syscall selftest by 6 cycles on an MPC 832x.

First problem: unexpected WARN_ON() for mysterious PROTFAULT

  WARNING: CPU: 0 PID: 1660 at arch/powerpc/mm/fault.c:354 do_page_fault+0x6c/0x5b0
  Modules linked in:
  CPU: 0 PID: 1660 Comm: Xorg Not tainted 5.13.0-pmac-00028-gb3c15b60339a #40
  NIP:  c001b5c8 LR: c001b6f8 CTR: 00000000
  REGS: e2d09e40 TRAP: 0700   Not tainted  (5.13.0-pmac-00028-gb3c15b60339a)
  MSR:  00021032 <ME,IR,DR,RI>  CR: 42d04f30  XER: 20000000
  GPR00: c000424c e2d09f00 c301b680 e2d09f40 0000001e 42000000 00cba028 00000000
  GPR08: 08000000 48000010 c301b680 e2d09f30 22d09f30 00c1fff0 00cba000 a7b7ba4c
  GPR16: 00000031 00000000 00000000 00000000 00000000 00000000 a7b7b0d0 00c5c010
  GPR24: a7b7b64c a7b7d2f0 00000004 00000000 c1efa6c0 00cba02c 00000300 e2d09f40
  NIP [c001b5c8] do_page_fault+0x6c/0x5b0
  LR [c001b6f8] do_page_fault+0x19c/0x5b0
  Call Trace:
  [e2d09f00] [e2d09f04] 0xe2d09f04 (unreliable)
  [e2d09f30] [c000424c] DataAccess_virt+0xd4/0xe4
  --- interrupt: 300 at 0xa7a261dc
  NIP:  a7a261dc LR: a7a253bc CTR: 00000000
  REGS: e2d09f40 TRAP: 0300   Not tainted  (5.13.0-pmac-00028-gb3c15b60339a)
  MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 228428e2  XER: 20000000
  DAR: 00cba02c DSISR: 42000000
  GPR00: a7a27448 afa6b0e0 a74c35c0 a7b7b614 0000001e a7b7b614 00cba028 00000000
  GPR08: 00020fd9 00000031 00cb9ff8 a7a273b0 220028e2 00c1fff0 00cba000 a7b7ba4c
  GPR16: 00000031 00000000 00000000 00000000 00000000 00000000 a7b7b0d0 00c5c010
  GPR24: a7b7b64c a7b7d2f0 00000004 00000002 0000001e a7b7b614 a7b7aff4 00000030
  NIP [a7a261dc] 0xa7a261dc
  LR [a7a253bc] 0xa7a253bc
  --- interrupt: 300
  Instruction dump:
  7c4a1378 810300a0 75278410 83820298 83a300a4 553b018c 551e0036 4082038c
  2e1b0000 40920228 75280800 41820220 <0fe00000> 3b600000 41920214 81420594

Second problem: MSR PR is seen unset allthough the interrupt frame shows it set

  kernel BUG at arch/powerpc/kernel/interrupt.c:458!
  Oops: Exception in kernel mode, sig: 5 [#1]
  BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
  Modules linked in:
  CPU: 0 PID: 1660 Comm: Xorg Tainted: G        W         5.13.0-pmac-00028-gb3c15b60339a #40
  NIP:  c0011434 LR: c001629c CTR: 00000000
  REGS: e2d09e70 TRAP: 0700   Tainted: G        W          (5.13.0-pmac-00028-gb3c15b60339a)
  MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 42d09f30  XER: 00000000
  GPR00: 00000000 e2d09f30 c301b680 e2d09f40 83440000 c44d0e68 e2d09e8c 00000000
  GPR08: 00000002 00dc228a 00004000 e2d09f30 22d09f30 00c1fff0 afa6ceb4 00c26144
  GPR16: 00c25fb8 00c26140 afa6ceb8 90000000 00c944d8 0000001c 00000000 00200000
  GPR24: 00000000 000001fb afa6d1b4 00000001 00000000 a539a2a0 a530fd80 00000089
  NIP [c0011434] interrupt_exit_kernel_prepare+0x10/0x70
  LR [c001629c] interrupt_return+0x9c/0x144
  Call Trace:
  [e2d09f30] [c000424c] DataAccess_virt+0xd4/0xe4 (unreliable)
  --- interrupt: 300 at 0xa09be008
  NIP:  a09be008 LR: a09bdfe8 CTR: a09bdfc0
  REGS: e2d09f40 TRAP: 0300   Tainted: G        W          (5.13.0-pmac-00028-gb3c15b60339a)
  MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 420028e2  XER: 20000000
  DAR: a539a308 DSISR: 0a000000
  GPR00: a7b90d50 afa6b2d0 a74c35c0 a0a8b690 a0a8b698 a5365d70 a4fa82a8 00000004
  GPR08: 00000000 a09bdfc0 00000000 a5360000 a09bde7c 00c1fff0 afa6ceb4 00c26144
  GPR16: 00c25fb8 00c26140 afa6ceb8 90000000 00c944d8 0000001c 00000000 00200000
  GPR24: 00000000 000001fb afa6d1b4 00000001 00000000 a539a2a0 a530fd80 00000089
  NIP [a09be008] 0xa09be008
  LR [a09bdfe8] 0xa09bdfe8
  --- interrupt: 300
  Instruction dump:
  80010024 83e1001c 7c0803a6 4bffff80 3bc00800 4bffffd0 486b42fd 4bffffcc
  81430084 71480002 41820038 554a0462 <0f0a0000> 80620060 74630001 40820034

Fixes: b5efec00b6 ("powerpc/32s: Move KUEP locking/unlocking in C")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4856f5574906e2aec0522be17bf3848a22b2cd0b.1629269345.git.christophe.leroy@csgroup.eu
2021-08-19 09:41:54 +10:00
Christophe Leroy 1e688dd2a3 powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
flexibility to GCC.

For that add an entry to the exception table so that
program_check_exception() knowns where to resume execution
after a WARNING.

Here are two exemples. The first one is done on PPC32 (which
benefits from the previous patch), the second is on PPC64.

	unsigned long test(struct pt_regs *regs)
	{
		int ret;

		WARN_ON(regs->msr & MSR_PR);

		return regs->gpr[3];
	}

	unsigned long test9w(unsigned long a, unsigned long b)
	{
		if (WARN_ON(!b))
			return 0;
		return a / b;
	}

Before the patch:

	000003a8 <test>:
	 3a8:	81 23 00 84 	lwz     r9,132(r3)
	 3ac:	71 29 40 00 	andi.   r9,r9,16384
	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
	 3b8:	4e 80 00 20 	blr

	 3bc:	0f e0 00 00 	twui    r0,0
	 3c0:	80 63 00 0c 	lwz     r3,12(r3)
	 3c4:	4e 80 00 20 	blr

	0000000000000bf0 <.test9w>:
	 bf0:	7c 89 00 74 	cntlzd  r9,r4
	 bf4:	79 29 d1 82 	rldicl  r9,r9,58,6
	 bf8:	0b 09 00 00 	tdnei   r9,0
	 bfc:	2c 24 00 00 	cmpdi   r4,0
	 c00:	41 82 00 0c 	beq     c0c <.test9w+0x1c>
	 c04:	7c 63 23 92 	divdu   r3,r3,r4
	 c08:	4e 80 00 20 	blr

	 c0c:	38 60 00 00 	li      r3,0
	 c10:	4e 80 00 20 	blr

After the patch:

	000003a8 <test>:
	 3a8:	81 23 00 84 	lwz     r9,132(r3)
	 3ac:	71 29 40 00 	andi.   r9,r9,16384
	 3b0:	40 82 00 0c 	bne     3bc <test+0x14>
	 3b4:	80 63 00 0c 	lwz     r3,12(r3)
	 3b8:	4e 80 00 20 	blr

	 3bc:	0f e0 00 00 	twui    r0,0

	0000000000000c50 <.test9w>:
	 c50:	7c 89 00 74 	cntlzd  r9,r4
	 c54:	79 29 d1 82 	rldicl  r9,r9,58,6
	 c58:	0b 09 00 00 	tdnei   r9,0
	 c5c:	7c 63 23 92 	divdu   r3,r3,r4
	 c60:	4e 80 00 20 	blr

	 c70:	38 60 00 00 	li      r3,0
	 c74:	4e 80 00 20 	blr

In the first exemple, we see GCC doesn't need to duplicate what
happens after the trap.

In the second exemple, we see that GCC doesn't need to emit a test
and a branch in the likely path in addition to the trap.

We've got some WARN_ON() in .softirqentry.text section so it needs
to be added in the OTHER_TEXT_SECTIONS in modpost.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/389962b1b702e3c78d169e59bcfac56282889173.1618331882.git.christophe.leroy@csgroup.eu
2021-08-15 13:49:24 +10:00
Christophe Leroy db87a71992 powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
powerpc BUG_ON() and WARN_ON() are based on using twnei instruction.

For catching simple conditions like a variable having value 0, this
is efficient because it does the test and the trap at the same time.
But most conditions used with BUG_ON or WARN_ON are more complex and
forces GCC to format the condition into a 0 or 1 value in a register.
This will usually require 2 to 3 instructions.

The most efficient solution would be to use __builtin_trap() because
GCC is able to optimise the use of the different trap instructions
based on the requested condition, but this is complex if not
impossible for the following reasons:
- __builtin_trap() is a non-recoverable instruction, so it can't be
used for WARN_ON
- Knowing which line of code generated the trap would require the
analysis of DWARF information. This is not a feature we have today.

As mentioned in commit 8d4fbcfbe0 ("Fix WARN_ON() on bitfield ops")
the way WARN_ON() is implemented is suboptimal. That commit also
mentions an issue with 'long long' condition. It fixed it for
WARN_ON() but the same problem still exists today with BUG_ON() on
PPC32. It will be fixed by using the generic implementation.

By using the generic implementation, gcc will naturally generate a
branch to the unconditional trap generated by BUG().

As modern powerpc implement zero-cycle branch,
that's even more efficient.

And for the functions using WARN_ON() and its return, the test
on return from WARN_ON() is now also used for the WARN_ON() itself.

On PPC64 we don't want it because we want to be able to use CFAR
register to track how we entered the code that trapped. The CFAR
register would be clobbered by the branch.

A simple test function:

	unsigned long test9w(unsigned long a, unsigned long b)
	{
		if (WARN_ON(!b))
			return 0;
		return a / b;
	}

Before the patch:

	0000046c <test9w>:
	 46c:	7c 89 00 34 	cntlzw  r9,r4
	 470:	55 29 d9 7e 	rlwinm  r9,r9,27,5,31
	 474:	0f 09 00 00 	twnei   r9,0
	 478:	2c 04 00 00 	cmpwi   r4,0
	 47c:	41 82 00 0c 	beq     488 <test9w+0x1c>
	 480:	7c 63 23 96 	divwu   r3,r3,r4
	 484:	4e 80 00 20 	blr

	 488:	38 60 00 00 	li      r3,0
	 48c:	4e 80 00 20 	blr

After the patch:

	00000468 <test9w>:
	 468:	2c 04 00 00 	cmpwi   r4,0
	 46c:	41 82 00 0c 	beq     478 <test9w+0x10>
	 470:	7c 63 23 96 	divwu   r3,r3,r4
	 474:	4e 80 00 20 	blr

	 478:	0f e0 00 00 	twui    r0,0
	 47c:	38 60 00 00 	li      r3,0
	 480:	4e 80 00 20 	blr

So we see before the patch we need 3 instructions on the likely path
to handle the WARN_ON(). With the patch the trap goes on the unlikely
path.

See below the difference at the entry of system_call_exception where
we have several BUG_ON(), allthough less impressing.

With the patch:

	00000000 <system_call_exception>:
	   0:	81 6a 00 84 	lwz     r11,132(r10)
	   4:	90 6a 00 88 	stw     r3,136(r10)
	   8:	71 60 00 02 	andi.   r0,r11,2
	   c:	41 82 00 70 	beq     7c <system_call_exception+0x7c>
	  10:	71 60 40 00 	andi.   r0,r11,16384
	  14:	41 82 00 6c 	beq     80 <system_call_exception+0x80>
	  18:	71 6b 80 00 	andi.   r11,r11,32768
	  1c:	41 82 00 68 	beq     84 <system_call_exception+0x84>
	  20:	94 21 ff e0 	stwu    r1,-32(r1)
	  24:	93 e1 00 1c 	stw     r31,28(r1)
	  28:	7d 8c 42 e6 	mftb    r12
	...
	  7c:	0f e0 00 00 	twui    r0,0
	  80:	0f e0 00 00 	twui    r0,0
	  84:	0f e0 00 00 	twui    r0,0

Without the patch:

	00000000 <system_call_exception>:
	   0:	94 21 ff e0 	stwu    r1,-32(r1)
	   4:	93 e1 00 1c 	stw     r31,28(r1)
	   8:	90 6a 00 88 	stw     r3,136(r10)
	   c:	81 6a 00 84 	lwz     r11,132(r10)
	  10:	69 60 00 02 	xori    r0,r11,2
	  14:	54 00 ff fe 	rlwinm  r0,r0,31,31,31
	  18:	0f 00 00 00 	twnei   r0,0
	  1c:	69 60 40 00 	xori    r0,r11,16384
	  20:	54 00 97 fe 	rlwinm  r0,r0,18,31,31
	  24:	0f 00 00 00 	twnei   r0,0
	  28:	69 6b 80 00 	xori    r11,r11,32768
	  2c:	55 6b 8f fe 	rlwinm  r11,r11,17,31,31
	  30:	0f 0b 00 00 	twnei   r11,0
	  34:	7d 8c 42 e6 	mftb    r12

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b286e07fb771a664b631cd07a40b09c06f26e64b.1618331881.git.christophe.leroy@csgroup.eu
2021-08-14 22:00:01 +10:00
Aneesh Kumar K.V 1c6b5a7e74 powerpc/pseries: Add support for FORM2 associativity
PAPR interface currently supports two different ways of communicating resource
grouping details to the OS. These are referred to as Form 0 and Form 1
associativity grouping. Form 0 is the older format and is now considered
deprecated. This patch adds another resource grouping named FORM2.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-6-aneesh.kumar@linux.ibm.com
2021-08-13 22:04:27 +10:00
Aneesh Kumar K.V ef31cb83d1 powerpc/pseries: Add a helper for form1 cpu distance
This helper is only used with the dispatch trace log collection.
A later patch will add Form2 affinity support and this change helps
in keeping that simpler. Also add a comment explaining we don't expect
the code to be called with FORM0

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-5-aneesh.kumar@linux.ibm.com
2021-08-13 22:04:27 +10:00
Aneesh Kumar K.V 8ddc6448ec powerpc/pseries: Consolidate different NUMA distance update code paths
The associativity details of the newly added resourced are collected from
the hypervisor via "ibm,configure-connector" rtas call. Update the numa
distance details of the newly added numa node after the above call.

Instead of updating NUMA distance every time we lookup a node id
from the associativity property, add helpers that can be used
during boot which does this only once. Also remove the distance
update from node id lookup helpers.

Currently, we duplicate parsing code for ibm,associativity and
ibm,associativity-lookup-arrays in the kernel. The associativity array provided
by these device tree properties are very similar and hence can use
a helper to parse the node id and numa distance details.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-4-aneesh.kumar@linux.ibm.com
2021-08-13 22:04:26 +10:00
Aneesh Kumar K.V 0eacd06bb8 powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY
Also make related code cleanup that will allow adding FORM2_AFFINITY in
later patches. No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132223.225214-3-aneesh.kumar@linux.ibm.com
2021-08-13 22:04:26 +10:00
Aneesh Kumar K.V dbf77fed8b powerpc: rename powerpc_debugfs_root to arch_debugfs_dir
No functional change in this patch. arch_debugfs_dir is the generic kernel
name declared in linux/debugfs.h for arch-specific debugfs directory.
Architectures like x86/s390 already use the name. Rename powerpc
specific powerpc_debugfs_root to arch_debugfs_dir.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132831.233794-2-aneesh.kumar@linux.ibm.com
2021-08-13 22:04:26 +10:00
Christophe Leroy 98694166c2 powerpc/interrupt: Fix OOPS by not calling do_IRQ() from timer_interrupt()
An interrupt handler shall not be called from another interrupt
handler otherwise this leads to problems like the following:

  Kernel attempted to write user page (afd4fa84) - exploit attempt? (uid: 1000)
  ------------[ cut here ]------------
  Bug: Write fault blocked by KUAP!
  WARNING: CPU: 0 PID: 1617 at arch/powerpc/mm/fault.c:230 do_page_fault+0x484/0x720
  Modules linked in:
  CPU: 0 PID: 1617 Comm: sshd Tainted: G        W         5.13.0-pmac-00010-g8393422eb77 #7
  NIP:  c001b77c LR: c001b77c CTR: 00000000
  REGS: cb9e5bc0 TRAP: 0700   Tainted: G        W          (5.13.0-pmac-00010-g8393422eb77)
  MSR:  00021032 <ME,IR,DR,RI>  CR: 24942424  XER: 00000000

  GPR00: c001b77c cb9e5c80 c1582c00 00000021 3ffffbff 085b0000 00000027 c8eb644c
  GPR08: 00000023 00000000 00000000 00000000 24942424 0063f8c8 00000000 000186a0
  GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 c07640c4 cb9e5e98 cb9e5e90
  GPR24: 00000040 afd4fa96 00000040 02000000 c1fda6c0 afd4fa84 00000300 cb9e5cc0
  NIP [c001b77c] do_page_fault+0x484/0x720
  LR [c001b77c] do_page_fault+0x484/0x720
  Call Trace:
  [cb9e5c80] [c001b77c] do_page_fault+0x484/0x720 (unreliable)
  [cb9e5cb0] [c000424c] DataAccess_virt+0xd4/0xe4
  --- interrupt: 300 at __copy_tofrom_user+0x110/0x20c
  NIP:  c001f9b4 LR: c03250a0 CTR: 00000004
  REGS: cb9e5cc0 TRAP: 0300   Tainted: G        W          (5.13.0-pmac-00010-g8393422eb77)
  MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 48028468  XER: 20000000
  DAR: afd4fa84 DSISR: 0a000000
  GPR00: 20726f6f cb9e5d80 c1582c00 00000004 cb9e5e3a 00000016 afd4fa80 00000000
  GPR08: 3835202d 72777872 2d78722d 00000004 28028464 0063f8c8 00000000 000186a0
  GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 c07640c4 cb9e5e98 cb9e5e90
  GPR24: 00000040 afd4fa96 00000040 cb9e5e0c 00000daa a0000000 cb9e5e98 afd4fa56
  NIP [c001f9b4] __copy_tofrom_user+0x110/0x20c
  LR [c03250a0] _copy_to_iter+0x144/0x990
  --- interrupt: 300
  [cb9e5d80] [c03e89c0] n_tty_read+0xa4/0x598 (unreliable)
  [cb9e5df0] [c03e2a0c] tty_read+0xdc/0x2b4
  [cb9e5e80] [c0156bf8] vfs_read+0x274/0x340
  [cb9e5f00] [c01571ac] ksys_read+0x70/0x118
  [cb9e5f30] [c0016048] ret_from_syscall+0x0/0x28
  --- interrupt: c00 at 0xa7855c88
  NIP:  a7855c88 LR: a7855c5c CTR: 00000000
  REGS: cb9e5f40 TRAP: 0c00   Tainted: G        W          (5.13.0-pmac-00010-g8393422eb77)
  MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 2402446c  XER: 00000000

  GPR00: 00000003 afd4ec70 a72137d0 0000000b afd4ecac 00004000 0065a990 00000800
  GPR08: 00000000 a7947930 00000000 00000004 c15831b0 0063f8c8 00000000 000186a0
  GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 0065a9e0 00000001 0065fac0
  GPR24: 00000000 00000089 00664050 00000000 00668e30 a720c8dc a7943ff4 0065f9b0
  NIP [a7855c88] 0xa7855c88
  LR [a7855c5c] 0xa7855c5c
  --- interrupt: c00
  Instruction dump:
  3884aa88 38630178 48076861 807f0080 48042e45 2f830000 419e0148 3c80c079
  3c60c076 38841be4 386301c0 4801f705 <0fe00000> 3860000b 4bfffe30 3c80c06b
  ---[ end trace fd69b91a8046c2e5 ]---

Here the problem is that by re-enterring an exception handler,
kuap_save_and_lock() is called a second time with this time KUAP
access locked, leading to regs->kuap being overwritten hence
KUAP not being unlocked at exception exit as expected.

Do not call do_IRQ() from timer_interrupt() directly. Instead,
redefine do_IRQ() as a standard function named __do_IRQ(), and
call it from both do_IRQ() and time_interrupt() handlers.

Fixes: 3a96570ffc ("powerpc: convert interrupt handlers to use wrappers")
Cc: stable@vger.kernel.org # v5.12+
Reported-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c17d234f4927d39a1d7100864a8e1145323d33a0.1628611927.git.christophe.leroy@csgroup.eu
2021-08-12 22:21:57 +10:00
Cédric Le Goater f5af0a9787 KVM: PPC: Book3S HV: XIVE: Add support for automatic save-restore
On P10, the feature doing an automatic "save & restore" of a VCPU
interrupt context is set by default in OPAL. When a VP context is
pulled out, the state of the interrupt registers are saved by the XIVE
interrupt controller under the internal NVP structure representing the
VP. This saves a costly store/load in guest entries and exits.

If OPAL advertises the "save & restore" feature in the device tree,
it should also have set the 'H' bit in the CAM line. Check that when
vCPUs are connected to their ICP in KVM before going any further.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210720134209.256133-3-clg@kaod.org
2021-08-10 23:15:02 +10:00
Cédric Le Goater c325712b5f powerpc/powernv/pci: Rework pnv_opal_pci_msi_eoi()
pnv_opal_pci_msi_eoi() is called from KVM to EOI passthrough interrupts
when in real mode. Adding MSI domain broke the hack using the
'ioda.irq_chip' field to deduce the owning PHB. Fix that by using the
IRQ chip data in the MSI domain.

The 'ioda.irq_chip' field is now unused and could be removed from the
pnv_phb struct.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-30-clg@kaod.org
2021-08-10 23:15:01 +10:00
Cédric Le Goater 248af248a8 powerpc/xics: Rename the map handler in a check handler
This moves the IRQ initialization done under the different ICS backends
in the common part of XICS. The 'map' handler becomes a simple 'check'
on the HW IRQ at the FW level.

As we don't need an ICS anymore in xics_migrate_irqs_away(), the XICS
domain does not set a chip data for the IRQ.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-18-clg@kaod.org
2021-08-10 23:14:59 +10:00
Cédric Le Goater e5e78b1511 KVM: PPC: Book3S HV: XIVE: Change interface of passthrough interrupt routines
The routine kvmppc_set_passthru_irq() calls kvmppc_xive_set_mapped()
and kvmppc_xive_clr_mapped() with an IRQ descriptor. Use directly the
host IRQ number to remove a useless conversion.

Add some debug.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-15-clg@kaod.org
2021-08-10 23:14:59 +10:00
Cédric Le Goater 9a014f4568 powerpc/pseries/pci: Add a msi_free() handler to clear XIVE data
The MSI domain clears the IRQ with msi_domain_free(), which calls
irq_domain_free_irqs_top(), which clears the handler data. This is a
problem for the XIVE controller since we need to unmap MMIO pages and
free a specific XIVE structure.

The 'msi_free()' handler is called before irq_domain_free_irqs_top()
when the handler data is still available. Use that to clear the XIVE
controller data.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-10-clg@kaod.org
2021-08-10 23:14:58 +10:00
Cédric Le Goater a5f3d2c17b powerpc/pseries/pci: Add MSI domains
Two IRQ domains are added on top of default machine IRQ domain.

First, the top level "pSeries-PCI-MSI" domain deals with the MSI
specificities. In this domain, the HW IRQ numbers are generated by the
PCI MSI layer, they compose a unique ID for an MSI source with the PCI
device identifier and the MSI vector number.

These numbers can be quite large on a pSeries machine running under
the IBM Hypervisor and /sys/kernel/irq/ and /proc/interrupts will
require small fixes to show them correctly.

Second domain is the in-the-middle "pSeries-MSI" domain which acts as
a proxy between the PCI MSI subsystem and the machine IRQ subsystem.
It usually allocate the MSI vector numbers but, on pSeries machines,
this is done by the RTAS FW and RTAS returns IRQ numbers in the IRQ
number space of the machine. This is why the in-the-middle "pSeries-MSI"
domain has the same HW IRQ numbers as its parent domain.

Only the XIVE (P9/P10) parent domain is supported for now. We still
need to add support for IRQ domain hierarchy under XICS.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210701132750.1475580-6-clg@kaod.org
2021-08-10 23:14:57 +10:00
Jordan Niethe 27fd111105 powerpc: Always inline radix_enabled() to fix build failure
This is the same as commit acdad8fb4a ("powerpc: Force inlining of
mmu_has_feature to fix build failure") but for radix_enabled().  The
config in the linked bugzilla causes the following build failure:

  LD      .tmp_vmlinux.kallsyms1
  powerpc64-linux-ld: arch/powerpc/mm/pgtable.o: in function `.__ptep_set_access_flags':
  pgtable.c:(.text+0x17c): undefined reference to `.radix__ptep_set_access_flags'
  powerpc64-linux-ld: arch/powerpc/mm/pageattr.o: in function `.change_page_attr':
  pageattr.c:(.text+0xc0): undefined reference to `.radix__flush_tlb_kernel_range'
  etc.

This is due to radix_enabled() not being inlined. See extract from
building with -Winline:

  In file included from arch/powerpc/include/asm/lppaca.h:46,
                   from arch/powerpc/include/asm/paca.h:17,
                   from arch/powerpc/include/asm/current.h:13,
                   from include/linux/thread_info.h:23,
                   from include/asm-generic/preempt.h:5,
                   from ./arch/powerpc/include/generated/asm/preempt.h:1,
                   from include/linux/preempt.h:78,
                   from include/linux/spinlock.h:51,
                   from include/linux/mmzone.h:8,
                   from include/linux/gfp.h:6,
                   from arch/powerpc/mm/pgtable.c:21:
  arch/powerpc/include/asm/book3s/64/pgtable.h: In function '__ptep_set_access_flags':
  arch/powerpc/include/asm/mmu.h:327:20: error: inlining failed in call to 'radix_enabled': call is unlikely and code size would grow [-Werror=inline]

The code relies on constant folding of MMU_FTRS_POSSIBLE at buildtime
and elimination of non possible parts of code at compile time. For this
to work radix_enabled() must be inlined so make it __always_inline.

Reported-by: Erhard F. <erhard_f@mailbox.org>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
[mpe: Trimmed error messages in change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213803
Link: https://lore.kernel.org/r/20210804013724.514468-1-jniethe5@gmail.com
2021-08-10 23:14:56 +10:00
Laurent Dufour d144f4d5a8 pseries/drmem: update LMBs after LPM
After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
updated by the hypervisor in the case the NUMA topology of the LPAR's
memory is updated.

This is handled by the kernel, but the memory's node is not updated because
there is no way to move a memory block between nodes from the Linux kernel
point of view.

If later a memory block is added or removed, drmem_update_dt() is called
and it is overwriting the DT node ibm,dynamic-reconfiguration-memory to
match the added or removed LMB. But the LMB's associativity node has not
been updated after the DT node update and thus the node is overwritten by
the Linux's topology instead of the hypervisor one.

Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is
updated to force an update of the LMB's associativity. However, ignore the
call to that hook when the update has been triggered by drmem_update_dt().
Because, in that case, the LMB tree has been used to set the DT property
and thus it doesn't need to be updated back. Since drmem_update_dt() is
called under the protection of the device_hotplug_lock and the hook is
called in the same context, use a simple boolean variable to detect that
call.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210517090606.56930-1-ldufour@linux.ibm.com
2021-08-10 23:14:55 +10:00
Christophe Leroy c8a6d91005 powerpc/non-smp: Unconditionaly call smp_mb() on switch_mm
Commit 3ccfebedd8 ("powerpc, membarrier: Skip memory barrier in
switch_mm()") added some logic to skip the smp_mb() in
switch_mm_irqs_off() before the call to switch_mmu_context().

However, on non SMP smp_mb() is just a compiler barrier and doing
it unconditionaly is simpler than the logic used to check whether the
barrier is needed or not.

After the patch:

00000000 <switch_mm_irqs_off>:
...
   c:	7c 04 18 40 	cmplw   r4,r3
  10:	81 24 00 24 	lwz     r9,36(r4)
  14:	91 25 04 c8 	stw     r9,1224(r5)
  18:	4d 82 00 20 	beqlr
  1c:	48 00 00 00 	b       1c <switch_mm_irqs_off+0x1c>
			1c: R_PPC_REL24	switch_mmu_context

Before the patch:

00000000 <switch_mm_irqs_off>:
...
   c:	7c 04 18 40 	cmplw   r4,r3
  10:	81 24 00 24 	lwz     r9,36(r4)
  14:	91 25 04 c8 	stw     r9,1224(r5)
  18:	4d 82 00 20 	beqlr
  1c:	81 24 00 28 	lwz     r9,40(r4)
  20:	71 29 00 0a 	andi.   r9,r9,10
  24:	40 82 00 34 	bne     58 <switch_mm_irqs_off+0x58>
  28:	48 00 00 00 	b       28 <switch_mm_irqs_off+0x28>
			28: R_PPC_REL24	switch_mmu_context
...
  58:	2c 03 00 00 	cmpwi   r3,0
  5c:	41 82 ff cc 	beq     28 <switch_mm_irqs_off+0x28>
  60:	48 00 00 00 	b       60 <switch_mm_irqs_off+0x60>
			60: R_PPC_REL24	switch_mmu_context

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e9d501da0c59f60ca767b1b3ea4603fce6d02b9e.1625486440.git.christophe.leroy@csgroup.eu
2021-08-10 23:14:55 +10:00
Christophe Leroy 09ca497528 powerpc: Remove in_kernel_text()
Last user of in_kernel_text() stopped using in with
commit 549e8152de ("powerpc: Make the 64-bit kernel as a
position-independent executable").

Generic function is_kernel_text() does the same.

So remote it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2a3a5b6f8cc0ef4e854d7b764f66aa8d2ee270d2.1624813698.git.christophe.leroy@csgroup.eu
2021-08-10 23:14:55 +10:00
Christophe Leroy b5cfc9cd7b powerpc/32: Fix critical and debug interrupts on BOOKE
32 bits BOOKE have special interrupts for debug and other
critical events.

When handling those interrupts, dedicated registers are saved
in the stack frame in addition to the standard registers, leading
to a shift of the pt_regs struct.

Since commit db297c3b07 ("powerpc/32: Don't save thread.regs on
interrupt entry"), the pt_regs struct is expected to be at the
same place all the time.

Instead of handling a special struct in addition to pt_regs, just
add those special registers to struct pt_regs.

Fixes: db297c3b07 ("powerpc/32: Don't save thread.regs on interrupt entry")
Cc: stable@vger.kernel.org
Reported-by: Radu Rendec <radu.rendec@gmail.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/028d5483b4851b01ea4334d0751e7f260419092b.1625637264.git.christophe.leroy@csgroup.eu
2021-08-07 08:53:59 +10:00
Parth Shah e9ef81e107 powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings
On POWER10 systems, the "ibm,thread-groups" property "2" indicates the cpus
in thread-group share both L2 and L3 caches. Hence, use cache_property = 2
itself to find both the L2 and L3 cache siblings.
Hence, create a new thread_group_l3_cache_map to keep list of L3 siblings,
but fill the mask using same property "2" array.

Signed-off-by: Parth Shah <parth@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210728175607.591679-4-parth@linux.ibm.com
2021-08-04 10:53:39 +10:00
Gautham R. Shenoy a4bec516b9 powerpc/cacheinfo: Lookup cache by dt node and thread-group id
Currently the cacheinfo code on powerpc indexes the "cache" objects
(modelling the L1/L2/L3 caches) where the key is device-tree node
corresponding to that cache. On some of the POWER server platforms
thread-groups within the core share different sets of caches (Eg: On
SMT8 POWER9 systems, threads 0,2,4,6 of a core share L1 cache and
threads 1,3,5,7 of the same core share another L1 cache). On such
platforms, there is a single device-tree node corresponding to that
cache and the cache-configuration within the threads of the core is
indicated via "ibm,thread-groups" device-tree property.

Since the current code is not aware of the "ibm,thread-groups"
property, on the aforementoined systems, cacheinfo code still treats
all the threads in the core to be sharing the cache because of the
single device-tree node (In the earlier example, the cacheinfo code
would says CPUs 0-7 share L1 cache).

In this patch, we make the powerpc cacheinfo code aware of the
"ibm,thread-groups" property. We indexe the "cache" objects by the
key-pair (device-tree node, thread-group id). For any CPUX, for a
given level of cache, the thread-group id is defined to be the first
CPU in the "ibm,thread-groups" cache-group containing CPUX. For levels
of cache which are not represented in "ibm,thread-groups" property,
the thread-group id is -1.

[parth: Remove "static" keyword for the definition of "thread_group_l1_cache_map"
and "thread_group_l2_cache_map" to get rid of the compile error.]

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Parth Shah <parth@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210728175607.591679-2-parth@linux.ibm.com
2021-08-04 10:53:39 +10:00
Arnd Bergmann 1a33b18b3b compat: make linux/compat.h available everywhere
Parts of linux/compat.h are under an #ifdef, but we end up
using more of those over time, moving things around bit by
bit.

To get it over with once and for all, make all of this file
uncondititonal now so it can be accessed everywhere. There
are only a few types left that are in asm/compat.h but not
yet in the asm-generic version, so add those in the process.

This requires providing a few more types in asm-generic/compat.h
that were not already there. The only tricky one is
compat_sigset_t, which needs a little help on 32-bit architectures
and for x86.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23 14:20:24 +01:00
Aneesh Kumar K.V cec6515abb powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache
flush_tlb_range is special in that we don't specify the page size used for
the translation.  Hence when flushing TLB we flush the translation cache
for all possible page sizes.  The kernel also uses the same interface when
moving page tables around.  Such a move requires us to flush the page walk
cache.

Instead of adding another interface to force page walk cache flush, update
flush_tlb_range to flush page walk cache if the range flushed is more than
the PMD range.  A page table move will always involve an invalidate range
more than PMD_SIZE.

Running microbenchmark with mprotect and parallel memory access didn't
show any observable performance impact.

Link: https://lkml.kernel.org/r/20210616045735.374532-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:23 -07:00
Aneesh Kumar K.V 3bbda69c48 mm/mremap: allow arch runtime override
Patch series "Speedup mremap on ppc64", v8.

This patchset enables MOVE_PMD/MOVE_PUD support on power.  This requires
the platform to support updating higher-level page tables without updating
page table entries.  This also needs to invalidate the Page Walk Cache on
architecture supporting the same.

This patch (of 3):

Architectures like ppc64 support faster mremap only with radix
translation.  Hence allow a runtime check w.r.t support for fast mremap.

Link: https://lkml.kernel.org/r/20210616045735.374532-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20210616045735.374532-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:23 -07:00
Aneesh Kumar K.V dc4875f0e7 mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *
No functional change in this patch.

[aneesh.kumar@linux.ibm.com: m68k build error reported by kernel robot]
  Link: https://lkml.kernel.org/r/87tulxnb2v.fsf@linux.ibm.com

Link: https://lkml.kernel.org/r/20210615110859.320299-2-aneesh.kumar@linux.ibm.com
Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:22 -07:00
Aneesh Kumar K.V 9cf6fa2458 mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *
No functional change in this patch.

[aneesh.kumar@linux.ibm.com: fix]
  Link: https://lkml.kernel.org/r/87wnqtnb60.fsf@linux.ibm.com
[sfr@canb.auug.org.au: another fix]
  Link: https://lkml.kernel.org/r/20210619134410.89559-1-aneesh.kumar@linux.ibm.com

Link: https://lkml.kernel.org/r/20210615110859.320299-1-aneesh.kumar@linux.ibm.com
Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:22 -07:00
Linus Torvalds 019b3fd94b powerpc updates for 5.14
- A big series refactoring parts of our KVM code, and converting some to C.
 
  - Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on some CPUs.
 
  - Support for the Microwatt soft-core.
 
  - Optimisations to our interrupt return path on 64-bit.
 
  - Support for userspace access to the NX GZIP accelerator on PowerVM on Power10.
 
  - Enable KUAP and KUEP by default on 32-bit Book3S CPUs.
 
  - Other smaller features, fixes & cleanups.
 
 Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Baokun Li,
 Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Daniel Axtens, Daniel Henrique
 Barboza, Finn Thain, Geoff Levand, Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley,
 Jordan Niethe, Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas
 Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika Vasireddy, Shaokun
 Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar Singh, Tom Rix, Vaibhav Jain,
 YueHaibing, Zhang Jianhua, Zhen Lei.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmDfFS4THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFxHEAC88NJ+Gz87LiTQFt6QjhziBaJUd+sY
 uqADPRROr4P50O8PjYZbMi2qbXzOlLkZO4wJWX7jpZ1F9KmbPNqY2shD8h4ahyge
 F/uqzBW1FXBJfnDEKdU2MzalkeTP+dwxLZyouUamjDCGNLFjOV4x/Fft5otOdXjO
 k9uO6yoGyOkWYzjC+Y/irNPlIDDByB/+bD92Cb52Y2mXMDDEnx4JzbtkeJW+8udT
 Sjn3bWzeL+dz5GehjMKwK4+SptNiyQGOgM8FwtnKUMvgzxv04DqCGjr9YC12L2Z7
 VoFZc4GzVgtf8DZg4fJ3KG5aG2nH3Tui7jc9lUckdrxixDAZw5wSG7CQ39gFb/5+
 7A4fEJk4Z3h5llibwxAZrC7wV8ZDDXn8oRFzRcOJjfxYaD+ohOOyWHIebwkdiXYx
 nfYI7sBcScDLXeBvHtDra2GJpbFSpVL3S/QNhhi1vKVNrFSyAgbAybcVL2xPLZ6+
 8Mh7A8xt+hf2bo9AXuYJDo9mwXWfg1093d0kT+AslcRhZioBk18c2AiZLIz0FzuL
 Ua/e5FPb99x9LSdcZHvaAXBoHT2iTgDyCyDa3gkIesyuRX6ggHoFcVQuvdDcbJ9d
 H8LK+Tahy1Y+E5b6KdtU8mDEGE+QG+CWLnwQ6YSCaL/MYgaFzNa32Jdj1fmztSBC
 cttP43kHZ7ljTw==
 =zo4d
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - A big series refactoring parts of our KVM code, and converting some
   to C.

 - Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on
   some CPUs.

 - Support for the Microwatt soft-core.

 - Optimisations to our interrupt return path on 64-bit.

 - Support for userspace access to the NX GZIP accelerator on PowerVM on
   Power10.

 - Enable KUAP and KUEP by default on 32-bit Book3S CPUs.

 - Other smaller features, fixes & cleanups.

Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe
Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand,
Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe,
Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas
Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika
Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar
Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, and Zhen Lei.

* tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (218 commits)
  powerpc: Only build restart_table.c for 64s
  powerpc/64s: move ret_from_fork etc above __end_soft_masked
  powerpc/64s/interrupt: clean up interrupt return labels
  powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols
  powerpc/64: enable MSR[EE] in irq replay pt_regs
  powerpc/64s/interrupt: preserve regs->softe for NMI interrupts
  powerpc/64s: add a table of implicit soft-masked addresses
  powerpc/64e: remove implicit soft-masking and interrupt exit restart logic
  powerpc/64e: fix CONFIG_RELOCATABLE build warnings
  powerpc/64s: fix hash page fault interrupt handler
  powerpc/4xx: Fix setup_kuep() on SMP
  powerpc/32s: Fix setup_{kuap/kuep}() on SMP
  powerpc/interrupt: Use names in check_return_regs_valid()
  powerpc/interrupt: Also use exit_must_hard_disable() on PPC32
  powerpc/sysfs: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE
  powerpc/ptrace: Refactor regs_set_return_{msr/ip}
  powerpc/ptrace: Move set_return_regs_changed() before regs_set_return_{msr/ip}
  powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()
  powerpc/pseries/vas: Include irqdomain.h
  powerpc: mark local variables around longjmp as volatile
  ...
2021-07-02 12:54:34 -07:00
Linus Torvalds 4cad671979 asm-generic/unaligned: Unify asm/unaligned.h around struct helper
The get_unaligned()/put_unaligned() helpers are traditionally architecture
 specific, with the two main variants being the "access-ok.h" version
 that assumes unaligned pointer accesses always work on a particular
 architecture, and the "le-struct.h" version that casts the data to a
 byte aligned type before dereferencing, for architectures that cannot
 always do unaligned accesses in hardware.
 
 Based on the discussion linked below, it appears that the access-ok
 version is not realiable on any architecture, but the struct version
 probably has no downsides. This series changes the code to use the
 same implementation on all architectures, addressing the few exceptions
 separately.
 
 Link: https://lore.kernel.org/lkml/75d07691-1e4f-741f-9852-38c0b4f520bc@synopsys.com/
 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
 Link: https://lore.kernel.org/lkml/20210507220813.365382-14-arnd@kernel.org/
 Link: git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git unaligned-rework-v2
 Link: https://lore.kernel.org/lkml/CAHk-=whGObOKruA_bU3aPGZfoDqZM1_9wBkwREp0H0FgR-90uQ@mail.gmail.com/
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmDfFx4ACgkQmmx57+YA
 GNkqzRAAjdlIr8M+xI2CyT0/A9tswYfLMeWejmYopq3zlxI6RnvPiJJDIdY2I8US
 1npIiDo55w061CnXL9rV65ocL3XmGu1mabOvgM6ATsec+8t4WaXBV9tysxTJ9ea0
 ltLTa2P5DXWALvWiVMTME7hFaf1cW+8Uqt3LmXxDp2l5zasXajCHAH6YokON2PfM
 CsaRhwSxIu8Sbnu/IQGBI9JW5UXsBfKSyUwtM0OwP7jFOuIeZ4WBVA+j6UxONnFC
 wouKmAM/ThoOsaV9aP4EZLIfBx8d4/hfYQjZ958kYXurerruYkJeEqdIRbV0QqTy
 2O6ZrJ6uqPlzfWz9h458me2dt98YEtALHV/3DCWUcBfHmUQtxElyJYEhG0YjVF3H
 5RYtjw8Q2LS/QR5ask1Xn0JfT89rRnLi2migAtsA4Ce70JP4Us6wGobkj4SHlgDt
 P7+eVq2Mkhqw/kmV8N4p+ZS5lpkK0JniDN+ONDhkZqHL/zXG/HQzx9wLV69jlvo2
 ASevKxITdi+bKHWs5ANungkBOnBUQZacq46mVyi4HPDwMAFyWvVYTbFumy9koagQ
 o9NEgX3RsZcxxi7bU1xuFPFMLMlUQT3Nb30+84B4fKe9FmvHC1hizTiCnp7q4bZr
 z6a6AMHke7YLqKZOqzTJGRR3lPoZZDCb775SAd70LQp6XPZXOHs=
 =IY5U
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-unaligned-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm/unaligned.h unification from Arnd Bergmann:
 "Unify asm/unaligned.h around struct helper

  The get_unaligned()/put_unaligned() helpers are traditionally
  architecture specific, with the two main variants being the
  "access-ok.h" version that assumes unaligned pointer accesses always
  work on a particular architecture, and the "le-struct.h" version that
  casts the data to a byte aligned type before dereferencing, for
  architectures that cannot always do unaligned accesses in hardware.

  Based on the discussion linked below, it appears that the access-ok
  version is not realiable on any architecture, but the struct version
  probably has no downsides. This series changes the code to use the
  same implementation on all architectures, addressing the few
  exceptions separately"

Link: https://lore.kernel.org/lkml/75d07691-1e4f-741f-9852-38c0b4f520bc@synopsys.com/
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
Link: https://lore.kernel.org/lkml/20210507220813.365382-14-arnd@kernel.org/
Link: git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git unaligned-rework-v2
Link: https://lore.kernel.org/lkml/CAHk-=whGObOKruA_bU3aPGZfoDqZM1_9wBkwREp0H0FgR-90uQ@mail.gmail.com/

* tag 'asm-generic-unaligned-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: simplify asm/unaligned.h
  asm-generic: uaccess: 1-byte access is always aligned
  netpoll: avoid put_unaligned() on single character
  mwifiex: re-fix for unaligned accesses
  apparmor: use get_unaligned() only for multi-byte words
  partitions: msdos: fix one-byte get_unaligned()
  asm-generic: unaligned always use struct helpers
  asm-generic: unaligned: remove byteshift helpers
  powerpc: use linux/unaligned/le_struct.h on LE power7
  m68k: select CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
  sh: remove unaligned access for sh4a
  openrisc: always use unaligned-struct header
  asm-generic: use asm-generic/unaligned.h for most architectures
2021-07-02 12:43:40 -07:00
Linus Torvalds 71bd934101 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "190 patches.

  Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
  vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
  migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
  zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
  core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
  signals, exec, kcov, selftests, compress/decompress, and ipc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
  ipc/util.c: use binary search for max_idx
  ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
  ipc: use kmalloc for msg_queue and shmid_kernel
  ipc sem: use kvmalloc for sem_undo allocation
  lib/decompressors: remove set but not used variabled 'level'
  selftests/vm/pkeys: exercise x86 XSAVE init state
  selftests/vm/pkeys: refill shadow register after implicit kernel write
  selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
  selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
  kcov: add __no_sanitize_coverage to fix noinstr for all architectures
  exec: remove checks in __register_bimfmt()
  x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
  hfsplus: report create_date to kstat.btime
  hfsplus: remove unnecessary oom message
  nilfs2: remove redundant continue statement in a while-loop
  kprobes: remove duplicated strong free_insn_page in x86 and s390
  init: print out unknown kernel parameters
  checkpatch: do not complain about positive return values starting with EPOLL
  checkpatch: improve the indented label test
  checkpatch: scripts/spdxcheck.py now requires python3
  ...
2021-07-02 12:08:10 -07:00
Anshuman Khandual 1c2f7d14d8 mm/thp: define default pmd_pgtable()
Currently most platforms define pmd_pgtable() as pmd_page() duplicating
the same code all over.  Instead just define a default value i.e
pmd_page() for pmd_pgtable() and let platforms override when required via
<asm/pgtable.h>.  All the existing platform that override pmd_pgtable()
have been moved into their respective <asm/pgtable.h> header in order to
precede before the new generic definition.  This makes it much cleaner
with reduced code.

Link: https://lkml.kernel.org/r/1623646133-20306-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01 11:06:03 -07:00
Anshuman Khandual fac7757e1f mm: define default value for FIRST_USER_ADDRESS
Currently most platforms define FIRST_USER_ADDRESS as 0UL duplication the
same code all over.  Instead just define a generic default value (i.e 0UL)
for FIRST_USER_ADDRESS and let the platforms override when required.  This
makes it much cleaner with reduced code.

The default FIRST_USER_ADDRESS here would be skipped in <linux/pgtable.h>
when the given platform overrides its value via <asm/pgtable.h>.

Link: https://lkml.kernel.org/r/1620615725-24623-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Acked-by: Guo Ren <guoren@kernel.org>			[csky]
Acked-by: Stafford Horne <shorne@gmail.com>		[openrisc]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>	[RISC-V]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01 11:06:02 -07:00
Christophe Leroy a6a8f7c4aa powerpc/8xx: add support for huge pages on VMAP and VMALLOC
powerpc 8xx has 4 page sizes:
- 4k
- 16k
- 512k
- 8M

At the time being, vmalloc and vmap only support huge pages which are leaf
at PMD level.

Here the PMD level is 4M, it doesn't correspond to any supported page
size.

For now, implement use of 16k and 512k pages which is done at PTE level.

Support of 8M pages will be implemented later, it requires vmalloc to
support hugepd tables.

Link: https://lkml.kernel.org/r/8b972f1c03fb6bd59953035f0a3e4d26659de4f8.1620795204.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30 20:47:26 -07:00
Christophe Leroy 79c1c594f4 mm/hugetlb: change parameters of arch_make_huge_pte()
Patch series "Subject: [PATCH v2 0/5] Implement huge VMAP and VMALLOC on powerpc 8xx", v2.

This series implements huge VMAP and VMALLOC on powerpc 8xx.

Powerpc 8xx has 4 page sizes:
- 4k
- 16k
- 512k
- 8M

At the time being, vmalloc and vmap only support huge pages which are
leaf at PMD level.

Here the PMD level is 4M, it doesn't correspond to any supported
page size.

For now, implement use of 16k and 512k pages which is done
at PTE level.

Support of 8M pages will be implemented later, it requires use of
hugepd tables.

To allow this, the architecture provides two functions:
- arch_vmap_pte_range_map_size() which tells vmap_pte_range() what
page size to use. A stub returning PAGE_SIZE is provided when the
architecture doesn't provide this function.
- arch_vmap_pte_supported_shift() which tells __vmalloc_node_range()
what page shift to use for a given area size. A stub returning
PAGE_SHIFT is provided when the architecture doesn't provide this
function.

This patch (of 5):

At the time being, arch_make_huge_pte() has the following prototype:

  pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
			   struct page *page, int writable);

vma is used to get the pages shift or size.
vma is also used on Sparc to get vm_flags.
page is not used.
writable is not used.

In order to use this function without a vma, replace vma by shift and
flags.  Also remove the used parameters.

Link: https://lkml.kernel.org/r/cover.1620795204.git.christophe.leroy@csgroup.eu
Link: https://lkml.kernel.org/r/f4633ac6a7da2f22f31a04a89e0a7026bb78b15b.1620795204.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30 20:47:26 -07:00
Nicholas Piggin 2b43dd7653 powerpc/64: enable MSR[EE] in irq replay pt_regs
Similar to commit 2b48e96be2 ("powerpc/64: fix irq replay
pt_regs->softe value"), enable MSR_EE in pt_regs->msr. This makes the
regs look more normal. It also allows some extra debug checks to be
added to interrupt handler entry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210630074621.2109197-7-npiggin@gmail.com
2021-06-30 22:21:20 +10:00
Nicholas Piggin 1b0482229c powerpc/64s/interrupt: preserve regs->softe for NMI interrupts
If an NMI interrupt hits in an implicit soft-masked region, regs->softe
is modified to reflect that. This may not be necessary for correctness
at the moment, but it is less surprising and it's unhelpful when
debugging or adding checks.

Make sure this is changed back to how it was found before returning.

Fixes: 4ec5feec1a ("powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210630074621.2109197-6-npiggin@gmail.com
2021-06-30 22:21:20 +10:00
Nicholas Piggin 325678fd05 powerpc/64s: add a table of implicit soft-masked addresses
Commit 9d1988ca87 ("powerpc/64: treat low kernel text as irqs
soft-masked") ends up catching too much code, including ret_from_fork,
and parts of interrupt and syscall return that do not expect to be
interrupts to be soft-masked. If an interrupt gets marked pending,
and then the code proceeds out of the implicit soft-masked region it
will fail to deal with the pending interrupt.

Fix this by adding a new table of addresses which explicitly marks
the regions of code that are soft masked. This table is only checked
for interrupts that below __end_soft_masked, so most kernel interrupts
will not have the overhead of the table search.

Fixes: 9d1988ca87 ("powerpc/64: treat low kernel text as irqs soft-masked")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210630074621.2109197-5-npiggin@gmail.com
2021-06-30 22:21:20 +10:00
Nicholas Piggin 9b69d48c75 powerpc/64e: remove implicit soft-masking and interrupt exit restart logic
The implicit soft-masking to speed up interrupt return was going to be
used by 64e as well, but it has not been extensively tested on that
platform and is not considered ready. It was intended to be disabled
before merge. Disable it for now.

Most of the restart code is common with 64s, so with more correctness
and performance testing this could be re-enabled again by adding the
extra soft-mask checks to interrupt handlers and flipping
exit_must_hard_disable().

Fixes: 9d1988ca87 ("powerpc/64: treat low kernel text as irqs soft-masked")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210630074621.2109197-4-npiggin@gmail.com
2021-06-30 22:21:20 +10:00
Linus Torvalds 65090f30ab Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "191 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab,
  slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap,
  mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization,
  pagealloc, and memory-failure)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits)
  mm,hwpoison: make get_hwpoison_page() call get_any_page()
  mm,hwpoison: send SIGBUS with error virutal address
  mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes
  mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
  mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM
  mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
  docs: remove description of DISCONTIGMEM
  arch, mm: remove stale mentions of DISCONIGMEM
  mm: remove CONFIG_DISCONTIGMEM
  m68k: remove support for DISCONTIGMEM
  arc: remove support for DISCONTIGMEM
  arc: update comment about HIGHMEM implementation
  alpha: remove DISCONTIGMEM and NUMA
  mm/page_alloc: move free_the_page
  mm/page_alloc: fix counting of managed_pages
  mm/page_alloc: improve memmap_pages dbg msg
  mm: drop SECTION_SHIFT in code comments
  mm/page_alloc: introduce vm.percpu_pagelist_high_fraction
  mm/page_alloc: limit the number of pages on PCP lists when reclaim is active
  mm/page_alloc: scale the number of pages that are batch freed
  ...
2021-06-29 17:29:11 -07:00
Linus Torvalds 21edf50948 Updates for the interrupt subsystem:
Core changes:
 
   - Cleanup and simplification of common code to invoke the low level
     interrupt flow handlers when this invocation requires irqdomain
     resolution. Add the necessary core infrastructure.
 
   - Provide a proper interface for modular PMU drivers to set the
     interrupt affinity.
 
   - Add a request flag which allows to exclude interrupts from spurious
     interrupt detection. Useful especially for IPI handlers which always
     return IRQ_HANDLED which turns the spurious interrupt detection into a
     pointless waste of CPU cycles.
 
 Driver changes:
 
   - Bulk convert interrupt chip drivers to the new irqdomain low level flow
     handler invocation mechanism.
 
   - Add device tree bindings for the Renesas R-Car M3-W+ SoC
 
   - Enable modular build of the Qualcomm PDC driver
 
   - The usual small fixes and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmDbIg8THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobZNEAC2wTq3Ishk026va7g5mbQVSvAQyf8G
 0msmgJ48lJWVL9a6JUogNcCO7sZCTcAy4CYbuHI6kz1fGZZnNWSCrtEz0rFNAdWE
 WVR2k8ExR2R73vJm+K50WUMMj8YsefRnIFXWlJdTp+pksr3TZ7Lo70taGUK/6tMo
 aL0dqvnf7Vb3LG0iIkaHWLF4HnyK/UGqB+121rlL4UhI1/g+3EUxNWNcY5eg/dmc
 Ym73U1uDsjydp3/3jm8v8NYNtwCDGpujZZc/88RFLjP6PMpF1S9JUvDEt+LHJi0a
 cdS3RreB78HYXpLg5NtDFJwIegRMLSitvCGPBjHvWBzbifkMsA2zWIb6Cs8VkYys
 vuPoEGZ0ol+SWvcnSh5Xy36nyr4iGIBhQql47UAaqelSxsYPjvCCSD4yJV3k8hnC
 ZuDscOekXUMn75qZR0quNdi1SkgKpGZxK73QFbuW3Apl5EgArVai6kq0rbl6zlx6
 ACy0SEcevhOcpU6WpqDgrmUBgFr+M8zina8edRELgiFEuWT6pYxKwrN3pT4U5djO
 e5V3YuNzzwzvtUoXN4AiTlT8gwRiGfgeiEvHpvZBXPNvk5ffS6XzPiV81ZMWiBkb
 ReoCbqME3PKoxj1VAHJvVXHbcjiPIJeCRdV+5vQSNh1SPSQOmEdWyJtNUDrSkoym
 QkKKY5jrOhPhlQ==
 =FIKh
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "Updates for the interrupt subsystem:

  Core changes:

   - Cleanup and simplification of common code to invoke the low level
     interrupt flow handlers when this invocation requires irqdomain
     resolution. Add the necessary core infrastructure.

   - Provide a proper interface for modular PMU drivers to set the
     interrupt affinity.

   - Add a request flag which allows to exclude interrupts from spurious
     interrupt detection. Useful especially for IPI handlers which
     always return IRQ_HANDLED which turns the spurious interrupt
     detection into a pointless waste of CPU cycles.

  Driver changes:

   - Bulk convert interrupt chip drivers to the new irqdomain low level
     flow handler invocation mechanism.

   - Add device tree bindings for the Renesas R-Car M3-W+ SoC

   - Enable modular build of the Qualcomm PDC driver

   - The usual small fixes and improvements"

* tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  dt-bindings: interrupt-controller: arm,gic-v3: Describe GICv3 optional properties
  irqchip: gic-pm: Remove redundant error log of clock bulk
  irqchip/sun4i: Remove unnecessary oom message
  irqchip/irq-imx-gpcv2: Remove unnecessary oom message
  irqchip/imgpdc: Remove unnecessary oom message
  irqchip/gic-v3-its: Remove unnecessary oom message
  irqchip/gic-v2m: Remove unnecessary oom message
  irqchip/exynos-combiner: Remove unnecessary oom message
  irqchip: Bulk conversion to generic_handle_domain_irq()
  genirq: Move non-irqdomain handle_domain_irq() handling into ARM's handle_IRQ()
  genirq: Add generic_handle_domain_irq() helper
  irqchip/nvic: Convert from handle_IRQ() to handle_domain_irq()
  irqdesc: Fix __handle_domain_irq() comment
  genirq: Use irq_resolve_mapping() to implement __handle_domain_irq() and co
  irqdomain: Introduce irq_resolve_mapping()
  irqdomain: Protect the linear revmap with RCU
  irqdomain: Cache irq_data instead of a virq number in the revmap
  irqdomain: Use struct_size() helper when allocating irqdomain
  irqdomain: Make normal and nomap irqdomains exclusive
  powerpc: Move the use of irq_domain_add_nomap() behind a config option
  ...
2021-06-29 12:25:04 -07:00
Mike Rapoport a9ee6cf5c6 mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA
configuration options are equivalent.

Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead.

Done with

	$ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \
		$(git grep -wl CONFIG_NEED_MULTIPLE_NODES)
	$ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \
		$(git grep -wl NEED_MULTIPLE_NODES)

with manual tweaks afterwards.

[rppt@linux.ibm.com: fix arm boot crash]
  Link: https://lkml.kernel.org/r/YMj9vHhHOiCVN4BF@linux.ibm.com

Link: https://lkml.kernel.org/r/20210608091316.3622-9-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29 10:53:55 -07:00
Linus Torvalds 36824f198c ARM:
- Add MTE support in guests, complete with tag save/restore interface
 
 - Reduce the impact of CMOs by moving them in the page-table code
 
 - Allow device block mappings at stage-2
 
 - Reduce the footprint of the vmemmap in protected mode
 
 - Support the vGIC on dumb systems such as the Apple M1
 
 - Add selftest infrastructure to support multiple configuration
   and apply that to PMU/non-PMU setups
 
 - Add selftests for the debug architecture
 
 - The usual crop of PMU fixes
 
 PPC:
 
 - Support for the H_RPT_INVALIDATE hypercall
 
 - Conversion of Book3S entry/exit to C
 
 - Bug fixes
 
 S390:
 
 - new HW facilities for guests
 
 - make inline assembly more robust with KASAN and co
 
 x86:
 
 - Allow userspace to handle emulation errors (unknown instructions)
 
 - Lazy allocation of the rmap (host physical -> guest physical address)
 
 - Support for virtualizing TSC scaling on VMX machines
 
 - Optimizations to avoid shattering huge pages at the beginning of live migration
 
 - Support for initializing the PDPTRs without loading them from memory
 
 - Many TLB flushing cleanups
 
 - Refuse to load if two-stage paging is available but NX is not (this has
   been a requirement in practice for over a year)
 
 - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
   CR0/CR4/EFER, using the MMU mode everywhere once it is computed
   from the CPU registers
 
 - Use PM notifier to notify the guest about host suspend or hibernate
 
 - Support for passing arguments to Hyper-V hypercalls using XMM registers
 
 - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap on
   AMD processors
 
 - Hide Hyper-V hypercalls that are not included in the guest CPUID
 
 - Fixes for live migration of virtual machines that use the Hyper-V
   "enlightened VMCS" optimization of nested virtualization
 
 - Bugfixes (not many)
 
 Generic:
 
 - Support for retrieving statistics without debugfs
 
 - Cleanups for the KVM selftests API
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDV9UYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOIRgf/XX8fKLh24RnTOs2ldIu2AfRGVrT4
 QMrr8MxhmtukBAszk2xKvBt8/6gkUjdaIC3xqEnVjxaDaUvZaEtP7CQlF5JV45rn
 iv1zyxUKucXrnIOr+gCioIT7qBlh207zV35ArKioP9Y83cWx9uAs22pfr6g+7RxO
 h8bJZlJbSG6IGr3voANCIb9UyjU1V/l8iEHqRwhmr/A5rARPfD7g8lfMEQeGkzX6
 +/UydX2fumB3tl8e2iMQj6vLVdSOsCkehvpHK+Z33EpkKhan7GwZ2sZ05WmXV/nY
 QLAYfD10KegoNWl5Ay4GTp4hEAIYVrRJCLC+wnLdc0U8udbfCuTC31LK4w==
 =NcRh
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "This covers all architectures (except MIPS) so I don't expect any
  other feature pull requests this merge window.

  ARM:

   - Add MTE support in guests, complete with tag save/restore interface

   - Reduce the impact of CMOs by moving them in the page-table code

   - Allow device block mappings at stage-2

   - Reduce the footprint of the vmemmap in protected mode

   - Support the vGIC on dumb systems such as the Apple M1

   - Add selftest infrastructure to support multiple configuration and
     apply that to PMU/non-PMU setups

   - Add selftests for the debug architecture

   - The usual crop of PMU fixes

  PPC:

   - Support for the H_RPT_INVALIDATE hypercall

   - Conversion of Book3S entry/exit to C

   - Bug fixes

  S390:

   - new HW facilities for guests

   - make inline assembly more robust with KASAN and co

  x86:

   - Allow userspace to handle emulation errors (unknown instructions)

   - Lazy allocation of the rmap (host physical -> guest physical
     address)

   - Support for virtualizing TSC scaling on VMX machines

   - Optimizations to avoid shattering huge pages at the beginning of
     live migration

   - Support for initializing the PDPTRs without loading them from
     memory

   - Many TLB flushing cleanups

   - Refuse to load if two-stage paging is available but NX is not (this
     has been a requirement in practice for over a year)

   - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
     CR0/CR4/EFER, using the MMU mode everywhere once it is computed
     from the CPU registers

   - Use PM notifier to notify the guest about host suspend or hibernate

   - Support for passing arguments to Hyper-V hypercalls using XMM
     registers

   - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap
     on AMD processors

   - Hide Hyper-V hypercalls that are not included in the guest CPUID

   - Fixes for live migration of virtual machines that use the Hyper-V
     "enlightened VMCS" optimization of nested virtualization

   - Bugfixes (not many)

  Generic:

   - Support for retrieving statistics without debugfs

   - Cleanups for the KVM selftests API"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (314 commits)
  KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled
  kvm: x86: disable the narrow guest module parameter on unload
  selftests: kvm: Allows userspace to handle emulation errors.
  kvm: x86: Allow userspace to handle emulation errors
  KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on
  KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault
  KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault
  KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT
  KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic
  KVM: x86: Enhance comments for MMU roles and nested transition trickiness
  KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE
  KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU
  KVM: x86/mmu: Use MMU's role to determine PTTYPE
  KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers
  KVM: x86/mmu: Add a helper to calculate root from role_regs
  KVM: x86/mmu: Add helper to update paging metadata
  KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0
  KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls
  KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper
  KVM: x86/mmu: Get nested MMU's root level from the MMU's role
  ...
2021-06-28 15:40:51 -07:00
Linus Torvalds a15286c63d Locking changes for this cycle:
- Core locking & atomics:
 
      - Convert all architectures to ARCH_ATOMIC: move every
        architecture to ARCH_ATOMIC, then get rid of ARCH_ATOMIC
        and all the transitory facilities and #ifdefs.
 
        Much reduction in complexity from that series:
 
            63 files changed, 756 insertions(+), 4094 deletions(-)
 
      - Self-test enhancements
 
  - Futexes:
 
      - Add the new FUTEX_LOCK_PI2 ABI, which is a variant that
        doesn't set FLAGS_CLOCKRT (.e. uses CLOCK_MONOTONIC).
 
        [ The temptation to repurpose FUTEX_LOCK_PI's implicit
          setting of FLAGS_CLOCKRT & invert the flag's meaning
          to avoid having to introduce a new variant was
          resisted successfully. ]
 
      - Enhance futex self-tests
 
  - Lockdep:
 
      - Fix dependency path printouts
      - Optimize trace saving
      - Broaden & fix wait-context checks
 
  - Misc cleanups and fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZaEYRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hPdxAAiNCsxL6X1cZ8zqbWsvLefT9Zqhzgs5u6
 gdZele7PNibvbYdON26b5RUzuKfOW/hgyX6LKqr+AiNYTT9PGhcY+tycUr2PGk5R
 LMyhJWmmX5cUVPU92ky+z5hEHB2gr4XPJcvgpKKUL0XB1tBaSvy2DtgwPuhXOoT1
 1sCQfy63t71snt2RfEnibVW6xovwaA2lsqL81lLHJN4iRFWvqO498/m4+PWkylsm
 ig/+VT1Oz7t4wqu3NhTqNNZv+4K4W2asniyo53Dg2BnRm/NjhJtgg4jRibrb0ssb
 67Xdq6y8+xNBmEAKj+Re8VpMcu4aj346Ctk7d4gst2ah/Rc0TvqfH6mezH7oq7RL
 hmOrMBWtwQfKhEE/fDkng30nrVxc/98YXP0n2rCCa0ySsaF6b6T185mTcYDRDxFs
 BVNS58ub+zxrF9Zd4nhIHKaEHiL2ZdDimqAicXN0RpywjIzTQ/y11uU7I1WBsKkq
 WkPYs+FPHnX7aBv1MsuxHhb8sUXjG924K4JeqnjF45jC3sC1crX+N0jv4wHw+89V
 h4k20s2Tw6m5XGXlgGwMJh0PCcD6X22Vd9Uyw8zb+IJfvNTGR9Rp1Ec+1gMRSll+
 xsn6G6Uy9bcNU0SqKlBSfelweGKn4ZxbEPn76Jc8KWLiepuZ6vv5PBoOuaujWht9
 KAeOC5XdjMk=
 =tH//
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - Core locking & atomics:

     - Convert all architectures to ARCH_ATOMIC: move every architecture
       to ARCH_ATOMIC, then get rid of ARCH_ATOMIC and all the
       transitory facilities and #ifdefs.

       Much reduction in complexity from that series:

           63 files changed, 756 insertions(+), 4094 deletions(-)

     - Self-test enhancements

 - Futexes:

     - Add the new FUTEX_LOCK_PI2 ABI, which is a variant that doesn't
       set FLAGS_CLOCKRT (.e. uses CLOCK_MONOTONIC).

       [ The temptation to repurpose FUTEX_LOCK_PI's implicit setting of
         FLAGS_CLOCKRT & invert the flag's meaning to avoid having to
         introduce a new variant was resisted successfully. ]

     - Enhance futex self-tests

 - Lockdep:

     - Fix dependency path printouts

     - Optimize trace saving

     - Broaden & fix wait-context checks

 - Misc cleanups and fixes.

* tag 'locking-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  locking/lockdep: Correct the description error for check_redundant()
  futex: Provide FUTEX_LOCK_PI2 to support clock selection
  futex: Prepare futex_lock_pi() for runtime clock selection
  lockdep/selftest: Remove wait-type RCU_CALLBACK tests
  lockdep/selftests: Fix selftests vs PROVE_RAW_LOCK_NESTING
  lockdep: Fix wait-type for empty stack
  locking/selftests: Add a selftest for check_irq_usage()
  lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()
  locking/lockdep: Remove the unnecessary trace saving
  locking/lockdep: Fix the dep path printing for backwards BFS
  selftests: futex: Add futex compare requeue test
  selftests: futex: Add futex wait test
  seqlock: Remove trailing semicolon in macros
  locking/lockdep: Reduce LOCKDEP dependency list
  locking/lockdep,doc: Improve readability of the block matrix
  locking/atomics: atomic-instrumented: simplify ifdeffery
  locking/atomic: delete !ARCH_ATOMIC remnants
  locking/atomic: xtensa: move to ARCH_ATOMIC
  locking/atomic: sparc: move to ARCH_ATOMIC
  locking/atomic: sh: move to ARCH_ATOMIC
  ...
2021-06-28 11:45:29 -07:00
Paolo Bonzini b8917b4ae4 KVM/arm64 updates for v5.14.
- Add MTE support in guests, complete with tag save/restore interface
 - Reduce the impact of CMOs by moving them in the page-table code
 - Allow device block mappings at stage-2
 - Reduce the footprint of the vmemmap in protected mode
 - Support the vGIC on dumb systems such as the Apple M1
 - Add selftest infrastructure to support multiple configuration
   and apply that to PMU/non-PMU setups
 - Add selftests for the debug architecture
 - The usual crop of PMU fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmDV2bEPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDEr8P/ivwROx5NwGcHGmU5RfUCT3aFqhtVHHwD/lu
 jPcgoO61kz9TelOu6QRaVuK+mVHxcq3iP4R8nPq/QCkUlEXTmK2xkyhXhGXSYpH4
 6jM8+BbC3eG7iAxx6H0UM4JTl4Riwat6ZZtXpWEWs9TKqOHOQYFpMkxSttwVZ1CZ
 SjbtFvXLEdzKn6PzUWnKdBNMV/mHsdAtohZit9oJOc4ttc8072XxETQ4TFQ+MSvA
 j9zY9QPmWzgcZnotqRRu9sbTGO2vxtXuUtY3sjdD8+C9OgSe9qvpnNjymcmfwaMu
 1fBkfh65oaO4ItJBdGOUOoEcFqwN5imPiI7CB/O+ZYkO9sBCuTUPSQwPkyiwXb9r
 bUkTaQw2nZiNWsqR1x07fQ2sGYbMp5mnmgmqiV4MUWkLmFp9LZATCWYTTn24cBNS
 6SjVP6/8S0r3EhLnYjH0Pn1we5PooU1EF6RlCAd3ewYoo+9fPnwjNYwIWH5i5wB7
 +tnei44NACAw9cfbos+BYQQ/dY15OSFzLzIMomlabB7OpXOdDg3H6tJnPbFwWwXb
 9nF8XdHqxeDVVVrDCAx1BSodSXm9xqgnQM2RDGTUnpVcAfqAr3MXX6VsyKQDzj8T
 QXF9qOVCBAABv6BXAvSQ6mvMJZDUVbUPEPhf7kXzF46JsRd6A7wWoU/OnMGHQ/w7
 wjvH8HVy
 =fWBV
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for v5.14.

- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
  and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
2021-06-25 11:24:24 -04:00
Christophe Leroy cae4644673 powerpc/ptrace: Refactor regs_set_return_{msr/ip}
regs_set_return_msr() and regs_set_return_ip() have a copy
of the code of set_return_regs_changed().

Call the later instead.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/baf64a91557d3811c155616a6aa23ed7b3b21da4.1624619582.git.christophe.leroy@csgroup.eu
2021-06-26 00:12:39 +10:00
Christophe Leroy 5f0f95f1e1 powerpc/ptrace: Move set_return_regs_changed() before regs_set_return_{msr/ip}
regs_set_return_msr() and regs_set_return_ip() have a copy
of the code of set_return_regs_changed().

Move up set_return_regs_changed() so it can be reused by
regs_set_return_{msr/ip}

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/49f4fb051a3e1cb69f7305d5b6768aec14727c32.1624619582.git.christophe.leroy@csgroup.eu
2021-06-26 00:12:39 +10:00
Michael Ellerman 9583922563 powerpc: Fix is_kvm_guest() / kvm_para_available()
Commit a21d1becaa ("powerpc: Reintroduce is_kvm_guest() as a fast-path
check") added is_kvm_guest() and changed kvm_para_available() to use it.

is_kvm_guest() checks a static key, kvm_guest, and that static key is
set in check_kvm_guest().

The problem is check_kvm_guest() is only called on pseries, and even
then only in some configurations. That means is_kvm_guest() always
returns false on all non-pseries and some pseries depending on
configuration. That's a bug.

For PR KVM guests this is noticable because they no longer do live
patching of themselves, which can be detected by the omission of a
message in dmesg such as:

  KVM: Live patching for a fast VM worked

To fix it make check_kvm_guest() an initcall, to ensure it's always
called at boot. It needs to be core so that it runs before
kvm_guest_init() which is postcore. To be an initcall it needs to return
int, where 0 means success, so update that.

We still call it manually in pSeries_smp_probe(), because that runs
before init calls are run.

Fixes: a21d1becaa ("powerpc: Reintroduce is_kvm_guest() as a fast-path check")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210623130514.2543232-1-mpe@ellerman.id.au
2021-06-25 14:47:19 +10:00
Naveen N. Rao 0566fa760d powerpc/kprobes: Roll IS_RFI() macro into IS_RFID()
In kprobes and xmon, we should exclude both 32-bit and 64-bit variants
of mtmsr and rfi instructions from being stepped. Have IS_RFID() also
detect a rfi instruction similar to IS_MTMSRD().

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eee32e1b75dae85d471c89b4c0a123ad4b0aabf8.1621416666.git.naveen.n.rao@linux.vnet.ibm.com
2021-06-25 14:47:18 +10:00
Vaibhav Jain de21e1377c powerpc/papr_scm: Add support for reporting dirty-shutdown-count
Persistent memory devices like NVDIMMs can loose cached writes in case
something prevents flush on power-fail. Such situations are termed as
dirty shutdown and are exposed to applications as
last-shutdown-state (LSS) flag and a dirty-shutdown-counter(DSC) as
described at [1]. The latter being useful in conditions where multiple
applications want to detect a dirty shutdown event without racing with
one another.

PAPR-NVDIMMs have so far only exposed LSS style flags to indicate a
dirty-shutdown-state. This patch further adds support for DSC via the
"ibm,persistence-failed-count" device tree property of an NVDIMM. This
property is a monotonic increasing 64-bit counter thats an indication
of number of times an NVDIMM has encountered a dirty-shutdown event
causing persistence loss.

Since this value is not expected to change after system-boot hence
papr_scm reads & caches its value during NVDIMM probe and exposes it
as a PAPR sysfs attributed named 'dirty_shutdown' to match the name of
similarly named NFIT sysfs attribute. Also this value is available to
libnvdimm via PAPR_PDSM_HEALTH payload. 'struct nd_papr_pdsm_health'
has been extended to add a new member called 'dimm_dsc' presence of
which is indicated by the newly introduced PDSM_DIMM_DSC_VALID flag.

References:
[1] https://pmem.io/documents/Dirty_Shutdown_Handling-V1.0.pdf

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210624080621.252038-1-vaibhav@linux.ibm.com
2021-06-25 14:47:18 +10:00
Jing Zhang 0193cc908b KVM: stats: Separate generic stats from architecture specific ones
Generic KVM stats are those collected in architecture independent code
or those supported by all architectures; put all generic statistics in
a separate structure.  This ensures that they are defined the same way
in the statistics API which is being added, removing duplication among
different architectures in the declaration of the descriptors.

No functional change intended.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210618222709.1858088-2-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24 11:47:56 -04:00
Christophe Leroy 56afad8852 powerpc: Remove klimit
klimit is a global variable initialised at build time with the
value of _end.

This variable is never modified, so _end symbol can be used directly.

Remove klimit.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9fa9ba6807c17f93f35a582c199c646c4a8bfd9c.1622800638.git.christophe.leroy@csgroup.eu
2021-06-25 00:07:10 +10:00
Nicholas Piggin 13799748b9 powerpc/64: use interrupt restart table to speed up return from interrupt
Use the restart table facility to return from interrupt or system calls
without disabling MSR[EE] or MSR[RI].

Interrupt return asm is put into the low soft-masked region, to prevent
interrupts being processed here, although they are still taken as masked
interrupts which causes SRRs to be clobbered, and a pending soft-masked
interrupt to require replaying.

The return code uses restart table regions to redirct to a fixup handler
rather than continue with the exit, if such an interrupt happens. In
this case the interrupt return is redirected to a fixup handler which
reloads r1 for the interrupt stack and reloads registers and sets state
up to replay the soft-masked interrupt and try the exit again.

Some types of security exit fallback flushes and barriers are currently
unable to cope with reentrant interrupts, e.g., because they store some
state in the scratch SPR which would be clobbered even by masked
interrupts. For now the interrupts-enabled exits are disabled when these
flushes are used.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Guard unused exit_must_hard_disable() as reported by lkp]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210617155116.2167984-13-npiggin@gmail.com
2021-06-25 00:06:56 +10:00
Nicholas Piggin 9d1988ca87 powerpc/64: treat low kernel text as irqs soft-masked
Treat code below __end_soft_masked as soft-masked for the purpose
of alternate return. 64s already mostly does this for scv entry.

This will be used to exit from interrupts without disabling MSR[EE].

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210617155116.2167984-12-npiggin@gmail.com
2021-06-25 00:06:56 +10:00
Nicholas Piggin f23699c93b powerpc/64: allow alternate return locations for soft-masked interrupts
The exception table fixup adjusts a failed page fault's interrupt return
location if it was taken at an address specified in the exception table,
to a corresponding fixup handler address.

Introduce a variation of that idea which adds a fixup table for NMIs and
soft-masked asynchronous interrupts. This will be used to protect
certain critical sections that are sensitive to being clobbered by
interrupts coming in (due to using the same SPRs and/or irq soft-mask
state).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210617155116.2167984-10-npiggin@gmail.com
2021-06-25 00:06:56 +10:00
Nicholas Piggin e754f4d13e powerpc/64: move interrupt return asm to interrupt_64.S
The next patch would like to move interrupt return assembly code to a low
location before general text, so move it into its own file and include via
head_64.S

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210617155116.2167984-7-npiggin@gmail.com
2021-06-25 00:06:55 +10:00
Nicholas Piggin 59dc5bfca0 powerpc/64s: avoid reloading (H)SRR registers if they are still valid
When an interrupt is taken, the SRR registers are set to return to where
it left off. Unless they are modified in the meantime, or the return
address or MSR are modified, there is no need to reload these registers
when returning from interrupt.

Introduce per-CPU flags that track the validity of SRR and HSRR
registers. These are cleared when returning from interrupt, when
using the registers for something else (e.g., OPAL calls), when
adjusting the return address or MSR of a context, and when context
switching (which changes the return address and MSR).

This improves the performance of interrupt returns.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fold in fixup patch from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210617155116.2167984-5-npiggin@gmail.com
2021-06-25 00:06:55 +10:00
Nicholas Piggin bf9155f197 powerpc: remove interrupt exit helpers unused argument
The msr argument is not used, remove it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210617155116.2167984-3-npiggin@gmail.com
2021-06-25 00:06:55 +10:00
Paolo Bonzini c3ab0e28a4 Merge branch 'topic/ppc-kvm' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into HEAD
- Support for the H_RPT_INVALIDATE hypercall

- Conversion of Book3S entry/exit to C

- Bug fixes
2021-06-23 07:30:41 -04:00
Michael Ellerman a736143afd Merge branch 'topic/ppc-kvm' into next
Pull in some more ppc KVM patches we are keeping in our topic branch.

In particular this brings in the series to add H_RPT_INVALIDATE.
2021-06-23 00:19:08 +10:00
Bharata B Rao 53324b51c5 KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE
Enable support for process-scoped invalidations from nested
guests and partition-scoped invalidations for nested guests.

Process-scoped invalidations for any level of nested guests
are handled by implementing H_RPT_INVALIDATE handler in the
nested guest exit path in L0.

Partition-scoped invalidation requests are forwarded to the
right nested guest, handled there and passed down to L0
for eventual handling.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
[aneesh: Nested guest partition-scoped invalidation changes]
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Squash in fixup patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210621085003.904767-5-bharata@linux.ibm.com
2021-06-22 23:35:37 +10:00
Bharata B Rao f0c6fbbb90 KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
H_RPT_INVALIDATE does two types of TLB invalidations:

1. Process-scoped invalidations for guests when LPCR[GTSE]=0.
   This is currently not used in KVM as GTSE is not usually
   disabled in KVM.
2. Partition-scoped invalidations that an L1 hypervisor does on
   behalf of an L2 guest. This is currently handled
   by H_TLB_INVALIDATE hcall and this new replaces the old that.

This commit enables process-scoped invalidations for L1 guests.
Support for process-scoped and partition-scoped invalidations
from/for nested guests will be added separately.

Process scoped tlbie invalidations from L1 and nested guests
need RS register for TLBIE instruction to contain both PID and
LPID.  This patch introduces primitives that execute tlbie
instruction with both PID and LPID set in prepartion for
H_RPT_INVALIDATE hcall.

A description of H_RPT_INVALIDATE follows:

int64   /* H_Success: Return code on successful completion */
        /* H_Busy - repeat the call with the same */
        /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid
	   parameters */
hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT
					translation
					lookaside information */
      uint64 id,        /* PID/LPID to invalidate */
      uint64 target,    /* Invalidation target */
      uint64 type,      /* Type of lookaside information */
      uint64 pg_sizes,  /* Page sizes */
      uint64 start,     /* Start of Effective Address (EA)
			   range (inclusive) */
      uint64 end)       /* End of EA range (exclusive) */

Invalidation targets (target)
-----------------------------
Core MMU        0x01 /* All virtual processors in the
			partition */
Core local MMU  0x02 /* Current virtual processor */
Nest MMU        0x04 /* All nest/accelerator agents
			in use by the partition */

A combination of the above can be specified,
except core and core local.

Type of translation to invalidate (type)
---------------------------------------
NESTED       0x0001  /* invalidate nested guest partition-scope */
TLB          0x0002  /* Invalidate TLB */
PWC          0x0004  /* Invalidate Page Walk Cache */
PRT          0x0008  /* Invalidate caching of Process Table
			Entries if NESTED is clear */
PAT          0x0008  /* Invalidate caching of Partition Table
			Entries if NESTED is set */

A combination of the above can be specified.

Page size mask (pages)
----------------------
4K              0x01
64K             0x02
2M              0x04
1G              0x08
All sizes       (-1UL)

A combination of the above can be specified.
All page sizes can be selected with -1.

Semantics: Invalidate radix tree lookaside information
           matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters
  are different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
  LPID allocated to this partition
* Return H_P5 if (start, end) doesn't form a valid range. Start and
  end should be a valid Quadrant address and  end > start.
* Return H_NotSupported if the partition is not in running in radix
  translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid
  addresses. Else start and end should be aligned to 4kB (lower 11
  bits clear).
* If NESTED is clear, then invalidate process scoped lookaside
  information. Else pid specifies a nested LPID, and the invalidation
  is performed   on nested guest partition table and nested guest
  partition scope real addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3
  and quadrant 0 spaces, Else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
  Those which are partially covered are considered outside
  invalidation range, which allows a caller to optimally invalidate
  ranges that may   contain mixed page sizes.
* Return H_SUCCESS on success.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210621085003.904767-4-bharata@linux.ibm.com
2021-06-21 22:54:27 +10:00
Bharata B Rao d6265cb33b powerpc/book3s64/radix: Add H_RPT_INVALIDATE pgsize encodings to mmu_psize_def
Add a field to mmu_psize_def to store the page size encodings
of H_RPT_INVALIDATE hcall. Initialize this while scanning the radix
AP encodings. This will be used when invalidating with required
page size encoding in the hcall.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210621085003.904767-3-bharata@linux.ibm.com
2021-06-21 22:48:18 +10:00
Aneesh Kumar K.V f09216a190 KVM: PPC: Book3S HV: Fix comments of H_RPT_INVALIDATE arguments
The type values H_RPTI_TYPE_PRT and H_RPTI_TYPE_PAT indicate
invalidating the caching of process and partition scoped entries
respectively.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210621085003.904767-2-bharata@linux.ibm.com
2021-06-21 22:48:18 +10:00
Benjamin Herrenschmidt aa9c5adf2f powerpc/xics: Add a native ICS backend for microwatt
This is a simple native ICS backend that matches the layout of
the Microwatt implementation of ICS.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
[mpe: Add empty ics_native_init() to unbreak non-microwatt builds]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

fixup-ics
Link: https://lore.kernel.org/r/YMwW8cxrwB2W5EUN@thinks.paulus.ozlabs.org
2021-06-21 21:15:58 +10:00
Christophe Leroy 4d1755b6a7 powerpc/mm: implement set_memory_attr()
In addition to the set_memory_xx() functions which allows to change
the memory attributes of not (yet) used memory regions, implement a
set_memory_attr() function to:
- set the final memory protection after init on currently used
kernel regions.
- enable/disable kernel memory regions in the scope of DEBUG_PAGEALLOC.

Unlike the set_memory_xx() which can act in three step as the regions
are unused, this function must modify 'on the fly' as the kernel is
executing from them. At the moment only PPC32 will use it and changing
page attributes on the fly is not an issue.

Reported-by: kbuild test robot <lkp@intel.com>
[ruscur: cast "data" to unsigned long instead of int]
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210609013431.9805-9-jniethe5@gmail.com
2021-06-21 21:13:21 +10:00
Jordan Niethe 4fcc636615 powerpc/modules: Make module_alloc() Strict Module RWX aware
Make module_alloc() use PAGE_KERNEL protections instead of
PAGE_KERNEL_EXEX if Strict Module RWX is enabled.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210609013431.9805-4-jniethe5@gmail.com
2021-06-21 21:13:20 +10:00
Jordan Niethe 71a5b3db9f powerpc/lib/code-patching: Set up Strict RWX patching earlier
setup_text_poke_area() is a late init call so it runs before
mark_rodata_ro() and after the init calls. This lets all the init code
patching simply write to their locations. In the future, kprobes is
going to allocate its instruction pages RO which means they will need
setup_text__poke_area() to have been already called for their code
patching. However, init_kprobes() (which allocates and patches some
instruction pages) is an early init call so it happens before
setup_text__poke_area().

start_kernel() calls poking_init() before any of the init calls. On
powerpc, poking_init() is currently a nop. setup_text_poke_area() relies
on kernel virtual memory, cpu hotplug and per_cpu_areas being setup.
setup_per_cpu_areas(), boot_cpu_hotplug_init() and mm_init() are called
before poking_init().

Turn setup_text_poke_area() into poking_init().

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Russell Currey <ruscur@russell.cc>
[mpe: Fold in missing prototype for poking_init() from lkp]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210609013431.9805-3-jniethe5@gmail.com
2021-06-21 21:13:20 +10:00
Russell Currey 1f9ad21c3b powerpc/mm: Implement set_memory() routines
The set_memory_{ro/rw/nx/x}() functions are required for
STRICT_MODULE_RWX, and are generally useful primitives to have.  This
implementation is designed to be generic across powerpc's many MMUs.
It's possible that this could be optimised to be faster for specific
MMUs.

This implementation does not handle cases where the caller is attempting
to change the mapping of the page it is executing from, or if another
CPU is concurrently using the page being altered.  These cases likely
shouldn't happen, but a more complex implementation with MMU-specific code
could safely handle them.

On hash, the linear mapping is not kept in the linux pagetable, so this
will not change the protection if used on that range. Currently these
functions are not used on the linear map so just WARN for now.

apply_to_existing_page_range() does not work on huge pages so for now
disallow changing the protection of huge pages.

[jpn: - Allow set memory functions to be used without Strict RWX
      - Hash: Disallow certain regions
      - Have change_page_attr() take function pointers to manipulate ptes
      - Radix: Add ptesync after set_pte_at()]

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210609013431.9805-2-jniethe5@gmail.com
2021-06-21 21:13:20 +10:00
Nicholas Piggin 393eff5a7b powerpc/pesries: Get STF barrier requirement from H_GET_CPU_CHARACTERISTICS
This allows the hypervisor / firmware to describe this workarounds to
the guest.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503130243.891868-4-npiggin@gmail.com
2021-06-21 21:13:19 +10:00
Nicholas Piggin 84ed26fd00 powerpc/security: Add a security feature for STF barrier
Rather than tying this mitigation to RFI L1D flush requirement, add a
new bit for it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503130243.891868-3-npiggin@gmail.com
2021-06-21 21:13:19 +10:00
Nicholas Piggin 65c7d07085 powerpc/pseries: Get entry and uaccess flush required bits from H_GET_CPU_CHARACTERISTICS
This allows the hypervisor / firmware to describe these workarounds to
the guest.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503130243.891868-2-npiggin@gmail.com
2021-06-21 21:13:19 +10:00
Suraj Jitindar Singh 77bbbc0cf8 KVM: PPC: Book3S HV: Fix TLB management on SMT8 POWER9 and POWER10 processors
The POWER9 vCPU TLB management code assumes all threads in a core share
a TLB, and that TLBIEL execued by one thread will invalidate TLBs for
all threads. This is not the case for SMT8 capable POWER9 and POWER10
(big core) processors, where the TLB is split between groups of threads.
This results in TLB multi-hits, random data corruption, etc.

Fix this by introducing cpu_first_tlb_thread_sibling etc., to determine
which siblings share TLBs, and use that in the guest TLB flushing code.

[npiggin@gmail.com: add changelog and comment]

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210602040441.3984352-1-npiggin@gmail.com
2021-06-21 09:22:34 +10:00
Haren Myneni b22f2d88e4 powerpc/pseries/vas: Integrate API with open/close windows
This patch adds VAS window allocatioa/close with the corresponding
hcalls. Also changes to integrate with the existing user space VAS
API and provide register/unregister functions to NX pseries driver.

The driver register function is used to create the user space
interface (/dev/crypto/nx-gzip) and unregister to remove this entry.

The user space process opens this device node and makes an ioctl
to allocate VAS window. The close interface is used to deallocate
window.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e8d956bace3f182c4d2e66e343ff37cb0391d1fd.camel@linux.ibm.com
2021-06-20 21:58:57 +10:00
Haren Myneni 540761b7f5 powerpc/vas: Define QoS credit flag to allocate window
PowerVM introduces two different type of credits: Default and Quality
of service (QoS).

The total number of default credits available on each LPAR depends
on CPU resources configured. But these credits can be shared or
over-committed across LPARs in shared mode which can result in
paste command failure (RMA_busy). To avoid NX HW contention, the
hypervisor ntroduces QoS credit type which makes sure guaranteed
access to NX esources. The system admins can assign QoS credits
or each LPAR via HMC.

Default credit type is used to allocate a VAS window by default as
on PowerVM implementation. But the process can pass
VAS_TX_WIN_FLAG_QOS_CREDIT flag with VAS_TX_WIN_OPEN ioctl to open
QoS type window.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aa950b7b8e8077364267720274a7b9ec34e76e73.camel@linux.ibm.com
2021-06-20 21:58:56 +10:00
Haren Myneni 8f3a6c9280 powerpc/pseries/vas: Define VAS/NXGZIP hcalls and structs
This patch adds hcalls and other definitions. Also define structs
that are used in VAS implementation on PowerVM.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b4b8c594c27ee4aa6be9dc6dc4ee7331571cbbe8.camel@linux.ibm.com
2021-06-20 21:58:56 +10:00
Haren Myneni 7bc6f71bdf powerpc/vas: Define and use common vas_window struct
Many elements in vas_struct are used on PowerNV and PowerVM
platforms. vas_window is used for both TX and RX windows on
PowerNV and for TX windows on PowerVM. So some elements are
specific to these platforms.

So this patch defines common vas_window and platform
specific window structs (pnv_vas_window on PowerNV). Also adds
the corresponding changes in PowerNV vas code.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1698c35c158dfe52c6d2166667823d3d4a463353.camel@linux.ibm.com
2021-06-20 21:58:56 +10:00
Haren Myneni 3b26797350 powerpc/vas: Move update_csb/dump_crb to common book3s platform
If a coprocessor encounters an error translating an address, the
VAS will cause an interrupt in the host. The kernel processes
the fault by updating CSB. This functionality is same for both
powerNV and pseries. So this patch moves these functions to
common vas-api.c and the actual functionality is not changed.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bf8d5b0770fa1ef5cba88c96580caa08d999d3b5.camel@linux.ibm.com
2021-06-20 21:58:56 +10:00
Haren Myneni 3856aa542d powerpc/vas: Create take/drop pid and mm reference functions
Take pid and mm references when each window opens and drops during
close. This functionality is needed for powerNV and pseries. So
this patch defines the existing code as functions in common book3s
platform vas-api.c

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2fa40df962250a737c804e58202924717b39e381.camel@linux.ibm.com
2021-06-20 21:58:56 +10:00
Haren Myneni 1a0d0d5ed5 powerpc/vas: Add platform specific user window operations
PowerNV uses registers to open/close VAS windows, and getting the
paste address. Whereas the hypervisor calls are used on PowerVM.

This patch adds the platform specific user space window operations
and register with the common VAS user space interface.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f85091f4ace67f951ac04d60394d67b21e2f5d3c.camel@linux.ibm.com
2021-06-20 21:58:55 +10:00
Haren Myneni 06c6fad9bf powerpc/powernv/vas: Rename register/unregister functions
powerNV and pseries drivers register / unregister to the corresponding
platform specific VAS separately. Then these VAS functions call the
common API with the specific window operations. So rename powerNV VAS
API register/unregister functions.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9db00d58dbdcb7cfc07a1df95f3d2a9e3e5d746a.camel@linux.ibm.com
2021-06-20 21:58:55 +10:00
Haren Myneni 413d6ed3ea powerpc/vas: Move VAS API to book3s common platform
The pseries platform will share vas and nx code and interfaces
with the PowerNV platform, so create the
arch/powerpc/platforms/book3s/ directory and move VAS API code
there. Functionality is not changed.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e05c8db17b9eabe3545b902d034238e4c6c08180.camel@linux.ibm.com
2021-06-20 21:58:55 +10:00
Linus Torvalds b84a7c286c powerpc fixes for 5.13 #6
Fix initrd corruption caused by our recent change to use relative jump labels.
 
 Fix a crash using perf record on systems without a hardware PMU backend.
 
 Rework our 64-bit signal handling slighty to make it more closely match the old behaviour,
 after the recent change to use unsafe user accessors.
 
 Thanks to: Anastasia Kovaleva, Athira Rajeev, Christophe Leroy, Daniel Axtens, Greg Kurz,
 Roman Bolshakov.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmDOeA0THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgHZgD/9NbskRdhx9Vj+lWBCa8K37Cckf+aYu
 bQxszcVDA65xwhASk9CotSy6NC1HgyxB7n3VO7FbCku50JNapT85/Onl07R/Aiiv
 PhHOuDs5Gj8hB8rdpxYQjas3C2XW/UJR6ebogMcNxf4BN6fjHHoLbGmig1o+X2Jm
 rd9l5dWiiK27o8McqCsTESW1VKVtjov7owX7xh/HW/U6hbDkuLdVyMSViaisIwi2
 I1wfzmMWcN8JpBUv1G7pWFuKgatsTfr2p3bsdVFmPl3LjUXXcyJ8zQS5yoV6uD5F
 laFx/BG1T06y1ny1yvEL3sTlNHE0tQPF+FVZ75hYmPKnE7tjo5rxeGFiul4RWo5q
 oiSVDOrjt2urNeRVv10oSCUgs2epRosIaTqXJx1JyK/yYF2oI4FvqkTMBjPxeGot
 ZHqFV8QNm19ZxlMtvzNSpagp6FX8kEOVxJaGersfmzBhSZudcR/VxX0086rWguw4
 RA+0+qY6KGBi+qxylmyvzpJjsp59houykDNhhbED7tpQJACex7JMDJiiH88VP8l3
 vf9h1z+NbBoiR3y0a/uv1nDTMLsTQ5PbcnNbZr9u+Oc5vwu8DP1gCq41lm6BOMbz
 F6IxdcOqBHn3HGM11ZtAt5u6ep3ZDfPx8lMth1z3kaAXjm3nlDAJPC4N//p6vz+a
 IzL1Iv0r2X7qvQ==
 =Qrh9
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix initrd corruption caused by our recent change to use relative jump
  labels.

  Fix a crash using perf record on systems without a hardware PMU
  backend.

  Rework our 64-bit signal handling slighty to make it more closely
  match the old behaviour, after the recent change to use unsafe user
  accessors.

  Thanks to Anastasia Kovaleva, Athira Rajeev, Christophe Leroy, Daniel
  Axtens, Greg Kurz, and Roman Bolshakov"

* tag 'powerpc-5.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/perf: Fix crash in perf_instruction_pointer() when ppmu is not set
  powerpc: Fix initrd corruption with relative jump labels
  powerpc/signal64: Copy siginfo before changing regs->nip
  powerpc/mem: Add back missing header to fix 'no previous prototype' error
2021-06-19 16:50:23 -07:00
Paolo Bonzini e3cb6fa0e2 KVM: switch per-VM stats to u64
Make them the same type as vCPU stats.  There is no reason
to limit the counters to unsigned long.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17 14:25:27 -04:00
Michael Ellerman 3c53642324 Merge branch 'topic/ppc-kvm' into next
Merge some powerpc KVM patches from our topic branch.

In particular this brings in Nick's big series rewriting parts of the
guest entry/exit path in C.

Conflicts:
	arch/powerpc/kernel/security.c
	arch/powerpc/kvm/book3s_hv_rmhandlers.S
2021-06-17 16:51:38 +10:00
Sudeep Holla db8f7066dc powerpc/64: drop redundant defination of spin_until_cond
linux/processor.h has exactly same defination for spin_until_cond.
Drop the redundant defination in asm/processor.h

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1fff2054e5dfc00329804dbd3f2a91667c9a8aff.1623438544.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:11 +10:00
Christophe Leroy ab3aab292c powerpc: Move update_power8_hid0() into its only user
update_power8_hid0() is used only by powernv platform subcore.c

Move it there.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/37f41d74faa0c66f90b373e243e8b1ee37a1f6fa.1623219019.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:11 +10:00
Christophe Leroy 77b0bed742 powerpc: Remove proc_trap()
proc_trap() has never been used, remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/827944ea12d470c2f862635f48b5ee6c1520351f.1623217909.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:10 +10:00
Christophe Leroy e72421a085 powerpc: Define swapper_pg_dir[] in C
Don't duplicate swapper_pg_dir[] in each platform's head.S

Define it in mm/pgtable.c

Define MAX_PTRS_PER_PGD because on book3s/64 PTRS_PER_PGD is
not a constant.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5e3f1b8a4695c33ccc80aa3870e016bef32b85e1.1623063174.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:10 +10:00
Christophe Leroy a56ab7c729 powerpc/nohash: Convert set_context() to C
ppc8xx already has set_context() in C.

Other ones have it in assembly. The only thing it does is to
write the context id into SPRN_PID.

Do it in C.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a5d0759064f3831c6b88af49ef5d3b05ba1c4dad.1622712515.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:09 +10:00
Christophe Leroy 240efd717c powerpc/kuap: Force inlining of all first level KUAP helpers.
All KUAP helpers defined in asm/kup.h are single line functions
that should be inlined. But on book3s/32 build, we get many
instances of <prevent_write_to_user.constprop.0>.

Force inlining of those helpers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8479a862e165a57a855292d47e24c259a578f5a0.1622711627.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:09 +10:00
Christophe Leroy cb2f1fb205 powerpc/kuap: Remove to/from/size parameters of prevent_user_access()
prevent_user_access() doesn't use anymore to/from/size parameters.

Remove them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b7113662fd2c26e4c33e9d705de324bd3860822e.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:09 +10:00
Christophe Leroy d008f8f8a0 powerpc/kuap: Remove KUAP_CURRENT_XXX
book3s/32 was the only user of KUAP_CURRENT_XXX.

After rework of book3s/32 KUAP, it is not used anymore.

Remove them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/549214ecf6887d965645e664520d4886663c5ffb.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:09 +10:00
Christophe Leroy 16132529ce powerpc/32s: Rework Kernel Userspace Access Protection
On book3s/32, KUAP is provided by toggling Ks bit in segment registers.
One segment register addresses 256M of virtual memory.

At the time being, KUAP implements a complex logic to apply the
unlock/lock on the exact number of segments covering the user range
to access, with saving the boundaries of the range of segments in
a member of thread struct.

But most if not all user accesses are within a single segment.

Rework KUAP with a different approach:
- Open only one segment, the one corresponding to the starting
address of the range to be accessed.
- If a second segment is involved, it will generate a page fault. The
segment will then be open by the page fault handler.

The kuap member of thread struct will now contain:
- The start address of the current on going user access, that will be
used to know which segment to lock at the end of the user access.
- ~0 when no user access is open
- ~1 when additionnal segments are opened by a page fault.

Then, at lock time
- When only one segment is open, close it.
- When several segments are open, close all user segments.

Almost 100% of the time, only one segment will be involved.

In interrupts, inline the function that unlock/lock all segments,
because not inlining them implies a lot of register save/restore.

With the patch, writing value 128 in userspace in perf_copy_attr() is
done with 16 instructions:

    3890:	93 82 04 dc 	stw     r28,1244(r2)
    3894:	7d 20 e5 26 	mfsrin  r9,r28
    3898:	55 29 00 80 	rlwinm  r9,r9,0,2,0
    389c:	7d 20 e1 e4 	mtsrin  r9,r28
    38a0:	4c 00 01 2c 	isync

    38a4:	39 20 00 80 	li      r9,128
    38a8:	91 3c 00 00 	stw     r9,0(r28)

    38ac:	81 42 04 dc 	lwz     r10,1244(r2)
    38b0:	39 00 ff ff 	li      r8,-1
    38b4:	91 02 04 dc 	stw     r8,1244(r2)
    38b8:	2c 0a ff fe 	cmpwi   r10,-2
    38bc:	41 82 00 88 	beq     3944 <perf_copy_attr+0x36c>
    38c0:	7d 20 55 26 	mfsrin  r9,r10
    38c4:	65 29 40 00 	oris    r9,r9,16384
    38c8:	7d 20 51 e4 	mtsrin  r9,r10
    38cc:	4c 00 01 2c 	isync
...
    3944:	48 00 00 01 	bl      3944 <perf_copy_attr+0x36c>
			3944: R_PPC_REL24	kuap_lock_all_ool

Before the patch it was 118 instructions. In reality only 42 are
executed in most cases, but GCC is not able to see that a properly
aligned user access cannot involve more than one segment.

    5060:	39 1d 00 04 	addi    r8,r29,4
    5064:	3d 20 b0 00 	lis     r9,-20480
    5068:	7c 08 48 40 	cmplw   r8,r9
    506c:	40 81 00 08 	ble     5074 <perf_copy_attr+0x2cc>
    5070:	3d 00 b0 00 	lis     r8,-20480
    5074:	39 28 ff ff 	addi    r9,r8,-1
    5078:	57 aa 00 06 	rlwinm  r10,r29,0,0,3
    507c:	55 29 27 3e 	rlwinm  r9,r9,4,28,31
    5080:	39 29 00 01 	addi    r9,r9,1
    5084:	7d 29 53 78 	or      r9,r9,r10
    5088:	91 22 04 dc 	stw     r9,1244(r2)
    508c:	7d 20 ed 26 	mfsrin  r9,r29
    5090:	55 29 00 80 	rlwinm  r9,r9,0,2,0
    5094:	7c 08 50 40 	cmplw   r8,r10
    5098:	40 81 00 c0 	ble     5158 <perf_copy_attr+0x3b0>
    509c:	7d 46 50 f8 	not     r6,r10
    50a0:	7c c6 42 14 	add     r6,r6,r8
    50a4:	54 c6 27 be 	rlwinm  r6,r6,4,30,31
    50a8:	7d 20 51 e4 	mtsrin  r9,r10
    50ac:	3c ea 10 00 	addis   r7,r10,4096
    50b0:	39 29 01 11 	addi    r9,r9,273
    50b4:	7f 88 38 40 	cmplw   cr7,r8,r7
    50b8:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    50bc:	40 9d 00 9c 	ble     cr7,5158 <perf_copy_attr+0x3b0>

    50c0:	2f 86 00 00 	cmpwi   cr7,r6,0
    50c4:	41 9e 00 4c 	beq     cr7,5110 <perf_copy_attr+0x368>
    50c8:	2f 86 00 01 	cmpwi   cr7,r6,1
    50cc:	41 9e 00 2c 	beq     cr7,50f8 <perf_copy_attr+0x350>
    50d0:	2f 86 00 02 	cmpwi   cr7,r6,2
    50d4:	41 9e 00 14 	beq     cr7,50e8 <perf_copy_attr+0x340>
    50d8:	7d 20 39 e4 	mtsrin  r9,r7
    50dc:	39 29 01 11 	addi    r9,r9,273
    50e0:	3c e7 10 00 	addis   r7,r7,4096
    50e4:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    50e8:	7d 20 39 e4 	mtsrin  r9,r7
    50ec:	39 29 01 11 	addi    r9,r9,273
    50f0:	3c e7 10 00 	addis   r7,r7,4096
    50f4:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    50f8:	7d 20 39 e4 	mtsrin  r9,r7
    50fc:	3c e7 10 00 	addis   r7,r7,4096
    5100:	39 29 01 11 	addi    r9,r9,273
    5104:	7f 88 38 40 	cmplw   cr7,r8,r7
    5108:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    510c:	40 9d 00 4c 	ble     cr7,5158 <perf_copy_attr+0x3b0>
    5110:	7d 20 39 e4 	mtsrin  r9,r7
    5114:	39 29 01 11 	addi    r9,r9,273
    5118:	3c c7 10 00 	addis   r6,r7,4096
    511c:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    5120:	7d 20 31 e4 	mtsrin  r9,r6
    5124:	39 29 01 11 	addi    r9,r9,273
    5128:	3c c6 10 00 	addis   r6,r6,4096
    512c:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    5130:	7d 20 31 e4 	mtsrin  r9,r6
    5134:	39 29 01 11 	addi    r9,r9,273
    5138:	3c c7 30 00 	addis   r6,r7,12288
    513c:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    5140:	7d 20 31 e4 	mtsrin  r9,r6
    5144:	3c e7 40 00 	addis   r7,r7,16384
    5148:	39 29 01 11 	addi    r9,r9,273
    514c:	7f 88 38 40 	cmplw   cr7,r8,r7
    5150:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    5154:	41 9d ff bc 	bgt     cr7,5110 <perf_copy_attr+0x368>

    5158:	4c 00 01 2c 	isync
    515c:	39 20 00 80 	li      r9,128
    5160:	91 3d 00 00 	stw     r9,0(r29)

    5164:	38 e0 00 00 	li      r7,0
    5168:	90 e2 04 dc 	stw     r7,1244(r2)
    516c:	7d 20 ed 26 	mfsrin  r9,r29
    5170:	65 29 40 00 	oris    r9,r9,16384
    5174:	40 81 00 c0 	ble     5234 <perf_copy_attr+0x48c>
    5178:	7d 47 50 f8 	not     r7,r10
    517c:	7c e7 42 14 	add     r7,r7,r8
    5180:	54 e7 27 be 	rlwinm  r7,r7,4,30,31
    5184:	7d 20 51 e4 	mtsrin  r9,r10
    5188:	3d 4a 10 00 	addis   r10,r10,4096
    518c:	39 29 01 11 	addi    r9,r9,273
    5190:	7c 08 50 40 	cmplw   r8,r10
    5194:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    5198:	40 81 00 9c 	ble     5234 <perf_copy_attr+0x48c>

    519c:	2c 07 00 00 	cmpwi   r7,0
    51a0:	41 82 00 4c 	beq     51ec <perf_copy_attr+0x444>
    51a4:	2c 07 00 01 	cmpwi   r7,1
    51a8:	41 82 00 2c 	beq     51d4 <perf_copy_attr+0x42c>
    51ac:	2c 07 00 02 	cmpwi   r7,2
    51b0:	41 82 00 14 	beq     51c4 <perf_copy_attr+0x41c>
    51b4:	7d 20 51 e4 	mtsrin  r9,r10
    51b8:	39 29 01 11 	addi    r9,r9,273
    51bc:	3d 4a 10 00 	addis   r10,r10,4096
    51c0:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    51c4:	7d 20 51 e4 	mtsrin  r9,r10
    51c8:	39 29 01 11 	addi    r9,r9,273
    51cc:	3d 4a 10 00 	addis   r10,r10,4096
    51d0:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    51d4:	7d 20 51 e4 	mtsrin  r9,r10
    51d8:	3d 4a 10 00 	addis   r10,r10,4096
    51dc:	39 29 01 11 	addi    r9,r9,273
    51e0:	7c 08 50 40 	cmplw   r8,r10
    51e4:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    51e8:	40 81 00 4c 	ble     5234 <perf_copy_attr+0x48c>
    51ec:	7d 20 51 e4 	mtsrin  r9,r10
    51f0:	39 29 01 11 	addi    r9,r9,273
    51f4:	3c ea 10 00 	addis   r7,r10,4096
    51f8:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    51fc:	7d 20 39 e4 	mtsrin  r9,r7
    5200:	39 29 01 11 	addi    r9,r9,273
    5204:	3c e7 10 00 	addis   r7,r7,4096
    5208:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    520c:	7d 20 39 e4 	mtsrin  r9,r7
    5210:	39 29 01 11 	addi    r9,r9,273
    5214:	3c ea 30 00 	addis   r7,r10,12288
    5218:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    521c:	7d 20 39 e4 	mtsrin  r9,r7
    5220:	3d 4a 40 00 	addis   r10,r10,16384
    5224:	39 29 01 11 	addi    r9,r9,273
    5228:	7c 08 50 40 	cmplw   r8,r10
    522c:	55 29 02 06 	rlwinm  r9,r9,0,8,3
    5230:	41 81 ff bc 	bgt     51ec <perf_copy_attr+0x444>

    5234:	4c 00 01 2c 	isync

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Export the ool handlers to fix build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d9121f96a7c4302946839a0771f5d1daeeb6968c.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:08 +10:00
Christophe Leroy 6b4d630068 powerpc/32s: Allow disabling KUAP at boot time
PPC64 uses MMU features to enable/disable KUAP at boot time.
But feature fixups are applied way too early on PPC32.

Now that all KUAP related actions are in C following the
conversion of KUAP initial setup and context switch in C,
static branches can be used to enable/disable KUAP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Export disable_kuap_key to fix build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cd79e8008455fba5395d099f9bb1305c039b931c.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:08 +10:00
Christophe Leroy 50d2f104cd powerpc/32s: Allow disabling KUEP at boot time
PPC64 uses MMU features to enable/disable KUEP at boot time.
But feature fixups are applied way too early on PPC32.

Now that all KUEP related actions are in C following the
conversion of KUEP initial setup and context switch in C,
static branches can be used to enable/disable KUEP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7745a2c3a08ec46302920a3f48d1cb9b5469dbbb.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:08 +10:00
Christophe Leroy 882136fb2f powerpc/32s: Simplify calculation of segment register content
segment register has VSID on bits 8-31.
Bits 4-7 are reserved, there is no requirement to set them to 0.

VSIDs are calculated from VSID of SR0 by adding 0x111.

Even with highest possible VSID which would be 0xFFFFF0,
adding 16 times 0x111 results in 0x1001100.

So, the reserved bits are never overflowed, no need to clear
the reserved bits after each calculation.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ddc1cfd2ec8f3b2395c6a4d7f2b0c1aa1b1e64fb.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:08 +10:00
Christophe Leroy 863771a28e powerpc/32s: Convert switch_mmu_context() to C
switch_mmu_context() does things that can easily be done in C.

For updating user segments, we have update_user_segments().

As mentionned in commit b5efec00b6 ("powerpc/32s: Move KUEP
locking/unlocking in C"), update_user_segments() has the loop
unrolled which is a significant performance gain.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/05c0875ad8220c03452c3a334946e207c6ca04d6.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:08 +10:00
Christophe Leroy 7235bb3593 powerpc/32s: move CTX_TO_VSID() into mmu-hash.h
In order to reuse it in switch_mmu_context(), this
patch moves CTX_TO_VSID() macro into asm/book3s/32/mmu-hash.h

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/26b36ef2939234a04b37baf6ffe50cba81f5d1b7.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:08 +10:00
Christophe Leroy 91bb30822a powerpc/32s: Refactor update of user segment registers
KUEP implements the update of user segment registers.

Move it into mmu-hash.h in order to use it from other places.

And inline kuep_lock() and kuep_unlock(). Inlining kuep_lock() is
important for system_call_exception(), otherwise system_call_exception()
has to save into stack the system call parameters that are used just
after, and doing that takes more instructions than kuep_lock() itself.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24591ca480d14a62ef910e38a5273d551262c4a2.1622708530.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:07 +10:00
Christophe Leroy f6025a140b powerpc/8xx: Allow disabling KUAP at boot time
PPC64 uses MMU features to enable/disable KUAP at boot time.
But feature fixups are applied way too early on PPC32.

But since commit c16728835e ("powerpc/32: Manage KUAP in C"),
all KUAP is in C so it is now possible to use static branches.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3dca510ce555335261a47c4799167da698f569c0.1622782111.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:07 +10:00
Christophe Leroy 10248dcba1 powerpc/44x: Implement Kernel Userspace Exec Protection (KUEP)
Powerpc 44x has two bits for exec protection in TLBs: one
for user (UX) and one for superviser (SX).

Clear SX on user pages in TLB miss handlers to provide KUEP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/169310e08152aa1d96c979770291d165ec6896ae.1622616032.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:07 +10:00
Christophe Leroy 0e628ad2d6 powerpc/optprobes: use PPC_RAW_ macros
Use PPC_RAW_ macros to simplify the code.

And use PPC_LO/PPC_HI instead of IMM_L/IMM_H which are for
internal use inside ppc-opcode.h

Those macros are self explanatory, comments can go as well.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5a167b8ba4d33a5c09cd504f0c862e25ffe85459.1621516826.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:07 +10:00
Christophe Leroy 077c4dedef powerpc/inst: Refactor PPC32 and PPC64 versions
ppc_inst() ppc_inst_prefixed() ppc_inst_swab() can easily be made common
to both PPC32 and PPC64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d54c63dcac6d190e1cc0d2fe3259d6e621928cdf.1621516826.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:06 +10:00
Christophe Leroy 69d4d6e5fd powerpc: Don't use 'struct ppc_inst' to reference instruction location
'struct ppc_inst' is an internal representation of an instruction, but
in-memory instructions are and will remain a table of 'u32' forever.

Replace all 'struct ppc_inst *' used for locating an instruction in
memory by 'u32 *'. This removes a lot of undue casts to 'struct
ppc_inst *'.

It also helps locating ab-use of 'struct ppc_inst' dereference.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix ppc_inst_next(), use u32 instead of unsigned int]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7062722b087228e42cbd896e39bfdf526d6a340a.1621516826.git.christophe.leroy@csgroup.eu
2021-06-17 00:09:00 +10:00
Christophe Leroy 6c0d181daa powerpc/lib/code-patching: Make instr_is_branch_to_addr() static
instr_is_branch_to_addr() is only used in code-patching.c

Make it static.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5f6b9c8c83170ed310953eac2f5b14539bfc964a.1621516826.git.christophe.leroy@csgroup.eu
2021-06-16 23:35:57 +10:00
Christophe Leroy 036b5560be powerpc/inst: Avoid pointer dereferencing in ppc_inst_equal()
Avoid casting/dereferencing ppc_inst() as u64* , check each member
of the struct when relevant.

And remove the 0xff initialisation of the suffix for non
prefixed instruction. An instruction with 0xff as a suffix
might be invalid, but still is a prefixed instruction and
has to be considered as this.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d8b155e930b7a9708ca110e8ff0ace6713a7af75.1621516826.git.christophe.leroy@csgroup.eu
2021-06-16 23:35:57 +10:00
Christophe Leroy 042e0860e1 powerpc/inst: Improve readability of get_user_instr() and friends
Remove unneeded line splits.

And remove unneeded local variable initialisation.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fb097fda78cc6852905ef00f8f7bf371b6cc66f7.1621516826.git.christophe.leroy@csgroup.eu
2021-06-16 23:35:30 +10:00
Christophe Leroy 9134806e14 powerpc/inst: Reduce casts in get_user_instr()
Declare __gui_ptr as 'u32 *' instead of casting it at each use to
'unsigned int *' (which is an equivalent type).

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Use u32 * instead of unsigned int *]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2c2123998e05535d08ba03a96ea1eea921d06a86.1621516826.git.christophe.leroy@csgroup.eu
2021-06-16 23:35:10 +10:00
Christophe Leroy b3a9e52323 powerpc/inst: Fix sparse detection on get_user_instr()
get_user_instr() lacks sparse detection for the __user tag.

This is because __gui_ptr is assigned with a cast.

Fix that by adding a __chk_user_ptr()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0320e5b41a794fd456ab8c5993bbfadcf9e1d8b4.1621516826.git.christophe.leroy@csgroup.eu
2021-06-16 00:16:49 +10:00
Christophe Leroy f30becb5e9 powerpc: Replace PPC_INST_NOP by PPC_RAW_NOP()
On the road to removing all PPC_INST_xx defines in
asm/ppc-opcodes.h, change PPC_INST_NOP to PPC_RAW_NOP().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ad46c195ca1b8572629ef07ba6bfe247585239a6.1621506159.git.christophe.leroy@csgroup.eu
2021-06-16 00:16:49 +10:00
Christophe Leroy deefd0ae99 powerpc/traps: Start using PPC_RAW_xx() macros
Start using PPC_RAW_xx() macros where relevant.

PPC_INST_SYNC is used to both represent the 'sync' instruction and
the family of synchronisation instructions. Keep it for the later,
maybe we'll change the name in the future to avoid confusion.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0945c155d6cb113431185fc1296ac127359fe29b.1621506159.git.christophe.leroy@csgroup.eu
2021-06-16 00:16:48 +10:00
Christophe Leroy ef909ba954 powerpc/lib/feature-fixups: Use PPC_RAW_xxx() macros
Use PPC_RAW_xxx() macros instead of open coding assembly
opcodes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix bad converison in do_stf_exit_barrier_fixups()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e79cd8e111ca13bf8c61a384bac365aa7e207647.1621506159.git.christophe.leroy@csgroup.eu
2021-06-16 00:16:48 +10:00
Christophe Leroy e08021f8db powerpc/ebpf64: Use PPC_RAW_MFLR()
Use PPC_RAW_MFLR() instead of open coding with PPC_INST_MFLR.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c1887623e91e8b4da36e669e4c74de86320a5092.1621506159.git.christophe.leroy@csgroup.eu
2021-06-16 00:16:48 +10:00
Christophe Leroy e730459756 powerpc/security: Use PPC_RAW_BLR() and PPC_RAW_NOP()
On the road to remove all use of PPC_INST_xxx, replace
PPC_INST_BLR by PPC_RAW_BLR(). Same for PPC_INST_NOP.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c04f88d0e53d2122fbbe92226892a01ebc668b6a.1621506159.git.christophe.leroy@csgroup.eu
2021-06-16 00:16:48 +10:00
Christophe Leroy 47b04699d0 powerpc/modules: Use PPC_RAW_xx() macros
To improve readability, use PPC_RAW_xx() macros instead of
open coding. Those macros are self-explanatory so the comments
can go as well.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/99d9ee8849d3992beeadb310a665aae01c3abfb1.1621506159.git.christophe.leroy@csgroup.eu
2021-06-16 00:16:48 +10:00
Christophe Leroy 1c9debbc2e powerpc/signal: Use PPC_RAW_xx() macros
To improve readability, use PPC_RAW_xx() macros instead of
open coding. Those macros are self-explanatory so the comments
can go as well.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4ca2bfdca2f47a293d05f61eb3c4e487ee170f1f.1621506159.git.christophe.leroy@csgroup.eu
2021-06-16 00:16:47 +10:00
Christophe Leroy 8804d5beef powerpc/lib/code-patching: Use PPC_RAW_() macros
Instead of open coding with PPC_INST_ defines, use
PPC_RAW_() macros. It improves readability.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8c92f1d9e825ee47c6f88fe43ad42d2a8cc2ab4a.1621506159.git.christophe.leroy@csgroup.eu
2021-06-16 00:16:47 +10:00
Christophe Leroy 07cd18320e powerpc/opcodes: Add shorter macros for registers for use with PPC_RAW_xx()
Today we have __REG_Rx macros . They are mainly meant for
internal use by macros __PPC_RA() and friends macros which
allows uses like __PPC_RA(R12).

When used with PPC_RAW_xx() macros, it gives a result which is
not very readable.

Add shorter macros _Rx in order to improve readability when
used with PPC_RAW_xx() macros.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ec34d92b7c2f810622261acfeeed4b0a0f4d01bd.1621506159.git.christophe.leroy@csgroup.eu
2021-06-16 00:16:47 +10:00
Christophe Leroy 148a047602 powerpc: Rework PPC_RAW_xxx() macros for prefixed instructions
At the time being, we have PPC_RAW_PLXVP() and PPC_RAW_PSTXVP() which
provide a 64 bits value, and then it gets split by open coding to
format it into a 'struct ppc_inst' instruction.

Instead, define a PPC_RAW_xxx_P() and a PPC_RAW_xxx_S() to be used
as is.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5d146b31b943e7ad674894421db4feef54804b9b.1621506159.git.christophe.leroy@csgroup.eu
2021-06-16 00:16:47 +10:00
Christophe Leroy 4423eff71c powerpc: Force inlining of csum_add()
Commit 328e7e487a ("powerpc: force inlining of csum_partial() to
avoid multiple csum_partial() with GCC10") inlined csum_partial().

Now that csum_partial() is inlined, GCC outlines csum_add() when
called by csum_partial().

c064fb28 <csum_add>:
c064fb28:	7c 63 20 14 	addc    r3,r3,r4
c064fb2c:	7c 63 01 94 	addze   r3,r3
c064fb30:	4e 80 00 20 	blr

c0665fb8 <csum_add>:
c0665fb8:	7c 63 20 14 	addc    r3,r3,r4
c0665fbc:	7c 63 01 94 	addze   r3,r3
c0665fc0:	4e 80 00 20 	blr

c066719c:	7c 9a c0 2e 	lwzx    r4,r26,r24
c06671a0:	38 60 00 00 	li      r3,0
c06671a4:	7f 1a c2 14 	add     r24,r26,r24
c06671a8:	4b ff ee 11 	bl      c0665fb8 <csum_add>
c06671ac:	80 98 00 04 	lwz     r4,4(r24)
c06671b0:	4b ff ee 09 	bl      c0665fb8 <csum_add>
c06671b4:	80 98 00 08 	lwz     r4,8(r24)
c06671b8:	4b ff ee 01 	bl      c0665fb8 <csum_add>
c06671bc:	a0 98 00 0c 	lhz     r4,12(r24)
c06671c0:	4b ff ed f9 	bl      c0665fb8 <csum_add>
c06671c4:	7c 63 18 f8 	not     r3,r3
c06671c8:	81 3f 00 68 	lwz     r9,104(r31)
c06671cc:	81 5f 00 a0 	lwz     r10,160(r31)
c06671d0:	7d 29 18 14 	addc    r9,r9,r3
c06671d4:	7d 29 01 94 	addze   r9,r9
c06671d8:	91 3f 00 68 	stw     r9,104(r31)
c06671dc:	7d 1a 50 50 	subf    r8,r26,r10
c06671e0:	83 01 00 10 	lwz     r24,16(r1)
c06671e4:	83 41 00 18 	lwz     r26,24(r1)

The sum with 0 is useless, should have been skipped.
And there is even one completely unused instance of csum_add().

In file included from ./include/net/checksum.h:22,
                 from ./include/linux/skbuff.h:28,
                 from ./include/linux/icmp.h:16,
                 from net/ipv6/ip6_tunnel.c:23:
./arch/powerpc/include/asm/checksum.h: In function '__ip6_tnl_rcv':
./arch/powerpc/include/asm/checksum.h:94:22: warning: inlining failed in call to 'csum_add': call is unlikely and code size would grow [-Winline]
   94 | static inline __wsum csum_add(__wsum csum, __wsum addend)
      |                      ^~~~~~~~
./arch/powerpc/include/asm/checksum.h:172:31: note: called from here
  172 |                         sum = csum_add(sum, (__force __wsum)*(const u32 *)buff);
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./arch/powerpc/include/asm/checksum.h:94:22: warning: inlining failed in call to 'csum_add': call is unlikely and code size would grow [-Winline]
   94 | static inline __wsum csum_add(__wsum csum, __wsum addend)
      |                      ^~~~~~~~
./arch/powerpc/include/asm/checksum.h:177:31: note: called from here
  177 |                         sum = csum_add(sum, (__force __wsum)
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  178 |                                             *(const u32 *)(buff + 4));
      |                                             ~~~~~~~~~~~~~~~~~~~~~~~~~
./arch/powerpc/include/asm/checksum.h:94:22: warning: inlining failed in call to 'csum_add': call is unlikely and code size would grow [-Winline]
   94 | static inline __wsum csum_add(__wsum csum, __wsum addend)
      |                      ^~~~~~~~
./arch/powerpc/include/asm/checksum.h:183:31: note: called from here
  183 |                         sum = csum_add(sum, (__force __wsum)
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  184 |                                             *(const u32 *)(buff + 8));
      |                                             ~~~~~~~~~~~~~~~~~~~~~~~~~
./arch/powerpc/include/asm/checksum.h:94:22: warning: inlining failed in call to 'csum_add': call is unlikely and code size would grow [-Winline]
   94 | static inline __wsum csum_add(__wsum csum, __wsum addend)
      |                      ^~~~~~~~
./arch/powerpc/include/asm/checksum.h:186:31: note: called from here
  186 |                         sum = csum_add(sum, (__force __wsum)
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  187 |                                             *(const u16 *)(buff + 12));
      |                                             ~~~~~~~~~~~~~~~~~~~~~~~~~~

Force inlining of csum_add().

     94c:	80 df 00 a0 	lwz     r6,160(r31)
     950:	7d 28 50 2e 	lwzx    r9,r8,r10
     954:	7d 48 52 14 	add     r10,r8,r10
     958:	80 aa 00 04 	lwz     r5,4(r10)
     95c:	80 ff 00 68 	lwz     r7,104(r31)
     960:	7d 29 28 14 	addc    r9,r9,r5
     964:	7d 29 01 94 	addze   r9,r9
     968:	7d 08 30 50 	subf    r8,r8,r6
     96c:	80 aa 00 08 	lwz     r5,8(r10)
     970:	a1 4a 00 0c 	lhz     r10,12(r10)
     974:	7d 29 28 14 	addc    r9,r9,r5
     978:	7d 29 01 94 	addze   r9,r9
     97c:	7d 29 50 14 	addc    r9,r9,r10
     980:	7d 29 01 94 	addze   r9,r9
     984:	7d 29 48 f8 	not     r9,r9
     988:	7c e7 48 14 	addc    r7,r7,r9
     98c:	7c e7 01 94 	addze   r7,r7
     990:	90 ff 00 68 	stw     r7,104(r31)

In the non-inlined version, the first sum with 0 was performed.
Here it is skipped.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f7f4d4e364de6e473da874468b903da6e5d97adc.1620713272.git.christophe.leroy@csgroup.eu
2021-06-16 00:16:47 +10:00
Michael Ellerman a4785e93aa Merge branch 'fixes' into next
Merge our fixes branch which has a number of important fixes, notably
the fix for initrd corruption, as well as the fixes for scv vs ptrace.
2021-06-16 00:14:55 +10:00
Michael Ellerman 478036c4cd powerpc: Fix initrd corruption with relative jump labels
Commit b0b3b2c78e ("powerpc: Switch to relative jump labels") switched
us to using relative jump labels. That involves changing the code,
target and key members in struct jump_entry to be relative to the
address of the jump_entry, rather than absolute addresses.

We have two static inlines that create a struct jump_entry,
arch_static_branch() and arch_static_branch_jump(), as well as an asm
macro ARCH_STATIC_BRANCH, which is used by the pseries-only hypervisor
tracing code.

Unfortunately we missed updating the key to be a relative reference in
ARCH_STATIC_BRANCH.

That causes a pseries kernel to have a handful of jump_entry structs
with bad key values. Instead of being a relative reference they instead
hold the full address of the key.

However the code doesn't expect that, it still adds the key value to the
address of the jump_entry (see jump_entry_key()) expecting to get a
pointer to a key somewhere in kernel data.

The table of jump_entry structs sits in rodata, which comes after the
kernel text. In a typical build this will be somewhere around 15MB. The
address of the key will be somewhere in data, typically around 20MB.
Adding the two values together gets us a pointer somewhere around 45MB.

We then call static_key_set_entries() with that bad pointer and modify
some members of the struct static_key we think we are pointing at.

A pseries kernel is typically ~30MB in size, so writing to ~45MB won't
corrupt the kernel itself. However if we're booting with an initrd,
depending on the size and exact location of the initrd, we can corrupt
the initrd. Depending on how exactly we corrupt the initrd it can either
cause the system to not boot, or just corrupt one of the files in the
initrd.

The fix is simply to make the key value relative to the jump_entry
struct in the ARCH_STATIC_BRANCH macro.

Fixes: b0b3b2c78e ("powerpc: Switch to relative jump labels")
Reported-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Reported-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reported-by: Greg Kurz <groug@kaod.org>
Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Daniel Axtens <dja@axtens.net>
Tested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210614131440.312360-1-mpe@ellerman.id.au
2021-06-15 23:35:57 +10:00
Nicholas Piggin 0bf7e1b2e9 KVM: PPC: Book3S HV P9: implement hash host / hash guest support
Implement support for hash guests under hash host. This has to save and
restore the host SLB, and ensure that the MMU is off while switching
into the guest SLB.

POWER9 and later CPUs now always go via the P9 path. The "fast" guest
mode is now renamed to the P9 mode, which is consistent with its
functionality and the rest of the naming.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-32-npiggin@gmail.com
2021-06-10 22:12:15 +10:00
Nicholas Piggin 079a09a500 KVM: PPC: Book3S HV P9: implement hash guest support
Implement hash guest support. Guest entry/exit has to restore and
save/clear the SLB, plus several other bits to accommodate hash guests
in the P9 path. Radix host, hash guest support is removed from the P7/8
path.

The HPT hcalls and faults are not handled in real mode, which is a
performance regression. A worst-case fork/exit microbenchmark takes 3x
longer after this patch. kbuild benchmark performance is in the noise,
but the slowdown is likely to be noticed somewhere.

For now, accept this penalty for the benefit of simplifying the P7/8
paths and unifying P9 hash with the new code, because hash is a less
important configuration than radix on processors that support it. Hash
will benefit from future optimisations to this path, including possibly
a faster path to handle such hcalls and interrupts without doing a full
exit.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-31-npiggin@gmail.com
2021-06-10 22:12:15 +10:00
Nicholas Piggin dcbac73a5b KVM: PPC: Book3S HV: Remove virt mode checks from real mode handlers
Now that the P7/8 path no longer supports radix, real-mode handlers
do not need to deal with being called in virt mode.

This change effectively reverts commit acde25726b ("KVM: PPC: Book3S
HV: Add radix checks in real-mode hypercall handlers").

It removes a few more real-mode tests in rm hcall handlers, which
allows the indirect ops for the xive module to be removed from the
built-in xics rm handlers.

kvmppc_h_random is renamed to kvmppc_rm_h_random to be a bit more
descriptive and consistent with other rm handlers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-25-npiggin@gmail.com
2021-06-10 22:12:14 +10:00
Nicholas Piggin aaae8c7900 KVM: PPC: Book3S HV: Remove support for dependent threads mode on P9
Dependent-threads mode is the normal KVM mode for pre-POWER9 SMT
processors, where all threads in a core (or subcore) would run the same
partition at the same time, or they would run the host.

This design was mandated by MMU state that is shared between threads in
a processor, so the synchronisation point is in hypervisor real-mode
that has essentially no shared state, so it's safe for multiple threads
to gather and switch to the correct mode.

It is implemented by having the host unplug all secondary threads and
always run in SMT1 mode, and host QEMU threads essentially represent
virtual cores that wake these secondary threads out of unplug when the
ioctl is called to run the guest. This happens via a side-path that is
mostly invisible to the rest of the Linux host and the secondary threads
still appear to be unplugged.

POWER9 / ISA v3.0 has a more flexible MMU design that is independent
per-thread and allows a much simpler KVM implementation. Before the new
"P9 fast path" was added that began to take advantage of this, POWER9
support was implemented in the existing path which has support to run
in the dependent threads mode. So it was not much work to add support to
run POWER9 in this dependent threads mode.

The mode is not required by the POWER9 MMU (although "mixed-mode" hash /
radix MMU limitations of early processors were worked around using this
mode). But it is one way to run SMT guests without running different
guests or guest and host on different threads of the same core, so it
could avoid or reduce some SMT attack surfaces without turning off SMT
entirely.

This security feature has some real, if indeterminate, value. However
the old path is lagging in features (nested HV), and with this series
the new P9 path adds remaining missing features (radix prefetch bug
and hash support, in later patches), so POWER9 dependent threads mode
support would be the only remaining reason to keep that code in and keep
supporting POWER9/POWER10 in the old path. So here we make the call to
drop this feature.

Remove dependent threads mode support for POWER9 and above processors.
Systems can still achieve this security by disabling SMT entirely, but
that would generally come at a larger performance cost for guests.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-23-npiggin@gmail.com
2021-06-10 22:12:14 +10:00
Nicholas Piggin 2e1ae9cd56 KVM: PPC: Book3S HV: Implement radix prefetch workaround by disabling MMU
Rather than partition the guest PID space + flush a rogue guest PID to
work around this problem, instead fix it by always disabling the MMU when
switching in or out of guest MMU context in HV mode.

This may be a bit less efficient, but it is a lot less complicated and
allows the P9 path to trivally implement the workaround too. Newer CPUs
are not subject to this issue.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-22-npiggin@gmail.com
2021-06-10 22:12:14 +10:00
Nicholas Piggin c00366e237 KVM: PPC: Book3S HV P9: inline kvmhv_load_hv_regs_and_go into __kvmhv_vcpu_entry_p9
Now the initial C implementation is done, inline more HV code to make
rearranging things easier.

And rename __kvmhv_vcpu_entry_p9 to drop the leading underscores as it's
now C, and is now a more complete vcpu entry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-16-npiggin@gmail.com
2021-06-10 22:12:13 +10:00
Nicholas Piggin 89d35b2391 KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C
Almost all logic is moved to C, by introducing a new in_guest mode for
the P9 path that branches very early in the KVM interrupt handler to P9
exit code.

The main P9 entry and exit assembly is now only about 160 lines of low
level stack setup and register save/restore, plus a bad-interrupt
handler.

There are two motivations for this, the first is just make the code more
maintainable being in C. The second is to reduce the amount of code
running in a special KVM mode, "realmode". In quotes because with radix
it is no longer necessarily real-mode in the MMU, but it still has to be
treated specially because it may be in real-mode, and has various
important registers like PID, DEC, TB, etc set to guest. This is hostile
to the rest of Linux and can't use arbitrary kernel functionality or be
instrumented well.

This initial patch is a reasonably faithful conversion of the asm code,
but it does lack any loop to return quickly back into the guest without
switching out of realmode in the case of unimportant or easily handled
interrupts. As explained in previous changes, handling HV interrupts
very quickly in this low level realmode is not so important for P9
performance, and are important to avoid for security, observability,
debugability reasons.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-15-npiggin@gmail.com
2021-06-10 22:12:13 +10:00
Nicholas Piggin 9dc2babc18 KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path
In the interest of minimising the amount of code that is run in
"real-mode", don't handle hcalls in real mode in the P9 path. This
requires some new handlers for H_CEDE and xics-on-xive to be added
before xive is pulled or cede logic is checked.

This introduces a change in radix guest behaviour where radix guests
that execute 'sc 1' in userspace now get a privilege fault whereas
previously the 'sc 1' would be reflected as a syscall interrupt to the
guest kernel. That reflection is only required for hash guests that run
PR KVM.

Background:

In POWER8 and earlier processors, it is very expensive to exit from the
HV real mode context of a guest hypervisor interrupt, and switch to host
virtual mode. On those processors, guest->HV interrupts reach the
hypervisor with the MMU off because the MMU is loaded with guest context
(LPCR, SDR1, SLB), and the other threads in the sub-core need to be
pulled out of the guest too. Then the primary must save off guest state,
invalidate SLB and ERAT, and load up host state before the MMU can be
enabled to run in host virtual mode (~= regular Linux mode).

Hash guests also require a lot of hcalls to run due to the nature of the
MMU architecture and paravirtualisation design. The XICS interrupt
controller requires hcalls to run.

So KVM traditionally tries hard to avoid the full exit, by handling
hcalls and other interrupts in real mode as much as possible.

By contrast, POWER9 has independent MMU context per-thread, and in radix
mode the hypervisor is in host virtual memory mode when the HV interrupt
is taken. Radix guests do not require significant hcalls to manage their
translations, and xive guests don't need hcalls to handle interrupts. So
it's much less important for performance to handle hcalls in real mode on
POWER9.

One caveat is that the TCE hcalls are performance critical, real-mode
variants introduced for POWER8 in order to achieve 10GbE performance.
Real mode TCE hcalls were found to be less important on POWER9, which
was able to drive 40GBe networking without them (using the virt mode
hcalls) but performance is still important. These hcalls will benefit
from subsequent guest entry/exit optimisation including possibly a
faster "partial exit" that does not entirely switch to host context to
handle the hcall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-14-npiggin@gmail.com
2021-06-10 22:12:13 +10:00
Nicholas Piggin 6ffe2c6e6d KVM: PPC: Book3S HV P9: Reduce irq_work vs guest decrementer races
irq_work's use of the DEC SPR is racy with guest<->host switch and guest
entry which flips the DEC interrupt to guest, which could lose a host
work interrupt.

This patch closes one race, and attempts to comment another class of
races.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-11-npiggin@gmail.com
2021-06-10 22:12:13 +10:00
Nicholas Piggin 023c3c96ca KVM: PPC: Book3S HV P9: implement kvmppc_xive_pull_vcpu in C
This is more symmetric with kvmppc_xive_push_vcpu, and has the advantage
that it runs with the MMU on.

The extra test added to the asm will go away with a future change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-9-npiggin@gmail.com
2021-06-10 22:12:12 +10:00
Nicholas Piggin 04ece7b60b KVM: PPC: Book3S 64: Move hcall early register setup to KVM
System calls / hcalls have a different calling convention than
other interrupts, so there is code in the KVMTEST to massage these
into the same form as other interrupt handlers.

Move this work into the KVM hcall handler. This means teaching KVM
a little more about the low level interrupt handler setup, PACA save
areas, etc., although that's not obviously worse than the current
approach of coming up with an entirely different interrupt register
/ save convention.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210528090752.3542186-5-npiggin@gmail.com
2021-06-10 22:12:12 +10:00
Marc Zyngier 582f5aa1db powerpc: Drop dependency between asm/irq.h and linux/irqdomain.h
Directly including linux/irqdomain.h was hiding all sort of sins,
which have now been fixed. Drop the spurious include.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10 13:09:17 +01:00
Marc Zyngier 7c576f4d3c powerpc: Convert irq_domain_add_legacy_isa use to irq_domain_add_legacy
irq_domain_add_legacy_isa is a pain. It only exists for the benefit of
two PPC-specific drivers, and creates an ugly dependency between asm/irq.h
and linux/irqdomain.h

Instead, let's convert these two drivers to irq_domain_add_legacy(),
stop using NUM_ISA_INTERRUPTS by directly setting NR_IRQS_LEGACY.

The dependency cannot be broken yet as there is a lot of PPC-related
code that depends on it, but that's the first step towards it.

A followup patch will remove irq_domain_add_legacy_isa.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-10 13:09:16 +01:00
Geoff Levand 9733862e50 powerpc/ps3: Add dma_mask to ps3_dma_region
Commit f959dcd6dd (dma-direct: Fix
potential NULL pointer dereference) added a null check on the
dma_mask pointer of the kernel's device structure.

Add a dma_mask variable to the ps3_dma_region structure and set
the device structure's dma_mask pointer to point to this new variable.

Fixes runtime errors like these:
# WARNING: Fixes tag on line 10 doesn't match correct format
# WARNING: Fixes tag on line 10 doesn't match correct format

  ps3_system_bus_match:349: dev=8.0(sb_01), drv=8.0(ps3flash): match
  WARNING: CPU: 0 PID: 1 at kernel/dma/mapping.c:151 .dma_map_page_attrs+0x34/0x1e0
  ps3flash sb_01: ps3stor_setup:193: map DMA region failed

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/562d0c9ea0100a30c3b186bcc7adb34b0bbd2cd7.1622746428.git.geoff@infradead.org
2021-06-10 21:44:58 +10:00
Geoff Levand 6caebff168 powerpc/ps3: Add CONFIG_PS3_VERBOSE_RESULT option
To aid debugging, add a new PS3 kernel config option
PS3_VERBOSE_RESULT that, when enabled, will print more
verbose messages for the result of LV1 hypercalls.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0ce4b6969a08094a747bd382dbfd30b72ebc192d.1622746428.git.geoff@infradead.org
2021-06-10 21:44:57 +10:00
Nathan Chancellor 015d98149b powerpc/barrier: Avoid collision with clang's __lwsync macro
A change in clang 13 results in the __lwsync macro being defined as
__builtin_ppc_lwsync, which emits 'lwsync' or 'msync' depending on what
the target supports. This breaks the build because of -Werror in
arch/powerpc, along with thousands of warnings:

 In file included from arch/powerpc/kernel/pmc.c:12:
 In file included from include/linux/bug.h:5:
 In file included from arch/powerpc/include/asm/bug.h:109:
 In file included from include/asm-generic/bug.h:20:
 In file included from include/linux/kernel.h:12:
 In file included from include/linux/bitops.h:32:
 In file included from arch/powerpc/include/asm/bitops.h:62:
 arch/powerpc/include/asm/barrier.h:49:9: error: '__lwsync' macro redefined [-Werror,-Wmacro-redefined]
 #define __lwsync()      __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
        ^
 <built-in>:308:9: note: previous definition is here
 #define __lwsync __builtin_ppc_lwsync
        ^
 1 error generated.

Undefine this macro so that the runtime patching introduced by
commit 2d1b202762 ("powerpc: Fixup lwsync at runtime") continues to
work properly with clang and the build no longer breaks.

Cc: stable@vger.kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/1386
Link: 62b5df7fe2
Link: https://lore.kernel.org/r/20210528182752.1852002-1-nathan@kernel.org
2021-06-10 21:44:57 +10:00
Linus Torvalds bd7b12aa60 powerpc fixes for 5.13 #5
Fix our KVM reverse map real-mode handling since we enabled huge vmalloc (in some
 configurations).
 
 Revert a recent change to our IOMMU code which broke some devices.
 
 Fix KVM handling of FSCR on P7/P8, which could have possibly let a guest crash it's Qemu.
 
 Fix kprobes validation of prefixed instructions across page boundary.
 
 Thanks to: Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas, Frederic Barrat, Naveen
 N. Rao, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmC8wi8THG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgN42D/4vHCHX4T0CZ/5bwh1RMOoGKM+PFyLe
 BoA2i8lvUILG1+LOiRJuBnVZiWwKYBqfkkfY4BmQpU3Oe3gjbJJwc9QGGHUDarWn
 NmMPqVgaO5qXObObKXzBU1Ihq4UQwMhK044srzXcgMYyTnSFNgWQAsvO0+0Cl4K4
 uT100AFV4tps8dLCHCq2XVHuQALnHzZah4yQ8i6u1TMN/TK+kXyONrMSCgsQ1mrM
 dDsT1zVeegj8EuW/n9kXkLNp2YZeatptZB7cPDtojlhCQTsZBcKnYtDq5ScASuwy
 7hGjzA2SyWsa6l0Iejoj8tr/ZS8Nutftz3izuhDNLEf4foz0tOWqxbXJayOA5J7w
 vzs9OSFbT6z/svELSIkRCvfePqUdDdC2MthWoShgv0SoIXj+Y7ABKQRW9B5rLeF5
 RiB2kCB+7S/03qjDtn57IlJC6aVoHzglTAdYXuj7guUEsZQrmtsdm1IM4eB0XYyx
 A9/AMCGSbswT0/IUriO4b9FtWGOJJf1vWv3WeqE63gPxqhyTz1ACqMT/0HLrARJZ
 /QLZrbuOSMBSGDnmJxy3vzb+3fxGxSGrUcoYc6MiSODuRgf7zHuRJsSDwoftnOTW
 PXVWPVz9ef0OEmuBJyEgTrO+/g9jjCPw8UJz9EaFzkMHbaoHRuZdo2m8X6zrXQLh
 AUVlDkkSmblY9w==
 =KkfQ
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix our KVM reverse map real-mode handling since we enabled huge
  vmalloc (in some configurations).

  Revert a recent change to our IOMMU code which broke some devices.

  Fix KVM handling of FSCR on P7/P8, which could have possibly let a
  guest crash it's Qemu.

  Fix kprobes validation of prefixed instructions across page boundary.

  Thanks to Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas,
  Frederic Barrat, Naveen N. Rao, and Nicholas Piggin"

* tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs"
  KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
  powerpc: Fix reverse map real-mode address lookup with huge vmalloc
  powerpc/kprobes: Fix validation of prefixed instructions across page boundary
2021-06-06 12:39:36 -07:00
Linus Torvalds 224478289c ARM fixes:
* Another state update on exit to userspace fix
 
 * Prevent the creation of mixed 32/64 VMs
 
 * Fix regression with irqbypass not restarting the guest on failed connect
 
 * Fix regression with debug register decoding resulting in overlapping access
 
 * Commit exception state on exit to usrspace
 
 * Fix the MMU notifier return values
 
 * Add missing 'static' qualifiers in the new host stage-2 code
 
 x86 fixes:
 * fix guest missed wakeup with assigned devices
 
 * fix WARN reported by syzkaller
 
 * do not use BIT() in UAPI headers
 
 * make the kvm_amd.avic parameter bool
 
 PPC fixes:
 * make halt polling heuristics consistent with other architectures
 
 selftests:
 * various fixes
 
 * new performance selftest memslot_perf_test
 
 * test UFFD minor faults in demand_paging_test
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCyF0MUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOHSgf/Q4Hm5e12Bj2xJy6A+iShnrbbT8PW
 hcIIOA7zGWXfjVYcBV7anbj7CcpzfIz0otcRBABa5mkhj+fb3YmPEb0EzCPi4Hru
 zxpcpB2w7W7WtUOIKe2EmaT+4Pk6/iLcfr8UMHMqx460akE9OmIg10QNWai3My/3
 RIOeakSckBI9e/1TQZbxH66dsLwCT0lLco7i7AWHdFxkzUQyoA34HX5pczOCBsO5
 3nXH+/txnRVhqlcyzWLVVGVzFqmpHtBqkIInDOXfUqIoxo/gOhOgF1QdMUEKomxn
 5ZFXlL5IXNtr+7yiI67iHX7CWkGZE9oJ04TgPHn6LR6wRnVvc3JInzcB5Q==
 =ollO
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "ARM fixes:

   - Another state update on exit to userspace fix

   - Prevent the creation of mixed 32/64 VMs

   - Fix regression with irqbypass not restarting the guest on failed
     connect

   - Fix regression with debug register decoding resulting in
     overlapping access

   - Commit exception state on exit to usrspace

   - Fix the MMU notifier return values

   - Add missing 'static' qualifiers in the new host stage-2 code

  x86 fixes:

   - fix guest missed wakeup with assigned devices

   - fix WARN reported by syzkaller

   - do not use BIT() in UAPI headers

   - make the kvm_amd.avic parameter bool

  PPC fixes:

   - make halt polling heuristics consistent with other architectures

  selftests:

   - various fixes

   - new performance selftest memslot_perf_test

   - test UFFD minor faults in demand_paging_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits)
  selftests: kvm: fix overlapping addresses in memslot_perf_test
  KVM: X86: Kill off ctxt->ud
  KVM: X86: Fix warning caused by stale emulation context
  KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception
  KVM: x86/mmu: Fix comment mentioning skip_4k
  KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK
  KVM: x86: add start_assignment hook to kvm_x86_ops
  KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
  selftests: kvm: do only 1 memslot_perf_test run by default
  KVM: X86: Use _BITUL() macro in UAPI headers
  KVM: selftests: add shared hugetlbfs backing source type
  KVM: selftests: allow using UFFD minor faults for demand paging
  KVM: selftests: create alias mappings when using shared memory
  KVM: selftests: add shmem backing source type
  KVM: selftests: refactor vm_mem_backing_src_type flags
  KVM: selftests: allow different backing source types
  KVM: selftests: compute correct demand paging size
  KVM: selftests: simplify setup_demand_paging error handling
  KVM: selftests: Print a message if /dev/kvm is missing
  ...
2021-05-29 06:02:25 -10:00
Nicholas Piggin 5362a4b6ee powerpc: Fix reverse map real-mode address lookup with huge vmalloc
real_vmalloc_addr() does not currently work for huge vmalloc, which is
what the reverse map can be allocated with for radix host, hash guest.

Extract the hugepage aware equivalent from eeh code into a helper, and
convert existing sites including this one to use it.

Fixes: 8abddd968a ("powerpc/64s/radix: Enable huge vmalloc mappings")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210526120005.3432222-1-npiggin@gmail.com
2021-05-28 22:54:27 +10:00
Marcelo Tosatti 084071d5e9 KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK
KVM_REQ_UNBLOCK will be used to exit a vcpu from
its inner vcpu halt emulation loop.

Rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK, switch
PowerPC to arch specific request bit.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Message-Id: <20210525134321.303768132@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-27 07:57:38 -04:00
Mark Rutland 9eaa82935d locking/atomic: powerpc: move to ARCH_ATOMIC
We'd like all architectures to convert to ARCH_ATOMIC, as once all
architectures are converted it will be possible to make significant
cleanups to the atomics headers, and this will make it much easier to
generically enable atomic functionality (e.g. debug logic in the
instrumented wrappers).

As a step towards that, this patch migrates powerpc to ARCH_ATOMIC. The
arch code provides arch_{atomic,atomic64,xchg,cmpxchg}*(), and common
code wraps these with optional instrumentation to provide the regular
functions.

While atomic_try_cmpxchg_lock() is not part of the common atomic API, it
is given an `arch_` prefix for consistency.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210525140232.53872-28-mark.rutland@arm.com
2021-05-26 13:20:52 +02:00
Mark Rutland 6988631bdf locking/atomic: cmpxchg: make `generic` a prefix
The asm-generic implementations of cmpxchg_local() and cmpxchg64_local()
use a `_generic` suffix to distinguish themselves from arch code or
wrappers used elsewhere.

Subsequent patches will add ARCH_ATOMIC support to these
implementations, and will distinguish more functions with a `generic`
portion. To align with how ARCH_ATOMIC uses an `arch_` prefix, it would
be helpful to use a `generic_` prefix rather than a `_generic` suffix.

In preparation for this, this patch renames the existing functions to
make `generic` a prefix rather than a suffix. There should be no
functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210525140232.53872-12-mark.rutland@arm.com
2021-05-26 13:20:50 +02:00
Sathvika Vasireddy 60060d704c powerpc/sstep: Add tests for setb instruction
This adds selftests for setb instruction.

Signed-off-by: Sathvika Vasireddy <sathvika@linux.vnet.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b05b61ccb5f10279d46fed490796f32ea2ccc270.1620727160.git.sathvika@linux.vnet.ibm.com
2021-05-23 20:51:35 +10:00
Nicholas Piggin d72500f992 powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls
The scv implementation missed updating syscall return value and error
value get/set functions to deal with the changed register ABI. This
broke ptrace PTRACE_GET_SYSCALL_INFO as well as some kernel auditing
and tracing functions.

Fix. tools/testing/selftests/ptrace/get_syscall_info now passes when
scv is used.

Fixes: 7fa95f9ada ("powerpc/64s: system call support for scv/rfscv instructions")
Cc: stable@vger.kernel.org # v5.9+
Reported-by: "Dmitry V. Levin" <ldv@altlinux.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210520111931.2597127-2-npiggin@gmail.com
2021-05-21 00:58:56 +10:00
Christophe Leroy ca8cc36901 powerpc/32s: Remove asm/book3s/32/hash.h
Move the PAGE bits into pgtable.h to be more similar to book3s/64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7f4aaa479569328a1e5b07c96c08fbca0cd7dd88.1620307890.git.christophe.leroy@csgroup.eu
2021-05-17 15:27:17 +10:00
Christophe Leroy b09049c516 powerpc: Only pad struct pt_regs when needed
If neither KUAP nor PPC64 is selected, there is nothing in the second
union of struct pt_regs, so the alignment padding is waste of memory.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d536bbc46094f66b24d3017343be25164f232933.1620307840.git.christophe.leroy@csgroup.eu
2021-05-17 15:27:16 +10:00
Christophe Leroy 8af8d72dc5 powerpc/32s: Speed up likely path of kuap_update_sr()
In most cases, kuap_update_sr() will update a single segment
register.

We know that first update will always be done, if there is no
segment register to update at all, kuap_update_sr() is not
called.

Avoid recurring calculations and tests in that case.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/848f18d213b8341939add7302dc4ef80cc7a12e3.1620307636.git.christophe.leroy@csgroup.eu
2021-05-17 15:27:16 +10:00
Christophe Leroy 0441729e16 powerpc/mmu: Remove leftover CONFIG_E200
Commit 39c8bf2b3c ("powerpc: Retire e200 core (mpc555x processor)")
removed CONFIG_E200.
Commit f9158d58a4 ("powerpc/mm: Add mask of always present MMU
features") was merged in the same cycle and added a new use of
CONFIG_E200.

Remove that use.

Fixes: f9158d58a4 ("powerpc/mm: Add mask of always present MMU features")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/594fa3cc0df11e21644fd6a584851ae4f164b2bf.1620367249.git.christophe.leroy@csgroup.eu
2021-05-17 15:27:16 +10:00
Christophe Leroy fe3dc333d2 powerpc/mmu: Don't duplicate radix_enabled()
mmu_has_feature(MMU_FTR_TYPE_RADIX) can be evaluated regardless of
CONFIG_PPC_RADIX_MMU.

When CONFIG_PPC_RADIX_MMU is not set, mmu_has_feature(MMU_FTR_TYPE_RADIX)
will evaluate to 'false' at build time because MMU_FTR_TYPE_RADIX
wont be included in MMU_FTRS_POSSIBLE.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/62743846cbd493e5d9a02e197c2672a1d30df149.1620366342.git.christophe.leroy@csgroup.eu
2021-05-17 15:27:16 +10:00
Christophe Leroy 13c7dad951 powerpc/paca: Remove mm_ctx_id and mm_ctx_slb_addr_limit
mm_ctx_id and mm_ctx_slb_addr_limit are not used anymore.

Remove them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6e1813953da38c452c131fe3e2a2761a0fddb975.1620223303.git.christophe.leroy@csgroup.eu
2021-05-17 15:27:15 +10:00
Nicholas Piggin 4ec5feec1a powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled
scv support introduced the notion of code that implicitly soft-masks
irqs due to the instruction addresses. This is required because scv
enters the kernel with MSR[EE]=1.

If a NMI (including soft-NMI) interrupt hits when we are implicitly
soft-masked then its regs->softe does not reflect this because it is
derived from the explicit soft mask state (paca->irq_soft_mask). This
makes arch_irq_disabled_regs(regs) return false.

This can trigger a warning in the soft-NMI watchdog code (shown below).
Fix it by having NMI interrupts set regs->softe to disabled in case of
interrupting an implicit soft-masked region.

  ------------[ cut here ]------------
  WARNING: CPU: 41 PID: 1103 at arch/powerpc/kernel/watchdog.c:259 soft_nmi_interrupt+0x3e4/0x5f0
  CPU: 41 PID: 1103 Comm: (spawn) Not tainted
  NIP:  c000000000039534 LR: c000000000039234 CTR: c000000000009a00
  REGS: c000007fffbcf940 TRAP: 0700   Not tainted
  MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 22042482  XER: 200400ad
  CFAR: c000000000039260 IRQMASK: 3
  GPR00: c000000000039204 c000007fffbcfbe0 c000000001d6c300 0000000000000003
  GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020
  GPR08: 0000007ffd4e0000 0000000000000000 c000007ffffceb00 7265677368657265
  GPR12: 9000000000009033 c000007ffffceb00 00000f7075bf4480 000000000000002a
  GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000
  GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040
  GPR24: 0000000000000000 0000000000000029 c000000001dae058 0000000000000029
  GPR28: 0000000000000000 0000000000000800 0000000000000009 c000007fffbcfd60
  NIP [c000000000039534] soft_nmi_interrupt+0x3e4/0x5f0
  LR [c000000000039234] soft_nmi_interrupt+0xe4/0x5f0
  Call Trace:
  [c000007fffbcfbe0] [c000000000039204] soft_nmi_interrupt+0xb4/0x5f0 (unreliable)
  [c000007fffbcfcf0] [c00000000000c0e8] soft_nmi_common+0x138/0x1c4
  --- interrupt: 900 at end_real_trampolines+0x0/0x1000
  NIP:  c000000000003000 LR: 00007ca426adb03c CTR: 900000000280f033
  REGS: c000007fffbcfd60 TRAP: 0900
  MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44042482  XER: 200400ad
  CFAR: 00007ca426946020 IRQMASK: 0
  GPR00: 00000000000000ad 00007ffffa45d050 00007ca426b07f00 0000000000000035
  GPR04: 00007ffffa45d078 0000000000000000 0000000000000008 0000000000000020
  GPR08: 0000000000000000 0000000000100000 0000000010000000 00007ffffa45d110
  GPR12: 0000000000000001 00007ca426d4e680 00000f7075bf4480 000000000000002a
  GPR16: 00000f705745a528 00007ffffa45ddd8 00000f70574d0008 0000000000000000
  GPR20: 00000f7075c58d70 00000f7057459c38 0000000000000001 0000000000000040
  GPR24: 0000000000000000 00000f7057473f68 0000000000000003 000000000000041b
  GPR28: 00007ffffa45d4c4 0000000000000035 0000000000000000 00000f7057473f68
  NIP [c000000000003000] end_real_trampolines+0x0/0x1000
  LR [00007ca426adb03c] 0x7ca426adb03c
  --- interrupt: 900
  Instruction dump:
  60000000 60000000 60420000 38600001 482b3ae5 60000000 e93f0138 a36d0008
  7daa6b78 71290001 7f7907b4 4082fd34 <0fe00000> 4bfffd2c 60420000 ea6100a8
  ---[ end trace dc75f67d819779da ]---

Fixes: 118178e62e ("powerpc: move NMI entry/exit code into wrapper")
Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210503111708.758261-1-npiggin@gmail.com
2021-05-14 17:28:52 +10:00
Christophe Leroy 7315e457d6 powerpc/uaccess: Fix __get_user() with CONFIG_CC_HAS_ASM_GOTO_OUTPUT
Building kernel mainline with GCC 11 leads to following failure
when starting 'init':

  init[1]: bad frame in sys_sigreturn: 7ff5a900 nip 001083cc lr 001083c4
  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

This is an issue due to a segfault happening in
__unsafe_restore_general_regs() in a loop copying registers from user
to kernel:

  10:	7d 09 03 a6 	mtctr   r8
  14:	80 ca 00 00 	lwz     r6,0(r10)
  18:	80 ea 00 04 	lwz     r7,4(r10)
  1c:	90 c9 00 08 	stw     r6,8(r9)
  20:	90 e9 00 0c 	stw     r7,12(r9)
  24:	39 0a 00 08 	addi    r8,r10,8
  28:	39 29 00 08 	addi    r9,r9,8
  2c:	81 4a 00 08 	lwz     r10,8(r10)  <== r10 is clobbered here
  30:	81 6a 00 0c 	lwz     r11,12(r10)
  34:	91 49 00 08 	stw     r10,8(r9)
  38:	91 69 00 0c 	stw     r11,12(r9)
  3c:	39 48 00 08 	addi    r10,r8,8
  40:	39 29 00 08 	addi    r9,r9,8
  44:	42 00 ff d0 	bdnz    14 <__unsafe_restore_general_regs+0x14>

As shown above, this is due to r10 being re-used by GCC. This didn't
happen with CLANG.

This is fixed by tagging 'x' output as an earlyclobber operand in
__get_user_asm2_goto().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/cf0a050d124d4f426cdc7a74009d17b01d8d8969.1620465917.git.christophe.leroy@csgroup.eu
2021-05-12 11:07:39 +10:00
Nicholas Piggin 7058f4b13e powerpc/pseries: use notrace hcall variant for H_CEDE idle
Rather than special-case H_CEDE in the hcall trace wrappers, make the
idle H_CEDE call use plpar_hcall_norets_notrace().

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210508101455.1578318-4-npiggin@gmail.com
2021-05-12 11:07:38 +10:00
Nicholas Piggin 2c8c89b958 powerpc/pseries: Fix hcall tracing recursion in pv queued spinlocks
The paravit queued spinlock slow path adds itself to the queue then
calls pv_wait to wait for the lock to become free. This is implemented
by calling H_CONFER to donate cycles.

When hcall tracing is enabled, this H_CONFER call can lead to a spin
lock being taken in the tracing code, which will result in the lock to
be taken again, which will also go to the slow path because it queues
behind itself and so won't ever make progress.

An example trace of a deadlock:

  __pv_queued_spin_lock_slowpath
  trace_clock_global
  ring_buffer_lock_reserve
  trace_event_buffer_lock_reserve
  trace_event_buffer_reserve
  trace_event_raw_event_hcall_exit
  __trace_hcall_exit
  plpar_hcall_norets_trace
  __pv_queued_spin_lock_slowpath
  trace_clock_global
  ring_buffer_lock_reserve
  trace_event_buffer_lock_reserve
  trace_event_buffer_reserve
  trace_event_raw_event_rcu_dyntick
  rcu_irq_exit
  irq_exit
  __do_irq
  call_do_irq
  do_IRQ
  hardware_interrupt_common_virt

Fix this by introducing plpar_hcall_norets_notrace(), and using that to
make SPLPAR virtual processor dispatching hcalls by the paravirt
spinlock code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210508101455.1578318-2-npiggin@gmail.com
2021-05-12 11:07:38 +10:00
Christophe Leroy a78339698a powerpc/interrupts: Fix kuep_unlock() call
Same as kuap_user_restore(), kuep_unlock() has to be called when
really returning to user, that is in interrupt_exit_user_prepare(),
not in interrupt_exit_prepare().

Fixes: b5efec00b6 ("powerpc/32s: Move KUEP locking/unlocking in C")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b831e54a2579db24fbef836ed415588ce2b3e825.1620312573.git.christophe.leroy@csgroup.eu
2021-05-12 11:07:37 +10:00
Arnd Bergmann f12d3ff3f4 powerpc: use linux/unaligned/le_struct.h on LE power7
Little-endian POWER7 kernels disable
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS because that is not supported on
the hardware, but the kernel still uses direct load/store for explicti
get_unaligned()/put_unaligned().

I assume this is a mistake that leads to power7 having to trap and fix
up all these unaligned accesses at a noticeable performance cost.

The fix is completely trivial, just remove the file and use the
generic version that gets it right.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-05-10 17:50:47 +02:00
Michael Ellerman f96271cefe Merge branch 'master' into next
Merge master back into next, this allows us to resolve some conflicts in
arch/powerpc/Kconfig, and also re-sort the symbols under config PPC so
that they are in alphabetical order again.
2021-05-08 21:12:55 +10:00
Linus Torvalds a48b0872e6 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:
 "This is everything else from -mm for this merge window.

  90 patches.

  Subsystems affected by this patch series: mm (cleanups and slub),
  alpha, procfs, sysctl, misc, core-kernel, bitmap, lib, compat,
  checkpatch, epoll, isofs, nilfs2, hpfs, exit, fork, kexec, gcov,
  panic, delayacct, gdb, resource, selftests, async, initramfs, ipc,
  drivers/char, and spelling"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (90 commits)
  mm: fix typos in comments
  mm: fix typos in comments
  treewide: remove editor modelines and cruft
  ipc/sem.c: spelling fix
  fs: fat: fix spelling typo of values
  kernel/sys.c: fix typo
  kernel/up.c: fix typo
  kernel/user_namespace.c: fix typos
  kernel/umh.c: fix some spelling mistakes
  include/linux/pgtable.h: few spelling fixes
  mm/slab.c: fix spelling mistake "disired" -> "desired"
  scripts/spelling.txt: add "overflw"
  scripts/spelling.txt: Add "diabled" typo
  scripts/spelling.txt: add "overlfow"
  arm: print alloc free paths for address in registers
  mm/vmalloc: remove vwrite()
  mm: remove xlate_dev_kmem_ptr()
  drivers/char: remove /dev/kmem for good
  mm: fix some typos and code style problems
  ipc/sem.c: mundane typo fixes
  ...
2021-05-07 00:34:51 -07:00
David Hildenbrand f2e762bab9 mm: remove xlate_dev_kmem_ptr()
Since /dev/kmem has been removed, let's remove the xlate_dev_kmem_ptr()
leftovers.

Link: https://lkml.kernel.org/r/20210324102351.6932-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07 00:26:34 -07:00
Nicholas Piggin 32b48bf851 KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks
Commit b1c5356e87 ("KVM: PPC: Convert to the gfn-based MMU notifier
callbacks") causes unmap_gfn_range and age_gfn callbacks to only work
on the first gfn in the range. It also makes the aging callbacks call
into both radix and hash aging functions for radix guests. Fix this.

Add warnings for the single-gfn calls that have been converted to range
callbacks, in case they ever receieve ranges greater than 1.

Fixes: b1c5356e87 ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks")
Reported-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210505121509.1470207-1-npiggin@gmail.com
2021-05-06 00:25:42 +10:00
Linus Torvalds 9b1f61d5d7 tracing updates for 5.13
New feature:
 
  The "func-no-repeats" option in tracefs/options directory. When set
  the function tracer will detect if the current function being traced
  is the same as the previous one, and instead of recording it, it will
  keep track of the number of times that the function is repeated in a row.
  And when another function is recorded, it will write a new event that
  shows the function that repeated, the number of times it repeated and
  the time stamp of when the last repeated function occurred.
 
 Enhancements:
 
  In order to implement the above "func-no-repeats" option, the ring
  buffer timestamp can now give the accurate timestamp of the event
  as it is being recorded, instead of having to record an absolute
  timestamp for all events. This helps the histogram code which no longer
  needs to waste ring buffer space.
 
  New validation logic to make sure all trace events that access
  dereferenced pointers do so in a safe way, and will warn otherwise.
 
 Fixes:
 
  No longer limit the PIDs of tasks that are recorded for "saved_cmdlines"
  to PID_MAX_DEFAULT (32768), as systemd now allows for a much larger
  range. This caused the mapping of PIDs to the task names to be dropped
  for all tasks with a PID greater than 32768.
 
  Change trace_clock_global() to never block. This caused a deadlock.
 
 Clean ups:
 
  Typos, prototype fixes, and removing of duplicate or unused code.
 
  Better management of ftrace_page allocations.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYI/1vBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qiL0AP9EemIC5TDh2oihqLRNeUjdTu0ryEoM
 HRFqxozSF985twD/bfkt86KQC8rLHwxTbxQZ863bmdaC6cMGFhWiF+H/MAs=
 =psYt
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "New feature:

   - A new "func-no-repeats" option in tracefs/options directory.

     When set the function tracer will detect if the current function
     being traced is the same as the previous one, and instead of
     recording it, it will keep track of the number of times that the
     function is repeated in a row. And when another function is
     recorded, it will write a new event that shows the function that
     repeated, the number of times it repeated and the time stamp of
     when the last repeated function occurred.

  Enhancements:

   - In order to implement the above "func-no-repeats" option, the ring
     buffer timestamp can now give the accurate timestamp of the event
     as it is being recorded, instead of having to record an absolute
     timestamp for all events. This helps the histogram code which no
     longer needs to waste ring buffer space.

   - New validation logic to make sure all trace events that access
     dereferenced pointers do so in a safe way, and will warn otherwise.

  Fixes:

   - No longer limit the PIDs of tasks that are recorded for
     "saved_cmdlines" to PID_MAX_DEFAULT (32768), as systemd now allows
     for a much larger range. This caused the mapping of PIDs to the
     task names to be dropped for all tasks with a PID greater than
     32768.

   - Change trace_clock_global() to never block. This caused a deadlock.

  Clean ups:

   - Typos, prototype fixes, and removing of duplicate or unused code.

   - Better management of ftrace_page allocations"

* tag 'trace-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (32 commits)
  tracing: Restructure trace_clock_global() to never block
  tracing: Map all PIDs to command lines
  ftrace: Reuse the output of the function tracer for func_repeats
  tracing: Add "func_no_repeats" option for function tracing
  tracing: Unify the logic for function tracing options
  tracing: Add method for recording "func_repeats" events
  tracing: Add "last_func_repeats" to struct trace_array
  tracing: Define new ftrace event "func_repeats"
  tracing: Define static void trace_print_time()
  ftrace: Simplify the calculation of page number for ftrace_page->records some more
  ftrace: Store the order of pages allocated in ftrace_page
  tracing: Remove unused argument from "ring_buffer_time_stamp()
  tracing: Remove duplicate struct declaration in trace_events.h
  tracing: Update create_system_filter() kernel-doc comment
  tracing: A minor cleanup for create_system_filter()
  kernel: trace: Mundane typo fixes in the file trace_events_filter.c
  tracing: Fix various typos in comments
  scripts/recordmcount.pl: Make vim and emacs indent the same
  scripts/recordmcount.pl: Make indent spacing consistent
  tracing: Add a verifier to check string pointers for trace events
  ...
2021-05-03 11:19:54 -07:00
Christoph Hellwig 562d1e207d powerpc/powernv: remove the nvlink support
This code was only used by the vfio-nvlink2 code, which itself had no
proper use.  Drop this huge chunk of code build into every powernv
or generic build.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210326061311.1497642-3-hch@lst.de
2021-05-02 23:35:32 +10:00
Linus Torvalds 152d32aa84 ARM:
- Stage-2 isolation for the host kernel when running in protected mode
 
 - Guest SVE support when running in nVHE mode
 
 - Force W^X hypervisor mappings in nVHE mode
 
 - ITS save/restore for guests using direct injection with GICv4.1
 
 - nVHE panics now produce readable backtraces
 
 - Guest support for PTP using the ptp_kvm driver
 
 - Performance improvements in the S2 fault handler
 
 x86:
 
 - Optimizations and cleanup of nested SVM code
 
 - AMD: Support for virtual SPEC_CTRL
 
 - Optimizations of the new MMU code: fast invalidation,
   zap under read lock, enable/disably dirty page logging under
   read lock
 
 - /dev/kvm API for AMD SEV live migration (guest API coming soon)
 
 - support SEV virtual machines sharing the same encryption context
 
 - support SGX in virtual machines
 
 - add a few more statistics
 
 - improved directed yield heuristics
 
 - Lots and lots of cleanups
 
 Generic:
 
 - Rework of MMU notifier interface, simplifying and optimizing
 the architecture-specific code
 
 - Some selftests improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCJ13kUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroM1HAgAqzPxEtiTPTFeFJV5cnPPJ3dFoFDK
 y/juZJUQ1AOtvuWzzwuf175ewkv9vfmtG6rVohpNSkUlJYeoc6tw7n8BTTzCVC1b
 c/4Dnrjeycr6cskYlzaPyV6MSgjSv5gfyj1LA5UEM16LDyekmaynosVWY5wJhju+
 Bnyid8l8Utgz+TLLYogfQJQECCrsU0Wm//n+8TWQgLf1uuiwshU5JJe7b43diJrY
 +2DX+8p9yWXCTz62sCeDWNahUv8AbXpMeJ8uqZPYcN1P0gSEUGu8xKmLOFf9kR7b
 M4U1Gyz8QQbjd2lqnwiWIkvRLX6gyGVbq2zH0QbhUe5gg3qGUX7JjrhdDQ==
 =AXUi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "This is a large update by KVM standards, including AMD PSP (Platform
  Security Processor, aka "AMD Secure Technology") and ARM CoreSight
  (debug and trace) changes.

  ARM:

   - CoreSight: Add support for ETE and TRBE

   - Stage-2 isolation for the host kernel when running in protected
     mode

   - Guest SVE support when running in nVHE mode

   - Force W^X hypervisor mappings in nVHE mode

   - ITS save/restore for guests using direct injection with GICv4.1

   - nVHE panics now produce readable backtraces

   - Guest support for PTP using the ptp_kvm driver

   - Performance improvements in the S2 fault handler

  x86:

   - AMD PSP driver changes

   - Optimizations and cleanup of nested SVM code

   - AMD: Support for virtual SPEC_CTRL

   - Optimizations of the new MMU code: fast invalidation, zap under
     read lock, enable/disably dirty page logging under read lock

   - /dev/kvm API for AMD SEV live migration (guest API coming soon)

   - support SEV virtual machines sharing the same encryption context

   - support SGX in virtual machines

   - add a few more statistics

   - improved directed yield heuristics

   - Lots and lots of cleanups

  Generic:

   - Rework of MMU notifier interface, simplifying and optimizing the
     architecture-specific code

   - a handful of "Get rid of oprofile leftovers" patches

   - Some selftests improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
  KVM: selftests: Speed up set_memory_region_test
  selftests: kvm: Fix the check of return value
  KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
  KVM: SVM: Skip SEV cache flush if no ASIDs have been used
  KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
  KVM: SVM: Drop redundant svm_sev_enabled() helper
  KVM: SVM: Move SEV VMCB tracking allocation to sev.c
  KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
  KVM: SVM: Unconditionally invoke sev_hardware_teardown()
  KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
  KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
  KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
  KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
  KVM: SVM: Move SEV module params/variables to sev.c
  KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
  KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
  KVM: SVM: Zero out the VMCB array used to track SEV ASID association
  x86/sev: Drop redundant and potentially misleading 'sev_enabled'
  KVM: x86: Move reverse CPUID helpers to separate header file
  KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
  ...
2021-05-01 10:14:08 -07:00
Linus Torvalds 4f9701057a IOMMU Updates for Linux v5.13
Including:
 
 	- Big cleanup of almost unsused parts of the IOMMU API by
 	  Christoph Hellwig. This mostly affects the Freescale PAMU
 	  driver.
 
 	- New IOMMU driver for Unisoc SOCs
 
 	- ARM SMMU Updates from Will:
 
 	  - SMMUv3: Drop vestigial PREFETCH_ADDR support
 	  - SMMUv3: Elide TLB sync logic for empty gather
 	  - SMMUv3: Fix "Service Failure Mode" handling
     	  - SMMUv2: New Qualcomm compatible string
 
 	- Removal of the AMD IOMMU performance counter writeable check
 	  on AMD. It caused long boot delays on some machines and is
 	  only needed to work around an errata on some older (possibly
 	  pre-production) chips. If someone is still hit by this
 	  hardware issue anyway the performance counters will just
 	  return 0.
 
 	- Support for targeted invalidations in the AMD IOMMU driver.
 	  Before that the driver only invalidated a single 4k page or the
 	  whole IO/TLB for an address space. This has been extended now
 	  and is mostly useful for emulated AMD IOMMUs.
 
 	- Several fixes for the Shared Virtual Memory support in the
 	  Intel VT-d driver
 
 	- Mediatek drivers can now be built as modules
 
 	- Re-introduction of the forcedac boot option which got lost
 	  when converting the Intel VT-d driver to the common dma-iommu
 	  implementation.
 
 	- Extension of the IOMMU device registration interface and
 	  support iommu_ops to be const again when drivers are built as
 	  modules.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmCMEIoACgkQK/BELZcB
 GuOu9xAAvg6aR0uHlxvRq6cgNnHN9Ltp5+t3qFYtRRrauY0iOPMO62k0QQli5shX
 CGeczD0e59KAZqI0zNJnQn8hMY5dg7XVkFCC5BrSzuCDCtwJZ0N5Tq3pfUlaV1rw
 BJf41t79Fd+jp7kn53tu+vRAfYZ3+sLOx/6U3c15pqKRZSkyFWbQllOtD3J5LnLu
 1PyPlfiNpMwCajiS7aQbN+fuJ/lKIFeA2MDPOsCBzhbfxiJUqJxZOKAZO3rOjFfK
 feTibqQ+3Zz6MPXt9st1cvPpy8jCosv81OY6Knqvxf/oB5q+fEdi2uNrKISonb/t
 Fw331oOIwg2A+HOpwC9MN1AumOIqiHSWWENAMk9SlP+TMIWKQ8kZreyI6IEB23dV
 +QvP3DVA+CfLwtNY/Zh0IqKh28D+IHlKbpWNU1m+9AUe468mV/MTjfwxr9Yfffhm
 LZ6C0DgFdmtqv8jPuDGUOgo3RNeN8bLnUSEHG9gHibA+RKujl5BWDjKkwILqMQTt
 Ysdsu8TiNtFIULomizqCpgqEbQfW8TLFvASXCM1VMQ/PDURxvchZPxFDJonYXy+K
 z2HGaG3eUE07YrAdRKH69aMVIbmS+sjEhvmi4xZ1Lh7wWcIE2AZVvO8qNb+Ckcp3
 4tLPPDksm/iQngnFf6gdgH3qv4rgbzE4+74GXqeANiQCjY9dSJI=
 =qF2C
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:

 - Big cleanup of almost unsused parts of the IOMMU API by Christoph
   Hellwig. This mostly affects the Freescale PAMU driver.

 - New IOMMU driver for Unisoc SOCs

 - ARM SMMU Updates from Will:
     - Drop vestigial PREFETCH_ADDR support (SMMUv3)
     - Elide TLB sync logic for empty gather (SMMUv3)
     - Fix "Service Failure Mode" handling (SMMUv3)
     - New Qualcomm compatible string (SMMUv2)

 - Removal of the AMD IOMMU performance counter writeable check on AMD.
   It caused long boot delays on some machines and is only needed to
   work around an errata on some older (possibly pre-production) chips.
   If someone is still hit by this hardware issue anyway the performance
   counters will just return 0.

 - Support for targeted invalidations in the AMD IOMMU driver. Before
   that the driver only invalidated a single 4k page or the whole IO/TLB
   for an address space. This has been extended now and is mostly useful
   for emulated AMD IOMMUs.

 - Several fixes for the Shared Virtual Memory support in the Intel VT-d
   driver

 - Mediatek drivers can now be built as modules

 - Re-introduction of the forcedac boot option which got lost when
   converting the Intel VT-d driver to the common dma-iommu
   implementation.

 - Extension of the IOMMU device registration interface and support
   iommu_ops to be const again when drivers are built as modules.

* tag 'iommu-updates-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (84 commits)
  iommu: Streamline registration interface
  iommu: Statically set module owner
  iommu/mediatek-v1: Add error handle for mtk_iommu_probe
  iommu/mediatek-v1: Avoid build fail when build as module
  iommu/mediatek: Always enable the clk on resume
  iommu/fsl-pamu: Fix uninitialized variable warning
  iommu/vt-d: Force to flush iotlb before creating superpage
  iommu/amd: Put newline after closing bracket in warning
  iommu/vt-d: Fix an error handling path in 'intel_prepare_irq_remapping()'
  iommu/vt-d: Fix build error of pasid_enable_wpe() with !X86
  iommu/amd: Remove performance counter pre-initialization test
  Revert "iommu/amd: Fix performance counter initialization"
  iommu/amd: Remove duplicate check of devid
  iommu/exynos: Remove unneeded local variable initialization
  iommu/amd: Page-specific invalidations for more than one page
  iommu/arm-smmu-v3: Remove the unused fields for PREFETCH_CONFIG command
  iommu/vt-d: Avoid unnecessary cache flush in pasid entry teardown
  iommu/vt-d: Invalidate PASID cache when root/context entry changed
  iommu/vt-d: Remove WO permissions on second-level paging entries
  iommu/vt-d: Report the right page fault address
  ...
2021-05-01 09:33:00 -07:00
Linus Torvalds d42f323a7d Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "A few misc subsystems and some of MM.

  175 patches.

  Subsystems affected by this patch series: ia64, kbuild, scripts, sh,
  ocfs2, kfifo, vfs, kernel/watchdog, and mm (slab-generic, slub,
  kmemleak, debug, pagecache, msync, gup, memremap, memcg, pagemap,
  mremap, dma, sparsemem, vmalloc, documentation, kasan, initialization,
  pagealloc, and memory-failure)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (175 commits)
  mm/memory-failure: unnecessary amount of unmapping
  mm/mmzone.h: fix existing kernel-doc comments and link them to core-api
  mm: page_alloc: ignore init_on_free=1 for debug_pagealloc=1
  net: page_pool: use alloc_pages_bulk in refill code path
  net: page_pool: refactor dma_map into own function page_pool_dma_map
  SUNRPC: refresh rq_pages using a bulk page allocator
  SUNRPC: set rq_page_end differently
  mm/page_alloc: inline __rmqueue_pcplist
  mm/page_alloc: optimize code layout for __alloc_pages_bulk
  mm/page_alloc: add an array-based interface to the bulk page allocator
  mm/page_alloc: add a bulk page allocator
  mm/page_alloc: rename alloced to allocated
  mm/page_alloc: duplicate include linux/vmalloc.h
  mm, page_alloc: avoid page_to_pfn() in move_freepages()
  mm/Kconfig: remove default DISCONTIGMEM_MANUAL
  mm: page_alloc: dump migrate-failed pages
  mm/mempolicy: fix mpol_misplaced kernel-doc
  mm/mempolicy: rewrite alloc_pages_vma documentation
  mm/mempolicy: rewrite alloc_pages documentation
  mm/mempolicy: rename alloc_pages_current to alloc_pages
  ...
2021-04-30 14:38:01 -07:00
Linus Torvalds c70a4be130 powerpc updates for 5.13
- Enable KFENCE for 32-bit.
 
  - Implement EBPF for 32-bit.
 
  - Convert 32-bit to do interrupt entry/exit in C.
 
  - Convert 64-bit BookE to do interrupt entry/exit in C.
 
  - Changes to our signal handling code to use user_access_begin/end() more extensively.
 
  - Add support for time namespaces (CONFIG_TIME_NS)
 
  - A series of fixes that allow us to reenable STRICT_KERNEL_RWX.
 
  - Other smaller features, fixes & cleanups.
 
 Thanks to: Alexey Kardashevskiy, Andreas Schwab, Andrew Donnellan, Aneesh Kumar K.V,
   Athira Rajeev, Bhaskar Chowdhury, Bixuan Cui, Cédric Le Goater, Chen Huang, Chris
   Packham, Christophe Leroy, Christopher M. Riedl, Colin Ian King, Dan Carpenter, Daniel
   Axtens, Daniel Henrique Barboza, David Gibson, Davidlohr Bueso, Denis Efremov,
   dingsenjie, Dmitry Safonov, Dominic DeMarco, Fabiano Rosas, Ganesh Goudar, Geert
   Uytterhoeven, Geetika Moolchandani, Greg Kurz, Guenter Roeck, Haren Myneni, He Ying,
   Jiapeng Chong, Jordan Niethe, Laurent Dufour, Lee Jones, Leonardo Bras, Li Huafei,
   Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Nathan Chancellor, Nathan
   Lynch, Nicholas Piggin, Oliver O'Halloran, Paul Menzel, Pu Lehui, Randy Dunlap, Ravi
   Bangoria, Rosen Penev, Russell Currey, Santosh Sivaraj, Sebastian Andrzej Siewior,
   Segher Boessenkool, Shivaprasad G Bhat, Srikar Dronamraju, Stephen Rothwell, Thadeu Lima
   de Souza Cascardo, Thomas Gleixner, Tony Ambardar, Tyrel Datwyler, Vaibhav Jain,
   Vincenzo Frascino, Xiongwei Song, Yang Li, Yu Kuai, Zhang Yunkai.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmCLV1kTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLUyD/4jrTolG4sVec211hYO+0VuJzoqN4Cf
 j2CA2Ju39butnSMiq4LJUPRB7QRZY1OofkoNFpZeDQspjfZXPz2ulpYAz+SxHWE2
 ReHPmWH1rOABlUPXFboePF4OLwmAs9eR5mN2z9HpKXbT3k78HaToLqiONyB4fVCr
 Q5TkJeRn/Y7ZJLdyPLTpczHHleQ8KoM6kT7ncXnTm6p97JOBJSrGaJ5N/8X5a4+e
 6jtgB7Pvw8jNDShSr8BDLBgBZZcmoTiuG8KfgwRZ+m+mKB1yI2X8S/a54w/lDi9g
 UcSv3jQcFLJuW+T/pYe4R330uWDYa0cwjJOtMmsJ98S4EYOevoe9fZuL97qNshme
 xtBr4q1i03G1icYOJJ8dXtvabG2rUzj8t1SCDpwYfrynzTWVRikiQYTXUBhRSFoK
 nsoklvKd2IZa485XYJ2ljSyClMy8S4yJJ9RuzZ94DTXDSJUesKuyRWGnso4mhkcl
 wvl4wwMTJvnCMKVo6dsJyV24QWfd6dABxzm04uPA94CKhG33UwK8252jXVeaohSb
 WSO7qWBONgDXQLJ0mXRcEYa9NHvFS4Jnp6APbxnHr1gS+K+PNkD4gPBf34FoyN0E
 9s27kvEYk5vr8APUclETF6+FkbGUD5bFbusjt3hYloFpAoHQ/k5pFVDsOZNPA8sW
 fDIRp05KunDojw==
 =dfKL
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Enable KFENCE for 32-bit.

 - Implement EBPF for 32-bit.

 - Convert 32-bit to do interrupt entry/exit in C.

 - Convert 64-bit BookE to do interrupt entry/exit in C.

 - Changes to our signal handling code to use user_access_begin/end()
   more extensively.

 - Add support for time namespaces (CONFIG_TIME_NS)

 - A series of fixes that allow us to reenable STRICT_KERNEL_RWX.

 - Other smaller features, fixes & cleanups.

Thanks to Alexey Kardashevskiy, Andreas Schwab, Andrew Donnellan, Aneesh
Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Bixuan Cui, Cédric Le
Goater, Chen Huang, Chris Packham, Christophe Leroy, Christopher M.
Riedl, Colin Ian King, Dan Carpenter, Daniel Axtens, Daniel Henrique
Barboza, David Gibson, Davidlohr Bueso, Denis Efremov, dingsenjie,
Dmitry Safonov, Dominic DeMarco, Fabiano Rosas, Ganesh Goudar, Geert
Uytterhoeven, Geetika Moolchandani, Greg Kurz, Guenter Roeck, Haren
Myneni, He Ying, Jiapeng Chong, Jordan Niethe, Laurent Dufour, Lee
Jones, Leonardo Bras, Li Huafei, Madhavan Srinivasan, Mahesh Salgaonkar,
Masahiro Yamada, Nathan Chancellor, Nathan Lynch, Nicholas Piggin,
Oliver O'Halloran, Paul Menzel, Pu Lehui, Randy Dunlap, Ravi Bangoria,
Rosen Penev, Russell Currey, Santosh Sivaraj, Sebastian Andrzej Siewior,
Segher Boessenkool, Shivaprasad G Bhat, Srikar Dronamraju, Stephen
Rothwell, Thadeu Lima de Souza Cascardo, Thomas Gleixner, Tony Ambardar,
Tyrel Datwyler, Vaibhav Jain, Vincenzo Frascino, Xiongwei Song, Yang Li,
Yu Kuai, and Zhang Yunkai.

* tag 'powerpc-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (302 commits)
  powerpc/signal32: Fix erroneous SIGSEGV on RT signal return
  powerpc: Avoid clang uninitialized warning in __get_user_size_allowed
  powerpc/papr_scm: Mark nvdimm as unarmed if needed during probe
  powerpc/kvm: Fix build error when PPC_MEM_KEYS/PPC_PSERIES=n
  powerpc/kasan: Fix shadow start address with modules
  powerpc/kernel/iommu: Use largepool as a last resort when !largealloc
  powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs
  powerpc/44x: fix spelling mistake in Kconfig "varients" -> "variants"
  powerpc/iommu: Annotate nested lock for lockdep
  powerpc/iommu: Do not immediately panic when failed IOMMU table allocation
  powerpc/iommu: Allocate it_map by vmalloc
  selftests/powerpc: remove unneeded semicolon
  powerpc/64s: remove unneeded semicolon
  powerpc/eeh: remove unneeded semicolon
  powerpc/selftests: Add selftest to test concurrent perf/ptrace events
  powerpc/selftests/perf-hwbreak: Add testcases for 2nd DAWR
  powerpc/selftests/perf-hwbreak: Coalesce event creation code
  powerpc/selftests/ptrace-hwbreak: Add testcases for 2nd DAWR
  powerpc/configs: Add IBMVNIC to some 64-bit configs
  selftests/powerpc: Add uaccess flush test
  ...
2021-04-30 12:22:28 -07:00
Nicholas Piggin 6f680e70b6 mm/vmalloc: provide fallback arch huge vmap support functions
If an architecture doesn't support a particular page table level as a huge
vmap page size then allow it to skip defining the support query function.

Link: https://lkml.kernel.org/r/20210317062402.533919-11-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Nicholas Piggin 8309c9d717 powerpc: inline huge vmap supported functions
This allows unsupported levels to be constant folded away, and so
p4d_free_pud_page can be removed because it's no longer linked to.

Link: https://lkml.kernel.org/r/20210317062402.533919-8-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Nicholas Piggin bbc180a5ad mm: HUGE_VMAP arch support cleanup
This changes the awkward approach where architectures provide init
functions to determine which levels they can provide large mappings for,
to one where the arch is queried for each call.

This removes code and indirection, and allows constant-folding of dead
code for unsupported levels.

This also adds a prot argument to the arch query.  This is unused
currently but could help with some architectures (e.g., some powerpc
processors can't map uncacheable memory with large pages).

Link: https://lkml.kernel.org/r/20210317062402.533919-7-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Linus Torvalds 0080665fbd Devicetree updates for v5.13:
- Refactoring powerpc and arm64 kexec DT handling to common code. This
   enables IMA on arm64.
 
 - Add kbuild support for applying DT overlays at build time. The first
   user are the DT unittests.
 
 - Fix kerneldoc formatting and W=1 warnings in drivers/of/
 
 - Fix handling 64-bit flag on PCI resources
 
 - Bump dtschema version required to v2021.2.1
 
 - Enable undocumented compatible checks for dtbs_check. This allows
   tracking of missing binding schemas.
 
 - DT docs improvements. Regroup the DT docs and add the example schema
   and DT kernel ABI docs to the doc build.
 
 - Convert Broadcom Bluetooth and video-mux bindings to schema
 
 - Add QCom sm8250 Venus video codec binding schema
 
 - Add vendor prefixes for AESOP, YIC System Co., Ltd, and Siliconfile
   Technologies Inc.
 
 - Cleanup of DT schema type references on common properties and
   standard unit properties
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmCIYdgQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw/PKEACkOCWDnLSY9U7w1uGDHr6UgXIWOY9j8bYy
 2pTvDrVa6KZphT6yGU/hxrOk8Mqh5AMd2vUhO2OCoyyl/priTv+Ktqo+bikvJZLa
 MQm3JnrLpPy/GetdmVD8wq1l+FoeOSTnRIJqRxInsd8UFVpZImtP22ELox6KgGiv
 keVHIrjsHU/HpafK3w8wHCLikCZk+1Gl6pL/QgFDv2FaaCTKW16Dt64dPqYm49Xk
 j7YMMQWl+3NJ9ywZV0+PMbl9udi3EjGm5Ap5VfKzpj53Nh07QObg/QtH/1sj0HPo
 apyW7jAyQFyLytbjxzFL/tljtOeW/5rZos1GWThZ326e+Y0mTKUTDZShvNplfjIf
 e26FvVi7gndWlRSr30Ia5gdNFAx72IkpJUAuypBXgb+qNPchBJjAXLn9tcIcg/k+
 2R6BIB7SkVLpgTnJ1Bq1+PRqkKM+ggACdJNJIUApj44xoiG01vtGDGRaFuIio+Ch
 HT4aBbic4kLvagm8VzuiIF/sL7af5pntzArcyOfQTaZ92DyGI2C0j90rK3yPRIYM
 u9qX/24t1SXiUji74QpoQFzt/+Egy5hYXMJOJJSywUjKf7DBhehqklTjiJRQHKm6
 0DJ/n8q4lNru8F0Y4keKSuYTfHBstF7fS3UTH/rUmBAbfEwkvZe6B29KQbs+7aph
 GTw+jeoR5Q==
 =rF27
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:

 - Refactor powerpc and arm64 kexec DT handling to common code. This
   enables IMA on arm64.

 - Add kbuild support for applying DT overlays at build time. The first
   user are the DT unittests.

 - Fix kerneldoc formatting and W=1 warnings in drivers/of/

 - Fix handling 64-bit flag on PCI resources

 - Bump dtschema version required to v2021.2.1

 - Enable undocumented compatible checks for dtbs_check. This allows
   tracking of missing binding schemas.

 - DT docs improvements. Regroup the DT docs and add the example schema
   and DT kernel ABI docs to the doc build.

 - Convert Broadcom Bluetooth and video-mux bindings to schema

 - Add QCom sm8250 Venus video codec binding schema

 - Add vendor prefixes for AESOP, YIC System Co., Ltd, and Siliconfile
   Technologies Inc.

 - Cleanup of DT schema type references on common properties and
   standard unit properties

* tag 'devicetree-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (64 commits)
  powerpc: If kexec_build_elf_info() fails return immediately from elf64_load()
  powerpc: Free fdt on error in elf64_load()
  of: overlay: Fix kerneldoc warning in of_overlay_remove()
  of: linux/of.h: fix kernel-doc warnings
  of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses
  dt-bindings: bcm4329-fmac: add optional brcm,ccode-map
  docs: dt: update writing-schema.rst references
  dt-bindings: media: venus: Add sm8250 dt schema
  of: base: Fix spelling issue with function param 'prop'
  docs: dt: Add DT API documentation
  of: Add missing 'Return' section in kerneldoc comments
  of: Fix kerneldoc output formatting
  docs: dt: Group DT docs into relevant sub-sections
  docs: dt: Make 'Devicetree' wording more consistent
  docs: dt: writing-schema: Include the example schema in the doc build
  docs: dt: writing-schema: Remove spurious indentation
  dt-bindings: Fix reference in submitting-patches.rst to the DT ABI doc
  dt-bindings: ddr: Add optional manufacturer and revision ID to LPDDR3
  dt-bindings: media: video-interfaces: Drop the example
  devicetree: bindings: clock: Minor typo fix in the file armada3700-tbg-clock.txt
  ...
2021-04-28 15:50:24 -07:00
Nathan Chancellor f9cd5f91a8 powerpc: Avoid clang uninitialized warning in __get_user_size_allowed
Commit 9975f852ce ("powerpc/uaccess: Remove calls to __get_user_bad()
and __put_user_bad()") switch to BUILD_BUG() in the default case, which
leaves x uninitialized. This will not be an issue because the build will
be broken in that case but clang does static analysis before it realizes
the default case will be done so it warns about x being uninitialized
(trimmed for brevity):

 In file included from mm/mprotect.c:13:
 In file included from ./include/linux/hugetlb.h:28:
 In file included from ./include/linux/mempolicy.h:16:
 ./include/linux/pagemap.h:772:16: warning: variable '__gu_val' is used
 uninitialized whenever switch default is taken [-Wsometimes-uninitialized]
                 if (unlikely(__get_user(c, uaddr) != 0))
                              ^~~~~~~~~~~~~~~~~~~~
 ./arch/powerpc/include/asm/uaccess.h:266:2: note: expanded from macro '__get_user'
         __get_user_size_allowed(__gu_val, __gu_addr, __gu_size, __gu_err);      \
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ./arch/powerpc/include/asm/uaccess.h:235:2: note: expanded from macro
 '__get_user_size_allowed'
        default: BUILD_BUG();                                   \
        ^~~~~~~

Commit 5cd29b1fd3 ("powerpc/uaccess: Use asm goto for get_user when
compiler supports it") added an initialization for x because of the same
reason. Do the same thing here so there is no warning across all
versions of clang.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/1359
Link: https://lore.kernel.org/r/20210426203518.981550-1-nathan@kernel.org
2021-04-28 23:34:50 +10:00
Christophe Leroy 30c400886b powerpc/kasan: Fix shadow start address with modules
Modules are now located before kernel, KASAN area has to
be extended accordingly.

Fixes: 80edc68e04 ("powerpc/32s: Define a MODULE area below kernel text all the time")
Fixes: 9132a2e82a ("powerpc/8xx: Define a MODULE area below kernel text")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c68163065163f303f5af1e4bbdd9f1ce69f0543e.1619260465.git.christophe.leroy@csgroup.eu
2021-04-25 21:29:04 +10:00
Paolo Bonzini fd49e8ee70 Merge branch 'kvm-sev-cgroup' into HEAD 2021-04-22 13:19:01 -04:00
Christophe Leroy 9ccba66d4d powerpc/64: Fix the definition of the fixmap area
At the time being, the fixmap area is defined at the top of
the address space or just below KASAN.

This definition is not valid for PPC64.

For PPC64, use the top of the I/O space.

Because of circular dependencies, it is not possible to include
asm/fixmap.h in asm/book3s/64/pgtable.h , so define a fixed size
AREA at the top of the I/O space for fixmap and ensure during
build that the size is big enough.

Fixes: 265c3491c4 ("powerpc: Add support for GENERIC_EARLY_IOREMAP")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0d51620eacf036d683d1a3c41328f69adb601dc0.1618925560.git.christophe.leroy@csgroup.eu
2021-04-22 20:59:15 +10:00
Michael Ellerman d936f8182e powerpc/powernv: Fix type of opal_mpipl_query_tag() addr argument
opal_mpipl_query_tag() takes a pointer to a 64-bit value, which firmware
writes a value to. As OPAL is traditionally big endian this value will
be big endian.

This can be confirmed by looking at the implementation in skiboot:

  static uint64_t opal_mpipl_query_tag(enum opal_mpipl_tags tag, __be64 *tag_val)
  {
  	...
  	*tag_val = cpu_to_be64(opal_mpipl_tags[tag]);
  	return OPAL_SUCCESS;
  }

Fix the declaration to annotate that the value is big endian.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210421125402.1955013-2-mpe@ellerman.id.au
2021-04-22 20:59:09 +10:00
Christophe Leroy 41d6cf68b5 powerpc: Rename probe_kernel_read_inst()
When probe_kernel_read_inst() was created, it was to mimic
probe_kernel_read() function.

Since then, probe_kernel_read() has been renamed
copy_from_kernel_nofault().

Rename probe_kernel_read_inst() into copy_inst_from_kernel_nofault().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b783d1f7cdb8914992384a669a2af57051b6bdcf.1618405715.git.christophe.leroy@csgroup.eu
2021-04-21 22:52:33 +10:00
Christophe Leroy 6449078d50 powerpc: Make probe_kernel_read_inst() common to PPC32 and PPC64
We have two independant versions of probe_kernel_read_inst(), one for
PPC32 and one for PPC64.

The PPC32 is identical to the first part of the PPC64 version.
The remaining part of PPC64 version is not relevant for PPC32, but
not contradictory, so we can easily have a common function with
the PPC64 part opted out via a IS_ENABLED(CONFIG_PPC64).

The only need is to add a version of ppc_inst_prefix() for PPC32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f7b9dfddef3b3760182c7e5466356c121a293dc9.1618405715.git.christophe.leroy@csgroup.eu
2021-04-21 22:52:33 +10:00
Christophe Leroy 6ac7897f08 powerpc: Remove probe_user_read_inst()
Its name comes from former probe_user_read() function.
That function is now called copy_from_user_nofault().

probe_user_read_inst() uses copy_from_user_nofault() to read only
a few bytes. It is suboptimal.

It does the same as get_user_inst() but in addition disables
page faults.

But on the other hand, it is not used for the time being. So remove it
for now. If one day it is really needed, we can give it a new name
more in line with today's naming, and implement it using get_user_inst()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5f6f82572242a59bfee1e19a71194d8f7ef5fca4.1618405715.git.christophe.leroy@csgroup.eu
2021-04-21 22:52:33 +10:00
Christophe Leroy ee7c3ec3b4 powerpc/ebpf32: Use standard function call for functions within 32M distance
If the target of a function call is within 32 Mbytes distance, use a
standard function call with 'bl' instead of the 'lis/ori/mtlr/blrl'
sequence.

In the first pass, no memory has been allocated yet and the code
position is not known yet (image pointer is NULL). This pass is there
to calculate the amount of memory to allocate for the EBPF code, so
assume the 4 instructions sequence is required, so that enough memory
is allocated.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/74944a1e3e5cfecc141e440a6ccd37920e186b70.1618227846.git.christophe.leroy@csgroup.eu
2021-04-21 22:52:33 +10:00
Christophe Leroy 867e762480 powerpc/32: Use r2 in wrtspr() instead of r0
wrtspr() is a function to write an arbitrary value in a special
register. It is used on 8xx to write to SPRN_NRI, SPRN_EID and
SPRN_EIE. Writing any value to one of those will play with MSR EE
and MSR RI regardless of that value.

r0 is used many places in the generated code and using r0 for
that creates an unnecessary dependency of this instruction with
preceding ones using r0 in a few places in vmlinux.

r2 is most likely the most stable register as it contains the
pointer to 'current'.

Using r2 instead of r0 avoids that unnecessary dependency.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/69f9968f4b592fefda55227f0f7430ea612cc950.1611299687.git.christophe.leroy@csgroup.eu
2021-04-21 22:52:32 +10:00
Christophe Leroy 693557ebf4 powerpc/inst: ppc_inst_as_u64() becomes ppc_inst_as_ulong()
In order to simplify use on PPC32, change ppc_inst_as_u64()
into ppc_inst_as_ulong() that returns the 32 bits instruction
on PPC32.

Will be used when porting OPTPROBES to PPC32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/22cadf29620664b600b82026d2a72b8b23351777.1618927318.git.christophe.leroy@csgroup.eu
2021-04-21 22:52:32 +10:00
Christophe Leroy e522331173 powerpc/irq: Enhance readability of trap types
This patch makes use of trap types in irq.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f7f8c9f98c33eaea316755c7fef150d1d77e047d.1618847273.git.christophe.leroy@csgroup.eu
2021-04-21 22:52:32 +10:00
Christophe Leroy 7fab639729 powerpc/32s: Enhance readability of trap types
This patch makes use of trap types in head_book3s_32.S

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bd80ace67757f489fc4ecdb76dd1a71511daba94.1618847273.git.christophe.leroy@csgroup.eu
2021-04-21 22:52:31 +10:00
Christophe Leroy 0f5eb28a6c powerpc/8xx: Enhance readability of trap types
This patch makes use of trap types in head_8xx.S

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e1147287bf6f2fb0693048fe8db0298c7870e419.1618847273.git.christophe.leroy@csgroup.eu
2021-04-21 22:52:31 +10:00
Tyrel Datwyler 39d0099f94 powerpc/pseries: Add shutdown() to vio_driver and vio_bus
Currently, neither the vio_bus or vio_driver structures provide support
for a shutdown() routine.

Add support for shutdown() by allowing drivers to provide a
implementation via function pointer in their vio_driver struct and
provide a proper implementation in the driver template for the vio_bus
that calls a vio drivers shutdown() if defined.

In the case that no shutdown() is defined by a vio driver and a kexec is
in progress we implement a big hammer that calls remove() to ensure no
further DMA for the devices is possible.

Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210402001325.939668-1-tyreld@linux.ibm.com
2021-04-20 14:22:24 +10:00
Athira Rajeev af31fd0c91 powerpc/perf: Expose processor pipeline stage cycles using PERF_SAMPLE_WEIGHT_STRUCT
Performance Monitoring Unit (PMU) registers in powerpc provides
information on cycles elapsed between different stages in the
pipeline. This can be used for application tuning. On ISA v3.1
platform, this information is exposed by sampling registers.
Patch adds kernel support to capture two of the cycle counters
as part of perf sample using the sample type:
PERF_SAMPLE_WEIGHT_STRUCT.

The power PMU function 'get_mem_weight' currently uses 64 bit weight
field of perf_sample_data to capture memory latency. But following the
introduction of PERF_SAMPLE_WEIGHT_TYPE, weight field could contain
64-bit or 32-bit value depending on the architexture support for
PERF_SAMPLE_WEIGHT_STRUCT. Patches uses WEIGHT_STRUCT to expose the
pipeline stage cycles info. Hence update the ppmu functions to work for
64-bit and 32-bit weight values.

If the sample type is PERF_SAMPLE_WEIGHT, use the 64-bit weight field.
if the sample type is PERF_SAMPLE_WEIGHT_STRUCT, memory subsystem
latency is stored in the low 32bits of perf_sample_weight structure.
Also for CPU_FTR_ARCH_31, capture the two cycle counter information in
two 16 bit fields of perf_sample_weight structure.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1616425047-1666-2-git-send-email-atrajeev@linux.vnet.ibm.com
2021-04-20 14:22:23 +10:00
Michael Ellerman a38cb41719 Merge branch 'topic/ppc-kvm' into next
Merge some powerpc KVM patches we are keeping in a topic branch just in
case anyone else needs to merge them.
2021-04-18 23:55:12 +10:00
Nicholas Piggin 49c1d07fd0 powerpc/powernv: Enable HAIL (HV AIL) for ISA v3.1 processors
Starting with ISA v3.1, LPCR[AIL] no longer controls the interrupt
mode for HV=1 interrupts. Instead, a new LPCR[HAIL] bit is defined
which behaves like AIL=3 for HV interrupts when set.

Set HAIL on bare metal to give us mmu-on interrupts and improve
performance.

This also fixes an scv bug: we don't implement scv real mode (AIL=0)
vectors because they are at an inconvenient location, so we just
disable scv support when AIL can not be set. However powernv assumes
that LPCR[AIL] will enable AIL mode so it enables scv support despite
HV interrupts being AIL=0, which causes scv interrupts to go off into
the weeds.

Fixes: 7fa95f9ada ("powerpc/64s: system call support for scv/rfscv instructions")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210402024124.545826-1-npiggin@gmail.com
2021-04-18 23:19:29 +10:00
Sean Christopherson b4c5936c47 KVM: Kill off the old hva-based MMU notifier callbacks
Yank out the hva-based MMU notifier APIs now that all architectures that
use the notifiers have moved to the gfn-based APIs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210402005658.3024832-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:08 -04:00
Sean Christopherson b1c5356e87 KVM: PPC: Convert to the gfn-based MMU notifier callbacks
Move PPC to the gfn-base MMU notifier APIs, and update all 15 bajillion
PPC-internal hooks to work with gfns instead of hvas.

No meaningful functional change intended, though the exact order of
operations is slightly different since the memslot lookups occur before
calling into arch code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210402005658.3024832-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:07 -04:00
Sean Christopherson 5f7c292b89 KVM: Move prototypes for MMU notifier callbacks to generic code
Move the prototypes for the MMU notifier callbacks out of arch code and
into common code.  There is no benefit to having each arch replicate the
prototypes since any deviation from the invocation in common code will
explode.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210326021957.1424875-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:30:55 -04:00
Xiongwei Song 7153d4bf0b powerpc/traps: Enhance readability for trap types
Define macros to list ppc interrupt types in interttupt.h, replace the
reference of the trap hex values with these macros.

Referred the hex numbers in arch/powerpc/kernel/exceptions-64e.S,
arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S,
arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h.

Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
[mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1618398033-13025-1-git-send-email-sxwjean@me.com
2021-04-17 22:20:19 +10:00
Tony Ambardar 7de21e679e powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h
A few archs like powerpc have different errno.h values for macros
EDEADLOCK and EDEADLK. In code including both libc and linux versions of
errno.h, this can result in multiple definitions of EDEADLOCK in the
include chain. Definitions to the same value (e.g. seen with mips) do
not raise warnings, but on powerpc there are redefinitions changing the
value, which raise warnings and errors (if using "-Werror").

Guard against these redefinitions to avoid build errors like the following,
first seen cross-compiling libbpf v5.8.9 for powerpc using GCC 8.4.0 with
musl 1.1.24:

  In file included from ../../arch/powerpc/include/uapi/asm/errno.h:5,
                   from ../../include/linux/err.h:8,
                   from libbpf.c:29:
  ../../include/uapi/asm-generic/errno.h:40: error: "EDEADLOCK" redefined [-Werror]
   #define EDEADLOCK EDEADLK

  In file included from toolchain-powerpc_8540_gcc-8.4.0_musl/include/errno.h:10,
                   from libbpf.c:26:
  toolchain-powerpc_8540_gcc-8.4.0_musl/include/bits/errno.h:58: note: this is the location of the previous definition
   #define EDEADLOCK       58

  cc1: all warnings being treated as errors

Cc: Stable <stable@vger.kernel.org>
Reported-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200917135437.1238787-1-Tony.Ambardar@gmail.com
2021-04-17 10:40:51 +10:00
Srikar Dronamraju c1e53367da powerpc/smp: Cache CPU to chip lookup
On systems with large CPUs per node, even with the filtered matching of
related CPUs, there can be large number of calls to cpu_to_chip_id for
the same CPU. For example with 4096 vCPU, 1 node QEMU configuration,
with 4 threads per core, system could be see upto 1024 calls to
cpu_to_chip_id() for the same CPU. On a given system, cpu_to_chip_id()
for a given CPU would always return the same. Hence cache the result in
a lookup table for use in subsequent calls.

Since all CPUs sharing the same core will belong to the same chip, the
lookup_table has an entry for one CPU per core.  chip_id_lookup_table is
not being freed and would be used on subsequent CPU online post CPU
offline.

Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210415120934.232271-4-srikar@linux.vnet.ibm.com
2021-04-17 10:40:51 +10:00
Srikar Dronamraju 131c82b6a1 Revert "powerpc/topology: Update topology_core_cpumask"
Now that cpu_core_mask has been reintroduced, lets revert
commit 4bce545903 ("powerpc/topology: Update topology_core_cpumask")

Post this commit, lscpu should reflect topologies as requested by a user
when a QEMU instance is launched with NUMA spanning multiple sockets.

Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210415120934.232271-3-srikar@linux.vnet.ibm.com
2021-04-17 10:40:51 +10:00
Srikar Dronamraju c47f892d7a powerpc/smp: Reintroduce cpu_core_mask
Daniel reported that with Commit 4ca234a9cb ("powerpc/smp: Stop
updating cpu_core_mask") QEMU was unable to set single NUMA node SMP
topologies such as:
 -smp 8,maxcpus=8,cores=2,threads=2,sockets=2
 i.e he expected 2 sockets in one NUMA node.

The above commit helped to reduce boot time on Large Systems for
example 4096 vCPU single socket QEMU instance. PAPR is silent on
having more than one socket within a NUMA node.

cpu_core_mask and cpu_cpu_mask for any CPU would be same unless the
number of sockets is different from the number of NUMA nodes.

One option is to reintroduce cpu_core_mask but use a slightly
different method to arrive at the cpu_core_mask. Previously each CPU's
chip-id would be compared with all other CPU's chip-id to verify if
both the CPUs were related at the chip level. Now if a CPU 'A' is
found related / (unrelated) to another CPU 'B', all the thread
siblings of 'A' and thread siblings of 'B' are automatically marked as
related / (unrelated).

Also if a platform doesn't support ibm,chip-id property, i.e its
cpu_to_chip_id returns -1, cpu_core_map holds a copy of
cpu_cpu_mask().

Fixes: 4ca234a9cb ("powerpc/smp: Stop updating cpu_core_mask")
Reported-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210415120934.232271-2-srikar@linux.vnet.ibm.com
2021-04-17 10:40:51 +10:00
Joerg Roedel 49d11527e5 Merge branches 'iommu/fixes', 'arm/mediatek', 'arm/smmu', 'arm/exynos', 'unisoc', 'x86/vt-d', 'x86/amd' and 'core' into next 2021-04-16 17:16:03 +02:00
Christophe Leroy 74205b3fc2 powerpc/vdso: Add support for time namespaces
This patch adds the necessary glue to provide time namespaces.

Things are mainly copied from ARM64.

__arch_get_timens_vdso_data() calculates timens vdso data position
based on the vdso data position, knowing it is the next page in vvar.
This avoids having to redo the mflr/bcl/mflr/mtlr dance to locate
the page relative to running code position.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # vDSO parts
Acked-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1a15495f80ec19a87b16cf874dbf7c3fa5ec40fe.1617209142.git.christophe.leroy@csgroup.eu
2021-04-14 23:04:44 +10:00
Dmitry Safonov 1c4bce6753 powerpc/vdso: Separate vvar vma from vdso
Since commit 511157ab64 ("powerpc/vdso: Move vdso datapage up front")
VVAR page is in front of the VDSO area. In result it breaks CRIU
(Checkpoint Restore In Userspace) [1], where CRIU expects that "[vdso]"
from /proc/../maps points at ELF/vdso image, rather than at VVAR data page.
Laurent made a patch to keep CRIU working (by reading aux vector).
But I think it still makes sence to separate two mappings into different
VMAs. It will also make ppc64 less "special" for userspace and as
a side-bonus will make VVAR page un-writable by debugger (which previously
would COW page and can be unexpected).

I opportunistically Cc stable on it: I understand that usually such
stuff isn't a stable material, but that will allow us in CRIU have
one workaround less that is needed just for one release (v5.11) on
one platform (ppc64), which we otherwise have to maintain.
I wouldn't go as far as to say that the commit 511157ab64 is ABI
regression as no other userspace got broken, but I'd really appreciate
if it gets backported to v5.11 after v5.12 is released, so as not
to complicate already non-simple CRIU-vdso code. Thanks!

[1]: https://github.com/checkpoint-restore/criu/issues/1417

Cc: stable@vger.kernel.org # v5.11
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # vDSO parts.
Acked-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f401eb1ebc0bfc4d8f0e10dc8e525fd409eb68e2.1617209142.git.christophe.leroy@csgroup.eu
2021-04-14 23:04:44 +10:00
Nicholas Piggin 8f6cc75a97 powerpc: move norestart trap flag to bit 0
Compact the trap flags down to use the low 4 bits of regs.trap.

A few 64e interrupt trap numbers set bit 4. Although they tended to be
trivial so it wasn't a real problem[1], it is not the right thing to do,
and confusing.

[*] E.g., 0x310 hypercall goes to unknown_exception, which prints
    regs->trap directly so 0x310 will appear fine, and only the syscall
    interrupt will test norestart, so it won't be confused by 0x310.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-12-npiggin@gmail.com
2021-04-14 23:04:44 +10:00
Nicholas Piggin 8dc7f0229b powerpc: remove partial register save logic
All subarchitectures always save all GPRs to pt_regs interrupt frames
now. Remove FULL_REGS and associated bits.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-11-npiggin@gmail.com
2021-04-14 23:04:44 +10:00
Nicholas Piggin c45ba4f44f powerpc: clean up do_page_fault
search_exception_tables + __bad_page_fault can be substituted with
bad_page_fault, do_page_fault no longer needs to return a value
to asm for any sub-architecture, and __bad_page_fault can be static.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-10-npiggin@gmail.com
2021-04-14 23:04:44 +10:00
Nicholas Piggin ceff77efa4 powerpc/64e/interrupt: Use new interrupt context tracking scheme
With the new interrupt exit code, context tracking can be managed
more precisely, so remove the last of the 64e workarounds and switch
to the new context tracking code already used by 64s.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-8-npiggin@gmail.com
2021-04-14 23:04:43 +10:00
Nicholas Piggin 097157e16c powerpc/64e/interrupt: reconcile irq soft-mask state in C
Use existing 64s interrupt entry wrapper code to reconcile irqs in C.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-7-npiggin@gmail.com
2021-04-14 23:04:43 +10:00
Nicholas Piggin 3db8aa10de powerpc/64e/interrupt: NMI save irq soft-mask state in C
64e non-maskable interrupts save the state of the irq soft-mask in
asm. This can be done in C in interrupt wrappers as 64s does.

I haven't been able to test this with qemu because it doesn't seem
to cause FSL bookE WDT interrupts.

This makes WatchdogException an NMI interrupt, which affects 32-bit
as well (okay, or create a new handler?)

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-6-npiggin@gmail.com
2021-04-14 23:04:20 +10:00
Nicholas Piggin 0c2472de23 powerpc/64e/interrupt: use new interrupt return
Update the new C and asm interrupt return code to account for 64e
specifics, switch over to use it.

The now-unused old ret_from_except code, that was moved to 64e after the
64s conversion, is removed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-5-npiggin@gmail.com
2021-04-14 23:04:20 +10:00
Nicholas Piggin 4228b2c3d2 powerpc/64e/interrupt: always save nvgprs on interrupt
In order to use the C interrupt return, nvgprs must always be saved.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210316104206.407354-3-npiggin@gmail.com
2021-04-14 23:04:19 +10:00
Madhavan Srinivasan 2e2a441d2c powerpc/perf: Infrastructure to support checking of attr.config*
Introduce code to support the checking of attr.config* for
values which are reserved for a given platform.
Performance Monitoring Unit (PMU) configuration registers
have fields that are reserved and some specific values for
bit fields are reserved. For ex., MMCRA[61:62] is
Random Sampling Mode (SM) and value of 0b11 for this field
is reserved.

Writing non-zero or invalid values in these fields will
have unknown behaviours.

Patch adds a generic call-back function "check_attr_config"
in "struct power_pmu", to be called in event_init to
check for attr.config* values for a given platform.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408074504.248211-1-maddy@linux.ibm.com
2021-04-14 23:04:19 +10:00
Christophe Leroy 6c96020882 powerpc/mem: Inline flush_dcache_page()
flush_dcache_page() is only a few lines, it is worth
inlining.

ia64, csky, mips, openrisc and riscv have a similar
flush_dcache_page() and inline it.

On pmac32_defconfig, we get a small size reduction.
On ppc64_defconfig, we get a very small size increase.

In both case that's in the noise (less than 0.1%).

text		data	bss	dec		hex	filename
18991155	5934744	1497624	26423523	19330e3	vmlinux64.before
18994829	5936732	1497624	26429185	1934701	vmlinux64.after
9150963		2467502	 184548	11803013	 b41985	vmlinux32.before
9149689		2467302	 184548	11801539	 b413c3	vmlinux32.after

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/21c417488b70b7629dae316539fb7bb8bdef4fdd.1617895813.git.christophe.leroy@csgroup.eu
2021-04-14 23:04:19 +10:00
Christophe Leroy bf26e0bbd2 powerpc/mem: Declare __flush_dcache_icache() static
__flush_dcache_icache() is only used in mem.c.

Move it before the functions that use it and declare it static.

And also fix the name of the parameter in the comment.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3fa903eb5a10b2bc7d99a8c559ffdaa05452d8e0.1617895813.git.christophe.leroy@csgroup.eu
2021-04-14 23:04:18 +10:00
Masahiro Yamada 14b3c9d24a powerpc/syscalls: switch to generic syscalltbl.sh
Many architectures duplicate similar shell scripts.

This commit converts powerpc to use scripts/syscalltbl.sh. This also
unifies syscall_table_32.h and syscall_table_c32.h.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210301153019.362742-1-masahiroy@kernel.org
2021-04-14 23:04:16 +10:00
Nathan Lynch e5d5676352 powerpc/rtas: rename RTAS_RMOBUF_MAX to RTAS_USER_REGION_SIZE
RTAS_RMOBUF_MAX doesn't actually describe a "maximum" value in any
sense. It represents the size of an area of memory set aside for user
space to use as work areas for certain RTAS calls.

Rename it to RTAS_USER_REGION_SIZE.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210408140630.205502-6-nathanl@linux.ibm.com
2021-04-14 23:04:16 +10:00
Cédric Le Goater 6bf66eb8f4 powerpc/xive: Simplify the dump of XIVE interrupts under xmon
Move the xmon routine under XIVE subsystem and rework the loop on the
interrupts taking into account the xive_irq_domain to filter out IPIs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210331144514.892250-7-clg@kaod.org
2021-04-14 23:04:14 +10:00
Christophe Leroy 80edc68e04 powerpc/32s: Define a MODULE area below kernel text all the time
On book3s/32, the segment below kernel text is used for module
allocation when CONFIG_STRICT_KERNEL_RWX is defined.

In order to benefit from the powerpc specific module_alloc()
function which allocate modules with 32 Mbytes from
end of kernel text, use that segment below PAGE_OFFSET at all time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a46dcdd39a9e80b012d86c294c4e5cd8d31665f3.1617283827.git.christophe.leroy@csgroup.eu
2021-04-14 23:04:13 +10:00
Christophe Leroy 9132a2e82a powerpc/8xx: Define a MODULE area below kernel text
On the 8xx, TASK_SIZE is 0x80000000. The space between TASK_SIZE
and PAGE_OFFSET is not used.

In order to benefit from the powerpc specific module_alloc()
function which allocate modules with 32 Mbytes from
end of kernel text, define MODULES_VADDR and MODULES_END.

Set a 256Mb area just below PAGE_OFFSET, like book3s/32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a225606d5b3a8bc53fe612ad52c855c60b0a0a58.1617283827.git.christophe.leroy@csgroup.eu
2021-04-14 23:04:13 +10:00
Shivaprasad G Bhat 75b7c05ebf powerpc/papr_scm: Implement support for H_SCM_FLUSH hcall
Add support for ND_REGION_ASYNC capability if the device tree
indicates 'ibm,hcall-flush-required' property in the NVDIMM node.
Flush is done by issuing H_SCM_FLUSH hcall to the hypervisor.

If the flush request failed, the hypervisor is expected to
to reflect the problem in the subsequent nvdimm H_SCM_HEALTH call.

This patch prevents mmap of namespaces with MAP_SYNC flag if the
nvdimm requires an explicit flush[1].

References:
[1] https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Use unsigned long / long instead of uint64_t/int64_t]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/161703936121.36.7260632399582101498.stgit@e1fbed493c87
2021-04-14 23:04:07 +10:00
Nicholas Piggin 6c12c4376b KVM: PPC: Book3S HV: remove unused kvmppc_h_protect argument
The va argument is not used in the function or set by its asm caller,
so remove it to be safe.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-8-npiggin@gmail.com
2021-04-12 13:36:23 +10:00
Nicholas Piggin 67145ef496 KVM: PPC: Book3S HV: Add a function to filter guest LPCR bits
Guest LPCR depends on hardware type, and future changes will add
restrictions based on errata and guest MMU mode. Move this logic
to a common function and use it for the cases where the guest
wants to update its LPCR (or the LPCR of a nested guest).

This also adds a warning in other places that set or update LPCR
if we try to set something that would have been disallowed by
the filter, as a sanity check.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210412014845.1517916-4-npiggin@gmail.com
2021-04-12 13:36:23 +10:00
Christophe Leroy c46bbf5d2d powerpc/32: Remove powerpc specific definition of 'ptrdiff_t'
For unknown reason, old commit d27dfd388715 ("Import pre2.0.8")
changed 'ptrdiff_t' from 'int' to 'long'.

GCC expects it as 'int' really, and this leads to the following
warning when building KFENCE:

  CC      mm/kfence/report.o
In file included from ./include/linux/printk.h:7,
                 from ./include/linux/kernel.h:16,
                 from mm/kfence/report.c:10:
mm/kfence/report.c: In function 'kfence_report_error':
./include/linux/kern_levels.h:5:18: warning: format '%td' expects argument of type 'ptrdiff_t', but argument 6 has type 'long int' [-Wformat=]
    5 | #define KERN_SOH "\001"  /* ASCII Start Of Header */
      |                  ^~~~~~
./include/linux/kern_levels.h:11:18: note: in expansion of macro 'KERN_SOH'
   11 | #define KERN_ERR KERN_SOH "3" /* error conditions */
      |                  ^~~~~~~~
./include/linux/printk.h:343:9: note: in expansion of macro 'KERN_ERR'
  343 |  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
      |         ^~~~~~~~
mm/kfence/report.c:213:3: note: in expansion of macro 'pr_err'
  213 |   pr_err("Out-of-bounds %s at 0x%p (%luB %s of kfence-#%td):\n",
      |   ^~~~~~

<asm-generic/uapi/posix-types.h> defines it as 'int', and
defines 'size_t' and 'ssize_t' exactly as powerpc do, so
remove the powerpc specific definitions and fallback on
generic ones.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e43d133bf52fa19e577f64f3a3a38cedc570377d.1617616601.git.christophe.leroy@csgroup.eu
2021-04-08 21:17:46 +10:00
Nicholas Piggin 98db179a78 powerpc/64s: power4 nap fixup in C
There is no need for this to be in asm, use the new intrrupt entry wrapper.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210406025508.821718-1-npiggin@gmail.com
2021-04-08 21:17:45 +10:00
Michael Ellerman 56bec2f9d4 powerpc/mm/64s: Add _PAGE_KERNEL_ROX
In the past we had a fallback definition for _PAGE_KERNEL_ROX, but we
removed that in commit d82fd29c5a ("powerpc/mm: Distribute platform
specific PAGE and PMD flags and definitions") and added definitions
for each MMU family.

However we missed adding a definition for 64s, which was not really a
bug because it's currently not used.

But we'd like to use PAGE_KERNEL_ROX in a future patch so add a
definition now.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210331003845.216246-1-mpe@ellerman.id.au
2021-04-08 21:17:42 +10:00
Jordan Niethe b8b2f37cf6 powerpc/64s: Fix pte update for kernel memory on radix
When adding a PTE a ptesync is needed to order the update of the PTE
with subsequent accesses otherwise a spurious fault may be raised.

radix__set_pte_at() does not do this for performance gains. For
non-kernel memory this is not an issue as any faults of this kind are
corrected by the page fault handler. For kernel memory these faults
are not handled. The current solution is that there is a ptesync in
flush_cache_vmap() which should be called when mapping from the
vmalloc region.

However, map_kernel_page() does not call flush_cache_vmap(). This is
troublesome in particular for code patching with Strict RWX on radix.
In do_patch_instruction() the page frame that contains the instruction
to be patched is mapped and then immediately patched. With no ordering
or synchronization between setting up the PTE and writing to the page
it is possible for faults.

As the code patching is done using __put_user_asm_goto() the resulting
fault is obscured - but using a normal store instead it can be seen:

  BUG: Unable to handle kernel data access on write at 0xc008000008f24a3c
  Faulting instruction address: 0xc00000000008bd74
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in: nop_module(PO+) [last unloaded: nop_module]
  CPU: 4 PID: 757 Comm: sh Tainted: P           O      5.10.0-rc5-01361-ge3c1b78c8440-dirty #43
  NIP:  c00000000008bd74 LR: c00000000008bd50 CTR: c000000000025810
  REGS: c000000016f634a0 TRAP: 0300   Tainted: P           O       (5.10.0-rc5-01361-ge3c1b78c8440-dirty)
  MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 44002884  XER: 00000000
  CFAR: c00000000007c68c DAR: c008000008f24a3c DSISR: 42000000 IRQMASK: 1

This results in the kind of issue reported here:
  https://lore.kernel.org/linuxppc-dev/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@lca.pw/

Chris Riedl suggested a reliable way to reproduce the issue:
  $ mount -t debugfs none /sys/kernel/debug
  $ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done) &

Turning ftrace on and off does a large amount of code patching which
in usually less then 5min will crash giving a trace like:

   ftrace-powerpc: (____ptrval____): replaced (4b473b11) != old (60000000)
   ------------[ ftrace bug ]------------
   ftrace failed to modify
   [<c000000000bf8e5c>] napi_busy_loop+0xc/0x390
    actual:   11:3b:47:4b
   Setting ftrace call site to call ftrace function
   ftrace record flags: 80000001
    (1)
    expected tramp: c00000000006c96c
   ------------[ cut here ]------------
   WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8
   Modules linked in: nop_module(PO-) [last unloaded: nop_module]
   CPU: 4 PID: 809 Comm: sh Tainted: P           O      5.10.0-rc5-01360-gf878ccaf250a #1
   NIP:  c00000000024f334 LR: c00000000024f330 CTR: c0000000001a5af0
   REGS: c000000004c8b760 TRAP: 0700   Tainted: P           O       (5.10.0-rc5-01360-gf878ccaf250a)
   MSR:  900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28008848  XER: 20040000
   CFAR: c0000000001a9c98 IRQMASK: 0
   GPR00: c00000000024f330 c000000004c8b9f0 c000000002770600 0000000000000022
   GPR04: 00000000ffff7fff c000000004c8b6d0 0000000000000027 c0000007fe9bcdd8
   GPR08: 0000000000000023 ffffffffffffffd8 0000000000000027 c000000002613118
   GPR12: 0000000000008000 c0000007fffdca00 0000000000000000 0000000000000000
   GPR16: 0000000023ec37c5 0000000000000000 0000000000000000 0000000000000008
   GPR20: c000000004c8bc90 c0000000027a2d20 c000000004c8bcd0 c000000002612fe8
   GPR24: 0000000000000038 0000000000000030 0000000000000028 0000000000000020
   GPR28: c000000000ff1b68 c000000000bf8e5c c00000000312f700 c000000000fbb9b0
   NIP ftrace_bug+0x28c/0x2e8
   LR  ftrace_bug+0x288/0x2e8
   Call Trace:
     ftrace_bug+0x288/0x2e8 (unreliable)
     ftrace_modify_all_code+0x168/0x210
     arch_ftrace_update_code+0x18/0x30
     ftrace_run_update_code+0x44/0xc0
     ftrace_startup+0xf8/0x1c0
     register_ftrace_function+0x4c/0xc0
     function_trace_init+0x80/0xb0
     tracing_set_tracer+0x2a4/0x4f0
     tracing_set_trace_write+0xd4/0x130
     vfs_write+0xf0/0x330
     ksys_write+0x84/0x140
     system_call_exception+0x14c/0x230
     system_call_common+0xf0/0x27c

To fix this when updating kernel memory PTEs using ptesync.

Fixes: f1cb8f9beb ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags")
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Tidy up change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210208032957.1232102-1-jniethe5@gmail.com
2021-04-08 21:17:42 +10:00
Bhaskar Chowdhury 4763d37827 powerpc: Spelling/typo fixes
Various spelling/typo fixes.

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-04-08 21:17:42 +10:00
Christoph Hellwig 4eeb96f6ef iommu/fsl_pamu: replace DOMAIN_ATTR_FSL_PAMU_STASH with a direct call
Add a fsl_pamu_configure_l1_stash API that qman_portal can call directly
instead of indirecting through the iommu attr API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Li Yang <leoyang.li@nxp.com>
Link: https://lore.kernel.org/r/20210401155256.298656-8-hch@lst.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07 10:56:52 +02:00
Christophe Leroy b0b3b2c78e powerpc: Switch to relative jump labels
Convert powerpc to relative jump labels.

Before the patch, pseries_defconfig vmlinux.o has:
9074 __jump_table  0003f2a0  0000000000000000  0000000000000000  01321fa8  2**0

With the patch, the same config gets:
9074 __jump_table  0002a0e0  0000000000000000  0000000000000000  01321fb4  2**0

Size is 258720 without the patch, 172256 with the patch.
That's a 33% size reduction.

Largely copied from commit c296146c05 ("arm64/kernel: jump_label:
Switch to relative references")

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/828348da7868eda953ce023994404dfc49603b64.1616514473.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:21 +11:00
Christophe Leroy 355a8d26cd powerpc/asm: Add some opcodes in asm/ppc-opcode.h for PPC32 eBPF
The following opcodes will be needed for the implementation
of eBPF for PPC32. Add them in asm/ppc-opcode.h

PPC_RAW_ADDE
PPC_RAW_ADDZE
PPC_RAW_ADDME
PPC_RAW_MFLR
PPC_RAW_ADDIC
PPC_RAW_ADDIC_DOT
PPC_RAW_SUBFC
PPC_RAW_SUBFE
PPC_RAW_SUBFIC
PPC_RAW_SUBFZE
PPC_RAW_ANDIS
PPC_RAW_NOR

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f7bd573a368edd78006f8a5af508c726e7ce1ed2.1616430991.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:21 +11:00
Christophe Leroy 627b72bee8 powerpc/signal32: Convert restore_[tm]_user_regs() to user access block
Convert restore_user_regs() and restore_tm_user_regs()
to use user_access_read_begin/end blocks.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/181adf15a6f644efcd1aeafb355f3578ff1b6bc5.1616151715.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:19 +11:00
Christophe Leroy c1cc1570bc powerpc/uaccess: Also perform 64 bits copies in unsafe_copy_from_user() on ppc32
Similarly to commit 5cf773fc8f37 ("powerpc/uaccess: Also perform
64 bits copies in unsafe_copy_to_user() on ppc32")

ppc32 has an efficiant 64 bits unsafe_get_user(), so also use it in
order to unroll loops more.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/308e65d9237a14e8c0e3b22919fcf0b5e5592608.1616151715.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:18 +11:00
Christophe Leroy 5cd29b1fd3 powerpc/uaccess: Use asm goto for get_user when compiler supports it
clang 11 and future GCC are supporting asm goto with outputs.

Use it to implement get_user in order to get better generated code.

Note that clang requires to set x in the default branch of
__get_user_size_goto() otherwise is compliant about x not being
initialised :puzzled:

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/403745b5aaa1b315bb4e8e46c1ba949e77eecec0.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:16 +11:00
Christophe Leroy 035785ab28 powerpc/uaccess: Introduce __get_user_size_goto()
We have got two places doing a goto based on the result
of __get_user_size_allowed().

Refactor that into __get_user_size_goto().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/def8a39289e02653cfb1583b3b19837de9efed3a.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:13 +11:00
Christophe Leroy e72fcdb26c powerpc/uaccess: Refactor get/put_user() and __get/put_user()
Make get_user() do the access_ok() check then call __get_user().
Make put_user() do the access_ok() check then call __put_user().

Then embed  __get_user_size() and __put_user_size() in
__get_user() and __put_user().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eebc554f6a81f570c46ea3551000ff5b886e4faa.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:10 +11:00
Christophe Leroy 17f8c0bc21 powerpc/uaccess: Rename __get/put_user_check/nocheck
__get_user_check() becomes get_user()
__put_user_check() becomes put_user()
__get_user_nocheck() becomes __get_user()
__put_user_nocheck() becomes __put_user()

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/41d7e45f4733f0e61e63824e4865b4e049db74d6.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:08 +11:00
Christophe Leroy f904c22f2a powerpc/uaccess: Split out __get_user_nocheck()
One part of __get_user_nocheck() is used for __get_user(),
the other part for unsafe_get_user().

Move the part dedicated to unsafe_get_user() in it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/618fe2e0626b308a5a063d5baac827b968e85c32.1615398265.git.christophe.leroy@csgroup.eu
2021-04-03 21:22:05 +11:00