Commit Graph

21639 Commits

Author SHA1 Message Date
Andrey Abramov bac7ca7b98 powerpc: module_[32|64].c: replace swap function with built-in one
Replace relaswap with built-in one, because relaswap
does a simple byte to byte swap.

Since Spectre mitigations have made indirect function calls more
expensive, and the default simple byte copies swap is implemented
without them, an "optimized" custom swap function is now
a waste of time as well as code.

Signed-off-by: Andrey Abramov <st5pub@yandex.ru>
Reviewed-by: George Spelvin <lkml@sdf.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/994931554238042@iva8-b333b7f98ab0.qloud-c.yandex.net
2020-05-11 23:15:14 +10:00
Christophe JAILLET 2f62870ca5 powerpc/powernv: Fix a warning message
Fix a cut'n'paste error in a warning message. This should be
'cpu-idle-state-residency-ns' to match the property searched in the
previous 'of_property_read_u32_array()'

Fixes: 9c7b185ab2 ("powernv/cpuidle: Parse dt idle properties into global structure")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200502115949.139000-1-christophe.jaillet@wanadoo.fr
2020-05-11 23:15:14 +10:00
Christophe Leroy e963b7a28b powerpc/vdso32: Fallback on getres syscall when clock is unknown
There are other clocks than the standard ones, for instance
per process clocks. Therefore, being above the last standard clock
doesn't mean it is a bad clock. So, fallback to syscall instead
of returning -EINVAL inconditionaly.

Fixes: e33ffc956b ("powerpc/vdso32: implement clock_getres entirely")
Cc: stable@vger.kernel.org # v5.6+
Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Link: https://lore.kernel.org/r/7316a9e2c0c2517923eb4b0411c4a08d15e675a4.1589017281.git.christophe.leroy@csgroup.eu
2020-05-11 19:24:29 +10:00
Michael Ellerman e2a8b49e79 powerpc/uaccess: Don't use "m<>" constraint
The "m<>" constraint breaks compilation with GCC 4.6.x era compilers.

The use of the constraint allows the compiler to use update-form
instructions, however in practice current compilers never generate
those forms for any of the current uses of __put_user_asm_goto().

We anticipate that GCC 4.6 will be declared unsupported for building
the kernel in the not too distant future. So for now just switch to
the "m" constraint.

Fixes: 334710b149 ("powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/r/20200507123324.2250024-1-mpe@ellerman.id.au
2020-05-08 13:30:42 +10:00
Cédric Le Goater b1f9be9392 powerpc/xive: Enforce load-after-store ordering when StoreEOI is active
When an interrupt has been handled, the OS notifies the interrupt
controller with a EOI sequence. On a POWER9 system using the XIVE
interrupt controller, this can be done with a load or a store
operation on the ESB interrupt management page of the interrupt. The
StoreEOI operation has less latency and improves interrupt handling
performance but it was deactivated during the POWER9 DD2.0 timeframe
because of ordering issues. We use the LoadEOI today but we plan to
reactivate StoreEOI in future architectures.

There is usually no need to enforce ordering between ESB load and
store operations as they should lead to the same result. E.g. a store
trigger and a load EOI can be executed in any order. Assuming the
interrupt state is PQ=10, a store trigger followed by a load EOI will
return a Q bit. In the reverse order, it will create a new interrupt
trigger from HW. In both cases, the handler processing interrupts is
notified.

In some cases, the XIVE_ESB_SET_PQ_10 load operation is used to
disable temporarily the interrupt source (mask/unmask). When the
source is reenabled, the OS can detect if interrupts were received
while the source was disabled and reinject them. This process needs
special care when StoreEOI is activated. The ESB load and store
operations should be correctly ordered because a XIVE_ESB_STORE_EOI
operation could leave the source enabled if it has not completed
before the loads.

For those cases, we enforce Load-after-Store ordering with a special
load operation offset. To avoid performance impact, this ordering is
only enforced when really needed, that is when interrupt sources are
temporarily disabled with the XIVE_ESB_SET_PQ_10 load. It should not
be needed for other loads.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200220081506.31209-1-clg@kaod.org
2020-05-07 22:58:31 +10:00
Christophe Leroy 4833ce06e6 powerpc/32s: Fix build failure with CONFIG_PPC_KUAP_DEBUG
gpr2 is not a parametre of kuap_check(), it doesn't exist.

Use gpr instead.

Fixes: a68c31fc01 ("powerpc/32s: Implement Kernel Userspace Access Protection")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/ea599546f2a7771bde551393889e44e6b2632332.1587368807.git.christophe.leroy@c-s.fr
2020-05-07 17:25:54 +10:00
Nayna Jain fa4f3f56cc powerpc/ima: Fix secure boot rules in ima arch policy
To prevent verifying the kernel module appended signature
twice (finit_module), once by the module_sig_check() and again by IMA,
powerpc secure boot rules define an IMA architecture specific policy
rule only if CONFIG_MODULE_SIG_FORCE is not enabled. This,
unfortunately, does not take into account the ability of enabling
"sig_enforce" on the boot command line (module.sig_enforce=1).

Including the IMA module appraise rule results in failing the
finit_module syscall, unless the module signing public key is loaded
onto the IMA keyring.

This patch fixes secure boot policy rules to be based on
CONFIG_MODULE_SIG instead.

Fixes: 4238fad366 ("powerpc/ima: Add support to initialize ima policy rules")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/r/1588342612-14532-1-git-send-email-nayna@linux.ibm.com
2020-05-07 17:25:54 +10:00
Nicholas Piggin c44dc6323c powerpc/64s/kuap: Restore AMR in fast_interrupt_return
Interrupts that use fast_interrupt_return actually do lock AMR, but
they have been ones which tend to come from userspace (or kernel bugs)
in radix mode. With kuap on hash, segment interrupts are taken in
kernel often, which quickly breaks due to the missing restore.

Fixes: 890274c2dc ("powerpc/64s: Implement KUAP for Radix MMU")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200429065654.1677541-6-npiggin@gmail.com
2020-05-07 11:00:41 +10:00
Michael Ellerman 1f12096aca Merge the lockless page table walk rework into next
This merges the lockless page table walk rework series from Aneesh.
Because it touches powerpc KVM code we are sharing it with the kvm-ppc
tree in our topic/ppc-kvm branch.

This is the cover letter from Aneesh:

Avoid IPI while updating page table entries.

Problem Summary:
Slow termination of KVM guest with large guest RAM config due to a
large number of IPIs that were caused by clearing level 1 PTE
entries (THP) entries. This is shown in the stack trace below.

- qemu-system-ppc  [kernel.vmlinux]            [k] smp_call_function_many
   - smp_call_function_many
      - 36.09% smp_call_function_many
           serialize_against_pte_lookup
           radix__pmdp_huge_get_and_clear
           zap_huge_pmd
           unmap_page_range
           unmap_vmas
           unmap_region
           __do_munmap
           __vm_munmap
           sys_munmap
          system_call
           __munmap
           qemu_ram_munmap
           qemu_anon_ram_free
           reclaim_ramblock
           call_rcu_thread
           qemu_thread_start
           start_thread
           __clone

Why we need to do IPI when clearing PMD entries:
This was added as part of commit: 13bd817bb8 ("powerpc/thp: Serialize pmd clear against a linux page table walk")

serialize_against_pte_lookup makes sure that all parallel lockless
page table walk completes before we convert a PMD pte entry to regular
pmd entry. We end up doing that conversion in the below scenarios

1) __split_huge_zero_page_pmd
2) do_huge_pmd_wp_page_fallback
3) MADV_DONTNEED running parallel to page faults.

local_irq_disable and lockless page table walk:

The lockless page table walk work with the assumption that we can
dereference the page table contents without holding a lock. For this
to work, we need to make sure we read the page table contents
atomically and page table pages are not going to be freed/released
while we are walking the table pages. We can achieve by using a rcu
based freeing for page table pages or if the architecture implements
broadcast tlbie, we can block the IPI as we walk the page table pages.

To support both the above framework, lockless page table walk is done
with irq disabled instead of rcu_read_lock()

We do have two interface for lockless page table walk, gup fast and
__find_linux_pte. This patch series makes __find_linux_pte table walk
safe against the conversion of PMD PTE to regular PMD.

gup fast:

gup fast is already safe against THP split because kernel now
differentiate between a pmd split and a compound page split. gup fast
can run parallel to a pmd split and we prevent a parallel gup fast to
a hugepage split, by freezing the page refcount and failing the
speculative page ref increment.

Similar to how gup is safe against parallel pmd split, this patch
series updates the __find_linux_pte callers to be safe against a
parallel pmd split. We do that by enforcing the following rules.

1) Don't reload the pte value, because that can be updated in
   parallel.
2) Code should be able to work with a stale PTE value and not the
   recent one. ie, the pte value that we are looking at may not be the
   latest value in the page table.
3) Before looking at pte value check for _PAGE_PTE bit. We now do this
as part of pte_present() check.

Performance:

This speeds up Qemu guest RAM del/unplug time as below
128 core, 496GB guest:

Without patch:
  munmap start: timer = 13162 ms, PID=7684
  munmap finish: timer = 95312 ms, PID=7684 - delta = 82150 ms

With patch (upto removing IPI)
  munmap start: timer = 196449 ms, PID=6681
  munmap finish: timer = 196488 ms, PID=6681 - delta = 39ms

With patch (with adding the tlb invalidate in pmdp_huge_get_and_clear_full)
  munmap start: timer = 196345 ms, PID=6879
  munmap finish: timer = 196714 ms, PID=6879 - delta = 369ms

Link: https://lore.kernel.org/r/20200505071729.54912-1-aneesh.kumar@linux.ibm.com
2020-05-06 15:53:24 +10:00
Aneesh Kumar K.V 75358ea359 powerpc/mm/book3s64: Fix MADV_DONTNEED and parallel page fault race
MADV_DONTNEED holds mmap_sem in read mode and that implies a
parallel page fault is possible and the kernel can end up with a level 1 PTE
entry (THP entry) converted to a level 0 PTE entry without flushing
the THP TLB entry.

Most architectures including POWER have issues with kernel instantiating a level
0 PTE entry while holding level 1 TLB entries.

The code sequence I am looking at is

down_read(mmap_sem)                         down_read(mmap_sem)

zap_pmd_range()
 zap_huge_pmd()
  pmd lock held
  pmd_cleared
  table details added to mmu_gather
  pmd_unlock()
                                         insert a level 0 PTE entry()

tlb_finish_mmu().

Fix this by forcing a tlb flush before releasing pmd lock if this is
not a fullmm invalidate. We can safely skip this invalidate for
task exit case (fullmm invalidate) because in that case we are sure
there can be no parallel fault handlers.

This do change the Qemu guest RAM del/unplug time as below

128 core, 496GB guest:

Without patch:
munmap start: timer = 196449 ms, PID=6681
munmap finish: timer = 196488 ms, PID=6681 - delta = 39ms

With patch:
munmap start: timer = 196345 ms, PID=6879
munmap finish: timer = 196714 ms, PID=6879 - delta = 369ms

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-23-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:16 +10:00
Aneesh Kumar K.V e21dfbf013 powerpc/mm/book3s64: Avoid sending IPI on clearing PMD
Now that all the lockless page table walk is careful w.r.t the PTE
address returned, we can now revert
commit: 13bd817bb8 ("powerpc/thp: Serialize pmd clear against a linux page table walk.")

We also drop the equivalent IPI from other pte updates routines. We still keep
IPI in hash pmdp collapse and that is to take care of parallel hash page table
insert. The radix pmdp collapse flush can possibly be removed once I am sure
generic code doesn't have the any expectations around parallel gup walk.

This speeds up Qemu guest RAM del/unplug time as below

128 core, 496GB guest:

Without patch:
munmap start: timer = 13162 ms, PID=7684
munmap finish: timer = 95312 ms, PID=7684 - delta = 82150 ms

With patch:
munmap start: timer = 196449 ms, PID=6681
munmap finish: timer = 196488 ms, PID=6681 - delta = 39ms

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-21-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:16 +10:00
Aneesh Kumar K.V 0e11df9649 powerpc/kvm/book3s: Use pte_present instead of opencoding _PAGE_PRESENT check
This adds _PAGE_PTE check and makes sure we validate the pte value returned via
find_kvm_host_pte.

NOTE: this also considers _PAGE_INVALID to the software valid bit.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-20-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:16 +10:00
Aneesh Kumar K.V 9fd4236faa powerpc/kvm/book3s: Use find_kvm_host_pte in kvmppc_get_hpa
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-19-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:16 +10:00
Aneesh Kumar K.V bda3deaa6f powerpc/kvm/book3s: use find_kvm_host_pte in kvmppc_book3s_instantiate_page
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-18-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:16 +10:00
Aneesh Kumar K.V 3ff8df1430 powerpc/kvm/book3s: Avoid using rmap to protect parallel page table update.
We now depend on kvm->mmu_lock

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-17-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V 7769a3394b powerpc/kvm/book3s: use find_kvm_host_pte in pute_tce functions
Current code just hold rmap lock to ensure parallel page table update is
prevented. That is not sufficient. The kernel should also check whether
a mmu_notifer callback was running in parallel.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-16-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V e3d8ed5518 powerpc/kvm/book3s: Use find_kvm_host_pte in h_enter
Since kvmppc_do_h_enter can get called in realmode use low level
arch_spin_lock which is safe to be called in realmode.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-15-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V 9781e759b3 powerpc/kvm/book3s: Use find_kvm_host_pte in page fault handler
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-14-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V 35528876a9 powerpc/kvm/book3s: Add helper for host page table walk
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-13-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V 6cdf30375f powerpc/kvm/book3s: Use kvm helpers to walk shadow or secondary table
update kvmppc_hv_handle_set_rc to use find_kvm_nested_guest_pte and
find_kvm_secondary_pte

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-12-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V dc891849e0 powerpc/kvm/nested: Add helper to walk nested shadow linux page table.
The locking rules for walking nested shadow linux page table is different from process
scoped table. Hence add a helper for nested page table walk and also
add check whether we are holding the right locks.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-11-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V 4b99412ed6 powerpc/kvm/book3s: Add helper to walk partition scoped linux page table.
The locking rules for walking partition scoped table is different from process
scoped table. Hence add a helper for secondary linux page table walk and also
add check whether we are holding the right locks.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-10-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:15 +10:00
Aneesh Kumar K.V 87013f9c60 powerpc/kvm/book3s: switch from raw_spin_*lock to arch_spin_lock.
These functions can get called in realmode. Hence use low level
arch_spin_lock which is safe to be called in realmode.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-9-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V 15759cb054 powerpc/perf/callchain: Use __get_user_pages_fast in read_user_stack_slow
read_user_stack_slow is called with interrupts soft disabled and it copies contents
from the page which we find mapped to a specific address. To convert
userspace address to pfn, the kernel now uses lockless page table walk.

The kernel needs to make sure the pfn value read remains stable and is not released
and reused for another process while the contents are read from the page. This
can only be achieved by holding a page reference.

One of the first approaches I tried was to check the pte value after the kernel
copies the contents from the page. But as shown below we can still get it wrong

CPU0                           CPU1
pte = READ_ONCE(*ptep);
                               pte_clear(pte);
                               put_page(page);
                               page = alloc_page();
                               memcpy(page_address(page), "secret password", nr);
memcpy(buf, kaddr + offset, nb);
                               put_page(page);
                               handle_mm_fault()
                               page = alloc_page();
                               set_pte(pte, page);
if (pte_val(pte) != pte_val(*ptep))

Hence switch to __get_user_pages_fast.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-8-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V 0da81b658b powerpc/mce: Don't reload pte val in addr_to_pfn
A lockless page table walk should be safe against parallel THP collapse, THP
split and madvise(MADV_DONTNEED)/parallel fault. This patch makes sure kernel
won't reload the pteval when checking for different conditions. The patch also added
a check for pte_present to make sure the kernel is indeed operating
on a PTE and not a pointer to level 0 table page.

The pfn value we find here can be different from the actual pfn on which
machine check happened. This can happen if we raced with a parallel update
of the page table. In such a scenario we end up isolating a wrong pfn. But that
doesn't have any other side effect.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-7-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V 2f92447f9f powerpc/book3s64/hash: Use the pte_t address from the caller
Don't fetch the pte value using lockless page table walk. Instead use the value from the
caller. hash_preload is called with ptl lock held. So it is safe to use the
pte_t address directly.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-6-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V 7900757ce1 powerpc/hash64: Restrict page table lookup using init_mm with __flush_hash_table_range
This is only used with init_mm currently. Walking init_mm is much simpler
because we don't need to handle concurrent page table like other mm_context

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-5-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V ec4abf1e70 powerpc/mm/hash64: use _PAGE_PTE when checking for pte_present
This makes the pte_present check stricter by checking for additional _PAGE_PTE
bit. A level 1 pte pointer (THP pte) can be switched to a pointer to level 0 pte
page table page by following two operations.

1) THP split.
2) madvise(MADV_DONTNEED) in parallel to page fault.

A lockless page table walk need to make sure we can handle such changes
gracefully.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-4-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V c46241a370 powerpc/pkeys: Check vma before returning key fault error to the user
If multiple threads in userspace keep changing the protection keys
mapping a range, there can be a scenario where kernel takes a key fault
but the pkey value found in the siginfo struct is a permissive one.

This can confuse the userspace as shown in the below test case.

/* use this to control the number of test iterations */

static void pkeyreg_set(int pkey, unsigned long rights)
{
	unsigned long reg, shift;

	shift = (NR_PKEYS - pkey - 1) * PKEY_BITS_PER_PKEY;
	asm volatile("mfspr	%0, 0xd" : "=r"(reg));
	reg &= ~(((unsigned long) PKEY_BITS_MASK) << shift);
	reg |= (rights & PKEY_BITS_MASK) << shift;
	asm volatile("mtspr	0xd, %0" : : "r"(reg));
}

static unsigned long pkeyreg_get(void)
{
	unsigned long reg;

	asm volatile("mfspr	%0, 0xd" : "=r"(reg));
	return reg;
}

static int sys_pkey_mprotect(void *addr, size_t len, int prot, int pkey)
{
	return syscall(SYS_pkey_mprotect, addr, len, prot, pkey);
}

static int sys_pkey_alloc(unsigned long flags, unsigned long access_rights)
{
	return syscall(SYS_pkey_alloc, flags, access_rights);
}

static int sys_pkey_free(int pkey)
{
	return syscall(SYS_pkey_free, pkey);
}

static int faulting_pkey;
static int permissive_pkey;
static pthread_barrier_t pkey_set_barrier;
static pthread_barrier_t mprotect_barrier;

static void pkey_handle_fault(int signum, siginfo_t *sinfo, void *ctx)
{
	unsigned long pkeyreg;

	/* FIXME: printf is not signal-safe but for the current purpose,
	          it gets the job done. */
	printf("pkey: exp = %d, got = %d\n", faulting_pkey, sinfo->si_pkey);
	fflush(stdout);

	assert(sinfo->si_code == SEGV_PKUERR);
	assert(sinfo->si_pkey == faulting_pkey);

	/* clear pkey permissions to let the faulting instruction continue */
	pkeyreg_set(faulting_pkey, 0x0);
}

static void *do_mprotect_fault(void *p)
{
	unsigned long rights, pkeyreg, pgsize;
	unsigned int i;
	void *region;
	int pkey;

	srand(time(NULL));
	pgsize = sysconf(_SC_PAGESIZE);
	rights = PKEY_DISABLE_WRITE;
	region = p;

	/* allocate key, no permissions */
	assert((pkey = sys_pkey_alloc(0, PKEY_DISABLE_ACCESS)) > 0);
	pkeyreg_set(4, 0x0);

	/* cache the pkey here as the faulting pkey for future reference
	   in the signal handler */
	faulting_pkey = pkey;
	printf("%s: faulting pkey = %d\n", __func__, faulting_pkey);

	/* try to allocate, mprotect and free pkeys repeatedly */
	for (i = 0; i < NUM_ITERATIONS; i++) {
		/* sync up with the other thread here */
		pthread_barrier_wait(&pkey_set_barrier);

		/* make sure that the pkey used by the non-faulting thread
		   is made permissive for this thread's context too so that
		   no faults are triggered because it still might have been
		   set to a restrictive value */
//		pkeyreg_set(permissive_pkey, 0x0);

		/* sync up with the other thread here */
		pthread_barrier_wait(&mprotect_barrier);

		/* perform mprotect */
		assert(!sys_pkey_mprotect(region, pgsize, PROT_READ | PROT_WRITE, pkey));

		/* choose a random byte from the protected region and
		   attempt to write to it, this will generate a fault */
		*((char *) region + (rand() % pgsize)) = rand();

		/* restore pkey permissions as the signal handler may have
		   cleared the bit out for the sake of continuing */
		pkeyreg_set(pkey, PKEY_DISABLE_WRITE);
	}

	/* free pkey */
	sys_pkey_free(pkey);

	return NULL;
}

static void *do_mprotect_nofault(void *p)
{
	unsigned long pgsize;
	unsigned int i, j;
	void *region;
	int pkey;

	pgsize = sysconf(_SC_PAGESIZE);
	region = p;

	/* try to allocate, mprotect and free pkeys repeatedly */
	for (i = 0; i < NUM_ITERATIONS; i++) {
		/* allocate pkey, all permissions */
		assert((pkey = sys_pkey_alloc(0, 0)) > 0);
		permissive_pkey = pkey;

		/* sync up with the other thread here */
		pthread_barrier_wait(&pkey_set_barrier);
		pthread_barrier_wait(&mprotect_barrier);

		/* perform mprotect on the common page, no faults will
		   be triggered as this is most permissive */
		assert(!sys_pkey_mprotect(region, pgsize, PROT_READ | PROT_WRITE, pkey));

		/* free pkey */
		assert(!sys_pkey_free(pkey));
	}

	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t fault_thread, nofault_thread;
	unsigned long pgsize;
	struct sigaction act;
	pthread_attr_t attr;
	cpu_set_t fault_cpuset, nofault_cpuset;
	unsigned int i;
	void *region;

	/* allocate memory region to protect */
	pgsize = sysconf(_SC_PAGESIZE);
	assert(region = memalign(pgsize, pgsize));

	CPU_ZERO(&fault_cpuset);
	CPU_SET(0, &fault_cpuset);
	CPU_ZERO(&nofault_cpuset);
	CPU_SET(8, &nofault_cpuset);
	assert(!pthread_attr_init(&attr));

	/* setup sigsegv signal handler */
	act.sa_handler = 0;
	act.sa_sigaction = pkey_handle_fault;
	assert(!sigprocmask(SIG_SETMASK, 0, &act.sa_mask));
	act.sa_flags = SA_SIGINFO;
	act.sa_restorer = 0;
	assert(!sigaction(SIGSEGV, &act, NULL));

	/* setup barrier for the two threads */
	pthread_barrier_init(&pkey_set_barrier, NULL, 2);
	pthread_barrier_init(&mprotect_barrier, NULL, 2);

	/* setup and start threads */
	assert(!pthread_create(&fault_thread, &attr, &do_mprotect_fault, region));
	assert(!pthread_setaffinity_np(fault_thread, sizeof(cpu_set_t), &fault_cpuset));
	assert(!pthread_create(&nofault_thread, &attr, &do_mprotect_nofault, region));
	assert(!pthread_setaffinity_np(nofault_thread, sizeof(cpu_set_t), &nofault_cpuset));

	/* cleanup */
	assert(!pthread_attr_destroy(&attr));
	assert(!pthread_join(fault_thread, NULL));
	assert(!pthread_join(nofault_thread, NULL));
	assert(!pthread_barrier_destroy(&pkey_set_barrier));
	assert(!pthread_barrier_destroy(&mprotect_barrier));
	free(region);

	puts("PASS");

	return EXIT_SUCCESS;
}

The above test can result the below failure without this patch.

pkey: exp = 3, got = 3
pkey: exp = 3, got = 4
a.out: pkey-siginfo-race.c💯 pkey_handle_fault: Assertion `sinfo->si_pkey == faulting_pkey' failed.
Aborted

Check for vma access before considering this a key fault. If vma pkey allow
access retry the acess again.

Test case is written by Sandipan Das <sandipan@linux.ibm.com> hence added SOB
from him.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-3-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:14 +10:00
Aneesh Kumar K.V fe4a6856cb powerpc/pkeys: Avoid using lockless page table walk
Fetch pkey from vma instead of linux page table. Also document the fact that in
some cases the pkey returned in siginfo won't be the same as the one we took
keyfault on. Even with linux page table walk, we can end up in a similar scenario.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200505071729.54912-2-aneesh.kumar@linux.ibm.com
2020-05-05 21:20:13 +10:00
Michael Ellerman f2b8d76dc6 PPC KVM fix for 5.7
- Fix a regression introduced in the last merge window, which results
   in guests in HPT mode dying randomly.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJeni/pAAoJEJ2a6ncsY3GfTRoIANAQjIZi96AfJcfnrYQ4yUF7
 scxawTiJ9VavvsEJLJ7vsozrJ4xxmvmA0fFWC84uw9+BwPqoLFFvZTjazbGEDVvF
 FGwNBR/k7nfFVMIHS3K9iy9KjvYL3xkL26AgFTDJFq8hmOO9pH0txuk4r7SXb+NX
 bGG0mScAD/Dg/HwAHAS6EP3jT35QtGTK62p8foqVTziTNcmBn9Ywtg0lEzAcq2iY
 Y1BUD4Ov3cggshMI9SqHE8Yyq0XA2Wi6ggcyz/gVzvcbdFQmtg57Tri8nN8661LX
 XKh+VTpYSIxNs5GgjwlNesJzJ9h6CSynJF556qrjQ0XsXcNqvn8fcZdNQ+hnRYw=
 =Y19W
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-fixes-5.7-1' into topic/ppc-kvm

This brings in a fix from the kvm-ppc tree that was merged to mainline
after rc2, and so isn't in the base of our topic branch. We'd like it
in the topic branch because it interacts with patches we plan to carry
in this branch.
2020-05-05 21:16:47 +10:00
Hari Bathini 140777a3d8 powerpc/fadump: consider reserved ranges while reserving memory
Commit 0962e8004e ("powerpc/prom: Scan reserved-ranges node for
memory reservations") enabled support to parse reserved-ranges DT
node and reserve kernel memory falling in these ranges for F/W
purposes. Memory reserved for FADump should not overlap with these
ranges as it could corrupt memory meant for F/W or crash'ed kernel
memory to be exported as vmcore.

But since commit 579ca1a276 ("powerpc/fadump: make use of memblock's
bottom up allocation mode"), memblock_find_in_range() is being used to
find the appropriate area to reserve memory for FADump, which can't
account for reserved-ranges as these ranges are reserved only after
FADump memory reservation.

With reserved-ranges now being populated during early boot, look out
for these memory ranges while reserving memory for FADump. Without
this change, MPIPL on PowerNV systems aborts with hostboot failure,
when memory reserved for FADump is less than 4096MB.

Fixes: 579ca1a276 ("powerpc/fadump: make use of memblock's bottom up allocation mode")
Cc: stable@vger.kernel.org
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/158737297693.26700.16193820746269425424.stgit@hbathini.in.ibm.com
2020-05-04 22:29:58 +10:00
Hari Bathini 02c04e374e powerpc/fadump: use static allocation for reserved memory ranges
At times, memory ranges have to be looked up during early boot, when
kernel couldn't be initialized for dynamic memory allocation. In fact,
reserved-ranges look up is needed during FADump memory reservation.
Without accounting for reserved-ranges in reserving memory for FADump,
MPIPL boot fails with memory corruption issues. So, extend memory
ranges handling to support static allocation and populate reserved
memory ranges during early boot.

Fixes: dda9dbfeeb ("powerpc/fadump: consider reserved ranges while releasing memory")
Cc: stable@vger.kernel.org
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/158737294432.26700.4830263187856221314.stgit@hbathini.in.ibm.com
2020-05-04 22:29:58 +10:00
Nicholas Piggin 53459dc970 powerpc/64s/kuap: Restore AMR in system reset exception
The system reset interrupt handler locks AMR and exits with
EXCEPTION_RESTORE_REGS without restoring AMR. Similarly to the
soft-NMI handler, it needs to restore.

Fixes: 890274c2dc ("powerpc/64s: Implement KUAP for Radix MMU")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200429065654.1677541-5-npiggin@gmail.com
2020-05-04 15:21:28 +10:00
Nicholas Piggin c0d7dcf89e powerpc/64/kuap: Move kuap checks out of MSR[RI]=0 regions of exit code
Any kind of WARN causes a program check that will crash with
unrecoverable exception if it occurs when RI is clear.

Fixes: 68b34588e2 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200429065654.1677541-2-npiggin@gmail.com
2020-05-04 09:22:58 +10:00
Michael Ellerman 0094368e3b powerpc/64s: Fix unrecoverable SLB crashes due to preemption check
Hugh reported that his trusty G5 crashed after a few hours under load
with an "Unrecoverable exception 380".

The crash is in interrupt_return() where we check lazy_irq_pending(),
which calls get_paca() and with CONFIG_DEBUG_PREEMPT=y that goes to
check_preemption_disabled() via debug_smp_processor_id().

As Nick explained on the list:

  Problem is MSR[RI] is cleared here, ready to do the last few things
  for interrupt return where we're not allowed to take any other
  interrupts.

  SLB interrupts can happen just about anywhere aside from kernel
  text, global variables, and stack. When that hits, it appears to be
  unrecoverable due to RI=0.

The problematic access is in preempt_count() which is:

	return READ_ONCE(current_thread_info()->preempt_count);

Because of THREAD_INFO_IN_TASK, current_thread_info() just points to
current, so the access is to somewhere in kernel memory, but not on
the stack or in .data, which means it can cause an SLB miss. If we
take an SLB miss with RI=0 it is fatal.

The easiest solution is to add a version of lazy_irq_pending() that
doesn't do the preemption check and call it from the interrupt return
path.

Fixes: 68b34588e2 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200502143316.929341-1-mpe@ellerman.id.au
2020-05-04 09:18:06 +10:00
Michael Ellerman 07ad112ab7 Merge KUAP fix from topic/uaccess-ppc into fixes
Merge a KUAP fix from Nick that we're keeping in a topic branch due to
interactions with other series that are headed for next.
2020-05-03 18:28:11 +10:00
Christophe Leroy 4fe5cda9f8 powerpc/uaccess: Implement user_read_access_begin and user_write_access_begin
Add support for selective read or write user access with
user_read_access_begin/end and user_write_access_begin/end.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6c83af0f0809ef2a955c39ac622767f6cbede035.1585898438.git.christophe.leroy@c-s.fr
2020-05-01 12:37:15 +10:00
Christophe Leroy 17bc43367f powerpc/uaccess: Implement unsafe_copy_to_user() as a simple loop
At the time being, unsafe_copy_to_user() is based on
raw_copy_to_user() which calls __copy_tofrom_user().

__copy_tofrom_user() is a big optimised function to copy big amount
of data. It aligns destinations to cache line in order to use
dcbz instruction.

Today unsafe_copy_to_user() is called only from filldir().
It is used to mainly copy small amount of data like filenames,
so __copy_tofrom_user() is not fit.

Also, unsafe_copy_to_user() is used within user_access_begin/end
sections. In those section, it is preferable to not call functions.

Rewrite unsafe_copy_to_user() as a macro that uses __put_user_goto().
We first perform a loop of long, then we finish with necessary
complements.

unsafe_copy_to_user() might be used in the near future to copy
fixed-size data, like pt_regs structs during signal processing.
Having it as a macro allows GCC to optimise it for instead when
it knows the size in advance, it can unloop loops, drop complements
when the size is a multiple of longs, etc ...

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fe952112c29bf6a0a2778c9e6bbb4f4afd2c4258.1587143308.git.christophe.leroy@c-s.fr
2020-04-30 20:30:40 +10:00
Christophe Leroy 334710b149 powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'
unsafe_put_user() is designed to take benefit of 'asm goto'.

Instead of using the standard __put_user() approach and branch
based on the returned error, use 'asm goto' and make the
exception code branch directly to the error label. There is
no code anymore in the fixup section.

This change significantly simplifies functions using
unsafe_put_user()

Small exemple of the benefit with the following code:

struct test {
	u32 item1;
	u16 item2;
	u8 item3;
	u64 item4;
};

int set_test_to_user(struct test __user *test, u32 item1, u16 item2, u8 item3, u64 item4)
{
	unsafe_put_user(item1, &test->item1, failed);
	unsafe_put_user(item2, &test->item2, failed);
	unsafe_put_user(item3, &test->item3, failed);
	unsafe_put_user(item4, &test->item4, failed);
	return 0;
failed:
	return -EFAULT;
}

Before the patch:

00000be8 <set_test_to_user>:
 be8:	39 20 00 00 	li      r9,0
 bec:	90 83 00 00 	stw     r4,0(r3)
 bf0:	2f 89 00 00 	cmpwi   cr7,r9,0
 bf4:	40 9e 00 38 	bne     cr7,c2c <set_test_to_user+0x44>
 bf8:	b0 a3 00 04 	sth     r5,4(r3)
 bfc:	2f 89 00 00 	cmpwi   cr7,r9,0
 c00:	40 9e 00 2c 	bne     cr7,c2c <set_test_to_user+0x44>
 c04:	98 c3 00 06 	stb     r6,6(r3)
 c08:	2f 89 00 00 	cmpwi   cr7,r9,0
 c0c:	40 9e 00 20 	bne     cr7,c2c <set_test_to_user+0x44>
 c10:	90 e3 00 08 	stw     r7,8(r3)
 c14:	91 03 00 0c 	stw     r8,12(r3)
 c18:	21 29 00 00 	subfic  r9,r9,0
 c1c:	7d 29 49 10 	subfe   r9,r9,r9
 c20:	38 60 ff f2 	li      r3,-14
 c24:	7d 23 18 38 	and     r3,r9,r3
 c28:	4e 80 00 20 	blr
 c2c:	38 60 ff f2 	li      r3,-14
 c30:	4e 80 00 20 	blr

00000000 <.fixup>:
	...
  b8:	39 20 ff f2 	li      r9,-14
  bc:	48 00 00 00 	b       bc <.fixup+0xbc>
			bc: R_PPC_REL24	.text+0xbf0
  c0:	39 20 ff f2 	li      r9,-14
  c4:	48 00 00 00 	b       c4 <.fixup+0xc4>
			c4: R_PPC_REL24	.text+0xbfc
  c8:	39 20 ff f2 	li      r9,-14
  cc:	48 00 00 00 	b       cc <.fixup+0xcc>
  d0:	39 20 ff f2 	li      r9,-14
  d4:	48 00 00 00 	b       d4 <.fixup+0xd4>
			d4: R_PPC_REL24	.text+0xc18

00000000 <__ex_table>:
	...
			a0: R_PPC_REL32	.text+0xbec
			a4: R_PPC_REL32	.fixup+0xb8
			a8: R_PPC_REL32	.text+0xbf8
			ac: R_PPC_REL32	.fixup+0xc0
			b0: R_PPC_REL32	.text+0xc04
			b4: R_PPC_REL32	.fixup+0xc8
			b8: R_PPC_REL32	.text+0xc10
			bc: R_PPC_REL32	.fixup+0xd0
			c0: R_PPC_REL32	.text+0xc14
			c4: R_PPC_REL32	.fixup+0xd0

After the patch:

00000be8 <set_test_to_user>:
 be8:	90 83 00 00 	stw     r4,0(r3)
 bec:	b0 a3 00 04 	sth     r5,4(r3)
 bf0:	98 c3 00 06 	stb     r6,6(r3)
 bf4:	90 e3 00 08 	stw     r7,8(r3)
 bf8:	91 03 00 0c 	stw     r8,12(r3)
 bfc:	38 60 00 00 	li      r3,0
 c00:	4e 80 00 20 	blr
 c04:	38 60 ff f2 	li      r3,-14
 c08:	4e 80 00 20 	blr

00000000 <__ex_table>:
	...
			a0: R_PPC_REL32	.text+0xbe8
			a4: R_PPC_REL32	.text+0xc04
			a8: R_PPC_REL32	.text+0xbec
			ac: R_PPC_REL32	.text+0xc04
			b0: R_PPC_REL32	.text+0xbf0
			b4: R_PPC_REL32	.text+0xc04
			b8: R_PPC_REL32	.text+0xbf4
			bc: R_PPC_REL32	.text+0xc04
			c0: R_PPC_REL32	.text+0xbf8
			c4: R_PPC_REL32	.text+0xc04

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/23e680624680a9a5405f4b88740d2596d4b17c26.1587143308.git.christophe.leroy@c-s.fr
2020-04-30 20:30:40 +10:00
Nicholas Piggin d02f6b7dab powerpc/uaccess: Evaluate macro arguments once, before user access is allowed
get/put_user() can be called with nontrivial arguments. fs/proc/page.c
has a good example:

    if (put_user(stable_page_flags(ppage), out)) {

stable_page_flags() is quite a lot of code, including spin locks in
the page allocator.

Ensure these arguments are evaluated before user access is allowed.

This improves security by reducing code with access to userspace, but
it also fixes a PREEMPT bug with KUAP on powerpc/64s:
stable_page_flags() is currently called with AMR set to allow writes,
it ends up calling spin_unlock(), which can call preempt_schedule. But
the task switch code can not be called with AMR set (it relies on
interrupts saving the register), so this blows up.

It's fine if the code inside allow_user_access() is preemptible,
because a timer or IPI will save the AMR, but it's not okay to
explicitly cause a reschedule.

Fixes: de78a9c42a ("powerpc: Add a framework for Kernel Userspace Access Protection")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200407041245.600651-1-npiggin@gmail.com
2020-04-30 20:21:44 +10:00
Naveen N. Rao 57b3ed941b powerpc/64: Have MPROFILE_KERNEL depend on FUNCTION_TRACER
Currently, it is possible to have CONFIG_FUNCTION_TRACER disabled, but
CONFIG_MPROFILE_KERNEL enabled. Though all existing users of
MPROFILE_KERNEL are doing the right thing, it is weird to have
MPROFILE_KERNEL enabled when the function tracer isn't. Fix this by
making MPROFILE_KERNEL depend on FUNCTION_TRACER.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200422092612.514301-1-naveen.n.rao@linux.vnet.ibm.com
2020-04-30 12:56:28 +10:00
Gautham R. Shenoy 6909f179ca powerpc/sysfs: Show idle_purr and idle_spurr for every CPU
On Pseries LPARs, to calculate utilization, we need to know the
[S]PURR ticks when the CPUs were busy or idle.

The total PURR and SPURR ticks are already exposed via the per-cpu
sysfs files "purr" and "spurr". This patch adds support for exposing
the idle PURR and SPURR ticks via new per-cpu sysfs files named
"idle_purr" and "idle_spurr".

This patch also adds helper functions to accurately read the values of
idle_purr and idle_spurr especially from an interrupt context between
when the interrupt has occurred between the pseries_idle_prolog() and
pseries_idle_epilog(). This will ensure that the idle purr/spurr
values corresponding to the latest idle period is accounted for before
these values are read.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1586249263-14048-5-git-send-email-ego@linux.vnet.ibm.com
2020-04-30 12:35:26 +10:00
Gautham R. Shenoy dc8afce5f4 powerpc/pseries: Account for SPURR ticks on idle CPUs
On Pseries LPARs, to calculate utilization, we need to know the
[S]PURR ticks when the CPUs were busy or idle.

Via pseries_idle_prolog(), pseries_idle_epilog(), we track the idle
PURR ticks in the VPA variable "wait_state_cycles". This patch extends
the support to account for the idle SPURR ticks.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1586249263-14048-4-git-send-email-ego@linux.vnet.ibm.com
2020-04-30 12:35:26 +10:00
Gautham R. Shenoy c4019198cf powerpc/idle: Store PURR snapshot in a per-cpu global variable
Currently when CPU goes idle, we take a snapshot of PURR via
pseries_idle_prolog() which is used at the CPU idle exit to compute
the idle PURR cycles via the function pseries_idle_epilog().  Thus,
the value of idle PURR cycle thus read before pseries_idle_prolog() and
after pseries_idle_epilog() is always correct.

However, if we were to read the idle PURR cycles from an interrupt
context between pseries_idle_prolog() and pseries_idle_epilog() (this
will be done in a future patch), then, the value of the idle PURR thus
read will not include the cycles spent in the most recent idle period.
Thus, in that interrupt context, we will need access to the snapshot
of the PURR before going idle, in order to compute the idle PURR
cycles for the latest idle duration.

In this patch, we save the snapshot of PURR in pseries_idle_prolog()
in a per-cpu variable, instead of on the stack, so that it can be
accessed from an interrupt context.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1586249263-14048-3-git-send-email-ego@linux.vnet.ibm.com
2020-04-30 12:35:26 +10:00
Gautham R. Shenoy e4a884cc28 powerpc: Move idle_loop_prolog()/epilog() functions to header file
Currently prior to entering an idle state on a Linux Guest, the
pseries cpuidle driver implement an idle_loop_prolog() and
idle_loop_epilog() functions which ensure that idle_purr is correctly
computed, and the hypervisor is informed that the CPU cycles have been
donated.

These prolog and epilog functions are also required in the default
idle call, i.e pseries_lpar_idle(). Hence move these accessor
functions to a common header file and call them from
pseries_lpar_idle(). Since the existing header files such as
asm/processor.h have enough clutter, create a new header file
asm/idle.h. Finally rename idle_loop_prolog() and idle_loop_epilog()
to pseries_idle_prolog() and pseries_idle_epilog() as they are only
relavent for on pseries guests.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1586249263-14048-2-git-send-email-ego@linux.vnet.ibm.com
2020-04-30 12:35:26 +10:00
Christophe Leroy feb8e960d7 powerpc/mm: Fix CONFIG_PPC_KUAP_DEBUG on PPC32
CONFIG_PPC_KUAP_DEBUG is not selectable because it depends on PPC_32
which doesn't exists.

Fixing it leads to a deadlock due to a vital register getting
clobbered in _switch().

Change dependency to PPC32 and use r0 instead of r4 in _switch()

Fixes: e2fb9f5444 ("powerpc/32: Prepare for Kernel Userspace Access Protection")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/540242f7d4573f7cdf1b3bf46bb35f743b2cd68f.1587124651.git.christophe.leroy@c-s.fr
2020-04-22 20:24:02 +10:00
Christophe Leroy b61c38baa9 powerpc/8xx: Fix STRICT_KERNEL_RWX startup test failure
WRITE_RO lkdtm test works.

But when selecting CONFIG_DEBUG_RODATA_TEST, the kernel reports
	rodata_test: test data was not read only

This is because when rodata test runs, there are still old entries
in TLB.

Flush TLB after setting kernel pages RO or NX.

Fixes: d5f17ee964 ("powerpc/8xx: don't disable large TLBs with CONFIG_STRICT_KERNEL_RWX")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/485caac75f195f18c11eb077b0031fdd2bb7fb9e.1587361039.git.christophe.leroy@c-s.fr
2020-04-22 20:23:41 +10:00
Stephen Rothwell 45591da765 powerpc/vas: Include linux/types.h in uapi/asm/vas-api.h
allyesconfig fails with:
  ./usr/include/asm/vas-api.h:15:2: error: unknown type name '__u32'
     15 |  __u32 version;
        |  ^~~~~
  ./usr/include/asm/vas-api.h:16:2: error: unknown type name '__s16'
     16 |  __s16 vas_id; /* specific instance of vas or -1 for default */
        |  ^~~~~
  ./usr/include/asm/vas-api.h:17:2: error: unknown type name '__u16'
     17 |  __u16 reserved1;
        |  ^~~~~
  ./usr/include/asm/vas-api.h:18:2: error: unknown type name '__u64'
     18 |  __u64 flags; /* Future use */
        |  ^~~~~
  ./usr/include/asm/vas-api.h:19:2: error: unknown type name '__u64'
     19 |  __u64 reserved2[6];
        |  ^~~~~

uapi headers should be self contained, so add an include of
linux/types.h.

Fixes: 45f25a79fe ("powerpc/vas: Define VAS_TX_WIN_OPEN ioctl API")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Haren Myneni <haren@linux.ibm.com>
[mpe: Flesh out change log from linux-next error report]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200422154129.11f988fd@canb.auug.org.au
2020-04-22 20:02:14 +10:00
Chris Packham 94c0b013c9 powerpc/setup_64: Set cache-line-size based on cache-block-size
If {i,d}-cache-block-size is set and {i,d}-cache-line-size is not, use
the block-size value for both. Per the devicetree spec cache-line-size
is only needed if it differs from the block size.

Originally the code would fallback from block size to line size. An
error message was printed if both properties were missing.

Later the code was refactored to use clearer names and logic but it
inadvertently made line size a required property, meaning on systems
without a line size property we fall back to the default from the
cputable.

On powernv (OPAL) platforms, since the introduction of device tree CPU
features (5a61ef74f2 ("powerpc/64s: Support new device tree binding
for discovering CPU features")), that has led to the wrong value being
used, as the fallback value is incorrect for Power8/Power9 CPUs.

The incorrect values flow through to the VDSO and also to the sysconf
values, SC_LEVEL1_ICACHE_LINESIZE etc.

Fixes: bd067f83b0 ("powerpc/64: Fix naming of cache block vs. cache line")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reported-by: Qian Cai <cai@lca.pw>
[mpe: Add even more detail to change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200416221908.7886-1-chris.packham@alliedtelesis.co.nz
2020-04-21 18:01:06 +10:00
Paul Mackerras ae49dedaa9 KVM: PPC: Book3S HV: Handle non-present PTEs in page fault functions
Since cd758a9b57 "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT
page fault handler", it's been possible in fairly rare circumstances to
load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a
guest on a POWER8 host.

Because that case wasn't checked for, we could misinterpret the non-present
PTE as being a cache-inhibited PTE.  That could mismatch with the
corresponding hash PTE, which would cause the function to fail with -EFAULT
a little further down.  That would propagate up to the KVM_RUN ioctl()
generally causing the KVM userspace (usually qemu) to fall over.

This addresses the problem by catching that case and returning to the guest
instead.

For completeness, this fixes the radix page fault handler in the same
way.  For radix this didn't cause any obvious misbehaviour, because we
ended up putting the non-present PTE into the guest's partition-scoped
page tables, leading immediately to another hypervisor data/instruction
storage interrupt, which would go through the page fault path again
and fix things up.

Fixes: cd758a9b57 "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler"
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2020-04-21 09:23:41 +10:00
Haren Myneni 040b00acec crypto/nx: Remove 'pid' in vas_tx_win_attr struct
When window is opened, pid reference is taken for user space
windows. Not needed for kernel windows. So remove 'pid' in
vas_tx_win_attr struct.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587114674.2275.1132.camel@hbabu-laptop
2020-04-20 16:53:14 +10:00
Haren Myneni dda44eb29c powerpc/vas: Add VAS user space API
On power9, userspace can send GZIP compression requests directly to NX
once kernel establishes NX channel / window with VAS. This patch provides
user space API which allows user space to establish channel using open
VAS_TX_WIN_OPEN ioctl, mmap and close operations.

Each window corresponds to file descriptor and application can open
multiple windows. After the window is opened, VAS_TX_WIN_OPEN icoctl to
open a window on specific VAS instance, mmap() system call to map
the hardware address of engine's request queue into the application's
virtual address space.

Then the application can then submit one or more requests to the the
engine by using the copy/paste instructions and pasting the CRBs to
the virtual address (aka paste_address) returned by mmap().

Only NX GZIP coprocessor type is supported right now and allow GZIP
engine access via /dev/crypto/nx-gzip device node.

Thanks to Michael Ellerman for his changes and suggestions to make the
ioctl generic to support any coprocessor type.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587114121.2275.1109.camel@hbabu-laptop
2020-04-20 16:53:14 +10:00
Haren Myneni 45f25a79fe powerpc/vas: Define VAS_TX_WIN_OPEN ioctl API
Define the VAS_TX_WIN_OPEN ioctl interface for NX GZIP access
from user space. This interface is used to open GZIP send window and
mmap region which can be used by userspace to send requests to NX
directly with copy/paste instructions.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587114065.2275.1106.camel@hbabu-laptop
2020-04-20 16:53:13 +10:00
Haren Myneni a8c0c69b5e powerpc/vas: Initialize window attributes for GZIP coprocessor type
Initialize send and receive window attributes for GZIP high and
normal priority types.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587114029.2275.1103.camel@hbabu-laptop
2020-04-20 16:53:13 +10:00
Haren Myneni c420644c0a powerpc: Use mm_context vas_windows counter to issue CP_ABORT
set_thread_uses_vas() sets used_vas flag for a process that opened VAS
window and issue CP_ABORT during context switch for only that process.
In multi-thread application, windows can be shared. For example Thread
A can open a window and Thread B can run COPY/PASTE instructions to
send NX request which may cause corruption or snooping or a covert
channel Also once this flag is set, continue to run CP_ABORT even the
VAS window is closed.

So define vas-windows counter in process mm_context, increment this
counter for each window open and decrement it for window close. If
vas-windows is set, issue CP_ABORT during context switch. It means
clear the foreign real address mapping only if the process / thread
uses COPY/PASTE. Then disable it for that process if windows are not
open.

Moved set_thread_uses_vas() code to vas_tx_win_open() as this
functionality is needed only for userspace open windows. We are adding
VAS userspace support along with this fix. So no need to include this
fix in stable releases.

Fixes: 9d2a4d7133 ("powerpc: Define set_thread_uses_vas()")
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Suggested-by: Milton Miller <miltonm@us.ibm.com>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587017291.2275.1077.camel@hbabu-laptop
2020-04-20 16:53:01 +10:00
Haren Myneni 1d955f9818 powerpc/vas: Free send window in VAS instance after credits returned
NX may be processing requests while trying to close window. Wait until
all credits are returned and then free send window from VAS instance.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587017256.2275.1076.camel@hbabu-laptop
2020-04-20 16:53:01 +10:00
Haren Myneni bd4da68dbd powerpc/vas: Display process stuck message
Process can not close send window until all requests are processed.
Means wait until window state is not busy and send credits are
returned. Display debug messages in case taking longer to close the
window.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587017219.2275.1073.camel@hbabu-laptop
2020-04-20 16:53:01 +10:00
Haren Myneni 04f6296ca7 powerpc/vas: Do not use default credits for receive window
System checkstops if RxFIFO overruns with more requests than the
maximum possible number of CRBs allowed in FIFO at any time. So
max credits value (rxattr.wcreds_max) is set and is passed to
vas_rx_win_open() by the the driver.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587017136.2275.1070.camel@hbabu-laptop
2020-04-20 16:53:01 +10:00
Haren Myneni cf33e1e938 powerpc/vas: Print CRB and FIFO values
Dump FIFO entries if could not find send window and print CRB
for debugging.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587017099.2275.1067.camel@hbabu-laptop
2020-04-20 16:53:00 +10:00
Haren Myneni 461862ef94 powerpc/vas: Return credits after handling fault
NX uses credit mechanism to control the number of requests issued on
a specific window at any point of time. Only send windows and fault
window are used credits. When the request is issued on a given window,
a credit is taken. This credit will be returned after that request is
processed. If credits are not available, returns RMA_Busy for send
window and RMA_Reject for fault window.

NX expects OS to return credit for send window after processing fault
CRB. Also credit has to be returned for fault window after handling
the fault.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587017059.2275.1064.camel@hbabu-laptop
2020-04-20 16:53:00 +10:00
Haren Myneni c96c4436ab powerpc/vas: Update CSB and notify process for fault CRBs
Applications polls on CSB for the status update after requests are
issued. NX process these requests and update the CSB with the status.
If it encounters translation error, pastes CRB in fault FIFO and
raises an interrupt. The kernel handles fault by reading CRB from
fault FIFO and process the fault CRB.

For each fault CRB, update fault address in CRB (fault_storage_addr)
and translation error status in CSB so that user space can touch the
fault address and resend the request. If the user space passed invalid
CSB address send signal to process with SIGSEGV.

In the case of multi-thread applications, child thread may not be
available. So if the task is not running, send signal to tgid.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587017022.2275.1063.camel@hbabu-laptop
2020-04-20 16:53:00 +10:00
Haren Myneni 9774628acf powerpc/vas: Setup thread IRQ handler per VAS instance
When NX encounters translation error on CRB and any request buffer,
raises an interrupt on the CPU to handle the fault. It can raise one
interrupt for multiple faults. Expects OS to handle these faults and
return credits for fault window after processing faults.

Setup thread IRQ handler and IRQ thread function per each VAS instance.
IRQ handler checks if the thread is already woken up and can handle new
faults. If so returns with IRQ_HANDLED, otherwise wake up thread to
process new faults.

The thread functions reads each CRB entry from fault FIFO until sees
invalid entry. After reading each CRB, determine the corresponding
send window using pswid (from CRB) and process fault CRB. Then
invalidate the entry and return credit. Processing fault CRB and
return credit is described in subsequent patches.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587016982.2275.1060.camel@hbabu-laptop
2020-04-20 16:53:00 +10:00
Haren Myneni db1c08a740 powerpc/vas: Take reference to PID and mm for user space windows
When process opens a window, its pid and tgid will be saved in the
vas_window struct. This window will be closed when the process exits.
The kernel handles NX faults by updating CSB or send SEGV signal to pid
of the process if the userspace csb addr is invalid.

In multi-thread applications, a window can be opened by a child thread,
but it will not be closed when this thread exits. It is expected that
the parent will clean up all resources including NX windows opened by
child threads. A child thread can send NX requests using this window
and could be killed before completion is reported. If the pid assigned
to this thread is reused while requests are pending, a failure SEGV
would be directed to the wrong place.

To prevent reusing the pid, take references to pid and mm when the window
is opened and release them when when the window is closed. Then if child
thread is not running, SEGV signal will be sent to thread group leader
(tgid).

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587016936.2275.1057.camel@hbabu-laptop
2020-04-20 16:53:00 +10:00
Haren Myneni 8b8a73dc79 powerpc/vas: Register NX with fault window ID and IRQ port value
For each user space send window, register NX with fault window ID
and port value so that NX paste CRBs in this fault FIFO when it
sees fault on the request buffer.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587016888.2275.1054.camel@hbabu-laptop
2020-04-20 16:53:00 +10:00
Haren Myneni 0d17de03ce powerpc/vas: Setup fault window per VAS instance
Setup fault window for each VAS instance. When NX gets a fault on
request buffer, pastes fault CRB in the corresponding fault FIFO and
then raises an interrupt to the OS. The kernel handles this fault
and process faults CRB from this FIFO.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587016846.2275.1053.camel@hbabu-laptop
2020-04-20 16:53:00 +10:00
Haren Myneni c20e1e299d powerpc/vas: Alloc and setup IRQ and trigger port address
Allocate a xive irq on each chip with a vas instance. The NX coprocessor
raises a host CPU interrupt via vas if it encounters page fault on user
space request buffer. Subsequent patches register the trigger port with
the NX coprocessor, and create a vas fault handler for this interrupt
mapping.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587016806.2275.1050.camel@hbabu-laptop
2020-04-20 16:53:00 +10:00
Haren Myneni 73a8077938 powerpc/vas: Define nx_fault_stamp in coprocessor_request_block
Kernel sets fault address and status in CRB for NX page fault on user
space address after processing page fault. User space gets the signal
and handles the fault mentioned in CRB by bringing the page in to
memory and send NX request again.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587016769.2275.1048.camel@hbabu-laptop
2020-04-20 16:53:00 +10:00
Haren Myneni 8d0ea29db5 powerpc/xive: Define xive_native_alloc_irq_on_chip()
This function allocates IRQ on a specific chip. VAS needs per chip
IRQ allocation and will have IRQ handler per VAS instance.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1587016720.2275.1047.camel@hbabu-laptop
2020-04-20 16:52:59 +10:00
Logan Gunthorpe bfeb022f8f mm/memory_hotplug: add pgprot_t to mhp_params
devm_memremap_pages() is currently used by the PCI P2PDMA code to create
struct page mappings for IO memory.  At present, these mappings are
created with PAGE_KERNEL which implies setting the PAT bits to be WB.
However, on x86, an mtrr register will typically override this and force
the cache type to be UC-.  In the case firmware doesn't set this
register it is effectively WB and will typically result in a machine
check exception when it's accessed.

Other arches are not currently likely to function correctly seeing they
don't have any MTRR registers to fall back on.

To solve this, provide a way to specify the pgprot value explicitly to
arch_add_memory().

Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a
simple change to pass the pgprot_t down to their respective functions
which set up the page tables.  For x86_32, set the page tables
explicitly using _set_memory_prot() (seeing they are already mapped).

For ia64, s390 and sh, reject anything but PAGE_KERNEL settings -- this
should be fine, for now, seeing these architectures don't support
ZONE_DEVICE.

A check in __add_pages() is also added to ensure the pgprot parameter
was set for all arches.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-7-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
Logan Gunthorpe 4e00c5affd powerpc/mm: thread pgprot_t through create_section_mapping()
In prepartion to support a pgprot_t argument for arch_add_memory().

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-6-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
Logan Gunthorpe f5637d3b42 mm/memory_hotplug: rename mhp_restrictions to mhp_params
The mhp_restrictions struct really doesn't specify anything resembling a
restriction anymore so rename it to be mhp_params as it is a list of
extended parameters.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-3-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
Anshuman Khandual 6cb4d9a287 mm/vma: introduce VM_ACCESS_FLAGS
There are many places where all basic VMA access flags (read, write,
exec) are initialized or checked against as a group.  One such example
is during page fault.  Existing vma_is_accessible() wrapper already
creates the notion of VMA accessibility as a group access permissions.

Hence lets just create VM_ACCESS_FLAGS (VM_READ|VM_WRITE|VM_EXEC) which
will not only reduce code duplication but also extend the VMA
accessibility concept in general.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Springer <rspringer@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/1583391014-8170-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
Anshuman Khandual c62da0c35d mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
There are many platforms with exact same value for VM_DATA_DEFAULT_FLAGS
This creates a default value for VM_DATA_DEFAULT_FLAGS in line with the
existing VM_STACK_DEFAULT_FLAGS.  While here, also define some more
macros with standard VMA access flag combinations that are used
frequently across many platforms.  Apart from simplification, this
reduces code duplication as well.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Chris Zankel <chris@zankel.net>
Link: http://lkml.kernel.org/r/1583391014-8170-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10 15:36:21 -07:00
Linus Torvalds e4da01d833 powerpc updates for 5.7 #2
- A fix for a crash in machine check handling on pseries (ie. guests)
 
  - A small series to make it possible to disable CONFIG_COMPAT, and turn it off
    by default for ppc64le where it's not used.
 
  - A few other miscellaneous fixes and small improvements.
 
 Thanks to:
   Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann, Christophe Leroy, Dan
   Carpenter, Ganesh Goudar, Geert Uytterhoeven, Geoff Levand, Mahesh Salgaonkar,
   Markus Elfring, Michal Suchanek, Nicholas Piggin, Stephen Boyd, Wen Xiong.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl6O8LgTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgMcYEACbGf+Z9brLSasYajoqU6QdqGPacHEN
 1a9TEmUnN+HWgtfkkoEBFbyuYHnhyhYuf7hvNccDjDA91ESVylO+Wq7Q+v/xoz29
 LVyb3V6uuVMLHnoqwP5jpr0lS0aOpuu3Nc2SpfBuolDtJqeMxpVEEK3Ln3uATlq6
 SnEAxQEKb2x4Y4Tfuq5A3txupj0s/UYrmeR6GAdkN3Oapbb9uQl8Ql2smqrZo0cq
 6TLxtoFzbfXkV6NmY2V0OKMTeXt0fhrvdDhFEBckCUpRZLv4Fd7CwPWNER2bUVs6
 04kg87BAO8qRyfr3G93oP0mWgi65kpXI8yN6Vt5Lig+5PnxNh7swvEqDpQR1s9se
 uVoeN7RBHOKQppZRkpzf6yZbelCcMuwTeIBQ9XOx/jYFAHJ2nUkJm9TYgKckVvNY
 E4shiM1eoOVMrEmODFBCUmUkJLyn5jU1+r5mn708v2Nb5E1XgoTejitB6bHyL+Aa
 zo/0DdsZO86iNE7th94oHQRgVbx1vtP9kV6vK6BLB5M95RSGEdVAEMo5CTL70wr+
 hz7suOaijPi+TeCW9YPrcOHcHOQE9pzTFQoVZHuv8egbvd9Rni3aBAMMqHx/lQdE
 Hk4zN7IytOLqQTfINNYgoo0kUKI1ipJQOBfR5jyeLhgg6yp36CnO/m+0QGvrILbi
 18tlf2qkD75Z3g==
 =72sb
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc updates from Michael Ellerman:
 "The bulk of this is the series to make CONFIG_COMPAT user-selectable,
  it's been around for a long time but was blocked behind the
  syscall-in-C series.

  Plus there's also a few fixes and other minor things.

  Summary:

   - A fix for a crash in machine check handling on pseries (ie. guests)

   - A small series to make it possible to disable CONFIG_COMPAT, and
     turn it off by default for ppc64le where it's not used.

   - A few other miscellaneous fixes and small improvements.

  Thanks to: Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann,
  Christophe Leroy, Dan Carpenter, Ganesh Goudar, Geert Uytterhoeven,
  Geoff Levand, Mahesh Salgaonkar, Markus Elfring, Michal Suchanek,
  Nicholas Piggin, Stephen Boyd, Wen Xiong"

* tag 'powerpc-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Always build the tm-poison test 64-bit
  powerpc: Improve ppc_save_regs()
  Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled"
  powerpc/time: Replace <linux/clk-provider.h> by <linux/of_clk.h>
  powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory
  powerpc/perf: split callchain.c by bitness
  powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.
  powerpc/64: make buildable without CONFIG_COMPAT
  powerpc/perf: consolidate valid_user_sp -> invalid_user_sp
  powerpc/perf: consolidate read_user_stack_32
  powerpc: move common register copy functions from signal_32.c to signal.c
  powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
  powerpc/ps3: Set CONFIG_UEVENT_HELPER=y in ps3_defconfig
  powerpc/ps3: Remove an unneeded NULL check
  powerpc/ps3: Remove duplicate error message
  powerpc/powernv: Re-enable imc trace-mode in kernel
  powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events.
  powerpc/pseries: Fix MCE handling on pseries
  selftests/eeh: Skip ahci adapters
  powerpc/64s: Fix doorbell wakeup msgclr optimisation
2020-04-09 11:01:42 -07:00
Linus Torvalds 9b06860d7c libnvdimm for 5.7
- Add support for region alignment configuration and enforcement to
   fix compatibility across architectures and PowerPC page size
   configurations.
 
 - Introduce 'zero_page_range' as a dax operation. This facilitates
   filesystem-dax operation without a block-device.
 
 - Introduce phys_to_target_node() to facilitate drivers that want to
   know resulting numa node if a given reserved address range was
   onlined.
 
 - Advertise a persistence-domain for of_pmem and papr_scm. The
   persistence domain indicates where cpu-store cycles need to reach in
   the platform-memory subsystem before the platform will consider them
   power-fail protected.
 
 - Promote numa_map_to_online_node() to a cross-kernel generic facility.
 
 - Save x86 numa information to allow for node-id lookups for reserved
   memory ranges, deploy that capability for the e820-pmem driver.
 
 - Pick up some miscellaneous minor fixes, that missed v5.6-final,
   including a some smatch reports in the ioctl path and some unit test
   compilation fixups.
 
 - Fixup some flexible-array declarations.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAl6LtIAACgkQHtKRamZ9
 iAIwRA/8CLVVuQpgHQ1tqK4h8CZPrISFXh7wy7uhocEU2xrDh6iGVnLztmoLRr2k
 5f8T9lRzreSAwIVL5DbGqP1pFncqIt9VMnKsFlaPMBGCBNR+hURY0iBCNjIT+jiq
 BOzLd52MR2rqJxeXGTMUbWrBrbmuj4mZPdmGVuFFe7GFRpoaVpCgOo+296eWa/ot
 gIOFUTonZY7STYjNvDok0TXCmiCFuJb+P+y5ldfCPShHvZhTiaF53jircja8vAjO
 G5dt8ixBKUK0rXRc4SEQsQhAZNcAFHb6Gy5lg4C2QzhTF374xTc9usJZNWbIE9iM
 5mipBYvjVuoY+XaCNZDkaRcJIy/jqB15O6l3QIWbZLGaK9m95YPp9LmkPFwd3JpO
 e3rO24ML471DxqB9iWIiJCNcBBocLOlnd6qAQTpppWDpGNbudwXvfsmKHmKIScSE
 x+IDCdscLmmm+WG2dLmLraWOVPu42xZFccoQCi4M3TTqfeB9pZ9XckFQ37zX62zG
 5t+7Ek+t1W4QVt/JQYVKH03XT15sqUpVknvx0Hl4Y5TtbDOkFLkO8RN0/HyExDef
 7iegS35kqTsM4EfZQ+9juKbI2JBAjHANcbj0V4dogqaRj6vr3akumBzUtuYqAofv
 qU3s9skmLsEemOJC+ns2PT8vl5dyIoeDfH0r2XvGWxYqolMqJpA=
 =sY4N
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm and dax updates from Dan Williams:
 "There were multiple touches outside of drivers/nvdimm/ this round to
  add cross arch compatibility to the devm_memremap_pages() interface,
  enhance numa information for persistent memory ranges, and add a
  zero_page_range() dax operation.

  This cycle I switched from the patchwork api to Konstantin's b4 script
  for collecting tags (from x86, PowerPC, filesystem, and device-mapper
  folks), and everything looks to have gone ok there. This has all
  appeared in -next with no reported issues.

  Summary:

   - Add support for region alignment configuration and enforcement to
     fix compatibility across architectures and PowerPC page size
     configurations.

   - Introduce 'zero_page_range' as a dax operation. This facilitates
     filesystem-dax operation without a block-device.

   - Introduce phys_to_target_node() to facilitate drivers that want to
     know resulting numa node if a given reserved address range was
     onlined.

   - Advertise a persistence-domain for of_pmem and papr_scm. The
     persistence domain indicates where cpu-store cycles need to reach
     in the platform-memory subsystem before the platform will consider
     them power-fail protected.

   - Promote numa_map_to_online_node() to a cross-kernel generic
     facility.

   - Save x86 numa information to allow for node-id lookups for reserved
     memory ranges, deploy that capability for the e820-pmem driver.

   - Pick up some miscellaneous minor fixes, that missed v5.6-final,
     including a some smatch reports in the ioctl path and some unit
     test compilation fixups.

   - Fixup some flexible-array declarations"

* tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits)
  dax: Move mandatory ->zero_page_range() check in alloc_dax()
  dax,iomap: Add helper dax_iomap_zero() to zero a range
  dax: Use new dax zero page method for zeroing a page
  dm,dax: Add dax zero_page_range operation
  s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
  dax, pmem: Add a dax operation zero_page_range
  pmem: Add functions for reading/writing page to/from pmem
  libnvdimm: Update persistence domain value for of_pmem and papr_scm device
  tools/test/nvdimm: Fix out of tree build
  libnvdimm/region: Fix build error
  libnvdimm/region: Replace zero-length array with flexible-array member
  libnvdimm/label: Replace zero-length array with flexible-array member
  ACPI: NFIT: Replace zero-length array with flexible-array member
  libnvdimm/region: Introduce an 'align' attribute
  libnvdimm/region: Introduce NDD_LABELING
  libnvdimm/namespace: Enforce memremap_compat_align()
  libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
  libnvdimm: Out of bounds read in __nd_ioctl()
  acpi/nfit: improve bounds checking for 'func'
  mm/memremap_pages: Introduce memremap_compat_align()
  ...
2020-04-08 21:03:40 -07:00
Linus Torvalds 9bb715260e virtio: fixes, vdpa
Some bug fixes.
 The new vdpa subsystem with two first drivers.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl6MS7wPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpGp8H/2H49Gya1cfVbGU13qgmBSQqQXC8hS3iNLuG
 ltRgU+jafJT//kvkdm3/DUzfK3eRUWUfqZLKEbAQDtMY0OGHi/KGEBYVLDde7Zxt
 Lg4VnwBhkYDR/f01ZZDbHxzj9JAr83i28nILjLIqf3a1BX4zf203+ZE0/JM8a7wL
 dOPoH7NAfyz5ul2F67bR1IOF8vC6TidpavzR2+HC/MocHYXb6Bgfvt+i4EcrfuMf
 9lnBfajgklKr9sNJniwvvR1pWVg+YyG3VeC6T8tIC/xzbCmIoNT+5b3q2XPSIHq1
 EuQTeXH9CBFXS0qcFlq2ktR1xd1Lx95hKwZpqLwLFDmfgjhV2QU=
 =/84P
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - Some bug fixes

 - The new vdpa subsystem with two first drivers

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio-balloon: Revert "virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM"
  vdpa: move to drivers/vdpa
  virtio: Intel IFC VF driver for VDPA
  vdpasim: vDPA device simulator
  vhost: introduce vDPA-based backend
  virtio: introduce a vDPA based transport
  vDPA: introduce vDPA bus
  vringh: IOTLB support
  vhost: factor out IOTLB
  vhost: allow per device message handler
  vhost: refine vhost and vringh kconfig
  virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
  virtio-net: Introduce hash report feature
  virtio-net: Introduce RSS receive steering feature
  virtio-net: Introduce extended RSC feature
  tools/virtio: option to build an out of tree module
2020-04-08 10:51:53 -07:00
Michal Simek 06e85c7e9a asm-generic: fix unistd_32.h generation format
Generated files are also checked by sparse that's why add newline to
remove sparse (C=1) warning.

The issue was found on Microblaze and reported like this:
./arch/microblaze/include/generated/uapi/asm/unistd_32.h:438:45: warning:
no newline at end of file

Mips and PowerPC have it already but let's align with style used by m68k.

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com> (xtensa)
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/4d32ab4e1fb2edb691d2e1687e8fb303c09fd023.1581504803.git.michal.simek@xilinx.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:42 -07:00
David Hildenbrand ed7f9fec8c powernv/memtrace: always online added memory blocks
Let's always try to online the re-added memory blocks.  In case
add_memory() already onlined the added memory blocks, the first
device_online() call will fail and stop processing the remaining memory
blocks.

This avoids manually having to check memhp_auto_online.

Note: PPC always onlines all hotplugged memory directly from the kernel as
well - something that is handled by user space on other architectures.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Yumei Huang <yuhuang@redhat.com>
Link: http://lkml.kernel.org/r/20200317104942.11178-5-david@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:40 -07:00
Anshuman Khandual 03911132aa mm/vma: replace all remaining open encodings with is_vm_hugetlb_page()
This replaces all remaining open encodings with is_vm_hugetlb_page().

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Will Deacon <will@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1582520593-30704-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:37 -07:00
Anshuman Khandual 3122e80efc mm/vma: make vma_is_accessible() available for general use
Lets move vma_is_accessible() helper to include/linux/mm.h which makes it
available for general use.  While here, this replaces all remaining open
encodings for VMA access check with vma_is_accessible().

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Guo Ren <guoren@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:37 -07:00
Linus Torvalds d38c07afc3 powerpc updates for 5.7
- A large series from Nick for 64-bit to further rework our exception vectors,
    and rewrite portions of the syscall entry/exit and interrupt return in C. The
    result is much easier to follow code that is also faster in general.
 
  - Cleanup of our ptrace code to split various parts out that had become badly
    intertwined with #ifdefs over the years.
 
  - Changes to our NUMA setup under the PowerVM hypervisor which should
    hopefully avoid non-sensical topologies which can lead to warnings from the
    workqueue code and other problems.
 
  - MAINTAINERS updates to remove some of our old orphan entries and update the
    status of others.
 
  - Quite a few other small changes and fixes all over the map.
 
 Thanks to:
   Abdul Haleem, afzal mohammed, Alexey Kardashevskiy, Andrew Donnellan, Aneesh
   Kumar K.V, Balamuruhan S, Cédric Le Goater, Chen Zhou, Christophe JAILLET,
   Christophe Leroy, Christoph Hellwig, Clement Courbet, Daniel Axtens, David
   Gibson, Douglas Miller, Fabiano Rosas, Fangrui Song, Ganesh Goudar, Gautham R.
   Shenoy, Greg Kroah-Hartman, Greg Kurz, Gustavo Luiz Duarte, Hari Bathini, Ilie
   Halip, Jan Kara, Joe Lawrence, Joe Perches, Kajol Jain, Larry Finger,
   Laurentiu Tudor, Leonardo Bras, Libor Pechacek, Madhavan Srinivasan, Mahesh
   Salgaonkar, Masahiro Yamada, Masami Hiramatsu, Mauricio Faria de Oliveira,
   Michael Neuling, Michal Suchanek, Mike Rapoport, Nageswara R Sastry, Nathan
   Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers,
   Oliver O'Halloran, Po-Hsu Lin, Pratik Rajesh Sampat, Rasmus Villemoes, Ravi
   Bangoria, Roman Bolshakov, Sam Bobroff, Sandipan Das, Santosh S, Sedat Dilek,
   Segher Boessenkool, Shilpasri G Bhat, Sourabh Jain, Srikar Dronamraju, Stephen
   Rothwell, Tyrel Datwyler, Vaibhav Jain, YueHaibing.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl6JypATHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgOTyD/0U90tXb3VXlQcc4OFIb8vWIj76k4Zn
 ZSZ7RyOuvb5pCISBZjSK79XkR9eMHT77qagX4V41q64k4yQl8nbgLeVnwL76hLLc
 IJCs23f4nsO0uqX/MhSCc5dfOOOS2i8V+OQYtsYWsH5QaG95v0cHIqVaHHMlfQxu
 507GO/W5W6KTd4x008b5unQOuE51zMKlKvqEJXkT59obQFpaa2S5Wn7OzhsnarCH
 YSRNxaC7vtgBKLA9wUnFh8UUbh0FbOwXBCaq4OhHMhgRihdteVBCzlcR/6c+IRbt
 EoZxKzfQ0hI1z5f++kJNaRXMtUbSpM8D1HdKKHgiWjpdBSD0eu2X106KQT2R2ZOF
 qhX8xPLWNzdBglA6L43AaZUu+4ayd3QrrJIkjDv/K1rCHZjfGOzSQfoZgTEBNLFA
 tC0crhEfw8m98e4EwhCtekGQxdczRdLS9YvtC/h6mU2xkpA35yNSwB1/iuVQdkYD
 XyrEqImAQ1PJla7NL0hxSy5ZxrBtMeKT4WZZ0BNgKXryemldg8Tuv3AEyach3BHz
 eU0pIwpbnPm1JAPyrpDQ1yEf7QsD77gTPfEvilEci60R9DhvIMGAY+pt0qfME3yX
 wOLp2yVBEXlRmvHk/y/+r+m4aCsmwSrikbWwmLLwAAA6JehtzFOWxTEfNpACP23V
 mZyyZznsHIIE3Q==
 =ARdm
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Slightly late as I had to rebase mid-week to insert a bug fix:

   - A large series from Nick for 64-bit to further rework our exception
     vectors, and rewrite portions of the syscall entry/exit and
     interrupt return in C. The result is much easier to follow code
     that is also faster in general.

   - Cleanup of our ptrace code to split various parts out that had
     become badly intertwined with #ifdefs over the years.

   - Changes to our NUMA setup under the PowerVM hypervisor which should
     hopefully avoid non-sensical topologies which can lead to warnings
     from the workqueue code and other problems.

   - MAINTAINERS updates to remove some of our old orphan entries and
     update the status of others.

   - Quite a few other small changes and fixes all over the map.

  Thanks to: Abdul Haleem, afzal mohammed, Alexey Kardashevskiy, Andrew
  Donnellan, Aneesh Kumar K.V, Balamuruhan S, Cédric Le Goater, Chen
  Zhou, Christophe JAILLET, Christophe Leroy, Christoph Hellwig, Clement
  Courbet, Daniel Axtens, David Gibson, Douglas Miller, Fabiano Rosas,
  Fangrui Song, Ganesh Goudar, Gautham R. Shenoy, Greg Kroah-Hartman,
  Greg Kurz, Gustavo Luiz Duarte, Hari Bathini, Ilie Halip, Jan Kara,
  Joe Lawrence, Joe Perches, Kajol Jain, Larry Finger, Laurentiu Tudor,
  Leonardo Bras, Libor Pechacek, Madhavan Srinivasan, Mahesh Salgaonkar,
  Masahiro Yamada, Masami Hiramatsu, Mauricio Faria de Oliveira, Michael
  Neuling, Michal Suchanek, Mike Rapoport, Nageswara R Sastry, Nathan
  Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick
  Desaulniers, Oliver O'Halloran, Po-Hsu Lin, Pratik Rajesh Sampat,
  Rasmus Villemoes, Ravi Bangoria, Roman Bolshakov, Sam Bobroff,
  Sandipan Das, Santosh S, Sedat Dilek, Segher Boessenkool, Shilpasri G
  Bhat, Sourabh Jain, Srikar Dronamraju, Stephen Rothwell, Tyrel
  Datwyler, Vaibhav Jain, YueHaibing"

* tag 'powerpc-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits)
  powerpc: Make setjmp/longjmp signature standard
  powerpc/cputable: Remove unnecessary copy of cpu_spec->oprofile_type
  powerpc: Suppress .eh_frame generation
  powerpc: Drop -fno-dwarf2-cfi-asm
  powerpc/32: drop unused ISA_DMA_THRESHOLD
  powerpc/powernv: Add documentation for the opal sensor_groups sysfs interfaces
  selftests/powerpc: Fix try-run when source tree is not writable
  powerpc/vmlinux.lds: Explicitly retain .gnu.hash
  powerpc/ptrace: move ptrace_triggered() into hw_breakpoint.c
  powerpc/ptrace: create ppc_gethwdinfo()
  powerpc/ptrace: create ptrace_get_debugreg()
  powerpc/ptrace: split out ADV_DEBUG_REGS related functions.
  powerpc/ptrace: move register viewing functions out of ptrace.c
  powerpc/ptrace: split out TRANSACTIONAL_MEM related functions.
  powerpc/ptrace: split out SPE related functions.
  powerpc/ptrace: split out ALTIVEC related functions.
  powerpc/ptrace: split out VSX related functions.
  powerpc/ptrace: drop PARAMETER_SAVE_AREA_OFFSET
  powerpc/ptrace: drop unnecessary #ifdefs CONFIG_PPC64
  powerpc/ptrace: remove unused header includes
  ...
2020-04-05 11:12:59 -07:00
Nicholas Piggin d16a58f885 powerpc: Improve ppc_save_regs()
Make ppc_save_regs() a bit more useful:
  - Set NIP to our caller rather rather than the caller's
    caller (which is what we save to LR in the stack frame).
  - Set SOFTE to the current irq soft-mask state rather than
    uninitialised.
  - Zero CFAR rather than leave it uninitialised.

In qemu, injecting a nmi to an idle CPU gives a nicer stack
trace (note NIP, IRQMASK, CFAR).

  Oops: System Reset, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc2-00429-ga76e38fd80bf #1277
  NIP:  c0000000000b6e5c LR: c0000000000b6e5c CTR: c000000000b06270
  REGS: c00000000173fb08 TRAP: 0100   Not tainted
  MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28000224  XER: 00000000
  CFAR: c0000000016a2128 IRQMASK: c00000000173fc80
  GPR00: c0000000000b6e5c c00000000173fc80 c000000001743400 c00000000173fb08
  GPR04: 0000000000000000 0000000000000000 0000000000000008 0000000000000001
  GPR08: 00000001fea80000 0000000000000000 0000000000000000 ffffffffffffffff
  GPR12: c000000000b06270 c000000001930000 00000000300026c0 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000003 c0000000016a2128
  GPR20: c0000001ffc97148 0000000000000001 c000000000f289a8 0000000000080000
  GPR24: c0000000016e1480 000000011dc870ba 0000000000000000 0000000000000003
  GPR28: c0000000016a2128 c0000001ffc97148 c0000000016a2260 0000000000000003
  NIP [c0000000000b6e5c] power9_idle_type+0x5c/0x70
  LR [c0000000000b6e5c] power9_idle_type+0x5c/0x70
  Call Trace:
  [c00000000173fc80] [c0000000000b6e5c] power9_idle_type+0x5c/0x70 (unreliable)
  [c00000000173fcb0] [c000000000b062b0] stop_loop+0x40/0x60
  [c00000000173fce0] [c000000000b022d8] cpuidle_enter_state+0xa8/0x660
  [c00000000173fd60] [c000000000b0292c] cpuidle_enter+0x4c/0x70
  [c00000000173fda0] [c00000000017624c] call_cpuidle+0x4c/0x90
  [c00000000173fdc0] [c000000000176768] do_idle+0x338/0x460
  [c00000000173fe60] [c000000000176b3c] cpu_startup_entry+0x3c/0x40
  [c00000000173fe90] [c0000000000126b4] rest_init+0x124/0x140
  [c00000000173fed0] [c0000000010948d4] start_kernel+0x938/0x988
  [c00000000173ff90] [c00000000000cdcc] start_here_common+0x1c/0x20

  Oops: System Reset, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc2-00430-gddce91b8712f #1278
  NIP:  c00000000001d150 LR: c0000000000b6e5c CTR: c000000000b06270
  REGS: c00000000173fb08 TRAP: 0100   Not tainted
  MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28000224  XER: 00000000
  CFAR: 0000000000000000 IRQMASK: 1
  GPR00: c0000000000b6e5c c00000000173fc80 c000000001743400 c00000000173fb08
  GPR04: 0000000000000000 0000000000000000 0000000000000008 0000000000000001
  GPR08: 00000001fea80000 0000000000000000 0000000000000000 ffffffffffffffff
  GPR12: c000000000b06270 c000000001930000 00000000300026c0 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000003 c0000000016a2128
  GPR20: c0000001ffc97148 0000000000000001 c000000000f289a8 0000000000080000
  GPR24: c0000000016e1480 00000000b68db8ce 0000000000000000 0000000000000003
  GPR28: c0000000016a2128 c0000001ffc97148 c0000000016a2260 0000000000000003
  NIP [c00000000001d150] replay_system_reset+0x30/0xa0
  LR [c0000000000b6e5c] power9_idle_type+0x5c/0x70
  Call Trace:
  [c00000000173fc80] [c0000000000b6e5c] power9_idle_type+0x5c/0x70 (unreliable)
  [c00000000173fcb0] [c000000000b062b0] stop_loop+0x40/0x60
  [c00000000173fce0] [c000000000b022d8] cpuidle_enter_state+0xa8/0x660
  [c00000000173fd60] [c000000000b0292c] cpuidle_enter+0x4c/0x70
  [c00000000173fda0] [c00000000017624c] call_cpuidle+0x4c/0x90
  [c00000000173fdc0] [c000000000176768] do_idle+0x338/0x460
  [c00000000173fe60] [c000000000176b38] cpu_startup_entry+0x38/0x40
  [c00000000173fe90] [c0000000000126b4] rest_init+0x124/0x140
  [c00000000173fed0] [c0000000010948d4] start_kernel+0x938/0x988
  [c00000000173ff90] [c00000000000cdcc] start_here_common+0x1c/0x20

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200403131006.123243-1-npiggin@gmail.com
2020-04-04 21:40:57 +11:00
Linus Torvalds ff2ae607c6 SPDX patches for 5.7-rc1.
Here are 3 SPDX patches for 5.7-rc1.
 
 One fixes up the SPDX tag for a single driver, while the other two go
 through the tree and add SPDX tags for all of the .gitignore files as
 needed.
 
 Nothing too complex, but you will get a merge conflict with your current
 tree, that should be trivial to handle (one file modified by two things,
 one file deleted.)
 
 All 3 of these have been in linux-next for a while, with no reported
 issues other than the merge conflict.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodg5A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykySQCgy9YDrkz7nWq6v3Gohl6+lW/L+rMAnRM4uTZm
 m5AuCzO3Azt9KBi7NL+L
 =2Lm5
 -----END PGP SIGNATURE-----

Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx

Pull SPDX updates from Greg KH:
 "Here are three SPDX patches for 5.7-rc1.

  One fixes up the SPDX tag for a single driver, while the other two go
  through the tree and add SPDX tags for all of the .gitignore files as
  needed.

  Nothing too complex, but you will get a merge conflict with your
  current tree, that should be trivial to handle (one file modified by
  two things, one file deleted.)

  All three of these have been in linux-next for a while, with no
  reported issues other than the merge conflict"

* tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
  ASoC: MT6660: make spdxcheck.py happy
  .gitignore: add SPDX License Identifier
  .gitignore: remove too obvious comments
2020-04-03 13:12:26 -07:00
Nicholas Piggin abc3fce76a Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled"
This reverts commit ebb37cf3ff.

That commit does not play well with soft-masked irq state
manipulations in idle, interrupt replay, and possibly others due to
tracing code sometimes using irq_work_queue (e.g., in
trace_hardirqs_on()). That can cause PACA_IRQ_DEC to become set when
it is not expected, and be ignored or cleared or cause warnings.

The net result seems to be missing an irq_work until the next timer
interrupt in the worst case which is usually not going to be noticed,
however it could be a long time if the tick is disabled, which is
against the spirit of irq_work and might cause real problems.

The idea is still solid, but it would need more work. It's not really
clear if it would be worth added complexity, so revert this for
now (not a straight revert, but replace with a comment explaining why
we might see interrupts happening, and gives git blame something to
find).

Fixes: ebb37cf3ff ("powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200402120401.1115883-1-npiggin@gmail.com
2020-04-03 16:55:34 +11:00
Geert Uytterhoeven 60083063b7 powerpc/time: Replace <linux/clk-provider.h> by <linux/of_clk.h>
The PowerPC time code is not a clock provider, and just needs to call
of_clk_init().

Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>.

Remove the #ifdef protecting the of_clk_init() call, as a stub is
available for the !CONFIG_COMMON_CLK case.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200213083804.24315-1-geert+renesas@glider.be
2020-04-03 16:17:22 +11:00
Dan Williams f6d2b802f8 Merge branch 'for-5.7/libnvdimm' into libnvdimm-for-next
- Introduce 'zero_page_range' as a dax operation. This facilitates
  filesystem-dax operation without a block-device.

- Advertise a persistence-domain for of_pmem and papr_scm. The
  persistence domain indicates where cpu-store cycles need to reach in
  the platform-memory subsystem before the platform will consider them
  power-fail protected.

- Fixup some flexible-array declarations.
2020-04-02 19:55:17 -07:00
Dan Williams d3b88655c0 Merge branch 'for-5.7/numa' into libnvdimm-for-next
- Promote numa_map_to_online_node() to a cross-kernel generic facility.

- Save x86 numa information to allow for node-id lookups for reserved
  memory ranges, deploy that capability for the e820-pmem driver.

- Introduce phys_to_target_node() to facilitate drivers that want to
  know resulting numa node if a given reserved address range was
  onlined.
2020-04-02 19:50:31 -07:00
Alexey Kardashevskiy 54fc3c681d powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory
Unlike normal memory ("memory" compatible type in the FDT), the
persistent memory ("ibm,pmemory" in the FDT) can be mapped anywhere in
the guest physical space and it can be used for DMA.

In order to maintain 1:1 mapping via the huge DMA window, we need to
know the maximum physical address at the time of the window setup. So
far we've been looking at "memory" nodes but "ibm,pmemory" does not
have fixed addresses and the persistent memory may be mapped
afterwards.

Since the persistent memory is still backed with page structs, use
MAX_PHYSMEM_BITS as the upper limit.

This effectively disables huge DMA window in LPAR under pHyp if
persistent memory is present but this is the best we can do for the
moment.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200331012338.23773-1-aik@ozlabs.ru
2020-04-03 12:22:47 +11:00
Linus Torvalds bef7b2a7be Devicetree updates for v5.7:
- Unit test for overlays with GPIO hogs
 
 - Improve dma-ranges parsing to handle dma-ranges with multiple entries
 
 - Update dtc to upstream version v1.6.0-2-g87a656ae5ff9
 
 - Improve overlay error reporting
 
 - Device link support for power-domains and hwlocks bindings
 
 - Add vendor prefixes for Beacon, Topwise, ENE, Dell, SG Micro, Elida,
   PocketBook, Xiaomi, Linutronix, OzzMaker, Waveshare Electronics, and
   ITE Tech
 
 - Add deprecated Marvell vendor prefix 'mrvl'
 
 - A bunch of binding conversions to DT schema continues. Of note, the
   common serial and USB connector bindings are converted.
 
 - Add more Arm CPU compatibles
 
 - Drop Mark Rutland as DT maintainer :(
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAl6GTJUQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw8vaD/9HDT0EEDRdfr8s5OiLHYQMSXlSExJgR+ON
 iLI4v19apWwyycil176RXuWL5fFxh9HHOKbxOL27mmfqSLdv/xe/ev6Z4oNLVZzm
 YiUalm37K+R9yWsWTqxJKtQxWqSBBtOlyJmUYnvxI1bjfF4PANKF+IMpQVDflj1w
 9UMixoPRFdlOoMXVmC7yT0sfE1B6REJIh0Qfa4pqDwnHRhs8imgoIYA+JwFnmFPr
 k5Q8xS2YCwopH/7W5aRLzyEI2dF6Rfy9FID8jfyNoKZLtBW6cIEkA4zTv41BZSuj
 4Zj2etELXAnkIWMZXXdeKSz8IoSOlgjsCR482QmzTRuoFdmVLoJgsgh9hO2+oK07
 vJDEVJnXKxq1pF6DVbYwuzNgPB1pRzg6Y8EitdH3SBgjsOQ/m4RpKaqN0MkQhYrB
 BI9PCiF7GKCzNfNl3b/Op+HSsZ6aJkOsQPzC89/Ww0mB8azypgAwvLsjte5R3hib
 fS8z+V+lxAAC0/M981TsPKJ96HNv4VXbLI3bY9qJBd2LB+NDortsF/OkUuObsh+6
 hVQCWnae5V5iq8WmMTFABfBAK5lHafop87edn6X33LOyZZQCT9svNz/zrpakXh0e
 /SBGTUudtjCOy/BLj1VQrb6tczEzos7Ij+bszeIBAcSzrWg8gtVPjw9TvuAo2b/b
 9nwR4yrm0Q==
 =db/l
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:

 - Unit test for overlays with GPIO hogs

 - Improve dma-ranges parsing to handle dma-ranges with multiple entries

 - Update dtc to upstream version v1.6.0-2-g87a656ae5ff9

 - Improve overlay error reporting

 - Device link support for power-domains and hwlocks bindings

 - Add vendor prefixes for Beacon, Topwise, ENE, Dell, SG Micro, Elida,
   PocketBook, Xiaomi, Linutronix, OzzMaker, Waveshare Electronics, and
   ITE Tech

 - Add deprecated Marvell vendor prefix 'mrvl'

 - A bunch of binding conversions to DT schema continues. Of note, the
   common serial and USB connector bindings are converted.

 - Add more Arm CPU compatibles

 - Drop Mark Rutland as DT maintainer :(

* tag 'devicetree-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (106 commits)
  MAINTAINERS: drop an old reference to stm32 pwm timers doc
  MAINTAINERS: dt: update etnaviv file reference
  dt-bindings: usb: dwc2: fix bindings for amlogic, meson-gxbb-usb
  dt-bindings: uniphier-system-bus: fix warning in the example
  dt-bindings: display: meson-vpu: fix indentation of reg-names' "items"
  dt-bindings: iio: Fix adi, ltc2983 uint64-matrix schema constraints
  dt-bindings: power: Fix example for power-domain
  dt-bindings: arm: Add some constraints for PSCI nodes
  of: some unittest overlays not untracked
  of: gpio unittest kfree() wrong object
  dt-bindings: phy: convert phy-rockchip-inno-usb2 bindings to yaml
  dt-bindings: serial: sh-sci: Convert to json-schema
  dt-bindings: serial: Document serialN aliases
  dt-bindings: thermal: tsens: Set 'additionalProperties: false'
  dt-bindings: thermal: tsens: Fix nvmem-cell-names schema
  dt-bindings: vendor-prefixes: Add Beacon vendor prefix
  dt-bindings: vendor-prefixes: Add Topwise
  of: of_private.h: Replace zero-length array with flexible-array member
  docs: dt: fix a broken reference to input.yaml
  docs: dt: fix references to ap806-system-controller.txt
  ...
2020-04-02 17:32:52 -07:00
Linus Torvalds 79f51b7b9c SCSI misc on 20200402
update changing all our txt files to rst ones.  Excluding that, we
 have the usual driver updates (qla2xxx, ufs, lpfc, zfcp, ibmvfc,
 pm80xx, aacraid), a treewide update for scnprintf and some other minor
 updates.  The major core update is Hannes moving functions out of the
 aacraid driver and into the core.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXoYKiyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishSasAP4iGwSB
 Y8tFaZgWadu76+wj5MdqTBoXdhnIuFF0rZG3pQEAiIKdsfQlbSFdm75+gUtx5hG/
 GOilX/pJczTRJDCGNis=
 =g7Sk
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series has a huge amount of churn because it pulls in Mauro's doc
  update changing all our txt files to rst ones.

  Excluding that, we have the usual driver updates (qla2xxx, ufs, lpfc,
  zfcp, ibmvfc, pm80xx, aacraid), a treewide update for scnprintf and
  some other minor updates.

  The major core change is Hannes moving functions out of the aacraid
  driver and into the core"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (223 commits)
  scsi: aic7xxx: aic97xx: Remove FreeBSD-specific code
  scsi: ufs: Do not rely on prefetched data
  scsi: dc395x: remove dc395x_bios_param
  scsi: libiscsi: Fix error count for active session
  scsi: hpsa: correct race condition in offload enabled
  scsi: message: fusion: Replace zero-length array with flexible-array member
  scsi: qedi: Add PCI shutdown handler support
  scsi: qedi: Add MFW error recovery process
  scsi: ufs: Enable block layer runtime PM for well-known logical units
  scsi: ufs-qcom: Override devfreq parameters
  scsi: ufshcd: Let vendor override devfreq parameters
  scsi: ufshcd: Update the set frequency to devfreq
  scsi: ufs: Resume ufs host before accessing ufs device
  scsi: ufs-mediatek: customize the delay for enabling host
  scsi: ufs: make HCE polling more compact to improve initialization latency
  scsi: ufs: allow custom delay prior to host enabling
  scsi: ufs-mediatek: use common delay function
  scsi: ufs: introduce common and flexible delay function
  scsi: ufs: use an enum for host capabilities
  scsi: ufs: fix uninitialized tx_lanes in ufshcd_disable_tx_lcc()
  ...
2020-04-02 17:03:53 -07:00
Linus Torvalds 8c1b724ddb ARM:
* GICv4.1 support
 * 32bit host removal
 
 PPC:
 * secure (encrypted) using under the Protected Execution Framework
 ultravisor
 
 s390:
 * allow disabling GISA (hardware interrupt injection) and protected
 VMs/ultravisor support.
 
 x86:
 * New dirty bitmap flag that sets all bits in the bitmap when dirty
 page logging is enabled; this is faster because it doesn't require bulk
 modification of the page tables.
 * Initial work on making nested SVM event injection more similar to VMX,
 and less buggy.
 * Various cleanups to MMU code (though the big ones and related
 optimizations were delayed to 5.8).  Instead of using cr3 in function
 names which occasionally means eptp, KVM too has standardized on "pgd".
 * A large refactoring of CPUID features, which now use an array that
 parallels the core x86_features.
 * Some removal of pointer chasing from kvm_x86_ops, which will also be
 switched to static calls as soon as they are available.
 * New Tigerlake CPUID features.
 * More bugfixes, optimizations and cleanups.
 
 Generic:
 * selftests: cleanups, new MMU notifier stress test, steal-time test
 * CSV output for kvm_stat.
 
 KVM/MIPS has been broken since 5.5, it does not compile due to a patch committed
 by MIPS maintainers.  I had already prepared a fix, but the MIPS maintainers
 prefer to fix it in generic code rather than KVM so they are taking care of it.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6GOnIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMfxwf/ZKLZiRoaovXCOG71M/eHtQb8ZIqU
 3MPy+On3eC5Sk/aBxWUL9EFZsbYG6kYdbZ1VOvG9XPBoLlnkDSm/IR0kaELHtnjj
 oGVda/tvGn46Ne39y8xBptmb91WDcWH0vFthT/CwlMxAw3xjr+gG7Qyo+8F2CW6m
 SSSuLiHSBnyO1cQKruBTHZ8qnR8LlnfXEqtd6Y4LFLic0LbLIoIdRcT3wjQrcZrm
 Djd7wbTEYZjUfoqZ72ekwEDUsONcDLDSKcguDO9pSMSCGhpxCVT5Vy68KRpoIMs2
 nzNWDKjvqQo5zb2+GWxJgkd12Hv+n7PCXZMbVrWBu1pQsewUns9m4mkpGw==
 =6fGt
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - GICv4.1 support

   - 32bit host removal

  PPC:
   - secure (encrypted) using under the Protected Execution Framework
     ultravisor

  s390:
   - allow disabling GISA (hardware interrupt injection) and protected
     VMs/ultravisor support.

  x86:
   - New dirty bitmap flag that sets all bits in the bitmap when dirty
     page logging is enabled; this is faster because it doesn't require
     bulk modification of the page tables.

   - Initial work on making nested SVM event injection more similar to
     VMX, and less buggy.

   - Various cleanups to MMU code (though the big ones and related
     optimizations were delayed to 5.8). Instead of using cr3 in
     function names which occasionally means eptp, KVM too has
     standardized on "pgd".

   - A large refactoring of CPUID features, which now use an array that
     parallels the core x86_features.

   - Some removal of pointer chasing from kvm_x86_ops, which will also
     be switched to static calls as soon as they are available.

   - New Tigerlake CPUID features.

   - More bugfixes, optimizations and cleanups.

  Generic:
   - selftests: cleanups, new MMU notifier stress test, steal-time test

   - CSV output for kvm_stat"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
  x86/kvm: fix a missing-prototypes "vmread_error"
  KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
  KVM: VMX: Add a trampoline to fix VMREAD error handling
  KVM: SVM: Annotate svm_x86_ops as __initdata
  KVM: VMX: Annotate vmx_x86_ops as __initdata
  KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
  KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
  KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
  KVM: VMX: Configure runtime hooks using vmx_x86_ops
  KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
  KVM: x86: Move init-only kvm_x86_ops to separate struct
  KVM: Pass kvm_init()'s opaque param to additional arch funcs
  s390/gmap: return proper error code on ksm unsharing
  KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
  KVM: Fix out of range accesses to memslots
  KVM: X86: Micro-optimize IPI fastpath delay
  KVM: X86: Delay read msr data iff writes ICR MSR
  KVM: PPC: Book3S HV: Add a capability for enabling secure guests
  KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
  KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
  ...
2020-04-02 15:13:15 -07:00
Linus Torvalds 7f218319ca Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
 "Just a couple of updates for linux-5.7:

   - A new Kconfig option to enable IMA architecture specific runtime
     policy rules needed for secure and/or trusted boot, as requested.

   - Some message cleanup (eg. pr_fmt, additional error messages)"

* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  ima: add a new CONFIG for loading arch-specific policies
  integrity: Remove duplicate pr_fmt definitions
  IMA: Add log statements for failure conditions
  IMA: Update KBUILD_MODNAME for IMA files to ima
2020-04-02 14:49:46 -07:00
Linus Torvalds 6cad420cc6 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
 "A large amount of MM, plenty more to come.

  Subsystems affected by this patch series:
   - tools
   - kthread
   - kbuild
   - scripts
   - ocfs2
   - vfs
   - mm: slub, kmemleak, pagecache, gup, swap, memcg, pagemap, mremap,
         sparsemem, kasan, pagealloc, vmscan, compaction, mempolicy,
         hugetlbfs, hugetlb"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (155 commits)
  include/linux/huge_mm.h: check PageTail in hpage_nr_pages even when !THP
  mm/hugetlb: fix build failure with HUGETLB_PAGE but not HUGEBTLBFS
  selftests/vm: fix map_hugetlb length used for testing read and write
  mm/hugetlb: remove unnecessary memory fetch in PageHeadHuge()
  mm/hugetlb.c: clean code by removing unnecessary initialization
  hugetlb_cgroup: add hugetlb_cgroup reservation docs
  hugetlb_cgroup: add hugetlb_cgroup reservation tests
  hugetlb: support file_region coalescing again
  hugetlb_cgroup: support noreserve mappings
  hugetlb_cgroup: add accounting for shared mappings
  hugetlb: disable region_add file_region coalescing
  hugetlb_cgroup: add reservation accounting for private mappings
  mm/hugetlb_cgroup: fix hugetlb_cgroup migration
  hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
  hugetlb_cgroup: add hugetlb_cgroup reservation counter
  hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race
  hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
  mm/memblock.c: remove redundant assignment to variable max_addr
  mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
  mm: mempolicy: use VM_BUG_ON_VMA in queue_pages_test_walk()
  ...
2020-04-02 13:55:34 -07:00
Pingfan Liu e03d1f7834 mm/sparse: rename pfn_present() to pfn_in_present_section()
After introducing mem sub section concept, pfn_present() loses its literal
meaning, and will not be necessary a truth on partial populated mem
section.

Since all of the callers use it to judge an absent section, it is better
to rename pfn_present() as pfn_in_present_section().

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Leonardo Bras <leonardo@linux.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Link: http://lkml.kernel.org/r/1581919110-29575-1-git-send-email-kernelfans@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:30 -07:00
Peter Xu 4064b98270 mm: allow VM_FAULT_RETRY for multiple times
The idea comes from a discussion between Linus and Andrea [1].

Before this patch we only allow a page fault to retry once.  We achieved
this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time.  This was majorly used to avoid
unexpected starvation of the system by looping over forever to handle the
page fault on a single page.  However that should hardly happen, and after
all for each code path to return a VM_FAULT_RETRY we'll first wait for a
condition (during which time we should possibly yield the cpu) to happen
before VM_FAULT_RETRY is really returned.

This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
flag when we receive VM_FAULT_RETRY.  It means that the page fault handler
now can retry the page fault for multiple times if necessary without the
need to generate another page fault event.  Meanwhile we still keep the
FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
page fault is the first attempt or not.

Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):

  - ALLOW_RETRY and !TRIED:  this means the page fault allows to
                             retry, and this is the first try

  - ALLOW_RETRY and TRIED:   this means the page fault allows to
                             retry, and this is not the first try

  - !ALLOW_RETRY and !TRIED: this means the page fault does not allow
                             to retry at all

  - !ALLOW_RETRY and TRIED:  this is forbidden and should never be used

In existing code we have multiple places that has taken special care of
the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY).  This patch introduces a simple helper to detect
the first retry of a page fault by checking against both (fault_flags &
FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now
even the 2nd try will have the ALLOW_RETRY set, then use that helper in
all existing special paths.  One example is in __lock_page_or_retry(), now
we'll drop the mmap_sem only in the first attempt of page fault and we'll
keep it in follow up retries, so old locking behavior will be retained.

This will be a nice enhancement for current code [2] at the same time a
supporting material for the future userfaultfd-writeprotect work, since in
that work there will always be an explicit userfault writeprotect retry
for protected pages, and if that cannot resolve the page fault (e.g., when
userfaultfd-writeprotect is used in conjunction with swapped pages) then
we'll possibly need a 3rd retry of the page fault.  It might also benefit
other potential users who will have similar requirement like userfault
write-protection.

GUP code is not touched yet and will be covered in follow up patch.

Please read the thread below for more information.

[1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
[2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:30 -07:00
Peter Xu dde1607248 mm: introduce FAULT_FLAG_DEFAULT
Although there're tons of arch-specific page fault handlers, most of them
are still sharing the same initial value of the page fault flags.  Say,
merely all of the page fault handlers would allow the fault to be retried,
and they also allow the fault to respond to SIGKILL.

Let's define a default value for the fault flags to replace those initial
page fault flags that were copied over.  With this, it'll be far easier to
introduce new fault flag that can be used by all the architectures instead
of touching all the archs.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:29 -07:00
Peter Xu c9a0dad162 powerpc/mm: use helper fault_signal_pending()
Let powerpc code to use the new helper, by moving the signal handling
earlier before the retry logic.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160222.9422-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:29 -07:00
Anshuman Khandual 7969f2264f mm/vma: make vma_is_foreign() available for general use
Idea of a foreign VMA with respect to the present context is very generic.
But currently there are two identical definitions for this in powerpc and
x86 platforms.  Lets consolidate those redundant definitions while making
vma_is_foreign() available for general use later.  This should not cause
any functional change.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Link: http://lkml.kernel.org/r/1582782965-3274-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:29 -07:00
Masahiro Yamada 630f289b71 asm-generic: make more kernel-space headers mandatory
Change a header to mandatory-y if both of the following are met:

[1] At least one architecture (except um) specifies it as generic-y in
    arch/*/include/asm/Kbuild

[2] Every architecture (except um) either has its own implementation
    (arch/*/include/asm/*.h) or specifies it as generic-y in
    arch/*/include/asm/Kbuild

This commit was generated by the following shell script.

----------------------------------->8-----------------------------------

arches=$(cd arch; ls -1 | sed -e '/Kconfig/d' -e '/um/d')

tmpfile=$(mktemp)

grep "^mandatory-y +=" include/asm-generic/Kbuild > $tmpfile

find arch -path 'arch/*/include/asm/Kbuild' |
	xargs sed -n 's/^generic-y += \(.*\)/\1/p' | sort -u |
while read header
do
	mandatory=yes

	for arch in $arches
	do
		if ! grep -q "generic-y += $header" arch/$arch/include/asm/Kbuild &&
			! [ -f arch/$arch/include/asm/$header ]; then
			mandatory=no
			break
		fi
	done

	if [ "$mandatory" = yes ]; then
		echo "mandatory-y += $header" >> $tmpfile

		for arch in $arches
		do
			sed -i "/generic-y += $header/d" arch/$arch/include/asm/Kbuild
		done
	fi

done

sed -i '/^mandatory-y +=/d' include/asm-generic/Kbuild

LANG=C sort $tmpfile >> include/asm-generic/Kbuild

----------------------------------->8-----------------------------------

One obvious benefit is the diff stat:

 25 files changed, 52 insertions(+), 557 deletions(-)

It is tedious to list generic-y for each arch that needs it.

So, mandatory-y works like a fallback default (by just wrapping
asm-generic one) when arch does not have a specific header
implementation.

See the following commits:

def3f7cefe
a1b39bae16

It is tedious to convert headers one by one, so I processed by a shell
script.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200210175452.5030-1-masahiroy@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:25 -07:00
Michal Suchanek 7c0eda1a04 powerpc/perf: split callchain.c by bitness
Building callchain.c with !COMPAT proved quite ugly with all the
defines. Splitting out the 32bit and 64bit parts looks better.

No code change intended.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a20027bf1074935a7934ee2a6757c99ea047e70d.1584699455.git.msuchanek@suse.de
2020-04-03 00:10:00 +11:00
Michal Suchanek 6e944aed88 powerpc/64: Make COMPAT user-selectable disabled on littleendian by default.
On bigendian ppc64 it is common to have 32bit legacy binaries but much
less so on littleendian.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/41393d6e895b0d3a47ee62f8f51e1cf888ad6226.1584699455.git.msuchanek@suse.de
2020-04-03 00:10:00 +11:00
Michal Suchanek 0a7601b6ff powerpc/64: make buildable without CONFIG_COMPAT
There are numerous references to 32bit functions in generic and 64bit
code so ifdef them out.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e5619617020ef3a1f54f0c076e7d74cb9ec9f3bf.1584699455.git.msuchanek@suse.de
2020-04-03 00:10:00 +11:00
Michal Suchanek 2910428106 powerpc/perf: consolidate valid_user_sp -> invalid_user_sp
Merge the 32bit and 64bit version.

Halve the check constants on 32bit.

Use STACK_TOP since it is defined.

Passing is_64 is now redundant since is_32bit_task() is used to
determine which callchain variant should be used. Use STACK_TOP and
is_32bit_task() directly.

This removes a page from the valid 32bit area on 64bit:
 #define TASK_SIZE_USER32 (0x0000000100000000UL - (1 * PAGE_SIZE))
 #define STACK_TOP_USER32 TASK_SIZE_USER32

Change return value to bool. It is inverted by users anyway.

Change to invalid_user_sp to avoid inverting the return value twice.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/be8e40fc0737fb28ad08b198552dee7cac1c5ce2.1584699455.git.msuchanek@suse.de
2020-04-03 00:10:00 +11:00
Michal Suchanek d6c19bdee2 powerpc/perf: consolidate read_user_stack_32
There are two almost identical copies for 32bit and 64bit.

The function is used only in 32bit code which will be split out in next
patch so consolidate to one function.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0c21c919ed1296420199c78f7c3cfd29d3c7e909.1584699455.git.msuchanek@suse.de
2020-04-03 00:09:59 +11:00
Michal Suchanek 3dd4eb83a9 powerpc: move common register copy functions from signal_32.c to signal.c
These functions are required for 64bit as well.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9fd6d9b7c5e91fab21159fe23534a2f16b4962d3.1584699455.git.msuchanek@suse.de
2020-04-03 00:09:59 +11:00
Michal Suchanek 9e62ccec3b powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
This partially reverts commit caf6f9c8a3 ("asm-generic: Remove
unneeded __ARCH_WANT_SYS_LLSEEK macro")

When CONFIG_COMPAT is disabled on ppc64 the kernel does not build.

There is resistance to both removing the llseek syscall from the 64bit
syscall tables and building the llseek interface unconditionally.


Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/lkml/20190828151552.GA16855@infradead.org/
Link: https://lore.kernel.org/lkml/20190829214319.498c7de2@naga/
Link: https://lore.kernel.org/r/dd4575c51e31766e87f7e7fa121d099ab78d3290.1584699455.git.msuchanek@suse.de
2020-04-03 00:09:59 +11:00
Geoff Levand d3883fa078 powerpc/ps3: Set CONFIG_UEVENT_HELPER=y in ps3_defconfig
Set CONFIG_UEVENT_HELPER=y in ps3_defconfig.

commit 1be01d4a57 (driver: base: Disable
CONFIG_UEVENT_HELPER by default) disabled the CONFIG_UEVENT_HELPER option
that is needed for hotplug and module loading by most older 32bit powerpc
distributions that users typically install on the PS3.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/410cda9aa1a6e04434dfe1f9aa2103d0694f706c.1585340156.git.geoff@infradead.org
2020-04-03 00:09:59 +11:00
Markus Elfring 7ee417497a powerpc/ps3: Remove duplicate error message
Remove a duplicate memory allocation failure error message.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1bc5a16a22c487c478a204ebb7b80a22d2ad9cd0.1585340156.git.geoff@infradead.org
2020-04-03 00:09:58 +11:00
Anju T Sudhakar 4bdd39460b powerpc/powernv: Re-enable imc trace-mode in kernel
commit <249fad734a25> ""powerpc/perf: Disable trace_imc pmu"
disables IMC(In-Memory Collection) trace-mode in kernel, since frequent
mode switching between accumulation mode and trace mode via the spr LDBAR
in the hardware can trigger a checkstop(system crash).

Patch to re-enable imc-trace mode in kernel.

The previous patch(1/2) in this series will address the mode switching issue
by implementing a global lock, and will restrict the usage of
accumulation and trace-mode at a time.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200313055238.8656-2-anju@linux.vnet.ibm.com
2020-04-03 00:09:58 +11:00
Anju T Sudhakar a36e8ba60b powerpc/perf: Implement a global lock to avoid races between trace, core and thread imc events.
IMC(In-memory Collection Counters) does performance monitoring in
two different modes, i.e accumulation mode(core-imc and thread-imc events),
and trace mode(trace-imc events). A cpu thread can either be in
accumulation-mode or trace-mode at a time and this is done via the LDBAR
register in POWER architecture. The current design does not address the
races between thread-imc and trace-imc events.

Patch implements a global id and lock to avoid the races between
core, trace and thread imc events. With this global id-lock
implementation, the system can either run core, thread or trace imc
events at a time. i.e. to run any core-imc events, thread/trace imc events
should not be enabled/monitored.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200313055238.8656-1-anju@linux.vnet.ibm.com
2020-04-03 00:09:58 +11:00
Ganesh Goudar a95a0a1654 powerpc/pseries: Fix MCE handling on pseries
MCE handling on pSeries platform fails as recent rework to use common
code for pSeries and PowerNV in machine check error handling tries to
access per-cpu variables in realmode. The per-cpu variables may be
outside the RMO region on pSeries platform and needs translation to be
enabled for access. Just moving these per-cpu variable into RMO region
did'nt help because we queue some work to workqueues in real mode, which
again tries to touch per-cpu variables. Also fwnmi_release_errinfo()
cannot be called when translation is not enabled.

This patch fixes this by enabling translation in the exception handler
when all required real mode handling is done. This change only affects
the pSeries platform.

Without this fix below kernel crash is seen on injecting
SLB multihit:

BUG: Unable to handle kernel data access on read at 0xc00000027b205950
Faulting instruction address: 0xc00000000003b7e0
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: mcetest_slb(OE+) af_packet(E) xt_tcpudp(E) ip6t_rpfilter(E) ip6t_REJECT(E) ipt_REJECT(E) xt_conntrack(E) ip_set(E) nfnetlink(E) ebtable_nat(E) ebtable_broute(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) xfs(E) ibmveth(E) vmx_crypto(E) gf128mul(E) uio_pdrv_genirq(E) uio(E) crct10dif_vpmsum(E) rtc_generic(E) btrfs(E) libcrc32c(E) xor(E) zstd_decompress(E) zstd_compress(E) raid6_pq(E) sr_mod(E) sd_mod(E) cdrom(E) ibmvscsi(E) scsi_transport_srp(E) crc32c_vpmsum(E) dm_mod(E) sg(E) scsi_mod(E)
CPU: 34 PID: 8154 Comm: insmod Kdump: loaded Tainted: G OE 5.5.0-mahesh #1
NIP: c00000000003b7e0 LR: c0000000000f2218 CTR: 0000000000000000
REGS: c000000007dcb960 TRAP: 0300 Tainted: G OE (5.5.0-mahesh)
MSR: 8000000000001003 <SF,ME,RI,LE> CR: 28002428 XER: 20040000
CFAR: c0000000000f2214 DAR: c00000027b205950 DSISR: 40000000 IRQMASK: 0
GPR00: c0000000000f2218 c000000007dcbbf0 c000000001544800 c000000007dcbd70
GPR04: 0000000000000001 c000000007dcbc98 c008000000d00258 c0080000011c0000
GPR08: 0000000000000000 0000000300000003 c000000001035950 0000000003000048
GPR12: 000000027a1d0000 c000000007f9c000 0000000000000558 0000000000000000
GPR16: 0000000000000540 c008000001110000 c008000001110540 0000000000000000
GPR20: c00000000022af10 c00000025480fd70 c008000001280000 c00000004bfbb300
GPR24: c000000001442330 c00800000800000d c008000008000000 4009287a77000510
GPR28: 0000000000000000 0000000000000002 c000000001033d30 0000000000000001
NIP [c00000000003b7e0] save_mce_event+0x30/0x240
LR [c0000000000f2218] pseries_machine_check_realmode+0x2c8/0x4f0
Call Trace:
Instruction dump:
3c4c0151 38429050 7c0802a6 60000000 fbc1fff0 fbe1fff8 f821ffd1 3d42ffaf
3fc2ffaf e98d0030 394a1150 3bdef530 <7d6a62aa> 1d2b0048 2f8b0063 380b0001
---[ end trace 46fd63f36bbdd940 ]---

Fixes: 9ca766f989 ("powerpc/64s/pseries: machine check convert to use common event code")
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200320110119.10207-1-ganeshgr@linux.ibm.com
2020-04-03 00:09:58 +11:00
Nicholas Piggin 0c89649a70 powerpc/64s: Fix doorbell wakeup msgclr optimisation
Commit 3282a3da25 ("powerpc/64: Implement soft interrupt replay in C")
broke the doorbell wakeup optimisation introduced by commit a9af97aa0a
("powerpc/64s: msgclr when handling doorbell exceptions from system
reset").

This patch restores the msgclr, in C code. It's now done in the system
reset wakeup path rather than doorbell interrupt replay where it used
to be, because it is always the right thing to do in the wakeup case,
but it may be rarely of use in other interrupt replay situations in
which case it's wasted work - we would have to run measurements to see
if that was a worthwhile optimisation, and I suspect it would not be.

The results are similar to those in the original commit, test on POWER8
of context_switch selftests benchmark with polling idle disabled (e.g.,
always nap, giving cross-CPU IPIs) gives the following results:

                                  broken           patched
  Different threads, same core:   317k/s           375k/s    +18.7%
  Different cores:                280k/s           282k/s     +1.0%

Fixes: 3282a3da25 ("powerpc/64: Implement soft interrupt replay in C")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200402121212.1118218-1-npiggin@gmail.com
2020-04-03 00:09:53 +11:00
Linus Torvalds 50a5de895d hmm related patches for 5.7
This series focuses on corner case bug fixes and general clarity
 improvements to hmm_range_fault().
 
 - 9 bug fixes
 
 - Allow pgmap to track the 'owner' of a DEVICE_PRIVATE - in this case the
   owner tells the driver if it can understand the DEVICE_PRIVATE page or
   not. Use this to resolve a bug in nouveau where it could touch
   DEVICE_PRIVATE pages from other drivers.
 
 - Remove a bunch of dead, redundant or unused code and flags
 
 - Clarity improvements to hmm_range_fault()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl6CUDEACgkQOG33FX4g
 mxqXgQ/+I2o4QIq86xTddRCesINab73sA4bdvIYgixojTBvo5a9Tu9JO3nmzJOEi
 mmUXsdN+MnyTdnrkcfbEvo0b5ktCGlwGYQmcRv9MrB1QvSChg6Zy3F3cWR+dsv5c
 Mo3z0sWRx+//dVSI0p1BMhd5rym0nZCh/rUO94C/XgZ3YoC+MjX4lKeCMvjJpZFc
 DDEN3Z+zjOvjeiT9TMNMmkPnZSY+N4RQwTRKek5l95QwcHyFy5SBI+uzOHyE0lVw
 KG5+yk+TFRMoiHjZh4BFqkibQbVKG0qMKiOymIncTr3kL4DK0Hwf/4Zk4DMKucEb
 Rs2B7C+UShiyIOzdjbnBsqPiHevAhR3nOpozJy3x/Z6fLapVoIlVx3UJJ/djuxZa
 7+H2VzxKz1xGRDH4Js/WD0smZ8jisA04vW5THJVFr0Vd2sqKBo3G/5ay2Kw9NX7T
 MRII1KkA/luZbmM6WLEQkliVuNkMCsVUU3hiY96tLI7uN73M9fQUe6s3NubeoFxi
 UKg0zuMsAlsJxkHGeY+IMHtUR/law+k9/aZoB4idMGwV/i7KiiolbRcB0H5yn4L5
 OV+4uxVBFS/n37sCpMwpx5WJtgbPzww3b6Cd0hfIUqnQzSOjP40Pl5ZX/c/2M7ps
 bUdZjve5j653YeC/7NBPDprvzbfmyyxJdZZZzzXLQl5kyL2o9LE=
 =UEpV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull hmm updates from Jason Gunthorpe:
 "This series focuses on corner case bug fixes and general clarity
  improvements to hmm_range_fault(). It arose from a review of
  hmm_range_fault() by Christoph, Ralph and myself.

  hmm_range_fault() is being used by these 'SVM' style drivers to
  non-destructively read the page tables. It is very similar to
  get_user_pages() except that the output is an array of PFNs and
  per-pfn flags, and it has various modes of reading.

  This is necessary before RDMA ODP can be converted, as we don't want
  to have weird corner case regressions, which is still a looking
  forward item. Ralph has a nice tester for this routine, but it is
  waiting for feedback from the selftests maintainers.

  Summary:

   - 9 bug fixes

   - Allow pgmap to track the 'owner' of a DEVICE_PRIVATE - in this case
     the owner tells the driver if it can understand the DEVICE_PRIVATE
     page or not. Use this to resolve a bug in nouveau where it could
     touch DEVICE_PRIVATE pages from other drivers.

   - Remove a bunch of dead, redundant or unused code and flags

   - Clarity improvements to hmm_range_fault()"

* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (25 commits)
  mm/hmm: return error for non-vma snapshots
  mm/hmm: do not set pfns when returning an error code
  mm/hmm: do not unconditionally set pfns when returning EBUSY
  mm/hmm: use device_private_entry_to_pfn()
  mm/hmm: remove HMM_FAULT_SNAPSHOT
  mm/hmm: remove unused code and tidy comments
  mm/hmm: return the fault type from hmm_pte_need_fault()
  mm/hmm: remove pgmap checking for devmap pages
  mm/hmm: check the device private page owner in hmm_range_fault()
  mm: simplify device private page handling in hmm_range_fault
  mm: handle multiple owners of device private pages in migrate_vma
  memremap: add an owner field to struct dev_pagemap
  mm: merge hmm_vma_do_fault into into hmm_vma_walk_hole_
  mm/hmm: don't handle the non-fault case in hmm_vma_walk_hole_()
  mm/hmm: simplify hmm_vma_walk_hugetlb_entry()
  mm/hmm: remove the unused HMM_FAULT_ALLOW_RETRY flag
  mm/hmm: don't provide a stub for hmm_range_fault()
  mm/hmm: do not check pmd_protnone twice in hmm_vma_handle_pmd()
  mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  mm/hmm: return -EFAULT when setting HMM_PFN_ERROR on requested valid pages
  ...
2020-04-01 17:57:52 -07:00
Jason Wang 20c384f1ea vhost: refine vhost and vringh kconfig
Currently, CONFIG_VHOST depends on CONFIG_VIRTUALIZATION. But vhost is
not necessarily for VM since it's a generic userspace and kernel
communication protocol. Such dependency may prevent archs without
virtualization support from using vhost.

To solve this, a dedicated vhost menu is created under drivers so
CONIFG_VHOST can be decoupled out of CONFIG_VIRTUALIZATION.

While at it, also squash Kconfig.vringh into vhost Kconfig file. This
avoids the trick of conditional inclusion from VOP or CAIF. Then it
will be easier to introduce new vringh users and common dependency for
both vringh and vhost.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200326140125.19794-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-01 12:06:26 -04:00
Clement Courbet c17eb4dca5 powerpc: Make setjmp/longjmp signature standard
Declaring setjmp()/longjmp() as taking longs makes the signature
non-standard, and makes clang complain. In the past, this has been
worked around by adding -ffreestanding to the compile flags.

The implementation looks like it only ever propagates the value
(in longjmp) or sets it to 1 (in setjmp), and we only call longjmp
with integer parameters.

This allows removing -ffreestanding from the compilation flags.

Fixes: c9029ef9c9 ("powerpc: Avoid clang warnings around setjmp and longjmp")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Clement Courbet <courbet@google.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200330080400.124803-1-courbet@google.com
2020-04-01 14:30:51 +11:00
Leonardo Bras 41b8426fdb powerpc/cputable: Remove unnecessary copy of cpu_spec->oprofile_type
Before checking for cpu_type == NULL, this same copy happens, so doing
it here will just write the same value to the t->oprofile_type
again.

Remove the repeated copy, as it is unnecessary.

Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
Reviewed-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200215053637.280880-1-leonardo@linux.ibm.com
2020-04-01 14:30:51 +11:00
Naveen N. Rao ba96301ce9 powerpc: Suppress .eh_frame generation
GCC v8 defaults to enabling -fasynchronous-unwind-tables due to
https://gcc.gnu.org/r259298, which results in .eh_frame section being
generated. This results in additional disk usage by the build, as well
as the kernel modules. Since the kernel has no use for this, this
section is discarded.

Add -fno-asynchronous-unwind-tables to KBUILD_CFLAGS to suppress
generation of .eh_frame section. Note that our VDSOs need .eh_frame, but
are not affected by this change since our VDSO code are all in assembly.

Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1ed7cd84a7d1a3180b30c0c60e70eed8bb8b40c3.1583415544.git.naveen.n.rao@linux.vnet.ibm.com
2020-04-01 14:30:51 +11:00
Naveen N. Rao c04868df38 powerpc: Drop -fno-dwarf2-cfi-asm
The original commit/discussion adding -fno-dwarf2-cfi-asm refers to
R_PPC64_REL32 relocations not being handled by our module loader:
http://lkml.kernel.org/r/20090224065112.GA6690@bombadil.infradead.org

However, that is now handled thanks to commit 9f751b82b4
("powerpc/module: Add support for R_PPC64_REL32 relocations").

So, drop this flag from our Makefile.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9b22a064de6eb1301d92177eb3a38559df7005d3.1583415544.git.naveen.n.rao@linux.vnet.ibm.com
2020-04-01 14:30:51 +11:00
Mike Rapoport b77afad84e powerpc/32: drop unused ISA_DMA_THRESHOLD
The ISA_DMA_THRESHOLD variable is set by several platforms but never
referenced.
Remove it.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191125092033.20014-1-rppt@kernel.org
2020-04-01 14:30:50 +11:00
Michael Ellerman ead983604c powerpc/vmlinux.lds: Explicitly retain .gnu.hash
Relocatable kernel builds produce a warning about .gnu.hash being an
orphan section:

  ld: warning: orphan section `.gnu.hash' from `linker stubs' being placed in section `.gnu.hash'

If we try to discard it the build fails:

  ld -EL -m elf64lppc -pie --orphan-handling=warn --build-id -o
    .tmp_vmlinux1 -T ./arch/powerpc/kernel/vmlinux.lds --whole-archive
    arch/powerpc/kernel/head_64.o arch/powerpc/kernel/entry_64.o
    ...
    sound/built-in.a net/built-in.a virt/built-in.a --no-whole-archive
    --start-group lib/lib.a --end-group
  ld: could not find section .gnu.hash

So add an entry to explicitly retain it, as we do for .hash.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200227045933.22967-1-mpe@ellerman.id.au
2020-04-01 14:30:50 +11:00
Christophe Leroy ccbed90b82 powerpc/ptrace: move ptrace_triggered() into hw_breakpoint.c
ptrace_triggered() is declared in asm/hw_breakpoint.h and
only needed when CONFIG_HW_BREAKPOINT is set, so move it
into hw_breakpoint.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8402c516023da1371953a65af7df2008758ea0c4.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:49 +11:00
Christophe Leroy da529d4739 powerpc/ptrace: create ppc_gethwdinfo()
Create ippc_gethwdinfo() to handle PPC_PTRACE_GETHWDBGINFO and
reduce ifdef mess

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/82fefcc1ec75b96cece792878217a5d85ecda0c2.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:49 +11:00
Christophe Leroy e08227d25a powerpc/ptrace: create ptrace_get_debugreg()
Create ptrace_get_debugreg() to handle PTRACE_GET_DEBUGREG and
reduce ifdef mess

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c1482c41a39cc216f4073a51070d8680f52d5054.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:49 +11:00
Christophe Leroy 323a780ca1 powerpc/ptrace: split out ADV_DEBUG_REGS related functions.
Move ADV_DEBUG_REGS functions out of ptrace.c, into
ptrace-adv.c and ptrace-noadv.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Squash in fixup patch from Christophe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e2bd7d275bd5933d848aad4fee3ca652a14d039b.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:49 +11:00
Christophe Leroy 6e0b79750c powerpc/ptrace: move register viewing functions out of ptrace.c
Create a dedicated ptrace-view.c file.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bfd8c3ed57c9057e4a5d3816737b5ee98c6f7e43.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:48 +11:00
Christophe Leroy 7c1f8db019 powerpc/ptrace: split out TRANSACTIONAL_MEM related functions.
Move TRANSACTIONAL_MEM functions out of ptrace.c, into
ptrace-tm.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2d0ef3bb2610c0344bd42252c7134f429818c000.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:48 +11:00
Christophe Leroy 60ef9dbd9d powerpc/ptrace: split out SPE related functions.
Move CONFIG_SPE functions out of ptrace.c, into
ptrace-spe.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0f17a331760310b5562fae3791cdd3cf9c64237b.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:48 +11:00
Christophe Leroy 1b20773b00 powerpc/ptrace: split out ALTIVEC related functions.
Move CONFIG_ALTIVEC functions out of ptrace.c, into
ptrace-altivec.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/35dae891d01c817fca0fd6ab406a3a2c7bf07f60.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:48 +11:00
Christophe Leroy 7b99ed4e8e powerpc/ptrace: split out VSX related functions.
Move CONFIG_VSX functions out of ptrace.c, into
ptrace-vsx.c and ptrace-novsx.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dc8e20c8c95b7e83add0c6dd48f9470628896c5c.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:48 +11:00
Christophe Leroy 963ae6b2ff powerpc/ptrace: drop PARAMETER_SAVE_AREA_OFFSET
PARAMETER_SAVE_AREA_OFFSET is not used, drop it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6dac2b49207647f75cbf0e6771a545e691f0fd93.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:47 +11:00
Christophe Leroy f1763e623c powerpc/ptrace: drop unnecessary #ifdefs CONFIG_PPC64
Drop a bunch of #ifdefs CONFIG_PPC64 that are not vital.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/af38b87a7e1e3efe4f9b664eaeb029e6e7d69fdb.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:47 +11:00
Christophe Leroy b3138536c8 powerpc/ptrace: remove unused header includes
Remove unused header includes in ptrace.c and ptrace32.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6276df0be87a4329c2bb46b3b0f02059ae9e70e6.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:47 +11:00
Christophe Leroy da9a1c10e2 powerpc: Move ptrace into a subdirectory.
In order to allow splitting of ptrace depending on the different
CONFIG_ options, create a subdirectory dedicated to ptrace and move
ptrace.c and ptrace32.c into it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9ebcbe37834e9d447dd97f4381084795a673260c.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:41 +11:00
Nicholas Piggin 5f0b6ac390 powerpc/64/syscall: Reconcile interrupts
This reconciles interrupts in the system call case like all other
interrupts. This allows system_call_common to be shared with the scv
system call implementation in a subsequent patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-31-npiggin@gmail.com
2020-04-01 13:42:14 +11:00
Nicholas Piggin 702f098052 powerpc/64s/exception: Remove lite interrupt return
Regular interrupt return restores NVGPRS whereas lite returns do not.
This is clumsy: most interrupts can return without restoring NVGPRS in
most of the time, but there are special cases that require it (when
registers have been modified by the kernel). So change interrupt
return to not restore NVGPRS, and have interrupt handlers restore them
explicitly in the cases that requires it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-30-npiggin@gmail.com
2020-04-01 13:42:14 +11:00
Nicholas Piggin 6cc0c16d82 powerpc/64s: Implement interrupt exit logic in C
Implement the bulk of interrupt return logic in C. The asm return code
must handle a few cases: restoring full GPRs, and emulating stack
store.

The stack store emulation is significantly simplfied, rather than
creating a new return frame and switching to that before performing
the store, it uses the PACA to keep a scratch register around to
perform the store.

The asm return code is moved into 64e for now. The new logic has made
allowance for 64e, but I don't have a full environment that works well
to test it, and even booting in emulated qemu is not great for stress
testing. 64e shouldn't be too far off working with this, given a bit
more testing and auditing of the logic.

This is slightly faster on a POWER9 (page fault speed increases about
1.1%), probably due to reduced mtmsrd.

mpe: Includes fixes from Nick for _TIF_EMULATE_STACK_STORE
handling (including the fast_interrupt_return path), to remove
trace_hardirqs_on(), and fixes the interrupt-return part of the
MSR_VSX restore bug caught by tm-unavailable selftest.

mpe: Incorporate fix from Nick:

The return-to-kernel path has to replay any soft-pending interrupts if
it is returning to a context that had interrupts soft-enabled. It has
to do this carefully and avoid plain enabling interrupts if this is an
irq context, which can cause multiple nesting of interrupts on the
stack, and other unexpected issues.

The code which avoided this case got the soft-mask state wrong, and
marked interrupts as enabled before going around again to retry. This
seems to be mostly harmless except when PREEMPT=y, this calls
preempt_schedule_irq with irqs apparently enabled and runs into a BUG
in kernel/sched/core.c

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-29-npiggin@gmail.com
2020-04-01 13:42:14 +11:00
Nicholas Piggin 3282a3da25 powerpc/64: Implement soft interrupt replay in C
When local_irq_enable() finds a pending soft-masked interrupt, it
"replays" it by setting up registers like the initial interrupt entry,
then calls into the low level handler to set up an interrupt stack
frame and process the interrupt.

This is not necessary, and uses more stack than needed. The high level
interrupt handler can be called directly from C, with just pt_regs set
up on stack. This should be faster and use less stack.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-28-npiggin@gmail.com
2020-04-01 13:42:13 +11:00
Nicholas Piggin 993c670a4d powerpc/64/syscall: Zero volatile registers when returning
Kernel addresses and potentially other sensitive data could be leaked
in volatile registers after a syscall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-27-npiggin@gmail.com
2020-04-01 13:42:13 +11:00
Nicholas Piggin 68b34588e2 powerpc/64/sycall: Implement syscall entry/exit logic in C
System call entry and particularly exit code is beyond the limit of
what is reasonable to implement in asm.

This conversion moves all conditional branches out of the asm code,
except for the case that all GPRs should be restored at exit.

Null syscall test is about 5% faster after this patch, because the
exit work is handled under local_irq_disable, and the hard mask and
pending interrupt replay is handled after that, which avoids games
with MSR.

mpe: Includes subsequent fixes from Nick:

This fixes 4 issues caught by TM selftests. First was a tm-syscall bug
that hit due to tabort_syscall being called after interrupts were
reconciled (in a subsequent patch), which led to interrupts being
enabled before tabort_syscall was called. Rather than going through an
un-reconciling interrupts for the return, I just go back to putting
the test early in asm, the C-ification of that wasn't a big win
anyway.

Second is the syscall return _TIF_USER_WORK_MASK check would go into
an infinite loop if _TIF_RESTORE_TM became set. The asm code uses
_TIF_USER_WORK_MASK to brach to slowpath which includes
restore_tm_state.

Third is system call return was not calling restore_tm_state, I missed
this completely (alhtough it's in the return from interrupt C
conversion because when the asm syscall code encountered problems it
would branch to the interrupt return code.

Fourth is MSR_VEC missing from restore_math, which was caught by
tm-unavailable selftest taking an unexpected facility unavailable
interrupt when testing VSX unavailble exception with MSR.FP=1
MSR.VEC=1. Fourth case also has a fixup in a subsequent patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-26-npiggin@gmail.com
2020-04-01 13:42:13 +11:00
Nicholas Piggin f14f8a2032 powerpc/64/sstep: Ifdef the deprecated fast endian switch syscall
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-25-npiggin@gmail.com
2020-04-01 13:42:13 +11:00
Nicholas Piggin 965dd3ad30 powerpc/64/syscall: Remove non-volatile GPR save optimisation
powerpc has an optimisation where interrupts avoid saving the
non-volatile (or callee saved) registers to the interrupt stack frame
if they are not required.

Two problems with this are that an interrupt does not always know
whether it will need non-volatiles; and if it does need them, they can
only be saved from the entry-scoped asm code (because we don't control
what the C compiler does with these registers).

system calls are the most difficult: some system calls always require
all registers (e.g., fork, to copy regs into the child). Sometimes
registers are only required under certain conditions (e.g., tracing,
signal delivery). These cases require ugly logic in the call
chains (e.g., ppc_fork), and require a lot of logic to be implemented
in asm.

So remove the optimisation for system calls, and always save NVGPRs on
entry. Modern high performance CPUs are not so sensitive, because the
stores are dense in cache and can be hidden by other expensive work in
the syscall path -- the null syscall selftests benchmark on POWER9 is
not slowed (124.40ns before and 123.64ns after, i.e., within the
noise).

Other interrupts retain the NVGPR optimisation for now.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-24-npiggin@gmail.com
2020-04-01 13:42:13 +11:00
Nicholas Piggin 71c3b05a80 powerpc/64s/exception: Soft NMI interrupt should not use ret_from_except
The soft NMI handler does not reconcile interrupt state, so it should
not return via the normal ret_from_except path. Return like other NMIs,
using the EXCEPTION_RESTORE_REGS macro.

This becomes important when the scv interrupt is implemented, which
must handle soft-masked interrupts that have r13 set to something
other than the PACA -- returning to kernel in this case must restore
r13.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-23-npiggin@gmail.com
2020-04-01 13:42:13 +11:00
Nicholas Piggin b44fc96d7b powerpc/64s/exception: Reconcile interrupts in system_reset
This adds IRQ_HARD_DIS to irq_happened. Although it doesn't seem to
matter much because we're not allowed to enable irqs in an NMI
handler, the soft-irq debugging code is becoming more strict about
ensuring IRQ_HARD_DIS is in sync with MSR[EE], this may help avoid
asserts or other issues.

Add a comment explaining why MCE does not have this. Early machine
check is generally much smaller and more contained code which will
explode if you look at it wrong anyway as it runs in real mode, though
there's an argument that we should do similar reconciling for the MCE
as well.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-22-npiggin@gmail.com
2020-04-01 13:42:12 +11:00
Nicholas Piggin 2284ffea8f powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported
Apart from SRESET, MCE, and syscall (hcall variant), the SRR type
interrupts are not escalated to hypervisor mode, so are delivered to
the OS.

When running PR KVM, the OS is the hypervisor, and the guest runs with
MSR[PR]=1 (ie. usermode), so these interrupts must test if a guest was
running when interrupted. These tests are required at the real-mode
entry points because the PR KVM host runs with LPCR[AIL]=0.

In HV KVM and nested HV KVM, the guest always receives these
interrupts, so there is no need for the host to make this test. So
remove the tests if PR KVM is not configured.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-21-npiggin@gmail.com
2020-04-01 13:42:12 +11:00
Nicholas Piggin 94325357e8 powerpc/64s/exception: Add more comments for interrupt handlers
A few of the non-standard handlers are left uncommented. Some more
description could be added to some.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-20-npiggin@gmail.com
2020-04-01 13:42:12 +11:00
Nicholas Piggin 3f7fbd97d0 powerpc/64s/exception: Clean up SRR specifiers
Remove more magic numbers and replace with nicely named bools.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-19-npiggin@gmail.com
2020-04-01 13:42:12 +11:00
Nicholas Piggin 689e732262 powerpc/64s/exception: Re-inline some handlers
The reduction in interrupt entry size allows some handlers to be
re-inlined.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-18-npiggin@gmail.com
2020-04-01 13:42:12 +11:00
Nicholas Piggin 2babd6ea43 powerpc/64s/exception: Avoid touching the stack in hdecrementer
The hdec interrupt handler is reported to sometimes fire in Linux if
KVM leaves it pending after a guest exists. This is harmless, so there
is a no-op handler for it.

The interrupt handler currently uses the regular kernel stack. Change
this to avoid touching the stack entirely.

This should be the last place where the regular Linux stack can be
accessed with asynchronous interrupts (including PMI) soft-masked.
It might be possible to take advantage of this invariant, e.g., to
context switch the kernel stack SLB entry without clearing MSR[EE].

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-17-npiggin@gmail.com
2020-04-01 13:42:12 +11:00
Nicholas Piggin 9d598f9344 powerpc/64s/exception: Trim unused arguments from KVMTEST macro
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-16-npiggin@gmail.com
2020-04-01 13:42:11 +11:00
Nicholas Piggin 931dc86b3a powerpc/64s/exception: Remove the SPR saving patch code macros
These are used infrequently enough they don't provide much help, so
inline them.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-15-npiggin@gmail.com
2020-04-01 13:42:11 +11:00
Nicholas Piggin d73a10cbf9 powerpc/64s/exception: Remove confusing IEARLY option
Replace IEARLY=1 and IEARLY=2 with IBRANCH_COMMON, which controls if
the entry code branches to a common handler; and IREALMODE_COMMON,
which controls whether the common handler should remain in real mode.

These special cases no longer avoid loading the SRR registers, there
is no point as most of them load the registers immediately anyway.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-14-npiggin@gmail.com
2020-04-01 13:42:11 +11:00
Nicholas Piggin 9600f261ac powerpc/64s/exception: Move KVM test to common code
This allows more code to be moved out of unrelocated regions. The
system call KVMTEST is changed to be open-coded and remain in the
tramp area to avoid having to move it to entry_64.S. The custom nature
of the system call entry code means the hcall case can be made more
streamlined than regular interrupt handlers.

mpe: Incorporate fix from Nick:

Moving KVM test to the common entry code missed the case of HMI and
MCE, which do not do __GEN_COMMON_ENTRY (because they don't want to
switch to virt mode).

This means a MCE or HMI exception that is taken while KVM is running a
guest context will not be switched out of that context, and KVM won't
be notified. Found by running sigfuz in guest with patched host on
POWER9 DD2.3, which causes some TM related HMI interrupts (which are
expected and supposed to be handled by KVM).

This fix adds a __GEN_REALMODE_COMMON_ENTRY for those handlers to add
the KVM test. This makes them look a little more like other handlers
that all use __GEN_COMMON_ENTRY.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-13-npiggin@gmail.com
2020-04-01 13:42:11 +11:00
Nicholas Piggin 0eddf327e1 powerpc/64s/exception: Move soft-mask test to common code
As well as moving code out of the unrelocated vectors, this allows the
masked handlers to be moved to common code, and allows the soft_nmi
handler to be generated more like a regular handler.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-12-npiggin@gmail.com
2020-04-01 13:42:11 +11:00
Nicholas Piggin 8729c26e67 powerpc/64s/exception: Move real to virt switch into the common handler
The real mode interrupt entry points currently use rfid to branch to
the common handler in virtual mode. This is a significant amount of
code, and forces other code (notably the KVM test) to live in the
real mode handler.

In the interest of minimising the amount of code that runs unrelocated
move the switch to virt mode into the common code, and do it with
mtmsrd, which avoids clobbering SRRs (although the post-KVMTEST
performance of real-mode interrupt handlers is not a big concern these
days).

This requires CTR to always be saved (real-mode needs to reach 0xc...)
but that's not a huge impact these days. It could be optimized away in
future.

mpe: Incorporate fix from Nick:

It's possible for interrupts to be replayed when TM is enabled and
suspended, for example rt_sigreturn, where the mtmsrd MSR_KERNEL in
the real-mode entry point to the common handler causes a TM Bad Thing
exception (due to attempting to clear suspended).

The fix for this is to have replay interrupts go to the _virt entry
point and skip the mtmsrd, which matches what happens before this
patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-11-npiggin@gmail.com
2020-04-01 13:42:11 +11:00
Nicholas Piggin a3cd35be6e powerpc/64s/exception: Add ISIDE option
Rather than using DAR=2 to select the i-side registers, add an
explicit option.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-10-npiggin@gmail.com
2020-04-01 13:42:10 +11:00
Nicholas Piggin b177ae2f8c powerpc/64s/exception: Remove old INT_KVM_HANDLER
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-9-npiggin@gmail.com
2020-04-01 13:42:10 +11:00
Nicholas Piggin 6d71759a74 powerpc/64s/exception: Remove old INT_COMMON macro
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-8-npiggin@gmail.com
2020-04-01 13:42:10 +11:00
Nicholas Piggin fc589ee416 powerpc/64s/exception: Remove old INT_ENTRY macro
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-7-npiggin@gmail.com
2020-04-01 13:42:10 +11:00
Nicholas Piggin 4f50541f67 powerpc/64s/exception: Move all interrupt handlers to new style code gen macros
Aside from label names and BUG line numbers, the generated code change
is an additional HMI KVM handler added for the "late" KVM handler,
because early and late HMI generation is achieved by defining two
different interrupt types.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-6-npiggin@gmail.com
2020-04-01 13:42:10 +11:00
Nicholas Piggin eb204d863b powerpc/64s/exception: Expand EXC_COMMON and EXC_COMMON_ASYNC macros
These don't provide a large amount of code sharing. Removing them
makes code easier to shuffle around. For example, some of the common
instructions will be moved into the common code gen macro.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-5-npiggin@gmail.com
2020-04-01 13:42:10 +11:00
Nicholas Piggin d52fd3d31b powerpc/64s/exception: Add GEN_KVM macro that uses INT_DEFINE parameters
No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-4-npiggin@gmail.com
2020-04-01 13:42:09 +11:00
Nicholas Piggin 7cb3a1a03e powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE parameters
No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-3-npiggin@gmail.com
2020-04-01 13:42:09 +11:00
Nicholas Piggin a42a239db3 powerpc/64s/exception: Introduce INT_DEFINE parameter block for code generation
The code generation macro arguments are difficult to read, and
defaults can't easily be used.

This introduces a block where parameters can be set for interrupt
handler code generation by the subsequent macros, and adds the first
generation macro for interrupt entry.

One interrupt handler is converted to the new macros to demonstrate
the change, the rest will be coverted all at once.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-2-npiggin@gmail.com
2020-04-01 13:42:09 +11:00
Nicholas Piggin a2e366832f powerpc/64: mark emergency stacks valid to unwind
Before:

  WARNING: CPU: 0 PID: 494 at arch/powerpc/kernel/irq.c:343
  CPU: 0 PID: 494 Comm: a Tainted: G        W
  NIP:  c00000000001ed2c LR: c000000000d13190 CTR: c00000000003f910
  REGS: c0000001fffd3870 TRAP: 0700   Tainted: G        W
  MSR:  8000000000021003 <SF,ME,RI,LE>  CR: 28000488  XER: 00000000
  CFAR: c00000000001ec90 IRQMASK: 0
  GPR00: c000000000aeb12c c0000001fffd3b00 c0000000012ba300 0000000000000000
  GPR04: 0000000000000000 0000000000000000 000000010bd207c8 6b00696e74657272
  GPR08: 0000000000000000 0000000000000000 0000000000000000 efbeadde00000000
  GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 000000010bd207bc
  GPR28: 0000000000000000 c00000000148a898 0000000000000000 c0000001ffff3f50
  NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100
  LR [c000000000d13190] _raw_spin_unlock_irqrestore+0x50/0xc0
  Call Trace:
  Instruction dump:
  60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000
  60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000

After:

  WARNING: CPU: 0 PID: 499 at arch/powerpc/kernel/irq.c:343
  CPU: 0 PID: 499 Comm: a Not tainted
  NIP:  c00000000001ed2c LR: c000000000d13210 CTR: c00000000003f980
  REGS: c0000001fffd3870 TRAP: 0700   Not tainted
  MSR:  8000000000021003 <SF,ME,RI,LE>  CR: 28000488  XER: 00000000
  CFAR: c00000000001ec90 IRQMASK: 0
  GPR00: c000000000aeb1ac c0000001fffd3b00 c0000000012ba300 0000000000000000
  GPR04: 0000000000000000 0000000000000000 00000001347607c8 6b00696e74657272
  GPR08: 0000000000000000 0000000000000000 0000000000000000 efbeadde00000000
  GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 00000001347607bc
  GPR28: 0000000000000000 c00000000148a898 0000000000000000 c0000001ffff3f50
  NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100
  LR [c000000000d13210] _raw_spin_unlock_irqrestore+0x50/0xc0
  Call Trace:
  [c0000001fffd3b20] [c000000000aeb1ac] of_find_property+0x6c/0x90
  [c0000001fffd3b70] [c000000000aeb1f0] of_get_property+0x20/0x40
  [c0000001fffd3b90] [c000000000042cdc] rtas_token+0x3c/0x70
  [c0000001fffd3bb0] [c0000000000dc318] fwnmi_release_errinfo+0x28/0x70
  [c0000001fffd3c10] [c0000000000dcd8c] pseries_machine_check_realmode+0x1dc/0x540
  [c0000001fffd3cd0] [c00000000003fe04] machine_check_early+0x54/0x70
  [c0000001fffd3d00] [c000000000008384] machine_check_early_common+0x134/0x1f0
  --- interrupt: 200 at 0x1347607c8
      LR = 0x7fffafbd8328
  Instruction dump:
  60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000
  60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200325104144.158362-1-npiggin@gmail.com
2020-04-01 13:42:09 +11:00
Michael Ellerman c7def7fbde powerpc/64/tm: Don't let userspace set regs->trap via sigreturn
In restore_tm_sigcontexts() we take the trap value directly from the
user sigcontext with no checking:

	err |= __get_user(regs->trap, &sc->gp_regs[PT_TRAP]);

This means we can be in the kernel with an arbitrary regs->trap value.

Although that's not immediately problematic, there is a risk we could
trigger one of the uses of CHECK_FULL_REGS():

	#define CHECK_FULL_REGS(regs)	BUG_ON(regs->trap & 1)

It can also cause us to unnecessarily save non-volatile GPRs again in
save_nvgprs(), which shouldn't be problematic but is still wrong.

It's also possible it could trick the syscall restart machinery, which
relies on regs->trap not being == 0xc00 (see 9a81c16b52 ("powerpc:
fix double syscall restarts")), though I haven't been able to make
that happen.

Finally it doesn't match the behaviour of the non-TM case, in
restore_sigcontext() which zeroes regs->trap.

So change restore_tm_sigcontexts() to zero regs->trap.

This was discovered while testing Nick's upcoming rewrite of the
syscall entry path. In that series the call to save_nvgprs() prior to
signal handling (do_notify_resume()) is removed, which leaves the
low-bit of regs->trap uncleared which can then trigger the FULL_REGS()
WARNs in setup_tm_sigcontexts().

Fixes: 2b0a576d15 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200401023836.3286664-1-mpe@ellerman.id.au
2020-04-01 13:42:00 +11:00
Linus Torvalds 29d9f30d4c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Fix the iwlwifi regression, from Johannes Berg.

   2) Support BSS coloring and 802.11 encapsulation offloading in
      hardware, from John Crispin.

   3) Fix some potential Spectre issues in qtnfmac, from Sergey
      Matyukevich.

   4) Add TTL decrement action to openvswitch, from Matteo Croce.

   5) Allow paralleization through flow_action setup by not taking the
      RTNL mutex, from Vlad Buslov.

   6) A lot of zero-length array to flexible-array conversions, from
      Gustavo A. R. Silva.

   7) Align XDP statistics names across several drivers for consistency,
      from Lorenzo Bianconi.

   8) Add various pieces of infrastructure for offloading conntrack, and
      make use of it in mlx5 driver, from Paul Blakey.

   9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.

  10) Lots of parallelization improvements during configuration changes
      in mlxsw driver, from Ido Schimmel.

  11) Add support to devlink for generic packet traps, which report
      packets dropped during ACL processing. And use them in mlxsw
      driver. From Jiri Pirko.

  12) Support bcmgenet on ACPI, from Jeremy Linton.

  13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
      Starovoitov, and your's truly.

  14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.

  15) Fix sysfs permissions when network devices change namespaces, from
      Christian Brauner.

  16) Add a flags element to ethtool_ops so that drivers can more simply
      indicate which coalescing parameters they actually support, and
      therefore the generic layer can validate the user's ethtool
      request. Use this in all drivers, from Jakub Kicinski.

  17) Offload FIFO qdisc in mlxsw, from Petr Machata.

  18) Support UDP sockets in sockmap, from Lorenz Bauer.

  19) Fix stretch ACK bugs in several TCP congestion control modules,
      from Pengcheng Yang.

  20) Support virtual functiosn in octeontx2 driver, from Tomasz
      Duszynski.

  21) Add region operations for devlink and use it in ice driver to dump
      NVM contents, from Jacob Keller.

  22) Add support for hw offload of MACSEC, from Antoine Tenart.

  23) Add support for BPF programs that can be attached to LSM hooks,
      from KP Singh.

  24) Support for multiple paths, path managers, and counters in MPTCP.
      From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
      and others.

  25) More progress on adding the netlink interface to ethtool, from
      Michal Kubecek"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
  net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
  cxgb4/chcr: nic-tls stats in ethtool
  net: dsa: fix oops while probing Marvell DSA switches
  net/bpfilter: remove superfluous testing message
  net: macb: Fix handling of fixed-link node
  net: dsa: ksz: Select KSZ protocol tag
  netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
  net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
  net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
  net: stmmac: create dwmac-intel.c to contain all Intel platform
  net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
  net: dsa: bcm_sf2: Add support for matching VLAN TCI
  net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
  net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
  net: dsa: bcm_sf2: Disable learning for ASP port
  net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
  net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
  net: dsa: b53: Restore VLAN entries upon (re)configuration
  net: dsa: bcm_sf2: Fix overflow checks
  hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
  ...
2020-03-31 17:29:33 -07:00
Aneesh Kumar K.V 338f6dac85 libnvdimm: Update persistence domain value for of_pmem and papr_scm device
Currently, kernel shows the below values
	"persistence_domain":"cpu_cache"
	"persistence_domain":"memory_controller"
	"persistence_domain":"unknown"

"cpu_cache" indicates no extra instructions is needed to ensure the persistence
of data in the pmem media on power failure.

"memory_controller" indicates cpu cache flush instructions are required to flush
the data. Platform provides mechanisms to automatically flush outstanding
write data from memory controler to pmem on system power loss.

Based on the above use memory_controller for non volatile regions on ppc64.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Link: https://lore.kernel.org/r/20200324034821.60869-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2020-03-31 14:42:28 -07:00
Sean Christopherson b990408537 KVM: Pass kvm_init()'s opaque param to additional arch funcs
Pass @opaque to kvm_arch_hardware_setup() and
kvm_arch_check_processor_compat() to allow architecture specific code to
reference @opaque without having to stash it away in a temporary global
variable.  This will enable x86 to separate its vendor specific callback
ops, which are passed via @opaque, into "init" and "runtime" ops without
having to stash away the "init" ops.

No functional change intended.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Cornelia Huck <cohuck@redhat.com> #s390
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-2-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-31 10:48:03 -04:00
Paolo Bonzini 4f4af841f0 KVM PPC update for 5.7
* Add a capability for enabling secure guests under the Protected
   Execution Framework ultravisor
 
 * Various bug fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJegnq3AAoJEJ2a6ncsY3GfrU8IANpaxS7kTAsW3ZN0wJnP2PHq
 i8j7ZPlCVLpkQWArrMyMimDLdiN9VP7lvaWonAWjG0HJrxlmMUUVnwVSMuPTXhWY
 vSwEqhUXwSF5KKxN7DYkgxqaKFxkElJIubVE/AYJjH9zFpu9ca4vM5sCxnzWcvS3
 hxPNe756nKhFH9xhrC/9NRUZWmDAiv75wvzq+5DRKbSVPJGJugchdIPBbi3Yrr7P
 qxnUCPZdCxzXXU94bfQl038wrjSMR3S7b4FvekJ12go2FalujqzsL2lVtift4Rf0
 jvu+RIINcmTiaQqclY332j7l24LZ6Pni456RygT5OwFxuYoxWKZRafN16JmaMCo=
 =rpJL
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-next-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

KVM PPC update for 5.7

* Add a capability for enabling secure guests under the Protected
  Execution Framework ultravisor

* Various bug fixes and cleanups.
2020-03-31 10:45:49 -04:00
Paolo Bonzini cf39d37539 KVM/arm updates for Linux 5.7
- GICv4.1 support
 - 32bit host removal
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl6DKKIPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDDe0P/30Oda6HJdcUY+g0dnHkH8N7t+VKjPPnihlX
 WBaT0Y4SzMsfAtG5lQqS48A50dXKWW70QvwkZjxu7abQhYFWGd2SGtTQxwqJXT8J
 I6MBh4r9xrIfiqzVT2BXslA6id5H6wCyyFI6vKm/IFkIu1J6JtwnKakQ0CIddS1d
 Blbgj5jcxGw+2xOppHCQXbWwwDdmYWkMZEBZjmhkezddqLDK+oaAUiUhHHHizTsB
 kLjgqYBVENpR1zDIsGpQAJloKXAiHfBQshQAmnhnBNzXE60LZ0n0/iODU9U5FDEO
 5j0DRWccKvsIMsUh7JpPr5xerGJ0rqk1IwPC2JcyzfRbvRLMpK1IOWfhI5Tg5lbP
 4Ev96QLEMBnKOWMSE0MqnMdq6JPzDLA6WZ28HZe2nc3/oWNgsSDtlXigx4xFFxTX
 zfc2YpAgFu3xJkPf8PtWTFvItm0AvFNFynPg0Rr/NsGf/FGeszYR4cLcHmv5NlWS
 IiV4+lgnlmr2LZr3VjUaumbtWIpuVF4Db5Al2K2E/PCN7ObfEkyCweDic8ophkH8
 sMS9TI38aH1Efy+I2Nfxxqpy8BcElZAMrAWt9R27A4JRLHdr7j5DsGnyRigXHgRe
 pFgbqtk/EjWkHwjaJVg8kPxf2+2P05VZsQeGG721nbKAIKDetM3RA2BflexdsptY
 kXplNsVr
 =eILh
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for Linux 5.7

- GICv4.1 support
- 32bit host removal
2020-03-31 10:44:53 -04:00
David S. Miller ed52f2c608 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 19:52:37 -07:00
Linus Torvalds 336622e9fc NOHZ full updates:
- Remove TIF_NOHZ from 3 architectures
 
     These architectures use a static key to decide whether context tracking
     needs to be invoked and the TIF_NOHZ flag just causes a pointless
     slowpath execution for nothing.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B+bITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoZjpD/9PkXE/zQoVmPLhOOcEBXB4i0rQQV41
 mR8F83aswch+qtT1g7A00G5j49CWkLh/hj5PX7ajS9nSTQCHOQ9jdZuxPrjW8CGZ
 gMHCyd5o9C98sKOORylR2nuCKhVdOq0/HleRjBBDsqcO0T5KlhVPUrtuJ878kX8d
 1SnoZnZMx+Ro0+4+Ehp39CmZJ0pV6o5ypT469esa2MB1xw389AQCmLt4rk99FNMo
 LDbKAB+7XBwNAu/rqD0hIv7YyvaSlcdlWBAXBLeCrwVIKQG3VfT9CpgwTtGoNFhY
 9KBkzr0z+lvHS9eKWyWzpXYrgVU1u28gUVvpaavv+Ma5V8STqNunoMBs7hKanJqV
 mPh+4ABACtFieKlwkj2PwUrGEgH+y/SAfStliOFsimVz/w2udC0S777/EjjzfKaN
 NS13mP19s5/P1q3y/6BSrOxYD0inicROO+UfetHNPOgMePY+Gp/xzluefPnhTagX
 CnJxndA3Fbjh9rXFbSZ5TMlf97kTxVVJE+qtrh5Upw1AWpo/qvkLsIFsamgyW2jR
 7t3MbHzKYnLkUJlwOLPJimvZeN4hZOx05ra/RZOkVaxri7xtVsDCkaEvhgEqLWYj
 Gbt2mGnNccawwN0bVPd2hgkKmUBqO8u5llhQcM2BBG4CJgZMaB8LjIkS+F6FsSME
 xMnY+tS3c7Q8TQ==
 =0lHP
 -----END PGP SIGNATURE-----

Merge tag 'timers-nohz-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull NOHZ update from Thomas Gleixner:
 "Remove TIF_NOHZ from three architectures

  These architectures use a static key to decide whether context
  tracking needs to be invoked and the TIF_NOHZ flag just causes a
  pointless slowpath execution for nothing"

* tag 'timers-nohz-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arm64: Remove TIF_NOHZ
  arm: Remove TIF_NOHZ
  x86: Remove TIF_NOHZ
  context-tracking: Introduce CONFIG_HAVE_TIF_NOHZ
  x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY
2020-03-30 18:29:05 -07:00
Linus Torvalds 992a1a3b45 CPU (hotplug) updates:
- Support for locked CSD objects in smp_call_function_single_async()
     which allows to simplify callsites in the scheduler core and MIPS
 
   - Treewide consolidation of CPU hotplug functions which ensures the
     consistency between the sysfs interface and kernel state. The low level
     functions cpu_up/down() are now confined to the core code and not
     longer accessible from random code.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B9VQTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodCyD/0WFYAe7LkOfNjkbLa0IeuyLjF9rnCi
 ilcSXMLpaVwwoQvm7MopwkXUDdmEIyeJ0B641j3mC3AKCRap4+O36H2IEg2byrj7
 twOvQNCfxpVVmCCD11FTH9aQa74LEB6AikTgjevhrRWj6eHsal7c2Ak26AzCgrt+
 0eEkOAOWJbLAlbIiPdHlCZ3TMldcs3gg+lRSYd5QCGQVkZFnwpXzyOvpyJEUGGbb
 R/JuvwJoLhRMiYAJDILoQQQg/J07ODuivse/R8PWaH2djkn+2NyRGrD794PhyyOg
 QoTU0ZrYD3Z48ACXv+N3jLM7wXMcFzjYtr1vW1E3O/YGA7GVIC6XHGbMQ7tEihY0
 ajtwq8DcnpKtuouviYnf7NuKgqdmJXkaZjz3Gms6n8nLXqqSVwuQELWV2CXkxNe6
 9kgnnKK+xXMOGI4TUhN8bejvkXqRCmKMeQJcWyf+7RA9UIhAJw5o7WGo8gXfQWUx
 tazCqDy/inYjqGxckW615fhi2zHfemlYTbSzIGOuMB1TEPKFcrgYAii/VMsYHQVZ
 5amkYUXGQ5brlCOzOn38lzp5OkALBnFzD7xgvOcQgWT3ynVpdqADfBytXiEEHh4J
 KSkSgSSRcS58397nIxnDcJgJouHLvAWYyPZ4UC6mfynuQIic31qMHGVqwdbEKMY3
 4M5dGgqIfOBgYw==
 =jwCg
 -----END PGP SIGNATURE-----

Merge tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core SMP updates from Thomas Gleixner:
 "CPU (hotplug) updates:

   - Support for locked CSD objects in smp_call_function_single_async()
     which allows to simplify callsites in the scheduler core and MIPS

   - Treewide consolidation of CPU hotplug functions which ensures the
     consistency between the sysfs interface and kernel state. The low
     level functions cpu_up/down() are now confined to the core code and
     not longer accessible from random code"

* tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
  cpu/hotplug: Hide cpu_up/down()
  cpu/hotplug: Move bringup of secondary CPUs out of smp_init()
  torture: Replace cpu_up/down() with add/remove_cpu()
  firmware: psci: Replace cpu_up/down() with add/remove_cpu()
  xen/cpuhotplug: Replace cpu_up/down() with device_online/offline()
  parisc: Replace cpu_up/down() with add/remove_cpu()
  sparc: Replace cpu_up/down() with add/remove_cpu()
  powerpc: Replace cpu_up/down() with add/remove_cpu()
  x86/smp: Replace cpu_up/down() with add/remove_cpu()
  arm64: hibernate: Use bringup_hibernate_cpu()
  cpu/hotplug: Provide bringup_hibernate_cpu()
  arm64: Use reboot_cpu instead of hardconding it to 0
  arm64: Don't use disable_nonboot_cpus()
  ARM: Use reboot_cpu instead of hardcoding it to 0
  ARM: Don't use disable_nonboot_cpus()
  ia64: Replace cpu_down() with smp_shutdown_nonboot_cpus()
  cpu/hotplug: Create a new function to shutdown nonboot cpus
  cpu/hotplug: Add new {add,remove}_cpu() functions
  sched/core: Remove rq.hrtick_csd_pending
  ...
2020-03-30 18:06:39 -07:00
Linus Torvalds 9b82f05f86 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "The main changes in this cycle were:

  Kernel side changes:

   - A couple of x86/cpu cleanups and changes were grandfathered in due
     to patch dependencies. These clean up the set of CPU model/family
     matching macros with a consistent namespace and C99 initializer
     style.

   - A bunch of updates to various low level PMU drivers:
       * AMD Family 19h L3 uncore PMU
       * Intel Tiger Lake uncore support
       * misc fixes to LBR TOS sampling

   - optprobe fixes

   - perf/cgroup: optimize cgroup event sched-in processing

   - misc cleanups and fixes

  Tooling side changes are to:

   - perf {annotate,expr,record,report,stat,test}

   - perl scripting

   - libapi, libperf and libtraceevent

   - vendor events on Intel and S390, ARM cs-etm

   - Intel PT updates

   - Documentation changes and updates to core facilities

   - misc cleanups, fixes and other enhancements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits)
  cpufreq/intel_pstate: Fix wrong macro conversion
  x86/cpu: Cleanup the now unused CPU match macros
  hwrng: via_rng: Convert to new X86 CPU match macros
  crypto: Convert to new CPU match macros
  ASoC: Intel: Convert to new X86 CPU match macros
  powercap/intel_rapl: Convert to new X86 CPU match macros
  PCI: intel-mid: Convert to new X86 CPU match macros
  mmc: sdhci-acpi: Convert to new X86 CPU match macros
  intel_idle: Convert to new X86 CPU match macros
  extcon: axp288: Convert to new X86 CPU match macros
  thermal: Convert to new X86 CPU match macros
  hwmon: Convert to new X86 CPU match macros
  platform/x86: Convert to new CPU match macros
  EDAC: Convert to new X86 CPU match macros
  cpufreq: Convert to new X86 CPU match macros
  ACPI: Convert to new X86 CPU match macros
  x86/platform: Convert to new CPU match macros
  x86/kernel: Convert to new CPU match macros
  x86/kvm: Convert to new CPU match macros
  x86/perf/events: Convert to new CPU match macros
  ...
2020-03-30 16:40:08 -07:00
Linus Torvalds 4b9fd8a829 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Continued user-access cleanups in the futex code.

   - percpu-rwsem rewrite that uses its own waitqueue and atomic_t
     instead of an embedded rwsem. This addresses a couple of
     weaknesses, but the primary motivation was complications on the -rt
     kernel.

   - Introduce raw lock nesting detection on lockdep
     (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal
     lock differences. This too originates from -rt.

   - Reuse lockdep zapped chain_hlocks entries, to conserve RAM
     footprint on distro-ish kernels running into the "BUG:
     MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep
     chain-entries pool.

   - Misc cleanups, smaller fixes and enhancements - see the changelog
     for details"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
  thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
  Documentation/locking/locktypes: Minor copy editor fixes
  Documentation/locking/locktypes: Further clarifications and wordsmithing
  m68knommu: Remove mm.h include from uaccess_no.h
  x86: get rid of user_atomic_cmpxchg_inatomic()
  generic arch_futex_atomic_op_inuser() doesn't need access_ok()
  x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
  x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
  objtool: whitelist __sanitizer_cov_trace_switch()
  [parisc, s390, sparc64] no need for access_ok() in futex handling
  sh: no need of access_ok() in arch_futex_atomic_op_inuser()
  futex: arch_futex_atomic_op_inuser() calling conventions change
  completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
  lockdep: Add posixtimer context tracing bits
  lockdep: Annotate irq_work
  lockdep: Add hrtimer context tracing bits
  lockdep: Introduce wait-type checks
  completion: Use simple wait queues
  sched/swait: Prepare usage in completions
  ...
2020-03-30 16:17:15 -07:00
Thomas Gleixner cf226c42b2 Merge branch 'uaccess.futex' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into locking/core
Pull uaccess futex cleanups for Al Viro:

     Consolidate access_ok() usage and the futex uaccess function zoo.
2020-03-28 11:59:24 +01:00
Al Viro a08971e948 futex: arch_futex_atomic_op_inuser() calling conventions change
Move access_ok() in and pagefault_enable()/pagefault_disable() out.
Mechanical conversion only - some instances don't really need
a separate access_ok() at all (e.g. the ones only using
get_user()/put_user(), or architectures where access_ok()
is always true); we'll deal with that in followups.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-03-27 23:58:51 -04:00
Aneesh Kumar K.V 233ba54618 powerpc/64: Avoid isync in flush_dcache_range()
As per ISA an isync is only needed on instruction cache block
invalidate. Remove the same from dcache invalidate.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200320103242.229223-1-aneesh.kumar@linux.ibm.com
2020-03-27 17:37:07 +11:00
Fangrui Song 968339fad4 powerpc/boot: Delete unneeded .globl _zimage_start
.globl sets the symbol binding to STB_GLOBAL while .weak sets the
binding to STB_WEAK. GNU as let .weak override .globl since
binutils-gdb 5ca547dc2399a0a5d9f20626d4bf5547c3ccfddd (1996). Clang
integrated assembler let the last win but it may error in the future.

Since it is a convention that only one binding directive is used, just
delete .globl.

Fixes: ee9d21b3b3 ("powerpc/boot: Ensure _zimage_start is a weak symbol")
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200325164257.170229-1-maskray@google.com
2020-03-27 15:50:06 +11:00
Ganesh Goudar efbc4303b2 powerpc/pseries: Handle UE event for memcpy_mcsafe
memcpy_mcsafe has been implemented for power machines which is used
by pmem infrastructure, so that an UE encountered during memcpy from
pmem devices would not result in panic instead a right error code
is returned. The implementation expects machine check handler to ignore
the event and set nip to continue the execution from fixup code.

Appropriate changes are already made to powernv machine check handler,
make similar changes to pseries machine check handler to ignore the
the event and set nip to continue execution at the fixup entry if we
hit UE at an instruction with a fixup entry.

while we are at it, have a common function which searches the exception
table entry and updates nip with fixup address, and any future common
changes can be made in this function that are valid for both architectures.

powernv changes are made by
commit 895e3dceeb ("powerpc/mce: Handle UE event for memcpy_mcsafe")

Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Santosh S <santosh@fossix.org>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200326184916.31172-1-ganeshgr@linux.ibm.com
2020-03-27 14:59:35 +11:00
Christoph Hellwig 800bb1c8dc mm: handle multiple owners of device private pages in migrate_vma
Add a new src_owner field to struct migrate_vma.  If the field is set,
only device private pages with page->pgmap->owner equal to that field are
migrated.  If the field is not set only "normal" pages are migrated.

Fixes: df6ad69838 ("mm/device-public-memory: device memory cache coherent with CPU")
Link: https://lore.kernel.org/r/20200316193216.920734-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 14:33:38 -03:00
Christoph Hellwig f894ddd5ff memremap: add an owner field to struct dev_pagemap
Add a new opaque owner field to struct dev_pagemap, which will allow the
hmm and migrate_vma code to identify who owns ZONE_DEVICE memory, and
refuse to work on mappings not owned by the calling entity.

Link: https://lore.kernel.org/r/20200316193216.920734-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-26 14:33:37 -03:00
Michael Ellerman c72e8da062 powerpc/smp: Use IS_ENABLED() to avoid #ifdef
We can avoid the #ifdef by using IS_ENABLED() in the existing
condition check.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200313112020.28235-2-mpe@ellerman.id.au
2020-03-27 01:15:09 +11:00
Michael Ellerman 4b4d181d63 powerpc/smp: Drop superfluous NULL check
We don't need the NULL check of np, the result is the same because the
OF helpers cope with NULL, of_node_to_nid(NULL) == NUMA_NO_NODE (-1).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200313112020.28235-1-mpe@ellerman.id.au
2020-03-27 01:15:09 +11:00
Douglas Miller 8ec26c25c3 powerpc/xmon: Add ASCII dump to d1,d2,d4,d8 commands.
The reason debuggers add an ASCII dump to other types of memory dumps
is to give the user visual reference points in the case that ASCII
strings are adjacent to other structures or element.  For example,
when examining the task_struct structure one can look for the comm[]
string and use it to locate other important elements.

ASCII strings do not have endianess, they exist in memory in the same
order regardless of CPU endianess. ASCII strings are, by definition,
human readable and so should be presented in a human readable format.

For these reasons, the supplemental ASCII dump does not re-order
the strings from memory to match the endianess of the corresponding
16, 32, or 64 bit words. That would make the ASCII dump much less
useful.

Signed-off-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1488205694-13337-1-git-send-email-dougmill@linux.vnet.ibm.com
2020-03-27 00:49:44 +11:00
Cédric Le Goater 930914b7d5 powerpc/xive: Add a debugfs file to dump internal XIVE state
As does XMON, the debugfs file /sys/kernel/debug/powerpc/xive exposes
the XIVE internal state of the machine CPUs and interrupts. Available
on the PowerNV and sPAPR platforms.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
[mpe: Make the debugfs file 0400]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306150143.5551-5-clg@kaod.org
2020-03-27 00:20:38 +11:00
Cédric Le Goater 5191e0ba51 powerpc/xmon: Add source flags to output of XIVE interrupts
Some firmwares or hypervisors can advertise different source
characteristics. Track their value under XMON. What we are mostly
interested in is the StoreEOI flag.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306150143.5551-4-clg@kaod.org
2020-03-27 00:19:05 +11:00
Cédric Le Goater 97ef275077 powerpc/xive: Fix xmon support on the PowerNV platform
The PowerNV platform has multiple IRQ chips and the xmon command
dumping the state of the XIVE interrupt should only operate on the
XIVE IRQ chip.

Fixes: 5896163f7f ("powerpc/xmon: Improve output of XIVE interrupts")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306150143.5551-3-clg@kaod.org
2020-03-27 00:19:05 +11:00
Cédric Le Goater b1a504a650 powerpc/xive: Use XIVE_BAD_IRQ instead of zero to catch non configured IPIs
When a CPU is brought up, an IPI number is allocated and recorded
under the XIVE CPU structure. Invalid IPI numbers are tracked with
interrupt number 0x0.

On the PowerNV platform, the interrupt number space starts at 0x10 and
this works fine. However, on the sPAPR platform, it is possible to
allocate the interrupt number 0x0 and this raises an issue when CPU 0
is unplugged. The XIVE spapr driver tracks allocated interrupt numbers
in a bitmask and it is not correctly updated when interrupt number 0x0
is freed. It stays allocated and it is then impossible to reallocate.

Fix by using the XIVE_BAD_IRQ value instead of zero on both platforms.

Reported-by: David Gibson <david@gibson.dropbear.id.au>
Fixes: eac1e731b5 ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306150143.5551-2-clg@kaod.org
2020-03-27 00:19:04 +11:00
Nick Desaulniers a7032637b5 powerpc: Prefer __section and __printf from compiler_attributes.h
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
[mpe: Drop changes to a/p/boot which doesn't use linux headers]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190812215052.71840-10-ndesaulniers@google.com
2020-03-27 00:16:32 +11:00
Paul Mackerras 9a5788c615 KVM: PPC: Book3S HV: Add a capability for enabling secure guests
At present, on Power systems with Protected Execution Facility
hardware and an ultravisor, a KVM guest can transition to being a
secure guest at will.  Userspace (QEMU) has no way of knowing
whether a host system is capable of running secure guests.  This
will present a problem in future when the ultravisor is capable of
migrating secure guests from one host to another, because
virtualization management software will have no way to ensure that
secure guests only run in domains where all of the hosts can
support secure guests.

This adds a VM capability which has two functions: (a) userspace
can query it to find out whether the host can support secure guests,
and (b) userspace can enable it for a guest, which allows that
guest to become a secure guest.  If userspace does not enable it,
KVM will return an error when the ultravisor does the hypercall
that indicates that the guest is starting to transition to a
secure guest.  The ultravisor will then abort the transition and
the guest will terminate.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
2020-03-26 11:09:04 +11:00
Qais Yousef 4d37cc2dc3 powerpc: Replace cpu_up/down() with add/remove_cpu()
The core device API performs extra housekeeping bits that are missing
from directly calling cpu_up/down.

See commit a6717c01dd ("powerpc/rtas: use device model APIs and
serialization during LPM") for an example description of what might go
wrong.

This also prepares to make cpu_up/down() a private interface of the CPU
subsystem.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lkml.kernel.org/r/20200323135110.30522-11-qais.yousef@arm.com
2020-03-25 12:59:35 +01:00
Masahiro Yamada d198b34f38 .gitignore: add SPDX License Identifier
Add SPDX License Identifier to all .gitignore files.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25 11:50:48 +01:00
Fabiano Rosas 7074695ac6 powerpc/prom_init: Remove leftover comment
The if statement that this comment referred to was removed in
commit 11fdb30934 ("powerpc/prom_init: Remove support for OPAL v2").

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200324182912.1048906-1-farosas@linux.ibm.com
2020-03-25 21:15:01 +11:00
Christophe Leroy 21f8b2fa3c powerpc/kprobes: Ignore traps that happened in real mode
When a program check exception happens while MMU translation is
disabled, following Oops happens in kprobe_handler() in the following
code:

	} else if (*addr != BREAKPOINT_INSTRUCTION) {

  BUG: Unable to handle kernel data access on read at 0x0000e268
  Faulting instruction address: 0xc000ec34
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE PAGE_SIZE=16K PREEMPT CMPC885
  Modules linked in:
  CPU: 0 PID: 429 Comm: cat Not tainted 5.6.0-rc1-s3k-dev-00824-g84195dc6c58a #3267
  NIP:  c000ec34 LR: c000ecd8 CTR: c019cab8
  REGS: ca4d3b58 TRAP: 0300   Not tainted  (5.6.0-rc1-s3k-dev-00824-g84195dc6c58a)
  MSR:  00001032 <ME,IR,DR,RI>  CR: 2a4d3c52  XER: 00000000
  DAR: 0000e268 DSISR: c0000000
  GPR00: c000b09c ca4d3c10 c66d0620 00000000 ca4d3c60 00000000 00009032 00000000
  GPR08: 00020000 00000000 c087de44 c000afe0 c66d0ad0 100d3dd6 fffffff3 00000000
  GPR16: 00000000 00000041 00000000 ca4d3d70 00000000 00000000 0000416d 00000000
  GPR24: 00000004 c53b6128 00000000 0000e268 00000000 c07c0000 c07bb6fc ca4d3c60
  NIP [c000ec34] kprobe_handler+0x128/0x290
  LR [c000ecd8] kprobe_handler+0x1cc/0x290
  Call Trace:
  [ca4d3c30] [c000b09c] program_check_exception+0xbc/0x6fc
  [ca4d3c50] [c000e43c] ret_from_except_full+0x0/0x4
  --- interrupt: 700 at 0xe268
  Instruction dump:
  913e0008 81220000 38600001 3929ffff 91220000 80010024 bb410008 7c0803a6
  38210020 4e800020 38600000 4e800020 <813b0000> 6d2a7fe0 2f8a0008 419e0154
  ---[ end trace 5b9152d4cdadd06d ]---

kprobe is not prepared to handle events in real mode and functions
running in real mode should have been blacklisted, so kprobe_handler()
can safely bail out telling 'this trap is not mine' for any trap that
happened while in real-mode.

If the trap happened with MSR_IR or MSR_DR cleared, return 0
immediately.

Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Fixes: 6cc89bad60 ("powerpc/kprobes: Invoke handlers directly")
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/424331e2006e7291a1bfe40e7f3fa58825f565e1.1582054578.git.christophe.leroy@c-s.fr
2020-03-25 12:09:51 +11:00
Nathan Chancellor af6cf95c4d powerpc/maple: Fix declaration made after definition
When building ppc64 defconfig, Clang errors (trimmed for brevity):

  arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration
  must precede definition [-Werror,-Wignored-attributes]
  machine_device_initcall(maple, maple_cpc925_edac_setup);
  ^

machine_device_initcall expands to __define_machine_initcall, which in
turn has the macro machine_is used in it, which declares mach_##name
with an __attribute__((weak)). define_machine actually defines
mach_##name, which in this file happens before the declaration, hence
the warning.

To fix this, move define_machine after machine_device_initcall so that
the declaration occurs before the definition, which matches how
machine_device_initcall and define_machine work throughout
arch/powerpc.

While we're here, remove some spaces before tabs.

Fixes: 8f101a051e ("edac: cpc925 MC platform device setup")
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Suggested-by: Ilie Halip <ilie.halip@gmail.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancellor@gmail.com
2020-03-25 12:09:48 +11:00
Nicholas Piggin adde8715cf powerpc/pseries: Avoid harmless preempt warning
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200320152436.1468651-1-npiggin@gmail.com
2020-03-25 12:09:39 +11:00
Oliver O'Halloran e86350f70a powerpc/eeh: Rework eeh_ops->probe()
With the EEH early probe now being pseries specific there's no need for
eeh_ops->probe() to take a pci_dn. Instead, we can make it take a pci_dev
and use the probe function to map a pci_dev to an eeh_dev. This allows
the platform to implement it's own method for finding (or creating) an
eeh_dev for a given pci_dev which also removes a use of pci_dn in
generic EEH code.

This patch also renames eeh_device_add_late() to eeh_device_probe(). This
better reflects what it does does and removes the last vestiges of the
early/late EEH probe split.

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306073904.4737-6-oohall@gmail.com
2020-03-25 12:09:39 +11:00
Oliver O'Halloran b6eebb093c powerpc/eeh: Make early EEH init pseries specific
The eeh_ops->probe() function is called from two different contexts:

1. On pseries, where we set EEH_PROBE_MODE_DEVTREE, it's called in
   eeh_add_device_early() which is supposed to run before we create
   a pci_dev.

2. On PowerNV, where we set EEH_PROBE_MODE_DEV, it's called in
   eeh_device_add_late() which is supposed to run *after* the
   pci_dev is created.

The "early" probe is required because PAPR requires that we perform an RTAS
call to enable EEH support on a device before we start interacting with it
via config space or MMIO. This requirement doesn't exist on PowerNV and
shoehorning two completely separate initialisation paths into a common
interface just results in a convoluted code everywhere.

Additionally the early probe requires the probe function to take an pci_dn
rather than a pci_dev argument. We'd like to make pci_dn a pseries specific
data structure since there's no real requirement for them on PowerNV. To
help both goals move the early probe into the pseries containment zone
so the platform depedence is more explicit.

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306073904.4737-5-oohall@gmail.com
2020-03-25 12:09:39 +11:00