OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Ganesh Goudar	cc15ff3275	powerpc/mce: Avoid using irq_work_queue() in realmode In realmode mce handler we use irq_work_queue() to defer the processing of mce events, irq_work_queue() can only be called when translation is enabled because it touches memory outside RMA, hence we enable translation before calling irq_work_queue and disable on return, though it is not safe to do in realmode. To avoid this, program the decrementer and call the event processing functions from timer handler. Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220120121931.517974-1-ganeshgr@linux.ibm.com	2022-03-08 00:05:00 +11:00
Ganesh Goudar	0a182611d1	powerpc/mce: Modify the real address error logging messages To avoid ambiguity, modify the strings in real address error logging messages to "foreign/control memory" from "foreign", Since the error discriptions in P9 user manual and P10 user manual are different for same type of errors. P9 User Manual for MCE: DSISR:59 Host real address to foreign space during translation. DSISR:60 Host real address to foreign space on a load or store access. P10 User Manual for MCE: DSISR:59 D-side tablewalk used a host real address in the control memory address range. DSISR:60 D-side operand access to control memory address space. Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220107141428.67862-3-ganeshgr@linux.ibm.com	2022-03-08 00:05:00 +11:00
Ganesh Goudar	0f54bddefe	powerpc/pseries: Parse control memory access error Add support to parse and log control memory access error for pseries. These changes are made according to PAPR v2.11 10.3.2.2.12. Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220107141428.67862-1-ganeshgr@linux.ibm.com	2022-03-08 00:04:59 +11:00
Naveen N. Rao	49c3af43e6	powerpc/bpf: Simplify bpf_to_ppc() and adopt it for powerpc64 Convert bpf_to_ppc() to a macro to help simplify its usage since codegen_context is available in all places it is used. Adopt it also for powerpc64 for uniformity and get rid of the global b2p structure. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/09f0540ce3e0cd4120b5b33993b5e73b6ef9e979.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:59 +11:00
Jordan Niethe	3a3fc9bf10	powerpc64/bpf: Store temp registers' bpf to ppc mapping In bpf_jit_build_body(), the mapping of TMP_REG_1 and TMP_REG_2's bpf register to ppc register is evalulated at every use despite not changing. Instead, determine the ppc register once and store the result. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [Rebased, converted additional usage sites] Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0944e2f0fa6dd254ea401f1c946fb6c9a5294278.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:59 +11:00
Naveen N. Rao	036d559c0b	powerpc/bpf: Use _Rn macros for GPRs Use _Rn macros to specify register names to make their usage clear. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7df626b8cdc6141d4295ac16137c82ad570b6637.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:59 +11:00
Naveen N. Rao	576a6c3a00	powerpc/bpf: Move bpf_jit64.h into bpf_jit_comp64.c There is no need for a separate header anymore. Move the contents of bpf_jit64.h into bpf_jit_comp64.c Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b873a8e6eff7d91bf2a2cabdd53082aadfe20761.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:59 +11:00
Naveen N. Rao	7b187dcdb5	powerpc/bpf: Cleanup bpf_jit.h - PPC_EX32() is only used by ppc32 JIT. Move it to bpf_jit_comp32.c - PPC_LI64() is only valid in ppc64. #ifdef it - PPC_FUNC_ADDR() is not used anymore. Remove it. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/58f5b66b2f8546bbbee620f62103a8e97a63eb7c.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:59 +11:00
Naveen N. Rao	794abc08d7	powerpc64/bpf: Get rid of PPC_BPF_[LL\|STL\|STLU] macros All these macros now have a single user. Expand their usage in place. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e0526fc7633a34f983a7a330712b55bdfaf20482.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:58 +11:00
Naveen N. Rao	391c271f4d	powerpc64/bpf: Convert some of the uses of PPC_BPF_[LL\|STL] to PPC_BPF_[LD\|STD] PPC_BPF_[LL\|STL] are macros meant for scenarios where we may have to deal with a non-word aligned offset. Limit their usage to only those scenarios by converting the rest to just use PPC_BPF_[LD\|STD]. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0eb472428165a307f6fdaf22b0c33cbf13a9a635.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:58 +11:00
Naveen N. Rao	74bbe3f084	powerpc/bpf: Rename PPC_BL_ABS() to PPC_BL() PPC_BL_ABS() is just doing a relative branch with link. The name suggests that it is for branching to an absolute address, which is incorrect. Rename the macro to a more appropriate PPC_BL(). Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f0e57b6c7a6ee40dba645535b70da46f46e8af5e.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:58 +11:00
Naveen N. Rao	feb6307289	powerpc64/bpf: Optimize instruction sequence used for function calls When calling BPF helpers, we load the function address to call into a register. This can result in upto 5 instructions. Optimize this by instead using the kernel toc in r2 and adjusting offset to the BPF helper. This works since all BPF helpers are part of kernel text, and all BPF programs/functions utilize the kernel TOC. Further more: - load the actual function entry address in elf v1, rather than loading it through the function descriptor address. - load the Local Entry Point (LEP) in elf v2 skipping TOC setup. - consolidate code across elf abi v1 and v2 by using r12 on both. Reported-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1233c7544e60dcb021c52b1f840b0f21a87b33ed.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:58 +11:00
Naveen N. Rao	43d636f8b4	powerpc64/bpf elfv1: Do not load TOC before calling functions BPF helpers always reside in core kernel and all BPF programs use the kernel TOC. As such, there is no need to load the TOC before calling helpers or other BPF functions. Drop code to do the same. Add a check to ensure we don't proceed if this assumption ever changes in future. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a3cd3da4d24d95d845cd10382b1af083600c9074.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:58 +11:00
Naveen N. Rao	b10cb163c4	powerpc64/bpf elfv2: Setup kernel TOC in r2 on entry In preparation for using kernel TOC, load the same in r2 on entry. With elfv1, the kernel TOC is already setup by our caller. We adjust the number of instructions to skip on a tail call accordingly. We get rid of the #ifdef in bpf_jit_emit_tail_call() since FUNCTION_DESCR_SIZE is itself under a #ifdef. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/18a05a4ceec14a8617c9dd4b7128d0afa83fd14e.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:58 +11:00
Naveen N. Rao	4eeac2b0aa	powerpc64: Set PPC64_ELF_ABI_v[1\|2] macros to 1 Set macros to 1 so that they can be used with __is_defined(). Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/abad4868416ddfd42893f99c0cad8e5faf998095.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:57 +11:00
Naveen N. Rao	1d4866d565	powerpc64/bpf: Use r12 for constant blinding In preparation for preserving kernel toc in r2, switch BPF_REG_AX from r2 to r12. r12 is not used by bpf JIT except during external helper/bpf calls, or with BPF_NOSPEC. These sequences aren't emitted when BPF_REG_AX is used for constant blinding and other purposes. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e109f98617eacb4512c17a48525e94eda42889e6.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:57 +11:00
Naveen N. Rao	c2067f7f88	powerpc64/bpf: Do not save/restore LR on each call to bpf_stf_barrier() Instead of saving and restoring LR before each invocation to bpf_stf_barrier(), set SEEN_FUNC flag so that we save/restore LR in prologue/epilogue. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4446f25478d82a2a4ac9dab2ebdfd88ddf923eb7.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:57 +11:00
Naveen N. Rao	0ffdbce6f4	powerpc/bpf: Handle large branch ranges with BPF_EXIT In some scenarios, it is possible that the program epilogue is outside the branch range for a BPF_EXIT instruction. Instead of rejecting such programs, emit epilogue as an alternate exit point from the program. Track the location of the same so that subsequent exits can take either of the two paths. Reported-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/33aa2e92645a92712be23b18035a2c6dcb92ff8d.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:57 +11:00
Naveen N. Rao	bafb5898de	powerpc/bpf: Emit a single branch instruction for known short branch ranges PPC_BCC() emits two instructions to accommodate scenarios where we need to branch outside the range of a conditional branch. PPC_BCC_SHORT() emits a single branch instruction and can be used when the branch is known to be within a conditional branch range. Convert some of the uses of PPC_BCC() in the powerpc BPF JIT over to PPC_BCC_SHORT() where we know the branch range. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/edbca01377d1d5f472868bf6d8962b0a0d85b96f.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:57 +11:00
Naveen N. Rao	acd7408d27	powerpc/bpf: Skip branch range validation during first pass During the first pass, addrs[] is still being populated. So, all branches to following instructions will appear to be going to the start of the JIT program. Ignore branch range validation for such instructions and assume those to be in range. Branch range validation will happen during the second pass after addrs[] is setup properly. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bc517413d11636e20dbfc88503dad14bcbe391e2.1644834730.git.naveen.n.rao@linux.vnet.ibm.com	2022-03-08 00:04:57 +11:00
Michael Ellerman	591b4b2684	powerpc/code-patching: Pre-map patch area Paul reported a warning with DEBUG_ATOMIC_SLEEP=y: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:256 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 preempt_count: 0, expected: 0 ... Call Trace: dump_stack_lvl+0xa0/0xec (unreliable) __might_resched+0x2f4/0x310 kmem_cache_alloc+0x220/0x4b0 __pud_alloc+0x74/0x1d0 hash__map_kernel_page+0x2cc/0x390 do_patch_instruction+0x134/0x4a0 arch_jump_label_transform+0x64/0x78 __jump_label_update+0x148/0x180 static_key_enable_cpuslocked+0xd0/0x120 static_key_enable+0x30/0x50 check_kvm_guest+0x60/0x88 pSeries_smp_probe+0x54/0xb0 smp_prepare_cpus+0x3e0/0x430 kernel_init_freeable+0x20c/0x43c kernel_init+0x30/0x1a0 ret_from_kernel_thread+0x5c/0x64 Peter pointed out that this is because do_patch_instruction() has disabled interrupts, but then map_patch_area() calls map_kernel_page() then hash__map_kernel_page() which does a sleeping memory allocation. We only see the warning in KVM guests with SMT enabled, which is not particularly common, or on other platforms if CONFIG_KPROBES is disabled, also not common. The reason we don't see it in most configurations is that another path that happens to have interrupts enabled has allocated the required page tables for us, eg. there's a path in kprobes init that does that. That's just pure luck though. As Christophe suggested, the simplest solution is to do a dummy map/unmap when we initialise the patching, so that any required page table levels are pre-allocated before the first call to do_patch_instruction(). This works because the unmap doesn't free any page tables that were allocated by the map, it just clears the PTE, leaving the page table levels there for the next map. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Debugged-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220223015821.473097-1-mpe@ellerman.id.au	2022-03-08 00:04:57 +11:00
Michael Ellerman	d4679ac8ea	powerpc/64s: Don't use DSISR for SLB faults Since commit `46ddcb3950` ("powerpc/mm: Show if a bad page fault on data is read or write.") we use page_fault_is_write(regs->dsisr) in __bad_page_fault() to determine if the fault is for a read or write, and change the message printed accordingly. But SLB faults, aka Data Segment Interrupts, don't set DSISR (Data Storage Interrupt Status Register) to a useful value. All ISA versions from v2.03 through v3.1 specify that the Data Segment Interrupt sets DSISR "to an undefined value". As far as I can see there's no mention of SLB faults setting DSISR in any BookIV content either. This manifests as accesses that should be a read being incorrectly reported as writes, for example, using the xmon "dump" command: 0:mon> d 0x5deadbeef0000000 5deadbeef0000000 [359526.415354][ C6] BUG: Unable to handle kernel data access on write at 0x5deadbeef0000000 [359526.415611][ C6] Faulting instruction address: 0xc00000000010a300 cpu 0x6: Vector: 380 (Data SLB Access) at [c00000000ffbf400] pc: c00000000010a300: mread+0x90/0x190 If we disassemble the PC, we see a load instruction: 0:mon> di c00000000010a300 c00000000010a300 89490000 lbz r10,0(r9) We can also see in exceptions-64s.S that the data_access_slb block doesn't set IDSISR=1, which means it doesn't load DSISR into pt_regs. So the value we're using to determine if the fault is a read/write is some stale value in pt_regs from a previous page fault. Rework the printing logic to separate the SLB fault case out, and only print read/write in the cases where we can determine it. The result looks like eg: 0:mon> d 0x5deadbeef0000000 5deadbeef0000000 [ 721.779525][ C6] BUG: Unable to handle kernel data access at 0x5deadbeef0000000 [ 721.779697][ C6] Faulting instruction address: 0xc00000000014cbe0 cpu 0x6: Vector: 380 (Data SLB Access) at [c00000000ffbf390] 0:mon> d 0 0000000000000000 [ 742.793242][ C6] BUG: Kernel NULL pointer dereference at 0x00000000 [ 742.793316][ C6] Faulting instruction address: 0xc00000000014cbe0 cpu 0x6: Vector: 380 (Data SLB Access) at [c00000000ffbf390] Fixes: `46ddcb3950` ("powerpc/mm: Show if a bad page fault on data is read or write.") Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20220222113449.319193-1-mpe@ellerman.id.au	2022-03-08 00:04:56 +11:00
Jakob Koschel	fa1321b11b	powerpc/sysdev: fix incorrect use to determine if list is empty 'gtm' will always be set by list_for_each_entry(). It is incorrect to assume that the iterator value will be NULL if the list is empty. Instead of checking the pointer it should be checked if the list is empty. Fixes: `83ff9dcf37` ("powerpc/sysdev: implement FSL GTM support") Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220228142434.576226-1-jakobkoschel@gmail.com	2022-03-08 00:04:56 +11:00
Haren Myneni	37e6764895	powerpc/pseries/vas: Add VAS migration handler Since the VAS windows belong to the VAS hardware resource, the hypervisor expects the partition to close them on source partition and reopen them after the partition migrated on the destination machine. This handler is called before pseries_suspend() to close these windows and again invoked after migration. All active windows for both default and QoS types will be closed and mark them inactive and reopened after migration with this handler. During the migration, the user space receives paste instruction failure if it issues copy/paste on these inactive windows. The current migration implementation does not freeze the user space and applications can continue to open VAS windows while migration is in progress. So when the migration_in_progress flag is set, VAS open window API returns -EBUSY. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/05e45ff4f8babd2490ccb7ae923884f4aa21a7e5.camel@linux.ibm.com	2022-03-08 00:04:56 +11:00
Haren Myneni	716d7a2e37	powerpc/pseries/vas: Modify reconfig open/close functions for migration VAS is a hardware engine stays on the chip. So when the partition migrates, all VAS windows on the source system have to be closed and reopen them on the destination after migration. The kernel has to consider both DLPAR CPU and migration events to take action on VAS windows. So using VAS_WIN_NO_CRED_CLOSE and VAS_WIN_MIGRATE_CLOSE status bits and windows will be reopened after migration only after both status bits are cleared. This patch make changes to the current reconfig_open/close_windows functions to support migration: - Set VAS_WIN_MIGRATE_CLOSE to the window status when closes and reopen windows with the same status during resume. - Continue to close all windows even if deallocate HCALL failed (should not happen) since no way to stop migration with the current LPM implementation. - If the DLPAR CPU event happens while migration is in progress, set VAS_WIN_NO_CRED_CLOSE to the window status. Close window happens with the first event (migration or DLPAR) and Reopen window happens only with the last event (migration or DLPAR). Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0aad580387cb58379496b4cbbd7c5596e9ea70be.camel@linux.ibm.com	2022-03-08 00:04:56 +11:00
Haren Myneni	278fe1cc22	powerpc/pseries/vas: Define global hv_cop_caps struct The coprocessor capabilities struct is used to get default and QoS capabilities from the hypervisor during init, DLPAR event and migration. So instead of allocating this struct for each event, define global struct and reuse it which allows the migration code to avoid adding an error path. Also disable copy/paste feature flag if any capabilities HCALL is failed. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/57da6a270fcb9308cd57be7c88037029343080f7.camel@linux.ibm.com	2022-03-08 00:04:56 +11:00
Haren Myneni	45f06eac30	powerpc/pseries/vas: Add 'update_total_credits' entry for QoS capabilities pseries supports two types of credits - Default (uses normal priority FIFO) and Qality of service (QoS uses high priority FIFO). The user decides the number of QoS credits and sets this value with HMC interface. The total credits for QoS capabilities can be changed dynamically with HMC interface which invokes drmgr to communicate to the kernel. This patch creats 'update_total_credits' entry for QoS capabilities so that drmgr command can write the new target QoS credits in sysfs. Instead of using this value, the kernel gets the new QoS capabilities from the hypervisor whenever update_total_credits is updated to make sure sync with the QoS target credits in the hypervisor. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b01ef31a0f964686d00243e7de7f09c73c07e69e.camel@linux.ibm.com	2022-03-08 00:04:56 +11:00
Haren Myneni	b903737bc5	powerpc/pseries/vas: sysfs interface to export capabilities The hypervisor provides the available VAS GZIP capabilities such as default or QoS window type and the target available credits in each type. This patch creates sysfs entries and exports the target, used and the available credits for each feature. This interface can be used by the user space to determine the credits usage or to set the target credits in the case of QoS type (for DLPAR). /sys/devices/vas/vas0/gzip/default_capabilities (default GZIP capabilities) nr_total_credits /* Total credits available. Can be /* changed with DLPAR operation / nr_used_credits / Used credits */ /sys/devices/vas/vas0/gzip/qos_capabilities (QoS GZIP capabilities) nr_total_credits nr_used_credits Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/702d8b626ebfac2b52f4995eebeafe1c9a6fcb75.camel@linux.ibm.com	2022-03-08 00:04:56 +11:00
Haren Myneni	c656cfe571	powerpc/pseries/vas: Reopen windows with DLPAR core add VAS windows can be closed in the hypervisor due to lost credits when the core is removed and the kernel gets fault for NX requests on these inactive windows. If the NX requests are issued on these inactive windows, OS gets page faults and the paste failure will be returned to the user space. If the lost credits are available later with core add, reopen these windows and set them active. Later when the OS sees page faults on these active windows, it creates mapping on the new paste address. Then the user space can continue to use these windows and send HW compression requests to NX successfully. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d9f360e21355e6826142c81146acfa9b60bc7ecc.camel@linux.ibm.com	2022-03-08 00:04:55 +11:00
Haren Myneni	8ef7b9e176	powerpc/pseries/vas: Close windows with DLPAR core removal The hypervisor assigns vas credits (windows) for each LPAR based on the number of cores configured in that system. The OS is expected to release credits when cores are removed, and may allocate more when cores are added. So there is a possibility of using excessive credits (windows) in the LPAR and the hypervisor expects the system to close the excessive windows so that NX load can be equally distributed across all LPARs in the system. When the OS closes the excessive windows in the hypervisor, it sets the window status inactive and invalidates window virtual address mapping. The user space receives paste instruction failure if any NX requests are issued on the inactive window. Then the user space can use with the available open windows or retry NX requests until this window active again. This patch also adds the notifier for core removal/add to close windows in the hypervisor if the system lost credits (core removal) and reopen windows in the hypervisor when the previously lost credits are available. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/108928f9c00a48cc6a722315d482d07cf66acf5a.camel@linux.ibm.com	2022-03-08 00:04:55 +11:00
Haren Myneni	6a8d4ca891	powerpc/vas: Map paste address only if window is active The paste address mapping is done with mmap() after the window is opened with ioctl. The partition has to close VAS windows in the hypervisor if it lost credits due to DLPAR core removal. But the kernel marks these windows inactive until the previously lost credits are available later. If the window is inactive due to DLPAR after this mmap(), the paste instruction returns failure until the the OS reopens this window again. Before the user space issuing mmap(), there is a possibility of happening DLPAR core removal event which causes the corresponding window inactive. So if the window is not active, return mmap() failure with -EACCES and expects the user space reissue mmap() when the window is active or open a new window when the credit is available. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bbb203c26b324534e25658cb1dbbcb5160a2f93a.camel@linux.ibm.com	2022-03-08 00:04:55 +11:00
Haren Myneni	b5c63d90cc	powerpc/vas: Return paste instruction failure if no active window The VAS window may not be active if the system looses credits and the NX generates page fault when it receives request on unmap paste address. The kernel handles the fault by remap new paste address if the window is active again, Otherwise return the paste instruction failure if the executed instruction that caused the fault was a paste. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/492b9aefd593061d51dda67ee4d2fc449c000dce.camel@linux.ibm.com	2022-03-08 00:04:55 +11:00
Haren Myneni	1fe3a33ba0	powerpc/vas: Add paste address mmap fault handler The user space opens VAS windows and issues NX requests by pasting CRB on the corresponding paste address mmap. When the system lost credits due to core removal, the kernel has to close the window in the hypervisor and make the window inactive by unmapping this paste address. Also the OS has to handle NX request page faults if the user space issue NX requests. This handler maps the new paste address with the same VMA when the window is active again (due to core add with DLPAR). Otherwise returns paste failure. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3956e1c1fdfde69127055ff1c0256c7d71104030.camel@linux.ibm.com	2022-03-08 00:04:55 +11:00
Haren Myneni	976410cd2c	powerpc/pseries/vas: Save PID in pseries_vas_window struct The kernel sets the VAS window with PID when it is opened in the hypervisor. During DLPAR operation, windows can be closed and reopened in the hypervisor when the credit is available. So saves this PID in pseries_vas_window struct when the window is opened initially and reuse it later during DLPAR operation. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a57cbe6d292fe49ad55a0b49c5679d6a24d8fe73.camel@linux.ibm.com	2022-03-08 00:04:55 +11:00
Haren Myneni	40562fe4fa	powerpc/pseries/vas: Use common names in VAS capability structure nr_total/nr_used_credits provides credits usage to user space via sysfs and the same interface can be used on PowerNV in future. Changed with proper naming so that applicable on both pseries and PowerNV. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f4313e9f198ee4f8d4fa4d015d8d1873e17851e6.camel@linux.ibm.com	2022-03-08 00:04:54 +11:00
Michael Ellerman	9ef78b6293	Merge branch 'topic/ppc-kvm' into next Merge our topic branch containing powerpc KVM related commits. Alexey Kardashevskiy (1): KVM: PPC: Merge powerpc's debugfs entry content into generic entry Fabiano Rosas (9): KVM: PPC: Book3S HV: Stop returning internal values to userspace KVM: PPC: Fix vmx/vsx mixup in mmio emulation KVM: PPC: mmio: Reject instructions that access more than mmio.data size KVM: PPC: mmio: Return to guest after emulation failure KVM: PPC: Book3s: mmio: Deliver DSI after emulation failure KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init KVM: PPC: Book3S HV: Delay setting of kvm ops KVM: PPC: Book3S HV: Free allocated memory if module init fails KVM: PPC: Decrement module refcount if init_vm fails Jason Wang (1): powerpc/kvm: no need to initialise statics to 0 Nour-eddine Taleb (1): KVM: PPC: Book3S HV: remove unnecessary casts	2022-03-08 00:02:29 +11:00
Michael Ellerman	4bc06c59f6	Merge branch 'topic/func-desc-lkdtm' into next Merge a topic branch we are maintaining with some cross-architecture changes to function descriptor handling and their use in LKDTM. From Christophe's cover letter: Fix LKDTM for PPC64/IA64/PARISC PPC64/IA64/PARISC have function descriptors. LKDTM doesn't work on those three architectures because LKDTM messes up function descriptors with functions. This series does some cleanup in the three architectures and refactors function descriptors so that it can then easily use it in a generic way in LKDTM.	2022-03-07 23:34:32 +11:00
Nicholas Piggin	c7fa848ff0	KVM: PPC: Book3S HV P9: Fix "lost kick" race When new work is created that requires attention from the hypervisor (e.g., to inject an interrupt into the guest), fast_vcpu_kick is used to pull the target vcpu out of the guest if it may have been running. Therefore the work creation side looks like this: vcpu->arch.doorbell_request = 1; kvmppc_fast_vcpu_kick_hv(vcpu) { smp_mb(); cpu = vcpu->cpu; if (cpu != -1) send_ipi(cpu); } And the guest entry side should look like this: vcpu->cpu = smp_processor_id(); smp_mb(); if (vcpu->arch.doorbell_request) { // do something (abort entry or inject doorbell etc) } But currently the store and load are flipped, so it is possible for the entry to see no doorbell pending, and the doorbell creation misses the store to set cpu, resulting lost work (or at least delayed until the next guest exit). Fix this by reordering the entry operations and adding a smp_mb between them. The P8 path appears to have a similar race which is commented but not addressed yet. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220303053315.1056880-2-npiggin@gmail.com	2022-03-07 13:14:30 +11:00
Michael Ellerman	48015b632f	powerpc: Fix STACKTRACE=n build Our skiroot_defconfig doesn't enable FTRACE, and so doesn't get STACKTRACE enabled either. That leads to a build failure since commit `1614b2b11f` ("arch: Make ARCH_STACKWALK independent of STACKTRACE") made stacktrace.c build even when STACKTRACE=n. arch/powerpc/kernel/stacktrace.c: In function ‘handle_backtrace_ipi’: arch/powerpc/kernel/stacktrace.c:171:2: error: implicit declaration of function ‘nmi_cpu_backtrace’ 171 \| nmi_cpu_backtrace(regs); \| ^~~~~~~~~~~~~~~~~ arch/powerpc/kernel/stacktrace.c: In function ‘arch_trigger_cpumask_backtrace’: arch/powerpc/kernel/stacktrace.c:226:2: error: implicit declaration of function ‘nmi_trigger_cpumask_backtrace’ 226 \| nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace_ipi); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This happens because our headers haven't defined arch_trigger_cpumask_backtrace, which causes lib/nmi_backtrace.c not to build nmi_cpu_backtrace(). The code in question doesn't actually depend on STACKTRACE=y, that was just added because arch_trigger_cpumask_backtrace() lived in stacktrace.c for convenience. So drop the dependency on CONFIG_STACKTRACE, that causes lib/nmi_backtrace.c to build nmi_cpu_backtrace() etc. and fixes the build. Fixes: `1614b2b11f` ("arch: Make ARCH_STACKWALK independent of STACKTRACE") [mpe: Cherry pick of `5a72345e6a` from next into fixes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220212111349.2806972-1-mpe@ellerman.id.au	2022-03-07 10:26:20 +11:00
Murilo Opsfelder Araujo	58dbe9b373	powerpc/64s: Fix build failure when CONFIG_PPC_64S_HASH_MMU is not set The following build failure occurs when CONFIG_PPC_64S_HASH_MMU is not set: arch/powerpc/kernel/setup_64.c: In function ‘setup_per_cpu_areas’: arch/powerpc/kernel/setup_64.c:811:21: error: ‘mmu_linear_psize’ undeclared (first use in this function); did you mean ‘mmu_virtual_psize’? 811 \| if (mmu_linear_psize == MMU_PAGE_4K) \| ^~~~~~~~~~~~~~~~ \| mmu_virtual_psize arch/powerpc/kernel/setup_64.c:811:21: note: each undeclared identifier is reported only once for each function it appears in Move the declaration of mmu_linear_psize outside of CONFIG_PPC_64S_HASH_MMU ifdef. After the above is fixed, it fails later with the following error: ld: arch/powerpc/kexec/file_load_64.o: in function `.arch_kexec_kernel_image_probe': file_load_64.c:(.text+0x1c1c): undefined reference to `.add_htab_mem_range' Fix that, too, by conditioning add_htab_mem_range() symbol to CONFIG_PPC_64S_HASH_MMU. Fixes: `387e220a2e` ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU") Reported-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215567 Link: https://lore.kernel.org/r/20220301204743.45133-1-muriloo@linux.ibm.com	2022-03-05 20:42:21 +11:00
Nour-eddine Taleb	e40b38a41c	KVM: PPC: Book3S HV: remove unnecessary casts Remove unnecessary casts, from "void " to "struct kvmppc_xics " Signed-off-by: Nour-eddine Taleb <kernel.noureddine@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220303143416.201851-1-kernel.noureddine@gmail.com	2022-03-04 12:58:46 +11:00
Christoph Hellwig	27674ef6c7	mm: remove the extra ZONE_DEVICE struct page refcount ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference count doesn't need to be treated specially for ZONE_DEVICE pages. Note that this excludes the special idle page wakeup for fsdax pages, which still happens at refcount 1. This is a separate issue and will be sorted out later. Given that only fsdax pages require the notifiacation when the refcount hits 1 now, the PAGEMAP_OPS Kconfig symbol can go away and be replaced with a FS_DAX check for this hook in the put_page fastpath. Based on an earlier patch from Ralph Campbell <rcampbell@nvidia.com>. Link: https://lkml.kernel.org/r/20220210072828.2930359-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Christian Knig <christian.koenig@amd.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-03-03 12:47:33 -05:00
Christoph Hellwig	dc90f0846d	mm: don't include <linux/memremap.h> in <linux/mm.h> Move the check for the actual pgmap types that need the free at refcount one behavior into the out of line helper, and thus avoid the need to pull memremap.h into mm.h. Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-03-03 12:47:33 -05:00
Anders Roxell	8219d31eff	powerpc/lib/sstep: Fix build errors with newer binutils Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian 2.37.90.20220207) the following build error shows up: {standard input}: Assembler messages: {standard input}:10576: Error: unrecognized opcode: `stbcx.' {standard input}:10680: Error: unrecognized opcode: `lharx' {standard input}:10694: Error: unrecognized opcode: `lbarx' Rework to add assembler directives [1] around the instruction. The problem with this might be that we can trick a power6 into single-stepping through an stbcx. for instance, and it will execute that in kernel mode. [1] https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo Fixes: `350779a29f` ("powerpc: Handle most loads and stores in instruction emulation code") Cc: stable@vger.kernel.org # v4.14+ Co-developed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220224162215.3406642-3-anders.roxell@linaro.org	2022-03-01 23:51:09 +11:00
Anders Roxell	8667d0d64d	powerpc: Fix build errors with newer binutils Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian 2.37.90.20220207) the following build error shows up: {standard input}: Assembler messages: {standard input}:1190: Error: unrecognized opcode: `stbcix' {standard input}:1433: Error: unrecognized opcode: `lwzcix' {standard input}:1453: Error: unrecognized opcode: `stbcix' {standard input}:1460: Error: unrecognized opcode: `stwcix' {standard input}:1596: Error: unrecognized opcode: `stbcix' ... Rework to add assembler directives [1] around the instruction. Going through them one by one shows that the changes should be safe. Like __get_user_atomic_128_aligned() is only called in p9_hmi_special_emu(), which according to the name is specific to power9. And __raw_rm_read*() are only called in things that are powernv or book3s_hv specific. [1] https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo Cc: stable@vger.kernel.org Co-developed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> [mpe: Make commit subject more descriptive] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220224162215.3406642-2-anders.roxell@linaro.org	2022-03-01 23:51:08 +11:00
Anders Roxell	a633cb1edd	powerpc/lib/sstep: Fix 'sthcx' instruction Looks like there been a copy paste mistake when added the instruction 'stbcx' twice and one was probably meant to be 'sthcx'. Changing to 'sthcx' from 'stbcx'. Fixes: `350779a29f` ("powerpc: Handle most loads and stores in instruction emulation code") Cc: stable@vger.kernel.org # v4.14+ Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220224162215.3406642-1-anders.roxell@linaro.org	2022-03-01 23:51:08 +11:00
Michael Ellerman	2863dd2db2	powerpc/Makefile: Don't pass -mcpu=powerpc64 when building 32-bit When CONFIG_GENERIC_CPU=y (true for all our defconfigs) we pass -mcpu=powerpc64 to the compiler, even when we're building a 32-bit kernel. This happens because we have an ifdef CONFIG_PPC_BOOK3S_64/else block in the Makefile that was written before 32-bit supported GENERIC_CPU. Prior to that the else block only applied to 64-bit Book3E. The GCC man page says -mcpu=powerpc64 "[specifies] a pure ... 64-bit big endian PowerPC ... architecture machine [type], with an appropriate, generic processor model assumed for scheduling purposes." It's unclear how that interacts with -m32, which we are also passing, although obviously -m32 is taking precedence in some sense, as the 32-bit kernel only contains 32-bit instructions. This was noticed by inspection, not via any bug reports, but it does affect code generation. Comparing before/after code generation, there are some changes to instruction scheduling, and the after case (with -mcpu=powerpc64 removed) the compiler seems more keen to use r8. Fix it by making the else case only apply to Book3E 64, which excludes 32-bit. Fixes: `0e00a8c9fd` ("powerpc: Allow CPU selection also on PPC32") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220215112858.304779-1-mpe@ellerman.id.au	2022-03-01 23:51:04 +11:00
Daniel Henrique Barboza	749ed4a206	powerpc/mm/numa: skip NUMA_NO_NODE onlining in parse_numa_properties() Executing node_set_online() when nid = NUMA_NO_NODE results in an undefined behavior. node_set_online() will call node_set_state(), into __node_set(), into set_bit(), and since NUMA_NO_NODE is -1 we'll end up doing a negative shift operation inside arch/powerpc/include/asm/bitops.h. This potential UB was detected running a kernel with CONFIG_UBSAN. The behavior was introduced by commit `10f78fd0da` ("powerpc/numa: Fix a regression on memoryless node 0"), where the check for nid > 0 was removed to fix a problem that was happening with nid = 0, but the result is that now we're trying to online NUMA_NO_NODE nids as well. Checking for nid >= 0 will allow node 0 to be onlined while avoiding this UB with NUMA_NO_NODE. Fixes: `10f78fd0da` ("powerpc/numa: Fix a regression on memoryless node 0") Reported-by: Ping Fang <pifang@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220224182312.1012527-1-danielhb413@gmail.com	2022-03-01 23:41:01 +11:00
Christophe Leroy	973e2e6462	powerpc/interrupt: Remove struct interrupt_state Since commit `ceff77efa4` ("powerpc/64e/interrupt: Use new interrupt context tracking scheme") struct interrupt_state has been empty and unused. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1d862ce3eab3da6ca7ac47d4a78a18f154462511.1645806970.git.christophe.leroy@csgroup.eu	2022-03-01 23:41:00 +11:00
Hari Bathini	607451ce0a	powerpc/fadump: register for fadump as early as possible Crash recovery (fadump) is setup in the userspace by some service. This service rebuilds initrd with dump capture capability, if it is not already dump capture capable before proceeding to register for firmware assisted dump (echo 1 > /sys/kernel/fadump/registered). But arming the kernel with crash recovery support does not have to wait for userspace configuration. So, register for fadump while setting it up itself. This can at worst lead to a scenario, where /proc/vmcore is ready afer crash but the initrd does not know how/where to offload it, which is always better than not having a /proc/vmcore at all due to incomplete configuration in the userspace at the time of crash. Commit `0823c68b05` ("powerpc/fadump: re-register firmware-assisted dump if already registered") ensures this change does not break userspace. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> [mpe: Reword comment] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220201105305.155511-1-hbathini@linux.ibm.com	2022-03-01 23:41:00 +11:00
Kees Cook	2792d84e6d	usercopy: Check valid lifetime via stack depth One of the things that CONFIG_HARDENED_USERCOPY sanity-checks is whether an object that is about to be copied to/from userspace is overlapping the stack at all. If it is, it performs a number of inexpensive bounds checks. One of the finer-grained checks is whether an object crosses stack frames within the stack region. Doing this on x86 with CONFIG_FRAME_POINTER was cheap/easy. Doing it with ORC was deemed too heavy, and was left out (a while ago), leaving the courser whole-stack check. The LKDTM tests USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM try to exercise these cross-frame cases to validate the defense is working. They have been failing ever since ORC was added (which was expected). While Muhammad was investigating various LKDTM failures[1], he asked me for additional details on them, and I realized that when exact stack frame boundary checking is not available (i.e. everything except x86 with FRAME_POINTER), it could check if a stack object is at least "current depth valid", in the sense that any object within the stack region but not between start-of-stack and current_stack_pointer should be considered unavailable (i.e. its lifetime is from a call no longer present on the stack). Introduce ARCH_HAS_CURRENT_STACK_POINTER to track which architectures have actually implemented the common global register alias. Additionally report usercopy bounds checking failures with an offset from current_stack_pointer, which may assist with diagnosing failures. The LKDTM USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM tests (once slightly adjusted in a separate patch) pass again with this fixed. [1] https://github.com/kernelci/kernelci-project/issues/84 Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Kees Cook <keescook@chromium.org> --- v1: https://lore.kernel.org/lkml/20220216201449.2087956-1-keescook@chromium.org v2: https://lore.kernel.org/lkml/20220224060342.1855457-1-keescook@chromium.org v3: https://lore.kernel.org/lkml/20220225173345.3358109-1-keescook@chromium.org v4: - improve commit log (akpm)	2022-02-25 18:20:11 -08:00
Arnd Bergmann	dd865f090f	Merge branch 'set_fs-4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic into asm-generic Christoph Hellwig and a few others spent a huge effort on removing set_fs() from most of the important architectures, but about half the other architectures were never completed even though most of them don't actually use set_fs() at all. I did a patch for microblaze at some point, which turned out to be fairly generic, and now ported it to most other architectures, using new generic implementations of access_ok() and __{get,put}_kernel_nocheck(). Three architectures (sparc64, ia64, and sh) needed some extra work, which I also completed. * 'set_fs-4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: uaccess: remove CONFIG_SET_FS ia64: remove CONFIG_SET_FS support sh: remove CONFIG_SET_FS support sparc64: remove CONFIG_SET_FS support lib/test_lockup: fix kernel pointer check for separate address spaces uaccess: generalize access_ok() uaccess: fix type mismatch warnings from access_ok() arm64: simplify access_ok() m68k: fix access_ok for coldfire MIPS: use simpler access_ok() MIPS: Handle address errors for accesses above CPU max virtual user address uaccess: add generic __{get,put}_kernel_nofault nios2: drop access_ok() check from __put_user() x86: use more conventional access_ok() definition x86: remove __range_not_ok() sparc64: add __{get,put}_kernel_nofault() nds32: fix access_ok() checks in get/put_user uaccess: fix nios2 and microblaze get_user_8() uaccess: fix integer overflow on access_ok()	2022-02-25 11:16:58 +01:00
Arnd Bergmann	12700c17fc	uaccess: generalize access_ok() There are many different ways that access_ok() is defined across architectures, but in the end, they all just compare against the user_addr_max() value or they accept anything. Provide one definition that works for most architectures, checking against TASK_SIZE_MAX for user processes or skipping the check inside of uaccess_kernel() sections. For architectures without CONFIG_SET_FS(), this should be the fastest check, as it comes down to a single comparison of a pointer against a compile-time constant, while the architecture specific versions tend to do something more complex for historic reasons or get something wrong. Type checking for __user annotations is handled inconsistently across architectures, but this is easily simplified as well by using an inline function that takes a 'const void __user *' argument. A handful of callers need an extra __user annotation for this. Some architectures had trick to use 33-bit or 65-bit arithmetic on the addresses to calculate the overflow, however this simpler version uses fewer registers, which means it can produce better object code in the end despite needing a second (statically predicted) branch. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64, asm-generic] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Stafford Horne <shorne@gmail.com> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2022-02-25 09:36:05 +01:00
Arnd Bergmann	23fc539e81	uaccess: fix type mismatch warnings from access_ok() On some architectures, access_ok() does not do any argument type checking, so replacing the definition with a generic one causes a few warnings for harmless issues that were never caught before. Fix the ones that I found either through my own test builds or that were reported by the 0-day bot. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2022-02-25 09:36:05 +01:00
Arnd Bergmann	34737e2698	uaccess: add generic __{get,put}_kernel_nofault Nine architectures are still missing __{get,put}_kernel_nofault: alpha, ia64, microblaze, nds32, nios2, openrisc, sh, sparc32, xtensa. Add a generic version that lets everything use the normal copy_{from,to}_kernel_nofault() code based on these, removing the last use of get_fs()/set_fs() from architecture-independent code. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2022-02-25 09:36:05 +01:00
Jakub Kicinski	aaa25a2fa7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net tools/testing/selftests/net/mptcp/mptcp_join.sh `34aa6e3bcc` ("selftests: mptcp: add ip mptcp wrappers") `857898eb4b` ("selftests: mptcp: add missing join check") `6ef84b1517` ("selftests: mptcp: more robust signal race test") https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/ drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c `fb7e76ea3f` ("net/mlx5e: TC, Skip redundant ct clear actions") `c63741b426` ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information") `09bf979232` ("net/mlx5e: TC, Move pedit_headers_action to parse_attr") `84ba8062e3` ("net/mlx5e: Test CT and SAMPLE on flow attr") `efe6f961cd` ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow") `3b49a7edec` ("net/mlx5e: TC, Reject rules with multiple CT actions") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-24 17:54:25 -08:00
Guo Zhengkui	8a0edc72be	powerpc/module_64: fix array_size.cocci warning Fix following coccicheck warning: ./arch/powerpc/kernel/module_64.c:432:40-41: WARNING: Use ARRAY_SIZE. ARRAY_SIZE(arr) is a macro provided by the kernel. It makes sure that arr is an array, so it's safer than sizeof(arr) / sizeof(arr[0]) and more standard. Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220223075426.20939-1-guozhengkui@vivo.com	2022-02-24 17:53:55 +11:00
Nicholas Piggin	8b91cee5ea	powerpc/64s/hash: Make hash faults work in NMI context Hash faults are not resoved in NMI context, instead causing the access to fail. This is done because perf interrupts can get backtraces including walking the user stack, and taking a hash fault on those could deadlock on the HPTE lock if the perf interrupt hits while the same HPTE lock is being held by the hash fault code. The user-access for the stack walking will notice the access failed and deal with that in the perf code. The reason to allow perf interrupts in is to better profile hash faults. The problem with this is any hash fault on a kernel access that happens in NMI context will crash, because kernel accesses must not fail. Hard lockups, system reset, machine checks that access vmalloc space including modules and including stack backtracing and symbol lookup in modules, per-cpu data, etc could all run into this problem. Fix this by disallowing perf interrupts in the hash fault code (the direct hash fault is covered by MSR[EE]=0 so the PMI disable just needs to extend to the preload case). This simplifies the tricky logic in hash faults and perf, at the cost of reduced profiling of hash faults. perf can still latch addresses when interrupts are disabled, it just won't get the stack trace at that point, so it would still find hot spots, just sometimes with confusing stack chains. An alternative could be to allow perf interrupts here but always do the slowpath stack walk if we are in nmi context, but that slows down all perf interrupt stack walking on hash though and it does not remove as much tricky code. Reported-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220204035348.545435-1-npiggin@gmail.com	2022-02-24 12:46:54 +11:00
Christophe Leroy	406a8c1d8f	powerpc: Remove remaining stab codes Following commit `1231816373` ("powerpc/32: Remove remaining .stabs annotations"), stabs code are not used anymore. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d8b33342d7454f6ca4f368f5206896558dfa06f4.1645538722.git.christophe.leroy@csgroup.eu	2022-02-23 14:49:27 +11:00
Pali Rohár	904b10fb18	PCI: Add defines for normal and subtractive PCI bridges Add these PCI class codes to pci_ids.h: PCI_CLASS_BRIDGE_PCI_NORMAL PCI_CLASS_BRIDGE_PCI_SUBTRACTIVE Use these defines in all kernel code for describing PCI class codes for normal and subtractive PCI bridges. [bhelgaas: similar change in pci-mvebu.c] Link: https://lore.kernel.org/r/20220214114109.26809-1-pali@kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2022-02-17 15:29:35 -06:00
Masahiro Yamada	4a3233c1a6	shmbuf.h: add asm/shmbuf.h to UAPI compile-test coverage asm/shmbuf.h is currently excluded from the UAPI compile-test because of the errors like follows: HDRTEST usr/include/asm/shmbuf.h In file included from ./usr/include/asm/shmbuf.h:6, from <command-line>: ./usr/include/asm-generic/shmbuf.h:26:33: error: field ‘shm_perm’ has incomplete type 26 \| struct ipc64_perm shm_perm; /* operation perms / \| ^~~~~~~~ ./usr/include/asm-generic/shmbuf.h:27:9: error: unknown type name ‘size_t’ 27 \| size_t shm_segsz; / size of segment (bytes) / \| ^~~~~~ ./usr/include/asm-generic/shmbuf.h:40:9: error: unknown type name ‘__kernel_pid_t’ 40 \| __kernel_pid_t shm_cpid; / pid of creator / \| ^~~~~~~~~~~~~~ ./usr/include/asm-generic/shmbuf.h:41:9: error: unknown type name ‘__kernel_pid_t’ 41 \| __kernel_pid_t shm_lpid; / pid of last operator */ \| ^~~~~~~~~~~~~~ The errors can be fixed by replacing size_t with __kernel_size_t and by including proper headers. Then, remove the no-header-test entry from user/include/Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2022-02-17 09:09:37 +01:00
Masahiro Yamada	72113d0a7d	signal.h: add linux/signal.h and asm/signal.h to UAPI compile-test coverage linux/signal.h and asm/signal.h are currently excluded from the UAPI compile-test because of the errors like follows: HDRTEST usr/include/asm/signal.h In file included from <command-line>: ./usr/include/asm/signal.h:103:9: error: unknown type name ‘size_t’ 103 \| size_t ss_size; \| ^~~~~~ The errors can be fixed by replacing size_t with __kernel_size_t. Then, remove the no-header-test entries from user/include/Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2022-02-17 09:09:36 +01:00
Christophe Leroy	e1478d8eaf	asm-generic: Refactor dereference_[kernel]_function_descriptor() dereference_function_descriptor() and dereference_kernel_function_descriptor() are identical on the three architectures implementing them. Make them common and put them out-of-line in kernel/extable.c which is one of the users and has similar type of functions. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/449db09b2eba57f4ab05f80102a67d8675bc8bcd.1644928018.git.christophe.leroy@csgroup.eu	2022-02-16 23:25:11 +11:00
Christophe Leroy	0dc690e4ef	asm-generic: Define 'func_desc_t' to commonly describe function descriptors We have three architectures using function descriptors, each with its own type and name. Add a common typedef that can be used in generic code. Also add a stub typedef for architecture without function descriptors, to avoid a forest of #ifdefs. It replaces the similar 'func_desc_t' previously defined in arch/powerpc/kernel/module_64.c Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f1f91b142b3c1082bdc1586ce71c9bac1e75213c.1644928018.git.christophe.leroy@csgroup.eu	2022-02-16 23:25:11 +11:00
Christophe Leroy	a257cacc38	asm-generic: Define CONFIG_HAVE_FUNCTION_DESCRIPTORS Replace HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR by a config option named CONFIG_HAVE_FUNCTION_DESCRIPTORS and use it instead of 'dereference_function_descriptor' macro to know whether an arch has function descriptors. To limit churn in one of the following patches, use an #ifdef/#else construct with empty first part instead of an #ifndef in asm-generic/sections.h On powerpc, make sure the config option matches the ABI used by the compiler with a BUILD_BUG_ON() and add missing _CALL_ELF=2 when calling 'sparse' so that sparse sees the same piece of code as GCC. And include a helper to check whether an arch has function descriptors or not : have_function_descriptors() Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4a0f11fb0ea74a3197bc44dd7ba25e53a24fd03d.1644928018.git.christophe.leroy@csgroup.eu	2022-02-16 23:25:11 +11:00
Christophe Leroy	2fd986377d	powerpc: Prepare func_desc_t for refactorisation In preparation of making func_desc_t generic, change the ELFv2 version to a struct containing 'addr' element. This allows using single helpers common to ELFv1 and ELFv2 and reduces the amount of #ifdef's Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5c36105e08b27b98450535bff48d71b690c19739.1644928018.git.christophe.leroy@csgroup.eu	2022-02-16 23:25:11 +11:00
Christophe Leroy	0a9c5ae279	powerpc: Remove 'struct ppc64_opd_entry' 'struct ppc64_opd_entry' doesn't belong to uapi/asm/elf.h It was initially in module_64.c and commit `2d291e9027` ("Fix compile failure with non modular builds") moved it into asm/elf.h But it was by mistake added outside of __KERNEL__ section, therefore commit `c3617f7203` ("UAPI: (Scripted) Disintegrate arch/powerpc/include/asm") moved it to uapi/asm/elf.h Now that it is not used anymore by the kernel, remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c309ccee65ec2e3802df7a7fe761d0a298584809.1644928018.git.christophe.leroy@csgroup.eu	2022-02-16 23:25:11 +11:00
Christophe Leroy	d3e32b997a	powerpc: Use 'struct func_desc' instead of 'struct ppc64_opd_entry' 'struct ppc64_opd_entry' is somehow redundant with 'struct func_desc', the later is more correct/complete as it includes the third field which is unused. So use 'struct func_desc' instead of 'struct ppc64_opd_entry' Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/34e76bac6cbe95a63ecd37df69fb7feb93b0ea7c.1644928018.git.christophe.leroy@csgroup.eu	2022-02-16 23:25:10 +11:00
Christophe Leroy	5b23cb8cc6	powerpc: Move and rename func_descr_t There are three architectures with function descriptors, try to have common names for the address they contain in order to refactor some functions into generic functions later. powerpc has 'entry' ia64 has 'ip' parisc has 'addr' Vote for 'addr' and update 'func_descr_t' accordingly. Move it in asm/elf.h to have it at the same place on all three architectures, remove the typedef which hides its real type, and change it to a smoother name 'struct func_desc'. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/529b2ba1d001e8f628ef0d30e8044c9b3d0a4921.1644928018.git.christophe.leroy@csgroup.eu	2022-02-16 23:25:10 +11:00
Christophe Leroy	81df21de8f	powerpc: Fix 'sparse' checking on PPC64le 'sparse' is architecture agnostic and knows nothing about ELF ABI version. Just like it gets arch and powerpc type and endian from Makefile, it also need to get _CALL_ELF from there, otherwise it won't set PPC64_ELF_ABI_v2 macro for PPC64le and won't check the correct code. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ac1312f2451aa558bb2a8806b4d0aa2020f0c176.1644928018.git.christophe.leroy@csgroup.eu	2022-02-16 23:25:10 +11:00
Vaibhav Jain	bbbca72352	powerpc/papr_scm: Implement initial support for injecting smart errors Presently PAPR doesn't support injecting smart errors on an NVDIMM. This makes testing the NVDIMM health reporting functionality difficult as simulating NVDIMM health related events need a hacked up qemu version. To solve this problem this patch proposes simulating certain set of NVDIMM health related events in papr_scm. Specifically 'fatal' health state and 'dirty' shutdown state. These error can be injected via the user-space 'ndctl-inject-smart(1)' command. With the proposed patch and corresponding ndctl patches following command flow is expected: $ sudo ndctl list -DH -d nmem0 ... "health_state":"ok", "shutdown_state":"clean", ... # inject unsafe shutdown and fatal health error $ sudo ndctl inject-smart nmem0 -Uf ... "health_state":"fatal", "shutdown_state":"dirty", ... # uninject all errors $ sudo ndctl inject-smart nmem0 -N ... "health_state":"ok", "shutdown_state":"clean", ... The patch adds a new member 'health_bitmap_inject_mask' inside struct papr_scm_priv which is then bitwise ANDed to the health bitmap fetched from the hypervisor. The value for 'health_bitmap_inject_mask' is accessible from sysfs at nmemX/papr/health_bitmap_inject. A new PDSM named 'SMART_INJECT' is proposed that accepts newly introduced 'struct nd_papr_pdsm_smart_inject' as payload thats exchanged between libndctl and papr_scm to indicate the requested smart-error states. When the processing the PDSM 'SMART_INJECT', papr_pdsm_smart_inject() constructs a pair or 'inject_mask' and 'clear_mask' bitmaps from the payload and bit-blt it to the 'health_bitmap_inject_mask'. This ensures the after being fetched from the hypervisor, the health_bitmap reflects requested smart-error states. Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220124202204.1488346-1-vaibhav@linux.ibm.com	2022-02-16 23:10:47 +11:00
Christophe Leroy	76b372814b	powerpc/ftrace: Style cleanup in ftrace_mprofile.S Add some line breaks to better match the file's style, add some space after comma and fix a couple of misplaced blanks. Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/973506292d0c7b05c06530c8e11803ce38e5eda2.1644949750.git.christophe.leroy@csgroup.eu	2022-02-16 23:09:47 +11:00
Christophe Leroy	fc75f87337	powerpc/ftrace: Have arch_ftrace_get_regs() return NULL unless FL_SAVE_REGS is set When FL_SAVE_REGS is not set we get here via ftrace_caller() which doesn't save all registers. ftrace_caller() explicitely clears regs.msr, so we can rely on it to know where we come from. We don't expect MSR register to be 0 at all when involving ftrace. Fixes: `40b035efe2` ("powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS") Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2f9a7e898c93cc7438ef5ccd47cb9c3a9c5b53ef.1644949750.git.christophe.leroy@csgroup.eu	2022-02-16 23:09:47 +11:00
Christophe Leroy	df45a55788	powerpc/ftrace: Add recursion protection in prepare_ftrace_return() The function_graph_enter() does not provide any recursion protection. Add a protection in prepare_ftrace_return() in case function_graph_enter() calls something that gets function graph traced. Fixes: `830213786c` ("powerpc/ftrace: directly call of function graph tracer by ftrace caller") Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/74edf2ff0a60e66b0d9225a137100a86a0557032.1644949750.git.christophe.leroy@csgroup.eu	2022-02-16 23:09:47 +11:00
Christophe Leroy	34d8dac807	powerpc/ftrace: Also save r1 in ftrace_caller() Also save r1 in ftrace_caller() r1 is needed during unwinding when the function_graph tracer is active. Fixes: `830213786c` ("powerpc/ftrace: directly call of function graph tracer by ftrace caller") Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ff535e86d3a69376a6d89168511d4e403835f18b.1644949750.git.christophe.leroy@csgroup.eu	2022-02-16 23:09:47 +11:00
Anders Roxell	fe663df782	powerpc/lib/sstep: fix 'ptesync' build error Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian 2.37.90.20220207) the following build error shows up: {standard input}: Assembler messages: {standard input}:2088: Error: unrecognized opcode: `ptesync' make[3]: *** [/builds/linux/scripts/Makefile.build:287: arch/powerpc/lib/sstep.o] Error 1 Add the 'ifdef CONFIG_PPC64' around the 'ptesync' in function 'emulate_update_regs()' to like it is in 'analyse_instr()'. Since it looks like it got dropped inadvertently by commit `3cdfcbfd32` ("powerpc: Change analyse_instr so it doesn't modify regs"). A key detail is that analyse_instr() will never recognise lwsync or ptesync on 32-bit (because of the existing ifdef), and as a result emulate_update_regs() should never be called with an op specifying either of those on 32-bit. So removing them from emulate_update_regs() should be a nop in terms of runtime behaviour. Fixes: `3cdfcbfd32` ("powerpc: Change analyse_instr so it doesn't modify regs") Cc: stable@vger.kernel.org # v4.14+ Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> [mpe: Add last paragraph of change log mentioning analyse_instr() details] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220211005113.1361436-1-anders.roxell@linaro.org	2022-02-15 22:31:35 +11:00
Paul Menzel	cb7356986d	powerpc/boot: Add `otheros-too-big.bld` to .gitignore Currently, `git status` lists the file as untracked by git, so tell git to ignore it. Fixes: `aa3bc365ee` ("powerpc/ps3: Add check for otheros image size") Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220214065543.198992-1-pmenzel@molgen.mpg.de	2022-02-15 22:29:52 +11:00
Christophe Leroy	38a1756861	powerpc: Don't allow the use of EMIT_BUG_ENTRY with BUGFLAG_WARNING Warnings in assembly must use EMIT_WARN_ENTRY in order to generate the necessary entry in exception table. Check in EMIT_BUG_ENTRY that flags don't include BUGFLAG_WARNING. This change avoids problems like the one fixed by commit `fd1eaaaaa6` ("powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings"). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ddcb422102a37eb45f57694c7ef0ec6187964dff.1644742951.git.christophe.leroy@csgroup.eu	2022-02-14 13:06:43 +11:00
Michael Ellerman	5a72345e6a	powerpc: Fix STACKTRACE=n build Our skiroot_defconfig doesn't enable FTRACE, and so doesn't get STACKTRACE enabled either. That leads to a build failure since commit `1614b2b11f` ("arch: Make ARCH_STACKWALK independent of STACKTRACE") made stacktrace.c build even when STACKTRACE=n. arch/powerpc/kernel/stacktrace.c: In function ‘handle_backtrace_ipi’: arch/powerpc/kernel/stacktrace.c:171:2: error: implicit declaration of function ‘nmi_cpu_backtrace’ 171 \| nmi_cpu_backtrace(regs); \| ^~~~~~~~~~~~~~~~~ arch/powerpc/kernel/stacktrace.c: In function ‘arch_trigger_cpumask_backtrace’: arch/powerpc/kernel/stacktrace.c:226:2: error: implicit declaration of function ‘nmi_trigger_cpumask_backtrace’ 226 \| nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace_ipi); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This happens because our headers haven't defined arch_trigger_cpumask_backtrace, which causes lib/nmi_backtrace.c not to build nmi_cpu_backtrace(). The code in question doesn't actually depend on STACKTRACE=y, that was just added because arch_trigger_cpumask_backtrace() lived in stacktrace.c for convenience. So drop the dependency on CONFIG_STACKTRACE, that causes lib/nmi_backtrace.c to build nmi_cpu_backtrace() etc. and fixes the build. Fixes: `1614b2b11f` ("arch: Make ARCH_STACKWALK independent of STACKTRACE") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220212111349.2806972-1-mpe@ellerman.id.au	2022-02-12 22:47:44 +11:00
Aneesh Kumar K.V	2354ad252b	powerpc/mm: Update default hugetlb size early commit: `d9c2340052` ("Do not depend on MAX_ORDER when grouping pages by mobility") introduced pageblock_order which will be used to group pages better. The kernel now groups pages based on the value of HPAGE_SHIFT. Hence HPAGE_SHIFT should be set before we call set_pageblock_order. set_pageblock_order happens early in the boot and default hugetlb page size should be initialized before that to compute the right pageblock_order value. Currently, default hugetlbe page size is set via arch_initcalls which happens late in the boot as shown via the below callstack: [c000000007383b10] [c000000001289328] hugetlbpage_init+0x2b8/0x2f8 [c000000007383bc0] [c0000000012749e4] do_one_initcall+0x14c/0x320 [c000000007383c90] [c00000000127505c] kernel_init_freeable+0x410/0x4e8 [c000000007383da0] [c000000000012664] kernel_init+0x30/0x15c [c000000007383e10] [c00000000000cf14] ret_from_kernel_thread+0x5c/0x64 and the pageblock_order initialization is done early during the boot. [c0000000018bfc80] [c0000000012ae120] set_pageblock_order+0x50/0x64 [c0000000018bfca0] [c0000000012b3d94] sparse_init+0x188/0x268 [c0000000018bfd60] [c000000001288bfc] initmem_init+0x28c/0x328 [c0000000018bfe50] [c00000000127b370] setup_arch+0x410/0x480 [c0000000018bfed0] [c00000000127401c] start_kernel+0xb8/0x934 [c0000000018bff90] [c00000000000d984] start_here_common+0x1c/0x98 delaying default hugetlb page size initialization implies the kernel will initialize pageblock_order to (MAX_ORDER - 1) which is not an optimal value for mobility grouping. IIUC we always had this issue. But it was not a problem for hash translation mode because (MAX_ORDER - 1) is the same as HUGETLB_PAGE_ORDER (8) in the case of hash (16MB). With radix, HUGETLB_PAGE_ORDER will be 5 (2M size) and hence pageblock_order should be 5 instead of 8. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220211065215.101767-1-aneesh.kumar@linux.ibm.com	2022-02-12 22:47:44 +11:00
Nathan Lynch	92e6dc257b	powerpc/pseries: make pseries_devicetree_update() static pseries_devicetree_update() has only one call site, in the same file in which it is defined. Make it static. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220207221247.354454-1-nathanl@linux.ibm.com	2022-02-12 22:47:44 +11:00
Christophe Leroy	692b21d780	powerpc/vdso: Move cvdso_call macro into gettimeofday.S Now that gettimeofday.S is unique, move cvdso_call macro into that file which is the only user. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/72720359d4c58e3a3b96dd74952741225faac3de.1642782130.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:44 +11:00
Christophe Leroy	9b97bea900	powerpc/vdso: Remove cvdso_call_time macro cvdso_call_time macro is very similar to cvdso_call macro. Add a call_time argument to cvdso_call which is 0 by default and set to 1 when using cvdso_call to call __c_kernel_time(). Return returned value as is with CR[SO] cleared when it is used for time(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/837a260ad86fc1ce297a562c2117fd69be5f7b5c.1642782130.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:43 +11:00
Christophe Leroy	fd1feade75	powerpc/vdso: Merge vdso64 and vdso32 into a single directory merge vdso64 into vdso32 and rename it vdso. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4dbe05cc130f6a0858d09ac72e436c373cb08b70.1642782130.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:43 +11:00
Christophe Leroy	d88378d8d2	powerpc/vdso: Rework VDSO32 makefile to add a prefix to object files In order to merge vdso32 and vdso64 build in following patch, rework Makefile is order to add -32 suffix to VDSO32 object files. Also change sigtramp.S to sigtramp32.S as VDSO64 sigtramp.S is too different to be squashed into VDSO32 sigtramp.S at the first place. gen_vdso_offsets.sh also becomes gen_vdso32_offsets.sh Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0c421b704a57b228e75a891512568339c53667ad.1642782130.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:43 +11:00
Christophe Leroy	f061fb03ee	powerpc/vdso: augment VDSO32 functions to support 64 bits build VDSO64 cacheflush.S datapage.S gettimeofday.S and vgettimeofday.c are very similar to their VDSO32 counterpart. VDSO32 counterpart is already more complete than the VDSO64 version as it supports both PPC32 vdso and 32 bits VDSO for PPC64. Use compat macros wherever necessary in PPC32 files so that they can also be used to build VDSO64. vdso64/note.S is already a link to vdso32/note.S so no change is required. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c2cbb8f046b7efc251053521dc39b752795e26b7.1642782130.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:43 +11:00
Christophe Leroy	6836f09903	powerpc/lib/sstep: use truncate_if_32bit() Use truncate_if_32bit() when possible instead of open coding. truncate_if_32bit() returns an unsigned long, so don't use it when a signed value is expected. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7e1c07123f13156d4a27991a2e2694fb584bc068.1642752375.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:43 +11:00
Christophe Leroy	7c3bba9199	powerpc/lib/sstep: Remove unneeded #ifdef __powerpc64__ MSR_64BIT is always defined, no need to hide code using MSR_64BIT inside an #ifdef __powerpc64__ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ee61b693bc7e046eed1abb7a34909eb4878a9442.1642752375.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:43 +11:00
Christophe Leroy	67484e0de9	powerpc/lib/sstep: Use l1_dcache_bytes() instead of opencoding Don't opencode dcache size retrieval based on whether that's ppc32 or ppc64. Use l1_dcache_bytes() Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6c608fd4795e2d8ea1a0a449405a0087f76d8bb3.1642752375.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:42 +11:00
Christophe Leroy	9d44d1bd93	powerpc: Use the newly added is_tsk_32bit_task() macro Two places deserve using the macro is_tsk_32bit_task() added by commit `252745240b` ("powerpc/audit: Fix syscall_get_arch()") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7304a889dbe885aefad8a8333673c81ee4b8f7a6.1642751874.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:42 +11:00
Christophe Leroy	0670010f3b	powerpc/32s: Enable STRICT_MODULE_RWX for the 603 core The book3s/32 MMU doesn't support per page execution protection and doesn't support RO protection for kernel pages. However, on the 603 which implements software loaded TLBs, execution protection is honored by the TLB Miss handler which doesn't load Instruction TLB for non executable pages. And RO protection is honored by clearing the C bit for RO pages, leading to DSI. So on the 603, STRICT_MODULE_RWX is possible without much effort. Don't disable STRICT_MODULE_RWX on book3s/32 and print a warning in case STRICT_MODULE_RWX has been selected and the platform has a Hardware HASH MMU. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1e6162f334167e75f1140082932e3a354b16daba.1642413973.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:42 +11:00
Christophe Leroy	a8936569a0	powerpc/bpf: Always reallocate BPF_REG_5, BPF_REG_AX and TMP_REG when possible BPF_REG_5, BPF_REG_AX and TMP_REG are mapped on non volatile registers because there are not enough volatile registers, but they don't need to be preserved on function calls. So when some volatile registers become available, those registers can always be reallocated regardless of whether SEEN_FUNC is set or not. Suggested-by: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b04c246874b716911139c04bc004b3b14eed07ef.1641817763.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:42 +11:00
Christophe Leroy	f222ab83df	powerpc: Add set_memory_{p/np}() and remove set_memory_attr() set_memory_attr() was implemented by commit `4d1755b6a7` ("powerpc/mm: implement set_memory_attr()") because the set_memory_xx() couldn't be used at that time to modify memory "on the fly" as explained it the commit. But set_memory_attr() uses set_pte_at() which leads to warnings when CONFIG_DEBUG_VM is selected, because set_pte_at() is unexpected for updating existing page table entries. The check could be bypassed by using __set_pte_at() instead, as it was the case before commit `c988cfd38e` ("powerpc/32: use set_memory_attr()") but since commit `9f7853d760` ("powerpc/mm: Fix set_memory_() against concurrent accesses") it is now possible to use set_memory_xx() functions to update page table entries "on the fly" because the update is now atomic. For DEBUG_PAGEALLOC we need to clear and set back _PAGE_PRESENT. Add set_memory_np() and set_memory_p() for that. Replace all uses of set_memory_attr() by the relevant set_memory_xx() and remove set_memory_attr(). Fixes: `c988cfd38e` ("powerpc/32: use set_memory_attr()") Cc: stable@vger.kernel.org Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Maxime Bizon <mbizon@freebox.fr> Reviewed-by: Russell Currey <ruscur@russell.cc> Depends-on: `9f7853d760` ("powerpc/mm: Fix set_memory_() against concurrent accesses") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/cda2b44b55c96f9ac69fa92e68c01084ec9495c5.1640344012.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:42 +11:00
Christophe Leroy	a4c182ecf3	powerpc/set_memory: Avoid spinlock recursion in change_page_attr() Commit `1f9ad21c3b` ("powerpc/mm: Implement set_memory() routines") included a spin_lock() to change_page_attr() in order to safely perform the three step operations. But then commit `9f7853d760` ("powerpc/mm: Fix set_memory_*() against concurrent accesses") modify it to use pte_update() and do the operation safely against concurrent access. In the meantime, Maxime reported some spinlock recursion. [ 15.351649] BUG: spinlock recursion on CPU#0, kworker/0:2/217 [ 15.357540] lock: init_mm+0x3c/0x420, .magic: dead4ead, .owner: kworker/0:2/217, .owner_cpu: 0 [ 15.366563] CPU: 0 PID: 217 Comm: kworker/0:2 Not tainted 5.15.0+ #523 [ 15.373350] Workqueue: events do_free_init [ 15.377615] Call Trace: [ 15.380232] [e4105ac0] [800946a4] do_raw_spin_lock+0xf8/0x120 (unreliable) [ 15.387340] [e4105ae0] [8001f4ec] change_page_attr+0x40/0x1d4 [ 15.393413] [e4105b10] [801424e0] __apply_to_page_range+0x164/0x310 [ 15.400009] [e4105b60] [80169620] free_pcp_prepare+0x1e4/0x4a0 [ 15.406045] [e4105ba0] [8016c5a0] free_unref_page+0x40/0x2b8 [ 15.411979] [e4105be0] [8018724c] kasan_depopulate_vmalloc_pte+0x6c/0x94 [ 15.418989] [e4105c00] [801424e0] __apply_to_page_range+0x164/0x310 [ 15.425451] [e4105c50] [80187834] kasan_release_vmalloc+0xbc/0x134 [ 15.431898] [e4105c70] [8015f7a8] __purge_vmap_area_lazy+0x4e4/0xdd8 [ 15.438560] [e4105d30] [80160d10] _vm_unmap_aliases.part.0+0x17c/0x24c [ 15.445283] [e4105d60] [801642d0] __vunmap+0x2f0/0x5c8 [ 15.450684] [e4105db0] [800e32d0] do_free_init+0x68/0x94 [ 15.456181] [e4105dd0] [8005d094] process_one_work+0x4bc/0x7b8 [ 15.462283] [e4105e90] [8005d614] worker_thread+0x284/0x6e8 [ 15.468227] [e4105f00] [8006aaec] kthread+0x1f0/0x210 [ 15.473489] [e4105f40] [80017148] ret_from_kernel_thread+0x14/0x1c Remove the read / modify / write sequence to make the operation atomic and remove the spin_lock() in change_page_attr(). To do the operation atomically, we can't use pte modification helpers anymore. Because all platforms have different combination of bits, it is not easy to use those bits directly. But all have the _PAGE_KERNEL_{RO/ROX/RW/RWX} set of flags. All we need it to compare two sets to know which bits are set or cleared. For instance, by comparing _PAGE_KERNEL_ROX and _PAGE_KERNEL_RO you know which bit gets cleared and which bit get set when changing exec permission. Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/all/20211212112152.GA27070@sakura/ Link: https://lore.kernel.org/r/43c3c76a1175ae6dc1a3d3b5c3f7ecb48f683eea.1640344012.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:42 +11:00
Christophe Leroy	4ee83a2cfb	powerpc/ftrace: Remove ftrace_32.S Functions in ftrace_32.S are common with PPC64. Reuse the ones defined for PPC64 with slight modification when required. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Squash in fixup diff from Christophe] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5e837fc190504c4ef834272e70d60ae33f175d49.1640017960.git.christophe.leroy@csgroup.eu	2022-02-12 22:47:28 +11:00
Ard Biesheuvel	297565aa22	lib/xor: make xor prototypes more friendly to compiler vectorization Modern compilers are perfectly capable of extracting parallelism from the XOR routines, provided that the prototypes reflect the nature of the input accurately, in particular, the fact that the input vectors are expected not to overlap. This is not documented explicitly, but is implied by the interchangeability of the various C routines, some of which use temporary variables while others don't: this means that these routines only behave identically for non-overlapping inputs. So let's decorate these input vectors with the __restrict modifier, which informs the compiler that there is no overlap. While at it, make the input-only vectors pointer-to-const as well. Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/563 Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-02-11 20:39:39 +11:00
Jakub Kicinski	1127170d45	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2022-02-09 We've added 126 non-merge commits during the last 16 day(s) which contain a total of 201 files changed, 4049 insertions(+), 2215 deletions(-). The main changes are: 1) Add custom BPF allocator for JITs that pack multiple programs into a huge page to reduce iTLB pressure, from Song Liu. 2) Add __user tagging support in vmlinux BTF and utilize it from BPF verifier when generating loads, from Yonghong Song. 3) Add per-socket fast path check guarding from cgroup/BPF overhead when used by only some sockets, from Pavel Begunkov. 4) Continued libbpf deprecation work of APIs/features and removal of their usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko and various others. 5) Improve BPF instruction set documentation by adding byte swap instructions and cleaning up load/store section, from Christoph Hellwig. 6) Switch BPF preload infra to light skeleton and remove libbpf dependency from it, from Alexei Starovoitov. 7) Fix architecture-agnostic macros in libbpf for accessing syscall arguments from BPF progs for non-x86 architectures, from Ilya Leoshkevich. 8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be of 16-bit field with anonymous zero padding, from Jakub Sitnicki. 9) Add new bpf_copy_from_user_task() helper to read memory from a different task than current. Add ability to create sleepable BPF iterator progs, from Kenny Yu. 10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski. 11) Generate temporary netns names for BPF selftests to avoid naming collisions, from Hangbin Liu. 12) Implement bpf_core_types_are_compat() with limited recursion for in-kernel usage, from Matteo Croce. 13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5 to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor. 14) Misc minor fixes to libbpf and selftests from various folks. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits) selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide libbpf: Fix compilation warning due to mismatched printf format selftests/bpf: Test BPF_KPROBE_SYSCALL macro libbpf: Add BPF_KPROBE_SYSCALL macro libbpf: Fix accessing the first syscall argument on s390 libbpf: Fix accessing the first syscall argument on arm64 libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390 libbpf: Fix accessing syscall arguments on riscv libbpf: Fix riscv register names libbpf: Fix accessing syscall arguments on powerpc selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro libbpf: Add PT_REGS_SYSCALL_REGS macro selftests/bpf: Fix an endianness issue in bpf_syscall_macro test bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE bpf: Fix leftover header->pages in sparc and powerpc code. libbpf: Fix signedness bug in btf_dump_array_data() selftests/bpf: Do not export subtest as standalone test bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures ... ==================== Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-09 18:40:56 -08:00
Song Liu	0f350231b5	bpf: Fix leftover header->pages in sparc and powerpc code. Replace header->pages * PAGE_SIZE with new header->size. Fixes: `ed2d9e1a26` ("bpf: Use size instead of pages in bpf_binary_header") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220208220509.4180389-2-song@kernel.org	2022-02-08 14:52:05 -08:00
Christophe Leroy	41315494be	powerpc/ftrace: Prepare ftrace_64_mprofile.S for reuse by PPC32 PPC64 mprofile versions and PPC32 are very similar. Modify PPC64 version so that if can be reused for PPC32. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/82a732915dc71ee766e31809350939331944006d.1640017960.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:11 +11:00
Christophe Leroy	830213786c	powerpc/ftrace: directly call of function graph tracer by ftrace caller Modify function graph tracer to be handled directly by the standard ftrace caller. This is made possible as powerpc now supports CONFIG_DYNAMIC_FTRACE_WITH_ARGS. This change simplifies the call of function graph ftrace. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/04d196585ff81bde06a000bd9c633a33a5b21130.1640017960.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:11 +11:00
Christophe Leroy	0c81ed5ed4	powerpc/ftrace: Refactor ftrace_{en/dis}able_ftrace_graph_caller ftrace_enable_ftrace_graph_caller() and ftrace_disable_ftrace_graph_caller() have common code. They will have even more common code after following patch. Refactor into a single ftrace_modify_ftrace_graph_caller() function. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f37785a531f1a8f201e1b3da45997a5c77e9d820.1640017960.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:11 +11:00
Christophe Leroy	40b035efe2	powerpc/ftrace: Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS. It accelerates the call of livepatching. Also note that powerpc being the last one to convert to CONFIG_DYNAMIC_FTRACE_WITH_ARGS, it will now be possible to remove klp_arch_set_pc() on all architectures. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5831f711a778fcd6eb51eb5898f1faae4378b35b.1640017960.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:11 +11:00
Christophe Leroy	c75388a8ce	powerpc/ftrace: Prepare PPC64's ftrace_caller() for CONFIG_DYNAMIC_FTRACE_WITH_ARGS In order to implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS, change ftrace_caller() to handle LIVEPATCH the same way as frace_caller_regs(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/850817333cc76593699032e8e9a70d8c36e1af1e.1640017960.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:11 +11:00
Christophe Leroy	d95bf254be	powerpc/ftrace: Prepare PPC32's ftrace_caller() for CONFIG_DYNAMIC_FTRACE_WITH_ARGS In order to implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS, change ftrace_caller() stack layout to match struct pt_regs. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/da9734eba504998fb914aca12131c9f6bf6120a8.1640017960.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:11 +11:00
Christophe Leroy	7bdb478c1d	powerpc/ftrace: Simplify PPC32's return_to_handler() return_to_handler() was copied from PPC64. For PPC32 it just needs to save r3 and r4, and doesn't require any nop after the bl. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/aab39b77b34fb2c4ed08ed01c547b6ed13643788.1640017960.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:11 +11:00
Christophe Leroy	7875bc9b07	powerpc/ftrace: Don't save again LR in ftrace_regs_caller() on PPC32 PPC32 mcount() caller already saves LR on stack, no need to save it again. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/eadcfc770b4f1e35535ffb85e28e858a2c31dec4.1640017960.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:10 +11:00
Christophe Leroy	a4520b2527	powerpc/ftrace: Add support for livepatch to PPC32 PPC64 needs some special logic to properly set up the TOC. See commit `85baa09549` ("powerpc/livepatch: Add live patching support on ppc64le") for details. PPC32 doesn't have TOC so it doesn't need that logic, so adding LIVEPATCH support is straight forward. Add CONFIG_LIVEPATCH_64 and move livepatch stack logic into that item. Livepatch sample modules all work. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/63cb094125b6a6038c65eeac2abaabbabe63addd.1640017960.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:10 +11:00
Christophe Leroy	0c850965d6	powerpc/module_32: Fix livepatching for RO modules Livepatching a loaded module involves applying relocations through apply_relocate_add(), which attempts to write to read-only memory when CONFIG_STRICT_MODULE_RWX=y. R_PPC_ADDR16_LO, R_PPC_ADDR16_HI, R_PPC_ADDR16_HA and R_PPC_REL24 are the types generated by the kpatch-build userspace tool or klp-convert kernel tree observed applying a relocation to a post-init module. Use patch_instruction() to patch those relocations. Commit `8734b41b3e` ("powerpc/module_64: Fix livepatching for RO modules") did similar change in module_64. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d5697157cb7dba3927e19aa17c915a83bc550bb2.1640017960.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:10 +11:00
Christophe Leroy	27e21e8f12	powerpc/32: Remove _ENTRY() macro _ENTRY() is now redundant with _GLOBAL(). Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/62a35f8dde2bb74c8d0d7a5430cce07a5a3a6fb6.1638273868.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:10 +11:00
Christophe Leroy	1231816373	powerpc/32: Remove remaining .stabs annotations STABS debug format has been superseded long time ago by DWARF. Remove the few remaining .stabs annotations from old 32 bits code. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/68932ec2ba6b868d35006b96e90f0890f3da3c05.1638273868.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:10 +11:00
Christophe Leroy	66ada29078	powerpc/corenet: Change criteria to set MPIC_ENABLE_COREINT Don't use ppc_md function comparison. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c8ef82ee5f2713f4c36eb5d2d49b0905c7472801.1630667612.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:10 +11:00
Christophe Leroy	fae65a9ac8	powerpc/mpc86xx_hpcn: Remove obsolete statement Comment says "Delete this in 2.6.27". Do so now. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a47bb6a69c68156bc2d555152dab5a23733856b7.1630667612.git.christophe.leroy@csgroup.eu	2022-02-07 21:03:09 +11:00
Christophe Leroy	e6d03ac156	powerpc/machdep: Move sys_ctrler_t definition into pmac_feature.h sys_ctrler_t definitions are tied to pmac. Move it into pmac_feature.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Move to pmac_feature.h to fix some build errors] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7dd5ead4bbca749e2da089ff6fe2b1878d6bf40e.1630667612.git.christophe.leroy@csgroup.eu	2022-02-07 21:02:20 +11:00
Christophe Leroy	9bb162fa26	powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE Allthough kernel text is always mapped with BATs, we still have inittext mapped with pages, so TLB miss handling is required when CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is set. The final solution should be to set a BAT that also maps inittext but that BAT then needs to be cleared at end of init, and it will require more changes to be able to do it properly. As DEBUG_PAGEALLOC or KFENCE are debugging, performance is not a big deal so let's fix it simply for now to enable easy stable application. Fixes: `035b19a15a` ("powerpc/32s: Always map kernel text and rodata with BATs") Cc: stable@vger.kernel.org # v5.11+ Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/aea33b4813a26bdb9378b5f273f00bd5d4abe240.1638857364.git.christophe.leroy@csgroup.eu	2022-02-07 17:18:47 +11:00
Christophe Leroy	d6a6c725a2	powerpc/machdep: Remove CONFIG_PPC_HAS_FEATURE_CALLS Last user was removed by commit `7bbd827750` ("[PATCH] ppc64: very basic desktop g5 sound support"). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/803779fffb4ee0801746b2173d37cea3b273f821.1630667612.git.christophe.leroy@csgroup.eu	2022-02-07 16:43:35 +11:00
Sourabh Jain	7c5ed82b80	powerpc: Set crashkernel offset to mid of RMA region On large config LPARs (having 192 and more cores), Linux fails to boot due to insufficient memory in the first memblock. It is due to the memory reservation for the crash kernel which starts at 128MB offset of the first memblock. This memory reservation for the crash kernel doesn't leave enough space in the first memblock to accommodate other essential system resources. The crash kernel start address was set to 128MB offset by default to ensure that the crash kernel get some memory below the RMA region which is used to be of size 256MB. But given that the RMA region size can be 512MB or more, setting the crash kernel offset to mid of RMA size will leave enough space for the kernel to allocate memory for other system resources. Since the above crash kernel offset change is only applicable to the LPAR platform, the LPAR feature detection is pushed before the crash kernel reservation. The rest of LPAR specific initialization will still be done during pseries_probe_fw_features as usual. This patch is dependent on changes to paca allocation for boot CPU. It expect boot CPU to discover 1T segment support which is introduced by the patch posted here: https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html Reported-by: Abdul haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220204085601.107257-1-sourabhjain@linux.ibm.com	2022-02-07 15:26:12 +11:00
Christophe Leroy	4291d085b0	powerpc/32s: Make pte_update() non atomic on 603 core On 603 core, TLB miss handler don't do any change to the page tables so pte_update() doesn't need to be atomic. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/cc89d3c11fc9c742d0df3454a657a3a00be24046.1643538554.git.christophe.leroy@csgroup.eu	2022-02-03 22:57:19 +11:00
Christophe Leroy	535bda36db	powerpc/nohash: Remove pte_same() arch/powerpc/include/asm/nohash/{32/64}/pgtable.h has #define __HAVE_ARCH_PTE_SAME #define pte_same(A,B) ((pte_val(A) ^ pte_val(B)) == 0) include/linux/pgtable.h has #ifndef __HAVE_ARCH_PTE_SAME static inline int pte_same(pte_t pte_a, pte_t pte_b) { return pte_val(pte_a) == pte_val(pte_b); } #endif Remove the powerpc version which is similar to the generic one. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/83c97bd58a3596ef1b0ff28b1e41fd492d005520.1643616989.git.christophe.leroy@csgroup.eu	2022-02-03 22:57:00 +11:00
Christophe Leroy	4634bf4455	powerpc/603: Clear C bit when PTE is read only On book3s/32 MMU, PP bits don't offer kernel RO protection, kernel pages are always RW. However, on the 603 a page fault is always generated when the C bit (change bit = dirty bit) is not set. Enforce kernel RO protection by clearing C bit in TLB miss handler when the page doesn't have _PAGE_RW flag. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bbb13848ff0100a76ee9ea95118058c30ae95f2c.1643613343.git.christophe.leroy@csgroup.eu	2022-02-03 22:56:44 +11:00
Christophe Leroy	9872cbfb45	powerpc/603: Remove outdated comment Since commit `84de6ab0e9` ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.") page table is not updated anymore by TLB miss handlers. Remove the comment. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/38b1ffefd2146fa56bf8aa605d476ad9736bbb37.1643613296.git.christophe.leroy@csgroup.eu	2022-02-03 22:38:13 +11:00
Chen Jingwen	dd75080aa8	powerpc/kasan: Fix early region not updated correctly The shadow's page table is not updated when PTE_RPN_SHIFT is 24 and PAGE_SHIFT is 12. It not only causes false positives but also false negative as shown the following text. Fix it by bringing the logic of kasan_early_shadow_page_entry here. 1. False Positive: ================================================================== BUG: KASAN: vmalloc-out-of-bounds in pcpu_alloc+0x508/0xa50 Write of size 16 at addr f57f3be0 by task swapper/0/1 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.0-12267-gdebe436e77c7 #1 Call Trace: [c80d1c20] [c07fe7b8] dump_stack_lvl+0x4c/0x6c (unreliable) [c80d1c40] [c02ff668] print_address_description.constprop.0+0x88/0x300 [c80d1c70] [c02ff45c] kasan_report+0x1ec/0x200 [c80d1cb0] [c0300b20] kasan_check_range+0x160/0x2f0 [c80d1cc0] [c03018a4] memset+0x34/0x90 [c80d1ce0] [c0280108] pcpu_alloc+0x508/0xa50 [c80d1d40] [c02fd7bc] __kmem_cache_create+0xfc/0x570 [c80d1d70] [c0283d64] kmem_cache_create_usercopy+0x274/0x3e0 [c80d1db0] [c2036580] init_sd+0xc4/0x1d0 [c80d1de0] [c00044a0] do_one_initcall+0xc0/0x33c [c80d1eb0] [c2001624] kernel_init_freeable+0x2c8/0x384 [c80d1ef0] [c0004b14] kernel_init+0x24/0x170 [c80d1f10] [c001b26c] ret_from_kernel_thread+0x5c/0x64 Memory state around the buggy address: f57f3a80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f57f3b00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 >f57f3b80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ^ f57f3c00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f57f3c80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ================================================================== 2. False Negative (with KASAN tests): ================================================================== Before fix: ok 45 - kmalloc_double_kzfree # vmalloc_oob: EXPECTATION FAILED at lib/test_kasan.c:1039 KASAN failure expected in "((volatile char *)area)[3100]", but none occurred not ok 46 - vmalloc_oob not ok 1 - kasan ================================================================== After fix: ok 1 - kasan Fixes: `cbd18991e2` ("powerpc/mm: Fix an Oops in kasan_mmu_init()") Cc: stable@vger.kernel.org # 5.4.x Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211229035226.59159-1-chenjingwen6@huawei.com	2022-02-03 22:37:44 +11:00
Christophe JAILLET	e414e2938e	powerpc/xive: Add some error handling code to 'xive_spapr_init()' 'xive_irq_bitmap_add()' can return -ENOMEM. In this case, we should free the memory already allocated and return 'false' to the caller. Also add an error path which undoes the 'tima = ioremap(...)' Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/564998101804886b151235c8a9f93020923bfd2c.1643718324.git.christophe.jaillet@wanadoo.fr	2022-02-03 22:36:59 +11:00
Athira Rajeev	0198322379	powerpc/perf: Don't use perf_hw_context for trace IMC PMU Trace IMC (In-Memory collection counters) in powerpc is useful for application level profiling. For trace_imc, presently task context (task_ctx_nr) is set to perf_hw_context. But perf_hw_context should only be used for CPU PMU. See commit `2665784850` ("perf/core: Verify we have a single perf_hw_context PMU"). So for trace_imc, even though it is per thread PMU, it is preferred to use sw_context in order to be able to do application level monitoring. Hence change the task_ctx_nr to use perf_sw_context. Fixes: `012ae24484` ("powerpc/perf: Trace imc PMU functions") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Update subject & incorporate notes into change log, reflow comment] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220202041837.65968-1-atrajeev@linux.vnet.ibm.com	2022-02-03 22:32:54 +11:00
Wedson Almeida Filho	d4be60fe66	powerpc/module_64: use module_init_section instead of patching names Without this patch, module init sections are disabled by patching their names in arch-specific code when they're loaded (which prevents code in layout_sections from finding init sections). This patch uses the new arch-specific module_init_section instead. This allows modules that have .init_array sections to have the initialisers properly called (on load, before init). Without this patch, the initialisers are not called because .init_array is renamed to _init_array, and thus isn't found by code in find_module_sections(). Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220202055123.2144842-1-wedsonaf@google.com	2022-02-03 22:20:37 +11:00
Julia Lawall	925f76c557	powerpc/spufs: adjust list element pointer type Other uses of &gang->aff_list_head, eg in spufs_assert_affinity, indicate that the list elements have type spu_context, not spu as used here. Change the type of tmp accordingly. This has no impact on the execution, because tmp is not used in the body of the loop. Fixes: `c5fc8d2a92` ("[CELL] cell: add placement computation for scheduling of affinity contexts") Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1588929176-28527-1-git-send-email-Julia.Lawall@inria.fr	2022-02-03 21:40:32 +11:00
Bhaskar Chowdhury	a1c4140933	powerpc/epapr: Fix parmeters typo s/parmeters/parameters/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210320213932.22697-1-unixbhaskar@gmail.com	2022-02-03 21:35:56 +11:00
Fabiano Rosas	b53c861059	powerpc: Fix debug print in smp_setup_cpu_maps When figuring out the number of threads, the debug message prints "1 thread" for the first iteration of the loop, instead of the actual number of threads calculated from the length of the "ibm,ppc-interrupt-server#s" property. * /cpus/PowerPC,POWER8@20... ibm,ppc-interrupt-server#s -> 1 threads <--- WRONG thread 0 -> cpu 0 (hard id 32) thread 1 -> cpu 1 (hard id 33) thread 2 -> cpu 2 (hard id 34) thread 3 -> cpu 3 (hard id 35) thread 4 -> cpu 4 (hard id 36) thread 5 -> cpu 5 (hard id 37) thread 6 -> cpu 6 (hard id 38) thread 7 -> cpu 7 (hard id 39) * /cpus/PowerPC,POWER8@28... ibm,ppc-interrupt-server#s -> 8 threads thread 0 -> cpu 8 (hard id 40) thread 1 -> cpu 9 (hard id 41) thread 2 -> cpu 10 (hard id 42) thread 3 -> cpu 11 (hard id 43) thread 4 -> cpu 12 (hard id 44) thread 5 -> cpu 13 (hard id 45) thread 6 -> cpu 14 (hard id 46) thread 7 -> cpu 15 (hard id 47) (...) Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210120181847.952106-1-farosas@linux.ibm.com	2022-02-03 20:24:59 +11:00
Fabiano Rosas	4feb74aa64	KVM: PPC: Decrement module refcount if init_vm fails We increment the reference count for KVM-HV/PR before the call to kvmppc_core_init_vm. If that function fails we need to decrement the refcount. Also remove the check on kvm_ops->owner because try_module_get can handle a NULL module. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125155735.1018683-5-farosas@linux.ibm.com	2022-02-03 16:50:44 +11:00
Fabiano Rosas	175be7e580	KVM: PPC: Book3S HV: Free allocated memory if module init fails The module's exit function is not called when the init fails, we need to do cleanup before returning. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125155735.1018683-4-farosas@linux.ibm.com	2022-02-03 16:50:44 +11:00
Fabiano Rosas	c5d0d77b45	KVM: PPC: Book3S HV: Delay setting of kvm ops Delay the setting of kvm_hv_ops until after all init code has completed. This avoids leaving the ops still accessible if the init fails. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125155735.1018683-3-farosas@linux.ibm.com	2022-02-03 16:50:44 +11:00
Fabiano Rosas	69ab6ac380	KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init The return of the function is being shadowed by the call to kvmppc_uvmem_init. Fixes: `ca9f494267` ("KVM: PPC: Book3S HV: Support for running secure guests") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125155735.1018683-2-farosas@linux.ibm.com	2022-02-03 16:50:44 +11:00
Michael Ellerman	961f649fb3	powerpc/ptdump: Fix sparse warning in hashpagetable.c As reported by sparse: arch/powerpc/mm/ptdump/hashpagetable.c:264:29: warning: restricted __be64 degrades to integer arch/powerpc/mm/ptdump/hashpagetable.c:265:49: warning: restricted __be64 degrades to integer arch/powerpc/mm/ptdump/hashpagetable.c:267:36: warning: incorrect type in assignment (different base types) arch/powerpc/mm/ptdump/hashpagetable.c:267:36: expected unsigned long long [usertype] arch/powerpc/mm/ptdump/hashpagetable.c:267:36: got restricted __be64 [usertype] v arch/powerpc/mm/ptdump/hashpagetable.c:268:36: warning: incorrect type in assignment (different base types) arch/powerpc/mm/ptdump/hashpagetable.c:268:36: expected unsigned long long [usertype] arch/powerpc/mm/ptdump/hashpagetable.c:268:36: got restricted __be64 [usertype] r The values returned by plpar_pte_read_4() are CPU endian, not __be64, so assigning them to struct hash_pte confuses sparse. As a minimal fix open code a struct to hold the values with CPU endian types. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220202053039.691917-1-mpe@ellerman.id.au	2022-02-02 20:32:11 +11:00
Michael Ellerman	2e7f1e2b30	powerpc/64: Move paca allocation later in boot Mahesh & Sourabh identified two problems[1][2] with ppc64_bolted_size() and paca allocation. The first is that on a Radix capable machine but with "disable_radix" on the command line, there is a window during early boot where early_radix_enabled() is true, even though it will later become false. early_init_devtree: <- early_radix_enabled() = false early_init_dt_scan_cpus: <- early_radix_enabled() = false ... check_cpu_pa_features: <- early_radix_enabled() = false ... ^ <- early_radix_enabled() = TRUE allocate_paca: \| <- early_radix_enabled() = TRUE ... \| ppc64_bolted_size: \| <- early_radix_enabled() = TRUE if (early_radix_enabled())\| <- early_radix_enabled() = TRUE return ULONG_MAX; \| ... \| ... \| <- early_radix_enabled() = TRUE ... \| <- early_radix_enabled() = TRUE mmu_early_init_devtree() V ... <- early_radix_enabled() = false This causes ppc64_bolted_size() to return ULONG_MAX for the boot CPU's paca allocation, even though later it will return a different value. This is not currently a bug because the paca allocation is also limited by the RMA size, but that is very fragile. The second issue is that when using the Hash MMU, when we call ppc64_bolted_size() for the boot CPU's paca allocation, we have not yet detected whether 1T segments are available. That causes ppc64_bolted_size() to return 256MB, even if the machine can actually support up to 1T. This is usually OK, we generally have space below 256MB for one paca, but for a kdump kernel placed above 256MB it causes the boot to fail. At boot we cannot discover all the features of the machine instantaneously, so there will always be some periods where we have incomplete knowledge of the system. However both the above problems stem from the fact that we allocate the boot CPU's paca (and paca pointers array) before we decide which MMU we are using, or discover its exact features. Moving the paca allocation slightly later still can solve both the issues described above, and means for a normal boot we don't do any permanent allocations until after we've discovered the MMU. Note that although we move the boot CPU's paca allocation later, we still have a temporary paca (boot_paca) accessible via r13, so code that does read only access to paca fields is safe. The only risk is that some code writes to the boot_paca, and that write will then be lost when we switch away from the boot_paca later in early_setup(). The additional code that runs before the paca allocation is primarily mmu_early_init_devtree(), which is scanning the device tree and populating globals and cur_cpu_spec with MMU related flags. I do not see any additional code that writes to paca fields. [1]: https://lore.kernel.org/r/20211018084434.217772-2-sourabhjain@linux.ibm.com [2]: https://lore.kernel.org/r/20211018084434.217772-3-sourabhjain@linux.ibm.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220124130544.408675-1-mpe@ellerman.id.au	2022-02-02 20:32:11 +11:00
Maxim Kiselev	5ebb747492	powerpc: dts: t1040rdb: fix ports names for Seville Ethernet switch On board rev A, the network interface labels for the switch ports written on the front panel are different than on rev B and later. This patch fixes network interface names for the switch ports according to labels that are written on the front panel of the board rev B. They start from ETH3 and end at ETH10. This patch also introduces a separate device tree for rev A. The main device tree is supposed to cover rev B and later. Fixes: `e69eb0824d` ("powerpc: dts: t1040rdb: add ports for Seville Ethernet switch") Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com> Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220121091447.3412907-1-bigunclemax@gmail.com	2022-02-02 20:32:10 +11:00
Laurent Dufour	eddaa9a402	powerpc/pseries: read the lpar name from the firmware The LPAR name may be changed after the LPAR has been started in the HMC. In that case lparstat command is not reporting the updated value because it reads it from the device tree which is read at boot time. However this value could be read from RTAS. Adding this value in the /proc/powerpc/lparcfg output allows to read the updated value. However the hypervisor, like Qemu/KVM, may not support this RTAS parameter. In that case the value reported in lparcfg is read from the device tree and so is not updated accordingly. Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> [mpe: Drop doc-comment syntax, change RTAS/DT to lower case, use of_root to fix missing of_node_put(), use of_property_read_string()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220106161339.74656-1-ldufour@linux.ibm.com	2022-02-02 20:32:10 +11:00
Jason Wang	8e0f353a44	powerpc/kvm: no need to initialise statics to 0 Static variables do not need to be initialised to 0, because compiler will initialise all uninitialised statics to 0. Thus, remove the unneeded initialization. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211220030243.603435-1-wangborong@cdjrlc.com	2022-02-02 20:31:25 +11:00
Alexey Kardashevskiy	faf01aef05	KVM: PPC: Merge powerpc's debugfs entry content into generic entry At the moment KVM on PPC creates 4 types of entries under the kvm debugfs: 1) "%pid-%fd" per a KVM instance (for all platforms); 2) "vm%pid" (for PPC Book3s HV KVM); 3) "vm%u_vcpu%u_timing" (for PPC Book3e KVM); 4) "kvm-xive-%p" (for XIVE PPC Book3s KVM, the same for XICS); The problem with this is that multiple VMs per process is not allowed for 2) and 3) which makes it possible for userspace to trigger errors when creating duplicated debugfs entries. This merges all these into 1). This defines kvm_arch_create_kvm_debugfs() similar to kvm_arch_create_vcpu_debugfs(). This defines 2 hooks in kvmppc_ops that allow specific KVM implementations add necessary entries, this adds the _e500 suffix to kvmppc_create_vcpu_debugfs_e500() to make it clear what platform it is for. This makes use of already existing kvm_arch_create_vcpu_debugfs() on PPC. This removes no more used debugfs_dir pointers from PPC kvm_arch structs. This stops removing vcpu entries as once created vcpus stay around for the entire life of a VM and removed when the KVM instance is closed, see commit `d56f5136b0` ("KVM: let kvm_destroy_vm_debugfs clean up vCPU debugfs directories"). Suggested-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220111005404.162219-1-aik@ozlabs.ru	2022-02-02 20:30:26 +11:00
Thierry Reding	d5342fdd16	powerpc: dts: Fix some I2C unit addresses The unit-address for the Maxim MAX1237 ADCs on XPedite5200 boards don't match the value in the "reg" property and cause a DTC warning. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211220134036.683309-1-thierry.reding@gmail.com	2022-01-31 13:45:24 +11:00
Maxim Kiselev	17846485df	powerpc: dts: t104xrdb: fix phy type for FMAN 4/5 T1040RDB has two RTL8211E-VB phys which requires setting of internal delays for correct work. Changing the phy-connection-type property to `rgmii-id` will fix this issue. Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com> Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211230151123.1258321-1-bigunclemax@gmail.com	2022-01-31 13:45:24 +11:00
Tobias Waldekranz	f529edd1b6	powerpc/e500/qemu-e500: allow core to idle without waiting This means an idle guest won't needlessly consume an entire core on the host, waiting for work to show up. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Joachim Wiberg <troglobit@gmail.com> Acked-by: Scott Wood <oss@buserror.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220112112459.1033754-1-troglobit@gmail.com	2022-01-31 13:45:24 +11:00
Michal Suchanek	b2a6f60435	powerpc: add link stack flush mitigation status in debugfs. The link stack flush status is not visible in debugfs. It can be enabled even when count cache flush is disabled. Add separate file for its status. Signed-off-by: Michal Suchanek <msuchanek@suse.de> [mpe: Update for change to link_stack_flush_type] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191127220959.6208-1-msuchanek@suse.de	2022-01-31 13:45:23 +11:00
Sachin Sant	279d1a72c0	powerpc/xive: Export XIVE IPI information for online-only processors. Cédric pointed out that XIVE IPI information exported via sysfs (debug/powerpc/xive) display empty lines for processors which are not online. Switch to using for_each_online_cpu() so that information is displayed for online-only processors. Reported-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Sachin Sant <sachinp@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/164146703333.19039.10920919226094771665.sendpatchset@MacBook-Pro.local	2022-01-31 13:45:23 +11:00
Fabiano Rosas	c1c8a66367	KVM: PPC: Book3s: mmio: Deliver DSI after emulation failure MMIO emulation can fail if the guest uses an instruction that we are not prepared to emulate. Since these instructions can be and most likely are valid ones, this is (slightly) closer to an access fault than to an illegal instruction, so deliver a Data Storage interrupt instead of a Program interrupt. BookE ignores bad faults, so it will keep using a Program interrupt because a DSI would cause a fault loop in the guest. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125215655.1026224-6-farosas@linux.ibm.com	2022-01-31 13:43:00 +11:00
Fabiano Rosas	349fbfe9b9	KVM: PPC: mmio: Return to guest after emulation failure If MMIO emulation fails we don't want to crash the whole guest by returning to userspace. The original commit `bbf45ba57e` ("KVM: ppc: PowerPC 440 KVM implementation") added a todo: /* XXX Deliver Program interrupt to guest. */ and later the commit `d69614a295` ("KVM: PPC: Separate loadstore emulation from priv emulation") added the Program interrupt injection but in another file, so I'm assuming it was missed that this block needed to be altered. Also change the message to a ratelimited one since we're letting the guest run and it could flood the host logs. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125215655.1026224-5-farosas@linux.ibm.com	2022-01-31 13:43:00 +11:00
Fabiano Rosas	3f83150448	KVM: PPC: mmio: Reject instructions that access more than mmio.data size The MMIO interface between the kernel and userspace uses a structure that supports a maximum of 8-bytes of data. Instructions that access more than that need to be emulated in parts. We currently don't have generic support for splitting the emulation in parts and each set of instructions needs to be explicitly included. There's already an error message being printed when a load or store exceeds the mmio.data buffer but we don't fail the emulation until later at kvmppc_complete_mmio_load and even then we allow userspace to make a partial copy of the data, which ends up overwriting some fields of the mmio structure. This patch makes the emulation fail earlier at kvmppc_handle_load\|store, which will send a Program interrupt to the guest. This is better than allowing the guest to proceed with partial data. Note that this was caught in a somewhat artificial scenario using quadword instructions (lq/stq), there's no account of an actual guest in the wild running instructions that are not properly emulated. (While here, remove the "bad MMIO" messages. The caller already has an error message.) Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125215655.1026224-4-farosas@linux.ibm.com	2022-01-31 13:43:00 +11:00
Fabiano Rosas	b99234b918	KVM: PPC: Fix vmx/vsx mixup in mmio emulation The MMIO emulation code for vector instructions is duplicated between VSX and VMX. When emulating VMX we should check the VMX copy size instead of the VSX one. Fixes: `acc9eb9305` ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction ...") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125215655.1026224-3-farosas@linux.ibm.com	2022-01-31 13:42:59 +11:00
Fabiano Rosas	36d014d37d	KVM: PPC: Book3S HV: Stop returning internal values to userspace Our kvm_arch_vcpu_ioctl_run currently returns the RESUME_HOST values to userspace, against the API of the KVM_RUN ioctl which returns 0 on success. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220125215655.1026224-2-farosas@linux.ibm.com	2022-01-31 13:42:59 +11:00
Al Viro	0c9dceb9bb	asm/user.h: killed unused macros Some of them used to be used by libbfd for a.out coredump handling. Seeing that * libbfd has their copies anyway * we don't export them into userland headers * we don't support a.out coredumps anymore let's bury the definitions. They never had in-kernel users anyway... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-01-30 21:17:00 -05:00
Arnd Bergmann	3e17314c22	agp: define proper stubs for empty helpers The empty unmap_page_from_agp() macro causes a warning when building with 'make W=1' on a couple of architectures: drivers/char/agp/generic.c: In function 'agp_generic_destroy_page': drivers/char/agp/generic.c:1265:28: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body] 1265 \| unmap_page_from_agp(page); Change the definitions to a 'do { } while (0)' construct to make these more reliable. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Helge Deller <deller@gmx.de>	2022-01-29 22:24:25 +01:00
Nicholas Piggin	8defc2a5dd	powerpc/64s/interrupt: Fix decrementer storm The decrementer exception can fail to be cleared when the interrupt returns in the case where the decrementer wraps with the next timer still beyond decrementer_max. This results in a decrementer interrupt storm. This is triggerable with small decrementer system with hard and soft watchdogs disabled. Fix this by always programming the decrementer if there was no timer. Fixes: `0faf20a1ad` ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220124143930.3923442-1-npiggin@gmail.com	2022-01-25 16:50:10 +11:00

1 2 3 4 5 ...

24907 Commits