OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Michael Ellerman	41dc056391	powerpc: Add hardware description string Create a hardware description string, which we will use to record various details of the hardware platform we are running on. Print the accumulated description at boot, and use it to set the generic description which is printed in oopses. To begin with add ppc_md.name, aka the "machine description". Example output at boot with the full series applied: Linux version 6.0.0-rc2-gcc-11.1.0-00199-g893f9007a5ce-dirty (michael@alpine1-p1) (powerpc64-linux-gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #844 SMP Thu Sep 29 22:29:53 AEST 2022 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,git-5b4c5a pSeries printk: bootconsole [udbg0] enabled Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220930082709.55830-1-mpe@ellerman.id.au	2022-09-30 18:35:52 +10:00
Michael Ellerman	d91c3f15fc	powerpc/configs: Enable PPC_UV in powernv_defconfig Make sure the ultravisor code at least gets some build testing by enabling it in powernv_defconfig. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220929051517.1903079-1-mpe@ellerman.id.au	2022-09-30 18:35:52 +10:00
Lukas Bulwahn	d210ee3fdf	powerpc/configs: Update config files for removed/renamed symbols Clean up config files by: - removing configs that were deleted in the past - removing configs not in tree and without recently pending patches - adding new configs that are replacements for old configs in the file For some detailed information, see: https://lore.kernel.org/kernel-janitors/20220929090645.1389-1-lukas.bulwahn@gmail.com/ Renamed: - CONFIG_PPC_PTDUMP -> CONFIG_GENERIC_PTDUMP `e084728393` ("powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP") Removed: - CONFIG_BLK_DEV_CRYPTOLOOP `47e9624616` ("block: remove support for cryptoloop and the xor transfer") - CONFIG_CRYPTO_RMD128 `b21b9a5e0a` ("crypto: rmd128 - remove RIPE-MD 128 hash algorithm") - CONFIG_CRYPTO_RMD256 `c15d4167f0` ("crypto: rmd256 - remove RIPE-MD 256 hash algorithm") - CONFIG_CRYPTO_RMD320 `93f6420292` ("crypto: rmd320 - remove RIPE-MD 320 hash algorithm") - CONFIG_CRYPTO_SALSA20 `663f63ee6d` ("crypto: salsa20 - remove Salsa20 stream cipher algorithm") - CONFIG_CRYPTO_TGR192 `87cd723f89` ("crypto: tgr192 - remove Tiger 128/160/192 hash algorithms") - CONFIG_HARDENED_USERCOPY_PAGESPAN `1109a5d907` ("usercopy: Remove HARDENED_USERCOPY_PAGESPAN") - CONFIG_RAPIDIO_TSI568, CONFIG_RAPIDIO_TSI57X `612d490419` ("rapidio: remove not used code about RIO_VID_TUNDRA") - CONFIG_RAW_DRIVER `603e4922f1` ("remove the raw driver") - CONFIG_ROCKETPORT `3b00b6af7a` ("tty: rocket, remove the driver") - CONFIG_ENABLE_MUST_CHECK `1967939462` ("Compiler Attributes: remove CONFIG_ENABLE_MUST_CHECK") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> [mpe: Add documentation of relevant commit for each symbol change] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220929101502.32527-1-lukas.bulwahn@gmail.com	2022-09-30 18:35:52 +10:00
Aneesh Kumar K.V	7dd3a7b90b	powerpc/mm: Fix UBSAN warning reported on hugetlb Powerpc architecture supports 16GB hugetlb pages with hash translation. For 4K page size, this is implemented as a hugepage directory entry at PGD level and for 64K it is implemented as a huge page pte at PUD level With 16GB hugetlb size, offset within a page is greater than 32 bits. Hence switch to use unsigned long type when using hugepd_shift. In order to keep things simpler, we make sure we always use unsigned long type when using hugepd_shift() even though all the hugetlb page size won't require that. The hugetlb_free_p*d_range changes are all related to nohash usage where we can have multiple pgd entries pointing to the same hugepd entries. Hence on book3s64 where we can have > 4GB hugetlb page size we will always find more < next even if we compute the value of more correctly. Hence there is no functional change in this patch except that it fixes the below warning. UBSAN: shift-out-of-bounds in arch/powerpc/mm/hugetlbpage.c:499:21 shift exponent 34 is too large for 32-bit type 'int' CPU: 39 PID: 1673 Comm: a.out Not tainted 6.0.0-rc2-00327-gee88a56e8517-dirty #1 Call Trace: dump_stack_lvl+0x98/0xe0 (unreliable) ubsan_epilogue+0x18/0x70 __ubsan_handle_shift_out_of_bounds+0x1bc/0x390 hugetlb_free_pgd_range+0x5d8/0x600 free_pgtables+0x114/0x290 exit_mmap+0x150/0x550 mmput+0xcc/0x210 do_exit+0x420/0xdd0 do_group_exit+0x4c/0xd0 sys_exit_group+0x24/0x30 system_call_exception+0x250/0x600 system_call_common+0xec/0x250 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Drop generic change to be sent separately, change 1ULL to 1UL] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220908072440.258301-1-aneesh.kumar@linux.ibm.com	2022-09-30 18:35:52 +10:00
Aneesh Kumar K.V	7b31f7dadd	powerpc/mm: Always update max/min_low_pfn in mem_topology_setup() For both CONFIG_NUMA enabled/disabled use mem_topology_setup() to update max/min_low_pfn. This also adds min_low_pfn update to CONFIG_NUMA which was initialized to zero before. (mpe: Though MEMORY_START is == 0 for PPC64=y which is all possible NUMA=y systems) Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220704063851.295482-1-aneesh.kumar@linux.ibm.com	2022-09-30 18:35:52 +10:00
Aneesh Kumar K.V	d368e0c478	powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range This function does the hash page table update. Hence rename it to indicate this better to avoid confusion with flush_pmd_tlb_range() Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Drop unnecessary extern] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220907081941.209501-1-aneesh.kumar@linux.ibm.com	2022-09-30 18:35:52 +10:00
Michael Ellerman	7673335e2a	powerpc: Drops STABS_DEBUG from linker scripts No toolchain we support should be generating stabs debug information anymore. Drop the sections entirely from our linker scripts. We removed all the manual stabs annotations in commit `1231816373` ("powerpc/32: Remove remaining .stabs annotations"). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220928130951.1732983-1-mpe@ellerman.id.au	2022-09-30 18:35:52 +10:00
Michael Ellerman	0c36099642	powerpc/64s: Remove lost/old comment The bulk of this was moved/reworded in: `57f266497d` ("powerpc: Use gas sections for arranging exception vectors") And now appears around line 700 in arch/powerpc/kernel/exceptions-64s.S. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220928130941.1732818-1-mpe@ellerman.id.au	2022-09-30 18:35:51 +10:00
Michael Ellerman	57a8e4b26e	powerpc/64s: Remove old STAB comment This used to be about the 0x4300 handler, but that was moved in commit `da2bc4644c` ("powerpc/64s: Add new exception vector macros"). Note that "STAB" here refers to "Segment Table" not the debug format. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220928130912.1732466-1-mpe@ellerman.id.au	2022-09-30 18:35:51 +10:00
Nicholas Piggin	a08661af4c	powerpc: remove orphan systbl_chk.sh arch/powerpc/kernel/systbl_chk.sh has not been referenced since commit `ab66dcc76d` ("powerpc: generate uapi header and system call table files"). Remove it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220929032120.3592593-1-npiggin@gmail.com	2022-09-30 18:35:51 +10:00
Haren Myneni	f3e5d9e53e	powerpc/pseries/vas: Pass hw_cpu_id to node associativity HCALL Generally the hypervisor decides to allocate a window on different VAS instances. But if user space wishes to allocate on the current VAS instance where the process is executing, the kernel has to pass associativity domain IDs to allocate VAS window HCALL. To determine the associativity domain IDs for the current CPU, smp_processor_id() is passed to node associativity HCALL which may return H_P2 (-55) error during DLPAR CPU event. This is because Linux CPU numbers (smp_processor_id()) are not the same as the hypervisor's view of CPU numbers. Fix the issue by passing hard_smp_processor_id() with VPHN_FLAG_VCPU flag (PAPR 14.11.6.1 H_HOME_NODE_ASSOCIATIVITY). Fixes: `b22f2d88e4` ("powerpc/pseries/vas: Integrate API with open/close windows") Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> [mpe: Update change log to mention Linux vs HV CPU numbers] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/55380253ea0c11341824cd4c0fc6bbcfc5752689.camel@linux.ibm.com	2022-09-30 18:35:51 +10:00
Nicholas Piggin	e4335f5319	KVM: PPC: Book3S HV: Implement scheduling wait interval counters in the VPA PAPR specifies accumulated virtual processor wait intervals that relate to partition scheduling interval times. Implement these counters in the same way as they are repoted by dtl. Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220908132545.4085849-5-npiggin@gmail.com	2022-09-30 18:35:38 +10:00
Michael Ellerman	9511b5a033	Merge branch 'topic/ppc-kvm' into next Merge some KVM commits we are keeping in our topic branch.	2022-09-30 18:35:16 +10:00
Jakub Kicinski	accc3b4a57	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-29 14:30:51 -07:00
Peter Zijlstra	a1ebcd5943	Linux 6.0-rc7 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmMwwY4eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGdlwH/0ESzdb6F9zYWwHR E08har56/IfwjOsn1y+JuHibpwUjzskLzdwIfI5zshSZAQTj5/UyC0P7G/wcYh/Z INh1uHGazmDUkx4O3lwuWLR+mmeUxZRWdq4NTwYDRNPMSiPInVxz+cZJ7y0aPr2e wii7kMFRHgXmX5DMDEwuHzehsJF7vZrp8zBu2DqzVUGnbwD50nPbyMM3H4g9mute fAEpDG0X3+smqMaKL+2rK0W/Av/87r3U8ZAztBem3nsCJ9jT7hqMO1ICcKmFMviA DTERRMwWjPq+mBPE2CiuhdaXvNZBW85Ds81mSddS6MsO6+Tvuzfzik/zSLQJxlBi vIqYphY= =NqG+ -----END PGP SIGNATURE----- Merge branch 'v6.0-rc7' Merge upstream to get RAPTORLAKE_S Signed-off-by: Peter Zijlstra <peterz@infradead.org>	2022-09-29 12:20:50 +02:00
Masahiro Yamada	2df8220cc5	kbuild: build init/built-in.a just once Kbuild builds init/built-in.a twice; first during the ordinary directory descending, second from scripts/link-vmlinux.sh. We do this because UTS_VERSION contains the build version and the timestamp. We cannot update it during the normal directory traversal since we do not yet know if we need to update vmlinux. UTS_VERSION is temporarily calculated, but omitted from the update check. Otherwise, vmlinux would be rebuilt every time. When Kbuild results in running link-vmlinux.sh, it increments the version number in the .version file and takes the timestamp at that time to really fix UTS_VERSION. However, updating the same file twice is a footgun. To avoid nasty timestamp issues, all build artifacts that depend on init/built-in.a are atomically generated in link-vmlinux.sh, where some of them do not need rebuilding. To fix this issue, this commit changes as follows: [1] Split UTS_VERSION out to include/generated/utsversion.h from include/generated/compile.h include/generated/utsversion.h is generated just before the vmlinux link. It is generated under include/generated/ because some decompressors (s390, x86) use UTS_VERSION. [2] Split init_uts_ns and linux_banner out to init/version-timestamp.c from init/version.c init_uts_ns and linux_banner contain UTS_VERSION. During the ordinary directory descending, they are compiled with __weak and used to determine if vmlinux needs relinking. Just before the vmlinux link, they are compiled without __weak to embed the real version and timestamp. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2022-09-29 04:40:15 +09:00
Haren Myneni	335e1a9104	powerpc: Ignore DSI error caused by the copy/paste instruction The data storage interrupt (DSI) error will be generated when the paste operation is issued on the suspended Nest Accelerator (NX) window due to NX state changes. The hypervisor expects the partition to ignore this error during page fault handling. To differentiate DSI caused by an actual HW configuration or by the NX window, a new “ibm,pi-features” type value is defined. Byte 0, bit 3 of pi-attribute-specifier-type is now defined to indicate this DSI error. If this error is not ignored, the user space can get SIGBUS when the NX request is issued. This patch adds changes to read ibm,pi-features property and ignore DSI error during page fault handling if MMU_FTR_NX_DSI is defined. Signed-off-by: Haren Myneni <haren@linux.ibm.com> [mpe: Mention PAPR version in comment] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b9cd844b85eb8f70459109ce1b14e44c4cc85fa7.camel@linux.ibm.com	2022-09-28 22:52:32 +10:00
Michael Ellerman	19c95df127	powerpc: Reverse stack frame marker on little endian On little endian the stack frame marker appears reversed when dumping memory sequentially, as is typical in xmon or gdb, eg: c000000004733e40 0000000000000000 0000000000000000 \|................\| c000000004733e50 0000000000000000 0000000000000000 \|................\| c000000004733e60 0000000000000000 0000000000000000 \|................\| c000000004733e70 5347455200000000 0000000000000000 \|SGER............\| c000000004733e80 a700000000000000 708897f7ff7f0000 \|........p.......\| c000000004733e90 0073428fff7f0000 208997f7ff7f0000 \|.sB..... .......\| c000000004733ea0 0100000000000000 ffffffffffffffff \|................\| c000000004733eb0 0000000000000000 0000000000000000 \|................\| To make it easier to recognise, reverse the value on little endian, so it always appears as "REGS", eg: c000000004733e70 5245475300000000 0000000000000000 \|REGS............\| Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220927150419.1503001-2-mpe@ellerman.id.au	2022-09-28 22:21:17 +10:00
Michael Ellerman	bbd7170908	powerpc: Make stack frame marker upper case Now that the stack frame regs marker is only 32-bits it is not as obvious in memory dumps and easier to miss, eg: c000000004733e40 0000000000000000 0000000000000000 \|................\| c000000004733e50 0000000000000000 0000000000000000 \|................\| c000000004733e60 0000000000000000 0000000000000000 \|................\| c000000004733e70 7367657200000000 0000000000000000 \|sger............\| c000000004733e80 a700000000000000 708897f7ff7f0000 \|........p.......\| c000000004733e90 0073428fff7f0000 208997f7ff7f0000 \|.sB..... .......\| c000000004733ea0 0100000000000000 ffffffffffffffff \|................\| c000000004733eb0 0000000000000000 0000000000000000 \|................\| So make it upper case to make it stand out a bit more: c000000004733e70 5347455200000000 0000000000000000 \|SGER............\| Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220927150419.1503001-1-mpe@ellerman.id.au	2022-09-28 22:21:11 +10:00
Li Huafei	97f88a3d72	powerpc/kprobes: Fix null pointer reference in arch_prepare_kprobe() I found a null pointer reference in arch_prepare_kprobe(): # echo 'p cmdline_proc_show' > kprobe_events # echo 'p cmdline_proc_show+16' >> kprobe_events Kernel attempted to read user page (0) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc000000000050bfc Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 122 Comm: sh Not tainted 6.0.0-rc3-00007-gdcf8e5633e2e #10 NIP: c000000000050bfc LR: c000000000050bec CTR: 0000000000005bdc REGS: c0000000348475b0 TRAP: 0300 Not tainted (6.0.0-rc3-00007-gdcf8e5633e2e) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 88002444 XER: 20040006 CFAR: c00000000022d100 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 ... NIP arch_prepare_kprobe+0x10c/0x2d0 LR arch_prepare_kprobe+0xfc/0x2d0 Call Trace: 0xc0000000012f77a0 (unreliable) register_kprobe+0x3c0/0x7a0 __register_trace_kprobe+0x140/0x1a0 __trace_kprobe_create+0x794/0x1040 trace_probe_create+0xc4/0xe0 create_or_delete_trace_kprobe+0x2c/0x80 trace_parse_run_command+0xf0/0x210 probes_write+0x20/0x40 vfs_write+0xfc/0x450 ksys_write+0x84/0x140 system_call_exception+0x17c/0x3a0 system_call_vectored_common+0xe8/0x278 --- interrupt: 3000 at 0x7fffa5682de0 NIP: 00007fffa5682de0 LR: 0000000000000000 CTR: 0000000000000000 REGS: c000000034847e80 TRAP: 3000 Not tainted (6.0.0-rc3-00007-gdcf8e5633e2e) MSR: 900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 44002408 XER: 00000000 The address being probed has some special: cmdline_proc_show: Probe based on ftrace cmdline_proc_show+16: Probe for the next instruction at the ftrace location The ftrace-based kprobe does not generate kprobe::ainsn::insn, it gets set to NULL. In arch_prepare_kprobe() it will check for: ... prev = get_kprobe(p->addr - 1); preempt_enable_no_resched(); if (prev && ppc_inst_prefixed(ppc_inst_read(prev->ainsn.insn))) { ... If prev is based on ftrace, 'ppc_inst_read(prev->ainsn.insn)' will occur with a null pointer reference. At this point prev->addr will not be a prefixed instruction, so the check can be skipped. Check if prev is ftrace-based kprobe before reading 'prev->ainsn.insn' to fix this problem. Fixes: `b4657f7650` ("powerpc/kprobes: Don't allow breakpoints on suffixes") Signed-off-by: Li Huafei <lihuafei1@huawei.com> [mpe: Trim oops] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220923093253.177298-1-lihuafei1@huawei.com	2022-09-28 22:19:52 +10:00
ye xingchen	5e4952656b	ocxl: Remove the unneeded result variable Return the value opal_npu_spa_clear_cache() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220906072006.337099-1-ye.xingchen@zte.com.cn	2022-09-28 19:22:14 +10:00
ye xingchen	91986d7f03	powerpc/pseries/vas: Remove the unneeded result variable Return the value vas_register_coproc_api() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220825072657.229168-1-ye.xingchen@zte.com.cn	2022-09-28 19:22:14 +10:00
Nathan Lynch	b37ac1894a	powerpc/smp: poll cpu_callin_map more aggressively in __cpu_up() At boot time, it is not necessary to delay between polls of cpu_callin_map when waiting for a kicked CPU to come up. Remove the delay intervals, but preserve the overall deadline (five seconds). At run time, the first poll result is usually negative and we incur a sleeping wait. If we spin on the callin word for a short time first, we can reduce __cpu_up() from dozens of milliseconds to under 1ms in the common case on a P9 LPAR: $ ppc64_cpu --smt=off $ bpftrace -e 'kprobe:__cpu_up { @start[tid] = nsecs; } kretprobe:__cpu_up /@start[tid]/ { @us = hist((nsecs - @start[tid]) / 1000); delete(@start[tid]); }' -c 'ppc64_cpu --smt=on' Before: @us: [16K, 32K) 85 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [32K, 64K) 13 \|@@@@@@@ \| After: @us: [128, 256) 95 \|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\| [256, 512) 3 \|@ \| Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926220250.157022-1-nathanl@linux.ibm.com	2022-09-28 19:22:14 +10:00
Nathan Lynch	b8f3e48834	powerpc/rtas: block error injection when locked down The error injection facility on pseries VMs allows corruption of arbitrary guest memory, potentially enabling a sufficiently privileged user to disable lockdown or perform other modifications of the running kernel via the rtas syscall. Block the PAPR error injection facility from being opened or called when locked down. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM) Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926131643.146502-3-nathanl@linux.ibm.com	2022-09-28 19:22:14 +10:00
Nathan Lynch	99df7a2810	powerpc/pseries: block untrusted device tree changes when locked down The /proc/powerpc/ofdt interface allows the root user to freely alter the in-kernel device tree, enabling arbitrary physical address writes via drivers that could bind to malicious device nodes, thus making it possible to disable lockdown. Historically this interface has been used on the pseries platform to facilitate the runtime addition and removal of processor, memory, and device resources (aka Dynamic Logical Partitioning or DLPAR). Years ago, the processor and memory use cases were migrated to designs that happen to be lockdown-friendly: device tree updates are communicated directly to the kernel from firmware without passing through untrusted user space. I/O device DLPAR via the "drmgr" command in powerpc-utils remains the sole legitimate user of /proc/powerpc/ofdt, but it is already broken in lockdown since it uses /dev/mem to allocate argument buffers for the rtas syscall. So only illegitimate uses of the interface should see a behavior change when running on a locked down kernel. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM) Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926131643.146502-2-nathanl@linux.ibm.com	2022-09-28 19:22:14 +10:00
Pali Rohár	6bd7ff497b	powerpc/udbg: Remove extern function prototypes 'extern' keyword is pointless and deprecated for function prototypes. Signed-off-by: Pali Rohár <pali@kernel.org> Suggested-by: Gabriel Paubert <paubert@iram.es> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220822231751.16973-1-pali@kernel.org	2022-09-28 19:22:14 +10:00
Pali Rohár	110a58b9f9	powerpc/boot: Explicitly disable usage of SPE instructions uImage boot wrapper should not use SPE instructions, like kernel itself. Boot wrapper has already disabled Altivec and VSX instructions but not SPE. Options -mno-spe and -mspe=no already set when compilation of kernel, but not when compiling uImage wrapper yet. Fix it. Cc: stable@vger.kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220827134454.17365-1-pali@kernel.org	2022-09-28 19:22:14 +10:00
Pali Rohár	c102432005	powerpc: Include e500v1_power_isa.dtsi for remaining e500v1 platforms There are still some board device tree files without Power ISA properties which have Freescale e500v1 cores, namely those which are based on Freescale mpc8540, mpc8541, mpc8555 and mpc8560 processors. So include newly introduced e500v1_power_isa.dtsi file in devices tree files with those processors. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220902212103.22534-2-pali@kernel.org	2022-09-28 19:22:14 +10:00
Pali Rohár	37b9345ce7	powerpc: Fix SPE Power ISA properties for e500v1 platforms Commit `2eb2800643` ("powerpc/e500v2: Add Power ISA properties to comply with ePAPR 1.1") introduced new include file e500v2_power_isa.dtsi and should have used it for all e500v2 platforms. But apparently it was used also for e500v1 platforms mpc8540, mpc8541, mpc8555 and mpc8560. e500v1 cores compared to e500v2 do not support double precision floating point SPE instructions. Hence power-isa-sp.fd should not be set on e500v1 platforms, which is in e500v2_power_isa.dtsi include file. Fix this issue by introducing a new e500v1_power_isa.dtsi include file and use it in all e500v1 device tree files. Fixes: `2eb2800643` ("powerpc/e500v2: Add Power ISA properties to comply with ePAPR 1.1") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220902212103.22534-1-pali@kernel.org	2022-09-28 19:22:13 +10:00
Athira Rajeev	b9c001276d	powerpc/perf: Fix branch_filter support for multiple filters For PERF_SAMPLE_BRANCH_STACK sample type, different branch_sample_type ie branch filters are supported. The branch filters are requested via event attribute "branch_sample_type". Multiple branch filters can be passed in event attribute. eg: $ perf record -b -o- -B --branch-filter any,ind_call true None of the Power PMUs support having multiple branch filters at the same time. Branch filters for branch stack sampling is set via MMCRA IFM bits [32:33]. But currently when requesting for multiple filter types, the "perf record" command does not report any error. eg: $ perf record -b -o- -B --branch-filter any,save_type true $ perf record -b -o- -B --branch-filter any,ind_call true The "bhrb_filter_map" function in PMU driver code does the validity check for supported branch filters. But this check is done for single filter. Hence "perf record" will proceed here without reporting any error. Fix power_pmu_event_init() to return EOPNOTSUPP when multiple branch filters are requested in the event attr. After the fix: $ perf record --branch-filter any,ind_call -- ls Error: cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Disha Goel<disgoel@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> [mpe: Tweak comment and change log wording] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921145255.20972-1-atrajeev@linux.vnet.ibm.com	2022-09-28 19:22:13 +10:00
Nicholas Piggin	e1100cee05	powerpc/64s/interrupt: halt early boot interrupts if paca is not set up Ensure r13 is zero from very early in boot until it gets set to the boot paca pointer. This allows early program and mce handlers to halt if there is no valid paca, rather than potentially run off into the weeds. This preserves register and memory contents for low level debugging tools. Nothing could be printed to console at this point in any case because even udbg is only set up after the boot paca is set, so this shouldn't be missed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-6-npiggin@gmail.com	2022-09-28 19:22:13 +10:00
Nicholas Piggin	519b2e317e	powerpc/64: don't set boot CPU's r13 to paca until the structure is set up The idea is to get to the point where if r13 is non-zero, then it should contain a reasonable paca. This can be used in early boot program check and machine check handlers to avoid running off into the weeds if they hit before r13 has a paca. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-5-npiggin@gmail.com	2022-09-28 19:22:13 +10:00
Nicholas Piggin	b830c8754e	powerpc/64: avoid using r13 in relocate relocate() uses r13 in early boot before it is used for the paca. Use a different register for this so r13 is kept unchanged until it is set to the paca pointer. Avoid r14 as well while we're here, there's no reason not to use the volatile registers which is a bit less surprising, and r14 could be used as another fixed reg one day. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-4-npiggin@gmail.com	2022-09-28 19:22:13 +10:00
Nicholas Piggin	2f5182cffa	powerpc/64s: early boot machine check handler Use the early boot interrupt fixup in the machine check handler to allow the machine check handler to run before interrupt endian is set up. Branch to an early boot handler that just does a basic crash, which allows it to run before ppc_md is set up. MSR[ME] is enabled on the boot CPU earlier, and the machine check stack is temporarily set to the middle of the init task stack. This allows machine checks (e.g., due to invalid data access in real mode) to print something useful earlier in boot (as soon as udbg is set up, if CONFIG_PPC_EARLY_DEBUG=y). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-3-npiggin@gmail.com	2022-09-28 19:22:13 +10:00
Nicholas Piggin	bf75a3258a	powerpc/64s/interrupt: move early boot ILE fixup into a macro In preparation for using this sequence in machine check interrupt, move it into a macro, with a small change to make it position independent. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926055620.2676869-2-npiggin@gmail.com	2022-09-28 19:22:12 +10:00
Nicholas Piggin	3569d84bb2	powerpc/64e: provide an addressing macro for use with TOC in alternate register The interrupt entry code carefully saves a minimal number of registers, so in some places the TOC is required, it is loaded into a different register, so provide a macro that can supply an alternate TOC register. This continues to use got addressing because TOC-relative results in "got/toc optimization is not supported" messages by the linker. Having r2 be one of the saved registers and using that for TOC addressing may be the best way to avoid that and switch this to TOC addressing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-6-npiggin@gmail.com	2022-09-28 19:22:12 +10:00
Nicholas Piggin	8e93fb33c8	powerpc/64: provide a helper macro to load r2 with the kernel TOC A later change stops the kernel using r2 and loads it with a poison value. Provide a PACATOC loading abstraction which can hide this detail. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-5-npiggin@gmail.com	2022-09-28 19:22:12 +10:00
Nicholas Piggin	754f611774	powerpc/64: switch asm helpers from GOT to TOC relative addressing There is no need to use GOT addressing within the kernel. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-4-npiggin@gmail.com	2022-09-28 19:22:12 +10:00
Nicholas Piggin	dab3b8f4fd	powerpc/64: asm use consistent global variable declaration and access Use helper macros to access global variables, and place them in .data sections rather than in .toc. Putting addresses in TOC is not required because the kernel is linked with a single TOC. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-3-npiggin@gmail.com	2022-09-28 19:22:12 +10:00
Nicholas Piggin	17773afdcd	powerpc/64: use 32-bit immediate for STACK_FRAME_REGS_MARKER Using a 32-bit constant for this marker allows it to be loaded with two ALU instructions, like 32-bit. This avoids a TOC entry and a TOC load that depends on the r2 value that has just been loaded from the PACA. This changes the value for 32-bit as well, so both have the same value in the low 4 bytes and 64-bit has 0 in the top bytes. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-2-npiggin@gmail.com	2022-09-28 19:22:12 +10:00
Nicholas Piggin	4b2a9315f2	powerpc/64s: POWER10 CPU Kconfig build option This adds basic POWER10_CPU option, which builds with -mcpu=power10. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220923033004.536127-1-npiggin@gmail.com	2022-09-28 19:22:12 +10:00
Haren Myneni	465dda9d32	powerpc/pseries: Move vas_migration_handler early during migration When the migration is initiated, the hypervisor changes VAS mappings as part of pre-migration event. Then the OS gets the migration event which closes all VAS windows before the migration starts. NX generates continuous faults until windows are closed and the user space can not differentiate these NX faults coming from the actual migration. So to reduce this time window, close VAS windows first in pseries_migrate_partition(). Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d8efade91dda831c9ed4abb226dab627da594c5f.camel@linux.ibm.com	2022-09-28 19:22:12 +10:00
Nicholas Piggin	1da5351f9e	powerpc/64/irq: tidy soft-masked irq replay and improve documentation irq replay is quite complicated because of softirq processing which itself enables and disables irqs. Several considerations need to be accounted for due to this, and they are not clearly documented. Refactor the irq replay code a bit to tidy and deduplicate some common functions. Add comments, debug checks. This has a minor functional change that irq tracing enable/disable is done after each interrupt replayed, rather than after a batch. It also re-sets state to IRQS_ALL_DISABLED after an interrupt, which doesn't matter much because interrupts are hard disabled at this point, but it is more consistent with how interrupt handlers are called. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-8-npiggin@gmail.com	2022-09-28 19:22:11 +10:00
Nicholas Piggin	f7bff6e775	powerpc/64/interrupt: avoid BUG/WARN recursion in interrupt entry BUG/WARN are handled with a program interrupt which can turn into an infinite recursion when there are bugs in interrupt handler entry (which can be irritated by bugs in other parts of the code). There is one feeble attempt to avoid this recursion, but it misses several cases. Make a tidier macro for this and switch most bugs in the interrupt entry wrapper over to use it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-7-npiggin@gmail.com	2022-09-28 19:22:11 +10:00
Nicholas Piggin	c39fb71a54	powerpc/64s/interrupt: masked handler debug check for previous hard disable Prior changes eliminated cases of masked PACA_IRQ_MUST_HARD_MASK interrupts that re-fire due to MSR[EE] being enabled while they are pending. Add a debug check in the masked interrupt handler to catch if this occurs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-6-npiggin@gmail.com	2022-09-28 19:22:11 +10:00
Nicholas Piggin	9524f2278f	powerpc/64s: Fix irq state management in runlatch functions When irqs are soft-disabled, MSR[EE] is volatile and can change from 1 to 0 asynchronously (if a PACA_IRQ_MUST_HARD_MASK interrupt hits). So it can not be used to check hard IRQ enabled status, except to confirm it is disabled. ppc64_runlatch_on/off functions use MSR this way to decide whether to re-enable MSR[EE] after disabling it, which leads to MSR[EE] being enabled when it shouldn't be (when a PACA_IRQ_MUST_HARD_MASK had disabled it between reading the MSR and clearing EE). This has been tolerated in the kernel previously, and it doesn't seem to cause a problem, but it is unexpected and may trip warnings or cause other problems as we tighten up this state management. Fix this by only re-enabling if PACA_IRQ_HARD_DIS is clear. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-5-npiggin@gmail.com	2022-09-28 19:22:11 +10:00
Nicholas Piggin	e485f6c751	powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending If a synchronous interrupt (e.g., hash fault) is taken inside an irqs-disabled region which has MSR[EE]=1, then an asynchronous interrupt that is PACA_IRQ_MUST_HARD_MASK (e.g., PMI) is taken inside the synchronous interrupt handler, then the synchronous interrupt will return with MSR[EE]=1 and the asynchronous interrupt fires again. If the asynchronous interrupt is a PMI and the original context does not have PMIs disabled (only Linux IRQs), the asynchronous interrupt will fire despite having the PMI marked soft pending. This can confuse the perf code and cause warnings. This patch changes the interrupt return so that irqs-disabled MSR[EE]=1 contexts will be returned to with MSR[EE]=0 if a PACA_IRQ_MUST_HARD_MASK interrupt has become pending in the meantime. The longer explanation for what happens: 1. local_irq_disable() 2. Hash fault interrupt fires, do_hash_fault handler runs 3. interrupt_enter_prepare() sets IRQS_ALL_DISABLED 4. interrupt_enter_prepare() sets MSR[EE]=1 5. PMU interrupt fires, masked handler runs 6. Masked handler marks PMI pending 7. Masked handler returns with PACA_IRQ_HARD_DIS set, MSR[EE]=0 8. do_hash_fault interrupt return handler runs 9. interrupt_exit_kernel_prepare() clears PACA_IRQ_HARD_DIS 10. interrupt returns with MSR[EE]=1 11. PMU interrupt fires, perf handler runs Fixes: `4423eb5ae3` ("powerpc/64/interrupt: make normal synchronous interrupts enable MSR[EE] if possible") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-4-npiggin@gmail.com	2022-09-28 19:22:11 +10:00
Nicholas Piggin	799f7063c7	powerpc/64: mark irqs hard disabled in boot paca This prevents interrupts in early boot (e.g., program check) from enabling MSR[EE], potentially causing endian mismatch or other crashes when reporting early boot traps. Fixes: `4423eb5ae3` ("powerpc/64/interrupt: make normal synchronous interrupts enable MSR[EE] if possible") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-3-npiggin@gmail.com	2022-09-28 19:22:11 +10:00
Nicholas Piggin	56adbb7a8b	powerpc/64/interrupt: Fix false warning in context tracking due to idle state Commit `171476775d` ("context_tracking: Convert state to atomic_t") added a CONTEXT_IDLE state which can be encountered by interrupts from kernel mode in the idle thread, causing a false positive warning. Fixes: `171476775d` ("context_tracking: Convert state to atomic_t") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926054305.2671436-2-npiggin@gmail.com	2022-09-28 19:22:11 +10:00
Nicholas Miehlbradt	a5edf9815d	powerpc/64s: Enable KFENCE on book3s64 KFENCE support was added for ppc32 in commit `90cbac0e99` ("powerpc: Enable KFENCE for PPC32"). Enable KFENCE on ppc64 architecture with hash and radix MMUs. It uses the same mechanism as debug pagealloc to protect/unprotect pages. All KFENCE kunit tests pass on both MMUs. KFENCE memory is initially allocated using memblock but is later marked as SLAB allocated. This necessitates the change to __pud_free to ensure that the KFENCE pages are freed appropriately. Based on previous work by Christophe Leroy and Jordan Niethe. Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926075726.2846-4-nicholas@linux.ibm.com	2022-09-28 19:22:10 +10:00
Christophe Leroy	d7902d31cb	powerpc/64s: Allow double call of kernel_[un]map_linear_page() If the page is already mapped resp. already unmapped, bail out. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926075726.2846-3-nicholas@linux.ibm.com	2022-09-28 19:22:10 +10:00
Christophe Leroy	3e791d0f32	powerpc/64s: Remove unneeded #ifdef CONFIG_DEBUG_PAGEALLOC in hash_utils debug_pagealloc_enabled() is always defined and constant folds to 'false' when CONFIG_DEBUG_PAGEALLOC is not enabled. Remove the #ifdefs, the code and associated static variables will be optimised out by the compiler when CONFIG_DEBUG_PAGEALLOC is not defined. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926075726.2846-2-nicholas@linux.ibm.com	2022-09-28 19:22:10 +10:00
Nicholas Miehlbradt	5e8b2c4dd3	powerpc/64s: Add DEBUG_PAGEALLOC for radix There is support for DEBUG_PAGEALLOC on hash but not on radix. Add support on radix. Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926075726.2846-1-nicholas@linux.ibm.com	2022-09-28 19:22:10 +10:00
Nicholas Piggin	7fd123e544	powerpc/64s: update cpu selection options Update the 64s GENERIC_CPU option. POWER4 support has been dropped, so make that clear in the option name. The POWER5_CPU option is dropped because it's uncommon, and GENERIC_CPU covers it. -mtune= before power8 is dropped because the minimum gcc version supports power8, and tuning is made consistent between big and little endian. A 970 option is added for PowerPC 970 / G5 because they still have a user base, and -mtune=power8 does not generate good code for the 970. This also updates the ISA versions document to add Power4/Power4+ because I didn't realise Power4+ used 2.01. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921014103.587954-2-npiggin@gmail.com	2022-09-28 19:22:10 +10:00
Nicholas Piggin	58ec7f06b7	powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5 Big-endian GENERIC_CPU supports 970, but builds with -mcpu=power5. POWER5 is ISA v2.02 whereas 970 is v2.01 plus Altivec. 2.02 added the popcntb instruction which a compiler might use. Use -mcpu=power4. Fixes: `471d7ff8b5` ("powerpc/64s: Remove POWER4 support") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921014103.587954-1-npiggin@gmail.com	2022-09-28 19:22:10 +10:00
Nicholas Piggin	9c7bfc2dc2	powerpc/64s: Make POWER10 and later use pause_short in cpu_relax loops We want to move away from using SMT priority updates for cpu_relax, and use a 'wait' instruction which is similar to x86. As well as being a much better fit for what everybody else uses and tests with, priority nops are stateful which is nasty (interrupts have to consider they might be taken at a different priority), and they're expensive to execute, similar to a mtSPR which can effect other threads in the pipe. This has shown to give results that are less affected by code alignment on benchmarks that cause a lot of spin waiting (e.g., rwsem contention on unixbench filesystem benchmarks) on POWER10. QEMU TCG only supports this instruction correctly since v7.1, versions without the fix may cause hangs whne running POWER10 CPUs. Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix checkpatch warnings RE the macros] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220920122259.363092-2-npiggin@gmail.com	2022-09-28 19:22:10 +10:00
Nicholas Piggin	dabeb572ad	powerpc: add ISA v3.0 / v3.1 wait opcode macro The wait instruction encoding changed between ISA v2.07 and ISA v3.0. In v3.1 the instruction gained a new field. Update the PPC_WAIT macro to the current encoding. Rename the older incompatible one with a _v203 suffix as it was introduced in v2.03 (the WC field was introduced in v2.07 but the kernel only uses WC=0). Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220920122259.363092-1-npiggin@gmail.com	2022-09-28 19:22:10 +10:00
Nicholas Piggin	c84550203b	powerpc/time: avoid programming DEC at the start of the timer interrupt Setting DEC to maximum at the start of the timer interrupt is not necessary and can be avoided for performance when MSR[EE] is not enabled during the handler as explained in commit `0faf20a1ad` ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use"), where this change was first attempted. The idea is that the timer interrupt runs with MSR[EE]=0, and at the end of the interrupt DEC is programmed to the next timer interval, so there is no need to clear the decrementer exception before then. When the above commit was merged, that was not quite true. The low res timer subsystem had some cases in the oneshot timer code where if the tick was to be stopped and no timers active, the clock device would not get the ->set_state_oneshot_stopped() call, so DEC would not get reprogrammed, and this would hang taking continual timer interrupts. So this was reverted in commit `d2b9be1f4a` ("powerpc/time: Always set decrementer in timer_interrupt()"), which was a partial revert of the above commit. Commit `62c1256d54` ("timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped") was later merged to fix this missing case in the timer subsystem, so now the behaviour can be restored. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220909142457.278032-1-npiggin@gmail.com	2022-09-28 19:22:09 +10:00
Pali Rohár	b19448fe84	powerpc: Add support for early debugging via Serial 16550 console Currently powerpc early debugging contains lot of platform specific options, but does not support standard UART / serial 16550 console. Later legacy_serial.c code supports registering UART as early debug console from device tree but it is not early during booting, but rather later after machine description code finishes. So for real early debugging via UART is current code unsuitable. Add support for new early debugging option CONFIG_PPC_EARLY_DEBUG_16550 which enable Serial 16550 console on address defined by new option CONFIG_PPC_EARLY_DEBUG_16550_PHYSADDR and by stride by option CONFIG_PPC_EARLY_DEBUG_16550_STRIDE. With this change it is possible to debug powerpc machine descriptor code. For example this early debugging code can print on serial console also "No suitable machine description found" error which is done before legacy_serial.c code. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220822231501.16827-1-pali@kernel.org	2022-09-28 19:22:09 +10:00
Hari Bathini	bd7dc90e52	powerpc/64/kdump: Limit kdump base to 512MB Since commit `e641eb03ab` ("powerpc: Fix up the kdump base cap to 128M") memory for kdump kernel has been reserved at an offset of 128MB. This held up well for a long time before running into boot failure on LPARs having a lot of cores. Commit `7c5ed82b80` ("powerpc: Set crashkernel offset to mid of RMA region") fixed this boot failure by moving the offset to mid of RMA region. This change meant the offset is either 256MB or 512MB on LPARs as ppc64_rma_size was 512MB or 1024MB owing to commit `103a8542cb` ("powerpc/book3s64/ radix: Fix boot failure with large amount of guest memory"). But ppc64_rma_size can be larger as well with newer f/w. So, limit crashkernel reservation offset to 512MB to avoid running into boot failures during kdump kernel boot, due to RTAS or other allocation restrictions. Also, while here, use SZ_128M instead of opening coding it. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220912065031.57416-1-hbathini@linux.ibm.com	2022-09-28 19:22:09 +10:00
Rohan McLure	7e92e01b72	powerpc: Provide syscall wrapper Implement syscall wrapper as per s390, x86, arm64. When enabled cause handlers to accept parameters from a stack frame rather than from user scratch register state. This allows for user registers to be safely cleared in order to reduce caller influence on speculation within syscall routine. The wrapper is a macro that emits syscall handler symbols that call into the target handler, obtaining its parameters from a struct pt_regs on the stack. As registers are already saved to the stack prior to calling system_call_exception, it appears that this function is executed more efficiently with the new stack-pointer convention than with parameters passed by registers, avoiding the allocation of a stack frame for this method. On a 32-bit system, we see >20% performance increases on the null_syscall microbenchmark, and on a Power 8 the performance gains amortise the cost of clearing and restoring registers which is implemented at the end of this series, seeing final result of ~5.6% performance improvement on null_syscall. Syscalls are wrapped in this fashion on all platforms except for the Cell processor as this commit does not provide SPU support. This can be quickly fixed in a successive patch, but requires spu_sys_callback to allocate a pt_regs structure to satisfy the wrapped calling convention. Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmai.com> [mpe: Make incompatible with COMPAT to retain clearing of high bits of args] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-22-rmclure@linux.ibm.com	2022-09-28 19:22:09 +10:00
Rohan McLure	f8971c627b	powerpc: Change system_call_exception calling convention Change system_call_exception arguments to pass a pointer to a stack frame container caller state, as well as the original r0, which determines the number of the syscall. This has been observed to yield improved performance to passing them by registers, circumventing the need to allocate a stack frame. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Retain clearing of high bits of args for compat tasks] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-21-rmclure@linux.ibm.com	2022-09-28 19:22:09 +10:00
Rohan McLure	8640de0dee	powerpc: Use common syscall handler type Cause syscall handlers to be typed as follows when called indirectly throughout the kernel. This is to allow for better type checking. typedef long (syscall_fn)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long); Since both 32 and 64-bit abis allow for at least the first six machine-word length parameters to a function to be passed by registers, even handlers which admit fewer than six parameters may be viewed as having the above type. Coercing syscalls to syscall_fn requires a cast to void to avoid -Wcast-function-type. Fixup comparisons in VDSO to avoid pointer-integer comparison. Introduce explicit cast on systems with SPUs. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-19-rmclure@linux.ibm.com	2022-09-28 19:22:09 +10:00
Rohan McLure	39859aea41	powerpc: Enable compile-time check for syscall handlers The table of syscall handlers and registered compatibility syscall handlers has in past been produced using assembly, with function references resolved at link time. This moves link-time errors to compile-time, by rewriting systbl.S in C, and including the linux/syscalls.h, linux/compat.h and asm/syscalls.h headers for prototypes. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-18-rmclure@linux.ibm.com	2022-09-28 19:22:09 +10:00
Rohan McLure	8cd1def4b8	powerpc: Include all arch-specific syscall prototypes Forward declare all syscall handler prototypes where a generic prototype is not provided in either linux/syscalls.h or linux/compat.h in asm/syscalls.h. This is required for compile-time type-checking for syscall handlers, which is implemented later in this series. 32-bit compatibility syscall handlers are expressed in terms of types in ppc32.h. Expose this header globally. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Use standard include guard naming for syscalls_32.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-17-rmclure@linux.ibm.com	2022-09-28 19:22:08 +10:00
Rohan McLure	dec20c50df	powerpc: Adopt SYSCALL_DEFINE for arch-specific syscall handlers Arch-specific implementations of syscall handlers are currently used over generic implementations for the following reasons: 1. Semantics unique to powerpc 2. Compatibility syscalls require 'argument padding' to comply with 64-bit argument convention in ELF32 abi. 3. Parameter types or order is different in other architectures. These syscall handlers have been defined prior to this patch series without invoking the SYSCALL_DEFINE or COMPAT_SYSCALL_DEFINE macros with custom input and output types. We remove every such direct definition in favour of the aforementioned macros. Also update syscalls.tbl in order to refer to the symbol names generated by each of these macros. Since ppc64_personality can be called by both 64 bit and 32 bit binaries through compatibility, we must generate both both compat_sys_ and sys_ symbols for this handler. As an aside: A number of architectures including arm and powerpc agree on an alternative argument order and numbering for most of these arch-specific handlers. A future patch series may allow for asm/unistd.h to signal through its defines that a generic implementation of these syscall handlers with the correct calling convention be emitted, through the __ARCH_WANT_COMPAT_SYS_... convention. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-16-rmclure@linux.ibm.com	2022-09-28 19:22:08 +10:00
Rohan McLure	ac17defbeb	powerpc: Provide do_ppc64_personality helper Avoid duplication in future patch that will define the ppc64_personality syscall handler in terms of the SYSCALL_DEFINE and COMPAT_SYSCALL_DEFINE macros, by extracting the common body of ppc64_personality into a helper function. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-15-rmclure@linux.ibm.com	2022-09-28 19:22:08 +10:00
Rohan McLure	b7fa9ce86d	powerpc: Remove direct call to mmap2 syscall handlers Syscall handlers should not be invoked internally by their symbol names, as these symbols defined by the architecture-defined SYSCALL_DEFINE macro. Move the compatibility syscall definition for mmap2 to syscalls.c, so that all mmap implementations can share a helper function. Remove 'inline' on static mmap helper. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix compat_sys_mmap2() prototype and offset handling] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-14-rmclure@linux.ibm.com	2022-09-28 19:21:26 +10:00
Nicholas Piggin	1a5486b3c3	KVM: PPC: Book3S HV P9: Restore stolen time logging in dtl Stolen time logging in dtl was removed from the P9 path, so guests had no stolen time accounting. Add it back in a simpler way that still avoids locks and per-core accounting code. Fixes: `ecb6a7207f` ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220908132545.4085849-4-npiggin@gmail.com	2022-09-28 01:07:19 +10:00
Nicholas Piggin	b31bc24a49	KVM: PPC: Book3S HV: Update guest state entry/exit accounting to new API Update the guest state and timing entry/exit accounting to use the new API, which was introduced following issues found[1]. KVM HV does possibly call instrumented code inside the guest context, and it does call srcu inside the guest context which is fragile at best. Switch to the new API, moving the guest context inside the srcu_read_lock/unlock region. [1] https://lore.kernel.org/lkml/20220201132926.3301912-1-mark.rutland@arm.com/ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220908132545.4085849-3-npiggin@gmail.com	2022-09-28 01:07:19 +10:00
Nicholas Piggin	c953f7500b	KVM: PPC: Book3S HV P9: Fix irq disabling in tick accounting kvmhv_run_single_vcpu() disables PMIs as well as Linux irqs, however the tick time accounting code enables and disables irqs and not PMIs within this region. By chance this might not actually cause a bug, but it is clearly an incorrect use of the APIs. Fixes: `2251fbe763` ("KVM: PPC: Book3S HV P9: Improve mtmsrd scheduling by delaying MSR[EE] disable") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220908132545.4085849-2-npiggin@gmail.com	2022-09-28 01:07:19 +10:00
Nicholas Piggin	bc91c04bff	KVM: PPC: Book3S HV P9: Clear vcpu cpu fields before enabling host irqs On guest entry, vcpu->cpu and vcpu->arch.thread_cpu are set after disabling host irqs. On guest exit there is a window whre tick time accounting briefly enables irqs before these fields are cleared. Move them up to ensure they are cleared before host irqs are run. This is possibly not a problem, but is more symmetric and makes the fields less surprising. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220908132545.4085849-1-npiggin@gmail.com	2022-09-28 01:07:19 +10:00
Fabiano Rosas	0a5bfb824a	KVM: PPC: Book3S HV: Fix decrementer migration We used to have a workaround[1] for a hang during migration that was made ineffective when we converted the decrementer expiry to be relative to guest timebase. The point of the workaround was that in the absence of an explicit decrementer expiry value provided by userspace during migration, KVM needs to initialize dec_expires to a value that will result in an expired decrementer after subtracting the current guest timebase. That stops the vcpu from hanging after migration due to a decrementer that's too large. If the dec_expires is now relative to guest timebase, its initialization needs to be guest timebase-relative as well, otherwise we end up with a decrementer expiry that is still larger than the guest timebase. 1- https://git.kernel.org/torvalds/c/5855564c8ab2 Fixes: `3c1a4322bb` ("KVM: PPC: Book3S HV: Change dec_expires to be relative to guest timebase") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220816222517.1916391-1-farosas@linux.ibm.com	2022-09-28 01:07:19 +10:00
Matthew Wilcox (Oracle)	405e669172	powerpc: remove mmap linked list walks Use the VMA iterator instead. Link: https://lkml.kernel.org/r/20220906194824.2110408-34-Liam.Howlett@oracle.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-09-26 19:46:19 -07:00
Linus Torvalds	3800a713b6	26 hotfixes. 8 are for issues which were introduced during this -rc cycle, 18 are for earlier issues, and are cc:stable. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYzH+NgAKCRDdBJ7gKXxA ju4AAQDrFWErVp+ra5P66SSbiFmm8NAW1awt4nHwAPcihNf3yQD/eQcB3w2q0Dm1 9HjsyEVkTYIeaJSAbCraDnMwUdWTIgY= =p5+0 -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2022-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull last (?) hotfixes from Andrew Morton: "26 hotfixes. 8 are for issues which were introduced during this -rc cycle, 18 are for earlier issues, and are cc:stable" * tag 'mm-hotfixes-stable-2022-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits) x86/uaccess: avoid check_object_size() in copy_from_user_nmi() mm/page_isolation: fix isolate_single_pageblock() isolation behavior mm,hwpoison: check mm when killing accessing process mm/hugetlb: correct demote page offset logic mm: prevent page_frag_alloc() from corrupting the memory mm: bring back update_mmu_cache() to finish_fault() frontswap: don't call ->init if no ops are registered mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all() mm: fix madivse_pageout mishandling on non-LRU page powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush mm: gup: fix the fast GUP race against THP collapse mm: fix dereferencing possible ERR_PTR vmscan: check folio_test_private(), not folio_get_private() mm: fix VM_BUG_ON in __delete_from_swap_cache() tools: fix compilation after gfp_types.h split mm/damon/dbgfs: fix memory leak when using debugfs_lookup() mm/migrate_device.c: copy pte dirty bit to page mm/migrate_device.c: add missing flush_cache_page() mm/migrate_device.c: flush TLB while holding PTL x86/mm: disable instrumentations of mm/pgprot.c ...	2022-09-26 13:23:15 -07:00
Andrew Morton	6d751329e7	Merge branch 'mm-hotfixes-stable' into mm-stable	2022-09-26 13:13:15 -07:00
Yang Shi	bedf034169	powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush The IPI broadcast is used to serialize against fast-GUP, but fast-GUP will move to use RCU instead of disabling local interrupts in fast-GUP. Using an IPI is the old-styled way of serializing against fast-GUP although it still works as expected now. And fast-GUP now fixed the potential race with THP collapse by checking whether PMD is changed or not. So IPI broadcast in radix pmd collapse flush is not necessary anymore. But it is still needed for hash TLB. Link: https://lkml.kernel.org/r/20220907180144.555485-2-shy828301@gmail.com Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Yang Shi <shy828301@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2022-09-26 12:14:33 -07:00
Paolo Bonzini	c59fb12758	KVM: remove KVM_REQ_UNHALT KVM_REQ_UNHALT is now unnecessary because it is replaced by the return value of kvm_vcpu_block/kvm_vcpu_halt. Remove it. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Message-Id: <20220921003201.1441511-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-26 12:37:21 -04:00
Rohan McLure	4df0221f9d	powerpc: Remove direct call to personality syscall handler Syscall handlers should not be invoked internally by their symbol names, as these symbols defined by the architecture-defined SYSCALL_DEFINE macro. Fortunately, in the case of ppc64_personality, its call to sys_personality can be replaced with an invocation to the equivalent ksys_personality inline helper in <linux/syscalls.h>. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-13-rmclure@linux.ibm.com	2022-09-26 23:00:16 +10:00
Rohan McLure	b6b1334c95	powerpc/32: Remove powerpc select specialisation Syscall #82 has been implemented for 32-bit platforms in a unique way on powerpc systems. This hack will in effect guess whether the caller is expecting new select semantics or old select semantics. It does so via a guess, based off the first parameter. In new select, this parameter represents the length of a user-memory array of file descriptors, and in old select this is a pointer to an arguments structure. The heuristic simply interprets sufficiently large values of its first parameter as being a call to old select. The following is a discussion on how this syscall should be handled. As discussed in this thread, the existence of such a hack suggests that for whatever powerpc binaries may predate glibc, it is most likely that they would have taken use of the old select semantics. x86 and arm64 both implement this syscall with oldselect semantics. Remove the powerpc implementation, and update syscall.tbl to refer to emit a reference to sys_old_select and compat_sys_old_select for 32-bit binaries, in keeping with how other architectures support syscall #82. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/lkml/13737de5-0eb7-e881-9af0-163b0d29a1a0@csgroup.eu/ Link: https://lore.kernel.org/r/20220921065605.1051927-12-rmclure@linux.ibm.com	2022-09-26 23:00:15 +10:00
Rohan McLure	c2e7a19827	powerpc: Use generic fallocate compatibility syscall The powerpc fallocate compat syscall handler is identical to the generic implementation provided by commit `59c10c52f5` ("riscv: compat: syscall: Add compat_sys_call_table implementation"), and as such can be removed in favour of the generic implementation. A future patch series will replace more architecture-defined syscall handlers with generic implementations, dependent on introducing generic implementations that are compatible with powerpc and arm's parameter reorderings. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-11-rmclure@linux.ibm.com	2022-09-26 23:00:15 +10:00
Rohan McLure	016ff72bd2	powerpc: Fix fallocate and fadvise64_64 compat parameter combination As reported[1] by Arnd, the arch-specific fadvise64_64 and fallocate compatibility handlers assume parameters are passed with 32-bit big-endian ABI. This affects the assignment of odd-even parameter pairs to the high or low words of a 64-bit syscall parameter. Fix fadvise64_64 fallocate compat handlers to correctly swap upper/lower 32 bits conditioned on endianness. A future patch will replace the arch-specific compat fallocate with an asm-generic implementation. This patch is intended for ease of back-port. [1]: https://lore.kernel.org/all/be29926f-226e-48dc-871a-e29a54e80583@www.fastmail.com/ Fixes: `57f48b4b74` ("powerpc/compat_sys: swap hi/lo parts of 64-bit syscall args in LE mode") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-9-rmclure@linux.ibm.com	2022-09-26 23:00:15 +10:00
Rohan McLure	620f5c59c8	powerpc/64s: Fix comment on interrupt handler prologue Interrupt handlers on 64s systems will often need to save register state from the interrupted process to make space for loading special purpose registers or for internal state. Fix a comment documenting a common code path macro in the beginning of interrupt handlers where r10 is saved to the PACA to afford space for the value of the CFAR. Comment is currently written as if r10-r12 are saved to PACA, but in fact only r10 is saved, with r11-r12 saved much later. The distance in code between these saves has grown over the many revisions of this macro. Fix this by signalling with a comment where r11-r12 are saved to the PACA. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reported-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-8-rmclure@linux.ibm.com	2022-09-26 23:00:15 +10:00
Rohan McLure	53ecaa6778	powerpc/64e: Clarify register saves and clears with {SAVE,ZEROIZE}_GPRS The common interrupt handler prologue macro and the bad_stack trampolines include consecutive sequences of register saves, and some register clears. Neaten such instances by expanding use of the SAVE_GPRS macro and employing the ZEROIZE_GPR macro when appropriate. Also simplify an invocation of SAVE_GPRS targetting all non-volatile registers to SAVE_NVGPRS. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reported-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-7-rmclure@linux.ibm.com	2022-09-26 23:00:15 +10:00
Rohan McLure	15ba74502c	powerpc/32: Clarify interrupt restores with REST_GPR macro in entry_32.S Restoring the register state of the interrupted thread involves issuing a large number of predictable loads to the kernel stack frame. Issue the REST_GPR{,S} macros to clearly signal when this is happening, and bunch together restores at the end of the interrupt handler where the saved value is not consumed earlier in the handler code. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-6-rmclure@linux.ibm.com	2022-09-26 23:00:15 +10:00
Rohan McLure	2b1dac4b5f	powerpc/64s: Use {ZEROIZE,SAVE,REST}_GPRS macros in sc, scv 0 handlers Use the convenience macros for saving/clearing/restoring gprs in keeping with syscall calling conventions. The plural variants of these macros can store a range of registers for concision. This works well when the user gpr value we are hoping to save is still live. In the syscall interrupt handlers, user register state is sometimes juggled between registers. Hold-off from issuing the SAVE_GPR macro for applicable neighbouring lines to highlight the delicate register save logic. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-5-rmclure@linux.ibm.com	2022-09-26 23:00:15 +10:00
Rohan McLure	9d54a5ce3a	powerpc: Add ZEROIZE_GPRS macros for register clears Provide register zeroing macros, following the same convention as existing register stack save/restore macros, to be used in later change to concisely zero a sequence of consecutive gprs. The resulting macros are called ZEROIZE_GPRS and ZEROIZE_NVGPRS, keeping with the naming of the accompanying restore and save macros, and usage of zeroize to describe this operation elsewhere in the kernel. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-4-rmclure@linux.ibm.com	2022-09-26 23:00:14 +10:00
Rohan McLure	2c27d4a419	powerpc: Save caller r3 prior to system_call_exception This reverts commit `8875f47b76` ("powerpc/syscall: Save r3 in regs->orig_r3 "). Save caller's original r3 state to the kernel stackframe before entering system_call_exception. This allows for user registers to be cleared by the time system_call_exception is entered, reducing the influence of user registers on speculation within the kernel. Prior to this commit, orig_r3 was saved at the beginning of system_call_exception. Instead, save orig_r3 while the user value is still live in r3. Also replicate this early save in 32-bit. A similar save was removed in commit `6f76a01173` ("powerpc/syscall: implement system call entry/exit logic in C for PPC32") when 32-bit adopted system_call_exception. Revert its removal of orig_r3 saves. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-3-rmclure@linux.ibm.com	2022-09-26 23:00:14 +10:00
Rohan McLure	5ba6c9a912	powerpc: Remove asmlinkage from syscall handler definitions The asmlinkage macro has no special meaning in powerpc, and prior to this patch is used sporadically on some syscall handler definitions. On architectures that do not define asmlinkage, it resolves to extern "C" for C++ compilers and a nop otherwise. The current invocations of asmlinkage provide far from complete support for C++ toolchains, and so the macro serves no purpose in powerpc. Remove all invocations of asmlinkage in arch/powerpc. These incidentally only occur in syscall definitions and prototypes. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-2-rmclure@linux.ibm.com	2022-09-26 23:00:14 +10:00
Christophe Leroy	4af8354553	powerpc/irq: Refactor irq_soft_mask_{set,or}_return() This partialy reapply commit `ef5b570d37` ("powerpc/irq: Don't open code irq_soft_mask helpers") which was reverted by commit `684c68d92e` ("Revert "powerpc/irq: Don't open code irq_soft_mask helpers"") irq_soft_mask_set_return() and irq_soft_mask_or_return() are overset of irq_soft_mask_set(). Have them use irq_soft_mask_set() instead of duplicating it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/18473da42362ee8f07bce36b9caef8ba77d7633f.1663656054.git.christophe.leroy@csgroup.eu	2022-09-26 23:00:14 +10:00
Christophe Leroy	605ba9ee8a	powerpc: Remove impossible mmu_psize_defs[] on nohash Today there is: if e500 or 8xx if e500 mmu_psize_defs[] = else if 8xx mmu_psize_defs[] = else mmu_psize_defs[] = endif endif The else leg is dead definition. Drop that else leg and rewrite as: if e500 mmu_psize_defs[] = endif if 8xx mmu_psize_defs[] = endif Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/030a843449f348c0b709ca5349640624f36a016f.1663606876.git.christophe.leroy@csgroup.eu	2022-09-26 23:00:14 +10:00
Christophe Leroy	6556fd1a1e	powerpc: Cleanup idle for e500 e500 idle setup is a bit messy. e500_idle() is used for PPC32 while book3e_idle() is used for PPC64. As they are mutually exclusive, call them all e500_idle(). Use CONFIG_MPC_85xx instead of PPC32 + E500 in Makefile and rename idle_e500.c to idle_85xx.c . Rename idle_book3e.c to idle_64e.c and remove #ifdef PPC64 in as it's only built on PPC64. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8039301334e948974c85ec5ef2db37751075185b.1663606876.git.christophe.leroy@csgroup.eu	2022-09-26 23:00:14 +10:00
Christophe Leroy	73d1149879	powerpc: Simplify redundant Kconfig tests PPC_85xx implies PPC32 so no need to check PPC32 in addition. PPC64 && !PPC_BOOK3E_64 means PPC_BOOK3S_64. PPC_BOOK3E_64 implies PPC_E500. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/244cce3e603f2b79796314c0c1c46cab927b9adc.1663606876.git.christophe.leroy@csgroup.eu	2022-09-26 23:00:14 +10:00
Christophe Leroy	772fd56dec	powerpc: Replace PPC_85xx \|\| PPC_BOOKE_64 by PPC_E500 PPC_E500 is the same as PPC_85xx \|\| PPC_BOOKE_64 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/af79696f8cb8536fb4e20c0d98a6bf159a9e371b.1663606876.git.christophe.leroy@csgroup.eu	2022-09-26 23:00:14 +10:00
Christophe Leroy	aa5f59df20	powerpc: Remove CONFIG_PPC_BOOK3E_MMU CONFIG_PPC_BOOK3E_MMU is redundant with CONFIG_PPC_E500. Remove it. Also rename mmu-book3e.h to mmu-e500.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c5549cd59a131204ff94ab909cad2e2dad4ddf2f.1663606876.git.christophe.leroy@csgroup.eu	2022-09-26 23:00:14 +10:00
Christophe Leroy	3e7318584d	powerpc: Remove CONFIG_PPC_FSL_BOOK3E CONFIG_PPC_FSL_BOOK3E is redundant with CONFIG_PPC_E500. Remove it. And rename five files accordingly. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Rename include guards to match new file names] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/795cb93b88c9a0279289712e674f39e3b108a1b4.1663606876.git.christophe.leroy@csgroup.eu	2022-09-26 23:00:13 +10:00
Christophe Leroy	688de017ef	powerpc: Change CONFIG_E500 to CONFIG_PPC_E500 It will be used outside arch/powerpc, make it clear its a powerpc configuration item. And we already have CONFIG_PPC_E500MC, so that will make it more consistent. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e63b22083c11c4300f4a82d3123a46e5fdd54fa6.1663606876.git.christophe.leroy@csgroup.eu	2022-09-26 23:00:13 +10:00
Christophe Leroy	1df399012b	powerpc: Remove redundant selection of E500 and E500MC PPC_85xx and PPC_BOOK3E_64 already select E500 so no need to select it again by PPC_QEMU_E500 and CORENET_GENERIC as they depend on PPC_85xx \|\| PPC_BOOK3E_64. PPC_BOOK3E_64 already selects E500MC so no need to select it again by PPC_QEMU_E500 if PPC64, PPC_BOOK3E_64 is the only way into PPC_QEMU_E500 with PPC64. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/44f03fa1506892fabf626dceb2f47a049908b6af.1663606876.git.christophe.leroy@csgroup.eu	2022-09-26 23:00:13 +10:00
Christophe Leroy	e0d68273d7	powerpc: Remove CONFIG_PPC_BOOK3E CONFIG_PPC_BOOK3E is redundant with CONFIG_PPC_BOOK3E_64. The later is more explicit about the fact that it's a 64 bits target. Remove CONFIG_PPC_BOOK3E. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5d0891490813c19cdcfc04678f512ea68cba3e64.1663606876.git.christophe.leroy@csgroup.eu	2022-09-26 23:00:13 +10:00
Christophe Leroy	d7216567c6	powerpc/cputable: Split cpu_specs[] for mpc85xx and e500mc e500v1/v2 and e500mc are said to be mutually exclusive in Kconfig. Split e500 cpu_specs[] and then restrict the non e500mc to PPC32 which is then 85xx. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Tweak formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/553b901ea91e393df231103da4b018e9b251b0e9.1663606876.git.christophe.leroy@csgroup.eu	2022-09-26 23:00:05 +10:00

1 2 3 4 5 ...

25609 Commits