OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Michael Ellerman	f24f21c412	Merge branch 'topic/objtool' into next Merge the powerpc objtool support, which we were keeping in a topic branch in case of any merge conflicts.	2022-12-08 23:57:47 +11:00
Nicholas Piggin	d2e8ff9f14	powerpc: add a definition for the marker offset within the interrupt frame Define a constant rather than open-code the offset for the "regs" marker. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221127124942.1665522-9-npiggin@gmail.com	2022-12-02 17:54:08 +11:00
Sathvika Vasireddy	8d0c21b506	powerpc: Curb objtool unannotated intra-function call warnings objtool throws the following unannotated intra-function call warnings: arch/powerpc/kernel/entry_64.o: warning: objtool: .text+0x4: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xe64: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xee4: unannotated intra-function call Fix these warnings by annotating intra-function calls, using ANNOTATE_INTRA_FUNCTION_CALL macro, to indicate that the branch targets are valid. Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221114175754.1131267-5-sv@linux.ibm.com	2022-11-15 20:11:47 +11:00
Sathvika Vasireddy	29a011fc79	powerpc: Fix objtool unannotated intra-function call warnings Objtool throws unannotated intra-function call warnings in the following assembly files: arch/powerpc/kernel/vector.o: warning: objtool: .text+0x53c: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x60: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x124: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x5d4: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x5dc: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xcb8: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xd0c: unannotated intra-function call arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0x1030: unannotated intra-function call arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x358: unannotated intra-function call arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x728: unannotated intra-function call arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x4d94: unannotated intra-function call arch/powerpc/kernel/head_64.o: warning: objtool: .text+0x4ec4: unannotated intra-function call arch/powerpc/kvm/book3s_hv_interrupts.o: warning: objtool: .text+0x6c: unannotated intra-function call arch/powerpc/kernel/misc_64.o: warning: objtool: .text+0x64: unannotated intra-function call Objtool does not add STT_NOTYPE symbols with size 0 to the rbtree, which is why find_call_destination() function is not able to find the destination symbol for 'bl' instruction. For such symbols, objtool is throwing unannotated intra-function call warnings in assembly files. Fix these warnings by annotating those symbols with SYM_FUNC_START_LOCAL and SYM_FUNC_END macros, inorder to set symbol type to STT_FUNC and symbol size accordingly. Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221114175754.1131267-4-sv@linux.ibm.com	2022-11-15 20:11:47 +11:00
Nicholas Piggin	376b3275c1	KVM: PPC: Book3S HV: Fix stack frame regs marker The hard-coded marker is out of date now, fix it using the nice define. Fixes: `17773afdcd` ("powerpc/64: use 32-bit immediate for STACK_FRAME_REGS_MARKER") Reported-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221006143345.129077-1-npiggin@gmail.com	2022-10-07 21:30:25 +11:00
Nicholas Piggin	8e93fb33c8	powerpc/64: provide a helper macro to load r2 with the kernel TOC A later change stops the kernel using r2 and loads it with a poison value. Provide a PACATOC loading abstraction which can hide this detail. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926034057.2360083-5-npiggin@gmail.com	2022-09-28 19:22:12 +10:00
Fabiano Rosas	3f8ed993be	KVM: PPC: Book3S HV: Add a new config for P8 debug timing Turn the existing Kconfig KVM_BOOK3S_HV_EXIT_TIMING into KVM_BOOK3S_HV_P8_TIMING in preparation for the addition of a new config for P9 timings. This applies only to P8 code, the generic timing code is still kept under KVM_BOOK3S_HV_EXIT_TIMING. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220525130554.2614394-3-farosas@linux.ibm.com	2022-06-29 19:21:21 +10:00
Alexey Kardashevskiy	b22af90419	KVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlers Currently we have 2 sets of interrupt controller hypercalls handlers for real and virtual modes, this is from POWER8 times when switching MMU on was considered an expensive operation. POWER9 however does not have dependent threads and MMU is enabled for handling hcalls so the XIVE native or XICS-on-XIVE real mode handlers never execute on real P9 and later CPUs. This untemplate the handlers and only keeps the real mode handlers for XICS native (up to POWER8) and remove the rest of dead code. Changes in functions are mechanical except few missing empty lines to make checkpatch.pl happy. The default implemented hcalls list already contains XICS hcalls so no change there. This should not cause any behavioral change. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220509071150.181250-1-aik@ozlabs.ru	2022-05-19 00:44:28 +10:00
Alexey Kardashevskiy	cad32d9d42	KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers LoPAPR defines guest visible IOMMU with hypercalls to use it - H_PUT_TCE/etc. Implemented first on POWER7 where hypercalls would trap in the KVM in the real mode (with MMU off). The problem with the real mode is some memory is not available and some API usage crashed the host but enabling MMU was an expensive operation. The problems with the real mode handlers are: 1. Occasionally these cannot complete the request so the code is copied+modified to work in the virtual mode, very little is shared; 2. The real mode handlers have to be linked into vmlinux to work; 3. An exception in real mode immediately reboots the machine. If the small DMA window is used, the real mode handlers bring better performance. However since POWER8, there has always been a bigger DMA window which VMs use to map the entire VM memory to avoid calling H_PUT_TCE. Such 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT (a bulk version of H_PUT_TCE) which virtual mode handler is even closer to its real mode version. On POWER9 hypercalls trap straight to the virtual mode so the real mode handlers never execute on POWER9 and later CPUs. So with the current use of the DMA windows and MMU improvements in POWER9 and later, there is no point in duplicating the code. The 32bit passed through devices may slow down but we do not have many of these in practice. For example, with this applied, a 1Gbit ethernet adapter still demostrates above 800Mbit/s of actual throughput. This removes the real mode handlers from KVM and related code from the powernv platform. This updates the list of implemented hcalls in KVM-HV as the realmode handlers are removed. This changes ABI - kvmppc_h_get_tce() moves to the KVM module and kvmppc_find_table() is static now. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru	2022-05-19 00:44:01 +10:00
Nicholas Piggin	5d506f159b	KVM: PPC: Book3S HV: Update LPID allocator init for POWER9, Nested The LPID allocator init is changed to: - use mmu_lpid_bits rather than hard-coding; - use KVM_MAX_NESTED_GUESTS for nested hypervisors; - not reserve the top LPID on POWER9 and newer CPUs. The reserved LPID is made a POWER7/8-specific detail. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-3-npiggin@gmail.com	2022-05-13 21:33:33 +10:00
Nicholas Piggin	aebd1fb45c	powerpc: flexible GPR range save/restore macros Introduce macros that operate on a (start, end) range of GPRs, which reduces lines of code and need to do mental arithmetic while reading the code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211022061322.2671178-1-npiggin@gmail.com	2021-11-29 23:15:20 +11:00
Nicholas Piggin	3c1a4322bb	KVM: PPC: Book3S HV: Change dec_expires to be relative to guest timebase Change dec_expires to be relative to the guest timebase, and allow it to be moved into low level P9 guest entry functions, to improve SPR access scheduling. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-23-npiggin@gmail.com	2021-11-24 21:08:59 +11:00
Nicholas Piggin	a1a19e1154	KVM: PPC: Book3S HV: CTRL SPR does not require read-modify-write Processors that support KVM HV do not require read-modify-write of the CTRL SPR to set/clear their thread's runlatch. Just write 1 or 0 to it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-18-npiggin@gmail.com	2021-11-24 21:08:58 +11:00
Nicholas Piggin	57dc0eed73	KVM: PPC: Book3S HV P9: Implement PMU save/restore in C Implement the P9 path PMU save/restore code in C, and remove the POWER9/10 code from the P7/8 path assembly. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-14-npiggin@gmail.com	2021-11-24 21:08:58 +11:00
Nicholas Piggin	46f9caf1a2	powerpc/64s: Keep AMOR SPR a constant ~0 at runtime This register controls supervisor SPR modifications, and as such is only relevant for KVM. KVM always sets AMOR to ~0 on guest entry, and never restores it coming back out to the host, so it can be kept constant and avoid the mtSPR in KVM guest entry. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211123095231.1036501-10-npiggin@gmail.com	2021-11-24 21:08:57 +11:00
Michael Ellerman	dae5818646	KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr() kvmppc_h_set_dabr(), and kvmppc_h_set_xdabr() which jumps into it, need to use _GLOBAL_TOC to setup the kernel TOC pointer, because kvmppc_h_set_dabr() uses LOAD_REG_ADDR() to load dawr_force_enable. When called from hcall_try_real_mode() we have the kernel TOC in r2, established near the start of kvmppc_interrupt_hv(), so there is no issue. But they can also be called from kvmppc_pseries_do_hcall() which is module code, so the access ends up happening with the kvm-hv module's r2, which will not point at dawr_force_enable and could even cause a fault. With the current code layout and compilers we haven't observed a fault in practice, the load hits somewhere in kvm-hv.ko and silently returns some bogus value. Note that we we expect p8/p9 guests to use the DAWR, but SLOF uses h_set_dabr() to test if sc1 works correctly, see SLOF's lib/libhvcall/brokensc1.c. Fixes: `c1fe190c06` ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Daniel Axtens <dja@axtens.net> Link: https://lore.kernel.org/r/20210923151031.72408-1-mpe@ellerman.id.au	2021-11-15 15:46:45 +11:00
Michael Ellerman	cdeb5d7d89	KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest We call idle_kvm_start_guest() from power7_offline() if the thread has been requested to enter KVM. We pass it the SRR1 value that was returned from power7_idle_insn() which tells us what sort of wakeup we're processing. Depending on the SRR1 value we pass in, the KVM code might enter the guest, or it might return to us to do some host action if the wakeup requires it. If idle_kvm_start_guest() is able to handle the wakeup, and enter the guest it is supposed to indicate that by returning a zero SRR1 value to us. That was the behaviour prior to commit `10d91611f4` ("powerpc/64s: Reimplement book3s idle code in C"), however in that commit the handling of SRR1 was reworked, and the zeroing behaviour was lost. Returning from idle_kvm_start_guest() without zeroing the SRR1 value can confuse the host offline code, causing the guest to crash and other weirdness. Fixes: `10d91611f4` ("powerpc/64s: Reimplement book3s idle code in C") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211015133929.832061-2-mpe@ellerman.id.au	2021-10-16 00:40:03 +11:00
Michael Ellerman	9b4416c509	KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest() In commit `10d91611f4` ("powerpc/64s: Reimplement book3s idle code in C") kvm_start_guest() became idle_kvm_start_guest(). The old code allocated a stack frame on the emergency stack, but didn't use the frame to store anything, and also didn't store anything in its caller's frame. idle_kvm_start_guest() on the other hand is written more like a normal C function, it creates a frame on entry, and also stores CR/LR into its callers frame (per the ABI). The problem is that there is no caller frame on the emergency stack. The emergency stack for a given CPU is allocated with: paca_ptrs[i]->emergency_sp = alloc_stack(limit, i) + THREAD_SIZE; So emergency_sp actually points to the first address above the emergency stack allocation for a given CPU, we must not store above it without first decrementing it to create a frame. This is different to the regular kernel stack, paca->kstack, which is initialised to point at an initial frame that is ready to use. idle_kvm_start_guest() stores the backchain, CR and LR all of which write outside the allocation for the emergency stack. It then creates a stack frame and saves the non-volatile registers. Unfortunately the frame it creates is not large enough to fit the non-volatiles, and so the saving of the non-volatile registers also writes outside the emergency stack allocation. The end result is that we corrupt whatever is at 0-24 bytes, and 112-248 bytes above the emergency stack allocation. In practice this has gone unnoticed because the memory immediately above the emergency stack happens to be used for other stack allocations, either another CPUs mc_emergency_sp or an IRQ stack. See the order of calls to irqstack_early_init() and emergency_stack_init(). The low addresses of another stack are the top of that stack, and so are only used if that stack is under extreme pressue, which essentially never happens in practice - and if it did there's a high likelyhood we'd crash due to that stack overflowing. Still, we shouldn't be corrupting someone else's stack, and it is purely luck that we aren't corrupting something else. To fix it we save CR/LR into the caller's frame using the existing r1 on entry, we then create a SWITCH_FRAME_SIZE frame (which has space for pt_regs) on the emergency stack with the backchain pointing to the existing stack, and then finally we switch to the new frame on the emergency stack. Fixes: `10d91611f4` ("powerpc/64s: Reimplement book3s idle code in C") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211015133929.832061-1-mpe@ellerman.id.au	2021-10-16 00:39:54 +11:00
Nicholas Piggin	267cdfa213	KVM: PPC: Book3S HV: Tolerate treclaim. in fake-suspend mode changing registers POWER9 DD2.2 and 2.3 hardware implements a "fake-suspend" mode where certain TM instructions executed in HV=0 mode cause softpatch interrupts so the hypervisor can emulate them and prevent problematic processor conditions. In this fake-suspend mode, the treclaim. instruction does not modify registers. Unfortunately the rfscv instruction executed by the guest do not generate softpatch interrupts, which can cause the hypervisor to lose track of the fake-suspend mode, and it can execute this treclaim. while not in fake-suspend mode. This modifies GPRs and crashes the hypervisor. It's not trivial to disable scv in the guest with HFSCR now, because they assume a POWER9 has scv available. So this fix saves and restores checkpointed registers across the treclaim. Fixes: `7854f7545b` ("KVM: PPC: Book3S: Rework TM save/restore code and make it C-callable") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210908101718.118522-2-npiggin@gmail.com	2021-09-13 22:34:12 +10:00
Nicholas Piggin	daac40e8d7	KVM: PPC: Book3S HV: Remove TM emulation from POWER7/8 path TM fake-suspend emulation is only used by POWER9. Remove it from the old code path. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210811160134.904987-3-npiggin@gmail.com	2021-08-25 16:37:17 +10:00
Nicholas Piggin	fae5c9f366	KVM: PPC: Book3S HV: remove ISA v3.0 and v3.1 support from P7/8 path POWER9 and later processors always go via the P9 guest entry path now. Remove the remaining support from the P7/8 path. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-33-npiggin@gmail.com	2021-06-10 22:12:15 +10:00
Nicholas Piggin	079a09a500	KVM: PPC: Book3S HV P9: implement hash guest support Implement hash guest support. Guest entry/exit has to restore and save/clear the SLB, plus several other bits to accommodate hash guests in the P9 path. Radix host, hash guest support is removed from the P7/8 path. The HPT hcalls and faults are not handled in real mode, which is a performance regression. A worst-case fork/exit microbenchmark takes 3x longer after this patch. kbuild benchmark performance is in the noise, but the slowdown is likely to be noticed somewhere. For now, accept this penalty for the benefit of simplifying the P7/8 paths and unifying P9 hash with the new code, because hash is a less important configuration than radix on processors that support it. Hash will benefit from future optimisations to this path, including possibly a faster path to handle such hcalls and interrupts without doing a full exit. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-31-npiggin@gmail.com	2021-06-10 22:12:15 +10:00
Nicholas Piggin	dcbac73a5b	KVM: PPC: Book3S HV: Remove virt mode checks from real mode handlers Now that the P7/8 path no longer supports radix, real-mode handlers do not need to deal with being called in virt mode. This change effectively reverts commit `acde25726b` ("KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers"). It removes a few more real-mode tests in rm hcall handlers, which allows the indirect ops for the xive module to be removed from the built-in xics rm handlers. kvmppc_h_random is renamed to kvmppc_rm_h_random to be a bit more descriptive and consistent with other rm handlers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-25-npiggin@gmail.com	2021-06-10 22:12:14 +10:00
Nicholas Piggin	9769a7fd79	KVM: PPC: Book3S HV: Remove radix guest support from P7/8 path The P9 path now runs all supported radix guest combinations, so remove radix guest support from the P7/8 path. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-24-npiggin@gmail.com	2021-06-10 22:12:14 +10:00
Nicholas Piggin	2e1ae9cd56	KVM: PPC: Book3S HV: Implement radix prefetch workaround by disabling MMU Rather than partition the guest PID space + flush a rogue guest PID to work around this problem, instead fix it by always disabling the MMU when switching in or out of guest MMU context in HV mode. This may be a bit less efficient, but it is a lot less complicated and allows the P9 path to trivally implement the workaround too. Newer CPUs are not subject to this issue. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-22-npiggin@gmail.com	2021-06-10 22:12:14 +10:00
Nicholas Piggin	89d35b2391	KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C Almost all logic is moved to C, by introducing a new in_guest mode for the P9 path that branches very early in the KVM interrupt handler to P9 exit code. The main P9 entry and exit assembly is now only about 160 lines of low level stack setup and register save/restore, plus a bad-interrupt handler. There are two motivations for this, the first is just make the code more maintainable being in C. The second is to reduce the amount of code running in a special KVM mode, "realmode". In quotes because with radix it is no longer necessarily real-mode in the MMU, but it still has to be treated specially because it may be in real-mode, and has various important registers like PID, DEC, TB, etc set to guest. This is hostile to the rest of Linux and can't use arbitrary kernel functionality or be instrumented well. This initial patch is a reasonably faithful conversion of the asm code, but it does lack any loop to return quickly back into the guest without switching out of realmode in the case of unimportant or easily handled interrupts. As explained in previous changes, handling HV interrupts very quickly in this low level realmode is not so important for P9 performance, and are important to avoid for security, observability, debugability reasons. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-15-npiggin@gmail.com	2021-06-10 22:12:13 +10:00
Nicholas Piggin	9dc2babc18	KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path In the interest of minimising the amount of code that is run in "real-mode", don't handle hcalls in real mode in the P9 path. This requires some new handlers for H_CEDE and xics-on-xive to be added before xive is pulled or cede logic is checked. This introduces a change in radix guest behaviour where radix guests that execute 'sc 1' in userspace now get a privilege fault whereas previously the 'sc 1' would be reflected as a syscall interrupt to the guest kernel. That reflection is only required for hash guests that run PR KVM. Background: In POWER8 and earlier processors, it is very expensive to exit from the HV real mode context of a guest hypervisor interrupt, and switch to host virtual mode. On those processors, guest->HV interrupts reach the hypervisor with the MMU off because the MMU is loaded with guest context (LPCR, SDR1, SLB), and the other threads in the sub-core need to be pulled out of the guest too. Then the primary must save off guest state, invalidate SLB and ERAT, and load up host state before the MMU can be enabled to run in host virtual mode (~= regular Linux mode). Hash guests also require a lot of hcalls to run due to the nature of the MMU architecture and paravirtualisation design. The XICS interrupt controller requires hcalls to run. So KVM traditionally tries hard to avoid the full exit, by handling hcalls and other interrupts in real mode as much as possible. By contrast, POWER9 has independent MMU context per-thread, and in radix mode the hypervisor is in host virtual memory mode when the HV interrupt is taken. Radix guests do not require significant hcalls to manage their translations, and xive guests don't need hcalls to handle interrupts. So it's much less important for performance to handle hcalls in real mode on POWER9. One caveat is that the TCE hcalls are performance critical, real-mode variants introduced for POWER8 in order to achieve 10GbE performance. Real mode TCE hcalls were found to be less important on POWER9, which was able to drive 40GBe networking without them (using the virt mode hcalls) but performance is still important. These hcalls will benefit from subsequent guest entry/exit optimisation including possibly a faster "partial exit" that does not entirely switch to host context to handle the hcall. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-14-npiggin@gmail.com	2021-06-10 22:12:13 +10:00
Nicholas Piggin	023c3c96ca	KVM: PPC: Book3S HV P9: implement kvmppc_xive_pull_vcpu in C This is more symmetric with kvmppc_xive_push_vcpu, and has the advantage that it runs with the MMU on. The extra test added to the asm will go away with a future change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-9-npiggin@gmail.com	2021-06-10 22:12:12 +10:00
Nicholas Piggin	1b5821c630	KVM: PPC: Book3S 64: move bad_host_intr check to HV handler The bad_host_intr check will never be true with PR KVM, move it to HV code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-7-npiggin@gmail.com	2021-06-10 22:12:12 +10:00
Nicholas Piggin	f36011569b	KVM: PPC: Book3S 64: move KVM interrupt entry to a common entry point Rather than bifurcate the call depending on whether or not HV is possible, and have the HV entry test for PR, just make a single common point which does the demultiplexing. This makes it simpler to add another type of exit handler. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-2-npiggin@gmail.com	2021-06-10 22:12:01 +10:00
Nicholas Piggin	6ba53317d4	KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path Similar to commit `25edcc50d7` ("KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path"), ensure the P7/8 path saves and restores the host FSCR. The logic explained in that patch actually applies there to the old path well: a context switch can be made before kvmppc_vcpu_run_hv restores the host FSCR and returns. Now both the p9 and the p7/8 paths now save and restore their FSCR, it no longer needs to be restored at the end of kvmppc_vcpu_run_hv Fixes: `b005255e12` ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs") Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com	2021-06-04 22:02:25 +10:00
Nicholas Piggin	72476aaa46	KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests Commit `68ad28a4cd` ("KVM: PPC: Book3S HV: Fix radix guest SLB side channel") incorrectly removed the radix host instruction patch to skip re-loading the host SLB entries when exiting from a hash guest. Restore it. Fixes: `68ad28a4cd` ("KVM: PPC: Book3S HV: Fix radix guest SLB side channel") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-11 17:28:15 +11:00
Paul Mackerras	ab950e1acd	KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries Commit `68ad28a4cd` ("KVM: PPC: Book3S HV: Fix radix guest SLB side channel") changed the older guest entry path, with the side effect that vcpu->arch.slb_max no longer gets cleared for a radix guest. This means that a HPT guest which loads some SLB entries, switches to radix mode, runs the guest using the old guest entry path (e.g., because the indep_threads_mode module parameter has been set to false), and then switches back to HPT mode would now see the old SLB entries being present, whereas previously it would have seen no SLB entries. To avoid changing guest-visible behaviour, this adds a store instruction to clear vcpu->arch.slb_max for a radix guest using the old guest entry path. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-11 17:19:42 +11:00
Nicholas Piggin	7a7f94a3a9	KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB IH=6 may preserve hypervisor real-mode ERAT entries and is the recommended SLBIA hint for switching partitions. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Nicholas Piggin	078ebe35fc	KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Nicholas Piggin	68ad28a4cd	KVM: PPC: Book3S HV: Fix radix guest SLB side channel The slbmte instruction is legal in radix mode, including radix guest mode. This means radix guests can load the SLB with arbitrary data. KVM host does not clear the SLB when exiting a guest if it was a radix guest, which would allow a rogue radix guest to use the SLB as a side channel to communicate with other guests. Fix this by ensuring the SLB is cleared when coming out of a radix guest. Only the first 4 entries are a concern, because radix guests always run with LPCR[UPRT]=1, which limits the reach of slbmte. slbia is not used (except in a non-performance-critical path) because it can clear cached translations. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Nicholas Piggin	b1b1697ae0	KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support This reverts much of commit `c01015091a` ("KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hosts"), which was required to run HPT guests on RPT hosts on early POWER9 CPUs without support for "mixed mode", which meant the host could not run with MMU on while guests were running. This code has some corner case bugs, e.g., when the guest hits a machine check or HMI the primary locks up waiting for secondaries to switch LPCR to host, which they never do. This could all be fixed in software, but most CPUs in production have mixed mode support, and those that don't are believed to be all in installations that don't use this capability. So simplify things and remove support. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Ravi Bangoria	bd1de1a0e6	KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR KVM code assumes single DAWR everywhere. Add code to support 2nd DAWR. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/ unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR. Also, KVM will support 2nd DAWR only if CPU_FTR_DAWR1 is set. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Ravi Bangoria	122954ed7d	KVM: PPC: Book3S HV: Rename current DAWR macros and variables Power10 is introducing a second DAWR (Data Address Watchpoint Register). Use real register names (with suffix 0) from ISA for current macros and variables used by kvm. One exception is KVM_REG_PPC_DAWR. Keep it as it is because it's uapi so changing it will break userspace. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Nicholas Piggin	dc462267d2	powerpc/64s: handle ISA v3.1 local copy-paste context switches The ISA v3.1 the copy-paste facility has a new memory move functionality which allows the copy buffer to be pasted to domestic memory (RAM) as opposed to foreign memory (accelerator). This means the POWER9 trick of avoiding the cp_abort on context switch if the process had not mapped foreign memory does not work on POWER10. Do the cp_abort unconditionally there. KVM must also cp_abort on guest exit to prevent copy buffer state leaking between contexts. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200825075535.224536-1-npiggin@gmail.com	2020-09-08 22:57:12 +10:00
Athira Rajeev	5752fe0b81	KVM: PPC: Book3S HV: Save/restore new PMU registers Power ISA v3.1 has added new performance monitoring unit (PMU) special purpose registers (SPRs). They are: Monitor Mode Control Register 3 (MMCR3) Sampled Instruction Event Register A (SIER2) Sampled Instruction Event Register B (SIER3) Add support to save/restore these new SPRs while entering/exiting guest. Also include changes to support KVM_REG_PPC_MMCR3/SIER2/SIER3. Add new SPRs to KVM API documentation. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-6-git-send-email-atrajeev@linux.vnet.ibm.com	2020-07-22 21:56:41 +10:00
Athira Rajeev	7e4a145e5b	KVM: PPC: Book3S HV: Cleanup updates for kvm vcpu MMCR Currently `kvm_vcpu_arch` stores all Monitor Mode Control registers in a flat array in order: mmcr0, mmcr1, mmcra, mmcr2, mmcrs Split this to give mmcra and mmcrs its own entries in vcpu and use a flat array for mmcr0 to mmcr2. This patch implements this cleanup to make code easier to read. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Fix MMCRA/MMCR2 uapi breakage as noted by paulus] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1594996707-3727-3-git-send-email-atrajeev@linux.vnet.ibm.com	2020-07-22 21:56:01 +10:00
Michael Ellerman	787a2b682d	Merge branch 'topic/ppc-kvm' into next Merge our topic branch shared with the kvm-ppc tree. This brings in one commit that touches the XIVE interrupt controller logic across core and KVM code.	2020-05-20 23:38:13 +10:00
Ravi Bangoria	09f82b063a	powerpc/watchpoint: Rename current DAWR macros Power10 is introducing second DAWR. Use real register names from ISA for current macros: s/SPRN_DAWR/SPRN_DAWR0/ s/SPRN_DAWRX/SPRN_DAWRX0/ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Michael Neuling <mikey@neuling.org> Link: https://lore.kernel.org/r/20200514111741.97993-2-ravi.bangoria@linux.ibm.com	2020-05-19 00:11:03 +10:00
Cédric Le Goater	b1f9be9392	powerpc/xive: Enforce load-after-store ordering when StoreEOI is active When an interrupt has been handled, the OS notifies the interrupt controller with a EOI sequence. On a POWER9 system using the XIVE interrupt controller, this can be done with a load or a store operation on the ESB interrupt management page of the interrupt. The StoreEOI operation has less latency and improves interrupt handling performance but it was deactivated during the POWER9 DD2.0 timeframe because of ordering issues. We use the LoadEOI today but we plan to reactivate StoreEOI in future architectures. There is usually no need to enforce ordering between ESB load and store operations as they should lead to the same result. E.g. a store trigger and a load EOI can be executed in any order. Assuming the interrupt state is PQ=10, a store trigger followed by a load EOI will return a Q bit. In the reverse order, it will create a new interrupt trigger from HW. In both cases, the handler processing interrupts is notified. In some cases, the XIVE_ESB_SET_PQ_10 load operation is used to disable temporarily the interrupt source (mask/unmask). When the source is reenabled, the OS can detect if interrupts were received while the source was disabled and reinject them. This process needs special care when StoreEOI is activated. The ESB load and store operations should be correctly ordered because a XIVE_ESB_STORE_EOI operation could leave the source enabled if it has not completed before the loads. For those cases, we enforce Load-after-Store ordering with a special load operation offset. To avoid performance impact, this ordering is only enforced when really needed, that is when interrupt sources are temporarily disabled with the XIVE_ESB_SET_PQ_10 load. It should not be needed for other loads. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200220081506.31209-1-clg@kaod.org	2020-05-07 22:58:31 +10:00
Nicholas Piggin	9600f261ac	powerpc/64s/exception: Move KVM test to common code This allows more code to be moved out of unrelocated regions. The system call KVMTEST is changed to be open-coded and remain in the tramp area to avoid having to move it to entry_64.S. The custom nature of the system call entry code means the hcall case can be made more streamlined than regular interrupt handlers. mpe: Incorporate fix from Nick: Moving KVM test to the common entry code missed the case of HMI and MCE, which do not do __GEN_COMMON_ENTRY (because they don't want to switch to virt mode). This means a MCE or HMI exception that is taken while KVM is running a guest context will not be switched out of that context, and KVM won't be notified. Found by running sigfuz in guest with patched host on POWER9 DD2.3, which causes some TM related HMI interrupts (which are expected and supposed to be handled by KVM). This fix adds a __GEN_REALMODE_COMMON_ENTRY for those handlers to add the KVM test. This makes them look a little more like other handlers that all use __GEN_COMMON_ENTRY. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-13-npiggin@gmail.com	2020-04-01 13:42:11 +11:00
Jordan Niethe	736bcdd3a9	powerpc/mm: Remove kvm radix prefetch workaround for Power9 DD2.2 Commit `a25bd72bad` ("powerpc/mm/radix: Workaround prefetch issue with KVM") introduced a number of workarounds as coming out of a guest with the mmu enabled would make the cpu would start running in hypervisor state with the PID value from the guest. The cpu will then start prefetching for the hypervisor with that PID value. In Power9 DD2.2 the cpu behaviour was modified to fix this. When accessing Quadrant 0 in hypervisor mode with LPID != 0 prefetching will not be performed. This means that we can get rid of the workarounds for Power9 DD2.2 and later revisions. Add a new cpu feature CPU_FTR_P9_RADIX_PREFETCH_BUG to indicate if the workarounds are needed. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191206031722.25781-1-jniethe5@gmail.com	2020-01-26 00:11:37 +11:00
Marcus Comstedt	228b607d8e	KVM: PPC: Book3S HV: Fix regression on big endian hosts VCPU_CR is the offset of arch.regs.ccr in kvm_vcpu. arch/powerpc/include/asm/kvm_host.h defines arch.regs as a struct pt_regs, and arch/powerpc/include/asm/ptrace.h defines the ccr field of pt_regs as "unsigned long ccr". Since unsigned long is 64 bits, a 64-bit load needs to be used to load it, unless an endianness specific correction offset is added to access the desired subpart. In this case there is no reason to _not_ use a 64 bit load though. Fixes: `6c85b7bc63` ("powerpc/kvm: Use UV_RETURN ucall to return to ultravisor") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Marcus Comstedt <marcus@mc.pp.se> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191215094900.46740-1-marcus@mc.pp.se	2019-12-17 15:09:08 +11:00
Michael Ellerman	af2e8c68b9	KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel On some systems that are vulnerable to Spectre v2, it is up to software to flush the link stack (return address stack), in order to protect against Spectre-RSB. When exiting from a guest we do some house keeping and then potentially exit to C code which is several stack frames deep in the host kernel. We will then execute a series of returns without preceeding calls, opening up the possiblity that the guest could have poisoned the link stack, and direct speculative execution of the host to a gadget of some sort. To prevent this we add a flush of the link stack on exit from a guest. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-11-14 15:37:59 +11:00
Jordan Niethe	7fe4e1176d	powerpc/kvm: Fix kvmppc_vcore->in_guest value in kvmhv_switch_to_host kvmhv_switch_to_host() in arch/powerpc/kvm/book3s_hv_rmhandlers.S needs to set kvmppc_vcore->in_guest to 0 to signal secondary CPUs to continue. This happens after resetting the PCR. Before commit `13c7bb3c57` ("powerpc/64s: Set reserved PCR bits"), r0 would always be 0 before it was stored to kvmppc_vcore->in_guest. However because of this change in the commit: /* Reset PCR / ld r0, VCORE_PCR(r5) - cmpdi r0, 0 + LOAD_REG_IMMEDIATE(r6, PCR_MASK) + cmpld r0, r6 beq 18f - li r0, 0 - mtspr SPRN_PCR, r0 + mtspr SPRN_PCR, r6 18: / Signal secondary CPUs to continue */ stb r0,VCORE_IN_GUEST(r5) We are no longer comparing r0 against 0 and loading it with 0 if it contains something else. Hence when we store r0 to kvmppc_vcore->in_guest, it might not be 0. This means that secondary CPUs will not be signalled to continue. Those CPUs get stuck and errors like the following are logged: KVM: CPU 1 seems to be stuck KVM: CPU 2 seems to be stuck KVM: CPU 3 seems to be stuck KVM: CPU 4 seems to be stuck KVM: CPU 5 seems to be stuck KVM: CPU 6 seems to be stuck KVM: CPU 7 seems to be stuck This can be reproduced with: $ for i in `seq 1 7` ; do chcpu -d $i ; done ; $ taskset -c 0 qemu-system-ppc64 -smp 8,threads=8 \ -M pseries,accel=kvm,kvm-type=HV -m 1G -nographic -vga none \ -kernel vmlinux -initrd initrd.cpio.xz Fix by making sure r0 is 0 before storing it to kvmppc_vcore->in_guest. Fixes: `13c7bb3c57` ("powerpc/64s: Set reserved PCR bits") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191004025317.19340-1-jniethe5@gmail.com	2019-10-09 17:16:59 +11:00

1 2 3 4 5 ...

309 Commits