OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Nicholas Piggin	11e87346b9	powerpc/64s: Consolidate Program 0x700 interrupt Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-04 13:07:01 +11:00
Nicholas Piggin	f9aa67142e	powerpc/64s: Consolidate Alignment 0x600 interrupt Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-04 13:07:00 +11:00
Nicholas Piggin	c138e58890	powerpc/64s: Consolidate External 0x500 interrupt Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-04 13:07:00 +11:00
Nicholas Piggin	8d04631ad7	powerpc/64s: Consolidate Instruction Segment 0x480 interrupt Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-04 13:06:59 +11:00
Nicholas Piggin	27ce77df60	powerpc/64s: Consolidate Instruction Storage 0x400 interrupt Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-04 13:06:58 +11:00
Nicholas Piggin	2b9af6e40e	powerpc/64s: Consolidate Data Segment 0x380 interrupt Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-04 13:06:58 +11:00
Nicholas Piggin	80795e6cbe	powerpc/64s: Consolidate Data Storage 0x300 interrupt Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-04 13:06:57 +11:00
Nicholas Piggin	afcf009548	powerpc/64s: Consolidate Machine Check 0x200 interrupt Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-04 13:06:57 +11:00
Nicholas Piggin	582baf44f9	powerpc/64s: Consolidate System Reset 0x100 interrupt Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-04 13:06:56 +11:00
Nicholas Piggin	57f266497d	powerpc: Use gas sections for arranging exception vectors Use assembler sections of fixed size and location to arrange the 64-bit Book3S exception vector code (64-bit Book3E also uses it in head_64.S for 0x0..0x100). This allows better flexibility in arranging exception code and hiding unimportant details behind macros. Gas sections can be a bit painful to use this way, mainly because the assembler does not know where they will be finally linked. Taking absolute addresses requires a bit of trickery for example, but it can be hidden behind macros for the most part. Generated code is mostly the same except locations, offsets, alignments. The "+ 0x2" is only required for the trap number / kvm exit number, which gets loaded as a constant into a register. Previously, code also used + 0x2 for label names, but we changed to using "H" to distinguish HV case for that. Remove the last vestiges of that. __after_prom_start is taking absolute address of a label in another fixed section. Newer toolchains seemed to compile this okay, but older ones do not. FIXED_SYMBOL_ABS_ADDR is more foolproof, it just takes an additional line to define. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-04 13:06:56 +11:00
Nicholas Piggin	573819e343	powerpc/64: Change the way relocation copy is calculated With a subsequent patch to put text into different sections, (_end - _stext) can no longer be computed at link time to determine the end of the copy. Instead, calculate it at runtime with (copy_to_here - _stext) + (_end - copy_to_here). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-04 13:06:55 +11:00
Nicholas Piggin	be642c3457	powerpc/64s: Consolidate exception handler alignment Move exception handler alignment directives into the head-64.h macros, because they will no longer work in-place after the next patch. This slightly changes functions that have alignments applied and therefore code generation, which is why it was not done initially (see earlier patch). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-04 13:06:55 +11:00
Michael Ellerman	da2bc4644c	powerpc/64s: Add new exception vector macros Create arch/powerpc/include/asm/head-64.h with macros that specify an exception vector (name, type, location), which will be used to label and lay out exceptions into the object file. Naming is moved out of exception-64s.h, which is used to specify the implementation of exception handlers. objdump of generated code in exception vectors is unchanged except for names. Alignment directives scattered around are annoying, but done this way so that disassembly can verify identical instruction generation before and after patch. These get cleaned up in future patch. We change the way KVMTEST works, explicitly passing EXC_HV or EXC_STD rather than overloading the trap number. This removes the need to have SOFTEN values for the overloaded trap numbers, eg. 0x502. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-04 13:06:36 +11:00
Marcelo Cerri	74ff6cb3aa	crypto: sha1-powerpc - little-endian support The driver does not handle endianness properly when loading the input data. Signed-off-by: Marcelo Cerri <marcelo.cerri@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-10-02 22:31:53 +08:00
Thomas Gleixner	d7e25c66c9	Merge branch 'x86/urgent' into x86/asm Get the cr4 fixes so we can apply the final cleanup	2016-09-30 12:38:28 +02:00
Anton Blanchard	5045ea3737	powerpc/vdso64: Use double word compare on pointers __kernel_get_syscall_map() and __kernel_clock_getres() use cmpli to check if the passed in pointer is non zero. cmpli maps to a 32 bit compare on binutils, so we ignore the top 32 bits. A simple test case can be created by passing in a bogus pointer with the bottom 32 bits clear. Using a clk_id that is handled by the VDSO, then one that is handled by the kernel shows the problem: printf("%d\n", clock_getres(CLOCK_REALTIME, (void *)0x100000000)); printf("%d\n", clock_getres(CLOCK_BOOTTIME, (void *)0x100000000)); And we get: 0 -1 The bigger issue is if we pass a valid pointer with the bottom 32 bits clear, in this case we will return success but won't write any data to the pointer. I stumbled across this issue because the LLVM integrated assembler doesn't accept cmpli with 3 arguments. Fix this by converting them to cmpldi. Fixes: `a7f290dad3` ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel") Cc: stable@vger.kernel.org # v2.6.15+ Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-29 15:17:57 +10:00
Balbir Singh	2e5bbb5461	KVM: PPC: Book3S HV: Migrate pinned pages out of CMA When PCI Device pass-through is enabled via VFIO, KVM-PPC will pin pages using get_user_pages_fast(). One of the downsides of the pinning is that the page could be in CMA region. The CMA region is used for other allocations like the hash page table. Ideally we want the pinned pages to be from non CMA region. This patch (currently only for KVM PPC with VFIO) forcefully migrates the pages out (huge pages are omitted for the moment). There are more efficient ways of doing this, but that might be elaborate and might impact a larger audience beyond just the kvm ppc implementation. The magic is in new_iommu_non_cma_page() which allocates the new page from a non CMA region. I've tested the patches lightly at my end. The full solution requires migration of THP pages in the CMA region. That work will be done incrementally on top of this. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru> [mpe: Merged via powerpc tree as that's where the changes are] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-29 15:14:44 +10:00
Gavin Shan	360aebd85a	drivers/pci/hotplug: Support surprise hotplug in powernv driver This supports PCI surprise hotplug. The design is highlighted as below: * The PCI slot's surprise hotplug capability is exposed through device node property "ibm,slot-surprise-pluggable", meaning PCI surprise hotplug will be disabled if skiboot doesn't support it yet. * The interrupt because of presence or link state change is raised on surprise hotplug event. One event is allocated and queued to the PCI slot for workqueue to pick it up and process in serialized fashion. The code flow for surprise hotplug is same to that for managed hotplug except: the affected PEs are put into frozen state to avoid unexpected EEH error reporting in surprise hot remove path. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-29 15:02:28 +10:00
Gavin Shan	313483dd72	powerpc/powernv: Unfreeze PE on allocation This unfreezes PE when it's initialized because the PE might be put into frozen state in the last hot remove path. It's not harmful to do so if the PE is already in unfrozen state. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-29 15:01:53 +10:00
Gavin Shan	e0056b0a12	powerpc/eeh: Export eeh_pe_state_mark() This exports eeh_pe_state_mark(). It will be used to mark the surprise hot removed PE as isolated to avoid unexpected EEH error reporting in surprise remove path. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-29 14:51:04 +10:00
Gavin Shan	35066c0d79	powerpc/eeh: Export confirm_error_lock This exports @confirm_error_lock so that eeh_serialize_{lock, unlock}() can be used to freeze the affected PE in PCI surprise hot remove path. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-29 14:51:03 +10:00
Gavin Shan	de5a662249	powerpc/eeh: Allow to freeze PE in eeh_pe_set_option() Function eeh_pe_set_option() is used to apply the requested options (enable, disable, unfreeze) in EEH virtualization path. The semantics of this function isn't complete until freezing is supported. This allows to freeze the indicated PE. The new semantics is going to be used in PCI surprise hot remove path, to freeze removed PCI devices (PE) to avoid unexpected EEH error reporting. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-29 14:51:02 +10:00
Gavin Shan	fbce44d0ed	powerpc/powernv: Call opal_pci_poll() if needed When issuing PHB reset, OPAL API opal_pci_poll() is called to drive the state machine in OPAL forward. However, we needn't always call the function under some circumstances like reset deassert. This avoids calling opal_pci_poll() when OPAL_SUCCESS is returned from opal_pci_reset(). Except the overhead introduced by additional one unnecessary OPAL call, I didn't run into real issue because of this. Reported-by: Pridhiviraj Paidipeddi <ppaiddipe@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-29 14:50:51 +10:00
Oliver O'Halloran	c762c69e10	powerpc/boot: Add support for XZ compression This patch adds an option to use XZ compression for the kernel image. Currently this is only enabled for 64-bit Book3S targets, which is roughly equivalent to the platforms that use the kernel's zImage wrapper, and that have been tested. The bulk of the 32-bit platforms and 64-bit BookE use uboot images, which relies on uboot implementing XZ. In future we can enable XZ support for those targets once someone has tested it. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-28 14:35:14 +10:00
Oliver O'Halloran	f1e510bbb9	powerpc/boot: Add XZ support to the wrapper script This modifies the wrapper script so that the -Z option takes an argument to specify the compression type. It can either be 'gz', 'xz' or 'none'. The legacy --no-gzip and -z options are still supported and will set the compression to none and gzip respectively, but they are not documented. Only XZ -6 is used for compression rather than XZ -9. Using compression levels higher than 6 requires the decompressor to build a large (64MB) dictionary when decompressing and some environments cannot satisfy such large allocations (e.g. POWER 6 LPAR partition firmware). Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-28 14:32:27 +10:00
Oliver O'Halloran	a4da56fbc5	powerpc/boot: Remove the legacy gzip wrapper This code is no longer used and can be removed. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-28 14:31:50 +10:00
Oliver O'Halloran	1b7898ee27	powerpc/boot: Use the pre-boot decompression API Currently the powerpc boot wrapper has its own wrapper around zlib to handle decompressing gzipped kernels. The kernel decompressor library functions now provide a generic interface that can be used in the pre-boot environment. This allows boot wrappers to easily support different compression algorithms. This patch converts the wrapper to use this new API, but does not add support for using new algorithms. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-28 14:31:43 +10:00
Oliver O'Halloran	22750d98b0	powerpc/boot: Use CONFIG_KERNEL_GZIP Most architectures allow the compression algorithm used to produce the vmlinuz image to be selected as a kernel config option. In preparation for supporting algorithms other than gzip in the powerpc boot wrapper the makefile needs to be modified to use these config options. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-28 14:25:55 +10:00
Oliver O'Halloran	1a13de6df9	powerpc/boot: Add sed script The powerpc boot wrapper is potentially compiled with a separate toolchain and/or toolchain flags than the rest of the kernel. The usual case is a 64-bit big endian kernel builds a 32-bit big endian wrapper. The main problem with this is that the wrapper does not have access to the kernel headers (without a lot of gross hacks). To get around this the required headers are copied into the build directory via several sed scripts which rewrite problematic includes. This patch moves these fixups out of the makefile into a separate .sed script file to clean up makefile slightly. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Reword first paragraph of change log a little] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-28 14:20:44 +10:00
Deepa Dinamani	078cd8279e	fs: Replace CURRENT_TIME with current_time() for inode timestamps CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_time() instead. CURRENT_TIME is also not y2038 safe. This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. As part of the effort current_time() will be extended to do range checks. Hence, it is necessary for all file system timestamps to use current_time(). Also, current_time() will be transitioned along with vfs to be y2038 safe. Note that whenever a single call to current_time() is used to change timestamps in different inodes, it is because they share the same time granularity. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Felipe Balbi <balbi@kernel.org> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-09-27 21:06:21 -04:00
Thomas Huth	fa73c3b25b	KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register The MMCR2 register is available twice, one time with number 785 (privileged access), and one time with number 769 (unprivileged, but it can be disabled completely). In former times, the Linux kernel was using the unprivileged register 769 only, but since commit `8dd75ccb57` ("powerpc: Use privileged SPR number for MMCR2"), it uses the privileged register 785 instead. The KVM-PR code then of course also switched to use the SPR 785, but this is causing older guest kernels to crash, since these kernels still access 769 instead. So to support older kernels with KVM-PR again, we have to support register 769 in KVM-PR, too. Fixes: `8dd75ccb57` Cc: stable@vger.kernel.org # v3.10+ Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-27 15:14:29 +10:00
Thomas Huth	2365f6b67c	KVM: PPC: Book3S PR: Support 64kB page size on POWER8E and POWER8NVL On POWER8E and POWER8NVL, KVM-PR does not announce support for 64kB page sizes and 1TB segments yet. Looks like this has just been forgotten so far, since there is no reason why this should be different to the normal POWER8 CPUs. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-27 15:14:29 +10:00
Balbir Singh	4f053d06dc	KVM: PPC: Book3S: Remove duplicate setting of the B field in tlbie Remove duplicate setting of the "B" field when doing a tlbie(l). In compute_tlbie_rb(), the "B" field is set again just before returning the rb value to be used for tlbie(l). Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-27 15:14:29 +10:00
Dan Carpenter	ac0e89bb47	KVM: PPC: BookE: Fix a sanity check We use logical negate where bitwise negate was intended. It means that we never return -EINVAL here. Fixes: `ce11e48b7f` ('KVM: PPC: E500: Add userspace debug stub support') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-27 15:14:29 +10:00
Paul Mackerras	b009031f74	KVM: PPC: Book3S HV: Take out virtual core piggybacking code This takes out the code that arranges to run two (or more) virtual cores on a single subcore when possible, that is, when both vcores are from the same VM, the VM is configured with one CPU thread per virtual core, and all the per-subcore registers have the same value in each vcore. Since the VTB (virtual timebase) is a per-subcore register, and will almost always differ between vcores, this code is disabled on POWER8 machines, meaning that it is only usable on POWER7 machines (which don't have VTB). Given the tiny number of POWER7 machines which have firmware that allows them to run HV KVM, the benefit of simplifying the code outweighs the loss of this feature on POWER7 machines. Tested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-27 14:42:07 +10:00
Paul Mackerras	88b02cf97b	KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread POWER8 has one virtual timebase (VTB) register per subcore, not one per CPU thread. The HV KVM code currently treats VTB as a per-thread register, which can lead to spurious soft lockup messages from guests which use the VTB as the time source for the soft lockup detector. (CPUs before POWER8 did not have the VTB register.) For HV KVM, this fixes the problem by making only the primary thread in each virtual core save and restore the VTB value. With this, the VTB state becomes part of the kvmppc_vcore structure. This also means that "piggybacking" of multiple virtual cores onto one subcore is not possible on POWER8, because then the virtual cores would share a single VTB register. PR KVM emulates a VTB register, which is per-vcpu because PR KVM has no notion of CPU threads or SMT. For PR KVM we move the VTB state into the kvmppc_vcpu_book3s struct. Cc: stable@vger.kernel.org # v3.14+ Reported-by: Thomas Huth <thuth@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-27 14:41:39 +10:00
Linus Torvalds	751b9a5d16	powerpc fixes for 4.8 #7 - powernv/pci: Fix m64 checks for SR-IOV and window alignment from Russell Currey Merge tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull one more powerpc fix from Michael Ellerman: "powernv/pci: Fix m64 checks for SR-IOV and window alignment from Russell Currey" * tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment	2016-09-25 13:52:59 -07:00
Claudiu Manoil	e0b80f00bb	arch/powerpc: Add CONFIG_FSL_DPAA to corenetXX_smp_defconfig Enable the drivers on the powerpc arch. Signed-off-by: Roy Pledge <roy.pledge@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-25 02:39:01 -05:00
Christophe Leroy	36eb1542fc	powerpc/8xx: make user addr DTLB miss the short path User space DTLB misses represent approximately 90% of TLB misses, so make it the shortest path. Also remove an unnecessary double jump in FixupDAR. Before this patch, we spend 3.3 TB ticks in the handler for each user address miss and 3.4 TB ticks for each kernel address miss. After this patch, we spend 3.0 TB ticks in the handler for each user address miss and 3.9 TB ticks for each kernel address miss. Taking into account that user misses represent 90% of the total, this patch provides an improvement of approx. 9%. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-25 02:38:57 -05:00
Christophe Leroy	73a532061c	powerpc/8xx: Move additional DTLBMiss handlers out of exception area When all options are activated, there is not enough space for the DTLBMiss handlers that handle IMMR area and linear RAM pages in the exception area once we have added hugepage handling. So let's move them after .0x2000 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-25 02:38:57 -05:00
Christophe Leroy	d1b9f81456	powerpc/8xx: use r3 to scratch CR in ITLBmiss Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-25 02:38:56 -05:00
Christophe Leroy	e627f8dc9a	powerpc/8xx: add dedicated machine check handler During a machine check, the 8xx provides indication of whether the check is due to data or instruction access, so let's display it. Let's also move 8xx specific handling into the new handler. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-25 02:38:55 -05:00
Christophe Leroy	f307939fb2	powerpc/8xx: add system_reset_exception When the watchdog is in NMI mode, the system reset interrupt is generated when the watchdog counter expires. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-25 02:38:54 -05:00
Scott Wood	63f1de8820	powerpc/fsl_pci: Size upper inbound window based on RAM size This allows PCI devices that can only address (e.g.) 36 or 40 bit DMA to use direct DMA, at the cost of not being able to DMA to non-RAM addresses (this doesn't affect MSIs as there is a separate dedicated window for that) which we wouldn't have been able to do anyway if the RAM size didn't trigger the creation of the second inbound window. It also fixes an off-by-one error that set dma_direct_ops on PCI devices whose dma mask could address all the space below the DMA offset (previously 40 bits), but not the window that starts at the DMA offset. Signed-off-by: Scott Wood <oss@buserror.net> Cc: Tillmann Heidsieck <theidsieck@leenox.de> Tested-by: Tillmann Heidsieck <theidsieck@leenox.de>	2016-09-25 02:38:54 -05:00
Christophe Leroy	834e5a6921	powerpc/8xx: use SPRN_EIE and SPRN_EID to enable/disable interrupts The 8xx has two special registers called EID (External Interrupt Disable) and EIE (External Interrupt Enable) for clearing/setting EE in MSR. It avoids the three instructions set mfmsr/ori/mtmsr or mfmsr/rlwinm/mtmsr and it avoids using a general register. We just have to write something in the special register to change MSR EE bit. So we write r0 into the register, regardless of r0 value. Writing to one of those two special registers also set the MSR RI bit, but this bit is only unset during beginning of exception prolog and end of exception epilog. When executing C-functions MSR RI is always set. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-25 02:38:53 -05:00
Kevin Hao	fff69fd03d	powerpc/83xx: factor out the common codes of setup arch functions Factor out the common codes of setup arch functions to a separate function. It does make no sense to print a board specific info in setup arch functions, so use a more general one. For ASP8347E board, there is no pci device node. So it is safe to invoke mpc83xx_setup_pci() in its setup arch function even there is no such invocation in its original setup arch function. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-25 02:38:53 -05:00
Christophe Leroy	4d486e0083	soc/fsl/qe: fix Oops on CPM1 (and likely CPM2) Commit `0e6e01ff69` ("CPM/QE: use genalloc to manage CPM/QE muram") has changed the way muram is managed. genalloc uses kmalloc(), hence requires the SLAB to be up and running. On powerpc 8xx, cpm_reset() is called early during startup. cpm_reset() then calls cpm_muram_init() before SLAB is available, hence the following Oops. cpm_reset() cannot be called during initcalls because the CPM is needed for console. This patch removes the call to cpm_muram_init() from cpm_reset(). cpm_muram_init() will be called from a new function called cpm_init() which is declared as subsys_initcall, unless cpm_muram_alloc() is called earlier for the serial console in which case cpm_muram_init() will be called from there. The reason for calling it from two places is that some drivers (e.g. i2c-cpm) need some of the initialisations done by cpm_muram_init() but don't call cpm_muram_alloc(). The console driver calls cpm_muram_alloc() but some platforms might not use the CPM serial ports for console. 
[ 0.000000] Unable to handle kernel paging request for data at address 0x00000008 [ 0.000000] Faulting instruction address: 0xc01acce0 [ 0.000000] Oops: Kernel access of bad area, sig: 11 [#1] [ 0.000000] PREEMPT CMPC885 [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.14-g0886ed8 #5 [ 0.000000] task: c05183e0 ti: c0536000 task.ti: c0536000 [ 0.000000] NIP: c01acce0 LR: c0011068 CTR: 00000000 [ 0.000000] REGS: c0537e50 TRAP: 0300 Not tainted (4.4.14-s3k-dev-g0886ed8-svn) [ 0.000000] MSR: 00001032 <ME,IR,DR,RI> CR: 28044428 XER: 00000000 [ 0.000000] DAR: 00000008 DSISR: c0000000 GPR00: c0011068 c0537f00 c05183e0 00000000 00009000 ffffffff 00000bc0 ffffffff GPR08: ff003000 ff00b000 ff003bbf 00000000 22044422 100d43a8 00000000 07ff94e8 GPR16: 00000000 07bb5d70 00000000 07ff81f4 07ff81f4 07ff81f4 00000000 00000000 GPR24: 07ffb3a0 07fe7628 c0550000 c7ffa190 c0540000 ff003bbf 00000000 00000001 [ 0.000000] NIP [c01acce0] gen_pool_add_virt+0x14/0xdc [ 0.000000] LR [c0011068] cpm_muram_init+0xd4/0x18c [ 0.000000] Call Trace: [ 0.000000] [c0537f00] [00000200] 0x200 (unreliable) [ 0.000000] [c0537f20] [c0011068] cpm_muram_init+0xd4/0x18c [ 0.000000] [c0537f70] [c0494684] cpm_reset+0xb4/0xc8 [ 0.000000] [c0537f90] [c0494c64] cmpc885_setup_arch+0x10/0x30 [ 0.000000] [c0537fa0] [c0493cd4] setup_arch+0x130/0x168 [ 0.000000] [c0537fb0] [c04906bc] start_kernel+0x88/0x380 [ 0.000000] [c0537ff0] [c0002224] start_here+0x38/0x98 [ 0.000000] Instruction dump: [ 0.000000] 91430010 91430014 80010014 83e1000c 7c0803a6 38210010 4e800020 7c0802a6 [ 0.000000] 9421ffe0 bf61000c 90010024 7c7e1b78 <80630008> 7c9c2378 7cc31c30 3863001f [ 0.000000] ---[ end trace dc8fa200cb88537f ]--- fixes: `0e6e01ff69` ("CPM/QE: use genalloc to manage CPM/QE muram") Cc: stable@vger.linux.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [scottwood: Removed some string changes unrelated to bugfix] Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-25 02:38:52 -05:00
Julia Lawall	1fadfe9e19	powerpc/mpic: use of_property_read_bool Use of_property_read_bool to check for the existence of a property. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression e1,e2; statement S2,S1; @@ - if (of_get_property(e1,e2,NULL)) + if (of_property_read_bool(e1,e2)) S1 else S2 // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-25 02:38:51 -05:00
Andrey Smirnov	7120438e5d	powerpc: Convert fsl_rstcr_restart to a reset handler Convert fsl_rstcr_restart into a function to be registered with register_reset_handler(). Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> [scottwood: Converted mvme7100 as well] Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-25 02:38:50 -05:00
Andrey Smirnov	ad24747304	powerpc: Call chained reset handlers during reset Call out to all restart handlers that were added via register_restart_handler() API when restarting the machine. Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-25 00:06:40 -05:00
Andrey Smirnov	d0d738a414	powerpc: Factor out common code in setup-common.c Factor out a small bit of common code in machine_restart(), machine_power_off() and machine_halt(). Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-25 00:06:39 -05:00
Andrey Smirnov	625f3eea40	powerpc/sgy_cts1000: Fix gpio_halt_cb()'s signature Halt callback in struct machdep_calls is declared with __noreturn attribute, so omitting that attribute in gpio_halt_cb()'s signature results in compilation error. Change the signature to address the problem as well as change the code of the function to avoid ever returning from the function. Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-24 23:59:51 -05:00
Andrey Smirnov	49bf9279cd	powerpc/e8248e: Select PHYLIB only if NETDEVICES is enabled Select PHYLIB only if NETDEVICES is enabled and MDIO_BITBANG only if PHYLIB is present to avoid warnings from Kconfig. To prevent undefined references during linking register MDIO driver only if CONFIG_MDIO_BITBANG is enabled. Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-24 23:59:47 -05:00
Andrey Smirnov	93c4ea38a1	powerpc/mpc85xx_mds: Select PHYLIB only if NETDEVICES is enabled PHYLIB depends on NETDEVICES, so to avoid unmet dependencies warning from Kconfig it needs to be selected conditionally. Also add checks if PHYLIB is built-in to avoid undefined references to PHYLIB's symbols. Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-24 23:59:41 -05:00
Christophe Leroy	ddc6cd0d70	powerpc32: Use instruction symbolic names in check_io_access() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2016-09-24 23:51:06 -05:00
Rui Teng	f1a55ce054	powerpc: Clean up tm_abort duplication in hash_utils_64.c The same logic appears twice and should probably be pulled out into a function. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Rui Teng <rui.teng@linux.vnet.ibm.com> [mpe: Rename to tm_flush_hash_page() and move comment into the function] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:23 +10:00
Andrew Donnellan	6060e9ea8d	powerpc/powernv: Fix comment style and spelling Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:23 +10:00
Christophe Leroy	148151a66a	powerpc/32: Remove CLR_TOP32 CLR_TOP32() is defined as blank. Last useful instance of CLR_TOP32() was removed by commit `40ef8cbc6d` ("powerpc: Get 64-bit configs to compile with ARCH=powerpc") in 2005. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:22 +10:00
Christophe Leroy	6b8cb66a6a	powerpc: Fix usage of _PAGE_RO in hugepage On some CPUs like the 8xx, _PAGE_RW hence _PAGE_WRITE is defined as 0 and _PAGE_RO has to be set when a page is not writable. _PAGE_RO is defined by default in pte-common.h, however BOOK3S/64 doesn't include that file so _PAGE_RO has to be defined explicitly in book3s/64/pgtable.h. Fixes: `a7b9f671f2` ("powerpc32: adds handling of _PAGE_RO") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:22 +10:00
Russell Currey	af2e3a009e	powerpc/eeh: Skip finding bus until after failure reporting In eeh_handle_special_event(), eeh_pe_bus_get() is called before calling eeh_report_failure() on every device under a PE. If a PE was missing a bus for some reason, the error would occur before reporting failure, even though eeh_report_failure() doesn't require a bus. Fix this by moving the bus retrieval and error check after the eeh_report_failure() calls. Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:21 +10:00
Russell Currey	e98ddb7716	powerpc/powernv/eeh: Skip finding bus for VF resets When the PE used in pnv_eeh_reset() is that of a VF, pnv_eeh_reset_vf_pe() is used. Unlike the other reset functions called in pnv_eeh_reset(), the VF reset doesn't require a bus, and if a bus was missing the function would error out before resetting the VF PE. To avoid this, reorder the VF reset function to occur before finding and checking the bus. Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:21 +10:00
Russell Currey	04fec21c06	powerpc/eeh: Null check uses of eeh_pe_bus_get eeh_pe_bus_get() can return NULL if a PCI bus isn't found for a given PE. Some callers don't check this, and can cause a null pointer dereference under certain circumstances. Fix this by checking NULL everywhere eeh_pe_bus_get() is called. Fixes: `8a6b1bc70d` ("powerpc/eeh: EEH core to handle special event") Cc: stable@vger.kernel.org # v3.11+ Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:20 +10:00
Nicholas Piggin	a24553dd02	powerpc/pseries: Remove unnecessary syscall trampoline When we originally added the ability to split the exception vectors from the kernel (commit `1f6a93e4c3` ("powerpc: Make it possible to move the interrupt handlers away from the kernel" 2008-09-15)), the LOAD_HANDLER() macro used an addi instruction to compute the offset of the common handler from the kernel base address. Using addi meant the handler had to be within 32K of the kernel base address, due to the addi instruction taking a signed immediate value. That necessitated creating a trampoline for the system call handler, because system_call_common (in entry64.S) is not linked within 32K of the kernel base address. Later in commit `61e2390ede` ("powerpc: Make load_hander handle upto 64k offset" 2012-11-15) we changed LOAD_HANDLER to take a 64K offset, by changing it to use ori. Although system_call_common is not in head_64.S or exceptions-64s.S, it is included in head-y, which causes it to be linked early in the kernel text, so in practice it ends up below 64K. Additionally if it can't be placed below 64K the linker will fail to build with a "relocation truncated to fit" error. So remove the trampoline. Newer toolchains are able to work out that the ori in LOAD_HANDLER only takes a 16 bit offset, and so they generate a 16 bit relocation. Older toolchains (binutils 2.22 at least) are not so smart, so we have to add the @l annotation to tell the assembler to generate a 16 bit relocation. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:20 +10:00
Nicholas Piggin	40e1b1cfb5	powerpc/pseries: Fix HV facility unavailable to use correct handler The 0xf80 hv_facility_unavailable trampoline branches to the 0xf60 handler. This works because they both do the same thing, but it should be fixed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:19 +10:00
Russell Currey	98b665da57	powerpc/powernv/pci: Add PHB register dump debugfs handle On EEH events the kernel will print a dump of relevant registers. If EEH is unavailable (i.e. CONFIG_EEH is disabled, a new platform doesn't have EEH support, etc) this information isn't readily available. Add a new debugfs handler to trigger a PHB register dump, so that this information can be made available on demand. Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:19 +10:00
Benjamin Herrenschmidt	3eabf88579	powerpc/64/kexec: Remove BookE special default_machine_kexec_prepare() The only difference is now the TCE table check which doesn't need to be ifdef'ed out, it will basically do nothing on BookE (it is only useful for ancient IBM machines). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:18 +10:00
Benjamin Herrenschmidt	b970b41ea6	powerpc/64/kexec: Copy image with MMU off when possible Currently we turn the MMU off after copying the image, and we make sure there is no overlap between the hash table and the target pages in that case. That doesn't work for Radix however. In that case, the page tables are scattered and we can't really enforce that the target of the image isn't overlapping one of them. So instead, let's turn the MMU off before copying the image in radix mode. Thankfully, in radix mode, even under a hypervisor, we know we don't have the same kind of RMA limitations that hash mode has. While at it, also turn the MMU off early when using hash in non-LPAR mode, that way we can get rid of the collision check completely. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:18 +10:00
Aneesh Kumar K.V	be34d30059	powerpc/mm: Add radix flush all with IS=3 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:18 +10:00
Benjamin Herrenschmidt	fe036a0605	powerpc/64/kexec: Fix MMU cleanup on radix Just using the hash ops won't work anymore since radix will have NULL in there. Instead create an mmu_cleanup_all() function which will do the right thing based on the MMU mode. For Radix, for now I clear UPRT and the PTCR, effectively switching back to Radix with no partition table setup. Currently set it to NULL on BookE, though it might be a good idea to wipe the TLB there (Scott ?) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:17 +10:00
Benjamin Herrenschmidt	fc48bad531	powerpc/64/kexec: NULL check "clear_all" in kexec_sequence With Radix, it can be NULL even on !BOOKE these days so replace the ifdef with a NULL check which is cleaner anyway. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-23 07:54:05 +10:00
Christoph Hellwig	014b44e7a4	libata: remove unused definitions from <asm/libata-portmap.h> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2016-09-22 11:50:19 -04:00
Stephen Rothwell	f29ca38b6d	ppc: there is no clear_pages to export Fixes: `9445aa1a30` ("ppc: move exports to definitions") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michal Marek <mmarek@suse.com>	2016-09-22 14:51:45 +02:00
Nicholas Piggin	12eb901e01	powerpc/64: whitelist unresolved modversions CRCs These are a symptom of CRC generation failure in generic build code, and not powerpc specific. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Fixes: `9445aa1a30` ("ppc: move exports to definitions") Signed-off-by: Michal Marek <mmarek@suse.com>	2016-09-22 14:46:31 +02:00
Russell Currey	b79331a5eb	powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment Commit `5958d19a14` checks for prefetchable m64 BARs by comparing the addresses instead of using resource flags. This broke SR-IOV as the m64 check in pnv_pci_ioda_fixup_iov_resources() fails. The condition in pnv_pci_window_alignment() also changed to checking only IORESOURCE_MEM_64 instead of both IORESOURCE_MEM_64 and IORESOURCE_PREFETCH. Revert these cases to the previous behaviour, adding a new helper function to do so. This is named pnv_pci_is_m64_flags() to make it clear this function is only looking at resource flags and should not be relied on for non-SRIOV resources. Fixes: `5958d19a14` ("Fix incorrect PE reservation attempt on some 64-bit BARs") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Russell Currey <ruscur@russell.cc> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-21 14:04:13 +10:00
Michael Ellerman	ef24ba7091	powerpc: Remove all usages of NO_IRQ NO_IRQ has been == 0 on powerpc for just over ten years (since commit `0ebfff1491` ("[POWERPC] Add new interrupt mapping core and change platforms to use it")). It's also 0 on most other arches. Although it's fairly harmless, every now and then it causes confusion when a driver is built on powerpc and another arch which doesn't define NO_IRQ. There's at least 6 definitions of NO_IRQ in drivers/, at least some of which are to work around that problem. So we'd like to remove it. This is fairly trivial in the arch code, we just convert: if (irq == NO_IRQ) to if (!irq) if (irq != NO_IRQ) to if (irq) irq = NO_IRQ; to irq = 0; return NO_IRQ; to return 0; And a few other odd cases as well. At least for now we keep the #define NO_IRQ, because there is driver code that uses NO_IRQ and the fixes to remove those will go via other trees. Note we also change some occurrences in PPC sound drivers, drivers/ps3, and drivers/macintosh. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-20 20:57:12 +10:00
Ingo Molnar	b2c16e1efd	Merge branch 'linus' into x86/asm, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-20 08:29:21 +02:00
Andrew Donnellan	90ce35145c	powerpc/pseries: fix memory leak in queue_hotplug_event() error path If we fail to allocate work, we don't end up using hp_errlog_copy. Free it in the error path. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-20 16:17:54 +10:00
Pan Xinhui	11b7e154b1	powerpc/nvram: Fix an incorrect partition merge When we merge two contiguous partitions whose signatures are marked NVRAM_SIG_FREE, we need to update prev's length and checksum, then write it to nvram, not cur's. So let's fix this mistake now. Also use memset instead of strncpy to set the partition's name. It's more readable if we want to fill up with duplicate chars. Fixes: `fa2b4e54d4` ("powerpc/nvram: Improve partition removal") Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-20 16:15:42 +10:00
Pan Xinhui	0d0fecc5b5	powerpc/nvram: Fix a memory leak in err path If kmemdup fails, we need to kfree buff first, then return -ENOMEM. Otherwise there is a memory leak. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-20 16:15:33 +10:00
Nicholas Piggin	49d09bf2a6	powerpc/64s: Optimise MSR handling in exception handling mtmsrd with L=1 only affects MSR_EE and MSR_RI bits, and we always know what state those bits are, so the kernel MSR does not need to be loaded when modifying them. mtmsrd is often in the critical execution path, so avoiding dependency on even L1 load is noticeable. On a POWER8 this saves about 3 cycles from the syscall path, and possibly a few from other exception returns (not measured). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-20 15:56:45 +10:00
Nicholas Piggin	18e3f56b1c	powerpc/64: Optimise syscall entry for virtual, relocatable case The mflr r10 instruction was left over from when the code used LR to branch to system_call_entry from the exception handler. That was changed by commit `6a404806df` ("powerpc: Avoid link stack corruption in MMU on syscall entry path") to use the count register. The value is never used now, so mflr can be removed, and r10 can be used for storage rather than spilling to the SPR scratch register. The scratch register spill causes a long pipeline stall due to the SPR read after write. This change brings getppid syscall cost from 406 to 376 cycles on POWER8. getppid for non-relocatable case is 371 cycles. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-20 14:46:05 +10:00
Aneesh Kumar K.V	d5a1e42cb4	powerpc/mm: Update FORCE_MAX_ZONEORDER range to allow hugetlb w/4K For hugetlb to work with 4K page size, we need MAX_ORDER to be 13 or more. When switching from a 64K page size to 4K linux page size using make oldconfig, we end up with a CONFIG_FORCE_MAX_ZONEORDER value of 9. This results in a 16M hugepage being considered as a gigantic huge page which in turn results in failure to setup hugepages if gigantic hugepage support is not enabled. This also results in kernel crash with 4K radix configuration. We hit the below BUG_ON on radix: kernel BUG at mm/huge_memory.c:364! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=2048 NUMA PowerNV CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc1-00006-gbae9cc6 #1 task: c0000000f1af8000 task.stack: c0000000f1aec000 NIP: c000000000c5fa0c LR: c000000000c5f9d8 CTR: c000000000c5f9a4 REGS: c0000000f1aef920 TRAP: 0700 Not tainted (4.8.0-rc1-00006-gbae9cc6) MSR: 9000000102029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE,TM[E]> CR: 24000844 XER: 00000000 CFAR: c000000000c5f9e0 SOFTE: 1 .... NIP [c000000000c5fa0c] hugepage_init+0x68/0x238 LR [c000000000c5f9d8] hugepage_init+0x34/0x238 Fixes: `a7ee539584` ("powerpc/Kconfig: Update config option based on page size") Cc: stable@vger.kernel.org # v4.7+ Reported-by: Santhosh <santhog4@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-20 14:40:41 +10:00
Nicholas Piggin	e0e0d6b739	powerpc/64: Replay hypervisor maintenance interrupt first The HMI (Hypervisor Maintenance Interrupt) is defined by the architecture to be higher priority than other maskable interrupts, so replay it first, as a best-effort to replay according to hardware priorities. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-20 14:35:34 +10:00
Michael Ellerman	7de3b27bac	powerpc: Ensure .mem(init\|exit).text are within _stext/_etext In our linker script we open code the list of text sections, because we need to include the __ftr_alt sections, which are arch-specific. This means we can't use TEXT_TEXT as defined in vmlinux.lds.h, and so we don't have the MEM_KEEP() logic for memory hotplug sections. If we build the kernel with the gold linker, and with CONFIG_MEMORY_HOTPLUG=y, we see that functions marked __meminit can end up outside of the _stext/_etext range, and also outside of _sinittext/_einittext, eg: c000000000000000 T _stext c0000000009e0000 A _etext c0000000009e3f18 T hash__vmemmap_create_mapping c000000000ca0000 T _sinittext c000000000d00844 T _einittext This causes them to not be recognised as text by is_kernel_text(), and prevents them being patched by jump_label (and presumably ftrace/kprobes etc.). Fix it by adding MEM_KEEP() directives, mirroring what TEXT_TEXT does. This isn't a problem when CONFIG_MEMORY_HOTPLUG=n, because we use the standard INIT_TEXT_SECTION() and EXIT_TEXT macros from vmlinux.lds.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-19 10:53:56 +10:00
Michael Ellerman	bea2dccccf	powerpc: Don't change the section in _GLOBAL() Currently the _GLOBAL() macro unilaterally sets the assembler section to ".text" at the start of the macro. This is rude as the caller may be using a different section. So let the caller decide which section to emit the code into. On big endian we do need to switch to the ".opd" section to emit the OPD, but do that with pushsection/popsection, thereby leaving the original section intact. I verified that the order of all entries in System.map is unchanged after this patch. The actual addresses shift around slightly so you can't just diff the System.map. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-19 10:53:55 +10:00
Nicholas Piggin	6f698df10c	powerpc/kernel: Use kprobe blacklist for asm functions Rather than forcing the whole function into the ".kprobes.text" section, just add the symbol's address to the kprobe blacklist. This also lets us drop the three versions of the _KPROBE macro, in exchange for just one version of _ASM_NOKPROBE_SYMBOL - which is a good cleanup. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-19 10:53:55 +10:00
Nicholas Piggin	03465f899b	powerpc: Use kprobe blacklist for exception handlers Currently we mark the C implementations of some exception handlers as __kprobes. This has the effect of putting them in the ".kprobes.text" section, which separates them from the rest of the text. Instead we can use the blacklist macros to add the symbols to a blacklist which kprobes will check. This allows the linker to move exception handler functions close to callers and avoids trampolines in larger kernels. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Reword change log a bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-19 10:53:54 +10:00
Linus Torvalds	baf009f927	powerpc fixes for 4.8 #6 Fixes for code merged this cycle: - Fix restore of SPRs upon wake up from hypervisor state loss from Gautham R. Shenoy - Fix the state of root PE from Gavin Shan - Detach from PE on releasing PCI device from Gavin Shan - Fix size of NUM_CPU_FTR_KEYS on 32-bit - Fix missed TCE invalidations that should fallback to OPAL Merge tag 'powerpc-4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fixes for code merged this cycle: - Fix restore of SPRs upon wake up from hypervisor state loss from Gautham R Shenoy - Fix the state of root PE from Gavin Shan - Detach from PE on releasing PCI device from Gavin Shan - Fix size of NUM_CPU_FTR_KEYS on 32-bit - Fix missed TCE invalidations that should fallback to OPAL" * tag 'powerpc-4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv/pci: Fix missed TCE invalidations that should fallback to OPAL powerpc/powernv: Detach from PE on releasing PCI device powerpc/powernv: Fix the state of root PE powerpc/kernel: Fix size of NUM_CPU_FTR_KEYS on 32-bit powerpc/powernv: Fix restore of SPRs upon wake up from hypervisor state loss	2016-09-17 12:52:01 -07:00
Luiz Capitulino	235539b48a	kvm: add stubs for arch specific debugfs support Two stubs are added: o kvm_arch_has_vcpu_debugfs(): must return true if the arch supports creating debugfs entries in the vcpu debugfs dir (which will be implemented by the next commit) o kvm_arch_create_vcpu_debugfs(): code that creates debugfs entries in the vcpu debugfs dir For x86, this commit introduces a new file to avoid growing arch/x86/kvm/x86.c even more. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-16 16:57:47 +02:00
Michael Ellerman	ed7d9a1d7d	powerpc/powernv/pci: Fix missed TCE invalidations that should fallback to OPAL In commit `f0228c4130` ("powerpc/powernv/pci: Fallback to OPAL for TCE invalidations"), we added logic to fallback to OPAL for doing TCE invalidations if we can't do it in Linux. Ben sent a v2 of the patch, containing these additional call sites, but I had already applied v1 and didn't notice. So fix them now. Fixes: `f0228c4130` ("powerpc/powernv/pci: Fallback to OPAL for TCE invalidations") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-15 17:05:11 +10:00
Ingo Molnar	d4b80afbba	Merge branch 'linus' into x86/asm, to pick up recent fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-09-15 08:24:53 +02:00
Gavin Shan	29bf282dec	powerpc/powernv: Detach from PE on releasing PCI device The PCI hotplug can be part of EEH error recovery. The @pdn and the device's PE number aren't removed and added afterwards. The PE number in @pdn should be set to an invalid one. Otherwise, the PE's device count is decreased on removing devices while failing to be increased on adding devices. This leads to an unbalanced PE device count and breaks the normal PCI hotplug path. Fixes: `c5f7700bbd` ("powerpc/powernv: Dynamically release PE") Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-15 13:53:19 +10:00
Linus Torvalds	77e5bdf9f7	Merge branch 'uaccess-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess fixes from Al Viro: "Fixes for broken uaccess primitives - mostly lack of proper zeroing in copy_from_user()/get_user()/__get_user(), but for several architectures there's more (broken clear_user() on frv and strncpy_from_user() on hexagon)" * 'uaccess-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) avr32: fix copy_from_user() microblaze: fix __get_user() microblaze: fix copy_from_user() m32r: fix __get_user() blackfin: fix copy_from_user() sparc32: fix copy_from_user() sh: fix copy_from_user() sh64: failing __get_user() should zero score: fix copy_from_user() and friends score: fix __get_user/get_user s390: get_user() should zero on failure ppc32: fix copy_from_user() parisc: fix copy_from_user() openrisc: fix copy_from_user() nios2: fix __get_user() nios2: copy_from_user() should zero the tail of destination mn10300: copy_from_user() should zero on access_ok() failure... mn10300: failing __get_user() and get_user() should zero mips: copy_from_user() must zero the destination on access_ok() failure ARC: uaccess: get_user to zero out dest in cause of fault ...	2016-09-14 09:35:05 -07:00
Gavin Shan	6eaed1665f	powerpc/powernv: Fix the state of root PE The PE for root bus (root PE) can be removed because of PCI hot remove in EEH recovery path for fenced PHB error. We need to update @phb->root_pe_populated accordingly so that the root PE can be populated again in the forthcoming PCI hot add path. Also, the PE shouldn't be destroyed as it's a global and reserved resource. Fixes: `c5f7700bbd` ("powerpc/powernv: Dynamically release PE") Reported-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-14 11:40:09 +10:00
Al Viro	224264657b	ppc32: fix copy_from_user() should clear on access_ok() failures. Also remove the useless range truncation logic. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-09-13 17:50:02 -04:00
Simon Guo	e1c0d66fcb	powerpc: Set used_(vsr\|vr\|spe) in sigreturn path when MSR bits are active Normally, when MSR[VSX/VR/SPE] bits == 1, the used_vsr/used_vr/used_spe bits have already been set. However when loading a signal frame from user space we need to explicitly set used_vsr/used_vr/used_spe to make them consistent with the MSR bits from the signal frame. For example, the CRIU application, which uses sigreturn to restore a checkpointed process, leads to the case where the MSR[VSX] bit is active in the signal frame but the used_vsr bit is not set in the kernel (the same applies to VR/SPE). This patch fixes this by always setting the used_* bits when the related MSR bits are active in the signal frame and we are doing sigreturn. Based on a proposal by Benh. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> [mpe: Massage change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:12 +10:00
Simon Guo	261831160d	powerpc/ptrace: Fix cppcheck issue in gpr32_set_common/gpr32_get_common() The ckpt_regs usage in gpr32_set_common/gpr32_get_common() will lead to the following cppcheck error in the ifndef CONFIG_PPC_TRANSACTIONAL_MEM case: [arch/powerpc/kernel/ptrace.c:2062]: (error) Uninitialized variable: ckpt_regs [arch/powerpc/kernel/ptrace.c:2130]: (error) Uninitialized variable: ckpt_regs The problem is due to gpr32_set_common() using the ckpt_regs variable, which only makes sense under #ifdef CONFIG_PPC_TRANSACTIONAL_MEM. This patch fixes this issue by passing in the "regs" parameter instead. Reported-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Simon Guo <wei.guo.simon@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:12 +10:00
Colin Ian King	3daf3c2069	powerpc/32: Add missing \n and switch to pr_warn() The message is missing a \n, add it. Switch to pr_warn(), it's shorter and less ugly. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:11 +10:00
Aneesh Kumar K.V	ad410674f5	powerpc/mm: Update the HID bit when switching from radix to hash Power9 DD1 requires to update the hid0 register when switching from hash to radix. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:10 +10:00
Aneesh Kumar K.V	c6d1a767b9	powerpc/mm/radix: Use different pte update sequence for different POWER9 revs POWER9 DD1 requires pte to be marked invalid (V=0) before updating it with the new value. This makes this distinction for the different revisions. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:10 +10:00
Aneesh Kumar K.V	694c495192	powerpc/mm/radix: Use different RTS encoding for different POWER9 revs POWER9 DD1 uses RTS - 28 for the RTS value but other revisions use RTS - 31. This makes this distinction for the different revisions. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:09 +10:00
Aneesh Kumar K.V	7dccfbc325	powerpc/book3s: Add a cpu table entry for different POWER9 revs Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:09 +10:00
Darren Stevens	687e16bc2f	powerpc/pasemi: Fix device_type of Nemo SB600 node. The of_node for the SB600 (io-bridge) has its device_type set to 'io-bridge'. Set it to 'isa' so that it can be found by isa_bridge_find_early() instead of using patches in the kernel. Signed-off-by: Darren Stevens <darren@stevens-zone.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:08 +10:00
Darren Stevens	5024678765	powerpc/pasemi: Fix Nemo SB600 i8259 interrupts. The device tree on the Nemo passes all of the i8259 interrupts with numbers between 212 and 222, and points their interrupt-parent property to the pasemi-opic, requiring custom patches to the kernel. Fix the values so that they can be controlled by the generic ppc i8259 code. Signed-off-by: Darren Stevens <darren@stevens-zone.net> [mpe: Rework deeply nested if and boundary checks] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:08 +10:00
Darren Stevens	88c13e2f4f	powerpc/pasemi: Add Nemo motherboard config option. Add config option for the Nemo motherboard used in the Amigaone X1000. This is a custom PASemi board with an AMD SB600 southbridge, and needs some patches to its device tree. This option will be used to build these into the kernel. Signed-off-by: Darren Stevens <darren@stevens-zone.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:07 +10:00
Colin Ian King	6f95d4b2f6	powerpc/ps3: fix spelling mistake in function name Trivial fix to spelling mistake in dev_warn message and remove extraneous trailing whitespace at end of the message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:06 +10:00
Michael Ellerman	57073e2781	powerpc/Makefile: Construct the UTS_MACHINE value more concisely Use the standard Kbuild trick of foo-y to make the construction of UTS_MACHINE less verbose. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:06 +10:00
Michael Ellerman	68201fbbb0	powerpc/Makefile: Drop CONFIG_WORD_SIZE for BITS Commit `2578bfae84` ("[POWERPC] Create and use CONFIG_WORD_SIZE") added CONFIG_WORD_SIZE, and suggests that other arches were going to do likewise. But that never happened, powerpc is the only architecture which uses it. So switch to using a simple make variable, BITS, like x86, sh, sparc and tile. It is also easier to spell and simpler, avoiding any confusion about whether it's defined due to ordering of make vs kconfig. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:06 +10:00
Michael Ellerman	6abe248e16	powerpc/boot: Use $(Q) to quiet build rules not @ Some of the rules in the boot Makefile use @ to hide the command, this means "make V=1" doesn't show them, which is confusing. So use the Kbuild standard $(Q) which means KBUILD_VERBOSE=1 or V=1 will work as expected. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:05 +10:00
Michael Ellerman	2ca07d7c4f	powerpc/vdso64: Drop vdso64as We can just use the standard .S -> .o rule, cmd_as_o_S. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:05 +10:00
Michael Ellerman	d312603a44	powerpc/Makefile: CROSS32AS is unused, remove it In fact it makes no sense at all to have this defined on little endian builds. Since we disabled the 32-bit VDSO on little endian, we don't build any 32-bit code when building a little endian kernel. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:04 +10:00
Michael Ellerman	d8d42b0511	powerpc/64: Do load of PACAKBASE in LOAD_HANDLER The LOAD_HANDLER macro requires that you have previously loaded "reg" with PACAKBASE. Although that gives callers flexibility to get PACAKBASE in some interesting way, none of the callers actually do that. So fold the load of PACAKBASE into the macro, making it simpler for callers to use correctly. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nick Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:04 +10:00
Michael Ellerman	27510235dd	powerpc/64: Correct comment on LOAD_HANDLER() The comment for LOAD_HANDLER() was wrong. The part about kdump has not been true since `1f6a93e4c3` ("powerpc: Make it possible to move the interrupt handlers away from the kernel"). Describe how it currently works, and combine the two separate comments into one. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nick Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:03 +10:00
Paul Mackerras	f0f558b131	powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address Currently, if userspace or the kernel accesses a completely bogus address, for example with any of bits 46-59 set, we first take an SLB miss interrupt, install a corresponding SLB entry with VSID 0, retry the instruction, then take a DSI/ISI interrupt because there is no HPT entry mapping the address. However, by the time of the second interrupt, the Come-From Address Register (CFAR) has been overwritten by the rfid instruction at the end of the SLB miss interrupt handler. Since bogus accesses can often be caused by a function return after the stack has been overwritten, the CFAR value would be very useful as it could indicate which function it was whose return had led to the bogus address. This patch adds code to create a full exception frame in the SLB miss handler in the case of a bogus address, rather than inserting an SLB entry with a zero VSID field. Then we call a new slb_miss_bad_addr() function in C code, which delivers a signal for a user access or creates an oops for a kernel access. In the latter case the oops message will show the CFAR value at the time of the access. In the case of the radix MMU, a segment miss interrupt indicates an access outside the ranges mapped by the page tables. Previously this was handled by the code for an unrecoverable SLB miss (one with MSR[RI] = 0), which is not really correct. With this patch, we now handle these interrupts with slb_miss_bad_addr(), which is much more consistent. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:03 +10:00
Michael Ellerman	b42d9023a3	powerpc/xmon: Don't use ld on 32-bit In commit `31cdd0c39c` ("powerpc/xmon: Fix SPR read/write commands and add command to dump SPRs") I added two uses of the "ld" instruction in spr_access.S. "ld" is a 64-bit instruction, so shouldn't be used on 32-bit CPUs. Replace it with PPC_LL which is a macro that gives us either "ld" or "lwz" depending on whether we're 64 or 32-bit. Fixes: `31cdd0c39c` ("powerpc/xmon: Fix SPR read/write commands and add command to dump SPRs") Cc: stable@vger.kernel.org # v4.7+ Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:37:02 +10:00
Daniel Axtens	0545d5436a	powerpc/sparse: Add more assembler prototypes Another set of things that are only called from assembler and so need prototypes to keep sparse happy. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:36:58 +10:00
Daniel Axtens	d8bced27be	powerpc/fadump: Set core e_flags using kernel's ELF ABI version Firmware Assisted Dump is a facility to dump kernel core with assistance from firmware. As part of this process the kernel ELF ABI version is stored in the core file. Currently fadump.h defines this to 0 if it is not already defined. This clashes with a define in elf.h which sets it based on the current task - not based on the kernel's ELF ABI version. Use the compiler-provided #define _CALL_ELF which tells us the ELF ABI version of the kernel to set e_flags, this matches what binutils does. Remove the definition in fadump.h, which becomes unused. Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:36:01 +10:00
Daniel Axtens	7c98bd7208	powerpc/sparse: Make a bunch of things static Squash a bunch of sparse warnings by making things static. Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-13 17:35:47 +10:00
Markus Elfring	aad9e5ba24	KVM: PPC: e500: Rename jump labels in kvmppc_e500_tlb_init() Adjust jump labels according to the current Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-13 14:32:47 +10:00
Michael Ellerman	ffed15d3ce	powerpc/kernel: Fix size of NUM_CPU_FTR_KEYS on 32-bit The number of CPU feature keys is meant to map 1:1 to the number of CPU feature flags defined in cputable.h, and the latter must fit in an unsigned long. In commit `4db7327194` ("powerpc: Add option to use jump label for cpu_has_feature()"), I incorrectly defined NUM_CPU_FTR_KEYS to 64. There should be no real adverse consequences of this bug, other than us allocating too many keys. Fix it by using BITS_PER_LONG. Fixes: `4db7327194` ("powerpc: Add option to use jump label for cpu_has_feature()") Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-12 12:48:28 +10:00
Gautham R. Shenoy	bd00a240dc	powerpc/powernv: Fix restore of SPRs upon wake up from hypervisor state loss pnv_wakeup_tb_loss() currently expects cr4 to be "eq" if the CPU is waking up from a complete hypervisor state loss. Hence, it currently restores the SPR contents only if cr4 is "eq". However, after commit `bcef83a00d` ("powerpc/powernv: Add platform support for stop instruction"), on ISA v3.0 CPUs, the function pnv_restore_hyp_resource() sets cr4 to contain the result of the comparison between the state the CPU has woken up from and the first deep stop state before calling pnv_wakeup_tb_loss(). Thus if the CPU woke up from a state that is deeper than the first deep stop state, cr4 will have "gt" set and hence, pnv_wakeup_tb_loss() will fail to restore the SPRs on waking up from such a state. Fix the code in pnv_wakeup_tb_loss() to restore the SPR states when cr4 is "eq" or "gt". Fixes: `bcef83a00d` ("powerpc/powernv: Add platform support for stop instruction") Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Shreyas B. Prabhu <shreyasbp@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-12 12:45:50 +10:00
Markus Elfring	90235dc19e	KVM: PPC: e500: Use kmalloc_array() in kvmppc_e500_tlb_init() * A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kmalloc_array". * Replace the specification of a data structure by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-12 10:13:01 +10:00
Markus Elfring	b0ac477bc4	KVM: PPC: e500: Replace kzalloc() calls by kcalloc() in two functions * A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kcalloc". Suggested-by: Paolo Bonzini <pbonzini@redhat.com> This issue was detected also by using the Coccinelle software. * Replace the specification of data structures by pointer dereferences to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-12 10:12:56 +10:00
Markus Elfring	cfb60813fb	KVM: PPC: e500: Delete an unnecessary initialisation in kvm_vcpu_ioctl_config_tlb() The local variable "g2h_bitmap" will be set to an appropriate value a bit later. Thus omit the explicit initialisation at the beginning. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-12 10:12:51 +10:00
Markus Elfring	46d4e74792	KVM: PPC: e500: Less function calls in kvm_vcpu_ioctl_config_tlb() after error detection The kfree() function was called in two cases by the kvm_vcpu_ioctl_config_tlb() function during error handling even if the passed data structure element contained a null pointer. * Split a condition check for memory allocation failures. * Adjust jump targets according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-12 10:12:46 +10:00
Markus Elfring	f3c0ce86a8	KVM: PPC: e500: Use kmalloc_array() in kvm_vcpu_ioctl_config_tlb() * A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kmalloc_array". This issue was detected by using the Coccinelle software. * Replace the specification of a data type by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-12 10:12:40 +10:00
Suresh Warrier	65e7026a6c	KVM: PPC: Book3S HV: Counters for passthrough IRQ stats Add VCPU stat counters to track affinity for passthrough interrupts. pthru_all: Counts all passthrough interrupts whose IRQ mappings are in the kvmppc_passthru_irq_map structure. pthru_host: Counts all cached passthrough interrupts that were injected from the host through kvm_set_irq (i.e. not handled in real mode). pthru_bad_aff: Counts how many cached passthrough interrupts have bad affinity (receiving CPU is not running VCPU that is the target of the virtual interrupt in the guest). Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-12 10:12:34 +10:00
Paul Mackerras	5d375199ea	KVM: PPC: Book3S HV: Set server for passed-through interrupts When a guest has a PCI pass-through device with an interrupt, it will direct the interrupt to a particular guest VCPU. In fact the physical interrupt might arrive on any CPU, and then get delivered to the target VCPU in the emulated XICS (guest interrupt controller), and eventually delivered to the target VCPU. Now that we have code to handle device interrupts in real mode without exiting to the host kernel, there is an advantage to having the device interrupt arrive on the same (sub)core as the target VCPU is running on. In this situation, the interrupt can be delivered to the target VCPU without any exit to the host kernel (using a hypervisor doorbell interrupt between threads if necessary). This patch aims to get passed-through device interrupts arriving on the correct core by setting the interrupt server in the real hardware XICS for the interrupt to the first thread in the (sub)core where its target VCPU is running. We do this in the real-mode H_EOI code because the H_EOI handler already needs to look at the emulated ICS state for the interrupt (whereas the H_XIRR handler doesn't), and we know we are running in the target VCPU context at that point. We set the server CPU in hardware using an OPAL call, regardless of what the IRQ affinity mask for the interrupt says, and without updating the affinity mask. This amounts to saying that when an interrupt is passed through to a guest, as a matter of policy we allow the guest's affinity for the interrupt to override the host's. This is inspired by an earlier patch from Suresh Warrier, although none of this code came from that earlier patch. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-12 10:12:28 +10:00
Suresh Warrier	366274f59c	KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode When a passthrough IRQ is handled completely within KVM real mode code, it has to also update the IRQ stats since this does not go through the generic IRQ handling code. However, the per CPU kstat_irqs field is an allocated (not static) field and so cannot be directly accessed in real mode safely. The function this_cpu_inc_rm() is introduced to safely increment per CPU fields (currently coded for unsigned integers only) that are allocated and could thus be vmalloced also. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-12 10:12:23 +10:00
Suresh Warrier	644abbb254	KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass Add a module parameter kvm_irq_bypass for kvm_hv.ko to disable IRQ bypass for passthrough interrupts. The default value of this tunable is 1, that is, the feature is enabled. Since the tunable is used by built-in kernel code, we use the module_param_cb macro to achieve this. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-12 10:12:18 +10:00
Suresh Warrier	af893c7dc9	KVM: PPC: Book3S HV: Dump irqmap in debugfs Dump the passthrough irqmap structure associated with a guest as part of /sys/kernel/debug/powerpc/kvm-xics-*. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-12 10:12:13 +10:00
Suresh Warrier	f7af5209b8	KVM: PPC: Book3S HV: Complete passthrough interrupt in host In existing real mode ICP code, when updating the virtual ICP state, if there is a required action that cannot be completely handled in real mode, for instance when a VCPU needs to be woken up, flags are set in the ICP to indicate the required action. This is checked when returning from hypercalls to decide whether the call needs to switch back to the host where the action can be performed in virtual mode. Note that if h_ipi_redirect is enabled, real mode code will first try to message a free host CPU to complete this job instead of returning to the host to do it ourselves. Currently, the real mode PCI passthrough interrupt handling code checks if any of these flags are set and simply returns to the host. This is not good enough as the trap value (0x500) is treated as an external interrupt by the host code. It is only when the trap value is a hypercall that the host code searches for and acts on unfinished work by calling kvmppc_xics_rm_complete. This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD which is returned by KVM if there is unfinished business to be completed in host virtual mode after handling a PCI passthrough interrupt. The host checks for this special interrupt condition and calls into kvmppc_xics_rm_complete, which is made an exported function for this reason. [paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-12 10:12:07 +10:00
Suresh Warrier	e3c13e56a4	KVM: PPC: Book3S HV: Handle passthrough interrupts in guest Currently, KVM switches back to the host to handle any external interrupt (when the interrupt is received while running in the guest). This patch updates real-mode KVM to check if an interrupt is generated by a passthrough adapter that is owned by this guest. If so, the real mode KVM will directly inject the corresponding virtual interrupt to the guest VCPU's ICS and also EOI the interrupt in hardware. In short, the interrupt is handled entirely in real mode in the guest context without switching back to the host. In some rare cases, the interrupt cannot be completely handled in real mode, for instance, a VCPU that is sleeping needs to be woken up. In this case, KVM simply switches back to the host with trap reason set to 0x500. This works, but it is clearly not very efficient. A following patch will distinguish this case and handle it correctly in the host. Note that we can use the existing check_too_hard() routine even though we are not in a hypercall to determine if there is unfinished business that needs to be completed in host virtual mode. The patch assumes that the mapping between hardware interrupt IRQ and virtual IRQ to be injected to the guest already exists for the PCI passthrough interrupts that need to be handled in real mode. If the mapping does not exist, KVM falls back to the default existing behavior. The KVM real mode code reads mappings from the mapped array in the passthrough IRQ map without taking any lock. We carefully order the loads and stores of the fields in the kvmppc_irq_map data structure using memory barriers to avoid an inconsistent mapping being seen by the reader. Thus, although it is possible to miss a map entry, it is not possible to read a stale value. [paulus@ozlabs.org - get irq_chip from irq_map rather than pimap, pulled out powernv eoi change into a separate patch, made kvmppc_read_intr get the vcpu from the paca rather than being passed in, rewrote the logic at the end of kvmppc_read_intr to avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S since we were always restoring SRR0/1 anyway, get rid of the cached array (just use the mapped array), removed the kick_all_cpus_sync() call, clear saved_xirr PACA field when we handle the interrupt in real mode, fix compilation with CONFIG_KVM_XICS=n.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-12 10:11:00 +10:00
Daniel Axtens	bc42f1d9f5	powerpc/cell: Drop unused iic_get_irq_host() Sparse checking revealed that it is no longer used. The last usage was removed in commit `2e19458312` ("[POWERPC] Cell interrupt rework") in 2006. Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-10 18:46:45 +10:00
Linus Torvalds	2771fc8ed6	Merge tag 'powerpc-4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fixes marked for stable: - Don't alias user region to other regions below PAGE_OFFSET from Paul Mackerras - Fix again csum_partial_copy_generic() on 32-bit from Christophe Leroy - Fix corrupted PE allocation bitmap on releasing PE from Gavin Shan Fixes for code merged this cycle: - Fix crash on releasing compound PE from Gavin Shan - Fix processor numbers in OPAL ICP from Benjamin Herrenschmidt - Fix little endian build with CONFIG_KEXEC=n from Thiago Jung Bauermann" * tag 'powerpc-4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET powerpc/32: Fix again csum_partial_copy_generic() powerpc/powernv: Fix corrupted PE allocation bitmap on releasing PE powerpc/powernv: Fix crash on releasing compound PE powerpc/xics/opal: Fix processor numbers in OPAL ICP powerpc/pseries: Fix little endian build with CONFIG_KEXEC=n	2016-09-09 08:43:42 -07:00
Suresh Warrier	c57875f5f9	KVM: PPC: Book3S HV: Enable IRQ bypass Add the irq_bypass_add_producer and irq_bypass_del_producer functions. These functions get called whenever a GSI is being defined for a guest. They create/remove the mapping between host real IRQ numbers and the guest GSI. Add the following helper functions to manage the passthrough IRQ map. kvmppc_set_passthru_irq() Creates a mapping in the passthrough IRQ map that maps a host IRQ to a guest GSI. It allocates the structure (one per guest VM) the first time it is called. kvmppc_clr_passthru_irq() Removes the passthrough IRQ map entry given a guest GSI. The passthrough IRQ map structure is not freed even when the number of mapped entries goes to zero. It is only freed when the VM is destroyed. [paulus@ozlabs.org - modified to use is_pnv_opal_msi() rather than requiring all passed-through interrupts to use the same irq_chip; changed deletion so it zeroes out the r_hwirq field rather than copying the last entry down and decrementing the number of entries.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-09 16:26:51 +10:00
Suresh Warrier	8daaafc88b	KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap This patch introduces an IRQ mapping structure, the kvmppc_passthru_irqmap structure that is to be used to map the real hardware IRQ in the host with the virtual hardware IRQ (gsi) that is injected into a guest by KVM for passthrough adapters. Currently, we assume a separate IRQ mapping structure for each guest. Each kvmppc_passthru_irqmap has a mapping array, containing all defined real<->virtual IRQs. [paulus@ozlabs.org - removed irq_chip field from struct kvmppc_passthru_irqmap; changed parameter for kvmppc_get_passthru_irqmap from struct kvm_vcpu * to struct kvm *, removed small cached array.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-09 16:26:19 +10:00
Suresh Warrier	9576730d0e	KVM: PPC: select IRQ_BYPASS_MANAGER Select IRQ_BYPASS_MANAGER for PPC when CONFIG_KVM is set. Add the PPC producer functions for add and del producer. [paulus@ozlabs.org - Moved new functions from book3s.c to powerpc.c so booke compiles; added kvm_arch_has_irq_bypass implementation.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-09 16:26:19 +10:00
Suresh Warrier	37f55d30df	KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function Modify kvmppc_read_intr to make it a C function. Because it is called from kvmppc_check_wake_reason, any of the assembler code that calls either kvmppc_read_intr or kvmppc_check_wake_reason now has to assume that the volatile registers might have been modified. This also adds in the optimization of clearing saved_xirr in the case where we completely handle and EOI an IPI. Without this, the next device interrupt will require two trips through the host interrupt handling code. [paulus@ozlabs.org - made kvmppc_check_wake_reason create a stack frame when it is calling kvmppc_read_intr, which means we can set r12 to the trap number (0x500) after the call to kvmppc_read_intr, instead of using r31. Also moved the deliver_guest_interrupt label so as to restore XER and CTR, plus other minor tweaks.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-09 16:25:25 +10:00
Paul Mackerras	99212c864e	Merge branch 'kvm-ppc-infrastructure' into kvm-ppc-next This merges the topic branch 'kvm-ppc-infrastructure' into kvm-ppc-next so that I can then apply further patches that need the changes in the kvm-ppc-infrastructure branch. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-09 16:24:23 +10:00
Paolo Bonzini	3f25777499	powerpc: move hmi.c to arch/powerpc/kvm/ hmi.c functions are unused unless sibling_subcore_state is nonzero, and that in turn happens only if KVM is in use. So move the code to arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE rather than CONFIG_PPC_BOOK3S_64. The sibling_subcore_state is also included in struct paca_struct only if KVM is supported by the kernel. Cc: Daniel Axtens <dja@axtens.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: kvm-ppc@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-09 16:18:07 +10:00
Suresh Warrier	4ee11c1a9f	powerpc/powernv: Provide facilities for EOI, usable from real mode This adds a new function pnv_opal_pci_msi_eoi() which does the part of end-of-interrupt (EOI) handling of an MSI which involves doing an OPAL call. This function can be called in real mode. This doesn't just export pnv_ioda2_msi_eoi() because that does a call to icp_native_eoi(), which does not work in real mode. This also adds a function, is_pnv_opal_msi(), which KVM can call to check whether an interrupt is one for which we should be calling pnv_opal_pci_msi_eoi() when we need to do an EOI. [paulus@ozlabs.org - split out the addition of pnv_opal_pci_msi_eoi() from Suresh's patch "KVM: PPC: Book3S HV: Handle passthrough interrupts in guest"; added is_pnv_opal_msi(); wrote description.] Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-09 16:17:59 +10:00
Suresh Warrier	07b1fdf5bd	powerpc: Add simple cache inhibited MMIO accessors Add simple cache inhibited accessors for memory mapped I/O. Unlike the accessors built from the DEF_MMIO_* macros, these don't include any hardware memory barriers; callers need to manage memory barriers on their own. These can only be called in hypervisor real mode. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> [paulus@ozlabs.org - added line to comment] Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-09 16:17:59 +10:00
Paul Mackerras	0eeede0c63	powerpc/mm: Speed up computation of base and actual page size for a HPTE This replaces a 2-D search through an array with a simple 8-bit table lookup for determining the actual and/or base page size for a HPT entry. The encoding in the second doubleword of the HPTE is designed to encode the actual and base page sizes without using any more bits than would be needed for a 4k page number, by using between 1 and 8 low-order bits of the RPN (real page number) field to encode the page sizes. A single "large page" bit in the first doubleword indicates that these low-order bits are to be interpreted like this. We can determine the page sizes by using the low-order 8 bits of the RPN to look up a 256-entry table. For actual page sizes less than 1MB, some of the upper bits of these 8 bits are going to be real address bits, but we can cope with that by replicating the entries for those smaller page sizes. While we're at it, let's move the hpte_page_size() and hpte_base_page_size() functions from a KVM-specific header to a header for 64-bit HPT systems, since this computation doesn't have anything specifically to do with KVM. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-09 16:14:48 +10:00
Paul Mackerras	f077aaf075	powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET In commit `c60ac5693c` ("powerpc: Update kernel VSID range", 2013-03-13) we lost a check on the region number (the top four bits of the effective address) for addresses below PAGE_OFFSET. That commit replaced a check that the top 18 bits were all zero with a check that bits 46 - 59 were zero (performed for all addresses, not just user addresses). This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx and we will insert a valid SLB entry for it. The VSID used will be the same as if the top 4 bits were 0, but the page size will be some random value obtained by indexing beyond the end of the mm_ctx_high_slices_psize array in the paca. If that page size is the same as would be used for region 0, then userspace just has an alias of the region 0 space. If the page size is different, then no HPTE will be found for the access, and the process will get a SIGSEGV (since hash_page_mm() will refuse to create a HPTE for the bogus address). The access beyond the end of the mm_ctx_high_slices_psize can be at most 5.5MB past the array, and so will be in RAM somewhere. Since the access is a load performed in real mode, it won't fault or crash the kernel. At most this bug could perhaps leak a little bit of information about blocks of 32 bytes of memory located at offsets of i * 512kB past the paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11. Fixes: `c60ac5693c` ("powerpc: Update kernel VSID range") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-08 13:15:33 +10:00
Christophe Leroy	8540571e01	powerpc/32: Fix again csum_partial_copy_generic() Commit `7aef413656` ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()") introduced a bug when destination address is odd and len is lower than cacheline size. In that case the resulting csum value doesn't have to be rotated one byte because the cache-aligned copy part is skipped so no alignment is performed. Fixes: `7aef413656` ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()") Cc: stable@vger.kernel.org # v4.6+ Reported-by: Alessio Igor Bogani <alessio.bogani@elettra.eu> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Tested-by: Alessio Igor Bogani <alessio.bogani@elettra.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-08 13:15:02 +10:00
Gavin Shan	caa58f8088	powerpc/powernv: Fix corrupted PE allocation bitmap on releasing PE In pnv_ioda_free_pe(), the PE object (including the associated PE number) is cleared before resetting the corresponding bit in the PE allocation bitmap. It means PE#0 is always released to the bitmap wrongly. This fixes the above issue by caching the PE number before the PE object is cleared. Fixes: `1e9167726c` ("powerpc/powernv: Use PE instead of number during setup and release") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-09-08 13:12:52 +10:00
Suraj Jitindar Singh	2a27f514a4	KVM: PPC: Implement existing and add new halt polling vcpu stats vcpu stats are used to collect information about a vcpu which can be viewed in the debugfs. For example halt_attempted_poll and halt_successful_poll are used to keep track of the number of times the vcpu attempts to and successfully polls. These stats are currently not used on powerpc. Implement incrementation of the halt_attempted_poll and halt_successful_poll vcpu stats for powerpc. Since these stats are summed over all the vcpus for all running guests it doesn't matter which vcpu they are attributed to, thus we choose the current runner vcpu of the vcore. Also add new vcpu stats: halt_poll_success_ns, halt_poll_fail_ns and halt_wait_ns to be used to accumulate the total time spent polling successfully, polling unsuccessfully and waiting respectively, and halt_successful_wait to accumulate the number of times the vcpu waits. Given that halt_poll_success_ns, halt_poll_fail_ns and halt_wait_ns are expressed in nanoseconds it is necessary to represent these as 64-bit quantities, otherwise they would overflow after only about 4 seconds. Given that the total time spent either polling or waiting will be known and the number of times that each was done, it will be possible to determine the average poll and wait times. This will give the ability to tune the kvm module parameters based on the calculated average wait and poll times. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-08 12:25:37 +10:00
Suraj Jitindar Singh	8a7e75d47b	KVM: Add provisioning for ulong vm stats and u64 vcpu stats vms and vcpus have statistics associated with them which can be viewed within the debugfs. Currently it is assumed within the vcpu_stat_get() and vm_stat_get() functions that all of these statistics are represented as u32s, however the next patch adds some u64 vcpu statistics. Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly. Since vcpu statistics are per vcpu, they will only be updated by a single vcpu at a time so this shouldn't present a problem on 32-bit machines which can't atomically increment 64-bit numbers. However vm statistics could potentially be updated by multiple vcpus from that vm at a time. To avoid the overhead of atomics make all vm statistics ulong such that they are 64-bit on 64-bit systems where they can be atomically incremented and are 32-bit on 32-bit systems which may not be able to atomically increment 64-bit numbers. Modify vm_stat_get() to expect ulongs. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Matlack <dmatlack@google.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-08 12:25:37 +10:00
Suraj Jitindar Singh	0cda69dd7c	KVM: PPC: Book3S HV: Implement halt polling This patch introduces new halt polling functionality into the kvm_hv kernel module. When a vcore is idle it will poll for some period of time before scheduling itself out. When all of the runnable vcpus on a vcore have ceded (and thus the vcore is idle) we schedule ourselves out to allow something else to run. In the event that we need to wake up very quickly (for example an interrupt arrives), we are required to wait until we get scheduled again. Implement halt polling so that when a vcore is idle, and before scheduling ourselves, we poll for vcpus in the runnable_threads list which have pending exceptions or which leave the ceded state. If we poll successfully then we can get back into the guest very quickly without ever scheduling ourselves, otherwise we schedule ourselves out as before. There exists generic halt_polling code in virt/kvm/kvm_main.c, however on powerpc the polling conditions are different to the generic case. It would be nice if we could just implement an arch specific kvm_check_block() function, but there are still other arch specific things which need to be done for kvm_hv (for example manipulating vcore states) which means that a separate implementation is the best option. Testing of this patch with a TCP round robin test between two guests with virtio network interfaces has found a decrease in round trip time of ~15us on average. A performance gain is only seen when going out of and back into the guest often and quickly, otherwise there is no net benefit from the polling. The polling interval is adjusted such that when we are often scheduled out for long periods of time it is reduced, and when we often poll successfully it is increased. The rate at which the polling interval increases or decreases, and the maximum polling interval, can be set through module parameters. Based on the implementation in the generic kvm module by Wanpeng Li and Paolo Bonzini, and on direction from Paul Mackerras. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2016-09-08 12:21:45 +10:00

15595 Commits