OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Dan Williams	09135cc594	mm, powerpc: use vma_kernel_pagesize() in vma_mmu_pagesize() Patch series "mm, smaps: MMUPageSize for device-dax", v3. Similar to commit `31383c6865` ("mm, hugetlbfs: introduce ->split() to vm_operations_struct") here is another occasion where we want special-case hugetlbfs/hstate enabling to also apply to device-dax. This prompts the question what other hstate conversions we might do beyond ->split() and ->pagesize(), but this appears to be the last of the usages of hstate_vma() in generic/non-hugetlbfs specific code paths. This patch (of 3): The current powerpc definition of vma_mmu_pagesize() open codes looking up the page size via hstate. It is identical to the generic vma_kernel_pagesize() implementation. Now, vma_kernel_pagesize() is growing support for determining the page size of Device-DAX vmas in addition to the existing Hugetlbfs page size determination. Ideally, if the powerpc vma_mmu_pagesize() used vma_kernel_pagesize() it would automatically benefit from any new vma-type support that is added to vma_kernel_pagesize(). However, the powerpc vma_mmu_pagesize() is prevented from calling vma_kernel_pagesize() due to a circular header dependency that requires vma_mmu_pagesize() to be defined before including <linux/hugetlb.h>. Break this circular dependency by defining the default vma_mmu_pagesize() as a __weak symbol to be overridden by the powerpc version. Link: http://lkml.kernel.org/r/151996254179.27922.2213728278535578744.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Jane Chu <jane.chu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-05 21:36:26 -07:00
Anshuman Khandual	310253514b	mm/migrate: rename migration reason MR_CMA to MR_CONTIG_RANGE alloc_contig_range() initiates compaction and eventual migration for the purpose of either CMA or HugeTLB allocations. At present, the reason code remains the same MR_CMA for either of these cases. Let's make it MR_CONTIG_RANGE which will appropriately reflect the reason code in both these cases. Link: http://lkml.kernel.org/r/20180202091518.18798-1-khandual@linux.vnet.ibm.com Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-05 21:36:24 -07:00
Linus Torvalds	9c2dd8405c	DeviceTree updates for 4.17: - Sync dtc to upstream version v1.4.6-9-gaadd0b65c987. This adds a bunch more warnings (hidden behind W=1). - Build dtc lexer and parser files instead of using shipped versions. - Rework overlay apply API to take an FDT as input and apply overlays in a single step. - Add a phandle lookup cache. This improves boot time by hundreds of msec on systems with large DT. - Add trivial mcp4017/18/19 potentiometers bindings. - Remove VLA stack usage in DT code. Merge tag 'devicetree-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree updates from Rob Herring: - Sync dtc to upstream version v1.4.6-9-gaadd0b65c987. This adds a bunch more warnings (hidden behind W=1). - Build dtc lexer and parser files instead of using shipped versions. - Rework overlay apply API to take an FDT as input and apply overlays in a single step. - Add a phandle lookup cache. This improves boot time by hundreds of msec on systems with large DT. - Add trivial mcp4017/18/19 potentiometers bindings. - Remove VLA stack usage in DT code.
* tag 'devicetree-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (26 commits) of: unittest: fix an error code in of_unittest_apply_overlay() of: unittest: move misplaced function declaration of: unittest: Remove VLA stack usage of: overlay: Fix forgotten reference to of_overlay_apply() of: Documentation: Fix forgotten reference to of_overlay_apply() of: unittest: local return value variable related cleanups of: unittest: remove unneeded local return value variables dt-bindings: trivial: add various mcp4017/18/19 potentiometers of: unittest: fix an error test in of_unittest_overlay_8() of: cache phandle nodes to reduce cost of of_find_node_by_phandle() dt-bindings: rockchip-dw-mshc: use consistent clock names MAINTAINERS: Add linux/of_*.h headers to appropriate subsystems scripts: turn off some new dtc warnings by default scripts/dtc: Update to upstream version v1.4.6-9-gaadd0b65c987 scripts/dtc: generate lexer and parser during build instead of shipping powerpc: boot: add strrchr function of: overlay: do not include path in full_name of added nodes of: unittest: clean up changeset test arm64/efi: Make strrchr() available to the EFI namespace ARM: boot: add strrchr function ...	2018-04-05 21:03:42 -07:00
Linus Torvalds	052c220da3	SCSI for-linus on 20180404 This is mostly updates of the usual drivers: arcmsr, qla2xx, lpfc, ufs, mpt3sas, hisi_sas. In addition we have removed several really old drivers: sym53c416, NCR53c406a, fdomain, fdomain_cs and removed the old scsi_module.c initialization from all remaining drivers. Plus an assortment of bug fixes, initialization errors and other minor fixes. Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com> Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This is mostly updates of the usual drivers: arcmsr, qla2xx, lpfc, ufs, mpt3sas, hisi_sas. In addition we have removed several really old drivers: sym53c416, NCR53c406a, fdomain, fdomain_cs and removed the old scsi_module.c initialization from all remaining drivers. Plus an assortment of bug fixes, initialization errors and other minor fixes" * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (168 commits) scsi: ufs: Add support for Auto-Hibernate Idle Timer scsi: ufs: sysfs: reworking of the rpm_lvl and spm_lvl entries scsi: qla2xxx: fx00 copypaste typo scsi: qla2xxx: fix error message on <qla2400 scsi: smartpqi: update driver version scsi: smartpqi: workaround fw bug for oq deletion scsi: arcmsr: Change driver version to v1.40.00.05-20180309 scsi: arcmsr: Sleep to avoid CPU stuck too long for waiting adapter ready scsi: arcmsr: Handle adapter removed due to thunderbolt cable disconnection.
scsi: arcmsr: Rename ACB_F_BUS_HANG_ON to ACB_F_ADAPTER_REMOVED for adapter hot-plug scsi: qla2xxx: Update driver version to 10.00.00.06-k scsi: qla2xxx: Fix Async GPN_FT for FCP and FC-NVMe scan scsi: qla2xxx: Cleanup code to improve FC-NVMe error handling scsi: qla2xxx: Fix FC-NVMe IO abort during driver reset scsi: qla2xxx: Fix retry for PRLI RJT with reason of BUSY scsi: qla2xxx: Remove nvme_done_list scsi: qla2xxx: Return busy if rport going away scsi: qla2xxx: Fix n2n_ae flag to prevent dev_loss on PDB change scsi: qla2xxx: Add FC-NVMe abort processing scsi: qla2xxx: Add changes for devloss timeout in driver ...	2018-04-05 15:05:53 -07:00
Nicholas Piggin	c1b25a17d2	powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep POWER8 restores AMOR when waking from deep sleep, but POWER9 does not, because it does not go through the subcore restore. Have POWER9 restore it in core restore. Fixes: `ee97b6b99f` ("powerpc/mm/radix: Setup AMOR in HV mode to allow key 0") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-05 16:48:52 +10:00
Nicholas Piggin	3a52f6014d	powerpc/64s: Fix POWER9 DD2.2 and above in cputable features The CPU_FTR_POWER9_DD2_1 flag is intended to be set for DD2.1 and above (which is what the dt_cpu_ftrs setup does). Fix cputable for DD2.2 to match. This came about due to patches `b5af4f2793` ("powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2"), and `9e9626ed3a` ("powerpc/64s: Fix POWER9 DD2.2 and above in DT CPU features") being in-flight at once. The latter patch fixed dt_cpu_ftrs like this one does. The former changed cputable to match dt_cpu_ftrs. Fixes: `b5af4f2793` ("powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-05 16:28:13 +10:00
Nicholas Piggin	c130153e45	powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit The pkey code added a CPU_FTR_PKEY bit, but did not add it to the dt_cpu_ftrs feature set. Although capability is supported by all processors in the base dt_cpu_ftrs set for 64s, it's a significant and sufficiently well defined feature to make it optional. So add it as a quirk for now, which can be versioned out then controlled by the firmware (once dt_cpu_ftrs gains versioning support). Fixes: `cf43d3b264` ("powerpc: Enable pkey subsystem") Cc: stable@vger.kernel.org # v4.16+ Cc: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-05 16:28:00 +10:00
Nicholas Piggin	a57ac41183	powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits Presently the dt_cpu_ftrs restore_cpu will only add bits to the LPCR for secondaries, but some bits must be removed (e.g., UPRT for HPT). Not clearing these bits on secondaries causes checkstops when booting with disable_radix. restore_cpu can not just set LPCR, because it is also called by the idle wakeup code which relies on opal_slw_set_reg to restore the value of LPCR, at least on P8 which does not save LPCR to stack in the idle code. Fix this by including a mask of bits to clear from LPCR as well, which is used by restore_cpu. This is a little messy now, but it's a minimal fix that can be backported. Longer term, the idle SPR save/restore code can be reworked to completely avoid calls to restore_cpu, then restore_cpu would be able to unconditionally set LPCR to match boot processor environment. Fixes: `5a61ef74f2` ("powerpc/64s: Support new device tree binding for discovering CPU features") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-05 16:10:36 +10:00
Michael Ellerman	a67cc594df	Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead" As described in that commit: When stop is executed with EC=ESL=0, it appears to execute like a normal instruction (resuming from NIP when woken by interrupt). So all the save/restore handling can be avoided completely. This is true, except in the case of an NMI interrupt (sreset or machine check) interrupting the instruction. In that case, the NMI gets an "interrupt occurred while the processor was in power-saving mode" indication. The power-save wakeup code uses that bit to decide whether to restore some registers (e.g., LR). Because these are no longer saved, this causes random register corruption. It may be possible to restore this optimisation by detecting the case of no register loss on the wakeup side, and avoid restoring in that case, but that's not a minor fix because the wakeup code itself uses some registers that would be live (e.g., LR). Fixes: `b9ee31e100` ("powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-05 16:06:25 +10:00
Logan Gunthorpe	07c3d9eaa4	powerpc: iomap.c: introduce io{read|write}64_{lo_hi|hi_lo} These functions will be introduced into the generic iomap.c so they can deal with PIO accesses in hi-lo/lo-hi variants. Thus, the powerpc version of iomap.c will need to provide the same functions even though, in this arch, they are identical to the regular io{read|write}64 functions. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Tested-by: Horia Geantă <horia.geanta@nxp.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-05 14:59:26 +10:00
Logan Gunthorpe	ef237039c5	powerpc: io.h: move iomap.h include so that it can use readq/writeq defs Subsequent patches in this series make use of the readq and writeq defines in iomap.h. However, as is, they get missed on the powerpc platform since the include comes before the define. This patch moves the include down to fix this. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-05 14:58:52 +10:00
Naveen N. Rao	5d6a03ebc8	powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it We get the below warning if we try to use kexec on P9: kexec_core: Starting new kernel WARNING: CPU: 0 PID: 1223 at arch/powerpc/kernel/process.c:826 __set_breakpoint+0xb4/0x140 [snip] NIP __set_breakpoint+0xb4/0x140 LR kexec_prepare_cpus_wait+0x58/0x150 Call Trace: 0xc0000000ee70fb20 (unreliable) 0xc0000000ee70fb20 default_machine_kexec+0x234/0x2c0 machine_kexec+0x84/0x90 kernel_kexec+0xd8/0xe0 SyS_reboot+0x214/0x2c0 system_call+0x58/0x6c This happens since we are trying to clear hw breakpoint on POWER9, though we don't have CPU_FTR_DAWR enabled. Guard __set_breakpoint() within hw_breakpoint_disable() with ppc_breakpoint_available() to address this. Fixes: `9654153158` ("powerpc: Disable DAWR in the base POWER9 CPU features") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-04 21:54:02 +10:00
Aneesh Kumar K.V	7a22d6321c	powerpc/mm/radix: Update command line parsing for disable_radix kernel parameter disable_radix takes different options disable_radix=yes|no|1|0 or just disable_radix. prom_init parsing is not supporting these options. Fixes: `1fd6c02207` ("powerpc/mm: Add a CONFIG option to choose if radix is used by default") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-04 16:59:50 +10:00
Aneesh Kumar K.V	cec4e9b28f	powerpc/mm/radix: Parse disable_radix commandline correctly. The kernel parameter disable_radix takes different options: disable_radix=yes|no|1|0 or just disable_radix. When using the latter format we get the error below. `Malformed early option 'disable_radix'` Fixes: `1fd6c02207` ("powerpc/mm: Add a CONFIG option to choose if radix is used by default") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-04 16:59:36 +10:00
Aneesh Kumar K.V	6fa504835d	powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb With 64k page size, we have hugetlb pte entries at the pmd and pud level for book3s64. We don't need to create a separate page table cache for that. With 4k we need to make sure hugepd page table cache for 16M is placed at PUD level and 16G at the PGD level. Simplify all these by not using HUGEPD_PD_SHIFT which is confusing for book3s64. Without this patch, with 64k page size we create pagetable caches with shift value 10 and 7 which are not used at all. Fixes: `419df06eea` ("powerpc: Reduce the PTE_INDEX_SIZE") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-04 16:58:53 +10:00
Aneesh Kumar K.V	fb4e5dbd44	powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix With split PTL (page table lock) config, we allocate the level 4 (leaf) page table using pte fragment framework instead of slab cache like other levels. This was done to enable us to have split page table lock at the level 4 of the page table. We use the page->ptl of the page backing all the level 4 pte fragments for the lock. Currently with Radix, we use only 16 fragments out of the allocated page. In radix each fragment is 256 bytes which means we use only 4k out of the allocated 64K page wasting 60k of the allocated memory. This was done earlier to keep it closer to hash. This patch updates the pte fragment count to 256, thereby using the full 64K page and reducing the memory usage. Performance tests show really low impact even with THP disabled. With THP disabled we will be contending even less on level 4 ptl and hence the impact should be lower still. 256 threads: without patch (10 runs of ./ebizzy -m -n 1000 -s 131072 -S 100) median = 15678.5 stdev = 42.1209 with patch: median = 15354 stdev = 194.743 This is with THP disabled. With THP enabled the impact of the patch will be less. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-04 16:58:06 +10:00
Aneesh Kumar K.V	f2ed480fa4	powerpc/mm/keys: Update documentation and remove unnecessary check Adds more code comments. We also remove an unnecessary pkey check after we check for pkey error in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-04 15:23:09 +10:00
Nicholas Piggin	b9ee31e100	powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead When stop is executed with EC=ESL=0, it appears to execute like a normal instruction (resuming from NIP when woken by interrupt). So all the save/restore handling can be avoided completely. In particular NV GPRs do not have to be saved, and MSR does not have to be switched back to kernel MSR. So move the test for EC=ESL=0 sleep states out to power9_idle_stop, and return directly to the caller after stop in that case. This improves performance for ping-pong benchmark with the stop0_lite idle state by 2.54% for 2 threads in the same core, and 2.57% for different cores. Performance increase with HV_POSSIBLE defined will be improved further by avoiding the hwsync. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-04 11:11:43 +10:00
Michael Ellerman	d0b791c029	powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop() Commit `3d4fbffdd7` ("powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug") that added power9_offline_stop() was written before commit `7672691a08` ("powerpc/powernv: Provide a way to force a core into SMT4 mode"). When merging the former I failed to notice that it caused us to skip the force-SMT4 logic for offline CPUs. The result is that offlined CPUs will not correctly participate in the force-SMT4 logic, which presumably will result in badness (not tested). Reconcile the two commits by making power9_offline_stop() a pre-cursor to power9_idle_stop(), so that they share the force-SMT4 logic. This is based on an original commit from Nick, all breakage is my own. Fixes: `3d4fbffdd7` ("powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>	2018-04-04 09:09:35 +10:00
Linus Torvalds	3b24b83763	Kbuild updates for v4.17 - add a shell script to get Clang version - improve portability of build scripts - drop always-enabled CONFIG_THIN_ARCHIVE and remove unused code - rename built-in.o which is now thin archive to built-in.a - process clean/build targets one by one to get along with -j option - simplify ld-option - improve building with CONFIG_TRIM_UNUSED_KSYMS - define KBUILD_MODNAME even for objects shared among multiple modules - avoid linking multiple instances of same objects from composite objects - move <linux/compiler_types.h> to c_flags to include it only for C files - clean-up various Makefiles Merge tag 'kbuild-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - add a shell script to get Clang version - improve portability of build scripts - drop always-enabled CONFIG_THIN_ARCHIVE and remove unused code - rename built-in.o which is now thin archive to built-in.a - process clean/build targets one by one to get along with -j option - simplify ld-option - improve building with CONFIG_TRIM_UNUSED_KSYMS - define KBUILD_MODNAME even for objects shared among
multiple modules - avoid linking multiple instances of same objects from composite objects - move <linux/compiler_types.h> to c_flags to include it only for C files - clean-up various Makefiles * tag 'kbuild-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits) kbuild: get <linux/compiler_types.h> out of <linux/kconfig.h> kbuild: clean up link rule of composite modules kbuild: clean up archive rule of built-in.a kbuild: remove partial section mismatch detection for built-in.a net: liquidio: clean up Makefile for simpler composite object handling lib: zstd: clean up Makefile for simpler composite object handling kbuild: link $(real-obj-y) instead of $(obj-y) into built-in.a kbuild: rename real-objs-y/m to real-obj-y/m kbuild: move modname and modname-multi close to modname_flags kbuild: simplify modname calculation kbuild: fix modname for composite modules kbuild: define KBUILD_MODNAME even if multiple modules share objects kbuild: remove unnecessary $(subst $(obj)/, , ...) in modname-multi kbuild: Use ls(1) instead of stat(1) to obtain file size kbuild: link vmlinux only once for CONFIG_TRIM_UNUSED_KSYMS kbuild: move include/config/ksym/* to include/ksym/* kbuild: move CONFIG_TRIM_UNUSED_KSYMS code unneeded for external module kbuild: restore autoksyms.h touch to the top Makefile kbuild: move 'scripts' target below kbuild: remove wrong 'touch' in adjust_autoksyms.sh ...	2018-04-03 15:51:22 -07:00
Linus Torvalds	4608f06453	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next Pull sparc updates from David Miller: 1) Add support for ADI (Application Data Integrity) found in more recent sparc64 cpus. Essentially this is key-based access to virtual memory, and if the key encoded in the virtual address is wrong you get a trap. The mm changes were reviewed by Andrew Morton and others. Work by Khalid Aziz. 2) Validate DAX completion index range properly, from Rob Gardner. 3) Add proper Kconfig deps for DAX driver. From Guenter Roeck. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: sparc64: Make atomic_xchg() an inline function rather than a macro. sparc64: Properly range check DAX completion index sparc: Make auxiliary vectors for ADI available on 32-bit as well sparc64: Oracle DAX driver depends on SPARC64 sparc64: Update signal delivery to use new helper functions sparc64: Add support for ADI (Application Data Integrity) mm: Allow arch code to override copy_highpage() mm: Clear arch specific VM flags on protection change mm: Add address parameter to arch_validate_prot() sparc64: Add auxiliary vectors to report platform ADI properties sparc64: Add handler for "Memory Corruption Detected" trap sparc64: Add HV fault type handlers for ADI related faults sparc64: Add support for ADI register fields, ASIs and traps mm, swap: Add infrastructure for saving page metadata on swap signals, sparc: Add signal codes for ADI violations	2018-04-03 14:08:58 -07:00
Nicholas Piggin	f2748bdfe1	powerpc/powernv: Always stop secondaries before reboot/shutdown Currently powernv reboot and shutdown requests just leave secondaries to do their own things. This is undesirable because they can trigger any number of watchdogs while waiting for reboot, but also we don't know what else they might be doing -- they might be causing trouble, trampling memory, etc. The opal scheduled flash update code already ran into watchdog problems due to flashing taking a long time, and it was fixed with `2196c6f1ed` ("powerpc/powernv: Return secondary CPUs to firmware before FW update"), which returns secondaries to opal. It's been found that regular reboots can take over 10 seconds, which can result in the hard lockup watchdog firing, reboot: Restarting system [ 360.038896709,5] OPAL: Reboot request... Watchdog CPU:0 Hard LOCKUP Watchdog CPU:44 detected Hard LOCKUP other CPUS:16 Watchdog CPU:16 Hard LOCKUP watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0] This patch removes the special case for flash update, and calls smp_send_stop in all cases before calling reboot/shutdown. smp_send_stop could return CPUs to OPAL, the main reason not to is that the request could come from a NMI that interrupts OPAL code, so re-entry to OPAL can cause a number of problems. Putting secondaries into simple spin loops improves the chances of a successful reboot. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-03 22:59:57 +10:00
Nicholas Piggin	855bfe0de1	powerpc: hard disable irqs in smp_send_stop loop The hard lockup watchdog can fire under local_irq_disable on platforms with irq soft masking. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-03 22:59:10 +10:00
Nicholas Piggin	6bed323762	powerpc: use NMI IPI for smp_send_stop Use the NMI IPI rather than smp_call_function for smp_send_stop. Have stopped CPUs hard disable interrupts rather than just soft disable. This function is used in crash/panic/shutdown paths to bring other CPUs down as quickly and reliably as possible, and minimizing their potential to cause trouble. Avoiding the Linux smp_call_function infrastructure and (if supported) using true NMI IPIs makes this more robust. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-03 22:59:09 +10:00
Nicholas Piggin	a2b5e056b7	powerpc/powernv: Fix SMT4 forcing idle code The PSSCR value is not stored to PACA_REQ_PSSCR if the CPU does not have the XER[SO] bug. Fix this by storing up-front, outside the workaround code. The initial test is not required because it is a slow path. The workaround is made to depend on CONFIG_KVM_BOOK3S_HV_POSSIBLE, to match pnv_power9_force_smt4_catch() where it is used. Drop the comment on pnv_power9_force_smt4_catch() as it's no longer true. Fixes: `7672691a08` ("powerpc/powernv: Provide a way to force a core into SMT4 mode") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-03 22:14:27 +10:00
Mauricio Faria de Oliveira	6232774f15	powerpc/pseries: Restore default security feature flags on setup After migration the security feature flags might have changed (e.g., destination system with unpatched firmware), but some flags are not set/cleared again in init_cpu_char_feature_flags() because it assumes the security flags to be the defaults. Additionally, if the H_GET_CPU_CHARACTERISTICS hypercall fails then init_cpu_char_feature_flags() does not run again, which potentially might leave the system in an insecure or sub-optimal configuration. So, just restore the security feature flags to the defaults assumed by init_cpu_char_feature_flags() so it can set/clear them correctly, and to ensure safe settings are in place in case the hypercall fails. Fixes: `f636c14790` ("powerpc/pseries: Set or clear security feature flags") Depends-on: 19887d6a28e2 ("powerpc: Move default security feature flags") Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-03 21:50:09 +10:00
Mauricio Faria de Oliveira	e7347a8683	powerpc: Move default security feature flags This moves the definition of the default security feature flags (i.e., enabled by default) closer to the security feature flags. This can be used to restore current flags to the default flags. Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-03 21:50:08 +10:00
Nicholas Piggin	252988cbf0	powerpc: Don't write to DABR on >= Power8 if DAWR is disabled flush_thread() calls __set_breakpoint() via set_debug_reg_defaults() without checking ppc_breakpoint_available(). On Power8 or later CPUs which have the DAWR feature disabled that will cause a write to the DABR which is incorrect as those CPUs don't have a DABR. Fix it two ways, by checking ppc_breakpoint_available() in set_debug_reg_defaults(), and also by reworking __set_breakpoint() to only write to DABR on Power7 or earlier. Fixes: `9654153158` ("powerpc: Disable DAWR in the base POWER9 CPU features") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Rework the logic in __set_breakpoint()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-03 21:50:08 +10:00
Nicholas Piggin	e303c08787	KVM: PPC: Book3S HV: Fix ppc_breakpoint_available compile error arch/powerpc/kvm/book3s_hv.c: In function ‘kvmppc_h_set_mode’: arch/powerpc/kvm/book3s_hv.c:745:8: error: implicit declaration of function ‘ppc_breakpoint_available’ if (!ppc_breakpoint_available()) ^~~~~~~~~~~~~~~~~~~~~~~~ Fixes: `398e712c00` ("KVM: PPC: Book3S HV: Return error from h_set_mode(SET_DAWR) on POWER9") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-03 21:50:07 +10:00
Aneesh Kumar K.V	a6201da34f	powerpc: Fix oops due to bad access of lppaca on bare metal Commit `8e0b634b13` ("powerpc/64s: Do not allocate lppaca if we are not virtualized") removed allocation of lppaca on bare metal platforms. But with CONFIG_PPC_SPLPAR enabled, we still access the lppaca on bare metal in some code paths. Fix this by adding runtime checks for SPLPAR (shared processor LPAR). Fixes: `8e0b634b13` ("powerpc/64s: Do not allocate lppaca if we are not virtualized") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-03 21:50:07 +10:00
Linus Torvalds	642e7fd233	Merge branch 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux Pull removal of in-kernel calls to syscalls from Dominik Brodowski: "System calls are interaction points between userspace and the kernel. Therefore, system call functions such as sys_xyzzy() or compat_sys_xyzzy() should only be called from userspace via the syscall table, but not from elsewhere in the kernel. At least on 64-bit x86, it will likely be a hard requirement from v4.17 onwards to not call system call functions in the kernel: It is better to use a different calling convention for system calls there, where struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands processing over to the actual syscall function. This means that only those parameters which are actually needed for a specific syscall are passed on during syscall entry, instead of filling in six CPU registers with random user space content all the time (which may cause serious trouble down the call chain). Those x86-specific patches will be pushed through the x86 tree in the near future. Moreover, rules on how data may be accessed may differ between kernel data and user data. This is another reason why calling sys_xyzzy() is generally a bad idea, and -- at most -- acceptable in arch-specific code. This patchset removes all in-kernel calls to syscall functions in the kernel with the exception of arch/.
On top of this, it cleans up the three places where many syscalls are referenced or prototyped, namely kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h" * 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits) bpf: whitelist all syscalls for error injection kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions kernel/sys_ni: sort cond_syscall() entries syscalls/x86: auto-create compat_sys_() prototypes syscalls: sort syscall prototypes in include/linux/compat.h net: remove compat_sys_() prototypes from net/compat.h syscalls: sort syscall prototypes in include/linux/syscalls.h kexec: move sys_kexec_load() prototype to syscalls.h x86/sigreturn: use SYSCALL_DEFINE0 x86: fix sys_sigreturn() return type to be long, not unsigned long x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead() mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate() fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate() fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid() kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare() ...	2018-04-02 21:22:12 -07:00
Linus Torvalds	2fcd2b306a	Merge branch 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 dma mapping updates from Ingo Molnar: "This tree, by Christoph Hellwig, switches over the x86 architecture to the generic dma-direct and swiotlb code, and also unifies more of the dma-direct code between architectures. The now unused x86-only primitives are removed" * 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: dma-mapping: Don't clear GFP_ZERO in dma_alloc_attrs swiotlb: Make swiotlb_{alloc,free}_buffer depend on CONFIG_DMA_DIRECT_OPS dma/swiotlb: Remove swiotlb_{alloc,free}_coherent() dma/direct: Handle force decryption for DMA coherent buffers in common code dma/direct: Handle the memory encryption bit in common code dma/swiotlb: Remove swiotlb_set_mem_attributes() set_memory.h: Provide set_memory_{en,de}crypted() stubs x86/dma: Remove dma_alloc_coherent_gfp_flags() iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent() iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}() x86/dma/amd_gart: Use dma_direct_{alloc,free}() x86/dma/amd_gart: Look at dev->coherent_dma_mask instead of GFP_DMA x86/dma: Use generic swiotlb_ops x86/dma: Use DMA-direct (CONFIG_DMA_DIRECT_OPS=y) x86/dma: Remove dma_alloc_coherent_mask()	2018-04-02 17:18:45 -07:00
Dominik Brodowski	c7b95d5156	mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead() Using this helper allows us to avoid the in-kernel calls to the sys_readahead() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_readahead(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:12 +02:00
Dominik Brodowski	a90f590a1b	mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() Using this helper allows us to avoid the in-kernel calls to the sys_mmap_pgoff() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_mmap_pgoff(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:11 +02:00
Dominik Brodowski	9d5b7c956b	mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() Using the ksys_fadvise64_64() helper allows us to avoid the in-kernel calls to the sys_fadvise64_64() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_fadvise64_64(). Some compat stubs called sys_fadvise64(), which then just passed through the arguments to sys_fadvise64_64(). Get rid of this indirection, and call ksys_fadvise64_64() directly. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:10 +02:00
Dominik Brodowski	edf292c76b	fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate() Using the ksys_fallocate() wrapper allows us to get rid of in-kernel calls to the sys_fallocate() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_fallocate(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:09 +02:00
Dominik Brodowski	36028d5dd7	fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls Using the ksys_p{read,write}64() wrappers allows us to get rid of in-kernel calls to the sys_pread64() and sys_pwrite64() syscalls. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_p{read,write}64(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:09 +02:00
Dominik Brodowski	df260e21e6	fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate() Using the ksys_truncate() wrapper allows us to get rid of in-kernel calls to the sys_truncate() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_truncate(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:08 +02:00
Dominik Brodowski	806cbae122	fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall Using this helper allows us to avoid the in-kernel calls to the sys_sync_file_range() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_sync_file_range(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:07 +02:00
Dominik Brodowski	411d9475cf	fs: add ksys_ftruncate() wrapper; remove in-kernel calls to sys_ftruncate() Using the ksys_ftruncate() wrapper allows us to get rid of in-kernel calls to the sys_ftruncate() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_ftruncate(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:00 +02:00
Linus Torvalds	486adcea4a	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main kernel side changes were: - Modernize the kprobe and uprobe creation/destruction tooling ABIs: The existing text based APIs (kprobe_events and uprobe_events in tracefs), are naive, limited ABIs in that they require user-space to clean up after themselves, which is both difficult and fragile if the tool is buggy or exits unexpectedly. In other words they are not really suited for modern, robust tooling. So introduce a modern, file descriptor based ABI that does not have these limitations: introduce the 'perf_kprobe' and 'perf_uprobe' PMUs and extend the perf_event_open() syscall to create events with a kprobe/uprobe attached to them. These [k,u]probe are associated with this file descriptor, so they are not available in tracefs. (Song Liu) - Intel Cannon Lake CPU support (Harry Pan) - Intel PT cleanups (Alexander Shishkin) - Improve the performance of pinned/flexible event groups by using RB trees (Alexey Budankov) - Add PERF_EVENT_IOC_MODIFY_ATTRIBUTES which allows the modification of hardware breakpoints, which new ABI variant massively speeds up existing tooling that uses hardware breakpoints to instrument (and debug) memory usage. (Milind Chabbi, Jiri Olsa) - Various Intel PEBS handling fixes and improvements, and other Intel PMU improvements (Kan Liang) - Various perf core improvements and optimizations (Peter Zijlstra) - ... misc cleanups, fixes and updates. There's over 200 tooling commits, here's an (imperfect) list of highlights: - 'perf annotate' improvements: * Recognize and handle jumps to other functions as calls, which improves the navigation along jumps and back. (Arnaldo Carvalho de Melo) * Add the 'P' hotkey in TUI annotation to dump annotation output into a file, to ease e-mail reporting of annotation details. 
(Arnaldo Carvalho de Melo) * Add an IPC/cycles column to the TUI (Jin Yao) * Improve s390 assembly annotation (Thomas Richter) * Refactor the output formatting logic to better separate it into interactive and non-interactive features and add the --stdio2 output variant to demonstrate this. (Arnaldo Carvalho de Melo) - 'perf script' improvements: * Add Python 3 support (Jaroslav Škarvada) * Add --show-round-event (Jiri Olsa) - 'perf c2c' improvements: * Add NUMA analysis support (Jiri Olsa) - 'perf trace' improvements: * Improve PowerPC support (Ravi Bangoria) - 'perf inject' improvements: * Integrate ARM CoreSight traces (Robert Walker) - 'perf stat' improvements: * Add the --interval-count option (yuzhoujian) * Add the --timeout option (yuzhoujian) - 'perf sched' improvements (Changbin Du) - Vendor events improvements : * Add IBM s390 vendor events (Thomas Richter) * Add and improve arm64 vendor events (John Garry, Ganapatrao Kulkarni) * Update POWER9 vendor events (Sukadev Bhattiprolu) - Intel PT tooling improvements (Adrian Hunter) - PMU handling improvements (Agustin Vega-Frias) - Record machine topology in perf.data (Jiri Olsa) - Various overwrite related cleanups (Kan Liang) - Add arm64 dwarf post unwind support (Kim Phillips, Jean Pihet) - ... 
and lots of other changes, cleanups and fixes, see the shortlog and Git history for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits) perf/x86/intel: Enable C-state residency events for Cannon Lake perf/x86/intel: Add Cannon Lake support for RAPL profiling perf/x86/pt, coresight: Clean up address filter structure perf vendor events s390: Add JSON files for IBM z14 perf vendor events s390: Add JSON files for IBM z13 perf vendor events s390: Add JSON files for IBM zEC12 zBC12 perf vendor events s390: Add JSON files for IBM z196 perf vendor events s390: Add JSON files for IBM z10EC z10BC perf mmap: Be consistent when checking for an unmaped ring buffer perf mmap: Fix accessing unmapped mmap in perf_mmap__read_done() perf build: Fix check-headers.sh opts assignment perf/x86: Update rdpmc_always_available static key to the modern API perf annotate: Use absolute addresses to calculate jump target offsets perf annotate: Defer searching for comma in raw line till it is needed perf annotate: Support jumping from one function to another perf annotate: Add "_local" to jump/offset validation routines perf python: Reference Py_None before returning it perf annotate: Mark jumps to outher functions with the call arrow perf annotate: Pass function descriptor to its instruction parsing routines perf annotate: No need to calculate notes->start twice ...	2018-04-02 11:06:34 -07:00
Mathieu Malaterre	19e68b2aec	powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT In commit `9690c15742` ("powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT") an issue was discovered where `mm->context.id` was being truncated to an `unsigned int`, while the PID is actually an `unsigned long`. Update the earlier patch by fixing one remaining occurrence. Discovered during a compilation with W=1: arch/powerpc/mm/tlb-radix.c:702:19: error: comparison is always false due to limited range of data type [-Werror=type-limits] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 22:15:34 +10:00
Matt Evans	0e524e761f	powerpc: Clear branch trap (MSR.BE) before delivering SIGTRAP When using SIG_DBG_BRANCH_TRACING, MSR.BE is left enabled in the user context when single_step_exception() prepares the SIGTRAP delivery. The resulting branch-trap-within-the-SIGTRAP-handler isn't healthy. Commit `2538c2d08f` broke this, by replacing an MSR mask operation of ~(MSR_SE | MSR_BE) with a call to clear_single_step() which only clears MSR_SE. This patch adds a new helper, clear_br_trace(), which clears the debug trap before invoking the signal handler. This helper is a NOP for BookE as SIG_DBG_BRANCH_TRACING isn't supported on BookE. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 22:15:33 +10:00
Nicholas Piggin	4b7e5532d2	powerpc/64s: Add POWER9 CPU type selection Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 22:15:32 +10:00
Nicholas Piggin	db5ae1c155	powerpc/64s: Refine feature sets for little endian builds This reduces vmlinux text size by 1kB and data by 1.5kB with a small build! Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add the recently added CPU_FTRS_POWER9_DD2_2 to the little endian possible mask as noticed by Nick.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 22:14:40 +10:00
Nicholas Piggin	a73657ea19	powerpc/64: Add GENERIC_CPU support for little endian Add GENERIC_CPU support for little-endian rather than using POWER8 specific selection for POWER9 and above. Restrict GENERIC_CPU to POWER8 and above on little endian. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Duplicate GENERIC_CPU to avoid a kbuild warning about the prompt being redefined. Spell out that GENERIC means >= POWER4 for BE.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 21:52:52 +10:00
Nicholas Piggin	471d7ff8b5	powerpc/64s: Remove POWER4 support POWER4 has been broken since at least the change `49d09bf2a6` ("powerpc/64s: Optimise MSR handling in exception handling"), which requires mtmsrd L=1 support. This was introduced in ISA v2.01, and POWER4 supports ISA v2.00. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:50 +11:00
Nicholas Piggin	3735eb850e	powerpc: Remove unused CPU_FTR_ARCH_201 The last usage was removed in `c17b98cf60` ("KVM: PPC: Book3S HV: Remove code for PPC970 processors") (Dec 2014). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:50 +11:00
Nicholas Piggin	9e9626ed3a	powerpc/64s: Fix POWER9 DD2.2 and above in DT CPU features The CPU_FTR_POWER9_DD2_1 flag is intended to be set for DD2.1 and above (which is what the cputable setup does). Fix DT CPU features quirk setup to match. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Merge with upstream changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:49 +11:00
Nicholas Piggin	15a3204d24	powerpc/64s: Set assembler machine type to POWER4 Rather than override the machine type in .S code (which can hide wrong or ambiguous code generation for the target), set the type to power4 for all assembly. This also means we need to be careful not to build power4-only code when we're not building for Book3S, such as the "power7" versions of copyuser/page/memcpy. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix Book3E build, don't build the "power7" variants for non-Book3S] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:49 +11:00
Nicholas Piggin	d50614fa45	powerpc/64s: Explicitly add vector features to CPU_FTRS_POSSIBLE ALTIVEC and VSX features are not added by default to the POWERx CPU feature sets because they are intended to be enabled by firmware. Currently they end up in CPU_FTRS_POSSIBLE due to their inclusion in the sets for other CPUs, e.g. PPC970. But they should be added individually to the CPU_FTRS_POSSIBLE set, because if we reduce the set of CPUs that are built-for they may disappear from the possible mask. It already contains CPU_FTR_VSX, so add ALTIVEC. The _COMP features should be used because they won't be present if compiled out. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add detail to change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:48 +11:00
Nicholas Piggin	b842bd0f7a	powerpc/64s: Add all POWER9 features to CPU_FTRS_ALWAYS It's not a bug to have features missing in CPU_FTRS_ALWAYS, but it is a missed opportunity for optimisation. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:48 +11:00
Mark Greer	147704534e	powerpc/boot: Remove duplicate typedefs from libfdt_env.h When building a uImage or zImage using ppc6xx_defconfig and some other defconfigs, the following error occurs with GCC 4.5.1: /arch/powerpc/boot/libfdt_env.h:10:13: error: redefinition of typedef 'uint32_t' /arch/powerpc/boot/types.h:21:13: note: previous declaration of 'uint32_t' was here /arch/powerpc/boot/libfdt_env.h:11:13: error: redefinition of typedef 'uint64_t' /arch/powerpc/boot/types.h:22:13: note: previous declaration of 'uint64_t' was here The problem is that commit `656ad58ef1` (powerpc/boot: Add OPAL console to epapr wrappers) adds typedefs for uint32_t and uint64_t to types.h but doesn't remove the pre-existing (and now duplicate) typedefs from libfdt_env.h. Fix the error by removing the duplicate typedefs from libfdt_env.h. Signed-off-by: Mark Greer <mgreer@animalcreek.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:47 +11:00
Nicholas Piggin	8c1c7fb0b5	powerpc/64s/idle: avoid sync for KVM state when waking from idle When waking from a CPU idle instruction (e.g., nap or stop), the sync for ordering the KVM secondary thread state can be avoided if the wakeup is coming from a kernel context rather than KVM context. This improves performance for ping-pong benchmark with the stop0 idle state by 0.46% for 2 threads in the same core, and 1.02% for different cores. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:47 +11:00
Nicholas Piggin	3d4fbffdd7	powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug Implement a new function to invoke stop, power9_offline_stop, which is like power9_idle_stop but used by the cpu hotplug code. Move KVM secondary state manipulation code to the offline case. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:46 +11:00
Nicholas Piggin	d40b6768e4	powerpc/64s: sreset panic if there is no debugger or crash dump handlers system_reset_exception does most of its own crash handling now, invoking the debugger or crash dumps if they are registered. If not, then it goes through to die() to print stack traces, and then is supposed to panic (according to comments). However after die() prints oopses, it does its own handling which doesn't allow system_reset_exception to panic (e.g., it may just kill the current process). This patch causes sreset exceptions to return from die after it prints messages but before acting. This also stops die from invoking the debugger on 0x100 crashes. system_reset_exception similarly calls the debugger. It had been thought this was harmless (because if the debugger was disabled, neither call would fire, and if it was enabled the first call would return). However in some cases like xmon 'X' command, the debugger returns 0, which currently causes it to be entered again (first in system_reset_exception, then in die), which is confusing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:46 +11:00
Nicholas Piggin	15b4dd7981	powerpc/64s: return more carefully from sreset NMI System Reset, being an NMI, must return more carefully than other interrupts. It has traditionally returned via the normal return from exception path, but that has a number of problems. - r13 does not get restored if returning to kernel. This is for interrupts which may cause a context switch, which sreset will never do. Interrupting OPAL (which uses a different r13) is one place where this causes breakage. - It may cause several other problems returning to kernel with preempt or TIF_EMULATE_STACK_STORE if it hits at the wrong time. It's safer just to have a simple restore and return, like machine check which is the other NMI. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:45 +11:00
Michael Neuling	f0295e047f	powerpc/eeh: Fix race with driver un/bind The current EEH callbacks can race with a driver unbind. This can result in backtraces like this: EEH: Frozen PHB#0-PE#1fc detected EEH: PE location: S000009, PHB location: N/A CPU: 2 PID: 2312 Comm: kworker/u258:3 Not tainted 4.15.6-openpower1 #2 Workqueue: nvme-wq nvme_reset_work [nvme] Call Trace: dump_stack+0x9c/0xd0 (unreliable) eeh_dev_check_failure+0x420/0x470 eeh_check_failure+0xa0/0xa4 nvme_reset_work+0x138/0x1414 [nvme] process_one_work+0x1ec/0x328 worker_thread+0x2e4/0x3a8 kthread+0x14c/0x154 ret_from_kernel_thread+0x5c/0xc8 nvme nvme1: Removing after probe failure status: -19 <snip> cpu 0x23: Vector: 300 (Data Access) at [c000000ff50f3800] pc: c0080000089a0eb0: nvme_error_detected+0x4c/0x90 [nvme] lr: c000000000026564: eeh_report_error+0xe0/0x110 sp: c000000ff50f3a80 msr: 9000000000009033 dar: 400 dsisr: 40000000 current = 0xc000000ff507c000 paca = 0xc00000000fdc9d80 softe: 0 irq_happened: 0x01 pid = 782, comm = eehd Linux version 4.15.6-openpower1 (smc@smc-desktop) (gcc version 6.4.0 (Buildroot 2017.11.2-00008-g4b6188e)) #2 SMP Tue Feb 27 12:33:27 PST 2018 enter ? for help eeh_report_error+0xe0/0x110 eeh_pe_dev_traverse+0xc0/0xdc eeh_handle_normal_event+0x184/0x4c4 eeh_handle_event+0x30/0x288 eeh_event_handler+0x124/0x170 kthread+0x14c/0x154 ret_from_kernel_thread+0x5c/0xc8 The first part is an EEH (on boot), the second half is the resulting crash. nvme probe starts the nvme_reset_work() worker thread. This worker thread starts touching the device which sees a device error (EEH) and hence queues up an event in the powerpc EEH worker thread. nvme_reset_work() then continues and runs nvme_remove_dead_ctrl_work() which results in unbinding the driver from the device and hence releases all resources. At the same time, the EEH worker thread starts doing the EEH .error_detected() driver callback, which no longer works since the resources have been freed. 
This fixes the problem in the same way the generic PCIe AER code (in drivers/pci/pcie/aer/aerdrv_core.c) does. It makes the EEH code hold the device_lock() while performing the driver EEH callbacks and associated code. This ensures either the callbacks are no longer registered, or if they are registered the driver will not be removed from underneath us. This has been broken forever. The EEH callbacks were first introduced in 2005 (in `77bd741561`) but it's not clear if a lock was needed back then. Fixes: `77bd741561` ("[PATCH] powerpc: PCI Error Recovery: PPC64 core recovery routines") Cc: stable@vger.kernel.org # v2.6.16+ Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:45 +11:00
Thiago Jung Bauermann	bf8a1abc3d	powerpc/kexec_file: Fix error code when trying to load kdump kernel kexec_file_load() on powerpc doesn't support kdump kernels yet, so it returns -ENOTSUPP in that case. I've recently learned that this errno is internal to the kernel and isn't supposed to be exposed to userspace. Therefore, change to -EOPNOTSUPP which is defined in an uapi header. This does indeed make kexec-tools happier. Before the patch, on ppc64le: # ~bauermann/src/kexec-tools/build/sbin/kexec -s -p /boot/vmlinuz kexec_file_load failed: Unknown error 524 After the patch: # ~bauermann/src/kexec-tools/build/sbin/kexec -s -p /boot/vmlinuz kexec_file_load failed: Operation not supported Fixes: `a0458284f0` ("powerpc: Add support code for kexec_file_load()") Cc: stable@vger.kernel.org # v4.10+ Reported-by: Dave Young <dyoung@redhat.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Reviewed-by: Simon Horman <horms@verge.net.au> Reviewed-by: Dave Young <dyoung@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:44 +11:00
Jonathan Neuschäfer	7e1405917c	powerpc/mm/32: Remove the reserved memory hack This hack, introduced in commit `c5df7f7751` ("powerpc: allow ioremap within reserved memory regions") is now unnecessary. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:44 +11:00
Jonathan Neuschäfer	57deb8fea0	powerpc/wii: Don't rely on the reserved memory hack Because the two memory blocks (usually called MEM1 and MEM2) are not merged anymore, __request_region in kernel/resource.c will correctly allow reserving regions in the physical address space between MEM1 and MEM2, where many important peripherals are (GPIO, MMC, USB, ...). A previous change to __ioremap_caller in arch/powerpc/mm/pgtable_32.c ensures that multiple memblocks are properly considered in ioremap; this makes it unnecessary to set __allow_ioremap_reserved. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:43 +11:00
Jonathan Neuschäfer	2bbf63264a	powerpc/mm/32: Use page_is_ram to check for RAM On systems where there is MMIO space between different blocks of RAM in the physical address space, __ioremap_caller did not allow mapping these MMIO areas, because they were below the end of RAM and thus considered RAM as well. Use the memblock-based page_is_ram function, which returns false for such MMIO holes. v2: Keep the check for p < virt_to_phys(high_memory). On 32-bit systems with high memory (memory above physical address 4GiB), the high memory is expected to be available through ioremap. The high_memory variable marks the end of low memory; comparing against it means that only ioremap requests for low RAM will be denied. Reported by Michael Ellerman. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:43 +11:00
Jonathan Neuschäfer	f65e67c7e3	powerpc/mm: Use memblock API for PPC32 page_is_ram To support accurate checking for different blocks of memory on PPC32, use the same memblock-based approach that's already used on PPC64 also on PPC32. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:42 +11:00
Jonathan Neuschäfer	2615c93e5f	powerpc/mm: Simplify page_is_ram by using memblock_is_memory Instead of open-coding the search in page_is_ram, call memblock_is_memory. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:42 +11:00
Jonathan Neuschäfer	041413b88d	powerpc/wii.dts: Add drive slot LED The Wii has a blue LED in the disk drive slot, which is controlled via a GPIO line. Add this LED to wii.dts, and mark it as a panic-indicator. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:41 +11:00
Jonathan Neuschäfer	80873a0b3a	powerpc/wii.dts: Add GPIO line names These are the GPIO line names on a Nintendo Wii, as documented in: https://wiibrew.org/wiki/Hardware/Hollywood_GPIOs Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:40 +11:00
Jonathan Neuschäfer	9693d5709f	powerpc/wii.dts: Add ngpios property The Hollywood GPIO controller supports 32 GPIOs, but on the Wii, only 24 are used. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:40 +11:00
Jonathan Neuschäfer	9cbaaec1cf	powerpc/wii: Explicitly configure GPIO owner for poweroff pin The Hollywood chipset's GPIO controller has two sets of registers: One for access by the PowerPC CPU, and one for access by the ARM coprocessor (but both are accessible from the PPC because the memory firewall (AHBPROT) is usually disabled when booting Linux, today). The wii_power_off function currently assumes that the poweroff GPIO pin is configured for use via the ARM side, but the upcoming GPIO driver configures all pins for use via the PPC side, breaking poweroff. Configure the owner register explicitly in wii_power_off to make wii_power_off work with and without the new GPIO driver. I think the Wii can be switched to the generic gpio-poweroff driver, after the GPIO driver is merged. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:39 +11:00
Jonathan Neuschäfer	7ab96c0a08	powerpc/wii: Probe the whole devicetree Previously, wii_device_probe would only initialize devices under the /hollywood node. After this patch, platform devices placed outside of /hollywood will also be initialized. The intended use case for this is devices located outside of the Hollywood chip, such as GPIO LEDs and GPIO buttons. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:39 +11:00
Michael Ellerman	1d0afc0d5a	powerpc/64e: Fix oops due to deferral of paca allocation On 64-bit Book3E systems, in setup_tlb_core_data() we reference other CPUs pacas. But in commit `59f577743d` ("powerpc/64: Defer paca allocation until memory topology is discovered") the allocation of non-boot-CPU pacas was deferred until later in boot. This leads to an oops: CPU maps initialized for 1 thread per core Unable to handle kernel paging request for data at address 0x8888888888888918 Faulting instruction address: 0xc000000000e2f0d0 Oops: Kernel access of bad area, sig: 11 [#1] NIP .setup_tlb_core_data+0xdc/0x160 Call Trace: .setup_tlb_core_data+0x5c/0x160 (unreliable) .setup_arch+0x80/0x348 .start_kernel+0x7c/0x598 start_here_common+0x1c/0x40 Luckily setup_tlb_core_data() is called immediately prior to smp_setup_pacas(). So simply switching their order is sufficient to fix the oops and seems unlikely to have any other unwanted side effects. Fixes: `59f577743d` ("powerpc/64: Defer paca allocation until memory topology is discovered") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:38 +11:00
Aneesh Kumar K.V	ca9a16c3bc	powerpc/kvm: Fix guest boot failure on Power9 since DAWR changes SLOF checks for 'sc 1' (hypercall) support by issuing a hcall with H_SET_DABR. Since the recent commit `e8ebedbf31` ("KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9") changed H_SET_DABR to return H_UNSUPPORTED on Power9, we see guest boot failures, the symptom is the boot seems to just stop in SLOF, eg: SLOF *************************************************************** QEMU Starting Build Date = Sep 24 2017 12:23:07 FW Version = buildd@ release 20170724 <no further output> SLOF can cope if H_SET_DABR returns H_HARDWARE. So switch the return value to H_HARDWARE instead of H_UNSUPPORTED so that we don't break the guest boot. That does mean we return a different error to PowerVM in this case, but that's probably not a big concern. Fixes: `e8ebedbf31` ("KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:13 +11:00
Michael Ellerman	f437c51748	Merge branch 'topic/paca' into next Bring in yet another series that touches KVM code, and might need to be merged into the kvm-ppc branch to resolve conflicts. This required some changes in pnv_power9_force_smt4_catch/release() due to the paca array becoming an array of pointers.	2018-03-31 09:09:36 +11:00
Linus Torvalds	72573481eb	KVM fixes for v4.16-rc8 PPC: - Fix a bug causing occasional machine check exceptions on POWER8 hosts (introduced in 4.16-rc1) x86: - Fix a guest crashing regression with nested VMX and restricted guest (introduced in 4.16-rc1) - Fix dependency check for pv tlb flush (The wrong dependency that effectively disabled the feature was added in 4.16-rc4, the original feature in 4.16-rc1, so it got decent testing.) -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJavUt5AAoJEED/6hsPKofo8uQH/RuijrsAIUnymkYY+6BYFXlh Ri8qhG8VB+C3SpWEtsqcqNVkjJTepCD2Ej5BJTL4Gc9BSTWy7Ht6kqskEgwcnzu2 xRfkg0q0vTj1+GDd+UiTZfxiinoHtB9x3fiXali5UNTCd1fweLxdidETfO+GqMMq KDhTR+S8dXE5VG7r+iJ80LZPtHQJ94f0fh9XpQk3X2ExTG5RBxag1U2nCfiKRAZk xRv1CNAxNaBxS38CgYfHzg31NJx38fnq/qREsIdOx0Ju9WQkglBFkhLAGUb4vL0I nn8YX/oV9cW2G8tyPWjC245AouABOLbzu0xyj5KgCY/z1leA9tdLFX/ET6Zye+E= =++uZ -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Radim Krčmář: "PPC: - Fix a bug causing occasional machine check exceptions on POWER8 hosts (introduced in 4.16-rc1) x86: - Fix a guest crashing regression with nested VMX and restricted guest (introduced in 4.16-rc1) - Fix dependency check for pv tlb flush (the wrong dependency that effectively disabled the feature was added in 4.16-rc4, the original feature in 4.16-rc1, so it got decent testing)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Fix pv tlb flush dependencies KVM: nVMX: sync vmcs02 segment regs prior to vmx_set_cr0 KVM: PPC: Book3S HV: Fix duplication of host SLB entries	2018-03-30 07:24:14 -10:00
Aneesh Kumar K.V	872a100a49	powerpc/mm/hash: Don't memset pgd table if not needed We need to zero-out pgd table only if we share the slab cache with pud/pmd level caches. With the support of 4PB, we don't share the slab cache anymore. Instead of removing the code completely hide it within an #ifdef. We don't need to do this with any other page table level, because they all allocate table of double the size and we take care of initializing the first half correctly during page table zap. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Consolidate multiple #if / #ifdef into one] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:39 +11:00
Aneesh Kumar K.V	c2b4d8b741	powerpc/mm/hash64: Increase the VA range This patch increases the max virtual (effective) address value to 4PB. With 4K page size config we continue to limit ourselves to 64TB. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Keep the H_PGTABLE_RANGE test, update it to work] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:38 +11:00
Aneesh Kumar K.V	f384796c40	powerpc/mm: Add support for handling > 512TB address in SLB miss For addresses above 512TB we allocate additional mmu contexts. To make it all easy, addresses above 512TB are handled with IR/DR=1 and with stack frame setup. The mmu_context_t is also updated to track the new extended_ids. To support up to 4PB we need a total of 8 contexts. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Minor formatting tweaks and comment wording, switch BUG to WARN in get_ea_context().] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:38 +11:00
Aneesh Kumar K.V	0dea04b288	powerpc/mm/slice: Consolidate return path in slice_get_unmapped_area() In a following patch, on finding a free area we will need to do allocation of extra contexts as needed. Consolidating the return path for slice_get_unmapped_area() will make that easier. Split into a separate patch to make review easy. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:37 +11:00
Aneesh Kumar K.V	1a2f778970	powerpc/mm/keys: Move pte bits to correct headers Memory keys are supported only with hash translation mode. Instead of using #ifdef in generic code move the key related pte bits to respective headers Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:36 +11:00
Frederic Barrat	16b19f1a03	powerpc/xive: Fix wrong xmon output caused by typo Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:36 +11:00
Nicholas Piggin	0bfdf59890	powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently asm/barrier.h is not always included after asm/synch.h, which meant it was missing __SUBARCH_HAS_LWSYNC, so in some files smp_wmb() would be eieio when it should be lwsync. kernel/time/hrtimer.c is one case. __SUBARCH_HAS_LWSYNC is only used in one place, so just fold it in to where it's used. Previously with my small simulator config, 377 instances of eieio in the tree. After this patch there are 55. Fixes: `46d075be58` ("powerpc: Optimise smp_wmb") Cc: stable@vger.kernel.org # v2.6.29+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:34 +11:00
Wei Yongjun	9a2c1d31e6	powerpc/4xx: Fix error return code in ppc4xx_msi_probe() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> [mpe: Add missing ';' to make it compile] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:34 +11:00
Ram Pai	f208638680	powerpc/mm: Fix thread_pkey_regs_init() thread_pkey_regs_init() initializes the pkey related registers instead of initializing the fields in the task structures. Fortunately those key related registers are re-set to zero when the task gets scheduled on the cpu. However it's good to fix this glaringly visible error. Fixes: `06bb53b338` ("powerpc: store and restore the pkey state across context switches") Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:33 +11:00
Naveen N. Rao	e6e133c47e	powerpc/kprobes: Fix call trace due to incorrect preempt count Michael Ellerman reported the following call trace when running ftracetest: BUG: using __this_cpu_write() in preemptible [00000000] code: ftracetest/6178 caller is opt_pre_handler+0xc4/0x110 CPU: 1 PID: 6178 Comm: ftracetest Not tainted 4.15.0-rc7-gcc6x-gb2cd1df #1 Call Trace: [c0000000f9ec39c0] [c000000000ac4304] dump_stack+0xb4/0x100 (unreliable) [c0000000f9ec3a00] [c00000000061159c] check_preemption_disabled+0x15c/0x170 [c0000000f9ec3a90] [c000000000217e84] opt_pre_handler+0xc4/0x110 [c0000000f9ec3af0] [c00000000004cf68] optimized_callback+0x148/0x170 [c0000000f9ec3b40] [c00000000004d954] optinsn_slot+0xec/0x10000 [c0000000f9ec3e30] [c00000000004bae0] kretprobe_trampoline+0x0/0x10 This is showing up since OPTPROBES is now enabled with CONFIG_PREEMPT. trampoline_probe_handler() considers itself to be a special kprobe handler for kretprobes. In doing so, it expects to be called from kprobe_handler() on a trap, and re-enables preemption before returning a non-zero return value so as to suppress any subsequent processing of the trap by the kprobe_handler(). However, with optprobes, we don't deal with special handlers (we ignore the return code) and just try to re-enable preemption causing the above trace. To address this, modify trampoline_probe_handler() to not be special. The only additional processing done in kprobe_handler() is to emulate the instruction (in this case, a 'nop'). We adjust the value of regs->nip for the purpose and delegate the job of re-enabling preemption and resetting current kprobe to the probe handlers (kprobe_handler() or optimized_callback()). Fixes: `8a2d71a3f2` ("powerpc/kprobes: Disable preemption before invoking probe handler for optprobes") Cc: stable@vger.kernel.org # v4.15+ Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:33 +11:00
Nicholas Piggin	741de61766	powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write() opal_nvram_write currently just assumes success if it encounters an error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO on other errors instead. Fixes: `628daa8d5a` ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks") Cc: stable@vger.kernel.org # v3.2+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:32 +11:00
Mauricio Faria de Oliveira	0f9bdfe3c7	powerpc/pseries: Fix clearing of security feature flags The H_CPU_BEHAV_* flags should be checked for in the 'behaviour' field of 'struct h_cpu_char_result' -- 'character' is for H_CPU_CHAR_* flags. Found by playing around with QEMU's implementation of the hypercall: H_CPU_CHAR=0xf000000000000000 H_CPU_BEHAV=0x0000000000000000 This clears H_CPU_BEHAV_FAVOUR_SECURITY and H_CPU_BEHAV_L1D_FLUSH_PR so pseries_setup_rfi_flush() disables 'rfi_flush'; and it also clears H_CPU_CHAR_L1D_THREAD_PRIV flag. So there is no RFI flush mitigation at all for cpu_show_meltdown() to report; but currently it does: Original kernel: # cat /sys/devices/system/cpu/vulnerabilities/meltdown Mitigation: RFI Flush Patched kernel: # cat /sys/devices/system/cpu/vulnerabilities/meltdown Not affected H_CPU_CHAR=0x0000000000000000 H_CPU_BEHAV=0xf000000000000000 This sets H_CPU_BEHAV_BNDS_CHK_SPEC_BAR so cpu_show_spectre_v1() should report vulnerable; but currently it doesn't: Original kernel: # cat /sys/devices/system/cpu/vulnerabilities/spectre_v1 Not affected Patched kernel: # cat /sys/devices/system/cpu/vulnerabilities/spectre_v1 Vulnerable Brown-paper-bag-by: Michael Ellerman <mpe@ellerman.id.au> Fixes: `f636c14790` ("powerpc/pseries: Set or clear security feature flags") Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:31 +11:00
Nicholas Piggin	29ab6c4708	powerpc/mm: Pass node id into create_section_mapping Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Move __map_kernel_page_nid() inside #ifdef SPARSEMEM_VMEMMAP] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:07:10 +11:00
Nicholas Piggin	2ad452ffaa	powerpc/64s/radix: Allocate kernel page tables node-local if possible Try to allocate kernel page tables for direct mapping and vmemmap according to the node of the memory they will map. The node is not available for the linear map in early boot, so use range allocation to allocate the page tables from the region they map, which is effectively node-local. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix build error in radix__create_section_mapping()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:07:09 +11:00
Nicholas Piggin	0633dafcf8	powerpc/64s/radix: Split early page table mapping to its own function Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:07:09 +11:00
Nicholas Piggin	f3865f9a71	powerpc/64: Allocate per-cpu stacks node-local if possible Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:07:08 +11:00
Nicholas Piggin	4890aea65a	powerpc/64: Allocate pacas per node Per-node allocations are possible on 64s with radix that does not have the bolted SLB limitation. Hash would be able to do the same if all CPUs had the bottom of their node-local memory bolted as well. This is left as an exercise for the reader. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add dummy definition of boot_cpuid for !SMP] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:06:44 +11:00
Nicholas Piggin	59f577743d	powerpc/64: Defer paca allocation until memory topology is discovered Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Rename the dummy allocate_pacas() to fix 32-bit build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:28 +11:00
Nicholas Piggin	9f593f131e	powerpc/setup: Add cpu_to_phys_id array Build an array that finds hardware CPU number from logical CPU number in firmware CPU discovery. Use that rather than setting paca of other CPUs directly, to begin with. Subsequent patch will not have pacas allocated at this point. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix SMP=n build by adding #ifdef in arch_match_cpu_phys_id()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:27 +11:00
Nicholas Piggin	c0abd0c745	powerpc/64: move default SPR recording Move this into the early setup code, and don't iterate over CPU masks. We don't want to call into sysfs so early from setup, and a future patch won't initialize CPU masks by the time this is called. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fold in incremental fix from Nick for DSCR handling] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:26 +11:00
Nicholas Piggin	9bd9be006c	powerpc/mm/numa: move numa topology discovery earlier Split sparsemem initialisation from basic numa topology discovery. Move the parsing earlier in boot, before pacas are allocated. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:26 +11:00
Nicholas Piggin	384e806784	powerpc/64s: Allocate slb_shadow structures individually slb_shadow structures are avoided for radix environment. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:24 +11:00
Nicholas Piggin	499dcd4137	powerpc/64s: Allocate LPPACAs individually We no longer allocate lppacas in an array, so this patch removes the 1kB static alignment for the structure, and enforces the PAPR alignment requirements at allocation time. We can not reduce the 1kB allocation size however, due to existing KVM hypervisors. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:24 +11:00
Nicholas Piggin	d2e60075a3	powerpc/64: Use array of paca pointers and allocate pacas individually Change the paca array into an array of pointers to pacas. Allocate pacas individually. This allows flexibility in where the PACAs are allocated. Future work will allocate them node-local. Platforms that don't have address limits on PACAs would be able to defer PACA allocations until later in boot rather than allocate all possible ones up-front then freeing unused. This is slightly more overhead (one additional indirection) for cross CPU paca references, but those aren't too common. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:23 +11:00
Nicholas Piggin	8e0b634b13	powerpc/64s: Do not allocate lppaca if we are not virtualized The "lppaca" is a structure registered with the hypervisor. This is unnecessary when running on non-virtualised platforms. One field from the lppaca (pmcregs_in_use) is also used by the host, so move the host part out into the paca (lppaca field is still updated in guest mode). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix non-pseries build with some #ifdefs] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:22 +11:00
Michael Ellerman	0834d627fb	powerpc/mpic: Check if cpu_possible() in mpic_physmask() In mpic_physmask() we loop over all CPUs up to 32, then get the hard SMP processor id of that CPU. Currently that's possibly walking off the end of the paca array, but in a future patch we will change the paca array to be an array of pointers, and in that case we will get a NULL for missing CPUs and oops. eg: Unable to handle kernel paging request for data at address 0x88888888888888b8 Faulting instruction address: 0xc00000000004e380 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP .mpic_set_affinity+0x60/0x1a0 LR .irq_do_set_affinity+0x48/0x100 Fix it by checking the CPU is possible, this also fixes the code if there are gaps in the CPU numbering which probably never happens on mpic systems but who knows. Debugged-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:22 +11:00
Radim Krčmář	27aa896281	KVM PPC update for 4.17 - Improvements for the radix page fault handler for HV KVM on POWER9. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJavHlMAAoJEJ2a6ncsY3GfWTAIANbY6b8xvhUY6c3WM1dt6o78 6MWJW1vCcsDYuM5yyIjYTds2xjY6vm7oWbo9thDdLIE9sWP2uKBDbdem8KxZb6JL t/tdzuLOYkB60BfwQL0z77UmLlHSQYF5RfJjbYVe5oXt7OU5TCe43udkHsT9QtLH HTpxvl7ebf2TIoex1+XqrD0eJ93tSOVFWB7Ay7WRUQu08CMEQRcHaszyMqdNfHfs LHoPwLgxyWf+7/zt2T4++ebfysQDFpQgsEuEBugXkaHkw6pSGi6R+BOrZwVVmcGm jiHq5+mdho3wL+B47Rt7UTjkpsMLyFbWR6TrhMD7y84/CislizKhEdnEYymKwH4= =0igD -----END PGP SIGNATURE----- Merge tag 'kvm-ppc-next-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc KVM PPC update for 4.17 - Improvements for the radix page fault handler for HV KVM on POWER9.	2018-03-29 20:20:13 +02:00
Linus Torvalds	a2601d78b7	powerpc fixes for 4.16 #6 These are actually all fixes for pre-4.16 code, or new hardware workarounds. Fix missing AT_BASE_PLATFORM (in auxv) when we're using a new firmware interface for describing CPU features. Fix lost pending interrupts due to a race in our interrupt soft-masking code. A workaround for a nest MMU bug with TLB invalidations on Power9. A workaround for broadcast TLB invalidations on Power9. Fix a bug in our instruction SLB miss handler, when handling bad addresses (eg. >= TASK_SIZE), which could corrupt non-volatile user GPRs. Thanks to: Aneesh Kumar K.V, Balbir Singh, Benjamin Herrenschmidt, Nicholas Piggin. -----BEGIN PGP SIGNATURE----- iQIwBAABCAAaBQJau3wfExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYCz dA/+JnB5iKCXCCebnqoaX4AFTqMfxT3nr/+JkfchovZLV0PBVzKME5JtL61udmDe j1JZU8UASLqN/8/j652s87XuuRi6xPjSPjMNXmU1LFQ7DjS9yA6FOAsbE4c1Xg4D jSded2BSnMRtA/yw8AupvdYr4w72zKMQYzo8/Or3eUQAAge+oX3d1SQiRkD3DOUg EdpHnOScSwz6GL9amfaQBhXwvik+4crTQ/wZ/SsTpQrfJkVzHXLn/DnHEP1qO+ky v/Y0ix5TxpH132XsVM7UaUvy1ZcZSyEmT2qGOisGm0fj4jesVn9dQMzP+97W4QeW ghfHj2fvzx6IsPM3PhNKITknQi/GTrukjSuzYNuj7MyvKY15HUP1MPXNeJUl5thw kI5uYWuTvyI3daQKFXRQa7V6H0auuYeEV6/RvIlJ2YtUfqmvyECviNM/+mDC0+Jk bgqz47qqeEz2cwIUu/vQm2phVpq+15cLPwmdA37IdyT6GvYgGmsW4HWVIsyxLR2z fo9ghX+1oMhmMNhgVYtL2P9BfCzQenK2R+uAmUOHdNyc0LBlGKN+RPAQqQkBhKGp BB1L2F13kpeNBNTOsPU4yH3DpPaJFtfnaeL7jd5SanwsxNnoKApFglf0nE73bvbw AwRF/vWokbd3WzuPmOtldtluWUHQhaLECU24odVGB/r3XCI= =qP8V -----END PGP SIGNATURE----- Merge tag 'powerpc-4.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 4.16. Apologies if this is a bit big at rc7, but they're all reasonably important fixes. None are actually for new code, so they aren't indicative of 4.16 being in bad shape from our point of view. - Fix missing AT_BASE_PLATFORM (in auxv) when we're using a new firmware interface for describing CPU features. 
- Fix lost pending interrupts due to a race in our interrupt soft-masking code. - A workaround for a nest MMU bug with TLB invalidations on Power9. - A workaround for broadcast TLB invalidations on Power9. - Fix a bug in our instruction SLB miss handler, when handling bad addresses (eg. >= TASK_SIZE), which could corrupt non-volatile user GPRs. Thanks to: Aneesh Kumar K.V, Balbir Singh, Benjamin Herrenschmidt, Nicholas Piggin" * tag 'powerpc-4.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Fix i-side SLB miss bad address handler saving nonvolatile GPRs powerpc/mm: Fixup tlbie vs store ordering issue on POWER9 powerpc/mm/radix: Move the functions that does the actual tlbie closer powerpc/mm/radix: Remove unused code powerpc/mm: Workaround Nest MMU bug with TLB invalidations powerpc/mm: Add tracking of the number of coprocessors using a context powerpc/64s: Fix lost pending interrupt due to race causing lost update to irq_happened powerpc/64s: Fix NULL AT_BASE_PLATFORM when using DT CPU features	2018-03-28 13:54:03 -10:00
Michael Ellerman	95dff480bb	Merge branch 'fixes' into next Merge our fixes branch from the 4.16 cycle. There were a number of important fixes merged, in particular some Power9 workarounds that we want in next for testing purposes. There's also been some conflicting changes in the CPU features code which are best merged and tested before going upstream.	2018-03-28 22:59:50 +11:00
Paul Mackerras	31c8b0d069	KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot() in page fault handler This changes the hypervisor page fault handler for radix guests to use the generic KVM __gfn_to_pfn_memslot() function instead of using get_user_pages_fast() and then handling the case of VM_PFNMAP vmas specially. The old code missed the case of VM_IO vmas; with this change, VM_IO vmas will now be handled correctly by code within __gfn_to_pfn_memslot. Currently, __gfn_to_pfn_memslot calls hva_to_pfn, which only uses __get_user_pages_fast for the initial lookup in the cases where either atomic or async is set. Since we are not setting either atomic or async, we do our own __get_user_pages_fast first, for now. This also adds code to check for the KVM_MEM_READONLY flag on the memslot. If it is set and this is a write access, we synthesize a data storage interrupt for the guest. In the case where the page is not normal RAM (i.e. page == NULL in kvmppc_book3s_radix_page_fault(), we read the PTE from the Linux page tables because we need the mapping attribute bits as well as the PFN. (The mapping attribute bits indicate whether accesses have to be non-cacheable and/or guarded.) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-03-28 08:27:58 +11:00
Michael Ellerman	c0b346729b	Merge branch 'topic/ppc-kvm' into next Merge the DAWR series, which touches arch code and KVM code and may need to be merged into the kvm-ppc tree.	2018-03-27 23:55:49 +11:00
Michael Neuling	9654153158	powerpc: Disable DAWR in the base POWER9 CPU features Using the DAWR on POWER9 can cause xstops, hence we need to disable it. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:55:33 +11:00
Michael Neuling	622aa35e8f	powerpc: Disable DAWR on POWER9 via CPU feature quirk This disables the DAWR on all POWER9 CPUs via cpu feature quirk. Using the DAWR on POWER9 can cause xstops, hence we need to disable it. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:55:33 +11:00
Michael Neuling	b53221e704	KVM: PPC: Book3S HV: Handle migration with POWER9 disabled DAWR POWER9 with the DAWR disabled causes problems for partition migration. Either we have to fail the migration (since we lose the DAWR) or we silently drop the DAWR and allow the migration to pass. This patch does the latter and allows the migration to pass (at the cost of silently losing the DAWR). This is not ideal but hopefully the best overall solution. This approach has been acked by Paulus. With this patch kvmppc_set_one_reg() will store the DAWR in the vcpu but won't actually set it on POWER9 hardware. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:55:33 +11:00
Michael Neuling	e8ebedbf31	KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9 POWER7 compat mode guests can use h_set_dabr on POWER9. POWER9 should use the DAWR but since it's disabled there we can't. This returns H_UNSUPPORTED on a h_set_dabr() on POWER9 where the DAWR is disabled. Current Linux guests ignore this error, so they will silently not get the DAWR (sigh). The same error code is being used by POWERVM in this case. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:55:32 +11:00
Michael Neuling	398e712c00	KVM: PPC: Book3S HV: Return error from h_set_mode(SET_DAWR) on POWER9 Return H_P2 on a h_set_mode(SET_DAWR) on POWER9 where the DAWR is disabled. Current Linux guests ignore this error, so they will silently not get the DAWR (sigh). The same error code is being used by POWERVM in this case. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:55:32 +11:00
Michael Neuling	9bc2bd5d9d	powerpc: Update xmon to use ppc_breakpoint_available() The 'bd' command will now print an error and not set the breakpoint on P9. Signed-off-by: Michael Neuling <mikey@neuling.org> [mpe: Unsplit quoted string] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:55:11 +11:00
Michael Neuling	85ce9a5d57	powerpc: Update ptrace to use ppc_breakpoint_available() This updates the ptrace code to use ppc_breakpoint_available(). We now advertise via PPC_PTRACE_GETHWDBGINFO zero breakpoints when the DAWR is missing (ie. POWER9). This results in GDB falling back to software emulation of the breakpoint (which is slow). For the features advertised by PPC_PTRACE_GETHWDBGINFO, we keep advertising DAWR because, if we don't, GDB assumes 1 breakpoint irrespective of the number of breakpoints advertised. GDB then fails later when trying to set this one breakpoint. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:52:44 +11:00
Michael Neuling	404b27d66e	powerpc: Add ppc_breakpoint_available() Add ppc_breakpoint_available() to determine if a breakpoint is available currently via the DAWR or DABR. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:52:43 +11:00
Sam Bobroff	34a286a4ac	powerpc/eeh: Add eeh_state_active() helper Checking for a "fully active" device state requires testing two flag bits, which is open coded in several places, so add a function to do it. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:45:19 +11:00
Sam Bobroff	54048cf876	powerpc/eeh: Factor out common code eeh_reset_device() The caller will always pass NULL for 'rmv_data' when 'eeh_aware_driver' is true, so the first two calls to eeh_pe_dev_traverse() can be combined without changing behaviour as can the two arms of the final 'if' block. This should not change behaviour. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:45:14 +11:00
Sam Bobroff	d3136d7712	powerpc/eeh: Remove always-true tests in eeh_reset_device() eeh_reset_device() tests the value of 'bus' more than once but the only caller, eeh_handle_normal_device() does this test itself and will never pass NULL. So, remove the dead tests. This should not change behaviour. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:45:00 +11:00
Sam Bobroff	5fd13460af	powerpc/eeh: Clarify arguments to eeh_reset_device() It is currently difficult to understand the behaviour of eeh_reset_device() due to the way its parameters are used. In particular, when 'bus' is NULL, its value is still necessary so the same value is looked up again locally under a different name ('frozen_bus') but behaviour is changed. To clarify this, add a new parameter 'driver_eeh_aware', and have the caller set it when it would have passed NULL for 'bus' and always pass a value for 'bus'. Then change any test that was on 'bus' to one on '!driver_eeh_aware' and replace uses of 'frozen_bus' with 'bus'. Also update the function's comment. This should not change behaviour. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:59 +11:00
Sam Bobroff	cd95f804ac	powerpc/eeh: Rename frozen_bus to bus in eeh_handle_normal_event() The name "frozen_bus" is misleading: it's not necessarily frozen, it's just the PE's PCI bus. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:59 +11:00
Sam Bobroff	5b86ac9e91	powerpc/eeh: Remove misleading test in eeh_handle_normal_event() Remove a test that checks if "frozen_bus" is NULL, because it cannot have changed since it was tested at the start of the function and so must be true here. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:58 +11:00
Sam Bobroff	63457b144b	powerpc/eeh: Fix misleading comment in __eeh_addr_cache_get_device() Commit "0ba178888b05 powerpc/eeh: Remove reference to PCI device" removed a call to pci_dev_get() from __eeh_addr_cache_get_device() but did not update the comment to match. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:58 +11:00
Sam Bobroff	37fd812587	powerpc/eeh: Manage EEH_PE_RECOVERING inside eeh_handle_normal_event() Currently the EEH_PE_RECOVERING flag for a PE is managed by both the caller and callee of eeh_handle_normal_event() (among other places not considered here). This is complicated by the fact that the PE may or may not have been invalidated by the call. So move the callee's handling into eeh_handle_normal_event(), which clarifies it and allows the return type to be changed to void (because it no longer needs to indicate that the PE has been invalidated). This should not change behaviour except in eeh_event_handler() where it was previously possible to cause eeh_pe_state_clear() to be called on an invalid PE, which is now avoided. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:58 +11:00
Sam Bobroff	6870178071	powerpc/eeh: Remove eeh_handle_event() The function eeh_handle_event(pe) does nothing other than switching between calling eeh_handle_normal_event(pe) and eeh_handle_special_event(). However it is only called in two places, one where pe can't be NULL and the other where it must be NULL (see eeh_event_handler()) so it does nothing but obscure the flow of control. So, remove it. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:57 +11:00
Alexey Kardashevskiy	d41ce7b1bc	powerpc/powernv/npu: Do not try invalidating 32bit table when 64bit table is enabled GPUs and the corresponding NVLink bridges get different PEs as they have separate translation validation entries (TVEs). We put these PEs to the same IOMMU group so they cannot be passed through separately. So the iommu_table_group_ops::set_window/unset_window for GPUs do set tables to the NPU PEs as well which means that iommu_table's list of attached PEs (iommu_table_group_link) has both GPU and NPU PEs linked. This list is used for TCE cache invalidation. The problem is that NPU PE has just a single TVE and can be programmed to point to 32bit or 64bit windows while GPU PE has two (as any other PCI device). So we end up having an 32bit iommu_table struct linked to both PEs even though only the 64bit TCE table cache can be invalidated on NPU. And a relatively recent skiboot detects this and prints errors. This changes GPU's iommu_table_group_ops::set_window/unset_window to make sure that NPU PE is only linked to the table actually used by the hardware. If there are two tables used by an IOMMU group, the NPU PE will use the last programmed one which with the current use scenarios is expected to be a 64bit one. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:57 +11:00
Alexey Kardashevskiy	b574df9488	powerpc/mm: Fix typo in comments Fixes: `912cc87a6` "powerpc/mm/radix: Add LPID based tlb flush helpers" Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:56 +11:00
Alexey Kardashevskiy	a8c0bf3c62	powerpc/lpar/debug: Initialize flags before printing debug message With DEBUG enabled, there is a compile error: "error: ‘flags’ is used uninitialized in this function". This moves pr_devel() a little further, to where @flags is initialized. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:56 +11:00
Alexey Kardashevskiy	79b4686857	powerpc/init: Do not advertise radix during client-architecture-support Currently the pseries kernel advertises radix MMU support even if the actual support is disabled via the CONFIG_PPC_RADIX_MMU option. This adds a check for CONFIG_PPC_RADIX_MMU to avoid advertising radix to the hypervisor. Suggested-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:55 +11:00
Mauricio Faria de Oliveira	bde709a708	powerpc/mm: Fix section mismatch warning in stop_machine_change_mapping() Fix the warning messages for stop_machine_change_mapping(), and a number of other affected functions in its call chain. All modified functions are under CONFIG_MEMORY_HOTPLUG, so __meminit is okay (keeps them / does not discard them). Boot-tested on powernv/power9/radix-mmu and pseries/power8/hash-mmu. $ make -j$(nproc) CONFIG_DEBUG_SECTION_MISMATCH=y vmlinux ... MODPOST vmlinux.o WARNING: vmlinux.o(.text+0x6b130): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping() The function stop_machine_change_mapping() references the function __meminit create_physical_mapping(). This is often because stop_machine_change_mapping lacks a __meminit annotation or the annotation of create_physical_mapping is wrong. WARNING: vmlinux.o(.text+0x6b13c): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping() The function stop_machine_change_mapping() references the function __meminit create_physical_mapping(). This is often because stop_machine_change_mapping lacks a __meminit annotation or the annotation of create_physical_mapping is wrong. ... Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:55 +11:00
Michael Ellerman	d6fbe1c55c	powerpc/64s: Wire up cpu_show_spectre_v2() Add a definition for cpu_show_spectre_v2() to override the generic version. This has several permutations; though in practice some may not occur, we cater for any combination. The most verbose is: Mitigation: Indirect branch serialisation (kernel only), Indirect branch cache disabled, ori31 speculation barrier enabled We don't treat the ori31 speculation barrier as a mitigation on its own, because it has to be used by code in order to be a mitigation and we don't know if userspace is doing that. So if that's all we see we say: Vulnerable, ori31 speculation barrier enabled Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:55 +11:00
Michael Ellerman	56986016cb	powerpc/64s: Wire up cpu_show_spectre_v1() Add a definition for cpu_show_spectre_v1() to override the generic version. Currently this just prints "Not affected" or "Vulnerable" based on the firmware flag. Although the kernel does have array_index_nospec() in a few places, we haven't yet audited all the powerpc code to see where it's necessary, so for now we don't list that as a mitigation. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:54 +11:00
Michael Ellerman	2e4a16161f	powerpc/pseries: Use the security flags in pseries_setup_rfi_flush() Now that we have the security flags we can simplify the code in pseries_setup_rfi_flush() because the security flags have pessimistic defaults. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:54 +11:00
Michael Ellerman	37c0bdd00d	powerpc/powernv: Use the security flags in pnv_setup_rfi_flush() Now that we have the security flags we can significantly simplify the code in pnv_setup_rfi_flush(), because we can use the flags instead of checking device tree properties and because the security flags have pessimistic defaults. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:53 +11:00
Michael Ellerman	ff348355e9	powerpc/64s: Enhance the information in cpu_show_meltdown() Now that we have the security feature flags we can make the information displayed in the "meltdown" file more informative. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:53 +11:00
Michael Ellerman	8ad3304156	powerpc/64s: Move cpu_show_meltdown() This landed in setup_64.c for no good reason other than we had nowhere else to put it. Now that we have a security-related file, that is a better place for it so move it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:53 +11:00
Michael Ellerman	77addf6e95	powerpc/powernv: Set or clear security feature flags Now that we have feature flags for security related things, set or clear them based on what we see in the device tree provided by firmware. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:52 +11:00
Michael Ellerman	f636c14790	powerpc/pseries: Set or clear security feature flags Now that we have feature flags for security related things, set or clear them based on what we receive from the hypercall. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:52 +11:00
Michael Ellerman	9a868f6343	powerpc: Add security feature flags for Spectre/Meltdown This commit adds security feature flags to reflect the settings we receive from firmware regarding Spectre/Meltdown mitigations. The feature names reflect the names we are given by firmware on bare metal machines. See the hostboot source for details. Arguably these could be firmware features, but that then requires them to be read early in boot so they're available prior to asm feature patching, but we don't actually want to use them for patching. We may also want to dynamically update them in future, which would be incompatible with the way firmware features work (at the moment at least). So for now just make them separate flags. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:51 +11:00
Michael Ellerman	c4bc36628d	powerpc/pseries: Add new H_GET_CPU_CHARACTERISTICS flags Add some additional values which have been defined for the H_GET_CPU_CHARACTERISTICS hypercall. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:51 +11:00
Michael Ellerman	921bc6cf80	powerpc/rfi-flush: Call setup_rfi_flush() after LPM migration We might have migrated to a machine that uses a different flush type, or doesn't need flushing at all. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:14 +11:00
Mauricio Faria de Oliveira	0063d61ccf	powerpc/rfi-flush: Differentiate enabled and patched flush types Currently the rfi-flush messages print 'Using <type> flush' for all enabled_flush_types, but that is not necessarily true -- as now the fallback flush is always enabled on pseries, but the fixup function overwrites its nop/branch slot with other flush types, if available. So, replace the 'Using <type> flush' messages with '<type> flush is available'. Also, print the patched flush types in the fixup function, so users can know what is (not) being used (e.g., the slower, fallback flush, or no flush type at all if flush is disabled via the debugfs switch). Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:14 +11:00
Michael Ellerman	84749a58b6	powerpc/rfi-flush: Always enable fallback flush on pseries This ensures the fallback flush area is always allocated on pseries, so in case a LPAR is migrated from a patched to an unpatched system, it is possible to enable the fallback flush in the target system. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:13 +11:00
Michael Ellerman	abf110f3e1	powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again For PowerVM migration we want to be able to call setup_rfi_flush() again after we've migrated the partition. To support that we need to check that we're not trying to allocate the fallback flush area after memblock has gone away (i.e., boot-time only). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:12 +11:00
Michael Ellerman	1e2a9fc749	powerpc/rfi-flush: Move the logic to avoid a redo into the debugfs code rfi_flush_enable() includes a check to see if we're already enabled (or disabled), and in that case does nothing. But that means calling setup_rfi_flush() a 2nd time doesn't actually work, which is a bit confusing. Move that check into the debugfs code, where it really belongs. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:11 +11:00
Madhavan Srinivasan	ac96588d98	powerpc/perf: Add blacklisted events for Power9 DD2.2 These events either do not count, or do not count correctly, so to prevent user confusion block counting them at all. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:11 +11:00
Madhavan Srinivasan	64acab4e4f	powerpc/perf: Add blacklisted events for Power9 DD2.1 These events either do not count, or do not count correctly, so to prevent user confusion block counting them at all. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:10 +11:00
Madhavan Srinivasan	b58064da04	powerpc/perf: Infrastructure to support addition of blacklisted events Introduce code to support addition of blacklisted events for a processor version. Blacklisted events are events that are known to not count correctly on that CPU revision, and so should be prevented from being counted so as to avoid user confusion. A 'pointer' and 'int' variable to hold the number of events are added to 'struct power_pmu', along with a generic function to loop through the list to validate the given event. Generic function 'is_event_blacklisted' is called in power_pmu_event_init() to detect and reject early. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:10 +11:00
Madhavan Srinivasan	cd1231d703	powerpc/perf: Prevent kernel address leak via perf_get_data_addr() Sampled Data Address Register (SDAR) is a 64-bit register that contains the effective address of the storage operand of an instruction that was being executed, possibly out-of-order, at or around the time that the Performance Monitor alert occurred. In certain scenarios SDAR happens to contain a kernel address even for userspace-only sampling. Add checks to prevent it. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:09 +11:00
Madhavan Srinivasan	bb19af8160	powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer The current Branch History Rolling Buffer (BHRB) code does not check for any privilege levels before updating the data from BHRB. This could leak kernel addresses to userspace even when profiling only with userspace privileges. Add proper checks to prevent it. Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:09 +11:00
Michael Ellerman	e1ebd0e5b9	powerpc/perf: Fix kernel address leak via sampling registers Current code in power_pmu_disable() does not clear the sampling registers like Sampling Instruction Address Register (SIAR) and Sampling Data Address Register (SDAR) after disabling the PMU. Since these are userspace readable and could contain kernel addresses, add code to explicitly clear the content of these registers. Also add a "context synchronizing instruction" to enforce no further updates to these registers as suggested by Power ISA v3.0B. From section 9.4, on page 1108: "If an mtspr instruction is executed that changes the value of a Performance Monitor register other than SIAR, SDAR, and SIER, the change is not guaranteed to have taken effect until after a subsequent context synchronizing instruction has been executed (see Chapter 11. "Synchronization Requirements for Context Alterations" on page 1133)." Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Massage change log and add ISA reference] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:08 +11:00
Paul Mackerras	dbfcf3cb9c	powerpc/64: Call H_REGISTER_PROC_TBL when running as a HPT guest on POWER9 On POWER9, since commit `cc3d294013` ("powerpc/64: Enable use of radix MMU under hypervisor on POWER9", 2017-01-30), we set both the radix and HPT bits in the client-architecture-support (CAS) vector, which tells the hypervisor that we can do either radix or HPT. According to PAPR, if we use this combination we are promising to do a H_REGISTER_PROC_TBL hcall later on to let the hypervisor know whether we are doing radix or HPT. We currently do this call if we are doing radix but not if we are doing HPT. If the hypervisor is able to support both radix and HPT guests, it would be entitled to defer allocation of the HPT until the H_REGISTER_PROC_TBL call, and to fail any attempts to create HPTEs until the H_REGISTER_PROC_TBL call. Thus we need to do a H_REGISTER_PROC_TBL call when we are doing HPT; otherwise we may crash at boot time. This adds the code to call H_REGISTER_PROC_TBL in this case, before we attempt to create any HPT entries using H_ENTER. Fixes: `cc3d294013` ("powerpc/64: Enable use of radix MMU under hypervisor on POWER9") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:08 +11:00
Nicholas Piggin	52396500f9	powerpc/64s: Fix i-side SLB miss bad address handler saving nonvolatile GPRs The SLB bad address handler's trap number fixup does not preserve the low bit that indicates nonvolatile GPRs have not been saved. This leads save_nvgprs to skip saving them, and subsequent functions and return from interrupt will think they are saved. This causes kernel branch-to-garbage debugging to not have correct registers, and can also cause userspace to have its registers clobbered after a segfault. Fixes: `f0f558b131` ("powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-26 07:40:17 +11:00
Michael Forney	a670b0b4ae	kbuild: Use ls(1) instead of stat(1) to obtain file size stat(1) is not standardized and different implementations have their own (conflicting) flags for querying the size of a file. ls(1) provides the same information (value of st.st_size) in the 5th column, except when the file is a character or block device. This output is standardized[0]. The -n option turns on -l, which writes lines formatted like "%s %u %s %s %u %s %s\n", <file mode>, <number of links>, <owner name>, <group name>, <size>, <date and time>, <pathname> but instead of writing the <owner name> and <group name>, it writes the numeric owner and group IDs (this avoids /etc/passwd and /etc/group lookups as well as potential field splitting issues). The <size> field is specified as "the value that would be returned for the file in the st_size field of struct stat". To avoid duplicating logic in several locations in the tree, create scripts/file-size.sh and update callers to use that instead of stat(1). [0] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ls.html#tag_20_73_10 Signed-off-by: Michael Forney <forney@google.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>	2018-03-26 02:01:24 +09:00
Nicholas Piggin	f49821ee32	kbuild: rename built-in.o to built-in.a Incremental linking is gone, so rename built-in.o to built-in.a, which is the usual extension for archive files. This patch does two things, first is a simple search/replace: git grep -l 'built-in\.o' \| xargs sed -i 's/built-in\.o/built-in\.a/g' The second is to invert nesting of nested text manipulations to avoid filtering built-in.a out from libs-y2: -libs-y2 := $(filter-out %.a, $(patsubst %/, %/built-in.a, $(libs-y))) +libs-y2 := $(patsubst %/, %/built-in.a, $(filter-out %.a, $(libs-y))) Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>	2018-03-26 02:01:19 +09:00
Ingo Molnar	ea2301b622	Merge branch 'linus' into x86/dma, to resolve a conflict with upstream Conflicts: arch/x86/mm/init_64.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-24 09:25:26 +01:00
Michael Ellerman	a26cf1c9fe	Merge branch 'topic/ppc-kvm' into next This brings in two series from Paul, one of which touches KVM code and may need to be merged into the kvm-ppc tree to resolve conflicts.	2018-03-24 08:43:18 +11:00
Paolo Bonzini	e13c2ac512	Merge tag 'kvm-ppc-fixes-4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master PPC KVM fix - Fix a bug causing occasional machine check exceptions on POWER8 hosts, introduced in 4.16-rc1.	2018-03-23 18:21:49 +01:00
Paul Mackerras	681c617b7c	KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors, where the contents of the TEXASR can get corrupted while a thread is in fake suspend state. The workaround is for the instruction emulation code to use the value saved at the most recent guest exit in real suspend mode. We achieve this by simply not saving the TEXASR into the vcpu struct on an exit in fake suspend state. We also have to take care to set the orig_texasr field only on guest exit in real suspend state. This also means that on guest entry in fake suspend state, TEXASR will be restored to the value it had on the last exit in real suspend state, effectively counteracting any hardware-caused corruption. This works because TEXASR may not be written in suspend state. With this, the guest might see the wrong values in TEXASR if it reads it while in suspend state, but will see the correct value in non-transactional state (e.g. after a treclaim), and treclaim will work correctly. With this workaround, the code will actually run slightly faster, and will operate correctly on systems without the TEXASR bug (since TEXASR may not be written in suspend state, and is only changed by failure recording, which will have already been done before we get into fake suspend state). Therefore these changes are not made subject to a CPU feature bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:39:17 +11:00
Suraj Jitindar Singh	87a11bb6a7	KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors, where a treclaim performed in fake suspend mode can cause subsequent reads from the XER register to return inconsistent values for the SO (summary overflow) bit. The inconsistent SO bit state can potentially be observed on any thread in the core. We have to do the treclaim because that is the only way to get the thread out of suspend state (fake or real) and into non-transactional state. The workaround for the bug is to force the core into SMT4 mode before doing the treclaim. This patch adds the code to do that, conditional on the CPU_FTR_P9_TM_XER_SO_BUG feature bit. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:39:16 +11:00
Paul Mackerras	4bb3c7a020	KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9 POWER9 has hardware bugs relating to transactional memory and thread reconfiguration (changes to hardware SMT mode). Specifically, the core does not have enough storage to store a complete checkpoint of all the architected state for all four threads. The DD2.2 version of POWER9 includes hardware modifications designed to allow hypervisor software to implement workarounds for these problems. This patch implements those workarounds in KVM code so that KVM guests see a full, working transactional memory implementation. The problems center around the use of TM suspended state, where the CPU has a checkpointed state but execution is not transactional. The workaround is to implement a "fake suspend" state, which looks to the guest like suspended state but the CPU does not store a checkpoint. In this state, any instruction that would cause a transition to transactional state (rfid, rfebb, mtmsrd, tresume) or would use the checkpointed state (treclaim) causes a "soft patch" interrupt (vector 0x1500) to the hypervisor so that it can be emulated. The trechkpt instruction also causes a soft patch interrupt. On POWER9 DD2.2, we avoid returning to the guest in any state which would require a checkpoint to be present. The trechkpt in the guest entry path which would normally create that checkpoint is replaced by either a transition to fake suspend state, if the guest is in suspend state, or a rollback to the pre-transactional state if the guest is in transactional state. Fake suspend state is indicated by a flag in the PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and reads back as 0. On exit from the guest, if the guest is in fake suspend state, we still do the treclaim instruction as we would in real suspend state, in order to get into non-transactional state, but we do not save the resulting register state since there was no checkpoint. 
Emulation of the instructions that cause a softpatch interrupt is handled in two paths. If the guest is in real suspend mode, we call kvmhv_p9_tm_emulation_early() to handle the cases where the guest is transitioning to transactional state. This is called before we do the treclaim in the guest exit path; because we haven't done treclaim, we can get back to the guest with the transaction still active. If the instruction is a case that kvmhv_p9_tm_emulation_early() doesn't handle, or if the guest is in fake suspend state, then we proceed to do the complete guest exit path and subsequently call kvmhv_p9_tm_emulation() in host context with the MMU on. This handles all the cases including the cases that generate program interrupts (illegal instruction or TM Bad Thing) and facility unavailable interrupts. The emulation is reasonably straightforward and is mostly concerned with checking for exception conditions and updating the state of registers such as MSR and CR0. The treclaim emulation takes care to ensure that the TEXASR register gets updated as if it were the guest treclaim instruction that had done failure recording, not the treclaim done in hypervisor state in the guest exit path. With this, the KVM_CAP_PPC_HTM capability returns true (1) even if transactional memory is not available to host userspace. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:39:13 +11:00
Paul Mackerras	7672691a08	powerpc/powernv: Provide a way to force a core into SMT4 mode POWER9 processors up to and including "Nimbus" v2.2 have hardware bugs relating to transactional memory and thread reconfiguration. One of these bugs has a workaround which is to get the core into SMT4 state temporarily. This workaround is only needed when running bare-metal. This patch provides a function which gets the core into SMT4 mode by preventing threads from going to a stop state, and waking up those which are already in a stop state. Once at least 3 threads are not in a stop state, the core will be in SMT4 and we can continue. To do this, we add a "dont_stop" flag to the paca to tell the thread not to go into a stop state. If this flag is set, power9_idle_stop() just returns immediately with a return value of 0. The pnv_power9_force_smt4_catch() function does the following: 1. Set the dont_stop flag for each thread in the core, except ourselves (in fact we use an atomic_inc() in case more than one thread is calling this function concurrently). 2. See how many threads are awake, indicated by their requested_psscr field in the paca being 0. If this is at least 3, skip to step 5. 3. Send a doorbell interrupt to each thread that was seen as being in a stop state in step 2. 4. Until at least 3 threads are awake, scan the threads to which we sent a doorbell interrupt and check if they are awake now. This relies on the following properties: - Once dont_stop is non-zero, requested_psscr can't go from zero to non-zero, except transiently (and without the thread doing stop). - requested_psscr being zero guarantees that the thread isn't in a state-losing stop state where thread reconfiguration could occur. - Doing stop with a PSSCR value of 0 won't be a state-losing stop and thus won't allow thread reconfiguration. - Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core must be in SMT4 mode, since SMT modes are powers of 2. 
This does add a sync to power9_idle_stop(), which is necessary to provide the correct ordering between setting requested_psscr and checking dont_stop. The overhead of the sync should be unnoticeable compared to the latency of going into and out of a stop state. Because some objected to incurring this extra latency on systems where the XER[SO] bug is not relevant, I have put the test in power9_idle_stop inside a feature section. This means that pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will probably hang the system. In order to cater for uses where the caller has an operation that has to be done while the core is in SMT4, the core continues to be kept in SMT4 after pnv_power9_force_smt4_catch() function returns, until the pnv_power9_force_smt4_release() function is called. It undoes the effect of step 1 above and allows the other threads to go into a stop state. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:39:11 +11:00
Paul Mackerras	b5af4f2793	powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2 This adds a CPU feature bit which is set for POWER9 "Nimbus" DD2.2 processors which will be used to enable the hypervisor to assist hardware with the handling of checkpointed register values while the CPU is in suspend state, in order to work around hardware bugs. The hardware assistance for these workarounds introduced a new hardware bug relating to the XER[SO] bit. We add a separate feature bit for this bug in case future chips fix it while still requiring the hypervisor assistance with suspend state. When the dt_cpu_ftrs subsystem is in use, the software assistance can be enabled using a "tm-suspend-hypervisor-assist" node in the device tree, and a "tm-suspend-xer-so-bug" node enables the workarounds for the XER[SO] bug. In the absence of such nodes, a quirk enables both for POWER9 "Nimbus" DD2.2 processors. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:39:09 +11:00
Paul Mackerras	9bbf0b576d	powerpc: Free up CPU feature bits on 64-bit machines This moves all the CPU feature bits that are only used on 32-bit machines to the top 20 bits of the CPU feature word and arranges for them to be defined only in 32-bit builds. The features that are common to 32-bit and 64-bit machines are moved to bits 0-11 of the CPU feature word. This means that for 64-bit platforms, bits 44-63 can now be used for new features that only exist on 64-bit machines. (These bit numbers are counting from the right, i.e. the LSB is bit 0.) Because CPU_FTR_L3_DISABLE_NAP moved from the low 16 bits to the high 16 bits, we have to adjust some assembly code. Also, CPU_FTR_EMB_HV moved from the high 16 bits to the low 16 bits. Note that CPU_FTR_REAL_LE only applies to 64-bit chips, because only 64-bit chips (POWER6, 7, 8, 9) have a true little-endian mode that is a CPU execution mode as opposed to being a page attribute. With this we now have 20 free CPU feature bits on 64-bit machines. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:38:51 +11:00
Paul Mackerras	dd0efb3f11	powerpc: Book E: Remove unused CPU_FTR_L2CSR bit The CPU_FTR_L2CSR bit is never tested anywhere, so let's reclaim the bit. The last usage was removed in `86d63363de` ("powerpc/e500mc: Remove dead L2 flushing code in idle_e500.S") (Jun 2015). Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:38:00 +11:00
Paul Mackerras	c0d64cf9fe	powerpc: Use feature bit for RTC presence rather than timebase presence All PowerPC CPUs other than the original PPC601 have a timebase register rather than the "real-time clock" (RTC) register that the PPC601 (and the original POWER and POWER2 CPUs) had. Currently we have a CPU feature bit to indicate the presence of the timebase, but it makes more sense to use a bit to indicate the unusual situation rather than the common situation. This therefore defines a CPU_FTR_USE_RTC bit in place of the CPU_FTR_USE_TB bit, and arranges for it to be set on PPC601 systems. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:36:45 +11:00
Aneesh Kumar K.V	a5d4b5891c	powerpc/mm: Fixup tlbie vs store ordering issue on POWER9 On POWER9, under some circumstances, a broadcast TLB invalidation might complete before all previous stores have drained, potentially allowing stale stores to become visible after the invalidation. This works around it by doubling up those TLB invalidations, which was verified by HW to be sufficient to close the risk window. This will be documented in a yet-to-be-published erratum. Fixes: `1a472c9dba` ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Enable the feature in the DT CPU features code for all Power9, rename the feature to CPU_FTR_P9_TLBIE_BUG per benh.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-23 20:48:03 +11:00
Aneesh Kumar K.V	243fee3249	powerpc/mm/radix: Move the functions that does the actual tlbie closer No functional change. Just code movement to ease later code changes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-23 16:17:42 +11:00
Aneesh Kumar K.V	99491e2d0e	powerpc/mm/radix: Remove unused code These functions are not used in the code. Remove them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-23 16:17:39 +11:00
Benjamin Herrenschmidt	80a4ae202f	powerpc/mm: Workaround Nest MMU bug with TLB invalidations On POWER9 the Nest MMU may fail to invalidate some translations when doing a tlbie "by PID" or "by LPID" that is targeted at the TLB only and not the page walk cache. This works around it by forcing such invalidations to escalate to RIC=2 (full invalidation of TLB and PWC) when a coprocessor is in use for the context. Fixes: `03b8abedf4` ("cxl: Enable global TLBIs for cxl contexts") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Balbir Singh <bsingharora@gmail.com> [balbirs: fixed spelling and coding style to quiesce checkpatch.pl] Tested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-23 14:16:58 +11:00
Benjamin Herrenschmidt	aff6f8cb3e	powerpc/mm: Add tracking of the number of coprocessors using a context Currently, when using coprocessors (which use the Nest MMU), we simply increment the active_cpu count to force all TLB invalidations to become broadcast. Unfortunately, due to an erratum in POWER9, we will need to know more specifically that coprocessors are in use. This maintains a separate copros counter in the MMU context for that purpose. NB. The commit mentioned in the fixes tag below is not at fault for the bug we're fixing in this commit and the next, but this fix applies on top of the infrastructure it introduced. Fixes: `03b8abedf4` ("cxl: Enable global TLBIs for cxl contexts") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-23 14:14:31 +11:00
Paul Mackerras	cda4a14733	KVM: PPC: Book3S HV: Fix duplication of host SLB entries Since commit `6964e6a4e4` ("KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded", 2018-01-11), we have been seeing occasional machine check interrupts on POWER8 systems when running KVM guests, due to SLB multihit errors. This turns out to be due to the guest exit code reloading the host SLB entries from the SLB shadow buffer when the SLB was not previously cleared in the guest entry path. This can happen because the path which skips from the guest entry code to the guest exit code without entering the guest now does the skip before the SLB is cleared and loaded with guest values, but the host values are loaded after the point in the guest exit path that we skip to. To fix this, we move the code that reloads the host SLB values up so that it occurs just before the point in the guest exit code (the label guest_bypass:) where we skip to from the guest entry path. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Fixes: `6964e6a4e4` ("KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded") Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-03-23 13:42:51 +11:00
Nicholas Piggin	ff6781fd1b	powerpc/64s: Fix lost pending interrupt due to race causing lost update to irq_happened force_external_irq_replay() can be called in the do_IRQ path with interrupts hard enabled and soft disabled if may_hard_irq_enable() set MSR[EE]=1. It updates local_paca->irq_happened with a load, modify, store sequence. If a maskable interrupt hits during this sequence, it will go to the masked handler to be marked pending in irq_happened. This update will be lost when the interrupt returns and the store instruction executes. This can result in unpredictable latencies, timeouts, lockups, etc. Fix this by ensuring hard interrupts are disabled before modifying irq_happened. This could cause any maskable asynchronous interrupt to get lost, but it was noticed on a P9 SMP system doing RDMA NVMe target over 100GbE, so very high external interrupt rate and high IPI rate. The hang was bisected down to enabling doorbell interrupts for IPIs. These provided an interrupt type that could run at high rates in the do_IRQ path, stressing the race. Fixes: `1d607bb3bd` ("powerpc/irq: Add mechanism to force a replay of interrupts") Cc: stable@vger.kernel.org # v4.8+ Reported-by: Carol L. Soto <clsoto@us.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-23 08:41:40 +11:00
Christoph Hellwig	b6e05477c1	dma/direct: Handle the memory encryption bit in common code Give the basic phys_to_dma() and dma_to_phys() helpers a __-prefix and add the memory encryption mask to the non-prefixed versions. Use the __-prefixed versions directly instead of clearing the mask again in various places. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-13-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 10:01:59 +01:00
Rob Herring	78e5dfea84	powerpc: dts: replace 'linux,stdout-path' with 'stdout-path' 'linux,stdout-path' has been deprecated for some time in favor of 'stdout-path'. Now dtc will warn on occurrences of 'linux,stdout-path'. Search and replace all of the occurrences with 'stdout-path'. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-20 16:47:54 +11:00
Markus Elfring	a0828cf57a	powerpc: Use sizeof(foo) rather than sizeof(struct foo) It's slightly less error prone to use sizeof(foo) rather than specifying the type. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> [mpe: Consolidate into one patch, rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-20 16:47:53 +11:00
Matt Brown	31513207ce	powerpc: Remove unused flush_dcache_phys_range() The flush_dcache_phys_range() function is no longer used in the kernel. The last usage was removed in `c40785ad30` ("powerpc/dart: Use a cachable DART"). This patch removes the function and declaration. Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com> [mpe: Munge change log, include commit that removed last user] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-20 16:47:53 +11:00
Matt Brown	751ba79cc5	lib/raid6/altivec: Add vpermxor implementation for raid6 Q syndrome This patch uses the vpermxor instruction to optimise the raid6 Q syndrome. This instruction was made available with POWER8, ISA version 2.07. It allows for both vperm and vxor instructions to be done in a single instruction. This has been tested for correctness on a ppc64le vm with a basic RAID6 setup containing 5 drives. The performance benchmarks are from the raid6test in the /lib/raid6/test directory. These results are from an IBM Firestone machine with ppc64le architecture. The benchmark results show a 35% speed increase over the best existing algorithm for powerpc (altivec). The raid6test has also been run on a big-endian ppc64 vm to ensure it also works for big-endian architectures. Performance benchmarks: raid6: altivecx4 gen() 18773 MB/s raid6: altivecx8 gen() 19438 MB/s raid6: vpermxor4 gen() 25112 MB/s raid6: vpermxor8 gen() 26279 MB/s Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> [mpe: Add VPERMXOR macro so we can build with old binutils] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-20 16:47:25 +11:00
Christoph Hellwig	e184f2bf4d	scsi: remove the fdomain and fdomain_cs drivers These drivers haven't seen any recent bug fixing and are two of the last drivers using the scsi_module.c infrastructure that was deprecated 15 years ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2018-03-19 22:54:47 -04:00
Ingo Molnar	134933e557	Linux 4.16-rc6 Merge tag 'v4.16-rc6' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-19 20:37:35 +01:00
Paul Mackerras	58c5c276b4	KVM: PPC: Book3S HV: Handle 1GB pages in radix page fault handler This adds code to the radix hypervisor page fault handler to handle the case where the guest memory is backed by 1GB hugepages, and put them into the partition-scoped radix tree at the PUD level. The code is essentially analogous to the code for 2MB pages. This also rearranges kvmppc_create_pte() to make it easier to follow. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-03-19 10:08:53 +11:00
Paul Mackerras	f7caf712d8	KVM: PPC: Book3S HV: Streamline setting of reference and change bits When using the radix MMU, we can get hypervisor page fault interrupts with the DSISR_SET_RC bit set in DSISR/HSRR1, indicating that an attempt to set the R (reference) or C (change) bit in a PTE atomically failed. Previously we would find the corresponding Linux PTE and check the permission and dirty bits there, but this is not really necessary since we only need to do what the hardware was trying to do, namely set R or C atomically. This removes the code that reads the Linux PTE and just updates the partition-scoped PTE, having first checked that it is still present, and if the access is a write, that the PTE still has write permission. Furthermore, we now check whether any other relevant bits are set in DSISR, and if there are, then we proceed with the rest of the function in order to handle whatever condition they represent, instead of returning to the guest as we did previously. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-03-19 10:08:44 +11:00
Paul Mackerras	c4c8a7643e	KVM: PPC: Book3S HV: Radix page fault handler optimizations This improves the handling of transparent huge pages in the radix hypervisor page fault handler. Previously, if a small page is faulted in to a 2MB region of guest physical space, that means that there is a page table pointer at the PMD level, which could never be replaced by a leaf (2MB) PMD entry. This adds the code to clear the PMD, invalidate the page walk cache and free the page table page in this situation, so that the leaf PMD entry can be created. This also adds code to check whether a PMD or PTE being inserted is the same as is already there (because of a race with another CPU that faulted on the same page) and if so, we don't replace the existing entry, meaning that we don't invalidate the PTE or PMD and do a TLB invalidation. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-03-19 10:08:38 +11:00
Paul Mackerras	39c983ea0f	KVM: PPC: Remove unused kvm_unmap_hva callback Since commit `fb1522e099` ("KVM: update to new mmu_notifier semantic v2", 2017-08-31), the MMU notifier code in KVM no longer calls the kvm_unmap_hva callback. This removes the PPC implementations of kvm_unmap_hva(). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-03-19 10:08:29 +11:00
Khalid Aziz	9035cf9a97	mm: Add address parameter to arch_validate_prot() A protection flag may not be valid across entire address space and hence arch_validate_prot() might need the address a protection bit is being set on to ensure it is a valid protection flag. For example, sparc processors support memory corruption detection (as part of ADI feature) flag on memory addresses mapped on to physical RAM but not on PFN mapped pages or addresses mapped on to devices. This patch adds address to the parameters being passed to arch_validate_prot() so protection bits can be validated in the relevant context. Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Khalid Aziz <khalid@gonehiking.org> Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-18 07:38:47 -07:00
Peter Zijlstra	edb39592a5	perf: Fix sibling iteration Mark noticed that the change to sibling_list changed some iteration semantics; because previously we used group_list as list entry, sibling events would always have an empty sibling_list. But because we now use sibling_list for both list head and list entry, siblings will report as having siblings. Fix this with a custom for_each_sibling_event() iterator. Fixes: `8343aae661` ("perf/core: Remove perf_event::group_entry") Reported-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: vincent.weaver@maine.edu Cc: alexander.shishkin@linux.intel.com Cc: torvalds@linux-foundation.org Cc: alexey.budankov@linux.intel.com Cc: valery.cherepennikov@intel.com Cc: eranian@google.com Cc: acme@redhat.com Cc: linux-tip-commits@vger.kernel.org Cc: davidcc@google.com Cc: kan.liang@intel.com Cc: Dmitry.Prohorov@intel.com Cc: jolsa@redhat.com Link: https://lkml.kernel.org/r/20180315170129.GX4043@hirez.programming.kicks-ass.net	2018-03-16 20:44:12 +01:00
Paolo Bonzini	52be7a467e	Merge tag 'kvm-ppc-fixes-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master Fix for PPC KVM for 4.16 - Fix bug leading to lost IPIs on POWER9 and hence to other CPUs reporting lockups in smp_call_function_many().	2018-03-15 21:57:26 +01:00
Alexandre Belloni	7004263bd4	powerpc/5200: dts: digsy_mtc.dts: fix rv3029 compatible The proper compatible for rv3029 is microcrystal,rv3029. Acked-by: Anatolij Gustschin <agust@denx.de> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-14 22:28:17 +11:00
Alexandre Belloni	890ae79797	powerpc/time: stop validating rtc_time in .read_time The RTC core is always calling rtc_valid_tm after the read_time callback. It is not necessary to call it just before returning from the callback. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-14 22:27:33 +11:00
Michael Ellerman	e4b7990022	powerpc/64s: Fix NULL AT_BASE_PLATFORM when using DT CPU features When running virtualised the powerpc kernel is able to run the system in "compat mode" - which means the kernel and hardware are pretending to userspace that the CPU is an older version than it actually is. AT_BASE_PLATFORM is an AUXV entry that we export to userspace for use when we're running in that mode, which tells userspace the "platform" string for the real CPU version, as opposed to the faked version. Although we don't support compat mode when using DT CPU features, and arguably don't need to set AT_BASE_PLATFORM, the existing cputable based code always sets it even when we're running bare metal. That means the lack of AT_BASE_PLATFORM is a user-visible artifact of the fact that the kernel is using DT CPU features, which we don't want. So set it in the DT CPU features code also. This results in eg: $ LD_SHOW_AUXV=1 /bin/true | grep "AT_.*PLATFORM" AT_PLATFORM: power9 AT_BASE_PLATFORM:power9 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>	2018-03-14 20:20:00 +11:00
Sukadev Bhattiprolu	007bb7d6c7	powerpc/vas: Add a couple of trace points Add a couple of trace points in the VAS driver Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> [mpe: Add SPDX tag to new header] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-14 20:13:58 +11:00
Sukadev Bhattiprolu	45ddea8a73	powerpc/vas: Fix cleanup when VAS is not configured When VAS is not configured, unregister the platform driver. Also simplify cleanup by delaying vas debugfs init until we know VAS is configured. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-14 20:11:37 +11:00
Mark Hairgrove	720c84046c	powerpc/npu-dma.c: Fix crash after __mmu_notifier_register failure pnv_npu2_init_context wasn't checking the return code from __mmu_notifier_register. If __mmu_notifier_register failed, the npu_context was still assigned to the mm and the caller wasn't given any indication that things went wrong. Later on pnv_npu2_destroy_context would be called, which in turn called mmu_notifier_unregister and dropped mm->mm_count without having incremented it in the first place. This led to various forms of corruption like mm use-after-free and mm double-free. __mmu_notifier_register can fail with EINTR if a signal is pending, so this case can be frequent. This patch calls opal_npu_destroy_context on the failure paths, and makes sure not to assign mm->context.npu_context until past the failure points. Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com> Acked-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-14 20:04:43 +11:00
Paul Mackerras	a8b48a4dcc	KVM: PPC: Book3S HV: Fix trap number return from __kvmppc_vcore_entry This fixes a bug where the trap number that is returned by __kvmppc_vcore_entry gets corrupted. The effect of the corruption is that IPIs get ignored on POWER9 systems when the IPI is sent via a doorbell interrupt to a CPU which is executing in a KVM guest. The effect of the IPI being ignored is often that another CPU locks up inside smp_call_function_many() (and if that CPU is holding a spinlock, other CPUs then lock up inside raw_spin_lock()). The trap number is currently held in register r12 for most of the assembly-language part of the guest exit path. In that path, we call kvmppc_subcore_exit_guest(), which is a C function, without restoring r12 afterwards. Depending on the kernel config and the compiler, it may modify r12 or it may not, so some config/compiler combinations see the bug and others don't. To fix this, we arrange for the trap number to be stored on the stack from the 'guest_bypass:' label until the end of the function, then the trap number is loaded and returned in r12 as before. Cc: stable@vger.kernel.org # v4.8+ Fixes: `fd7bacbca4` ("KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-03-14 15:10:50 +11:00
Nicholas Piggin	014a32b30e	powerpc/mm/slice: remove radix calls to the slice code This is a tidy up which removes radix MMU calls into the slice code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:08 +11:00
Nicholas Piggin	d262bd5a73	powerpc/mm/slice: Use const pointers to cached slice masks where possible The slice_mask cache was a basic conversion which copied the slice mask into caller's structures, because that's how the original code worked. In most cases the pointer can be used directly instead, saving a copy and an on-stack structure. On POWER8, this increases vfork+exec+exit performance by 0.3% and reduces time to mmap+munmap a 64kB page by 2%. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:08 +11:00
Nicholas Piggin	7490755830	powerpc/mm/slice: remove dead code This code is never compiled in, and it gets broken by the next patch, so remove it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:07 +11:00
Nicholas Piggin	b8c9354914	powerpc/mm/slice: Switch to 3-operand slice bitops helpers This converts the slice_mask bit operation helpers to be the usual 3-operand kind, which allows 2 inputs to set a different output without an extra copy, which is used in the next patch. Adds slice_copy_mask, which will be used in the next patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:07 +11:00
Nicholas Piggin	ae3066bd1c	powerpc/mm/slice: implement slice_check_range_fits Rather than build slice masks from a range then use that to check for fit in a candidate mask, implement slice_check_range_fits that checks if a range fits in a mask directly. This allows several structures to be removed from stacks, and also we don't expect a huge range in a lot of these cases, so building and comparing a full mask is going to be more expensive than testing just one or two bits of the range. On POWER8, this increases vfork+exec+exit performance by 0.3% and reduces time to mmap+munmap a 64kB page by 5%. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:06 +11:00
Nicholas Piggin	5709f7cfd8	powerpc/mm/slice: implement a slice mask cache Calculating the slice mask can become a significant overhead for get_unmapped_area. This patch adds a struct slice_mask for each page size in the mm_context, and keeps these in sync with the slices psize arrays and slb_addr_limit. On Book3S/64 this adds 288 bytes to the mm_context_t for the slice mask caches. On POWER8, this increases vfork+exec+exit performance by 9.9% and reduces time to mmap+munmap a 64kB page by 28%. Reduces time to mmap+munmap by about 10% on 8xx. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:06 +11:00
Nicholas Piggin	830fd2d45a	powerpc/mm/slice: pass pointers to struct slice_mask where possible Pass around const pointers to struct slice_mask where possible, rather than copies of slice_mask, to reduce stack and call overhead. checkstack.pl gives, before: 0x00000d1c slice_get_unmapped_area [slice.o]: 592 0x00001864 is_hugepage_only_range [slice.o]: 448 0x00000754 slice_find_area_topdown [slice.o]: 400 0x00000484 slice_find_area_bottomup.isra.1 [slice.o]: 272 0x000017b4 slice_set_range_psize [slice.o]: 224 0x00000a4c slice_find_area [slice.o]: 128 0x00000160 slice_check_fit [slice.o]: 112 after: 0x00000ad0 slice_get_unmapped_area [slice.o]: 448 0x00001464 is_hugepage_only_range [slice.o]: 288 0x000006c0 slice_find_area [slice.o]: 144 0x0000016c slice_check_fit [slice.o]: 128 0x00000528 slice_find_area_bottomup.isra.2 [slice.o]: 128 0x000013e4 slice_set_range_psize [slice.o]: 128 This increases vfork+exec+exit performance by 1.5%. Reduces time to mmap+munmap a 64kB page by 17%. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:05 +11:00
Nicholas Piggin	5a807e04bd	powerpc/mm/slice: tidy lpsizes and hpsizes update loops Make these loops look the same, and change their form so the important part is not wrapped over so many lines. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:05 +11:00
Nicholas Piggin	1753dd1830	powerpc/mm/slice: Simplify and optimise slice context initialisation The slice state of an mm gets zeroed then initialised upon exec. This is the only caller of slice_set_user_psize now, so that can be removed and instead implement a faster and simplified approach that requires no locking or checking existing state. This speeds up vfork+exec+exit performance on POWER8 by 3%. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:05 +11:00
Michael Ellerman	ab83dc794c	powerpc/xmon: Move empty plpar_set_ciabr() into plpar_wrappers.h Now that plpar_wrappers.h has an #ifdef PSERIES we can move the empty version of plpar_set_ciabr() which xmon wants into there. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:04 +11:00
Michael Ellerman	7c09c1869c	powerpc: Rename plapr routines to plpar Back in 2013 we added some hypercall wrappers which misspelled "plpar" (P-series Logical PARtition) as "plapr". Visually they're hard to distinguish and it almost doesn't matter, but it is confusing when grepping to miss some calls because of the typo. They've also started spreading, so before they take over let's fix them all to be "plpar". Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:04 +11:00
Michael Ellerman	5017e875e4	powerpc/pseries: Make plpar_wrappers.h safe to include when PSERIES=n Currently plpar_wrappers.h is not safe to include when CONFIG_PPC_PSERIES=n, or at least it can be depending on other config options and so on. Fix that by wrapping the entire content in an ifdef. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:04 +11:00
Michael Ellerman	16560e8832	powerpc/pseries: Move smp_query_cpu_stopped() etc. out of plpar_wrappers.h smp_query_cpu_stopped() and related #defines are currently in plpar_wrappers.h. The function actually does an RTAS call, not an hcall, and basically has nothing to do with plpar_wrappers.h Move it into pseries.h, where it can easily be used by the only two callers in pseries/smp.c and pseries/hotplug-cpu.c. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:03 +11:00
Mathieu Malaterre	e82d70cf96	powerpc/32: Add missing prototypes for (early|machine)_init() early_init() and machine_init() have no prototype, add one in asm-prototypes.h. Fixes the following warnings (treated as error in W=1): arch/powerpc/kernel/setup_32.c:68:30: error: no previous prototype for ‘early_init’ arch/powerpc/kernel/setup_32.c:99:21: error: no previous prototype for ‘machine_init’ Signed-off-by: Mathieu Malaterre <malat@debian.org> [mpe: Move them to asm-prototypes.h, drop other functions] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:42 +11:00
Mathieu Malaterre	d15a261d87	powerpc/32: Make some functions static These functions can all be static, make it so. Signed-off-by: Mathieu Malaterre <malat@debian.org> [mpe: Combine a patch of Mathieu's with some other static conversions] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:42 +11:00
Mathieu Malaterre	ef85dffd42	powerpc: Avoid comparison of unsigned long >= 0 in __access_ok() Rewrite function-like macro into regular static inline function to avoid a warning during macro expansion. Fix warning (treated as error in W=1): ./arch/powerpc/include/asm/uaccess.h:52:35: error: comparison of unsigned expression >= 0 is always true (((size) == 0) || (((size) - 1) <= ((segment).seg - (addr))))) ^ Suggested-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Mathieu Malaterre <malat@debian.org> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:41 +11:00
Mathieu Malaterre	603b892200	powerpc: Avoid comparison of unsigned long >= 0 in pfn_valid() Rewrite comparison since all values compared are of type `unsigned long`. Instead of using unsigned properties and rewriting the original code as: (originally suggested by Segher Boessenkool <segher@kernel.crashing.org>) #define pfn_valid(pfn) \ (((pfn) - ARCH_PFN_OFFSET) < (max_mapnr - ARCH_PFN_OFFSET)) Prefer a static inline function to make code as readable as possible. Fix a warning (treated as error in W=1): arch/powerpc/include/asm/page.h:129:32: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits] #define pfn_valid(pfn) ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr) ^ Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:41 +11:00
Mathieu Malaterre	4f1f40f7b2	powerpc/prom: Remove warning on array size when empty When neither CONFIG_ALTIVEC, nor CONFIG_VSX or CONFIG_PPC64 is defined, the array feature_properties is defined as an empty array, which in turn triggers the following warning (treated as error on W=1): arch/powerpc/kernel/prom.c: In function ‘check_cpu_feature_properties’: arch/powerpc/kernel/prom.c:298:16: error: comparison of unsigned expression < 0 is always false for (i = 0; i < ARRAY_SIZE(feature_properties); ++i, ++fp) { ^ Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:40 +11:00
Mathieu Malaterre	bf7fb32dd5	powerpc: Add missing prototypes for ppc_select() & ppc_fadvise64_64() Add missing prototypes for ppc_select() & ppc_fadvise64_64() to header asm-prototypes.h. Fix the following warnings (treated as errors in W=1) arch/powerpc/kernel/syscalls.c:87:1: error: no previous prototype for ‘ppc_select’ arch/powerpc/kernel/syscalls.c:119:6: error: no previous prototype for ‘ppc_fadvise64_64’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:40 +11:00
Mathieu Malaterre	b0d876da1d	powerpc: Add missing prototypes for hw_breakpoint_handler() & arch_unregister_hw_breakpoint() In commit `5aae8a5370` ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors") function hw_breakpoint_handler() and arch_unregister_hw_breakpoint() were added without function prototypes in hw_breakpoint.h header. Fix the following warning(s) (treated as error in W=1): arch/powerpc/kernel/hw_breakpoint.c:106:6: error: no previous prototype for ‘arch_unregister_hw_breakpoint’ arch/powerpc/kernel/hw_breakpoint.c:209:5: error: no previous prototype for ‘hw_breakpoint_handler’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:39 +11:00
Mathieu Malaterre	b53875c4b4	powerpc: Add missing prototypes for sys_sigreturn() & sys_rt_sigreturn() Two functions did not have a prototype defined in signal.h header. Fix the following two warnings (treated as errors in W=1): arch/powerpc/kernel/signal_32.c:1135:6: error: no previous prototype for ‘sys_rt_sigreturn’ arch/powerpc/kernel/signal_32.c:1422:6: error: no previous prototype for ‘sys_sigreturn’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:39 +11:00
Mathieu Malaterre	0d60619e1c	powerpc: Add missing prototype for sys_debug_setcontext() In commit `81e7009ea4` ("powerpc: merge ppc signal.c and ppc64 signal32.c") the function sys_debug_setcontext was added without a prototype. Fix compilation warning (treated as error in W=1): arch/powerpc/kernel/signal_32.c:1227:5: error: no previous prototype for ‘sys_debug_setcontext’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:38 +11:00
Mathieu Malaterre	23a6d8b963	powerpc: Add missing prototype for init_IRQ() A function init_IRQ() was added without a prototype declared in header irq.h. Fix the following warning (treated as error in W=1): arch/powerpc/kernel/irq.c:662:13: error: no previous prototype for ‘init_IRQ’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:38 +11:00
Mathieu Malaterre	f5246862f8	powerpc: Add missing prototype for arch_irq_work_raise() In commit `4f8b50bbbe` ("irq_work, ppc: Fix up arch hooks") a new function arch_irq_work_raise() was added without a prototype in header irq_work.h. Fix the following warning (treated as error in W=1): arch/powerpc/kernel/time.c:523:6: error: no previous prototype for ‘arch_irq_work_raise’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:37 +11:00
Mathieu Malaterre	fd70d9f96d	powerpc: Add missing prototype for arch_dup_task_struct() In commit `55ccf3fe3f` ("fork: move the real prepare_to_copy() users to arch_dup_task_struct()") a new arch_dup_task_struct() was added without a prototype declared in thread_info.h header. Fix the following warning (treated as error in W=1): arch/powerpc/kernel/process.c:1609:5: error: no previous prototype for ‘arch_dup_task_struct’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:37 +11:00
Mathieu Malaterre	848092faa0	powerpc: Add missing prototype for time_init() The function time_init did not have a prototype defined in the time.h header. Fix the following warning (treated as error in W=1): arch/powerpc/kernel/time.c:1068:13: error: no previous prototype for ‘time_init’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:36 +11:00
Mathieu Malaterre	8b604faff7	powerpc: Add missing prototype for hdec_interrupt In commit `dabe859ec6` ("powerpc: Give hypervisor decrementer interrupts their own handler") an empty body function was added, but no prototype was declared. Fix warning (treated as error in W=1): arch/powerpc/kernel/time.c:629:6: error: no previous prototype for ‘hdec_interrupt’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:35 +11:00
Mathieu Malaterre	45b4d27a38	powerpc: Add missing prototype for slb_miss_bad_addr() In commit `f0f558b131` ("powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address"), the function slb_miss_bad_addr() was added without a prototype. This commit adds it. Fix a warning (treated as error in W=1): arch/powerpc/kernel/traps.c:1498:6: error: no previous prototype for ‘slb_miss_bad_addr’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:35 +11:00
Mathieu Malaterre	1cdf039bf8	powerpc/kernel: Make function __giveup_fpu() static __giveup_fpu() is never called outside process.c, so it can be static. That also means we don't need an empty definition in switch_to.h Signed-off-by: Mathieu Malaterre <malat@debian.org> [mpe: Also drop the empty version, rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:35 +11:00
Mathieu Malaterre	8b51e679a5	powerpc/embedded6xx: Make functions flipper_pic_init() & ug_udbg_putc() static Change signature of two functions, adding static keyword to prevent the following two warnings (treated as errors on W=1): arch/powerpc/platforms/embedded6xx/flipper-pic.c:135:28: error: no previous prototype for ‘flipper_pic_init’ arch/powerpc/platforms/embedded6xx/usbgecko_udbg.c:172:6: error: no previous prototype for ‘ug_udbg_putc’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:34 +11:00
Mathieu Malaterre	67b464a89c	powerpc/32: Mark both tmp variables as unused Since the value of `tmp` is never intended to be read, declare both `tmp` variables as unused. Fix warning (treated as error in W=1): arch/powerpc/kernel/signal_32.c: In function ‘sys_swapcontext’: arch/powerpc/kernel/signal_32.c:1048:16: error: variable ‘tmp’ set but not used arch/powerpc/kernel/signal_32.c: In function ‘sys_debug_setcontext’: arch/powerpc/kernel/signal_32.c:1234:16: error: variable ‘tmp’ set but not used Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:33 +11:00
Mathieu Malaterre	174b701d3d	powerpc/32: Move the inline keyword at the beginning of function declaration The inline keyword was not at the beginning of the function declaration. Fix the following warning (treated as error in W=1): arch/powerpc/lib/sstep.c:283:1: error: ‘inline’ is not at beginning of declaration static int nokprobe_inline copy_mem_in(u8 dest, unsigned long ea, int nb, arch/powerpc/lib/sstep.c:388:1: error: ‘inline’ is not at beginning of declaration static int nokprobe_inline copy_mem_out(u8 dest, unsigned long ea, int nb, Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:33 +11:00
Mathieu Malaterre	65e13c202d	powerpc/epapr: Move register keyword at the beginning of declaration Fix warning for all register unsigned long (0,3-12) that appear during W=1 compilation: ./arch/powerpc/include/asm/epapr_hcalls.h:479:2: warning: ‘register’ is not at beginning of declaration [-Wold-style-declaration] unsigned long register r[\d] asm("r[\d]"); Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:32 +11:00
Balbir Singh	5ee573e8ef	powerpc/powernv/mce: Don't silently restart the machine On MCE the current code will restart the machine with ppc_md.restart(). This case was extremely unlikely since prior to that a skiboot call is made and that resulted in a checkstop for analysis. With newer skiboots, on P9 we don't checkstop the box by default, instead we return back to the kernel to extract useful information at the time of the MCE. While we still get this information, this patch converts the restart to a panic(), so that if configured a dump can be taken and we can track and probably debug the potential issue causing the MCE. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:31 +11:00
Philippe Bergheaud	d6a90bb83b	powerpc/powernv: Enable tunneled operations P9 supports PCI tunneled operations (atomics and as_notify). This patch adds support for tunneled operations on powernv, with a new API, to be called by device drivers: pnv_pci_enable_tunnel() Enable tunnel operations, tell driver the 16-bit ASN indication used by kernel. pnv_pci_disable_tunnel() Disable tunnel operations. pnv_pci_set_tunnel_bar() Tell kernel the Tunnel BAR Response address used by driver. This function uses two new OPAL calls, as the PBCQ Tunnel BAR register is configured by skiboot. pnv_pci_get_as_notify_info() Return the ASN info of the thread to be woken up. Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:30 +11:00
Alistair Popple	2b74e2a9b3	powerpc/powernv/npu: Fix deadlock in mmio_invalidate() When sending TLB invalidates to the NPU we need to send extra flushes due to a hardware issue. The original implementation would lock all the ATSD MMIO registers sequentially before unlocking and relocking each of them sequentially to do the extra flush. This introduced a deadlock as it is possible for one thread to hold one ATSD register whilst waiting for another register to be freed while the other thread is holding that register waiting for the one in the first thread to be freed. For example if there are two threads and two ATSD registers: Thread A Thread B ---------------------- Acquire 1 Acquire 2 Release 1 Acquire 1 Wait 1 Wait 2 Both threads will be stuck waiting to acquire a register resulting in an RCU stall warning or soft lockup. This patch solves the deadlock by refactoring the code to ensure registers are not released between flushes and to ensure all registers are either acquired or released together and in order. Fixes: `bbd5ff50af` ("powerpc/powernv/npu-dma: Add explicit flush when sending an ATSD") Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:29 +11:00
Christophe Leroy	c554ac91ce	powerpc/8xx: fix cpm_cascade() dual end of interrupt cpm_cascade() doesn't have to call eoi() as it is already called by handle_fasteoi_irq() And cpm_get_irq() will always return an unsigned int so the test is useless Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:28 +11:00
Anshuman Khandual	3d4f5f5848	powerpc/mm: Drop the function native_register_proc_table() This is left over from the segment table implementation and not getting called from any where now. Hence just drop it. Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:27 +11:00
Vaibhav Jain	1ff3b40401	powerpc/xmon: Clear all breakpoints when xmon is disabled via debugfs Presently when xmon is disabled by debugfs any existing instruction/data-access breakpoints set are not disabled. This may lead to kernel oops when those breakpoints are hit as the necessary debugger hooks aren't installed. Hence this patch introduces a new function named clear_all_bpt() which is called when xmon is disabled via debugfs. The function will unpatch/clear all the trap and ciabr/dab based breakpoints. Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> [mpe: Fix build break when CONFIG_DEBUG_FS=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:05 +11:00
Vaibhav Jain	e1368d0c9e	powerpc/xmon: Setup debugger hooks when first break-point is set Presently sysrq key for xmon('x') is registered during kernel init irrespective of the value of kernel param 'xmon'. Thus xmon is enabled even if 'xmon=off' is passed on the kernel command line. However this doesn't enable the kernel debugger hooks needed for instruction or data breakpoints. Thus when a break-point is hit with xmon=off a kernel oops of the form below is reported: Oops: Exception in kernel mode, sig: 5 [#1] < snip > Trace/breakpoint trap To fix this the patch checks and enables debugger hooks when an instruction or data break-point is set via xmon console. Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> [mpe: Just printf directly, no need for static const char[]] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:10:16 +11:00
Sukadev Bhattiprolu	1373cc3107	powerpc/powernv/vas: Fix order of cleanup in vas_window_init_dbgdir() Fix the order of cleanup to ensure we free the name buffer in case of an error creating 'hvwc' or 'info' files. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:10:15 +11:00
Sukadev Bhattiprolu	2f65272a2a	powerpc/powernv/vas: Remove a stray line in Makefile Remove a bogus line from arch/powerpc/platforms/powernv/Makefile that was added by commit `ece4e51` ("powerpc/vas: Export HVWC to debugfs"). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:10:14 +11:00
Michael Ellerman	2852aab819	Merge two commits from 'kvm-ppc-fixes' into next This merges two commits from the `kvm-ppc-fixes` branch into next, as they fix build breaks we are seeing while testing next.	2018-03-13 15:08:41 +11:00
Peter Zijlstra	8343aae661	perf/core: Remove perf_event::group_entry Now that all the grouping is done with RB trees, we no longer need group_entry and can replace the whole thing with sibling_list. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valery Cherepennikov <valery.cherepennikov@intel.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-12 15:28:49 +01:00
Linus Torvalds	cdb06e9d8f	KVM fixes for v4.16-rc5 PPC: - Fix guest time accounting in the host - Fix large-page backing for radix guests on POWER9 - Fix HPT guests on POWER9 backed by 2M or 1G pages - Compile fixes for some configs and gcc versions s390: - Fix random memory corruption when running as guest2 (e.g. KVM in LPAR) and starting guest3 (e.g. nested KVM) with many CPUs - Export forgotten io interrupt delivery statistics counter -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJaoupNAAoJEED/6hsPKofoT2gH/1P6cNF3Gz3T7fetuan5Tyhw o5zduUxkM2AmTxzim9GPsKc5nsnPMnDraHCpQ9O2WW6VHvRZ6pwgtbtEtF2cx2Hv 70o5gQzap/odA8eoV98xRbZ+aZHrZgs/z2Ql5eXk32BIs57TOQZVQ/mW+BA4Ixos 8lOLYXuNY0lSL7Cp0MBY76ed8100ZTp7sfFByg3AmWDy7qhYOk9C3wPYNqPNfkuB eqZ9XPKV3vJuqc2xSJvsW4DjBcqZTxqDbw+EOA3MuUMH/JRkt7OuPgQHl/G5BqAW JcLb8ENfXNbamiakvarVenvnAJKJTLs8s9K8Z8ADXolY6NlGysGcrJo5dnGWN5A= =UxpG -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Radim Krčmář: "PPC: - Fix guest time accounting in the host - Fix large-page backing for radix guests on POWER9 - Fix HPT guests on POWER9 backed by 2M or 1G pages - Compile fixes for some configs and gcc versions s390: - Fix random memory corruption when running as guest2 (e.g. KVM in LPAR) and starting guest3 (e.g. nested KVM) with many CPUs - Export forgotten io interrupt delivery statistics counter" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: fix memory overwrites when not using SCA entries KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler KVM: s390: provide io interrupt kvm_stat KVM: PPC: Book3S: Fix compile error that occurs with some gcc versions KVM: PPC: Fix compile error that occurs when CONFIG_ALTIVEC=n	2018-03-09 16:59:19 -08:00
Linus Torvalds	a525df0558	powerpc fixes for 4.16 #5 One notable fix to properly advertise our support for a new firmware feature, caused by two series conflicting semantically but not textually. There's a new ioctl for the new ocxl driver, which is not a fix, but needed to complete the userspace API and good to have before the driver is in a released kernel. Finally three minor selftest fixes, and a fix for intermittent build failures for some obscure platforms, caused by a missing make dependency. Thanks to: Alastair D'Silva, Bharata B Rao, Guenter Roeck. -----BEGIN PGP SIGNATURE----- iQIwBAABCAAaBQJaomOTExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYBV LQ/+MyHQH7d/lhqRLWa3BVgy9mn2QfFTkRNMi46eqIWyk6wD5Qf//gzO+vTDzcEO rzkBu3XwhuhDCpCInLtJH17X8iKVy65Uk2JQ808pZTm0WnNTBv0ag251QARDg//B S1QmCQGJkpoUJ85+IDqQzV90K2o13BPWkAOrVh40G5GASvFGZRdxuTuo4DKeIe/j 0yri2LRuL+cpcfmZqaJVco8l2tqJEi6zJRhL/ORrLr4XSy0clxF1q1MYshNWPDsB 3l4r/yKz2aMltplknVVrpdMThUTD311kknIHoQuuEYvd6GxDz52d3B1lvE5Xb8EB sNJQX6q8ydab0u79/tTsXUm/EyfmMd6HMKOeVnQim19tnEe0wW6VkaPDNWWXa66U hW0qx7rX+zMSHzEcgLF7HKirzCQi2oS5ZMqXvhHlOPBb8Iy0O5za670AtvMGgm7/ NKXKML31opgXmTcU2ZxBMrtL0S8ft3wHCKRLkB6H8GHE+6//Ps87bQaUo4v2KsOr 2T/2w6TVtAwxLrASpCVonfcmIjjjIK4WILaOKp24Yzyv0eqen6Z3/kkga37sLgj8 f11HzTfFWO5ckroHrJSVCOC48eWb+O1CWRwS4rofL1jLfucpv6VFTPK6F4G47pAj pZyIJMTem0AEgsxVBJlsw6TibGH5W6rzDE/2WpEqxb7iZh0= =ajRr -----END PGP SIGNATURE----- Merge tag 'powerpc-4.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One notable fix to properly advertise our support for a new firmware feature, caused by two series conflicting semantically but not textually. There's a new ioctl for the new ocxl driver, which is not a fix, but needed to complete the userspace API and good to have before the driver is in a released kernel. Finally three minor selftest fixes, and a fix for intermittent build failures for some obscure platforms, caused by a missing make dependency. 
Thanks to: Alastair D'Silva, Bharata B Rao, Guenter Roeck" * tag 'powerpc-4.16-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/pseries: Fix vector5 in ibm architecture vector table ocxl: Document the OCXL_IOCTL_GET_METADATA IOCTL ocxl: Add get_metadata IOCTL to share OCXL information to userspace selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable selftests/powerpc: Fix missing clean of pmu/lib.o powerpc/boot: Fix random libfdt related build errors selftests/powerpc: Skip tm-trap if transactional memory is not enabled	2018-03-09 09:33:48 -08:00
Wanpeng Li	a4429e53c9	KVM: Introduce paravirtualization hints and KVM_HINTS_DEDICATED This patch introduces kvm_para_has_hint() to query for hints about the configuration of the guests. The first hint KVM_HINTS_DEDICATED, is set if the guest has dedicated physical CPUs for each vCPU (i.e. pinning and no over-commitment). This allows optimizing spinlocks and tells the guest to avoid PV TLB flush. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-06 18:40:44 +01:00
Radim Krčmář	db5679379a	Fixes for PPC KVM: - Fix guest time accounting in the host - Fix large-page backing for radix guests on POWER9 - Fix HPT guests on POWER9 backed by 2M or 1G pages - Compile fixes for some configs and gcc versions -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJammR1AAoJEJ2a6ncsY3Gf2pQIAKf1sBKimpDj/yeBlWVbS41q 6mIJh8R+4DV7TcOSVoOdzUz1dU1cseVznqqr3kexu+unoUpcqm240ZUDsDNWy9j0 Xv0JyrGOcPor9sQmlb1s2gOsybxhic4u8Ih1eQV47bEUw1Rb84/da0JI1u5nMRTq nm3OHPGSnK2C8UkBBVjGelLJGUx+uaLFLjJSSTd0F9+hlxjGT3yXjP3wLG/ZNajT 6Reuzpr95hGpmIaml8gh73clLk4WAjF3+5SyiLo5nlsXzvMnC0DyzaUrHocIo6i7 nZxrx9UguzEdiUbuc5NEs4klTc+GPwMCfd+7z6vmtyw87A0sVOUgGWNGcZL60ew= =Wy2j -----END PGP SIGNATURE----- Merge tag 'kvm-ppc-fixes-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc Fixes for PPC KVM: - Fix guest time accounting in the host - Fix large-page backing for radix guests on POWER9 - Fix HPT guests on POWER9 backed by 2M or 1G pages - Compile fixes for some configs and gcc versions	2018-03-06 17:24:09 +01:00
Bharata B Rao	b0c41b8b6e	powerpc/pseries: Fix vector5 in ibm architecture vector table With ibm,dynamic-memory-v2 and ibm,drc-info coming around the same time, byte22 in vector5 of ibm architecture vector table got set twice separately. The end result is that guest kernel isn't advertising support for ibm,dynamic-memory-v2. Fix this by removing the duplicate assignment of byte22. Fixes: `02ef6dd810` ("powerpc: Enable support for ibm,drc-info devtree property") Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-06 23:05:38 +11:00
Rob Herring	a54b81ea24	powerpc: boot: add strrchr function libfdt gained a new dependency on strrchr, so copy the implementation from lib/string.c. Most of the string functions are in assembly, but stdio.c already has strnlen, so add strrchr there. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Rob Herring <robh@kernel.org>	2018-03-05 20:58:17 -06:00
Christophe Leroy	4bd13772ee	powerpc/8xx: Increase number of slices to 64 On the 8xx, the minimum slice size is the size of the area covered by a single PMD entry, ie 4M in 4K pages mode and 64M in 16K pages mode. This patch increases the number of slices from 16 to 64 on the 8xx. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-06 09:21:24 +11:00
Christophe Leroy	15472423ce	powerpc/mm/slice: Allow up to 64 low slices While the implementation of the "slices" address space allows a significant amount of high slices, it limits the number of low slices to 16 due to the use of a single u64 low_slices_psize element in struct mm_context_t On the 8xx, the minimum slice size is the size of the area covered by a single PMD entry, ie 4M in 4K pages mode and 64M in 16K pages mode. This means we could have at least 64 slices. In order to override this limitation, this patch switches the handling of low_slices_psize to char array as done already for high_slices_psize. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-06 09:21:23 +11:00
Christophe Leroy	aa0ab02ba9	powerpc/mm/slice: Fix hugepage allocation at hint address on 8xx On the 8xx, the page size is set in the PMD entry and applies to all pages of the page table pointed by the said PMD entry. When an app has some regular pages allocated (e.g. see below) and tries to mmap() a huge page at a hint address covered by the same PMD entry, the kernel accepts the hint although the 8xx cannot handle different page sizes in the same PMD entry. 10000000-10001000 r-xp 00000000 00:0f 2597 /root/malloc 10010000-10011000 rwxp 00000000 00:0f 2597 /root/malloc mmap(0x10080000, 524288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0) = 0x10080000 This results in the app remaining forever in do_page_fault()/hugetlb_fault() and when interrupting that app, we get the following warning: [162980.035629] WARNING: CPU: 0 PID: 2777 at arch/powerpc/mm/hugetlbpage.c:354 hugetlb_free_pgd_range+0xc8/0x1e4 [162980.035699] CPU: 0 PID: 2777 Comm: malloc Tainted: G W 4.14.6 #85 [162980.035744] task: c67e2c00 task.stack: c668e000 [162980.035783] NIP: c000fe18 LR: c00e1eec CTR: c00f90c0 [162980.035830] REGS: c668fc20 TRAP: 0700 Tainted: G W (4.14.6) [162980.035854] MSR: 00029032 <EE,ME,IR,DR,RI> CR: 24044224 XER: 20000000 [162980.036003] [162980.036003] GPR00: c00e1eec c668fcd0 c67e2c00 00000010 c6869410 10080000 00000000 77fb4000 [162980.036003] GPR08: ffff0001 0683c001 00000000 ffffff80 44028228 10018a34 00004008 418004fc [162980.036003] GPR16: c668e000 00040100 c668e000 c06c0000 c668fe78 c668e000 c6835ba0 c668fd48 [162980.036003] GPR24: 00000000 73ffffff 74000000 00000001 77fb4000 100fffff 10100000 10100000 [162980.036743] NIP [c000fe18] hugetlb_free_pgd_range+0xc8/0x1e4 [162980.036839] LR [c00e1eec] free_pgtables+0x12c/0x150 [162980.036861] Call Trace: [162980.036939] [c668fcd0] [c00f0774] unlink_anon_vmas+0x1c4/0x214 (unreliable) [162980.037040] [c668fd10] [c00e1eec] free_pgtables+0x12c/0x150 [162980.037118] [c668fd40] [c00eabac] exit_mmap+0xe8/0x1b4 
[162980.037210] [c668fda0] [c0019710] mmput.part.9+0x20/0xd8 [162980.037301] [c668fdb0] [c001ecb0] do_exit+0x1f0/0x93c [162980.037386] [c668fe00] [c001f478] do_group_exit+0x40/0xcc [162980.037479] [c668fe10] [c002a76c] get_signal+0x47c/0x614 [162980.037570] [c668fe70] [c0007840] do_signal+0x54/0x244 [162980.037654] [c668ff30] [c0007ae8] do_notify_resume+0x34/0x88 [162980.037744] [c668ff40] [c000dae8] do_user_signal+0x74/0xc4 [162980.037781] Instruction dump: [162980.037821] 7fdff378 81370000 54a3463a 80890020 7d24182e 7c841a14 712a0004 4082ff94 [162980.038014] 2f890000 419e0010 712a0ff0 408200e0 <0fe00000> 54a9000a 7f984840 419d0094 [162980.038216] ---[ end trace c0ceeca8e7a5800a ]--- [162980.038754] BUG: non-zero nr_ptes on freeing mm: 1 [162985.363322] BUG: non-zero nr_ptes on freeing mm: -1 In order to fix this, this patch uses the address space "slices" implemented for BOOK3S/64 and enhanced to support PPC32 by the preceding patch. This patch modifies the context.id on the 8xx to be in the range [1:16] instead of [0:15] in order to identify context.id == 0 as not initialised contexts as done on BOOK3S. This patch activates CONFIG_PPC_MM_SLICES when CONFIG_HUGETLB_PAGE is selected for the 8xx. Although we could in theory have as many slices as PMD entries, the current slices implementation limits the number of low slices to 16. This limitation does not prevent us from fixing the initial issue, although it is suboptimal. It will be cured in a subsequent patch. Fixes: `4b91428699` ("powerpc/8xx: Implement support of hugepages") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-06 09:21:23 +11:00
Christophe Leroy	db3a528db4	powerpc/mm/slice: Enhance for supporting PPC32 In preparation for the following patch which will fix an issue on the 8xx by re-using the 'slices', this patch enhances the 'slices' implementation to support 32-bit CPUs. On PPC32, the address space is limited to 4Gbytes, hence only the low slices will be used. The high slices use bitmaps. As bitmap functions are not prepared to handle bitmaps of size 0, this patch ensures that bitmap functions are called only when SLICE_NUM_HIGH is nonzero. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-06 09:21:23 +11:00
Christophe Leroy	a3286f05bc	powerpc/mm/slice: create header files dedicated to slices In preparation for the following patch which will enhance 'slices' for supporting PPC32 in order to fix an issue on hugepages on 8xx, this patch takes out of page*.h all bits related to 'slices' and put them into newly created slice.h header files. While common parts go into asm/slice.h, subarch specific parts go into respective books3s/64/slice.c and nohash/64/slice.c 'slices' Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-06 09:21:22 +11:00
Christophe Leroy	326691ad4f	powerpc/mm/slice: Remove intermediate bitmap copy bitmap_or() and bitmap_andnot() can work properly with dst identical to src1 or src2. There is no need of an intermediate result bitmap that is copied back to dst in a second step. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-06 09:21:22 +11:00
Segher Boessenkool	51d42f0f5f	powerpc: Keep const vars out of writable .sdata Newer gcc will support "-mno-readonly-in-sdata"[1], which makes sure that the optimization on PPC32 for variables getting moved into the .sdata section will not apply to const variables (which must be in .rodata). This was originally noticed in mm/rodata_test.c when rodata_test_data was not static: c0695034 g O .data 00000004 rodata_test_data After this patch with an updated compiler, this is correctly in .rodata. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82411 Reported-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-06 09:21:21 +11:00
Linus Torvalds	547046141f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Use an appropriate TSQ pacing shift in mac80211, from Toke Høiland-Jørgensen. 2) Just like ipv4's ip_route_me_harder(), we have to use skb_to_full_sk in ip6_route_me_harder, from Eric Dumazet. 3) Fix several shutdown races and similar other problems in l2tp, from James Chapman. 4) Handle missing XDP flush properly in tuntap, for real this time. From Jason Wang. 5) Out-of-bounds access in powerpc ebpf tailcalls, from Daniel Borkmann. 6) Fix phy_resume() locking, from Andrew Lunn. 7) IFLA_MTU values are ignored on newlink for some tunnel types, fix from Xin Long. 8) Revert F-RTO middle box workarounds, they only handle one dimension of the problem. From Yuchung Cheng. 9) Fix socket refcounting in RDS, from Ka-Cheong Poon. 10) Don't allow ppp unit registration to an unregistered channel, from Guillaume Nault. 11) Various hv_netvsc fixes from Stephen Hemminger. 
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (98 commits) hv_netvsc: propagate rx filters to VF hv_netvsc: filter multicast/broadcast hv_netvsc: defer queue selection to VF hv_netvsc: use napi_schedule_irqoff hv_netvsc: fix race in napi poll when rescheduling hv_netvsc: cancel subchannel setup before halting device hv_netvsc: fix error unwind handling if vmbus_open fails hv_netvsc: only wake transmit queue if link is up hv_netvsc: avoid retry on send during shutdown virtio-net: re enable XDP_REDIRECT for mergeable buffer ppp: prevent unregistered channels from connecting to PPP units tc-testing: skbmod: fix match value of ethertype mlxsw: spectrum_switchdev: Check success of FDB add operation net: make skb_gso_*_seglen functions private net: xfrm: use skb_gso_validate_network_len() to check gso sizes net: sched: tbf: handle GSO_BY_FRAGS case in enqueue net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len rds: Incorrect reference counting in TCP socket creation net: ethtool: don't ignore return from driver get_fecparam method vrf: check forwarding on the original netdevice when generating ICMP dest unreachable ...	2018-03-05 11:29:24 -08:00
Laurent Vivier	61bd0f66ff	KVM: PPC: Book3S HV: Fix guest time accounting with VIRT_CPU_ACCOUNTING_GEN Since commit `8b24e69fc4` ("KVM: PPC: Book3S HV: Close race with testing for signals on guest entry"), if CONFIG_VIRT_CPU_ACCOUNTING_GEN is set, the guest time is not accounted to guest time and user time, but instead to system time. This is because guest_enter()/guest_exit() are called while interrupts are disabled and the tick counter cannot be updated between them. To fix that, move guest_exit() after local_irq_enable(), and as guest_enter() is called with IRQ disabled, call guest_enter_irqoff() instead. Fixes: `8b24e69fc4` ("KVM: PPC: Book3S HV: Close race with testing for signals on guest entry") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-03-03 19:28:34 +11:00
Paul Mackerras	debd574f41	KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing The current code for initializing the VRMA (virtual real memory area) for HPT guests requires the page size of the backing memory to be one of 4kB, 64kB or 16MB. With a radix host we have the possibility that the backing memory page size can be 2MB or 1GB. In these cases, if the guest switches to HPT mode, KVM will not initialize the VRMA and the guest will fail to run. In fact it is not necessary that the VRMA page size is the same as the backing memory page size; any VRMA page size less than or equal to the backing memory page size is acceptable. Therefore we now choose the largest page size out of the set {4k, 64k, 16M} which is not larger than the backing memory page size. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-03-02 15:38:24 +11:00
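
The selection rule described in the last entry above (debd574f41) can be sketched in a few lines of C. This is only an illustration of "largest size from {4k, 64k, 16M} that is not larger than the backing page size"; the helper name `pick_vrma_page_size` and the table `vrma_page_sizes` are hypothetical and are not taken from the kernel source.

```c
#include <stddef.h>
#include <stdint.h>

/* Candidate VRMA page sizes named in the commit message, largest first. */
static const uint64_t vrma_page_sizes[] = {
    16ULL << 20, /* 16M */
    64ULL << 10, /* 64k */
    4ULL << 10,  /* 4k  */
};

/*
 * Hypothetical helper (not the kernel's function): return the largest
 * candidate that does not exceed backing_psize, or 0 if none fits.
 */
static uint64_t pick_vrma_page_size(uint64_t backing_psize)
{
    size_t n = sizeof(vrma_page_sizes) / sizeof(vrma_page_sizes[0]);

    for (size_t i = 0; i < n; i++) {
        if (vrma_page_sizes[i] <= backing_psize)
            return vrma_page_sizes[i];
    }
    return 0;
}
```

Under this sketch, a 2MB backing page size would map to a 64k VRMA page size and a 1GB backing page size to 16M, which matches the two radix-host cases the commit message mentions.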


18278 Commits