OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jike Song	871b7ef2a1	kvm/page_track: export symbols for external usage Signed-off-by: Jike Song <jike.song@intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-11-04 12:13:20 +01:00
Jike Song	d126363d8f	kvm/page_track: call notifiers with kvm_page_track_notifier_node The user of page_track might needs extra information, so pass the kvm_page_track_notifier_node to callbacks. Signed-off-by: Jike Song <jike.song@intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-11-04 12:13:20 +01:00
Xiaoguang Chen	ae7cd87372	KVM: x86: add track_flush_slot page track notifier When a memory slot is being moved or removed users of page track can be notified. So users can drop write-protection for the pages in that memory slot. This notifier type is needed by KVMGT to sync up its shadow page table when memory slot is being moved or removed. Register the notifier type track_flush_slot to receive memslot move and remove event. Reviewed-by: Xiao Guangrong <guangrong.xiao@intel.com> Signed-off-by: Chen Xiaoguang <xiaoguang.chen@intel.com> [Squashed commits to avoid bisection breakage and reworded the subject.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-11-04 12:13:19 +01:00
Owen Hofmann	d9092f52d7	kvm: x86: Check memopp before dereference (CVE-2016-8630) Commit `41061cdb98` ("KVM: emulate: do not initialize memopp") removes a check for non-NULL under incorrect assumptions. An undefined instruction with a ModR/M byte with Mod=0 and R/M-5 (e.g. 0xc7 0x15) will attempt to dereference a null pointer here. Fixes: `41061cdb98` Message-Id: <1477592752-126650-2-git-send-email-osh@google.com> Signed-off-by: Owen Hofmann <osh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-11-02 21:31:53 +01:00
Jim Mattson	355f4fb140	kvm: nVMX: VMCLEAR an active shadow VMCS after last use After a successful VM-entry with the "VMCS shadowing" VM-execution control set, the shadow VMCS referenced by the VMCS link pointer field in the current VMCS becomes active on the logical processor. A VMCS that is made active on more than one logical processor may become corrupted. Therefore, before an active VMCS can be migrated to another logical processor, the first logical processor must execute a VMCLEAR for the active VMCS. VMCLEAR both ensures that all VMCS data are written to memory and makes the VMCS inactive. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-By: David Matlack <dmatlack@google.com> Message-Id: <1477668579-22555-1-git-send-email-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-11-02 20:03:17 +01:00
Paolo Bonzini	ea26e4ec08	KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK Since commit `a545ab6a00` ("kvm: x86: add tsc_offset field to struct kvm_vcpu_arch", 2016-09-07) the offset between host and L1 TSC is cached and need not be fished out of the VMCS or VMCB. This means that we can implement adjust_tsc_offset_guest and read_l1_tsc entirely in generic code. The simplification is particularly significant for VMX code, where vmx->nested.vmcs01_tsc_offset was duplicating what is now in vcpu->arch.tsc_offset. Therefore the vmcs01_tsc_offset can be dropped completely. More importantly, this fixes KVM_GET_CLOCK/KVM_SET_CLOCK which, after commit `108b249c45` ("KVM: x86: introduce get_kvmclock_ns", 2016-09-01) called read_l1_tsc while the VMCS was not loaded. It thus returned bogus values on Intel CPUs. Fixes: `108b249c45` Reported-by: Roman Kagan <rkagan@virtuozzo.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-11-02 20:03:07 +01:00
Ido Yariv	bd768e1466	KVM: x86: fix wbinvd_dirty_mask use-after-free vcpu->arch.wbinvd_dirty_mask may still be used after freeing it, corrupting memory. For example, the following call trace may set a bit in an already freed cpu mask: kvm_arch_vcpu_load vcpu_load vmx_free_vcpu_nested vmx_free_vcpu kvm_arch_vcpu_free Fix this by deferring freeing of wbinvd_dirty_mask. Cc: stable@vger.kernel.org Signed-off-by: Ido Yariv <ido@wizery.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-10-28 11:35:21 +02:00
Borislav Petkov	796f4687bd	kvm/x86: Show WRMSR data is in hex Add the "0x" prefix to the error messages format to make it unambiguous about what kind of value we're talking about. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Message-Id: <20161027181445.25319-1-bp@alien8.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-10-27 20:31:25 +02:00
Paolo Bonzini	b5149a5fd1	KVM: s390: Fix wrong memory allocation With commit `d86bd1bece` ("mm/slub: support left redzone") or with slab debugging the allocation of our diag224 buffer is not aligned properly. Let's fix this. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJYELGIAAoJEBF7vIC1phx8eY0P/jGHsq8vvtvJFQxwMLx0MPkS HOd/kw9nupyNeJMrvLGgcpyxfLTbCJsdA5P+ZpgZEUygwAkjs/EFbzGfgWL6kY4t pT9l4uo0LfdGdNITNVWNgUvpFC4VIIlyQAeTdMG2+hUlaIF0I1zpdQDYa9umacKN lGuKRObTH7kVX6yj02kTTallBKDdtl45fSAPCoMSFImi+Hlyn8BBVTlbNPujoOk0 i50vWzjn711Fm/qnc0p/8XpLTcZLdYPm8vToxD6WR6C8mjepb3+JyHgTx4u9WRVY cvVVTrqDvrUaqOss4KTssbgzSKZ74aPPlQm27n2l8scpAcHbr+1ZY59PnkEdoF2Y jA5JHeOfqijneOf0F7vwGAG2c4MU5rIVq7bFbxVGVglb3S5TBTRnd63Q0kUsJc1g amrdFOjXtkyqpusEJMIsLMkhw2ICjQ7D2oXIRq8GBqxnEoMh99KUYdFu3Bd21/Bv uP5vBvvuKx0jtD7WBkIOL9Df0YEhKYwqmonubPyUwr/8ok04g/dKTPgelXzCVFts e2uXkQ0uSxT2l04E2iSiq7qMM4CMNVqhkJHHHyBAxisHpM3Wdbyq2y0evOqATwzN F26a4JAEhOxRbytnaoRH/XaGgGLfKFegSHZoJ/dtrxuBIBV2t4aOHHhR0N0FTyK0 bPCH857jWJV2yB1ym4OY =YoA+ -----END PGP SIGNATURE----- Merge tag 'kvm-s390-master-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix wrong memory allocation With commit `d86bd1bece` ("mm/slub: support left redzone") or with slab debugging the allocation of our diag224 buffer is not aligned properly. Let's fix this.	2016-10-27 13:22:54 +02:00
Jim Mattson	85c856b39b	kvm: nVMX: Fix kernel panics induced by illegal INVEPT/INVVPID types Bitwise shifts by amounts greater than or equal to the width of the left operand are undefined. A malicious guest can exploit this to crash a 32-bit host, due to the BUG_ON(1)'s in handle_{invept,invvpid}. Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <1477496318-17681-1-git-send-email-jmattson@google.com> [Change 1UL to 1, to match the range check on the shift count. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-10-27 12:15:27 +02:00
Janosch Frank	45c7ee43a5	KVM: s390: Fix STHYI buffer alignment for diag224 Diag224 requires a page-aligned 4k buffer to store the name table into. kmalloc does not guarantee page alignment, hence we replace it with __get_free_page for the buffer allocation. Cc: stable@vger.kernel.org # v4.8+ Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-10-26 13:46:44 +02:00
James Hogan	e1e575f6b0	KVM: MIPS: Precalculate MMIO load resume PC The advancing of the PC when completing an MMIO load is done before re-entering the guest, i.e. before restoring the guest ASID. However if the load is in a branch delay slot it may need to access guest code to read the prior branch instruction. This isn't safe in TLB mapped code at the moment, nor in the future when we'll access unmapped guest segments using direct user accessors too, as it could read the branch from host user memory instead. Therefore calculate the resume PC in advance while we're still in the right context and save it in the new vcpu->arch.io_pc (replacing the no longer needed vcpu->arch.pending_load_cause), and restore it on MMIO completion. Fixes: `e685c689f3` ("KVM/MIPS32: Privileged instruction/target branch emulation.") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: <stable@vger.kernel.org> # 3.10.x- Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-10-26 13:43:55 +02:00
James Hogan	ede5f3e7b5	KVM: MIPS: Make ERET handle ERL before EXL The ERET instruction to return from exception is used for returning from exception level (Status.EXL) and error level (Status.ERL). If both bits are set however we should be returning from ERL first, as ERL can interrupt EXL, for example when an NMI is taken. KVM however checks EXL first. Fix the order of the checks to match the pseudocode in the instruction set manual. Fixes: `e685c689f3` ("KVM/MIPS32: Privileged instruction/target branch emulation.") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: <stable@vger.kernel.org> # 3.10.x- Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-10-26 13:43:48 +02:00
James Hogan	9078210ef4	KVM: MIPS: Fix lazy user ASID regenerate for SMP kvm_mips_check_asids() runs before entering the guest and performs lazy regeneration of host ASID for guest usermode, using last_user_gasid to track the last guest ASID in the VCPU that was used by guest usermode on any host CPU. last_user_gasid is reset after performing the lazy ASID regeneration on the current CPU, and by kvm_arch_vcpu_load() if the host ASID for guest usermode is regenerated due to staleness (to cancel outstanding lazy ASID regenerations). Unfortunately neither case handles SMP hosts correctly: - When the lazy ASID regeneration is performed it should apply to all CPUs (as last_user_gasid does), so reset the ASID on other CPUs to zero to trigger regeneration when the VCPU is next loaded on those CPUs. - When the ASID is found to be stale on the current CPU, we should not cancel lazy ASID regenerations globally, so drop the reset of last_user_gasid altogether here. Both cases would require a guest ASID change and two host CPU migrations (and in the latter case one of the CPUs to start a new ASID cycle) before guest usermode could potentially access stale user pages from a previously running ASID in the same VCPU. Fixes: `25b08c7fb0` ("KVM: MIPS: Invalidate TLB by regenerating ASIDs") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-10-26 13:43:41 +02:00
Linus Torvalds	3e9679a365	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Three fixes, a hw-enablement and a cross-arch fix/enablement change: - SGI/UV fix for older platforms - x32 signal handling fix - older x86 platform bootup APIC fix - AVX512-4VNNIW (Neural Network Instructions) and AVX512-4FMAPS (Multiply Accumulation Single precision instructions) enablement. - move thread_info back into x86 specific code, to make life easier for other architectures trying to make use of CONFIG_THREAD_INFO_IN_TASK_STRUCT=y" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/smp: Don't try to poke disabled/non-existent APIC sched/core, x86: Make struct thread_info arch specific again x86/signal: Remove bogus user_64bit_mode() check from sigaction_compat_abi() x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features x86/vmware: Skip timer_irq_works() check on VMware	2016-10-22 09:58:49 -07:00
Ville Syrjälä	ff8560512b	x86/boot/smp: Don't try to poke disabled/non-existent APIC Apparently trying to poke a disabled or non-existent APIC leads to a box that doesn't even boot. Let's not do that. No real clue if this is the right fix, but at least my P3 machine boots again. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: dyoung@redhat.com Cc: kexec@lists.infradead.org Cc: stable@vger.kernel.org Fixes: `2a51fe083e` ("arch/x86: Handle non enumerated CPU after physical hotplug") Link: http://lkml.kernel.org/r/1477102684-5092-1-git-send-email-ville.syrjala@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-22 10:47:54 +02:00
Linus Torvalds	dcd4693cf4	powerpc fixes for 4.9 #3 Fixes marked for stable: - Prevent unlikely crash in copro_calculate_slb() (Frederic Barrat) - cxl: Prevent adapter reset if an active context exists (Vaibhav Jain) Fixes for code merged this cycle: - Fix boot on systems with uncompressed kernel image (Heiner Kallweit) - Drop dump_numa_memory_topology() (Michael Ellerman) - Fix numa topology console print (Aneesh Kumar K.V) - Ignore the pkey system calls for now (Stephen Rothwell) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYCpJqAAoJEFHr6jzI4aWABS8QAJXuCjXrfNdQoiNmSHTOOUuj Z1KFIU/WjLa42VD2KIvW/OiTjzmrA9yl/PkNYD185yXu5DAE1h+lH0gBCA3KlSUc LwNneqn+3aGqmAX7jTm1HaWFCQt6mF0z3hwDPvEXhC4hcNjhe3mp3Q9/Q8idVfAJ f48vBa8qgJ5gpD5zVva5ujh1F2RUA+RQmhaR+LS19B+OH6xPzRp7VGUdsKRp75pI ILVCsjxA+DoaMOUK9quE5/9n9IK+N10QLfCqJu6HxJJ47nBxkiDPtdcndv0WTA9m kYTdqcv5o7A4+SrdXOkNBBHjj09UhdHBmhIrEt6286wyJ3thvDIhjMrX6OwJSCyb oB8PhXwjyUQrws19h4RNDToPG2Hr9A8BXVTofyPV4ku6gvucI03WFcVbHMWhAiLh lwR3Ppg4mHHAndL4oRlRhpvEVmBGwMuKEbisTa82T5RK4iPVWRcGqN6bltj9g6QX VXc8KQzKM+qEKQmDzdjExr0ZFq+USea96JmCJs6l9+M1nwe5CRCJAZyjp5LhVYRf ky9DSmp+nwIUxAQ73rv/NrjvRNZXCaUn4G+vpcSix7jrq6DqJoLSTEqpfw3Lfejj oJ1YxqD9SrNYhXChj071zLoDznZIviCxitLbQYVLt1Y72iLUXgt+s/y3JZWuxGrt EAmIXJq8fJHhHEd0TEW9 =39+z -----END PGP SIGNATURE----- Merge tag 'powerpc-4.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fixes marked for stable: - Prevent unlikely crash in copro_calculate_slb() (Frederic Barrat) - cxl: Prevent adapter reset if an active context exists (Vaibhav Jain) Fixes for code merged this cycle: - Fix boot on systems with uncompressed kernel image (Heiner Kallweit) - Drop dump_numa_memory_topology() (Michael Ellerman) - Fix numa topology console print (Aneesh Kumar K.V) - Ignore the pkey system calls for now (Stephen Rothwell)" * tag 'powerpc-4.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Ignore the pkey system calls for now powerpc: Fix numa topology console print powerpc/mm: Drop dump_numa_memory_topology() cxl: Prevent adapter reset if an active context exists powerpc/boot: Fix boot on systems with uncompressed kernel image powerpc/mm: Prevent unlikely crash in copro_calculate_slb()	2016-10-21 19:13:00 -07:00
Linus Torvalds	a23b27ae12	KVM fixes for v4.9-rc2 ARM: - avoid livelock when walking guest page tables - fix HYP mode static keys without CC_HAVE_ASM_GOTO MIPS: - fix a build error without TRACEPOINTS_ENABLED s390: - reject a malformed userspace configuration x86: - suppress a warning without CONFIG_CPU_FREQ - initialize whole irq_eoi array -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJYCl1iAAoJEED/6hsPKofo7pUH/R/sL417YLTkY6UVhtrCXQq1 cUPWLLp96/Ijkmb+PoByLn5msKxhUa9A06QfphKCbmvpInubXPTxaWDCpoXxHmCO ywHmwuNk7Zgc8MnvcqBKte1jo8/JxQTM1NYZEys7va+J/fC4Nqb9gjZnECSTfUK5 JE8bPs+yxVSavsh0KOZcTdTHtuZQ6SQijgDkE4pSDBYhCKxIpYAXaKVUOC+VSTDH ACUMLvUrFlFbAev0z4oF4CSKotAq6VEkJQhequghKPUHSeWabZB4wAHTkfUbJ+Bb Ar57zrz5YCGbojywuHi1954eHWv6AfWyD8bnYSCtD4gsIRws+dH/MIiPgEMjLOQ= =9U78 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Radim Krčmář: "ARM: - avoid livelock when walking guest page tables - fix HYP mode static keys without CC_HAVE_ASM_GOTO MIPS: - fix a build error without TRACEPOINTS_ENABLED s390: - reject a malformed userspace configuration x86: - suppress a warning without CONFIG_CPU_FREQ - initialize whole irq_eoi array" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: arm/arm64: KVM: Map the BSS at HYP arm64: KVM: Take S1 walks into account when determining S2 write faults KVM: s390: reject invalid modes for runtime instrumentation kvm: x86: memset whole irq_eoi kvm/x86: Fix unused variable warning in kvm_timer_init() KVM: MIPS: Add missing uaccess.h include	2016-10-21 19:09:29 -07:00
Radim Krčmář	658f7c4bb7	KVM/ARM updates for 4.9-rc2 - Handle faults generated by the page table walker as being writes - Map the BSS at EL2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYCkKcAAoJECPQ0LrRPXpDXoIQAJiPTg9dXMVem3Px0y5nTRUN fEoYP0BzV6KzA9MqvE/ZzCI/Xfuv93oHlEkBKP5lMeAbqVG3sNdkxbZu6RM49jgl AQ9OOCbkcMvxy4cgyY5KY2ip/l6j663eIkE0GKGLsYCg/GA2ln4TRoIk/dfjyADE 9j38CFOKD1tl0XRvI3ftVV+9OGiszcNSnK27uwsYyC78rc4PrnKA+3LxaQJjD6En +x3LW+kM5PeQLQxYhCxunx88WVZn6nmeZBQAjy5XZu0I1r8PbIQUdPfT+IMpavQO 5f0qGqRqWWWaEtoYIspJzolf5xmSUeQNfgW+cORIzShcJ8rtZkRsOoPO75wx6jlw /T96CX1xIRdfT0HvbONTN+n+mTQ74GmiV1qPlXG77wRAD8pg1BzrUbr/Tw8A9IV4 m3t+a0SEkyZvAicCLcK9mlsImMruuA8SOo4QNlYNFRacAKteuEDiJgkcwUOV4VC9 D1l2MUAZ5eZqB14iUQrayVkc0gu0CEdF2qBvl0XsWbO9Sa574zZq+HpQmOmLUOcd E5LPSN3x3FsNa2xONyc0stLdIainC46KQBe1uD/Yjou/l5Ao6jQecSzrcWIozlxg TtsjsFgOak/952aTlFoC+t6O9fQNFoh/f7QdvuI6l+fvt6dXCqxMgSPSLt3w8Rnw UL48xkxL14Y6nlRikuae =DVps -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for 4.9-rc2 - Handle faults generated by the page table walker as being writes - Map the BSS at EL2	2016-10-21 18:49:53 +02:00
Marc Zyngier	c8ea0395ff	arm/arm64: KVM: Map the BSS at HYP When used with a compiler that doesn't implement "asm goto" (such as the AArch64 port of GCC 4.8), jump labels generate a memory access to find out about the value of the key (instead of just patching the code). The key itself is likely to be stored in the BSS. This is perfectly fine, except that we don't map the BSS at HYP, leading to an exploding kernel at the first access. The obvious fix is simply to map the BSS there (which should have been done a long while ago, but hey...). Reported-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2016-10-21 17:26:24 +01:00
Will Deacon	60e21a0ef5	arm64: KVM: Take S1 walks into account when determining S2 write faults The WnR bit in the HSR/ESR_EL2 indicates whether a data abort was generated by a read or a write instruction. For stage 2 data aborts generated by a stage 1 translation table walk (i.e. the actual page table access faults at EL2), the WnR bit therefore reports whether the instruction generating the walk was a load or a store, not whether the page table walker was reading or writing the entry. For page tables marked as read-only at stage 2 (e.g. due to KSM merging them with the tables from another guest), this could result in livelock, where a page table walk generated by a load instruction attempts to set the access flag in the stage 1 descriptor, but fails to trigger CoW in the host since only a read fault is reported. This patch modifies the arm64 kvm_vcpu_dabt_iswrite function to take into account stage 2 faults in stage 1 walks. Since DBM cannot be disabled at EL2 for CPUs that implement it, we assume that these faults are always causes by writes, avoiding the livelock situation at the expense of occasional, spurious CoWs. We could, in theory, do a bit better by checking the guest TCR configuration and inspecting the page table to see why the PTE faulted. However, I doubt this is measurable in practice, and the threat of livelock is real. Cc: <stable@vger.kernel.org> Cc: Julien Grall <julien.grall@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-21 17:25:47 +01:00
Radim Krčmář	3633031db5	KVM: s390: Fix for user-triggerable WARN_ON A malicious user space can provide an invalid mode for runtime instrumentation via the interfaces that are normally used on the target host during migration. This would trigger a WARN_ON via validity intercept. Let's detect this special case. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJYCQniAAoJEBF7vIC1phx8Zy4QAKT7aZ9n3zPReHk9dLvhd0S8 rzpQDhX+tQ0puGyHC3eQaV9FkpFm7t0nRpIrS6w5KsLq1IoaVxL3xz4e4bFJCG7J HOpmaWnAHKcsI0xq+cBmtZONrVpgCnMeSoz/bi24HvIZpZZDE3a1P7eFA6MW166q qRW7zGEiHVmh0/v//qRDTxexoTdTdJQEOJ2Xxtg5EguWmd41fkJFZclN3rqqfqMo vochsEPerhuKVEXjSvtRls6IVxk8mb540UuWPbZgp9j8xEV7C8q+HLNUQ5AI6EMF 3+0lI5T/Opktr/b/afNb16OXhhWpGIui7rsN4qRmSac/DRKmZ64NxSsUpSijAXyu wWWpvPVp5VQLgM7ZBiKiuXc/4B4kE6T+qdkl1cvjuZUZQXbs9xbi0XVcsPBU5/O9 fZ8JQcAVIDpevUw61DYgOQcpYiFajJpZHVHvwOgilztZA9ZUgws5ydaZgeAq3hSp X7NKhNKslM5tCvGS8bF+/tyBDvA5UpCJMk6pmc7CUEEuhbNrGeokho7uEopMh9G9 lbSoAPWKbOouEHgBoPsdkMkw3Kj/bimGL49nApYqKlXkLxpFss7gqz02nmYKUcV3 09+sa7fosI0TnNKQoVXE6EYcU86SD1DRXrfgTv7emo5GsqYaF5Fg/Q1dWhQ9X882 cCEEWKxR1djLZIoeVTW/ =+ie5 -----END PGP SIGNATURE----- Merge tag 'kvm-s390-master-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux KVM: s390: Fix for user-triggerable WARN_ON A malicious user space can provide an invalid mode for runtime instrumentation via the interfaces that are normally used on the target host during migration. This would trigger a WARN_ON via validity intercept. Let's detect this special case.	2016-10-20 20:31:01 +02:00
Christian Borntraeger	a5efb6b6c9	KVM: s390: reject invalid modes for runtime instrumentation Usually a validity intercept is a programming error of the host because of invalid entries in the state description. We can get a validity intercept if the mode of the runtime instrumentation control block is wrong. As the host does not know which modes are valid, this can be used by userspace to trigger a WARN. Instead of printing a WARN let's return an error to userspace as this can only happen if userspace provides a malformed initial value (e.g. on migration). The kernel should never warn on bogus input. Instead let's log it into the s390 debug feature. While at it, let's return -EINVAL for all validity intercepts as this will trigger an error in QEMU like error: kvm run failed Invalid argument PSW=mask 0404c00180000000 addr 000000000063c226 cc 00 R00=000000000000004f R01=0000000000000004 R02=0000000000760005 R03=000000007fe0a000 R04=000000000064ba2a R05=000000049db73dd0 R06=000000000082c4b0 R07=0000000000000041 R08=0000000000000002 R09=000003e0804042a8 R10=0000000496152c42 R11=000000007fe0afb0 [...] This will avoid an endless loop of validity intercepts. Cc: stable@vger.kernel.org # v4.5+ Fixes: `c6e5f16637` ("KVM: s390: implement the RI support of guest") Acked-by: Fan Zhang <zhangfan@linux.vnet.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2016-10-20 20:06:12 +02:00
Linus Torvalds	f4814e6183	arm64 fixes: - Fix ACPI boot due to recent broken NUMA changes - Fix remote enabling of CPU features requiring PSTATE bit manipulation - Add address range check when emulating user cache maintenance - Fix LL/SC loops that allow compiler to introduce memory accesses - Fix recently added write_sysreg_s macro - Ensure MDCR_EL2 is initialised on qemu targets without a PMU - Avoid kaslr breakage due to MODVERSIONs and DYNAMIC_FTRACE - Correctly drive recent ld when building relocatable Image - Remove junk IS_ERR check from xgene PMU driver added during merge window - pr_cont fixes after core changes in the merge window -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCgAGBQJYCNgDAAoJELescNyEwWM0BV8IAKZLVlfKk2YTo3T/tx/2FGIW 5VKjSY13VLLC5cKQLB7Yvm7G1kzvLiN4Zb5fqvL0CK1ut8scPVbR1AAhSDngB4vU UNzUqwp1R0Tl+GhLT+IfOElWjEcB9kwic3CZV5v4FxvZg4HvwstL3zLvMkjTaDYK GjaS9iQ2zQsgsYHtluzia7q1k2fXfqdLOd5V0XF05CykJKO3j7zpqTv8PKF7PUFU utsjRdyyGmBYaamG/cO5phDbAD5VMvdWcfDeJ25JdSwHaoxjZ8tpM721R4b5GRN7 5rPn52v5Hycp++FmhuO45laVQc60LYMz17mQwSTnIX2pGuFRqjRWJztJpyQqzWo= =MXN1 -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Most of these are CC'd for stable, but there are a few fixing issues introduced during the recent merge window too. There's also a fix for the xgene PMU driver, but it seemed daft to send as a separate pull request, so I've included it here with the rest of the fixes. - Fix ACPI boot due to recent broken NUMA changes - Fix remote enabling of CPU features requiring PSTATE bit manipulation - Add address range check when emulating user cache maintenance - Fix LL/SC loops that allow compiler to introduce memory accesses - Fix recently added write_sysreg_s macro - Ensure MDCR_EL2 is initialised on qemu targets without a PMU - Avoid kaslr breakage due to MODVERSIONs and DYNAMIC_FTRACE - Correctly drive recent ld when building relocatable Image - Remove junk IS_ERR check from xgene PMU driver added during merge window - pr_cont fixes after core changes in the merge window" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: remove pr_cont abuse from mem_init arm64: fix show_regs fallout from KERN_CONT changes arm64: kernel: force ET_DYN ELF type for CONFIG_RELOCATABLE=y arm64: suspend: Reconfigure PSTATE after resume from idle arm64: mm: Set PSTATE.PAN from the cpu_enable_pan() call arm64: cpufeature: Schedule enable() calls instead of calling them via IPI arm64: Cortex-A53 errata workaround: check for kernel addresses arm64: percpu: rewrite ll/sc loops in assembly arm64: swp emulation: bound LL/SC retries before rescheduling arm64: sysreg: Fix use of XZR in write_sysreg_s arm64: kaslr: keep modules close to the kernel when DYNAMIC_FTRACE=y arm64: kernel: Init MDCR_EL2 even in the absence of a PMU perf: xgene: Remove bogus IS_ERR() check arm64: kernel: numa: fix ACPI boot cpu numa node mapping arm64: kaslr: fix breakage with CONFIG_MODVERSIONS=y	2016-10-20 10:17:13 -07:00
Radim Krčmář	f6bbf1b7ac	MIPS KVM fix for v4.9-rc2 - Fix build error introduced during the 4.9 merge window when tracepoints are disabled. -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJYCMVkAAoJEGwLaZPeOHZ6N5sP/Aywnoi9gwR/iPQK5BUPRRE5 isrMALRLCPDa7yIVjbOHh+CJrkBnznfeikxR7Lu5PrmvZi8wb/e/NCogAP4jZnfB w7gDp7s092Roci/jZPgVKZmKMC7fI+zdoWJctNbHFapkt9nXcq/pprqRD/dOmPjN UR/R8dubNTKlP431tD/JksDk+/3uy0XZWliVokfL8CDPXtU1NslaoaHa/FQKFGWi rOXEVkx5/ReCuS1Uz4uF149qBSLkbmd1Fr7DjRP0My60sc1fndmTYp7zh8dCAM2k Z1fCNWTb+K+37zFe7+EKf5yWFqQNHCShanCXs6BiXEZ1JuMfqpeMfp4N/RHQFP20 LhzXiRZAok9TPg5G6H/CQOUoLnR/wzk4YFeCr+PlXYapuzlHfcdpxbP4S+AT5Wsw qU4232QiROg6iRyQ/FSfCch1E6QW1+5FcTYJvZSg6F8yAXzy29uFlNCpb48j3hIH NIakwtTj7/1Pby4TQUmSnzIxK08Lf1yKOppKjkTchw53Sj2XA+dHvtQMrfNjd2Xy V3n11SdU+Dv8JAbPgfon6tRc8ziQpJg5f2ocpLL6uUf5+Zra3s1Q+ND79m0n3VA/ I1mzyEC0laUNekwmyANmyYQu8vty+xjm/pFxUD4SzVFforoDTod71IsFiyA9Vg5B BZcgEgD4ZJs1FfQVaBKh =Owte -----END PGP SIGNATURE----- Merge tag 'kvm_mips_4.9_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/kvm-mips MIPS KVM fix for v4.9-rc2 - Fix build error introduced during the 4.9 merge window when tracepoints are disabled.	2016-10-20 17:26:53 +02:00
Mark Rutland	f7881bd644	arm64: remove pr_cont abuse from mem_init All the lines printed by mem_init are independent, with each ending with a newline. While they logically form a large block, none are actually continuations of previous lines. The kernel-side printk code and the userspace demsg tool differ in their handling of KERN_CONT following a newline, and while this isn't always a problem kernel-side, it does cause difficulty for userspace. Using pr_cont causes the userspace tool to not print line prefix (e.g. timestamps) even when following a newline, mis-aligning the output and making it harder to read, e.g. [ 0.000000] Virtual kernel memory layout: [ 0.000000] modules : 0xffff000000000000 - 0xffff000008000000 ( 128 MB) vmalloc : 0xffff000008000000 - 0xffff7dffbfff0000 (129022 GB) .text : 0xffff000008080000 - 0xffff0000088b0000 ( 8384 KB) .rodata : 0xffff0000088b0000 - 0xffff000008c50000 ( 3712 KB) .init : 0xffff000008c50000 - 0xffff000008d50000 ( 1024 KB) .data : 0xffff000008d50000 - 0xffff000008e25200 ( 853 KB) .bss : 0xffff000008e25200 - 0xffff000008e6bec0 ( 284 KB) fixed : 0xffff7dfffe7fd000 - 0xffff7dfffec00000 ( 4108 KB) PCI I/O : 0xffff7dfffee00000 - 0xffff7dffffe00000 ( 16 MB) vmemmap : 0xffff7e0000000000 - 0xffff800000000000 ( 2048 GB maximum) 0xffff7e0000000000 - 0xffff7e0026000000 ( 608 MB actual) memory : 0xffff800000000000 - 0xffff800980000000 ( 38912 MB) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=6, Nodes=1 Fix this by using pr_notice consistently for all lines, which both the kernel and userspace are happy with. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-20 15:27:56 +01:00
Mark Rutland	db4b0710fa	arm64: fix show_regs fallout from KERN_CONT changes Recently in commit `4bcc595ccd` ("printk: reinstate KERN_CONT for printing continuation lines"), the behaviour of printk changed w.r.t. KERN_CONT. Now, KERN_CONT is mandatory to continue existing lines. Without this, prefixes are inserted, making output illegible, e.g. [ 1007.069010] pc : [<ffff00000871898c>] lr : [<ffff000008718948>] pstate: 40000145 [ 1007.076329] sp : ffff000008d53ec0 [ 1007.079606] x29: ffff000008d53ec0 [ 1007.082797] x28: 0000000080c50018 [ 1007.086160] [ 1007.087630] x27: ffff000008e0c7f8 [ 1007.090820] x26: ffff80097631ca00 [ 1007.094183] [ 1007.095653] x25: 0000000000000001 [ 1007.098843] x24: 000000ea68b61cac [ 1007.102206] ... or when dumped with the userpace dmesg tool, which has slightly different implicit newline behaviour. e.g. [ 1007.069010] pc : [<ffff00000871898c>] lr : [<ffff000008718948>] pstate: 40000145 [ 1007.076329] sp : ffff000008d53ec0 [ 1007.079606] x29: ffff000008d53ec0 [ 1007.082797] x28: 0000000080c50018 [ 1007.086160] [ 1007.087630] x27: ffff000008e0c7f8 [ 1007.090820] x26: ffff80097631ca00 [ 1007.094183] [ 1007.095653] x25: 0000000000000001 [ 1007.098843] x24: 000000ea68b61cac [ 1007.102206] We can't simply always use KERN_CONT for lines which may or may not be continuations. That causes line prefixes (e.g. timestamps) to be supressed, and the alignment of all but the first line will be broken. For even more fun, we can't simply insert some dummy empty-string printk calls, as GCC warns for an empty printk string, and even if we pass KERN_DEFAULT explcitly to silence the warning, the prefix gets swallowed unless there is an additional part to the string. Instead, we must manually iterate over pairs of registers, which gives us the legible output we want in either case, e.g. [ 169.771790] pc : [<ffff00000871898c>] lr : [<ffff000008718948>] pstate: 40000145 [ 169.779109] sp : ffff000008d53ec0 [ 169.782386] x29: ffff000008d53ec0 x28: 0000000080c50018 [ 169.787650] x27: ffff000008e0c7f8 x26: ffff80097631de00 [ 169.792913] x25: 0000000000000001 x24: 00000027827b2cf4 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-20 15:27:56 +01:00
Jiri Slaby	8678654e3c	kvm: x86: memset whole irq_eoi gcc 7 warns: arch/x86/kvm/ioapic.c: In function 'kvm_ioapic_reset': arch/x86/kvm/ioapic.c:597:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size] And it is right. Memset whole array using sizeof operator. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> [Added x86 subject tag] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-10-20 14:54:11 +02:00
Borislav Petkov	758f588d6f	kvm/x86: Fix unused variable warning in kvm_timer_init() When CONFIG_CPU_FREQ is not set, int cpu is unused and gcc rightfully warns about it: arch/x86/kvm/x86.c: In function ‘kvm_timer_init’: arch/x86/kvm/x86.c:5697:6: warning: unused variable ‘cpu’ [-Wunused-variable] int cpu; ^~~ But since it is used only in the CONFIG_CPU_FREQ block, simply move it there, thus squashing the warning too. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-10-20 14:49:52 +02:00
Heiko Carstens	c8061485a0	sched/core, x86: Make struct thread_info arch specific again The following commit: `c65eacbe29` ("sched/core: Allow putting thread_info into task_struct") ... made 'struct thread_info' a generic struct with only a single ::flags member, if CONFIG_THREAD_INFO_IN_TASK_STRUCT=y is selected. This change however seems to be quite x86 centric, since at least the generic preemption code (asm-generic/preempt.h) assumes that struct thread_info also has a preempt_count member, which apparently was not true for x86. We could add a bit more #ifdefs to solve this problem too, but it seems to be much simpler to make struct thread_info arch specific again. This also makes the conversion to THREAD_INFO_IN_TASK_STRUCT a bit easier for architectures that have a couple of arch specific stuff in their thread_info definition. The arch specific stuff _could_ be moved to thread_struct. However keeping them in thread_info makes it easier: accessing thread_info members is simple, since it is at the beginning of the task_struct, while the thread_struct is at the end. At least on s390 the offsets needed to access members of the thread_struct (with task_struct as base) are too large for various asm instructions. This is not a problem when keeping these members within thread_info. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: keescook@chromium.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/1476901693-8492-2-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-20 13:27:47 +02:00
Dmitry Safonov	ed1e7db33c	x86/signal: Remove bogus user_64bit_mode() check from sigaction_compat_abi() The recent introduction of SA_X32/IA32 sa_flags added a check for user_64bit_mode() into sigaction_compat_abi(). user_64bit_mode() is true for native 64-bit processes and x32 processes. Due to that the function returns w/o setting the SA_X32_ABI flag for X32 processes. In consequence the kernel attempts to deliver the signal to the X32 process in native 64-bit mode causing the process to segfault. Remove the check, so the actual check for X32 mode which sets the ABI flag can be reached. There is no side effect for native 64-bit mode. [ tglx: Rewrote changelog ] Fixes: `6846351052` ("x86/signal: Add SA_{X32,IA32}_ABI sa_flags") Reported-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Adam Borowski <kilobyte@angband.pl> Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: linux-mm@kvack.org Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Link: http://lkml.kernel.org/r/CAJwJo6Z8ZWPqNfT6t-i8GW1MKxQrKDUagQqnZ%2B0%2B697%3DMyVeGg@mail.gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-20 13:05:15 +02:00
Ard Biesheuvel	b9dce7f1ba	arm64: kernel: force ET_DYN ELF type for CONFIG_RELOCATABLE=y GNU ld used to set the ELF file type to ET_DYN for PIE executables, which is the same file type used for shared libraries. However, this was changed recently, and now PIE executables are emitted as ET_EXEC instead. The distinction is only relevant for ELF loaders, and so there is little reason to care about the difference when building the kernel, which is why the change has gone unnoticed until now. However, debuggers do use the ELF binary, and expect ET_EXEC type files to appear in memory at the exact offset described in the ELF metadata. This means source level debugging is no longer possible when KASLR is in effect or when executing the stub. So add the -shared LD option when building with CONFIG_RELOCATABLE=y. This forces the ELF file type to be set to ET_DYN (which is what you get when building with binutils 2.24 and earlier anyway), and has no other ill effects. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-20 11:37:25 +01:00
James Morse	d08544127d	arm64: suspend: Reconfigure PSTATE after resume from idle The suspend/resume path in kernel/sleep.S, as used by cpu-idle, does not save/restore PSTATE. As a result of this cpufeatures that were detected and have bits in PSTATE get lost when we resume from idle. UAO gets set appropriately on the next context switch. PAN will be re-enabled next time we return from user-space, but on a preemptible kernel we may run work accessing user space before this point. Add code to re-enable theses two features in __cpu_suspend_exit(). We re-use uao_thread_switch() passing current. Signed-off-by: James Morse <james.morse@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-20 09:50:54 +01:00
James Morse	7209c86860	arm64: mm: Set PSTATE.PAN from the cpu_enable_pan() call Commit `338d4f49d6` ("arm64: kernel: Add support for Privileged Access Never") enabled PAN by enabling the 'SPAN' feature-bit in SCTLR_EL1. This means the PSTATE.PAN bit won't be set until the next return to the kernel from userspace. On a preemptible kernel we may schedule work that accesses userspace on a CPU before it has done this. Now that cpufeature enable() calls are scheduled via stop_machine(), we can set PSTATE.PAN from the cpu_enable_pan() call. Add WARN_ON_ONCE(in_interrupt()) to check the PSTATE value we updated is not immediately discarded. Reported-by: Tony Thompson <anthony.thompson@arm.com> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> [will: fixed typo in comment] Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-20 09:50:53 +01:00
James Morse	2a6dcb2b5f	arm64: cpufeature: Schedule enable() calls instead of calling them via IPI The enable() call for a cpufeature/errata is called using on_each_cpu(). This issues a cross-call IPI to get the work done. Implicitly, this stashes the running PSTATE in SPSR when the CPU receives the IPI, and restores it when we return. This means an enable() call can never modify PSTATE. To allow PAN to do this, change the on_each_cpu() call to use stop_machine(). This schedules the work on each CPU which allows us to modify PSTATE. This involves changing the protype of all the enable() functions. enable_cpu_capabilities() is called during boot and enables the feature on all online CPUs. This path now uses stop_machine(). CPU features for hotplug'd CPUs are enabled by verify_local_cpu_features() which only acts on the local CPU, and can already modify the running PSTATE as it is called from secondary_start_kernel(). Reported-by: Tony Thompson <anthony.thompson@arm.com> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-20 09:50:53 +01:00
Andre Przywara	87261d1904	arm64: Cortex-A53 errata workaround: check for kernel addresses Commit `7dd01aef05` ("arm64: trap userspace "dc cvau" cache operation on errata-affected core") adds code to execute cache maintenance instructions in the kernel on behalf of userland on CPUs with certain ARM CPU errata. It turns out that the address hasn't been checked to be a valid user space address, allowing userland to clean cache lines in kernel space. Fix this by introducing an address check before executing the instructions on behalf of userland. Since the address doesn't come via a syscall parameter, we can't just reject tagged pointers and instead have to remove the tag when checking against the user address limit. Cc: <stable@vger.kernel.org> Fixes: `7dd01aef05` ("arm64: trap userspace "dc cvau" cache operation on errata-affected core") Reported-by: Kristina Martsenko <kristina.martsenko@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> [will: rework commit message + replace access_ok with max_user_addr()] Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-20 09:50:49 +01:00
Alex Thorlton	caef78b6cd	x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates Some time ago, we brought our UV BIOS callback code up to speed with the new EFI memory mapping scheme, in commit: `d1be84a232` ("x86/uv: Update uv_bios_call() to use efi_call_virt_pointer()") By leveraging some changes that I made to a few of the EFI runtime callback mechanisms, in commit: `80e7559607` ("efi: Convert efi_call_virt() to efi_call_virt_pointer()") This got everything running smoothly on UV, with the new EFI mapping code. However, this left one, small loose end, in that EFI_OLD_MEMMAP (a.k.a. efi=old_map) will no longer work on UV, on kernels that include the aforementioned changes. At the time this was not a major issue (in fact, it still really isn't), but there's no reason that EFI_OLD_MEMMAP shouldn't work on our systems. This commit adds a check into uv_bios_call(), to see if we have the EFI_OLD_MEMMAP bit set in efi.flags. If it is set, we fall back to using our old callback method, which uses efi_call() directly on the __va() of our function pointer. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Acked-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: <stable@vger.kernel.org> # v4.7 and later Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Mike Travis <travis@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russ Anderson <rja@sgi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1476928131-170101-1-git-send-email-athorlton@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-10-20 08:47:58 +02:00
Linus Torvalds	147fdd8cf1	Minor changes to improve J2 support and match Kconfig expectations of other subsystems. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJYB69cAAoJELcQ+SIFb8Hau0IH/3UBLH7YvoPomqZU3OhPzLMr 49HgPJEcDNYv6piU+VlT3RK16GJcjobJF6OFlbNvCqvt/IqnrR3eX4LD2Tv0d7z1 XlLQ0Re9pL3Lbe4Mo3YdiZrh+Zv6yzMsQqpbUSf298VvwZ84AoLWVTJ+oobGTTP/ 77PPyZiRxgVsC+3YERk49Af48xpt3Bm2pNhT1wutf7+OW2aatA/v9LIsz9zAzhRN gULZ9l+9w2pT9sVT6ho7w3Xm00kvGr/MW3AjbnMaHey3cpjkvj8VGmF0X6/d/4ct Tqygpe1nMXjbIvQ1Zg3uH3qbjo8N27ajoQaaOaWa80SGc2Urf8zDswCcagljCDU= =Lqsd -----END PGP SIGNATURE----- Merge tag 'sh-for-4.9' of git://git.libc.org/linux-sh Pull arch/sh updates from Rich Felker: "Minor changes to improve J2 support and match Kconfig expectations of other subsystems" * tag 'sh-for-4.9' of git://git.libc.org/linux-sh: sh: add earlycon support to j2_defconfig sh: add Kconfig option for J-Core SoC core drivers sh: support CPU_J2 when compiler lacks -mj2	2016-10-19 11:21:06 -07:00
Linus Torvalds	63ae602cea	Merge branch 'gup_flag-cleanups' Merge the gup_flags cleanups from Lorenzo Stoakes: "This patch series adjusts functions in the get_user_pages* family such that desired FOLL_* flags are passed as an argument rather than implied by flags. The purpose of this change is to make the use of FOLL_FORCE explicit so it is easier to grep for and clearer to callers that this flag is being used. The use of FOLL_FORCE is an issue as it overrides missing VM_READ/VM_WRITE flags for the VMA whose pages we are reading from/writing to, which can result in surprising behaviour. The patch series came out of the discussion around commit `38e0885465` ("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"), which addressed a BUG_ON() being triggered when a page was faulted in with PROT_NONE set but having been overridden by FOLL_FORCE. do_numa_page() was run on the assumption the page _must_ be one marked for NUMA node migration as an actual PROT_NONE page would have been dealt with prior to this code path, however FOLL_FORCE introduced a situation where this assumption did not hold. See https://marc.info/?l=linux-mm&m=147585445805166 for the patch proposal" Additionally, there's a fix for an ancient bug related to FOLL_FORCE and FOLL_WRITE by me. [ This branch was rebased recently to add a few more acked-by's and reviewed-by's ] * gup_flag-cleanups: mm: replace access_process_vm() write parameter with gup_flags mm: replace access_remote_vm() write parameter with gup_flags mm: replace __access_remote_vm() write parameter with gup_flags mm: replace get_user_pages_remote() write/force parameters with gup_flags mm: replace get_user_pages() write/force parameters with gup_flags mm: replace get_vaddr_frames() write/force parameters with gup_flags mm: replace get_user_pages_locked() write/force parameters with gup_flags mm: replace get_user_pages_unlocked() write/force parameters with gup_flags mm: remove write/force parameters from __get_user_pages_unlocked() mm: remove write/force parameters from __get_user_pages_locked() mm: remove gup_flags FOLL_WRITE games from __get_user_pages()	2016-10-19 08:39:47 -07:00
Piotr Luc	8214899342	x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features AVX512_4VNNIW - Vector instructions for deep learning enhanced word variable precision. AVX512_4FMAPS - Vector instructions for deep learning floating-point single precision. These new instructions are to be used in future Intel Xeon & Xeon Phi processors. The bits 2&3 of CPUID[level:0x07, EDX] inform that new instructions are supported by a processor. The spec can be found in the Intel Software Developer Manual (SDM) or in the Instruction Set Extensions Programming Reference (ISE). Define new feature flags to enumerate the new instructions in /proc/cpuinfo accordingly to CPUID bits and add the required xsave extensions which are required for proper operation. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20161018150111.29926-1-piotr.luc@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-19 17:37:13 +02:00
Renat Valiullin	854dd54245	x86/vmware: Skip timer_irq_works() check on VMware The timer_irq_works() boot check may sometimes fail in a VM, when the Host is overcommitted or when the Guest is running nested. Since the intended check is unnecessary on VMware's virtual hardware, by-pass it. Signed-off-by: Renat Valiullin <rvaliullin@vmware.com> Acked-by: Alok N Kataria <akataria@vmware.com> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20161013184539.GA11497@rvaliullin-vm Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-10-19 17:36:33 +02:00
Lorenzo Stoakes	f307ab6dce	mm: replace access_process_vm() write parameter with gup_flags This removes the 'write' argument from access_process_vm() and replaces it with 'gup_flags' as use of this function previously silently implied FOLL_FORCE, whereas after this patch callers explicitly pass this flag. We make this explicit as use of FOLL_FORCE can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-19 08:31:25 -07:00
Lorenzo Stoakes	768ae309a9	mm: replace get_user_pages() write/force parameters with gup_flags This removes the 'write' and 'force' from get_user_pages() and replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-19 08:11:43 -07:00
Will Deacon	1e6e57d9b3	arm64: percpu: rewrite ll/sc loops in assembly Writing the outer loop of an LL/SC sequence using do {...} while constructs potentially allows the compiler to hoist memory accesses between the STXR and the branch back to the LDXR. On CPUs that do not guarantee forward progress of LL/SC loops when faced with memory accesses to the same ERG (up to 2k) between the failed STXR and the branch back, we may end up livelocking. This patch avoids this issue in our percpu atomics by rewriting the outer loop as part of the LL/SC inline assembly block. Cc: <stable@vger.kernel.org> Fixes: `f97fc81079` ("arm64: percpu: Implement this_cpu operations") Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-19 15:37:29 +01:00
Will Deacon	1c5b51dfb7	arm64: swp emulation: bound LL/SC retries before rescheduling If a CPU does not implement a global monitor for certain memory types, then userspace can attempt a kernel DoS by issuing SWP instructions targetting the problematic memory (for example, a framebuffer mapped with non-cacheable attributes). The SWP emulation code protects against these sorts of attacks by checking for pending signals and potentially rescheduling when the STXR instruction fails during the emulation. Whilst this is good for avoiding livelock, it harms emulation of legitimate SWP instructions on CPUs where forward progress is not guaranteed if there are memory accesses to the same reservation granule (up to 2k) between the failing STXR and the retry of the LDXR. This patch solves the problem by retrying the STXR a bounded number of times (4) before breaking out of the LL/SC loop and looking for something else to do. Cc: <stable@vger.kernel.org> Fixes: `bd35a4adc4` ("arm64: Port SWP/SWPB emulation support from arm") Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2016-10-19 15:37:23 +01:00
Stephen Rothwell	78914ff084	powerpc: Ignore the pkey system calls for now Eliminates warning messages: <stdin>:1316:2: warning: #warning syscall pkey_mprotect not implemented [-Wcpp] <stdin>:1319:2: warning: #warning syscall pkey_alloc not implemented [-Wcpp] <stdin>:1322:2: warning: #warning syscall pkey_free not implemented [-Wcpp] Hopefully we will remember to revert this commit if we ever implement them. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-19 20:36:24 +11:00
Aneesh Kumar K.V	8467801cc8	powerpc: Fix numa topology console print With recent update to printk, we get console output like below: [ 0.550639] Brought up 160 CPUs [ 0.550718] Node 0 CPUs: [ 0.550721] 0 [ 0.550754] -39 [ 0.550794] Node 1 CPUs: [ 0.550798] 40 [ 0.550817] -79 [ 0.550856] Node 16 CPUs: [ 0.550860] 80 [ 0.550880] -119 [ 0.550917] Node 17 CPUs: [ 0.550923] 120 [ 0.550942] -159 Fix this by properly using pr_cont(), ie. KERN_CONT. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-19 20:35:41 +11:00
Michael Ellerman	08b5e79ebd	powerpc/mm: Drop dump_numa_memory_topology() At boot we dump the NUMA memory topology in dump_numa_memory_topology(), at KERN_DEBUG level, resulting in output like: Node 0 Memory: 0x0-0x100000000 Node 1 Memory: 0x100000000-0x200000000 Which is nice enough, but immediately after that we iterate over each node and call setup_node_data(), which also prints out the node ranges, at KERN_INFO, giving eg: numa: Initmem setup node 0 [mem 0x00000000-0xffffffff] numa: Initmem setup node 1 [mem 0x100000000-0x1ffffffff] Additionally dump_numa_memory_topology() does not use KERN_CONT correctly, resulting in split output lines on recent kernels. So drop dump_numa_memory_topology() as superfluous chatter. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-19 20:35:40 +11:00
Heiner Kallweit	65bc3ece84	powerpc/boot: Fix boot on systems with uncompressed kernel image This commit broke boot on systems with an uncompressed kernel image, namely systems using a cuImage. On such systems the compressed boot image (boot wrapper, uncompressed kernel image, ..) is decompressed by u-boot already, therefore the boot wrapper code sees an uncompressed kernel image. The old decompression code silently assumed an uncompressed kernel image if it found no valid gzip signature, whilst the new code bailed out in this case. Fix this by re-introducing such a fallback if no valid compressed image is found. Fixes: `1b7898ee27` ("Use the pre-boot decompression API") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-19 20:35:34 +11:00
Frederic Barrat	d2cf909cda	powerpc/mm: Prevent unlikely crash in copro_calculate_slb() If a cxl adapter faults on an invalid address for a kernel context, we may enter copro_calculate_slb() with a NULL mm pointer (kernel context) and an effective address which looks like a user address. Which will cause a crash when dereferencing mm. It is clearly an AFU bug, but there's no reason to crash either. So return an error, so that cxl can ack the interrupt with an address error. Fixes: `73d16a6e0e` ("powerpc/cell: Move data segment faulting code out of cell platform") Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2016-10-19 20:32:49 +11:00

1 2 3 4 5 ...

127624 Commits